opendevreview | Ghanshyam proposed openstack/nova stable/ussuri: DNM: Testing stable/ussuri with tempest fix for constraints mismatch https://review.opendev.org/c/openstack/nova/+/843046 | 01:53 |
---|---|---|
sean-k-mooney[m] | gmann is ussuri EM or supported currently | 07:07 |
sean-k-mooney[m] | because it seems to be using master uc | 07:07 |
sean-k-mooney[m] | but has pinned tempest | 07:07 |
sean-k-mooney[m] | so i think we either need to have it use python 3.8 so that master uc works | 07:08 |
sean-k-mooney[m] | or we need to change uc to be <= instead of === | 07:08 |
sean-k-mooney[m] | gmann: if we dont pin i think we can get it to install, but i was actually using 3.9 to test. i was going to try deploying ussuri today | 07:11 |
sean-k-mooney[m] | i was using one of my centos 9 vms to test yesterday which was too new to test properly | 07:11 |
opendevreview | Rico Lin proposed openstack/nova-specs master: Add vIOMMU device support for libvirt driver https://review.opendev.org/c/openstack/nova-specs/+/840310 | 07:18 |
ricolin | stephenfin: I need your feedback on comment https://review.opendev.org/c/openstack/nova-specs/+/840310/7..10/specs/zed/approved/libvirt-viommu-device.rst#b52 thanks :) | 07:19 |
ricolin | Also for aw_bits, I propose we set it to 48 (which at least will cover both 39 and 48 width option ) and don't expose it to end user | 07:20 |
sean-k-mooney[m] | ricolin if 48 is supported on our min version of qemu im ok to hard code to that | 07:38 |
sean-k-mooney[m] | at least for now until we have a need for something higher | 07:38 |
whoami-rajat | hi #openstack-nova , wanted to add a topic to today's meeting agenda, do we have an etherpad for the meeting? | 07:55 |
sean-k-mooney[m] | we have a wiki page | 07:55 |
whoami-rajat | ack, can i add topics there? | 07:56 |
sean-k-mooney[m] | yep just add it to the agenda here https://wiki.openstack.org/wiki/Meetings/Nova | 07:56 |
whoami-rajat | done, thanks! | 08:04 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Fix typos https://review.opendev.org/c/openstack/nova/+/843127 | 10:31 |
ricolin | sean-k-mooney[m]: aw_bits was introduced in libvirt 6.5.0; does that mean I should propose in the spec to bump the min version for libvirt/qemu as well? | 11:08 |
ricolin | current: MIN_LIBVIRT_VERSION = (6, 0, 0) | 11:09 |
sean-k-mooney | libvirt 6.5.0 or qemu | 11:09 |
sean-k-mooney | ricolin: we have to advertise our min version bumps in advance | 11:09 |
sean-k-mooney | so in general no | 11:09 |
sean-k-mooney | but i need to check when we last did that | 11:10 |
ricolin | thanks | 11:10 |
sean-k-mooney | looking at https://docs.openstack.org/nova/latest/reference/libvirt-distro-support-matrix.html i dont think we can increase it this cycle | 11:12 |
sean-k-mooney | we should be able to do it in AA | 11:12 |
sean-k-mooney | we still need to support 20.04 this cycle | 11:12 |
sean-k-mooney | but we can go to libvirt 7.0 in AA | 11:13 |
sean-k-mooney | and qemu 5.2 | 11:13 |
sean-k-mooney | ricolin: so yes you will need to do a min version check and either not set the value or reject the boot | 11:14 |
ricolin | sean-k-mooney: So this patch might need to wait for AA to bump to libvirt 7.0.0 if we keep aw_bits at 48 (which requires min libvirt version 6.5.0), right? | 11:21 |
sean-k-mooney | ricolin: no it can proceed | 11:22 |
sean-k-mooney | but you can only set aw_bits if the host has 6.5.0 | 11:23 |
ricolin | ah, got it | 11:23 |
sean-k-mooney | so on older versions the address width would not be defined which would limit the devices that could be used | 11:23 |
sean-k-mooney | but if you are not using pci passthrough it might still be fine, or if you are but dont need the extended address space width | 11:24 |
opendevreview | Merged openstack/nova stable/wallaby: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/840717 | 12:24 |
kashyap | sean-k-mooney: On my fresh F36 I see these pip conflicts - do you see too? - https://paste.opendev.org/show/baAm2JxuzDLCsviaK90D/ | 13:26 |
sean-k-mooney | what version of python are you using | 13:33 |
sean-k-mooney | ill check but i dont think so | 13:34 |
sean-k-mooney | that was nova right | 13:34 |
kashyap | Yeah | 13:34 |
kashyap | python3-3.10.4-1.fc36.x86_64 | 13:34 |
sean-k-mooney | we dont fully support 3.10 yet | 13:35 |
sean-k-mooney | its experimental and will be added next cycle | 13:35 |
sean-k-mooney | we test with up to 3.9 as voting so there may be issue if you use 3.10 | 13:35 |
sean-k-mooney | i have 3.10 locally i think so ill try 3.9 and 3.10 | 13:35 |
gibi | kashyap: I haven't seen that yet but it does not seem to be a py310 specific issue. I can try it in a clean env | 13:37 |
sean-k-mooney | kashyap: was that just unit tests by the way | 13:38 |
sean-k-mooney | i.e. tox -e py3 | 13:38 |
kashyap | sean-k-mooney: Yeah, I was just trying to run a unit test | 13:38 |
kashyap | sean-k-mooney: Yes, it was a `tox -e py36[|37] some_test` | 13:39 |
gibi | on master? | 13:39 |
sean-k-mooney | it seems to be working fine for me | 13:40 |
kashyap | Actually slightly (some 20 commits) behind master. Lemme rebase my branch, then | 13:40 |
sean-k-mooney | ya good point 3.10 support is non-voting on master but not supported on anything older | 13:40 |
sean-k-mooney | although both debian and ubuntu released yoga on 3.10 | 13:40 |
gibi | -e py36[|37] feels old on master | 13:40 |
sean-k-mooney | so it apparently works there | 13:40 |
sean-k-mooney | we dropped support for 3.6 and 3.7 | 13:41 |
sean-k-mooney | on master | 13:41 |
sean-k-mooney | min is now 3.8 | 13:41 |
kashyap | Aaah, duh. I didn't realize that | 13:41 |
gibi | I don't see the conflict with 3.10 locally | 13:41 |
kashyap | Gimme a few; /me tries | 13:41 |
kashyap | gibi: Hmm, seems like PEBKAC, then. Thanks for trying | 13:42 |
gibi | no worries | 13:42 |
sean-k-mooney | some of the packages have python_requires set | 13:42 |
sean-k-mooney | which will prevent pip from installing them | 13:42 |
sean-k-mooney | and it will break that way | 13:42 |
kashyap | I see, yeah; I'm investigating what's borked here | 13:45 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: zuul: Add nova-live-migration-ceph job https://review.opendev.org/c/openstack/nova/+/843145 | 13:47 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 13:47 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: zuul: Add nova-live-migration-ceph job https://review.opendev.org/c/openstack/nova/+/843145 | 13:47 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 13:47 |
sean-k-mooney | kashyap: https://github.com/openstack/python-glanceclient/blob/master/setup.cfg#L10= | 13:50 |
sean-k-mooney | nova should have that too by the way i just dont think we have bumped it yet | 13:51 |
kashyap | sean-k-mooney: Ah-ha | 13:51 |
kashyap | Thank you for digging! :) | 13:51 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: zuul: Add nova-live-migration-ceph job https://review.opendev.org/c/openstack/nova/+/843145 | 13:52 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 13:52 |
sean-k-mooney | kashyap: well i knew exactly where to look if that was the issue and it was so it took like 20 seconds to find | 13:53 |
kashyap | sean-k-mooney: Sure, even that; you bothered to look :) | 13:54 |
kashyap | sean-k-mooney: gibi: Just to confirm; that's it - using py38 works. Thx! | 13:56 |
gmann | sean-k-mooney[m]: ussuri is EM, yes tempest and constraints both will be older and stable/constraints. that is I am working on this series and testing https://review.opendev.org/q/topic:ussuri-pin-tempest | 13:58 |
sean-k-mooney | we have the same issue on stable ussuri currently | 13:58 |
sean-k-mooney | gmann: right so currently its using master and failing because ussuri is using py36 and master does not support it | 13:59 |
sean-k-mooney | so that needs to be fixed in devstack to either clamp to stable/ussuri or at the latest yoga | 13:59 |
sean-k-mooney | since that is the last version to support py 36 | 13:59 |
gmann | yeah. i know | 14:00 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 14:02 |
ygk_12345 | can someone respond to this please https://bugs.launchpad.net/nova/+bug/1973887 | 14:27 |
sean-k-mooney | ygk_12345: you put [oslo_messaging_rabbit] | 14:32 |
sean-k-mooney | heartbeat_in_pthread = False | 14:32 |
sean-k-mooney | in nova.conf | 14:33 |
sean-k-mooney | in this case in the nova.conf used by nova-compute | 14:33 |
ygk_12345 | sean-k-mooney: you said to avoid it in other nova services except nova-api. how to do that since all nova services have a single nova.conf file | 14:33 |
sean-k-mooney | that is not how we recommend deploying nova | 14:34 |
ygk_12345 | we have OSA Wallaby | 14:34 |
sean-k-mooney | the compute nodes especially should have a different config than the controller services | 14:34 |
sean-k-mooney | you can turn it off on the api too | 14:34 |
ygk_12345 | sean-k-mooney: ok so applying it for computes alone in this case would help. isn't it ? | 14:35 |
sean-k-mooney | it just will result in more rabbit heartbeat lost messages | 14:35 |
sean-k-mooney | yes applying it just to the compute nodes will resolve most of your issues | 14:35 |
sean-k-mooney | but the oslo bug could also impact the conductor and scheduler in the same way | 14:35 |
ygk_12345 | sean-k-mooney: also can you please reply in that bug post as to what are those eventpoll files in the lsof output ? | 14:36 |
sean-k-mooney | in general its safe to turn off the pthread for heartbeat on all nova services | 14:36 |
sean-k-mooney | no | 14:36 |
sean-k-mooney | i close this as a duplicate because this is not a nova bug | 14:36 |
ygk_12345 | sean-k-mooney: thats ok. you can close it,. I want to know for curiosity's sake what are those evenpolls ? | 14:37 |
sean-k-mooney | i dont know what they are exactly but based on the oslo messaging bug its something related to the amqp lib we use to connect to rabbitmq | 14:37 |
ygk_12345 | sean-k-mooney: oh ok | 14:37 |
sean-k-mooney | i believe its eventlet polling the tcp connection to rabbit | 14:37 |
ygk_12345 | sean-k-mooney: but why there are so many in this case ? | 14:38 |
sean-k-mooney | but i have not actually dug into the detail, tobias-urdin knows more about it | 14:38 |
ygk_12345 | oh ok | 14:38 |
sean-k-mooney | ygk_12345: its leaking them due to an interaction between evently and py-amqp | 14:39 |
sean-k-mooney | *eventlet | 14:39 |
sean-k-mooney | ygk_12345: this https://github.com/celery/py-amqp/commit/f4fd4f952dded9ea1006e82642f2c15008bb68d3 apparently is the fix in py-amqp based on the other bug | 14:39 |
ygk_12345 | sean-k-mooney: ok | 14:40 |
ygk_12345 | sean-k-mooney: so in this case we are disabling the heartbeat through python green threading. isn't it ? | 14:41 |
sean-k-mooney | so the workaround is to fall back to running the heartbeat as a greenthread | 14:41 |
ygk_12345 | sean-k-mooney: if we disable pthread, how is the heartbeat maintained ? | 14:42 |
sean-k-mooney | instead of spawning a fully unpatched pthread | 14:42 |
kashyap | A newbie question: on brand-new box, any reason why `git review` gets stuck at the ".git/hooks/pre-review" stage? - https://paste.opendev.org/show/bwdWn2H2H61ZYDUbqZqf/ | 14:42 |
sean-k-mooney | ygk_12345: by a greenthread that is cooperatively executed with all the others | 14:42 |
sean-k-mooney | ygk_12345: the pthread was only used to work around uwsgi killing the heartbeat greenthread | 14:43 |
ygk_12345 | sean-k-mooney: so there is a diff between pthread and a green thread ? | 14:43 |
sean-k-mooney | ygk_12345: it was used in the api to escape the lifecycle management of the wsgi server | 14:43 |
sean-k-mooney | greenthreads are userland threads that the kernel does not know about, created by greenlet/eventlet | 14:44 |
sean-k-mooney | pthreads are real kernel/os threads | 14:44 |
ygk_12345 | sean-k-mooney: hmm ok. makes sense now | 14:44 |
sean-k-mooney | so there are many greenthreads to one pthread normally | 14:44 |
sean-k-mooney | the reason this was used by wsgi is after a timeout apache or uwsgi would kill the interpreter following a request | 14:45 |
sean-k-mooney | and that would kill the heartbeat | 14:45 |
sean-k-mooney | and that would cause a timeout error on rabbitmq side | 14:45 |
* kashyap is migrating laptops ... and going through the dance of setting everything up. | 14:45 | |
sean-k-mooney | it didnt break anything just created false errors | 14:45 |
sean-k-mooney | ygk_12345: using a pthread when running under apache mod_wsgi or uwsgi avoids that | 14:46 |
sean-k-mooney | nova-compute does not run under either so its not needed, same for the conductor and scheduler | 14:46 |
sean-k-mooney | kashyap: not sure, did you install git-review from pypi? you should never use the distro version of git-review | 14:47 |
kashyap | Never? It works just fine normally :) | 14:47 |
sean-k-mooney | its very very old normally | 14:47 |
sean-k-mooney | even on fedora | 14:47 |
kashyap | Yes, it's distro-based; and it's not related to git-review. It's related to the +ssh-rsa thingie setup | 14:47 |
sean-k-mooney | oh you have not updated your keys | 14:48 |
sean-k-mooney | gerrit still does not negotiate sha2 and fedora has disabled all sha1 based auth | 14:48 |
tobias-urdin | ygk_12345: what sean-k-mooney says :D | 14:48 |
sean-k-mooney | kashyap: https://src.fedoraproject.org/rpms/git-review git review is currently 2.3.1 | 14:50 |
sean-k-mooney | so even f37 is still behind | 14:50 |
sean-k-mooney | 2.3.1 came out in april | 14:50 |
kashyap | Yeah, saw that; I'm using 2.2.0.3 version | 14:50 |
sean-k-mooney | ya from last november | 14:51 |
kashyap | Right; but that's not the issue here for me. It's something else. I've also got the SSH setup: | 14:51 |
kashyap | Host review.openstack.org | 14:51 |
kashyap | HostName review.openstack.org | 14:51 |
kashyap | PubkeyAcceptedKeyTypes +ssh-rsa | 14:51 |
kashyap | (Required at least on Fedora) | 14:51 |
sean-k-mooney | yep unless you update your key | 14:51 |
gibi | I needed both | 15:11 |
gibi | HostkeyAlgorithms +ssh-rsa | 15:11 |
gibi | PubkeyAcceptedAlgorithms +ssh-rsa | 15:11 |
gibi | as new clients do not accept rsa as host key from old servers | 15:11 |
sean-k-mooney | gibi: ya it depends on the client and host i guess. | 15:12 |
sean-k-mooney | i dont have either | 15:12 |
sean-k-mooney | i just created a second key | 15:12 |
bauzas | reminder : nova meeting in 48 mins here | 15:13 |
sean-k-mooney | i have an id_ed25519 key and a less secure id_ecdsa as well as id_rsa now | 15:13 |
gibi | the ssh client stopped accepting rsa host keys since 8.8 | 15:14 |
gibi | *openssh-client | 15:14 |
gibi | but old servers still send the host key as rsa | 15:15 |
gibi | https://www.openssh.com/txt/release-8.8 | 15:15 |
sean-k-mooney | right but i think opendev have updated theirs | 15:16 |
sean-k-mooney | to have ec keys | 15:16 |
kashyap | Okay; /me uploads EC keys | 15:17 |
kashyap | gibi: Ah, good to know; lemme try | 15:17 |
sean-k-mooney | gibi: i think im using 8.8 but i guess i have not updated in a few days | 15:18 |
kashyap | gibi: W/o that HostkeyAlgorithms, I could SSH to `ssh kashyapc@review.opendev.org -p 29418` just fine | 15:19 |
sean-k-mooney | actully i think i have 8.7 | 15:21 |
sean-k-mooney | hum OpenSSH_8.8p1, OpenSSL 1.1.1o 3 May 2022 | 15:22 |
kashyap | gibi: No dice; for me `git review` just seem to be stuck in limbo trying to run the 'pre-review' hook (which in turn just runs `tox -e pep8`) | 15:22 |
gibi | ack | 15:23 |
sean-k-mooney | kashyap: i think they had to fix something in git-review for this by the way | 15:23 |
gibi | I don't use git-review | 15:23 |
* kashyap runs w/ "tox -vv" to investigate | 15:23 | |
* gibi uses pure git like a caveman | 15:23 | |
kashyap | Heh, I used to as well; but got used to git-review | 15:24 |
sean-k-mooney | gibi: via git push refs/for/chage? | 15:24 |
gibi | yepp | 15:24 |
sean-k-mooney | i havnt done that in about 5 years | 15:24 |
kashyap | gibi: Then you might like `git publish` -- I really like using it when submitting patches to libvirt/QEMU lists (or any list-based thing) | 15:24 |
sean-k-mooney | but for the first 2-3 of working on openstack that is what i did too | 15:24 |
gibi | it is a bit longer command but at least I'm in full control :) | 15:24 |
kashyap | https://github.com/stefanha/git-publish | 15:25 |
kashyap | gibi: See this: https://listman.redhat.com/archives/libguestfs/2022-May/028911.html | 15:26 |
kashyap | It's a wrapper around `git-format-patch` and `git-send-email` | 15:26 |
kashyap | But for us it doesn't make sense; as we don't use lists :D | 15:26 |
gibi | thanks, I will try to remember it when I have to send patches by mail | 15:27 |
kashyap | Okay, when I run 'tox' with -vv, it seems to be stuck here: https://paste.opendev.org/show/baNIMuM5YxlBA8nwxm10/ | 15:28 |
kashyap | INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C. Using cached decorator-4.2.1-py2.py3-none-any.whl (9.3 kB) | 15:29 |
sean-k-mooney | have you considered using ubuntu :P | 15:29 |
sean-k-mooney | is that still with py38? or are you going back to 3.10 | 15:30 |
kashyap | No; Fedora is the devil I know, I'm a Fedora dev too ... not gonna abandon it :D | 15:30 |
kashyap | sean-k-mooney: py38 still | 15:30 |
gibi | you can try it in an ubuntu docker container ;) | 15:36 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 15:38 |
artom | Uggla, ^^ is why I almost never touch devstack :P | 15:39 |
artom | CI is my devstack ^_^ | 15:39 |
opendevreview | Merged openstack/nova stable/rocky: [stable-only] Drop lower-constraints job https://review.opendev.org/c/openstack/nova/+/838041 | 15:43 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Add a workaround to skip compareCPU() on destination https://review.opendev.org/c/openstack/nova/+/838926 | 15:43 |
kashyap | gibi: melwitt: --^ Fixed a small test code clean-up (as per Mel's review; thanks) | 15:44 |
Uggla | artom, you are using a big devstack | 15:45 |
gibi | kashyap: I think you delete too much | 15:47 |
artom | Uggla, in a way :) | 15:47 |
gibi | replied inline | 15:47 |
kashyap | Aargh :) /me goes to fix | 15:47 |
gibi | Uggla, artom: this is cool until you are OK iterating once in 2 hours | 15:47 |
kashyap | It's taking a full 3+ minutes to run `git review`, it's slower than a pig | 15:47 |
gibi | s/until/while/ | 15:47 |
artom | gibi, it suits my mind more | 15:48 |
artom | I'd rather concentrate on an "interesting" thing, let it run while I do something else | 15:48 |
artom | Rather than faffing about on a devstack that can break in various fun random ways | 15:48 |
artom | At least the CI is (mostly) consistent and predictable, on master anyways | 15:48 |
gibi | artom: actually re-creating devstack in an ubuntu VM is fairly stable and a lot faster than 2 hours | 15:49 |
gibi | and when that breaks you expect that the gate will break too :) | 15:49 |
artom | With Ceph, and multinode? | 15:49 |
gibi | ahh | 15:49 |
artom | Zuul does it for me :) | 15:49 |
gibi | good point | 15:49 |
gibi | I avoided Ceph | 15:50 |
*** whoami-rajat__ is now known as whoami-rajat | 15:50 | |
artom | We've fixed all the "easy" things in Nova | 15:50 |
gibi | but multinode is still pretty easy | 15:50 |
artom | And any that are left can be done in func tests, mostly | 15:50 |
gibi | :D | 15:50 |
artom | So the stuff you need real deployments for are almost always a pain in the ass | 15:50 |
artom | Anyways, I'm not advocating that anyone work the way I do | 15:51 |
artom | But it's been a recurring topic between Uggla and myself | 15:51 |
artom | So I figured I'd point to an example of why I rarely touch devstack directly anymore | 15:51 |
gibi | sure, I'm not advocating either | 15:53 |
Uggla | artom, sure. I try to pick the best of the 2 ways of working. To be honest, having a devstack is most of the time more convenient for me. This is probably because I currently make so many mistakes. | 15:53 |
kashyap | gibi: That diff good w/ you? - https://paste.opendev.org/show/bJB6AN1q4slkEzyhqN4u/ | 15:57 |
gibi | kashyap: yep | 15:59 |
bauzas | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue May 24 16:00:24 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
bauzas | hey folks | 16:00 |
whoami-rajat | Hi | 16:00 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:00 |
elodilles | o/ | 16:01 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Add a workaround to skip compareCPU() on destination https://review.opendev.org/c/openstack/nova/+/838926 | 16:01 |
gibi | o/ | 16:02 |
bauzas | ok, let's start | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:02 |
bauzas | #info No Critical bug | 16:02 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 15 new untriaged bugs (+1 since the last meeting) | 16:02 |
bauzas | no worries sean, I saw you did a hard work | 16:02 |
bauzas | #link https://storyboard.openstack.org/#!/project/openstack/placement 26 open stories (0 since the last meeting) in Storyboard for Placement | 16:03 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:03 |
bauzas | sean-k-mooney: any bug you wanted to discuss for triage ? | 16:03 |
sean-k-mooney | am not anything pressing | 16:03 |
sean-k-mooney | https://etherpad.opendev.org/p/nova-bug-triage-2022-05-17 | 16:03 |
sean-k-mooney | we had one feature request | 16:03 |
sean-k-mooney | for numa in placement | 16:03 |
sean-k-mooney | which i marked as invalid | 16:03 |
sean-k-mooney | and one duplicate of an | 16:04 |
sean-k-mooney | oslo messaging bug | 16:04 |
sean-k-mooney | the rest did not have enough info to triage really | 16:04 |
sean-k-mooney | so i marked them incomplete | 16:04 |
sean-k-mooney | i also checked some of the incomplete from last week | 16:04 |
sean-k-mooney | but no change really | 16:04 |
sean-k-mooney | one fixed bug from stephen https://bugs.launchpad.net/nova/+bug/1974173 | 16:05 |
sean-k-mooney | thats about it | 16:05 |
bauzas | ok thanks | 16:06 |
bauzas | and thanks again for triaging | 16:06 |
bauzas | elodilles: are you okay for getting the baton for this week ? | 16:06 |
elodilles | bauzas: yepp o7 | 16:07 |
bauzas | thanks | 16:07 |
bauzas | #info Next bug baton is passed to elodilles | 16:08 |
bauzas | #topic Gate status | 16:08 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:08 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status | 16:09 |
bauzas | #link https://zuul.opendev.org/t/openstack/builds?job_name=nova-emulation&pipeline=periodic-weekly&skip=0 Emulation periodic job runs | 16:09 |
bauzas | as you can see ^ nothing to tell | 16:09 |
bauzas | both jobs and pipelines work | 16:09 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:09 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:09 |
bauzas | as a reminder for everyone ^ :) | 16:10 |
gibi | please note that we are still playing whack-a-mole with the volume detach issue. There are still open tempest patches adding more SSHABLE waiters | 16:10 |
bauzas | gibi: yup, we'll discuss this for the stable topic | 16:10 |
gibi | ack, but this is affecting master still :) | 16:11 |
sean-k-mooney | bauzas: well it affects master too | 16:11 |
sean-k-mooney | but sure | 16:11 |
bauzas | yep, I know, thanks for the reminder it also impacts master | 16:11 |
bauzas | #topic Release Planning | 16:12 |
bauzas | #link https://releases.openstack.org/zed/schedule.html | 16:12 |
bauzas | #info Zed-1 was last week | 16:12 |
bauzas | thanks sean-k-mooney for accepting the rc1 releases for the projects | 16:12 |
bauzas | oh, actually elodilles | 16:13 |
sean-k-mooney | ya it was elodilles | 16:13 |
sean-k-mooney | i replied on one after the fact | 16:13 |
bauzas | #link https://review.opendev.org/c/openstack/releases/+/841851 novaclient release for zed-1 | 16:13 |
elodilles | well, it had a deadline, so needed a review & merge o:) | 16:13 |
sean-k-mooney | we discussed it but i forgot to do it before the deadline | 16:13 |
bauzas | #link https://review.opendev.org/c/openstack/releases/+/841845 os-vif release for zed-1 | 16:13 |
bauzas | sean-k-mooney: me too | 16:13 |
sean-k-mooney | elodilles: strictly speaking we dont have to do it by m1 | 16:14 |
bauzas | and i was off this friday, didn't help | 16:14 |
sean-k-mooney | that is just the convention the release team are following | 16:14 |
sean-k-mooney | but its not required by the release model | 16:14 |
bauzas | elodilles: don't be afraid to ping me if you need me to review some release change | 16:14 |
sean-k-mooney | we just need an intermediate release :) | 16:14 |
bauzas | correct | 16:14 |
elodilles | bauzas: ack :) | 16:15 |
elodilles | sean-k-mooney: not necessary, yes, but if there is no -1 from the team, then release managers merge the generated patches at deadlines o:) | 16:16 |
sean-k-mooney | elodilles: right the deadline is actually m3 | 16:16 |
sean-k-mooney | the docs dont mention m1 at all | 16:17 |
sean-k-mooney | that is just a holdover from the release with milestone model | 16:17 |
sean-k-mooney | but thanks for taking care of it in any case | 16:17 |
sean-k-mooney | https://github.com/openstack/releases/blob/61f891ddd7bd3b28ac7b5e7e9e1d9203fbbe297d/doc/source/reference/release_models.rst#cycle-with-intermediary= | 16:18 |
elodilles | sean-k-mooney: see #2, and its last chapter: https://releases.openstack.org/reference/process.html#milestone-1 | 16:19 |
bauzas | elodilles: how can I see whether for example os-vif is either using a cycle-with-rc model or a cycle-with-intermediary one ? | 16:19 |
sean-k-mooney | elodilles: yep that is not in line with the governance doc | 16:19 |
elodilles | bauzas: in the yaml file under deliverables/zed | 16:19 |
sean-k-mooney | anyway its not important now | 16:20 |
bauzas | elodilles: ok b/c https://releases.openstack.org/teams/nova.html doesn't tell it | 16:20 |
bauzas | anyway, moving on | 16:20 |
elodilles | ++ | 16:20 |
bauzas | #topic Review priorities | 16:20 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B1 | 16:20 |
bauzas | #link https://review.opendev.org/c/openstack/project-config/+/837595 Gerrit policy for Review-prio contributors flag. Naming bikeshed in there. | 16:21 |
bauzas | #link https://docs.openstack.org/nova/latest/contributor/process.html#what-the-review-priority-label-in-gerrit-are-use-for Documentation we already have | 16:21 |
bauzas | I provided a comment for https://review.opendev.org/c/openstack/project-config/+/837595 | 16:21 |
bauzas | please review it | 16:21 |
gibi | done :) | 16:21 |
bauzas | thanks | 16:22 |
bauzas | at least I'm French so in general I'm not good at naming things | 16:22 |
bauzas | but at least I try to find a consensus | 16:23 |
gibi | thank you for that | 16:23 |
bauzas | I think all contributors know what nova-core means | 16:23 |
bauzas | hopefully | 16:23 |
gibi | that is a fair assumption | 16:24 |
bauzas | for other repos, we could name the label differently of course, like 'osvif-core' if this is named by gerrit | 16:24 |
bauzas | ie. nova-specs-core review promise | 16:25 |
bauzas | os-vif-core etc. | 16:25 |
bauzas | but this is a naming bikeshed | 16:25 |
bauzas | anyway, moving on | 16:26 |
bauzas | #topic Stable Branches | 16:26 |
bauzas | in general I ask elodilles | 16:26 |
bauzas | but this time, let me do it | 16:26 |
bauzas | #info ussuri and older branches are still blocked, newer branches should be OK | 16:26 |
bauzas | melwitt had a point | 16:27 |
elodilles | just an update for that ^^^ i think ussuri is blocked but the older branches are not blocked anymore | 16:27 |
bauzas | #link https://etherpad.opendev.org/p/nova-stable-branch-ci stable branches CI issues tracking, feel free to update with stable branch CI issues | 16:27 |
bauzas | elodilles: woah | 16:27 |
bauzas | kudos to the team then | 16:28 |
elodilles | bauzas: l-c branches were merged | 16:28 |
elodilles | bauzas: i don't say they don't have intermittent failures though o:) | 16:28 |
bauzas | elodilles: I thought most of the issues were related to volume detach things, which are unrelated to l-c | 16:28 |
elodilles | but at least they are not blocked | 16:28 |
bauzas | ah | 16:28 |
bauzas | elodilles: but then, why ussuri is blocked while older not ? | 16:29 |
elodilles | ussuri and train were where tempest were not pinned, | 16:29 |
elodilles | and where tempest is running with py36 | 16:29 |
elodilles | if i'm not mistaken that's it | 16:30 |
elodilles | and gmann's train fix has landed | 16:30 |
bauzas | ok thanks | 16:31 |
elodilles | originally we thought that ussuri does not need a fix as it has zuulv3 jobs already, but that's not true unfortunately | 16:31 |
bauzas | gmann told me he couldn't attend this meeting, so let's discuss this again next week | 16:31 |
elodilles | i mean, it has zuulv3 jobs, but still we are facing with the same issue | 16:31 |
elodilles | bauzas: ++ | 16:31 |
gibi | so I think the next step is still to gather the intermittent failures and try to fix them | 16:31 |
bauzas | gibi: yeah, we'll track those on a weekly basis thanks to the etherpad | 16:32 |
gibi | ack | 16:32 |
elodilles | thanks melwitt for starting the etherpad \o/ | 16:32 |
bauzas | yup, melwitt++ | 16:32 |
bauzas | anything to discuss about those intermittent issues btw. ? | 16:34 |
elodilles | i guess we still need to collect them to have the full picture | 16:35 |
bauzas | yup | 16:35 |
gibi | yepp | 16:36 |
elodilles | maybe one note: for placement we don't have periodic-stable on wallaby and older | 16:37 |
bauzas | :/ | 16:37 |
gibi | elodilles: do you suspect some instability in placement? | 16:37 |
elodilles | gibi: nope, but the gate is broken in wallaby and older in placement | 16:38 |
gibi | or is this just proactively running some jobs | 16:38 |
gibi | broken?! | 16:38 |
elodilles | gibi: see melwitt's etherpad | 16:38 |
gibi | that is bad :/ | 16:38 |
elodilles | though probably they are some known issues to fix | 16:38 |
gibi | I agree to add some periodic there then | 16:39 |
elodilles | gibi: ack, i can backport the patch that added the periodic | 16:39 |
elodilles | * periodic-stable | 16:39 |
bauzas | gibi: agreed too | 16:40 |
bauzas | moving on ? | 16:41 |
elodilles | bauzas: ++ | 16:41 |
bauzas | #topic Open discussion | 16:41 |
bauzas | (whoami-rajat) Discuss regarding the design of rebuild volume backed instance feature | 16:41 |
bauzas | whoami-rajat: your turn | 16:41 |
whoami-rajat | Hi | 16:41 |
whoami-rajat | thanks | 16:41 |
whoami-rajat | #link https://review.opendev.org/c/openstack/nova-specs/+/840155 | 16:41 |
whoami-rajat | So I started working on this feature in yoga (this was proposed/reproposed several times before) and the spec got approved | 16:42 |
whoami-rajat | now while reproposing it, sean-k-mooney has some concerns regarding the new parameter we are introducing ``reimage_boot_volume`` | 16:42 |
whoami-rajat | it's a request parameter to tell the API, we are performing rebuild on a volume backed instance and not an ephemeral disk | 16:43 |
sean-k-mooney | yep | 16:43 |
whoami-rajat | initially the idea was not to have feature parity between both workflows but later there were many concerns with this operation being destructive | 16:43 |
whoami-rajat | even if you follow past specs, the concern has been discussed | 16:43 |
whoami-rajat | so lyarwood suggested to add this parameter ``reimage_boot_volume`` so any user who would like to opt in for this (as it has data loss risk) would only be able to do it | 16:44 |
sean-k-mooney | i really think that having feature parity between bfv=True|false is important | 16:44 |
sean-k-mooney | i dont think the data loss argument holds | 16:44 |
sean-k-mooney | my reason is that this is a deliberate instance action to rebuild the root disk | 16:44 |
gibi | rebuild is destructive for image-based instances too | 16:45 |
sean-k-mooney | yep | 16:45 |
sean-k-mooney | and rebuild is not the same as evacuate | 16:45 |
whoami-rajat | yes but the destructive operation is performed by cinder in this case where the volume resides on the cinder side | 16:45 |
sean-k-mooney | for evacuate we should preserve the data | 16:45 |
bauzas | that's the whole purpose of this spec | 16:45 |
sean-k-mooney | for rebuild via the api we should reimage the root volume | 16:45 |
bauzas | rebuild on BFV wasn't destructive, right? | 16:46 |
sean-k-mooney | rebuild was rejected | 16:46 |
sean-k-mooney | for bfv | 16:46 |
whoami-rajat | we didn't support rebuild on BFV | 16:46 |
sean-k-mooney | so the whole point is to allow rebuild with bfv | 16:46 |
bauzas | if so, there is a clear implication of what rebuild means for the root disk | 16:46 |
bauzas | we blocked because we were unable to rebuild the root disk if bfv | 16:47 |
sean-k-mooney | and technically extra ephemeral disks | 16:47 |
sean-k-mooney | bauzas: correct | 16:47 |
bauzas | then, I don't see a need for differentiating BFV and non-BFV from an API pov | 16:48 |
bauzas | both will be destructive for the root disk | 16:48 |
sean-k-mooney | if so we also do not need an api microversion correct | 16:48 |
sean-k-mooney | and no api change at all | 16:48 |
sean-k-mooney | we just remove the block | 16:48 |
whoami-rajat | the destructive nature of this operation was the concern from many folks, I can't name everyone but this was approved in yoga so you can see | 16:48 |
bauzas | good question | 16:48 |
sean-k-mooney | when cinder is new enough | 16:48 |
whoami-rajat | dansmith, has been actively reviewing the changes I proposed last cycle so maybe he can weigh in | 16:49 |
bauzas | whoami-rajat: frankly, if we were about adding some parameter, it would be more for *not* recreating the volume | 16:49 |
dansmith | bauzas: the point of the spec/effort is to rebuild the root volume | 16:50 |
dansmith | i.e. to reimage it, but let cinder do the reimaging | 16:50 |
bauzas | dansmith: that's what I understand | 16:50 |
bauzas | so... | 16:50 |
bauzas | tbc, I don't see a need for an API param that'd say "yes, I want to rebuild by reimaging" | 16:51 |
bauzas | which would imply that the default would be "rebuild by not reimaging" | 16:51 |
sean-k-mooney | bauzas: no default would reject | 16:52 |
sean-k-mooney | bauzas: that was the behavior that i think lee suggested but i dont think i reviewd the previous iteration | 16:52 |
dansmith | I think user-initiated rebuild where we don't reimage root is pointless right? | 16:52 |
dansmith | as long as we don't rebuild on evacuate then we're good, | 16:53 |
sean-k-mooney | correct | 16:53 |
sean-k-mooney | ya | 16:53 |
bauzas | I agree | 16:53 |
dansmith | but this is specifically to make BFV behave like regular instances | 16:53 |
sean-k-mooney | right so evacuate should continue to preserve the root disk if it's on shared storage | 16:53 |
bauzas | correct me if I'm wrong, but I feel we are on the same page | 16:53 |
bauzas | evacuate should differ | 16:53 |
sean-k-mooney | and rebuild will always reimage it provided cinder is new enough | 16:54 |
whoami-rajat | Since the main destruction is performed on the cinder side, I know a lot of folks on cinder side that won't agree to the idea of not adding this additional precautionary measure to avoid it | 16:54 |
bauzas | but rebuild should behave like regular instance, ie. reimage | 16:54 |
dansmith | bauzas: okay I guess I thought you were arguing for a special param | 16:54 |
whoami-rajat | as where the initial concern started ^ | 16:54 |
bauzas | dansmith: I was absolutely on the other direction, see above :) | 16:54 |
sean-k-mooney | i really don't like the idea of making bfv special in the nova api | 16:54 |
bauzas | me too | 16:54 |
dansmith | bauzas: ack, sorry, I'm double-meeting-ing | 16:54 |
bauzas | from an API point of view, this is clear | 16:55 |
sean-k-mooney | whoami-rajat: if we want to prevent this from the cinder side | 16:55 |
sean-k-mooney | i think cinder needs a way to block the reimage, not nova | 16:55 |
bauzas | of course, since we share the same internal methods for evacuate and rebuild, we should make them differ based on some conditional | 16:55 |
sean-k-mooney | like locking the volume or similar | 16:55 |
bauzas | but this conditional doesn't have to be exposed at the API level | 16:55 |
sean-k-mooney | bauzas: i think we pass a flag to rebuild to signal if it's an evacuate right | 16:55 |
dansmith | bauzas: we already have a flag to pass, | 16:55 |
dansmith | bauzas: because we have to honor the old microversion, | 16:56 |
dansmith | so we can just make sure it's ==false for the evac case | 16:56 |
bauzas | dansmith: yeah, I know, that's the conditional I thought | 16:56 |
dansmith | conditional at the rpc layer, but the only conditional in the api is "old or new microversion" | 16:56 |
dansmith | the only conditional *should* be version, I mean | 16:56 |
bauzas | dansmith: correct, that being said, there was an open question | 16:56 |
sean-k-mooney | dansmith: well do we need a microversion | 16:56 |
bauzas | about even whether we would need a microversion | 16:56 |
sean-k-mooney | there is no api request change | 16:57 |
bauzas | if we just unblock | 16:57 |
dansmith | I think we absolutely do, | 16:57 |
sean-k-mooney | i really think we should not | 16:57 |
dansmith | because right now rebuild does not destroy data and after this, it would | 16:57 |
sean-k-mooney | right now it rejects the request | 16:57 |
whoami-rajat | sean-k-mooney, if the operation is initiated from nova side, I'm not sure how from cinder side we can provide a user input to block this | 16:57 |
dansmith | sean-k-mooney: only if the image is different | 16:57 |
sean-k-mooney | dansmith: no it's always rejected i thought | 16:58 |
dansmith | sean-k-mooney: if the image is the same, it allows it | 16:58 |
sean-k-mooney | ... | 16:58 |
dansmith | whoami-rajat: right? | 16:58 |
whoami-rajat | but maybe I'm the only one defending the proposal | 16:58 |
whoami-rajat | dansmith, yes, for same image it does allow the rebuild | 16:58 |
dansmith | sean-k-mooney: ^ | 16:58 |
bauzas | hah | 16:59 |
sean-k-mooney | that seems like a bug | 16:59 |
sean-k-mooney | since that also destroys data | 16:59 |
bauzas | (18:46:01) bauzas: rebuild on BFV wasn't destructive, right? | 16:59 |
sean-k-mooney | there is no difference from a data perspective if you use the same image or a different one | 16:59 |
bauzas | damn, we're about the end of time | 16:59 |
dansmith | sean-k-mooney: it doesn't on BFV but does on regular instances | 16:59 |
dansmith | sean-k-mooney: on BFV if the image is the same, it will just rebuild the ports or whatever, but no change to the disk | 17:00 |
dansmith | but will destroy the disk with the same image on a regular instance | 17:00 |
bauzas | I'll close this meeting, but I beg the people here to continue discussing this topic after | 17:00 |
sean-k-mooney | that's the same as a hard reboot | 17:00 |
dansmith | sean-k-mooney: alas, it's api behavior we have had for YEARS | 17:00 |
sean-k-mooney | rebuild is not a move op | 17:00 |
dansmith | so changing it to now destroy data is a Bad Plan (tm) | 17:00 |
sean-k-mooney | and it should not really update the port either | 17:00 |
dansmith | well understood :) | 17:00 |
bauzas | thanks all, and for people interested in this bfv rebuild discussion, please stay around | 17:00 |
bauzas | #endmeeting | 17:01 |
opendevmeet | Meeting ended Tue May 24 17:01:04 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:01 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2022/nova.2022-05-24-16.00.html | 17:01 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-05-24-16.00.txt | 17:01 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2022/nova.2022-05-24-16.00.log.html | 17:01 |
bauzas | ok, so, lemme clarify | 17:01 |
sean-k-mooney | dansmith: it feels pretty bad to have to opt in to have it do what we document | 17:01 |
dansmith | sean-k-mooney: https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3617 | 17:01 |
bauzas | 1/ rebuild wasn't destructive on bfv if you pass the same image | 17:01 |
sean-k-mooney | the current docs https://docs.openstack.org/api-ref/compute/?expanded=rebuild-server-rebuild-action-detail#rebuild-server-rebuild-action= link to https://bugs.launchpad.net/nova/+bug/1482040 | 17:01 |
dansmith | sean-k-mooney: if it wasn't a question of DESTROYING data I would maybe agree with you | 17:02 |
bauzas | 2/ we agree on not providing a specific param for rebuild on bfv | 17:02 |
dansmith | however, for years and years you could call this api and not destroy your very precious pet root volume | 17:02 |
bauzas | 3/ given 1/ and 2/, we still need a microversion to signal the behavioural change | 17:02 |
dansmith | and just silently changing that is just asking for a very angry customer | 17:02 |
sean-k-mooney | dansmith: so what is it doing in this case | 17:02 |
bauzas | sean-k-mooney: point is, like it or not, we can't change the behaviour without signaling it | 17:03 |
sean-k-mooney | updating the image metadata | 17:03 |
dansmith | sean-k-mooney: now or after this spec merges? | 17:03 |
sean-k-mooney | if so that can break things and should be blocked | 17:03 |
sean-k-mooney | now | 17:03 |
dansmith | sean-k-mooney: now it does all the rebuild machinery it just doesn't change your root disk at all | 17:03 |
dansmith | i.e. evacuate but without the move | 17:03 |
sean-k-mooney | well rebuild just does two things | 17:04 |
dansmith | I'm not so sure it's identical to a hard reboot, but maybe | 17:04 |
dansmith | it doesn't really matter though | 17:04 |
sean-k-mooney | erases ephemeral storage unless it's ironic and you use an api extension | 17:04 |
sean-k-mooney | reimages the root disk and hard reboots | 17:04 |
sean-k-mooney | dansmith: im really debating if we should be blocking rebuild with the same image as a bug in older releases | 17:05 |
dansmith | so I think rebuild will refresh your stored image_meta if the meta has changed on your same image right? | 17:05 |
sean-k-mooney | it seems dangerous to me to allow | 17:05 |
dansmith | so you could use that to get new numa settings or something | 17:05 |
sean-k-mooney | yep | 17:05 |
opendevreview | Elod Illes proposed openstack/placement stable/wallaby: Add periodic-stable-jobs template https://review.opendev.org/c/openstack/placement/+/843174 | 17:05 |
dansmith | not the intent, but could definitely be people using it that way | 17:05 |
sean-k-mooney | i only block that if the image changes | 17:05 |
dansmith | not that we need to allow that in the future, BUT it means they could be using it now not expecting destruction of data | 17:05 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3647-L3648= | 17:06 |
sean-k-mooney | dansmith: if the image changes we validate the host in the scheduler | 17:06 |
sean-k-mooney | dansmith: i believe we have an optimisation where we do not do that if it's the same image | 17:06 |
sean-k-mooney | dansmith: or at least we used to but maybe that was removed | 17:06 |
dansmith | sean-k-mooney: but we update the image_meta stored with the instance and rebuild the pci device stuff | 17:07 |
whoami-rajat | with the new proposal, if opted in, we will be performing the reimage whether it is the same image or different | 17:07 |
whoami-rajat | if not opted in, we can still perform rebuild for same image but 400 for different image | 17:08 |
sean-k-mooney | no | 17:08 |
sean-k-mooney | i really dont think that is safe. i need to read the code | 17:08 |
dansmith | sean-k-mooney: no what? | 17:08 |
sean-k-mooney | but i dont think rebuild to same image in the current case is safe in all cases | 17:08 |
whoami-rajat | we are keeping backward compatibility with the new microversion ? | 17:09 |
sean-k-mooney | im questioning if the current behavior in the old microversion is a bug | 17:09 |
sean-k-mooney | whoami-rajat: i thought we blocked it always | 17:09 |
sean-k-mooney | that is not the case and now im trying to assess if it is currently safe as is | 17:10 |
dansmith | oh yeah rebuild also lets you add/change metadata, server name, keys, user data, hostname, certs, etc | 17:10 |
dansmith | so people could totes be using that on pets right now and expecting no data loss | 17:10 |
sean-k-mooney | yes it does now | 17:10 |
whoami-rajat | ok, I'm not too sure about it, when i started working on it I thought it wasn't supported at all but now I'm trying to impose the new behavior without differentiating with same or different image | 17:10 |
whoami-rajat | not sure if the old behavior makes sense from a nova perspective | 17:11 |
sean-k-mooney | whoami-rajat: i thought it was not supported at all too | 17:11 |
sean-k-mooney | so the resolution for https://bugs.launchpad.net/nova/+bug/1482040 was just to note that the image is not replaced | 17:12 |
sean-k-mooney | by linking to the bug | 17:12 |
sean-k-mooney | whereas the correct fix likely should have been to block the operation or implement your spec. | 17:13 |
dansmith | https://github.com/openstack/tempest/blob/44dac69eb77d78a0de8e68e63617099249345578/tempest/api/compute/servers/test_server_actions.py#L292-L329 | 17:13 |
sean-k-mooney | dansmith: so ya we likely need to actually have a microversion... | 17:13 |
dansmith | the comment there implies that the test is doing a rebuild of a volume-backed instance, but it's not, just volume-attached | 17:13 |
dansmith | but it does say "is common" FWIW :) | 17:13 |
dansmith | sean-k-mooney: yup | 17:13 |
sean-k-mooney | i really hate that but we need to fix the api ref to document this properly | 17:14 |
sean-k-mooney | i guess i can live with new microversion always reimages and old preserves current behavior | 17:15 |
bauzas | ok, looks like we then have a consensus | 17:15 |
dansmith | "There is a known limitation where the root disk is not replaced for volume-backed instances during a rebuild." | 17:15 |
dansmith | ^ in the api-ref | 17:15 |
sean-k-mooney | dansmith: does that work for you. | 17:15 |
bauzas | dansmith: heh, I'm glad I remembered this correctly | 17:15 |
sean-k-mooney | dansmith: ya but i was expecting an error in that case | 17:15 |
bauzas | sean-k-mooney: are you okay with the direction ? | 17:15 |
bauzas | personally, I'm all good | 17:15 |
gibi | I'm ok with a new microversion | 17:15 |
dansmith | sean-k-mooney: yes, but I think that it would probably be prudent for the _client_ to have some flag if you provide the same image to make sure you really mean it | 17:15 |
dansmith | because most people are just going to be firing the client the same way and not realize the change | 17:16 |
sean-k-mooney | i dont really like that | 17:16 |
bauzas | dansmith: that's the whole purpose of microversions, no ? | 17:16 |
sean-k-mooney | as every customer will then have to use a new microversion and flag | 17:16 |
sean-k-mooney | and heat/ansible etc. | 17:17 |
sean-k-mooney | will then always have to know if it's a bfv instance or not | 17:17 |
dansmith | bauzas: it is, but I think most people "opt into" microversions because they try to do something, google for why it's not working and someone says "oh pass this cryptic version flag" | 17:17 |
sean-k-mooney | BFV is not a flag shown on instance show | 17:17 |
whoami-rajat | if you request the rebuild and it gets rejected, the message will clearly mention that you need to pass ``reimage_boot_volume`` with the request | 17:17 |
dansmith | but fine, if it's not palatable, then whatever, it's a risk | 17:17 |
sean-k-mooney | so you need to do some deep introspection of the instance to figure out if the root volume is on cinder or not | 17:18 |
dansmith | whoami-rajat: they're asking for *not* that | 17:18 |
sean-k-mooney | and we don't provide the bdm either | 17:18 |
whoami-rajat | ok | 17:18 |
dansmith | sean-k-mooney: the client could always require that flag if the image is not changing, even for non-BFV instances, but I understand the concern | 17:18 |
* gibi needs to drop now but OK with the direction of the discussion | 17:19 | |
dansmith | just saying, someone's going to lose their data.. granted because they weren't paying close enough attention, but .. it's a pet root volume so I just think it's worth being careful here | 17:19 |
dansmith | but the microversion is the important part to me for sure | 17:19 |
sean-k-mooney | so new microversion always requires you to pass yes-i-really-really-mean-it | 17:19 |
dansmith | sean-k-mooney: that's what whoami-rajat has, at the API level (IIRC), but I was suggesting make that just a client shell behavior | 17:20 |
sean-k-mooney | dansmith: i think it was only for bfv instances in the spec currently | 17:20 |
dansmith | like, the client will refuse to use microversion 2.x *and* the same image_ref, unless you provide -yes-really | 17:20 |
dansmith | sean-k-mooney: yes, in the spec currently | 17:20 |
sean-k-mooney | if it was for all instances i think it would be more palatable | 17:20 |
sean-k-mooney | becasue as an end user i dont need to care if its bfv or not | 17:21 |
dansmith | right, I was suggesting making it consistent for all to avoid having to know if it's bfv or not | 17:21 |
sean-k-mooney | ya that im ok with even if it will break people that blindly use latest | 17:21 |
sean-k-mooney | it will break them by being safer | 17:21 |
dansmith | right | 17:21 |
sean-k-mooney | but that will mean rebuild with nova client wont work anymore | 17:22 |
sean-k-mooney | since we are no longer updating the cli | 17:22 |
dansmith | sean-k-mooney: again I'm saying make it a client behavior, not an API one | 17:22 |
sean-k-mooney | so what would the api behavior be | 17:22 |
sean-k-mooney | new microversion just reimage | 17:22 |
sean-k-mooney | but guard in osc | 17:23 |
dansmith | the api behavior would be "if new, always destroy" | 17:23 |
dansmith | right | 17:23 |
sean-k-mooney | ok i like that more | 17:23 |
dansmith | osc logic is: "if new_version and image_ref==server.image_ref and args.yes_really: then do_it" | 17:23 |
dansmith | maybe we don't already have the server there in osc I guess, but we do validations like that in other places right? | 17:23 |
dansmith | sorry, that's not the right logic, let me try again: | 17:24 |
dansmith | if image_ref==server.image_ref and new_version: if not args.yes_really: explode with warning | 17:24 |
dansmith | only require the flag if new version and the image is not changing | 17:25 |
sean-k-mooney | or just check for new version | 17:25 |
whoami-rajat | but if someone directly curls the API (not from client), don't we require the validation of additional parameter? | 17:25 |
dansmith | could do that, but then everyone always has to do that, and the image not changing is so niche | 17:25 |
dansmith | whoami-rajat: right | 17:25 |
sean-k-mooney | whoami-rajat: i really dont think that heat should have to do different things for bfv or not | 17:25 |
dansmith | sean-k-mooney: and also, heat had better know the impact of the new microversion if they opt into it | 17:26 |
sean-k-mooney | ya | 17:26 |
sean-k-mooney | this comes back to not blindly using latest | 17:26 |
dansmith | heh | 17:26 |
dansmith | people already blindly paste the shell code from the first answer on stackexchange into a root terminal, | 17:27 |
sean-k-mooney | so from a tempest point of view we would want tests for the new and old microversion too right | 17:27 |
dansmith | we're pretty well sunk on making them carefully consider microversions :) | 17:27 |
dansmith | yes | 17:27 |
sean-k-mooney | um, so do you want to summarise what you propose we do | 17:28 |
dansmith | I don't "want" to, but I will | 17:29 |
sean-k-mooney | new microversion -> always reimage, old -> preserve data for bfv, reimage for non-bfv, evac -> always preserve data if on shared storage regardless of microversion (no change) | 17:29 |
whoami-rajat | since it was mentioned, tempest test for new behavior https://review.opendev.org/c/openstack/tempest/+/831018 | 17:30 |
whoami-rajat | still in progress though | 17:30 |
bauzas | folks, btw. I forgot to mention that I'll be off from tonight until Monday | 17:32 |
bauzas | thanks | 17:32 |
dansmith | sean-k-mooney: https://review.opendev.org/c/openstack/nova-specs/+/840155 | 17:34 |
sean-k-mooney | dansmith: ack just reading it that sounds good to me | 17:36 |
dansmith | cool | 17:36 |
sean-k-mooney | whoami-rajat: reading the tempest test, it's doing something i think should not be in the test | 17:36 |
sean-k-mooney | and it's not validating everything i think it should be validating | 17:36 |
dansmith | sean-k-mooney: which is what? | 17:36 |
dansmith | sean-k-mooney: I added the bit to create a file and assert that it's gone after the rebuild, because initially we were supposed to be rebuilding and we weren't -- the file was still present after the rebuild | 17:37 |
dansmith | not sure if that is resolved now | 17:37 |
sean-k-mooney | dansmith: the old microversion behavior | 17:37 |
dansmith | ah for sure | 17:37 |
sean-k-mooney | also https://review.opendev.org/c/openstack/tempest/+/831018/12/tempest/api/compute/servers/test_server_actions.py#917= | 17:37 |
whoami-rajat | sean-k-mooney, ack, happy to have feedback on the test | 17:37 |
sean-k-mooney | i dont think adding a cleanup that rebuilds to the old image is a good idea | 17:38 |
sean-k-mooney | it just adds another failure mode in the test cleanup | 17:38 |
dansmith | yeah, not sure what that's about | 17:38 |
dansmith | comment says "not needed" | 17:38 |
whoami-rajat | dansmith, I did some changes in the nova code so the errors from logs are gone but it still somehow is not able to do it, I tested manually and the file never exists after the rebuild but somehow in this test, it stays there | 17:38 |
whoami-rajat | I'm working on a new job with two different images and will take input from there to fix it | 17:39 |
dansmith | whoami-rajat: ack, well, glad to have that assertion in there then :) | 17:39 |
dansmith | it was not rebuilding when I tried locally - the file was still present | 17:39 |
whoami-rajat | sean-k-mooney, ack, yeah that i added from the original rebuild test, can remove that part | 17:40 |
whoami-rajat | i think it was for an instance that is shared so reverted back to original image | 17:40 |
whoami-rajat | dansmith, if you try with latest code, it should work, at least it works for me and i tried 3-4 times | 17:40 |
dansmith | whoami-rajat: but not in the gate right? | 17:40 |
whoami-rajat | from nova side (in-use) and also from cinder side (available volume) | 17:40 |
whoami-rajat | yep, not in gate | 17:41 |
dansmith | okay | 17:41 |
whoami-rajat | thanks sean-k-mooney and dansmith for your feedback, it's a pity that this parameter travels down from api->conductor->compute layer and would require plenty of rework in a cycle where I've less bandwidth | 17:42 |
whoami-rajat | but i agree with the concerns and issues, so i will try to get it done | 17:42 |
dansmith | whoami-rajat: we still have to have the parameter on the rpc side | 17:42 |
dansmith | so that's not a waste :) | 17:42 |
whoami-rajat | dansmith, i don't understand | 17:43 |
whoami-rajat | i thought it's not passed at all to the API? | 17:43 |
dansmith | whoami-rajat: only the api knows whether the client requested the old or new behavior, so you still have to communicate that down to the compute worker | 17:43 |
whoami-rajat | dansmith, do you mean if microversion >=2.91 then use the ``reimage_boot_volume`` parameter for telling it to conductor and compute ? | 17:45 |
dansmith | yes | 17:45 |
whoami-rajat | oh, that reduces huge amount of work then | 17:45 |
whoami-rajat | thanks for that | 17:46 |
sean-k-mooney | with the new microversion the parameter at the rpc level will always be true | 17:48 |
sean-k-mooney | with the old one it will be false | 17:48 |
sean-k-mooney | so the conductor/compute change will still be used | 17:48 |
sean-k-mooney | whoami-rajat: but the way that parameter is set is not based on a new api parameter | 17:48 |
sean-k-mooney | whoami-rajat: it will be based on the microversion used | 17:48 |
dansmith | right | 17:48 |
whoami-rajat | sean-k-mooney[m], yes | 17:48 |
sean-k-mooney | oh i had an irc disconnect | 17:49 |
sean-k-mooney | looking at my matrix client i see i did not receive a bunch of messages | 17:50 |
sean-k-mooney | well 2 or 3 messages i guess | 17:51 |
sean-k-mooney | anyway whoami-rajat are you ok to update the spec and i can re review | 17:51 |
whoami-rajat | <sean-k-mooney> whoami-rajat: but the way that paramter is set is not based on a new api parmater | 17:52 |
sean-k-mooney | im going to call it a day however so ill review tomorrow | 17:52 |
whoami-rajat | <sean-k-mooney> whoami-rajat: it will be based on the microversion used | 17:52 |
whoami-rajat | * sean-k-mooney has quit (Remote host closed the connection) | 17:52 |
whoami-rajat | <dansmith> right | 17:52 |
whoami-rajat | <whoami-rajat> sean-k-mooney[m], yes | 17:52 |
whoami-rajat | sean-k-mooney, just for reference ^ | 17:52 |
whoami-rajat | sean-k-mooney, sure, will do that | 17:52 |
dansmith | thanks whoami-rajat ! | 17:52 |
whoami-rajat | thanks dansmith and other nova folks for this discussion, I will sign out now since it's quite late my time. have a good day :) | 17:53 |
opendevreview | Merged openstack/placement stable/wallaby: Use 'functional-without-sample-db-tests' tox env for placement nova job https://review.opendev.org/c/openstack/placement/+/840718 | 18:37 |
opendevreview | Merged openstack/nova stable/victoria: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/840765 | 19:33 |
opendevreview | Ghanshyam proposed openstack/nova stable/victoria: DNM: Testing https://review.opendev.org/c/openstack/tempest/+/843182 https://review.opendev.org/c/openstack/nova/+/843188 | 19:43 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 20:10 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/wallaby: DNM: Testing live migration with local attach https://review.opendev.org/c/openstack/nova/+/843146 | 21:32 |
opendevreview | Ghanshyam proposed openstack/nova stable/ussuri: DNM: Testing stable/ussuri with tempest fix for constraints mismatch https://review.opendev.org/c/openstack/nova/+/843046 | 23:21 |
opendevreview | Miguel Lavalle proposed openstack/os-vif master: Delete trunk bridges to avoid race with Neutron https://review.opendev.org/c/openstack/os-vif/+/841499 | 23:37 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
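
The outcome of the rebuild discussion in the log (roughly 17:23 to 17:48) can be sketched in a few lines of Python. This is only an illustration of the reasoning, not nova or python-openstackclient code: the API derives an internal `reimage_boot_volume` flag from the request microversion (always false for evacuate), and the client refuses a same-image rebuild at the new microversion unless the user confirms. The version number 2.91 is taken from whoami-rajat's question in the log and may not be the final value; `parse_microversion`, `rpc_reimage_flag`, `client_guard` and `ConfirmationRequired` are hypothetical names, and `yes_really` stands in for the `args.yes_really` placeholder dansmith used.

```python
from typing import Tuple

# Assumption: 2.91 is the microversion floated in the log; the real number
# is whatever the spec ends up allocating.
NEW_REBUILD_MICROVERSION = (2, 91)


def parse_microversion(version: str) -> Tuple[int, int]:
    """Turn '2.91' into (2, 91) so versions compare numerically, not as strings."""
    major, minor = version.split(".")
    return int(major), int(minor)


def rpc_reimage_flag(microversion: str, is_evacuate: bool = False) -> bool:
    """Value the API layer would pass down to conductor/compute (hypothetical).

    Evacuate never reimages; otherwise the flag depends only on the
    microversion, not on a user-supplied request parameter.
    """
    if is_evacuate:
        return False
    return parse_microversion(microversion) >= NEW_REBUILD_MICROVERSION


class ConfirmationRequired(Exception):
    """Raised by the client-side guard (hypothetical exception name)."""


def client_guard(microversion: str, image_ref: str, server_image_ref: str,
                 yes_really: bool = False) -> None:
    """Client-side check: new microversion + unchanged image needs confirmation."""
    new_version = parse_microversion(microversion) >= NEW_REBUILD_MICROVERSION
    if new_version and image_ref == server_image_ref and not yes_really:
        raise ConfirmationRequired(
            "rebuild with the same image will reimage the root disk at this "
            "microversion; pass the confirmation flag to proceed"
        )
```

Comparing parsed tuples rather than raw strings matters here because a string comparison would order "2.100" before "2.91". The guard applies to every instance, volume-backed or not, which matches the point made in the log that callers should not need to know whether a server is BFV.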