whoami-rajat | sean-k-mooney, it is already being tested in tempest-full but the problem is the image is the same before/after rebuild so it's not testing the actual feature, i.e. reimaging to a new image, hence the new job; not sure if there is a better way to do it | 05:10 |
---|---|---|
opendevreview | Merged openstack/nova master: Add support for volume backed server rebuild https://review.opendev.org/c/openstack/nova/+/820368 | 07:44 |
gibi | good morning | 07:47 |
opendevreview | Merged openstack/nova master: Add conductor RPC interface for rebuild https://review.opendev.org/c/openstack/nova/+/831219 | 07:51 |
opendevreview | Merged openstack/nova master: Add API support for rebuilding BFV instances https://review.opendev.org/c/openstack/nova/+/830883 | 07:53 |
bauzas | good morning | 07:58 |
Uggla | gibi, bauzas good morning | 08:16 |
* bauzas needs a serious coffee | 08:17 | |
bauzas | Uggla: how was this morning ? :) | 08:17 |
bauzas | was Rose happy to see her friends ? | 08:17 |
Uggla | Yes, she's ok and happy because she has the teacher she wanted. | 08:18 |
gibi | :) | 08:19 |
Uggla | One little girl in her group was crying, and Rose gave her hand to go to the classroom. | 08:21 |
gibi | so nice | 08:21 |
bauzas | nice and bravo Rose | 08:24 |
bauzas | my oldest is still on vacations, she'll start tomorrow | 08:25 |
Uggla | in short no pb so far. I'll see this afternoon. | 08:25 |
* bauzas has to remember that the school times have changed this year | 08:25 | |
bauzas | if you see me disappearing quickly, that's because I forgot about the time :p | 08:26 |
Uggla | 1st day on a Friday ? This is a warmup day for teachers ? :) | 08:32 |
bauzas | Uggla: no, for the "6e" children (10-11yo) they started today | 08:33 |
bauzas | for the older (12-14yo), they start tomorrow | 08:33 |
*** akekane_ is now known as abhishekk | 08:33 | |
Uggla | yep strange to start on a Friday. (for your older one) | 08:33 |
bauzas | for the "6eme" folks, they only work today and not tomorrow | 08:35 |
bauzas | ("6eme" is first year of middle school) | 08:36 |
Uggla | ok so only 1 day the first week. Soft start. | 08:46 |
sean-k-mooney | its pretty common here | 09:19 |
sean-k-mooney | they give the students who are new to the school 1 day before the rest come back | 09:19 |
sean-k-mooney | so they can be shown around without all the other people | 09:20 |
sean-k-mooney | so normally they start on a Wednesday here: youngest class group, then first half, then the rest | 09:20 |
opendevreview | Konrad Gube proposed openstack/nova-specs master: Add API for assisted volume extend https://review.opendev.org/c/openstack/nova-specs/+/855490 | 09:21 |
sean-k-mooney | melwitt: ^ | 09:21 |
sean-k-mooney | whoami-rajat: tempest has an alternative image so we can have two tests: one that reimages to the same image and one that reimages to the alt image https://github.com/openstack/tempest/blob/master/tempest/config.py#L294-L297 | 09:36 |
sean-k-mooney | whoami-rajat: so you would boot with image_ref and rebuild to image_ref_alt | 09:36 |
sean-k-mooney | whoami-rajat: we really only care that the images are different in glance, they can both be cirros and it's fine | 09:37 |
sean-k-mooney | the mechanics are the same whether it's the same image or a different one with the new microversion | 09:38 |
sean-k-mooney | with the old microversion you can just use the alt image to assert it's not allowed | 09:38 |
sean-k-mooney | and the same image to assert it preserved data | 09:38 |
*** brinzhang_ is now known as brinzhang | 09:46 | |
whoami-rajat | sean-k-mooney, those parameters point to the same image in the default tempest job; see this tempest.conf from a tempest-full run, both image_ref and image_ref_alt point to the same image https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83f/831018/24/check/tempest-full-py3/83f6b4a/controller/logs/tempest_conf.txt | 09:56 |
whoami-rajat | sean-k-mooney, I'm configuring different images for those parameters in my job, see tempest.conf for my new job https://a59fb54e72bca9b30065-6505ce46ad1aa88851df8d527471f9dc.ssl.cf1.rackcdn.com/831018/24/check/tempest-rebuild-volume-backed/c4755d7/controller/logs/tempest_conf.txt | 09:56 |
whoami-rajat | the IMAGE_URLS parameter in devstack enables setting different images for those parameters | 09:57 |
whoami-rajat | by default only one image is created and set as both image_ref and image_ref_alt | 09:57 |
whoami-rajat | that's why i configured a new job with this parameter; i wasn't sure if modifying existing jobs for my test run is a valid thing, so i created a new job | 09:58 |
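The IMAGE_URLS / image_ref_alt setup being discussed can be sketched as a config fragment (a hypothetical illustration; the image URLs and placeholders below are not taken from the linked jobs):

```ini
# hypothetical devstack local.conf fragment: list two image URLs so
# devstack registers two distinct glance images for tempest to use
[[local|localrc]]
IMAGE_URLS="http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img,http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img"

# which should yield a tempest.conf where the two options differ:
# [compute]
# image_ref = <uuid of first image>
# image_ref_alt = <uuid of second image>
```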
gibi | stephenfin: hi! how do you feel, will you have time to review the PCI series up until https://review.opendev.org/c/openstack/nova/+/850468/20 before the FF deadline? | 10:00 |
sean-k-mooney | whoami-rajat: yes but it would be better to configure devstack to upload the cirros image twice with different names and then use that | 10:09 |
sean-k-mooney | so i think yes, modifying the existing jobs is totally valid | 10:09 |
sean-k-mooney | and would get more coverage of the feature. that can be done in the future | 10:10 |
sean-k-mooney | whoami-rajat: also congrats its merged in nova this morning | 10:10 |
whoami-rajat | hmm, ack, will try that | 10:10 |
sean-k-mooney | whoami-rajat: sorry it took so long | 10:10 |
whoami-rajat | sean-k-mooney, thanks and thank you to all the nova team bauzas gibi dansmith for helping with that! | 10:10 |
whoami-rajat | no problem :) | 10:11 |
sean-k-mooney | did you see my comments on the osc patch | 10:11 |
whoami-rajat | nope, looking | 10:11 |
sean-k-mooney | ack you duplicated a test but forgot to rename it in the copy paste | 10:11 |
sean-k-mooney | trivial fix | 10:11 |
gibi | whoami-rajat: you are welcome | 10:11 |
whoami-rajat | sean-k-mooney, oh i addressed that yesterday only, i thought there was a new comment | 10:12 |
sean-k-mooney | ack, no i have not reviewed since you updated it | 10:12 |
sean-k-mooney | ill go do that now | 10:12 |
whoami-rajat | thanks | 10:12 |
whoami-rajat | https://review.opendev.org/c/openstack/python-openstackclient/+/831014 -- for quick reference | 10:12 |
sean-k-mooney | already reviewed | 10:13 |
sean-k-mooney | it looks good to me. stephenfin ^ you're an osc core, care to take a look? | 10:14 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/854499 | 10:17 |
opendevreview | Amit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/791135 | 10:17 |
sean-k-mooney | ricolin: can you update https://review.opendev.org/c/openstack/nova/+/844507/18/nova/virt/libvirt/driver.py#12202 | 11:14 |
sean-k-mooney | if you can make that change im +2 on both patches | 11:15 |
sean-k-mooney | bauzas: ^ | 11:15 |
sean-k-mooney | ill see if i can quickly test your changes in devstack too | 11:15 |
sean-k-mooney | to confirm this works as intended | 11:15 |
bauzas | sean-k-mooney: I can review the traits change once ricolin updates it | 12:46 |
ricolin | Thanks sean-k-mooney: bauzas will do it ASAP | 12:46 |
bauzas | cool | 12:46 |
sean-k-mooney | in its current form it's not wrong, it's just overcomplicated | 12:46 |
sean-k-mooney | ricolin: i think if you address that, everything else looks ok | 12:47 |
sean-k-mooney | and it can land before FF | 12:47 |
ricolin | sean-k-mooney: sounds great | 12:47 |
sean-k-mooney | i have not had a chance to test it locally yet but i expect it to work | 12:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/854499 | 12:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/791135 | 12:47 |
sean-k-mooney | ricolin: i assume you have booted a vm with this code and validated the iommu is present | 12:48 |
*** dasm|off is now known as dasm | 12:54 | |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 12:55 |
ricolin | sean-k-mooney: done | 12:56 |
ricolin | and yes, I'm sure IOMMU is there. Just not sure I test Traits in the right way | 12:58 |
bauzas | ricolin: sean-k-mooney: reviewing | 13:01 |
ricolin | thanks bauzas | 13:02 |
sean-k-mooney | bauzas: gibi if ye want to proceed with ^ i'm more or less happy with it. the release note should be in the second patch | 13:23 |
sean-k-mooney | but as long as we land both of them its not an issue | 13:24 |
bauzas | sean-k-mooney: I just said +1 for an upgrade question | 13:24 |
sean-k-mooney | i replied | 13:24 |
sean-k-mooney | thats intentional | 13:24 |
sean-k-mooney | although dansmith might point out we could have also added a min compute service check for this instead of using the traits | 13:25 |
sean-k-mooney | we normally did not do that in the past however | 13:26 |
sean-k-mooney | so i think this is all good to go | 13:26 |
gibi | the trait based capability scheduling looks OK to me | 13:26 |
gibi | I think dansmith had issues using the capability trait outside of the scheduling | 13:26 |
dansmith | if we're already scheduling, then traits make plenty of sense | 13:26 |
dansmith | right | 13:26 |
sean-k-mooney | yep its in the existing pre filter | 13:27 |
gibi | bauzas: do you need my +2 or you will send it in? | 13:27 |
sean-k-mooney | before we started to do these checks with placement we would have landed on the host and failed there with hypervisor too old or similar | 13:27 |
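As a loose illustration of the pre-filter idea discussed above (this is not nova's actual code; the `hw:viommu_model` extra spec handling and `COMPUTE_VIOMMU_MODEL_*` trait names are assumptions for the sketch), a capability trait gets folded into the placement request so only upgraded computes are candidates:

```python
# Simplified sketch of trait-based capability scheduling (assumed
# names, not nova's real RequestSpec/os-traits code): if the flavor
# asks for a vIOMMU model, require the matching compute trait so
# placement only returns hosts whose virt driver reports it.

def viommu_prefilter(request_spec):
    """Mutate a dict-based request spec, adding a required trait."""
    extra_specs = request_spec.get("flavor_extra_specs", {})
    model = extra_specs.get("hw:viommu_model")
    if model:
        trait = "COMPUTE_VIOMMU_MODEL_%s" % model.upper()
        request_spec.setdefault("required_traits", set()).add(trait)
    return request_spec

spec = {"flavor_extra_specs": {"hw:viommu_model": "virtio"}}
viommu_prefilter(spec)
print(spec["required_traits"])  # {'COMPUTE_VIOMMU_MODEL_VIRTIO'}
```

Old computes simply never report the trait, so they are filtered out up front rather than failing late on the host, which matches the point about capacity being limited rather than the feature erroring.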
bauzas | sorry folks was in 1:1 meeting | 13:28 |
bauzas | sean-k-mooney: thanks, will reply then with +2 | 13:28 |
gibi | ack, then I'm not needed there :) | 13:28 |
bauzas | it was just an open thought | 13:28 |
bauzas | I just want to make sure that operators know they need to upgrade all their hosts if so | 13:28 |
bauzas | but that's understandable | 13:29 |
sean-k-mooney | well it's as you said | 13:29 |
sean-k-mooney | they don't actually have to, but it will be capacity limited | 13:29 |
sean-k-mooney | but that's kind of to be expected, that you can't use new features on old nodes | 13:30 |
bauzas | yup, agreed | 13:30 |
bauzas | at least with traits | 13:30 |
sean-k-mooney | well even without | 13:30 |
bauzas | without, we ask for compute service checks in general | 13:30 |
sean-k-mooney | yep | 13:31 |
sean-k-mooney | but that is why i said the traits patch should be first | 13:31 |
bauzas | well, if both merge for Zed, I'm cool | 13:31 |
sean-k-mooney | and the other one second, and why i said in the current order they need to merge together | 13:31 |
bauzas | yes and no | 13:31 |
sean-k-mooney | bauzas: let me rephrase: i don't want to merge only one of those two patches | 13:32 |
bauzas | you could merge the first, this is just you won't be able to get the feature until you upgrade all | 13:32 |
dansmith | gibi: sean-k-mooney: are we FFEing the user_data stuff such that I should try to bang out the rest of the RPC stuff ASAP? | 13:32 |
bauzas | won't be able to *be sure* to get the feature | 13:32 |
bauzas | dansmith: eeek, context ? | 13:32 |
bauzas | oh, the rpc call | 13:32 |
bauzas | lemme just send to the gate ricolin's work | 13:33 |
sean-k-mooney | bauzas: that and i guess i need to rev my follow-up patch with actual tests | 13:33 |
sean-k-mooney | dansmith: i'm not against doing that if you think you will have time | 13:33 |
sean-k-mooney | but that's probably the main RFE that i think FFE might make sense for | 13:34 |
dansmith | sean-k-mooney: I was expecting to see a rev of the patch to move the regen to after the instance is destroyed and fix the actual writing that was failing | 13:34 |
dansmith | so I hadn't rebased my RPC stuff yet | 13:34 |
sean-k-mooney | the only other one might be the PCI series but i have not looked at it for two days | 13:34 |
dansmith | but that has to happen first right? | 13:34 |
sean-k-mooney | oh i wrote a patch to fix config drive | 13:34 |
dansmith | oh is that ths? https://review.opendev.org/c/openstack/nova/+/855351/1 | 13:35 |
sean-k-mooney | yes | 13:35 |
gibi | sean-k-mooney: you are +2 up until the healing patches in the PCI and that is the realistic target there. If stephenfin will have no time to review them today then I will ask for an FFE for that. We let the scheduling part slip in any case | 13:35 |
bauzas | the mutable userdata has API impact | 13:35 |
dansmith | okay but that would need to be squashed or go in front right? | 13:35 |
bauzas | but I'm OK with FFE'ing if needed | 13:35 |
dansmith | bauzas: it would also be about half not-very-reviewed code at this point based on the look of the configdrive patch | 13:36 |
dansmith | so not really "just didn't get reviewed in time" | 13:36 |
sean-k-mooney | dansmith: yes, or they split the patch into non-config-drive and config drive | 13:36 |
dansmith | that would mean two microversions for effectively the same thing | 13:37 |
sean-k-mooney | dansmith: i basically started my patch so they would have a reference for what needed to be done | 13:37 |
dansmith | unless we do the config regen and rpc ahead of exposing it in the api | 13:37 |
bauzas | dansmith: I think we reviewed it well, but we hit a bone | 13:37 |
bauzas | so, I'm OK with giving more time to review that bone fix | 13:37 |
dansmith | bauzas: I'm saying the code that needs to be written to make it landable would probably double the actual code in the patch | 13:38 |
sean-k-mooney | dansmith: yep we could do that, so initially it would not be invokable from the api but the code would be in place | 13:38 |
dansmith | sean-k-mooney: yeah that seems better to me | 13:38 |
bauzas | dansmith: then, we need to take this extratime to balance the risks and maybe not merge it | 13:38 |
bauzas | or decide we only merge half the things | 13:38 |
sean-k-mooney | dansmith: so you could probably combine your rpc code into the patch i started, then we could flip the order of them | 13:39 |
gibi | we could merge the part that supports user data update with non config drive instances. | 13:39 |
sean-k-mooney | but that also means we need other people to agree to review this | 13:39 |
sean-k-mooney | since you and i are basically out at that point | 13:39 |
gibi | I can review the patches (I will be around 18:00 CEST today but I can spend time early tomorrow too) | 13:40 |
gibi | * I will be around until 18:00 CEST today | 13:40 |
dansmith | sean-k-mooney: right, that's a problem too | 13:40 |
dansmith | sean-k-mooney: so a patch in front that adds the flag to the reboot call, then the patch to make it regeneratable, then the api patch to do it for both types would be cleanest I think | 13:41 |
sean-k-mooney | dansmith: ya that sounds like a plan | 13:41 |
sean-k-mooney | it keeps just one micorversion | 13:42 |
sean-k-mooney | i can go write the tests needed to make the rebuild patch complete if we agree to proceed in this direction | 13:42 |
dansmith | sean-k-mooney: ack, I'll rebase my rpc stuff | 13:43 |
sean-k-mooney | dansmith: it would probably make sense for you to write your patch directly off master and i could rebase and flip the order when you're done | 13:44 |
dansmith | sean-k-mooney: yeah | 13:44 |
bauzas | I'm OK with having 3 patches, the API one being the latest | 13:44 |
bauzas | and this being FFE, so we have 1 week to correctly look at all of this | 13:44 |
bauzas | elodilles: when is the exact deadline for cycle highlights ? | 13:47 |
bauzas | today or tomorrow ? | 13:47 |
bauzas | one day, the marketing team will understand how crazy it is to ask for feature docs deliveries at the FeatureFreeze (and even before) | 13:48 |
bauzas | while RC1 time would be so easier | 13:48 |
bauzas | this is just, I need to take 1 hour of my time for writing something while we're on fire | 13:49 |
sean-k-mooney | on the plus side tempest-integrated-compute forces config drive and that passed so at least that change in logic works for normal boots https://zuul.opendev.org/t/openstack/build/98cfa86b93974d79b1d43c29433d7e48/log/controller/logs/etc/nova/nova-cpu_conf.txt#17 | 13:50 |
sean-k-mooney | i also checked that locally but nice to see it in ci too. | 13:52 |
elodilles | bauzas: i think there is no strict deadline for today, we will merge it tomorrow if it's only ready around that time :) | 13:53 |
sean-k-mooney | elodilles: i think the reminder was asking for it yesterday | 13:54 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 13:54 |
sean-k-mooney | but i guess bauzas' point was we might not know the full feature set until we hit FF | 13:54 |
bauzas | elodilles: surely there is no problem, but I wonder why such rush | 13:54 |
sean-k-mooney | or RC1 if we have FFEs | 13:55 |
bauzas | elodilles: I just replied to your email fwiw | 13:55 |
bauzas | I guess my point is not if I can deliver later | 13:55 |
bauzas | but rather asking for an official move to RC1 | 13:55 |
sean-k-mooney | elodilles: oh you just said this week | 13:56 |
bauzas | sean-k-mooney: https://releases.openstack.org/zed/schedule.html#cycle-highlights | 13:56 |
sean-k-mooney | yep | 13:56 |
sean-k-mooney | looking at it now | 13:57 |
bauzas | the schedule officially says the week of Zed-3 | 13:57 |
bauzas | this is non-trivial | 13:57 |
sean-k-mooney | i can understand that they might want to have them ready for RC1 | 13:57 |
sean-k-mooney | but the week between the two would be a better compromise | 13:58 |
dansmith | sean-k-mooney: so I'm bringing in the libvirt driver reconfigure_configdrive() method from the original patch, but I should move that until after the self.destroy() yeah? | 14:01 |
sean-k-mooney | well it need to get called via the callback | 14:02 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/855351/1/nova/virt/libvirt/driver.py#3978 | 14:03 |
dansmith | I guess I need to look at your patch | 14:03 |
elodilles | bauzas: if everything goes according to plan, then by FF a team knows every feature which was merged. of course, FFEs can alter the picture, but that i think is acceptable. get the biggest part of the highlights merged and update the rest if needed as soon as possible. IMO. | 14:03 |
sean-k-mooney | tl;dr building the config drive needs info generated by libvirt in the xml | 14:03 |
elodilles | bauzas: "The Zed-3 milestone marks feature freeze for projects following the release:cycle-with-rc model. No featureful patch should be landed after this point. Exceptions may be granted by the project PTL." | 14:03 |
sean-k-mooney | elodilles: yes but generally review bandwidth consumes all the ptl's time | 14:04 |
sean-k-mooney | elodilles: so the project team does not have time to reflect on the features that landed over the cycle to see what should be highlighted | 14:04 |
bauzas | sean-k-mooney: that's my point | 14:05 |
dansmith | sean-k-mooney: ah okay so actually yours should go first then | 14:05 |
bauzas | this is counterproductive for me to block 2 hours of my time on a FF Thursday in order to write a document I could be able to write the week after | 14:05 |
dansmith | otherwise I'm, plumbing rpc that lies about what it does | 14:05 |
bauzas | the rpc thing concerns me | 14:06 |
bauzas | do we need to have a service check on the API ? | 14:06 |
sean-k-mooney | i think we can do them in any order initially and just fix it once we have the first version of all the patches | 14:06 |
dansmith | bauzas: yes because it's a case | 14:06 |
dansmith | *cast | 14:06 |
bauzas | dansmith: agreed, particularly because of the non-blocking RPC call then | 14:07 |
dansmith | bauzas: the original patch was checking with tempest, but if we have an rpc version, we can use the version pin, or check service version | 14:07 |
bauzas | and an ERROR would be terrible to manage | 14:07 |
dansmith | bauzas: since this is a new param, I was going to make rpcapi refuse to send if you ask for the new flag, per usual, so the api will catch and handle that, knowing it was not allowed | 14:08 |
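The "refuse to send" pattern dansmith describes can be sketched roughly like this (hypothetical class, method, flag, and version numbers; nova's real rpcapi layers this on oslo.messaging version caps rather than a hand-rolled comparison):

```python
# Hedged sketch of nova's usual rpcapi version-guard pattern: if the
# caller asks for new behaviour but the negotiated/pinned RPC version
# is too old to carry the new argument, refuse instead of silently
# dropping the flag, so the API layer can return a clean error.

class UnsupportedRPCVersion(Exception):
    pass

class ComputeAPIStub:
    def __init__(self, client_version):
        # the version pinned for this deployment (e.g. during upgrade)
        self.client_version = client_version

    def _can_send(self, required):
        def parse(v):
            major, minor = v.split(".")
            return int(major), int(minor)
        return parse(self.client_version) >= parse(required)

    def reboot_instance(self, instance, regen_configdrive=False):
        # the hypothetical new flag rides on a new RPC version ("6.1")
        if regen_configdrive and not self._can_send("6.1"):
            raise UnsupportedRPCVersion("compute too old for user_data regen")
        return {"instance": instance, "regen": regen_configdrive}

old = ComputeAPIStub("6.0")
try:
    old.reboot_instance("uuid-1", regen_configdrive=True)
except UnsupportedRPCVersion as exc:
    print("refused:", exc)

new = ComputeAPIStub("6.1")
print(new.reboot_instance("uuid-1", regen_configdrive=True)["regen"])  # True
```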
elodilles | bauzas: ack, i see. to tell you the truth i don't know when exactly marketing team processes the cycle highlights, so maybe we have here some days until they'll need it (especially due to FFEs) | 14:08 |
dansmith | but all that has to get tested for sure | 14:08 |
sean-k-mooney | elodilles: by the way can you add https://review.opendev.org/c/openstack/nova/+/833435 to your list whenever you have time | 14:09 |
elodilles | sean-k-mooney: ack, looking | 14:10 |
elodilles | bauzas: i'll answer to your mail as well :) | 14:11 |
sean-k-mooney | its just a backport making its way down to train so not urgent; i came across the downstream bug and realised i should probably follow up | 14:11 |
bauzas | dansmith: oh, right | 14:11 |
bauzas | dansmith: just exception handling at the API level, you're right | 14:12 |
bauzas | elodilles: don't get me wrong, I wasn't grumbling, I was just pointing out my problem for discussion | 14:12 |
bauzas | maybe the foundation folks don't exactly get what's happening at FF | 14:13 |
bauzas | in particular with the big projects | 14:13 |
elodilles | bauzas: i summarized what i think about this in my mail. let's see if someone from Marketing or Release team has different view o:) | 14:27 |
elodilles | sean-k-mooney: the patch looks valid, +2'd | 14:28 |
sean-k-mooney | i'll slowly refresh those patches as they merge but given we are at FF i'm also trying to limit the ci usage | 14:33 |
elodilles | sean-k-mooney: ++ | 14:35 |
opendevreview | sean mooney proposed openstack/nova master: support configdrive rebuilding https://review.opendev.org/c/openstack/nova/+/855351 | 14:53 |
opendevreview | sean mooney proposed openstack/nova master: add support for updating server's user_data https://review.opendev.org/c/openstack/nova/+/816157 | 14:53 |
sean-k-mooney | dansmith: that won't work ^ but they're in the correct order now and i can start on the tests | 14:54 |
dansmith | sean-k-mooney: okay but we can't land an RPC version that says it does a thing when it doesn't | 14:54 |
dansmith | sean-k-mooney: so I think your thing needs to be in front with "if False" or something | 14:55 |
dansmith | and I'll change that to "if the thing is enabled:" | 14:55 |
sean-k-mooney | dansmith: something like https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#3879 | 14:55 |
sean-k-mooney | which is never set anywhere to True | 14:56 |
sean-k-mooney | and this if https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#5011 | 14:56 |
dansmith | yeah | 14:56 |
sean-k-mooney | so i was thinking you could start a patch on top of the first one, or if you want i can probably do it based on the termbin you provided | 14:57 |
sean-k-mooney | you will be quicker at that than me since it's been a while since i touched rpc, but either way | 14:57 |
sean-k-mooney | i'm going to work on the base patch and get it to pass unit and func tests | 14:58 |
dansmith | yeah, I'm working on the rpc stuff already | 14:58 |
dansmith | I think that base patch will work now | 14:58 |
sean-k-mooney | it should work in devstack; i'm expecting the unit tests to bitch about the extra parameter to _hard_reboot | 14:59 |
bauzas | hmm, are configdrivers only supported by libvirt N? | 15:04 |
bauzas | configdrives | 15:04 |
bauzas | I'm afraid of https://review.opendev.org/c/openstack/nova/+/816157/16/nova/virt/driver.py | 15:04 |
sean-k-mooney | technically they are supported by other drivers i think | 15:04 |
bauzas | trampling all our drivers but libvirt | 15:04 |
gibi | bauzas: nope, it is supported by other drivers too | 15:04 |
bauzas | that's super late for them to support them | 15:04 |
bauzas | s/them/this | 15:05 |
sean-k-mooney | they dont need too | 15:05 |
sean-k-mooney | they just wont be able to use this feature | 15:05 |
sean-k-mooney | i think that was in the spec | 15:05 |
sean-k-mooney | either way this was only planned to be implemented by libvirt | 15:05 |
sean-k-mooney | at least for now | 15:05 |
bauzas | hmmmm | 15:05 |
gibi | so we need to detect the capability in the API and reject the reboot request | 15:06 |
bauzas | yup | 15:06 |
sean-k-mooney | that's in the patch currently, yes | 15:06 |
gibi | if there is user_data update but the virt driver has no capability to regenerate | 15:06 |
bauzas | a virt driver dependency wasn't stated in the spec https://review.opendev.org/c/openstack/nova-specs/+/816542/7/specs/zed/approved/update-userdata.rst | 15:07 |
sean-k-mooney | by the way we can now drop the object changes | 15:07 |
bauzas | this change seems to me more and more fragile | 15:07 |
sean-k-mooney | bauzas: well im not going to have time to work on ironic by monday | 15:07 |
bauzas | sean-k-mooney: I'm not asking you to do such things | 15:07 |
gibi | (it is not just ironic) | 15:08 |
sean-k-mooney | so hehe i think we were reviewing this as libvirt only, or at least i was | 15:08 |
bauzas | but I point out how fragile this is | 15:08 |
sean-k-mooney | gibi: ya i know | 15:08 |
bauzas | and again, this wasn't explained in the spec | 15:08 |
sean-k-mooney | ironic would just be the most painful | 15:08 |
sean-k-mooney | bauzas: originally in the spec they planned to only do it for the metadata api | 15:08 |
bauzas | I was having concerns with this spec because I thought it wasn't really a needed usecase | 15:08 |
sean-k-mooney | but then we pointed out config drive existed and needed to also work | 15:09 |
bauzas | sean-k-mooney: no, they said about configdrives too | 15:09 |
sean-k-mooney | bauzas: no i asked them to add that | 15:09 |
sean-k-mooney | they wanted api only | 15:09 |
bauzas | sean-k-mooney: yes, but I also said it was a problem | 15:09 |
sean-k-mooney | i did not want this to not work if you used config drive | 15:09 |
sean-k-mooney | so as it stands any driver that does not support the new method will not report the trait | 15:10 |
sean-k-mooney | that needs to be called out in the api ref | 15:10 |
sean-k-mooney | or other docs for this | 15:10 |
sean-k-mooney | we could put it in the driver support matrix i guess | 15:10 |
sean-k-mooney | that's probably better long term | 15:10 |
sean-k-mooney | dansmith: only 6 failures for the extra parameter | 15:14 |
sean-k-mooney | i actually need the driver api change in the first patch too so i'll add that | 15:14 |
gibi | I'm wondering about the virt driver api change | 15:15 |
dansmith | but bauzas' point about this not being supported by other drivers is legit | 15:15 |
dansmith | having features that look like swiss cheese is very confusing for users | 15:15 |
sean-k-mooney | gibi: i could drop that now actully | 15:15 |
gibi | I asked for it originally as I thought the reboot + regenerate logic can be orchestrated from the compute manager | 15:15 |
sean-k-mooney | if we just have the parmater | 15:15 |
dansmith | even if we reject for the api caller, it's still annoying to say "reboot works, but not reboot with userdata, but only if you're using configdrive" | 15:16 |
gibi | but the actual implementation calls the regenerate call from the virt driver | 15:16 |
dansmith | gibi: reboot is a single call to virt, | 15:16 |
gibi | so it is less useful to have a virt driver method | 15:16 |
dansmith | and the regenerate has to happen in the middle | 15:16 |
gibi | dansmith: I see now | 15:16 |
dansmith | it's pretty unideal in general for sure | 15:16 |
dansmith | just saying, I think that's why it was done this way | 15:16 |
gibi | this is why I'm pointing out that the separate virt driver method is probably not needed | 15:17 |
sean-k-mooney | so we don't need a new public method in the virt driver api | 15:17 |
sean-k-mooney | it's just a new parameter to hard_reboot, right | 15:17 |
dansmith | no, the publicness isn't the concern I think | 15:17 |
sean-k-mooney | right i know, it's the uniformity of the api | 15:17 |
sean-k-mooney | and not having it depend on the backend | 15:17 |
bauzas | agreed with dansmith my concern isn't the publicness | 15:17 |
bauzas | this is about the fact we depend on virt drivers | 15:18 |
gibi | it would be nice to do the reboot in 3 generic steps from the compute manager: stop, regenerate, start. But this is a big change | 15:18 |
sean-k-mooney | yep i get that but we have many things that depend on the virt driver | 15:18 |
dansmith | I definitely think these kinds of pinpoint features that look generic but only work in libvirt are a problem | 15:18 |
bauzas | sean-k-mooney: well, do all the virt drivers support to create *or* update the configdrives ? | 15:19 |
dansmith | create yes | 15:19 |
dansmith | nobody supports update today AFAIK | 15:19 |
sean-k-mooney | vmware ironic libvirt yes | 15:19 |
bauzas | dansmith: that's my point, I wasn't seeing the dependency when reviewing the spec | 15:19 |
sean-k-mooney | for create | 15:19 |
sean-k-mooney | not sure about powervm | 15:19 |
dansmith | gibi: the problem with that is things like vmware where I'm not sure where the config drive actually gets created, or even if generating it on the compute node and passing it to the virt driver as a file would necessarily be reasonable | 15:19 |
dansmith | since it's a remote thing | 15:19 |
sean-k-mooney | i can take a look and see what would be required for other driver quickly | 15:20 |
sean-k-mooney | ironic is the hard one | 15:20 |
sean-k-mooney | they pass the config drive via ipa | 15:20 |
gibi | dansmith: I see | 15:21 |
dansmith | like I think ironic passes the configdrive as a string to the api, | 15:21 |
dansmith | never generates it on disk | 15:21 |
sean-k-mooney | ah that might be how nova passes it to ironic but then ironic connects it to the vm via ipa or mounting it | 15:21 |
dansmith | yeah | 15:21 |
dansmith | it serializes it and then ironic writes it remotely I think | 15:22 |
dansmith | now, if compute generated the file I guess the driver could read and send it | 15:22 |
dansmith | but still, those are big changes requiring lots of testing | 15:22 |
sean-k-mooney | so really we would want the regeneration logic to be here https://github.com/openstack/nova/blob/master/nova/virt/configdrive.py | 15:22 |
sean-k-mooney | in the generic shared part | 15:22 |
sean-k-mooney | although maybe not | 15:23 |
sean-k-mooney | that does not quite do what i was expecting | 15:23 |
bauzas | I tend to agree with dansmith, we're playing with guns | 15:23 |
dansmith | well, I like playing with guns, but maybe.. playing with fire? :) | 15:23 |
bauzas | if we try to update a configdrive and this doesn't work, then our user won't be happy at all | 15:24 |
bauzas | dansmith: what you prefer | 15:24 |
bauzas | or playing with teslas | 15:24 |
dansmith | as long as you mean "teslas, the unit of charge, meaning playing with dangerous high-voltage" .. then sure :D | 15:25 |
bauzas | point is, we need to only accept updating the userdata if the configdrive correctly regenerated | 15:25 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1060-L1092 | 15:25 |
sean-k-mooney | so ironic just turns it into a string, yes | 15:25 |
bauzas | dansmith: like "driving a tesla with a 10% battery for 100km" | 15:25 |
dansmith | oh I see | 15:25 |
dansmith | that *is* scary :) | 15:26 |
bauzas | this *may* work | 15:26 |
bauzas | but before driving, you need to ensure you can do it | 15:26 |
bauzas | the same goes with configdrives | 15:26 |
sean-k-mooney | and they pass it in spawn and rebuild https://github.com/openstack/nova/blob/bcdf5988f6ae902dba9b41144a7b4a60688b627c/nova/virt/ironic/driver.py#L1189-L1191 | 15:26 |
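The "configdrive as a string" idea above can be illustrated with a small round-trip (a sketch of the general gzip+base64 approach; the function names are made up and nova/ironic's real code differs in details):

```python
# Illustrative round-trip: gzip + base64 turns a built configdrive
# image into a plain string that can ride over a JSON API, and the
# receiving side reverses it before writing the drive out.

import base64
import gzip
import io

def serialize_configdrive(raw_iso_bytes):
    """Compress and base64-encode the raw configdrive image."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(raw_iso_bytes)
    return base64.b64encode(buf.getvalue()).decode("ascii")

def deserialize_configdrive(payload):
    """Decode and decompress back to the raw image bytes."""
    return gzip.decompress(base64.b64decode(payload))

data = b"fake configdrive ISO contents"
assert deserialize_configdrive(serialize_configdrive(data)) == data
```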
bauzas | you only accept the userdata to be updated once you're sure your configdrive is correctly regenerated | 15:26 |
dansmith | so the feeling I'm getting here is that there are a lot of open questions at this point | 15:27 |
bauzas | since the user has no idea whether the userdata is updated and only relies on the fact that he/she restarted the instance | 15:27 |
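(A minimal sketch of the guarantee being asked for above: the new user_data is only persisted if the config drive regeneration succeeds, so the API never reports an update the guest will never see. This is hypothetical illustration, not actual Nova code; `regen_config_drive` and the dict-based instance are stand-ins.)

```python
class ConfigDriveError(Exception):
    """Raised when the driver fails to rebuild the config drive."""


def update_user_data(instance, new_user_data, regen_config_drive):
    """Atomically swap user_data, rolling back on regen failure.

    `regen_config_drive` stands in for whatever driver call would
    rebuild the drive from the updated instance record.
    """
    old_user_data = instance.get("user_data")
    instance["user_data"] = new_user_data
    try:
        regen_config_drive(instance)
    except ConfigDriveError:
        # roll back so user_data and the config drive stay consistent,
        # and the user gets a clear failure instead of silent staleness
        instance["user_data"] = old_user_data
        raise
    return instance
```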
sean-k-mooney | bauzas: well that is already a problem today | 15:27 |
dansmith | sean-k-mooney: no it's not | 15:27 |
bauzas | today the userdata is immutable | 15:27 |
dansmith | sean-k-mooney: if you can only provide userdata during create, it's clear what userdata the instance sees | 15:27 |
bauzas | that reminds me the discussions we had when reviewing server groups /PUT | 15:28 |
sean-k-mooney | it is. the config drive is not updated today if you attach interfaces or volumes or update the server metadata | 15:28 |
sean-k-mooney | the user data is not stale | 15:28 |
sean-k-mooney | but in general the config drive can be | 15:28 |
dansmith | configdrive != userdata | 15:28 |
sean-k-mooney | yep that is my point | 15:28 |
sean-k-mooney | the config drive can be stale today; the user data is not | 15:28 |
sean-k-mooney | but that is only because it's immutable | 15:29 |
dansmith | right | 15:29 |
dansmith | not ideal, but also not ambiguous | 15:29 |
dansmith | which is bauzas' point I think | 15:29 |
bauzas | yup | 15:29 |
bauzas | since it's immutable, this isn't a problem | 15:29 |
sean-k-mooney | sure but it's expected that the user data script will likely depend on the other data in the config drive | 15:29 |
sean-k-mooney | specifically the device role tagging info | 15:29 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Add recreate_configdrive to reboot https://review.opendev.org/c/openstack/nova/+/855529 | 15:29 |
dansmith | sean-k-mooney: here's the rpc stuff ^ passing tests, not wired into anything else | 15:30 |
dansmith | just so it's available | 15:30 |
sean-k-mooney | dansmith: but to your point it does sound like there is enough stuff that we won't have this mergeable even with an FFE | 15:30 |
dansmith | needs more test coverage and an assert on soft I think | 15:30 |
bauzas | sean-k-mooney: my personal opinion is that the more I review this API change, the more I think I discover points we missed | 15:33 |
bauzas | hence my concern | 15:33 |
dansmith | so I should probably rebase that on your regen configdrive patch and I guess I can rebase the api bit on top of that and trivially wire it up | 15:33 |
bauzas | the configdrive regeneration is already a concern (making sure we validate it synchronously with the userdata update) | 15:33 |
bauzas | the fact that we leave other virt drivers in the weeds is another concern | 15:34 |
bauzas | both weren't discussed at the PTG | 15:34 |
dansmith | so on that, | 15:34 |
bauzas | and when I reviewed the spec, I left enough comments explaining why I thought the reboot solution was fragile | 15:34 |
dansmith | are we really expecting anyone to update vmware or hyperv to do this? | 15:34 |
bauzas | reasonably not | 15:35 |
dansmith | ironic perhaps, but for the above patch, | 15:35 |
dansmith | I was looking at those other drivers and wondering when they were last touched | 15:35 |
bauzas | surely, but we haven't even asked them to look | 15:35 |
dansmith | okay I guess we've had changes in 2022 for vmware | 15:35 |
dansmith | so maybe doable | 15:35 |
dansmith | not much for hyperv though | 15:36 |
dansmith | so I don't see any configdrive stuff in hyperv at first glance | 15:37 |
bauzas | fair enough, but then question | 15:37 |
bauzas | I'm a user | 15:37 |
bauzas | I wanna update my pet's userdata | 15:38 |
bauzas | I do it | 15:38 |
bauzas | and then I don't understand, my f*** script doesn't run as expected when I restart the guest | 15:38 |
bauzas | shall I open a ticket ? | 15:38 |
dansmith | oh it's in hyperv, just in vmops.py | 15:39 |
bauzas | oh, simple, your cloud was running some driver on that host that was having trouble with updating the configdrive | 15:39 |
bauzas | either because it was a legacy driver | 15:39 |
bauzas | or because something unexpected (like a perm error) happened | 15:39 |
bauzas | honestly, this spec was just about touching an internal Nova DB record | 15:40 |
bauzas | now, we're pulling way more than that | 15:40 |
dansmith | yeah, it's becoming more about configdrive than anything else | 15:40 |
sean-k-mooney | bauzas: in two conversations, but from my perspective i don't think we have really discovered anything bar the rpc change. i reviewed the spec with the understanding it was scoped to libvirt | 15:41 |
bauzas | yet again, can't we just agree to update an instance's userdata only if it's not configdrive-driven, and leave people who care about configdrives to deal with the complexity one day or another? | 15:42 |
dansmith | I dunno, I can see the argument for this being scoped to just one driver, but for something this fundamental it seems pretty unfortunate to say it's only for libvirt | 15:42 |
bauzas | I'm pretty sure we wouldn't have this back-to-back | 15:43 |
dansmith | so we reject the api call if configdrive=True always? | 15:44 |
bauzas | that's my question | 15:44 |
dansmith | doesn't that get closer to your concern on the spec that people think they have updated their instance but didn't reboot it and don't realize that cloud-init doesn't update their stuff? | 15:45 |
dansmith | I mean, that's just a misunderstanding of cloud-init, but I thought that was your concern | 15:45 |
dansmith | anyway, I dunno | 15:45 |
bauzas | if the update failed, they know the instance userdata wasn't updated | 15:45 |
bauzas | they would know this was because of the config drive if they asked for it via boot params | 15:46 |
bauzas | but they wouldn't know if the cloud defaulted to force configdrives or if the image was asking for it | 15:47 |
dansmith | right, it's not always their choice | 15:47 |
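(A rough sketch of why the user can't predict the outcome: whether an instance gets a config drive depends on three inputs, and only the first is under the user's control. This loosely mirrors Nova's logic in simplified form; the function and its dict-based inputs are illustrative, but `img_config_drive=mandatory` and `[DEFAULT]force_config_drive` are real Nova/Glance knobs.)

```python
def config_drive_required(api_requested, image_properties,
                          host_force_config_drive):
    """Decide whether an instance must get a config drive.

    1. the user asked for it at boot (visible to the user)
    2. the image demands it via img_config_drive=mandatory
    3. the compute host sets [DEFAULT]force_config_drive=True
       (invisible to the user, and can differ per host)
    """
    return (bool(api_requested)
            or image_properties.get("img_config_drive") == "mandatory"
            or bool(host_force_config_drive))
```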
bauzas | anyway, I'm just offering a trade-off because I'm not really happy with the current state of the series | 15:47 |
bauzas | also, is the owner of the patch around ? | 15:48 |
bauzas | or are we offering our help for free ? | 15:48 |
dansmith | who is asking for this btw? | 15:48 |
dansmith | aside from the author | 15:48 |
bauzas | a private cloud company | 15:49 |
bauzas | Inovex | 15:49 |
jhartkopf | I'm here | 15:49 |
bauzas | cool, glad to hear you jhartkopf | 15:49 |
bauzas | jhartkopf: lemme summarize | 15:49 |
dansmith | jhartkopf: do you care about the configdrive case? | 15:49 |
bauzas | we're having a problem with https://review.opendev.org/c/openstack/nova/+/816157/16 | 15:49 |
bauzas | and we don't know how to move forward | 15:49 |
jhartkopf | yep I quickly read your discussion | 15:50 |
bauzas | there is a proposed solution, but this is premature work and we're very late in the cycle | 15:50 |
bauzas | we're also creating a dependency on the virt drivers and unfortunately, Hyper-V, VMWare and Ironic could be impacted if we merge | 15:50 |
bauzas | lastly, I have concerns about ensuring that the userdata API query is tied to the success of the configdrive regen op | 15:51 |
bauzas | which would require the API call to be synchronous on a configdrive regen | 15:51 |
bauzas | so, my question is | 15:52 |
* sean-k-mooney i don't think it's fair to say the other virt drivers would be impacted | 15:52 |
bauzas | do you really care about configdrives? | 15:52 |
dansmith | bauzas: we can't make the api call synchronous for config drive regen | 15:53 |
dansmith | that could take a long time, be dependent on the IO load on the compute, etc | 15:53 |
sean-k-mooney | i dont think that was an option | 15:53 |
bauzas | sean-k-mooney: I think it would be fair to say that this feature wouldn't be available to the other drivers while apparently they already support configdrive regen | 15:53 |
bauzas | eek, not regen, but creation | 15:53 |
sean-k-mooney | bauzas: yes that is correct | 15:53 |
bauzas | dansmith: right, I was pointing out loudly the problem | 15:53 |
dansmith | ah okay | 15:54 |
bauzas | this is a design problem | 15:54 |
bauzas | configdrives take time to regenerate | 15:54 |
bauzas | on the other hand, users have an API for updating userdata | 15:54 |
bauzas | they assume they just have to restart their guests in order to have the latest update | 15:55 |
jhartkopf | bauzas: Initially we did not even want to touch config drives at all. We'd only like to update user data in the metadata service really. | 15:55 |
sean-k-mooney | right but the api prevents you updating the user data if the instance has a config drive, outside of hard_reboot | 15:55 |
bauzas | I know | 15:55 |
stephenfin | gibi: Sorry for the delay. I'd run through most of those PCI patches yesterday but didn't leave reviews until I got to the end of the series. Done now (y) | 15:55 |
gibi | stephenfin: much appreciated. I will look | 15:55 |
stephenfin | I went as far as sean-k-mooney did. Could go further but I see -1's from you | 15:55 |
gibi | stephenfin: yes, I left them -1 there as there is a bug in the scheduling part | 15:56 |
gibi | stephenfin: I think it is totally OK to merge it up until https://review.opendev.org/c/openstack/nova/+/850468/20 | 15:56 |
jhartkopf | What's up with the plan you guys proposed yesterday to split the patch and only allow updating instances without config drive for now? | 15:57 |
gibi | stephenfin: the rest slips to AA | 15:57 |
stephenfin | sounds reasonable to me | 15:57 |
sean-k-mooney | jhartkopf: i would prefer not to do that if we can avoid it but it's an option | 15:58 |
sean-k-mooney | i think making this feature depend on whether a config drive is used is worse than only supporting it for some virt drivers | 15:58 |
gibi | stephenfin: there are two FUPs at the top. I will move them to be on the mergeable part of the series | 15:58 |
sean-k-mooney | since the config drive can change based on what host you land on | 15:58 |
sean-k-mooney | there is no way to know if it will work or not | 15:59 |
dansmith | sean-k-mooney: hmm, the mandatory config drive thing is a compute-node config right? | 15:59 |
sean-k-mooney | dansmith: yes | 15:59 |
dansmith | yeah, that sucks | 15:59 |
dansmith | so instances at the edge which use configdrive become un-updatable | 15:59 |
sean-k-mooney | yep | 16:00 |
gibi | user could use the capability trait to land on a host that supports regen | 16:00 |
dansmith | so another thing, | 16:00 |
sean-k-mooney | or ones in isolated networks | 16:00 |
dansmith | we really should support reboot with user_data for any instance, | 16:00 |
dansmith | because that's the consistent way to get it updated, regardless of what type it is | 16:00 |
dansmith | that way you get updated, you get rebooted, etc | 16:00 |
dansmith | gibi: the user doesn't know this is a problem until it's too late is the point | 16:01 |
sean-k-mooney | dansmith: that's more or less what we were trying to do and why i was insisting on config drive support | 16:01 |
sean-k-mooney | i guess the disconnect is | 16:01 |
dansmith | gibi: they don't choose a host based on whether they want to update the user_data a month from now :) | 16:01 |
sean-k-mooney | i thought it was ok to start with the libvirt driver | 16:01 |
sean-k-mooney | since we could add other drivers later | 16:01 |
dansmith | sean-k-mooney: I don't think I said I'm opposed to that, what I'm opposed to is just never having that implemented for the other drivers | 16:02 |
sean-k-mooney | gibi: dansmith also i fixed a bug where migration could cause you to gain/lose a config drive, by making it sticky | 16:02 |
sean-k-mooney | so if the vm is first created with a config drive because of the option, i made that sticky | 16:02 |
dansmith | and given how intertwined this is with the virt driver, and that it was already done wrong once, I would hate to implement this one way and find out that vmware or hyperv or ironic can't honor the request to rebuild during a reboot at the right time | 16:02 |
sean-k-mooney | dansmith: well that is currently blocked by the trait today | 16:03 |
dansmith | sean-k-mooney: that also means that if you got a configdrive when you created the instance, it will never have update-able user_data right? | 16:03 |
sean-k-mooney | but if we tried to do it quick that could break | 16:03 |
sean-k-mooney | dansmith: yes exactly | 16:03 |
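(A small sketch of the "sticky" behaviour just described: the create-time decision is recorded on the instance, so migrating to a host with a different `force_config_drive` setting can no longer add or remove the drive. Hypothetical illustration only; the field name and dict-based instance are stand-ins for Nova's actual instance record.)

```python
def has_config_drive(instance, host_force_config_drive):
    """Return whether this instance has a config drive, sticky-style.

    Once the decision is recorded on the instance (at create time),
    the per-instance flag always wins over the current host's config.
    """
    if instance.get("config_drive") is not None:
        return instance["config_drive"]
    decision = bool(host_force_config_drive)
    instance["config_drive"] = decision  # make the decision sticky
    return decision
```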
dansmith | sean-k-mooney: no I get that we can detect if it's supported today, I'm talking about if we merge this and next cycle ironic says "we literally can't regenerate the config drive" and hyperv says 'we can, but not in the middle of a reboot because of how we implement that" | 16:04 |
dansmith | then we're kinda stuck | 16:04 |
sean-k-mooney | ya fair | 16:05 |
sean-k-mooney | its internal to the driver but it might for example require the current reboot to be split into stop, update config drive, start | 16:05 |
sean-k-mooney | for the hyperv scenario | 16:05 |
sean-k-mooney | which we might or might not be able to hide | 16:05 |
dansmith | libvirt's hard reboot is basically a recreate, but that doesn't mean the other drivers are | 16:06 |
sean-k-mooney | yep | 16:06 |
dansmith | let me also say that I'm sorry I brought up all these concerns late, but I *was* asked to review this and have tried to put my money where my mouth is on changes | 16:06 |
dansmith | but I think these are all legit concerns | 16:06 |
bauzas | yeah, there are no easy paths for solving this problem | 16:07 |
dansmith | I know gibi is plotting my murder right now, probably conspiring with jhartkopf :) | 16:07 |
sean-k-mooney | they are. we discussed it at the ptg as we had previously rejected the spec | 16:07 |
sean-k-mooney | last cycle | 16:07 |
gibi | I'm sorry that I drove jhartkopf's solution to a dead end. | 16:08 |
bauzas | well, I had concerns about the complexity it was creating for little gain | 16:08 |
bauzas | but I was opposed this was an easy win | 16:08 |
gibi | I reviewed these patches and missed obvious design errors. I will try better next time. | 16:09 |
bauzas | so, now, I'm trying to find a trade-off but I don't wanna pull the strings if I think this is risky | 16:09 |
sean-k-mooney | well "little gain" is not necessarily fair; it makes nova more "cloud native", as the idea was that user data should be more like k8s config maps | 16:09 |
sean-k-mooney | although to be fair that would also imply we should delete and recreate the vms | 16:09 |
dansmith | hah right | 16:09 |
dansmith | no problem updating user_data if we just shoot the instances in the head :D | 16:09 |
bauzas | sean-k-mooney: we never proposed userdata to be mutable | 16:09 |
sean-k-mooney | who is we | 16:10 |
bauzas | in a cloud, you just spin another instance if you dislike your existing userdata | 16:10 |
sean-k-mooney | this has been a long-running request for multiple cycles | 16:10 |
sean-k-mooney | not in aws | 16:10 |
sean-k-mooney | and other platforms | 16:10 |
jhartkopf | dansmith: Honestly it's better to notice problems now than when it's too late | 16:10 |
sean-k-mooney | apparently openstack was an outlier | 16:10 |
bauzas | https://docs.openstack.org/nova/latest/user/metadata.html#user-provided-data | 16:11 |
gibi | dansmith: why would I? I feel sorry for jhartkopf's time spent on this. And I feel bad that I was not able to find the issues you and sean-k-mooney found | 16:11 |
bauzas | glad I'm not quoted :) | 16:12 |
sean-k-mooney | bauzas: right but this concept was originally borrowed from ec2 | 16:12 |
sean-k-mooney | bauzas: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html | 16:12 |
sean-k-mooney | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-view-change | 16:13 |
bauzas | the stop requirement from EC2 may help | 16:14 |
bauzas | anyway, I was about to draft a design modification | 16:15 |
sean-k-mooney | well doing it on hard reboot was an optimisation of stop, update, start | 16:15 |
sean-k-mooney | but we could do that instead | 16:15 |
sean-k-mooney | or shelve | 16:15 |
bauzas | the fact is, if we only allow the update while the instance is stopped, we would only allow the instance to be started once the userdata is updated | 16:16 |
sean-k-mooney | unless server update is blocking, right now i'm not sure it helps | 16:16 |
bauzas | at least once the request ended | 16:16 |
sean-k-mooney | bauzas: well we would need a new rpc to the compute to rebuild the config drive | 16:17 |
dansmith | we still have to regen the config drive though | 16:17 |
dansmith | which we don't do on reboot right? | 16:17 |
dansmith | er, stop..start | 16:17 |
sean-k-mooney | right, we don't regen the config drive outside of unshelve or cross-cell migration | 16:17 |
dansmith | but yeah, saying you can only update user_data if the instance is stopped is probably a lot better for all the arguments | 16:17 |
bauzas | phew | 16:17 |
sean-k-mooney | and unshelve only for non-rbd-backed instances | 16:17 |
dansmith | it addresses the "so do I need to reboot?" question on non-configdrive | 16:17 |
dansmith | and eliminates the need for the rpc | 16:18 |
dansmith | it still requires regen to work, and still may or may not be easy or possible for some drivers | 16:18 |
dansmith | like ironic might have to change the provision state to re-write a disk or something | 16:18 |
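(The stop/update/start variant being discussed could look roughly like this: reject the update with a 409-style conflict unless the instance is stopped, regenerate the drive while the guest cannot race with it, and only then allow a start. Hypothetical sketch; the states, names and dict-based instance are illustrative, not Nova's actual vm_states or API.)

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""


def update_user_data_while_stopped(instance, new_user_data, regen):
    """Update user_data only on a stopped instance, regenerating now."""
    if instance["vm_state"] != "stopped":
        raise Conflict("user_data can only be updated on a stopped "
                       "instance")
    instance["user_data"] = new_user_data
    # regenerate immediately, while the guest is down; the next start
    # then boots with a drive that is consistent with the new data,
    # and no dirty flag or extra start-time failure mode is needed
    regen(instance)
    return instance
```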
gibi | will we regen for each start then? | 16:18 |
dansmith | gibi: that'd be a question yeah | 16:19 |
sean-k-mooney | i'm not sure we would remove the rpc | 16:19 |
dansmith | I really don't like the idea of a dirty flag for the other reasons | 16:19 |
dansmith | sean-k-mooney: oh you mean a new rpc for the update | 16:19 |
dansmith | yeah that could work | 16:19 |
sean-k-mooney | ya | 16:19 |
dansmith | we could just nuke the existing drive in the libvirt case, but something like ironic would have to keep track | 16:19 |
sean-k-mooney | i mean we have another option | 16:19 |
sean-k-mooney | make this entirely user-driven by adding a new instance action | 16:20 |
sean-k-mooney | for updating the config drive | 16:20 |
dansmith | oh instead of a PUT | 16:20 |
bauzas | I know we're on Zed-3 day and I'm supposed to stay but this is 6:20pm here and I wonder how much of this could be etherpaded | 16:20 |
sean-k-mooney | ya, stop, update whatever you like, explicitly tell it to update the config drive if you care, start | 16:21 |
bauzas | because we're discussing a design change | 16:21 |
bauzas | dansmith: yeah, something we would say "damn nova, please do the necessary to plumb my things" | 16:21 |
sean-k-mooney | bauzas: i think we all realise that we likely won't make progress now | 16:21 |
bauzas | an instance action would be nice to me, because it would get a status | 16:22 |
bauzas | and the async thing would be simple for the user | 16:22 |
bauzas | like, I started again and my userdata didn't update, why? | 16:22 |
bauzas | => server action tells me it failed | 16:22 |
sean-k-mooney | well the server action can be blocking or async | 16:23 |
sean-k-mooney | but it has a feedback mechanism either way | 16:23 |
gibi | this could be a very long running blocking call | 16:23 |
bauzas | yup, the whole point is that it gives a clue why the userdata wasn't up to date | 16:23 |
sean-k-mooney | right, it would be in the instance event log | 16:23 |
bauzas | and until the userdata says "I'm updated", don't expect to get the new things if you start | 16:23 |
gibi | we can also make the start to fail if it needed to regen but failed to | 16:24 |
sean-k-mooney | if it's async, or in the response if blocking | 16:24 |
sean-k-mooney | gibi: that would require tracking the dirty state | 16:24 |
sean-k-mooney | and also some people might not care | 16:24 |
gibi | I don't see how somebody would explicitly update the user_data and then not care whether that update ends up in the VM being started | 16:25 |
gibi | I think the dirty flag was bad because of shadow rpc | 16:25 |
sean-k-mooney | im thinking about something else | 16:25 |
sean-k-mooney | as a user i might not know if the user data is in the config drive or not | 16:26 |
sean-k-mooney | but i said start | 16:26 |
sean-k-mooney | and i might not want that to break in this case | 16:26 |
gibi | I see | 16:27 |
sean-k-mooney | the dirty flag was to give you the behaviour that it would only boot if it succeeded in updating | 16:27 |
bauzas | honestly, if I'm a user, I don't care about the internals | 16:27 |
sean-k-mooney | if it failed, at least in my version the vm would have gone to error | 16:27 |
bauzas | what I want to know is when my update succeeds and when I can use it | 16:27 |
dansmith | sorry, I'm getting pulled away, | 16:28 |
dansmith | but if it was an instance action, | 16:28 |
dansmith | then the rebuild either works or doesn't, I can see what happened, | 16:28 |
dansmith | and then I either get the old version on fail, or the new version on succeed | 16:28 |
dansmith | if we just set a dirty flag and then fail to regen on start, then we've killed the instance with no way to restart | 16:28 |
dansmith | which sucks | 16:28 |
bauzas | yup, I get feedback from an user perspective | 16:28 |
bauzas | yup, agreed | 16:28 |
sean-k-mooney | jhartkopf: so based on the above | 16:29 |
dansmith | we probably need some task_state or something to keep the user from starting it until the regen completes though | 16:29 |
sean-k-mooney | jhartkopf: and the fact we are all running out of steam | 16:29 |
bauzas | mikal would say "spec'y spec'y" | 16:29 |
dansmith | or just use instance lock on the compute node to hold off on any other requests.. I guess that's enough | 16:29 |
sean-k-mooney | i think we need to disucss this again next cycle | 16:29 |
bauzas | the next PTG will be virtual | 16:29 |
bauzas | anyone can join, budget saved | 16:30 |
* bauzas tries to not be sarcastic | 16:30 | |
sean-k-mooney | ya but we can reopen the spec and discuss it again before that | 16:30 |
bauzas | oh yeah | 16:30 |
sean-k-mooney | i see two ways we could go with the instance action by the way | 16:30 |
sean-k-mooney | one is an action for regenerating the config drive | 16:31 |
sean-k-mooney | the other is more targeted for this feature | 16:31 |
sean-k-mooney | an instance action to update user_data | 16:31 |
sean-k-mooney | that either succeeds or does not | 16:31 |
sean-k-mooney | and have that be the only way to update it | 16:31 |
sean-k-mooney | so no updating it via server put | 16:31 |
dansmith | yeah, the user should not see "regenerate config drive" | 16:32 |
dansmith | so it should be "update my user_data" at the external surface for sure | 16:32 |
sean-k-mooney | right | 16:32 |
sean-k-mooney | so if we do that | 16:32 |
sean-k-mooney | that dedicated instance action can "do the right thing" regardless of how the instance is booted | 16:33 |
dansmith | right | 16:33 |
sean-k-mooney | and return a 409 conflict if it's not supported by the driver, or whatever | 16:33 |
jhartkopf | sean-k-mooney: Oh that'd be bad news. Was hoping to get at least the stuff without config drives merged this cycle. | 16:33 |
sean-k-mooney | but it can be atomic | 16:34 |
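(Pulling the instance-action idea together: the user calls "update user_data", the driver capability gates it with a conflict, the change either fully succeeds or leaves the old data in place, and the outcome lands in an action event log the user can inspect, which is the feedback mechanism discussed above. Entirely a hypothetical sketch; the capability key, action name and dict-based driver are illustrative, not Nova's actual interfaces.)

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""


def update_user_data_action(instance, new_user_data, driver, action_log):
    """Dedicated instance action: atomic, gated, and visible.

    Either fully succeeds or leaves the old user_data in place, and
    records the outcome as an (action, result) event.
    """
    if not driver.get("supports_config_drive_regen", False):
        # mirrors "return a 409 conflict if it's not supported"
        raise Conflict("driver cannot regenerate the config drive")
    old = instance.get("user_data")
    instance["user_data"] = new_user_data
    try:
        driver["regen"](instance)
    except Exception:
        instance["user_data"] = old
        action_log.append(("updateUserData", "Error"))
        raise
    action_log.append(("updateUserData", "Success"))
    return instance
```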
JayF | o/, I have a handful of ironic driver patches I'm still trying to get backported thru. They are all clean backports and passing tests. | 16:34 |
JayF | https://review.opendev.org/c/openstack/nova/+/854257 (no live migration tests on ironic driver changes -> yoga) https://review.opendev.org/c/openstack/nova/+/821351 (ignore plug_vifs on ironic driver -> ussuri) https://review.opendev.org/c/openstack/nova/+/853546 (minimize window for a resource provider to be lost -> train) | 16:34 |
bauzas | sean-k-mooney: dansmith: yeah, the user-surfaced API is the userdata, not the configdrive | 16:34 |
bauzas | so the instance action has to be related to it | 16:35 |
sean-k-mooney | JayF: did we land the skip of the migration job yet | 16:35 |
JayF | sean-k-mooney: that's the yoga backport above | 16:35 |
JayF | sean-k-mooney: the version -> master is in | 16:35 |
sean-k-mooney | ah cool | 16:35 |
sean-k-mooney | i was wondering if the master one had landed | 16:36 |
bauzas | jhartkopf: the problem is that we can't introduce this userdata update by splitting out the no-configdrive case, as we just agreed on a brand new API | 16:36 |
JayF | oh, I've already checked all these -- previous branches have merged, these should be good to go. e.g. I'd approve these if they were ironic stable changes | 16:36 |
bauzas | for consistency reasons, we have to hide the configdrive details from the user request | 16:36 |
sean-k-mooney | bauzas: that, and my +2 on the spec was contingent on it not depending on whether you use a configdrive or not | 16:36 |
JayF | and elod helped me clean up the only piece around nova stable policy that's more strict than Ironic :D | 16:36 |
bauzas | sean-k-mooney: those kinds of design problems happening late in the cycle are common | 16:37 |
bauzas | jhartkopf: this isn't at all about your implementation, this is rather about the fact we discovered some design problems we weren't considering at the spec review time | 16:38 |
gibi | we should be better at doing this kind of design discussion before the FF day | 16:38 |
gibi | but I don't know how | 16:38 |
bauzas | honestly, this is the only blueprint that fails this for this cycle AFAICT | 16:38 |
bauzas | gibi: we had precedents | 16:38 |
bauzas | some spec approved and a design question raising late | 16:39 |
jhartkopf | bauzas: I understand that, but it's really unfortunate though | 16:39 |
bauzas | jhartkopf: I don't disagree with you and I understand how much this can be frustrating | 16:39 |
artom | bauzas, wait, so ephemeral disk, PCI, and manial will all make it? | 16:40 |
* artom is frankly astounded | 16:40 | |
artom | *Manila | 16:40 |
gibi | bauzas: I don't think we can detect all the design issues at spec time. But still I think we should be better than doing it at the day of FF | 16:40 |
sean-k-mooney | not sure about manila | 16:40 |
bauzas | jhartkopf: I just hope you can understand that our API resources are important to us and we really care about the UX | 16:40 |
* artom maybe misunderstood "this is the only blueprint that fails this for this cycle AFAICT"? | 16:40 | |
sean-k-mooney | ephemeral also likely not | 16:40 |
bauzas | artom: no, I was mentioning a design issue being raised | 16:40 |
artom | Ah | 16:40 |
bauzas | ephemerals and PCI are on their way but they're big and need a lot of back and forths | 16:41 |
gibi | the only easy answer for me now is that I should have asked dansmith earlier to review the series | 16:41 |
bauzas | but they didn't require a redesign | 16:41 |
bauzas | gibi: well, if you read the spec, I had concerns about the restart | 16:42 |
dansmith | gibi: this is why I'm against yearly releases.. because this stuff, very unfortunately, only comes to a head when there's a deadline... | 16:42 |
gibi | so I should have asked you to review the series too | 16:42 |
bauzas | we just didn't go further thinking about the configdrive problems | 16:42 |
gibi | dansmith: I totally agree that having a more meaningful deadline would help | 16:42 |
sean-k-mooney | dansmith: because that's the forcing function that gets a lot of us to review at the same time | 16:43 |
sean-k-mooney | dansmith: in the absence of an in-person PTG | 16:43 |
dansmith | and I only reviewed it because it was in front of something else that I know we needed to land, which was up against the deadline | 16:43 |
bauzas | and I was on PTO for 3.5 weeks | 16:43 |
bauzas | so blame me too | 16:43 |
dansmith | WE ALL SUCK | 16:43 |
dansmith | there ^ :P | 16:43 |
sean-k-mooney | :) | 16:43 |
gibi | I don't want to blame, I want to find something I or we can do different next time | 16:43 |
jhartkopf | this ^ | 16:44 |
bauzas | everyone has different review sensitivities | 16:44 |
bauzas | for me, I mostly care about upgrades and the controller/compute relationship | 16:44 |
bauzas | but I miss other things | 16:45 |
bauzas | the point is, we're humans | 16:45 |
* artom was completely absent for the entire thing, and thus would like to claim that he does not, in fact, suck. | 16:45 | |
bauzas | artom: you're Quebecian | 16:45 |
bauzas | there is a default fail | 16:45 |
gibi | I don't want to believe that this is OK as is. :) | 16:46 |
artom | You want a Maurice Richard riot v2.0? Because that's how you get a Maurice Richard riot v2.0. | 16:46 |
gibi | I want to improve! | 16:46 |
sean-k-mooney | well i guess where this should have been hashed out comes back to the spec | 16:46 |
bauzas | yup ^ | 16:46 |
sean-k-mooney | and the first red flag should have been when the question of how to regenerate the config drive came up | 16:47 |
sean-k-mooney | after the spec was merged | 16:47 |
gibi | dansmith: we would need a bit more than deadlines; we would need deadlines where missing one forces a patch series to wait 2 months or so | 16:48 |
dansmith | to not be? | 16:48 |
sean-k-mooney | we did actully have such a deadline in the past | 16:49 |
gibi | dansmith: to be. If the deadline only means I can try the same deadline two weeks from now then that does not force me to push for this particular deadline | 16:49 |
dansmith | I think part of the problem is that jhartkopf has to wait 6mon (minimum) before he has a consumable release with this in it | 16:49 |
sean-k-mooney | it was called the non-priority feature proposal deadline | 16:49 |
dansmith | gibi: yeah that's true | 16:49 |
sean-k-mooney | which required any feature not marked as a priority to be proposed by m2 | 16:49 |
dansmith | gibi: two weeks is probably too infrequent to be pressure-inducing | 16:49 |
gibi | dansmith: true, I defaulted to scrum | 16:50 |
dansmith | we should probably retro this later, I have other m3 stuff I need to do | 16:50 |
gibi | sean-k-mooney: that fulfills the big-enough-pain-to-miss requirement. :) | 16:50 |
gibi | sean-k-mooney: I guess it introduced another issue, selecting priorities. But that was maybe less of a problem | 16:51 |
gibi | and we did see that having priorities would help with the rebuild bfv series | 16:52 |
gibi | * would have helped | 16:52 |
gibi | so I'm not against discussing the reintroduction of non-priority feature freeze as one possible option | 16:52 |
sean-k-mooney | yep | 16:52 |
bauzas | noted. | 16:53 |
sean-k-mooney | so way back we used to require non-priority features to merge by m2 | 16:53 |
* dansmith starts singing "the circle of life" in the background | 16:53 | |
JayF | As a mostly-outside observer, this sounds mostly like the standard "10 gallons of work has to fit in a 5 gallon bucket" problem :) | 16:53 |
sean-k-mooney | we then relaxed it to "to be proposed by m2" | 16:53 |
sean-k-mooney | and then removed it | 16:53 |
bauzas | remember, we discussed this at the PTG when I pointed out I was off for three weeks | 16:53 |
gibi | I have to drop, but thanks for the above discussion | 16:54 |
bauzas | but anyway, I personally feel another deadline wouldn't have changed the situation | 16:54 |
gibi | lets continue this on the PTG as part of a retro | 16:54 |
sean-k-mooney | sure | 16:54 |
bauzas | yet again, this was a design concern, missed at the design stage | 16:54 |
dansmith | to be clear, my point about deadlines was that moving to yearly releases would make this a lot worse, not necessarily that more deadlines than we have would *solve* it | 16:54 |
bauzas | dansmith: oh yeah | 16:54 |
sean-k-mooney | dansmith: yep it would | 16:54 |
melwitt | sean-k-mooney: thanks for the link | 17:13 |
opendevreview | Jay Faulkner proposed openstack/nova stable/xena: nova-live-migration tests not needed for Ironic https://review.opendev.org/c/openstack/nova/+/855469 | 17:38 |
*** efried1 is now known as efried | 17:53 | |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 18:14 |
ricolin | sean-k-mooney: bauzas sorry, a pep8 error, can you approve it again, thanks | 18:14 |
opendevreview | Merged openstack/nova master: Heal missing simple PCI allocation in the resource tracker https://review.opendev.org/c/openstack/nova/+/851359 | 18:15 |
opendevreview | Merged openstack/nova master: Heal PCI allocation during resize https://review.opendev.org/c/openstack/nova/+/852396 | 18:16 |
opendevreview | Merged openstack/nova master: Heal allocation for same host resize https://review.opendev.org/c/openstack/nova/+/854822 | 18:16 |
opendevreview | Merged openstack/nova master: Handle PCI dev reconf with allocations https://review.opendev.org/c/openstack/nova/+/852397 | 18:21 |
opendevreview | Merged openstack/nova master: Allow enabling PCI tracking in Placement https://review.opendev.org/c/openstack/nova/+/850468 | 18:21 |
dansmith | ricolin: I got you | 18:25 |
ricolin | thanks dansmith | 18:28 |
bauzas | ricolin: dansmith: done too | 18:52 |
bauzas | fwiw, I'm setting Implemented against all blueprints that have their API changes merged but waiting for client patches | 19:15 |
bauzas | nova in launchpad != novaclient and osc | 19:15 |
opendevreview | Merged openstack/nova stable/ussuri: Ignore plug_vifs on the ironic driver https://review.opendev.org/c/openstack/nova/+/821351 | 19:29 |
bauzas | fwiw, procedural -2 on https://review.opendev.org/c/openstack/nova/+/816157/16 | 19:32 |
JayF | Interesting failure case in stable/yoga CI, I think it's probably something that's actually-broken, but not by my patch: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8b1/854257/2/gate/openstack-tox-py39/8b1323f/testr_results.html | 19:35 |
JayF | Given that all tests that were comparing the contents there are all failing in a similar way, with no human-readable changes | 19:35 |
sean-k-mooney | JayF i have hit that too | 19:36 |
sean-k-mooney | on a completely unrelated patch | 19:36 |
sean-k-mooney | i can't reproduce it locally for what it's worth | 19:36 |
JayF | are you suggesting I just retry then? | 19:36 |
sean-k-mooney | let me check if my other one pass after a retry | 19:36 |
JayF | I can do that, I usually don't pull that lever if it looks nonrandom like this one does, but I don't know enough about nova failure cases to be certain | 19:36 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/855025 failed again | 19:38 |
sean-k-mooney | JayF: same issue | 19:38 |
sean-k-mooney | and its only on py39 | 19:38 |
sean-k-mooney | the py36 test passes | 19:38 |
bauzas | gibi: sean-k-mooney: others, you may be interested in the antelope timeframe https://review.opendev.org/c/openstack/releases/+/850753/5/doc/source/antelope/schedule.rst | 19:38 |
sean-k-mooney | JayF: so its a real blocker | 19:38 |
bauzas | gibi: sean-k-mooney: this sounds like a shorter development cycle | 19:39 |
sean-k-mooney | not related to your patch at all | 19:39 |
sean-k-mooney | i had it fail on all patches in a 5/6 patch series and on the recheck of the first patch | 19:39 |
sean-k-mooney | so it seems to be reproducible in ci | 19:39 |
sean-k-mooney | JayF: i am not sure i'll have time to look at that this week but we likely should track that as a stable gate bug. i'll try and follow up with elodilles about it next week | 19:45 |
JayF | When you say "track that as a stable gate bug", what does that mean in practice for Nova? | 19:46 |
JayF | I know in Ironic, e.g., we have an etherpad we track that in | 19:46 |
JayF | I'm happy to document the issue; I can't sit here and ask for a bajillion stable reviews without being willing to help with CI somewhat :D | 19:46 |
sean-k-mooney | JayF: we have an etherpad and we file a bug with gate blocker in the title | 19:46 |
JayF | Where could I find a link to that etherpad? | 19:46 |
sean-k-mooney | https://etherpad.opendev.org/p/nova-stable-branch-ci | 19:46 |
JayF | ty | 19:47 |
JayF | bugs in nova are launchpad, not storyboard? | 19:48 |
JayF | yep looks like it, filing it and will update the etherpad | 19:48 |
sean-k-mooney | yep launchpad | 19:48 |
sean-k-mooney | we have a gate-failure tag | 19:49 |
opendevreview | Merged openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 19:56 |
sean-k-mooney | JayF: https://tinyurl.com/mr3pjzyn | 19:57 |
JayF | What login would I use for that? | 19:57 |
JayF | I've not used opensearch yet | 19:57 |
sean-k-mooney | openstack openstack | 19:57 |
* JayF hasn't done serious bulk-gate-troubleshooting since the elastic-recheck days | 19:57 | |
sean-k-mooney | its basically the same | 19:57 |
sean-k-mooney | so only 8 hits, 6 are me i think, but i feel like that might be a limitation of the query | 19:58 |
JayF | one of the other two is me lol | 19:59 |
sean-k-mooney | it's showing in the nova cross job | 19:59 |
sean-k-mooney | but this job is openstack-tox-py39 | 19:59 |
sean-k-mooney | so i dont know why that is not showing up | 19:59 |
JayF | it's going to be extremely hard to make a matcher on this that isn't at least possible it'll catch other stuff | 20:01 |
JayF | because this is essentially "it looks like a proper unit test failure but just... isn't" | 20:01 |
sean-k-mooney | nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED | 20:01 |
sean-k-mooney | i should be able to match on that instead | 20:01 |
JayF | Yeah, I've read the docs a couple of times, and I can't figure out how to wildcard out the time in the test run search | 20:13 |
sean-k-mooney | same i think im going to give up for now | 20:13 |
* JayF sometimes wishes these guis were just a terminal with something greppy installed | 20:13 | |
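(Editor's note: the matcher sean-k-mooney and JayF were trying to build above has to ignore the per-run timing in the stestr failure line, since `[0.025453s]` changes on every run. This is a hypothetical local sketch of that idea in Python — stripping the timing before comparing lines — not the actual OpenSearch query syntax, which they were unable to get wildcarding working in.)

```python
import re

# stestr prints failures like:
#   nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED
# The bracketed timing differs on every run, so to recognize the same
# failure across builds we drop it before matching.
TIMING = re.compile(r"\s*\[\d+\.\d+s\]")

def normalize(line: str) -> str:
    """Strip the variable per-test timing so identical failures compare equal."""
    return TIMING.sub("", line)

line = ("nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict"
        " [0.025453s] ... FAILED")
print(normalize(line))
```

With the timing removed, a plain substring match on the normalized line (test id plus `... FAILED`) is enough to count occurrences across job logs, which is all the OpenSearch query needed to do.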
opendevreview | Merged openstack/nova-specs master: Create specs directory for 2023.1 Antelope https://review.opendev.org/c/openstack/nova-specs/+/851007 | 20:15 |
opendevreview | Sofia Enriquez proposed openstack/nova master: WIP: Check NFS protocol https://review.opendev.org/c/openstack/nova/+/854030 | 21:09 |
opendevreview | Sofia Enriquez proposed openstack/nova master: Check NFS protocol https://review.opendev.org/c/openstack/nova/+/854030 | 21:28 |
*** dasm is now known as dasm|off | 21:42 | |
JayF | I filed https://bugs.launchpad.net/nova/+bug/1988482 and am documenting it in that etherpad now. | 21:57 |
JayF | let me know if there's anything I can do to help resolve it, I am a little behind on my usual stuff due to being out sick earlier this week, but if it's not picked up by early next week I might see if I can figure it out | 21:59 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!