Thursday, 2022-09-01

whoami-rajatsean-k-mooney, it is already being tested in tempest-full but the problem is the image is the same before/after rebuild, so it's not testing the actual feature, i.e. reimaging to a new image, hence the new job. not sure if there is a better way to do it05:10
opendevreviewMerged openstack/nova master: Add support for volume backed server rebuild  https://review.opendev.org/c/openstack/nova/+/82036807:44
gibigood morning07:47
opendevreviewMerged openstack/nova master: Add conductor RPC interface for rebuild  https://review.opendev.org/c/openstack/nova/+/83121907:51
opendevreviewMerged openstack/nova master: Add API support for rebuilding BFV instances  https://review.opendev.org/c/openstack/nova/+/83088307:53
bauzasgood morning07:58
Ugglagibi, bauzas good morning08:16
* bauzas needs a serious coffee08:17
bauzasUggla: how was this morning ? :)08:17
bauzaswas Rose happy to see her friends ?08:17
UgglaYes, she's ok and happy because she has the teacher she wanted.08:18
gibi:)08:19
UgglaOne little girl in her group was crying, and Rose gave her hand to go to the classroom.08:21
gibiso nice08:21
bauzasnice and bravo Rose08:24
bauzasmy oldest is still on vacations, she'll start tomorrow08:25
Ugglain short no pb so far. I'll see this afternoon.08:25
* bauzas has to remember that the school times have changed this year 08:25
bauzasif you see me disappearing quickly, that's because I forgot about the time :p08:26
Uggla1st day on a Friday ? This is a warmup day for teachers ? :)08:32
bauzasUggla: no, for the "6e" children (10-11yo) they started today08:33
bauzasfor the older (12-14yo), they start tomorrow08:33
*** akekane_ is now known as abhishekk08:33
Ugglayep strange to start on a Friday. (for your older one)08:33
bauzasfor the "6eme" folks, they only work today and not tomorrow08:35
bauzas("6eme" is first year of middle school)08:36
Ugglaok so only 1 day the first week. Soft start.08:46
sean-k-mooneyit's pretty common here09:19
sean-k-mooneythey give the students who are new to the school 1 day before the rest come back09:19
sean-k-mooneyso they can be shown around without all the other people09:20
sean-k-mooneyso normally here they start on a wednesday: youngest class group, then first half, then the rest09:20
opendevreviewKonrad Gube proposed openstack/nova-specs master: Add API for assisted volume extend  https://review.opendev.org/c/openstack/nova-specs/+/85549009:21
sean-k-mooneymelwitt: ^09:21
sean-k-mooneywhoami-rajat: tempest has an alternative image so we can have two tests: one that reimages to the same image and one that reimages to the alt image https://github.com/openstack/tempest/blob/master/tempest/config.py#L294-L29709:36
sean-k-mooneywhoami-rajat: so you would boot with image_ref and rebuild to image_ref_alt09:36
sean-k-mooneywhoami-rajat: we really only care that the images are different in glance, they can both be cirros and it's fine09:37
sean-k-mooneythe mechanics are the same whether it's the same image or a different one with the new microversion09:38
sean-k-mooneywith the old microversion you can just use the required alt image to assert it's not allowed09:38
sean-k-mooneyand the same image to assert it preserved data09:38
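A minimal tempest-style sketch of the two tests described above, assuming tempest's standard CONF.compute.image_ref / image_ref_alt options; the test class and method here are illustrative, not the actual patch under review:

    from tempest import config
    from tempest.api.compute import base
    from tempest.common import waiters

    CONF = config.CONF

    class RebuildToAltImageTest(base.BaseV2ComputeTest):

        def test_rebuild_to_alt_image(self):
            # boot from the primary image, then rebuild to the alt one;
            # only the glance IDs need to differ, both can be cirros
            server = self.create_test_server(image_id=CONF.compute.image_ref,
                                             wait_until='ACTIVE')
            self.servers_client.rebuild_server(server['id'],
                                               CONF.compute.image_ref_alt)
            waiters.wait_for_server_status(self.servers_client,
                                           server['id'], 'ACTIVE')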
*** brinzhang_ is now known as brinzhang09:46
whoami-rajatsean-k-mooney, those parameters point to the same image in the default tempest job; see this tempest.conf from a tempest-full run, both image_ref and image_ref_alt point to the same image https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83f/831018/24/check/tempest-full-py3/83f6b4a/controller/logs/tempest_conf.txt09:56
whoami-rajatsean-k-mooney, I'm configuring different images for those parameters in my job, see tempest.conf for my new job https://a59fb54e72bca9b30065-6505ce46ad1aa88851df8d527471f9dc.ssl.cf1.rackcdn.com/831018/24/check/tempest-rebuild-volume-backed/c4755d7/controller/logs/tempest_conf.txt09:56
whoami-rajatIMAGE_URLS parameter in devstack enables setting different images for those parameters09:57
whoami-rajatby default only one image is created and set as both image_ref and image_ref_alt09:57
whoami-rajatthat's why i configured a new job with this parameter; i wasn't sure if modifying existing jobs for my test run was a valid thing, so i created a new job09:58
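For reference, the devstack knob in question is set in local.conf; a sketch, where the second URL is illustrative and only needs a distinct file name so devstack registers two glance images:

    [[local|localrc]]
    IMAGE_URLS="http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img,https://example.com/cirros-0.5.2-copy.img"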
gibistephenfin: hi! how do you feel, will you have time to review the PCI series up until https://review.opendev.org/c/openstack/nova/+/850468/20 before the FF deadline?10:00
sean-k-mooneywhoami-rajat: yes but it would be better to configure devstack to upload the cirros image twice with different names and then use that 10:09
sean-k-mooneyso i think yes, modifying the existing jobs is totally valid10:09
sean-k-mooneyand would get more coverage of the feature. that can be done in the future10:10
sean-k-mooneywhoami-rajat: also congrats, it merged in nova this morning10:10
whoami-rajathmm, ack, will try that10:10
sean-k-mooneywhoami-rajat: sorry it took so long10:10
whoami-rajatsean-k-mooney, thanks and thank you to all the nova team bauzas gibi dansmith for helping with that!10:10
whoami-rajatno problem :)10:11
sean-k-mooneydid you see my comments on teh osc patch10:11
whoami-rajatnope, looking10:11
sean-k-mooneyack you duplicated a test but forgot to rename it in the copy paste10:11
sean-k-mooneytrivial fix10:11
gibiwhoami-rajat: you are welcome10:11
whoami-rajatsean-k-mooney, oh i addressed that yesterday only, i thought there was a new comment10:12
sean-k-mooneyack, no i have not reviewed since you updated it10:12
sean-k-mooneyill go do that now10:12
whoami-rajatthanks10:12
whoami-rajathttps://review.opendev.org/c/openstack/python-openstackclient/+/831014 -- for quick reference10:12
sean-k-mooneyalready reviewed10:13
sean-k-mooneyit looks good to me. stephenfin ^ you're an osc core, care to take a look10:14
opendevreviewAmit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail  https://review.opendev.org/c/openstack/nova/+/85449910:17
opendevreviewAmit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration  https://review.opendev.org/c/openstack/nova/+/79113510:17
sean-k-mooneyricolin: can you update https://review.opendev.org/c/openstack/nova/+/844507/18/nova/virt/libvirt/driver.py#1220211:14
sean-k-mooneyif you can make that change im +2 on both patches11:15
sean-k-mooneybauzas: ^11:15
sean-k-mooneyill see if i can quickly test your changes in devstack too11:15
sean-k-mooneyto confirm this works as intended11:15
bauzassean-k-mooney: I can review the traits change once ricolin updates it12:46
ricolinThanks sean-k-mooney: bauzas will do it ASAP12:46
bauzascool12:46
sean-k-mooneyin its current form it's not wrong, it's just overcomplicated12:46
sean-k-mooneyricolin: i think if you address that everything else looks ok12:47
sean-k-mooneyand it can land before FF12:47
ricolinsean-k-mooney: sounds great12:47
sean-k-mooneyi have not had a chance to test it locally yet but i expect it to work12:47
opendevreviewAmit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail  https://review.opendev.org/c/openstack/nova/+/85449912:47
opendevreviewAmit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration  https://review.opendev.org/c/openstack/nova/+/79113512:47
sean-k-mooneyricolin: i assume you have booted a vm with this code and validated the iommu is present12:48
*** dasm|off is now known as dasm12:54
opendevreviewRico Lin proposed openstack/nova master: Add traits for viommu model  https://review.opendev.org/c/openstack/nova/+/84450712:55
ricolinsean-k-mooney: done12:56
ricolinand yes, I'm sure IOMMU is there. Just not sure I test Traits in the right way12:58
bauzasricolin: sean-k-mooney: reviewing13:01
ricolinthanks bauzas 13:02
sean-k-mooneybauzas: gibi  if ye want to proceed with ^ im more or less happy with it. the release note should be in the second patch13:23
sean-k-mooneybut as long as we land both of them its not an issue13:24
bauzassean-k-mooney: I just said +1 for an upgrade question13:24
sean-k-mooneyi replied13:24
sean-k-mooneythats intentional13:24
sean-k-mooneyalthough dansmith might point out we could have also added a min compute service check for this instead of using the traits13:25
sean-k-mooneywe normally did not do that in the past however13:26
sean-k-mooneyso i think this is all good to go13:26
gibithe trait based capability scheduling looks OK to me13:26
gibiI think dansmith had issues using the capability trait outside of the scheduling13:26
dansmithif we're already scheduling, then traits make plenty of sense13:26
dansmithright13:26
sean-k-mooneyyep its in the existing pre filter13:27
gibibauzas: do you need my +2 or you will send it in?13:27
sean-k-mooneybefore we started to do these checks with placement we would have landed on the host and failed there with "hypervisor too old" or similar13:27
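Roughly how such a placement prefilter works, as a sketch loosely modeled on nova/scheduler/request_filter.py; the hw:viommu_model extra spec and the trait name are taken from the patches under review and may differ in the merged version:

    import os_traits as ot

    def viommu_model_filter(ctxt, request_spec):
        # if the flavor asks for a viommu model, require the capability
        # trait so placement only returns computes whose virt driver
        # reports support, instead of failing late on the chosen host
        model = request_spec.flavor.extra_specs.get('hw:viommu_model')
        if model:
            request_spec.root_required.add(ot.COMPUTE_VIOMMU_MODEL_VIRTIO)
            return True
        return False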
bauzassorry folks was in 1:1 meeting13:28
bauzassean-k-mooney: thanks, will reply then with +213:28
gibiack, then I'm not needed there :)13:28
bauzasit was just an open thought13:28
bauzasI just want to make sure that operators know they need to upgrade all their hosts if so13:28
bauzasbut that's understandable13:29
sean-k-mooneywell its as you said13:29
sean-k-mooneythey don't actually have to, but it will be capacity limited13:29
sean-k-mooneybut that's kind of to be expected, that you can't use new features on old nodes13:30
bauzasyup, agreed13:30
bauzasat least with traits13:30
sean-k-mooneywell even without13:30
bauzaswithout, we ask for compute service checks in general13:30
sean-k-mooneyyep13:31
sean-k-mooneybut that is why i said the traits patch should be first13:31
bauzaswell, if both merge for Zed, I'm cool13:31
sean-k-mooneyand the other one second, and why i said in the current order they need to merge together13:31
bauzasyes and no13:31
sean-k-mooneybauzas: let me rephrase: i dont want to merge only one of those two patches13:32
bauzasyou could merge the first, it's just that you won't be able to get the feature until you upgrade all 13:32
dansmithgibi: sean-k-mooney: are we FFEing the user_data stuff such that I should try to bang out the rest of the RPC stuff ASAP?13:32
bauzaswon't be able to *be sure* to get the feature13:32
bauzasdansmith: eeek, context ?13:32
bauzasoh, the rpc call13:32
bauzaslemme just send to the gate ricolin's work13:33
sean-k-mooneybauzas: that and i guess i need to rev my follow-up patch with actual tests13:33
sean-k-mooneydansmith: im not against doing that if you think you will have time 13:33
sean-k-mooneybut that's probably the main RFE that i think FFE might make sense for13:34
dansmithsean-k-mooney: I was expecting to see a rev of the patch to move the regen to after the instance is destroyed and fix the actual writing that was failing13:34
dansmithso I hadn't rebased my RPC stuff yet13:34
sean-k-mooneythe only other one might be the PCI series but i have not looked at it for two days13:34
dansmithbut that has to happen first right?13:34
sean-k-mooneyoh i wrote a patch to fix config drive13:34
dansmithoh is that this? https://review.opendev.org/c/openstack/nova/+/855351/113:35
sean-k-mooneyyes13:35
gibisean-k-mooney: you are +2 up until the healing patches in the PCI series and that is the realistic target there. If stephenfin has no time to review them today then I will ask for an FFE for that. We let the scheduling part slip in any case13:35
bauzasthe mutable userdata has API impact13:35
dansmithokay but that would need to be squashed or go in front right?13:35
bauzasbut I'm OK with FFE'ing if needed13:35
dansmithbauzas: it would also be about half not-very-reviewed code at this point based on the look of the configdrive patch13:36
dansmithso not really "just didn't get reviewed in time"13:36
sean-k-mooneydansmith: yes, or they split the patch into non-config-drive and config-drive parts13:36
dansmiththat would mean two microversions for effectively the same thing13:37
sean-k-mooneydansmith: i basically started my patch so they would have a reference for what needed to be done13:37
dansmithunless we do the config regen and rpc ahead of exposing it in the api13:37
bauzasdansmith: I think we reviewed it well, but we got a bone13:37
bauzasso, I'm OK with giving more time to review that bone fix13:37
dansmithbauzas: I'm saying the code that needs to be written to make it landable would probably double the actual code in the patch13:38
sean-k-mooneydansmith: yep we could do that, so initially it would not be invokable from the api but the code would be in place13:38
dansmithsean-k-mooney: yeah that seems better to me13:38
bauzasdansmith: then, we need to take this extra time to balance the risks and maybe not merge it 13:38
bauzasor decide we only merge half the things13:38
sean-k-mooneydansmith: so you could probably combine your rpc code into the patch i started, then we could flip the order of them13:39
gibiwe could merge the part that supports user data update for non config drive instances.13:39
sean-k-mooneybut that also means we need other people to agree to review this13:39
sean-k-mooneysince you and i are basically out at that point13:39
gibiI can review the patches (I will be around until 18:00 CEST today but I can spend time early tomorrow too)13:40
dansmithsean-k-mooney: right, that's a problem too13:40
dansmithsean-k-mooney: so a patch in front that adds the flag to the reboot call, then the patch to make it regeneratable, then the api patch to do it for both types would be cleanest I think13:41
sean-k-mooneydansmith: ya that sounds like a plan13:41
sean-k-mooneyit keeps just one microversion13:42
sean-k-mooneyi can go write the tests needed to make the rebuild patch complete if we agree to proceed in this direction13:42
dansmithsean-k-mooney: ack, I'll rebase my rpc stuff13:43
sean-k-mooneydansmith: it would probably make sense for you to write your patch directly off master and i could rebase and flip the order when you're done13:44
dansmithsean-k-mooney: yeah13:44
bauzasI'm OK with having 3 patches, the API one being the latest13:44
bauzasand this being FFE, so we have 1 week to correctly look at all of this13:44
bauzaselodilles: when is the exact deadline for cycle highlights ?13:47
bauzastoday or tomorrow ?13:47
bauzasone day, the marketing team will understand how crazy it is to ask for feature docs deliveries at the FeatureFreeze (and even before)13:48
bauzaswhile RC1 time would be so much easier13:48
bauzasit's just that I need to take 1 hour of my time to write something while we're on fire13:49
sean-k-mooneyon the plus side tempest-integrated-compute forces config drive and that passed so at least that change in logic works for normal boots https://zuul.opendev.org/t/openstack/build/98cfa86b93974d79b1d43c29433d7e48/log/controller/logs/etc/nova/nova-cpu_conf.txt#1713:50
sean-k-mooneyi also checked that locally but nice to see it in ci too.13:52
elodillesbauzas: i think there is no strict deadline for today, we will merge it tomorrow if it is only ready around that time :)13:53
sean-k-mooneyelodilles: i think the reminder was asking for it yesterday13:54
opendevreviewJustas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support  https://review.opendev.org/c/openstack/nova-specs/+/85551413:54
sean-k-mooneybut i guess bauzas' point was we might not know the full feature set until we hit FF13:54
bauzaselodilles: surely there is no problem, but I wonder why such a rush13:54
sean-k-mooneyor RC1 if we have FFEs13:55
bauzaselodilles: I just replied to your email fwiw13:55
bauzasI guess my point is not if I can deliver later13:55
bauzasbut rather asking for an official move to RC113:55
sean-k-mooneyelodilles: oh you just said this week13:56
bauzassean-k-mooney: https://releases.openstack.org/zed/schedule.html#cycle-highlights13:56
sean-k-mooneyyep13:56
sean-k-mooneylooking at it now13:57
bauzasthe schedule officially says the week of Zed-313:57
bauzasthis is non-trivial13:57
sean-k-mooneyi can understand that they might want to have them ready for RC113:57
sean-k-mooneybut the week between the two would be a better compromise13:58
dansmithsean-k-mooney: so I'm bringing in the libvirt driver reconfigure_configdrive() method from the original patch, but I should move that until after the self.destroy() yeah?14:01
sean-k-mooneywell it needs to get called via the callback14:02
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/855351/1/nova/virt/libvirt/driver.py#397814:03
dansmithI guess I need to look at your patch14:03
elodillesbauzas: if everything goes according to plan, then by FF a team knows every feature that was merged. of course, FFEs can alter the picture, but that i think is acceptable. have the most part of the highlights merged and update the rest if needed as soon as possible. IMO.14:03
sean-k-mooneytl;dr building the config drive needs info generated by libvirt in the xml14:03
elodillesbauzas: "The Zed-3 milestone marks feature freeze for projects following the release:cycle-with-rc model. No featureful patch should be landed after this point. Exceptions may be granted by the project PTL."14:03
sean-k-mooneyelodilles: yes but generally review bandwidth consumes all the ptl's time14:04
sean-k-mooneyelodilles: so the project team does not have time to reflect on the features that landed over the cycle to see what should be highlighted14:04
bauzassean-k-mooney: that's my point14:05
dansmithsean-k-mooney: ah okay so actually yours should go first then14:05
bauzasthis is counterproductive for me to block 2 hours of my time on a FF Thursday in order to write a document I could be able to write the week after14:05
dansmithotherwise I'm plumbing rpc that lies about what it does14:05
bauzasthe rpc thing concerns me14:06
bauzasdo we need to have a service check on the API ?14:06
sean-k-mooneyi think we can do them in any order initially and just fix it once we have the first version of all the patches14:06
dansmithbauzas: yes because it's a cast14:06
bauzasdansmith: agreed, particularly because of the non-blocking RPC call then14:07
dansmithbauzas: the original patch was checking with tempest, but if we have an rpc version, we can use the version pin, or check service version14:07
bauzasand an ERROR would be terrible to manage14:07
dansmithbauzas: since this is a new param, I was going to make rpcapi refuse to send if you ask for the new flag, per usual, so the api will catch and handle that, knowing it was not allowed14:08
elodillesbauzas: ack, i see. to tell you the truth i don't know when exactly the marketing team processes the cycle highlights, so maybe we have some days here until they'll need it (especially due to FFEs)14:08
dansmithbut all that has to get tested for sure14:08
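The usual nova rpcapi pattern referred to here looks roughly like this; a sketch only, where the version numbers, the flag name and the exception are illustrative rather than the actual patch:

    def reboot_instance(self, ctxt, instance, reboot_type,
                        recreate_configdrive=False):
        version = '6.2'  # hypothetical version that adds the new flag
        kwargs = {'reboot_type': reboot_type,
                  'recreate_configdrive': recreate_configdrive}
        client = self.router.client(ctxt)
        if not client.can_send_version(version):
            if recreate_configdrive:
                # refuse to silently drop the request on old computes; the
                # API catches this and reports the feature as unavailable
                raise RecreateConfigDriveNotSupported()
            version = '6.1'
            del kwargs['recreate_configdrive']
        cctxt = client.prepare(server=instance.host, version=version)
        cctxt.cast(ctxt, 'reboot_instance', instance=instance, **kwargs)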
sean-k-mooneyelodilles: by the way can you add https://review.opendev.org/c/openstack/nova/+/833435 to your list whenever you have time14:09
elodillessean-k-mooney: ack, looking14:10
elodillesbauzas: i'll answer to your mail as well :)14:11
sean-k-mooneyits just a backport making its way down to train so not urgent; i came across the downstream bug and realised i should probably follow up14:11
bauzasdansmith: oh, right14:11
bauzasdansmith: just exception handling at the API level, you're right14:12
bauzaselodilles: don't get me wrong, I wasn't grumbling, I was just pointing out my problem for discussion14:12
bauzasmaybe the foundation folks don't exactly get what's happening at FF14:13
bauzasin particular with the big projects14:13
elodillesbauzas: i summarized what i think about this in my mail. let's see if someone from Marketing or Release team has different view o:)14:27
elodillessean-k-mooney: the patch looks valid, +2'd14:28
sean-k-mooneyill slowly refresh those patches as they merge, but given we are at FF im also trying to limit the ci usage 14:33
elodillessean-k-mooney: ++14:35
opendevreviewsean mooney proposed openstack/nova master: support configdrive rebuilding  https://review.opendev.org/c/openstack/nova/+/85535114:53
opendevreviewsean mooney proposed openstack/nova master: add support for updating server's user_data  https://review.opendev.org/c/openstack/nova/+/81615714:53
sean-k-mooneydansmith: that wont work ^ but they're in the correct order now and i can start on the tests14:54
dansmithsean-k-mooney: okay but we can't land an RPC version that says it does a thing when it doesn't14:54
dansmithsean-k-mooney: so I think your thing needs to be in front with "if False" or something14:55
dansmithand I'll change that to "if the thing is enabled:"14:55
sean-k-mooneydansmith: something like https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#387914:55
sean-k-mooneywhich is never set anywhere to True14:56
sean-k-mooneyand this if https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#501114:56
dansmithyeah14:56
sean-k-mooneyso i was thinking you could start a patch on top of the first one, or if you want i can probably do it based on the termbin you provided14:57
sean-k-mooneyyou will be quicker at that than me since its been a while since i touched rpc, but either way14:57
sean-k-mooneyim going to work on the base patch and get it to pass unit and func tests14:58
dansmithyeah, I'm working on the rpc stuff already14:58
dansmithI think that base patch will work now14:58
sean-k-mooneyit should work in devstack; im expecting the unit tests to bitch about the extra parameter to _hard_reboot14:59
bauzashmm, are configdrives only supported by libvirt?15:04
bauzasI'm afraid of https://review.opendev.org/c/openstack/nova/+/816157/16/nova/virt/driver.py15:04
sean-k-mooneytechnically they are supported by other drivers i think15:04
bauzastrampling all our drivers but libvirt15:04
gibibauzas: nope, it is supported by other drivers too15:04
bauzasthat's super late for them to support this15:04
sean-k-mooneythey dont need to15:05
sean-k-mooneythey just wont be able to use this feature15:05
sean-k-mooneyi think that was in the spec15:05
sean-k-mooneyeither way this was only planned to be implemented by libvirt15:05
sean-k-mooneyat least for now15:05
bauzashmmmm15:05
gibiso we need to detect the capability in the API and reject the reboot request 15:06
bauzasyup15:06
sean-k-mooneythats in the patch currently, yes15:06
gibiif there is user_data update but the virt driver has no capability to regenerate15:06
bauzasit wasn't stated as a virt driver dependency in the spec https://review.opendev.org/c/openstack/nova-specs/+/816542/7/specs/zed/approved/update-userdata.rst15:07
sean-k-mooneyby the way we can now drop the object changes15:07
bauzasthis change seems to me more and more fragile15:07
sean-k-mooneybauzas: well im not going to have time to work on ironic by monday15:07
bauzassean-k-mooney: I'm not asking you to do such things15:07
gibi(it is not just ironic)15:08
sean-k-mooneyso hehe i think we were reviewing this as libvirt only, or at least i was15:08
bauzasbut I point out how fragile this is15:08
sean-k-mooneygibi: ya i know15:08
bauzasand again, this wasn't explained in the spec15:08
sean-k-mooneyironic would just be the most painful15:08
sean-k-mooneybauzas: originally in the spec they planned to only do it for the metadata api15:08
bauzasI was having concerns with this spec because I thought it wasn't really a needed usecase15:08
sean-k-mooneybut then we pointed out config drive existed and needed to also work15:09
bauzassean-k-mooney: no, they mentioned configdrives too15:09
sean-k-mooneybauzas: no i asked them to add that15:09
sean-k-mooneythey wanted api only15:09
bauzassean-k-mooney: yes, but I also said it was a problem15:09
sean-k-mooneyi did not want this to not work if you used config drive 15:09
sean-k-mooneyso as it stands any driver that does not support the new method will not report the trait15:10
sean-k-mooneythat needs to be called out in the api ref15:10
sean-k-mooneyor other docs for this15:10
sean-k-mooneywe could put it in the driver support matrix i guess15:10
sean-k-mooneythat's probably better long term15:10
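The trait reporting discussed here piggybacks on nova's capability-to-trait mapping; a self-contained sketch where the capability and trait names are hypothetical:

    # each "supports_x" capability a driver declares maps to a COMPUTE_*
    # trait reported on the compute node resource provider in placement
    CAPABILITY_TRAITS_MAP = {
        'supports_rebuild_configdrive': 'COMPUTE_REBUILD_CONFIGDRIVE',
    }

    class SketchDriver:
        capabilities = {'supports_rebuild_configdrive': True}

        def capabilities_as_traits(self):
            return {CAPABILITY_TRAITS_MAP[cap]: enabled
                    for cap, enabled in self.capabilities.items()
                    if cap in CAPABILITY_TRAITS_MAP}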
sean-k-mooneydansmith: only 6 failures for the extra parameter15:14
sean-k-mooneyi actually need the driver api change in the first patch too so ill add that15:14
gibiI'm wondering about the virt driver api change15:15
dansmithbut bauzas' point about this not being supported by other drivers is legit15:15
dansmithhaving features that look like swiss cheese is very confusing for users15:15
sean-k-mooneygibi: i could drop that now actually15:15
gibiI asked for it originally as I thought the reboot + regenerate logic can be orchestrated from the compute manager15:15
sean-k-mooneyif we just have the parameter15:15
dansmitheven if we reject for the api caller, it's still annoying to say "reboot works, but not reboot with userdata, but only if you're using configdrive"15:16
gibibut the actual implementation calls the regenerate call from the virt driver15:16
dansmithgibi: reboot is a single call to virt,15:16
gibiso it is less useful to have a virt driver method 15:16
dansmithand the regenerate has to happen in the middle15:16
gibidansmith: I see now15:16
dansmithit's pretty unideal in general for sure15:16
dansmithjust saying, I think that's why it was done this way15:16
gibithis is why I'm pointing out that the separate virt driver method is probably not needed15:17
sean-k-mooneyso we dont need a new public method in the virt driver api15:17
sean-k-mooneyits just a new parameter to hard_reboot right15:17
dansmithno, the publicness isn't the concern I think15:17
sean-k-mooneyright i know, its the uniformity of the api15:17
sean-k-mooneyand not having it depend on the backend15:17
bauzasagreed with dansmith my concern isn't the publicness15:17
bauzasthis is about the fact we depend on virt drivers15:18
gibiit would be nice to do the reboot in 3 generic steps from the compute manager: stop, regenerate, start. But this is a big change15:18
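gibi's three-step idea as a compute-manager sketch; power_off/power_on are real driver calls, while the regenerate helper is hypothetical - this is the refactor the discussion below concludes is too big:

    def _reboot_regenerating_configdrive(self, context, instance,
                                         network_info, block_device_info):
        # stop the guest, rebuild the config drive while it is down,
        # then start it again - no driver-specific reboot hook needed
        self.driver.power_off(instance)
        self._regenerate_config_drive(context, instance)  # hypothetical
        self.driver.power_on(context, instance, network_info,
                             block_device_info)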
sean-k-mooneyyep i get that but we have many things that depend on the virt driver15:18
dansmithI definitely think these kinds of pinpoint features that look generic but only work in libvirt are a problem15:18
bauzassean-k-mooney: well, do all the virt drivers support creating *or* updating configdrives?15:19
dansmithcreate yes15:19
dansmithnobody supports update today AFAIK15:19
sean-k-mooneyvmware ironic libvirt yes15:19
bauzasdansmith: that's my point, I wasn't seeing the dependency when reviewing the spec15:19
sean-k-mooneyfor create15:19
sean-k-mooneynot sure about powervm15:19
dansmithgibi: the problem with that is things like vmware where I'm not sure where the config drive actually gets created, or even if generating it on the compute node and passing it to the virt driver as a file would necessarily be reasonable15:19
dansmithsince it's a remote thing15:19
sean-k-mooneyi can take a look and see what would be required for other driver quickly15:20
sean-k-mooneyironic is the hard one15:20
sean-k-mooneythey pass the config drive via ipa15:20
gibidansmith: I see15:21
dansmithlike I think ironic passes the configdrive as a string to the api,15:21
dansmithnever generates it on disk15:21
sean-k-mooneyah that might be how nova passes it to ironic, but then ironic connects it to the vm via ipa or by mounting it15:21
dansmithyeah15:21
dansmithit serializes it and then ironic writes it remotely I think15:22
dansmithnow, if compute generated the file I guess the driver could read and send it15:22
dansmithbut still, those are big changes requiring lots of testing15:22
sean-k-mooneyso really we would want the regeneration logic to be here https://github.com/openstack/nova/blob/master/nova/virt/configdrive.py15:22
sean-k-mooneyin the generic shared part15:22
sean-k-mooneyalthough maybe not15:23
sean-k-mooneythat does not quite do what i was expecting15:23
bauzasI tend to agree with dansmith, we're playing with guns15:23
dansmithwell, I like playing with guns, but maybe.. playing with fire? :)15:23
bauzasif we try to update a configdrive and this doesn't work, then our user won't be happy at all15:24
bauzasdansmith: what you prefer15:24
bauzasor playing with teslas15:24
dansmithas long as you mean "teslas, the unit of charge, meaning playing with dangerous high-voltage" .. then sure :D15:25
bauzaspoint is, we need to only accept updating the userdata if the configdrive correctly regenerated15:25
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1060-L1092 15:25
sean-k-mooneyso ironic just turns it into a string, yes15:25
bauzasdansmith: like "driving a tesla with a 10% battery for 100km"15:25
dansmithoh I see15:25
dansmiththat *is* scary :)15:26
bauzasthis *may* work 15:26
bauzasbut before driving, you need to ensure you can do it15:26
bauzasthe same goes with configdrives15:26
sean-k-mooneyand they pass it in spawn and rebuild https://github.com/openstack/nova/blob/bcdf5988f6ae902dba9b41144a7b4a60688b627c/nova/virt/ironic/driver.py#L1189-L119115:26
bauzasyou only accept the userdata to be updated once you're sure your configdrive is correctly regenerated15:26
dansmithso the feeling I'm getting here is that there are a lot of open questions at this point15:27
bauzassince the user doesn't have an idea whether the userdata is updated and only relies on the fact he/she restarted the instance15:27
sean-k-mooneybauzas: well that is already a problem today15:27
dansmithsean-k-mooney: no it's not15:27
bauzastoday the userdata is immutable15:27
dansmithsean-k-mooney: if you can only provide userdata during create, it's clear what userdata the instance sees15:27
bauzasthat reminds me of the discussions we had when reviewing server groups /PUT15:28
sean-k-mooneyit is: the config drive is not updated today if you attach interfaces or volumes or update the server metadata15:28
sean-k-mooneythe user data is not stale15:28
sean-k-mooneybut in general the config drive can be15:28
dansmithconfigdrive != userdata15:28
sean-k-mooneyyep that is my point15:28
sean-k-mooneythe config drive can be stale today the user data is not15:28
sean-k-mooneybut that is only because its immutable15:29
dansmithright15:29
dansmithnot ideal, but also not ambiguous15:29
dansmithwhich is bauzas' point I think15:29
bauzasyup15:29
bauzassince it's immutable, this isn't a problem15:29
sean-k-mooneysure but its expected that the user data script will likely depend on the other data in the config drive15:29
sean-k-mooneyspecifically the device role tagging info15:29
opendevreviewDan Smith proposed openstack/nova master: WIP: Add recreate_configdrive to reboot  https://review.opendev.org/c/openstack/nova/+/85552915:29
dansmithsean-k-mooney: here's the rpc stuff ^ passing tests, not wired into anything else15:30
dansmithjust so it's available15:30
sean-k-mooneydansmith: but to your point it does sound like there is enough stuff that we wont have this mergeable even with an FFE15:30
dansmithneeds more test coverage and an assert on soft I think15:30
bauzassean-k-mooney: my personal opinion is that the more I review this API change, the more I think I discover points we missed15:33
bauzashence my concern15:33
dansmithso I should probably rebase that on your regen configdrive patch and I guess I can rebase the api bit on top of that and trivially wire it up15:33
bauzasthe configdrive regeneration is already a concern (making sure we validate it synchronously with the userdata update)15:33
bauzasthe fact that we leave other virt drivers in the weeds is another concern15:34
bauzasboth weren't discussed at the PTG15:34
dansmithso on that,15:34
bauzasand when I reviewed the spec, I left enough comments explaining why I thought the reboot solution was fragile15:34
dansmithare we really expecting anyone to update vmware or hyperv to do this?15:34
bauzasreasonably not15:35
dansmithironic perhaps, but for the above patch,15:35
dansmithI was looking at those other drivers and wondering when they were last touched15:35
bauzassurely, but we haven't even asked them to look 15:35
dansmithokay I guess we've had changes in 2022 for vmware15:35
dansmithso maybe doable15:35
dansmithnot much for hyperv though15:36
dansmithso I don't see any configdrive stuff in hyperv at first glance15:37
bauzasfair enough, but then question15:37
bauzasI'm an user15:37
bauzasI wanna update my pet's userdata15:38
bauzasI do it15:38
bauzasand then I don't understand, my f*** script doesn't run as expected when I restart the guest15:38
bauzasshall I open a ticket ?15:38
dansmithoh it's in hyperv, just in vmops.py15:39
bauzasoh, simple, your cloud was running some driver on that host that was having trouble with updating the configdrive15:39
bauzaseither because it was a legacy driver15:39
bauzasor because something (like a perm error) happened15:39
bauzashonestly, this spec was just about touching an internal Nova DB record15:40
bauzasnow, we're pulling way more than that15:40
dansmithyeah, it's becoming more about configdrive than anything else15:40
sean-k-mooneybauzas: im in two conversations, but from my perspective i dont think we have really discovered anything bar the rpc change. i reviewed the spec with the understanding it was scoped to libvirt15:41
bauzasyet again, can't we just allow updating the instance's userdata only if it's not configdrive-driven, and let people who care about configdrives deal with the complexity one day or another?15:42
dansmithI dunno,  I can see the argument for this being scoped to just one driver, but for something this fundamental it seems pretty unfortunate to say it's only for libvirt15:42
bauzasI'm pretty sure we wouldn't have this back-to-back15:43
dansmithso we reject the api call if configdrive=True always?15:44
bauzasthat's my question15:44
dansmithdoesn't that get closer to your concern on the spec that people think they have updated their instance but didn't reboot it and don't realize that cloud-init doesn't update their stuff?15:45
dansmithI mean, that's just a misunderstanding of cloud-init, but I thought that was your concern15:45
dansmithanyway, I dunno15:45
bauzasif the update failed, they know the instance userdata wasn't updated15:45
bauzasthey would know this was because of the configdrive if they asked for it via boot params15:46
bauzasbut they wouldn't know if the cloud defaulted to force configdrives or if the image was asking for it15:47
dansmithright, it's not always their choice15:47
bauzasanyway, I'm just offering a trade-off because I'm not really happy with the current state of the series15:47
bauzasalso, is the owner of the patch around ? 15:48
bauzasor are we offering our help for free ?15:48
dansmithwho is asking for this btw?15:48
dansmithaside from the author15:48
bauzasa private clould company15:49
bauzasInovex15:49
jhartkopfI'm here15:49
bauzascool, glad to hear you jhartkopf15:49
bauzasjhartkopf: lemme summarize15:49
dansmithjhartkopf: do you care about the configdrive case?15:49
bauzaswe're having a problem with https://review.opendev.org/c/openstack/nova/+/816157/1615:49
bauzasand we don't know how to move forward15:49
jhartkopfyep I quickly read your discussion15:50
bauzasthere is a proposed solution, but this is premature work and we're very late in the cycle15:50
bauzaswe're also creating a dependency on the virt drivers and unfortunately, Hyper-V, VMWare and Ironic could be impacted if we merge15:50
bauzaslastly, I have concerns about ensuring that the userdata API query is tied to the success of the configdrive regen op15:51
bauzaswhich would require the API call to be synchronous on a configdrive regen15:51
bauzasso, my question is15:52
* sean-k-mooney i dont think its fair to say the other virt drivers would be impacted15:52
bauzasdo you really care of configdrives?15:52
dansmithbauzas: we can't make the api call synchronous for config drive regen15:53
dansmiththat could take a long time, be dependent on the IO load on the compute, etc15:53
sean-k-mooneyi dont think that was an option15:53
bauzassean-k-mooney: I think it would be fair to say that this feature wouldn't be available to the other drivers while apparently they already support configdrive creation15:53
sean-k-mooneybauzas: yes that is correct15:53
bauzasdansmith: right, I was pointing out loudly the problem15:53
dansmithah okay15:54
bauzasthis is a design problem15:54
bauzasconfigdrives take time to regenerate15:54
bauzason the other hand, users have an API for updating userdata15:54
bauzasthey just assume they just have to restart their guests in order to have the latest update15:55
jhartkopfbauzas: Initially we did not even want to touch config drives at all. We'd only like to update user data in the metadata service really.15:55
sean-k-mooneyright but the api prevents you updating the user data if the instance has a config drive, outside of hard_reboot15:55
bauzasI know15:55
stephenfingibi: Sorry for the delay. I'd run through most of those PCI patches yesterday but didn't leave reviews until I got to the end of the series. Done now (y)15:55
gibistephenfin: much appreciated. I will look 15:55
stephenfinI went as far as sean-k-mooney did. Could go further but I see -1's from you15:55
gibistephenfin: yes, I left the -1s there as there is a bug in the scheduling part15:56
gibistephenfin: I think it is totally OK to merge it up until https://review.opendev.org/c/openstack/nova/+/850468/2015:56
jhartkopfWhat's up with the plan you guys proposed yesterday to split the patch and only allow updating instances without config drive for now?15:57
gibistephenfin: the rest slips to AA15:57
stephenfinsounds reasonable to me15:57
sean-k-mooneyjhartkopf: i would prefer not to do that if we can avoid it but its an option15:58
sean-k-mooneyi think making this feature depend on whether config drive is used or not is worse than only supporting it for some virt drivers15:58
gibistephenfin: there are two FUPs at the top. I will move them to be on the mergeable part of the series15:58
sean-k-mooneysince the config drive can change based on what host you land on15:58
sean-k-mooneythere is no way to know if it will work or not15:59
dansmithsean-k-mooney: hmm, the mandatory config drive thing is a compute-node config right?15:59
sean-k-mooneydansmith: yes15:59
dansmithyeah, that sucks15:59
dansmithso instances at the edge which use configdrive become un-updatable15:59
sean-k-mooneyyep16:00
gibiuser could use the capability trait to land on a host that supports regen16:00
dansmithso another thing,16:00
sean-k-mooneyor ones in isolated networks16:00
dansmithwe really should support reboot with user_data for any instance,16:00
dansmithbecause that's the consistent way to get it updated, regardless of what type it is16:00
dansmiththat way you get updated, you get rebooted, etc16:00
dansmithgibi: the user doesn't know this is a problem until it's too late is the point16:01
sean-k-mooneydansmith: that's more or less what we were trying to do and why i was insisting on config drive support16:01
sean-k-mooneyi guess the disconnect is16:01
dansmithgibi: they don't choose a host based on whether they want to update the user_data a month from now :)16:01
sean-k-mooneyi thought it was ok to start with the libvirt driver16:01
sean-k-mooneysince we could add other drivers later16:01
dansmithsean-k-mooney: I don't think I said I'm opposed to that, what I'm opposed to is just never having that implemented for the other drivers16:02
sean-k-mooneygibi: dansmith  also i fixed a bug where migration could cause you to gain/lose a config drive by making it sticky16:02
sean-k-mooneyso if the vm is first created with a config drive because of the option, i made that sticky16:02
dansmithand given how intertwined this is with the virt driver, and that it was already done wrong once, I would hate to implement this one way and find out that vmware or hyperv or ironic can't honor the request to rebuild during a reboot at the right time16:02
sean-k-mooneydansmith: well that is currently blocked by the trait today16:03
dansmithsean-k-mooney: that also means that if you got a configdrive when you created the instance, it will never have update-able user_data right?16:03
sean-k-mooneybut if we tried to do it quick that could break 16:03
sean-k-mooneydansmith: yes exactly16:03
dansmithsean-k-mooney: no I get that we can detect if it's supported today, I'm talking about if we merge this and next cycle ironic says "we literally can't regenerate the config drive" and hyperv says 'we can, but not in the middle of a reboot because of how we implement that"16:04
dansmiththen we're kinda stuck16:04
sean-k-mooneyya fair16:05
sean-k-mooneyits internal to the driver but it might for example require the current reboot to be split into stop, update config drive, start16:05
sean-k-mooneyfor the hyperv scenario16:05
sean-k-mooneywhich we might or might not be able to hide16:05
dansmithlibvirt's hard reboot is basically a recreate, but that doesn't mean the other drivers are16:06
sean-k-mooneyyep16:06
dansmithlet me also say that I'm sorry I brought up all these concerns late, but I *was* asked to review this and have tried to put my money where my mouth is on changes16:06
dansmithbut I think these are all legit concerns16:06
bauzasyeah, there are no easy paths for solving this problem16:07
dansmithI know gibi is plotting my murder right now, probably conspiring with jhartkopf  :)16:07
sean-k-mooneythey are. we discussed it in the ptg as we had previously rejected the spec16:07
sean-k-mooneylast cycle16:07
gibiI'm sorry that I drove jhartkopf's solution to a dead end. 16:08
bauzaswell, I had concerns about the complexity it was creating for little gain16:08
bauzasbut I was told this was an easy win16:08
gibiI reviewed these patches and missed obvious design errors. I will try better next time.16:09
bauzasso, now, I'm trying to find a trade-off but I don't wanna pull the strings if I think this is risky16:09
sean-k-mooneywell "little gain" is not necessarily fair, it makes nova more "cloud native" as the idea was that user data should be more like k8s config maps16:09
sean-k-mooneyalthough to be fair that would also imply we should delete and recreate the vms16:09
dansmithhah right16:09
dansmithno problem updating user_data if we just shoot the instances in the head :D16:09
bauzassean-k-mooney: we never proposed userdata to be mutable16:09
sean-k-mooneywho is we16:10
bauzasin a cloud, you just spin another instance if you dislike your existing userdata16:10
sean-k-mooneythis has been a long running request for multiple cycles16:10
sean-k-mooneynot in aws16:10
sean-k-mooneyand other plathforms16:10
jhartkopfdansmith: Honestly it's better to notice problems now than when it's too late16:10
sean-k-mooneyapparently openstack was an outlier16:10
bauzashttps://docs.openstack.org/nova/latest/user/metadata.html#user-provided-data16:11
gibidansmith: why would I? I feel sorry for jhartkopf's time spent on this. And I feel bad that I was not able to find the issues you and sean-k-mooney found16:11
bauzasglad I'm not quoted :)16:12
sean-k-mooneybauzas: right but this concept was originally borrowed from ec216:12
sean-k-mooneybauzas: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html16:12
sean-k-mooneyhttps://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-view-change16:13
bauzasthe stop requirement from EC2 may help16:14
bauzasanyway, I was about to draft a design modification16:15
sean-k-mooneywell doing it on hard reboot was an optimisation of stop/update/start16:15
sean-k-mooneybut we could do that instead16:15
sean-k-mooneyor shelve16:15
bauzasthe fact is, if we only allow the update while the instance is stopped, we would only allow the instance to be started once the userdata is updated16:16
sean-k-mooneyunless server update is blocking right now im not sure it helps16:16
bauzasat least once the request ended16:16
sean-k-mooneybauzas: well we would need a new rpc to the compute to rebuild the config drive16:17
dansmithwe still have to regen the config drive though16:17
dansmithwhich we don't do on reboot right?16:17
dansmithI mean, stop..start16:17
sean-k-mooneyright we dont regen the config drive outside of unshelve or cross-cell migration16:17
dansmithbut yeah, saying you can only update user_data if the instance is stopped is probably a lot better for all the arguments16:17
bauzasphew16:17
sean-k-mooneyand unshelve only for non-rbd-backed instances16:17
dansmithit addresses the "so do I need to reboot?" question on non-configdrive16:17
dansmithand eliminates the need for the rpc16:18
dansmithit still requires regen to work, and still may or may not be easy or possible for some drivers16:18
dansmithlike ironic might have to change the provision state to re-write a disk or something16:18
gibiwill we regen for each start then?16:18
dansmithgibi: that'd be a question yeah16:19
sean-k-mooneyim not sure we would remove the rpc16:19
dansmithI really don't like the idea of a dirty flag for the other reasons16:19
dansmithsean-k-mooney: oh you mean a new rpc for the update16:19
dansmithyeah that could work16:19
sean-k-mooneyya16:19
dansmithwe could just nuke the existing drive in the libvirt case, but something like ironic would have to keep track16:19
sean-k-mooneyi mean we have another option16:19
sean-k-mooneymake this entirely user-driven by adding a new instance action16:20
sean-k-mooneyfor updating the config drive16:20
dansmithoh instead of a PUT16:20
bauzasI know we're on Zed-3 day and I'm supposed to stay but this is 6:20pm here and I wonder how much of this could be etherpaded16:20
sean-k-mooneyya, stop, update whatever you like, explicitly tell it to update config drive if you care, start16:21
bauzasbecause we're discussing a design change16:21
bauzasdansmith: yeah, something we would say "damn nova, please do the necessary to plumb my things"16:21
sean-k-mooneybauzas: i think we all realise that we likely wont make progress now16:21
bauzasan instance action would be nice to me, because it would get a status16:22
bauzasand the async thing would be simple for the user16:22
bauzaslike, I started again and my userdata wasn't updated, why? 16:22
bauzas=> server action tells me it failed16:22
sean-k-mooneywell the server action can be blocking or async16:23
sean-k-mooneybut it has a feeback mecahnim either way16:23
gibithis could be a very long running blocking call16:23
bauzasyup, the whole point is that it gives a clue why the userdata wasn't up to date16:23
sean-k-mooneyright it would be in the instance event log16:23
bauzasand until the userdata says "I'm updated", don't expect to get the new things if you start16:23
gibiwe can also make the start fail if it needed to regen but failed to16:24
sean-k-mooneyif its async, or in the response if blocking16:24
sean-k-mooneygibi: that would require tracking the dirty state16:24
sean-k-mooneyand also some people might not care16:24
gibiI don't see how somebody explicitly updates the user_data and then doesn't care if that update ends up in the VM being started16:25
gibiI think the dirty flag was bad because of shadow rpc16:25
sean-k-mooneyim thinking about something else16:25
sean-k-mooneyas a user i might not know if the user data is in config drive or not16:26
sean-k-mooneybut i said start16:26
sean-k-mooneyand i might not want that to break in this case16:26
gibiI see16:27
sean-k-mooneythe dirty flag was to give you the behavior that it would only boot if it succeeded in updating16:27
bauzashonestly, if I'm an user, I don't care about the internals16:27
sean-k-mooneyif it failed, at least in my version the vm would have gone to error16:27
bauzaswhat I want to know is when my update succeeds and when I can use it16:27
dansmithsorry, I'm getting pulled away,16:28
dansmithbut if it was an instance action,16:28
dansmiththen the rebuild either works or doesn't, I can see what happened,16:28
dansmithand then I either get the old version on fail, or the new version on success16:28
dansmithif we just set a dirty flag and then fail to regen on start, then we've killed the instance with no way to restart16:28
dansmithwhich sucks16:28
bauzasyup, I get feedback from an user perspective16:28
bauzasyup, agreed16:28
sean-k-mooneyjhartkopf: so based on the above16:29
dansmithwe probably need some task_state or something to keep the user from starting it until the regen completes though16:29
sean-k-mooneyjhartkopf: and the fact we are all running out of steam16:29
bauzasmikal would say "spec'y spec'y"16:29
dansmithor just use instance lock on the compute node to hold off on any other requests.. I guess that's enough16:29
sean-k-mooneyi think we need to discuss this again next cycle16:29
bauzasthe next PTG will be virtual16:29
bauzasanyone can join, budget saved16:30
* bauzas tries to not be sarcastic16:30
sean-k-mooneyya but we can reopen the spec and discuss it again before that16:30
bauzasoh yeah16:30
sean-k-mooneyi see two ways we could go with the instance action by the way16:30
sean-k-mooneyone is an action for regenerating the config drive16:31
sean-k-mooneythe other is more targeted for this feature16:31
sean-k-mooneyan instance action to update user_data16:31
sean-k-mooneythat either succeeds or does not16:31
sean-k-mooneyand have that be the only way to update it16:31
sean-k-mooneyso no updating it via server put16:31
dansmithyeah, the user should not see "regenerate config drive"16:32
dansmithso it should be "update my user_data" at the external surface for sure16:32
sean-k-mooneyright16:32
sean-k-mooneyso if we do that16:32
sean-k-mooneythat dedicated instance action can "do the right thing" regardless of how the instance is booted16:33
dansmithright16:33
sean-k-mooneyand return a 409 conflict if its not supported by the driver or whatever16:33
jhartkopfsean-k-mooney: Oh that'd be bad news. Was hoping to get at least the stuff without config drives merged this cycle.16:33
sean-k-mooneybut it can be atomic16:34
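On the wire, the dedicated action being floated would look something like this; entirely hypothetical, no such action or microversion exists:

    POST /servers/{server_id}/action
    {"updateUserData": {"user_data": "<base64-encoded script>"}}

returning 202 if accepted, and 409 if the instance's host cannot regenerate the config drive.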
JayFo/, I have a handful of ironic driver patches I'm still trying to get backported thru. They are all clean backports and passing tests. 16:34
JayFhttps://review.opendev.org/c/openstack/nova/+/854257 (no live migration tests on ironic driver changes -> yoga) https://review.opendev.org/c/openstack/nova/+/821351 (ignore plug_vifs on ironic driver -> ussuri) https://review.opendev.org/c/openstack/nova/+/853546 (minimize window for a resource provider to be lost -> train)16:34
bauzassean-k-mooney: dansmith: yeah, the user-surfaced API is the userdata, not the configdrive16:34
bauzasso the instance action has to be related to it16:35
sean-k-mooneyJayF: did we land the skip of the migration job yet16:35
JayFsean-k-mooney: that's the yoga backport above16:35
JayFsean-k-mooney: the version -> master is in16:35
sean-k-mooneyah cool16:35
sean-k-mooneyi was wondering if the master one had landed16:36
bauzasjhartkopf: the problem is that we can't introduce this userdata update by splitting out the non-configdrive case, as we just agreed on a brand new API16:36
JayFoh, I've already checked all these -- previous branches have merged, these should be good to go. e.g. I'd approve these if they were ironic stable changes16:36
bauzasfor consistency reasons, we have to hide the configdrive details from the user request16:36
sean-k-mooneybauzas: that and my +2 on the spec was contingent on it not depending on if you use configdrive or not16:36
JayFand elod helped me clean up the only piece around nova stable policy that's more strict than Ironic :D 16:36
bauzassean-k-mooney: those kinds of design problems happening late in the cycle are common16:37
bauzasjhartkopf: this isn't at all about your implementation, this is rather about the fact we discovered some design problems we weren't considering at the spec review time16:38
gibiwe should be better at doing this kind of design discussion before the FF day16:38
gibibut I don't know how16:38
bauzashonestly, this is the only blueprint that fails this for this cycle AFAICT16:38
bauzasgibi: we had precedents16:38
bauzassome spec approved and a design question arising late16:39
jhartkopfbauzas: I understand that, but it's really unfortunate though16:39
bauzasjhartkopf: I don't disagree with you and I understand how much this can be frustrating16:39
artombauzas, wait, so ephemeral disk, PCI, and Manila will all make it?16:40
* artom is frankly astounded16:40
gibibauzas: I don't think we can detect all the design issues at spec time. But still I think we should be better than doing it at the day of FF16:40
sean-k-mooneynot sure about manila16:40
bauzasjhartkopf: I just hope you can understand that our API resources are important to us and we really care about the UX16:40
* artom maybe misunderstood "this is the only blueprint that fails this for this cycle AFAICT"?16:40
sean-k-mooneyephemeral also likely not16:40
bauzasartom: no, I was mentioning a design issue being raised16:40
artomAh16:40
bauzasephemerals and PCI are on their way but they're big and need a lot of back and forths16:41
gibithe only easy answer for me now is that I should have asked dansmith earlier to review the series16:41
bauzasbut they hadn't required a redesign 16:41
bauzasgibi: well, if you read the spec, I had concerns about the restart16:42
dansmithgibi: this is why I'm against yearly releases.. because this stuff, very unfortunately, only comes to a head when there's a deadline...16:42
gibiso I should have asked you to review the series too16:42
bauzaswe just didn't go further thinking about the configdrive problems16:42
gibidansmith: I totally agree that having a more meaningful deadline would help16:42
sean-k-mooneydansmith: because that's the forcing function that gets a lot of us to review at the same time16:43
sean-k-mooneydansmith: in the absence of an in-person PTG16:43
dansmithand I only reviewed it because it was in front of something else that I know we needed to land, which was up against the deadline16:43
bauzasand I was on PTO for 3.5 weeks16:43
bauzasso blame me too16:43
dansmithWE ALL SUCK16:43
dansmiththere ^ :P16:43
sean-k-mooney:)16:43
gibiI don't want to blame, I want to find something I or we can do different next time16:43
jhartkopfthis ^16:44
bauzaseveryone has different review sensitivities16:44
bauzasfor me, I mostly care about upgrades and the controller/compute relationship16:44
bauzasbut I miss other things16:45
bauzasthe point is, we're humans16:45
* artom was completely absent for the entire thing, and thus would like to claim that he does not, in fact, suck.16:45
bauzasartom: you're Quebecian16:45
bauzasthere is a default fail16:45
gibiI don't want to believe that this is OK as is. :)16:46
artomYou want a Maurice Richard riot v2.0? Because that's how you get a Maurice Richard riot v2.0.16:46
gibiI want to improve!16:46
sean-k-mooneywell i guess where this should have been hashed out comes back to the spec16:46
bauzasyup ^16:46
sean-k-mooneyand the first red flag should have been when the question of how to regenerate the config drive came up16:47
sean-k-mooneyafter the spec was merged16:47
gibidansmith: we would need a bit more than deadlines we would need to inject deadlines that forces a patch series to be ignored for 2 months or so16:48
dansmithto not be?16:48
sean-k-mooneywe did actually have such a deadline in the past16:49
gibidansmith: to be. If the deadline only means I can try the same deadline two weeks from now then that does not force me to push for this particular deadline16:49
dansmithI think part of the problem is that jhartkopf has to wait 6mon (minimum) before he has a consumable release with this in it16:49
sean-k-mooneyit was called the non-priority feature proposal deadline16:49
sean-k-mooneywhich required any feature not marked as a priority to be proposed by m216:49
sean-k-mooneywhich requried any feature not marked as a priorty to be propsed by m216:49
dansmithgibi: two weeks is probably too infrequent to be pressure-inducing16:49
gibidansmith: true, I defaulted to scrum 16:50
dansmithwe should probably retro this later, I have other m3 stuff I need to do16:50
gibisean-k-mooney: that fulfills the big-enough-pain-to-miss requirement. :)16:50
gibisean-k-mooney: I guess it introduced another issue, selecting priorities. But that was maybe less of a problem16:51
gibiand we did see that having priorities would help with the rebuild bfv series16:52
gibi* would have helped16:52
gibiso I'm not against discussing the reintroduction of non-priority feature freeze as one possible option16:52
sean-k-mooneyyep16:52
bauzasnoted.16:53
sean-k-mooneyso way back we used to require non-priority features to merge by m216:53
* dansmith starts singing "the circle of life" in the background16:53
JayFAs a mostly-outside observer, this sounds mostly like the standard "10 gallons of work has to fit in a 5 gallon bucket" problem :)16:53
sean-k-mooneywe then relaxed it to "to be proposed by m2"16:53
sean-k-mooneyand then removed it16:53
bauzasremember, we discussed this at the PTG when I pointed out I was off for three weeks16:53
gibiI have to drop, but thanks for the above discussion16:54
bauzasbut anyway, I personally feel another deadline wouldn't have changed the situation16:54
gibilet's continue this at the PTG as part of a retro16:54
sean-k-mooneysure16:54
bauzasyet again, this was a design concern that was missed at the design stage16:54
dansmithto be clear, my point about deadlines was that moving to yearly releases would make this a lot worse, not necessarily that more deadlines than we have would *solve* it16:54
bauzasdansmith: oh yeah16:54
sean-k-mooneydansmith: yep it would16:54
melwittsean-k-mooney: thanks for the link17:13
opendevreviewJay Faulkner proposed openstack/nova stable/xena: nova-live-migration tests not needed for Ironic  https://review.opendev.org/c/openstack/nova/+/85546917:38
*** efried1 is now known as efried17:53
opendevreviewRico Lin proposed openstack/nova master: Add traits for viommu model  https://review.opendev.org/c/openstack/nova/+/84450718:14
ricolinsean-k-mooney: bauzas sorry, a pep8 error, can you approve it again, thanks18:14
opendevreviewMerged openstack/nova master: Heal missing simple PCI allocation in the resource tracker  https://review.opendev.org/c/openstack/nova/+/85135918:15
opendevreviewMerged openstack/nova master: Heal PCI allocation during resize  https://review.opendev.org/c/openstack/nova/+/85239618:16
opendevreviewMerged openstack/nova master: Heal allocation for same host resize  https://review.opendev.org/c/openstack/nova/+/85482218:16
opendevreviewMerged openstack/nova master: Handle PCI dev reconf with allocations  https://review.opendev.org/c/openstack/nova/+/85239718:21
opendevreviewMerged openstack/nova master: Allow enabling PCI tracking in Placement  https://review.opendev.org/c/openstack/nova/+/85046818:21
dansmithricolin: I got you18:25
ricolinthanks dansmith 18:28
bauzasricolin: dansmith: done too18:52
bauzasfwiw, I'm setting Implemented against all blueprints that have their API changes merged but are waiting for client patches19:15
bauzasnova in launchpad != novaclient and osc19:15
opendevreviewMerged openstack/nova stable/ussuri: Ignore plug_vifs on the ironic driver  https://review.opendev.org/c/openstack/nova/+/82135119:29
bauzasfwiw, procedural -2 on https://review.opendev.org/c/openstack/nova/+/816157/1619:32
JayFInteresting failure case in stable/yoga CI; I think it's probably something that's actually broken, but not by my patch: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8b1/854257/2/gate/openstack-tox-py39/8b1323f/testr_results.html19:35
JayFGiven that all the tests comparing output there are failing in a similar way, with no human-readable differences19:35
sean-k-mooneyJayF: I have hit that too19:36
sean-k-mooneyon a completely unrelated patch19:36
sean-k-mooneyI can't reproduce it locally, for what it's worth19:36
JayFare you suggesting I just retry then?19:36
sean-k-mooneylet me check if my other one passed after a retry19:36
JayFI can do that, I usually don't pull that lever if it looks nonrandom like this one does, but I don't know enough about nova failure cases to be certain 19:36
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/855025 failed again19:38
sean-k-mooneyJayF: same issue19:38
sean-k-mooneyand it's only on py3919:38
sean-k-mooneythe py36 test passes19:38
bauzasgibi: sean-k-mooney: others, you may be interested in the antelope timeframe https://review.opendev.org/c/openstack/releases/+/850753/5/doc/source/antelope/schedule.rst19:38
sean-k-mooneyJayF: so it's a real blocker19:38
bauzasgibi: sean-k-mooney: this sounds like a shorter development cycle19:39
sean-k-mooneynot related to your patch at all19:39
sean-k-mooneyI had it fail on all patches in a 5/6-patch series and on the recheck of the first patch19:39
sean-k-mooneyso it seems to be reproducible in CI19:39
sean-k-mooneyJayF: I'm not sure I'll have time to look at that this week, but we should likely track it as a stable gate bug. I'll try to follow up with elodilles about it next week19:45
JayFWhen you say "track that as a stable gate bug", what does that mean in practice for Nova? 19:46
JayFI know in Ironic, e.g., we have an etherpad we track that in19:46
JayFI'm happy to document the issue; I can't sit here and ask for a bajillion stable reviews without being willing to help with CI somewhat :D 19:46
sean-k-mooneyJayF: we have an etherpad and we file a bug with gate blocker in the title19:46
JayFWhere could I find a link to that etherpad? 19:46
sean-k-mooneyhttps://etherpad.opendev.org/p/nova-stable-branch-ci19:46
JayFty19:47
JayFbugs in nova are launchpad, not storyboard? 19:48
JayFyep looks like it, filing it and will update the etherpad19:48
sean-k-mooneyyep launchpad19:48
sean-k-mooneywe have a gate-failure tag19:49
opendevreviewMerged openstack/nova master: libvirt: Add vIOMMU device to guest  https://review.opendev.org/c/openstack/nova/+/83064619:56
sean-k-mooneyJayF: https://tinyurl.com/mr3pjzyn19:57
JayFWhat login would I use for that?19:57
JayFI've not used opensearch yet19:57
sean-k-mooneyopenstack openstack19:57
* JayF hasn't done serious bulk-gate-troubleshooting since the elastic-recheck days19:57
sean-k-mooneyit's basically the same19:57
sean-k-mooneyso only 8 hits, 6 are me I think, but I feel like that might be a limitation of the query19:58
JayFone of the other two is me lol19:59
sean-k-mooneyit's showing in the nova cross job19:59
sean-k-mooneybut this job is openstack-tox-py3919:59
sean-k-mooneyso I don't know why that is not showing up19:59
JayFit's going to be extremely hard to write a matcher for this that couldn't also catch other stuff20:01
JayFbecause this is essentially "it looks like a proper unit test failure but just... isn't"20:01
sean-k-mooneynova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED20:01
sean-k-mooneyI should be able to match on that instead20:01
JayFYeah, I've read the docs a couple of times, and I can't figure out how to wildcard out the time in the test run search20:13
sean-k-mooneysame, I think I'm going to give up for now20:13
* JayF sometimes wishes these guis were just a terminal with something greppy installed20:13
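(A minimal sketch of the kind of local matcher being discussed, assuming the job output has been downloaded as a file; the failing-test line and its [N.NNNNNNs] timing format come from the output quoted above, while the script itself is illustrative:)

import re
import sys

# Match the failure line quoted above while wildcarding the per-run timing:
#   nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED
FAILED_TEST = re.compile(
    r"nova\.tests\.unit\.cmd\.test_manage\.UtilitiesTestCase\.test_format_dict"
    r" \[\d+\.\d+s\] \.\.\. FAILED"
)

with open(sys.argv[1], errors="replace") as f:  # e.g. a downloaded job-output.txt
    for line in f:
        if FAILED_TEST.search(line):
            print(line.rstrip())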
opendevreviewMerged openstack/nova-specs master: Create specs directory for 2023.1 Antelope  https://review.opendev.org/c/openstack/nova-specs/+/85100720:15
opendevreviewSofia Enriquez proposed openstack/nova master: WIP: Check NFS protocol  https://review.opendev.org/c/openstack/nova/+/85403021:09
opendevreviewSofia Enriquez proposed openstack/nova master: Check NFS protocol  https://review.opendev.org/c/openstack/nova/+/85403021:28
*** dasm is now known as dasm|off21:42
JayFI filed https://bugs.launchpad.net/nova/+bug/1988482 and am documenting it in that etherpad now.21:57
JayFlet me know if there's anything I can do to help resolve it; I'm a little behind on my usual stuff due to being out sick earlier this week, but if it's not picked up by early next week I might see if I can figure it out21:59
