whoami-rajat | sean-k-mooney, it is already being tested in tempest-full but the problem is the image is the same before/after rebuild so it's not testing the actual feature, i.e. reimaging to a new image, hence the new job; not sure if there is a better way to do it | 05:10 |
---|---|---|
opendevreview | Merged openstack/nova master: Add support for volume backed server rebuild https://review.opendev.org/c/openstack/nova/+/820368 | 07:44 |
gibi | good morning | 07:47 |
opendevreview | Merged openstack/nova master: Add conductor RPC interface for rebuild https://review.opendev.org/c/openstack/nova/+/831219 | 07:51 |
opendevreview | Merged openstack/nova master: Add API support for rebuilding BFV instances https://review.opendev.org/c/openstack/nova/+/830883 | 07:53 |
bauzas | good morning | 07:58 |
Uggla | gibi, bauzas good morning | 08:16 |
* bauzas needs a serious coffee | 08:17 | |
bauzas | Uggla: how was this morning ? :) | 08:17 |
bauzas | was Rose happy to see her friends ? | 08:17 |
Uggla | Yes, she's ok and happy because she has the teacher she wanted. | 08:18 |
gibi | :) | 08:19 |
Uggla | One little girl in her group was crying, and Rose gave her hand to go to the classroom. | 08:21 |
gibi | so nice | 08:21 |
bauzas | nice and bravo Rose | 08:24 |
bauzas | my oldest is still on vacations, she'll start tomorrow | 08:25 |
Uggla | in short no pb so far. I'll see this afternoon. | 08:25 |
* bauzas has to remember that the school times have changed this year | 08:25 | |
bauzas | if you see me disappearing quickly, that's because I forgot about the time :p | 08:26 |
Uggla | 1st day on a Friday ? This is a warmup day for teachers ? :) | 08:32 |
bauzas | Uggla: no, for the "6e" children (10-11yo) they started today | 08:33 |
bauzas | for the older (12-14yo), they start tomorrow | 08:33 |
*** akekane_ is now known as abhishekk | 08:33 | |
Uggla | yep strange to start on a Friday. (for your older one) | 08:33 |
bauzas | for the "6eme" folks, they only work today and not tomorrow | 08:35 |
bauzas | ("6eme" is first year of middle school) | 08:36 |
Uggla | ok so only 1 day the first week. Soft start. | 08:46 |
sean-k-mooney | its pretty common here | 09:19 |
sean-k-mooney | they give the students who are new to the school 1 day before the rest come back | 09:19 |
sean-k-mooney | so they can be shown around without all the other people | 09:20 |
sean-k-mooney | so normally they start on a Wednesday here: youngest class group, then first half, then the rest | 09:20 |
opendevreview | Konrad Gube proposed openstack/nova-specs master: Add API for assisted volume extend https://review.opendev.org/c/openstack/nova-specs/+/855490 | 09:21 |
sean-k-mooney | melwitt: ^ | 09:21 |
sean-k-mooney | whoami-rajat: tempest has an alternative image so we can have two tests: one that reimages to the same image and one that reimages to the alt image https://github.com/openstack/tempest/blob/master/tempest/config.py#L294-L297 | 09:36 |
sean-k-mooney | whoami-rajat: so you would boot with image_ref and rebuild to image_ref_alt | 09:36 |
sean-k-mooney | whoami-rajat: we really only care that the images are different in glance, they can both be cirros and it's fine | 09:37 |
sean-k-mooney | the mechanics are the same whether it's the same image or a different one with the new microversion | 09:38 |
sean-k-mooney | with the old microversion you can just use the alt image to assert it's not allowed | 09:38 |
sean-k-mooney | and the same image to assert it preserved data | 09:38 |
*** brinzhang_ is now known as brinzhang | 09:46 | |
whoami-rajat | sean-k-mooney, those parameters point to the same image in the default tempest job; see this tempest.conf from a tempest-full run, both image_ref and image_ref_alt point to the same image https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_83f/831018/24/check/tempest-full-py3/83f6b4a/controller/logs/tempest_conf.txt | 09:56 |
whoami-rajat | sean-k-mooney, I'm configuring different images for those parameters in my job, see tempest.conf for my new job https://a59fb54e72bca9b30065-6505ce46ad1aa88851df8d527471f9dc.ssl.cf1.rackcdn.com/831018/24/check/tempest-rebuild-volume-backed/c4755d7/controller/logs/tempest_conf.txt | 09:56 |
whoami-rajat | the IMAGE_URLS parameter in devstack enables setting different images for those parameters | 09:57 |
whoami-rajat | by default only one image is created and set as both image_ref and image_ref_alt | 09:57 |
whoami-rajat | that's why i configured a new job with this parameter; i wasn't sure if modifying existing jobs for my test run is a valid thing, so i created a new job | 09:58 |
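The IMAGE_URLS / image_ref_alt setup being discussed can be sketched as a config fragment (a hypothetical illustration; the image URLs and placeholders below are not taken from the linked jobs):

```ini
# hypothetical devstack local.conf fragment: list two image URLs so
# devstack registers two distinct glance images for tempest to use
[[local|localrc]]
IMAGE_URLS="http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img,http://download.cirros-cloud.net/0.5.1/cirros-0.5.1-x86_64-disk.img"

# which should yield a tempest.conf where the two options differ:
# [compute]
# image_ref = <uuid of first image>
# image_ref_alt = <uuid of second image>
```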
gibi | stephenfin: hi! how do you feel, will you have time to review the PCI series up until https://review.opendev.org/c/openstack/nova/+/850468/20 before the FF deadline? | 10:00 |
sean-k-mooney | whoami-rajat: yes but it would be better to configure devstack to upload the cirros image twice with different names and then use that | 10:09 |
sean-k-mooney | so i think yes, modifying the existing jobs is totally valid | 10:09 |
sean-k-mooney | and would get more coverage of the feature. that can be done in the future | 10:10 |
sean-k-mooney | whoami-rajat: also congrats its merged in nova this morning | 10:10 |
whoami-rajat | hmm, ack, will try that | 10:10 |
sean-k-mooney | whoami-rajat: sorry it took so long | 10:10 |
whoami-rajat | sean-k-mooney, thanks and thank you to all the nova team bauzas gibi dansmith for helping with that! | 10:10 |
whoami-rajat | no problem :) | 10:11 |
sean-k-mooney | did you see my comments on the osc patch | 10:11 |
whoami-rajat | nope, looking | 10:11 |
sean-k-mooney | ack you duplicated a test but forgot to rename it in the copy paste | 10:11 |
sean-k-mooney | trivial fix | 10:11 |
gibi | whoami-rajat: you are welcome | 10:11 |
whoami-rajat | sean-k-mooney, oh i addressed that yesterday only, i thought there was a new comment | 10:12 |
sean-k-mooney | ack, no i have not reviewed since you updated it | 10:12 |
sean-k-mooney | ill go do that now | 10:12 |
whoami-rajat | thanks | 10:12 |
whoami-rajat | https://review.opendev.org/c/openstack/python-openstackclient/+/831014 -- for quick reference | 10:12 |
sean-k-mooney | already reviewed | 10:13 |
sean-k-mooney | it looks good to me. stephenfin ^ you're an osc core, care to take a look? | 10:14 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/854499 | 10:17 |
opendevreview | Amit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/791135 | 10:17 |
sean-k-mooney | ricolin: can you update https://review.opendev.org/c/openstack/nova/+/844507/18/nova/virt/libvirt/driver.py#12202 | 11:14 |
sean-k-mooney | if you can make that change im +2 on both patches | 11:15 |
sean-k-mooney | bauzas: ^ | 11:15 |
sean-k-mooney | ill see if i can quickly test your changes in devstack too | 11:15 |
sean-k-mooney | to confirm this works as intended | 11:15 |
bauzas | sean-k-mooney: I can review the traits change once ricolin updates it | 12:46 |
ricolin | Thanks sean-k-mooney: bauzas will do it ASAP | 12:46 |
bauzas | cool | 12:46 |
sean-k-mooney | in its current form it's not wrong, it's just overcomplicated | 12:46 |
sean-k-mooney | ricolin: i think if you address that, everything else looks ok | 12:47 |
sean-k-mooney | and it can land before FF | 12:47 |
ricolin | sean-k-mooney: sounds great | 12:47 |
sean-k-mooney | i have not had a chance to test it locally yet but i expect it to work | 12:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/854499 | 12:47 |
opendevreview | Amit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/791135 | 12:47 |
sean-k-mooney | ricolin: i assume you have booted a vm with this code and validated the iommu is present | 12:48 |
*** dasm|off is now known as dasm | 12:54 | |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 12:55 |
ricolin | sean-k-mooney: done | 12:56 |
ricolin | and yes, I'm sure IOMMU is there. Just not sure I test Traits in the right way | 12:58 |
bauzas | ricolin: sean-k-mooney: reviewing | 13:01 |
ricolin | thanks bauzas | 13:02 |
sean-k-mooney | bauzas: gibi if ye want to proceed with ^ i'm more or less happy with it. the release note should be in the second patch | 13:23 |
sean-k-mooney | but as long as we land both of them its not an issue | 13:24 |
bauzas | sean-k-mooney: I just said +1 for an upgrade question | 13:24 |
sean-k-mooney | i replied | 13:24 |
sean-k-mooney | thats intentional | 13:24 |
sean-k-mooney | although dansmith might point out we could have also added a min compute service check for this instead of using the traits | 13:25 |
sean-k-mooney | we normally did not do that in the past however | 13:26 |
sean-k-mooney | so i think this is all good to go | 13:26 |
gibi | the trait based capability scheduling looks OK to me | 13:26 |
gibi | I think dansmith had issues using the capability trait outside of the scheduling | 13:26 |
dansmith | if we're already scheduling, then traits make plenty of sense | 13:26 |
dansmith | right | 13:26 |
sean-k-mooney | yep its in the existing pre filter | 13:27 |
gibi | bauzas: do you need my +2 or you will send it in? | 13:27 |
sean-k-mooney | before we started to do these checks with placement we would have landed on the host and failed there with hypervisor too old or similar | 13:27 |
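As a loose illustration of the pre-filter idea discussed above (this is not nova's actual code; the `hw:viommu_model` extra spec handling and `COMPUTE_VIOMMU_MODEL_*` trait names are assumptions for the sketch), a capability trait gets folded into the placement request so only upgraded computes are candidates:

```python
# Simplified sketch of trait-based capability scheduling (assumed
# names, not nova's real RequestSpec/os-traits code): if the flavor
# asks for a vIOMMU model, require the matching compute trait so
# placement only returns hosts whose virt driver reports it.

def viommu_prefilter(request_spec):
    """Mutate a dict-based request spec, adding a required trait."""
    extra_specs = request_spec.get("flavor_extra_specs", {})
    model = extra_specs.get("hw:viommu_model")
    if model:
        trait = "COMPUTE_VIOMMU_MODEL_%s" % model.upper()
        request_spec.setdefault("required_traits", set()).add(trait)
    return request_spec

spec = {"flavor_extra_specs": {"hw:viommu_model": "virtio"}}
viommu_prefilter(spec)
print(spec["required_traits"])  # {'COMPUTE_VIOMMU_MODEL_VIRTIO'}
```

Old computes simply never report the trait, so they are filtered out up front rather than failing late on the host, which matches the point about capacity being limited rather than the feature erroring.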
bauzas | sorry folks was in 1:1 meeting | 13:28 |
bauzas | sean-k-mooney: thanks, will reply then with +2 | 13:28 |
gibi | ack, then I'm not needed there :) | 13:28 |
bauzas | it was just an open thought | 13:28 |
bauzas | I just want to make sure that operators know they need to upgrade all their hosts if so | 13:28 |
bauzas | but that's understandable | 13:29 |
sean-k-mooney | well it's as you said | 13:29 |
sean-k-mooney | they don't actually have to, but it will be capacity limited | 13:29 |
sean-k-mooney | but that's kind of to be expected, that you can't use new features on old nodes | 13:30 |
bauzas | yup, agreed | 13:30 |
bauzas | at least with traits | 13:30 |
sean-k-mooney | well even without | 13:30 |
bauzas | without, we ask for compute service checks in general | 13:30 |
sean-k-mooney | yep | 13:31 |
sean-k-mooney | but that is why i said the traits patch should be first | 13:31 |
bauzas | well, if both merge for Zed, I'm cool | 13:31 |
sean-k-mooney | and the other one second, and why i said in the current order they need to merge together | 13:31 |
bauzas | yes and no | 13:31 |
sean-k-mooney | bauzas: let me rephrase: i don't want to merge only one of those two patches | 13:32 |
bauzas | you could merge the first, this is just you won't be able to get the feature until you upgrade all | 13:32 |
dansmith | gibi: sean-k-mooney: are we FFEing the user_data stuff such that I should try to bang out the rest of the RPC stuff ASAP? | 13:32 |
bauzas | won't be able to *be sure* to get the feature | 13:32 |
bauzas | dansmith: eeek, context ? | 13:32 |
bauzas | oh, the rpc call | 13:32 |
bauzas | lemme just send to the gate ricolin's work | 13:33 |
sean-k-mooney | bauzas: that and i guess i need to rev my follow-up patch with actual tests | 13:33 |
sean-k-mooney | dansmith: i'm not against doing that if you think you will have time | 13:33 |
sean-k-mooney | but that's probably the main RFE that i think FFE might make sense for | 13:34 |
dansmith | sean-k-mooney: I was expecting to see a rev of the patch to move the regen to after the instance is destroyed and fix the actual writing that was failing | 13:34 |
dansmith | so I hadn't rebased my RPC stuff yet | 13:34 |
sean-k-mooney | the only other one might be the PCI series but i have not looked at it for two days | 13:34 |
dansmith | but that has to happen first right? | 13:34 |
sean-k-mooney | oh i wrote a patch to fix config drive | 13:34 |
dansmith | oh is that ths? https://review.opendev.org/c/openstack/nova/+/855351/1 | 13:35 |
sean-k-mooney | yes | 13:35 |
gibi | sean-k-mooney: you are +2 up until the healing patches in the PCI and that is the realistic target there. If stephenfin will have no time to review them today then I will ask for an FFE for that. We let the scheduling part slip in any case | 13:35 |
bauzas | the mutable userdata has API impact | 13:35 |
dansmith | okay but that would need to be squashed or go in front right? | 13:35 |
bauzas | but I'm OK with FFE'ing if needed | 13:35 |
dansmith | bauzas: it would also be about half not-very-reviewed code at this point based on the look of the configdrive patch | 13:36 |
dansmith | so not really "just didn't get reviewed in time" | 13:36 |
sean-k-mooney | dansmith: yes, or they split the patch into non-config-drive and config drive | 13:36 |
dansmith | that would mean two microversions for effectively the same thing | 13:37 |
sean-k-mooney | dansmith: i basically started my patch so they would have a reference for what needed to be done | 13:37 |
dansmith | unless we do the config regen and rpc ahead of exposing it in the api | 13:37 |
bauzas | dansmith: I think we reviewed it well, but we hit a bone | 13:37 |
bauzas | so, I'm OK with giving more time to review that bone fix | 13:37 |
dansmith | bauzas: I'm saying the code that needs to be written to make it landable would probably double the actual code in the patch | 13:38 |
sean-k-mooney | dansmith: yep we could do that, so initially it would not be invokable from the api but the code would be in place | 13:38 |
dansmith | sean-k-mooney: yeah that seems better to me | 13:38 |
bauzas | dansmith: then, we need to take this extratime to balance the risks and maybe not merge it | 13:38 |
bauzas | or decide we only merge half the things | 13:38 |
sean-k-mooney | dansmith: so you could probably combine your rpc code into the patch i started, then we could flip the order of them | 13:39 |
gibi | we could merge the part that supports user data update with non config drive instances. | 13:39 |
sean-k-mooney | but that also means we need other people to agree to review this | 13:39 |
sean-k-mooney | since you and i are basically out at that point | 13:39 |
gibi | I can review the patches (I will be around 18:00 CEST today but I can spend time early tomorrow too) | 13:40 |
gibi | * I will be around until 18:00 CEST today | 13:40 |
dansmith | sean-k-mooney: right, that's a problem too | 13:40 |
dansmith | sean-k-mooney: so a patch in front that adds the flag to the reboot call, then the patch to make it regeneratable, then the api patch to do it for both types would be cleanest I think | 13:41 |
sean-k-mooney | dansmith: ya that sounds like a plan | 13:41 |
sean-k-mooney | it keeps just one micorversion | 13:42 |
sean-k-mooney | i can go write the tests needed to make the rebuild patch complete if we agree to proceed in this direction | 13:42 |
dansmith | sean-k-mooney: ack, I'll rebase my rpc stuff | 13:43 |
sean-k-mooney | dansmith: it would probably make sense for you to write your patch directly off master and i could rebase and flip the order when you're done | 13:44 |
dansmith | sean-k-mooney: yeah | 13:44 |
bauzas | I'm OK with having 3 patches, the API one being the latest | 13:44 |
bauzas | and this being FFE, so we have 1 week to correctly look at all of this | 13:44 |
bauzas | elodilles: when is the exact deadline for cycle highlights ? | 13:47 |
bauzas | today or tomorrow ? | 13:47 |
bauzas | one day, the marketing team will understand how crazy it is to ask for feature docs deliveries at the FeatureFreeze (and even before) | 13:48 |
bauzas | while RC1 time would be so easier | 13:48 |
bauzas | this is just, I need to take 1 hour of my time for writing something while we're on fire | 13:49 |
sean-k-mooney | on the plus side tempest-integrated-compute forces config drive and that passed so at least that change in logic works for normal boots https://zuul.opendev.org/t/openstack/build/98cfa86b93974d79b1d43c29433d7e48/log/controller/logs/etc/nova/nova-cpu_conf.txt#17 | 13:50 |
sean-k-mooney | i also checked that locally but nice to see it in ci too. | 13:52 |
elodilles | bauzas: i think there is no strict deadline for today, we will merge it tomorrow if it's only ready around that time :) | 13:53 |
sean-k-mooney | elodilles: i think the reminder was asking for it yesterday | 13:54 |
opendevreview | Justas Poderys proposed openstack/nova-specs master: Improve multiqueue network adapter support https://review.opendev.org/c/openstack/nova-specs/+/855514 | 13:54 |
sean-k-mooney | but i guess bauzas' point was we might not know the full feature set until we hit FF | 13:54 |
bauzas | elodilles: surely there is no problem, but I wonder why such rush | 13:54 |
sean-k-mooney | or RC1 if we have FFEs | 13:55 |
bauzas | elodilles: I just replied to your email fwiw | 13:55 |
bauzas | I guess my point is not if I can deliver later | 13:55 |
bauzas | but rather asking for an official move to RC1 | 13:55 |
sean-k-mooney | elodilles: oh you just said this week | 13:56 |
bauzas | sean-k-mooney: https://releases.openstack.org/zed/schedule.html#cycle-highlights | 13:56 |
sean-k-mooney | yep | 13:56 |
sean-k-mooney | looking at it now | 13:57 |
bauzas | the schedule officially says the week of Zed-3 | 13:57 |
bauzas | this is non-trivial | 13:57 |
sean-k-mooney | i can understand that they might want to have them ready for RC1 | 13:57 |
sean-k-mooney | but the week between the two would be a better compromise | 13:58 |
dansmith | sean-k-mooney: so I'm bringing in the libvirt driver reconfigure_configdrive() method from the original patch, but I should move that until after the self.destroy() yeah? | 14:01 |
sean-k-mooney | well it need to get called via the callback | 14:02 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/855351/1/nova/virt/libvirt/driver.py#3978 | 14:03 |
dansmith | I guess I need to look at your patch | 14:03 |
elodilles | bauzas: if everything goes according to plan, then by FF a team knows every feature which was merged. of course, FFEs can alter the picture, but that i think is acceptable. get the biggest part of the highlights merged and update the rest if needed as soon as possible. IMO. | 14:03 |
sean-k-mooney | tl;dr building the config drive needs info generated by libvirt in the xml | 14:03 |
elodilles | bauzas: "The Zed-3 milestone marks feature freeze for projects following the release:cycle-with-rc model. No featureful patch should be landed after this point. Exceptions may be granted by the project PTL." | 14:03 |
sean-k-mooney | elodilles: yes but generally review bandwidth consumes all the ptl's time | 14:04 |
sean-k-mooney | elodilles: so the project team does not have time to reflect on the features that landed over the cycle to see what should be highlighted | 14:04 |
bauzas | sean-k-mooney: that's my point | 14:05 |
dansmith | sean-k-mooney: ah okay so actually yours should go first then | 14:05 |
bauzas | this is counterproductive for me to block 2 hours of my time on a FF Thursday in order to write a document I could be able to write the week after | 14:05 |
dansmith | otherwise I'm, plumbing rpc that lies about what it does | 14:05 |
bauzas | the rpc thing concerns me | 14:06 |
bauzas | do we need to have a service check on the API ? | 14:06 |
sean-k-mooney | i think we can do them in any order initially and just fix it once we have the first version of all the patches | 14:06 |
dansmith | bauzas: yes because it's a case | 14:06 |
dansmith | *cast | 14:06 |
bauzas | dansmith: agreed, particularly because of the non-blocking RPC call then | 14:07 |
dansmith | bauzas: the original patch was checking with tempest, but if we have an rpc version, we can use the version pin, or check service version | 14:07 |
bauzas | and an ERROR would be terrible to manage | 14:07 |
dansmith | bauzas: since this is a new param, I was going to make rpcapi refuse to send if you ask for the new flag, per usual, so the api will catch and handle that, knowing it was not allowed | 14:08 |
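The "refuse to send" pattern dansmith describes can be sketched roughly like this (hypothetical class, method, flag, and version numbers; nova's real rpcapi layers this on oslo.messaging version caps rather than a hand-rolled comparison):

```python
# Hedged sketch of nova's usual rpcapi version-guard pattern: if the
# caller asks for new behaviour but the negotiated/pinned RPC version
# is too old to carry the new argument, refuse instead of silently
# dropping the flag, so the API layer can return a clean error.

class UnsupportedRPCVersion(Exception):
    pass

class ComputeAPIStub:
    def __init__(self, client_version):
        # the version pinned for this deployment (e.g. during upgrade)
        self.client_version = client_version

    def _can_send(self, required):
        def parse(v):
            major, minor = v.split(".")
            return int(major), int(minor)
        return parse(self.client_version) >= parse(required)

    def reboot_instance(self, instance, regen_configdrive=False):
        # the hypothetical new flag rides on a new RPC version ("6.1")
        if regen_configdrive and not self._can_send("6.1"):
            raise UnsupportedRPCVersion("compute too old for user_data regen")
        return {"instance": instance, "regen": regen_configdrive}

old = ComputeAPIStub("6.0")
try:
    old.reboot_instance("uuid-1", regen_configdrive=True)
except UnsupportedRPCVersion as exc:
    print("refused:", exc)

new = ComputeAPIStub("6.1")
print(new.reboot_instance("uuid-1", regen_configdrive=True)["regen"])  # True
```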
elodilles | bauzas: ack, i see. to tell you the truth i don't know when exactly marketing team processes the cycle highlights, so maybe we have here some days until they'll need it (especially due to FFEs) | 14:08 |
dansmith | but all that has to get tested for sure | 14:08 |
sean-k-mooney | elodilles: by the way can you add https://review.opendev.org/c/openstack/nova/+/833435 to your list whenever you have time | 14:09 |
elodilles | sean-k-mooney: ack, looking | 14:10 |
elodilles | bauzas: i'll answer to your mail as well :) | 14:11 |
sean-k-mooney | its just a backport making its way down to train so not urgent; i came across the downstream bug and realised i should probably follow up | 14:11 |
bauzas | dansmith: oh, right | 14:11 |
bauzas | dansmith: just exception handling at the API level, you're right | 14:12 |
bauzas | elodilles: don't get me wrong, I wasn't grumbling, I was just pointing out my problem for discussion | 14:12 |
bauzas | maybe the foundation folks don't exactly get what's happening at FF | 14:13 |
bauzas | in particular with the big projects | 14:13 |
elodilles | bauzas: i summarized what i think about this in my mail. let's see if someone from Marketing or Release team has different view o:) | 14:27 |
elodilles | sean-k-mooney: the patch looks valid, +2'd | 14:28 |
sean-k-mooney | i'll slowly refresh those patches as they merge but given we are at FF i'm also trying to limit the ci usage | 14:33 |
elodilles | sean-k-mooney: ++ | 14:35 |
opendevreview | sean mooney proposed openstack/nova master: support configdrive rebuilding https://review.opendev.org/c/openstack/nova/+/855351 | 14:53 |
opendevreview | sean mooney proposed openstack/nova master: add support for updating server's user_data https://review.opendev.org/c/openstack/nova/+/816157 | 14:53 |
sean-k-mooney | dansmith: that won't work ^ but they're in the correct order now and i can start on the tests | 14:54 |
dansmith | sean-k-mooney: okay but we can't land an RPC version that says it does a thing when it doesn't | 14:54 |
dansmith | sean-k-mooney: so I think your thing needs to be in front with "if False" or something | 14:55 |
dansmith | and I'll change that to "if the thing is enabled:" | 14:55 |
sean-k-mooney | dansmith: something like https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#3879 | 14:55 |
sean-k-mooney | which is never set anywhere to True | 14:56 |
sean-k-mooney | and this if https://review.opendev.org/c/openstack/nova/+/855351/2/nova/virt/libvirt/driver.py#5011 | 14:56 |
dansmith | yeah | 14:56 |
sean-k-mooney | so i was thinking you could start a patch on top of the first one, or if you want i can probably do it based on the termbin you provided | 14:57 |
sean-k-mooney | you will be quicker at that than me since it's been a while since i touched rpc, but either way | 14:57 |
sean-k-mooney | i'm going to work on the base patch and get it to pass unit and func tests | 14:58 |
dansmith | yeah, I'm working on the rpc stuff already | 14:58 |
dansmith | I think that base patch will work now | 14:58 |
sean-k-mooney | it should work in devstack; i'm expecting the unit tests to bitch about the extra parameter to _hard_reboot | 14:59 |
bauzas | hmm, are configdrivers only supported by libvirt N? | 15:04 |
bauzas | configdrives | 15:04 |
bauzas | I'm afraid of https://review.opendev.org/c/openstack/nova/+/816157/16/nova/virt/driver.py | 15:04 |
sean-k-mooney | technically they are supported by other drivers i think | 15:04 |
bauzas | trampling all our drivers but libvirt | 15:04 |
gibi | bauzas: nope, it is supported by other drivers too | 15:04 |
bauzas | that's super late for them to support them | 15:04 |
bauzas | s/them/this | 15:05 |
sean-k-mooney | they dont need too | 15:05 |
sean-k-mooney | they just wont be able to use this feature | 15:05 |
sean-k-mooney | i think that was in the spec | 15:05 |
sean-k-mooney | either way this was only planned to be implemented by libvirt | 15:05 |
sean-k-mooney | at least for now | 15:05 |
bauzas | hmmmm | 15:05 |
gibi | so we need to detect the capability in the API and reject the reboot request | 15:06 |
bauzas | yup | 15:06 |
sean-k-mooney | that's in the patch currently, yes | 15:06 |
gibi | if there is user_data update but the virt driver has no capability to regenerate | 15:06 |
bauzas | a virt driver dependency wasn't stated in the spec https://review.opendev.org/c/openstack/nova-specs/+/816542/7/specs/zed/approved/update-userdata.rst | 15:07 |
sean-k-mooney | by the way we can now drop the object changes | 15:07 |
bauzas | this change seems to me more and more fragile | 15:07 |
sean-k-mooney | bauzas: well im not going to have time to work on ironic by monday | 15:07 |
bauzas | sean-k-mooney: I'm not asking you to do such things | 15:07 |
gibi | (it is not just ironic) | 15:08 |
sean-k-mooney | so hehe i think we were reviewing this as libvirt only, or at least i was | 15:08 |
bauzas | but I point out how fragile this is | 15:08 |
sean-k-mooney | gibi: ya i know | 15:08 |
bauzas | and again, this wasn't explained in the spec | 15:08 |
sean-k-mooney | ironic would just be the most painful | 15:08 |
sean-k-mooney | bauzas: originally in the spec they planned to only do it for the metadata api | 15:08 |
bauzas | I was having concerns with this spec because I thought it wasn't really a needed usecase | 15:08 |
sean-k-mooney | but then we pointed out config drive existed and needed to also work | 15:09 |
bauzas | sean-k-mooney: no, they said about configdrives too | 15:09 |
sean-k-mooney | bauzas: no i asked them to add that | 15:09 |
sean-k-mooney | they wanted api only | 15:09 |
bauzas | sean-k-mooney: yes, but I also said it was a problem | 15:09 |
sean-k-mooney | i did not want this to not work if you used config drive | 15:09 |
sean-k-mooney | so as it stands any driver that does not support the new method will not report the trait | 15:10 |
sean-k-mooney | that needs to be called out in the api ref | 15:10 |
sean-k-mooney | or other docs for this | 15:10 |
sean-k-mooney | we could put it in the driver support matrix i guess | 15:10 |
sean-k-mooney | that's probably better long term | 15:10 |
sean-k-mooney | dansmith: only 6 failures for the extra parameter | 15:14 |
sean-k-mooney | i actually need the driver api change in the first patch too so i'll add that | 15:14 |
gibi | I'm wondering about the virt driver api change | 15:15 |
dansmith | but bauzas' point about this not being supported by other drivers is legit | 15:15 |
dansmith | having features that look like swiss cheese is very confusing for users | 15:15 |
sean-k-mooney | gibi: i could drop that now actully | 15:15 |
gibi | I asked for it originally as I thought the reboot + regenerate logic can be orchestrated from the compute manager | 15:15 |
sean-k-mooney | if we just have the parmater | 15:15 |
dansmith | even if we reject for the api caller, it's still annoying to say "reboot works, but not reboot with userdata, but only if you're using configdrive" | 15:16 |
gibi | but the actual implementation calls the regenerate call from the virt driver | 15:16 |
dansmith | gibi: reboot is a single call to virt, | 15:16 |
gibi | so it is less useful to have a virt driver method | 15:16 |
dansmith | and the regenerate has to happen in the middle | 15:16 |
gibi | dansmith: I see now | 15:16 |
dansmith | it's pretty unideal in general for sure | 15:16 |
dansmith | just saying, I think that's why it was done this way | 15:16 |
gibi | this is why I'm pointing out that the separate virt driver method is probably not needed | 15:17 |
sean-k-mooney | so we don't need a new public method in the virt driver api | 15:17 |
sean-k-mooney | it's just a new parameter to hard_reboot, right | 15:17 |
dansmith | no, the publicness isn't the concern I think | 15:17 |
sean-k-mooney | right i know, it's the uniformity of the api | 15:17 |
sean-k-mooney | and not having it depend on the backend | 15:17 |
bauzas | agreed with dansmith my concern isn't the publicness | 15:17 |
bauzas | this is about the fact we depend on virt drivers | 15:18 |
gibi | it would be nice to do the reboot in 3 generic steps from the compute manager: stop, regenerate, start. But this is a big change | 15:18 |
sean-k-mooney | yep i get that but we have many things that depend on the virt driver | 15:18 |
dansmith | I definitely think these kinds of pinpoint features that look generic but only work in libvirt are a problem | 15:18 |
bauzas | sean-k-mooney: well, do all the virt drivers support to create *or* update the configdrives ? | 15:19 |
dansmith | create yes | 15:19 |
dansmith | nobody supports update today AFAIK | 15:19 |
sean-k-mooney | vmware ironic libvirt yes | 15:19 |
bauzas | dansmith: that's my point, I wasn't seeing the dependency when reviewing the spec | 15:19 |
sean-k-mooney | for create | 15:19 |
sean-k-mooney | not sure about powervm | 15:19 |
dansmith | gibi: the problem with that is things like vmware where I'm not sure where the config drive actually gets created, or even if generating it on the compute node and passing it to the virt driver as a file would necessarily be reasonable | 15:19 |
dansmith | since it's a remote thing | 15:19 |
sean-k-mooney | i can take a look and see what would be required for other driver quickly | 15:20 |
sean-k-mooney | ironic is the hard one | 15:20 |
sean-k-mooney | they pass the config drive via ipa | 15:20 |
gibi | dansmith: I see | 15:21 |
dansmith | like I think ironic passes the configdrive as a string to the api, | 15:21 |
dansmith | never generates it on disk | 15:21 |
sean-k-mooney | ah that might be how nova passes it to ironic but then ironic connects it to the vm via ipa or mounting it | 15:21 |
dansmith | yeah | 15:21 |
dansmith | it serializes it and then ironic writes it remotely I think | 15:22 |
dansmith | now, if compute generated the file I guess the driver could read and send it | 15:22 |
dansmith | but still, those are big changes requiring lots of testing | 15:22 |
sean-k-mooney | so really we would want the regeneration logic to be here https://github.com/openstack/nova/blob/master/nova/virt/configdrive.py | 15:22 |
sean-k-mooney | in the generic shared part | 15:22 |
sean-k-mooney | although maybe not | 15:23 |
sean-k-mooney | that does not quite do what i was expecting | 15:23 |
bauzas | I tend to agree with dansmith, we're playing with guns | 15:23 |
dansmith | well, I like playing with guns, but maybe.. playing with fire? :) | 15:23 |
bauzas | if we try to update a configdrive and this doesn't work, then our user won't be happy at all | 15:24 |
bauzas | dansmith: what you prefer | 15:24 |
bauzas | or playing with teslas | 15:24 |
dansmith | as long as you mean "teslas, the unit of charge, meaning playing with dangerous high-voltage" .. then sure :D | 15:25 |
bauzas | point is, we need to only accept updating the userdata if the configdrive correctly regenerated | 15:25 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1060-L1092 | 15:25 |
sean-k-mooney | so ironic just turns it into a string, yes | 15:25 |
bauzas | dansmith: like "driving a tesla with a 10% battery for 100km" | 15:25 |
dansmith | oh I see | 15:25 |
dansmith | that *is* scary :) | 15:26 |
bauzas | this *may* work | 15:26 |
bauzas | but before driving, you need to ensure you can do it | 15:26 |
bauzas | the same goes with configdrives | 15:26 |
sean-k-mooney | and they pass it in spawn and rebuild https://github.com/openstack/nova/blob/bcdf5988f6ae902dba9b41144a7b4a60688b627c/nova/virt/ironic/driver.py#L1189-L1191 | 15:26 |
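The "configdrive as a string" idea above can be illustrated with a small round-trip (a sketch of the general gzip+base64 approach; the function names are made up and nova/ironic's real code differs in details):

```python
# Illustrative round-trip: gzip + base64 turns a built configdrive
# image into a plain string that can ride over a JSON API, and the
# receiving side reverses it before writing the drive out.

import base64
import gzip
import io

def serialize_configdrive(raw_iso_bytes):
    """Compress and base64-encode the raw configdrive image."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(raw_iso_bytes)
    return base64.b64encode(buf.getvalue()).decode("ascii")

def deserialize_configdrive(payload):
    """Decode and decompress back to the raw image bytes."""
    return gzip.decompress(base64.b64decode(payload))

data = b"fake configdrive ISO contents"
assert deserialize_configdrive(serialize_configdrive(data)) == data
```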
bauzas | you only accept the userdata to be updated once you're sure your configdrive is correctly regenerated | 15:26 |
dansmith | so the feeling I'm getting here is that there are a lot of open questions at this point | 15:27 |
bauzas | since the user has no idea whether the userdata is updated and only relies on the fact that he/she restarted the instance | 15:27 |
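(A minimal sketch of the guarantee being asked for above: the new user_data is only persisted if the config drive regeneration succeeds, so the API never reports an update the guest will never see. This is hypothetical illustration, not actual Nova code; `regen_config_drive` and the dict-based instance are stand-ins.)

```python
class ConfigDriveError(Exception):
    """Raised when the driver fails to rebuild the config drive."""


def update_user_data(instance, new_user_data, regen_config_drive):
    """Atomically swap user_data, rolling back on regen failure.

    `regen_config_drive` stands in for whatever driver call would
    rebuild the drive from the updated instance record.
    """
    old_user_data = instance.get("user_data")
    instance["user_data"] = new_user_data
    try:
        regen_config_drive(instance)
    except ConfigDriveError:
        # roll back so user_data and the config drive stay consistent,
        # and the user gets a clear failure instead of silent staleness
        instance["user_data"] = old_user_data
        raise
    return instance
```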
sean-k-mooney | bauzas: well that is already a problem today | 15:27 |
dansmith | sean-k-mooney: no it's not | 15:27 |
bauzas | today the userdata is immutable | 15:27 |
dansmith | sean-k-mooney: if you can only provide userdata during create, it's clear what userdata the instance sees | 15:27 |
bauzas | that reminds me the discussions we had when reviewing server groups /PUT | 15:28 |
sean-k-mooney | it is. the config drive is not updated today if you attach interfaces or volumes or update the server metadata | 15:28 |
sean-k-mooney | the user data is not stale | 15:28 |
sean-k-mooney | but in general the config drive can be | 15:28 |
dansmith | configdrive != userdata | 15:28 |
sean-k-mooney | yep that is my point | 15:28 |
sean-k-mooney | the config drive can be stale today; the user data is not | 15:28 |
sean-k-mooney | but that is only because it's immutable | 15:29 |
dansmith | right | 15:29 |
dansmith | not ideal, but also not ambiguous | 15:29 |
dansmith | which is bauzas' point I think | 15:29 |
bauzas | yup | 15:29 |
bauzas | since it's immutable, this isn't a problem | 15:29 |
sean-k-mooney | sure but it's expected that the user data script will likely depend on the other data in the config drive | 15:29 |
sean-k-mooney | specifically the device role tagging info | 15:29 |
opendevreview | Dan Smith proposed openstack/nova master: WIP: Add recreate_configdrive to reboot https://review.opendev.org/c/openstack/nova/+/855529 | 15:29 |
dansmith | sean-k-mooney: here's the rpc stuff ^ passing tests, not wired into anything else | 15:30 |
dansmith | just so it's available | 15:30 |
sean-k-mooney | dansmith: but to your point it does sound like there is enough stuff that we won't have this mergeable even with an FFE | 15:30 |
dansmith | needs more test coverage and an assert on soft I think | 15:30 |
bauzas | sean-k-mooney: my personal opinion is that the more I review this API change, the more I think I discover points we missed | 15:33 |
bauzas | hence my concern | 15:33 |
dansmith | so I should probably rebase that on your regen configdrive patch and I guess I can rebase the api bit on top of that and trivially wire it up | 15:33 |
bauzas | the configdrive regeneration is already a concern (making sure we validate it synchronously with the userdata update) | 15:33 |
bauzas | the fact that we leave other virt drivers in the weeds is another concern | 15:34 |
bauzas | both weren't discussed at the PTG | 15:34 |
dansmith | so on that, | 15:34 |
bauzas | and when I reviewed the spec, I left enough comments explaining why I thought the reboot solution was fragile | 15:34 |
dansmith | are we really expecting anyone to update vmware or hyperv to do this? | 15:34 |
bauzas | reasonably not | 15:35 |
dansmith | ironic perhaps, but for the above patch, | 15:35 |
dansmith | I was looking at those other drivers and wondering when they were last touched | 15:35 |
bauzas | surely, but we haven't even asked them to look | 15:35 |
dansmith | okay I guess we've had changes in 2022 for vmware | 15:35 |
dansmith | so maybe doable | 15:35 |
dansmith | not much for hyperv though | 15:36 |
dansmith | so I don't see any configdrive stuff in hyperv at first glance | 15:37 |
bauzas | fair enough, but then question | 15:37 |
bauzas | I'm a user | 15:37 |
bauzas | I wanna update my pet's userdata | 15:38 |
bauzas | I do it | 15:38 |
bauzas | and then I don't understand, my f*** script doesn't run as expected when I restart the guest | 15:38 |
bauzas | shall I open a ticket ? | 15:38 |
dansmith | oh it's in hyperv, just in vmops.py | 15:39 |
bauzas | oh, simple, your cloud was running some driver on that host that was having trouble with updating the configdrive | 15:39 |
bauzas | either because it was a legacy driver | 15:39 |
bauzas | or because something unexpected (like a perm error) happened | 15:39 |
bauzas | honestly, this spec was just about touching an internal Nova DB record | 15:40 |
bauzas | now, we're pulling way more than that | 15:40 |
dansmith | yeah, it's becoming more about configdrive than anything else | 15:40 |
sean-k-mooney | bauzas: in two conversations, but from my perspective i don't think we have really discovered anything bar the rpc change. i reviewed the spec with the understanding it was scoped to libvirt | 15:41 |
bauzas | yet again, can't we just agree to update an instance's userdata only if it's not configdrive-driven, and leave people who care about configdrives to deal with the complexity one day or another? | 15:42 |
dansmith | I dunno, I can see the argument for this being scoped to just one driver, but for something this fundamental it seems pretty unfortunate to say it's only for libvirt | 15:42 |
bauzas | I'm pretty sure we wouldn't have this back-to-back | 15:43 |
dansmith | so we reject the api call if configdrive=True always? | 15:44 |
bauzas | that's my question | 15:44 |
dansmith | doesn't that get closer to your concern on the spec that people think they have updated their instance but didn't reboot it and don't realize that cloud-init doesn't update their stuff? | 15:45 |
dansmith | I mean, that's just a misunderstanding of cloud-init, but I thought that was your concern | 15:45 |
dansmith | anyway, I dunno | 15:45 |
bauzas | if the update failed, they know the instance userdata wasn't updated | 15:45 |
bauzas | they would know this was because of the config drive if they asked for it via boot params | 15:46 |
bauzas | but they wouldn't know if the cloud defaulted to force configdrives or if the image was asking for it | 15:47 |
dansmith | right, it's not always their choice | 15:47 |
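(A rough sketch of why the user can't predict the outcome: whether an instance gets a config drive depends on three inputs, and only the first is under the user's control. This loosely mirrors Nova's logic in simplified form; the function and its dict-based inputs are illustrative, but `img_config_drive=mandatory` and `[DEFAULT]force_config_drive` are real Nova/Glance knobs.)

```python
def config_drive_required(api_requested, image_properties,
                          host_force_config_drive):
    """Decide whether an instance must get a config drive.

    1. the user asked for it at boot (visible to the user)
    2. the image demands it via img_config_drive=mandatory
    3. the compute host sets [DEFAULT]force_config_drive=True
       (invisible to the user, and can differ per host)
    """
    return (bool(api_requested)
            or image_properties.get("img_config_drive") == "mandatory"
            or bool(host_force_config_drive))
```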
bauzas | anyway, I'm just offering a trade-off because I'm not really happy with the current state of the series | 15:47 |
bauzas | also, is the owner of the patch around ? | 15:48 |
bauzas | or are we offering our help for free ? | 15:48 |
dansmith | who is asking for this btw? | 15:48 |
dansmith | aside from the author | 15:48 |
bauzas | a private cloud company | 15:49 |
bauzas | Inovex | 15:49 |
jhartkopf | I'm here | 15:49 |
bauzas | cool, glad to hear you jhartkopf | 15:49 |
bauzas | jhartkopf: lemme summarize | 15:49 |
dansmith | jhartkopf: do you care about the configdrive case? | 15:49 |
bauzas | we're having a problem with https://review.opendev.org/c/openstack/nova/+/816157/16 | 15:49 |
bauzas | and we don't know how to move forward | 15:49 |
jhartkopf | yep I quickly read your discussion | 15:50 |
bauzas | there is a proposed solution, but this is premature work and we're very late in the cycle | 15:50 |
bauzas | we're also creating a dependency on the virt drivers and unfortunately, Hyper-V, VMWare and Ironic could be impacted if we merge | 15:50 |
bauzas | lastly, I have concerns about ensuring that the userdata API query is tied to the success of the configdrive regen op | 15:51 |
bauzas | which would require the API call to be synchronous on a configdrive regen | 15:51 |
bauzas | so, my question is | 15:52 |
* sean-k-mooney i don't think it's fair to say the other virt drivers would be impacted | 15:52 |
bauzas | do you really care about configdrives? | 15:52 |
dansmith | bauzas: we can't make the api call synchronous for config drive regen | 15:53 |
dansmith | that could take a long time, be dependent on the IO load on the compute, etc | 15:53 |
sean-k-mooney | i dont think that was an option | 15:53 |
bauzas | sean-k-mooney: I think it would be fair to say that this feature wouldn't be available to the other drivers while apparently they already support configdrive regen | 15:53 |
bauzas | eek, not regen, but creation | 15:53 |
sean-k-mooney | bauzas: yes that is correct | 15:53 |
bauzas | dansmith: right, I was pointing out loudly the problem | 15:53 |
dansmith | ah okay | 15:54 |
bauzas | this is a design problem | 15:54 |
bauzas | configdrives take time to regenerate | 15:54 |
bauzas | on the other hand, users have an API for updating userdata | 15:54 |
bauzas | they assume they just have to restart their guests in order to have the latest update | 15:55 |
jhartkopf | bauzas: Initially we did not even want to touch config drives at all. We'd only like to update user data in the metadata service really. | 15:55 |
sean-k-mooney | right but the api prevents you updating the user data if the instance has a config drive, outside of hard_reboot | 15:55 |
bauzas | I know | 15:55 |
stephenfin | gibi: Sorry for the delay. I'd run through most of those PCI patches yesterday but didn't leave reviews until I got to the end of the series. Done now (y) | 15:55 |
gibi | stephenfin: much appreciated. I will look | 15:55 |
stephenfin | I went as far as sean-k-mooney did. Could go further but I see -1's from you | 15:55 |
gibi | stephenfin: yes, I left them -1 there as there is a bug in the scheduling part | 15:56 |
gibi | stephenfin: I think it is totally OK to merge it up until https://review.opendev.org/c/openstack/nova/+/850468/20 | 15:56 |
jhartkopf | What's up with the plan you guys proposed yesterday to split the patch and only allow updating instances without config drive for now? | 15:57 |
gibi | stephenfin: the rest slips to AA | 15:57 |
stephenfin | sounds reasonable to me | 15:57 |
sean-k-mooney | jhartkopf: i would prefer not to do that if we can avoid it but it's an option | 15:58 |
sean-k-mooney | i think making this feature depend on whether a config drive is used is worse than only supporting it for some virt drivers | 15:58 |
gibi | stephenfin: there are two FUPs at the top. I will move them to be on the mergeable part of the series | 15:58 |
sean-k-mooney | since the config drive can change based on what host you land on | 15:58 |
sean-k-mooney | there is no way to know if it will work or not | 15:59 |
dansmith | sean-k-mooney: hmm, the mandatory config drive thing is a compute-node config right? | 15:59 |
sean-k-mooney | dansmith: yes | 15:59 |
dansmith | yeah, that sucks | 15:59 |
dansmith | so instances at the edge which use configdrive become un-updatable | 15:59 |
sean-k-mooney | yep | 16:00 |
gibi | user could use the capability trait to land on a host that supports regen | 16:00 |
dansmith | so another thing, | 16:00 |
sean-k-mooney | or ones in isolated networks | 16:00 |
dansmith | we really should support reboot with user_data for any instance, | 16:00 |
dansmith | because that's the consistent way to get it updated, regardless of what type it is | 16:00 |
dansmith | that way you get updated, you get rebooted, etc | 16:00 |
dansmith | gibi: the user doesn't know this is a problem until it's too late is the point | 16:01 |
sean-k-mooney | dansmith: that's more or less what we were trying to do and why i was insisting on config drive support | 16:01 |
sean-k-mooney | i guess the disconnect is | 16:01 |
dansmith | gibi: they don't choose a host based on whether they want to update the user_data a month from now :) | 16:01 |
sean-k-mooney | i thought it was ok to start with the libvirt driver | 16:01 |
sean-k-mooney | since we could add other drivers later | 16:01 |
dansmith | sean-k-mooney: I don't think I said I'm opposed to that, what I'm opposed to is just never having that implemented for the other drivers | 16:02 |
sean-k-mooney | gibi: dansmith also i fixed a bug where migration could cause you to gain/lose a config drive, by making it sticky | 16:02 |
sean-k-mooney | so if the vm is first created with a config drive because of the option, i made that sticky | 16:02 |
dansmith | and given how intertwined this is with the virt driver, and that it was already done wrong once, I would hate to implement this one way and find out that vmware or hyperv or ironic can't honor the request to rebuild during a reboot at the right time | 16:02 |
sean-k-mooney | dansmith: well that is currently blocked by the trait today | 16:03 |
dansmith | sean-k-mooney: that also means that if you got a configdrive when you created the instance, it will never have update-able user_data right? | 16:03 |
sean-k-mooney | but if we tried to do it quick that could break | 16:03 |
sean-k-mooney | dansmith: yes exactly | 16:03 |
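(A small sketch of the "sticky" behaviour just described: the create-time decision is recorded on the instance, so migrating to a host with a different `force_config_drive` setting can no longer add or remove the drive. Hypothetical illustration only; the field name and dict-based instance are stand-ins for Nova's actual instance record.)

```python
def has_config_drive(instance, host_force_config_drive):
    """Return whether this instance has a config drive, sticky-style.

    Once the decision is recorded on the instance (at create time),
    the per-instance flag always wins over the current host's config.
    """
    if instance.get("config_drive") is not None:
        return instance["config_drive"]
    decision = bool(host_force_config_drive)
    instance["config_drive"] = decision  # make the decision sticky
    return decision
```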
dansmith | sean-k-mooney: no I get that we can detect if it's supported today, I'm talking about if we merge this and next cycle ironic says "we literally can't regenerate the config drive" and hyperv says 'we can, but not in the middle of a reboot because of how we implement that" | 16:04 |
dansmith | then we're kinda stuck | 16:04 |
sean-k-mooney | ya fair | 16:05 |
sean-k-mooney | its internal to the driver but it might for example require the current reboot to be split into stop, update config drive, start | 16:05 |
sean-k-mooney | for the hyperv scenario | 16:05 |
sean-k-mooney | which we might or might not be able to hide | 16:05 |
dansmith | libvirt's hard reboot is basically a recreate, but that doesn't mean the other drivers are | 16:06 |
sean-k-mooney | yep | 16:06 |
dansmith | let me also say that I'm sorry I brought up all these concerns late, but I *was* asked to review this and have tried to put my money where my mouth is on changes | 16:06 |
dansmith | but I think these are all legit concerns | 16:06 |
bauzas | yeah, there are no easy paths for solving this problem | 16:07 |
dansmith | I know gibi is plotting my murder right now, probably conspiring with jhartkopf :) | 16:07 |
sean-k-mooney | they are. we discussed it at the ptg as we had previously rejected the spec | 16:07 |
sean-k-mooney | last cycle | 16:07 |
gibi | I'm sorry that I drove jhartkopf's solution to a dead end. | 16:08 |
bauzas | well, I had concerns about the complexity it was creating for little gain | 16:08 |
bauzas | but I was opposed this was an easy win | 16:08 |
gibi | I reviewed these patches and missed obvious design errors. I will try better next time. | 16:09 |
bauzas | so, now, I'm trying to find a trade-off but I don't wanna pull the strings if I think this is risky | 16:09 |
sean-k-mooney | well "little gain" is not necessarily fair; it makes nova more "cloud native", as the idea was that user data should be more like k8s config maps | 16:09 |
sean-k-mooney | although to be fair that would also imply we should delete and recreate the vms | 16:09 |
dansmith | hah right | 16:09 |
dansmith | no problem updating user_data if we just shoot the instances in the head :D | 16:09 |
bauzas | sean-k-mooney: we never proposed userdata to be mutable | 16:09 |
sean-k-mooney | who is we | 16:10 |
bauzas | in a cloud, you just spin another instance if you dislike your existing userdata | 16:10 |
sean-k-mooney | this has been a long-running request for multiple cycles | 16:10 |
sean-k-mooney | not in aws | 16:10 |
sean-k-mooney | and other platforms | 16:10 |
jhartkopf | dansmith: Honestly it's better to notice problems now than when it's too late | 16:10 |
sean-k-mooney | apparently openstack was an outlier | 16:10 |
bauzas | https://docs.openstack.org/nova/latest/user/metadata.html#user-provided-data | 16:11 |
gibi | dansmith: why would I? I feel sorry for jhartkopf's time spent on this. And I feel bad that I was not able to find the issues you and sean-k-mooney found | 16:11 |
bauzas | glad I'm not quoted :) | 16:12 |
sean-k-mooney | bauzas: right but this concept was originally borrowed from ec2 | 16:12 |
sean-k-mooney | bauzas: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html | 16:12 |
sean-k-mooney | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-view-change | 16:13 |
bauzas | the stop requirement from EC2 may help | 16:14 |
bauzas | anyway, I was about to draft a design modification | 16:15 |
sean-k-mooney | well doing it on hard reboot was an optimisation of stop, update, start | 16:15 |
sean-k-mooney | but we could do that instead | 16:15 |
sean-k-mooney | or shelve | 16:15 |
bauzas | the fact is, if we only allow the update while the instance is stopped, we would only allow the instance to be started once the userdata is updated | 16:16 |
sean-k-mooney | unless server update is blocking, right now i'm not sure it helps | 16:16 |
bauzas | at least once the request ended | 16:16 |
sean-k-mooney | bauzas: well we would need a new rpc to the compute to rebuild the config drive | 16:17 |
dansmith | we still have to regen the config drive though | 16:17 |
dansmith | which we don't do on reboot right? | 16:17 |
dansmith | er, stop..start | 16:17 |
sean-k-mooney | right, we don't regen the config drive outside of unshelve or cross-cell migration | 16:17 |
dansmith | but yeah, saying you can only update user_data if the instance is stopped is probably a lot better for all the arguments | 16:17 |
bauzas | phew | 16:17 |
sean-k-mooney | and unshelve only for non-rbd-backed instances | 16:17 |
dansmith | it addresses the "so do I need to reboot?" question on non-configdrive | 16:17 |
dansmith | and eliminates the need for the rpc | 16:18 |
dansmith | it still requires regen to work, and still may or may not be easy or possible for some drivers | 16:18 |
dansmith | like ironic might have to change the provision state to re-write a disk or something | 16:18 |
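(The stop/update/start variant being discussed could look roughly like this: reject the update with a 409-style conflict unless the instance is stopped, regenerate the drive while the guest cannot race with it, and only then allow a start. Hypothetical sketch; the states, names and dict-based instance are illustrative, not Nova's actual vm_states or API.)

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""


def update_user_data_while_stopped(instance, new_user_data, regen):
    """Update user_data only on a stopped instance, regenerating now."""
    if instance["vm_state"] != "stopped":
        raise Conflict("user_data can only be updated on a stopped "
                       "instance")
    instance["user_data"] = new_user_data
    # regenerate immediately, while the guest is down; the next start
    # then boots with a drive that is consistent with the new data,
    # and no dirty flag or extra start-time failure mode is needed
    regen(instance)
    return instance
```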
gibi | will we regen for each start then? | 16:18 |
dansmith | gibi: that'd be a question yeah | 16:19 |
sean-k-mooney | i'm not sure we would remove the rpc | 16:19 |
dansmith | I really don't like the idea of a dirty flag for the other reasons | 16:19 |
dansmith | sean-k-mooney: oh you mean a new rpc for the update | 16:19 |
dansmith | yeah that could work | 16:19 |
sean-k-mooney | ya | 16:19 |
dansmith | we could just nuke the existing drive in the libvirt case, but something like ironic would have to keep track | 16:19 |
sean-k-mooney | i mean we have another option | 16:19 |
sean-k-mooney | make this entirely user-driven by adding a new instance action | 16:20 |
sean-k-mooney | for updating the config drive | 16:20 |
dansmith | oh instead of a PUT | 16:20 |
bauzas | I know we're on Zed-3 day and I'm supposed to stay but this is 6:20pm here and I wonder how much of this could be etherpaded | 16:20 |
sean-k-mooney | ya, stop, update whatever you like, explicitly tell it to update the config drive if you care, start | 16:21 |
bauzas | because we're discussing a design change | 16:21 |
bauzas | dansmith: yeah, something we would say "damn nova, please do the necessary to plumb my things" | 16:21 |
sean-k-mooney | bauzas: i think we all realise that we likely won't make progress now | 16:21 |
bauzas | an instance action would be nice to me, because it would get a status | 16:22 |
bauzas | and the async thing would be simple for the user | 16:22 |
bauzas | like, I started again and my userdata didn't update, why? | 16:22 |
bauzas | => server action tells me it failed | 16:22 |
sean-k-mooney | well the server action can be blocking or async | 16:23 |
sean-k-mooney | but it has a feedback mechanism either way | 16:23 |
gibi | this could be a very long running blocking call | 16:23 |
bauzas | yup, the whole point is that it gives a clue why the userdata wasn't up to date | 16:23 |
sean-k-mooney | right, it would be in the instance event log | 16:23 |
bauzas | and until the userdata says "I'm updated", don't expect to get the new things if you start | 16:23 |
gibi | we can also make the start to fail if it needed to regen but failed to | 16:24 |
sean-k-mooney | if it's async, or in the response if blocking | 16:24 |
sean-k-mooney | gibi: that would require tracking the dirty state | 16:24 |
sean-k-mooney | and also some people might not care | 16:24 |
gibi | I don't see how somebody would explicitly update the user_data and then not care whether that update ends up in the VM being started | 16:25 |
gibi | I think the dirty flag was bad because of shadow rpc | 16:25 |
sean-k-mooney | im thinking about something else | 16:25 |
sean-k-mooney | as a user i might not know if the user data is in the config drive or not | 16:26 |
sean-k-mooney | but i said start | 16:26 |
sean-k-mooney | and i might not want that to break in this case | 16:26 |
gibi | I see | 16:27 |
sean-k-mooney | the dirty flag was to give you the behaviour that it would only boot if it succeeded in updating | 16:27 |
bauzas | honestly, if I'm a user, I don't care about the internals | 16:27 |
sean-k-mooney | if it failed, at least in my version the vm would have gone to error | 16:27 |
bauzas | what I want to know is when my update succeeds and when I can use it | 16:27 |
dansmith | sorry, I'm getting pulled away, | 16:28 |
dansmith | but if it was an instance action, | 16:28 |
dansmith | then the rebuild either works or doesn't, I can see what happened, | 16:28 |
dansmith | and then I either get the old version on fail, or the new version on succeed | 16:28 |
dansmith | if we just set a dirty flag and then fail to regen on start, then we've killed the instance with no way to restart | 16:28 |
dansmith | which sucks | 16:28 |
bauzas | yup, I get feedback from an user perspective | 16:28 |
bauzas | yup, agreed | 16:28 |
sean-k-mooney | jhartkopf: so based on the above | 16:29 |
dansmith | we probably need some task_state or something to keep the user from starting it until the regen completes though | 16:29 |
sean-k-mooney | jhartkopf: and the fact we are all running out of steam | 16:29 |
bauzas | mikal would say "spec'y spec'y" | 16:29 |
dansmith | or just use instance lock on the compute node to hold off on any other requests.. I guess that's enough | 16:29 |
sean-k-mooney | i think we need to disucss this again next cycle | 16:29 |
bauzas | the next PTG will be virtual | 16:29 |
bauzas | anyone can join, budget saved | 16:30 |
* bauzas tries to not be sarcastic | 16:30 | |
sean-k-mooney | ya but we can reopen the spec and discuss it again before that | 16:30 |
bauzas | oh yeah | 16:30 |
sean-k-mooney | i see two ways we could go with the instance action by the way | 16:30 |
sean-k-mooney | one is an action for regenerating the config drive | 16:31 |
sean-k-mooney | the other is more targeted for this feature | 16:31 |
sean-k-mooney | an instance action to update user_data | 16:31 |
sean-k-mooney | that either succeeds or does not | 16:31 |
sean-k-mooney | and have that be the only way to update it | 16:31 |
sean-k-mooney | so no updating it via server put | 16:31 |
dansmith | yeah, the user should not see "regenerate config drive" | 16:32 |
dansmith | so it should be "update my user_data" at the external surface for sure | 16:32 |
sean-k-mooney | right | 16:32 |
sean-k-mooney | so if we do that | 16:32 |
sean-k-mooney | that dedicated instance action can "do the right thing" regardless of how the instance is booted | 16:33 |
dansmith | right | 16:33 |
sean-k-mooney | and return a 409 conflict if it's not supported by the driver, or whatever | 16:33 |
jhartkopf | sean-k-mooney: Oh that'd be bad news. Was hoping to get at least the stuff without config drives merged this cycle. | 16:33 |
sean-k-mooney | but it can be atomic | 16:34 |
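(Pulling the instance-action idea together: the user calls "update user_data", the driver capability gates it with a conflict, the change either fully succeeds or leaves the old data in place, and the outcome lands in an action event log the user can inspect, which is the feedback mechanism discussed above. Entirely a hypothetical sketch; the capability key, action name and dict-based driver are illustrative, not Nova's actual interfaces.)

```python
class Conflict(Exception):
    """Stand-in for an HTTP 409 response."""


def update_user_data_action(instance, new_user_data, driver, action_log):
    """Dedicated instance action: atomic, gated, and visible.

    Either fully succeeds or leaves the old user_data in place, and
    records the outcome as an (action, result) event.
    """
    if not driver.get("supports_config_drive_regen", False):
        # mirrors "return a 409 conflict if it's not supported"
        raise Conflict("driver cannot regenerate the config drive")
    old = instance.get("user_data")
    instance["user_data"] = new_user_data
    try:
        driver["regen"](instance)
    except Exception:
        instance["user_data"] = old
        action_log.append(("updateUserData", "Error"))
        raise
    action_log.append(("updateUserData", "Success"))
    return instance
```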
JayF | o/, I have a handful of ironic driver patches I'm still trying to get backported thru. They are all clean backports and passing tests. | 16:34 |
JayF | https://review.opendev.org/c/openstack/nova/+/854257 (no live migration tests on ironic driver changes -> yoga) https://review.opendev.org/c/openstack/nova/+/821351 (ignore plug_vifs on ironic driver -> ussuri) https://review.opendev.org/c/openstack/nova/+/853546 (minimize window for a resource provider to be lost -> train) | 16:34 |
bauzas | sean-k-mooney: dansmith: yeah, the user-surfaced API is the userdata, not the configdrive | 16:34 |
bauzas | so the instance action has to be related to it | 16:35 |
sean-k-mooney | JayF: did we land the skip of the migration job yet | 16:35 |
JayF | sean-k-mooney: that's the yoga backport above | 16:35 |
JayF | sean-k-mooney: the version -> master is in | 16:35 |
sean-k-mooney | ah cool | 16:35 |
sean-k-mooney | i was wondering if the master one had landed | 16:36 |
bauzas | jhartkopf: the problem is that we can't introduce this userdata update by splitting out the no-configdrive case, as we just agreed on a brand new API | 16:36 |
JayF | oh, I've already checked all these -- previous branches have merged, these should be good to go. e.g. I'd approve these if they were ironic stable changes | 16:36 |
bauzas | for consistency reasons, we have to hide the configdrive details from the user request | 16:36 |
sean-k-mooney | bauzas: that, and my +2 on the spec was contingent on it not depending on whether you use a configdrive or not | 16:36 |
JayF | and elod helped me clean up the only piece around nova stable policy that's more strict than Ironic :D | 16:36 |
bauzas | sean-k-mooney: those kinds of design problems happening late in the cycle are common | 16:37 |
bauzas | jhartkopf: this isn't at all about your implementation, this is rather about the fact we discovered some design problems we weren't considering at the spec review time | 16:38 |
gibi | we should be better at doing this kind of design discussion before the FF day | 16:38 |
gibi | but I don't know how | 16:38 |
bauzas | honestly, this is the only blueprint that fails this for this cycle AFAICT | 16:38 |
bauzas | gibi: we had precedents | 16:38 |
bauzas | some spec approved and a design question raising late | 16:39 |
jhartkopf | bauzas: I understand that, but it's really unfortunate though | 16:39 |
bauzas | jhartkopf: I don't disagree with you and I understand how much this can be frustrating | 16:39 |
artom | bauzas, wait, so ephemeral disk, PCI, and manial will all make it? | 16:40 |
* artom is frankly astounded | 16:40 | |
artom | *Manila | 16:40 |
gibi | bauzas: I don't think we can detect all the design issues at spec time. But still I think we should be better than doing it at the day of FF | 16:40 |
sean-k-mooney | not sure about manila | 16:40 |
bauzas | jhartkopf: I just hope you can understand that our API resources are important to us and we really care about the UX | 16:40 |
* artom maybe misunderstood "this is the only blueprint that fails this for this cycle AFAICT"? | 16:40 | |
sean-k-mooney | ephemeral also likely not | 16:40 |
bauzas | artom: no, I was mentioning a design issue being raised | 16:40 |
artom | Ah | 16:40 |
bauzas | ephemerals and PCI are on their way but they're big and need a lot of back and forths | 16:41 |
gibi | the only easy answer for me now is that I should have asked dansmith earlier to review the series | 16:41 |
bauzas | but they didn't require a redesign | 16:41 |
bauzas | gibi: well, if you read the spec, I had concerns about the restart | 16:42 |
dansmith | gibi: this is why I'm against yearly releases.. because this stuff, very unfortunately, only comes to a head when there's a deadline... | 16:42 |
gibi | so I should have asked you to review the series too | 16:42 |
bauzas | we just didn't go further thinking about the configdrive problems | 16:42 |
gibi | dansmith: I totally agree that having a more meaningful deadline would help | 16:42 |
sean-k-mooney | dansmith: because that's the forcing function that gets a lot of us to review at the same time | 16:43 |
sean-k-mooney | dansmith: in the absence of an in-person PTG | 16:43 |
dansmith | and I only reviewed it because it was in front of something else that I know we needed to land, which was up against the deadline | 16:43 |
bauzas | and I was on PTO for 3.5 weeks | 16:43 |
bauzas | so blame me too | 16:43 |
dansmith | WE ALL SUCK | 16:43 |
dansmith | there ^ :P | 16:43 |
sean-k-mooney | :) | 16:43 |
gibi | I don't want to blame, I want to find something I or we can do different next time | 16:43 |
jhartkopf | this ^ | 16:44 |
bauzas | everyone has different review sensitivities | 16:44 |
bauzas | for me, I mostly care about upgrades and the controller/compute relationship | 16:44 |
bauzas | but I miss other things | 16:45 |
bauzas | the point is, we're humans | 16:45 |
* artom was completely absent for the entire thing, and thus would like to claim that he does not, in fact, suck. | 16:45 | |
bauzas | artom: you're Quebecian | 16:45 |
bauzas | there is a default fail | 16:45 |
gibi | I don't want to believe that this is OK as is. :) | 16:46 |
artom | You want a Maurice Richard riot v2.0? Because that's how you get a Maurice Richard riot v2.0. | 16:46 |
gibi | I want to improve! | 16:46 |
sean-k-mooney | well i guess where this should have been hashed out comes back to the spec | 16:46 |
bauzas | yup ^ | 16:46 |
sean-k-mooney | and the first red flag should have been when the question of how to regenerate the config drive came up | 16:47 |
sean-k-mooney | after the spec was merged | 16:47 |
gibi | dansmith: we would need a bit more than deadlines; we would need deadlines where missing one forces a patch series to wait 2 months or so | 16:48 |
dansmith | to not be? | 16:48 |
sean-k-mooney | we did actully have such a deadline in the past | 16:49 |
gibi | dansmith: to be. If the deadline only means I can try the same deadline two weeks from now then that does not force me to push for this particular deadline | 16:49 |
dansmith | I think part of the problem is that jhartkopf has to wait 6mon (minimum) before he has a consumable release with this in it | 16:49 |
sean-k-mooney | it was called the non-priority feature proposal deadline | 16:49 |
dansmith | gibi: yeah that's true | 16:49 |
sean-k-mooney | which required any feature not marked as a priority to be proposed by m2 | 16:49 |
dansmith | gibi: two weeks is probably too infrequent to be pressure-inducing | 16:49 |
gibi | dansmith: true, I defaulted to scrum | 16:50 |
dansmith | we should probably retro this later, I have other m3 stuff I need to do | 16:50 |
gibi | sean-k-mooney: that fulfills the big-enough-pain-to-miss requirement. :) | 16:50 |
gibi | sean-k-mooney: I guess it introduced another issue, selecting priorities. But that was maybe less of a problem | 16:51 |
gibi | and we did see that having priorities would help with the rebuild bfv series | 16:52 |
gibi | * would have helped | 16:52 |
gibi | so I'm not against discussing the reintroduction of non-priority feature freeze as one possible option | 16:52 |
sean-k-mooney | yep | 16:52 |
bauzas | noted. | 16:53 |
sean-k-mooney | so way back we used to require non-priority features to merge by m2 | 16:53 |
* dansmith starts singing "the circle of life" in the background | 16:53 | |
JayF | As a mostly-outside observer, this sounds mostly like the standard "10 gallons of work has to fit in a 5 gallon bucket" problem :) | 16:53 |
sean-k-mooney | we then relaxed it to "to be proposed by m2" | 16:53 |
sean-k-mooney | and then removed it | 16:53 |
bauzas | remember, we discussed this at the PTG when I pointed out I was off for three weeks | 16:53 |
gibi | I have to drop, but thanks for the above discussion | 16:54 |
bauzas | but anyway, I personally feel another deadline wouldn't have changed the situation | 16:54 |
gibi | lets continue this on the PTG as part of a retro | 16:54 |
sean-k-mooney | sure | 16:54 |
bauzas | yet again, this was a design concern, missed at the design stage | 16:54 |
dansmith | to be clear, my point about deadlines was that moving to yearly releases would make this a lot worse, not necessarily that more deadlines than we have would *solve* it | 16:54 |
bauzas | dansmith: oh yeah | 16:54 |
sean-k-mooney | dansmith: yep it would | 16:54 |
melwitt | sean-k-mooney: thanks for the link | 17:13 |
opendevreview | Jay Faulkner proposed openstack/nova stable/xena: nova-live-migration tests not needed for Ironic https://review.opendev.org/c/openstack/nova/+/855469 | 17:38 |
*** efried1 is now known as efried | 17:53 | |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 18:14 |
ricolin | sean-k-mooney: bauzas sorry, a pep8 error, can you approve it again, thanks | 18:14 |
opendevreview | Merged openstack/nova master: Heal missing simple PCI allocation in the resource tracker https://review.opendev.org/c/openstack/nova/+/851359 | 18:15 |
opendevreview | Merged openstack/nova master: Heal PCI allocation during resize https://review.opendev.org/c/openstack/nova/+/852396 | 18:16 |
opendevreview | Merged openstack/nova master: Heal allocation for same host resize https://review.opendev.org/c/openstack/nova/+/854822 | 18:16 |
opendevreview | Merged openstack/nova master: Handle PCI dev reconf with allocations https://review.opendev.org/c/openstack/nova/+/852397 | 18:21 |
opendevreview | Merged openstack/nova master: Allow enabling PCI tracking in Placement https://review.opendev.org/c/openstack/nova/+/850468 | 18:21 |
dansmith | ricolin: I got you | 18:25 |
ricolin | thanks dansmith | 18:28 |
bauzas | ricolin: dansmith: done too | 18:52 |
bauzas | fwiw, I'm setting Implemented against all blueprints that have their API changes merged but waiting for client patches | 19:15 |
bauzas | nova in launchpad != novaclient and osc | 19:15 |
opendevreview | Merged openstack/nova stable/ussuri: Ignore plug_vifs on the ironic driver https://review.opendev.org/c/openstack/nova/+/821351 | 19:29 |
bauzas | fwiw, procedural -2 on https://review.opendev.org/c/openstack/nova/+/816157/16 | 19:32 |
JayF | Interesting failure case in stable/yoga CI, I think it's probably something that's actually-broken, but not by my patch: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8b1/854257/2/gate/openstack-tox-py39/8b1323f/testr_results.html | 19:35 |
JayF | Given that all tests that were comparing the contents there are all failing in a similar way, with no human-readable changes | 19:35 |
sean-k-mooney | JayF i have hit that too | 19:36 |
sean-k-mooney | on a completely unrelated patch | 19:36 |
sean-k-mooney | i can't reproduce it locally for what it's worth | 19:36 |
JayF | are you suggesting I just retry then? | 19:36 |
sean-k-mooney | let me check if my other one pass after a retry | 19:36 |
JayF | I can do that, I usually don't pull that lever if it looks nonrandom like this one does, but I don't know enough about nova failure cases to be certain | 19:36 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/855025 failed again | 19:38 |
sean-k-mooney | JayF: same issue | 19:38 |
sean-k-mooney | and its only on py39 | 19:38 |
sean-k-mooney | the py36 test passes | 19:38 |
bauzas | gibi: sean-k-mooney: others, you may be interested in the antelope timeframe https://review.opendev.org/c/openstack/releases/+/850753/5/doc/source/antelope/schedule.rst | 19:38 |
sean-k-mooney | JayF: so its a real blocker | 19:38 |
bauzas | gibi: sean-k-mooney: this sounds like a shorter development cycle | 19:39 |
sean-k-mooney | not related to your patch at all | 19:39 |
sean-k-mooney | i had it fail on all patches in a 5/6 patch series and on the recheck of the first patch | 19:39 |
sean-k-mooney | so it seems to be reproducible in ci | 19:39 |
sean-k-mooney | JayF: i am not sure i'll have time to look at that this week but we likely should track that as a stable gate bug. i'll try and follow up with elodilles about it next week | 19:45 |
JayF | When you say "track that as a stable gate bug", what does that mean in practice for Nova? | 19:46 |
JayF | I know in Ironic, e.g., we have an etherpad we track that in | 19:46 |
JayF | I'm happy to document the issue; I can't sit here and ask for a bajillion stable reviews without being willing to help with CI somewhat :D | 19:46 |
sean-k-mooney | JayF: we have an etherpad and we file a bug with gate blocker in the title | 19:46 |
JayF | Where could I find a link to that etherpad? | 19:46 |
sean-k-mooney | https://etherpad.opendev.org/p/nova-stable-branch-ci | 19:46 |
JayF | ty | 19:47 |
JayF | bugs in nova are launchpad, not storyboard? | 19:48 |
JayF | yep looks like it, filing it and will update the etherpad | 19:48 |
sean-k-mooney | yep launchpad | 19:48 |
sean-k-mooney | we have a gate-failure tag | 19:49 |
opendevreview | Merged openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 19:56 |
sean-k-mooney | JayF: https://tinyurl.com/mr3pjzyn | 19:57 |
JayF | What login would I use for that? | 19:57 |
JayF | I've not used opensearch yet | 19:57 |
sean-k-mooney | openstack openstack | 19:57 |
* JayF hasn't done serious bulk-gate-troubleshooting since the elastic-recheck days | 19:57 | |
sean-k-mooney | its basically the same | 19:57 |
sean-k-mooney | so only 8 hits, 6 are me i think, but i feel like that might be a limitation of the query | 19:58 |
JayF | one of the other two is me lol | 19:59 |
sean-k-mooney | it's showing in the nova cross job | 19:59 |
sean-k-mooney | but this job is openstack-tox-py39 | 19:59 |
sean-k-mooney | so i dont know why that is not showing up | 19:59 |
JayF | it's going to be extremely hard to make a matcher on this that isn't at least possible it'll catch other stuff | 20:01 |
JayF | because this is essentially "it looks like a proper unit test failure but just... isn't" | 20:01 |
sean-k-mooney | nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED | 20:01 |
sean-k-mooney | i should be able to match on that instead | 20:01 |
JayF | Yeah, I've read the docs a couple of times, and I can't figure out how to wildcard out the time in the test run search | 20:13 |
sean-k-mooney | same i think im going to give up for now | 20:13 |
* JayF sometimes wishes these guis were just a terminal with something greppy installed | 20:13 | |
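(Editor's note: the matcher sean-k-mooney and JayF were trying to build above has to ignore the per-run timing in the stestr failure line, since `[0.025453s]` changes on every run. This is a hypothetical local sketch of that idea in Python — stripping the timing before comparing lines — not the actual OpenSearch query syntax, which they were unable to get wildcarding working in.)

```python
import re

# stestr prints failures like:
#   nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict [0.025453s] ... FAILED
# The bracketed timing differs on every run, so to recognize the same
# failure across builds we drop it before matching.
TIMING = re.compile(r"\s*\[\d+\.\d+s\]")

def normalize(line: str) -> str:
    """Strip the variable per-test timing so identical failures compare equal."""
    return TIMING.sub("", line)

line = ("nova.tests.unit.cmd.test_manage.UtilitiesTestCase.test_format_dict"
        " [0.025453s] ... FAILED")
print(normalize(line))
```

With the timing removed, a plain substring match on the normalized line (test id plus `... FAILED`) is enough to count occurrences across job logs, which is all the OpenSearch query needed to do.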
opendevreview | Merged openstack/nova-specs master: Create specs directory for 2023.1 Antelope https://review.opendev.org/c/openstack/nova-specs/+/851007 | 20:15 |
opendevreview | Sofia Enriquez proposed openstack/nova master: WIP: Check NFS protocol https://review.opendev.org/c/openstack/nova/+/854030 | 21:09 |
opendevreview | Sofia Enriquez proposed openstack/nova master: Check NFS protocol https://review.opendev.org/c/openstack/nova/+/854030 | 21:28 |
*** dasm is now known as dasm|off | 21:42 | |
JayF | I filed https://bugs.launchpad.net/nova/+bug/1988482 and am documenting it in that etherpad now. | 21:57 |
JayF | let me know if there's anything I can do to help resolve it, I am a little behind on my usual stuff due to being out sick earlier this week, but if it's not picked up by early next week I might see if I can figure it out | 21:59 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!