Thursday, 2025-04-03

opendevreviewMohammed Naser proposed openstack/nova master: Add reproducer for bug#2106073  https://review.opendev.org/c/openstack/nova/+/94622100:43
mnasercardoe: well that's a reproducer alongside https://bugs.launchpad.net/nova/+bug/2106073 .. now to figure out how to nicely fix this :)00:44
cardoemnaser: btw did you see my note about the release GH action not working on keystoneauth-websso?00:45
mnaserahhhh sorry cardoe got distracted00:45
mnaseri'll add it to my tomorrow list!00:45
cardoehmm. That's gonna be a tough one to solve cause config_drive is generated before the update_port_postcommit() which will resolve that deferred allocation no?00:46
opendevreviewMohammed Naser proposed openstack/nova master: Add reproducer for bug#2106073  https://review.opendev.org/c/openstack/nova/+/94622102:36
mnaserhttps://bugs.launchpad.net/nova/+bug/2106073 - revised the bug, seems to be affecting nova at wide since it feeds in ports without ip allocation (and doesnt wait for it ip_allocation=deferred) and pushed another repro..02:37
opendevreviewYaguang Tang proposed openstack/nova stable/2024.1: Fix deepcopy usage for BlockDeviceMapping in get_root_info  https://review.opendev.org/c/openstack/nova/+/92669603:14
opendevreviewmelanie witt proposed openstack/nova master: Call volume detach rollback API if detach fails  https://review.opendev.org/c/openstack/nova/+/88039903:35
opendevreviewmelanie witt proposed openstack/nova master: Reproducer for bug 2016173  https://review.opendev.org/c/openstack/nova/+/94622203:35
opendevreviewMasahito Muroi proposed openstack/nova master: Add check_instance_state validation to lock API  https://review.opendev.org/c/openstack/nova/+/94622303:52
opendevreviewMasahito Muroi proposed openstack/nova master: Add check_instance_state validation to lock API  https://review.opendev.org/c/openstack/nova/+/94622304:14
opendevreviewMasahito Muroi proposed openstack/nova master: Add check_instance_state validation to lock API  https://review.opendev.org/c/openstack/nova/+/94622308:46
opendevreviewMasahito Muroi proposed openstack/nova master: Use dict object for request_specs_dict in the _list_view  https://review.opendev.org/c/openstack/nova/+/93965809:15
masahitogmann: sean-k-mooney: ack. thanks for the review. updated the commit. https://review.opendev.org/c/openstack/nova/+/93965809:18
sean-k-mooneymnaser: that not new10:47
sean-k-mooneymnaser: there is already a older bug for that10:47
sean-k-mooneymnaser: well or at least there have been some patches floating around for that issue for a while10:48
sean-k-mooneymnaser: ah i twas a downstream only bug report https://bugzilla.redhat.com/show_bug.cgi?id=227670110:50
sean-k-mooneywhich got moved to here https://issues.redhat.com/browse/OSP-3348910:51
sean-k-mooneyill add the upstream tracker to that10:51
sean-k-mooneysorry https://issues.redhat.com/browse/OSPRH-1314410:52
*** haleyb_ is now known as haleyb13:18
mnasersean-k-mooney: thanks for all those details13:18
mnaserI wonder if can get away with deferring config drive creation till we get vif plugged event13:19
sean-k-mooneyfor ironci we cant13:19
sean-k-mooneyfor libvirt i think yes13:19
sean-k-mooneythe file just need to exist before we start the vm13:19
sean-k-mooneyso we can do that later13:20
mnaserSo then the only way we have for this in ironic would be some sort of polling for ports if they're ip address deferred13:20
sean-k-mooneywell we can look at the ironic situration but the ironci driver is "special"13:20
mnaserIt feels ugly but I guess we can just loop over the vifs13:21
sean-k-mooneyfor ironic we are not really generting a config drive. what we are doign is generating the content we woudl provide and then passing that to there rest api13:21
mnaserAnd check if any ports have deferred IP allocation, if so, wait for it to appear13:21
sean-k-mooneyits possibel we could add a call back ot update ironic if the metadata changes13:22
sean-k-mooneybut i dont knwo fi that woudl work13:22
sean-k-mooneyideally you would use the metadat service instead of config drive with ironic if you need it to be up to date13:22
mnaserThat might be a lot of complexity and also still prone to race conditions cause then what will ironic do if it's mid provision and it changes?13:22
sean-k-mooneyill quickly take a look at the driver and see if we can generate it slightly later13:23
mnaserYeah -- my issue is that I'm using bonds13:23
mnaserAFAIK I need config drive for those13:23
sean-k-mooneywell bonds are not supprot in nova or neutron13:23
mnaserIronic port groups?13:23
sean-k-mooneyand its implemtned i think in ironci directly via port groups?13:23
sean-k-mooneyya prot groups are not a thing in nova or neutron13:23
sean-k-mooneyits a purely ironic only network construct13:24
mnaserYeah and the only place that l13:24
mnaser"Bond" network data is generated is config drive13:24
mnaserSo otherwise if I use metadata ill have to do this manually after system boot13:25
sean-k-mooneyso ironic has it own enrechment of the stand metadta https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1045-L110613:25
sean-k-mooneyfor the network info13:25
sean-k-mooneywhich is all custom for the ironic integration and not part of how this normally works13:25
mnaserThe enrichment relies on the get network metadata returning the right thing and that doesn't do that so that's how we end up where we are :p13:26
sean-k-mooneyright os most if not all other dirver use the commeon code that is shared with the api for generating this metadtaa13:27
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/api/metadata/base.py#L11213:27
sean-k-mooneyso they generate there enriched config driver here when calling provision https://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1239-L125813:29
mnaserI think attach_interface is how ports are plugged?13:29
sean-k-mooneyso port bidnign is intened to happen in the compute manger and that is when the ip should be assigend13:30
sean-k-mooneyand then port pluggin is where we are ment to talk to the network backend to make sure the ports are conencted on the swtich 13:30
sean-k-mooneyvirutal or otherwise13:30
sean-k-mooneyand tha tis when we are ment to waith for the network-vif-plugged evnet13:30
sean-k-mooneyplug_vifs in the ironic driver is a noop 13:31
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/ironic/driver.py#L1567-L163813:31
sean-k-mooneythat is breakign the virt dirver contract by how its delegating the attachment13:32
dansmithsean-k-mooney: juicy OTU series just waiting for +Ws :) https://review.opendev.org/c/openstack/nova/+/94414813:33
sean-k-mooneydansmith: yes... im aware. im slightly anooyed at you for askign for an os-traits release when i ask not to do that unlie we landed teh vdi traits but its on my list to look at before the ptg13:34
mnaserNow I'm wondering where does it even get the ports from... Unless it doesn't and ironic does that  stuff behind the scenes?13:34
sean-k-mooneymnaser: so attach_interface is ment to be for adding/removing port after the "server" is created 13:35
mnaserThat's what I thought...13:35
dansmithsean-k-mooney: you did? I'm sorry13:35
mnaserSo that means the neutron vif plug happens when the spawn is called13:35
sean-k-mooneydansmith: its fine, release are cheap13:35
sean-k-mooneymnaser: ya internaly in ironic13:36
mnaserSo I think it might be impossible to get configdrive then13:36
dansmithsean-k-mooney: yeah, I honestly never would have though it would be worth ever holding a release to batch up more than one13:36
mnaserBecause the plug happens after we call ironic to spawn with configdrive included.13:37
mnaserI see what you mean about the whole callback thing now13:37
mnaserUuuuuugh13:37
sean-k-mooneydansmith: well last time i ask peole said they woudl take a look at the traits patch and didnt. gibi did look this time to give them credit13:37
sean-k-mooneymnaser: this is one of the large pices of technial debt with the ironic driver13:37
sean-k-mooneyits doign several things its not allowed to do13:37
dansmithsean-k-mooney: link me13:38
sean-k-mooneyincluding changign the vnic type ot baremetal13:38
sean-k-mooneydansmith: its no kind of blockd on use approving the specless blueprint which si now a ptg topic but sure on sec13:38
dansmithsean-k-mooney: okay well if I had realized I would have argued not to block my thing on another blocked thing, especially since "releases are cheap" :)13:39
sean-k-mooneydansmith: its https://review.opendev.org/c/openstack/os-traits/+/94041813:39
sean-k-mooneydansmith: we can wait till melwitt has a chance ot respond or till the ptg 13:40
sean-k-mooneydansmith: anyway ill try and get to the OTU device tommorw or this evening13:40
dansmithyeah, sounds like that's the best idea13:40
sean-k-mooneymnaser: i think port bidning is still done in nova.13:41
sean-k-mooneymnaser: if we can confirm that we may be able ot tweak plug_vifs to refesh the cache or soemthign like that to get the ip in teh defered case13:41
sean-k-mooneymnaser: but its kind of building one hack on top of anohter13:41
sean-k-mooneyunless we can do that in a common way in the compute manger of coruse13:42
mnaserhmm yea13:43
mnaserDo other drivers rely on compute manager to bind the ports?13:43
sean-k-mooneyyes13:44
sean-k-mooneyvirt drivers are not allowed to do it13:44
sean-k-mooneytehy can do port pluging13:44
sean-k-mooneybtu that a diffent thing13:44
sean-k-mooneyas i said the ironic dirver or ironci its self is doing thing that its not technailly allowed to do like modifyiing cidner volume attachmetn or neutron ports13:45
mnaserI see, so then technically we should have a bound port at some point when we generate the config drive13:45
sean-k-mooneyi dont thinke we will13:45
mnaserLike I know this sounds stupid but as a workaround for me now a sleep and a refresh of network info can potentially make the ip arrive?13:46
sean-k-mooneybecause im not sure any of the prot groups stuff for the bonds will be done when we bind but i might be wrong13:46
mnaserIt doesn't matter since all I need is for the IPs to be there on the neutron port13:46
sean-k-mooneymnaser: ill point out that we never added supprot for l3 rounte network to the ironic driver in nova as far as i recall13:46
sean-k-mooneywe added some generic supprpot but it was never tested with ironic13:47
sean-k-mooneyand i doing think it was supproted in the nova spec13:47
mnaserThe ironic code even sorta handles the port group and maps it automatically13:47
mnaserAnd updates the Mac address in the config13:47
mnaserSo I'm pretty sure if Nova manager binds it, we should be able to fix things if we wait?13:48
sean-k-mooneyi think we would have to call attach_interface to trigger that13:49
sean-k-mooneyport bidnign is the process of setting a binding:host on the neutron port to the hostname of the relvent host13:49
sean-k-mooneythat casuse the neutron ml2 driver to update the port13:50
sean-k-mooneywith extra data like the ip13:50
sean-k-mooneyhowever we dont know the mac adress of interface in nvoa unless we are looking that up form ironci or anytihng about the port groups again unless we are calling out to ironic13:50
sean-k-mooneythe ironic driver moved all fo that to ironic so they could have a singel code path for that in ironic for standaloen mode13:51
sean-k-mooneymnaser: a simpler way to do this might be to modify ironic to just update the content of the config drive wehn it does all the networking setup13:52
sean-k-mooneythat doubling down on the wrong behavior but if they are modifying networkhing out of band then updating the metadta to reflect that makes sense13:52
dansmithsean-k-mooney: https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2025-04-01.log.html#t2025-04-01T14:09:0613:54
dansmithsean-k-mooney: to be fair, you were talking to gibi not me.. I did show up a few minutes later (first thing in the morning) asking gibi a question about a gerrit review, but I obviously didn't read back the scroll there13:54
sean-k-mooneyya thats fair.13:55
sean-k-mooneyi expected Uggla  to rely that into the release patch13:55
dansmithalso, what's not fair is how hard it is to grep logs for something sean-k-mooney said.. "grep sean-k-mooney.*release 2025*.log" .. no results :D13:56
sean-k-mooney:) there use to be a hound instnace that someone ran that had all the irc logs13:56
sean-k-mooneyliek 8-10 year ago 13:56
sean-k-mooneyit was kind of nice to be able to find thing13:56
dansmithsean-k-mooney: honestly, every time I add something to os-traits I get really upset about how incredibly silly the process is that we've designed for getting string constants into nova13:56
sean-k-mooneyya13:57
sean-k-mooneyso that was added by mikal as a ptg topic13:57
sean-k-mooneyaskign if we can avoid the dance somehow13:57
dansmithwrite nova patch, write os-traits patch, merge os-traits, release os-traits, merge upper-constraints, bump requirements.. only then can you go back to the first step and run tests on the nova patch :(13:57
sean-k-mooneywe coudl have nova just create the standard traits on start up if not present13:57
dansmithsean-k-mooney: I proposed this per gmann to cut out one step at least https://review.opendev.org/c/openstack/requirements/+/94618313:58
sean-k-mooneyhum that could help13:58
dansmithfeel free to +1 to show your support :D13:58
sean-k-mooneywe stil have an annoying coupleing dance with placeemtn too13:58
sean-k-mooneythis applies to os-ressouce classes too.13:59
dansmithyes, honestly we should just squash placement and os-traits back into nova to solve it for good :D13:59
sean-k-mooneywell we solve it for us. but then we force neutorn et all to land a patch to nova to add a triat13:59
sean-k-mooneywhat i was consdiring is allowing anytihng with a service role to add standard traits via the api14:00
dansmith...which is still less work than they have to do currently with os-traits, releases, requirements, etc :)14:00
dansmithidk, that's a much larger step14:00
sean-k-mooneyit is 14:00
sean-k-mooneythe other thing i codnered was sevador plugin14:01
sean-k-mooneyi.e. have an entry poirnt for addign traits14:01
sean-k-mooneybtu that feel more complex14:01
sean-k-mooneydansmith: https://etherpad.opendev.org/p/nova-2025.2-ptg#L235 that where mikal still expressed there furstration with the current workflow14:02
dansmiththat's sort of a different complaint, but yeahj14:03
opendevreviewMasahito Muroi proposed openstack/nova master: Use dict object for request_specs_dict in the _list_view  https://review.opendev.org/c/openstack/nova/+/93965814:09
mnasersean-k-mooney: this all seems to be an ironic issue for now at the moment so maybe i'll take that there but thank you14:11
sean-k-mooneymnaser: ack, we do have a related issue with the generic code path for port that never have ip adress14:12
sean-k-mooneybut that technially a differnt bug14:12
sean-k-mooneywe shoudl be providing the mac info from neutron if the ip_allcoation policy is none but that is nto what happens today14:13
dansmithUggla: by the way, on the otu session you scheduled.. even if we don't need to discuss otu specifically, we can discuss the placement bug fix that is related to it (and migration while over capacity)15:08
dansmiththat may be contentious (not sure) but might benefit from discussion at least15:08
Uggladansmith, sure. I will add a note about it.15:10
gibi+1 to mention that15:10
sean-k-mooneyi think tehre are enough bugs related to it that it would be contorvisal15:11
sean-k-mooney but sure good to highlight it in general15:11
dansmithsean-k-mooney: ?15:11
sean-k-mooneythe placement bug fix15:12
sean-k-mooneylike we have at least 3 bugs form diffent people related to it15:12
dansmith"enough bugs" makes it controversial?15:12
sean-k-mooneysorry not controversial15:12
dansmithah15:13
sean-k-mooneythe approch to fixign it might be, i dont know but the fact the current bavhior is surpising to poeple is demonstarted by the exising bug reports15:13
dansmithright, I meant the actual approach might be controversial15:14
dansmithbut I'd definitely like the opportunity to make my case for why I think this approach is the right one15:14
sean-k-mooneyi dont really see many other options to fixing it that would be backportable so that a vote in favor of the appoch15:14
sean-k-mooneybut ya lets see what others think next week15:15
dansmithyeah, I agree of course.. but people may have religious/idealistic objections, idk15:15
sean-k-mooneyas an aside i might call it soon. im feellign progressively worse. soim going to take some panadol and get a hot drink if that does nto help ill call it a day15:15
dansmithack, melwitt just approved the traits patch15:16
opendevreviewMerged openstack/os-traits master: Add traits for sound and USB devices on instances.  https://review.opendev.org/c/openstack/os-traits/+/94041815:18
sean-k-mooneycool melwitt ++15:24
melwittsean-k-mooney: just in case you missed it, from your comment I added a repro func test under this and also a reno https://review.opendev.org/c/openstack/nova/+/88039915:31
melwittnote that the func test doesn't cover the "old" cinder API path,  I wasn't sure if it needs to i.e. I wasn't sure if the old Cinder path is something we expect to be possible at this point/release15:32
sean-k-mooneyi had not seen that15:34
sean-k-mooneybut cool15:35
sean-k-mooneyon a related not i have a cinder volume multi attch pathc i need to clean up and push again..15:35
sean-k-mooneyhttps://review.opendev.org/q/topic:%22bug/2048837%22 but its in merge conflict so no need ot look now.15:37
sean-k-mooneythere are a couple of volume related pathces in diffetn state aroudn at the moment15:37
melwittack15:39
sean-k-mooneyweried os before we alwasy faked the volujme was attached even fi it was not assocated ot teh vm15:40
melwittwhat? :)15:41
sean-k-mooneymelwitt: https://review.opendev.org/c/openstack/nova/+/946222/1/nova/tests/fixtures/cinder.py#27116:14
sean-k-mooneyi was commenting on the addtion of the if16:14
sean-k-mooneynot that what i orginally type convayed that particallary well16:16
melwittsean-k-mooney: oh haha16:26
melwittyeah the cinder fixture is a little wonky16:27
sean-k-mooney+1 but im not entirlly sure how stable the looping is16:47
sean-k-mooneycomments inlien16:47
sean-k-mooneywe often use threading.evetns of locs when we want to encfoce a specific ordering16:48
sean-k-mooneymelwitt: im going to run your repoducer in a loop for a bit and see if its sable16:49
sean-k-mooneymelwitt: have you done that?16:49
melwittsean-k-mooney: no I didn't16:49
melwittI'll adjust things based on your comments. I haven't really done a func test like this before where it is more about checking the state of a resource in a different component, so I may have done it not the best way16:50
sean-k-mooneywell i think yoru test is fine. i just know that we have had issues with retry loops in fucntional tests before16:51
sean-k-mooneyso if we can force the ordering more explcitly that can help16:51
sean-k-mooneyill let this run for a while and if it passes 50-100 times its proably ok16:52
sean-k-mooneyi think we can impove things if we reuse some of the helpers but im no tsure we strictly need too for this to prove what it needs too16:52
sean-k-mooneyya the cidner fixutre behvaior is a litttel odd in places16:53
sean-k-mooneybut its also non trivial to emulate os im not really surpised16:53
sean-k-mooneymelwitt: ah https://review.opendev.org/c/openstack/nova/+/919979/4/nova/tests/functional/integrated_helpers.py16:54
sean-k-mooneyi _attch_volume for my multi attach repoducer16:54
melwittyeah I think my concern with it is if it could possibly "skip" the status "detaching" due to timing differences. so I won't be surprised if need to do the wait a different way16:55
sean-k-mooneyi was very confused why i did nto find it when i went looking16:55
sean-k-mooneymelwitt: so there are some exmples form gibi useing treading.event16:56
sean-k-mooneyto get thisn to pause in a specific state16:56
melwittoh my bad, I either didn't see that or there was some reason it didn't work for me and I forgot. but more likely the former16:56
sean-k-mooneymainly for networking i think16:56
sean-k-mooneyno _attach_volume is not merged16:57
sean-k-mooneyits in that other volume serises i mentioend to you that i need to respine16:57
melwittwhen I first started this test I used libvirt func test base which super did not work and is not needed anyway and I changed it16:57
melwittoh haha ok, didn't realize that was an unmerged patch16:57
sean-k-mooneyso i last worked ont hat patch 6 months ago so in my brain its totally merged16:58
melwitt:)16:58
sean-k-mooneybut i keep not geting around to rebaseing it16:58
sean-k-mooneymelwitt: so this is what i was referint too https://github.com/openstack/nova/blob/master/nova/tests/functional/regressions/test_bug_2085975.py17:00
sean-k-mooneyusing trheading event to allow use to pause thing in speciifc places17:00
melwittah ok, thanks17:00
sean-k-mooneythat might not be appliable to this case but its ther if you thing it would be useful basicly you woudl set the evnet in the palce your raising17:01
sean-k-mooneyand we woudl wait on it were we currently have the retry loop17:02
melwittgotcha. yeah that might be nicer17:02
sean-k-mooney======17:03
sean-k-mooneyTotals17:03
sean-k-mooney======17:03
sean-k-mooneyRan: 100 tests in 700.9413 sec.17:03
sean-k-mooney - Passed: 10017:03
sean-k-mooney - Skipped: 017:03
sean-k-mooney - Expected Fail: 017:03
sean-k-mooney - Unexpected Success: 017:03
sean-k-mooney - Failed: 017:03
sean-k-mooneySum of execute time for each test: 353.3871 sec.17:03
sean-k-mooneyi mean it seam to work as it is17:03
sean-k-mooneyso im less concerned now17:03
melwittyeah I think what could mess it up is if the timing was sufficiently bad, like what if the _wait got delayed due to CI node busyness and it misses the intermediate state of "detaching". so I think you're right there is some instability possible there17:05
sean-k-mooneyya it could happen. we could rehceck it once or twice 17:06
sean-k-mooneyits a test only change so it does not run tempest17:06
sean-k-mooneyso rechecking is cheap17:06
melwittyeah. I'll try the event thing. oh right17:06
sean-k-mooneyack, no worreis if you dont hcange it. i upgraded to +2 and rechecked17:07
melwittand also I was thinking future failures after it merged17:07
sean-k-mooneyif we recheck once more after that17:07
sean-k-mooneyi think we are good17:07
sean-k-mooneyya its the future case that i was thinking of17:08
sean-k-mooneywe have loops like this today17:08
sean-k-mooneybut we have had issues with them too17:08
sean-k-mooneyso im alwasy cautious when adding more17:08
melwittyeah. also probably wastes some time due to the loop17:08
sean-k-mooneytechnially yes but its 3.5 seccond pretty reliably on my laptop17:09
sean-k-mooneynot amazing but not terrible17:09

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!