opendevreview | melanie witt proposed openstack/nova master: Adapt websocketproxy tests for SimpleHTTPServer fix https://review.opendev.org/c/openstack/nova/+/853379 | 00:34 |
gibi | melwitt: thank you for taking the triage baton | 07:06 |
whoami-rajat | hey bauzas , jfyi I've rebased my rebuild feature to use mv 2.94 | 07:14 |
gibi | melwitt, sean-k-mooney, stephenfin: with frickler's help https://review.opendev.org/c/openstack/nova/+/826523 is unblocked now (the procedural -2 is removed). I'll let some of you +A the bottom when you feel it can start landing | 10:06 |
sean-k-mooney | ah sorry, had to load context: frickler removed sylvain's procedural -2 | 10:08 |
sean-k-mooney | got it | 10:08 |
sean-k-mooney | looks like the first third to half of the feature is ready to merge so I'll kick off the start of the series, thanks for that | 10:09 |
sean-k-mooney | gibi: done | 10:11 |
gibi | sean-k-mooney: cool | 10:11 |
sean-k-mooney | that should merge everything up until https://review.opendev.org/c/openstack/nova/+/760456/12 | 10:12 |
sean-k-mooney | ah i was +1 on that while we were debating the terminology | 10:14 |
sean-k-mooney | ok, upgraded my votes on 2 others so that should bring us up to https://review.opendev.org/c/openstack/nova/+/826529/9 and later, which I have not really reviewed. that's just over halfway I think, so that will help manage the patch series review | 10:19 |
sean-k-mooney | gibi: I'm not sure I'll have time today but am I right in assuming there are some patches ready to review in the pci series | 10:19 |
gibi | sean-k-mooney: yes, until https://review.opendev.org/c/openstack/nova/+/850468 it is ready | 10:20 |
gibi | patches on top of that are under development (they have TODOs in the commit message) | 10:21 |
sean-k-mooney | ok cool, I'll see if I can make some time for those | 10:21 |
gibi | thanks | 10:30 |
sean-k-mooney | do you have the link to the etherpad for the feature reviews by the way | 10:31 |
sean-k-mooney | I have the microversion plan https://etherpad.opendev.org/p/nova-zed-microversions-plan but can't recall what the other one was called | 10:32 |
sean-k-mooney | oh, it's probably on the meeting wiki | 10:34 |
sean-k-mooney | ah yes https://etherpad.opendev.org/p/nova-zed-blueprint-status | 10:34 |
gibi | sorry I was distracted | 10:43 |
gibi | but yes you found the right one | 10:44 |
opendevreview | Merged openstack/nova master: Add limitation to docs about bug 1983570 https://review.opendev.org/c/openstack/nova/+/852168 | 11:14 |
sean-k-mooney | artom: o/ | 11:18 |
artom | ~o~ | 11:19 |
sean-k-mooney | artom: will you have time to work on updating bauzas's mdev parsing series this week | 11:19 |
artom | sean-k-mooney, yeah, I didn't forget | 11:20 |
sean-k-mooney | thanks | 11:21 |
opendevreview | Merged openstack/nova stable/train: [CI] Fix gate by using zuulv3 live migration and grenade jobs https://review.opendev.org/c/openstack/nova/+/795435 | 13:06 |
opendevreview | Merged openstack/nova stable/xena: Reproducer unit test for bug 1934094 https://review.opendev.org/c/openstack/nova/+/842922 | 14:17 |
*** dasm|off is now known as dasm | 14:22 | |
opendevreview | Merged openstack/nova stable/xena: Fix instance's image_ref lost on failed unshelving https://review.opendev.org/c/openstack/nova/+/842923 | 15:14 |
melwitt | gibi: \o/ thanks! | 15:27 |
gibi | melwitt: :) | 15:29 |
melwitt | elodilles: thank you for getting stable/train unblocked. it's a miracle! 😂 | 15:34 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reproduce bug 1986838 https://review.opendev.org/c/openstack/nova/+/853516 | 15:54 |
gibi | sean-k-mooney: ^^ another interesting bug fallout from the pci work | 15:54 |
gibi | the scheduler expects the compute will fail and the compute says the scheduler should have done a better job | 15:56 |
gibi | but eventually nobody does their job and we have broken instances | 15:57 |
gibi | story of nova :D | 15:57 |
elodilles | melwitt: np, it was your patch anyway o:) so thanks too! :) | 16:08 |
elodilles | i still don't know what is causing the original problem, but at least with zuul v3 jobs the gate works \o/ | 16:09 |
melwitt | same. glad you thought to try that patch again, I don't think I would have thought of it :) | 16:16 |
elodilles | well, it was not even my idea: the neutron team fixed the same issue with the same solution, changing to zuul v3, so that is why I also tried it o:) | 16:19 |
melwitt | :D | 16:24 |
sean-k-mooney | gibi: I thought we had a check to prevent two aliases being the same | 16:53 |
gibi | sean-k-mooney: we don't. And with traits in the alias we can create different aliases that match the same device from nova's perspective | 16:54 |
gibi | but at least with placement, in the above case there would be zero allocation candidates | 16:54 |
gibi | so the scheduling would fail accordingly | 16:54 |
gibi | anyhow I will propose a simple fix that is backportable | 16:55 |
gibi | and on master this could never happen with placement | 16:55 |
gibi | after the pci in placement lands | 16:55 |
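To make the scenario concrete, here is a hedged sketch of the kind of nova.conf gibi describes: two differently named aliases that both match the same physical device (the alias names and vendor/product IDs below are hypothetical).

```ini
[pci]
# Two aliases with different names but identical matching criteria,
# i.e. both can only be satisfied from the same pool of devices.
alias = {"name": "gpu-a", "vendor_id": "10de", "product_id": "1eb8", "device_type": "type-PCI"}
alias = {"name": "gpu-b", "vendor_id": "10de", "product_id": "1eb8", "device_type": "type-PCI"}
```

A flavor with `pci_passthrough:alias='gpu-a:1,gpu-b:1'` then produces two InstancePCIRequest objects that, on a host with a single such device, can only be fulfilled by double-consuming that device.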
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/tests/unit/pci/test_request.py#L190-L231 | 16:55 |
sean-k-mooney | this is what i was remembering | 16:56 |
sean-k-mooney | we prevent the same alias being defined twice with conflicting numa_policies or device types | 16:56 |
gibi | yeah we prevent defining the same alias twice | 16:57 |
gibi | but we allow two aliases matching the same device | 16:57 |
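A minimal Python sketch of the distinction being drawn here, assuming a simplified list-of-dicts representation of the parsed aliases (the real check lives in nova's pci request parsing and is exercised by the test linked above).

```python
def check_aliases(aliases):
    # Reject the same alias *name* defined twice with conflicting
    # values; two different names matching the same device are allowed.
    seen = {}
    for alias in aliases:
        name = alias["name"]
        if name in seen and seen[name] != alias:
            raise ValueError(
                "PCI alias %s is defined twice with conflicting values"
                % name)
        seen[name] = alias
```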
sean-k-mooney | yep the second is valid | 16:57 |
sean-k-mooney | what is not valid is having 2 claims against one device | 16:57 |
sean-k-mooney | the two aliases should have created two pci request objects | 16:57 |
sean-k-mooney | the bug is that we don't ensure that all pci_request objects are fulfilled by unique host pci addresses | 16:58 |
gibi | there will be two InstancePCIRequest objects, that is OK. the bug is either a) the scheduler does not fail when it fails to consume both requests, or b) pci_claim does not fail when it fails to consume both requests | 16:59 |
sean-k-mooney | for example if the alias name is different it's fine for the address/(product_id/vendor_id) to be the same with different numa policies | 16:59 |
sean-k-mooney | ya so it's a) | 16:59 |
sean-k-mooney | we should not get to the claim on the compute host if there is only one device | 17:00 |
sean-k-mooney | by the way the code for the scheduler and the compute node is basically the same | 17:00 |
sean-k-mooney | except the scheduler path uses a copy of the data | 17:00 |
sean-k-mooney | when it's checking if you can consume the pci device | 17:00 |
sean-k-mooney | so if you fix it, it will fix it for both | 17:01 |
gibi | but even if a) passes as there are two free devices at that time, b) should still detect it and fail if a parallel claim consumed one of the devices and the new request cannot be fulfilled | 17:01 |
sean-k-mooney | yes | 17:01 |
sean-k-mooney | but a) is implemented by calling the code for b) on a copy of the data | 17:01 |
sean-k-mooney | so if we fix the latter it fixes both | 17:01 |
gibi | while in general the logic of the scheduler and the pci claim is the same for pci, in reality this part differs. I will push the fix and show that we need to fix this in two separate places | 17:02 |
gibi | I think we are the same in using _filter_pools, but then that is called from two different paths by the two services | 17:04 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L622-L647 | 17:04 |
sean-k-mooney | support_requests just calls _filter_pools | 17:04 |
sean-k-mooney | correct | 17:04 |
sean-k-mooney | on the compute node it calls consume_requests | 17:05 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L226-L269 | 17:05 |
gibi | btw, https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L645 is broken as it individually matches requests to pools and does not consume pools by requests | 17:05 |
sean-k-mooney | but fixing it in _filter_pools is what I meant by fixing it in one place to fix both | 17:05 |
gibi | the scheduler calls apply_requests that detects the double consumption | 17:05 |
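A minimal, self-contained sketch of the failure mode under discussion; `pools` and `requests` below are simplified stand-ins for the PciDeviceStats internals, not the real signatures.

```python
import copy

# Simplified stand-ins: one free device, two requests that both match it.
pools = [{"vendor_id": "10de", "product_id": "1eb8", "count": 1}]
requests = [{"vendor_id": "10de", "product_id": "1eb8", "count": 1},
            {"vendor_id": "10de", "product_id": "1eb8", "count": 1}]

def supports_each_request_alone(pools, requests):
    # The broken pattern: each request is matched against the full pool
    # set independently, so two requests both "pass" against one device.
    return all(
        any(pool["vendor_id"] == req["vendor_id"]
            and pool["count"] >= req["count"] for pool in pools)
        for req in requests)

def supports_by_consuming(pools, requests):
    # The fixed pattern: work on a copy and actually decrement counts,
    # so the second request sees the device is already taken.
    pools = copy.deepcopy(pools)
    for req in requests:
        for pool in pools:
            if (pool["vendor_id"] == req["vendor_id"]
                    and pool["count"] >= req["count"]):
                pool["count"] -= req["count"]
                break
        else:
            return False
    return True

print(supports_each_request_alone(pools, requests))  # True  (wrong)
print(supports_by_consuming(pools, requests))        # False (right)
```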
gibi | sean-k-mooney: fixing it in _filter_pools is tricky as _filter_pools is not meant to consume things | 17:06 |
gibi | can be done though if needed | 17:06 |
sean-k-mooney | right, but apply is not meant to be called in the scheduler | 17:06 |
sean-k-mooney | it will work but we need to make sure not to commit it to the db | 17:06 |
sean-k-mooney | so it will need to be a deep copy | 17:07 |
sean-k-mooney | with no save | 17:07 |
sean-k-mooney | ok it does not save | 17:07 |
gibi | nova.scheduler.manager.SchedulerManager._consume_selected_host | 17:07 |
sean-k-mooney | so as long as we do a deep copy of the pool we are good | 17:07 |
gibi | that is the one calling apply | 17:07 |
sean-k-mooney | really, that's for multi create | 17:08 |
sean-k-mooney | we might already be using a copy by the way | 17:08 |
gibi | yes probably | 17:08 |
sean-k-mooney | we had to fix this in the past | 17:08 |
sean-k-mooney | gibi: https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/host_manager.py#L309 | 17:15 |
gibi | yepp that one | 17:15 |
gibi | as you said we need to consume in the scheduler to handle multi create | 17:15 |
sean-k-mooney | ya ok | 17:15 |
sean-k-mooney | the host manager has a copy of the pci stats | 17:16 |
sean-k-mooney | so I guess doing the apply in the filter might be ok | 17:16 |
sean-k-mooney | I would have expected _locked_consume_from_request | 17:16 |
sean-k-mooney | to fail | 17:16 |
sean-k-mooney | if calling apply_requests was enough | 17:16 |
gibi | it fails | 17:17 |
sean-k-mooney | oh so we never get to the compute | 17:17 |
gibi | but @set_update_time_on_success catches everything | 17:17 |
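For context, a hedged sketch of the decorator pattern gibi is pointing at (a simplified illustration, not the actual nova implementation): the broad except turns a failed consume into a log line, so the scheduler proceeds as if the claim succeeded.

```python
import functools
import logging

LOG = logging.getLogger(__name__)

def set_update_time_on_success(function):
    # Simplified illustration of the hazard: any exception raised by the
    # wrapped consume (e.g. a failed PCI claim) is logged and swallowed,
    # so the caller never learns the host could not fit the request.
    @functools.wraps(function)
    def decorated(self, spec_obj):
        result = None
        try:
            result = function(self, spec_obj)
        except Exception:
            LOG.exception("Error while consuming resources")
        return result
    return decorated
```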
sean-k-mooney | I thought you said we got to the compute | 17:17 |
gibi | https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/host_manager.py#L266 | 17:17 |
gibi | https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/host_manager.py#L81 | 17:17 |
gibi | this is where scheduler points to the compute :D | 17:18 |
sean-k-mooney | ok but we don't actually select the host and proceed, do we | 17:18 |
sean-k-mooney | I guess we might if nothing depends on the result of the consume | 17:18 |
sean-k-mooney | ok it returns nothing | 17:19 |
sean-k-mooney | and since that just logs | 17:19 |
sean-k-mooney | we continue and fail | 17:19 |
sean-k-mooney | on the compute | 17:19 |
gibi | yepp that just logs | 17:19 |
sean-k-mooney | ok so that should raise an instance build exception or similar | 17:19 |
sean-k-mooney | so we retry the next alternate host | 17:19 |
sean-k-mooney | well no | 17:20 |
sean-k-mooney | we need to fail in the filter | 17:20 |
sean-k-mooney | so call apply there | 17:20 |
sean-k-mooney | but we probably should also address the fact that it eats valid exceptions | 17:20 |
gibi | and this is where compute detects the error and only logs, does not fail the instance build https://github.com/openstack/nova/blob/master/nova/pci/stats.py#L244 | 17:21 |
gibi | it points to the scheduler :D | 17:21 |
sean-k-mooney | well, it's meant to return None | 17:21 |
sean-k-mooney | and the caller should error but I guess it does not | 17:21 |
sean-k-mooney | i.e. the caller probably should be checking the length of the allocations against the requests | 17:22 |
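A hedged sketch of the caller-side guard being suggested; the function and error here are hypothetical, assuming consume_requests returns None or a short list when it cannot satisfy every request.

```python
def claim_pci_devices(pci_stats, pci_requests):
    # Hypothetical caller-side guard, not actual nova code: treat an
    # empty or short allocation as a hard failure instead of logging
    # and carrying on with a partially satisfied instance.
    alloc_devices = pci_stats.consume_requests(pci_requests)
    wanted = sum(req.count for req in pci_requests)
    if not alloc_devices or len(alloc_devices) < wanted:
        raise RuntimeError(
            "PCI claim failed: wanted %d devices, got %r"
            % (wanted, alloc_devices))
    return alloc_devices
```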
gibi | also apply does not care about dependent devices, but consume_requests does, so we might actually need to call consume_requests | 17:22 |
sean-k-mooney | so support_requests only exists to not call consume | 17:22 |
sean-k-mooney | but we probably could remove it and call consume | 17:23 |
gibi | yeah probably | 17:23 |
sean-k-mooney | the reason we don't right now | 17:23 |
sean-k-mooney | is in case the host is eliminated by another filter | 17:23 |
sean-k-mooney | we want to make sure the host state object has the correct data | 17:24 |
sean-k-mooney | but we can do that by doing a deep copy of the pci stats in the filter | 17:24 |
sean-k-mooney | there are other ways around that too | 17:25 |
sean-k-mooney | this comes back to correct behavior for multi-create/server groups | 17:25 |
sean-k-mooney | I would probably try just calling consume and see if that breaks any of our tests | 17:25 |
sean-k-mooney | then we might be able to remove the call to apply | 17:26 |
sean-k-mooney | in the host manager | 17:26 |
gibi | yepp | 17:26 |
gibi | that makes sense | 17:26 |
* sean-k-mooney hates multi create, our lives would be much simpler without it | 17:28 | |
spatel | can I set a hard limit in nova on the number of VM instances created? | 17:40 |
JayF | So, trying to figure out why gate jobs keep failing on my change. They are seemingly random, and I noticed something strange in zuul status: https://home.jvf.cc/~jay/gated-after-master-change.png | 18:15 |
JayF | 826523,5 is a change landing into master AFAICT | 18:15 |
JayF | but is in the same merge queue as my change to stable/yoga (800873,2) | 18:15 |
JayF | s/yoga/victoria/ | 18:16 |
sean-k-mooney | they might be intermittent failures, yes | 18:17 |
JayF | Well, I'm just saying I'm surprised that stable/ patches and master patches are gate-queued together | 18:17 |
sean-k-mooney | oh they should not be unless its tempest | 18:18 |
JayF | that's what my png ^^^ shows | 18:18 |
JayF | is that it seems to have done exactly that | 18:18 |
sean-k-mooney | do you have the link to the backport | 18:18 |
JayF | oooh, or do the separate lines indicate "nope, this is in a separate queue" | 18:18 |
JayF | https://review.opendev.org/c/openstack/nova/+/800873 same as yesterday | 18:19 |
JayF | trying to get it to actually land | 18:19 |
sean-k-mooney | the separate lines are separate sets of patches that are in flight | 18:19 |
sean-k-mooney | so those cinder changes are separate from the nova ones | 18:19 |
JayF | got it, this is just me learning for the first time after looking at this UI for an embarrassingly long time | 18:19 |
JayF | that the lines on the left have meaning | 18:19 |
JayF | lol | 18:20 |
JayF | thank you for the clarity :D | 18:20 |
sean-k-mooney | there is something wrong here | 18:20 |
sean-k-mooney | well, hmm, I'm not sure | 18:20 |
sean-k-mooney | 800873,2 is a master patch | 18:20 |
sean-k-mooney | but I don't think it depends on 826523,5 | 18:21 |
JayF | https://review.opendev.org/c/openstack/nova/+/800873 that's not true | 18:21 |
sean-k-mooney | I think the patch it was speculatively merged with has already merged | 18:21 |
JayF | this says repo | branch: openstack/nova | stable/victoria | 18:21 |
sean-k-mooney | on that yes | 18:21 |
sean-k-mooney | but not on https://review.opendev.org/c/openstack/nova/+/826523/ | 18:21 |
JayF | that is 800873,2 | 18:21 |
sean-k-mooney | that is the second nova patch with the x | 18:22 |
sean-k-mooney | so the integrated queue currently has 3 sets of changes in flight I think | 18:22 |
sean-k-mooney | 826523,5 which is your one that depends on nothing | 18:22 |
sean-k-mooney | 800873,2 which depends on a change that has already merged | 18:22 |
sean-k-mooney | and then the swift/cinder ones | 18:23 |
sean-k-mooney | which are a third set | 18:23 |
JayF | yeah; I think I misread the queue docs | 18:23 |
JayF | the behavior seems sane to me now that I understand what the UI was tryin' to tell me | 18:23 |
sean-k-mooney | the ui is not super clear | 18:24 |
sean-k-mooney | on the plus side the backport is still green | 18:24 |
sean-k-mooney | so hopefully it will have merged in the next half hour or so | 18:24 |
JayF | no, the backport is the one that is X I thought | 18:25 |
sean-k-mooney | oh you are right | 18:25 |
JayF | yeah nova-live-migration failed again :| | 18:25 |
sean-k-mooney | failed on nova live migration | 18:25 |
sean-k-mooney | ah | 18:25 |
JayF | I wish I could tell Zuul "trust me, Ironic driver can't break this" :| | 18:26 |
sean-k-mooney | test_live_block_migration_with_attached_volume | 18:26 |
sean-k-mooney | well actually this came up in the past | 18:26 |
sean-k-mooney | we could skip the live migration job if it's an ironic-only change | 18:26 |
JayF | I mean, that seems sane-ish to me, but maybe a lot of effort for a branch where this is likely literally one of the only ironic driver things to hit | 18:26 |
* JayF is doing an audit in Ironic right now for patches to backport | 18:27 | |
sean-k-mooney | we can add nova/virt/ironic and nova/tests/unit/virt/ironic | 18:27 |
sean-k-mooney | to the irrelevant files list | 18:27 |
sean-k-mooney | well I'm sure you would not mind having it on master and it should be simple to backport | 18:27 |
JayF | you want to tell me where that change goes? If so I'll post it, and you can keep your +2 :D | 18:27 |
JayF | I took just long enough of a break from OpenStack to not really grok the new zuul configs very well | 18:28 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/.zuul.yaml#L92-L106 | 18:28 |
JayF | incoming | 18:28 |
sean-k-mooney | unfortunately that is used as the definition of nova-base-irrelevant-files | 18:29 |
sean-k-mooney | that is signified by the & | 18:29 |
sean-k-mooney | so you will need to copy that section to https://github.com/openstack/nova/blob/master/.zuul.yaml#L131 | 18:29 |
sean-k-mooney | then update it in the current one for the live migration job | 18:29 |
JayF | got it | 18:29 |
JayF | yaml anchors are awesome and terrible :D | 18:29 |
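For readers following along, a hedged, abbreviated sketch of the .zuul.yaml pattern being described; the job names and file lists are illustrative, not the exact nova definitions.

```yaml
- job:
    name: nova-example-base-job            # illustrative name
    irrelevant-files: &nova-base-irrelevant-files
      - ^doc/.*$
      - ^releasenotes/.*$

- job:
    name: nova-example-other-job           # reuses the shared list as-is
    irrelevant-files: *nova-base-irrelevant-files

- job:
    name: nova-live-migration
    # an anchor cannot be extended in place, so the shared list is
    # copied and the ironic-only paths appended
    irrelevant-files:
      - ^doc/.*$
      - ^releasenotes/.*$
      - ^nova/virt/ironic/.*$
      - ^nova/tests/unit/virt/ironic/.*$
```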
sean-k-mooney | ok im going to go figure out dinner ill check back in a while but feel free to add me to the review and/or ping me here | 18:30 |
sean-k-mooney | in the meantime you can likely just recheck once the gate change fails | 18:30 |
sean-k-mooney | with a message please :) "recheck ironic does not support live migration" or similar | 18:31 |
sean-k-mooney | we have an issue with volume detach intermittently failing and I think that is what you are hitting | 18:32 |
opendevreview | Jay Faulkner proposed openstack/nova master: nova-live-migration tests not needed for Ironic https://review.opendev.org/c/openstack/nova/+/853529 | 18:32 |
JayF | sean-k-mooney: ^ | 18:33 |
JayF | Thanks sean :) | 18:33 |
sean-k-mooney | ok that looks ok i think | 18:34 |
sean-k-mooney | test-only changes and releasenote-only changes are already skipped | 18:34 |
sean-k-mooney | so just the ironic virt driver | 18:34 |
JayF | That's why I didn't add the tests line, yep | 18:34 |
JayF | I know how the stanzas for zuul config work, I just can't ever find the #@%$# configs anymore lol | 18:35 |
sean-k-mooney | you can technically go from the zuul UI to the job definition, at least to the repo where they're defined | 18:36 |
sean-k-mooney | I normally fall back on https://codesearch.opendev.org/ if I can't find it quickly | 18:36 |
sean-k-mooney | anyway chat to you tomorrow o/ | 18:36 |
opendevreview | Merged openstack/nova stable/victoria: [ironic] Minimize window for a resource provider to be lost https://review.opendev.org/c/openstack/nova/+/800873 | 20:27 |
JayF | \o/ | 20:27 |
opendevreview | Jay Faulkner proposed openstack/nova stable/ussuri: [ironic] Minimize window for a resource provider to be lost https://review.opendev.org/c/openstack/nova/+/853540 | 20:30 |
opendevreview | Rico Lin proposed openstack/nova master: Add locked_memory extra spec and image property https://review.opendev.org/c/openstack/nova/+/778347 | 21:19 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 21:19 |
opendevreview | Rico Lin proposed openstack/nova master: Add traits for viommu model https://review.opendev.org/c/openstack/nova/+/844507 | 21:19 |
ricolin | gibi: I was asking about the necessity of raising an exception for locked memory, but I guess the right behavior is to raise it. Just updated the patch accordingly. Thanks for your review, please help review again ^^^ | 21:20 |
*** dasm is now known as dasm|off | 22:00 | |
opendevreview | Jay Faulkner proposed openstack/nova stable/train: [ironic] Minimize window for a resource provider to be lost https://review.opendev.org/c/openstack/nova/+/853546 | 22:04 |
JayF | Train is as far back as you all go, right? | 22:05 |
sean-k-mooney[m] | i think we still might have older branches but i think we have gate issues beyond train | 22:23 |
sean-k-mooney[m] | we still have stable queens branches | 22:24 |
sean-k-mooney[m] | but i dont think the gate will pass. you can try | 22:24 |
sean-k-mooney[m] | we have been discussing disabling some broken jobs on really old branches | 22:25 |
JayF | train is probably the most reasonable to go back to anyway | 22:35 |
melwitt | JayF: iiuc, the eventual fix in the ironic driver was this https://review.opendev.org/q/Iba87cef50238c5b02ab313f2311b826081d5b4ab but it hasn't been reviewed on older branches yet. fyi | 22:57 |