Thursday, 2021-06-17

opendevreviewmelanie witt proposed openstack/nova master: Add func test for nova-manage db archive_deleted_rows --before  https://review.opendev.org/c/openstack/nova/+/79674401:46
opendevreviewmelanie witt proposed openstack/nova master: Add --task-log option to nova-manage db archive_deleted_rows  https://review.opendev.org/c/openstack/nova/+/78039501:57
melwittlyarwood, elodilles: it's funny passing CI \o/ (but recheck-a-thon) https://review.opendev.org/c/openstack/nova/+/79543202:00
melwitts/funny/finally/02:00
gibilyarwood: hi! regarding https://review.opendev.org/c/openstack/nova/+/796523 I'm sure I asked this before but forgot. Where do we have now the evacuation test coverage? 07:11
opendevreviewMerged openstack/nova stable/rocky: libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available  https://review.opendev.org/c/openstack/nova/+/74761207:18
opendevreviewMerged openstack/nova stable/wallaby: Neutron fixture: don't clobber profile and vif_details if empty  https://review.opendev.org/c/openstack/nova/+/79223307:18
opendevreviewMerged openstack/nova stable/wallaby: Test SRIOV port move operations with PCI conflicts  https://review.opendev.org/c/openstack/nova/+/79071007:18
*** rpittau|afk is now known as rpittau07:22
opendevreviewYongli He proposed openstack/nova master: smartnic support  https://review.opendev.org/c/openstack/nova/+/75894407:31
opendevreviewYongli He proposed openstack/nova master: smartnic support - reject server move and suspend  https://review.opendev.org/c/openstack/nova/+/77991307:31
opendevreviewYongli He proposed openstack/nova master: smartnic support - functional tests  https://review.opendev.org/c/openstack/nova/+/78014707:31
lyarwoodmelwitt: awesome :) I'll +1 only as I modified it07:43
lyarwoodgibi: it's part of the live migration jobs07:44
lyarwoodgibi: runs in the post playbook07:44
lyarwoodgibi: https://github.com/openstack/nova/tree/master/roles/run-evacuate-hook is the role we use07:45
lyarwoodgibi: https://github.com/openstack/nova/blob/master/playbooks/nova-live-migration/post-run.yaml is where it's called07:45
lyarwoodgibi: the logic being that we didn't want to stand up another multinode env every run to test evacuation07:46
lyarwoodgibi: doing it in post was easier as we didn't need to copy and paste any of the tempest playbook logic into Nova07:46
lyarwoodgibi: so for that review evacuation is tested from here https://zuul.opendev.org/t/openstack/build/057093756ca64ef994584e2cae50f537/log/job-output.txt#6439207:50
opendevreviewMerged openstack/nova stable/victoria: Reproduce bug 1897528  https://review.opendev.org/c/openstack/nova/+/79176707:50
gibilyarwood: thanks08:03
gibiI hope I will not forget this again :)08:03
*** akekane__ is now known as abhishekk08:04
lyarwood^_^08:14
opendevreviewYongli He proposed openstack/nova master: Smartnic support - cyborg drive  https://review.opendev.org/c/openstack/nova/+/77136209:01
opendevreviewYongli He proposed openstack/nova master: smartnic support - new vnic type  https://review.opendev.org/c/openstack/nova/+/77136309:01
opendevreviewYongli He proposed openstack/nova master: smartnic support  https://review.opendev.org/c/openstack/nova/+/75894409:01
opendevreviewYongli He proposed openstack/nova master: smartnic support - reject server move and suspend  https://review.opendev.org/c/openstack/nova/+/77991309:01
opendevreviewYongli He proposed openstack/nova master: smartnic support - functional tests  https://review.opendev.org/c/openstack/nova/+/78014709:01
opendevreviewMerged openstack/nova stable/victoria: Ignore PCI devices with 32bit domain  https://review.opendev.org/c/openstack/nova/+/79176809:02
yongliherebase to fix dependency problem, that's weird.09:03
lyarwoodyonglihe: A pip dependency problem? We've seen loads that make no sense recently.09:19
* lyarwood really needs to write something up on the ML to see if other projects are also hitting it09:19
stephenfinlyarwood: it's a cache issue, I think09:21
lyarwoodoh the limestone thing?09:21
stephenfinyeah, I think so09:21
lyarwoodwonderful09:22
stephenfinit's failing with e.g. dep a requesting >=1.2 and upper constraints requesting == 3.0, which would pass unless 3.0 wasn't available09:22
stephenfinhmm, maybe not actually - the error message I get locally is different09:24
* stephenfin looks at the failure from yonglihe 09:24
lyarwoodack thanks09:25
lyarwoodthat makes sense now if the cache is borked09:25
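The failure mode stephenfin describes (a dependency asking for >=1.2, upper constraints pinning ==3.0, and the pinned release missing from a stale mirror) can be illustrated with a toy resolver check. This is only a sketch under assumptions: `parse` and `pick_version` are hypothetical helpers, versions are assumed to be plain dotted integers, and none of this is pip's actual resolver logic.

```python
def parse(version):
    """Turn a dotted version such as '3.0' into a tuple (3, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def pick_version(minimum, pinned, available):
    """Return the pinned version if it meets the minimum and the index has it.

    minimum   -- lower bound requested by a dependency (the '>=1.2' side)
    pinned    -- exact version from upper constraints (the '==3.0' side)
    available -- set of version strings the mirror/cache actually offers
    Raises LookupError to stand in for a resolver conflict.
    """
    if parse(pinned) < parse(minimum):
        raise LookupError(f"pin =={pinned} violates >={minimum}")
    if pinned not in available:
        # The requirements agree with each other; only the index is stale.
        raise LookupError(
            f"=={pinned} is pinned but the index only offers {sorted(available)}"
        )
    return pinned
```

With a healthy index, `pick_version("1.2", "3.0", {"1.2", "2.9", "3.0"})` returns `"3.0"`. With a stale one that lacks 3.0, the same call raises even though the two requirements do not conflict, which is why the reported error looks nonsensical.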
stephenfinyonglihe: the failure on https://review.opendev.org/c/openstack/nova/+/758944/ looks real?09:26
stephenfin    if vnic_type in network_model.VNIC_TYPES_ACCELERATOR:09:26
stephenfin    AttributeError: module 'nova.network.model' has no attribute 'VNIC_TYPES_ACCELERATOR'09:26
stephenfin(from https://zuul.opendev.org/t/openstack/build/e08dc74546d34d9a8ee67e597ade8fb2)09:26
stephenfinelodilles: lyarwood: Care to keep working through this backport series? The victoria patches have landed now and this is another clean backport https://review.opendev.org/q/topic:%2522bug/1897528%2522+branch:stable/ussuri09:28
lyarwoodack looking09:29
lyarwoodelodilles: https://review.opendev.org/c/openstack/nova/+/796626 - can you also take a look at this on master if you get a chance, moving the cherry-pick script out of pep8.09:30
elodillessure, looking at the patches :)09:32
gibilyarwood, stephenfin: yesterday infra turned off limestone due to the pip cache issue09:42
gibiso we should not see these nonsensical req conflicts09:43
gibiany more today09:43
yonglihestephenfin, that's because that patch lost the dependency on the second patch, fixed.09:45
lyarwoodwonderful09:45
lyarwoodgibi: https://bugs.launchpad.net/cinder/+bug/1932287 just caught this if you see any random volume creation failures today09:46
gibilyarwood: thanks, I haven't seen that issue yet09:47
opendevreviewMerged openstack/nova stable/rocky: Remove allocations before setting vm_status to SHELVED_OFFLOADED  https://review.opendev.org/c/openstack/nova/+/77198509:47
stephenfinelodilles: Yeah, as lyarwood said, we need to move the cherry-pick change out of the pep8 job. I hadn't seen that failure09:48
* stephenfin respins09:48
gibilyarwood: with the exit code 139 lvs complains about missing devices and that I saw before09:50
* gibi digging up job results09:50
lyarwoodyeah https://review.opendev.org/c/openstack/cinder/+/783660 fixed it elsewhere09:50
lyarwoodjust not in this path09:51
gibilyarwood: cool, then we have a way forward09:51
opendevreviewStephen Finucane proposed openstack/nova stable/ussuri: Reproduce bug 1897528  https://review.opendev.org/c/openstack/nova/+/79177009:51
opendevreviewStephen Finucane proposed openstack/nova stable/ussuri: Ignore PCI devices with 32bit domain  https://review.opendev.org/c/openstack/nova/+/79177109:51
stephenfinelodilles: lyarwood: fixed the pep8 failure ^09:52
gibilyarwood: I'm hitting https://bugs.launchpad.net/nova/+bug/1912310 many times now and almost always in the test_volume_backed_live_migration tempest test. Wondering if it is worth disabling that test until the ovsdbapp fix lands09:52
opendevreviewStephen Finucane proposed openstack/nova stable/train: Reproduce bug 1897528  https://review.opendev.org/c/openstack/nova/+/79211609:53
opendevreviewStephen Finucane proposed openstack/nova stable/train: Ignore PCI devices with 32bit domain  https://review.opendev.org/c/openstack/nova/+/79211709:53
stephenfinand the train ones are updated now too09:53
lyarwoodgibi: ack let's do it, I'll disable them now09:57
opendevreviewLee Yarwood proposed openstack/nova master: zuul: Skip volume backed LM tests until bug #1912310 is resolved  https://review.opendev.org/c/openstack/nova/+/79681310:04
lyarwoodgibi: ^ hopefully that's enough, if it isn't then we might want to move the LM jobs to non-voting10:04
opendevreviewStephen Finucane proposed openstack/nova master: db: Reintroduce validation of shadow table schema  https://review.opendev.org/c/openstack/nova/+/79681410:11
stephenfinlyarwood: gibi: one final one, as requested ^10:11
gibilyarwood: thanks10:12
elodillesstephenfin: actually i was surprised that pep8 is failing in ussuri because of py27/six problem as py27 should be supported only up until train :-o10:13
stephenfinelodilles: yeah, we simply weren't aggressive enough in dropping the no-longer relevant hacking checks10:13
elodillesoh, i see10:13
stephenfinI would personally like to backport the patch that dropped the check, but I don't know what you think about that. I can't imagine that would violate stable policy since it's nothing to do with production code10:15
stephenfin(commit 9dca0d186f834c38d0d06e226b18ab3ae717c140 fwiw)10:15
lyarwoodYup I assumed we would tbh10:17
lyarwoodno reason to leave it just on >=stable/xena10:18
elodillesstephenfin: well, it formally violates, as it is a blueprint o:) ... anyway, I would stick to backporting only bug fixes... but... given that py27 is not supported in ussuri anymore... anyway I'm a bit unsure... o:)10:21
lyarwoodoh sorry I thought we were talking about the cherry-pick script10:24
elodilleslyarwood: actually I've missed that discussion :X Are you planning to move out the cherry-pick-check from pep8?10:35
elodilleslyarwood: nevermind, i'm just a bit slow today :D10:36
lyarwoodelodilles: https://review.opendev.org/c/openstack/nova/+/796626 yeah that's the idea, make it non-voting in check and only voting in the gate10:36
elodilleslyarwood: yeah, sorry :X10:36
sean-k-mooneylyarwood: instead of skipping the test https://review.opendev.org/c/openstack/nova/+/796813 why not just use the old driver10:45
sean-k-mooneythe stalling issue only happens if you use the native driver10:45
sean-k-mooneythe vsctl one won't have that problem10:45
lyarwoodWe could but unless that's done in devstack across all jobs we end up with a mixed set of jobs10:50
sean-k-mooneyis that a bad thing10:51
sean-k-mooneywith the current patch you're just reducing coverage10:51
sean-k-mooneybut the issue can still happen10:51
sean-k-mooneyi'm fine with making that change in devstack temporarily10:52
lyarwoodkk if you could post that we can yank or revert this10:52
sean-k-mooneyi left a -1 on the patch already but would you like me to do the devstack change10:53
sean-k-mooneyjust finishing an email but i can do it then10:54
lyarwoodsean-k-mooney: ack10:56
lyarwoodstephenfin: can you yank the +W on https://review.opendev.org/c/openstack/nova/+/79681310:56
lyarwoodstephenfin: sean-k-mooney is going to work around this in devstack10:56
stephenfinsure, done10:56
lyarwoodta10:56
lyarwoodnoice, the FIPS fallout doesn't look that bad11:05
lyarwoodhttps://6cbf38d10f57b850b36e-212ab268b5e4bbb4b3348f98a2a831ee.ssl.cf5.rackcdn.com/790519/6/check/nova-fips/914f344/testr_results.html11:05
lyarwoodparamiko as expected and a server create timeout 11:06
lyarwoodand that smells like the ovs locking up issue11:07
sean-k-mooneylyarwood: cool11:12
sean-k-mooneyby the way i have found an interesting ceph issue that, well, i know how to fix but don't know how to detect11:12
sean-k-mooneylyarwood: are you familiar with EC pools in ceph 11:13
sean-k-mooneyi was following https://docs.ceph.com/en/latest/rbd/rbd-openstack/ to configure ceph for openstack in general and https://themeanti.me/technology/2018/08/23/ceph_erasure_openstack.html for the ec pools11:13
sean-k-mooneyand i missed a step kind of11:14
sean-k-mooneysince i have a vms pool for nova and a vms_data pool 11:14
sean-k-mooneywhen i was doing the cephx user caps configuration i needed to list both the vms pool and vms_data pool11:15
sean-k-mooneybut i only listed vms11:15
sean-k-mooneythe result of which is that nova booted a vm and it went into the active state11:15
sean-k-mooneybut it could not read or write its root disk11:15
sean-k-mooneythe root disk has all the data present but it was inaccessible to qemu11:16
sean-k-mooneylyarwood: do you think that qemu might be able to detect that and create a warning/error or could we detect that somehow and create an error11:17
sean-k-mooneynova is calling ceph directly to get the available storage11:17
sean-k-mooneyi'm debating if it would make sense for nova to try and create a volume and read from it on the host or something on startup of the agent11:18
lyarwoodI'm not sure how QEMU could catch that tbh11:22
lyarwoodtbh that smells more like a deployment tooling validation?11:23
lyarwoodI wouldn't ask Nova to check it11:23
sean-k-mooneyok it's just annoying to debug11:23
sean-k-mooneythere is no error in qemu or nova or ceph11:23
sean-k-mooneythe vm just can't find a bootable disk11:24
sean-k-mooneylyarwood: the deployment tool i'm using does not technically support this anymore which is why i messed it up11:24
sean-k-mooneykolla-ansible now just has external ceph support11:24
lyarwoodthat could still be a validation for external ceph11:25
sean-k-mooneyso you predeploy ceph with your favourite tool and then pass it a few files like the keyrings and it does the rest11:25
lyarwoodthat the keyring has r/w access11:25
sean-k-mooneyit could yes11:25
sean-k-mooneythere are post-run checks which i did not run11:25
sean-k-mooneyi might add one for this11:25
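The kind of post-deploy validation lyarwood and sean-k-mooney are discussing could start as a plain string check on the cephx OSD caps before anything boots. The following is a minimal sketch under assumptions: `pools_in_caps` and `missing_pools` are hypothetical helpers, the caps string is assumed to use the `pool=<name>` form shown in the Ceph OpenStack guide, and the check only confirms that a pool is mentioned, not that the grant is read/write.

```python
import re


def pools_in_caps(osd_caps):
    """Return the set of pool names mentioned as pool=<name> in an OSD caps string."""
    return set(re.findall(r"pool=([\w.-]+)", osd_caps))


def missing_pools(osd_caps, required):
    """Return the required pools that the caps string never mentions, sorted."""
    return sorted(set(required) - pools_in_caps(osd_caps))


# Example caps resembling the mistake described above: the metadata pool
# (vms) is listed but the erasure-coded data pool (vms_data) is not.
caps = "profile rbd pool=vms, profile rbd-read-only pool=images"
problems = missing_pools(caps, ["vms", "vms_data"])
if problems:
    print("cephx user has no caps for pools:", ", ".join(problems))
```

A stricter check would actually write and read a small test object in each pool with the deployed keyring, since a pool can be named in the caps and still be granted read-only.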
sean-k-mooneyi was going to try and update their docs later anyway to document some of the more advanced customisation that i'm doing11:26
sean-k-mooneyfor example running all of the openstack services on the same port but with different subdomains11:26
sean-k-mooneylyarwood: stephenfin  https://review.opendev.org/c/openstack/devstack/+/79682611:54
sean-k-mooneyi think that will do the right thing11:55
lyarwoodLGTM but I'll wait for CI to run before I vote12:09
sean-k-mooneyi have not had time to test that so that is probably a good idea :)12:12
lyarwoodSmall nit in the commit message btw, you called out the wrong bug.12:14
sean-k-mooneyoh12:14
sean-k-mooneyi can fix it but might wait for the ci to finish12:14
lyarwoodyeah no issues12:14
* lyarwood -> lunch brb12:15
sean-k-mooneyah i did 12:15
sean-k-mooneyit should be https://bugs.launchpad.net/nova/+bug/192944612:15
sean-k-mooneynot https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/192946612:15
sean-k-mooney446 not 46612:15
sean-k-mooneylyarwood: its going to fail12:17
sean-k-mooneyopt/stack/devstack/lib/os-vif: line 12: return: False: numeric argument required12:17
sean-k-mooneyi forgot you can't return a string in bash12:18
sean-k-mooneyyou echo them12:18
opendevreviewMerged openstack/nova master: db: Remove dead code  https://review.opendev.org/c/openstack/nova/+/78629112:19
opendevreviewMerged openstack/nova master: gate: Remove test_evacuate.sh  https://review.opendev.org/c/openstack/nova/+/79652312:19
opendevreviewRodrigo Barbieri proposed openstack/nova stable/ussuri: Error anti-affinity violation on migrations  https://review.opendev.org/c/openstack/nova/+/79671912:50
opendevreviewMerged openstack/nova stable/stein: Improve error log when snapshot fails  https://review.opendev.org/c/openstack/nova/+/78296213:06
opendevreviewMerged openstack/nova stable/ussuri: Reproduce bug 1897528  https://review.opendev.org/c/openstack/nova/+/79177013:06
opendevreviewLee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job  https://review.opendev.org/c/openstack/nova/+/79668413:13
opendevreviewLee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job  https://review.opendev.org/c/openstack/nova/+/79668413:18
lyarwoodgah!13:19
opendevreviewLee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job  https://review.opendev.org/c/openstack/nova/+/79668413:19
* gibi is frustrated that https://review.opendev.org/c/openstack/nova/+/796255 needed a 10th recheck :/13:47
lyarwoodgibi: sean-k-mooney is working on https://review.opendev.org/c/openstack/devstack/+/796826 to hopefully resolve lots of instability 13:50
sean-k-mooneyi wonder why we are hitting this so much more often recently13:51
lyarwoodmaybe we are just noticing it more recently, it's an awkward one.13:53
sean-k-mooneyya we also kind of mentally filter out those lines in the log13:53
sean-k-mooneyat least i do most of the time13:54
lyarwoodright takes some processing of timestamps to even see the issue but most of the time the ultimate test failure is miles away from that13:59
lyarwoodsometimes I wish I worked on an easier stack :)13:59
sean-k-mooneylyarwood: gibi  it's almost finished the check run by the way, the current version seems to be working13:59
lyarwoodack yeah I've been watching13:59
lyarwoodlooking good thus far13:59
noonedeadpunko/14:04
noonedeadpunkfolks we noticed weird behaviour that you're probably aware about14:04
opendevreviewMohammed Naser proposed openstack/nova stable/wallaby: Allow X-OpenStack-Nova-API-Version header in CORS  https://review.opendev.org/c/openstack/nova/+/79686014:05
opendevreviewMohammed Naser proposed openstack/nova stable/victoria: Allow X-OpenStack-Nova-API-Version header in CORS  https://review.opendev.org/c/openstack/nova/+/79686114:06
opendevreviewMohammed Naser proposed openstack/nova stable/ussuri: Allow X-OpenStack-Nova-API-Version header in CORS  https://review.opendev.org/c/openstack/nova/+/79686214:06
opendevreviewMohammed Naser proposed openstack/nova stable/train: Allow X-OpenStack-Nova-API-Version header in CORS  https://review.opendev.org/c/openstack/nova/+/79686314:07
noonedeadpunkSo, the sequence is roughly the following: 1. HV goes down. 2. VM is sent Shutdown (or any other request). 3. Then the VM is in `powering-off` state, but it needs to be evacuated. So reset-state is issued and evacuate is processed. Now the VM is running on another HV. 4. When the original HV comes back up it processes the messages that were issued while it was down and powers off the VM that was evacuated and is owned by another HV atm14:07
noonedeadpunkI have a feeling that if a node does not own a VM it should not be able to influence it even if it has some commands in its queue?14:08
noonedeadpunkand maybe you have some guess where in the code it is worth looking for this?14:08
lyarwoodso the compute manager that gets the cast in this case isn't doing any checks to ensure the instance is still on that host14:10
lyarwoodI guess it's a valid thing to do for any operations using casts14:10
noonedeadpunkyeah, I expect smth like that is happening. But not super familiar with codebase :(14:10
sean-k-mooneynoonedeadpunk: why are you doing reset-state in your evacuate workflow14:11
sean-k-mooneynoonedeadpunk: you should not be doing reset state first14:11
noonedeadpunkwell, otherwise it can't be evacuated with `ERROR (Conflict): Cannot 'evacuate' instance e46404b1-e6e1-4d22-9f8f-12d6f51b55ae while it is in task_state powering-off`14:11
sean-k-mooneyhum14:11
sean-k-mooneyi see14:12
noonedeadpunkIs there any other proper way to do evacuate?14:12
gibilyarwood, sean-k-mooney thanks. I'm happy to see that this week a lot of us focused on stabilizing the gate. 14:12
noonedeadpunkI mean technicaly we could wait until node goes up, but it might be days theoretically?14:12
lyarwoodtbh I think we should allow evacuate if the instance is powering-off14:12
lyarwoodeither way the src compute is dead14:12
sean-k-mooneyyep i was thinking the same14:13
noonedeadpunkbut it won't resolve original issue though14:13
noonedeadpunkas then evacuated instance would be shot anyway14:13
lyarwoodwell it shouldn't kill the instance on the dest14:13
noonedeadpunk(but agree it's super valid to allow evacuate)14:13
lyarwoodthe cast to shutdown the original instance on the original host should fail14:13
noonedeadpunkfwiw it's on Victoria14:14
lyarwoodbut that has the potential of moving the instance into an ERROR state14:14
lyarwooda simple decorator to check that instance.host points at the current host would work here14:14
lyarwoodit might not work everywhere we cast 14:15
lyarwoodbut in this example it's fine14:15
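The decorator lyarwood mentions could look roughly like the sketch below. It is not Nova code: `require_instance_on_host`, `InstanceNotOnHost` and `FakeComputeManager` are hypothetical names used only for illustration, and the sketch assumes the handler receives an `instance` object with a `host` attribute and that the manager knows its own `host`.

```python
import functools


class InstanceNotOnHost(Exception):
    """Raised when a queued request targets an instance owned by another host."""


def require_instance_on_host(func):
    """Refuse to run a compute operation if instance.host is not this host."""

    @functools.wraps(func)
    def wrapper(self, context, instance, *args, **kwargs):
        if instance.host != self.host:
            # A stale cast, e.g. a stop request queued before an evacuation.
            raise InstanceNotOnHost(
                f"instance is on {instance.host}, refusing to act from {self.host}"
            )
        return func(self, context, instance, *args, **kwargs)

    return wrapper


class FakeComputeManager:
    """Stand-in for a compute manager, just enough to exercise the decorator."""

    def __init__(self, host):
        self.host = host

    @require_instance_on_host
    def stop_instance(self, context, instance):
        instance.power_state = "shutdown"
        return instance.power_state
```

Raising here is a design choice for the sketch only. As noted in the discussion, a failing cast can push the instance into an ERROR state, so a real check might log and drop the stale request instead.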
lyarwoodnoonedeadpunk: did you have a bug for this?14:15
noonedeadpunknope, not yet, but will submit one :)14:15
lyarwoodawesome thanks14:16
noonedeadpunkor maybe even two...14:16
gibithere is a recent bug asking for evacuating in soft-delete state https://bugs.launchpad.net/nova/+bug/193212614:17
sean-k-mooneywe spoke about allowing it in other state at the ptg14:17
sean-k-mooneylike paused/suspended14:18
noonedeadpunkwell... soft delete is really corner case imo...14:18
lyarwoodI'm not sure that soft-delete makes sense14:18
bauzassoft-deleted in the Nova API or in the database ?14:18
sean-k-mooneyim not sure soft delete makes much sense14:18
noonedeadpunkonce you will evacuate it it would be already time to delete instance...14:18
sean-k-mooneylyarwood: :)14:18
bauzasah, this14:19
sean-k-mooneyi guess the use case is to undelete it14:19
bauzas(nova api soft delete, that's it)14:19
bauzaswell, i do understand the concern from an operator pov14:19
lyarwoodTIL we can do that14:19
bauzasif you wanna evacuate, you're in a rush14:19
sean-k-mooneyi would be ok with making undelete work when the host is down and then allow evacuate14:19
lyarwoodfor some reason I didn't think we had a way back14:19
sean-k-mooneybut evacuate on a soft-deleted instance for me would have to undelete it14:19
bauzassean-k-mooney: or we could just not rebuild the instance14:20
bauzasit's soft deleted in the source host14:20
bauzasso the target should just not rebuild the instance14:20
sean-k-mooneybauzas: so rebuild when the host is down14:20
bauzasnah14:20
bauzasnot rebuild the soft-deleted instance14:21
sean-k-mooneybut why would we keep it deleted14:21
bauzasbut the evacuate API should work14:21
sean-k-mooneynot for soft deleted14:21
bauzasbecause it's already deleted14:21
bauzassomeone asked the instance to be deleted14:21
bauzasthen the host got an issue14:22
sean-k-mooneyyep14:22
bauzasso the operator would recreate the instances in a target14:22
sean-k-mooneyat which point i think we should just treat it as if it has been deleted fully14:22
bauzassean-k-mooney: that's my point14:22
bauzaswhen saying to not rebuild it on the target14:22
bauzasbut here the API doesn't work14:23
bauzasso, we should provide an HTTP200 for a soft-delete evacuate14:23
bauzasbut not recreating it14:23
bauzasanyway, needs to get my kids from the school14:24
gibiI think the use case could be14:27
gibi1) user deletes the VM 14:27
gibi2) soft deleting is enabled so the VM is just soft-deleted14:27
gibi3) host goes down14:27
gibi4) user realizes that there was a mistake deleting the VM and calls restore14:28
gibi5) restore fails as the host is down14:28
gibiwhat to do nwo14:28
gibinow14:28
sean-k-mooneyso the user cannot know if soft delete is available or the time to restore14:28
sean-k-mooneyto me we can make the restore call work when the host is down but we should not change the evacuate behavior IMO14:30
sean-k-mooneyso have restore just undelete it in the db14:30
gibiOK so then the restore + evacuate would work14:31
sean-k-mooneyyep14:31
gibithat is acceptable to me14:31
gibibut14:31
gibisoft-delete, host down, restore (undelete in db), host up sequence would lead to inconsistency14:31
sean-k-mooneywell when the host comes up it will need to check the db state14:32
sean-k-mooneybefore completing the soft delete action correct14:32
sean-k-mooneye.g. when the compute comes back up it should see the vm was evacuated14:33
sean-k-mooneyor if it was just restored14:33
gibithere was no evacuation in this sequence14:33
sean-k-mooneythen it would see that it's been restored in the db14:33
sean-k-mooneyso it would need to handle that14:33
gibitoday restore sets the power state back to running. the db only restore would set it to shutoff?14:34
gibianyhow I have to run14:36
gibimy days are soo random, I don't feel productive14:36
noonedeadpunkhttps://bugs.launchpad.net/nova/+bug/193232614:37
sean-k-mooneynoonedeadpunk: thanks14:46
sean-k-mooneynoonedeadpunk: just going to triage this quickly, how impactful is it to you production-wise14:47
sean-k-mooneyi'm leaning towards medium or low since there is no data loss but there is a workload outage14:47
sean-k-mooneynoonedeadpunk: i.e. you can just fix it by starting the vm again14:48
noonedeadpunkI'd say it's closer to medium I guess, because as a public cloud provider it's hard to explain why a customer VM goes down a day after the previous outage14:48
noonedeadpunkand you can start it when you own vm or monitor it14:49
noonedeadpunkbut if it's not your VM it's hard to even know that it went down14:49
sean-k-mooneyyep14:50
noonedeadpunkAs current workaround we will probably attempt to flush queue for compute that went down...14:50
noonedeadpunkbut it's so nasty imo14:50
sean-k-mooneymore than likely it will cause the customer to notice a failure of the vm, restore it and file a ticket14:50
noonedeadpunkthat's exactly what has happened :)14:50
noonedeadpunkthat's pretty much a corner case though as well14:51
sean-k-mooneynoonedeadpunk: so if we allow evac in the powering-off state restoring it to shutdown would make sense right14:51
sean-k-mooneyrather than active14:51
noonedeadpunkyes, totally14:52
noonedeadpunkand powering-on to active :)14:52
sean-k-mooneywe had discussed doing that for vms in suspend and pause so including powering-off in that list i think is consistent14:52
noonedeadpunk(but it's harder to imagine happening)14:52
sean-k-mooneypowering-on to active would also make sense14:53
sean-k-mooneywell i dont know14:53
sean-k-mooneyif i was a customer and my vm suddenly stopped working i might do a start to see if that fixes it14:53
noonedeadpunkI think it will appear as active14:54
sean-k-mooneyit will yes14:54
noonedeadpunkso you are able only to reboot or shutdown?14:54
sean-k-mooneybut i might not check and just do a start but ya i normally would do hard-reboot14:54
sean-k-mooneyit's less likely but if we are addressing this we probably should go through all the states and just make them consistent/intuitive14:55
noonedeadpunkI mean that you can't start an already active instance - you will get the same Conflict exception iirc14:55
sean-k-mooneyah14:55
sean-k-mooneyyes probably since it's in the state you want14:55
noonedeadpunkyeah14:55
noonedeadpunkso powering-on would be really unfortunate co-incidence that will affect most likely only CI toolings or dunno...14:57
sean-k-mooneyya it's much less likely14:58
melwittgibi, stephenfin: heya, I've updated the --task-log archive patch to address gibi's comments https://review.opendev.org/c/openstack/nova/+/780395 14:58
sean-k-mooneywe can reason about it though and come to a logical conclusion for what it should do so we probably should just cover it14:58
stephenfinmelwitt: trying to backport https://review.opendev.org/c/openstack/nova/+/602432 at the moment (it's hell) but I'll hit that again before EOD, hopefully15:15
melwittstephenfin: ok np, and good luck15:16
kashyapstephenfin: sean-k-mooney: NUMA-related: you might find it interesting - libvirt upstream is wiring up "HMAT" - which defines the different latencies and bandwidths b/n NUMA nodes:15:24
kashyap[quote]15:24
kashyap"Links between NUMA nodes can have different latencies and bandwidths. This info is newly defined in ACPI 6.2 under Heterogeneous Memory Attribute Table (HMAT) table. Linux kernel learned how to report these values under sysfs and thus we can expose them in our capabilities XML. The sysfs interface is documented in kernel's Documentation/admin-guide/mm/numaperf.rst."15:24
kashyap[/quote]15:24
kashyapThis is called "NUMA interconnects", apparently: https://listman.redhat.com/archives/libvir-list/2021-June/msg00268.html15:24
sean-k-mooneyill take a look15:28
sean-k-mooneykashyap: that could be useful yes15:46
kashyapYep; noted15:46
*** rpittau is now known as rpittau|afk16:09
opendevreviewStephen Finucane proposed openstack/nova stable/wallaby: libvirt: Delegate OVS plug to os-vif  https://review.opendev.org/c/openstack/nova/+/79044716:42
opendevreviewStephen Finucane proposed openstack/nova stable/wallaby: fixup! libvirt: Delegate OVS plug to os-vif  https://review.opendev.org/c/openstack/nova/+/79689116:42
opendevreviewStephen Finucane proposed openstack/nova stable/wallaby: libvirt: Delegate OVS plug to os-vif  https://review.opendev.org/c/openstack/nova/+/79044716:43
*** akekane_ is now known as abhishekk16:44
opendevreviewLee Yarwood proposed openstack/nova stable/ussuri: virt: Add destroy_secrets kwarg to destroy and cleanup  https://review.opendev.org/c/openstack/nova/+/79626216:50
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: fixtures: Handle binding of first port  https://review.opendev.org/c/openstack/nova/+/79690517:04
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: Neutron fixture: don't clobber profile and vif_details if empty  https://review.opendev.org/c/openstack/nova/+/79690617:04
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: functional: Add live migration tests for PCI, SR-IOV servers  https://review.opendev.org/c/openstack/nova/+/79690717:04
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: Test SRIOV port move operations with PCI conflicts  https://review.opendev.org/c/openstack/nova/+/79690817:04
opendevreviewArtom Lifshitz proposed openstack/nova stable/victoria: Update SRIOV port pci_slot when unshelving  https://review.opendev.org/c/openstack/nova/+/79690917:04
opendevreviewLee Yarwood proposed openstack/nova stable/victoria: virt: Add destroy_secrets kwarg to destroy and cleanup  https://review.opendev.org/c/openstack/nova/+/79625917:06
opendevreviewLee Yarwood proposed openstack/nova stable/victoria: libvirt: Do not destroy volume secrets during _hard_reboot  https://review.opendev.org/c/openstack/nova/+/79626017:06
opendevreviewLee Yarwood proposed openstack/nova stable/victoria: Trival Change: Remove redundant code in instance delete  https://review.opendev.org/c/openstack/nova/+/79691217:06
lyarwoodmelwitt: https://review.opendev.org/c/openstack/nova/+/796626 - Would you mind taking a look at this today if you get a chance, stephenfin is looking to split out the cherry-pick.sh script from the pep8 job.17:14
melwittsure17:16
lyarwoodthanks17:16
opendevreviewLee Yarwood proposed openstack/nova stable/ussuri: virt: Add destroy_secrets kwarg to destroy and cleanup  https://review.opendev.org/c/openstack/nova/+/79626217:20
opendevreviewLee Yarwood proposed openstack/nova stable/ussuri: Detach is broken for multi-attached fs-based volumes  https://review.opendev.org/c/openstack/nova/+/79626317:20
opendevreviewLee Yarwood proposed openstack/nova stable/ussuri: libvirt: Do not destroy volume secrets during _hard_reboot  https://review.opendev.org/c/openstack/nova/+/79626417:20
opendevreviewLee Yarwood proposed openstack/nova stable/ussuri: Trival Change: Remove redundant code in instance delete  https://review.opendev.org/c/openstack/nova/+/79692917:20
melwittstephenfin: does this empty deps = do something? https://review.opendev.org/c/openstack/nova/+/796626/2/tox.ini#8717:22
sean-k-mooneymelwitt: i believe it prevents us installing any deps17:22
melwittok17:22
sean-k-mooneywe just need bash so this should make it slightly faster17:23
sean-k-mooneysince we are just looking at the commit message17:23
melwittthanks17:24
*** mdbooth4 is now known as mdbooth17:33
opendevreviewLee Yarwood proposed openstack/nova stable/train: virt: Add destroy_secrets kwarg to destroy and cleanup  https://review.opendev.org/c/openstack/nova/+/79693518:19
opendevreviewLee Yarwood proposed openstack/nova stable/train: Detach is broken for multi-attached fs-based volumes  https://review.opendev.org/c/openstack/nova/+/79693618:19
opendevreviewLee Yarwood proposed openstack/nova stable/train: Handle unset 'connection_info'  https://review.opendev.org/c/openstack/nova/+/79693718:19
opendevreviewLee Yarwood proposed openstack/nova stable/train: Trival Change: Remove redundant code in instance delete  https://review.opendev.org/c/openstack/nova/+/79693818:19
opendevreviewLee Yarwood proposed openstack/nova stable/train: libvirt: Do not destroy volume secrets during _hard_reboot  https://review.opendev.org/c/openstack/nova/+/79693918:19
opendevreviewMerged openstack/nova master: Move 'check-cherry-picks' test to gate, n-v check  https://review.opendev.org/c/openstack/nova/+/79662621:42
