Monday, 2022-03-07

opendevreviewHYSong proposed openstack/nova master: fix local volume extend  https://review.opendev.org/c/openstack/nova/+/832180 02:05
opendevreviewXuan Yandong proposed openstack/nova master: Remove redundant symbols  https://review.opendev.org/c/openstack/nova/+/832185 03:29
opendevreviewkiran pawar proposed openstack/nova master: VMware: Split out VMwareAPISession  https://review.opendev.org/c/openstack/nova/+/832156 09:41
opendevreviewkiran pawar proposed openstack/nova master: VMware: StableMoRefProxy for moref recovery  https://review.opendev.org/c/openstack/nova/+/832164 09:41
ignaziocassano_Hello, sometimes the volume retype from a netapp nfs storage to another netapp nfs storage does not work. I do not know the reason but I think something is going wrong in nova:10:10
ignaziocassano_ File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1593, in _swap_volume\n    raise exception.VolumeRebaseFailed(reason=six.text_type(exc))\n', "VolumeRebaseFailed: Volume rebase failed: Requested operation is not valid: pivot of disk 'vda' requires an active copy job\n"]: VolumeAttachmentNotFound: Volume attachment 2cd820e0-85e8-498d-a62a-800260d0cf31 could not be found10:11
ignaziocassano_Any help please ?10:11
kashyapignaziocassano_: No direct answer, but that error (from libvirt) means: the "volume retype" (i.e. volume migration) itself is not active10:41
kashyap"active copy" == the copy that is on the NFS and is being mirrored from the NetApp storage10:42
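A minimal sketch of the check behind that libvirt error, assuming the job status comes from libvirt-python's `virDomain.blockJobInfo(disk, 0)`, which returns an empty dict when no block job is running on the disk. The helper name `can_pivot` is hypothetical and this is not nova's code; treating `cur == end` as "ready" is a simplification, since libvirt also reports readiness through a block-job-ready event.

```python
def can_pivot(job_info):
    """Return True if a block copy (mirror) job looks ready to pivot.

    job_info is assumed to be the dict from virDomain.blockJobInfo(disk, 0).
    An empty dict means no block job is active on the disk, which is the
    situation behind "pivot of disk 'vda' requires an active copy job".
    """
    if not job_info:
        # No active copy job: libvirt would reject a pivot request.
        return False
    cur = job_info.get("cur", 0)
    end = job_info.get("end", 0)
    # Simplification: consider the mirror ready once everything is copied.
    return end > 0 and cur == end


print(can_pivot({}))
print(can_pivot({"cur": 1024, "end": 1024}))
```

Checking before pivoting turns the failure into an explicit "not ready / no job" condition instead of an exception deep inside `_swap_volume`.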
kashyapAlso what version of OSP is this?  And also mention libvirt/QEMU versions10:42
* kashyap --> needs to be AFK briefly10:42
ignaziocassano_kashyap: I am using Queens on CentOS 7, libvirt 4.5.0, QEMU emulator version 2.12.0 (qemu-kvm-ev-2.12.0-33.1.el7_7.4)10:45
ignaziocassano_Sometimes retyped volumes are corrupted and the file system on the instances goes read-only10:48
kashyapignaziocassano_: The versions seem moderately old (~2017/2018); lots of storage bugs have been fixed in this area.  And that corruption doesn't sound good.11:05
kashyapI don't know if this is even reproducible consistently in your env.11:05
kashyapSo many variables :-(11:05
sean-k-mooneyignaziocassano_: what version of nfs are you using11:15
sean-k-mooneyignaziocassano_: nova recommends v4.0 as a minimum, preferably 4.2 11:16
ignaziocassano_sean-k-mooney: I do not know why, but the controllers mount cinder with version 4.0 while the compute nodes are using nfs vers 3 11:17
sean-k-mooneywe know that v3 has some issues with locking that might affect data integrity11:17
sean-k-mooneyi'm really not sure how mixing would affect things11:18
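To see which NFS version each node actually negotiated (the controllers and compute nodes differ here), the mount options can be read back. A rough sketch, assuming input in the `/proc/mounts` layout; the function names and sample mount points are made up, and the default minimum follows the v4.0 floor mentioned above, with 4.2 passed explicitly as the preferred value.

```python
def nfs_versions(mounts_text):
    """Map each NFS mount point to its negotiated 'vers=' value.

    Assumes the /proc/mounts layout: device mountpoint fstype options dump pass.
    """
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS mounts are of interest
        for opt in options.split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                versions[mountpoint] = value
    return versions


def below_minimum(version, minimum=(4, 0)):
    """Return True if a version string such as '3' or '4.1' is below minimum."""
    parts = tuple(int(p) for p in version.split("."))
    # Pad "3" to (3, 0) so the tuple comparison lines up with minimum.
    parts += (0,) * (len(minimum) - len(parts))
    return parts < minimum


# Made-up sample: a controller-style v4.0 mount and a compute-style v3 mount.
SAMPLE = """\
netapp:/cinder /var/lib/cinder/mnt/a nfs4 rw,relatime,vers=4.0,proto=tcp 0 0
netapp:/cinder /var/lib/nova/mnt/b nfs rw,relatime,vers=3,proto=tcp 0 0
/dev/sda1 / xfs rw,relatime 0 0
"""

for mountpoint, vers in nfs_versions(SAMPLE).items():
    print(mountpoint, vers, "below 4.2" if below_minimum(vers, (4, 2)) else "ok")
```

Running the same check on a controller and a compute node makes a v4.0/v3 mismatch like the one described above visible at a glance.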
ignaziocassano_ok, so I must specify nfs_mount_options in nova.conf on compute nodes11:18
sean-k-mooneyit's likely not advisable but i suspect that this is outside the scope of nova.11:18
ignaziocassano_I will try it11:19
kashyapOh, yeah - the NFS version also plays a role.  Indeed we recommend a minimum of NFS 4.2 11:19
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#libvirt.nfs_mount_options 11:19
sean-k-mooneylikely can help11:19
sean-k-mooneybut i am not that familiar with it11:19
sean-k-mooneyi know downstream we have some recommended options for that related to selinux11:19
sean-k-mooneybeyond that i have never really looked at what we suggest setting there11:20
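The option linked above lives in the `[libvirt]` section of nova.conf on the compute nodes. A sketch that renders such a fragment with Python's `configparser`; the helper name is hypothetical, the value `vers=4.2` is an assumption to verify against the NFS server and the option documentation, and the SELinux-related options mentioned above are not covered.

```python
import configparser
import io


def nfs_mount_fragment(mount_options):
    """Render a nova.conf fragment that sets [libvirt] nfs_mount_options."""
    config = configparser.ConfigParser()
    config["libvirt"] = {"nfs_mount_options": mount_options}
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


# "vers=4.2" is an assumed value; nova passes it to the NFS client as mount options.
print(nfs_mount_fragment("vers=4.2"))
```

The rendered text is the two-line `[libvirt]` section to merge into the existing nova.conf, after which the volumes need to be remounted for the new version to take effect.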
ignaziocassano_thanks for your help11:24
dmitriissean-k-mooney, gibi: zuul seems to be happy about https://review.opendev.org/c/openstack/nova/+/829974 14:49
sean-k-mooneyah thanks for the reminder15:01
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/830010 15:10
sean-k-mooneygibi: can you spot check my expectations? when we shelve an instance would you expect us to unbind the ports from the host15:36
sean-k-mooneyor well when we shelve offload15:36
gibisean-k-mooney: I think when we offload we should unbind from the host, otherwise the physical resource (i.e. pci device) is not freed. Or can we free up a PCI device without unbinding the port?15:39
sean-k-mooneyack, that is what i would expect too but we don't15:39
sean-k-mooneythe port should still be attached to the vm but it should not be bound to a host or an ml2 driver once it's offloaded15:40
gibiso we should keep device_id in the port. Is that enough to keep the port reserved in neutron?15:41
sean-k-mooneyyes the device_id is enough to track ownership15:41
sean-k-mooneybut binding:host_id should be set to ''15:42
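A sketch of the port state expected after shelve offload, expressed as a Neutron port-update body (the `{"port": {...}}` shape used by `PUT /v2.0/ports/{port_id}`). The helper names are hypothetical and this is not how nova currently handles it: `device_id` is left out of the update so ownership stays with the instance, and only `binding:host_id` is cleared to `''` as suggested above.

```python
def offload_port_update_body():
    """Body for a Neutron port update after shelve offload (hypothetical).

    Only binding:host_id is sent; device_id is deliberately not part of the
    update, so Neutron keeps recording the instance as the port's owner.
    """
    return {"port": {"binding:host_id": ""}}


def apply_update(port, body):
    """Merge an update body into a local port dict, as a port update would."""
    updated = dict(port)
    updated.update(body["port"])
    return updated


def looks_offloaded(port):
    """True if the port is still owned by an instance but bound to no host."""
    return bool(port.get("device_id")) and port.get("binding:host_id") == ""


port = {"id": "port-1", "device_id": "instance-1", "binding:host_id": "compute-1"}
after = apply_update(port, offload_port_update_body())
print(looks_offloaded(port), looks_offloaded(after))  # False True
```

A check like `looks_offloaded` is the kind of assertion a functional test could make on the port after shelve offload.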
sean-k-mooneyi'm trying to create functional tests for vdpa move operations while i figure out if i can get a 2 node deployment to test/develop move ops15:42
sean-k-mooneybut when i was writing the shelve test i noticed it was not cleared15:43
sean-k-mooneyso i don't know if this causes any bug but it's not what i was expecting15:43
sean-k-mooneygibi: we are relying on driver.cleanup to tear down the networking on the host15:44
sean-k-mooneywhich it does and that also disconnects the volumes15:44
gibido we free the compute claim?15:44
sean-k-mooneybut we don't actually unbind the ports15:44
sean-k-mooneyit's a good question i would have to look but i think so15:45
sean-k-mooneybut i'm not certain now15:45
gibiif we free the claim but do not unbind the port, then I think we have no resource problems; it is just ugly / misleading that we keep the binding:host_id in neutron 15:46
sean-k-mooneyya 15:46
sean-k-mooneythat would happen in the compute manager i guess15:47
sean-k-mooneywe call self.rt.delete_allocation_for_shelve_offloaded_instance15:47
sean-k-mooneyi would assume that would free them15:47
sean-k-mooneyno... that is just clearing the placement allocation15:48
sean-k-mooneywe do update the resource tracker after that however15:49
sean-k-mooneymy func test says we do not free: testtools.matchers._impl.MismatchError: 4 != 3 15:58
opendevreviewAlexey Stupnikov proposed openstack/nova master: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/828570 15:58
sean-k-mooneygibi: i might try and reproduce this in devstack and see if the same is true in reality16:05
gibimaybe the periodic update_available_resources task frees it16:06
sean-k-mooneyi ran that in the func test with run_periodics16:06
sean-k-mooneyi thought it would 16:06
sean-k-mooneybut apparently not16:06
sean-k-mooneyit's not inconceivable this is a func test issue but it's worth figuring out16:08
dansmithwhat is the nova-emulation job?16:09
sean-k-mooneyit tests emulating arm vms16:09
sean-k-mooneyso x86 hosts booting arm vms16:10
dansmithis it supposed to be stable?16:10
dansmithseems like a ton of rechecking going on these days, and nova jobs have gotten pretty fat16:10
sean-k-mooneyam, i don't know if that is stable16:10
sean-k-mooneywe just enabled it a few days ago16:10
dansmithit's voting,16:11
sean-k-mooneyi did not think it was failing but it can certainly be set non-voting or moved to periodic16:11
dansmithand I just saw a kernel panic on it16:11
dansmithI think it's a guest kernel16:11
sean-k-mooneyhttps://zuul.openstack.org/builds?job_name=nova-emulation&skip=0 16:12
sean-k-mooneyit looks kind of ok16:12
sean-k-mooneyi think that is the first failure since it was merged16:12
dansmithhttps://zuul.opendev.org/t/openstack/build/cb1314bff0f34bfdbb3a4f1fd5547b72 16:12
dansmithokay, well, regardless, nova jobs are looking pretty heavy16:12
dansmithI dunno how widely-known it is, but we're losing 30% of our CI capacity at the end of the month16:12
dansmithso we'll probably need to be making some cuts16:13
dansmithwhat's the major benefit of testing arm-on-x86?16:13
sean-k-mooneyit's a proxy for ensuring that the new emulation feature works in general16:14
sean-k-mooneyit could be a weekly job16:14
sean-k-mooneyor run only on libvirt changes16:14
dansmiththe thing that lets us choose the guest emulation mode you mean?16:14
dansmithcouldn't it be a single test? if we have an arm image available, couldn't we just boot one instance from it and make sure it's alive, instead of a whole other job?16:15
sean-k-mooneywell we want to ensure resize etc. works16:16
sean-k-mooneywe could probably do it as a post action or something more lightweight16:16
dansmithsure, so one scenario test that boots, resizes, snapshots, etc16:16
sean-k-mooneyya we could do that16:16
dansmithjust saying, it seems pretty expensive for a minor verification16:16
sean-k-mooneywell the idea was to test all features with emulation16:17
sean-k-mooneybut we can 1) move it to weekly and 2) make it a set of scenario tests16:17
sean-k-mooneychateaulav:^16:17
dansmithyeah, ideally we'd run every configuration on every patch, but..16:18
chateaulavsean-k-mooney: would the scenario tests need to be added to the tempest project?16:20
sean-k-mooneyya16:20
sean-k-mooneywell or as a plugin16:20
sean-k-mooneybut upstream tempest i think would be ok16:20
chateaulavyeah, i noticed it took some time for the ci itself to run. so then we want to pursue a new tempest scenario test that can be added into another ci?16:22
chateaulavthen pause the nova emulation, or run it not as frequently?16:23
dansmithif there's a high likelihood of it being broken, then a tempest test to check that on each patch would be good16:23
dansmithhowever, if it's not very likely, then a weekly periodic test would be better and easier16:23
dansmithI suspect the latter16:24
chateaulavyeah. i think long term, maybe next cycle add in the tempest scenario that we can leverage. I think the weekly periodic would be good for the interim though.16:26
chateaulavyour thoughts sean-k-mooney16:26
dansmithwas anything not working when we first tried to do this?16:27
opendevreviewAlexey Stupnikov proposed openstack/nova master: Clean up when queued live migration aborted  https://review.opendev.org/c/openstack/nova/+/828570 16:27
chateaulavwhat do you mean in regards to not working?16:29
dansmithchateaulav: you added the ability to select the guest emulation mode right? when you added that, were other things broken that made that non-trivial?16:30
dansmithor, how invasive was the change? i thought it was mostly just a flag16:30
chateaulavyeah, so the main item is the meta property that lets you define the guest architecture16:33
chateaulaveverything else was mods to the various checks to account for reading that value along with the host arch16:33
kashyapsean-k-mooney: I frankly question the value of this "nova-emulation" job, given dansmith's comment on the impact.16:34
kashyapAlso who are the users for this?16:34
chateaulavand then choosing the guest arch if it was defined. so the ci is just to ensure the emulation works. it is highly likely that changes to nova won't affect its functionality, because it follows the logical paths for the physical architecture support16:35
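The selection described above can be sketched as follows, assuming the image property is `hw_emulation_architecture` (the name used for this feature in Yoga; treat it as an assumption) and that the host architecture is already known. The function names are hypothetical, not nova's internals.

```python
def select_guest_arch(image_props, host_arch):
    """Return the guest architecture: the emulated one if set, else the host's."""
    # Property name assumed; check the image metadata docs for the real one.
    emulated = image_props.get("hw_emulation_architecture")
    return emulated or host_arch


def needs_emulation(image_props, host_arch):
    """True when the guest arch differs from the host, so KVM cannot be used
    and QEMU has to emulate the guest CPU (TCG)."""
    return select_guest_arch(image_props, host_arch) != host_arch


print(select_guest_arch({"hw_emulation_architecture": "aarch64"}, "x86_64"))
print(needs_emulation({}, "x86_64"))
```

Because an image without the property falls straight through to the host architecture, the existing code paths are unchanged for ordinary guests, which is why the feature is unlikely to be broken by unrelated nova changes.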
sean-k-mooneykashyap: well chateaulav for one :)16:35
kashyapHmm, still16:36
kashyapchateaulav: Also, please note: https://www.qemu.org/docs/master/system/security.html#non-virtualization-use-case 16:36
sean-k-mooneykashyap: they are aware. there are many production use cases for it even with that in mind16:37
dansmithyeah, really seems pretty low-impact in terms of a feature, and a whole job on every change is very high cost16:37
sean-k-mooneyprobably not public cloud16:37
dansmithI tend to think that even a scenario in every job is more expensive than we need16:38
dansmitha weekly periodic is fine if we want, but..16:38
kashyapYeah, 30% impact on other nodes is just too much16:38
sean-k-mooneywell it's not 30% from this job16:39
kashyapsean-k-mooney: "many cases" - I'm assuming they don't give a hoot about security16:39
sean-k-mooneywe are losing one of the providers i assume16:39
sean-k-mooneykashyap: much of our downstream ci uses qemu, some uses kvm16:39
sean-k-mooneyso for ci and package building i think it's fine16:40
kashyapsean-k-mooney: Well, near as I know, most is exercising nested KVM16:40
kashyapInternal CI is fine16:40
sean-k-mooneydon't forget that rackspace used to run their public cloud on power, providing x86 vms16:40
dansmithum, what? that's news to me :)16:41
dansmithI think they toyed with that, probably for second-source reasons but.. not to my knowledge for anything real16:41
dansmitheven still, that doesn't mean it makes sense, or is a good idea with qemu, and arm on x86 :)16:42
sean-k-mooneythey used to have xen but also ppc hosts16:42
dansmithyeah, probably for political reasons :)16:43
sean-k-mooneyperhaps16:43
chateaulavyeah, initial use of this is not meant for real-world systems. it is to bring testing and validation forward a little more so you don't have to run physical hardware, and then work towards greater parity going forward16:44
kashyapchateaulav: Okay, as long as you're clear that for any production usage this cross-arch emulation is entirely unfit. 16:48
kashyapDepending on the (cross-arch emulation) config, you still have _massive_ holes for a truck to comfortably drive through ;-)16:49
chateaulavcorrect, this is entirely meant to bring security testing, validation testing, and simulated environments (which don't exist anywhere) to the common person within openstack16:50
dansmithchateaulav: yeah, so it's cool if this is a toy, useful for developers or whatever, but that means the ci impact has to be negligible, IMHO16:50
chateaulavdansmith: I was asked to add a ci job, so from the Nova Core Dev community perspective use it as you see fit. no need to waste ci time if it is using a lot of extra resources. I think it would be useful to have a periodic check to ensure that it remains functional; however, i can see it also being added to an existing ci job as a scenario for the long-term support of its testing16:58
dansmithchateaulav: yeah, I understand, I'm not blaming you16:59
chateaulavfor sure, just want to make sure you understand our overall intent for this feature as a whole17:00
bauzaschateaulav: dansmith: fwiw, I explain in the prelude that this is experimental and not tested in our CI17:23
bauzasno false promises17:23
bauzasplus in the cycle highlights, hoping the marketing folks don't freak out and write something wrong17:24
dansmithbauzas: okay but it is tested in our ci, has already broken for me this morning, and is costing a fair bit in terms of resource17:25
dansmithbut if you mean to describe it that way (and make the job reflect that) then ++17:25
bauzasI'm just testing the prelude as I write, so I'll upload it17:25
bauzasheh, done17:25
bauzasuploading it so reviews are welcome17:26
opendevreviewSylvain Bauza proposed openstack/nova master: Add the Yoga prelude section  https://review.opendev.org/c/openstack/nova/+/832292 17:26
bauzasdansmith: gibi: sean-k-mooney: gmann: ^17:26
gibibauzas: ack, I will look at it tomorrow morning17:27
gmannbauzas: thanks. will check in my afternoon17:27
sean-k-mooneygibi: my func test now shows that the devices are claimed and freed properly; i had an off-by-one error19:28
sean-k-mooneyclaiming a vdpa device decrements the total count by 2 19:30
sean-k-mooney1 for the vdpa device and 1 for the pf19:30
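A toy model of the counting described above, where claiming a vDPA device also makes its parent PF unavailable, so the first claim lowers the available count by 2. `HypotheticalPciPool` is invented for illustration and is not nova's `PciDeviceStats`; claiming a PF directly is only modelled as marking the PF itself.

```python
class HypotheticalPciPool:
    """Toy PCI pool where vDPA devices hang off a parent PF (not nova code)."""

    def __init__(self, parents):
        # parents maps device address -> parent PF address (None for a PF).
        self.parents = parents
        self.unavailable = set()

    def available(self):
        return len(self.parents) - len(self.unavailable)

    def claim(self, address):
        if address in self.unavailable:
            raise ValueError(f"{address} is not available")
        self.unavailable.add(address)
        parent = self.parents[address]
        if parent is not None:
            # Once a child is in use, the PF cannot be handed out on its own.
            self.unavailable.add(parent)

    def free(self, address):
        self.unavailable.discard(address)
        parent = self.parents[address]
        if parent is not None:
            children = [a for a, p in self.parents.items() if p == parent]
            # Release the PF only when none of its children are still claimed.
            if not any(c in self.unavailable for c in children):
                self.unavailable.discard(parent)


pool = HypotheticalPciPool({"pf0": None, "vdpa0": "pf0", "vdpa1": "pf0"})
pool.claim("vdpa0")
print(pool.available())
```

With one PF and two vDPA children, the first claim drops the count from 3 to 1, which is the "1 for the vdpa device and 1 for the pf" behaviour and an easy source of off-by-one expectations in a test.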
opendevreviewMerged openstack/nova master: Add grenade-skip-level irrelevant-files config  https://review.opendev.org/c/openstack/nova/+/831229 20:36
opendevreviewsean mooney proposed openstack/nova master: [WIP] add fun tests for VDPA operations that should work.  https://review.opendev.org/c/openstack/nova/+/832330 20:46
*** dasm is now known as dasm|off23:36
