Tuesday, 2022-02-22

opendevreviewMinghong Hou proposed openstack/nova master: db: Retrieve VirtualInterface objects by UUID, not address  https://review.opendev.org/c/openstack/nova/+/82881901:36
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Register defaults for undefined hw image properties  https://review.opendev.org/c/openstack/nova/+/80070801:46
opendevreviewMinghong Hou proposed openstack/nova master: db: Retrieve VirtualInterface objects by UUID, not address  https://review.opendev.org/c/openstack/nova/+/82881906:32
*** amoralej|off is now known as amoralej07:34
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160607:39
opendevreviewFelix Huettner proposed openstack/nova stable/stein: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82985908:06
opendevreviewFelix Huettner proposed openstack/nova stable/rocky: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82986008:07
opendevreviewFelix Huettner proposed openstack/nova stable/queens: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82986108:07
opendevreviewElod Illes proposed openstack/nova stable/queens: Fix sphinx doc job  https://review.opendev.org/c/openstack/nova/+/83032709:09
opendevreviewFabian Wiesel proposed openstack/nova master: Transport context to all threads  https://review.opendev.org/c/openstack/nova/+/82746711:05
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix eventlet.tpool import  https://review.opendev.org/c/openstack/nova/+/83038311:42
chateaulavgibi: with a multi patch series, like i have. when i go to update, do i rebase back to the driver patch and then make the appropriate changes, or do I have to submit the whole series for review each time even though the objects patch has the first +2? 11:50
chateaulavi hope that makes sense, just want to make sure i fully understand and am not creating to much work for you guys11:51
sean-k-mooneychateaulav: the object patch is the base patch so you dont have to rebase that to work on the following patch12:00
chateaulavok12:00
sean-k-mooneyyou have 2 ways to do it you can write a patch at the end to adress the issues and then do an interactive rebase to squash merge it into the patch its fixign12:00
sean-k-mooneyour you can start with a interactive rebase and mark the commits you want to alter for editing12:01
sean-k-mooneyin both cases you tell git to keep the base object patch exactly as is12:01
chateaulavok, good, thats what im doing then. just wanted to make sure12:01
sean-k-mooneythen when you finish the rebase it will have only modified the following commits and the git review at the end will not update the first review12:02
sean-k-mooneyyou can also pass -R to git review to ensure it does not do any automatic rebases for you12:02
sean-k-mooneygibi: did you see ralonsoh comments on https://review.opendev.org/c/openstack/neutron/+/82924712:05
sean-k-mooneygibi: while form a nova perspeictive it looks like the port is fully updated apprently its not fully updated in the db12:06
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160612:12
gibisean-k-mooney: now read ralonsoh's comment 12:16
gibisean-k-mooney: I'm not a neutron expert so it can very well be that the change is not persisted12:17
sean-k-mooneythe fact that what ralonsoh commented is possibel today speaks to a larger problem in neutron IMO but i guess for now you just need to ensure the db version is also updated12:17
ralonsohsean-k-mooney, what other problem?12:17
sean-k-mooneygibi: i think its possible perseited but only in memory12:17
sean-k-mooneyralonsoh: teh fact that you can partly update a port but later api calls will show it as updated12:18
sean-k-mooneyeven though the db is not12:18
sean-k-mooneyit suggest there is a problem with caching or how neutron is retriving data12:18
gibiralonsoh: if you happen to have a pointer where shoudl the mac_address be updated in neutron for this to work then please let me know that will help me updating the patch properly12:19
ralonsohsean-k-mooney, no no, this is not the case12:19
ralonsohgibi, yes, let me work on this today (or tomorrow morning)12:19
ralonsohI'll focus on this12:19
sean-k-mooneyralonsoh: the unit test are doing a port show to get the port after it was updated, and you said you could see the chage in port show/list12:20
gibiralonsoh: thanks12:20
sean-k-mooneyeventhough you also said the db record still had the old mac12:20
sean-k-mooneyso there is clearly a disconenct in my and your understanding12:20
ralonsohsean-k-mooney, yes, this is because you populate the port dictionary (the one that is going to be in the JSON) with the port register info12:20
ralonsohand other related db registers12:20
ralonsohsean-k-mooney, the port dictionary comes from how neutron was implemented 10 years ago12:21
ralonsohwithout OVOs12:21
sean-k-mooneyright so you are not just readign the port info form the db12:21
ralonsohthe OVO is a DB view, linking several DB registers12:21
ralonsohthis method (and most of the _dict_* ones)12:22
ralonsohare only methods to populate in one dict several DB registers data12:22
sean-k-mooneybut its clearly possibel for the mac to have 2 different values in two places so db consitecy is not enforced12:23
gibiOK I think I see. Every time a port is returned it will hit the code I added so that code will apply the MAC from the binding _on the port dict being returned_12:23
gibibut I never apply the MAC from the binding on a port that is being saved to the db12:23
sean-k-mooneygibi: ya you coudl validate that by claring the mac form the binding profile12:23
ralonsohsean-k-mooney, no, it is not. But this is why we should review what we populate in the resource dictionaries12:23
sean-k-mooneyand asserting it does not change12:24
sean-k-mooneyralonsoh: well our actully remove them and just use the ovos12:24
sean-k-mooneywe still ahve some dicts in nova too like the bdms12:25
sean-k-mooneybut we try to not use dicst where we can avoid it12:25
ralonsohsean-k-mooney, those _dict_* methods are also used by the extensions12:25
ralonsohto populate new data in the resource dictionaries12:25
ralonsohin any case, changing this now is complex and an unnecessary refactor12:26
sean-k-mooneywe can disagree on unnesesary but it would prevent the possibleity fo havign the db and runtime view getting out of sync in this way12:27
sean-k-mooneywell or at least reduce it12:27
sean-k-mooneyso right here at the end https://review.opendev.org/c/openstack/neutron/+/829247/2/neutron/tests/unit/plugins/ml2/test_port_binding.py#77612:28
sean-k-mooneyif we just add a call to clear the mac in the binding profile12:28
sean-k-mooneyand then assert the mac has the new value it should fail12:28
sean-k-mooneysince it will nolonger triger gibis change and we will get the mac from the db right12:29
gibihm that acutally has an interesting side effect12:29
gibiif a port was bound to a PF then unbound from it12:29
gibithen that port keeps the MAC of the PF12:29
ralonsohright, and you can also retrieve the port created and stored in the DB12:29
gibiso if another port is tried to be bound to the same PF12:29
gibithat would cause MAC conflict in neutron 12:30
sean-k-mooneygibi: because the reset of the mac only resets the mac in the db12:30
sean-k-mooneyya unless the bidnign profile is cleared it woudl be a problem12:30
gibiI'm not sure how the reset works today, but if we follow ralonsoh's suggestion and persist the MAC in the DB then when the binding is removed we need to clear the DB too12:31
sean-k-mooneyisnt there already code for that12:31
sean-k-mooneyfor the db part12:31
sean-k-mooneythat code wont clear the mac in the profile however12:31
gibithe profile is cleard by nova12:31
gibithat is OPK12:31
gibiOK12:31
gibias I said I'm not sure about the reset logic12:32
sean-k-mooneyok then the old code for reseting the mac shoudl fix the db version12:32
sean-k-mooneyhttps://github.com/openstack/neutron/commit/e603d19939c98b94081bb6a3af8bcb943f7bd2ce12:33
gibiahha OK12:34
gibithen that is OK12:34
sean-k-mooneyya the detach case is covered12:34
ralonsohsean-k-mooney, https://github.com/openstack/neutron/commit/e603d19939c98b94081bb6a3af8bcb943f7bd2ce12:40
ralonsohthis is when the port MAC can be changed12:40
ralonsohof course, in this call you can bind the port and change the MAC12:40
ralonsohthe only consideration is the other port with the same MAC12:40
ralonsohbecause you first bind the new port and then unbind the older one, right?12:41
sean-k-mooneyno12:43
sean-k-mooneythere is only 1 port12:43
sean-k-mooneywe either have multiple prot bidnings12:43
ralonsohright, right12:43
sean-k-mooneyor we do a update of the single port binding12:43
sean-k-mooneyhttps://review.opendev.org/c/openstack/neutron/+/829247/2/neutron/tests/unit/plugins/ml2/test_port_binding.py#775 does that make sense12:43
ralonsohmy bad, there is only one DB port, two L1 ports12:43
sean-k-mooneyyes12:43
ralonsohno because of the implementation of "_update_port_dict_binding"12:45
sean-k-mooneyi was expecting that call to be wrong12:45
ralonsohin L672 we are calling the API12:45
ralonsohthis API will retrieve the DB port and create this dictionary, this is what is returned12:45
ralonsohwhy?12:46
sean-k-mooneyright but what i was trying to capture was the logic of unset bidnding:profile['port_mac']12:46
sean-k-mooneyto ensure that mac is not coming form the profile but the db12:47
ralonsohahhhh sorry12:47
ralonsohyes, this is after deleting the port binding12:47
sean-k-mooneyafter deleteing the source one yes12:47
ralonsohso we are reading the port randomly generated mac12:47
ralonsoh        port = self._create_unbound_port()12:48
ralonsoh        # neutron generates a MAC for each port created12:48
ralonsoh        generated_mac = port[port_def.PORT_MAC_ADDRESS]12:48
sean-k-mooneywith the current code ya but we expect it to keep the host1 pf mac12:48
sean-k-mooneyit wont now because of the bug12:48
sean-k-mooneyso i woudl like the test to be writen so that it caught the bug so it cant be regressed going forward12:49
ralonsohsean-k-mooney, when the port is first assigned, the mac is changed, right?12:50
ralonsohbefore the live migration12:50
ralonsohthe Neutron DB port register MAC is changed12:50
sean-k-mooneyyes12:50
ralonsohok12:50
sean-k-mooneynova does a port update before we bind the port12:50
sean-k-mooneythe issue is we cant do a port update while it is bound12:51
ralonsohso the case here is to add this exception: to allow to change the MAC of the port even if it is bound12:51
ralonsohbut12:51
ralonsohyou also need to keep the original MAC12:51
ralonsohright?12:51
gibino12:51
gibiwe need to be able to define MAC per binding12:51
gibifor mutliple binding12:51
gibiat least that is how I model this12:52
ralonsohso this is a different feature I think12:52
gibione MAC per binding12:52
sean-k-mooneywe dont generally care about the orginal mac since we can always just read that form the PF if we need it12:52
ralonsohyes, like multiple port binding12:52
ralonsohok, do you mind if I spend some time this afternoon or tomorrow thinking about this?12:52
ralonsohI need to finish some patches now12:53
gibiralonsoh: sure 12:53
ralonsohthanks!!12:53
ralonsohI'll ping you12:53
gibithanks!12:53
gibiI anyhow need to work on the nova side of this too12:53
gibisean-k-mooney: I like the idea to define the MAC when nova creates the binding instead of defining it when nova activates an inactive binding12:54
gibiso I'd like to store the MAC in the binding profile12:54
gibiit feels clearer12:54
gibias we define the rest of the infor in the profile at the same time like slot12:54
sean-k-mooneyack12:55
sean-k-mooneybut in that case we need to permently keep the mac in the profile yes12:55
sean-k-mooneyso we shoudl be doing this for first spawn and sriov attach too12:55
sean-k-mooneyfor consitency12:55
gibiyes we do it12:55
gibiany time we bind a SRIOV PF12:56
gibior create an inactive binding12:56
sean-k-mooneyok and the key we use in teh profile for the PF mac is also different form teh key we use for the PF mac in the case of remote managed prots right12:56
sean-k-mooneyjust making sure we dont have a collision there12:57
gibiit is differen12:57
gibit12:57
gibithe remote managed codepath stores pf_mac_address12:57
gibiin the VF profiles12:57
gibimy bugfix stores mac_address in the PF profile12:57
sean-k-mooneyack12:57
*** amoralej is now known as amoralej|lunch13:08
opendevreviewDmitrii Shcherbakov proposed openstack/nova master: Fix migration with remote-managed ports & add FT  https://review.opendev.org/c/openstack/nova/+/82997413:21
dmitriissean-k-mooney, gibi: ^ found a missing piece related to migration when adding more func tests. Since binding:profile is updated differently during live migration there is a slight change needed to also update new fields.13:24
dmitriisI added a test case for live migration, looking into adding resize/evacuate/cold_migrate now as well.13:25
sean-k-mooneydmitriis: you will need to update the inactive port bindign with the dest host info13:29
sean-k-mooneydmitriis: i assume you are goign to supprot live migration by hot unplugging and hot plugging the VF as we do for vnic_type=direct yes13:30
sean-k-mooneyin the future we might eb able to read if the vf supprot live migration from sysfs and skip that if it does13:30
dmitriissean-k-mooney: the fix is quite simple https://review.opendev.org/c/openstack/nova/+/829974/2/nova/compute/manager.py#10805 - as you say, it's updating the inactive port binding13:31
sean-k-mooneyya although that is adding more coupling to sysfs13:32
sean-k-mooneygibi how do you feel about ^13:32
dmitriisyes, I was thinking of storing more in extra_info but this kind of information is more volatile (pf mac, vf num)13:33
dmitriiscompared to the serial number that we currently store in extra_info13:33
sean-k-mooneydmitriis: gibi was going to store it in extra info13:33
sean-k-mooneywell the pf mac is basically hardcoded in the flash13:34
gibiI'm on the side of moving this info the PciDevice object populated by the virt driver 13:34
sean-k-mooneyya13:34
sean-k-mooneywe have that object here so we can read if form there13:34
chateaulavto set image metadata during a ci tempest, i would most likely have to use a script as there are no variables, correct?13:34
sean-k-mooneyso really we shoudl put this info in the extra_info dict with properties to access it form teh object13:34
sean-k-mooneychateaulav: am it depends if you are writing new test then you woudl do it in the test13:35
dmitriissean-k-mooney: well, the PF itself may be a vport of a NIC switch. I've seen HW with variable number of PFs configurable (per a given flash configuration though)13:35
sean-k-mooneybut if you are trying to reuse exsitng tempest then you woudl have a pre playbook that generates a local.sh that runs at the end of devstack before tempest is run to update the default image with the properties13:35
dmitriisit's *probably* safe to assume that the PF mac won't change across reboots is what I am going for. But we may encounter odd cases too.13:36
sean-k-mooneychateaulav: you can also config tempst to use a different image and have devstack download addtional images via the local.conf13:36
chateaulavsean-k-mooney: k, thanks13:36
sean-k-mooneydmitriis: changing the number of PF would change ther PCI adress which would not be supported if you had existing vms13:37
sean-k-mooneywe can also update the mac in the db on agent start13:37
sean-k-mooneyalthoguh that wont update the pf mac in the neutron port so maybe we shoudl detect that and cause an error13:38
sean-k-mooneyor at least a warning13:38
dmitriisVF numbers have an offset but they should remain stable unless the number of PFs changes13:39
sean-k-mooneyin the unlikely case a nic dies we likely want to be able to replace it with another nic13:39
sean-k-mooneydmitriis: right which we dont want to supprot13:39
dmitriissean-k-mooney: ok, then I agree in principle that we can store vf_num and pf_mac in extra_info13:39
dmitriisnot hard to retrieve it from there either13:40
sean-k-mooneywe are already storing them in the neutron db in the port bidnign13:40
sean-k-mooneyso we already have the requirement that they be stable13:40
dmitriisright13:40
sean-k-mooneyif we detach a change in the mac we can warn on that in agent start if we want to and tell the operator what the old mac was and the new one so that they know they have to manually update the ports. we could also automaticaly do that i guess but i woudl proably defer that to a followup patch13:41
dmitriisack13:43
dmitriisSo I think that's the change gibi is working on https://review.opendev.org/c/openstack/nova/+/82924813:46
gibidmitriis: yes, but note the self -1 :) I realized that I need to change direction13:48
gibito use the PciDevice object 13:48
dmitriisgibi: ack. I'll focus on adding the remaining func test cases in the short term but I'll be around to help with moving things to PciDevice object13:50
gibidmitriis: thanks13:51
opendevreviewFelix Huettner proposed openstack/nova stable/queens: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82986113:57
opendevreviewAlexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412  https://review.opendev.org/c/openstack/nova/+/83001014:06
*** amoralej|lunch is now known as amoralej14:13
opendevreviewyuval proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160614:33
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205314:46
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837214:46
sean-k-mooneygibi: can you respond ot https://review.opendev.org/c/openstack/placement/+/826492/8/placement/lib.py#219 i think that is the only thing im not sure about in your placement series14:53
sean-k-mooneyi have +w most of the patches before that so they should all be sitting in the ci queue and should merge up to that point later today14:54
gibisean-k-mooney: looking...15:03
sean-k-mooneyits pretty minor15:04
sean-k-mooneyjust i dont think we are checkign for conflicts properly15:04
gibiI have to load context... :)15:05
sean-k-mooneylike i think we bascially should be flattening the list of required traits and asserting that there is on intersection with forbidden15:05
sean-k-mooneyim not sure using all does that15:05
gibiso the goal is to find a conflict in the request like required=T1,!T1 15:11
gibiin the complex case15:11
gibirequired=in:T1,T2&required=!T215:11
gibi!T2 is not a conflict15:11
opendevmeetgibi: Error: "T2" is not a valid command.15:11
gibibut15:12
sean-k-mooneywell it shoudl still be rejected15:12
gibiit is satisfyable15:12
sean-k-mooneyonly if we have t115:13
gibiyes15:13
sean-k-mooneyright im not sure we shoudl allow that15:13
gibiI think we cannot expect that the client will normalize the query15:13
sean-k-mooneyi dont think any frobiden traits should over lap with any required even if we in15:13
gibimaybe these required triat pieces are coming form different direction in nova15:13
sean-k-mooneysince we cant make forbidne traits optional15:13
gibione prefilter says I need either T1 or T2 another prefilter says I don't like T215:14
gibiI don't want nova to do the normalization in this case15:14
sean-k-mooneyi kind of feel like nova shoudl be preprocessing that rahter then having ot have placment fiture that out15:14
sean-k-mooneybut i see why you are trying to be more lax15:15
gibiI think it is a lot easier to accept it than forcing the client to normalize15:15
sean-k-mooneyrequired=in:T1,T2&required=!T2 is logically jsut required=T115:16
gibiyes15:16
sean-k-mooneyif we do accept that we shoudl really preprocess that before we generate the sql queries15:16
gibiwhy?15:16
gibiI think sql can handle this15:16
gibiI even think sql should have the query planner to optimize this out15:17
sean-k-mooneyi mean it could but i think that will not scale well15:17
bauzassean-k-mooney: thanks for accepting the existing SQL query from gibi15:17
gibi* the sql server15:17
sean-k-mooneygibi: if that was your orginal intent i guess what you have will do that15:17
bauzassean-k-mooney: as I knew since 10 years, in general the subqueries use the same execution plans than in the inner joins15:17
bauzasbut we could verify it with the EXPLAIN query15:18
sean-k-mooneybauzas: my expirnce is inner joins like that tended to result in more memory usage then subquires15:18
sean-k-mooneyim not sure that sqligte will optimise to the same degree as mysql in this regard15:18
gibiI don't believe either 15:19
gibibut nobody runs sqlite in production (hopefully :)15:19
bauzassean-k-mooney: well, in general it depends on the DMBS15:19
sean-k-mooneyya i was more worreid about our func tests but15:19
bauzassean-k-mooney: but with mysql 5.0 I saw that it was the same15:20
sean-k-mooneywe likely wont have db quiries that will stress it enough there to matter15:20
opendevreviewElod Illes proposed openstack/nova master: Lightbits LightOS driver  https://review.opendev.org/c/openstack/nova/+/82160615:20
sean-k-mooneybauzas: lets revisit this in the future when we have data one way or another15:21
bauzas++15:21
bauzasagreed15:21
gibiI promise I get back to this and gather some performance data from a mysql env15:21
sean-k-mooneybauzas: if you woudl not mind reviwing 826492 that is the only patch that does not have +w15:22
sean-k-mooneybauzas: gibi and i were just discussing how we detct conflicts15:22
bauzassean-k-mooney: I could do this after the meeting15:22
bauzasbtw.15:22
bauzasnova meeting in 38 mins15:23
bauzashere15:23
sean-k-mooneyack for now i think i can accept gibis explation of what they were trying to enable15:23
gibicool15:23
sean-k-mooneywe can let the db handel the extra complexity for now and we can simplyfy the query either in nova or placment in the futrue if needed15:23
kashyapgibi: When you get a minute, do you, or anyone know what's going in this test here? - https://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_list_image_filters.py#L108 15:23
gibithank you15:23
sean-k-mooneygibi: im just worried that if we say "the db query optimiser will take care of it" too much we could get some nasty surprise in large deplopyments15:24
kashyap(I'm trying to debug an upstream TripleO bug from hell that's "critical", and it's hitting the failure at line-128)15:24
gibisean-k-mooney: sure, I will do some performance testing to see if it matters15:24
gibikashyap: looking...15:26
kashyap(That's the Tempest bug here: https://bugs.launchpad.net/tripleo/+bug/1959014/)15:27
gibikashyap: boot servers, then snapshot the server and upload the image to glance15:27
kashyapIs it _really_ a live snapshot?  Where "it" == create_image_from_server()15:27
kashyapIs it a live snapshot?  /me looks15:27
kashyapcreate_image_from_server() --> create_image()15:28
gibiit depends 15:30
kashyapHm, I'm combing through the libvirt <-> QEMU logs given that it seems to fail at "live snapshot", but need more time to dig15:31
sean-k-mooney kashyap  its not always a live snapshot there is a config option that you can enabel.15:31
gibiI will try to get to the logic nova does...15:31
sean-k-mooneywe also fall back to cold snapshots in some cases15:31
kashyapsean-k-mooney: Yeah, I realize that15:31
sean-k-mooneynova's api does not gurentee that its live15:31
sean-k-mooneyhowever if we stop it we will start the vm after the snap shot15:32
sean-k-mooneyso the pre and post state shoudl be the same15:32
gibikashyap: https://github.com/openstack/nova/blob/28cbcbb3737d544b7f73bd5aec55ccbd40222c95/nova/virt/libvirt/driver.py#L2893-L292215:35
* kashyap clicks15:35
gibinova logs if it is live of cold snapshot15:35
kashyapgibi: Thank you; so this path live snapshot here.  (The instance isn't in SHUTDOWN state)15:36
* kashyap looks the Nova log15:36
kashyaps/looks/looks at/15:37
kashyapThat's the Nova log - https://logserver.rdoproject.org/83/38983/2/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/67d64d4/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.1.gz15:38
kashyap(There's another nova-compute.log.txt.gz in the directory; maybe log rotation)15:38
kashyapgibi: Yep, it logs it:15:39
kashyap2022-02-08 17:25:54.917 7 INFO nova.virt.libvirt.driver [req-5a68f755-b20c-41a7-a005-3bcc8b800b28 9afce47c724e4db7958ea7fc1eca1c27 8f305ae0fc324a41971784ef2430d8e3 - default default] [instance: 4262e009-6e83-4d8f-bd24-8a478cecda4a] Beginning live snapshot process15:39
kashyapThank you!15:39
gibikashyap: based on artom's comment in the bug there is also a log about direct snapshot it should be something like "Performing standard snapshot because direct..."15:42
gibidirect snasphot only works with RDB I think15:42
gibiRBD15:42
kashyapAs if people know what is a "standard snapshot" :)15:43
kashyapI don't see such a message in the above nova-compute log15:43
kashyapgibi: So, here it is directly hitting live snapshot process15:44
kashyapI'm talking to the libvirt dev to see if we can reverse engineer the failure ... from the QEMU traces15:44
kashyap(26MB: https://kashyapc.fedorapeople.org/lp_1959014/QMP_exchange.txt)15:44
gibithen maybe artom looked at a different run her https://bugs.launchpad.net/tripleo/+bug/1959014/comments/215:44
kashyapYeah, likely.  This is a different run where the (QEMU) 'blockdev-del' failure is occuring15:45
sean-k-mooneywell standard snapshot in this context means a local snapshot of a file done by qemu15:50
kashyapI've asked Peter Krempa from libvirt about the log traces; we'll see if we get something15:50
sean-k-mooneyvs direct which means we are delegating the snapshot to the ceph cluster15:50
kashyapsean-k-mooney: Yeah, but "standard" and "direct" are too abstract terms.  We should be explicit on what it means.15:51
kashyap(But that's a patch for a different day.)15:51
sean-k-mooneywell perhaps but thse are really intended for us as nova develpers to read15:52
kashyapYeah, even for developers who work on different subsystems it will be far from obvious.15:52
sean-k-mooneyso there si some expecation of knowing what the code does15:53
gibikashyap: looking at the req-5a68f755-b20c-41a7-a005-3bcc8b800b28 you quoted above, I don't see anything in the logs about the failure, nova logged that the snapshot is done and the image is uploaded https://review.rdoproject.org/zuul/build/67d64d4a4d584eb8be9db0fbc6e18c9d/log/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log.1.gz#570515:53
kashyapgibi: Hmmm, odd; so the guy says the above "reproduces" the problem, not sure what is going on here15:54
kashyapBut in the libvirtd debug log, I _do_ see15:54
kashyap:15:54
kashyap2022-02-08 17:32:24.369+0000: 31580: error : qemuMonitorJSONCheckErrorFull:406 : internal error: unable to execute QEMU command 'blockdev-del': Failed to find node with node-name='libvirt-4-15:54
kashyapstorage'15:55
gibihm based on the timestamp that is 7 mintutes later15:55
kashyap("blockdev-del" deletes a block device that's added by "blockdev-add")15:55
kashyapgibi: Hmm, I see.  Then I'm scratching my head here about what's going on from a failure trigger15:56
sean-k-mooneythat is likely part of the test cleanup15:57
kashyapsean-k-mooney: No, from the traceback in the bug description, it is happening at resource_setup() here:15:58
sean-k-mooneyif its 7 minute after the snapshot15:58
kashyaphttps://github.com/openstack/tempest/blob/master/tempest/api/compute/images/test_list_image_filters.py#L11015:58
opendevreviewAlexey Stupnikov proposed openstack/nova master: Run clean up calls when queued live migration is aborted  https://review.opendev.org/c/openstack/nova/+/82857015:58
sean-k-mooney hum why woudl dat do a blockdev-del15:59
sean-k-mooneyunless the vm coudl not boot15:59
sean-k-mooneyand it was cleaning up15:59
kashyapYeah, it sounds like a clean-up act somewhere16:00
gibibauzas: ping :)16:01
opendevreviewElod Illes proposed openstack/nova stable/queens: [stable-only] Fix sphinx doc job  https://review.opendev.org/c/openstack/nova/+/83032716:01
bauzasoh shit16:02
bauzas#startmeeting nova16:02
opendevmeetMeeting started Tue Feb 22 16:02:13 2022 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:02
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:02
opendevmeetThe meeting name has been set to 'nova'16:02
bauzashey folks, sorry the delay16:02
gmanno/16:02
* bauzas looks at his coffee16:02
chateaulav\o16:02
bauzasthat's its fault16:02
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:03
elodilleso/16:03
bauzaslet's a quick meeting since we're close to some deadline :)16:03
bauzaslet's do*16:03
bauzas#topic Bugs (stuck/critical) 16:03
bauzas#info No Critical bug16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 27 new untriaged bugs (-15 since the last meeting)16:03
bauzasthanks gibi and others for having looked at the bugs16:03
bauzas#help Nova bug triage help is appreciated https://wiki.openstack.org/wiki/Nova/BugTriage16:03
bauzaslink https://storyboard.openstack.org/#!/project/openstack/placement 27 open stories (+0 since the last meeting) in Storyboard for Placement 16:03
bauzas#link https://storyboard.openstack.org/#!/project/openstack/placement 27 open stories (+0 since the last meeting) in Storyboard for Placement 16:04
bauzasvoila16:04
bauzasany bugs to discuss ?16:04
bauzaslooks not16:04
bauzasagain, thanks for the folks who looked at the bugs16:04
bauzas#topic Gate status 16:05
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:05
bauzas#link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly Placement periodic job status 16:05
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:05
bauzasso we had a few new gate failures16:05
bauzasI saw gibi looking at one16:05
bauzasany of them needing to be discussed ?16:06
gibiI just did some bug triage and duplicated thing together16:06
gibiI did not see a new failure this week16:06
gibi* new type of failure16:06
bauzaskk16:07
bauzasok, then moving to the next topic16:07
bauzas#topic Release Planning 16:07
bauzas#link https://releases.openstack.org/yoga/schedule.html#y-ff FeatureFreeze in 2 days16:07
bauzas#link https://etherpad.opendev.org/p/nova-yoga-blueprint-status Etherpad for blueprints tracking 16:07
bauzasso the FeatureFreeze will be on Thursday EOB16:07
bauzastbh, that means that nova-cores won't merge any feature change on Friday16:08
bauzasunless we got a gate issue16:08
bauzasdo people have any concerns ?16:09
gibibauzas: you mean no +A on Friday16:09
bauzasgibi: sorry yes16:09
gibiexcept if rebase is needed16:09
bauzascorrect16:09
gibiack16:09
gibiworks for me16:09
gmannack16:10
bauzasdo people want to discuss about any blueprint here ?16:10
gmannneed more reviews on RBAC series https://review.opendev.org/q/topic:bp%252Fpolicy-defaults-refresh-216:10
liorf95yes please16:10
gmannthanks gibi for reviews 16:10
bauzasyes, I'll do16:10
gmannthanks16:10
liorf95https://blueprints.launchpad.net/nova/+spec/nvmeof-multipath16:10
bauzasliorf95: we'll discuss about your bp in the open discussion topic 16:10
sean-k-mooneyliorf95: i think yours is in opendicuss at the end but we coudl bring it up now too16:10
liorf95as you want 16:11
bauzasliorf95: that's in the agenda :)16:11
bauzasanyway16:11
bauzassecure RBAC, yes I'll review16:11
bauzaswe merged a few features before16:12
bauzasI'll modify Launchpad on Friday16:12
chateaulavjust reviews for emulation to ensure nothing bltent sticks out16:12
bauzasok16:13
bauzasok, moving on then16:14
bauzas #topic Review priorities 16:14
bauzas#topic Review priorities 16:14
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+label:Review-Priority%252B116:14
bauzasmost of the priorities were merged16:14
bauzasnothing to tell here 16:15
bauzaslet's continue to review16:15
bauzas#topic Stable Branches 16:15
bauzaselodilles: your time16:15
elodillesinfo stable/queens gate fix proposed: https://review.opendev.org/830327 (needs to backport to pike as well)16:15
elodilles#info stable/xena gate should be better as Artom's workaround was merged (test_tagged_attachment)16:15
elodilles#info stable/wallaby still needs the libvirt_disable_apic to merge + it seems the test_tagged_attachment issue is also present16:15
elodillesoops16:15
elodillessorry16:15
elodillesanyway, this is mostly it ^^^16:16
elodilleswe might need the workarounds to stable/victoria too16:16
elodilles#info stable/queens gate fix proposed: https://review.opendev.org/830327 (needs to backport to pike as well)16:17
elodilles(just to have it in the logs)16:17
bauzas:)16:18
bauzasthanks elodilles16:19
elodillesnp16:19
bauzas#topic Open discussion 16:19
bauzasliorf95:  (lior Friedman) https://blueprints.launchpad.net/nova/+spec/nvmeof-multipath16:20
bauzasliorf95: your turn16:20
liorf95yes please16:20
bauzasor I can explain16:20
bauzasmaybe some of them don't understand the contect16:20
bauzascontext16:20
bauzasso,16:20
liorf95we need this bp to be approved for https://review.opendev.org/c/openstack/nova/+/82394116:20
bauzasbasically we had https://review.opendev.org/c/openstack/nova/+/823941 that was reviewed16:21
bauzasand stephenfin and sean-k-mooney accepted the change16:21
bauzasbut,16:21
bauzasas you can look at the comments, we saw it was actually a feature 16:22
bauzasthat's why I accepted to have a discussion in this meeting to see whether folks accept to merge this one even if we had not accepted the blueprint before16:22
bauzasfor the moment, there is a procedural -2 from me16:23
bauzasbut if folks agree here to have an exception for https://blueprints.launchpad.net/nova/+spec/nvmeof-multipath as a specless bp, then I could remove it16:23
bauzasnow, the question : is https://blueprints.launchpad.net/nova/+spec/nvmeof-multipath a specless BP and are folks accepting it as a exception ?16:24
gibiI've nothing against it. It is just a volume driver so no spec needed. And the change is really small 16:24
bauzascorrect16:25
bauzaschange is small and only touching a very few places16:26
sean-k-mooneyi am supprotive fo doign it as a specless blueprint16:26
bauzasok, anyone having concern ?16:26
sean-k-mooneyits after teh cutoff but the scale of it is small enought that i think an excption can be made16:26
bauzasyeah16:26
opendevreviewMerged openstack/placement master: Extend the RP db query to support any-traits  https://review.opendev.org/c/openstack/placement/+/82584816:27
opendevreviewMerged openstack/placement master: DB layer should only depend on trait id not names  https://review.opendev.org/c/openstack/placement/+/82649016:27
bauzasbasically we have deadlines for accepted blueprints because they change lots of things16:27
opendevreviewMerged openstack/placement master: Enhance doc of _get_trees_with_traits  https://review.opendev.org/c/openstack/placement/+/82578016:27
opendevreviewMerged openstack/placement master: Extend the RP tree DB query to support any-traits  https://review.opendev.org/c/openstack/placement/+/82584916:27
opendevreviewMerged openstack/placement master: Add any-traits support for listing resource providers  https://review.opendev.org/c/openstack/placement/+/82649116:27
bauzasso we're procedural not for the sake of being it, but for making sure we are sure how this works16:27
bauzashere, I don't see a problem16:27
gibiI agree16:28
bauzasok so,16:29
bauzas#agreed specless BP for https://blueprints.launchpad.net/nova/+spec/nvmeof-multipath as an exception16:30
bauzasliorf95: I'll then remove my -216:30
bauzasand +W it16:30
bauzasthat's it I had16:30
liorf95Thanks a lot for your efforts here16:31
bauzasany other topic to discuss as a last minute ?16:33
gibi-16:33
bauzaslooks not16:34
bauzasthanks all16:34
bauzas#endmeeting16:34
opendevmeetMeeting ended Tue Feb 22 16:34:22 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:34
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2022/nova.2022-02-22-16.02.html16:34
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2022/nova.2022-02-22-16.02.txt16:34
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2022/nova.2022-02-22-16.02.log.html16:34
gibithanks16:34
elodillesthanks bauzas o/16:34
liorf95Thanks16:34
gibisean-k-mooney: just a reminder you only left +1 on https://review.opendev.org/c/openstack/placement/+/82649 yesterday :)16:34
chateaulavthanks o/16:35
gmannthanks 16:35
gibikashyap: https://logserver.rdoproject.org/83/38983/2/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-victoria/67d64d4/logs/undercloud/var/log/tempest/stestr_results.html.gz the tempest logs does not contain either the server uuid or the request id for the snapshot command :/16:38
gibiI cannot correlate the logs :/16:39
kashyapgibi: Good news!16:39
gibioh16:39
kashyapgibi: I just talked to Peter Krempa on #virt, he already identified it as a libvirt bug ...16:39
kashyapgibi: The problem is this, in short:16:39
kashyap[quote]16:39
kashyaplibvirt has a piece of code which ensures that thh backing image of the reused destination image is added only when finishing the job.  On cancellation of the [copy] job, we want to unplug the image, but the backing image was not yet plugged in.16:39
kashyap[/quote]16:40
kashyapIt assumes a bit of libvirt storage knowledge, but Peter is already working on a fix16:40
kashyapgibi: I'm filing RHEL bugs to get it backported to CentOS 8 and 916:41
gibiawesome16:41
gibithanks for the info16:41
kashyapgibi: But Peter is again asking what is the test "exactly doing"16:44
kashyapIs this exactly it?16:44
kashyap"boot servers, then snapshot the server and upload the image to Glance"16:45
kashyapDid you mean boot a "server", as opposed to plural?16:45
gibikashyap: create a server, and then create a snapshot16:47
gibikashyap: the test case itself create multiple servers16:47
kashyapI see16:48
kashyapgibi: https://bugzilla.redhat.com/show_bug.cgi?id=205706716:57
kashyapgibi: But Peter also says there might be another issue too, because:16:57
kashyap... "this is on a code path that can't propagate errors to the caller"16:57
kashyapgibi: Anyway, he posted a patch here: https://listman.redhat.com/archives/libvir-list/2022-February/msg00790.html16:57
gibikashyap: ack. I would need a tripleoo job result where I can correlate logs better to be able to look more into it17:19
gibikashyap: I serched for the QEMU error in the upstream nova job results and there is a lot of hit17:20
gibibut I guess it is a generic error message17:20
gibias the QEMU error does not correlate with the tempest test failure 17:20
gibiI stop for today but happy to help tomorrow if there are things to look at17:21
*** amoralej is now known as amoralej|off17:45
opendevreviewRajat Dhasmana proposed openstack/nova master: Add support for volume backed server rebuild  https://review.opendev.org/c/openstack/nova/+/82036818:53
opendevreviewMerged openstack/nova master: Support use_multipath for NVME driver  https://review.opendev.org/c/openstack/nova/+/82394118:53
opendevreviewDmitrii Shcherbakov proposed openstack/nova master: Fix migration with remote-managed ports & add FT  https://review.opendev.org/c/openstack/nova/+/82997419:18
opendevreviewRajat Dhasmana proposed openstack/python-novaclient master: Add parameter to rebuild boot volume  https://review.opendev.org/c/openstack/python-novaclient/+/82716319:33
whoami-rajatdansmith, Hey, if you get some time, can you take a look at my feature patches?19:37
whoami-rajatnova: https://review.opendev.org/c/openstack/nova/+/82036819:37
whoami-rajatnovaclient: https://review.opendev.org/c/openstack/python-novaclient/+/82716319:37
dansmithwhoami-rajat: ah I didn't see that you had updated them.. is it working now?19:37
dansmithwhoami-rajat: ah I didn't see that you had updated them.. is it working now?19:39
dansmithwhoami-rajat: I assume you've got a tempest patch somewhere (or planned) to poke it19:39
dansmith?19:39
opendevreviewMerged openstack/placement master: Add any-traits support for allocation candidates  https://review.opendev.org/c/openstack/placement/+/82649219:40
whoami-rajatdansmith, just updated everything and yes it is working now. thanks!19:40
*** tbarron is now known as Guest25219:40
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205319:41
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837219:41
whoami-rajatdansmith, I've mostly tested this manually but i can work on a tempest test, the problem is deadline is close19:41
whoami-rajatdansmith, If i can get an FFE or propose the test after feature freeze then it will be great19:41
dansmithI mean, I think a tempest test should be part of the requirements here.. was that not in the spec for this?19:42
dansmithI would think it'd be easy right? take an existing rebuild test, make it use BFV, and pass the right version on rebuild?19:42
dansmithif the tempest test shows it works and just needs some work after the deadline to get it into a merge-able shape, I think that's fine19:43
whoami-rajatsure, I haven't looked at existing tests but if there's already one for rebuild, it should be pretty straightforward19:44
dansmithyeah, tempest test is in the testing section of the spec... 19:44
whoami-rajatdansmith, I will try and get something working by tomorrow19:48
dansmithack19:48
whoami-rajatthanks!19:48
opendevreviewMerged openstack/placement master: Remove unused compatibility code  https://review.opendev.org/c/openstack/placement/+/82649320:20
opendevreviewMerged openstack/placement master: Add microversion 1.39 to support any-trait queries  https://review.opendev.org/c/openstack/placement/+/82671920:23
gmanndansmith: rbac series is ready, added release notes also in last patch please check if that is ok https://review.opendev.org/q/topic:bp/policy-defaults-refresh-2+status:open20:26

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!