Monday, 2021-06-28

opendevreviewSlawek Kaplonski proposed openstack/nova stable/ussuri: [neutron] Get only ID and name of the SGs from Neutron  https://review.opendev.org/c/openstack/nova/+/787253 05:44
opendevreviewYongli He proposed openstack/nova master: Smartnic support - cyborg drive  https://review.opendev.org/c/openstack/nova/+/771362 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - new vnic type  https://review.opendev.org/c/openstack/nova/+/771363 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - create arqs  https://review.opendev.org/c/openstack/nova/+/758944 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - cleanup arqs  https://review.opendev.org/c/openstack/nova/+/798054 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - reject server move and suspend  https://review.opendev.org/c/openstack/nova/+/779913 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - functional tests  https://review.opendev.org/c/openstack/nova/+/780147 06:06
opendevreviewYongli He proposed openstack/nova master: smartnic support - build instance with smartnic arqs  https://review.opendev.org/c/openstack/nova/+/798249 06:06
gibisean-k-mooney[m]: do you still hold your -1 on https://review.opendev.org/c/openstack/nova/+/797142 ? the follow up is green07:39
gibilyarwood: I have a comment in https://review.opendev.org/c/openstack/nova/+/779275 about the assumption that size is always provided to create_image07:49
MrClayPoleMorning all, We currently have an OpenStack ansible rocky deployment running on Ubuntu 18.04. We've been having failures during live migrations. We are seeing the following error in the journal logs but are not having much luck trying to trace it "error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainFSFreeze)" & "error : 08:26
MrClayPoleqemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainMigratePrepareTunnel3Params)"08:26
lyarwoodgibi: ack, I think this is because ramdisk and kernel files are always RAW but let me grep around again and confirm09:01
lyarwoodwas anyone working on the VIR_CONNECT_LIST_NODE_DEVICES_CAP_VDPA libvirt regression btw?09:02
lyarwoodhttps://zuul.opendev.org/t/openstack/build/68e59744ef7444a5ae108118983c9353/log/controller/logs/screen-n-cpu.txt#1525 - the new centos job is hitting it09:02
* lyarwood is sure it came up somewhere either up or downstream last week09:02
lyarwoodhttps://bugs.launchpad.net/nova/+bug/1933096 ah ha09:03
stephenfinIn the docs on AZs, we have this sentence "A host can be part of multiple aggregates but it can only be in one availability zone". Anyone know off the top of their heads what enforces this? 09:06
stephenfinfrom https://docs.openstack.org/nova/latest/admin/availability-zones.html 09:06
stephenfinI wrote that, but I think I copy-pasted it from elsewhere09:06
stephenfinah, found it. 'is_safe_to_update_az' in nova/compute/api.py09:20
gibilyarwood: Sean looked in https://bugs.launchpad.net/nova/+bug/1933096 before09:24
lyarwoodyup and closed it invalid as it was third party CI, this time it's our own upstream CI09:24
lyarwoodwas going to ask them how we can proceed here, no idea how we cache packages on CI nodes tbh09:24
gibilyarwood: I guess we also has a bad cache somewher then09:25
lyarwoodyeah likely09:25
opendevreviewJorhson Deng proposed openstack/nova master: recheck the attachment_id after the reschedule successful  https://review.opendev.org/c/openstack/nova/+/796209 09:31
*** bhagyashris_ is now known as bhagyashris09:40
sean-k-mooney[m]lyarwood:  so im still not conviced that that is a valid bug11:37
sean-k-mooney[m]or rather we could adress it but only by nolonger relying on any libvirt version checks in our code11:37
sean-k-mooney[m]libvirt-python is not really intended to be installed as a wheel11:38
sean-k-mooney[m]its intended to generate bindings when its installed for your current libvirt version which it wont do if you have prebuilt it as a wheel 11:38
sean-k-mooney[m]stephenfin:  yes we enforce that a host can only be in one az11:39
sean-k-mooney[m]lyarwood:  i can add an extra guard conditon for this specific case but it would just be a wack a mole problem for any other case where we use code that is generated on install11:41
opendevreviewsean mooney proposed openstack/nova master: fix sr-iov support on Cavium ThunderX hosts.  https://review.opendev.org/c/openstack/nova/+/777679 12:48
bauzasstephenfin: when you're around, we can discuss on https://review.opendev.org/c/openstack/nova/+/798145 if you wish13:00
bauzastl;dr: problem is that we don't verify the AZs if you don't use the AZfilter13:00
bauzasso we can't just look at them by the API service unless we know that the AZFilter is used13:01
sean-k-mooneyour down stream customer could avoid the issue they had if they just enabled the placemnt preilter13:01
sean-k-mooneythat would enforece the AZ existance check13:02
sean-k-mooneybut they could still select the host using the hack13:02
sean-k-mooneybauzas: i do agree though that we should remove that in a new microversion now that we have teh new way to do it13:03
bauzassean-k-mooney: my thought is that we should just not using the az hack after a new microversion13:03
sean-k-mooneyyep13:03
sean-k-mooneyi was expecting that to have been done in the one that added --host13:03
bauzasfor sure, it wouldn't fix the issue of a requested AZ not good but...13:03
sean-k-mooneyi also agree with our assement tha the az in the request spec and instance are not always intended to match13:04
sean-k-mooneyclassic example being request spec is none but instance has a value set13:04
sean-k-mooneyin princiapl i think that is the only ligitimat case where they should disagree13:04
sean-k-mooneyif the request spec is non None then they should agree13:05
sean-k-mooneyif they dont you forced a live migration13:05
stephenfinbauzas: we don't currently, but I'm adding that13:05
stephenfinand the AZFilter is no use to us if we're bypassing the scheduler by forcing a host13:05
sean-k-mooneystephenfin: right but im not conviced you should13:05
sean-k-mooneystephenfin: that is not how that works13:05
sean-k-mooneywe check that the az exists13:06
bauzasstephenfin: what sean-k-mooney said13:06
stephenfinrequesting zone:host makes no sense if $host is not in $zone13:06
sean-k-mooneyand only proceed if it does when you use the az hack13:06
bauzasstephenfin: it's an hack, we should just remove it13:06
stephenfinwe can't remove it for the older APIs13:06
bauzassurely13:06
sean-k-mooneystephenfin: that is something we could check potentally but im not sure the api is the right place13:06
stephenfinso people will keep hitting this13:06
bauzasstephenfin: it's an hack, right?13:07
sean-k-mooneywell its was a supported feature13:07
bauzasand you need to be an operator13:07
bauzassoooo13:07
sean-k-mooneybut yes13:07
bauzasthe az hack can't be used by an end user13:07
sean-k-mooneyyes it can13:07
bauzasnot by default13:07
sean-k-mooneythey just need to use an older microverion13:07
bauzasthe default policy is admin13:08
sean-k-mooneybauzas: is it?13:08
bauzasfor the az hack ? yes13:08
sean-k-mooneyi tought we did not have a sepreate policy for it13:08
stephenfinI must admit I don't understand the issue13:08
bauzas(and fortunately)13:08
sean-k-mooneyjust the az one13:08
stephenfinwhy wouldn't a simple "does this host belong to this AZ" check make sense?13:08
stephenfinit's not too expensive fwict13:09
gibibauzas: do you suggest to keep allowing calling --availability_zone my-az:host-not-in-my-az and succeed in old microversions? 13:09
bauzassean-k-mooney: I'm 100% sure about the different policy13:09
stephenfina simple lookup in the API DB13:09
bauzasgibi: we *could* fix this for old versions only, but then I have another concern13:09
* stephenfin dashes to shop to get food for lunch, brb13:10
sean-k-mooneystephenfin: my main issue is that you are doing it in a different location to the other az check13:10
bauzasgibi: my other concern is that I know some environments that don't use the AZfilter13:10
sean-k-mooneystephenfin: whihc is doen in the schduler i belive13:10
bauzasgibi: and previously, you were able to use the az hack without the AZFilter13:11
gibibauzas: yes, but that hack resulted in an inconsistent system13:11
gibias described in the bug13:11
bauzasgibi: not if you don't use the filter13:11
bauzassee the problem ?13:11
sean-k-mooneygibi: well its  basically forcing the host13:12
sean-k-mooneysame as a forced migration13:12
gibiso if you dont use teh AzFilter then no AZ recorded in the instance or in the request_spec?13:12
bauzasgibi: and again, I remember ourselves saying A LOT 'well, if you force a host, then meh"13:12
sean-k-mooneygibi: the az should be recoreded in the isntacne regaradless of the filter13:12
bauzasgibi: now, we become super picky about forcing hosts and we want to verify them13:13
gibiI think the base issue is that if you use the hack then you end up having inconsistent az recorded in the instance and in the request_spec13:13
bauzasbut again, you *SHOULDN'T* force a destination 13:13
gibiwe cannot remove the hack from old microversions13:13
bauzasthat's why we added 2.74 version13:13
bauzasto have a way to propose a target without forcing it13:14
sean-k-mooneygibi: you wont always13:14
bauzasand we said as a consensus that we should stop supporting to force move 13:14
bauzasso, if operators wanna move (because again, you need to be ADMIN in order to use the AZ hack), then your dog13:15
sean-k-mooneybauzas: this is the policy yes https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L204-L225 13:15
bauzasagain, I was 100% sure about it13:15
sean-k-mooneyoh no that the new one13:15
gibibauzas: even if we fix the bug in the hack the admin still can move to any host just need to specy the proper az name of the host13:15
gibibauzas: so no functionality is lost13:15
sean-k-mooneythis is the old one https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L177-L196 13:15
bauzasgibi: admins can move to bad targets anyway13:16
sean-k-mooneygibi: what would the fix be13:16
bauzasgibi: admins can force migrate to hosts without verifying other attributes of the host13:16
gibi bauzas: it is not a bad target, the host is valid, nova just record a wrong az name during the move as it trustes admin input13:16
sean-k-mooneygibi: just include the AZ and not the host in the request spec?13:16
gibisean-k-mooney: the fix is to make sure admin provide an az name for the host that is valid for the host, then nova will record a valid az name13:17
bauzasagain, we're breaking existing behaviours if we change things 13:17
gibibauzas: we are fixing a bug13:17
bauzasbecause again, some operators opt-out the AZfilter13:17
gibiand such we can break old buggy behaviro13:17
bauzasgibi: it's not a bug, it's a 40x13:17
gibibauzas: the db inconsistency is the bug13:17
bauzasno13:17
sean-k-mooneygibi: stephenfin  if we add this check we shoudl also move the az exits check to the api also13:18
bauzasyou asked for a target you can't succeed13:18
sean-k-mooneystephenfin: you added that to your check but did you remove the check later13:18
bauzasgibi: you can create an instance on AZ1, then force migrate to AZ213:18
gibibauzas: I still think that if nova creates an inconsistent db record then we should fix that13:18
bauzasgibi: and then, good luck with resizing the instance13:18
sean-k-mooneybauzas: well a resize in that case will resize back to AZ113:19
gibiso I can accept any fix that result in a consistent db data. 13:19
bauzasagain, this is a forced operation and we made a clear statement on the fact broken migrations are not nova's fault13:19
bauzasgibi: if we really want to fix this thing13:20
bauzasgibi: I'd then suggest two things13:20
bauzasgibi: 1/ remove the call by a new microversion13:20
bauzas2/ change the az value to None or to the host AZ in the az hack method13:21
bauzasthe az value is meaningless when you use the force hosts 13:21
bauzasbut I wouldn't hardstop on the call13:21
bauzaseg.13:22
bauzasnova boot --az az1:host_in_az2 would consist into getting the tuple (None, host, node)13:22
gibi1/ is totally OK to me. So remove the hack in future version. 13:22
sean-k-mooneysetting it to none would be consitent with using --host13:22
bauzasor actually (schedule_default_az, host, node)13:23
bauzasI mean, setting the returned az to be the default AZ from the option13:23
bauzas(which defaults to None)13:23
gibi2/ if we can simulate --host when --az was given with bad az name, and log a warning, then I can accep that as well13:23
gibiso keep the existing bad (but used) behavior but avoid incosistent db data13:24
gibiin old microversin13:24
bauzasgibi: the crucial distinction between --host and the az hack is the fact we call out the scheduler on the former, not on the latter13:24
bauzasgibi: honestly, again, ops are using the az hack not for the az, but for providing a target13:25
bauzasgibi: so agreed, we should log a warning (after all, this is an op who did this) and just propose the default AZ as a returned AZ13:25
bauzasif people really want to both force to a target *AND* stick on this AZ, then they can use --host and --az (without the az hack)13:26
gibiyepp13:27
bauzasI'll log my thoughts in the review13:27
gibibauzas: thanks13:28
gibilet's see how stephenfin feels about it after his lunch13:29
bauzassure13:33
sean-k-mooneygibi: here is a patch to update teh neutron doc by the way  https://review.opendev.org/c/openstack/neutron/+/798302 13:34
sean-k-mooneygit distracted by the previous conversation13:34
bauzasgosh, eavesdrop is soooo slow to update the 13:35
sean-k-mooneybauzas: i think its a cron job or similar13:35
bauzaslast updated bits are from more than 20 mins13:35
sean-k-mooneyit often pretty quick but sometimes its delayed13:35
sean-k-mooneyya that sometimes happens13:36
sean-k-mooneyususally its only a minute or so behind at most13:36
bauzasstill lagging13:37
*** abhishekk is now known as akekane|home13:39
*** akekane|home is now known as abhishekk13:39
gansolyarwood: hi! could you please take one quick look at https://review.opendev.org/c/openstack/nova/+/795432 ? The other reviewers said they are waiting for your feedback. Thanks in advance!13:39
gibisean-k-mooney: ups, I also pushed a doc patch https://review.opendev.org/c/openstack/neutron/+/798294 13:40
sean-k-mooneyoh ok lol13:41
bauzascan someone confirm it's not PEBKAC if https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2021-06-28.log.html is lagging ?13:42
sean-k-mooneygibi: i dont mind going with yours we took slightly different approches but the message is similar13:42
artombauzas, https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html 13:42
sean-k-mooneyartom: its the same page13:43
artomsean-k-mooney, I know13:43
artom13:15 is the latest timestamp there13:43
artomThat's in UTC I imagine13:43
artomSo about 30 minutes behind...13:43
sean-k-mooneybauzas: i assume you just want to link to the point where the converstation started13:43
artomYeah, feels longer than normal13:43
bauzassean-k-mooney: I'd rather point to the written agrement13:44
sean-k-mooneybauzas: it have updated by the time anyone actully reads it13:44
sean-k-mooneyah ok13:44
sean-k-mooneyill quickly check with infra13:44
artomAlso, "then your dog"13:45
bauzasartom: did you like it ? did i used it correctly ?13:50
artombauzas, I actually have no idea what you were trying to say :P13:50
bauzasartom: "then it's your problem"13:50
gibisean-k-mooney: ack, let me know if you think some part of your message should be incorporated to mine13:51
artomNever heard it used like that... or at all, in fact13:51
bauzas"mommy, I don't wanna take the dog out it's raining", "darling, you wanted it, so YOUR DOG"13:51
artombauzas, I think you just invented an expression. You're this century's Shakespeare13:53
gibi:D13:53
bauzasartom: Voltaire, please13:53
bauzasso, I left my comment but i need to get my kids from school, ttyl14:18
opendevreviewsean mooney proposed openstack/os-vif master: [WIP] add configurable per port bridges  https://review.opendev.org/c/openstack/os-vif/+/798055 14:33
gibibauzas, sean-k-mooney: you were +2 on the Placement RP re-parenting spec, could you also look at the implementation https://review.opendev.org/c/openstack/placement/+/784020 ?14:57
bauzassurelyu14:57
sean-k-mooneyyes i can do that14:57
bauzasI tho have a meeting in 2 mins... so tomorrow morning, your dog :p14:57
* bauzas tries to make a catchphrase14:57
gibibauzas: thanks :)14:58
gibisean-k-mooney: thanks14:58
gibialso on placement side there is a quick small patch to add pps resources to os-resource-classes https://review.opendev.org/c/openstack/os-resource-classes/+/796591 14:58
bauzas+2d on the last one15:29
gibibauzas: thanks15:33
opendevreviewElod Illes proposed openstack/nova stable/pike: Update pci stat pools based on PCI device changes  https://review.opendev.org/c/openstack/nova/+/798345 15:36
*** abhishekk is now known as abhishekk|dinner15:55
*** abhishekk|dinner is now known as abhishekk|home16:28
*** abhishekk|home is now known as abhishekk16:28
opendevreviewStephen Finucane proposed openstack/nova master: WIP: api: Validate host belongs to availability zone  https://review.opendev.org/c/openstack/nova/+/798145 16:32
opendevreviewMerged openstack/nova stable/ussuri: [CI] Fix gate by using zuulv3 live migration and grenade jobs  https://review.opendev.org/c/openstack/nova/+/795432 19:19
opendevreviewsean mooney proposed openstack/os-vif master: [WIP] add configurable per port bridges  https://review.opendev.org/c/openstack/os-vif/+/798055 19:23
gansogibi, bauzas: if you have a spare minute to please look at this backport that is ready for an extra +2 and +W: https://review.opendev.org/c/openstack/nova/+/796719 19:30
toskymelwitt, elodilles: yay, now time to update the train backport of the legacy cleanup and the CI should be (almost) fine20:36
