Tuesday, 2021-06-22

*** ricolin_ is now known as ricolin05:05
*** rpittau|afk is now known as rpittau07:06
zigoSince Bullseye, live migration is broken on my system, with libvirt not listening on the qemu+tcp port (it's using socket activation). Does anyone know how to configure the new libvirt correctly for this?08:37
stephenfinsean-k-mooney1: you're looking at the wrong XML, I think08:58
stephenfinsean-k-mooney1: You can see the update happening here https://zuul.opendev.org/t/openstack/build/73f734b9ffe745bd833ffe90f324d6d8/log/compute1/logs/screen-n-cpu.txt#7468-747608:59
bauzasdo someone know whether we use in our jobs the singleconductor devstack value ?  https://github.com/openstack/devstack/blob/master/lib/nova#L537 ?09:02
bauzasI was expecting us to create a superconductor by default even for a devstack AIO09:02
bauzasnevermind, found it 09:09
bauzasthis is just legacy09:09
bauzasI can't see a single job running with a singleconductor09:09
stephenfinbauzas: maybe remove it so?09:15
stephenfin*we should09:15
bauzasyup, think so09:15
sean-k-mooney1 stephenfin the xml should always be ethernet10:07
sean-k-mooney1stephenfin: it is useing ethernet in that segment which is good10:08
sean-k-mooney1but it should spawn with it too10:08
sean-k-mooney1stephenfin: ok so we need to determin why its only  taking effect on migration and not on inital spawn10:09
sean-k-mooney1that narrows it down at least10:10
stephenfinsean-k-mooney: We never set the 'delegate_create' attribute of the VIF object anywhere except as part of a migration10:14
stephenfinsean-k-mooney: https://github.com/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446#diff-d47af556a60631b879ac4f609e24b5616aa81643d152583b1591ef90848382bbR10810:15
stephenfinwhich means it's never set as part of the os-vif object https://github.com/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446#diff-918e740344fd54062b1d2040fc7d83a6fb349e33a484b3a442d154977a7f872bR35010:16
sean-k-mooneyah ok that was my orginal error then. i used to do it uncondtionally then i added the code to do the patch detection10:16
sean-k-mooneyso we want to default it to true but have the migration code check and disable if the dest is not supported10:17
stephenfinI can look into that. It's a different issue than the one I'm fixing in that patch though, so I think that patch is still valid by itself10:19
opendevreviewJorhson Deng proposed openstack/nova master: recheck the attachment_id after the reschedule successful  https://review.opendev.org/c/openstack/nova/+/79620910:22
sean-k-mooney it is but without that other change its a really odd behavior10:22
sean-k-mooneyi kind of think they need to be in one patch10:22
sean-k-mooneywe dont want the interface to be differn after a live migration and have it chagne back after a hard reboot do we10:23
sean-k-mooneywe could do that but it feels odd10:23
sean-k-mooneyits also not what i had planned in that libvirt would still be invovled in port plugging10:23
sean-k-mooneystephenfin: we can just set it here i belive https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L297210:24
sean-k-mooneyalthough that technically would mean it should be in the vif model object10:26
opendevreviewJorhson Deng proposed openstack/nova master: recheck the attachment_id after the reschedule successful  https://review.opendev.org/c/openstack/nova/+/79620910:27
sean-k-mooneythis is where we make the desision currently10:28
stephenfinsean-k-mooney: we create the VIF just after we call that function https://github.com/openstack/nova/blob/5979c648462b03a2fe90148f20f099c964cdd298/nova/network/neutron.py#L307810:28
sean-k-mooneystephenfin: we need create port to default to true10:28
stephenfinso we can just set it there always10:28
stephenfinthere's no reason it ever needs to be false - it will be ignored for backends where it doesn't make sense10:28
sean-k-mooney am we could but you will have to extend the vif model object if you do that which will change the versioned notifcations10:29
sean-k-mooneyand that wont be backportable10:29
sean-k-mooneywe need to do this slightly different10:30
stephenfinthat's no an o.vo10:30
stephenfin*not10:30
sean-k-mooneyits not but its in one10:30
sean-k-mooneythe network info cache10:30
sean-k-mooneyits got a list of vifs10:30
stephenfinbut the VIF model has already been modified https://github.com/openstack/nova/commit/a62dd42c0dbb6b2ab128e558e127d76962738446#diff-dfa43d5033ab6af143d8772d584ed93255e60deb7bc58d87eb7305b19f8fa2ffR38110:30
sean-k-mooneyi guess we normally dont change for compostion but it does change the hash10:30
sean-k-mooney by addign a constant and properties10:31
sean-k-mooneyboth of which do not change the version of the object10:31
sean-k-mooneyoh10:31
sean-k-mooneysorry you ar just setting the property10:31
sean-k-mooneyya so that is why that property stores its data in the profile10:32
sean-k-mooneyto avoid the ovo change and be backportable10:32
sean-k-mooneyok10:32
sean-k-mooneyso yes we can just set the property to true uncondtionally10:32
sean-k-mooneybut we als have https://github.com/openstack/nova/blob/5979c648462b03a2fe90148f20f099c964cdd298/nova/network/model.py#L40510:33
sean-k-mooneyok im confused10:33
sean-k-mooneyi tought that was an ovo issue but maybe not10:34
stephenfinthose are just dics10:34
stephenfin*dicts10:34
stephenfinso we can store whatever we want in them10:34
sean-k-mooneyok so we just pass delegate_create=True 10:34
sean-k-mooneystephenfin: they are but they are also https://github.com/openstack/nova/blob/master/nova/objects/instance_info_cache.py#L3810:34
sean-k-mooneyi guess  thats a network model10:35
stephenfinall that that's doing is calling '.json' on the object https://github.com/openstack/nova/blob/5979c648462b03a2fe90148f20f099c964cdd298/nova/objects/fields.py#L108410:35
stephenfinso it doesn't really matter10:35
sean-k-mooneywell its is using https://github.com/openstack/nova/blob/5979c648462b03a2fe90148f20f099c964cdd298/nova/network/model.py#L51210:36
sean-k-mooneyok i guess its fine10:36
sean-k-mooneyi had issue adding other filds in the past10:36
sean-k-mooneybut maybe i was chanign something else10:36
sean-k-mooneyin anycase i think your right we can just set it to true there10:36
sean-k-mooneybut we need to ensure in the migration path we set it to false wehn its not present orginally10:37
gibistephenfin, sean-k-mooney: for me https://review.opendev.org/c/openstack/nova/+/797142 looks fine, but sean-k-mooney has a -1 on it so I'm affraid of approving it11:13
sean-k-mooneygibi: thats what we were discussign above stephen fixed it for migration but its still using the wrong interface type on boot11:15
gibiOK, so there will be changes. thanks11:16
sean-k-mooneyso with that patch as is we boot with interface type=bridge then we migrate with ethernet and hard reboot back to bridge11:16
sean-k-mooneythe orignal intent was to always use ethernet11:16
sean-k-mooneygibi: technically as is this should actully fix the migration issue we were trying to fix11:17
sean-k-mooneybut its a little odd to change the type just for the migration11:17
sean-k-mooneyim also worried that this current state breaks ovs on windows11:17
sean-k-mooneyso i would prefer to fix the boot case.11:18
opendevreviewMerged openstack/nova stable/victoria: [neutron] Get only ID and name of the SGs from Neutron  https://review.opendev.org/c/openstack/nova/+/78725211:18
opendevreviewStephen Finucane proposed openstack/nova master: libvirt: Always delegate OVS plug to os-vif  https://review.opendev.org/c/openstack/nova/+/79742811:19
stephenfinsean-k-mooney: gibi: ^ there's the follow-up patch. I need to figure out test coverage (currently all unit tests are passing and I'm running functional tests now, but I guess I should add a new test). We can proceed with the current patch as-is though IMO11:20
stephenfinit definitely fixes one issue11:20
sean-k-mooneywell what about for backporting11:20
sean-k-mooneyunless you are going to squash all 3 patches you would have to undo the squashing you have already done11:21
stephenfinNot necessarily. The main patch only ever fixed the issue for live migration but had a bug which we've squashed the fix in for. This new patch stands on its own feet11:22
sean-k-mooneyit was ment to do it always11:22
sean-k-mooneyti did do it always before i put in the detection mechanium for upgrades11:22
stephenfinAlso, I don't think hard reboot is an issue. We'll have set the 'delegate_create' attribute of the VIF entries as part of live migration and nothing unsets that11:23
sean-k-mooneystephenfin: well i would prefer not to backport it in its current state11:23
stephenfinSo it'll change during live migration but it shouldn't change after a hard reboot11:23
sean-k-mooneyit will revert back to bridge in a hard reboot11:23
sean-k-mooneywithout the final patch11:23
stephenfinwhy?11:23
sean-k-mooneybecause bridge is what we use for spawn11:24
stephenfinunless the VIF objects in network_info have delegate_create=True11:24
sean-k-mooneywhich they wont11:24
stephenfinagain, why?11:24
sean-k-mooneythe migration data vifs are not the same as the ones in the info cache11:24
sean-k-mooneythe ones in the info cache get rebuilt form neutron by the perodic task11:25
sean-k-mooneyor whenever we get a network-changed event11:25
sean-k-mooneyso delegate_create=True will be lost11:25
sean-k-mooneyif its ever populated in it in the first place11:25
stephenfinah, TIL11:26
stephenfinokay, so I'll squash the third patch in too so11:26
* gibi nods11:26
sean-k-mooneyok am can we wait for grenade to run on the 3rd patch then we can merge the second one?11:27
sean-k-mooneyi just want to make sure that it is not breaking anything11:27
stephenfinwe really need to mute that SQLAlchemy warning11:29
sean-k-mooneyyep11:29
stephenfinI've a fix proposed to oslo.db11:29
stephenfinhttps://review.opendev.org/c/openstack/oslo.db/+/79742611:29
sean-k-mooneyah nice11:30
sean-k-mooneyi was assuming we woudl set that to false11:30
sean-k-mooneyare we ok with caching softdelete or json data11:30
sean-k-mooneyThe requirements for cacheable elements is that they are hashable and also that they indicate the same SQL rendered for expressions using this type every time for a given cache value.11:31
sean-k-mooneyok11:31
sean-k-mooneyso ya that is correct11:32
sean-k-mooneycache_ok applies11:32
*** osmanlicilegi is now known as Guest18712:25
*** ChanServ changes topic to "This channel is for Nova development. For support of Nova deployments, please use #openstack"12:30
sean-k-mooneystephenfin: the full job suite is not finished yet but the live migration job passed and is now using ethernet so thats good to see just waiting on grenade now12:39
sean-k-mooneygibi: gmann so intersing problem, apparently we do not validate server_group policy strings12:52
sean-k-mooneywe have never intended the server group to be user extendable right?12:52
sean-k-mooneye.g. by using out of tree filters and weighers?12:52
sean-k-mooneyactully we do have schema validataion for thsi 12:59
sean-k-mooneyhttps://github.com/openstack/nova/blob/5979c648462b03a2fe90148f20f099c964cdd298/nova/api/openstack/compute/schemas/server_groups.py#L22-L6712:59
sean-k-mooneybut i guess that is not workign i need to confrim what is being repoted downstream12:59
stephenfinsean-k-mooney: gibi: gmann: Yeah, there's a request to undo the validation of what we pass to the API in OSC because its seems someone is relying on this broken behavior https://storyboard.openstack.org/#!/story/2008975 13:10
sean-k-mooneyso i thikn we need to figure out why the json schema validation is not rejectign there request and fix that and backport it in thet api13:13
sean-k-mooneywe could have a workaround config option to disable that but this is not a valid extenion point today13:14
gibiI agree that we should validate the policy13:15
gibiso let's fix it 13:15
gibiand backport it13:15
gibiI'm meh on providing the workaround config13:15
gibiif we do that the the default should be still to validate the policy13:15
sean-k-mooneythe real question is why is the existing code not validating it when we have the json scmea files to do that and unit test that apprently are testing the validation13:15
sean-k-mooneygibi: yep agreed if we have a config option it shoudl default to validating13:16
gibigood question. unfortunately I have swamped with other things right now13:16
sean-k-mooneyam.... wehre do we actully do the jsonscema validation fo the api schema https://codesearch.opendev.org/?q=jsonschema&i=nope&files=&excludeFiles=&repos=openstack/nova13:57
sean-k-mooneyah this is how we import them https://codesearch.opendev.org/?q=from%20nova.api.openstack.compute.schemas%20import&i=nope&files=nova%2Fapi&excludeFiles=&repos=openstack/nova14:27
sean-k-mooneyand this is where we use the server group schema https://opendev.org/openstack/nova/src/branch/master/nova/api/openstack/compute/server_groups.py#L2614:27
sean-k-mooneyand this is where we apply the vlaidations https://opendev.org/openstack/nova/src/branch/master/nova/api/openstack/compute/server_groups.py#L182-L18414:29
gibisean-k-mooney: there is a bug around VDPA https://bugs.launchpad.net/nova/+bug/193309614:42
sean-k-mooneythat should not happen14:43
sean-k-mooneywe have a version check for this 14:43
sean-k-mooneylet me triple check but im pretty sure we only try to use that if libvirt is knew enouch14:44
sean-k-mooneyits used here https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/virt/libvirt/host.py#L133814:47
sean-k-mooneywhich is called form here https://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/virt/libvirt/vif.py#L537-L54014:48
sean-k-mooneyi guess we dont have a libvirt verion check here14:49
sean-k-mooneybut they would only hit this if they were using a vdpa port with a version of libvirt that did not uspprot this14:49
sean-k-mooneyalthough form the trace it looks like this is from the perodic14:50
sean-k-mooneywhich is protected by the version check14:51
sean-k-mooneyhttps://github.com/openstack/nova/blob/50fdbc752a9ca9c31488140ef2997ed59d861a41/nova/virt/libvirt/driver.py#L7346-L734914:51
gibisean-k-mooney: the bug report says they see it on libvirt 7.0.014:55
gibithat would pass our version check14:56
sean-k-mooneyyep just responding i think there version fo libvirt python is older then there libvirt version14:56
sean-k-mooneywhich is not supported 14:56
sean-k-mooneyhum successfully installed libvirt-python-7.4.014:59
sean-k-mooneythat should work14:59
kashyapThe libvirt commit that introduced that _CAP_VDPA should be available from v.6.9.0 (including the Pyth bindings too):15:16
kashyap$> git tag --sort version:refname --contains 53aec799fa31 | head -115:16
kashyapv6.9.015:16
kashyap(That was on libvirt Git repo)15:16
sean-k-mooneyyep next thing to check is the libvirt-python version on pypi15:21
sean-k-mooneyso "sean@p50:~/Downloads/libvirt-python-7.4.0$ grep -i VDPA -r *" should have found something but it has not15:24
opendevreviewStephen Finucane proposed openstack/nova master: libvirt: Always delegate OVS plug to os-vif  https://review.opendev.org/c/openstack/nova/+/79742815:36
gibinova weekly meeting starts in 6 minutes here in the channel15:53
gibi#startmeeting nova16:00
opendevmeetMeeting started Tue Jun 22 16:00:05 2021 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
bauzas\o16:00
dansmitho/ (kinda)16:00
sean-k-mooneyo/16:01
elodilleso/16:01
gibi\o16:01
alistarleo/16:02
gibi#topic Bugs (stuck/critical) 16:02
gibiNo critical bugs16:02
gibi#link 21 new untriaged bugs (+7 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New16:02
stephenfino/16:02
gibiour backlog is growing so please if you have some time then look at it and triage some bugs16:02
gibiis there any specific bug that we need to discuss now?16:03
sean-k-mooneygibi: i am pretty sure that https://bugs.launchpad.net/nova/+bug/193309616:04
sean-k-mooneyis a bug in ther ci env16:04
sean-k-mooneyso hopefully we can close that soon16:04
gibisean-k-mooney: cool. thanks for looking into that16:04
gibiif nothing else then16:05
gibi#topic Gate status 16:05
gibiNova gate bugs #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure16:05
bauzasgibi: I'll look at open bugs16:06
gibibauzas: thank you16:06
gibiI don't see fresh gate bug in that list16:06
gibilast week we merged couple of fixes16:06
gibihow do you feel, does the gate improved?16:06
sean-k-mooneyam any suggestions on who i should ping to move https://review.opendev.org/c/openstack/devstack/+/796826 along in devstack16:06
bauzasthanks lyarwood btw.16:07
bauzasfor the gate failures fixes16:07
gibisean-k-mooney: gmann could be one 16:07
sean-k-mooneyi assume that we are still seeing the intermitant failures due to the agent haning16:08
sean-k-mooneyso if we can merge that sooner rather then later it should help with gate stablity16:08
gibiI was off yesterday and did not push any code today so I don't have the view what is failing ont he gate16:08
gibibut I agree that devstack patch is a good step forwa4rd16:08
gibianything else about the gate?16:09
gibiPlacement periodic job status #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly16:10
gibiplacement jobs are green16:10
gibi#topic Release Planning 16:10
gibiMilestone 2 is 15 of July which is spec freeze16:10
gibiSpec review day is 6th of July #link http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023083.html16:10
gibianything else about the coming milestone?16:11
gibi#topic Stable Branches16:12
gibicopying notes from elodilles 16:12
gibistable/ussuri is blocked (fix needs to be merged: https://review.opendev.org/c/openstack/nova/+/794675 )16:12
gibiother branches should be OK (however, there are quite frequent volume detach failures for example on stable/victoria)16:13
gibistable/ocata branches of nova are EOL, branches are deleted ( https://review.opendev.org/c/openstack/releases/+/795664 )16:13
gibiEOM16:13
elodillessorry, I was lost in the ussuri gate fixing patches,16:13
gibithe detach failure has a fix on master 16:13
elodillesi will try to sort it out in the coming days16:13
gibihttps://review.opendev.org/c/openstack/nova/+/79625516:13
gibithis needs to be backported I think 16:14
gibihm16:14
gibino\16:14
gibisorry16:14
gibiI'm not sure16:14
gibithat fix helps with the detach issue on master where we have the new notification based detach 16:15
sean-k-mooneyit would need to be backported to any release using the events?16:15
sean-k-mooneybut not where we poll?16:15
sean-k-mooneyso if we backport the events based detach we need to backport both16:16
gibisean-k-mooney: that is my understanding yes16:16
elodillesI saw lot of failures in victoria as I said, can it be valid backport there?16:16
sean-k-mooneyi think its a relitivly safe backport in either case16:17
gibielodilles: the detach modification is pretty huge patch but it does not change external interfaces16:17
sean-k-mooneygibi: did we backport your events patch to victoria?16:17
gibiso from process perspective it is backportable16:17
gibisean-k-mooney: no16:17
gibiI dont think so16:18
elodillesanyway, that sound good (except that it's a huge patch :S), I'll add this to my TODO to investigate :)16:18
sean-k-mooneyso the failure on vicotria are proably not related to OPERATION_FAILED16:18
gibisean-k-mooney: yeah16:18
sean-k-mooneyelodilles: wehre they OPERATION_FAILED specifcialy or was it just detach failures16:18
elodilleswell, the usual error: volume ... failed to detach (in XXXX state )  something16:19
sean-k-mooneyok then that is likely not related to https://review.opendev.org/c/openstack/nova/+/79625516:19
gibisean-k-mooney: I dont think nova logs the sepcific error code16:19
elodilles:-/16:19
sean-k-mooneywe can review after the meeting if you have an example16:19
elodillessure, thanks!16:20
gibiack16:20
sean-k-mooneygibi: we might see it in the libvirt log in the ci run not sure but just a guess16:20
gibiok16:20
gibiany other stable related issue?16:20
elodillesnothing else i think :X16:21
gibi#topic Sub/related team Highlights 16:22
gibiLibvirt (bauzas) 16:22
bauzasjust one thing, we discussed about https://review.opendev.org/c/openstack/nova/+/769614 about how to help operators to know about a behaviour modification for a libvirt issue16:22
bauzaswe had some argument about some relnote reno section16:23
bauzasbut,16:23
bauzaseventually, I +Wd the fix16:23
gibibauzas: what is the summary?16:23
bauzasdo we need to create some better way to tell operators that we no longer verify the cpu topologies ?16:24
sean-k-mooneyyes just noticed if you really wanted me to update it i was ok with that just wanted you and stephenfin  to agree on a path forward16:24
bauzasgibi: this ^16:24
bauzasbasically, flavors could continue to have the same extraspecs16:24
stephenfinbauzas: I'm still not sure what you're asking for. Are you suggesting we drop the 'upgrade' section?16:25
bauzasstephenfin: no, not about this16:25
bauzasstephenfin: about whether we would need more some kind of documentation for this16:25
stephenfinoh, that16:25
sean-k-mooneyoh ok you were conserened if we needed to document it at all16:26
sean-k-mooneynot how we did it16:26
bauzasyup16:26
sean-k-mooneyok i tought you just wanted it in the fixes section so i missed that in your previous comments16:26
stephenfinMy personal opinion is no. We never documented this behaviour and we never agreed to guarantee it. I wanted to include the release note simply as a courtesy16:26
bauzasbasically, the flavors would continue to be the same, it's just the libvirt driver that won't longer verify them16:26
stephenfinsimilar to how we announce virt API changes, even though we don't support out-of-tree virt drivers16:26
stephenfinI think you've misunderstood things. Nothing changes in what we verify16:27
sean-k-mooneywell its not about verificaiotn is that we change unspecified behavior to different unspecifed behavior16:27
bauzasstephenfin: fair enough, I'm having the same opinion but I'm asking the community here if they have concerns16:27
stephenfinyes, what Sean said16:27
gibiif we are changing undefined behavior then reno is just an extra, but not at all mandatory. 16:28
stephenfinwe had a hidden behavior where we would try to match the number of guest CPU "threads" to the number of host CPU threads16:28
stephenfinbut it was hidden. We didn't even know it existed until sean-k-mooney started fixing this bug16:28
bauzasyes, just a performance kind of help16:28
stephenfinexactly. An optimization at most16:28
bauzasokay, any concerns ?16:28
bauzasif not, I'm OK16:28
gibiI'm Ok too16:28
gibistephenfin, sean-k-mooney: is is settled from your perspective too?16:29
stephenfinno issues from me16:29
sean-k-mooneyyep16:29
sean-k-mooneyi did not have a stong opiion16:29
bauzask, that's it for me16:29
gibiok, thanks16:29
gibi#topic Open discussion 16:29
gibithere is one topic16:30
gibi(wenpingsong) vGPU spec reviewing: as we discussed during ptg, cyborg-managed/nova-managed gpu should have a trait indicating who owns this gpu, thanks for bauzas's comments with different thoughts about the trait. We need to have an agreement on that. https://review.opendev.org/c/openstack/nova-specs/+/780452 (can't attending the meeting due to the time slot, but will check the logs later)16:30
gibias far as I understand we originally suggested the owner traits16:30
gibibut bauzas now suggest using separate RC 16:30
gibifor cyborg managed vgpus16:30
gibithis way the trait would not be needed16:30
sean-k-mooneyboth could work16:30
bauzasmy -1 was just for discussing about it16:31
gibiI guess we don't have the use case: Give me a vgpu I don't care if it is cyborg or nova managed16:31
bauzasgiven I was also working on the mdev spec16:31
bauzasI thought about it16:31
sean-k-mooneyowner traits i guess could have upgrade issues16:31
bauzasand I'm not sure we need a "nova" trait16:31
sean-k-mooneye.g. if you upgrade cyborg first16:31
sean-k-mooneywithout upgrading nova to support owner traits16:31
bauzasI know tho we said we would do it at some PTG16:32
bauzasso I'm sorry to reopen the can16:32
sean-k-mooneywell no i think there is a usecase for a nova triat16:32
sean-k-mooneyor well RP ownwer in general16:32
sean-k-mooneynot nessiarly a trait16:32
bauzasit's the other way of a consumer type :)16:33
bauzassomething like a inventory type :)16:33
sean-k-mooneyso its a provider type :)16:33
bauzasyeah16:33
bauzastbh, I don't really like us marking traits for things unnecessary16:34
sean-k-mooneyam would it be better to continue this discsssuon on the spec. was tehre a summary you wanted to give in real time16:34
bauzasagreed16:34
bauzasnot sure we can have a consensus now16:34
gibiagreed too, I don't have the brainpower to think this through right now16:35
bauzasbut I want us to think more about this16:35
bauzasand again, I'm sorry to hold a bit the spec16:35
gibino worries16:35
sean-k-mooneyonce we add a trai we cant remove them so we shoudl get this right16:35
bauzasbut if we are about to be generic, we could also help cyborg I think16:35
bauzassean-k-mooney: unfortunately yes16:36
sean-k-mooneyunless we used CUSTOM_OWNER i guess we cloud i will try and read the spec again tomorow16:36
bauzasmaybe it's also the fact that a 'nova' trait seems to me bizarre16:36
gibiOK, continue this in the spec16:36
sean-k-mooneyi dont think we shoudl treat nova as special in placment16:37
gibiany other topic for today?16:37
bauzaswith jay's mind, I would transform this into "I can support 'nova'"16:37
sean-k-mooneynot form me16:37
bauzasgibi: nope, I'm done16:37
sean-k-mooneybauzas: well it would mean "i can support consumtion by nova" but that just a detail16:38
bauzassean-k-mooney: that's why I think the name is wrong16:38
gibiif nothing else then I close the meeting and you can continue :)16:38
gibi#endmeeting16:38
opendevmeetMeeting ended Tue Jun 22 16:38:52 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:38
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2021/nova.2021-06-22-16.00.html16:38
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2021/nova.2021-06-22-16.00.txt16:38
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2021/nova.2021-06-22-16.00.log.html16:38
bauzasthanks gibi16:39
elodilleso/16:39
gibifrom one hand OWNER_NOVA is a trait as it is a qualitative thing not a quantitative one16:39
bauzassurely16:39
gibiand if we ever want to mix nova managed vgpus with cyborg managed ones then we need a common RC16:39
gibibut I'm not sure about we ever want to mix16:40
bauzasgibi: OWNED_BY_NOVA would be a better name16:40
bauzasbut it's a bikeshed16:40
sean-k-mooneygibi: well i dotn think we want to mix it via just resouce:vgpu16:40
gibiagree16:40
gibid16:40
elodillessean-k-mooney: sorry, I forgot that I need to leave now :S I'll search for example failures that we can look at tomorrow16:40
bauzasgibi: that's why I think we need different RCs16:40
sean-k-mooneycyborg one will always come form a device-profile16:40
elodillessean-k-mooney: if that is OK for you o:)16:40
gibiif we never mix, then I'm fine with the different RC16:40
bauzasgibi: because inventories would be managed by different services16:41
sean-k-mooneyelodilles: cool just ping us and we can confirm if tis the same isssue or not16:41
sean-k-mooneyelodilles: hopefully at least16:41
bauzasthe consumption would be identical tho16:41
elodillessean-k-mooney: sure, thanks! :)16:41
sean-k-mooneybauzas: we dont need different RC classes to mix16:41
bauzassean-k-mooney: yup, the other way16:41
sean-k-mooneywe dont need them to isolate ither16:42
gibisean-k-mooney: if you even want to support give-me-a-vgpu-i-dont-care-if-nova-or-cyborg-managed then we need a common pool of resource and therefore a common RC16:42
bauzassean-k-mooneybut I'm not sure marking an inventory by a trait for knowing about the owner is the best16:42
sean-k-mooneya trait + RC can do eveything 2 RC classes can do since we do not allow two services to create inventories on teh same RP16:42
sean-k-mooneygibi: right if we  want to supprot that your are corect a common resouce class woudl be required16:43
gibiif we don't what that then no need for a common RC16:43
bauzasbut I think we're missing some logic16:43
sean-k-mooneybauzas: its marking the RP not the inventory16:44
bauzaseither we wanna mix, and then we don't need to know which inventory was created by who16:44
sean-k-mooneybauzas: we do16:44
bauzasor, we just make resources alongside 16:44
bauzasand then we don't need the RCs to be the same16:44
sean-k-mooneywe do not allow 2 services to modify the same resouce provider16:44
bauzasreally ? 16:45
bauzasI don't think we want to avoid this16:45
sean-k-mooneyso fi we have ownert traits or resocue classes we are fine16:45
bauzasthe generation bit is even for distributed servcies16:45
gmannsean-k-mooney: stephenfin gibi can you point me to bug where server group validation did not work. I can see test validating the scema also - https://github.com/openstack/nova/blob/master/nova/tests/unit/api/openstack/compute/test_server_groups.py#L45916:45
sean-k-mooneybauzas: we dont allow it becaue the virt driver will replace the traits on the RP16:45
stephenfingmann: not the exact bug but https://storyboard.openstack.org/#!/story/2008975 and https://github.com/openstack/python-openstackclient/commit/ab0b1fe885ee0a210a58008b631521025be7f3eb16:46
stephenfinI just posted to openstack-discuss about this16:46
sean-k-mooneybauzas: and it will update the inventoris based on its local view16:46
gibiI have to drop of now. I will read back tomorrow16:46
gmannstephenfin: ok, thanks checking16:46
bauzassean-k-mooney: how placement know which service owns which RP ?16:47
sean-k-mooneybauzas: it does not and it does not enforce it16:47
bauzassean-k-mooney: that's my point16:47
sean-k-mooneywe enforce this today by tellint neutorn and cybrog they may not modify our RPs16:47
bauzaswe're blindly recreating a RP 16:47
gmannstephenfin: so this is about removing the validation not that current validation did not work right?16:47
sean-k-mooneybauzas: yep16:47
stephenfingmann: the bug is about removing the API validation from the client. sean-k-mooney is saying that the server validation isn't working though16:48
sean-k-mooneywell not quite16:48
bauzassean-k-mooney: that's my point, some service owns some RP16:48
bauzasit's just nova/placement which doesn't support it16:48
sean-k-mooneystephenfin: gmann im saying that the customer reported it worked and i cant see how the server validation would not block this16:49
bauzasso16:49
bauzassay cyborg creates some RPs 16:49
bauzasand nova too16:49
bauzasboth are different but have the same RCs16:49
spatelsean-k-mooney hey! had quick question related Cellv2 design, does neutron can be scale using cellv2 or just nova?16:49
gmannsean-k-mooney: stephenfin yeah it is rejected from server side in v2.1 at least (v2 old code I am not sure)16:49
bauzassean-k-mooney: then why should we have problems if a flavor is asking for some vGPUs ?16:50
sean-k-mooneygmann: it shoudl be rejected in 2.0 based on the validation16:50
bauzassean-k-mooney: this would eventually go to the libvirt driver16:50
bauzaswhich would know whether the allocated RP is owned by it or cyboth16:50
sean-k-mooneybauzas: right now cybrog neutron and other prject cannot use any standard RC used by nova16:50
gmannsean-k-mooney: v2 code path was different until we merged it. I am not sure if user using that much old nova16:50
sean-k-mooneygmann: ah good point it was working for them on osp 13 which is queens and not workin on osp 16 whic is train16:51
gmannsean-k-mooney: and in old v2 code path I think we did not had schema validation. its v2.1 introducing the JSON schema validation in more strict way16:51
sean-k-mooneybauzas: perhaps we could just have a google meet call16:52
bauzassean-k-mooney: unfortunately, I need to stop in 5 mins16:52
bauzasgosh I miss PTGs16:52
bauzasand f2f meetings16:52
bauzasand whiteboards16:52
sean-k-mooneybauzas: we can talk tabout it tomorrow then but tl;dr; today no too openstack service can use the same RC class and no two openstack service can share the same RP16:53
bauzasI could even show source code16:53
sean-k-mooneybauzas: the ownwer triat was a way to allow reuse of stanard RCs across services16:53
bauzassean-k-mooney: based on your first assumption, that means cyborg can't provide VGPU resources ?16:53
bauzassean-k-mooney: gosh, this sounds hacky16:53
sean-k-mooneycorrect which is why the ownwer trait was intoduced in the spec16:54
gmannsean-k-mooney: Mitaka was last with legacy v2 code and since newton we started single code path for v2 and v2.1 which has these schema validation16:54
sean-k-mooneywithout an owner of the resouce provider in some form resouce classes cant be shared16:54
bauzassean-k-mooney: honestly, I'd prefer us to have a way to know about which service owns a RP with by default be 'nova'16:54
bauzasmeaning another attribute of a RP16:55
sean-k-mooneybauzas: well that is exactly what the ownwer traits were ment to be16:55
sean-k-mooneybauzas: since peole did not want a dedicated value on the RP16:55
bauzasthe owner trait is just a hack because we wanna make it round 16:55
sean-k-mooneybauzas: we proposed added ownwer before and peopel objected16:55
bauzassean-k-mooney: then I don't guess why we need to explicitely mark a nova trait16:55
sean-k-mooneywe can certenly do that again but the default shoudl nto be nova16:56
gmannsean-k-mooney: stephenfin ohk its modified code he mentioned in github discussion so upstream code is all good16:56
bauzassean-k-mooneythe default should what ?16:56
sean-k-mooneybauzas: empty16:56
sean-k-mooneybauzas: placment should not prefer nova over other services16:56
bauzasempty meaning ?16:56
sean-k-mooneymeaning unowned16:57
bauzasthis is wrong16:57
sean-k-mooneybascaly we ould need a reshape16:57
bauzasany RP is owned by something16:57
sean-k-mooneywel just update16:57
sean-k-mooneyto populated it for nova RPs16:57
sean-k-mooneygmann: thanks ill feed that back downstream unless you commented on ghitbub already?16:58
bauzassean-k-mooney: I need to hardstop but let's continue the convo with gibi later tomorrow16:58
gmannsean-k-mooney: doing on github16:58
sean-k-mooneybauzas: ack o/16:58
sean-k-mooneygmann: thanks16:59
bauzassean-k-mooney: I guess I'm afraid by all the implications of tagging our RPs explicitely16:59
sean-k-mooneybauzas: whcih is why we need to discuss in a spec :)16:59
bauzasand I just wanna be sure this is the only possibility16:59
bauzassean-k-mooney: think about something until tomorrow17:00
* sean-k-mooney breaths as the 3 conversation i was having in paralle all come to an end17:00
bauzassean-k-mooney: would it be a problem if nova would create RPs with VGPUs in it, while cyborg creates RPs with VGPU in them too, and eventually the allocation would go straight to the nova-compute service holding the allocated RP17:01
bauzasI just feel the nova-compute service could know which RPs is owned by who17:01
bauzasand defer to cyborg-agent binding if so17:03
bauzasor just call libvirt if the allocated RP was owned by nova17:03
opendevreviewStephen Finucane proposed openstack/nova master: scheduler: Remove 'USES_ALLOCATION_CANDIDATES'  https://review.opendev.org/c/openstack/nova/+/77364017:04
opendevreviewStephen Finucane proposed openstack/nova master: scheduler: Remove 'hosts_up'  https://review.opendev.org/c/openstack/nova/+/77364117:04
opendevreviewStephen Finucane proposed openstack/nova master: trivial: Remove FakeScheduler (for realz)  https://review.opendev.org/c/openstack/nova/+/77364217:04
opendevreviewStephen Finucane proposed openstack/nova master: scheduler: Merge 'FilterScheduler' into base class  https://review.opendev.org/c/openstack/nova/+/77364317:04
opendevreviewStephen Finucane proposed openstack/nova master: docs: Drop references to non-filter scheduler drivers  https://review.opendev.org/c/openstack/nova/+/77364517:04
opendevreviewStephen Finucane proposed openstack/nova master: scheduler: Merge driver into manager  https://review.opendev.org/c/openstack/nova/+/77364417:04
opendevreviewStephen Finucane proposed openstack/nova master: tests: Merge 'test_utils', 'test_scheduler_utils'  https://review.opendev.org/c/openstack/nova/+/77364617:04
opendevreviewStephen Finucane proposed openstack/nova master: conf: Remove deprecated aliases  https://review.opendev.org/c/openstack/nova/+/77364717:04
opendevreviewStephen Finucane proposed openstack/nova master: scheduler: 'USES_ALLOCATION_CANDIDATES' removal cleanup  https://review.opendev.org/c/openstack/nova/+/79751317:04
bauzassean-k-mooney: this isn't a scheduling decision until you wanna explicitely shard your resources between nova and cyborg (the non-mixed case)17:04
bauzasand for the non-mixed case, we can have different RCs17:04
bauzasinstead of a owner trait17:04
* bauzas needs to run now17:05
sean-k-mooneybauzas: if the request is in the flavor and ti does not come form a device profile then nova shoudl ensure it does not come form an nova created RP17:05
sean-k-mooneythe instresting part is if you have resouces:vgpu=117:06
bauzasthe mixed case17:06
sean-k-mooneywhich today can only come form nova 17:06
bauzasthe non-mixed case can be solved with custom RCs17:06
sean-k-mooneyand the question then is what hsoudl the bhavior be in the absense of a trait17:06
sean-k-mooneyanyway let talk tommorow17:07
bauzasfor the mixed case, you can admit that it's OK for a spawn to go to the nova-compute17:07
bauzasand then the nova-compute can either call cyborg-agent to bind17:07
bauzasor, call libvirt to just spawn17:07
sean-k-mooneyno that is not ok17:07
sean-k-mooneywell maybe17:07
sean-k-mooneybut it would be rather complicated17:07
sean-k-mooneyill think about it tonight17:08
bauzaswe already get allocations before we call libvirt17:08
sean-k-mooneybut it would require change not currently in the spec17:08
bauzasso we already know the allocated RPs17:08
sean-k-mooneybauzas: correct but we will need the condurtor to determin if the rp in the allocation is owned by nova or cyborg17:09
bauzasit's just the nova-compute service which doesn't care and directly passes the allocations as arguments17:09
sean-k-mooneyand then have it start the bidning before we call the compute node17:09
opendevreviewMerged openstack/nova master: Fix max cpu topologies with numa affinity  https://review.opendev.org/c/openstack/nova/+/76961417:09
sean-k-mooneysicne we do the arq binding in the conductor17:09
bauzassean-k-mooney: that's not the workflow I see in https://review.opendev.org/c/openstack/nova-specs/+/780452/8/specs/xena/approved/support-vGPU-nova-cyborg-interaction.rst#13217:09
sean-k-mooneyand without a device procfile we will not know how to create teh arq17:09
bauzasholy shit17:09
bauzas'nova's is both the conductor and the compute service17:09
sean-k-mooneyso we cant make that work without changing the cybrog api17:09
bauzasin the workflow diagram17:10
bauzaswell, we get allocations in the conductor, right?17:10
bauzasin order to know which compute services to call, right?17:10
sean-k-mooney yes and we create and bind arqs in the conductor too17:10
bauzasok, so it's just an extra bit17:10
sean-k-mooneybut to create an arq we need a device profile17:10
sean-k-mooneyso if we have resouce:vgpu in the flavor it cant be used to create a cybrog ARQ today as we dont have a device profile17:11
bauzassure, I see it in the diagram17:11
*** rpittau is now known as rpittau|afk17:11
bauzaswe create a device profile before we call scheduling, right?17:12
sean-k-mooneyso right now a refilter can eailar determin if the vgpu request is from nvoa or cyborg17:12
sean-k-mooneybauzas: yes its a resouce in the cyborg api17:12
bauzasbecause we never mix17:12
sean-k-mooneyso the admihn create a device profile and then creates a flavor that refrence s it17:12
sean-k-mooneybauzas: a device profile is like a pci aliasa17:12
bauzasyup, I got it17:13
sean-k-mooneyyour are 13 mins passed your hard stop :)17:13
bauzasbut it's premature I think to get a device profile based on an assumption that a flavor asked for a vgpu17:13
sean-k-mooneybauzas: that not how that works17:14
bauzasyou should get the device profile only if the allocated resource is owned by cyborg17:14
bauzassean-k-mooney: I know17:14
sean-k-mooneyright now the only way to request cybrog resoruce is via a device profile17:14
sean-k-mooneysupprot cybrog via the resouce: sysntax is totallyout of scope17:14
bauzassean-k-mooney: I understand and I don't wanna change this17:14
bauzasok, ok, 17:15
bauzasso a flavor is set with a device profile that's turned into a resources query by a prefilter, right?17:15
sean-k-mooneyyes17:15
bauzasin this case, there is ZERO possibility to mix resources17:15
bauzaswhatever the resource is17:16
sean-k-mooneywell not refilter i thik we have explict logic for cyborg17:16
sean-k-mooneytoday correct17:16
bauzasso the mixed case is impossible today17:16
sean-k-mooneyyes17:16
bauzasso, we're discussing about non-mixed resources17:16
bauzaswhich can be solved with custom RCs17:16
sean-k-mooneyunless tehy break one fo the rule we have about palcment useage17:16
sean-k-mooneybauzas: it can but it does not have to be17:17
bauzassean-k-mooney: the other possibility implies a huge change for existing deployments, right?17:17
sean-k-mooneyno17:17
sean-k-mooneynot really17:17
sean-k-mooneyjust that we would add a trait to all nova rps17:18
sean-k-mooneyand we would automatically add it with a prefilter17:18
bauzasyup, I got this, this is a technical answer17:18
bauzaswhich prevents us to discuss about alternatives17:18
sean-k-mooneywe can discuss alternitves17:19
bauzasand the one I'm proposing (again, for the non-mixed case) doesn't imply upgrade impacts17:19
sean-k-mooneybut we have discsuted this for 2+ cycle now17:19
sean-k-mooneycorrect but it means we dotn need os-resouce classes17:19
sean-k-mooneysince resouce classes will have to be different per service17:19
sean-k-mooneyso its not free17:20
sean-k-mooneybauzas: if we ant differnt service to use diffeernt resouces classes for the same resouce class we can do that17:20
bauzasthe whole situation is that we need some scheduling query for telling "yo' I really want cyborg"17:20
sean-k-mooneybut its restricting the usfulness of placment17:20
sean-k-mooneyor i really want nova but yes17:21
sean-k-mooneywhich is why the ownwer trait was proposed so we could make a qualitave statmetn about the resouce provier17:21
bauzasnope, in general, you don't need to tell placement "give me nova stuff"17:22
bauzasbecause if you ask for nova resouces, you get nova stuff17:22
sean-k-mooneyyou may not17:22
bauzasagreed17:22
sean-k-mooneyzun could create RPs with cpu ram and disk17:22
bauzasif cinder was using placement for their own volumes17:23
bauzasyou could ask placement for things that nova doesn't knoxw17:23
sean-k-mooneyyes17:23
bauzaszun is a good example17:23
sean-k-mooneywe have demonstarted that with provider.yaml also17:23
sean-k-mooneybauzas: i think you jsut made an argument indirecly that i might be able to get behind 17:24
sean-k-mooneyi never want people ot user resouce_# syntax with traits_# sysntax in teh falvor directly17:24
sean-k-mooneye.g. i dont want people to have to use the placmenet group sysntax in flavors17:25
sean-k-mooneyso if we are getttign resouce owned by differetn service i dont want them to have to specify traits in request groups in the flavor17:25
bauzasthat's reasonable17:26
sean-k-mooneywith different resouce_class we dont have to do this17:26
sean-k-mooneyalong as we dont share resouce classes or do it in a managed way17:26
sean-k-mooneye.g. cindr can use the disk_GB resouce class as long as it does so useing sharing aggreates and does nto create RPs or inventorees under the compute 17:27
sean-k-mooneycyborg could use vgpu rc in the same way but only if its not under the compute RP17:28
sean-k-mooneywell hum no i need to take a break and think about this with a clear head17:28
bauzasme too17:28
bauzasmy hardstop was a softstop17:29
bauzasbut my wife should kill me if I'm still writing in 2 mins17:29
bauzas(me joking tbc)17:29
sean-k-mooneyi think im going to finsih here also :)17:29
*** iurygregory_ is now known as iurygregory18:39

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!