Tuesday, 2023-06-06

gibibauzas, dansmith: nice findings. I'm OK with the privsep fix and you can ping me with the governor check fix when it is available07:57
gibiregarding asserting the use of the decorator during testing we can build a list of filesystem.write(path, data) calls that we know require the privsep decorator and then check in the test that when those calls happen the func has _ENTRYPOINT_ATTR set.08:00
gibi*filesystem.write_sys08:04
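A minimal sketch of the check gibi describes above, not Nova's actual test code. It assumes a stand-in `entrypoint` decorator and marker attribute rather than the real oslo.privsep ones, and `set_offline` / `set_governor` are hypothetical helpers standing in for functions that end up calling filesystem.write_sys on privileged paths.

```python
import functools

# Stand-in for the marker attribute that the privsep decorator sets on
# wrapped functions. The exact name used here is an assumption.
_ENTRYPOINT_ATTR = "privsep_entrypoint"


def entrypoint(func):
    """Hypothetical replacement for a privsep entrypoint decorator."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    setattr(wrapper, _ENTRYPOINT_ATTR, True)
    return wrapper


@entrypoint
def set_offline(core):
    """Hypothetical helper that would write to a privileged sysfs path."""
    return ("online", core, "0")


def set_governor(core, governor):
    """Hypothetical helper where the decorator was forgotten."""
    return ("governor", core, governor)


def missing_entrypoint(funcs):
    """Return the names of functions that lack the privsep marker."""
    return [f.__name__ for f in funcs
            if not getattr(f, _ENTRYPOINT_ATTR, False)]
```

A unit test could keep the list of functions known to need privileges and assert that `missing_entrypoint(...)` returns an empty list.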
songwenpingsean-k-mooney:hi, does live migration pass filters if assigned the host?08:09
opendevreviewBalazs Gibizer proposed openstack/nova master: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/87321608:16
gibisean-k-mooney: This ^^ was already approved but needed a rebase and a unit test fix due to the base changes. Could you check it please?08:17
gibiI did not wait for the author with the rebase and went ahead and fixed up the patch08:17
opendevreviewBalazs Gibizer proposed openstack/nova stable/2023.1: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534308:22
opendevreviewBalazs Gibizer proposed openstack/nova stable/zed: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534408:30
bauzasgibi: sorry for the late reply, but thanks08:33
opendevreviewBalazs Gibizer proposed openstack/nova stable/yoga: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534508:36
opendevreviewBalazs Gibizer proposed openstack/nova stable/xena: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534708:45
songwenpinggibi:morning, does live migration pass filters if assigned the host?08:46
gibisongwenping: it depends. See the doc of the host and the force option in https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#id13108:47
songwenpingwe use the rocky version, and nova-conductor does not find a new destination.08:53
songwenpingand there is a problem, if the vm has affinity, it can be migrated to another host.08:53
opendevreviewBalazs Gibizer proposed openstack/nova stable/wallaby: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534808:57
songwenpingthen if we use the same affinity strategy to create vms, these vms are scheduled to different hosts. gibi, is this reasonable?08:57
gibiif you use affinity strategy then you cannot move the VM. Except if you disable the scheduler via the force flag and an old enough microversion. But if you disable the scheduler then the affinity will not be honored.08:59
gibiIf you need both affinity and move operations then you should use soft-affinity08:59
opendevreviewBalazs Gibizer proposed openstack/nova stable/victoria: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88534909:08
songwenpinggibi, got it, thanks^^09:09
opendevreviewSylvain Bauza proposed openstack/nova master: cpu: make governors to be optional  https://review.opendev.org/c/openstack/nova/+/88535209:57
opendevreviewBalazs Gibizer proposed openstack/nova stable/ussuri: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88535310:04
opendevreviewBalazs Gibizer proposed openstack/nova stable/train: Fix failed count for anti-affinity check  https://review.opendev.org/c/openstack/nova/+/88535510:19
sean-k-mooneygibi: it's working on https://github.com/openstack-k8s-operators/nova-operator/pull/400 by the way10:22
sean-k-mooneywhich ran at 2023-06-05 14:50:2010:23
sean-k-mooneyso it passed yesterday10:23
gibiack10:23
gibiyou oddly switched from downstream slack to upstream irc to write that though :)10:24
sean-k-mooneyoh right downstream is normally on the bottom half of my screen and upstream is the top 10:24
sean-k-mooneythis window is in the wrong place10:25
sean-k-mooneyfixed :)10:25
gibi:)10:25
opendevreviewGorka Eguileor proposed openstack/nova master: Libvirt: remove old discard with virtio log  https://review.opendev.org/c/openstack/nova/+/88535611:07
*** EugenMayer44 is now known as EugenMayer411:21
dvo-plvgibi,bauzas: Hello, Could you please review nova patch: https://review.opendev.org/c/openstack/nova/+/87607511:41
bauzasdvo-plv: sure, I already promised but unfortunately I needed to work on my presentation for the OpenInfra Summit :(11:41
dvo-plvSure, thank you no rush, review according to your plan, I just would like to remind in case request was lost11:44
bauzasI'm really sorry folks but I forgot to tell that today is the spec review day 12:52
bauzas!!!12:52
opendevmeetbauzas: Error: "!!" is not a valid command.12:52
opendevreviewAmit Uniyal proposed openstack/nova master: Reproducer for dangling bdms  https://review.opendev.org/c/openstack/nova/+/88145714:12
opendevreviewAmit Uniyal proposed openstack/nova master: Delete dangling bdms  https://review.opendev.org/c/openstack/nova/+/88228414:12
opendevreviewSylvain Bauza proposed openstack/nova master: cpu: make governors to be optional  https://review.opendev.org/c/openstack/nova/+/88535214:23
dansmithbauzas: are you going to update the reno for the first patch?14:27
mnederlofhi, i've created this bp https://blueprints.launchpad.net/nova/+spec/rbd-allow-glance-image-deletion and the code change required, can someone help with the next steps for review? https://review.opendev.org/c/openstack/nova/+/88459514:32
bauzasdansmith: yeah, I'm just fixing the series14:33
opendevreviewSylvain Bauza proposed openstack/nova master: cpu: fix the privsep issue when offlining the cpu  https://review.opendev.org/c/openstack/nova/+/88529314:37
opendevreviewSylvain Bauza proposed openstack/nova master: cpu: make governors to be optional  https://review.opendev.org/c/openstack/nova/+/88535214:37
bauzasdansmith: gibi: just updated the cpu fixes ^14:38
bauzaselodilles: can you help me ? wanted to propose the train-eol patch but looked at the docs and saw https://docs.openstack.org/project-team-guide/stable-branches.html#end-of-life 14:53
bauzas"point #2 : Remove any related zuul jobs that are defined in other repositories and not needed anymore."14:53
bauzaswdym by that ? 14:53
bauzaslike in tempest ?14:54
elodillesbauzas: any job that nova uses in its .zuul.yaml, but defined outside of nova repository14:59
bauzasI don't see any of them14:59
elodillesfor example if there is let's say nova-special-grenade-train defined in, for example, openstack/grenade repository15:00
elodillesbauzas: if there is none, then you're done with that step ;)15:01
bauzasI'll doublecheck with Gerrit15:01
bauzasgibi: btw. I have an appointment around 20 mins after the start of the meeting, can you chair it ?15:02
gibibauzas: I can try but I'm probably not the best person today as I will be on a flaky connection at that time15:10
bauzasok, I can ask someone else, I just wonder who15:10
bauzaselodilles: want to lead it ?15:10
elodillesbauzas: i'm not feeling quite well, so i'd rather pass this time :/15:15
bauzasokok15:16
bauzasso, we'll try to have a quick meeting then 15:16
elodilles+115:16
gibibauzas: then I will jump in after you need to leave but I don't promise I will not get disconnected at some point :)15:18
dansmithgibi: #chair me and I can recover it if you drop15:23
gibiack15:26
bauzasdansmith: cool thanks for the offer15:38
bauzasshit, my appointment just arrived15:50
bauzasgibi: can you please lead it ?15:50
bauzasthe agenda is done https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting15:50
gibisure15:51
gibiI will do15:51
gibi#startmeeting nova16:00
opendevmeetMeeting started Tue Jun  6 16:00:04 2023 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.16:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:00
opendevmeetThe meeting name has been set to 'nova'16:00
gibi#chair bauzas 16:00
opendevmeetCurrent chairs: bauzas gibi16:00
gibi#chair dansmith 16:00
opendevmeetCurrent chairs: bauzas dansmith gibi16:00
dansmitho/16:00
auniyalo/16:00
elodilleso/16:00
gibibauzas has an appointment so I try to chair this16:01
gibi#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:01
gibi#topic Bugs (stuck/critical)16:01
gibilets see16:01
gibi#info No Critical bug16:01
gibi#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 18 new untriaged bugs (+3 since the last meeting)16:02
gibi#info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster16:02
gibilast week the baton was at bauzas 16:02
gibiso I'm not sure if we have any news from him now16:03
gibithe next on the roster is me16:03
bauzasI'll take it next week 16:04
gibibut I will be mostly away next week16:04
Uggla_o/16:04
bauzasI didn't have time to look at them this week16:04
bauzasDitto due to the summit16:04
gibiso moving down the list the next on it is melwitt 16:04
bauzasBut I can try to look at them16:04
bauzas(sorry on my phone)16:04
gibimelwitt: could you take the baton?16:05
auniyalgibi, last to last week I looked into this bug: https://bugs.launchpad.net/nova/+bug/2018719, I could not reproduce  so added comment to ask for more info16:05
gibiauniyal: ack16:06
gibiI guess logging in to the rescue image depends on the actual image so you are right16:06
gibiI will ping melwitt later about the bug baton16:07
gibiany other bugs we need to discuss?16:07
auniyalnothing from my side, thanks16:08
gibi#topic Gate status 16:09
gibi#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:09
gibi#link https://etherpad.opendev.org/p/nova-ci-failures16:10
dansmithlots of little test failures lately which is making it challenging to get a clean result16:10
dansmithbut nothing outstanding as a super common thing to go tackle that I've seen16:10
gibiI saw two different guest failures, in one case a disk io error16:10
gibithe other was probably some metadata error16:10
gibibut I agree I did not see a pattern yet16:11
dansmithI have seen some IO errors related to volumes yeah, but I don't know what that's coming from16:11
gibiI don't see any new bug reported tagged with gate-failure. If I see a pattern in tomorrows reject then I will file some16:12
gibis/reject/recheck/16:13
bauzashaven't seen any gate failure 16:13
bauzas(still otp)16:13
gibi#link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement16:14
gibiperiodics look good16:14
gibiany other gate issues to raise?16:15
gibi#info Please look at the gate failures and file a bug report with the gate-failure tag.16:15
dansmithnothing from me16:15
bauzasI'm back16:16
gibithen the usual announcement16:16
gibi#info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures16:16
dansmithfwiw,16:16
dansmithI think we're doing quite well on the blind recheck thing.. not that we shouldn't remind people, but we could probably un-ALL-CAPS-ify that now :D16:17
bauzasgibi: wants me to take again the chair seat ?16:17
gibidansmith: cool. I'm OK to uncap it :)16:17
dansmithit's been tracked in the TC meeting and we'16:17
gibibauzas: the chair is yours :)16:17
dansmithwe seem to be settling around pretty good behavior16:17
bauzasdansmith: lol, I'll change it :)16:17
bauzas#topic Release Planning 16:19
bauzas#link https://releases.openstack.org/bobcat/schedule.html16:19
bauzas#info Nova deadlines are set in the above schedule16:19
bauzas#info Nova spec review day today16:19
bauzasas a reminder ^16:19
gibiI deeply missed that :/16:19
bauzastbh, I wasn't able to do my duty but I'll do this later tonight16:19
bauzas(some internal discussion ate my whole afternoon)16:20
bauzasso, yeah, would be nice16:20
gibiI saw that there is a spec proposal for continuing the PCI in placement work16:20
bauzasnothing to tell apart this16:20
bauzasgibi: indeed, someone proposed16:20
gibiI need to review that 16:20
bauzascool16:20
gibibut other can chime in there too :)16:20
bauzasas a reminder, if folks don't have time to review specs today, that's fine (c)16:20
bauzasbut please try to look at them this week16:21
gibithere is always tomorrow :)16:21
bauzasat least before the Summit in case people discuss there16:21
bauzasanyway, good related point, 16:21
bauzas#topic pPTG Planning 16:21
bauzas#info please add your topics and names to the etherpad https://etherpad.opendev.org/p/vancouver-june2023-nova16:21
bauzascrickets in there ^16:21
bauzasso I'll write an -discuss ML thread for this16:22
gibinah, I added one thing now :D16:22
bauzasin case ops or devs want to discuss with us16:22
bauzashehe16:22
bauzas+ I'll tell ops during our forum meet&greet about our PTG16:23
bauzas#info The table #24 is booked for the whole two days. See the Nova community there!16:23
bauzasthat's it16:23
bauzasmoving on16:23
bauzas#topic Review priorities 16:23
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)16:23
bauzas#info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review16:24
bauzas#topic Stable Branches 16:24
bauzaselodilles is maybe afk16:24
bauzasso lemme add his points16:24
bauzas#info stable gates should be OK (from stable/2023.1 to stable/train)16:24
bauzas#info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci16:24
bauzashuzzah for this16:24
bauzasand my point16:24
bauzas#info train-eol patch proposed https://review.opendev.org/c/openstack/releases/+/88536516:24
bauzasI'd appreciate if nova-cores could comment it ^16:25
dansmithI will16:25
* gibi just proposed a train backport today16:25
dansmithI noticed the cinder people are taking a more aggressive approach16:25
bauzasgibi: okay, then -1 my patch and I'll modify it to await for your backport merge16:25
bauzasdansmith: yup, saw it too16:25
gibibauzas: I think I'm OK to drop that backport16:25
gibiI did that to see if the patch works 16:26
bauzasgibi: as you want, just tell me your insights in the train-eol patch 16:26
gibiack16:26
bauzasdansmith: fwiw I'm afraid of EOLing the whole EM branches16:26
dansmithshrug.. my argument for train applies to all the EM ones too16:27
dansmithmaybe we wait and see how it goes for cinder ;)16:27
bauzasbut since we haven't backported the os-brick CVE fix in Ussuri and Victoria, I could understand16:27
bauzasbut yeah, let's see what it's happening for cinder :D16:28
* bauzas takes his popcorn :)16:28
bauzasanyway, moving on16:28
bauzas#topic Open discussion 16:28
bauzasnone in the agenda16:28
bauzasanything that someone wants to tell ?16:28
gibione thing16:30
bauzasshoto16:30
bauzasshoot even16:30
gibithere was a request for opinion about openstack on k8s16:30
gibilet me find the link16:30
sean-k-mooneyits a thing that people do16:30
bauzasokidoki, let's wait16:31
gibi13:01 < mdbooth> I'll be running a forum session on Kubernetes on OpenStack in Vancouver next week. It's for users and developers of all related projects to talk to each other. Etherpad is here if there's anything you'd like to discuss: https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack16:32
sean-k-mooneyI'm wondering why mdbooth is running that session but ok16:32
dansmiththat's k8s on openstack not what you said right?16:32
gibisorry I mixed up16:32
sean-k-mooneythat's k8s on openstack16:32
sean-k-mooneyya16:33
bauzasok, so lemme add the link16:33
sean-k-mooneythat makes more sense why mdbooth is involved16:33
bauzas#link https://etherpad.opendev.org/p/openinfra-2023-kubernetes-on-openstack OpenInfra Forum session for discussing about k8s on openstack16:33
bauzasgibi: the other way btw. :)16:33
gibithere is some nova related question in the etherpad16:33
gibiabout getting notified if anything changed with servers16:34
gibiI offered the nova notification interface16:34
bauzasyeah and it's a public API :)16:34
gibibut apparently they want something public16:34
sean-k-mooneythe notifications are the only interface we have currently 16:34
gibithe API is documented but the message bus access tends to be non-public16:34
bauzassince someone worked on notifications objects like 6 years ago (guess who :p )16:35
gibibauzas: hah :D16:35
bauzasgibi: are you sure that the message bus can't be public ?16:35
dansmithit shouldn't be16:35
gibiyeah16:35
dansmithwe have instance events.. that's what they want I think16:35
sean-k-mooneythe notification bus is semi-privileged16:35
gibiand our notifications tend to contain infra information16:35
dansmithI think they just need an async way to get those16:35
sean-k-mooneythe notifications can have private info in them16:35
sean-k-mooneydepending on what you configure16:36
sean-k-mooneylike the bdms16:36
gibiyeah, bottom line the notification API is designed to be consumed by admins or other openstack services, not endusers16:36
bauzasah they want it to be consumable by endusers ?16:36
dansmithright, it would leak bad things between tenants for sure16:36
dansmithnot just infra things16:36
gibiyeah16:36
bauzasurth16:36
bauzasurgh16:37
sean-k-mooneyyou could have a multi-tenant service that converts the notification into a webhook callback or similar16:37
dansmithinstance events/actions is the right thing I think, it's just only polling currently16:37
sean-k-mooneybut that's still not great16:37
sean-k-mooneydansmith: ya the event stream would work but im not sure that 16:37
gibiyeah, a websocket around instance actions would be nice16:37
sean-k-mooneyeven if it was event based they would want to listen per instance16:37
sean-k-mooneymore like open a websocket and get all instance events for a project?16:37
dansmiththey probably want to be able to register a handler with a scope (one instance, all my instances) that lives for a period of time that we call when there's a new event16:38
sean-k-mooneyor that you are allowed to see based on the scope of the keystone token16:38
bauzasI think everytime someone asks us to monitor some instance action, we tell them 'lookup the notifications'16:38
bauzasbut this is for admin usage16:38
dansmithwebsocket will require a lot of standby resources that I think would be hard for us to manage16:38
gibitrue16:38
dansmithanyway,16:38
bauzasso, they want some enduser public subscription mechanism for asynchronously being notified on my instance state changes ?16:38
dansmithnot sure how many people will be there to make any sort of headway on that topic, since I think those people are likely here :)16:39
dansmithbauzas: yeah16:39
bauzassounds a client thing to mre16:39
bauzasme16:39
gibidansmith: I will be there hence collecting ideas here now :)16:39
sean-k-mooneylets see if they can at least expand on the usecases16:39
dansmithbauzas: a client can poll (or long poll) but that's much less efficient16:39
gibiyeah I will pull out some specific use case and try to limit the scope to something very simple on our side16:39
dansmithespecially when instance actions could be days apart16:40
bauzasbut yeah, someone could provide some tool that would listen to the notification bus and scrambles all the admin-only data16:40
gibibauzas: exactly16:40
bauzassorry, by client I meant something unrelated to nova16:40
sean-k-mooneyso the way to do this in the past was ceilometer put the relevant events in AODH16:40
dansmiththey could, but that's basically re-constructing the tenant isolation that nova already has, so it's a big new surface to secure and new services to run16:40
sean-k-mooneyand then you would set up alarms on the events you cared about16:40
dansmithsean-k-mooney: that's all intended to be operator-focused, not for users to get status/events on their instances right?16:41
sean-k-mooneyno16:41
gibiif they only need a trigger to re-read the instance action API then most of the data can be hidden from our notifications16:41
sean-k-mooneyaodh and ceilometer provided user facing events/metrics16:41
dansmithokay I didn't realize16:41
bauzasyeah16:41
sean-k-mooneythey didn't actually expose the full notification 16:41
bauzasceilometer was the fit16:42
sean-k-mooneyjust instance boot started and instance boot finished events16:42
bauzasand that's why we never had this in nova16:42
dansmithhonestly, I feel like this is probably something nova can/should be doing16:42
dansmithnowadays this is how stuff plugs together16:42
sean-k-mooneyya i think its something we could do 16:42
sean-k-mooneybut we need to think about how16:42
dansmithmaking an external tool reconstruct what we already know is kinda :/16:42
gibiyeah16:43
dansmithit could be a service like console that you run if you want, and scale separately to handle the amount of load you want to tolerate16:43
bauzasdansmith: I'm still struggling to find how we would ensure the tenancy isolation by the message bus, but I'm open to ideas16:43
bauzasunless we create a bus per tenant16:43
sean-k-mooneyif its in nova16:43
dansmithbauzas: we wouldn't?16:43
sean-k-mooneywe can just filter16:43
bauzasI'm maybe misunderstanding the proposal, but I thought we were about saying that we may emit project-related notifications16:44
dansmithnot at the rabbit level16:44
dansmithlet's let gibi collect some data,16:44
dansmithand then we probably need a high-bandwidth conversation about options16:44
gibibauzas: we def need to understand their use case better 16:45
bauzasdansmith: ah ok16:45
bauzasdansmith: then we need to construct some HTTP/2 layer with keystone auth 16:45
bauzasor something like that16:45
dansmithbauzas: not necessarily16:45
dansmithit just depends.. but it should be HTTP-something, either an event stream or callbacks16:45
sean-k-mooneyyou would do something like "openstack project event subscribe (instance.action.)*" which would return a websocket url that would only stream the relevant events for the current project based on the keystone token16:46
dansmithyep, could be something like that16:46
bauzassounds very console-ish16:46
sean-k-mooneyso like the console you would first create it and then get a handle for where to collect the data16:46
bauzasbut okay16:46
gibiI can simplify it down to give me a server uuid and I stream you data about notifications affecting the server, but only with very limited data provided16:46
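The idea discussed here (scope events to one project or server, and strip everything but a minimal trigger) could be sketched roughly as below. This is only an illustration under stated assumptions: the flat `payload` dict and its `project_id` / `uuid` keys are simplified stand-ins, not the real nova notification format, and `to_public_event` is a hypothetical name.

```python
def to_public_event(notification, project_id, server_uuid=None):
    """Reduce an admin-level notification to a minimal user-facing event.

    Returns None when the notification belongs to another project (or to
    another server, if a server filter is given), so nothing leaks
    between tenants.
    """
    payload = notification.get("payload", {})
    if payload.get("project_id") != project_id:
        return None
    if server_uuid is not None and payload.get("uuid") != server_uuid:
        return None
    # Only expose what is needed to trigger a re-read of the instance
    # actions API; host names, bdms, etc. are deliberately dropped.
    return {
        "event_type": notification.get("event_type"),
        "server_uuid": payload.get("uuid"),
    }


# Hypothetical sample notification with admin-only details in it.
sample = {
    "event_type": "instance.reboot.end",
    "payload": {
        "project_id": "p1",
        "uuid": "s1",
        "host": "compute-17",           # infra detail, must not leak
        "block_devices": ["/dev/vdb"],  # infra detail, must not leak
    },
}
```

With this shape the consumer only learns that something happened to a server it owns and then fetches details through the regular API, which already enforces tenant isolation.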
dansmithbauzas: exactly.. it's the same sort of arrangement, and the same target audience16:46
bauzasok, then it sounds we have an agreement on the direction, let's not overpaper the technical details 16:47
sean-k-mooneyand if its a separate binary nova-event-proxy16:47
dansmithgibi: yeah I just don't think you should need 100 websockets you have to read from if you have 100 instances in your wordpress deployment16:47
dansmithsean-k-mooney: right16:47
sean-k-mooneythen its scalability and whether its deployed is up to the operator16:48
dansmithyup16:48
gibidansmith: ahh true, we can do it pre project then16:48
gibiper16:48
bauzasyeah, devil is in the details of the productization16:48
dansmithgibi: or even server group16:48
sean-k-mooneyanyway lets see what they actually bring up16:48
bauzasgibi: honestly, the granularity sounds per project to me16:48
sean-k-mooneyand see if this type of solution would work for them or not16:48
dansmithwell, nfv people are all one project in some cases, so that probably won't work for them16:48
bauzasgibi: are you done with this topic now that we drafted a solution for you ? :D16:49
gibiI'm done16:49
bauzascool16:49
gibithanks for the discussion16:49
gibiI will link this to the etherpad16:49
gibiand I will report back from the summit16:49
bauzascool16:49
bauzasI'll be back watching you at the Summit anyway16:49
bauzasso if you promise too many things, I could yell :p16:49
gibibauzas: please do so16:49
gibi:D16:49
gibiI don't need another 3 years of "notification" work16:50
dansmithheh16:50
dansmithstill got scars eh/16:50
bauzasok, I was balancing the idea to paperwork the scaphandre and manila series but I'm exhausted of today16:50
bauzasso, let's skip it and pretend it will be discussed in two weeks from now16:50
gibidansmith: time makes all these memories nicer and nicer actually16:51
dansmithheh16:51
gibiso bauzas' has a good point watching me :)16:51
dansmithgibi: https://www.youtube.com/watch?v=dLjNzwEULG816:51
bauzasgibi: you're fortunate, canadians don't open carry16:51
bauzassorry, was a terrible joke :)16:52
gibidansmith: I need to check this out after the meeting :)16:52
bauzasanyway, I think we're done for today16:52
gibiindeed16:52
bauzasthanks all16:52
bauzasand thanks gibi for the chair16:52
bauzas#endmeeting16:52
opendevmeetMeeting ended Tue Jun  6 16:52:50 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:52
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.html16:52
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.txt16:52
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2023/nova.2023-06-06-16.00.log.html16:52
bauzasgibi: there you go, you have your logs :p16:53
gibithanks16:54
gibiand it was the moment when my connection dropped first16:54
gibiso it was not that flaky after all16:54
dansmithare we canceling the nova meeting next week?16:55
bauzasdansmith: damn shit, forgot to tell it16:55
bauzasunless people wanna run it, which I'm cool with 16:55
geguileodansmith: sean-k-mooney last week I mentioned the os-brick idempotency and we talked about reconstructing the disk XML on some operations. I've just opened a bug (#2023078) where Nova is not rebuilding the disk XML after block migration.  Don't know nova enough to know if this would be fixed with the changes to the os-brick idempotency thingy.16:56
sean-k-mooneythe xml for live migration is built on the source node based on info passed back from the destination host16:57
sean-k-mooneyfor cold migration it's built on the dest host16:57
sean-k-mooneyi assume you are referring to live-migration local block devices?16:57
geguileosean-k-mooney: the XML after live migration is wrong16:57
sean-k-mooneythat xml must be generated on the source host16:58
geguileosean-k-mooney: it doesn't have the right discard=unmap value even when the destination is saying that it is supported16:58
sean-k-mooneyis it enabled on the source16:58
geguileosean-k-mooney: but if the source didn't support it and now it does? Then it cannot be supported until reboot?16:59
geguileoBecause then I think Cinder should start reporting that everything supports discard...16:59
sean-k-mooneygeguileo: correct16:59
sean-k-mooneyso this is not something that should be changing during a live migrate17:00
geguileook, so then why do we even report the support for this thing?17:00
geguileoWe should always set discard=unmap in the XML17:00
sean-k-mooneywell we cant because our min libvirt/qemu did not support it17:01
geguileoif it works, nice, if it doesn't it will not prevent it from working after a live migration17:01
sean-k-mooneythat may have changed recently but it was a limitation in the past17:01
opendevreviewsean mooney proposed openstack/nova master: Allow discard with virtio-blk  https://review.opendev.org/c/openstack/nova/+/87879517:01
geguileosean-k-mooney: but if it does now then maybe we have to give it another go at this whole thing17:01
sean-k-mooneygeguileo: without ^ we also dont support discard for all disk buses17:01
sean-k-mooneygeguileo: sure but we have to be careful to ensure that the move ops work properly17:02
geguileosean-k-mooney: that patch should be abandoned... I added a comment to the LP bug17:02
sean-k-mooneythat is not your patch17:02
sean-k-mooneyits my patch to fix the bug17:02
sean-k-mooneythat for some reason was not in launchpad17:02
geguileosean-k-mooney: how is it different to mine?17:02
geguileoit's literally the same17:03
geguileoand it doesn't work17:03
sean-k-mooneyi make discard work for disk buses17:03
geguileoit just removes a debug log message17:03
geguileoand the log is correct17:03
geguileodiscard doesn't work with virtio17:03
sean-k-mooneyyes it does17:03
geguileodon't know why17:03
sean-k-mooneyit requires a min version of qemu and libvirt17:03
geguileoI could only make it work with IDE, SCSI, and SATA17:04
sean-k-mooneyi can try and find the downstream bz for virtio-blk again one sec17:04
geguileosean-k-mooney: I have the downstream BZ17:04
geguileosean-k-mooney: I'm just telling you I can't make it work17:04
sean-k-mooneyfor qemu support of virtio-blk17:04
geguileo(maybe I'm dumb)17:04
sean-k-mooney*trim with virtio-blk17:04
geguileosean-k-mooney: sure, it says it supports it... I can't make it work without using IDE, SCSI or SATA17:05
sean-k-mooneywell it worked in our ci17:05
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/879077/117:05
geguileosean-k-mooney: did you actually check that the size was reduced?17:05
sean-k-mooneyno but if it's not, that's a qemu bug17:05
geguileosean-k-mooney: but then the log should remain17:05
sean-k-mooneywhy it would not be correct17:06
geguileojust because fstrim says it has freed space within the guest OS it doesn't mean that it has actually happened17:06
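One rough way to check whether space was really reclaimed for a file-backed disk (for example a qcow2 or raw file on local storage or NFS) is to compare the allocated size on the host before and after the trim. The sketch below assumes a POSIX host where `st_blocks` counts 512-byte units; it does not apply to RBD or iSCSI volumes, and the function names are hypothetical.

```python
import os


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for a file.

    apparent_size is the logical length of the file; allocated_bytes is
    what the filesystem actually stores, assuming st_blocks is reported
    in 512-byte units (true on Linux).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def shrank(before, after):
    """True if the allocated size went down between two measurements."""
    return after[1] < before[1]
```

Measuring with `disk_usage()` on the host before and after running fstrim in the guest, then calling `shrank(before, after)`, tells whether the discard made it through all the layers.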
sean-k-mooneysure there are several layers at play here17:07
geguileosean-k-mooney: oh, I'm sure there are many, but they are beyond my expertise17:07
geguileoI'm just reporting as a storage guy saying, don't know why I can't make it work17:07
geguileolol17:07
sean-k-mooneyso you were using local storage17:07
sean-k-mooneyno cinder17:07
sean-k-mooneybooted a vm17:07
sean-k-mooneyallocated space17:08
geguileoI was booting a VM from RBD, iSCSI, NFS17:08
sean-k-mooneyand then deleted it and did a trim17:08
geguileoThen I did a live volume migration, which made the disk lose sparseness (became thick)17:08
geguileothen I issued the "fstrim -v --all"17:08
sean-k-mooneywell i filed https://bugs.launchpad.net/nova/+bug/2013123 for local storage17:08
geguileosean-k-mooney: yeah, I replied in that LP bug17:09
geguileosean-k-mooney: I'm working on a sparseness document, because this is a CF17:09
geguileo(including the cinder side)17:09
sean-k-mooneywell discard is not just about sparseness17:10
sean-k-mooneybut for what its worth we do not make any statement about sparseness at the api level17:10
dansmithAFAIK, with various file formats you can only expect the used size to decrease if you fully discard a block that covers an extent17:11
geguileosean-k-mooney: yeah, it's also about SSDs optimization, power consumption, etc17:11
dansmithvmware, TMK, only actually reclaims space when the guest is shutdown17:11
geguileodansmith: true, but even then I can clearly see the reduction17:11
geguileoI mean, the size goes down from 1GB to 100MB or so...17:12
sean-k-mooneyso for example i don't know if qcow or raw files will actually reduce space17:12
geguileosean-k-mooney: qcow2 does17:12
sean-k-mooneyfor qcow i would expect it to be reduced, for raw probably not17:12
geguileoif things are set correctly17:12
geguileo(aka all the starts align)17:12
geguileos/starts/stars17:12
geguileoand I can even make NFS/qcow2 and RBD preserve sparseness on live migration17:13
geguileochanging the nova code17:13
sean-k-mooneyhow did you change the nova code17:13
geguileoto use the detect_zeroes feature17:13
geguileoI created this LP bug for that one https://bugs.launchpad.net/nova/+bug/202307917:13
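For context, libvirt exposes both settings as attributes of the disk `<driver>` element (`discard` and `detect_zeroes`). The snippet below is only an illustrative sketch of building such an element, not the change from the bug report or the paste; `driver_element` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def driver_element(fmt, discard=True, detect_zeroes=True):
    """Build a libvirt disk <driver> element as an ElementTree node.

    discard="unmap" passes guest trim requests down to the backing
    storage; detect_zeroes="unmap" additionally turns writes of zeroes
    into unmap requests, which is what would keep a copied disk sparse.
    """
    attrs = {"name": "qemu", "type": fmt}
    if discard:
        attrs["discard"] = "unmap"
    if detect_zeroes:
        # libvirt only honours "unmap" here when discard is also "unmap";
        # otherwise fall back to plain zero detection.
        attrs["detect_zeroes"] = "unmap" if discard else "on"
    return ET.Element("driver", attrs)


element = driver_element("qcow2")
xml_text = ET.tostring(element, encoding="unicode")
```

Since zero detection costs CPU on every write, a real change would probably enable it only around the migration, as noted later in the discussion.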
sean-k-mooneyok then we can add that as a new feature if you can explain what it is17:13
sean-k-mooneythat is not a bug it would be a new feature17:14
sean-k-mooneya small one but it's still a feature17:14
geguileosean-k-mooney: https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/17:14
sean-k-mooneygeguileo: what do you mean by volume in https://bugs.launchpad.net/nova/+bug/202307917:14
geguileosean-k-mooney: but that's not the right code17:14
geguileoit's what I used to test17:14
sean-k-mooneyoh "When doing a live volume migration "17:15
geguileobut that has CPU implications when running, so it's best only to change it when the block migration is going to happen17:15
sean-k-mooneygeguileo: we have a different concept called block-migration in the context of live migrating a vm17:15
dansmith...yeah17:15
geguileosean-k-mooney: yes, I mean block live migration, I've updated the bug name, thanks17:16
dansmithI was going to say, we should never be block migrating a volume17:16
geguileosean-k-mooney: ooooh, then soooooooorry for mixing terms (/me facepalms)17:16
sean-k-mooneygeguileo: block live migration in nova means live migrating a vm with local raw/qcow storage17:16
dansmithit means we literally move all the data17:16
dansmith(all the disk data)17:16
geguileomy bad, I've updated the LP bug17:17
geguileodansmith: we are currently moving ALL the data17:17
sean-k-mooneygeguileo: ya so i now know what you're trying to fix17:17
geguileothat's why the detect_zeroes would be good for volume live migration17:17
sean-k-mooneyso we expect this to be done by cinder using the driver assisted migration feature17:17
geguileosean-k-mooney: thanks for your patience in understanding my ramblings :-)17:17
sean-k-mooneyhowever you want nova to be intelligent enough17:17
sean-k-mooneyso that when we fall back to nova doing the volume migration17:18
sean-k-mooneythat we also preserve the sparseness17:18
geguileosean-k-mooney: problem is that driver assisted migration cannot work between different backends (and afaik it doesn't work for any driver even between volumes of the same array)17:18
geguileosean-k-mooney: afaik today all online volume migrations are done by nova17:18
dansmithwait what?17:18
geguileoI don't think any cinder driver supports it17:18
opendevreviewAmit Uniyal proposed openstack/nova master: Reproducer for dangling bdms  https://review.opendev.org/c/openstack/nova/+/88145717:19
opendevreviewAmit Uniyal proposed openstack/nova master: Delete dangling bdms  https://review.opendev.org/c/openstack/nova/+/88228417:19
sean-k-mooneygeguileo: i was pretty sure you fixed at least one vendor driver last year17:19
dansmithwhy would we ever want to do that? unless you're crossing AZs or something17:19
geguileodansmith: all volumes that are attached to now are migrated by nova17:19
geguileos/now/nova17:19
sean-k-mooneygeguileo: if you change backend right17:19
sean-k-mooneynot just retype17:19
sean-k-mooneywithin the same backend17:19
dansmithretype is a different thing right?17:19
sean-k-mooneyyes17:20
dansmithI'm talking about you live migrate from one server to the next one in the rack, we should not be moving all the volume data to a *new* volume...17:20
geguileodansmith: retype is a different thing, but many times it triggers a migration17:20
geguileodansmith: oh, yeah, not a nova live migration17:20
geguileodansmith: it's a volume live migration17:20
sean-k-mooneyretypes can be within the same backend (just different qos policy) or to a different backend17:20
geguileobasically when you mirror the data from one volume to another17:20
dansmithokay, I guess we're confusing too many things17:21
geguileosean-k-mooney: correct!17:21
sean-k-mooneyso i remember a customer issue like 4-6 months ago where i thought we fixed scaleio or one of the other drivers to explicitly preserve sparseness when doing a driver assisted volume migration17:21
geguileodansmith: what's the right term for moving data from one attached volume to another while the instance is running?17:21
dansmithswap volume I think?17:22
geguileosean-k-mooney: I fixed for offline and to report the value to Nova17:22
dansmiththat's the action we see, AFAIK17:22
sean-k-mooneyon the nova side swap volume is what it's called, yes17:22
geguileodansmith: ok, I'll try to talk about swap volume17:22
sean-k-mooneywell no volume migration is fine17:22
dansmithgeguileo: not trying to make you use our language, I just need to know that a bunch of terms have been re-used :)17:23
sean-k-mooneybut you are asserting that the driver assisted path never works for an online volume migration17:23
geguileodansmith: oh, I prefer to use the right language17:23
geguileosean-k-mooney: I don't think we have any cinder driver capable of doing it...17:23
sean-k-mooneyi see...17:24
sean-k-mooneythat kind of sucks since you have to transfer all the data via the compute node then17:24
dansmithI just don't understand where that happens17:24
geguileomaybe RBD can...17:24
dansmithif the instance is running but nova is transferring the data between volumes.. where in nova is that happening?17:24
geguileosean-k-mooney: agreed, it sucks17:24
geguileodansmith: libvirt/QEMU supports that17:25
geguileoby adding a mirror to the disk17:25
geguileoand once the volumes are mirrored17:25
geguileonova removes the old volume17:25
dansmithah, okay and is that what we're poking via swap?17:25
geguileoI think so17:25
geguileoI believe that's the swap volume17:26
dansmithokay I thought our swap was just "pause, change the connection, unpause"17:26
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#LL2241C16-L2241C1617:26
sean-k-mooneythere i think17:26
dansmithoh okay so it really is us blocking on that action, interesting17:27
geguileosean-k-mooney: sounds about right17:27
sean-k-mooneyi dug into this a few months ago but i have mostly purged that info17:27
dansmithman, that sucks :)17:27
sean-k-mooneyit's actually libvirt doing this17:27
sean-k-mooneybut there are a few flags we can pass17:27
geguileoyeah, Nova just waits for the job to complete17:27
sean-k-mooneyto have it sparsify the zeros17:27
geguileoyeah, it's the detect_zeroes option from https://paste.openstack.org/show/brFgX6MgBlxjgCrE3rbg/17:28
geguileowell, that's the brute force approach for my tests17:28
geguileobecause that enables them ALL the time17:28
geguileowhich sucks17:28
geguileobut I was able to confirm that it works for NFS/qcow2 and RBD17:28
geguileodoesn't work for SCSI devices though17:28
sean-k-mooneywell it depends on if there is any performance overhead to it normally17:28
geguileosean-k-mooney: there is performance overhead17:29
dansmithdefinitely17:29
geguileoso I think it only makes sense when you are going to be reading the whole thing17:29
geguileoand then you save on network + writes17:29
geguileoso only enable it during the volume swap operation17:29
sean-k-mooneywell we can't really change this in response to a volume migration api request17:29
geguileothen disable it17:29
dansmithyou're trading cpu for disk17:30
sean-k-mooneyeven if we could i'm not sure if we should17:30
geguileodansmith: in volume swap, we are trading cpu for network + disk + time17:30
dansmithyeah I meant if it's enabled all the time17:30
dansmithand yeah, network too17:31
sean-k-mooneywell it would only have an effect on live migration and on the initial disk creation17:31
geguileodansmith: I don't think we should enable it all the time, because if the storage supports discard, then the disk will be recovered by periodically calling fstrim like some OSs do17:31
dansmithyeah17:31
geguileobut I think this can greatly improve some volume swap cases17:31
sean-k-mooneywell discard is off by default17:31
sean-k-mooneyand currently only works if you use virtio-scsi which is not our default17:32
geguileosean-k-mooney: yeah, but that's something we have to improve on the cinder side17:32
sean-k-mooneyno i mean on the nova side17:32
geguileoso we properly report the value17:32
sean-k-mooneywe have a config option to opt into allowing discard17:32
sean-k-mooneyand by default we don't17:32
geguileoyeah, but if cinder reports it is supported then nova does the right thing17:32
geguileosean-k-mooney: really?17:32
geguileosean-k-mooney: which one? because I don't recall touching nova conf to enable that one17:33
geguileo(maybe devstack does automatically)17:33
sean-k-mooneyhttps://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard17:33
sean-k-mooneyso that controls if discard works for local disk at least17:34
geguileosean-k-mooney: oh, but for local, not cinder17:34
geguileogood to know about that one17:34
sean-k-mooneyi was under the impression it had an effect for cinder too but i'm not sure17:34
sean-k-mooneygeguileo: so the reason the discard behavior came to my attention17:34
geguileosean-k-mooney: I don't think it does17:34
sean-k-mooneywas i wanted to make discard the default for nova17:34
sean-k-mooneyand found it broke virtio-blk17:34
geguileosean-k-mooney: that would be awesome!!!17:34
sean-k-mooneyso i fixed that17:34
geguileowhat did you fix?17:35
sean-k-mooneyi removed the block based on the qemu min version17:35
sean-k-mooneyand i could boot vms17:35
sean-k-mooneyso what i need to do is reproduce this locally again17:35
sean-k-mooneyand do some manual testing17:35
sean-k-mooneyto confirm if discard with qcow actually works17:36
geguileosean-k-mooney: LVM with LIO doesn't currently support trimming17:36
geguileoit's one of the bugs I have a local fix for17:36
sean-k-mooneyok but that won't affect things right17:36
sean-k-mooneysince that driver won't report discard support17:36
geguileoso you may see fstrim telling you it has recovered space, but it hasn't really17:36
geguileosean-k-mooney: we can ask cinder drivers to report discard support17:36
sean-k-mooneywell again i don't care about the cinder case, i'm trying to fix discard support for non-cinder storage17:37
geguileousing the report_discard_supported backend option17:37
geguileosean-k-mooney: I was digging into the discard case for cinder   lol17:37
geguileoand forgot to look into the ephemeral case17:37
sean-k-mooneyyep i know :)17:37
sean-k-mooneyephemeral means something else in nova17:38
geguileosean-k-mooney: I'm doing a writeup on my findings, so I'll send you the link later so you can add yours as well17:38
sean-k-mooneyacl17:38
sean-k-mooney*ack17:38
geguileosean-k-mooney: dansmith thank you both for your time :-)17:41
dansmithsame :)17:41
carlosso/ bauzas - what are your thoughts on having a follow-up cross-project session between Nova and Manila in the PTG next week? we can use it to chat about gouthamr's specs18:08
opendevreviewMerged openstack/nova master: Add debug logging when Instance raises OrphanedObjectError  https://review.opendev.org/c/openstack/nova/+/88332520:07
