Tuesday, 2024-01-09

<LarsErikP> Happy new year everyone! Any updates on nova zed 26.2.1? https://review.opendev.org/c/openstack/releases/+/899604 08:20
<bauzas> LarsErikP: we need to merge https://review.opendev.org/c/openstack/nova/+/900307/3 first 09:06
<bauzas> sean-k-mooney: could you please +2 this backport ? 09:06
<bauzas> https://review.opendev.org/c/openstack/nova/+/900307/3 09:06
<tobias-urdin> had an issue in december where a nova-compute was installed with the same hostname and the nova-compute service happily started even though placement RP creation got a conflict 09:43
<tobias-urdin> imo nova-compute should fail on startup if placement RP create gets a conflict 09:43
<opendevreview> melanie witt proposed openstack/nova master: libvirt: Introduce support for rbd with LUKS  https://review.opendev.org/c/openstack/nova/+/889912 09:43
<tobias-urdin> so this: https://review.opendev.org/c/openstack/nova/+/904381 - but that changes all exceptions, see the child patch on that for just catching ResourceProviderCreationFailed 09:43
<tobias-urdin> which approach is acceptable? 09:44
<opendevreview> Amit Uniyal proposed openstack/nova stable/yoga: Fix the PCI device capability dict creation  https://review.opendev.org/c/openstack/nova/+/899252 10:05
<opendevreview> Amit Uniyal proposed openstack/nova stable/yoga: Store pf_mac_address and vf_num in extra_info  https://review.opendev.org/c/openstack/nova/+/899253 10:05
<opendevreview> Amit Uniyal proposed openstack/nova stable/yoga: Translate VF network capabilities to port binding  https://review.opendev.org/c/openstack/nova/+/899254 10:05
<zigo> stephenfin: Hi there! Do you agree with the new version of my patch? https://review.opendev.org/c/openstack/keystoneauth/+/903860 10:36
<stephenfin> zigo: wrong channel, but comment left on same 11:41
<sean-k-mooney> zigo: for what it's worth the current version of the patch looks ok to me 12:24
<sean-k-mooney> bauzas: i'll review it now 12:25
<sean-k-mooney> bauzas: done 12:28
<tobias-urdin> sean-k-mooney: can you check above - would like your feedback on it and i'll whip up a correct patch based on that 12:33
<sean-k-mooney> you started 2 nova compute services with the same host name 12:34
<sean-k-mooney> and the placement rp failed to be created because it was using a different service uuid 12:34
<sean-k-mooney> that should only be possible if they are in different cells today 12:35
<sean-k-mooney> the first one to start will register itself as a service, and write the uuid to /var/lib/nova/compute_node 12:36
<sean-k-mooney> the second one will find the service and not the file containing the uuid and will fail to start 12:36
<tobias-urdin> kind of, the original nova-compute was already removed, new node was added with same hostname before it was removed, but nova-compute reported up even though RP creation failed 12:40
<tobias-urdin> i would expect nova-compute to fail if RP could not be created, i.e. wrong uuid = conflict 12:41
<tobias-urdin> s/removed/stopped/g 12:41
<sean-k-mooney> the current expectation is that it will eventually be created 12:41
<sean-k-mooney> but it sounds like you incorrectly replaced the node 12:42
<sean-k-mooney> if you are replacing the node and keeping the same identity/hostname 12:42
<sean-k-mooney> then you should not do a compute service delete 12:42
<sean-k-mooney> as part of the replacement 12:42
<sean-k-mooney> and you should take the old compute_node uuid and put it in the compute node file 12:43
<tobias-urdin> yes automation/manual steps failed for removing the nova service record (and thus RP) before installing new node, but issue would be caught if nova-compute failed to start so that it reported as down 12:43
<sean-k-mooney> right but what i'm saying is the logic is wrong 12:43
<sean-k-mooney> there are 2 ways to do that replacement 12:43
<sean-k-mooney> you can scale in (fully deleting the node in nova/neutron/placement) and then scale out 12:44
<sean-k-mooney> if you do that there is a strict ordering requirement 12:44
<sean-k-mooney> the name cannot be reused until it's fully deleted 12:44
<sean-k-mooney> or you can replace the node without deleting the compute/neutron agent services 12:44
<sean-k-mooney> which is how we normally recommend you replace a node when maintaining its identity 12:45
<tobias-urdin> normally we just delete everything but this time somebody forgot the compute service delete step, i would like to protect operators from having a nova-compute reporting up when it's not working 12:45
<sean-k-mooney> ... 12:46
<tobias-urdin> fail on nova-compute startup if RP creation failed 12:46
<tobias-urdin> like neutron-ovs-agent is looping forever and not reporting up if br-int or local_ip is wrong 12:46
<tobias-urdin> safety first ¯\_(ツ)_/¯ 12:47
<sean-k-mooney> so the problem is that we expect to be able to start the compute agents before placement is up 12:47
<sean-k-mooney> and for them to eventually register when it becomes available 12:47
<sean-k-mooney> that would still be true if you have something like systemd 12:47
<sean-k-mooney> that will restart nova-compute externally 12:48
<tobias-urdin> but if we explicitly fail on placement RP conflict then? would that work 12:48
<sean-k-mooney> so your change might be ok 12:48
<sean-k-mooney> it would only work if you have an external service restarting it 12:48
<sean-k-mooney> i'm not necessarily against this change 12:49
<sean-k-mooney> i'm a little wary of just doing it for any exception 12:50
<sean-k-mooney> tobias-urdin: the up status of a compute service is primarily meant to signal that it's able to communicate via rpc with the hypervisor 12:51
<sean-k-mooney> i say it that way as we have specific logic to mark it down if the connection to libvirt is lost 12:51
<sean-k-mooney> it's technically not meant to be a healthcheck 12:52
<sean-k-mooney> so i'm a little concerned about scope creep 12:52
<sean-k-mooney> with that said you are stopping the agent if it fails 12:53
<sean-k-mooney> so the meaning of up technically has not been changed by this patch 12:53
<tobias-urdin> yeah i understand, for us basically since we don't verify anything in placement specifically with monitoring, the nova-compute was up but it never scheduled any workload there 12:54
<tobias-urdin> it's a messy edge case but if we get ResourceProviderCreationFailed and it's a conflict we could perhaps just fail on startup to not report up 12:55
<sean-k-mooney> for what it's worth this is why i'm working on adding a per-process health check endpoint 12:55
<tobias-urdin> so it took us a day or two before we noticed it 12:55
<tobias-urdin> yea 12:55
<sean-k-mooney> so i think i would be happier if you added a new except for ResourceProviderCreationFailed 12:55
<sean-k-mooney> rather than doing this for just Exception 12:55
<sean-k-mooney> the other thing i'm thinking about is 12:56
<sean-k-mooney> how does your change work with ironic 12:56
<sean-k-mooney> i'm not explicitly aware of an issue with ironic but each ironic agent is managing multiple compute nodes (1 per ironic node) each with their own RP 12:57
<sean-k-mooney> in its current form if there was an error for any of the ironic RPs it would cause the ironic compute agent to exit 12:58
<sean-k-mooney> so we may want to skip this if it's ironic 12:58
<sean-k-mooney> tobias-urdin: what release are you running by the ay 13:01
<sean-k-mooney> *way 13:01
<tobias-urdin> mix of yoga and zed, but nova is yoga 13:05
<sean-k-mooney> ack so ya you don't have the stable uuid feature 13:05
<sean-k-mooney> which would catch this. 13:05
<tobias-urdin> didn't think about the ironic use-case 13:06
<tobias-urdin> would nova-compute service be down in that case? 13:06
<sean-k-mooney> so using the normal meaning of up/down it should be up if it can communicate with the ironic rest api and it's accessible over the rpc bus 13:07
<sean-k-mooney> even if one of the ironic computes is not usable 13:07
<sean-k-mooney> compute nodes do not have an up or down state 13:07
<sean-k-mooney> only compute services do 13:07
<tobias-urdin> sorry, with stable uuid feature 13:08
<sean-k-mooney> oh for stable uuid we explicitly don't use the feature with ironic 13:08
<sean-k-mooney> we put if checks in the code to skip it if it's ironic because ironic is special 13:08
<tobias-urdin> i mean if we were on newer release with stable uuid feature would nova-compute service fail to start and not report itself as up if there was a conflict like this? 13:08
<sean-k-mooney> it will be less special once the peer_list and hashring is removed 13:08
<sean-k-mooney> yes 13:09
<sean-k-mooney> in antelope it should not start for libvirt 13:09
<sean-k-mooney> to reuse the same hostname you now need to create /var/lib/nova/compute_node with the old compute node uuid 13:09
<tobias-urdin> then imo i will just drop the patches and ignore the issue until we are there, it's an edge case that at best helps operators know they have done something wrong (which imo is good, but prob not worth the work here) 13:10
<sean-k-mooney> the placement RP is created using the hypervisor_hostname as the RP name and the compute node uuid as the rp uuid 13:10
<sean-k-mooney> i prefer your second patch by the way 13:10
<sean-k-mooney> i would not be against that approach as a backport pre antelope 13:10
<sean-k-mooney> if it also didn't raise for ironic 13:11
<tobias-urdin> ack, thx for your time! 13:13
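Editor's sketch of the approach discussed above (fail startup only on ResourceProviderCreationFailed, keep retrying on other errors, and never raise for ironic). This is a minimal illustration, not the code from either review: the exception class is a local stand-in for the nova exception of the same name, and `update_provider_at_startup`, `create_rp` and `driver_name` are hypothetical names invented for the example.

```python
class ResourceProviderCreationFailed(Exception):
    """Local stand-in for the nova exception of the same name."""


def update_provider_at_startup(create_rp, driver_name):
    """Try to create the resource provider once at service startup.

    Returns True when the RP was created and False when creation should
    simply be retried later. Re-raises ResourceProviderCreationFailed for
    non-ironic drivers so the service exits instead of reporting "up".
    """
    try:
        create_rp()  # hypothetical callable that talks to placement
        return True
    except ResourceProviderCreationFailed:
        if driver_name == "ironic":
            # One ironic service manages many RPs; a single failing RP
            # should not take the whole service down.
            return False
        raise
    except Exception:
        # Anything else (for example placement not being up yet) keeps
        # the existing "eventually registers" behaviour.
        return False
```

With this shape, a conflict on a libvirt host stops the service so an external supervisor such as systemd can surface the failure, while a placement outage is still tolerated.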
<opendevreview> Merged openstack/nova stable/xena: Unify placement client singleton implementations  https://review.opendev.org/c/openstack/nova/+/858999 13:22
<opendevreview> Merged openstack/nova stable/zed: add a regression test for all compute RPCAPI 6.x pinnings for rebuild  https://review.opendev.org/c/openstack/nova/+/900307 13:31
<opendevreview> Rajesh Tailor proposed openstack/nova-specs master: List requested availability zones  https://review.opendev.org/c/openstack/nova-specs/+/904368 13:41
<ykarel> hi, is this issue known? https://bugs.launchpad.net/nova/+bug/2045785 13:44
* bauzas reminder : nova meeting in 1.5h here 14:31
<sean-k-mooney> ykarel: maybe a timeout issue with the tempest test 14:33
<sean-k-mooney> i know there is at least one unfixed tempest bug 14:34
<sean-k-mooney> where tempest is not correctly using the instance action events api to determine if volume detach? has completed 14:34
<sean-k-mooney> so perhaps this test is not waiting properly 14:34
<dansmith> bauzas: FYI, I have a dentist appointment so I'll miss the meeting 15:01
<bauzas> ++ hope you'll be all good :) 15:01
<kashyap> dansmith: Good luck with the dentist :) 15:58
<bauzas> #startmeeting nova 16:00
<opendevmeet> Meeting started Tue Jan  9 16:00:59 2024 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00
<opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 16:00
<opendevmeet> The meeting name has been set to 'nova' 16:00
<bauzas> heyho 16:01
<bauzas> happy new year folks ! 16:01
<elodilles> o/ 16:01
* kashyap waves hi 16:01
<grandchild> o/ 16:01
<elodilles> happy new year o/ 16:01
<bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting 16:01
<opendevreview> Rajesh Tailor proposed openstack/nova master: Add support for showing requested az in output  https://review.opendev.org/c/openstack/nova/+/904568 16:02
<fwiesel> o/ 16:02
<kashyap> bauzas: I had one more item to bring up, but didn't add it to the wiki (bad me) 16:03
<bauzas> ok let's start our first meeting of 0x7E8 16:03
<bauzas> kashyap: no worries, I'll ping you at the end 16:03
<bauzas> #topic Bugs (stuck/critical) 16:03
<bauzas> #info No Critical bug 16:03
<bauzas> woohoo 16:03
<bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 43 new untriaged bugs (-1 since the last meeting) 16:04
<bauzas> thanks folks ! 16:04
<bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster 16:04
<bauzas> #info bug baton could be Uggla 16:04
<bauzas> any bug reports you would want to discuss ? 16:04
<bauzas> looks none 16:05
<bauzas> #topic Gate status 16:05
<bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:05
<bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal 16:05
<bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status 16:05
<bauzas> all green 16:06
<bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag. 16:06
<bauzas> I've seen some post-failure on the zed stable 16:07
<bauzas> but apart from this, nothing else 16:07
<bauzas> anything else ? 16:08
<bauzas> looks not 16:10
<bauzas> #topic Release Planning 16:10
<bauzas> #link https://releases.openstack.org/caracal/schedule.html#nova 16:10
<bauzas> IMPORTANT : 16:10
<bauzas> #info Caracal-2 (and spec freeze) milestone this Thursday 16:10
<bauzas> this means that after Thursday, we would not accept new specs 16:10
<elodilles> note: there is only os-vif with changes worth releasing at Caracal-2 it seems: https://review.opendev.org/c/openstack/releases/+/904927 16:11
<bauzas> I'll try to look at all the already open specs but we already have a lot of accepted specs for this cycle : https://blueprints.launchpad.net/nova/2024.1 16:11
<bauzas> elodilles: I've seen the email, I'll look at it 16:11
<elodilles> bauzas: ++ 16:12
<bauzas> anything people want to discuss about open specs ? 16:14
<bauzas> looks not 16:15
<bauzas> moving on 16:15
<bauzas> #topic Review priorities 16:15
<bauzas> #link https://etherpad.opendev.org/p/nova-caracal-status 16:16
<bauzas> folks, please look at the etherpad 16:16
<bauzas> I've added all the blueprints in it 16:16
<bauzas> (I actually forgot to add the last blueprint we agreed) 16:17
<kashyap> Thanks for the link; /me bookmarks 16:18
<bauzas> voila 16:18
<bauzas> moving on 16:18
<bauzas> #topic Stable Branches 16:18
<bauzas> elodilles: ? 16:19
<elodilles> #info stable gates are not blocked but patches on older branches can have many rechecks until they get merged 16:19
<elodilles> the usual, i would say. 16:19
<elodilles> #info nova stable Zed release: https://review.opendev.org/c/openstack/releases/+/899604 16:19
<elodilles> this is waiting for one last patch to merge afaik ^^^ 16:19
<bauzas> indeed 16:19
<elodilles> +1 16:19
<elodilles> #info according to TC's plan (train and) ussuri branches are proposed to be End of Life if a team is OK with that: https://review.opendev.org/c/openstack/releases/+/903278 16:20
<sean-k-mooney> yep 16:20
<sean-k-mooney> can we proceed with that 16:20
<bauzas> I'll look at the Ussuri EOL change and if everything is OK,  I'll +1 it 16:21
<elodilles> the patch needs a PTL / release liaison +1 16:21
<elodilles> otherwise i think it's OK 16:21
<elodilles> bauzas: thx 16:21
<elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci 16:21
<elodilles> and that's all from me 16:21
<bauzas> cool 16:21
<bauzas> thanks elodilles for reporting 16:22
<bauzas> #topic vmwareapi 3rd-party CI efforts Highlights 16:22
<fwiesel> #Info Access to logs was blocked automatically thanks to corporate policy. Working on exemption. 16:22
<bauzas> cool 16:22
<fwiesel> So, the link I provided was promptly automatically shut down. Nothing says IaaS like introducing a ticket process on top of it. 16:22
<fwiesel> #Info Limited external network access from test infrastructure with allow-list (currently: opendev.org, pypi.org). 16:22
<fwiesel> The test infrastructure is now fairly locked down, we probably need to figure out what we have to add to the allow-list. 16:23
<fwiesel> But at least, that way we feel more comfortable running essentially random stuff from the internet. 16:23
<fwiesel> That's it more or less from my side. Any questions? 16:23
<bauzas> nope for me :) 16:24
<bauzas> thanks for reporting :) 16:24
<fwiesel> You're welcome. And happy new year. 16:24
<bauzas> you too 16:24
<bauzas> #topic Open discussion 16:24
<auniyal> bauzas, I have one bug to discuss/share 16:24
<auniyal> we got the bug w.r.t clean dangling volume patch 16:25
<auniyal> https://bugs.launchpad.net/nova/+bug/2048154 16:25
<auniyal> https://bugs.launchpad.net/nova/+bug/2048184 16:25
<bauzas> nothing in the wikipage, but kashyap had one point 16:25
<kashyap> Yeah 16:25
<kashyap> So it's about this specless blueprint - a suggestion that came from sean-k-mooney: https://blueprints.launchpad.net/nova/+spec/allow-disabling-ephemeral-disk-formatting 16:25
<kashyap> The main motivation (as I understand it) seems to be the following: 16:25
<kashyap> (1) To explore removal of libguestfs / and unneeded dependencies 16:26
<kashyap> (2) Doing so will help reduce security/package footprint from containers that might be shipping nova-compute. 16:26
<kashyap> Today libguestfs' usage seems to be in the area of file injection and ephemeral disk creation. 16:27
<bauzas> to be clear, the default would be "formatted", right? 16:27
<kashyap> I think so, yes.  I need to check. 16:27
<kashyap> bauzas: That said, putting my upstream hat on: 16:27
<bauzas> we can't really change our current behaviour so I'd prefer to keep the current usage by default 16:28
<bauzas> but of course, if operators and products want to not use libguestfs, they can change the option value 16:28
<kashyap> We should also bear in mind that libguestfs provides many other "adjacent tools" (e.g. an incomplete list: https://paste.opendev.org/show/bfNahtW5QJVeYSQUNaFH/) that operators might be using. 16:29
<bauzas> there could be some nits on the values of the items for this config option 16:29
<kashyap> (These "adjacent tools" are all to do with disk images, etc.  They're extremely useful.  So even for downstreams I'd be wary of suggesting to 'drop' libguestfs.) 16:29
<bauzas> but that looks to me like an implementation question, not a design one 16:29
<kashyap> bauzas: Yeah, the nits could be worked out.  But yeah, the devil is in the implementation indeed. 16:30
<bauzas> kashyap: do you remember the current defaults we have for ephemeral disks ? ext4 ? 16:30
<kashyap> Quickly checking... 16:31
<bauzas> this would mean that we would need to provide all the supported disk formats in the config option items 16:31
<bauzas> maybe people don't want this, right? 16:31
<kashyap> (I'd think so, it'd be one of ext3) 16:31
<bauzas> are we able to format our disks with something other than the default format ? 16:31
<bauzas> I can quickly look at the code 16:32
<kashyap> (There are several places where CONF.default_ephemeral_format is used - I guess you're looking for this?) 16:34
<bauzas> nope, I just wonder about the negative value of 'unformatted' which then would be the default 16:37
<kashyap> bauzas: The current default is ext4, as I see it 16:38
<bauzas> everything seems to be handled by this module https://github.com/openstack/nova/blob/master/nova/virt/disk/vfs/guestfs.py 16:38
<kashyap> It depends on what you mean by "everything" :) I think you mean everything related to libguestfs.  But we don't have to get into the details of it here. 16:39
<bauzas> anyway, I'm OK to approve this specless blueprint but we'll need to make sure that the default behaviour continues to format the ephemeral disk 16:39
<bauzas> sean-k-mooney: any concern with what I said ? 16:40
<kashyap> Please add that remark in the BP.  I still need to explore what makes sense here 16:40
<kashyap> But also I need to hear if anyone actually needs this.  We shouldn't do needless work 16:40
<kashyap> I have a hard stop in 5 minutes; afraid.  I need to visit a friend in a hospital.  Can discuss the rest afterwards 16:41
<bauzas> anyone not wanting to accept this blueprint ? 16:41
<bauzas> if not, 16:41
<bauzas> #agreed https://blueprints.launchpad.net/nova/+spec/allow-disabling-ephemeral-disk-formatting is approved as specless, provided it defaults to formatting the disks 16:42
<bauzas> kashyap: ^ 16:42
<kashyap> Okay; for now we can go ahead; thank you 16:43
<bauzas> ++ 16:44
<bauzas> auniyal: you wanted to discuss a specific bug ? 16:44
<auniyal> yeah - we got this bug https://bugs.launchpad.net/nova/+bug/2048154 16:45
<auniyal> similar to this - https://bugs.launchpad.net/nova/+bug/2048184 16:45
<auniyal> I have created a patch - https://review.opendev.org/c/openstack/nova/+/904817 16:45
<bauzas> add your patch proposals into https://etherpad.opendev.org/p/nova-caracal-status 16:46
<auniyal> sean-k-mooney for " source_type: snapshot and source_type:image" 16:46
<auniyal> is it snapshot image 16:46
<auniyal> I created a snapshot of a VM, then a VM from that snapshot, but their source_type was the same as the original VM 16:47
<bauzas> auniyal: add your patches into https://etherpad.opendev.org/p/nova-caracal-status#L114 so people could review them 16:47
<auniyal> ack bauzas 16:47
<bauzas> I think we're done for the meeting 16:48
<bauzas> auniyal: we don't really discuss bugs in the nova meeting 16:48
<bauzas> but if you want, please ping other folks after we end this meeting 16:48
<auniyal> ack 16:48
<auniyal> bauzas, I added the patch in etherpad 16:49
<bauzas> ok, thanks all then 16:49
<bauzas> and happy new year again 16:50
<bauzas> #endmeeting 16:50
<opendevmeet> Meeting ended Tue Jan  9 16:50:04 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) 16:50
<opendevmeet> Minutes:        https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-09-16.00.html 16:50
<opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-09-16.00.txt 16:50
<opendevmeet> Log:            https://meetings.opendev.org/meetings/nova/2024/nova.2024-01-09-16.00.log.html 16:50
<elodilles> thanks o/ 16:50
<opendevreview> Amit Uniyal proposed openstack/nova master: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905134 17:00
<sean-k-mooney> auniyal: i was referring to a volume created from a volume snapshot 17:01
<sean-k-mooney> bauzas: sorry i was on another call 17:02
<sean-k-mooney> bauzas: but no, i don't have an objection to https://blueprints.launchpad.net/nova/+spec/allow-disabling-ephemeral-disk-formatting being specless 17:02
<sean-k-mooney> bauzas: and ya the default should not change from today 17:03
<sean-k-mooney> you should just have the option of configuring it to be unformatted. and not require libguestfs as a result 17:03
<sean-k-mooney> but no change of behavior by default 17:03
<opendevreview> Amit Uniyal proposed openstack/nova stable/2023.2: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905135 17:05
<opendevreview> Amit Uniyal proposed openstack/nova stable/2023.1: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905137 17:07
<opendevreview> Amit Uniyal proposed openstack/nova stable/zed: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905138 17:08
<bauzas> sean-k-mooney: my only implementation concern is what we would provide as an option value for continuing to format the disk 17:15
<bauzas> should it be "formatting" or "ext4" or something else 17:16
<sean-k-mooney> well the proposal is to extend the existing enum value with unformatted 17:19
<sean-k-mooney> and leave the existing behavior unchanged 17:20
<sean-k-mooney> the current formatting is driver specific 17:20
<sean-k-mooney> and we also look at the OS_TYPE metadata if set on the instance image 17:20
<sean-k-mooney> for linux i believe we default to ext4 but use fat32 or ntfs for windows 17:21
<sean-k-mooney> none of that behavior needs to change 17:21
<sean-k-mooney> we just need to add unformatted to https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_ephemeral_format 17:21
<sean-k-mooney> and yes so looking at the docs windows guests use ntfs by default 17:22
<sean-k-mooney> so the default will still be None meaning the virt driver chooses 17:22
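Editor's sketch of the selection logic described above, assuming the proposed "unformatted" value is added to default_ephemeral_format. It is an illustration only, not nova's implementation: `pick_ephemeral_format`, the mapping and the "vfat" fallback for unknown OS types are assumptions made for the example, and "unformatted" is the proposed value, which does not exist yet.

```python
from typing import Optional

# Per-OS defaults as stated in the discussion (linux -> ext4, windows -> ntfs).
DEFAULT_FS_BY_OS_TYPE = {"linux": "ext4", "windows": "ntfs"}
FALLBACK_FS = "vfat"          # assumption for this sketch only
UNFORMATTED = "unformatted"   # proposed new value, hypothetical


def pick_ephemeral_format(configured: Optional[str],
                          os_type: Optional[str]) -> Optional[str]:
    """Return the filesystem to create, or None to leave the disk raw."""
    if configured == UNFORMATTED:
        # Opt-in: skip formatting, so libguestfs is not needed for it.
        return None
    if configured is not None:
        # An explicit operator choice wins over per-OS defaults.
        return configured
    # configured is None: unchanged behaviour, chosen from the OS type.
    return DEFAULT_FS_BY_OS_TYPE.get(os_type, FALLBACK_FS)
```

Leaving the option unset keeps today's behaviour; only an explicit "unformatted" changes what the guest sees.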
<sean-k-mooney> bauzas: on a related note i want to remove some config options that should have been removed a long time ago but i need to deprecate them first. 17:23
<sean-k-mooney> obviously we need to agree to do that but is that something you're happy to agree on via gerrit or do you want me to file something 17:24
<sean-k-mooney> i want to force https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.vif_plugging_is_fatal to be true and remove it as an option or at least move it to workarounds 17:24
<sean-k-mooney> you should not run with vif_plugging_is_fatal=false in production 17:25
<sean-k-mooney> ever 17:25
<sean-k-mooney> so i want to deprecate that in this release and then consider removing it next cycle 17:25
<opendevreview> Amit Uniyal proposed openstack/nova master: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905134 17:26
<opendevreview> Amit Uniyal proposed openstack/nova master: Refactor vf profile for PCI device  https://review.opendev.org/c/openstack/nova/+/905134 17:35
<opendevreview> Merged openstack/nova stable/xena: Avoid n-cond startup abort for keystone failures  https://review.opendev.org/c/openstack/nova/+/859000 18:40
<sean-k-mooney> bauzas: can you review this spec tomorrow its trivial but required as its an api change https://review.opendev.org/c/openstack/nova-specs/+/904368 19:04
<opendevreview> Merged openstack/nova stable/zed: Fix rebuild compute RPC API exception for rolling-upgrades  https://review.opendev.org/c/openstack/nova/+/900341 20:38
