Tuesday, 2023-10-24

01:10 <sean-k-mooney[m]> simondodsley: one thing to keep in mind is that OpenStack, and Nova in particular, is not an enterprise virtualisation platform and VMware is not a competitor; AWS or Azure is a better comparison
01:12 <sean-k-mooney[m]> ignoring the fact that we are likely to remove the VMware driver shortly, VMware is one of the backend hypervisors you can use with Nova. but Nova is a cloud compute project, so feature parity with enterprise virtualisation software is not a goal for Nova
07:10 <opendevreview> Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Install lxml before we need it in post-run  https://review.opendev.org/c/openstack/nova/+/899142
07:10 <opendevreview> Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Revert "Add upgrade check for compute-object-ids linkage"  https://review.opendev.org/c/openstack/nova/+/898953
08:16 <bauzas> fwiw, I'm more than happy to welcome volunteers wanting to work on https://blueprints.launchpad.net/nova/+spec/detach-boot-volume, in particular if this feature is important for their users
08:17 <bauzas> I can offer Nova onboarding and knowledge transfer if that helps
08:17 <bauzas> I'm paid for leading a community, but I need this community to act by itself :)
08:41 <bauzas> simondodsley: fwiw, we have a PTG operator hour proposed today at 1500 UTC https://ptg.opendev.org/ptg.html
08:41 <bauzas> simondodsley: if you feel you'd like to discuss your feature request, feel free to join
08:42 <noonedeadpunk> hey folks. I get a weird error on HEAD of stable/2023.1 when trying to rebuild an instance: https://paste.openstack.org/show/biUIcOzMCx0YlsFob2KK/
08:42 <bauzas> and like I said, if I can help someone volunteering, I'll surely offer onboarding
08:43 <bauzas> noonedeadpunk: eek
08:43 <noonedeadpunk> I decided to ask whether it's something already known, so that I won't spend time looking deeper (to me it's not very obvious from the trace)
08:46 <noonedeadpunk> yeah, detach-boot-volume is a really interesting thing, though I recall it being not as trivial as it might look at first
08:46 <bauzas> noonedeadpunk: this is an RPC call
08:46 <noonedeadpunk> ok, so I should check on... conductor, I guess
08:47 <bauzas> this sounds like a compat issue between conductor and compute
08:47 <bauzas> basically your compute expects some argument
08:47 <bauzas> that conductor isn't passing
08:47 <bauzas> I guess you upgraded?
08:48 <bauzas> do you have a rolling upgrade in progress?
08:50 <bauzas> noonedeadpunk: that's the client RPC code that does the RPC negotiation https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L1106-L1110
08:50 <bauzas> this client code is run by the nova-conductor service
08:50 <bauzas> or by a compute service if this is a compute-to-compute call
08:50 <noonedeadpunk> Yeah, I am. And I see version 66 for every service in the `services` table of the cell db
08:51 <noonedeadpunk> oh, wait, a couple of computes are still 60... Hm
08:51 <noonedeadpunk> Not the affected one though, but still something I need to fix
08:52 <bauzas> do you have some rpc pins?
08:54 <noonedeadpunk> nah, I don't
08:54 <noonedeadpunk> Or well, I shouldn't have - I never set them
08:54 <bauzas> https://github.com/openstack/nova/blob/stable/2023.1/nova/conductor/manager.py#L1395-L1414 is where conductor calls compute for rebuilding
08:55 <noonedeadpunk> Ok, the computes that have the earlier version are down right now...
08:55 <bauzas> the conductor manager then invokes compute.rpcapi's rebuild_instance(), which does the RPC negotiation
08:55 <bauzas> and then removes target_state if computes are older than RPC 6.2
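The negotiation pattern described above can be sketched as follows. This is an illustrative simplification, not nova's exact `rpcapi.py` code: the client caps the call at the highest RPC version the remote side supports and drops kwargs that only newer versions understand.

```python
def negotiate_rebuild_args(can_send_62, **kwargs):
    """Cap a rebuild call at RPC 6.1 and drop the 6.2-only
    'target_state' kwarg when the remote compute is too old.

    Illustrative sketch: names and signature are not nova's API.
    """
    version = '6.2'
    if not can_send_62:
        # Older computes don't know about target_state; sending it
        # would fail remotely with an unexpected-argument error.
        kwargs.pop('target_state', None)
        version = '6.1'
    return version, kwargs

# A 6.2-capable compute keeps the argument...
v, args = negotiate_rebuild_args(True, target_state='active')
assert v == '6.2' and args['target_state'] == 'active'
# ...while an older one has it stripped before the cast.
v, args = negotiate_rebuild_args(False, target_state='active')
assert v == '6.1' and 'target_state' not in args
```

The failure discussed below happens when the two sides disagree: the client strips the argument, but the server-side handler still requires it.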
08:57 <noonedeadpunk> bauzas: is there any easy way to see the reported RPC version? because I guess what's in nova.services is a different version rather than the RPC one.
08:58 <bauzas> noonedeadpunk: https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880 is the commit that adds RPC 6.2
08:58 <bauzas> noonedeadpunk: good question
08:58 <bauzas> I'd say internal RPC versions aren't meant to be exposed, but we provide aliases
08:59 <bauzas> https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L428
08:59 <bauzas> zed computes expose 6.1, antelope ones 6.2
09:00 <noonedeadpunk> I guess I was more interested in checking what computes/conductor report as their versions
09:00 <bauzas> if both your conductor *and* computes are 6.2, then try pinning the rpc version
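For context, "pinning the rpc version" here refers to nova's `[upgrade_levels]` configuration section; a minimal sketch, assuming you want to cap conductor-to-compute calls at the Zed level during a rolling upgrade:

```ini
[upgrade_levels]
# Cap conductor->compute RPC at 6.1 (Zed) until every nova-compute
# is upgraded; 'auto' instead negotiates the cap from the minimum
# service version recorded in the database.
compute = 6.1
```

Remove the pin and restart the services once every compute is upgraded.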
09:00 <bauzas> yeah, got it
09:00 <bauzas> the service version should give you that info, though
09:00 <noonedeadpunk> so if everything is 66, it should kinda work...
09:00 <bauzas> https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880#diff-c0b6a5928be3ac40200a2078b084341bb9187a12b1f959ad862e0038c9029193
09:01 <bauzas> 66 is the service version that matches RPC 6.2
09:01 <noonedeadpunk> As iirc 66 is 2023.1
09:01 <noonedeadpunk> yeah...
09:01 <bauzas> are you sure you restarted all your conductors after the upgrade? :)
09:01 <bauzas> sorry for the silly question, but I had to ask
09:02 <noonedeadpunk> They wouldn't report 66 otherwise?
09:02 <bauzas> good call
09:02 <noonedeadpunk> I was actually trying to find a way to check what I could have forgotten to restart :)
09:02 <noonedeadpunk> But I'm checking now...
09:02 <bauzas> I guess you can find the error in the conductor log, can't you?
09:03 <bauzas> if so, identify the conductor service at fault
09:10 <noonedeadpunk> Well, I've checked ps on all conductor hosts - all are running from the same venv, and the venvs have the same nova version installed...
09:10 <noonedeadpunk> And the logs are clean, as conductor obviously doesn't see any issues
09:10 <noonedeadpunk> let me check if the code contains the patches you've sent me
09:14 <noonedeadpunk> yeah, it's obviously there for all conductors...
09:21 <noonedeadpunk> Well, upgrade check throws a `hw_machine_type unset` warning. Will try to sort it out - not sure if that's related though
09:23 <bauzas> no, this isn't
09:25 <bauzas> noonedeadpunk: since it's a venv, can you force-add some log line in the conditional?
09:25 <noonedeadpunk> sure
09:26 <bauzas> oh wait
09:26 <bauzas> this is a rebuild revert, right?
09:26 <bauzas> uh no
09:27 <bauzas> we're just calling the decorator in case rebuild_instance raises an exception
09:27 <bauzas> thinking out loud,
09:27 <bauzas> the RPC check on conductor considers that compute is older
09:28 <bauzas> but then it's also not passing the target_state parameter, or we would have an exception
09:29 <noonedeadpunk> no, it's not a revert, it's basically just `openstack server rebuild <server_uuid> --image <image_uuid>`
09:29 <bauzas> yeah yeah
09:30 <bauzas> it's popping up in the decorator just because it wraps the call itself
09:30 <noonedeadpunk> But yes, looking at the code you've shown, I tend to agree that the issue is in the RPC version, but it's very confusing, given that in the DB it's reported as 66... And I'm pretty sure that before the upgrade it was 61 or something
09:30 <bauzas> but that means the RPC call doesn't contain the expected target_state arg
09:30 <bauzas> so something happens in the negotiation
09:31 <bauzas> compute says "give me target_state" but conductor doesn't offer it
09:31 <noonedeadpunk> let me try to restart compute maybe, just in case...
09:32 <bauzas> I'd say the other way around, a conductor restart
09:32 <bauzas> compute is new enough to require target_state
09:32 <noonedeadpunk> yeah, makes sense...
09:33 <bauzas> and https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/manager.py#L621 exposes 6.2, I just checked
09:34 <bauzas> so yeah, I'd add a few debugging logs in the compute rpcapi method
09:34 <bauzas> (well, if this was my env, I'd pdb :D)
09:34 <noonedeadpunk> `Oct 24 09:34:35 control01-nova-api-container-13281850 nova-conductor[4131712]: 2023-10-24 09:34:35.835 4131712 INFO nova.compute.rpcapi [None req-28a1e0a6-3f71-4238-825d-b508d33ac758 - - - - - -] Automatically selected compute RPC version 6.1 from minimum service version 64`
09:34 <noonedeadpunk> o_O
09:34 <bauzas> but logging the object states before the RPC call seems a way forward
09:35 <bauzas> ahaha!
09:35 <noonedeadpunk> Hm... Maybe that's the compute which is down making things weird
09:35 <bauzas> no
09:36 <bauzas> I have to think about it, sec
09:36 <noonedeadpunk> Well. The compute that is down still has version 64
09:36 <noonedeadpunk> So it can bring negotiation down to that version?
09:37 <noonedeadpunk> I can pretty much bet that's the case...
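The behaviour suspected here can be sketched as follows: nova takes the minimum service version across all compute service records, whether or not the service is up, and maps it to an RPC version cap. The mapping below is an illustrative fragment containing only the two versions from this discussion, not nova's full table.

```python
# Illustrative fragment: service version -> compute RPC version.
# 64 corresponds to Zed (RPC 6.1), 66 to 2023.1/Antelope (RPC 6.2).
SERVICE_TO_RPC = {64: '6.1', 66: '6.2'}

def selected_rpc_version(service_versions):
    """Pick the RPC cap from the minimum recorded service version,
    mirroring the 'Automatically selected compute RPC version X
    from minimum service version Y' log line."""
    minimum = min(service_versions)
    # Fall back to the highest mapped version <= the minimum.
    candidates = [v for v in SERVICE_TO_RPC if v <= minimum]
    return SERVICE_TO_RPC[max(candidates)]

# All computes upgraded: negotiation lands on 6.2 ...
assert selected_rpc_version([66, 66, 66]) == '6.2'
# ... but one stale record, even for a *down* compute,
# drags every call back to 6.1.
assert selected_rpc_version([66, 66, 64]) == '6.1'
```

This is why a single down, not-yet-upgraded compute record pins the whole deployment's RPC version.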
09:37 <bauzas> oh, wait a sec
09:38 <noonedeadpunk> the compute was down during the upgrade
09:38 <bauzas> I need to remember my rpc compat skills
09:39 <bauzas> yeah, so the thing is, https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L504-L505 gives you 64
09:39 <bauzas> so we're capping to the minimum version
09:39 <bauzas> but then, the manager shouldn't yell
09:39 <bauzas> I'm missing something obvious, but I'm rusty
09:40 <noonedeadpunk> I kinda wonder if `get_minimum_version` should only account for the services that are up...
09:41 <noonedeadpunk> But yeah, also if that minimum version is fine, then computes shouldn't fail because the version is too low either, I guess...
09:41 <bauzas> no no
09:41 <bauzas> this is a bug
09:41 <bauzas> I just checked
09:41 <bauzas> we broke rolling-upgrade compat, that's it
09:41 <noonedeadpunk> Should I submit one then?
09:42 <bauzas> the manager should default target_state=None
09:42 <bauzas> because we automatically pin our RPC calls to the min version
09:42 <bauzas> I'll double-check that with dansmith, but I bet this is the problem
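The bug and the proposed fix can be sketched like this (illustrative signatures, not nova's actual manager code): a 6.1-capped caller omits the new argument, so a handler that requires it fails at dispatch, while defaulting `target_state=None` keeps the handler callable from both 6.1 and 6.2 clients.

```python
def rebuild_instance_broken(context, instance, target_state):
    # New-side handler without a default: a call capped at RPC 6.1
    # (which omits target_state) blows up before any rebuild logic
    # runs, with a missing-argument TypeError.
    return target_state

def rebuild_instance_fixed(context, instance, target_state=None):
    # Defaulting the new argument is what makes the handler
    # rolling-upgrade compatible.
    return target_state

# A 6.1-style call (target_state absent) against each handler:
try:
    rebuild_instance_broken("ctxt", "inst")
    raised = False
except TypeError:
    raised = True
assert raised
assert rebuild_instance_fixed("ctxt", "inst") is None
```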
09:43 <noonedeadpunk> fwiw, dropping that old compute from services made the negotiation pass
09:43 <noonedeadpunk> `Automatically selected compute RPC version 6.2 from minimum service version 66`
09:43 <bauzas> and https://github.com/openstack/nova/commit/30aab9c234035b49c7e2cdc940f624a63eeffc1b#diff-47eb12598e353b9e0689707d7b477353200d0aa3ed13045ffd3d017ee7d9e753R3709 has the same problem
09:43 <noonedeadpunk> and rebuild worked out
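The workaround used here, removing the stale compute service record so it no longer drags down the minimum version, can be done with the CLI. `<service_id>` is a placeholder; only delete records for computes that are genuinely decommissioned or will re-register after upgrade:

```shell
# Spot the down/outdated compute service record
openstack compute service list --service nova-compute
# Delete the stale record so it stops capping RPC negotiation
openstack compute service delete <service_id>
```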
09:45 <bauzas> yeah, 64 is zed
09:45 <noonedeadpunk> Ok, I can submit a bug report if you think it's a bug, and I have a workaround for the issue as well :)
09:45 <bauzas> but basically, rebuild is broken on a rolling upgrade from Zed to Antelope
09:46 <bauzas> noonedeadpunk: sure, please file it
09:46 <bauzas> and I think rebuild is also broken from Yoga to Zed, btw :)
09:47 <noonedeadpunk> Yes, another region we were upgrading from Yoga was also broken...
09:47 <bauzas> on rebuild?
09:47 <noonedeadpunk> But that's not an officially supported upgrade, so I used the better example :p
09:47 <noonedeadpunk> Yeah, same error
09:47 <bauzas> but not the same argument, I guess
09:48 <noonedeadpunk> The only region that has rebuild working is the one that has no computes down
09:48 <bauzas> from Yoga to Zed, it should yell about reimage_boot_volume missing
09:48 <noonedeadpunk> I actually haven't dug into it yet, to be frank, as the issues reported by users looked the same from their side
09:49 <bauzas> I have to double-check, but I think we're missing testing rebuild with grenade
09:50 <bauzas> anyway, I need to disappear, I have a few toxins to sweat out before we start the PTG
09:53 <noonedeadpunk> sure, thanks a lot for the help, the report is on its way
09:58 <noonedeadpunk> eventually, since I found out about `hw_machine_type`, what would you suggest setting? `q35` for all images, basically?
10:15 <noonedeadpunk> bug: https://bugs.launchpad.net/nova/+bug/2040264
11:39 <opendevreview> Alexey Stupnikov proposed openstack/nova stable/2023.1: Translate VF network capabilities to port binding  https://review.opendev.org/c/openstack/nova/+/898945
12:09 <sean-k-mooney> noonedeadpunk: I have not read back fully, but one of the main issues with detaching a boot drive is that it leaves the instance in a fundamentally broken state
12:10 <sean-k-mooney> if we were ever to support that, I think we would need a new state for the vm to model it, and/or a way to say use a separate disk as the boot device.
12:10 <noonedeadpunk> Well, shelved_offloaded should kinda work?
12:10 <sean-k-mooney> it also kind of messes with the resource allocations if you don't follow our guidance to use a flavor with 0 disk_gb for bfv guests
12:11 <noonedeadpunk> But I guess the challenge there was also recording which drive was detached, to re-attach it as root later
12:11 <sean-k-mooney> noonedeadpunk: shelved_offloaded does not work because we can't unshelve if you have detached the volume and attached it to something else
12:11 <noonedeadpunk> but it would be a valid failure that the VM can't be unshelved
12:12 <sean-k-mooney> ya, so it really comes down to the use case
12:12 <noonedeadpunk> sean-k-mooney: sorry, I didn't get the part about "use flavor with 0 disk_gb for bfv guests" - can you elaborate a bit on that?
12:12 <noonedeadpunk> As I thought that for bfv you're kinda good having 0 disk_gb?
12:12 <sean-k-mooney> yes, that is the recommendation
12:13 <sean-k-mooney> but some people don't do that and use a normal flavor
12:13 <bauzas> sean-k-mooney: the fact is, we need a new design discussion
12:13 <bauzas> but if someone wants to work on it, sure, I can help
12:13 <noonedeadpunk> ah, ok, as I read it as "you don't follow the recommendation AND use 0 disk_gb"
12:13 <bauzas> that was my point
12:13 <bauzas> noonedeadpunk: ++ I'll also discuss it at the PTG
12:14 <sean-k-mooney> bauzas: well, right now I think it's a hard sell to add to nova unless we put some restrictions in place
12:14 <bauzas> sure, see my comments above :)
12:14 <bauzas> anyway
12:15 <bauzas> noonedeadpunk: I'll provide a bugfix hopefully next week
12:15 <sean-k-mooney> noonedeadpunk: I would have less of an issue with saying "you can only detach the root volume on a flavor with 0 disk_gb"
12:15 <bauzas> or tomorrow morning if I have time
12:15 <sean-k-mooney> and then putting the vm into an "unbootable" state
12:15 <sean-k-mooney> or only supporting this for shelved_offloaded instances
12:16 <sean-k-mooney> noonedeadpunk: the real concern was not with the detach but with reattaching a potentially different root volume with image properties that are invalid for the current host
12:16 <sean-k-mooney> but if it's shelved_offloaded after the reattachment, then that's not a problem, as when we unshelve it will go through the scheduler
12:17 <noonedeadpunk> Yeah, that's why I thought that shelved should be quite an appropriate state where you could potentially do that
12:17 <noonedeadpunk> And I would say that requiring 0 disk_gb on the flavor would also be a fair requirement
12:18 <noonedeadpunk> for the feature to work
12:18 <sean-k-mooney> the 0 disk_gb thing is partly a personal preference. we do not allow non-bfv guests to use 0 disk flavors
12:18 <sean-k-mooney> but we don't require it either, and I really would prefer to require that
12:19 <sean-k-mooney> we don't actually need to use the flavor size to know if it's bfv in the api
12:19 <sean-k-mooney> so we could make it work without that restriction
12:20 <sean-k-mooney> noonedeadpunk: what I really want to avoid is needing to hit the scheduler on every volume attach
12:20 <noonedeadpunk> well, requiring that might actually complicate some things...
12:20 <sean-k-mooney> just in case it happens to be a root volume
12:20 <noonedeadpunk> I mean prohibiting bfv with a non-0 flavor
12:21 <sean-k-mooney> we didn't do that in the past due to upgrade concerns
12:21 <sean-k-mooney> what we did instead is: if it's BFV, we no longer include the disk_gb in the allocation request
12:21 <noonedeadpunk> nah, I think that it indeed should be some offloaded state imo... Or indeed introduce a new one, so that resources won't be wasted on offload per se
12:21 <sean-k-mooney> as of some relatively recent openstack version
12:21 <noonedeadpunk> but which will still go through the scheduler on boot
12:22 <noonedeadpunk> but yeah, eventually it's still offloaded...
12:22 <sean-k-mooney> noonedeadpunk: we will talk about this later at the ptg, but I'm also wondering about the original use case that is of most interest to people
12:23 <sean-k-mooney> i.e. what fraction of those that want this are using a version of nova that does not support rescue or rebuild for bfv guests
12:23 <sean-k-mooney> we added both in the last 2 years
12:23 <noonedeadpunk> I think the main use case is in fact some rescue...
12:23 <noonedeadpunk> and yes, that kinda covers my use case fully
12:24 <noonedeadpunk> sean-k-mooney: is it possible to reset the delete-on-termination flag for the root drive?
12:24 <sean-k-mooney> noonedeadpunk: part of the reason we didn't do this was we decided to first make the instance actions that don't work with bfv work
12:24 <sean-k-mooney> noonedeadpunk: that I'm not sure about; I think you might be able to do it via cinder
12:25 <sean-k-mooney> but that would also be reasonable to support IMO if we don't support it already
12:25 <sean-k-mooney> you should always be able to say "hey, don't delete this" after the fact, I think
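For what it's worth, the compute API can do this directly since microversion 2.85, which added the ability to update `delete_on_termination` on an existing volume attachment. A rough sketch of composing that request (URL layout and header are assumptions from the API reference; no auth shown, nothing is sent):

```python
import json

def build_update_attachment_request(server_id, volume_id,
                                    delete_on_termination=False):
    """Build the PUT request that flips delete_on_termination on an
    existing attachment (compute API microversion 2.85+).

    Illustrative helper, not a nova/openstacksdk API.
    """
    url = (f"/v2.1/servers/{server_id}"
           f"/os-volume_attachments/{volume_id}")
    headers = {
        "Content-Type": "application/json",
        # 2.85 is the first microversion that accepts this update
        "OpenStack-API-Version": "compute 2.85",
    }
    body = {"volumeAttachment":
            {"delete_on_termination": delete_on_termination}}
    return "PUT", url, headers, json.dumps(body)
```

So "don't delete my root volume with the VM" does not require detaching it at all, which is exactly the point being made here.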
12:25 <noonedeadpunk> yeah
12:26 <noonedeadpunk> as that's what strikes my mind when I think about why I would need to detach the root volume...
12:26 <noonedeadpunk> Just to save it from being deleted
12:26 <noonedeadpunk> (with the VM)
12:26 <noonedeadpunk> and then rescue works nicely indeed
12:26 <sean-k-mooney> ya, and you should be able to do the same for ports. the port case is definitely on our todo list
12:28 <noonedeadpunk> but then indeed I don't know why you would need to detach it
12:29 <noonedeadpunk> Just out of convenience... Like the first thing someone would try, rather than using rescue...
12:29 <noonedeadpunk> so I think you're right here, and it's something people were just historically missing due to rescue not working
12:30 <noonedeadpunk> and that is kinda inertia of thinking
12:32 <sean-k-mooney> ya, so that's why I'm wondering about the use case
12:32 <sean-k-mooney> if there is a use case we can't support, then I'm happy to hear it and see what we can do to support it
12:33 <sean-k-mooney> but if it's already covered by rescue or updating the delete-on-terminate flag
12:33 <sean-k-mooney> then I would prefer to use that approach rather than add root detach
12:33 <sean-k-mooney> so we will see later, I guess
13:00 <bauzas> we're gonna use Zoom eventually https://www.openstack.org/ptg/rooms/diablo
13:05 <bauzas> sean-k-mooney: are you coming to the PTG?
14:28 <opendevreview> Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration  https://review.opendev.org/c/openstack/nova/+/897842
14:28 <opendevreview> Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type  https://review.opendev.org/c/openstack/nova/+/898229
15:19 <opendevreview> Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition  https://review.opendev.org/c/openstack/nova/+/899185
15:25 <opendevreview> Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition  https://review.opendev.org/c/openstack/nova/+/899185
16:18 <simondodsley> sean-k-mooney: sorry - I can't stay any longer at the PTG today - I have other commitments to deal with. I would love an update on the spec I put in the etherpad though
16:21 <opendevreview> Merged openstack/nova stable/2023.2: Install lxml before we need it in post-run  https://review.opendev.org/c/openstack/nova/+/899142
16:43 <opendevreview> Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying  https://review.opendev.org/c/openstack/placement/+/898807
16:45 <opendevreview> Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying  https://review.opendev.org/c/openstack/placement/+/898807
18:13 <bauzas> not fun: we had the exact same issue with a rolling upgrade from Victoria to Wallaby on rebuild https://review.opendev.org/c/openstack/nova/+/761458/
18:13 <bauzas> so we really need a grenade test for it
18:13 <bauzas> and perhaps some hacking check, I dunno
18:14 <bauzas> I'll write the fixes tomorrow morning
18:15 <bauzas> and yeah, we don't test rebuild https://github.com/openstack/nova/blob/master/.zuul.yaml#L560
21:49 <opendevreview> Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration  https://review.opendev.org/c/openstack/nova/+/897842
21:49 <opendevreview> Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type  https://review.opendev.org/c/openstack/nova/+/898229
23:01 <opendevreview> melanie witt proposed openstack/nova stable/zed: Remove use of removeprefix  https://review.opendev.org/c/openstack/nova/+/899148
23:54 <opendevreview> Merged openstack/nova master: fix sphinx-lint issues in releasenotes  https://review.opendev.org/c/openstack/nova/+/897087
23:54 <opendevreview> Merged openstack/nova master: fix sphinx-lint issues in api guide  https://review.opendev.org/c/openstack/nova/+/897088

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!