sean-k-mooney[m] | simondodsley: one thing to keep in mind is that openstack, and nova in particular, is not an enterprise virtualisation platform, so vmware is not a good comparator; aws or azure is a better comparison | 01:10 |
---|---|---|
sean-k-mooney[m] | ignoring the fact that we are likely to remove the vmware driver shortly, vmware is one of the backend hypervisors you can use with nova. but nova is a cloud compute project, so feature parity with enterprise virtualisation software is not a goal for nova | 01:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Install lxml before we need it in post-run https://review.opendev.org/c/openstack/nova/+/899142 | 07:10 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Revert "Add upgrade check for compute-object-ids linkage" https://review.opendev.org/c/openstack/nova/+/898953 | 07:10 |
bauzas | fwiw, I'm more than happy to welcome volunteers wanting to work on https://blueprints.launchpad.net/nova/+spec/detach-boot-volume, in particular if this is important for their users | 08:16 |
bauzas | I can offer nova onboarding and knowledge transfer if that helps | 08:17 |
bauzas | I'm paid for leading a community, but I need this community to act by itself :) | 08:17 |
bauzas | simondodsley: fwiw, we have a PTG operator hour proposed today at 1500 UTC https://ptg.opendev.org/ptg.html | 08:41 |
bauzas | simondodsley: if you feel you'd like to discuss your feature request, feel free to join | 08:41 |
noonedeadpunk | hey folks. I get a weird error on HEAD of stable/2023.1 when trying to rebuild an instance: https://paste.openstack.org/show/biUIcOzMCx0YlsFob2KK/ | 08:42 |
bauzas | and like I said, if I can help someone volunteering, I'll surely offer onboarding | 08:42 |
bauzas | noonedeadpunk: eek | 08:43 |
noonedeadpunk | I decided to ask whether it's something already known, so that I don't spend time digging deeper (to me it's not very obvious from the trace) | 08:43 |
noonedeadpunk | yeah, detach-boot-volume is a really interesting thing, though I recall it not being as trivial as it might look at first | 08:46 |
bauzas | noonedeadpunk: this is an RPC call | 08:46 |
noonedeadpunk | ok, so I should check on.. conductor I guess | 08:46 |
bauzas | this sounds like a compat issue between conductor and compute | 08:47 |
bauzas | basically your compute expects some argument | 08:47 |
bauzas | that conductor isn't passing | 08:47 |
bauzas | I guess you upgraded ? | 08:47 |
bauzas | do you have a rolling upgrade in progress ? | 08:48 |
bauzas | noonedeadpunk: that's the client RPC code that does the RPC negotiation https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L1106-L1110 | 08:50 |
bauzas | this client code is run by the nova-conductor service | 08:50 |
bauzas | or by a compute service if this is compute-to-compute calls | 08:50 |
noonedeadpunk | Yeah, I am. And I see version 66 for every service in the `services` table of the cell db | 08:50 |
noonedeadpunk | oh, wait, a couple of computes are still at 60.... Hm | 08:51 |
noonedeadpunk | Not the affected one though, but still something I need to fix | 08:51 |
bauzas | do you have some rpc pins ? | 08:52 |
noonedeadpunk | nah, I don't | 08:54 |
noonedeadpunk | Or well, should not have - never set them | 08:54 |
bauzas | https://github.com/openstack/nova/blob/stable/2023.1/nova/conductor/manager.py#L1395-L1414 is where conductor calls compute for rebuilding | 08:54 |
noonedeadpunk | Ok, the computes that have an earlier version are down right now... | 08:55 |
bauzas | the conductor manager is then invoking compute.rpcapi's rebuild_instance() which does the RPC negotiation | 08:55 |
bauzas | and then removes target_state if the computes are older than rpc 6.2 | 08:55 |
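For readers following along, the pattern being described looks roughly like the following; a simplified sketch based on the rebuild_instance() client method in nova/compute/rpcapi.py linked above, not the verbatim nova code:

```python
# Simplified sketch of nova's client-side RPC version negotiation for
# rebuild_instance() (argument list heavily trimmed): pick the highest
# RPC version the remote side supports and drop arguments that older
# computes would not recognise.
def rebuild_instance(self, ctxt, instance, host, **kwargs):
    version = '6.2'
    client = self.router.client(ctxt)
    if not client.can_send_version(version):
        # Computes older than RPC 6.2 do not know about target_state,
        # so it must be removed from the call before casting.
        kwargs.pop('target_state', None)
        version = '6.1'
    cctxt = client.prepare(server=host, version=version)
    cctxt.cast(ctxt, 'rebuild_instance', instance=instance, **kwargs)
```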
noonedeadpunk | bauzas: is there any easy way to see the reported RPC version? Because I guess in nova.services it's a different version rather than the RPC one. | 08:57 |
bauzas | noonedeadpunk: https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880 is the commit that adds RPC 6.2 | 08:58 |
bauzas | noonedeadpunk: good question | 08:58 |
bauzas | I'd say internal RPC versions aren't meant to be exposed, but we provide aliases | 08:58 |
bauzas | https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L428 | 08:59 |
bauzas | zed computes are exposing 6.1, antelope ones are 6.2 | 08:59 |
noonedeadpunk | I guess I wanted to check what the computes/conductor report as versions | 09:00 |
bauzas | if both your conductor *and* computes are 6.2, then try pinning the rpc version | 09:00 |
bauzas | yeah, got it | 09:00 |
bauzas | the service version should tho give you that info | 09:00 |
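As a side note, the per-service versions that drive this automatic negotiation can be inspected directly in the cell database; a minimal sketch, assuming the standard `services` table layout:

```sql
-- Hypothetical check against the nova cell database: the `version`
-- column is the service (object) version that nova maps to an RPC
-- version alias when auto-pinning.
SELECT host, `binary`, version, updated_at, deleted
FROM services
WHERE `binary` = 'nova-compute';
```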
noonedeadpunk | so if all is 66 - it should kinda work... | 09:00 |
bauzas | https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880#diff-c0b6a5928be3ac40200a2078b084341bb9187a12b1f959ad862e0038c9029193 | 09:00 |
bauzas | 66 is the object version that matches RPC 6.2 | 09:01 |
noonedeadpunk | As iirc 66 is 2023.1 | 09:01 |
noonedeadpunk | yeah... | 09:01 |
bauzas | are you sure you restarted all your conductors after the upgrade ? :) | 09:01 |
bauzas | sorry for that silly question but I had to | 09:01 |
noonedeadpunk | They won't report 66 otherwise? | 09:02 |
bauzas | good call | 09:02 |
noonedeadpunk | I was actually trying to find a way to check what I could have forgotten to restart :) | 09:02 |
noonedeadpunk | But I'm checking now.... | 09:02 |
bauzas | I guess you can find the error in the conductor log, can't you ? | 09:02 |
bauzas | if so, identify the conductor service in question | 09:03 |
noonedeadpunk | Well, I've checked ps on all conductor hosts - all running from the same venv, and the venvs have the same nova version installed... | 09:10 |
noonedeadpunk | And the logs are clean, as conductor obviously doesn't see any issues | 09:10 |
noonedeadpunk | let me check if code does contain these patches you've sent me | 09:10 |
noonedeadpunk | yeah, it's obviously there for all conductors... | 09:14 |
noonedeadpunk | Well, the upgrade check throws a `hw_machine_type unset` warning. Will try to sort it out - not sure if that's related though | 09:21 |
bauzas | no, this isn't | 09:23 |
bauzas | noonedeadpunk: since it's a venv, can you force-add some logging in the conditional ? | 09:25 |
noonedeadpunk | sure | 09:25 |
bauzas | oh wait | 09:26 |
bauzas | this is a rebuild revert, right? | 09:26 |
bauzas | uh no | 09:26 |
bauzas | we're just calling the decorator in case rebuild_instance raises an exception | 09:27 |
bauzas | speaking out loud, | 09:27 |
bauzas | the RPC check on conductor considers that compute is older | 09:27 |
bauzas | but this is also not passing the target_state parameter or we would have an exception | 09:28 |
noonedeadpunk | no, it's not a revert, it's basically just `openstack server rebuild <server_uuid> --image <image_uuid>` | 09:29 |
bauzas | yeah yeah | 09:29 |
bauzas | it's popping up in the deco just because this is wrapping the call itself | 09:30 |
noonedeadpunk | But yes, looking at the code you've shown I tend to agree that the issue is in the rpc version, but it's very confusing, given that in the DB it's reported as 66... And I'm pretty sure that before the upgrade it was 61 or something | 09:30 |
bauzas | but that means the RPC call doesn't contain the expected target_state arg | 09:30 |
bauzas | so something happens in the negotiation | 09:30 |
bauzas | compute says "give me target_state" but conductor doesn't offer it | 09:31 |
noonedeadpunk | let me try restarting the compute maybe, just in case.... | 09:31 |
bauzas | I'd say the other way, a conductor restart | 09:32 |
bauzas | compute is new enough to require target_state | 09:32 |
noonedeadpunk | yeah, makes sense... | 09:32 |
bauzas | and https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/manager.py#L621 exposes 6.2 I just checked | 09:33 |
bauzas | so yeah, I'd add a few debugging logs in the compute rpcapi method | 09:34 |
bauzas | (well, if this was my env, I'd pdb :D) | 09:34 |
noonedeadpunk | `Oct 24 09:34:35 control01-nova-api-container-13281850 nova-conductor[4131712]: 2023-10-24 09:34:35.835 4131712 INFO nova.compute.rpcapi [None req-28a1e0a6-3f71-4238-825d-b508d33ac758 - - - - - -] Automatically selected compute RPC version 6.1 from minimum service version 64` | 09:34 |
noonedeadpunk | o_O | 09:34 |
bauzas | but logging the object states before the RPC call seems a way forward | 09:34 |
bauzas | ahaha ! | 09:35 |
noonedeadpunk | Hm... Maybe that's the compute which is down making things weird | 09:35 |
bauzas | no | 09:35 |
bauzas | I have to think about it, sec | 09:36 |
noonedeadpunk | Well, the compute that is down still has version 64 | 09:36 |
noonedeadpunk | So it can bring negotiation down to that version? | 09:36 |
noonedeadpunk | I can pretty much bet that's the case... | 09:37 |
bauzas | oh, wait a sec | 09:37 |
noonedeadpunk | the compute was down during the upgrade | 09:38 |
bauzas | I need to remember my rpc compat skilles | 09:38 |
bauzas | skills* | 09:38 |
bauzas | yeah, so the thing is, https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L504-L505 gives you 64 | 09:39 |
bauzas | so we're capping to the minimum version | 09:39 |
bauzas | but then, the manager shouldn't yell | 09:39 |
bauzas | I'm missing something obvious but I'm rusty | 09:39 |
noonedeadpunk | I kinda wonder if `get_minimum_version` should only account for the ones that are up... | 09:40 |
noonedeadpunk | But yeah, if that minimum version is fine, then the compute shouldn't fail because the version is too low either, I guess... | 09:41 |
bauzas | no no | 09:41 |
bauzas | this is a bug | 09:41 |
bauzas | I just checked | 09:41 |
bauzas | we broke rolling-upgrade compat, that's it | 09:41 |
noonedeadpunk | Should I submit one then? | 09:41 |
bauzas | the manager should default target_state=None | 09:42 |
bauzas | because we automatically pin our RPC calls to the min version | 09:42 |
bauzas | I'll doublecheck that with dansmith but I bet this is the problem | 09:42 |
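A minimal sketch of the kind of fix being suggested here, with the argument list trimmed; this is not the actual patch:

```python
# On the compute manager side, an argument introduced in a new RPC
# minor version needs a default so that an older (auto-pinned)
# conductor which does not send it yet keeps working during a rolling
# upgrade.
def rebuild_instance(self, context, instance, image_ref,
                     reimage_boot_volume=None, target_state=None,
                     **kwargs):
    # target_state is None when the caller is still capped at RPC 6.1,
    # so "not provided" has to be treated as a valid input.
    ...
```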
noonedeadpunk | fwiw, dropping that old compute from services made negotiation pass | 09:43 |
noonedeadpunk | `Automatically selected compute RPC version 6.2 from minimum service version 66` | 09:43 |
bauzas | and https://github.com/openstack/nova/commit/30aab9c234035b49c7e2cdc940f624a63eeffc1b#diff-47eb12598e353b9e0689707d7b477353200d0aa3ed13045ffd3d017ee7d9e753R3709 has the same problem | 09:43 |
noonedeadpunk | and rebuild worked out | 09:43 |
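Roughly the workaround applied above, as CLI commands; the service id is a placeholder:

```console
$ openstack compute service list --service nova-compute
# remove the record of the compute that is down and still reporting an
# old service version, so it no longer drags the minimum version down
$ openstack compute service delete <id-of-the-down-compute-service>
```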
bauzas | yeah, 64 is zed | 09:45 |
noonedeadpunk | Ok, I can submit a bug report if you think it's a bug, and I have a way around the issue as well :) | 09:45 |
bauzas | but basically, rebuild is broken on a rolling-upgrade from Zed to Antelope | 09:45 |
bauzas | noonedeadpunk: sure, file it please | 09:46 |
bauzas | and I think rebuild is also broken from Yoga to Zed btw. :) | 09:46 |
noonedeadpunk | Yes, another region we were upgrading from Yoga was also broken... | 09:47 |
bauzas | on rebuild ? | 09:47 |
noonedeadpunk | But it's an unofficially supported upgrade, so I took the better example :p | 09:47 |
noonedeadpunk | Yeah, same error | 09:47 |
bauzas | but not the same argument I guess | 09:47 |
noonedeadpunk | The only one that has rebuild working is the one that has no computes down | 09:48 |
bauzas | from yoga to zed, it should yell about reimage_boot_volume missing | 09:48 |
noonedeadpunk | I actually haven't dug into that yet, to be frank, as the issues reported by users were the same from their side | 09:48 |
bauzas | I have to doublecheck, but I think we're not testing rebuild with grenade | 09:49 |
bauzas | anyway, I need to disappear, I have a few toxins to sweat before we start the PTG | 09:50 |
noonedeadpunk | sure, thanks a lot for the help, the report is on its way | 09:53 |
noonedeadpunk | anyway, since I found out about `hw_machine_type`, what would you suggest adding? `q35` for all images, basically? | 09:58 |
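On the `hw_machine_type` warning, a sketch of the relevant option, assuming the libvirt driver; the value is illustrative and site-specific:

```ini
# nova.conf on the compute nodes: default machine type per architecture,
# used for guests whose image does not set the hw_machine_type property.
[libvirt]
hw_machine_type = x86_64=q35
```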
noonedeadpunk | bug: https://bugs.launchpad.net/nova/+bug/2040264 | 10:15 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/2023.1: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/898945 | 11:39 |
sean-k-mooney | noonedeadpunk: i have not read back fully but one of the main issues with detaching a boot drive is it leaves the instance in a fundamentally broken state | 12:09 |
sean-k-mooney | if we were ever to support that i think we would need to have a new state for the vm to model that, and/or allow a way to say use a separate disk as the boot device. | 12:10 |
noonedeadpunk | Well, shelved_offloaded should work kinda? | 12:10 |
sean-k-mooney | it also kind of messes with the resource allocations if you don't follow our guidance and use a flavor with 0 disk_gb for bfv guests | 12:10 |
noonedeadpunk | But I guess the challenge there was also to record which drive was detached, to re-attach it as root later | 12:11 |
sean-k-mooney | noonedeadpunk: shelve_offloaded does not work because we can't unshelve if you have detached the volume and attached it to something else | 12:11 |
noonedeadpunk | but then it would be a valid failure that the VM can't be unshelved | 12:11 |
sean-k-mooney | ya so it really comes down to the use case | 12:12 |
noonedeadpunk | sean-k-mooney: sorry, I didn't get the part about "use flavor with 0 disk_gb for bfv guests" - can you elaborate a bit on that? | 12:12 |
noonedeadpunk | As I thought that for bfv you're kinda supposed to have 0 disk_gb? | 12:12 |
sean-k-mooney | yes that is the recommendation | 12:12 |
sean-k-mooney | but some people don't do that and use a normal flavor | 12:13 |
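The recommendation being discussed, as a sketch; flavor name, sizes and resource names are illustrative:

```console
# a flavor with disk 0 so boot-from-volume guests do not carry a local
# DISK_GB request
$ openstack flavor create --vcpus 2 --ram 4096 --disk 0 m1.bfv
$ openstack server create --flavor m1.bfv --volume <bootable-volume> \
      --network <network> my-bfv-server
```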
bauzas | sean-k-mooney: the fact is, we need a new design discussion | 12:13 |
bauzas | but if someone wants to work on it, sure I can hel | 12:13 |
noonedeadpunk | ah, ok, I had read it as "you don't follow the recommendation AND use 0 disk_gb" | 12:13 |
bauzas | help | 12:13 |
bauzas | that was my point | 12:13 |
bauzas | noonedeadpunk: ++ I'll also discuss it on the PTG | 12:13 |
sean-k-mooney | bauzas: well right now i think its a hard cell to add to nova unless we put some restricitons in place | 12:14 |
sean-k-mooney | ... sell | 12:14 |
bauzas | sure, see my above comments :) | 12:14 |
bauzas | anyway | 12:14 |
bauzas | noonedeadpunk: I'll provide a bugfix hopefully next week | 12:15 |
sean-k-mooney | noonedeadpunk: i would have less of an issue with saying "you can only detach the root volume if the flavor has 0 disk_gb" | 12:15 |
bauzas | or tomorrow morning if I have time | 12:15 |
sean-k-mooney | and then putting the vm into an "unbootable" state | 12:15 |
sean-k-mooney | or only supporting this for shelve_offloaded instances | 12:15 |
sean-k-mooney | noonedeadpunk: the real concern was not with the detach but with potentially reattaching a different root volume with image properties that were invalid for the current host | 12:16 |
sean-k-mooney | but if it's shelve_offloaded after the reattachment then that's not a problem, as when we unshelve it will go through the scheduler | 12:16 |
noonedeadpunk | Yeah, that's why I thought that shelved should be quite appropriate state where you potentially can do that | 12:17 |
noonedeadpunk | And I would say that having 0 disk_gb on the flavor would also be a fair requirement | 12:17 |
noonedeadpunk | for the feature to work | 12:18 |
sean-k-mooney | the 0 disk_gb thing is partly a personal preference. we do not allow non-bfv guests to use 0 disk flavors | 12:18 |
sean-k-mooney | but we don't require them to either, and i really would prefer to require that | 12:18 |
sean-k-mooney | we don't actually need to use the flavor size to know if it's bfv in the api | 12:19 |
sean-k-mooney | so we could make it work without that restriction | 12:19 |
sean-k-mooney | noonedeadpunk: what i really want to avoid is needing to hit the scheduler on every volume attach | 12:20 |
noonedeadpunk | well. requiring that might complicate some things actually... | 12:20 |
sean-k-mooney | just in case it happens to be a root volume | 12:20 |
noonedeadpunk | I mean, prohibiting bfv with a non-0 disk flavor | 12:20 |
sean-k-mooney | we didn't do that in the past due to upgrade concerns | 12:21 |
sean-k-mooney | what we did instead is, if it's BFV, we no longer include the disk_gb in the allocation request | 12:21 |
noonedeadpunk | nah, I think that it indeed should be some offloaded state imo... Or indeed introduce a new one, so that resources won't be wasted on offload per se | 12:21 |
sean-k-mooney | as of some relatively recent openstack version | 12:21 |
noonedeadpunk | but which will still go through the scheduler on boot | 12:21 |
noonedeadpunk | but yeah, eventually it's still offloaded... | 12:22 |
sean-k-mooney | noonedeadpunk: we will talk about this later in the ptg but i'm also wondering about the original use case that is of most interest to people | 12:22 |
sean-k-mooney | i.e. what fraction of those that want this are using a version of nova that does not support rescue or rebuild for bfv guests | 12:23 |
sean-k-mooney | we added both in the last 2 years | 12:23 |
noonedeadpunk | I think the main use case is in fact some rescue.... | 12:23 |
noonedeadpunk | and yes, that kinda covers my usecase fully | 12:23 |
noonedeadpunk | sean-k-mooney: is it possible to reset the delete on termination flag for the root drive? | 12:24 |
sean-k-mooney | noonedeadpunk: part of the reason we didn't do this was we decided let's just make the instance actions that don't work with bfv work first | 12:24 |
sean-k-mooney | noonedeadpunk: that i'm not sure about, i think you might be able to do it via cinder | 12:24 |
sean-k-mooney | but that would also be reasonable to support IMO if we don't support it already | 12:25 |
sean-k-mooney | you should always be able to say "hey, don't delete this" after the fact i think | 12:25 |
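For reference, and not verified against this deployment: since compute API microversion 2.85 the `delete_on_termination` flag of an already attached volume can be updated through the os-volume_attachments API; the server and volume ids below are placeholders and $COMPUTE_ENDPOINT stands for the versioned compute endpoint:

```console
$ curl -s -X PUT \
    -H "X-Auth-Token: $TOKEN" \
    -H "OpenStack-API-Version: compute 2.85" \
    -H "Content-Type: application/json" \
    -d '{"volumeAttachment": {"volumeId": "<volume>",
         "delete_on_termination": false}}' \
    "$COMPUTE_ENDPOINT/servers/<server>/os-volume_attachments/<volume>"
```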
noonedeadpunk | yeah | 12:25 |
noonedeadpunk | as that's what comes to mind when I think about why I would need to detach the root volume... | 12:26 |
noonedeadpunk | Just to save it from being deleted | 12:26 |
noonedeadpunk | (with the VM) | 12:26 |
noonedeadpunk | and then rescue works nicely indeed | 12:26 |
sean-k-mooney | ya and you should be able to do the same for ports. the port case is definitely on our todo list | 12:26 |
noonedeadpunk | but then indeed I don't know why you would need to detach it | 12:28 |
noonedeadpunk | Just out of convenience... Like the first thing someone would try rather than using rescue... | 12:29 |
noonedeadpunk | so I think you're right here and it's something that people were just historically missing due to rescue not working | 12:29 |
noonedeadpunk | and that is kinda inertia of thinking | 12:30 |
sean-k-mooney | ya so that's why i'm wondering about the use case | 12:32 |
sean-k-mooney | if there is a use case we can't support then i'm happy to hear that and see what we can do to support it | 12:32 |
sean-k-mooney | but if it's already covered by rescue or updating the delete on terminate flag | 12:33 |
sean-k-mooney | then i would prefer to use that approach rather than add root detach | 12:33 |
sean-k-mooney | so we will see later i guess | 12:33 |
bauzas | we gonna use Zoom eventually https://www.openstack.org/ptg/rooms/diablo | 13:00 |
bauzas | sean-k-mooney: are you coming to the PTG ? | 13:05 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration https://review.opendev.org/c/openstack/nova/+/897842 | 14:28 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type https://review.opendev.org/c/openstack/nova/+/898229 | 14:28 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition https://review.opendev.org/c/openstack/nova/+/899185 | 15:19 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition https://review.opendev.org/c/openstack/nova/+/899185 | 15:25 |
simondodsley | sean-k-mooney: sorry - i can't stay any longer on the PTG today - i have other commitments i have to deal with. I would love some update on the spec I put in the etherpad though | 16:18 |
opendevreview | Merged openstack/nova stable/2023.2: Install lxml before we need it in post-run https://review.opendev.org/c/openstack/nova/+/899142 | 16:21 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying https://review.opendev.org/c/openstack/placement/+/898807 | 16:43 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying https://review.opendev.org/c/openstack/placement/+/898807 | 16:45 |
bauzas | not fun: we had the exact same issue with rolling upgrade in Victoria>Wallaby on rebuild https://review.opendev.org/c/openstack/nova/+/761458/ | 18:13 |
bauzas | so we really need a grenade test on it | 18:13 |
bauzas | and perhaps some hacking check, I dunno | 18:13 |
bauzas | I'll write the fixes tomorrow morning | 18:14 |
bauzas | and yeah, we don't test rebuild https://github.com/openstack/nova/blob/master/.zuul.yaml#L560 | 18:15 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration https://review.opendev.org/c/openstack/nova/+/897842 | 21:49 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type https://review.opendev.org/c/openstack/nova/+/898229 | 21:49 |
opendevreview | melanie witt proposed openstack/nova stable/zed: Remove use of removeprefix https://review.opendev.org/c/openstack/nova/+/899148 | 23:01 |
opendevreview | Merged openstack/nova master: fix sphinx-lint issues in releasenotes https://review.opendev.org/c/openstack/nova/+/897087 | 23:54 |
opendevreview | Merged openstack/nova master: fix sphinx-lint issues in api guide https://review.opendev.org/c/openstack/nova/+/897088 | 23:54 |