sean-k-mooney[m] | simondodsley: one thing to keep in mind is that openstack, and nova in particular, is not an enterprise virtualisation platform, so vmware is not a good comparator; aws or azure is a better comparison | 01:10 |
---|---|---|
sean-k-mooney[m] | ignoring the fact that we are likely to remove the vmware driver shortly, vmware is one of the backend hypervisors you can use with nova. but nova is a cloud compute project, so feature parity with enterprise virtualisation software is not a goal for nova | 01:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Install lxml before we need it in post-run https://review.opendev.org/c/openstack/nova/+/899142 | 07:10 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Revert "Add upgrade check for compute-object-ids linkage" https://review.opendev.org/c/openstack/nova/+/898953 | 07:10 |
bauzas | fwiw, I'm more than happy to welcome volunteers wanting to work on https://blueprints.launchpad.net/nova/+spec/detach-boot-volume, in particular if this is important for their users | 08:16 |
bauzas | I can offer nova onboarding and knowledge transfer if that helps | 08:17 |
bauzas | I'm paid for leading a community, but I need this community to act by itself :) | 08:17 |
bauzas | simondodsley: fwiw, we have a PTG operator hour proposed today at 1500 UTC https://ptg.opendev.org/ptg.html | 08:41 |
bauzas | simondodsley: if you feel you'd like to discuss your feature request, feel free to join | 08:41 |
noonedeadpunk | hey folks. I get a weird error on HEAD of stable/2023.1 when trying to rebuild an instance: https://paste.openstack.org/show/biUIcOzMCx0YlsFob2KK/ | 08:42 |
bauzas | and like I said, if I can help someone volunteering, I'll surely offer onboarding | 08:42 |
bauzas | noonedeadpunk: eek | 08:43 |
noonedeadpunk | I decided to ask whether it's something already known, so that I don't spend time digging deeper (to me it's not very obvious from the trace) | 08:43 |
noonedeadpunk | yeah, detach-boot-volume is a really interesting thing, though I recall it not being as trivial as it might look at first | 08:46 |
bauzas | noonedeadpunk: this is an RPC call | 08:46 |
noonedeadpunk | ok, so I should check on.. conductor I guess | 08:46 |
bauzas | this sounds like a compat issue between conductor and compute | 08:47 |
bauzas | basically your compute expects some argument | 08:47 |
bauzas | that conductor isn't passing | 08:47 |
bauzas | I guess you upgraded ? | 08:47 |
bauzas | do you have a rolling upgrade in progress ? | 08:48 |
bauzas | noonedeadpunk: that's the client RPC code that does the RPC negotiation https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L1106-L1110 | 08:50 |
bauzas | this client code is run by the nova-conductor service | 08:50 |
bauzas | or by a compute service if this is compute-to-compute calls | 08:50 |
noonedeadpunk | Yeah, I am. And I see version 66 for every service in the `services` table of the cell db | 08:50 |
noonedeadpunk | oh, wait, a couple of computes are still at 60.... Hm | 08:51 |
noonedeadpunk | Not the affected one though, but still something I need to fix | 08:51 |
bauzas | do you have some rpc pins ? | 08:52 |
noonedeadpunk | nah, I don't | 08:54 |
noonedeadpunk | Or well, should not have - never set them | 08:54 |
bauzas | https://github.com/openstack/nova/blob/stable/2023.1/nova/conductor/manager.py#L1395-L1414 is where conductor calls compute for rebuilding | 08:54 |
noonedeadpunk | Ok, the computes that have an earlier version are down right now... | 08:55 |
bauzas | the conductor manager is then invoking compute.rpcapi's rebuild_instance() which does the RPC negotiation | 08:55 |
bauzas | and then removes target_state if the computes are older than rpc 6.2 | 08:55 |
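For readers following along, the pattern being described looks roughly like the following; a simplified sketch based on the rebuild_instance() client method in nova/compute/rpcapi.py linked above, not the verbatim nova code:

```python
# Simplified sketch of nova's client-side RPC version negotiation for
# rebuild_instance() (argument list heavily trimmed): pick the highest
# RPC version the remote side supports and drop arguments that older
# computes would not recognise.
def rebuild_instance(self, ctxt, instance, host, **kwargs):
    version = '6.2'
    client = self.router.client(ctxt)
    if not client.can_send_version(version):
        # Computes older than RPC 6.2 do not know about target_state,
        # so it must be removed from the call before casting.
        kwargs.pop('target_state', None)
        version = '6.1'
    cctxt = client.prepare(server=host, version=version)
    cctxt.cast(ctxt, 'rebuild_instance', instance=instance, **kwargs)
```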
noonedeadpunk | bauzas: is there any easy way to see the reported RPC version? Because I guess in nova.services it's a different version rather than the RPC one. | 08:57 |
bauzas | noonedeadpunk: https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880 is the commit that adds RPC 6.2 | 08:58 |
bauzas | noonedeadpunk: good question | 08:58 |
bauzas | I'd say internal RPC versions aren't meant to be exposed, but we provide aliases | 08:58 |
bauzas | https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L428 | 08:59 |
bauzas | zed computes are exposing 6.1, antelope ones are 6.2 | 08:59 |
noonedeadpunk | I guess I wanted to check what the computes/conductor report as versions | 09:00 |
bauzas | if both your conductor *and* computes are 6.2, then try pinning the rpc version | 09:00 |
bauzas | yeah, got it | 09:00 |
bauzas | the service version should tho give you that info | 09:00 |
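As a side note, the per-service versions that drive this automatic negotiation can be inspected directly in the cell database; a minimal sketch, assuming the standard `services` table layout:

```sql
-- Hypothetical check against the nova cell database: the `version`
-- column is the service (object) version that nova maps to an RPC
-- version alias when auto-pinning.
SELECT host, `binary`, version, updated_at, deleted
FROM services
WHERE `binary` = 'nova-compute';
```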
noonedeadpunk | so if all is 66 - it should kinda work... | 09:00 |
bauzas | https://github.com/openstack/nova/commit/8c2e76598995f0d417653c60a63ea342baf4e880#diff-c0b6a5928be3ac40200a2078b084341bb9187a12b1f959ad862e0038c9029193 | 09:00 |
bauzas | 66 is the object version that matches RPC 6.2 | 09:01 |
noonedeadpunk | As iirc 66 is 2023.1 | 09:01 |
noonedeadpunk | yeah... | 09:01 |
bauzas | are you sure you restarted all your conductors after the upgrade ? :) | 09:01 |
bauzas | sorry for that silly question but I had to | 09:01 |
noonedeadpunk | They won't report 66 otherwise? | 09:02 |
bauzas | good call | 09:02 |
noonedeadpunk | I was actually trying to find a way to check what I could have forgotten to restart :) | 09:02 |
noonedeadpunk | But I'm checking now.... | 09:02 |
bauzas | I guess you can find the error in the conductor log, can't you ? | 09:02 |
bauzas | if so, identify the conductor service in question | 09:03 |
noonedeadpunk | Well, I've checked ps on all conductor hosts - all running from the same venv, and the venvs have the same nova version installed... | 09:10 |
noonedeadpunk | And the logs are clean, as conductor obviously doesn't see any issues | 09:10 |
noonedeadpunk | let me check if code does contain these patches you've sent me | 09:10 |
noonedeadpunk | yeah, it's obviously there for all conductors... | 09:14 |
noonedeadpunk | Well, the upgrade check throws a `hw_machine_type unset` warning. Will try to sort it out - not sure if that's related though | 09:21 |
bauzas | no, this isn't | 09:23 |
bauzas | noonedeadpunk: since it's a venv, can you force-add some logging in the conditional ? | 09:25 |
noonedeadpunk | sure | 09:25 |
bauzas | oh wait | 09:26 |
bauzas | this is a rebuild revert, right? | 09:26 |
bauzas | uh no | 09:26 |
bauzas | we're just calling the decorator in case rebuild_instance raises an exception | 09:27 |
bauzas | speaking out loud, | 09:27 |
bauzas | the RPC check on conductor considers that compute is older | 09:27 |
bauzas | but this is also not passing the target_state parameter or we would have an exception | 09:28 |
noonedeadpunk | no, it's not a revert, it's basically just `openstack server rebuild <server_uuid> --image <image_uuid>` | 09:29 |
bauzas | yeah yeah | 09:29 |
bauzas | it's popping up in the deco just because this is wrapping the call itself | 09:30 |
noonedeadpunk | But yes, looking at the code you've shown I tend to agree that the issue is in the rpc version, but it's very confusing, given that in the DB it's reported as 66... And I'm pretty sure that before the upgrade it was 61 or something | 09:30 |
bauzas | but that means the RPC call doesn't contain the expected target_state arg | 09:30 |
bauzas | so something happens in the negotiation | 09:30 |
bauzas | compute says "give me target_state" but conductor doesn't offer it | 09:31 |
noonedeadpunk | let me try restarting the compute maybe, just in case.... | 09:31 |
bauzas | I'd say the other way, a conductor restart | 09:32 |
bauzas | compute is new enough to require target_state | 09:32 |
noonedeadpunk | yeah, makes sense... | 09:32 |
bauzas | and https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/manager.py#L621 exposes 6.2 I just checked | 09:33 |
bauzas | so yeah, I'd add a few debugging logs in the compute rpcapi method | 09:34 |
bauzas | (well, if this was my env, I'd pdb :D) | 09:34 |
noonedeadpunk | `Oct 24 09:34:35 control01-nova-api-container-13281850 nova-conductor[4131712]: 2023-10-24 09:34:35.835 4131712 INFO nova.compute.rpcapi [None req-28a1e0a6-3f71-4238-825d-b508d33ac758 - - - - - -] Automatically selected compute RPC version 6.1 from minimum service version 64` | 09:34 |
noonedeadpunk | o_O | 09:34 |
bauzas | but logging the object states before the RPC call seems a way forward | 09:34 |
bauzas | ahaha ! | 09:35 |
noonedeadpunk | Hm... Maybe that's the compute which is down making things weird | 09:35 |
bauzas | no | 09:35 |
bauzas | I have to think about it, sec | 09:36 |
noonedeadpunk | Well, the compute that is down still has version 64 | 09:36 |
noonedeadpunk | So it can bring negotiation down to that version? | 09:36 |
noonedeadpunk | I can pretty much bet that's the case... | 09:37 |
bauzas | oh, wait a sec | 09:37 |
noonedeadpunk | the compute was down during the upgrade | 09:38 |
bauzas | I need to remember my rpc compat skilles | 09:38 |
bauzas | skills* | 09:38 |
bauzas | yeah, so the thing is, https://github.com/openstack/nova/blob/stable/2023.1/nova/compute/rpcapi.py#L504-L505 gives you 64 | 09:39 |
bauzas | so we're capping to the minimum version | 09:39 |
bauzas | but then, the manager shouldn't yell | 09:39 |
bauzas | I'm missing something obvious but I'm rusty | 09:39 |
noonedeadpunk | I kinda wonder if `get_minimum_version` should only account for the ones that are up... | 09:40 |
noonedeadpunk | But yeah, if that minimum version is fine, then the compute shouldn't fail because the version is too low either, I guess... | 09:41 |
bauzas | no no | 09:41 |
bauzas | this is a bug | 09:41 |
bauzas | I just checked | 09:41 |
bauzas | we broke rolling-upgrade compat, that's it | 09:41 |
noonedeadpunk | Should I submit one then? | 09:41 |
bauzas | the manager should default target_state=None | 09:42 |
bauzas | because we automatically pin our RPC calls to the min version | 09:42 |
bauzas | I'll doublecheck that with dansmith but I bet this is the problem | 09:42 |
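A minimal sketch of the kind of fix being suggested here, with the argument list trimmed; this is not the actual patch:

```python
# On the compute manager side, an argument introduced in a new RPC
# minor version needs a default so that an older (auto-pinned)
# conductor which does not send it yet keeps working during a rolling
# upgrade.
def rebuild_instance(self, context, instance, image_ref,
                     reimage_boot_volume=None, target_state=None,
                     **kwargs):
    # target_state is None when the caller is still capped at RPC 6.1,
    # so "not provided" has to be treated as a valid input.
    ...
```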
noonedeadpunk | fwiw, dropping that old compute from services made negotiation pass | 09:43 |
noonedeadpunk | `Automatically selected compute RPC version 6.2 from minimum service version 66` | 09:43 |
bauzas | and https://github.com/openstack/nova/commit/30aab9c234035b49c7e2cdc940f624a63eeffc1b#diff-47eb12598e353b9e0689707d7b477353200d0aa3ed13045ffd3d017ee7d9e753R3709 has the same problem | 09:43 |
noonedeadpunk | and rebuild worked out | 09:43 |
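Roughly the workaround applied above, as CLI commands; the service id is a placeholder:

```console
$ openstack compute service list --service nova-compute
# remove the record of the compute that is down and still reporting an
# old service version, so it no longer drags the minimum version down
$ openstack compute service delete <id-of-the-down-compute-service>
```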
bauzas | yeah, 64 is zed | 09:45 |
noonedeadpunk | Ok, I can submit a bug report if you think it's a bug, and I have a way around the issue as well :) | 09:45 |
bauzas | but basically, rebuild is broken on a rolling-upgrade from Zed to Antelope | 09:45 |
bauzas | noonedeadpunk: sure, file it please | 09:46 |
bauzas | and I think rebuild is also broken from Yoga to Zed btw. :) | 09:46 |
noonedeadpunk | Yes, another region we were upgrading from Yoga was also broken... | 09:47 |
bauzas | on rebuild ? | 09:47 |
noonedeadpunk | But it's an unofficially supported upgrade, so I took the better example :p | 09:47 |
noonedeadpunk | Yeah, same error | 09:47 |
bauzas | but not the same argument I guess | 09:47 |
noonedeadpunk | The only one that has rebuild working is the one that has no computes down | 09:48 |
bauzas | from yoga to zed, it should yell about reimage_boot_volume missing | 09:48 |
noonedeadpunk | I actually haven't dug into that yet, to be frank, as the issues reported by users were the same from their side | 09:48 |
bauzas | I have to doublecheck, but I think we're not testing rebuild with grenade | 09:49 |
bauzas | anyway, I need to disappear, I have a few toxins to sweat before we start the PTG | 09:50 |
noonedeadpunk | sure, thanks a lot for the help, the report is on its way | 09:53 |
noonedeadpunk | anyway, since I found out about `hw_machine_type`, what would you suggest adding? `q35` for all images, basically? | 09:58 |
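On the `hw_machine_type` warning, a sketch of the relevant option, assuming the libvirt driver; the value is illustrative and site-specific:

```ini
# nova.conf on the compute nodes: default machine type per architecture,
# used for guests whose image does not set the hw_machine_type property.
[libvirt]
hw_machine_type = x86_64=q35
```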
noonedeadpunk | bug: https://bugs.launchpad.net/nova/+bug/2040264 | 10:15 |
opendevreview | Alexey Stupnikov proposed openstack/nova stable/2023.1: Translate VF network capabilities to port binding https://review.opendev.org/c/openstack/nova/+/898945 | 11:39 |
sean-k-mooney | noonedeadpunk: i have not read back fully but one of the main issues with detaching a boot drive is it leaves the instance in a fundamentally broken state | 12:09 |
sean-k-mooney | if we were ever to support that i think we would need to have a new state for the vm to model that, and/or allow a way to say use a separate disk as the boot device. | 12:10 |
noonedeadpunk | Well, shelved_offloaded should work kinda? | 12:10 |
sean-k-mooney | it also kind of messes with the resource allocations if you don't follow our guidance and use a flavor with 0 disk_gb for bfv guests | 12:10 |
noonedeadpunk | But I guess the challenge there was also to record which drive was detached, to re-attach it as root later | 12:11 |
sean-k-mooney | noonedeadpunk: shelve_offloaded does not work because we can't unshelve if you have detached the volume and attached it to something else | 12:11 |
noonedeadpunk | but then it would be a valid failure that the VM can't be unshelved | 12:11 |
sean-k-mooney | ya so it really comes down to the use case | 12:12 |
noonedeadpunk | sean-k-mooney: sorry, I didn't get the part about "use flavor with 0 disk_gb for bfv guests" - can you elaborate a bit on that? | 12:12 |
noonedeadpunk | As I thought that for bfv you're kinda supposed to have 0 disk_gb? | 12:12 |
sean-k-mooney | yes that is the recommendation | 12:12 |
sean-k-mooney | but some people don't do that and use a normal flavor | 12:13 |
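The recommendation being discussed, as a sketch; flavor name, sizes and resource names are illustrative:

```console
# a flavor with disk 0 so boot-from-volume guests do not carry a local
# DISK_GB request
$ openstack flavor create --vcpus 2 --ram 4096 --disk 0 m1.bfv
$ openstack server create --flavor m1.bfv --volume <bootable-volume> \
      --network <network> my-bfv-server
```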
bauzas | sean-k-mooney: the fact is, we need a new design discussion | 12:13 |
bauzas | but if someone wants to work on it, sure I can hel | 12:13 |
noonedeadpunk | ah, ok, I had read it as "you don't follow the recommendation AND use 0 disk_gb" | 12:13 |
bauzas | help | 12:13 |
bauzas | that was my point | 12:13 |
bauzas | noonedeadpunk: ++ I'll also discuss it on the PTG | 12:13 |
sean-k-mooney | bauzas: well right now i think its a hard cell to add to nova unless we put some restricitons in place | 12:14 |
sean-k-mooney | ... sell | 12:14 |
bauzas | sure, see my above comments :) | 12:14 |
bauzas | anyway | 12:14 |
bauzas | noonedeadpunk: I'll provide a bugfix hopefully next week | 12:15 |
sean-k-mooney | noonedeadpunk: i would have less of an issue with saying "you can only detach the root volume if the flavor has 0 disk_gb" | 12:15 |
bauzas | or tomorrow morning if I have time | 12:15 |
sean-k-mooney | and then putting the vm into an "unbootable" state | 12:15 |
sean-k-mooney | or only supporting this for shelve_offloaded instances | 12:15 |
sean-k-mooney | noonedeadpunk: the real concern was not with the detach but with potentially reattaching a different root volume with image properties that were invalid for the current host | 12:16 |
sean-k-mooney | but if it's shelve_offloaded after the reattachment then that's not a problem, as when we unshelve it will go through the scheduler | 12:16 |
noonedeadpunk | Yeah, that's why I thought that shelved should be quite appropriate state where you potentially can do that | 12:17 |
noonedeadpunk | And I would say that having 0 disk_gb on the flavor would also be a fair requirement | 12:17 |
noonedeadpunk | for the feature to work | 12:18 |
sean-k-mooney | the 0 disk_gb thing is partly a personal preference. we do not allow non-bfv guests to use 0 disk flavors | 12:18 |
sean-k-mooney | but we don't require them to either, and i really would prefer to require that | 12:18 |
sean-k-mooney | we don't actually need to use the flavor size to know if it's bfv in the api | 12:19 |
sean-k-mooney | so we could make it work without that restriction | 12:19 |
sean-k-mooney | noonedeadpunk: what i really want to avoid is needing to hit the scheduler on every volume attach | 12:20 |
noonedeadpunk | well. requiring that might complicate some things actually... | 12:20 |
sean-k-mooney | just in case it happens to be a root volume | 12:20 |
noonedeadpunk | I mean, prohibiting bfv with a non-0 disk flavor | 12:20 |
sean-k-mooney | we didn't do that in the past due to upgrade concerns | 12:21 |
sean-k-mooney | what we did instead is, if it's BFV, we no longer include the disk_gb in the allocation request | 12:21 |
noonedeadpunk | nah, I think that it indeed should be some offloaded state imo... Or indeed introduce a new one, so that resources won't be wasted on offload per se | 12:21 |
sean-k-mooney | as of some relatively recent openstack version | 12:21 |
noonedeadpunk | but which will still go through the scheduler on boot | 12:21 |
noonedeadpunk | but yeah, eventually it's still offloaded... | 12:22 |
sean-k-mooney | noonedeadpunk: we will talk about this later in the ptg but i'm also wondering about the original use case that is of most interest to people | 12:22 |
sean-k-mooney | i.e. what fraction of those that want this are using a version of nova that does not support rescue or rebuild for bfv guests | 12:23 |
sean-k-mooney | we added both in the last 2 years | 12:23 |
noonedeadpunk | I think the main use case is in fact some rescue.... | 12:23 |
noonedeadpunk | and yes, that kinda covers my usecase fully | 12:23 |
noonedeadpunk | sean-k-mooney: is it possible to reset the delete on termination flag for the root drive? | 12:24 |
sean-k-mooney | noonedeadpunk: part of the reason we didn't do this was we decided let's just make the instance actions that don't work with bfv work first | 12:24 |
sean-k-mooney | noonedeadpunk: that i'm not sure about, i think you might be able to do it via cinder | 12:24 |
sean-k-mooney | but that would also be reasonable to support IMO if we don't support it already | 12:25 |
sean-k-mooney | you should always be able to say "hey, don't delete this" after the fact i think | 12:25 |
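For reference, and not verified against this deployment: since compute API microversion 2.85 the `delete_on_termination` flag of an already attached volume can be updated through the os-volume_attachments API; the server and volume ids below are placeholders and $COMPUTE_ENDPOINT stands for the versioned compute endpoint:

```console
$ curl -s -X PUT \
    -H "X-Auth-Token: $TOKEN" \
    -H "OpenStack-API-Version: compute 2.85" \
    -H "Content-Type: application/json" \
    -d '{"volumeAttachment": {"volumeId": "<volume>",
         "delete_on_termination": false}}' \
    "$COMPUTE_ENDPOINT/servers/<server>/os-volume_attachments/<volume>"
```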
noonedeadpunk | yeah | 12:25 |
noonedeadpunk | as that's what comes to mind when I think about why I would need to detach the root volume... | 12:26 |
noonedeadpunk | Just to save it from being deleted | 12:26 |
noonedeadpunk | (with the VM) | 12:26 |
noonedeadpunk | and then rescue works nicely indeed | 12:26 |
sean-k-mooney | ya and you should be able to do the same for ports. the port case is definitely on our todo list | 12:26 |
noonedeadpunk | but then indeed I don't know why you would need to detach it | 12:28 |
noonedeadpunk | Just out of convenience... Like the first thing someone would try rather than using rescue... | 12:29 |
noonedeadpunk | so I think you're right here and it's something that people were just historically missing due to rescue not working | 12:29 |
noonedeadpunk | and that is kinda inertia of thinking | 12:30 |
sean-k-mooney | ya so that's why i'm wondering about the use case | 12:32 |
sean-k-mooney | if there is a use case we can't support then i'm happy to hear that and see what we can do to support it | 12:32 |
sean-k-mooney | but if it's already covered by rescue or updating the delete on terminate flag | 12:33 |
sean-k-mooney | then i would prefer to use that approach rather than add root detach | 12:33 |
sean-k-mooney | so we will see later i guess | 12:33 |
bauzas | we gonna use Zoom eventually https://www.openstack.org/ptg/rooms/diablo | 13:00 |
bauzas | sean-k-mooney: are you coming to the PTG ? | 13:05 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration https://review.opendev.org/c/openstack/nova/+/897842 | 14:28 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type https://review.opendev.org/c/openstack/nova/+/898229 | 14:28 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition https://review.opendev.org/c/openstack/nova/+/899185 | 15:19 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: Avoid getCapabilities() to calculate host CPU definition https://review.opendev.org/c/openstack/nova/+/899185 | 15:25 |
simondodsley | sean-k-mooney: sorry - i can't stay any longer on the PTG today - i have other commitments i have to deal with. I would love some update on the spec I put in the etherpad though | 16:18 |
opendevreview | Merged openstack/nova stable/2023.2: Install lxml before we need it in post-run https://review.opendev.org/c/openstack/nova/+/899142 | 16:21 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying https://review.opendev.org/c/openstack/placement/+/898807 | 16:43 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/placement master: Use new db session when retrying https://review.opendev.org/c/openstack/placement/+/898807 | 16:45 |
bauzas | not fun: we had the exact same issue with rolling upgrade in Victoria>Wallaby on rebuild https://review.opendev.org/c/openstack/nova/+/761458/ | 18:13 |
bauzas | so we really need a grenade test on it | 18:13 |
bauzas | and perhaps some hacking check, I dunno | 18:13 |
bauzas | I'll write the fixes tomorrow morning | 18:14 |
bauzas | and yeah, we don't test rebuild https://github.com/openstack/nova/blob/master/.zuul.yaml#L560 | 18:15 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Fix image conversion check in finish_migration https://review.opendev.org/c/openstack/nova/+/897842 | 21:49 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Deprecate use_cow_images and 'default' images_type https://review.opendev.org/c/openstack/nova/+/898229 | 21:49 |
opendevreview | melanie witt proposed openstack/nova stable/zed: Remove use of removeprefix https://review.opendev.org/c/openstack/nova/+/899148 | 23:01 |
opendevreview | Merged openstack/nova master: fix sphinx-lint issues in releasenotes https://review.opendev.org/c/openstack/nova/+/897087 | 23:54 |
opendevreview | Merged openstack/nova master: fix sphinx-lint issues in api guide https://review.opendev.org/c/openstack/nova/+/897088 | 23:54 |