Friday, 2023-02-03

opendevreviewMerged openstack/nova master: Check our nodes for hypervisor_hostname changes  https://review.opendev.org/c/openstack/nova/+/87222002:08
*** blarnath is now known as d34dh0r5306:53
*** blarnath is now known as d34dh0r5307:01
opendevreviewAmit Uniyal proposed openstack/nova master: Adds check if resized to swap zero  https://review.opendev.org/c/openstack/nova/+/85733909:47
opendevreviewRajesh Tailor proposed openstack/nova master: Add functional regression tests for bug 1857306  https://review.opendev.org/c/openstack/nova/+/70045610:38
opendevreviewRajesh Tailor proposed openstack/nova master: Add functional regression tests for bug 1857306  https://review.opendev.org/c/openstack/nova/+/70045610:52
zigoMy patch for rocky/stein/train has probably broken nova ... :(11:16
zigoWhat's the current status of the Nova gate for these releases?11:17
bauzaszigo: we're on hold with wallaby11:20
bauzasand I think we won't have backports for Train11:20
bauzas(and as a reminder, Stein and older branches are EOL'd in nova)11:21
sean-k-mooneyzigo: part of the problem with train and older is that we cannot bump the minimum requirements of oslo, and the patch that provides the extra info was merged in ussuri11:23
sean-k-mooneyso the only way to backport this upstream in train and older would be to either vendor the oslo changes in nova or fall back to not fixing the CVE when oslo is not new enough11:23
bauzasmmm, looks like a new gate problem \o/11:31
bauzashttps://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea11:31
bauzas8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!())11:31
bauzashuzzah11:32
bauzashttps://bugs.launchpad.net/nova/+bug/2004641 is created11:37
sean-k-mooneyya we have seen that a few times11:40
sean-k-mooneynot that new but definitely intermittent11:40
sean-k-mooneyI did see b'400 Bad Request\n\nThe Store URI was malformed.\n\n ' last year but only like once or twice11:41
sean-k-mooneyi have no idea why that happens sometimes11:43
bauzassean-k-mooney: unrelated, I tried this morning to see how to use the SysFixture I created for two different computes11:50
bauzassean-k-mooney: the problem is not on how to use two fixture instances11:50
sean-k-mooneyit's the mock of the sys path11:51
sean-k-mooneyyou need to activate that when the compute is started, with a with statement11:51
bauzassean-k-mooney: but the question I have is how to make sure that if we call, say, migrate_instance(), we use the right fixture instance11:52
bauzassean-k-mooney: you think it would work then?11:52
bauzasI was thinking of this11:52
sean-k-mooneywe have done similar things in the past. dansmith is doing something similar in the stable uuid series11:52
bauzasbut I was afraid that when we call migrate_instance() it wouldn't work11:52
sean-k-mooneywell there is one way to find out11:53
sean-k-mooneytry it and see11:53
bauzasI could use a context manager11:53
bauzasyeah, I'll try11:53
sean-k-mooneyif it's a problem then we can do it slightly differently by doing mock.patch.object on the specific compute instances11:54
sean-k-mooney*compute objects11:54
sean-k-mooneyand ensure each one will use the correct one11:54
bauzasI'll test it11:56
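For context, here is a minimal sketch of the per-object patching being suggested. SysFixture, driver.cpu_api and its power_up/power_down methods are assumed stand-ins for the WIP series' real names, not the actual code under review.

```python
# Sketch only: SysFixture, driver.cpu_api and power_up/power_down are
# placeholders for the names used in the WIP series.
import fixtures

from nova import test


class SysFixture(fixtures.Fixture):
    """Stand-in fake sysfs: records which host CPUs a compute onlined."""

    def _setUp(self):
        self.online = set()

    def power_up(self, cpus):
        self.online.update(cpus)

    def power_down(self, cpus):
        self.online.difference_update(cpus)


class TestCPUPowerMgmtMigration(test.TestCase):

    def _start_compute_with_sys(self, hostname):
        sys_fixture = self.useFixture(SysFixture())
        compute = self.start_service('compute', host=hostname)
        driver = compute.manager.driver
        # Patch this specific driver object only: during a live migration the
        # source and destination computes each hit their own fake sysfs.
        self.useFixture(fixtures.MockPatchObject(
            driver.cpu_api, 'power_up', side_effect=sys_fixture.power_up))
        self.useFixture(fixtures.MockPatchObject(
            driver.cpu_api, 'power_down', side_effect=sys_fixture.power_down))
        return compute, sys_fixture
```

With that shape, migrate_instance() on either side naturally lands on the fixture belonging to that compute, which is the concern raised above.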
bauzasI'm pretty much done with my series11:56
bauzasI'm just adding more tests11:56
sean-k-mooneyas in "ready to give up"  or its good for re review11:57
sean-k-mooneyim hoping the latter :)11:57
sean-k-mooneymain feedback at a glance is you have no docs and no release note so in addtion to tests please think about those11:57
sean-k-mooneyping me when you would like a full review and/or manual testing of this11:58
bauzassean-k-mooney: yup, you can manually test if you want11:58
bauzassean-k-mooney: the WIP is +W because of the tests and indeed the documents11:59
sean-k-mooneyyou probably have not tried using mixed cpus (a vm with pinned and unpinned cores) and I'm not sure if you have checked what happens when you use vcpu_pin_set vs cpu_dedicated_set11:59
bauzass/+W/-W of course11:59
bauzassean-k-mooney: good point, I can try11:59
sean-k-mooneyI'll test those manually, but once you have pushed some basic functional tests covering the common cases in a non-WIP revision, it would be nice to have functional tests for those too12:00
sean-k-mooneywe should also test the other funky numa cases. I really don't expect it to matter, but asymmetric numa nodes and multiple numa nodes. all of those advanced cases can however be added later12:02
sean-k-mooneyI don't believe your code will care12:02
sean-k-mooneyinstance.numa_topology.cpu_pinning should be every PCPU the vm is pinned to, regardless of numa12:03
sean-k-mooneyincluding the extra one not used by vm cores when you use hw:emulator_threads_policy=isolate12:04
bauzasnot sure I fully understand your point, sorry12:06
sean-k-mooneyif you tried to parse the pinned cpus from the xml there are a bunch of edge cases that you would have had to handle12:06
sean-k-mooneylike also looking at the emulator thread12:06
sean-k-mooneybut since you're using instance.numa_topology.cpu_pinning you can mostly ignore that complexity12:07
bauzasyup, I changed it based on your point12:07
sean-k-mooneyso while it would still be nice to test some of the more complex numa topologies and edge cases, the code should not generally have to care about them12:08
bauzasyup, that's why I want to have migration tests + some numa ones12:08
sean-k-mooneybauzas: we are explicitly only supporting this when you use cpu_dedicated_set, right, and not vcpu_pin_set12:09
bauzascorrect12:09
bauzaswell, checking my code12:09
sean-k-mooneydo we want to raise a config error like dansmith is doing in the stable uuid series12:10
sean-k-mooneyi.e. if you enable this but also have vcpu_pin_set defined12:10
sean-k-mooneyor don't have cpu_dedicated_set defined12:10
bauzasgood question, I need to think about it12:10
sean-k-mooneythere are extra edge cases that only come into play if you use vcpu_pin_set12:11
sean-k-mooneywhich is why I'm asking12:11
bauzashttps://review.opendev.org/c/openstack/nova/+/868237/6/nova/virt/libvirt/cpu/api.py12:11
sean-k-mooneyspecifically related to cpu_thread_policy=isolate on a host with hyperthreading12:11
bauzashere, we only check get_dedicated_set()12:11
bauzasbut we try to look at the numa_topo blob from the instance12:12
sean-k-mooneyok so that is not going to block it12:12
bauzasif the instance doesn't have a numa_topo blob, we could have an exception, shit12:12
sean-k-mooneywell, we should not have an exception12:13
bauzasas I directly call the subfield from numa_topo12:13
sean-k-mooneywe should do nothing12:13
bauzaswhat's the default ?12:13
bauzasi need to check the object default values12:13
sean-k-mooneyoh, you meant it will currently raise12:13
bauzasfor the standard instance typo12:13
sean-k-mooneyyes it will12:13
sean-k-mooneythe default is None12:13
bauzasyeah12:13
bauzasmy code is wrong for standard non-numa instances12:14
sean-k-mooneyso right now this will cause an AttributeError if there is no numa topology12:14
sean-k-mooneyyep12:14
bauzascorrect, I need to fix it12:14
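For reference, a minimal sketch of the guard being discussed, assuming the helper lives alongside nova/virt/libvirt/cpu/api.py as in the WIP series; _set_online is a made-up placeholder, not the series' API.

```python
# Sketch of the non-NUMA guard: do nothing for unpinned instances instead of
# raising the AttributeError mentioned above.
import nova.conf

CONF = nova.conf.CONF


def power_up(instance):
    """Online the dedicated CPUs of a pinned instance, if any."""
    if not CONF.libvirt.cpu_power_management:
        return
    # Unpinned instances have numa_topology=None, so bail out early.
    if instance.numa_topology is None or not instance.numa_topology.cpu_pinning:
        return
    for pcpu in instance.numa_topology.cpu_pinning:
        _set_online(pcpu)  # hypothetical sysfs helper
```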
sean-k-mooneybut also we should add a check in init_host in the driver to see if CONF.libvirt.cpu_power_management is enabled and cpu_dedicated_set is not defined12:15
sean-k-mooneyand raise an InvalidConfiguration exception12:15
bauzaswe haven't said it in the spec but that looks ok to me12:16
sean-k-mooneywell the old way of doing cpu pinning is deprecated for removal12:16
sean-k-mooneyand the old way (if the vm has cpu_thread_policy=isolate and the host has hyperthreading) claims 2 host cpus per guest cpu12:17
sean-k-mooneybauzas: I think your code will handle that but if we want to allow that we need to test it12:17
sean-k-mooneyso either raise an error or we need to test that in a functional test later in the series12:17
bauzasI don't want to support the old config12:17
sean-k-mooneyworks for me12:18
bauzasso I'd prefer to hard-stop at startup if power management is set with the legacy pinning config12:18
bauzashence me saying it wasn't discussed in the spec, but I'm OK12:18
sean-k-mooneyI would like to remove the old config and pinning logic next cycle if we can find time to do it12:18
sean-k-mooneybauzas: yep, we didn't discuss it in the spec since we generally forget about the legacy pinning12:19
bauzascool12:19
bauzasno worries12:19
sean-k-mooneyit was meant to be removed 2 or 3 releases ago12:19
bauzasnp12:20
bauzassean-k-mooney: one last question, can we have both cpu_dedicated_set and vcpu_pin_set defined?12:23
bauzasor are they mutually exclusive ?12:23
* bauzas starts writing the init_host failstop12:24
sean-k-mooneycpu_dedicated_set and vcpu_pin_set are mutually exclusive12:25
bauzas++12:25
sean-k-mooneyalthough that is checked later in the code, not in the config option definition12:25
sean-k-mooneyone is in the compute group and the other is in DEFAULT12:25
sean-k-mooneybauzas: you can simply check "if CONF.compute.cpu_dedicated_set is None and CONF.libvirt.cpu_power_management: raise"12:27
bauzasthat was my guess12:30
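A minimal sketch of that fail-fast check, roughly as quoted above; the helper name and message wording are illustrative, and InvalidConfiguration matches the exception mentioned earlier in the discussion.

```python
# Sketch of the init_host fail-fast: refuse to start the compute when power
# management is enabled without cpu_dedicated_set.
import nova.conf
from nova import exception

CONF = nova.conf.CONF


def _check_cpu_power_management_config():
    if CONF.libvirt.cpu_power_management and CONF.compute.cpu_dedicated_set is None:
        raise exception.InvalidConfiguration(
            "'[libvirt] cpu_power_management' requires '[compute] "
            "cpu_dedicated_set' to be defined; the legacy vcpu_pin_set "
            "option is not supported for this feature.")
```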
* bauzas needs to go errand for 1.5h12:32
sean-k-mooneyI'm going to swap to downstream stuff for a while unless pinged12:32
sean-k-mooneybauzas: can we try and land dansmith's stable uuid series again today12:32
bauzassean-k-mooney: yup, I can try to take a look again12:32
sean-k-mooneycool, I'm +2 all the way up. dan did some rebases and fixed a minor bug so they lost the +2s they had before12:33
*** dasm|off is now known as dasm13:52
*** dasm is now known as dasm|rover13:52
ralonsohsean-k-mooney, hi! Do you know if this is a known error?14:23
ralonsohhttps://zuul.opendev.org/t/openstack/build/0d9af8c4e906422a9e4a27b1b849f31514:23
ralonsohThis is the second recheck with this error14:23
sean-k-mooneyno, that's not a known error14:25
sean-k-mooneyalthough the volume tests can be flaky14:25
sean-k-mooneythis looks unrelated14:26
sean-k-mooneyi think there may have been an OOM issue14:29
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/0d9af8c4e906422a9e4a27b1b849f315/log/controller/logs/screen-memory_tracker.txt#476814:29
ralonsohright, thanks14:29
sean-k-mooneylooks like nova api lost access to the db14:30
sean-k-mooneyso my guess is mariadb was killed14:30
sean-k-mooneysame for keystone, it lost the db connection14:31
ralonsohthat usually happens with an OOM14:31
sean-k-mooneyyep so we might need to add a little more swap to that job14:31
ralonsohdid you try reducing the parallelism in the functional tests?14:31
sean-k-mooneyit's not the functional tests, it's tempest14:31
ralonsohright, in tempest14:31
sean-k-mooneyit's currently 4, which should be ok14:33
ralonsohwe reduced some jobs to 3 or 214:33
ralonsohto avoid this issue14:33
sean-k-mooneyI think this is using the default swap of 1G, we should set it to 814:33
sean-k-mooneywe could reduce concurrency too but I would try swap first14:34
ralonsohperfect14:34
sean-k-mooneywe currently set concurrency here https://github.com/openstack/os-vif/blob/master/.zuul.yaml#L2314:34
sean-k-mooneyyou can add configure_swap_size: 8192 there as well14:35
ralonsohshould I do it in a separate patch?14:36
sean-k-mooneyif that does not work then set concurrency to 314:36
sean-k-mooneyya, let's make it a separate patch14:36
ralonsohcool, give me 1 min14:36
sean-k-mooneythen we can just recheck yours once it's merged, if it passes14:36
opendevreviewRodolfo Alonso proposed openstack/os-vif master: Increase the swap size to 8GB in tempest jobs  https://review.opendev.org/c/openstack/os-vif/+/87265514:39
sean-k-mooneyok lets leave that run but we should be able to merge both by the end of the day all going well14:40
ralonsohthat's perfect14:41
opendevreviewElod Illes proposed openstack/nova stable/wallaby: [stable-only] Remove broken sdk job from wallaby  https://review.opendev.org/c/openstack/nova/+/87179815:39
spatelsean-k-mooney did you ever see this error when trying to create vm1 - https://paste.opendev.org/show/bmAj7IrkhbRpMbe14zht/16:10
sean-k-mooneyI have seen it if the wsgi timeout or haproxy timeout is set too low and the services are under heavy load16:13
spatelHmm!! 16:13
sean-k-mooneyyou need to check if the request ever got to the nova api. if it did, you need to check if the api responded before the connection closed and if it got to the load balancer or not16:14
spatelserver load is 0.66 average 16:14
spatelI can do nova list etc.. which works fine. only vm creation choking up 16:15
spatellet me check nova api logs etc..16:15
spatelHAproxy showing all green health for nova members16:17
spatelopenstack compute service list also showing all services up 16:18
spatelThis log has nothing to do with this issue, correct? - https://paste.opendev.org/show/bhP6raKj9pFgARdlb0ES/16:22
spatelsean-k-mooney it works after I restarted all nova-* services 16:25
spateldo you think this is a bug? I am running the wallaby release 16:25
dansmithbauzas: you still around?16:28
bauzasdansmith: yes16:29
dansmithany chance you could look at the stable compute stuff today? it's +2d up to the top, most of what is left are additional checks and tests16:29
dansmithI'm going to work on docs next week16:29
sean-k-mooneyspatel: maybe, but without you debugging it and finding a root cause or an indication of the problem we don't have anything to go on16:30
bauzasdansmith: I'm a bit short in time but I can try16:31
sean-k-mooneyspatel: I don't think that log is related16:31
spatelsean-k-mooney I think I'd better upgrade to Xena or Yoga. 16:32
spatelwallaby is a little behind now16:32
dansmithbauzas: ack, well, if we have to recheck grind them, I just don't want to be racing next week.. and monday is a holiday for some folks16:32
bauzasdansmith: I'm rushing to finish writing functests for https://review.opendev.org/c/openstack/nova/+/868237/ but again, I'll try16:34
dansmithokay, well, it's not that critical I guess16:35
dansmithmaybe I could bribe melwitt to have a look-see16:36
bauzassean-k-mooney: quick q, when reading https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#configure-libvirt-pinning16:40
bauzassean-k-mooney: in my functest, I only set cpu_dedicated_set, but when trying to boot a regular non-pinned instance, it errors out16:40
bauzaswith NoValidHost16:41
bauzasso I guess I also need to set cpu_shared_set ?16:41
* bauzas thought it was implicit16:41
bauzashttps://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set doesn't mention the dependency16:42
bauzasoh wait, I see my bug16:43
bauzasmy self.flags was setting None to cpu_shared_set16:44
bauzasffff16:45
sean-k-mooneycpu_shared_set is for non-pinned cores, yes16:45
sean-k-mooneyor floating vms16:45
bauzasI know this16:45
sean-k-mooneyyou only need to set it if cpu_dedicated_set is defined16:45
bauzasok, both are dependent then16:45
sean-k-mooneyI don't want to go into the complexity, I can explain it in detail next week if you like16:46
bauzasyou can't just set cpu_dedicated_set without opting into cpu_shared_set16:46
bauzasno worries, it works16:46
bauzasit was a pebkac16:46
sean-k-mooneyyou can, but if you don't set cpu_shared_set then the host can only be used for pinned vms16:46
bauzasI see16:46
bauzaswith the restricted set of dedicated cpus16:47
bauzaslike, I have 10 cpus, I set 5 of them as dedicated, period.16:47
sean-k-mooneyyep16:47
bauzasthat means that the 5 left won't be used16:47
bauzasgotcha16:47
sean-k-mooneyya they are reserved for the host16:47
sean-k-mooneythe ones that are not listed16:47
sean-k-mooneyso normally I would expect the first core in each socket to not be in either set16:48
sean-k-mooneyand then you divide up the rest depending on your workloads16:48
bauzascool enough16:50
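To make the split concrete, a small hedged functional-test snippet; the CPU ranges are arbitrary and only illustrate the host layout described above.

```python
# Illustrative only: a hypothetical 10-CPU host, CPU 0 left to the host OS,
# CPUs 1-4 for floating (unpinned) guests, CPUs 5-9 for pinned guests.
from nova import test


class TestDedicatedSharedSplit(test.TestCase):
    def setUp(self):
        super().setUp()
        self.flags(cpu_dedicated_set='5-9', cpu_shared_set='1-4',
                   group='compute')
        # If cpu_dedicated_set is set but cpu_shared_set is left unset, the
        # host exposes no VCPU inventory, so a regular unpinned flavor fails
        # with NoValidHost, which is the behaviour hit in the functest above.
```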
opendevreviewElod Illes proposed openstack/nova stable/yoga: ignore deleted server groups in validation  https://review.opendev.org/c/openstack/nova/+/86798917:05
opendevreviewElod Illes proposed openstack/nova stable/yoga: add repoducer test for bug 1890244  https://review.opendev.org/c/openstack/nova/+/87266317:05
opendevreviewMerged openstack/nova stable/yoga: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141418:31
opendevreviewSylvain Bauza proposed openstack/nova master: enable cpus when an instance is spawning  https://review.opendev.org/c/openstack/nova/+/86823718:46
dvo-plvHello19:50
dvo-plvI have a question about traits. I added a new trait, COMPUTE_NET_VIRTIO_PACKED. When I execute the following command: openstack allocation candidate list --resource VCPU=1, I can see that one compute node has this trait and another does not19:52
sean-k-mooneyyou did not request the trait in the query19:53
sean-k-mooneyso you will get back a list of all hosts that have 1 cpu core free19:53
dvo-plvWhen I do a live migration from the compute node with the new trait to the compute node without this trait, should the scheduler check if the requested compute node has this trait and forbid the migration?19:53
sean-k-mooneyif and only if you have requested it in the flavor or implemented that in your change correctly19:54
sean-k-mooneyyou would ideally have a prefilter that adds the trait request19:54
dvo-plvI set this trait in the flavor with the following command: openstack flavor set --property trait:COMPUTE_NET_VIRTIO_PACKED=required 319:55
sean-k-mooneythat will take effect for new vms but not existing ones19:55
sean-k-mooneyif you have that set, however, then it will be included with the placement request and enforced19:55
dvo-plvYes, I create the flavor with this trait and after that create the VM and try to do a live migration19:56
sean-k-mooneythen placement should enable this to work as intended19:56
sean-k-mooneythe vm will only schedule to a host with that trait, including on migrations19:57
sean-k-mooneythis is part of this proposal correct https://review.opendev.org/c/openstack/nova-specs/+/868377/1/specs/2023.1/approved/virtio_packedring_configuration_support.rst19:58
dvo-plvIn the spec file you said that the packed ring option should be handled in some specific way to be sure that a vm with the packed_ring qemu option will not migrate to a node without support for this feature, and I cannot get this feature from libvirt. I found that this feature is only enabled from libvirt 6.3. So if I migrate to a node with a lower libvirt, the _check_compatible_with_source_hypervisor method will forbid the migration20:00
sean-k-mooneythat is not a sufficient check unfortunately20:01
sean-k-mooneythis is not a feature of libvirt, it's a qemu one, and that check would already be too late20:01
sean-k-mooneywhat we need to do is have nova report the trait on all hosts that support this20:02
sean-k-mooneyand then either have the guest opt into this with an image property or flavor extra spec20:03
sean-k-mooneywe can then use that image property/extra spec to request the trait automatically20:03
sean-k-mooneythat is the simple approach 20:03
sean-k-mooneythe harder approach is to figure out a way to auto-enable this20:04
sean-k-mooneyI don't know a simple way to do that off the top of my head, unfortunately20:04
sean-k-mooneyit's a bit late here but we should probably discuss this some more next week20:05
dvo-plvanother way: we can check the qemu version, this feature was implemented from qemu 4.220:05
dvo-plvor grep functionality like in this post http://blog.vmsplice.net/2020/05/how-to-check-virtio-feature-bits-inside.html20:06
dvo-plvThank you, let's move it to next week20:07
sean-k-mooneythe issue is we need to make the scheduling work before we have selected any host20:07
sean-k-mooneyso in general we cannot check the qemu/libvirt version at the scheduling step20:08
dvo-plvI tried to get this feature in the static_traits method in nova/virt/libvirt/driver.py20:08
sean-k-mooneythe best I can think of would be to stash a flag in the instance_system_metadata to record it was booted on a host that supported the packed format20:09
sean-k-mooneyand then include a request for the trait if that is found in the instance_system_metadata20:09
sean-k-mooneythat is probably the best we can do since we don't know if it's used by the guest20:09
sean-k-mooneythe static_traits function is the correct place to report the trait for the compute host20:10
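To make the opt-in idea concrete, a rough sketch modelled on the prefilters in nova/scheduler/request_filter.py. The extra spec and image property names (hw:virtio_packed_ring / hw_virtio_packed_ring) are assumptions, not a final interface, and this is not the code from the spec under review.

```python
# Sketch: translate a guest opt-in (flavor extra spec or image property, both
# names assumed) into a required placement trait, so only hosts whose driver
# reports the trait are candidates for boot and for migrations.
PACKED_TRAIT = 'COMPUTE_NET_VIRTIO_PACKED'


def packed_ring_prefilter(ctxt, request_spec):
    """Require the packed-ring trait in placement when the guest opts in."""
    extra_specs = request_spec.flavor.extra_specs or {}
    image_props = getattr(request_spec.image, 'properties', None)
    wants_packed = (
        extra_specs.get('hw:virtio_packed_ring') == 'true'
        or bool(getattr(image_props, 'hw_virtio_packed_ring', False)))
    if wants_packed:
        request_spec.root_required.add(PACKED_TRAIT)
        return True
    return False
```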
gmannbauzas: stephenfin: need one more review in this to unblock stable/wallaby gate (removing broken sdk job) https://review.opendev.org/c/openstack/nova/+/871798 21:06
*** dasm|rover is now known as dasm|afk21:36
melwittdansmith: I have look-see'd22:39
opendevreviewMerged openstack/nova stable/wallaby: [stable-only] Remove broken sdk job from wallaby  https://review.opendev.org/c/openstack/nova/+/87179822:40
dansmithmelwitt: um22:54
dansmithI'm not sure you can past-tense-verb look-see22:55
dansmith(but thanks :)22:55
melwitt:)22:55
sean-k-mooneyit's english, there is always a way to do things22:56
sean-k-mooney2023-02-03 23:00:25.664 41 ERROR nova.virt.driver [None req-80a00ce3-332e-4a98-be2e-d0a1bd9329d7 - - - - - -] Compute driver option required, but not specified23:00
sean-k-mooneylol ok that explains why it's not working23:01
opendevreviewSylvain Bauza proposed openstack/nova master: Enable cpus when an instance is spawning  https://review.opendev.org/c/openstack/nova/+/86823723:05
EugenMayer4I'm running OVS on zed. What I'm trying to do: I have 2 networks (or 3, if you also count the provider lan), let's call them provider_wan, intranet and DMZ. Now I want DMZ to only be able to talk to a very few clients on some specific ports in intranet (ip / port allow list), but nothing else in intranet. Also, a client in DMZ should be23:34
EugenMayer4able to access the internet via the provider lan. I finished setting up the internet access, I can also access the intranet, but I'm not able to limit what a DMZ client is able to access in the intranet. Any hints? All clients in this case are nova vms (qemu)23:34
sean-k-mooneythe only way I can think of to do that without the firewall-as-a-service project, which is dead, is to use a vm as a router23:36
sean-k-mooneyso don't interconnect the networks on the datacenter side or with neutron routers23:36
sean-k-mooneybut run a pfSense or similar vm and make it the default gateway for the DMZ and intranet23:37
sean-k-mooneyand connect it to the wan network23:37
sean-k-mooneythen you can implement whatever network policy you like there23:37
sean-k-mooneyEugenMayer4: in general the neutron channel would probably be able to help more23:37
