Friday, 2023-02-03

opendevreviewMerged openstack/nova master: Check our nodes for hypervisor_hostname changes  https://review.opendev.org/c/openstack/nova/+/87222002:08
*** blarnath is now known as d34dh0r5306:53
*** blarnath is now known as d34dh0r5307:01
opendevreviewAmit Uniyal proposed openstack/nova master: Adds check if resized to swap zero  https://review.opendev.org/c/openstack/nova/+/85733909:47
opendevreviewRajesh Tailor proposed openstack/nova master: Add functional regression tests for bug 1857306  https://review.opendev.org/c/openstack/nova/+/70045610:38
opendevreviewRajesh Tailor proposed openstack/nova master: Add functional regression tests for bug 1857306  https://review.opendev.org/c/openstack/nova/+/70045610:52
zigoMy patch for rocky/stein/train has probably broken nova ... :(11:16
zigoWhat's the current status of the Nova gate for these releases?11:17
bauzaszigo: we're on hold with wallaby11:20
bauzasand I think we won't have backports for Train11:20
bauzas(and as a reminder, Stein and older branches are EOL'd in nova)11:21
sean-k-mooneyzigo: part of the problem with train and older is that we cannot bump the minimum requirements of oslo, and the patch that provides the extra info was merged in ussuri11:23
sean-k-mooneyso the only way to backport this upstream in train and older would be to either vendor the oslo changes in nova or fall back to not fixing the CVE when oslo is not new enough11:23
bauzasmmm, looks like a new gate problem \o/11:31
bauzashttps://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-7d,to:now))&_a=(columns:!(filename),filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:filename,negate:!f,params:(query:job-output.txt),type:phrase),query:(match_phrase:(filename:job-output.txt)))),index:'94869730-aea11:31
bauzas8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:test_replace_location),sort:!())11:31
bauzashuzzah11:32
bauzashttps://bugs.launchpad.net/nova/+bug/2004641 is created11:37
sean-k-mooneyya we have seen that a few times11:40
sean-k-mooneynot that new but definitely intermittent11:40
sean-k-mooneyI did see b'400 Bad Request\n\nThe Store URI was malformed.\n\n ' last year but only like once or twice11:41
sean-k-mooneyi have no idea why that happens sometimes11:43
bauzassean-k-mooney: unrelated, I tried this morning to see how to use the SysFixture I created for two different computes11:50
bauzassean-k-mooney: the problem is not on how to use two fixture instances11:50
sean-k-mooneyit's the mock of the sys path11:51
sean-k-mooneyyou need to activate that when the compute is started, with a with statement11:51
bauzassean-k-mooney: but the question I have is how to make sure that if we call, say, migrate_instance(), we use the right fixture instance11:52
bauzassean-k-mooney: you think it would work then?11:52
bauzasI was thinking of this11:52
sean-k-mooneywe have done similar things in the past. dansmith is doing something similar in the stable uuid series11:52
bauzasbut I was afraid that when we call migrate_instance() it wouldn't work11:52
sean-k-mooneywell there is one way to find out11:53
sean-k-mooneytry it and see11:53
bauzasI could use a context manager11:53
bauzasyeah, I'll try11:53
sean-k-mooneyif it's a problem then we can do it slightly differently by doing mock.patch.object on the specific compute instances11:54
sean-k-mooney*compute objects11:54
sean-k-mooneyand ensure each one will use the correct one11:54
bauzasI'll test it11:56
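For context, here is a minimal sketch of the per-object patching being suggested. SysFixture, driver.cpu_api and its power_up/power_down methods are assumed stand-ins for the WIP series' real names, not the actual code under review.

```python
# Sketch only: SysFixture, driver.cpu_api and power_up/power_down are
# placeholders for the names used in the WIP series.
import fixtures

from nova import test


class SysFixture(fixtures.Fixture):
    """Stand-in fake sysfs: records which host CPUs a compute onlined."""

    def _setUp(self):
        self.online = set()

    def power_up(self, cpus):
        self.online.update(cpus)

    def power_down(self, cpus):
        self.online.difference_update(cpus)


class TestCPUPowerMgmtMigration(test.TestCase):

    def _start_compute_with_sys(self, hostname):
        sys_fixture = self.useFixture(SysFixture())
        compute = self.start_service('compute', host=hostname)
        driver = compute.manager.driver
        # Patch this specific driver object only: during a live migration the
        # source and destination computes each hit their own fake sysfs.
        self.useFixture(fixtures.MockPatchObject(
            driver.cpu_api, 'power_up', side_effect=sys_fixture.power_up))
        self.useFixture(fixtures.MockPatchObject(
            driver.cpu_api, 'power_down', side_effect=sys_fixture.power_down))
        return compute, sys_fixture
```

With that shape, migrate_instance() on either side naturally lands on the fixture belonging to that compute, which is the concern raised above.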
bauzasI'm pretty much done with my series11:56
bauzasI'm just adding more tests11:56
sean-k-mooneyas in "ready to give up"  or its good for re review11:57
sean-k-mooneyim hoping the latter :)11:57
sean-k-mooneymain feedback at a glance is you have no docs and no release note so in addtion to tests please think about those11:57
sean-k-mooneyping me when you would like a full review and/or manual testing of this11:58
bauzassean-k-mooney: yup, you can manually test if you want11:58
bauzassean-k-mooney: the WIP is +W because of the tests and indeed the documents11:59
sean-k-mooneyyou probably have not tried using mixed cpus (a vm with pinned and unpinned cores) and I'm not sure if you have checked what happens when you use vcpu_pin_set vs cpu_dedicated_set11:59
bauzass/+W/-W of course11:59
bauzassean-k-mooney: good point, I can try11:59
sean-k-mooneyI'll test those manually, but once you have pushed some basic functional tests covering the common cases in a non-WIP revision, it would be nice to have functional tests for those too12:00
sean-k-mooneywe should also test the other funky numa cases. I really don't expect it to matter, but asymmetric numa nodes and multiple numa nodes. all of those advanced cases can however be added later12:02
sean-k-mooneyI don't believe your code will care12:02
sean-k-mooneyinstance.numa_topology.cpu_pinning should be every PCPU the vm is pinned to, regardless of numa12:03
sean-k-mooneyincluding the extra one not used by vm cores when you use hw:emulator_threads_policy=isolate12:04
bauzasnot sure I fully understand your point, sorry12:06
sean-k-mooneyif you tried to parse the pinned cpus from the xml there are a bunch of edge cases that you would have had to handle12:06
sean-k-mooneylike also looking at the emulator thread12:06
sean-k-mooneybut since you're using instance.numa_topology.cpu_pinning you can mostly ignore that complexity12:07
bauzasyup, I changed it based on your point12:07
sean-k-mooneyso while it would still be nice to test some of the more complex numa topologies and edge cases, the code should not generally have to care about them12:08
bauzasyup, that's why I want to have migration tests + some numa ones12:08
sean-k-mooneybauzas: we are explicitly only supporting this when you use cpu_dedicated_set, right, and not vcpu_pin_set12:09
bauzascorrect12:09
bauzaswell, checking my code12:09
sean-k-mooneydo we want to raise a config error like dansmith is doing in the stable uuid series12:10
sean-k-mooneyi.e. if you enable this but also have vcpu_pin_set defined12:10
sean-k-mooneyor don't have cpu_dedicated_set defined12:10
bauzasgood question, I need to think about it12:10
sean-k-mooneythere are extra edge cases that only come into play if you use vcpu_pin_set12:11
sean-k-mooneywhich is why I'm asking12:11
bauzashttps://review.opendev.org/c/openstack/nova/+/868237/6/nova/virt/libvirt/cpu/api.py12:11
sean-k-mooneyspecifically related to cpu_thread_policy=isolate on a host with hyperthreading12:11
bauzashere, we only check get_dedicated_set()12:11
bauzasbut we try to look at the numa_topo blob from the instance12:12
sean-k-mooneyok so that is not going to block it12:12
bauzasif the instance doesn't have a numa_topo blob, we could have an exception, shit12:12
sean-k-mooneywell, we should not have an exception12:13
bauzasas I directly call the subfield from numa_topo12:13
sean-k-mooneywe should do nothing12:13
bauzaswhat's the default ?12:13
bauzasi need to check the object default values12:13
sean-k-mooneyoh, you meant it will currently raise12:13
bauzasfor the standard instance typo12:13
sean-k-mooneyyes it will12:13
sean-k-mooneythe default is None12:13
bauzasyeah12:13
bauzasmy code is wrong for standard non-numa instances12:14
sean-k-mooneyso right now this will cause an AttributeError if there is no numa topology12:14
sean-k-mooneyyep12:14
bauzascorrect, I need to fix it12:14
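For reference, a minimal sketch of the guard being discussed, assuming the helper lives alongside nova/virt/libvirt/cpu/api.py as in the WIP series; _set_online is a made-up placeholder, not the series' API.

```python
# Sketch of the non-NUMA guard: do nothing for unpinned instances instead of
# raising the AttributeError mentioned above.
import nova.conf

CONF = nova.conf.CONF


def power_up(instance):
    """Online the dedicated CPUs of a pinned instance, if any."""
    if not CONF.libvirt.cpu_power_management:
        return
    # Unpinned instances have numa_topology=None, so bail out early.
    if instance.numa_topology is None or not instance.numa_topology.cpu_pinning:
        return
    for pcpu in instance.numa_topology.cpu_pinning:
        _set_online(pcpu)  # hypothetical sysfs helper
```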
sean-k-mooneybut also we should add a check in init_host in the driver to see if CONF.libvirt.cpu_power_management is enabled and cpu_dedicated_set is not defined12:15
sean-k-mooneyand raise an InvalidConfiguration exception12:15
bauzaswe haven't said it in the spec but that looks ok to me12:16
sean-k-mooneywell the old way of doing cpu pinning is deprecated for removal12:16
sean-k-mooneyand the old way (if the vm has cpu_thread_policy=isolate and the host has hyperthreading) claims 2 host cpus per guest cpu12:17
sean-k-mooneybauzas: I think your code will handle that but if we want to allow that we need to test it12:17
sean-k-mooneyso either raise an error or we need to test that in a functional test later in the series12:17
bauzasI don't want to support the old config12:17
sean-k-mooneyworks for me12:18
bauzasso I'd prefer to hard-stop at startup if power management is set with the legacy pinning config12:18
bauzashence me saying it wasn't discussed in the spec, but I'm OK12:18
sean-k-mooneyI would like to remove the old config and pinning logic next cycle if we can find time to do it12:18
sean-k-mooneybauzas: yep, we didn't discuss it in the spec since we generally forget about the legacy pinning12:19
bauzascool12:19
bauzasno worries12:19
sean-k-mooneyit was meant to be removed 2 or 3 releases ago12:19
bauzasnp12:20
bauzassean-k-mooney: one last question, can we have both cpu_dedicated_set and vcpu_pin_set defined?12:23
bauzasor are they mutually exclusive ?12:23
* bauzas starts writing the init_host failstop12:24
sean-k-mooneycpu_dedicated_set and vcpu_pin_set are mutually exclusive12:25
bauzas++12:25
sean-k-mooneyalthough that is checked later in the code, not in the config option definition12:25
sean-k-mooneyone is in the compute group and the other is in DEFAULT12:25
sean-k-mooneybauzas: you can simply check "if CONF.compute.cpu_dedicated_set is None and CONF.libvirt.cpu_power_management: raise"12:27
bauzasthat was my guess12:30
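A minimal sketch of that fail-fast check, roughly as quoted above; the helper name and message wording are illustrative, and InvalidConfiguration matches the exception mentioned earlier in the discussion.

```python
# Sketch of the init_host fail-fast: refuse to start the compute when power
# management is enabled without cpu_dedicated_set.
import nova.conf
from nova import exception

CONF = nova.conf.CONF


def _check_cpu_power_management_config():
    if CONF.libvirt.cpu_power_management and CONF.compute.cpu_dedicated_set is None:
        raise exception.InvalidConfiguration(
            "'[libvirt] cpu_power_management' requires '[compute] "
            "cpu_dedicated_set' to be defined; the legacy vcpu_pin_set "
            "option is not supported for this feature.")
```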
* bauzas needs to go errand for 1.5h12:32
sean-k-mooneyI'm going to swap to downstream stuff for a while unless pinged12:32
sean-k-mooneybauzas: can we try and land dansmith's stable uuid series again today12:32
bauzassean-k-mooney: yup, I can try to take a look again12:32
sean-k-mooneycool, I'm +2 all the way up. dan did some rebases and fixed a minor bug so they lost the +2s they had before12:33
*** dasm|off is now known as dasm13:52
*** dasm is now known as dasm|rover13:52
ralonsohsean-k-mooney, hi! Do you know if this is a known error?14:23
ralonsohhttps://zuul.opendev.org/t/openstack/build/0d9af8c4e906422a9e4a27b1b849f31514:23
ralonsohThis is the second recheck with this error14:23
sean-k-mooneyno, that's not a known error14:25
sean-k-mooneyalthough the volume tests can be flaky14:25
sean-k-mooneythis looks unrelated14:26
sean-k-mooneyi think there may have been an OOM issue14:29
sean-k-mooneyhttps://zuul.opendev.org/t/openstack/build/0d9af8c4e906422a9e4a27b1b849f315/log/controller/logs/screen-memory_tracker.txt#476814:29
ralonsohright, thanks14:29
sean-k-mooneylooks like nova api lost access to the db14:30
sean-k-mooneyso my guess is mariadb was killed14:30
sean-k-mooneysame for keystone, it lost the db connection14:31
ralonsohthat usually happens with an OOM14:31
sean-k-mooneyyep so we might need to add a little more swap to that job14:31
ralonsohdid you try reducing the parallelism in the functional tests?14:31
sean-k-mooneyit's not the functional tests, it's tempest14:31
ralonsohright, in tempest14:31
sean-k-mooneyit's currently 4, which should be ok14:33
ralonsohwe reduced some jobs to 3 or 214:33
ralonsohto avoid this issue14:33
sean-k-mooneyI think this is using the default swap of 1G, we should set it to 814:33
sean-k-mooneywe could reduce concurrency too but I would try swap first14:34
ralonsohperfect14:34
sean-k-mooneywe currently set concurrency here https://github.com/openstack/os-vif/blob/master/.zuul.yaml#L2314:34
sean-k-mooneyyou can add configure_swap_size: 8192 there as well14:35
ralonsohshould I do it in a separate patch?14:36
sean-k-mooneyif that does not work then set concurrency to 314:36
sean-k-mooneyya, let's make it a separate patch14:36
ralonsohcool, give me 1 min14:36
sean-k-mooneythen we can just recheck yours once it's merged, if it passes14:36
opendevreviewRodolfo Alonso proposed openstack/os-vif master: Increase the swap size to 8GB in tempest jobs  https://review.opendev.org/c/openstack/os-vif/+/87265514:39
sean-k-mooneyok lets leave that run but we should be able to merge both by the end of the day all going well14:40
ralonsohthat's perfect14:41
opendevreviewElod Illes proposed openstack/nova stable/wallaby: [stable-only] Remove broken sdk job from wallaby  https://review.opendev.org/c/openstack/nova/+/87179815:39
spatelsean-k-mooney did you ever see this error when trying to create vm1 - https://paste.opendev.org/show/bmAj7IrkhbRpMbe14zht/16:10
sean-k-mooneyI have seen it if the wsgi timeout or haproxy timeout is set too low and the services are under heavy load16:13
spatelHmm!! 16:13
sean-k-mooneyyou need to check if the request ever got to the nova api. if it did, you need to check if the api responded before the connection closed and if it got to the load balancer or not16:14
spatelserver load is 0.66 average 16:14
spatelI can do nova list etc.. which works fine. only vm creation choking up 16:15
spatellet me check nova api logs etc..16:15
spatelHAproxy showing all green health for nova members16:17
spatelopenstack compute service list also showing all services up 16:18
spatelThis log has nothing to do with this issue, correct? - https://paste.opendev.org/show/bhP6raKj9pFgARdlb0ES/16:22
spatelsean-k-mooney it works after I restarted all nova-* services 16:25
spateldo you think this is a bug? I am running the wallaby release 16:25
dansmithbauzas: you still around?16:28
bauzasdansmith: yes16:29
dansmithany chance you could look at the stable compute stuff today? it's +2d up to the top, most of what is left are additional checks and tests16:29
dansmithI'm going to work on docs next week16:29
sean-k-mooneyspatel: maybe, but without you debugging it and finding a root cause or an indication of the problem we don't have anything to go on16:30
bauzasdansmith: I'm a bit short in time but I can try16:31
sean-k-mooneyspatel: I don't think that log is related16:31
spatelsean-k-mooney I think I'd better upgrade to Xena or Yoga. 16:32
spatelwallaby is a little behind now16:32
dansmithbauzas: ack, well, if we have to recheck grind them, I just don't want to be racing next week.. and monday is a holiday for some folks16:32
bauzasdansmith: I'm rushing to finish writing functests for https://review.opendev.org/c/openstack/nova/+/868237/ but again, I'll try16:34
dansmithokay, well, it's not that critical I guess16:35
dansmithmaybe I could bribe melwitt to have a look-see16:36
bauzassean-k-mooney: quick q, when reading https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#configure-libvirt-pinning16:40
bauzassean-k-mooney: in my functest, I only set cpu_dedicated_set, but when trying to boot a regular non-pinned instance, it errors out16:40
bauzaswith NoValidHost16:41
bauzasso I guess I also need to set cpu_shared_set ?16:41
* bauzas thought it was implicit16:41
bauzashttps://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu_dedicated_set doesn't mention the dependency16:42
bauzasoh wait, I see my bug16:43
bauzasmy self.flags was setting None to cpu_shared_set16:44
bauzasffff16:45
sean-k-mooneycpu_shared_set is for non-pinned cores, yes16:45
sean-k-mooneyor floating vms16:45
bauzasI know this16:45
sean-k-mooneyyou only need to set it if cpu_dedicated_set is defined16:45
bauzasok, both are dependent then16:45
sean-k-mooneyI don't want to go into the complexity, I can explain it in detail next week if you like16:46
bauzasyou can't just set cpu_dedicated_set without opting into cpu_shared_set16:46
bauzasno worries, it works16:46
bauzasit was a pebkac16:46
sean-k-mooneyyou can, but if you don't set cpu_shared_set then the host can only be used for pinned vms16:46
bauzasI see16:46
bauzaswith the restricted set of dedicated cpus16:47
bauzaslike, I have 10 cpus, I set 5 of them as dedicated, period.16:47
sean-k-mooneyyep16:47
bauzasthat means that the 5 left won't be used16:47
bauzasgotcha16:47
sean-k-mooneyya they are reserved for the host16:47
sean-k-mooneythe ones that are not listed16:47
sean-k-mooneyso normally I would expect the first core in each socket to not be in either set16:48
sean-k-mooneyand then you divide up the rest depending on your workloads16:48
bauzascool enough16:50
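To make the split concrete, a small hedged functional-test snippet; the CPU ranges are arbitrary and only illustrate the host layout described above.

```python
# Illustrative only: a hypothetical 10-CPU host, CPU 0 left to the host OS,
# CPUs 1-4 for floating (unpinned) guests, CPUs 5-9 for pinned guests.
from nova import test


class TestDedicatedSharedSplit(test.TestCase):
    def setUp(self):
        super().setUp()
        self.flags(cpu_dedicated_set='5-9', cpu_shared_set='1-4',
                   group='compute')
        # If cpu_dedicated_set is set but cpu_shared_set is left unset, the
        # host exposes no VCPU inventory, so a regular unpinned flavor fails
        # with NoValidHost, which is the behaviour hit in the functest above.
```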
opendevreviewElod Illes proposed openstack/nova stable/yoga: ignore deleted server groups in validation  https://review.opendev.org/c/openstack/nova/+/86798917:05
opendevreviewElod Illes proposed openstack/nova stable/yoga: add repoducer test for bug 1890244  https://review.opendev.org/c/openstack/nova/+/87266317:05
opendevreviewMerged openstack/nova stable/yoga: Improving logging at '_allocate_mdevs'.  https://review.opendev.org/c/openstack/nova/+/87141418:31
opendevreviewSylvain Bauza proposed openstack/nova master: enable cpus when an instance is spawning  https://review.opendev.org/c/openstack/nova/+/86823718:46
dvo-plvHello19:50
dvo-plvI have a question about traits. I added a new trait, COMPUTE_NET_VIRTIO_PACKED. When I execute the following command: openstack allocation candidate list --resource VCPU=1, I can see that one compute node has this trait and another does not19:52
sean-k-mooneyyou did not request the trait in the query19:53
sean-k-mooneyso you will get back a list of all hosts that have 1 cpu core free19:53
dvo-plvWhen I do a live migration from the compute node with the new trait to the compute node without this trait, should the scheduler check if the requested compute node has this trait and forbid the migration?19:53
sean-k-mooneyif and only if you have requested it in the flavor or implemented that in your change correctly19:54
sean-k-mooneyyou would ideally have a prefilter that adds the trait request19:54
dvo-plvI set this trait in the flavor with the following command: openstack flavor set --property trait:COMPUTE_NET_VIRTIO_PACKED=required 319:55
sean-k-mooneythat will take effect for new vms but not existing ones19:55
sean-k-mooneyif you have that set, however, then it will be included with the placement request and enforced19:55
dvo-plvYes, I create the flavor with this trait and after that create the VM and try to do a live migration19:56
sean-k-mooneythen placement should enable this to work as intended19:56
sean-k-mooneythe vm will only schedule to a host with that trait, including on migrations19:57
sean-k-mooneythis is part of this proposal correct https://review.opendev.org/c/openstack/nova-specs/+/868377/1/specs/2023.1/approved/virtio_packedring_configuration_support.rst19:58
dvo-plvIn the spec file you said that the packed ring option should be handled in some specific way to be sure that a vm with the packed_ring qemu option will not migrate to a node without support for this feature, and I cannot get this feature from libvirt. I found that this feature is only enabled from libvirt 6.3. So if I migrate to a node with a lower libvirt, the _check_compatible_with_source_hypervisor method will forbid the migration20:00
sean-k-mooneythat is not a sufficient check unfortunately20:01
sean-k-mooneythis is not a feature of libvirt, it's a qemu one, and that check would already be too late20:01
sean-k-mooneywhat we need to do is have nova report the trait on all hosts that support this20:02
sean-k-mooneyand then either have the guest opt into this with an image property or flavor extra spec20:03
sean-k-mooneywe can then use that image property/extra spec to request the trait automatically20:03
sean-k-mooneythat is the simple approach 20:03
sean-k-mooneythe harder approach is to figure out a way to auto-enable this20:04
sean-k-mooneyI don't know a simple way to do that off the top of my head, unfortunately20:04
sean-k-mooneyit's a bit late here but we should probably discuss this some more next week20:05
dvo-plvanother way: we can check the qemu version, this feature was implemented from qemu 4.220:05
dvo-plvor grep functionality like in this post http://blog.vmsplice.net/2020/05/how-to-check-virtio-feature-bits-inside.html20:06
dvo-plvThank you, let's move it to next week20:07
sean-k-mooneythe issue is we need to make the scheduling work before we have selected any host20:07
sean-k-mooneyso in general we cannot check the qemu/libvirt version at the scheduling step20:08
dvo-plvI tried to get this feature in the static_traits method in nova/virt/libvirt/driver.py20:08
sean-k-mooneythe best I can think of would be to stash a flag in the instance_system_metadata to record it was booted on a host that supported the packed format20:09
sean-k-mooneyand then include a request for the trait if that is found in the instance_system_metadata20:09
sean-k-mooneythat is probably the best we can do since we don't know if it's used by the guest20:09
sean-k-mooneythe static_traits function is the correct place to report the trait for the compute host20:10
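To make the opt-in idea concrete, a rough sketch modelled on the prefilters in nova/scheduler/request_filter.py. The extra spec and image property names (hw:virtio_packed_ring / hw_virtio_packed_ring) are assumptions, not a final interface, and this is not the code from the spec under review.

```python
# Sketch: translate a guest opt-in (flavor extra spec or image property, both
# names assumed) into a required placement trait, so only hosts whose driver
# reports the trait are candidates for boot and for migrations.
PACKED_TRAIT = 'COMPUTE_NET_VIRTIO_PACKED'


def packed_ring_prefilter(ctxt, request_spec):
    """Require the packed-ring trait in placement when the guest opts in."""
    extra_specs = request_spec.flavor.extra_specs or {}
    image_props = getattr(request_spec.image, 'properties', None)
    wants_packed = (
        extra_specs.get('hw:virtio_packed_ring') == 'true'
        or bool(getattr(image_props, 'hw_virtio_packed_ring', False)))
    if wants_packed:
        request_spec.root_required.add(PACKED_TRAIT)
        return True
    return False
```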
gmannbauzas: stephenfin: need one more review in this to unblock stable/wallaby gate (removing broken sdk job) https://review.opendev.org/c/openstack/nova/+/871798 21:06
*** dasm|rover is now known as dasm|afk21:36
melwittdansmith: I have look-see'd22:39
opendevreviewMerged openstack/nova stable/wallaby: [stable-only] Remove broken sdk job from wallaby  https://review.opendev.org/c/openstack/nova/+/87179822:40
dansmithmelwitt: um22:54
dansmithI'm not sure you can past-tense-verb look-see22:55
dansmith(but thanks :)22:55
melwitt:)22:55
sean-k-mooneyit's english, there is always a way to do things22:56
sean-k-mooney2023-02-03 23:00:25.664 41 ERROR nova.virt.driver [None req-80a00ce3-332e-4a98-be2e-d0a1bd9329d7 - - - - - -] Compute driver option required, but not specified23:00
sean-k-mooneylol ok that explains why it's not working23:01
opendevreviewSylvain Bauza proposed openstack/nova master: Enable cpus when an instance is spawning  https://review.opendev.org/c/openstack/nova/+/86823723:05
EugenMayer4I'm running OVS on zed. What I'm trying to do: I have 2 networks (or 3, if you also count the provider lan), let's call them provider_wan, intranet and DMZ. Now I want DMZ to only be able to talk to a very few clients on some specific ports in intranet (ip / port allow list), but nothing else in intranet. Also, a client in DMZ should be23:34
EugenMayer4able to access the internet via the provider lan. I finished setting up the internet access, I can also access the intranet, but I'm not able to limit what a DMZ client is able to access in the intranet. Any hints? All clients in this case are nova vms (qemu)23:34
sean-k-mooneythe only way I can think of to do that without the firewall-as-a-service project, which is dead, is to use a vm as a router23:36
sean-k-mooneyso don't interconnect the networks on the datacenter side or with neutron routers23:36
sean-k-mooneybut run a pfSense or similar vm and make it the default gateway for the DMZ and intranet23:37
sean-k-mooneyand connect it to the wan network23:37
sean-k-mooneythen you can implement whatever network policy you like there23:37
sean-k-mooneyEugenMayer4: in general the neutron channel would probably be able to help more23:37
