Friday, 2022-02-18

*** hemna5 is now known as hemna01:45
*** hemna5 is now known as hemna03:21
gibisean-k-mooney: thanks. enjoy PTO today07:51
gibichateaulav: you have to look into the openstack service logs or 'spread' VM's NUMA cells07:52
gibisorry07:53
gibiwrong copy paste buffer07:53
gibihttps://zuul.opendev.org/t/openstack/build/39dcedc2915b4dce9bacce7d1c21f8fe/logs07:53
gibiso here in the job results07:53
gibiunder controller/logs and compute1/logs you will find screen-n-cpu.txt with the nova-compute logs07:53
opendevreviewFelix Huettner proposed openstack/nova stable/victoria: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82950407:56
brinzhangbauzas, gibi, songwenping: vGPU support in Cyborg may need a PTG slot, do you have any suggestions?07:58
opendevreviewFelix Huettner proposed openstack/nova stable/train: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82950708:00
brinzhangwe would like to register 2:00-3:00 UTC on April 5, is that ok?08:01
opendevreviewGhanshyam proposed openstack/nova master: Separate flavor extra specs policy for server APIs  https://review.opendev.org/c/openstack/nova/+/82962608:02
gibibrinzhang: I'm not sure we have a PTG etherpad yet. As far as I know the Red Hat folks, including bauzas, are on PTO today.08:04
brinzhanggibi: ack08:07
gibiso let's get back to this on Monday08:07
brinzhanghttps://etherpad.opendev.org/p/nova-zed-ptg I saw this link, there's nothing else in it yet08:07
brinzhanggibi: ok, we can discuss on monday ^^08:07
gibichateaulav: so for example the logs from your trial with my last suggestion are visible here (filtered to error only) https://zuul.opendev.org/t/openstack/build/08789d9f0b0546cb9e6fb2b6f4f0231c/log/compute1/logs/screen-n-cpu.txt?severity=408:12
opendevreviewFelix Huettner proposed openstack/nova stable/stein: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82985908:44
tobias-urdingibi: friendly request for backport review https://review.opendev.org/c/openstack/nova/+/828407 :)08:45
opendevreviewFelix Huettner proposed openstack/nova stable/rocky: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82986008:51
opendevreviewFelix Huettner proposed openstack/nova stable/queens: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82986108:55
gibitobias-urdin: hi! I don't have +2 rights on stable branches :/09:15
tobias-urdingibi: oh sorry for the noise!09:17
gibitobias-urdin: no worries09:17
opendevreviewGhanshyam proposed openstack/nova master: Complete phase-1 of RBAC community-wide goal  https://review.opendev.org/c/openstack/nova/+/82986609:29
chateaulavgibi: ok, I was trying a few things yesterday, because greens was still failing. I'll put the change back and then investigate that way.10:51
chateaulavGrenade10:51
gibichateaulav: probably easier to create a unit test that calls obj_make_compatible on a populated ComputeNode object10:56
gibiwith that you can troubleshoot locally10:56
chateaulavgibi: Yeah. Biggest thing was trying to have zuul pass which it did. I'll see about doing that, and can test to ensure the new values aren't processed10:59
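A minimal, self-contained sketch of the kind of local unit test gibi suggests. In a nova tree the test would build a real `objects.ComputeNode` and call its `obj_make_compatible()`; here a hypothetical stand-in class (`ToyComputeNode`, with an invented `VERSION` and `new_field`) mimics the backport step so the example runs without nova installed.

```python
import copy
import unittest


def version_tuple(version):
    """Convert a 'major.minor' version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


class ToyComputeNode:
    """Hypothetical stand-in for a versioned object such as ComputeNode.

    VERSION and 'new_field' are invented for illustration; they are not
    nova's real field list or version history.
    """

    VERSION = "1.1"

    def __init__(self, **fields):
        self.fields = dict(fields)

    def obj_to_primitive(self, target_version=None):
        # Serialize a copy so backporting never mutates the live object.
        primitive = copy.deepcopy(self.fields)
        if target_version is not None:
            self.obj_make_compatible(primitive, target_version)
        return primitive

    def obj_make_compatible(self, primitive, target_version):
        # Assumption: 'new_field' was added in 1.1, so older consumers
        # (e.g. the old side of a grenade job) must never see it.
        if version_tuple(target_version) < (1, 1):
            primitive.pop("new_field", None)


class TestToyComputeNodeBackport(unittest.TestCase):
    def setUp(self):
        self.node = ToyComputeNode(hostname="compute1", new_field="x86_64")

    def test_backport_drops_new_field(self):
        primitive = self.node.obj_to_primitive(target_version="1.0")
        self.assertNotIn("new_field", primitive)
        self.assertEqual("compute1", primitive["hostname"])

    def test_current_version_keeps_new_field(self):
        primitive = self.node.obj_to_primitive(target_version="1.1")
        self.assertEqual("x86_64", primitive["new_field"])
```

Run it with `python -m unittest <file>`; the point is that a backport problem shows up as a plain local assertion failure instead of somewhere deep inside a grenade job.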
chateaulavgibi: so then with the current patchset, grenade and everything is happy.13:22
chateaulavit did not like those exceptions or moving the backport for hvspec above super13:23
gibichateaulav: but it did not like them for different reasons.13:23
chateaulavgotcha13:24
gibichateaulav: I personally would like to keep rejecting new archs like here https://review.opendev.org/c/openstack/nova/+/828369/14/nova/objects/hv_spec.py13:24
gibichateaulav: I know that you removed this to please grenade13:24
gibibut I think that just hides the problem 13:24
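To illustrate the behaviour gibi prefers (failing loudly instead of handing an unknown architecture to an older service), here is a hedged sketch of a backport helper for an HVSpec-like primitive. The version numbers, the `ValueNotBackportable` exception and the `hypothetical_arch` value are invented for illustration; the real change lives in `nova/objects/hv_spec.py` in the linked review and may differ.

```python
def version_tuple(version):
    """Convert a 'major.minor' version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


class ValueNotBackportable(ValueError):
    """Hypothetical error for values an older object version cannot represent."""


# Hypothetical: architectures that only exist from object version 1.3 on.
ARCHES_ADDED_IN_1_3 = frozenset({"hypothetical_arch"})


def make_hvspec_compatible(primitive, target_version):
    """Backport an HVSpec-like primitive dict in place.

    Rather than silently passing an unknown arch to an older consumer
    (which only hides the problem), refuse the backport outright.
    """
    if version_tuple(target_version) < (1, 3):
        arch = primitive.get("arch")
        if arch in ARCHES_ADDED_IN_1_3:
            raise ValueNotBackportable(
                "arch %r is not supported by HVSpec %s" % (arch, target_version))
    return primitive
```

Known architectures pass through untouched for any version, while a new one is rejected only when the target version predates it.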
gibiI don't have time right now to propose a patch on top of yours showing how to write a unit test that reproduces the same problem in your local env13:25
chateaulavok, and that's what I'm starting to see and understand. Getting used to zuul and where to find things.13:25
gibibut I think that would be a way forward to troubleshoot 13:25
chateaulavmakes sense13:25
gibiI'll try to get to your problem before the end of today13:26
chateaulavappreciate it, I'll be working all day on it. gonna see about that unit test13:29
opendevreviewFelix Huettner proposed openstack/nova stable/train: Gracefull recovery when attaching volume fails  https://review.opendev.org/c/openstack/nova/+/82950714:06
jamespageo/ - is there a good reference on how scheduling should behave when hypervisors have a partial hugepage memory configuration (say 150GB of 512GB)?14:51
gibijamespage: I don't think we have. If your instance needs a NUMA topology because of cpu pinning or huge pages then the NUMATopologyFilter is responsible for selecting the proper host15:05
artomjamespage, I don't think the "partial" matters. If the host has enough pages of the correct size to fit the instance, it should pass scheduling15:11
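artom's point can be sketched as a per-cell check: the instance fits if some host NUMA cell has enough free pages of the requested size. This is a simplified illustration, not the NUMATopologyFilter's real logic; it assumes a single-NUMA-node guest, one page size per host cell, and a hypothetical `HostCell` record.

```python
from dataclasses import dataclass


@dataclass
class HostCell:
    """Hypothetical per-NUMA-cell huge page inventory."""
    page_size_kb: int
    free_pages: int


def instance_fits(cells, mem_mb, page_size_kb):
    """Return True if any single host cell has enough free pages of the
    requested size to back mem_mb of guest memory."""
    pages_needed = -(-mem_mb * 1024 // page_size_kb)  # ceiling division
    return any(
        cell.page_size_kb == page_size_kb and cell.free_pages >= pages_needed
        for cell in cells
    )


cells = [HostCell(page_size_kb=1048576, free_pages=2),
         HostCell(page_size_kb=1048576, free_pages=8)]
print(instance_fits(cells, 4096, 1048576))   # True: 4 x 1G pages fit in the second cell
print(instance_fits(cells, 16384, 1048576))  # False: 16 pages needed, no cell has them
```

How much of the host is set aside as huge pages ("partial" or not) does not enter the check; only the free pages of the right size do.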
jamespagegibi, artom: interestingly instances with larger page size configuration schedule fine - the problem I'm looking at happens when an instance without large pages gets scheduled to the hypervisor15:13
jamespagedeployment is using instances with large amounts of RAM - exceeding the diff between Total RAM - HugePages - Reserved by quite a bit15:13
gibijamespage: do the instances without huge pages have any NUMA related requirements, e.g. cpu pinning?15:13
jamespagegibi: nope no extra specs at all15:14
artomjamespage, wait, that just sounds like there's not enough RAM15:15
jamespageartom: I quite agree there is not enough - but I also expected the instance not to be scheduled to a hypervisor with this memory configuration15:16
artomjamespage, ah, so scheduling passes, but the instance doesn't boot because not enough RAM15:16
jamespagewell it does - it then sucks all of the ram up and gets OOM'ed15:16
artomThat's... I think that's a known issue? In the sense that Nova doesn't correctly track memory when it's a mix of hugepages and "normal" page size15:17
gibiartom: yeah I think if the small pages instance doesn't request any numa related things then the NUMATopologyFilter is not applied on that instance15:19
gibibut I'm sure sean-k-mooney knows a lot more about this15:20
gibibut most of the RH folks are on PTO today15:20
sean-k-mooneyi just sat down to do something else15:20
artomgibi, it's not even that, I think we treat all memory as available when scheduling, but when the instance gets on the host the huge pages obviously cannot be allocated to it15:20
sean-k-mooneysomething about large guests?15:21
* artom tried to avoid pinging Sean to let him enjoy his PTO :P15:21
gibish*t, sorry15:21
sean-k-mooneyif you don't use hw:mem_page_size=small or 4k then the numa code does not run15:21
sean-k-mooneyit's15:21
sean-k-mooneyfine15:21
artomWell, it's on his shoulders as well, ignoring IRC is always an option :P15:22
sean-k-mooneyI was just going to check my home insurance renewal15:22
gibisean-k-mooney: it is about mixing NUMA and non NUMA guests15:22
sean-k-mooneyya that is not supported today15:22
gibijamespage (above) is seeing memory overallocation issues15:22
gibisean-k-mooney: so you say that if hw:mem_page_size=small is added then we allow mixing small and large instances?15:23
sean-k-mooneyyou must not under any circumstance mix numa and non numa guests or it will break all our memory tracking15:23
sean-k-mooneyyes15:23
sean-k-mooneyif you use hw:mem_page_size=small it's fine15:23
sean-k-mooneybrb15:23
sean-k-mooneytl;dr is using hw:mem_page_size=small will numa affine the guest and track the memory correctly15:25
sean-k-mooneyif you don't it will float and cause random OOM events15:25
gibithanks!15:26
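A rough approximation of the rule sean-k-mooney describes: only guests whose flavor opts into NUMA-aware placement go through the NUMA code and get their memory tracked per page size. The function below is a hypothetical, incomplete sketch; nova also considers image properties (such as `hw_mem_page_size`) and other triggers that are ignored here.

```python
def needs_numa_topology(extra_specs):
    """Approximate whether a flavor's extra specs put the guest on the NUMA path.

    Simplified sketch, not nova's real logic: image properties and several
    other triggers are ignored.
    """
    if "hw:mem_page_size" in extra_specs:
        # Any explicit page size, including "small", enables per-NUMA-node
        # memory tracking, which is why sean-k-mooney recommends it.
        return True
    if extra_specs.get("hw:cpu_policy") == "dedicated":
        return True  # cpu pinning
    return "hw:numa_nodes" in extra_specs


print(needs_numa_topology({}))                             # False: guest floats
print(needs_numa_topology({"hw:mem_page_size": "small"}))  # True: NUMA-affined
```

On a real deployment the opt-in is a flavor property, for example `openstack flavor set --property hw:mem_page_size=small <flavor>`.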
sean-k-mooneyplacement is not enough to save us since the OOM reaper runs per numa node and also hugepage guests don't support oversubscription15:26
sean-k-mooneyso if you have memory oversubscription it won't really work right15:26
sean-k-mooneywe have something about this in our downstream docs but basically we say use host aggregates to prevent mixing numa and non numa guests15:27
sean-k-mooneyjamespage: if you have enough swap by the way those other instances could technically boot but the system would likely be unstable so it's better to avoid that15:29
jamespagesorry - power outage just at the wrong moment15:29
gibijamespage: https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html here are the IRC logs if you are dropped15:30
sean-k-mooneyjamespage: the reason it can be scheduled to the host is that in placement we report total ram and then only reserve what is set in host_reserved_ram or whatever that option is called15:31
* jamespage reads backscroll15:31
sean-k-mooneyjamespage: we don't currently automatically reserve the hugepage memory because we don't currently track that as a separate pool15:31
sean-k-mooneyso the scheduler is only looking at total - reserved, not total - reserved - hugepages15:32
sean-k-mooneyactually if it's a recent release we don't even have a ram filter anymore so it's just placement that is checking15:33
jamespagesean-k-mooney: having read most of the code that was my hunch so thanks for confirming... 15:33
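A worked example of the gap sean-k-mooney describes, using jamespage's figures (512 GB host, 150 GB of huge pages). The 4096 MB reserved value and the allocation ratio of 1.0 are assumptions; placement computes a resource provider's capacity as `(total - reserved) * allocation_ratio`.

```python
def placement_capacity_mb(total_mb, reserved_mb, allocation_ratio=1.0):
    """Capacity placement will hand out for MEMORY_MB."""
    return int((total_mb - reserved_mb) * allocation_ratio)


def small_page_capacity_mb(total_mb, reserved_mb, hugepages_mb):
    """Memory physically available to guests that do not use huge pages."""
    return total_mb - reserved_mb - hugepages_mb


TOTAL_MB = 512 * 1024       # 512 GB host
HUGEPAGES_MB = 150 * 1024   # 150 GB pre-allocated as huge pages
RESERVED_MB = 4096          # assumed reserved_host_memory_mb

schedulable = placement_capacity_mb(TOTAL_MB, RESERVED_MB)
usable = small_page_capacity_mb(TOTAL_MB, RESERVED_MB, HUGEPAGES_MB)
overpromised_mb = schedulable - usable  # what non-NUMA guests can be over-promised
print(schedulable, usable, overpromised_mb)  # 520192 366592 153600
```

In other words, placement can promise non-NUMA guests the entire 150 GB huge page pool, which is the OOM scenario jamespage reported.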
sean-k-mooneyso ya that is why we say you can't mix numa and non numa today15:34
sean-k-mooneyif we ever get to tracking numa in placement then it will fix that15:34
sean-k-mooneysince we will have a separate pool per page size15:34
sean-k-mooneyand/or numa node15:34
sean-k-mooneywe could probably paper over this temporarily with a filter similar to the numa one or even enhance the numa one to work with non numa instances but right now we just tell people not to do it15:36
artomI guess another option would be to tell operators to include the amount of hugepages in reserved_host_ram?15:42
artomIt's... weird, but should work?15:42
sean-k-mooneyno15:42
sean-k-mooneyit will break scheduling15:43
artomWouldn't it just prevent scheduling to a host if it doesn't have enough "normal" RAM?15:43
sean-k-mooneyyes but you could not use the hugepages and normal ram15:43
artomOh, right15:44
sean-k-mooneyyou could only use total - reserved15:44
sean-k-mooneyso that is what you do if you are using the hugepages on the host15:44
sean-k-mooney+ the huge page reserved option15:45
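A hedged worked example of why folding the huge pages into the reserved value does not work cleanly. It assumes that a huge page guest still consumes `MEMORY_MB` in placement for its full flavor size, like any other guest; the option artom is thinking of is `reserved_host_memory_mb`, and all sizes are illustrative.

```python
TOTAL_MB = 512 * 1024        # 512 GB host
HUGEPAGES_MB = 150 * 1024    # 150 GB huge page pool
HOST_RESERVED_MB = 4096      # assumed reserved_host_memory_mb for the host itself


def remaining_capacity_mb(total_mb, reserved_mb, allocations_mb):
    """Placement-style free capacity, assuming an allocation ratio of 1.0."""
    return total_mb - reserved_mb - sum(allocations_mb)


# artom's variant: add the huge page pool to the reserved value.
reserved_with_hugepages = HOST_RESERVED_MB + HUGEPAGES_MB

# A 100 GB huge page guest is placed; it still draws MEMORY_MB from placement.
hugepage_guest_mb = 100 * 1024
left_in_placement = remaining_capacity_mb(
    TOTAL_MB, reserved_with_hugepages, [hugepage_guest_mb])

# Physically it lives in the huge page pool, so small-page memory is untouched.
real_small_page_free = TOTAL_MB - HOST_RESERVED_MB - HUGEPAGES_MB
stranded_mb = real_small_page_free - left_in_placement
print(left_in_placement, real_small_page_free, stranded_mb)  # 264192 366592 102400
```

Every huge page guest therefore shrinks the pool left for normal guests by its own size, which matches sean-k-mooney's objection that you cannot use both the huge pages and the normal RAM.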
sean-k-mooneyso we could set the max allocation size in placement to total - hugepages15:50
sean-k-mooneythat would help15:51
jamespageside effect to that would be that you could only ever allocate total - hugepages on that hypervisor no?15:52
sean-k-mooneyno15:52
jamespageoh that's the max allocation size - driving the maximum single instance footprint - I see15:53
sean-k-mooneyhttps://docs.openstack.org/api-ref/placement/?expanded=update-resource-provider-inventory-detail#update-resource-provider-inventory15:53
sean-k-mooneyyes15:53
sean-k-mooneymax_unit15:53
sean-k-mooneysorry was just on the phone15:54
sean-k-mooneyso ya total would be set to total memory15:54
sean-k-mooneyreserved = reserved from config15:54
sean-k-mooneybut if we set max_unit to total-reserved-hugepages that should help15:55
sean-k-mooneyalthough it would have to be the larger of 15:55
sean-k-mooneytotal-reserved-hugepages and hugepages15:55
sean-k-mooneyto not limit the size of hugepage guests15:56
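sean-k-mooney's proposal can be sketched as the `MEMORY_MB` inventory record a compute node would report. The field names (`total`, `reserved`, `min_unit`, `max_unit`, `step_size`, `allocation_ratio`) are those of the placement inventory API linked above; the helper itself is hypothetical and is not something nova does today.

```python
def memory_mb_inventory(total_mb, reserved_mb, hugepages_mb, allocation_ratio=1.0):
    """Build a MEMORY_MB inventory with max_unit clamped as proposed.

    total and reserved are reported as today; only max_unit changes.
    It takes the larger of the small-page pool and the huge page pool so
    that huge page guests are not limited by the clamp.
    """
    small_page_max = total_mb - reserved_mb - hugepages_mb
    return {
        "total": total_mb,
        "reserved": reserved_mb,
        "min_unit": 1,
        "max_unit": max(small_page_max, hugepages_mb),
        "step_size": 1,
        "allocation_ratio": allocation_ratio,
    }


inv = memory_mb_inventory(512 * 1024, 4096, 150 * 1024)
print(inv["max_unit"])  # 366592
```

With jamespage's figures this caps any single guest at 366592 MB while leaving `total` untouched, which is the "maximum single instance footprint" effect jamespage describes; it does not stop several smaller non-NUMA guests from adding up past the small-page pool. When the huge page pool is the larger of the two, `max_unit` falls back to its size, so huge page guests stay unrestricted at the cost of weaker protection.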
sean-k-mooneyjamespage: is it a recent release?15:56
sean-k-mooneyif so I'm not sure if you can use provider.yaml to tweak the max unit15:57
sean-k-mooneygibi: do you remember if that can tweak non custom resource classes?15:57
jamespageussuri for this particular deploy15:57
gibiOnly CUSTOM_* resource classes and traits may be managed this way.15:58
gibiso no15:58
gibihttps://docs.openstack.org/nova/latest/admin/managing-resource-providers.html15:59
sean-k-mooneyya ok 15:59
sean-k-mooneyit was after ussuri anyway15:59
sean-k-mooneyjamespage: best we likely could do is a config option that could possibly be backported to set or clamp the max unit15:59
sean-k-mooneyor just not mix for now16:00
gibiwe merged in victoria16:00
sean-k-mooneyjamespage: sorry I can't really be more help there. you could write a custom filter to do the same thing out of tree and load that in the env since it's pluggable16:01
jamespagesean-k-mooney: no worries - you have been more than helpful in getting my knowledge up to speed in this area :)16:02
sean-k-mooneyfor existing deployments that is the only way to make mixing numa/non numa on the same host somewhat ok but it still risks OOM events16:02
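A hedged sketch of the out-of-tree filter sean-k-mooney mentions, applying the same "max unit" idea at scheduling time. In nova such a class would subclass `nova.scheduler.filters.BaseHostFilter` and implement `host_passes(self, host_state, spec_obj)`; to keep the example runnable without nova, `FakeHostState` and `FakeRequestSpec` are stand-ins that only mirror the attributes read here (check them against your release). The constructor arguments are a hypothetical way to supply per-host huge page sizes; a real filter is instantiated without arguments and would read them from configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeHostState:
    """Stand-in for nova's HostState; only the fields this sketch reads."""
    host: str
    total_usable_ram_mb: int


@dataclass
class FakeRequestSpec:
    """Stand-in for nova's RequestSpec; numa_topology is None for non-NUMA guests."""
    memory_mb: int
    numa_topology: Optional[object] = None


class SmallPageMaxUnitFilter:
    """Hypothetical filter: reject non-NUMA guests larger than the host's
    non-hugepage memory. Like the max_unit clamp, it limits single-guest
    size only, so OOM events remain possible."""

    def __init__(self, hugepages_mb_by_host, reserved_mb):
        self.hugepages_mb_by_host = hugepages_mb_by_host
        self.reserved_mb = reserved_mb

    def host_passes(self, host_state, spec_obj):
        if spec_obj.numa_topology is not None:
            return True  # leave NUMA guests to the NUMATopologyFilter
        hugepages_mb = self.hugepages_mb_by_host.get(host_state.host, 0)
        small_page_mb = (host_state.total_usable_ram_mb
                         - self.reserved_mb - hugepages_mb)
        return spec_obj.memory_mb <= small_page_mb


f = SmallPageMaxUnitFilter({"compute1": 150 * 1024}, reserved_mb=4096)
host = FakeHostState("compute1", 512 * 1024)
print(f.host_passes(host, FakeRequestSpec(memory_mb=400000)))  # False
```

As sean-k-mooney notes, this only makes mixing NUMA and non-NUMA guests "somewhat ok"; host aggregates remain the safer answer.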
sean-k-mooneyok I need to go do some other things and my home insurance is not done so I'll be afk now until monday16:03
sean-k-mooneyenjoy your weekends folks o/16:03
gibio/16:04
gibichateaulav: sorry I ran out of time today. I will try to look at the issue tomorrow16:51
gibio/16:51
chateaulavthat sounds good, I think I'm in the right place and have been playing around with the tests for compute node.16:52
opendevreviewDmitrii Shcherbakov proposed openstack/nova master: WIP Improve remote-managed port test coverage  https://review.opendev.org/c/openstack/nova/+/82997418:33
opendevreviewRajat Dhasmana proposed openstack/nova master: WIP: Add support for volume backed server rebuild  https://review.opendev.org/c/openstack/nova/+/82036818:34
opendevreviewArtom Lifshitz proposed openstack/nova master: block_device_info: Add swap to inline  https://review.opendev.org/c/openstack/nova/+/82652321:03
opendevreviewArtom Lifshitz proposed openstack/nova master: libvirt: Improve creating images INFO log  https://review.opendev.org/c/openstack/nova/+/82652421:03
opendevreviewArtom Lifshitz proposed openstack/nova master: libvirt: Remove defunct comment  https://review.opendev.org/c/openstack/nova/+/82652521:03
opendevreviewArtom Lifshitz proposed openstack/nova master: imagebackend: default by_name image_type to config correctly  https://review.opendev.org/c/openstack/nova/+/82652621:03
opendevreviewArtom Lifshitz proposed openstack/nova master: image_meta: Add ephemeral encryption properties  https://review.opendev.org/c/openstack/nova/+/76045421:03
opendevreviewArtom Lifshitz proposed openstack/nova master: BlockDeviceMapping: Add encryption fields  https://review.opendev.org/c/openstack/nova/+/76045321:03
opendevreviewArtom Lifshitz proposed openstack/nova master: BlockDeviceMapping: Add is_local property  https://review.opendev.org/c/openstack/nova/+/76448521:03
opendevreviewArtom Lifshitz proposed openstack/nova master: compute: Update bdms with ephemeral encryption details when requested  https://review.opendev.org/c/openstack/nova/+/76448621:03
opendevreviewArtom Lifshitz proposed openstack/nova master: virt: Add ephemeral encryption flag  https://review.opendev.org/c/openstack/nova/+/76045521:03
opendevreviewArtom Lifshitz proposed openstack/nova master: scheduler: Add an ephemeral encryption pre filter  https://review.opendev.org/c/openstack/nova/+/76045621:03
opendevreviewArtom Lifshitz proposed openstack/nova master: block_device: Add DriverImageBlockDevice to block_device_info  https://review.opendev.org/c/openstack/nova/+/82652721:03
opendevreviewArtom Lifshitz proposed openstack/nova master: block_device: Add encryption attributes to image and ephemeral disks  https://review.opendev.org/c/openstack/nova/+/82652821:04
opendevreviewArtom Lifshitz proposed openstack/nova master: virt: Add block_device_info helper to find encrypted disks  https://review.opendev.org/c/openstack/nova/+/82652921:04
opendevreviewArtom Lifshitz proposed openstack/nova master: blockinfo: Add encryption details to the disk_info mappings when provided  https://review.opendev.org/c/openstack/nova/+/77227221:04
opendevreviewArtom Lifshitz proposed openstack/nova master: imagebackend: Add disk_info_mapping as an optional attribute of Image  https://review.opendev.org/c/openstack/nova/+/82653021:04
opendevreviewArtom Lifshitz proposed openstack/nova master: privsep: Move qemu-img create calls under nova.privsep.qemu  https://review.opendev.org/c/openstack/nova/+/82675021:04
opendevreviewArtom Lifshitz proposed openstack/nova master: privsep: Return QemuImgInfo objects from qemu-img info calls  https://review.opendev.org/c/openstack/nova/+/82675121:04
opendevreviewArtom Lifshitz proposed openstack/nova master: privsep: Add encryption support to qemu-img create command  https://review.opendev.org/c/openstack/nova/+/82675221:04
opendevreviewArtom Lifshitz proposed openstack/nova master: libvirt: Report ephemeral encryption traits based on imagebackend  https://review.opendev.org/c/openstack/nova/+/82675321:04
opendevreviewArtom Lifshitz proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/82675421:04
opendevreviewArtom Lifshitz proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/82675521:04
opendevreviewArtom Lifshitz proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted  https://review.opendev.org/c/openstack/nova/+/82675621:04
opendevreviewArtom Lifshitz proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS  https://review.opendev.org/c/openstack/nova/+/77227321:04
opendevreviewJonathan Race proposed openstack/nova master: object/notification for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82836922:02
opendevreviewJonathan Race proposed openstack/nova master: driver/secheduler/docs for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82205322:02
opendevreviewJonathan Race proposed openstack/nova master: zuul-job for Adds Pick guest CPU architecture based on host arch in libvirt driver support  https://review.opendev.org/c/openstack/nova/+/82837222:02
