Thursday, 2023-09-14

[04:33] *** osmanlicilegi is now known as Guest0
[05:53] <opendevreview> Nobuhiro MIKI proposed openstack/nova-specs master: Re-propose "Add maxphysaddr support for Libvirt" for 2024.1 Caracal https://review.opendev.org/c/openstack/nova-specs/+/895135
[07:41] <sahid> sean-k-mooney: o/ we discussed using tenant isolation, but that does not really fit our use case, and now all vms scheduled for that given tenant are targeted to that az
[07:42] <sahid> we would like the scheduler to consider the new az for some specified tenants
[07:43] <sahid> any idea how we can achieve that?
[07:44] <sahid> basically, for a given tenant allowed on az3, the az list used by the scheduler would be [az1, az2, az3], and for the other tenants only [az1, az2]
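The behaviour sahid asks for (an opt-in AZ that only selected tenants can see) can be sketched as a plain function. This is a hypothetical model, not nova code: `PUBLIC_AZS`, `GATED_AZS` and `candidate_azs` are illustrative names, and the tenant IDs are made up.

```python
from typing import Dict, List, Set

# Hypothetical data: AZs open to every tenant, plus AZs gated behind an
# allow-list of project (tenant) IDs. Not a nova config option.
PUBLIC_AZS: List[str] = ["az1", "az2"]
GATED_AZS: Dict[str, Set[str]] = {"az3": {"tenant-a"}}


def candidate_azs(project_id: str) -> List[str]:
    """Return the AZs a project may be scheduled to: public ones, then any gated AZ it is allowed on."""
    allowed = list(PUBLIC_AZS)
    for az, projects in GATED_AZS.items():
        if project_id in projects:
            allowed.append(az)
    return allowed


print(candidate_azs("tenant-a"))  # ['az1', 'az2', 'az3']
print(candidate_azs("tenant-b"))  # ['az1', 'az2']
```

The key property is that the gated AZ is additive for allowed tenants, rather than exclusive as with tenant pinning.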
[07:46] <sahid> bauzas: o/ you may have some idea ^
[07:46] <sahid> i'm thinking about fixing multitenantisolation a bit to achieve that
[08:24] <opendevreview> Merged openstack/nova master: Add service version for Bobcat https://review.opendev.org/c/openstack/nova/+/893749
[08:27] <bauzas> sahid: looking (I was afk)
[08:28] <bauzas> so you would like to return different AZs based on the tenant ?
[08:28] <bauzas> if so, yeah, it's https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L95
[08:37] <sahid> hum...
[09:09] <sahid> that does not really work, we would have to duplicate our aggregates and set filter_tenant_id on all the duplicates
[09:11] <sahid> I guess that way, a given tenant could be scheduled on az1, az2, az3 using filter_tenant_id, and the others would continue on az1 and az2
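A simplified model of why the aggregate approach needs duplication, based on our reading of the linked request filter (an assumption; the real filter is also gated by the `[scheduler] limit_tenants_to_placement_aggregate` and `placement_aggregate_required_for_tenants` options, which this sketch ignores). Any aggregate whose metadata has a key starting with `filter_tenant_id` and a value equal to the project ID pins that project to the tagged aggregates; untagged projects stay unrestricted. Aggregate names and tenant IDs are hypothetical.

```python
from typing import Dict, Optional, Set

# Metadata key prefix the tenant-aggregate request filter looks for.
TENANT_METADATA_PREFIX = "filter_tenant_id"


def tenant_aggregates(aggregates: Dict[str, Dict[str, str]], project_id: str) -> Optional[Set[str]]:
    """Simplified model: the aggregates a project is pinned to, or None if it is unpinned.

    Keys such as filter_tenant_id, filter_tenant_id2, ... each hold one project ID.
    """
    pinned = {
        name
        for name, metadata in aggregates.items()
        if any(
            key.startswith(TENANT_METADATA_PREFIX) and value == project_id
            for key, value in metadata.items()
        )
    }
    return pinned or None


# To give tenant-a [az1, az2, az3] while tenant-b keeps [az1, az2], every
# tenant must be listed on every aggregate it may use.
aggregates = {
    "az1-shared": {"filter_tenant_id": "tenant-a", "filter_tenant_id2": "tenant-b"},
    "az2-shared": {"filter_tenant_id": "tenant-a", "filter_tenant_id2": "tenant-b"},
    "az3-gated": {"filter_tenant_id": "tenant-a"},
}
print(sorted(tenant_aggregates(aggregates, "tenant-a")))
print(tenant_aggregates(aggregates, "tenant-c"))  # None: unrestricted, so it could still land on az3
```

This model shows the cost sahid points out. Keeping unlisted tenants off az3 requires either listing every tenant on the shared aggregates or making aggregate membership mandatory.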
[09:49] <zigo> stephenfin: FYI, I also found that freezer-api probably needs some sqlalchemy 2.x love too...
[09:49] <zigo> TypeError: LegacyEngineFacade.from_config() got an unexpected keyword argument 'autocommit'
[09:49] <zigo> ...
[09:50] <zigo> failures=511
[09:50] <zigo> Is it possible that just removing the autocommit argument is enough?
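Dropping the keyword only silences the TypeError. SQLAlchemy 2.x removed library-level autocommit, so code that relied on implicit commits still needs explicit transaction handling. If a project must run against both older and newer oslo.db releases, one hedged option is to pass the keyword only when the callee still accepts it. `legacy_from_config` below is a hypothetical stand-in, not oslo.db's real API.

```python
import inspect
from typing import Any, Callable, Dict

_KEYWORD_KINDS = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def call_with_supported_kwargs(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func with only the keyword arguments its signature accepts."""
    params = inspect.signature(func).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return func(*args, **kwargs)  # func takes **kwargs, so pass everything through
    accepted = {p.name for p in params if p.kind in _KEYWORD_KINDS}
    return func(*args, **{k: v for k, v in kwargs.items() if k in accepted})


def legacy_from_config(conf: Dict[str, str], sqlite_fk: bool = False) -> Dict[str, Any]:
    """Hypothetical factory whose newer release no longer accepts 'autocommit'."""
    return {"conf": conf, "sqlite_fk": sqlite_fk}


# 'autocommit' is silently dropped because legacy_from_config does not accept it.
result = call_with_supported_kwargs(legacy_from_config, {"connection": "sqlite://"}, autocommit=True)
print(result)
```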
[09:52] <sean-k-mooney> is freezer still supported?
[09:52] <zigo> Wooops... s/freezer/watcher/
[09:52] <zigo> My bad.
[09:53] <sean-k-mooney> i have almost the same question for watcher :P although i had seen that mentioned at least this side of the pandemic
[09:53] <zigo> Contrary to freezer, there were some commits this cycle.
[09:54] <sean-k-mooney> there were like 8 commits in watcher, mostly procedural
[09:54] <sean-k-mooney> but i guess it's still officially supported
[09:56] <sean-k-mooney> tobias-urdin: you were making some changes to watcher earlier in the year. is that a project you have a use for personally?
[09:56] <sean-k-mooney> just wondering, if patches were created, how likely is it that they would be reviewed/merged
[11:31] <opendevreview> Amit Uniyal proposed openstack/nova master: WIP:Adds device tagging functional tests https://review.opendev.org/c/openstack/nova/+/895162
[11:51] <tobias-urdin> sean-k-mooney: we do not use it, nothing i actively work on, i just did some drive-by fixes for oslo.messaging or fixing ci probably
[11:57] <sean-k-mooney> tobias-urdin: ack, ya that is what i was assuming, i was just wondering if there were any active contributors left
[12:24] <bauzas> ralonsoh: while https://review.opendev.org/c/openstack/neutron/+/893447 is now merged (the neutron revert), we still have CI failures today https://zuul.opendev.org/t/openstack/build/01dd3befff5e42cfb0c0f667b8e7e30b
[12:28] <bauzas> elodilles: I guess you know the current situation in Nova
[12:29] <bauzas> we still need to await a few merges before RC1
[12:30] <bauzas> hmpfff https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration&skip=0
[12:31] <bauzas> 30% of failures I'd say
[12:44] <mgariepy> is it possible to force a device_type in nova for pci devices ?
[12:44] <sean-k-mooney> no
[12:45] <mgariepy> i want to only do some pci passthrough on a couple of A100s
[12:46] <sean-k-mooney> you can filter by the pci address
[12:46] <sean-k-mooney> with support for regexes or bash globs if you don't want to just use the address
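The two address forms mentioned above are a glob string and a dict of per-field regexes. The sketch below approximates how each selects devices. It is an illustration using Python's `fnmatch` and `re`, not nova's own matcher, which has additional parsing rules (for example, for omitted fields).

```python
import fnmatch
import re
from typing import Dict, Union

AddressSpec = Union[str, Dict[str, str]]


def split_address(address: str) -> Dict[str, str]:
    """Split a PCI address 'DDDD:BB:SS.F' into domain/bus/slot/function fields."""
    domain, bus, rest = address.split(":")
    slot, function = rest.split(".")
    return {"domain": domain, "bus": bus, "slot": slot, "function": function}


def address_matches(spec: AddressSpec, address: str) -> bool:
    """Approximate matching: glob string, or every given field must fully match its regex."""
    if isinstance(spec, str):
        return fnmatch.fnmatchcase(address, spec)
    fields = split_address(address)
    return all(re.fullmatch(pattern, fields[key]) for key, pattern in spec.items())


print(address_matches("0000:01:00.*", "0000:01:00.0"))  # True
print(address_matches({"bus": "0[12]", "function": "0"}, "0000:03:00.0"))  # False
```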
[12:46] <mgariepy> https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/stats.py#L682C1-L683C1
[12:47] <sean-k-mooney> yes, if the alias's device_type is not type-PF then we filter them out when considering candidates
[12:48] <sean-k-mooney> that does not mean you can't request a PF for passthrough
[12:48] <sean-k-mooney> you just have to set the device_type to type-PF in the pci alias
[12:49] <sean-k-mooney> so instead of type-PCI or type-VF https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L34 you would set type-PF
[12:50] <sean-k-mooney> the alias and device list work together to select the device. the alias is how you describe the request and the device_list (pci whitelist) is how you declare what devices may be allocated
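As a sketch of how the two options pair up, the helper below renders a `[pci]` section whose `alias` and `device_spec` values are JSON strings. It assumes a recent release where the device list option is named `device_spec` (formerly `passthrough_whitelist`). The alias values come from mgariepy's paste. The address is the device shown in the placement view later in this log and is illustrative for other hosts. `render_pci_section` is a hypothetical helper. Per the discussion here, the alias must match on the nova-api hosts and on the computes.

```python
import json

# Alias values from this conversation; the device_spec address is illustrative.
ALIAS = {
    "name": "nvidia-a100-smx4-40gb",
    "vendor_id": "10de",
    "product_id": "20b0",
    "device_type": "type-PF",
}
DEVICE_SPEC = {"vendor_id": "10de", "product_id": "20b0", "address": "0000:01:00.0"}


def render_pci_section(alias: dict, device_spec: dict) -> str:
    """Render a nova.conf [pci] section with JSON-encoded option values."""
    return "\n".join(
        [
            "[pci]",
            f"alias = {json.dumps(alias)}",
            f"device_spec = {json.dumps(device_spec)}",
        ]
    )


print(render_pci_section(ALIAS, DEVICE_SPEC))
```

Generating both values from one source, for example via deployment tooling, avoids the API and compute nodes drifting apart.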
[12:51] <mgariepy> alias = { "name": "nvidia-a100-smx4-40gb", "product_id": "20b0", "vendor_id": "10de", 'device_type': 'type-PF'}
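If that alias line was copied verbatim from nova.conf, rather than retyped for chat, the single quotes around `device_type` would matter. Assuming the alias value is parsed as JSON (consistent with the jsonschema validation linked later in this log), a strict JSON parser rejects it. A quick check with the standard library:

```python
import json

pasted = """{ "name": "nvidia-a100-smx4-40gb", "product_id": "20b0", "vendor_id": "10de", 'device_type': 'type-PF'}"""
fixed = pasted.replace("'", '"')  # JSON requires double-quoted keys and strings


def parse_alias(raw: str) -> dict:
    """Parse an alias value as strict JSON, raising json.JSONDecodeError on invalid input."""
    return json.loads(raw)


try:
    parse_alias(pasted)
except json.JSONDecodeError as exc:
    print(f"rejected: {exc.msg}")

print(parse_alias(fixed)["device_type"])  # type-PF
```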
[12:51] <mgariepy> restarted the scheduler, and still get the same error.
[12:52] <sean-k-mooney> you need to set this on all the computes too, just an fyi, they have to match
[12:52] <sean-k-mooney> but what is the error
[12:53] <mgariepy> oh it's also set on the gpu compute.
[12:53] <sean-k-mooney> have you checked the pci_devices table to ensure the a100s are correctly tracked
[12:54] <sean-k-mooney> whether they show up as type-PF or type-PCI will depend on whether they report support for creating VFs
[12:55] <sean-k-mooney> that may depend on whether you have enabled mig mode or not
[12:55] <sean-k-mooney> for t4 it depended on the firmware version
[12:55] <sean-k-mooney> i think the a100 would always report sriov support but it's worth checking
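One way to check what a card reports on the compute node is to look at its sysfs attributes. The sketch below is a rough heuristic, not how nova classifies devices (nova gets this from libvirt). It assumes that a device with a `physfn` link is a VF, a device with a non-zero `sriov_totalvfs` is a PF, and anything else is a plain PCI device. `guess_nova_dev_type` is a hypothetical helper.

```python
from pathlib import Path


def guess_nova_dev_type(sysfs_device: Path) -> str:
    """Rough guess of how a PCI device will be tracked, from its sysfs directory.

    Heuristic (assumption): 'physfn' link => VF; non-zero 'sriov_totalvfs' => PF;
    otherwise a plain PCI device.
    """
    if (sysfs_device / "physfn").exists():
        return "type-VF"
    totalvfs = sysfs_device / "sriov_totalvfs"
    if totalvfs.is_file() and int(totalvfs.read_text().strip() or 0) > 0:
        return "type-PF"
    return "type-PCI"


# Usage on a compute node (address taken from the placement view in this log):
# guess_nova_dev_type(Path("/sys/bus/pci/devices/0000:01:00.0"))
```

Comparing this against the `pci_devices` table entry helps spot cases where the firmware or MIG mode changes what the card advertises.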
[12:56] <mgariepy> Placement PCI resource view: Placement PCI view on gpu-hpc21145: RP(gpu-hpc21145_0000:01:00.0, CUSTOM_PCI_10DE_20B0=1, traits=COMPUTE_MANAGED_PCI_DEVICE)
[12:56] <sean-k-mooney> oh you're using the pci in placement feature
[12:57] <mgariepy> yep all i tried was also failing :) haha
[12:57] <mgariepy> on 2023.1
[12:57] <sean-k-mooney> well this should be fine
[12:57] <sean-k-mooney> so is it passing placement and failing in the pci filter?
[12:57] <sean-k-mooney> or is it failing to get results from placement
[12:59] <elodilles> bauzas: ACK. you can -1 the patch until it's updated: https://review.opendev.org/c/openstack/releases/+/894693
[12:59] <bauzas> oh missed it from my mails
[13:00] <mgariepy> fails at running: _filter_pools_for_unrequested_pfs()
[13:00] <bauzas> I was wondering why I hadn't seen it
[13:00] <mgariepy> dev_type vs device_type ?
[13:00] <mgariepy> or is it abstracted somewhere ?
[13:01] <sean-k-mooney> it's device_type in the alias
[13:01] <ralonsoh> bauzas, yes, I'm going to mark this test as unstable again
[13:01] <sean-k-mooney> https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L95C10-L104 that is the jsonschema for the validation
[13:02] <ralonsoh> we need to find a better way to set the subport as active after the live migration
[13:02] <sean-k-mooney> but i think it's dev_type in the object
[13:02] <mgariepy> let me log the request and pool to see if something seems odd.
[13:02] <sean-k-mooney> https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L143C13-L145
[13:02] <bauzas> ralonsoh: ack
[13:02] <sean-k-mooney> mgariepy: i would check the pci_devices table in the cell db
[13:03] <sean-k-mooney> and make sure the device has the expected type etc.
[13:04] <mgariepy> nova.pci_devices tells me that the device is type-PF
[13:05] <ralonsoh> bauzas, https://review.opendev.org/c/openstack/tempest/+/895167
[13:09] <sean-k-mooney> mgariepy: what does the pci whitelist look like for that host
[13:11] <sean-k-mooney> based on the alias above it looks ok, unless you added physnet or remote_managed or something, so i'm not sure why it would be removed
[13:11] <sean-k-mooney> it's the pci filter, not the numa filter, that is removing it, right?
[13:11] <mgariepy> https://paste.openstack.org/show/bij0SgRpSEVCTILyzIzf/
[13:12] <mgariepy> seems to be the pci filter
[13:13] <sean-k-mooney> hum that all looks correct to me
[13:14] <mgariepy> https://paste.openstack.org/show/bP2oY2IOe4lNvj6wGB8w/
[13:14] <mgariepy> what is resolving the alias ? is it done by the scheduler directly ?
[13:15] <sean-k-mooney> that depends on the request
[13:15] <mgariepy> my flavor has alias stuff in it.
[13:15] <sean-k-mooney> but no, i think it's done in the compute or the api
[13:16] <sean-k-mooney> https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#configure-nova-api
[13:17] <sean-k-mooney> so for a new vm request this is used by the nova-api, not the scheduler
[13:17] <sean-k-mooney> and for resize it's used from the compute
[13:17] <sean-k-mooney> so if you are updating this you need to restart the nova-api, not the scheduler
[13:18] <sean-k-mooney> the scheduler receives a fully populated RequestSpec object which has the pci requests embedded in it
[13:19] <mgariepy> but my request doesn't receive the device_type :/
[13:23] <sean-k-mooney> did you restart the nova-api processes after updating the alias
[13:24] <mgariepy> i did restart the nova slice. let me try just putting the same config across the whole cluster to see if it changes something.
[13:32] <mgariepy> what driver should be loaded for the pci dev ?
[13:32] <mgariepy> pci-stub or vfio-pci ?
[13:32] <sean-k-mooney> the driver should not matter to the scheduler
[13:33] <sean-k-mooney> that would only matter on the compute node, but yes either of those should work
[13:33] <mgariepy> ok
[13:33] <sean-k-mooney> the important part is that the framebuffer is not initialised by xorg/wayland, as that would prevent the passthrough
[13:33] <mgariepy> ha ok.
[13:34] <sean-k-mooney> vfio-pci is what kvm/qemu will use when it's being passed through
[13:34] <sean-k-mooney> so prebinding to that is generally not a bad idea but often not required
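To see which kernel driver currently owns the card on the compute node, Linux exposes a `driver` symlink under each device's sysfs directory that points at `/sys/bus/pci/drivers/<name>`. A small sketch reading it (`bound_driver` is a hypothetical helper name):

```python
import os
from pathlib import Path
from typing import Optional


def bound_driver(sysfs_device: Path) -> Optional[str]:
    """Return the kernel driver bound to a PCI device, or None if it is unbound.

    /sys/bus/pci/devices/<address>/driver is a symlink into /sys/bus/pci/drivers/<name>.
    """
    link = sysfs_device / "driver"
    if not link.is_symlink():
        return None
    return os.path.basename(os.readlink(link))


# Usage on a compute node, e.g. expecting "vfio-pci" or "pci-stub" when prebound:
# bound_driver(Path("/sys/bus/pci/devices/0000:01:00.0"))
```

If this reports a display driver instead, the framebuffer concern above applies.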
[13:34] <mgariepy> i did some stuff with passthrough with el-cheapo gamer gpus, but with an older openstack release.
[13:35] <sean-k-mooney> there is nothing that has really changed that would impact this
[13:36] <sean-k-mooney> the alias and whitelist/device list syntax is the same
[13:36] <sean-k-mooney> and you're reporting that it's failing in the pci filter, so it's not related to the placement part
[13:36] <sean-k-mooney> since it passes that to get to the point where it's failing
[13:37] <mgariepy> yeah i'll retest in like 30 minutes.. meeting time ;)
[14:06] *** tosky_ is now known as tosky
[15:04] <mgariepy> sean-k-mooney, it works now. Thanks a lot for your help.
[15:04] <mgariepy> with the type-PF and all my servers with the same config it's almost magical :D
[15:21] <sean-k-mooney> awesome
[15:21] <opendevreview> Sylvain Bauza proposed openstack/placement master: Update 2023.2 reqs to support os-traits 3.0.0 as min version https://review.opendev.org/c/openstack/placement/+/895186
[15:22] <sean-k-mooney> the request was probably being accepted by a non-restarted instance
[15:22] <mgariepy> yeah
[15:23] <bauzas> dansmith: sean-k-mooney: forgot to propose the requirements update for the new traits: https://review.opendev.org/c/openstack/placement/+/895186
[15:23] <mgariepy> any tips on debugging that sort of stuff when multiple instances of every service are running ?
[15:23] <bauzas> we need it before RC1
[15:23] <dansmith> bauzas: aren't we past requirements freeze?
[15:23] <dansmith> is that in u-c already?
[15:24] <bauzas> dansmith: os-traits was delivered in March
[15:24] <bauzas> and u-c already has it
[15:24] <dansmith> okay
[15:24] <bauzas> https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt
[15:25] <dansmith> ack, cool
[15:25] <bauzas> so we just need to say 'look, for Bobcat, we need to accept only 3.0 since we use a trait in it'
[15:25] <bauzas> I haven't checked though whether we really use this trait in Placement, but I'm assuming so
[15:26] <dansmith> bauzas: wait, you're not sure if we need the new traits? :)
[15:26] <dansmith> we just bumped for this a few weeks ago: https://review.opendev.org/c/openstack/nova/+/873221
[15:38] <opendevreview> Gorka Eguileor proposed openstack/nova master: Fix load_validators https://review.opendev.org/c/openstack/nova/+/895189
[15:39] <opendevreview> Gorka Eguileor proposed openstack/nova master: Fix debug options https://review.opendev.org/c/openstack/nova/+/895190
[15:39] <opendevreview> Gorka Eguileor proposed openstack/nova master: Logs cinderclient requests when debugging https://review.opendev.org/c/openstack/nova/+/895191
[15:39] <opendevreview> Gorka Eguileor proposed openstack/nova master: Fix guard for NVMeOF volumes https://review.opendev.org/c/openstack/nova/+/895192
[15:51] *** efried1 is now known as efried
[15:53] <bauzas> dansmith: so, I did some research, my bad, I just blindly followed a pattern from Antelope when I did https://review.opendev.org/c/openstack/placement/+/874080
[15:54] <bauzas> so, https://review.opendev.org/c/openstack/nova/+/873221 is using traits-2.10 https://review.opendev.org/c/openstack/releases/+/873106 which is already supported as the minimum
[16:10] <bauzas> dansmith: so, I think I'll abandon https://review.opendev.org/c/openstack/placement/+/895186
[16:11] <dansmith> bauzas: okay
[16:12] <bauzas> I'm actually lost, we created 3.0 in order to no longer support py2.6 and py2.7
[16:12] <sean-k-mooney> bauzas: so https://review.opendev.org/c/openstack/nova/+/831194/37/nova/share/manila.py#43 is returning the connection via the sdk to talk to manila
[16:12] <bauzas> sean-k-mooney: sec, trying to see whether we need https://review.opendev.org/c/openstack/releases/+/894698
[16:13] <sean-k-mooney> it's fine
[16:13] <sean-k-mooney> focus on that, i was just wondering where the error was
[16:13] <bauzas> sorry I meant https://review.opendev.org/c/openstack/placement/+/895186
[16:14] <bauzas> we'll add a new trait which isn't used yet by Nova
[16:14] <bauzas> but we'll also make py3.6 and py3.7 unsupported
[16:20] <bauzas> dansmith: sean-k-mooney: okay, I think I sorted it, we do need https://review.opendev.org/c/openstack/placement/+/895186, not because of the new trait, but because we want to drop support for 3.6 and 3.7
[16:20] <bauzas> in Bobcat
[16:20] <dansmith> okay
[16:20] <sean-k-mooney> ok so i should +2w that then, right
[16:20] <sean-k-mooney> the min version bump to 3.0.0
[16:21] * sean-k-mooney waits...
[16:24] <bauzas> sean-k-mooney: sorry, yeah
[16:24] <bauzas> thanks
[16:25] * bauzas doesn't have chameleon eyes yet :D
[16:25] * bauzas is now back in recheck mode
[16:39] <sean-k-mooney> mgariepy: the only real tip is to use the request-id to correlate the requests across different services when looking at the logs
[16:40] <sean-k-mooney> that, and try to use automation like kolla-ansible to ensure you have the same config on all relevant nodes rather than doing manual changes
[16:40] <sean-k-mooney> but no, not really other than that
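A minimal sketch of the request-id tip: collect log lines from several services and group them by the `req-<uuid>` IDs they mention, so one request can be followed from nova-api to the scheduler and compute. Where the lines come from (files, journald under openstack-ansible, a log aggregator) depends on the deployment. The `sources` mapping and sample lines here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# OpenStack request IDs are logged as "req-" followed by a UUID.
REQUEST_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def group_by_request_id(sources: Dict[str, Iterable[str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each request ID to the (service, line) pairs that mention it, in input order."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for service, lines in sources.items():
        for line in lines:
            for req_id in sorted(set(REQUEST_ID_RE.findall(line))):
                grouped[req_id].append((service, line.rstrip()))
    return dict(grouped)


# Hypothetical sample lines standing in for per-service logs.
sources = {
    "nova-api": ["INFO nova.api [req-1b2c3d4e-0000-4000-8000-000000000001 admin] create server"],
    "nova-scheduler": [
        "WARNING nova.scheduler [req-1b2c3d4e-0000-4000-8000-000000000001] no valid host",
        "INFO nova.scheduler [req-ffffffff-0000-4000-8000-000000000002] other request",
    ],
}
for req_id, entries in group_by_request_id(sources).items():
    print(req_id, [service for service, _ in entries])
```

In this scenario, seeing which nova-api instance logged the request ID would have pointed straight at the process still running with the old alias.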
[16:41] * bauzas disappears for family reasons but will be back later in the evening
[16:48] -opendevstatus- NOTICE: The lists.airshipit.org and lists.katacontainers.io sites will be offline briefly for migration to a new server
[17:03] <mgariepy> sean-k-mooney, i'm using openstack-ansible
[17:03] <mgariepy> when making changes i'm not quite sure about, i do them manually instead of via osa.
[18:13] *** EugenMayer4404 is now known as EugenMayer440
