Thursday, 2023-10-12

opendevreviewsean mooney proposed openstack/nova master: refactor numa claims  https://review.opendev.org/c/openstack/nova/+/89805603:35
opendevreviewsean mooney proposed openstack/nova master: refactor numa claims  https://review.opendev.org/c/openstack/nova/+/89805603:37
opendevreviewsean mooney proposed openstack/nova master: imporve nova object logging  https://review.opendev.org/c/openstack/nova/+/89805704:29
tobias-urdinsean-k-mooney: when you have a few seconds: based on https://review.opendev.org/c/openstack/nova/+/824048 and the libvirt-side issue described in https://gitlab.com/libvirt/libvirt/-/issues/161, I assume it would be frowned upon to implement the workaround in the nova layer by checking for cgroupsv1/cgroupsv2 and reintroducing the default09:08
tobias-urdincputune.shares value based on that (let's ignore the live migration part of it for now and just focus on "having the correct value" based on the crazy decision made in libvirt). Any thoughts on that?09:08
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: add mtty support for vgpus  https://review.opendev.org/c/openstack/nova/+/89810009:40
bauzasdansmith: I know that's too early for you but my devstack fails on your mtty patch https://paste.opendev.org/show/bwpHZPMuePifFk0JWi7v/ (just reply when you want)12:46
bauzassean-k-mooney: ^12:46
bauzasactually the whole stack is here https://paste.opendev.org/show/bLirDAMuGmxhLLod0S9p/12:47
bauzasstack@sbauza-dev2:~/devstack$ cat /etc/lsb-release 12:49
bauzasDISTRIB_ID=Ubuntu12:49
bauzasDISTRIB_RELEASE=22.0412:49
bauzasDISTRIB_CODENAME=jammy12:49
bauzasDISTRIB_DESCRIPTION="Ubuntu 22.04.1 LTS"12:49
bauzasstack@sbauza-dev2:~/devstack$ uname -r12:49
bauzas5.15.0-1037-kvm12:49
fricklerbauzas: is there some linux-modules-extra pkg for that kernel?13:19
dansmithbauzas: looks like maybe it has module symbols from a different kernel13:22
dansmithsince it failed to create /opt/stack/kernel .. did you run it once, upgrade kernel, run again?13:22
bauzasdansmith: it didn't fail to create kernel13:36
bauzas(sorry was on a meeting)13:36
dansmithdidn't fail?13:36
dansmith++/opt/stack/nova/devstack/lib/mdev_samples:compile_mdev_samples:19  mkdir /opt/stack/kernel13:36
dansmithmkdir: cannot create directory ‘/opt/stack/kernel’: File exists13:36
bauzasthe kernel modules, yeah13:36
bauzasah13:36
bauzasprobably because devstack failed before on OVS13:37
bauzasbut I hadn't upgraded my kernel13:37
dansmiththat's what I just said yeah :)13:37
dansmithblow that away and try again, I'll add cleaning of that to clean.sh13:37
bauzaswant me to unstack, delete the kernel dir and stack ?13:37
bauzasok13:38
bauzasI can try13:38
bauzasit's a pure compute node, no controllers on it so should be small and quick13:38
dansmithwell, you could just run the plugin itself again (after cleaning the kernel dir) but re-stacking would be better13:39
bauzasyeah13:39
bauzasif that fails again, I'll run the plugin itself13:40
bauzasdansmith: fwiw, I started to provide the mtty change on nova13:40
bauzasI could have missed something (and I know I did, since I haven't modified the method that checks the mdevs when we restart, but meh not a problem for testing) but I think it's a start13:41
dansmithcool13:42
bauzasdansmith: failing again https://paste.opendev.org/show/bFTPQR1VBSJNQj2jrknG/13:42
bauzasthis time, this is linking the modules13:43
bauzasbut still failing on the definitions13:43
dansmithbauzas: show me "dpkg -l | grep linux" and "ls /usr/src"13:43
bauzascool13:43
dansmithalso uname -r and "ls -l /lib/modules/*/build"13:44
dansmithworked fine for me and sean-k-mooney, but it gets weird if you have a bunch of mismatched kernels and things installed sometimes13:44
bauzasdansmith: there you go https://paste.opendev.org/show/bSXwvpbc6Z96awnWuIRS/13:45
bauzasI can reclone the machine and reinstall devstack again13:46
bauzasI mean, if my env is dirty, meh13:46
bauzasprobably better to start with fresh things13:46
dansmithyeah that's a ton of kernel distraction, although it seems like it all lines up13:48
dansmithhowever,13:48
dansmiththere are lots of reports out there about having linux-headers installed for the right kernel, but needing to reinstall it if it was installed by a previous version13:48
dansmithnot sure why, because it should be all separate13:49
dansmithdid you already have linux-headers for your kernel installed or did the first run of this install it for you?13:49
sean-k-mooneyyou have a kvm variant of the kernel13:52
sean-k-mooneyso maybe that's the issue13:52
sean-k-mooneyi have Linux upstream-devstack 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux13:52
dansmithoh is there a separate headers package for the kvm variant? if so that's definitely it13:52
bauzasI just rebuilt the instance I have as devstack213:53
dansmithah yep13:53
dansmiththat's the problem13:53
bauzasand yeah, if the symbols are only on the plain kernel, that explains13:54
bauzasnot sure how I got this variant, probably because of the image I used in my cloud13:54
dansmithso linux-headers-$kver won't work for the variants, because it needs to be linux-headers-kvm I guess13:54
dansmithbauzas: so you can switch to the normal one?13:55
bauzasyeah that's my guess13:55
bauzasdansmith: too late, I rebuilt it13:55
dansmithwhat does that mean, you installed the right headers package and built the modules?13:55
sean-k-mooneyi think they mean reinstalled the vm13:55
dansmiththat's what I said.. switch to the normal kernel, however that needs to happen :)13:56
dansmitheither with apt or rm :)13:56
sean-k-mooneybauzas: are you using a cloud image or similar? i have never seen the kvm kernel actually installed in anything13:56
dansmithyeah I'm not sure what it's for either13:56
sean-k-mooneywell it's a stripped-down kernel without any hardware-specific drivers13:56
dansmith"Targeting KVM instance usage. This carries the minimum support required for a guest kernel."13:56
sean-k-mooneybut they don't use it for the cloud image13:57
sean-k-mooneyso i don't know who the audience for that is13:57
dansmithyeah, probably for ultra size-conscious situations or something13:57
bauzassean-k-mooney: I used some internal cloud at my employer13:57
bauzasstack@sbauza-dev2:~$ uname -r13:58
bauzas5.15.0-1017-kvm13:58
bauzasokay, coming from the image itself13:58
bauzasfun13:58
bauzasI'm tempted to pick another image or download another kernel13:59
sean-k-mooneyyes just install a different kernel and reboot13:59
sean-k-mooneyor use the ubuntu image i uploaded13:59
bauzasthat will be the quickest13:59
dansmithI can see if I can make it work either way, but yeah do the workaround for the moment14:00
sean-k-mooneythe official ubuntu cloud images have 5.15.0-48-generic 14:00
sean-k-mooneywell that's an old image so it's probably outdated, but i just confirmed that, so you booted from a customized image14:01
dansmithyeah so you can't even get the support packages for very old kernels, so you need to be running something current anyway14:02
dansmithlike they remove them14:02
bauzashum14:05
bauzaswhich kernel version is currently running on our CI jobs ?14:06
dansmiththe image in our CI gets refreshed periodically for that reason14:06
bauzasbecause I have the choices14:06
sean-k-mooneyi always do an apt dist-upgrade and reboot before running devstack for what it's worth14:06
sean-k-mooneynightly for upstream ci14:06
dansmithbauzas: https://zuul.opendev.org/t/openstack/build/3c70446c0452434a87166d336dc0dbc5/log/job-output.txt#2258014:06
sean-k-mooneybauzas: just install the generic kernel14:06
dansmithlinux-headers-5.15.0-86-generic14:06
bauzasI'll then install linux-image-5.15.0-86-generic14:07
dansmiththat's what I'm running locally too14:07
sean-k-mooneysame14:07
dansmithhmm, actually linux-headers-kvm resolves to linux-headers-$ver-kvm as well, so I'm not quite sure why this isn't just working, actually14:11
dansmithoh but there's also linux-kvm-headers-$ver .. weird14:12
bauzasok, changing the kernel gave me some kernel tain at boot14:15
bauzastaint*14:15
dansmithat boot?14:15
bauzasyeah14:15
bauzasI'll just upload a fucking plain ubuntu image14:15
dansmithlol14:15
dansmithcan you pastebin the taint message? really curious14:16
dansmithunless your provider is inserting something of their own, I can't imagine...14:16
bauzasargh, sorry, again already rebuilt14:21
bauzasIIRC, it was about getting the disk14:22
bauzasok I need to go parenting my kid14:25
bauzasbbiab14:25
dansmithbauzas: fwiw, I can reproduce your issue if I install the kvm variant, so I'll see if I can figure out why that's not working14:50
bauzasack14:50
bauzasthat's crazy, I don't know yet why but the variant is automatically installed in my cloud env :facepalm14:50
dansmithyeah14:51
dansmithoh neither of the headers packages for that kernel contain the module symbols it seems14:52
dansmithyeah totally different, weird14:54
bauzasdansmith: yup and apparently I can't prevent this kernel variant from being installed in my cloud14:54
bauzasI can try to use a bare kernel like the last time but my guess is that it won't boot14:55
dansmithbauzas: can't avoid installing it or can't avoid booting to it?14:55
bauzasboth 14:55
dansmithhow does it fail?14:55
bauzasit's internal thing, so slack14:55
dansmithbauzas: sean-k-mooney: ah, I got it.. the -kvm kernel has none of the vfio or mdev stuff enabled in config (because it's for virtual only) and so we have module symbols, but not symbols for the vfio or mdev base infrastructure15:19
dansmithwhich makes sense of course15:19
dansmithso we can't compile those mdev drivers because the infra they depend on is missing15:20
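The diagnosis above can be sketched as a small check, in Python since that is nova's language: parse a kernel config and report which VFIO/mdev options are not enabled. The option names `CONFIG_VFIO` and `CONFIG_VFIO_MDEV`, and both function names, are assumptions for illustration; the log only says that "the vfio or mdev stuff" is not enabled in the -kvm config.

```python
import re

# Assumed option names; the log does not name the exact config options.
REQUIRED_OPTIONS = ("CONFIG_VFIO", "CONFIG_VFIO_MDEV")

_OPTION_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kernel_config(text):
    """Return {option: value} for every option that is set in a kernel config."""
    options = {}
    for line in text.splitlines():
        # Lines such as "# CONFIG_VFIO is not set" are comments and do not match.
        match = _OPTION_RE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """List required options that are neither built in (y) nor modules (m)."""
    options = parse_kernel_config(config_text)
    return [name for name in required if options.get(name) not in ("y", "m")]
```

On Ubuntu the config text would typically come from `/boot/config-<output of uname -r>`; a non-empty result from `missing_options` would suggest the mdev sample modules cannot be built against that kernel.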
bauzasdansmith: I eventually was able to create a new instance w/o that variant thanks to another image15:26
dansmithcool, I think that's the way15:26
bauzasbut fwiw https://cloud-images.ubuntu.com/jammy/20231012/ provides a specific QEMU image with the kvm variant15:26
dansmithyep, but we're simulating hardware here so we need a kernel with hardware stuff in it15:27
bauzasso that's something we can't support15:27
bauzasagreed15:27
-opendevstatus- NOTICE: The lists.openstack.org site will be offline over the next few hours for migration to a new server15:31
opendevreviewDan Smith proposed openstack/nova master: Compile mdev samples for nova-next  https://review.opendev.org/c/openstack/nova/+/89770815:58
dansmithbauzas: this ^ hard fails on the kvm variant to make sure it's obvious16:01
dansmithand also addresses some of sean-k-mooney's feedback16:02
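A rough Python sketch of the "hard fail on the kvm variant" idea follows. The real change lives in the devstack plugin (shell) and its implementation is not shown in this log, so this is not that code; the function names and the error text are hypothetical.

```python
def kernel_flavour(release):
    """Return the trailing flavour of a release string such as '5.15.0-86-generic'."""
    return release.rsplit("-", 1)[-1]


def ensure_not_kvm_variant(release):
    """Raise early, with an obvious message, if the kernel is the '-kvm' variant."""
    if kernel_flavour(release) == "kvm":
        raise RuntimeError(
            f"kernel {release} is the kvm variant; "
            "install a generic kernel to build the mdev sample modules"
        )
```

In practice `release` would be the output of `uname -r` (for example via `os.uname().release`). Failing up front is meant to avoid the confusing unresolved-symbol errors seen earlier in the log.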
bauzascool16:02
bauzasmy devstack node is still deploying the stack but I have good hopes16:08
bauzasactually, my hope was wrong :  E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 30066 (apt)16:08
bauzasshit, need to respin16:09
dansmithis auto upgrades running in the background?16:09
dansmiththat's a common thing16:09
sean-k-mooneythat's why i always do sudo apt update; sudo apt dist-upgrade -y; sudo reboot16:17
sean-k-mooneybefore i run devstack for the first time16:17
sean-k-mooneyok i also do "sudo apt install python3-dev libffi-dev libssl-dev gcc make git" too16:19
sean-k-mooneydevstack does pull those in as needed but it causes less issues if you do it upfront16:19
sean-k-mooneythat's just making sure that we can compile python c modules for pip deps16:21
bauzasfwiw, was able to build the kernel modules with the plain ubuntu kernel16:43
bauzasso I'll +2 dansmith's patch16:43
bauzasstill fighting with a last devstack issue, but unrelated to dan's 16:43
dansmithcool16:43
bauzasdansmith: for some reason, I'm hitting hard https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1523 but it's just the first devstack start17:21
bauzasso I wonder how to fix it17:21
dansmithuh, leftover database?17:21
bauzasprobably17:22
bauzasso, do you prefer that I directly write the file or remove the node?17:22
dansmithup to you, either way it's detecting the right thing17:24
bauzasactually yeah, I recloned the host but this isn't an AIO17:24
bauzasso the DB still exists17:24
bauzasDB record*17:24
bauzasokay, will write the file17:24
dansmithright which is exactly what that check is for :P17:27
bauzasdansmith: actually, I have to doublecheck but since state_path is /opt/stack/data/nova, this probably goes away when you unstack17:54
dansmithonly when you clean I think17:54
dansmithbut you said it's not an AIO right?17:54
bauzasyeah, just a single compute connecting to another instance17:55
dansmithoh you mean nova's state path gets deleted17:55
bauzasyeah, my theory is when unstacking17:55
dansmithright, but that's the thing it's checking for.. that you basically rebuilt a node with the same name in place17:55
dansmithif you don't want that, copy the uuid file to /etc/nova and it will re-use the same thing over and over17:56
bauzasright, on purpose :)17:56
dansmith(assuming you don't delete /etc/nova I guess unstack might)17:56
dansmithmaybe we need to let devstack force a uuid so we can keep it for subsequent stack runs17:56
dansmithI always do AIO so I never hit that17:56
bauzasanyway, not sure we need to change anything, but that somewhat complicates the story of a 2-node devstack install with the n-cpu failing once17:56
bauzasmaybe, for the moment, my brain is doing concurrency between a devstack install and a customer bug17:57
bauzasso I don't have the energy to think about any potential solution17:58
dansmithit's doing the thing it's supposed to which is catching that you've recreated a compute node with the same name without deleting it from the database18:01
dansmithso either delete it from the DB, or slap the uuid file into place18:01
dansmithI'll add something to devstack to basically disable that behavior by letting you force the uuid to be the same on each stack18:02
dansmithbauzas: https://review.opendev.org/c/openstack/devstack/+/89813418:10
dansmithput a uuid in NOVA_CPU_UUID= in your localrc and you should be good18:10
bauzasyup18:11
bauzasexcellent thing18:11
bauzas+1 from me18:11
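The "slap the uuid file into place" option can be sketched in Python as below. The file name `compute_id` and the function name are assumptions for illustration; the log only refers to "the uuid file" and to `/etc/nova` as a place where it is re-used.

```python
import uuid
from pathlib import Path

# Assumed file name; the log only refers to "the uuid file".
COMPUTE_ID_FILE = "compute_id"


def ensure_compute_id(directory, node_uuid=None):
    """Write node_uuid (or a fresh UUID) to the id file unless one already exists.

    Returns the UUID string stored in the file.
    """
    path = Path(directory) / COMPUTE_ID_FILE
    if path.exists():
        # Keep whatever identity is already on disk.
        return path.read_text().strip()
    # uuid.UUID() validates and normalises a caller-supplied value.
    value = str(uuid.UUID(node_uuid)) if node_uuid else str(uuid.uuid4())
    path.write_text(value + "\n")
    return value
```

To keep the identity of a rebuilt node, `node_uuid` would have to be the UUID already recorded for that host in the database; a freshly generated one only helps for a host that has no record yet.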
opendevreviewDan Smith proposed openstack/nova master: Do not manage CPU0's state  https://review.opendev.org/c/openstack/nova/+/89813719:20
dansmithbauzas: yw ^19:21
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: add mtty support for vgpus  https://review.opendev.org/c/openstack/nova/+/89810019:22
sean-k-mooneydansmith: before i finish for tonight, do we want to merge the mdev plugin or wait for bauzas to modify nova to work with it first20:26
sean-k-mooneyi'm leaning towards merge since we can trivially revert it if we don't end up using it for any reason, but i said i would ask first20:27
bauzasI still need to test devstack with precreated mdevs20:27
sean-k-mooneyok we can wait then; i was assuming your +2 meant you were ok to merge it but was not sure20:28
sean-k-mooneyi'm just closing browser tabs so said i would ask before i do20:28
sean-k-mooneyi can loop back to this on monday20:28
