Wednesday, 2023-11-29

sean-k-mooney[m]well it's not exactly from 0 since i started it like a year ago https://review.opendev.org/q/topic:%22alpine%2200:01
sean-k-mooney[m]but ya we don't use much in the images and honestly i kind of wish tempest only used busybox or similar tools00:01
sean-k-mooney[m]tinycore might be a bit too spartan but it would be worth a try00:01
sean-k-mooney[m]my alpine image does not have what's needed for growpart00:02
clarkbfwiw I don't expect the ubuntu kernel built for cirros to be that drastically different than another tiny kernel. They use the same source but rebuild to make it smaller and lightweight iirc00:02
sean-k-mooney[m]so that was causing things to fail00:02
JayFalpine does use musl libc, which can cause changes in behavior FWIW; but as I say that I realize I don't know if tinycore uses glibc :)00:02
sean-k-mooney[m]clarkb:  hm ok, i thought they just unpacked the deb file without rebuilding00:02
clarkbsean-k-mooney[m]: there is a whole massive build process for those images. I thought they were rebuilding the kernels. frickler  can probably confirm or deny00:03
sean-k-mooney[m]JayF:  the use of musl is part of why alpine was appealing00:03
clarkbsean-k-mooney[m]: fwiw I think the ubuntu kernel packages hosted for ubuntu are several times larger than an entire cirros image00:03
JayFAh. I would consider it quite the opposite since all our other testing is on glibc, and most target distributions for openstack are glibc. Seems like a place where we could find weird breakages that we might not be well-suited to fix.00:03
sean-k-mooney[m]clarkb:  ack, in this case the kernel messages seem to be indicating that we are running out of memory for memory-mapped io to allocate to the pcie devices00:03
clarkbmaybe even an entire order of magnitude00:03
clarkbJayF: also python runs poorly on musl00:04
clarkbthough most of our jobs don't really need to run python in the nested VMs00:04
sean-k-mooney[m]JayF:  nothing about openstack should care if the guest has glibc or not00:04
sean-k-mooney[m]if it does, that's a problem we should fix00:04
JayFsean-k-mooney[m]: aha, you don't run anything inside the images; Ironic does, I forgot that :D 00:04
sean-k-mooney[m]we basically use touch, ls, ssh and a few other very minimal commands00:05
sean-k-mooney[m]hence why busybox + mkfs.fat3200:05
sean-k-mooney[m]would cover most things00:05
clarkbI think the most complicated stuff we do in the nodes is related to networking to confirm network functionality00:05
sean-k-mooney[m]assuming glean or cloud-init work for metadata/ssh keys00:05
JayFsean-k-mooney[m]: you really should see how far you could get with commenting out 99% of our tinyipa build script and tossing the resulting image into a CI job00:06
JayFI guess really the hard part is CI infra, and it wouldn't be able to reuse the IPA-B stuff because it's not really IPA-adjacent00:06
JayFhttps://opendev.org/openstack/ironic-python-agent-builder/src/branch/master/tinyipa/build-tinyipa.sh is the entrypoint for our tinyipa ramdisk build, fwiw00:07
sean-k-mooney[m]why is it pulling from linux.dell.com?00:07
JayFto get a source release of biosdevname, which isn't shipped in tinyipa00:08
sean-k-mooney[m]ya but i assume that is not the upstream source repo for that00:08
JayFyou actually assume wrong, surprisingly00:08
JayFbut that was basically my knee jerk review feedback too LOL00:08
sean-k-mooney[m]hum the more you know00:09
sean-k-mooney[m]does the ipa image have glean or cloud init?00:10
JayFfor purposes of tinyipa, I think we only test that on dhcp images, but I'm not 100% sure00:10
JayFrpittau is the expert on those00:10
* JayF reading it to be sure00:10
sean-k-mooney[m]it looks like it's configuring dhcp using udhcp ya00:13
JayFsean-k-mooney[m]: yep; based on our docs I'd assume we don't glean/cloud-init it outta the box (but I'll note that TheJulia worked on upstream glean support for tinycore so it's likely just automation away): https://docs.openstack.org/ironic-python-agent-builder/latest/admin/tinyipa.html#enabling-disabling-ssh-access-to-the-ramdisk 00:13
sean-k-mooney[m]so the gap would be that the image today does not have anything to reach out and download the ssh key from metadata so that we can log in00:13
JayFour relationship with glean in IPA is weird, because we run it on demand, not automatically 00:13
sean-k-mooney[m]we could fall back to password login but tempest assumes a “cloud image” i.e. one with a cloud-init equivalent00:14
JayFyou literally just need to wait for https://review.opendev.org/c/opendev/glean/+/899219 to land00:14
JayFand add a stanza to install glean in the image00:14
sean-k-mooney[m]good to know.00:15
JayFI'll summarize this all with: I don't care what you use00:15
JayFI don't even know why we used tinycore for this image00:15
JayFbut we've used it literally for years and years, it mostly just works00:16
JayFonly real downside is the longstanding open bug we have where tinycore just does not want to host their sources over https00:16
JayFwhich is infuriating, but ... acceptably infuriating for a CI-only tool00:16
TheJuliaYeah, and that is huge sadness00:16
* JayF -> EOD; good luck sean!00:17
sean-k-mooney[m]melwitt:  so in nova-next we allocate 24 pci root ports https://opendev.org/openstack/nova/src/branch/master/.zuul.yaml#L414 we should drop that to something more reasonable like 12-1600:18
sean-k-mooney[m]that will significantly reduce the pci mmio space required in the guest00:19
melwittsean-k-mooney[m]: ok, sweet. glad there is something we can try 00:19
sean-k-mooney[m]im not entirely sure that the actual issue is the io space assignment. looking at the kernel trace the error seems to be coming from an irq interrupt handler but it's also referencing  ? exc_page_fault+0x89/0x170 so i would wonder if this is still a memory issue00:39
sean-k-mooney[m]it's saying Kernel panic - not syncing: Attempted to kill init!00:39
sean-k-mooney[m]as the main panic message00:39
sean-k-mooney[m]and it looks like we're loading from the initramdisk to the root file system at the time the interrupt was happening00:39
sean-k-mooney[m]so im just wondering if the interrupt handler is trying to allocate memory at a time we are very close to the limit and failing occasionally00:40
sean-k-mooney[m]if that is the case allocating a very small amount of swap in the nova flavor might also help but reducing the pcie root ports would still be my first step00:41
sean-k-mooney[m]i think each unused pci root port is using like a mb of ram so 24 mb of our 128 total just for the pcie slots00:41
sean-k-mooney[m]and we don't actually use more than a handful of them for volumes/ports00:42
melwittok, I see00:42
sean-k-mooney[m]the last messages before the panic are00:44
sean-k-mooney[m]info: initramfs loading root from /dev/vda100:44
sean-k-mooney[m]/sbin/init: can't load library 'libtirpc.so.3'00:44
melwittI noticed that too but didn't know what it means00:44
melwittthat message is also just before the panic in the non-q35 job00:46
sean-k-mooney[m]so this is where we are changing from running in the kernel ramdisk to running from the actual root disk i think and in this case init (which i believe is sysvinit not systemd in this case) is loading a dynamic lib into memory00:46
melwittmaybe the common denominator?00:46
sean-k-mooney[m]well if we get an interrupt at that point im guessing the kernel oom killer tries to kill init. ill admit this is a bit beyond my understanding of early boot so im not sure either00:48
sean-k-mooney[m]neutron are seeing the same error for what its worth00:49
sean-k-mooney[m]https://bugs.launchpad.net/neutron/+bug/203994000:49
sean-k-mooney[m]ya cirros has their own init script https://github.com/cirros-dev/cirros/blob/main/src/init00:53
sean-k-mooney[m]it's failing here in the switch_root call https://github.com/cirros-dev/cirros/blob/main/src/init#L8600:54
melwittnice find00:56
sean-k-mooney[m]i feel like this is busybox init00:57
sean-k-mooney[m]or rather its calling into busybox init00:58
sean-k-mooney[m]based on https://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/conf/busybox.config#L70100:58
sean-k-mooney[m]i believe switch_root is at least being provided by busybox even if we are not using busybox init directly00:58
sean-k-mooney[m]anyway im going to go to sleep now o/01:04
melwittsean-k-mooney[m]: thanks for helping with this, have a good night o/01:05
melwitt(leaving a message for tomorrow) bauzas, gibi, sean-k-mooney: I made a list of some of the CI failures I saw on one of dan's patches to help me keep track and linked to this ^ discussion ... https://etherpad.opendev.org/p/nova-ci-failures-minimal in case anyone might be interested01:20
melwittI made a new etherpad bc I have trouble parsing etherpads with larger amounts of text01:23
tkajinamwondering if that "please report a bug" message is beneficial these days . I agree it was in the past when OpenStack was not yet common but I doubt it still is considering its current status08:20
tkajinamrecent reports are mostly caused by unrelated problems (wrong configuration, broken rabbitmq, etc) and personally I've not seen any reports related to actual logic bug in nova for a while08:22
fricklerclarkb: sean-k-mooney[m] is right, cirros is using a stock ubuntu kernel, which is why that makes up about half of the complete image size. but that's also why I'm not convinced that this is the source for the issues. did someone check for possible general OOM situations for these failures?08:33
jkulikto me that kernel-panic topic sounds like something is wrong with the image or with loading it. I'd interpret it as /sbin/init running into an error and the kernel thus panicking. /sbin/init: can't load library 'libtirpc.so.3'08:48
jkulik[   13.568826] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000100008:48
jkulikthis bug report seems related https://bugs.launchpad.net/tempest/+bug/1888224 which ends in "It looks like rootfs image is corrupted."08:49
jkulikI'm not sure how the test infra works, but if it only happens sporadically - could it be that some workers have a corrupt image and others don't?08:51
fricklerthe cirros image should be baked into the cloud image we boot our test instances from, so if there really is some data corruption happening, it would need to be at a very low level and would likely lead to a more diverse spread of error patterns09:52
bauzaswhen looking at the kernel panics, yeah I'm not sure other OSes would work better10:00
sean-k-mooney[m]frickler: i am thinking it's a general OOM issue which is why i was suggesting reducing the pcie ports we allocate and/or adding swap to the tempest flavor to avoid continually adding more ram to the flavors. say add 64mb of swap in the flavor10:27
sean-k-mooney[m]frickler:  by the way how would you feel about building busybox in statically linked mode so we can avoid dynamically loading these shared objects in general? https://github.com/cirros-dev/cirros/blob/main/conf/busybox.config#L4311:01
jkulikhm ... yeah, could be OOM since the kernel-panic is in `exc_page_fault` ... maybe the kernel tries to load the shared library into memory but doesn't have the space?11:40
sean-k-mooney[m]we have 128mb total, the kernel ramdisk was using 32mb of that and we allocated 24 pcie root devices each of which uses about 1mb of ram so we are using about half our ram just in those two. the initramfs can be unloaded once we complete the switch_root to the root block device but that's what failed. so ya it's not entirely clear if it OOM’d or not partly due to how early this is in boot but it's worth exploring12:01
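(editor's sketch) The arithmetic in the message above can be written out as a few lines of Python. It uses only the figures quoted in the discussion: 128 MB of guest RAM, a 32 MB initramfs, and roughly 1 MB per PCIe root port. The per-port cost is an estimate from the chat, not a measured value, and the names below are made up for illustration.

```python
# Rough early-boot memory budget for the test guest, using the
# approximate figures quoted in the discussion (not measurements).
TOTAL_RAM_MB = 128        # RAM given to the guest by the flavor
INITRAMFS_MB = 32         # initramfs still resident until switch_root completes
MB_PER_ROOT_PORT = 1      # estimated cost of each PCIe root port


def boot_overhead_mb(num_pcie_ports: int) -> int:
    """Return the estimated MB consumed by the initramfs plus root ports."""
    return INITRAMFS_MB + num_pcie_ports * MB_PER_ROOT_PORT


if __name__ == "__main__":
    # Compare the current setting (24) with the proposed range (12-16).
    for ports in (24, 16, 12):
        used = boot_overhead_mb(ports)
        left = TOTAL_RAM_MB - used
        print(f"{ports} root ports: ~{used} MB used, ~{left} MB left for everything else")
```

Under these assumptions, dropping from 24 to 12 ports frees about 12 MB, which is small in absolute terms but close to a tenth of the guest's RAM.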
elodillesbauzas: hi, you've asked me on the meeting to ping you today about the nova stable releases o:) ( https://review.opendev.org/q/project:openstack/releases+is:open+intopic:nova )12:37
elodillesbauzas: let me know if for any reason any of the release patches needs an update (hopefully the version bumps are correct, and hashes should be the latest)12:39
opendevreviewDanylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added.  https://review.opendev.org/c/openstack/nova/+/87607512:40
fricklersean-k-mooney[m]: we can test that, would have to check what the effect on image size looks like13:03
sean-k-mooneyfrickler: ya im not sure if it would go up or down (both on disk and at runtime)13:13
sean-k-mooneyfrickler: are there any videos or blog posts on how cirros is built and its design goals?13:14
sean-k-mooneyi have too many other pulls on my time right now but im kind of interested in either improving cirros to address these problems or replacing it eventually13:15
sean-k-mooneythat's why i was looking at alpine as a replacement, tinycore ala ironic ipa style would be fine too but if we can tweak cirros to make it more stable in this regard and continue to use it then that's fine in my book too13:17
dvo-plvHello, sean-k-mooney Are you here ?13:38
sean-k-mooneyim listenting to an internal call but yep13:39
fricklersean-k-mooney: I'm not aware of any kind of docs except for what is inside the repo itself. if you want to talk to smoser directly, you can find us in #cirros on liberachat. iiuc the design goal is to have a minimal image for testing purposes, which matches pretty much what openstack CI does with it13:40
dvo-plvI currently have time to work on the nova patch13:41
dvo-plvI've resolved some of your comments13:41
dvo-plvbut I have an open question13:41
dvo-plvhttps://review.opendev.org/c/openstack/nova/+/876075/28/nova/virt/libvirt/config.py#181513:41
sean-k-mooneyfrickler: ack i skimmed the docs briefly last night but have not actually looked at how the build system works or the current content of the image in any depth13:44
sean-k-mooneyfrickler: so i was considering trying the static build myself 13:44
sean-k-mooneydvo-plv: so "self.driver_packed is True" is a common python mistake13:46
sean-k-mooneyyou should not use "is" to compare if something is True13:46
sean-k-mooney"is" should only be used to test against singletons like None or to check whether two names refer to the same object13:47
sean-k-mooneyso " self.driver_packed is True or" should just be "self.driver_packed or"13:47
dvo-plvoh, I see what you mean, you would like to rewrite it to the just "if self.driver_packed"13:48
sean-k-mooneyyep13:48
sean-k-mooneyjust remove "is True"13:48
dvo-plvI thought that you wanted some other statement here, okay, I will do it asap13:48
sean-k-mooneycool13:49
sean-k-mooneyi normally give a code example when i ask for changes like this but i didnt last year13:50
sean-k-mooney*night13:50
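(editor's sketch) A small Python illustration of the review comment above. The class is a hypothetical stand-in, not the real nova libvirt config class; only the attribute name driver_packed comes from the discussion. It shows how an identity check against True can silently reject truthy values that a plain truthiness check accepts.

```python
class FakeDiskConfig:
    """Hypothetical stand-in for the config object under review."""

    def __init__(self, driver_packed=None):
        self.driver_packed = driver_packed


def packed_enabled_identity(cfg) -> bool:
    # Discouraged form: only matches the bool singleton True itself.
    return cfg.driver_packed is True


def packed_enabled_truthy(cfg) -> bool:
    # Suggested form: any truthy value counts as enabled.
    return bool(cfg.driver_packed)


if __name__ == "__main__":
    # True behaves the same in both; 1 is truthy but is not the object True.
    for value in (True, 1, None, False):
        cfg = FakeDiskConfig(driver_packed=value)
        print(repr(value), packed_enabled_identity(cfg), packed_enabled_truthy(cfg))
```

Whether a non-bool value can ever reach this attribute depends on the surrounding code, so treat this as a general style argument (PEP 8 also recommends against comparing to True with "is" or "==") rather than a claim about a concrete bug in the patch.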
opendevreviewDanylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added.  https://review.opendev.org/c/openstack/nova/+/87607514:07
dvo-plvit's okay, I just got your comment wrong on my side14:09
opendevreviewElod Illes proposed openstack/nova stable/zed: add a regression test for all compute RPCAPI 6.x pinnings for rebuild  https://review.opendev.org/c/openstack/nova/+/90030714:20
opendevreviewElod Illes proposed openstack/nova stable/zed: Fix rebuild compute RPC API exception for rolling-upgrades  https://review.opendev.org/c/openstack/nova/+/90034114:20
opendevreviewElod Illes proposed openstack/nova stable/zed: Adding server actions tests to grenade-multinode  https://review.opendev.org/c/openstack/nova/+/90034214:20
elodillesbauzas: i've updated the zed version of the 'RPC backports', please review if you want them included in the zed release ^^^14:29
bauzassure14:29
bauzasgibi: do you want https://review.opendev.org/c/openstack/nova/+/901656 to be in the first Bobcat z release ?14:29
gibibauzas: ohh we can try a recheck but I don't want to hold the release14:35
bauzasjust rechecked14:37
gibime too :D14:37
gibithanks for the ping 14:37
bauzasgibi: looks to me like it wasn't due to an ssh issue14:37
bauzasoh, you looked at the other job failure14:38
bauzasmy bad14:38
gibino worries :)14:48
dvo-plvgibi, Maybe you will have a chance to review this one ? 14:56
dvo-plvhttps://review.opendev.org/c/openstack/nova-specs/+/89592414:56
dvo-plvand this14:56
dvo-plvhttps://review.opendev.org/c/openstack/nova/+/87607514:56
*** blarnath is now known as d34dh0r5315:00
bauzasgibi: sean-k-mooney: before I leave, so about https://review.opendev.org/c/openstack/nova/+/902084/116:54
bauzaswe have two possibilities :16:54
bauzas1/ use a star like I did16:54
bauzas2/ let device_addresses be optional16:54
bauzasthis way, 16:55
bauzashttps://paste.opendev.org/show/bbw8uYTcDqasoSSD54WD/ would be acceptable16:55
sean-k-mooneyif we have fixed https://review.opendev.org/c/openstack/nova/+/899406/216:55
sean-k-mooneythen i think it can be optional16:55
bauzasthat's a bit different16:56
sean-k-mooneythat is fixing the fact that if you have just  enabled_mdev_types = nvidia-3516:56
sean-k-mooneydevice_addresses16:56
bauzaspas-ha[m] change is accepting to only use one type but with a section16:56
bauzasyeah16:56
sean-k-mooneyya so16:56
sean-k-mooneywithout that i don't think device_addresses being optional is really a good approach16:57
bauzasso, my WIP (if we make dev_add optional) would change the existing behaviour16:57
sean-k-mooneybut if we fix that then im fine with it16:57
bauzaswhich is that if you use two types but only one got a section, then none of them use any section16:57
bauzasnot sure we could backport that16:58
sean-k-mooneythats a bug right16:58
bauzasnot really16:58
sean-k-mooneyit kind of is16:58
sean-k-mooneyim aware of the legacy behavior for the first type16:58
bauzashttps://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7938-L795116:58
sean-k-mooneybut the second and subsequent types are meant to have a section16:59
bauzasthis conditional requires every type to have its own group16:59
sean-k-mooneyand we should not ignore them if they are defined just because the first type does not have one16:59
bauzassure, I'm just saying changing this is a bit of a behavioural change16:59
bauzaswhich wouldn't be acceptable for a backport16:59
sean-k-mooney so i do think all types should have a section in general17:00
sean-k-mooneyhowever that only makes sense if there is something in it17:00
bauzasyou do or don't ?17:00
sean-k-mooneyi think we should require a section for each mdev type unconditionally17:00
sean-k-mooneypersonally17:00
sean-k-mooneybut whether device_addresses is required or not is a separate question17:01
bauzasso you'd prefer to have https://paste.opendev.org/show/bKc2SjB6l7LILYE0onzG/17:01
sean-k-mooneykind of yes although i would also like to consider if max_instances is required or not17:02
bauzasif so, that simplifies my change and then we can backport it :)17:02
sean-k-mooneyto me https://paste.opendev.org/show/bKc2SjB6l7LILYE0onzG/17:02
sean-k-mooneyis you intentionally not setting device_addresses17:03
bauzasI wouldn't require max_instance or only unless device_addresses isn't set17:03
sean-k-mooneyand as a result opting into the wildcard behavior17:03
bauzassean-k-mooney: correct, you're saying 'this is a default type for all found GPUs'17:03
bauzasokay, then I'll change my current non-uploaded patch and I'll provide it tomorrow17:04
bauzasthanks sean-k-mooney17:04
sean-k-mooneyack 17:04
sean-k-mooneygibi: ^ when you have time does that work for you17:04
bauzassean-k-mooney: fwiw, https://review.opendev.org/c/openstack/nova-specs/+/900636 is open for reviews :)17:04
sean-k-mooneyhehe ya i need to do a review day soon17:05
sean-k-mooneymaybe tomorrow17:05
bauzasyou're lucky17:05
bauzasmy review karma is bad those days17:05
* bauzas raises hand up at some n... company17:05
sean-k-mooneyhttps://www.stackalytics.io/report/contribution?module=nova-group&project_type=openstack&days=30 i mean mine is not much better for the last little bit17:06
gibibauzas: sean-k-mooney: sounds good to me17:07
bauzasthanks gibi17:07
bauzaswill upload the patch quickly then17:07
opendevreviewmelanie witt proposed openstack/nova master: Lower num_pcie_ports to 12 in the nova-next job  https://review.opendev.org/c/openstack/nova/+/90217517:29
melwittsean-k-mooney: from our convo yesterday ^17:30
sean-k-mooneycool let's see if that passes18:06
sean-k-mooneyit would only fail if we need more than 12 pci devices in a tempest test18:06
melwittkk18:07
sean-k-mooneywe use about 4-6 by default and i don't think we are attaching enough volumes or ports to cause an issue18:07
opendevreviewMerged openstack/python-novaclient master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/python-novaclient/+/89995018:26
opendevreviewMerged openstack/nova stable/2023.2: Allow enabling cpu_power_management with 0 dedicated CPUs  https://review.opendev.org/c/openstack/nova/+/90165619:15
melwittargh, it already failed for the non-q35 guest kernel panic https://4a4a1510e17776a8b793-89f5aa2f0368a55b3a90e6b26173438f.ssl.cf2.rackcdn.com/902175/1/check/nova-multi-cell/c2bacb0/testr_results.html19:32
melwittit meaning the patch, unrelated to the change itself19:33
sean-k-mooneythat is a slightly different panic19:35
sean-k-mooneyit panicked at the same phase, switching root19:35
melwittright19:35
sean-k-mooneybut instead of being in a page fault handler it was in the apic timer handler19:35
sean-k-mooneyit still could be OOM related19:36
sean-k-mooneyin that case we also don't have any of the io mapping errors since it's machine_type=pc and using pci not pcie as a result19:37
melwittthat would be my guess19:37
melwittright19:37
sean-k-mooneyso the patch will help with one of the 2 issues but perhaps we should just add 64 mb of swap to the tempest flavors19:38
melwittI've been comparing a non kernel panic console log with a panic one and one difference I see leading up to it is that the non-panic has this line "info: copying initramfs to /dev/vda1" but the panic one does not19:38
sean-k-mooneyhttps://github.com/openstack/devstack/blob/master/lib/tempest#L293-L30619:39
sean-k-mooneyso from here https://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/src/init#L6319:40
melwitthm ok. so maybe I can try adding swap there and recheck a bunch of times to see if panic happens?19:40
melwittyes19:40
melwittso that code block is skipped in the cases where it panics. dunno why19:41
sean-k-mooneyso its failing here https://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/src/init#L8519:41
melwittright19:42
sean-k-mooneythat's after where the copy should print19:42
sean-k-mooneydo you see the GPT header messages in the one that passed19:42
melwittyes19:43
sean-k-mooney[   11.405476] GPT:Primary header thinks Alt. header is not at the end of the disk.19:43
sean-k-mooney[   11.406051] GPT:229375 != 209715119:43
sean-k-mooney[   11.406370] GPT:Alternate GPT header not at the end of the disk.19:43
sean-k-mooney[   11.406743] GPT:229375 != 209715119:43
sean-k-mooney[   11.407018] GPT: Use GNU Parted to correct GPT errors.19:43
sean-k-mooneyok so it's not related to that19:43
sean-k-mooneyif we don't see "info: copying initramfs to /dev/vda1"19:43
sean-k-mooneythat implies $ROOT is undefined19:43
melwittyeah.. I kinda wondered if the lack of "copying initramfs" is related to not being able to load 'libtirpc.so.3'19:44
sean-k-mooneywell if we have not copied the ramdisk to the root file system19:44
sean-k-mooneythen the dynamic loader won't be able to find it19:44
sean-k-mooneyso looking at https://github.com/cirros-dev/cirros/blob/46a1162787f669ad8d6065cb6bbe477654b4327f/src/init#L44-L5219:45
melwitteither $ROOT is undefined or search_for_blank "$rootspec" rw "$NEWROOT_MP" was false19:45
sean-k-mooneyya i think in the failing case we are taking the else branch on line 5019:45
melwittI wonder how we could enable debug logging in there19:46
melwittdebug 1 "did not find a device matching $rootspec" but we don't see it19:46
sean-k-mooneywe likely could do it by passing kernel command line args19:47
sean-k-mooneybut i partly wonder if that would fix it19:47
sean-k-mooneywe would have to change from the full disk image to the one with the split out kernel and initramfs19:47
sean-k-mooneyi know that neutron uses that in one of their jobs19:48
sean-k-mooneyto avoid some kernel panics.19:48
melwittoh huh19:48
sean-k-mooneyspecifically this one with the apic timer19:48
melwittinteresting19:48
sean-k-mooneyany unrecognised parameter on the kernel command line is set as an ENV var in the init process by the kernel 19:49
sean-k-mooneyin case you didn't know that, so if debug is based on an env var you could just add that env var to the kernel command line19:49
melwittso theoretically the num_pcie_ports might fix the nova-next panics and the split disk image might fix the rest ... seems it would be too good to be true19:49
melwittok yeah I didn't know that19:50
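(editor's sketch) A simplified Python model of the behaviour described above, based on the kernel's documented handling of unrecognised parameters: name=value tokens go into init's environment, bare words become arguments to init, dotted names are treated as module parameters, and everything after "--" is passed to init as arguments. The known_to_kernel set and the my_debug and verbose_boot names are hypothetical, and the whitespace split ignores quoting, so this is an illustration rather than a faithful parser.

```python
def split_cmdline_for_init(cmdline: str, known_to_kernel: set) -> tuple:
    """Model which command line tokens would reach init as env vars or args."""
    env = {}
    args = []
    tokens = cmdline.split()  # naive: does not handle quoted values
    for index, token in enumerate(tokens):
        if token == "--":
            # Everything after "--" is handed to init as arguments.
            args.extend(tokens[index + 1:])
            break
        name, sep, value = token.partition("=")
        if name in known_to_kernel or "." in name:
            # Consumed by the kernel itself or treated as a module parameter.
            continue
        if sep:
            env[name] = value      # unrecognised name=value -> init environment
        else:
            args.append(token)     # unrecognised bare word -> init argument
    return env, args


if __name__ == "__main__":
    # "root" and "console" stand in for parameters the kernel recognises.
    env, args = split_cmdline_for_init(
        "root=/dev/vda1 console=ttyS0 my_debug=1 verbose_boot",
        known_to_kernel={"root", "console"},
    )
    print(env, args)
```

If the init script's debug level is driven by an environment variable, this is the mechanism that would let a CI job raise it from the boot command line, which is only possible when the kernel and initramfs are booted separately from the disk image.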
sean-k-mooneywell im 99% sure that the split disk will fix the apic one19:50
sean-k-mooneythe only downside to that is that we lose coverage of testing full disk images if we use that everywhere19:50
sean-k-mooneyill check quickly if i can see where they enable it in their job19:51
melwittk. I wonder if we could get away with using the split image everywhere except nova-next19:52
opendevreviewalisafari proposed openstack/nova master: Fix traits to cpu flags mapping  https://review.opendev.org/c/openstack/nova/+/90218319:53
sean-k-mooneyprobably19:53
melwittI feel like I'm seeing these panics all the time. not sure what changed bc they used to be so rare in the past. maybe OOM like you said or maybe if in the past everything was using the split image, that I don't know19:54
sean-k-mooneyno we were definitely using the whole disk image for several releases19:56
sean-k-mooneyi don't know if we ever used the split image by default19:56
melwittack19:56
sean-k-mooneywe may have19:56
sean-k-mooneyah found it20:10
sean-k-mooneyhttps://github.com/openstack/neutron/commit/e04bd8fbdfa56320d16870b1f294b2cb62b8a82820:10
sean-k-mooneyso if we add20:11
sean-k-mooney   CIRROS_VERSION: 0.6.220:11
sean-k-mooney        DEFAULT_IMAGE_NAME: cirros-0.6.2-x86_64-uec20:11
sean-k-mooney        DEFAULT_IMAGE_FILE_NAME: cirros-0.6.2-x86_64-uec.tar.gz20:11
sean-k-mooneythat should fix it 20:12
sean-k-mooneythe other way to do that is move the job to use nested virt20:12
sean-k-mooneyhttps://review.opendev.org/c/openstack/neutron-tempest-plugin/+/82106720:12
sean-k-mooneymelwitt: what we could do is swap to the uec image for our default jobs20:13
sean-k-mooneyand move nova-next to nested virt with the whole disk image20:14
melwittsounds worth a try20:14
sean-k-mooneyalthough we would want to use the nested virt jammy label in our node set20:15
sean-k-mooneyhttps://opendev.org/openstack/project-config/src/branch/master/nodepool/nl04.opendev.org.yaml#L41-L4220:15
melwittok20:17
sean-k-mooneydo you know how to make those changes. im going to finish for today and get dinner but i can update the jobs tomorrow if you haven't and we can see if it works20:18
melwittI think I can do it given the examples you linked. but yeah sounds good, if I do it wrong feel free to redo it or whatever tomorrow20:19
sean-k-mooneywe in theory have nested virt capability for 2 vexxhost clouds and ovh so i don't really feel as bad about using it in nova-next as i did when it was only one provider20:21
melwittah ok20:22
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging21:09
