Friday, 2021-01-22

*** jamesmcarthur has quit IRC00:01
*** jamesdenton has quit IRC00:02
*** jamesdenton has joined #openstack-infra00:02
*** jamesmcarthur has joined #openstack-infra00:03
*** sshnaidm|ruck is now known as sshnaidm|afk00:08
*** tosky has quit IRC00:09
*** jamesmcarthur has quit IRC00:15
*** jamesmcarthur has joined #openstack-infra00:15
*** jamesmcarthur has quit IRC00:22
*** jamesmcarthur has joined #openstack-infra00:22
*** jamesmcarthur has quit IRC00:27
*** jamesmcarthur has joined #openstack-infra00:28
*** jamesmcarthur has quit IRC00:30
*** rcernin has quit IRC00:32
*** rcernin has joined #openstack-infra00:37
*** jamesmcarthur has joined #openstack-infra00:37
*** jamesmcarthur has quit IRC00:38
*** lbragstad_ has joined #openstack-infra00:38
*** jamesmcarthur has joined #openstack-infra00:38
*** jamesmcarthur has quit IRC00:38
*** lbragstad has quit IRC00:40
*** dychen has joined #openstack-infra01:15
*** dave-mccowan has quit IRC01:15
*** dchen has quit IRC01:17
*** jamesmcarthur has joined #openstack-infra01:22
*** jamesmcarthur has quit IRC01:23
*** jamesmcarthur has joined #openstack-infra01:23
*** dave-mccowan has joined #openstack-infra01:30
*** jamesmcarthur has quit IRC01:34
*** jamesmcarthur has joined #openstack-infra01:35
*** lbragstad_ is now known as lbragstad01:37
*** lbragstad has quit IRC01:46
<openstackgerrit> Merged openstack/project-config master: Set up access for #openinfra channel  https://review.opendev.org/c/openstack/project-config/+/771073  01:57
*** dychen has quit IRC01:59
*** dychen has joined #openstack-infra02:00
*** hamalq has quit IRC02:24
*** dingyichen has joined #openstack-infra02:25
*** dychen has quit IRC02:27
*** jamesmcarthur has quit IRC02:28
*** jamesmcarthur has joined #openstack-infra02:30
*** jamesmcarthur has quit IRC02:35
*** dingyichen has quit IRC02:39
*** dingyichen has joined #openstack-infra02:40
*** ysandeep|away is now known as ysandeep02:43
*** armax has joined #openstack-infra02:50
*** dklyle has quit IRC02:54
*** david-lyle has joined #openstack-infra02:54
*** armax has quit IRC02:56
*** jamesmcarthur has joined #openstack-infra02:58
*** rcernin has quit IRC03:08
*** ianw is now known as ianw_pto03:12
*** armax has joined #openstack-infra03:17
*** lbragstad has joined #openstack-infra03:25
*** armax has quit IRC03:31
*** rcernin has joined #openstack-infra03:33
*** gyee has quit IRC03:36
*** david-lyle has quit IRC03:38
*** rcernin has quit IRC03:48
*** armax has joined #openstack-infra03:59
*** jamesmcarthur has quit IRC04:03
*** jamesmcarthur has joined #openstack-infra04:03
*** rcernin has joined #openstack-infra04:04
*** rcernin has quit IRC04:05
*** rcernin has joined #openstack-infra04:05
*** Ajohn has joined #openstack-infra04:14
*** armax has quit IRC04:22
*** vishalmanchanda has joined #openstack-infra04:47
*** Ajohn has quit IRC04:55
*** Ajohn has joined #openstack-infra04:58
*** ykarel has joined #openstack-infra04:59
*** lbragstad has quit IRC05:11
*** psachin has joined #openstack-infra05:22
*** Ajohn has quit IRC05:41
*** jamesmcarthur has quit IRC05:51
*** jamesmcarthur has joined #openstack-infra05:52
*** jamesmcarthur has quit IRC05:57
*** Ajohn has joined #openstack-infra05:57
*** matt_kosut has joined #openstack-infra05:59
*** jamesmcarthur has joined #openstack-infra06:01
*** jamesmcarthur has quit IRC06:01
*** Ajohn has quit IRC06:02
*** Ajohn has joined #openstack-infra06:03
*** Ajohn has quit IRC06:23
<openstackgerrit> Rico Lin proposed openstack/project-config master: Mark min-ready for ubuntu-focal-arm64  https://review.opendev.org/c/openstack/project-config/+/771912  06:30
*** sboyron has joined #openstack-infra06:32
*** rcernin has quit IRC07:25
*** rpittau|afk is now known as rpittau07:34
*** rcernin has joined #openstack-infra07:37
*** jcapitao has joined #openstack-infra07:37
*** hashar has joined #openstack-infra07:41
*** rcernin has quit IRC07:42
*** ralonsoh has joined #openstack-infra07:48
*** rcernin has joined #openstack-infra07:50
*** eolivare has joined #openstack-infra07:50
*** yamamoto has quit IRC07:53
*** yamamoto has joined #openstack-infra07:54
*** slaweq has joined #openstack-infra07:54
*** rcernin has quit IRC07:55
*** ysandeep is now known as ysandeep|lunch08:00
*** jamesmcarthur has joined #openstack-infra08:02
*** yamamoto has quit IRC08:04
*** jamesmcarthur has quit IRC08:06
*** rcernin has joined #openstack-infra08:08
*** mgoddard has quit IRC08:11
*** mgoddard has joined #openstack-infra08:11
*** andrewbonney has joined #openstack-infra08:13
*** rcernin has quit IRC08:13
*** rcernin has joined #openstack-infra08:14
*** amoralej|off is now known as amoralej08:17
*** rcernin has quit IRC08:19
*** rcernin has joined #openstack-infra08:26
*** rcernin has quit IRC08:31
*** xek_ has joined #openstack-infra08:34
*** yamamoto has joined #openstack-infra08:36
*** dingyichen has quit IRC08:41
*** rcernin has joined #openstack-infra08:44
*** tosky has joined #openstack-infra08:44
*** rcernin has quit IRC08:49
*** jamesdenton has quit IRC08:49
*** jamesdenton has joined #openstack-infra08:49
*** yamamoto has quit IRC08:51
*** gfidente|afk is now known as gfidente08:52
*** jpena|off is now known as jpena08:57
*** nightmare_unreal has joined #openstack-infra08:58
*** ociuhandu has joined #openstack-infra09:02
*** lucasagomes has joined #openstack-infra09:02
*** rcernin has joined #openstack-infra09:08
*** jamesmcarthur has joined #openstack-infra09:13
*** rcernin has quit IRC09:13
*** zxiiro has quit IRC09:14
*** PrinzElvis has quit IRC09:14
*** masayukig has quit IRC09:14
*** zigo has quit IRC09:14
*** sorrison has quit IRC09:14
*** zxiiro has joined #openstack-infra09:15
*** PrinzElvis has joined #openstack-infra09:15
*** masayukig has joined #openstack-infra09:15
*** zigo has joined #openstack-infra09:15
*** sorrison has joined #openstack-infra09:15
*** wolsen has quit IRC09:18
*** jamesmcarthur has quit IRC09:20
*** jamesmcarthur has joined #openstack-infra09:21
*** mordred has quit IRC09:21
*** JanZerebecki[m] has quit IRC09:21
*** rcernin has joined #openstack-infra09:25
*** jamesmcarthur has quit IRC09:26
*** psachin has quit IRC09:26
*** rcernin has quit IRC09:30
*** rcernin has joined #openstack-infra09:32
*** ysandeep|lunch is now known as ysandeep09:33
<openstackgerrit> Pranali Deore proposed openstack/project-config master: Add official-openstack-repo-jobs for openstack/glance-tempest-plugin  https://review.opendev.org/c/openstack/project-config/+/771954  09:39
*** derekh has joined #openstack-infra09:48
*** dtantsur|afk is now known as dtantsur09:59
*** yamamoto has joined #openstack-infra10:05
*** yamamoto has quit IRC10:09
*** wolsen has joined #openstack-infra10:10
*** yamamoto has joined #openstack-infra10:16
*** yamamoto has quit IRC10:16
*** yamamoto has joined #openstack-infra10:16
*** yamamoto has quit IRC10:20
*** yamamoto has joined #openstack-infra10:20
*** yamamoto has quit IRC10:20
*** yamamoto has joined #openstack-infra10:21
*** yamamoto has quit IRC10:22
*** yamamoto has joined #openstack-infra10:22
*** yamamoto has quit IRC10:22
*** yamamoto has joined #openstack-infra10:23
*** yamamoto has quit IRC10:27
*** yonglihe has quit IRC10:29
*** JanZerebecki[m] has joined #openstack-infra10:43
*** mordred has joined #openstack-infra10:43
*** systemc is now known as systemb10:44
*** hashar is now known as hasharAway10:44
*** ykarel_ has joined #openstack-infra11:08
*** ykarel has quit IRC11:08
*** ykarel__ has joined #openstack-infra11:16
*** ykarel_ has quit IRC11:19
*** jcapitao is now known as jcapitao_lunch11:23
*** mgoddard has quit IRC11:30
*** rcernin has quit IRC11:31
*** rcernin has joined #openstack-infra11:38
*** rcernin has quit IRC12:00
*** ysandeep is now known as ysandeep|afk12:15
*** rlandy has joined #openstack-infra12:25
*** rcernin has joined #openstack-infra12:27
*** iurygregory_ has joined #openstack-infra12:28
*** iurygregory has quit IRC12:28
*** iurygregory_ is now known as iurygregory12:29
*** jpena is now known as jpena|lunch12:35
*** yamamoto has joined #openstack-infra12:44
*** rcernin has quit IRC12:48
*** yamamoto has quit IRC12:49
*** hasharAway is now known as hashar12:52
*** AJaeger has joined #openstack-infra12:54
*** ociuhandu has quit IRC12:55
*** ysandeep|afk is now known as ysandeep12:56
*** ociuhandu has joined #openstack-infra12:56
*** ociuhandu has quit IRC12:58
*** ociuhandu has joined #openstack-infra12:59
*** jcapitao_lunch is now known as jcapitao13:01
*** ociuhandu has quit IRC13:02
*** ociuhandu has joined #openstack-infra13:02
*** ociuhandu has quit IRC13:02
*** ociuhandu has joined #openstack-infra13:03
*** ociuhandu has quit IRC13:15
*** tkajinam_ has quit IRC13:16
*** sboyron has quit IRC13:16
*** sboyron has joined #openstack-infra13:27
*** mgoddard has joined #openstack-infra13:32
*** jpena|lunch is now known as jpena13:34
*** hemna has quit IRC13:39
*** ociuhandu has joined #openstack-infra13:39
*** lbragstad has joined #openstack-infra13:42
*** hemna has joined #openstack-infra13:44
*** ykarel__ is now known as ykarel13:44
*** ociuhandu has quit IRC13:49
*** ociuhandu has joined #openstack-infra13:50
*** AJaeger has quit IRC13:53
*** ysandeep is now known as ysandeep|away14:06
*** amoralej is now known as amoralej|lunch14:08
*** hemna has quit IRC14:12
*** ociuhandu has quit IRC14:20
*** jamesdenton has quit IRC14:20
*** kashyap has joined #openstack-infra14:23
*** hemna has joined #openstack-infra14:24
*** amoralej|lunch is now known as amoralej14:43
*** jamesdenton has joined #openstack-infra14:45
*** jamesmcarthur has joined #openstack-infra14:48
*** ociuhandu has joined #openstack-infra14:49
*** rpittau is now known as rpittau|afk14:58
*** mugsie has joined #openstack-infra15:09
<zbr> infra-core: who can help me do a git-review release? i managed to clean the backlog.  15:09
<zbr> i do use git-review from master branch, and last release was in 2019  15:13
<zbr> probably we want to name this 2.0? due to dropping support for py27?  15:13
*** vishalmanchanda has quit IRC15:15
*** armax has joined #openstack-infra15:17
*** mugsie has quit IRC15:18
*** zbr3 has joined #openstack-infra15:22
<fungi> zbr: i can do it today, i also need to cut a release for bindep  15:23
<fungi> and yeah, i might make two git-review releases for the py27 drop depending on what order it falls in the history. would be nice to tag the last py27-supporting version and then do the major version bump with the py27 drop  15:23
<fungi> i'll try to look once i finish catching up on channel backlog and mailing lists  15:24
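The tag-then-bump flow fungi outlines could look roughly like the sketch below; the version numbers, the "gerrit" remote name, and the placeholder commit are illustrative assumptions, not what was actually released:
    # find the last commit that still supports py27 and tag it as the final 1.x
    git log --oneline                      # identify <last-py27-commit>
    git tag -s 1.x.y <last-py27-commit>    # hypothetical final py27-capable release
    # then tag the py27-drop point as the major version bump
    git tag -s 2.0.0 origin/master
    git push gerrit 1.x.y 2.0.0            # pushing signed tags typically kicks off the release jobs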
*** zbr has quit IRC15:24
*** zbr3 is now known as zbr15:24
<zbr> fungi: cool, if you can also do some testing it would be cool. like using it from master ;)  15:26
<zbr> i have no pressure on that, master works fine for me.  15:27
<fungi> yeah, i do tend to use git-review from master but i don't refresh it as often as i should, thanks for the reminder!  15:28
<zbr> do a --help after you do, you will see a cool feature mentioned at the end.  15:32
<kashyap> fungi: Heya.  When you get a minute, a small topic to discuss on adding a custom disk image for testing:  15:32
<zbr> using username instead of name for branch names.  15:32
*** armax has quit IRC15:32
<kashyap> fungi: So, we're trying to add support for secure boot in Nova; for that we'd need a disk image in the gate that has an EFI partition.  15:33
<kashyap> fungi: Now, none of the default images, nor any cloud images distributed by distro vendors, have an EFI partition  15:33
<fungi> kashyap: i think our amd64 nodes already (have to) boot via uefi  15:33
<fungi> er, sorry, i meant arm64  15:34
<kashyap> fungi: I see; I'm looking for instance booting  15:34
<fungi> aarch64  15:34
<kashyap> (Right; I've had some devel boards in the distant past)  15:34
<kashyap> fungi: So, in a disk image I should see something like this: http://paste.openstack.org/show/801871/  15:34
<kashyap> fungi: For now, my testing has been by creating custom images from a distro install tree, like this:  15:34
<kashyap> https://kashyapc.fedorapeople.org/Create-a-SecureBoot-enabled-VM.bash  15:35
<fungi> er, to be clear, you want custom test nodes booted with a uefi partition? or you're nesting instances and want an image file booted in devstack in a job or something like that?  15:35
<kashyap> So one idea I'm exploring is this:  15:35
<kashyap> fungi: My bad, I didn't explain well.  Let me retry :-)  15:35
<kashyap> fungi: The goal is to be able to test a VM with OVMF ("UEFI for VMs") in DevStack in a job.  The physical host _itself_ doesn't have to have a UEFI partition  15:36
<fungi> okay, and by "physical host" you mean the virtual machine instance we boot with nodepool to run the jobs on top of  15:37
<kashyap> Right  15:37
<kashyap> I see that the infra uses nesting under the hood  15:37
<fungi> (because there's also an actual physical bare metal host under that which the cloud provider controls)  15:37
<kashyap> Because the "hosts" the Infra gets are essentially level-1 VMs -- right?  15:37
<fungi> right, we get accounts at public cloud providers and boot nova instances with images we build in diskimage-builder  15:38
<kashyap> Right  15:38
<fungi> and ssh into those with ansible to start jobs  15:38
<kashyap> So do you have a suggestion on how best to go forward here?  Another idea that I'm exploring is:  15:38
<kashyap> Make a disk image with a real Fedora grub and kernel, and a custom tiny initrd that just prints the secure boot status.  That image will be <10MB, and doesn't require frequent "updating".  15:39
<fungi> yeah, so within a job you want devstack to download this image and boot it and verify it booted, basically?  15:41
<fungi> seems like that test could be worked into devstack/tempest and just included in whatever other tests are run in one of nova's existing ci jobs  15:42
<kashyap> fungi: Boot it based on a Nova config that enables the secure-bootability -- by picking the right OVMF binary, etc.  15:42
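In practice the request kashyap describes would most likely be expressed through image properties; the property names below (hw_firmware_type, hw_machine_type, os_secure_boot) are assumptions about what the in-progress Nova work settles on, not anything confirmed in this discussion:
    # upload the UEFI/secure-boot test image with the relevant properties
    openstack image create sb-test --file sb-test.img --disk-format qcow2 \
        --property hw_firmware_type=uefi \
        --property hw_machine_type=q35 \
        --property os_secure_boot=required
    # boot a guest from it; nova/libvirt would then pick a secure-boot-capable OVMF build
    openstack server create --image sb-test --flavor m1.tiny sb-guest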
<kashyap> fungi: Where can I host this image?  15:43
<kashyap> s/image/template image/  15:43
<fungi> and i agree, one of the first steps would be finding or making the image  15:43
*** dklyle has joined #openstack-infra15:44
<kashyap> Yes, how about this: I'll prepare the image, and then write an email to the list w/ some details?  15:44
<kashyap> (So others who are interested in the topic can follow along, too.)  15:44
<fungi> you say you've been unable to find a reliable source for a small uefi test image?  15:44
<kashyap> fungi: Yeah, for example:  15:45
<kashyap> Fedora's cloud images don't have any EFI partitions, and I was told "cloud images are not intended to do EFI"  15:45
<kashyap> fungi: And I haven't explored what Ubuntu / Debian offer  15:46
<frickler> or create a small project that creates such an image and has a job that uploads it as an artefact to some of our sites? (not sure tarballs.o.o would be the right one)  15:47
<fungi> yeah, even uefi.org seems to recommend fat (ubuntu live) images for validation testing  15:47
*** armax has joined #openstack-infra15:47
<kashyap> Yeah, that's too much  15:47
<kashyap> fungi: Yeah, we've had various ad-hoc scripts for testing the QEMU bits, etc.  Perhaps should create a tester project  15:47
<fungi> whoever said "cloud images are not intended to do EFI" also probably thinks arm hardware is not used for making clouds  15:48
<kashyap> Heh  15:48
<kashyap> frickler: fungi: In fact, we did this in the past when testing OVMF / QEMU stuff: https://github.com/puiterwijk/qemu-secureboot-tester/blob/master/sbtest  15:49
<kashyap> (Warning, "tall" script ... but I'll work through to see how to make use of it in this context)  15:50
*** mugsie has joined #openstack-infra15:51
<fungi> given the simplicity of uefi, i wonder if it wouldn't be simpler to just make a uefi image generator which devstack can run... would probably take all of a few seconds during the job if you really just want something which will boot and echo a string, doesn't need a kernel/userspace  15:51
<kashyap> Thanks, both for the input.  I still have to work out some Nova code.  15:51
<kashyap> fungi: Heh, "simplicity"  15:51
<fungi> of course if you want to test through booting a signed kernel and then do userspace attestation, you'll need more than just that  15:52
<kashyap> Whoa ... no need for "attestation"-level stuff  15:52
<kashyap> fungi: All we want to validate is:  15:52
<kashyap> - Has the VM booted with the right OVMF binaries for secure boot (there's a whole bunch of them!)  15:53
<kashyap> - And has the guest kernel emitted "Secure Boot is in effect" or whatever is the latest message  15:53
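For the second check, a guest can report its own secure boot state with standard tooling; a minimal sketch (mokutil may be absent from a tiny image, and the efivars path is the standard EFI global-variable location):
    # the kernel log message kashyap refers to
    dmesg | grep -i 'secure boot'
    # or query the SecureBoot EFI variable directly; the last byte printed is 1 when enabled
    mokutil --sb-state 2>/dev/null || \
        od -An -tu1 /sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c | \
        awk '{print ($NF == 1) ? "SecureBoot enabled" : "SecureBoot disabled"}'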
<fungi> okay, but you do need a kernel of some sort, so just booting into the uefi manager and echoing something isn't enough  15:54
<fungi> though you could likely boot something small like syslinux  15:54
<fungi> (i think the current debian installer chains uefi to syslinux, as an example)  15:55
<kashyap> fungi: Yes, yes; a proper kernel is needed, definitely  15:55
<kashyap> fungi: So, I know you're curious of low-level bits ... just to show the terrible messiness involved here:  15:55
<kashyap> fungi: There are sooooooooooo many OVMF binary names, and each distro has their _own_ naming scheme!  Thankfully, QEMU solved this problem by creating a firmware "schema" that all distros can use  15:56
<kashyap> And we've slowly got that work done in distros over time.  15:56
<kashyap> Check this out (resolved): https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=932269  15:57
<openstack> Debian bug 932269 in ovmf "Ship the firmware "descriptor files" as part of the 'ovmf' package" [Normal,Fixed]  15:57
<fungi> oh yikes  15:58
<fungi> so, yeah, i see two possible approaches:  15:58
<kashyap> Yeah ...  I did the packaging work in Fedora, and now Debian, Ubuntu, and Fedora ship these files  15:58
<kashyap> fungi: Actually this all makes life _much_ easier.  Why?  15:58
<fungi> 1. have devstack make a uefi test image on the fly and use that (if it's reasonably self-contained, simple-ish, and only takes a few seconds to run)  15:59
<kashyap> (libvirt has done enough work to take advantage of these JSON files, and will just do the right thing -- you only have to tell libvirt to boot with 'efi')  15:59
* kashyap listens  15:59
<kashyap> fungi: On (1) -- is there a prior example in-tree?  16:00
<clarkb> crazy idea: update cirros to do both legacy bios and uefi then use that image for all the things  16:00
<clarkb> or cirros standin  16:00
<fungi> 2. write a python tool to create a uefi test image and run a ci job to use it to publish the image to somewhere like tarballs.opendev.org/openstack/uefi-test/uefi-test_2021-01-22.img and tell devstack to grab from there  16:00
* kashyap taps on the table and thinks  16:01
<fungi> yeah, 3. fork/take over cirros maintenance and make it support uefi would be awesome but...  16:01
<kashyap> clarkb: Heh; doesn't sound that crazy -- but no, I just don't have time to maintain that  16:01
<clarkb> I know sean was looking at building a small image with dib to replace cirros (alpine based maybe?) and I think frickler has helped smoser a bit in the past. They may have ideas on #3  16:02
<frickler> actually I think hrw made cirros work on arm, so it should have at least some kind of efi support I guess  16:02
<fungi> #2 would be the "this is a little too complex to fit in devstack and/or runs too long to just do within the job"  16:02
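A minimal sketch of what option 1 (or the generator behind option 2) might do, assuming mtools and an EDK2 UEFI shell binary are available; the Shell.efi path varies by distro and is a guess here, and a real secure boot test would need a signed shim/grub/kernel rather than the unsigned shell:
    truncate -s 16M uefi-test.img
    mkfs.vfat uefi-test.img                      # the whole image is a single FAT ESP
    mmd   -i uefi-test.img ::/EFI ::/EFI/BOOT
    mcopy -i uefi-test.img /usr/share/edk2/ovmf/Shell.efi ::/EFI/BOOT/BOOTX64.EFI
    printf 'echo "uefi boot ok"\nreset -s\n' > startup.nsh
    mcopy -i uefi-test.img startup.nsh ::/       # the UEFI shell auto-runs startup.nsh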
<kashyap> fungi: Yeah; I see what you mean  16:03
<kashyap> clarkb: Oh, a complication for CirrOS: it doesn't ship the JSON "firmware descriptor files" (see above Debian RFC)  16:03
<kashyap> fungi: Good thing is: it should be quicker now, because, Debian, Ubuntu, and Fedora -- all should ship the firmware descriptor files, with pre-made OVMF "vars" files  16:04
<clarkb> kashyap: well we would be modifying cirros either way? I assume any problems like that can be sorted out too  16:04
<kashyap> (Sorry for the terminology)  16:04
<clarkb> but I'm not super familiar with cirros' dev process, frickler would definitely be a better input on that  16:04
<kashyap> clarkb: Would we?  I lost track :-)  16:04
<fungi> kashyap: to answer your earlier question about publication, we run any services which allow someone to just stick a file somewhere, but we do run services which can be used to publish ci build artifacts (ephemerally or durably), or to upload to other services which will host them for you  16:05
<fungi> er, i tried to say "we DON'T run any services which allow someone to just stick a file somewhere..."  16:05
<kashyap> fungi: Okay; I'll come back with some more details next week, after some tinkering.  And discuss more concretely, instead of waving my hands.  16:05
<kashyap> fungi: Fair enough  16:06
<kashyap> fungi: By "other services", you mean non-OpenStack infra ones?  16:06
<fungi> yeah  16:06
<frickler> there's also #cirros if you want to discuss modifications or shortcomings  16:06
<fungi> or, terrible idea here, you could also just stick the image in a git repo, it's sort of a cheat, but if it's not huge and is the only thing in the repo and doesn't update often/ever...  16:06
<kashyap> Alright; got it.  I'll tinker some more and see what's the least invasive that I can come up w/ for our purposes here.  16:07
<kashyap> fungi: 10MB -- you can forgive that, right, for sticking in a Git repo? :-)  16:07
<kashyap> (I'm not sure yet; 10MB is a guesstimate)  16:07
<fungi> we do have git repos hosting things like video files to embed in web pages we publish, but yeah it's not the best idea  16:07
<kashyap> And no; it doesn't update often; let alone once every 8 months.  16:07
<kashyap> Okay; thanks for the useful discussion, folks.  I'll come back w/ something more concrete.  16:08
*** lpetrut has joined #openstack-infra16:10
<fungi> any time!  16:10
*** zul has quit IRC16:11
*** lpetrut has quit IRC16:14
*** ociuhandu has quit IRC16:18
*** ociuhandu has joined #openstack-infra16:19
*** ociuhandu has quit IRC16:24
*** __ministry1 has joined #openstack-infra16:30
*** __ministry1 has quit IRC16:30
*** ociuhandu has joined #openstack-infra16:38
<zbr> fungi: if you are pleased with what i did with git-review, i could also try to take care of bindep, https://review.opendev.org/q/project:opendev%252Fbindep+status:open  16:39
<zbr> somehow i have the impression you will never find time to rework your 3y+ incomplete patches.  16:40
*** ociuhandu has quit IRC16:42
*** ykarel has quit IRC16:47
*** ociuhandu has joined #openstack-infra16:47
*** matt_kosut has quit IRC16:56
*** lucasagomes has quit IRC17:08
*** zzzeek has quit IRC17:12
*** zzzeek has joined #openstack-infra17:13
*** jamesmcarthur has quit IRC17:17
*** eolivare has quit IRC17:18
*** jamesmcarthur has joined #openstack-infra17:18
*** ociuhandu_ has joined #openstack-infra17:22
*** armax has quit IRC17:22
*** jamesmcarthur has quit IRC17:23
*** hashar has quit IRC17:25
*** ociuhandu has quit IRC17:26
*** armax has joined #openstack-infra17:26
*** ociuhandu_ has quit IRC17:27
*** jamesmcarthur has joined #openstack-infra17:29
*** jamesmcarthur has quit IRC17:34
*** amoralej is now known as amoralej|off17:39
*** ociuhandu has joined #openstack-infra17:47
*** jamesmcarthur has joined #openstack-infra17:50
*** gyee has joined #openstack-infra17:51
*** ralonsoh has quit IRC17:51
*** ociuhandu has quit IRC17:53
*** gfidente has quit IRC17:55
*** jpena is now known as jpena|off17:55
*** jcapitao has quit IRC17:59
*** derekh has quit IRC18:01
*** jamesmcarthur has quit IRC18:10
*** slaweq has quit IRC18:12
*** lbragstad has quit IRC18:16
*** dtantsur is now known as dtantsur|afk18:17
*** lbragstad has joined #openstack-infra18:22
*** jamesmcarthur has joined #openstack-infra18:25
*** nightmare_unreal has quit IRC18:25
*** ramishra has quit IRC18:38
*** andrewbonney has quit IRC18:56
*** matt_kosut has joined #openstack-infra18:57
*** matt_kosut has quit IRC19:02
*** jamesmcarthur has quit IRC19:28
*** ThePherm has joined #openstack-infra19:33
*** jamesmcarthur has joined #openstack-infra19:43
*** jamesdenton has quit IRC19:43
*** jamesdenton has joined #openstack-infra19:43
*** slaweq has joined #openstack-infra19:46
*** jamesmcarthur has quit IRC19:47
<dansmith> fungi: you mean tempest would ask nova to create an instance and then when it deleted it, the instance seemingly didn't get deleted (because of arp indications)?  19:59
*** jamesmcarthur has joined #openstack-infra20:00
*** slaweq has quit IRC20:01
<fungi> dansmith: nope, nodepool would ask the cloud provider to boot an instance and then the address it gets assigned happens to already be in use by a vm elsewhere on the lan, typically it was an existing server instance at some point but nova no longer had record of it, and the provider ends up running virsh across their hosts to find it  20:01
<dansmith> ah  20:01
<dansmith> that would seem like a huge problem for them  20:01
<dansmith> unrelated to us  20:01
<fungi> well, *maybe* unrelated. could also be a bug in nova/neutron/something we've provided them ;)  20:02
<dansmith> no,  20:02
<dansmith> I meant unrelated to the aggravation it causes us as a "customer",  20:02
<fungi> oh, yep  20:02
<dansmith> I would think that would be a problem for lots of customers and thus a real problem for them  20:02
<dansmith> you say virsh, so does that mean they end up actually finding libvirt domains that never went away?  20:03
<dansmith> because if they have logs, we could certainly look at them, but I've really never heard of that happening  20:03
<fungi> where it winds up impacting jobs is that sometimes the new node will initially "win" in the arp fight with the router and zuul will be able to ssh into it, but then at some point it loses the battle in an arp cache refresh and suddenly zuul experiences a connection timeout, connection refused, or ssh host key changed error and fails or retries the job  20:03
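The symptom fungi describes can be confirmed from any host on the affected network segment with ordinary tooling; a hedged example (interface name and address are placeholders):
    # duplicate address detection: any reply means another machine is already answering for the IP
    arping -D -c 3 -I eth0 203.0.113.45
    # watch whether the MAC behind the address flips back and forth over time
    ip neigh show 203.0.113.45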
<dansmith> now, asking to delete an instance and nova not being able to do it? sure, but.. not acting like it's gone and it's now  20:04
<dansmith> *not  20:04
<dansmith> sure, that's a huge problem  20:04
<fungi> how different providers deal with it varies, and they're not all so forthcoming with details. i've heard at least one provider say they ran virsh on all their hosts periodically to find virtual machines that nova had "forgotten about" so they could be cleaned up  20:05
<fungi> but this is all second-hand  20:05
<dansmith> hmm, I'm skeptical :)  20:05
<dansmith> nova even periodically reaps instances that might've been deleted when it was offline, if properly configured  20:05
<fungi> it's just as possible some overzealous cleanup script is muddling things behind nova's back  20:06
<fungi> i'm not really privy to what goes on in anyone's networks, just a user  20:06
<dansmith> yeah I mean there's lots of custom stuff that could be getting in the way  20:06
<fungi> i wouldn't be surprised to learn that it's either a bug in some openstack software (maybe in a very old release they're still running) or the side effects of custom attempts to work around yet another bug of some sort  20:07
<fungi> but possibilities are endless  20:08
<dansmith> I mean, I'm not at all saying it's not a bug in nova,  20:09
<dansmith> I'm just saying that if we had this problem, I would think people would be raising hell about it because it's such a big deal  20:09
<dansmith> so it makes me wonder if people have local hacks, some other buggy network hook thing, etc  20:10
<fungi> i can say that we see evidence of it in varying degrees across most of our cloud donors from time to time  20:10
<dansmith> mnaser: you see this? ^  20:11
<fungi> usually it's background noise/minor annoyance for us, sometimes it's debilitating and we temporarily stop using that provider and/or give them a list of addresses we saw impacted so they can clean them up  20:11
<dansmith> fungi: if they will file bugs (or if there have been some already) then please encourage them to speak up about them next time you hear it  20:11
<fungi> i don't recall if we've seen it happen in vexxhost  20:11
<fungi> though in some cases it seems to be coupled with system outages/restarts where something may have gotten out of sync or maybe a database got rolled back  20:12
<dansmith> yeah, so,  20:12
<fungi> in very large providers, that may be a constant occurrence, i suppose  20:12
<dansmith> if they don't have nova set to reap unknown VMs (maybe because they're cheating and running their own under the covers) then a delete while the system is not healthy could leave one running  20:13
<dansmith> but that's the point of that option  20:13
<fungi> right, and whether or not a particular donor takes advantage of that feature we generally won't know. it may also be racy and we're hitting the problem between whatever points in time nova rediscovers rogue virtual machines  20:14
<dansmith> it's possible, but again that case has to involve a compute node going offline (via rabbit) for more than a $service_timeout period right before you tried to delete your instance  20:15
<dansmith> and actually, it might even require them to have archived their DB in there, I forget all the semantics of that check  20:15
<clarkb> fungi: did this come up again because we've seen errors in inap after reenabling it?  20:15
<fungi> the varying degrees to which we see it in different providers could be a mix of the stability of their infrastructure, whether they're relying on that feature, and what sort of frequency it's checking  20:15
<fungi> clarkb: not entirely, i was explaining in #openstack-tc why we had disabled it previously  20:16
<dansmith> fungi: maybe but as above, some real badness has to happen to even get into a situation where nova needs to reap one of its own  20:16
<fungi> i don't know if the problem there has resurfaced since readding them yesterday  20:16
<dansmith> inap was on super old cellsv1 for quite a while.. do we know what they run now?  20:16
<fungi> real badness is the beef stew on which cloud operators dine daily ;)  20:17
<dansmith> because cellsv1 had all kinds of problems with stuff like that because of all the forwarding  20:17
<clarkb> fungi: got it  20:17
<fungi> dansmith: i have no idea if they've upgraded, but it could explain why it's been more of a problem there than elsewhere  20:17
<dansmith> yeah for sure  20:17
<fungi> the second highest incidence of it has tended to be in rackspace, though seems less common lately. we've also seen it come and go in ovh from time to time  20:18
<dansmith> I also don't know if rax ever moved :)  20:18
<fungi> this sounds like a potential correlation ;)  20:19
<dansmith> not sure about ovh, but I kinda expect they are more current  20:19
<dansmith> I thought rax was mostly frozen in time from the sale, which was cellsv1 IIRC  20:19
*** auristor has quit IRC20:20
*** auristor has joined #openstack-infra20:23
<melwitt> there was/is a patch proposed to reap unknown VMs but we nacked it because archive --before could avoid that state  20:30
*** jamesmcarthur has quit IRC20:34
*** jamesmcarthur has joined #openstack-infra20:34
*** thogarre has joined #openstack-infra20:35
<dansmith> melwitt: isn't there a setting for that reap periodic that will nuke non-instance vms?  20:35
<melwitt> no there isn't. that's what the person wanted to add  20:35
<melwitt> well, sorry. they wanted to nuke instance (owned by nova) vms that were no longer in the database  20:36
<melwitt> but yeah we don't have any options for destroying guest vms that are unknown to nova  20:37
<dansmith> hmm, I really thought there was  20:37
<dansmith> not really sure why we have the option to turn off the reaping then, if it's only things we know used to be ours  20:38
<melwitt> this was the patch I'm thinking of https://review.opendev.org/c/openstack/nova/+/627765  20:39
<melwitt> I don't know the historical reasons to not reap but there are other options like 'log' and 'shutdown'. they look to be options for if the operator wanted to debug or otherwise do forensic stuff  20:40
<dansmith> I distinctly remember some issue with nova reaping vms that weren't nova instances on hosts, from like 2012 (an IBM internal customer)  20:40
<dansmith> but a lot has changed since then, including all that power state goo from the AT&T people  20:41
<melwitt> https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L1323  20:41
<melwitt> default is reap  20:41
*** dciabrin__ has joined #openstack-infra20:41
<dansmith> yeah, the comment even says that "log is what you want for production" which I don't think is legit :)  20:41
<dansmith> yeah, I'm looking at it  20:41
<melwitt> oh, yeah I wasn't aware of the 2012 issue of reaping vms that weren't nova instances. maybe that was "fixed" since  20:42
<dansmith> well, a ton of that periodic stuff that tries to correct power state (to the chagrin of the ironic people) was all around/after that point  20:42
<dansmith> so .. yeah, I wonder if they're not running reap, but should (or at least shutdown)  20:43
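The knobs being discussed live in nova's compute options ([DEFAULT] section); a sketch of setting them explicitly, with the values shown being the upstream defaults rather than a recommendation for any particular deployment:
    crudini --set /etc/nova/nova.conf DEFAULT running_deleted_instance_action reap
    crudini --set /etc/nova/nova.conf DEFAULT running_deleted_instance_poll_interval 1800
    crudini --set /etc/nova/nova.conf DEFAULT running_deleted_instance_timeout 0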
*** dciabrin_ has quit IRC20:43
<dansmith> but even still, this would only be a thing if they're local deleting from an api node  20:43
<melwitt> yeah ... I dunno, as far as I've known, it's only reaped instances in the database and has caused problems for those who happened to have an archive cron run while any computes were "down"  20:43
<dansmith> without logs it's really impossible to tell  20:44
<melwitt> right  20:44
<melwitt> so if internally a provider cloud had a network partition or some other issue to a compute when a delete was requested, it would do the local delete  20:44
<melwitt> and then if archive ran before the compute came back to being accessible, this would happen  20:45
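The archive step in question is nova-manage's row archiving; a hedged example of the sort of cron invocation melwitt means, where --before keeps recently-deleted rows (and thus instances whose compute host was only briefly unreachable) out of the archive:
    nova-manage db archive_deleted_rows --until-complete --verbose \
        --before "$(date -d '30 days ago' +%Y-%m-%d)"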
<fungi> also it may impact us more by virtue of the fact that we constantly boot and delete instances in tight loops... if their typical users treat the service more like a vps and boot new instances a few times a month and keep them for years, they're unlikely to be severely impacted  20:45
<dansmith> melwitt: right, it's a lot of hoops to jump through for a thing that seems to be pervasive  20:45
<fungi> so it's just those crazy openstack testing people who are complaining about it ;)  20:46
<melwitt> fungi: yeah, I think so. internally the most I've seen this is on an internal CI system, they call them "zombie vms"  20:46
<dansmith> melwitt: but, I can totally see cellsv1 database replication over rabbit leaving residue that causes the compute to never delete it  20:46
<dansmith> melwitt: because a delete is always local at the top cell anyway, has to be replicated down to the cell and replayed to actually delete stuff  20:46
<dansmith> that might also be why lots of delete/create ends up with conflicts, even if they're temporary.. that's how cellsv1 worked  20:47
<melwitt> fungi: and it's because there's a cron running archive + the env struggles with high load issues and computes go "down" often-ish  20:47
<fungi> melwitt: zombie works. i've tended to refer to them as "rogue vms" (reminds me of rogue ais, i guess)  20:47
<dansmith> melwitt: yeah, so those are easily explainable situations for why it happens on that internal cloud right?  20:48
<melwitt> dansmith: yeah, agreed cells v1 could be at play here. I keep forgetting it's still in rax at the least  20:48
<dansmith> the thing I find so bizarre is why someone like inap would be experiencing this with us and other customers and just sweep it under the rug instead of complain  20:48
*** jamesmcarthur has quit IRC20:48
<dansmith> unless they're on cellsv1 and they know why they have consistency problems :)  20:48
<fungi> could be. next time someone sees mgagne we can ask for more details  20:49
<melwitt> yeah... I _thought_ mgagne had moved to cells v2. but I don't feel sure  20:50
*** viks____ has quit IRC20:50
<fungi> anyway, it's possible this last bout was just protracted fallout from a major outage or maintenance which took them a couple months to get around to cleaning up completely for $reasons, and it's cleared up since. i haven't seen anyone complaining about a spike in these sorts of failures in the past 24 hours since we've reenabled them in nodepool again  20:52
* fungi hunts down the elastic-recheck graph for that pattern  20:52
<fungi> yeah, i can't even find the query which was tracking that condition  20:57
*** rcernin has joined #openstack-infra20:58
<fungi> entirely possible that in the zuul v3 era it mostly surfaces as occasional retry_limit results because zuul sees it as a connectivity issue and so builds occasionally just get unlucky and run there three times and happen to trip over bad addresses every time  20:59
<melwitt> ah, yeah  21:01
*** rcernin has quit IRC21:02
*** rcernin has joined #openstack-infra21:03
*** jamesmcarthur has joined #openstack-infra21:15
<fungi> i think it was hitting tripleo hardest because they already had some jobs which were just intermittently knocking test nodes offline, so if they were already relying on builds getting retried once or twice, having an increased source of retries in some provider would nudge them over the edge into retry_limit  21:19
*** thogarre has quit IRC21:20
*** jamesmcarthur has quit IRC21:23
*** rcernin has quit IRC21:28
*** jamesmcarthur has joined #openstack-infra21:29
*** rlandy has quit IRC21:40
*** xek_ has quit IRC21:48
*** jamesmcarthur has quit IRC22:16
*** jamesmcarthur has joined #openstack-infra22:17
<fungi> zbr: looks like i'll probably wind up doing the git-review and bindep releases tomorrow, i've had more stuff than i expected crop up today and am running out of steam at this point  22:27
*** rcernin has joined #openstack-infra22:46
*** zzzeek has quit IRC22:56
*** zzzeek has joined #openstack-infra22:57
*** yamamoto has joined #openstack-infra22:58
*** matt_kosut has joined #openstack-infra22:59
*** ociuhandu has joined #openstack-infra22:59
*** matt_kosut has quit IRC23:03
*** ociuhandu has quit IRC23:03
*** paladox has quit IRC23:14
*** paladox has joined #openstack-infra23:17
*** jamesmcarthur has quit IRC23:34
*** paladox has quit IRC23:40
*** jamesdenton has quit IRC23:40
*** jamesdenton has joined #openstack-infra23:41
*** paladox has joined #openstack-infra23:55

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!