Friday, 2024-11-22

clarkbmachine id only works if you never change where you're booting the disk00:00
JayFthey generate a machine-id for the DIB image build in the centos element00:02
JayFwhich I think means (maybe this is a bug from a different perspective) that any machine booted with an image built with centos element would have the same machine-id unless you explicitly regenerated it00:02
clarkbJayF: right but when you boot with kvm you'll get a new machine id00:02
JayFdoes cloud-init do that?00:02
clarkbnow that you say that its a good question. We don't use cloud-init but maybe the machine id I've got baked into the image is just hardcoded?00:03
clarkbI suspect that cloud-init will though00:03
JayFMy hunch is that /it will if you don't already have one/00:03
ianwi thought creation was a systemd thing...00:04
clarkblooks like systemd firstboot sets it00:04
JayFIt'd be nice to figure out and have a policy though; e.g. the gentoo element won't do any of the systemd-machine-id-setup bits at all even if you're using a profile00:04
clarkbalso apparently it is a secret value00:04
JayFwhich has to be manually run00:05
JayFunless the OS is doing it00:05
JayFsystemd-firstboot is an interactive thing iirc, unless gentoo modifies it or it looks for a tty00:05
JayFsystemd-machine-id-setup is used I assumed because it's noninteractive and just generates your machine id00:05
clarkbhttps://man7.org/linux/man-pages/man5/machine-id.5.html has a whole process on setting it00:05
clarkband yes it seems like you shouldn't bake it into your images and instead you should let systemd generate it00:05
JayFI generally love systemd but this feels so windows-y 00:06
clarkbit will come from kvm or boot flags etc etc00:06
clarkbI think having a machine id is fine especially for stuff that actually needs to be tied to a machine00:06
ianwi am fairly sure we rm it from the images00:06
JayFin which case, having no machine-id during kernel install triggers the bug ianw linked and coreos element is working around00:06
clarkbbut your boot paths shouldn't be imo00:06
clarkbbecause moving a disk from one machine to another is totally fine00:06
clarkbI have done it a non zero number of times and I don't want to break kernel updates if I do00:06
JayFbut my point is: that won't change machine-id unless you take positive action to do so00:06
ianwdiskimage_builder/elements/sysprep/finalise.d/99-clear-machine-id00:07
JayFso either: 1) there's something magic that systemd is doing to regen in VM cases (I don't think so) or 2) it only generates it if it doesn't exist00:07
clarkbbut also since machine id is meant to be secret the code you linked to is broken00:07
JayFianw: there is not `sysprep` in element-deps for the dep tree of centos element00:08
clarkbthough it probably doesn't matter much if we clear it out in the end00:08
JayFer wait, I was looking at centos-minimal00:08
JayFin centos element, it deps on redhat-common, which deps on sysprep00:08
JayFso it gets removed in the centos element case, too00:08
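
The behaviour discussed above (clear the machine id at image build time, then generate one at boot only if none exists) can be sketched roughly as follows. This is a minimal illustration, not what systemd or diskimage-builder actually run: the function names `clear_machine_id` and `ensure_machine_id` are hypothetical, and the random id stands in for whatever source systemd would really pick (kvm, boot flags, and so on). The only format detail relied on is that a machine id is 32 lowercase hex characters followed by a newline.

```python
import re
import uuid
from pathlib import Path

# A valid machine id is 32 lowercase hexadecimal characters.
MACHINE_ID_RE = re.compile(r"^[0-9a-f]{32}$")


def clear_machine_id(path: Path) -> None:
    """Hypothetical image-build step: leave an empty file so no id is baked in."""
    path.write_text("")


def ensure_machine_id(path: Path) -> str:
    """Hypothetical first-boot step: keep a valid id, generate one otherwise."""
    current = path.read_text().strip() if path.exists() else ""
    if MACHINE_ID_RE.match(current):
        # An id already exists, so it is left alone. This is why a baked-in
        # id would be shared by every machine booted from the same image.
        return current
    # Assumption: a random id is used here purely for illustration.
    new_id = uuid.uuid4().hex
    path.write_text(new_id + "\n")
    return new_id
```

Under this sketch, calling `ensure_machine_id` twice on the same file returns the same id, and only a cleared (or missing) file produces a new one, which matches the "it only generates it if it doesn't exist" reading.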
clarkbthe problem is when you install the kernel during image creation it needs a machine id to set up bls so you have an id of some sort00:09
clarkbthen you boot and install a new kernel. Well now you have a new machine id and that confuses bls00:09
clarkbbls should probably never have used machine id in the first place because its conflating things that can only create problems in this space?00:09
JayFI just looked up bls; I find myself grateful that I can choose not to do it that way on my system :)00:10
JayFbut I need to step away, I hope I helped some and didn't just stir in more confusion :) 00:10
clarkbin particular VM images are static and reused across many machines (thus a static machine id makes no sense) and also you can take a disk out of one machine and put it in another and boot it there00:11
clarkbanyway I think my workaround may be workarounding00:11
clarkbfirst pass is good for x86 and arm so far00:11
clarkbI'll port it over to system-config if it rechecks clean00:12
clarkbmy hunch is that what drives all of this is security00:15
clarkband wanting to only boot kernels that match the machine id. However, it doesn't seem to do that since it works sometimes00:15
clarkbianw: is the jira version of that bug public too?00:19
clarkb(I don't know if jira is accessible)00:20
ianwhttps://issues.redhat.com/browse/RHEL-4313 but it has been closed00:20
clarkboh wow the motivation behind it is to allow many distros to share /boot/loader/entries then they would each have a different uuid00:23
clarkbthat seems like optimizing for a use case that very few have and making life worse for the majority00:23
clarkbfwiw my dnf install of kernel-devel never failed00:25
clarkbI stopped it and will see if the post reboot pause is more reliable00:25
clarkbarg `grubby --set-default=/boot/vmlinuz-5.14.0-529.el9.aarch64` does not reliably change the kernel either00:30
clarkbat this point I wonder if I should delete the old entry from grubby00:31
clarkbor maybe I still need to mkconfig?00:32
ianwi *think* that grubby updates the config, then you need to run the mkconfig to apply it00:34
clarkbah ok /me updates00:35
clarkbianw: do you know if /boot/grub2/grub.cfg is the right file or is it /boot/efi/EFI/centos/grub.cfg for EFI boots?00:38
clarkbthey don't seem symlinked together (my local tumbleweed install seems to have the efi path grub.cfg include the grub2 path cfg)00:39
ianwoh jeez, haha i do not know off hand sorry.  i was about to say i wonder if they're the same thing00:39
clarkblooks like on the held node I have its a real file (no symlink) and it doesn't just include the other00:39
clarkbso probably need to generate for both00:39
clarkbbut I may have to pick this up tomorrow (at least it feels like some progress)00:40
clarkbI smell dinner00:40
clarkbintermediate registry pruning is up to ea700:40
clarkbalmost into the f00s 00:41
clarkbset prefix=(${root})/boot/grub2 then source "${prefix}/grub.cfg" is what my opensuse install does00:42
clarkbI like how that avoids things getting out of sync and you can boot old school or efi00:42
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581201:48
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581201:52
opendevreviewJoel Capitao proposed zuul/zuul-jobs master: DNM Switch to KVM  https://review.opendev.org/c/zuul/zuul-jobs/+/93602314:12
opendevreviewJoel Capitao proposed openstack/diskimage-builder master: DNM Testing on KVM  https://review.opendev.org/c/openstack/diskimage-builder/+/93602414:12
*** jhorstmann is now known as Guest56014:28
fungiclarkb: yeah, on my debian systems /boot/efi/EFI/debian/grub.cfg is super minimal and similarly does "set prefix=($root)'/grub'" followed by "configfile $prefix/grub.cfg"14:57
fungi(after setting the device search parameters)14:58
*** jhorstmann is now known as Guest57116:20
clarkbit seems like maybe I don't need to mkconfig to both files as https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/935813/ and https://review.opendev.org/c/opendev/system-config/+/935812 are passing fairly consistently now16:20
clarkbthe mkconfig log does say it is writing to uefi stuff so maybe it magically knows to do it there too?16:20
clarkbfungi: I have rechecked both of those one more time I think if they both pass we can land them (particularly the system-config change) then proceed with noble openafs updates16:21
fungisounds good, i'm popping out to grab a quick lunch, but can proceed with noble openafs upgrades as soon as i get back16:24
clarkbintermediate registry pruning is done. It returned 0 and seems to have exited cleanly16:26
clarkbas soon as I get my ssh keys loaded I'll see how many objects are left in that container16:27
clarkbI think we can proceed with https://review.opendev.org/c/opendev/system-config/+/935542 but I've just realized we should redirect output for that into a log file and logrotate the logfile. cc corvus are you interested in updating that patch or should I?16:27
clarkb350k objects and I think 2TB. I'm trying to remember what frickler said the old size was? 16TB? quite an improvement and we should definitely prune regularly now that this appears to be reliable16:35
clarkbcorvus: I'm going ahead and adding the log handling to that change16:40
opendevreviewClark Boylan proposed opendev/system-config master: Revert "Temporarily disable intermediate registry prune"  https://review.opendev.org/c/opendev/system-config/+/93554216:45
opendevreviewClark Boylan proposed opendev/system-config master: Allow all Ubuntu releases to use our OpenAFS PPA  https://review.opendev.org/c/opendev/system-config/+/93572316:53
clarkbfungi: ^ I went ahead and rebased your change so they can be approved together and made the noble arm64 job voting since it should pass with the ppa enabled16:54
frickler4M objects and17:02
frickler16TB it was, right17:03
*** jhorstmann is now known as Guest57617:05
corvusclarkb: you sure you want those logs?  :)17:08
clarkbcorvus: well I don't want them going to email :)17:10
clarkbbut also doesn't hurt to have them17:10
corvusokay.  we can probably reduce the log level at some point too17:11
corvusi'm done with screen if you want to close it.17:12
corvusbtw, one thing i was wondering about -- is there really no way to know if there's a "next page" with swift pagination other than to just go ahead and request it and see if we get nothing back?  some pagination systems return a flag indicating whether there is more, so you don't have to issue that extra request...17:13
corvusfor what we're doing -- that's a lot of extra requests.17:13
clarkbcorvus: I wasn't able to find one in the docs17:14
clarkbhttps://docs.openstack.org/api-ref/object-store/#id2517:14
clarkbmaybe X-Container-Object-Count header gives you to total objects with your prefix and delimiter instead of total container count when you use those parameters?17:15
fungithanks clarkb!17:20
clarkbcorvus: we might be able to use a heuristic that if we've gotten at least 10k responses at some point any response with less than 10k is the last one?17:29
clarkbcorvus: or more generally a response count less than the last response count (that should handle cases where swift might be configured to do more or less than 10k per request)17:30
corvusyeah -- i think that leads to an algorithm where we have to make the extra queries until one of them actually succeeds, then we know what the max is and we can avoid them in the future?17:49
corvusamusingly, we may never hit the max any more on registry pruning, so it may not be worth the effort.  ;)17:49
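
The heuristic discussed above could look roughly like this. It is a sketch under stated assumptions, not the registry's pruning code: `fetch_page` is a hypothetical callable that takes a marker (or None) and returns the next list of object names, `list_all` is a made-up name, and the logic assumes the server always fills a page up to its limit unless the listing is exhausted. The page size is learned the first time a follow-up request returns something, and after that a short page is treated as the last one without issuing the extra request.

```python
from typing import Callable, List, Optional, Tuple


def list_all(
    fetch_page: Callable[[Optional[str]], List[str]],
    known_limit: Optional[int] = None,
) -> Tuple[List[str], Optional[int]]:
    """Collect all names from a marker-paginated listing.

    Returns the names plus the page size learned so far, so the caller can
    pass it back in as known_limit on later listings.
    """
    names: List[str] = []
    marker: Optional[str] = None
    prev_len: Optional[int] = None
    while True:
        page = fetch_page(marker)
        if not page:
            break  # empty page: the listing is exhausted
        if known_limit is None and prev_len is not None:
            # A non-empty follow-up page means the previous page was full,
            # so its length is taken to be the server's page size.
            known_limit = prev_len
        names.extend(page)
        if known_limit is not None and len(page) < known_limit:
            break  # short page: assumed to be the last, skip the extra request
        marker = page[-1]
        prev_len = len(page)
    return names, known_limit
```

A listing that ends exactly on a page boundary still costs one empty request, and a listing that never fills a page never teaches the limit, which is the "may never hit the max any more" case.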
clarkbthat reminds me I have a deletion script I should get into working shape however I don't think that will happen today17:51
clarkbthat nova stack of changes is holing up our arm64 test jobs18:04
clarkb*holding18:05
opendevreviewMerged opendev/system-config master: Revert "Temporarily disable intermediate registry prune"  https://review.opendev.org/c/opendev/system-config/+/93554218:26
corvuswonder if we should have registry emit some statsd18:27
*** jhorstmann is now known as Guest58618:47
clarkbfungi: ok all jobs have finally started on that change. I've carried my +2 over if you want to review the parent change and the ozj change (they are very similar/related)18:48
clarkbwe can probably approve all three then work on noble openafs next18:48
corvushttps://zuul.opendev.org/t/zuul/build/58263e295a264039addc8fd9aed3b2c9/console18:51
corvushttps://zuul.opendev.org/t/zuul/build/a27683590c3d4cc58ece62b60d35d529/console18:51
corvusclarkb: do the errors in those two builds make any sense to you?18:51
clarkbhrm maybe that is fallout from registry pruning?18:52
clarkboh no thats buildset registry18:52
corvusyeah, and i think https://zuul.opendev.org/t/zuul/stream/97df38d706fb4d39b89582ee5a8a521b?logfile=console.log is succeeding past that point18:52
corvusthe two failures are on bhs1, the success (so far) is on dfw18:53
corvusthe buildset registry should be local to the node; so it shouldn't be an off-node network issue18:54
clarkbcorvus: the buildset registry is doing pass through proxying right? I see a set of 401s here: https://zuul.opendev.org/t/zuul/build/58263e295a264039addc8fd9aed3b2c9/log/docker/buildset_registry.txt#16-20 I wonder if we got rate limited and this is how it expresses when proxying through the buildset registry18:54
clarkbthough if that is the case how would we know what manifest sha to fetch18:55
corvusi believe it's fallback not passthrough18:56
clarkbthe shas differ so its not some weird hashing of the 401 or 404 pages18:56
clarkbhrm in that case maybe this is expected? those images are all upstream of us and maybe we aren't falling back as we should?18:57
clarkbfalling back in the buildkit stuff where it should try buildset registry then docker hub?18:57
corvusso potential change in podman/containers/something?18:57
corvuscould that be a dockerhub rate limit manifesting through buildkit?18:58
corvuslike if dockerhub returns a rate limit error, buildkit just prints the error for the first thing it tried?18:58
corvus(i'm trying to remember if we've seen rate limit errors in this particular location...)18:59
clarkbit almost looks like it asks the buildset registry about the manifests and it 404s properly and then we ask docker.io and get back the shas but then when we try to fetch the shas from docker we either aren't even going to docker (only buildset registry) or we do go to docker and get rate limited18:59
corvusyeah maybe that's it18:59
clarkbthat would explain why we learn about the shas and why there isn't any info in the buildset registry until we try to fetch the shas about the shas19:01
corvusi think that's a good assumption for now.  i'm going to go ahead and hold the next failure in case you or i or anyone else feels like running some docker pulls on the test node to verify19:01
clarkbso ya I think we must be falling back then this is just some poor error reporting from buildkit19:01
corvusbut other than that, i think that's a plausible enough explanation that i don't feel the need to dig deeper right this moment.  thanks :)19:02
clarkbfwiw it seems like we may be hitting rate limits less often but still hit them19:04
clarkbthat's somewhat anecdata and it may not be due to not using the proxy; it could be due to usage pattern shifts within ci jobs19:04
clarkbI do think your idea of mirroring "upstream" images is a good one though19:04
clarkbmariadb, our base python images, zk, etc19:05
clarkbfungi: I see you approved my system-config change I'll go ahead and approve yours (this one https://review.opendev.org/c/opendev/system-config/+/935723)19:08
clarkbfungi: and then https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/935813 is the ozj change with the similar reboot process as my system-config change. Probably worth reviewing now while you've got it paged in19:08
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job  https://review.opendev.org/c/zuul/zuul-jobs/+/93557419:21
fungilgtm19:32
opendevreviewMerged opendev/system-config master: Upgrade and reboot test nodes before openafs installation  https://review.opendev.org/c/opendev/system-config/+/93581219:49
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job  https://review.opendev.org/c/zuul/zuul-jobs/+/93557419:56
corvusit would be a very simple role, except i'm trying to keep as much in common with the other container roles as possible, which means some complex data structure stuff to support the idea of restricted secrets....19:57
opendevreviewMerged opendev/system-config master: Allow all Ubuntu releases to use our OpenAFS PPA  https://review.opendev.org/c/opendev/system-config/+/93572319:59
clarkbfungi: ^ I think the mirror deploy job for ^ has succeeded, but I'm guessing we still need to manually update the packages on the noble mirrors20:31
clarkbI'm going to pop out for a bike ride now but will be back in a bit20:32
fungiyep, on it20:33
fungihave fun!20:33
fungiturns out the deploy job did install the newer package versions, so the servers really just needed rebooting. i'm working through them now21:13
fungibut https://mirror.sjc3.raxflex.opendev.org/ indicates it's working fine on noble21:13
fungiand they're all rebooted onto 1.8.13-1~ppa0~noble now21:23
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add mirror-container-images role and job  https://review.opendev.org/c/zuul/zuul-jobs/+/93557421:24
*** jhorstmann is now known as Guest60621:35
clarkbfungi: thank you for taking care of that22:46
clarkbcorvus: I'm going to close our screen on insecure-ci-registry.o.o. The log file appears to have captured what we tee'd out to the console too so we don't need that info23:11
clarkb(we don't need the console open to see the info I mean)23:12
*** jhorstmann is now known as Guest61423:13
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job  https://review.opendev.org/c/zuul/zuul-jobs/+/93557423:18
corvusclarkb: ++23:18
corvusi already detached23:18
clarkbdone23:19
clarkbalso that log file got big (just over 2GB)23:19
clarkbthere is plenty of room on the server so not a big deal to hang onto it for now23:19
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Add mirror-container-images role and job  https://review.opendev.org/c/zuul/zuul-jobs/+/93557423:45
