clarkb | machine id only works if you never change where you're booting the disk | 00:00 |
---|---|---|
JayF | they generate a machine-id for the DIB image build in the centos element | 00:02 |
JayF | which I think means (maybe this is a bug from a different perspective) that any machine booted with an image built with centos element would have the same machine-id unless you explicitly regenerated it | 00:02 |
clarkb | JayF: right but when you boot with kvm you'll get a new machine id | 00:02 |
JayF | does cloud-init do that? | 00:02 |
clarkb | now that you say that it's a good question. We don't use cloud-init but maybe the machine id I've got baked into the image is just hardcoded? | 00:03 |
clarkb | I suspect that cloud-init will though | 00:03 |
JayF | My hunch is that /it will if you don't already have one/ | 00:03 |
ianw | i thought creation was a systemd thing... | 00:04 |
clarkb | looks like systemd firstboot sets it | 00:04 |
JayF | It'd be nice to figure out and have a policy though; e.g. the gentoo element won't do any of the systemd-machine-id-setup bits at all even if you're using a profile | 00:04 |
clarkb | also apparently it is a secret value | 00:04 |
JayF | which has to be manually run | 00:05 |
JayF | unless the OS is doing it | 00:05 |
JayF | systemd-firstboot is an interactive thing iirc, unless gentoo modifies it or it looks for a tty | 00:05 |
JayF | systemd-machine-id-setup is used I assumed because it's noninteractive and just generates your machine id | 00:05 |
clarkb | https://man7.org/linux/man-pages/man5/machine-id.5.html has a whole process on setting it | 00:05 |
clarkb | and yes it seems like you shouldn't bake it into your images and instead you should let systemd generate it | 00:05 |
JayF | I generally love systemd but this feels so windows-y | 00:06 |
clarkb | it will come from kvm or boot flags etc etc | 00:06 |
clarkb | I think having a machine id is fine especially for stuff that actually needs to be tied to a machine | 00:06 |
ianw | i am fairly sure we rm it from the images | 00:06 |
JayF | in which case, having no machine-id during kernel install triggers the bug ianw linked and coreos element is working around | 00:06 |
clarkb | but your boot paths shouldn't be imo | 00:06 |
clarkb | because moving a disk from one machine to another is totally fine | 00:06 |
clarkb | I have done it a non zero number of times and I don't want to break kernel updates if I do | 00:06 |
JayF | but my point is: that won't change machine-id unless you take positive action to do so | 00:06 |
ianw | diskimage_builder/elements/sysprep/finalise.d/99-clear-machine-id | 00:07 |
JayF | so either: 1) there's something magic that systemd is doing to regen in VM cases (I don't think so) or 2) it only generates it if it doesn't exist | 00:07 |
clarkb | but also since machine id is meant to be secret the code you linked to is broken | 00:07 |
JayF | ianw: there is no `sysprep` in element-deps for the dep tree of centos element | 00:08 |
clarkb | though it probably doesn't matter much if we clear it out in the end | 00:08 |
JayF | er wait, I was looking at centos-minimal | 00:08 |
JayF | in centos element, it deps on redhat-common, which deps on sysprep | 00:08 |
JayF | so it gets removed in the centos element case, too | 00:08 |
clarkb | the problem is when you install the kernel during image creation it needs a machine id to set up bls so you have an id of some sort | 00:09 |
clarkb | then you boot and install a new kernel. Well now you have a new machine id and that confuses bls | 00:09 |
clarkb | bls should probably never have used machine id in the first place because it's conflating things that can only create problems in this space? | 00:09 |
JayF | I just looked up bls; I find myself grateful that I can choose not to do it that way on my system :) | 00:10 |
JayF | but I need to step away, I hope I helped some and didn't just stir in more confusion :) | 00:10 |
clarkb | in particular VM images are static and reused across many machines (thus a static machine id makes no sense) and also you can take a disk out of one machine and put it in another and boot it there | 00:11 |
clarkb | anyway I think my workaround may be workarounding | 00:11 |
clarkb | first pass is good for x86 and arm so far | 00:11 |
clarkb | I'll port it over to system-config if it rechecks clean | 00:12 |
clarkb | my hunch is that what drives all of this is security | 00:15 |
clarkb | and wanting to only boot kernels that match the machine id. However, it doesn't seem to do that since it works sometimes | 00:15 |
clarkb | ianw: is the jira version of that bug public too? | 00:19 |
clarkb | (I don't know if jira is accessible) | 00:20 |
ianw | https://issues.redhat.com/browse/RHEL-4313 but it has been closed | 00:20 |
clarkb | oh wow the motivation behind it is to allow many distros to share /boot/loader/entries, since they would each have a different uuid | 00:23 |
clarkb | that seems like optimizing for a use case that very few have and making life worse for the majority | 00:23 |
clarkb | fwiw my dnf install of kernel-devel never failed | 00:25 |
clarkb | I stopped it and will see if the post reboot pause is more reliable | 00:25 |
clarkb | arg `grubby --set-default=/boot/vmlinuz-5.14.0-529.el9.aarch64` does not reliably change the kernel either | 00:30 |
clarkb | at this point I wonder if I should delete the old entry from grubby | 00:31 |
clarkb | or maybe I still need to mkconfig? | 00:32 |
ianw | i *think* that grubby updates the config, then you need to run the mkconfig to apply it | 00:34 |
clarkb | ah ok /me updates | 00:35 |
clarkb | ianw: do you know if /boot/grub2/grub.cfg is the right file or is it /boot/efi/EFI/centos/grub.cfg for EFI boots? | 00:38 |
clarkb | they don't seem symlinked together (my local tumbleweed install seems to have the efi path grub.cfg include the grub2 path cfg) | 00:39 |
ianw | oh jeez, haha i do not know off hand sorry. i was about to say i wonder if they're the same thing | 00:39 |
clarkb | looks like on the held node I have it's a real file (no symlink) and it doesn't just include the other | 00:39 |
clarkb | so probably need to generate for both | 00:39 |
clarkb | but I may have to pick this up tomorrow (at least it feels like some progress) | 00:40 |
clarkb | I smell dinner | 00:40 |
clarkb | intermediate registry pruning is up to ea7 | 00:40 |
clarkb | almost into the f00s | 00:41 |
clarkb | set prefix=(${root})/boot/grub2 then source "${prefix}/grub.cfg" is what my opensuse install does | 00:42 |
clarkb | I like how that avoids things getting out of sync and you can boot old school or efi | 00:42 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation https://review.opendev.org/c/opendev/system-config/+/935812 | 01:48 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade and reboot test nodes before openafs installation https://review.opendev.org/c/opendev/system-config/+/935812 | 01:52 |
opendevreview | Joel Capitao proposed zuul/zuul-jobs master: DNM Switch to KVM https://review.opendev.org/c/zuul/zuul-jobs/+/936023 | 14:12 |
opendevreview | Joel Capitao proposed openstack/diskimage-builder master: DNM Testing on KVM https://review.opendev.org/c/openstack/diskimage-builder/+/936024 | 14:12 |
*** jhorstmann is now known as Guest560 | 14:28 | |
fungi | clarkb: yeah, on my debian systems /boot/efi/EFI/debian/grub.cfg is super minimal and similarly does "set prefix=($root)'/grub'" followed by "configfile $prefix/grub.cfg" | 14:57 |
fungi | (after setting the device search parameters) | 14:58 |
*** jhorstmann is now known as Guest571 | 16:20 | |
clarkb | it seems like maybe I don't need to mkconfig to both files as https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/935813/ and https://review.opendev.org/c/opendev/system-config/+/935812 are passing fairly consistently now | 16:20 |
clarkb | the mkconfig log does say it is writing to uefi stuff so maybe it magically knows to do it there too? | 16:20 |
clarkb | fungi: I have rechecked both of those one more time I think if they both pass we can land them (particularly the system-config change) then proceed with noble openafs updates | 16:21 |
fungi | sounds good, i'm popping out to grab a quick lunch, but can proceed with noble openafs upgrades as soon as i get back | 16:24 |
clarkb | intermediate registry pruning is done. It returned 0 and seems to have exited cleanly | 16:26 |
clarkb | as soon as I get my ssh keys loaded I'll see how many objects are left in that container | 16:27 |
clarkb | I think we can proceed with https://review.opendev.org/c/opendev/system-config/+/935542 but I've just realized we should redirect output for that into a log file and logrotate the logfile. cc corvus are you interested in updating that patch or should I? | 16:27 |
clarkb | 350k objects and I think 2TB. I'm trying to remember what frickler said the old size was? 16TB? quite an improvement and we should definitely prune regularly now that this appears to be reliable | 16:35 |
clarkb | corvus: I'm going ahead and adding the log handling to that change | 16:40 |
opendevreview | Clark Boylan proposed opendev/system-config master: Revert "Temporarily disable intermediate registry prune" https://review.opendev.org/c/opendev/system-config/+/935542 | 16:45 |
opendevreview | Clark Boylan proposed opendev/system-config master: Allow all Ubuntu releases to use our OpenAFS PPA https://review.opendev.org/c/opendev/system-config/+/935723 | 16:53 |
clarkb | fungi: ^ I went ahead and rebased your change so they can be approved together and made the noble arm64 job voting since it should pass with the ppa enabled | 16:54 |
frickler | 4M objects and | 17:02 |
frickler | 16TB it was, right | 17:03 |
*** jhorstmann is now known as Guest576 | 17:05 | |
corvus | clarkb: you sure you want those logs? :) | 17:08 |
clarkb | corvus: well I don't want them going to email :) | 17:10 |
clarkb | but also doesn't hurt to have them | 17:10 |
corvus | okay. we can probably reduce the log level at some point too | 17:11 |
corvus | i'm done with screen if you want to close it. | 17:12 |
corvus | btw, one thing i was wondering about -- is there really no way to know if there's a "next page" with swift pagination other than to just go ahead and request it and see if we get nothing back? some pagination systems return a flag indicating whether there is more, so you don't have to issue that extra request... | 17:13 |
corvus | for what we're doing -- that's a lot of extra requests. | 17:13 |
clarkb | corvus: I wasn't able to find one in the docs | 17:14 |
clarkb | https://docs.openstack.org/api-ref/object-store/#id25 | 17:14 |
clarkb | maybe X-Container-Object-Count header gives you the total objects with your prefix and delimiter instead of total container count when you use those parameters? | 17:15 |
fungi | thanks clarkb! | 17:20 |
clarkb | corvus: we might be able to use a heuristic that if we've gotten at least 10k responses at some point any response with less than 10k is the last one? | 17:29 |
clarkb | corvus: or more generally a response count less than the last response count (that should handle cases where swift might be configured to do more or less than 10k per request) | 17:30 |
corvus | yeah -- i think that leads to an algorithm where we have to make the extra queries until one of them actually succeeds, then we know what the max is and we can avoid them in the future? | 17:49 |
corvus | amusingly, we may never hit the max any more on registry pruning, so it may not be worth the effort. ;) | 17:49 |
clarkb | that reminds me I have a deletion script I should get into working shape however I don't think that will happen today | 17:51 |
clarkb | that nova stack of changes is holing up our arm64 test jobs | 18:04 |
clarkb | *holding | 18:05 |
opendevreview | Merged opendev/system-config master: Revert "Temporarily disable intermediate registry prune" https://review.opendev.org/c/opendev/system-config/+/935542 | 18:26 |
corvus | wonder if we should have registry emit some statsd | 18:27 |
*** jhorstmann is now known as Guest586 | 18:47 | |
clarkb | fungi: ok all jobs have finally started on that change. I've carried my +2 over if you want to review the parent change and the ozj change (they are very similar/related) | 18:48 |
clarkb | we can probably approve all three then work on noble openafs next | 18:48 |
corvus | https://zuul.opendev.org/t/zuul/build/58263e295a264039addc8fd9aed3b2c9/console | 18:51 |
corvus | https://zuul.opendev.org/t/zuul/build/a27683590c3d4cc58ece62b60d35d529/console | 18:51 |
corvus | clarkb: do the errors in those two builds make any sense to you? | 18:51 |
clarkb | hrm maybe that is fallout from registry pruning? | 18:52 |
clarkb | oh no that's buildset registry | 18:52 |
corvus | yeah, and i think https://zuul.opendev.org/t/zuul/stream/97df38d706fb4d39b89582ee5a8a521b?logfile=console.log is succeeding past that point | 18:52 |
corvus | the two failures are on bhs1, the success (so far) is on dfw | 18:53 |
corvus | the buildset registry should be local to the node; so it shouldn't be an off-node network issue | 18:54 |
clarkb | corvus: the buildset registry is doing pass through proxying right? I see a set of 401s here: https://zuul.opendev.org/t/zuul/build/58263e295a264039addc8fd9aed3b2c9/log/docker/buildset_registry.txt#16-20 I wonder if we got rate limited and this is how it expresses when proxying through the buildset registry | 18:54 |
clarkb | though if that is the case how would we know what manifest sha to fetch | 18:55 |
corvus | i believe it's fallback not passthrough | 18:56 |
clarkb | the shas differ so it's not some weird hashing of the 401 or 404 pages | 18:56 |
clarkb | hrm in that case maybe this is expected? those images are all upstream of us and maybe we aren't falling back as we should? | 18:57 |
clarkb | falling back in the buildkit stuff where it should try buildset registry then docker hub? | 18:57 |
corvus | so potential change in podman/containers/something? | 18:57 |
corvus | could that be a dockerhub rate limit manifesting through buildkit? | 18:58 |
corvus | like if dockerhub returns a rate limit error, buildkit just prints the error for the first thing it tried? | 18:58 |
corvus | (i'm trying to remember if we've seen rate limit errors in this particular location...) | 18:59 |
clarkb | it almost looks like it asks the buildset registry about the manifests and it 404s properly and then we ask docker.io and get back the shas but then when we try to fetch the shas from docker we either aren't even going to docker (only buildset registry) or we do go to docker and get rate limited | 18:59 |
corvus | yeah maybe that's it | 18:59 |
clarkb | that would explain why we learn about the shas and why there isn't any info in the buildset registry about the shas until we try to fetch them | 19:01 |
corvus | i think that's a good assumption for now. i'm going to go ahead and hold the next failure in case you or i or anyone else feels like running some docker pulls on the test node to verify | 19:01 |
clarkb | so ya I think we must be falling back then this is just some poor error reporting from buildkit | 19:01 |
corvus | but other than that, i think that's a plausible enough explanation that i don't feel the need to dig deeper right this moment. thanks :) | 19:02 |
clarkb | fwiw it seems like we may be hitting rate limits less often but still hit them | 19:04 |
clarkb | that's somewhat anecdata and it may not be due to not using the proxy; it could be due to usage pattern shifts within ci jobs | 19:04 |
clarkb | I do think your idea of mirroring "upstream" images is a good one though | 19:04 |
clarkb | mariadb, our base python images, zk, etc | 19:05 |
clarkb | fungi: I see you approved my system-config change I'll go ahead and approve yours (this one https://review.opendev.org/c/opendev/system-config/+/935723) | 19:08 |
clarkb | fungi: and then https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/935813 is the ozj change with the similar reboot process as my system-config change. Probably worth reviewing now while you've got it paged in | 19:08 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 19:21 |
fungi | lgtm | 19:32 |
opendevreview | Merged opendev/system-config master: Upgrade and reboot test nodes before openafs installation https://review.opendev.org/c/opendev/system-config/+/935812 | 19:49 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 19:56 |
corvus | it would be a very simple role, except i'm trying to keep as much in common with the other container roles as possible, which means some complex data structure stuff to support the idea of restricted secrets.... | 19:57 |
opendevreview | Merged opendev/system-config master: Allow all Ubuntu releases to use our OpenAFS PPA https://review.opendev.org/c/opendev/system-config/+/935723 | 19:59 |
clarkb | fungi: ^ I think the mirror deploy job for ^ has succeeded, but I'm guessing we still need to manually update the packages on the noble mirrors | 20:31 |
clarkb | I'm going to pop out for a bike ride now but will be back in a bit | 20:32 |
fungi | yep, on it | 20:33 |
fungi | have fun! | 20:33 |
fungi | turns out the deploy job did install the newer package versions, so the servers really just needed rebooting. i'm working through them now | 21:13 |
fungi | but https://mirror.sjc3.raxflex.opendev.org/ indicates it's working fine on noble | 21:13 |
fungi | and they're all rebooted onto 1.8.13-1~ppa0~noble now | 21:23 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 21:24 |
*** jhorstmann is now known as Guest606 | 21:35 | |
clarkb | fungi: thank you for taking care of that | 22:46 |
clarkb | corvus: I'm going to close our screen on insecure-ci-registry.o.o. The log file appears to have captured what we tee'd out to the console too so we don't need that info | 23:11 |
clarkb | (we don't need the console open to see the info I mean) | 23:12 |
*** jhorstmann is now known as Guest614 | 23:13 | |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: WIP: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 23:18 |
corvus | clarkb: ++ | 23:18 |
corvus | i already detached | 23:18 |
clarkb | done | 23:19 |
clarkb | also that log file got big (just over 2GB) | 23:19 |
clarkb | there is plenty of room on the server so not a big deal to hang onto it for now | 23:19 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 23:45 |
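On the machine-id question from 00:02 to 00:07: machine-id(5) describes /etc/machine-id as a single newline-terminated, 32-character lowercase hexadecimal ID, and JayF's working theory is that an ID is only generated when none is present. Below is a minimal Python sketch of that "keep it if valid, otherwise generate" behaviour, assuming a random ID is acceptable. `ensure_machine_id` is a hypothetical helper. Real systemd may instead take the ID from the VM UUID or the `systemd.machine_id=` kernel parameter, and this sketch deliberately omits those sources.

```python
import re
import uuid

# machine-id(5): 32 lowercase hex characters followed by a newline.
_MACHINE_ID_RE = re.compile(r"^[0-9a-f]{32}$")


def is_valid_machine_id(value: str) -> bool:
    """Return True if value looks like a machine ID (trailing newline ignored)."""
    return bool(_MACHINE_ID_RE.match(value.strip()))


def ensure_machine_id(path: str = "/etc/machine-id") -> str:
    """Keep an existing valid ID; otherwise write a fresh random one.

    Hypothetical, simplified model: systemd may instead derive the ID
    from the VM UUID or the systemd.machine_id= kernel parameter.
    """
    try:
        with open(path, encoding="ascii") as f:
            current = f.read().strip()
    except FileNotFoundError:
        current = ""
    if is_valid_machine_id(current):
        return current  # an ID baked into an image survives every boot
    new_id = uuid.uuid4().hex  # 32 lowercase hex characters
    with open(path, "w", encoding="ascii") as f:
        f.write(new_id + "\n")
    return new_id
```

Because a valid ID is returned untouched, an image built with an ID baked in would report the same ID on every machine it boots. This is why clearing it at the end of the image build matters, as the sysprep element's 99-clear-machine-id does.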
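At 00:04 and again at 00:07, clarkb notes that the machine ID is meant to be secret. machine-id(5) recommends that applications not use it directly. Instead, they should derive an application-specific ID from it with a keyed hash, and sd_id128_get_machine_app_specific(3) is systemd's API for this. The Python sketch below models that approach: it computes HMAC-SHA256 keyed with the machine ID over an application ID, truncates the result to 128 bits, and marks it as a version 4 UUID. Treat it as an illustration of the technique rather than a byte-for-byte reimplementation of libsystemd. `app_specific_id` is a hypothetical name.

```python
import hashlib
import hmac


def app_specific_id(machine_id: str, app_id: str) -> str:
    """Derive a per-application ID without exposing the raw machine ID.

    Both arguments are 32-character hex strings. The result is
    HMAC-SHA256(key=machine_id, msg=app_id), truncated to 16 bytes and
    stamped with UUID version 4 / RFC 4122 variant bits.
    """
    key = bytes.fromhex(machine_id)
    msg = bytes.fromhex(app_id)
    digest = bytearray(hmac.new(key, msg, hashlib.sha256).digest()[:16])
    digest[6] = (digest[6] & 0x0F) | 0x40  # version 4
    digest[8] = (digest[8] & 0x3F) | 0x80  # RFC 4122 variant
    return digest.hex()
```

Anything keyed on such a derived value, rather than on the raw ID, can be stored or shared without revealing the machine ID itself.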
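clarkb's BLS explanation (00:09, 00:23) is that kernel installs name their boot entries after the current machine ID. As a result, a kernel installed during the image build and one installed after first boot end up filed under different IDs. On Fedora/RHEL-family systems, kernel-install names the files in /boot/loader/entries as `<machine-id>-<kernel-version>.conf`. Assuming that naming convention, a hypothetical helper like the following would list the entries left behind under a different ID:

```python
from pathlib import Path
from typing import List


def stale_bls_entries(entries_dir: str, machine_id: str) -> List[str]:
    """List BLS entry files whose machine-id prefix differs from machine_id.

    Assumes the <machine-id>-<kernel-version>.conf naming convention;
    files that do not start with a 32-character ID are ignored.
    """
    stale = []
    for entry in sorted(Path(entries_dir).glob("*.conf")):
        prefix, sep, _version = entry.stem.partition("-")
        if not sep or len(prefix) != 32:
            continue  # not named after a machine ID
        if prefix != machine_id:
            stale.append(entry.name)
    return stale
```

It only inspects filenames. It does not decide which entry GRUB boots, and it says nothing about whether deleting those entries is safe, which clarkb was still weighing at 00:31.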
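The grub.cfg question came up at 00:38 to 00:42 and again at 14:57. clarkb's openSUSE install and fungi's Debian systems use a small EFI-side grub.cfg that sets `prefix` and then hands off to the main config with `source` or `configfile`. The held CentOS node, by contrast, had a separate full file. A rough way to tell which case you are in is a text check like the hypothetical helper below. It is a regex heuristic that looks for a `configfile`/`source` line loading another grub.cfg, not a GRUB script parser.

```python
import re

# A `configfile` or `source` directive whose argument ends in grub.cfg.
_CHAIN_RE = re.compile(r"^\s*(configfile|source)\s+\S*grub\.cfg", re.MULTILINE)


def chains_to_main_config(stub_text: str) -> bool:
    """Return True if this GRUB config appears to hand off to another grub.cfg."""
    return bool(_CHAIN_RE.search(stub_text))
```

If the EFI copy chains like this, regenerating only the main /boot/grub2/grub.cfg should be enough. If it does not, both files may need regenerating.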
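Swift container listings, discussed from 17:13 to 17:49, return at most a cluster-configured number of names per request (clarkb mentions 10k), and a listing is continued with the `marker` query parameter. clarkb and corvus converge on a heuristic that works as follows:

- A page is only known to be full once another non-empty page follows it.
- Once that page size is learned, any shorter page is treated as the last one, so the extra, empty request is skipped.

`fetch_page` is a hypothetical callable standing in for a container GET with `marker` set. This is a sketch of the heuristic, not the registry's actual code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical: fetch_page(marker) performs one container listing request
# starting after `marker` (None for the first page) and returns object names.
FetchPage = Callable[[Optional[str]], List[str]]


def list_all(fetch_page: FetchPage,
             known_limit: Optional[int] = None) -> Tuple[List[str], Optional[int]]:
    """List every object, skipping the final empty request when possible.

    Returns the names plus the page size learned so far, so callers can
    pass it back in as known_limit on later listings.
    """
    names: List[str] = []
    marker: Optional[str] = None
    limit = known_limit
    prev_len: Optional[int] = None
    while True:
        page = fetch_page(marker)
        if not page:
            break  # the extra request we would like to avoid
        if prev_len is not None:
            # More data followed the previous page, so that page was full.
            limit = max(limit or 0, prev_len)
        names.extend(page)
        if limit is not None and len(page) < limit:
            break  # shorter than a known-full page: this was the last one
        prev_len = len(page)
        marker = page[-1]
    return names, limit
```

As corvus points out, the saving only starts once a full page has been observed. Feeding a previously learned `known_limit` into later listings is what avoids the extra request for small containers.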
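At 18:27, corvus wonders whether the registry should emit statsd metrics. StatsD's plain-text protocol sends lines of the form `name:value|type` over UDP, conventionally to port 8125. Common types are `c` for counters, `g` for gauges and `ms` for timers. A minimal sketch follows. The metric names are hypothetical, and nothing here reflects how zuul-registry is actually instrumented.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one StatsD metric: 'c' counter, 'g' gauge, 'ms' timer."""
    if metric_type not in {"c", "g", "ms"}:
        raise ValueError(f"unsupported metric type: {metric_type}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget a metric over UDP, as StatsD clients usually do."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("registry.prune.deleted", 42, "c"))` would report a hypothetical count of pruned objects.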
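The build failures discussed from 18:51 to 19:01 are consistent with a fallback chain in which the buildset registry is tried first and Docker Hub second, with only one error surfaced. In that case, the 404 from the buildset registry can hide a rate limit from Docker Hub. The sketch below shows the alternative: report every registry's failure. It is a hypothetical Python model of the fallback pattern, not buildkit's behaviour or API, and the registry names and errors are placeholders.

```python
from typing import Callable, List, Sequence, Tuple


class PullError(Exception):
    """Raised when no registry in the chain could serve the reference."""


def pull_with_fallback(ref: str,
                       registries: Sequence[Tuple[str, Callable[[str], bytes]]]) -> bytes:
    """Try each (name, fetch) pair in order; report every failure if all fail."""
    errors: List[str] = []
    for name, fetch in registries:
        try:
            return fetch(ref)
        except Exception as exc:  # record it and fall back to the next registry
            errors.append(f"{name}: {exc}")
    raise PullError(f"could not pull {ref}; " + "; ".join(errors))
```

With every error in the message, a Docker Hub rate limit is visible even when the buildset registry's 404 comes first.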
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!