Tuesday, 2022-07-12

@iwienand:matrix.org: (also, https://review.opendev.org/c/opendev/system-config/+/848562 is a related change that reworks the way we do ssl testing to remove the insecure flags) [00:03]
@iwienand:matrix.org: > <@iwienand:matrix.org> i've filed https://github.com/containers/podman/issues/14884 but upon more research, i'm starting to think it's a cgroups v2 thing [00:33]
as an update: this is almost certainly related to cgroups v2 on jammy vs. v1 on focal. i'm currently trying some changes that put the nested podman in a separate cgroup in https://review.opendev.org/c/openstack/diskimage-builder/+/849274/
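For context on the cgroup change above: cgroup v2 enforces a "no internal processes" rule, so a process tree like a nested podman generally needs to live in its own leaf cgroup rather than its container's root cgroup. A minimal Python sketch of that general idea follows; the cgroup path and name are illustrative, not what the dib change actually does.

```python
import os
from pathlib import Path

# Sketch only: assumes a writable cgroup v2 unified hierarchy at
# /sys/fs/cgroup; the "nested-podman" name is made up for illustration.
CGROUP_ROOT = Path("/sys/fs/cgroup")

def move_self_to_child_cgroup(name: str = "nested-podman") -> None:
    """Create a leaf cgroup and migrate the current process into it."""
    child = CGROUP_ROOT / name
    child.mkdir(exist_ok=True)
    # Writing a PID to cgroup.procs moves that process into the cgroup.
    (child / "cgroup.procs").write_text(str(os.getpid()))

if __name__ == "__main__":
    move_self_to_child_cgroup()
    # With this process in a leaf cgroup, exec the nested podman.
    os.execvp("podman", ["podman", "info"])
```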
@iwienand:matrix.org: i note that corvus probably wants to get nodepool out of the image building business, and that is probably a good direction for this. all the problems we've had with the containerfile approach have been from the nesting: trying to run podman inside the nodepool-builder docker container [00:36]
@jim:acmegating.com: ianw: yeah, i've got a back-burner change (just a few mins here and there) to try building a simple nodepool image in a job at https://review.opendev.org/848792 -- that was to validate the ideas in https://review.opendev.org/775042 a bit more. but i've also recently learned of a zuul user who is in fact doing almost exactly what is in that spec already, so it seems that doesn't really need a lot of validation. :) i think 848792 might be more useful as a seed for making some roles for image building, and we could pivot that to validating a fix for the cgroups stuff. i know of some cloud operators that may be interested in joining in on "roles for building images in zuul jobs" as well. i'm also starting a new spec that builds on those ideas to make an even more robust nodepool/zuul image building story; i hope to have something to show regarding that later this week. so all in all -- yeah, it's starting to feel like some stuff is lining up for getting some momentum on "build images in zuul jobs". [01:53]
@blaisepabon:matrix.org: OMG... it works! https://u.do.controlplane.info/tenants [02:27]
@blaisepabon:matrix.org: Thank you corvus (of course, I'll have to get it behind some kind of login page so Heath doesn't freak out). [02:30]
@iwienand:matrix.org: corvus: so after looking, getting all images to build from a plain "pip install dib" on a system is somewhere between very annoying and impossible -- e.g. missing packages on focal, etc. so i'd propose a dib reference image based on nodepool-builder, something along the lines of https://review.opendev.org/c/openstack/diskimage-builder/+/849454. i've not wanted to duplicate nodepool-builder, but perhaps it is time. i would propose this is used similarly to what is being done in https://review.opendev.org/c/openstack/diskimage-builder/+/791888: basically as a privileged container that you map various things into [07:20]
@iwienand:matrix.org: (that doesn't really solve the cgroups v2 issue, but that's something i think is worth working through with upstream anyway) [07:21]
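To illustrate the "privileged container that you map various things into" shape of that proposal, an invocation might look roughly like the sketch below; the image name, mount paths, and elements are placeholders, not what 849454 actually builds.

```python
import subprocess

# Illustrative only: "dib-reference-image" is a hypothetical image name,
# and the mounted paths are placeholders for whatever the host provides.
subprocess.run(
    [
        "docker", "run", "--privileged",
        "-v", "/opt/dib/cache:/opt/dib/cache",    # shared dib cache
        "-v", "/opt/dib/images:/opt/dib/images",  # where built images land
        "dib-reference-image",
        "disk-image-create", "-o", "/opt/dib/images/ubuntu-jammy",
        "ubuntu-minimal", "vm",
    ],
    check=True,
)
```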
@avass:vassast.org: We're seeing ansible `command` modules getting stuck before executing in roughly 1 in 100 jobs, and when that happens in cleanup it's really bad, because jobs hang indefinitely unless they're dequeued and the process on the executor is killed. [08:29]
I've been digging for a while, and from what I can see the on_task_start callback runs to print the task banner, but the build node never seems to run the actual task, so ansible-playbook never exits. I can't find any obvious race condition in the logging or in the command module anywhere. Is anyone else experiencing something similar?
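Not a fix, but one way to catch hangs like this while debugging is an idle-output watchdog around ansible-playbook. This is a generic sketch, not how zuul's executor handles timeouts:

```python
import selectors
import subprocess
import sys

def run_with_idle_timeout(cmd, idle_timeout=600):
    """Run cmd, killing it if it emits no output for idle_timeout seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    while proc.poll() is None:
        # An empty event list means nothing was readable before the timeout.
        if not sel.select(timeout=idle_timeout):
            proc.kill()
            raise TimeoutError(f"no output for {idle_timeout}s, killed {cmd[0]}")
        line = proc.stdout.readline()
        if line:
            sys.stdout.write(line)
    return proc.returncode

# e.g. run_with_idle_timeout(["ansible-playbook", "playbook.yaml"])
```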
-@gerrit:opendev.org- Benjamin Schanzel proposed on behalf of Tobias Henkel: [zuul/nodepool] 743790: Check for images to upload single threaded https://review.opendev.org/c/zuul/nodepool/+/743790 [11:19]
@avass:vassast.org: Maybe we're seeing this issue: https://github.com/ansible/ansible/issues/59642; if so, it may be fixed in ansible 5 [11:33]
-@gerrit:opendev.org- Albin Vass proposed: [zuul/zuul-jobs] 849502: DNM: Run tests with ansible 5 https://review.opendev.org/c/zuul/zuul-jobs/+/849502 [12:06]
@jim:acmegating.com: ianw: that makes sense, but one of the advantages of the zuul job approach is that we don't need a single system to make all images (you could request an ubuntu node to build ubuntu images, a fedora node to build fedora images, etc.). only if the underlying cloud doesn't already have such a node would you need to cross-build. [13:50]
@jim:acmegating.com: ianw: (not arguing against the image; more just advocating that the non-image approach may still be an option) [13:51]
@avass:vassast.org: corvus: are you talking about building an image like what I did for digitalocean: https://review.opendev.org/c/zuul/zuul-jobs/+/786757 ? [13:57]
@avass:vassast.org: or is it more like 1) build an image and upload it somewhere 2) report it as an artifact that nodepool can pick up and use [13:58]
@fungicide:matrix.org: > <@avass:vassast.org> or is it more like 1) build an image and upload it somewhere 2) report it as an artifact that nodepool can pick up and use [14:18]
i think the latter, as a continuation of this discussion from last year: https://lists.zuul-ci.org/pipermail/zuul-discuss/2021-February/001503.html
@jim:acmegating.com: yeah, i'm looking at dib-based, not snapshot [14:22]
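For the "report it as an artifact" flow, the return data from a build job might look roughly like the following. The list under zuul.artifacts follows Zuul's documented zuul_return artifact format; the name, url, and metadata values here are invented.

```python
# Sketch of the data a build job could hand back via the zuul_return
# Ansible module. The zuul.artifacts structure is Zuul's documented
# artifact-return format; every concrete value below is hypothetical.
artifact_return = {
    "zuul": {
        "artifacts": [
            {
                "name": "ubuntu-jammy-dib-image",
                "url": "https://example.com/images/ubuntu-jammy.qcow2",
                "metadata": {
                    # hypothetical keys a nodepool-side consumer might match on
                    "type": "zuul_image",
                    "format": "qcow2",
                },
            }
        ]
    }
}
```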
-@gerrit:opendev.org- Artem Goncharov proposed wip: [zuul/zuul] 849033: Initial implementation of the gitea driver https://review.opendev.org/c/zuul/zuul/+/849033 [14:31]
-@gerrit:opendev.org- Artem Goncharov marked as active: [zuul/zuul] 849033: Initial implementation of the gitea driver https://review.opendev.org/c/zuul/zuul/+/849033 [14:31]
@jim:acmegating.com: Albin Vass: btw the zuul tenant has been using ansible 5 by default for a while (and now all of opendev) [14:36]
@avass:vassast.org: > <@jim:acmegating.com> Albin Vass: btw the zuul tenant has been using ansible 5 by default for a while (and now all of opendev) [15:15]
Yeah, i realized that after pushing the change. I found out that the mirror-workspace role was already fixed, and i bumped our internal fork. But Ansible 5 didn't solve our issue anyway :(
-@gerrit:opendev.org- James E. Blair proposed: [zuul/zuul] 849570: Add pipeline-based merge op metrics https://review.opendev.org/c/zuul/zuul/+/849570 [17:27]
-@gerrit:opendev.org- James E. Blair proposed: [zuul/nodepool] 849582: Speed up node listing https://review.opendev.org/c/zuul/nodepool/+/849582 [21:06]
