Tuesday, 2022-05-24

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 843038: buildset registry: run socat in new session https://review.opendev.org/c/zuul/zuul-jobs/+/843038  00:29
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 842243: DNM: Try jobs with Ansible 5 https://review.opendev.org/c/zuul/zuul/+/842243  00:29
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/nodepool] 841651: Add the component registry from Zuul https://review.opendev.org/c/zuul/nodepool/+/841651  01:02
-@gerrit:opendev.org- Zuul merged on behalf of Joshua Watt: [zuul/nodepool] 842719: openstack: Remove metadata limit checks https://review.opendev.org/c/zuul/nodepool/+/842719  01:59
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:  02:01
- [zuul/zuul-jobs] 843048: DNM: welfare check: ensure-podman https://review.opendev.org/c/zuul/zuul-jobs/+/843048
- [zuul/zuul-jobs] 843049: ensure-podman: Remove kubic from Ubuntu 18.04 https://review.opendev.org/c/zuul/zuul-jobs/+/843049
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 843093: ensure-podman: Use PPA for Ubuntu 20.04 https://review.opendev.org/c/zuul/zuul-jobs/+/843093  02:14
@jim:acmegating.com ianw: tristanC ^ it looks like in 2020, the podman project said to use opensuse kubic instead of the launchpad ppa, so we switched to it.  but apparently recently podman has disappeared from kubic for both bionic and focal.  those experimental changes switch us back to using the ppa for those versions.  the ppa has been deprecated since 2019 though, and apparently only has very old versions of podman.  if those changes work, should we just go with that, or do you have other suggestions on how to proceed?  02:18
@jim:acmegating.com https://opendev.org/zuul/zuul-jobs/commits/branch/master/roles/ensure-podman/tasks/Ubuntu.yaml tells the history  02:19
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/project-config] 843097: Increase check and gate pipeline priority https://review.opendev.org/c/zuul/project-config/+/843097  02:34
@jim:acmegating.com ianw: tristanC ...and since the ppa doesn't have packages for focal... i'm not sure how to install it there.  should we drop focal support from the role?  02:59
@sean-k-mooney:matrix.org on ubuntu 22.04 i think the recommendation is to use the distro version of cri-o and possibly podman instead of kubic, so perhaps as a result of that move they dropped the old versions?  07:02
@sean-k-mooney:matrix.org https://podman.io/blogs/2022/04/05/ubuntu-2204-lts-kubic.html  07:03
@sean-k-mooney:matrix.org corvus so instead of dropping support i think it should just install it from the distro packages  07:04
@sean-k-mooney:matrix.org on ubuntu, but that might mean the role will fail on older ubuntu  07:04
@sean-k-mooney:matrix.org on 22.04 it really should use the distro version even if it was still on kubic  07:05
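[editor's note: the per-release decision being discussed above can be sketched as a small helper. This is purely illustrative — `podman_source` and its return values are made-up names, not anything in the ensure-podman role:]

```python
# Hypothetical sketch of the decision discussed above: on Ubuntu 22.04
# podman is in the distro archive, while on 18.04/20.04 the choices are
# kubic (which has dropped bionic/focal builds) or the deprecated,
# stale launchpad PPA.
def podman_source(ubuntu_release: str) -> str:
    """Return which package source an ensure-podman-style role might use."""
    major = int(ubuntu_release.split(".")[0])
    if major >= 22:
        return "distro"       # native packages, the recommended path
    if ubuntu_release in ("18.04", "20.04"):
        return "ppa"          # deprecated PPA, old versions only
    return "unsupported"

print(podman_source("22.04"))  # distro
print(podman_source("20.04"))  # ppa
```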
@arddennis:matrix.org I tried to write a test for 842913 and found that I completely don't understand how to catch a TypeError exception or compare a list of images within the fake provider against an expected list.  10:57
If post_upload_hook fails, the image never becomes ready and I hit the timeout in waitForImage, because it keeps retrying the upload to the fake provider until the timeout is reached, and in my case all of the uploads fail by design. I tried adding an option to override the default state=zk.READY in getMostRecentImageUpload within waitForImage, and that helped me get all the failed uploads. But it doesn't help me catch the TypeError raised during build execution. It is in the builder log, but I don't understand how to catch it in the test.
The problem is that in zookeeper the upload is marked as failed, so the assertions don't work. I also don't see a way to query the fake provider directly for a list of images to verify that some weren't deleted due to the exception.
Any help would be appreciated; I've tried on my own and am close to giving up.
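[editor's note: one generic way to assert on an exception that only shows up in a daemon's log (rather than propagating to the test) is Python's `assertLogs`. This standalone sketch is not nodepool's actual test fixture — the logger name, `upload_image` helper, and failure are all invented for illustration:]

```python
import logging
import unittest

log = logging.getLogger("builder")

def upload_image(image):
    """Stand-in for a builder step that logs its failure instead of
    raising, the way a daemon thread would."""
    try:
        return image["id"] + 1  # TypeError when id is None
    except TypeError:
        log.exception("post_upload_hook failed")

class TestUpload(unittest.TestCase):
    def test_type_error_is_logged(self):
        # assertLogs captures everything the named logger emits,
        # including the traceback text appended by log.exception().
        with self.assertLogs("builder", level="ERROR") as cm:
            upload_image({"id": None})
        self.assertIn("TypeError", cm.output[0])
```

The same pattern applies whether the logging happens in the test thread or a background one, as long as the logger name matches.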
@tristanc_:matrix.org corvus: using the ppa and dropping focal support sounds good to me  12:13
@fungicide:matrix.org does anyone happen to know why the author of the fileset spec ( https://review.opendev.org/839550 ) abandoned it less than a month after proposing? related to the earlier re2 discussion or something else?  12:29
@tristanc_:matrix.org I just sent http://lists.zuul-ci.org/pipermail/zuul-discuss/2022-May/001803.html , happy to talk about it here too :)  13:37
-@gerrit:opendev.org- Denys Mishchenko proposed: [zuul/nodepool] 842913: Fix for post_upload_hook failure. id missing https://review.opendev.org/c/zuul/nodepool/+/842913  13:52
@fungicide:matrix.org > <@tristanc_:matrix.org> I just sent http://lists.zuul-ci.org/pipermail/zuul-discuss/2022-May/001803.html , happy to talk about it here too :)  14:00
given the name, i guess you developed it primarily in order to identify unused jobs for cleanup?
@fungicide:matrix.org that's something i've been pondering for opendev, as people have a tendency to not remove job definitions when they stop running them  14:00
@fungicide:matrix.org though i suppose the distinction between configuration in a project used by our zuul and configuration in a project for use by another zuul isn't always clear-cut  14:01
@fungicide:matrix.org looking at the explanation of how it works, would it make sense to add the dependency graph and search methods into zuul's rest api?  14:04
@tristanc_:matrix.org fungi: that is correct, finding unused objects is the goal. Though this needs more thought to get it right; for example, a job can be unused in one deployment, but not in another.  14:04
@fungicide:matrix.org wondering why doing it in a separate service outside zuul-web was more attractive  14:05
@tristanc_:matrix.org fungi: it was easier to do the proof of concept as a standalone project, and we would like to connect the dependencies between multiple tenants and deployments, and i'm not sure how this would fit in the current zuul-web.  14:07
@fungicide:matrix.org oh, i see, cross-tenant analysis would indeed increase complexity, especially taking access controls for different tenants into account (you don't want to expose configuration information to users for a tenant they're not supposed to be able to access)  14:08
@tristanc_:matrix.org though we would love to bring such capabilities to zuul-web  14:09
@fungicide:matrix.org longer term, though, it would be pretty awesome to have more thorough search capabilities integrated into zuul. it comes up fairly often in opendev, and our current answer to users wanting to change or remove jobs is to loop over every branch of every project and query the rest api for those separately  14:10
@jim:acmegating.com i agree.  it seems like starting with doing that on a per-tenant basis would be reasonable, and if we want to do cross-tenant analysis, we could maybe think about adding an endpoint that requires a global admin token to get that info...  14:12
@jim:acmegating.com just a quick caution: zuul's zk storage is not a supported external interface (so we're totally going to change that without any kind of notice).  i know you know that, and given the scope and "proof of concept" label for the software, are likely willing to accept that for now.  just wanted to mention it though.  :)  14:14
@jpew:matrix.org Is there a plan for a zuul/nodepool release in the near future?  14:30
@jim:acmegating.com jpew: nothing concrete -- maybe in a week or so, if we restart opendev, i'd guess  14:34
@fungicide:matrix.org i'd like to be able to see the new nodepool components view in production. are there other big changes we'd want to get merged before a restart?  14:36
@jim:acmegating.com fungi: there is no view yet; just the data in zk for a potential future view  14:36
@fungicide:matrix.org er, right. i forgot it's not in the api yet  14:37
@jim:acmegating.com fungi: i'd wait for https://review.opendev.org/843020  14:38
@jpew:matrix.org corvus: Cool, thanks  14:38
@jpew:matrix.org fungi: Will the nodepool components view make nodepool show up in the `Components` section of the zuul web page?  14:39
@fungicide:matrix.org jpew: eventually, but the api part still needs to be done  14:41
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com:  16:02
- [zuul/nodepool] 841816: Add provider/pool priority support https://review.opendev.org/c/zuul/nodepool/+/841816
- [zuul/nodepool] 843020: Update some variable names https://review.opendev.org/c/zuul/nodepool/+/843020
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 840244: Deduplicate jobs in dependency cycles https://review.opendev.org/c/zuul/zuul/+/840244  16:35
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:  17:16
- [zuul/zuul-jobs] 843093: ensure-podman: Remove kubic from Ubuntu 18.04 and drop 20.04 https://review.opendev.org/c/zuul/zuul-jobs/+/843093
- [zuul/zuul-jobs] 843038: buildset registry: run socat in new session https://review.opendev.org/c/zuul/zuul-jobs/+/843038
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:  17:46
- [zuul/zuul-jobs] 843093: ensure-podman: Remove kubic from Ubuntu 18.04 and drop 20.04 https://review.opendev.org/c/zuul/zuul-jobs/+/843093
- [zuul/zuul-jobs] 843038: buildset registry: run socat in new session https://review.opendev.org/c/zuul/zuul-jobs/+/843038
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 840758: WIP: Add USER to Dockerfile https://review.opendev.org/c/zuul/zuul/+/840758  17:53
@iwienand:matrix.org corvus: it looks like they've restored podman builds to kubic there, although old versions that don't have backports https://github.com/containers/podman/issues/14336  19:45
@jim:acmegating.com tristanC: do you have any time/interest in looking at why zuul-jobs-test-registry-buildset-registry-openshift-docker is failing here: https://zuul.opendev.org/t/zuul/build/fa53c907d2534ae6b0c512ae5e1fc544  20:29
@jim:acmegating.com (i'm pretty sure that's some kind of bitrot)  20:30
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 843212: DNM: test buildset registry jobs for bitrot https://review.opendev.org/c/zuul/zuul-jobs/+/843212  20:32
@jim:acmegating.com tristanC: ^ well, let's see if it's bitrot  20:32
@jpew:matrix.org I'm trying to dequeue some jobs in a pipeline, and it's not doing anything.... it seems like the scheduler really doesn't want to do anything anymore: `Events: 0 trigger events, 9 management events, 714 results.`  20:53
@jpew:matrix.org And there is nothing interesting in the scheduler logs  20:53
@jpew:matrix.org .... does it have to wait for nodepool to spin up the nodes for the dequeued jobs before it can process the rest of the event queue maybe?  20:54
@clarkb:matrix.org jpew: it needs to process the 9 management events, then the 714 results, before taking new action  20:57
@clarkb:matrix.org the event loop is probably chewing on whatever those management events are, which I think could be your dequeues  20:58
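[editor's note: the ordering Clark describes can be illustrated with a toy event loop. This is not Zuul's real scheduler code — `ToyScheduler` and all its names are invented — it only shows why queued management events and results drain before new trigger events are acted on:]

```python
from collections import deque

# Toy illustration of the processing order: management events (such as
# dequeues), then build results, then trigger events.  A backlog in the
# first two queues delays any reaction to new triggers.
class ToyScheduler:
    def __init__(self):
        self.management = deque()
        self.results = deque()
        self.triggers = deque()
        self.processed = []

    def run_one_pass(self):
        while self.management:
            self.processed.append(("management", self.management.popleft()))
        while self.results:
            self.processed.append(("result", self.results.popleft()))
        while self.triggers:
            self.processed.append(("trigger", self.triggers.popleft()))

s = ToyScheduler()
s.triggers.append("new change")
s.management.append("dequeue change 123")
s.results.append("build 42 finished")
s.run_one_pass()
print([kind for kind, _ in s.processed])
# ['management', 'result', 'trigger']
```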
@jpew:matrix.org Ya, I can see it canceling the jobs now  20:59
@jpew:matrix.org I think it must not be able to cancel the jobs until the nodepool node has been allocated, if the allocation request has already been put in  21:00
@jpew:matrix.org Which.... takes a while for 450 nodes  21:00
@jim:acmegating.com it cancels node requests too  21:11
@jpew:matrix.org hmm  21:14
@jpew:matrix.org Some operation is taking a long time and the queue gets _really_ backed up  21:15
@jpew:matrix.org I can't figure out what though  21:15
@jpew:matrix.org It's back up to `Events: 2 trigger events, 2 management events, 347 results.`  21:15
@jpew:matrix.org Is there a way to look into the queue?  21:37
@jpew:matrix.org It keeps spiking up to 100's of pending events (like there is occasionally an event which is taking a really long time and stalling the queue)  21:39
@jim:acmegating.com tristanC: i think the openshift buildset registry thing is fine -- i think it's actually an ipv6 problem that is probably long-standing.  i think it might improve after we actually merge that change series and it gets used by the opendev image base jobs.  sorry for the (hopefully) false alarm  21:54
@jim:acmegating.com (we just got unlucky enough for all the jobs on that stack to hit ipv6 nodes, and my noop healthcheck job just hit an ipv4 node and worked)  21:55
@jim:acmegating.com i'll recheck https://review.opendev.org/843038 until it hits a v4 node to confirm  21:55
@jim:acmegating.com jpew: one thing to consider: if the changes you want to dequeue are in a gate pipeline, dequeue them back to front.  21:56
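[editor's note: a sketch of that back-to-front advice. Dequeuing the last change first means earlier items in the gate queue aren't re-enqueued/rebuilt after each removal. The command shape assumes the `zuul-client dequeue` CLI; the tenant/pipeline/project/change values are placeholders, and flags may differ on your deployment:]

```python
# Build dequeue commands in reverse queue order (back of the gate
# queue first), per the advice above.  Purely a command-construction
# sketch; nothing is executed here.
def dequeue_commands(tenant, pipeline, project, changes):
    """Yield dequeue command argv lists for `changes`, which are given
    front-of-queue first, in reverse (back-to-front) order."""
    for change in reversed(changes):
        yield ["zuul-client", "dequeue",
               "--tenant", tenant,
               "--pipeline", pipeline,
               "--project", project,
               "--change", change]

cmds = list(dequeue_commands("example", "gate", "org/proj",
                             ["1001,1", "1002,1", "1003,1"]))
print(cmds[0][-1])  # 1003,1 -- the back of the queue goes first
```

Each argv could then be passed to `subprocess.run` (or pasted as a shell command) one at a time.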
@jpew:matrix.org corvus: Ya, this goes beyond dequeuing I think.... something is making the scheduler trip up  21:57
@jpew:matrix.org I dug into zookeeper, and I think I've found the event queue  22:02
@jpew:matrix.org The event it's stuck on appears to just be a fairly ordinary BuildCompletedEvent though :/  22:02
@jim:acmegating.com tristanC: Clark okay i think i have some real info: ovh nodes come with an ipv6 entry in /etc/hosts like "2001:41d0:302:1000::1b4f contos-7-12345678" but no global ipv6 interface address configured  22:34
@jim:acmegating.com so basically that job fails when it runs on ovh because of the mismatch between /etc/hosts and the network config  22:34
@jim:acmegating.com tristanC Clark ianw do we know what is writing /etc/hosts on centos-7?  is there a glean setting we should tweak or something?  22:35
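[editor's note: the mismatch described above — a global IPv6 address in /etc/hosts with no matching interface address — can be detected with a small check. This is a hypothetical helper, not anything in glean or zuul-jobs; the sample addresses use documentation ranges:]

```python
import ipaddress

# Find global IPv6 addresses that /etc/hosts claims for the node but
# that no local interface actually carries.
def hosts_mismatch(hosts_entries, interface_addrs):
    """hosts_entries: (address, hostname) tuples parsed from /etc/hosts.
    interface_addrs: addresses currently configured on the interfaces.
    Returns the /etc/hosts IPv6 entries missing from the interfaces."""
    configured = {ipaddress.ip_address(a) for a in interface_addrs}
    missing = []
    for addr, name in hosts_entries:
        ip = ipaddress.ip_address(addr)
        if ip.version == 6 and not ip.is_link_local and ip not in configured:
            missing.append((addr, name))
    return missing

print(hosts_mismatch(
    [("127.0.0.1", "localhost"), ("2001:db8::1b4f", "centos-7-123")],
    ["192.0.2.10", "fe80::1"]))
# [('2001:db8::1b4f', 'centos-7-123')]
```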
@iwienand:matrix.org hrm, i do not think glean is doing that  22:57
@iwienand:matrix.org i feel like we have something in our clouds.yaml for OVH along the lines of "this has weird ipv6, disable it"... looking  22:58
@iwienand:matrix.org https://opendev.org/opendev/system-config/src/commit/977567ecef96b28f770f059a0f9d9e248903ac85/playbooks/templates/clouds/nodepool_clouds.yaml.j2#L51  22:59
@iwienand:matrix.org i wonder how much that plays into this ... i'm not sure exactly what force_ipv4 does, but clearly some ipv6 is getting in there  23:00
@iwienand:matrix.org "The above snippet will tell client programs to prefer returning an IPv4 address." so perhaps that doesn't disable anything as such  23:02
@clarkb:matrix.org ianw: corvus we forced ipv4 and no ipv6 because there was no way to get the ipv6 info to configure the interface from within the host. You had to query the nova/neutron api directly. It is possible they finally fixed that and we can just enable ipv6 now  23:02
@clarkb:matrix.org however, it is possible that multinode bridge or some other role is writing out the ipv6 from the nodepool info  23:02
@jim:acmegating.com i think the force_ipv4 is going to have nodepool avoid telling zuul there's an ipv6 addr  23:03
@clarkb:matrix.org and we still can't properly configure ipv6 without querying neutron  23:03
@clarkb:matrix.org but ya, ovh is weird in that the nodes do have ipv6 addresses; it was just never communicated how to configure that in the host itself  23:03
@iwienand:matrix.org this must be picked up via router advertisements, i guess?  23:03
@jim:acmegating.com Clark: i didn't see a zuul task that looked likely to write that to /etc/hosts, so i'm presuming it's there on boot (i'm 95% sure of this)  23:03
@iwienand:matrix.org syslog should say?  23:04
@jim:acmegating.com or, rather, "at job start" instead of "on boot"  23:04
@clarkb:matrix.org ianw: no, ovh doesn't do RAs, that's part of the problem  23:04
@clarkb:matrix.org ianw: OVH neither did RAs nor dhcpv6 nor provided the info in config drive. You had to query nova/neutron directly  23:04
@clarkb:matrix.org so to work around that we told things to not try ipv6 in ovh  23:04
@clarkb:matrix.org but I'm thinking something is leaking sufficiently to break us now  23:05
@clarkb:matrix.org it's in config drive now  23:06
@jim:acmegating.com if it's in config drive, glean would write it to /etc/hosts?  23:07
@jim:acmegating.com (but the iface still isn't configured...)  23:07
@clarkb:matrix.org yes, I think what we may want to do now is allow ipv6 in ovh since the info is in config drive  23:07
@clarkb:matrix.org then /etc/hosts and the interfaces should both be configured to match each other  23:07
@jim:acmegating.com does that clouds.yaml setting prevent glean from configuring the iface?  23:07
@clarkb:matrix.org I think it is something else. I thought it was a metadata key but I'm not seeing it  23:08
@clarkb:matrix.org oh you know what, I bet we only had to tell the nodepool/zuul side to ignore ipv6 because glean couldn't configure ipv6 in the host  23:11
@clarkb:matrix.org corvus: on ovh focal nodes I see the interface is configured. Where is it not configuring the interface?  23:12
@clarkb:matrix.org it's possible there is a glean issue configuring the interface for ipv6 on specific platforms. Focal seems to work  23:12
@jim:acmegating.com centos-7  23:13
@jim:acmegating.com (centos-9-stream works too)  23:13
@jim:acmegating.com  * centos-7 does not work  23:13
@clarkb:matrix.org centos 7 is not network manager, so ya, I bet this is a platform specific issue with glean  23:13
@clarkb:matrix.org basically we need to figure out how to configure ipv6 on centos-7 using the static info from config drive  23:13
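[editor's note: for context, a static IPv6 address on a RHEL-family host is configured with a few extra keys in the `ifcfg-*` sysconfig file (`IPV6INIT`, `IPV6ADDR`, `IPV6_DEFAULTGW`). This sketch only generates those lines; it is not glean's actual sysconfig writer, and the addresses are documentation-range placeholders:]

```python
# Rough sketch of what a glean fix might emit for a static IPv6
# address in a RHEL-family ifcfg file, using the standard sysconfig
# keys IPV6INIT, IPV6ADDR and IPV6_DEFAULTGW.
def ifcfg_ipv6_lines(address, prefix, gateway=None):
    lines = [
        "IPV6INIT=yes",
        f"IPV6ADDR={address}/{prefix}",
    ]
    if gateway:
        lines.append(f"IPV6_DEFAULTGW={gateway}")
    return lines

print("\n".join(ifcfg_ipv6_lines("2001:db8::1b4f", 64, "2001:db8::1")))
```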
@clarkb:matrix.org (side note: I've been wondering if opendev can delete centos 7 recently, as tripleo is off of it aiui)  23:14
@jim:acmegating.com apparently we use it for this openshift testing  23:14
@clarkb:matrix.org oh right, because openshift 4 is a giant <redacted>  23:14
@clarkb:matrix.org seriously, you cannot install openshift 4 without having a rack of machines, and they all do the coreos thing of basically treating everything like an appliance rather than software to install  23:15
@clarkb:matrix.org centos 7 has openshift 3 packages which are sane to install  23:15
@jim:acmegating.com full disclosure: i have no current plans to upgrade these roles/jobs for later versions of centos/openshift.  so i think it's reasonable for us to ask if anyone is interested in that, and if not, how long we want to continue centos-7/openshift-3.  23:17
@jim:acmegating.com  * full disclosure: i have no current plans to upgrade these roles/jobs for later versions of centos/openshift.  so i think it's reasonable for us to ask if anyone is interested in that, and if not, how long we want to continue centos-7/openshift-3 support for the buildset-registry-related jobs/roles.  23:17
@clarkb:matrix.org I looked into it briefly a while back and quickly realized openshift is not software they intend for anyone to install  23:17
@clarkb:matrix.org which is unfortunate, because openshift 3 was fine  23:17
@iwienand:matrix.org so to start with, we should just drop the ipv4 force in the config?  23:19
@clarkb:matrix.org ianw: ya, I think so. We may want to confirm both ovh clouds are doing ipv6 now  23:19
@clarkb:matrix.org since I only checked a focal host in one cloud (I didn't even bother to check which one /me goes and checks properly)  23:19
@jim:acmegating.com here are the options i see:  23:20
1) recheck until we get a non-ovh node and we can merge these immediate ansible-5 fixes
2) if we want to continue openshift-3 testing on centos-7, someone should fix glean (otherwise, we should back out the tests and then opendev can drop centos-7 nodes)
3) if we want to support openshift-4, someone should figure out how to do that.
@jim:acmegating.com i think 1 is a short-term step we should maybe do regardless, 2 is reasonable medium term, and 3 is long term.  23:20
@iwienand:matrix.org i feel like we use nm on centos-7 too  23:21
@jim:acmegating.com ianw Clark i don't think that clouds.yaml setting will have an effect; the failure isn't in zuul/nodepool, it's in openshift itself trying to use ipv6 with no iface configured  23:21
@clarkb:matrix.org ianw: both appear good, so I think we can drop that config to reduce confusion  23:21
@clarkb:matrix.org corvus: yes, removing the flag is more about reducing confusion than fixing the problem. I agree it won't fix the problem  23:21
@jim:acmegating.com ah okay, then, yes, we can drop it and zuul will talk to the nodes over ipv6... except, will that fail on centos-7 where there will be no ipv6 iface?  23:22
@jim:acmegating.com  * ah okay, then, yes, we can drop it and zuul will talk to the nodes over ipv6... except, will that fail on centos-7 where the ipv6 iface will not have the global address attached to it  23:22
@jim:acmegating.com  * ah okay, then, yes, we can drop it and zuul will talk to the nodes over ipv6... except, will that fail on centos-7 where the ipv6 iface will not have the global address attached to it?  23:22
@clarkb:matrix.org https://docs.okd.io/latest/installing/installing_sno/install-sno-preparing-to-install-sno.html for anyone wanting to do openshift 4  23:23
@clarkb:matrix.org corvus: oh ya, I bet it will  23:24
@jim:acmegating.com in that case i think #2 above would be a pre-req for changing the clouds.yaml config  23:24
@jim:acmegating.com (but either half of #2 -- either fix centos-7 or drop it)  23:24
@clarkb:matrix.org note the minimum resource requirements are larger than our base test node size. It also requires special dns records for some reason  23:25
@iwienand:matrix.org yeah, we are using the same path for NM on centos-7 as other platforms, fwiw  23:25
@clarkb:matrix.org ianw: we seem to skip ipv6 configuration for redhat platforms in glean entirely  23:28
@clarkb:matrix.org let me get a link  23:28
@clarkb:matrix.org https://opendev.org/opendev/glean/src/branch/master/glean/cmd.py#L269-L270 and that is the type set by ovh  23:28
@clarkb:matrix.org which makes me wonder if this is working on centos 9  23:29
@clarkb:matrix.org I think we can "fix" this by having /etc/hosts similarly ignore ipv6 addresses on red hat platforms. However, that will break platforms that do RAs  23:30
@clarkb:matrix.org I suspect this is also a problem on rackspace  23:30
@clarkb:matrix.org (since I don't think they do RAs)  23:30
@clarkb:matrix.org Or we can actually address the config where that continue is  23:32
@clarkb:matrix.org ya, just confirmed a centos-9-stream host on bhs1 does not have the ipv6 interface configured  23:35
@clarkb:matrix.org I don't know how rax works. Maybe they do RAs  23:35
@iwienand:matrix.org hrm, since we do this on the deb path using nm, i can't see why the setup would be different  23:35
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 840758: WIP: Add USER to Dockerfile https://review.opendev.org/c/zuul/zuul/+/840758  23:36
@clarkb:matrix.org yup, confirmed rax does RAs (the ipv6 static config doesn't end up in the sysconfig file)  23:37
@clarkb:matrix.org so ya, I think this is just a long-latent lack of functionality in glean that hasn't been a problem because the clouds always did RAs in addition to providing the network config. Except for ovh  23:38
@clarkb:matrix.org ianw: I don't think debian uses NM  23:38
@clarkb:matrix.org it does ENI and it properly writes out ipv6 config for it  23:38
@clarkb:matrix.org really, the best thing here would be to sort out what an ipv6 config is meant to look like, then configure it in glean  23:38
@iwienand:matrix.org i have about 10% of the work done for converting this to "keyfiles"  23:43
@iwienand:matrix.org perhaps we should do that as step 2  23:43
@iwienand:matrix.org that == ipv6  23:43
@clarkb:matrix.org what is a keyfile?  23:43
@iwienand:matrix.org indeed.  keyfiles are the native nm ini config files, instead of ifcfg-* style  23:44
@iwienand:matrix.org nm now ships a tool that converts them  23:44
@iwienand:matrix.org we have all this vlan and bonding stuff that i'm not sure is ever used  23:46
@tristanc_:matrix.org @corvus not sure if that matters, but in the build you linked, there is this warning (which does not appear in the other one): https://zuul.opendev.org/t/zuul/build/fa53c907d2534ae6b0c512ae5e1fc544/log/docker/k8s_scheduler_kube-scheduler-localhost_kube-system_061a93450f3edcadf64ae87542c300e5_0.txt#5  23:47
@clarkb:matrix.org I think ironic uses that  23:47
@clarkb:matrix.org ianw: looking at glean, the other issue is opensuse. I expect it is similarly broken  23:47
@clarkb:matrix.org anyway, I think we can probably add IPV6ADDR and the routes without too much trouble? But i'm not sure why they were never added in the first place  23:48
@iwienand:matrix.org yeah, i started out with hopes of replacement, but i think it's probably going to be a NM_USE_KEYFILE=1|0 situation: start with just simple interfaces, turn it on (starting with fedora), and catch up the other bits later  23:49

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!