Monday, 2022-05-23

*** Guest0 is now known as prometheanfire00:17
*** mazzy50988129295808594944 is now known as mazzy509881292958085949400:47
fricklerso ... some story about debootstrap. by default it indeed only uses the release pocket with too old ca-certs06:06
fricklerin theory one could add --extra-suites=focal-updates,focal-security, but then the fun begins06:07
fricklerthat option is made to be able to add extra pkgs from other sources, it doesn't deal with pkgs appearing multiple times06:08
ianwi think using http:// is a good work around :)06:08
frickleror put differently, debootstrap still uses the first version of a pkg it finds, there is no mechanism to select the most recent version like apt would do06:09
fricklerso I tried to turn things around, create a "focal-updates" image and have "--extra-suites=focal" as fallback for pkgs that didn't receive updates06:10
fricklerthis uncovers a bug in the updated apt, which fails to install from scratch at all06:10
fricklerthe only feasible solution I found would be to install an updated ca-certificates pkg via dpkg after debootstrap finishes06:11
fricklerianw: seems a bit sad, but likely it is indeed. even jammy will have to fall back to that if the situation persists once the released ca-certs are no longer good enough06:12
ianwyeah, i think the main mirror is cloudflare, which probably doesn't use LE certs?  maybe a lot of people use https via that, hiding this even further?06:13
ianwwe're probably getting fairly unique even using debootstrap, but also against our own mirrors06:14
fricklerack06:15
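The difference frickler describes between debootstrap and apt can be sketched as two selection strategies over the same package index. This is only an illustration: the `Candidate` type, both function names, and the version numbers are hypothetical, and versions are compared as plain tuples rather than with real Debian version ordering.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    package: str
    version: Tuple[int, ...]  # simplified; NOT real Debian version ordering
    suite: str


def pick_first_found(index: List[Candidate], package: str) -> Optional[Candidate]:
    """Take the first entry encountered, as debootstrap is described to do."""
    for candidate in index:
        if candidate.package == package:
            return candidate
    return None


def pick_newest(index: List[Candidate], package: str) -> Optional[Candidate]:
    """Take the highest version across all suites, as apt would."""
    matches = [c for c in index if c.package == package]
    return max(matches, key=lambda c: c.version) if matches else None


# Made-up versions: the release pocket is listed first, the update second.
index = [
    Candidate("ca-certificates", (1, 0), "focal"),
    Candidate("ca-certificates", (2, 0), "focal-updates"),
]

print(pick_first_found(index, "ca-certificates").suite)
print(pick_newest(index, "ca-certificates").suite)
```

With the release pocket listed first, the first-found strategy keeps the old package even though a newer one is available, which matches the ca-certificates problem discussed above.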
fricklerianw: did you see the dib failures at https://review.opendev.org/c/openstack/diskimage-builder/+/842856 ? I haven't found out what is breaking there, not sure if I should just keep rechecking06:16
ianwumm, haven't checked dib queue today06:16
frickleralso please add https://review.opendev.org/c/openstack/project-config/+/842853 to your review list, not sure if my idea works and waiting for a periodic run to test is tedious06:17
ianw2022-05-22 14:00:00.016130 | LOOP [push-to-intermediate-registry : Push tag to intermediate registry]06:17
ianw2022-05-22 14:29:42.793690 | POST-RUN END RESULT_TIMED_OUT: [trusted : opendev.org/opendev/base-jobs/playbooks/buildset-registry/post.yaml@master]06:17
frickleryes, I looked at the registry log, too, but didn't find out what is broken there06:18
ianwthat's weird06:18
ianwyeah the ir logs seem to cut off at 14:0006:19
ianwhttps://7e29afca27547df970c8-f36a7ace61553ff461a9764933cd7ea3.ssl.cf5.rackcdn.com/842856/1/check/opendev-buildset-registry/1e94e9a/docker/buildset_registry.txt06:20
ianwi dunno, probably rechecking and if it replicates we'll need to look more06:21
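As a quick check of the two job-output timestamps ianw pasted, the time between the start of the push loop and the timed-out result can be computed directly. A minimal sketch; the helper name `elapsed` is an assumption, the timestamps are copied from the log lines above.

```python
from datetime import datetime

# Timestamp layout used in the pasted job-output lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed(start: str, end: str):
    """Return the timedelta between two job-output timestamps."""
    return datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(
        start, LOG_TIME_FORMAT
    )


push_started = "2022-05-22 14:00:00.016130"  # LOOP [push-to-intermediate-registry ...]
timed_out = "2022-05-22 14:29:42.793690"     # POST-RUN END RESULT_TIMED_OUT

duration = elapsed(push_started, timed_out)
print(duration)  # 0:29:42.777560
```

So the post-run playbook spent just under half an hour in the push step before the timeout was reported.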
*** ysandeep|out is now known as ysandeep|rover06:25
fricklerianw: well this was two times in a row. the first time I just assumed some network or whatever hiccup, but twice the same I don't know. but ok, let's go for three to have more confidence06:25
*** frenzy_friday is now known as frenzyfriday|ruck06:34
*** jpena|off is now known as jpena07:33
*** elodilles is now known as elodilles_afk08:31
*** ysandeep|rover is now known as ysandeep|rover|lunch08:38
opendevreviewDr. Jens Harbott proposed openstack/diskimage-builder master: DNM: Testing registry failures  https://review.opendev.org/c/openstack/diskimage-builder/+/84292809:16
fricklerianw: failed the third time in a row at the same step. testing with ^^ now to make sure it is really independent.09:19
*** ysandeep|rover|lunch is now known as ysandeep|rover10:07
mnasiadkamgoddard, yoctozepto: the haproxy single frontend patch is ready for review - https://review.opendev.org/c/openstack/kolla-ansible/+/823395 (and CI in https://review.opendev.org/c/openstack/kolla-ansible/+/841239) if you have some time10:18
yoctozeptomnasiadka: let's discuss on #openstack-kolla, not here10:20
mnasiadkaups10:20
mnasiadkamakes sense :)10:20
yoctozepto:-)10:20
*** rlandy|out is now known as rlandy10:21
*** dviroel|out is now known as dviroel11:21
fricklerweird post failure on a docs build in gate, maybe someone else sees more than I do? https://zuul.opendev.org/t/openstack/build/8246b40d474545b99f7e0dc5134fbad111:40
fungilooks like the test node got unhappy in the post-run playbook after copying sphinx-build-pdf.log from work/logs/ and before or during the copy of whatever would have been in work/artifacts/11:47
fungi"kex_exchange_identification: Connection closed by remote host" is usually an indication that the sshd is unhappy11:48
fungithough interestingly, that build has artifacts11:49
fungioh, that's uploaded by the fetch-sphinx-tarball role in the earlier post-run playbook11:52
funginot by the fetch-output role, which is what broke11:52
fungimy guess is something unexlected (ECLOUD) happened to the node, causing ssh to start insta-closing new connections at that moment11:53
fungier, something unexpected11:54
*** elodilles_afk is now known as elodilles12:16
*** ysandeep|rover is now known as ysandeep|rover|brb12:21
fricklerah, right, I missed that, looks like just the usual cloud hiccup indeed12:30
*** ysandeep|rover|brb is now known as ysandeep|rover12:37
fricklerzuul held the failing nodepool-build-image-siblings job, but not the depending registry, that's not helpful for debugging13:14
fungimight instead need to patch the job in the broken change to just force it to wait prior to the upload, and increase the timeout(s)?13:16
frickleryeah, I guess for now I'll stick to hoping someone with more experience with this setup will pick things up ;)13:55
Clark[m]Not really here yet but I would check file sizes for the image (did it explode in size causing the job to timeout uploading it?) And maybe check the intermediate registry logs (that's the insecure CI registry)13:59
opendevreviewJoseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add role variable prepare_workspace_delete_dest  https://review.opendev.org/c/zuul/zuul-jobs/+/84272314:36
opendevreviewJoseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add variable prepare_workspace_delete_dest  https://review.opendev.org/c/zuul/zuul-jobs/+/84272314:37
opendevreviewJoseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add variable prepare_workspace_delete_dest  https://review.opendev.org/c/zuul/zuul-jobs/+/84272314:44
opendevreviewMohammed Naser proposed openstack/project-config master: neutron: add neutron-vpnaas-stable-maint  https://review.opendev.org/c/openstack/project-config/+/84298514:49
*** dviroel is now known as dviroel|lunch15:20
*** mazzy50988129295808594948 is now known as mazzy509881292958085949415:43
*** marios is now known as marios|out15:44
*** mazzy50988129295808594940 is now known as mazzy509881292958085949415:57
*** dviroel|lunch is now known as dviroel16:29
*** ysandeep|rover is now known as ysandeep|out16:30
corvus3 nodepool image builds in a row failed with post_failure; i checked the first and it timed out while pushing to the intermediate registry; i assume the same is true for the other 216:57
corvusthe failed job started pushing at 15:42 and timed out at 16:0916:57
clarkbcorvus: I think frickler  was looking at that but unsure of progress17:00
clarkblooks like frickler  was hoping someone else with more knowledge of the setup could look at it next17:01
corvusi suspect all the worker threads may be stuck.  i don't think we have metrics for that or a sigusr2 handler, so difficult to confirm.17:04
corvus#status log restarted zuul-registry since it appeared to be stuck17:04
fungiand its logs were basically silent?17:09
*** jpena is now known as jpena|off17:10
corvusfungi: yeah, a few ssl connection errors over the past few days17:34
corvusnot even the usual complement of entries from bots/crawlers/etc17:34
fungiinteresting17:37
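corvus notes that there is no SIGUSR2 handler to confirm whether worker threads are stuck. A generic way to get that kind of visibility in a Python service is a signal handler that dumps every thread's stack. The sketch below is not zuul-registry code; both function names are hypothetical and it uses only the standard library.

```python
import signal
import sys
import threading
import traceback


def format_thread_stacks() -> str:
    """Return a text dump of the current stack of every running thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, 'unknown')} ({ident}):")
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)


def install_stack_dump_handler(stream=sys.stderr) -> None:
    """Write all thread stacks to `stream` whenever the process gets SIGUSR2.

    Must be called from the main thread; SIGUSR2 is not available on Windows.
    """

    def handler(signum, frame):
        print(format_thread_stacks(), file=stream)

    signal.signal(signal.SIGUSR2, handler)
```

A service would call `install_stack_dump_handler()` once at startup; sending `kill -USR2 <pid>` would then show where each worker is blocked without restarting the process.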
*** rlandy is now known as rlandy|mtg18:14
fricklerah, those were actual nodepool builds failing. I had only been looking at dib failures. but it seems both are repaired now \o/18:59
*** rlandy|mtg is now known as rlandy19:11
johnsomDoes anyone know what is up with the "capture-performance-data" task "Unknown database 'stats'" ?19:20
johnsomhttps://zuul.opendev.org/t/openstack/build/52a212790e0f4ce3b29b7ed3448b10a8/log/job-output.txt#784719:20
johnsomOh, that isn't a zuul task, it's devstack. I will go bug the qa channel.19:22
johnsomAh, that is a side effect of: ERROR: Could not find a version that satisfies the requirement os-brick===5.2.019:34
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/84302320:13
opendevreviewMerged zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/84302320:36
corvusinfra-root: ^ heads up that touches every job (it passed a base-test cycle, so should be fine)20:42
fungiyep, thanks! i was satisfied with the results of the base-test testing21:00
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/84302621:09
*** dviroel is now known as dviroel|out21:21
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/84302621:24
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/84302621:37
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/84302621:44
*** rlandy is now known as rlandy|biab22:01
corvusinfra-root: i have piloted switching jobs for the zuul project to ansible 5 and fixed two issues that came up.  it would probably be good for someone to do similar with some openstack jobs before opendev and/or zuul switches the default version22:15
clarkbI've just added ^ this topic to the meeting agenda which I'll send out soon22:16
opendevreviewClark Boylan proposed opendev/system-config master: Try system-config-run jobs with ansible v5  https://review.opendev.org/c/opendev/system-config/+/84303222:29
opendevreviewJames E. Blair proposed openstack/project-config master: Set "zuul" tenant default Ansible version to 5  https://review.opendev.org/c/openstack/project-config/+/84303422:31
clarkbI pushed a change to devstack and one to system-config to start collecting info for those sets of jobs. I expect we'll get decent coverage out of those to start22:32
corvusclarkb: depends-on https://review.opendev.org/843026 yeah?22:36
clarkbcorvus: not yet but just noticed I need to22:36
corvusfungi: ^ there are replies to your question on that, if you feel like re-approving it22:36
corvusclarkb: ++22:36
clarkbcorvus: do we record the ansible version in the job log somewhere to be extra sure (the failure indicates it is working though)22:37
corvusthat's in "multinode" so it's going to hit a lot22:37
opendevreviewClark Boylan proposed opendev/system-config master: Try system-config-run jobs with ansible v5  https://review.opendev.org/c/opendev/system-config/+/84303222:37
corvusclarkb: heh, yeah so far "red" has been the easiest way to tell :)22:37
corvusclarkb: 2022-05-23 22:33:27.907109 | Ansible Version: 2.9.2722:39
corvusclarkb:  2022-05-23 22:31:42.537820 | Ansible Version: 2.12.522:40
corvus2.12 == 522:40
corvuslooks like it's just that in job-output.txt; i don't see it in any of the other files22:42
clarkbthanks22:42
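Since the Ansible version only shows up in job-output.txt, confirming which version a job ran means searching that file for the "Ansible Version:" line. A minimal sketch, assuming the line format quoted above; the function names and the lookup table are hypothetical, and the table holds only the "2.12 == 5" mapping stated by corvus. Versions not in the table are reported as their own major.minor.

```python
import re
from typing import List, Tuple

# Matches lines such as "... | Ansible Version: 2.12.5"
VERSION_LINE = re.compile(r"Ansible Version: (\d+)\.(\d+)\.(\d+)")

# ansible-core (major, minor) -> Ansible package version, per "2.12 == 5" above.
CORE_TO_PACKAGE = {(2, 12): "5"}


def ansible_versions(job_output: str) -> List[Tuple[int, int, int]]:
    """Return every Ansible version reported in a job-output.txt text."""
    return [
        (int(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in VERSION_LINE.finditer(job_output)
    ]


def package_version(core_version: Tuple[int, int, int]) -> str:
    """Map a reported version to the Ansible package version it corresponds to."""
    major, minor = core_version[:2]
    # Assumption: anything not in the table is shown as plain major.minor.
    return CORE_TO_PACKAGE.get((major, minor), f"{major}.{minor}")


sample = (
    "2022-05-23 22:33:27.907109 | Ansible Version: 2.9.27\n"
    "2022-05-23 22:31:42.537820 | Ansible Version: 2.12.5\n"
)
for version in ansible_versions(sample):
    print(version, "->", package_version(version))
```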
clarkbcorvus: devstack is hitting https://docs.ansible.com/ansible-core/2.12/user_guide/become.html#risks-of-becoming-an-unprivileged-user22:55
clarkbat first glance I'm a bit worried that this is going to affect things more broadly22:55
clarkbrc: 1, err: chmod: invalid mode: \u2018A+user:stack:rx:allow\u2019\nTry 'chmod --help' for more information.\n is the error22:56
clarkbA+user:stack:rx:allow I'm not even sure how to process that22:57
clarkb"POSIX-draft ACL specification. Solaris, maybe others." from the ansible source. Well we aren't solaris and it doesn't work23:01
clarkbok so the issue is we're unprivileged as zuul and trying to do privileged task of chowning a file to another user?23:03
clarkbFor whatever reason they allow the solaris case to fall through and send you that error message regardless of the platform you are on :/23:03
clarkbbut how did this ever work?23:04
clarkbthis is quickly feeling like a thread I don't want to pull on right now as it will likely end up like that pip install thread23:04
corvusclarkb: i agree, neither of those things make sense (why would this be a problem now and not before; and why would adding the extra thing for solaris cause a failure?)23:09
*** rlandy|biab is now known as rlandy23:18
opendevreviewMerged zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge  https://review.opendev.org/c/zuul/zuul-jobs/+/84302623:47
*** rlandy is now known as rlandy|out23:50
