openstackgerrit | James E. Blair proposed zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen https://review.opendev.org/c/zuul/zuul/+/785536 | 00:11 |
---|---|---|
openstackgerrit | James E. Blair proposed zuul/zuul master: Restore repo state in checkoutBranch https://review.opendev.org/c/zuul/zuul/+/786523 | 00:11 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Clarify merger updates and resets https://review.opendev.org/c/zuul/zuul/+/786744 | 00:11 |
*** ajitha has joined #zuul | 01:20 | |
*** evrardjp has quit IRC | 02:33 | |
*** evrardjp has joined #zuul | 02:33 | |
*** ikhan has joined #zuul | 03:13 | |
*** ykarel__ has joined #zuul | 03:51 | |
*** ikhan has quit IRC | 03:55 | |
*** bhavikdbavishi has joined #zuul | 04:21 | |
*** bhavikdbavishi1 has joined #zuul | 04:24 | |
*** bhavikdbavishi has quit IRC | 04:26 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:26 | |
*** ikhan has joined #zuul | 04:26 | |
*** ikhan has quit IRC | 04:33 | |
*** vishalmanchanda has joined #zuul | 04:53 | |
*** ajitha has quit IRC | 04:55 | |
*** ykarel_ has joined #zuul | 05:02 | |
*** ykarel__ has quit IRC | 05:05 | |
*** paladox has quit IRC | 05:09 | |
*** jfoufas1 has joined #zuul | 05:15 | |
*** paladox has joined #zuul | 05:18 | |
*** hamalq has joined #zuul | 05:20 | |
*** hamalq has quit IRC | 05:24 | |
*** sam_wan has joined #zuul | 05:33 | |
*** bhagyashris|away is now known as bhagyashris | 05:33 | |
*** dpawlik7 is now known as dpawlik | 06:22 | |
*** saneax has joined #zuul | 06:25 | |
*** jcapitao has joined #zuul | 06:55 | |
*** bhavikdbavishi has quit IRC | 07:01 | |
*** rpittau|afk is now known as rpittau | 07:44 | |
*** bhavikdbavishi has joined #zuul | 07:50 | |
*** bhavikdbavishi1 has joined #zuul | 07:53 | |
*** bhavikdbavishi has quit IRC | 07:55 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 07:55 | |
*** jpena|off is now known as jpena | 07:56 | |
*** tosky has joined #zuul | 08:03 | |
*** nilsph has joined #zuul | 08:04 | |
*** nilsph is now known as nils | 08:05 | |
*** vishalmanchanda has quit IRC | 08:22 | |
*** ykarel_ is now known as ykarel|lunch | 08:46 | |
*** ajitha has joined #zuul | 08:47 | |
*** vishalmanchanda has joined #zuul | 08:58 | |
*** hamalq has joined #zuul | 09:22 | |
*** hamalq has quit IRC | 09:26 | |
*** harrymichal has joined #zuul | 09:26 | |
*** harrymichal_ has joined #zuul | 09:27 | |
*** harrymichal has quit IRC | 09:31 | |
*** harrymichal_ is now known as harrymichal | 09:31 | |
*** ykarel|lunch is now known as ykarel | 09:41 | |
*** bhavikdbavishi has quit IRC | 10:07 | |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Remove statsd args to OpenStack API client call https://review.opendev.org/c/zuul/nodepool/+/786862 | 10:07 |
*** iurygregory has joined #zuul | 10:17 | |
*** amotoki has joined #zuul | 10:23 | |
*** saneax has quit IRC | 10:41 | |
*** saneax has joined #zuul | 10:55 | |
*** avass has quit IRC | 11:00 | |
*** jcapitao is now known as jcapitao_lunch | 11:04 | |
*** bhavikdbavishi has joined #zuul | 11:29 | |
*** bhavikdbavishi has quit IRC | 11:34 | |
*** jpena is now known as jpena|lunch | 11:34 | |
*** rlandy has joined #zuul | 11:36 | |
*** rlandy is now known as rlandy|rover | 11:36 | |
*** sduthil has joined #zuul | 11:49 | |
*** jcapitao_lunch is now known as jcapitao | 12:04 | |
*** bhavikdbavishi has joined #zuul | 12:07 | |
*** avass has joined #zuul | 12:09 | |
*** ricolin has quit IRC | 12:15 | |
*** harrymichal_ has joined #zuul | 12:23 | |
*** harrymichal has quit IRC | 12:24 | |
*** harrymichal_ is now known as harrymichal | 12:24 | |
*** ikhan has joined #zuul | 12:27 | |
*** paladox has quit IRC | 12:34 | |
*** paladox has joined #zuul | 12:35 | |
*** jpena|lunch is now known as jpena | 12:37 | |
*** harrymichal has quit IRC | 12:39 | |
*** harrymichal has joined #zuul | 12:39 | |
*** sam_wan has quit IRC | 12:47 | |
*** ykarel has quit IRC | 13:12 | |
*** bhavikdbavishi has quit IRC | 13:36 | |
*** ricolin has joined #zuul | 13:50 | |
*** harrymichal_ has joined #zuul | 13:58 | |
*** harrymichal has quit IRC | 14:02 | |
*** harrymichal_ is now known as harrymichal | 14:02 | |
*** ykarel has joined #zuul | 14:47 | |
*** ykarel has quit IRC | 15:12 | |
*** bhavikdbavishi has joined #zuul | 15:24 | |
*** bhavikdbavishi has quit IRC | 15:28 | |
*** rlandy|rover is now known as rlandy|rvr|biab | 16:06 | |
*** snktparik_ has joined #zuul | 16:11 | |
*** hamalq has joined #zuul | 16:12 | |
*** hamalq_ has joined #zuul | 16:15 | |
*** hamalq has quit IRC | 16:19 | |
*** jcapitao has quit IRC | 16:52 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Clarify merger updates and resets https://review.opendev.org/c/zuul/zuul/+/786744 | 16:54 |
*** jpena is now known as jpena|off | 16:59 | |
*** rpittau is now known as rpittau|afk | 17:11 | |
*** rlandy|rvr|biab is now known as rlandy|rover | 17:36 | |
*** snktparik_ has quit IRC | 17:40 | |
openstackgerrit | Andy Ladjadj proposed zuul/zuul master: [reporter][elasticsearch] fix the timestamp when the system has a différent timezone by forcing the UTC timezone https://review.opendev.org/c/zuul/zuul/+/786444 | 17:56 |
openstackgerrit | Andy Ladjadj proposed zuul/zuul master: [reporter][elasticsearch] fix the timestamp when the system has a different timezone by forcing the UTC timezone https://review.opendev.org/c/zuul/zuul/+/786444 | 17:57 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Clarify merger updates and resets https://review.opendev.org/c/zuul/zuul/+/786744 | 18:05 |
*** saneax has quit IRC | 18:06 | |
*** hamalq_ has quit IRC | 18:08 | |
*** hamalq has joined #zuul | 18:08 | |
*** paladox_ has joined #zuul | 18:17 | |
*** jfoufas1 has quit IRC | 18:18 | |
*** paladox_ has quit IRC | 18:37 | |
*** paladox_ has joined #zuul | 18:39 | |
*** paladox_ has quit IRC | 18:41 | |
*** paladox_ has joined #zuul | 18:42 | |
*** ajitha has quit IRC | 18:45 | |
*** paladox_ has quit IRC | 18:51 | |
*** paladox_ has joined #zuul | 18:52 | |
*** paladox has quit IRC | 19:28 | |
corvus | it looks like a new failure in zuul-quick-start, possibly gerrit related? | 19:46 |
corvus | https://zuul.opendev.org/t/zuul/build/921b9aa82eaa4ce9b8b45b040ee9c6c6 | 19:47 |
clarkb | hostkey verification failed | 19:48 |
clarkb | is it possibly related to the ssh-rsa vs rsa-sha2-512 stuff that we've run into on fedora? | 19:49 |
clarkb | gerrit should make an ecdsa hostkey though iirc | 19:50 |
corvus | i'm not seeing any apparently relevant gerrit changes in the past few days after a quick check | 19:51 |
corvus | i'll fire up a local quickstart and see what i can see | 19:51 |
fungi | is it repeating, or was it just the one failure? | 19:59 |
corvus | repeating | 20:00 |
corvus | last 3 builds across 2 changes https://zuul.opendev.org/t/zuul/builds?job_name=zuul-quick-start&project=zuul/zuul | 20:01 |
mordred | corvus: I remember reading something in gerrit releasenotes about java ssh libraries | 20:01 |
corvus | mordred: huh, i thought we were running master | 20:01 |
corvus | in quickstart | 20:01 |
corvus | maybe they just merged up something though? | 20:01 |
fungi | what platform is the ssh client on? | 20:02 |
corvus | since they do that "backwards" :) | 20:02 |
*** vishalmanchanda has quit IRC | 20:02 | |
corvus | mordred: well, we run their :latest image, which may not quite be every commit? | 20:02 |
mordred | corvus: I think what I remembered was a recent release saying they were going to remove something in the future - so maybe master removed it? | 20:03 |
mordred | this is VERY hand wavey | 20:03 |
clarkb | 3.4 is very close to going out the door aiui | 20:04 |
clarkb | (though they just started a new channel to work through the release so maybe they hit problems?) | 20:04 |
clarkb | https://www.gerritcodereview.com/3.4.html#jcraft-jsch-client-library-is-disabled-per-default | 20:04 |
corvus | fungi: it runs on the zuul test host which is ubuntu-bionic | 20:05 |
clarkb | "Use MINA sshd library for key generation and export" also on that page | 20:05 |
mordred | yeah - that's the stuff | 20:05 |
clarkb | maybe the hostkeys aren't getting generated properly and that prevents git review from receiving and validating them | 20:05 |
fungi | are we prepopulating a known_hosts list in the test or just doing tofu? | 20:06 |
corvus | i'm getting close to that point in my local test -- i had to spend 5 minutes getting docker to remove a bunch of old images | 20:06 |
fungi | if we prepopulate known_hosts then that could explain it (ssh picking up a different key type we didn't prepopulate with) | 20:07 |
corvus | shell: ssh-keyscan -p 29418 localhost > {{ workspace }}/known_hosts | 20:08 |
corvus | we do that | 20:08 |
corvus | it wfm using my existing rsa key | 20:09 |
corvus | (i say that because i have a recollection that what keys an agent knows about might influence what keyscan does?) | 20:09 |
corvus | hrm, keyscan seems to emit a lot of keys regardless? | 20:11 |
corvus | i get a ssh-rsa, ecdsa-sha2-nistp256, and ssh-ed25519 | 20:12 |
fungi | ssh-keyscan invoked like that does seem to grab multiple keys at least, and in theory if ssh-keyscan is from the same ssh install as the client connection we shouldn't get a separate list | 20:12 |
avass | I think keyscan just tries to negotiate with a couple of keys and returns what it finds | 20:12 |
corvus | same install | 20:12 |
corvus | hrm, latest on dockerhub is 19 days ago | 20:13 |
corvus | i wonder if our job saves the docker image hash | 20:13 |
clarkb | corvus: can it be the client side and be a problem like the fedora ssh-rsa is disabled issue? | 20:13 |
clarkb | (not sure where we run git-review from in the job) | 20:14 |
corvus | clarkb: ubuntu-bionic node | 20:14 |
clarkb | bionic should be fine | 20:14 |
fungi | yeah, the openssh-client package for bionic is a couple years old now | 20:18 |
fungi | the libssl package (which it relies on) was updated in mid-february but i don't see anything in there which would be related | 20:18 |
fungi | i suppose we could hold a node and inspect it | 20:20 |
fungi | or tweak the job to try and get more debugging info out of the ssh connection attempt and the keyscan before it | 20:21 |
clarkb | ya or even just add a `ssh -vvv -p 29418 localhost gerrit ls-projects` and see what comes out of that | 20:22 |
corvus | after having run the quickstart playbook past that location, i started a bionic container and ran ssh-keyscan and then git-review -s in it, and it worked fine | 20:26 |
corvus | so i'm at like 99% local test fidelity; i think the only other thing would be to re-run it without using my local ssh key and agent setup; but i'm struggling to understand how it could affect it. | 20:27 |
fungi | looks like we built a new ubuntu-bionic image just shy of 3 hours ago, around the time of that first failure | 20:27 |
corvus | i'm inclined to go with the hold-node route | 20:27 |
fungi | though that failure was also the first time it ran in 17 hours | 20:27 |
avass | oh.. zuul.change is not the same in a post and gate pipeline for github connections | 20:27 |
corvus | (i did do an apt-get dist-upgrade on the container) | 20:27 |
fungi | so really anything in the last 20 hours which changed is suspect | 20:28 |
corvus | avass: there should be no change in a post pipeline, only a ref | 20:28 |
corvus | avass: (true for all code review systems) | 20:28 |
fungi | avass: yep, this is sort of why we use a promote pipeline, since it's triggered on the change-merged event rather than the ref-updated event | 20:28 |
fungi | so you get a change context in promote vs a (merge) commit context in post | 20:29 |
corvus | but note that a promote pipeline doesn't work with the actual git sha in the authoritative repository. so you have to be careful choosing between the two depending on exactly what you're doing. | 20:29 |
corvus | (building a traceable artifact? use post; okay making something functionally equivalent to what landed like docs? use promote) | 20:29 |
avass | that's a bit annoying, I was hoping to get the artifact produced in the gate and promote it to be the new ci image | 20:30 |
fungi | if zuul pushed its merge commit state rather than asking gerrit to perform the merge, that could in theory be rectified | 20:30 |
corvus | avass: you can still do that, you just have to be aware of the compromises | 20:30 |
corvus | and yes, also that ^ (would be true for github too) | 20:30 |
corvus | avass: fwiw, we accept that tradeoff and use promote to push zuul's dockerhub images | 20:31 |
corvus | they are functionally equivalent to the corresponding git commits | 20:32 |
corvus | but the hashes are ... difficult to trace. | 20:32 |
corvus | i'll set a hold for quickstart | 20:32 |
avass | wait, is the difference what it triggers on or the type of pipeline? (just realized I had an independent pipeline which is not what I want either) | 20:33 |
avass | oh I see what you mean with ref-updated vs change-merged | 20:34 |
corvus | yep, we were using 'post' in the colloquial meaning of "independent pipeline triggered by ref-updated or similar event" :) | 20:34 |
avass | obviously :) | 20:35 |
corvus | and promote being "supercedent pipeline triggered by change-merged event" | 20:35 |
corvus | avass: it's documented ;) https://zuul-ci.org/docs/zuul/reference/glossary.html#term-post | 20:35 |
avass | Now I just need to figure out how to do that with github | 20:36 |
corvus | avass: how to make a promote pipeline? if you do, pls update https://zuul-ci.org/docs/zuul/reference/drivers/github.html#reference-pipelines-configuration | 20:37 |
*** guilhermesp has quit IRC | 20:37 | |
*** paladox_ has quit IRC | 20:37 | |
corvus | avass: you can probably start with opendev's promote as a reference and maybe find a corresponding pr-merged event | 20:38 |
*** erbarr has quit IRC | 20:38 | |
corvus | should probably add promote to the glossary too | 20:38 |
*** mnasiadka has quit IRC | 20:38 | |
*** mnaser has quit IRC | 20:38 | |
*** guilhermesp has joined #zuul | 20:38 | |
*** mnasiadka has joined #zuul | 20:39 | |
*** erbarr has joined #zuul | 20:39 | |
*** mnaser has joined #zuul | 20:39 | |
avass | corvus: yeah that's what I'm looking for. I just hadn't realized the zuul: variables were set depending on what triggered the pipelines | 20:39 |
*** paladox_ has joined #zuul | 20:39 | |
corvus | avass: technically it's the type of item (which has a fairly defined mapping to triggers, so it's mostly interchangeable; just that multiple triggers can produce the same type of item) | 20:41 |
corvus | avass: the docs have the zuul variables sectioned by item type: https://zuul-ci.org/docs/zuul/reference/jobs.html#change-items | 20:42 |
*** paladox_ is now known as paladox | 20:42 | |
corvus | avass: so the section above that applies to all types, then there's change, branch, tag, and ref item types | 20:42 |
corvus | you'll get a branch type in post, and a change type in promote | 20:44 |
corvus | i've re-enqueued 764444 with an autohold in place; will check back in a bit | 20:46 |
openstackgerrit | Albin Vass proposed zuul/zuul master: Add example github promote and deploy pipelines https://review.opendev.org/c/zuul/zuul/+/786977 | 20:49 |
avass | I think that's it ^ but I'm gonna test it quickly as well | 20:49 |
corvus | avass: those words look reasonable to me :) | 20:58 |
openstackgerrit | Albin Vass proposed zuul/zuul master: Add example github promote and deploy pipelines https://review.opendev.org/c/zuul/zuul/+/786977 | 21:00 |
avass | corvus: just slightly wrong syntax | 21:00 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Support key versions and unique names in ZK keystorage https://review.opendev.org/c/zuul/zuul/+/786774 | 21:16 |
corvus | avass, fungi, clarkb, mordred: regarding the component/subcomponent issue -- i think step 1 there is to take advantage of the move to zk secrets to make sure we avoid problems there; in that change ^ swest proposed we use urllib.parse.quote_plus -- do you see any issues with using that in the filesystem too (and to be clear, i'm only asking about the internal merger repos right now -- how to resolve it in | 21:16 |
corvus | the workspace checkouts is a different matter) | 21:16 |
fungi | so that we store foo/bar as foo%2Fbar in zk? | 21:19 |
clarkb | I think url quoting is safe for zk node names | 21:19 |
clarkb | fungi: that is my understanding yes | 21:19 |
corvus | fungi: yep, but i'd like to use the same thing for the filesystem for the merger-internal repos | 21:19 |
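The quoting discussed here can be sketched in Python. This is a minimal illustration that assumes only the standard-library `urllib.parse.quote_plus` named above; the helper name `quoted_project_name` is hypothetical and is not taken from the Zuul change.

```python
from urllib.parse import quote_plus, unquote_plus


def quoted_project_name(name: str) -> str:
    """Hypothetical helper: flatten a project name into one path-safe component."""
    # quote_plus escapes "/" as "%2F", so the result contains no path separators.
    return quote_plus(name)


print(quoted_project_name("foo/bar"))                # foo%2Fbar
print(unquote_plus(quoted_project_name("foo/bar")))  # foo/bar
```

Because the quoting is reversible, the original project name can be recovered from the ZooKeeper node name or directory name with `unquote_plus`.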
clarkb | corvus: does zk have a limit on the number of entries at any single node level? converting all /s to %2F might create many more than we typically expect | 21:19 |
clarkb | old ext does have practical limits but ext4 is much better | 21:19 |
clarkb | note in opendev's case I think we're still talking on the order of thousands or maybe tens of thousands of entries at a single level which is also well below old ext limits | 21:22 |
fungi | using url quoting seems like a reasonable solution to me, unless there's something more typical for sanitizing strings in zookeeper node names | 21:22 |
corvus | https://stackoverflow.com/questions/29791134/zookeeper-max-number-of-children-per-node says 4mb max packet size | 21:22 |
avass | corvus: those pipelines do the correct thing | 21:22 |
fungi | also url quoting is probably more readable for people than, say, base64 | 21:23 |
avass | corvus: and no I don't see any problems with that | 21:23 |
clarkb | 200k children seems like plenty (but is definitely a limit) | 21:24 |
corvus | clarkb: opendev's mean project name length is 26 bytes, so we could have 161319 projects per connection | 21:24 |
clarkb | corvus: does that include adding 2 extra bytes for each / :) (I think this limit is fine for opendev, but maybe we double check with other users that they don't do a project per employee or something) | 21:25 |
corvus | oh, but bump that to 28 chars because of %2f, so 149796 :) | 21:25 |
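The two project counts quoted above can be reproduced with a short calculation. This is a rough sketch that assumes the 4 MB packet limit means 4 * 1024 * 1024 bytes and that each child costs exactly its name length, with no per-entry overhead; real ZooKeeper responses may well differ.

```python
# Assumption: 4 MB is 4 * 1024 * 1024 bytes and each child name costs
# exactly its length in bytes (no per-entry overhead is modelled).
MAX_PACKET_BYTES = 4 * 1024 * 1024


def max_children(name_length: int) -> int:
    """Number of names of the given length that fit in one packet."""
    return MAX_PACKET_BYTES // name_length


print(max_children(26))  # 161319
print(max_children(28))  # 149796
```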
corvus | clarkb: heh | 21:25 |
corvus | clarkb: i feel fairly confident this would be a regression for no current users | 21:27 |
corvus | but if we're concerned about it, maybe we can come up with a quick way to shard? | 21:27 |
clarkb | the last two letters/digits system employed by gerrit seems to work well. But I agree not sure that is necessary given scale of current use | 21:28 |
corvus | yeah, that space is pretty even though, and may not shard as well for project names (imagine 100,000 projects all ending with -python) | 21:29 |
avass | just wait until someone connects every github project | 21:29 |
corvus | both first and last characters suffer from that; we could hash them, but then it's unpredictable for users | 21:30 |
fungi | maybe shard by a (truncated) hash of the string | 21:30 |
fungi | hah, that, yes | 21:31 |
fungi | i agree hard to find things in a sort that way | 21:31 |
fungi | you get to experience that first hand when dealing with pypi file urls | 21:31 |
fungi | or the local python package cache for that matter | 21:31 |
corvus | i think i'm inclined to accept the "one or two hundred thousand projects per connection" limit for now and kick that can down the road for a bit. | 21:35 |
avass | well the same gerrit instance could also be sharded over several connections if that would ever happen | 21:36 |
clarkb | wfm | 21:36 |
corvus | i had one more idea - | 21:36 |
corvus | we could do a sort of pseudo-hierarchy, where we assume every project either has zero or 1 path components and treat the first one specially. so "foo/bar" goes into "foo/foo%2fbar" and "baz" goes into "_/baz" | 21:37 |
corvus | (i should say has zero, or at least 1, but we just ignore everything after 1) | 21:38 |
corvus | that would produce some reasonable sharding most of the time. probably. | 21:39 |
corvus | then you get a limit of 200,000 github orgs :) | 21:39 |
corvus | (per connection) | 21:39 |
corvus | i guess that's a variation on the "first chars" method of sharding | 21:40 |
corvus | just with a variable number of first chars | 21:40 |
corvus | under the assumption that going up to a / produces a meaningful differentiation. if you had a flat gerrit system that would not be true of course, everything would be under _/ | 21:41 |
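The pseudo-hierarchy described above could look roughly like the following. This is only a sketch of the idea as stated in the chat, not the code from the Zuul change; `sharded_path` is a hypothetical name, and `quote_plus` emits `%2F` in upper case rather than the lower-case `%2f` typed above.

```python
from urllib.parse import quote_plus


def sharded_path(project: str) -> str:
    """Hypothetical helper: shard by the first path component of a project name."""
    first, sep, _rest = project.partition("/")
    # Projects without a "/" have no first component, so they all share "_".
    shard = first if sep else "_"
    return f"{shard}/{quote_plus(project)}"


print(sharded_path("foo/bar"))  # foo/foo%2Fbar
print(sharded_path("baz"))      # _/baz
```

Everything after the first `/` is ignored when choosing the shard, so `a/b/c` lands under `a/` alongside every other project in that namespace.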
avass | I think that's probably going to be good enough :) | 21:45 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Pseudo-shard unique project names in keystore https://review.opendev.org/c/zuul/zuul/+/786983 | 22:01 |
corvus | okay, i left the latest version of the keystore patch with quote_plus, then wrote that as a followup so we can shard by first path component if we want | 22:02 |
corvus | hrm, maybe just drop the _ and always go with path[0] | 22:04 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Pseudo-shard unique project names in keystore https://review.opendev.org/c/zuul/zuul/+/786983 | 22:05 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Require dib 3.10.0 https://review.opendev.org/c/zuul/nodepool/+/786984 | 22:16 |
ianw | fungi: ^ as discussed in #opendev | 22:17 |
fungi | thanks! | 22:18 |
fungi | i wonder if there's a better way to limit the churn in that reqs entry, but i guess that's the most straightforward way to get a new nodepool image | 22:19 |
corvus | fungi, ianw: right now, it seems to accurately reflect what's going on; it might be worth seeing if some of the churn in dib could be moved out? like into another repo of elements that are treated as user data or something? | 22:21 |
corvus | i don't know what the churn in dib is, but if it is elements, then having a standard library of elements repo which could be checked out and updated independently of dib releases makes a lot of sense to me. | 22:23 |
fungi | yeah, in this case the impetus was for opendev to get new nodepool container images containing a version of dib which had working support for building debian-buster images | 22:23 |
fungi | which was almost certainly just stuff in the debian-minimal element | 22:24 |
ianw | in theory, you could pull out the elements/* subdirectory into its own repo. it would be a lot of churn in reworking our various jobs and deployment bits to account for that | 22:24 |
corvus | yeah, and there's trade-offs the other way too if you want dib to be batteries-included (do you tell new users to install dib *and* clone the elements repo? do you bundle a slightly stale version and let users override with a local install?) | 22:25 |
corvus | anyway, just throwing out ideas. no big deal. | 22:27 |
fungi | yep, it's worth noodling on, nothing urgent | 22:27 |
ianw | we did go down that path a little once before with a dib-utils repo. that was supposed to have things that might be useful outside of dib (the run-parts implementation). it caused confusion and never became something generically useful, so we moved run-parts back into dib and dropped the dependency | 22:28 |
fungi | really just didn't want to bug zuul maintainers every time opendev's adding a new node type | 22:28 |
fungi | i suppose it's not a super frequent occurrence | 22:29 |
openstackgerrit | Ian Wienand proposed zuul/nodepool master: Remove statsd args to OpenStack API client call https://review.opendev.org/c/zuul/nodepool/+/786862 | 22:42 |
ianw | tristancC: ^ I made the openstacksdk update dependent, that should be clearer | 22:42 |
ianw | also, packet tracing openstacksdk it seems to be sending floats, which i don't think statsd handles. another yak to shave | 22:43 |
clarkb | corvus: I think your hold caught one | 22:50 |
clarkb | corvus: though the change number you gave above seems to be for a magnum change so can't confirm it failed due to the same hostkeys issue (though I assume that is the case) | 22:52 |
corvus | clarkb: https://zuul.opendev.org/t/zuul/build/3a468907f29e4d79b51f3234b98e30b3/log/zuul-info/inventory.yaml | 22:53 |
corvus | feel free to shell in | 22:53 |
*** decimuscorvinus has quit IRC | 22:54 | |
clarkb | ya that matches what nodepool says was held, cool | 22:54 |
corvus | /tmp/tutorial-zuul/known_hosts is empty | 22:54 |
clarkb | and same issue in the log | 22:55 |
clarkb | if I try `ssh-keyscan -v -p 29418 localhost` it says connection refused | 22:56 |
corvus | yeah, that has me confused | 22:56 |
*** rlandy|rover has quit IRC | 22:56 | |
clarkb | if I run that within the container it works | 22:57 |
clarkb | I get ssh-rsa ecdsa-sha2-nistp256 and ssh-ed25519 keys from within the container | 22:58 |
clarkb | docker ps -a shows 0.0.0.0:29418->29418/tcp as the port forwards for the gerrit container | 22:59 |
corvus | ssh-keyscan -p 29418 127.0.0.1 works | 22:59 |
corvus | ::1 returns connection refused | 22:59 |
corvus | did something suddenly start preferring ipv6 lo? | 23:00 |
clarkb | ya ssh-keyscan -4 works but -6 breaks | 23:00 |
clarkb | 127.0.0.1 localhost is in /etc/hosts before ::1 localhost ip6-localhost ip6-loopback but maybe that doesn't matter if keyscan is trying to use ipv6 | 23:01 |
clarkb | netstat -lnp shows that the 29418 listen is 0.0.0.0:29418 not :::29418 | 23:02 |
clarkb | that explains why ipv6 isn't working. but not why ipv6 is being attempted and then ipv4 is ignored | 23:02 |
fungi | oh! | 23:02 |
fungi | so this is what came up when i was working on updating the git-review jobs to run on focal | 23:03 |
fungi | and i mentioned it in #opendev as something we should remember in case we run into it again with other job updates | 23:03 |
fungi | i updated the git-review functional test to connect to 127.0.0.1 explicitly as a workaround | 23:04 |
fungi | i think what's changed is /etc/hosts | 23:04 |
fungi | but i didn't expect that to creep into bionic | 23:04 |
fungi | i was seeing it in focal | 23:04 |
clarkb | do we make the docker port forward ipv4 specific? the internet makes it sound like it may do ipv6 by default (which then captures v4 too) | 23:05 |
fungi | and the node you're on is definitely ubuntu-bionic? | 23:05 |
clarkb | fungi: any idea what about /etc/hosts changed? it looks about how I would expect it | 23:05 |
clarkb | fungi: /etc/os-release says so as is the ovh test node name | 23:06 |
fungi | mmm | 23:06 |
fungi | could the gerrit container have changed to switch from listening on :: to 0.0.0.0? | 23:06 |
fungi | no, that's probably set in the config we supply | 23:07 |
corvus | fungi: gerrit image says it's weeks old | 23:07 |
corvus | we use a lot of defaults; i'm unsure if we override any listening lines | 23:07 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Use ssh-keyscan -4 in quick-start https://review.opendev.org/c/zuul/zuul/+/786988 | 23:07 |
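The workaround named in that change title amounts to adding `-4` to the keyscan invocation shown earlier in the log. A small sketch of building that command line, assuming a hypothetical helper `keyscan_command` (the real job runs the command from a playbook shell task, not from Python):

```python
def keyscan_command(host: str, port: int, ipv4_only: bool = True) -> list:
    """Hypothetical helper: build an ssh-keyscan argument list."""
    command = ["ssh-keyscan"]
    if ipv4_only:
        command.append("-4")  # restrict ssh-keyscan to IPv4 addresses
    command += ["-p", str(port), host]
    return command


print(" ".join(keyscan_command("localhost", 29418)))  # ssh-keyscan -4 -p 29418 localhost
```

The returned list could be handed to `subprocess.run` with its output redirected to a known_hosts file.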
clarkb | fungi: corvus it's specifically the host side that listens on 0.0.0.0 though | 23:09 |
fungi | but yeah, if we configure gerrit to listen only on 0.0.0.0 but then say to connect to localhost and /etc/hosts also resolves that to ::1 (in addition to 127.0.0.1) then maybe what's changed is the libc socket decision making? | 23:09 |
*** nils has quit IRC | 23:09 | |
clarkb | I don't think the container side matters as long as the other end of the nat is right | 23:09 |
clarkb | the docker compose file doesn't say anything about listening addrs for the proxy | 23:11 |
corvus | i think 0.0.0.0 is consistent with past behavior from docker, so i want to guess based on my vague memory that this is ssh-keyscan/libc/something deciding to prefer ::1 instead of 127.0.0.1 | 23:11 |
clarkb | fwiw the container side is also listening on 0.0.0.0 | 23:12 |
fungi | again, surprised a low-level behavior like that would change in a 3-year-old ubuntu lts | 23:12 |
fungi | i'd expect them to be more careful about what gets backported there | 23:12 |
fungi | also as we noted, we didn't release a new version of dib recently, until after that failure started, so i don't think it's a change to how we're building the node image | 23:13 |
clarkb | /etc/hosts in the container looks really similar to the host too | 23:13 |
clarkb | which makes me suspect toolchain more than config | 23:13 |
clarkb | could it be a change to keyscan itself? | 23:14 |
fungi | what's in /etc/gai.conf? | 23:15 |
fungi | anything uncommented? | 23:15 |
clarkb | just comments | 23:16 |
fungi | last glibc change in bionic-updates was december too | 23:16 |
clarkb | socket.getaddrinfo('localhost', 29418, socket.IPPROTO_IP) shows both ipv6 and ipv4 addrs returned as expected | 23:18 |
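A slightly more readable version of that check is sketched below. It assumes nothing beyond the standard `socket` module; the helper name `resolved_addresses` is hypothetical.

```python
import socket


def resolved_addresses(host: str, port: int) -> list:
    """Return (family name, address) pairs in the order the resolver reports them."""
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [
        (socket.AddressFamily(family).name, sockaddr[0])
        for family, _type, _proto, _canonname, sockaddr in infos
    ]
```

Calling `resolved_addresses("localhost", 29418)` on the held node would show whether `::1` or `127.0.0.1` is listed first, which is the order a client walking the list would normally try them in.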
*** decimuscorvinus has joined #zuul | 23:19 | |
*** decimuscorvinus has quit IRC | 23:20 | |
fungi | i originally assumed this was a behavior change in focal, which is why i just hard-coded the git-review tests to use 127.0.0.1 for connecting to test gerrit instances | 23:20 |
fungi | but seeing it spontaneously crop up on bionic is troublesome | 23:20 |
*** decimuscorvinus has joined #zuul | 23:21 | |
*** tosky has quit IRC | 23:23 | |
fungi | it's openssh-client 7.6p1-4ubuntu0.3 installed, yeah? | 23:23 |
fungi | and libc6 2.27-3ubuntu1.4 | 23:24 |
*** decimuscorvinus has joined #zuul | 23:25 | |
clarkb | looking | 23:26 |
clarkb | ii openssh-client 1:7.6p1-4ubuntu0.3 and ii libc6:amd64 2.27-3ubuntu1.4 | 23:26 |
*** decimuscorvinus has quit IRC | 23:27 | |
fungi | yeah, so those are the same (fairly old) versions i see in the ubuntu package changelogs as well | 23:28 |
*** decimuscorvinus has joined #zuul | 23:30 | |
*** decimuscorvinus has quit IRC | 23:30 | |
*** decimuscorvinus has joined #zuul | 23:31 | |
clarkb | Looking at a strace it creates 3 sockets. It does so by first connecting to the ipv4 address then closing that socket and starting over with ipv6 using the same fd number | 23:32 |
clarkb | like it decides for some reason that it shouldn't use ipv4 | 23:33 |
clarkb | https://github.com/openssh/openssh-portable/blob/master/ssh-keyscan.c#L363-L378 this loop I think | 23:36 |
clarkb | the behavior in the loop ^ there is slightly different to what my strace implies though as we should hit that break after the first connection to 127.0.0.1 because the return code is 0 | 23:40 |
clarkb | now to find the ubuntu source I guess | 23:40 |
corvus | apt-get source ? | 23:40 |
corvus | not sure if we have source repos on those hosts | 23:41 |
corvus | nope | 23:41 |
clarkb | http://archive.ubuntu.com/ubuntu/pool/main/o/openssh/openssh_7.6p1.orig.tar.gz I think | 23:42 |
clarkb | if (connect(s, ai->ai_addr, ai->ai_addrlen) < 0 && errno != EINPROGRESS) is the condition there. Slightly different but still should be fine based on the strace I think | 23:43 |
clarkb | hrm though I am seeing the close(s) I think. So maybe I am hitting the error condition and strace isn't showing it properly? | 23:45 |
clarkb | or I'm looking at the wrong code I guess | 23:45 |
fungi | the ubuntu openssh source is that orig.tar.gz with the corresponding diff.gz applied and then probably patches from the included debian/patches/* tree applied | 23:48 |
clarkb | ah right may be other patches applied. Anyway I think I may have read this wrong we aren't looping quite the way I thought we were. The fd is used for name lookups first then for connecting to | 23:51 |
fungi | you should be able to use the dget tool pointed at the dsc file to do all that magically | 23:51 |
clarkb | we are in that loop but the first item tried is ipv6 and we don't try any others | 23:51 |
fungi | dget -u http://archive.ubuntu.com/ubuntu/pool/main/o/openssh/openssh_7.6p1-4ubuntu0.3.dsc | 23:51 |
fungi | cd openssh-7.6p1/ | 23:51 |
clarkb | (I noticed the connection types go from DGRAM to STREAM which is how I figured that out) | 23:51 |
fungi | that tree will have all the patches applied | 23:51 |
clarkb | and the ipv6 connection returns = -1 EINPROGRESS (Operation now in progress) | 23:52 |
clarkb | so it drops out of the loop assuming it connected fine | 23:52 |
clarkb | could the change be in socket() itself? | 23:52 |
clarkb | perhaps previously we would have gotten a different errno like "this is broken nothing there" | 23:53 |
clarkb | seems like ssh-keyscan should wait until the connection has completed before breaking and returning there, but openbsd has probably put more thought into this than me :) | 23:54 |
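The behaviour being described can be modelled without real sockets. The sketch below is a simulation, not the ssh-keyscan source: `try_connect` is a hypothetical stand-in for a non-blocking `connect()` that returns 0 or an errno value, and the loop keeps the first address whose error is either absent or `EINPROGRESS`, as in the condition quoted above.

```python
import errno
from typing import Callable, Iterable, Optional


def first_accepted(addresses: Iterable[str],
                   try_connect: Callable[[str], int]) -> Optional[str]:
    """Return the first address whose connect attempt is not treated as a failure."""
    for address in addresses:
        err = try_connect(address)
        # Mirrors the quoted condition: only errors other than EINPROGRESS
        # make the loop move on to the next address.
        if err != 0 and err != errno.EINPROGRESS:
            continue
        return address
    return None


# Simulated outcome: the IPv6 attempt is still "in progress" when checked.
outcomes = {"::1": errno.EINPROGRESS, "127.0.0.1": 0}
print(first_accepted(["::1", "127.0.0.1"], outcomes.__getitem__))  # ::1
```

Under this model, an IPv6 address listed first wins even though nothing is listening on it, because the refusal only surfaces after the loop has already stopped.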
clarkb | another explanation is that the order of getaddrinfo changed | 23:55 |
fungi | yeah, i was trying to debunk that one what with checking gai.conf and the age of the glibc source in bionic | 23:56 |
fungi | which are what should decide it | 23:56 |
clarkb | getaddrinfo() seems to find 127.0.0.1 first then ::1 then ::ffff:127.0.0.1 but the order of the ai list wouldn't necessarily be that I suppose | 23:57 |
clarkb | AF_UNSPEC alone doesn't seem to imply an order | 23:59 |