Monday, 2021-04-19

openstackgerritJames E. Blair proposed zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen  https://review.opendev.org/c/zuul/zuul/+/78553600:11
openstackgerritJames E. Blair proposed zuul/zuul master: Restore repo state in checkoutBranch  https://review.opendev.org/c/zuul/zuul/+/78652300:11
openstackgerritJames E. Blair proposed zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/78674400:11
*** ajitha has joined #zuul01:20
*** evrardjp has quit IRC02:33
*** evrardjp has joined #zuul02:33
*** ikhan has joined #zuul03:13
*** ykarel__ has joined #zuul03:51
*** ikhan has quit IRC03:55
*** bhavikdbavishi has joined #zuul04:21
*** bhavikdbavishi1 has joined #zuul04:24
*** bhavikdbavishi has quit IRC04:26
*** bhavikdbavishi1 is now known as bhavikdbavishi04:26
*** ikhan has joined #zuul04:26
*** ikhan has quit IRC04:33
*** vishalmanchanda has joined #zuul04:53
*** ajitha has quit IRC04:55
*** ykarel_ has joined #zuul05:02
*** ykarel__ has quit IRC05:05
*** paladox has quit IRC05:09
*** jfoufas1 has joined #zuul05:15
*** paladox has joined #zuul05:18
*** hamalq has joined #zuul05:20
*** hamalq has quit IRC05:24
*** sam_wan has joined #zuul05:33
*** bhagyashris|away is now known as bhagyashris05:33
*** dpawlik7 is now known as dpawlik06:22
*** saneax has joined #zuul06:25
*** jcapitao has joined #zuul06:55
*** bhavikdbavishi has quit IRC07:01
*** rpittau|afk is now known as rpittau07:44
*** bhavikdbavishi has joined #zuul07:50
*** bhavikdbavishi1 has joined #zuul07:53
*** bhavikdbavishi has quit IRC07:55
*** bhavikdbavishi1 is now known as bhavikdbavishi07:55
*** jpena|off is now known as jpena07:56
*** tosky has joined #zuul08:03
*** nilsph has joined #zuul08:04
*** nilsph is now known as nils08:05
*** vishalmanchanda has quit IRC08:22
*** ykarel_ is now known as ykarel|lunch08:46
*** ajitha has joined #zuul08:47
*** vishalmanchanda has joined #zuul08:58
*** hamalq has joined #zuul09:22
*** hamalq has quit IRC09:26
*** harrymichal has joined #zuul09:26
*** harrymichal_ has joined #zuul09:27
*** harrymichal has quit IRC09:31
*** harrymichal_ is now known as harrymichal09:31
*** ykarel|lunch is now known as ykarel09:41
*** bhavikdbavishi has quit IRC10:07
openstackgerritIan Wienand proposed zuul/nodepool master: Remove statsd args to OpenStack API client call  https://review.opendev.org/c/zuul/nodepool/+/78686210:07
*** iurygregory has joined #zuul10:17
*** amotoki has joined #zuul10:23
*** saneax has quit IRC10:41
*** saneax has joined #zuul10:55
*** avass has quit IRC11:00
*** jcapitao is now known as jcapitao_lunch11:04
*** bhavikdbavishi has joined #zuul11:29
*** bhavikdbavishi has quit IRC11:34
*** jpena is now known as jpena|lunch11:34
*** rlandy has joined #zuul11:36
*** rlandy is now known as rlandy|rover11:36
*** sduthil has joined #zuul11:49
*** jcapitao_lunch is now known as jcapitao12:04
*** bhavikdbavishi has joined #zuul12:07
*** avass has joined #zuul12:09
*** ricolin has quit IRC12:15
*** harrymichal_ has joined #zuul12:23
*** harrymichal has quit IRC12:24
*** harrymichal_ is now known as harrymichal12:24
*** ikhan has joined #zuul12:27
*** paladox has quit IRC12:34
*** paladox has joined #zuul12:35
*** jpena|lunch is now known as jpena12:37
*** harrymichal has quit IRC12:39
*** harrymichal has joined #zuul12:39
*** sam_wan has quit IRC12:47
*** ykarel has quit IRC13:12
*** bhavikdbavishi has quit IRC13:36
*** ricolin has joined #zuul13:50
*** harrymichal_ has joined #zuul13:58
*** harrymichal has quit IRC14:02
*** harrymichal_ is now known as harrymichal14:02
*** ykarel has joined #zuul14:47
*** ykarel has quit IRC15:12
*** bhavikdbavishi has joined #zuul15:24
*** bhavikdbavishi has quit IRC15:28
*** rlandy|rover is now known as rlandy|rvr|biab16:06
*** snktparik_ has joined #zuul16:11
*** hamalq has joined #zuul16:12
*** hamalq_ has joined #zuul16:15
*** hamalq has quit IRC16:19
*** jcapitao has quit IRC16:52
openstackgerritJames E. Blair proposed zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/78674416:54
*** jpena is now known as jpena|off16:59
*** rpittau is now known as rpittau|afk17:11
*** rlandy|rvr|biab is now known as rlandy|rover17:36
*** snktparik_ has quit IRC17:40
openstackgerritAndy Ladjadj proposed zuul/zuul master: [reporter][elasticsearch] fix the timestamp when the system has a différent timezone by forcing the UTC timezone  https://review.opendev.org/c/zuul/zuul/+/78644417:56
openstackgerritAndy Ladjadj proposed zuul/zuul master: [reporter][elasticsearch] fix the timestamp when the system has a different timezone by forcing the UTC timezone  https://review.opendev.org/c/zuul/zuul/+/78644417:57
openstackgerritJames E. Blair proposed zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/78674418:05
*** saneax has quit IRC18:06
*** hamalq_ has quit IRC18:08
*** hamalq has joined #zuul18:08
*** paladox_ has joined #zuul18:17
*** jfoufas1 has quit IRC18:18
*** paladox_ has quit IRC18:37
*** paladox_ has joined #zuul18:39
*** paladox_ has quit IRC18:41
*** paladox_ has joined #zuul18:42
*** ajitha has quit IRC18:45
*** paladox_ has quit IRC18:51
*** paladox_ has joined #zuul18:52
*** paladox has quit IRC19:28
corvusit looks like a new failure in zuul-quick-start, possibly gerrit related?19:46
corvushttps://zuul.opendev.org/t/zuul/build/921b9aa82eaa4ce9b8b45b040ee9c6c619:47
clarkbhostkey verification failed19:48
clarkbis it possibly related to the ssh-rsa vs rsa-sha2-512 stuff that we've run into on fedora?19:49
clarkbgerrit should make an ecdsa hostkey though iirc19:50
corvusi'm not seeing any apparently relevant gerrit changes in the past few days after a quick check19:51
corvusi'll fire up a local quickstart and see what i can see19:51
fungiis it repeating, or was it just the one failure?19:59
corvusrepeating20:00
corvuslast 3 builds across 2 changes https://zuul.opendev.org/t/zuul/builds?job_name=zuul-quick-start&project=zuul/zuul20:01
mordredcorvus: I remember reading something in gerrit releasenotes about java ssh libraries20:01
corvusmordred: huh, i thought we were running master20:01
corvusin quickstart20:01
corvusmaybe they just merged up something though?20:01
fungiwhat platform is the ssh client on?20:02
corvussince they do that "backwards" :)20:02
*** vishalmanchanda has quit IRC20:02
corvusmordred: well, we run their :latest image, which may not quite be every commit?20:02
mordredcorvus: I think what I remembered was a recent release saying they were going to remove something in the future - so maybe master removed it?20:03
mordredthis is VERY hand wavey20:03
clarkb3.4 is very close to going out the door aiui20:04
clarkb(though they just started a new channel to work through the release so maybe they hit problems?)20:04
clarkbhttps://www.gerritcodereview.com/3.4.html#jcraft-jsch-client-library-is-disabled-per-default20:04
corvusfungi: it runs on the zuul test host which is ubuntu-bionic20:05
clarkb"Use MINA sshd library for key generation and export" also on that page20:05
mordredyeah - that's the stuff20:05
clarkbmaybe the hostkeys aren't getting generated properly and that prevents git review from receiving and validating them20:05
fungiare we prepopulating a known_hosts list in the test or just doing tofu?20:06
corvusi'm getting close to that point in my local test -- i had to spend 5 minutes getting docker to remove a bunch of old images20:06
fungiif we prepopulate known_hosts then that could explain it (ssh picking up a different key type we didn't prepopulate with)20:07
corvus  shell: ssh-keyscan -p 29418 localhost > {{ workspace }}/known_hosts20:08
corvuswe do that20:08
corvusit wfm using my existing rsa key20:09
corvus(i say that because i have a recollection that what keys an agent knows about might influence what keyscan does?)20:09
corvushrm, keyscan seems to emit a lot of keys regardless?20:11
corvusi get a ssh-rsa, ecdsa-sha2-nistp256, and ssh-ed2551920:12
fungissh-keyscan invoked like that does seem to grab multiple keys at least, and in theory if ssh-keyscan is from the same ssh install as the client connection we shouldn't get a separate list20:12
avassI think keyscan just tries to negotiate with a couple of keys and returns what it finds20:12
corvussame install20:12
corvushrm, latest on dockerhub is 19 days ago20:13
corvusi wonder if our job saves the docker image hash20:13
clarkbcorvus: can it be the client side and be a problem like the fedora ssh-rsa is disabled issue?20:13
clarkb(not sure where we run git-review from in the job)20:14
corvusclarkb: ubuntu-bionic node20:14
clarkbbionic should be fine20:14
fungiyeah, the openssh-client package for bionic is a couple years old now20:18
fungithe libssl package (which it relies on) was updated in mid-february but i don't see anything in there which would be related20:18
fungii suppose we could hold a node and inspect it20:20
fungior tweak the job to try and get more debugging info out of the ssh connection attempt and the keyscan before it20:21
clarkbya or even just add a `ssh -vvv -p 29418 localhost gerrit ls-projects` and see what comes out of that20:22
corvusafter having run the quickstart playbook past that location, i started a bionic container and ran ssh-keyscan and then git-review -s in it, and it worked fine20:26
corvusso i'm at like 99% local test fidelity; i think the only other thing would be to re-run it without using my local ssh key and agent setup; but i'm struggling to understand how it could affect it.20:27
fungilooks like we built a new ubuntu-bionic image just shy of 3 hours ago, around the time of that first failure20:27
corvusi'm inclined to go with the hold-node route20:27
fungithough that failure was also the first time it ran in 17 hours20:27
avassoh.. zuul.change is not the same in a post and gate pipeline for github connections20:27
corvus(i did do an apt-get dist-upgrade on the container)20:27
fungiso really anything in the last 20 hours which changed is suspect20:28
corvusavass: there should be no change in a post pipeline, only a ref20:28
corvusavass: (true for all code review systems)20:28
fungiavass: yep, this is sort of why we use a promote pipeline, since it's triggered on the change-merged event rather than the ref-updated event20:28
fungiso you get a change context in promote vs a (merge) commit context in post20:29
corvusbut note that a promote pipeline doesn't work with the actual git sha in the authoritative repository.  so you have to be careful choosing between the two depending on exactly what you're doing.20:29
corvus(building a traceable artifact? use post;  okay making something functionally equivalent to what landed like docs?  use promote)20:29
avassthat's a bit annoying, I was hoping to get the artifact produced in the gate and promote it to be the new ci image20:30
fungiif zuul pushed its merge commit state rather than asking gerrit to perform the merge, that could in theory be rectified20:30
corvusavass: you can still do that, you just have to be aware of the compromises20:30
corvusand yes, also that ^ (would be true for github too)20:30
corvusavass: fwiw, we accept that tradeoff and use promote to push zuul's dockerhub images20:31
corvusthey are functionally equivalent to the corresponding git commits20:32
corvusbut the hashes are ... difficult to trace.20:32
corvusi'll set a hold for quickstart20:32
avasswait, is the difference what it triggers on or the type of pipeline? (just realized I had an independent pipeline which is not what I want either)20:33
avassoh I see what you mean with ref-updated vs change-merged20:34
corvusyep, we were using 'post' in the colloquial meaning of "independent pipeline triggered by ref-updated or similar event" :)20:34
avassobviously :)20:35
corvusand promote being "supercedent pipeline triggered by change-merged event"20:35
corvusavass: it's documented ;)  https://zuul-ci.org/docs/zuul/reference/glossary.html#term-post20:35
avassNow I just need to figure out how to do that with github20:36
corvusavass: how to make a promote pipeline?  if you do, pls update https://zuul-ci.org/docs/zuul/reference/drivers/github.html#reference-pipelines-configuration20:37
*** guilhermesp has quit IRC20:37
*** paladox_ has quit IRC20:37
corvusavass: you can probably start with opendev's promote as a reference and maybe find a corresponding pr-merged event20:38
*** erbarr has quit IRC20:38
corvusshould probably add promote to the glossary too20:38
*** mnasiadka has quit IRC20:38
*** mnaser has quit IRC20:38
*** guilhermesp has joined #zuul20:38
*** mnasiadka has joined #zuul20:39
*** erbarr has joined #zuul20:39
*** mnaser has joined #zuul20:39
avasscorvus: yeah that's what I'm looking for. I just hadn't realized the zuul: variables were set depending on what triggered the pipelines20:39
*** paladox_ has joined #zuul20:39
corvusavass: technically it's the type of item (which has a fairly defined mapping to triggers, so it's mostly interchangeable; just that multiple triggers can produce the same type of item)20:41
corvusavass: the docs have the zuul variables sectioned by item type: https://zuul-ci.org/docs/zuul/reference/jobs.html#change-items20:42
*** paladox_ is now known as paladox20:42
corvusavass: so the section above that applies to all types, then there's change, branch, tag, and ref item types20:42
corvusyou'll get a branch type in post, and a change type in promote20:44
corvusi've re-enqueued 764444 with an autohold in place; will check back in a bit20:46
openstackgerritAlbin Vass proposed zuul/zuul master: Add example github promote and deploy pipelines  https://review.opendev.org/c/zuul/zuul/+/78697720:49
avassI think that's it ^ but I'm gonna test it quickly as well20:49
corvusavass: those words look reasonable to me :)20:58
openstackgerritAlbin Vass proposed zuul/zuul master: Add example github promote and deploy pipelines  https://review.opendev.org/c/zuul/zuul/+/78697721:00
avasscorvus: just slightly wrong syntax21:00
openstackgerritJames E. Blair proposed zuul/zuul master: Support key versions and unique names in ZK keystorage  https://review.opendev.org/c/zuul/zuul/+/78677421:16
corvusavass, fungi, clarkb, mordred: regarding the component/subcomponent issue -- i think step 1 there is to take advantage of the move to zk secrets to make sure we avoid problems there; in that change ^ swest proposed we use urllib.parse.quote_plus -- do you see any issues with using that in the filesystem too (and to be clear, i'm only asking about the internal merger repos right now -- how to resolve it in21:16
corvusthe workspace checkouts is a different matter)21:16
fungiso that we store foo/bar as foo%2Fbar in zk?21:19
clarkbI think url quoting is safe for zk node names21:19
clarkbfungi: that is my understanding yes21:19
corvusfungi: yep, but i'd like to use the same thing for the filesystem for the merger-internal repos21:19
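A minimal sketch of the quoting being discussed, assuming nothing beyond the standard library. The helper name `to_node_name` is hypothetical and is not taken from the Zuul change; only `urllib.parse.quote_plus` comes from the conversation.

```python
from urllib.parse import quote_plus, unquote_plus


def to_node_name(project_name):
    """Flatten a project name into a single path-safe component.

    Hypothetical helper for illustration. quote_plus() has an empty
    default `safe` set, so "/" is percent-encoded as well.
    """
    return quote_plus(project_name)


encoded = to_node_name("foo/bar")
print(encoded)                # the slash becomes %2F
print(unquote_plus(encoded))  # decoding restores the original name
```

Because the encoding is reversible, the same string could in principle serve both as a ZooKeeper node name and as a directory name for the merger-internal repos, which is the question being asked here.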
clarkbcorvus: does zk have a limit on the number of entries at any single node level? converting all /s to %2F might create many more than we typically expect21:19
clarkbold ext does have practical limits but ext4 is much better21:19
clarkbnote in opendev's case I think we're still talking on the order of thousands or maybe tens of thousands of entries at a single level which is also well below old ext limits21:22
fungiusing url quoting seems like a reasonable solution to me, unless there's something more typical for sanitizing strings in zookeeper node names21:22
corvushttps://stackoverflow.com/questions/29791134/zookeeper-max-number-of-children-per-node says 4mb max packet size21:22
avasscorvus: those pipelines do the correct thing21:22
fungialso url quoting is probably more readable for people than, say, base6421:23
avasscorvus: and no I don't see any problems with that21:23
clarkb200k children seems like plenty (but is definitely a limit)21:24
corvusclarkb: opendev's mean project name length is 26 bytes, so we could have 161319 projects per connection21:24
clarkbcorvus: does that include adding 2 extra bytes for each / :) (I think this limit is fine for opendev, but maybe we double check with other users that they don't do a project per employee or something)21:25
corvusoh, but bump that to 28 chars because of %2f, so 149796 :)21:25
corvusclarkb: heh21:25
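The two figures quoted above can be reproduced with a few lines of arithmetic. This is a rough sketch under the same simplifying assumptions as the chat: a 4 MiB packet limit (the figure from the linked Stack Overflow answer), only the bytes of the child names counted, and no per-entry protocol overhead. The function name `max_children` is hypothetical.

```python
# Assumed limit: 4 MiB max packet size for listing a node's children.
MAX_PACKET_BYTES = 4 * 1024 * 1024


def max_children(mean_name_bytes):
    """Estimate how many child names fit in one packet.

    Counts only the name bytes themselves, so the real ceiling
    would be somewhat lower.
    """
    return MAX_PACKET_BYTES // mean_name_bytes


# Mean project name length of 26 bytes, as quoted for opendev.
print(max_children(26))
# "/" (1 byte) becomes "%2F" (3 bytes): 2 extra bytes for one slash per name.
print(max_children(26 + 2))
```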
corvusclarkb: i feel fairly confident this would be a regression for no current users21:27
corvusbut if we're concerned about it, maybe we can come up with a quick way to shard?21:27
clarkbthe last two letters/digits system employed by gerrit seems to work well. But I agree not sure that is necessary given scale of current use21:28
corvusyeah, that space is pretty even though, and may not shard as well for project names (imagine 100,000 projects all ending with -python)21:29
avassjust wait until someone connects every github project21:29
corvusboth first and last characters suffer from that;  we could hash them, but then it's unpredictable for users21:30
fungimaybe shard by a (truncated) hash of the string21:30
fungihah, that, yes21:31
fungii agree hard to find things in a sort that way21:31
fungiyou get to experience that first hand when dealing with pypi file urls21:31
fungior the local python package cache for that matter21:31
corvusi think i'm inclined to accept the "one or two hundred thousand projects per connection" limit for now and kick that can down the road for a bit.21:35
avasswell the same gerrit instance could also be sharded over several connections if that would ever happen21:36
clarkbwfm21:36
corvusi had one more idea -21:36
corvuswe could do a sort of pseudo-hierarchy, where we assume every project either has zero or 1 path components and treat the first one specially.  so "foo/bar" goes into "foo/foo%2fbar" and "baz" goes into "_/baz"21:37
corvus(i should say has zero, or at least 1, but we just ignore everything after 1)21:38
corvusthat would produce some reasonable sharding most of the time.  probably.21:39
corvusthen you get a limit of 200,000 github orgs :)21:39
corvus(per connection)21:39
corvusi guess that's a variation on the "first chars" method of sharding21:40
corvusjust with a variable number of first chars21:40
corvusunder the assumption that going up to a / produces a meaningful differentiation.  if you had a flat gerrit system that would not be true of course, everything would be under _/21:41
avassI think that's probably going to be good enough :)21:45
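The pseudo-hierarchy described above could be sketched as follows. The function name `shard_path` is hypothetical and this is not the code from the proposed change; it only mirrors the two examples given in the chat ("foo/bar" and "baz"). Note that `quote_plus` emits an upper-case `%2F`, whereas the chat writes it in lower case.

```python
from urllib.parse import quote_plus


def shard_path(project_name):
    """Return "<shard>/<quoted full name>" for a project.

    The shard is the first path component of the name; names with
    no "/" at all fall into the catch-all "_" shard.
    """
    first, sep, _rest = project_name.partition("/")
    shard = quote_plus(first) if sep else "_"
    return f"{shard}/{quote_plus(project_name)}"


print(shard_path("foo/bar"))
print(shard_path("baz"))
print(shard_path("org/team/repo"))  # everything after the first "/" is ignored for sharding
```

With this layout the children-per-node limit applies to the number of distinct first components and, separately, to the number of projects under each one.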
openstackgerritJames E. Blair proposed zuul/zuul master: Pseudo-shard unique project names in keystore  https://review.opendev.org/c/zuul/zuul/+/78698322:01
corvusokay, i left the latest version of the keystore patch with quote_plus, then wrote that as a followup so we can shard by first path component if we want22:02
corvushrm, maybe just drop the _ and always go with path[0]22:04
openstackgerritJames E. Blair proposed zuul/zuul master: Pseudo-shard unique project names in keystore  https://review.opendev.org/c/zuul/zuul/+/78698322:05
openstackgerritIan Wienand proposed zuul/nodepool master: Require dib 3.10.0  https://review.opendev.org/c/zuul/nodepool/+/78698422:16
ianwfungi: ^ as discussed in #opendev22:17
fungithanks!22:18
fungii wonder if there's a better way to limit the churn in that reqs entry, but i guess that's the most straightforward way to get a new nodepool image22:19
corvusfungi, ianw: right now, it seems to accurately reflect what's going on; it might be worth seeing if some of the churn in dib could be moved out?  like into another repo of elements that are treated as user data or something?22:21
corvusi don't know what the churn in dib is, but if it is elements, then having a standard library of elements repo which could be checked out and updated independently of dib releases makes a lot of sense to me.22:23
fungiyeah, in this case the impetus was for opendev to get new nodepool container images containing a version of dib which had working support for building debian-buster images22:23
fungiwhich was almost certainly just stuff in the debian-minimal element22:24
ianwin theory, you could pull out the elements/* subdirectory into its own repo.  it would be a lot of churn in reworking our various jobs and deployment bits to account for that22:24
corvusyeah, and there's trade-offs the other way too if you want dib to be batteries-included (do you tell new users to install dib *and* clone the elements repo?  do you bundle a slightly stale version and let users override with a local install?)22:25
corvusanyway, just throwing out ideas.  no big deal.22:27
fungiyep, it's worth noodling on, nothing urgent22:27
ianwwe did go down that path a little once before with a dib-utils repo.  that was supposed to have things that might be useful outside of dib (the run-parts implementation).  it caused confusion and never became something generically useful, so we moved run-parts back into dib and dropped the dependency22:28
fungireally just didn't want to bug zuul maintainers every time opendev's adding a new node type22:28
fungii suppose it's not a super frequent occurrence22:29
openstackgerritIan Wienand proposed zuul/nodepool master: Remove statsd args to OpenStack API client call  https://review.opendev.org/c/zuul/nodepool/+/78686222:42
ianwtristancC: ^ I made the openstacksdk update dependent, that should be clearer22:42
ianwalso, packet tracing openstacksdk it seems to be sending floats, which i don't think statsd handles.  another yak to shave22:43
clarkbcorvus: I think your hold caught one22:50
clarkbcorvus: though the change number you gave above seems to be for a magnum change so can't confirm it failed due to the same hostkeys issue (though I assume that is the case)22:52
corvusclarkb: https://zuul.opendev.org/t/zuul/build/3a468907f29e4d79b51f3234b98e30b3/log/zuul-info/inventory.yaml22:53
corvusfeel free to shell in22:53
*** decimuscorvinus has quit IRC22:54
clarkbya that matches what nodepool says was held, cool22:54
corvus/tmp/tutorial-zuul/known_hosts is empty22:54
clarkband same issue in the log22:55
clarkbif I try `ssh-keyscan -v -p 29418 localhost` it says connection refused22:56
corvusyeah, that has me confused22:56
*** rlandy|rover has quit IRC22:56
clarkbif I run that within the container it works22:57
clarkbI get ssh-rsa ecdsa-sha2-nistp256 and ssh-ed25519 keys from within the container22:58
clarkbdocker ps -a shows 0.0.0.0:29418->29418/tcp as the port forwards for the gerrit container22:59
corvusssh-keyscan -p 29418 127.0.0.1 works22:59
corvus::1 returns connection refused22:59
corvusdid something suddenly start preferring ipv6 lo?23:00
clarkbya ssh-keyscan -4 works but -6 breaks23:00
clarkb127.0.0.1 localhost is in /etc/hosts before ::1 localhost ip6-localhost ip6-loopback but maybe that doesn't matter if keyscan is trying to use ipv623:01
clarkbnetstat -lnp shows that the 29418 listen is 0.0.0.0:29418 not :::2941823:02
clarkbthat explains why ipv6 isn't working. but not why ipv6 is being attempted and then ipv4 is ignored23:02
fungioh!23:02
fungiso this is what came up when i was working on updating the git-review jobs to run on focal23:03
fungiand i mentioned it in #opendev as something we should remember in case we run into it again with other job updates23:03
fungii updated the git-review functional test to connect to 127.0.0.1 explicitly as a workaround23:04
fungii think what's changed is /etc/hosts23:04
fungibut i didn't expect that to creep into bionic23:04
fungii was seeing it in focal23:04
clarkbdo we make the docker port forward ipv4 specific? the internet makes it sound like it may do ipv6 by default (which then captures v4 too)23:05
fungiand the node you're on is definitely ubuntu-bionic?23:05
clarkbfungi: any idea what about /etc/hosts changed? it looks about how I would expect it23:05
clarkbfungi: /etc/os-release says so as is the ovh test node name23:06
fungimmm23:06
fungicould the gerrit container have changed to switch from listening on :: to 0.0.0.0?23:06
fungino, that's probably set in the config we supply23:07
corvusfungi: gerrit image says it's weeks old23:07
corvuswe use a lot of defaults; i'm unsure if we override any listening lines23:07
openstackgerritJames E. Blair proposed zuul/zuul master: Use ssh-keyscan -4 in quick-start  https://review.opendev.org/c/zuul/zuul/+/78698823:07
clarkbfungi: corvus: it's specifically the host side that listens on 0.0.0.0 though23:09
fungibut yeah, if we configure gerrit to listen only on 0.0.0.0 but then say to connect to localhost and /etc/hosts also resolves that to ::1 (in addition to 127.0.0.1) then maybe what's changed is the libc socket decision making?23:09
*** nils has quit IRC23:09
clarkbI don't think the container side matters as long as the other end of the nat is right23:09
clarkbthe docker compose file doesn't say anything about listening addrs for the proxy23:11
corvusi think 0.0.0.0 is consistent with past behavior from docker, so i want to guess based on my vague memory that this is ssh-keyscan/libc/something deciding to prefer ::1 instead of 127.0.0.123:11
clarkbfwiw the container side is also listening on 0.0.0.023:12
fungiagain, surprised a low-level behavior like that would change in a 3-year-old ubuntu lts23:12
fungii'd expect them to be more careful about what gets backported there23:12
fungialso as we noted, we didn't release a new version of dib recently, until after that failure started, so i don't think it's a change to how we're building the node image23:13
clarkb/etc/hosts in the container looks really similar to the host too23:13
clarkbwhich makes me suspect toolchain more than config23:13
clarkbcould it be a change to keyscan itself?23:14
fungiwhat's in /etc/gai.conf?23:15
fungianything uncommented?23:15
clarkbjust comments23:16
fungilast glibc change in bionic-updates was december too23:16
clarkbsocket.getaddrinfo('localhost', 29418, socket.IPPROTO_IP) shows both ipv6 and ipv4 addrs returned as expected23:18
*** decimuscorvinus has joined #zuul23:19
*** decimuscorvinus has quit IRC23:20
fungii originally assumed this was a behavior change in focal, which is why i just hard-coded the git-review tests to use 127.0.0.1 for connecting to test gerrit instances23:20
fungibut seeing it spontaneously crop up on bionic is troublesome23:20
*** decimuscorvinus has joined #zuul23:21
*** tosky has quit IRC23:23
fungiit's openssh-client 7.6p1-4ubuntu0.3 installed, yeah?23:23
fungiand libc6 2.27-3ubuntu1.423:24
*** decimuscorvinus has joined #zuul23:25
clarkblooking23:26
clarkbii  openssh-client                         1:7.6p1-4ubuntu0.3 and ii  libc6:amd64                            2.27-3ubuntu1.423:26
*** decimuscorvinus has quit IRC23:27
fungiyeah, so those are the same (fairly old) versions i see in the ubuntu package changelogs as well23:28
*** decimuscorvinus has joined #zuul23:30
*** decimuscorvinus has quit IRC23:30
*** decimuscorvinus has joined #zuul23:31
clarkbLooking at a strace it creates 3 sockets. It does so by first connecting to the ipv4 address then closing that socket and starting over with ipv6 using the same fd number23:32
clarkblike it decides for some reason that it shouldn't use ipv423:33
clarkbhttps://github.com/openssh/openssh-portable/blob/master/ssh-keyscan.c#L363-L378 this loop I think23:36
clarkbthe behavior in the loop ^ there is slightly different to what my strace implies though as we should hit that break after the first connection to 127.0.0.1 because the return code is 023:40
clarkbnow to find the ubuntu source I guess23:40
corvusapt-get source ?23:40
corvusnot sure if we have source repos on those hosts23:41
corvusnope23:41
clarkbhttp://archive.ubuntu.com/ubuntu/pool/main/o/openssh/openssh_7.6p1.orig.tar.gz I think23:42
clarkbif (connect(s, ai->ai_addr, ai->ai_addrlen) < 0 && errno != EINPROGRESS) is the condition there. Slightly different but still should be fine based on the strace I think23:43
clarkbhrm though I am seeing the close(s) I think. So maybe I am hitting the error condition and strace isn't showing it properly?23:45
clarkbor I'm looking at the wrong code I guess23:45
fungithe ubuntu openssh source is that orig.tar.gz with the corresponding diff.gz applied and then probably patches from the included debian/patches/* tree applied23:48
clarkbah right may be other patches applied. Anyway I think I may have read this wrong we aren't looping quite the way I thought we were. The fd is used for name lookups first then for connecting to23:51
fungiyou should be able to use the dget tool pointed at the dsc file to do all that magically23:51
clarkbwe are in that loop but the first item tried is ipv6 and we don't try any others23:51
fungidget -u http://archive.ubuntu.com/ubuntu/pool/main/o/openssh/openssh_7.6p1-4ubuntu0.3.dsc23:51
fungicd openssh-7.6p1/23:51
clarkb(I noticed the connection types go from DGRAM to STREAM which is how I figured that out)23:51
fungithat tree will have all the patches applied23:51
clarkband the ipv6 connection returns = -1 EINPROGRESS (Operation now in progress)23:52
clarkbso it drops out of the loop assuming it connected fine23:52
clarkbcould the change be in socket() itself?23:52
clarkbperhaps previously we would have gotten a different errno like "this is broken nothing there"23:53
clarkbseems like ssh-keyscan should wait until the connection has completed before breaking and returning there, but openbsd has probably put more thought into this than me :)23:54
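The point about EINPROGRESS can be illustrated outside of ssh-keyscan. The sketch below is not OpenSSH code; it only shows, in Python, that a non-blocking connect reporting EINPROGRESS has not yet succeeded or failed, and that the real outcome has to be read from `SO_ERROR` once the socket becomes writable. The function name `connect_result` and the two-second timeout are assumptions made for the example.

```python
import errno
import select
import socket


def connect_result(host, port, timeout=2.0):
    """Return 0 if a TCP connect completes, otherwise an errno value."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    with socket.socket(family, socket.SOCK_STREAM) as s:
        s.setblocking(False)
        rc = s.connect_ex((host, port))
        # EINPROGRESS only means "started"; it is not a verdict.
        if rc not in (0, errno.EINPROGRESS):
            return rc
        # Wait until the kernel has finished the attempt.
        _, writable, _ = select.select([], [s], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # The actual result of the connect is stored in SO_ERROR.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
```

A caller that walks a list of addresses and stops at the first EINPROGRESS would never learn that the attempt was later refused, which matches the behaviour observed in the strace: `connect_result("::1", 29418)` on the held node would be expected to report a refusal while `connect_result("127.0.0.1", 29418)` succeeds.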
clarkbanother explanation is that the order of getaddrinfo changed23:55
fungiyeah, i was trying to debunk that one what with checking gai.conf and the age of the glibc source in bionic23:56
fungiwhich are what should decide it23:56
clarkbgetaddrinfo() seems to find 127.0.0.1 first then ::1 then ::ffff:127.0.0.1 but the order of the ai list wouldn't necessarily be that I suppose23:57
clarkbAF_UNSPEC alone doesn't seem to imply an order23:59
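For anyone repeating the `getaddrinfo` check from earlier in the log, here is a small sketch of looking up both families and explicitly putting IPv4 first rather than relying on the resolver's ordering. The helper names `resolve` and `prefer_ipv4` are hypothetical. As an aside, the earlier call passed `socket.IPPROTO_IP` in the position of the address family argument; both it and `socket.AF_UNSPEC` have the value 0, so that call behaved as an unspecified-family lookup.

```python
import socket


def resolve(host, port):
    """Look up TCP addresses for host in whatever order the resolver returns."""
    return socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)


def prefer_ipv4(addrinfos):
    """Reorder getaddrinfo() results so AF_INET entries come first.

    sorted() is stable, so entries within each family keep the
    resolver's relative order.
    """
    return sorted(addrinfos, key=lambda ai: ai[0] != socket.AF_INET)
```

Calling `prefer_ipv4(resolve("localhost", 29418))` on a node and inspecting the first entry shows which address a client iterating the list in order would try first. This only approximates the effect of `ssh-keyscan -4`, which restricts the lookup to IPv4 instead of reordering it.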
