Friday, 2021-06-18

SpamapSOh my.. it's been a while since I developed on Zuul on this box..00:44
SpamapS   f7f8ea61..0a5e3308  master     -> origin/master00:44
SpamapS * [new tag]           3.10.0     -> 3.10.000:44
SpamapSYour branch is behind 'origin/master' by 2891 commits, and can be fast-forwarded.00:45
opendevreviewIan Wienand proposed zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes  https://review.opendev.org/c/zuul/zuul-jobs/+/79563601:10
opendevreviewIan Wienand proposed zuul/zuul-jobs master: Ensure dnf-plugins-core before calling "dnf copr"  https://review.opendev.org/c/zuul/zuul-jobs/+/79697901:10
ianwmnaser: ^ couple of tweaks01:11
corvusspamaps: lol :)  btw, zuul/tools/test-setup-docker.sh is what i use to get an environment for running tests  (runs mysql and zk in containers)01:20
corvusspamaps: that's probably new since the last time you pulled :)01:20
corvusspamaps: and if you want to run tests on python >= 3.9, then add "tests.unit.test_scheduler.TestSchedulerSSL" to your exclude list (there's a fix for that, but we're semi-frozen and also about to remove gearman, so we might just wait for that)01:22
SpamapSTy, I think I remember that script but tox was in my muscle memory.01:34
opendevreviewIan Wienand proposed zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes  https://review.opendev.org/c/zuul/zuul-jobs/+/79563602:14
opendevreviewJames E. Blair proposed zuul/zuul master: Replace TreeCache in component registry  https://review.opendev.org/c/zuul/zuul/+/79658202:21
opendevreviewJames E. Blair proposed zuul/zuul master: Add ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/77090202:21
opendevreviewJames E. Blair proposed zuul/zuul master: Change zone handling in ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/78783302:21
opendevreviewJames E. Blair proposed zuul/zuul master: Switch to string constants in BuildRequest  https://review.opendev.org/c/zuul/zuul/+/79184902:21
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up Executor API build request locking and add tests  https://review.opendev.org/c/zuul/zuul/+/78862402:21
opendevreviewJames E. Blair proposed zuul/zuul master: Fix race with watches in ExecutorAPI  https://review.opendev.org/c/zuul/zuul/+/79230002:21
opendevreviewJames E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/78898802:21
opendevreviewJames E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler  https://review.opendev.org/c/zuul/zuul/+/79468702:21
opendevreviewJames E. Blair proposed zuul/zuul master: Handle errors in the executor main loop  https://review.opendev.org/c/zuul/zuul/+/79658302:21
SpamapSOh wow the test suite got much bigger!02:47
SpamapSMy little crappy laptop has been running tests for 45 minutes now. ;)02:47
corvusyeah... it's... well tested.  :)  some of the cloud nodes take 1hr20m.  selective testing during development is helpful02:59
corvus(i think we could speed tests up with a db migration rollup strategy, but that's going to take some planning)03:00
SpamapSYeah I figured I should run them all just to validate and knock the rust off things. :)03:04
SpamapSI should probably spin up an Ubuntu VM on my gaming laptop and use that.. it has a lot more oomph :)03:05
corvusthey parallelize very well :)03:05
SpamapSI have 2 cores! But.. they're BAD cores. model name: AMD A9-9420e RADEON R5, 5 COMPUTE CORES 2C+3G03:08
SpamapS{0} tests.unit.test_gerrit_legacy_crd.TestGerritLegacyCRD.test_crd_branch [45.901647s] ... ok03:12
opendevreviewMerged zuul/zuul-jobs master: Ensure dnf-plugins-core before calling "dnf copr"  https://review.opendev.org/c/zuul/zuul-jobs/+/79697903:12
opendevreviewMerged zuul/zuul-jobs master: Switch jobs to use fedora-34 nodes  https://review.opendev.org/c/zuul/zuul-jobs/+/79563603:30
opendevreviewMerged zuul/zuul-jobs master: ensure-zookeeper: better match return code  https://review.opendev.org/c/zuul/zuul-jobs/+/79353703:30
*** bhavikdbavishi1 is now known as bhavikdbavishi05:49
*** marios is now known as marios|ruck06:02
*** jpena|off is now known as jpena07:18
*** rpittau|afk is now known as rpittau08:17
*** raukadah is now known as chandankumar09:26
swestcorvus: https://github.com/python-zk/kazoo/issues/64510:12
*** jpena is now known as jpena|lunch11:41
*** bhagyashris_ is now known as bhagyashris11:50
mhuinhello, anybody ever ran into zookeeper connection timeouts? My Zookeeper service seems up and running, I've set up TLS as explained in zuul's doc, and I can netcat into port 2281. But zuul can't seem to connect12:26
gtemaZK is really a messy sw. I have some issues with it (i.e. zuul-web start slowly with some delay exactly connecting to it), but not timeouts. Check ZK logs, cause i.e. if you try to establish non ssl connection to ssl port you have broken connection12:30
gtemaor if cluster got broken you can maybe establish connection, but not able to do anything in it12:33
mhuingtema, thanks for the tips12:34
*** jpena|lunch is now known as jpena12:38
*** raukadah is now known as chandankumar13:13
*** rpittau is now known as rpittau|afk14:09
opendevreviewJames E. Blair proposed zuul/zuul master: WIP reenqueue  https://review.opendev.org/c/zuul/zuul/+/79711614:22
masterpe[m]I'm trying to configure zookeeper to use tls, I have disabled clientPort in zoo.cfg and I have specified hosts=localhost:2281 in zuul.conf but I get in the logs "WARNING zuul.zk.base.ZooKeeperClient: Retrying zookeeper connection". But I don't see any traffic on lo when I sniff with tcpdump?14:29
mhuinmasterpe[m], ha! I am having the same problem as well and it's driving me crazy14:29
mhuinI've tried to follow the steps in the ensure-zookeeper role in zuul-jobs, but no success14:30
mhuinhttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-zookeeper/tasks/setup_tls.yaml14:30
fungithis is what our zk config for opendev looks like: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zookeeper/templates/zoo.cfg.j214:32
corvusmasterpe: any info in the zk server logs?14:33
*** marios is now known as marios|ruck14:34
mhuinfungi, I have a standalone deployment, but I don't think that matters in that case, right?14:36
fungimhuin: by standalone you mean single-node zk deployment rather than a cluster? i don't think that's likely to matter from a connectivity perspective14:37
gtemawhat makes definitely sense for the SSL ZK:14:37
gtema- forward and reverse DNS14:38
mhuinfungi, yep14:38
gtema(dns is crucial for host verification)14:38
gtemaensuring ZK starts properly and is able to get quorum - in SSL setup not self understandable14:39
corvusmhuin: one zk server, or multiple?14:43
mhuincorvus: one standalone14:43
mhuinI know the 2281 port is open and listening14:43
mhuinbut logs aren't very helpful on zuul's nor zk's side. I'm probably missing something though...14:44
gtemacan you try zkCli.sh?14:44
gtemamaybe you need zkCli.sh -server localhost:228114:44
corvusmhuin: that setup is similar to the unit tests; tools/test-setup-docker.sh sets up that environment14:45
gtemaor depending on the SSL setup you might really need to connect using your real IP address, otherwise ZK rejects connect since it fails to validate 14:45
mhuingtema you might be on to something here14:45
corvusi mean to say, that script will set up a single-node zk environment on localhost, which i use when running unit tests; the unit tests configure zuul to connect to localhost:228114:46
mhuincorvus, yes, I looked at the unit tests first to see how to do it14:46
gtemain my real cluster SSL setup I can't connect using localhost or 127.0.0.1 - I need a real IP address (but I run ZK inside of container)14:47
mhuingtema: progress with zkCli > SASL config status: Will not attempt to authenticate using SASL (unknown error)14:47
mhuinlove unknown errors14:47
gtemayou definitely need to look into zk logs14:48
*** jpena is now known as jpena|out14:48
gtemathat is all the reason I really "love" java apps14:48
corvus(we shouldn't be using sasl)14:48
gtemait's a normal condition to only throw exceptions. Good luck figuring out14:48
gtema@corvus - that's right, but zkCli somehow tries this by default14:48
gtemaif you only mention ssl - it wants sasl14:49
gtemaanyway, when your client can't connect properly you only have chance figuring out from zk logs (I know how great they are)14:49
corvusmhuin: maybe the most productive thing would be to paste the zk server logs from when zuul is trying to connect?14:52
gtemamaybe14:52
mhuincorvus, I'm tailing both the logs from zuul and from zk, but I only see the timeout notifications from zuul. it's like nothing happens on zk14:53
mhuinI did see the connection errors when using zkCli though14:53
gtemawhat if your zuul config is trying to connect to wrong ip (localhost from container, etc)14:54
corvusmhuin: are you sure you got the port number right?  i switch the numbers all the time14:54
mhuincorvus, I just checked, it's set to 2281 on both sides14:55
mhuingtema, I'm not using containers and everything is on the same VM14:55
gtemaok14:55
gtemaand on both sides it is localhost?14:57
mhuingtema, yes, I think so14:57
gtemamaybe just for ensuring try to set real ip in zuul config14:58
gtemasometimes apps can start only listening on ipv614:58
corvusyeah, v4/v6 could be an issue14:59
* corvus < https://matrix.org/_matrix/media/r0/download/matrix.org/JdcNmEdtMlFKqiMiJDJETniw/message.txt >14:59
corvusdid that pastebomb irc?14:59
mhuinit turned into a link15:00
corvusawesome15:00
gtema:) matrix is absolutely fine with that15:00
corvusmhuin: ^ there's a hello-world script for you, may make testing easier15:00
corvusthat's more or less zuul's zk client connection setup15:00
mhuinthanks corvus 15:01
corvus(i ran that successfully against my zk running in docker with tools/test-setup-docker.sh)15:01
corvusnetstat -l |grep 2281 shows:15:02
corvustcp        0      0 0.0.0.0:2281            0.0.0.0:*               LISTEN     15:02
corvusso my zk is listening on ipv4 only15:02
mhuinugh, it's listening on ipv615:04
fungii seem to recall making java apps dual-stack involves separate listening sockets for each address family15:04
mhuinnetstat -laputen | grep 228115:04
mhuintcp6       0      0 :::2281                 :::*                    LISTEN      0          476052     369509/java 15:04
fungimhuin: can you confirm whether that's reachable over a valid ipv4 address for the host?15:05
fungiit's possible modern java has solved the single socket dual-stack problem15:06
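
The IPv4 reachability question above can be checked from Python as well as with netstat. This is only a rough sketch: the function name `reachable_by_family` is made up for illustration, and it tests a plain TCP connect, not the TLS handshake, so a "True" here says nothing about certificates.

```python
import socket


def reachable_by_family(host, port, timeout=2.0):
    """Try a TCP connect to every address the resolver returns for host:port.

    Returns a dict such as {"IPv4": True, "IPv6": False}. A family is absent
    from the dict when the resolver returned no address for it.
    """
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = {}
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        label = names.get(family)
        if label is None:
            continue
        try:
            # Socket creation is inside the try block because it can fail
            # on hosts where an address family is disabled.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            ok = True
        except OSError:
            ok = False
        # One working address is enough to call the family reachable.
        results[label] = results.get(label, False) or ok
    return results
```

Calling `reachable_by_family("localhost", 2281)` on the ZooKeeper host would show whether the TLS port answers on each address family that "localhost" resolves to.
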
mhuinok so corvus' test script did manage to connect15:08
mhuinok ... bad file permissions15:15
* mhuin facepalms15:15
corvusmhuin: for the certs?15:15
fungibasic things like that are often what consumes most of my personal sanity as well15:15
corvus(client side?)15:16
corvusjust wondering if there's a check we can add to zuul15:16
mhuincorvus, yes - mind you, I'm testing the packaging of zuul for fedora. so zuul runs as the zuul user on the system15:16
mhuinthe copy of the certs used by zuul needed to be owned by zuul15:16
corvusso we should add a read check to the zuul client init15:16
mhuincorvus, it's probably specific to the way we deploy15:17
fungiahh, yeah the client connection does need read access to its private key15:17
mhuinbut it probably wouldn't hurt to ensure the files are readable15:17
corvusyeah, but it's an easy error for any deployer to make, and if the kazoo client initializer doesn't emit a useful error in that case, it's better that we do :)15:17
mhuinI'm actually surprised no permission error is brought up15:17
corvusexactly15:17
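
A read check of the kind discussed above might look roughly like the following. This is a minimal sketch and not the change that was later proposed to Zuul; the helper name `unreadable_files` is hypothetical, and the caller is assumed to pass in the certificate, key and CA paths from its own configuration.

```python
def unreadable_files(paths):
    """Return (path, reason) pairs for files the current user cannot read.

    Opening each file is more reliable than inspecting mode bits, because
    it also catches missing files and unreadable parent directories.
    """
    problems = []
    for path in paths:
        try:
            with open(path, "rb"):
                pass
        except OSError as exc:
            problems.append((path, exc.strerror))
    return problems
```

Run as the same user the service runs as (the zuul user in the case above), an empty list means every file could be opened; anything else is worth logging before the client starts retrying the connection.
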
* masterpe[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/loWCPRvVyqcBSAzmLcLskjyv/message.txt >15:43
* masterpe[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/KXoAOpfteqaiDSVuAjhIbznW/message.txt >15:43
* masterpe[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/zakAsLKxGTxJabdqUCGsawPl/message.txt >15:43
mhuinmasterpe[m], so for me it was a file permissions problem on the certs, zuul could not read them15:44
* masterpe[m] < https://matrix.org/_matrix/media/r0/download/matrix.org/MlPACqwwTfmrCjpkTxlxjzGp/message.txt >15:44
corvusmasterpe: any chance you have the same problem as mhuin?  zuul running as a different user and couldn't read the certs?15:44
masterpe[m]ohw15:45
masterpe[m]That can be15:45
mhuinalthough now I have another problem, the scheduler complains about the key store password15:45
mhuinthere's no mention of that on the doc though15:45
mhuinraise RuntimeError("No key store password configured!")15:46
mhuinwell I see it mentioned here: https://zuul-ci.org/docs/zuul/discussion/components.html?highlight=key%20store#attr-keystore15:48
opendevreviewJames E. Blair proposed zuul/zuul master: Verify ZK certs can be read  https://review.opendev.org/c/zuul/zuul/+/79713515:49
corvusmhuin: oh, that doc could possibly be clearer -- it's nothing related to the zookeeper connection15:50
corvusmhuin: it's just a password that zuul uses to encrypt its own data15:50
mhuincorvus, ok, so it's not related to the error I'm seeing now?15:50
corvusmhuin: it is related to that error15:51
corvusit's just not a zookeeper connection issue15:51
corvusmhuin: basically:  just make up a password and put it in zuul.conf and you're done ;)15:51
mhuinah gotcha15:51
mhuinis that something recent? we have a fairly recent deployment of zuul for sf.io and I don't see this option set15:52
corvusmhuin: but make it really long and random; like "pwgen -s 256" or something (really i think fungi suggested it should be at least as long as any private keys you are likely to encrypt as zuul secrets)15:53
corvusmhuin: pretty soon after 4.0 i think15:53
corvusit's been a few months15:53
mhuinhmm weird, I'll check the logs then15:54
gtema@corvus - any info on whether I ever need to rotate it, or what to do if it's lost, etc?15:55
fungiyes, in short since the password you provide for the secret store is going to be protecting keys held in that keystore, it would be unfortunate for brute-forcing the password protecting those keys to be easier than brute-forcing one of the keys themselves15:56
corvusmhuin: fyi, the purpose of that is so that we can put the zuul secrets encryption keys in zookeeper, in encrypted form, without worrying about having to secure the zk data storage (some locations have "encryption at rest" policies, which this should hopefully comply with)  that's why it's called the "keystore" password15:56
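
Where pwgen is not installed, a long random password like the one suggested above can be produced with Python's standard library. This is just a sketch: the function name `make_keystore_password` is invented here, and the letters-and-digits alphabet is an assumption chosen to stay safe inside a config file.

```python
import secrets
import string


def make_keystore_password(length=256):
    """Return a random password of letters and digits.

    The secrets module draws from the operating system's cryptographic
    random source, unlike the random module.
    """
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The result would then be pasted into zuul.conf as the keystore password described in the documentation linked earlier.
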
gtemayeah, it's relatively clear. But apparently I have used initially a relatively short pwd and have no clue now whether I can simply change it, or whether I need to clean ZK if I decide to change it (export/import of the keys)15:57
corvusgtema: right now, the keys are still written to disk on the scheduler, and, if they don't exist in zk, are read from disk.  so they are effectively a backup of the data in zk.  you should be able to stop the scheduler, delete all of zk, change the password, then start the scheduler and it will re-read the keys from disk and use the new password.15:58
corvusgtema: before we remove the "filesystem backup" feature, we'll make sure we have a command line utility to export/import the keys, so the same would be possible.15:58
corvusand then eventually, hopefully a real key rotation mechanism :)15:58
gtemaoki, thks15:59
*** marios|ruck is now known as marios|out16:02
*** jpena|out is now known as jpena16:22
opendevreviewJames E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/78898816:30
opendevreviewJames E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler  https://review.opendev.org/c/zuul/zuul/+/79468716:30
opendevreviewJames E. Blair proposed zuul/zuul master: Handle errors in the executor main loop  https://review.opendev.org/c/zuul/zuul/+/79658316:30
*** jpena is now known as jpena|off16:48
SpamapSinteresting, a bunch of tests failed with this: "    OSError: [Errno 24] Too many open files"17:06
fungisounds like a personal problem ;)17:07
clarkbopen files                          (-n) 1024 is my local ulimit value and I was able to run the zuul test suite as recently as a week ago17:07
fungibut yeah, maybe low open file limit? or maybe some runaway test opened waaaay too many files17:07
SpamapSI wonder if it's something with the slowness of the system17:07
fungithat could also maybe cause a pileup17:08
clarkbmy ulimit isn't super high which makes me suspect some sort of runaway. I guess slow tests hanging around and piling up fds could do it17:08
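
The open-file limit being compared here can also be read, and raised up to the hard limit, from inside a Python process. A small sketch using the standard `resource` module (Unix only); the function names are made up for illustration, and the change applies only to the current process and its children, much like `ulimit -n` in a shell.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_nofile_soft_limit(target):
    """Raise the soft open-file limit towards target, never lowering it.

    The soft limit cannot exceed the hard limit without extra privileges,
    so the target is capped at the hard limit. Returns the soft limit in
    effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft
```
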
SpamapSRan: 1129 tests in 7014.8591 sec.17:08
fungithat's... a lot of seconds17:08
clarkblocally the tests run in about 2400 seconds17:08
SpamapS - Failed: 58617:08
SpamapSAll that I've seen so far failed with too many open files in the kazoo code17:08
clarkbSpamapS: are you using the tools/test-setup-docker.sh script to set up zookeeper and friends?17:09
SpamapSyes17:09
SpamapSIt's possible though that this laptop just isn't beefy enough to run zk17:09
clarkbok, that mounts zookeepers data dir on tmpfs which should help a bit with speed17:09
SpamapSmy IO is pretty fast as it's an SSD. It's the CPU and memory bus that are just incredibly slow17:10
SpamapSso far I re-ran 10 of the failing tests and they all passed17:12
SpamapSso yeah, something that closes sockets just got behind17:12
fungimaybe try reducing the parallelism?17:12
fungiit's probably guessing high based on cpu core count17:13
clarkb--concurrency=`python -c "import multiprocessing; print(int(multiprocessing.cpu_count()/2))"`17:13
SpamapSclint@clint-Inspiron-3185:~/src/zuul-ci/zuul$ python -c "import multiprocessing; print(int(multiprocessing.cpu_count()/2))"17:14
SpamapS117:14
SpamapSSo it's not that. :)17:14
fungiyeah, don't think you can go much lower without having to deal with infinities ;)17:14
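
For reference, the half-the-cores calculation above can be written with a floor of one worker, so that it never yields 0 on a single-core machine. A minimal sketch; the function name `half_core_concurrency` is hypothetical.

```python
import multiprocessing


def half_core_concurrency(cpu_count=None):
    """Return half the CPU count, rounded down, but never less than 1."""
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    return max(1, cpu_count // 2)
```

On the two-core laptop in question this gives 1, the same value the one-liner printed.
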
SpamapSNo I really think that it's just the thread or task that closes sockets just not getting CPU time.17:15
SpamapSNot a real concern at all. Nobody should be using this terrible machine to run the entire suite. :)17:15
clarkbbetween mysql, zookeeper and the test suite 2 total cpus may not be sufficient17:15
avass[m]yean usually likes opening a lot of files, could it be that somehow?17:15
avass[m]yarn*17:15
opendevreviewJames E. Blair proposed zuul/zuul master: Shard BuildRequest parameters  https://review.opendev.org/c/zuul/zuul/+/79714918:37
opendevreviewMerged zuul/zuul-jobs master: Add role to enable FIPS on a node  https://review.opendev.org/c/zuul/zuul-jobs/+/78877818:50
opendevreviewJames E. Blair proposed zuul/zuul master: Shard BuildRequest parameters  https://review.opendev.org/c/zuul/zuul/+/79714920:05
opendevreviewJames E. Blair proposed zuul/zuul master: Compress sharded ZK data  https://review.opendev.org/c/zuul/zuul/+/79715620:14
corvusmasterpe: element tells me you are a mod (power level 50) in the matrix room -- do you know how that happened?20:50
masterpe[m]no21:00
masterpe[m]Can it be that I'm using my own home server?21:01
gtemaCorvus, I guess this happened when people joined room before they became managed by us. I got also mod on ansible-sig room (pretty much joined first)21:02
gtemaPerhaps created IRC room through matrix21:03
masterpe[m]I think I joined this oftc room first yesterday.21:03
gtemaHm, then it's something different21:03
corvusseveral other people are on their own homeserver, and it's been around for quite a while...21:03
corvusi'll see if i can ask in #matrix-irc21:04
corvusmasterpe: don't worry about removing perms right now; let's leave it alone to see if we can figure out how it happened21:04
corvusokay, super weird idea: one of the access levels supported by oftc's chanserv access list is "MASTER"... and... well, that's a substring of masterpe.  that makes no sense to me, but it's the only connection i see.21:09
masterpe[m]Then I must be on multiple channels have moderator rights.21:10
corvusmasterpe: but only OFTC portal rooms -- are you in any others?  if not, consider joining #_oftc_#opendev:matrix.org and we can see what happens there21:11
corvus(this would not apply to freenode or libera.chat)21:11
corvusmasterpe:  i note in an irc client, you are not a chanop, so it's just a matrix power level issue21:21
*** ChanServ sets mode: +o corvus21:21
*** ChanServ sets mode: -o corvus21:22
corvusapparently the main bridge developer is on holiday, so i'm not expecting an answer any time soon :)21:30
corvusi'm assuming we could fix the problem by toggling the mod status like i just did for myself, but i'm still curious how it happened.  so i'll leave it for a bit in case we do get a response21:30
corvusmasterpe: thanks for the info and the #opendev test :)21:30
clarkbcorvus: left a few comments on https://review.opendev.org/c/zuul/nodepool/+/781926 one of which I felt was worth a -121:31
masterpe[m]corvus: You're welcome.21:32
opendevreviewJames E. Blair proposed zuul/nodepool master: Azure: update documentation  https://review.opendev.org/c/zuul/nodepool/+/78192621:40
corvusclarkb: thanks, fixed!21:40
opendevreviewJames E. Blair proposed zuul/nodepool master: Rename pip4/6 to public_ipv4  https://review.opendev.org/c/zuul/nodepool/+/79350821:41
opendevreviewMerged zuul/nodepool master: Azure: don't require full subnet id  https://review.opendev.org/c/zuul/nodepool/+/78040221:47
opendevreviewMerged zuul/nodepool master: Azure: add quota support  https://review.opendev.org/c/zuul/nodepool/+/78043921:47
corvusi just ran the test suite repeatedly on several changes in the build-requests-in-zk stack; "sum of execute time" stays around 9700 seconds, so that's probably not going to be a huge performance impact.21:58
opendevreviewJames E. Blair proposed zuul/zuul master: Add item UUID to MQTT reporter  https://review.opendev.org/c/zuul/zuul/+/79716522:09
SpamapSWith ulimit set to 4096, I don't get any too many files errors, but I do get a lot of other tests breaking that pass running singularly. I just think my little laptop is too slow to run the whole suite. :-P23:06
SpamapSThere may be races that only show up on really slow CPUs. ;)23:07
opendevreviewMerged zuul/nodepool master: Azure: implement launch retries  https://review.opendev.org/c/zuul/nodepool/+/78068223:10
SpamapSSo, question: what's this ansible-core vs. ansible community thing?23:12
SpamapSAnd... what version of which does Zuul care about?23:12
SpamapSwow what a mess of a thing23:13
SpamapSOk I think I get it... ansible-core is the engine and ansible community is all the "batteries included" modules.23:14
SpamapSSo looks like Zuul would care about community.23:14
corvusspamaps: yeah, i think so -- at least, assuming community is more or less the set of modules that we took for granted before23:29
SpamapSThe thing we're installing is community.23:29
SpamapSAnd I think that's the simpler if not fatter choice.23:30
corvuscool :)23:30
SpamapSAbout to test 2.10 ;)23:30
corvusmaybe when we grow native support for galaxy or whatever, we could pare down what zuul installs, then jobs can request more.  then again, maybe everyone wants community and we just keep doing that.  :)23:30
SpamapSYeah it could result in a leaner Zuul.23:31
SpamapSThough I don't know if anybody is complaining about the installed size of zuul itself.23:32
