Saturday, 2021-10-16

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [00:18]
- [zuul/zuul] 814070: Create Abstract and FrozenJob classes https://review.opendev.org/c/zuul/zuul/+/814070
- [zuul/zuul] 814242: Make FrozenJob.updateParentData a static method https://review.opendev.org/c/zuul/zuul/+/814242
- [zuul/zuul] 814243: WIP Make FrozenJob a ZKObject https://review.opendev.org/c/zuul/zuul/+/814243
@fungicide:matrix.org: not super urgent, but we have a hung pipeline in opendev's zuul, and i think it may be related to this exception: https://paste.opendev.org/show/810047 [18:37]
@fungicide:matrix.org: does that look familiar to anyone? luckily it's in a queue that's not in the critical path to merging (most) changes, but it's starting to become a pileup as new changes get added for evaluation against the trigger rules for it and then never cleared [18:39]
@vlotorev:matrix.org: Hi, I'm trying to upgrade a dev instance from 4.7.0 to 4.10.2. The issue is with keys: the scheduler reports in its logs that it created new private keys (INFO    zuul.KeyStorage: Generating a new SSH key for gerrit/...). The instance starts, but zuul fails to decrypt secrets from a trusted project. [18:47]
@vlotorev:matrix.org: I've read the release notes https://zuul-ci.org/docs/zuul/reference/releasenotes.html, and still don't quite understand. The `zuul import-keys/export-keys` commands were added in 4.10.0, so there is no way to call 'export-keys' using 4.7 to 'import-keys' later in 4.10. [18:51]
@fungicide:matrix.org: vlotorev: 4.7.0 should have already been copying the keys automatically into zookeeper every time the scheduler was started [19:03]
@fungicide:matrix.org: the relevant upgrade note was in 4.9.0: "Zuul no longer reads or writes project private key files from the scheduler’s filesystem. In order to load existing keys into ZooKeeper, run version 4.6.0 of the scheduler at least once, if you haven’t already." [19:03]
@vlotorev:matrix.org: fungi: I don't quite understand: I'm using a docker-compose env (based on the official Zuul quick-start). In this setup zookeeper is created each time docker-compose is up. No state/data directory is mounted from zookeeper on the host. [19:09]
So while I'm using 4.7.0, zuul copies all the keys from the filesystem into a fresh zookeeper.
But on 4.10 both zuul and zookeeper are empty, so keys are recreated on each docker-compose start?
@vlotorev:matrix.org: https://opendev.org/zuul/zuul/src/branch/master/doc/source/examples/zoo.cfg has `dataDir`, should this directory be persistent across ZooKeeper restarts? [19:16]
@fungicide:matrix.org: prior to 4.6.0, the scheduler used keys it found on disk or created them if they didn't exist. starting from 4.6.0 up until 4.9.0 the scheduler read keys from disk and put them into zookeeper each time it started, creating new keys on disk if they didn't exist. as of 4.9.0 the scheduler stopped reading keys from disk, and stopped creating keys on disk if they didn't exist, creating them now if they're not present in zookeeper [19:17]
@fungicide:matrix.org: if you're wiping all of zookeeper between restarts or upgrades, then i expect the scheduler is now creating new keys every time that happens [19:18]
@fungicide:matrix.org: so any secrets created for the old keys are no longer able to be decrypted [19:19]
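The key-handling history fungi describes above can be sketched as a small toy model. This is only an illustration of the behavior stated in the chat, not Zuul's actual code: the function name `load_project_key`, the helper `_new_key`, the dicts standing in for the scheduler filesystem and ZooKeeper, and the project name `example/project` are all assumptions made up for the example.

```python
# Toy model of the per-version key handling described in the chat.
# Everything here is hypothetical; it is not Zuul's implementation.
import itertools

_counter = itertools.count(1)


def _new_key():
    """Hypothetical stand-in for generating a fresh private key."""
    return f"generated-key-{next(_counter)}"


def load_project_key(version, project, disk, zk):
    """Return the key the scheduler would end up using for `project`.

    `version` is a (major, minor, patch) tuple; `disk` and `zk` are plain
    dicts standing in for the scheduler filesystem and ZooKeeper.
    """
    if version < (4, 6, 0):
        # Before 4.6.0: keys live on disk only, created if missing.
        if project not in disk:
            disk[project] = _new_key()
        return disk[project]
    if version < (4, 9, 0):
        # 4.6.0 up to 4.9.0: read from disk (creating if missing) and
        # copy into ZooKeeper on every scheduler start.
        if project not in disk:
            disk[project] = _new_key()
        zk[project] = disk[project]
        return zk[project]
    # 4.9.0 and later: disk is ignored; a key is created in ZooKeeper
    # only if none is present there.
    if project not in zk:
        zk[project] = _new_key()
    return zk[project]


disk = {"example/project": "original-key"}
zk = {}
# 4.7.0 copies the on-disk key into ZooKeeper.
old = load_project_key((4, 7, 0), "example/project", disk, zk)
# 4.10.2 against a wiped ZooKeeper generates a new, different key.
regenerated = load_project_key((4, 10, 2), "example/project", disk, {})
# 4.10.2 against the persisted ZooKeeper keeps the original key.
kept = load_project_key((4, 10, 2), "example/project", disk, zk)
```

In this model, secrets encrypted for `old` stay decryptable only in the `kept` case, which is why the ZooKeeper data has to survive the upgrade.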
@fungicide:matrix.org: vlotorev: yes, it's expected that data will be persisted between zk restarts. this is what the zk container docker-compose looks like in opendev's deployment (note the host paths mapped into the container): https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zookeeper/files/zookeeper-compose/docker-compose.yaml#L11-L16 [19:22]
@vlotorev:matrix.org: fungi: Thank you, now I see what is going on and how to persist the zk state. [19:27]
@fungicide:matrix.org: it's also worth noting that in production deployments you'd typically have at least three zk servers, so they don't really even need their on-disk data at restart unless the other cluster members are offline [19:28]
@fungicide:matrix.org: or at least don't need all of it, like the replay logs [19:29]
@fungicide:matrix.org: at any rate, it's good to have them persisted even with a cluster, for disaster recovery purposes [19:30]
@vlotorev:matrix.org: Thanks, my zuul instance is small. I just do docker-compose down/up for 'production' upgrade in the evening when no one is creating Gerrit patchsets. I'm way too far from three zk servers :) [19:36]
@fungicide:matrix.org: yeah, the up-side to having only one zk server is you don't have to worry about the cluster figuring out its quorum [20:05]
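For context on the quorum remark, here is a minimal sketch of the arithmetic, assuming ZooKeeper's default majority quorum (more than half of the ensemble must be up). The function names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of any ZooKeeper or Zuul API.

```python
# Majority-quorum arithmetic for a ZooKeeper ensemble (illustrative only).

def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


single = (quorum_size(1), tolerated_failures(1))
three = (quorum_size(3), tolerated_failures(3))
```

Under that assumption a three-server ensemble keeps working with one member down, while a single server has no redundancy at all, so its persisted data directory is the only copy of the state.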
@vlotorev:matrix.org: I've updated to Zuul 4.10.2 and nodepool 4.3.0, but Zuul web doesn't show any nodes. If I run `docker-compose exec launcher nodepool list`, all nodes are listed as available. If I trigger a job, it runs on one of the nodes. There are no errors or warnings in the scheduler/web/launcher logs to hint at what is wrong. [21:07]
