Wednesday, 2021-04-21

00:18 <openstackgerrit> Merged zuul/zuul-jobs master: ensure-docker: ensure docker.socket is stopped  https://review.opendev.org/c/zuul/zuul-jobs/+/787271
00:23 <openstackgerrit> Merged zuul/nodepool master: Require dib 3.10.0  https://review.opendev.org/c/zuul/nodepool/+/786984
00:59 *** sam_wan has joined #zuul
01:36 *** sam_wan has quit IRC
02:07 *** ikhan has quit IRC
02:30 *** ajitha has joined #zuul
02:33 *** evrardjp has quit IRC
02:33 *** evrardjp has joined #zuul
03:16 *** sam_wan has joined #zuul
03:34 *** rlandy|rover has quit IRC
04:06 *** ykarel|away has joined #zuul
04:10 *** ykarel_ has joined #zuul
04:12 *** ykarel|away has quit IRC
04:15 *** bhavikdbavishi has joined #zuul
04:18 *** bhavikdbavishi1 has joined #zuul
04:20 *** bhavikdbavishi has quit IRC
04:20 *** bhavikdbavishi1 is now known as bhavikdbavishi
04:27 *** bhavikdbavishi has quit IRC
04:28 *** bhavikdbavishi has joined #zuul
04:39 *** bhavikdbavishi has quit IRC
04:49 *** hamalq has quit IRC
04:55 *** vishalmanchanda has joined #zuul
05:11 *** jfoufas1 has joined #zuul
05:55 *** paladox has quit IRC
05:55 *** ykarel_ has quit IRC
05:55 *** ykarel__ has joined #zuul
05:59 *** mnaser has quit IRC
05:59 *** bhavikdbavishi has joined #zuul
06:00 *** mnaser has joined #zuul
06:25 *** saneax has joined #zuul
06:34 *** jcapitao has joined #zuul
06:38 *** ykarel_ has joined #zuul
06:40 *** ykarel__ has quit IRC
06:53 *** bhavikdbavishi1 has joined #zuul
06:54 *** bhavikdbavishi has quit IRC
06:54 *** bhavikdbavishi1 is now known as bhavikdbavishi
07:09 *** avass has quit IRC
07:10 *** avass has joined #zuul
07:33 *** rpittau|afk is now known as rpittau
07:35 *** bhavikdbavishi has quit IRC
07:35 *** bhavikdbavishi has joined #zuul
07:46 *** tosky has joined #zuul
07:48 *** bhavikdbavishi has quit IRC
07:52 *** ykarel_ has quit IRC
07:56 *** jpena|off is now known as jpena
08:08 *** nils has joined #zuul
08:08 *** bhavikdbavishi has joined #zuul
08:11 *** bhavikdbavishi1 has joined #zuul
08:13 *** bhavikdbavishi has quit IRC
08:13 *** bhavikdbavishi1 is now known as bhavikdbavishi
08:27 *** ykarel_ has joined #zuul
09:34 *** ykarel_ has quit IRC
09:52 *** holser has joined #zuul
10:12 <openstackgerrit> Ian Wienand proposed zuul/nodepool master: Account for resource usage of leaked nodes  https://review.opendev.org/c/zuul/nodepool/+/785821
10:18 *** bhavikdbavishi has quit IRC
10:25 *** bhavikdbavishi has joined #zuul
10:28 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: web UI: user login with OpenID Connect  https://review.opendev.org/c/zuul/zuul/+/734082
10:29 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: Add authentication-realm attribute to tenants  https://review.opendev.org/c/zuul/zuul/+/735586
10:29 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to dequeue a change  https://review.opendev.org/c/zuul/zuul/+/734850
10:29 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: web UI: allow a privileged user to re-enqueue a change  https://review.opendev.org/c/zuul/zuul/+/736772
10:30 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: Web UI: allow a privileged user to request autohold  https://review.opendev.org/c/zuul/zuul/+/768115
10:31 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: Web UI: add Autoholds, Autohold page  https://review.opendev.org/c/zuul/zuul/+/768199
10:36 *** jcapitao is now known as jcapitao_lunch
11:17 *** bhavikdbavishi has quit IRC
11:29 *** bhavikdbavishi has joined #zuul
11:32 *** jpena is now known as jpena|lunch
11:48 *** rlandy has joined #zuul
11:49 *** rlandy is now known as rlandy|rover
11:54 *** rlandy|rover has quit IRC
12:00 *** sshnaidm has quit IRC
12:06 *** jcapitao_lunch is now known as jcapitao
12:07 *** sshnaidm has joined #zuul
12:08 *** rlandy has joined #zuul
12:08 *** rlandy is now known as rlandy|rover
12:30 *** okamis has joined #zuul
12:32 *** jpena|lunch is now known as jpena
12:52 *** sam_wan has quit IRC
12:55 *** fsvsbs has quit IRC
13:35 *** bhavikdbavishi has quit IRC
14:16 *** saneax has quit IRC
14:33 <corvus> tobiash: do you have any thoughts on http://lists.zuul-ci.org/pipermail/zuul-discuss/2021-April/001566.html ?
14:55 <avass> corvus: I'm running my own deployment on the tip of the master branch and don't have those problems
14:56 <avass> so maybe there's a change combined with specific github configuration that causes that?
14:56 <corvus> huh.  super weird.  i guess we'll just wait for gtema_ to do more investigation
14:56 <corvus> avass: maybe so?
14:57 *** jfoufas1 has quit IRC
14:59 *** bhavikdbavishi has joined #zuul
15:05 *** saneax has joined #zuul
15:13 <avass> also I have some ideas how to extend zuul-cache to also handle provides/requires and fetching artifacts from previous pipelines with the artifacts api :)
15:15 <avass> corvus: I double checked and both labels and reviews work for me and I'm running 4.2.1.dev8 4f3f973a
15:21 *** okamis has quit IRC
15:28 *** bhavikdbavishi1 has joined #zuul
15:30 *** bhavikdbavishi has quit IRC
15:30 *** bhavikdbavishi1 is now known as bhavikdbavishi
15:39 <openstackgerrit> Merged zuul/nodepool master: Log decline reason at info  https://review.opendev.org/c/zuul/nodepool/+/786513
16:08 *** bhavikdbavishi1 has joined #zuul
16:11 *** bhavikdbavishi has quit IRC
16:11 *** bhavikdbavishi1 is now known as bhavikdbavishi
16:16 *** saneax has quit IRC
16:20 <corvus> hrm, https://review.opendev.org/758940 failed quickstart, but afaict it just looks like a random zk disconnect
16:20 <corvus> i'm going to recheck, but let's keep that in mind
16:20 <corvus> (visible in the nodepool launcher)
16:22 *** hamalq has joined #zuul
16:23 *** hamalq has quit IRC
16:24 *** hamalq has joined #zuul
16:41 *** jcapitao has quit IRC
17:03 *** jpena is now known as jpena|off
17:07 *** bhavikdbavishi has quit IRC
17:08 *** bhavikdbavishi has joined #zuul
17:12 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists  https://review.opendev.org/c/zuul/zuul-jobs/+/787421
17:24 *** bhavikdbavishi has quit IRC
17:24 *** rpittau is now known as rpittau|afk
17:26 <tristanC> according to https://bugs.launchpad.net/tripleo/+bug/1925372, the ensure-docker change for the socket service broke the centos-7 job
17:26 <openstack> Launchpad bug 1925372 in tripleo "centos-7 content provider failing to install and start docker" [Critical,Triaged]
17:38 <corvus> tristanC: any idea why the centos7 test job didn't catch that?
17:39 <corvus> zuul-jobs-test-ensure-docker-centos-7
17:40 <openstackgerrit> Merged zuul/zuul master: Store secrets keys and SSH keys in Zookeeper  https://review.opendev.org/c/zuul/zuul/+/758940
17:46 *** bhavikdbavishi has joined #zuul
17:48 <corvus> tristanC: is it because the zuul-jobs test uses upstream repos and tripleo does not?
17:48 <mordred> corvus: looking through logs - the difference ... yup
17:48 <mordred> that's what I was just in the middle of writing
17:48 <mordred> tripleo jobs are using distro, zuul-jobs test is using upstream
17:48 <corvus> ok, that makes sense
17:48 <corvus> and distro might not even have a socket service
17:49 <mordred> we could potentially put a when: not distro instead of a failed_when false
17:49 <tristanC> tripleo jobs do seem to be using distro packages
17:49 <mordred> but- that might not be accurate anywhere other than centos7 (I'm guessing distro-docker on centos7 is old)
17:50 <mordred> so it might need to be when: not centos7 and not use-distro-packages
17:50 <corvus> maybe we could add a comment so that we remember what we're protecting against
17:50 <mordred> yeah
17:50 <corvus> and maybe we need to 2x the jobs and run them both ways?
17:51 <corvus> normally i'd hesitate to do that, but this role is important and becoming widely used, and it's almost two very different circumstances depending on the flag
17:52 <mordred> yeah - I don't think it's a crazy idea
17:52 <tristanC> i don't mind using an alternative attribute/comment
17:53 <corvus> tristanC: cool -- how about we merge your existing change to fix tripleo quick, then add a comment and/or change the condition and add a second set of tests in a followup?
17:54 <tristanC> that works for me, let me do a follow-up then
17:55 <corvus> cool, +3
18:01 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: ensure-docker: do not manage the socket on distro centos  https://review.opendev.org/c/zuul/zuul-jobs/+/787429
18:02 <tristanC> here is a follow-up using `when: not centos7 and not use-distro-packages`, but perhaps we could also check for a docker.socket service instead?
18:06 *** y2kenny has joined #zuul
18:07 *** nils has quit IRC
18:07 <avass> what would happen if "enabled: true" is removed? if the docker.socket isn't present maybe ansible is smart enough to not do anything?
18:09 <avass> docs say "At least one of state and enabled are required." and "started/stopped are idempotent actions that will not run commands unless necessary." so maybe?
18:11 <y2kenny> Hi, this has been bugging me for a while but I am not sure if it's a known bug or a configuration issue.  On the Zuul web UI status page, when a buildset has multiple jobs running, each in-progress job has a link going to the stream log.  When a build/job is finished while other jobs in the same buildset are still going, there's a
18:11 <y2kenny> link to http://<server>/t/<tenant>/build/<build id> for the finished job.  But that link always goes to "build does not exist" with a 404 error on api/tenant/<tenant>/build/<build id>... Is that a known issue?
18:13 <fungi> y2kenny: should be configurable, in opendev's deployment it goes to the upload location for that build's logs (since the build is not recorded into the database until the entire buildset reports)
18:14 <avass> looks like the service module still fails if it's told to stop a service that doesn't exist
18:14 <fungi> y2kenny: soon i think zuul is switching to writing build information to the db as soon as each build completes, rather than implementing it as a reporter
18:15 <y2kenny> fungi: Ah ok, I was about to ask about that.
18:15 <tobiash> corvus: I've seen that. I think this should be analyzed. However we're running 4.2.0 in production without issues so that sounds weird
18:19 <openstackgerrit> Merged zuul/zuul-jobs master: ensure-docker: prevent issue on centos-7 where the socket does not exists  https://review.opendev.org/c/zuul/zuul-jobs/+/787421
18:23 <corvus> tobiash, avass: so we have 1 report of failure (with no logs) and 2 successes.
18:28 <corvus> oof, 2 unit test timeouts on https://review.opendev.org/785972  both on bhs1
18:28 <corvus> we're probably getting close to the point where we need to bump the timeout; i sort of expected us to slowly creep up as we did more zk work
18:28 <avass> yeah and since they mention k8s and doing a rollback maybe they also reverted something else. we're going to need a bit more information at least
18:30 <fungi> probably ovh-bhs1 is acting as a canary because it's least suited to whatever the bulk of the resource consumption in those jobs is
18:30 <fungi> i agree it seems like an indication we need to increase the timeout (or improve test efficiency somehow)
18:31 <corvus> i have an "easy" way which is not so easy: if we can find a way to roll up sql schema migrations it would save a huge amount of time
18:32 <corvus> i just don't see how to do that with alembic and still support arbitrary migrations
18:32 <corvus> i think what i really want is a tree of migrations with multiple starting points;  like $current can be reached via the existing tree of migrations or a rollup migration.
18:34 <corvus> then 99% of the tests can use the rollup.  but i haven't seen how to convince alembic that a tree with multiple roots is okay.
18:35 <corvus> (i believe a common way to handle rollup migrations with alembic is to require your users to upgrade to a certain point before upgrading past it; so you simply remove the ability to upgrade from any point before then.  that sounds user-unfriendly)
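
A rough, purely illustrative sketch of the rollup idea described above, using alembic's usual revision-module layout: a second base revision with down_revision = None that creates the current schema in one step, so most tests would never replay the full migration chain. The revision id, table, and columns are placeholders rather than Zuul's real schema, and, as noted in the discussion, it is not obvious how to make alembic accept a tree with two roots.

    # Illustrative only: the shape of a "rollup" migration as a second root.
    # Placeholder names throughout; not Zuul's actual schema or revision ids.
    from alembic import op
    import sqlalchemy as sa

    revision = '000000rollup'
    down_revision = None        # no parent: the "second root" alembic resists

    def upgrade():
        # Create the current schema directly instead of replaying every
        # historical migration step.
        op.create_table(
            'example_buildset',
            sa.Column('id', sa.Integer, primary_key=True),
            sa.Column('result', sa.String(255)),
        )

    def downgrade():
        op.drop_table('example_buildset')
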
18:45 *** y2kenny has quit IRC
18:54 *** vishalmanchanda has quit IRC
19:00 *** ajitha has quit IRC
19:10 *** bhavikdbavishi has quit IRC
19:44 *** hamalq has quit IRC
19:44 *** hamalq has joined #zuul
19:45 <openstackgerrit> Merged zuul/zuul master: Move key_store_password to keystore section in zuul.conf  https://review.opendev.org/c/zuul/zuul/+/785972
19:50 <openstackgerrit> Merged zuul/zuul master: Support key versions and unique names in ZK keystorage  https://review.opendev.org/c/zuul/zuul/+/786774
19:50 <openstackgerrit> Merged zuul/zuul master: Pseudo-shard unique project names in keystore  https://review.opendev.org/c/zuul/zuul/+/786983
19:57 <corvus> huzzah!
20:08 *** nils has joined #zuul
20:11 <tobiash> \o/
20:19 <corvus> i'm coordinating a restart in #opendev
20:19 <corvus> i'd like to restart opendev with that, and then land the global repo state changes
20:21 *** nils has quit IRC
20:43 *** nils has joined #zuul
21:22 <corvus> tobiash, swest: opendev is restarted with secrets in zk.  it took a few (~5?) minutes to import them for all the projects.  i restarted it a second time after that, and it took about 1.5 minutes to load them.  that's definitely workable, but it might be worth taking a look at whether we can speed that up.
21:40 <mordred> corvus: once we have multi-scheduler the 1.5 minutes might no longer matter?
21:40 <corvus> mordred: yeah; there's definitely a balancing act between making things "worse" now in order to make them "better" later...
21:40 <corvus> mordred: but even with multi-sched, it wouldn't hurt to be faster :)
21:41 <corvus> so i'm thinking if we can do one or two low-hanging-fruit kind of things (like just call get_children once per connection) and they make a difference, great
21:41 <corvus> otherwise, it's not worth fretting over
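
A minimal sketch of the low-hanging fruit mentioned above, assuming a kazoo client; the /keystorage/<connection> path layout and the function name are illustrative rather than Zuul's actual keystore code. The point is only that a single get_children() call per connection replaces a per-project round trip.

    # Sketch only: one get_children() round trip per connection.
    from kazoo.client import KazooClient
    from kazoo.exceptions import NoNodeError

    def project_key_paths(zk: KazooClient, connection_name: str):
        root = f"/keystorage/{connection_name}"   # assumed layout
        try:
            children = zk.get_children(root)      # single ZK call per connection
        except NoNodeError:
            return []
        return [f"{root}/{child}" for child in children]
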
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Add a fast-forward test  https://review.opendev.org/c/zuul/zuul/+/786521
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Correct repo_state format in isUpdateNeeded  https://review.opendev.org/c/zuul/zuul/+/786522
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Revert "Revert "Make repo state buildset global""  https://review.opendev.org/c/zuul/zuul/+/785535
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Fix repo state restore / Keep jobgraphs frozen  https://review.opendev.org/c/zuul/zuul/+/785536
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Restore repo state in checkoutBranch  https://review.opendev.org/c/zuul/zuul/+/786523
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Clarify merger updates and resets  https://review.opendev.org/c/zuul/zuul/+/786744
22:05 <openstackgerrit> James E. Blair proposed zuul/zuul master: Support overlapping repos and a flat workspace scheme  https://review.opendev.org/c/zuul/zuul/+/787451
22:06 <corvus> that's a rebase plus a new one
22:06 *** nils has quit IRC
22:24 *** rlandy|rover is now known as rlandy|rover|bbl
22:30 <corvus> clarkb: when you have a second, would you mind doing a re-review of https://review.opendev.org/785536 ? i think you previously +2d it when it was a pair of changes; i've squashed it since then.  and also https://review.opendev.org/786744 which is new -- it's an attempt to make merger stuff easier to understand.
22:31 <clarkb> I'll try! (too many things today)
22:32 <corvus> oh yeah, sorry, i just saw the new cloud was :(
22:35 <clarkb> no worries, it was my own fault
22:35 <clarkb> just working to make it happy now
22:36 <ianw> 2021-04-21 22:35:36,688 ERROR nodepool.builder.CleanupWorker.0: Exception cleaning up image fedora-32:
22:37 <ianw> 2021-04-21 22:35:36,687 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968
22:37 <ianw> json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
22:37 <ianw> is this ringing any bells, a null entry in ZK maybe?
22:38 <clarkb> ianw: ya I think that is an issue no one has been able to track down
22:38 <clarkb> and we've just manually removed the znode to address it in the past? (though we made its impact less bad by skipping to the next cleanup iirc rather than bailing out)
22:39 <ianw> ok
22:39 <ianw> this started happening on nb03 @
22:39 <ianw> 2021-04-21 09:17:00,614 DEBUG nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'state': 'uploading', 'state_time': 1618996371.6001904, 'external_id': None, 'external_name': None, 'format': None, 'username': 'zuul', 'python_path': 'auto', 'shell_type': None, 'id': '0000000004', 'build_id': '0000096307', 'provider_name': 'osuosl-regionone', 'image_name': 'debian-buster-arm64'}>
22:40 <ianw> 2021-04-21 09:17:00,705 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968
22:40 <ianw> that's pretty close together ... i wonder if the removing failed upload somehow affected it?
22:41 <ianw> however, nb03 has hours of attempting to upload to osu and failing before that as well (see other discussions on the suspected ipv4 issues there)
23:34 *** hamalq has quit IRC
23:37 <ianw> 2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1
23:37 <ianw> 2021-04-21 06:35:53,663 ERROR nodepool.zk.ZooKeeper: Error loading json data from image build /nodepool/images/fedora-32/builds/0000057968
23:38 <ianw> WE DON'T LOG ANYTHING BETWEEN THOSE TWO
23:38 <ianw> sorry, caps lock
23:40 <ianw> (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds> json_cat 0000057968
23:40 <ianw> it's just empty, as suspected
23:41 <ianw> ahh, no it's not!
23:41 <ianw> (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers> ls
23:41 <ianw> ovh-bhs1
23:42 *** tosky has quit IRC
23:44 <ianw> (CONNECTED [localhost:2181]) /nodepool/images/fedora-32/builds/0000057968/providers/ovh-bhs1/images> ls
23:45 <ianw> is blank.  so somehow ovh-bhs1 has no recorded images but a zombie entry
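
A minimal sketch of the failure mode being poked at here, assuming a kazoo client: the build znode's own data is empty even though child nodes exist, so a plain json.loads() raises exactly the "Expecting value: line 1 column 1 (char 0)" error shown above, and a loader could treat empty data as "no record" instead of raising. The path comes from the log; the function name is illustrative, not nodepool's actual code.

    # Sketch, not nodepool's loader: handle an empty build znode gracefully.
    import json
    from kazoo.client import KazooClient

    def load_build_record(zk: KazooClient, path: str):
        # e.g. path = "/nodepool/images/fedora-32/builds/0000057968"
        data, _stat = zk.get(path)
        if not data:
            # Empty payload: json.loads(b"") raises the JSONDecodeError above.
            return None
        return json.loads(data.decode("utf-8"))
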
23:49 <fungi> i feel like we've had empty image build znodes before, and never managed to work out what causes that to happen
23:50 <corvus> it could be a race/sequencing issue with locks
23:52 <ianw> nodepool-builder.log.2021-04-20_23:2021-04-21 06:22:40,129 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from vexxhost-ca-ymq-1
23:52 <ianw> nodepool-builder.log.2021-04-20_23:2021-04-21 06:35:47,362 INFO nodepool.builder.CleanupWorker.0: Deleting image build fedora-32-0000057968 from ovh-bhs1
23:53 <ianw> the vexxhost one didn't seem to have any issues.  the ovh-bhs1 did.  so possibly looking for something that happened between 06:22 -> 06:35
23:53 <ianw> this is on nb01.  i wonder if 02 did something in that period?
23:53 <corvus> if it's a lock race/sequencing issue it would be triggered by the last one.
23:55 <fungi> and yeah, the previous incidents i've observed did look like they came in bursts
23:55 <ianw> 2021-04-21 06:22:31,024 ERROR nodepool.builder.UploadWorker.0: Failed to upload build 0000057970 of image fedora-32 to provider ovh-bhs1
23:56 <ianw> openstack.exceptions.ConflictException: ConflictException: 409: Client Error for url: https://image.compute.bhs1.cloud.ovh.net/v2/images/2da84d78-f42f-4f8a-95f4-9405df0d9443/file, Conflict
23:56 <ianw> don't know what a "conflictexception" means
23:56 <openstackgerrit> James E. Blair proposed zuul/zuul master: Lock node requests in fake nodepool  https://review.opendev.org/c/zuul/zuul/+/787301
23:57 <fungi> ianw: looks like openstacksdk raises that on a 409 status response from keystoneauth
23:58 <fungi> so i guess the real question is under what circumstances does keystoneauth get a 409
23:59 <ianw> 2021-04-21 06:35:26,661 INFO nodepool.builder.UploadWorker.2: Image build fedora-32-0000057970 (external_id 12743826-2016-4ae7-b838-e4aefd919c7d) in ovh-bhs1 is ready
23:59 <ianw> 2021-04-21 06:22:31,140 INFO nodepool.builder.UploadWorker.2: Uploading DIB image build 0000057970 from /opt/nodepool_dib/fedora-32-0000057970.qcow2 to ovh-bhs1
23:59 <ianw> sorry, that's reversed, but it tries again and the image is ready by 06:35
23:59 <fungi> you'd think the whole point of mapping error codes to custom python exceptions would be so you could also attach descriptive explanations ;)
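
A hedged sketch of where that exception surfaces, assuming openstacksdk's image proxy; the create_image() arguments and the retry policy are illustrative, not nodepool's actual upload path. It only shows ConflictException mapping a 409 from the image service, with a later retry succeeding the way the log above did.

    # Sketch only; not nodepool's code. Catch the 409 mapped to
    # openstack.exceptions.ConflictException and retry the upload.
    import time
    import openstack
    from openstack import exceptions

    def upload_image_with_retry(cloud, name, filename, attempts=3):
        conn = openstack.connect(cloud=cloud)
        for attempt in range(1, attempts + 1):
            try:
                return conn.image.create_image(
                    name=name, filename=filename,
                    disk_format="qcow2", container_format="bare")
            except exceptions.ConflictException:
                # HTTP 409 from the image endpoint; back off and retry,
                # as the second attempt in the log eventually reported
                # the image "is ready".
                if attempt == attempts:
                    raise
                time.sleep(30 * attempt)
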
