Thursday, 2019-06-20

00:30 <jhesketh> mhu: no worries, I need to revisit that series but I've been a little snowed under recently :-s
00:44 <openstackgerrit> Joshua Hesketh proposed zuul/zuul master: Expose ansible_date_time instead of date_time  https://review.opendev.org/666268
00:49 <SpamapS> hrm
00:50 <SpamapS> I wish there was a clear way to say "why didn't this event trigger that pipeline?"
00:53 <SpamapS> Like, I just had to dig through hundreds of lines of logs to find Exception: Project GoodMoney/funnel-cake is not allowed to run job promote-funnel-cake
00:54 <SpamapS> I wish that would somehow post to the PR
00:54 <SpamapS> (or maybe get recorded as a failed build)
00:54 <SpamapS> Does look like it was recorded as a CONFIG_ERROR buildset
00:54 <SpamapS> but there's no link to the text
01:00 <fungi> figuring out which things to report that about is where i get stuck
01:01 <SpamapS> Ultimately I think it belongs somewhere in zuul's database.
01:01 <fungi> there are so many things zuul chooses not to run that it seems like it would be very overwhelming
01:01 <fungi> but yeah, maybe a database reporter
01:01 <SpamapS> The "don't run because not matching" is fine. CONFIG_ERROR, though, is important.
01:02 <SpamapS> That error message is very clear, and the user will know what to do with it.
01:02 <SpamapS> So, it belongs in the user's hands.
01:11 <fungi> and that config error isn't reported with the configuration errors in the status interface (via the "bell" icon)?
01:15 <SpamapS> fungi: no, it's detected at runtime.
01:15 <SpamapS> (though from what I see, it *could* be detected at config parse time)
01:16 <SpamapS> If you try to add a project stanza that isn't allowed, you should get an error.
01:16 <SpamapS> Instead it lands just fine, and then at the time where it tries to run the job, it fails the allowed-projects check.
01:16 <SpamapS> I haven't looked closely though, there may be runtime circumstances that make it allowed.
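A minimal sketch of the kind of configuration behind that error (the project and job names come from SpamapS's log line; the rest of the layout is hypothetical): a job restricts who may run it via allowed-projects, and a project attaching it from elsewhere parses fine but fails once the job is actually dispatched.

```yaml
# Hypothetical config in some other repo defining the job:
- job:
    name: promote-funnel-cake
    allowed-projects:
      - GoodMoney/funnel-cake-deploy   # made-up project name

# Attaching it in a non-allowed project passes config parsing,
# but fails the allowed-projects check at run time:
- project:
    name: GoodMoney/funnel-cake
    promote:
      jobs:
        - promote-funnel-cake
```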
02:01 <pabelanger> jlk: Did you get a chance to see about github3.py release process?
08:10 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Add command processor to zuul-web  https://review.opendev.org/666307
08:10 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Add repl server for debug purposes  https://review.opendev.org/579962
09:31 <openstackgerrit> Ian Wienand proposed zuul/nodepool master: Pin to openshift <= 0.8.9  https://review.opendev.org/666526
09:37 <openstackgerrit> Jean-Philippe Evrard proposed zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs  https://review.opendev.org/666529
09:47 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality  https://review.opendev.org/657837
09:47 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket build status reporter  https://review.opendev.org/658335
09:47 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source  https://review.opendev.org/658835
09:47 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Upgrade formatting of the patch series.  https://review.opendev.org/660683
09:47 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134
09:59 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134
10:20 <openstackgerrit> Jean-Philippe Evrard proposed zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs  https://review.opendev.org/666529
10:59 <openstackgerrit> Alexander Braverman proposed zuul/nodepool master: Openshift client  https://review.opendev.org/666541
11:08 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134
11:35 <flaper87> tobiash: thanks for the hint on using statefulsets. That worked
11:38 <tobiash> :)
12:35 <ofosos> My builds seem to be stuck :(
13:19 <pabelanger> morning, I'm having an issue with nodepool 3.7.0: http://paste.openstack.org/show/753220/
13:19 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134
13:20 <pabelanger> zuul-maint: ^
13:21 <pabelanger> I'm rolling back, and going to debug in a few minutes
13:28 <pabelanger> openshift==0.9.0
13:28 <pabelanger> that got pulled in
13:28 <pabelanger> I suspect something changed in their API
13:31 <Shrews> pabelanger: there are at least two changes up to fix that but I'm not really here today to review
13:31 <pabelanger> okay, thanks
13:32 <pabelanger> https://review.opendev.org/666526/
13:32 <pabelanger> cool, thanks fungi / tobiash
13:33 <pabelanger> okay, downgrading openshift has fixed the issue
13:33 <pabelanger> we should maybe consider a 3.7.1 release to pick that up
13:59 <openstackgerrit> Merged zuul/nodepool master: Pin to openshift <= 0.8.9  https://review.opendev.org/666526
14:15 <openstackgerrit> Mark Meyer proposed zuul/zuul master: Extend event reporting  https://review.opendev.org/662134
14:22 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Add missing start-message in pipeline config schema  https://review.opendev.org/665936
14:29 <felixgcb> hey :) have any of you ever tried to use "include_role" from one untrusted project to call a role in another untrusted project? It works locally for me, but on the zuul-executor it just gives me an immediate "ok" result and proceeds..
14:30 <fungi> felixgcb: is that other project in the roles list for the job?
14:33 <fungi> zuul-maint: there's a very lengthy log of me basically talking to myself over in #openstack-infra where it seems like we ended up perpetually locking a node for a paused build when the change was rescheduled (though i can't be certain)... it picks up around: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-06-20.log.html#t2019-06-20T13:54:01
14:33 <fungi> if anybody has further ideas of things i should check (and how i should cleanly release that lock), it would be much appreciated
14:39 <felixgcb> fungi: It wasn't explicitly defined in the project's job.roles setting, but it should be read automatically, since it is a role in the roles directory. I tried adding it explicitly but that also didn't work..
14:41 <fungi> felixgcb: you're talking about the project name, right? like this: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L18-L19
14:42 <felixgcb> fungi: ah this section is really helpful. is "zuul" the repository which contains that job?
14:43 <fungi> zuul/zuul-jobs is the repository which contains roles we want this particular job to also use
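What fungi is pointing at can be sketched like this (the job and playbook names here are made up; zuul/zuul-jobs is from the linked example): the job's roles attribute names another project, and that project's roles/ directory is added to the Ansible role path on the executor, so include_role can find it.

```yaml
- job:
    name: example-job              # hypothetical job name
    roles:
      - zuul: zuul/zuul-jobs       # project whose roles/ dir this job may use
    run: playbooks/example.yaml    # can now include_role from zuul/zuul-jobs
```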
14:43 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Add missing doc for pipeline start-message  https://review.opendev.org/665930
14:43 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Add support for item.change for pipeline start-message formater  https://review.opendev.org/665968
14:43 <openstackgerrit> Fabien Boucher proposed zuul/zuul master: Add change replacement field in doc for start-message  https://review.opendev.org/665974
14:43 <corvus> fungi: locks should be released when the scheduler restarts
14:43 <fungi> felixgcb: you have to tell zuul for a given job which other projects you want it to search for roles
14:44 <fungi> corvus: but there's no (easy) way to manually tell the scheduler to release a lock for a build which has been (or was supposed to be) cancelled so that nodepool will clean up the node?
14:46 <fungi> corvus: is the best course of action just to manually delete the node from outside zuul/nodepool and ignore the locked node entry in nodepool until the zuul scheduler is next restarted?
14:47 <corvus> fungi: i guess it depends on the goal -- if the zk entry is still there, nodepool will reserve space for it in quota calculations, so deleting the node won't necessarily get the quota back
14:48 <corvus> fungi: wearing my opendev hat, i'd say "one node doesn't matter, just leave it until next restart"
14:48 <fungi> one goal is to comply with a request from the provider's support staff to clean up an apparently unused instance
14:49 <fungi> since they were the ones who brought it to our attention
14:49 <corvus> fungi: sure, then delete it out from under nodepool
14:49 <fungi> just wanting to make sure i'm not going to cause more problems by deleting it out of band when there's a zuul lock on it
14:50 <fungi> the bigger reason i brought it up in here was to better postulate about what could cause zuul to indefinitely hold a node lock associated with a build which seems to have been cancelled
14:51 <tobiash> corvus, fungi: I also noticed rare node leaks which I haven't gotten to analyze so far
14:51 <corvus> it should be possible to find the exact sequence that caused that in the logs
14:53 <tobiash> my first quick log analysis a few weeks ago showed nothing special, but I'll also have a deeper look
14:53 <pabelanger> corvus: if we do nodepool 3.7.1 (for the openshift pin), do we need to add a reno note to generate something for the releasenotes page?
14:54 <fungi> would log messages about cancellation of a build not include the build uuid (and is that why i'm not finding them)? or maybe did we never actually cancel the build?
14:54 <corvus> pabelanger: yes; we should always have at least one release note (otherwise, why did we make a release?)
14:54 <tobiash> but that analysis was before we annotated the logs
14:54 <pabelanger> corvus: ack
14:55 <corvus> tobiash: it was harder before, but still possible :)
14:56 <tobiash> I'm sure it's possible, I've just had more pressing issues in the last weeks
14:58 <tobiash> We leak around 20 nodes per week atm, and a temporary workaround is deleting the znode
14:59 <openstackgerrit> Paul Belanger proposed zuul/nodepool master: Add release note about pinning openshift client  https://review.opendev.org/666605
14:59 <corvus> it would be really good to find and fix that leak
15:09 <flaper87> does zuul have a concept of stages? Something that would allow for building "artifacts" in one stage and then reusing them in other jobs? Something like https://docs.gitlab.com/ee/ci/yaml/#stages
15:09 <openstackgerrit> Merged zuul/zuul-jobs master: Dockerhub now returns 200 for DELETEs  https://review.opendev.org/666529
15:10 <pabelanger> flaper87: Yup! you'll want to check out the recent work corvus has been doing around that
15:11 <pabelanger> (trying to find docs)
15:12 <flaper87> sweeeeet
15:13 <pabelanger> flaper87: https://zuul-ci.org/docs/zuul-jobs/docker-image.html has some info about how containers would work
15:14 <corvus> flaper87: see https://zuul-ci.org/docs/zuul/user/config.html#attr-job.dependencies to have one job depend on another job
15:15 <corvus> flaper87: https://zuul-ci.org/docs/zuul/user/jobs.html#return-values can be used to pass information to dependent jobs
15:16 <corvus> flaper87: and https://zuul-ci.org/docs/zuul/user/jobs.html#var-zuul.artifacts can be used to extend depends-on behavior to artifacts (so you can have a change in one project depend on a built artifact in another project)
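A rough sketch of the stage-like pattern corvus describes, using the linked docs (all job names and the artifact URL are hypothetical): a build job publishes an artifact via zuul_return, and a dependent job in the same buildset runs only after it succeeds.

```yaml
# Project pipeline: run-tests waits for build-image (and is skipped if it fails)
- project:
    check:
      jobs:
        - build-image
        - run-tests:
            dependencies:
              - build-image

# In a post playbook of build-image, publish an artifact for dependent jobs
# and depends-on consumers (placeholder URL):
- hosts: localhost
  tasks:
    - zuul_return:
        data:
          zuul:
            artifacts:
              - name: container image
                url: https://example.com/images/myimage.tar.gz
```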
*** hashar has quit IRC15:17
15:19 <flaper87> niiiice! Thanks, I'll read all those and come back with questions
15:24 <tobiash> corvus: hrm, we seem to be missing a relation between job name and build uuid in the logs
15:25 <corvus> tobiash: hrm, we used to have it when we launched a build
15:26 <corvus> tobiash: zuul/executor/client.py:            "Execute job %s (uuid: %s) on nodes %s for change %s "
15:26 <tobiash> corvus: oh, found it
15:26 <tobiash> yeah, that
15:26 <tobiash> so apparently the job that leaked a node never shows this message
15:28 <tobiash> the last thing logged for the job that leaked is "froze job" and "completed node request"
15:28 <flaper87> corvus: pabelanger: thanks! sounds like all the building blocks for what I need are there
15:28 <flaper87> nice
15:29 <tobiash> so it might have gotten dequeued somewhere between node lock and 'execute job'
15:30 <corvus> fungi: was our node also missing that line ^ ?
15:31 <tobiash> fungi: it's a little bit hard to correlate; you could search first for the node, then the node request which tells you the job, and then filter for event id and job name
15:33 <felixgcb> fungi: thank you so much :) now it works.
15:33 <fungi> corvus: sorry, in two meetings simultaneously at the moment. pretty sure i saw the scheduler log that in my case, yes... checking
15:35 <fungi> ahh, no it was the entry about the executor accepting the build request i was thinking of... the one you're asking about would be logged by the builder?
15:37 <corvus> fungi: that should show up in the scheduler debug log
15:40 <fungi> er, yeah i meant s/builder/executor/ but i was looking at the normal scheduler log not the debug log
15:40 <fungi> just a sec
15:45 <fungi> well, more than a sec
15:45 <fungi> power outage here, but also grepping our scheduler debug log is not fast
15:45 <tobiash> corvus, fungi: the job that leaked is making use of skipping child jobs
15:54 <fungi> ooh, we've got a bunch of useful info in the scheduler debug log that i'm surprised didn't percolate into the normal service log
15:54 <corvus> we should fix that
15:54 <fungi> agreed, just figuring out which ones should get elevated
15:55 <fungi> apparently the scheduler was trying to cancel the build but couldn't find it in the queue and thought it wasn't started
15:56 <corvus> mordred: i've run into an error with my new nodepool functional job, can i get your help debugging this: http://paste.openstack.org/show/753227/   (i've set an autohold on that node, so it should be available for us to ssh into in a bit)
15:56 <fungi> but yeah, the log entry you were asking about is present in this case:
15:56 <fungi> 2019-06-20 04:53:52,464 INFO zuul.ExecutorClient: [e: 9b44cc146cd649e585ff229c0a8a296b] Execute job swift-upload-image (uuid: 395c781799ea452c8f639d3037f8de0f) on nodes <NodeSet [<Node 0008103245 ('ubuntu-bionic',):ubuntu-bionic>]> for change <Change 0x7fa6a08caf60 openstack/swift 665487,2> [...]
15:56 <corvus> then it seems we may have a different situation than tobiash
15:56 <fungi> seems so, yes
15:57 <tobiash> hrm, two node leaks?
15:57 <tobiash> bad
15:57 <fungi> this is probably the critical element in our scenario:
15:57 <fungi> 2019-06-20 04:54:18,614 DEBUG zuul.ExecutorClient: [e: 9b44cc146cd649e585ff229c0a8a296b] Response to cancel build request: b'ERR UNKNOWN_JOB'
16:00 <fungi> the build cancellation event was logged at 04:54:18 a few lines before that
16:01 <fungi> so ~26 seconds after the job execution event
16:02 <fungi> strangely, it seems to have continued on the executor, which i guess never got asked to abort
16:02 <fungi> and at 05:02:00 the executor paused the build
16:02 <fungi> so maybe a race around cancelling a job moments after execution?
16:09 <fungi> hrm, but the first reference to that build uuid in the executor debug log was for picking up the corresponding executor:execute request from gearman at 04:53:52, then starting the ssh agent at 04:53:53... so the job was well under way when the cancellation happened
16:13 <tobiash> I found the sequence of my scenario: https://etherpad.openstack.org/p/EAvGRON5P4
16:13 <fungi> yeah, in our case the build seems to have been well into checking out git refs on the executor when the cancellation should have happened
16:17 <tobiash> corvus: that etherpad shows the sequence in my scenario and a proposal for a fix ^
16:18 <fungi> in the log example above, is 'ERR UNKNOWN_JOB' being returned by gearman? i don't see where the executor ever responded to a cancellation
16:20 <ofosos> Hey ho, does anybody have experience with running the OpenShift driver to connect to an EKS cluster?
16:24 <fungi> ofosos: i think SpamapS maybe tried out hooking nodepool to eks? not sure if it was with the openshift or kubernetes node driver
16:24 <fungi> sounded like authentication for eks is complicated
16:24 <hwangbo> Hi everyone. When we start up our Zuul server, the executor takes a long time (several hours) updating/resetting the repos in our gerrit server. Is there any way to shorten this time?
16:25 <pabelanger> hwangbo: sounds like maybe an IO issue?
16:25 <pabelanger> how many repos do you have, and I guess they are pretty large?
16:26 <hwangbo> We're using 3 repos, but they have dozens of branches with thousands of commits. Pretty large
16:31 <pabelanger> hwangbo: I'd try to collect stats on your disk, and try to figure out where the bottleneck is. IIRC, tobiash also had some IO issues recently.
16:31 <pabelanger> hwangbo: how many executors are you running?
16:32 <corvus> fungi: yeah, 'err unknown job' is the gearman server telling the client that it doesn't know about the job it was requested to cancel
16:32 <hwangbo> We're using the Zuul quick-start docker-compose, so it's just a single executor container
16:32 <fungi> corvus: should it have been in there, or is that a normal occurrence once an executor picks up the job?
16:33 <pabelanger> hwangbo: you may also want to scale out the executors, which should mean fewer jobs per executor, resulting in less IO too
16:33 <tobiash> hwangbo: note that the jobdir must be on the same mountpoint as the cache dir
16:33 <tobiash> hwangbo: do you already have this in your deployment https://review.opendev.org/665186 ?
16:34 <tobiash> this fixes the default in the docker-compose so this condition is met
16:34 <pabelanger> good point
16:34 <tobiash> (which has been merged in the last few days)
16:35 <hwangbo> tobiash: Nope, we don't have that change yet
16:35 <fungi> corvus: hrm, grep counts 46 occurrences in a 24 hour period, so i guess it's somewhat expected (or else this problem is more widespread)
16:35 <pabelanger> tobiash: mind a review on https://review.opendev.org/666605/ ?
16:35 <tobiash> pabelanger: done
16:37 <corvus> fungi: yes, the race is not unexpected, and, at least theoretically, handled :)
16:37 <fungi> right, so there's more to it than just that
16:37 <corvus> it is certainly an interesting code path though
16:38 <fungi> presumably if a build is underway on an executor then it is expected that there is a corresponding gearman job, otherwise i'd expect waaaay more than a couple of these an hour
16:39 <tobiash> maybe connection issues with gearman?
16:41 <tobiash> fungi, corvus: fwiw I don't have a match for 'ERR UNKNOWN_JOB' in my logs in the last 15 days
16:41 <fungi> tobiash: in the scheduler debug log, right?
16:41 <tobiash> oh wait, the matching was wrong :/
16:42 <tobiash> I lied, I see 5000 in the last 15 days
16:42 <tobiash> yes, scheduler debug log
16:44 <hwangbo> pabelanger: Is there any documentation on how to scale the executor? This is just for the server initialization; I thought there's only ever 1 "executor" container running.
16:44 <tobiash> so this looks quite normal
16:45 <corvus> hwangbo: update with the change tobiash mentioned first, then let us know if you still have problems
16:55 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Defer setting build result to event queue  https://review.opendev.org/666643
16:56 <tobiash> this should fix the node leak in my scenario ^
16:56 <openstackgerrit> James E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/665023
16:57 <tobiash> corvus: is this worth a release note?
16:59 <tobiash> pabelanger: I'd appreciate a review on https://review.opendev.org/579962 and its parent. Facilitates advanced debugging techniques in a running zuul
17:00 <corvus> tobiash: i don't feel strongly about a release note for that
17:00 <pabelanger> tobiash: sure, can look shortly
17:01 <pabelanger> contemplating a zuul upgrade to 3.9.0 this afternoon :)
17:01 <corvus> tobiash: that will slow things down a bit, but i think we actually process events fairly quickly now, so it's probably okay (there was a time when result event processing took so long, half our system would be idle)
17:02 <corvus> tobiash: i think we'll just want to keep an eye out for efficiency changes related to that
17:03 <openstackgerrit> Merged zuul/nodepool master: Add release note about pinning openshift client  https://review.opendev.org/666605
17:05 <tobiash> corvus: yeah, I also thought about that, but at least freeing the nodes is at the same stage
17:05 <pabelanger> nodepool 3.7.1 should be ready now, if we want to tag with ^
17:05 <tobiash> corvus: or do you think we need a different solution?
17:05 <pabelanger> We are already running the pre-release version for zuul.a.c, without issues
17:06 <corvus> tobiash: this feels like the more correct one, and is simpler; so i think it's worth trying
17:06 <tobiash> ++
17:07 <corvus> i'll tag nodepool master as 3.7.1
17:09 <corvus> pushed
17:10 <pabelanger> Thanks!
17:13 <hwangbo> corvus, pabelanger: I don't think it helped. I updated with the change, but it's still stuck doing "Updating repo...Resetting repo" for what seems like each commit in the repository
17:16 <tobiash> hwangbo: it shouldn't do that for each commit, so logs would be helpful
17:32 <hwangbo> tobiash: Here's a snippet of what the log looks like. I spoofed out some sensitive information, but I think you'd get the idea. https://pastebin.com/HEQH1eWM
17:35 <tobiash> hwangbo: you mean the cat jobs that are executed when starting up zuul?
17:35 <tobiash> those are executed for each repo and branch
17:35 <hwangbo> Yeah, sorry. I should have been more verbose
17:35 <tobiash> (not every commit)
17:35 <hwangbo> For us, that process takes several hours
17:35 <tobiash> in this case a single executor is a bottleneck
17:35 <tobiash> so you likely want to run more than one executor
17:36 <tobiash> (which is not covered by the docker-compose file)
17:36 <hwangbo> Is there any other documentation on how to set that up?
17:37 <tobiash> well, you have the executor config, so you can just take that part and run it multiple times
17:37 <tobiash> note that the cache dirs must not be shared
17:38 <hwangbo> run it multiple times?
17:38 <hwangbo> Is this separate from the docker-compose file?
17:39 <fungi> in our case we run executors and additional mergers, and put them on separate servers from where we run the scheduler/web/finger-gw daemons
17:40 <fungi> i think our current count is 12 executors and 8 additional mergers for the opendev zuul deployment
17:42 <fungi> the cat job workload probably scales in an embarrassingly parallel fashion, so you could divide the current duration by the desired duration to get a rough estimate of the number you need
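fungi's back-of-the-envelope sizing rule can be written down as a tiny sketch (it assumes, as he hedges, that the cat-job workload parallelizes perfectly across executors; the numbers below are made up):

```python
import math

def executors_needed(current_hours: float, target_hours: float,
                     current_executors: int = 1) -> int:
    """Estimate executor count by dividing the current startup duration
    by the desired duration, assuming perfect parallel scaling."""
    return math.ceil(current_hours / target_hours * current_executors)

# e.g. a 6-hour startup on one executor, aiming for about 30 minutes:
print(executors_needed(6, 0.5))
```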
17:43 <pabelanger> does docker-compose set up a zuul-merger?
17:43 <fungi> another (or additional) option could be to limit which repositories you want zuul to look in for job configuration
17:43 <pabelanger> if not, maybe we make that an optional thing to enable
17:44 <hwangbo> It doesn't set up a standalone zuul-merger
17:44 <fungi> since i *think* cat jobs are only run for repositories where zuul is looking for job configuration (someone can probably correct me on that)
17:44 <pabelanger> however, given this is a single server, IO is still going to be a concern
17:45 <hwangbo> Is there an example zuul.conf I can look at to see how multiple executors/mergers are configured?
17:47 <pabelanger> hwangbo: in theory, you shouldn't need to change anything when you bring another executor online. However, in this case, like tobiash said, you need to create a new volume in docker so as not to share it with the other executor
17:48 <pabelanger> https://github.com/openstack-infra/puppet-zuul/blob/master/templates/zuulv3.conf.erb
17:48 <pabelanger> is an example from opendev.org
17:48 <pabelanger> https://github.com/ansible-network/windmill-config/blob/master/zuul/zuul.conf.j2
17:48 <pabelanger> is ours for zuul.ansible.com
17:49 <fungi> zuul is designed so that you can have just one service configuration and hand it out to all the executors and mergers as well as the scheduler, but for merger-specific configuration options see https://zuul-ci.org/docs/zuul/admin/components.html#id4
17:51 <fungi> really if you just take your executor configuration and put it on additional servers with zuul installed and start the zuul-executor or zuul-merger daemon, it ought to simply work
17:51 <fungi> assuming you've configured them to communicate with the scheduler on an address they can all reach
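For the docker-compose case being discussed, a second executor might look something like this (a hypothetical sketch, not taken from the quick-start repo; the service, image, path, and volume names are assumptions):

```yaml
# docker-compose.yaml fragment: a second executor with its own volume so the
# git cache dir (and the jobdir on the same mountpoint) is not shared.
services:
  executor2:
    image: zuul/zuul-executor
    command: zuul-executor -f
    privileged: true
    volumes:
      - ./etc_zuul:/etc/zuul:ro       # same zuul.conf as the first executor
      - executor2-var:/var/lib/zuul   # unshared cache dir + jobdir

volumes:
  executor2-var:
```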
17:51 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Differentiate between queued and waiting jobs in zuul web UI  https://review.opendev.org/660878
17:52 <fungi> and that any firewall rules are configured to allow communication between those servers
17:52 <hwangbo> fungi, pabelanger, tobiash: Thanks for all the help. I'll be giving this a shot
17:52 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Differentiate between queued and waiting jobs in zuul web UI  https://review.opendev.org/660878
17:53 <fungi> for a better understanding of what firewall rules you might need to allow communication between distributed services, see https://zuul-ci.org/docs/zuul/admin/components.html
17:57 <corvus> yes, io on a single machine may still be a bottleneck no matter how many mergers/executors are running.  it all depends on the situation.  for that reason, i don't think it's worth expanding the quick start for this.  the operator, otoh, should handle this case.
17:58 <tobiash> corvus: we're running into another scalability problem regarding the number of branches in a repo that is being used by a job. We have a project containing 2000 branches (due to the branch&pull workflow and many people working on it). Restoring the repo state when running a job takes a significant amount of time (20 minutes when under io pressure).
17:59 <tobiash> most jobs only need the protected branches (<5)
17:59 <corvus> tobiash: "most"?
17:59 <tobiash> actually probably all
18:00 <corvus> whew, that's probably more actionable :)
18:00 <tobiash> one idea would be to be able to limit the repo state to protected branches as a job setting
18:00 <tobiash> but that would violate layering in the model
18:01 <tobiash> however I cannot guarantee that it's *all* jobs, so I guess we'd need something configurable
18:02 <tobiash> (e.g. projects following fork and pull on github.com might not even need protected branches)
18:02 <tobiash> or we could determine this by the exclude-unprotected-branches option on the repo
18:03 <tobiash> which we already have
18:03 <tobiash> I think in this case we could require a user to protect a branch if they want to run something on it
18:05 <tobiash> I think I'll go for the existing exclude-unprotected-branches option as the best compromise. I don't really want to put that into a job parameter.
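The existing option tobiash refers to lives in the tenant configuration; a minimal sketch (the tenant and project names are made up):

```yaml
- tenant:
    name: example
    source:
      github:
        untrusted-projects:
          - org/huge-repo:
              # only consider protected branches of this 2000-branch repo
              exclude-unprotected-branches: true
```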
18:08 <pabelanger> yah, I've seen some random branches get created in a repo when humans edit a file via the web ui
18:08 <pabelanger> hard to suggest changing that workflow, as users tend to not be well versed in git
18:12 <tobiash> in many enterprise workflows fork&pull is not even possible. In this case you always have the source branches of pull requests in the same repo (which most likely won't ever be needed by jobs)
18:39 <openstackgerrit> Merged zuul/zuul master: Update cached repo during job startup only if needed  https://review.opendev.org/648229
19:34 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Filter out unprotected branches from builds if excluded  https://review.opendev.org/666664
20:26 <SpamapS> tobiash: or in my case, the users just refuse to use fork&pull because they like that branches on the main repo are shared by default.
20:32 <SpamapS> *ugh*, the prohibition on untrusted repos sharing a job with a secret just ruined my day :-/
20:32 <SpamapS> have to move that job into a config repo :-/
20:42 <corvus> i had an idea of how to address that, but i don't seem to have time to work on it
*** pcaruana has quit IRC20:45
SpamapScorvus: I'm happy my secrets are protected, but yeah... I wish there was a better answer.20:46
*** jeliu_ has joined #zuul20:47
SpamapSalso this runs into the job renaming problem I had in the past20:49
SpamapS(there's no way to move a job from repo to repo without going through a 3-way dance to create a new name, change all references to it, and then remove the old one and rename the new one)20:52
dmsimardSpamapS: I'd love a better solution to that dance too.20:53
dmsimardI don't have any great ideas :(20:53
corvusi expect it could be better with bidirectional dependencies20:53
corvus(at least, for those who wished to enable that, once support for it exists)20:54
SpamapSMaybe, another thought I had was just to have an optional precedence field.20:54
corvusSpamapS: you could temporarily allow a repo to shadow another20:54
*** jeliu_ has quit IRC20:54
corvus(that is a form of optional precedence)20:54
SpamapScorvus: indeed.. a bit of a big hammer, but swingable.20:55
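[The shadow mechanism corvus refers to is expressed in the tenant configuration; a hedged sketch with hypothetical project names, where repo-b's copy of a job temporarily takes precedence over the definition in repo-a during a migration:]

```yaml
# Hypothetical sketch: while both repos define the same job, the
# shadow attribute makes repo-b's definition win, allowing the job
# to be moved without the three-way rename dance.
- tenant:
    name: example-tenant
    source:
      gerrit:
        untrusted-projects:
          - example-org/repo-a
          - example-org/repo-b:
              shadow: example-org/repo-a
```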
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502320:56
SpamapScorvus: I kind of wonder why I can't just break the prohibition and say "allowed-projects: foo"20:56
SpamapSThe two repos are separate purely for focus reasons.. we expect a small team to iterate rapidly on a small piece of the code so we extracted it to a separate repo..20:57
SpamapSso it would be nice if I could just make an explicit trust.20:57
corvusSpamapS: i think the solution i was considering had the idea that a config project could cause a job in untrusted repo A to be run in untrusted repo B.  in other words, allowing the config project to assert that trust relationship.20:59
SpamapScorvus: yes, that's exactly what I want.21:00
corvusSpamapS: unfortunately, just doing "allowed-projects: foo" in an untrusted project is dangerous (that is a thing we had to remove) because a change can speculatively change it.  so we need to get the config project involved to fixate it safely.21:00
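[A sketch of the pattern corvus describes: the secret-holding job is defined in a trusted config project, where its allowed-projects list cannot be altered speculatively by a proposed change. Job, secret, and project names are hypothetical.]

```yaml
# Hypothetical sketch, placed in a trusted config project: because
# this definition is not subject to speculative changes, the
# allowed-projects restriction can be asserted safely.
- job:
    name: deploy-example
    secrets:
      - name: deploy_credentials
        secret: example-deploy-secret
    allowed-projects:
      - example-org/app-repo
      - example-org/extracted-repo
```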
*** hashar has quit IRC21:01
corvusSpamapS: i'll see if i can't negotiate that one up higher in my todo list... it's an unhandled use case, and you know i don't like those.21:02
SpamapSI do ;)21:03
SpamapSalso, TBH, we may actually kick all the secrets out of the untrusted repos to have more separation between deploy and dev.21:04
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: Allow config projects to override allowed-projects  https://review.opendev.org/66673321:17
corvusSpamapS: ^ maybe now it'll be a bit more in my face and i can finish it up piecemeal.21:18
corvusi wonder if, in the nodepool devstack job, we could start the nodepool builder at the beginning of the job, so that it ran the DIB build portion while devstack was being installed, and we rely on the upload retries for the builder to eventually upload the image once the cloud came online...21:39
corvusi think i'll try that after i get the basic functionality working21:40
*** sshnaidm is now known as sshnaidm|off21:55
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502322:15
*** mattw4 has quit IRC22:47
*** sean-k-mooney1 has quit IRC22:49
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502322:53
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502322:54
openstackgerritJames E. Blair proposed zuul/nodepool master: WIP: new devstack-based functional job  https://review.opendev.org/66502322:55
*** mattw4 has joined #zuul23:02
*** rlandy has quit IRC23:44

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!