Monday, 2021-02-22

*** jamesmcarthur has quit IRC00:05
*** cloudnull has quit IRC00:05
*** cloudnull has joined #zuul00:06
*** jamesmcarthur has joined #zuul00:18
*** jamesmcarthur has quit IRC00:23
*** jamesmcarthur has joined #zuul00:37
*** jamesmcarthur has quit IRC00:44
*** tosky has quit IRC00:53
*** jamesmcarthur has joined #zuul00:57
*** jamesmcarthur has quit IRC01:03
*** jamesmcarthur has joined #zuul01:04
*** jamesmcarthur has quit IRC01:06
*** jamesmcarthur has joined #zuul01:06
*** jamesmcarthur has quit IRC01:10
*** jamesmcarthur has joined #zuul01:20
*** dry has quit IRC01:52
*** dry has joined #zuul01:54
*** fdegir has quit IRC02:58
*** ikhan has quit IRC03:08
*** ykarel has joined #zuul03:14
corvusmordred: i think i've just about got the bazel cache situation worked out, as well as adding tests for the gerrit build.  so i sent a note to repo-discuss suggesting we go ahead and add a checker and then iteratively improve the existing job.03:23
corvus(all the work is in the latest PS of https://gerrit-review.googlesource.com/c/zuul/jobs/+/297362/ -- i'll split it out into multiple changes when we're ready for real)03:24
*** jamesmcarthur has quit IRC03:33
*** jamesmcarthur has joined #zuul03:33
*** jamesmcarthur has quit IRC03:38
*** ricolin has joined #zuul04:02
*** jamesmcarthur has joined #zuul04:03
*** vishalmanchanda has joined #zuul04:25
*** saneax has joined #zuul04:54
*** evrardjp has quit IRC05:33
*** evrardjp has joined #zuul05:33
*** jfoufas1 has joined #zuul06:51
*** hashar has joined #zuul07:09
*** piotrowskim has joined #zuul07:10
*** guillaumec has quit IRC07:34
*** guillaumec has joined #zuul07:34
*** hashar has quit IRC07:52
*** hashar has joined #zuul07:54
*** jcapitao has joined #zuul07:59
*** ykarel_ has joined #zuul08:30
*** ykarel has quit IRC08:33
*** ykarel_ is now known as ykarel08:33
*** dry is now known as msuszko08:34
*** rpittau|afk is now known as rpittau08:34
*** tosky has joined #zuul08:43
*** jpena|off is now known as jpena08:58
*** ykarel has quit IRC09:23
*** jfoufas1 has quit IRC09:24
*** harrymichal has joined #zuul09:33
*** harrymichal has quit IRC09:35
*** ykarel has joined #zuul09:38
*** flaper87 has joined #zuul09:43
*** flaper87 has quit IRC09:43
*** flaper87 has joined #zuul09:46
*** flaper87 has quit IRC09:47
*** flaper87 has joined #zuul09:49
*** flaper87 has joined #zuul09:51
*** flaper87 has quit IRC09:52
*** flaper87 has joined #zuul09:56
*** flaper87 has quit IRC09:57
*** flaper87 has joined #zuul09:59
*** flaper87 has quit IRC10:00
*** flaper87 has joined #zuul10:03
*** flaper87 has quit IRC10:04
*** flaper87 has joined #zuul10:09
*** flaper87 has quit IRC10:14
*** flaper87 has joined #zuul10:15
*** jpena has left #zuul10:18
*** jpena has joined #zuul10:18
*** sshnaidm__ is now known as sshnaidm10:22
*** ykarel has quit IRC10:36
*** jhesketh has quit IRC10:37
*** jhesketh has joined #zuul10:37
*** flaper87 has quit IRC10:41
*** flaper87 has joined #zuul10:41
*** flaper87 has quit IRC10:42
*** flaper87 has joined #zuul10:42
*** ykarel has joined #zuul10:42
*** tosky_ has joined #zuul10:53
*** tosky has quit IRC10:54
*** tosky_ is now known as tosky10:54
*** flaper87 has quit IRC11:20
*** flaper87 has joined #zuul11:21
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Upgrade ansible-lint to 5.0  https://review.opendev.org/c/zuul/zuul-jobs/+/77324511:40
*** ikhan has joined #zuul11:50
*** hashar is now known as hasharLunch11:59
*** jcapitao is now known as jcapitao_lunch12:04
*** rlandy has joined #zuul12:28
*** jpena is now known as jpena|lunch12:31
*** jcapitao_lunch is now known as jcapitao13:06
*** iurygregory_ has joined #zuul13:14
*** iurygregory has quit IRC13:15
*** tosky has quit IRC13:18
*** iurygregory_ is now known as iurygregory13:24
*** tosky has joined #zuul13:24
*** jpena|lunch is now known as jpena13:25
*** sduthil has joined #zuul13:32
*** aluria_ is now known as aluria13:50
*** ykarel has quit IRC13:56
*** hasharLunch is now known as hashar14:06
*** newbie2020 has joined #zuul14:09
newbie2020Hi guys, it looks to me that required-projects defined in a child job does not "override" parent's required-projects definition, but rather that the final list is the superset of all the required-projects lists. Is it true? I vaguely remember having read that somewhere but I cannot find it on the doc (probably it is just me, sorry for that)14:11
corvusnewbie2020: that's correct14:13
fungiwe make use of that quite extensively in opendev. some users have general purpose jobs with a bunch of required-projects entries, but then want to extend that with one or more additional entries when they add it to a pipeline or make a minor adjustment as a custom version of that14:20
fungihaving to restate the original required-projects list every time you wanted to alter it would become a challenge to keep synchronized if you ever needed to change the list from the original job14:21
fungiof course that comes at the cost of not being able to make custom alterations which remove some of the entries14:22
mordredcorvus: the bazel cache patch looks great!14:23
newbie2020thanks fungi for explaining the rationale behind it!14:27
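The behaviour confirmed above (a child job's required-projects extends the parent's list rather than replacing it) can be illustrated with a small sketch. This is not Zuul's implementation; the function name, the job dictionaries and the project names are hypothetical, and required-projects entries are simplified to plain strings.

```python
def effective_required_projects(job_chain):
    """Union the required-projects lists of a job inheritance chain.

    job_chain is ordered parent first, child last. Entries are never
    removed by a child; duplicates are kept only once, in first-seen order.
    (Illustrative only: not Zuul's actual code.)
    """
    merged = []
    for job in job_chain:
        for project in job.get("required-projects", []):
            if project not in merged:
                merged.append(project)
    return merged


# Hypothetical jobs and project names, used only for illustration.
parent = {"name": "base-integration",
          "required-projects": ["example/a", "example/b"]}
child = {"name": "custom-integration",
         "required-projects": ["example/b", "example/c"]}

print(effective_required_projects([parent, child]))
# → ['example/a', 'example/b', 'example/c']
```

As fungi notes, the trade-off of this additive merge is that a child has no way to drop an entry it inherited.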
*** tosky has quit IRC14:29
*** tosky has joined #zuul14:33
mordredcorvus: left a couple of comments - including one that's really just from luca14:59
corvusmordred: super weird, i don't see a msg from luca15:06
*** tosky has quit IRC15:15
*** tosky has joined #zuul15:15
*** tosky_ has joined #zuul15:18
*** tosky is now known as Guest4362915:19
*** tosky_ is now known as tosky15:19
*** Guest43629 has quit IRC15:21
*** newbie2020 has quit IRC15:26
mordredOn the mailing list15:49
corvusmordred: i just now got a message from him16:01
corvusbut i don't see anything relating to the comment you left :/ weird16:01
clarkbcorvus: are we still on track to make releases today? /me is catching up on email and doesn't see any zuul complaints from the weekend16:18
corvusclarkb: yes, i'm preparing for that now.16:20
mordredcorvus: OH - duh. it was from davido16:21
corvusoooh ok yeah i saw that one :)16:21
clarkbcorvus: cool, let me know if I can help16:23
openstackgerritHelena Spease proposed zuul/zuul master: Errors from patch 3 added.  https://review.opendev.org/c/zuul/zuul/+/76940416:34
*** vishalmanchanda has quit IRC16:34
clarkbdavid o mentions checks plugin vs checks ui. Anyone know if the checks ui setup is documented somewhere? Googling it seems to only return results for the checks plugin16:36
corvusi thought the stuff was happening in the checks plugin16:36
clarkbah16:37
*** jcapitao has quit IRC16:53
corvuscommit 4c5fa46540e07dfad1d62f069f8e44aa5a330660 (HEAD -> master, tag: 4.0.0, origin/master, gerrit/master, refs/changes/86/776286/15)17:09
corvuscommit 24405c9c745fe8de14106ef9b53b7ad6de871b09 (HEAD, tag: 4.0.0, refs/changes/49/776249/1)17:10
corvuszuul-maint: ^ do those look right?17:10
corvus(that's one commit behind master on zuul, but it's just the no-op dockerfile change)17:10
fungilookin'17:10
clarkbcorvus: the nodepool sha looks correct based on the git log but isn't what you logged on friday for our restart (however the restart on friday should've been for 4c5fa46540e07dfad1d62f069f8e44aa5a330660)17:13
clarkbother than that they lgtm17:13
corvusclarkb: i agree, it looks like i logged the previous patchset of that; i must have had a dirty tree17:13
fungii agree those commit ids match what i expect17:13
corvusclarkb: it's a lot harder to figure out what sha to log with containers17:13
*** piotrowskim has quit IRC17:15
tobiashcorvus: tags lgtm17:19
*** fbo is now known as fbo|off17:23
openstackgerritAlbin Vass proposed zuul/zuul master: Replace reset_repo_to_head(repo) with GitPython.  https://review.opendev.org/c/zuul/zuul/+/77549917:26
*** sgw has left #zuul17:26
tristanCcorvus: +117:27
openstackgerritJames E. Blair proposed zuul/zuul master: Use ZooKeeperClient.fromConfig in tests  https://review.opendev.org/c/zuul/zuul/+/77683817:30
corvusfelixedel: regarding that ^ are you working on a change to switch the test jobs to use tls, or do you want me to?17:35
corvustags pushed17:36
clarkband now we wait for zuul to release itself17:37
fungizuul is actually an ouroboros17:41
*** tjgresha has joined #zuul17:43
avasswith HA schedulers it might be able to upgrade itself as well :)17:43
corvusavass: i would say that's a near certainty for opendev at least :)17:44
*** rpittau is now known as rpittau|afk17:49
fungii'm looking forward to eventually not having to interrupt and reenqueue in-progress builds once v5 arrives17:50
fungia scheduler restart probably costs us around 750-1000 node-hours at peak utilization these days17:51
clarkbpossibly even more because you lose the results of completed jobs too (not just those that were in progress)17:54
*** jpena is now known as jpena|off17:55
clarkbzuul appears to be all done now. PyPI and docker hub have the release artifacts. nodepool is still in progress (the arm64 cross compile isn't quick)18:04
fungiyeah, .75-1k was a conservative estimate18:06
*** jfoufas1 has joined #zuul18:07
*** hashar has quit IRC18:07
*** hamalq has joined #zuul18:10
*** jfoufas1 has quit IRC18:19
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Revert "Revert "Update upload-logs roles to support endpoint override""  https://review.opendev.org/c/zuul/zuul-jobs/+/77667718:22
Open10K8Scorvus: I made a PS on https://review.opendev.org/c/zuul/zuul-jobs/+/776677 but I don't think I am so clear on your comments18:31
Open10K8SSo the problem is opendev's "quick-download" link as I understood, but what is quick-download? :)18:32
Open10K8SSeems likely I never experienced this18:32
Open10K8Sis there any chance to explain a little more plz? thank you18:33
fungiOpen10K8S: it's an example of a consumer of the current url parameter. visit a build result's artifacts like https://zuul.opendev.org/t/openstack/build/ddfd844504ff44ee89990f24bd234933/artifacts and you'll see a link to a script which downloads all logs for that build18:34
fungiit relies on the url parameter for that currently18:35
fungiit could in theory be spliced together with bits from other variables18:35
*** jamesmcarthur has quit IRC18:35
fungii'm not keen on the fact that it encourages people to pipe random scripts from the internet straight to their shell, but the suggestion to have a special log downloading client was deemed too heavyweight for most users18:37
Open10K8Saha, I see18:38
Open10K8Sfungi: then how does that url come to the zuul ui?18:39
fungiOpen10K8S: the artifact is registered here, for reference: https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/post-logs.yaml#L29-L3918:39
fungiso it's being resolved by ansible18:39
*** tjgresha has quit IRC18:39
Open10K8Shmm... so if we want to remove the url attribute from the upload-logs roles, we need to change all other jobs which are using .url.  and even if we replace all jobs in the zuul-jobs repo, users may be using it in their own jobs, so we need to announce it. is this right?18:41
fungiyes18:41
fungithat's basically it18:41
fungiso for example we can keep the url parameter as a legacy value and mark it deprecated, then announce publicly that we'll be removing it at a later date, and encourage users to switch to the new vars18:42
Open10K8Ssounds reasonable surely.  And how about this step? 2) Define a new cacheable fact "zuul_upload_logs_results" which contains url, endpoint, path.  Document it in the readme.18:43
Open10K8SI think I am not sure about this new fact's purpose18:43
avassOpen10K8S: mostly to document the behaviour since opendev was using undocumented output variables, as well as making sure they persist across playbooks with cacheable: true18:45
fungiOpen10K8S: the argument for adding the zuul_upload_logs_results fact is that it can be used by subsequent playbooks which need to know the url where logs are being made available18:46
fungiif that's in a zuul variable supplied to ansible for those playbooks, then i suppose it could also work?18:47
fungicacheable facts are simply the primary mechanism i think we've used to communicate values between different playbooks18:48
corvusupload-docker-image timed out (?) for nodepool18:53
corvushttps://zuul.opendev.org/t/zuul/build/ca898dd7dc554cf5b0be90afbaff1957/log/job-output.txt#585418:55
fungihuh18:55
corvusthat appears to be during the build phase?18:55
corvuson arm6418:55
fungilooks like it was in the middle of pip install?18:56
fungimaybe compiling some things from sdist and took too long?18:56
fungiyeah, looks like that was the run phase timeout from zuul mere seconds after output was streamed to the console, so the node could have simply been struggling and the build going much slower18:58
corvusyeah, that requirement took almost 5 minutes in the previous run: https://zuul.opendev.org/t/zuul/build/50504dba533a4f04aaa15a25f4d637a6/log/job-output.txt#606418:59
fungia similar step earlier took almost 7 minutes https://zuul.opendev.org/t/zuul/build/ca898dd7dc554cf5b0be90afbaff1957/log/job-output.txt#552018:59
corvusseems like cross-compiling image builds just take ~1 hour18:59
corvusfungi: what will happen if we run the pypi upload job again?19:00
corvus(it succeeded)19:00
mordredcorvus: zomg https://gerrit-review.googlesource.com/c/zuul/jobs/+/297442 passed19:00
fungiit will break because pypi will reject an attempt to upload a file which already exists19:01
corvusmordred: \o/  any speedup?  (btw, now that that's there, we might be able to use multiprocessing to speed up more)19:01
fungiwe could in theory make the pypi upload role smarter and do a lbyl, i suppose19:01
corvusfungi: that's fine -- as long as it isn't going to mess up pypi.19:02
mordredcorvus: test-gerrit-setup took 3min 8sec19:02
corvusmordred: this is when avass's change to print playbook times would be handy19:03
avass:)19:03
corvusfungi: so i'll re-enqueue the ref.  pypi will harmlessly fail and docker will hopefully succeed.  in an hour we'll know.19:03
corvusfungi: sound right?19:03
avassmy change only added plays however, playbooks aren't recorded at the moment19:03
fungicorvus: yeah, that seems fine19:03
mordredcorvus: yah - although previous successful runs of that seem to be in the 7-9 minute mark19:04
fungicorvus: since they're separate jobs and not dependent, it'll be fine19:04
corvusmordred: that sounds like a speedup!19:04
corvuszuul enqueue-ref --tenant zuul --trigger gerrit --pipeline release --project zuul/nodepool --ref refs/tags/4.0.0 --newrev e983422fabd34e21143057700ce434a874b5930c19:04
*** saneax has quit IRC19:04
corvuslook right?19:04
fungilgtm, that seems to be the correct id for the tag19:05
corvus(i copied the sha from inventory.yaml and it also looks like it matches 'git show-ref 4.0.0.')19:05
corvusdone19:05
fungicorvus: i've been using the script in https://review.opendev.org/613676 to similar ends for reenqueuing failed releases for openstack, maybe something like that could make sense to implement as part of zuul-client19:06
fungibasically parse the inventory, call enqueue or enqueue-ref with the parameters therein19:07
openstackgerritJames E. Blair proposed zuul/nodepool master: Increase timeout of docker release job  https://review.opendev.org/c/zuul/nodepool/+/77698619:07
corvusfungi: sounds good19:07
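The idea fungi describes (read the build inventory, then call enqueue-ref with the values found there) could be sketched roughly as follows. The command-line flags are the ones corvus used above; the layout of the inventory's zuul variables (tenant, pipeline, project.name, ref, newrev) and the function name are assumptions for illustration, and the trigger is passed in separately because it is not assumed to be in the inventory. This is not the script from the linked review.

```python
import shlex


def enqueue_ref_command(zuul_vars, trigger="gerrit"):
    """Build a 'zuul enqueue-ref' command line from a build's zuul variables.

    zuul_vars is assumed to be the 'zuul' dictionary taken from a build's
    inventory; the key names used here are assumptions for this sketch.
    """
    args = [
        "zuul", "enqueue-ref",
        "--tenant", zuul_vars["tenant"],
        "--trigger", trigger,
        "--pipeline", zuul_vars["pipeline"],
        "--project", zuul_vars["project"]["name"],
        "--ref", zuul_vars["ref"],
        "--newrev", zuul_vars["newrev"],
    ]
    # shlex.join quotes any argument that needs it (Python 3.8+).
    return shlex.join(args)


# Values taken from the re-enqueue command shown earlier in the log.
example_vars = {
    "tenant": "zuul",
    "pipeline": "release",
    "project": {"name": "zuul/nodepool"},
    "ref": "refs/tags/4.0.0",
    "newrev": "e983422fabd34e21143057700ce434a874b5930c",
}

print(enqueue_ref_command(example_vars))
```

Returning the command as a string rather than running it keeps the sketch safe to try; an operator would still review the output before executing it.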
corvuszuul-maint: ^ let's go ahead and merge that so we're ready next time.19:08
corvusmordred: i'm running with "--test_tag_filters=-flaky,-elastic,-git-protocol-v2" and it's not looking good; lots of tests timing out :/19:09
mordred:(19:09
*** jamesmcarthur has joined #zuul19:12
mordredcorvus: so, 'passed' might have been a bit misleading19:19
mordredcorvus: https://ci.gerritcodereview.com/t/gerrit/build/35cf5e26276f4542a77cfd2bff08094c/console#1/0/20/testnode19:19
Open10K8Sfungi: avass19:21
Open10K8Sthank you19:21
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Revert "Revert "Update upload-logs roles to support endpoint override""  https://review.opendev.org/c/zuul/zuul-jobs/+/77667719:21
corvusmordred: oof, i also found this old change where i was doing a bunch of stuff to try to get the tests to pass, including running on a huge node: https://gerrit-review.googlesource.com/c/zuul/jobs/+/26027519:22
corvusmordred: how does this look? https://etherpad.opendev.org/p/1BRSLMfR7rc2frbMz9S219:28
*** jamesmcarthur_ has joined #zuul19:30
corvusmordred: i'm kind of at a point of wondering whether it's worth moving on to try to get the tests running (if they timed out after an hour on a large node last year....) or just stop where we are and try to get rbe to work, since it seems the replies pretty much continually mention rbe.19:31
*** jamesmcarthur has quit IRC19:33
*** jamesmcarthur_ has quit IRC19:34
mordredcorvus: yeah... maybe figuring out RBE is somehow better?19:37
corvusmordred: 2 things: 1 the bazel build with my cache was faster (3m) than luca's (4m).  (weird?  maybe an old cache?  or probably a different environment causing more cache misses based on environmental inputs)20:52
corvusmordred: 2 -- i ran a build on a 60G node and it finished, but still failed.  the test portion of the build took 37 minutes.20:53
corvusgiven that it failed and there were a bunch of timed out tests, i don't know if that represents a minimum run time or not.  maybe passing tests run faster.20:54
mordredcorvus: and enabling luca's cache didn't fix the test timeouts20:56
corvusapparently not20:56
corvusmordred: i updated https://etherpad.opendev.org/p/1BRSLMfR7rc2frbMz9S2 slightly; i think i'm inclined to send that now; what do you think?20:59
openstackgerritMerged zuul/nodepool master: Increase timeout of docker release job  https://review.opendev.org/c/zuul/nodepool/+/77698621:00
corvussecond timeout on nodepool 4.0.0 docker build job21:11
corvusre-re-enqueued21:11
clarkbthis time it should run with the longer timeout at least21:13
corvusyep21:14
mordredcorvus: yeah - I think that looks good21:36
corvussent21:40
clarkbwhat is RBE?21:54
corvusclarkb: bazel remote execution build farm thingy21:55
clarkboh interesting21:55
corvusit's self-hostable, or google runs one; gerrit would likely use the google hosted instance.  main issue with zuul is credentials, but there may be a possible solution with application default credentials and service account.21:56
clarkbis that the sort of thing that might make sense as a nodepool tie in?21:59
mordrednot really21:59
clarkbnodepool provisions some portion of the build farm (maybe that doesn't even make sense)21:59
corvusi think in rbe the build farm is expected to be long-lived21:59
mordredthe rbe is designed to be an existing long-running thing because it incorporates caching and shares build resources across different build jobs21:59
corvusand also not exclusive, so i don't think nodepool would need to mediate access (like static nodes)21:59
clarkbya and ansible probably doesn't have an rbe driver to talk to anything that would be provisioned22:00
corvusi could see maybe using nodepool to make some kind of limited credential so it could be given to an untrusted job and if it were exposed damage would be limited22:00
clarkbbut ya something like ^ was what I had in mind22:00
clarkbsince nodepool is our currently "give me resources in a secure manner" tool22:01
corvusbut i'd rather have a system where you can't expose the credential in the first place, not one where damage is limited to 1 hour blocks or something.22:01
mordred++22:01
corvustristanC: i just noticed that if you have a .kube/config file, the openshiftpods driver will try to refresh a token for it, even if nodepool isn't using any k8s drivers at all.  (in other words, i left a .kube/config file with a google credential sitting around, started up nodepool-launcher with a config file only for azure, and it failed to start because it tried to load application default creds); we may22:42
corvuswant to revisit the driver initialization there.22:42
corvus(this probably isn't going to affect many people in production, unless they change clouds)22:43
tristanCcorvus: do you still have the stacktrace?22:49
corvustristanC: yes! http://paste.openstack.org/show/802901/22:52
tristanCi see, let me add an except handler22:56
corvustristanC: do you think we could avoid calling it in the first place?22:56
corvusbecause it's still trying to obtain tokens, etc22:57
corvusi think an exception handler would be a step backwards -- if the user doesn't intend to use it, then they're paying an unnecessary startup cost.  and if they do intend to use it, then it's better for nodepool to break on startup22:58
*** ikhan has quit IRC23:02
tristanCwell we can also lazy load the client23:06
corvusthat might be a good stopgap23:08
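The lazy-loading stopgap tristanC suggests can be sketched generically: the client is only constructed the first time something asks for it, so a driver that is never used never reads credentials or refreshes tokens. The class and factory names below are hypothetical and this is not nodepool's driver code.

```python
class LazyClient:
    """Defer building an expensive client until it is first requested."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds the real client
        self._client = None

    def get(self):
        # Build the client on first use only; later calls reuse it.
        if self._client is None:
            self._client = self._factory()
        return self._client

    @property
    def created(self):
        # True once the underlying client has actually been built.
        return self._client is not None


# Stand-in factory that records how often it is called; a real one would
# load a kubeconfig and obtain tokens, which is the cost being deferred.
calls = []


def fake_factory():
    calls.append("created")
    return object()


lazy = LazyClient(fake_factory)
created_before_use = lazy.created  # nothing built at construction time
first = lazy.get()
second = lazy.get()
```

A `created` flag like this is also one way a reset step could skip clients that were never built, which relates to the Driver.reset() question raised just below.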
corvuszomg the docker upload ran this time.  it took 58 mins.  :)23:12
corvusi think that means that we finally managed to release 4.023:12
corvusi will work on release announcements now23:12
fungiyeesh, just barely scraped through under the timeout23:13
corvusthe timeout should be higher now, though we didn't end up verifying that :)23:13
fungier, yeah, the old timeout23:14
tristanCcorvus: hmm, this is happening on Driver.reset(), not sure how to indicate a driver is not used and should not be reset23:18
openstackgerritTristan Cacqueray proposed zuul/nodepool master: kubernetes: refactor client creation to utils_k8s  https://review.opendev.org/c/zuul/nodepool/+/77702223:26
openstackgerritTristan Cacqueray proposed zuul/nodepool master: drivers: only reset drivers that have been active  https://review.opendev.org/c/zuul/nodepool/+/77702423:47
tristanCcorvus: not super fan of this solution, but that might work ^23:47
openstackgerritTristan Cacqueray proposed zuul/nodepool master: drivers: only reset drivers that have been active  https://review.opendev.org/c/zuul/nodepool/+/77702423:48
clarkbtristanC: that code doesn't seem to check if they are active though? it is doing the accounting but then not taking action on it?23:59