Wednesday, 2021-04-07

fungicorvus: indeed, yet it also in a rather contradictory way encouraged that you "don't panic" right on the cover00:03
*** tosky has quit IRC00:17
corvusneat, the operator test job failed on limestone because there was insufficient memory to start a 3-node pxc cluster: https://zuul.opendev.org/t/zuul/build/eaf6255eda634b56a8e75a88110de9cf/log/describe-pods.txt#82200:29
corvus(but we have seen it get past that point on other providers)00:29
corvuswe could probably tweak the pxc memory requirements, or we could make an actual 3-node k8s cluster00:31
fungiif "other providers" are vexxhost, they upped our minimum ram on the flavor we're using to ~32gb in order to balance against cpu count00:31
corvusfungi: indeed, the successful run i picked was vexxhost; that's a bit dangerous for exactly this reason00:33
corvusfungi: last time we were in this situation, we booted nodes with reduced ram00:33
fungiyep, i brought all that up at the time00:33
corvusfungi: i don't remember any of this -- is there a reason opendev didn't decide to do that this time?  or just nobody got around to doing it?00:34
fungiit also required us to bake the memory limit into our images, if i recall correctly, since it was done with kernel command line parameters00:34
corvusfungi: yes, that is my recollection00:34
fungicorvus: there are some comments on 773710 about it, and then brought it up in the meeting http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.log.html#l-14600:39
corvuswow that change merged in 30 minutes00:42
corvus-> #opendev00:42
fungithere was also related discussion in http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-02-02.log.html#t2021-02-02T15:33:33-200:42
*** hamalq has quit IRC00:48
*** rlandy|bbl has quit IRC01:41
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503901:57
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2  https://review.opendev.org/c/zuul/zuul-operator/+/78504702:00
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508302:00
corvustristanC, mordred: ^ that's the start of some docs; i intend to write more.  that might help explain some of what's in 78503902:01
*** evrardjp has quit IRC02:33
*** evrardjp has joined #zuul02:33
*** josefwells has quit IRC03:01
*** bhavikdbavishi has joined #zuul03:03
*** bhavikdbavishi has quit IRC03:08
*** sam_wan has joined #zuul03:35
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503903:53
*** vishalmanchanda has joined #zuul03:57
*** ykarel has joined #zuul04:11
*** saneax has joined #zuul04:47
*** saneax has quit IRC05:40
*** saneax has joined #zuul06:16
*** ykarel_ has joined #zuul06:18
*** ykarel has quit IRC06:21
openstackgerritSimon Westphahl proposed zuul/zuul master: Switch to ZooKeeperSimpleBase where possible  https://review.opendev.org/c/zuul/zuul/+/78509106:30
*** ykarel_ is now known as ykarel06:32
tobiashswest, corvus: +3 with comment on https://review.opendev.org/c/zuul/zuul/+/78381706:44
*** jcapitao has joined #zuul06:53
*** rpittau|afk is now known as rpittau07:03
*** tosky has joined #zuul07:37
*** ykarel has quit IRC07:56
*** jpena|off is now known as jpena07:57
openstackgerritSimon Westphahl proposed zuul/zuul master: Stop active event gathering on connection loss  https://review.opendev.org/c/zuul/zuul/+/78510008:06
swesttobiash: corvus: ^ proposal to handle the connection loss08:13
openstackgerritSimon Westphahl proposed zuul/zuul master: Stop active event gathering on connection loss  https://review.opendev.org/c/zuul/zuul/+/78510008:18
*** ykarel has joined #zuul08:20
*** rpittau is now known as rpittau|bbl09:23
tobiashclarkb: I've dug deeper into the repo state and isUpdateNeeded thing. I think we just miss a restore repo state after updating the repos (see comment)09:29
openstackgerritGuillaume Chauvel proposed zuul/zuul master: quick-start: Make zookeeper wait for certificates  https://review.opendev.org/c/zuul/zuul/+/78511009:29
openstackgerritMerged zuul/zuul master: Dispatch Gerrit events via Zookeeper  https://review.opendev.org/c/zuul/zuul/+/78381609:47
tobiashclarkb: yepp, validated in the log of a local test case that the repo state restore is missing. This went unnoticed because of a flaw in the global repo state test case. I'll work on a fix for the test case and repo state restore.09:59
*** lyr has quit IRC10:18
*** lyr has joined #zuul10:20
*** ykarel has quit IRC10:41
*** ykarel has joined #zuul10:48
*** jcapitao is now known as jcapitao_lunch10:54
*** dpawlik4 has joined #zuul11:40
*** dpawlik4 is now known as dpawlik11:42
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects  https://review.opendev.org/c/zuul/zuul/+/78515211:43
tobiashclarkb: this should be one part of the fix (required-projects use case)11:43
tobiashclarkb: the fix for the playbooks and roles will follow11:43
*** rlandy has joined #zuul11:43
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects  https://review.opendev.org/c/zuul/zuul/+/78515211:50
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects  https://review.opendev.org/c/zuul/zuul/+/78515211:52
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects  https://review.opendev.org/c/zuul/zuul/+/78515211:53
*** jcapitao_lunch is now known as jcapitao11:55
*** sanjayu_ has joined #zuul12:10
*** saneax has quit IRC12:12
openstackgerritTobias Henkel proposed zuul/zuul master: Fix missing restore repo state for playbooks and roles  https://review.opendev.org/c/zuul/zuul/+/78516212:17
tobiashclarkb: and this is the second part ^12:17
tobiashcorvus: looks like the quick start job fails to set up tls zk: https://zuul.opendev.org/t/zuul/build/5dad7c4f2ac44a0ca1ba597ad178ca7c/log/container_logs/zk.log12:23
guillaumectobiash, https://review.opendev.org/c/zuul/zuul/+/78511012:36
tobiashguillaumec: oh thanks, missed that :)12:36
*** sanjayu_ has quit IRC12:42
*** rpittau|bbl is now known as rpittau12:51
*** rlandy is now known as rlandy|rover13:52
openstackgerritMerged zuul/zuul master: quick-start: Make zookeeper wait for certificates  https://review.opendev.org/c/zuul/zuul/+/78511014:51
clarkbtobiash: I'll take a look in a bit. What confuses me is when I read the restore repo state method it seems to be there to set refs not sync from external sources15:07
clarkbtobiash: the rough process seemed to be: pull from external sources if necessary, merge zuul changes, set refs to point at zuul changes as necessary and this last step was the repo restore step15:08
openstackgerritJames E. Blair proposed zuul/zuul master: Stop active event gathering on connection loss  https://review.opendev.org/c/zuul/zuul/+/78510015:14
openstackgerritJames E. Blair proposed zuul/zuul master: Add a checkpoint release note  https://review.opendev.org/c/zuul/zuul/+/78505415:14
openstackgerritTobias Henkel proposed zuul/nodepool master: Add simple load testing script  https://review.opendev.org/c/zuul/nodepool/+/77584315:22
clarkbtobiash: ok I think I see my confusion I read your earlier comments as meaning setRepoState() was the problem, but it is _restoreRepoState() which wasn't called at all previously?15:26
clarkbshould we make that a public method if we intend it to be called externally?15:26
clarkbtobiash: also I think we should add my test to your change15:27
tobiashyeah, I think we should make that a public method15:27
clarkb(may need to update my test to call slightly different methods though)15:27
tobiashclarkb: the change to the test case does test the fix15:27
tobiashat least it makes sure that the repo state gets restored15:28
clarkbtobiash: ya, I don't think it covers the specific case of a fast forward, but maybe this is sufficient15:29
*** ykarel is now known as ykarel|away15:41
*** hamalq has joined #zuul16:00
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503916:01
*** ykarel|away has quit IRC16:02
*** sshnaidm is now known as sshnaidm|afk16:06
openstackgerritMerged zuul/zuul master: Ensure single instance for active event gathering  https://review.opendev.org/c/zuul/zuul/+/78381716:07
*** sam_wan has quit IRC16:11
*** jcapitao has quit IRC16:32
openstackgerritMerged zuul/zuul master: Fix default parameter in GitlabSource  https://review.opendev.org/c/zuul/zuul/+/77925516:38
openstackgerritMerged zuul/zuul master: Periodically clean up leaked semaphores  https://review.opendev.org/c/zuul/zuul/+/78452316:38
openstackgerritMerged zuul/zuul master: Retry job on broken process pool  https://review.opendev.org/c/zuul/zuul/+/77787416:39
*** avass has joined #zuul16:46
*** rpittau is now known as rpittau|afk16:46
*** josefwells has joined #zuul16:47
*** rlandy|rover is now known as rlandy|rover|lch16:58
openstackgerritSorin Sbârnea proposed zuul/zuul master: Document local testing  https://review.opendev.org/c/zuul/zuul/+/76646017:03
*** jpena is now known as jpena|off17:08
*** rlandy|rover|lch is now known as rlandy|rover17:19
*** y2kenny has joined #zuul17:29
*** sassyn has joined #zuul17:30
*** sassyn has quit IRC17:37
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503917:42
*** y2kenny has quit IRC17:44
clarkbtobiash: ok I tried to leave some detailed notes (sorry it took a while)18:33
pabelangerso, interesting issue. someone just opened a PR in ansible/ansible and wanted to merge 4186 commits: https://github.com/ansible/ansible/pull/7417818:34
clarkbtobiash: I'm not really convinced this fix is the solution since merge_items and state_items are disjoint and the calls of setRepoState against state_items should've already called _restoreRepoState on those repos internal to the merger18:34
pabelangernow zuul is churning through those commits, even after the PR is closed18:34
clarkbpabelanger: you might be able to run the dequeue command to tell it to stop18:34
pabelangertrying to figure out the best way to deal with this, given we are blocked on other events right now for 2 hours18:35
clarkbor restart the merger doing the work and have it return an error18:35
pabelangernot sure how to figure out the merger18:36
pabelangerlet me look18:36
clarkbpabelanger: look for the really busy one?18:36
pabelangerwell, it isn't the merger18:39
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2  https://review.opendev.org/c/zuul/zuul-operator/+/78504718:39
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508318:39
pabelangerit is the API calls to github18:39
clarkbpabelanger: oh neat18:39
pabelanger2021-04-07 18:39:33,956 DEBUG zuul.GithubRequest: [e: ab169bc0-97c1-11eb-9073-bd928821d5df] GET https://api.github.com/repositories/3638964/pulls/74178/files?per_page=100&page=7 result: 200, size: 624883, duration: 54818:39
clarkbpabelanger: I didn't think the zuul api calls changed by number of commits in a PR18:39
pabelangeretc18:39
clarkbTIL18:39
clarkbthis is probably even before the event has been queued18:40
pabelangerand, I am unsure I can dequeue, as the change isn't in the web UI18:40
clarkbso dequeue wouldn't help18:40
pabelangeryah18:40
pabelangerbasically, need a way to drop the current request or some sort of limit of merges to loop over18:42
clarkbpabelanger: is it doing these lookups per commit and that is slow or is it per PR but the PR is so large that the constituent pieces are slow?18:43
pabelangerno, it looks like a lookup per commit18:43
pabelangerI can see API hits, then merger trigger, then API hits18:43
clarkbI'm not familiar enough with the github driver to know for sure, but I suspect it would be difficult to limit commit lookups and maintain correctness for testing PRs that exceed that limit. But maybe it can return an error and ask people to trim18:45
clarkbs/trim/squash/ might be more accurate18:45
pabelangeryah, each commit is a github event18:47
pabelangerwhich we then process18:47
clarkbpabelanger: looking at githubconnection.py the files listings are the only place the driver does that18:52
pabelangeryah18:55
pabelanger2021-04-07 18:54:28,227 INFO zuul.GithubConnection: [e: b0983720-97c1-11eb-9907-eed5aeae07ee] Updating <Change 0x7fedfc87d278 ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65>18:55
clarkbreading the code it seems we only expect 300 files as a limit on the PR events (and log a warning when that is exceeded)18:57
clarkbpabelanger: on the non PR side (eg tags or branch push) it does iterate through all the new commits18:58
pabelangerchange.files, that is for filematch I assume?18:58
clarkbI wonder if we get both types of event when a PR is open18:58
clarkbpabelanger: yes18:58
clarkbpabelanger: I think it may also help detect if there are new zuul configs to evaluate18:59
pabelangeryah18:59
pabelangerokay, that makes sense18:59
pabelangerin this case, we'd never load config from ansible/ansible because of exclude config setting in project (tenant)19:00
pabelangerso maybe we could short cut in that case19:00
clarkbwell it already is short cutted is what I'm saying19:00
clarkbunless PRs somehow trigger non PR events too19:00
pabelangeroh19:00
clarkbI'm getting links up now19:01
clarkbhttps://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L1384-L1402 PR case19:01
clarkbhttps://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L2171-L2173 branch push/tag case19:02
clarkbthe behavior you describe seems more like the branch push/tag case since that does iterate over every commit in the push and lists files19:02
clarkbbut when in a PR context we appear to ask the API for the commits in the PR overall and log a warning if there are too many19:02
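[editor's note] The capped, paginated file listing clarkb describes for the PR case can be sketched roughly as follows. This is not Zuul's code: list_changed_files, fetch_page and fake_fetch are hypothetical names invented for illustration, and the only values taken from the log are the page size of 100 (from the per_page=100 request above) and the 300-file limit clarkb mentions.

```python
import logging

log = logging.getLogger("sketch.files")


def list_changed_files(fetch_page, per_page=100, max_files=300):
    """Collect file names page by page, stopping at max_files.

    fetch_page(page, per_page) is a hypothetical callable that returns
    the list of file names on the given 1-based page.
    """
    files = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        files.extend(batch)
        if len(batch) < per_page:
            break  # short page: nothing more to fetch
        if len(files) >= max_files:
            # Stop instead of walking every page of a huge PR.
            log.warning("file list capped at %d entries; it may be incomplete",
                        max_files)
            break
        page += 1
    return files[:max_files]


def fake_fetch(total):
    """Build an in-memory stand-in for the paginated API (illustration only)."""
    names = [f"file{i}.py" for i in range(total)]

    def fetch_page(page, per_page):
        start = (page - 1) * per_page
        return names[start:start + per_page]

    return fetch_page


if __name__ == "__main__":
    print(len(list_changed_files(fake_fetch(42))))
    print(len(list_changed_files(fake_fetch(650))))
```

With a cap like this, the cost per event is bounded by max_files / per_page requests, regardless of how large the PR is; the trade-off is that file matchers see a truncated list.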
pabelangerfrom what I see, we don't loop over the commit in the PR, we got a ton of events for each commit, and we are looping over those events19:04
pabelangerwhich is a single commit19:04
pabelanger2021-04-07 19:04:58,147 DEBUG zuul.GithubConnection: [e: bca42b50-97c1-11eb-84e2-8d446fc8f00f] Scheduling event from github.com: <GithubTriggerEvent 0x7fee1fb9afd0 check_run ansible/ansible refs/pull/74178/head completed github.com/ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bca42b50-97c1-11eb-84e2-8d446fc8f00f>19:05
pabelangerI see that each time we loop over the files19:05
pabelangeronce we get all the files, then another event from same PR comes in19:05
pabelangerbut, it seems to always be the same commit 81ac1d19:05
clarkbok that would be the _event_check_run codepath I think19:06
pabelanger2021-04-07 19:06:35,212 DEBUG zuul.GithubConnection: [e: bd34f9a0-97c1-11eb-9ce2-c5809c60be27] Scheduling event from github.com: <GithubTriggerEvent 0x7fee13eba198 check_run ansible/ansible refs/pull/74178/head completed github.com/ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bd34f9a0-97c1-11eb-9ce2-c5809c60be27>19:06
pabelangernext event19:06
*** josefwells has quit IRC19:07
clarkbya so the delivery ids differ but the gist of the action is the same19:07
pabelangeryup19:07
clarkbI think that implies github is sending the same event over and over again19:07
clarkbdoes each commit in a PR trigger a different check_run event?19:08
pabelangeryah, for each commit I believe19:08
pabelangerI _think_ so19:08
pabelangerthat is what I am going to read up on19:08
clarkbyou can double check that through the github app history of hook requests iirc19:08
clarkbto me that doesn't make a lot of sense for their api design unless the PR was modified with a new commit over and over in sequence over time19:09
pabelangerway too many events now to look19:11
clarkbcan you filter by id?19:11
clarkb(if so you could grab a few from your zuul log, pull them up and see if they differ)19:11
clarkbs/if they differ/how they differ/19:11
pabelangerno, it is just a long list by hour19:12
pabelangerI may be able to via the api somehow, but not web ui19:12
pabelangerk, I have to run for a bit, will update later19:14
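[editor's note] One way to check whether the differing delivery ids carry the same event, as clarkb suggests, is to group the "Scheduling event" log lines by everything except the delivery id. The sketch below is an assumption-laden illustration: the regular expression is derived only from the two GithubTriggerEvent lines quoted above, and group_deliveries is a hypothetical helper, not part of Zuul.

```python
import re
from collections import defaultdict

# Pattern inferred from the two log lines quoted in this conversation;
# other event types may be formatted differently.
EVENT_RE = re.compile(
    r"<GithubTriggerEvent 0x[0-9a-f]+ (?P<type>\S+) (?P<project>\S+) "
    r"(?P<ref>\S+) (?P<action>\S+) \S+ "
    r"(?P<pr>\d+),(?P<sha>[0-9a-f]{40}) delivery: (?P<delivery>[0-9a-f-]+)>"
)


def group_deliveries(lines):
    """Map (type, action, project, pr, sha) to the set of delivery ids seen."""
    groups = defaultdict(set)
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not a scheduling line, or an unexpected format
        key = (m["type"], m["action"], m["project"], m["pr"], m["sha"])
        groups[key].add(m["delivery"])
    return dict(groups)


SAMPLE = [
    "<GithubTriggerEvent 0x7fee1fb9afd0 check_run ansible/ansible "
    "refs/pull/74178/head completed github.com/ansible/ansible "
    "74178,81ac1d492cecf541dc2d479b1c537b199bafca65 "
    "delivery: bca42b50-97c1-11eb-84e2-8d446fc8f00f>",
    "<GithubTriggerEvent 0x7fee13eba198 check_run ansible/ansible "
    "refs/pull/74178/head completed github.com/ansible/ansible "
    "74178,81ac1d492cecf541dc2d479b1c537b199bafca65 "
    "delivery: bd34f9a0-97c1-11eb-9ce2-c5809c60be27>",
]

if __name__ == "__main__":
    for key, deliveries in group_deliveries(SAMPLE).items():
        print(key, len(deliveries))
```

A key with many delivery ids means many separate deliveries for the same check_run on the same head commit. Whether those are redeliveries or distinct check runs cannot be told from these fields alone; that needs the payloads themselves.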
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503919:58
*** vishalmanchanda has quit IRC20:16
*** mgagne has joined #zuul20:22
*** spotz has quit IRC20:38
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework  https://review.opendev.org/c/zuul/zuul-operator/+/78503920:38
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2  https://review.opendev.org/c/zuul/zuul-operator/+/78504720:38
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508320:38
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper  https://review.opendev.org/c/zuul/zuul-operator/+/78527320:38
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB  https://review.opendev.org/c/zuul/zuul-operator/+/78527320:44
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508320:44
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508320:59
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options  https://review.opendev.org/c/zuul/zuul-operator/+/78527720:59
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Add merger support  https://review.opendev.org/c/zuul/zuul-operator/+/78527820:59
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions  https://review.opendev.org/c/zuul/zuul-operator/+/78527920:59
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB  https://review.opendev.org/c/zuul/zuul-operator/+/78527321:19
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options  https://review.opendev.org/c/zuul/zuul-operator/+/78527721:19
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Add merger support  https://review.opendev.org/c/zuul/zuul-operator/+/78527821:19
openstackgerritJames E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions  https://review.opendev.org/c/zuul/zuul-operator/+/78527921:19
openstackgerritJames E. Blair proposed zuul/zuul-operator master: WIP: docs  https://review.opendev.org/c/zuul/zuul-operator/+/78508321:19
*** spotz has joined #zuul21:38
openstackgerritClark Boylan proposed zuul/nodepool master: DO NOT MERGE Just need a test image built  https://review.opendev.org/c/zuul/nodepool/+/78528621:48
clarkbfungi: ^ I think that will produce the image we want21:48
clarkbTBD if installing dib from source like that will be too slow on the emulated image build to get in under the timeout21:49
*** ianw_pto is now known as ianw22:14
*** rlandy|rover is now known as rlandy|rover|bbl22:34
*** tosky has quit IRC23:11
openstackgerritClark Boylan proposed zuul/nodepool master: DO NOT MERGE Just need a test image built  https://review.opendev.org/c/zuul/nodepool/+/78528623:20
clarkblet's see if that does better. I think the problem was it tries to build two images but one of them didn't set the arches list and when you trigger the buildx path you need to set arches23:21
