fungi | corvus: indeed, yet it also in a rather contradictory way encouraged that you "don't panic" right on the cover | 00:03 |
---|---|---|
*** tosky has quit IRC | 00:17 | |
corvus | neat, the operator test job failed on limestone because there was insufficient memory to start a 3-node pxc cluster: https://zuul.opendev.org/t/zuul/build/eaf6255eda634b56a8e75a88110de9cf/log/describe-pods.txt#822 | 00:29 |
corvus | (but we have seen it get past that point on other providers) | 00:29 |
corvus | we could probably tweak the pxc memory requirements, or we could make an actual 3-node k8s cluster | 00:31 |
fungi | if "other providers" are vexxhost, they upped our minimum ram on the flavor we're using to ~32gb in order to balance against cpu count | 00:31 |
corvus | fungi: indeed, the successful run i picked was vexxhost; that's a bit dangerous for exactly this reason | 00:33 |
corvus | fungi: last time we were in this situation, we booted nodes with reduced ram | 00:33 |
fungi | yep, i brought all that up at the time | 00:33 |
corvus | fungi: i don't remember any of this -- is there a reason opendev didn't decide to do that this time? or just nobody got around to doing it? | 00:34 |
fungi | it also required us to bake the memory limit into our images, if i recall correctly, since it was done with kernel command line parameters | 00:34 |
corvus | fungi: yes, that is my recollection | 00:34 |
fungi | corvus: there are some comments on 773710 about it, and then brought it up in the meeting http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-02-19.01.log.html#l-146 | 00:39 |
corvus | wow that change merged in 30 minutes | 00:42 |
corvus | -> #opendev | 00:42 |
fungi | there was also related discussion in http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-02-02.log.html#t2021-02-02T15:33:33-2 | 00:42 |
*** hamalq has quit IRC | 00:48 | |
*** rlandy|bbl has quit IRC | 01:41 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 01:57 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2 https://review.opendev.org/c/zuul/zuul-operator/+/785047 | 02:00 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 02:00 |
corvus | tristanC, mordred: ^ that's the start of some docs; i intend to write more. that might help explain some of what's in 785039 | 02:01 |
*** evrardjp has quit IRC | 02:33 | |
*** evrardjp has joined #zuul | 02:33 | |
*** josefwells has quit IRC | 03:01 | |
*** bhavikdbavishi has joined #zuul | 03:03 | |
*** bhavikdbavishi has quit IRC | 03:08 | |
*** sam_wan has joined #zuul | 03:35 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 03:53 |
*** vishalmanchanda has joined #zuul | 03:57 | |
*** ykarel has joined #zuul | 04:11 | |
*** saneax has joined #zuul | 04:47 | |
*** saneax has quit IRC | 05:40 | |
*** saneax has joined #zuul | 06:16 | |
*** ykarel_ has joined #zuul | 06:18 | |
*** ykarel has quit IRC | 06:21 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Switch to ZooKeeperSimpleBase where possible https://review.opendev.org/c/zuul/zuul/+/785091 | 06:30 |
*** ykarel_ is now known as ykarel | 06:32 | |
tobiash | swest, corvus: +3 with comment on https://review.opendev.org/c/zuul/zuul/+/783817 | 06:44 |
*** jcapitao has joined #zuul | 06:53 | |
*** rpittau|afk is now known as rpittau | 07:03 | |
*** tosky has joined #zuul | 07:37 | |
*** ykarel has quit IRC | 07:56 | |
*** jpena|off is now known as jpena | 07:57 | |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Stop active event gathering on connection loss https://review.opendev.org/c/zuul/zuul/+/785100 | 08:06 |
swest | tobiash: corvus: ^ proposal to handle the connection loss | 08:13 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: Stop active event gathering on connection loss https://review.opendev.org/c/zuul/zuul/+/785100 | 08:18 |
*** ykarel has joined #zuul | 08:20 | |
*** rpittau is now known as rpittau|bbl | 09:23 | |
tobiash | clarkb: I've dug deeper into the repo state and isUpdateNeeded thing. I think we're just missing a repo state restore after updating the repos (see comment) | 09:29 |
openstackgerrit | Guillaume Chauvel proposed zuul/zuul master: quick-start: Make zookeeper wait for certificates https://review.opendev.org/c/zuul/zuul/+/785110 | 09:29 |
openstackgerrit | Merged zuul/zuul master: Dispatch Gerrit events via Zookeeper https://review.opendev.org/c/zuul/zuul/+/783816 | 09:47 |
tobiash | clarkb: yepp, validated in the log of a local test case that the repo state restore is missing. This went unnoticed because of a flaw in the global repo state test case. I'll work on a fix for the test case and repo state restore. | 09:59 |
*** lyr has quit IRC | 10:18 | |
*** lyr has joined #zuul | 10:20 | |
*** ykarel has quit IRC | 10:41 | |
*** ykarel has joined #zuul | 10:48 | |
*** jcapitao is now known as jcapitao_lunch | 10:54 | |
*** dpawlik4 has joined #zuul | 11:40 | |
*** dpawlik4 is now known as dpawlik | 11:42 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects https://review.opendev.org/c/zuul/zuul/+/785152 | 11:43 |
tobiash | clarkb: this should be one part of the fix (required-projects use case) | 11:43 |
tobiash | clarkb: the fix for the playbooks and roles will follow | 11:43 |
*** rlandy has joined #zuul | 11:43 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects https://review.opendev.org/c/zuul/zuul/+/785152 | 11:50 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects https://review.opendev.org/c/zuul/zuul/+/785152 | 11:52 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix missing repo state restore with required projects https://review.opendev.org/c/zuul/zuul/+/785152 | 11:53 |
*** jcapitao_lunch is now known as jcapitao | 11:55 | |
*** sanjayu_ has joined #zuul | 12:10 | |
*** saneax has quit IRC | 12:12 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix missing restore repo state for playbooks and roles https://review.opendev.org/c/zuul/zuul/+/785162 | 12:17 |
tobiash | clarkb: and this is the second part ^ | 12:17 |
tobiash | corvus: looks like the quick start job fails to set up tls zk: https://zuul.opendev.org/t/zuul/build/5dad7c4f2ac44a0ca1ba597ad178ca7c/log/container_logs/zk.log | 12:23 |
guillaumec | tobiash, https://review.opendev.org/c/zuul/zuul/+/785110 | 12:36 |
tobiash | guillaumec: oh thanks, missed that :) | 12:36 |
*** sanjayu_ has quit IRC | 12:42 | |
*** rpittau|bbl is now known as rpittau | 12:51 | |
*** rlandy is now known as rlandy|rover | 13:52 | |
openstackgerrit | Merged zuul/zuul master: quick-start: Make zookeeper wait for certificates https://review.opendev.org/c/zuul/zuul/+/785110 | 14:51 |
clarkb | tobiash: I'll take a look in a bit. What confuses me is when I read the restore repo state method it seems to be there to set refs not sync from external sources | 15:07 |
clarkb | tobiash: the rough process seemed to be: pull from external sources if necessary, merge zuul changes, set refs to point at zuul changes as necessary and this last step was the repo restore step | 15:08 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Stop active event gathering on connection loss https://review.opendev.org/c/zuul/zuul/+/785100 | 15:14 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add a checkpoint release note https://review.opendev.org/c/zuul/zuul/+/785054 | 15:14 |
openstackgerrit | Tobias Henkel proposed zuul/nodepool master: Add simple load testing script https://review.opendev.org/c/zuul/nodepool/+/775843 | 15:22 |
clarkb | tobiash: ok I think I see my confusion I read your earlier comments as meaning setRepoState() was the problem, but it is _restoreRepoState() which wasn't called at all previously? | 15:26 |
clarkb | should we make that a public method if we intend it to be called externally? | 15:26 |
clarkb | tobiash: also I think we should add my test to your change | 15:27 |
tobiash | yeah, I think we should make that a public method | 15:27 |
clarkb | (may need to update my test to call slightly different methods though) | 15:27 |
tobiash | clarkb: the change to the test case does test the fix | 15:27 |
tobiash | at least it makes sure that the repo state gets restored | 15:28 |
clarkb | tobiash: ya, I don't think it covers the specific case of a fast forward, but maybe this is sufficient | 15:29 |
*** ykarel is now known as ykarel|away | 15:41 | |
*** hamalq has joined #zuul | 16:00 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 16:01 |
*** ykarel|away has quit IRC | 16:02 | |
*** sshnaidm is now known as sshnaidm|afk | 16:06 | |
openstackgerrit | Merged zuul/zuul master: Ensure single instance for active event gathering https://review.opendev.org/c/zuul/zuul/+/783817 | 16:07 |
*** sam_wan has quit IRC | 16:11 | |
*** jcapitao has quit IRC | 16:32 | |
openstackgerrit | Merged zuul/zuul master: Fix default parameter in GitlabSource https://review.opendev.org/c/zuul/zuul/+/779255 | 16:38 |
openstackgerrit | Merged zuul/zuul master: Periodically clean up leaked semaphores https://review.opendev.org/c/zuul/zuul/+/784523 | 16:38 |
openstackgerrit | Merged zuul/zuul master: Retry job on broken process pool https://review.opendev.org/c/zuul/zuul/+/777874 | 16:39 |
*** avass has joined #zuul | 16:46 | |
*** rpittau is now known as rpittau|afk | 16:46 | |
*** josefwells has joined #zuul | 16:47 | |
*** rlandy|rover is now known as rlandy|rover|lch | 16:58 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul master: Document local testing https://review.opendev.org/c/zuul/zuul/+/766460 | 17:03 |
*** jpena is now known as jpena|off | 17:08 | |
*** rlandy|rover|lch is now known as rlandy|rover | 17:19 | |
*** y2kenny has joined #zuul | 17:29 | |
*** sassyn has joined #zuul | 17:30 | |
*** sassyn has quit IRC | 17:37 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 17:42 |
*** y2kenny has quit IRC | 17:44 | |
clarkb | tobiash: ok I tried to leave some detailed notes (sorry it took a while) | 18:33 |
pabelanger | so, interesting issue. someone just opened a PR in ansible/ansible and wanted to merge 4186 commits: https://github.com/ansible/ansible/pull/74178 | 18:34 |
clarkb | tobiash: I'm not really convinced this fix is the solution since merge_items and state_items are disjoint and the calls of setRepoState against state_items should've already called _restoreRepoState on those repos internal to the merger | 18:34 |
pabelanger | now zuul is churning through those commits, even after the PR is closed | 18:34 |
clarkb | pabelanger: you might be able to run the dequeue command to tell it to stop | 18:34 |
pabelanger | trying to figure out the best way to deal with this, given we have been blocked on other events for 2 hours now | 18:35 |
clarkb | or restart the merger doing the work and have it return an error | 18:35 |
pabelanger | not sure how to figure out the merger | 18:36 |
pabelanger | let me look | 18:36 |
clarkb | pabelanger: look for the really busy one? | 18:36 |
pabelanger | well, it isn't the merge | 18:39 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2 https://review.opendev.org/c/zuul/zuul-operator/+/785047 | 18:39 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 18:39 |
pabelanger | it is the API calls to github | 18:39 |
clarkb | pabelanger: oh neat | 18:39 |
pabelanger | 2021-04-07 18:39:33,956 DEBUG zuul.GithubRequest: [e: ab169bc0-97c1-11eb-9073-bd928821d5df] GET https://api.github.com/repositories/3638964/pulls/74178/files?per_page=100&page=7 result: 200, size: 624883, duration: 548 | 18:39 |
clarkb | pabelanger: I didn't think the zuul api calls changed by number of commits in a PR | 18:39 |
pabelanger | etc | 18:39 |
clarkb | TIL | 18:39 |
clarkb | this is probably even before the event has been queued | 18:40 |
pabelanger | and, I am unsure I can dequeue, as the change isn't in the web UI | 18:40 |
clarkb | so dequeue wouldn't help | 18:40 |
pabelanger | yah | 18:40 |
pabelanger | basically, need a way to drop the current request or some sort of limit of merges to loop over | 18:42 |
clarkb | pabelanger: is it doing these lookups per commit and that is slow or is it per PR but the PR is so large that the constituent pieces are slow? | 18:43 |
pabelanger | no, it looks like a lookup per commit | 18:43 |
pabelanger | I can see API hits, then merger trigger, then API hits | 18:43 |
clarkb | I'm not familiar enough with the github driver to know for sure, but I suspect it would be difficult to limit commit lookups and maintain correctness for testing PRs that exceed that limit. But maybe we can return an error and ask people to trim | 18:45 |
clarkb | s/trim/squash/ might be more accurate | 18:45 |
pabelanger | yah, each commit is a github event | 18:47 |
pabelanger | which we then process | 18:47 |
clarkb | pabelanger: looking at githubconnection.py the files listings are the only place the driver does that | 18:52 |
pabelanger | yah | 18:55 |
pabelanger | 2021-04-07 18:54:28,227 INFO zuul.GithubConnection: [e: b0983720-97c1-11eb-9907-eed5aeae07ee] Updating <Change 0x7fedfc87d278 ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65> | 18:55 |
clarkb | reading the code it seems we only expect 300 files as a limit on the PR events (and log a warning when that is exceeded) | 18:57 |
clarkb | pabelanger: on the non PR side (eg tags or branch push) it does iterate through all the new commits | 18:58 |
pabelanger | change.files, that is for filematch I assume? | 18:58 |
clarkb | I wonder if we get both types of event when a PR is open | 18:58 |
clarkb | pabelanger: yes | 18:58 |
clarkb | pabelanger: I think it may also help detect if there are new zuul configs to evaluate | 18:59 |
pabelanger | yah | 18:59 |
pabelanger | okay, that makes sense | 18:59 |
pabelanger | in this case, we'd never load config from ansible/ansible because of the exclude config setting in project (tenant) | 19:00 |
pabelanger | so maybe we could short cut in that case | 19:00 |
clarkb | well it already is short cutted is what I'm saying | 19:00 |
clarkb | unless PRs somehow trigger non PR events too | 19:00 |
pabelanger | oh | 19:00 |
clarkb | I'm getting links up now | 19:01 |
clarkb | https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L1384-L1402 PR case | 19:01 |
clarkb | https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L2171-L2173 branch push/tag case | 19:02 |
clarkb | the behavior you describe seems more like the branch push/tag case since that does iterate over every commit in the push and lists files | 19:02 |
clarkb | but when in a PR context we appear to ask the API for the commits in the PR overall and log a warning if there are too many | 19:02 |
pabelanger | from what I see, we don't loop over the commit in the PR, we got a ton of events for each commit, and we are looping over those events | 19:04 |
pabelanger | which is a single commit | 19:04 |
pabelanger | 2021-04-07 19:04:58,147 DEBUG zuul.GithubConnection: [e: bca42b50-97c1-11eb-84e2-8d446fc8f00f] Scheduling event from github.com: <GithubTriggerEvent 0x7fee1fb9afd0 check_run ansible/ansible refs/pull/74178/head completed github.com/ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bca42b50-97c1-11eb-84e2-8d446fc8f00f> | 19:05 |
pabelanger | I see that each time we loop over the files | 19:05 |
pabelanger | once we get all the files, then another event from same PR comes in | 19:05 |
pabelanger | but, it seems to always be the same commit 81ac1d | 19:05 |
clarkb | ok that would be the _event_check_run codepath I think | 19:06 |
pabelanger | 2021-04-07 19:06:35,212 DEBUG zuul.GithubConnection: [e: bd34f9a0-97c1-11eb-9ce2-c5809c60be27] Scheduling event from github.com: <GithubTriggerEvent 0x7fee13eba198 check_run ansible/ansible refs/pull/74178/head completed github.com/ansible/ansible 74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bd34f9a0-97c1-11eb-9ce2-c5809c60be27> | 19:06 |
pabelanger | next event | 19:06 |
*** josefwells has quit IRC | 19:07 | |
clarkb | ya so the delivery ids differ but the gist of the action is the same | 19:07 |
pabelanger | yup | 19:07 |
clarkb | I think that implies github is sending the same event over and over again | 19:07 |
clarkb | does each commit in a PR trigger a different check_run event? | 19:08 |
pabelanger | yah, for each commit I believe | 19:08 |
pabelanger | I _think_ so | 19:08 |
pabelanger | that is what I am going to read up on | 19:08 |
clarkb | you can double check that through the github app history of hook requests iirc | 19:08 |
clarkb | to me that doesn't make a lot of sense for their api design unless the PR was modified with a new commit over and over in sequence over time | 19:09 |
pabelanger | way too many events now to look | 19:11 |
clarkb | can you filter by id? | 19:11 |
clarkb | (if so you could grab a few from your zuul log, pull them up and see if they differ) | 19:11 |
clarkb | s/if they differ/how they differ/ | 19:11 |
pabelanger | no, it is just a long list by hour | 19:12 |
pabelanger | I may be able to via the api somehow, but not the web ui | 19:12 |
pabelanger | k, I have to run for a bit, will update later | 19:14 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 19:58 |
*** vishalmanchanda has quit IRC | 20:16 | |
*** mgagne has joined #zuul | 20:22 | |
*** spotz has quit IRC | 20:38 | |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: Use kopf operator framework https://review.opendev.org/c/zuul/zuul-operator/+/785039 | 20:38 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Bump API version to v1alpha2 https://review.opendev.org/c/zuul/zuul-operator/+/785047 | 20:38 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 20:38 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper https://review.opendev.org/c/zuul/zuul-operator/+/785273 | 20:38 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB https://review.opendev.org/c/zuul/zuul-operator/+/785273 | 20:44 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 20:44 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 20:59 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options https://review.opendev.org/c/zuul/zuul-operator/+/785277 | 20:59 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Add merger support https://review.opendev.org/c/zuul/zuul-operator/+/785278 | 20:59 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions https://review.opendev.org/c/zuul/zuul-operator/+/785279 | 20:59 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Support externally managed Zookeeper and DB https://review.opendev.org/c/zuul/zuul-operator/+/785273 | 21:19 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Pass through extra scheduler config options https://review.opendev.org/c/zuul/zuul-operator/+/785277 | 21:19 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Add merger support https://review.opendev.org/c/zuul/zuul-operator/+/785278 | 21:19 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: Support imagePrefix and versions https://review.opendev.org/c/zuul/zuul-operator/+/785279 | 21:19 |
openstackgerrit | James E. Blair proposed zuul/zuul-operator master: WIP: docs https://review.opendev.org/c/zuul/zuul-operator/+/785083 | 21:19 |
*** spotz has joined #zuul | 21:38 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: DO NOT MERGE Just need a test image built https://review.opendev.org/c/zuul/nodepool/+/785286 | 21:48 |
clarkb | fungi: ^ I think that will produce the image we want | 21:48 |
clarkb | TBD if installing dib from source like that will be too slow on the emulated image build to get in under the timeout | 21:49 |
*** ianw_pto is now known as ianw | 22:14 | |
*** rlandy|rover is now known as rlandy|rover|bbl | 22:34 | |
*** tosky has quit IRC | 23:11 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: DO NOT MERGE Just need a test image built https://review.opendev.org/c/zuul/nodepool/+/785286 | 23:20 |
clarkb | let's see if that does better. I think the problem was it tries to build two images but one of them didn't set the arches list and when you trigger the buildx path you need to set arches | 23:21 |
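
One way to check whether GitHub is really delivering the same check_run event repeatedly, as discussed around 19:05-19:11, is to group the "Scheduling event" log lines by change and commit and count the distinct delivery ids. The sketch below is a hypothetical helper, not part of Zuul; the regular expression and the name `count_duplicate_events` are assumptions modelled only on the two log lines quoted in the channel, so the real log format may need a different pattern.

```python
import re

# Pattern modelled on the "Scheduling event" lines quoted in the log above.
# The exact Zuul log layout is an assumption.
EVENT_RE = re.compile(
    r"<GithubTriggerEvent \S+ (?P<type>\S+) (?P<project>\S+) (?P<ref>\S+) "
    r".*? (?P<change>\d+),(?P<sha>[0-9a-f]{40}) "
    r"delivery: (?P<delivery>[0-9a-f-]+)>"
)


def count_duplicate_events(lines):
    """Return {(event type, change number, sha): number of distinct deliveries}."""
    deliveries = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a scheduling line, or a format we do not recognise
        key = (match["type"], int(match["change"]), match["sha"])
        deliveries.setdefault(key, set()).add(match["delivery"])
    return {key: len(ids) for key, ids in deliveries.items()}


# The two event lines pasted in the channel, plus one unrelated line.
SAMPLE = [
    "2021-04-07 19:04:58,147 DEBUG zuul.GithubConnection: [e: bca42b50-97c1-11eb-84e2-8d446fc8f00f] "
    "Scheduling event from github.com: <GithubTriggerEvent 0x7fee1fb9afd0 check_run ansible/ansible "
    "refs/pull/74178/head completed github.com/ansible/ansible "
    "74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bca42b50-97c1-11eb-84e2-8d446fc8f00f>",
    "2021-04-07 19:06:35,212 DEBUG zuul.GithubConnection: [e: bd34f9a0-97c1-11eb-9ce2-c5809c60be27] "
    "Scheduling event from github.com: <GithubTriggerEvent 0x7fee13eba198 check_run ansible/ansible "
    "refs/pull/74178/head completed github.com/ansible/ansible "
    "74178,81ac1d492cecf541dc2d479b1c537b199bafca65 delivery: bd34f9a0-97c1-11eb-9ce2-c5809c60be27>",
    "2021-04-07 18:54:28,227 INFO zuul.GithubConnection: Updating <Change ...>",
]

if __name__ == "__main__":
    for key, count in count_duplicate_events(SAMPLE).items():
        print(key, count)
```

A count well above one for a single (type, change, sha) key would be consistent with many separate deliveries for the same commit. It would not by itself say why GitHub sent them.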