Wednesday, 2021-08-04

opendevreviewJames E. Blair proposed zuul/zuul master: Use a new NodeRequest object when re-submitting node requests  https://review.opendev.org/c/zuul/zuul/+/80340700:42
corvusclarkb: ^ with a test!00:43
corvusthinking more on that -- we're more likely to hit that now that the executors are the ones which actually accept the request -- there's a longer window between fulfillment and acceptance.00:44
Clark[m]corvus: I skimmed the fix and it looks good. I'll give it a proper review in the morning01:21
Clark[m]I noticed this evening that trying to search builds on mobile doesn't work because the menu that drops down for search closes when I try to select the search text field and type in it01:31
opendevreviewxinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic  https://review.opendev.org/c/zuul/zuul-jobs/+/80341302:34
opendevreviewxinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic  https://review.opendev.org/c/zuul/zuul-jobs/+/80341303:22
opendevreviewxinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic  https://review.opendev.org/c/zuul/zuul-jobs/+/80341303:24
opendevreviewxinliang proposed zuul/zuul-jobs master: Fix install podman error on Ubuntu aarch64 Bionic  https://review.opendev.org/c/zuul/zuul-jobs/+/80341303:45
felixedelcorvus: Right, haven't thought about that yesterday. Thanks for squashing the changes.05:19
*** marios is now known as marios|ruck06:06
abitrolly[m]Kafka is now capable of running without Zookeeper - https://developer.ibm.com/tutorials/kafka-in-kubernetes/  Can Zuul do the same?06:39
*** jpena|off is now known as jpena07:01
*** rpittau|afk is now known as rpittau07:13
apevec[m]so that would require Zuul feature like https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum ?07:38
avass[m]abitrolly: it would probably require quite a bit of work to replace Zookeeper with something else for zuul (or add support for other systems like etcd). I don't think anyone is strictly opposed to it but with the current resources the development effort is focused on replacing gearman with zookeeper07:45
abitrolly[m]What was rationale for choosing Zookeeper?07:48
avass[m]abitrolly: you'll have to ask someone more involved in the process for a good answer. But i remember seeing discussions about that earlier and etcd wasn't as mature when the decision was taken.08:00
mhuinabitrolly[m], I think this was discussed in this spec: https://opendev.org/zuul/zuul/src/branch/master/doc/source/reference/developer/specs/scale-out-scheduler.rst08:01
mhuinI may be wrong though08:02
abitrolly[m]The article just says that Zookeper will be used without any rationale or mentioning alternative fault resistance mechanisms (i.e. Raft). It is unfortunate that such decisions are not getting proper write ups. People could learn a lot from that.08:04
avass[m]Yeah i think zookeeper was also chosen for zuul since nodepool was already using it08:05
swestabitrolly: as avass said it was a natural choice. IIRC there where some discussions if etcd would be a better option08:06
abitrolly[m]I don't understand what is a natural choice.08:07
abitrolly[m]People had experience with Zookeeper and no experience with etcd, and choose what they know?08:08
swestZookeeper and etcd serve a similar purpose and Zookeeper was already used for Zuul-Nodepool communication (so yes, existing experience + code) 08:10
swestthat's what I meant with "natural choice"08:10
swestabitrolly: btw. I found the discussion on zuul-discuss in case you are interested: http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-December/000659.html08:11
abitrolly[m]From this doc on Nodepool - https://docs.opendev.org/opendev/infra-specs/latest/specs/nodepool-zookeeper-workers.html - it looks like there was a need in shared state between daemon and workers that is not in MySQL to avoid giving access to MySQL for workers. Is it possible to write an abstraction for that shared state, so that any storage can be used? Redis, DB, etcd/Zookeeper.08:16
swestabitrolly: it's certainly possible but I'm not sure I understand your use case. Do you want to avoid using Zookeeper?08:22
abitrolly[m]Simplicity for testing Zuul jobs locally.08:24
swestabitrolly: sounds like your are looking for something like the zuul-runner (wip) https://zuul-ci.org/docs/zuul/reference/developer/specs/zuul-runner.html08:33
apevec[m]this was 2018. "zk is used in enough things that many people have a vested interest in (like Kafka)" - not anymore08:39
*** jcapitao is now known as jcapitao_lunch11:11
*** dviroel|out is now known as dviroel11:16
*** jpena is now known as jpena|lunch11:26
*** marios is now known as marios|ruck11:27
tibeer[m]Hello, I wanted to ask if someone has the Zuul-Operator working and could help me getting it running. During my testing I face the issue that at least one zookeeper is unable to join the cluster and the percona cluster is also not coming up correctly preventing the startup.12:11
fungiabitrolly[m]: also that specification you found is from early 2016. at the time, zookeeper was the only reasonable choice for a mature and highly-available lock manager service packaged in existing linux distributions (etcd for example was too new and not yet packaged in stable distros, other options were only distributed under proprietary/non-osi-approved licenses, and so on)12:25
fungitibeer[m]: are you using 0.1.0 or the current master branch state as of this weekend when all of corvus's refactor was merged?12:29
tibeer[m]fungi: Didn't noticed that. I'm using the current master branch.12:30
tibeer[m]I'll try 0.1.0 later, thanks!12:30
*** jpena|lunch is now known as jpena12:32
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule  https://review.opendev.org/c/zuul/zuul-jobs/+/80347112:37
fungitibeer[m]: well, i was asking because i'd heard there were a number of problems with 0.1.0, hence all the recent work on redoing it. on closer inspection there's still a number of changes not yet merged in the https://review.opendev.org/q/topic:"kopf" topic, so maybe what's merged so far is not complete12:37
*** jcapitao_lunch is now known as jcapitao12:42
tibeer[m]fungi: I quickly skimmed all change subjects and they seem not to fit the issue(s) I am facing.12:43
fungitibeer[m]: yes, as did i. still, maybe avass[m] or corvus have some idea what's going on there since they seem to have written most of the changes for that implementation12:45
tibeer[m]I am happy to help / contribute. We are actively looking into Zuul on Kubernetes.12:47
avass[m]It's a bit hard to know why that happens without more information. 12:48
avass[m]tibeer: I don't have any setup to quickly take a look at it since I'm on vacation. But if you have any logs from the zk/db operators i could probably take a quick look12:51
opendevreviewBenjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota  https://review.opendev.org/c/zuul/nodepool/+/80076512:54
tibeer[m]avass: Very kind from you but please enjoy your vacation. I'd like to push that to a later point in time. I would like to take a lot of notes and contribute that to the documentation, as we are using it for GitHub. So others can profit from some sort of troubleshooting guide as well.12:55
opendevreviewBenjamin Schanzel proposed zuul/zuul master: statsd: expose current_requests per tenant  https://review.opendev.org/c/zuul/zuul/+/78868013:06
avass[m]tibeer: cool thanks! I'm planning on working on it some time after my vacation and if there are still issues then I'll take a look at it :)13:06
corvustibeer: it sounds like the zk cluster may not be sized correctly -- you'll need at least 3 somewhat large nodes to support all 3 zk and pxc nodes.  i'd check the k8s status for those and see if it's failing constraints or taints.13:42
opendevreviewBenjamin Schanzel proposed zuul/zuul master: Add tenant name on NodeRequests for Nodepool  https://review.opendev.org/c/zuul/zuul/+/78868014:22
opendevreviewBenjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota  https://review.opendev.org/c/zuul/nodepool/+/80076514:49
clarkbswest: can you check my comment on https://review.opendev.org/c/zuul/zuul/+/803407 if I'm wrong about that assumption it would be good to clarify14:50
swestclarkb: responded to your comment14:52
clarkbswest: I see the other direction is the issue14:53
clarkbcorvus: ^14:53
swestyep14:53
clarkbcorvus: swest: maybe we need a request.clear() or .reset() method to call and have it modify the internal state to be safe?14:59
clarkboh we do have access to the scheduler in the nodepool class there. We could probably lookup the buildset and modify its requests list too?15:00
corvusclarkb: i think we should just make a minimal change to the existing code for now; an upcoming change will allow us to avoid re-using the node request later15:00
clarkbcorvus: that wfm15:01
corvusi'll update in just a bit15:01
*** jpena is now known as jpena|off15:05
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule  https://review.opendev.org/c/zuul/zuul-jobs/+/80347115:10
opendevreviewJames E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/80229915:14
*** marios|ruck is now known as marios|out16:00
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Include podman installation with molecule  https://review.opendev.org/c/zuul/zuul-jobs/+/80347116:11
*** rpittau is now known as rpittau|afk16:14
clarkbfelixedel: corvus: I left a number of comments on https://review.opendev.org/c/zuul/zuul/+/787686 I'm not sure if any are worth a -1 but wanted to make sure you all saw those16:46
corvusclarkb: thx; will look now -- i'm doing a final test run on the updated fix, should have a patch in a min16:47
clarkbcool I'll plan to review that next then. I should context switch to gerrit account retirements after that though as I said I would get that done today16:47
opendevreviewJames E. Blair proposed zuul/zuul master: Clear nodeset when re-submitting node requests  https://review.opendev.org/c/zuul/zuul/+/80340717:12
corvusclarkb: i think that should be ready now17:12
clarkbreviewing17:14
clarkbcorvus: how does copying the nodeset fix this? is that because nodeset.copy() is only copying the logical nodeset info and not the fulfilled nodeset info?17:21
corvusclarkb: yes.  but also, i think some node state is in zkdata too17:22
corvus * clarkb: yes.  but also, i think some node state is in `_zk_data` too17:22
clarkbcorvus: ok, looking at copy() and addNode() I think it may just produce the same data again which makes me suspect _zk_data more. But if this is tested then good enough for me /me reviews the test17:23
corvusclarkb: yes, i think _zk_data is the most important, but i think it's also true that the Node objects in the nodeset do have ip addresses, etc, and they will not after the copy17:25
corvus(the copy creates new node objects with only names and labels)17:25
clarkbgot it17:26
clarkblgtm +217:29
clarkbswest: ^ if your day hasn't ended you amy want to rereview that17:29
corvusclarkb: replied to your comments on merger.  i think there are some -1s in there.  for some of the improvements -- the merger and executor apis are nearly mirror images now.  i'd like to keep them as close as possible, and now that we know what each of them needs, i'd like to follow this stack up with a refactor to try to merge them.  it's not trivial -- each has something the other doesn't, but i think it's possible with a bit of work, and should17:30
corvusmake maintenance much easier.17:30
corvus(differences: the executor has zones; the merger returns data; neither is true for the other)17:31
corvuszuul-maint: https://review.opendev.org/803407 is a production bugfix which i'd love to merge and restart opendev with today17:32
fungioh, right, that's where the marathon troubleshooting ended up last night. reviewing17:34
clarkbcorvus: merge api change comments responded to. I'm ok with cleaning up those bigger things in refactoring changes. But a few of those like the removal of dead code and unused conditions as well as handling param deletion etc should probably get cleaned up in this change to make it easier to understand going forward17:36
corvusclarkb: yep, will do next.17:37
*** ricolin_ is now known as ricolin18:02
opendevreviewJames E. Blair proposed zuul/zuul master: [web] Update the tenantsPage to drop redux  https://review.opendev.org/c/zuul/zuul/+/80350418:21
fungicorvus: 803407,2 has failed its tox unit test jobs, though looks like the buildset isn't likely to report (probably courtesy of the bug it aims to fix? i haven't dug into it)18:35
fungi"testtools.matchers._impl.MismatchError: 'unknown' != 'ready'" on test_accept_nodes_lost_request and test_accept_nodes_lost_request 18:38
fungier, on TestNodepoolResubmit.test_accept_nodes_lost_request and TestNodepool.test_accept_nodes_lost_request18:38
clarkbhrm we didn't modify TestNodepool just inherited from it. If we were going to have errors I would've expected it in the child only18:42
funginice though that we have proper build result pages linked from those even if the buildset never reports!18:44
fungithat feature is already paying dividends18:44
corvusoh, i overrode a method, i think i just need to refactor that out into a base class; i didn't want the test methods in super to run19:15
corvusand the change requires an update to that test.19:19
opendevreviewTristan Cacqueray proposed zuul/zuul master: [web] Update the TenantsPage to use the context hook  https://review.opendev.org/c/zuul/zuul/+/80350919:20
opendevreviewJames E. Blair proposed zuul/zuul master: Clear nodeset when re-submitting node requests  https://review.opendev.org/c/zuul/zuul/+/80340719:20
*** y2kenny is now known as Guest332320:05
*** Guest3323 is now known as y2kenny20:06
clarkbcorvus: in ^ why does the state of the nodes chagne from ready to unknown in that test?20:06
y2kennyFor smtp/email reporter, is the email address configured per pipeline or can the email address be configured at the job level?20:07
clarkby2kenny: since reporters are done per pipeline it is only per pipeline iirc20:08
clarkbyou could have a job send email itself (openstack does this for some release jobs)20:08
y2kennyclark: ah ok.  THanks.20:11
corvusclarkb: because the nodeset in the test is the nodeset on the build request, and now we're resetting it.  that test ends with an unfulfilled node request, so 'unknown' is legit.  previously it ended with an unfulfilled request which had nodes, and those nodes were ready.  i think the important thing is to assert that they aren't locked, and aren't used or somethign.20:12
clarkbI see, its a side effect of our new reset20:14
clarkbcorvus: I've +2'd the change but not approved it since i Think the unittests are close to finishing and getting that info if they fail seems like a good idea20:20
corvusclarkb: sgtm; i haven't re-run the full suite locally so i agree20:20
*** dviroel is now known as dviroel|out20:38
corvusone of the test suites finished, i'm going to go ahead and +w it20:40
fungiawesome, i'll be around for a restart in an hour or so, just putting dinner together now20:49
*** timburke_ is now known as timburke20:57
opendevreviewJames E. Blair proposed zuul/zuul master: Add MergerApi  https://review.opendev.org/c/zuul/zuul/+/78768621:19
opendevreviewJames E. Blair proposed zuul/zuul master: Fix test_gerrit.TestPolling.test_config_update  https://review.opendev.org/c/zuul/zuul/+/77302321:20
opendevreviewJames E. Blair proposed zuul/zuul master: Execute merge jobs via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/80229921:20
corvusclarkb: good suggestions :)21:20
clarkbcorvus: +2'd the first one. The relationship between read, wait and delete in the event result futures is fun to unravel21:31
clarkbhttps://review.opendev.org/c/zuul/zuul/+/802299/ is next and not a small change :) I'll probably hold off on digging into that until after we restart zuul to fix the ongoing issue and then I need to find a nice spot in the shade :)21:36
corvusclarkb: sounds good.  hopefully the split between "add the api" and "use the api" makes sense.  no rush on those.21:46
corvusmy plan is to help with restart when ready and otherwise hack on the refactor21:46
fungigood, dinner done and it hasn't merged yet21:48
fungii'm not too late21:48
corvusit timed out :(22:33
clarkb:(22:39
opendevreviewJames E. Blair proposed zuul/zuul master: WIP: Refactor Merger/Executor API  https://review.opendev.org/c/zuul/zuul/+/80352923:43
corvusthat's just a checkpoint -- it's not done yet; i still have a lot of test methods to update.  however, the basic executor and merger tests both work with that, so i think the approach is viable.23:44
corvusso far it's a net -292 line change23:45
opendevreviewMerged zuul/zuul master: Clear nodeset when re-submitting node requests  https://review.opendev.org/c/zuul/zuul/+/80340723:47
clarkbwoot it merged23:47
clarkbnow we wait for image promotion23:48
clarkbhttps://zuul.opendev.org/t/zuul/build/d281dbe097124a3899ba25bfb8113355 that job succeeded23:49
clarkbcorvus: ^ should we proceed wtih a restart? I'm not sure what else has merged since the last restart and how risk it is due to the other changes23:50

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!