*** jamesmcarthur has joined #zuul | 00:11 | |
*** Goneri has quit IRC | 00:14 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Implement zookeeper-auth https://review.opendev.org/619156 | 00:15 |
---|---|---|
*** jamesmcarthur has quit IRC | 00:19 | |
*** jamesmcarthur has joined #zuul | 00:21 | |
*** jamesmcarthur has quit IRC | 00:26 | |
*** y2kenny has joined #zuul | 00:34 | |
y2kenny | I have been trying the tutorial and have been adapting/modifying it to my existing environment. When I try the testjob step, I run into a RETRY_LIMIT error (according to the dashboard) after a duration of 6 seconds | 00:37 |
fungi | zuul sees failures in "pre" phase playbooks and also problems like inability to connect to defined job nodes as a reason to automatically retry a build | 00:39 |
fungi | by default, three retries in a row for such conditions bails out with a RETRY_LIMIT build result | 00:40 |
y2kenny | I see. I was looking at the log. I see scheduler logging "Execute job testjob ... on nodes... for change ... with dependent changes... " | 00:40 |
fungi | you might check the executor log for evidence of some systemic connectivity problem or early pre phase error | 00:41 |
y2kenny | ok. Thanks for the tips. I see executor logging Started SSH agent, added key, beginning job test job, updating repo, skipping updating local repo, checking out ... branch... | 00:43 |
y2kenny | maybe I messed with the keys too much. | 00:44 |
*** jamesmcarthur has joined #zuul | 00:44 | |
y2kenny | I will try running the plain unmodified tutorial and compare. | 00:45 |
fungi | y2kenny: are you archiving logs anywhere? you could check the build results and see if ansible logged any errors with a pre playbook | 00:56 |
fungi | the build logs i mean, not the service logs | 00:57 |
fungi | even a retry-limit result should usually log the build console for the final (third by default) attempt | 00:59 |
*** jamesmcarthur has quit IRC | 01:08 | |
y2kenny | oddly enough the build log is not getting posted even though I added the post-logs.yaml. I might have messed around with the config a bit too much (I wanted to connect zuul to my production gerrit instead of the sample gerrit instance.) A lot of things are working (like the noop check and gate, recheck from comment, etc.) Auto submit also works. | 01:11 |
*** jamesmcarthur has joined #zuul | 01:12 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Implement zookeeper-auth https://review.opendev.org/619156 | 01:18 |
fungi | another option might be to try to view the console log stream for a build if you can catch it while it's still running | 01:20 |
*** jamesmcarthur has quit IRC | 01:24 | |
*** jamesmcarthur has joined #zuul | 01:24 | |
*** rlandy|bbl is now known as rlandy | 01:53 | |
*** jamesmcarthur has quit IRC | 02:06 | |
*** jamesmcarthur has joined #zuul | 02:07 | |
*** swest has quit IRC | 02:16 | |
*** swest has joined #zuul | 02:30 | |
*** bhavikdbavishi has joined #zuul | 03:18 | |
*** jamesmcarthur has quit IRC | 03:21 | |
*** rlandy has quit IRC | 03:36 | |
*** bhavikdbavishi has quit IRC | 04:05 | |
*** ianychoi has quit IRC | 04:39 | |
*** ianychoi has joined #zuul | 04:40 | |
*** evrardjp has quit IRC | 05:35 | |
*** evrardjp has joined #zuul | 05:35 | |
y2kenny | Um... I was able to get the console log but for some reason I got an "ANSIBLE PARSE ERROR" | 06:16 |
y2kenny | although, later on it said "ubuntu-bionic | UNREACHABLE!" | 06:16 |
y2kenny | in between there are a few skipping plugin ara_read and ara_record | 06:17 |
*** AJaeger has joined #zuul | 07:03 | |
*** y2kenny has quit IRC | 07:05 | |
*** dpawlik has joined #zuul | 07:21 | |
*** AJaeger has quit IRC | 07:49 | |
*** AJaeger has joined #zuul | 08:00 | |
*** tosky has joined #zuul | 08:18 | |
*** avass has quit IRC | 08:20 | |
*** avass has joined #zuul | 08:20 | |
*** jcapitao has joined #zuul | 08:22 | |
*** Defolos has joined #zuul | 08:25 | |
*** decimuscorvinus has quit IRC | 08:45 | |
*** decimuscorvinus has joined #zuul | 08:45 | |
*** jpena|off is now known as jpena | 08:50 | |
*** bhavikdbavishi has joined #zuul | 08:52 | |
*** Defolos has quit IRC | 09:12 | |
*** Defolos has joined #zuul | 09:37 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Evaluate CODEOWNERS settings during canMerge check https://review.opendev.org/644557 | 09:41 |
*** bhavikdbavishi has quit IRC | 09:42 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add optional support for circular dependencies https://review.opendev.org/685354 | 09:45 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Optionally allow zoned executors to process unzoned jobs https://review.opendev.org/673840 | 09:49 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Use implied branch matcher for implied branches https://review.opendev.org/640272 | 10:01 |
*** sshnaidm|afk is now known as sshnaidm | 10:01 | |
*** jcapitao has quit IRC | 10:13 | |
*** rishabhhpe has joined #zuul | 10:14 | |
rishabhhpe | Hi All , is there any way in which i can restrict zuul to execute only one job at a single time ? | 10:15 |
*** jcapitao has joined #zuul | 10:15 | |
openstackgerrit | Benjamin Schanzel proposed zuul/nodepool master: Kubernetes/OpenShift Provider: Don't Require Bash in Container Images https://review.opendev.org/712034 | 10:17 |
*** mhu has joined #zuul | 10:26 | |
AJaeger | rishabhhpe: you can create a semaphore in your job config and use it everywhere, the semaphore can limit number of jobs. | 10:26 |
AJaeger | rishabhhpe: check the Zuul manual at zuul-ci.org for semaphore. But I wonder why you want this. What is the reason you ask this? What are you trying to achieve? | 10:27 |
rishabhhpe | AJaeger: I am just asking for information, not trying anything with this. | 10:27 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: DNM: test triggers https://review.opendev.org/712037 | 10:33 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: DNM: test triggers https://review.opendev.org/712037 | 10:33 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: DNM: test triggers https://review.opendev.org/712037 | 10:42 |
*** bschanzel has joined #zuul | 10:43 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: DNM: test triggers https://review.opendev.org/712037 | 10:43 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Improve ensure-tox role https://review.opendev.org/708642 | 10:51 |
openstackgerrit | Merged zuul/zuul-jobs master: Tests bindep role on all-platforms https://review.opendev.org/708704 | 11:06 |
*** jcapitao is now known as jcapitao_lunch | 11:11 | |
*** ianychoi has quit IRC | 11:16 | |
*** ianychoi has joined #zuul | 11:16 | |
*** jamesmcarthur has joined #zuul | 11:19 | |
*** jamesmcarthur has quit IRC | 11:24 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Optionally allow zoned executors to process unzoned jobs https://review.opendev.org/673840 | 11:30 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add spec for enhanced regional executor distribution https://review.opendev.org/663413 | 11:34 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Move fingergw config to fingergw https://review.opendev.org/664949 | 11:36 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Route streams to different zones via finger gateway https://review.opendev.org/664965 | 11:36 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/664950 | 11:36 |
*** bolg has quit IRC | 11:37 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Route streams to different zones via finger gateway https://review.opendev.org/664965 | 11:47 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw https://review.opendev.org/664950 | 11:47 |
*** Goneri has joined #zuul | 11:49 | |
*** bhavikdbavishi has joined #zuul | 12:01 | |
*** Goneri has quit IRC | 12:12 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: bindep: Add missing virtualenv and fixed repo install https://review.opendev.org/693637 | 12:17 |
*** jamesmcarthur has joined #zuul | 12:21 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: WIP: Make ensure-tox pass cross-platform https://review.opendev.org/707439 | 12:24 |
*** jpena is now known as jpena|lunch | 12:25 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Implement zookeeper-auth https://review.opendev.org/619156 | 12:25 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: ensure-tox: use failed_when https://review.opendev.org/712062 | 12:28 |
tristanC | corvus: Shrews: ^ is now failing on test_zookeeper_disconnect and test_node_request_disconnect, but it passes test_zookeeper_disconnect2. would you understand why? (in PS20 the three tests use the same connection restart process) | 12:28 |
*** jamesmcarthur has quit IRC | 12:36 | |
*** Goneri has joined #zuul | 12:43 | |
*** ianychoi has quit IRC | 12:49 | |
*** ianychoi has joined #zuul | 12:50 | |
*** jcapitao_lunch is now known as jcapitao | 12:57 | |
*** jpena|lunch is now known as jpena | 13:04 | |
*** harrymichal_ has joined #zuul | 13:07 | |
*** zxiiro has joined #zuul | 13:19 | |
*** bhavikdbavishi has quit IRC | 13:22 | |
*** harrymichal_ has quit IRC | 14:04 | |
*** ianychoi has quit IRC | 14:10 | |
*** ianychoi has joined #zuul | 14:11 | |
*** harrymichal has joined #zuul | 14:11 | |
Shrews | tristanC: hrm, not sure. If I had to guess, might have something to do with the failing tests doing self.zk.client.stop()/self.zk.client.start() while the disconnect2 test calls self.zk.connect() which re-establishes the auth stuff | 14:17 |
Shrews | I didn't actually look at PS20 to see if there were changes around that, though | 14:19 |
Shrews | I looked at the latest | 14:19 |
tristanC | Shrews: in PS20 i used another restart process (stop() then connect()), and that made the test failure localized to `test_zookeeper_disconnect` and `test_node_request_disconnect` | 14:21 |
tristanC | Shrews: in PS21 i revert back to (stop() start()), but that is causing unrelated test failure | 14:22 |
Shrews | tristanC: give me a bit to deal with some payroll headaches then I'll look a little closer at those patch sets | 14:25 |
tristanC | Shrews: also, using (stop() then connect()) somehow fixed failure in `test_zookeeper_disconnect2` | 14:25 |
tristanC | Shrews: thanks! | 14:26 |
*** jamesmcarthur has joined #zuul | 14:39 | |
bschanzel | Hi tristanC, I've got a change proposal for syncing git repos to k8s pod nodes (related to https://review.opendev.org/#/c/535557/ and https://review.opendev.org/#/c/570667). See https://review.opendev.org/#/c/711920. I'd be glad about your opinion there. | 14:45 |
*** jamesmcarthur has quit IRC | 14:46 | |
bschanzel | Also, currently the k8s and OpenShift drivers require bash on the build nodes/pods. Therefore proposed https://review.opendev.org/#/c/712034. What do you think? | 14:46 |
*** bschanzel has quit IRC | 14:49 | |
*** bschanzel has joined #zuul | 14:51 | |
*** jamesmcarthur has joined #zuul | 14:54 | |
*** jamesmcarthur has quit IRC | 14:56 | |
*** jamesmcarthur has joined #zuul | 14:56 | |
tristanC | bschanzel: LGTM, thanks! | 15:01 |
bschanzel | tristanC cool, thanks for your review! | 15:03 |
*** mattw4 has joined #zuul | 15:14 | |
*** rishabhhpe has quit IRC | 15:16 | |
*** bschanzel has quit IRC | 15:31 | |
corvus | i'm back and slowly catching up; i'll look at the zk auth changes shortly | 15:32 |
*** harrymichal has quit IRC | 15:47 | |
*** erbarr has joined #zuul | 15:52 | |
*** mhu has quit IRC | 15:58 | |
*** mhu has joined #zuul | 15:58 | |
Shrews | tristanC: did you try adding a client.add_auth() call in between the client.stop()/client.start() calls in the test? | 16:01 |
Shrews | I'm trying to find where that's done in KazooClient but haven't found it yet :/ | 16:06 |
Shrews | Wow, I totally do not get how the kazoo auth system does the auth registration. | 16:11 |
corvus | Shrews: it may not send add_auth | 16:11 |
*** ianychoi has quit IRC | 16:11 | |
corvus | Shrews: when i used zk-shell, i was unable to use add_auth; i'm starting to suspect that add_auth only works for the old digest auth, and sasl is different and only happens on connect | 16:12 |
*** ianychoi has joined #zuul | 16:12 | |
Shrews | hrmmm | 16:12 |
clarkb | sasl should happen on connection creation | 16:12 |
clarkb | (as a general rule of thumb for sasl) | 16:12 |
corvus | Shrews: there's some stuff in kazoo/protocol/connection.py | 16:13 |
Shrews | ah, indeed. my grepping fails me | 16:14 |
corvus | Shrews: an initial skim of that file makes it look to me like stop()/start() should do everything needed to reconnect with sasl (but i'm not positive -- i have only skimmed it) | 16:16 |
Shrews | maybe we need a call to close() in between the stop/start | 16:20 |
*** harrymichal has joined #zuul | 16:21 | |
Shrews | total guess on that though | 16:21 |
corvus | yeah, that's what tristanC did in the other test | 16:23 |
corvus | Shrews, tristanC: oh, i think i may see it -- the kazoo client connection object has a .sasl_cli attribute, and that has a .completed attribute -- it may be that it isn't cleared by .stop() so it thinks it has already completed the sasl request on the second connection | 16:25 |
*** jcapitao has quit IRC | 16:25 | |
tristanC | corvus: that sounds like a good explanation of the failure | 16:27 |
*** harrymichal has quit IRC | 16:27 | |
*** harrymichal has joined #zuul | 16:28 | |
corvus | tristanC, Shrews: yeah, if i insert "self.zk.client._connection.sasl_cli = None" between stop() and start() in test_zookeeper_disconnect it works | 16:28 |
corvus | i'm not sure that's what we should do in the test, but i think it helps narrow down the problem :) | 16:28 |
Shrews | corvus: that raises the question of what happens in the real world when we're using sasl and we lose our zk connection.... | 16:32 |
*** harrymichal has quit IRC | 16:32 | |
*** harrymichal_ has joined #zuul | 16:33 | |
corvus | Shrews: indeed... | 16:33 |
corvus | Shrews, tristanC: this is probably worth a test script that authenticates and performs a 'get' every few seconds. run that and then restart zk. | 16:34 |
tristanC | Shrews: could we force a full reconnection in the connection listener? | 16:34 |
*** jcapitao has joined #zuul | 16:35 | |
Shrews | ConnectionHandler only sets sasl_cli to None in __init__, and KazooClient only gets a connection handler once in its __init__ | 16:35 |
corvus | well, kazoo is supposed to do that for us | 16:35 |
Shrews | I'm guessing this isn't a well tested path for kazoo | 16:35 |
corvus | Shrews: based on that, i suspect that this is a bug in kazoo and it will fail in real life, but i think we'll need to do something like the test i proposed to verify it (i don't think the zuul unit test is an adequate simulation) | 16:36 |
corvus | (well, i mean, the unit test in zuul might be a *great* simulation, but we need to confirm reality matches :) | 16:37 |
Shrews | agreed | 16:37 |
*** harrymichal_ has quit IRC | 16:38 | |
corvus | Shrews, tristanC: i think i know the answer to the other half of tristanC's question from earlier: the stop(), close(), connect() sequence that he used in test_zookeeper_disconnect2 appears to work, so why did test_zookeeper_disconnect fail when it was used there? | 16:39 |
corvus | Shrews, tristanC: i think the answer there is that test_zookeeper_disconnect relies on watches which are only called after the client reconnects. the stop/close/connect sequence completely kills the kazoo client, so all the watches disappear. | 16:40 |
*** harrymichal has joined #zuul | 16:40 | |
corvus | Shrews, tristanC: so i think that even though that sequence appears to correctly reconnect, we should not use it because we rely on maintaining state information like watches in the kazoo client | 16:41 |
corvus | Shrews, tristanC: i just confirmed our suspicion with a test | 16:44 |
corvus | Shrews, tristanC: (i just modified a unit test to do the get in an infinite loop, then restarted zookeeper from under it -- easier than writing a dedicated script) | 16:44 |
Shrews | sasl failed i guess?? | 16:45 |
corvus | yeah, got the noautherror | 16:45 |
corvus | Shrews: what's the diff between lost and suspended? | 16:45 |
Shrews | corvus: i think suspended was the connection to zk was lost but the session might still be valid??? | 16:46 |
corvus | ah | 16:46 |
Shrews | lemme find the doc | 16:46 |
Shrews | corvus: https://kazoo.readthedocs.io/en/latest/api/protocol/states.html#kazoo.protocol.states.KazooState | 16:47 |
*** ianychoi has quit IRC | 16:47 | |
*** ianychoi has joined #zuul | 16:48 | |
corvus | Shrews, tristanC: proposal: we set "self.client._connection.sasl_cli = None" in our connection listener for when it's suspended. and we open an issue on kazoo. then we remove the workaround when it's fixed. | 16:48 |
Shrews | in suspended or lost? or both? | 16:49 |
corvus | Shrews: my reading is that suspended happens before lost, but maybe both if i'm wrong? | 16:49 |
corvus | Shrews: oh, the 'valid state transitions' in that doc covers it | 16:49 |
corvus | there is a possible connected-> lost, but only if the creds don't work... | 16:50 |
corvus | and suspended -> lost can happen if the connection resumes.... | 16:50 |
corvus | Shrews: so maybe both. :) | 16:50 |
Shrews | yeah, i think both is safest | 16:50 |
corvus | i just tried a test with both, and it seems to work | 16:51 |
corvus | so both at least seems good for the "zk server restarts" case | 16:51 |
corvus | Shrews, tristanC: i've got most of this typed up, let me just amend tristanC's patch with it | 16:52 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Implement zookeeper-auth https://review.opendev.org/619156 | 16:54 |
corvus | Shrews, tristanC: ^ | 16:54 |
corvus | Shrews, tristanC: note also i left a -1 comment on the zk auth fixup script | 16:54 |
Shrews | can haz same change in nodepool? | 16:54 |
corvus | Shrews, tristanC: i'll go open an issue on kazoo now | 16:54 |
Shrews | corvus: i just searched open PRs and didn't see any similar | 16:55 |
corvus | cool | 16:55 |
corvus | Shrews: you want to copy that over to the nodepool change while i open the pr? | 16:55 |
Shrews | sure | 16:55 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Implement zookeeper-auth https://review.opendev.org/619155 | 16:57 |
Shrews | oh, they added a sasl test suite last month according to commit log | 17:02 |
corvus | Shrews, tristanC: https://github.com/python-zk/kazoo/issues/594 | 17:03 |
Shrews | corvus: what kazoo version did your test use? | 17:06 |
corvus | 2.6.1 | 17:06 |
corvus | looks like that's the latest, yeah? | 17:06 |
Shrews | yeah | 17:07 |
Shrews | https://github.com/python-zk/kazoo/blob/master/CHANGES.md | 17:07 |
* Shrews lunches | 17:08 | |
*** Defolos has quit IRC | 17:09 | |
*** jamesmcarthur has quit IRC | 17:20 | |
*** ianychoi has quit IRC | 17:26 | |
*** mattw4 has quit IRC | 17:26 | |
*** mattw4 has joined #zuul | 17:27 | |
*** jamesmcarthur has joined #zuul | 17:27 | |
*** ianychoi has joined #zuul | 17:27 | |
*** sshnaidm is now known as sshnaidm|afk | 17:31 | |
*** evrardjp has quit IRC | 17:35 | |
*** evrardjp has joined #zuul | 17:35 | |
*** harrymichal has quit IRC | 17:40 | |
*** harrymichal has joined #zuul | 17:41 | |
*** jpena is now known as jpena|off | 17:43 | |
*** jamesmcarthur has quit IRC | 17:43 | |
*** jcapitao is now known as jcapitao_off | 17:44 | |
*** y2kenny has joined #zuul | 17:48 | |
*** harrymichal_ has joined #zuul | 17:51 | |
*** harrymichal has quit IRC | 17:55 | |
*** harrymichal_ is now known as harrymichal | 17:55 | |
*** jamesmcarthur has joined #zuul | 17:56 | |
*** ianychoi has quit IRC | 18:10 | |
*** ianychoi has joined #zuul | 18:17 | |
*** harrymichal has quit IRC | 18:28 | |
*** harrymichal has joined #zuul | 18:33 | |
*** harrymichal has quit IRC | 18:41 | |
*** harrymichal has joined #zuul | 18:42 | |
*** Defolos has joined #zuul | 18:43 | |
*** harrymichal has quit IRC | 18:52 | |
y2kenny | Hi, is it possible to have a job that neither succeeds nor fails? (for example, a job that stops the pipeline from proceeding.) I see that there is a no_jobs reporter but I am not sure if that's relevant or how a job can trigger that condition. | 18:56 |
fungi | y2kenny: not sure what you mean by "stop the pipeline from proceeding" but not adding any jobs for a particular project-pipeline will cause no jobs to be run for the project in the corresponding pipeline and so no results to be reported for it | 18:58 |
y2kenny | fungi: I am thinking something like a conditional noop | 18:59 |
y2kenny | like, a job will run but it may proceed or not depending on some condition | 18:59 |
corvus | y2kenny: fbo was just asking about something similar on the mailing list | 18:59 |
y2kenny | Oh | 18:59 |
y2kenny | I will go check | 18:59 |
corvus | y2kenny: see this message: http://lists.zuul-ci.org/pipermail/zuul-discuss/2020-March/001171.html | 18:59 |
y2kenny | thanks | 18:59 |
*** igordc has joined #zuul | 19:09 | |
*** armstrongs has joined #zuul | 19:11 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: spec: add a zuul-runner cli https://review.opendev.org/681277 | 19:22 |
*** mugsie has quit IRC | 19:23 | |
tristanC | corvus: Shrews: thank you for taking care of zk-auth, the changes lgtm | 19:23 |
*** armstrongs has quit IRC | 19:24 | |
corvus | tristanC: great -- i left a -1 comment for you on the zuul change | 19:25 |
*** mugsie has joined #zuul | 19:26 | |
tristanC | corvus: alright, i'll have a look tomorrow | 19:26 |
*** jamesmcarthur has quit IRC | 19:30 | |
y2kenny | I am still a bit stumped on the connection between executor, nodepool and my node. I am modifying the tutorial. The only things I have changed are: connecting to my own production gerrit and mounting in pregenerated ssh keys. When I try doing the testjob, I am getting node unreachable Permission denied: | 19:31 |
y2kenny | ubuntu-bionic | UNREACHABLE! => { "changed": false, "msg": "Data could not be sent to remote host \"node\". Make sure this host can be reached over ssh: root@node: Permission denied (publickey,password).\r\n", "unreachable": true } | 19:32 |
clarkb | y2kenny: the tutorial may use the same key for gerrit as it does for connecting to the node (so if you've changed them may need to modify the test node) | 19:32 |
*** openstackgerrit has quit IRC | 19:32 | |
y2kenny | I see the nodepool and zuul keypair | 19:33 |
y2kenny | and I mount the keys into the various container similar to the tutorial | 19:33 |
y2kenny | same location, etc. I have added the zuul.pub to my gerrit account as well. | 19:34 |
y2kenny | the connection to gerrit is fine. I am able to pull in the config and so on. | 19:34 |
y2kenny | and in the service log, I see a bunch of event that make sense | 19:34 |
y2kenny | for example, I get the scheduler "adding change to queue in pipeline check" (I added the testjob to check) | 19:35 |
clarkb | ya that shows that zuul is able to connect to gerrit correctly so that half of the change is working. But zuul also needs to be able to ssh into the test node and I'm not sure if that is the same key or a different key (in a meeting now but can help look more closely later) | 19:36 |
y2kenny | scheduler also report "Submitted node request" | 19:36 |
fungi | node requests are coordinated between the zuul scheduler and nodepool launchers through zookeeper | 19:37 |
y2kenny | and launcher reports assigning node request and then scheduler accepting and completed node request. | 19:37 |
y2kenny | then scheduler report nodepool setting node set in use, executor started ssh agent, added ssh key /var/ssh/nodepool | 19:38 |
y2kenny | then executor report Beginning job testjob for ref refs/changes/53/329253/1 | 19:39 |
y2kenny | updating repo (I assume this is happening on the executor and not the node?) | 19:39 |
y2kenny | executor also does a few checkout | 19:41 |
y2kenny | but then scheduler report back Build complete, result None, no warnings and returning nodeset | 19:42 |
fungi | yes, there are initial checkouts in the workspace on the executor | 19:42 |
fungi | and then there's an optional role to rsync those to the nodes | 19:42 |
y2kenny | does executor talk to the nodes directly or via nodepool? | 19:43 |
y2kenny | or does scheduler launch a node via nodepool and executor talk to node once the node come up? | 19:43 |
y2kenny | and what does "launching a node" mean in the context of the tutorial? My understanding is that the tutorial starts a single node container. But the node id from the web ui seems to increase with each retry | 19:45 |
clarkb | y2kenny: "scheduler launch a node via nodepool and executor talk to node once the node come up" that is what happens | 19:45 |
y2kenny | Ok. | 19:45 |
clarkb | launching a node means that nodepool has managed to procure the resources from $somewhere | 19:45 |
clarkb | in the quick start case it's a static node so it doesn't actually do much | 19:45 |
clarkb | but it could mean creating a VM in a cloud or a pod in k8s | 19:46 |
y2kenny | ok. | 19:46 |
y2kenny | so the problem I am having seems to be communication between the executor and node container | 19:46 |
y2kenny | I have exec into the executor container and tried to ssh node and it's reachable from the networking perspective | 19:47 |
fungi | make sure the ssh keys available in ~zuul/.ssh/ are usable to authenticate to the zuul account on the node | 19:48 |
y2kenny | ls | 19:49 |
y2kenny | oops... sorry... wrong window | 19:50 |
Shrews | y2kenny: what nodepool driver are you using for your nodes? | 19:50 |
y2kenny | just static right now | 19:50 |
y2kenny | the node root/.ssh/ does have the authorized_keys | 19:51 |
y2kenny | which is the nodepool.pub | 19:51 |
Shrews | is the username set correctly? https://zuul-ci.org/docs/nodepool/configuration.html#attr-providers.[static].pools.nodes.username | 19:53 |
Shrews | looks like you're assuming 'root' but i'm certain that our tutorials use root | 19:54 |
Shrews | i'm NOT certain | 19:54 |
y2kenny | the username is root | 19:54 |
y2kenny | it's set in nodepool.yaml | 19:54 |
y2kenny | and default_username for the executor | 19:54 |
y2kenny | in zuul.conf | 19:55 |
y2kenny | the tutorial works fully | 19:55 |
y2kenny | I am just not sure what I changed that made things stop working. | 19:55 |
y2kenny | I plan to connect zuul to a k8s cluster for production anyway but I am not confident to move forward to that if I can't debug a static node | 19:56 |
y2kenny | although the container networking may have complicated things? I am not sure | 19:57 |
y2kenny | seems like the issue is between the executor and the node but I am not sure why | 19:57 |
y2kenny | Is there a way for me to manually/interactively pretend to be an executor? | 19:57 |
y2kenny | I am guessing the executor tries to connect via some python ssh session instead of doing so over the shell? | 19:58 |
Shrews | y2kenny: the executor just runs Ansible which in turn uses ssh to contact the node | 19:59 |
y2kenny | um... so I should be able to exec into the executor container and try to run ansible? | 20:00 |
fungi | also it's running ansible inside a lightweight bubblewrap container, so the ssh keys have to be inside the filesystem (i think usually via a bindmount?) | 20:00 |
*** y2kenny55 has joined #zuul | 20:02 | |
*** y2kenny55 has left #zuul | 20:03 | |
*** y2kenny79 has joined #zuul | 20:03 | |
y2kenny79 | um... | 20:03 |
y2kenny79 | is this working? | 20:04 |
y2kenny79 | I am not sure what happened with my connection | 20:04 |
clarkb | y2kenny79: you seem to be here again :) | 20:04 |
*** y2kenny has quit IRC | 20:04 | |
y2kenny79 | ok... looks like my original name has timed out | 20:05 |
y2kenny79 | let me get back to that one | 20:05 |
*** y2kenny79 has left #zuul | 20:05 | |
*** y2kenny has joined #zuul | 20:06 | |
fungi | you can just `/nick y2kenny` (if your nick is registered you can even ask nickserv to kick your ghosted connection) | 20:06 |
y2kenny | ok. (I am new to irc also... :)) | 20:07 |
clarkb | fungi: note this channel requires registration | 20:07 |
y2kenny | fungi: so when you said lightweight bubblewrap container, is that a regular docker container or some other kind of container? | 20:08 |
corvus | clarkb: ftr, the quickstart uses separate keys for gerrit and the worker node (to avoid exactly the kind of confusion you were worried about) | 20:09 |
*** jcapitao_off has quit IRC | 20:09 | |
corvus | before we go too far down the rabbit hole, let's see if we can come up with a simple command to test | 20:09 |
corvus | y2kenny: what happens if you run "ssh -i /var/node/id_rsa -l root <ip address of test node>" on the executor? | 20:11 |
corvus | correction: | 20:11 |
y2kenny | var/ssh/nodepool ? | 20:12 |
corvus | y2kenny: yep :) | 20:13 |
corvus | "ssh -i /var/ssh/nodepool/id_rsa -l root <ip address of test node>" | 20:13 |
y2kenny | ok that is the key difference between the tutorial and my setup | 20:14 |
y2kenny | I have a parallel setup with just plain tutorial and the command works | 20:14 |
y2kenny | but in my messed-up setup a password is prompted | 20:14 |
y2kenny | that tells me something is wrong with the node's authorized_keys | 20:14 |
y2kenny | which is unexpected | 20:15 |
corvus | y2kenny: yeah, sounds like it | 20:15 |
y2kenny | let me double check... probably I have a typo some where | 20:15 |
fungi | y2kenny: and no, it's not really a docker container, more like a chroot with cgroups/process isolation | 20:15 |
corvus | y2kenny: now you've got a command that should be roughly equivalent to what zuul (via ansible and bubblewrap) will do to help test | 20:15 |
y2kenny | yes. Thanks for the tips. | 20:16 |
y2kenny | this give me something to poke around | 20:16 |
*** jamesmcarthur has joined #zuul | 20:17 | |
y2kenny | ok... I think this may be the .ssh directory's permission | 20:17 |
*** ianychoi has quit IRC | 20:17 | |
*** ianychoi has joined #zuul | 20:18 | |
y2kenny | um... I spoke too soon... there's a bunch of other differences... I will be back | 20:19 |
*** openstackstatus has joined #zuul | 20:20 | |
*** ChanServ sets mode: +v openstackstatus | 20:20 | |
y2kenny | OOOOOOOOOOOk.... I think this may have fixed it. The ssh authorized_keys file is mounted in with the wrong owner and permission (1000 instead of root) on my setup | 20:28 |
y2kenny | yup... now I got further. Thanks folks, you guys are awesome! | 20:30 |
corvus | y2kenny: yay! good luck, let us know how it goes :) | 20:33 |
*** mattw4 has quit IRC | 20:40 | |
*** openstackgerrit has joined #zuul | 20:43 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Install zypper on the nodepool-builder image https://review.opendev.org/712177 | 20:43 |
clarkb | ianw: mordred ^ fyi I think that will get us zypper in the nodepool-builder image | 20:43 |
*** plaurin has joined #zuul | 21:15 | |
plaurin | hello irc people!! | 21:15 |
clarkb | hello | 21:16 |
plaurin | Quite late in the day for me, however I wanted to talk about the kubernetes log streaming. I installed zuul 3.18.0 and nodepool 3.12.0, but still no luck. HOWEVER I now see a bunch of "Zuul log stream did not terminate" here and there | 21:16 |
clarkb | plaurin: that is actually caused by it not starting properly. corvus added a release note update to cover that case (which may fix your problem) let me find a link | 21:17 |
clarkb | plaurin: https://zuul-ci.org/docs/zuul/reference/releasenotes.html#upgrade-notes are socat and kubectl installed on the executor? | 21:18 |
*** armstrongs has joined #zuul | 21:19 | |
plaurin | thx, yeah I installed socat indeed before updating | 21:22 |
plaurin | checking the upgrade note | 21:22 |
plaurin | ha I might be missing the start-zuul-console for some reason checking that | 21:23 |
clarkb | oh that may actually be what was added later | 21:27 |
corvus | plaurin: yeah, we realized rather late that start-zuul-console was required | 21:27 |
corvus | i think we realized it after the change merged, but right before we actually cut the release | 21:27 |
corvus | plaurin: so if my memory serves, that would be right after you went off to start testing it, sorry | 21:27 |
*** jamesmcarthur has quit IRC | 21:28 | |
plaurin | no problem, I'm excited to see this log streaming working, some people are going to be quite happy | 21:28 |
plaurin | I'm really grateful | 21:28 |
*** jamesmcarthur has joined #zuul | 21:28 | |
corvus | plaurin: no problem, thanks for testing! | 21:28 |
plaurin | testing in prod, like everyone should lol | 21:29 |
*** armstrongs has quit IRC | 21:29 | |
fungi | as long as you're also *testing* prod, that sounds ideal! | 21:31 |
plaurin | yep, .. outside of work hours at least | 21:33 |
*** avass has quit IRC | 21:43 | |
*** jamesmcarthur has quit IRC | 21:44 | |
*** jamesmcarthur has joined #zuul | 21:45 | |
*** jamesmcarthur has quit IRC | 21:53 | |
plaurin | YES | 21:58 |
plaurin | Thank you, sreaming is working now | 21:58 |
plaurin | I guess this can be resolved or closed https://storyboard.openstack.org/#!/story/2007321 | 21:58 |
*** sreejithp has joined #zuul | 21:59 | |
fungi | yay for working screaming! ;) | 22:00 |
*** jcapitao_off has joined #zuul | 22:01 | |
*** sreejithp has quit IRC | 22:02 | |
*** jamesmcarthur has joined #zuul | 22:05 | |
*** marvs has quit IRC | 22:05 | |
*** jamesmcarthur has quit IRC | 22:10 | |
plaurin | I am streaming of joy | 22:15 |
plaurin | sed -i 's/screaming/streaming/g' | 22:16 |
*** evrardjp has quit IRC | 22:17 | |
*** evrardjp has joined #zuul | 22:18 | |
*** ianychoi has quit IRC | 22:19 | |
*** ianychoi has joined #zuul | 22:20 | |
*** plaurin has quit IRC | 22:27 | |
*** evrardjp has quit IRC | 22:31 | |
*** evrardjp has joined #zuul | 22:33 | |
*** dpawlik has quit IRC | 22:40 | |
*** ianychoi has quit IRC | 22:41 | |
y2kenny | ok, I am back... now that the ssh issue is fixed, I am running into a "role not found" problem. These are zuul/zuul-jobs roles (such as add-build-sshkey or upload-logs). From the earlier conversation, these zuul roles are supposed to be rsynced to the node by the executor? | 22:41 |
*** ianychoi has joined #zuul | 22:42 | |
fungi | do you have a connection for opendev.org configured and are you including the zuul/zuul-jobs repository in your tenant configuration? | 22:47 |
y2kenny | Oooo... ok... I think I assumed a bit too much magic :) | 22:48 |
fungi | pretty sure add-build-sshkey and upload-logs run in the workspace on the executor anyway | 22:48 |
y2kenny | I removed the opendev connection | 22:48 |
fungi | you can remove the opendev connection if you want to carry a local fork of the zuul-jobs repo | 22:49 |
y2kenny | right. | 22:49 |
fungi | and i think a number of sites do that | 22:49 |
*** jcapitao_off has quit IRC | 22:49 | |
fungi | but we designed it so you can reuse the public copy as a standard library | 22:49 |
fungi | and we treat everything in it as an api contract, with deprecation announcements for behavior changes and the like | 22:50 |
y2kenny | in my mind I thought it works like dockerhub or something where docker (in this case zuul) would fetch the role. | 22:50 |
y2kenny | independent of the opendev connection | 22:51 |
y2kenny | I treated that connection as part of the tutorial/example | 22:51 |
fungi | it does fetch the role, but that connection is how it knows where you want to fetch it from, and it's extensible so you can treat any repository anywhere reachable as a source of job configuration | 22:51 |
y2kenny | got it. Thanks! | 22:51 |
fungi | yeah, the opendev.org connection and zuul/zuul-jobs repository are not in any way "special", they're just another public source of job configuration | 22:52 |
fungi | which you're free to add/remove/use/ignore/fork/reimplement/whatever suits your needs | 22:53 |
y2kenny | so the bits that tell zuul to fetch the role is: | 22:53 |
y2kenny | roles: - zuul: zuul/zuul-jobs | 22:53 |
y2kenny | under jobs.yaml? | 22:54 |
fungi | yep | 22:54 |
fungi | but that doesn't tell it where to find the roles | 22:55 |
corvus | clarkb, Shrews, tristanC: i believe i have a local zk cluster of 3 nodes using server-side (quorum) tls! | 22:56 |
clarkb | nice | 22:56 |
corvus | apparently all the java keystore stuff requires passwords (and some things break without them), so the hardest part is i had to type "foobar" a lot. | 22:57 |
y2kenny | ok so now I have another question... for things in pipeline like trigger, reporter, etc., I would specify a source. But looks like I don't do that for projects. What if I have zuul/zuul-jobs in both the opendev and my internal gerrit server? | 22:57 |
corvus | clarkb, Shrews, tristanC: tomorrow, i'll work on client tls config, then i'll see about setting up our tests that way and documenting this | 22:58 |
y2kenny | Oh... main.yaml/tenant config | 22:58 |
fungi | y2kenny: https://zuul-ci.org/docs/zuul/reference/tenants.html#tenant has an example of the tenant config, with zuul/zuul-jobs included as an untrusted repository, and below that you can see an example of filtering what kinds of configuration you want to allow it to consume from specific repositories as well | 22:58 |
*** igordc has quit IRC | 22:58 | |
y2kenny | ok, yes... I think I am connecting the dots now. | 22:58 |
y2kenny | thanks | 22:58 |
fungi | y2kenny: and then here's the bit on defining connections in your zuul.conf: https://zuul-ci.org/docs/zuul/reference/connections.html | 22:58 |
corvus | y2kenny: if there is a name collision (ie, zuul/zuul-jobs) you can supply the fully qualified name for a repo (eg opendev.org/zuul/zuul-jobs). we use the canonical name there rather than the connection name so that it can be the same across different zuul installations even if they have different connection names | 22:59 |
y2kenny | great. Yes, it's taking me a bit of time to connect all the concepts together. Coming from the jenkins world, this is really awesome. | 23:00 |
corvus | y2kenny: (that's generally true any place in zuul where a repo name is supplied outside of the context of a source) | 23:00 |
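
corvus's point about name collisions can be illustrated with a toy model. This is only a sketch of the idea, not Zuul's actual code: it assumes each connection is identified by a canonical hostname, that a short name is acceptable when exactly one connection hosts it, and that a fully qualified name (hostname/project) always resolves. The function `resolve_project` and the host `gerrit.example.com` are hypothetical.

```python
def resolve_project(name, projects_by_host):
    """Resolve a project reference to a canonical 'hostname/project' name.

    projects_by_host maps a canonical hostname to the set of project
    names hosted there. Toy model for illustration only.
    """
    canonical = [
        f"{host}/{project}"
        for host, projects in projects_by_host.items()
        for project in projects
    ]
    # A fully qualified name is never ambiguous.
    if name in canonical:
        return name
    # Otherwise look for hosts that carry a project with this short name.
    matches = [c for c in canonical if c.split("/", 1)[1] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError(f"unknown project: {name}")
    raise LookupError(f"ambiguous project {name}; qualify it with a hostname")


if __name__ == "__main__":
    hosts = {
        "opendev.org": {"zuul/zuul-jobs"},
        "gerrit.example.com": {"zuul/zuul-jobs", "team/app"},
    }
    print(resolve_project("team/app", hosts))
    print(resolve_project("opendev.org/zuul/zuul-jobs", hosts))
```

Because the qualified form uses the canonical hostname rather than a connection name, the same reference keeps working in another installation that names its connections differently.
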
fungi | glad to hear someone coming from the jenkins world thinks zuul is awesome, rather than uselessly confusing! ;) | 23:00 |
y2kenny | corvus: that's good to know. I actually had some questions in my head when I was setting up the gerrit connection. I feel like I need to name the server in 3 different places. | 23:01 |
corvus | y2kenny: yeah, i think one of those is superfluous and we should remove it from the docs, but i don't know which :) | 23:01 |
y2kenny | fungi: zuul gives me all the stuff I wish jenkins would come with out of the box. Jenkins is fine for simple projects but, at least for my setup, the complexity of reality just made it grow into a monster | 23:03 |
y2kenny | a lot of these zuul concepts are essentially what we have implemented in Jenkins, custom, in groovy... Jenkins' groovy. It got pretty ugly. | 23:04 |
corvus | y2kenny: yeah, zuul is the result of running jenkins at scale for several years :) | 23:05 |
fungi | y2kenny: us too! | 23:05 |
fungi | that's basically how zuul evolved | 23:05 |
fungi | we wrote so much glue and orchestration around jenkins, that there was a lot more of it than there was jenkins | 23:06 |
fungi | so then we "just" swapped the jenkins masters out for ansible/executors | 23:07 |
y2kenny | corvus: it definitely shows. I was reading the stuff and I was like... these all make sense! I was so unhappy with Jenkins I was about to cook up something on my own. Good thing I saw the talk at Gerrit User Summit in december. | 23:07 |
y2kenny | and I was like... awesome, someone has already done this for me :D | 23:07 |
fungi | y2kenny: if you're intrigued by the jenkins history of zuul, there's an article here which recounts the highlights of what we went through to end up here: https://opensource.com/article/20/2/zuul | 23:08 |
corvus | y2kenny: oh great! did you see we're making progress on using zuul for gerrit's gerrit? https://ci.gerritcodereview.com/tenants | 23:08 |
corvus | y2kenny, fungi: yeah, that's a great article on the subject -- it might help zuul users coming from jenkinsland | 23:09 |
y2kenny | fungi: that's going to be useful when someone challenges my decision not to stick with jenkins. | 23:15 |
*** zxiiro has quit IRC | 23:15 | |
y2kenny | corvus: I did see the discussion on repo-discuss but didn't know you guys got it up and running already. This is great. | 23:16 |
corvus | y2kenny: heh, it's, er, speculatively running :) there are about 8 required patches that haven't merged yet, but since none of them are in config repos, we can actually run jobs and see the result before they land. so we know it works, it's just wrangling reviews now :) | 23:17 |
fungi | yeah, it's almost mind-bending at times that you're able to basically test a complex ci deployment without even merging most of the configuration for it | 23:19 |