tristanC | corvus: indeed, it seems like the limit is somehow applied to the number of build? | 00:02 |
---|---|---|
corvus | tristanC: i suspect it's that join to the build table | 00:03 |
corvus | tristanC: i'm still skeptical that we need to query by build name... so i think the easiest/best solution would be to drop that | 00:04 |
tristanC | corvus: it should work though, but your suggestion works for me too | 00:05 |
SpamapS | jlk: fascinating. I used to rely on the status triggers though, so that I could require a Jenkins check, but if we failed ot merge because jenkins was lagging, we had a trigger source of the jenkins status check, to kick the gate off again. | 00:06 |
corvus | tristanC: yeah, i'm sure we can make it work, but unless we really think querying by job is important, this will make the code+query simpler and more efficient. | 00:06 |
SpamapS | but seems like a corner case | 00:06 |
corvus | tristanC: the *main* thing here is to get SpamapS his buildset status page ;) | 00:06 |
jlk | SpamapS: hrm. If Jenkins was using the checkAPI though I think that would still work | 00:06 |
SpamapS | corvus: yes! :) actually lately I've been pointing at builds with a change filter, and that has made my users happier. | 00:07 |
jlk | SpamapS: because there would be a check suite for Jenkins, and you'd get check suite events saying Jenkins started or whatnot, and you could query to see if all check suites were complete. | 00:07 |
SpamapS | but a buildset would be ideal. | 00:07 |
tristanC | corvus: can you please revisit your -2 on https://review.openstack.org/573473 , the last PS correctly handle multi-parent jobs | 00:07 |
SpamapS | jlk: ah yeah, that's the answer than. | 00:07 |
jlk | a built set == all the jobs done for a particular change? | 00:07 |
SpamapS | jlk: correct (but don't get confused, corvus and tristanC are talking about other things. ;) | 00:08 |
corvus | tristanC: awesome! i will do that first thing tomorrow, i have to EOD now | 00:08 |
SpamapS | buildset would actually map pretty well to the check run though. | 00:08 |
jlk | SpamapS: right, but I'm just thinking that checks API solves that too. | 00:08 |
jlk | each job is represented as a check in the check run | 00:08 |
SpamapS | ya | 00:09 |
tristanC | corvus: and your -2 on https://review.openstack.org/555153 too please. Please explain why a job filter is a no-go. | 00:09 |
*** jamesmcarthur has joined #zuul | 00:10 | |
jlk | er. sorry. Each job would be a check_run in the check_suite | 00:10 |
clarkb | thats the other thing that the check api would give us iirc, better breakdown of individual jobs (though the buildset page in zuul side still a good idea) | 00:10 |
clarkb | is the check api something available in GHE too? as thats another thing to consider with our GHE users | 00:11 |
jlk | I can't recall just yet, but I _think_ so | 00:11 |
jlk | https://developer.github.com/enterprise/2.16/v3/checks/ | 00:11 |
jlk | It was beta in 2.14, GA in 2.15 | 00:12 |
clarkb | cool so more than one release even | 00:13 |
SpamapS | ya the features seem to flow into GHE at a pretty impressive pace | 00:14 |
*** jamesmcarthur has quit IRC | 00:16 | |
jlk | ... it may be a while before the Actions API is available there. | 00:16 |
SpamapS | meh ;) | 00:16 |
*** manjeets has quit IRC | 00:18 | |
*** manjeets has joined #zuul | 00:19 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: remove build and job_name filter from the buildset route https://review.openstack.org/636504 | 00:22 |
*** sdake has quit IRC | 00:26 | |
*** sdake has joined #zuul | 00:47 | |
*** openstackgerrit has quit IRC | 00:52 | |
*** jamesmcarthur has joined #zuul | 00:53 | |
*** jamesmcarthur has quit IRC | 00:54 | |
*** jamesmcarthur has joined #zuul | 01:04 | |
jhesketh | corvus: did I miss the discussion about zuul at the PTG? What was the reason for not having an official room if you don't mind me asking? | 01:10 |
*** sdake has quit IRC | 01:14 | |
*** sdake has joined #zuul | 01:17 | |
*** sdake has quit IRC | 01:29 | |
*** jamesmcarthur has quit IRC | 01:30 | |
*** krasmussen has quit IRC | 01:30 | |
*** sdake has joined #zuul | 01:33 | |
*** ruffian_sheep has joined #zuul | 01:33 | |
*** jamesmcarthur has joined #zuul | 01:33 | |
ruffian_sheep | Hi,guys.I want to use zuul v3 to build a third party ci for cinder of openstack. | 01:34 |
ruffian_sheep | I build it during the link:https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 01:35 |
ruffian_sheep | Now,there has some problem I don't know tow to sovle.It showed me :zuul-executor: ConnectionRefusedError: [Errno 111] Connection refused | 01:36 |
ruffian_sheep | Is it related to the setting of nodepool-static? | 01:37 |
*** sdake has quit IRC | 01:39 | |
SpamapS | ruffian_sheep: zuul-executor only connects to gearman. | 01:43 |
SpamapS | ruffian_sheep: which should be running/open on the scheduler. | 01:43 |
ruffian_sheep | SpamapS:Get it ,but there is also appeared an error :zuul-scheduler.service stop-sigterm timed out. Killing.When I try to restart it | 01:45 |
ruffian_sheep | SpamapS:What I can think of about the scehduler part is the setting of the zuul.conf file. | 01:46 |
SpamapS | ruffian_sheep: There should be other clues in the zuul-scheduler service output that tells you why it isn't working. | 01:47 |
ruffian_sheep | SpamapS:Is it probably I set the wrong paraments? | 01:47 |
*** jamesmcarthur has quit IRC | 01:53 | |
*** jamesmcarthur has joined #zuul | 01:54 | |
ruffian_sheep | SpamapS:I cat the file of zuul-scheduler.log.It load the main.yaml and show me zuul.Scheduler: Full reconfiguration complete | 01:57 |
SpamapS | ruffian_sheep: sounds good. But is it listening on 4730 ? | 01:58 |
ruffian_sheep | SpamapS:I think yes.I didn't set the port of gearman_server.During the document ,It should be 4730 in default. | 02:01 |
*** jamesmcarthur has quit IRC | 02:05 | |
SpamapS | ruffian_sheep: ok, I have to go AFK now, but check to make sure executor can reach the scheduler port 4730 | 02:05 |
*** jamesmcarthur has joined #zuul | 02:16 | |
ruffian_sheep | SpamapS:OK,thx dude. | 02:19 |
*** jamesmcarthur has quit IRC | 02:28 | |
*** bjackman has joined #zuul | 02:40 | |
*** saneax has joined #zuul | 02:45 | |
*** jamesmcarthur has joined #zuul | 02:59 | |
ruffian_sheep | I make sure web can can reach the scheduler port 4730.zuul 7363 0.1 7.4 1011516 75104 ? Ssl 10:00 0:03 /usr/bin/python3.5 /usr/bin/zuul-web | 03:03 |
ruffian_sheep | tcp 0 0 127.0.0.1:34644 127.0.0.1:4730 ESTABLISHED 7363/python3.5 | 03:04 |
*** jamesmcarthur has quit IRC | 03:06 | |
ruffian_sheep | But I don't why I cannot start the executor. | 03:06 |
*** sdake has joined #zuul | 03:07 | |
ruffian_sheep | What I set in the zuul.conf is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa | 03:07 |
*** ruffian_sheep has quit IRC | 03:16 | |
*** ruffian_sheep has joined #zuul | 03:23 | |
*** rlandy has quit IRC | 03:24 | |
*** jamesmcarthur has joined #zuul | 03:26 | |
*** jamesmcarthur has quit IRC | 03:30 | |
*** bjackman has quit IRC | 03:45 | |
*** spsurya has joined #zuul | 03:55 | |
*** bjackman has joined #zuul | 03:55 | |
*** sdake has quit IRC | 04:15 | |
*** openstackgerrit has joined #zuul | 04:24 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 04:24 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 04:24 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079 | 04:25 |
tristanC | corvus: topic:web-build should be good for review now | 04:26 |
tristanC | SpamapS: if you could have look too ^ | 04:26 |
*** bjackman has quit IRC | 04:54 | |
*** bjackman has joined #zuul | 04:57 | |
*** sanjayu_ has joined #zuul | 05:02 | |
*** saneax has quit IRC | 05:04 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 05:08 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079 | 05:08 |
ruffian_sheep | tristanC:Hi,tristanC.Do you know how to deal with the problem? | 06:07 |
tristanC | ruffian_sheep: hum, what is the problem again? | 06:12 |
ruffian_sheep | tristanC:I want to use zuul v3 to build a third party ci for cinder of openstack.It showed me :zuul-executor: ConnectionRefusedError: [Errno 111] Connection refused | 06:18 |
ruffian_sheep | tristanC:SpamapS told me to checked the scheduler.I also found that zuul-scheduler.service stop-sigterm timed out. Killing.When I try to restart it | 06:20 |
ruffian_sheep | tristanC:I cat the file of zuul-scheduler.log.It load the main.yaml and show me zuul.Scheduler: Full reconfiguration complete | 06:20 |
ruffian_sheep | tristanC:I make sure zuul-web can can reach the scheduler port 4730. zuul 7363 0.1 7.4 1011516 75104 ? Ssl 10:00 0:03 /usr/bin/python3.5 /usr/bin/zuul-web | 06:21 |
ruffian_sheep | tristanC:tcp 0 0 127.0.0.1:34644 127.0.0.1:4730 ESTABLISHED 7363/python3.5 | 06:21 |
ruffian_sheep | tristanC:But I don't why I cannot start the executor.What I set in the zuul.conf is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa | 06:22 |
tristanC | ruffian_sheep: can the executor reach the scheduler port 4730? | 06:22 |
*** sshnaidm is now known as sshnaidm|afk | 06:22 | |
ruffian_sheep | tristanC:I cannot start the executor now | 06:23 |
ruffian_sheep | tristanC:It appeared that zuul-executor[8845]: ConnectionRefusedError: [Errno 111] Connection refused | 06:23 |
ruffian_sheep | tristanC:I dont know what's wrong with the setting. | 06:24 |
ruffian_sheep | tristanC:I didn't change the setting about executor.It can be restarted before. | 06:25 |
tristanC | ruffian_sheep: please paste the executor logs | 06:25 |
ruffian_sheep | tristanC:zuul.Executor: Starting log streamer and zuul.Executor: Stopping log streamer | 06:26 |
ruffian_sheep | tristanC:I take it from /var/log/zuul/executor.log | 06:26 |
ruffian_sheep | tristanC:Is it ? | 06:27 |
tristanC | ruffian_sheep: yes, the logs before the connection refused error | 06:28 |
*** sdake has joined #zuul | 06:28 | |
ruffian_sheep | tristanC:Do you know what the problem is ?Or how to solve it? | 06:29 |
ruffian_sheep | tristanC:I didn't find a way to deal with | 06:29 |
tristanC | ruffian_sheep: it seems like the executor cannot connect to the scheduler gearman service | 06:29 |
ruffian_sheep | tristanC:But what I set the parament about the executor is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa | 06:34 |
ruffian_sheep | tristanC:Is something I ignored? | 06:34 |
ruffian_sheep | tristanC:Is the gearman related to nodepool? | 06:35 |
tristanC | ruffian_sheep: iirc, executor pulls gearman config from the [gearman] section | 06:36 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: add quick-start integration test https://review.openstack.org/635701 | 06:41 |
ruffian_sheep | tristanC:Almost the same as the executor.In zuul.conf :[gearman] server=127.0.0.1 | 06:43 |
tristanC | ruffian_sheep: is the executor running on the same host as the scheduler? | 06:44 |
ruffian_sheep | tristanC:Yes. | 06:45 |
ruffian_sheep | tristanC:I build is during the link:https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html | 06:46 |
tristanC | and is the gearman service running, e.g. is there something listening on port 4730? | 06:46 |
ruffian_sheep | tristanC:Yes.tcp 0 0 0.0.0.0:4730 0.0.0.0:* LISTEN 9638/python3.5 | 06:47 |
ruffian_sheep | tristanC:zuul 9638 0.0 6.5 479232 66004 ? Sl 14:45 0:00 /usr/bin/python3.5 /usr/bin/zuul-scheduler | 06:48 |
tristanC | ruffian_sheep: could you paste the full exception from the executor logs, "ConnectionRefusedError: [Errno 111] Connection refused" might be something else | 06:48 |
ruffian_sheep | tristanC:Is it ?Feb 13 14:33:02 ubuntu zuul-executor: Traceback (most recent call last): Feb 13 14:33:02 ubuntu zuul-executor: File "/usr/bin/zuul-executor", line 10, in <module> Feb 13 14:33:02 ubuntu zuul-executor: sys.exit(main()) Feb 13 14:33:02 ubuntu zuul-executor: File "/usr/lib/python3.5/site-packages/zuul/cmd/executor.py", line 121, in main Feb 13 14:33:02 ubuntu zuul-executor: Executor().main() Feb 13 14:33:02 ubuntu | 06:49 |
*** quiquell|off is now known as quiquell|rover | 06:49 | |
tristanC | ruffian_sheep: use a pasting service please | 06:50 |
*** bjackman has quit IRC | 06:50 | |
*** bjackman has joined #zuul | 06:51 | |
ruffian_sheep | tristanC:Emmmm....Are you saying that you are going to paste it line by line? I am not very familiar with the use of IRC. | 06:51 |
quiquell|rover | tristanC: o/ | 06:52 |
tristanC | ruffian_sheep: paste the log in http://paste.openstack.org/ and share the resulting url here | 06:55 |
tristanC | quiquell|rover: hey | 06:55 |
quiquell|rover | tristanC: good morning sir | 06:56 |
quiquell|rover | tristanC: so I found a correct way to escape jinja in the zuul.message | 06:56 |
tristanC | quiquell|rover: nice | 06:57 |
quiquell|rover | https://review.openstack.org/#/c/633930 | 06:57 |
quiquell|rover | It's here but maybe is too complicated for the value it gives | 06:57 |
quiquell|rover | If you can take a look would be nice | 06:58 |
tobiash | quiquell|rover: commented on ^ | 07:02 |
quiquell|rover | tobiash: thanks! So is this a proper way to do it ? | 07:04 |
quiquell|rover | It's a little ad-hoc | 07:05 |
ruffian_sheep | tristanC:Can you open it? http://paste.openstack.org/show/744995/ | 07:05 |
tobiash | quiquell|rover: I think that makes sense and is reusable if we need it elsewhere | 07:05 |
quiquell|rover | tobiash: ack | 07:06 |
tristanC | ruffian_sheep: so the connection error is related to /var/lib/zuul/executor.socket, is there a sock file leaked? | 07:07 |
tristanC | perhaps tries to remove the file and restart the service | 07:07 |
*** sdake has quit IRC | 07:07 | |
ruffian_sheep | tristanC:I don't quite understand what you mean. What is it about? | 07:08 |
ruffian_sheep | tristanC:I just changed the user and user group of a folder. His feedback has changed, is it related to this? http://paste.openstack.org/show/744996/ | 07:11 |
ruffian_sheep | tristanC:I had done it like you said.But it failed again. | 07:12 |
tristanC | ruffian_sheep: for some reasone, the executor process doesn't seems to be able to open the executor.socket file | 07:14 |
ruffian_sheep | tristanC:yes,the executor.socket is belong to root:root before.The messages showed me that http://paste.openstack.org/show/744997/ .I change it.And it turns to be connection refused. | 07:19 |
ruffian_sheep | tristanC:I'm confused what's wrong in the setting.I have no idea,and cann't find some similar case to solve | 07:21 |
tristanC | ruffian_sheep: sorry, i'm not familiar with the documentation you followed, not sure what is the issue here | 07:21 |
quiquell|rover | tobiash: So at my system it's using cyaml, but still only works if y set yaml.SafeLoader.add_cons ... and the like :-/ | 07:26 |
quiquell|rover | tobiash: so looks like cyaml is still calling something from yaml for this | 07:26 |
*** jamesmcarthur has joined #zuul | 07:26 | |
quiquell|rover | tobiash: I suppose cyaml use the binding but the rest as from yaml | 07:27 |
tobiash | quiquell|rover: hrm, ok | 07:28 |
quiquell|rover | tobiash: so have to be as it's | 07:28 |
quiquell|rover | tobiash: yaml.Safe... add_constructor there :-) will add a comment | 07:28 |
tobiash | quiquell|rover: k | 07:28 |
*** swest has joined #zuul | 07:30 | |
*** jamesmcarthur has quit IRC | 07:31 | |
*** swest has quit IRC | 07:34 | |
ruffian_sheep | tristanC:ok,thx dude! | 07:35 |
openstackgerrit | Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930 | 07:44 |
quiquell|rover | tobiash, tristanC: Added comment | 07:44 |
quiquell|rover | btw https://github.com/yaml/pyyaml/blob/4c2e993321ad29a02a61c4818f3cef9229219003/lib/yaml/cyaml.py#L49 | 07:44 |
quiquell|rover | Looks like the representer is from yaml | 07:44 |
quiquell|rover | Humm we can call parent class from the multiple inheritance | 07:47 |
openstackgerrit | Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930 | 07:47 |
quiquell|rover | Calling SafeReprenter works for cyaml and yaml | 07:50 |
*** gtema has joined #zuul | 07:55 | |
quiquell|rover | nah not working | 08:01 |
openstackgerrit | Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930 | 08:28 |
quiquell|rover | tobiash, tristanC: Have just leave a comment ^, let me know if something else is missing, sorry about the harassing | 08:29 |
openstackgerrit | Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930 | 08:42 |
*** dmsimard has quit IRC | 08:44 | |
*** dmsimard has joined #zuul | 08:45 | |
*** jpena|off is now known as jpena | 08:54 | |
openstackgerrit | Felix Schmidt proposed openstack-infra/zuul master: Retrieve full list of jobs with details per tenant via API https://review.openstack.org/635714 | 09:03 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041 | 09:42 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078 | 09:42 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079 | 09:42 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335 | 09:46 |
*** sshnaidm|afk is now known as sshnaidm | 09:59 | |
*** ruffian_sheep has quit IRC | 09:59 | |
*** fdegir has joined #zuul | 10:10 | |
*** swest has joined #zuul | 10:52 | |
quiquell|rover | tobiash: zuul passing now, we can merge it https://review.openstack.org/#/c/633930/ | 11:22 |
quiquell|rover | tristanC: ^ | 11:22 |
*** gtema has quit IRC | 11:52 | |
*** panda is now known as panda|lunch | 12:00 | |
*** gtema has joined #zuul | 12:01 | |
*** sdake has joined #zuul | 12:04 | |
*** sdake has quit IRC | 12:16 | |
*** jpena is now known as jpena|lunch | 12:34 | |
*** bjackman_ has joined #zuul | 12:53 | |
*** rlandy has joined #zuul | 12:55 | |
*** bjackman has quit IRC | 12:56 | |
*** bjackman__ has joined #zuul | 12:57 | |
*** bjackman_ has quit IRC | 13:00 | |
*** panda|lunch is now known as panda | 13:12 | |
*** jamesmcarthur has joined #zuul | 13:14 | |
*** sanjayu_ has quit IRC | 13:19 | |
*** bjackman__ has quit IRC | 13:19 | |
*** swest has quit IRC | 13:19 | |
*** sanjayu_ has joined #zuul | 13:19 | |
*** swest has joined #zuul | 13:20 | |
*** jamesmcarthur has quit IRC | 13:29 | |
*** swest has quit IRC | 13:33 | |
*** sanjayu__ has joined #zuul | 13:36 | |
*** jpena|lunch is now known as jpena | 13:37 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335 | 13:39 |
*** sanjayu_ has quit IRC | 13:39 | |
*** jamesmcarthur has joined #zuul | 13:50 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: executor: add log_stream_port and log_stream_file settings https://review.openstack.org/535538 | 13:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: executor: add log_stream_port and log_stream_file settings https://review.openstack.org/535538 | 13:55 |
*** sdake has joined #zuul | 14:08 | |
*** sdake has quit IRC | 14:09 | |
*** sdake has joined #zuul | 14:10 | |
*** sdake has quit IRC | 14:13 | |
*** sdake has joined #zuul | 14:14 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement a Runc driver https://review.openstack.org/535556 | 14:15 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335 | 14:17 |
*** rfolco has quit IRC | 14:18 | |
*** pwhalen has quit IRC | 14:21 | |
*** pwhalen has joined #zuul | 14:22 | |
*** sdake has quit IRC | 14:23 | |
*** sanjayu__ has quit IRC | 14:26 | |
*** sanjayu__ has joined #zuul | 14:26 | |
*** jamesmcarthur has quit IRC | 14:26 | |
*** rlandy is now known as rlandy|afk | 14:26 | |
*** rfolco has joined #zuul | 14:37 | |
jkt | tristanC: thanks for your very fast runc followup | 14:49 |
jkt | I wonder why that log streaming goes over a static port in the first place; Ansible has ssh multiplexing, so why not launch another command which `tail -f` from that well-known file location on the build node? | 14:51 |
*** gtema has quit IRC | 14:51 | |
*** jamesmcarthur has joined #zuul | 14:55 | |
mordred | jkt: we read the stream in the callback plugin which isn't in a position to spawn additional commands. I've got a half-finished replacement for the whole thing which gets rid of the file too - but it keeps being lower priority than other things to finish | 15:30 |
mordred | although I need to check out bcoca's work upstream to allow modules to return intermediary results to see if we can use it instead | 15:31 |
*** quiquell|rover is now known as quiquell|off | 15:48 | |
clarkb | http://paste.openstack.org/show/745021/ github processing data | 16:05 |
clarkb | that was a recent event that was finished. It took about a minute and a half to process once it was its turn and we do seem to have a growing queue | 16:05 |
*** sdake has joined #zuul | 16:06 | |
jkt | mordred: ack, thanks, looking forward to having it replaced :) | 16:07 |
*** ianychoi has quit IRC | 16:08 | |
clarkb | I think that confirms our suspicions that we grow a queue due to expensive serialized processing. I do note that it appears to take ~7 seconds to handle events from less big projects too so while ansible is really slow its not incredibly fast on the cheaper projects | 16:08 |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: CLI: fail if trying to enqueue/dequeue a change for the wrong project https://review.openstack.org/636662 | 16:11 |
tobiash | clarkb: that queue length is impressive... | 16:38 |
tobiash | clarkb: maybe we should go the search route while knowing that the api limit can be a problem. But to fill that rate limit we need to be faster than two seconds per event anyway and in addition have a long queue in the same org | 16:41 |
clarkb | ya it does make me wonder if we can reduce the number/type of events we listen for | 16:42 |
clarkb | I'm not familiar enough with how zuul plugs into workflows to understand if that is viable, but reducing total number of events gives us more breathing room on time to process an event | 16:42 |
*** sdake has quit IRC | 16:46 | |
*** sdake has joined #zuul | 16:46 | |
tobiash | clarkb: do you gate on any github repo atm? | 16:46 |
tobiash | clarkb: if not, you cound temporarily remove the status events | 16:47 |
clarkb | tobiash: we don't but I think ttx is using labels on kata for something | 16:47 |
clarkb | ttx: ^ maybe you can confirm | 16:47 |
clarkb | but maybe we don't use them on ansible and that is the biggest traffic generator? | 16:47 |
tobiash | label events should be even something different | 16:47 |
clarkb | ah | 16:47 |
tobiash | clarkb: the event subscription is configured at app level | 16:48 |
tobiash | not per repo | 16:48 |
clarkb | oh I thought it was per installation? | 16:48 |
tobiash | afaik not | 16:48 |
clarkb | I guess the app defines what it needs and you either accept or deny that on installation? | 16:48 |
tobiash | yepp | 16:48 |
tobiash | that's how it's supposed to work | 16:48 |
*** jamesmcarthur has quit IRC | 16:51 | |
corvus | quiquell|off, fungi: why use !unsafe instead of "{% raw %}" ? | 16:51 |
corvus | is it for the double-escape case? where the message itself contains endraw? | 16:52 |
fungi | it's possible i misunderstand the point of that change. i thought it was avoiding performing any jinja2 substitutions at all on commit messages by marking them unsafe? | 16:54 |
corvus | fungi: yes, but somewhere along its 28 patchset history, it changed from using raw/endraw to using !unsafe, and i'd like to make sure i fully understand why | 16:55 |
corvus | i don't see any explicit comments about the switch | 16:55 |
*** gcutrini has joined #zuul | 16:56 | |
fungi | this is the first patchset for it i've reviewed. it had rather numerous revisions so i'll admit i didn't dig into them all | 16:56 |
fungi | definitely a good question | 16:56 |
corvus | fungi: ack, i pinged you cause i thought you might have the extra context. i can wait for quiquell|off | 16:56 |
fungi | yeah, i unfortunately do not but am eager to learn the reason | 16:57 |
fungi | sorry | 16:57 |
corvus | i think test_jinja2_message_raw supports that supposition | 16:57 |
corvus | it's a unittest with {% endraw %} in the commit message | 16:57 |
clarkb | for more data handling PR events takes ~2 seconds on ansible | 16:58 |
*** sdake has quit IRC | 16:58 | |
clarkb | so ya ignoring status events could be a big win | 16:58 |
corvus | clarkb: interesting, aside from looking up the PR, the status event takes 5 seconds | 16:58 |
corvus | clarkb: so even the other parts of that event are slower than a pr event... | 16:59 |
corvus | i guess the other things it's doing may be worth looking into as well | 16:59 |
*** jamesmcarthur has joined #zuul | 17:00 | |
*** rlandy|afk is now known as rlandy | 17:00 | |
clarkb | http://paste.openstack.org/show/745027/ is the pr event I was looking at | 17:01 |
clarkb | https://developer.github.com/v3/activity/events/types/#statusevent status event details. I was a big confused about this yesterday but rereading it it is scoped to CI status results | 17:01 |
clarkb | We likely can ignore those in our app at least for now | 17:02 |
clarkb | since we don't gate or trigger on the results of CI systems currently | 17:02 |
corvus | clarkb: but to change the app permissions may mean asking everyone who has installed it to update; that's something i'd like to avoid. | 17:03 |
clarkb | ah | 17:03 |
corvus | (perhaps asking for fewer perms doesn't require any intervention, but if we fix this and re-add it later, i'm almost certain it would) | 17:04 |
clarkb | yup | 17:04 |
corvus | clarkb: do you have the github3.py logs for the 90 seconds it was doing the sha->pr mapping? | 17:06 |
clarkb | corvus: not yet but I'm sure I can dig that up | 17:08 |
clarkb | til awk can set marks and print text between matching patterns | 17:11 |
clarkb | why did I learn sed instaed of awk | 17:11 |
corvus | clarkb: wow. can you paste your awk command when you're done? | 17:12 |
ttx | clarkb: yes we use label/unlabel events on the Kata pipelines | 17:14 |
ttx | but not the status ones :) | 17:15 |
clarkb | corvus: awk '/2019-02-13 15:56:45,615 DEBUG zuul.GithubEventConnector: \[delivery: 8ec60b1e-2f99-11e9-8d5b-7a80f8e7170b\] Handling status event/{flag=1;next}/2019-02-13 15:58:22,125 DEBUG zuul.GithubEventConnector: \[delivery: 8ec60b1e-2f99-11e9-8d5b-7a80f8e7170b\] Finished event processing/{flag=0}flag' debug.log | grep github3 | 17:16 |
clarkb | corvus: flag=1 sets a mark on first match then flag=0 unsets it. And the flag at the end of the command means print if flag is not zero (or something similar) | 17:16 |
clarkb | then I grep all the lines with github3 in it | 17:17 |
clarkb | I don't know if any of the github query logs are sensitive should I just write them to disk for now? or you can run my comamnd above and see them for yourself :) | 17:17 |
corvus | clarkb: i don't think so, but since i'm not sure, i'll look and summarize | 17:18 |
* clarkb is skimming them now and can paste if nothing sensitive is found | 17:18 | |
corvus | clarkb: oh yeah, i think we can just paste that. | 17:20 |
clarkb | ya doing so now | 17:20 |
clarkb | http://paste.openstack.org/show/745028/ | 17:21 |
clarkb | one thing I notice is we seem to find the commit in the first couple of pages of PRs then we continue scanning to the end | 17:21 |
clarkb | (I guess because there can be more than one PR for that commit) | 17:21 |
clarkb | another is that we check our rate limit a lot | 17:21 |
*** jamesmcarthur has quit IRC | 17:22 | |
clarkb | as expected the bulk of the cost is in that PR listing | 17:22 |
clarkb | we also get user data which maybe isn't necessary? (I don't know) | 17:23 |
corvus | clarkb: i *think* we have some other actions in here | 17:23 |
corvus | (from another thread) | 17:23 |
clarkb | oh maybe | 17:24 |
clarkb | I'm not sure how to dedup that | 17:24 |
corvus | it's not super-relevant, i just think there's a report happening at the same time | 17:24 |
corvus | a few of these requests look unrelated and we can ignore them, but they're small anyway | 17:25 |
corvus | clarkb: is it possible your assumption about finding the pr early was based on the simultaneous report happening? | 17:27 |
clarkb | https://github.com/ansible/ansible/pull/52156 is the pull request and https://github.com/ansible/ansible/pull/52156/commits/0fb7693d44b31f867fab1fe7e35fbef7b17dbb14 the commit | 17:28 |
clarkb | corvus: it was based on the 0fb769 POST | 17:28 |
clarkb | whcih ya is unrelated to the page through I think | 17:28 |
corvus | yeah, pretty sure that's the report | 17:28 |
corvus | 52156 doesn't show up until the scan is finished; i don't think we do anything special when we find a pr -- i think much of the data are already there | 17:29 |
clarkb | though the api is being asked to sort by created date descending which should put newer PRs near the top | 17:29 |
corvus | right, so we *could* likely break earlier | 17:29 |
clarkb | 52156 is in the first page on the web ui with that sort | 17:30 |
clarkb | corvus: ya but we'd have to assume a 1:1 PR:commit relationship | 17:30 |
clarkb | which is safe 99% of the time until you hit that 1% set of corner cases jlk and SpamapS were talking about yesterday | 17:30 |
clarkb | stable backport where master and stable/foo haven't moved their branch heads yet so are the same | 17:31 |
clarkb | A force push to another branch (though maybe zuul doesn't care about that) | 17:31 |
*** jamesmcarthur has joined #zuul | 17:34 | |
clarkb | trying to reason about how your idea of an LRU cache for that mapping table would like like in that case. For example if PRA had commit 1 and we cached that, then PRB is created with commit 1 we'll find PRA and not B in the cache and ignore B | 17:34 |
clarkb | so we'd have to curate teh cache based on PR created events | 17:35 |
clarkb | which is maybe good enough? there could be windows (like zuul wasn't running when PRB was created but was when PRA was created) where we get out of sync | 17:35 |
corvus | here's a summary of the requests and the time they took: http://paste.openstack.org/show/745030/ | 17:36 |
corvus | maybe we should drop some of those rate limits :) | 17:36 |
corvus | do we get the PR twice? | 17:37 |
clarkb | hrm does seem like we do | 17:38 |
clarkb | we also search issues and PRs for the PR, do we need to do that when we know a commit is involved (implying a PR not an issue?) | 17:39 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Rework upload-forge role to use module https://review.openstack.org/635941 | 17:39 |
clarkb | re searching by sha1 https://help.github.com/articles/searching-issues-and-pull-requests/#search-by-commit-sha is the api to do that and comes with a much lower api rate limit aiui | 17:39 |
clarkb | https://developer.github.com/v3/search/#rate-limit has rate limit details for search | 17:40 |
corvus | right. i think that's one of the options... | 17:40 |
corvus | i still want to finish digging into the data first | 17:40 |
clarkb | 30 per minute is average 2s per | 17:40 |
clarkb | corvus: ++ | 17:40 |
corvus | i think i understand why we're fetching the PR twice -- i think that's an optimization we can make in the _updateChange method | 17:40 |
corvus | your question about issues/pulls is interesting | 17:41 |
clarkb | corvus: that part of the github api gets weird and we may have to do both for reasons. | 17:41 |
corvus | i'm going to try interpolating the log lines back together to figure out what prompted those requests | 17:41 |
clarkb | basically PRs are issues and you don't get all of the issue related data from the PR side of the api so if you need those details you have to query the issue api iirc | 17:41 |
corvus | ok, both of those (/pulls/52156 and /issues/52156) happened between 'got user' and 'refreshed change' | 17:42 |
corvus | so probably both inside the _updateChange method | 17:43 |
corvus | clarkb: yeah, i don't see zuul calling 2 methods there, so that's probably github3.py internals | 17:47 |
corvus | oh, no it is us | 17:47 |
corvus | but there's a comment explaining why it's necessary | 17:47 |
corvus | https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/driver/github/githubconnection.py#n1114 | 17:47 |
clarkb | ya that would be a case of the leaky models there | 17:48 |
clarkb | https://developer.github.com/v3/pulls/#get-a-single-pull-request does show labels listed in an example response | 17:51 |
clarkb | maybe the reponse from pulls gives us what we need now? | 17:51 |
corvus | jlk, tobiash: ^ any ideas if we can drop the extra call to /issues/ ? | 17:52 |
jlk | hrm, trying to catch scrollback | 17:57 |
jlk | oooh | 17:57 |
jlk | so the /issues/ call is to get the labels? | 17:58 |
jlk | hrm | 17:58 |
jlk | hrm hrm | 17:58 |
corvus | that's what the comment says... | 17:58 |
jlk | yeah | 17:58 |
jlk | but... | 17:58 |
corvus | and i think the code backs it up | 17:58 |
corvus | at least, labels are the only thing we're getting from 'issueobj' | 17:59 |
* jlk looks at github3.py | 17:59 | |
clarkb | its probably a minor thing performance wise, but if we can clean that up and simplify the code it will probably make it easier to optomize the bits that do cost us | 18:00 |
corvus | yeah, i confess neither of these will save us much, but there's no better time now that we see the full sequence | 18:01 |
corvus | and this stuff does add up | 18:01 |
clarkb | ya the impact is largely on the human understanding of the system which does add up (more logs to wade through as well as code to undersatnd) | 18:01 |
clarkb | but also rate limits etc | 18:01 |
jlk | So yeah it does look like github3.py doesn't expose labels as part of the pull request object | 18:02 |
jlk | and it does for an issue object | 18:02 |
jlk | so yes, this appears to be a github3.py bug, which would take a fair amount of work to fix up | 18:03 |
jlk | outside of this, has there been any revelations about what's going on w/ the queue length? | 18:04 |
clarkb | jlk: http://paste.openstack.org/show/745030/ is the semi processed log data | 18:04 |
clarkb | jlk: that is for one ansible status event | 18:04 |
corvus | jlk: well, that's actually great news -- there's an improvement that we can make :) should i file a bug on github3.py? (or would you like to do it and use better words?) | 18:05 |
jlk | oof, so like 4 to 8 seconds for each of those 100 PR grabs | 18:05 |
jlk | corvus: I'm distracted with jury duty today, so feel free :D | 18:05 |
clarkb | it does seem like using the search api even with the lower limit would be an improvement | 18:05 |
corvus | jlk: will do! | 18:06 |
clarkb | we can process 30 status updates a minute with search api, but its a minute and a half for one with PR listing | 18:06 |
jlk | ugghh | 18:06 |
jlk | I don't trust the search API though | 18:06 |
jlk | because it's not necessarily live data | 18:06 |
SpamapS | jlk: I thought this was actually part of the API.. that PR's are just "special" issues. | 18:07 |
*** bhavikdbavishi has joined #zuul | 18:07 | |
clarkb | another approach could be to maintain a PR cache that is curated by PR events | 18:07 |
jlk | SpamapS: yes ... and no | 18:07 |
clarkb | (or maybe we do both to avoid search costs as much as possible) | 18:07 |
jlk | SpamapS: there are different API endpoints, and at some point lables may not have been exposed on the PR endpoint, but they are now | 18:07 |
jlk | clarkb: ooooh | 18:08 |
jlk | If we could thread getting all of those PRs.... :D | 18:08 |
jlk | no reason to do all of those serially | 18:08 |
clarkb | jlk: I don't think you know how many pages to page through unless you do them serially? | 18:08 |
jlk | yeah that's a good point | 18:09 |
jlk | let me do some chatting internally for a moment | 18:09 |
* tobiash is afk this evening | 18:12 | |
*** jpena is now known as jpena|off | 18:13 | |
corvus | jlk: https://github.com/sigmavirus24/github3.py/issues/924 look sane? | 18:15 |
corvus | or, rather, coherent :) | 18:15 |
jlk | that looks good to me! | 18:15 |
openstackgerrit | Jan Kundrát proposed openstack-infra/nodepool master: runc: Use correct interpreter for Ansible https://review.openstack.org/636702 | 18:16 |
openstackgerrit | Jan Kundrát proposed openstack-infra/nodepool master: runc: Support IPv6 https://review.openstack.org/636703 | 18:16 |
openstackgerrit | Jan Kundrát proposed openstack-infra/nodepool master: runc: Use the hypervisor's zuul-console location https://review.openstack.org/636704 | 18:16 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Re-use the github PR object when fetching reviews https://review.openstack.org/636705 | 18:16 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add comment about extra issues request https://review.openstack.org/636706 | 18:17 |
*** sdake has joined #zuul | 18:19 | |
jlk | I have learned that a certain other CI provider uses the search API and it is non-deterministic for them. | 18:23 |
fungi | slowness is preferable over nondeterminism if we have to pick one | 18:25 |
clarkb | fungi: sort of, we don't actually need these events for the openstack instance | 18:25 |
fungi | oh? | 18:25 |
fungi | you mean the status events in particular | 18:26 |
clarkb | fungi: these events are CI status updates and since we don't trigger jobs based on ci results and we aren't gating we can safely ignore them | 18:26 |
clarkb | fungi: yup, they are the events that force us to map commits to PRs | 18:26 |
quiquell|off | corvus, fungi: the jinja review is just to not expand jinja stuff I the comments it break the jobs | 18:26 |
quiquell|off | First approach was just using raw | 18:26 |
fungi | do we trigger jobs for ansible repositories based on status updates? | 18:26 |
fungi | do we foresee wanting to trigger jobs (as a third-party ci) on status updates? | 18:27 |
clarkb | its possible we might wait for first party to succeed before running third party | 18:28 |
clarkb | there are definitrly use casesthere | 18:28 |
jlk | I think you'd only trigger if you were doing a gate, or some kind of 2nd run CI | 18:28 |
*** quiquell|off is now known as quiquell|rover | 18:28 | |
fungi | presumably some github-using project might want to do that with their own zuul deployment though. would we want an option to ignore status updates entirely? or per tenant/per project (if we even get enough context to be able to do that)? | 18:29 |
fungi | but also having it be more efficient would be friendly to high-volume github-using projects who want to run a zuul and have it gate their pull requests for them | 18:30 |
clarkb | fungi ya maybe a per tenant config if we cant fogure out a better functioning approach | 18:30 |
corvus | i would like for this to be a plausible thing for people to use, and therefore would prefer that we find a way to ameliorate this. | 18:30 |
fungi | yes, i concur | 18:30 |
fungi | it does seem like a blocker for, say, ansible if they decided they wanted to run a zuul to gate them | 18:31 |
corvus | i think we had 3 main ideas: search api, cache, paralellization. search api seems to be out of the running. were there any others? | 18:31 |
jlk | I'm still chatting internally | 18:33 |
fungi | that's all i recall. of those cache sounds the least dangerous (but possibly also the least effective?) | 18:33 |
corvus | cool, let's see if jlk comes up with anything else | 18:34 |
jlk | honestly I'm not hopeful. The person I'm talking to recognizes the problem, has pointed to a similar problem with some plan to fix the similar problem (though for GraphQL). | 18:35 |
*** sdake has quit IRC | 18:37 | |
jlk | I think I'm seeing some desire to enhance the /commit/ object within the API, so that a GET on the commit object would include a listing of PRs w/ the commit as the HEAD. SO we could get hook, find hash, get commit object via API, find our PRs | 18:39 |
corvus | that sounds like a good solution | 18:41 |
*** bhavikdbavishi has quit IRC | 18:42 | |
jlk | I don't know the timeline yet though | 18:43 |
jlk | could be months | 18:44 |
fungi | i thought replacing all apis with taco bell^W^Wgraphql was supposed to solve everything | 18:46 |
jlk | sadly the github3.py api doesn't even allow selecting which page to grab, so parallelizing would be some extra work to make URLs on our own | 18:46 |
corvus | clarkb, jlk, fungi: my inclination would be to start with caching -- it's simpler, and it means less work for the cpu. then if we *also* want to do parllel of processing of multiple events, each of those processes will already be faster. | 18:46 |
corvus | (i worry that if we start with threading, we just end up with 100% cpu paging through github prs) | 18:47 |
fungi | yep, i think that's a sane next step | 18:47 |
fungi | caching, aside from arguing over cache invalidation, can't really hurt | 18:47 |
jlk | I can't see in my head how the caching would work | 18:48 |
fungi | also, it might get the performance to the point where it's "good enough for now" while github works to improve their api | 18:48 |
clarkb | jlk: when a PR is created/updated we get the commit info right? we stick that in a cache table. Then when we get status update for commit we check that table first and if not found then do the scan then update cache | 18:49 |
jlk | is it that as you do GETs on PRs we maintain a list of of HEADs to PRs, removing them when no longer open, then when we need PR from HASH we hit the cache first, if we find one yay, if we find more than one boo, and if we find none we resort to the long API calls? | 18:49 |
clarkb | jlk: yup | 18:50 |
jlk | okay. | 18:50 |
corvus | i was about to type the same thing a third way but i stopped :) | 18:50 |
jlk | hahah just wanted to be sure I understood the approach. I feel like this is a classic compsci thing | 18:50 |
fungi | and we can assume the commit info will be immutable (or at least things like commit message, timestamps, author/committer and file contents anyway) | 18:50 |
fungi | because changes to any of those result in a new hash value | 18:51 |
jlk | I don't think we need anything other than the hash | 18:51 |
corvus | right, we really just need sha->pr_number | 18:51 |
jlk | we still have to GET the PR to have up to date info | 18:51 |
fungi | and if the pr gets a new sha associated with it, that gives us a distinct event right? | 18:52 |
jlk | which as a side effect updates the cache | 18:52 |
SpamapS | Hey you know what's good for parallelizing a bunch of on-demand cache-or-slow-backend requests? Gearman. :) | 18:52 |
jlk | fungi: yes, a push event | 18:52 |
* SpamapS shakes fist "You'll never get rid of me and my gearman, mwahahahahaha" | 18:52 | |
fungi | gearman is like a short-order cook | 18:53 |
jlk | one day I'll learn what gearman is. | 18:53 |
* fungi doesn't have a punchline for that analogy | 18:53 | |
SpamapS | A short order cook that clones food when it is ready. | 18:53 |
clarkb | sigmavirus has left a note about original_labels on the PR response | 18:54 |
clarkb | appaerntly that might be a thing we can use from github3 | 18:54 |
SpamapS | In all seriousness, its entire reason for being is to coalesce multiple requesters for hot-but-not-cached items into a single backend slow request thread. | 18:54 |
SpamapS | The fact that geard doesn't actually support that mode of operation, and that gearman is actually useful for other things, is purely an accident. ;) | 18:54 |
corvus | SpamapS: yeah. to be fair, there was no documentation about that mode of operation when i wrote geard :) | 18:55 |
SpamapS | corvus: Some day I'd like to do a talk about assumed/implicit qualities of software and how longevity compromises it. ;) | 18:55 |
corvus | SpamapS: i would like to be an exhibit in that talk :) | 18:56 |
SpamapS | gearman may be the only example tho ;) | 18:56 |
fungi | i am sort of curious what the original use case was which prompted its existence. did it emerge as a need from some existing community? | 18:56 |
corvus | fungi: resizing pictures for livejournal | 18:56 |
SpamapS | fungi: LiveJournal wrote it, and it sat between MySQL and Memcache. | 18:56 |
clarkb | fungi: livejournal or some such service needed to ya that | 18:56 |
quiquell|rover | corvus: ping | 18:56 |
SpamapS | and file storage too, right | 18:56 |
fungi | neat | 18:57 |
fungi | in that case i can understand the multiple requesters bit | 18:57 |
SpamapS | livejournal would go "do we have a thumbnail for uniqueid=012345 in memcached?" no, ok imagemagick and store it and then return it but the magic was that each client who fetched that unique id would be sent to the same gearmand (hash the host list) and then each gearmand would only start one job for every unique id. | 18:58 |
fungi | since you might have a bunch of hits around the same time all waiting on the same thumbnail | 18:58 |
fungi | yeah, pretty awesome | 18:58 |
SpamapS | It completely mitigates thundering herd. | 18:58 |
SpamapS | (unless you have a thundering herd of 100% unique requests) | 18:58 |
SpamapS | But what's really funny, is that nobody thought to *write that down*. | 18:59 |
fungi | sort of sad to see lj passing into obscurity. didn't they originate openid as well? | 18:59 |
SpamapS | So while the unique ID was there in the protocol, and useful for other things, it wasn't actually stated in the docs that gearmand implementations should coalesce. | 18:59 |
*** quiquell|rover is now known as quiquell|off | 19:02 | |
jlk | fwiw I've been told that the API team DOES want to do this change for the REST API and I'll get tagged on the issue when it gets created so I can follow along | 19:03 |
corvus | jlk: does github ever make issues like that public? | 19:04 |
jlk | rarely | 19:04 |
jlk | actually I take that back | 19:04 |
jlk | I honestly don't know. | 19:04 |
jlk | I think there would always be an internal issue that would see most of the updates, and would have all kinds of business info and justification and customer names and stuff that we wouldn't want to share | 19:04 |
jlk | there might be an originating public issue | 19:05 |
jlk | jury duty bbiab | 19:05 |
clarkb | looking at the github3 pulls vs issues thing a bit more we get the full response body, we could deserialize that and grab labels from there I expect? | 19:08 |
SpamapS | yeah I think it wasn't there before, but now it is. | 19:10 |
clarkb | oh hrm maybe thats the markdown formatted commit message/pr header | 19:11 |
clarkb | not the response body | 19:11 |
clarkb | pr._json_data is the raw data but given the _ we should probably avoid accessing it directly | 19:14 |
clarkb | oh but as_dict returns it that way publicly | 19:14 |
clarkb | ok I think that is enough to make a change | 19:15 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Don't request PR issue data https://review.openstack.org/636728 | 19:18 |
clarkb | I think something like ^ may work. | 19:18 |
clarkb | is anyone working on the cache idea? or intending to? I'm not in a great spot for that due to needing to care for infra things as well as not being super familiar with the github API. But I am happy to continue to get logs as needed to help with that | 19:21 |
clarkb | reading up on the rate limit logging it is purely informational in its current state and optional. Basically zuul won't backoff near the rate limit or stop and wait | 19:30 |
clarkb | so openstack's zuul can set that to false to reduce api queries and log noise | 19:30 |
SpamapS | so, if you guys want GitHub to do anything | 19:33 |
SpamapS | I am officially a paying customer | 19:34 |
SpamapS | so I can rattle chains | 19:34 |
SpamapS | or whatever that is. | 19:34 |
SpamapS | I'm sure tobiash can join me and we can amplify our customer-desires. :) | 19:34 |
clarkb | SpamapS: from jlk it sounds like they already recognize this as an issue and there is at least some plans to add an api endpoint to alleviate it | 19:34 |
clarkb | basically add the commit lookup that will return list of PRs api | 19:34 |
SpamapS | Indeed, just making sure we get the appropriate priority on the dev team backlog. :) | 19:35 |
clarkb | https://docs.google.com/document/d/e/2PACX-1vRW41XdoY-HlKdgSzAZ4Dwm-OeAyi1zEkDvwIfw3FsZ5H4yhtPNaZPs-50lgG_BHsWLKixxlrHpske1/pub Gerrit first class support for CI systems just announced | 19:36 |
daniel2 | So I'm working with an old version of nodepool and keep getting this error: https://bpaste.net/show/578c082a3450 Unable to track down the cause, maybe someone can nudge me a little bit? | 19:37 |
clarkb | looks like they will integrate a system to request checks (jobs/buildsets) in gerrit directly so recheck comments can go away | 19:38 |
clarkb | daniel2: paramiko seems to think that your ecdsa private key isn't valid for some reason. I would try sshing with it with command line ssh client and if that works maybe it is a bug in paramiko? You can try an rsa key instead (we know those work as it is what we use) | 19:39 |
clarkb | the initial pass of first class CI support in Gerrit won't support cross project CI systems :/ | 19:40 |
daniel2 | They are all RSA keys, and I even regenerated them all, I have no idea what key its actually looking at. | 19:42 |
clarkb | also the events are basically a "hey there might be work for you to poll on now" events which is an interesting design | 19:43 |
clarkb | daniel2: that is the code that verifies the booted instance is up and running properly iirc. So it is going to try and use the default key of whatever user is running nodepool to ssh into the isntance. Maybe the daemon is running under an unexpected user and finding unexpected keys? | 19:44 |
clarkb | also is there an ssh agent that might be injected unexpected key(s)? | 19:46 |
*** smyers has quit IRC | 19:47 | |
*** smyers has joined #zuul | 19:48 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Don't request PR issue data https://review.openstack.org/636728 | 19:53 |
*** jamesmcarthur has quit IRC | 19:55 | |
fungi | daniel2: the "SSHException: not a valid EC private key file" message is misleading. in actuality it tries a variety of means of identifying keys with which to attempt authentication, and is merely mentioning the last type it attempted to use | 19:57 |
daniel2 | oh okay thanks! | 19:58 |
fungi | usual causes are that the username it's trying to use isn't valid for the remote host, the private key doesn't match a public key authorized for that user on the remote host, or (and this one is weird) that it found a local copy of the public key which doesn't actually correspond to the private key present | 19:59 |
SpamapS | btw.. replacing CircleCI badges in our code.. I wonder if anybody could do an actual pro job of making this: http://spamaps.org/files/zuul-gated-badge.png | 20:00 |
fungi | that looks fairly professional to me! | 20:00 |
SpamapS | It's 20 minutes of gimping | 20:00 |
clarkb | I can pass that along to james (he did the original logo work) | 20:01 |
corvus | SpamapS: there's https://zuul-ci.org/docs/zuul/user/badges.html | 20:02 |
clarkb | oh nevermind its done | 20:02 |
corvus | if folks like the logo better, we should update that | 20:02 |
clarkb | just without the logo | 20:02 |
SpamapS | perfect | 20:02 |
SpamapS | no I'm so happy somebody did it already | 20:02 |
SpamapS | and I like that it's blue | 20:02 |
SpamapS | green is kind of snarky ;) | 20:02 |
corvus | i believe that shade of blue is even our secondary logo color | 20:03 |
*** panda is now known as panda|off | 20:04 | |
corvus | and since we host it on our site, someday we might be able to do analytics on it (jamesmacarthur had a suggestion about anonymous analytics we might employ, but i haven't had a chance to look at it yet) | 20:04 |
*** jamesmcarthur has joined #zuul | 20:07 | |
SpamapS | corvus: Yeah that would be cool, and should be encouraged. If nothing else, it would be cool if that badge page had a little counter showing unique referrers in the last 30 days or something. | 20:08 |
corvus | maybe we can ask openstack if they'd like some more badges. | 20:09 |
SpamapS | It might make sense to make it a backend API call on the website too, so it can become dynamic. | 20:09 |
corvus | clarkb, fungi: did one of you want to work on implementing that cache? | 20:10 |
SpamapS | I still think it would be fun to have a "It has been # days since the build broke." and just have it count the days since the first time we saw that referrer. | 20:10 |
corvus | SpamapS: haha :) | 20:10 |
clarkb | corvus: I noted above that I don't think I have the time or github api knowledge to do it, but am happy to work with whoever does work on it to get logs and similar data from openstack installation | 20:11 |
*** jamesmcarthur has quit IRC | 20:12 | |
corvus | clarkb: ok. i don't think much github api knowledge is required (i don't expect any github api changes), and i think it can be written and satisfactorily tested entirely with the current unit test. but no dispute about "not enough time". :) | 20:12 |
clarkb | corvus: probably need enough knowledge to know when to update the cache? definitely something that can be learned from code and docs reading | 20:13 |
corvus | clarkb: i think it's safe to update it every time we get a pr | 20:13 |
clarkb | speaking of docs reading. The Gerrit first class CI thing is interesting. It appears that they are trying to get away from needing the ssh event system as a requirement for CI with this design | 20:14 |
clarkb | basically you can poll the checks api without the ssh checks trigger events | 20:14 |
clarkb | but still receive the ssh checks trigger events to react without needing to always poll | 20:15 |
clarkb | corvus: re update every time we get a PR rgr. In that case maybe I will have time | 20:16 |
*** jamesmcarthur has joined #zuul | 20:16 | |
fungi | i'm open to helping write the caching implementation, but am not familiar enough with zuul internals to know what existing cache solutions should be copied from (pretty sure there are some though, we cache gerrit change query results, right?) | 20:17 |
clarkb | fungi: the github driver has a _change_cache | 20:18 |
clarkb | whcih is pretty simple. but its caching a derivative set of data (PRs mangled into "changes") | 20:18 |
fungi | oh, so some of the primitives there in the driver could presumably be reused at least? | 20:20 |
fungi | looking | 20:21 |
clarkb | I am not sure how reuseable it is | 20:21 |
clarkb | its a dict of all the "changes" currently in pipelines | 20:21 |
fungi | yeah, i suppose it is just an associative array plus the maintainCache() method | 20:22 |
fungi | we try to get() from the dict, and if the result evaluates false we query normally and set the result in the cache | 20:24 |
fungi | looks like we also take exceptions in _updateChange() as a sign to remove entries from the cache | 20:25 |
SpamapS | yeah it should be relatively straight forward to wrap the PR fetching in cache decorators, and then invalidate/update the cache based on events we get. | 20:26 |
fungi | so figuring out what circumstances necessitate removal from the pr cache is i suppose the most complex bit | 20:26 |
SpamapS | might even be useful to use dogpile for it, since the cache could get rather large, and it would be useful to be able to farm it out to memcached. | 20:26 |
fungi | well, we maintain change caches in our other connection drivers too and those probably get fairly large as well i'd imagine | 20:27 |
fungi | not that i disagree about externalizing this cache, just that there seem to also be other caches in the scheduler which could presumably benefit from the same effort | 20:28 |
fungi | also just using an associative array in memory like we're doing for _change_cache initially is probably simpler to prove efficacy of the idea | 20:29 |
*** jamesmcarthur has quit IRC | 20:35 | |
*** sanjayu__ has quit IRC | 20:46 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: First draft PR caching by commit sha https://review.openstack.org/636751 | 20:50 |
clarkb | that is super rough | 20:51 |
clarkb | and doesn't try to do anything fancy. More of a direct outline of how we'd want to maintain sucha cache I think | 20:51 |
fungi | thanks, i'll see if it does what i expect | 20:51 |
pabelanger | Will the recent discussion around github performance block a potential release of zuul this week? | 21:02 |
clarkb | pabelanger: the issue has been there all along so probably not | 21:03 |
pabelanger | okay, neat! | 21:05 |
clarkb | we're just better at noticing now | 21:05 |
pabelanger | I also won't mind if we tagged a new nodepool release (with https://review.openstack.org/636393/) but understand if we don't want with people out | 21:05 |
SpamapS | a new nodepool release deserves some noise, since it will be the first one with the aws driver. | 21:11 |
corvus | there's a lot of new stuff in play that we haven't had a chance to exercise in production yet, so i'm not optimistic about a zuul release | 21:11 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: First draft PR caching by commit sha https://review.openstack.org/636751 | 21:12 |
*** jamesmcarthur has joined #zuul | 21:25 | |
*** jamesmcarthur has quit IRC | 21:30 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: alternative github sha cache https://review.openstack.org/636764 | 21:47 |
corvus | clarkb: ^ i don't think we have to care about events -- since _updateChange gets called at the end of every event, this will ensure that after every event involving a PR, we update the cache. | 21:48 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Re-use the github PR object when fetching reviews https://review.openstack.org/636705 | 21:56 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add comment about extra issues request https://review.openstack.org/636706 | 21:56 |
clarkb | oh I see, ya | 22:03 |
clarkb | corvus: I left a few notes on your change based on what I had learned writing my change | 22:11 |
corvus | clarkb: thx, replied | 22:20 |
clarkb | oh derp msised the exception thrower that already existed | 22:22 |
*** openstackgerrit has quit IRC | 22:22 | |
*** pwhalen has quit IRC | 22:32 | |
*** gouthamr has quit IRC | 22:35 | |
*** gouthamr has joined #zuul | 22:36 | |
*** pwhalen has joined #zuul | 22:38 | |
*** jamesmcarthur has joined #zuul | 22:43 | |
*** jamesmcarthur has quit IRC | 22:48 | |
*** jamesmcarthur has joined #zuul | 22:51 | |
*** jamesmcarthur has quit IRC | 23:07 | |
*** jamesmcarthur has joined #zuul | 23:27 | |
*** jamesmcarthur has quit IRC | 23:33 | |
clarkb | corvus: https://review.openstack.org/#/c/636705/2 still fails tests, I think I tracked down why (details in comments) | 23:45 |
*** jamesmcarthur has joined #zuul | 23:49 | |
*** jamesmcarthur has quit IRC | 23:53 | |
*** jamesmcarthur has joined #zuul | 23:56 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!