Wednesday, 2019-02-13

tristanC	corvus: indeed, it seems like the limit is somehow applied to the number of build?	00:02
corvus	tristanC: i suspect it's that join to the build table	00:03
corvus	tristanC: i'm still skeptical that we need to query by build name... so i think the easiest/best solution would be to drop that	00:04
tristanC	corvus: it should work though, but your suggestion works for me too	00:05
SpamapS	jlk: fascinating. I used to rely on the status triggers though, so that I could require a Jenkins check, but if we failed ot merge because jenkins was lagging, we had a trigger source of the jenkins status check, to kick the gate off again.	00:06
corvus	tristanC: yeah, i'm sure we can make it work, but unless we really think querying by job is important, this will make the code+query simpler and more efficient.	00:06
SpamapS	but seems like a corner case	00:06
corvus	tristanC: the main thing here is to get SpamapS his buildset status page ;)	00:06
jlk	SpamapS: hrm. If Jenkins was using the checkAPI though I think that would still work	00:06
SpamapS	corvus: yes! :) actually lately I've been pointing at builds with a change filter, and that has made my users happier.	00:07
jlk	SpamapS: because there would be a check suite for Jenkins, and you'd get check suite events saying Jenkins started or whatnot, and you could query to see if all check suites were complete.	00:07
SpamapS	but a buildset would be ideal.	00:07
tristanC	corvus: can you please revisit your -2 on https://review.openstack.org/573473 , the last PS correctly handle multi-parent jobs	00:07
SpamapS	jlk: ah yeah, that's the answer than.	00:07
jlk	a built set == all the jobs done for a particular change?	00:07
SpamapS	jlk: correct (but don't get confused, corvus and tristanC are talking about other things. ;)	00:08
corvus	tristanC: awesome! i will do that first thing tomorrow, i have to EOD now	00:08
SpamapS	buildset would actually map pretty well to the check run though.	00:08
jlk	SpamapS: right, but I'm just thinking that checks API solves that too.	00:08
jlk	each job is represented as a check in the check run	00:08
SpamapS	ya	00:09
tristanC	corvus: and your -2 on https://review.openstack.org/555153 too please. Please explain why a job filter is a no-go.	00:09
*** jamesmcarthur has joined #zuul		00:10
jlk	er. sorry. Each job would be a check_run in the check_suite	00:10
clarkb	thats the other thing that the check api would give us iirc, better breakdown of individual jobs (though the buildset page in zuul side still a good idea)	00:10
clarkb	is the check api something available in GHE too? as thats another thing to consider with our GHE users	00:11
jlk	I can't recall just yet, but I _think_ so	00:11
jlk	https://developer.github.com/enterprise/2.16/v3/checks/	00:11
jlk	It was beta in 2.14, GA in 2.15	00:12
clarkb	cool so more than one release even	00:13
SpamapS	ya the features seem to flow into GHE at a pretty impressive pace	00:14
*** jamesmcarthur has quit IRC		00:16
jlk	... it may be a while before the Actions API is available there.	00:16
SpamapS	meh ;)	00:16
*** manjeets has quit IRC		00:18
*** manjeets has joined #zuul		00:19
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: remove build and job_name filter from the buildset route https://review.openstack.org/636504	00:22
*** sdake has quit IRC		00:26
*** sdake has joined #zuul		00:47
*** openstackgerrit has quit IRC		00:52
*** jamesmcarthur has joined #zuul		00:53
*** jamesmcarthur has quit IRC		00:54
*** jamesmcarthur has joined #zuul		01:04
jhesketh	corvus: did I miss the discussion about zuul at the PTG? What was the reason for not having an official room if you don't mind me asking?	01:10
*** sdake has quit IRC		01:14
*** sdake has joined #zuul		01:17
*** sdake has quit IRC		01:29
*** jamesmcarthur has quit IRC		01:30
*** krasmussen has quit IRC		01:30
*** sdake has joined #zuul		01:33
*** ruffian_sheep has joined #zuul		01:33
*** jamesmcarthur has joined #zuul		01:33
ruffian_sheep	Hi,guys.I want to use zuul v3 to build a third party ci for cinder of openstack.	01:34
ruffian_sheep	I build it during the link:https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html	01:35
ruffian_sheep	Now,there has some problem I don't know tow to sovle.It showed me :zuul-executor: ConnectionRefusedError: [Errno 111] Connection refused	01:36
ruffian_sheep	Is it related to the setting of nodepool-static?	01:37
*** sdake has quit IRC		01:39
SpamapS	ruffian_sheep: zuul-executor only connects to gearman.	01:43
SpamapS	ruffian_sheep: which should be running/open on the scheduler.	01:43
ruffian_sheep	SpamapS:Get it ,but there is also appeared an error :zuul-scheduler.service stop-sigterm timed out. Killing.When I try to restart it	01:45
ruffian_sheep	SpamapS:What I can think of about the scehduler part is the setting of the zuul.conf file.	01:46
SpamapS	ruffian_sheep: There should be other clues in the zuul-scheduler service output that tells you why it isn't working.	01:47
ruffian_sheep	SpamapS:Is it probably I set the wrong paraments?	01:47
*** jamesmcarthur has quit IRC		01:53
*** jamesmcarthur has joined #zuul		01:54
ruffian_sheep	SpamapS:I cat the file of zuul-scheduler.log.It load the main.yaml and show me zuul.Scheduler: Full reconfiguration complete	01:57
SpamapS	ruffian_sheep: sounds good. But is it listening on 4730 ?	01:58
ruffian_sheep	SpamapS:I think yes.I didn't set the port of gearman_server.During the document ,It should be 4730 in default.	02:01
*** jamesmcarthur has quit IRC		02:05
SpamapS	ruffian_sheep: ok, I have to go AFK now, but check to make sure executor can reach the scheduler port 4730	02:05
*** jamesmcarthur has joined #zuul		02:16
ruffian_sheep	SpamapS:OK,thx dude.	02:19
*** jamesmcarthur has quit IRC		02:28
*** bjackman has joined #zuul		02:40
*** saneax has joined #zuul		02:45
*** jamesmcarthur has joined #zuul		02:59
ruffian_sheep	I make sure web can can reach the scheduler port 4730.zuul 7363 0.1 7.4 1011516 75104 ? Ssl 10:00 0:03 /usr/bin/python3.5 /usr/bin/zuul-web	03:03
ruffian_sheep	tcp 0 0 127.0.0.1:34644 127.0.0.1:4730 ESTABLISHED 7363/python3.5	03:04
*** jamesmcarthur has quit IRC		03:06
ruffian_sheep	But I don't why I cannot start the executor.	03:06
*** sdake has joined #zuul		03:07
ruffian_sheep	What I set in the zuul.conf is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa	03:07
*** ruffian_sheep has quit IRC		03:16
*** ruffian_sheep has joined #zuul		03:23
*** rlandy has quit IRC		03:24
*** jamesmcarthur has joined #zuul		03:26
*** jamesmcarthur has quit IRC		03:30
*** bjackman has quit IRC		03:45
*** spsurya has joined #zuul		03:55
*** bjackman has joined #zuul		03:55
*** sdake has quit IRC		04:15
*** openstackgerrit has joined #zuul		04:24
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041	04:24
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078	04:24
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079	04:25
tristanC	corvus: topic:web-build should be good for review now	04:26
tristanC	SpamapS: if you could have look too ^	04:26
*** bjackman has quit IRC		04:54
*** bjackman has joined #zuul		04:57
*** sanjayu_ has joined #zuul		05:02
*** saneax has quit IRC		05:04
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078	05:08
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079	05:08
ruffian_sheep	tristanC:Hi,tristanC.Do you know how to deal with the problem?	06:07
tristanC	ruffian_sheep: hum, what is the problem again?	06:12
ruffian_sheep	tristanC:I want to use zuul v3 to build a third party ci for cinder of openstack.It showed me :zuul-executor: ConnectionRefusedError: [Errno 111] Connection refused	06:18
ruffian_sheep	tristanC:SpamapS told me to checked the scheduler.I also found that zuul-scheduler.service stop-sigterm timed out. Killing.When I try to restart it	06:20
ruffian_sheep	tristanC:I cat the file of zuul-scheduler.log.It load the main.yaml and show me zuul.Scheduler: Full reconfiguration complete	06:20
ruffian_sheep	tristanC:I make sure zuul-web can can reach the scheduler port 4730. zuul 7363 0.1 7.4 1011516 75104 ? Ssl 10:00 0:03 /usr/bin/python3.5 /usr/bin/zuul-web	06:21
ruffian_sheep	tristanC:tcp 0 0 127.0.0.1:34644 127.0.0.1:4730 ESTABLISHED 7363/python3.5	06:21
ruffian_sheep	tristanC:But I don't why I cannot start the executor.What I set in the zuul.conf is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa	06:22
tristanC	ruffian_sheep: can the executor reach the scheduler port 4730?	06:22
*** sshnaidm is now known as sshnaidm\|afk		06:22
ruffian_sheep	tristanC：I cannot start the executor now	06:23
ruffian_sheep	tristanC：It appeared that zuul-executor[8845]: ConnectionRefusedError: [Errno 111] Connection refused	06:23
ruffian_sheep	tristanC：I dont know what's wrong with the setting.	06:24
ruffian_sheep	tristanC：I didn't change the setting about executor.It can be restarted before.	06:25
tristanC	ruffian_sheep: please paste the executor logs	06:25
ruffian_sheep	tristanC：zuul.Executor: Starting log streamer and zuul.Executor: Stopping log streamer	06:26
ruffian_sheep	tristanC：I take it from /var/log/zuul/executor.log	06:26
ruffian_sheep	tristanC：Is it ?	06:27
tristanC	ruffian_sheep: yes, the logs before the connection refused error	06:28
*** sdake has joined #zuul		06:28
ruffian_sheep	tristanC：Do you know what the problem is ?Or how to solve it?	06:29
ruffian_sheep	tristanC：I didn't find a way to deal with	06:29
tristanC	ruffian_sheep: it seems like the executor cannot connect to the scheduler gearman service	06:29
ruffian_sheep	tristanC：But what I set the parament about the executor is that [executor] private_key_file=/var/lib/zuul/.ssh/id_rsa	06:34
ruffian_sheep	tristanC：Is something I ignored？	06:34
ruffian_sheep	tristanC：Is the gearman related to nodepool?	06:35
tristanC	ruffian_sheep: iirc, executor pulls gearman config from the [gearman] section	06:36
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: zuul-runner: add quick-start integration test https://review.openstack.org/635701	06:41
ruffian_sheep	tristanC：Almost the same as the executor.In zuul.conf :[gearman] server=127.0.0.1	06:43
tristanC	ruffian_sheep: is the executor running on the same host as the scheduler?	06:44
ruffian_sheep	tristanC：Yes.	06:45
ruffian_sheep	tristanC：I build is during the link:https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html	06:46
tristanC	and is the gearman service running, e.g. is there something listening on port 4730?	06:46
ruffian_sheep	tristanC：Yes.tcp 0 0 0.0.0.0:4730 0.0.0.0:* LISTEN 9638/python3.5	06:47
ruffian_sheep	tristanC：zuul 9638 0.0 6.5 479232 66004 ? Sl 14:45 0:00 /usr/bin/python3.5 /usr/bin/zuul-scheduler	06:48
tristanC	ruffian_sheep: could you paste the full exception from the executor logs, "ConnectionRefusedError: [Errno 111] Connection refused" might be something else	06:48
ruffian_sheep	tristanC：Is it ?Feb 13 14:33:02 ubuntu zuul-executor: Traceback (most recent call last): Feb 13 14:33:02 ubuntu zuul-executor: File "/usr/bin/zuul-executor", line 10, in <module> Feb 13 14:33:02 ubuntu zuul-executor: sys.exit(main()) Feb 13 14:33:02 ubuntu zuul-executor: File "/usr/lib/python3.5/site-packages/zuul/cmd/executor.py", line 121, in main Feb 13 14:33:02 ubuntu zuul-executor: Executor().main() Feb 13 14:33:02 ubuntu	06:49
*** quiquell\|off is now known as quiquell\|rover		06:49
tristanC	ruffian_sheep: use a pasting service please	06:50
*** bjackman has quit IRC		06:50
*** bjackman has joined #zuul		06:51
ruffian_sheep	tristanC：Emmmm....Are you saying that you are going to paste it line by line? I am not very familiar with the use of IRC.	06:51
quiquell\|rover	tristanC: o/	06:52
tristanC	ruffian_sheep: paste the log in http://paste.openstack.org/ and share the resulting url here	06:55
tristanC	quiquell\|rover: hey	06:55
quiquell\|rover	tristanC: good morning sir	06:56
quiquell\|rover	tristanC: so I found a correct way to escape jinja in the zuul.message	06:56
tristanC	quiquell\|rover: nice	06:57
quiquell\|rover	https://review.openstack.org/#/c/633930	06:57
quiquell\|rover	It's here but maybe is too complicated for the value it gives	06:57
quiquell\|rover	If you can take a look would be nice	06:58
tobiash	quiquell\|rover: commented on ^	07:02
quiquell\|rover	tobiash: thanks! So is this a proper way to do it ?	07:04
quiquell\|rover	It's a little ad-hoc	07:05
ruffian_sheep	tristanC：Can you open it? http://paste.openstack.org/show/744995/	07:05
tobiash	quiquell\|rover: I think that makes sense and is reusable if we need it elsewhere	07:05
quiquell\|rover	tobiash: ack	07:06
tristanC	ruffian_sheep: so the connection error is related to /var/lib/zuul/executor.socket, is there a sock file leaked?	07:07
tristanC	perhaps tries to remove the file and restart the service	07:07
*** sdake has quit IRC		07:07
ruffian_sheep	tristanC：I don't quite understand what you mean. What is it about?	07:08
ruffian_sheep	tristanC：I just changed the user and user group of a folder. His feedback has changed, is it related to this? http://paste.openstack.org/show/744996/	07:11
ruffian_sheep	tristanC：I had done it like you said.But it failed again.	07:12
tristanC	ruffian_sheep: for some reasone, the executor process doesn't seems to be able to open the executor.socket file	07:14
ruffian_sheep	tristanC：yes,the executor.socket is belong to root:root before.The messages showed me that http://paste.openstack.org/show/744997/ .I change it.And it turns to be connection refused.	07:19
ruffian_sheep	tristanC：I'm confused what's wrong in the setting.I have no idea,and cann't find some similar case to solve	07:21
tristanC	ruffian_sheep: sorry, i'm not familiar with the documentation you followed, not sure what is the issue here	07:21
quiquell\|rover	tobiash: So at my system it's using cyaml, but still only works if y set yaml.SafeLoader.add_cons ... and the like :-/	07:26
quiquell\|rover	tobiash: so looks like cyaml is still calling something from yaml for this	07:26
*** jamesmcarthur has joined #zuul		07:26
quiquell\|rover	tobiash: I suppose cyaml use the binding but the rest as from yaml	07:27
tobiash	quiquell\|rover: hrm, ok	07:28
quiquell\|rover	tobiash: so have to be as it's	07:28
quiquell\|rover	tobiash: yaml.Safe... add_constructor there :-) will add a comment	07:28
tobiash	quiquell\|rover: k	07:28
*** swest has joined #zuul		07:30
*** jamesmcarthur has quit IRC		07:31
*** swest has quit IRC		07:34
ruffian_sheep	tristanC:ok,thx dude!	07:35
openstackgerrit	Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930	07:44
quiquell\|rover	tobiash, tristanC: Added comment	07:44
quiquell\|rover	btw https://github.com/yaml/pyyaml/blob/4c2e993321ad29a02a61c4818f3cef9229219003/lib/yaml/cyaml.py#L49	07:44
quiquell\|rover	Looks like the representer is from yaml	07:44
quiquell\|rover	Humm we can call parent class from the multiple inheritance	07:47
openstackgerrit	Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930	07:47
quiquell\|rover	Calling SafeReprenter works for cyaml and yaml	07:50
*** gtema has joined #zuul		07:55
quiquell\|rover	nah not working	08:01
openstackgerrit	Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930	08:28
quiquell\|rover	tobiash, tristanC: Have just leave a comment ^, let me know if something else is missing, sorry about the harassing	08:29
openstackgerrit	Quique Llorente proposed openstack-infra/zuul master: Mark as unsafe commit message at inventory https://review.openstack.org/633930	08:42
*** dmsimard has quit IRC		08:44
*** dmsimard has joined #zuul		08:45
*** jpena\|off is now known as jpena		08:54
openstackgerrit	Felix Schmidt proposed openstack-infra/zuul master: Retrieve full list of jobs with details per tenant via API https://review.openstack.org/635714	09:03
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildsets page https://review.openstack.org/630041	09:42
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/buildset/{uuid} route https://review.openstack.org/630078	09:42
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: web: add buildset page https://review.openstack.org/630079	09:42
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335	09:46
*** sshnaidm\|afk is now known as sshnaidm		09:59
*** ruffian_sheep has quit IRC		09:59
*** fdegir has joined #zuul		10:10
*** swest has joined #zuul		10:52
quiquell\|rover	tobiash: zuul passing now, we can merge it https://review.openstack.org/#/c/633930/	11:22
quiquell\|rover	tristanC: ^	11:22
*** gtema has quit IRC		11:52
*** panda is now known as panda\|lunch		12:00
*** gtema has joined #zuul		12:01
*** sdake has joined #zuul		12:04
*** sdake has quit IRC		12:16
*** jpena is now known as jpena\|lunch		12:34
*** bjackman_ has joined #zuul		12:53
*** rlandy has joined #zuul		12:55
*** bjackman has quit IRC		12:56
*** bjackman__ has joined #zuul		12:57
*** bjackman_ has quit IRC		13:00
*** panda\|lunch is now known as panda		13:12
*** jamesmcarthur has joined #zuul		13:14
*** sanjayu_ has quit IRC		13:19
*** bjackman__ has quit IRC		13:19
*** swest has quit IRC		13:19
*** sanjayu_ has joined #zuul		13:19
*** swest has joined #zuul		13:20
*** jamesmcarthur has quit IRC		13:29
*** swest has quit IRC		13:33
*** sanjayu__ has joined #zuul		13:36
*** jpena\|lunch is now known as jpena		13:37
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335	13:39
*** sanjayu_ has quit IRC		13:39
*** jamesmcarthur has joined #zuul		13:50
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: executor: add log_stream_port and log_stream_file settings https://review.openstack.org/535538	13:55
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul master: executor: add log_stream_port and log_stream_file settings https://review.openstack.org/535538	13:55
*** sdake has joined #zuul		14:08
*** sdake has quit IRC		14:09
*** sdake has joined #zuul		14:10
*** sdake has quit IRC		14:13
*** sdake has joined #zuul		14:14
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool master: Implement a Runc driver https://review.openstack.org/535556	14:15
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider https://review.openstack.org/590335	14:17
*** rfolco has quit IRC		14:18
*** pwhalen has quit IRC		14:21
*** pwhalen has joined #zuul		14:22
*** sdake has quit IRC		14:23
*** sanjayu__ has quit IRC		14:26
*** sanjayu__ has joined #zuul		14:26
*** jamesmcarthur has quit IRC		14:26
*** rlandy is now known as rlandy\|afk		14:26
*** rfolco has joined #zuul		14:37
jkt	tristanC: thanks for your very fast runc followup	14:49
jkt	I wonder why that log streaming goes over a static port in the first place; Ansible has ssh multiplexing, so why not launch another command which `tail -f` from that well-known file location on the build node?	14:51
*** gtema has quit IRC		14:51
*** jamesmcarthur has joined #zuul		14:55
mordred	jkt: we read the stream in the callback plugin which isn't in a position to spawn additional commands. I've got a half-finished replacement for the whole thing which gets rid of the file too - but it keeps being lower priority than other things to finish	15:30
mordred	although I need to check out bcoca's work upstream to allow modules to return intermediary results to see if we can use it instead	15:31
*** quiquell\|rover is now known as quiquell\|off		15:48
clarkb	http://paste.openstack.org/show/745021/ github processing data	16:05
clarkb	that was a recent event that was finished. It took about a minute and a half to process once it was its turn and we do seem to have a growing queue	16:05
*** sdake has joined #zuul		16:06
jkt	mordred: ack, thanks, looking forward to having it replaced :)	16:07
*** ianychoi has quit IRC		16:08
clarkb	I think that confirms our suspicions that we grow a queue due to expensive serialized processing. I do note that it appears to take ~7 seconds to handle events from less big projects too so while ansible is really slow its not incredibly fast on the cheaper projects	16:08
openstackgerrit	Matthieu Huin proposed openstack-infra/zuul master: CLI: fail if trying to enqueue/dequeue a change for the wrong project https://review.openstack.org/636662	16:11
tobiash	clarkb: that queue length is impressive...	16:38
tobiash	clarkb: maybe we should go the search route while knowing that the api limit can be a problem. But to fill that rate limit we need to be faster than two seconds per event anyway and in addition have a long queue in the same org	16:41
clarkb	ya it does make me wonder if we can reduce the number/type of events we listen for	16:42
clarkb	I'm not familiar enough with how zuul plugs into workflows to understand if that is viable, but reducing total number of events gives us more breathing room on time to process an event	16:42
*** sdake has quit IRC		16:46
*** sdake has joined #zuul		16:46
tobiash	clarkb: do you gate on any github repo atm?	16:46
tobiash	clarkb: if not, you cound temporarily remove the status events	16:47
clarkb	tobiash: we don't but I think ttx is using labels on kata for something	16:47
clarkb	ttx: ^ maybe you can confirm	16:47
clarkb	but maybe we don't use them on ansible and that is the biggest traffic generator?	16:47
tobiash	label events should be even something different	16:47
clarkb	ah	16:47
tobiash	clarkb: the event subscription is configured at app level	16:48
tobiash	not per repo	16:48
clarkb	oh I thought it was per installation?	16:48
tobiash	afaik not	16:48
clarkb	I guess the app defines what it needs and you either accept or deny that on installation?	16:48
tobiash	yepp	16:48
tobiash	that's how it's supposed to work	16:48
*** jamesmcarthur has quit IRC		16:51
corvus	quiquell\|off, fungi: why use !unsafe instead of "{% raw %}" ?	16:51
corvus	is it for the double-escape case? where the message itself contains endraw?	16:52
fungi	it's possible i misunderstand the point of that change. i thought it was avoiding performing any jinja2 substitutions at all on commit messages by marking them unsafe?	16:54
corvus	fungi: yes, but somewhere along its 28 patchset history, it changed from using raw/endraw to using !unsafe, and i'd like to make sure i fully understand why	16:55
corvus	i don't see any explicit comments about the switch	16:55
*** gcutrini has joined #zuul		16:56
fungi	this is the first patchset for it i've reviewed. it had rather numerous revisions so i'll admit i didn't dig into them all	16:56
fungi	definitely a good question	16:56
corvus	fungi: ack, i pinged you cause i thought you might have the extra context. i can wait for quiquell\|off	16:56
fungi	yeah, i unfortunately do not but am eager to learn the reason	16:57
fungi	sorry	16:57
corvus	i think test_jinja2_message_raw supports that supposition	16:57
corvus	it's a unittest with {% endraw %} in the commit message	16:57
clarkb	for more data handling PR events takes ~2 seconds on ansible	16:58
*** sdake has quit IRC		16:58
clarkb	so ya ignoring status events could be a big win	16:58
corvus	clarkb: interesting, aside from looking up the PR, the status event takes 5 seconds	16:58
corvus	clarkb: so even the other parts of that event are slower than a pr event...	16:59
corvus	i guess the other things it's doing may be worth looking into as well	16:59
*** jamesmcarthur has joined #zuul		17:00
*** rlandy\|afk is now known as rlandy		17:00
clarkb	http://paste.openstack.org/show/745027/ is the pr event I was looking at	17:01
clarkb	https://developer.github.com/v3/activity/events/types/#statusevent status event details. I was a big confused about this yesterday but rereading it it is scoped to CI status results	17:01
clarkb	We likely can ignore those in our app at least for now	17:02
clarkb	since we don't gate or trigger on the results of CI systems currently	17:02
corvus	clarkb: but to change the app permissions may mean asking everyone who has installed it to update; that's something i'd like to avoid.	17:03
clarkb	ah	17:03
corvus	(perhaps asking for fewer perms doesn't require any intervention, but if we fix this and re-add it later, i'm almost certain it would)	17:04
clarkb	yup	17:04
corvus	clarkb: do you have the github3.py logs for the 90 seconds it was doing the sha->pr mapping?	17:06
clarkb	corvus: not yet but I'm sure I can dig that up	17:08
clarkb	til awk can set marks and print text between matching patterns	17:11
clarkb	why did I learn sed instaed of awk	17:11
corvus	clarkb: wow. can you paste your awk command when you're done?	17:12
ttx	clarkb: yes we use label/unlabel events on the Kata pipelines	17:14
ttx	but not the status ones :)	17:15
clarkb	corvus: awk '/2019-02-13 15:56:45,615 DEBUG zuul.GithubEventConnector: \[delivery: 8ec60b1e-2f99-11e9-8d5b-7a80f8e7170b\] Handling status event/{flag=1;next}/2019-02-13 15:58:22,125 DEBUG zuul.GithubEventConnector: \[delivery: 8ec60b1e-2f99-11e9-8d5b-7a80f8e7170b\] Finished event processing/{flag=0}flag' debug.log \| grep github3	17:16
clarkb	corvus: flag=1 sets a mark on first match then flag=0 unsets it. And the flag at the end of the command means print if flag is not zero (or something similar)	17:16
clarkb	then I grep all the lines with github3 in it	17:17
clarkb	I don't know if any of the github query logs are sensitive should I just write them to disk for now? or you can run my comamnd above and see them for yourself :)	17:17
corvus	clarkb: i don't think so, but since i'm not sure, i'll look and summarize	17:18
* clarkb is skimming them now and can paste if nothing sensitive is found		17:18
corvus	clarkb: oh yeah, i think we can just paste that.	17:20
clarkb	ya doing so now	17:20
clarkb	http://paste.openstack.org/show/745028/	17:21
clarkb	one thing I notice is we seem to find the commit in the first couple of pages of PRs then we continue scanning to the end	17:21
clarkb	(I guess because there can be more than one PR for that commit)	17:21
clarkb	another is that we check our rate limit a lot	17:21
*** jamesmcarthur has quit IRC		17:22
clarkb	as expected the bulk of the cost is in that PR listing	17:22
clarkb	we also get user data which maybe isn't necessary? (I don't know)	17:23
corvus	clarkb: i think we have some other actions in here	17:23
corvus	(from another thread)	17:23
clarkb	oh maybe	17:24
clarkb	I'm not sure how to dedup that	17:24
corvus	it's not super-relevant, i just think there's a report happening at the same time	17:24
corvus	a few of these requests look unrelated and we can ignore them, but they're small anyway	17:25
corvus	clarkb: is it possible your assumption about finding the pr early was based on the simultaneous report happening?	17:27
clarkb	https://github.com/ansible/ansible/pull/52156 is the pull request and https://github.com/ansible/ansible/pull/52156/commits/0fb7693d44b31f867fab1fe7e35fbef7b17dbb14 the commit	17:28
clarkb	corvus: it was based on the 0fb769 POST	17:28
clarkb	whcih ya is unrelated to the page through I think	17:28
corvus	yeah, pretty sure that's the report	17:28
corvus	52156 doesn't show up until the scan is finished; i don't think we do anything special when we find a pr -- i think much of the data are already there	17:29
clarkb	though the api is being asked to sort by created date descending which should put newer PRs near the top	17:29
corvus	right, so we could likely break earlier	17:29
clarkb	52156 is in the first page on the web ui with that sort	17:30
clarkb	corvus: ya but we'd have to assume a 1:1 PR:commit relationship	17:30
clarkb	which is safe 99% of the time until you hit that 1% set of corner cases jlk and SpamapS were talking about yesterday	17:30
clarkb	stable backport where master and stable/foo haven't moved their branch heads yet so are the same	17:31
clarkb	A force push to another branch (though maybe zuul doesn't care about that)	17:31
*** jamesmcarthur has joined #zuul		17:34
clarkb	trying to reason about how your idea of an LRU cache for that mapping table would like like in that case. For example if PRA had commit 1 and we cached that, then PRB is created with commit 1 we'll find PRA and not B in the cache and ignore B	17:34
clarkb	so we'd have to curate teh cache based on PR created events	17:35
clarkb	which is maybe good enough? there could be windows (like zuul wasn't running when PRB was created but was when PRA was created) where we get out of sync	17:35
corvus	here's a summary of the requests and the time they took: http://paste.openstack.org/show/745030/	17:36
corvus	maybe we should drop some of those rate limits :)	17:36
corvus	do we get the PR twice?	17:37
clarkb	hrm does seem like we do	17:38
clarkb	we also search issues and PRs for the PR, do we need to do that when we know a commit is involved (implying a PR not an issue?)	17:39
openstackgerrit	Merged openstack-infra/zuul-jobs master: Rework upload-forge role to use module https://review.openstack.org/635941	17:39
clarkb	re searching by sha1 https://help.github.com/articles/searching-issues-and-pull-requests/#search-by-commit-sha is the api to do that and comes with a much lower api rate limit aiui	17:39
clarkb	https://developer.github.com/v3/search/#rate-limit has rate limit details for search	17:40
corvus	right. i think that's one of the options...	17:40
corvus	i still want to finish digging into the data first	17:40
clarkb	30 per minute is average 2s per	17:40
clarkb	corvus: ++	17:40
corvus	i think i understand why we're fetching the PR twice -- i think that's an optimization we can make in the _updateChange method	17:40
corvus	your question about issues/pulls is interesting	17:41
clarkb	corvus: that part of the github api gets weird and we may have to do both for reasons.	17:41
corvus	i'm going to try interpolating the log lines back together to figure out what prompted those requests	17:41
clarkb	basically PRs are issues and you don't get all of the issue related data from the PR side of the api so if you need those details you have to query the issue api iirc	17:41
corvus	ok, both of those (/pulls/52156 and /issues/52156) happened between 'got user' and 'refreshed change'	17:42
corvus	so probably both inside the _updateChange method	17:43
corvus	clarkb: yeah, i don't see zuul calling 2 methods there, so that's probably github3.py internals	17:47
corvus	oh, no it is us	17:47
corvus	but there's a comment explaining why it's necessary	17:47
corvus	https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/driver/github/githubconnection.py#n1114	17:47
clarkb	ya that would be a case of the leaky models there	17:48
clarkb	https://developer.github.com/v3/pulls/#get-a-single-pull-request does show labels listed in an example response	17:51
clarkb	maybe the reponse from pulls gives us what we need now?	17:51
corvus	jlk, tobiash: ^ any ideas if we can drop the extra call to /issues/ ?	17:52
jlk	hrm, trying to catch scrollback	17:57
jlk	oooh	17:57
jlk	so the /issues/ call is to get the labels?	17:58
jlk	hrm	17:58
jlk	hrm hrm	17:58
corvus	that's what the comment says...	17:58
jlk	yeah	17:58
jlk	but...	17:58
corvus	and i think the code backs it up	17:58
corvus	at least, labels are the only thing we're getting from 'issueobj'	17:59
* jlk looks at github3.py		17:59
clarkb	its probably a minor thing performance wise, but if we can clean that up and simplify the code it will probably make it easier to optomize the bits that do cost us	18:00
corvus	yeah, i confess neither of these will save us much, but there's no better time now that we see the full sequence	18:01
corvus	and this stuff does add up	18:01
clarkb	ya the impact is largely on the human understanding of the system which does add up (more logs to wade through as well as code to undersatnd)	18:01
clarkb	but also rate limits etc	18:01
jlk	So yeah it does look like github3.py doesn't expose labels as part of the pull request object	18:02
jlk	and it does for an issue object	18:02
jlk	so yes, this appears to be a github3.py bug, which would take a fair amount of work to fix up	18:03
jlk	outside of this, has there been any revelations about what's going on w/ the queue length?	18:04
clarkb	jlk: http://paste.openstack.org/show/745030/ is the semi processed log data	18:04
clarkb	jlk: that is for one ansible status event	18:04
corvus	jlk: well, that's actually great news -- there's an improvement that we can make :) should i file a bug on github3.py? (or would you like to do it and use better words?)	18:05
jlk	oof, so like 4 to 8 seconds for each of those 100 PR grabs	18:05
jlk	corvus: I'm distracted with jury duty today, so feel free :D	18:05
clarkb	it does seem like using the search api even with the lower limit would be an improvement	18:05
corvus	jlk: will do!	18:06
clarkb	we can process 30 status updates a minute with search api, but its a minute and a half for one with PR listing	18:06
jlk	ugghh	18:06
jlk	I don't trust the search API though	18:06
jlk	because it's not necessarily live data	18:06
SpamapS	jlk: I thought this was actually part of the API.. that PR's are just "special" issues.	18:07
*** bhavikdbavishi has joined #zuul		18:07
clarkb	another approach could be to maintain a PR cache that is curated by PR events	18:07
jlk	SpamapS: yes ... and no	18:07
clarkb	(or maybe we do both to avoid search costs as much as possible)	18:07
jlk	SpamapS: there are different API endpoints, and at some point lables may not have been exposed on the PR endpoint, but they are now	18:07
jlk	clarkb: ooooh	18:08
jlk	If we could thread getting all of those PRs.... :D	18:08
jlk	no reason to do all of those serially	18:08
clarkb	jlk: I don't think you know how many pages to page through unless you do them serially?	18:08
jlk	yeah that's a good point	18:09
jlk	let me do some chatting internally for a moment	18:09
* tobiash is afk this evening		18:12
*** jpena is now known as jpena\|off		18:13
corvus	jlk: https://github.com/sigmavirus24/github3.py/issues/924 look sane?	18:15
corvus	or, rather, coherent :)	18:15
jlk	that looks good to me!	18:15
openstackgerrit	Jan Kundrát proposed openstack-infra/nodepool master: runc: Use correct interpreter for Ansible https://review.openstack.org/636702	18:16
openstackgerrit	Jan Kundrát proposed openstack-infra/nodepool master: runc: Support IPv6 https://review.openstack.org/636703	18:16
openstackgerrit	Jan Kundrát proposed openstack-infra/nodepool master: runc: Use the hypervisor's zuul-console location https://review.openstack.org/636704	18:16
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Re-use the github PR object when fetching reviews https://review.openstack.org/636705	18:16
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Add comment about extra issues request https://review.openstack.org/636706	18:17
*** sdake has joined #zuul		18:19
jlk	I have learned that a certain other CI provider uses the search API and it is non-deterministic for them.	18:23
fungi	slowness is preferable over nondeterminism if we have to pick one	18:25
clarkb	fungi: sort of, we don't actually need these events for the openstack instance	18:25
fungi	oh?	18:25
fungi	you mean the status events in particular	18:26
clarkb	fungi: these events are CI status updates and since we don't trigger jobs based on ci results and we aren't gating we can safely ignore them	18:26
clarkb	fungi: yup, they are the events that force us to map commits to PRs	18:26
quiquell\|off	corvus, fungi: the jinja review is just to not expand jinja stuff I the comments it break the jobs	18:26
quiquell\|off	First approach was just using raw	18:26
fungi	do we trigger jobs for ansible repositories based on status updates?	18:26
fungi	do we foresee wanting to trigger jobs (as a third-party ci) on status updates?	18:27
clarkb	its possible we might wait for first party to succeed before running third party	18:28
clarkb	there are definitrly use casesthere	18:28
jlk	I think you'd only trigger if you were doing a gate, or some kind of 2nd run CI	18:28
*** quiquell\|off is now known as quiquell\|rover		18:28
fungi	presumably some github-using project might want to do that with their own zuul deployment though. would we want an option to ignore status updates entirely? or per tenant/per project (if we even get enough context to be able to do that)?	18:29
fungi	but also having it be more efficient would be friendly to high-volume github-using projects who want to run a zuul and have it gate their pull requests for them	18:30
clarkb	fungi ya maybe a per tenant config if we cant fogure out a better functioning approach	18:30
corvus	i would like for this to be a plausible thing for people to use, and therefore would prefer that we find a way to ameliorate this.	18:30
fungi	yes, i concur	18:30
fungi	it does seem like a blocker for, say, ansible if they decided they wanted to run a zuul to gate them	18:31
corvus	i think we had 3 main ideas: search api, cache, paralellization. search api seems to be out of the running. were there any others?	18:31
jlk	I'm still chatting internally	18:33
fungi	that's all i recall. of those cache sounds the least dangerous (but possibly also the least effective?)	18:33
corvus	cool, let's see if jlk comes up with anything else	18:34
jlk	honestly I'm not hopeful. The person I'm talking to recognizes the problem, has pointed to a similar problem with some plan to fix the similar problem (though for GraphQL).	18:35
*** sdake has quit IRC		18:37
jlk	I think I'm seeing some desire to enhance the /commit/ object within the API, so that a GET on the commit object would include a listing of PRs w/ the commit as the HEAD. SO we could get hook, find hash, get commit object via API, find our PRs	18:39
corvus	that sounds like a good solution	18:41
*** bhavikdbavishi has quit IRC		18:42
jlk	I don't know the timeline yet though	18:43
jlk	could be months	18:44
fungi	i thought replacing all apis with taco bell^W^Wgraphql was supposed to solve everything	18:46
jlk	sadly the github3.py api doesn't even allow selecting which page to grab, so parallelizing would be some extra work to make URLs on our own	18:46
corvus	clarkb, jlk, fungi: my inclination would be to start with caching -- it's simpler, and it means less work for the cpu. then if we also want to do parllel of processing of multiple events, each of those processes will already be faster.	18:46
corvus	(i worry that if we start with threading, we just end up with 100% cpu paging through github prs)	18:47
fungi	yep, i think that's a sane next step	18:47
fungi	caching, aside from arguing over cache invalidation, can't really hurt	18:47
jlk	I can't see in my head how the caching would work	18:48
fungi	also, it might get the performance to the point where it's "good enough for now" while github works to improve their api	18:48
clarkb	jlk: when a PR is created/updated we get the commit info right? we stick that in a cache table. Then when we get status update for commit we check that table first and if not found then do the scan then update cache	18:49
jlk	is it that as you do GETs on PRs we maintain a list of of HEADs to PRs, removing them when no longer open, then when we need PR from HASH we hit the cache first, if we find one yay, if we find more than one boo, and if we find none we resort to the long API calls?	18:49
clarkb	jlk: yup	18:50
jlk	okay.	18:50
corvus	i was about to type the same thing a third way but i stopped :)	18:50
jlk	hahah just wanted to be sure I understood the approach. I feel like this is a classic compsci thing	18:50
fungi	and we can assume the commit info will be immutable (or at least things like commit message, timestamps, author/committer and file contents anyway)	18:50
fungi	because changes to any of those result in a new hash value	18:51
jlk	I don't think we need anything other than the hash	18:51
corvus	right, we really just need sha->pr_number	18:51
jlk	we still have to GET the PR to have up to date info	18:51
fungi	and if the pr gets a new sha associated with it, that gives us a distinct event right?	18:52
jlk	which as a side effect updates the cache	18:52
SpamapS	Hey you know what's good for parallelizing a bunch of on-demand cache-or-slow-backend requests? Gearman. :)	18:52
jlk	fungi: yes, a push event	18:52
* SpamapS shakes fist "You'll never get rid of me and my gearman, mwahahahahaha"		18:52
fungi	gearman is like a short-order cook	18:53
jlk	one day I'll learn what gearman is.	18:53
* fungi doesn't have a punchline for that analogy		18:53
SpamapS	A short order cook that clones food when it is ready.	18:53
clarkb	sigmavirus has left a note about original_labels on the PR response	18:54
clarkb	appaerntly that might be a thing we can use from github3	18:54
SpamapS	In all seriousness, its entire reason for being is to coalesce multiple requesters for hot-but-not-cached items into a single backend slow request thread.	18:54
SpamapS	The fact that geard doesn't actually support that mode of operation, and that gearman is actually useful for other things, is purely an accident. ;)	18:54
corvus	SpamapS: yeah. to be fair, there was no documentation about that mode of operation when i wrote geard :)	18:55
SpamapS	corvus: Some day I'd like to do a talk about assumed/implicit qualities of software and how longevity compromises it. ;)	18:55
corvus	SpamapS: i would like to be an exhibit in that talk :)	18:56
SpamapS	gearman may be the only example tho ;)	18:56
fungi	i am sort of curious what the original use case was which prompted its existence. did it emerge as a need from some existing community?	18:56
corvus	fungi: resizing pictures for livejournal	18:56
SpamapS	fungi: LiveJournal wrote it, and it sat between MySQL and Memcache.	18:56
clarkb	fungi: livejournal or some such service needed to ya that	18:56
quiquell\|rover	corvus: ping	18:56
SpamapS	and file storage too, right	18:56
fungi	neat	18:57
fungi	in that case i can understand the multiple requesters bit	18:57
SpamapS	livejournal would go "do we have a thumbnail for uniqueid=012345 in memcached?" no, ok imagemagick and store it and then return it but the magic was that each client who fetched that unique id would be sent to the same gearmand (hash the host list) and then each gearmand would only start one job for every unique id.	18:58
fungi	since you might have a bunch of hits around the same time all waiting on the same thumbnail	18:58
fungi	yeah, pretty awesome	18:58
SpamapS	It completely mitigates thundering herd.	18:58
SpamapS	(unless you have a thundering herd of 100% unique requests)	18:58
SpamapS	But what's really funny, is that nobody thought to write that down.	18:59
fungi	sort of sad to see lj passing into obscurity. didn't they originate openid as well?	18:59
SpamapS	So while the unique ID was there in the protocol, and useful for other things, it wasn't actually stated in the docs that gearmand implementations should coalesce.	18:59
*** quiquell\|rover is now known as quiquell\|off		19:02
jlk	fwiw I've been told that the API team DOES want to do this change for the REST API and I'll get tagged on the issue when it gets created so I can follow along	19:03
corvus	jlk: does github ever make issues like that public?	19:04
jlk	rarely	19:04
jlk	actually I take that back	19:04
jlk	I honestly don't know.	19:04
jlk	I think there would always be an internal issue that would see most of the updates, and would have all kinds of business info and justification and customer names and stuff that we wouldn't want to share	19:04
jlk	there might be an originating public issue	19:05
jlk	jury duty bbiab	19:05
clarkb	looking at the github3 pulls vs issues thing a bit more we get the full response body, we could deserialize that and grab labels from there I expect?	19:08
SpamapS	yeah I think it wasn't there before, but now it is.	19:10
clarkb	oh hrm maybe thats the markdown formatted commit message/pr header	19:11
clarkb	not the response body	19:11
clarkb	pr._json_data is the raw data but given the _ we should probably avoid accessing it directly	19:14
clarkb	oh but as_dict returns it that way publicly	19:14
clarkb	ok I think that is enough to make a change	19:15
openstackgerrit	Clark Boylan proposed openstack-infra/zuul master: Don't request PR issue data https://review.openstack.org/636728	19:18
clarkb	I think something like ^ may work.	19:18
clarkb	is anyone working on the cache idea? or intending to? I'm not in a great spot for that due to needing to care for infra things as well as not being super familiar with the github API. But I am happy to continue to get logs as needed to help with that	19:21
clarkb	reading up on the rate limit logging it is purely informational in its current state and optional. Basically zuul won't backoff near the rate limit or stop and wait	19:30
clarkb	so openstack's zuul can set that to false to reduce api queries and log noise	19:30
SpamapS	so, if you guys want GitHub to do anything	19:33
SpamapS	I am officially a paying customer	19:34
SpamapS	so I can rattle chains	19:34
SpamapS	or whatever that is.	19:34
SpamapS	I'm sure tobiash can join me and we can amplify our customer-desires. :)	19:34
clarkb	SpamapS: from jlk it sounds like they already recognize this as an issue and there is at least some plans to add an api endpoint to alleviate it	19:34
clarkb	basically add the commit lookup that will return list of PRs api	19:34
SpamapS	Indeed, just making sure we get the appropriate priority on the dev team backlog. :)	19:35
clarkb	https://docs.google.com/document/d/e/2PACX-1vRW41XdoY-HlKdgSzAZ4Dwm-OeAyi1zEkDvwIfw3FsZ5H4yhtPNaZPs-50lgG_BHsWLKixxlrHpske1/pub Gerrit first class support for CI systems just announced	19:36
daniel2	So I'm working with an old version of nodepool and keep getting this error: https://bpaste.net/show/578c082a3450 Unable to track down the cause, maybe someone can nudge me a little bit?	19:37
clarkb	looks like they will integrate a system to request checks (jobs/buildsets) in gerrit directly so recheck comments can go away	19:38
clarkb	daniel2: paramiko seems to think that your ecdsa private key isn't valid for some reason. I would try sshing with it with command line ssh client and if that works maybe it is a bug in paramiko? You can try an rsa key instead (we know those work as it is what we use)	19:39
clarkb	the initial pass of first class CI support in Gerrit won't support cross project CI systems :/	19:40
daniel2	They are all RSA keys, and I even regenerated them all, I have no idea what key its actually looking at.	19:42
clarkb	also the events are basically a "hey there might be work for you to poll on now" events which is an interesting design	19:43
clarkb	daniel2: that is the code that verifies the booted instance is up and running properly iirc. So it is going to try and use the default key of whatever user is running nodepool to ssh into the isntance. Maybe the daemon is running under an unexpected user and finding unexpected keys?	19:44
clarkb	also is there an ssh agent that might be injected unexpected key(s)?	19:46
*** smyers has quit IRC		19:47
*** smyers has joined #zuul		19:48
openstackgerrit	Clark Boylan proposed openstack-infra/zuul master: Don't request PR issue data https://review.openstack.org/636728	19:53
*** jamesmcarthur has quit IRC		19:55
fungi	daniel2: the "SSHException: not a valid EC private key file" message is misleading. in actuality it tries a variety of means of identifying keys with which to attempt authentication, and is merely mentioning the last type it attempted to use	19:57
daniel2	oh okay thanks!	19:58
fungi	usual causes are that the username it's trying to use isn't valid for the remote host, the private key doesn't match a public key authorized for that user on the remote host, or (and this one is weird) that it found a local copy of the public key which doesn't actually correspond to the private key present	19:59
SpamapS	btw.. replacing CircleCI badges in our code.. I wonder if anybody could do an actual pro job of making this: http://spamaps.org/files/zuul-gated-badge.png	20:00
fungi	that looks fairly professional to me!	20:00
SpamapS	It's 20 minutes of gimping	20:00
clarkb	I can pass that along to james (he did the original logo work)	20:01
corvus	SpamapS: there's https://zuul-ci.org/docs/zuul/user/badges.html	20:02
clarkb	oh nevermind its done	20:02
corvus	if folks like the logo better, we should update that	20:02
clarkb	just without the logo	20:02
SpamapS	perfect	20:02
SpamapS	no I'm so happy somebody did it already	20:02
SpamapS	and I like that it's blue	20:02
SpamapS	green is kind of snarky ;)	20:02
corvus	i believe that shade of blue is even our secondary logo color	20:03
*** panda is now known as panda\|off		20:04
corvus	and since we host it on our site, someday we might be able to do analytics on it (jamesmacarthur had a suggestion about anonymous analytics we might employ, but i haven't had a chance to look at it yet)	20:04
*** jamesmcarthur has joined #zuul		20:07
SpamapS	corvus: Yeah that would be cool, and should be encouraged. If nothing else, it would be cool if that badge page had a little counter showing unique referrers in the last 30 days or something.	20:08
corvus	maybe we can ask openstack if they'd like some more badges.	20:09
SpamapS	It might make sense to make it a backend API call on the website too, so it can become dynamic.	20:09
corvus	clarkb, fungi: did one of you want to work on implementing that cache?	20:10
SpamapS	I still think it would be fun to have a "It has been # days since the build broke." and just have it count the days since the first time we saw that referrer.	20:10
corvus	SpamapS: haha :)	20:10
clarkb	corvus: I noted above that I don't think I have the time or github api knowledge to do it, but am happy to work with whoever does work on it to get logs and similar data from openstack installation	20:11
*** jamesmcarthur has quit IRC		20:12
corvus	clarkb: ok. i don't think much github api knowledge is required (i don't expect any github api changes), and i think it can be written and satisfactorily tested entirely with the current unit test. but no dispute about "not enough time". :)	20:12
clarkb	corvus: probably need enough knowledge to know when to update the cache? definitely something that can be learned from code and docs reading	20:13
corvus	clarkb: i think it's safe to update it every time we get a pr	20:13
clarkb	speaking of docs reading. The Gerrit first class CI thing is interesting. It appears that they are trying to get away from needing the ssh event system as a requirement for CI with this design	20:14
clarkb	basically you can poll the checks api without the ssh checks trigger events	20:14
clarkb	but still receive the ssh checks trigger events to react without needing to always poll	20:15
clarkb	corvus: re update every time we get a PR rgr. In that case maybe I will have time	20:16
*** jamesmcarthur has joined #zuul		20:16
fungi	i'm open to helping write the caching implementation, but am not familiar enough with zuul internals to know what existing cache solutions should be copied from (pretty sure there are some though, we cache gerrit change query results, right?)	20:17
clarkb	fungi: the github driver has a _change_cache	20:18
clarkb	whcih is pretty simple. but its caching a derivative set of data (PRs mangled into "changes")	20:18
fungi	oh, so some of the primitives there in the driver could presumably be reused at least?	20:20
fungi	looking	20:21
clarkb	I am not sure how reuseable it is	20:21
clarkb	its a dict of all the "changes" currently in pipelines	20:21
fungi	yeah, i suppose it is just an associative array plus the maintainCache() method	20:22
fungi	we try to get() from the dict, and if the result evaluates false we query normally and set the result in the cache	20:24
fungi	looks like we also take exceptions in _updateChange() as a sign to remove entries from the cache	20:25
SpamapS	yeah it should be relatively straight forward to wrap the PR fetching in cache decorators, and then invalidate/update the cache based on events we get.	20:26
fungi	so figuring out what circumstances necessitate removal from the pr cache is i suppose the most complex bit	20:26
SpamapS	might even be useful to use dogpile for it, since the cache could get rather large, and it would be useful to be able to farm it out to memcached.	20:26
fungi	well, we maintain change caches in our other connection drivers too and those probably get fairly large as well i'd imagine	20:27
fungi	not that i disagree about externalizing this cache, just that there seem to also be other caches in the scheduler which could presumably benefit from the same effort	20:28
fungi	also just using an associative array in memory like we're doing for _change_cache initially is probably simpler to prove efficacy of the idea	20:29
*** jamesmcarthur has quit IRC		20:35
*** sanjayu__ has quit IRC		20:46
openstackgerrit	Clark Boylan proposed openstack-infra/zuul master: First draft PR caching by commit sha https://review.openstack.org/636751	20:50
clarkb	that is super rough	20:51
clarkb	and doesn't try to do anything fancy. More of a direct outline of how we'd want to maintain sucha cache I think	20:51
fungi	thanks, i'll see if it does what i expect	20:51
pabelanger	Will the recent discussion around github performance block a potential release of zuul this week?	21:02
clarkb	pabelanger: the issue has been there all along so probably not	21:03
pabelanger	okay, neat!	21:05
clarkb	we're just better at noticing now	21:05
pabelanger	I also won't mind if we tagged a new nodepool release (with https://review.openstack.org/636393/) but understand if we don't want with people out	21:05
SpamapS	a new nodepool release deserves some noise, since it will be the first one with the aws driver.	21:11
corvus	there's a lot of new stuff in play that we haven't had a chance to exercise in production yet, so i'm not optimistic about a zuul release	21:11
openstackgerrit	Clark Boylan proposed openstack-infra/zuul master: First draft PR caching by commit sha https://review.openstack.org/636751	21:12
*** jamesmcarthur has joined #zuul		21:25
*** jamesmcarthur has quit IRC		21:30
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: WIP: alternative github sha cache https://review.openstack.org/636764	21:47
corvus	clarkb: ^ i don't think we have to care about events -- since _updateChange gets called at the end of every event, this will ensure that after every event involving a PR, we update the cache.	21:48
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Re-use the github PR object when fetching reviews https://review.openstack.org/636705	21:56
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Add comment about extra issues request https://review.openstack.org/636706	21:56
clarkb	oh I see, ya	22:03
clarkb	corvus: I left a few notes on your change based on what I had learned writing my change	22:11
corvus	clarkb: thx, replied	22:20
clarkb	oh derp msised the exception thrower that already existed	22:22
*** openstackgerrit has quit IRC		22:22
*** pwhalen has quit IRC		22:32
*** gouthamr has quit IRC		22:35
*** gouthamr has joined #zuul		22:36
*** pwhalen has joined #zuul		22:38
*** jamesmcarthur has joined #zuul		22:43
*** jamesmcarthur has quit IRC		22:48
*** jamesmcarthur has joined #zuul		22:51
*** jamesmcarthur has quit IRC		23:07
*** jamesmcarthur has joined #zuul		23:27
*** jamesmcarthur has quit IRC		23:33
clarkb	corvus: https://review.openstack.org/#/c/636705/2 still fails tests, I think I tracked down why (details in comments)	23:45
*** jamesmcarthur has joined #zuul		23:49
*** jamesmcarthur has quit IRC		23:53
*** jamesmcarthur has joined #zuul		23:56

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!