Friday, 2017-11-03

*** dkranz has joined #zuul  00:11
*** xinliang has quit IRC  03:43
*** xinliang has joined #zuul  03:55
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x  https://review.openstack.org/517133  06:37
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Reset state on unpaused, declined request  https://review.openstack.org/517417  06:44
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add BaseSource.getProjectReadonly and refactor  https://review.openstack.org/517067  06:55
<openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Do not add invalid projets via the /keys API  https://review.openstack.org/517078  06:55
*** hashar has joined #zuul  08:58
*** electrofelix has joined #zuul  09:38
*** Cibo_ has joined #zuul  10:43
*** Cibo has joined #zuul  10:54
*** Cibo_ has quit IRC  10:56
*** hashar has quit IRC  11:08
*** hashar has joined #zuul  11:13
*** hashar has quit IRC  11:37
*** hashar has joined #zuul  12:19
<dmsimard> Anyone know where I could start troubleshooting three mergers being connected to geard properly but only one of them is picking up work?  13:52
*** jkilpatr has quit IRC  13:52
<dmsimard> it's connected properly ... zuul-merg 10705   zuul    6u  IPv4 26617254      0t0  TCP zm01.review.rdoproject.org:41876->managesf.review.rdoproject.org:4730 (ESTABLISHED)  13:52
<dmsimard> restarted the two not picking up work in debug and that's all I get http://paste.openstack.org/raw/625431/  13:53
<dmsimard> as I mention that, things start rolling again...  14:09
<dmsimard> ¯\_(ツ)_/¯  14:09
*** jkilpatr has joined #zuul  14:12
<openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Switch to threading model of socketserver  https://review.openstack.org/517437  14:45
*** sambetts|afk has quit IRC  14:51
*** jkilpatr has quit IRC  15:01
*** dkranz has quit IRC  15:09
<jeblair> dmsimard: telnet geardserver 4730  15:10
<jeblair> dmsimard: workers  15:10
<jeblair> dmsimard: status  15:11
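
(The same check can be scripted. Below is a minimal sketch assuming Python 3; "workers" and "status" are standard gearman admin-protocol text commands whose replies end with a line containing only ".", and the host name is taken from the lsof output earlier in this log -- adjust it for your deployment.)

    # Query geard's admin protocol over a plain TCP socket instead of telnet.
    import socket

    def gearman_admin(command, host="managesf.review.rdoproject.org", port=4730):
        """Send one admin command ('workers' or 'status') and return the reply text."""
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(command.encode("ascii") + b"\n")
            reply = b""
            # Admin replies are newline-separated lines terminated by a lone ".".
            while not (reply.endswith(b"\n.\n") or reply == b".\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
        return reply.decode("ascii", "replace")

    print(gearman_admin("workers"))  # which workers are registered for which functions
    print(gearman_admin("status"))   # queued / running / available counts per function
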
<dmsimard> jeblair: it was a red herring, what was stuck was actually our nodepool  15:11
<dmsimard> it was just weird that two mergers were not picking up anything but all three are doing stuff now.  15:11
<jeblair> dmsimard: it's worth noting that unless the system is busy and there are jobs queued, it's quite likely for only one to pick up jobs (and the same one).  all depends on cpu speed and network topology.  15:11
<jeblair> dmsimard: the geard server doesn't round-robin or anything, it wakes up all idle workers, and the first one to respond gets the next job.  15:12
<dmsimard> yeah, we had to get more than one merger.. we have three and it's fine now. Back when we only had one, sometimes we would get one of those "rebase bombs" from the upstream gerrit and that monopolizes a merger for quite a bit  15:12
<jeblair> dmsimard: and it wakes them in the same order :|  15:12
<dmsimard> like a 22-long patch stack with depends-on mixed in between  15:13
<jeblair> good times  15:13
*** jkilpatr has joined #zuul  15:14
*** kmalloc has joined #zuul  15:37
<dmsimard> jeblair: I'm trying to hunt down the best solution to fix https://review.openstack.org/#/c/514489/ and https://review.openstack.org/#/c/514490/  15:50
<dmsimard> jeblair: I wondered if we should just make sure the hostvars apply to localhost as well  15:50
<dmsimard> but at the same time, there can be legitimate "nodeless" jobs which run *only* on localhost, so I'm not sure to what extent the concept of nodepool vars applies (cloud, provider, ip addresses, etc.)  15:52
<dmsimard> For example, tristanC's work on the container driver defines a different inventory layout https://review.openstack.org/#/c/468753/24/nodepool/driver/oci/provider.py  15:54
<dmsimard> It'd look like emit-job-header would be driver-dependent  15:55
<dmsimard> so maybe we could pull that back into the executor code itself, rather than trying to keep it as a role/playbook  15:56
*** bhavik has joined #zuul  15:56
<SpamapS> jeblair: oh, waking in the same order. That would be an easy thing to fix. ;)  16:06
* SpamapS throws it on the pile  16:06
<SpamapS> jeblair: note that ideally they'd be woken up in the order that they slept.  16:09
<SpamapS> so the one that gave you a PRE_SLEEP first is the first one to wake up. That sort of functions as a round robin.  16:10
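
(A minimal sketch of that FIFO wake-up idea, purely for illustration -- this is not geard's actual implementation, and the class and method names are made up.)

    from collections import deque

    class SleepQueue:
        """Track idle workers in the order they sent PRE_SLEEP."""

        def __init__(self):
            self._sleeping = deque()  # oldest sleeper first

        def pre_sleep(self, worker):
            # Record that a worker has gone idle.
            if worker not in self._sleeping:
                self._sleeping.append(worker)

        def wake_all(self):
            # Wake (send NOOP to) workers oldest-sleeper-first, which
            # approximates a round robin across equally fast workers.
            woken = list(self._sleeping)
            self._sleeping.clear()
            return woken
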
* SpamapS ponders putting that into the protocol doc  16:10
<jeblair> SpamapS: ++  16:27
<jeblair> dmsimard: indeed we have nodeless jobs already.  16:28
<dmsimard> jeblair: what does the inventory look like for nodeless jobs?  16:30
<jeblair> dmsimard: i think the right solution is for zuul not to care.  i think our job header should gracefully handle missing data.  16:30
<jeblair> dmsimard: "hosts: []"  16:31
<jeblair> or hosts: {}  16:31
<jeblair> something like that  16:31
<jeblair> dmsimard: yeah, should be "all: hosts: {}"  16:31
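
(For illustration, a small sketch of that nodeless inventory shape and of treating nodepool data as optional. It uses PyYAML and made-up node fields; it is not the exact structure Zuul's executor writes.)

    import yaml

    def build_inventory(nodes=None):
        """Build an Ansible inventory dict; no nodes yields 'all: hosts: {}'."""
        hosts = {}
        for node in (nodes or []):
            # Nodepool-specific details are simply absent for nodeless jobs,
            # so anything consuming them has to treat them as optional.
            hosts[node["name"]] = {
                "ansible_host": node.get("interface_ip"),
                "nodepool": node.get("nodepool", {}),
            }
        return {"all": {"hosts": hosts}}

    print(yaml.safe_dump(build_inventory()))
    # all:
    #   hosts: {}
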
*** Cibo has quit IRC  17:03
*** dkranz has joined #zuul  17:07
*** jkilpatr has quit IRC  17:16
*** jkilpatr has joined #zuul  17:28
*** bhavik has quit IRC  17:36
*** Cibo has joined #zuul  18:10
*** Cibo has quit IRC  18:16
*** weshay is now known as weshay_brb  18:30
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Make encrypt_secret.py work with OpenSSL 0.x  https://review.openstack.org/517133  18:37
<jlk> ugh. So if I bring 20K to the table, I can get a decent XC90 on a 72 month term for the same monthly payments I'm making now. Only, I think I'm underwater on current car, so blah.  18:46
<SpamapS> jlk: wrong window :)  18:50
<jlk> ah shit.  18:50
<jlk> too bad this isn't slack, I can't delete it.  18:50
<SpamapS> you can't really delete it in slack either. :)  18:50
<SpamapS> You can just hide it from unprivileged users who haven't thought to run clients that log everything yet. ;)  18:51
*** Cibo has joined #zuul  18:52
<jlk> sure  18:52
<jlk> SpamapS: as a GH user, care to review https://review.openstack.org/#/c/517121/ ?  18:52
<jlk> jeblair: ^^ is ready for review  18:53
<SpamapS> jlk: oh yeah that one is cool :)  18:54
<SpamapS> even though I still don't have apps ;)  18:54
<jlk> ah right.  18:54
<jlk> I wonder if you'll get GraphQL before apps  18:54
<SpamapS> but I can review with hope for the future  18:54
*** Cibo has quit IRC  19:11
<SpamapS> Looks like we might have a GH bug: http://paste.openstack.org/show/625458/  19:19
*** weshay_brb is now known as weshay  19:20
*** electrofelix has quit IRC  19:21
<jlk> interesting!  19:32
<jlk> we only seem to set that value in pull request style events  19:33
<jlk> what event was that?  19:33
<jlk> I'm guessing a push  19:33
<jlk> or...  19:34
<jlk> wow, why is it a ZuulTriggerEvent?  19:35
<SpamapS> 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: onChangeEnqueued {'parent-change-enqueued'}  19:49
<SpamapS> 2017-11-03 12:18:43,376 DEBUG zuul.ZuulTrigger: Checking for changes needing <Change 0x7f2c1c016a90 20,472d642f38f232719e8d75ee15c87ac09d2fa2bd>:  19:49
<SpamapS> jlk: just added parent-change-enqueued as a trigger. :)  19:50
<jlk> okay, so that's the new thing we never hit in Bonny  19:50
<SpamapS> which is how you get dependencies to merge along with their parents  19:50
<jlk> interesting that the trigger object is a zuul trigger event, so it doesn't have that updated_at  19:50
<SpamapS> the source is zuul. :)  19:51
<SpamapS> so it makes sense  19:51
<jlk> It means we wouldn't be able to determine if it's an updateOf something else  19:52
<jlk> maybe what could be done is if updated_at isn't a key, create it and make it "now"  19:53
*** dkranz has quit IRC  20:41
<SpamapS> Yeah that's the way to go I think.  20:50
<SpamapS> Because that's pretty much the way it happened  20:50
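
(A minimal sketch of the fallback jlk suggests: default a missing updated_at to "now". The function name and dict-style event data are illustrative, not the actual Zuul driver code.)

    from datetime import datetime, timezone

    def ensure_updated_at(event_data):
        """Default a missing or empty updated_at to the current time."""
        if not event_data.get("updated_at"):
            # Events synthesized by zuul itself (e.g. parent-change-enqueued)
            # carry no GitHub timestamp, so "now" is the closest truth.
            event_data["updated_at"] = datetime.now(timezone.utc).isoformat()
        return event_data
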
<SpamapS> so I have an interesting problem to deal with now.. hoping semaphore can help but I'm not sure.  20:52
<SpamapS> I have a pool of resources that are statically allocated to my CI (user accounts). Creating/deleting them is heavy so we don't want the churn. But the accounts give us isolation during a CI run...  20:53
<SpamapS> What I'm not sure how to deal with is handing out one account to only one job at a time.  20:53
<SpamapS> the semaphore will just tell me how many are concurrently running. But I kind of need something more like a resource pool.  20:54
<clarkb> could have a pre run on the executor do a checkout of the account and lock it  20:55
<clarkb> and do it all in ansible  20:55
<clarkb> though not sure if you can then modify the inventory  20:56
<clarkb> that might be the issue with this plan  20:56
<SpamapS> I think I need a state store for that.  21:01
<SpamapS> and I may need one, period.  21:01
<clarkb> ya I think that's a given if you need to coordinate arbitrary data among jobs?  21:01
<SpamapS> yeah I was just trying to think if I could use something in zuul already but I don't think I can.  21:02
<SpamapS> I honestly just need a pool of integers.  21:02
<SpamapS> since I could have an array of 10 user accounts and just pick using the integers as offsets.  21:03
<SpamapS> Feels like something zookeeper or etcd would do well.  21:04
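
(A sketch of that pool-of-integers idea using ZooKeeper via kazoo, the client library Zuul and Nodepool already use. Each of N account slots is claimed with an ephemeral znode so a slot is released automatically when the holder's session dies; the path, slot count, and ZK endpoint below are assumptions for illustration.)

    from kazoo.client import KazooClient
    from kazoo.exceptions import NodeExistsError

    POOL_PATH = "/ci-accounts/slots"   # assumed chroot-style prefix
    POOL_SIZE = 10                     # one slot per pre-created account

    def acquire_account_slot(zk):
        """Claim one free slot (0..POOL_SIZE-1), or return None if all are taken."""
        zk.ensure_path(POOL_PATH)
        for slot in range(POOL_SIZE):
            try:
                zk.create(f"{POOL_PATH}/{slot}", ephemeral=True)
                return slot            # use this as an offset into the account list
            except NodeExistsError:
                continue               # someone else holds this slot; try the next
        return None

    zk = KazooClient(hosts="zk.example.org:2181")   # assumed ZK endpoint
    zk.start()
    slot = acquire_account_slot(zk)
    # The slot is held only while this client's ZK session is alive -- which is
    # exactly the caveat jeblair raises below about playbook-scoped processes.
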
<jeblair> things like this may or may not come under nodepool's remit.  we're giving it more node types, and we may extend it to support more types of cloud resources.  but this is perhaps more general.  perhaps extending semaphores in zuul to include data, or making secrets reservable, is another option.  21:04
<jeblair> SpamapS: but i agree, nothing off-the-shelf.  21:05
<clarkb> reservable secrets makes sense to me  21:05
<clarkb> "this is my trove cluster now" "this job has complete control of this k8s deployment"  21:06
<clarkb> maybe those aren't the best examples but ya  21:06
<jeblair> SpamapS: a pre-playbook could reserve it via zk or etcd, but you'd still need a way to put it back, and we need cleanup jobs (your idea!) for that.  we haven't implemented them yet, but they're probably not *too* hard.  21:06
<SpamapS> jeblair: oh yeah I'd love it if nodepool could be extended to do this.  21:07
<SpamapS> For now I think what I'm going to do is just abuse the zookeeper we already have for zuul, but with different creds and a different "chroot"  21:10
<jeblair> SpamapS: yeah, if you can do it all within one playbook then an ephemeral zk node should work well  21:12
<SpamapS> should work OK if I start a daemon-ish process early in each playbook which holds the resource while the playbook runs. As long as the node timeout is longer than the lag between pre/run/post  21:12
<SpamapS> No, I was thinking it would cross the boundaries, but that as long as we refresh the ephemeral nodes in time it wouldn't be a problem.  21:13
<jeblair> SpamapS: i'm not sure that's going to work.  once a playbook ends, bwrap will kill the process, and ephemeral nodes are tied to connections, so zk will soon realize that connection is dead and delete it.  if you tried to lock it again, it could race with another connection (and using the standard zk locking algorithm, if there is a race, it will lose, because it'll be at the back of the line)  21:18
<SpamapS> jeblair: there's a timeout for re-establishing that dead connection, I was hoping we can make that long enough to survive pre to run.  21:19
<jeblair> SpamapS: yeah, but you won't be re-establishing it, you'll be making a new one  21:19
<SpamapS> Actually  21:19
<SpamapS> I have nodes  21:19
<jeblair> doing it on the node works :)  21:19
<SpamapS> And in my evil world, those nodes can reach my zk (or.. a ZK anyway)  21:19
<SpamapS> but gah  21:20
<SpamapS> this got evil  21:20
<SpamapS> Maybe I can just write it as a zuul feature. ;)  21:20
<SpamapS> Another thought is to just write a little API frontend.  21:20
<jeblair> you could implement cleanup jobs :)  21:20
<SpamapS> Yeah  21:20
<SpamapS> Not sure how much more I can invest in Zuul.. just getting it up and running has been a bit of a side-job.  21:21
<SpamapS> Right now my automation spins up the heavy accounts and deletes them. And that's ok because the job that needs those accounts runs like, 3 times a day. But the admins of that service have made it clear that the harder I push it, the more it will fail.  21:22
<jeblair> also, i wonder if another sort of job might be useful -- one that starts before any others in the buildset, runs for the duration of the buildset, then gets a signal (or is killed) when all others are done.  so basically doing a pre-job + cleanup-job pair but with only one job.  21:22
<SpamapS> so the pool of accounts is something I'll have to do before I get more jobs on Zuul.. but it's also something I can't do until I get more buy-in for zuul ;-)  21:22
<SpamapS> jeblair: oh interesting!  21:22
<SpamapS> A supervisor job  21:23
<jeblair> good name  21:23
<jeblair> that's still probably about as much work as the cleanup job (they share enough that maybe we can implement both at once)  21:23
*** Cibo has joined #zuul  21:23
<jeblair> SpamapS: i have a really hacky idea  21:23
<jeblair> this is terrible and no one should ever do it  21:23
<jeblair> but, you could probably make your own supervisor job just by having a long-running zero-node job that queries the zuul status.json to figure out when its peers are all done.  21:24
<jeblair> you'd still need to do some inter-job communication outside of zuul to coordinate things.  but it's an option.  21:25
<jeblair> i'm not actually sure it's any less work than implementing cleanup job.  but just brainstorming here.  :)  21:25
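
(A rough illustration of that hacky supervisor job. The endpoint URL and the status.json layout assumed here -- pipelines -> change_queues -> heads -> items with an "id" and a "jobs" list -- are guesses to verify against your own Zuul, not a documented contract.)

    import json
    import time
    import urllib.request

    STATUS_URL = "https://zuul.example.org/status.json"   # assumed endpoint

    def peers_still_running(change_id, my_job_name):
        """Return True while any sibling job for our change lacks a result."""
        status = json.load(urllib.request.urlopen(STATUS_URL))
        for pipeline in status.get("pipelines", []):
            for queue in pipeline.get("change_queues", []):
                for head in queue.get("heads", []):
                    for item in head:
                        if item.get("id") != change_id:
                            continue
                        return any(job.get("result") is None
                                   for job in item.get("jobs", [])
                                   if job.get("name") != my_job_name)
        return False

    def supervise(change_id, my_job_name="supervisor", poll=30):
        # Hold the shared resource while peers run, then fall through to
        # whatever cleanup this supervisor job is responsible for.
        while peers_still_running(change_id, my_job_name):
            time.sleep(poll)
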
<SpamapS> I like it  21:27
<SpamapS> But I think the trouble is still finding the coordination point.  21:28
<SpamapS> dunno, I've sort of backburnered it now as I think I'll have to let it sit and stew for the next 2 weeks while I get some other stuff done  21:29
<SpamapS> I'm sure if I was coming to Sydney I could have figured it out with all of you over fried Wallaby toes.  21:29
<SpamapS> jlk: have you ever succeeded in getting Zuul to dump debug logs about the github API requests it is making?  21:35
<SpamapS> I'd really like to include the ETags and stuff so GH support can debug with me.  21:35
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Prime github app install map on connection load  https://review.openstack.org/517121  21:50
*** harlowja has quit IRC  22:12
<jlk> SpamapS: yeah, I thought with a log config that was cranked up to debug you can get some  23:01
<jlk> oh, maybe not. I see "caching due to etag" but not what the etag itself is  23:02
<jlk> so maybe it's more github3.py debugging  23:02
<jlk> let me see if I can insert a debugger here and get data  23:03
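
(One generic way to surface those ETag headers, as a hedged sketch rather than a Zuul configuration option: raise stdlib logging for github3.py and urllib3, and turn on http.client's header dump, which prints request and response headers -- including ETag -- to stdout.)

    import http.client
    import logging

    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger("github3").setLevel(logging.DEBUG)
    logging.getLogger("urllib3").setLevel(logging.DEBUG)

    # requests/urllib3 sit on top of http.client; debuglevel=1 makes it print
    # "send:" / "reply:" lines with the full header set for every request.
    http.client.HTTPConnection.debuglevel = 1
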
*** harlowja has joined #zuul  23:04
<openstackgerrit> Emilien Macchi proposed openstack-infra/zuul-jobs master: version-from-git: fix logic with tags  https://review.openstack.org/517733  23:05
*** Cibo has quit IRC  23:39
<openstackgerrit> Merged openstack-infra/zuul-jobs master: version-from-git: fix logic with tags  https://review.openstack.org/517733  23:48
*** hashar has quit IRC  23:55
