Tuesday, 2017-10-03

01:14 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Clear project config cache later  https://review.openstack.org/509040
02:20 <openstackgerrit> Sam Yaple proposed openstack-infra/zuul feature/zuulv3: Add additional information about secrets  https://review.openstack.org/509047
02:25 <openstackgerrit> Sam Yaple proposed openstack-infra/zuul feature/zuulv3: Add additional information about secrets  https://review.openstack.org/509047
03:25 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Improve scheduler log messages  https://review.openstack.org/509057
03:26 <jeblair> that's not exactly critical, but considering the amount of time i spent filtering those out of logs today, it's probably a net gain if folks have a second.
04:06 *** bhavik1 has joined #zuul
05:13 *** bhavik1 has quit IRC
05:21 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Move shadow layout to item  https://review.openstack.org/509014
06:29 *** isaacb has joined #zuul
06:52 <openstackgerrit> Andreas Jaeger proposed openstack-infra/zuul feature/zuulv3: Improve scheduler log messages  https://review.openstack.org/509057
06:59 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Add content to support translation jobs  https://review.openstack.org/502207
07:32 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Clear project config cache later  https://review.openstack.org/509040
07:33 *** isaacb has quit IRC
07:57 *** hashar has joined #zuul
08:14 *** isaacb has joined #zuul
08:27 <openstackgerrit> Merged openstack-infra/zuul-jobs master: Handle z-c shim copies across filesystems  https://review.openstack.org/508772
09:01 *** hashar has quit IRC
09:04 *** electrofelix has joined #zuul
09:06 *** hashar has joined #zuul
09:23 *** hashar has quit IRC
09:24 *** hashar has joined #zuul
10:03 *** isaacb has quit IRC
10:03 *** isaacb has joined #zuul
10:08 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Fix branch matching logic  https://review.openstack.org/508955
10:34 *** jkilpatr has quit IRC
11:00 *** isaacb has quit IRC
11:03 *** jkilpatr has joined #zuul
11:19 *** jesusaur has quit IRC
11:25 *** jkilpatr has quit IRC
11:33 *** jesusaur has joined #zuul
11:40 *** jkilpatr has joined #zuul
11:46 *** jkilpatr_ has joined #zuul
11:47 *** jkilpatr has quit IRC
11:54 *** jkilpatr_ has quit IRC
12:04 *** isaacb has joined #zuul
12:08 *** jkilpatr_ has joined #zuul
12:14 *** isaacb has quit IRC
12:32 *** isaacb has joined #zuul
12:34 *** isaacb has quit IRC
12:56 *** isaacb has joined #zuul
13:04 *** isaacb has quit IRC
13:36 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Set default on fetch-tox-output to venv  https://review.openstack.org/509177
13:39 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Set default on fetch-tox-output to venv  https://review.openstack.org/509177
14:06 *** dkranz has joined #zuul
15:12 *** ricky_ has joined #zuul
16:06 <openstackgerrit> David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Add change ID to NodeRequest  https://review.openstack.org/509215
16:11 *** hashar is now known as hasharAway
16:16 <jeblair> Shrews: is that something we need now, or can we back-burner it until fires are out ^ ?
16:16 <Shrews> jeblair: total backburner. just getting it off my todo list
16:17 <jeblair> Shrews: okay.  cool.  :)
16:18 <SpamapS> jeblair: I had a thought about how to keep reconfigs from eating all RAM.
16:18 <jeblair> Shrews: we found a fault in zuul's nodepool code yesterday, see line 122 of https://etherpad.openstack.org/p/zuulv3-issues ; are you interested in picking that up?
16:18 <jeblair> SpamapS: cool!  (though we don't know what's eating all the ram)
16:18 <SpamapS> jeblair: what if for reconfig we fork, re-exec in the child, and shove the new tree down a pipe?
16:19 <SpamapS> Like, at the end of a reconfig event, we pretty much get back to square 0 of the scheduler right?
16:20 <SpamapS> So sort of automate what you've been doing with the dump/re-enqueue.
16:20 <jeblair> SpamapS: it's the in-process version of a cron restart to deal with a memory leak?
16:20 <SpamapS> yep!
16:20 <SpamapS> Not sexy at all
16:21 <SpamapS> But until we can figure out why they're not GC'd...
16:21 <jeblair> SpamapS: i suppose it might work, but i fear we could spend more time dealing with that (file descriptors!) than fixing the issue
16:21 <jeblair> SpamapS: well, we've just decided to roll back, so we have a window of time now where we can safely test this at scale.
16:21 <SpamapS> I agree
16:21 <SpamapS> jeblair: k
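To make the idea SpamapS is floating a bit more concrete, here is a minimal sketch of a fork/re-exec restart that hands queue state to the new process over a pipe. It is only an illustration of the approach, not Zuul's code: the --resume-fd flag and the serialized queue_state are hypothetical stand-ins for the existing dump/re-enqueue procedure.

```python
# Sketch only -- not Zuul's reconfiguration code.  The --resume-fd flag and
# the queue_state payload are hypothetical stand-ins for "dump/re-enqueue".
import json
import os
import sys

def respawn_scheduler(queue_state):
    """Fork, re-exec a fresh scheduler in the child, and shove the serialized
    queue state down a pipe so the new process can re-enqueue it."""
    read_fd, write_fd = os.pipe()
    os.set_inheritable(read_fd, True)   # let the fd survive the exec
    if os.fork() == 0:
        # Child: clean heap, so nothing pins the old layout objects.
        os.close(write_fd)
        os.execv(sys.executable,
                 [sys.executable, sys.argv[0], '--resume-fd', str(read_fd)])
    # Parent: hand over the state, then exit -- jeblair's "in-process
    # version of a cron restart".  Inherited sockets and other file
    # descriptors are exactly the messy part he flags above.
    os.close(read_fd)
    with os.fdopen(write_fd, 'w') as pipe:
        json.dump(queue_state, pipe)
    sys.exit(0)
```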
16:22 <Shrews> jeblair: iiuc, before zuul locks the nodes, it needs to make sure the request is still valid. and if not, abort the handling somehow. correct?
16:22 <SpamapS> jeblair: another thing to try is just adding a 'gc.collect()' right after the reconfig.
16:23 <jeblair> SpamapS: yeah -- i started to try to fold in some of the stuff on gc you and dhellman mentioned yesterday, and i ran gc.collect(0), gc.collect(1), and gc.collect(2) manually
16:24 <SpamapS> jeblair: did you see anything in gc.garbage by any chance?
16:24 <SpamapS> I'd expect no
16:24 <jeblair> SpamapS: no it was empty
16:24 <jeblair> SpamapS: those did collect the unknown layout objects though
16:24 <SpamapS> since we don't use any objects from C extensions in there AFAIK
16:25 <jeblair> so after all 3 collection passes, the number of layout objects in memory was exactly equal to the number i would expect
16:25 <jeblair> SpamapS: so i'm developing a theory that because they stay around *so long*, they end up in generation 2
16:25 <jeblair> and we just don't run generation 2 very often, or often enough
16:25 <SpamapS> jeblair: that's certainly possible.
16:26 <jeblair> i have no idea how often the various generations run though
16:26 <SpamapS> It's based on allocations vs. deallocations I know, but I don't know the ratio
16:26 <Shrews> jeblair: left a comment trying to explain my goal with that zuul change above. let me know if that's confusing or just a bad idea in general.
16:26 <SpamapS> gc.set_threshold() lets you change them, but doesn't explain what they are before. Lame.
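For readers following along, the generation mechanics being discussed look roughly like this; it is plain stdlib gc, nothing Zuul-specific:

```python
import gc

# Thresholds come back as (threshold0, threshold1, threshold2); the default
# is (700, 10, 10).  Generation 0 is collected when allocations minus
# deallocations exceed threshold0, generation 1 only after threshold1 gen-0
# passes, and generation 2 only after threshold2 gen-1 passes -- which is why
# long-lived objects like a layout can sit in generation 2 for a long time.
print(gc.get_threshold())

# Forcing each generation by hand, the way jeblair describes doing:
for generation in (0, 1, 2):
    collected = gc.collect(generation)
    print('generation %d: collected %d objects' % (generation, collected))

# Truly uncollectable objects land here; as SpamapS expects, it should stay
# empty for plain Python objects with no exotic C-extension cycles.
print(gc.garbage)
```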
16:29 <jeblair> Shrews: re zuul: yes -- i think there are maybe two approaches, and maybe we should do both.  1) zk has probably notified zuul already that the request is gone.  we're just acting asynchronously from an event a long time ago.  so maybe when zk notifies us the request is gone, we need to set a flag on our copy of the request object.  2) we could re-get the request before we do anything with the node.
16:30 <jeblair> Shrews: yeah, i agree that the buildset uuid/job won't make that task easier, but why are we doing that task?  is it just idle curiosity?  if so, i'm not a fan of adding in potentially confusing information.  it could lead people to believe that the data/request model is much simpler than it actually is and make debugging harder.
16:31 <Shrews> jeblair: To answer the question: "My review doesn't seem to be doing anything. Why is that?"
16:31 <jeblair> Shrews: if we really need to reverse-map this (as opposed to getting the request id out of the zuul log, or somehow asking zuul what request is for what build (ie, maybe adding it to the status.json)) then i think we need to add a lot of data to disambiguate it.
16:31 <Shrews> we can then use request-list to verify if that review is waiting on nodes. otherwise, it's very hard to determine that
16:32 <jeblair> Shrews: yeah, i think if we make this too simplistic, we are likely to make mistakes.
16:33 <Shrews> jeblair: afraid i don't see the issue clearly. do we not have a 1-to-1 mapping of change/patchset to noderequest?
16:33 <jeblair> Shrews: nope.  it's entirely reasonable in our system to have 40 or more requests outstanding for the same change.
16:33 <jeblair> maybe even significantly more than that
16:34 <jeblair> Shrews: the unique key for a node request is pipeline+item(change)+build(job)
16:35 <Shrews> jeblair: ok, then this isn't going to give me what i want. i'll skip it for now
16:36 <jeblair> Shrews: ok.  let's come back to this when we have more time to poke at it
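A rough sketch of the two approaches jeblair outlines above (flag the cached request when ZooKeeper says it is gone, and re-check it immediately before locking nodes), written against kazoo directly; the NodeRequest attributes and the lock_nodes() helper are hypothetical, not Zuul's actual nodepool-facing API.

```python
# Sketch of the two approaches above, using kazoo directly.  request.path,
# request.canceled and lock_nodes() are hypothetical stand-ins, not Zuul's
# actual API.
from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

def watch_request(request):
    """Approach 1: when ZooKeeper notifies us the request znode is gone,
    flag our cached copy so later (asynchronous) handling notices."""
    def _watch(data, stat):
        if stat is None:            # znode was deleted
            request.canceled = True
            return False            # stop watching
    zk.DataWatch(request.path, _watch)

def handle_fulfilled_request(request):
    """Approach 2: re-check the request right before acting on its nodes."""
    if request.canceled or not zk.exists(request.path):
        return False                # request vanished; abort the handling
    lock_nodes(request)             # hypothetical: take the node locks
    return True
```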
16:37 *** rcurran_ has joined #zuul
16:37 *** hasharAway has quit IRC
16:41 *** rcurran_ has quit IRC
16:44 <openstackgerrit> Andreas Jaeger proposed openstack-infra/zuul master: Use new infra pipelines  https://review.openstack.org/509223
16:45 <openstackgerrit> Andreas Jaeger proposed openstack-infra/zuul feature/zuulv3: Use new infra pipelines  https://review.openstack.org/509224
16:46 *** hashar has joined #zuul
16:46 *** hashar is now known as hasharAway
17:25 <openstackgerrit> Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add a generic stage-artifacts role  https://review.openstack.org/509233
17:25 <openstackgerrit> Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add compress capabilities to stage artifacts  https://review.openstack.org/509234
17:30 <openstackgerrit> Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add compress capabilities to stage artifacts  https://review.openstack.org/509234
17:42 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul-jobs master: Add TODO note about reworking fetch-tox-output  https://review.openstack.org/509237
17:51 <jeblair> oh, hey, i think i can use this to get a handle on when the collector is run: https://docs.python.org/3.5/library/gc.html#gc.callbacks
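The gc.callbacks hook linked above can be used roughly like this to see when each generation actually runs (the logger name is just an example):

```python
import gc
import logging

log = logging.getLogger('zuul.GC')

def gc_callback(phase, info):
    # phase is 'start' or 'stop'; info carries 'generation', plus
    # 'collected' and 'uncollectable' counts when the pass finishes.
    if phase == 'stop':
        log.debug('gc generation %s finished: collected=%s uncollectable=%s',
                  info['generation'], info['collected'], info['uncollectable'])

gc.callbacks.append(gc_callback)
```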
17:53 <jeblair> SpamapS: i've done some more poking today, and have found that sometimes there are still unexpected layout objects even after a full collection.  so it looks like sometimes that helps, and sometimes it doesn't.  i'm still trying to figure out what's holding on to them.
17:58 <jeblair> i haven't seen this before: http://paste.openstack.org/show/622572/
17:59 <SpamapS> I wonder if there are stuck threads holding on or something.
17:59 <SpamapS> I assume objgraph knows about threads and references in each thread's stack.
18:00 <jeblair> SpamapS: hrm.  we don't make a lot of threads, and when we do, we generally only pass the singleton scheduler instance around to facilitate cross-communication.
18:01 <SpamapS> I hadn't even looked at threads in the scheduler.
18:01 <SpamapS> But just wondering out loud really
18:03 <jeblair> one thing i noticed while debugging is that sometimes exceptions hold stack frames with local variables with references.  so one thing i'm concerned about is whether exception handlers are keeping entire layouts alive that way.
18:04 <jeblair> and moreover, the sys.last_exception or whatever it's called is keeping *that* alive
18:04 <jeblair> i cross-referenced memory jumps with config syntax errors and did not see a pattern
18:05 <jeblair> but presumably, even if that's the case, it should only be, at most, one extra layout in memory
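A debugging snippet along the lines of this thread: after a full collection, find the surviving layout objects with objgraph and trace what still refers to them. The 'Layout' type name and the output filename are illustrative; the sys.last_value / sys.last_traceback attributes are the "sys.last_exception or whatever" jeblair mentions, and a saved traceback does keep every frame and its locals alive until it is cleared.

```python
# Assumes objgraph is installed; 'Layout' is the type name of interest here.
import gc
import sys
import objgraph

gc.collect(2)                            # full collection first
layouts = objgraph.by_type('Layout')     # survivors after the collection
print('live layouts: %d' % len(layouts))
if layouts:
    # Render the reference chain back toward a gc root; if an exception
    # handler is the culprit, the chain tends to run through a frame or
    # traceback object.
    objgraph.show_backrefs(layouts[-1], max_depth=5,
                           filename='layout-backrefs.dot')

# Clearing the interpreter's saved "last" exception drops any frames (and
# layouts referenced from their locals) it was keeping alive.
sys.last_value = None
sys.last_traceback = None
```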
18:17 <pabelanger> we're just discussing pipeline changes again in zuulv3, do we need to reload zuul for those (via puppet) or does zuul pick them up as soon as they're merged?
18:18 <mordred> pabelanger: zuul picks them up
18:18 <mordred> pabelanger: the only thing zuul needs a puppet action for is main.yaml
18:19 <pabelanger> mordred: Thanks, I was getting confused
18:26 *** electrofelix has quit IRC
18:39 <dmsimard> mordred: a side effect from truncating the end of the logs is that we no longer get a formal job status in the logs. I'm looking at adding missing zuul v3 bits in elastic-recheck and there's a couple places where it attempts to look for strings like "[Zuul] Job complete" or "Finished: FAILURE"
18:40 <mordred> dmsimard: nod
18:40 <dmsimard> I wonder what kind of workaround we could do. Run a post-job before log collection that anticipates and prints the status of "run"? Use ARA to do it? Something else? Something in the log processor?
18:41 <dmsimard> The problem being that if you search for "build_status:FAILURE" instead of "build_status:FAILURE and message:[Zuul] Job complete" you end up with all the log lines for that job, not just one result that contains that particular line.
18:42 <mordred> dmsimard: well - it doesn't have to be at the end of the log, right?
18:42 <dmsimard> yeah, hence why I proposed a 'post-run' playbook that would be able to tell if any of the 'run' or 'pre-run' playbooks failed
18:42 <mordred> dmsimard: we could emit a message at the end of each playbook that is a single line that indicates success or failure that could be searched for
18:43 <dmsimard> mordred: oh, I like that idea. The executor could print the status code of each playbook.
18:43 <mordred> dmsimard: and have it be "[Zuul] Run Complete" or "[Zuul] Run Failed" (and switch run with pre-run and post-run as well)
18:43 <mordred> yah
18:43 <dmsimard> mordred: wfm, I'll send a patch
18:44 <mordred> so that you could also see [Zuul] Post-Run Failed ... for everything EXCEPT errors in the final uploading of logs
18:44 <mordred> dmsimard: woot!
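The eventual change is 509254 below; purely as a sketch of the idea, the executor-side emitter could be as small as this, writing one greppable line per playbook with its phase and result:

```python
# Sketch of the idea only -- not the actual content of 509254.
def emit_playbook_result(job_log, phase, playbook, success):
    result = 'SUCCESS' if success else 'FAILURE'
    # One line per playbook that elastic-recheck can match on, independent
    # of whether the tail of the log survived truncation.
    job_log.write('[Zuul] Playbook %s (phase: %s) finished: %s\n'
                  % (playbook, phase, result))
```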
19:09 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Explicitely print the result status of each playbook run  https://review.openstack.org/509254
19:09 <dmsimard> mordred: ^ not sure if there's more to it than that
19:11 <mordred> dmsimard: I think you may want to add phase in too - that's listed a few lines up
19:12 <mordred> dmsimard: as people might want to elastic-recheck search for Run failures or Pre-Run failures or something
19:12 <dmsimard> mordred: good idea
19:15 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Explicitely print the result status of each playbook run  https://review.openstack.org/509254
19:19 <SpamapS> unrelated to current fires. Has anyone looked into how one might use submodules with zuul?
19:19 <SpamapS> I'm currently just stripping them out of a project and replacing relative paths with prefixed paths so I can just point to the other project's src_dir...
19:20 <SpamapS> but it would be pretty awesome if Zuul could just inject them.
19:22 <clarkb> zuul probably needs to have a git submodule init and update step if they are detected
19:22 <SpamapS> the tricky part is that the parent repo will have a ref of where it wants the submodule checked out
19:23 <SpamapS> and Zuul will want to check out the submodule to the place where Zuul has prepared it in the workspace
19:23 <clarkb> they also tend to use relative paths to the repo
19:23 <SpamapS> in my experience they tend to use absolutes... https://github.com/somewhere/something
19:23 <SpamapS> not that it's the right thing, but they do that.
19:24 <clarkb> gerrit at least uses relative
19:24 <SpamapS> so there'd need to be a step rewriting that to the local src_dir, and then another one that checks it out to where the prepared git repo is
19:24 <SpamapS> gerrit uses git in very nice ways. :)
19:31 <mordred> SpamapS: the question has come up a few times, but I don't know that we've got an answer yet - or ultimately even a good write up of what we believe *should* happen in such scenarios - it gets even more complex in that canonical_hostname isn't necessarily guaranteed to be the same as the value in the git submodule reference (using openstack repos as a pathological example)
19:32 <mordred> SpamapS: the openstack/openstack repo, for instance, is a repo with a ton of submodules. for gerrit submodule tracking to work, they must be urls that refer to gerrit's url - so https://review.openstack.org
19:32 <mordred> but zuul knows those repos as git.openstack.org repos
19:33 <mordred> SpamapS: which is all to say - there be a pile of dragons here - and I think the hardest part will be defining semantics
19:33 <clarkb> which is not surprising because, well, that's what has been said about submodules since they were created
19:37 <mordred> SpamapS: that said - it's also possible as a temporary approach to just list a parent repo's submodules in required-projects for jobs, and then make a pre-playbook that does the necessary repo surgery with the contents of src_dir - since on a repo-by-repo basis it should be much easier to know what the right thing is than it is to know that generally
19:38 <mordred> SpamapS: the tox-install-siblings role is sort of like this - it'll inject the contents of a repo from required-projects into a tox virtualenv if it exists, but if it doesn't it will use the normal venv contents
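As an illustration of the "repo surgery" pre-playbook mordred describes, the core of it could look something like the following script: rewrite each submodule URL to the repo Zuul already prepared in the workspace, then update. The paths and the submodule-to-src_dir mapping are made up for the example, and the pinned-SHA problem SpamapS raises is deliberately left unsolved here.

```python
# Illustration of the "repo surgery" approach; paths and mapping are made up.
import subprocess

def wire_submodules(parent_repo, submodule_src_dirs):
    """submodule_src_dirs maps submodule name -> Zuul-prepared src_dir."""
    for name, src_dir in submodule_src_dirs.items():
        # Point the recorded URL at the local, speculative-state checkout.
        subprocess.check_call(
            ['git', 'config', '-f', '.gitmodules',
             'submodule.%s.url' % name, src_dir],
            cwd=parent_repo)
    subprocess.check_call(['git', 'submodule', 'sync'], cwd=parent_repo)
    subprocess.check_call(['git', 'submodule', 'update', '--init'],
                          cwd=parent_repo)
    # Caveat: "update" still checks out the SHA pinned by the parent repo,
    # not Zuul's speculative ref -- reconciling that is the hard part.
```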
19:38 <fungi> i'm mildly curious what travis does with them
19:39 <fungi> not so much because i think they're an example we should follow on things, but more because it'd be interesting to see a ci system's take on how submodules actually get integrated
19:39 <fungi> as in what compromises they ultimately end up making for the sake of sanity
19:40 <fungi> given that they're primarily github-focused and so get to deal with all that "variety"
19:41 <SpamapS> mordred: the pre-surgery route is what I went down for a day. It's dragon infested.
19:41 <mordred> fungi: well - I think for non-zuul systems the answer is fairly easy - just do a 'git submodule update' in the repo after cloning - or clone with the option that says to update the submodules when cloning
19:42 <SpamapS> And as clarkb said, it's not Zuul's fault. It's submodules' fault.
19:42 <SpamapS> I'm recommending eradicating them from the place they're used instead.
19:42 <mordred> but with zuul being multi-repo aware, I could see people wanting to make a change to a submodule and then have zuul test that it'll work in the parent repo
19:42 <mordred> SpamapS: ++
19:42 <clarkb> fungi: I don't think travis really does any speculative future state testing beyond "current patch"
19:43 <SpamapS> Travis gets given a repo and a reference, checks it out, and runs tests.
19:43 <SpamapS> Their git-fu is pretty meh because they're single repo.
19:45 <mordred> also - in considering just the simple "git clone --recurse-submodules" case - there's no good way to tell git to use the local cache on the mergers instead of cloning from the submodule reference source
19:45 <mordred> so even *that* would need special consideration
19:45 <SpamapS> Yeah it's a mess
19:45 <mordred> ++
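For the local-cache point specifically, git's url.<base>.insteadOf rewriting is one possible escape hatch: it redirects any URL with a given prefix, including submodule clone URLs, at another location. The sketch below assumes a merger cache laid out as /var/lib/zuul/git/<hostname>/<project>, which is an assumption for illustration rather than a guarantee about Zuul's layout, and it would still need the semantic consideration mordred mentions.

```python
# Assumes a cache layout of /var/lib/zuul/git/<hostname>/<project>.
import subprocess

def update_submodules_from_cache(repo_path):
    subprocess.check_call([
        'git',
        '-c', 'url.file:///var/lib/zuul/git/git.openstack.org/'
              '.insteadOf=https://git.openstack.org/',
        '-c', 'url.file:///var/lib/zuul/git/review.openstack.org/'
              '.insteadOf=https://review.openstack.org/',
        'submodule', 'update', '--init',
    ], cwd=repo_path)
```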
19:46 <SpamapS> Ultimately, git submodules are a hacky deployment system.
19:46 <SpamapS> They track repo urls, and git shas of said repos.
19:47 <SpamapS> The challenges would be mostly the same to try and have Zuul manage a repo that just had ansible yaml files with the same level of information. Because it makes use of git directly, it steps all over Zuul's domain.
19:47 * SpamapS has finished removing the submodules and moves on
19:49 *** AJaeger has joined #zuul
19:54 <mordred> SpamapS: \o/
20:00 *** jkilpatr_ has quit IRC
20:09 <openstackgerrit> Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add a generic stage-artifacts role  https://review.openstack.org/509233
20:09 <openstackgerrit> Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add compress capabilities to stage artifacts  https://review.openstack.org/509234
20:20 *** jkilpatr has joined #zuul
20:29 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Include tenant in pipeline log messages  https://review.openstack.org/509279
20:30 <clarkb> Shrews: were you working on that thing I found last night? <- jeblair?
20:32 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Explicitely print the result status of each playbook run  https://review.openstack.org/509254
20:33 <dmsimard> mordred: ^ I think I like this approach better and it also kinda fixes a bug
20:33 <Shrews> clarkb: what thing did you find last night?
20:34 <clarkb> the double lock after request cancel
20:34 <jeblair> Shrews: the zuul locking thing i mentioned earlier
20:34 <Shrews> clarkb: oh, yeah. put my name beside it on the etherpad
20:41 *** patriciadomin has joined #zuul
20:46 <openstackgerrit> David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Explicitely print the result status of each playbook run  https://review.openstack.org/509254
21:01 <SpamapS> I'm not loving the way zuul.projects ends up working btw...
21:01 <SpamapS> http://paste.openstack.org/show/622586/
21:01 <SpamapS> Had to use this to grab the path to a particular project.
21:02 <SpamapS> might work better as a mapping
21:02 <clarkb> name: path?
21:06 <SpamapS> EPARSE
21:07 <clarkb> mapping from name to path
21:07 <SpamapS> yeah I think that will be a common use case
21:17 <jeblair> will it still be easy to iterate?
21:18 <jeblair> there are some roles in zuul-jobs that do  with_items: "{{ zuul.projects }}"
21:19 <jeblair> i think the issues are: 1) still need to be able to iterate over the values.  2) the dict key will need to be the fully qualified project name; eg: git.openstack.org/openstack/nova
21:20 <jeblair> if those aren't blockers, i think switching to dict would be fine
21:26 <mordred> jeblair, SpamapS: this is also a location where we could write some additional filter plugins - zuul | zuul_projects_by_name or zuul | zuul_projects_by_canonical_name - etc ... since this is specifically about playbooks that manipulate variables in the zuul datastructure, we're already in the realm where 'pristine' ansible isn't going to be possible anyway
21:30 <mordred> heck - we could do zuul|zuul_project('cloudplatform/openstack-helpers') ... and have that do the zuul matching logic so that it'll match cloudplatform/openstack-helpers if that's unique but otherwise would want git.godaddy.com/cloudplatform/openstack-helpers ... which would otherwise be fairly hard to do just with native jinja filters
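The eventual WIP patch is 509294 below; as a sketch of the shape such a filter plugin could take (the filter names, the matching rule, and the assumption that each zuul.projects entry carries 'name' and 'canonical_name' keys are all illustrative, not necessarily what the patch does):

```python
# Sketch of a possible Ansible filter plugin for zuul.projects.
class FilterModule(object):
    def filters(self):
        return {
            'zuul_projects_by_canonical_name': self.by_canonical_name,
            'zuul_project': self.lookup,
        }

    def by_canonical_name(self, projects):
        # Turn the list into a dict keyed by the fully qualified name,
        # e.g. git.openstack.org/openstack/nova.
        return {p['canonical_name']: p for p in projects}

    def lookup(self, projects, name):
        # Accept a fully qualified name, or a short name when it is
        # unambiguous -- the matching behaviour mordred describes above.
        matches = [p for p in projects
                   if name in (p['canonical_name'], p['name'])]
        if len(matches) != 1:
            raise ValueError('%d projects match %r' % (len(matches), name))
        return matches[0]
```

A playbook could then write something like "{{ zuul.projects | zuul_project('cloudplatform/openstack-helpers') }}" rather than the selection gymnastics in SpamapS's paste, while "{{ zuul.projects }}" stays iterable for the existing zuul-jobs roles.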
21:31 <jeblair> that's a thing
21:32 *** dkranz has quit IRC
21:39 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add jinja filters for manipulating zuul.projects  https://review.openstack.org/509294
21:40 <mordred> jeblair, SpamapS: quick 5 minute for-instance (fine if we say no to that and abandon that patch - just wanted to show code real quick)
21:40 <mordred> I've marked it WIP for now
21:40 <jeblair> mordred: thx
21:52 <jeblair> i'm switching to speeding up the config loading because having zuul spending all its time on that is making everything else hard to debug
21:53 <mordred> jeblair: ++
21:54 <mordred> dmsimard: zomg. I LOVE your 509254 patch and the fact that it removes that chunk from the callback
22:13 <dmsimard> yay ? :)
22:14 <dmsimard> mordred: when it gets queued it'll be even more awsum
22:24 <jeblair> i think i wedged it debugging.  i want to get this speedup ready before restarting tho
22:45 *** hasharAway has quit IRC
23:06 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Speed configuration building  https://review.openstack.org/509309
23:08 <jeblair> mordred, clarkb, jlk, SpamapS: ^ i'm estimating that will reduce our dynamic config delays from 50s to about 4s.
23:08 <dmsimard> jeblair: that is pretty darn awesome, I would buy you a beer right about now
23:08 <jeblair> there's still more we can do, but that's the easy stuff
23:09 <clarkb> jeblair: how much of that was just not recompiling voluptuous each time?
23:12 <jeblair> clarkb: i think about 75%.  most of the rest was avoiding deepcopy
23:15 <clarkb> jeblair: re the deepcopy where was the modification happening that the comment talked about?
23:15 * clarkb quickly scanning for a pop() and not seeing it
23:16 <jeblair> clarkb: configloader.py line 831 on the old side
23:17 <clarkb> oh derp, it's right there. also looks like the conf exception handler can do a pop, which is why you pass validate=False
23:17 <jeblair> clarkb: now we just tell the template parser not to validate it, since the project parser already has.  then the template parser just ignores it
23:17 <clarkb> lgtm thanks
23:18 <jeblair> clarkb: well, the exception handler (with configuration_exceptions:) actually does make a copy internally
23:18 <clarkb> oh ya it does
23:18 <jeblair> so the validate=false is mostly to avoid erroring on having the 'templates' section present in the template parser
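For context, the two savings being described boil down to something like this sketch (simplified stand-ins, not the actual 509309 diff): build the voluptuous schema once and reuse it, and let a caller that has already validated the data skip re-validation.

```python
# Simplified stand-ins for the configloader classes -- not the actual patch.
import voluptuous as vs

class ProjectTemplateParser(object):
    _schema = None

    @classmethod
    def getSchema(cls):
        # Building a voluptuous Schema is expensive; compile it once and
        # reuse it rather than rebuilding it for every configuration object.
        if cls._schema is None:
            cls._schema = vs.Schema({vs.Required('name'): str},
                                    extra=vs.ALLOW_EXTRA)
        return cls._schema

    @classmethod
    def fromYaml(cls, conf, validate=True):
        # The project parser has already validated the dict it hands over,
        # so it passes validate=False and this parser simply ignores the
        # extra 'templates' section instead of deep-copying or re-checking.
        if validate:
            cls.getSchema()(conf)
        return conf
```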
23:40 <jeblair> oh, here's a way we can further halve the dynamic config time in many cases -- if there are no config changes to config-projects ahead in the queue, do not run phase 1.  similarly, if there are no changes to untrusted-projects, do not run phase 2.  (of course, eventually if there are both in a queue, we'll still end up running both phases for changes after that, but that could be rare in practice)
23:41 <jeblair> i don't have time to do that right now, so if anyone else wants to work on that feel free
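In pseudo-Python, the optimization jeblair sketches might look like the following; the item attributes are hypothetical, the point is just that each phase only needs to run when the corresponding kind of config change sits ahead in the queue.

```python
# Hypothetical item/change attributes -- a sketch of the idea, not Zuul code.
def phases_needed(items_ahead):
    """Decide which dynamic-config phases are actually required, based on
    the config changes ahead of (and including) this item in the queue."""
    phase1 = any(item.updates_config and item.in_config_project
                 for item in items_ahead)       # trusted config-projects
    phase2 = any(item.updates_config and not item.in_config_project
                 for item in items_ahead)       # untrusted-projects
    return phase1, phase2
```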
23:41 <clarkb> I've got to do yardwork before I'm out of town for seagl
23:41 <clarkb> I haven't had to mow in months but alas the rain is back and the grass is happy
23:41 <jeblair> clarkb: what's the conference?
23:42 <clarkb> seattle gnu linux (seagl)
23:42 <clarkb> I'm gonna talk about python packaging
23:42 <clarkb> and probably make everyone on hackernews angry
23:42 <jeblair> oh cool, good luck!
23:42 <jeblair> clarkb: they won't be there cause it says gnu linux
23:42 <clarkb> oh right
23:43 <clarkb> I'm excited; I haven't been able to go in the past due to summit conflicts, and this year there's no conflict
23:46 <SpamapS> jeblair: oh wow, lots of deepcopies gone
23:46 * SpamapS is EOD'ing but will look later
