SpamapS | mordred: It's proving pretty gross in python too, so I'm glad I flipped | 00:04 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add configure-logserver role https://review.openstack.org/489113 | 02:31 |
*** openstack has quit IRC | 04:42 | |
*** openstack has joined #zuul | 04:47 | |
*** eventingmonkey has joined #zuul | 04:47 | |
*** leifmadsen has quit IRC | 04:47 | |
*** leifmadsen has joined #zuul | 04:47 | |
*** rfolco has joined #zuul | 04:47 | |
*** jlk has joined #zuul | 04:49 | |
*** ianw has joined #zuul | 04:49 | |
*** jhesketh has joined #zuul | 04:49 | |
*** jlk has quit IRC | 04:49 | |
*** jlk has joined #zuul | 04:49 | |
*** tobiash has joined #zuul | 04:50 | |
*** jesusaurum has joined #zuul | 04:51 | |
*** maxamillion has joined #zuul | 04:52 | |
*** rcarrill1 has joined #zuul | 04:53 | |
*** cinerama has joined #zuul | 04:54 | |
*** mnaser has joined #zuul | 04:55 | |
*** tristanC has joined #zuul | 05:02 | |
*** isaacb has joined #zuul | 05:18 | |
*** dmsimard|off has quit IRC | 05:48 | |
*** dmsimard has joined #zuul | 05:49 | |
*** dmsimard is now known as dmsimard|off | 05:49 | |
*** isaacb has quit IRC | 06:17 | |
*** isaacb has joined #zuul | 06:28 | |
*** SotK has joined #zuul | 07:43 | |
*** smyers has quit IRC | 09:05 | |
*** smyers has joined #zuul | 09:16 | |
*** electrofelix has joined #zuul | 09:53 | |
*** smyers has quit IRC | 10:12 | |
*** robled has quit IRC | 10:12 | |
*** robled has joined #zuul | 10:17 | |
*** robled has quit IRC | 10:17 | |
*** robled has joined #zuul | 10:17 | |
*** smyers has joined #zuul | 10:21 | |
*** jkilpatr has quit IRC | 10:44 | |
*** pbelamge has joined #zuul | 10:56 | |
pbelamge | hello all | 10:56 |
pbelamge | I am getting this error in gate pipeline: | 10:57 |
pbelamge | <QueueItem 0x7fe63c495410 for <Change 0x7fe63c50be10 44,1> in gate> is a failing item because ['it did not merge'] | 10:57 |
pbelamge | anyone faced this kind of error before? | 10:57 |
pbelamge | log entries before this line: | 10:57 |
pbelamge | https://thepasteb.in/p/lOhONzjJ3JZCB | 10:58 |
SpamapS | 2017-07-30 02:00:12,386 DEBUG zuul.GerritSource: Change <Change 0x7fe63c50be10 44,1> did not appear in the git repo | 11:00 |
SpamapS | pbelamge: perhaps you're using a mirror for merging? | 11:00 |
pbelamge | https://thepasteb.in/p/AnhrA18DMNzHv | 11:01 |
pbelamge | that is the git directory mentioned in the zuul.conf and the zuul_url | 11:02 |
pbelamge | let me know if you need anything to see if I am missing anything? | 11:08 |
pbelamge | in layout.yaml I used jenkins as user name and in zuul.conf as well | 11:21 |
pbelamge | I am sending Workflow (1) and Code-Review (2) from gerrit | 11:21 |
pbelamge | added jenkins in non-interactive users group in gerrit | 11:21 |
pbelamge | so, Verified (2) and Submit is automatically executed | 11:22 |
SpamapS | So zuul got the event. But it appears it couldn't find it later. | 11:22 |
pbelamge | right | 11:22 |
SpamapS | rather, it appears Zuul couldn't find the change. | 11:22 |
pbelamge | what could be wrong? | 11:23 |
SpamapS | not entirely sure | 11:23 |
*** jkilpatr has joined #zuul | 11:23 | |
SpamapS | I'm not super experienced debugging zuul problems. However, have you checked the gerrit logs to see what zuul tried to fetch? | 11:23 |
pbelamge | ok, let me take a look | 11:24 |
pbelamge | log from sshd_log | 11:28 |
pbelamge | https://thepasteb.in/p/LghNnY3jRGOsZ | 11:32 |
pbelamge | log from httpd_log | 11:32 |
pbelamge | https://thepasteb.in/p/nZhlN6WJzM3UY | 11:33 |
*** jkilpatr has quit IRC | 11:37 | |
*** jkilpatr has joined #zuul | 11:37 | |
SpamapS | pbelamge: some more experienced zuul debuggers will be online in the next 4-5 hours ... I'm not even supposed to be awake yet. ;) | 12:05 |
* SpamapS decides to try and get another hour of sleep after reporting and fixing https://github.com/ansible/ansible/issues/28325 in ansible :-P | 12:14 | |
*** pleia2 has joined #zuul | 12:42 | |
*** isaacb has quit IRC | 12:53 | |
*** gothicmindfood has joined #zuul | 13:06 | |
*** pbelamge has quit IRC | 13:25 | |
*** amoralej is now known as amoralej|lunch | 13:40 | |
*** dkranz has joined #zuul | 14:04 | |
*** amoralej|lunch is now known as amoralej | 14:25 | |
*** isaacb has joined #zuul | 14:37 | |
*** isaacb has quit IRC | 14:43 | |
*** jeblair has joined #zuul | 14:56 | |
Shrews | Does anyone need any help with the current set of "must-be-done" tasks? | 15:17 |
SpamapS | mordred: hah, thanks for the +1 on that PR. :) | 15:35 |
SpamapS | mordred: the workaround is to just add all 3 known types. :-P | 15:36 |
*** openstackgerrit has joined #zuul | 15:55 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use cached branch list in dynamic config https://review.openstack.org/494618 | 15:55 |
* SpamapS notes that jamielennox got BonnyCI ALMOST to v3 last night | 15:59 | |
SpamapS | just ran into https://review.openstack.org/#/c/493059/ which we're fixing now | 16:00 |
SpamapS | (post-review instead of allow-secrets) | 16:00 |
jeblair | SpamapS, jamielennox: \o/ | 16:01 |
pabelanger | nice | 16:02 |
Shrews | SpamapS: for some reason, i thought you were already on v3 | 16:12 |
Shrews | but yay! | 16:12 |
SpamapS | I know right? | 16:15 |
SpamapS | hrm... status webapp seems stuck | 16:17 |
SpamapS | Job base not defined | 16:20 |
SpamapS | derp | 16:20 |
SpamapS | I think just needs a parent: null | 16:22 |
jeblair | SpamapS: ah yep that's another new thing | 16:22 |
SpamapS | luckily I've been paying attention :) | 16:22 |
jeblair | pabelanger: what's the status of the mirror name fix? | 16:23 |
jeblair | i just had a job bomb because it ran on inap with the wrong mirror name | 16:23 |
SpamapS | oh yay, as soon as I pushed it.. Zuul saw it and sprang to life. | 16:25 |
SpamapS | or not | 16:26 |
SpamapS | hm | 16:26 |
Shrews | does it have to be "parent: base" or is "parent: null" also acceptable? | 16:29 |
SpamapS | Guessing I'm hitting an unexpected url here....http://paste.openstack.org/show/618704/ | 16:29 |
SpamapS | Shrews: the ultimate base needs to be parent: null | 16:29 |
jeblair | Shrews: to define a base job, "parent: null". we no longer need to add "parent: base" to every other job as that's now the default. | 16:29 |
Shrews | SpamapS: ah, right. makes sense | 16:29 |
SpamapS | or yeah, all ultimate base jobs I should say, there's not just one.. | 16:30 |
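For readers following along, the base-job convention being described here can be sketched in Zuul v3 job configuration. This is an illustrative fragment, not taken from any repo in the discussion; the job names are assumptions:

```yaml
# An "ultimate base" job explicitly declares it has no parent.
- job:
    name: base
    parent: null
    description: Root job; everything else inherits from this.

# Other jobs no longer need "parent: base" spelled out,
# since base is now the implicit default parent.
- job:
    name: tox-pep8
    description: Inherits from base automatically.
```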
jeblair | SpamapS: yeah, maybe that's hitting the status url with the wrong (or no?) tenant name (which should be the first path component of the url) | 16:30 |
SpamapS | https://zuul.opentech.bonnyci.org/BonnyCI/status.json <-- wee | 16:31 |
SpamapS | jeblair: it's hitting with _only_ the tenant name. | 16:31 |
SpamapS | should not really be a server side ERROR log IMO | 16:31 |
jeblair | SpamapS: i agree | 16:32 |
SpamapS | maybe our log config is off | 16:39 |
SpamapS | because it looks like it is being logged using .exception() | 16:39 |
SpamapS | which I'd only expect to see in debug logs | 16:39 |
SpamapS | level=WARNING ... so.. hrm | 16:40 |
SpamapS | OH | 16:41 |
SpamapS | .exception() does ERROR? that seems.. hrm. | 16:41 |
pabelanger | jeblair: I think we need to restart executors | 16:41 |
pabelanger | looking now | 16:42 |
jeblair | SpamapS: yes -- i think that's appropriate. generally if there's an uncaught exception i want to know about it. :) i think we just need to make sure there's a tenant validity check before that point. | 16:43 |
jeblair | SpamapS: iow -- any *other* error formatting the status json is worth noting :) | 16:43 |
SpamapS | yeah, just some lazy coding in there expecting things | 16:43 |
SpamapS | I'm really just trying to see how to make webapp return 404 | 16:44 |
SpamapS | ahh HTTPNotFound k | 16:44 |
pabelanger | jeblair: ya, looks like we need to restart executors. I can do that once I get some food | 16:44 |
jeblair | pabelanger: thx | 16:45 |
pabelanger | Shrews: here is an interesting nodepool failure: http://logs.openstack.org/30/493330/2/gate/gate-dsvm-nodepool-ubuntu-src/993f8ce/logs/screen-nodepool-builder.txt.gz#_Aug_17_16_02_33_543757 | 16:48 |
pabelanger | we lost connection with zookeeper for some reason | 16:48 |
SpamapS | That should be something that resolves itself. | 16:49 |
SpamapS | One should expect to lose one's zk from time to time. | 16:49 |
Shrews | pabelanger: but seems like the builder recovered correctly, yeah? | 16:49 |
Shrews | yeah, can't do anything about zk disappearing when we only have one node in the cluster. as long as it recovers correctly... | 16:50 |
Shrews | but in a dsvm job, you really don't expect zk to just go away. weird that it did | 16:51 |
Shrews | looks like it happened during the upload, that failed, then the upload worker suspended itself correctly. | 16:54 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Return 404 on unknown tenants https://review.openstack.org/494642 | 16:54 |
pabelanger | Shrews: ya, likely need to recover more gracefully for d-g hook: http://logs.openstack.org/30/493330/2/gate/gate-dsvm-nodepool-ubuntu-src/993f8ce/console.html | 16:54 |
pabelanger | lets see if it happens more often | 16:55 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use cached branch list in dynamic config https://review.openstack.org/494618 | 17:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Allow multiple semaphore definitions within a project https://review.openstack.org/494650 | 17:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Reload configuration when branches are created or deleted https://review.openstack.org/494651 | 17:38 |
pabelanger | jeblair: just ze01.o.o or are we launching jobs across other executors now for restarting | 17:48 |
jeblair | pabelanger: all 4 now | 17:48 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Replace paste/webob with aiohttp https://review.openstack.org/494655 | 17:50 |
mordred | SpamapS: saw you poking at webapp things - I wrote that ^^ a couple of months ago as an exploration - figured I'd stick it up in case anyone decides they feel like poking further at webapp/zuul-web, in case it's helpful | 17:51 |
pabelanger | jeblair: thanks | 17:51 |
pabelanger | Hmm, looks like we leak /var/run/zuul-executor/zuul-executor.pid on stop. Will find out why | 17:53 |
pabelanger | k, all have restarted. Now to fix puppet-zuul | 17:55 |
jeblair | pabelanger: because zuul can't delete it because it's written as root but zuul drops privileges | 17:57 |
jeblair | mordred: this error is new to me: http://logs.openstack.org/50/494650/1/check/tox-pep8/50bd841/job-output.txt.gz#_2017-08-17_17_43_53_019040 | 17:58 |
*** jkilpatr has quit IRC | 18:02 | |
jeblair | and again: http://logs.openstack.org/51/494651/1/check/tox-pep8/c1a2cb1/job-output.txt.gz#_2017-08-17_17_56_46_888201 | 18:02 |
jeblair | i wonder if those have something to do with the executors restarting | 18:03 |
*** jkilpatr has joined #zuul | 18:03 | |
jeblair | pabelanger, mordred: what's the status on tarball/publish jobs? | 18:04 |
pabelanger | let me pull up etherpad | 18:05 |
pabelanger | both ssh / testpypi credentials are added to project-config: https://review.openstack.org/494276/ was to start testing testpypi creds on executor and if pip install was working | 18:06 |
pabelanger | I've added the publish-openstack-python-branch-tarball to the post pipeline for zuul, and confirmed it uploaded to zuulv3-dev.o.o | 18:07 |
pabelanger | mordred: still has (pre-)python-tarball playbooks to create, but happy to take over if needed | 18:08 |
jeblair | pabelanger: +3 494276 | 18:08 |
pabelanger | twine role is started, and ready to push up pending above | 18:08 |
pabelanger | so, I can take back the pre-python-tarball and python-tarball playbooks now to keep iterating forward | 18:09 |
jeblair | mordred: ^ are you on that or do you want to hand it off? | 18:10 |
jeblair | also, other folks were asking about what they could do to help. | 18:10 |
pabelanger | most of the playbook content is written, we just need to reorg it | 18:10 |
jeblair | pabelanger: when 494276 lands you should be able to move forward on testing pypi uploads at least, right? | 18:10 |
pabelanger | jeblair: yes, I'll push on that | 18:11 |
mordred | jeblair, pabelanger: I'm fine with pabelanger taking those if he's in the groove on it | 18:11 |
pabelanger | k | 18:11 |
mordred | I wanted to see if I could get the secret name thing in real quick - cause I think we can use it to good effect on both tarballs and logs | 18:12 |
pabelanger | which one is that? | 18:12 |
mordred | pabelanger: oh - I also, speaking of - I think we can collapse the logs role/job to at least use the add-fileserver role like you mentioned yesterday | 18:12 |
mordred | pabelanger: 494650 - I need to fix a unittest real quick (now that I'm done with this morning's project-config fun :) ) | 18:13 |
pabelanger | mordred: ya, I did that already with https://review.openstack.org/494314/ for base-test logs | 18:13 |
mordred | jeblair: OH MY - that's a fantastic error | 18:13 |
mordred | pabelanger: neat! | 18:13 |
pabelanger | I think you have the other patch in merge conflict | 18:13 |
jeblair | mordred: yeah, i'm kinda inclined to see if it shows up again not in proximity to an executor restart before we spend too much time on it | 18:14 |
mordred | jeblair: kk | 18:14 |
jeblair | mordred: (even though i would not expect an executor restart to manifest like that; just trying not to rabbithole) | 18:14 |
mordred | "dictionary changed size during iteration" <-- seems extra weird - but yah | 18:14 |
pabelanger | I see that a lot when doing python3 convert in nodepool | 18:15 |
mordred | yah - I'm just not sure why we'd only see it near an executor restart - so I'm hoping it's just a heisenbug | 18:15 |
jeblair | mordred: let me know when you have the secrets name thing fixed (or if you need another set of eyes on it). i'm working on the branch cache (should be done, just waiting on stable test results) and will work on the tmpfs-for-secrets idea next (probably after lunch). | 18:16 |
mordred | jeblair: cool | 18:16 |
mordred | jeblair: and will do - also, when you get a spare second, https://review.openstack.org/#/c/494260/ will let us do https://review.openstack.org/#/c/494281/ | 18:17 |
mordred | (and the depends have landed, etc) | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Allow requesting secrets by a different name https://review.openstack.org/494343 | 18:19 |
mordred | nobody should let me code | 18:19 |
jeblair | mordred: hire 4 devs and you have a deal. otherwise, no dice. | 18:21 |
mordred | it's literally the same thing I fixed between patch 1 and patch 2 - just one line down | 18:23 |
jeblair | mordred: ara: +2 on the first, -1 on the second (one of us is confused) | 18:25 |
*** electrofelix has quit IRC | 18:30 | |
mordred | jeblair: I just got another ANSIBLE PARSE ERROR | 18:30 |
jeblair | okay so that's a thing :( | 18:31 |
jeblair | mordred: i wonder if our ansible version is different on the new executors | 18:31 |
jeblair | mordred: left some comments on secrets change | 18:33 |
mordred | jeblair: well - I got it on ze01 - but I have now officially switched to trying to figure out what's up | 18:34 |
jeblair | hrm, ansible 2.3.2.0 on ze01 and ze04. | 18:34 |
jeblair | mordred: ok cool. i have to afk now, back after lunch. | 18:34 |
mordred | yah - and both with the same python version | 18:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Allow requesting secrets by a different name https://review.openstack.org/494343 | 18:39 |
Shrews | for key in result_dict.keys(): | 18:43 |
Shrews | if key != 'changed': | 18:43 |
Shrews | result_dict.pop(key) | 18:43 |
Shrews | mordred: that seems suspicious | 18:44 |
Shrews | the "if '_zuul_nolog_return'" portion of v2_runner_on_ok() | 18:44 |
mordred | Shrews: yes - I agree- that looks very bad | 18:46 |
mordred | I _think_ there are two issues here ... one is an issue in the callback plugin - the second is that there was a problem connecting | 18:47 |
Shrews | mordred: interestingly, i can reproduce that 'dict changed size' error on that exact code, but so far only in py3. we aren't running ansible under py3, are we? | 18:49 |
mordred | Shrews: the local code - the callback plugin - does run under python3 | 18:51 |
Shrews | that doesn't seem safe, since ansible isn't really py3 ready | 18:52 |
mordred | the core folks said core is py3 ready - it's modules that are the problem (we chatted with them about it) | 18:52 |
*** jesusaurum is now known as jesusaur | 18:52 | |
Shrews | ah yes, i believe i have seen a similar statement | 18:52 |
mordred | Shrews: so running ansible-playbook with python3 but setting ansible_python_interpreter so that it uses python2 on the remote nodes | 18:53 |
mordred | and toshio said if we encounter any py3 issues in core they'd consider them urgent/bad bugs | 18:53 |
Shrews | mordred: want me to put up a fix for that? | 18:56 |
mordred | Shrews: sure! thanks | 18:56 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix zuul_stream callback dict modification https://review.openstack.org/494679 | 18:58 |
Shrews | mordred: jeblair: ^^^ | 18:58 |
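Shrews' reproduction above is plain Python 3 behavior: `dict.keys()` returns a live view, and popping entries while iterating over it raises RuntimeError ("dictionary changed size during iteration", the exact error seen in the job logs). Here is a minimal sketch of the bug and the conventional fix, snapshotting the keys first; the function names are illustrative, not the actual zuul_stream code:

```python
def strip_result_buggy(result_dict):
    # Mutating the dict while iterating its live keys view
    # raises RuntimeError under Python 3.
    for key in result_dict.keys():
        if key != 'changed':
            result_dict.pop(key)


def strip_result_fixed(result_dict):
    # Copy the keys into a list first, then mutate freely.
    for key in list(result_dict.keys()):
        if key != 'changed':
            result_dict.pop(key)


d = {'changed': True, 'stdout': 'ok', 'rc': 0}
strip_result_fixed(d)
print(d)  # {'changed': True}

try:
    strip_result_buggy({'changed': True, 'stdout': 'ok', 'rc': 0})
except RuntimeError as e:
    print('caught:', e)  # caught: dictionary changed size during iteration
```

Note this is why the failure only appeared under Python 3: the Python 2 `dict.keys()` returned a fresh list, so the same loop was safe there.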
mordred | now - why that playbook couldn't connect is a whole different question to track down - especially considering we're using the persistent connection stuff | 18:59 |
mordred | SpamapS: we did wind up using persistent connections yeah? | 19:00 |
*** amoralej is now known as amoralej|off | 19:01 | |
SpamapS | mordred: fer what? sorry, I lost context. | 19:01 |
mordred | SpamapS: for ansible to remote nodes | 19:01 |
SpamapS | Oh yes we're still persistent. | 19:01 |
SpamapS | But we don't persist across runs. | 19:01 |
SpamapS | The bubblewrap dies and kills the ssh with it. | 19:02 |
mordred | hrm | 19:02 |
SpamapS | Not doing that proved pretty squirrelly (Technical Term) | 19:02 |
SpamapS | I think it would be doable, but we had to get down into more ansible guts to do it right IIRC. | 19:03 |
pabelanger | mordred: jeblair: good news, pip install worked in bwrap on executor | 19:07 |
pabelanger | jeblair: I think we also need to restart zuulv3.o.o for nodepool.cloud variable. I can see nodepool adding it to zookeeper, but still not showing up in our inventory files. | 19:10 |
pabelanger | I only reset our executors | 19:10 |
mordred | pabelanger: pip install --user yeah? | 19:12 |
pabelanger | mordred: yes | 19:13 |
pabelanger | mordred: it is missing our pip.conf settings, so we'll have to iterate on that too | 19:14 |
mordred | pabelanger: good point | 19:14 |
mordred | pabelanger: we'll need to be careful with that - as executors are in dfw, so we can't use any nodepool variables that we normally would for setting mirrors | 19:18 |
Shrews | hrm, i wonder if https://review.openstack.org/494679 will need to be force merged | 19:20 |
*** isaacb has joined #zuul | 19:21 | |
pabelanger | mordred: ++ | 19:23 |
mordred | pabelanger: we might need to have a special role or version of configure-mirror for setting up only ~/.pydistutils and ~/.pip.conf? | 19:24 |
mordred | pabelanger: oh - I've got an idea ... | 19:25 |
pabelanger | I'm going to restart zuulv3.o.o to pick up new inventory variables for nodepool.cloud | 19:25 |
mordred | kk | 19:25 |
pabelanger | mordred: also waiting for your idea | 19:25 |
pabelanger | zuulv3.o.o back | 19:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Allow configure-mirrors to do only homedir https://review.openstack.org/494684 | 19:30 |
mordred | pabelanger: how about that ^^ ? | 19:30 |
pabelanger | ya, that might work | 19:31 |
mordred | pabelanger: then in the publish playbook you're working on, since that's specific to openstack anyway, you can just do role: configure-mirrors: mirror_fqdn: mirror.dfw.rax.openstack.org only_homedir: true | 19:31 |
mordred | s/role: /roles: - / | 19:31 |
mordred | Shrews: I'm curious as to why this issue with the dict just started all of a sudden | 19:33 |
Shrews | yah, me too | 19:33 |
jeblair | Shrews: thanks! +3 | 19:33 |
mordred | also - that's in if '_zuul_nolog_return' in result_dict: which only occurs in like one place | 19:33 |
Shrews | like i said, i don't think it will merge w/o magic | 19:34 |
jeblair | Shrews: oh why not? | 19:34 |
Shrews | b/c zuulv3 has the buggy version, causing the failure, yeah? | 19:34 |
jeblair | Shrews: i thought it was sporadic | 19:34 |
Shrews | jeblair: my patch failed w/ the same error | 19:35 |
Shrews | so... less sporadic now? dunno | 19:35 |
mordred | jeblair: turning on -vvv to the logs requires a restart right? or do we have a command for that? | 19:35 |
jeblair | mordred: zuul-executor verbose | 19:35 |
mordred | I'm going to turn that on | 19:36 |
jeblair | kk | 19:36 |
jeblair | mordred: we have 4 executors running now | 19:36 |
jeblair | mordred: if you want to scale that back to 1, i think that'd be fine | 19:36 |
mordred | jeblair: I actually haven't seen the error anywhere OTHER than ze01 fwiw | 19:37 |
jeblair | mordred: curiouser and curiouser | 19:37 |
mordred | yah .... | 19:37 |
Shrews | and failed yet again :( | 19:37 |
Shrews | weird it's just showing up | 19:37 |
jeblair | Shrews: okay i'll push it through | 19:37 |
mordred | Aug 17 19:28:12 ze01 kernel: [6224688.842500] traps: ansible-playboo[28587] general protection ip:50ee24 sp:7ffe1c61e848 error:0 in python3.5[400000+3a8000] | 19:38 |
jeblair | mordred: if you're ready for that -- unless you want..... | 19:38 |
mordred | we're getting segfaults | 19:38 |
mordred | from ansible-playbook | 19:38 |
jeblair | i um | 19:38 |
mordred | that's in /var/log/syslog | 19:38 |
SpamapS | woof | 19:38 |
Shrews | neat | 19:38 |
mordred | which have not been in the syslog until just recently | 19:39 |
mordred | that, in fact, was the first occurrence | 19:39 |
jeblair | mordred: so not 1:1 with these errors? | 19:40 |
Shrews | 2.4 wasn't released was it? | 19:40 |
jeblair | Shrews: we're on 2.3.2.0 | 19:41 |
jeblair | on all 4 executors | 19:41 |
Shrews | k. i knew 2.4 was very close | 19:41 |
mordred | jeblair: no - there's only a few of them | 19:41 |
jeblair | mordred: i think the segfaults have been around a while | 19:48 |
mordred | awesome | 19:48 |
jeblair | mordred: they're in all our syslogs, back to aug 11 | 19:48 |
mordred | maybe that happens when we restart an executor with a playbook running? | 19:48 |
jeblair | mordred: maybe? | 19:48 |
mordred | thing to keep our eyes on at least | 19:48 |
jeblair | i'm at least inclined to separate it from the callback issue at this point | 19:48 |
mordred | ++ | 19:49 |
mordred | agree | 19:49 |
jeblair | so we've got callback first reported (could be older -- this is manual) around 17:58 | 19:49 |
mordred | I'm rechecking the secret patch while verbose is on to see if I can get a traceback in the logs | 19:49 |
jeblair | mordred: i have a kernel of an idea but it's not fleshed out - | 19:52 |
mordred | ok | 19:53 |
jeblair | mordred: our jobdir paths just got longer, and some of the paths in an error i just checked are 120 bytes long; the shebang limit is 128. i don't know how that could come in to play. | 19:53 |
jeblair | maybe the ansible module interpolation stuff? i dunno. just brainstorming. | 19:54 |
mordred | oh. well- that could be the cause of the remote error - it's possible the issue with callback warning has been happening for a while and we just didn't notice | 19:54 |
mordred | sorry - don't know if you caught that when you came back - there are two issues - the job isn't failing because of the callback error | 19:54 |
mordred | jeblair: http://logs.openstack.org/43/494343/4/check/tox-pep8/4b474ca/job-output.txt.gz#_2017-08-17_18_43_45_460997 | 19:55 |
jeblair | mordred: ah. so force-merging Shrews' change won't help. | 19:55 |
mordred | jeblair: is a real issue | 19:55 |
jeblair | right, but not one which causes ansible to fail? | 19:55 |
mordred | no - I mean that link is the link to the real issue | 19:55 |
jeblair | mordred: yes i know. i see both issues. i'm just trying to understand which is fatal. | 19:56 |
jeblair | callback, unreachable, or both? | 19:56 |
mordred | the unreachable one | 19:56 |
Shrews | might this be a good time to try autohold? | 19:56 |
mordred | callback errors are never fatal - they just cause lack of further output to log for that task | 19:56 |
jeblair | mordred: run 'zuul-executor keep' as well please so we can inspect the jobdir contents | 19:57 |
jeblair | Shrews: maybe so | 19:58 |
mordred | jeblair: done | 19:58 |
jeblair | it's also worth entertaining the idea that, if this only happens on infracloud nodes, that the network is legitimately in a worse-than-normal state. | 19:58 |
mordred | jeblair: the error is happening consistently right after Install /etc/pip.conf | 19:59 |
jeblair | infracloud had its mirror hosts replaced this morning. i also can't tie that to the problem, but it's worth noting. | 19:59 |
jeblair | (aside from the fact that the network there is incomprehensible and we have switches acting as hubs so it's anyone's guess) | 20:00 |
mordred | oh - sorry - it's happening consistently ON Install /etc/pip.conf | 20:00 |
pabelanger | Ya, new mirrors and slow networking | 20:00 |
jeblair | mordred: oh, so that's between tasks within one playbook; it's not on a playbook boundary. so that reduces (not eliminates) likelihood of it being a network problem since that *should* be cached. | 20:01 |
jeblair | (pydistutils.cfg task comes after pip.conf task) | 20:02 |
mordred | yah | 20:02 |
mordred | this makes me think back to your shebang path thing | 20:02 |
jeblair | mordred: get a hit with vvv and keep yet? | 20:02 |
mordred | /var/lib/zuul/builds/4b474ca8037b49c3afd351b9dee20611/.ansible/remote_tmp/ansible-tmp-1502995425.1263802-116590003515092 is what it reports as the remote_tmp | 20:02 |
mordred | jeblair: with vvv yes | 20:02 |
mordred | nothing new interesting in log | 20:03 |
jeblair | okay, rechecking | 20:03 |
jeblair | maybe with keep we can grep for long paths | 20:03 |
jeblair | ansible-tmp-1502995425.1263802-116590003515092 looks like it may be a variable length path component too | 20:03 |
mordred | jeblair: in log .. | 20:04 |
mordred | b'mkdir: cannot create directory \\xe2\\x80\\x98/var/lib/zuul\\xe2\\x80\\x99: Permission denied\\n')" | 20:04 |
mordred | mkdir -p \\"` echo /var/lib/zuul/builds/9ed3901c28ff4e72b4b6233cbb08fce2/.ansible/remote_tmp/ansible-tmp-1502999815.994324-89732834853800 `\\" | 20:05 |
mordred | is the command it reports associated with that I believe | 20:05 |
mordred | jeblair: http://paste.openstack.org/show/618720/ <-- relevant section in whole | 20:06 |
jeblair | \\xe2\\x80\\x98 is utf-8 for "Left single quotation mark" | 20:08 |
mordred | yah. '‘/var/lib/zuul’' is what I got from python decode | 20:08 |
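The decode mordred describes can be checked directly; `\xe2\x80\x98` and `\xe2\x80\x99` are the UTF-8 encodings of U+2018/U+2019, the curly quotation marks GNU coreutils puts around file names in error messages (a standalone snippet, not code from Zuul):

```python
# The raw bytes as they appear (escaped) in the executor log line above.
msg = b'mkdir: cannot create directory \xe2\x80\x98/var/lib/zuul\xe2\x80\x99: Permission denied\n'

# Decoding recovers the human-readable mkdir error with curly quotes.
print(msg.decode('utf-8').strip())
# mkdir: cannot create directory ‘/var/lib/zuul’: Permission denied
```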
*** jkilpatr has quit IRC | 20:09 | |
jeblair | /var/lib/zuul should pretty much either exist (yes) or not exist within bwrap. | 20:10 |
jeblair | er remote_tmp? | 20:11 |
jeblair | is this happening on the remote host? | 20:11 |
mordred | yah- I think that's it trying to make the remote_tmp dir - and yes, I think so? | 20:11 |
mordred | yes: b"<15.184.65.176> (1, b'', b'mkdir: cannot create directory \\xe2\\x80\\x98/var/lib/zuul\\xe2\\x80\\x99: Permission denied\\n')" | 20:11 |
jeblair | so that's changed recently too, from /tmp/.... to /var/lib/zuul/.... | 20:12 |
jeblair | because the remote_tmp is set to the same path as the local_tmp | 20:12 |
jeblair | i guess when we did that, we didn't expect the local tmp path to be anything different | 20:12 |
jeblair | why would this *sometimes* work? | 20:12 |
pabelanger | is it failing all the time now? | 20:13 |
mordred | yah. so maybe we have some hosts missing a writable /var/lib/zuul? have we changed /var/lib/zuul on base images any time recently? | 20:13 |
mordred | and yah - I think we should try the autohold feature - it might be good to look at a node and figure out what's up with it | 20:14 |
jeblair | pabelanger: i don't think so. | 20:14 |
jeblair | mordred: you want to do that? 'zuul autohold' on zuulv3.o.o | 20:14 |
mordred | kk | 20:14 |
jeblair | mordred: maybe let's add it for tox-pep8 tox-py35 tox-docs and tox-cover | 20:15 |
mordred | the interface says job is optional ... | 20:15 |
mordred | can I just add it for openstack-infra/zuul ? | 20:15 |
mordred | Shrews: ^^ ? | 20:15 |
jeblair | i'll work on a patch that changes remote_tmp | 20:16 |
mordred | and does project need to be git.openstack.org/openstack-infra/zuul ? | 20:16 |
mordred | ok. tenant job and reason are required | 20:16 |
mordred | Shrews: I would like to report a bug in the helptext :) | 20:16 |
Shrews | i think openstack-infra/zuul will be enough since it's unique | 20:17 |
pabelanger | jeblair: I am wondering if we had older version of ansible-playbook and when I restarted ze01 this morning, it started using ansible 2.3.2.0 | 20:17 |
mordred | neat | 20:17 |
mordred | configparser.NoSectionError: No section: 'gearman' | 20:17 |
mordred | oh. I should be not mordred | 20:18 |
mordred | I have submitted an autohold for tox-pep8 tox-py35 tox-docs and tox-cover | 20:18 |
mordred | I'm going to recheck | 20:19 |
Shrews | mordred: | 0000017891 | infracloud-vanilla | nova | ubuntu-xenial | fbc53ee2-2044-4710-b276-a0fb73cc623e | hold | 00:00:00:02 | unlocked | ubuntu-xenial-infracloud-vanilla-0000017891 | 15.184.65.200 | 15.184.65.200 | | 22 | nl01-6551-PoolWorker.infracloud-vanilla-main | openstack git.openstack.org/openstack-infra/zuul tox-pep8 | track down connectivity problem | | 20:20 |
Shrews | fyi | 20:20 |
pabelanger | 2017-08-16 20:03:48,173 DEBUG zuul.AnsibleJob: [build: 917b0d5c1778477eae27ca466ef70a66] Job root: /var/lib/zuul/builds/917b0d5c1778477eae27ca466ef70a66 was the first time we started using /var/lib/zuul/builds on ze01.o.o | 20:20 |
pabelanger | which resulted in http://logs.openstack.org/10/494310/1/check/tox-py35/917b0d5/job-output.txt.gz error | 20:21 |
mordred | Shrews: woot | 20:22 |
pabelanger | 0ce80c17f2004e098b2092e37204cdc2 was the last job to run using /tmp, and didn't have that issue | 20:22 |
pabelanger | mind you the job was aborted | 20:22 |
Shrews | | 0000017896 | inap-mtl01 | nova | ubuntu-xenial | 335daeb7-e5f5-4fe2-b28b-d6774d877701 | hold | 00:00:01:59 | unlocked | ubuntu-xenial-inap-mtl01-0000017896 | 198.72.124.67 | 198.72.124.67 | | 22 | nl01-6551-PoolWorker.inap-mtl01-main | openstack git.openstack.org/openstack-infra/zuul tox-docs | track down connectivity problem | | 20:23 |
Shrews | is the other one so far | 20:23 |
mordred | jeblair: there is, in fact, no /var/lib/zuul on the node | 20:23 |
pabelanger | Oh, is that on the remote side? | 20:24 |
pabelanger | Ya, that explains it | 20:24 |
mordred | yah | 20:24 |
pabelanger | we used /tmp | 20:24 |
mordred | jeblair: OH ! I've got the whole thing now | 20:24 |
mordred | jeblair: ze01 is the only executor using /var/lib/zuul | 20:24 |
mordred | the others are still using /tmp | 20:24 |
mordred | that's why it only fails sometimes | 20:24 |
pabelanger | Ya | 20:24 |
mordred | and always on ze01 | 20:24 |
mordred | PHEW | 20:25 |
mordred | well - we know the entire story now :) | 20:25 |
mordred | and it turns out the error is, in fact, that it can't create a directory | 20:25 |
pabelanger | ya | 20:25 |
pabelanger | /tmp is 777 | 20:25 |
pabelanger | we should likely have it use /home/zuul or ansible_ssh_user on the remote side | 20:25 |
jeblair | why are the other executors not using varlibzuul? | 20:25 |
pabelanger | I don't see puppet running on ze02 | 20:26 |
pabelanger | did we accept ssh keys on puppetmaster? | 20:26 |
jeblair | pabelanger: probably not | 20:27 |
jeblair | why isn't that part of launch-node? | 20:27 |
pabelanger | Ya, hostkeys aren't accepted on puppetmaster | 20:28 |
jeblair | pabelanger: would you mind fixing that please? | 20:28 |
pabelanger | jeblair: not sure, it has always been a manual process since I've created nodes | 20:28 |
pabelanger | sure | 20:28 |
pabelanger | accepted now | 20:30 |
*** jkilpatr has joined #zuul | 20:30 | |
pabelanger | I cannot remember why we changed remote_tmp in ansible.cfg | 20:31 |
jeblair | mordred, pabelanger, Shrews: i need help with this. this is the reason the local and remote directories have the same root: https://review.openstack.org/346880 | 20:31 |
pabelanger | was it something with async? | 20:31 |
jeblair | keep in mind, that's for zuulv2.5. i haven't worked out if it's applicable still. | 20:31 |
mordred | jeblair: my reading there is that it's mostly about async | 20:32 |
pabelanger | Ya, that is what I seem to remember too | 20:32 |
mordred | oh - also - we don't set keep_remote_files anymore | 20:33 |
mordred | # NB: when setting pipelining = True, keep_remote_files | 20:33 |
mordred | # must be False (the default). Otherwise it apparently | 20:33 |
mordred | # will override the pipelining option and effectively | 20:33 |
mordred | # disable it. | 20:33 |
mordred | so I think both reasons we did that are now gone | 20:33 |
jeblair | mordred, pabelanger: okay -- two options: 1) we can remote remote_tmp and revert to default behavior. or 2) i can create a new local tmpdir explicitly for setting remote_tmp (this will work both locally and remotely, and we can make sure it's cleaned up with jobdir). | 20:33 |
pabelanger | right, I seem to remember that also | 20:34 |
pabelanger | I am open to trying option 1 | 20:34 |
mordred | me too | 20:34 |
jeblair | option 1 should say '*remove* remote_tmp' but i bet you got that | 20:34 |
pabelanger | +1 | 20:35 |
mordred | our setting of this seems mostly to have been about working around an issue with two things we don't use anymore - so I vote for option 1 because it's less complexity | 20:35 |
mordred | and if there IS a reason we need to be explicit about it - I'd like to learn that reason and document it | 20:35 |
mordred | jeblair: that said - I'm not *opposed* to 2 and think that would also work fine - I just don't think we need it | 20:36 |
Shrews | mordred: don't forget to delete your held nodes when you're done with them | 20:36 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove remote_tmp setting https://review.openstack.org/494695 | 20:36 |
mordred | Shrews: what's my process for that? | 20:36 |
Shrews | mordred: nodepool delete ID .... i can do it for you since i'm already there | 20:37 |
jeblair | mordred, pabelanger, Shrews: ^ there's that. | 20:37 |
mordred | Shrews: cool. thanks | 20:37 |
mordred | jeblair: I'm going to turn off verbose and keep | 20:37 |
Shrews | mordred: safe to delete them now? | 20:37 |
jeblair | we can merge that by shutting down ze01, or force-merge it and restart all of them. | 20:37 |
mordred | Shrews: yah | 20:37 |
mordred | jeblair: I'm fine with either approach - we should restart all of them anyway though to pick up that change | 20:38 |
mordred | jeblair: so maybe shut down ze01, land that change, then restart all | 20:38 |
mordred | I'm on ze01 and can shut it down real quick | 20:39 |
pabelanger | mordred: jeblair: just thinking, with bwrap now, we might also be able to drop local_tmp too. Since each ansible-playbook is namespaced | 20:40 |
*** isaacb has quit IRC | 20:40 | |
pabelanger | also, I have +2'd 494695 | 20:40 |
jeblair | mordred: ++ | 20:40 |
jeblair | let's make sure we get that and shrews change in place for the restarts | 20:40 |
mordred | done | 20:40 |
mordred | the Shrews change landed already I believe yeah? | 20:41 |
jeblair | nope 494679 i just rechecked it | 20:41 |
jeblair | should go in now | 20:41 |
jeblair | and i self-approved my change as it has 2x+2 | 20:42 |
jeblair | (my change may still fail check if it hit ze01 in which case we may need to recheck) | 20:42 |
mordred | jeblair: k. we're gonna have to recheck your change - it managed to get three fails before ze01 went away | 20:42 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove remote_tmp setting https://review.openstack.org/494695 | 20:43 |
jeblair | that'll speed things up. commit msg mod | 20:43 |
mordred | https://review.openstack.org/#/c/494343/ is reviewable while we're waiting | 20:43 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add known_hosts generation for multiple nodes https://review.openstack.org/494700 | 20:46 |
jeblair | mordred: lgtm | 20:47 |
SpamapS | ^^ is a bit rough | 20:47 |
SpamapS | and I'm not sure we want that in the predominant base job :-P | 20:47 |
*** dkranz has quit IRC | 20:48 | |
mordred | SpamapS: dude | 20:51 |
SpamapS | can you imagine that in jinja? ;-) | 20:52 |
mordred | NO | 20:52 |
SpamapS | neither could I | 20:52 |
jeblair | now.... xpath... you could totally do that in xpath. | 20:53 |
SpamapS | you can build minecraft in xpath | 20:53 |
mordred | you can build xpath in minecraft | 20:53 |
clarkb | re ecdsa aren't you not supposed to use those at all? (Just thinking it's a lot of effort to go through to support that as it changes when you shouldn't even use them) | 20:56 |
clarkb | I apparently have ecdsa host keys on this machine though. Interesting | 20:57 |
clarkb | also we are assuming no dsa because we aren't going to talk to ancient things anymore? | 20:57 |
jeblair | it looks like all changes are now failing with this: http://logs.openstack.org/95/494695/2/check/tox-docs/efc3986/job-output.txt.gz#_2017-08-17_20_53_28_879680 | 21:02 |
jeblair | pabelanger: are we still missing a piece for the nodepool cloud mirror thing? | 21:02 |
pabelanger | jeblair: no, just waiting for puppet to apply the change to servers | 21:04 |
pabelanger | okay, so that is it | 21:04 |
pabelanger | ah | 21:04 |
pabelanger | nodepool.cloud is missing from that inventory | 21:04 |
pabelanger | why is that | 21:05 |
jeblair | pabelanger: could it be because the executors are running an old zuul? | 21:05 |
pabelanger | ya, that is possible. I didn't confirm /opt/zuul was latest version on zuul executors | 21:06 |
jeblair | okay, i'm going to fix this | 21:06 |
pabelanger | thanks | 21:06 |
pabelanger | http://logs.openstack.org/79/494679/1/check/tox-cover/a8082ad/inventory.yaml does have nodepool.cloud, which is ze01 | 21:08 |
jeblair | mordred: this one (A worker was found in a dead state) hit again: http://paste.openstack.org/show/618724/ | 21:17 |
jeblair | i'm convinced at this point that's going to kill us in production | 21:17 |
jeblair | https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/strategy/__init__.py#L584 | 21:21 |
jeblair | mordred, dmsimard|off: ara seems to be adding "Initializing new DB from scratch" to the start of our job output; is there a way to avoid that? | 21:24 |
jeblair | see top of http://logs.openstack.org/79/494679/1/check/tox-docs/4b6b007/job-output.txt.gz | 21:24 |
jeblair | 2017-08-17 21:14:22,654 DEBUG zuul.AnsibleJob: [build: 4b6b0071d22b4093aa915b76d9d713e6] Ansible output: b'ERROR! A worker was found in a dead state' | 21:26 |
jeblair | Aug 17 21:14:22 ze01 kernel: [6231058.789222] ansible-playboo[10382]: segfault at a9 ip 000000000050ee24 sp 00007fffb6c19a38 error 4 in python3.5[400000+3a8000] | 21:26 |
jeblair | mordred: ^ the "dead state" error is caused by segfault | 21:26 |
jeblair | note they check for sigsegv here: https://github.com/ansible/ansible/blob/devel/lib/ansible/executor/task_queue_manager.py#L333 | 21:28 |
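[Editor's note: the "worker was found in a dead state" message is Ansible noticing that a forked worker exited abnormally. A minimal, self-contained sketch (not Ansible's actual code) of how a parent process sees a segfaulted child on POSIX — `subprocess` reports death-by-signal as a negative return code:]

```python
# Sketch: detecting that a child process died from a segfault.
# The child deliberately sends itself SIGSEGV to simulate the crash.
import signal
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)
proc.wait()

# A negative returncode means "killed by signal -returncode".
if proc.returncode < 0:
    sig = signal.Signals(-proc.returncode)
    print(f"worker died from signal {sig.name}")  # worker died from signal SIGSEGV
```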
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove remote_tmp setting https://review.openstack.org/494695 | 21:32 |
jeblair | SpamapS: i think we need to allow ansible-playbook to write a core file. it's running in bwrap in a python subprocess. any ideas how to do that? | 21:35 |
jeblair | (also, we need to consider where it will be written) | 21:36 |
jeblair | (well, technically it's an ansible worker process that's getting the segv and will coredump, but i assume getting this to work for ansible-playbook should work for the worker process too) | 21:37 |
SpamapS | jeblair: ulimit should work inside the process namespace. We can do that with a wrapper around ansible-playbook | 21:37 |
jeblair | SpamapS: so a little shell script which does "ulimit -c unlimited; ansible-playbook ...." then we call that from our popen? | 21:41 |
SpamapS | jeblair: exactly. | 21:42 |
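[Editor's note: the shell-wrapper idea ("ulimit -c unlimited; ansible-playbook ...") can also be done from Python directly, by raising the core soft limit in the child between fork() and exec() via `subprocess`'s `preexec_fn` hook. A hedged sketch — in place of ansible-playbook, the child here is a stand-in that just reports its own limit, to show the setting survives the exec:]

```python
# Sketch: raise RLIMIT_CORE for a subprocess without a shell wrapper.
import resource
import subprocess
import sys

def _allow_core_dumps():
    # Runs in the child after fork() and before exec(); the raised limit
    # is inherited by whatever the child execs (and its own children).
    # Only the soft limit is raised; raising the hard limit needs privileges.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

# Stand-in for "ansible-playbook ...": a child that prints its soft limit.
out = subprocess.check_output(
    [sys.executable, "-c",
     "import resource; print(resource.getrlimit(resource.RLIMIT_CORE)[0])"],
    preexec_fn=_allow_core_dumps,
)
print(out.decode().strip())
```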
jeblair | cool, i'll hack that up real quick... | 21:42 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix zuul_stream callback dict modification https://review.openstack.org/494679 | 21:42 |
SpamapS | jeblair: I presume you already tried just letting the executor write cores? | 21:42 |
SpamapS | I'm not 100% sure but I thought when you make a new process namespace it takes the parent's settings. | 21:42 |
jeblair | SpamapS: i have not.... i... perhaps erroneously... thought it wouldn't make it all the way through that. | 21:42 |
jeblair | that would be much easier so maybe we should try that first. | 21:43 |
SpamapS | It may not | 21:43 |
SpamapS | you can test that with zuul-bwrap | 21:43 |
jeblair | good point | 21:43 |
jeblair | SpamapS: that works | 21:44 |
jeblair | i'll start with the init script then | 21:44 |
jeblair | now where will the core be written i wonder? | 21:45 |
SpamapS | pwd | 21:45 |
SpamapS | so if that's not writable, that's a problem | 21:46 |
SpamapS | I believe we chdir to / of the bwrap | 21:46 |
jeblair | well, it's more that it's in the jobdir so it will be deleted | 21:46 |
jeblair | so i guess we're going to need to run with 'keep' enabled for a while to catch this | 21:46 |
SpamapS | ah no, no chdir | 21:46 |
jeblair | the popen has cwd=self.jobdir.work_root | 21:47 |
EmilienM | jeblair: fyi, dmsimard is on pto and back next week IIRC | 21:47 |
jeblair | so it'll be in jobdir/work i think | 21:47 |
SpamapS | we chdir to {workdir} so yay | 21:47 |
jeblair | EmilienM: thanks | 21:47 |
SpamapS | jeblair: yeah and the bwrap also does --chdir {workdir} | 21:48 |
SpamapS | is the kernel the same version on all nodes btw? | 21:54 |
SpamapS | just wondering | 21:54 |
jeblair | SpamapS: ze01 is older 4.4.0-79-generic vs 4.4.0-92-generic | 21:56 |
jeblair | SpamapS: you reckon i should reboot? | 21:56 |
SpamapS | Not the worst idea, if for no other reason than normalization | 21:57 |
jeblair | SpamapS: yeah; though to increase the chances of catching the error, i was planning on only keeping ze01 in service | 21:57 |
SpamapS | that 4.4.0 kernel in xenial is particularly insidious .. it's really a bunch of post-4.4.0 patches masquerading as 4.4.0 | 21:58 |
jeblair | but if something in the kernel has altered it (fixed or made worse), i'd rather know sooner, so i'll reboot | 21:58 |
SpamapS | Yeah I can't point to anything and go "It's that!" but... that's hundreds of patches. | 21:59 |
jeblair | i'll wait till the current crop of tests finish | 21:59 |
jeblair | we got 4-6 of these an hour today | 21:59 |
SpamapS | I remember there was a really bad bug in the python in Ubuntu 14.04 that broke our tests for a while. I wonder if we have something similar going on. | 22:01 |
clarkb | that bug was in the garbage collector trying to free already freed entries in a circular list | 22:16 |
clarkb | so possible you've found another one of those in python | 22:16 |
jeblair | clarkb: py3.5? | 22:19 |
clarkb | it was 3.4 | 22:19 |
jeblair | ze01 is rebooted, running latest kernel, with ulimit -c unlimited and keep enabled | 22:19 |
jeblair | so we're looking for Ansible output: b'ERROR! A worker was found in a dead state' in the log after 2017-08-17 22:00 | 22:20 |
pabelanger | k | 22:22 |
jeblair | i'm going to recheck all zuul patches. | 22:22 |
*** openstackgerrit has quit IRC | 22:33 | |
jeblair | oho 3 segfaults i think | 22:37 |
jeblair | yeah, timestamps line up | 22:37 |
SpamapS | status lighting up like a christmas tree | 22:37 |
jeblair | no core files though :( | 22:38 |
SpamapS | blurgh | 22:43 |
*** openstackgerrit has joined #zuul | 22:47 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove callback_whitelist setting https://review.openstack.org/488214 | 22:47 |
dmsimard|off | jeblair: re - db initialization in ARA, not yet but it's on my to-do. | 22:50 |
* mordred now has working water in his house and does NOT have water running down the street from the broken sprinkler pipe | 22:52 | |
mordred | jeblair: have I missed anything new or exciting? | 22:54 |
jeblair | SpamapS: i used /proc to verify that the zuul process was ulimit -c 0 still (maybe something something systemd? or start-stop-daemon? or python-daemon?) | 22:54 |
jeblair | SpamapS: i use prlimit (which i just learned about!) to set it to unlimited while running (!!!!one!) | 22:55 |
jeblair | mordred: yeah; the scrollback about segfaults is interesting; that's what i'm poking at | 22:55 |
SpamapS | jeblair: ooooooooooooooooooo neat | 22:55 |
jeblair | now waiting for another segfault, post 22:55 utc | 22:56 |
jeblair | oh wow all the jobs are done? | 22:56 |
jeblair | i guess most of those were merge conflicts | 22:56 |
mordred | jeblair: just read - lovely! | 22:57 |
jeblair | rechecking all changes again | 22:57 |
mordred | jeblair, dmsimard|off: also, I'm very curious as to how the db init line from ara got in to the zuul-stream log file | 22:57 |
dmsimard|off | mordred: me too, likely faulty logging config from ara | 22:58 |
mordred | aha. nod. I can squelch that | 22:59 |
jeblair | it looks like web log streaming may be broken | 22:59 |
dmsimard|off | logging config (and logging in general) in ara is either horrible or nonexistent so it wouldn't surprise me | 22:59 |
jeblair | direct finger to ze01 works | 22:59 |
jeblair | mordred, dmsimard|off: we're also getting alembic log lines to ansible stdout/stderr (so they're showing up in the executor debug log) | 23:01 |
jeblair | 2017-08-17 22:59:07,638 DEBUG zuul.AnsibleJob: [build: 52eab059248f4642b08660e3988cfc64] Ansible output: b'INFO [alembic.runtime.migration] Running upgrade 22aa8072d705 -> 5716083d63f5, ansible_metadata' | 23:01 |
jeblair | eg ^ | 23:01 |
dmsimard|off | jeblair: that would also be ara, yes | 23:01 |
jeblair | that's not user visible, but it does make the executor logs chatty so would be nice to clean up (the entries showing up in the console log are more important to nix before production) | 23:02 |
dmsimard|off | https://storyboard.openstack.org/#!/story/2000931 is aptly described as "Get rid of alembic foreground logging, it's pretty annoying" | 23:03 |
jeblair | it's only 7% of our log output at the moment. :) | 23:03 |
dmsimard|off | I'll add a logging general topic to my 1.0 checklist sir :) | 23:04 |
jeblair | 2017-08-17 23:00:56,323 DEBUG zuul.AnsibleJob: [build: 5aaa32661bae496ba0e01006aa3d67cb] Ansible output: b'ERROR! A worker was found in a dead state' | 23:04 |
jeblair | ding! | 23:04 |
pabelanger | winner winner, chicken dinner | 23:06 |
dmsimard|off | if I keep piling up those 1.0 ARA todo's I'm going to pull a Zuul v3 and ship it in 2 years at this rate :D | 23:06 |
* dmsimard|off hides | 23:06 | |
jeblair | SpamapS, mordred: still no core file | 23:08 |
clarkb | jeblair: is pam resetting ulimits? | 23:08 |
clarkb | it may do that if there are user switches happening | 23:08 |
clarkb | you should be able to check via /proc I think | 23:08 |
jeblair | clarkb: i just checked an ansible-playbook process via proc and it has core set to unlimited. | 23:09 |
SpamapS | jeblair: work_dir is rw mounted right? | 23:09 |
jeblair | SpamapS: yes, it's jobdir/work so should be writable in all cases | 23:10 |
jeblair | cat /proc/sys/kernel/core_pattern | 23:10 |
jeblair | |/usr/share/apport/apport %p %s %c %P | 23:10 |
* jeblair is not amused | 23:11 | |
jeblair | SpamapS: any chance you know if that has caused the core dumps to go to some place where i can recover them? | 23:13 |
mordred | jeblair: maybe we should uninstall apport | 23:14 |
jeblair | /var/crash is empty | 23:14 |
clarkb | ianw may know since there was debugging of cores from apache I think | 23:14 |
jeblair | echo -n "core" > /proc/sys/kernel/core_pattern | 23:15 |
jeblair | i did that | 23:15 |
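[Editor's note: the apport detour happens because Ubuntu sets /proc/sys/kernel/core_pattern to a pipe (`|/usr/share/apport/apport ...`), so the kernel hands cores to a helper instead of writing a file in the crashing process's cwd. A small illustrative check — the helper function is ours, not a full core_pattern parser:]

```python
# Sketch: check whether the kernel will pipe core dumps to a helper
# (e.g. apport) rather than writing a core file.
def cores_are_piped(pattern: str) -> bool:
    # A leading "|" in core_pattern means "pipe the core to this program".
    return pattern.strip().startswith("|")

with open("/proc/sys/kernel/core_pattern") as f:
    pattern = f.read()

if cores_are_piped(pattern):
    helper = pattern.strip().lstrip("|").split()[0]
    print("core dumps are piped to:", helper)
else:
    # A plain pattern (default "core") is written relative to the
    # crashing process's current working directory.
    print("core files written as:", pattern.strip() or "core")
```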
mordred | Apport is not enabled by default in stable releases, even if it is installed. The automatic crash interception component of apport is disabled by default in stable releases for a number of reasons: | 23:15 |
mordred | I beg to differ with them | 23:15 |
jeblair | now looking for a segfault after 23:15 | 23:16 |
jeblair | another global recheck | 23:16 |
mordred | jeblair: echo "/tmp/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern | 23:17 |
jeblair | mordred: does /tmp/cores need to be world-writable? | 23:18 |
jeblair | if i were to do that? | 23:18 |
mordred | jeblair: unsure - reading more | 23:19 |
mordred | jeblair: so - actually your thing seems better | 23:19 |
mordred | jeblair: "core" is the default and will put a core file in the current dir | 23:20 |
jeblair | k. i *think* we're set up for that, but if we have problems, we can try with tmp | 23:20 |
mordred | cool | 23:20 |
jeblair | -rw------- 1 zuul zuul 84430848 Aug 17 23:27 /var/lib/zuul/builds/33ff13fb20cf44dea20078f1ea61eac5/work/core | 23:29 |
jeblair | finally! | 23:29 |
SpamapS | jeblair: woot | 23:30 |
jeblair | #0 0x000000000050ee24 in visit_decref () at ../Modules/gcmodule.c:373 | 23:31 |
jeblair | that function is the same in 3.5.2 and tip of master | 23:37 |
clarkb | I want to say that may be where the old bug was | 23:38 |
jeblair | ianw: ^ is any of this looking familiar? | 23:38 |
clarkb | they do list traversals and decrement references. Problem is its a circular list so if you don't clean pointers up properly on last pass you can hit old freed memory on the next pass and attempt to free already freed memory | 23:39 |
jeblair | clarkb: was there a fix for that? | 23:39 |
clarkb | yes, the fix came from 3.5 iirc | 23:39 |
clarkb | I'm finding links now | 23:40 |
clarkb | https://bugs.launchpad.net/ubuntu/+source/python3.4/+bug/1367907 is what I filed in ubuntu to fix trusty, http://bugs.python.org/issue21435 is bug upstream in python | 23:41 |
openstack | Launchpad bug 1367907 in python3.4 (Ubuntu Trusty) "Segfault in gc with cyclic trash" [Undecided,In progress] | 23:41 |
SpamapS | Ubuntu is usually _really_ close to mainline releases on python. | 23:41 |
clarkb | bt is different | 23:41 |
clarkb | so this is probably a new bug | 23:41 |
ianw | jeblair: apart from the unfortunate familiarity of spending 97% of the time just trying to get a coredump, no sorry :/ | 23:41 |
jeblair | https://etherpad.openstack.org/p/uHxJU6CrXW | 23:42 |
ianw | can you do a bt full or is it crazy? | 23:47 |
clarkb | http://bugs.python.org/issue31181 | 23:48 |
clarkb | interesting that they seem to think it is related to yaml | 23:48 |
jeblair | clarkb: and that's py27 | 23:48 |
clarkb | http://bugs.python.org/issue27945 seems like a more robust reporting of what may be this issue and last bug is dup of it | 23:48 |
clarkb | jeblair: ya, ^ is all the versions though and I think is the same bug | 23:49 |
clarkb | there are patches too | 23:49 |
jeblair | ianw: will check in a sec | 23:49 |
clarkb | is it just me or does gerrit make it a million times easier to understand what the actual fixes are | 23:50 |
clarkb | https://github.com/python/cpython/commit/2f7f533cf6fb57fcedcbc7bd454ac59fbaf2c655 | 23:50 |
jeblair | ianw: http://paste.ubuntu.com/25335706/ | 23:53 |
clarkb | it's not too difficult to build your own python particularly on debuntu | 23:54 |
clarkb | you could likely grab that patch and see if it fixes | 23:54 |
clarkb | (on suse its a major pita because everything needs patching by distro) | 23:54 |
mordred | jeblair, clarkb: since it mentions yaml - i wonder if there is any difference between cyaml and non-cyaml | 23:59 |
clarkb | mordred: looking at the second bug it seems to be related to a large rewrite of the dict type | 23:59 |
mordred | ah | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!