Tuesday, 2017-10-31

clarkbinfra-root can I get https://review.openstack.org/#/c/516473/ reviewed and merged so that I don't have to disable puppet on logstash.o.o?00:00
clarkbit just restarted the daemon with the broken config (I'm going to restart with manually fixed config now)00:00
*** andreww has joined #openstack-infra00:00
*** ijw has quit IRC00:01
fungiit's reviewed and approved00:01
*** ijw has joined #openstack-infra00:01
*** bobh has quit IRC00:02
clarkbtyty00:02
*** gouthamr has quit IRC00:05
* clarkb pops out to make dinner00:06
fungisubunit-worker02 seems to have gear 0.11.0 installed now, so i'm going to restart the worker on it00:06
fungimaybe it'll catch back up00:07
*** armaan__ has quit IRC00:10
*** markvoelker_ has quit IRC00:11
*** dingyichen has joined #openstack-infra00:11
*** ijw has quit IRC00:14
*** ijw has joined #openstack-infra00:14
fungii need to knock off for the evening. i'll be semi-around tomorrow for meetings but need to take some time to finish prepping for a very long flight on wednesday-friday00:16
clarkbgood night00:17
fungithanks, you too00:17
clarkband ya I'll be around but also packing/prepping00:17
*** ijw has quit IRC00:19
openstackgerritMerged openstack-infra/system-config master: Remove zl's from jenkins-logstash-client config  https://review.openstack.org/51647300:21
clarkb00:24
clarkbwhoops00:24
*** markvoelker has joined #openstack-infra00:25
*** LindaWang has quit IRC00:30
pabelangerclarkb: ah, we should have landed https://review.openstack.org/515181/ :)00:31
pabelangerI can rebase quickly00:32
clarkbah sorry00:32
*** bobh has joined #openstack-infra00:33
*** thorst has quit IRC00:34
*** ijw has joined #openstack-infra00:35
*** owalsh_pto has quit IRC00:36
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Remove zuul-launcher support  https://review.openstack.org/51518100:37
pabelangerianw: clarkb: okay, should be rebased and fully removes zuul-launchers00:37
*** hongbin has joined #openstack-infra00:39
pabelangerugh, another failure on logstash-worker00:40
pabelangerI'll have to pick it up in the morning00:40
*** bobh has quit IRC00:42
*** xingchao has joined #openstack-infra00:45
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:48
*** owalsh_ has joined #openstack-infra00:49
*** owalsh has joined #openstack-infra00:51
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:51
*** mat128 has joined #openstack-infra00:53
*** owalsh- has joined #openstack-infra00:53
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364500:54
*** zhurong has joined #openstack-infra00:54
*** owalsh_ has quit IRC00:54
*** owalsh_ has joined #openstack-infra00:55
*** owalsh has quit IRC00:56
*** owalsh has joined #openstack-infra00:58
*** cuongnv has joined #openstack-infra00:59
*** owalsh- has quit IRC00:59
*** namnh has joined #openstack-infra01:00
*** owalsh- has joined #openstack-infra01:01
*** LindaWang has joined #openstack-infra01:01
*** kiennt26 has joined #openstack-infra01:01
*** owalsh_ has quit IRC01:01
*** ijw has quit IRC01:03
*** ijw has joined #openstack-infra01:03
openstackgerritMerged openstack-infra/system-config master: Switch to cgit from gitweb  https://review.openstack.org/51197001:03
*** ijw has joined #openstack-infra01:04
*** yamahata has quit IRC01:04
*** owalsh_ has joined #openstack-infra01:04
*** owalsh has quit IRC01:04
*** baoli has joined #openstack-infra01:05
openstackgerritMerged openstack-infra/system-config master: Remove unneeded encoding change.  https://review.openstack.org/51558001:06
*** rhallisey has quit IRC01:07
*** owalsh has joined #openstack-infra01:07
*** xingchao has quit IRC01:08
*** owalsh- has quit IRC01:08
*** dhinesh has joined #openstack-infra01:08
*** ijw has quit IRC01:08
*** owalsh- has joined #openstack-infra01:10
*** owalsh_ has quit IRC01:11
*** smatzek has joined #openstack-infra01:14
openstackgerritMerged openstack-infra/system-config master: Add stretch mirror for ceph  https://review.openstack.org/51359101:14
*** owalsh has quit IRC01:14
yamamotowhy does logstash.o.o show files which are not in jenkins-log-client.yaml? (like logs/screen-gnocchi-metricd.txt.gz)01:17
*** bobh has joined #openstack-infra01:17
yamamotois there another list?01:17
clarkbyamamoto: with the switch to zuulv3 we switched to regex-based listing of what is on disk for the job01:18
clarkbso anything matching the regex is now pushed, you can see that in project-config/roles iirc01:18
*** Apoorva_ has joined #openstack-infra01:19
*** rlandy has quit IRC01:20
yamamotoclarkb: so jenkins-log-client.yaml is no longer relevant?01:21
*** psachin has joined #openstack-infra01:21
clarkbfor the most part that is correct01:21
clarkbwe still use it to run the gearman server01:21
*** psachin has quit IRC01:22
*** Apoorva has quit IRC01:22
*** psachin has joined #openstack-infra01:22
*** Apoorva_ has quit IRC01:23
yamamotoclarkb: i got it. thank you01:24
*** larainema has joined #openstack-infra01:24
*** bobh has quit IRC01:26
yamamotois it expected that logstash.o.o shows both job-output.txt and job-output.txt.gz?01:31
*** smatzek has quit IRC01:32
*** LindaWang has quit IRC01:34
clarkbit should just be one per job iirc01:35
clarkbif not then that is a bug01:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Create build-openstack-puppet-tarball  https://review.openstack.org/51598001:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Remove publish-openstack-puppet-branch-tarball from post pipeline  https://review.openstack.org/51598201:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Move publish-openstack-puppet-branch-tarball into ozj  https://review.openstack.org/51598101:35
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Revert "Remove publish-openstack-puppet-branch-tarball from post pipeline"  https://review.openstack.org/51598301:35
*** LindaWang has joined #openstack-infra01:37
pabelangerAJaeger: fungi: clarkb: EmilienM: mnaser: ianw: ^when you have time, that should be the last steps to getting puppet modules 'release / build' jobs as native zuulv3 jobs01:39
EmilienMnice01:39
EmilienMpabelanger: why did it work on ocata/pike/master?01:40
EmilienMand not on newton01:40
yamamotoclarkb: message:"At completion, logs for this job will be available at" for 7d seems to have both of them for many of the results01:41
*** mriedem has quit IRC01:42
clarkbyamamoto: does it have both for the same job?01:42
clarkbI think as long as it's unique per job we are ok, but if not it needs investigating01:42
pabelangerEmilienM: you'll have to ask mnaser that, maybe jobs didn't get backported properly? The correct path forward is to use the new build-openstack-puppet-tarball job, as it simplifies things greatly01:42
*** Sukhdev has quit IRC01:44
yamamotoclarkb: both with the same build_uuid01:44
*** annp has joined #openstack-infra01:44
clarkbok will have to investigate then01:45
*** edmondsw has quit IRC01:45
clarkbcan you share an example uuid/query?01:45
yamamotoclarkb: the query was message:"At completion, logs for this job will be available at"01:46
yamamotoclarkb: for 7d period01:46
yamamotoclarkb: build_uuid is eg. 2c30560da8f04966af83c1d951dd860301:46
clarkbthanks will look in the morning01:47
*** dhinesh has quit IRC01:49
mnaserEmilienM: is it possible that stable/newton didn't have puppet in bindep, which is why it didn't work?01:53
EmilienMmnaser: no, see https://review.openstack.org/#/c/515132/ which is on top of the bindep patch01:53
EmilienMmnaser: and still failing after recheck01:54
*** gmann_afk is now known as gmann01:58
*** camunoz has quit IRC01:58
pabelangermnaser: revoke-sudo is called, before sudo install01:58
*** aeng has quit IRC02:00
*** apetrich has quit IRC02:03
*** apetrich has joined #openstack-infra02:04
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Remove publish-openstack-puppet-branch-tarball  https://review.openstack.org/51598402:06
mnaserEmilienM: i wonder why it's failing for that02:07
mnaserEmilienM: oooooh02:09
mnaserone moment02:09
mnaserEmilienM: https://review.openstack.org/#/q/Id68ee1b443a4172d0c1d6d58a04908c52a566623 you can blame mwhahaha  for this one :D02:10
mnaseroh, merge conflict02:10
EmilienMI always do02:10
EmilienMI can do the git thing02:11
mnaserEmilienM: do you mind cherry picking that locally into stable/newton please02:11
mnaserthat will fix it for you02:11
*** threestrands has joined #openstack-infra02:11
*** threestrands has quit IRC02:11
*** threestrands has joined #openstack-infra02:11
mnaseryou can do Depends-On as well to get your puppet-tripleo job to be green02:11
* mnaser goes back to winter tire shopping :(02:11
EmilienMmnaser: ok02:11
EmilienMmnaser: I need to buy that also02:11
mnasernovember 15 is coming up :P02:12
EmilienMmnaser: I don't live in QUebec :P02:12
EmilienMI don't even know how it works here in BC lol02:12
EmilienMmnaser: where do you go?02:14
mnaserEmilienM: usually to Costco but they don’t carry tires in the size of my new car02:14
EmilienMok02:14
mnaserI found a place in Ottawa that has nice packages with both tires + rims so I can swap them myself02:14
EmilienMI'll let professionals do it :-D02:15
*** aeng has joined #openstack-infra02:17
mgagneisn't December 15 in Quebec?02:18
*** rkukura has quit IRC02:19
pabelangerI kinda wish I lived in northern ontario, you can put studs on winter tires02:20
mgagneyea, just read about it, you can have studs since October 1st but no laws regarding winter tires02:20
mnasermgagne: it was december 15th when they first started enforcing it, but the actual date was november 15th02:21
mnaserwait02:21
mnaserit is december 1502:21
mnaseri thought it was november 1502:21
mgagneyes, can't find anything about november02:21
*** rwsu has joined #openstack-infra02:22
mgagneso you have time, maybe someone suggested November in the news?02:22
clarkbEmilienM: are you in vancouver or victoria?02:22
mnaserlet me keep telling myself november 15 so i can get it done earlier :P02:22
* mgagne said nothing02:22
clarkbif so you probably dont need tires unless driving up.to eg whistler02:22
mnasermgagne: but also, i have summer performance tires which means the car is useless with the slightest of snow02:22
clarkbpnw is typically realtively warm and wet02:22
*** aeng has quit IRC02:22
mgagneI read you are required by law to get winter tires in some BC areas02:22
mnaserso dont wanna take any chances02:22
mgagnehehe02:23
mgagneand now to think about storing that motorcycle ^^'02:23
*** catintheroof has joined #openstack-infra02:24
mgagnetime to go home and get some rest for more Nova Mitaka upgrade tomorrow :D02:25
mnaseroh boy02:25
mnaserbonne chance02:25
mgagnethanks =)02:25
mnaserodds of a change whose console says "--- END OF STREAM ---" actually doing work?02:26
mnaser515937 legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-puppet .. the reset would be massive :(02:26
clarkbit typically is iirc02:26
clarkbthere is a bug where we don't always get a stream even though the job is running; it hasn't been sorted out yet02:26
mnaserlets hope thats the case02:27
*** bobh has joined #openstack-infra02:30
pabelangerhmm02:33
pabelangercould be 79 isn't listening again02:33
pabelangerlet me check quickly02:33
pabelangerya02:34
pabelangerfinger test@ze02.openstack.org02:34
pabelangeris down02:34
mnaseryep, it went through!02:34
*** thorst has joined #openstack-infra02:35
*** aeng has joined #openstack-infra02:35
pabelangernetstat -na | grep \:7902:35
pabelangerreturns nothing on ze0202:35
ianwpabelanger: is it a full executor restart to get that back?02:35
pabelangeryah02:36
pabelangerI think it is because of high load on the system02:36
pabelangerthen we somehow lose the socket02:36
pabelangerianw: actually02:36
pabelangerhttps://review.openstack.org/516403/02:36
pabelangerwe should land that, then do restart all our executors02:37
pabelangerthat will fix the issue you found yesterday02:37
ianwpabelanger: yeah, i need to write a playbook for that, it was a bit of an emergency situation last night02:37
ianwwe should really stop the scheduler, restart executors, then restart scheduler right?02:38
pabelangerhttps://review.openstack.org/510155/02:38
pabelangerI need to rebase that02:38
pabelangerand address comments, but should give us a playbook02:39
*** salv-orl_ has joined #openstack-infra02:39
*** thorst has quit IRC02:39
pabelangerianw: no, we should be okay to keep scheduler running02:39
pabelangerjust stop executors02:39
pabelangerthen start02:39
*** rkukura has joined #openstack-infra02:40
ianwwhat about the running jobs though?  they just die?02:40
pabelangerscheduler will see job aborted and requeue it02:40
pabelangerso, users shouldn't need to do anything, just that their job will restart a few times until the restarts are finished02:41
*** salv-orlando has quit IRC02:42
ianwahh, ok.  i guess last night, the executors were in their really odd state, which messed things up02:42
*** catintheroof has quit IRC02:43
ianwlet's merge the typo fix, i'll see about playbook02:44
*** dhinesh has joined #openstack-infra02:45
*** reed_ has joined #openstack-infra02:52
*** gildub has joined #openstack-infra02:54
*** reed_ has quit IRC02:54
clarkbyamamoto: I think the issue is that we use the archive ansible module to gzip job-output.txt and it does not remove the original file by default02:55
clarkbyamamoto: so when we look at the filesystem to create jobs we see both job-output.txt and job-output.txt.gz02:56
*** dave-mccowan has quit IRC02:56
clarkbwe actually just want job-output.txt I think02:56
*** hongbin has quit IRC02:56
*** hongbin_ has joined #openstack-infra02:56
yamamotoclarkb: so adding $ to regex would solve the issue?02:57
*** cshastri has joined #openstack-infra02:57
clarkbyamamoto: I think so at least for this specific instance02:58
clarkb(which may be sufficient)02:58
clarkbthis could also partly explain why we are behind in indexing02:59
clarkbwe are indexing twice as much data as we should02:59
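A minimal sketch of the two fixes being weighed here, assuming the Ansible archive module and a regex-driven file list as described above (the task is illustrative and log_root is a placeholder, not the real variable):

    # Hypothetical post-run task: if the archive module dropped the original,
    # only job-output.txt.gz would remain and nothing would be indexed twice.
    - name: Compress the console log
      archive:
        path: "{{ log_root }}/job-output.txt"
        remove: true        # delete the uncompressed copy after archiving
    # The alternative taken in 516502 works on the indexing side instead:
    # anchor the filename pattern (e.g. job-output\.txt$) so the .txt and
    # .txt.gz names are treated as the same log and submitted only once.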
*** hongbin_ has quit IRC02:59
*** hongbin has joined #openstack-infra03:00
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix syntax with gear unRegisterFunction()  https://review.openstack.org/51640303:03
*** bobh has quit IRC03:08
*** rosmaita has quit IRC03:11
*** bobh has joined #openstack-infra03:14
*** Sukhdev has joined #openstack-infra03:15
*** cody-somerville has joined #openstack-infra03:17
*** lathiat_ has joined #openstack-infra03:18
*** ramishra has joined #openstack-infra03:19
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015503:19
*** lathiat has quit IRC03:19
*** hongbin has quit IRC03:22
*** esberglu_ has quit IRC03:22
*** esberglu has joined #openstack-infra03:22
ianwok i've restarted ze01 using ^ to pickup ^^03:23
ianwi will monitor for a bit before doing others03:23
*** edmondsw has joined #openstack-infra03:24
*** liujiong has joined #openstack-infra03:26
*** edmondsw has quit IRC03:29
*** vkmc has quit IRC03:29
jeblairianw, pabelanger: let's give it longer to stop.  like 15-20m?03:33
jeblair(in the playbook)03:33
*** gouthamr has joined #openstack-infra03:34
*** vkmc has joined #openstack-infra03:35
jeblairpabelanger, ianw, Shrews: i think we have some more error logging now which may have information on why finger daemon died on ze02, we should look for that.03:35
*** armax has quit IRC03:35
pabelangerjeblair: ianw: yah, I don't think it worked well on ze01. We have 3 zuul-executor processes and 1 defunct process now03:35
*** armax has joined #openstack-infra03:36
ianwyep, just poking and noticed that03:36
pabelangerwhich usually means we started an executor while another one was shutting down03:36
ianwit did correctly find that the pid had disappeared though?  it did not timeout03:36
pabelangerI have to run now, sould be able to stop both sockets again03:36
*** bobh has quit IRC03:37
openstackgerritClark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650203:37
ianwjeblair: yeah ... but at least checking that pid didn't even hit that timeout (http://paste.openstack.org/show/625041/)03:37
clarkbyamamoto: jeblair dmsimard ^ that is totally untested but I think we may want to do something like that to solve both the query logical name problem and the double indexing of job-output.txt/job-output.txt.gz03:37
clarkbjeblair: ^ btw I think yamamoto discovered the cause of the increase in index volume we are indexing console logs twice03:38
ianwi'm going to stop it manually and see what disappears03:38
jeblairyamamoto: thanks! :)03:38
openstackgerritKien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui gate jobs  https://review.openstack.org/51650303:38
*** armaan has joined #openstack-infra03:39
openstackgerritKien Nguyen proposed openstack-infra/openstack-zuul-jobs master: Remove Zun-ui legacy gate jobs  https://review.openstack.org/51650403:39
ianwok the init.d stop has returned, the process from the pid file is still there03:41
openstackgerritKien Nguyen proposed openstack-infra/project-config master: Remove Zun-ui legacy gate jobs  https://review.openstack.org/51650303:42
jeblairreplacement process was probably unable to create a socket file03:42
ianwzuul     22102 11596  0 Oct30 ?        00:00:00 [git] <defunct>03:44
ianwzuul     22215     1  0 Oct30 ?        00:00:00 ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 zuul@review.openstack.org git-upload-pack '/openstack/tripleo-heat-templates'03:44
ianwthat ssh parented to init ...03:44
ianwok, 5301 disappeared after 03:47:10 - 03:39:4503:47
ianwso 10 minutes minimum i guess03:48
ianwanw@ze01:~$ ps -aef | grep [z]uul-e03:48
ianwzuul     11596     1  8 Oct27 ?        06:42:58 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwzuul     11599 11596  0 Oct27 ?        00:00:41 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwzuul     21113 11599  0 Oct29 ?        00:00:00 /usr/bin/python3 /usr/local/bin/zuul-executor03:48
ianwi'll manually clean up these03:48
*** dhinesh_ has joined #openstack-infra03:53
*** dhinesh has quit IRC03:53
*** markvoelker has quit IRC03:55
*** ijw has joined #openstack-infra04:04
*** ykarel has joined #openstack-infra04:06
*** udesale has joined #openstack-infra04:06
*** ijw has quit IRC04:09
*** xingchao has joined #openstack-infra04:13
*** armax_ has joined #openstack-infra04:21
*** rkukura_ has joined #openstack-infra04:22
*** armax has quit IRC04:22
*** notmyname has quit IRC04:22
*** armax_ is now known as armax04:22
*** dhinesh_ has quit IRC04:23
*** jpena|off has quit IRC04:23
*** dhinesh has joined #openstack-infra04:23
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015504:24
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts  https://review.openstack.org/51651004:24
*** vhosakot_ has joined #openstack-infra04:24
*** notmyname has joined #openstack-infra04:24
*** vhosakot_ has quit IRC04:24
*** sc` has quit IRC04:24
*** sc` has joined #openstack-infra04:25
*** jpena|off has joined #openstack-infra04:25
*** nhicher has quit IRC04:25
*** nhicher has joined #openstack-infra04:25
*** rkukura has quit IRC04:25
*** rkukura_ is now known as rkukura04:25
*** vhosakot has quit IRC04:27
*** cshastri has quit IRC04:31
*** vsaienk0 has joined #openstack-infra04:31
*** thorst has joined #openstack-infra04:34
*** gouthamr has quit IRC04:34
*** vhosakot has joined #openstack-infra04:34
*** thorst has quit IRC04:39
*** vsaienk0 has quit IRC04:41
*** xingchao has quit IRC04:49
*** armax has quit IRC04:50
*** zhurong has quit IRC04:55
*** Sukhdev has quit IRC04:55
*** markvoelker has joined #openstack-infra04:56
*** dhinesh has quit IRC05:06
*** janki has joined #openstack-infra05:06
yamamotocan in-repo .zuul.yaml have periodic jobs?05:06
*** liusheng has quit IRC05:07
*** liusheng has joined #openstack-infra05:07
*** edmondsw has joined #openstack-infra05:13
*** edmondsw has quit IRC05:17
*** sree has joined #openstack-infra05:19
*** gildub has quit IRC05:25
*** janki has quit IRC05:27
*** janki has joined #openstack-infra05:27
*** markvoelker has quit IRC05:30
*** mat128 has quit IRC05:30
*** yamahata has joined #openstack-infra05:35
*** xingchao has joined #openstack-infra05:36
*** gildub has joined #openstack-infra05:37
*** kiennt26 has quit IRC05:42
*** gildub has quit IRC05:46
*** zhurong has joined #openstack-infra05:46
*** threestrands has quit IRC05:46
*** cshastri has joined #openstack-infra05:58
ianwpabelanger: ahhh! "path": "/proc/11206\n/status" ... the '\n' is why it doesn't wait properly05:58
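A minimal sketch of the kind of wait task affected, assuming the playbook builds a /proc path from a pidfile; without stripping the trailing newline the path becomes "/proc/11206\n/status" and the existence check never matches (paths and retry values here are illustrative):

    - name: Read the executor pid          # pidfile location is assumed
      command: cat /var/run/zuul/executor.pid
      register: executor_pid

    - name: Wait for the executor process to exit
      stat:
        # trim the newline, otherwise this becomes "/proc/NNN\n/status"
        path: "/proc/{{ executor_pid.stdout | trim }}/status"
      register: proc_status
      until: not proc_status.stat.exists
      retries: 90            # ~15 minutes at 10s per retry
      delay: 10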
*** threestrands has joined #openstack-infra05:58
*** threestrands has quit IRC05:58
*** threestrands has joined #openstack-infra05:58
*** threestrands has quit IRC06:03
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:05
*** kiennt26 has joined #openstack-infra06:05
*** ijw has joined #openstack-infra06:05
openstackgerritChason Chan proposed openstack-infra/project-config master: Add pike branch for OpenStack-Manuals gerritbot  https://review.openstack.org/51652306:08
*** ijw has quit IRC06:10
*** gildub has joined #openstack-infra06:12
*** dhajare has joined #openstack-infra06:15
*** aeng has quit IRC06:19
AJaegeryamamoto: yes, should be possible - try it out - and point me to your change for review06:22
*** esberglu has quit IRC06:24
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add hard reset for zuul-executors  https://review.openstack.org/51015506:25
openstackgerritIan Wienand proposed openstack-infra/system-config master: Add some notes on puppet kicks and service restarts  https://review.openstack.org/51651006:25
*** markvoelker has joined #openstack-infra06:27
*** gongysh has joined #openstack-infra06:28
*** nikhil has quit IRC06:30
*** thorst has joined #openstack-infra06:35
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:35
*** linkedinyou has joined #openstack-infra06:37
*** ijw has joined #openstack-infra06:38
openstackgerritMerged openstack-infra/project-config master: Setup Contributor Guide in Storyboard  https://review.openstack.org/51646206:38
*** thorst has quit IRC06:39
*** ijw has quit IRC06:42
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:44
*** hemna_ has quit IRC06:45
*** hemna_ has joined #openstack-infra06:46
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601006:53
*** tosky has joined #openstack-infra06:53
*** ijw has joined #openstack-infra06:54
*** vhosakot has quit IRC06:54
*** jtomasek has joined #openstack-infra06:58
*** ijw has quit IRC06:58
*** rcernin has quit IRC06:59
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601007:00
*** jtomasek has quit IRC07:00
*** markvoelker has quit IRC07:00
openstackgerritRui Chen proposed openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor  https://review.openstack.org/51653207:03
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Fix openstack-infra publishing  https://review.openstack.org/51601007:05
*** spectr has joined #openstack-infra07:08
*** salv-orl_ has quit IRC07:11
*** salv-orlando has joined #openstack-infra07:11
*** yamahata has quit IRC07:14
*** esberglu has joined #openstack-infra07:15
*** salv-orlando has quit IRC07:16
*** kiennt26 has quit IRC07:17
*** pcaruana has joined #openstack-infra07:17
*** vsaienk0 has joined #openstack-infra07:18
*** aviau has quit IRC07:19
*** esberglu has quit IRC07:19
*** tosky has quit IRC07:19
*** kiennt26 has joined #openstack-infra07:19
*** aviau has joined #openstack-infra07:19
*** gildub has quit IRC07:21
*** salv-orlando has joined #openstack-infra07:21
*** gildub has joined #openstack-infra07:22
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51533707:24
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51533707:26
openstackgerritBhagyashri Shewale proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project  https://review.openstack.org/51653707:26
*** vsaienk0 has quit IRC07:32
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican  https://review.openstack.org/51039007:32
*** jtomasek has joined #openstack-infra07:33
*** vsaienk0 has joined #openstack-infra07:34
openstackgerritNam Nguyen Hoai proposed openstack-infra/project-config master: Remove legacy jobs from Barbican  https://review.openstack.org/51039007:38
*** linkedinyou has quit IRC07:39
*** rcernin has joined #openstack-infra07:44
*** shardy has joined #openstack-infra07:45
*** shardy has quit IRC07:45
*** ffledgling has left #openstack-infra07:48
*** shardy has joined #openstack-infra07:49
*** gildub has quit IRC07:50
openstackgerritNam Nguyen Hoai proposed openstack-infra/openstack-zuul-jobs master: Remove Barbican legacy jobs  https://review.openstack.org/51041407:54
*** markvoelker has joined #openstack-infra07:58
*** ykarel is now known as ykarel|lunch08:03
*** tmorin has joined #openstack-infra08:09
*** ralonsoh has joined #openstack-infra08:15
*** Liced has joined #openstack-infra08:15
*** tesseract has joined #openstack-infra08:16
*** sree has quit IRC08:17
*** ccamacho has joined #openstack-infra08:17
*** kiennt26 has quit IRC08:17
*** priteau has joined #openstack-infra08:22
leyalHi, I need some help - when I tried to upload a patch I got the following message: "Received disconnect from 104.130.246.91 port 29418:12: Too many concurrent connections (64) - max. allowed: 64"08:22
leyalBut i don't have any open connection to 104.130.246.91 ..08:22
*** yamamoto has quit IRC08:23
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655008:26
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655008:27
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add jobs for masakari-dashboard project  https://review.openstack.org/51655208:27
*** pgadiya has joined #openstack-infra08:27
*** salv-orlando has quit IRC08:27
*** salv-orlando has joined #openstack-infra08:28
*** d0ugal has quit IRC08:29
*** d0ugal has joined #openstack-infra08:29
*** markvoelker has quit IRC08:30
*** alexchadin has joined #openstack-infra08:31
*** gcb has joined #openstack-infra08:32
*** salv-orlando has quit IRC08:32
*** jpena|off is now known as jpena08:33
*** ociuhandu has joined #openstack-infra08:42
*** ykarel|lunch is now known as ykarel08:42
*** amoralej|off is now known as amoralej08:44
*** jpich has joined #openstack-infra08:44
*** hashar has joined #openstack-infra08:46
*** edmondsw has joined #openstack-infra08:49
*** salv-orlando has joined #openstack-infra08:50
*** edmondsw has quit IRC08:53
*** sdague has joined #openstack-infra08:53
*** dhajare has quit IRC08:56
*** baoli has quit IRC08:58
*** baoli has joined #openstack-infra08:58
ianwleyal: are you behind some sort of nat?08:58
leyalianw , thanks for answering me . i am from working from my home - so i am the only one that will gerrit from this network ..09:00
*** dingyichen has quit IRC09:01
*** gmann is now known as gmann_afk09:02
*** cuongnv has quit IRC09:03
*** gcb_ has joined #openstack-infra09:03
*** baoli has quit IRC09:03
*** zhurong has quit IRC09:03
*** gcb has quit IRC09:04
*** annp has quit IRC09:04
*** dhajare has joined #openstack-infra09:05
ianwleyal: i can see logins from yourself but no particular errors.  is this persistent?09:06
leyalianw, It started yesterday, and since then it's persistent (I tried git review ~10 times in the last 3 hours)09:10
Licedhi, AJaeger told me yesterday that the translation job was broken 10 days ago. translation doesn't work for https://github.com/openstack/networking-bgpvpn and the last merge on the project was yesterday. so my translation setup doesn't work and I can't find the solution09:11
Licedtranslation support was added in https://review.openstack.org/486349 and translation is activated in project-config in https://review.openstack.org/509178, but after the last merge the project in zanata is still empty09:14
*** jascott1 has quit IRC09:14
*** jascott1 has joined #openstack-infra09:15
*** martinkopec has joined #openstack-infra09:15
*** Kevin_Zheng has joined #openstack-infra09:15
*** lucas-afk is now known as lucasagomes09:19
*** jascott1 has quit IRC09:19
*** electrofelix has joined #openstack-infra09:20
ianwgerrit2@review:~$ ssh -i review_site/etc/ssh_host_rsa_key -p 29418 'Gerrit Code Review'@127.0.0.1 gerrit show-connections -n | grep 'a/26131' | wc -l09:21
ianw6409:21
ianwinfra-root: ^ somehow leyal is leaking connections09:21
*** yamamoto has joined #openstack-infra09:24
ianwleyal: i've forcibly closed all the open connections, can you try again?09:24
*** markvoelker has joined #openstack-infra09:27
*** jistr|mtgs is now known as jistr09:29
leyalianw, i tried again and it's ok now ..09:32
*** yamamoto has quit IRC09:34
ianwok, and i confirmed there was no open connection just now, so it doesn't appear to be leaking any more09:34
*** liujiong_lj has joined #openstack-infra09:35
*** liujiong has quit IRC09:35
ianwseeing as both are from IPs that are not your current one, but from your isp, i think it's probably transient. keep an eye on it; if problems reoccur you can point back to this in the logs and we can investigate further09:35
*** SpamapS has quit IRC09:36
*** sflanigan has quit IRC09:36
*** sflanigan has joined #openstack-infra09:36
*** sflanigan has joined #openstack-infra09:36
*** bradm has quit IRC09:36
ianw("both" above being the ip's against the open sessions that i killed, to be clear)09:37
*** wolverineav has joined #openstack-infra09:38
leyalianw, thanks !09:39
*** bradm has joined #openstack-infra09:39
*** SpamapS has joined #openstack-infra09:40
*** shardy has quit IRC09:42
*** shardy has joined #openstack-infra09:42
*** dsariel__ has joined #openstack-infra09:44
*** wolverineav has quit IRC09:46
*** wolverineav has joined #openstack-infra09:47
*** owalsh- is now known as owalsh09:47
ianwjeblair / pabelanger: i have not ended up touching ze02 as i haven't had a chance to look for any info on the finger death. i can look tomorrow if you don't get to it09:48
*** slaweq has joined #openstack-infra09:49
openstackgerritNiraj Singh proposed openstack-infra/project-config master: Add masakari-dashboard project  https://review.openstack.org/51655009:51
*** wolverineav has quit IRC09:51
*** slaweq_ has quit IRC09:51
*** pgadiya has quit IRC09:52
*** bradm has quit IRC09:53
*** bradm has joined #openstack-infra09:54
*** sambetts|afk is now known as sambetts09:55
*** kjackal_ has joined #openstack-infra09:55
*** mandre has quit IRC09:55
*** mandre_ has joined #openstack-infra09:56
*** mandre_ is now known as mandre09:56
openstackgerritMerged openstack-infra/project-config master: Fix Grafana neutron-lib dashboard  https://review.openstack.org/51480109:59
*** namnh has quit IRC10:00
*** armaan_ has joined #openstack-infra10:00
*** jistr_ has joined #openstack-infra10:00
*** hemna- has joined #openstack-infra10:00
*** niska` has joined #openstack-infra10:00
*** markvoelker has quit IRC10:00
*** xhku_ has joined #openstack-infra10:01
openstackgerritMerged openstack-infra/project-config master: Publish requirements loci images to DockerHub  https://review.openstack.org/51294110:01
openstackgerritMerged openstack-infra/project-config master: ironic: Remove publish-to-pypi add release-openstack-server  https://review.openstack.org/51645310:01
*** witek has quit IRC10:01
*** niska has quit IRC10:01
*** jistr has quit IRC10:01
*** jschlueter has quit IRC10:01
*** hemna has quit IRC10:01
*** fbouliane has quit IRC10:01
*** michaelxin has quit IRC10:01
*** timrc has quit IRC10:01
*** armaan has quit IRC10:01
*** rwsu has quit IRC10:01
*** krtaylor has quit IRC10:01
*** askb has quit IRC10:01
*** zerick has quit IRC10:01
*** migi has quit IRC10:01
*** admcleod_ has quit IRC10:01
*** admcleod has joined #openstack-infra10:01
*** zerick has joined #openstack-infra10:01
*** isq_ has joined #openstack-infra10:01
*** askb has joined #openstack-infra10:02
*** krtaylor has joined #openstack-infra10:02
*** rwsu has joined #openstack-infra10:02
*** witek has joined #openstack-infra10:02
*** michaelxin has joined #openstack-infra10:02
*** timrc has joined #openstack-infra10:02
*** migi has joined #openstack-infra10:02
*** isq has quit IRC10:03
*** Jeffrey4l has quit IRC10:04
*** zoli has quit IRC10:04
*** pgadiya has joined #openstack-infra10:06
*** pblaho has joined #openstack-infra10:06
*** zoli has joined #openstack-infra10:06
*** Jeffrey4l has joined #openstack-infra10:07
*** pblaho has quit IRC10:09
*** pblaho has joined #openstack-infra10:09
*** bkero has quit IRC10:11
*** kota_ has quit IRC10:11
*** kota_ has joined #openstack-infra10:11
*** bkero has joined #openstack-infra10:12
*** tobiash has quit IRC10:12
*** tobiash has joined #openstack-infra10:15
*** e0ne has joined #openstack-infra10:15
*** liujiong_lj has quit IRC10:18
*** pgadiya has quit IRC10:24
*** udesale has quit IRC10:26
*** sree has joined #openstack-infra10:29
*** boden has joined #openstack-infra10:33
*** sree has quit IRC10:33
*** sree has joined #openstack-infra10:35
*** jschlueter|znc has joined #openstack-infra10:35
*** hemna_ has quit IRC10:36
*** thorst has joined #openstack-infra10:36
*** edmondsw has joined #openstack-infra10:37
*** pgadiya has joined #openstack-infra10:38
*** yamamoto has joined #openstack-infra10:39
*** sree has quit IRC10:39
*** edmondsw has quit IRC10:41
*** ociuhandu has quit IRC10:42
*** [HeOS] has quit IRC10:42
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364510:43
*** thorst has quit IRC10:43
AJaegerLiced: let me check...10:46
*** ociuhandu has joined #openstack-infra10:46
AJaegerLiced: that merged yesterday at a time that Zuul was unhappy ;(10:46
AJaegerWe had to restart Zuul and the post job never ran.10:47
AJaegerLiced: So, waiting for next merge ;(10:47
*** vsaienk0 has quit IRC10:47
*** gongysh has quit IRC10:51
*** pbourke has quit IRC10:53
*** vsaienk0 has joined #openstack-infra10:53
*** pbourke has joined #openstack-infra10:54
*** ijw has joined #openstack-infra10:55
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Increase timeout for requirements propose job  https://review.openstack.org/51661010:57
*** markvoelker has joined #openstack-infra10:58
*** ijw has quit IRC10:59
*** huanxie has quit IRC10:59
hwoaranggood day11:02
hwoarangI am seeing some problems with some internal openstack mirrors for opensuse11:02
hwoarangas you can see from here http://logs.openstack.org/27/511227/2/gate/openstack-ansible-functional-opensuse-423/c0d460c/host/lxc-cache-prep-commands.log.txt.gz downloading a package takes a while and the job times out11:03
hwoarangthe mirror which is used is http://mirror.mtl01.inap.openstack.org/opensuse/...11:03
hwoarangfungi pabelanger dirk^11:04
hwoarangit's been hitting all the openstack-ansible jobs for quite a while11:04
dirkhwoarang: about to turn off mobile phone for the next 24 hours. Will be back from Sydney11:08
dirkhwoarang: it looks like I can reach that mirror. Maybe mtu or ipv6 issues?11:08
hwoarangdirk: the mirror is reachable but terribly slow as it seems from the job output. it takes 10 minutes to download a few packages11:09
*** priteau has quit IRC11:09
hwoarangand the job is killed11:09
*** priteau has joined #openstack-infra11:10
*** Hal has joined #openstack-infra11:10
*** Hal is now known as Guest9527711:10
*** do3 has joined #openstack-infra11:10
*** priteau has quit IRC11:11
*** do3 has left #openstack-infra11:11
*** hashar is now known as hasharLunch11:12
dirkhwoarang: smells like mtu issue to me11:12
dirkhwoarang: can you add debug output for testing that theory? Maybe there is something mtu related different just for opensuse11:13
hwoarangdirk: i will have a look but ubuntu also uses mtu 150011:14
*** armaan_ has quit IRC11:14
*** armaan has joined #openstack-infra11:15
dirkhwoarang: and you can reproduce the slowness?11:17
*** rcernin has quit IRC11:18
hwoarangi can't reproduce it outside of openstack gates11:18
*** ldnunes has joined #openstack-infra11:18
dirkhwoarang: weird.11:19
*** kjackal_ has quit IRC11:20
hwoarangi have no proof that it's mtu related because the host progresses with downloads and setup fine, and it only starts to fail about 20 minutes down the road11:20
hwoarangwhen running a chroot zypper command to prepare a chroot11:20
hwoaranganyway11:20
*** stakeda has quit IRC11:22
*** panda|ruck|off is now known as panda|ruck11:24
openstackgerritArx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag  https://review.openstack.org/51662411:24
*** armaan has quit IRC11:25
*** armaan has joined #openstack-infra11:26
*** hemna has joined #openstack-infra11:29
vdrokgood morning folks. could someone take a look at https://review.openstack.org/515716? I've added job definitions as per ML thread, but still only the jobs from project-config are run11:29
*** wolverineav has joined #openstack-infra11:29
*** markvoelker has quit IRC11:31
*** sileht has quit IRC11:33
*** sileht has joined #openstack-infra11:34
*** ociuhandu has quit IRC11:36
*** armaan has quit IRC11:37
*** armaan has joined #openstack-infra11:37
*** armaan has quit IRC11:42
*** smatzek has joined #openstack-infra11:44
*** esberglu has joined #openstack-infra11:48
*** salv-orlando has quit IRC11:48
*** pgadiya has quit IRC11:51
*** rosmaita has joined #openstack-infra11:51
*** esberglu has quit IRC11:52
*** udesale has joined #openstack-infra11:54
*** salv-orlando has joined #openstack-infra11:55
*** jaypipes has joined #openstack-infra11:55
*** thorst has joined #openstack-infra11:58
*** kjackal_ has joined #openstack-infra12:05
*** hemna has quit IRC12:07
*** shardy is now known as shardy_lunch12:07
*** alexchadin has quit IRC12:09
*** yamamoto has quit IRC12:11
*** yamamoto has joined #openstack-infra12:11
*** thorre has quit IRC12:13
*** dprince has joined #openstack-infra12:13
*** thorre has joined #openstack-infra12:16
*** armaan has joined #openstack-infra12:17
*** hemna has joined #openstack-infra12:19
*** armaan has quit IRC12:20
*** markvoelker has joined #openstack-infra12:21
*** dhajare has quit IRC12:22
*** martinkopec has quit IRC12:24
*** salv-orlando has quit IRC12:24
*** martinkopec has joined #openstack-infra12:25
*** edmondsw_ has joined #openstack-infra12:25
*** rhallisey has joined #openstack-infra12:28
*** thorst_ has joined #openstack-infra12:29
*** thorst has quit IRC12:31
*** catintheroof has joined #openstack-infra12:31
*** catintheroof has quit IRC12:32
*** catintheroof has joined #openstack-infra12:32
*** rlandy has joined #openstack-infra12:34
*** trown|outtypewww is now known as trown12:34
*** dave-mccowan has joined #openstack-infra12:35
*** [HeOS] has joined #openstack-infra12:38
*** jcoufal has joined #openstack-infra12:39
*** salv-orlando has joined #openstack-infra12:39
*** jonher has joined #openstack-infra12:40
*** tosky has joined #openstack-infra12:44
Shrewspabelanger: jeblair: I searched ze02 logs for the new finger daemon logging on abnormal exception back to Oct 20th. Found nothing.12:45
dmsimardShrews: that catches finger:// urls being returned as logs ?12:46
*** lucasagomes is now known as lucas-hungry12:47
*** Dinesh_Bhor has quit IRC12:47
Shrewsdmsimard: no12:47
Shrewsonly unexpected exceptions from the daemon12:48
dmsimardShrews: oh, okay, cause I have a reproducer for those errors :)12:48
*** udesale has quit IRC12:49
*** udesale has joined #openstack-infra12:49
*** zhurong has joined #openstack-infra12:49
*** shardy_lunch is now known as shardy12:49
*** LindaWang has quit IRC12:50
*** jpena is now known as jpena|lunch12:51
*** dhajare has joined #openstack-infra12:52
LicedAJaeger: bad luck for me12:52
*** esberglu has joined #openstack-infra12:54
*** udesale has quit IRC12:56
*** yamamoto has quit IRC12:56
*** udesale has joined #openstack-infra12:56
*** mandre is now known as mandre_afk12:56
*** janki has quit IRC12:58
*** bh526r has joined #openstack-infra13:01
*** felipemonteiro has joined #openstack-infra13:03
*** martinkopec has quit IRC13:03
*** mat128 has joined #openstack-infra13:04
*** edmondsw_ is now known as edmondsw13:04
*** martinkopec has joined #openstack-infra13:04
*** LindaWang has joined #openstack-infra13:06
*** hasharLunch is now known as hashar13:09
*** yamamoto has joined #openstack-infra13:10
*** kgiusti has joined #openstack-infra13:13
*** bh526r has quit IRC13:13
*** bh526r has joined #openstack-infra13:14
*** yamamoto has quit IRC13:15
*** jcoufal_ has joined #openstack-infra13:16
*** jascott1 has joined #openstack-infra13:17
*** mriedem has joined #openstack-infra13:18
*** jcoufal has quit IRC13:18
*** salv-orlando has quit IRC13:19
*** hemna has quit IRC13:20
*** salv-orlando has joined #openstack-infra13:20
*** jascott1 has quit IRC13:21
*** bobh has joined #openstack-infra13:23
jonherIs it possible to merge two gerrit accounts? I seem to have double accounts because I've logged in using two different ubuntu accounts to review13:23
*** salv-orlando has quit IRC13:24
*** smatzek has quit IRC13:26
*** smatzek has joined #openstack-infra13:27
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin  https://review.openstack.org/51667313:27
Shrewsvdrok: That's a good question. I'm not sure what's going on there. We'll have to wait for jeblair b/c I'm interested in the reason too.13:30
*** smatzek has quit IRC13:31
openstackgerritArx Cruz proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag  https://review.openstack.org/51662413:32
*** nikhil has joined #openstack-infra13:32
*** jaosorior has quit IRC13:33
*** baoli has joined #openstack-infra13:35
vdrokShrews: ok, thank you13:36
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Add initial jobs for CloudKitty Tempest plugin  https://review.openstack.org/51667913:36
*** ldnunes has quit IRC13:37
*** lbragstad has joined #openstack-infra13:37
*** ldnunes has joined #openstack-infra13:37
*** lucas-hungry is now known as lucasagomes13:41
*** eharney has joined #openstack-infra13:41
*** yamamoto has joined #openstack-infra13:41
*** zhurong has quit IRC13:42
*** hongbin has joined #openstack-infra13:43
*** dhajare has quit IRC13:43
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364513:44
*** yamamoto has quit IRC13:45
*** felipemonteiro_ has joined #openstack-infra13:46
openstackgerritLuka Peschke proposed openstack-infra/project-config master: Create a repo for CloudKitty tempest plugin  https://review.openstack.org/51667313:46
openstackgerritOpenStack Proposal Bot proposed openstack/os-testr master: Updated from global requirements  https://review.openstack.org/50364513:48
*** amoralej is now known as amoralej|lunch13:48
*** kiennt26 has joined #openstack-infra13:49
*** felipemonteiro has quit IRC13:49
openstackgerritDmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload  https://review.openstack.org/19504313:50
*** jpena|lunch is now known as jpena13:52
*** esberglu has quit IRC13:53
mriedemgrenade seems to be busted against ocata changes http://logs.openstack.org/04/516404/1/check/legacy-grenade-dsvm-neutron-multinode/aa97464/logs/grenade.sh.txt.gz#_2017-10-30_18_28_09_86513:53
mriedemseeing ImportError issues with neutronlib13:54
mriedemdid something get EOL'ed?13:54
mriedemtonyb: ^13:54
mriedemlooks like at least neutron is eol13:54
mriedemfor newton13:54
mriedemsdague: once anything that stable/newton relies on is eol then don't we have to just kill the grenade job in ocata?13:56
fungijonher: yeah, we've seen that happen if you change which e-mail address you give the ubuntu sso when logging in (gerrit accounts map to ubuntu sso ids, not to launchpad profiles, so even if your multiple ubuntu sso ids are associated with the same lp profile they'll result in distinct accounts in gerrit)13:56
*** oanson has quit IRC13:57
fungijonher: what's the ssh username on one of the accounts? i should be able to use that to find the other account id by looking for e-mail address overlaps13:57
*** oanson has joined #openstack-infra13:57
fungior worst case i'll try to match them up by full name13:58
jonherold account id: 18279 that I want to keep. New one is ID 2701313:58
fungieven better. looking now13:58
jonherthanks13:58
*** hemna has joined #openstack-infra13:58
*** smatzek has joined #openstack-infra13:59
*** iyamahat has joined #openstack-infra13:59
*** gcb_ has quit IRC14:00
*** jokke_ has joined #openstack-infra14:00
*** gcb_ has joined #openstack-infra14:00
*** janki has joined #openstack-infra14:01
*** smatzek has quit IRC14:03
fungijonher: i've moved your new openid from account 27013 to account 18279 and set 27013 inactive. you may need to log out of and back into the gerrit webui to be in the correct account again14:03
jonherAlright, thanks :)14:03
fungimy pleasure14:03
fungilet us know if you have any further trouble with it14:03
*** smatzek has joined #openstack-infra14:05
*** mandre_afk is now known as mandre14:05
*** ykarel has quit IRC14:06
sdaguemriedem: yeh, it should be14:07
sdaguegrenade should be turned off first before eoling branches14:08
mriedemok, trying to figure out how to do that in the new zuulv3 world14:08
*** esberglu has joined #openstack-infra14:09
fungipatch to stable/ocata to remove grenade jobs you've declared within that branch, or patch to project-config to adjust the branch filter for grenade jobs if defined there14:09
*** marst has joined #openstack-infra14:09
mriedemnot openstack-zuul-jobs?14:10
mriedemok project-config it is14:10
AJaegermriedem: might be openstack-zuul-jobs as well - if you want to patch the job directly14:11
fungiyeah, i'm looking now to see where that branch filter is set14:12
*** catintheroof has quit IRC14:12
AJaegerfungi: we can set it in openstack-zuul-jobs on the job itself - and then it applies everywhere14:12
mriedemi know how to do it per-job in openstack-zuul-jobs/zuul.d/zuul-legacy-jobs14:12
AJaegeryep14:12
mriedemsince layout.yaml is gone in project-config14:12
mriedemok14:12
fungiyeah, we set it in project-config for the legacy grenade jobs at the moment, like http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5814:13
AJaegerconfig-core, the requirements proposal job times out, please review this change to have it run longer https://review.openstack.org/51661014:14
*** catintheroof has joined #openstack-infra14:14
AJaegerfungi: yes, that's per job and project. If we want to do it globally, it's openstack-zuul-jobs14:14
fungiright there's a mix of the two right now but once we get the legacy jobs cleaned up we'll hopefully just have one place to update that in job-templates14:15
fungier, project-templates14:16
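For reference, a sketch of the kind of branch filter being discussed; whether it ends up on the job in openstack-zuul-jobs or on the project entry in project-config, the shape is roughly the same (the job name and regex here are illustrative):

    - job:
        name: legacy-grenade-dsvm-neutron-multinode
        # skip branches whose upgrade-from source is now EOL; run everywhere else
        branches: ^(?!stable/(newton|ocata)).*$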
*** rbrndt has joined #openstack-infra14:16
AJaegerindeed that one as well...14:16
*** armax has joined #openstack-infra14:17
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:17
mriedemAJaeger: i think this ^14:17
AJaegermriedem: yes, expect so14:18
fungiwell, except for the ones which are still set in project-config in the projects.yaml for now14:19
fungialso, i don't think you want to remove the grenade-forward jobs from ocata14:19
mriedemwhat does the forward job do again?14:19
fungisince those test that you can upgrade from a proposed stable/ocata change to stable/pike14:19
mriedemoh..14:19
mriedemwill projects.yaml override what's in openstack-zuul-jobs/14:20
mriedem?14:20
*** jaosorior has joined #openstack-infra14:20
sdaguefungi: they've never been voting, I'm not convinced they even work14:20
fungisdague: perhaps we should remove them entirely in that case?14:20
funginewton eol isn't a reason to remove the grenade-forward jobs from ocata since they don't touch newton, but if there is a good reason to just drop the grenade-forward jobs globally then probably better to do that14:21
sdagueyeh, that's fine14:22
fungimriedem: i think the variants in the projects.yaml in project-config will override what's in openstack-zuul-jobs (though there aren't many)14:23
mriedemok i'll tinker with project-config too then14:23
*** armax has quit IRC14:23
AJaegerI agree with fungi, we need to update both14:24
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:24
*** amoralej|lunch is now known as amoralej14:27
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:27
*** armax has joined #openstack-infra14:28
*** Guest53850 has quit IRC14:29
*** lamt has joined #openstack-infra14:29
jeblairvdrok: i don't see what's wrong with that patch.  i lost a debugging tool in a recent zuul restart; i may need to restart it again to get it back to dig into that.14:30
*** catintheroof has quit IRC14:30
*** eharney has quit IRC14:31
jeblairi'll do that now, unless anyone objects14:32
dmsimardAJaeger: commented on https://review.openstack.org/#/c/516397/14:33
fungijeblair: no objection from me14:33
AJaegerdmsimard: that's so far the only repo that needs it and therefore I did it at repo level14:34
AJaegerdmsimard: do you see that this is needed by more repos?14:35
dmsimardAJaeger: that's curious, why wouldn't this be required for other repos? it's the same job, isn't it?14:35
*** catintheroof has joined #openstack-infra14:35
AJaegerdmsimard: that repo installs the requirements repo in its tox_install...14:36
*** ykarel has joined #openstack-infra14:36
*** salv-orlando has joined #openstack-infra14:37
*** spectr has quit IRC14:37
jeblairrestarted and re-enqueueing now14:37
*** vsaienk0 has quit IRC14:38
*** spectr has joined #openstack-infra14:39
*** salv-orl_ has joined #openstack-infra14:40
*** nicolasbock has joined #openstack-infra14:41
dmsimardAJaeger: the failing playbook is actually this one: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/file/7f67273f-9e6a-4fa4-91a4-a6ab4c28c511/ ## task: http://logs.openstack.org/e9/e95351593168da9ae6c55c8b5995c097d0ba7853/post/publish-openstack-python-branch-tarball/03c4f4b/ara/result/a2ce2da7-1b23-41fe-9d9c-07e3edc27aea/14:42
dmsimard"python-tarball/run.yaml" is used for those jobs: 1) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n132 2) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n153 and 3) http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n40114:42
dmsimard#3, our failing job, is the only one without requirements14:42
jaosoriorafter zuul was restarted, should the jobs be rechecked or will they get requeued automatically?14:43
jeblairjaosorior: they are being re-enqueued now14:43
dmsimardjaosorior: they get requeued by the operator who restarts zuul14:43
dmsimardit's not automatic in the sense that it still requires manual injection, right jeblair ?14:43
AJaegerdmsimard: AH!14:44
dmsimardAJaeger: does that make sense ?14:44
jaosoriorI see14:44
openstackgerritMerged openstack-infra/project-config master: Remove zuul/mapping and job  https://review.openstack.org/51602914:44
jeblairjaosorior: if your change isn't there now, we may have missed it so you should recheck it14:44
AJaegerdmsimard: my hero, thanks for digging into this. Let me double check...14:44
jeblairdmsimard: ya14:44
*** salv-orlando has quit IRC14:44
*** amotoki has quit IRC14:44
*** jbernard has quit IRC14:44
*** cshastri has quit IRC14:44
jaosoriorthat makes sense14:44
jaosoriorjeblair, dmsimard: thanks for the info :D14:45
*** charz has quit IRC14:45
jaosoriorthe workings of zuul are still quite unknown to me14:45
AJaegerdmsimard: yes, makes sense. Do you want to patch it?14:45
* AJaeger will abandon his ...14:45
dmsimardAJaeger: it's the same repo you can just submit another patchset14:45
*** jbernard has joined #openstack-infra14:45
dmsimardI can submit it if you want14:45
jeblairvdrok: well, that's annoying.  515716 appears to be working correctly after the restart.14:46
*** eharney has joined #openstack-infra14:46
AJaegerdmsimard: if you have time, go for it, please14:46
dmsimardack14:46
vdrokjeblair: I see, well, that's not so bad :)14:46
jeblairvdrok: i guess let me know if it happens again :|14:46
vdroksure, will do. thanks for looking into this14:46
fungidmsimard: correct, we run a python script to generate a shell script based on the contents of specific pipelines obtained from the scheduler's status.json before stopping the service, and then run that shell script once the scheduler is back up and running again (the shell script consists of preformatted calls to the zuul enqueue rpc cli)14:47
*** udesale has quit IRC14:47
*** amotoki has joined #openstack-infra14:47
dmsimardfungi: if that script hasn't changed since v2, I know what script it is :)14:47
*** charz has joined #openstack-infra14:47
*** psachin has quit IRC14:48
fungidmsimard: it's changed just ever so slightly to add --tenant14:49
*** david-lyle has quit IRC14:50
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669414:50
openstackgerritDmitry Tyzhnenko proposed openstack-infra/git-review master: Add reviewers by group alias on upload  https://review.openstack.org/19504314:50
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions  https://review.openstack.org/51670514:53
dmsimardAJaeger: so, digging a bit further for that requirements thing... it turns out this is the culprit: http://codesearch.openstack.org/?q=%5C%24ZUUL_CLONER&i=nope&files=&repos=14:55
dmsimardtools/tox_install.sh all over the place uses ZUUL_CLONER to ensure that (amongst other things) requirements is there14:56
*** xarses has joined #openstack-infra14:56
jeblairmordred is working to change the pti so we build tarballs differently and won't need that anymore14:58
fungiyep, that's so local devs get a consistent experience and don't have to wonder why their unconstrained unit tests on their workstation are broken while the same patches pass testing in our constrained ci jobs14:58
*** dtantsur|afk is now known as dtantsur14:58
*** salv-orl_ has quit IRC14:58
*** salv-orlando has joined #openstack-infra14:58
*** gcb_ has quit IRC14:58
fungibut overriding how tox builds all its virtualenvs is a pretty clumsy hammer, and then when we turn around and use tox for jobs which don't actually need it we end up with nasty side effects like that14:59
dmsimardAJaeger: that gives us the list of repos requiring requirements http://codesearch.openstack.org/?q=openstack%2Frequirements&i=nope&files=tools%2Ftox_install.sh&repos=14:59
jeblairlet's just put it in the job for now, so it's easy to clean up when mordred finishes his work14:59
jeblairor template or whatever.  ie, not on individual projects.15:00
dmsimardyup, taking care of it15:00
*** gcb_ has joined #openstack-infra15:01
*** diablo_rojo has quit IRC15:01
openstackgerritDavid Moreau Simard proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball  https://review.openstack.org/51639715:01
dmsimardAJaeger: ^15:01
dmsimardEmilienM: are you around ?15:02
*** yamamoto has joined #openstack-infra15:04
jeblairi'm going to restart all of the executors15:05
*** lbragstad has quit IRC15:05
jeblairi don't think they were cleaned up properly after the unclean shutdown the other day15:05
openstackgerritClark Boylan proposed openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650215:05
Shrewsjeblair: would this be a good time to restart the np launchers? we have a couple of bug fixes that should go in15:06
EmilienMdmsimard: yes15:06
*** salv-orlando has quit IRC15:06
jeblairShrews: yes... though i think theoretically any time should be fine? :)15:07
*** salv-orlando has joined #openstack-infra15:07
pabelangerdmsimard: AJaeger: when we remove zuul-cloner from images, that is going to break branch-tarball jobs right? re 516397. Maybe we need to be creating a legacy branch tarball job that will still use zuul-cloner, or I think mordred has patches to remove the need for tox_install.sh15:07
AJaegerdmsimard: we need horizon and neutron as well - do you want to update again or shall I?15:07
Shrewsjeblair: infra-root: i'm going to do that then. restarting launchers (unless i hear any objections)15:07
*** lbragstad has joined #openstack-infra15:08
pabelanger+115:08
jeblairpabelanger: are you suggesting we have a job using the old v2 zuul-cloner?15:08
*** sree has joined #openstack-infra15:08
fungiShrews: sounds good, thanks15:09
AJaegerpabelanger: yes, this needs some analysis - the use of tox_install and zuul-cloner will be fun once we remove zuul-cloner from images15:09
jeblairno wait15:09
jeblairAJaeger, pabelanger: no jobs should be using the copy of zuul-cloner on the images, if you think a job is, please investigate and confirm that now15:10
pabelangerjeblair: not sure, just indicating that when we remove zuul-cloner from images and the base playbook, branch-tarball jobs look like they are going to break15:10
Shrewsinfra-root: restarted nodepool-launcher on both nl01 and nl0215:10
jeblairpabelanger: *please* confirm that.  it should not be the case.15:10
pabelangeryes, looking now15:10
openstackgerritMatt Riedemann proposed openstack-infra/openstack-zuul-jobs master: Don't run legacy-grenade-dsvm-neutron* jobs in newton or ocata  https://review.openstack.org/51669415:11
*** ykarel has quit IRC15:11
AJaegerdmsimard: will update15:11
dmsimardAJaeger: I guess it would avoid having to create -horizon and -neutron variants (which I dislike very much)15:11
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Cleanup legacy-grenade-dsvm-neutron* branch restrictions  https://review.openstack.org/51670515:11
openstackgerritMatt Riedemann proposed openstack-infra/project-config master: Remove legacy-grenade-dsvm-neutron-nova-next  https://review.openstack.org/51671115:11
AJaegerdmsimard: agreed, let me fix15:11
*** salv-orlando has quit IRC15:11
*** sree has quit IRC15:12
AJaegerjeblair: I see both required-projects with "name: some-repo" and with just "some-repo". Are both valid?  Any preference?15:13
AJaegercheck project-config/zuul.d/jobs.yaml15:13
jeblairAJaeger: both valid; if not using a branch specifier, i'd prefer just "some-repo"15:14
*** yamamoto has quit IRC15:14
*** vsaienk0 has joined #openstack-infra15:15
AJaegerok15:15
EmilienMdmsimard: what's up?15:15
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Add openstack/requirements to publish-openstack-python-branch-tarball  https://review.openstack.org/51639715:15
dmsimardEmilienM: sorry got sidetracked15:15
AJaegerdmsimard: updated ^15:15
*** dtantsur is now known as dtantsur|afk15:15
dmsimardEmilienM: mnaser has a nice series of patches here but I put them on hold as you were working on migration https://review.openstack.org/#/c/515972/115:15
mriedemAJaeger: fungi: openstack-zuul-jobs with a depends-on to project-config won't work, right?15:16
mriedemproject-config changes still have to merge first?15:16
mriedemhttps://review.openstack.org/#/c/516694/15:16
EmilienMdmsimard: he can go ahead - I won't have time to work on that until after summit.15:16
dmsimardEmilienM: it's work that you'd need to do after the migration anyway, so I thought maybe we ought to land those but that means you'll need to rebase15:16
EmilienMdmsimard: I'll rebase15:16
AJaegermriedem: yes, it has to. So, push them both up and then recheck the openstack-zuul-jobs once project-config merged15:16
dmsimardEmilienM: ok, it's not going to be so much a rebase as a rewrite but sure15:16
EmilienMdmsimard: no worries15:16
jeblairmriedem: they won't run tests with the new content, but they will still perform config syntax validation and ensure merging in the right order, so generally worth including the footer still.15:16
dmsimardEmilienM: ack, are you confident if I review those or would you rather we keep jobs frozen for now ?15:17
mriedemjeblair: then i don't know why this is failing https://review.openstack.org/#/c/516694/15:17
dmsimardEmilienM: they're no-op for the most part, just reducing duplication and streamlining15:17
AJaegerdmsimard: EmilienM gave a +1, so you could +2A if you like15:17
dmsimardAJaeger: double checking :)15:17
dmsimardtripleo has had a bumpy gate recently15:18
AJaegerdmsimard: sure, appreciated...15:18
clarkbdmsimard: fyi https://review.openstack.org/516502 may be of interest to you15:18
dmsimardmriedem: you can't do a depends-on on a project-config patch15:18
*** jbadiapa has quit IRC15:18
mriedemdmsimard: that's what i thought, but see jeblair's comment15:18
EmilienMdmsimard, AJaeger I need to review it properly15:19
dmsimardmriedem: project-config is a "special" project in the context of zuul, it is used for storing secrets and trusted jobs15:19
mriedemjeblair: but i see this now, "The syntax of the configuration in this change has been verified to be correct once the config project change upon which it depends is merged, but it can not be used until that occurs."15:19
mriedemwhich is different, and better15:19
mriedemso ok15:19
pabelangerjeblair: okay, so projects that use tox_install.sh (eg: python-ironicclient) and publish-openstack-python-branch-tarball jobs will be okay when we merge https://review.openstack.org/514483/ (delete zuul-env from DIB) but will break when we land https://review.openstack.org/513506/ (fetch-zuul-cloner from base)15:19
EmilienMdmsimard, AJaeger : a commit message would have helped15:19
*** jcoufal_ has quit IRC15:19
*** jcoufal has joined #openstack-infra15:20
pabelangerI'm thinking, we could create legacy-publish-openstack-python-branch-tarball and parent to base-legacy which will pull in fetch-zuul-cloner15:20
jeblairmriedem: yeah. i don't know why that didn't happen the first time.... :|15:20
pabelangerthen update jobs using zuul-cloner to that15:20
dmsimardpabelanger: let's not create new legacy jobs15:20
mriedemjeblair: i think it was just the order in which i pushed the changes up15:20
dmsimardpabelanger: it was mentioned that mordred was working on fixing the different tox_install.sh15:20
jeblairdmsimard: people can and should use depends-on against project-config patches.  it lets zuul do config syntax validation, ensures they land in the right order, and helps human reviewers understand the sequencing.15:21
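[For reference, the footer being discussed is just a line at the end of the commit message pointing at the project-config change, e.g. "Depends-On: Ideadbeefdeadbeefdeadbeefdeadbeefdeadbeef" where the Change-Id shown here is a placeholder for the real one.]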
pabelangerwell, it isn't a new legacy job, it is just parenting to base-legacy, that pull in zuul-cloner15:21
EmilienMAJaeger, dmsimard : lgtm15:21
dmsimardjeblair: sure, I mean they can do it but it's not going to have the intended effect. It "works" in the sense that it doesn't let that patch merge until the project-config patch merges, but it doesn't actually apply and run the jobs intended to run15:22
pabelangerbut we have zuulv3 jobs, still using legacy code, which is something that is a little confusing too. Meaning, we are going to have breakages at some point15:22
dmsimardjeblair: so it... half works ?15:22
*** martinkopec has quit IRC15:23
jeblairdmsimard: look at mriedem's change and the message that zuul reported.  that's only possible because of the depends-on15:23
dmsimardpabelanger: legacy code embedded in project repositories15:23
dmsimardpabelanger: the jobs themselves aren't the ones doing zuul-cloner, it's tox_install.sh15:23
dmsimardfungi mentioned earlier that the approach with tox_install.sh is fairly clunky to begin with15:24
jeblairdmsimard: i'm just saying if folks ask, rather than saying "it doesn't work" say "it won't run jobs with the changes in effect but there are still several good reasons to do it".  i don't want folks to think they should stop using those footers.15:24
dmsimardjeblair: fair15:24
clarkbalso we've always used depends on just for the merge this first behavior15:25
clarkbregardless of how it affects jobs15:25
clarkbso that is not a regression and still valuable15:25
dmsimardclarkb: I use(d) it a lot for the zuul-cloner factor :)15:25
clarkbwe're definitely not keeping up with the logstash worker load (up to 94k jobs queued since yesterday evening's restart)15:26
mriedemAJaeger: thanks for hitting those patches15:27
clarkbI'd like to get https://review.openstack.org/#/c/516502/2 reviewed, tested, and in to see if not indexing job-output.txt twice helps there15:27
pabelangerdmsimard: right, but we need a plan to remove zuul-cloner that ideally doesn't break them. Today, the 2 patches up to do so will break them15:27
clarkbso reviews on that and thoughts on testing very much welcome15:27
clarkbI guess I need to update the issues ether pad too15:28
pabelangerclarkb: I'm still trying to bring online a new worker, another puppet issue I am debugging now15:28
dmsimardclarkb: sorry about that. I started down that road yesterday after our discussion, got sidetracked and then wanted to chat about it15:29
*** spectr has quit IRC15:29
clarkbdmsimard: its not a problem I learned new things with yamamoto's help15:29
openstackgerritJose Luis Franco proposed openstack-infra/tripleo-ci master: WIP: Upgrade UC and OC using tripleo-upgrade role  https://review.openstack.org/51564315:30
dmsimardclarkb: I'm not sure we want to blindly remove the .gz, I can see it affecting unexpected things -- but mostly because we gzip *by default* https://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/upload-logs/tasks/main.yaml15:31
jeblairokay all zuul-related processes on the executors are stopped15:31
jeblairrestarting them now15:31
fungiclarkb et al: do we have any updates we want to give the board on our drive to beef up root sysadmin count in emea/apac? https://etherpad.openstack.org/p/syd-leadership-top-5-update15:32
dmsimardclarkb: I think one of the things we discussed was to not assume it would be gzipped and it might be specific to openstack-infra, but really it's there by default so I would adjust the e-r queries to take that into account15:32
*** camunoz has joined #openstack-infra15:33
*** catintheroof has quit IRC15:34
*** gyee has joined #openstack-infra15:34
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671715:37
clarkbdmsimard: the problem is that we are ending up with job-output.txt and job-output.txt.gz on disk so we submit jobs to index both. So we need to pick one or the other. I chose the one that is backward compatible with zuulv215:38
clarkbwe could choose to use the .gz but then we would have to update all the queries15:38
*** Liced has quit IRC15:39
*** pblaho has quit IRC15:40
*** catintheroof has joined #openstack-infra15:40
*** pblaho has joined #openstack-infra15:40
clarkb(worth noting this is a behavior difference between gzip the command and the ansible archive module: gzip doesn't leave the original around but ansible archive does)15:40
*** gcb_ has quit IRC15:40
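[A quick shell illustration of the difference clarkb describes; the ansible archive module's keep-the-source behavior is mimicked here with gzip -k, so this is only a sketch.]

    echo "example log line" > job-output.txt
    gzip job-output.txt      # gzip(1) replaces the source: only job-output.txt.gz remains
    gunzip job-output.txt.gz
    gzip -k job-output.txt   # -k keeps the source, like ansible archive without remove:
                             # both job-output.txt and job-output.txt.gz now exist on disk
    ls job-output.txt*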
*** kiennt26 has quit IRC15:41
*** spectr has joined #openstack-infra15:42
*** catintheroof has quit IRC15:42
dmsimardclarkb: pretty sure we can get ansible to delete the extra file15:43
dmsimardhttp://docs.ansible.com/ansible/latest/archive_module.html "remove" "no" "Remove any added source files and trees after adding to archive."15:43
*** gcb_ has joined #openstack-infra15:43
clarkbwe can but I also don't want to rely on everyone's ansible doing the right thing so this is defensive15:44
dmsimardclarkb: you mean if I write a job that archives files relevant to me as an end user and then try to submit those ?15:45
clarkbyes15:45
dmsimard(and omit the remove parameter)15:46
dmsimardhmm15:46
pabelangerclarkb: okay, I think 516717 is our fix for logstash-workers, but waiting on zuul to report back15:46
*** notemerson has quit IRC15:47
openstackgerritMiguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron  https://review.openstack.org/51672415:47
clarkbpabelanger: I don't think you can use a before that way because those are defines not classes15:47
clarkbpabelanger: instead you want to require in each of the defines that that file is in place15:48
pabelangerclarkb: Oh, right15:48
pabelangerHmm15:48
clarkbso you just need one require for each define after config_file15:48
pabelangeryah, I started doing that and switched15:48
pabelangerlet me update15:48
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671715:49
pabelangerclarkb: ^15:49
dmsimardclarkb: in the current form of your patch I'm just worried about it backfiring in unexpected ways, but I'm not coming up with any examples to express my concerns.. closest I can come up with is matching .tar.gz. You mentioned the regexes are new, should we take that out and be explicit instead ? I guess we're here discussing this because the regex is matching unexpected things.15:50
*** xingchao has quit IRC15:50
*** gongysh has joined #openstack-infra15:51
*** xingchao has joined #openstack-infra15:51
*** gongysh has quit IRC15:51
clarkbdmsimard: it is two things though, one is the regex overmatching. The other is we have broken the (somewhat loose) contract we had around the file metadata we inject to elasticsearch15:51
clarkbI think this addresses both, whereas dropping the regex would only address one?15:51
clarkband as for things like tar those aren't valid indexable files anyways so would fail either way15:52
dmsimardWhat contract is that ?15:52
pabelangerdmsimard: clarkb: jeblair: fungi: mordred: AJaeger: we likely need to have some discussion around https://review.openstack.org/513506/ (Remove fetch-zuul-cloner from base job); should I add that to today's infra meeting or is that something we could hash out outside of the meeting?15:52
clarkbdmsimard: we've always indexed with filename and tags dropping the .gz even if that is actually the name on disk15:52
dmsimardpabelanger: discussing at the meeting is probably fair game15:52
clarkbdmsimard: because logically the file is foo.txt not foo.txt.gz and our webserver honors that as well15:52
dmsimardclarkb: console.html wasn't gzipped by default, was it ?15:52
dmsimardhm, yeah other files, maybe15:53
clarkbother files were; console.html wasn't gzipped up front but was eventually15:53
clarkband the webserver would serve console.html and console.html.gz from the same source15:53
dmsimardyeah due to mime types and live decompressing15:53
dmsimardand rewrite rules15:53
dmsimardI'm familiar with those bits..15:54
dmsimardneed to pick up my kids from school for lunch, I'll try to think about it15:54
clarkbpabelanger: logstash fix lgtm. Lets see if anyone else is willing to review that quickly15:55
clarkb(otherwise I think you can single approve)15:55
*** jbadiapa has joined #openstack-infra15:55
*** Apoorva has joined #openstack-infra15:56
clarkbpabelanger: I think we can talk about zuul cloner things in today's meeting, go ahead and add it as a zuulv3 subtopic15:56
*** Apoorva has quit IRC15:56
clarkbI expect the meeting will be relatively quick?15:56
clarkbI've got to pack :) so here is hoping15:56
*** Apoorva has joined #openstack-infra15:56
*** smatzek has quit IRC15:57
*** ihrachys has joined #openstack-infra15:57
*** Apoorva has quit IRC15:57
*** xingchao_ has joined #openstack-infra15:57
*** gongysh has joined #openstack-infra15:57
*** gongysh has quit IRC15:57
*** slaweq has quit IRC15:58
pabelangerk, added15:58
*** xingchao has quit IRC15:58
*** smatzek has joined #openstack-infra15:59
*** janki has quit IRC16:00
pabelangeralso added removal of jenkins to topic too16:00
*** bnemec has quit IRC16:00
*** iyamahat has quit IRC16:00
clarkbfyi for others packing: sydney weather is supposed to be damp and relatively cool so don't be tricked by recent news of a heat wave16:01
*** smatzek_ has joined #openstack-infra16:01
* clarkb goes to find rain jacket16:01
*** xingchao_ has quit IRC16:02
*** yamamoto has joined #openstack-infra16:02
*** yamamoto has quit IRC16:02
*** smatzek has quit IRC16:03
*** jaosorior has quit IRC16:03
pabelangerya, I haven't looked at weather16:04
panda|ruckcloud-y with a chance of tarballs.16:05
* fungi will pack a hat16:06
*** david-lyle has joined #openstack-infra16:09
*** esberglu has quit IRC16:09
mwhahahaseeing errors in the gate16:09
inc0hey, I can't figure out what happens in our mariadb setup - did you change iptables in gates during zuulv3 transition?16:10
mwhahahano logs, just jobs in 'error'16:10
jeblairmwhahaha: zuul should have more detailed information on what the error is when it reports on the change16:10
mwhahahajeblair: http://zuulv3.openstack.org/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet16:10
mwhahaha40416:11
mwhahahafrom status page16:11
mwhahahajeblair: see 485172,516:11
jeblairmwhahaha: yeah, it doesn't have a log url16:11
*** bh526r has quit IRC16:11
jeblairmwhahaha: we'll know more when it reports16:11
*** xingchao has joined #openstack-infra16:11
*** smatzek_ has quit IRC16:12
*** edmondsw has quit IRC16:12
*** smatzek has joined #openstack-infra16:12
jeblair(those errors could probably be added to the status.json, but i don't think they are there now)16:12
dmsimardjeblair: not sure if it's related but I'm seeing an error for another job that did report back -- https://review.openstack.org/#/c/516397/ : openstack-tox-linters openstack-tox-linters : ERROR Project openstack/requirements does not have the default branch master16:13
dmsimardthat seems like an odd message16:13
jeblairdmsimard: could be.  i wonder if it's fallout from the executor restart.16:13
AJaegerwhat is this? " build-tox-manuals-checkbuild build-tox-manuals-checkbuild : ERROR Project openstack/requirements does not have the default branch master" - change  https://review.openstack.org/51669616:13
dmsimardjeblair, AJaeger: I just did a recheck on https://review.openstack.org/#/c/516397/ -- let's see if it reproduces16:14
AJaegerand also on  https://review.openstack.org/51639716:14
dmsimardAJaeger: we were just discussing that, yes16:14
AJaegerdmsimard: sorry, read backscroll - found bug, pasted - and you beat me to it ;)16:14
*** bnemec has joined #openstack-infra16:15
AJaegerclarkb: could you put the infra-publishing change on your review queue again, please? https://review.openstack.org/51601016:16
jeblairmwhahaha: seems likely the errors you're seeing are the same thing -- a transient issue caused by the zuul restart16:16
mwhahaha:/16:16
*** trown is now known as trown|lunch16:16
mwhahahait's really screwing with the gate, is there some sort of recovery that can be done?16:17
openstackgerritRonelle Landy proposed openstack-infra/tripleo-ci master: DO NOT MERGE - Testing specific DLRN hash tag Also testing dlrn_hash_tag_newest set to specific hash.  https://review.openstack.org/51662416:17
mwhahahabefore the restart we were like 20hours behind16:17
*** panda|ruck is now known as panda|ruck|bbl16:18
jeblairmwhahaha: we could sacrifice the change at the head of the gate to force a reset on all the changes behind it16:18
fungior promote a change to cause it all to reshuffle16:19
mwhahahai already sacrificed a whole bunch of puppet jobs16:19
jeblairfungi: yeah16:19
* fungi is unsure whether promoting the first change would actually restart jobs16:19
jeblairfungi: i don't think so16:20
jeblairbut promoting the second one ahead of the first should16:20
*** bhavik1 has joined #openstack-infra16:20
AJaegerthe first three in the queue are looking fine, aren't they? So better let them through?16:20
jeblairAJaeger: our only tool is moving it to the top16:20
jeblairAJaeger: we could wait i guess16:20
mwhahahameh i guess we'll just have to deal16:20
mwhahahajust really frustrating16:21
AJaegerjeblair: at this time: I would wait - otherwise we lose the first two...16:21
jeblairmwhahaha: you don't like any of the proposed options?16:21
*** smatzek has quit IRC16:21
mwhahahanot really since we haven't been able to merge anything in like a day16:21
fungiwell, we can give promote a list for which the first three are unchanged but swap the fifth for the fourth16:21
mwhahahahow about retry if error :/16:21
jeblairfungi: oh we can?16:21
*** smatzek has joined #openstack-infra16:21
fungipromote used to at least take a list of changes16:22
jeblairmwhahaha: these are "permanent" errors16:22
pabelangerjust looking into zuul-executors, we look to be swapping 2GB-3GB (on average) across all of them16:22
AJaegerfungi: that might just work...16:22
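[A hedged sketch of the promote invocation being weighed here: move the listed changes to the front of the gate queue. The flag names are assumed from the zuul RPC client and the change ids are the ones mentioned in this conversation, used purely for illustration.]

    zuul promote --tenant openstack --pipeline gate \
        --changes 514330,2 511509,7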
dmsimardunrelated and don't want to sidetrack but wanted to point out maybe there's an issue with nodepool? looking at grafana, we're capping at ~830 nodes.. I see an uptick in failures, it seems like there's a lot of nodes in "ready" state perhaps not being allocated to jobs in a timely fashion ?16:22
dmsimardwe should be capping at higher than 830 nodes, we saw north of 900 easily after we shifted back inap's capacity to v316:22
jeblairmwhahaha: so what change would you like moved to position #4?16:23
mwhahahalet me look16:23
jeblairShrews: can you look into that with dmsimard ?16:23
dmsimardShrews: http://grafana.openstack.org/dashboard/db/nodepool16:24
*** shardy is now known as shardy_afk16:24
pabelangerI'm also thinking we might either need to bump load governor up, or stand up a few more executors. We look to have a fair bit of ready nodes atm16:24
dmsimard185 ready nodes and 74 failed nodes.. with a bunch of queued jobs in zuul, there's something going on for sure :/16:24
mwhahahajeblair: can we promote a failed one? 51150916:25
*** bhavik1 has quit IRC16:25
dmsimardpabelanger: oh, that's a good point. Perhaps we have a bunch of nodes available to be queued but current executors are too loaded ?16:25
fungimwhahaha: that one looks like it declares a dependency on 51036316:25
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Increase github delay to 10 seconds  https://review.openstack.org/51581216:25
jeblairmwhahaha: yeah, it doesn't really matter. as soon as whatever is in position #4 changes, everything behind it will restart.16:25
dmsimardpabelanger: doesn't explain the failed nodes, but does explain the ready nodes16:25
pabelangerdmsimard: possible, I am trying to see if that is the case16:25
mwhahahafungi: no that declared a dep on 512082 which is already merged16:26
dmsimardpabelanger: no load graphs on cacti for ze's :(16:26
pabelangereach time we restart an executor, there is a large spike in load on the others, which makes sense16:26
clarkbinc0: we changed how the iptables are configured, but they should be wide open between all test nodes still (it's just ansible doing it instead of a nodepool ready script now)16:26
pabelangerso need to wait a bit for things to even out16:27
*** smatzek has quit IRC16:27
jeblairdmsimard: we have graphs http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all16:27
fungimwhahaha: oh, you're right it was showing a tree where 510363 was the closest working change16:27
clarkbinc0: if the jobs fail, you should be able to see in the job-output.txt and in ara whether the playbooks that set up the rules between the nodes ran16:27
dmsimardjeblair: oh, I wasn't looking at the right place, thanks16:27
dmsimardjeblair: was looking at, example, http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=55816:27
*** smatzek has joined #openstack-infra16:27
*** iyamahat has joined #openstack-infra16:27
jeblairdmsimard, pabelanger: this new graph may be helpful: http://graphite.openstack.org/render/?width=586&height=308&_salt=1509467265.42&target=stats.gauges.zuul.executors.accepting16:28
fungimwhahaha: jeblair: at any rate, something seems to have kicked the change after 514330 out of the gate so everything behind restarted anyway16:28
jeblairi haven't grafana'd that16:28
pabelangerjeblair: Yah, that is neat16:28
mwhahahafungi: jeblair ok well i'll just keep an eye on it16:29
inc0clarkb: what I'm experiencing (and can't figure out what's wrong) is timeouts between multinode stuff16:29
inc0like galera16:29
jeblairmwhahaha: do you want me to reshuffle to fix 514330, or leave it?16:29
mwhahahajeblair: just leave it16:29
inc0so while the iptables setup runs well (I assume), it might close something I need16:29
jeblairmwhahaha: ok16:29
inc0I'm also trying with tunnel playbook, but doesn't seem to help16:29
*** esberglu has joined #openstack-infra16:29
*** kjackal_ has quit IRC16:30
pabelangerdmsimard: we also have ~135 ready, but locked nodes in nodepool. So we are likely waiting for the node request to be fulfilled before unlocking16:30
*** kjackal_ has joined #openstack-infra16:30
clarkbinc0: the overlay you mean?16:30
inc0yeah16:30
clarkbinc0: do you have an example failing job we can look at logs for?16:30
AJaegerdmsimard, jeblair recheck did not help - still "  ERROR Project openstack/requirements does not have the default branch master" on https://review.openstack.org/51639716:31
jeblairAJaeger: i'll dig into it16:31
pabelangerdmsimard: so, it is possible that nodepool is waiting for jobs to finish, so it can launch more nodes16:31
AJaegerthanks, jeblair16:31
AJaegerbbl16:32
*** smatzek has quit IRC16:32
*** felipemonteiro_ has quit IRC16:32
*** salv-orlando has joined #openstack-infra16:32
*** salv-orlando has quit IRC16:33
Shrewsdmsimard: that grafana graph, i believe, can be a bit misleading. a node can be READY and already assigned to a request, but there could be a wait on other nodes needed for the request16:33
*** ijw has joined #openstack-infra16:33
pabelangeryah, that's basically what is going on ATM16:33
inc0clarkb: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/ for example16:33
*** smatzek has joined #openstack-infra16:33
inc0in this ps mariadb is single node, but it fails in a different place16:33
*** ramishra has quit IRC16:33
inc0also timeout16:34
inc0http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/ansible/deploy <- this is deployment log16:34
dmsimardinc0: iptables is literally set up to accept any traffic from nodes in a multinode set, let me pick you up the set of rules16:35
Shrewsdmsimard: if you have a specific review in mind, or anything specific, really, i don't mind digging. but it's hard to give an explanation of the general state of things16:35
dmsimardShrews: there's contention for the amount of ready nodes and I think we understand that part. I was also asking about the seemingly high amount of failed nodes16:36
clarkbinc0: so it is trying to hit http://172.24.4.250:35357 and failing? where do we see that ip address is assigned?16:36
*** jascott1 has joined #openstack-infra16:37
dmsimardinc0: http://logs.openstack.org/36/509436/6/gate/multinode-integration-ubuntu-xenial/7a9df40/job-output.txt.gz#_2017-10-23_22_40_18_20100216:37
*** salv-orlando has joined #openstack-infra16:37
dmsimardinc0: are you using switch/peer groups ?16:37
jeblairAJaeger: it looks like that executor's internal git repo for openstack/requirements is corrupted16:37
jeblairi suspect that's either related to the unclean shutdown/startup, or my cleaning up after it16:38
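[One way to confirm and clear a corrupted executor git cache like the one described above; the path follows the executor-git layout noted later in the log, and the assumption is that the executor simply re-clones the repo the next time a job needs it.]

    # check the cached copy for corruption, then remove it so it is re-cloned
    sudo git -C /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements fsck
    sudo rm -rf /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements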
clarkbdmsimard: ya they are in the inventory16:38
clarkbdmsimard: http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/zuul-info/inventory.yaml16:38
pabelangerShrews: dmsimard: the failed nodes are because we've hit quota on the cloud. So maybe an issue with calculating that?16:38
*** smatzek has quit IRC16:38
pabelangereg: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already used 1531904 of 1536000 ram16:38
*** pvaneck has joined #openstack-infra16:38
Shrewspabelanger: dmsimard: shade.exc.OpenStackCloudHTTPError: (403) Client Error for url: https://iad.servers.api.rackspacecloud.com/v2/637776/servers Quota exceeded for ram: Requested 8192, but already used 1531904 of 1536000 ram16:38
dmsimardpabelanger: leaked nodes perhaps ?16:38
Shrewsseeing that in nl01 logs16:38
pabelangeryah, same16:39
clarkbinc0: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 198.72.124.138. Set the 'ServerName' directive globally to suppress this message I think that is the problem16:39
clarkbinc0: from http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/keystone.txt.gz16:39
pabelangerI can check for leaked nodes in IAD quickly16:39
clarkbinc0: would have to double check the vhost but if apache thinks it is serving from a different name than the one you are hitting that could explain it16:39
Shrewspabelanger: nodepool doesn't calculate anything wrt cpu or ram16:40
clarkbinc0: ya the listens are all off at http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla_configs/keystone/wsgi-keystone.conf too16:40
clarkb172.24.4.1 != 172.24.4.25016:40
*** catintheroof has joined #openstack-infra16:40
pabelangerShrews: yah, maybe a leak or max-server isn't quite correct16:41
Shrewspabelanger: possible that max-servers is set too high?16:41
clarkbso I think apache is just not listening on the IP that would make this work16:41
Shrewsyeah, that16:41
*** vsaienk0 has quit IRC16:41
*** jascott1 has quit IRC16:41
*** armaan has joined #openstack-infra16:41
dmsimardclarkb: those are internal bridged IPs, right ? We don't set those up in the firewall explicitly but in practice the traffic goes in and out of the private IPs so it should go through fine I think16:42
pabelangerclarkb: can I delete you clarkb-test-centos7 in rax-iad?16:42
*** sree has joined #openstack-infra16:42
clarkbpabelanger: yes that should be fine16:42
pabelangerkk16:42
clarkbdmsimard: ya I think the firewall is fine, its the process not listening on the right ip:port combo16:42
*** jascott1 has joined #openstack-infra16:43
inc0clarkb: .250 will be handled by haproxy16:43
inc0that's why16:43
pabelangeryah, I see a few vms in rax-iad that looks to be in error state16:43
pabelangergoing to try and clean them up16:43
clarkbinc0: do you have more logs than http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/docker_logs/haproxy.txt.gz ?16:44
Shrewspabelanger: also worth noting that the launcher restart a few moments ago would have eliminated a couple of situations where we could have some instances essentially sticking around much too long.16:44
*** Apoorva has joined #openstack-infra16:44
pabelangerShrews: ok16:45
clarkbaha http://logs.openstack.org/79/512779/25/check/kolla-ansible-ubuntu-source-ceph/fbace60/primary/logs/kolla/haproxy/haproxy_latest.20171025.b55c6793b5c7f834e.txt.gz16:45
*** catintheroof has quit IRC16:45
*** sree has quit IRC16:47
inc0clarkb: let me retry full mariadb cluster so you'll see it16:47
inc0issue wasn't with haproxy tho because turning it off didn't help with mariadb16:47
inc0it's node->node timeout over 172... ips that failed16:48
*** yamahata has joined #openstack-infra16:48
dmsimardShrews: fwiw tobiash has a stack of patches around switching from "max-servers" to use quotas instead, https://review.openstack.org/#/c/503838/16:48
clarkbinc0: right but you aren't really going node to node over 17216:48
clarkbinc0: you are going node to proxy to node to docker16:48
clarkbinc0: and any one of those pieces could be broken16:48
clarkb(actually its docker to node to proxy to node to docker)16:48
inc0we use net=host in docker so network stack isn't dockerized16:48
dmsimardShrews: (which I'm very excited about)16:48
inc0we don't use docker proxy or docker networking at all16:49
clarkbinc0: so haproxy is the only address rewriting involved?16:49
inc0yes, well that and keepalived16:49
inc0keepalived creates .250 ip and handles HA16:50
inc0(on host)16:50
dmsimardjeblair: so we had one particular executor with a corrupted repository and that's it ?16:50
inc0haproxy listens on .250 and forwards to .1 .2. .316:50
Shrewsdmsimard: yes. i think we're in a position to begin considering those now16:50
jeblairdmsimard: i'm going to check all of them16:50
jeblairsudo ansible 'ze*' -m shell -a 'whoami' --become-user zuul16:51
jeblairwhy does that report 'root' ?16:51
Shrewsjeblair: i think you also need --become16:51
dmsimardjeblair: perhaps .ssh/config defaults to root ? and what Shrews said16:51
jeblairShrews: yep, thanks :)16:52
Shrewsjeblair: you'd think the user one would imply the other, but... not so much16:52
jeblairyeah, i guess it's a really strict correspondence with commandline args and module args16:52
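[The corrected ad-hoc invocation per the exchange above: --become-user only takes effect when --become is also supplied.]

    sudo ansible 'ze*' -m shell -a 'whoami' --become --become-user zuul
    # expected per-host output: zuul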
clarkbinc0: ok the haproxy log gives us clues; on each node it appears it can talk to the keystone running on itself but not the others16:52
pabelangerShrews: did you want to take a peek into infracloud-chocolate, a large portion of its nodes are available currently: http://grafana.openstack.org/dashboard/db/nodepool-infra-cloud I haven't checked why that is16:52
mwhahahastill getting errors :/16:53
clarkb(just based on the health checks)16:53
*** smatzek has joined #openstack-infra16:53
Shrewspabelanger: looking16:53
inc0clarkb: problem is, even if I turned off haproxy it fails16:53
dmsimardmwhahaha: jeblair identified the problem and is working on it16:53
jeblairmwhahaha: yeah, i think there's a problem on at least one executor, i'm working on a solution16:53
*** smatzek has quit IRC16:53
mwhahahak16:53
inc0clarkb: let this latest patchset run16:53
jeblairmwhahaha: i'll be happy to promote changes once it's fixed; i'll let you know16:53
inc0I just uploaded new one with multinode galera16:53
clarkbinc0: I don't think we need galera...16:54
*** smatzek has joined #openstack-infra16:54
inc0that's basically setup I'd like to test16:54
inc0well, we do, galera deployment is part of our code and we want to gate it16:54
clarkbinc0: I mean to debug this problem16:54
clarkbit exists with or without galera16:54
inc0right, but galera makes it appear earlier16:54
inc0anyway, removing haproxy has the same effect16:55
inc0removing haproxy -> pointing all API traffic to .116:55
mnaserare you deploying keepalived in the gate across 3 vms?16:55
inc0yeah16:55
mnaserso how can the 2 other nodes reach the vip16:56
inc0well they should, I use tunnel overlay16:56
mnaserbecause usually openstack won't let traffic originating from an IP that does not belong to an instance leave the instance16:56
inc0that's why I'm using tunnel overlay;)16:57
mnaseroh, if you have an overlay for that then im not sure, i'd consider MTU issues because now you're doing tunnel in tunnel16:57
clarkbinc0 Oct 25 23:40:24 ubuntu kernel: [  720.595795] iptables dropped: IN=brinfra OUT= MAC=e2:9f:77:40:45:4a:fa:58:48:a1:33:49:08:00 SRC=172.24.4.3 DST=172.24.4.2 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=25069 DF PROTO=TCP SPT=41312 DPT=35357 WINDOW=28200 RES=0x00 SYN URGP=016:57
inc0I checked mtu16:57
mnaseras some cloud providers don't give you a real L2 network but a tunneled one16:57
mnaserso the MTU might change from one provider to another16:57
*** bnemec has quit IRC16:57
inc0clarkb: sounds like problem there16:57
clarkbinc0: reading that I think the problem is likely that the firewall rules are only updated for the actual host IPs and not for the overlay16:57
clarkbdmsimard: ^ does the overlay role update firewall rules too?16:58
clarkbdmsimard: if not we probably want it to16:58
*** rbrndt has quit IRC17:00
*** baoli has quit IRC17:00
dmsimardclarkb: that's what I said earlier, no ? said we didn't add bridge IPs, only nodepool private ips17:00
clarkbdmsimard: oh sorry I misread it17:01
*** baoli has joined #openstack-infra17:01
dmsimardclarkb: just making sure I'm not crazy http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-31.log.html#t2017-10-31T16:42:1017:01
dmsimarddo we need to add the bridge IPs to the firewall, then ?17:01
clarkbyes we will17:01
pabelangerclarkb: dmsimard: Shrews: okay, clean up of rax-IAD underway, looks like we did leak some old nodes. maybe during cutover to zuulv317:01
*** jcoufal has quit IRC17:01
Shrewspabelanger: i'm sort of suspecting zuul has those chocolate ready nodes locked and is not doing anything with them17:02
clarkb(I read private IPs as the thing used by the overlay but you mean the actual cloud provided private IPs)17:02
dmsimardclarkb: ok, I'll get that done17:02
dmsimardbrb17:02
clarkbdmsimard: and rather than doing ip to ip we just want to open the whole range up17:02
pabelangerShrews: oh, could it be because executors stopped accepting jobs, due to high load?17:02
clarkbso that inc0 can use .250 for haproxy17:03
Shrewspabelanger: the nodes are assigned, the request is gone, so nodepool is waiting for zuul to change their state17:03
*** yamamoto has joined #openstack-infra17:03
*** hashar is now known as hasharAway17:03
pabelangerkk17:03
Shrewspabelanger: probably17:03
*** jcoufal has joined #openstack-infra17:03
pabelangerShrews: thanks! I'll dig into that more17:03
pabelangeronce I finish clean up17:03
jeblairdmsimard: ze 02,03,04 have ara 0.14.0.  ze 05,06,07,08 have 0.14.2.  ze 01,09,10 have 0.14.4.17:03
jeblairdmsimard: i believe those are clustered by installation date17:03
jeblairdmsimard: what's the version you just released?17:03
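[A hedged way to survey the version skew jeblair describes, reusing the ad-hoc pattern from earlier in the log; this assumes ara is pip-installed system-wide on the executors.]

    sudo ansible 'ze*' -m shell -a 'pip show ara | grep ^Version'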
clarkbdmsimard: I think `sudo iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT` is what devstack-gate used to do17:05
openstackgerritJames E. Blair proposed openstack-infra/puppet-zuul master: Ensure ara is updated on executors  https://review.openstack.org/51674017:05
jeblairclarkb, fungi, pabelanger: ^ is that okay to do?17:05
jeblairi can never remember what works and doesn't.17:06
*** rkukura has quit IRC17:06
jeblairdmsimard: ^17:06
pabelangeryah, syntax looks correct17:06
dmsimardjeblair: 0.14.517:06
clarkbjeblair: that should be ok but it will update all the deps too (which may cause problems like with the subunit2sql thing fungi ran into)17:06
inc0also fyi, patchset with dockerhub publishing is up17:06
inc0we'll move our gates to dockerhub + proxy as soon as first images appear upstream17:07
inc0to get rid of tarballs17:07
dmsimardjeblair: we don't run a pip upgrade of zuul which would trigger an update of its dependencies ?17:07
inc0proxy == https cache in nodepools17:07
dmsimardbrb...17:07
jeblairdmsimard: ara is not a zuul dependency17:07
dmsimardOhhh17:07
jeblair(it's optional)17:07
*** rkukura has joined #openstack-infra17:08
jeblairwe could probably work something out with those requirement tag thingies... i'm not sure if that whole pipeline works now...?17:08
*** baoli has quit IRC17:08
pabelangerokay, ready nodes are dropping now that quota in rax-iad is coming available again17:09
clarkbdmsimard: oh except that may only work with the old linux bridge things and not ovs, looking into how we set it up for ovs17:09
jeblairtag isn't the right word for that... what's the word they use?17:09
*** baoli has joined #openstack-infra17:09
clarkbjeblair: extras17:09
jeblairclarkb: yeah that's it!17:09
*** edmondsw has joined #openstack-infra17:10
*** yamamoto has quit IRC17:11
clarkbdmsimard: I'm actually not seeing any explicit allows since we switched to ovs. I think that is because neutron/nova net manage their own IPs and firewall rules on that range17:11
*** tosky has quit IRC17:11
clarkbdmsimard: so we didn't actually need to manage it directly which would explain why inc0 is having problems too but devstack didn't17:11
clarkbwe don't run the control plane on devstack multinode on the overlay17:11
pabelangerclarkb: so, to loop back to queue window size and related node usage at PTG. If we support a negative window size on failures, i think that would help with the wasting of CI resources?  EG: gate resets, 20 patches reset and suck up all the nodes17:12
clarkbwe only run the VM networks there17:12
*** lucasagomes is now known as lucas-afk17:12
*** catintheroof has joined #openstack-infra17:12
clarkbinc0: fyi ^ what is the motivation for using the overlay for the control plane here?17:12
*** bnemec has joined #openstack-infra17:12
inc0clarkb: well, I had same mariadb timeouts before overlay, so I asked here and you guys suggested overlay17:12
clarkbpabelanger: you mean 0 size? you can't really have a negative size17:12
inc0overlay is good for us also because we can use keepalived+haproxy too17:13
pabelangerclarkb: by sliding the window down to 10, it reduces the amount of nodes it grabs each reset17:13
jlvillalIs there a little command line utility that I can point at the 'zuul.d/' directory in our repo and it will print out what jobs should run for master, stable/pike, stable/ocata. And if they are voting, non-voting, or experimental.17:13
clarkbinc0: those should all run over layer 3 right?17:13
* jlvillal doesn't ask for much17:13
clarkbpabelanger: so my issue with that is the gate is supposed to pass. If it doesn't that is the bug not the window size which is already a hack around bad jobs17:13
inc0well, neutron underneath will not allow keepalived floating ip to work over regular network17:13
pabelangerclarkb: I mean, start at 20 today, and ramp up to 40 and down to 1017:13
jlvillalI'm trying to port what is in master to our stable branches.17:13
inc0neutron of nodepools17:14
clarkbpabelanger: why not just change the minimum to 10?17:14
clarkbpabelanger: it will slide up to 20 if your jobs pass17:14
inc0also L2 connectivity will generally be more similar to prod so it's a little added benefit17:14
pabelangerclarkb: yah, I mean setting to 10 might be easier. If we want to do that17:14
clarkbinc0: oh right you need a shared IP then ya you need an overlay17:14
inc0we might add second overlay for vms later;)17:14
jeblairpabelanger: what problem are you trying to solve?17:14
clarkbinc0: dmsimard ok considering that I think what we want is an option to the overlay network role to open up the entire range between the nodes eg 172.24.4.0/23 can talk to 172.24.4.0/2317:15
clarkbinc0: dmsimard default it to not do that as that is the old behavior and some things (like neutron) want to manage the rules themselves17:15
jeblair#status log restarted all zuul executors and cleaned up old processes from previous restarts17:15
openstackstatusjeblair: finished logging17:16
clarkbbut then inc0 can set that to true and get it working for control plane on overlay17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/python-glanceclient on ze0517:16
openstackstatusjeblair: finished logging17:16
fungijeblair: yeah, digging through logs and the source for puppet's pip package provider, i confirmed that using ensure=>latest will cause not only the named package to be updated but all its dependencies will be unconditionally updated to the latest versions on pypi (even if you've preinstalled sufficient versions as system packages) due to using the default upgrade strategy rather than the only-if-needed strategy17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/neutron on ze1017:16
openstackstatusjeblair: finished logging17:16
pabelangerjeblair: minimize tripleo change pipeline resetting and consuming all the nodes, it has been happening pretty often the last few weeks. I know they are trying to fix the issues, but still making progress17:16
jeblair#status log removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements on ze0717:16
openstackstatusjeblair: finished logging17:16
inc0clarkb: just add new role for iptables17:16
inc0we can call it explicitly17:16
*** gouthamr has joined #openstack-infra17:16
jeblairpabelanger: can you elaborate on "consuming all the nodes"?17:17
pabelangerdmsimard: Shrews: okay, rax-iad is happy again. Looking at rax-ord now17:17
jeblairmwhahaha, AJaeger, dmsimard: all the executors should be repaired.  should i promote any changes?17:17
*** markvoelker_ has joined #openstack-infra17:17
AJaegerjeblair: thanks - no requests from my side17:17
clarkbinc0: except it's fairly tightly coupled here, I don't think a new role is right because people will miss it when adding this role17:17
clarkbinc0: instead its an option of the overlay17:18
inc0either way works, thanks!17:18
jeblairfungi: and that does not happen on initial installation?17:18
AJaegerjeblair: what about integrated gate? Promote 515702 ?17:19
fungijeblair: it does not with ensure=>present because it just calls pip install without --upgrade17:19
*** markvoelker has quit IRC17:19
fungipip thinks --upgrade means "upgrade everything" unless you supply --upgrade-strategy=only-if-needed17:19
pabelangerjeblair: currently, we're up to 470 centos-7 nodes, which I haven't calculated but likely used mostly by tripleo jobs. Since those job run times are pretty long, each time their gate resets a large amount of resources gets wasted and jobs have to start again.  So, I was looking for a way to see how we could decrease the window size until that change queue becomes happy again.17:20
dmsimardjeblair: for my own curiosity, were you able to tell if (strangely) just openstack/requirements affected ?17:20
fungithe latter will upgrade the named packages but only upgrade their dependencies if the installed versions are insufficient17:20
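[A shell illustration of the three pip behaviors fungi contrasts, using ara as the example package since it is the one at issue; the comments describe the pip 9-era defaults discussed here.]

    pip install ara     # ensure=>present equivalent: installs if missing, never upgrades
    pip install -U ara  # ensure=>latest today: upgrades ara and ALL of its dependencies
    pip install -U --upgrade-strategy only-if-needed ara
                        # upgrades ara, touches deps only when installed versions are insufficient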
pabelangerjeblair: the delay has been pushing 24hrs for the last week or so17:20
jeblairdmsimard: i status logged the repos i repaired17:20
AJaegerdmsimard: see the #status log above17:20
dmsimardjeblair: oops, didn't read far back enough, thanks17:20
*** Swami has joined #openstack-infra17:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use user home as work directory of executor  https://review.openstack.org/51653217:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Check start time for wait_time key  https://review.openstack.org/51646517:21
*** markvoelker has joined #openstack-infra17:22
jeblairpabelanger: "wasting" is an abstract concern -- what's the problem?17:23
*** markvoelker_ has quit IRC17:23
jeblairhelp me understand the concrete problem that needs solving17:23
dmsimardjeblair: tripleo is consuming more resources because the gate keeps resetting for different reasons, not all within their control17:23
dmsimardis the gist of it17:23
jeblairdmsimard: more than what?17:23
*** e0ne has quit IRC17:23
dmsimardmore than if they merged the first time, rather than be rechecked/requeued several times17:24
*** salv-orlando has quit IRC17:24
clarkb470 nodes would be ~half our capacity right?17:24
*** felipemonteiro has joined #openstack-infra17:24
*** salv-orlando has joined #openstack-infra17:25
jeblairdmsimard: not really; a gate reset releases old resources and consumes new ones, so the consumption stays constant17:25
*** tmorin has quit IRC17:25
*** felipemonteiro_ has joined #openstack-infra17:25
pabelangerjeblair: Sure, our check pipeline is currently 231 changes deep, and I wanted to see how we can get more nodes for it.  My _gut_ is saying that because the change queue for tripleo is resetting every 2 hours, that is the reason we are backing up check17:25
dmsimardjeblair: okay, I think we're understanding each other but using different vocabulary17:25
pabelangerjeblair: however, it isn't a problem. Since we eventually get nodes into check after the gate reset17:26
clarkbmy initial guess is that the runtime of tripleo jobs is what is making this potentially problematic17:26
jeblairpabelanger: okay, a check backlog is something that can be addressed by reducing gate node usage (which can be done by shrinking the window when things are bad)17:26
clarkbbecause check is at a lower priority than gate so when tripleo holds half our capacity in gate then resets they get to keep holding that half for significant periods of time17:27
jeblairpabelanger: i'd caution against evaluating the backlog right now since i just reset the entire system twice this morning17:27
dmsimardclarkb: it's a bit of a vicious circle, yes17:27
clarkbwe give gate a higher priority because those jobs should never fail17:27
clarkband so in theory have good throughput17:27
jeblairwe had a 100 change backlog in check and 40 in gate when i restarted the first time.  when i did that, we lost about 800 CPU hours of computation.  and then i did it again.17:27
clarkb(also merging things is important)17:27
pabelangerjeblair: sure, understood.17:27
*** salv-orlando has quit IRC17:29
fungithe main question i have is why is tripleo's shared change queue so much larger than the others in the gate? some combination of excessive job runtimes, more frequent job failures and tighter coupling between a greater number of repos than most other openstack projects?17:29
*** felipemonteiro has quit IRC17:29
*** jpich has quit IRC17:29
dmsimardmwhahaha, EmilienM ^17:29
fungior do they actually push that many more changes than other teams?17:29
*** pcaruana has quit IRC17:29
jeblairfungi: i'd say mostly the first 2 right now.  the project diversity doesn't seem to be a big issue.17:29
*** sree has joined #openstack-infra17:29
*** trown|lunch is now known as trown17:30
pabelangeryah job failures and long runtimes is the current state17:30
odyssey4meoh dear, did zuul restart?17:30
odyssey4medid I break it again?17:30
fungior is it also that they've held off approving changes due to tripleo cloud outages, and are working through an approval backlog now?17:30
pabelangerhowever, that is something we can't currently fix, that would be done on the tripleo side17:30
*** camunoz has quit IRC17:30
dmsimardodyssey4me: zuul restarted, likely not your fault :)17:31
mwhahahathere are many reasons for the long queue today, most of which are not necessarily tripleo failures17:31
fungijust trying to figure out whether it's expected to be perpetual or whether this is temporary while tripleo catches up on pending change approvals17:31
*** dhinesh has joined #openstack-infra17:31
mwhahaha1) zuul reset (and subsequent errors) has contributed to the length, 2) puppet jobs mixed in also got hit with a gem update that broke unit tests17:32
pabelangerokau, moving to rax-ord clean up17:32
mwhahahawe aren't approving more things than normal afaik17:32
jeblairpabelanger: anyway, if you want to lower the min window because check is too backlogged, i will probably be fine with that, i'd just ask that you not make that evaluation based on the backlog right now which we know is not representative.17:33
*** ijw has quit IRC17:33
mwhahahaI've asked that we start actively tracking the gate failures better, it's really hard to tell once they clear the dashboard what failed where17:33
jeblairpabelanger: it would be good to know what the backlog is under normal circumstances, and how the min-window change would be expected to affect that17:33
*** ijw has joined #openstack-infra17:33
mwhahahai have noticed there does seem to be a slower response time in updating the status of the jobs on the v3 dashboard as opposed to the older one so i'm wondering if the constant churn is also not helping that17:33
fungimwhahaha: "zuul reset" isn't a cause, merely a symptom. aside from the handful we just had due to the corrupt requirements git repository on one of the executors, most of those presumably go back to a generally higher failure rate for tripleo jobs i suppose? otherwise we'd see similar backlog for other projects17:34
*** ijw has quit IRC17:34
clarkbmwhahaha: in theory that is what the health dashboard should tell you (what failed where)17:34
*** sree has quit IRC17:34
*** ijw has joined #openstack-infra17:34
pabelangerjeblair: is there an easy way to track shared queue reset with statsd? Or maybe something we could start tracking.17:34
Shrewskeep in mind that right now, jobs using the tripleo-centos-7 node label are limited to a single pool that has max-servers of 70. Once those are all in use by long running jobs, other jobs requesting that node will be waiting for those 70 in-use nodes to be released before they even begin17:34
mwhahahafungi: but it's a symptom of something outside of the tripleo world17:34
Shrewsnot sure how relevant that info is, just throwing it out there17:34
mwhahahafungi: the point is that there are zuul or other related problems causing jobs to reset and because the jobs take long it has a bigger impact17:35
mwhahahafungi: not related to things specific to the tripleo world, so because our jobs take longer the impact is greater on our queue17:35
fungimwhahaha: many (i would wager most?) gate resets are due to job failures17:35
mwhahahafungi: except this morning when the executor was erroring for a few hours17:36
*** felipemonteiro_ has quit IRC17:36
*** tesseract has quit IRC17:36
fungiright, i said aside from that specific incident17:36
*** felipemonteiro_ has joined #openstack-infra17:36
fungiwhich should have affected other projects too17:36
mwhahahait did but they are in their own queue and approve less17:36
mwhahahaso all of ours are more visible because of the single queue17:36
mwhahahai saw a single nova change in the gate17:36
pabelangerpypi.slave.openstack.org is that something we can delete from rax-ord? looks like something from the past we didn't clean up17:36
jeblairShrews: a couple of things mitigate that -- those are only used by check jobs, and only used by tripleo.  so that shouldn't directly affect the gate queue length (but can make getting jobs ready to be gated take longer)17:37
pabelangerthat is nodepool project too17:37
fungiokay, so you are saying you're actually approving more changes than the projects sharing the integrated gate queue17:37
clarkbjeblair: remind me which roles were we moving into devstack? was it the swap setup? network overlay is staying in project-config right?17:37
mwhahahafungi: without actual metrics I cannot say for certain but our queue is larger than the integrated one17:37
mwhahahafungi: all i can speak to is the last 48 hours17:37
fungimwhahaha: agreed, trying to find out how it reached that point17:37
mwhahahafungi: and the failures the queue has had weren't necessarily tripleo specific17:38
fungiit's also just possible other openstack projects are taking the week off in preparation for the summit i guess?17:38
clarkbmulti-node-bridge is in zuul-jobs so it must not be moving17:38
dmsimardclarkb: re: bridge network and iptables -- we're doing this *inside* the image: http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/nodepool-base/install.d/20-iptables17:38
pabelangerhttp://logs.openstack.org/74/467074/7/gate/legacy-tripleo-ci-centos-7-scenario002-multinode-oooq-container/84b7572/ just reset tripleo gate17:38
dmsimardclarkb: notice the rules with 172.24.4.0/2317:38
pabelangerfailed to download from github.com looks like17:38
clarkbdmsimard: iirc that is a hack to make ironic agent work17:39
mwhahahafungi: do we have graphite metrics for zuul queue sizes?17:39
pabelangerso, that is one reason for failures, we discussed at PTG not downloading from github any more, but that requires changes to DLRN17:39
fungimwhahaha: the disconnect for me is that if the majority of the issues weren't tripleo-specific, then i'm trying to understand how that isn't impacting other projects equally17:39
jeblairclarkb: i think network overlay should be in ozj or zj17:39
clarkbdmsimard: we can probably clean that up in the future, but aiui ironic nodes must have access to the control plane17:39
pabelangereg: using spec files from RPMS over github.com17:39
clarkbjeblair: ya its zj I was confused17:39
dmsimardclarkb: but we're talking about just whitelisting the entire traffic between that range so that would be taken out, aye ?17:40
openstackgerritMiguel Lavalle proposed openstack-infra/openstack-zuul-jobs master: Remove job neutron-dsvm-api  https://review.openstack.org/51674417:40
jeblairfungi, mwhahaha: it sounds like at least one of the non-tripleo issues is tripleo-focused at least -- the gem failures.17:40
clarkbdmsimard: no not necessarily. That rule is 172.24.4.0/23 to the control plane, which is one cloud's IPs17:40
mwhahahajeblair: no that's puppet-openstack specific17:40
jeblairfungi, mwhahaha: that one is a matter of perspective17:40
clarkbdmsimard: I think we only want to add the rules that allow 172.24.4.0/23 to talk to 172.24.4.0/2317:40
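For context, the rule clarkb is describing amounts to accepting traffic whose source and destination both fall in the overlay range. A rough sketch of the equivalent raw iptables commands (the real change lands in the zuul-jobs Ansible role; the chains and rule positions here are assumptions):
    # allow overlay-to-overlay traffic on the multi-node bridge network
    sudo iptables -I INPUT   -s 172.24.4.0/23 -d 172.24.4.0/23 -j ACCEPT
    sudo iptables -I FORWARD -s 172.24.4.0/23 -d 172.24.4.0/23 -j ACCEPT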
mwhahahajeblair: and because we share queue, it impacted tripleo17:40
pabelangerfungi: clarkb: see my question about pypi.slave.o.o in rax-ord, is that safe to delete17:40
jeblairmwhahaha: you use those modules, right?17:40
clarkbpabelanger: I don't know17:41
pabelangerclarkb: k, I'll add it to meeting17:41
jeblairmwhahaha: i mean, there's a *reason* that queue is shared17:41
mwhahahajeblair: by your argument then we should share neutron, nova, etc17:41
clarkbdmsimard: if you haven't started on the change to multi node bridge I can take a stab at it17:41
clarkbdmsimard: let me know17:41
jeblairmwhahaha: that is in fact my argument but i have compromised17:41
fungimwhahaha: and attempting to ascertain whether whatever is causing tripleo to be singled out at the moment is a long-term problem we need to address systemically to bring tripleo's resource consumption into alignment with other teams, or whether this is a temporary/fleeting ballooning of resource needs which will subside once you work through it17:41
mwhahahaonce again, we need metrics and data so we can get to the bottom17:42
*** armaan has quit IRC17:42
*** salv-orlando has joined #openstack-infra17:42
mwhahahai also do not like this but without help understanding wtf is occurring in zuul over time when i'm not watching it's hard to say17:42
*** camunoz has joined #openstack-infra17:42
dmsimardclarkb: mostly making sure we'd be doing the change in the right place, I searched and couldn't find a reference to "iptables -I FORWARD -m physdev --physdev-is-bridged -j ACCEPT".. closest I found was in actual neutron code17:42
jeblairmwhahaha: at any rate, what i'm trying to say is that those gem failures are a partial answer to fungi's question of why the tripleo queue has been adversely affected recently17:43
*** d0ugal has quit IRC17:43
fungimakes sense17:43
clarkbdmsimard: ya I think that was left over from when we used linux bridges but now it is ovs17:43
mwhahahajeblair: that was this morning and was addressed before the executor problem. it doesn't explain what happened yesterday17:43
clarkbdmsimard: I did a bit of digging myself, pretty sure the rule is lacking to meet inc0's needs17:43
mwhahahathe queue was already in bad shape before that happened17:43
mwhahahathat's just contributed to further delays today17:44
clarkbmwhahaha: is the health dashboard not tracking it for you?17:44
clarkbre metrics17:44
inc0thanks clarkb, dmsimard do you want me to publish patch for it?17:45
fungimwhahaha: poking around in graphite i don't think we have separate meters for every shared change queue17:45
mwhahahafungi: that would be beneficial to have so that we can tell when issues start for RCA17:45
clarkbmwhahaha: http://status.openstack.org/openstack-health/#/17:45
clarkbseems to be tracking tripleo jobs based on the front page17:45
dmsimardclarkb, inc0: I've started something17:45
clarkbdmsimard: cool, thanks17:46
*** Guest95277 has quit IRC17:46
inc0thanks dmsimard17:46
mwhahahaclarkb: yea it's there but i'll have to dig further.17:46
openstackgerritMiguel Lavalle proposed openstack-infra/project-config master: Remove legacy-neutron-dsvm-api from Neutron  https://review.openstack.org/51672417:47
mwhahahaclarkb: what specifically would be beneficial is a break out of check vs gate17:47
clarkbmwhahaha: aiui its all gate today (no check) just due to volume but mtreinish would have to confirm17:47
mwhahahak i'll have to dig in and see if there's specifics we can point to17:48
mwhahahai recall there being a problem in the dashboard around the pingtest being improperly reported so i'll need to make sure that's still not a problem17:49
fungijeblair: is stats.zuul.tenant.openstack.pipeline.gate.total_changes scaled by 0.01? seems more spiky than i would expect too17:50
dmsimardclarkb: bleh, I hate this but we might need to use a meta dependency. The problem is that the bridge network is not known before the multi-node-bridge role runs and for that role to work, we need to run the firewall role first to authorize the traffic between the nodes17:51
pabelangermwhahaha: I had a patch up to fix that, https://review.openstack.org/495517/ it was because it would set pingtest fail by default, then never run the test17:51
EmilienMlegacy-tripleo-ci-centos-7-nonha-multinode-oooq legacy-tripleo-ci-centos-7-nonha-multinode-oooq : ERROR Project openstack/requirements does not have the default branch master ( found on https://review.openstack.org/#/c/516683/ - stable/newton)17:52
dmsimardEmilienM: we identified the issue and resolved it17:52
fungiEmilienM: that was corrected an hour or so ago17:52
EmilienMdmsimard: ok, I'll run "recheck" in that case17:52
clarkbdmsimard: yes I think you need to just do it in multi-node-bridge completely independent of the firewall role17:52
fungicorrupt git repository cached on an executor17:52
mwhahahapabelanger: yea that's what i'm remembering so not sure since we moved to tempest if we need that as much17:52
*** jascott1 has quit IRC17:52
clarkbdmsimard: i.e. make it a feature of the multinode bridge (and have it be a flag, off by default, that decides if it turns on or not)17:52
pabelangerclarkb: fungi: do you mind reaching out to citycloud, or I can if you have a contact email, about deleting our 7 stuck instances in Kna1. They seems stuck in BUILDING17:53
dmsimardclarkb: we could put it in multi-node-bridge directly, but then the rules wouldn't be persisted (https://review.openstack.org/#/c/513943/)17:53
clarkbdmsimard: I think you can just add a task to main.yaml in multi-node-bridge with a when: flag | bool17:53
*** ccamacho has quit IRC17:53
clarkbdmsimard: oh right that17:53
dmsimardclarkb: let me put up a WIP to explain17:53
*** felipemonteiro__ has joined #openstack-infra17:54
fungipabelanger: i don't know that i have any specific contact off the top of my head--if we do have contact info it'll generally be in our passwords file17:54
clarkbpabelanger: I've included you in earlier emails to them you can use but also our contact for that cloud should be in the passwords file17:54
pabelangerclarkb: k, couldn't remember. let me search mail again17:54
pabelangerwith nodepool launcher errors under control, I'm going to see why we have a large amount of ready nodes now17:55
fungipabelanger: if we have a dashboard login of some kind for them, might make sense to just open a trouble ticket through that since it's likely non-urgent17:55
clarkbdmsimard: can you just call the firewall role again from multi node bridge and pass in the different IPs? I think the problem is today the firewall role assumes the node IPs in inventory right? but we should be able to have it take a list?17:55
clarkbdmsimard: or break out the persist iptables portion of that role and only reuse that bit (that might actually be easiest)17:55
pabelangerfungi: good idea17:55
dmsimardhang on, I'll have a patch up soon17:55
clarkbjeblair: considering you wrote https://review.openstack.org/#/c/516502/2 I'd be curious to get your thoughts on that (it's the logstash job submission change)17:57
*** felipemonteiro_ has quit IRC17:57
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status  https://review.openstack.org/51675518:01
pabelangerjeblair: ^is that the correct syntax to render the new accepting metric for executors?18:01
*** pvaneck has quit IRC18:02
*** panda|ruck|bbl is now known as panda|ruck18:02
mwhahahaquestion about logstash, which is the correct filename to use going forward? with or without the .gz? ie job-output.txt or job-output.txt.gz18:02
clarkbmwhahaha: that is what I'm trying to sort out with https://review.openstack.org/#/c/516502/218:02
clarkbmwhahaha: I'm asserting no .gz (backward compatible)18:03
mwhahahak18:03
*** d0ugal has joined #openstack-infra18:03
*** tpsilva has joined #openstack-infra18:03
clarkbbut hoping more people will review that change so we can make that decision18:03
dmsimardclarkb: re-reading ianw's comment on https://review.openstack.org/#/c/513943/ -- I *guess* we could take out the iptables persistence into a specific role that'd run after multi-node-bridge and multi-node-firewall18:04
mwhahahai know it used to be not .gz but the v3 changed that. so i'm for dropping it18:04
dmsimardclarkb: which would avoid having to run the multi-node-firewall role twice.18:04
clarkbdmsimard: ya18:04
dmsimard(and dealing with a meta dependency, which is cool too.)18:04
dmsimardokay, let's do that.18:04
clarkbdmsimard: that would be my preference, I think it's clear what is going on that way18:04
* dmsimard hates meta dependencies18:04
* mwhahaha adds a meta dependency on dmsimard 18:06
*** rbrndt has joined #openstack-infra18:06
clarkbcan you include_role the same role from multiple places?18:06
dmsimardclarkb: yes18:07
clarkbbut then main firewall setup can include_role persist iptables and then multi node bridge can include_role persist iptables too18:07
clarkbeither way works, as long as persist iptables happens after all the firewalling at least once18:07
*** rbrndt has quit IRC18:07
dmsimardclarkb: I intended to add the persist iptables role once in the multinode playbook after both roles, but it's true that each of those roles could do an include_role too.18:08
dmsimardeither way works -- using include_role makes it so each role can be used on its own without relying on the playbook including the role18:09
* dmsimard no strong opinion on either18:09
clarkbdmsimard: ya may be worth going that route just for the ability to consume things individually18:09
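For reference, the "persist iptables" step these roles share boils down to writing the live ruleset out to the file that is restored at boot. A sketch under the usual distro conventions (the exact paths and mechanism the role ends up using may differ):
    # RHEL/CentOS-family nodes
    sudo sh -c 'iptables-save > /etc/sysconfig/iptables'
    # Debian/Ubuntu-family nodes (with iptables-persistent installed)
    sudo sh -c 'iptables-save > /etc/iptables/rules.v4'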
*** dsariel__ has quit IRC18:09
*** zzzeek has quit IRC18:10
*** jpena is now known as jpena|off18:10
*** zzzeek has joined #openstack-infra18:12
openstackgerritAlex Schultz proposed openstack-infra/elastic-recheck master: Add query for 1729054  https://review.openstack.org/51675618:13
AJaegerodyssey4me: looking at https://review.openstack.org/#/c/516605/2/zuul.d/project.yaml - why are you not using a project-template in a central place? That way you define the template once and can use it everywhere18:13
AJaegerodyssey4me: looks to me like you use the same jobs in a couple of repos18:13
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one  https://review.openstack.org/51675718:15
dmsimardclarkb: should be as simple as that ? ^ I'll fix the persistence stack18:15
clarkbdmsimard: one comment inline18:16
clarkboh actually one more one sec18:16
*** sree has joined #openstack-infra18:16
*** zzzeek has quit IRC18:17
clarkbposted18:18
dmsimardclarkb: sure18:18
*** zzzeek has joined #openstack-infra18:18
clarkbdmsimard: thinking about it more specifying the dest is probably not necessary18:18
clarkbyou aren't going to have packets from that range coming from external nodes to the test env (due to routing)18:19
dmsimardclarkb: probably doesn't hurt to specify it18:19
mwhahahais there anything we can do to speed up the response time of the elastic-recheck bot? or is it only as good as the indexing delay18:20
pabelanger/dev/xvde2       70G   66G  1.1G  99% /var/lib/zuul18:20
pabelangerthat is on ze10.o.o18:20
pabelangerfor some reason, almost full18:20
*** zzzeek has quit IRC18:20
dmsimardpabelanger: that's what, git repos ?18:20
clarkbmwhahaha: it's only as good as the indexing delay and right now that's not great due to the double indexing described in https://review.openstack.org/#/c/516502/218:20
clarkbinc0: you should be able to depends on https://review.openstack.org/516757 and see if that makes things better for you18:20
*** hemna has quit IRC18:21
clarkbpabelanger: I think we leak build workspaces when executors are restarted18:21
*** d0ugal_ has joined #openstack-infra18:21
*** sree has quit IRC18:21
clarkbnot the case on ze02 though18:21
pabelangeryah, I'm going to stop ze10.o.o now, we are posting back some errors to jobs18:21
pabelangerthen see what leaked18:21
clarkbhold on18:22
pabelangerk18:22
dmsimardclarkb, inc0: hang on, we'll default that to false, right ?18:22
dmsimardso that it's opt-in, not opt-out18:22
clarkbpabelanger: there are a few builds from the 30th you can probably just delete those without stopping the executor since our timeout is less than 18 hours18:22
clarkbdmsimard: ya that preserves the old behavior of things like neutron testing their own firewall rules18:23
pabelangerclarkb: didn't we have a clean up find command we used before18:23
*** d0ugal has quit IRC18:23
clarkbpabelanger: I'm not sure18:23
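The cleanup command being half-remembered here would look roughly like the following; the builds path is the one mentioned later in the log, and the 18-hour cutoff is an assumption based on the job timeout discussed above:
    # remove leaked build directories older than the maximum job lifetime
    sudo find /var/lib/zuul/builds -mindepth 1 -maxdepth 1 -type d \
        -mmin +$((18 * 60)) -exec rm -rf {} +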
inc0dmsimard: I think default for configure addresses is true18:24
inc0https://github.com/openstack-infra/zuul-jobs/blob/master/roles/multi-node-bridge/defaults/main.yaml#L518:24
odyssey4meAJaeger our job definitions are a little common, but not *that common* - the extra layer of abstraction doesn't actually help much18:24
clarkbinc0: ya I'm saying we need another flag for whther or not the firewall should be opened as well since old behavior was not to do that because things like neutron do it themselves18:24
inc0and if you configure addresses, it will require iptables too18:24
clarkbinc0: no it won't require iptables18:25
AJaegerodyssey4me: then my example files were too small ;)18:25
inc0having address you can't communicate over?18:25
clarkbbecause things like neutron are expected to directly manage that stuff and if we do a global rule that masks neutron's rules we won't test neutron18:25
inc0maybe configure_addresses should be default false18:25
*** zzzeek has joined #openstack-infra18:25
clarkbinc0: yes because some things like neutron do it themselves18:25
inc0right, but then you don't want address on iface too right?18:25
odyssey4meAJaeger I think as we stabilise we might look into using the templates -but for now it's not too bad18:25
clarkbinc0: we do18:25
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Authorize the multi-node-bridge network in iptables if there's one  https://review.openstack.org/51675718:26
dmsimardclarkb: ^ with your comments18:26
clarkbinc0: without addresses on the interface we won't be able to ssh to neutron managed VMs18:26
clarkbdue to routing18:26
clarkbinc0: but neutron is responsible for making sure the iptables rules are set to all ssh18:26
clarkb*allow18:26
*** zzzeek_ has joined #openstack-infra18:27
clarkbdmsimard: +2 thanks18:27
dhineshhi, looks like i might have a working CI https://review.openstack.org/#/c/516758/ , but how do you get the 'success' or 'failure' status for a CI under workflow18:27
dmsimardinc0: try a Depends-On with https://review.openstack.org/#/c/516757/ and set bridge_authorize_internal_traffic: true in your job vars18:28
clarkbinc0: basically this isn't a regression for the old overlay code in bash, it seemed to always expect the deployed software to then manage the IPs18:28
dmsimardclarkb: -infra uses puppet3 right ?18:28
clarkbinc0: your use case is different so we are adding this as a feature that is off by default18:28
clarkbdmsimard: yes18:28
dmsimarddocs for puppet3 are dead :/ they took them out lol18:29
clarkbwoo18:29
clarkb(I mean we do it too so can't complain)18:29
dmsimardoh wait there's https://docs.puppet.com/puppet/3.8/ but https://puppet.com/docs/puppet/3.8/index.html is broken18:30
clarkbdhinesh: your comments to gerrit have to match our comment link rules18:30
*** zzzeek has quit IRC18:30
inc0trying18:30
clarkbdhinesh: er not comment links but the javascript we inject looks for a specific format (trying to find where that is)18:31
*** ralonsoh has quit IRC18:31
*** zzzeek_ has quit IRC18:31
clarkbpabelanger: fwiw my du to try and identify the bad dirs is going much slower on ze10 than I expected, we probably should stop it since we can't turn it around quickly18:33
clarkbpabelanger: my concern with just stopping it though is that I think some jobs use a lot more disk than others due to having more required projects and those jobs will just all migrate to other executors potentially causing them to run out of disk too18:33
*** sambetts is now known as sambetts|afk18:34
pabelangerclarkb: yah, I cleaned up old dirs, no help18:34
clarkbdhinesh: looks like it is comment links after all https://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/manifests/review.pp#n17018:34
pabelangerclarkb: I suspect we are just syncing back too much data18:34
clarkbpabelanger: last time when I investigated the fs issues it was largely the git repos18:35
*** pvaneck has joined #openstack-infra18:36
clarkbeach job can have like 5GB of just git repos18:36
pabelangerclarkb: oh, yah. that could be it too18:36
clarkb(and git repos are also inode heavy)18:36
dmsimardnot just that, the executor also pulls the logs, right ?18:36
pabelangerclarkb: okay, so stop or ride it out?18:36
clarkbI'm torn I don't want to stop it so we can actually see what is using the disk18:37
clarkbbut du is running very slowly18:37
clarkbstill hasn't returned to me18:37
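For context, the scan in question is a disk-usage walk over the executor's state directory; something along these lines (the exact invocation isn't shown in the log):
    # per-directory usage under the executor state dir, largest entries last
    sudo du -h --max-depth=1 /var/lib/zuul | sort -h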
pabelangerk, I am searching manually myself18:37
pabelangerclarkb: yah, we are swapping too18:37
dmsimardclarkb: we can attach an additional volume temporarily and move stuff ?18:37
clarkbdmsimard: ya though I'm not sure that will be much faster (but I guess lets us investigate more later if necessary)18:38
*** bnemec has quit IRC18:38
clarkb(the problem is stopping it deletes all/most of the builds dirs)18:38
*** zzzeek has joined #openstack-infra18:39
clarkbok I've got to prep for meeting /me is mostly afk until 190018:39
pabelangerclarkb: so far, everything I have got is 1.2GB to 1.6GB18:39
pabelanger5f02db42bc9a4be680c3d617a2eacdbf 2.3 GB18:40
pabelangeroh, interesting18:41
pabelangerI think we are leaking stuff18:41
pabelanger2017-10-31 15:24:18,141 DEBUG zuul.AnsibleJob: [build: 5f02db42bc9a4be680c3d617a2eacdbf] Sending result: {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"}18:41
pabelangerthat is last log entry, but still data on disk18:41
pabelangerclarkb: ^18:42
dhineshclarkb: so just adding comment links from the log server would initiate it?18:42
openstackgerritMerged openstack-infra/project-config master: Add 'Accepting Builds' panel for zuul-status  https://review.openstack.org/51675518:42
pabelangerclarkb: so, I think we should stop ze10 and see what has been leaked, then go back into debug logs18:42
*** zzzeek has quit IRC18:42
clarkbpabelanger: ok18:43
clarkbdhinesh: your comments to gerrit have to match that rule there18:44
pabelangerclarkb: k, stopping18:44
*** hemna has joined #openstack-infra18:44
*** zzzeek has joined #openstack-infra18:44
pabelangerjobs aborting now18:45
*** rloo has joined #openstack-infra18:45
*** dprince has quit IRC18:49
jeblairback18:50
dmsimardthe issues are with ze10 ?18:51
jeblairclarkb: are you running du?18:51
dmsimardyeah ok, nevermind -- saw a finger url for it.18:51
clarkbjeblair: not anymore18:51
jeblairi don't want to run my own if others are18:51
jeblairclarkb: what did you find?18:52
clarkb55G builds was biggest consumer followed by 9.1G executor-git18:52
clarkbeverything else is in the KB range18:52
clarkbthere are 224 builds18:53
clarkbso even at 1GB each thats enough to fill the disk18:53
jeblairthere's no zuul-executor running?18:53
clarkbjeblair: pabelanger was stopping it18:53
jeblairokay, then they can all be deleted :)18:53
pabelangerokay, I don't see any more playbooks running on ze10, but I do see ssh connections still open18:53
*** e0ne has joined #openstack-infra18:54
jeblairi'm assuming a bunch of them leaked due to earlier unclean shutdowns18:54
pabelangeryes, it just stopped now18:54
jeblairwe probably should check for and delete old build dirs on the other executors18:54
clarkbjeblair: pabelanger found that {"result": "ERROR", "error_detail": "Project openstack/tripleo-quickstart-extras does not have the default branch master"} is a thing18:54
jeblairi'll start that18:54
mugsieis "ERROR Project openstack/requirements does not have the default branch master" a known issue?18:54
jeblairthat's probably due to being out of space18:54
clarkboh ya all those build dirs are from 1500UTC or so18:55
clarkbwhich was around when things restarted?18:55
jeblairyep18:55
pabelangeragree18:55
pabelanger5f02db42bc9a4be680c3d617a2eacdbf was the one I linked before and still exists on disk18:55
*** jascott1 has joined #openstack-infra18:56
clarkbinfra meeting in ~4 minutes18:56
AJaegermugsie: known issue and fixed - please recheck18:56
clarkbjoin us in #openstack-meeting18:56
jeblairokay i'm deleting all build dirs older than 4 hours18:56
mugsieAJaeger: thanks18:57
AJaegermugsie: if you get it on a change that you just pushed or after the recheck, then it's a new one - please report back in that case18:57
clarkbjeblair: I don't think that will catch those on ze10 just yet18:57
mugsiecool - I will keep an eye on them18:57
jeblairclarkb: it seems to be the only one with a bunch of 1500s; other executors are generally older18:58
jeblairsince it's stopped i'll delete the whole builds dir18:59
*** yamahata has quit IRC18:59
fungimmm, meeting time?19:00
clarkbyup19:00
jeblairpabelanger: i'm deleting all the build dirs, and i'm also doing fscks on all the git repos on ze1019:03
jeblairjust to make sure everything is clean when we restart it19:03
*** sree has joined #openstack-infra19:03
pabelangerjeblair: ack19:03
*** yamahata has joined #openstack-infra19:07
*** sree has quit IRC19:07
openstackgerritJames Slagle proposed openstack-infra/tripleo-ci master: Default $NODEPOOL_PROVIDER  https://review.openstack.org/49003719:07
*** catintheroof has quit IRC19:14
*** yamahata has quit IRC19:14
*** pcaruana has joined #openstack-infra19:15
*** dprince has joined #openstack-infra19:15
*** yamahata has joined #openstack-infra19:17
*** rbrndt has joined #openstack-infra19:22
*** ijw has quit IRC19:23
*** ijw has joined #openstack-infra19:24
jeblairpabelanger: all of the build dirs on ze10 are deleted, and my git repo fsck has come back clean; you should be clear to restart when ready19:24
pabelangerjeblair: thanks, starting up now19:25
*** electrofelix has quit IRC19:25
AJaegerjeblair, pabelanger what about tripleo-quickstart? That was mentioned above19:25
jeblairAJaeger: that error is probably caused by an error cloning from the git repo cache to the job's build dir.  i checked the cache, and the repos are fine, so it was probably just that it ran out of space copying it into the build dir.19:27
jeblairAJaeger: now that all the old build dirs are deleted, should be fine19:27
AJaegerjeblair: ok, thanks19:27
*** eharney has quit IRC19:29
*** eharney has joined #openstack-infra19:30
*** hasharAway is now known as hashar19:38
*** Hal has joined #openstack-infra19:43
*** Hal is now known as Guest430019:43
*** pvaneck has quit IRC19:44
*** pvaneck has joined #openstack-infra19:45
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints  https://review.openstack.org/51677319:48
*** amoralej is now known as amoralej|off19:48
*** pcaruana has quit IRC19:49
openstackgerritMerged openstack-infra/openstackid-resources master: Raise Api rate limit for Public endpoints  https://review.openstack.org/51677319:49
*** pvaneck has quit IRC19:49
*** sree has joined #openstack-infra19:50
*** salv-orlando has quit IRC19:51
openstackgerritMerged openstack-infra/system-config master: Fix dependency order with logstash_worker.pp  https://review.openstack.org/51671719:52
*** Guest4300 has quit IRC19:53
openstackgerritRuby Loo proposed openstack-infra/project-config master: Remove legacy python-ironicclient jobs  https://review.openstack.org/51677419:55
*** catintheroof has joined #openstack-infra19:55
*** sree has quit IRC19:55
*** mat128 has quit IRC19:56
fungipabelanger: fwiw, i can't seem to get either of the "fg-test" instances for openstackjenkins in ord to accept any of my ssh keys19:58
fungiso i don't think they're anything i created19:58
pabelangerfungi: k, thanks19:58
clarkbI think show will tell us how old they are?19:59
* clarkb looks19:59
fungimy money is on "ancient"19:59
pabelangerhttp://paste.openstack.org/show/625158/19:59
fungithey're using a "512 MB Classic v1" flavor after all19:59
pabelangercentos-6.2 nodes19:59
pabelanger51219:59
fungioh yeah, created 2016-06-0220:00
fungiless ancient than i anticipated20:00
clarkboh in that case we likely can delete them as the only thing we really ever used centos 6 for was test nodes and git.o.o and both work otherwise20:00
pabelangerfungi: I am guessing live migration?20:00
pabelangerstop /create20:00
fungimaybe20:00
pabelangerokay, will delete them now then20:01
clarkbre https://review.openstack.org/#/c/516502/ if we can get that in I think I'd like to restart gear on logstash.o.o as it's over 100k now and see if that change makes a dent in job queue growth20:01
clarkbbut first lunch20:01
jeblairi have to go run errands for a few hours20:01
*** efried has joined #openstack-infra20:03
openstackgerritRuby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironicclient jobs  https://review.openstack.org/51677620:05
*** pvaneck has joined #openstack-infra20:06
efriedHowdy folks.  Zuul is -1ing everything without giving a reason I can understand.  Known issue?20:06
pabelangerlooking into why infracloud-chocolate appears to be wedged.20:06
pabelanger56 locked, ready nodes20:06
*** ccamacho has joined #openstack-infra20:10
pabelangeryah, appears to be wedged waiting on more nodes20:10
pabelangerjust a waiting game now I think20:11
pabelanger| 0000624820 | infracloud-chocolate   | nova     | centos-7         | 0da89be1-f267-4926-bc91-b1debb4c509d | ready    | 00:04:07:47 | locked   |20:11
pabelangeris the longest right now20:11
*** camunoz has quit IRC20:11
fungiefried: please link to an example of the everything it has -1'd20:11
fungithere was a disk spontaneously running out of room on one of the 10 executors a little bit ago which could have resulted in some job failures20:12
fungithough it seemed like we caught it pretty quickly20:13
efriedfungi E.g.: https://review.openstack.org/#/c/515151/20:13
fungiefried: thanks, looking20:13
efriedfungi E.g.: https://review.openstack.org/#/c/515223/20:14
efriedfungi Let me know if you want more.20:14
efriedfungi Thanks for looking!20:14
fungiefried: if you toggle the ci comments on that first one, all those "ERROR Project openstack/requirements does not have the default branch master" entries are likely to have been the executor failures we resolved earlier; those jobs started a few hours ago based on the time of your recheck and the duration of the working jobs and time it reported on the change20:16
*** jtomasek has quit IRC20:17
efriedfungi Okay, I want to say there was at least one we rechecked and it still failed, lemme go find...20:17
fungiefried: same thing with the second change you linked20:17
efriedfungi https://review.openstack.org/#/c/516662/20:17
efriedMERGER_FAILURE sounds like a sinister Wall Street thing.20:18
fungiefried: yes, that looks like a different issue20:18
*** mrunge has quit IRC20:18
*** kgiusti has left #openstack-infra20:20
fungiefried: unrelated to the corrupt git repo on one of the executors which caused the "does not have the default branch master" errors, timing on the failed recheck there looks related to an issue we resolved shortly thereafter where an executor ran out of disk space20:21
fungiit should be fine at this point20:21
efriedfungi Cool, thanks for checking it out.20:21
clarkb merger failures can be lack of disk too20:21
*** Swami has quit IRC20:22
fungiyup20:22
fungithat's what i expect it was in that case20:22
*** mrunge has joined #openstack-infra20:22
clarkbwe may want to formalize rm -rf /var/lib/zuul/builds on zuul startup20:23
clarkbmaybe add it to the init script?20:23
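A minimal sketch of the idea, assuming it were wired into the start path of the executor's init script (hypothetical; not the actual script):
    # before launching zuul-executor, clear build dirs leaked by a prior unclean shutdown
    rm -rf /var/lib/zuul/builds/*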
*** ijw has quit IRC20:24
*** dave-mcc_ has joined #openstack-infra20:26
*** smatzek has quit IRC20:26
*** smatzek has joined #openstack-infra20:27
*** dave-mccowan has quit IRC20:27
inc0dmsimard: I think it helped, gates failed still but for different reason I think20:28
inc0mariadb finally bootstraps:)20:28
dmsimardinc0: progress! Do you know what's the issue ?20:29
*** erlon has quit IRC20:29
inc0yeah I think, I think I need to recreate /etc/hosts so hostname will point to 172... ip20:29
inc0for rabbitmq20:29
dmsimardinc0: we setup the inventory hostnames in /etc/hosts20:29
inc0yeah I know20:30
*** ldnunes has quit IRC20:30
dmsimardBut they point to internal nodepool ip20:30
inc0but since I'm using overlay, I'll set it to overlay net20:30
inc0shouldn't be too bad20:30
dmsimardI guess you want to setup the bridge IPs ?20:30
inc0yeah20:30
*** csomerville has joined #openstack-infra20:30
dmsimardOk, maybe something to consider as well clarkb ^20:30
inc0kolla-ansible already does that, but I needed to remove previous setup20:31
inc0dmsimard: not sure, it might be kolla-ansible specific20:31
*** smatzek has quit IRC20:31
pabelangermwhahaha: EmilienM: can you see what in tripleo jobs is overwriting /root/.ssh/known_host? it is deleting the infra-root keys we add with nodepool and prevents us from SSH into the running nodes20:31
mwhahahapabelanger: probably in quickstart20:32
*** Hal has joined #openstack-infra20:32
*** cody-somerville has quit IRC20:32
*** Hal is now known as Guest9512120:32
mwhahahapabelanger: we append not overwrite20:33
fungipabelanger: itym authorized_keys?20:33
dmsimardpabelanger: infra-root keys != known_hosts ?20:33
*** xingchao has quit IRC20:33
dmsimardfungi beat me to it :)20:33
pabelangeroh ya, that20:33
pabelangerthanks20:33
pabelangerauthorized_keys20:33
pabelangerty20:34
mwhahahabut we do remove known_hosts https://github.com/openstack/tripleo-quickstart-extras/blob/79cf07e3dd3e555206ae6fefdd41423a6da38cd8/roles/virthost-full-cleanup/tasks/main.yml#L11120:34
mwhahahapabelanger: https://github.com/openstack/tripleo-quickstart-extras/blob/dab754c8de7d235ffe85d157f7d6d6f05be988eb/roles/undercloud-setup/tasks/non_root_user_setup.yml20:34
mwhahahapabelanger: but we're using the authorized_key thing in ansible so not sure if that should be removing any more keys20:35
pabelangermwhahaha: is undercloud_user == root?20:36
mwhahahapabelanger: usually it isn't but it might be in multinode20:37
dmsimardmwhahaha: authorized_key from ansible doesn't delete keys unless "state: absent" or "exclusive: yes"20:37
*** sree has joined #openstack-infra20:37
mwhahahafound it20:37
mwhahahahttps://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L23820:38
pabelangeryup20:38
dmsimardoh yeah, that totally overwrites the one in /root20:39
*** sshnaidm is now known as sshnaidm|afk20:39
pabelangercat foo | sudo tee -a /root/.ssh/authorized_keys20:39
pabelangerthat is the fix20:39
*** Guest95121 has quit IRC20:39
dmsimarddepends what's the purpose20:39
dmsimardthere might already be keys in there20:39
mwhahahadid you guys stop putting those keys in /etc/nodepool? https://github.com/openstack-infra/tripleo-ci/blob/49a6109cbd92f43bdca7e81e84925c023bd08a0a/toci_gate_test-oooq.sh#L23220:40
mwhahahamaybe that's the problem?20:40
pabelangerno, we use glean now to populate /root/.ssh/authorized_keys20:41
pabelangerjust that yours are the only jobs that overwrite it20:41
clarkbdmsimard: inc0 that would be another new use case for the overlay20:41
clarkbdmsimard: inc0 I am not opposed to supporting it too but the way kolla sets it up I don't think kolla really wants us to do it anyways? because we are unaware of the keepalived ip20:41
pabelangermwhahaha: what is your key used for?20:41
*** sree has quit IRC20:42
dmsimardpabelanger: the fix is to leave line 238 as is, but then cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys20:42
dmsimardinstead of doing the cp20:42
pabelangerright20:42
mwhahahapabelanger: no idea it was like that when i got here20:42
dmsimardI can send a patch since i'm not core and all20:42
* dmsimard writes20:42
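The fix being written here is roughly the following; the cp line stands in for what the tripleo-ci script was doing at the linked line 238 and is illustrative rather than an exact quote:
    # before: clobbers /root/.ssh/authorized_keys, dropping the keys glean installed
    sudo cp "${HOME}/.ssh/authorized_keys" /root/.ssh/authorized_keys
    # after: append instead, so the existing infra-root keys survive
    cat "${HOME}/.ssh/authorized_keys" | sudo tee -a /root/.ssh/authorized_keys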
clarkbianw: for https://review.openstack.org/#/c/516502/ you good if I go ahead and approve that now and edit the comment in a followup?20:44
*** thorst_ has quit IRC20:44
*** trown is now known as trown|outtypewww20:45
*** priteau has joined #openstack-infra20:45
openstackgerritDavid Moreau Simard proposed openstack-infra/tripleo-ci master: Don't replace /root/.ssh/authorized_keys, append to it  https://review.openstack.org/51678520:45
dmsimardpabelanger, mwhahaha ^20:45
*** thorst has joined #openstack-infra20:47
pabelangerjeblair: Shrews: clarkb: yah, chocolate is completely wedged. I'm going to see about bumping max-servers up by 5 to see if that will cause things to move again.20:47
pabelangerotherwise, I don't know how to delete or release a locked node20:47
pabelangerlocked 'ready' node20:47
inc0clarkb: yeah in general we have a lot of setup like that in our code, so unless someone else wants it, I don't think there is reason for making this thing just for us20:48
clarkbinc0: in this case I think because you are using IPs in the range that we aren't directly controlling you'll want to do it20:48
dmsimardclarkb: yeah.. basically at that point, they might as well parent to base instead of multinode and include each role they need individually (and leave the multi-node-hosts-file out)20:48
*** felipemonteiro__ has quit IRC20:51
*** thorst has quit IRC20:51
*** xingchao has joined #openstack-infra20:53
*** smatzek has joined #openstack-infra20:56
pabelangerokay, I am not sure what is going on with infracloud-chocolate, 3 more nodes came on line, and looked to be fulfilled for zuul, but still locked20:57
pabelanger2017-10-31 20:49:53,310 DEBUG zuul.nodepool: Updating node request <NodeRequest 200-0000806206 <NodeSet legacy-centos-7-2-node OrderedDict([('primary', <Node None primary:centos-7>), ('secondary', <Node None secondary:centos-7>)])OrderedDict([('subnodes', <Group subnodes ['secondary']>)])>>20:57
*** hemna has quit IRC20:58
pabelangerI'll check back once jeblair is back from errands, see if we can figure out the issue20:58
pabelangerbut, unsure how to release the locked ready nodes and unwedge20:58
*** cody-somerville has joined #openstack-infra20:59
*** cody-somerville has joined #openstack-infra20:59
clarkbpabelanger: did the nodes that are locked boot and did zuul use them?20:59
clarkbif they are just not booting we should be able to address that problem20:59
*** xingchao has quit IRC21:00
*** smatzek has quit IRC21:01
*** csomerville has quit IRC21:01
*** jcoufal_ has joined #openstack-infra21:02
openstackgerritClark Boylan proposed openstack-infra/project-config master: Better comment in logstash job submission role  https://review.openstack.org/51678621:03
clarkbianw: ^ there is the bigger comment I'm going to go ahead and approve the other change now21:03
*** salv-orlando has joined #openstack-infra21:05
pabelangerclarkb: they have booted, nodepool-launcher marked them ready (fulfilled), then zuul locked them , but hasn't launched any jobs. let me see if I can figure out if an executor was assigned21:05
pabelangermaybe we are having an issue there21:05
*** jcoufal has quit IRC21:05
Shrewspabelanger: yeah, not sure how to debug the zuul side of that21:06
*** eharney has quit IRC21:07
pabelangerhttp://paste.openstack.org/show/625161/21:07
pabelangerShrews: clarkb: that is all the data I see in zk21:07
pabelangerwhich looks correct21:08
pabelangerand I see a lock too21:08
pabelangerbut don't know how to see that info21:08
*** catintheroof has quit IRC21:09
pabelangerk, have to dadops with kids, I'll check backscroll this evening21:09
*** rhallisey has quit IRC21:10
openstackgerritMerged openstack-infra/project-config master: Logstash jobs treat gz and non gz files as identical  https://review.openstack.org/51650221:12
mwhahahaok so is there anywhere to look in the logs to see why the tripleo queue keeps resetting21:16
mwhahahacause it just reset again and i have no idea why21:16
mwhahahabesides 510900,2 being stuck there in error21:17
clarkbusually the easiest thing is to look at the top of the queue and see what just failed. Has 510900 been in that state for a while?21:18
mwhahahaclarkb: yes21:18
mwhahahaclarkb: hours21:18
*** ijw has joined #openstack-infra21:18
clarkbok in that case it's likely whatever change was ahead attempted to merge and failed because of jgit (so zuul wasn't able to detect that ahead of time) or it got a new patchset and was evicted21:19
openstackgerritRuby Loo proposed openstack-infra/project-config master: Remove legacy python-ironic-inspector-client jobs  https://review.openstack.org/51678921:19
clarkbbut otherwise you should have a failure at the tip21:19
mwhahahai know usually it shows up21:19
mwhahahabut come back to the ui and it's like wtf just happened21:19
mwhahahabut i literally caught it zeroing out everything but no failure listed21:19
clarkbya if it's not a failure then likely merge failed in gerrit late or a new patchset arrived for some new change21:20
clarkber some change at the head of the queue21:20
mwhahahai don't think so21:20
* mwhahaha goes looking21:20
mwhahahahttp://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A21:20
mwhahahagotta love all those runs21:20
*** xingchao has joined #openstack-infra21:21
mwhahahaclarkb: it's items in the inventory.yaml to show what was ahead of it right?21:22
*** priteau has quit IRC21:22
mwhahahait looks like the last 3 runs of 509521 had nothing in front of it21:23
*** priteau has joined #openstack-infra21:23
clarkbI'm not sure where that is recorded21:23
openstackgerritRuby Loo proposed openstack-infra/openstack-zuul-jobs master: Remove legacy python-ironic-inspector-client jobs  https://review.openstack.org/51679121:23
mwhahahait just reset again21:23
mwhahahaand nothing is in front of it21:23
* mwhahaha flips tables21:24
*** sree has joined #openstack-infra21:24
clarkbI don't think it reset the gate21:24
clarkbthe changes behind it are still running jobs21:25
mwhahahawhy is that job getting reset21:25
clarkbzuul will do that if the test node crashes (up to some limit of retries)21:25
*** ijw has quit IRC21:25
clarkbI'm trying to find it in the logs now21:25
clarkbso it's resetting jobs on that change but not resetting the gate as a result from what I see21:26
*** xingchao has quit IRC21:26
mwhahahathey are all resetting21:26
*** priteau_ has joined #openstack-infra21:26
*** priteau has quit IRC21:27
Shrewsi'm going to wager a guess that the infracloud long-locked nodes, and the tripleo problems mwhahaha is seeing are somehow related21:28
*** sree has quit IRC21:29
clarkb2017-10-31 21:20:10,386 INFO zuul.Pipeline.openstack.gate: Resetting builds for change <Change 0x7fee5d659978 509521,2> because the item ahead, <QueueItem 0x7fee5d66fd30 for <Change 0x7fee5d66fba8 510900,2> in gate>, is not the nearest non-failing item, None21:29
clarkb2017-10-31 21:20:10,387 DEBUG zuul.Pipeline.openstack.gate: Cancel jobs for change <Change 0x7fee5d659978 509521,2>21:29
mwhahahaso it's resetting because 510900,2 is stuck?21:30
clarkbmwhahaha: well "is not the nearest non-failing item" I think means something other than 510900 failed21:30
mwhahahabut there's nothing there :/21:30
clarkband the only thing between those two changes that can fail other than 510900 is 509521 itself21:30
clarkbI'm trying to see if I can find build logs for 509521 now21:31
mwhahahathe pep8 logs were http://logs.openstack.org/21/509521/2/gate/openstack-tox-pep8/?C=M;O=A21:31
mwhahahabut not sure where the other job logs were21:31
*** mrunge has quit IRC21:31
clarkbif it is the node crashing then we won't have copied logs because the instances went away21:32
clarkbtrying to dig through via the zuul logs21:32
*** rcernin has joined #openstack-infra21:32
mwhahahaclarkb: could it be the stomping on the authorized_keys that pabelanger mentioned earlier?21:34
mwhahahaclarkb: where zuul thinks the node crashed but it wasn't21:34
mwhahahait's just that you can't connect anymore21:34
*** amoralej|off is now known as amoralej21:34
mwhahahaor is it the fact that it seems to be looping in http://zuulv3.openstack.org/static/stream.html?uuid=6d416d1154364d65982e64c940d2f6d0&logfile=console.log21:35
clarkbya if zuul can't ssh back in that could do it21:35
clarkbbut I think the thing pabelanger was talking about was root user specific not zuul user21:35
mwhahahaso how is that a new thing21:35
*** threestrands has joined #openstack-infra21:36
mwhahahais the zuul user different than the user the job uses?21:36
*** armax_ has joined #openstack-infra21:36
*** armax has quit IRC21:38
*** armax_ is now known as armax21:38
clarkbno the zuul user is the user the job framework uses21:38
*** mrunge has joined #openstack-infra21:39
mwhahahaok maybe we're touching that, but we didn't change anything recently around that21:39
rm_workis zuul on storyboard or launchpad? wondering where i could request a feature21:39
clarkbrm_work: storyboard21:39
rm_workkk21:39
openstackgerritDavid Moreau Simard proposed openstack-infra/zuul-jobs master: Persist iptables rules  https://review.openstack.org/51394321:39
rm_workclarkb: actually, is zuul's webui *part of zuul* or part of something else21:40
dmsimardclarkb: as discussed ^ I'll fix the integration tests21:40
*** edmondsw has quit IRC21:40
*** bobh has quit IRC21:41
openstackgerritMerged openstack-infra/project-config master: Better comment in logstash job submission role  https://review.openstack.org/51678621:41
clarkbrm_work: it is part of zuul21:42
rm_workk, thanks21:42
*** jascott1 has quit IRC21:43
clarkbmwhahaha: http://paste.openstack.org/show/625167/ that is a better snippet of logs but I think I am more confused now reading that21:44
mwhahahaclarkb: :( i have no idea. i'm watching the console of the one running job to see if it is something we're doing. if it resets again i'm going to abandon 510900 to get it to go away21:46
*** jascott1 has joined #openstack-infra21:46
*** dprince has quit IRC21:46
clarkbI think what is going on is zuul thinks that 510900 is still the parent of 509521 in the ordered future git state under test21:47
clarkbwhere it should decouple the two queues and 510900 is in a queue of its own with one item in it (itself) and then 509521 forms the tip of a new queue21:48
mwhahahamy thoughts as well is that it's confused about those two21:48
*** jcoufal has joined #openstack-infra21:48
mwhahahalegacy-tripleo-ci-centos-7-scenario003-multinode-oooq-container seems to be queued on 51090021:49
mwhahahaso i wonder if they keep resetting each other21:49
mwhahahaso let me abandon that patch to clear it21:49
*** boden has quit IRC21:50
*** jcoufal_ has quit IRC21:50
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence  https://review.openstack.org/51393421:51
mwhahahaclarkb: are there visible metrics for number of times a job (or jobs) gets reset somewhere?21:51
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add integration test coverage for iptables persistence  https://review.openstack.org/51393421:51
dmsimardianw: addressed your comment in https://review.openstack.org/#/c/513943/ because splitting the role out made something else easier21:55
clarkbmwhahaha: I'm not sure if zuul emits that to graphite, I think it may be under the status NONE category21:55
*** bnemec has joined #openstack-infra21:55
mwhahahaclarkb: cause i want to say it's been happening a lot based on queues but it's hard to tell :/21:56
clarkbhttp://paste.openstack.org/show/625169/21:57
clarkbmwhahaha: that's how often it's happened roughly21:58
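A sketch of how a count like that can be pulled out of the scheduler log, using the message quoted a few lines up (the log path is an assumption):
    sudo grep -c 'Resetting builds for change .*509521,2' /var/log/zuul/debug.log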
dmsimardclarkb: wow21:58
mwhahahayea that coincides with all the pep8 logs from that one change21:58
dmsimardclarkb: each time that's happened, all the jobs essentially restart with the failed job out of the queue, right ?21:58
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Drop TripleO jobs from Puppet modules  https://review.openstack.org/51679421:59
clarkbdmsimard: they should, but not sure if that is happening; 510900 and 509521 seem to be coupled more tightly than I would expect21:59
mnasermwhahaha: ^21:59
mwhahahaclarkb: the coupling was just within zuul cause one is stable/newton and the other is master :D22:00
*** ijw has joined #openstack-infra22:00
mwhahahaso it's artificially together just in queue22:00
* mwhahaha shrugs22:01
clarkbwell it's not artificial, that is by design so that upgrade jobs do the right thing22:01
clarkbyou can't test upgrades without that22:01
mwhahahawe've pulled our upgrades out of infra22:01
clarkbbut reading that latest paste it looks like we have 510900 <- some change <- 50952122:01
mwhahahaso in this case, we probably don't want it22:01
clarkband that some change changes constantly22:01
clarkbthen it transitions to 510900 <- 50952122:02
*** jcoufal has quit IRC22:02
clarkbbut the whole time 510900 is there22:02
clarkbwhich seems odd to me22:02
clarkbunless 510900 was causing every change after it to break?22:02
mwhahahaprobably22:02
mwhahahait was one of the ones where the jobs had errored from the executor stuff22:03
mwhahahaso it seemed to have gotten in a really bad state22:03
*** amoralej is now known as amoralej|off22:05
clarkbwe may need jeblair to dig in when he gets back22:06
clarkbI'm quickly getting beyond my understanding here22:06
mwhahahathere be dragons22:06
mwhahahaseems that we have some other changes that might also be suffering from the same problems themselves22:07
mwhahahalike 516630,122:07
*** marst has quit IRC22:07
mwhahahathat's got a bunch of stuff that was errored but may also be requeuing22:07
mwhahaha516651,122:07
mwhahaha516047,2 499239,27 511350,25 511350,2522:08
mwhahahai'm going to abandon/restore the tripleo ones22:09
mwhahahabut there's a rally and nova one as well22:09
mwhahahathat may continue22:09
clarkbmwhahaha: which rally/nova one?22:10
mwhahaha516047,222:10
mwhahaharally22:10
clarkbthe only nova ones in the gate are green22:10
mwhahaha499239,2722:10
mwhahahanot gate22:11
mwhahahain check22:11
clarkbah ok22:11
mwhahahadoing same stuff22:11
mwhahahaor at least they look like they might be since they are still around22:11
mwhahahaand have things queued22:11
clarkbin check the fallout should be limited to that change, but ya probably want to evict them so they don't hang out there until next restart22:11
*** jascott1_ has joined #openstack-infra22:12
mwhahahawell they also might be requeuing constantly as well taking up resources22:12
mwhahahai cleared the 3 tripleo ones22:12
mwhahahabut i can't help with those two22:12
*** slaweq has joined #openstack-infra22:12
*** bobh has joined #openstack-infra22:13
clarkbya not sure looks like they each have a single job queued (and no gate resets)22:13
*** ijw has quit IRC22:13
mwhahahajust something to keep an eye on22:14
*** slaweq has quit IRC22:15
*** jascott1 has quit IRC22:15
*** baoli has quit IRC22:18
*** tpsilva has quit IRC22:18
*** e0ne has quit IRC22:18
clarkbwe'll want to keep an eye on it to see if "normal" gate failures create the same problem22:18
clarkbwhat should happen is after the first failure things get decoupled from each other and move on on their own22:19
*** ijw has joined #openstack-infra22:19
*** rloo has left #openstack-infra22:19
clarkbbut if a second failure causes it to reset again that would be a bug22:19
*** bobh has quit IRC22:20
*** ijw has quit IRC22:22
*** ijw has joined #openstack-infra22:22
ianwdmsimard: dropped a bridge/iptables comment on 516757 ... is testing sufficient for that?22:24
clarkbianw: in this case we use ovs without linux bridges, does that change your concern?22:24
clarkbianw: we also have syslogs from kolla jobs showing iptables-dropped packets with source and destination in that 172.24.4.0/23 range22:26
ianwclarkb: ok, just as long as we've tested it, i guess ovs is probably different22:27
ianwwhenever i see "bridge" and "firewall" it makes me think of this22:27
clarkbianw: ya inc0 mentions it seemed to address the problem in his depends-on change22:27
inc0clarkb: fwiw it helped22:28
*** aeng has joined #openstack-infra22:30
*** ijw has quit IRC22:33
*** slaweq has joined #openstack-infra22:33
clarkbmwhahaha: I've added this problem to the issues with zuul list at https://etherpad.openstack.org/p/zuulv3-issues22:33
*** lbragstad has quit IRC22:36
*** edmondsw has joined #openstack-infra22:38
*** edmondsw has quit IRC22:43
*** wolverineav has quit IRC22:44
jlvillalWhat does it mean when there is a Zuul error: MERGER_FAILURE  on some of the jobs, but not all?22:45
*** rbrndt has quit IRC22:45
jlvillalSeen on this patch: https://review.openstack.org/#/c/513152/322:45
*** priteau_ has quit IRC22:47
jeblairback22:47
clarkbjlvillal: earlier today an executor ran out of disk due to leaked build dirs. This caused the merger failure messages22:47
clarkbjeblair: can you see the conversation with mwhahaha above (I tried to tl;dr it on the zuul issues etherpad too)22:47
jlvillalclarkb: Ah, okay. recheck it is :)22:47
jeblairclarkb, mwhahaha: ack22:48
jeblairclarkb:     "Changes in the gate did not appear to decouple from each other when the one ahead failed. Specifically 510900 and 509521 on October 31." ?22:49
*** rlandy has quit IRC22:49
mwhahahajeblair: yea that one22:49
*** mriedem has quit IRC22:50
clarkbjeblair: yes22:51
ianwclarkb: do you have experience re-init of a bup repo -> http://paste.openstack.org/show/625174/ ?22:51
clarkbjeblair: it looked like 510900 kept resetting 509521, paste in etherpad tries to capture that22:51
clarkbianw: I think the .bup should be in /root ? re error: '/opt/backups/bup-ask/.bup/' is not a bup repository; run "bup init"22:52
clarkboh its talking to the remote end and not finding a repo there22:52
clarkbianw: I think you may have to bootstrap local and remote?22:52
clarkbsystem-config docs hopefully have more info?22:52
*** ijw has joined #openstack-infra22:53
ianwclarkb: yeah, i'm not sure ... /opt/backups/bup-ask/.bup is the remote side, which you'd think "bup init -r ..." would create for you, anyway, i'll keep poking22:53
clarkbianw: does /opt/backups/bup-ask exist?22:54
clarkbit may create the .bup for you but not if it can't login?22:55
ianwclarkb: yep, that was cloned from the old server.  i just removed the .bup directory22:55
clarkbgotcha22:55
ianwit may be the system-config instructions are missing a bup init on the remote side22:55
*** hongbin has quit IRC22:55
clarkbianw: the bup init on the local side to be backed up was a relatively new thing too, it's possible the bootstrap on the remote side steps are new as well?22:55
clarkbianw: ya thinking that could be possible22:56
jeblairclarkb: so if i can try to summarize -- down at the end of your paste (where it says the NNFI is None), we're looking at 509521 behind 510900 which is at the head.  you might expect 510900 to fail and reset 509521 once, but it appears that it somehow failed multiple times and therefore reset 509521 multiple times.22:58
jeblairclarkb: does that sound right?  (also, are we sure that 510900 was the head at that time?)22:58
*** slaweq has quit IRC22:59
clarkbjeblair: correct on my theory23:00
jeblairokay.  i'll try to dig into that.  it will take a while.23:00
clarkbjeblair: http://paste.openstack.org/show/625167/ may also be helpful23:00
clarkbjeblair: thats a bit more logging around a single reset occurence23:00
ianwi think we should exclude /var/lib/postgresql from backups ... we just want to back up the database dump.  it changes under the live backup23:00
clarkbianw: +123:01
jeblairclarkb: ah thx23:01
jeblairi'm going to try to get 2 of those and see what happened between them23:01
openstackgerritIan Wienand proposed openstack-infra/system-config master: [DNM] remove ci-backup-rs-ord.openstack.org  https://review.openstack.org/51615923:01
ianwclarkb: ^ i think remote needs an init, updated instructions23:02
clarkbianw: do you need to flip the order around? bup init on server then on client?23:03
clarkbdocs say do server to be backed up first23:03
clarkbbut I think that is why you had the error23:04
*** xarses has quit IRC23:04
ianwclarkb: i think "bup init" on the client will create /root/.bup, but it's not till it runs with "-r user@backupserver:" that it tries to look at the remote .bup dir23:05
clarkbaha23:06
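Putting ianw's observation together with the docs gap, the bootstrap order being worked out is roughly this (user, host, and the paths indexed are placeholders, not the real backup setup):
    # on the backup server, as the per-host backup user: create the repository
    bup init
    # on the client being backed up:
    bup init
    bup index -x /
    bup save -r backup-user@backup-server: -n root /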
openstackgerritIan Wienand proposed openstack-infra/puppet-bup master: Ignore postgres working directory  https://review.openstack.org/51679823:08
*** gyee has quit IRC23:10
ianwinfra-root: ^ if ok, could i get two eyes on this, i'd like to start new backups without this23:11
jeblairclarkb, mwhahaha: i think there's a bug with reconfiguration; i think we erroneously put 509521 behind 510900 again after reconfiguration, then the next pass through the queue processor threw it out again.23:16
*** LindaWang has quit IRC23:17
*** yamahata has quit IRC23:21
*** yamahata has joined #openstack-infra23:21
*** thorst has joined #openstack-infra23:24
*** gildub has joined #openstack-infra23:25
*** gmann_afk is now known as gmann23:27
*** slaweq has joined #openstack-infra23:29
clarkbjeblair: ok, so my theory wasn't too far off23:30
* clarkb goes to pack23:31
*** thorst has quit IRC23:31
*** daidv has quit IRC23:35
*** daidv has joined #openstack-infra23:35
*** aviau has quit IRC23:38
*** aviau has joined #openstack-infra23:38
*** nicolasbock has quit IRC23:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: failing test for reconfiguration at failed head  https://review.openstack.org/51679923:41
jeblairthere's a failing test case that reproduces this.  this may be a longstanding zuulv2 bug, we just didn't notice it because we didn't reconfigure every 5 minutes.23:42
*** jascott1_ has quit IRC23:46
*** jascott1 has joined #openstack-infra23:47
*** baoli has joined #openstack-infra23:50
*** markvoelker has quit IRC23:51
*** adreznec has quit IRC23:52
*** sdague has quit IRC23:56
