*** gyee has quit IRC | 00:14 | |
*** jamesmcarthur has joined #openstack-infra | 00:17 | |
*** openstackgerrit has joined #openstack-infra | 00:24 | |
openstackgerrit | MarcH proposed openstack-infra/git-review master: Make it possible to configure draft as default push mode https://review.openstack.org/220426 | 00:24 |
*** smarcet has joined #openstack-infra | 00:26 | |
*** longkb has joined #openstack-infra | 00:55 | |
*** jamesmcarthur has quit IRC | 00:58 | |
*** jamesmcarthur has joined #openstack-infra | 01:01 | |
*** diablo_rojo has quit IRC | 01:07 | |
*** rlandy|bbl is now known as rlandy | 01:09 | |
*** longkb has quit IRC | 01:10 | |
*** carl_cai has joined #openstack-infra | 01:18 | |
*** mrsoul has quit IRC | 01:19 | |
*** diablo_rojo has joined #openstack-infra | 01:20 | |
*** smarcet has quit IRC | 01:20 | |
*** hongbin has joined #openstack-infra | 01:21 | |
*** jamesmcarthur has quit IRC | 01:23 | |
*** smarcet has joined #openstack-infra | 01:26 | |
*** efried has quit IRC | 01:49 | |
*** jamesmcarthur has joined #openstack-infra | 01:51 | |
*** jamesmcarthur has quit IRC | 01:56 | |
*** efried has joined #openstack-infra | 02:01 | |
*** smarcet has quit IRC | 02:02 | |
*** anteaya has quit IRC | 02:04 | |
*** felipemonteiro has joined #openstack-infra | 02:08 | |
*** tinwood has quit IRC | 02:10 | |
*** tinwood has joined #openstack-infra | 02:11 | |
*** bobh has joined #openstack-infra | 02:13 | |
*** agopi has joined #openstack-infra | 02:13 | |
*** apetrich has quit IRC | 02:16 | |
*** longkb has joined #openstack-infra | 02:18 | |
*** jamesmcarthur has joined #openstack-infra | 02:20 | |
*** jamesmcarthur_ has joined #openstack-infra | 02:27 | |
*** jamesmcarthur has quit IRC | 02:27 | |
*** munimeha1 has quit IRC | 02:30 | |
*** roman_g_ has quit IRC | 02:47 | |
*** psachin has joined #openstack-infra | 02:53 | |
*** jamesmcarthur_ has quit IRC | 03:10 | |
*** bhavikdbavishi has joined #openstack-infra | 03:18 | |
*** diablo_rojo has quit IRC | 03:21 | |
*** jesusaur has joined #openstack-infra | 03:21 | |
*** lpetrut has joined #openstack-infra | 03:30 | |
*** bobh has quit IRC | 03:32 | |
*** felipemonteiro has quit IRC | 03:32 | |
*** ramishra has joined #openstack-infra | 03:37 | |
*** cfriesen has quit IRC | 03:50 | |
*** ykarel|away has joined #openstack-infra | 03:54 | |
*** ykarel|away is now known as ykarel | 03:54 | |
*** lpetrut has quit IRC | 03:56 | |
*** lbragstad has quit IRC | 04:00 | |
*** hongbin has quit IRC | 04:08 | |
*** janki has joined #openstack-infra | 04:12 | |
*** udesale has joined #openstack-infra | 04:27 | |
*** felipemonteiro has joined #openstack-infra | 04:28 | |
*** ykarel has quit IRC | 04:30 | |
*** armax has quit IRC | 04:38 | |
openstackgerrit | Merged openstack-infra/irc-meetings master: Remove Glare meeting https://review.openstack.org/612693 | 04:40 |
*** ykarel has joined #openstack-infra | 04:49 | |
*** spsurya has joined #openstack-infra | 04:50 | |
*** larainema has joined #openstack-infra | 04:55 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: WIP: Add sar logging roles https://review.openstack.org/613112 | 04:57 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: Pin flake8 https://review.openstack.org/613194 | 05:07 |
*** armax has joined #openstack-infra | 05:08 | |
*** carl_cai has quit IRC | 05:08 | |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Prepend exception output with time, date and thread https://review.openstack.org/613196 | 05:10 |
*** kjackal has joined #openstack-infra | 05:16 | |
*** felipemonteiro has quit IRC | 05:16 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: WIP: Add sar logging roles https://review.openstack.org/613112 | 05:22 |
*** rlandy has quit IRC | 05:31 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Fix issue in Github connection with large diffs https://review.openstack.org/612989 | 05:32 |
*** armax has quit IRC | 05:39 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Fix issue in Github connection with large diffs https://review.openstack.org/612989 | 05:49 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: Add sar logging roles https://review.openstack.org/613112 | 06:01 |
*** tobiash_ has quit IRC | 06:03 | |
*** lpetrut has joined #openstack-infra | 06:03 | |
*** tobiash has joined #openstack-infra | 06:04 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul-jobs master: Fix flake8 3.6.0 errors https://review.openstack.org/613205 | 06:11 |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613206 | 06:13 |
*** gfidente has joined #openstack-infra | 06:26 | |
*** kjackal_v2 has joined #openstack-infra | 06:34 | |
*** kopecmartin|off is now known as kopecmartin | 06:34 | |
*** slaweq has joined #openstack-infra | 06:35 | |
*** kjackal has quit IRC | 06:37 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: Add sar logging roles https://review.openstack.org/613112 | 06:42 |
*** AJaeger has quit IRC | 06:43 | |
*** aojea has joined #openstack-infra | 06:45 | |
*** quiquell|off is now known as quiquell | 06:49 | |
*** xek has joined #openstack-infra | 06:49 | |
*** AJaeger has joined #openstack-infra | 06:57 | |
*** xek has quit IRC | 06:59 | |
*** rcernin has quit IRC | 07:00 | |
*** apetrich has joined #openstack-infra | 07:00 | |
*** pcaruana has joined #openstack-infra | 07:04 | |
*** cfriesen has joined #openstack-infra | 07:08 | |
*** ccamacho has joined #openstack-infra | 07:09 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata https://review.openstack.org/613221 | 07:11 |
*** SpamapS has quit IRC | 07:11 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests https://review.openstack.org/613117 | 07:11 |
openstackgerrit | Merged openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/613206 | 07:15 |
*** jpena|off is now known as jpena | 07:15 | |
*** ginopc has joined #openstack-infra | 07:16 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: pass LD_PRELOAD and LD_LIBRARY_PATH vars https://review.openstack.org/613222 | 07:20 |
*** SpamapS has joined #openstack-infra | 07:24 | |
*** hashar has joined #openstack-infra | 07:25 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata https://review.openstack.org/613221 | 07:37 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Fix flake8 3.6.0 errors https://review.openstack.org/613205 | 07:37 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests https://review.openstack.org/613117 | 07:37 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: pass LD_PRELOAD and LD_LIBRARY_PATH vars https://review.openstack.org/613222 | 07:37 |
*** carl_cai has joined #openstack-infra | 07:42 | |
*** Emine has joined #openstack-infra | 07:46 | |
*** cfriesen has quit IRC | 07:48 | |
*** ykarel is now known as ykarel|lunch | 07:59 | |
*** jpich has joined #openstack-infra | 08:00 | |
*** ccamacho has quit IRC | 08:00 | |
*** ccamacho has joined #openstack-infra | 08:01 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata https://review.openstack.org/613221 | 08:03 |
*** bhavikdbavishi has quit IRC | 08:11 | |
openstackgerrit | Natal Ngétal proposed openstack/gertty master: [Documentation] Add a link for aur. https://review.openstack.org/613238 | 08:21 |
*** roman_g has joined #openstack-infra | 08:32 | |
*** ykarel|lunch is now known as ykarel | 08:34 | |
*** e0ne has joined #openstack-infra | 08:42 | |
*** electrofelix has joined #openstack-infra | 08:54 | |
*** ccamacho has quit IRC | 08:54 | |
*** ccamacho has joined #openstack-infra | 08:55 | |
*** xek has joined #openstack-infra | 08:57 | |
*** dtantsur|afk is now known as dtantsur | 09:09 | |
*** tosky has joined #openstack-infra | 09:16 | |
ssbarnea|bkp2 | hi! i want to test some commands on the f28 image we use in CI. How can I do this? | 09:26 |
quiquell | ianw: ^ | 09:26 |
quiquell | ianw: Can we get the image and start it up at a local host ? | 09:27 |
ssbarnea|bkp2 | f28 images have some customizations that affect what we do, and I cannot really wait for CI for these. Currently I am using a ~clean f28, which is good for the generic use case, but i need to cover CI too. | 09:27 |
*** ccamacho has quit IRC | 09:27 | |
ianw | quiquell ssbarnea|bkp2 : you can grab the images from https://nb01.openstack.org/images/ ... they boot with config-drive + glean, so should pick up root keys via that | 09:30 |
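To boot one of those images locally as ianw describes, glean picks the root ssh key up from a config-drive. A minimal sketch of building that config-drive tree (an assumption based on the OpenStack config-drive convention; the uuid and key values are placeholders):

```shell
# Build a config-drive tree that glean can read on first boot.
# Assumption: glean consumes openstack/latest/meta_data.json and installs
# the keys found under "public_keys" for root.
mkdir -p /tmp/cfgdrive/openstack/latest
cat > /tmp/cfgdrive/openstack/latest/meta_data.json <<'EOF'
{"uuid": "00000000-0000-0000-0000-000000000001",
 "public_keys": {"root": "ssh-rsa AAAA...placeholder-key... test@example"}}
EOF
grep -q '"public_keys"' /tmp/cfgdrive/openstack/latest/meta_data.json && echo "metadata ok"
# Then (outside this sketch): pack the tree as an iso labeled config-2 and
# attach it to the downloaded qcow2, e.g.
#   genisoimage -R -V config-2 -o cfgdrive.iso /tmp/cfgdrive
#   qemu-system-x86_64 -m 2048 -drive file=image.qcow2 -cdrom cfgdrive.iso
```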
quiquell | ianw: Do they have the exclusion in dnf.conf ? | 09:33 |
quiquell | ianw: Or is this done later on in some ansible role ? | 09:33 |
ianw | quiquell: that will all be in the base image | 09:33 |
quiquell | ianw: ack | 09:34 |
quiquell | ykarel: ^ | 09:34 |
*** psachin has quit IRC | 09:35 | |
ykarel | quiquell, yes, that's what i referred to | 09:35 |
*** panda|off has quit IRC | 09:35 | |
ykarel | in #oooq | 09:35 |
ykarel | ok, you referred to the dnf.conf thing | 09:36 |
*** kopecmartin is now known as kopecmartin|afk | 09:38 | |
*** panda has joined #openstack-infra | 09:38 | |
*** derekh has joined #openstack-infra | 09:39 | |
*** psachin has joined #openstack-infra | 09:40 | |
ykarel | quiquell, so if i got it correct https://github.com/openstack/diskimage-builder/blob/e796b3bc1884cbb0a7259be486d835ca114cca9e/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L29-L30 and https://github.com/openstack/diskimage-builder/blob/e796b3bc1884cbb0a7259be486d835ca114cca9e/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L156 do add excludes | 09:41 |
ykarel | and the images in upstream are build using diskimage builder iirc, ianw ? | 09:41 |
quiquell | ykarel: I think that's it, and it's by design so we don't mess around with those | 09:42 |
ykarel | quiquell, yes if we know what we are doing, we can hack i think | 09:43 |
*** yamamoto has quit IRC | 09:44 | |
quiquell | ykarel: You mean changing dnf.conf at our jobs ? | 09:45 |
ianw | ykarel: yes, it's using those images | 09:45 |
*** yamamoto has joined #openstack-infra | 09:45 | |
ykarel | ianw, ack | 09:45 |
ianw | i mean elements | 09:45 |
ykarel | quiquell, yes if it's really required when using nodepool images | 09:45 |
ykarel | ianw, ack | 09:46 |
ianw | it might be that "yum install python-virtualenv" when it's held does nothing, and "dnf install python-virtualenv" fails | 09:46 |
quiquell | ykarel: We can fix our stuff by just not installing things that are already present on the system, so we use the nodepool versions | 09:46 |
quiquell | ykarel: I mean, for example, if virtualenv is already installed, don't install python*-virtualenv, and the same for pip and setuptools | 09:47 |
quiquell | ssbarnea|bkp2: ^ | 09:47 |
ykarel | quiquell, that's what i said, if really required; if we can fix it another way that's fine | 09:47 |
ykarel | but remember we need to add support for non nodepool images | 09:47 |
quiquell | ykarel: I will try to do that at my review | 09:47 |
ykarel | quiquell, ack | 09:48 |
quiquell | ykarel: yep, just want to make the job for f28 pass and then productify the changes so they work for all the environments | 09:48 |
ykarel | quiquell, cool | 09:48 |
quiquell | ykarel: Puff let's see | 09:48 |
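The behavior ianw describes above (a held package silently no-oping under yum but failing under dnf) can be probed without a CI round-trip. A sketch, assuming the image's dnf.conf carries an `exclude=` line added at image-build time; the exact package list here is illustrative:

```shell
# Parse the exclude list out of a dnf.conf (a sample file stands in for the
# image's real /etc/dnf/dnf.conf).
cat > /tmp/dnf.conf <<'EOF'
[main]
gpgcheck=1
exclude=python2-pip python2-virtualenv python3-pip python3-virtualenv
EOF
excluded=$(grep '^exclude=' /tmp/dnf.conf | cut -d= -f2)
echo "held packages: $excluded"
# quiquell's approach: skip the package install entirely when the tool is
# already present, so jobs work on both nodepool images and a clean f28.
if command -v virtualenv >/dev/null 2>&1; then
    echo "virtualenv already present; not installing python*-virtualenv"
fi
```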
*** yamamoto has quit IRC | 09:49 | |
*** carl_cai has quit IRC | 09:52 | |
*** longkb has quit IRC | 09:58 | |
*** jbadiapa has quit IRC | 10:02 | |
*** psachin has quit IRC | 10:03 | |
*** jbadiapa has joined #openstack-infra | 10:04 | |
*** psachin has joined #openstack-infra | 10:05 | |
*** yamamoto has joined #openstack-infra | 10:17 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 10:20 |
*** bhavikdbavishi has joined #openstack-infra | 10:20 | |
*** psachin has quit IRC | 10:25 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement a Kubernetes driver https://review.openstack.org/535557 | 10:25 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 10:25 |
*** psachin has joined #openstack-infra | 10:27 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul-jobs master: Add prepare-workspace-git role https://review.openstack.org/613036 | 10:31 |
*** yamamoto has quit IRC | 10:34 | |
*** yamamoto has joined #openstack-infra | 10:35 | |
*** udesale has quit IRC | 10:36 | |
*** Emine has quit IRC | 10:38 | |
*** Emine has joined #openstack-infra | 10:39 | |
*** yamamoto has quit IRC | 10:39 | |
ssbarnea|bkp2 | what is the best place to talk about bindep? ... and its inability to have conditions based on distro version. | 10:41 |
openstackgerrit | Benoît Bayszczak proposed openstack-infra/zuul master: Disable Nodepool nodes lock for SKIPPED jobs https://review.openstack.org/613261 | 10:43 |
ssbarnea|bkp2 | https://storyboard.openstack.org/#!/story/2004176 -- bindep no support for disto versions | 10:44 |
openstackgerrit | Benoît Bayszczak proposed openstack-infra/zuul master: Disable Nodepool nodes lock for SKIPPED jobs https://review.openstack.org/613261 | 10:47 |
*** pbourke has quit IRC | 10:48 | |
*** pbourke has joined #openstack-infra | 10:48 | |
*** psachin has quit IRC | 11:02 | |
*** ccamacho has joined #openstack-infra | 11:07 | |
*** dave-mccowan has joined #openstack-infra | 11:15 | |
*** carl_cai has joined #openstack-infra | 11:17 | |
*** florianf is now known as florianf|pto | 11:19 | |
*** yamamoto has joined #openstack-infra | 11:24 | |
*** adriancz has quit IRC | 11:31 | |
*** jpena is now known as jpena|lunch | 11:34 | |
*** rh-jelabarre has joined #openstack-infra | 11:34 | |
*** jesusaur has quit IRC | 11:43 | |
*** lpetrut has quit IRC | 11:44 | |
*** jesusaur has joined #openstack-infra | 11:46 | |
*** bhavikdbavishi has quit IRC | 11:56 | |
*** ldnunes has joined #openstack-infra | 12:01 | |
*** haleyb has joined #openstack-infra | 12:02 | |
*** fuentess has joined #openstack-infra | 12:03 | |
*** quiquell is now known as quiquell|lunch | 12:05 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers https://review.openstack.org/609515 | 12:05 |
*** fuentess has quit IRC | 12:09 | |
*** adrianreza has quit IRC | 12:09 | |
*** jistr_ is now known as jistr | 12:14 | |
*** jamesmcarthur has joined #openstack-infra | 12:14 | |
*** jamesmcarthur has quit IRC | 12:19 | |
*** zul has joined #openstack-infra | 12:19 | |
*** boden has joined #openstack-infra | 12:21 | |
*** pcaruana has quit IRC | 12:26 | |
*** rlandy has joined #openstack-infra | 12:27 | |
*** ykarel is now known as ykarel|afk | 12:28 | |
*** beagles is now known as beagles_mtg | 12:29 | |
*** janki has quit IRC | 12:31 | |
*** janki has joined #openstack-infra | 12:31 | |
*** ykarel|afk has quit IRC | 12:33 | |
*** jpena|lunch is now known as jpena | 12:33 | |
fungi | ssbarnea|bkp2: here is probably the best place to talk about bindep (or on the infra ml) | 12:34 |
*** pcaruana has joined #openstack-infra | 12:39 | |
Shrews | corvus: clarkb: before we merge the zk cluster stuff to the launchers and zuul, I think we need a plan of action on how to handle the current provider instances. If we just switch, we'll have a LOT of instances we'll have to manually clean up (and rather quickly to free up quota). | 12:41 |
Shrews | corvus: clarkb: nodepool won't see those as leaked instances since they'll have the right metadata | 12:41 |
Shrews | maybe we should first set max-servers to 0 for all providers and let most of them go away naturally? | 12:41 |
*** gfidente has quit IRC | 12:42 | |
*** gfidente has joined #openstack-infra | 12:42 | |
fungi | ssbarnea|bkp2: i've commented on your story | 12:42 |
fungi | ssbarnea|bkp2: we've made extensive use of that feature in the past when, say, packages were renamed, split, combined, et cetera between different distro versions | 12:43 |
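For reference, the distro/version selectors fungi describes look roughly like this in a bindep.txt (a sketch; the exact profile names should be verified against `bindep --profiles` output on the target node):

```text
# pick packages per package-manager family...
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
# ...and per distro release, e.g. when a package was renamed between versions
python2-dnf [platform:fedora-27]
python3-dnf [platform:fedora-28]
```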
*** quiquell|lunch is now known as quiquell | 12:43 | |
sshnaidm|ruck | fungi, clarkb do you know if it's possible to check whether the docker proxy works fine? We have some jobs (but not all) failing because container preparation takes a long time; I'd like to ensure we still download them from the proxy, not from docker.io | 12:44 |
sshnaidm|ruck | fungi, clarkb or maybe you know a way to check it in jobs - whether we download from the proxy or from docker.io directly | 12:45 |
* Shrews steps away momentarily for bfast before the mass zoo migration | 12:46 | |
fungi | i'm not familiar enough with what sort of debug output docker commands provide. does it not tell you the urls it's using? | 12:46 |
fungi | Shrews: clarkb: i'm similarly going to go try to catch early voting while it's hopefully quiet, and then be back as quickly as possible. what time did we say the zk migration was starting? | 12:46 |
*** smarcet has joined #openstack-infra | 12:47 | |
*** smarcet has quit IRC | 12:48 | |
*** jcoufal has joined #openstack-infra | 12:50 | |
*** ykarel has joined #openstack-infra | 12:50 | |
corvus | Shrews, clarkb: or if we set min-ready to 0 then stop zuul, [almost] all of the nodes should be deleted | 12:50 |
*** ansmith has joined #openstack-infra | 12:50 | |
*** janki has quit IRC | 12:56 | |
*** bnemec has joined #openstack-infra | 12:56 | |
*** yamamoto has quit IRC | 12:56 | |
*** yamamoto has joined #openstack-infra | 12:56 | |
*** bobh has joined #openstack-infra | 13:02 | |
*** lpetrut has joined #openstack-infra | 13:03 | |
*** rascasoft has quit IRC | 13:05 | |
*** _ari_ has quit IRC | 13:05 | |
*** rascasoft has joined #openstack-infra | 13:05 | |
*** kgiusti has joined #openstack-infra | 13:05 | |
*** agopi has quit IRC | 13:12 | |
*** hashar is now known as hasharAway | 13:12 | |
*** kgiusti has quit IRC | 13:15 | |
*** rascasoft has quit IRC | 13:15 | |
*** kgiusti has joined #openstack-infra | 13:17 | |
*** rascasoft has joined #openstack-infra | 13:17 | |
*** eharney has joined #openstack-infra | 13:19 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver https://review.openstack.org/604404 | 13:20 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Filter file comments for existing files https://review.openstack.org/613161 | 13:21 |
*** felipemonteiro has joined #openstack-infra | 13:23 | |
Shrews | fungi: i think t-35 minutes? | 13:25 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Collect docker logs after quick-start run https://review.openstack.org/613027 | 13:25 |
Shrews | corvus: yes, that would be faster i think. then we'd just have to delete the ready nodes that are left | 13:26 |
openstackgerrit | James E. Blair proposed openstack-infra/system-config master: Add opendev nameservers (2/2) https://review.openstack.org/610066 | 13:27 |
*** ansmith has quit IRC | 13:28 | |
*** ansmith_ has joined #openstack-infra | 13:28 | |
*** jistr is now known as jistr|call | 13:29 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects https://review.openstack.org/613143 | 13:30 |
*** tobberydberg has joined #openstack-infra | 13:30 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/project-config master: Disable all providers in nodepool launchers https://review.openstack.org/613329 | 13:31 |
Shrews | clarkb: corvus: ^^^ in case that's the route we choose | 13:31 |
*** rascasoft has quit IRC | 13:31 | |
*** yamamoto has quit IRC | 13:31 | |
*** yamamoto has joined #openstack-infra | 13:31 | |
*** d0ugal has quit IRC | 13:33 | |
*** lbragstad has joined #openstack-infra | 13:33 | |
*** rascasoft has joined #openstack-infra | 13:33 | |
*** d0ugal has joined #openstack-infra | 13:34 | |
clarkb | we need to restart zuul before we restart the launchers then? | 13:35 |
clarkb | (thats fine, making sure I understand) | 13:36 |
*** pcaruana has quit IRC | 13:37 | |
clarkb | fungi: re docker, it doesn't even use urls, it is more than a bit frustrating | 13:37 |
clarkb | with current docker it's just a hostname iirc and images must be served relative to the http root at that location | 13:38 |
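A note on what that implies for the proxy question sshnaidm|ruck raised earlier: docker's daemon-level mirror setting accepts only scheme+host(+port) endpoints with no path component, which is exactly the "served relative to the http root" constraint clarkb describes. A sketch of /etc/docker/daemon.json (the mirror hostname is made up; a plain-http mirror generally also needs an insecure-registries entry):

```json
{
  "registry-mirrors": ["http://mirror.example.openstack.org:8082"],
  "insecure-registries": ["mirror.example.openstack.org:8082"]
}
```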
fungi | okay, i'm back | 13:39 |
fungi | with 20 minutes to spare | 13:40 |
*** agopi has joined #openstack-infra | 13:40 | |
Shrews | clarkb: i think we (1) set max-servers to 0, (2) stop zuul, (3) delete (or record) any instances we need to manually cleanup, (4) merge your zk change to launchers & zuul, (5) revert max-servers change, (6) start launchers & zuul | 13:40 |
Shrews | i *think* ??? | 13:41 |
* Shrews would like a logic check there | 13:41 | |
Shrews | or if someone has a better plan... | 13:41 |
clarkb | Shrews: I think there is a 2.5 of stop launchers | 13:42 |
Shrews | clarkb: oh yes | 13:42 |
Shrews | actually, if we stop launchers, do we need to set max-servers to 0? | 13:42 |
Shrews | oh, yes. we need them to delete the USED instances | 13:43 |
Shrews | so we need a 2.1 step to wait for that to happen | 13:43 |
clarkb | yup | 13:45 |
clarkb | should I go ahead and put everything in them emergency file now, then we can approve and merge stuff and use kick.sh to apply things? | 13:45 |
clarkb | we won't be able to rely on zuul merging stuff while we do the work | 13:45 |
clarkb | (we may also want to set max-servers: 0 by hand, since that is a short temporary state?) | 13:46 |
Shrews | i'm fine with setting max-servers by hand | 13:48 |
Shrews | will be shorter downtime | 13:48 |
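For context on the "set max-servers to 0" step: the knob lives in each launcher's provider-pool config, and min-ready (corvus's alternative above) is its per-label counterpart. A hedged sketch with made-up provider, label, and flavor names:

```yaml
# nodepool.yaml fragment -- provider and label names are illustrative
labels:
  - name: ubuntu-xenial
    min-ready: 0          # keep no warm "ready" nodes (corvus's option)

providers:
  - name: example-cloud
    pools:
      - name: main
        max-servers: 0    # boot nothing new; used nodes drain and get deleted
        labels:
          - name: ubuntu-xenial
            diskimage: ubuntu-xenial
            flavor-name: general-purpose   # placeholder flavor
```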
*** jcoufal has quit IRC | 13:48 | |
Shrews | maybe we need an announcement? | 13:48 |
clarkb | nl*, ze*, zm*, and zuul* are in the emergency file now | 13:48 |
clarkb | Shrews: ya we can #status notice as soon as we start rolling and I'll let the release team know | 13:49 |
clarkb | as long as we capture queues and restore them only state changes in gerrit that happen while zuul-scheduler is off will be a problem | 13:49 |
*** d0ugal has quit IRC | 13:49 | |
clarkb | https://review.openstack.org/#/c/612443/ and https://review.openstack.org/#/c/612442/ should be safe to approve now with those hosts in the emergency file. Any objections to doing that now? | 13:50 |
*** beagles_mtg is now known as beagles | 13:51 | |
Shrews | let's start Operation Cattle Drive \o/ | 13:51 |
*** d0ugal has joined #openstack-infra | 13:51 | |
*** jcoufal has joined #openstack-infra | 13:51 | |
clarkb | note I did use the * glob in the emergency file which I think works? | 13:51 |
corvus | we will find out | 13:52 |
clarkb | heh I can list them too if we want | 13:52 |
*** rpittau has quit IRC | 13:52 | |
clarkb | pretty sure the * should work | 13:52 |
Shrews | should we go ahead and set max-servers to 0 in the configs? | 13:56 |
*** efried has quit IRC | 13:56 | |
clarkb | Shrews: we should let those two changes merge first (they are waiting on node allocations) | 13:56 |
Shrews | o rite | 13:56 |
* Shrews enables zuul --turbo option | 13:57 | |
*** efried has joined #openstack-infra | 13:57 | |
*** smarcet has joined #openstack-infra | 13:58 | |
clarkb | sshnaidm|ruck: when we are done with this zuul and nodepool work, I can take a look at docker things | 13:58 |
sshnaidm|ruck | clarkb, thanks | 13:59 |
clarkb | looks like the tripleo gate just did an almost full restart ahead of our changes :/ | 13:59 |
clarkb | we might consider direct merging if we are on a tight time schedule, I think corvus was the one with the time bounds? | 14:01 |
clarkb | corvus: do you think we should bypass the gate on those two changes? they did both pass check | 14:01 |
corvus | the good news is it just did another partial reset | 14:02 |
clarkb | I expect we'll move fairly quickly once those two changes merge. The biggest time sink is likely waiting for executors to stop and launcher to delete nodes | 14:03 |
corvus | so we're getting nodes now; i think we can just let them merge | 14:03 |
clarkb | wfm | 14:03 |
*** smarcet has quit IRC | 14:04 | |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager https://review.openstack.org/613335 | 14:07 |
*** ykarel is now known as ykarel|away | 14:08 | |
clarkb | ok I don't think the * glob in the emergency file worked | 14:09 |
clarkb | I'm going to list out the nodes instead | 14:09 |
clarkb | (puppet just ran on zuul01) | 14:09 |
clarkb | (which is fine at this point, nothing has merged yet) | 14:10 |
corvus | they might be regexes, but listing is good now i think :) | 14:12 |
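On the glob question: ansible host patterns do accept fnmatch-style globs in play targets, but whether a bare `zuul*` line in a static file matches depends on how that file is consumed, so listing hosts explicitly (as clarkb ended up doing) is the unambiguous form. A sketch of the explicit style; hostnames are illustrative:

```ini
# "emergency" disable list as a static group -- explicit hosts, no patterns
[disabled]
zuul01.openstack.org
ze01.openstack.org
zm01.openstack.org
nl01.openstack.org
nl02.openstack.org
```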
clarkb | though project-config merging might make things interesting with the launchers racing ansible, arg | 14:13 |
clarkb | the launchers should puppet in about 15 minutes and project config will merge before then | 14:13 |
fungi | just a heads up, the etherpad system cpu bump from yesterday has returned as of 14:00z from the looks of it | 14:13 |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager https://review.openstack.org/613335 | 14:13 |
openstackgerrit | Simon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager https://review.openstack.org/613335 | 14:15 |
fungi | i suspect it's memory pressure and etherpad is using a bunch of cache memory for its operation | 14:16 |
clarkb | fungi: ya that is what made me think about the hwe kernels because memory was weird on the xenial kernel on our executors and switching to hwe fixed it | 14:16 |
openstackgerrit | Merged openstack-infra/project-config master: Switch nodepool launchers to use new zk cluster https://review.openstack.org/612442 | 14:16 |
fungi | looking at the graph, restarting nodejs yesterday the cache memory usage spiked back up to ~3.5 out of 4 gb immediately, and as of nowish it's about out of free memory | 14:16 |
clarkb | ok ^ may or may not apply to the launchers depending on whether or not the globs work for the launchers | 14:16 |
clarkb | Shrews: ^ fyi, I'm not sure there is anything we can do to control that other than to force stop ansible on bridge right now | 14:17 |
fungi | so we may want to think about resizing etherpad.o.o to 8gb? | 14:17 |
clarkb | oh nevermind | 14:17 |
clarkb | I did my math wrong and puppet ran on the launchers a few minutes ago? | 14:17 |
Shrews | launchers still point to nodepool.o.o | 14:17 |
clarkb | Shrews: yup I think we ended up having the timing just work out afterall | 14:18 |
clarkb | so now just waiting on the zuul config update and we can do the manual steps after that | 14:18 |
clarkb | fungi: what is odd is the version of etherpad and the version of nodejs hasn't changed and we kept the flavor fixed on the upgrade | 14:18 |
mwhahaha | is gerrit ssh broken or is it just me? | 14:19 |
clarkb | mwhahaha: I can ssh to gerrit from here | 14:19 |
clarkb | the ls-projects command in particular works for me | 14:19 |
mwhahaha | hmmm ok | 14:19 |
corvus | ditto | 14:19 |
mwhahaha | hrm it seems to be trying ipv6 | 14:20 |
mwhahaha | that's odd | 14:20 |
corvus | mwhahaha: v6 wfm; maybe your v6 route is sad? | 14:21 |
clarkb | fungi: before we bump the memory I'd be inclined to try the hwe kernel | 14:21 |
clarkb | fungi: then if that doesn't help a rebuild on bigger flavor will give us the normal kernel | 14:21 |
mwhahaha | corvus: yea something was odd, i started a ping which was delayed for a bit, then when it kicked in it worked. | 14:21 |
mwhahaha | sorry for the false alarm | 14:21 |
corvus | false alarms better than real ones | 14:22 |
dmsimard | clarkb: out of curiosity, have we tried etherpad on 18.04 ? | 14:22 |
clarkb | dmsimard: no, because we don't have deployment stuff for etherpad that works on 18.04 currently | 14:22 |
dmsimard | ack | 14:22 |
clarkb | if someone wants to invest in that nowish we could do that too. It isn't a terribly complicated system once you get the nodejs and npm stuff working (I have no idea if that is a solved problem in ansible land, but containers theoretically make that better too) | 14:23 |
clarkb | Shrews: sounds like you are watching the launchers, are you planning to set max-servers to 0 by hand? corvus did you want to do the zuul shutdown? I can run the kick.sh commands and help watch the cleanup that happens | 14:24 |
fungi | clarkb: i agree, trying hwe kernel next would be good | 14:24 |
clarkb | also how does this look: #status notice Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers. | 14:26 |
fungi | lgtm | 14:26 |
clarkb | I'll send that as soon as we start making changes to the running services | 14:27 |
Shrews | clarkb: i can do the launcher configs | 14:27 |
fungi | i guess it would be redundant to also say we're taking our quotas down to zero | 14:27 |
Shrews | clarkb: are we ready to set max-servers to 0 now? | 14:27 |
clarkb | Shrews: lets let the last job finish just in case it has to restart or something | 14:27 |
Shrews | clarkb: awaiting the go signal... | 14:27 |
corvus | i can do the zuul shutdown | 14:29 |
clarkb | I think that job is currently compiling afs modules | 14:30 |
clarkb | might be a couple minutes more if so | 14:30 |
corvus | i should just do a full system restart, yeah? | 14:31 |
corvus | just to go ahead and get everything current | 14:31 |
clarkb | corvus: yes, but do a stop, then we'll pause for a sec to make sure configs are updated then we'll do a start | 14:31 |
corvus | (we only *need* to do the scheduler, but since that's the disruptive one) | 14:31 |
clarkb | corvus: but this way we have good data on current zuul tree so maybe zuul can do a release next week | 14:32 |
gnuoy | Hi, does https://review.openstack.org/#/c/608866/ need re-approval now that the dependent change has landed? | 14:32 |
corvus | clarkb: ack. we still have the zuul-web pid bug, so i'll run the restart playbook and wait to remove the pidfile until we're ready. | 14:32 |
clarkb | corvus: shrews didn't want to apply the zk changes to running processes. So we are stopping everything, updating config, then starting everything | 14:32 |
*** gfidente has quit IRC | 14:32 | |
corvus | ++ | 14:33 |
clarkb | gnuoy: a recheck will work too. | 14:33 |
fungi | clarkb: do rechecks work now even when there's already a verified +1? | 14:33 |
fungi | did zuul v3 solve that? | 14:34 |
clarkb | fungi: they should, I think it was gerrit 2.13 that fixed that | 14:34 |
fungi | oh, interesting | 14:34 |
clarkb | fungi: the problem before was that older gerrit only sent vote deltas. So if you reapplied a +1 that info wasn't sent to zuul | 14:34 |
clarkb | zaro fixed it so that gerrit sends the entire event content | 14:34 |
gnuoy | clarkb, excellent, thanks, will do | 14:35 |
openstackgerrit | Merged openstack-infra/system-config master: Switch zuul scheduler to new zk cluster https://review.openstack.org/612443 | 14:36 |
clarkb | Shrews: ^ I think you can set max-servers to 0 now. | 14:36 |
Shrews | ok | 14:36 |
clarkb | #status notice Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers. | 14:38 |
openstackstatus | clarkb: sending notice | 14:38 |
Shrews | ok, done | 14:38 |
Shrews | good to stop zuul now | 14:38 |
clarkb | corvus: ^ | 14:38 |
corvus | stopping zuul | 14:38 |
clarkb | I'm going to update system-config on bridge.o.o so that we are ready to run kick.sh | 14:38 |
-openstackstatus- NOTICE: Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers. | 14:39 | |
corvus | oh neat all the zuul hosts are disabled... | 14:40 |
corvus | trying again | 14:41 |
openstackstatus | clarkb: finished sending notice | 14:41 |
corvus | scheduler stopped | 14:41 |
clarkb | I'm watching nodepool list now to see nodes hopefully get cleaned up | 14:42 |
Shrews | same | 14:42 |
clarkb | yup a lot of deleting in there now | 14:42 |
*** maciejjozefczyk has quit IRC | 14:42 | |
corvus | clarkb: want to go ahead and kick the zuul servers? | 14:42 |
Shrews | we should have only READY and HOLD nodes left eventually | 14:43 |
corvus | or, well, at least zuul01 | 14:43 |
fungi | gnuoy: the reason i recommended having the change reapproved is that it's non-urgent (just bookkeeping), so mnaser or dhellmann will get to it when they're available | 14:43 |
*** otherwiseguy has joined #openstack-infra | 14:43 | |
*** munimeha1 has joined #openstack-infra | 14:43 | |
Shrews | corvus: all requests will be declined | 14:43 |
fungi | a reapproval will run fewer jobs since the change already passed the check pipeline once | 14:43 |
clarkb | Shrews: corvus in that case maybe we wait for the launchers to move over first? | 14:43 |
Shrews | yeah | 14:44 |
clarkb | ok I will wait on the kick.sh then | 14:44 |
corvus | Shrews: i'm asking clarkb to kick zuul01 so it has the correct config in place. i was not planning on starting zuul. | 14:44 |
gnuoy | fungi, ah ,ok. I didn't appreciate there was a mechanism for requesting re-approval | 14:44 |
corvus | it takes a long time to kick | 14:44 |
Shrews | oh | 14:44 |
clarkb | corvus: do we know if puppet will start the scheduler? | 14:45 |
corvus | clarkb: it .... well better not. | 14:45 |
*** kopecmartin|afk is now known as kopecmartin | 14:46 | |
*** gfidente has joined #openstack-infra | 14:46 | |
*** ramishra has quit IRC | 14:46 | |
*** smarcet has joined #openstack-infra | 14:47 | |
clarkb | we are down to 28 nodes in the launcher. I expect we can just wait since we are almost ready to stop the launchers then kick and start them? | 14:47 |
corvus | we have a policy of not having our config management start user-facing services. i really hope we have not decided to violate that. | 14:47 |
Shrews | one more to delete | 14:48 |
clarkb | corvus: I'm skimming the puppet and I think it will actually do the right thing | 14:48 |
clarkb | corvus: we ensure => undef but enable => true in the scheduler service definition | 14:48 |
clarkb | corvus: however I'm not sure if ensure => undef has weird puppet default behavior like ensure running? | 14:48 |
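The ensure/enable combination clarkb is describing can be sketched as a minimal Puppet service resource. The resource name here is assumed for illustration; the real definition lives in the project's puppet modules:

```puppet
# Hypothetical sketch of the scheduler service resource under discussion.
# Leaving ensure unset (undef) means Puppet does not manage the running
# state at all: an agent run will neither start a stopped scheduler nor
# stop a running one. enable => true only controls startup at boot.
service { 'zuul-scheduler':
  ensure => undef,
  enable => true,
}
```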
*** Swami has joined #openstack-infra | 14:49 | |
corvus | clarkb: can you just run it and we'll find out? | 14:49 |
Shrews | wow, we've created (or attempted to create) over 3 million nodes since running nodepool v3 | 14:49 |
clarkb | corvus: I can | 14:49 |
clarkb | corvus: doing that now | 14:49 |
*** jistr|call is now known as jistr | 14:50 | |
Shrews | hrm, vexxhost is being slow with that last delete | 14:50 |
*** Swami has quit IRC | 14:50 | |
clarkb | Shrews: I've recorded the nodepool list output and since there are held and ready nodes to delete anyway maybe lets move ahead with stopping the launchers now? | 14:50 |
clarkb | Shrews: then I can kick.sh the launchers too | 14:51 |
Shrews | clarkb: yeah, we can get that last one manually too if we need to | 14:51 |
clarkb | Shrews: ya lets do that | 14:51 |
Shrews | stopping launchers... | 14:51 |
Shrews | just fyi, http://paste.openstack.org/raw/733050/ | 14:52 |
clarkb | corvus: puppet says it is done on zuul01 | 14:52 |
corvus | clarkb: i agree. config looks good, no procs running | 14:52 |
clarkb | corvus: let me know when you think I should run it on ze* and zm* | 14:52 |
Shrews | clarkb: corvus: launchers stopped | 14:52 |
clarkb | Shrews: ok kicking launchers now | 14:52 |
corvus | clarkb: they won't use it so it's not important to run on the other z servers | 14:53 |
*** gema has joined #openstack-infra | 14:53 | |
clarkb | corvus: oh right. We can let normal puppet update that then | 14:53 |
Shrews | clarkb: that should take care of resetting max-servers too, right? | 14:53 |
clarkb | Shrews: it should | 14:53 |
Shrews | i'll make sure | 14:53 |
clarkb | Shrews: the max-servers thing ended up working really well. Much smaller list of things to cleanup this way | 14:54 |
Shrews | clarkb: yeah | 14:55 |
*** bobh has quit IRC | 14:55 | |
clarkb | Shrews: kick.sh is done | 14:55 |
Shrews | clarkb: we would have quickly had quota issues, too | 14:55 |
clarkb | Shrews: I think you are good to start launchers when you are happy with their configs | 14:56 |
Shrews | checking configs... | 14:56 |
*** tobberydberg has quit IRC | 14:56 | |
Shrews | clarkb: configs look good. i'm going to start nl02 first since it has the lowest setting for max-servers | 14:57 |
clarkb | Shrews: ok | 14:57 |
Shrews | Marking for delete leaked instance ubuntu-bionic-limestone-regionone-0002677659 (819013fc-1051-4655-bc61-1769bdc1af4d) in limestone-regionone (unknown node id 0002677659) | 14:59 |
Shrews | oh, maybe we validate node IDs??? | 14:59 |
* Shrews looks | 14:59 | |
clarkb | ok not much activity on nl02 because we set min ready with nl01 | 14:59 |
clarkb | nl01 should be the next one to start | 14:59 |
clarkb | nl02 looks happy in its idling though | 15:00 |
Shrews | clarkb: ok, maybe this max-servers step was unnecessary | 15:00 |
Shrews | starting nl01 now | 15:00 |
clarkb | ah the alien cleanup is more sophisticated than anticipated? | 15:01 |
corvus | well, it helped reduce churn. but yeah, nodepool should do the cleanup for us. | 15:01 |
*** cfriesen has joined #openstack-infra | 15:01 | |
*** jtomasek has quit IRC | 15:01 | |
clarkb | corvus: thats a good point, we avoid the shock of it having to do it all at once | 15:02 |
clarkb | we have a bunch of building nodes now. Do we want to wait to see them go ready before starting the other launchers? | 15:02 |
corvus | (we set metadata with the nodepool id, and if that id isn't in the db, it's a leaked instance) | 15:02 |
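The leak detection corvus describes can be sketched roughly as below. This is an illustrative approximation, not nodepool's actual code; the metadata key, data shapes, and names are assumptions.

```python
# Sketch of nodepool-style leak detection: each instance nodepool boots is
# tagged with its node id in the server metadata, and a periodic cleanup
# marks any instance whose id is absent from the ZooKeeper db as leaked.
def find_leaked(cloud_servers, known_node_ids):
    """Return instances whose nodepool_node_id metadata is unknown."""
    leaked = []
    for server in cloud_servers:
        node_id = server.get("metadata", {}).get("nodepool_node_id")
        # Instances with no nodepool metadata (e.g. manually booted) are
        # left alone; only tagged-but-unknown instances count as leaks.
        if node_id is not None and node_id not in known_node_ids:
            leaked.append(server)
    return leaked

servers = [
    {"name": "ubuntu-bionic-0002677659",
     "metadata": {"nodepool_node_id": "0002677659"}},
    {"name": "ubuntu-bionic-0002677700",
     "metadata": {"nodepool_node_id": "0002677700"}},
    {"name": "manually-booted", "metadata": {}},
]
known = {"0002677700"}
print([s["name"] for s in find_leaked(servers, known)])
# prints ['ubuntu-bionic-0002677659']
```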
Shrews | clarkb: yeah, give it a minute | 15:02 |
clarkb | Shrews: these first boots may be slower than normal because the images are new and haven't been used yet (which caches them on hypervisors) | 15:03 |
*** eernst has joined #openstack-infra | 15:03 | |
*** jpena is now known as jpena|brb | 15:04 | |
Shrews | clarkb: yeah, i just wanted to validate some stuff first. good to start the others now | 15:04 |
*** tobberydberg has joined #openstack-infra | 15:04 | |
clarkb | Shrews: are you going to start them or should I help with that? | 15:04 |
*** ccamacho has quit IRC | 15:05 | |
Shrews | i can do it | 15:05 |
clarkb | ok | 15:05 |
Shrews | 03 and 04 started now | 15:05 |
clarkb | we have ready nodes | 15:05 |
corvus | shall i continue with zuul? | 15:06 |
clarkb | corvus: Shrews I think we can start zuul now that ^ is in place | 15:06 |
Shrews | i see some ready nodes now | 15:06 |
clarkb | corvus: I'm good to start zuul if shrews is | 15:06 |
clarkb | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64704&rra_id=all is going to be a fun graph to watch | 15:07 |
corvus | zuul is starting, but we're still a few mins away from node requests. | 15:07 |
Shrews | up to corvus at this point. nodepool is priming, but ready | 15:07 |
Shrews | clarkb: unless ianw updated the graphs for the new stat names, some may be empty | 15:08 |
clarkb | Shrews: that is cacti network bw usage on the current zk leader. Independent of the statsd stuff | 15:08 |
Shrews | oh, that's cacti | 15:08 |
Shrews | yeah | 15:08 |
*** yamamoto has quit IRC | 15:09 | |
*** yamamoto has joined #openstack-infra | 15:10 | |
*** smarcet has quit IRC | 15:10 | |
clarkb | corvus: how is zuul doing? | 15:12 |
clarkb | iirc its about a 5 minute startup for zuul? should start seeing stuff zoon? | 15:12 |
corvus | clarkb: scheduler is running, executors are stopping | 15:12 |
clarkb | heh zoon | 15:12 |
corvus | node requests are being submitted | 15:12 |
clarkb | see them on the nodepool side, lots of building requests | 15:13 |
corvus | we're up to 145 node reqs | 15:13 |
*** tobberydberg has quit IRC | 15:13 | |
corvus | i'll re-enqueue now | 15:13 |
clarkb | there are in-use nodes now too | 15:13 |
corvus | that's surprising | 15:13 |
corvus | oh, they're marked in use in nodepool, but zuul hasn't actually begun execution yet | 15:14 |
corvus | (it has no executors online) | 15:14 |
clarkb | huh | 15:14 |
corvus | it's normal -- zuul has claimed the nodes | 15:15 |
corvus | executors have started | 15:15 |
Shrews | clarkb: so i don't think the nodes in 'hold' state from before the migration will be cleaned up by nodepool | 15:15 |
Shrews | so those will have to be tracked | 15:15 |
*** jpena|brb is now known as jpena | 15:15 | |
*** jamesmcarthur has joined #openstack-infra | 15:15 | |
clarkb | Shrews: noted, we also need to clean up the old image builds on the builders | 15:15 |
corvus | Shrews: i think they should be deleted as leaks too | 15:15 |
corvus | zuul is completely restarted; re-enqueue is in progress | 15:16 |
*** yamamoto has quit IRC | 15:16 | |
Shrews | corvus: oh, i think you're right | 15:16 |
Shrews | i hope they weren't needed! :) | 15:16 |
clarkb | Shrews: we can always hold new ones | 15:17 |
corvus | our max servers line in grafana is lower than before | 15:17 |
Shrews | yup, i know. just sayin... | 15:17 |
corvus | max went from 1034 -> 934 | 15:18 |
clarkb | that may be ovh gra1 | 15:18 |
clarkb | we've been sort of manually editing it and if the kick.sh undid that we'd lose almost a hundred nodes | 15:18 |
* clarkb looks | 15:18 | |
corvus | oh there we go it's back now | 15:18 |
clarkb | nope gra1 is correct | 15:18 |
corvus | if you refresh, you can see it's back at 1034 after a step up | 15:19 |
clarkb | oh that must've been stats lag due to starting launchers one at a time? | 15:19 |
clarkb | so it stepped up for each launcher started | 15:19 |
*** quiquell is now known as quiquell|off | 15:21 | |
clarkb | the in-use number keeps going up according to grafana | 15:22 |
clarkb | zk stats continue to look happy | 15:22 |
*** smarcet has joined #openstack-infra | 15:22 | |
clarkb | zk network graph spiking but not crazy | 15:22 |
clarkb | corvus: Shrews thoughts on removing all of these hosts from the emergency file at this point? | 15:23 |
Shrews | i think nodepool is happy | 15:23 |
clarkb | `echo stat | nc localhost 2181` is the zk monitoring hack I learned if anyone else wants to look at zk too | 15:24 |
clarkb | the Mode: and Outstanding: fields are interesting. Mode tells you if the cluster is set up right (followers and leaders) and Outstanding shows you if you have a sync backlog I think | 15:25 |
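A small sketch of parsing that `stat` output. In practice you would capture it from the server with `echo stat | nc localhost 2181`; here a fabricated sample is parsed instead, and the sample values are made up.

```python
# Parse ZooKeeper's four-letter-word "stat" response into a dict so the
# interesting fields (Mode, Outstanding) can be checked programmatically.
def parse_stat(text):
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

sample = """\
Zookeeper version: 3.4.5
Latency min/avg/max: 0/1/45
Outstanding: 0
Mode: leader
Node count: 12345
"""
info = parse_stat(sample)
print(info["Mode"], info["Outstanding"])  # prints: leader 0
```

A Mode of "leader" or "follower" (rather than "standalone") confirms the cluster formed; a growing Outstanding count suggests a request backlog.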
*** tobberydberg has joined #openstack-infra | 15:25 | |
clarkb | If we are good removing the nodes from the emergency file I'll send #status notice The Zuul and Nodepool database transition is complete. | 15:27 |
corvus | wfm. reenqueue is still proceeding (busy mergers), but that's fine. | 15:27 |
clarkb | ok | 15:27 |
Shrews | i'm now going to find a fedex drop off point to send a laptop back to HQ | 15:29 |
clarkb | #status notice The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked. | 15:29 |
openstackstatus | clarkb: sending notice | 15:29 |
*** bobh has joined #openstack-infra | 15:29 | |
clarkb | emergency file is updated. I left the builders in it due to the need for an sdk release | 15:29 |
-openstackstatus- NOTICE: The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked. | 15:31 | |
clarkb | corvus: the zuul status page doesn't seem to want to load for me. Is that reconfigure induced slowness that we've seen before? I expect it is | 15:31 |
*** armax has joined #openstack-infra | 15:32 | |
corvus | clarkb: yes, i expect that to continue as long as the re-enqueue is happening -- the same gearman worker handles both things | 15:32 |
clarkb | corvus: roger | 15:32 |
corvus | clarkb: it should eventually load -- like, it shouldn't take more than a few minutes (you may need some amount of refreshing due to js stuff) | 15:32 |
openstackstatus | clarkb: finished sending notice | 15:33 |
clarkb | corvus: ya it did eventually reload | 15:33 |
corvus | re-enqueue is finished | 15:33 |
corvus | i think that's everything then | 15:33 |
clarkb | ya the only other outstanding item on my list is cleaning out the old images from the nodepool builders | 15:34 |
clarkb | that isn't urgent but I shall try to get around to it today while shrews is around | 15:34 |
clarkb | then I need to delete nodepool.o.o (say on monday?) | 15:34 |
clarkb | infra-root ^ if you have anything on nodepool.o.o you want to keep please grab it now :) | 15:34 |
clarkb | corvus: did you write down the zuul sha1 that is running? | 15:35 |
*** bobh has quit IRC | 15:35 | |
clarkb | corvus: probably want to grab that for a possible zuul reelase? | 15:35 |
corvus | clarkb: it's on the status page now | 15:35 |
corvus | (it's the scheduler sha) | 15:35 |
clarkb | oh nice | 15:35 |
clarkb | thank you everyone for helping with this. Also pabelanger did a bunch of the prep work a while back | 15:36 |
clarkb | sshnaidm|ruck: hey, I'm about to context switch to docker things. I did notice that https://review.openstack.org/608319 is failing pep8 in the gate due to a name not being valid | 15:38 |
*** ccamacho has joined #openstack-infra | 15:38 | |
clarkb | sshnaidm|ruck: can you point me to a specific job that you'd like to learn more about the docker setup for? | 15:38 |
clarkb | sshnaidm|ruck: I'd like to start with the logs to either confirm we log the important bits or if not, understand what is missing | 15:38 |
fungi | mtreinish: do you think any of the security fixes mentioned in recent releases at https://github.com/eclipse/mosquitto/blob/master/ChangeLog.txt are relevant to our occasional crashes? especially the cve-2017-7651 fix in 1.4.15 looks suspicious | 15:38 |
*** bobh has joined #openstack-infra | 15:39 | |
fungi | debian just backported a bunch of security fixes per https://security-tracker.debian.org/tracker/DSA-4325-1 https://security-tracker.debian.org/tracker/DLA-1409-1 https://security-tracker.debian.org/tracker/DLA-1334-1 | 15:40 |
fungi | in theory ubuntu ought to be able to import those updates | 15:40 |
sshnaidm|ruck | clarkb, I'd like to ensure that afs docker proxies really cache the image, we have this config: http://logs.openstack.org/87/610087/4/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/2e409e1/logs/undercloud/etc/docker/daemon.json.txt.gz | 15:44 |
sshnaidm|ruck | clarkb, I think it's enough to use docker proxy, right? But I can't really check where I download the image from in reality | 15:45 |
*** apetrich has quit IRC | 15:45 | |
clarkb | sshnaidm|ruck: in the job logs can you point me to where the images are fetched? we can then cross check against the mirror configured there | 15:45 |
clarkb | sshnaidm|ruck: the ability to check where you downloaded the image from would be a logging function of whatever you use to pull the image | 15:45 |
mordred | clarkb: it would require starting the docker daemon with debug logging I believe | 15:45 |
mordred | then there will be http trace entries in the logs that will indicate from where docker actually fetched things | 15:46 |
clarkb | sshnaidm|ruck: but we can check the mirror node logs too since I guess docker doesn't tell you by default | 15:46 |
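If you do turn on daemon debug logging as mordred suggests, the relevant `/etc/docker/daemon.json` would look roughly like this. The mirror URL is copied from the job config linked above; treat the exact shape as a sketch:

```json
{
  "registry-mirrors": ["http://mirror.bhs1.ovh.openstack.org:8081/registry-1.docker/"],
  "debug": true
}
```

After restarting the docker daemon, its log should include the HTTP requests it makes, showing whether pulls hit the mirror or fall through to docker.io.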
sshnaidm|ruck | clarkb, yeah, for example here: http://logs.openstack.org/87/610087/4/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/2e409e1/logs/undercloud/home/zuul/install-undercloud.log.txt.gz#_2018-10-25_03_07_03_975 | 15:46 |
sshnaidm|ruck | docker.io/tripleomaster/centos-binary-rsyslog-base in 2018-10-25 03:07:03.975 from http://mirror.bhs1.ovh.openstack.org:8081/registry-1.docker/ | 15:47 |
clarkb | sshnaidm|ruck: sha256:7810f63ac7ce7026eb5bcb308fd485fb7aa3224707bb2c57c24d2dedd7992cbb looks like the hash for that image right ? (all of the image serving is from hashes so I'll be grepping that in the logs) | 15:47 |
clarkb | oh that is specifically the centos base image. I'll check that one to start | 15:47 |
clarkb | sha256:a55bd98df50363f394ecbb21d19aade7e250590211dd64e83019f8b9cc5273ea looks like a layer for rsyslog specifically | 15:48 |
clarkb | neither sha256 is in the apache logs | 15:50 |
*** yamamoto has joined #openstack-infra | 15:50 | |
*** eernst has quit IRC | 15:51 | |
sshnaidm|ruck | clarkb, I see this, maybe it's it: "docker.io/tripleomaster/centos-binary-rsyslog-base@sha256:19ff38dcdc12a167bcf8dcbef4cb55247194b101d8fc1c4aff781ce73a794756" | 15:51 |
sshnaidm|ruck | and this: "Id": "sha256:5455eec0649474d22cd21dc3a08f9a80659973551c4c5ecbf675609926489c80" | 15:52 |
sshnaidm|ruck | so many shas | 15:52 |
mordred | sshnaidm|ruck: don't you know - shas make everything better :) | 15:53 |
clarkb | sshnaidm|ruck: sha256:5455eec0649474d22cd21dc3a08f9a80659973551c4c5ecbf675609926489c80 shows up a bunch in the logs as cache hits. But the previous one does not | 15:54 |
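The cross-check clarkb is doing here, grepping the mirror's apache logs for a layer digest, amounts to counting digest occurrences. A toy sketch with fabricated log lines (the real access log format differs):

```python
# Count how often an image layer digest shows up in mirror access logs.
# A non-zero count for the digest the client reports pulling is evidence
# the proxy cache actually served (or fetched) that layer.
import re

def count_digest_hits(log_lines, digest):
    pattern = re.compile(re.escape(digest))
    return sum(1 for line in log_lines if pattern.search(line))

log = [
    'GET /registry-1.docker/v2/blobs/sha256:5455eec0... HTTP/1.1" 200',
    'GET /registry-1.docker/v2/blobs/sha256:5455eec0... HTTP/1.1" 200',
    'GET /registry-1.docker/v2/blobs/sha256:deadbeef... HTTP/1.1" 200',
]
print(count_digest_hits(log, "sha256:5455eec0"))  # prints: 2
```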
sshnaidm|ruck | clarkb, ok, let's hope it is :D | 15:54 |
sshnaidm|ruck | clarkb, thanks! | 15:54 |
*** sshnaidm|ruck is now known as sshnaidm|bbl | 15:54 | |
fungi | the system cpu spike on the etherpad server has died back off. i wonder if it corresponded to the board call which also just wrapped up? | 15:55 |
clarkb | fungi: that would be an unfortunate regression if using the service made it slow :) | 15:56 |
clarkb | fungi: but that should be testable at least | 15:56 |
fungi | especially concerning since there weren't _that_ many people using that particular pad | 15:56 |
smcginnis | If we need a few folks to all hit an etherpad around the same time, I can help. | 15:56 |
clarkb | sshnaidm|bbl: as far as I can tell given that id it should be doing what we expect. If you want to be double sure adding the extra docker logging that mordred pointed out is probably worthwhile | 15:56 |
clarkb | we keep using steadily less quota in bhs1 | 15:57 |
clarkb | I wonder if the port cleanups failed there? | 15:57 |
*** derekh has quit IRC | 15:58 | |
clarkb | #status log Zuul and Nodepool running against the new three node zookeeper cluster at zk01 + zk02 + zk03 .openstack.org. Old server at nodepool.openstack.org will be deleted in the near future | 15:59 |
openstackstatus | clarkb: finished logging | 15:59 |
*** e0ne has quit IRC | 15:59 | |
clarkb | also inap doesn't look happy. I'm going to start with inap since bhs1 is still mostly working | 16:00 |
clarkb | the inap errors appear to be timeouts, possibly related to our switch to new images? | 16:01 |
mgagne | clarkb: we redeployed a new minor version of Nova. Didn't expect that much impact. Is there anything I can look at for now? | 16:01 |
mgagne | new packages were promoted ~55m ago | 16:02 |
*** carl_cai has quit IRC | 16:02 | |
clarkb | mgagne: from our side it looks like we may have timed out some boots because we transitioned to new images globally. Then those boots are now timing out trying to delete | 16:02 |
clarkb | mgagne: I can get you uuids in just a momment | 16:02 |
mgagne | clarkb: I never remember what I do to fix those issues :-/ | 16:03 |
clarkb | e064b5bf-dfca-48aa-8b02-b3da37509688 bdf1a01f-1e95-47ec-8e72-827d0180140a 637d76be-930a-4ea2-b145-96c8501d03f4 | 16:03 |
clarkb | are three examples | 16:03 |
mgagne | checking | 16:03 |
clarkb | mgagne: thank you! | 16:04 |
mgagne | I think it was restarting nova-compute? | 16:04 |
clarkb | mgagne: ya that sounds familiar | 16:04 |
mgagne | clarkb: ok, one is now in error, I think it's now in a state where nodepool can retry its delete and it will work | 16:05 |
clarkb | mgagne: great, nodepool should do that automatically | 16:06 |
mgagne | I guess I will restart the whole region in that case | 16:06 |
clarkb | I don't know enough about your cloud to advise one way or the other. But your help is greatly appreciated :) | 16:07 |
mgagne | hehe | 16:07 |
clarkb | openstack.exceptions.SDKException: Error in creating the server. Compute service reports fault: No valid host was found. There are not enough hosts available. is the ovh bhs1 usage reduction cause | 16:11 |
clarkb | amorin: dpawlik ^ fyi if you happen to be around (this may be some side effect of us restarting zuul which creates a rush of demand) | 16:12 |
clarkb | I wonder if the next nodepool feature is going to be a launch throttle | 16:13 |
*** smarcet has quit IRC | 16:14 | |
*** ginopc has quit IRC | 16:14 | |
*** bhavikdbavishi has joined #openstack-infra | 16:15 | |
*** ccamacho has quit IRC | 16:15 | |
*** gyee has joined #openstack-infra | 16:16 | |
clarkb | other than those two cloud related issues (which could theoretically also be related to new openstacksdk?) we appear to be quite stable | 16:16 |
clarkb | zookeeper seems to be keeping up with the demand in its new 3 node configuration as well | 16:16 |
*** weshay has joined #openstack-infra | 16:18 | |
clarkb | mgagne: I see the deleting count falling in inap | 16:19 |
mgagne | yea, instances are now in ERROR state. | 16:19 |
*** emine__ has joined #openstack-infra | 16:21 | |
*** dtantsur is now known as dtantsur|afk | 16:22 | |
*** Emine has quit IRC | 16:24 | |
*** jhesketh has joined #openstack-infra | 16:25 | |
*** shardy has quit IRC | 16:25 | |
*** mnaser has quit IRC | 16:26 | |
*** yamamoto has quit IRC | 16:26 | |
*** yamamoto has joined #openstack-infra | 16:26 | |
*** mnaser has joined #openstack-infra | 16:26 | |
*** jhesketh_ has quit IRC | 16:27 | |
*** jpich has quit IRC | 16:28 | |
clarkb | for the bhs1 thing we have 116 instances according to quota but only ~68 instances according to server list | 16:30 |
clarkb | I think quota may have gotten out of sync there and so the hypervisors think they are used (and possibly are) | 16:31 |
*** trown is now known as trown|lunch | 16:32 | |
fungi | clarkb: amorin said yesterday that was a known issue in bhs1 i think? they're still working on trying to get gra1 back in okay shape | 16:32 |
*** smarcet has joined #openstack-infra | 16:32 | |
clarkb | gotcha | 16:32 |
*** panda is now known as panda|off | 16:38 | |
openstackgerrit | Hervé Beraud proposed openstack/gertty master: Introduce security checks with bandit and fix it https://review.openstack.org/613371 | 16:38 |
clarkb | we have 1 node available in inap. I think we may be turning the corner there. | 16:39 |
mgagne | clarkb: most are stuck in building right? | 16:40 |
clarkb | mgagne: ya I think that is due to new images? | 16:40 |
mgagne | yea | 16:40 |
mgagne | just making sure there is no other issue I can fix | 16:41 |
clarkb | mgagne: we should know in another 10-15 minutes. | 16:41 |
clarkb | 2 available now. Makes me think in 10-15 minutes we'll be operating normally | 16:41 |
mgagne | +1 | 16:41 |
*** fuentess has joined #openstack-infra | 16:42 | |
*** gfidente has quit IRC | 16:42 | |
*** imacdonn has quit IRC | 16:42 | |
*** imacdonn has joined #openstack-infra | 16:43 | |
ssbarnea|bkp2 | fungi clarkb: question regarding basepython=python3 : please read https://github.com/tox-dev/tox/issues/1072 -- I am curious how openstack plans to cover this aspect. | 16:44 |
clarkb | ssbarnea|bkp2: you have to set basepython on the docs and linting to python3 then let py35/py36 etc do the right thing | 16:47 |
ssbarnea|bkp2 | centos-7 concerns me because on it python3 -> python3.4, which was dropped by ansible, see https://github.com/ansible/ansible/blob/devel/setup.py#L247 | 16:48 |
fungi | clarkb: ssbarnea|bkp2: where's the context? you likely also need to set ignore_basepython_conflict = True | 16:48 |
clarkb | fungi: the context is centos7 using python3.4 I guess? | 16:49 |
ssbarnea|bkp2 | clarkb yeah, this is where I saw the failure to install ansible on python3, because it was incompatible. | 16:49 |
clarkb | fwiw that seems more like a distro problem | 16:49 |
fungi | otherwise setting basepython = python3 if python3 is 3.4 will result in the implicit py35 and py36 testenvs using 3.4 | 16:49 |
clarkb | not a tox problem | 16:49 |
clarkb | fungi: aha | 16:49 |
clarkb | ssbarnea|bkp2: this is why we carefully run things on a variety of distros to make sure that their python versions line up with what we expect | 16:50 |
clarkb | (it can be painful at times, but does work) | 16:50 |
fungi | https://github.com/tox-dev/tox/issues/477 | 16:51 |
ssbarnea|bkp2 | for testing purposes it is a PITA; not on my main (macos) machine where I have the freedom to juggle them, but if you want to run just "tox" across multiple platforms, you soon realize that conflict. | 16:51 |
clarkb | ssbarnea|bkp2: but then setting basepython to python3 means we can explicitly run the docs job on xenial to get 3.5, on bionic to get 3.6. Then when the next release comes out we don't have to update tox.ini just add the job to run on that distro release | 16:51 |
fungi | fixed by https://github.com/tox-dev/tox/pull/841 | 16:52 |
clarkb | ssbarnea|bkp2: right we don't support tox on multiple platforms generally. We support specific versions of python informed by what is on distros (which we use to test) then you have to get the right version of python | 16:52 |
fungi | stephenfin did excellent work there | 16:52 |
clarkb | mgagne: up to 5 available now. Slow going but trending the right direction | 16:53 |
ssbarnea|bkp2 | clarkb: just to be clear: I am not trying to say that setting it to python3 is bad. i am going to test the ignore_basepython_conflict | 16:53 |
clarkb | ssbarnea|bkp2: ya. I'm just trying to point out that just because an openstack tox.ini says python3 doesn't mean it will work with any python3. We do that for convenience to avoid needing to update tox.ini frequently. You still need a valid python3 version | 16:54 |
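A hedged tox.ini sketch of the pattern discussed (env names and commands are illustrative). `ignore_basepython_conflict`, added in tox 3.1 by the fix linked above, lets a blanket `basepython = python3` coexist with versioned envs like py35/py36:

```ini
[tox]
envlist = pep8,docs,py35,py36
# tox >= 3.1: env names like py35/py36 keep their implied interpreter even
# though [testenv] below sets basepython = python3
ignore_basepython_conflict = true

[testenv]
basepython = python3

[testenv:pep8]
commands = flake8 {posargs}

[testenv:docs]
commands = sphinx-build -b html doc/source doc/build/html
```

With this, `tox -e docs` uses whatever python3 the platform provides, while `tox -e py36` still insists on python3.6.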
*** lpetrut has quit IRC | 16:54 | |
*** lpetrut has joined #openstack-infra | 16:55 | |
ssbarnea|bkp2 | clarkb: we are not in conflict here :D | 16:55 |
ssbarnea|bkp2 | now I only need to explain to others that we still have to use basepython for some tasks, like https://review.openstack.org/#/c/613083/2/tox.ini | 16:58 |
*** jpena is now known as jpena|off | 16:58 | |
*** xek has quit IRC | 16:58 | |
*** xek has joined #openstack-infra | 16:59 | |
*** chandankumar is now known as chkumar|off | 17:02 | |
*** ccamacho has joined #openstack-infra | 17:02 | |
*** pcaruana has joined #openstack-infra | 17:03 | |
ssbarnea|bkp2 | now I have a cosmetic question about zuul html output not wrapping at screen width; doing horizontal scrolling in the browser sucks. Is this by design, or a known bug? | 17:06 |
*** jamesmcarthur has quit IRC | 17:06 | |
clarkb | ssbarnea|bkp2: at least on mobile it does one column without horizontal scrolling. I also don't have horizontal scrolling on current browser /me tries resizing | 17:07 |
*** jamesmcarthur has joined #openstack-infra | 17:07 | |
clarkb | ssbarnea|bkp2: it seems to resize without doing horizontal scrolling on firefox for me | 17:07 |
ssbarnea|bkp2 | clarkb firefox on http://logs.openstack.org/83/613083/2/check/openstack-tox-linters/10deb24/job-output.txt.gz -- desktop | 17:07 |
clarkb | oh the job logs not the zuul status web page | 17:08 |
ssbarnea|bkp2 | to be exact http://logs.openstack.org/83/613083/2/check/openstack-tox-linters/10deb24/job-output.txt.gz#_2018-10-24_21_57_28_549089 | 17:08 |
clarkb | ssbarnea|bkp2: my personal opinion on that is that is desired. It is a txt file not an html file | 17:08 |
clarkb | it is the raw output | 17:08 |
*** jamesmcarthur has quit IRC | 17:08 | |
clarkb | if we want something to render that differently we should do that on top of the raw data | 17:08 |
ssbarnea|bkp2 | i think it has wrapping only on spaces which prevents the wrapping from occurring. | 17:09 |
clarkb | it will be however firefox wraps text file lines | 17:09 |
clarkb | (might even be configurable?) | 17:09 |
ssbarnea|bkp2 | clarkb i am sure css can change behavior but i wanted to know if this was desired or fixable :D | 17:09 |
clarkb | I think we want to make the raw data available, but if we also render it nicely for people with browsers that is good too | 17:10 |
clarkb | the reason I say that is some log files are massive (hundreds of meg) and i have to view them with vim locally | 17:10 |
clarkb | we also index the raw data in elasticsearch so you want to be able to support use cases like that | 17:10 |
ssbarnea|bkp2 | this is 100% css issue, i do not expect the lines to be wrapped server side. | 17:11 |
ssbarnea|bkp2 | as you said: they should be as close as possible to raw | 17:11 |
clarkb | ssbarnea|bkp2: except there is no css in txt files | 17:11 |
*** eharney has quit IRC | 17:11 | |
clarkb | oh except that os loganalyze is sending some. I understand now | 17:12 |
clarkb | so ya you could update os-loganalyze to change the html rendering. Sorry I've been using vim a lot lately because too many large files. | 17:12 |
clarkb | os-loganalyze will serve the raw data if you don't set accept encoding to html | 17:13 |
clarkb | or accept-type? whatever the header is | 17:13 |
fungi | infra-root: just a heads up, i have to disappear for a few hours to deal with insurance company stuff in person, but will be back on later today | 17:13 |
clarkb | fungi: gl | 17:13 |
ssbarnea|bkp2 | I found the fix, is missing: word-break: break-all; | 17:13 |
ssbarnea|bkp2 | now i only have to find the place to add that code. | 17:14 |
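The fix being described would be a small stylesheet change along these lines. The selector is illustrative; the real change would target the `<pre>` output os-loganalyze renders:

```css
/* Force long tokens without spaces (paths, shas, urls) to wrap at the
   viewport edge instead of triggering horizontal scrolling. */
pre {
  white-space: pre-wrap;
  word-break: break-all;
}
```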
fungi | clarkb: supposedly they'll be handing me a briefcase full of unmarked bills at the end, so totally worth it (okay, really just a paper check, but regardless...) | 17:14 |
clarkb | fungi: ah you are past the point of arguing over what was insured then :) | 17:14 |
fungi | yep! | 17:14 |
clarkb | ssbarnea|bkp2: look in openstack-infra/os-loganalyze | 17:15 |
fungi | well, except for the wind damage claim which we haven't finished yet. but flood and care are done | 17:15 |
*** sshnaidm|bbl is now known as sshnaidm|off | 17:15 | |
fungi | er, flood and car are done | 17:15 |
* fungi vanishes in a puff of errands | 17:15 | |
*** apetrich has joined #openstack-infra | 17:17 | |
*** apetrich has quit IRC | 17:17 | |
*** apetrich has joined #openstack-infra | 17:18 | |
openstackgerrit | Sorin Sbarnea proposed openstack-infra/os-loganalyze master: Assures that wrapping on PRE occurs on any kind of characters https://review.openstack.org/613383 | 17:18 |
ssbarnea|bkp2 | this reminded me that i hate the timestamp column, too much screen real estate taken by it. I would personally prefer to transform it into a line-number and have the time value as tooltip. | 17:20 |
clarkb | ssbarnea|bkp2: I find the timestamps to be invaluable | 17:20 |
ssbarnea|bkp2 | but obviously that I would need support for such change. | 17:20 |
*** smarcet has quit IRC | 17:21 | |
ssbarnea|bkp2 | it is valuable, but not sure if it needs to be visible by default and all the time. maybe expandable or something similar. | 17:21 |
clarkb | ssbarnea|bkp2: I think if you want to do something like that then we want a render layer that allows you to toggle things like that. I don't think we should remove that from the raw txt | 17:22 |
clarkb | it is really useful to understand when things happen in a distributed system | 17:22 |
ssbarnea|bkp2 | clarkb: sure I was referring to the display layer | 17:22 |
clarkb | to the point where it is the one requirement I push on people to use the elasticsearch/logstash system | 17:22 |
clarkb | ssbarnea|bkp2: I also think it is important because it helps remind people that their jobs have a time cost | 17:23 |
clarkb | that time cost impacts everyone else's ability to use those test resources | 17:23 |
ssbarnea|bkp2 | 50% of timestamp is spam = first and last. year and month are useless as we don't even keep logs for so long, and sub second divisions ... | 17:24 |
clarkb | sub second is very useful. The year may not be necessary. But the rest of it is I think | 17:24 |
clarkb | we want to keep logs for ~6 months again which is why the swift work is happening | 17:24 |
ssbarnea|bkp2 | another thing that I could fix with css alone, almost for sure. | 17:25 |
*** diablo_rojo has joined #openstack-infra | 17:26 | |
*** eharney has joined #openstack-infra | 17:26 | |
*** lpetrut has quit IRC | 17:28 | |
clarkb | ya I think we can fiddle with overlay type stuff to make it toggleable to user preference, but I also think being clear about how long jobs are taking and how long specific job tasks take is important particularly when we run behind the curve with constricted resources | 17:29 |
clarkb | otherwise as soon as I ask someone to make their jobs run faster the response will be but I can't tell where the time is spent | 17:29 |
ssbarnea|bkp2 | i am building now a proposal, and I will show it to you. | 17:30 |
clarkb | mgagne: hrm it seems to have gone back to unhappy deleting nodes again | 17:30 |
ssbarnea|bkp2 | i got the idea, i will try to cover all use cases | 17:30 |
*** eharney has quit IRC | 17:35 | |
clarkb | mgagne: I'm going to find lunch/breakfast soon but let me know if I can help with any debugging | 17:36 |
hogepodge | I'm looking through some Loci code, and there are notes saying things like "Remove this when infra starts signing thier mirrors" for the apt repositories. | 17:40 |
hogepodge | Just curious, is this something that infra is now doing or plans on doing? | 17:40 |
clarkb | hogepodge: it is not something we are doing now, and I know of no plan to do so. The problem there is apt repo updates are race prone and can lead to broken repos/clients. What happens is you can have packages removed from disk that are still in the index, then your clients fail to install the package. The other failure mode is you update the index on a client, then the package is removed from the repo | 17:41 |
clarkb | hogepodge: to address this we use reprepro to build a new valid index based on what is on disk ( and tell it to not clean up old packages for some hours ). Unfortunately this means the indexes we produce are different than those from upstream and so the upstream keys aren't valid | 17:42 |
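For readers unfamiliar with reprepro, the workflow clarkb describes roughly corresponds to a config like the following. This is a hypothetical sketch, not the actual openstack-infra setup: the codename, components, paths, and placeholder key are illustrative only.

```
# conf/distributions -- the locally re-indexed mirror being served
Codename: xenial
Architectures: amd64
Components: main universe
Update: upstream-xenial

# conf/updates -- where packages are pulled from
Name: upstream-xenial
Method: http://archive.ubuntu.com/ubuntu
Suite: xenial
Components: main universe
Architectures: amd64
VerifyRelease: <upstream-archive-key-id>
```

Running `reprepro update` then regenerates the Packages/Release indexes locally from the on-disk pool, which is why the upstream signatures no longer apply; delaying removal of superseded packages (e.g. via reprepro's option to keep unreferenced files) is what gives clients those extra hours of grace.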
hogepodge | Ok, thanks. I'm thinking I'm going to make that bit configurable so we're doing secure by default, but do insecure in the gate | 17:42 |
clarkb | hogepodge: We could sign our repos and you could trust the keys, but we also want to avoid people treating those repos as consumable outside of testing | 17:42 |
clarkb | or if someone can figure out a way to use the upstream signed indexes and mirror them without breaking clients we'd probably do that | 17:43 |
hogepodge | No, I'm just going through our notes and trying to get TODOs out of code. | 17:43 |
hogepodge | It's not a strong requirement, just wanted to see if the note reflected reality, and it kind of doesn't. :-) We don't require it to be signed. | 17:43 |
hogepodge | But I can imagine a downstream user not wanting to trust unsigned repos for producing production packages. | 17:44 |
hogepodge | In the gate, it's not critical ¯\_(ツ)_/¯ | 17:44 |
hogepodge | thanks clarkb | 17:45 |
*** smarcet has joined #openstack-infra | 17:46 | |
clarkb | ok food is here. I'm out for a bit to eat | 17:47 |
*** eharney has joined #openstack-infra | 17:49 | |
mgagne | checking | 17:52 |
*** felipemonteiro has quit IRC | 17:52 | |
*** jamesmcarthur has joined #openstack-infra | 17:54 | |
*** trown|lunch is now known as trown | 17:54 | |
openstackgerrit | Aakarsh proposed openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository https://review.openstack.org/613092 | 17:54 |
*** betherly has joined #openstack-infra | 17:55 | |
*** jamesmcarthur has quit IRC | 17:58 | |
*** zzzeek_ has joined #openstack-infra | 17:59 | |
*** betherly has quit IRC | 17:59 | |
*** eumel8 has joined #openstack-infra | 18:04 | |
*** apetrich has quit IRC | 18:07 | |
*** tung_comnets has joined #openstack-infra | 18:08 | |
tung_comnets | Can someone give one more +2 to this patch: https://review.openstack.org/#/c/612962/ | 18:10 |
tung_comnets | Thanks :) | 18:10 |
*** jamesmcarthur has joined #openstack-infra | 18:10 | |
openstackgerrit | Aakarsh proposed openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository https://review.openstack.org/613092 | 18:11 |
*** jamesmcarthur has quit IRC | 18:13 | |
*** apetrich has joined #openstack-infra | 18:20 | |
*** jamesmcarthur has joined #openstack-infra | 18:23 | |
openstackgerrit | Pete Birley proposed openstack-infra/project-config master: New Repo - OpenStack-Helm Images https://review.openstack.org/611892 | 18:28 |
mgagne | clarkb: so I'm not sure what to do next. centos image looks fine, maybe because there aren't many instances based on it. but xenial is having a hard time. | 18:29 |
*** bnemec has quit IRC | 18:29 | |
openstackgerrit | Pete Birley proposed openstack-infra/project-config master: New Repo: OpenStack-Helm Docs https://review.openstack.org/611893 | 18:30 |
*** felipemonteiro has joined #openstack-infra | 18:30 | |
*** electrofelix has quit IRC | 18:31 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 18:32 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator https://review.openstack.org/604648 | 18:35 |
*** munimeha1 has quit IRC | 18:37 | |
*** bhavikdbavishi has quit IRC | 18:37 | |
*** felipemonteiro has quit IRC | 18:37 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests https://review.openstack.org/613117 | 18:38 |
*** bobh has quit IRC | 18:40 | |
clarkb | mgagne: is it just timing out? | 18:45 |
clarkb | mgagne: maybe we should be patient with it and see if the caching is able to get in place? | 18:47 |
mgagne | could be it, some are active now | 18:47 |
clarkb | normally we don't rotate all images at once so this shouldn't be a common thing | 18:47 |
clarkb | (we did it in this case because it was easier when we moved zookeeper clusters to not migrate the data) | 18:47 |
clarkb | Shrews: I'm going to look at nb01 now | 18:48 |
clarkb | for cleaning up old images | 18:48 |
Shrews | k k | 18:48 |
clarkb | I'm just going to delete the stuff in /opt/nodepool_dib that is old | 18:49 |
clarkb | Shrews: then after that we need to delete the images on the cloud side | 18:51 |
clarkb | mordred: re ^ if you get a chance could you look at rax swift and glance to see if those are all cleaned up properly? I worry the sdk issue caused extra weirdness there | 18:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: quick-start: add a note about github https://review.openstack.org/613398 | 19:00 |
clarkb | #status log Old dib images cleared out of /opt/nodepool_dib on nb01, nb02, and nb03. Need to remove them from cloud providers next. | 19:02 |
openstackstatus | clarkb: finished logging | 19:02 |
clarkb | I'm going to start looking at image cleanup in $clouds | 19:02 |
Shrews | clarkb: finishing up a required training thing, but can assist when i'm done (if you haven't finished by then) | 19:05 |
clarkb | Shrews: ok. | 19:06 |
mordred | clarkb: heya - once the images are imported the swift objects that are created are no longer needed - so we can clean any of those out - I can do a cleanup pass tomorrow | 19:06 |
clarkb | One thing I've noticed is that in bhs1 some images can't be deleted because they have snapshots. Odd | 19:07 |
clarkb | these are images older than the ones I expected to need to clean. The images I expected to clean appear to delete ok | 19:07 |
clarkb | mordred: ya, I just have no idea if that was working when we were having that error happen | 19:07 |
clarkb | I want to say we don't catch the exception in the image create path and so it may not happen automatically | 19:07 |
*** hasharAway is now known as hashar | 19:07 | |
*** ykarel|away has quit IRC | 19:11 | |
*** smarcet has quit IRC | 19:12 | |
*** rlandy is now known as rlandy|brb | 19:13 | |
*** bobh has joined #openstack-infra | 19:15 | |
clarkb | BHS1 is done, except for all the images that can't be deleted because they have snapshots (I expect that is something cloud side we should look into later) | 19:17 |
clarkb | I don't think we made any snapshots ourselves | 19:17 |
mordred | clarkb: yeah - that's weird, I can't think of any reason we'd make snapshots of images | 19:17 |
*** bobh has quit IRC | 19:20 | |
AJaeger | config-core, could you review these two changes, please? https://review.openstack.org/612820 and https://review.openstack.org/612962 | 19:23 |
clarkb | GRA1 list of images to delete is running now | 19:25 |
clarkb | it seems to be failing less with snapshots than BHS1 | 19:26 |
*** jcoufal_ has joined #openstack-infra | 19:26 | |
*** jcoufal has quit IRC | 19:27 | |
*** bobh has joined #openstack-infra | 19:31 | |
*** rlandy|brb is now known as rlandy | 19:31 | |
clarkb | gra1 is done now too. Going to do inap next | 19:40 |
*** lbragstad has quit IRC | 19:43 | |
*** lbragstad has joined #openstack-infra | 19:43 | |
clarkb | inap is going to take a while. I may run a few of these in parallel. I'll look at vexxhost sjc1 next | 19:50 |
clarkb | what I'm doing is an openstack image list --private, trimming out any images we want to keep and putting that in a file, then doing a for loop catting that file and running openstack image delete | 19:50 |
clarkb | it's not very elegant, but I'm finding there are just enough new corner cases in each cloud that trying to automate it would take all day | 19:51 |
clarkb | like for some reason there are cloud specific private images we didn't upload | 19:51 |
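The manual cleanup clarkb describes can be sketched as a small shell loop. The image names below are made-up stand-ins; the real list would come from `openstack image list --private` and then be hand-edited to drop images that must be kept.

```shell
#!/bin/sh
# Step 1 (real workflow, commented out here): dump private image names,
# then hand-edit the file to remove anything that must be kept:
#   openstack image list --private -f value -c Name > images-to-delete.txt
# We fake that step with two illustrative names:
printf 'ubuntu-xenial-0000000001\ncentos-7-0000000001\n' > images-to-delete.txt

# Step 2: delete each remaining image serially, one at a time.
# DRY_RUN=1 (the default here) only prints what would happen;
# set DRY_RUN=0 to actually call the openstack CLI.
DRY_RUN=${DRY_RUN:-1}
while read -r name; do
    if [ "$DRY_RUN" = "1" ]; then
        echo "would delete: $name"
    else
        openstack image delete "$name"
    fi
done < images-to-delete.txt
```

Running serially is deliberate: as clarkb notes later, it keeps the load on the provider low while still draining the leaked images.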
*** irclogbot_1 has joined #openstack-infra | 20:01 | |
*** mriedem has joined #openstack-infra | 20:03 | |
*** kgiusti has left #openstack-infra | 20:03 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects https://review.openstack.org/613143 | 20:09 |
*** smarcet has joined #openstack-infra | 20:16 | |
mordred | clarkb: that might be osc not having full support for the new shared state? | 20:16 |
clarkb | mordred: oh maybe | 20:16 |
clarkb | I'm through all clouds but rax, packethost, citycloud (I think we upload but don't boot there), and inap | 20:18 |
clarkb | inap is in progress | 20:18 |
clarkb | the arm clouds were nice and tidy. Only the two we leaked by changing DBs had to be deleted looks like | 20:18 |
* clarkb does packethost next | 20:18 | |
mgagne | clarkb: I think there is a network bottleneck somewhere. But there is little I can do except trying to tell our netadmin that it's "normal". | 20:18 |
*** jcoufal_ has quit IRC | 20:19 | |
clarkb | mgagne: what's odd is it didn't do that before. So either our start-it-all-at-once shock to the system or your update (or some other change?) must've changed the behavior? | 20:19 |
clarkb | mgagne: I'm happy to help however we can | 20:19 |
mgagne | clarkb: the package contained unrelated changes to some management tools. | 20:20 |
clarkb | ah | 20:20 |
mgagne | maybe the network gear is much more overloaded than last time all images were updated. | 20:20 |
*** jcoufal has joined #openstack-infra | 20:21 | |
*** irclogbot_1 has quit IRC | 20:22 | |
clarkb | possible | 20:23 |
mgagne | ;) | 20:26 |
*** smarcet has quit IRC | 20:27 | |
*** hashar has quit IRC | 20:29 | |
clarkb | mordred: on the glance side of things we did seem to leak a bunch of images | 20:31 |
*** imacdonn has quit IRC | 20:31 | |
clarkb | mordred: I'm going to go ahead and delete all but the ones we are using now since it should be safe to cleanup swift later by just clearing things out | 20:32 |
clarkb | starting with rax-iad | 20:32 |
mordred | clarkb: ++ | 20:32 |
mgagne | clarkb: we would need to put a hold on mtl01, this is affecting some other critical systems. | 20:33 |
clarkb | mgagne: ok, if you write the change I'll go ahead and put it in place manually | 20:34 |
clarkb | mgagne: max-servers: 0? (or I can write the change too) | 20:34 |
mgagne | yes, 0 please | 20:34 |
*** anteaya has joined #openstack-infra | 20:34 | |
clarkb | ok I've put that in place manually and will make sure puppet doesn't undo it while we wait for the change to merge | 20:35 |
mgagne | thanks | 20:36 |
openstackgerrit | Mathieu Gagné proposed openstack-infra/project-config master: Disable inap-mtl01 provider https://review.openstack.org/613418 | 20:36 |
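The change above sets the provider's max-servers to zero. In nodepool's YAML configuration that looks roughly like the following; the pool name and surrounding fields are illustrative, not the actual project-config contents.

```yaml
providers:
  - name: inap-mtl01
    # ...cloud, image, and network settings unchanged...
    pools:
      - name: main
        max-servers: 0  # was a positive value; 0 stops new node launches
```

Setting max-servers to 0 disables new launches without removing the provider, which is why clarkb can also apply it manually on the nodepool host while the review merges.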
clarkb | mgagne: I also have an out of band image cleanup running against inap. Should I stop that too? It is running openstack image delete serially one after another to cleanup some images that we leaked (some were stuck in saving and others are from us changing DBs) | 20:38 |
clarkb | (I don't expect this is doing much to your cloud since it is running one at a time serially and cleaning things up, but happy to stop it too if you think it will help) | 20:38 |
mgagne | clarkb: I don't think this will affect the network performance as this shouldn't pull much bandwidth | 20:38 |
*** ansmith_ has quit IRC | 20:39 | |
clarkb | ya I don't expect it would cause that | 20:41 |
*** xek has quit IRC | 20:42 | |
*** jcoufal has quit IRC | 20:54 | |
*** betherly has joined #openstack-infra | 20:56 | |
clarkb | rax-iad is done. Now on to ord | 20:57 |
*** betherly has quit IRC | 21:01 | |
*** larainema has quit IRC | 21:02 | |
clarkb | heh I've been deleting by name. A few of the delete failures in rax were due to unique names. I'll make a second pass on iad and ord | 21:13 |
clarkb | Shrews: ^ is that a nodepool bug? I wouldn't expect us to reuse a name | 21:14 |
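The failure clarkb hit is that image names are not guaranteed unique, while IDs are. A sketch of selecting by name but deleting by ID; `list_private_images` is a stand-in for `openstack image list --private -f value -c ID -c Name`, and all IDs and names are made up.

```shell
#!/bin/sh
# Fake listing: "<id> <name>" pairs, standing in for the openstack CLI.
list_private_images() {
    echo "11111111-aaaa ubuntu-xenial-1540000000"
    echo "22222222-bbbb ubuntu-xenial-1540000000"   # duplicate name!
    echo "33333333-cccc centos-7-1540000000"
}

# Names can collide, IDs never do: match on the name but delete the ID.
list_private_images | awk '/ubuntu-xenial/ {print $1}' | while read -r id; do
    echo "openstack image delete $id"   # drop the echo to really delete
done
```

A second pass by ID, as clarkb does for iad and ord, catches the leftovers that a by-name delete either skipped or refused as ambiguous.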
*** fuentess has quit IRC | 21:14 | |
*** irclogbot_1 has joined #openstack-infra | 21:15 | |
*** trown is now known as trown|outtypewww | 21:15 | |
*** bobh has quit IRC | 21:16 | |
*** betherly has joined #openstack-infra | 21:16 | |
*** ldnunes has quit IRC | 21:18 | |
*** betherly has quit IRC | 21:21 | |
*** kjackal_v2 has quit IRC | 21:22 | |
*** kjackal has joined #openstack-infra | 21:22 | |
*** tung_comnets has quit IRC | 21:28 | |
*** kjackal has quit IRC | 21:36 | |
*** jamesmcarthur has quit IRC | 21:37 | |
*** betherly has joined #openstack-infra | 21:37 | |
*** betherly has quit IRC | 21:42 | |
*** kopecmartin is now known as kopecmartin|off | 21:43 | |
*** efried is now known as pot | 21:43 | |
*** pot is now known as efried | 21:43 | |
*** jamesmcarthur has joined #openstack-infra | 21:44 | |
clarkb | ok doing rax-dfw now then I think I am done | 21:46 |
*** jamesmcarthur has quit IRC | 21:48 | |
*** bobh has joined #openstack-infra | 21:51 | |
*** bobh has quit IRC | 21:56 | |
*** betherly has joined #openstack-infra | 21:58 | |
*** betherly has quit IRC | 22:02 | |
*** armax has quit IRC | 22:03 | |
*** armax has joined #openstack-infra | 22:03 | |
*** boden has quit IRC | 22:13 | |
*** betherly has joined #openstack-infra | 22:18 | |
*** gema has quit IRC | 22:18 | |
*** mriedem has quit IRC | 22:21 | |
*** betherly has quit IRC | 22:23 | |
*** emine__ has quit IRC | 22:24 | |
openstackgerrit | Merged openstack-infra/project-config master: New Airship project - Utils https://review.openstack.org/612820 | 22:25 |
ianw | clarkb: does the drop in http://grafana.openstack.org/d/8wFIHcSiz/nodepool-rackspace?panelId=15&fullscreen&orgId=1&from=now-7d&to=now correlate about when something nodepoolish was restarted? | 22:25 |
openstackgerrit | Merged openstack-infra/project-config master: Add release tag and remove python jobs for Apmec https://review.openstack.org/612962 | 22:27 |
clarkb | ianw: yes | 22:28 |
clarkb | ianw: I sent the notice we were taking zuul down about 14:38UTC and we were done about an hour later | 22:29 |
ianw | clarkb: hrm, well i guess i have something to look into now then :) | 22:30 |
clarkb | ianw: part of the rename in stats that you did? | 22:31 |
ianw | clarkb: this was certainly not an intended result of that, but yeah, it's the suspect | 22:32 |
ianw | hrm, although these stats are coming from openstacksdk via ... magic ... i wonder if this task thread etc has changed things | 22:33 |
clarkb | #status log Old nodepool images cleared out of cloud providers as part of the post ZK db transition cleanup. | 22:33 |
openstackstatus | clarkb: finished logging | 22:33 |
clarkb | ianw: possible, openstacksdk did update | 22:33 |
clarkb | ianw: also I've disabled inap at the request of mgagne | 22:35 |
clarkb | ovh is looking happy | 22:35 |
clarkb | seems something to do with asking inap to use a bunch of new images all at once made networking there sad | 22:36 |
ianw | ahh, i wondered what that drop was. yeah, occasionally we cleanup a few ports on ovh, but not much | 22:36 |
mgagne | clarkb: we will perform some tests tomorrow to see how we can improve the network performance for mtl01. For now, it should stay disabled. | 22:36 |
clarkb | mgagne: ok, if it would help we can also turn it back on with a lower max-servers value to reduce thrashing but still induce the behavior if you need it | 22:37 |
clarkb | say to 5 or 10. Not sure if that is desirable on your end | 22:37 |
ianw | like in the last 2 hours on ovh gra1, we found 1 DOWN port that had been sitting around for 3+ minutes | 22:37 |
clarkb | ianw: not bad | 22:37 |
mgagne | clarkb: we need to avoid sending traffic to a specific piece of network hardware. so we will test on our end first and enable it back when we are sure the problem is mitigated. | 22:38 |
clarkb | mgagne: roger | 22:38 |
ianw | amorin: we might be at a point where it would make sense for us to modify the script to keep track of the leaked id's? it might be practical from your side to trace through just one port allocation and see why it leaked | 22:38 |
ianw | 2018-10-25 12:12:27,814 DEBUG nodepool.TaskManager: Manager rax-iad ran task ComputeGetServersDetail in 1.608656644821167s | 22:40 |
ianw | so the name is being mangled correctly ... this leaves the possibility that stats are being produced but not making it to statsd | 22:41 |
*** tosky has quit IRC | 22:45 | |
*** tpsilva has quit IRC | 22:47 | |
ianw | E..P..@.@.....c=h........<r.nodepool.task.rax-ord.ComputePostServers:0.000000|ms | 22:48 |
*** eharney has quit IRC | 22:48 | |
ianw | there's your problem ... it's sending zeros | 22:48 |
clarkb | that will do it | 22:49 |
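The capture ianw pasted is a raw statsd UDP datagram; statsd's wire format is plain `name:value|type` text, so the payload can be pulled apart with ordinary shell parameter expansion. This parses the exact line from the capture above:

```shell
#!/bin/sh
# The statsd metric line from ianw's packet capture.
line='nodepool.task.rax-ord.ComputePostServers:0.000000|ms'

name=${line%%:*}    # everything before the first ':' is the metric name
rest=${line#*:}
value=${rest%%|*}   # the reported value -- here a suspicious 0.000000
type=${rest##*|}    # the metric type: "ms" marks a statsd timer

echo "$name $value $type"
# -> nodepool.task.rax-ord.ComputePostServers 0.000000 ms
```

This confirms ianw's diagnosis: the metrics are reaching statsd fine, but the timer values themselves are zero, so the problem is on the producing side rather than in transit.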
*** betherly has joined #openstack-infra | 22:49 | |
clarkb | possibly related to the sdk update in that case | 22:49 |
clarkb | fwiw I've yet to see anything that would indicate a problem with the new zk cluster | 22:49 |
ianw | HA FTW | 22:50 |
clarkb | ya the only two spofs now are gerrit and zuul scheduler | 22:50 |
clarkb | (I guess technically log copies too, but that is being worked with swift uploads) | 22:50 |
clarkb | I had to get up at a ridiculously early hour this morning so I may begin to call it a day at this point. Anything else I should look at or help with before doing so? I did AJaeger's review requests | 22:51 |
ianw | no ... vice versa anything i should watch particularly? | 22:53 |
*** betherly has quit IRC | 22:53 | |
clarkb | ianw: I would keep an eye on zk periodically just to make sure it hasn't done anything weird (cacti is probably good enough for that). Otherwise I don't think so | 22:53 |
ianw | clarkb: ok, no worries. result will probably be gate grinding to a halt, so that's also a good canary :) | 22:54 |
clarkb | indeed | 22:54 |
*** yamamoto has quit IRC | 23:01 | |
*** yamamoto has joined #openstack-infra | 23:01 | |
*** yamamoto has quit IRC | 23:06 | |
*** betherly has joined #openstack-infra | 23:09 | |
*** markmcd has joined #openstack-infra | 23:09 | |
*** betherly has quit IRC | 23:14 | |
*** rlandy has quit IRC | 23:23 | |
*** carl_cai has joined #openstack-infra | 23:39 | |
*** yamamoto has joined #openstack-infra | 23:44 | |
*** agopi is now known as agopi|brb | 23:56 | |
*** gyee has quit IRC | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!