clarkb | Downloading http://mirror.bhs1.ovh.openstack.org/pypi/packages/af/c6/904651ff18e647e37351ca61a183218d3773c60f16d49c2b2756235b0fd4/yarl-1.2.1-cp35-cp35m-manylinux1_x86_64.whl (252kB) | 00:17 |
clarkb | yarl requires Python '>=3.5.3' but the running Python is 3.5.2 | 00:17 |
clarkb | that is currently breaking zuul jobs | 00:17 |
clarkb | I'm not in a spot to debug it but if people are curious ^ yarl comes from aiohttp | 00:18 |
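The failure clarkb pastes above is pip enforcing yarl's Requires-Python metadata. A minimal sketch of that check using the `packaging` library (which pip vendors), assuming yarl 1.2.1 declares `>=3.5.3` as the error says:

```python
# Sketch of the interpreter check pip applies against yarl 1.2.1's
# Requires-Python metadata (">=3.5.3", per the error above).
from packaging.specifiers import SpecifierSet

requires_python = SpecifierSet(">=3.5.3")
print("3.5.2" in requires_python)  # False - the interpreter running the jobs
print("3.5.3" in requires_python)  # True
```

The pin that merges later in this log (review 565470) works around this by constraining yarl on interpreters older than 3.5.3.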
*** swest1 has joined #zuul | 01:34 | |
*** swest has quit IRC | 01:34 | |
*** spsurya has joined #zuul | 01:43 | |
tristanC | tobiash: fwiw, the re2 change broke status requirement written like 'status: "sf-io[bot]:local/check:success"' | 04:29 |
tristanC | e.g. when you want to require a check success from your zuul, instead of any ".*:success" as you pasted | 04:32 |
tristanC | when upgrading to 3.0.2, you have to escape regexp token in status requirement | 04:36 |
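A sketch of the breakage tristanC describes, using the stdlib `re` module for illustration (Zuul 3.0.2 switched to re2, but the escaping requirement is the same): interpreted as a regex, `[bot]` becomes a character class, so the literal status string no longer matches itself.

```python
import re

status = "sf-io[bot]:local/check:success"

# As a regex, '[bot]' matches a single character from {b, o, t},
# so the pattern no longer matches the literal status string:
print(re.fullmatch(status, status))             # None
# Escaping the regex metacharacters restores a literal match:
print(re.fullmatch(re.escape(status), status))  # <re.Match object ...>
```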
*** pwhalen_ has joined #zuul | 05:06 | |
*** pwhalen has quit IRC | 05:07 | |
* SpamapS drums fingers on desk while nodepool sits and does nothing | 06:43 | |
*** sigmavirus24 has quit IRC | 06:52 | |
tobiash | tristanC: oh, that was unexpected. Maybe we should string match and fallback to regex match if that fails | 06:52 |
tristanC | tobiash: fallback might be confusing too. well now that the change is released i think it's fine | 07:00 |
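For reference, the fallback tobiash floated (and tristanC found potentially confusing) would look roughly like this; a hypothetical sketch, not actual Zuul code:

```python
import re

def status_matches(required: str, actual: str) -> bool:
    # Exact string comparison first, so unescaped literals such as
    # "sf-io[bot]:local/check:success" keep working...
    if required == actual:
        return True
    # ...then fall back to treating the requirement as a regex.
    try:
        return re.fullmatch(required, actual) is not None
    except re.error:
        return False
```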
*** hashar has joined #zuul | 07:01 | |
tobiash | SpamapS: you seem to have this problem often. Is your environment differently constrained than others? E.g. providers with low quota? | 07:10 |
*** hashar is now known as hasharAway | 07:42 | |
*** ssbarnea_ has joined #zuul | 07:57 | |
*** ssbarnea_ has quit IRC | 08:13 | |
SpamapS | tobiash: I have one provider that is always at 0 max-servers so that I can rapidly fail over to it. Maybe that's a mistake? | 08:21 |
SpamapS | (it is the most constrained cloud we have, so I try not to use it, but if the other one goes down..) | 08:21 |
SpamapS | currently using about 75x8GB nodes on our less constrained cloud | 08:22 |
tobiash | SpamapS: the provider with max-servers 0 should decline every request | 08:22 |
SpamapS | it does | 08:22 |
SpamapS | so I think it's not the problem | 08:22 |
SpamapS | what I see is that the other one gets stuck | 08:22 |
tobiash | SpamapS: any logging about quota? | 08:23 |
SpamapS | Not sure where exactly, but I suspect it's a zk lock or something | 08:23 |
SpamapS | nothing about quota | 08:23 |
tobiash | you can inspect locks by telnetting to zk and typing 'dump' | 08:23 |
tobiash | that should list all sessions together with their locks | 08:23 |
tobiash | maybe that helps | 08:24 |
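The same `dump` four-letter command can be sent over a plain socket instead of telnet; a small sketch (note that `dump` lists sessions and their ephemeral nodes, i.e. the locks, and only works against the ZooKeeper leader):

```python
import socket

def zk_dump(host: str = "localhost", port: int = 2181) -> str:
    """Send ZooKeeper's 'dump' four-letter command, as one would via telnet."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"dump")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()

print(zk_dump())  # sessions and the ephemeral nodes (locks) they hold
```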
SpamapS | It might | 08:24 |
SpamapS | I suspect locks mainly because it is inconsistent | 08:25 |
SpamapS | sometimes the thing fires up right away | 08:25 |
SpamapS | and sometimes jobs are queued for 15+ minutes | 08:25 |
SpamapS | (with no other jobs running, and more than enough nodes available, and none building) | 08:25 |
SpamapS | It gets stuck between "Accepting node request ..." and the next thing to log.. | 08:26 |
tobiash | hrm, did you notice any zk instabilities? | 08:31 |
tobiash | it is very sensitive to inconsistent io performance | 08:32 |
SpamapS | Not really.. it almost doesn't register | 08:34 |
SpamapS | 2018-05-01 01:32:21,515 INFO nodepool.PoolWorker.a-main: Assigning node request <NodeRequest {'state': 'requested', 'nodes': [], 'stat': ZnodeStat(czxid=7873512, mzxid=7873589, ctime=1525163448214, mtime=1525163495027, version=1, cversion=0, aversion=0, ephemeralOwner=98858567839189105, dataLength=217, numChildren=0, pzxid=7873512), 'declined_by': ['zuul.cloud.phx3.gdg-24752-PoolWorker.p-main'], 'requestor': | 08:34 |
SpamapS | 'zuul.cloud.phx3.gdg', 'node_types': ['g7', 'g7', 'g7'], 'id': '200-0000026032', 'reuse': True, 'state_time': 1525163495.026352}> | 08:34 |
SpamapS | That line is where it is stuck right at this moment | 08:34 |
SpamapS | though it just let go | 08:34 |
tobiash | can you get a thread dump? | 08:34 |
SpamapS | probably | 08:34 |
SpamapS | would need to run nodepool in pdb I suppose? | 08:35 |
SpamapS | This time it only queued for 3 minutes :-P | 08:35 |
tobiash | SpamapS: just send sigusr2 to it, it should write the thread dump into the log | 08:35 |
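The pattern behind that is a signal handler that writes every thread's stack into the log; an illustrative sketch, not nodepool's exact code:

```python
import logging
import signal
import sys
import threading
import traceback

def dump_threads(signum, frame):
    log = logging.getLogger("threaddump")
    names = {t.ident: t.name for t in threading.enumerate()}
    # sys._current_frames() maps thread ids to their current stack frames
    for ident, stack in sys._current_frames().items():
        log.info("Thread %s (%s):\n%s", ident, names.get(ident, "?"),
                 "".join(traceback.format_stack(stack)))

signal.signal(signal.SIGUSR2, dump_threads)
```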
SpamapS | http://paste.openstack.org/show/720170/ | 08:36 |
SpamapS | that's the time it was doing nothing | 08:36 |
SpamapS | zk was quiescent.. thing was just sitting there | 08:37 |
SpamapS | tobiash: ah didn't know that | 08:37 |
tobiash | when it hangs, check if the request is set to pending | 08:38 |
tobiash | it should do that right after that log message | 08:39 |
tobiash | and it looks like _waitForNodeSet doesn't log anything | 08:40 |
tobiash | so you might want to add additional debugging info into http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/driver/__init__.py#n239 | 08:41 |
SpamapS | the request is set to 'requested' when it is paused | 08:42 |
tobiash | maybe you're running into a corner case in that function | 08:42 |
SpamapS | yeah that's kind of what I assume | 08:43 |
SpamapS | just haven't had time to sit with it | 08:43 |
tobiash | at least it should not switch to paused without logging it | 08:44 |
SpamapS | http://paste.openstack.org/show/720172/ | 08:47 |
SpamapS | there's the thread dump | 08:47 |
SpamapS | a-main is the "working" cloud, p-main is the one with max-servers: 0 | 08:48 |
*** electrofelix has joined #zuul | 08:48 | |
SpamapS | hm | 08:49 |
SpamapS | I did just notice there are a ton of broken queued images in the alien image list | 08:49 |
SpamapS | makes it take a while to list images | 08:50 |
SpamapS | I suppose I should delete all the aliens | 08:50 |
tobiash | it can slow down processing | 08:50 |
SpamapS | yeah if it does that a few times | 08:50 |
SpamapS | alien-image-list takes ~60s | 08:50 |
tobiash | are you logging the openstack api requests? | 08:50 |
SpamapS | no | 08:50 |
tobiash | you should | 08:50 |
SpamapS | well just the debug level stuff | 08:50 |
SpamapS | but not like, GET's/PUT's | 08:51 |
tobiash | I mean the tasks of the provider request queue | 08:51 |
tobiash | it also logs the queue size | 08:51 |
SpamapS | yeah those stream | 08:51 |
tobiash | so it's not that create-server is starved because of image listing? | 08:53 |
tobiash | does this happen under quota pressure? | 08:53 |
SpamapS | looks like back around march 15 something horrible happened and we were thrashing on image uploads | 08:53 |
SpamapS | tobiash: create server isn't needed | 08:54 |
SpamapS | once it gets past this it just grabs the existing ready nodes | 08:54 |
tobiash | ah ok | 08:54 |
SpamapS | I have min-ready: 15 for the types that we use a lot | 08:54 |
SpamapS | I'm deleting all the image records. they aren't even uploaded, just 'queued' | 08:55 |
SpamapS | interesting.. I will say that now that we have enough jobs where we run 10 at a time on every PR.. it's time for a second executor. :-P | 08:58 |
SpamapS | 8 vcpu 16GB VM is hitting loads near 20 | 08:58 |
tobiash | SpamapS: did you configure 'rate' for your provider? | 08:59 |
tobiash | https://zuul-ci.org/docs/nodepool/configuration.html#openstack-driver | 08:59 |
tobiash | the default is 1 which means it sends at max one request per second to the cloud | 08:59 |
tobiash | which is often far too low | 09:00 |
SpamapS | tobiash: i didn't, that does sound slow | 09:01 |
SpamapS | and if it's busy just trying to look at server details instead of list images and fetch their details... | 09:01 |
tobiash | you have quite a few NodeDeleter threads in your dump so maybe it is under quota pressure because it's not cleaning up fast enough -> paused handler | 09:01 |
tobiash | try to set it to 0.001 | 09:02 |
SpamapS | My quota in the cloud is 300 | 09:02 |
tobiash | that means it basically doesn't do a pause before sending the next request in the queue | 09:02 |
SpamapS | max-servers is just me being polite. ;) | 09:02 |
SpamapS | also I've now deleted all the bad images | 09:03 |
tobiash | max-servers is also treated like quota in nodepool | 09:03 |
SpamapS | I thought quota was the server side quota. :-P | 09:03 |
tobiash | it handles both the same way | 09:03 |
tobiash | so you can have server side quota and self-constrained quota | 09:04 |
SpamapS | ah | 09:04 |
SpamapS | and yeah I'm churning a lot of nodes | 09:04 |
SpamapS | have a 3 node and a 5 node job | 09:04 |
SpamapS | that get run often | 09:05 |
tobiash | that makes me think your rate is definitely far too low | 09:05 |
tobiash | my providers run with a rate of 0.01 | 09:06 |
SpamapS | Yeah the control plane can def handle it | 09:07 |
tobiash | that also makes me wonder whether the default of 1 is actually a good one (as it will only work well for very small deployments) | 09:08 |
tobiash | maybe 0.1 would be a better default? | 09:08 |
tobiash | corvus: what do you think? ^ | 09:09 |
tobiash | SpamapS: it's even worse: rate = In seconds, amount to wait between operations on the provider. Defaults to 1.0. | 09:11 |
tobiash | SpamapS: so you're not even getting one request per second but actually less | 09:11 |
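In other words, `rate` is an enforced gap between provider API operations, not a requests-per-second figure. A minimal sketch of that throttling, assuming a simple task queue (nodepool's real task manager is more involved):

```python
import queue
import time

def run_provider_tasks(tasks: "queue.Queue", rate: float = 1.0) -> None:
    last = 0.0
    while True:
        task = tasks.get()                      # next queued API call
        wait = rate - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)                    # the 'rate' gap between operations
        task()                                  # e.g. list servers, create server
        last = time.monotonic()
```

With `rate: 1.0` and an API call that itself takes half a second, the provider gets one request every 1.5 seconds, which is why the task queue SpamapS describes below can back up to 20 entries.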
SpamapS | yeah | 09:14 |
SpamapS | tobiash: never noticed before but sometimes the queue was indeed getting long | 09:16 |
SpamapS | 20 or so at the worst | 09:16 |
SpamapS | I've now set it to 0.01 | 09:16 |
SpamapS | and queue is staying at 0 | 09:16 |
SpamapS | but box still waiting | 09:17 |
SpamapS | and yeah, my executor is also bombed | 09:18 |
* SpamapS puts adding another one on the next sprint board :-P | 09:18 | |
SpamapS | tobiash: thanks, that does seem a bit better | 09:18 |
SpamapS | still stalls for a minute or so.. but seems consistently 1 minute.. Have done 5 patches now.. and all were ~60s .. last night I was seeing 5-15 minute lag | 09:19 |
SpamapS | alright, sleep calls | 09:20 |
tobiash | at least better now | 09:22 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Pin yarl for python < 3.5.3 https://review.openstack.org/565470 | 09:31 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul master: Pin yarl for python < 3.5.3 https://review.openstack.org/565470 | 09:37 |
*** ssbarnea_ has joined #zuul | 09:41 | |
*** gtema has joined #zuul | 10:28 | |
openstackgerrit | Merged openstack-infra/zuul master: Pin yarl for python < 3.5.3 https://review.openstack.org/565470 | 10:46 |
openstackgerrit | Merged openstack-infra/zuul master: Add release note about config memory improvements https://review.openstack.org/565348 | 10:56 |
*** gtema has quit IRC | 10:58 | |
odyssey4me | Hi everyone, has any thought been given to implementing the ability to define a matrix of jobs? I mean something like https://docs.travis-ci.com/user/customizing-the-build#Build-Matrix where I want to execute the same job, but with different parameters and instead of defining each job individually using the parent/child, zuul could prepare the matrix for me. | 11:44 |
*** ssbarnea_ has quit IRC | 12:22 | |
*** rlandy has joined #zuul | 12:26 | |
*** weshay is now known as weshay|ruck | 12:46 | |
*** pwhalen_ is now known as pwhalen | 13:01 | |
*** pwhalen has joined #zuul | 13:01 | |
*** ssbarnea_ has joined #zuul | 13:10 | |
*** ssbarnea_ has quit IRC | 13:15 | |
*** hasharAway has quit IRC | 13:27 | |
openstackgerrit | Merged openstack-infra/zuul master: Increase unit testing of host / group vars https://review.openstack.org/559405 | 13:27 |
openstackgerrit | Merged openstack-infra/zuul master: Inventory groups should be under children key https://review.openstack.org/559406 | 13:28 |
*** openstackgerrit has quit IRC | 13:34 | |
*** pwhalen has quit IRC | 13:47 | |
*** pwhalen has joined #zuul | 13:50 | |
*** gtema has joined #zuul | 13:55 | |
*** ssbarnea_ has joined #zuul | 14:03 | |
*** ssbarnea_ has quit IRC | 14:07 | |
*** ssbarnea_ has joined #zuul | 14:13 | |
*** gtema has quit IRC | 14:15 | |
corvus | odyssey4me: we had that in jjb, but we're trying to avoid it in zuul because it can get pretty confusing pretty fast. so at the moment, the answer is basically "make a bunch of one-liner jobs which override the thing you want to change". but if that doesn't work out in the long run, we can probably add some syntactic sugar to do that. | 14:17 |
*** dkranz has joined #zuul | 14:17 | |
odyssey4me | corvus yeah, understood - the trouble is that now we've got to make many, many job definitions which gets confusing too | 14:28 |
corvus | odyssey4me: indeed. is this in openstack, or a public repo where i can take a look? | 14:28 |
odyssey4me | especially when you're trying to cover multiple axes - like python version, ansible version, code path, and config elements | 14:28 |
odyssey4me | not just yet, I'm just forward planning - trying to figure out how to tackle things and looking at options | 14:29 |
corvus | odyssey4me: okay, let me know when you get there -- i'd like to see what it ends up looking like | 14:31 |
corvus | odyssey4me: also here's an idea for an interim solution: you could put all the variants in one file in zuul.d and write a script which generates the matrix output | 14:31 |
corvus | (so keep the main body of the job in a separate file so you don't have to worry about munging it) | 14:32 |
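A hypothetical sketch of such a generator: expand a small axis definition into flat one-liner job variants and emit them as a zuul.d file (the job names and axes here are invented for illustration):

```python
import itertools

import yaml  # PyYAML, assumed available

axes = {
    "python_version": ["3.5", "3.6"],
    "ansible_version": ["2.4", "2.5"],
}

jobs = []
for combo in itertools.product(*axes.values()):
    variant = dict(zip(axes, combo))
    name = "tox-matrix-" + "-".join(v.replace(".", "") for v in combo)
    jobs.append({"job": {"name": name, "parent": "tox-matrix", "vars": variant}})

# Write the expanded matrix next to the hand-maintained job body
with open("zuul.d/generated-matrix.yaml", "w") as f:
    yaml.dump(jobs, f, default_flow_style=False)
```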
odyssey4me | corvus yep, I was just thinking of doing that | 14:33 |
odyssey4me | use a tool to generate a static zuul yaml file - kinda the best of both worlds | 14:33 |
corvus | it's at least an easy way to experiment before we have to commit to syntax changes :) | 14:34 |
mordred | ++ I like that as an experimentation approach | 14:39 |
*** acozine1 has joined #zuul | 14:40 | |
*** CharlesShine has joined #zuul | 14:51 | |
*** CharlesShine has left #zuul | 14:52 | |
*** trishnag has quit IRC | 14:57 | |
clarkb | SpamapS: tobiash: I notice that reuse: True is set, maybe related to that? | 15:41 |
clarkb | (its not something we use upstream) | 15:41 |
tobiash | clarkb: reuse? | 15:55 |
clarkb | ['zuul.cloud.phx3.gdg-24752-PoolWorker.p-main'], 'requestor':'zuul.cloud.phx3.gdg', 'node_types': ['g7', 'g7', 'g7'], 'id': '200-0000026032', 'reuse': True, 'state_time':1525163495.026352}> | 15:56 |
clarkb | tobiash: ^ there reuse:True I think that means nodepool is reusing the nodes? | 15:56 |
tobiash | ah, that's a min-ready request | 16:05 |
clarkb | ah | 16:07 |
SpamapS | odyssey4me: fwiw, I think having a single clear file that has a list of jobs and the reason they vary is actually the least confusing option. But, that's me preferring composition in general. | 16:15 |
SpamapS | clarkb: yeah we're not reusing nodes. | 16:16 |
*** ssbarnea_ has quit IRC | 16:18 | |
*** trishnag has joined #zuul | 16:24 | |
*** trishnag has quit IRC | 16:24 | |
*** trishnag has joined #zuul | 16:24 | |
*** spsurya has quit IRC | 16:31 | |
*** acozine1 has quit IRC | 16:51 | |
*** ssbarnea_ has joined #zuul | 17:05 | |
*** sshnaidm is now known as sshnaidm|afk | 17:17 | |
pabelanger | clarkb: corvus: just looking at python3.6 jobs, for zuul / nodepool do we want both py35 / py36 or just take the leap directly to py36? | 17:33 |
clarkb | pabelanger: I think based on our py26->27 experience we probably want to run both | 17:34 |
clarkb | especially if we want to continue to support 35 | 17:34 |
clarkb | (since aiohttp for example just broke us there) | 17:34 |
pabelanger | wgm | 17:34 |
pabelanger | wfm* | 17:34 |
pabelanger | we also still have xenial servers for openstack-infra, so good to keep it around too | 17:35 |
clarkb | locally I use py36, so I'm reasonably confident zuul works under py36 | 17:35 |
clarkb | so doing both shouldn't be too bad outside of external deps breaking us | 17:36 |
pabelanger | yah, I've been using fedora a lot and haven't seen an issue so far | 17:36 |
mrhillsman | reproduce.sh no longer works because /usr/zuul-env/bin/zuul-cloner does not exist, is there a workaround/fix for this? | 17:43 |
clarkb | mrhillsman: not currently no. | 17:43 |
mrhillsman | thx | 17:43 |
clarkb | mrhillsman: beyond z-c not existing zuul isn't publicly publishing its git refs | 17:43 |
clarkb | so you'd need something to do the merges like zuul before invoking the devstack run which I think we've largely decided will happen as part of a "run zuul job locally" generic tool (rather than devstack specific) | 17:44 |
clarkb | (I want to say jhesketh has been looking at that problem) | 17:45 |
mrhillsman | cool, thx for the info | 17:46 |
mnaser | in super unrelated cool news, doing our first proposal to a customer for a full ci/cd pipeline that builds docker images based on their github commits and then (hopefully) deploys it to a k8s cluster using zuul | 17:46 |
clarkb | if your goal is to just run devstack you can grab the local.conf and run it that way. It won't coordinate multinode testing or run pre/post gate hooks etc | 17:47 |
clarkb | mnaser: neat | 17:47 |
mnaser | feels like the ideal to do it :> | 17:47 |
mnaser | cool way to explore use cases as well i guess | 17:47 |
mrhillsman | yeah, i just want to reproduce this error is all, i'll have to figure out a workaround | 17:47 |
mrhillsman | just did not want to dive down rabbit hole | 17:47 |
clarkb | mrhillsman: have a link to the error? | 17:47 |
clarkb | (just generally curious) | 17:48 |
mrhillsman | yeah | 17:48 |
mrhillsman | http://logs.openlabtesting.org/logs/67/967/7fbc3bfa7dbba1ac0264dab27ec84020c19cd964/check/gophercloud-acceptance-test/e432753/job-output.txt.gz | 17:48 |
mrhillsman | sorry, bad c/p | 17:48 |
mrhillsman | it is devstack related as well | 17:49 |
mrhillsman | something we are configuring wrong trying to lbaas/octavia testing | 17:49 |
mrhillsman | for gophercloud | 17:49 |
mrhillsman | waiting on devstack logs to finish load :) | 17:49 |
mrhillsman | http://paste.openstack.org/show/720193/ | 17:50 |
mrhillsman | i thought reproduce would allow me to kick it off and walk away for food | 17:51 |
clarkb | mrhillsman: /opt/stack/new/neutron-lbaas/devstack/plugin.sh:neutron_lbaas_configure_common:54 : /usr/local/bin/neutron-db-manage --subproject neutron-lbaas --config-file /etc/neutron/neutron.conf --config-file / upgrade head the config file isn't set properly for the db migration | 17:51 |
mrhillsman | yeah, just not sure how that is happening right now, i think maybe neutron-legacy is the issue | 17:52 |
clarkb | looks like you have both neutron-foo and q-foo enabled which may be part of the problem | 17:54 |
mrhillsman | ok, that is what i was leaning towards, thx for quick look | 17:55 |
mrhillsman | i'll go look at the PRs submitted shortly, appreciate it | 17:56 |
mrhillsman | it goes back to something i was worried a bit about some time ago | 17:57 |
mrhillsman | we are force enabling services in this one job; some refactoring needed | 17:57 |
*** myoung is now known as myoung|biab | 18:06 | |
SpamapS | https://photos.app.goo.gl/wD1z5tYowJr7TXZ92 <-- woot, my first successful unattended upgrade of zuul | 18:19 |
SpamapS | (by itself) | 18:20 |
*** openstackgerrit has joined #zuul | 18:35 | |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add additional steps for configuring Nodepool service on CentOS 7 https://review.openstack.org/564950 | 18:35 |
*** ssbarnea_ has quit IRC | 18:39 | |
*** trishnag has quit IRC | 18:40 | |
*** acozine1 has joined #zuul | 18:47 | |
*** trishnag has joined #zuul | 18:47 | |
*** ssbarnea_ has joined #zuul | 18:49 | |
openstackgerrit | Merged openstack-infra/zuul master: Install g++ on platform:rpm https://review.openstack.org/565070 | 18:52 |
*** trishnag has quit IRC | 18:52 | |
*** myoung|biab is now known as myoung | 19:16 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries https://review.openstack.org/565564 | 19:20 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries https://review.openstack.org/565564 | 19:21 |
openstackgerrit | Fatih Degirmenci proposed openstack-infra/zuul master: Add additional steps for configuring Nodepool service on CentOS 7 https://review.openstack.org/564950 | 19:27 |
*** hashar has joined #zuul | 19:28 | |
SpamapS | you know.. | 19:52 |
SpamapS | I love ansible dearly | 19:52 |
SpamapS | but sometimes jinja makes me have violent mood swings. | 19:52 |
*** hashar has quit IRC | 20:00 | |
mordred | SpamapS: yah | 20:01 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries https://review.openstack.org/565564 | 20:18 |
*** pwhalen has quit IRC | 20:25 | |
*** pwhalen has joined #zuul | 20:27 | |
*** pwhalen has joined #zuul | 20:27 | |
*** myoung is now known as myoung|biab | 20:36 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix setting a change queue in a template https://review.openstack.org/565581 | 20:41 |
*** acozine1 has quit IRC | 20:45 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix regex project templates https://review.openstack.org/565584 | 21:00 |
*** ssbarnea_ has quit IRC | 21:06 | |
*** myoung|biab is now known as myoung | 21:54 | |
clarkb | corvus: https://review.openstack.org/#/c/565584/1 question on that change | 22:25 |
*** swest1 has quit IRC | 23:08 | |
corvus | clarkb: yeah... that's -1 worthy :) | 23:12 |
clarkb | done | 23:13 |
*** acozine1 has joined #zuul | 23:18 | |
*** swest has joined #zuul | 23:23 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix regex project templates https://review.openstack.org/565584 | 23:25 |
corvus | clarkb: i think i deleted the right things this time :) | 23:25 |