Tuesday, 2018-05-01

00:17 <clarkb> Downloading http://mirror.bhs1.ovh.openstack.org/pypi/packages/af/c6/904651ff18e647e37351ca61a183218d3773c60f16d49c2b2756235b0fd4/yarl-1.2.1-cp35-cp35m-manylinux1_x86_64.whl (252kB)
00:17 <clarkb> yarl requires Python '>=3.5.3' but the running Python is 3.5.2
00:17 <clarkb> that is currently breaking zuul jobs
00:18 <clarkb> I'm not in a spot to debug it but if people are curious ^ yarl comes from aiohttp
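
The fix that lands later in the day (review 565470) pins yarl. One way to express such a pin in a requirements file, assuming PEP 508 environment markers; the exact specifier used upstream may differ. Note that python_full_version, not python_version, carries the micro version:

    # requirements.txt sketch: only constrain yarl where the interpreter
    # is too old for yarl 1.2.x's '>=3.5.3' requirement.
    yarl<1.2; python_full_version<'3.5.3'
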
01:34 *** swest1 has joined #zuul
01:34 *** swest has quit IRC
01:43 *** spsurya has joined #zuul
04:29 <tristanC> tobiash: fwiw, the re2 change broke status requirements written like 'status: "sf-io[bot]:local/check:success"'
04:32 <tristanC> e.g. when you want to require a check success from your zuul, instead of any ".*:success" as you pasted
04:36 <tristanC> when upgrading to 3.0.2, you have to escape regexp tokens in status requirements
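
A sketch of the escaping tristanC describes, assuming a pipeline using a GitHub connection named github (connection and app names here are illustrative):

    - pipeline:
        name: check
        require:
          github:
            # 3.0.2 compiles this with re2, so the literal [bot]
            # brackets must be escaped or they become a character class:
            status: "sf-io\\[bot\\]:local/check:success"
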
05:06 *** pwhalen_ has joined #zuul
05:07 *** pwhalen has quit IRC
06:43 * SpamapS drums fingers on desk while nodepool sits and does nothing
06:52 *** sigmavirus24 has quit IRC
06:52 <tobiash> tristanC: oh, that was unexpected. Maybe we should string match and fall back to regex match if that fails
07:00 <tristanC> tobiash: fallback might be confusing too. well now that the change is released i think it's fine
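
For illustration only, the fallback tobiash floats (and which they decide against here) might look like this; plain re stands in for the re2 bindings Zuul actually uses, and the function name is invented:

    import re

    def status_matches(requirement, status):
        # Exact string match first, so unescaped tokens like [bot]
        # keep working; only treat the requirement as a regex if the
        # literal comparison fails.
        if requirement == status:
            return True
        try:
            return re.fullmatch(requirement, status) is not None
        except re.error:
            return False
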
07:01 *** hashar has joined #zuul
07:10 <tobiash> SpamapS: you seem to have this problem often. Is your environment differently constrained than others? E.g. providers with low quota?
07:42 *** hashar is now known as hasharAway
07:57 *** ssbarnea_ has joined #zuul
08:13 *** ssbarnea_ has quit IRC
08:21 <SpamapS> tobiash: I have one provider that is always at 0 max-servers so that I can rapidly fail over to it. Maybe that's a mistake?
08:21 <SpamapS> (it is the most constrained cloud we have, so I try not to use it, but if the other one goes down..)
08:22 <SpamapS> currently using about 75x8GB nodes on our less constrained cloud
08:22 <tobiash> SpamapS: the provider with max-servers 0 should decline every request
08:22 <SpamapS> it does
08:22 <SpamapS> so I think it's not the problem
08:22 <SpamapS> what I see is that the other one gets stuck
08:23 <tobiash> SpamapS: any logging about quota?
08:23 <SpamapS> Not sure where exactly, but I suspect it's a zk lock or something
08:23 <SpamapS> nothing about quota
08:23 <tobiash> you can inspect locks by telnetting to zk and typing 'dump'
08:23 <tobiash> that should list the sessions together with their locks
08:24 <tobiash> maybe that helps
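
'dump' is one of ZooKeeper's four-letter-word admin commands; for example (port assumed to be the default 2181):

    # Lists sessions and their ephemeral nodes (which is how ZK locks
    # are implemented); run against the current leader for full output.
    echo dump | nc localhost 2181
    # General health and latency numbers:
    echo stat | nc localhost 2181
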
08:24 <SpamapS> It might
08:25 <SpamapS> I suspect locks mainly because it is inconsistent
08:25 <SpamapS> sometimes the thing fires up right away
08:25 <SpamapS> and sometimes jobs are queued for 15+ minutes
08:25 <SpamapS> (with no other jobs running, and more than enough nodes available, and none building)
08:26 <SpamapS> It gets stuck between "Accepting node request ..." and the next thing to log..
08:31 <tobiash> hrm, did you notice any zk instabilities?
08:32 <tobiash> it is very sensitive to inconsistent io performance
08:34 <SpamapS> Not really.. it almost doesn't register
08:34 <SpamapS> 2018-05-01 01:32:21,515 INFO nodepool.PoolWorker.a-main: Assigning node request <NodeRequest {'state': 'requested', 'nodes': [], 'stat': ZnodeStat(czxid=7873512, mzxid=7873589, ctime=1525163448214, mtime=1525163495027, version=1, cversion=0, aversion=0, ephemeralOwner=98858567839189105, dataLength=217, numChildren=0, pzxid=7873512), 'declined_by': ['zuul.cloud.phx3.gdg-24752-PoolWorker.p-main'], 'requestor':
08:34 <SpamapS> 'zuul.cloud.phx3.gdg', 'node_types': ['g7', 'g7', 'g7'], 'id': '200-0000026032', 'reuse': True, 'state_time': 1525163495.026352}>
08:34 <SpamapS> That line is where it is stuck right at this moment
08:34 <SpamapS> though it just let go
08:34 <tobiash> can you get a thread dump?
08:34 <SpamapS> probably
08:35 <SpamapS> would need to run nodepool in pdb I suppose?
08:35 <SpamapS> This time it only queued for 3 minutes :-P
08:35 <tobiash> SpamapS: just send sigusr2 to it, it should write the thread dump into the log
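
For example (process name assumed; adjust for how the launcher was started):

    # Nodepool installs a SIGUSR2 handler that appends a stack trace of
    # every thread to its debug log.
    kill -USR2 $(pgrep -f nodepool-launcher)
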
08:36 <SpamapS> http://paste.openstack.org/show/720170/
08:36 <SpamapS> that's the time it was doing nothing
08:37 <SpamapS> zk was quiescent.. thing was just sitting there
08:37 <SpamapS> tobiash: ah didn't know that
08:38 <tobiash> when it hangs, check if the request is set to pending
08:39 <tobiash> it should do that right after that log message
08:40 <tobiash> and it looks like _waitForNodeSet doesn't log anything
08:41 <tobiash> so you might want to add additional debugging info into http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/driver/__init__.py#n239
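
A minimal sketch of the extra logging tobiash is suggesting. This is not the real _waitForNodeSet (which appears to live on the request handler in the linked file), just the shape of a polling loop with per-iteration debug output; all names here are assumed:

    import logging
    import time

    log = logging.getLogger("nodepool.driver")

    def wait_for_nodeset(request_id, poll, interval=1.0):
        # poll() is assumed to return (ready_nodes, building_nodes);
        # logging every pass makes a stall here show up in the provider
        # log instead of being silent, which is the gap being discussed.
        while True:
            ready, building = poll()
            log.debug("Request %s: %d ready, %d building",
                      request_id, len(ready), len(building))
            if not building:
                return ready
            time.sleep(interval)
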
08:42 <SpamapS> the request is set to 'requested' when it is paused
08:42 <tobiash> maybe you're running into a corner case in that function
08:43 <SpamapS> yeah that's kind of what I assume
08:43 <SpamapS> just haven't had time to sit with it
08:44 <tobiash> at least it should not switch to paused without logging it
08:47 <SpamapS> http://paste.openstack.org/show/720172/
08:47 <SpamapS> there's the thread dump
08:48 <SpamapS> a-main is the "working" cloud, p-main is the one with max-servers: 0
08:48 *** electrofelix has joined #zuul
08:49 <SpamapS> hm
08:49 <SpamapS> I did just notice there are a ton of broken queued images in the alien image list
08:50 <SpamapS> makes it take a while to list images
08:50 <SpamapS> I suppose I should delete all the aliens
08:50 <tobiash> it can slow down processing
08:50 <SpamapS> yeah if it does that a few times
08:50 <SpamapS> alien-image-list takes ~60s
08:50 <tobiash> are you logging the openstack api requests?
08:50 <SpamapS> no
08:50 <tobiash> you should
08:50 <SpamapS> well just the debug level stuff
08:51 <SpamapS> but not like, GET's/PUT's
08:51 <tobiash> I mean the tasks of the provider request queue
08:51 <tobiash> it also logs the queue size
08:51 <SpamapS> yeah those stream
08:53 <tobiash> so it's not that create-server is starved because of image listing?
08:53 <tobiash> does this happen under quota pressure?
08:53 <SpamapS> looks like back around march 15 something horrible happened and we were thrashing on image uploads
08:54 <SpamapS> tobiash: create server isn't needed
08:54 <SpamapS> once it gets past this it just grabs the existing ready nodes
08:54 <tobiash> ah ok
08:54 <SpamapS> I have min-ready: 15 for the types that we use a lot
08:55 <SpamapS> I'm deleting all the image records. they aren't even uploaded, just 'queued'
08:58 <SpamapS> interesting.. I will say that now that we have enough jobs where we run 10 at a time on every PR.. it's time for a second executor. :-P
08:58 <SpamapS> 8 vcpu 16GB VM is hitting loads near 20
08:59 <tobiash> SpamapS: did you configure 'rate' for your provider?
08:59 <tobiash> https://zuul-ci.org/docs/nodepool/configuration.html#openstack-driver
08:59 <tobiash> the default is 1 which means it sends at max one request per second to the cloud
09:00 <tobiash> which is often far too low
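
rate lives in the provider section of nodepool.yaml; a sketch (provider and cloud names assumed):

    providers:
      - name: my-cloud
        cloud: my-cloud
        # Seconds to wait between API operations against this provider.
        # The default of 1.0 caps nodepool at under one OpenStack call
        # per second; busy deployments typically want this much lower.
        rate: 0.01
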
09:01 <SpamapS> tobiash: i didn't, that does sound slow
09:01 <SpamapS> and if it's busy just trying to look at server details instead of list images and fetch their details...
09:01 <tobiash> you have quite a few NodeDeleter threads in your dump so maybe it is under quota pressure because it's not cleaning up fast enough -> paused handler
09:02 <tobiash> try to set it to 0.001
09:02 <SpamapS> My quota in the cloud is 300
09:02 <tobiash> that means it basically doesn't do a pause before sending the next request in the queue
09:02 <SpamapS> max-servers is just me being polite. ;)
09:03 <SpamapS> also I've now deleted all the bad images
09:03 <tobiash> max-servers is also treated like quota in nodepool
09:03 <SpamapS> I thought quota was the server side quota. :-P
09:03 <tobiash> it handles both the same way
09:04 <tobiash> so you can have server side quota and self-constrained quota
09:04 <SpamapS> ah
09:04 <SpamapS> and yeah I'm churning a lot of nodes
09:04 <SpamapS> have a 3 node and a 5 node job
09:05 <SpamapS> that get run often
09:05 <tobiash> that makes me think that you have definitely a far too low rate
09:06 <tobiash> my providers run with a rate of 0.01
09:07 <SpamapS> Yeah the control plane can def handle it
09:08 <tobiash> that also makes me wonder whether the default of 1 is actually a good default (as it will only work well for very small deployments)
09:08 <tobiash> maybe 0.1 would be a better default?
09:09 <tobiash> corvus: what do you think? ^
09:11 <tobiash> SpamapS: it's even worse: rate = In seconds, amount to wait between operations on the provider. Defaults to 1.0.
09:11 <tobiash> SpamapS: so you're not even getting one request per second but actually less
09:14 <SpamapS> yeah
09:16 <SpamapS> tobiash: never noticed before but sometimes the queue was indeed getting long
09:16 <SpamapS> 20 or so at the worst
09:16 <SpamapS> I've now set it to 0.01
09:16 <SpamapS> and queue is staying at 0
09:17 <SpamapS> but box still waiting
09:18 <SpamapS> and yeah, my executor is also bombed
09:18 * SpamapS puts adding another one on the next sprint board :-P
09:18 <SpamapS> tobiash: thanks, that does seem a bit better
09:19 <SpamapS> still stalls for a minute or so.. but seems consistently 1 minute.. Have done 5 patches now.. and all were ~60s .. last night I was seeing 5-15 minute lag
09:20 <SpamapS> alright, sleep calls
09:22 <tobiash> at least better now
09:31 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Pin yarl for python < 3.5.3  https://review.openstack.org/565470
09:37 <openstackgerrit> Ian Wienand proposed openstack-infra/zuul master: Pin yarl for python < 3.5.3  https://review.openstack.org/565470
09:41 *** ssbarnea_ has joined #zuul
10:28 *** gtema has joined #zuul
10:46 <openstackgerrit> Merged openstack-infra/zuul master: Pin yarl for python < 3.5.3  https://review.openstack.org/565470
10:56 <openstackgerrit> Merged openstack-infra/zuul master: Add release note about config memory improvements  https://review.openstack.org/565348
10:58 *** gtema has quit IRC
11:44 <odyssey4me> Hi everyone, has any thought been given to implementing the ability to define a matrix of jobs? I mean something like https://docs.travis-ci.com/user/customizing-the-build#Build-Matrix where I want to execute the same job, but with different parameters and instead of defining each job individually using the parent/child, zuul could prepare the matrix for me.
12:22 *** ssbarnea_ has quit IRC
12:26 *** rlandy has joined #zuul
12:46 *** weshay is now known as weshay|ruck
13:01 *** pwhalen_ is now known as pwhalen
13:01 *** pwhalen has joined #zuul
13:10 *** ssbarnea_ has joined #zuul
13:15 *** ssbarnea_ has quit IRC
13:27 *** hasharAway has quit IRC
13:27 <openstackgerrit> Merged openstack-infra/zuul master: Increase unit testing of host / group vars  https://review.openstack.org/559405
13:28 <openstackgerrit> Merged openstack-infra/zuul master: Inventory groups should be under children key  https://review.openstack.org/559406
13:34 *** openstackgerrit has quit IRC
13:47 *** pwhalen has quit IRC
13:50 *** pwhalen has joined #zuul
13:55 *** gtema has joined #zuul
14:03 *** ssbarnea_ has joined #zuul
14:07 *** ssbarnea_ has quit IRC
14:13 *** ssbarnea_ has joined #zuul
14:15 *** gtema has quit IRC
14:17 <corvus> odyssey4me: we had that in jjb, but we're trying to avoid it in zuul because it can get pretty confusing pretty fast.  so at the moment, the answer is basically "make a bunch of one-liner jobs which override the thing you want to change".  but if that doesn't work out in the long run, we can probably add some syntactic sugar to do that.
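
A sketch of the one-liner-variant approach corvus describes; job and variable names here are invented for illustration:

    - job:
        name: integration-base
        run: playbooks/integration.yaml

    - job:
        name: integration-ansible-25
        parent: integration-base
        vars:
          ansible_version: "2.5"

    - job:
        name: integration-ansible-26
        parent: integration-base
        vars:
          ansible_version: "2.6"
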
14:17 *** dkranz has joined #zuul
14:28 <odyssey4me> corvus: yeah, understood - the trouble is that now we've got to make many, many job definitions which gets confusing too
14:28 <corvus> odyssey4me: indeed.  is this in openstack, or a public repo where i can take a look?
14:28 <odyssey4me> especially when you're trying to cover multiple axes - like python version, ansible version, code path, and config elements
14:29 <odyssey4me> not just yet, I'm just forward planning - trying to figure out how to tackle things and looking at options
14:31 <corvus> odyssey4me: okay, let me know when you get there -- i'd like to see what it ends up looking like
14:31 <corvus> odyssey4me: also here's an idea for an interim solution: you could put all the variants in one file in zuul.d and write a script which generates the matrix output
14:32 <corvus> (so keep the main body of the job in a separate file so you don't have to worry about munging it)
14:33 <odyssey4me> corvus: yep, I was just thinking of doing that
14:33 <odyssey4me> use a tool to generate a static zuul yaml file - kinda the best of both worlds
14:34 <corvus> it's at least an easy way to experiment before we have to commit to syntax changes :)
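
A sketch of the interim generator corvus suggests: expand a small matrix spec into one-liner variants under zuul.d/. The spec format and all names are assumptions, not an existing tool; requires PyYAML and assumes zuul.d/ exists:

    #!/usr/bin/env python3
    import itertools
    import yaml

    MATRIX = {
        "python": ["3.5", "3.6"],
        "ansible": ["2.4", "2.5"],
    }

    jobs = []
    for py, ans in itertools.product(MATRIX["python"], MATRIX["ansible"]):
        jobs.append({"job": {
            "name": "integration-py%s-ansible%s" % (
                py.replace(".", ""), ans.replace(".", "")),
            "parent": "integration-base",
            "vars": {"python_version": py, "ansible_version": ans},
        }})

    # Keep the generated variants separate from the hand-written base
    # job so regeneration never touches the job body (corvus's point).
    with open("zuul.d/generated-jobs.yaml", "w") as f:
        yaml.safe_dump(jobs, f, default_flow_style=False)
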
14:39 <mordred> ++ I like that as an experimentation approach
14:40 *** acozine1 has joined #zuul
14:51 *** CharlesShine has joined #zuul
14:52 *** CharlesShine has left #zuul
14:57 *** trishnag has quit IRC
15:41 <clarkb> SpamapS: tobiash: I notice that reuse: True is set, maybe related to that?
15:41 <clarkb> (it's not something we use upstream)
15:55 <tobiash> clarkb: reuse?
15:56 <clarkb> ['zuul.cloud.phx3.gdg-24752-PoolWorker.p-main'], 'requestor':'zuul.cloud.phx3.gdg', 'node_types': ['g7', 'g7', 'g7'], 'id': '200-0000026032', 'reuse': True, 'state_time':1525163495.026352}>
15:56 <clarkb> tobiash: ^ there reuse:True I think that means nodepool is reusing the nodes?
16:05 <tobiash> ah, that's a min-ready request
16:07 <clarkb> ah
16:15 <SpamapS> odyssey4me: fwiw, I think having a single clear file that has a list of jobs and the reason they vary is actually the least confusing option. But, that's me preferring composition in general.
16:16 <SpamapS> clarkb: yeah we're not reusing nodes.
16:18 *** ssbarnea_ has quit IRC
16:24 *** trishnag has joined #zuul
16:24 *** trishnag has quit IRC
16:24 *** trishnag has joined #zuul
16:31 *** spsurya has quit IRC
16:51 *** acozine1 has quit IRC
17:05 *** ssbarnea_ has joined #zuul
17:17 *** sshnaidm is now known as sshnaidm|afk
17:33 <pabelanger> clarkb: corvus: just looking at python3.6 jobs, for zuul / nodepool do we want both py35 / py36 or just take the leap directly to py36?
17:34 <clarkb> pabelanger: I think based on our py26->27 experience we probably want to run both
17:34 <clarkb> especially if we want to continue to support 35
17:34 <clarkb> (since aiohttp for example just broke us there)
17:34 <pabelanger> wgm
17:34 <pabelanger> wfm*
17:35 <pabelanger> we also still have xenial servers for openstack-infra, so good to keep it around too
17:35 <clarkb> locally I use py36, so I'm reasonably confident zuul works under py36
17:36 <clarkb> so doing both shouldn't be too bad outside of external deps breaking us
17:36 <pabelanger> yah, I've been using fedora a lot and haven't seen an issue so far
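
In project-configuration terms, running both is just listing two jobs, e.g. (tox-py35/tox-py36 job names from the shared job library assumed here):

    - project:
        check:
          jobs:
            - tox-py35
            - tox-py36
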
17:43 <mrhillsman> reproduce.sh no longer works because /usr/zuul-env/bin/zuul-cloner does not exist, is there a workaround/fix for this?
17:43 <clarkb> mrhillsman: not currently no.
17:43 <mrhillsman> thx
17:43 <clarkb> mrhillsman: beyond z-c not existing zuul isn't publicly publishing its git refs
17:44 <clarkb> so you'd need something to do the merges like zuul before invoking the devstack run which I think we've largely decided will happen as part of a "run zuul job locally" generic tool (rather than devstack specific)
17:45 <clarkb> (I want to say jhesketh has been looking at that problem)
17:46 <mrhillsman> cool, thx for the info
17:46 <mnaser> in super unrelated cool news, doing our first proposal to a customer for a full ci/cd pipeline that builds docker images based on their github commits and then (hopefully) deploys it to a k8s cluster using zuul
17:47 <clarkb> if your goal is to just run devstack you can grab the local.conf and run it that way. It won't coordinate multinode testing or run pre/post gate hooks etc
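
The manual route clarkb describes is roughly this (repo location as of this log; the local.conf is archived alongside the job logs):

    git clone https://git.openstack.org/openstack-dev/devstack
    cd devstack
    # drop in the local.conf captured from the failed job, then:
    ./stack.sh
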
17:47 <clarkb> mnaser: neat
17:47 <mnaser> feels like the ideal way to do it :>
17:47 <mnaser> cool way to explore use cases as well i guess
17:47 <mrhillsman> yeah, i just want to reproduce this error is all, i'll have to figure out a workaround
17:47 <mrhillsman> just did not want to dive down rabbit hole
17:47 <clarkb> mrhillsman: have a link to the error?
17:48 <clarkb> (just generally curious)
17:48 <mrhillsman> yeah
17:48 <mrhillsman> http://logs.openlabtesting.org/logs/67/967/7fbc3bfa7dbba1ac0264dab27ec84020c19cd964/check/gophercloud-acceptance-test/e432753/job-output.txt.gz
17:48 <mrhillsman> sorry, bad c/p
17:49 <mrhillsman> it is devstack related as well
17:49 <mrhillsman> something we are configuring wrong trying to do lbaas/octavia testing
17:49 <mrhillsman> for gophercloud
17:49 <mrhillsman> waiting on devstack logs to finish loading :)
17:50 <mrhillsman> http://paste.openstack.org/show/720193/
17:51 <mrhillsman> i thought reproduce would allow me to kick it off and walk away for food
17:51 <clarkb> mrhillsman: /opt/stack/new/neutron-lbaas/devstack/plugin.sh:neutron_lbaas_configure_common:54 :   /usr/local/bin/neutron-db-manage --subproject neutron-lbaas --config-file /etc/neutron/neutron.conf --config-file / upgrade head
17:51 <clarkb> the config file isn't set properly for the db migration
17:52 <mrhillsman> yeah, just not sure how that is happening right now, i think maybe neutron-legacy is the issue
17:54 <clarkb> looks like you have both neutron-foo and q-foo enabled which may be part of the problem
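
In local.conf terms that means picking one family of service names and disabling the other; a sketch (service names assumed from devstack's neutron-legacy and neutron code at the time, check the job's actual settings before copying):

    # keep the legacy services that the neutron-lbaas plugin expects...
    enable_service q-svc q-agt q-dhcp q-l3 q-meta
    # ...and drop the new-style equivalents so both aren't configured:
    disable_service neutron-api neutron-agent neutron-dhcp neutron-l3 neutron-metadata
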
17:55 <mrhillsman> ok, that is what i was leaning towards, thx for quick look
17:56 <mrhillsman> i'll go look at the PRs submitted shortly, appreciate it
17:57 <mrhillsman> it goes back to something i was worried a bit about some time ago
17:57 <mrhillsman> we are force enabling services in this one job; some refactoring needed
18:06 *** myoung is now known as myoung|biab
18:19 <SpamapS> https://photos.app.goo.gl/wD1z5tYowJr7TXZ92 <-- woot, my first successful unattended upgrade of zuul
18:20 <SpamapS> (by itself)
18:35 *** openstackgerrit has joined #zuul
18:35 <openstackgerrit> Fatih Degirmenci proposed openstack-infra/zuul master: Add additional steps for configuring Nodepool service on CentOS 7  https://review.openstack.org/564950
18:39 *** ssbarnea_ has quit IRC
18:40 *** trishnag has quit IRC
18:47 *** acozine1 has joined #zuul
18:47 *** trishnag has joined #zuul
18:49 *** ssbarnea_ has joined #zuul
18:52 <openstackgerrit> Merged openstack-infra/zuul master: Install g++ on platform:rpm  https://review.openstack.org/565070
18:52 *** trishnag has quit IRC
19:16 *** myoung|biab is now known as myoung
19:20 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries  https://review.openstack.org/565564
19:21 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries  https://review.openstack.org/565564
19:27 <openstackgerrit> Fatih Degirmenci proposed openstack-infra/zuul master: Add additional steps for configuring Nodepool service on CentOS 7  https://review.openstack.org/564950
19:28 *** hashar has joined #zuul
19:52 <SpamapS> you know..
19:52 <SpamapS> I love ansible dearly
19:52 <SpamapS> but sometimes jinja makes me have violent mood swings.
20:00 *** hashar has quit IRC
20:01 <mordred> SpamapS: yah
20:18 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul-jobs master: Add ansible_hostname to /etc/hosts entries  https://review.openstack.org/565564
20:25 *** pwhalen has quit IRC
20:27 *** pwhalen has joined #zuul
20:27 *** pwhalen has joined #zuul
20:36 *** myoung is now known as myoung|biab
20:41 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Fix setting a change queue in a template  https://review.openstack.org/565581
20:45 *** acozine1 has quit IRC
21:00 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Fix regex project templates  https://review.openstack.org/565584
21:06 *** ssbarnea_ has quit IRC
21:54 *** myoung|biab is now known as myoung
22:25 <clarkb> corvus: https://review.openstack.org/#/c/565584/1 question on that change
23:08 *** swest1 has quit IRC
23:12 <corvus> clarkb: yeah... that's -1 worthy :)
23:13 <clarkb> done
23:18 *** acozine1 has joined #zuul
23:23 *** swest has joined #zuul
23:25 <openstackgerrit> James E. Blair proposed openstack-infra/zuul master: Fix regex project templates  https://review.openstack.org/565584
23:25 <corvus> clarkb: i think i deleted the right things this time :)
