Wednesday, 2017-07-05

*** jkilpatr has quit IRC00:47
*** bhavik1 has joined #zuul05:27
*** bhavik1 has quit IRC05:31
*** bhavik1 has joined #zuul06:21
*** bhavik1 has quit IRC06:24
*** yolanda_ has joined #zuul07:01
*** hashar has joined #zuul08:13
*** bhavik1 has joined #zuul09:21
*** bhavik1 has quit IRC10:14
*** jkilpatr has joined #zuul11:19
*** hashar has quit IRC11:42
*** hashar has joined #zuul11:51
*** dkranz has quit IRC12:32
*** jkilpatr has quit IRC13:11
*** jkilpatr has joined #zuul13:11
*** jkilpatr has quit IRC13:17
*** dkranz has joined #zuul13:26
*** jkilpatr has joined #zuul13:30
openstackgerritAndreas Scheuring proposed openstack-infra/nodepool master: Use POST operations for create resource  https://review.openstack.org/48060114:10
openstackgerritMerged openstack-infra/nodepool master: Remove duplicate python-jenkins code from nodepool  https://review.openstack.org/25915714:23
mordredjeblair, clarkb: I'm reviewing AJaeger's mitaka cleanup patches and it made me think about a thing we'll need a good story for in zuul v3 ...14:37
jeblairi do like a good story14:37
mordrednamely - we'll want, at some point, to be able to have either a job or a utility to verify what jobs in the zuul config reference a given image name14:37
jeblairmordred: web api may be a good choice for that14:38
mordredas once we have .zuul.yaml job configs - configs that reference, say "ubuntu-trusty" would become sad when infra stops building such a node - but it'll be hard for infra to be able to be proactive on upgrades since we won't know14:38
mordredjeblair: agree14:38
mordredjeblair: I figured we should just capture the use case somewhere so that we remember to implement it before we delete ubuntu-xenial :)14:39
mordredjeblair: might also be neat for us to be able to add a "deprecated" flag to an image so that zuul could report use of a deprecated image when it runs jobs using one too14:39
jeblairmordred: we can also see when nodepool provides nodes, so we can see what active jobs are using particular image types14:41
mordredjeblair: yup14:42
mordredjeblair: I think we've got good tools at our disposal14:42
clarkbswitching behavior to fail instead of queue indefinitely might also help. At least this way you get feedback immediately on any issues which can be corrected15:04
mordredclarkb: ++15:34
mordredclarkb: also will be important even for normal cases to catch config errors15:35
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation  https://review.openstack.org/46332815:49
jlko/15:58
* rcarrillocruz waves16:08
rcarrillocruzhey folks16:08
rcarrillocruzplaying with multiple networks per node in nodepool16:08
rcarrillocruzhowever i get OpenStackCloudException: Error in creating instance (Inner Exception: Multiple possible networks found, use a Network ID to be more specific. (HTTP 409) (Request-ID: req-bd4ea6d2-9c97-487f-82bd-2b0e0540ff06))16:09
rcarrillocruznot sure if it's because nodepool doesn't know which network to associate floating IP on16:09
rcarrillocruzlooking at docs i don't see a param to say 'associate floating IP to this network IPs' or something like that16:09
rcarrillocruzShrews: mordred ^16:09
jlkrcarrillocruz: it's probably deeper than that16:10
jlkrcarrillocruz: are you specifying which network to use for the private address?16:10
jlkOpenStack doesn't really have a server side way to say "this is the default, pick it if somebody doesn't specify"16:10
clarkbright the only time there is a default is when you have a single network16:11
clarkbthen neutron just uses that because there is no other option. If there is more than one network you have to specify which to use16:11
jlkrcarrillocruz: I see that message usually when doing the initial boot, and you have more than one network available for the private address.16:12
rcarrillocruzbut where in the yaml16:12
rcarrillocruzin the label?16:12
rcarrillocruzi don't see that in the docs16:12
clarkbrcarrillocruz: no, under the cloud provider16:13
clarkbrcarrillocruz: there are examples in infra's nodepool.yaml16:13
clarkb(osic does this iirc)16:13
clarkbhttps://docs.openstack.org/infra/nodepool/configuration.html#provider is where it is documented16:15
dmsimardSilly question16:17
dmsimardAt what point does an RFE become a spec from a story ? When someone picks it up and it's ready to be discussed/worked on basically ?16:18
rcarrillocruzi don't  see in prod nodepool providers with multiple networks16:18
rcarrillocruzand yes16:18
rcarrillocruzthat's exactly what i have16:18
rcarrillocruzin the provider16:18
rcarrillocruzmultiple networks defined there16:18
clarkbrcarrillocruz: osic16:18
rcarrillocruzbut nodepool doesn't like it it seems16:18
jeblairdmsimard: we usually only write specs if we want to make sure we have agreement on something before starting work on it16:18
jeblairdmsimard: we tend to do it for larger or more complex efforts16:19
clarkbrcarrillocruz: also tripleo iirc16:19
dmsimardjeblair: makes sense, just wanted to make sure16:19
rcarrillocruzclarkb: looking at nodepool.openstack.org osic clouds have only one network16:21
rcarrillocruz      - name: 'GATEWAY_NET_V6'16:21
clarkbrcarrillocruz: the provider only has one network but the cloud has multiple. That is the way we select which of the multiple networks we want to use because neutron will not default to one of them for us16:21
clarkbrcarrillocruz: that is also a list so you can provide more than one if you want more than one16:22
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592816:24
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758516:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758716:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758616:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758816:27
dmsimardjlk, jeblair: I created a story from a discussion a while back since it was a topic of discussion again in RDO's implementation https://storyboard.openstack.org/#!/story/200110216:29
jlkdmsimard: can you document the data flow? Where does the whole thing get started from? Seems like you still need an event to wake up the scheduler, which would then use the script as a pipeline filter?  or do you just imagine the scheduler running the script every X seconds in a loop, or???16:32
dmsimardjlk: I'm not particularly familiar with zuul internals so it's not trivial for me to express it in proper terms :)16:33
jlkah.16:33
jlkbut generally?16:33
dmsimardjlk: technically, using the term filter would probably be more appropriate than trigger16:34
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758916:34
dmsimardat least in our use case, it would be a filter16:34
jlkPipelines are event driven, because basically the system sits idle until some event kicks it into gear. Be it an event from gerrit, or github, or a timer.16:34
jlkthat event is processed to determine which project it relates to16:35
jlkand then the scheduler looks at the event and matches it up to the triggers for pipelines, to complete an event + project + jobs + pipeline mashup16:35
jlkwhen it's doing that, it looks at either event filters ( limitations on the triggers themselves ), or pipeline filters ( higher level restrictions on the pipeline ).16:37
dmsimardRight, so perhaps what I'm asking for is something that doesn't exist yet -- the notion of a pipeline filter, and then fold the existing branch/files filtering at the job-level under a proper filter key where more things (such as the arbitrary script I'm speaking of) could be added16:37
jlkfor instance, a github event may be a comment event of "recheck", but the gate pipeline may have a pipeline filter (requirement) that there be enough votes. So the trigger matches, but the pipeline requirement filter blocks it.16:38
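
A minimal sketch of the two layers described above -- the trigger that wakes a pipeline up and the pipeline requirement that can still block enqueuing. The stanza is hypothetical: the connection name, event keys, and requirement keys are illustrative rather than copied from a real Zuul config.

    import textwrap
    import yaml

    # Hypothetical pipeline stanza: the trigger decides which events wake the
    # pipeline, while the requirement is an extra condition checked before a
    # change is enqueued.  Key names are illustrative, not a real Zuul config.
    pipeline = yaml.safe_load(textwrap.dedent("""
        - pipeline:
            name: gate
            trigger:
              my-github:                 # assumed connection name
                - event: pull_request
                  action: comment
                  comment: (?i)^recheck$
            require:
              my-github:
                review:
                  - type: approved       # "enough votes" style requirement
    """))

    print(pipeline[0]['pipeline']['name'])   # -> gate
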
jlkdmsimard: if I understand you, then you still want the initial wake up to be an existing trigger, like a gerrit change or comment16:39
dmsimardyes.16:39
jlkbut then you want as a pipeline requirement the ability to run an arbitrary script to make a Just In Time decision on whether to enqueue or not16:39
dmsimardYes. Ultimately I think our need is for that mechanism to be at the job definition layer (second example in the story)16:40
dmsimardBut I feel that could slow things down rather quickly16:41
dmsimardespecially at scale16:41
jlkyeah, since that definition can live _in_ the repo in question16:41
jlkso it's a clone/merge, re-read configuration16:41
jlkyeah eww, that kind of leaks pipeline config (which is supposed to only be from trusted repos) into untrusted repos16:42
dmsimardjlk: to re-iterate the context, it's basically a way to do a "freeform" filter16:43
jlkwhere would you picture this script running, with what context?16:43
dmsimardjlk: in JJB terms, add a shell builder to decide whether to continue or not very early on in the process :p16:43
dmsimardjlk: that last statement was not in reply to your question, hang on16:44
dmsimardjlk: thinking outside the box, perhaps I'm looking at this from the wrong angle16:45
dmsimardjlk: how about being able to dynamically trigger jobs from a job ?16:45
dmsimardthe filtering and triggering logic could live inside that job and that job would trigger jobs based on what it needs16:46
jlkwe have a job hierarchy thing16:48
jlkYou can have an "early" job on a pipeline, and a number of jobs that "depend" on that job16:48
jlkso if your early job succeeds, the others go. If not, they do not16:48
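
As a hedged illustration of that hierarchy, here is a hypothetical project stanza in which the integration jobs declare a dependency on an early job and therefore only run if it succeeds (the project and job names are invented):

    import textwrap
    import yaml

    # Hypothetical .zuul.yaml project stanza: dependent jobs only run once
    # "early-job" has succeeded.  Names are invented for illustration.
    project = yaml.safe_load(textwrap.dedent("""
        - project:
            name: example/repo
            check:
              jobs:
                - early-job
                - integration-job-a:
                    dependencies:
                      - early-job
                - integration-job-b:
                    dependencies:
                      - early-job
    """))

    for job in project[0]['project']['check']['jobs']:
        print(job)
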
mordreddmsimard: can you give an example of what userscript.sh might do?16:48
jlklet me see if I can pull up the docs (for v3) on this16:49
dmsimardjlk: yeah, but (as discussed last time), this parent job might have any of a dozen jobs to trigger -- for one patch we might have three jobs to trigger, not 12, and for another patch maybe we have 516:49
dmsimardmordred: curl weather.com and if weather is good trigger job A B C, if weather is bad trigger job C D E ?16:50
jlkthis is where roles/playbooks come in16:50
mordreddmsimard: yah - also what jlk says about hierarchies may be helpful - but a few things are quite different in v3, so it's possible that the things you're wanting to accomplish are structured differently16:50
mordreddmsimard: I mean - can you give me a non-invented example?16:50
mordreddmsimard: (trying to wrap my head around the use case and I think a real example problem would help a lot here)16:51
rcarrillocruzthe issue was that i had networks section on provider16:51
rcarrillocruzand cos i have a pool16:51
rcarrillocruzit had to be put at that level16:51
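
For reference, a hypothetical nodepool fragment with the shape described above: the networks list sits on the pool, naming which of the cloud's several networks to attach, so the launcher does not hit "Multiple possible networks found". The network name echoes the one quoted earlier; the provider, cloud, and label names are invented.

    import textwrap
    import yaml

    # Hypothetical nodepool launcher fragment; only the keys relevant to the
    # networks discussion are shown.
    config = yaml.safe_load(textwrap.dedent("""
        providers:
          - name: example-provider
            cloud: examplecloud
            pools:
              - name: main
                networks:
                  - GATEWAY_NET_V6
                labels:
                  - name: ubuntu-xenial
    """))

    print(config['providers'][0]['pools'][0]['networks'])
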
mordreddmsimard: it's also worth noting that jobs in v3 are in ansible and also support multiple hosts - so it's entirely possible that this is a "this logic should just be in an ansible playbook" case16:52
dmsimardmordred: We have this thing in RDO called rdoinfo which is more or less a database that maps upstream git repositories to rpm package names (amongst other things). This project spans all OpenStack releases and the jobs we need to trigger depends on the release that is being modified.16:52
dmsimardmordred: https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml16:52
mordreddmsimard: looking/reading16:52
mordreddmsimard: and thanks16:52
dmsimardmordred: We've currently hacked together something rather awesome but also pretty ugly where we have a job that knows what jobs to trigger and with what parameters and triggers them remotely (on a remote jenkins instance where generic parameterized jobs exist) and we then poll the jobs until they finish to get their status16:53
mordreddmsimard: is that rdo.yml consumed by anything _other_ than routing how builds work?16:54
dmsimardmordred: https://review.rdoproject.org/jenkins/job/weirdo-validate-buildsys-tags/98/consoleFull16:54
dmsimardmordred: it's consumed by various RDO tooling to decide what is built and how it is built16:55
dmsimardmordred: basically, when we, for example, bump upper-constraints https://review.rdoproject.org/r/#/c/7403/3/rdo.yml we'll run integration jobs16:56
dmsimardwhich will build the packages that changed with that patch taken into account and then run integration jobs with those newly built packages16:56
dmsimardthat particular upper-constraints patch is sort of an easy case because it only touches one release, the problem comes when it spans different releases16:57
mordreddmsimard: I mean - I ask because it seems like the problems you're solving with it are the same problems that zuulv3 is trying to solve - which is not to say that the answer for your problem is just already in v3 - but the problem space is SO similar that I think it's going to take us a non-simple amount of time to untangle which bit should grok and understand what16:57
mordreddmsimard: which is my way of saying - I think I understand the problem space, as least in an initial way, and I think we'll need to understand a bit more deeply to be able to work with you on an appropriate solution16:58
dmsimardmordred: sure -- I'm an end user after all :P I'm pretty sure the use case is useful but can be addressed in different ways. Happy to discuss it further.16:59
mordreddmsimard: awesome. I'm certain the use case and problem domain are important- so I definitely want to make sure we find an answer17:00
*** hashar has quit IRC17:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759217:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759117:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759017:38
jlkSo I think I need an alternative to curl for throwing json at my zuul17:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759317:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902017:42
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759417:42
mordredjlk: I tend to use python requests in the python repl for things like that17:45
jeblairjlk, mordred: ^ that's the docs stack with most comments addressed; starting at 46332817:45
mordredjeblair: woot17:45
jlkthanks!17:46
jeblairi addressed comments without performing a rebase for easier diffing17:46
jeblairi'm going to perform the rebase pass now, and then there are a couple things to change after the rebase (using the new config getter)17:46
mordred++17:47
jeblairthen let's land it and be done :)17:47
jlkI'll trade you for reviews on depends-on and fixing reports on push events :D17:47
mordredjeblair: agree. I think it'll be much more useful for us to have docs and iterate on them as we find things17:47
jeblairjlk: deal17:47
jlkmordred: okay, doing it with requests in a python3 interpreter worked, something is just screwy with my curl binary it seems :(18:00
mordredjlk: well- that's "good news"18:01
jlkseeing if brew has a newer curl18:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759318:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759218:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759418:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758518:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758718:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation  https://review.openstack.org/46332818:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758618:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758918:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758818:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759118:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759018:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902018:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592818:01
jeblairmordred, jlk: ^ that's the rebase -- the remaining suggestions about using get_config were actually the conflicts, so i had to fix those during the rebase.  should mean the stack is ready now.18:02
jlkoooh weird. It worked to get the event in, but then it tried to make an error somewhere?18:04
jlkzuul-scheduler_1     | DEBUG:paste.httpserver.ThreadPool:Added task (0 tasks queued)18:05
jlkthen a 400 bad request18:05
jlkor... or I did something weird? I don't know.18:05
mordredjlk: it's going to  make an outbound call to github to fill in some data, right?18:05
jlkit did all of that18:05
mordredjlk: is the data in your payload data that it can effectively make outbound calls for?18:05
mordredoh. weird18:06
jlkI'm going to repeat and see if I'm really seeing this or not18:06
mordredkk18:06
mordredjeblair: https://review.openstack.org/#/c/478265/ is ready btw18:06
mordredjeblair: or, I say that- lemme go ahead and add a zuul patch depending on it real quick18:06
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs  https://review.openstack.org/48069218:08
mordredjeblair: let's see how badly that breaks :)18:08
mordredoh. piddle18:08
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs  https://review.openstack.org/48069218:08
jlkokay I can't repeat the error when using requests.18:09
mordredjlk: when you're curling - are you doing: -H "Content-Type: application/json; charset=UTF-8" ?18:12
jlkcurl -H "Content-Type: application/json" -H "X-Github-Event: issue_comment" -X POST -d @pull-comment.json http://localhost:8001/connection/github/payload18:12
jlkwhich worked when the recipient was python2 based, but fails with python318:13
jlknow I"m wondering if the saved .json file is just in a format that python3 won't like18:13
mordredjlk: does adding "; charset=UTF-8" to your content type header have any effect?18:14
* mordred grasping at straws to explain18:14
mordredbut I'm honestly curious to know what the problem actually is18:14
jlkit hasn't, no. Because that doesn't change /how/ curl sends things, it just adjusts what it tells the other end18:14
jlkI'm going to do something more fun, curl from a Fedora docker.18:15
mordredjlk: awesome18:16
mordredjlk: also, while you're grasping at straws: python3 that: http://paste.openstack.org/show/614493/18:19
jlknod, I went through the dance of writing the file as utf8 via vim18:20
mordred:)18:20
jeblairhrm, looks like there's a problem in the docs stack about halfway down; looking18:21
jlkoh d'oh. Gotta link my docker networks together.18:21
jlknope, that didn't do it.18:24
mordredjlk: like - it still broke?18:27
jlkyeah, I re-wrote the file using your snippet and used curl in Fedora docker to send the file up18:27
jlkstill get the immediate traceback18:27
jlkugh, and using --data-binary instead of just -d   didn't work either18:30
mordredjlk: that's super crazy18:31
mordredjeblair: not-a-minus-one - you added mention of allow-secrets in the secrets section - but it's not actually listed in the pipeline config section18:32
SpamapSjlk: and your only log in zuul-scheduler is 400?18:37
jlkSpamapS: no, it's a traceback out of webob18:37
jlkzuul-scheduler_1     | TypeError: a bytes-like object is required, not 'str'18:37
jlklooks like if I do the json manually in-line with curl it doesn't choke18:37
jlkso it's something about how curl is reading it from the filesystem18:37
SpamapSweird18:38
jlkvery18:40
jlkGuess I'll just build myself a tiny python script to do this.18:43
jlkStill no idea where the bug is, but my little utility is working fine.19:10
jlkhttps://github.com/j2sol/z8s/blob/master/sendit.py19:11
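
For comparison, a minimal sketch of the same idea: post a saved GitHub-style payload to a local Zuul webhook endpoint with python-requests. The URL, file argument, and headers mirror the curl command quoted earlier; everything else is an illustrative assumption, not jlk's actual sendit.py.

    #!/usr/bin/env python3
    # Minimal curl replacement for throwing a saved JSON payload at a local
    # Zuul github connection.  Endpoint and headers mirror the curl command
    # above; argument handling and defaults are illustrative assumptions.
    import json
    import sys

    import requests

    URL = 'http://localhost:8001/connection/github/payload'


    def send(path, event='issue_comment'):
        # Read and re-serialize the payload so we always send valid UTF-8
        # JSON, sidestepping the str/bytes confusion seen with curl above.
        with open(path, encoding='utf-8') as f:
            payload = json.load(f)
        headers = {
            'Content-Type': 'application/json; charset=UTF-8',
            'X-Github-Event': event,
        }
        resp = requests.post(URL, json=payload, headers=headers)
        print(resp.status_code, resp.text[:200])
        return resp


    if __name__ == '__main__':
        send(sys.argv[1], *sys.argv[2:3])
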
dmsimardIn the context of optimizing resources (for higher concurrency rather than speed), how would you deal with more than one flavor properly inside a nodepool tenant ?19:25
dmsimardJust set max-servers for each node type in a ratio that makes sense ?19:25
clarkbdmsimard: currently max-servers is per provider so you could do that but would have to split up logical providers per resource19:26
dmsimardI know upstream doesn't really deal with this since the flavor is uniform with 8vcpu/8gb ram etc19:26
clarkbs/resource/flavor/19:26
clarkbwe've done the logical provider max-servers mapping onto cloud resources in the past to limit the number of instances per network/router so it does work but not sure how usable it is on the flavor side16:27
dmsimardclarkb: oh you're right max-servers is at the provider level, I was thinking of min-ready19:27
mordredjeblair: I have +2'd the first three, but have left comments on them too19:27
dmsimardclarkb: I guess what I'm looking for is a "max-servers" at the label layer or something.19:28
clarkbdmsimard: ya you can do that with the logical provider hack19:29
clarkbdmsimard: we did it with hpcloud to control how many instances ended up on each network/router19:29
dmsimardclarkb: does an example of this exist somewhere ?19:30
clarkbdmsimard: in the way way back history of our nodepool.yaml from when hpcloud was in it19:30
dmsimardok, I'll try and look when I have a chance. Thanks :)19:30
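
The "logical provider" hack reads roughly like the sketch below: two provider entries point at the same cloud, each with its own max-servers, so one class of nodes can be capped independently of another. The names, flavors, and numbers are invented; the real example lived in the old hpcloud-era nodepool.yaml history clarkb mentions.

    import textwrap
    import yaml

    # Illustrative only: two logical nodepool providers backed by the same
    # cloud, each with its own max-servers cap.
    config = yaml.safe_load(textwrap.dedent("""
        providers:
          - name: examplecloud-small     # hypothetical
            cloud: examplecloud
            max-servers: 80
          - name: examplecloud-large     # hypothetical, same cloud/tenant
            cloud: examplecloud
            max-servers: 20
    """))

    for p in config['providers']:
        print(p['name'], p['max-servers'])
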
jeblairdmsimard: remember all this is different in nodepool v3 because allocation is handled differently.  things should work better with much less tuning; the only thing that might warrant tuning is min-ready.19:40
jeblairmordred: thanks; i'll add another patch for allow-secrets19:41
jeblairjlk: feel free to stick it in zuul/tools if it's generally useful19:41
clarkbjeblair: max-servers is still problematic (tobiash ran into this) because you can boot 50 "large" instances which could put you over cpu or memory or disk etc quota then try to boot 1 "small"19:42
clarkbinstance and nodepool thinks it is ok because max servers is say 10019:43
clarkbthis is a general problem in nodepool reducing multi dimensional quota to a single factor19:43
jeblairclarkb: ah yes.  we do need to rid ourselves of max-servers.  hopefully we can figure out how to use openstack's quota api :)19:43
mordredjeblair: we support it shade-side19:44
jeblair(though we probably want to keep supporting max-servers but also add max-ram and max-cpu)19:44
mordredjeblair: the biggest problem is that sometimes the clouds lie/don't report it :(19:44
jeblairfor folks who may have higher quota than they want to use/pay for19:44
clarkbyes ^19:44
mordredjeblair: BUT - I think if we just allowed people to set max-ram / max-cpu / etc like you suggest19:44
clarkbbut also the simplification isn't all bad, openstack quotas are a mess and we mostly don't worry about them because we simplified19:45
mordredwe could get close enough - the info is provided in the flavors19:45
jeblairmordred: do you think the quota api is useful enough for us to do both?  ie, if none of max-{ram,cpu,servers} is unlimited in the nodepool config, can we automatically query openstack to find out what the max is?  or is that dangerous enough we need to make it opt-in or opt-out?19:45
mordredjeblair: I think we could totally query the quota api - and then provide override values for folks with non-sane quota apis19:46
mordredjeblair: or for folks who want to use less than the entire quota on their project19:46
jeblairmordred: that seems like a reasonable approach. yeah, i guess it works for both cases.19:46
mordredbut I also agree - max-servers for the simple case seems like a nice thing to keep19:47
clarkbgranted any resource limitation tool with 20 axes is going to be a complicated mess :)19:48
clarkbinstances, cpu, ram, disk, ports, floating ips, networks, subnets, routers, volumes, volume size, volume aggregate, and I am sure I am forgetting a bunch19:48
clarkboh volumes per instance19:49
clarkbclouds should just be free and infinite19:49
jeblairheh, real clouds pretty much are.  i think the metaphor just broke.19:49
*** hashar has joined #zuul19:49
mordred++19:50
mordredI think maybe let's start with instances, cpu, ram and maybe disk - since those are all reported in flavors19:51
mordredand we can consider fips and volumes later19:51
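
A rough sketch of what flavor-based accounting could look like: instead of a single max-servers knob, cap several axes at once using the vcpus and RAM the flavor itself reports. The data structures and numbers are illustrative assumptions, not nodepool's real model or API.

    # Return True if launching one node of `flavor` would exceed any of the
    # configured limits; a missing limit means "unlimited" on that axis.
    def would_exceed(limits, in_use, flavor):
        checks = [
            ('servers', 1),
            ('cores', flavor['vcpus']),
            ('ram_mb', flavor['ram_mb']),
        ]
        for axis, cost in checks:
            limit = limits.get('max_' + axis)
            if limit is not None and in_use[axis] + cost > limit:
                return True
        return False


    # Example: 95 servers / 380 cores already in use, 8-vcpu 8 GB flavor.
    print(would_exceed({'max_servers': 100, 'max_cores': 384},
                       {'servers': 95, 'cores': 380, 'ram_mb': 760 * 1024},
                       {'vcpus': 8, 'ram_mb': 8192}))
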
jeblair"we talkin' cloud like seattle-cloud?  or yuma-cloud?  'cause they ain't the same thing."19:52
mordredheh19:54
mordredjeblair: feature request related to "always run post jobs"19:58
jeblairwow.  in a very morissettian development -- the "cloud museum" is 10 miles from yuma -- the least cloudy city in america: http://cloudmuseum.dynamitedave.com/19:58
mordredjeblair: if a job hits retry_limit - it sure would be nice to get the logs from the last failure19:58
jeblairmordred: i think that should happen19:59
jeblairdid we land that change yet?19:59
clarkbwouldn't you get them under $uuid?19:59
clarkbso it is just a matter of reporting where they can be found?19:59
jeblairclarkb, mordred: yes, logs from all the attempts should be stored, then i would expect the final log url to be used in the report20:00
mordredah- that may be it - https://review.openstack.org/#/c/480692/ ran new jobs - but no url20:00
jeblairhuh, wonder why no url?20:01
jeblairat any rate, i suspect *that's* the bug here20:01
mordredhttp://logs.openstack.org/92/480692/2/check/e2e1435/job-output.txt20:02
mordredyes - output was logged and uploaded20:02
jeblairokay, i've got 2 things on my stack right now; i can look at that a bit later if someone doesn't beat me to it20:02
jlk'retry_limit' is our version of Nova's "no valid host found".20:07
jeblairjlk: :( yes20:07
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758920:12
* jlk lunches20:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759320:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759220:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902020:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759420:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759120:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759020:15
tobiashhi, regarding quotas20:18
tobiashwhat if we let nodepool just run into quota and pause allocation for a while if it gets a quota-related error?20:19
clarkbtobiash: I am not sure you can reliably detect quota errors?20:19
tobiashmy nodepoolv2 works quite nice when running into the quota20:19
clarkbthings will fail but who knows why? that is something that is testable though20:20
tobiashmy nodepoolv3 not so nice...20:20
tobiashclarkb: at least the logs say something about quota in the exceptions20:20
tobiashso I think it should be detectable (not sure how easy)20:20
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets  https://review.openstack.org/48072620:21
tobiashletting nodepool run into quota gracefully would also reduce the need for synchronization of several nodepools in the same tenant20:26
clarkbtobiash: I think the intent is that that already happens because the next provider should attempt to fulfill the request. The missing bit is some sort of coordination to back off once you hit the limit so that you aren't constantly trying when you know you are likely to fail?20:27
jeblairclarkb: the current algorithm assumes that it knows quota; so i think we either need to facilitate the provider "knowing quota correctly" (whether that's api calls and/or config), or change the algorithm to accommodate what tobiash is suggesting20:28
jeblairof course with multiple nodepools in the same tenant, things get a little hairy -- you can, but probably don't want to, solve that with config (set each to 1/2 quota? yuck)20:29
tobiashjeblair: I think running gracefully into quota could be easier and more robust (considering several nodepools running in the same tenant)20:29
jeblairapi calls can improve the situation, but only if you make them before each request, and i wasn't thinking of doing that (that would be a big hit for us)20:30
jeblairso gracefully handling quota errors is probably the best solution to that particular case (and happens to help with some others as well)20:30
tobiashnodepoolv2 had no problem running into quota (apart from stressing the openstack api, but that could be solved)20:30
tobiashwith nodepoolv3 and running into quota I get NODE_FAILURE errors as job results20:31
jeblairtobiash: handling it gracefully gets fairly complicated though.  what if nodepool-A is never able to create the nodes it needs because of nodepool-B?20:31
tobiashjeblair: at some point in time it will20:32
jeblairin other words, the algorithm handles starvation very well currently, but only if it's the only occupant in the tenant (or, at least, its max-servers has been set up so that it doesn't run into other occupants)20:33
tobiashjeblair: but I think if it runs into a quota error it should behave as if it had never tried to fulfill the node request20:33
*** dkranz has quit IRC20:34
tobiashjeblair: could nodepool just unlock the node request in this case (without decline) and retry (or let another nodepool retry) the request?20:34
jeblairtobiash: yeah, that's the minimal necessary modification to the algorithm to handle that.  it's probably the best thing to do.  but i do think we need to make sure we know the caveat that in the case where nodepool can not utilize its full expected quota, starvation can result.20:34
jeblairtobiash: yep20:35
jeblairtobiash: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#proposed-change20:37
tobiashjeblair: so what kind of starvation do you think of (assuming both nodepools are working for the same zuul)?20:37
jeblairtobiash: actually, the best change might be to remain in step 4 until it can be satisfied (in other words, continue trying to create new servers for step 4 despite quota errors).20:38
jeblairtobiash: (and also treating a quota failure in step 5 as transitioning that to a step 4 request).20:39
tobiashjeblair: I think that could block the job until the quota errors are gone even if another provider in a different tenant could satisfy it?20:40
tobiashjeblair: I think I didn't understand some other part of this regarding step 320:41
tobiashjeblair: scenario: one nodepool, request > quota -> declined -> node allocation failed -> job fails with NODE_FAILURE20:42
tobiashjeblair: is my assumption correct?20:42
jeblairtobiash: correct.  to clarify: "request > quota" doesn't mean "request > available_nodes" -- it truly means "my quota is set at 5 nodes and this is a request for 10 nodes, so i can never satisfy it under any conditions"20:43
tobiashjeblair: ah, misunderstood that then20:44
jeblairtobiash: the algorithm will actually work well, even with two nodepools, if it knows the correct value of "available nodes".  it's just that it can't do so without extra api calls.  but we can approximate that by treating a surprise out-of-quota error as if we were in step 4 and were just waiting on available nodes (regardless of whether we were in step 4 or step 5)20:46
tobiashjeblair: ok, that would work I think20:46
jeblairtobiash: and yes, as soon as you handle a request in step 4, you may not be running optimally, because another provider might be able to handle that request.  *however*, that other provider *will* handle the next request.20:47
jeblairtobiash: the occasional delay in fulfilling a single request when a provider hits its quota limit is the price we pay in exchange for not having to communicate between providers.20:47
tobiashjeblair: that was the second thing I observed20:48
jeblairtobiash: we can, in the future, extend the algorthm to communicate information between providers (we are using zookeeper after all).  but we wanted to keep things simple for this first attempt.20:48
jeblairtobiash: (and even if we do that, i doubt we would communicate between different nodepool instances sharing a tenant, or even nodepool sharing with something else, so we may still need to keep some of the step-4 behavior)20:49
tobiashjeblair: scenario: provider1 almost at quota, provider2 with free capacity, provider1 grabs node request, blocking due to running into quota (max_servers) where provider2 would have been able to serve the request with already existing nodes20:49
jeblair(basically, the change would be: step 3.5: if request > available nodes for this provider and request < available nodes for another provider, skip)20:50
jeblairtobiash: correct20:50
tobiashjeblair: that would do20:51
jeblairtobiash: i'd follow your scenario up with: provider2 grabs next node request and continues, while provider1 stops handling further requests20:51
tobiashjeblair: maybe it also makes sense to skip instead of block when running into quota (regardless of whether it's surprising or calculated)20:51
jeblairtobiash: if we always skip and we don't have the extra information about other providers, then the request sits indefinitely.20:52
jeblairtobiash: that's bad for large requests -- smaller ones will always starve them out20:52
jeblairtobiash: (remember that at load, we're usually only 1 node away from quota, so we would only satisfy one-node requests)20:53
tobiashjeblair: right20:54
tobiashjeblair: third scenario: provider1 has 0 nodes ready, provider2 has 3 nodes ready, both not at quota20:56
tobiashjeblair: what I observed was that often provider1 took the node request spawning a new node where provider2 would have allocated it directly20:56
tobiashjeblair: -> job start penalty of a minute20:57
jeblairwhoopsie20:57
tobiashjeblair: so a step 3.5 could be to check for ready nodes in other providers and skip, to give them a chance20:57
jeblairtobiash: that can also probably be solved with something like the step3.5... exactly :)20:57
tobiashjeblair: so the plan could be: add 3.5 with ready node check and block in 4 until nodes could be spawned by gracefully handling quota errors?21:00
tobiashjeblair: (that would of course be two changes)21:01
jeblairtobiash: yeah.  i think the change to 4 should happen first, then 3.5 later (because it's introducing a new layer of complexity to the algorithm).21:02
tobiashjeblair: agreed, 4 fixes an issue, 3.5 is optimization21:03
tobiashjeblair: will try that next week21:03
* tobiash is on a workshop marathon this week21:03
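
A very rough sketch of the "gracefully run into quota" behaviour being discussed: if a launch fails with a quota error, keep waiting for capacity (step 4) instead of declining the request and failing the job with NODE_FAILURE. QuotaExceeded, launch_node, and the request object are placeholders for illustration, not nodepool's real API.

    import time


    class QuotaExceeded(Exception):
        """Placeholder for whatever quota error the cloud layer raises."""


    def handle_request(request, launch_node, retry_delay=30):
        nodes = []
        for label in request.node_types:
            while True:
                try:
                    nodes.append(launch_node(label))
                    break
                except QuotaExceeded:
                    # Out of quota, possibly because another launcher shares
                    # the tenant: pause and retry rather than declining.
                    time.sleep(retry_delay)
        request.fulfilled(nodes)
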
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs  https://review.openstack.org/48074021:06
mordredjeblair: ok - I've +2d most of the docs stack - there's one -1 in the middle cause it's code related - for other things there may be comments with the +221:10
mordredjeblair: also, I think we could deal with the -1 as a followup if you prefer21:10
jeblairmordred: ok.  as i'm going through this, a lot of your suggestions are good but i'm not going to write all of them.  some i will leave for you and others.  :)21:13
jeblairmordred: i don't want you to think i'm ignoring them, or don't like them.21:14
jeblairmordred: just that i want to land this so that we can all take shared ownership of docs.  :)21:14
mordredjeblair: oh - yah - I mostly just wanted to write them down somewhere so we didn't completely lose them21:14
mordredjeblair: I totally agree about landing this as it is21:14
jeblairmordred: does "scalable component" -> "scale-out component" help address your concerns?  (i'm also adding other words, just wondering if that's a better capsule description)21:17
mordredjeblair: I think so? I think scalable component could describe an Oracle database on a very large piece of hardware21:18
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592821:36
jeblairmordred, jlk: ^ updated patch #221:37
jeblairmordred, jlk: i've replied to your comments on patches 1 and 2 now.21:37
mordred++21:38
mordredlet's get these landed - SpamapS, feel like landing a couple of docs patches?21:39
SpamapSmordred: I'll start reviewing them now.21:44
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759321:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759221:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs  https://review.openstack.org/48074021:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759421:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758521:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758721:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758621:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758921:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758821:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759121:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759021:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets  https://review.openstack.org/48072621:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902021:45
mordredSpamapS: awesome. thanks! in general aiming for fixing stuff in followups where possible, fwiw21:45
jeblair(cause otherwise ^ that happens and it's not fun)21:46
*** hashar has quit IRC21:50
SpamapS+3'd a lot so far21:51
SpamapSSo I may have gotten out of sync w/ that last rebase21:51
SpamapSleading to a possible <shock> +3 without two +2's21:52
SpamapSUnfortunately, I have to run to a 15:00 appt. bbiab21:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_url to webapp config section  https://review.openstack.org/48075921:53
jeblairmordred: i aimed for the [scheduler] section but missed and ended up in [webapp].21:53
jeblairmordred: okay, i'm now waiting for v+1 and w+3s to roll in -- is there job stuff waiting for me to look at?21:57
mordredjeblair: no - I added jobs to zuul but then they failed22:06
mordredjeblair: the issue at hand is:22:06
mordred2017-07-05 18:17:01.821275 | ubuntu-xenial | + sudo service mysql start22:06
mordred2017-07-05 18:17:02.033407 | ubuntu-xenial | Failed to start mysql.service: Unit mysql.service not found.22:06
jeblairsomething amiss in bindep land?22:06
mordredjeblair: I have not yet dug in to why that's working for tox-py35 and not for zuul-tox-py3522:07
mordredjeblair: I'm guessing22:07
mordredjeblair: oh - it's a sequencing22:08
jeblairjlk: adam_g left a comment on ps5 of https://review.openstack.org/474401  does it still apply with the change to pr comments rather than commit messages?22:09
jeblairmordred: cool; i'll dig into that post url thing now22:09
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs  https://review.openstack.org/47826522:09
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add a test to verify push reports only set status  https://review.openstack.org/47640022:10
mordredjeblair: cool - the new job was running extra test setup before bindep (whoops)22:10
jlkjeblair: so what Adam is saying is that what I have in change 474401 is fine, it is not broken in the way that change 476286 addressed.22:10
jeblairjlk: okay cool, sorry i got twisted around.  :)22:11
jlkI had to re-read it myself :D22:11
mordredjeblair: the zuul-* jobs are passing on zuul except for the cover job22:17
mordredjeblair: given how little I care about the cover job, I think I'm going to consider that "good for now" and treat the cover job as a followup fix22:18
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Improve debugging at start of job  https://review.openstack.org/48076122:19
jeblairthat's been driving me batty trying to read the logs ^22:19
jeblairmordred: cool.  i mean, we should fix it because it's a useful test case, but i'm down with deferring.22:20
jeblairmordred: though....22:20
mordredyah. I mean - there's a TON of iterative work that needs to be done on those jobs22:20
jeblairmordred: if we land it now, we'll go back to having zuul -1 all our changes22:20
mordrednah - it's a non-voting job anyway22:21
jeblairokay22:21
jeblairgood, because i have noticed people have been ignoring v-1 changes, even though they are just false v-1s from zuul22:21
mordredjeblair: oh - also, I don't know if it's a thing or not - but on the status page completed jobs are not updating their url to be the log location - they keep the finger location22:22
jeblairmordred: hrm, that sounds like an oversight.22:23
mordredjeblair: https://review.openstack.org/#/c/480692/ is the change to add zuul- jobs to the zuul repo https://review.openstack.org/#/c/478265/ are the jobs themselves22:23
jeblairmordred: the reason the cover job is failing is due to a lack of playbook22:24
mordredoh22:24
mordredwell that's a good reason to fail22:24
mordredlet's fix that22:25
jeblairmordred: (that happened to be the one i picked to figure out why the log link didn't show up in the report -- so i think the answer there is that we don't have friendly exception handlers for that, as well as probably a few other "normal" exceptions like when a job has an included ansible module)22:25
mordredjeblair: ah - nod22:26
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs  https://review.openstack.org/47826522:27
mordredk. that should fix the cover job - thanks for spotting that22:27
jeblairmordred: do you think that was the case for the other failures too?  since your recheck fixed the others...22:27
mordredjeblair: oh- no- the others were that we were running extra-test-setup before bindep22:28
mordredthe recheck there is because I uploaded a new version to zuul-jobs that fixed the order22:28
jeblairhrm, that should have reported then.  i'll continue to dig.22:28
mordredjeblair: that was a normal pre-playbook-exit-nonzero case - and it just actually did hit its retry limit22:29
jlkmordred: SpamapS: Could use a review/workflow on https://review.openstack.org/47440122:40
mordredjeblair: ok - now https://review.openstack.org/#/c/480692/ is totally green - so https://review.openstack.org/#/c/478265/ is good to go now22:46
mordredSpamapS: ^^22:46
mordredjeblair, SpamapS: don't gouge your eyes out - there are ugly copy-pasta shell scripts directly in playbooks - followup patches will start to refactor those small bits at a time22:47
jeblairmordred: i can't navigate the zuul debug log with all the json crap in it.  we've fixed that in code, but i need to restart zuulv3.22:48
jeblairmordred: i'd like to do that and then debug the log url thing when it happens again.22:48
mordredjeblair: ok22:49
mordredjeblair: it's an easy thing to reproduce - you want me to make a DNM patch that will trigger it for you?22:49
jeblairmordred: sure.  i've restarted zuul scheduler now, so it's safe to push that up.22:50
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Implement Depends-On for github  https://review.openstack.org/47440122:50
jeblairmordred: i'm going to restart the executor too, to eliminate any doubt about what we're running.  :)22:51
mordredjeblair: kk22:51
jeblairdone22:52
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook  https://review.openstack.org/48076422:54
mordredjeblair: ok - that was one of the no-log types we saw22:55
mordredjeblair: (that's the "there is no pre-playbook" case)22:55
jeblairmordred: oh, that case i understand22:56
jeblairi just don't know what the other failures were22:56
mordredk. patch for the other one coming too22:56
jeblairmordred: your jobs changes have +2s from me22:57
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook that fails pre-task  https://review.openstack.org/48076622:57
mordredjeblair: woot!22:57
mordredjeblair: that one is a pre-playbook that has a shell task that fails22:57
jeblairok cool22:57
mordredjeblair: I also realized I can delete all of the other jobs in that one so we don't waste time waiting on them22:58
SpamapSI put up a question on 48075922:58
jeblairSpamapS: oh, sorry, i guess that's an unrelated change.  i wrote it then because i was performing a mental audit of "are single-entry config sections correct".  i wanted to record why i thought it was okay to keep zookeeper.hosts23:01
jeblairSpamapS: would you like me to spin it out into a new change, or update the commit message?23:01
SpamapSjeblair: OH I wasn't sure which one was a white lie! ;-)23:01
SpamapSno if that's the correct lie, I'm game23:01
jeblairokay, i'll respond to your comment for posterity then.  happy to update or spin out as needed though.23:02
SpamapSSo many birds in the air.. I think we'll see if we can hit a few with the scatter gun. ;-)23:03
jeblairi'm not sure if that's a metaphor or not.... :)23:04
jeblairmordred: aha! i think i get it.  it's because of the way we implement retry_limit.  we create the n+1 build, and if n+1 > retries, we fail that build.23:07
mordredjeblair: ah!23:07
jeblairmordred: that was convenient to implement, but maybe we need to do the slightly harder thing and actually have the N build return retry_limit on failure23:08
mordredjeblair: yah - I think we might need to23:08
mordredjeblair: while we're looking at that - are we hard-erroring out on non-retryable errors?23:09
mordredjeblair: for instance - there is no need to retry if ansible-playbook returns a parse error23:09
jeblairmordred: nope, we retry that.  :(23:09
jeblairi'll add that to the punch list.23:10
jeblairhttps://storyboard.openstack.org/#!/story/200110423:25
jeblairhttps://storyboard.openstack.org/#!/story/200110523:27
jeblairhttps://storyboard.openstack.org/#!/story/200110623:29
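
A sketch of the retry behaviour jeblair describes: let the final attempt itself report RETRY_LIMIT (so its log URL is what gets reported), and skip retries entirely for non-retryable failures such as an ansible-playbook parse error. The result names and run_attempt callable are placeholders for illustration.

    RETRYABLE, FATAL, SUCCESS = 'retryable', 'fatal', 'success'


    def run_with_retries(run_attempt, retries=3):
        last_url = None
        for attempt in range(1, retries + 1):
            result, log_url = run_attempt(attempt)
            last_url = log_url
            if result == SUCCESS:
                return 'SUCCESS', log_url
            if result == FATAL:      # e.g. a playbook parse error: no retry
                return 'ERROR', log_url
        # The Nth (final) attempt reports the limit, keeping its log URL.
        return 'RETRY_LIMIT', last_url
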
jlkhrm. A change (PR) that includes .zuul.yaml files, that should cause a reconfig, right? Like a change that adds another pipeline to a project?23:47
jlk(should potentially trigger that pipeline)23:47
jlkyeah it should if I read model and manager correctly23:51
jlkoh haha, my fault.23:56
