Wednesday, 2017-07-05

*** jkilpatr has quit IRC00:47
*** bhavik1 has joined #zuul05:27
*** bhavik1 has quit IRC05:31
*** bhavik1 has joined #zuul06:21
*** bhavik1 has quit IRC06:24
*** yolanda_ has joined #zuul07:01
*** hashar has joined #zuul08:13
*** bhavik1 has joined #zuul09:21
*** bhavik1 has quit IRC10:14
*** jkilpatr has joined #zuul11:19
*** hashar has quit IRC11:42
*** hashar has joined #zuul11:51
*** dkranz has quit IRC12:32
*** jkilpatr has quit IRC13:11
*** jkilpatr has joined #zuul13:11
*** jkilpatr has quit IRC13:17
*** dkranz has joined #zuul13:26
*** jkilpatr has joined #zuul13:30
openstackgerritAndreas Scheuring proposed openstack-infra/nodepool master: Use POST operations for create resource  https://review.openstack.org/48060114:10
openstackgerritMerged openstack-infra/nodepool master: Remove duplicate python-jenkins code from nodepool  https://review.openstack.org/25915714:23
mordredjeblair, clarkb: I'm reviewing AJaeger's mitaka cleanup patches and it made me think about a thing we'll need a good story for in zuul v3 ...14:37
jeblairi do like a good story14:37
mordrednamely - we'll want, at some point, to be able to have either a job or a utility to verify what jobs in the zuul config reference a given image name14:37
jeblairmordred: web api may be a good choice for that14:38
mordredas once we have .zuul.yaml job configs - configs that reference, say "ubuntu-trusty" would become sad when infra stops building such a node - but it'll be hard for infra to be able to be proactive on upgrades since we won't know14:38
mordredjeblair: agree14:38
mordredjeblair: I figured we should just capture the use case somewhere so that we remember to implement it before we delete ubuntu-xenial :)14:39
mordredjeblair: might also be neat for us to be able to add a "deprecated" flag to an image so that zuul could report use of a deprecated image when it runs jobs using one too14:39
jeblairmordred: we can also see when nodepool provides nodes, so we can see what active jobs are using particular image types14:41
mordredjeblair: yup14:42
mordredjeblair: I think we've got good tools at our disposal14:42
clarkbswitching behavior to fail instead of queue indefinitely might also help. At least this way you get feedback immediately on any issues which can be corrected15:04
mordredclarkb: ++15:34
mordredclarkb: also will be important even for normal cases to catch config errors15:35
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation  https://review.openstack.org/46332815:49
jlko/15:58
* rcarrillocruz waves16:08
rcarrillocruzhey folks16:08
rcarrillocruzplaying with multiple networks per node in nodepool16:08
rcarrillocruzhowever i get OpenStackCloudException: Error in creating instance (Inner Exception: Multiple possible networks found, use a Network ID to be more specific. (HTTP 409) (Request-ID: req-bd4ea6d2-9c97-487f-82bd-2b0e0540ff06))16:09
rcarrillocruznot sure if it's because nodepool doesn't know which network to associate floating IP on16:09
rcarrillocruzlooking at docs i don't see a param to say 'associate floating IP to this network IPs' or something like that16:09
rcarrillocruzShrews: mordred ^16:09
jlkrcarrillocruz: it's probably deeper than that16:10
jlkrcarrillocruz: are you specifying which network to use for the private address?16:10
jlkOpenStack doesn't really have a server side way to say "this is the default, pick it if somebody doesn't specify"16:10
clarkbright the only time there is a default is when you have a single network16:11
clarkbthen neutron just uses that because there is no other option. If there is more than one network you have to specify which to use16:11
jlkrcarrillocruz: I see that message usually when doing the initial boot, and you have more than one network available for the private address.16:12
rcarrillocruzbut where in the yaml16:12
rcarrillocruzin the label?16:12
rcarrillocruzi don't see that in the docs16:12
clarkbrcarrillocruz: no, under the cloud provider16:13
clarkbrcarrillocruz: there are examples in infra's nodepool.yaml16:13
clarkb(osic does this iirc)16:13
clarkbhttps://docs.openstack.org/infra/nodepool/configuration.html#provider is where it is documented16:15
dmsimardSilly question16:17
dmsimardAt what point does an RFE become a spec from a story ? When someone picks it up and it's ready to be discussed/worked on basically ?16:18
rcarrillocruzi don't  see in prod nodepool providers with multiple networks16:18
rcarrillocruzand yes16:18
rcarrillocruzthat's exactly what i have16:18
rcarrillocruzin the provider16:18
rcarrillocruzmultiple networks defined there16:18
clarkbrcarrillocruz: osic16:18
rcarrillocruzbut nodepool doesn't like it it seems16:18
jeblairdmsimard: we usually only write specs if we want to make sure we have agreement on something before starting work on it16:18
jeblairdmsimard: we tend to do it for larger or more complex efforts16:19
clarkbrcarrillocruz: also tripleo iirc16:19
dmsimardjeblair: makes sense, just wanted to make sure16:19
rcarrillocruzclarkb: looking at nodepool.openstack.org osic clouds have only one network16:21
rcarrillocruz      - name: 'GATEWAY_NET_V6'16:21
clarkbrcarrillocruz: the provider only has one network but the cloud has multiple. That is the way we select which of the multiple networks we want to use because neutron will not default to one of them for us16:21
clarkbrcarrillocruz: that is also a list so you can provide more than one if you want more than one16:22
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592816:24
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758516:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758716:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758616:27
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758816:27
dmsimardjlk, jeblair: I created a story from a discussion a while back since it was a topic of discussion again in RDO's implementation https://storyboard.openstack.org/#!/story/200110216:29
jlkdmsimard: can you document the data flow? Where does the whole thing get started from? Seems like you still need an event to wake up the scheduler, which would then use the script as a pipeline filter?  or do you just imagine the scheduler running the script every X seconds in a loop, or???16:32
dmsimardjlk: I'm not particularly familiar with zuul internals so it's not trivial for me to express it in proper terms :)16:33
jlkah.16:33
jlkbut generally?16:33
dmsimardjlk: technically, using the term filter would probably be more appropriate than trigger16:34
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758916:34
dmsimardat least in our use case, it would be a filter16:34
jlkPipelines are event driven, because basically the system sits idle until some event kicks it into gear. Be it an event from gerrit, or github, or a timer.16:34
jlkthat event is processed to determine which project it relates to16:35
jlkand then the scheduler looks at the event and matches it up to the triggers for pipelines, to complete an event + project + jobs + pipeline mashup16:35
jlkwhen it's doing that, it looks at either event filters ( limitations on the triggers themselves ), or pipeline filters ( higher level restrictions on the pipeline ).16:37
dmsimardRight, so perhaps what I'm asking for is something that doesn't exist yet -- the notion of a pipeline filter, and then fold the existing branch/files filtering at the job-level under a proper filter key where more things (such as the arbitrary script I'm speaking of) could be added16:37
jlkfor instance, a github event may be a comment event of "recheck", but the gate pipeline may have a pipeline filter (requirement) that there be enough votes. So the trigger matches, but the pipeline requirement filter blocks it.16:38
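
A minimal sketch of the two layers described above -- the trigger that wakes a pipeline up and the pipeline requirement that can still block enqueuing. The stanza is hypothetical: the connection name, event keys, and requirement keys are illustrative rather than copied from a real Zuul config.

    import textwrap
    import yaml

    # Hypothetical pipeline stanza: the trigger decides which events wake the
    # pipeline, while the requirement is an extra condition checked before a
    # change is enqueued.  Key names are illustrative, not a real Zuul config.
    pipeline = yaml.safe_load(textwrap.dedent("""
        - pipeline:
            name: gate
            trigger:
              my-github:                 # assumed connection name
                - event: pull_request
                  action: comment
                  comment: (?i)^recheck$
            require:
              my-github:
                review:
                  - type: approved       # "enough votes" style requirement
    """))

    print(pipeline[0]['pipeline']['name'])   # -> gate
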
jlkdmsimard: if I understand you, then you still want the initial wake up to be an existing trigger, like a gerrit change or comment16:39
dmsimardyes.16:39
jlkbut then you want as a pipeline requirement the ability to run an arbitrary script to make a Just In Time decision on whether to enqueue or not16:39
dmsimardYes. Ultimately I think our need is for that mechanism to be at the job definition layer (second example in the story)16:40
dmsimardBut I feel that could slow things down rather quickly16:41
dmsimardespecially at scale16:41
jlkyeah, since that definition can live _in_ the repo in question16:41
jlkso it's a clone/merge, re-read configuration16:41
jlkyeah eww, that kind of leaks pipeline config (which is supposed to only be from trusted repos) into untrusted repos16:42
dmsimardjlk: to re-iterate the context, it's basically a way to do a "freeform" filter16:43
jlkwhere would you picture this script running, with what context?16:43
dmsimardjlk: in JJB terms, add a shell builder to decide whether to continue or not very early on in the process :p16:43
dmsimardjlk: that last statement was not in reply to your question, hang on16:44
dmsimardjlk: thinking outside the box, perhaps I'm looking at this from the wrong angle16:45
dmsimardjlk: how about being able to dynamically trigger jobs from a job ?16:45
dmsimardthe filtering and triggering logic could live inside that job and that job would trigger jobs based on what it needs16:46
jlkwe have a job hierarchy thing16:48
jlkYou can have an "early" job on a pipeline, and a number of jobs that "depend" on that job16:48
jlkso if your early job succeeds, the others go. If not, they do not16:48
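
As a hedged illustration of that hierarchy, here is a hypothetical project stanza in which the integration jobs declare a dependency on an early job and therefore only run if it succeeds (the project and job names are invented):

    import textwrap
    import yaml

    # Hypothetical .zuul.yaml project stanza: dependent jobs only run once
    # "early-job" has succeeded.  Names are invented for illustration.
    project = yaml.safe_load(textwrap.dedent("""
        - project:
            name: example/repo
            check:
              jobs:
                - early-job
                - integration-job-a:
                    dependencies:
                      - early-job
                - integration-job-b:
                    dependencies:
                      - early-job
    """))

    for job in project[0]['project']['check']['jobs']:
        print(job)
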
mordreddmsimard: can you give an example of what userscript.sh might do?16:48
jlklet me see if I can pull up the docs (for v3) on this16:49
dmsimardjlk: yeah, but (as discussed last time), this parent job might have any of a dozen jobs to trigger -- for one patch we might have three jobs to trigger, not 12, and for another patch maybe we have 516:49
dmsimardmordred: curl weather.com and if weather is good trigger job A B C, if weather is bad trigger job C D E ?16:50
jlkthis is where roles/playbooks come in16:50
mordreddmsimard: yah - also what jlk says about hierarchies may be helpful - but a few things are quite different in v3, so it's possible that the things you're wanting to accomplish are structured differently16:50
mordreddmsimard: I mean - can you give me a non-invented example?16:50
mordreddmsimard: (trying to wrap my head around the use case and I think a real example problem would help a lot here)16:51
rcarrillocruzthe issue was that i had networks section on provider16:51
rcarrillocruzand cos i have a pool16:51
rcarrillocruzit had to be put at that level16:51
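
For reference, a hypothetical nodepool fragment with the shape described above: the networks list sits on the pool, naming which of the cloud's several networks to attach, so the launcher does not hit "Multiple possible networks found". The network name echoes the one quoted earlier; the provider, cloud, and label names are invented.

    import textwrap
    import yaml

    # Hypothetical nodepool launcher fragment; only the keys relevant to the
    # networks discussion are shown.
    config = yaml.safe_load(textwrap.dedent("""
        providers:
          - name: example-provider
            cloud: examplecloud
            pools:
              - name: main
                networks:
                  - GATEWAY_NET_V6
                labels:
                  - name: ubuntu-xenial
    """))

    print(config['providers'][0]['pools'][0]['networks'])
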
mordreddmsimard: it's also worth noting that jobs in v3 are in ansible and also support multiple hosts - so it's entirely possible that this is a "this logic should just be in an ansible playbook" case16:52
dmsimardmordred: We have this thing in RDO called rdoinfo which is more or less a database that maps upstream git repositories to rpm package names (amongst other things). This project spans all OpenStack releases and the jobs we need to trigger depends on the release that is being modified.16:52
dmsimardmordred: https://github.com/redhat-openstack/rdoinfo/blob/master/rdo.yml16:52
mordreddmsimard: looking/reading16:52
mordreddmsimard: and thanks16:52
dmsimardmordred: We've currently hacked together something rather awesome but also pretty ugly where we have a job that knows what jobs to trigger and with what parameters and triggers them remotely (on a remote jenkins instance where generic parameterized jobs exist) and we then poll the jobs until they finish to get their status16:53
mordreddmsimard: is that rdo.yml consumed by anything _other_ than routing how builds work?16:54
dmsimardmordred: https://review.rdoproject.org/jenkins/job/weirdo-validate-buildsys-tags/98/consoleFull16:54
dmsimardmordred: it's consumed by various RDO tooling to decide what is built and how it is built16:55
dmsimardmordred: basically, when we, for example, bump upper-constraints https://review.rdoproject.org/r/#/c/7403/3/rdo.yml we'll run integration jobs16:56
dmsimardwhich will build the packages that changed with that patch taken into account and then run integration jobs with those newly built packages16:56
dmsimardthat particular upper-constraints patch is sort of an easy case because it only touches one release, the problem comes when it spans different releases16:57
mordreddmsimard: I mean - I ask because it seems like the problems you're solving with it are the same problems that zuulv3 is trying to solve - which is not to say that the answer for your problem is just already in v3 - but the problem space is SO similar that I think it's going to take us a non-simple amount of time to untangle which bit should grok and understand what16:57
mordreddmsimard: which is my way of saying - I think I understand the problem space, as least in an initial way, and I think we'll need to understand a bit more deeply to be able to work with you on an appropriate solution16:58
dmsimardmordred: sure -- I'm an end user after all :P I'm pretty sure the use case is useful but can be addressed in different ways. Happy to discuss it further.16:59
mordreddmsimard: awesome. I'm certain the use case and problem domain are important- so I definitely want to make sure we find an answer17:00
*** hashar has quit IRC17:02
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759217:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759117:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759017:38
jlkSo I think I need an alternative to curl for throwing json at my zuul17:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759317:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902017:42
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759417:42
mordredjlk: I tend to use python requests in the python repl for things like that17:45
jeblairjlk, mordred: ^ that's the docs stack with most comments addressed; starting at 46332817:45
mordredjeblair: woot17:45
jlkthanks!17:46
jeblairi addressed comments without performing a rebase for easier diffing17:46
jeblairi'm going to perform the rebase pass now, and then there are a couple things to change after the rebase (using the new config getter)17:46
mordred++17:47
jeblairthen let's land it and be done :)17:47
jlkI'll trade you for reviews on depends-on and fixing reports on push events :D17:47
mordredjeblair: agree. I think it'll be much more useful for us to have docs and iterate on them as we find things17:47
jeblairjlk: deal17:47
jlkmordred: okay, doing it with requests in a python3 interpreter worked, something is just screwy with my curl binary it seems :(18:00
mordredjlk: well- that's "good news"18:01
jlkseeing if brew has a newer curl18:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759318:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759218:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759418:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758518:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758718:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add configuration documentation  https://review.openstack.org/46332818:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758618:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758918:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758818:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759118:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759018:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902018:01
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592818:01
jeblairmordred, jlk: ^ that's the rebase -- the remaining suggestions about using get_config were actually the conflicts, so i had to fix those during the rebase.  should mean the stack is ready now.18:02
jlkoooh weird. It worked to get the event in, but then it tried to make an error somewhere?18:04
jlkzuul-scheduler_1     | DEBUG:paste.httpserver.ThreadPool:Added task (0 tasks queued)18:05
jlkthen a 400 bad request18:05
jlkor... or I did something weird? I don't know.18:05
mordredjlk: it's going to  make an outbound call to github to fill in some data, right?18:05
jlkit did all of that18:05
mordredjlk: is the data in your payload data that it can effectively make outbound calls for?18:05
mordredoh. weird18:06
jlkI'm going to repeat and see if I'm really seeing this or not18:06
mordredkk18:06
mordredjeblair: https://review.openstack.org/#/c/478265/ is ready btw18:06
mordredjeblair: or, I say that- lemme go ahead and add a zuul patch depending on it real quick18:06
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs  https://review.openstack.org/48069218:08
mordredjeblair: let's see how badly that breaks :)18:08
mordredoh. piddle18:08
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Run the new fancy zuul-jobs versions of jobs  https://review.openstack.org/48069218:08
jlkokay I can't repeat the error when using requests.18:09
mordredjlk: when you're curling - are you doing: -H "Content-Type: application/json; charset=UTF-8" ?18:12
jlkcurl -H "Content-Type: application/json" -H "X-Github-Event: issue_comment" -X POST -d @pull-comment.json http://localhost:8001/connection/github/payload18:12
jlkwhich worked when the recipient was python2 based, but fails with python318:13
jlknow I"m wondering if the saved .json file is just in a format that python3 won't like18:13
mordredjlk: does adding "; charset=UTF-8" to your content type header have any effect?18:14
* mordred grasping at straws to explain18:14
mordredbut I'm honestly curious to know what the problem actually is18:14
jlkit hasn't, no. Because that doesn't change /how/ curl sends things, it just adjusts what it tells the other end18:14
jlkI'm going to do something more fun, curl from a Fedora docker.18:15
mordredjlk: awesome18:16
mordredjlk: also, while you're grasping at straws: python3 that: http://paste.openstack.org/show/614493/18:19
jlknod, I went through the dance of writing the file as utf8 via vim18:20
mordred:)18:20
jeblairhrm, looks like there's a problem in the docs stack about halfway down; looking18:21
jlkoh d'oh. Gotta link my docker networks together.18:21
jlknope, that didn't do it.18:24
mordredjlk: like - it still broke?18:27
jlkyeah, I re-wrote the file using your snippet and used curl in Fedora docker to send the file up18:27
jlkstill get the immediate traceback18:27
jlkugh, and using --data-binary instead of just -d   didn't work either18:30
mordredjlk: that's super crazy18:31
mordredjeblair: not-a-minus-one - you added mention of allow-secrets in the secrets section - but it's not actually listed in the pipeline config section18:32
SpamapSjlk: and your only log in zuul-scheduler is 400?18:37
jlkSpamapS: no, it's a traceback out of webob18:37
jlkzuul-scheduler_1     | TypeError: a bytes-like object is required, not 'str'18:37
jlklooks like if I do the json manually in-line with curl it doesn't choke18:37
jlkso it's something about how curl is reading it from the filesystem18:37
SpamapSweird18:38
jlkvery18:40
jlkGuess I'll just build myself a tiny python script to do this.18:43
jlkStill no idea where the bug is, but my little utility is working fine.19:10
jlkhttps://github.com/j2sol/z8s/blob/master/sendit.py19:11
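
For comparison, a minimal sketch of the same idea: post a saved GitHub-style payload to a local Zuul webhook endpoint with python-requests. The URL, file argument, and headers mirror the curl command quoted earlier; everything else is an illustrative assumption, not jlk's actual sendit.py.

    #!/usr/bin/env python3
    # Minimal curl replacement for throwing a saved JSON payload at a local
    # Zuul github connection.  Endpoint and headers mirror the curl command
    # above; argument handling and defaults are illustrative assumptions.
    import json
    import sys

    import requests

    URL = 'http://localhost:8001/connection/github/payload'


    def send(path, event='issue_comment'):
        # Read and re-serialize the payload so we always send valid UTF-8
        # JSON, sidestepping the str/bytes confusion seen with curl above.
        with open(path, encoding='utf-8') as f:
            payload = json.load(f)
        headers = {
            'Content-Type': 'application/json; charset=UTF-8',
            'X-Github-Event': event,
        }
        resp = requests.post(URL, json=payload, headers=headers)
        print(resp.status_code, resp.text[:200])
        return resp


    if __name__ == '__main__':
        send(sys.argv[1], *sys.argv[2:3])
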
dmsimardIn the context of optimizing resources (for higher concurrency rather than speed), how would you deal with more than one flavor properly inside a nodepool tenant ?19:25
dmsimardJust set max-servers for each node type in a ratio that makes sense ?19:25
clarkbdmsimard: currently max-servers is per provider so you could do that but would have to split up logical providers per resource19:26
dmsimardI know upstream doesn't really deal with this since the flavor is uniform with 8vcpu/8gb ram etc19:26
clarkbs/resource/flavor/19:26
clarkbwe've done the logical provider max-servers mapping onto cloud resources in the past to limit the number of instances per network/router so it does work but not sure how usable it is on the flavor side16:27
dmsimardclarkb: oh you're right max-servers is at the provider level, I was thinking of min-ready19:27
mordredjeblair: I have +2'd the first three, but have left comments on them too19:27
dmsimardclarkb: I guess what I'm looking for is a "max-servers" at the label layer or something.19:28
clarkbdmsimard: ya you can do that with the logical provider hack19:29
clarkbdmsimard: we did it with hpcloud to control how many instances ended up on each network/router19:29
dmsimardclarkb: does an example of this exist somewhere ?19:30
clarkbdmsimard: in the way way back history of our nodepool.yaml from when hpcloud was in it19:30
dmsimardok, I'll try and look when I have a chance. Thanks :)19:30
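
The "logical provider" hack reads roughly like the sketch below: two provider entries point at the same cloud, each with its own max-servers, so one class of nodes can be capped independently of another. The names, flavors, and numbers are invented; the real example lived in the old hpcloud-era nodepool.yaml history clarkb mentions.

    import textwrap
    import yaml

    # Illustrative only: two logical nodepool providers backed by the same
    # cloud, each with its own max-servers cap.
    config = yaml.safe_load(textwrap.dedent("""
        providers:
          - name: examplecloud-small     # hypothetical
            cloud: examplecloud
            max-servers: 80
          - name: examplecloud-large     # hypothetical, same cloud/tenant
            cloud: examplecloud
            max-servers: 20
    """))

    for p in config['providers']:
        print(p['name'], p['max-servers'])
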
jeblairdmsimard: remember all this is different in nodepool v3 because allocation is handled differently.  things should work better with much less tuning; the only thing that might warrant tuning is min-ready.19:40
jeblairmordred: thanks; i'll add another patch for allow-secrets19:41
jeblairjlk: feel free to stick it in zuul/tools if it's generally useful19:41
clarkbjeblair: max-servers is still problematic (tobiash ran into this) because you can boot 50 "large" instances which could put you over cpu or memory or disk etc quota then try to boot 1 "small"19:42
clarkbinstance and nodepool thinks it is ok because max servers is say 10019:43
clarkbthis is a general problem in nodepool reducing multi dimensional quota to a single factor19:43
jeblairclarkb: ah yes.  we do need to rid ourselves of max-servers.  hopefully we can figure out how to use openstack's quota api :)19:43
mordredjeblair: we support it shade-side19:44
jeblair(though we probably want to keep supporting max-servers but also add max-ram and max-cpu)19:44
mordredjeblair: the biggest problem is that sometimes the clouds lie/don't report it :(19:44
jeblairfor folks who may have higher quota than they want to use/pay for19:44
clarkbyes ^19:44
mordredjeblair: BUT - I think if we just allowed people to set max-ram / max-cpu / etc like you suggest19:44
clarkbbut also the simplification isn't all bad, openstack quotas are a mess and we mostly don't worry about them because we simplified19:45
mordredwe could get close enough - the info is provided in the flavors19:45
jeblairmordred: do you think the quota api is useful enough for us to do both?  ie, if none of max-{ram,cpu,servers} is unlimited in the nodepool config, can we automatically query openstack to find out what the max is?  or is that dangerous enough we need to make it opt-in or opt-out?19:45
mordredjeblair: I think we could totally query the quota api - and then provide override values for folks with non-sane quota apis19:46
mordredjeblair: or for folks who want to use less than the entire quota on their project19:46
jeblairmordred: that seems like a reasonable approach. yeah, i guess it works for both cases.19:46
mordredbut I also agree - max-servers for the simple case seems like a nice thing to keep19:47
clarkbgranted any resource limitation tool with 20 axes is going to be a complicated mess :)19:48
clarkbinstances, cpu, ram, disk, ports, floating ips, networks, subnets, routers, volumes, volume size, volume aggregate, and I am sure I am forgetting a bunch19:48
clarkboh volumes per instance19:49
clarkbclouds should just be free and infinite19:49
jeblairheh, real clouds pretty much are.  i think the metaphor just broke.19:49
*** hashar has joined #zuul19:49
mordred++19:50
mordredI think maybe let's start with instances, cpu, ram and maybe disk - since those are all reported in flavors19:51
mordredand we can consider fips and volumes later19:51
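
A rough sketch of what flavor-based accounting could look like: instead of a single max-servers knob, cap several axes at once using the vcpus and RAM the flavor itself reports. The data structures and numbers are illustrative assumptions, not nodepool's real model or API.

    # Return True if launching one node of `flavor` would exceed any of the
    # configured limits; a missing limit means "unlimited" on that axis.
    def would_exceed(limits, in_use, flavor):
        checks = [
            ('servers', 1),
            ('cores', flavor['vcpus']),
            ('ram_mb', flavor['ram_mb']),
        ]
        for axis, cost in checks:
            limit = limits.get('max_' + axis)
            if limit is not None and in_use[axis] + cost > limit:
                return True
        return False


    # Example: 95 servers / 380 cores already in use, 8-vcpu 8 GB flavor.
    print(would_exceed({'max_servers': 100, 'max_cores': 384},
                       {'servers': 95, 'cores': 380, 'ram_mb': 760 * 1024},
                       {'vcpus': 8, 'ram_mb': 8192}))
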
jeblair"we talkin' cloud like seattle-cloud?  or yuma-cloud?  'cause they ain't the same thing."19:52
mordredheh19:54
mordredjeblair: feature request related to "always run post jobs"19:58
jeblairwow.  in a very morissettian development -- the "cloud museum" is 10 miles from yuma -- the least cloudy city in america: http://cloudmuseum.dynamitedave.com/19:58
mordredjeblair: if a job hits retry_limit - it sure would be nice to get the logs from the last failure19:58
jeblairmordred: i think that should happen19:59
jeblairdid we land that change yet?19:59
clarkbwouldn't you get them under $uuid?19:59
clarkbso it is just a matter of reporting where they can be found?19:59
jeblairclarkb, mordred: yes, logs from all the attempts should be stored, then i would expect the final log url to be used in the report20:00
mordredah- that may be it - https://review.openstack.org/#/c/480692/ ran new jobs - but no url20:00
jeblairhuh, wonder why no url?20:01
jeblairat any rate, i suspect *that's* the bug here20:01
mordredhttp://logs.openstack.org/92/480692/2/check/e2e1435/job-output.txt20:02
mordredyes - output was logged and uploaded20:02
jeblairokay, i've got 2 things on my stack right now; i can look at that a bit later if someone doesn't beat me to it20:02
jlk'retry_limit' is our version of Nova's "no valid host found".20:07
jeblairjlk: :( yes20:07
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758920:12
* jlk lunches20:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759320:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759220:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902020:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759420:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759120:15
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759020:15
tobiashhi, regarding quotas20:18
tobiashwhat if we let nodepool just run into quota and pause allocation for a while if it gets a quota-related error?20:19
clarkbtobiash: I am not sure you can reliably detect quota errors?20:19
tobiashmy nodepoolv2 works quite nice when running into the quota20:19
clarkbthings will fail but who knows why? that is something that is testable though20:20
tobiashmy nodepoolv3 not so nice...20:20
tobiashclarkb: at least the logs say something about quota in the exceptions20:20
tobiashso I think it should be detectable (not sure how easy)20:20
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets  https://review.openstack.org/48072620:21
tobiashletting nodepool run into quota gracefully would also reduce the need for synchronization of several nodepools in the same tenant20:26
clarkbtobiash: I think the intent is that that already happens because the next provider should attempt to fulfill the request. The missing bit is some sort of coordination to back off once you hit the limit so that you aren't constantly trying when you know you are likely to fail?20:27
jeblairclarkb: the current algorithm assumes that it knows quota; so i think we either need to facilitate the provider "knowing quota correctly" (whether that's api calls and/or config), or change the algorithm to accommodate what tobiash is suggesting20:28
jeblairof course with multiple nodepools in the same tenant, things get a little hairy -- you can, but probably don't want to, solve that with config (set each to 1/2 quota? yuck)20:29
tobiashjeblair: I think running gracefully into quota could be easier and more robust (considering several nodepools running in the same tenant)20:29
jeblairapi calls can improve the situation, but only if you make them before each request, and i wasn't thinking of doing that (that would be a big hit for us)20:30
jeblairso gracefully handling quota errors is probably the best solution to that particular case (and happens to help with some others as well)20:30
tobiashnodepoolv2 had no problem running into quota (apart from stressing the openstack api, but that could be solved)20:30
tobiashwith nodepoolv3 and running into quota I get NODE_FAILURE errors as job results20:31
jeblairtobiash: handling it gracefully gets fairly complicated though.  what if nodepool-A is never able to create the nodes it needs because of nodepool-B?20:31
tobiashjeblair: at some point in time it will20:32
jeblairin other words, the algorithm handles starvation very well currently, but only if it's the only occupant in the tenant (or, at least, its max-servers has been set up so that it doesn't run into other occupants)20:33
tobiashjeblair: but I think if it runs into a quota error it should behave as if it had never tried to fulfill the node request20:33
*** dkranz has quit IRC20:34
tobiashjeblair: could nodepool just unlock the node request in this case (without decline) and retry (or let another nodepool retry) the request?20:34
jeblairtobiash: yeah, that's the minimal necessary modification to the algorithm to handle that.  it's probably the best thing to do.  but i do think we need to make sure we know the caveat that in the case where nodepool can not utilize its full expected quota, starvation can result.20:34
jeblairtobiash: yep20:35
jeblairtobiash: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#proposed-change20:37
tobiashjeblair: so what kind of starvation do you think of (assuming both nodepools are working for the same zuul)?20:37
jeblairtobiash: actually, the best change might be to remain in step 4 until it can be satisfied (in other words, continue trying to create new servers for step 4 despite quota errors).20:38
jeblairtobiash: (and also treating a quota failure in step 5 as transitioning that to a step 4 request).20:39
tobiashjeblair: I think that could block the job until the quota errors are gone even if another provider in a different tenant could satisfy it?20:40
tobiashjeblair: I think I didn't understand some other part of this regarding step 320:41
tobiashjeblair: scenario: one nodepool, request > quota -> declined -> node allocation failed -> job fails with NODE_FAILURE20:42
tobiashjeblair: is my assumption correct?20:42
jeblairtobiash: correct.  to clarify: "request > quota" doesn't mean "request > available_nodes" -- it truly means "my quota is set at 5 nodes and this is a request for 10 nodes, so i can never satisfy it under any conditions"20:43
tobiashjeblair: ah, misunderstood that then20:44
jeblairtobiash: the algorithm will actually work well, even with two nodepools, if it knows the correct value of "available nodes".  it's just that it can't do so without extra api calls.  but we can approximate that by treating a surprise out-of-quota error as if we were in step 4 and were just waiting on available nodes (regardless of whether we were in step 4 or step 5)20:46
tobiashjeblair: ok, that would work I think20:46
jeblairtobiash: and yes, as soon as you handle a request in step 4, you may not be running optimally, because another provider might be able to handle that request.  *however*, that other provider *will* handle the next request.20:47
jeblairtobiash: the occasional delay in fulfilling a single request when a provider hits its quota limit is the price we pay in exchange for not having to communicate between providers.20:47
tobiashjeblair: that was the second thing I observed20:48
jeblairtobiash: we can, in the future, extend the algorthm to communicate information between providers (we are using zookeeper after all).  but we wanted to keep things simple for this first attempt.20:48
jeblairtobiash: (and even if we do that, i doubt we would communicate between different nodepool instances sharing a tenant, or even nodepool sharing with something else, so we may still need to keep some of the step-4 behavior)20:49
tobiashjeblair: scenario: provider1 almost at quota, provider2 with free capacity, provider1 grabs node request, blocking due to running into quota (max_servers) where provider2 would have been able to serve the request with already existing nodes20:49
jeblair(basically, the change would be: step 3.5: if request > available nodes for this provider and request < available nodes for another provider, skip)20:50
jeblairtobiash: correct20:50
tobiashjeblair: that would do20:51
jeblairtobiash: i'd follow your scenario up with: provider2 grabs next node request and continues, while provider1 stops handling further requests20:51
tobiashjeblair: maybe it also makes sense to skip instead of block when running into quota (regardless of whether it's surprising or calculated)20:51
jeblairtobiash: if we always skip and we don't have the extra information about other providers, then the request sits indefinitely.20:52
jeblairtobiash: that's bad for large requests -- smaller ones will always starve them out20:52
jeblairtobiash: (remember that at load, we're usually only 1 node away from quota, so we would only satisfy one-node requests)20:53
tobiashjeblair: right20:54
tobiashjeblair: third scenario: provider1 has 0 nodes ready, provider2 has 3 nodes ready, both not at quota20:56
tobiashjeblair: what I observed was that often provider1 took the node request spawning a new node where provider2 would have allocated it directly20:56
tobiashjeblair: -> job start penalty of a minute20:57
jeblairwhoopsie20:57
tobiashjeblair: so a step 3.5 could be to check for ready nodes in other providers and skip, to give them a chance20:57
jeblairtobiash: that can also probably be solved with something like the step3.5... exactly :)20:57
tobiashjeblair: so the plan could be: add 3.5 with ready node check and block in 4 until nodes could be spawned by gracefully handling quota errors?21:00
tobiashjeblair: (that would of course be two changes)21:01
jeblairtobiash: yeah.  i think the change to 4 should happen first, then 3.5 later (because it's introducing a new layer of complexity to the algorithm).21:02
tobiashjeblair: agreed, 4 fixes an issue, 3.5 is optimization21:03
tobiashjeblair: will try that next week21:03
* tobiash is on a workshop marathon this week21:03
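
A very rough sketch of the "gracefully run into quota" behaviour being discussed: if a launch fails with a quota error, keep waiting for capacity (step 4) instead of declining the request and failing the job with NODE_FAILURE. QuotaExceeded, launch_node, and the request object are placeholders for illustration, not nodepool's real API.

    import time


    class QuotaExceeded(Exception):
        """Placeholder for whatever quota error the cloud layer raises."""


    def handle_request(request, launch_node, retry_delay=30):
        nodes = []
        for label in request.node_types:
            while True:
                try:
                    nodes.append(launch_node(label))
                    break
                except QuotaExceeded:
                    # Out of quota, possibly because another launcher shares
                    # the tenant: pause and retry rather than declining.
                    time.sleep(retry_delay)
        request.fulfilled(nodes)
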
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs  https://review.openstack.org/48074021:06
mordredjeblair: ok - I've +2d most of the docs stack - there's one -1 in the middle cause it's code related - for other things there may be comments with the +221:10
mordredjeblair: also, I think we could deal with the -1 as a followup if you prefer21:10
jeblairmordred: ok.  as i'm going through this, a lot of your suggestions are good but i'm not going to write all of them.  some i will leave for you and others.  :)21:13
jeblairmordred: i don't want you to think i'm ignoring them, or don't like them.21:14
jeblairmordred: just that i want to land this so that we can all take shared ownership of docs.  :)21:14
mordredjeblair: oh - yah - I mostly just wanted to write them down somewhere so we didn't completely lose them21:14
mordredjeblair: I totally agree about landing this as it is21:14
jeblairmordred: does "scalable component" -> "scale-out component" help address your concerns?  (i'm also adding other words, just wondering if that's a better capsule description)21:17
mordredjeblair: I think so? I think scalable component could describe an Oracle database on a very large piece of hardware21:18
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reorganize docs into user/admin guide  https://review.openstack.org/47592821:36
jeblairmordred, jlk: ^ updated patch #221:37
jeblairmordred, jlk: i've replied to your comments on patches 1 and 2 now.21:37
mordred++21:38
mordredlet's get these landed - SpamapS, feel like landing a couple of docs patches?21:39
SpamapSmordred: I'll start reviewing them now.21:44
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix some inconsistent indentation in docs  https://review.openstack.org/47759321:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add some information about canonical_hostname  https://review.openstack.org/47759221:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix indentation error in docs  https://review.openstack.org/48074021:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename git_host to server in github driver  https://review.openstack.org/47759421:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use oslosphinx theme  https://review.openstack.org/47758521:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move tenant_config option to scheduler section  https://review.openstack.org/47758721:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_expiry to webapp section  https://review.openstack.org/47758621:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use executor section of zuul.conf for executor dirs  https://review.openstack.org/47758921:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct sample zuul.conf  https://review.openstack.org/47758821:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move zookeeper_hosts to zookeeper section  https://review.openstack.org/47759121:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use scheduler-specific log config and pidfile  https://review.openstack.org/47759021:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add docs on allow-secrets  https://review.openstack.org/48072621:45
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Clarify canonical_hostname documentation  https://review.openstack.org/47902021:45
mordredSpamapS: awesome. thanks! in general aiming for fixing stuff in followups where possible, fwiw21:45
jeblair(cause otherwise ^ that happens and it's not fun)21:46
*** hashar has quit IRC21:50
SpamapS+3'd a lot so far21:51
SpamapSSo I may have gotten out of sync w/ that last rebase21:51
SpamapSleading to a possible <shock> +3 without two +2's21:52
SpamapSUnfortunately, I have to run to a 15:00 appt. bbiab21:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move status_url to webapp config section  https://review.openstack.org/48075921:53
jeblairmordred: i aimed for the [scheduler] section but missed and ended up in [webapp].21:53
jeblairmordred: okay, i'm now waiting for v+1 and w+3s to roll in -- is there job stuff waiting for me to look at?21:57
mordredjeblair: no - I added jobs to zuul but then they failed22:06
mordredjeblair: the issue at hand is:22:06
mordred2017-07-05 18:17:01.821275 | ubuntu-xenial | + sudo service mysql start22:06
mordred2017-07-05 18:17:02.033407 | ubuntu-xenial | Failed to start mysql.service: Unit mysql.service not found.22:06
jeblairsomething amiss in bindep land?22:06
mordredjeblair: I have not yet dug in to why that's working for tox-py35 and not for zuul-tox-py3522:07
mordredjeblair: I'm guessing22:07
mordredjeblair: oh - it's a sequencing22:08
jeblairjlk: adam_g left a comment on ps5 of https://review.openstack.org/474401  does it still apply with the change to pr comments rather than commit messages?22:09
jeblairmordred: cool; i'll dig into that post url thing now22:09
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs  https://review.openstack.org/47826522:09
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add a test to verify push reports only set status  https://review.openstack.org/47640022:10
mordredjeblair: cool - the new job was running extra test setup before bindep (whoops)22:10
jlkjeblair: so what Adam is saying is that what I have in change 474401 is fine, it is not broken in the way that change 476286 addressed.22:10
jeblairjlk: okay cool, sorry i got twisted around.  :)22:11
jlkI had to re-read it myself :D22:11
mordredjeblair: the zuul-* jobs are passing on zuul except for the cover job22:17
mordredjeblair: given how little I care about the cover job, I think I'm going to consider that "good for now" and treat the cover job as a followup fix22:18
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Improve debugging at start of job  https://review.openstack.org/48076122:19
jeblairthat's been driving me batty trying to read the logs ^22:19
jeblairmordred: cool.  i mean, we should fix it because it's a useful test case, but i'm down with deferring.22:20
jeblairmordred: though....22:20
mordredyah. I mean - there's a TON of iterative work that needs to be done on those jobs22:20
jeblairmordred: if we land it now, we'll go back to having zuul -1 all our changes22:20
mordrednah - it's a non-voting job anyway22:21
jeblairokay22:21
jeblairgood, because i have noticed people have been ignoring v-1 changes, even though they are just false v-1s from zuul22:21
mordredjeblair: oh - also, I don't know if it's a thing or not - but on the status page completed jobs are not updating their url to be the log location - they keep the finger location22:22
jeblairmordred: hrm, that sounds like an oversight.22:23
mordredjeblair: https://review.openstack.org/#/c/480692/ is the change to add zuul- jobs to the zuul repo https://review.openstack.org/#/c/478265/ are the jobs themselves22:23
jeblairmordred: the reason the cover job is failing is due to a lack of playbook22:24
mordredoh22:24
mordredwell that's a good reason to fail22:24
mordredlet's fix that22:25
jeblairmordred: (that happened to be the one i picked to figure out why the log link didn't show up in the report -- so i think the answer there is that we don't have friendly exception handlers for that, as well as probably a few other "normal" exceptions like when a job has an included ansible module)22:25
mordredjeblair: ah - nod22:26
openstackgerritMonty Taylor proposed openstack-infra/zuul-jobs master: Port in tox jobs from openstack-zuul-jobs  https://review.openstack.org/47826522:27
mordredk. that should fix the cover job - thanks for spotting that22:27
jeblairmordred: do you think that was the case for the other failures too?  since your recheck fixed the others...22:27
mordredjeblair: oh- no- the others were that we were running extra-test-setup before bindep22:28
mordredthe recheck there is because I uploaded a new version to zuul-jobs that fixed the order22:28
jeblairhrm, that should have reported then.  i'll continue to dig.22:28
mordredjeblair: that was a normal pre-playbook-exit-nonzero case - and it just actually did hit its retry limit22:29
jlkmordred: SpamapS: Could use a review/workflow on https://review.openstack.org/47440122:40
mordredjeblair: ok - now https://review.openstack.org/#/c/480692/ is totally green - so https://review.openstack.org/#/c/478265/ is good to go now22:46
mordredSpamapS: ^^22:46
mordredjeblair, SpamapS: don't gouge your eyes out - there are ugly copy-pasta shell scripts directly in playbooks - followup patches will start to refactor those small bits at a time22:47
jeblairmordred: i can't navigate the zuul debug log with all the json crap in it.  we've fixed that in code, but i need to restart zuulv3.22:48
jeblairmordred: i'd like to do that and then debug the log url thing when it happens again.22:48
mordredjeblair: ok22:49
mordredjeblair: it's an easy thing to reproduce - you want me to make a DNM patch that will trigger it for you?22:49
jeblairmordred: sure.  i've restarted zuul scheduler now, so it's safe to push that up.22:50
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Implement Depends-On for github  https://review.openstack.org/47440122:50
jeblairmordred: i'm going to restart the executor too, to eliminate any doubt about what we're running.  :)22:51
mordredjeblair: kk22:51
jeblairdone22:52
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook  https://review.openstack.org/48076422:54
mordredjeblair: ok - that was one of the no-log types we saw22:55
mordredjeblair: (that's the "there is no pre-playbook" case)22:55
jeblairmordred: oh, that case i understand22:56
jeblairi just don't know what the other failures were22:56
mordredk. patch for the other one coming too22:56
jeblairmordred: your jobs changes have +2s from me22:57
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: DNM Broken Pre Playbook that fails pre-task  https://review.openstack.org/48076622:57
mordredjeblair: woot!22:57
mordredjeblair: that one is a pre-playbook that has a shell task that fails22:57
jeblairok cool22:57
mordredjeblair: I also realized I can delete all of the other jobs in that one so we don't waste time waiting on them22:58
SpamapSI put up a question on 48075922:58
jeblairSpamapS: oh, sorry, i guess that's an unrelated change.  i wrote it then because i was performing a mental audit of "are single-entry config sections correct".  i wanted to record why i thought it was okay to keep zookeeper.hosts23:01
jeblairSpamapS: would you like me to spin it out into a new change, or update the commit message?23:01
SpamapSjeblair: OH I wasn't sure which one was a white lie! ;-)23:01
SpamapSno if that's the correct lie, I'm game23:01
jeblairokay, i'll respond to your comment for posterity then.  happy to update or spin out as needed though.23:02
SpamapSSo many birds in the air.. I think we'll see if we can hit a few with the scatter gun. ;-)23:03
jeblairi'm not sure if that's a metaphor or not.... :)23:04
jeblairmordred: aha! i think i get it.  it's because of the way we implement retry_limit.  we create the n+1 build, and if n+1 > retries, we fail that build.23:07
mordredjeblair: ah!23:07
jeblairmordred: that was convenient to implement, but maybe we need to do the slightly harder thing and actually have the N build return retry_limit on failure23:08
mordredjeblair: yah - I think we might need to23:08
mordredjeblair: while we're looking at that - are we hard-erroring out on non-retryable errors?23:09
mordredjeblair: for instance - there is no need to retry if ansible-playbook returns a parse error23:09
jeblairmordred: nope, we retry that.  :(23:09
jeblairi'll add that to the punch list.23:10
jeblairhttps://storyboard.openstack.org/#!/story/200110423:25
jeblairhttps://storyboard.openstack.org/#!/story/200110523:27
jeblairhttps://storyboard.openstack.org/#!/story/200110623:29
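
A sketch of the retry behaviour jeblair describes: let the final attempt itself report RETRY_LIMIT (so its log URL is what gets reported), and skip retries entirely for non-retryable failures such as an ansible-playbook parse error. The result names and run_attempt callable are placeholders for illustration.

    RETRYABLE, FATAL, SUCCESS = 'retryable', 'fatal', 'success'


    def run_with_retries(run_attempt, retries=3):
        last_url = None
        for attempt in range(1, retries + 1):
            result, log_url = run_attempt(attempt)
            last_url = log_url
            if result == SUCCESS:
                return 'SUCCESS', log_url
            if result == FATAL:      # e.g. a playbook parse error: no retry
                return 'ERROR', log_url
        # The Nth (final) attempt reports the limit, keeping its log URL.
        return 'RETRY_LIMIT', last_url
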
jlkhrm. A change (PR) that includes .zuul.yaml files, that should cause a reconfig, right? Like a change that adds another pipeline to a project?23:47
jlk(should potentially trigger that pipeline)23:47
jlkyeah it should if I read model and manager correctly23:51
jlkoh haha, my fault.23:56
