openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Speed configuration building https://review.openstack.org/509309 | 00:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Speed configuration building https://review.openstack.org/509309 | 01:33 |
*** leifmadsen has quit IRC | 03:34 | |
*** leifmadsen has joined #zuul | 03:35 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use new infra pipelines https://review.openstack.org/509224 | 04:10 |
*** mnaser has quit IRC | 04:15 | |
*** tristanC has quit IRC | 04:16 | |
*** mnaser has joined #zuul | 04:23 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Set default on fetch-tox-output to venv https://review.openstack.org/509177 | 04:38 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add TODO note about reworking fetch-tox-output https://review.openstack.org/509237 | 04:38 |
openstackgerrit | Merged openstack-infra/zuul master: Use new infra pipelines https://review.openstack.org/509223 | 04:43 |
*** tristanC has joined #zuul | 04:46 | |
*** _ari_ has quit IRC | 05:54 | |
*** weshay has quit IRC | 05:55 | |
*** pabelanger has quit IRC | 05:56 | |
*** weshay has joined #zuul | 06:20 | |
*** _ari_ has joined #zuul | 06:20 | |
*** pabelanger has joined #zuul | 06:21 | |
*** hashar has joined #zuul | 07:08 | |
*** adrianc has joined #zuul | 07:17 | |
*** isaacb has joined #zuul | 07:57 | |
*** adrianc has quit IRC | 08:08 | |
*** AJaeger has quit IRC | 08:24 | |
*** isaacb has quit IRC | 08:32 | |
*** electrofelix has joined #zuul | 08:57 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul-jobs master: Set zuul_log_path for a periodic job https://review.openstack.org/509384 | 09:17 |
*** hashar is now known as hasharAway | 09:39 | |
*** ricky_ has quit IRC | 10:26 | |
*** jkilpatr has quit IRC | 10:35 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add launcher ID to log messages https://review.openstack.org/509406 | 11:03 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add launcher ID to log messages https://review.openstack.org/509406 | 11:06 |
*** jkilpatr has joined #zuul | 11:10 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add launcher ID to log messages https://review.openstack.org/509406 | 11:23 |
Shrews | if we can get 509406 ^^^ in, it would help greatly with future debugging of nodepool | 11:30 |
*** dkranz has joined #zuul | 12:36 | |
mordred | Shrews: done | 12:42 |
dmsimard | Shrews, mordred: that kind of reminds me, is there a zuul variable in which the executor running the job is available ? | 12:42 |
dmsimard | in v2, zuul tells us which zuul-launcher is driving the job, right ? | 12:43 |
dmsimard | is that information relevant ? | 12:43 |
pabelanger | dmsimard: yes, zuul.executor.hostname | 12:44 |
dmsimard | pabelanger, mordred, Shrews: ^ how would you feel about adding that information in https://github.com/openstack-infra/zuul-jobs/blob/master/roles/emit-job-header/tasks/main.yaml ? | 12:46 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add launcher ID to log messages https://review.openstack.org/509406 | 12:46 |
mordred | dmsimard: fine by me | 12:55 |
mordred | pabelanger, clarkb: ok - this tox-linters / infra-docs tox_envlist thing is WEIRD | 12:55 |
mordred | if you look at http://logs.openstack.org/21/509221/2/infra-check/tox-linters/4b0f6a5/zuul-info/inventory.yaml ... | 12:56 |
mordred | you can see that zuul is passing tox_envlist: infra-docs | 12:56 |
mordred | tox_extra_args: -vv python setup.py build_sphinx | 12:56 |
mordred | but those variables are not set anywhere in the inheritance chain for the tox-linters job | 12:57 |
mordred | jeblair: ^^ when you get up, I think there may be a bug (I'm going to keep looking myself, but it seems like one that might wind up needing your eyeballs) | 12:57 |
dmsimard | mordred: jeblair fixed it yesterday afaik | 12:58 |
dmsimard | it was due to config rework | 12:58 |
mordred | oh! ok cool | 12:58 |
dmsimard | we noticed it in https://review.openstack.org/#/c/509254/ first and then he submitted PS2 in https://review.openstack.org/#/c/509309/ | 12:59 |
dmsimard | and it passed after a recheck | 12:59 |
dmsimard | since that was last night, I'd suspect it occurred before PS2 was reloaded | 13:00 |
mordred | nod | 13:00 |
mordred | so - any known reason why infra-check isn't showing up on http://zuulv3.openstack.org/ ? | 13:01 |
dmsimard | not sure, I've noticed that 509254 didn't merge despite passing check last night, maybe related but I was asleep back then | 13:02 |
mordred | yah | 13:02 |
pabelanger | mordred: I thought I saw infra-check / infra-gate this morning | 13:10 |
pabelanger | but I don't see them now | 13:10 |
mordred | pabelanger: I looked at the debug log and it seems to think infra-check is a thing | 13:12 |
pabelanger | Ya, I could see the pipeline in debug.log yesterday too | 13:13 |
pabelanger | hmm | 13:14 |
pabelanger | | 0000124504 | rax-ord | None | ubuntu-xenial | 5aefbb28-6219-4347-8e17-d53e71225c28 | in-use | 00:08:11:55 | locked | | 13:14 |
pabelanger | that doesn't look right | 13:14 |
pabelanger | 8+ hours, in-use | 13:14 |
pabelanger | going to see why | 13:14 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/zuul feature/zuulv3: Avoid JS error when a change has no id https://review.openstack.org/509435 | 13:14 |
*** hasharAway has quit IRC | 13:16 | |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add a generic stage-artifacts role https://review.openstack.org/509233 | 13:17 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add compress capabilities to stage artifacts https://review.openstack.org/509234 | 13:17 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: Add zuul.pipeline and zuul.executor.hostname to job header https://review.openstack.org/509436 | 13:20 |
pabelanger | dmsimard: see: https://review.openstack.org/483022/ for history. It should be in validate-host, or was | 13:24 |
dmsimard | pabelanger: hm, I figured that should be in the job header role, I'll look. brb. | 13:26 |
pabelanger | Hmm, ze07 doesn't look happy | 13:28 |
pabelanger | trying to see why | 13:28 |
pabelanger | I am seeing a bunch of leaked ssh-agent processes | 13:28 |
pabelanger | wonder if we should put a timeout on them | 13:28 |
mordred | pabelanger: we should put that on the todo-list ... | 13:28 |
mordred | pabelanger: figuring out why they're leaking ... and then having them not leak :) | 13:29 |
pabelanger | okay | 13:29 |
pabelanger | I fixed ze07 I think | 13:29 |
pabelanger | it is because there was a git clone process that was hung | 13:30 |
pabelanger | and seemed to block everything on the executor | 13:30 |
pabelanger | once I killed it, it started processing again | 13:30 |
pabelanger | mordred: okay, updated etherpad with ^ | 13:32 |
pabelanger | dmsimard: ya, we already have the info in inventory files, there should be no need to duplicate it | 13:33 |
mordred | pabelanger: do we have an entry already about that git hung process thing too? | 13:34 |
pabelanger | mordred: looking | 13:34 |
mordred | dmsimard, pabelanger: should we maybe consider including inventory files in logstash uploads? would it help e-r queries? | 13:34 |
pabelanger | mordred: ++ | 13:35 |
pabelanger | mordred: I don't see any other entry about hung processes | 13:35 |
pabelanger | mordred: actually, line 106. It could be related | 13:35 |
dmsimard | mordred: yeah, we could add it with proper indexing per field | 13:35 |
mordred | pabelanger: k. we should add that to our todo list too - it's happened a few times | 13:35 |
mordred | pabelanger: out of curiosity - was the repo it was hung on glance-specs? | 13:36 |
mordred | dmsimard: ++ | 13:36 |
pabelanger | mordred: project-config | 13:36 |
mordred | pabelanger: ok. cool | 13:36 |
dmsimard | mordred: I can probably send a patch for that | 13:36 |
pabelanger | guessing networking | 13:36 |
mordred | pabelanger: all the previous times I'd seen it it was glance-specs | 13:36 |
pabelanger | ya | 13:36 |
pabelanger | same | 13:36 |
mordred | so I was starting to worry about that repo :) | 13:36 |
pabelanger | checking other executors now | 13:36 |
pabelanger | ze06.o.o stuck, fixed | 13:38 |
pabelanger | openstack/openstack repo | 13:38 |
pabelanger | checking zuul-mergers too | 13:40 |
dmsimard | mordred: fyi speaking of e-r, zuulv3 support is here https://review.openstack.org/#/c/509313/ (builds on top of the zuul_stream new emit playbook header) | 13:41 |
pabelanger | zm05 hung :( | 13:41 |
pabelanger | cleaning it up | 13:41 |
pabelanger | k, just zm05.o.o. So ya, we should see about wrapping our git clone with timeouts | 13:42 |
dmsimard | mordred, pabelanger: could you point me in the right direction where zuul would be sending metrics to graphite ? grepfu is failing me | 13:43 |
dmsimard | mordred, pabelanger: nevermind, through statsd | 13:44 |
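For reference, Zuul reports its metrics through the statsd protocol rather than talking to graphite directly. A minimal sketch with the Python statsd client — the host and metric names here are illustrative, not Zuul's actual keys:

```python
import statsd

# fire-and-forget UDP metrics; a statsd daemon aggregates them into graphite
client = statsd.StatsClient('graphite.example.org', 8125, prefix='zuul')
client.incr('scheduler.events')             # counter
client.timing('job.duration', 1234)         # timer, in milliseconds
client.gauge('executor.load_average', 4.2)  # point-in-time value
```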
pabelanger | mordred: okay, once zuul-executors started running again, the jobs in check pipeline picked up where things left off. So, zuul was able to recover properly | 13:46 |
pabelanger | which is nice :) | 13:46 |
mordred | dmsimard: looks good! it does make me wonder ... | 13:50 |
mordred | dmsimard, pabelanger, tobiash, jeblair, jlk, SpamapS, clarkb, fungi: should we consider renaming job-output.txt to console.html before re-rollout? or maybe just call the html wrapper thing that consumes job-output.json console.html instead, whenever we write it (I know clarkb wants us to leave the pure-text job-output.txt for ease of downloading/grepping) | 13:52 |
mordred | dmsimard: also - I noticed a http://logs.openstack.org/21/509221/2/infra-check/build-openstack-sphinx-docs/fa00cdc/job-output.txt.gz#_2017-10-03_23_37_59_484348 | 13:53 |
pabelanger | mordred: I too prefer job-output.txt to console.html | 13:53 |
pabelanger | however the HTML wrapper seems like a good thing for the json file | 13:53 |
dmsimard | I don't have a strong opinion | 13:53 |
mordred | pabelanger: yah - the intent is to write an html/javascript that takes job-output.json and makes it look like what job-output.txt looks like now - except with expandable/collapsible sections and details | 13:54 |
dmsimard | mordred: really ? | 13:54 |
SpamapS | mordred: +1 to html and plain being available. | 13:54 |
mordred | so that we'll wind up with 3 main presentations of that data ... | 13:54 |
dmsimard | mordred: plaintext and ara aren't enough ? :p | 13:54 |
mordred | 1) the text file - which is what is streamed to the websocket streaming | 13:54 |
dmsimard | oh, the full "unadulterated" ansible stdout | 13:55 |
mordred | 2) the new console.html which will look like the text file but will have sections - so for folks comfortable with that view but who want a little more ability to poke | 13:55 |
pabelanger | speaking of ARA, I did notice it using a bit of CPU for generating reports. I didn't profile it, but something we should look at on the executors. However, might not be much of an issue now that SpamapS patch is running | 13:55 |
mordred | 3) ara - for exploring the build from an ansible POV | 13:56 |
mordred | hopefully between the three of those we should cover folks depending on how they like to think about things | 13:56 |
dmsimard | pabelanger: I'd expect that it would use cpu capacity when generating reports, yes, but hopefully for a very short burst/duration | 13:57 |
mordred | I imagine some folks are going to want to ignore the ansible-ness of things and just have ansible run shell scripts - which should be fine | 13:57 |
dmsimard | fairly high-volume playbooks (such as OSA with >20 hosts >2000 tasks) take no more than a few seconds to generate | 13:57 |
mordred | then - once we get tristanC's dashboard in - I would also imagine a way for someone to set a personal preference "show me ara by default" or "show me console.html by default" - and a dashboard job result link could show you the thing that makes sense | 13:58 |
mordred | dmsimard, pabelanger: now that we have SpamapS patch in - that one seems like just a thing to deal with via capacity planning and executor scale out (also, thanks for that patch SpamapS !!!) | 13:59 |
Shrews | I'd like to restart nl01 and nl02 to get the new logging enhancement. Anyone object? | 14:00 |
pabelanger | mordred: Ya, so far executors are working great (minus git clone issue). I'd like to play with our threshold too, right now we are at 20.00 load cap. We likely can bump up a little more based on zuul-launcher history | 14:01 |
pabelanger | Shrews: wfm | 14:01 |
dmsimard | It's also worth mentioning that generating static reports is very convenient but really not efficient (in terms of storage), eventually we can consider other ways of providing the ara reports like what pabelanger and I have been discussing | 14:01 |
Shrews | holy smokes. >5300 requests in the node request pool | 14:02 |
Shrews | wth is happening there? | 14:02 |
pabelanger | yah, I really like how stackviz works. If we could somehow capture the data in ansible, then at a later time have ARA generate it, that would be awesome | 14:02 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Add a generic process-test-results role https://review.openstack.org/509459 | 14:03 |
pabelanger | Shrews: that is because 2 zuul-executors were stuck for about 8 hours | 14:03 |
pabelanger | so we didn't process any jobs | 14:03 |
pabelanger | it should be draining now | 14:03 |
dmsimard | pabelanger: stackviz generates statically in-job, no ? | 14:04 |
dmsimard | pabelanger: or you mean openstack-health ? | 14:04 |
pabelanger | dmsimard: stackviz does today, but nothing stopping us from doing that at a later point in time | 14:04 |
dmsimard | right | 14:05 |
rcarrillocruz | heya | 14:06 |
rcarrillocruz | Shrews: around? | 14:06 |
rcarrillocruz | i'm getting "AttributeError: 'OpenStackNodeRequestHandler' object has no attribute 'launcher_id' " on nodepool | 14:07 |
rcarrillocruz | saw a recent change you pushed, like a few hours ago? | 14:07 |
rcarrillocruz | is that supposed to be fixed by now | 14:07 |
Shrews | rcarrillocruz: yup. good grief you are running on the edge | 14:07 |
rcarrillocruz | heh | 14:07 |
Shrews | rcarrillocruz: just saw that myself. working on a fix | 14:07 |
rcarrillocruz | thx sir | 14:07 |
dmsimard | edgy | 14:08 |
Shrews | gah. that attribute isn't set immediately in the parent | 14:09 |
Shrews | that stinks | 14:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Log exceptions from MODULE FAILURE more consistently https://review.openstack.org/509484 | 14:14 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Set log after we have launcher_id https://review.openstack.org/509485 | 14:14 |
mordred | dmsimard: ^^ that should fix the issue seen here: http://logs.openstack.org/21/509221/2/infra-check/build-openstack-sphinx-docs/fa00cdc/job-output.txt.gz#_2017-10-03_23_37_59_484348 | 14:14 |
Shrews | mordred: pabelanger: ^^^ can we push this through? nl02 is down until we get that merged | 14:14 |
Shrews | thx | 14:15 |
Shrews | rcarrillocruz: 509485 is the fix. also, standard warning about running latest code, blah blah blah, bugs, blah blah blah | 14:16 |
rcarrillocruz | lulz | 14:16 |
rcarrillocruz | thx | 14:16 |
rcarrillocruz | +2 | 14:16 |
Shrews | but it's kinda cool someone is running our code before WE are :) | 14:18 |
rcarrillocruz | heh | 14:18 |
jeblair | pabelanger: what repo was hung with the git clone? | 14:18 |
jeblair | nm, i see it in scrollback now | 14:19 |
rcarrillocruz | i'm continuously deploying nodepool, as part of a ci installer for ansible networking | 14:19 |
Shrews | rcarrillocruz: neat! risky, but neat. | 14:20 |
rcarrillocruz | ansible-role-nodepool doesn't install nodepool with latest, but as i tore down the environment to bring it up from scratch today i got the issue | 14:22 |
pabelanger | :) | 14:23 |
rcarrillocruz | hmm | 14:25 |
rcarrillocruz | o-k | 14:25 |
rcarrillocruz | so as zuul v3 has 20 capacity, i better stop looking at zuul dashboard | 14:26 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Avoid JS error when a change has no id https://review.openstack.org/509435 | 14:27 |
fungi | rcarrillocruz: whatcha mean by "20 capacity?" | 14:33 |
fungi | its capacity is 20% of our aggregate quota | 14:33 |
fungi | so maybe that? | 14:34 |
rcarrillocruz | yeah, that's what i meant | 14:34 |
fungi | that's still something like 200 nodes | 14:34 |
pabelanger | ya, it is a little backed up because of the blockage this morning | 14:34 |
pabelanger | but pipeline is moving | 14:34 |
rcarrillocruz | yeah, but there seems to be a long queue ahead of the change | 14:35 |
Shrews | nl02 being down might back it up again, but hope to have it up soon | 14:35 |
pabelanger | rcarrillocruz: which change? | 14:36 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move test_model.test_job_inheritance_configloader https://review.openstack.org/509495 | 14:36 |
rcarrillocruz | https://review.openstack.org/509485 | 14:36 |
rcarrillocruz | the thing i was chatting with Shrews about earlier | 14:37 |
jeblair | that is not especially urgent, but if i do any more config rework, i'm going to build on that. because i spent as much time changing that test for the actual work i did earlier as i just now spent moving it so i won't have to do that again. | 14:37 |
pabelanger | rcarrillocruz: I don't think nodepool is gated by zuulv3 | 14:37 |
jeblair | we should probably move it to the infra queues | 14:38 |
Shrews | no, that's waiting on v2 | 14:38 |
rcarrillocruz | but check is on zuulv3 dashboard at least | 14:38 |
rcarrillocruz | oh wait | 14:38 |
rcarrillocruz | so | 14:38 |
pabelanger | Ya, we run all checks for zuulv3 but zuulv2.5 is the gate | 14:38 |
rcarrillocruz | events | 14:38 |
rcarrillocruz | are being processed by both zuuls | 14:38 |
rcarrillocruz | gotcha | 14:38 |
pabelanger | rcarrillocruz: we did create infra-check and infra-gate for zuulv3 on projects that are gating. eg: zuul, zuul-jobs, openstack-zuul-jobs, which will be high priority | 14:39 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Speed configuration building https://review.openstack.org/509309 | 14:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move test_model.test_job_inheritance_configloader https://review.openstack.org/509495 | 14:41 |
*** leifmadsen has quit IRC | 15:01 | |
*** leifmadsen has joined #zuul | 15:06 | |
pabelanger | Shrews: I'm seeing about 44 used nodes that are unlocked right now, but nodepool-launcher doesn't appear to be cleaning them up. Based on the discussion the other day, I thought you mentioned nodepool should delete those | 15:11 |
Shrews | pabelanger: are they owned by nl02? | 15:12 |
pabelanger | Shrews: checking | 15:12 |
pabelanger | Shrews: yah, they appear to be | 15:12 |
Shrews | k. nl02 is not running | 15:13 |
pabelanger | okay, cool! That explains it | 15:13 |
Shrews | and the fix doesn't appear to be moving :( | 15:13 |
clarkb | mordred: my only concern with adding more and more representations of the data like that is log storage and retention | 15:14 |
clarkb | console logs tend to be large ish (though not our largest source of disk consumption) | 15:14 |
pabelanger | Shrews: we could enqueue into gate if needed, or move nodepool to zuulv3 infra-check / infra-gate? | 15:16 |
Shrews | pabelanger: 1st thing sounds good | 15:17 |
Shrews | then maybe the 2nd thing after for future changes? | 15:17 |
pabelanger | Shrews: 509485 right? | 15:18 |
Shrews | pabelanger: yes | 15:19 |
pabelanger | okay, enqueued | 15:20 |
Shrews | jeblair: for this double locking thing, acceptNodes() already checks if the request it was given is canceled. Does it need to see if request.uid still exists in self.requests (assuming it was canceled and deleted before it pulled the event off the queue)? | 15:22 |
Shrews | (just trying to make sure i have the sequence of events right in my head) | 15:23 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Set log after we have launcher_id https://review.openstack.org/509485 | 15:23 |
pabelanger | Shrews: ^ | 15:24 |
Shrews | pabelanger: cool. weird that it still shows on status.o.o | 15:24 |
pabelanger | Shrews: ya, it will still run in the check pipeline then report results | 15:25 |
Shrews | pabelanger: will wait for puppet to update it then restart | 15:25 |
jeblair | Shrews: i think the cancel we're talking about here is zuul itself canceling it. but in the thing we saw, we lost the zk connection, so it disappeared. | 15:25 |
pabelanger | okay, relocating to library. will return shortly | 15:25 |
Shrews | jeblair: oh, so _updateNodeRequest should be the thing that updates the copy of the request with "this disappeared" so acceptNodes can check that and act accordingly? | 15:28 |
Shrews | gah. i need to think through this some more so i can see how the various pieces interact | 15:30 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove test_job_inheritance https://review.openstack.org/509509 | 15:30 |
jeblair | Shrews: likely so -- i imagine that's getting called but not doing anything useful for the 'znode disappeared' case right now | 15:31 |
Shrews | ah ha, that's a watcher callback. that's one piece in place | 15:34 |
*** yolanda has left #zuul | 15:34 | |
Shrews | i may not be quick, but i sure am slow | 15:34 |
mordred | clarkb: yes - there should be no additional representations beyond what we have now | 15:36 |
dmsimard | mordred: That whole module failure handling, not just your patch but what exists elsewhere for handling module failures seems needlessly complicated to me ? CallbackModule ships methods to handle warnings and exceptions, we could use them ? Or improve them if we feel they're not good enough ? | 15:36 |
mordred | clarkb: console.html will be an in-browser javascript rendering of job-output.json | 15:36 |
dmsimard | mordred: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/default.py#L52-L53 && https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/__init__.py#L111-L135 | 15:36 |
mordred | dmsimard: yah - the problem is that those use self._display which is not useful to us | 15:37 |
dmsimard | CallbackModule.display = self.log ? | 15:37 |
dmsimard | or something | 15:37 |
dmsimard | I dunno | 15:37 |
mordred | dmsimard: now - maybe what we should do is define something that behaves like self._display that we can use to override it and re-use base methods | 15:38 |
mordred | dmsimard: I agree in general - zuul_stream needs a refactor which is coming up very soon on my todo list | 15:38 |
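A rough sketch of the override mordred suggests: an object that quacks like Ansible's Display but routes to a logger, so the callback base class's helpers become reusable. Method names mirror ansible.utils.display.Display; this is an assumption sketch, not zuul_stream's actual code:

```python
import logging

class LogDisplay(object):
    # duck-typed stand-in for ansible.utils.display.Display
    def __init__(self, logger):
        self.log = logger

    def display(self, msg, color=None, stderr=False,
                screen_only=False, log_only=False):
        self.log.info(msg)

    def warning(self, msg, formatted=False):
        self.log.warning(msg)

    def error(self, msg, wrap_text=True):
        self.log.error(msg)

# in the callback plugin's __init__, something like:
#     self._display = LogDisplay(logging.getLogger('zuul.executor.ansible'))
```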
Shrews | | None | None | None | None | None | used | 00:00:00:00 | unlocked | | 15:38 |
Shrews | that's one of our nodes | 15:38 |
dmsimard | mordred: anyway, I'm not -1 on your patch if it fixes the issue -- I'm just saying we need to keep things as simple as possible :) | 15:39 |
Shrews | if someone wants to try to track down why it's all empty, that would be great | 15:39 |
dmsimard | Shrews: ah interesting, that'd explain the None I've been seeing on http://grafana.openstack.org/dashboard/db/nodepool | 15:40 |
mordred | dmsimard: totally agree | 15:40 |
mordred | Shrews: I managed to get myself nerdsniped from your logging patch to nodepool earlier ... | 15:40 |
Shrews | mordred: sorry? | 15:41 |
mordred | Shrews: (the [%s] on the end of the logger name bothered me, because python logging otherwise has facilities for attaching arbitrary data and being structured for flexible consume-side logging config ... | 15:41 |
mordred | BUT - I have learned a new thing and have a fun patch coming that I think should be nice in a few different places | 15:42 |
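For context, the stdlib facility mordred is referring to: a LoggerAdapter attaches arbitrary fields to every record and the formatter renders them, instead of baking an "[id]" suffix into the logger name. A sketch with illustrative names (the formatter assumes every record carries the extra field):

```python
import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(name)s: [%(launcher)s] %(message)s')

# the adapter injects {'launcher': ...} into each record it emits
log = logging.LoggerAdapter(logging.getLogger('nodepool.launcher'),
                            {'launcher': 'nl01-PoolWorker'})
log.info('Assigning node request %s', '200-0000118158')
```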
Shrews | k. at any rate, nl02 is now restarted and doing things | 15:43 |
Shrews | pabelanger: ^^^ | 15:43 |
Shrews | and the request list is empty. going to restart nl01 now | 15:44 |
dmsimard | Shrews: grafana shows that "None" node as far back as 6 months ago /me shrug | 15:44 |
Shrews | nl01 restarted | 15:46 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add git timeout https://review.openstack.org/509517 | 15:48 |
* SpamapS has coffee now | 15:51 | |
*** hashar has joined #zuul | 16:00 | |
pabelanger | Shrews: great, thanks | 16:03 |
jeblair | Shrews: 2017-10-03 18:39:48,998 INFO zuul.nodepool: Returning nodeset <NodeSet OrderedDict([('ubuntu-xenial', <Node None ubuntu-xenial:ubuntu-xenial>)])OrderedDict()> | 16:06 |
pabelanger | just seen this too | 16:06 |
pabelanger | 2017-10-04 16:04:47,782 DEBUG zuul.zk.ZooKeeper: ZooKeeper connection: LOST | 16:06 |
pabelanger | there we go, it is back | 16:06 |
pabelanger | err, trying to connect | 16:06 |
jeblair | pabelanger: let's not debug zk connection issues while i'm performing memory debugging | 16:06 |
pabelanger | jeblair: okay! | 16:07 |
jeblair | Shrews: so, somehow, zuul decided to return a nodeset before it was allocated | 16:08 |
Shrews | jeblair: i'm beginning to get concerned that you are resubmitting node requests with the same UID. b/c nodepool could update the request out from under zuul if you have resubmitted it. | 16:10 |
Shrews | and also, it makes it difficult to match up the node acceptance with the proper request iteration | 16:11 |
jeblair | Shrews: does nodepool use the request.uid? | 16:12 |
Shrews | jeblair: only for updating the request. otherwise, no | 16:12 |
jeblair | Shrews: why? that's internal to zuul... | 16:13 |
jeblair | Shrews: i don't see uid in nodepool's NodeRequest model | 16:13 |
Shrews | not talking about nodepool modifying the uid. but the info for the request (the set of nodes, in particular) | 16:14 |
jeblair | Shrews: okay, you may need to explain the issue to me with more words, sorry. | 16:14 |
Shrews | jeblair: i'm not sure that there is an issue yet. just beginning to formulate this in my head | 16:15 |
Shrews | jeblair: but in the current thing i'm debugging (the double node locking thing), i'm concerned that the request will enter the queue with one set of data, but could possibly be modified with a different set of data before it can be processed out of the queue | 16:18 |
jeblair | Shrews: who's doing the modifying? | 16:18 |
Shrews | jeblair: nodepool (b/c it's processing the request twice). the request watcher you use in zuul would update the request if nodepool changes it. | 16:19 |
jeblair | Shrews: why would nodepool process the request twice? | 16:20 |
Shrews | jeblair: because you resubmit it if zuul notices it disappears | 16:20 |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool feature/zuulv3: Provide resource_name in logging as structured data https://review.openstack.org/509531 | 16:20 |
Shrews | jeblair: is this scenario possible: 1) zuul submits NR 2) np assigns nodes 3) zuul enqueues the NR to be handled (but processing of the queue is slow) 4) zk connection is lost, zuul resubmits same NR 5) nodepool assigns new set of nodes to same request (which updates the request in zuul's queue) | 16:23 |
Shrews | my concern is that if 5 happens before the request is processed out of the queue, that we lose the original request data | 16:24 |
Shrews | could totally not be a problem, but i'm not yet familiar enough with this code to know if it's a valid concern | 16:28 |
jeblair | Shrews: i see what you're getting at; since we map the znode request id to a nodepool request object, while we know from zk that it's 2 different requests that have been updated, by the time it gets to the zuul event queue, it just appears as the same request twice. i don't think there's any data corruption, except for the fact that we could end up accepting the same request twice. | 16:34 |
jeblair | Shrews: however, *a lot* of things internally are based on that request object being persistent, so it may be better to just find a way to invalidate queued events if they end up becoming invalid (perhaps by checking that the underlying nodepool request id for the request matches what it was when the event was enqueued) | 16:35 |
Shrews | jeblair: yeah, that's what i was (poorly) attempting to get at with the re-used UID | 16:36 |
Shrews | jeblair: also, i'm not sure if it's the *same* request object in the queue twice (thus both would get changed), or if they're copies | 16:38 |
* Shrews mumbles about python underneath his breath | 16:39 | |
clarkb | id() will tell you | 16:40 |
clarkb | might be something to log if it is a concern | 16:41 |
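clarkb's suggestion spelled out — log the object identity at enqueue time, so two queue entries can later be checked for aliasing; the function and names are hypothetical:

```python
import logging

log = logging.getLogger('zuul.nodepool')

def enqueueEvent(event_queue, request):
    # two entries that print the same id() are the same object, not copies
    log.debug('Enqueuing %s (object id 0x%x)', request, id(request))
    event_queue.put(request)
```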
*** hashar is now known as hasharAway | 16:46 | |
*** AJaeger has joined #zuul | 17:00 | |
jeblair | Shrews: there is a single zuul NodeRequest object, so it would be added to the queue twice. but if the connection has been lost the underlying znode object has changed. so adding the znode id to the event along with the request would let us verify that the event is still valid. | 17:06 |
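The shape of the invalidation jeblair describes, as a sketch with hypothetical names (not Zuul's actual classes): record the znode id alongside the event, and drop the event if the request has since been resubmitted under a new znode:

```python
import queue

class RequestEventQueue(object):
    def __init__(self, nodepool):
        self.nodepool = nodepool
        self.events = queue.Queue()

    def onNodesProvisioned(self, request):
        # capture the znode id as it was when the event fired
        self.events.put((request, request.id))

    def process(self):
        request, event_id = self.events.get()
        if request.id != event_id:
            return  # resubmitted since; this event is stale
        self.nodepool.acceptNodes(request)
```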
jeblair | Shrews: the "None" node was from request 100-0000118403 | 17:11 |
jeblair | hrm, i'm not sure about that actually | 17:17 |
dmsimard | Looks like zuulv3.o.o has gone lights out again :( | 17:40 |
jeblair | Shrews: it was request 200-0000118158 | 17:40 |
jeblair | Shrews: and i think i see the error | 17:40 |
jeblair | Shrews: i think the connection was in the process of going offline when it was updating the request for the fulfilled event. so it managed to set request.state=fulfilled, but did not manage to update the nodes. | 17:45 |
pabelanger | jeblair: +2 on 509517 but comment was left. noticed something in API docs for timeout that might affect us | 17:45 |
jeblair | Shrews: the joys of a non-transactional system | 17:46 |
pabelanger | dmsimard: running slowly, memory debugging | 17:46 |
Shrews | jeblair: transactions are supported | 17:46 |
Shrews | jeblair: http://kazoo.readthedocs.io/en/latest/api/client.html#kazoo.client.KazooClient.transaction | 17:47 |
Shrews | i suspect we should really be taking more advantage of that | 17:47 |
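What the kazoo API Shrews links to looks like in practice — both updates land atomically or neither does, which is the guarantee he has in mind; paths and payloads are illustrative:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='zk01.example.org:2181')
zk.start()

txn = zk.transaction()
txn.set_data('/nodepool/requests/200-0000118158',
             b'{"state": "fulfilled"}')
txn.set_data('/nodepool/requests/200-0000118158/nodes',
             b'["0000124504"]')
results = txn.commit()  # per-operation results; all-or-nothing
```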
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update node requests after nodes https://review.openstack.org/509571 | 17:47 |
jeblair | Shrews: i mean locally. from what i can tell, we aborted halfway through updating our internal data structure. | 17:48 |
jeblair | Shrews: we'd need an internal transaction mechanism to deal with that. | 17:49 |
jeblair | Shrews: anyway ^ that's one possible quick fix; feel free to incorporate it or discard it depending on how it relates to your current work. | 17:49 |
jeblair | pabelanger: yes, i agree it could be a problem. if we figure out how to detect when that has happened, we should probably delete the repo. i don't know how to do that at the moment, and this should at least avoid stopping the world. | 17:53 |
Shrews | jeblair: where did you see that a znode has an id? I do not see that here: https://kazoo.readthedocs.io/en/latest/api/protocol/states.html#kazoo.protocol.states.ZnodeStat | 18:54 |
Shrews | session id might be useful. could compare it to the current connection's session id | 18:55 |
Shrews | need to run a test to see how that works.... | 18:56 |
jeblair | Shrews: the request has an id (the znode name); in zuul we store it as NodeRequest.id | 19:05 |
jeblair | Shrews: if zuul resubmits the request, that will be updated with the new znode | 19:06 |
Shrews | jeblair: oh! duh, it's a sequence znode. | 19:09 |
jeblair | yeah, so we always know it's different if it was resubmitted | 19:12 |
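For reference, the sequence-znode behavior they are relying on: ZooKeeper appends a monotonically increasing counter to the requested path, so a resubmitted request always comes back under a new name/id. Paths are illustrative:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts='zk01.example.org:2181')
zk.start()

first = zk.create('/nodepool/requests/100-', b'', sequence=True,
                  ephemeral=True, makepath=True)
retry = zk.create('/nodepool/requests/100-', b'', sequence=True,
                  ephemeral=True)
print(first)  # e.g. /nodepool/requests/100-0000118403
print(retry)  # e.g. /nodepool/requests/100-0000118404
```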
dmsimard | executors aren't automatically updated on each zuul merge, right ? | 19:24 |
clarkb | dmsimard: not the python code they are running no | 19:26 |
clarkb | dmsimard: anything in the ansible bits should be as that is forked on demand | 19:26 |
dmsimard | clarkb: what decides when the python code is reloaded ? it's on demand ? | 19:27 |
clarkb | ya I think we do it manually when needed | 19:27 |
clarkb | well, puppet updates the install; restarts to pick it up are on demand | 19:27 |
fungi | for zuul source code, in our environment puppet deploys it periodically from branch tip and then we run the new version whenever the daemon is manually restarted | 19:27 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 19:27 |
Shrews | jeblair: ^^^ untested, but maybe that's enough? | 19:28 |
dmsimard | okay, I was wondering why https://review.openstack.org/#/c/509254/ did not seem to be effective yet but that explains it | 19:28 |
Shrews | jeblair: not sure how to write a test for that | 19:28 |
Shrews | ugh, i don't think that will work if the request.id value can change from underneath us at that point (due to re-use of the request object) | 19:33 |
pabelanger | jeblair: Ya, deleting might work. No issue waiting until it actually happens and dealing with it then | 19:35 |
jeblair | Shrews: ya, i'm starting to like the idea that we embrace that and use it. so set request.id back to None in the callback when the node is deleted from under us. then include the request id in any events we enqueue. don't respond to an event if the request id doesn't match. | 19:35 |
jeblair | pabelanger: if we can figure out how to force a timeout (maybe i could clone the kernel over dsl with a 1s timeout) we can probably prepare for it | 19:35 |
pabelanger | For sure | 19:36 |
jeblair | Shrews: then we invalidate events in both cases: whether we process the event before or after resubmission | 19:36 |
jeblair | pabelanger: if you want to run with that idea (not sure how fast your new connection is, but if you can clone the kernel in 1s i will be really impressed) feel free, i'm still swamped debugging memory | 19:37 |
Shrews | jeblair: what good does setting the request.id to None do if we just resubmit it and it gets a new id? | 19:37 |
pabelanger | jeblair: sure, i can take a go at it | 19:37 |
Shrews | totally get the "include req id" in the queue part | 19:38 |
jeblair | Shrews: oh, hrm, we resubmit it in the callback. it would only be None for a very short while. still, it narrows a race, and if resubmitting fails, we'd still be in a better state. | 19:40 |
Shrews | k | 19:40 |
jeblair | Shrews: also, i think maybe the delete handling in _updateNodeRequest should be first, rather than last? :) | 19:41 |
jeblair | i think that erroneously assumes that we can still do something with a fulfilled node request that was lost. but we can't. | 19:42 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Handle double node locking snafu https://review.openstack.org/509603 | 19:47 |
Shrews | jeblair: something along the lines of that ^^^ ? | 19:47 |
Shrews | (that may possibly break all sorts of tests, but covers all the things, i think) | 19:49 |
jeblair | Shrews: ya, left some further thoughts inline | 19:57 |
Shrews | oh right, forgot the id comparison | 19:58 |
Shrews | i'll wait for the tests to finish before i add that | 20:00 |
Shrews | and the other suggestion | 20:01 |
jeblair | i think the answer to the memory problem may be in a 616 node object graph. does anyone have a large format printer? | 20:03 |
*** ianw is now known as ianw|pto | 20:04 | |
fungi | rs232, already loaded up with a fanfold stack of tractor-fed greenbar! | 20:04 |
mnaser | jeblair: i have access to a plotter though it'll be a little while before the print can get to you | 20:05 |
fungi | jeblair: just |lp | 20:05 |
mnaser | on second thought, jeblair, time to buy the biggest highest res tv on the market and expense that because large object graphs | 20:06 |
mnaser | :P | 20:06 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 20:08 |
rcarrillocruz | hey folks | 20:08 |
rcarrillocruz | we lost per label groups in nodepool | 20:08 |
clarkb | I have roll of paper drop cloth and crayons | 20:08 |
rcarrillocruz | i believe on the nodepool openstack driver split | 20:09 |
rcarrillocruz | ^ | 20:09 |
SpamapS | per label? | 20:09 |
rcarrillocruz | can I get eyes on it? without that feature, i have my workflow broken | 20:09 |
rcarrillocruz | yeah | 20:09 |
SpamapS | rcarrillocruz: note that it disappeared because it's not tested. :) | 20:10 |
rcarrillocruz | https://git.openstack.org/cgit/openstack-infra/nodepool/commit/?h=feature/zuulv3&id=7c3263c7df08bf824a1a8a87279d4e8ca547fd63 | 20:10 |
SpamapS | You may want to consider adding a test to make sure that doesn't happen in the next refactor :) | 20:11 |
rcarrillocruz | where are the driver tests | 20:11 |
rcarrillocruz | tristanC: around? | 20:12 |
rcarrillocruz | i guess TestNodeLaunchManager | 20:12 |
pabelanger | jeblair: first issue with kill_after_timeout: https://github.com/gitpython-developers/GitPython/blob/master/git/repo/base.py#L920 hardcodes as_process=True, which doesn't work with kill_after_timeout. Looking to see why that would be | 20:13 |
Shrews | rcarrillocruz: test_launcher.py is the place | 20:17 |
rcarrillocruz | ack, let me see | 20:18 |
jeblair | pabelanger: ok. let me know what you find. maybe we need to make our own version of that method without the progress handling stuff (we don't use it anyway). | 20:19 |
pabelanger | jeblair: I'm currently patching GitPython to see if we could make it work | 20:21 |
clarkb | pabelanger: jeblair that may be something we can configure in a gitconfig too? | 20:27 |
dmsimard | rcarrillocruz: tristanC is on PTO for the next few weeks | 20:28 |
rcarrillocruz | k thx, nm i was pointed to the place for writing a test | 20:29 |
*** jkilpatr has quit IRC | 20:44 | |
mordred | rcarrillocruz: patch looks sane to me - +2 - but obvs would prefer with test :) | 20:51 |
rcarrillocruz | yeah, looking at it | 20:51 |
pabelanger | clarkb: oh, maybe | 20:51 |
rcarrillocruz | just spun up a xenial vm | 20:51 |
pabelanger | I'll have to read up on it | 20:51 |
pabelanger | clarkb: TIL: https://stackoverflow.com/questions/6458790/is-there-a-way-to-make-git-over-http-timeout | 20:52 |
mordred | pabelanger: ooh - that looks promising | 20:53 |
pabelanger | I think I got gitpython working (found a bug) but need to better understand the code changes | 20:53 |
pabelanger | but, have to run now for family dinner | 20:53 |
clarkb | pabelanger: cool, I figured there would be something like that in the large list of git configoptions | 20:54 |
pabelanger | ya | 20:54 |
pabelanger | I'll see about testing / reading up about it | 20:54 |
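A sketch of the git-side timeout from that StackOverflow link, applied through GitPython; it assumes an http(s) remote named origin, and the paths/values are illustrative. git aborts any transfer that stays below lowSpeedLimit bytes/sec for lowSpeedTime seconds; kill_after_timeout is GitPython's own backstop (which, per the hardcoded as_process=True issue noted above, doesn't currently apply to Repo.clone_from):

```python
import git

repo = git.Repo.init('/var/lib/zuul/executor-git/example/project')
with repo.config_writer() as conf:
    # give up if we receive <1000 bytes/s for 30 consecutive seconds
    conf.set_value('http', 'lowSpeedLimit', '1000')
    conf.set_value('http', 'lowSpeedTime', '30')

# hard cap on the whole fetch as a second line of defense
repo.git.fetch('origin', kill_after_timeout=300)
```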
*** hasharAway has quit IRC | 20:58 | |
mrhillsman | working on deploying zuul, got it integrated with a github project, do i need nodepool? | 21:14 |
jlk | what do you want to do with it? | 21:16 |
jlk | If you want to be able to execute things inside ephemeral environments, then yes | 21:16 |
jlk | otherwise there are very limited things you could do directly on zuul executors (basically poke remote URLs) | 21:16 |
mrhillsman | ok cool | 21:16 |
mrhillsman | working on creating a ci/cd | 21:17 |
mrhillsman | environment | 21:17 |
mrhillsman | reading the docs it is a bit difficult to figure things out | 21:17 |
jlk | gotcha. | 21:17 |
jlk | The typical scenario is that you have a nodepool hooked up to one or more OpenStack providers. Nodepool will generate images of your choice, and satisfy node needs from zuul jobs (each job uses a node) | 21:18 |
mrhillsman | correct me if i am wrong but i need to have like a project-config (configuration project), zookeeper, gearman, nodepool, zuul jobs project | 21:18 |
jlk | it can make those nodes hot-ready, or provision them on-demand. | 21:18 |
clarkb | jlk: mrhillsman also keep in mind you need to be very specific about what versions of nodepool/zuul you are deploying | 21:18 |
mrhillsman | v3 | 21:18 |
jlk | mrhillsman: you do not need to have a specific zuul jobs repository. You could place the jobs in the project-config repo if you wished. | 21:19 |
mrhillsman | i pulled the feature/zuulv3 branch | 21:19 |
mrhillsman | ah ok | 21:19 |
jlk | although that does prevent some things from happening | 21:19 |
jlk | project-config needs to be a 'trusted' repository, like where secrets are defined | 21:19 |
clarkb | you also don't need to run a gearman, zuul-scheduler will fork one for you. | 21:19 |
jlk | Zuul will not attempt to use configuration in a proposed change to test the change to a trusted repository | 21:20 |
jlk | so if you'd like to be able to test your proposed change, it needs to be in an untrusted repository | 21:20 |
clarkb | but yes if using v3 then you'll want a zookeeper + nodepool and a project config repo | 21:20 |
jlk | which is why we would have a "jobs" repo | 21:20 |
mrhillsman | cool | 21:21 |
mrhillsman | makes sense now | 21:21 |
fungi | also, the intent (at least eventually) is that basic jobs defined in the upstream openstack-infra/zuul-jobs repo are general-purpose building blocks you can use directly or inherit from in your own custom job definitions | 21:21 |
mrhillsman | i forked that one | 21:21 |
mrhillsman | as well as project-config and just removed a lot of stuff | 21:21 |
fungi | we're trying to keep the jobs/roles in that zuul-jobs repo pristine and free of openstackisms | 21:22 |
jlk | yeah, the project-config upstream is for openstack, so it'll be... huge. | 21:22 |
fungi | basically, zuul-jobs is intended to be a batteries-included stdlib for zuul | 21:22 |
mrhillsman | the docs are a bit difficult to fully understand but forking those and trial/error got me to the point of a noop job successfully updating github | 21:22 |
jlk | nice | 21:23 |
jlk | we're really interested in suggestions on doc updates / restructuring :) | 21:23 |
mrhillsman | so figured it was time to start asking questions as we need to move on to real jobs running | 21:23 |
mrhillsman | and was not entirely sure i needed nodepool so about to dive into that :) | 21:24 |
jlk | do you have an OpenStack available to you? | 21:24 |
fungi | please ask, we don't have many early adopters for the unreleased v3 so any feedback is useful | 21:24 |
clarkb | mrhillsman: theoretically you don't need a nodepool, just something to do the zookeeper node request dance with zuul. But right now the only implementation I know of for that is nodepool | 21:24 |
mrhillsman | got it | 21:26 |
mrhillsman | i have a small openstack deployment | 21:26 |
mrhillsman | but currently working with VMs to emulate | 21:27 |
mrhillsman | less folks hovering over you when you are using 1 dedicated server vs 100 :) | 21:28 |
*** jkilpatr has joined #zuul | 21:28 | |
mrhillsman | it is the openlab work i discussed with clarkb mordred jeblair and a few others at the ptg | 21:29 |
mrhillsman | need to ensure the PoC is working before i push for a fleet of servers | 21:29 |
jlk | neat! | 21:30 |
jlk | If you have any issues / questions / suggestions for the github integration point, I'd love to hear them. I shepherded and (re)wrote most of that code. | 21:30 |
clarkb | mrhillsman: tristanC (who is on PTO I Guess) has been working to add non dynamic VM backends to nodepool | 21:31 |
clarkb | containers, static instances, etc | 21:31 |
mrhillsman | sure thing, i'll be sure to capture some more detailed notes | 21:31 |
mrhillsman | oh, that would be nice | 21:31 |
clarkb | I think the base set of provider plugin changes got in but I'm not seeing any specific implementations in tree that use it yet | 21:31 |
jlk | that's what I hope to work on next myself too. | 21:31 |
mrhillsman | great to hear, the hope is that openlab can be a model for how others can adopt zuul going forward | 21:36 |
mrhillsman | so any help we can offer and >guidance< we can get in ensuring it is up and running would be great | 21:37 |
mrhillsman | i emphasize guidance because i know you all are pretty busy regularly | 21:37 |
jeblair | mrhillsman: oh hi! we're also planning to work with leifmadsen who wants to flesh out new user documentation as soon as we're not pulling our hair out over the openstack zuulv3 transition. we know it's a weak spot. | 21:40 |
mrhillsman | that is great | 21:40 |
mrhillsman | i have a colleague working in parallel so i am sure he can provide some feedback as well | 21:41 |
mrhillsman | he is in china though and they are on holiday this week | 21:41 |
mrhillsman | will make sure he joins this channel when he is back | 21:42 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 21:46 |
jeblair | mordred, SpamapS, clarkb, fungi, Shrews: ^ AKA fix memory leak | 21:47 |
jeblair | mrhillsman: cool, we'll let you know when we resume that | 21:47 |
mordred | jeblair: ZOMG | 21:48 |
clarkb | jeblair: aroo | 21:48 |
* fungi cheers elatedly | 21:49 | |
mordred | jeblair: my hunch was close to correct - I think https://review.openstack.org/#/c/509653/1/zuul/model.py is somewhat akin to "it's gonna be a comma or a colon" :) | 21:49 |
jeblair | mordred: i could not have done it without graphviz :) also, have you used xdot? it's pretty cool. i mean, i wish it were a little more jurassic park, but hey. | 21:50 |
clarkb | jeblair: so you did end up graphing out the entire reference tree? | 21:50 |
mordred | jeblair: graphviz is the unsung hero of computer science | 21:51 |
jeblair | clarkb: i mean, not the *entire* tree.... just 10 nodes deep... | 21:51 |
jeblair | https://i.imgur.com/JqtYeG5.jpg | 21:52 |
jeblair | mordred: the trick is finding the comma or colon in that ^ | 21:52 |
mordred | jeblair: JEEZ | 21:53 |
jeblair | it is not normally supposed to look like that. :) | 21:53 |
jlk | gah... | 21:54 |
jlk | that looks about like something that would fall out of the OpenStack community :D | 21:54 |
jeblair | if you zoom out, you can see the patterns -- there's a bunch of layouts in there, each with its own set of pipelines, but there should only be one set. so it's like 20x bigger than it should be. but they're all connected by these individual builds | 21:57 |
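One way to produce a reference graph like that (a guess at the tooling, since the log doesn't say): objgraph walks gc back-references and writes graphviz dot, which xdot renders interactively. The class name here is a stand-in:

```python
import gc
import objgraph

# pick a suspected-leaked object out of the live heap
layouts = [o for o in gc.get_objects()
           if type(o).__name__ == 'Layout']
objgraph.show_backrefs(layouts[:1], max_depth=10,
                       filename='layout-backrefs.dot')
# render with: xdot layout-backrefs.dot
```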
mordred | jeblair: yah - between that and your commit message it makes total sense | 21:58 |
jeblair | oh, let me restack that | 21:59 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 22:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update node requests after nodes https://review.openstack.org/509571 | 22:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move test_model.test_job_inheritance_configloader https://review.openstack.org/509495 | 22:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove test_job_inheritance https://review.openstack.org/509509 | 22:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add git timeout https://review.openstack.org/509517 | 22:00 |
mordred | jeblair: are we running with that fix now? | 22:00 |
jeblair | mordred: no i was about to suggest i pull that onto my local branch and restart | 22:00 |
jlk | Did you fix the pep8 violation? | 22:00 |
mordred | jeblair: I think that's a GREAT suggestion | 22:01 |
jeblair | jlk: there was one in a parent patch, so i reordered them by importance... was there also one in that patch? | 22:01 |
jlk | no, just in the parent | 22:02 |
jeblair | k; i'll clean those up in a few mins | 22:02 |
jlk | It was in https://review.openstack.org/#/c/509495 | 22:03 |
mordred | jeblair: with that fix in place, I would expect v3 to be running well still when we wake up in the morning (zk locking issues notwithstanding) | 22:03 |
jeblair | ++ | 22:04 |
jeblair | i was thinking tomorrow $morning might be a good time to regroup and check on our progress on things in the etherpad too | 22:06 |
jeblair | clarkb: do you want to +3 https://review.openstack.org/508786 ? | 22:07 |
jeblair | clarkb: you may also be interested in the child of that change, which i still haven't tested. | 22:08 |
clarkb | jeblair: ok, I'm still trying to digest the leak fix. Does that imply an old build could get a new layout (and is that ok?) | 22:08 |
clarkb | I'm guessing it doesn't actually matter really | 22:10 |
jeblair | clarkb: if an item with a running build is re-enqueued into a new layout, the build will be pulled into the new layout along with the item. nothing about the build should change. if something about the job it represents did change, then the build gets canceled (and possibly a successor run instead) | 22:10 |
jeblair | clarkb: none of that really interacts with this pipeline reference though, which, afaict, was mostly there for stats reporting and friendly output. in fact, i was halfway through changing all the references of "pipeline.name" to "pipeline_name" because that's all it seemed to be used for, when i decided this was simpler and nicer. | 22:11 |
clarkb | gotcha | 22:12 |
clarkb | in which case the updated pipeline defs don't really matter much | 22:12 |
jeblair | clarkb: basically, it's so we can say "this build ran in the check pipeline" | 22:12 |
clarkb | +3 on the usr2 handler | 22:13 |
jeblair | clarkb: yeah. the main things that could change during a reconfig that could affect this are the pipeline and job defs. as i said above, if the job def changes, then we cancel/relaunch. as for pipeline defs changing -- we make an assumption that if you redefine a pipeline called "check" that we should re-enqueue any jobs running in the old "check" into the new "check". it's a compromise -- anything else is gnarly. | 22:14 |
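A minimal illustration of the reference chain being discussed — not Zuul's actual classes, and the "fixed" variant is just one way to break such a chain:

```python
class LeakyBuild(object):
    def __init__(self, pipeline):
        # build -> pipeline -> layout: every reconfiguration's layout
        # stays alive as long as any build that ran in it does
        self.pipeline = pipeline

class Build(object):
    def __init__(self, pipeline):
        # keep only the string that stats reporting and output need
        self.pipeline_name = pipeline.name
```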
clarkb | jeblair: I'll work to reproduce that yappi thing really quickly | 22:15 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 22:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update node requests after nodes https://review.openstack.org/509571 | 22:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move test_model.test_job_inheritance_configloader https://review.openstack.org/509495 | 22:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove test_job_inheritance https://review.openstack.org/509509 | 22:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add git timeout https://review.openstack.org/509517 | 22:21 |
jeblair | okay that's pep8'd | 22:21 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add debug messages and exception handling to stack dump handler https://review.openstack.org/508786 | 22:22 |
clarkb | jeblair: confirmed it prevents the exception locally (also checked the old code failed) | 22:26 |
clarkb | jeblair: I'll +2 | 22:26 |
jeblair | clarkb: cool, thx! | 22:26 |
clarkb | jeblair: should I approve it or do you want a proper test too? | 22:26 |
jeblair | clarkb: i think it's okay to +3 | 22:26 |
clarkb | kk | 22:27 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/nodepool feature/zuulv3: Bring back per label groups in Openstack https://review.openstack.org/509620 | 22:27 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Speed configuration building https://review.openstack.org/509309 | 22:27 |
rcarrillocruz | mordred , Shrews ^ | 22:29 |
rcarrillocruz | the test for the groups thing | 22:29 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Change yappi bytesio to stringio https://review.openstack.org/508787 | 22:38 |
mordred | rcarrillocruz: that's totally going to fail pep8 :) | 22:38 |
rcarrillocruz | erm, possibly? i just did tox -epy35 -- TestNodeLaunchManager locally | 22:39 |
rcarrillocruz | let me tox pep8 it | 22:39 |
rcarrillocruz | nah, i get green locally | 22:40 |
rcarrillocruz | you mean the long assert line i assume | 22:41 |
rcarrillocruz | ? | 22:41 |
openstackgerrit | Merged openstack-infra/zuul-sphinx master: Update exception message to include directories https://review.openstack.org/505400 | 22:42 |
rcarrillocruz | it passes pep8 jobs on zuulv3 | 22:45 |
rcarrillocruz | off-topic: why is there tox-pep8 and openstack-tox-pep8? | 22:45 |
*** dkranz has quit IRC | 22:51 | |
SpamapS | jeblair: looks like 509653 may have a legitimate failure or at best a racy test. | 22:55 |
SpamapS | rcarrillocruz: I'd guess that the openstack- one does some openstack-specific things | 22:56 |
SpamapS | rcarrillocruz: in fact it does | 22:57 |
SpamapS | tox_constraints_file: "{{ ansible_user_dir }}/src/git.openstack.org/openstack/requirements/upper-constraints.txt" | 22:57 |
SpamapS | not everybody has a constraints file :) | 22:57 |
tristanC | clarkb: i'm indeed on vacation till the end of the month, though i'll be checking in intermittently | 23:09 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't store pipeline references on builds https://review.openstack.org/509653 | 23:27 |
dmsimard | Wow, 509653 is such a small change yet has a huge impact, that's an awesome find. | 23:33 |
dmsimard | jeblair++ | 23:33 |