mordred | pabelanger: for zuul-jobs since the consumption is intended to be via git rather than releases I'm not sure if reno would be a win - however, I like reno a lot - so maybe it still would be? | 00:00 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix change history cycle detection https://review.openstack.org/485368 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add project info to JobDirPlaybook https://review.openstack.org/485273 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Make playbook run meta info less fragile https://review.openstack.org/485284 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add spacer after playbook stats rather than before playbook https://review.openstack.org/485285 | 00:03 |
pabelanger | mordred: ya, I like the idea of reno too | 00:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch from tox-linters to tox-pep8 https://review.openstack.org/485379 | 00:04 |
mordred | pabelanger: ^^ :) | 00:04 |
pabelanger | +2 | 00:05 |
mordred | pabelanger: I'm going to restart the executor to pick up the logging format changes | 00:09 |
mordred | pabelanger: also - if you have a sec, could you look at https://review.openstack.org/#/c/485345/ ? | 00:10 |
mordred | pabelanger: jeblair and Shrews +2'd it earlier, but I had to fix a puppet apply test issue | 00:10 |
pabelanger | mordred: +2, feel free to +3 if you get nobody else | 00:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Gather facts for tox/run playbook https://review.openstack.org/485380 | 00:16 |
pabelanger | remote: https://review.openstack.org/483987 Create openstack-py35 job with upper-constraints | 00:17 |
pabelanger | mordred: so those 2 are interesting ^. and I think we need to gather_facts for all our playbooks in zuul-jobs by default | 00:17 |
mordred | pabelanger: whyfore? what are we missing? | 00:21 |
* mordred looks at patch | 00:21 | |
pabelanger | mordred: i'm trying to pass ansible_user_dir into an environment variable | 00:21 |
mordred | pabelanger: and we don't have that one without gather_facts? | 00:22 |
pabelanger | right, facts are disabled by default | 00:22 |
pabelanger | in ansible.cfg | 00:22 |
pabelanger | so we always have to opt into them | 00:22 |
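A minimal sketch of the pattern pabelanger is describing, assuming a zuul-jobs style playbook (the task name and environment variable are illustrative, not from the actual change):

```yaml
- hosts: all
  # Facts are disabled in zuul's ansible.cfg, so a playbook has to
  # opt in to make variables like ansible_user_dir available.
  gather_facts: true
  tasks:
    - name: Run tox with the workspace exported
      command: tox -e py35
      environment:
        # Hypothetical variable name; the point is that
        # ansible_user_dir only exists once facts have been gathered.
        ZUUL_WORKSPACE: "{{ ansible_user_dir }}"
```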
mordred | yup | 00:23 |
mordred | pabelanger: I just plopped in a comment about gather_subset ... | 00:23 |
pabelanger | http://logs.openstack.org/27/484027/5/check/openstack-py35/a18c5ba/job-output.txt passes | 00:23 |
mordred | pabelanger: actually - we should have it there | 00:24 |
pabelanger | looking | 00:24 |
mordred | pabelanger: because unittests ... oh, nevermind | 00:24 |
mordred | it's unittests _pre_ that has a gather_facts | 00:24 |
jeblair | pabelanger: you're losing me on that. would you mind explaining the whole problem in the commit message please? | 00:24 |
mordred | http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/playbooks/unittests/pre.yaml | 00:24 |
pabelanger | jeblair: sure I can go into more detail | 00:24 |
mordred | jeblair: https://review.openstack.org/#/c/483987/9/zuul.yaml for context | 00:27 |
jeblair | okay, so that's fallout from dropping zuul_workspace_dir | 00:28 |
mordred | yah | 00:28 |
pabelanger | right, ansible_user_dir would be the same thing | 00:29 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Gather facts for tox/run playbook https://review.openstack.org/485380 | 00:30 |
pabelanger | okay, some more details | 00:30 |
jeblair | to be fair, hardcoding /home/zuul there, or in a job variable would be about as valid. but i feel like the problem may be likely to occur in a less openstack-specific way in the future. | 00:30 |
jeblair | is it worth considering adding zuul_workspace_dir back? or should we focus on making the facts approach better? | 00:31 |
jeblair | can we cache facts across playbook runs? | 00:31 |
mordred | yah. we could switch gathering from off to on in ansible.cfg, but with gather_subset: !all | 00:31 |
jeblair | mordred: seems like if the bulk of our jobs are doing that anyway (and now are about to twice), that may be a win. | 00:31 |
mordred | gathering = smart | 00:32 |
pabelanger | It is actually an interesting question. aside from the performance issues (I am not sure how bad it is), what is the downside of having ansible facts available by default to jobs? | 00:32 |
mordred | fact_caching = jsonfile | 00:32 |
jeblair | (and who knows, the devstack jobs may end up doing it too, we just haven't gotten there yet) | 00:32 |
mordred | fact_caching_connection = /tmp/facts_cache | 00:32 |
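Collected in one place, the ansible.cfg settings mordred is pasting would look roughly like this (the /tmp cache path is the example from the discussion; the eventual patch put it under the job's .ansible dir instead):

```ini
[defaults]
# "smart" gathers facts only for hosts not already in the cache
gathering = smart
# persist gathered facts as one JSON file per inventory host
fact_caching = jsonfile
fact_caching_connection = /tmp/facts_cache
```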
jeblair | pabelanger: i think the reason we turned facts off was performance issues? | 00:33 |
mordred | pabelanger: yah, it's an extra hit on each play - but in the zuul context I think we could turn on jsonfile fact_caching and it'll likely be pretty fine | 00:33 |
pabelanger | ya, I am not sure why we disabled it actually. But caching of facts does seem like a nice idea | 00:33 |
jeblair | mordred: any concern with having the fact cache be in the untrusted writable work dir? | 00:34 |
jeblair | i guess if a trusted post-playbook relied on a fact, an untrusted main playbook could have overwritten it beforehand... | 00:36 |
jeblair | so caching may be tricky | 00:36 |
mordred | jeblair: we should see what things it caches | 00:36 |
* mordred tests | 00:36 | |
pabelanger | ya, doing the same | 00:37 |
mordred | it does not cache facts set with set_fact | 00:38 |
mordred | so it's only a cache of facts that are gathered by the system | 00:38 |
pabelanger | ya, that makes sense | 00:38 |
jeblair | mordred: but someone could still directly write the file | 00:38 |
pabelanger | fact_caching_connection is a local filesystem path to a writeable directory, so we'd have it outside the src directory on executor? | 00:40 |
jeblair | mordred: oh, i guess we can stick it in jobroot/ansible and that should work | 00:40 |
pabelanger | can bwrap write to jobroot/ansible? | 00:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 00:42 |
mordred | jeblair, pabelanger: I was thinking like that ^^ | 00:42 |
jeblair | pabelanger: yeah, all of jobroot is writable inside of bwrap. the only protection for inside of jobroot but outside of jobroot/work is the ansible module path checks | 00:42 |
pabelanger | jeblair: cool, had to look it up to see | 00:43 |
jeblair | so if you escape ansible, you can't hose the executor, but you can alter the code that's about to run in trusted post playbooks. | 00:43 |
pabelanger | mordred: think you have a typo | 00:43 |
pabelanger | %/.ansible | 00:43 |
mordred | probably | 00:43 |
pabelanger | %s/.ansible | 00:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 00:44 |
jeblair | mordred: do you mean to use ".ansible/" rather than "ansible/"? | 00:44 |
mordred | jeblair: yah - I figured sticking it in the same place as local_tmp and remote_tmp since it's ultimately ephemeral data | 00:45 |
mordred | although I guess I could make it be facts_tmp to match those dir names ... | 00:45 |
jeblair | mordred: i don't think facts_tmp is necessary | 00:46 |
mordred | kk | 00:46 |
jeblair | (the other options are literally "remote_tmp") | 00:47 |
pabelanger | +2 | 00:47 |
mordred | jeblair: while I've got you here: https://review.openstack.org/#/c/485345/ and https://review.openstack.org/#/c/485379/ could both use a quick +A | 00:48 |
jeblair | done | 00:48 |
jeblair | i'm going to eod now | 00:49 |
pabelanger | mordred: have I also mentioned this is pretty fun | 00:51 |
mordred | pabelanger: ++ | 00:52 |
mordred | pabelanger: also - https://review.openstack.org/#/c/262597/ is the reno patch I made for zuul forever ago | 00:52 |
pabelanger | cool, I'll look into it in the morning | 00:53 |
pabelanger | EOD now | 00:53 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Switch from tox-linters to tox-pep8 https://review.openstack.org/485379 | 00:56 |
*** jkilpatr has quit IRC | 01:11 | |
mordred | http://logs.openstack.org/27/484027/5/check/openstack-py35/8f780a0/job-output.txt latest logging updates applied | 01:18 |
*** harlowja has quit IRC | 01:32 | |
clarkb | https://blog.sileht.net/automate-rebasing-and-merging-of-your-pr-with-pastamaker.html may be of interest to the channel | 02:08 |
mordred | indeed | 02:10 |
*** harlowja has joined #zuul | 02:52 | |
mordred | clarkb: the promise of github seems to be allowing every dev team to write their own PR bot | 03:22 |
clarkb | right | 03:23 |
SpamapS | It's the yak every dev team is dying to shave. | 03:44 |
*** harlowja has quit IRC | 04:18 | |
*** harlowja has joined #zuul | 04:30 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Rework tox -e linters https://review.openstack.org/484836 | 05:14 |
*** isaacb has joined #zuul | 05:16 | |
tobiash | sounds like they actually would need BonnyCI... | 05:24 |
*** harlowja has quit IRC | 05:33 | |
*** isaacb has quit IRC | 06:13 | |
*** isaacb has joined #zuul | 06:24 | |
*** bhavik1 has joined #zuul | 06:30 | |
*** amoralej|off is now known as amoralej | 07:26 | |
*** hashar has joined #zuul | 07:41 | |
*** yolanda has quit IRC | 07:58 | |
*** clarkb has quit IRC | 08:04 | |
*** clarkb has joined #zuul | 08:05 | |
*** isaacb has quit IRC | 08:31 | |
*** isaacb has joined #zuul | 08:32 | |
*** yolanda has joined #zuul | 08:53 | |
*** yolanda has quit IRC | 08:54 | |
*** yolanda has joined #zuul | 08:55 | |
*** smyers has quit IRC | 09:04 | |
*** smyers has joined #zuul | 09:18 | |
*** bhavik1 has quit IRC | 10:27 | |
*** xinliang has quit IRC | 10:29 | |
*** xinliang has joined #zuul | 10:41 | |
*** xinliang has quit IRC | 10:41 | |
*** xinliang has joined #zuul | 10:41 | |
*** jkilpatr has joined #zuul | 11:02 | |
*** amoralej is now known as amoralej|lunch | 11:04 | |
*** amoralej|lunch is now known as amoralej | 12:24 | |
*** dkranz has joined #zuul | 13:18 | |
pabelanger | mordred: nice work on console logs! One request: for LOOP [Collect tox logs] in your job-output.txt above, can we prettyprint the dict item to make the output easier to read? | 13:38 |
dmsimard | pabelanger, clarkb, mordred: did you guys just flip the switch when you moved from jenkins to zuul 2.5 ? I forget. Was there a weird period where both of them were running side by side ? We're wondering how to make sure Zuul and Nodepool don't get confused on which target the job is meant to run | 13:39 |
dmsimard | tristanC: ^ | 13:39 |
pabelanger | dmsimard: no, we stopped jenkins and started launchers IIRC. Don't think we ever had both running at same time | 13:40 |
pabelanger | dmsimard: are you seeing an issue? | 13:41 |
pabelanger | mordred: also, it looks like we are missing ok between some tasks: | 13:41 |
pabelanger | 2017-07-20 01:04:44.850458 | TASK [bindep : Look for other-requirements.txt] | 13:41 |
pabelanger | 2017-07-20 01:04:44.871528 | TASK [bindep : Define bindep_file fact] | 13:41 |
pabelanger | would have expected to see ok or skipped there | 13:41 |
dmsimard | pabelanger: we intended to run jenkins and zuul-launcher side by side to allow for a transition period but it looks like when both targets are enabled, zuul and nodepool get confused about which node is requested and allocated | 13:42 |
dmsimard | pabelanger: I can forward you an email since I haven't personally witnessed this | 13:43 |
dmsimard | sent | 13:43 |
pabelanger | right, different image names would be one way to fix that | 13:43 |
dmsimard | pabelanger: that's what I thought too, see internal irc sf channel backlog | 13:45 |
pabelanger | but ya, I wouldn't expect both running at the same time to be good. Going back into irc logs to see what we did. But, I seem to remember we did the roll out dance a few times | 13:45 |
pabelanger | dmsimard: where is your nodepool.yaml file? | 13:45 |
dmsimard | pabelanger: mixture of https://softwarefactory-project.io/r/gitweb?p=config.git;a=blob;f=nodepool/nodepool.yaml;h=ee8c998c4c80b9de8ddf3e5891671ed16f880c9b;hb=HEAD | 13:46 |
dmsimard | and https://softwarefactory-project.io/r/gitweb?p=software-factory/sf-config.git;a=blob;f=ansible/roles/sf-nodepool/templates/_nodepool.yaml.j2;hb=HEAD | 13:46 |
dmsimard | (it's mashed by software factory when running the nodepool config update job) | 13:46 |
dmsimard | this is what nodepool list is giving: https://softwarefactory-project.io/paste/show/xLmGA1UyA1bfi4fGUT2h/ | 13:47 |
pabelanger | dmsimard: Ya, I'd just pin your images to either jenkins or launcher for now | 13:48 |
pabelanger | not across both | 13:48 |
dmsimard | pabelanger: tristanC says that wouldn't work, see #softwarefactory | 13:48 |
pabelanger | then setup a new cloud provider | 13:49 |
pabelanger | and have jenkins and zuulv2.5 only access one or other | 13:49 |
dmsimard | pabelanger: hmm, how would we go about that ? | 13:49 |
pabelanger | both could use the same project, just different names in nodepool.yaml. Like we do for osic-cloud1 | 13:50 |
dmsimard | like, I know how to setup additional providers but how to get only jenkins to consume this cloud and zuul the other ? | 13:50 |
pabelanger | dmsimard: look at osic-cloud1-s3500 and osic-cloud1-s3700 in nodepool.yaml, both are using the same cloud, project | 13:50 |
pabelanger | Oh, hmm | 13:51 |
tristanC | pabelanger: iiuc the issue is because when there are nodes ready, nodepool doesn't check if they are correctly assigned to the requesting target | 13:51 |
tristanC | pabelanger: even if in the logs it seems to correctly create servers and assign them to the right target when requested, we end up with lots of nodes ready in the wrong target | 13:52 |
pabelanger | ya, guess you'd need a 2nd nodepool too | 13:53 |
pabelanger | which didn't know about jenkins | 13:53 |
pabelanger | maybe jeblair or clarkb have an idea on how to work around that with gearman | 13:53 |
tristanC | pabelanger: yes, that would work too | 13:54 |
tristanC | dmsimard: iirc, openstack-infra was running jenkins and zuul-launcher with the same set of jobs | 13:54 |
pabelanger | https://review.openstack.org/#/c/325992/ | 13:59 |
pabelanger | tristanC: ya, you are right, we did run both in production | 14:02 |
pabelanger | I'm remembering now | 14:02 |
pabelanger | however, we wanted jobs to run on both jenkins and zuul-launcher | 14:02 |
pabelanger | as not to interrupt the gate | 14:02 |
pabelanger | I don't think we ever pinned a job just to launcher or jenkins | 14:03 |
pabelanger | because we wanted to ensure the migration between launcher and jenkins worked as expected | 14:03 |
pabelanger | since both were JJB, they both should run on both | 14:03 |
pabelanger | dmsimard: so looping back, why do you want a job to only run on launcher? | 14:04 |
dmsimard | pabelanger: I'm a scaredy cat of flipping the switch 100% from jenkins to launcher without the opportunity to test things first | 14:05 |
pabelanger | dmsimard: right, so you can use both jenkins and launcher per tristanC comment, and make sure everything is 100% across both | 14:06 |
dmsimard | pabelanger: and we're in a bit of a time crunch from a lot of angles so if we don't get that transition period where both are running side by side, we will need to delay the upgrade | 14:06 |
dmsimard | pabelanger: well, running side them side by side was the plan, but this whole discussion is about that not exactly working as intended | 14:07 |
pabelanger | dmsimard: move JJB jobs into another SF instance and test them there first? | 14:07 |
pabelanger | but ya, no easy migration path for this | 14:08 |
dmsimard | pabelanger: requires time and it's a precious resource right now | 14:08 |
dmsimard | especially with folks on PTO left and right | 14:08 |
pabelanger | it took us about 4 weeks to migrate to zuul-launchers and all hands were on deck | 14:09 |
pabelanger | don't want to rush it | 14:09 |
dmsimard | yeah, that is what I am thinking as well -- RDO's deployment is not nearly openstack-infra scale but we still have a considerable amount of jobs, some that are fairly complicated | 14:10 |
dmsimard | so I'm not confident in just flipping the switch | 14:10 |
dmsimard | We'll discuss our options, thanks | 14:10 |
*** isaacb has quit IRC | 14:30 | |
jeblair | yeah, we added zuul into the mix, so with 8 jenkins masters and one zuul launcher, zuul ran 1/9 of the jobs. | 14:44 |
jeblair | if we saw failures, we turned off the zuul launcher, fixed them and repeated. | 14:44 |
jeblair | then as things got gradually better, we added more zuul launchers and reduced the jenkins masters. | 14:45 |
pabelanger | mordred: jeblair: so smart facts and bubblewrap are broken at the moment. Working on fix, this is because the 'zuul' user doesn't exist on my local system when tox is run | 14:58 |
pabelanger | and the ansible connection for localhost is defaulting to 'zuul' user | 14:58 |
pabelanger | http://paste.openstack.org/show/616030/ is the issue | 15:01 |
pabelanger | this also means we'd be gathering facts on executors | 15:02 |
jeblair | why does it default to 'zuul' for localhost? | 15:04 |
pabelanger | jeblair: not sure yet. trying to figure that out | 15:05 |
jeblair | pabelanger: i think this could cause problems for folks running zuul-executor as a different username than, well, whatever thing is causing it to use 'zuul'. :) | 15:05 |
pabelanger | agree, think we exposed a bug | 15:05 |
jeblair | pabelanger: localhost facts probably aren't important; any way to turn it off? | 15:05 |
pabelanger | jeblair: not sure yet, will find out | 15:06 |
Shrews | forgive my ignorance, what are "smart facts" | 15:06 |
Shrews | ? | 15:06 |
pabelanger | ansible will scan a host if it is missing from the fact cache | 15:07 |
pabelanger | http://docs.ansible.com/ansible/intro_configuration.html#gathering | 15:07 |
Shrews | ah | 15:08 |
*** isaacb has joined #zuul | 15:13 | |
*** dmsimard is now known as dmsimard|cave | 15:14 | |
pabelanger | jeblair: SpamapS: so, if I try to run the bwrap command from our tox jobs myself, I get the following error: Can't write data to file /etc/passwd: Bad file descriptor | 15:41 |
pabelanger | any idea why? | 15:41 |
jeblair | pabelanger: zuul creates a new password file as a pipe and hands the fd to bubblewrap. i don't know what could cause a problem with that. can you paste your command and error? | 15:46 |
pabelanger | sure | 15:48 |
pabelanger | http://paste.openstack.org/show/616042/ | 15:48 |
pabelanger | the error above is the only thing I get back | 15:48 |
jeblair | pabelanger: oh, you're copy/pasting a previously run bwrap command | 15:49 |
jeblair | pabelanger: so you're telling bwrap to use a file descriptor that doesn't exist | 15:50 |
pabelanger | jeblair: ya, I'm trying to reproduce the environment from tox | 15:50 |
jeblair | --file 1000 /etc/passwd | 15:50 |
pabelanger | ah | 15:50 |
jeblair | pabelanger: try running the bwrap module as main. it has a main method so you can do testing like that | 15:50 |
jeblair | python zuul/drivers/bubblewrap/__init__.py | 15:51 |
jeblair | that should take care of creating the passwd/group files and run the command you give it in bwrap | 15:51 |
pabelanger | jeblair: ya, I did that and smart facts works | 15:53 |
pabelanger | so, was thinking we might be doing something different somehow | 15:54 |
jeblair | pabelanger: i ran tests with vvv to get the full tb: http://paste.openstack.org/show/616046/ | 15:57 |
pabelanger | jeblair: yup, that is the error I am seeing too from tox | 15:57 |
pabelanger | it looks like zuul isn't in /etc/passwd file | 15:58 |
pabelanger | but trying to confirm | 15:58 |
pabelanger | 2017-07-20 12:00:51,174 zuul.AnsibleJob DEBUG [build: 19831828f32042a7bfddba05a2059dd9] Ansible output: b' "stdout": "pabelanger:x:1000:1000:pabelanger:/tmp/tmpaapfgaf3/zuul-test/19831828f32042a7bfddba05a2059dd9:/bin/bash",' | 16:01 |
pabelanger | so ya, it is using my uid | 16:01 |
pabelanger | not zuul | 16:01 |
jeblair | why does ansible want to look up zuul? | 16:02 |
pabelanger | I think we are running ansible-playbook as zuul user some how | 16:02 |
jeblair | pabelanger: as you pointed out, you don't have a zuul user on your system, and there isn't one in the bwrap passwd file either. | 16:04 |
pabelanger | jeblair: Ya, that is what is driving me crazy. I don't know where it is coming from atm | 16:05 |
jeblair | right, that was my question -- what, specifically, causes ansible to do a pwnam lookup on zuul? | 16:07 |
jeblair | the code in ansible is https://github.com/ansible/ansible/blob/stable-2.3/lib/ansible/module_utils/facts.py#L551 | 16:07 |
jeblair | so it's getting the name to look up from getpass.getuser() | 16:07 |
pabelanger | Oh | 16:09 |
pabelanger | I see it now | 16:09 |
pabelanger | http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py?h=feature/zuulv3#n1318 | 16:10 |
jeblair | pabelanger: yep. python getpass looks at LOGNAME first | 16:10 |
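jeblair's point about getpass can be seen directly; this is just an illustration of the stdlib behavior, not zuul code:

```python
import getpass
import os

# getpass.getuser() checks the LOGNAME, USER, LNAME and USERNAME
# environment variables (in that order) before falling back to the
# password database, so a hardcoded LOGNAME wins over the real uid.
os.environ["LOGNAME"] = "zuul"
print(getpass.getuser())  # -> zuul, even if no such account exists
```

This is why ansible's fact code then tries a pwnam lookup on "zuul" and fails inside bwrap, where no such passwd entry exists.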
*** isaacb has quit IRC | 16:11 | |
jeblair | the only reason i know of we hardcode logname to user is so that when we used the default ansible logging module to create the job log, it put "zuul" as the username. | 16:11 |
jeblair | pabelanger: we can probably drop that now. | 16:11 |
pabelanger | jeblair: cool, let me try that | 16:11 |
jeblair | since i think that has entirely been superceded by zuul_stream and friends | 16:12 |
jeblair | pabelanger: the hello-world job is failing the playbook test because we're gathering facts locally, and executing local code is prohibited | 16:23 |
pabelanger | ya, that is what I am debuging now | 16:23 |
jeblair | pabelanger, mordred: that error did not make it into the text console log, only the json one. i think that's a bug. | 16:23 |
jeblair | i'll file a story for that | 16:24 |
pabelanger | so, should we allow facts to be gathered on localhost? | 16:25 |
jeblair | no | 16:25 |
pabelanger | k, I'll see how we can stop smart from doing that | 16:26 |
pabelanger | maybe prime the fact cache with something | 16:26 |
jeblair | mordred: for whenever your next cup of coffee is: https://storyboard.openstack.org/#!/story/2001129 | 16:29 |
clarkb | pabelanger: tristanC I do not know why you'd have nodes assigned to the wrong target. I can imagine that targets "forgetting" nodes so they leak on the nodepool side is possible though | 16:40 |
*** bhavik1 has joined #zuul | 16:50 | |
pabelanger | jeblair: mordred: okay, so if we prime the 'localhost' json file in fact-cache with {"module_setup": true} ansible will not run setup task | 17:02 |
pabelanger | that will stop untrusted playbooks from trying run it on executor | 17:02 |
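A sketch of the priming pabelanger describes; the cache directory here is a stand-in for wherever fact_caching_connection points inside the job dir:

```python
import json
import os
import tempfile

# Stand-in for the job's fact_caching_connection directory.
cache_dir = tempfile.mkdtemp(prefix="fact-cache-")

# The jsonfile cache keeps one file per inventory hostname. Writing
# {"module_setup": true} for localhost makes "smart" gathering treat
# the executor as already scanned, so the setup module never runs there.
with open(os.path.join(cache_dir, "localhost"), "w") as f:
    json.dump({"module_setup": True}, f)

print(open(os.path.join(cache_dir, "localhost")).read())
# -> {"module_setup": true}
```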
*** hashar has quit IRC | 17:03 | |
pabelanger | and trusted playbooks will not have ansible facts I think | 17:03 |
pabelanger | unless we use a different cache for trusted / untrusted? | 17:03 |
jeblair | pabelanger: i don't think we want anything different between trusted/untrusted | 17:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Remove hardcoded LOGNAME for ansible-playbook https://review.openstack.org/485749 | 17:05 |
pabelanger | jeblair: k, any thoughts about accessing ansible variables on executor with trusted playbook? | 17:07 |
pabelanger | I think if the first playbook that runs today is trusted, localhost fact gets created and other playbooks will use it | 17:08 |
pabelanger | however, if untrusted is first, localhost fact cache will fail | 17:08 |
pabelanger | the other interesting thing is, from an untrusted playbook, it does create the localhost json file in fact-cache, even though the job fails | 17:18 |
SpamapS | jeblair: FYI, you don't need to find the bwrap driver to run its main method. it has a CLI entrypoint called zuul-bwrap :) | 17:21 |
jeblair | SpamapS: oh right! i should put a comment in there to remind myself. :) | 17:24 |
jeblair | pabelanger: i thought you were looking at pre-populating the cache with something so it doesn't run at all, and we would have the same behavior in both places. | 17:25 |
pabelanger | jeblair: right, we can do that. but it means we'd never be able to use ansible variables for a playbook on executor | 17:26 |
pabelanger | want to make sure we are okay with that | 17:27 |
jeblair | pabelanger: until someone comes up with a use case for it, i'm okay with it. | 17:27 |
pabelanger | ok | 17:28 |
* SpamapS sets out to write a disk monitor thing | 17:32 | |
*** artgon has joined #zuul | 17:43 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 17:49 |
pabelanger | mordred: jeblair: ^ should be the fix | 17:49 |
*** amoralej is now known as amoralej|off | 17:59 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 18:00 |
jeblair | pabelanger: lgtm, thanks. let's see what mordred says when he awakens. | 18:02 |
*** harlowja has joined #zuul | 18:06 | |
*** bhavik1 has quit IRC | 18:35 | |
*** amoralej|off is now known as amoralej | 18:59 | |
Shrews | Has anyone started poking at the zuul auto-hold stuff yet? I might start looking into it if not. | 19:05 |
*** isaacb has joined #zuul | 19:20 | |
jeblair | Shrews: not that i'm aware of. there's a story for it with a plan i think should work. and i think if you assign it to yourself, you win. :) | 19:28 |
*** dmsimard|cave is now known as dmsimard | 19:28 | |
pabelanger | ++ to auto-hold, would make debugging new jobs easier too | 19:29 |
Shrews | been skirting the boundaries of zuul code thus far, so it would be my first adventure with it. hopefully it won't end up being over my head | 19:33 |
jeblair | Shrews: i'm happy to help with any questions! | 19:39 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 19:49 |
Shrews | i <3 watching streaming logs | 19:53 |
Shrews | much more readable these days, too | 19:53 |
jeblair | o hrm. the url in the status page isn't switching to the log url when the job is finished. it's supposed to do that. | 19:57 |
*** openstack has joined #zuul | 20:01 | |
pabelanger | can we also display the finger URL on the status page? | 20:01 |
pabelanger | but streaming is cool | 20:02 |
jeblair | yes we can+should | 20:04 |
*** jkilpatr has quit IRC | 20:05 | |
tobiash | jeblair: I also noticed that, but didn't have time yet to look into that | 20:06 |
pabelanger | tobiash: feedback for html stream, have autoscroll happen if you are at the bottom of the browser, but don't auto scroll if you move up a few lines | 20:07 |
pabelanger | I have no idea how you'd do that with JS | 20:07 |
* tobiash too | 20:07 | |
jeblair | pabelanger, mordred, clarkb, SpamapS: i'm working on tidying up things related to non-change jobs. periodic, post, tag, etc. can you take a look at the commit message and the docs for https://review.openstack.org/485329 and let me know if that direction looks good? | 20:08 |
tobiash | pabelanger: I'm a js and html noob so this was poor mans first try on console streaming ;) | 20:08 |
pabelanger | tobiash: well done! much better then I could have done :D | 20:09 |
jeblair | pabelanger, mordred, clarkb, SpamapS: (i know we're going to have jobs where we want to keep using the old shell variables during a transition period. i figure we can have some ansible to do the translations for us and add that to those jobs.) | 20:10 |
*** amoralej is now known as amoralej|off | 20:10 | |
mordred | morning all | 20:10 |
jeblair | mordred: morning! | 20:10 |
pabelanger | jeblair: sure, looking now | 20:10 |
pabelanger | look a mordred | 20:10 |
jeblair | pabelanger, mordred, clarkb, SpamapS: also, the changes to model.py there -- i've got distinct objects for ref, branch, tag, and change now. | 20:12 |
*** isaacb has quit IRC | 20:12 | |
*** openstackgerrit has quit IRC | 20:17 | |
*** isaacb has joined #zuul | 20:19 | |
*** dkranz has quit IRC | 20:22 | |
*** isaacb_ has joined #zuul | 20:22 | |
*** isaacb has quit IRC | 20:25 | |
mordred | jeblair: change looks good in general - but I'm also still waking up | 20:32 |
jeblair | mordred: no takebacks! | 20:34 |
*** dkranz has joined #zuul | 20:35 | |
SpamapS | jeblair: I think we'll all go a little less insane if we can change as little as possible about what gets run by jobs.. so yeah, using the old envvars, perhaps for a long time (till zuul4?) is a good idea. | 20:39 |
jeblair | SpamapS: i agree, though my time horizon is perhaps a bit shorter. the bulk of our jobs won't use the old zuul env variables out of the gate (none of the zuulv3 jobs currently do). but yeah, i don't want to be under time pressure to get the long tail cleaned up before ptg. | 20:43 |
*** openstackgerrit has joined #zuul | 20:45 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: gzip console log before uploading it https://review.openstack.org/483611 | 20:45 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove hardcoded LOGNAME for ansible-playbook https://review.openstack.org/485749 | 20:45 |
SpamapS | jeblair: the ones that use them are also going to be the stickiest nastiest ones that do everything we told them not to do ;) | 20:46 |
jeblair | SpamapS: indeed! | 20:47 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 20:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Simplify bindep logic removing fallback support https://review.openstack.org/482650 | 20:48 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Simplify bindep logic removing fallback support https://review.openstack.org/482650 | 20:49 |
pabelanger | Yay, new facts merged | 20:50 |
mordred | pabelanger: I abaondoned your gather-facts change for that job | 20:51 |
pabelanger | mordred: ++ | 20:52 |
mordred | jeblair: if you get a sec: https://review.openstack.org/#/c/485377 | 20:52 |
pabelanger | mordred: caching is also fast | 20:52 |
pabelanger | tried it a few times | 20:52 |
mordred | there's only 3 open zuul-jobs changes: https://review.openstack.org/#/q/project:openstack-infra/zuul-jobs+status:open | 20:52 |
pabelanger | Yup, I can start back up tox role now | 20:53 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove OOM check from tox role https://review.openstack.org/485377 | 20:56 |
pabelanger | mordred: jeblair: I'm thinking we should also remove the 'zero' tests run check. Or is that something we care about for zuul-jobs? Maybe we still want it in openstack? | 20:59 |
jeblair | pabelanger: that one may be good to keep and generally useful. | 21:01 |
pabelanger | jeblair: okay, let me see what the logic would look like | 21:02 |
jeblair | pabelanger: i assume it will have to branch based on testr/nose/etc | 21:03 |
mordred | pabelanger: I think we're probably going to want to have a "did I find a .testrepository? if so, include: testrepository.yaml" | 21:03 |
pabelanger | mordred: ya, that's what I am thinking too | 21:04 |
mordred | pabelanger: you know what - that might want to be in its own role too - so that if someone is writing a job that does things but doesn't use tox (like a job to just run testr without tox or something) that they could add the "check-zero-tests" role to their playbook | 21:05 |
mordred | (or something) | 21:05 |
pabelanger | Agree | 21:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove nodepool DIB specific logic https://review.openstack.org/485824 | 21:07 |
pabelanger | mordred: jeblair: I also wouldn't object if we wanted to restart zuulv3 to pick up smart facts too | 21:08 |
mordred | ++ | 21:09 |
jeblair | pabelanger: wfm | 21:10 |
jeblair | mordred: speaking of restart... did you see this from earlier? https://storyboard.openstack.org/#!/story/2001129 | 21:10 |
jeblair | mordred: i had to read a .json file to figure out what went wrong with a job :( | 21:10 |
jeblair | mordred: (the first play failed but did not log its failure to the job log. the second play *also* failed -- don't be distracted by that. :) | 21:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 21:13 |
mordred | jeblair: oh goodie | 21:19 |
*** dkranz has quit IRC | 21:20 | |
mordred | jeblair: cool. I should be able to reproduce that locally pretty easily | 21:21 |
pabelanger | jeblair: should graceful restart work? | 21:21 |
pabelanger | I haven't run the command yet on ze01 | 21:21 |
mordred | jeblair: so - unfortunately I actually CAN'T reproduce that locally | 21:25 |
mordred | oh - wait | 21:26 |
mordred | yup. there it is (I had fact caching turned on - so it wasn't running) | 21:27 |
mordred | jeblair: turns out the fact gathering is the relevant thing | 21:27 |
mordred | although there is another thing that's ugly ... | 21:28 |
jeblair | pabelanger: i think so? | 21:35 |
jeblair | pabelanger: but it might run into the pidfile issue on restart | 21:36 |
pabelanger | k, I'll stop when no jobs are running | 21:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:41 |
mordred | jeblair, pabelanger: that'll fix the story (gah, I should reference it in the commit message - one sec) | 21:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:42 |
mordred | that should fix the story - and also fix error messages that are just an error message from getting json-displayed | 21:43 |
jeblair | mordred: merge conflicts | 21:46 |
mordred | gah | 21:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:47 |
mordred | pabelanger: your item/pprint request is ... harder | 21:47 |
mordred | pabelanger: the problem is that what we're dealing with is that the dict on the ITEM line is the loop key | 21:49 |
mordred | in this case, we're looping over the results from the previous task | 21:49 |
mordred | so there is no "good" thing to display there | 21:49 |
pabelanger | hmm | 21:50 |
pabelanger | k, +3, let's not hold up the patch | 21:52 |
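The display problem mordred describes above - the ITEM line showing the entire registered-result dict when a task loops over results from a previous task - could be softened with a small label-picking helper along these lines. This is a sketch only; the preferred-key order is an assumption for illustration, not what the eventual zuul change does.

```python
def format_loop_item(item):
    """Pick a short, readable label for an Ansible ITEM line.

    When looping over registered results, each item is a large dict
    and printing it verbatim makes the log unreadable.  Prefer a
    human-meaningful key if one is present; otherwise fall back to a
    placeholder rather than dumping the whole structure.
    """
    if isinstance(item, dict):
        for key in ('item', 'name', 'path', 'cmd'):
            if key in item:
                return str(item[key])
        return '<dict>'
    return str(item)
```

As mordred says, there is no universally "good" thing to display for an arbitrary dict, so any such helper is a heuristic.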
pabelanger | mordred: also, using zuul_stream in tox jobs makes it pretty hard to see what ansible is doing. I had to actually disable it in ansible.cfg today | 21:53 |
pabelanger | I am not sure if we have a better story for that | 21:53 |
pabelanger | eg: tox -epy35 on zuul | 21:54 |
jeblair | pabelanger: i turn on KEEP_TEMPDIRS and go look at the job log. | 21:55 |
jeblair | pabelanger: if we want to make that more automatic, we could attach the job log as a subunit attachment | 21:55 |
mordred | pabelanger: can you expand on that? like - what is it that you were missing? (also, I expect that to be better once I get the json formatting thing done) | 21:55 |
pabelanger | jeblair: ya, I did have keep_tempdirs eventually. But attachment might be nice too. | 21:55 |
mordred | actually - lemme do a couple of things ... | 21:56 |
pabelanger | mordred: it was around the gather facts stuff today. So part of it was an ansible traceback, -vvv helped, but the other part was that shell output is no longer displayed in zuul_stream. So, shell: cat /etc/passwd wouldn't return any data for me | 21:57 |
pabelanger | so, I had to stop using zuul_stream and fall back to normal ansible output | 21:57 |
mordred | pabelanger: wait - that's a bug | 21:57 |
mordred | shell output should ABSOLUTELY be displayed in zuul_stream | 21:58 |
pabelanger | Ah, I thought we disabled it for some reason | 21:58 |
mordred | that's like the entire reason it exists | 21:58 |
mordred | is to live-stream shell output :) | 21:58 |
pabelanger | let me test it again here locally with your latest changes | 21:58 |
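The live-streaming mordred describes as zuul_stream's entire reason to exist is essentially a tail-follow loop over the console log written on the node. A simplified sketch of that idea - not Zuul's actual zuul_stream implementation, and `stream_lines` is a hypothetical name:

```python
import time

def stream_lines(path, stop_check, poll=0.1):
    """Follow a file and yield lines as they are written.

    A shell task writes its output to a console log on the node;
    the callback tails that file so the output appears live in the
    job log.  stop_check() tells us the task has finished and no
    more output will arrive.
    """
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                yield line.rstrip('\n')
            elif stop_check():
                return
            else:
                time.sleep(poll)
```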
jeblair | i worry that we're not all on the same page here | 21:59 |
jeblair | so at the risk of stating things people may already know: | 21:59 |
mordred | cool - please do - and yes, any time that you have to disable something, or hold a temp dir or look in the json file to figure out what went wrong with a job, please point that out | 21:59 |
jeblair | i think pabelanger is concerned with running tests on zuul itself. so not a running production instance of zuul, but rather what's happening inside of a unit test. | 22:00 |
mordred | AH. gotcha | 22:00 |
mordred | thank you | 22:00 |
mordred | we were not on the same page | 22:00 |
pabelanger | jeblair: yes, thanks | 22:01 |
pabelanger | ze01 has been restarted | 22:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove gather_facts: true https://review.openstack.org/485834 | 22:06 |
*** jkilpatr has joined #zuul | 22:07 | |
pabelanger | mordred: we likely can drop the validate-host gather-facts logic too, and maybe just use the fact_cache files instead? | 22:08 |
mordred | pabelanger: yah - you mean the gather_facts at the top of playbooks/unittests/pre.yaml ? | 22:10 |
mordred | oh - you mean the setup call | 22:10 |
pabelanger | ya: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/validate-host/tasks/main.yaml#n16 | 22:11 |
pabelanger | that _should_ be the same as cached facts right? | 22:11 |
pabelanger | we could have a post job to add it into logs folder | 22:11 |
mordred | pabelanger: yup. I agree | 22:11 |
mordred | pabelanger: we can ALSO remove the gather_facts: and gather_subset: lines | 22:12 |
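Reading the cached facts instead of re-running the setup call in validate-host could look roughly like this. A sketch assuming Ansible's jsonfile fact-cache layout of one JSON file per host; `load_cached_facts` is a hypothetical helper, not zuul-jobs code.

```python
import json
import os

def load_cached_facts(fact_cache_dir, hostname):
    """Read one host's facts from Ansible's JSON fact cache.

    With fact caching enabled, the setup module's results are
    written to a per-host JSON file, so a role (or a post playbook
    copying the files into the logs folder) can read them instead
    of calling setup again.
    """
    path = os.path.join(fact_cache_dir, hostname)
    with open(path) as f:
        return json.load(f)
```

This matches pabelanger's point that the cache file _should_ contain the same data the extra setup call produces.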
mordred | pabelanger: (we're also doing validate-host in both base and unittests atm) | 22:12 |
pabelanger | mordred: ya, I see that now too | 22:12 |
pabelanger | bug? | 22:12 |
pabelanger | our logs are looking real good now too | 22:13 |
pabelanger | much nicer to debug with | 22:13 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 22:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove validate-host from unittests/pre.yaml https://review.openstack.org/485836 | 22:23 |
pabelanger | mordred: openstack-py35 now passing on shade: http://logs.openstack.org/27/484027/5/check/openstack-py35/1b33d30/job-output.txt | 22:23 |
pabelanger | mordred: still need to rework the job based on your feedback still | 22:24 |
jeblair | oO | 22:24 |
pabelanger | also, still | 22:24 |
mordred | woot! | 22:24 |
*** dkranz has joined #zuul | 22:25 | |
pabelanger | mordred: I would have expected TASK [Gathering Facts] to return ubuntu-xenial | ok in that log too. But holding out to restart ze01 again with latest changes | 22:25 |
pabelanger | jeblair: did you mention you were working on a fix for the logs.o.o link on the zuulv3.o.o status page after a job finishes? | 22:28 |
pabelanger | also, https://review.openstack.org/#/c/483611/ could use a review too. gzip our job-output.txt file | 22:29 |
jeblair | pabelanger: no fix, just observed the issue | 22:29 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: gzip console log before uploading it https://review.openstack.org/483611 | 22:33 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 22:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 22:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 22:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 22:56 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:03 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 23:03 |
jeblair | pabelanger, mordred, SpamapS: ^ that's in final form for review | 23:03 |
jeblair | also, it's 50% docs. :) | 23:04 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename tags job variable jobtags https://review.openstack.org/485845 | 23:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:07 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename uuid to build https://review.openstack.org/485851 | 23:32 |
* SpamapS getting into nitty gritty of config file changes for a disk monitor | 23:39 | |
SpamapS | This feels like something that would possibly be unique to each executor. | 23:40 |
SpamapS | Though that may also not really be true in practice, since it would be annoying if sometimes your job works, and sometimes it fails, because sometimes it hits the big executor and sometimes the small one. | 23:41 |
pabelanger | jeblair: mordred: going to restart ze01 again to pick up latest logging changes | 23:41 |
mordred | cool | 23:41 |
jeblair | SpamapS: i think we also had the idea of a jobs-per-executor limit. so fine tuning might involve considering those values together. | 23:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add zuul.items to job vars https://review.openstack.org/485853 | 23:44 |
mordred | jeblair: your stack looks good - but there's a missing word in the docs in the first patch | 23:48 |
SpamapS | jeblair: Right, what would limit an executor from overloading itself now? Nothing? | 23:54 |
jeblair | mordred: ugh. maybe a followup at this point? :) | 23:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Log items in loops better https://review.openstack.org/485864 | 23:58 |
mordred | jeblair: yah. that's what I was thinking | 23:59 |
mordred | pabelanger: ^^ that should make your item/loop thing from earlier better | 23:59 |
jeblair | SpamapS: indeed. we spun up zl08 recently because we think we overloaded our v2 launchers when running at full capacity (we were at 213 simultaneous jobs, now down to 187) | 23:59 |
pabelanger | okay, ze01 restarted | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!