tristanC | SpamapS: that would definitely be nice to have, do you think nodepool could/should do that (manage speculative image)? | 00:31 |
---|---|---|
tristanC | SpamapS: i'm afraid the ansible kubectl connection plugin only seems to work with the raw/shell modules... | 00:35 |
tristanC | SpamapS: perhaps another workflow would be to let zuul-executor build and push images to k8s, and then write the tests as part of a Job: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/ | 00:37 |
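
[For context, a minimal sketch of the kind of Kubernetes Job tristanC is pointing at; the image name and test command are illustrative assumptions, not taken from the discussion:]

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: speculative-unit-tests
spec:
  backoffLimit: 0                # do not retry; let the CI system interpret the failure
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tox
          # hypothetical image built and pushed by the executor from the speculative state
          image: registry.example.com/speculative/myproject:change-12345
          command: ["tox", "-e", "py35"]
```
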
tristanC | but then again, we'll lose the capability to write test definitions in Ansible | 00:39 |
tristanC | SpamapS: and what about lxd, why would it be a better choice? | 00:41 |
clarkb | I think you'd build and push the image as a pre step | 00:56 |
clarkb | since it will have access to the git repos there | 00:56 |
tristanC | clarkb: maybe zuul could pass the speculative git repos as part of the requestNodes() call :) | 00:57 |
clarkb | tristanC: one of the redesign goals of v3 was to avoid needing to serve git repos though | 00:58 |
clarkb | I think ideally anything requiring speculatively merged code runs on the executor side of things | 00:59 |
tristanC | well the test instances do require the speculatively merged code... so instead of building the environment once with nodepool or a "containerpool" service, we end up pushing the merged code to each environment | 01:02 |
tristanC | thinking about a kernel ci, it would be nice to have a glance image with the speculative kernel so that it could run tests in parallel | 01:04 |
clarkb | for that I think you just install it and reboot as part of job? | 01:05 |
clarkb | potentially with an earlier job building it | 01:05 |
clarkb | especially since the kernel may not boot at all | 01:06 |
clarkb | so you'd want that job side to record it properly | 01:06 |
tristanC | clarkb: yes, that's exactly what jeblair suggested :) | 01:06 |
clarkb | having built kernels that nuked grub you definitely want a way to account for that | 01:07 |
tristanC | we could even get a post-nova-console.yml play | 01:07 |
tristanC | clarkb: i am mostly interested in the kernsec/bwrap/oci stack test, for which an aki/ari/ami would work fine | 01:10 |
tristanC | i mean, as a thought experiment, it's an interesting workflow to address | 01:12 |
clarkb | ya though I don't think it needs to be complicated, just install the kernel and reboot | 01:13 |
clarkb | if it doesn't come back treat it as a failure. if it comes back test it | 01:13 |
tristanC | yes, thanks for that suggestion, that would easily work indeed | 01:14 |
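
[A minimal sketch of the install-and-reboot workflow clarkb suggests, as an Ansible play; the package path and timeout are assumptions, not something agreed on above:]

```yaml
- hosts: all
  tasks:
    - name: Install the speculatively built kernel
      become: true
      yum:
        name: /tmp/kernel-speculative.rpm   # hypothetical artifact from an earlier build job
        state: present

    - name: Reboot into the new kernel
      become: true
      shell: sleep 2 && reboot
      async: 1
      poll: 0

    - name: Fail the job if the node never comes back with the new kernel
      wait_for_connection:
        delay: 10
        timeout: 600
```
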
SpamapS | tristanC: nodepool would build the image that the pre job FROM's | 01:40 |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts https://review.openstack.org/519582 | 01:44 |
SpamapS | tristanC: and lxd is meant to run containers that act like VMs. | 02:02 |
tristanC | SpamapS: iiuc, lxd doesn't provide an api, so you would still have to manage host instances (which is what the oci driver does) | 02:10 |
tristanC | oh my bad, it does have a rest api | 02:17 |
tristanC | SpamapS: anyway i'm not convinced lxd is a better choice over oci/k8s, at least not for tests like tox, go test, rpmbuild... Those seem to work fine with the containerized sshd trick | 02:25 |
tristanC | it seems more comparable to an openstack instance, e.g. when you need the whole system stack | 02:28 |
tristanC | well i never used lxd so i may be missing something. anyway it looks like a good nodepool driver candidate | 02:43 |
*** threestrands has quit IRC | 06:03 | |
*** bhavik has joined #zuul | 06:47 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directorys for copy https://review.openstack.org/521436 | 06:52 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directories for copy https://review.openstack.org/521436 | 06:55 |
*** xinliang has quit IRC | 07:06 | |
*** xinliang has joined #zuul | 07:18 | |
*** hashar has joined #zuul | 08:33 | |
*** bhavik has quit IRC | 09:53 | |
*** electrofelix has joined #zuul | 09:55 | |
*** bhavik has joined #zuul | 10:03 | |
*** bhavik1 has joined #zuul | 10:37 | |
*** bhavik has quit IRC | 10:39 | |
*** bhavik1 is now known as bhavik | 10:39 | |
*** bhavik has quit IRC | 10:55 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directories for copy https://review.openstack.org/521436 | 10:56 |
*** kmalloc has joined #zuul | 11:01 | |
*** openstackgerrit has quit IRC | 11:32 | |
*** jkilpatr has quit IRC | 11:52 | |
*** jkilpatr has joined #zuul | 12:24 | |
rcarrillocruz | http://38.145.34.35/logs/ansible-networking/check/github.com/rcarrillocruz-org/ansible-fork/2/1e759d67fe084e1297cab4fba9440cd1/run-openvswitch-integration-tests/logs/job-output.json | 12:32 |
rcarrillocruz | \o/ | 12:32 |
rcarrillocruz | first job run of openvswitch ansible modules on my testing zuulv3 server | 12:33 |
rcarrillocruz | hooked with GH | 12:33 |
SpamapS | rcarrillocruz: congrats! | 13:52 |
SpamapS | tristanC: also sorry for getting all engineer-crit on your thing. I think it's really cool to have a k8s option for zuul. :) | 13:53 |
tristanC | SpamapS: heh, no offense taken :) i'm very new to docker/k8s and your suggestions are much appreciated | 14:01 |
tristanC | sorry if i sounded defensive | 14:02 |
SpamapS | not at all | 14:18 |
SpamapS | just woke up and realized I had only been critical | 14:18 |
SpamapS | And really I just want to boil it down to the thinnest thing possible. | 14:19 |
SpamapS | sshd is certainly thinner than a whole VM :) | 14:19 |
*** openstackgerrit has joined #zuul | 14:22 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 14:22 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 14:23 |
leifmadsen | mordred: jeblair: fyi pabelanger helped me last Thursday/Friday, and I was able to get far enough to run a "hello world"! | 14:36 |
leifmadsen | so I'll be circling back around in the next week or two, and building out a cleaned up set of docs/notes for a "Zuul From Scratch" (I'm thinking about avoiding use of "quickstart" in the docs, because it's not really very quick or light :)) | 14:36 |
leifmadsen | once I get that far, I'll propose some changes to the existing feature/zuulv3 branch | 14:37 |
mordred | leifmadsen: \o/ | 15:00 |
leifmadsen | indeed heh | 15:02 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Make build-python-release job https://review.openstack.org/513925 | 15:10 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Remove old python-sdist job https://review.openstack.org/513926 | 15:10 |
rcarrillocruz | folks, not finding many docs about how to stream job consoles on zuul. What is needed? web-console up and zuul_console spawned on each job on the executor, what else? | 15:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 15:24 |
tristanC | rcarrillocruz: what is web-console? you meant zuul-web right? | 15:24 |
*** bhavik1 has joined #zuul | 15:24 | |
rcarrillocruz | erm, yeah | 15:25 |
rcarrillocruz | zuul-web sorry | 15:25 |
tristanC | not sure what's wrong with your setup, but that should be enough. here is an apache conf to sit in front of zuul-web: https://review.openstack.org/#/c/505452/9/zuul/web/static/README | 15:26 |
tristanC | well, without that patch cherry-picked, change it to ProxyPassMatch /console-stream ws://localhost:9000/console-stream nocanon retry=0 | 15:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 15:27 |
*** bhavik1 has quit IRC | 15:39 | |
rcarrillocruz | hmm, k, was mixing up the webapp *and* zuul-web | 15:41 |
*** jkilpatr has quit IRC | 16:13 | |
*** jkilpatr has joined #zuul | 16:29 | |
*** bhavik1 has joined #zuul | 16:39 | |
*** hashar is now known as hasharAway | 16:45 | |
*** bhavik1 has quit IRC | 16:50 | |
*** bhavik1 has joined #zuul | 16:50 | |
tobiash | ianw: I left an answer on https://review.openstack.org/#/c/520855 | 16:53 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts https://review.openstack.org/519582 | 16:56 |
*** bhavik1 has quit IRC | 16:59 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 17:01 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 17:01 |
mordred | jeblair: if you didn't see in the scrollback, electrofelix was asking questions in #openstack-infra about the ability to push merges from zuul and how adding that might or might not relate to merger/executor interactions ... I thought that was a good jeblair conversation :) | 17:08 |
*** jkilpatr has quit IRC | 17:09 | |
*** jkilpatr has joined #zuul | 17:10 | |
mordred | jeblair: also - important to note - there is what I think is an issue with override-checkout not working as expected | 17:15 |
mordred | jeblair: we disabled tox-py35-on-zuul from zuul-jobs as it was bombing out due to zuul master being checked out: https://review.openstack.org/#/c/521096/ and I failed at figuring out what was happening | 17:16 |
jeblair | mordred: does that affect override-branch too? (or have we removed it?) | 17:17 |
electrofelix | mordred: thanks, I keep forgetting this is the best room for zuul discussions, I'll break my habit at some point | 17:17 |
mordred | jeblair: well, first stab at fixing was "switch from override-branch to override-checkout" | 17:17 |
mordred | jeblair: http://logs.openstack.org/42/521142/1/check/tox-py35-on-zuul/81bff30/ is a log from a failure | 17:18 |
mordred | jeblair: but switching from one to the other did not have any impact | 17:18 |
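
[For reference, a sketch of how override-checkout is expressed in a job definition; the names below are illustrative and not the actual tox-py35-on-zuul configuration being debugged:]

```yaml
- job:
    name: tox-py35-on-zuul
    parent: tox-py35
    required-projects:
      - name: openstack-infra/zuul
        # check out this ref of the required project instead of the branch
        # zuul would otherwise select
        override-checkout: feature/zuulv3
```
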
jeblair | electrofelix: i caught up on the earlier infra discussion -- but what's the issue you're trying to solve? | 17:18 |
jeblair | (i think i need just a little more context) | 17:18 |
electrofelix | We build artifacts during the gate that we don't want to rebuild afterwards, because what we built is what actually passed tests | 17:19 |
jeblair | mordred: ok, so it seems to affect both. did this use to work and something changed? | 17:19 |
electrofelix | some teams like to attach the metadata of the git repos to those artifacts | 17:19 |
jeblair | electrofelix: okay, so this may actually touch on two related issues: | 17:19 |
electrofelix | but the commit SHA1 won't necessarily line up subsequently, because it corresponds to the proposed merge zuul made for testing rather than the actual merge | 17:20 |
electrofelix | jeblair: need to switch from office to home, so maybe hold off discussion until tomorrow and I'll ping you again about it | 17:21 |
jeblair | electrofelix: 1) using the result of the initial zuul merger calculation in the executors (rather than repeating the action in the executors), as well as 2) pushing the result of the initial zuul merger calculation to the target branch rather than having gerrit/github perform the merge. | 17:21 |
jeblair | electrofelix: sounds good | 17:21 |
*** electrofelix has quit IRC | 17:21 | |
mordred | jeblair: yes, I believe it worked at some point in the past as that was our initial thing to make sure zuul-jobs changes didn't break unittest jobs | 17:25 |
mordred | jeblair: THAT SAID - it's possible we never actually validated that it was running zuulv3's unittests and not zuulv2s' | 17:25 |
mordred | jeblair: I'm also not sure, even if it WAS running zuul v2's unittests - that that would break due to the patch in question | 17:25 |
jeblair | mordred: yeah, i guess that sort of thing could slip by (were we running py3 tests on v2?) | 17:26 |
mordred | jeblair: but there were too many variables in the air and I was more focused on helping get releasenotes jobs fixed, so I didn't diagnose deeply | 17:26 |
jeblair | mordred: thanks, i'll try to dig in shortly | 17:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 17:41 |
jeblair | mordred: there's a good chance we forgot to restart the executors after adding support for override-checkout. that may explain why override-checkout isn't working, but it would not explain why override-branch wasn't. | 17:46 |
jeblair | (override-checkout change merged nov 1, a random executor start time is oct 31) | 17:47 |
jeblair | Shrews: in shutting down openstack's executors, i see that 2 of them are stuck at the following: | 18:05 |
jeblair | sendto(7, "59.676215 | wheel-mirror-ubuntu-"..., 4096, 0, NULL, 0 | 18:05 |
jeblair | zuul-exec 9708 zuul 7u IPv6 3547817819 0t0 TCP ze04.openstack.org:finger->zuulv3.openstack.org:34988 (ESTABLISHED) | 18:06 |
jeblair | Shrews: it looks like it's stuck transmitting data to the multiplexer | 18:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 18:07 |
jeblair | Shrews: these were last restarted on nov 10 and nov 15, so they should be running the old process code | 18:07 |
jeblair | (though i suspect this would be the same in either case -- process or threading) | 18:07 |
mordred | tristanC: what version of angular is required? (I just realized that we didn't add an entry into etc/status/fetch-dependencies.sh) | 18:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: DNM: test tox-py35-on-zuul https://review.openstack.org/521623 | 18:14 |
mordred | dmsimard: do you perhaps know the answer ^^ ? | 18:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add angular to fetch-dependencies.sh https://review.openstack.org/521625 | 18:23 |
mordred | tristanC, dmsimard: ^^ based on looking at xstatic packages and also storyboard-webclient depends I'm guessing that's the version you were using | 18:25 |
dmsimard | There's no angular in ARA | 18:30 |
dmsimard | Missing context, let me read back.. | 18:30 |
dmsimard | looking at https://softwarefactory-project.io/r/gitweb?p=scl/zuul-distgit.git;a=blob;f=zuul.spec;h=bfaaa32b7bb1b61ed84fa5695f665f05bd9af0b8;hb=HEAD there's no obvious version for it.. seems like it's being pulled from either in-tree or elsewhere | 18:35 |
dmsimard | This is the angular.js file from a test deployment http://paste.openstack.org/raw/626839/ .. there's no version header in it. Something about 1.3.7 but that looks to be about error handling instead. | 18:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 18:53 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 19:09 |
dmsimard | mordred, SpamapS: fyi along the lines of our discussion last week, I'm starting a thread on openstack-dev around leveraging zuul v3 jobs/roles/playbooks outside OpenStack -- it's focused around migrating some "downstream" TripleO jobs to Zuul v3 but I feel like the discussion will be worthwhile to see what we can do, what we can't, what works and what doesn't | 19:20 |
dmsimard | I feel the need to mention it since you might filter out [TripleO] threads :) | 19:20 |
mordred | dmsimard: sweet | 19:22 |
jeblair | dmsimard: sounds cool :) | 19:23 |
dmsimard | jeblair: oh, we started a pad to document issues with running zuul-jobs outside the gate: https://etherpad.openstack.org/p/downstream-zuul-jobs-issues | 19:24 |
dmsimard | there's not much there yet, we only scratched the surface | 19:24 |
dmsimard | We were also discussing how we can safely include zuul configuration from zuul-jobs/openstack-zuul-jobs/etc without incurring things like potential syntax failures due to clashing names or whatever else. | 19:25 |
jeblair | dmsimard: cool, when things settle down, we should probably establish a storyboard tag or something for issues like that | 19:25 |
dmsimard | So far we're using parameters like shadow and selective includes | 19:25 |
jeblair | cool, it'll be good to know whether those work as intended or need further work | 19:26 |
dmsimard | I was surprised to even see that those were available in the first place, it means someone thought about the use case | 19:28 |
dmsimard | so whoever added that in, +++ | 19:28 |
tobiash | dmsimard: I added a point to your etherpad | 19:47 |
dmsimard | tobiash: yeah, I haven't quite figured out that one yet | 19:49 |
jlk | tristanC: SpamapS haven't read full backlog, but I also was thinking in the direction of a kubectl driver for ansible, so that you can do kubectl exec type things and not need ssh inside the images. | 19:49 |
tobiash | I looked a bit into that some weeks ago but didn't have time for really working on that yet | 19:49 |
jlk | I see "ssh" as a byproduct of how openstack VMs work | 19:50 |
jlk | and if they're not necessary to execute things inside the contained (vm or otherwise) environment, then all the better. | 19:50 |
pabelanger | jlk: the comment that interested me from tristanC was how you would do a synchronize task for logs from a container? | 19:51 |
jlk | synchronize still works with the docker module | 19:52 |
jlk | the docker connection module for Ansible, which just uses docker exec. Presumably the same sort of thing can work on k8s | 19:53 |
jlk | granted, I haven't looked close enough at _how_ it works for Docker, but I've used it before. | 19:53 |
pabelanger | kk | 19:53 |
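
[A minimal sketch of the docker connection plugin workflow jlk describes, assuming a running container registered in the inventory as "test-node"; the test command and log path are hypothetical, and whether synchronize pulls logs this way is jlk's claim above, not verified here:]

```yaml
- hosts: test-node
  vars:
    ansible_connection: docker    # tasks run via "docker exec" instead of ssh
  tasks:
    - name: Run the tests inside the container
      command: tox -e py35

    - name: Pull the logs back out of the container
      synchronize:
        mode: pull
        src: /var/log/tests/
        dest: /tmp/logs/
```
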
*** hasharAway is now known as hashar | 20:00 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 20:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:15 |
mordred | jlk: ++ to using native k8s/docker connection for the things - I believe flaper87 wrote a kubectl module for ansible - but I think in this case we'd need a kubectl connection plugin, yeah? | 20:19 |
clarkb | in fairness ssh is a byproduct of how ansible works :P | 20:21 |
dmsimard | what's a native k8s connection ? I know openshift has "oc rsh" but I haven't done much pure k8s | 20:21 |
clarkb | if we used some other rpc system we'd use whatever protocol that speaks over | 20:22 |
dmsimard | on openshift if you do "oc rsh <pod>" it opens a shell in the pod, not sure what it does behind the scenes (or how it would pick one of the containers in the pod) | 20:23 |
dmsimard | openshift-ansible probably has a module/plugin for that actually | 20:23 |
clarkb | (it also happens to be the case that jenkins used ssh as well, so that wasn't a change for us) | 20:23 |
clarkb | (but nothing about it is openstack vm specific) | 20:23 |
dmsimard | yeah doesn't look like openshift-ansible folks have something like that yet but it'd be "insanely cool" | 20:26 |
dmsimard | the different upstream connection plugins are here https://github.com/ansible/ansible/tree/devel/lib/ansible/plugins/connection | 20:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:49 |
mordred | clarkb, dmsimard: yah - I think it's actually "ssh is a byproduct of the fact that we're currently using VMs that look like computers for our test nodes" - due to the connection plugin support, if we produce test nodes that are not intended to be connected to over ssh, that should not be a blocker | 20:59 |
mordred | we're already aware of at least win_rm for the windows nodes | 21:00 |
*** jkilpatr_ has joined #zuul | 21:00 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Normalize daemon process handling https://review.openstack.org/517381 | 21:00 |
*** jkilpatr has quit IRC | 21:01 | |
mordred | so yah - being able to support docker connection plugin for docker container build nodes or a theoretical kubectl connection plugin for k8s pods seems like a good way to handle those as we grow support for them | 21:01 |
*** jkilpatr_ has quit IRC | 21:01 | |
jeblair | reminder, meeting is in 1 hour (this may have changed in reference to your local time for folks in the usa) | 21:01 |
mordred | jeblair: nice daemon process handling cleanup patch | 21:03 |
clarkb | mordred: ya I just think it's important to decouple that from openstack, nothing in openstack says you have to do it that way and in fact you could run windows on openstack if you wanted and do whatever it is you do with windows for example. | 21:05 |
jeblair | yeah, in working with leifmadsen i was able to see what worked well and didn't in zuul/nodepool. nodepool's handling of pidfiles wins because of the way they use paths, and zuul's default logging config wins -- that's something we should work on porting to nodepool. | 21:05 |
clarkb | mordred: the determining factor for us is ansible (and prior to that was jenkins) | 21:06 |
mordred | clarkb: I agree - it's not caused by openstack ... | 21:06 |
mordred | clarkb: but I think it's just as important to point out that it's not actually driven by ansible either, as ansible has plenty of non-ssh connection plugins | 21:07 |
mordred | it is the combination of the fact that we are running ssh capable VMs and that is the default ansible connection mechanism | 21:07 |
clarkb | mordred: right thats fine we could use whatever ansible supports | 21:07 |
mordred | yah | 21:07 |
clarkb | it does ssh by default so we've ended up there by default | 21:08 |
mordred | yup | 21:08 |
clarkb | jenkins too fwiw, there were non ssh methods | 21:08 |
mordred | and I think it's 100% the right choice of how to connect to linux-based things that look and behave like multi-user/multi-process computers | 21:08 |
mordred | (whether those come from bare metal, vms or containers) | 21:09 |
clarkb | ya | 21:09 |
jeblair | okay, i'm going to go out on a limb here... is there anything new happening in this discussion, or are we just repeating the discussion we have every few weeks? | 21:09 |
jeblair | i'm asking because i'd like for us to, as a group, avoid spinning our wheels | 21:10 |
jeblair | these discussions take a *lot* of time and attention | 21:10 |
jeblair | and this is an important topic | 21:10 |
clarkb | if nothing else, I think it shows that we may have confusion that ansible is what drives this | 21:10 |
jeblair | and we're not going to be able to address it fully right now. we do have an item (maybe more than one item) on the roadmap to deal with non-vm test hosts, after the release | 21:10 |
clarkb | zuul doesn't really care, nodepool doesn't really care and openstack definitely doesn't care. Ansible is what determines this so maybe we need to write that down "if you want to connect to something not sshd then you need to make ansible talk to it" | 21:10 |
jeblair | is there a way we can structure our work so that we can say "yeah, we totally know this is a thing, let's work on it later?" | 21:11 |
jeblair | or do we need to further delay the zuulv3 release so that we can work on this now? | 21:11 |
clarkb | I don't think we need to work on it, its not something that zuul has to be directly aware of I don't think (but I could be wrong about that) | 21:12 |
jeblair | this is what i was trying to accomplish with a roadmap. have a place where we say we know what we're going to do in the future, but it's in the future. now we have boring^Wexciting things like "actually make system usable and release" to do :) | 21:12 |
clarkb | it would just be documentation for zuul users that says "make ansible do it" | 21:12 |
clarkb | regardless of what the system is (windows, k8s, docker, etc) | 21:13 |
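
[A sketch of clarkb's point in inventory terms: the connection mechanism is a per-node Ansible setting rather than something zuul itself needs to know about; the host names below are illustrative:]

```yaml
all:
  hosts:
    vm-node:
      ansible_connection: ssh      # the current default for VM-like nodes
    windows-node:
      ansible_connection: winrm
    container-node:
      ansible_connection: docker
```
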
jeblair | clarkb: well, i think this needs a significant amount of thought and discussion but i'm not prepared to have it now, and i'd like other folks to have the space to focus on near-term goals. | 21:14 |
jeblair | so how can we do that? or am i wrong? do we just need to accept that we need to solve containers right now? | 21:14 |
jeblair | basically, in my mind, the roadmap is flexible -- we can change it. but if we agree about it, let's stick to it. | 21:15 |
clarkb | I don't personally think we need to solve containers right now (we've not needed them in ~5 years). I do think it is something people are interested in tackling though and if there isn't any direction for them that maybe isn't a great thing | 21:15 |
jeblair | i'm happy to work on that, but i'm afraid we don't have enough people signed up to do the things we need to get to the v3.0 release. | 21:16 |
leifmadsen | isn't this the type of discussion we have every 6 months? :) | 21:17 |
leifmadsen | i.e. isn't that what devcons are for? | 21:17 |
mordred | jeblair: the main thing I was trying to communicate by responding to this particular incarnation of the topic was to remind folks that support for non-ssh connection plugins is going to be important no matter what the thing on the other end of the connection winds up being - and I think from a zuul pov much of that doesn't have a ton to do with containers vs. non-containers | 21:17 |
jeblair | leifmadsen: yes, i'd like to get us synchronized with that. hopefully in the next cycle we can take advantage of it more. | 21:17 |
jeblair | leifmadsen: we did indeed sketch out our roadmap at the last one | 21:17 |
jeblair | leifmadsen: i'm wondering if it has meaning. :) | 21:18 |
leifmadsen | depends where it is, and if you're working from it | 21:18 |
leifmadsen | and if everyone knows what it is, and that it's being worked from :) | 21:18 |
jeblair | leifmadsen: indeed | 21:19 |
leifmadsen | also, it's good to document these things as you go for sure, because you'll forget all this stuff when you're building an agenda for the next devcon :) | 21:19 |
leifmadsen | so a place to document these discussions so they can be useful and had, then documented, then added to roadmap, is a good place to be | 21:20 |
jeblair | leifmadsen: that's another reason to put the roadmap in storyboard i suppose | 21:20 |
leifmadsen | is it just in an etherpad right now? :) | 21:20 |
jeblair | leifmadsen: it was an etherpad, became an email, and i believe at the last meeting pre-summit, clarkb and i signed up to make it be in storyboard | 21:21 |
leifmadsen | yea, so basically, it doesn't exist :) | 21:21 |
jeblair | leifmadsen: not sure why not -- we're all on the email list | 21:21 |
leifmadsen | I don't consider etherpads and email documentation :D | 21:21 |
leifmadsen | email lists are incredibly hard to go and lookup information. The ideal place, in my head, is a link to a sane location from the README | 21:22 |
jeblair | it's not documentation, it's meant to be a discussion, and something to agree on | 21:22 |
leifmadsen | it's documentation | 21:22 |
leifmadsen | you're documenting a plan | 21:22 |
leifmadsen | otherwise, you're just pontificating | 21:22 |
jeblair | leifmadsen: i think there's a miscommunication here | 21:22 |
jeblair | leifmadsen: i think it's really important for us to discuss something like this, which is why i started it in person, and online, at the summit in an etherpad | 21:22 |
leifmadsen | I'm not sure there is :) | 21:22 |
leifmadsen | yes, I agree | 21:23 |
jeblair | leifmadsen: then followed up on the mailing list to make sure even more people were included | 21:23 |
leifmadsen | what I'm saying, is your roadmap is not yet documented | 21:23 |
leifmadsen | if it's only on an email list and an etherpad | 21:23 |
jeblair | leifmadsen: it will eventually end up somewhere. at our last meeting before the summit, we discussed whether it should be in a readme in repo or in storyboard | 21:23 |
jeblair | leifmadsen: and we decided to put it in storyboard | 21:23 |
jeblair | leifmadsen: i think that should make you happy. but i suggest your criticism is unwarranted. | 21:23 |
leifmadsen | sure, that works, and then you can link to it from a README so that people getting to the project know where to look | 21:24 |
leifmadsen | unwarranted? | 21:24 |
leifmadsen | ummm ok | 21:24 |
leifmadsen | I'm not intending my tone to be harsh | 21:24 |
jeblair | here's the email: http://lists.openstack.org/pipermail/openstack-infra/2017-November/005657.html | 21:24 |
pabelanger | re:containers, it does seem to be something people want before adopting zuulv3. But by the same token, we are seeing requests to tag zuulv3 now. So it feels like a catch-22 in that aspect. But agreed, we need to stabilize things more before new features | 21:25 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 21:26 |
leifmadsen | everyone wants all the things all the time | 21:26 |
leifmadsen | just gotta have your must haves, then move along with your life; development is never ending | 21:26 |
mordred | jeblair: having just now gone to look at the roadmap again real quick, I notice that "nodepool backends" is in "long term / design", which I think we may want to at least partially reconsider, as I know getting windows nodes is important to tobiash | 21:26 |
jeblair | mordred: that's a backend? | 21:27 |
mordred | jeblair: the part of that that I think may be worth considering on the pre-3.0 roadmap is ensuring that we can support backends that don't use ssh ... as I could imagine that there might be some sort of weird breaking change associated with that | 21:27 |
jeblair | mordred: i don't see why that can't be a forward-compatible thing in v3.1? | 21:28 |
pabelanger | I could sign up for 'demonstrate openstack-infra reporting on github' (gtest-org project) and 'add command socket to scheduler and merger for consistent start/stop' items myself | 21:29 |
jeblair | leifmadsen: yep, we have to make compromises | 21:29 |
mordred | it might be able to be, for sure- but I know that talking through the win_rm auth needs with tobiash from a nodepool->zuul perspective exposed a few design assumptions | 21:29 |
jeblair | pabelanger: thank you so much! | 21:29 |
jeblair | mordred: could maybe someone write that up as an email or something? | 21:29 |
mordred | jeblair: sure | 21:30 |
mordred | jeblair: like, I don't think we need to have other nodepool backends fully implemented - it's more "before we release a 3.0, can we sanity check that we're not including something that would make doing so extra hard/awkward" | 21:30 |
mordred | jeblair: but happy to write that up as an email to the list | 21:30 |
jeblair | mordred: if there is something breaking, i agree we should look at it early. but i think our goal should be to polish up the thing that we are basically running, and do the minimum to ensure we're not backing ourselves into a corner later, and defer the rest. | 21:31 |
mordred | jeblair: yes, I agree with that 100% | 21:31 |
jeblair | (because "we're running a thing but telling no one else to run it because doing so sucks" is not a state we should be in long-term) | 21:31 |
mordred | jeblair: I mostly want to make sure we're not backing ourselves into a corner on this one | 21:31 |
mordred | jeblair: ++ | 21:31 |
clarkb | jeblair: I've reviewed the daemon change. I like it but there are a couple of things that I think need to be addressed | 21:39 |
jeblair | clarkb: thanks, i'll look at that. i also just noticed it collided with another patch that landed ahead of it. that's why pep8 failed. i'll have to untangle that too. | 21:40 |
jeblair | clarkb: i'm having trouble following your static comment | 21:42 |
jeblair | clarkb: i call "Executor().main()", so that's an instance method | 21:43 |
clarkb | jeblair: when you call main() you are doing so as ClassName.main() not ClassObject.main() | 21:43 |
jeblair | clarkb: nah, there's a () after Executor | 21:43 |
clarkb | oh am I just completely blind? because that may be too | 21:44 |
jeblair | it's just anonymous | 21:44 |
clarkb | ya I'm just blind. So I think the only other thing is making sure you don't need a pidfile when running in non-daemon mode | 21:44 |
jeblair | clarkb: cool. i agree with that and will implement in next rev | 21:44 |
clarkb | and I think that with: pass will lock the file | 21:44 |
jeblair | clarkb: yeah, it's intended to lock-then-unlock | 21:45 |
jlk | jeblair: I think the "new thing" that's happened with "how do we container" is that a driver implementation was submitted | 21:47 |
jeblair | jlk: oh, this is tristanC's change? | 21:48 |
jeblair | 521356? | 21:49 |
jlk | that's the one that spurred the conversation. I have the tab open to read it, but I haven't yet. I glean from the conversation that it implements / relies upon sshd inside the container | 21:49 |
jeblair | jlk: i suspect that's going to spawn the design discussion that we all know we need to have | 21:50 |
jeblair | so it still may be worth asking ourselves, is it most beneficial to have this now, or would it be better to defer it until we're closer to that point on the roadmap. or do we need to change the roadmap. | 21:50 |
jeblair | all of those '.' should be '?'. :) | 21:51 |
jlk | well yeah | 21:51 |
jlk | I'd like to contribute to this discussion/feature, because I have use cases, and some time to dedicate to it. But if y'all aren't ready for it, then now's not the time | 21:51 |
jeblair | well, it's more "us" than "ya'll" i hope :) | 21:52 |
jlk | that's... entirely fair. I've been checked out lately and I slipped phrasing | 21:52 |
jeblair | to be fair "us aren't ready for it" isn't great phrasing either. heh. :) | 21:53 |
jeblair | i'm apparently englishing poorly today | 21:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Normalize daemon process handling https://review.openstack.org/517381 | 21:55 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 21:59 |
clarkb | meeting time now right? | 21:59 |
jeblair | yep! | 22:04 |
jeblair | in #openstack-meeting-alt | 22:04 |
pabelanger | I'll only be able to attend first 30mins today | 22:04 |
jeblair | jlk, SpamapS, leifmadsen, Shrews, dmsimard: ^ | 22:05 |
*** hashar has quit IRC | 22:06 | |
jeblair | mordred: ^ | 22:07 |
leifmadsen | I can't make 5pm meetings | 22:09 |
leifmadsen | so won't be there :) | 22:09 |
*** jlk has quit IRC | 22:31 | |
*** jlk has joined #zuul | 22:32 | |
*** jlk has quit IRC | 22:32 | |
*** jlk has joined #zuul | 22:32 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 22:38 |
dmsimard | oh hey just got my notification that the zuul meeting is in 10 minutes from now .. :/ | 22:50 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Make build-python-release job https://review.openstack.org/513925 | 22:56 |
ianw | dmsimard: just be sure not to change the timeline, don't want to have a marty mcfly situation on our hands :) | 23:00 |
clarkb | jeblair: re zuul config breakages the one I'm most familiar with is the parent to final job one so I'll start with that | 23:01 |
clarkb | jeblair: one of the release jobs that only runs on tags was marked final. Neutron/Horizon then parented that job in order to add required-projects to the job. This merged with no errors. The problem then only came up when a tag was made and zuul said no, that job is final | 23:02 |
clarkb | this is less than ideal because it could be quite a bit of time between jobs. Maybe we handle this not with zuul self checking internally but by running a job that does an actual compile of the jobs? | 23:02 |
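
[A sketch of the configuration shape clarkb describes; the job names are illustrative. Per the description above, the child's modification is only rejected when the job is frozen at run time, i.e. when the tag event finally arrives:]

```yaml
- job:
    name: release-python-tarball
    final: true
    run: playbooks/release/run.yaml

- job:
    name: neutron-release-tarball
    parent: release-python-tarball     # rejected at run time because the parent is final
    required-projects:
      - openstack/neutron-lib
```
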
dmsimard | jlk: the support for different versions of Ansible doesn't necessarily have to come out of Zuul.. for example for ARA integration testing I just install a "nested" ansible of the desired version on the target nodes | 23:03 |
dmsimard | jlk: because I want to test that ara works with different version of ansible/py27/py35/etc | 23:03 |
clarkb | I'm guessing that the "binding" of final happens far too late in the compile process for the simpler syntax checking to catch it | 23:03 |
jeblair | clarkb: yeah, that's a run-time error which is impossible (or at least very difficult) to detect in advance | 23:03 |
dmsimard | jlk: however, it sort of sucks because then the "zuul" ara report contains just one large command instead of the granular tasks | 23:04 |
jlk | dmsimard: right. That was my intent. I don't think you should be able to select which executor your job runs from based on the ansible version on the executor | 23:04 |
jlk | otherwise I think it feels like being overly dependent on the fact that Zuul runs Ansible under the hood, making it that much harder to change (if we ever changed) | 23:04 |
clarkb | jeblair: I'm digging through gerrit changes to find the error that prevented zuul from starting now | 23:05 |
jeblair | clarkb: basically, you can construct combinations of variants that could theoretically trigger or not trigger such problems. so it's hard to detect in advance. | 23:05 |
dmsimard | jlk: I still think, however, that "forcing" the upgrade from 2.3 to 2.4 on users is dangerous | 23:05 |
clarkb | jeblair: https://review.openstack.org/#/c/519949/ is the change | 23:05 |
clarkb | https://review.openstack.org/#/c/520205/ is another that we may want to look at as far as pre merge testing goes | 23:05 |
dmsimard | jlk: historically there has always been issues between "major" versions (2.0 -> 2.1 -> 2.2 -> 2.3 -> 2.4) | 23:06 |
jlk | yeah, do you pin that to major zuul versions? | 23:06 |
dmsimard | people usually pin ansible for that reason | 23:06 |
clarkb | we've had issues with minor updates too fwiw | 23:06 |
dmsimard | clarkb: yeah, but they're not as common | 23:06 |
jeblair | clarkb: there is perhaps a simple case where you could say that all variants for a certain job are final and therefore there would be an error. but that's hard, and it's half a solution, so i worry about whether it's a good idea. | 23:06 |
dmsimard | they've gotten better at testing but their internal API is not stable | 23:06 |
jlk | yeah there's two concerns at play | 23:07 |
jlk | will Zuul's integration with Ansible continue to work | 23:07 |
clarkb | jeblair: ya thinking it might be better to have a job that compiles the zuul config outside the running zuul? | 23:07 |
clarkb | jlk: if that makes sense? | 23:07 |
jlk | and will end user playbooks continue to work as expected | 23:07 |
jeblair | clarkb: well, the thing is that the job doesn't exist until it's run. it's based on variants matching a specific change/ref/etc | 23:08 |
dmsimard | clarkb: my idea around testing base jobs was along these lines, run a nested zuul (complete with executor, with a second node to be used in the upcoming static nodepool driver) and then do a "manual" zuul enqueue of the real job | 23:08 |
jlk | Heisenjob | 23:08 |
dmsimard | clarkb: but the problem with that is that you still need to load like 2000 repositories worth of configuration | 23:09 |
clarkb | dmsimard: ya | 23:09 |
clarkb | jeblair: gotcha | 23:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 23:09 |
dmsimard | clarkb: but then the nested zuul also doesn't have the private keys to decrypt the secrets | 23:09 |
dmsimard | which is totally okay, because otherwise that'd allow people to peek at things | 23:10 |
clarkb | I wonder, could we do a lint type check that just walks up the parent tree for finals? | 23:10 |
clarkb | jeblair: ^ | 23:10 |
clarkb | maybe thats the case you mean where all variants are final | 23:10 |
jeblair | clarkb: that's the thing i've probably been unable to express clearly -- even the parent hierarchy isn't fully determined until the job runs | 23:11 |
clarkb | jeblair: got it | 23:11 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 23:11 |
jeblair | if you say "parent: foo" and "foo" has N variants, we don't know which will apply. it could be anywhere from 0-N. | 23:11 |
jeblair | (if 0 apply, the job doesn't run) | 23:12 |
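
[A sketch of the variant situation jeblair describes; job, branch, and nodeset names are illustrative. Which "foo" variant (if any) "bar" inherits from depends on the change being tested, so it is only known at run time:]

```yaml
- job:
    name: foo
    branches: master
    nodeset: ubuntu-xenial

- job:
    name: foo
    branches: stable/pike
    nodeset: ubuntu-trusty

- job:
    name: bar
    parent: foo      # resolves to 0, 1, or 2 of the variants above depending on the change
```
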
clarkb | ok let's ignore that one for now because it's relatively minor and not easily fixable. https://review.openstack.org/#/c/519949/1 is the fix for the thing that prevented zuul from starting | 23:12 |
clarkb | I think ^ is more important as it impacts the ability to run zuul | 23:12 |
clarkb | (also if anyone is wondering we think we tracked back the source of the OOM to puppet openstack pushing a ton of job changes all at once which I think is a known issue, we asked OSA to push them slowly) | 23:13 |
jeblair | clarkb: i suspect that error came in due to mordred's git surgery? | 23:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 23:15 |
jeblair | clarkb: i was wondering, thanks | 23:15 |
clarkb | jeblair: ya I think .zuul.yaml must've come from os-c-c | 23:15 |
clarkb | but that should still have failed pre merge no? | 23:15 |
jeblair | clarkb: we should be in much better shape than when osa did it originally, but still, lots of job changes can use lots of ram | 23:15 |
jeblair | clarkb: i think mordred did git surgery to create that branch and push it directly | 23:16 |
clarkb | ah | 23:16 |
clarkb | that may be the piece I am missing | 23:16 |
jeblair | it is likely that caused zuul to get stuck on an old config as well | 23:16 |
jeblair | (it would have kept running with the last config it was able to fully load) | 23:16 |
clarkb | lesson here then is be very careful with force merges | 23:16 |
* mordred reads | 23:17 | |
jeblair | one thing that will make this very specific case better: we plan to drop the project name from in-repo config files | 23:17 |
mordred | AH. derp | 23:17 |
mordred | yah. that's my bad for sure | 23:18 |
jeblair | but the general problem of dealing with erroneous configs remains. i think we'll eventually have to do something like have zuul automatically remove projects with broken configs. once we have the dashboard, there will be a nice place to have a big red warning that that has happened. | 23:18 |
mordred | we can actually delete that branch already ... it was just there to prevent gerrit from creating thousands of gerrit changes | 23:18 |
jeblair | (that could have lots of follow-on effects, like breaking other projects, but i don't think there's anything else that could be done) | 23:19 |
mordred | in fact, how about I go ahead and delete it now | 23:19 |
mordred | jeblair: ++ | 23:19 |
SpamapS | sorry to miss the meeting.. had a long drive and we got moving late :-P | 23:46 |
jeblair | no meeting while driving! :) | 23:46 |
SpamapS | FYI, I use my zuul to test our ansible deployment stuff, which is pinned at ansible 2.1 | 23:56 |
SpamapS | I just install ansible in a virtualenv on a node and run it. | 23:56 |
SpamapS | Which is good, because I wouldn't want to mix the concerns of Zuul with the concerns of my k8s deploying ansible code. | 23:57 |
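
[A minimal sketch of the nested-ansible approach SpamapS describes, pinning the version independently of the Ansible zuul itself runs; the virtualenv path, version range, repo checkout, and playbook name are all hypothetical:]

```yaml
- hosts: all
  tasks:
    - name: Install the pinned Ansible into a virtualenv on the node
      pip:
        name: "ansible>=2.1,<2.2"
        virtualenv: /home/zuul/nested-ansible

    - name: Run the deployment playbook with the nested Ansible
      command: /home/zuul/nested-ansible/bin/ansible-playbook site.yml
      args:
        chdir: /home/zuul/src/deployment-repo
```
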