Tuesday, 2017-11-14

leifmadsenplease note there is a lot of precursor "quickstart" notes here: https://etherpad.openstack.org/p/zuulv3-quickstart00:00
leifmadsenSpamapS: ^^00:02
leifmadsenI need to get a couple small things out of the way, but then a "omg I just want to run some basic Zuul" is very high on my list, primarily because as every day passes, I realize I need it more and more00:02
leifmadsenNext step in my notes is mostly just "run a job". Based on some examples I've seen, I think that's going to be relatively straightforward. Once that works, then it's basically me turning the notes into some documentation, which I'm not too worried about, as I've done a lot of that part.00:03
tristanCleifmadsen: hey, fwiw we are about to release software-factory 2.7, and you could get a running v3 setup with base jobs already configured with a logserver in 3 commands00:08
leifmadsenyea... not really what I'm looking for though00:08
leifmadsenI understand everyone has push button infra now :)00:08
tristanC"omg I just want to run some basic Zuul" sounds like what you need is actually a push button infra now :)00:11
leifmadsendoes it do GitHub integration too?00:11
leifmadsennot looking to run gerrit00:12
tristanCif you add github apitoken and webhook secret to the sfconfig.yaml, sf will add github connections and pipeline to zuul config00:13
leifmadsenI just think SF is going to have far too many moving parts for what I really want to explore00:13
leifmadsenalso, that doesn't help the "document zuul" effort :)00:13
leifmadsenso it's more than just "run zuul"00:13
leifmadsenotherwise, I'd look more at Windmill, or Hoist, or XYZ00:14
SpamapStristanC: I'm very excited that you are doing that. :)00:15
SpamapSI wanted to use SF00:15
SpamapSbut it was still too weird for me so I just fell back on BonnyCI/hoist00:16
SpamapSleifmadsen: I've got it on my list of things to work on tonight, where I'll have some quiet time in a hotel room. :)00:16
SpamapSI have written up the entire bootstrap procedure in a GoDaddy context....00:16
leifmadsenSpamapS: cool, well I'd encourage you to go through what I've gotten so far for sure00:16
leifmadsenthe idea being to run a single VM, that gets events from GitHub, and trigger it to run a "Hello world" ansible playbook00:17
SpamapSJust need to genericise the parts that are like "Go to the system that allocates service accounts in AD and get one in group Z" and make those "you'll need a user account on your cloud that can do A, B, C"00:17
leifmadsenI gotta go put the kids to bed, or I'd elaborate more, but had a talk with mordred and jeblair before we started, so they know the general approach00:17
leifmadsenSpamapS: that's exactly what I'm trying to avoid :)00:17
pabelangermuch backscroll00:18
SpamapSpabelanger: most of it is just wanking from me. ;)00:21
leifmadsenSpamapS: so I didn't read the whole scrollback, but what confuses me, is why master == 2.0, and feature/zuulv3 exists at all :)  I'd actually have almost thought it'd be the other way around, and there be stable/2.0 and master (future 3.0)00:23
leifmadsenI'm sure it's been asked and answered a 1000x though00:24
pabelangerleifmadsen: FWIW: at summit, we were really close to getting hello world job going on zuulv3 for hands-on workshop. We were lacking an openstack cloud to run jobs on a node. I'm hoping next time we can demo, something like OCI nodepool driver will be finished, or I'll ask openstack passport program to offer up some cloud resources00:26
mordredpabelanger: or both!00:26
leifmadsenpabelanger: didn't have an RDO Cloud login?00:27
leifmadsenthat's what I've been using anyways00:27
pabelangerleifmadsen: we had 20 users running their own zuulv3, so the idea was each would have their own cloud creds00:27
pabelangermordred: ++00:27
leifmadsengotcha00:27
mordredpabelanger: one of the things that stood out to me doing the walkthrough with leif earlier is that a getting started guide that can use OCI or static or something similar as a step one, with "now you can plumb in your clouds" as step two feels like a nice incremental approach - once we can do that00:27
pabelangerleifmadsen: also, might be interested in https://git.openstack.org/cgit/openstack-infra/publications/commit/?id=4f0a375f966171a81be4a1c76983c71359e037a1 for example playbooks for jobs.  I did that for my JJB to ansible playbooks talks00:28
leifmadsensomeone (SpamapS?) mentioned the other day that I could just run against the executor itself too00:28
pabelangermordred: right, I think that is in line with what we talked about too00:28
mordred"here's how you can get a zuul on your laptop that will run content all on your laptop with no clouds" - then "here's how you can add clouds" and then "here's how you can add managed base images" ...00:28
leifmadsenpabelanger: cool, might be useful for TOAD I guess00:28
SpamapSpabelanger: did you just do your hello world job without a nodeset?00:28
leifmadsenmordred: +∞00:29
SpamapSBecause I was just poking at doing a few dumb jobs on the executor today.00:29
SpamapSLike, I have a few that just validate YAML00:29
pabelangerSpamapS: we didn't, just noops, due to time. The plan was to use a trusted playbook on the executor00:29
SpamapSdon't need to install anything, just some python. Don't need a node for that. :)00:29
leifmadsenI probably need to back up at some point and just drop nodepool entirely in the quickstart00:29
pabelangerleifmadsen: yah, that's what we did.  I was hoping to demo with RDO cloud on my laptop, but ran out of time00:30
clarkbmordred: static to localhost would be super easy00:30
pabelanger90mins goes fast!00:30
clarkbmordred: and not require any additional setup or software00:30
pabelangeri should check if devconf.cz has a hands-on workshop session too00:32
leifmadsenalso, I've been wanting to run this on Fedora instead of Ubuntu00:32
leifmadsenFedora has gone well. Pretty sure I'll abandon CentOS for now.00:32
pabelangeryah, I've dropped centos for now00:32
leifmadsenmaybe add some sidebar stuff later, but automating on Fedora is much easier00:32
pabelangerbut fedora works great00:32
leifmadsenI mostly have zero interest in Ubuntu stuff00:32
clarkbthe only difference I would expect between the two is bubblewrap00:33
clarkbeverything else should be fairly transparent00:33
clarkbvirtualenv, run process, win00:33
pabelangerclarkb: yah, having fedora shipping bwrap is nice. We need to get our bwrap into backports for xenial00:34
pabelangeralso, devconf.cz does have working groups :) We should totally do an installfest for zuulv300:34
clarkbbwrap works great on tumbleweed too00:34
pabelangerclarkb: when are we getting a DIB :D00:35
clarkbpabelanger: dirk says it's up for review00:35
pabelangercool00:35
clarkbit's on my list of things to do now that I'm home00:35
clarkb(review the change)00:36
pabelangerokay, submitted JJB to ansible talk again to devconf.cz00:48
*** jkilpatr has quit IRC01:05
SpamapSMy Zuul runs on CentOS 703:24
SpamapSwith bubblewrap from rawhide03:24
SpamapSpabelanger: ahh.. a week in snowy Brno. ;)03:32
pabelanger:)03:33
SpamapSahhhh.. sweet sweet 1st class upgrade (even on a 60 minute flight.. so good)04:16
pabelangerWhere did we land on the tox with sudo job, did that ever get resolved?04:34
SpamapSinteresting07:13
SpamapSI tried to make an executor-only job that runs a ruby program, but ruby does not like running in the bwrap07:13
SpamapShttp://paste.openstack.org/show/626236/07:13
* SpamapS heads to bed to ponder this in dreamland07:14
*** xinliang has quit IRC07:21
*** xinliang has joined #zuul07:33
*** xinliang has quit IRC07:33
*** xinliang has joined #zuul07:33
openstackgerritMerged openstack-infra/zuul-jobs master: Add role to build Puppet module  https://review.openstack.org/51948907:56
*** bhavik has joined #zuul08:42
openstackgerritRui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts  https://review.openstack.org/51958208:47
openstackgerritRui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts  https://review.openstack.org/51958208:57
*** bhavik has quit IRC09:23
*** jianghuaw has quit IRC09:26
*** rbergeron has quit IRC09:34
*** rbergeron has joined #zuul09:34
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Allow run to be list of playbooks  https://review.openstack.org/51959609:53
pabelangerfun review if anybody else is awake09:53
tobiashpabelanger: this looks ok to me but I didn't fully understand your use case10:05
tobiashthe use case described in the commit message should also be possible with one playbook containing multiple plays10:06
tobiashor did I overlook something?10:06
openstackgerritAkihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test  https://review.openstack.org/51887911:27
openstackgerritAkihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test  https://review.openstack.org/51887911:31
odyssey4meShrews is the request/response protocol documented anywhere? if that's the API, I'd rather be using it...11:46
*** tobiash has quit IRC12:04
*** tobiash has joined #zuul12:06
*** tobiash has quit IRC12:06
*** tobiash has joined #zuul12:07
*** jkilpatr has joined #zuul12:42
Shrewsodyssey4me: it's described in http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html but there is no proper documentation of the protocol itself12:47
Shrewssomething we should strive to correct12:47
Shrewsbut it's sort of fluid right now until we get a proper release12:48
odyssey4meShrews ah, ok - I think we can work that in... it definitely makes sense to... but yeah, using the same protocol as zuul for requests to nodepool makes a lot of sense12:50
Shrewsodyssey4me: this is the class zuul uses for the requests: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n52612:51
Shrewsodyssey4me: and this is what nodepool uses for what it expects from a request and uses for fulfillment: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n34812:52
Shrewsthe zuul class should be at least a subset of the nodepool class12:53
odyssey4methanks Shrews - I'll be getting back to that in a week or two. I've unfortunately got to context switch right now to something else. :/12:55
mordredodyssey4me: darned context switching12:57
odyssey4meyup, comes with the territory...12:57
mordred++12:57
mordredpabelanger: I agree with tobiash - the patch looks fine but I don't fully understand the words from the commit message12:59
mordredpabelanger: the foo and bar playbooks you have in the test could totally be two different plays in the same playbook ... that said, I don't see a reason why run shouldn't be able to take a list13:00
mordredpabelanger, tobiash: also, I think we should sit on that one until jeblair gets back - I'm not sure if there was an active reason run was a single playbook and not a list13:04
openstackgerritAkihiro Motoki proposed openstack-infra/zuul-jobs master: Fix npm-run-test  https://review.openstack.org/51887913:19
rcarrillocruzmordred, pabelanger : are we good to +A https://review.openstack.org/#/c/453968/1113:33
* SpamapS still trying to figure out bubblewrap + ruby fail :-P13:35
rcarrillocruzShrews: pushed revision on https://review.openstack.org/#/c/500800/5 last night, pls have a look when you get a sec13:38
mordredSpamapS: oh good luck with that :)13:38
SpamapSinteresting...13:42
SpamapSso on the executor, if I rw mount /var/lib/zuul into the bwrap, ruby works fine13:42
SpamapSsuggesting that ruby uses $HOME weirdly13:42
SpamapSyep, that's it13:43
SpamapSwell that at least simplifies things. :)13:43
SpamapSI wonder if we should set $HOME in the bubblewrap driver.13:43
SpamapSwe change it in /etc/passwd13:43
tobiashSpamapS: in this case we should probably13:44
SpamapSYeah, easy patch, and I can't see the harm in it.13:46
SpamapS/var/lib/zuul is totally inaccessible otherwise.13:46
mordredSpamapS: yah - I think our intent is to make the workdir appear as $HOME - so if we also need to set the variable to get that to happen, seems sane to me13:48
mordredSpamapS: setting $HOME should avoid the need to rw mount /var/lib/zuul if I'm reading you right, yeah?13:49
SpamapSmordred: yeah, and mounting /var/lib/zuul would be counter to the bwrap mission in this case. :)13:50
SpamapS(Or we could bind mount work_dir on top of $HOME .. but I kind of dislike that)13:50
SpamapSwe rewrote it in /etc/passwd, I think rewriting it in the environment makes a lot of sense.13:51
SpamapSmordred: my intention here is to test out running nodeless jobs that do very little.13:51
SpamapSthis one just runs a silly markdown linter written in ruby13:51
SpamapSSo I'm thinking, just install that script on the executors and let it run on localhost. Also means we can skip most of the stuff I have in my usual base job that verifies nodes and pushes source.13:53
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Override HOME environment variable in bubblewrap  https://review.openstack.org/51965413:57
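The fix discussed above (keep /var/lib/zuul out of the sandbox and point HOME at the work dir instead) can be sketched as follows. This is a minimal illustration, not Zuul's actual bubblewrap driver: the function name bwrap_command and the example paths are hypothetical, and only the bwrap options --bind, --chdir and --setenv are assumed.

```python
def bwrap_command(work_dir, inner_command):
    """Build a bwrap argv that exposes only work_dir and points HOME at it.

    Hypothetical helper for illustration; it only builds the argument
    list and does not run bwrap.
    """
    return [
        "bwrap",
        "--bind", work_dir, work_dir,   # the job's work dir is the only writable mount
        "--chdir", work_dir,
        "--setenv", "HOME", work_dir,   # programs that consult $HOME (such as ruby) end up here
    ] + list(inner_command)


# Example paths and command are made up for the sketch.
cmd = bwrap_command("/tmp/zuul-work", ["ruby", "lint.rb"])
```

With HOME set in the environment, nothing under /var/lib/zuul needs to be mounted read-write, which matches the intent of making the work dir appear as the home directory.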
*** hashar has joined #zuul13:58
*** hashar has quit IRC14:00
*** jianghuaw_ has joined #zuul14:03
*** dmsimard|off is now known as dmsimard14:05
SpamapSUgh, cancelling a job that is being watched ... did not go well14:13
SpamapShttp://paste.openstack.org/show/626270/14:13
mordredSpamapS: I've got some refactoring of that stack on my TDL for this week14:14
SpamapSI think I haven't pulled in a while too14:14
SpamapSBeen avoiding pulling until I can CI my CI14:14
SpamapSmordred: if you can make stream.html just retry over and over if it gets an empty log.. that would be great. I constantly click it about 1s before output has started.14:15
SpamapShitting cmd-R is..hard?14:15
leifmadsenSpamapS: ruby? doing something weird? say it ain't so!14:21
tobiashSpamapS: I thought I had a fix for that14:22
leifmadsenSpamapS: but once you have CI for your CI, how will you test your tester?14:23
* leifmadsen ducks14:23
leifmadsenok, so I think I got far enough that I have a "hello world" job and a dummy base job to load it from. How do I go about ignoring the nodepool stuff for now and running directly on the executor? I'm not able to run the job right now because Zuul complains that it doesn't have permission to ssh-add /root/.ssh/id_rsa. That seems like a simple enough configuration problem, but I think it only does that because it is14:24
leifmadsentrying to connect to the nodepool nodes?14:24
tobiashSpamapS: https://review.openstack.org/#/c/514617/ that should have fixed the empty log14:25
tobiash(if this was the only cause)14:25
SpamapSleifmadsen: when I have CI for my CI I will test my tester by CD'ing my CI'd CI.14:30
SpamapSleifmadsen: also, just using a markdown linter.14:30
SpamapSIf there's a better one in python, I'll use that. :)14:30
leifmadsen:buddy_jesus_thumbs_up:14:30
SpamapStobiash: ah yah, just need to restart zuul-executor to get that one. :)14:32
tobiash:)14:32
SpamapSOh right.. can't use command: on localhost on untrusted jobs. Well poo. I just need a container thingy then.. trusted jobs are a pain.14:38
* SpamapS goes back to running it on a tiny VM14:39
kklimondaI’d like to use zuul pipeline and dependent jobs to introduce “checkpoints” (so that zuul skips part of the pipeline that has succeeded on the previous failure and restarts from the first failed job) - i don’t think that’s possible now, but are there any potential risks in the scheduler I should be aware of before I start looking at the code?14:42
SpamapSkklimonda: that would require keeping state somewhere.14:43
SpamapSkklimonda: but, what restart are you thinking that you want to avoid re-testing?14:44
kklimondagood point, that’s correct :)14:44
kklimondain the gate pipeline we’ll be building packages, then docker containers and then running integration testing14:45
kklimondaThat’s 50 minutes for packaging, another 30+ (probably more) to get containers ready14:46
kklimondaAll this to rerun a flaky test14:46
kklimondaand we do that for 4 different distros, and different OS flavors14:48
kklimondaA failure is costly14:48
kklimondaSo now in our zuul2 setup our jobs check for the existence of artifacts and return early14:48
kklimondaBut it would be nice to make zuul aware of that so it’s not reporting jobs as taking 30 seconds14:49
kklimondaI have thought of adding a different job return status (EXIT_EARLY) to indicate that the job didn’t run (I’m not sure about overloading SKIPPED) but making zuul more aware of the pipeline status would be even nicer14:51
kklimondaObviously, after your comment that’s not something b14:51
kklimondaThat can be done relatively easy*14:52
*** jkilpatr has quit IRC14:53
SpamapSkklimonda: I think the way you're doing it is pretty nice actually. Your artifact repository is keeping state for you.14:54
SpamapSkklimonda: I'd be concerned about missing changes though. What ID do you use to store/fetch them?14:55
kklimondahmm, right now it’s a tulle of (change, patchset, job name)14:56
kklimondaTuple*14:56
kklimondaI believe, I’ll double check when I get to the computer - what should I keep in mind ?14:57
SpamapSSo, that's going to break if you have a long dependent pipeline.15:00
SpamapSIf you're single-repo, it's fine.15:00
SpamapSBut if you have 2 repos in there.. those won't change, but the parent may.15:00
SpamapSI've struggled with this a lot with Zuul actually. Need something that stays with the build from the first moment through to after the merge.15:01
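The concern about keying artifacts on (change, patchset, job name) can be illustrated with a short sketch. It assumes a hypothetical artifact_key helper and made-up change numbers; nothing here is a Zuul feature. The point is only that a key which ignores the changes queued ahead keeps matching old artifacts when one of those changes is updated.

```python
import hashlib


def artifact_key(change, patchset, job_name, changes_ahead=()):
    """Return a cache key for a job's build artifacts.

    changes_ahead is a sequence of (change, patchset) pairs for the items
    queued ahead in a dependent pipeline. With an empty sequence the key
    depends only on (change, patchset, job name), as in the discussion.
    """
    parts = [f"{change},{patchset}", job_name]
    parts += [f"{c},{p}" for c, p in changes_ahead]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Same change and patchset, so the plain key is stable across runs.
plain_a = artifact_key(1001, 3, "build-packages")
plain_b = artifact_key(1001, 3, "build-packages")

# The change ahead gets a new patchset; only the extended key notices.
with_parent_v1 = artifact_key(1001, 3, "build-packages", [(2002, 1)])
with_parent_v2 = artifact_key(1001, 3, "build-packages", [(2002, 2)])
```

In a single-repo setup the plain key is enough; once other repos sit ahead in the queue, something like the extended key is needed to avoid reusing artifacts built against a different parent.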
leifmadsenok that's weird... I killed the zuul-executor, and restarted it, and now it just dies in the background...15:01
leifmadsenoh wait, I know why15:03
* leifmadsen facepalms15:03
leifmadsenI killed it, so there was no removal of the /var/run/ file15:03
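A leftover file after a hard kill is a common reason for a daemon refusing to start. Assuming the file in question is a PID file (the log does not say which file it was), a stale one can be detected with a sketch like this; pidfile_is_stale is a hypothetical helper for POSIX systems, not part of Zuul.

```python
import os


def pidfile_is_stale(path):
    """Return True if path is a PID file whose process is no longer running."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # no readable PID file, so nothing to clean up
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks that the process exists
    except ProcessLookupError:
        return True  # the recorded process is gone
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False
```

If the check returns True, removing the file before restarting the daemon is usually safe.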
kklimondaSpamapS: ha, interesting point - that makes it all a non-starter basically. Thanks15:07
kklimondaright now we are not utilizing dependent pipelines and cross-repo dependencies but that's one of the requirements for the newer system.15:08
kklimondabut when you think about it, right now it's also broken15:10
kklimondawell, we can always have a periodic job that will catch anything that slips through cracks ¯\_(ツ)_/¯15:24
kklimondaI think I'll need a drink15:24
*** jkilpatr has joined #zuul15:58
*** hashar has joined #zuul15:59
rbergeronspamaps: I believe i have located the magical karaoke location nearby-ish if you haven't identified a place yet :)16:00
*** jkilpatr has quit IRC16:21
*** jkilpatr has joined #zuul16:29
*** jkilpatr has quit IRC16:46
pabelangertobiash: mordred: yah, I figured commit message would need more details, happy to explain.  Today, this is the inventory file I use, http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory along with the entry point for ansible-playbook: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/site.yaml. Because the way ansible loads includes, if I setup a single playbook run, with17:04
pabelangerhttps://review.openstack.org/#/c/519596/1/tests/fixtures/config/ansible/git/common-config/zuul.yaml as nodesets, this means a single SSH connection will be used for ansible-playbook runs.  Which messes up my variable include structure, the playbooks were written to either be run on a single host or inventory file with multiple different hosts.  I can use the following nodeset also,17:04
pabelangerhttps://review.openstack.org/#/c/519539/16/.zuul.d/jobs.yaml but it means I need to consume 6 nodes in nodepool.  I am hoping ansible 2.4 might fix some of the variable issues, but switching to include_playbooks vs include17:04
pabelangerSo, happy for feedback / suggestions, but in my testing last night, this seems to be the only way to get ansible playbooks to work in zuulv3 from executor17:06
openstackgerritMerged openstack-infra/nodepool feature/zuulv3: Add username to build and upload information  https://review.openstack.org/45396817:12
*** clarkb has quit IRC17:14
Shrewsrcarrillocruz: i believe your last patch set on 453968 to fix the tests that you self approved exposes an issue, or at least an inconsistency17:16
Shrewsrcarrillocruz: i'm going to put up a fix17:16
*** hashar is now known as hasharAway17:17
*** clarkb has joined #zuul17:22
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Be consistent with the ZK data model  https://review.openstack.org/51970617:27
Shrewsrcarrillocruz: ^^^17:27
SpamapSrbergeron: woot!18:13
rbergeronspamaps: how goes your kool-aid drinking? is this your official orientation stuff?18:21
rbergeronor was that in sunnyvale a while ago18:21
*** jkilpatr has joined #zuul18:30
mordredpabelanger: how would you split site.yaml into multiple playbooks otherwise? or would you basically put each of the playbooks listed in your site.yaml in the run list?18:55
pabelangermordred: yah, each in the run list for now is what I was thinking. The other option is to somehow allow us to generate an inventory file like: http://git.openstack.org/cgit/openstack/windmill/tree/playbooks/inventory but have the same IP for the ansible_host setting of each node.  Which might be something good to support regardless.18:59
pabelangerwill be single node in nodepool, but allow ansible to think it is multiple hosts19:00
mordredpabelanger: you can do that already ...19:01
mordredpabelanger: just make a nodeset with a single node and that node assigned to multiple groups19:01
pabelangermordred: like https://review.openstack.org/#/c/519539/14/.zuul.d/jobs.yaml ?19:03
mordredpabelanger: yup!19:06
pabelangermordred: right, so that does not work as I would expect, as it relates to variable scoping. I think because ansible only uses a single SSH connection, variables are not reset between play runs as I would expect with running ansible-playbook multiple times19:07
mordredpabelanger: gotcha19:07
pabelangermordred: the good news is, using 5 nodes in nodepool, playbooks work as expected19:09
mordred\o/19:10
Shrewsconverting legacy devstack jobs to use the new native parent job is not as well documented as one would like19:49
Shrewssomething better than zero would be nice19:50
dmsimardianw: so what we do for testing zuul callbacks right now is to set up a "nested" zuul in a multinode job but we're just running ansible-playbook, we're not actually exercising the executor20:00
dmsimardianw: nodeset and job is here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/.zuul.yaml?h=feature/zuulv3#n1620:01
dmsimardI don't know to what extent we could use this approach to exercise base jobs or trusted playbooks from an executor's POV20:01
dmsimardmordred, jeblair: have there been any new ideas regarding how to test config-repo/trusted execution ? The fact that these aren't integration tested still bothers me very much.20:02
dmsimardI haven't come up with ideas that don't involve somehow setting up a nested zuul and, like, enqueuing a job manually.. but then that also requires an openstack cloud, ugh20:03
mordreddmsimard: once we land static node support, we should be able to just make a two-node job that installs zuul on one, registers the other as a static node and then runs a job20:07
SpamapSrbergeron: yes, kool-aid drinking happening20:07
ianwmordred: are there changes out there for that already?20:10
ianwjust thinking that maybe the nodepool dsvm jobs maybe aren't that far off having a zuul added to them20:13
mordredianw: https://review.openstack.org/#/c/468624/20:16
dmsimardmordred: so then what ? the nested zuul has to load all the 2000 repository worth of configuration before being able to run and stuff ?20:19
mordreddmsimard: nah - we'd just write out a config file with only a few repos listed I'd guess - or potentially even just use the git driver and point it at repos on the local disk *waves hands*20:20
mordreddmsimard: I mean, I don't know the full answer yet, but I know we should at least be able to do something once static node is there without needing an openstack :)20:21
* dmsimard nods20:21
dmsimardprogress20:21
mordredyah.20:21
mordredbaby steps20:21
pabelangerI'd be totally interested on how we'd trigger a job run in zuul for that20:22
*** hasharAway has quit IRC20:23
mordredpabelanger: magic?20:23
pabelangermaybe fedmsg :)20:23
mordred:)20:23
mordredpabelanger: fedmsg is becoming the new AFS ... it's the answer we roll out for all the questions :)20:24
pabelangerindeed!20:24
dmsimardpabelanger: I was thinking just ... "zuul enqueue thing thing"20:37
dmsimardShrews: seeing NodeExists errors in our zookeeper.. which apparently only exists here: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/zk.py?h=feature/zuulv3#n130820:54
dmsimardShrews: if I stop the (only) launcher we have, I stop seeing the error in the zk logs.. but they resume as soon as I start nodepool. I tried nothing and I'm all out of ideas.20:55
dmsimardfor example: http://paste.openstack.org/raw/626294/20:56
Shrewsdmsimard: you tried nothing?  lol20:59
Shrewsdmsimard: what does your config look like?20:59
Shrewsyour pools need unique names20:59
Shrewsi'm guessing you have 2 pools under the np3 provider named "main"21:02
Shrewsor you have two providers named "np3" (which is less likely, but possible i guess)21:03
dmsimardShrews: I was mostly kidding, just not sure where to look. As far as I can tell, this is a config with one pool and one launcher21:06
dmsimardI can pull up some configs, hang on21:06
Shrewsdmsimard: if it's not your config, then i'll have no idea since that's the only thing that i know it could possibly be21:08
dmsimardShrews: this is the nodepoolv3 nodepool.yaml: http://paste.openstack.org/raw/626296/21:09
Shrewsdmsimard: if that's your config, and you're sure you're starting only a single launcher, that is truly a mystery then because that should not be possible21:11
Shrewseven with multiple launchers that shouldn't happen because the process ID is part of the launcher ID21:12
dmsimardShrews: /me googles how to query zk21:13
Shrewsdmsimard: oh, wait21:13
Shrewsthat's not a nodepool error message21:13
Shrewsthat's from zk21:13
Shrewsdmsimard: it's fine, unless your nodepool is not working21:14
dmsimardyeah in fact the issue is a NODE_FAILURE but there's no real hints in zuul or nodepool logs21:14
dmsimardat least afaict21:14
dmsimardso I was digging around21:14
Shrewsdmsimard: the main pool worker loop always tries to register itself, but ignores that error21:15
dmsimardShrews: ah, okay, so red herring.21:15
Shrewsyeah21:15
Shrewsdmsimard: http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/launcher.py?h=feature/zuulv3#n26621:16
dmsimardah, okay.21:16
dmsimardShrews: this is the logs I'm getting: http://paste.openstack.org/raw/626297/21:16
dmsimardthe debug logs aren't particularly helpful, only adding a line of context: http://paste.openstack.org/raw/626298/21:20
Shrewsdmsimard: i can't tell much from that, but given the config you pasted, and that output, i'm guessing you're asking for a "centos-oci" node type, but that is not a valid label for that launcher.21:21
Shrewsv3-dib-centos-7 is defined21:21
dmsimardoh man that's it21:21
dmsimard2017-11-14 21:21:21,281 DEBUG nodepool.driver.openstack.OpenStackNodeRequestHandler[zuulv3.27-rc1.com-29912-PoolWorker.np3-main]: Declining node request 200-0000000007 because node type(s) [centos-oci] not available21:21
dmsimardthat probably should be raised from debug ?21:21
dmsimardI mean, INFO or something21:21
Shrewsyeah, was gonna say there should be a message21:22
Shrewsi think debug is appropriate, IMO21:22
clarkbzuul side should probably error though?21:22
clarkbit's not an error on the nodepool side, but if zuul is requesting invalid labels it would be an error to zuul?21:23
Shrewszuul knows (and logs) that the request failed, but doesn't know the reason21:23
Shrewswe don't pass that info thru zk21:23
clarkbah21:24
dmsimardthe failure reason should be obvious without having to turn on debug :/21:27
clarkbdmsimard: I agree. It's just that it's not an error for nodepool to get bad requests (sanitizing and handling external input and all that)21:31
clarkbso the error should be raised on the requestor side imo21:31
clarkbwould it make sense to always treat it as an error in zuul if the request can't be fulfilled from nodepool? regardless of reason?21:32
Shrewsdmsimard: it's debug because you can have multiple providers, each with different labels, and any one of them can handle the request. we'd be littering the INFO with extraneous entries that would make it noisy21:40
Shrewsso if you have 4 providers, 3 could potentially decline the request because "invalid label", but the 4th might handle the request just fine21:41
Shrewsthe problem with returning that info back to zuul is each provider pool might decline it for different reasons.21:43
clarkbya I think what is more important on the zuul side was I asked for X and everything declined it21:43
clarkbthe error would be I couldn't21:43
clarkb get the resource I asked for anywhere21:43
Shrewsclarkb: yeah, something more substantial on the zuul side would be nice21:44
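The behaviour being asked for (one pool declining is routine, every pool declining is an error for the requestor) can be sketched like this. The function fulfil and the exception NodeRequestFailed are hypothetical names for illustration and do not reflect how zuul and nodepool actually exchange requests through ZooKeeper; the pool and label names are taken from the log above.

```python
class NodeRequestFailed(Exception):
    """Raised on the requestor side when no pool can supply the labels."""


def fulfil(requested_labels, pools):
    """Return the name of the first pool that offers every requested label.

    pools maps a pool name to the set of labels it offers. A single
    decline is normal, since another pool may accept; only a decline by
    every pool is treated as an error.
    """
    declines = {}
    for name, labels in pools.items():
        missing = sorted(set(requested_labels) - set(labels))
        if not missing:
            return name
        declines[name] = missing  # remember why this pool said no
    raise NodeRequestFailed(
        f"no pool offers {sorted(requested_labels)}; declined by: {declines}")


# The launcher config from the log only defines v3-dib-centos-7.
pools = {"np3-main": {"v3-dib-centos-7"}}
```

Requesting "centos-oci" against these pools raises NodeRequestFailed with the per-pool reasons attached, which is the information that was only visible at DEBUG level on the nodepool side.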
*** zigo has quit IRC21:59
*** zigo has joined #zuul22:01
*** threestrands has joined #zuul22:15
*** threestrands has quit IRC22:21
