*** herlo has quit IRC | 02:59 | |
*** herlo has joined #zuul | 03:01 | |
*** herlo has quit IRC | 03:01 | |
*** herlo has joined #zuul | 03:01 | |
*** saneax-_-|AFK is now known as saneax | 03:04 | |
*** saneax is now known as saneax-_-|AFK | 03:32 | |
*** Cibo_ has joined #zuul | 04:03 | |
*** bhavik1 has joined #zuul | 04:41 | |
*** bhavik1 has quit IRC | 04:49 | |
*** saneax-_-|AFK is now known as saneax | 04:51 | |
*** saneax is now known as saneax-_-|AFK | 05:16 | |
*** saneax-_-|AFK is now known as saneax | 06:02 | |
*** TheJulia_ has joined #zuul | 06:29 | |
*** patrickeast_ has joined #zuul | 06:29 | |
*** TheJulia has quit IRC | 06:33 | |
*** patrickeast has quit IRC | 06:33 | |
*** TheJulia_ is now known as TheJulia | 06:33 | |
*** patrickeast_ is now known as patrickeast | 06:33 | |
*** abregman has joined #zuul | 06:35 | |
*** abregman has quit IRC | 09:01 | |
*** Cibo_ has quit IRC | 09:02 | |
*** openstackgerrit has quit IRC | 09:03 | |
*** bhavik1 has joined #zuul | 09:11 | |
*** bhavik1 has joined #zuul | 09:11 | |
*** hashar has joined #zuul | 09:17 | |
*** abregman has joined #zuul | 09:34 | |
*** bhavik2 has joined #zuul | 10:06 | |
*** bhavik1 has quit IRC | 10:08 | |
*** bhavik2 has quit IRC | 10:11 | |
*** Cibo_ has joined #zuul | 11:44 | |
*** openstackgerrit has joined #zuul | 13:29 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 13:29 |
---|---|---|
Shrews | jeblair: 431523 is where I might need some guidance in using some fakes for the test_nodelaunchmanager tests | 13:30 |
mordred | Shrews: you ready for the game tonight? | 13:38 |
Shrews | mordred: oh? is there some sort of sportsball event occurring in this area tonight??? I hadn't heard | 13:39 |
* Shrews deletes the snark and changes answer to "hell yes" | 13:40 | |
mordred | :) | 13:40 |
mordred | nicely played | 13:40 |
Shrews | mordred: are we taking bets on which quarter Grayson trips someone? | 13:41 |
Shrews | err, half | 13:41 |
Shrews | i have to get out of football mode | 13:41 |
mordred | Shrews: it's the UNC game - the answer should be 'both' right? | 13:48 |
mordred | Shrews: jesusaur has some review comments for you on https://review.openstack.org/#/c/428428/ that you might otherwise miss | 13:55 |
mordred | Shrews: (I believe they can be handled in a follow up if they're valid) | 13:56 |
Shrews | mordred: yep. getting around to responding | 13:58 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Assign node set to node requests https://review.openstack.org/428428 | 13:59 |
mordred | Shrews: k. I just +2'd that stack | 14:01 |
jeblair | they are good comments, i'm glad i saw them | 14:03 |
jeblair | Shrews: i will pitch in on 523 after breakfast | 14:03 |
Shrews | mordred: jeblair: i think all of his concerns are already addressed, except for the min-ready one which I'm a bit unclear on. But none of that is implemented yet, anyway. | 14:04 |
openstackgerrit | Merged openstack-infra/nodepool master: Support AFS mirrors for nodepool diskimages https://review.openstack.org/414273 | 14:05 |
*** saneax is now known as saneax-_-|AFK | 14:12 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove 'template-' from image name format https://review.openstack.org/431564 | 14:13 |
Shrews | pabelanger: mordred: ^^^ addresses the 'template' thing | 14:13 |
mordred | ++ | 14:15 |
Shrews | jeblair: re: 523, i really need access to the config Provider object, the dict of config Label objects, and the ProviderManager to pass to the launch manager. I have yet to divine how to do that (though I've gotten distracted by other tasks since I submitted that). | 14:16 |
Shrews | mordred: lemme make another cup of coffee and i'll get to your shade reviews | 14:16 |
mordred | mmm. coffee | 14:17 |
mordred | I shall also drink another cup of coffee | 14:17 |
jeblair | Shrews: well, all of those things used to be in the main config object | 14:18 |
*** jlk has quit IRC | 14:22 | |
openstackgerrit | Merged openstack-infra/zuul master: Fix setting of GIT_SSH for timer merge jobs https://review.openstack.org/430872 | 14:30 |
jeblair | Shrews: so i guess i'm not entirely sure what you're asking there or how i can help | 14:46 |
* Shrews context switches back | 14:46 | |
Shrews | jeblair: how do i grab this "main config" object? | 14:47 |
jeblair | Shrews: there's a path from the main nodepool class through the request handler and nodelaunchmanager to the node launcher, right? | 14:47 |
Shrews | it looked like i'd have to poke into the builder or nodepool internals to get it | 14:47 |
openstackgerrit | Merged openstack-infra/zuul master: Set keepalives for gerrit connections https://review.openstack.org/238988 | 14:49 |
Shrews | jeblair: well, yeah. but in the test scenarios, I don't have all those things | 14:49 |
Shrews | which is why i suggested the mocks, so i can simplify the test to only testing nodelaunchmanager | 14:49 |
jeblair | Shrews: ah, there's a useNodepool method that gets you a nodepool | 14:50 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Put loggers into zuul namespace https://review.openstack.org/431579 | 14:50 |
jeblair | that's in the test_node_assignment test | 14:51 |
jeblair | so it should be running a nodepool | 14:51 |
Shrews | jeblair: yeah. i was sort of hoping there was an existing way to avoid poking into those internals, but I can do it that way. | 14:52 |
jeblair | Shrews: are you asking about getting that information from within the test? | 14:53 |
Shrews | yes | 14:53 |
Shrews | i'm going to have to start a builder so that the launch code will work, so maybe i can poke into that object instead | 14:54 |
Shrews | i don't think i need to start nodepool | 14:55 |
openstackgerrit | Merged openstack-infra/zuul master: Log a warning when zuul.conf is misconfigured https://review.openstack.org/250270 | 14:55 |
jeblair | Shrews: how about just enhancing the TestNodepool tests (like test_node_assignment)? they should eventually cover all of this, right? | 14:56 |
openstackgerrit | Merged openstack-infra/zuul master: Tidy up loggers https://review.openstack.org/224336 | 14:56 |
jeblair | Shrews: (so basically, do test this with both a builder and a nodepool) | 14:56 |
Shrews | jeblair: yes, the TestNodepool tests will test the overall functionality. But I wanted the test_nodelaunchmanager tests to explicitly test the details of the launch manager. | 14:58 |
Shrews | i.e., i wanted to make sure ready_nodes and failed_nodes get set correctly in certain situations | 14:58 |
jeblair | Shrews: gotcha. then, yeah, the plan to borrow the config from a running builder (and/or nodepool) seems sound :) | 15:00 |
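
A minimal sketch of the narrower test Shrews describes: mock the provider side so that only the ready_nodes / failed_nodes bookkeeping is checked. Everything here apart from those two attribute names is an assumption made for illustration (the FakeLaunchManager class, the createServer call, the node names); it is not the nodepool implementation.

```python
from unittest import mock


class FakeLaunchManager:
    """Hypothetical stand-in for the launch manager discussed above."""

    def __init__(self, provider, labels, manager):
        self.provider = provider
        self.labels = labels
        self.manager = manager
        self.ready_nodes = []
        self.failed_nodes = []

    def launch(self, node):
        # Sort the node into ready or failed depending on the provider call.
        try:
            self.manager.createServer(node)  # hypothetical provider method
        except Exception:
            self.failed_nodes.append(node)
        else:
            self.ready_nodes.append(node)


def run_launches(fail_on=()):
    """Launch two fake nodes; nodes named in fail_on raise in the mock."""
    manager = mock.Mock()

    def create(node):
        if node in fail_on:
            raise RuntimeError("launch failed")

    manager.createServer.side_effect = create
    launch_manager = FakeLaunchManager(
        mock.Mock(), {"fake-label": mock.Mock()}, manager)
    for node in ("node-a", "node-b"):
        launch_manager.launch(node)
    return launch_manager
```

Calling run_launches(fail_on=("node-b",)) leaves "node-a" in ready_nodes and "node-b" in failed_nodes, without starting a builder or a nodepool.
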
*** saneax-_-|AFK is now known as saneax | 15:27 | |
*** abregman is now known as abregman|afk | 15:29 | |
*** saneax is now known as saneax-_-|AFK | 16:11 | |
jeblair | mordred: just looking at ps7 of 428798 -- do we still need the custom async action plugin? | 16:20 |
*** abregman|afk has quit IRC | 16:20 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 16:22 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 16:27 |
*** yolanda_ is now known as yolanda | 16:41 | |
Shrews | jeblair: I think we should add 'external_id' to your node definition from the spec. Will be needed to delete the instance I believe | 16:43 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 16:51 |
jeblair | Shrews: yes, that sounds like an inadvertent omission. | 16:56 |
mordred | jeblair: yes. we have to use our custom async/normal plugins to inject the log streaming into the test node for shell actions | 16:57 |
Shrews | pabelanger: were you going to begin on nodepool commands? I'm kind of to the point where I want to at least have 'list' working so I can re-enable the devstack test to see some output | 17:01 |
Shrews | i can put that up if you haven't had a chance yet | 17:02 |
jeblair | mordred: oh, because the 'async' module is the same for both async and not-async? | 17:02 |
jeblair | mordred: rather, our custom async plugin is run regardless of whether the task is async or not | 17:03 |
mordred | jeblair: yes. at least for us it is | 17:03 |
mordred | yup | 17:03 |
jeblair | mordred: ok. where's the bit that does the log streaming? | 17:04 |
SpamapS | jeblair: hm, my changes to land commits in upstream repos may be interacting with other tests. Those repos aren't unique per-test run right? | 17:04 |
jeblair | SpamapS: upstream repos should be unique per test | 17:04 |
SpamapS | jeblair: hrm. I have a weird timeout going on now. Only happens when all tests are run. | 17:05 |
jeblair | SpamapS: before you go too far down a hole -- this might possibly manifest like that: https://review.openstack.org/430456 | 17:05 |
SpamapS | Oh that seems highly likely | 17:06 |
jeblair | SpamapS: (at least, it's probably worth rebasing on that before doing further analysis, because that could potentially cause some nondeterminism) | 17:06 |
SpamapS | jeblair: it's manifesting in the enablement of test_timer_smtp and it's causing a hard timeout. | 17:06 |
SpamapS | jeblair: danke, trying exactly that | 17:06 |
mordred | jeblair: oh - sorry - the command module: zuul/ansible/library/command.py is for logging. zuul/ansible/action/async.py is for timeout | 17:07 |
jeblair | mordred: okay, that makes more sense! do we still want that timeout? at this point it's starting to feel sort of like us just fixing things about ansible... and instead we should maybe just rely on zuul's internal timeout in v3? | 17:09 |
jeblair | (the 'watchdog timeout' i guess) | 17:09 |
mordred | yes. I agree - I think we should just do the watchdog timeout | 17:10 |
jeblair | cool, that should simplify some things | 17:10 |
mordred | yah. thanks for the question - I'd forgotten that's how we broke that out - I'll fix that in just a smidge | 17:15 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 17:18 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement nodepool 'list' command https://review.openstack.org/431647 | 17:18 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: DNM: Re-enable devstack test https://review.openstack.org/431649 | 17:20 |
Shrews | ^^^ result of that should be fun | 17:20 |
* Shrews goes to get lunch | 17:20 | |
openstackgerrit | Monty Taylor proposed openstack-infra/nodepool master: Consume Task and TaskManager from shade https://review.openstack.org/414759 | 17:22 |
SpamapS | I need a bigger laptop to run zuul tests.. | 17:23 |
SpamapS | top - 09:23:09 up 6 days, 22:45, 2 users, load average: 8.91, 5.72, 2.95 | 17:23 |
SpamapS | jeblair: so, rebasing on that resulted in a test breaking, and still getting Alarm clock fail. | 17:24 |
jeblair | SpamapS: is the one you're working on the one that timed out? does it time out on its own? | 17:26 |
*** hashar has quit IRC | 17:27 | |
SpamapS | jeblair: hrm no, I think my rebase went awry somewhere | 17:28 |
SpamapS | let me try again | 17:28 |
* SpamapS assembles one patch at a time | 17:28 | |
*** jlk has joined #zuul | 17:29 | |
SpamapS | jeblair: 430456 fails when rebased on feature/zuulv3 | 17:30 |
SpamapS | http://paste.openstack.org/show/598271/ | 17:31 |
jeblair | SpamapS: ah, must have merged an unskip since tests ran. zuul-connections-multiple-gerrits.conf will need to be updated like the other conf files in 456 | 17:33 |
SpamapS | jeblair: want me to push that up? Changing that moved me forward. | 17:37 |
* SpamapS has to detach for a bit to take wife to airport.. bbl | 17:39 | |
jeblair | SpamapS: ++ | 17:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add action plugins to restrict untrusted execution https://review.openstack.org/428798 | 17:42 |
mordred | jeblair: ^^ that re-cleans up the non-necessary action plugins - still wip though (I should WIP the commit message) | 17:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP Add action plugins to restrict untrusted execution https://review.openstack.org/428798 | 17:43 |
jesusaur | mordred: thanks for pointing out my comments, i'm slowly starting to get my feet wet in the zuulv3 review pool | 18:00 |
jesusaur | but there's a lot to get caught up on | 18:00 |
mordred | \o/ | 18:01 |
jeblair | mordred: i just realized that at some point in the future (not now, but maybe not too far away) when someone submits a change to a .zuul.yaml file and it has a syntax error, zuul should be able to *leave a line comment on the change pointing out the syntax error in its own configuration* | 18:08 |
mordred | jeblair: ++ | 18:09 |
*** jlk has quit IRC | 18:46 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement nodepool 'list' command https://review.openstack.org/431647 | 18:54 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: DNM: Re-enable devstack test https://review.openstack.org/431649 | 18:54 |
Shrews | pabelanger: i went ahead and took care of 'nodepool list' command ^^^. You can base any future command enablement off of that | 18:57 |
Shrews | jeblair: and if my local enablement of the test_list_nodes unit test can be believed, we might actually be launching nodes now. hoping to see the output from 431649 to verify | 18:59 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Supply label name to Nodes https://review.openstack.org/431719 | 19:14 |
jeblair | Shrews: i'm getting drunk from toasting all these milestones! | 19:14 |
mordred | Shrews: wait - like, actual nodes? | 19:17 |
Shrews | mordred: actual virtual solidly ephemeral nodes | 19:17 |
* mordred boggles | 19:17 | |
Shrews | maybe. my local laptop doesn't have the resources, so waiting on the gate | 19:18 |
* SpamapS back | 19:23 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Split merger and launcher git roots https://review.openstack.org/430456 | 19:24 |
SpamapS | jeblair: ^ with that one, locally, tests.unit.test_scheduler.TestScheduler.test_dependent_behind_dequeue fails with the hard timeout | 19:25 |
pabelanger | Shrews: ack | 19:27 |
pabelanger | Shrews: I should be able to start later today / tomorrow | 19:27 |
SpamapS | jeblair: I'm trying it again after a 'git clean -xdf' | 19:28 |
SpamapS | just in case | 19:28 |
jeblair | SpamapS: i think that one is either slow or racy, i'm not quite sure. | 19:38 |
SpamapS | same fail on a clean run | 19:39 |
jeblair | SpamapS: does it fail for you when run on its own? | 19:39 |
SpamapS | Oh that is a big test | 19:40 |
SpamapS | Ran 1 test in 27.684s | 19:40 |
SpamapS | jeblair: no, works fine on its own | 19:40 |
SpamapS | testr does have a way to try and see if there are isolation problems | 19:40 |
SpamapS | jeblair: trying with --analyze-isolation | 19:43 |
jeblair | SpamapS: it was running close to the timeout before, so i doubled the timeout from 30 to 60s, but it's possible that it's hitting even that when run along with all the other tests due to load. but unfortunately, i don't know; we don't get logs with hard timeouts (and i checked, we still don't cleanup correctly with soft ones). so it's also possible there is a race that only shows up under load. | 19:45 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Set 'cloud' param for integration config file https://review.openstack.org/431727 | 19:45 |
SpamapS | jeblair: after --analyze-isolation finishes, I'll try with --concurrency=3 which should back the load off a bit | 19:46 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Supply label name to Nodes https://review.openstack.org/431719 | 19:46 |
SpamapS | jeblair: with us forking git a lot I wonder if we're just overstepping our CPU resources | 19:47 |
SpamapS | I do see 74% system CPU load | 19:47 |
pabelanger | jeblair: do you mind reviewing 430329 and 430339 again, that is for nodepool-launcher | 19:48 |
jeblair | Shrews: btw, i've been thinking: if we're going to keep label around as a concept, we may want to alter the zuul-nodepool protocol to specify 'label' instead of 'type'... | 19:48 |
jeblair | pabelanger: sure | 19:49 |
Shrews | jeblair: maybe. i admit, 'type' had me confused until just recently | 19:49 |
Shrews | label seems much more cohesive | 19:49 |
jeblair | pabelanger: i'm not sure i follow your comment in 430329... you say you updated system-config to support the new syntax in 430324, but what i'm worried about is 430329 changing what's running on nodepool.openstack.org | 19:51 |
jeblair | pabelanger: it looks like 430329 would break what is currently running on nodepool.o.o and 430324 doesn't correct that | 19:52 |
pabelanger | jeblair: Oh, I believe I added install_nodepool_launcher = true, into the wrong node for 430324 | 19:53 |
pabelanger | let me double check | 19:54 |
jeblair | pabelanger: oh, no you added it to the right one | 19:54 |
jeblair | pabelanger: i just didn't realize that snaked through to do the same things that are being removed in 430329 | 19:55 |
jeblair | pabelanger: i'm re-evaluating my comment :) | 19:55 |
pabelanger | Ya all the layers we have in puppet are starting to get a little confusing :) | 19:55 |
pabelanger | I went cross eyed yesterday with zuul | 19:55 |
jeblair | pabelanger: partly this is my fault because of the emergency launcher split. but not entirely. ;) | 19:55 |
pabelanger | I do agree, there is a chance for breakage. so extra reviews are most welcome | 19:56 |
*** Cibo_ has quit IRC | 19:57 | |
jeblair | weird, why do i still have a -1 on 430329? that was 4 patch sets ago. | 19:57 |
jeblair | +2 now | 19:57 |
jeblair | pabelanger: re 430324 was there not already a nodepool-launcher log config file? | 19:58 |
Shrews | dang. either the node didn't launch, or the 'list' command isn't working | 20:00 |
pabelanger | jeblair: I think that is crud, confirming | 20:00 |
jeblair | pabelanger: otherwise lgtm | 20:00 |
* jeblair lunches | 20:00 | |
pabelanger | yes, crud. Deleted | 20:01 |
clarkb | pabelanger: why would we want a nodepool launcher on nodepool.o.o at this point? | 20:04 |
clarkb | isn't that why we have nl01 instead? | 20:04 |
clarkb | and also does it make sense to have people just use openstack_ci::nodepool_launcher rather than adding a new flag to openstack_ci::nodepool? | 20:05 |
clarkb | (I think that's the more puppety way to do it, though we have been flag happy so this is probably more consistent with how openstackci currently works) | 20:05 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: DNM: Re-enable devstack test https://review.openstack.org/431649 | 20:07 |
pabelanger | clarkb: we still need nodepool-launcher on nodepool.o.o, so we don't break production. Because, we do run nodepool-launcher init.d script | 20:07 |
pabelanger | which just does nodepoold --no-deletes --no-web, etc | 20:07 |
pabelanger | I was assuming, we'd run nl01 in parallel until we were ready to switch to production | 20:08 |
clarkb | pabelanger: right isn't that what the split_daemon flag is for? | 20:08 |
pabelanger | clarkb: it is, but if you look at 430329, you can see how I added the new logic | 20:09 |
pabelanger | which should allow both master and feature/zuulv3 run | 20:09 |
pabelanger | but, I'm open to suggestions if there is a better way I am not seeing | 20:09 |
pabelanger | actually | 20:10 |
clarkb | right so I guess what I would do is not add a new flag | 20:10 |
clarkb | just do if split_daemon | 20:10 |
clarkb | and then you are backward compatible | 20:10 |
pabelanger | now that I am looking at 430329. we need to update our DAEMON args, to use the new patch to nodepool-launcher | 20:10 |
jeblair | we should not use 'split_daemon' to mean 'install a v3 launcher' | 20:10 |
clarkb | the problem is this isn't compatible for anyone already using split_daemon. It's ok for the new stuff to require new things | 20:11 |
clarkb | jeblair: I agree | 20:11 |
jeblair | that's pretty user-unfriendly | 20:11 |
jeblair | clarkb: no one but us is using split_daemon | 20:11 |
clarkb | jeblair: to get a v3 daemon you should just include openstackci::nodepool_launcher which is what nl01 does | 20:11 |
*** jlk has joined #zuul | 20:11 | |
clarkb | basically we have 3 different ways to do 2 things. I am asking that we have 2 ways | 20:12 |
clarkb | split_daemon == pre v3 split daemons. openstackci::nodepool_launcher == v3 launcher | 20:12 |
clarkb | and remove new flag that is confusing | 20:13 |
pabelanger | I agree, puppet-openstackci is starting to get confusing | 20:14 |
jlk | "starting" | 20:17 |
clarkb | pabelanger: my suggestion would be to keep the interface to openstackci::nodepool as is (edit internally as necessary) for pre v3 setup. Then use openstackci::nodepool_launcher for v3 zk speaking nodepool launcher | 20:19 |
clarkb | pabelanger: but I may be missing something that complicates doing it that way | 20:19 |
pabelanger | clarkb: no, I think we could do what you are suggesting, but it means a slight difference on how nodepool-builder and nodepool-launcher could be installed with puppet-openstackci | 20:23 |
pabelanger | I don't have much preference on which way we go | 20:23 |
clarkb | pabelanger: because builder already has its own "run a builder" flag? | 20:24 |
pabelanger | clarkb: ya | 20:24 |
clarkb | hrm | 20:24 |
Shrews | Does the ubuntu image built by DIB set a password for the ubuntu user? | 20:34 |
mordred | Shrews: I do not believe so | 20:35 |
pabelanger | we don't have a ubuntu user do we? | 20:35 |
mordred | yah - that's an even better point | 20:35 |
*** harlowja_ has joined #zuul | 20:35 | |
mordred | Shrews: there is a devuser element you can add to the element list which will put your local key information into the image | 20:36 |
mordred | Shrews: for the ones we make for actual nodepool, we have an element that runs puppet and adds the user accounts and keys for all of the infra-root humans | 20:36 |
*** harlowja has quit IRC | 20:37 | |
pabelanger | we did something recently for tripleo too, to allow root user SSH key or password | 20:37 |
pabelanger | trying to find the commit | 20:37 |
clarkb | pabelanger: glean already does it, just had to allow ssh to root iirc | 20:38 |
clarkb | pabelanger: so if you config drive + glean + nodepool dib image you can ssh as root | 20:38 |
Shrews | i'm just trying to figure out how to setup my nodepool.yaml to not make nodepoold bork | 20:38 |
clarkb | which makes the hardest part of that figuring out config drive | 20:38 |
pabelanger | ya, maybe this was for console password | 20:38 |
Shrews | but, i don't think i can actually do this locally | 20:38 |
clarkb | Shrews: you mean the ssh into node test? | 20:39 |
Shrews | clarkb: yeah | 20:39 |
clarkb | Shrews: you just have to make your dib build create a user and inject a key, the devstack plugin does this | 20:39 |
clarkb | I think using devuser? | 20:39 |
pabelanger | ya, we use that | 20:39 |
Shrews | ah, i see it now. thx | 20:41 |
clarkb | pabelanger: back to puppeting things. Can we make it so that split_daemons is the thing for launcher pre v3 and have a flag for proper launcher post v3? I think most of my confusion is in how those are getting smushed together | 20:43 |
clarkb | pabelanger: but that way we are consistent as far as if you want new thing flip flag, if using old thing you are fine as is | 20:43 |
pabelanger | clarkb: sure, if we are okay with the duplicate code for now | 20:47 |
*** jlk has quit IRC | 20:50 | |
clarkb | pabelanger: what parts are duplicated (maybe this is actually what I am confused on)? The old launcher is run nodepoold with flags, new launcher is its own separate daemon without selective bits disabled right? The overlap is that they both need init scripts? | 20:51 |
*** jlk has joined #zuul | 20:51 | |
pabelanger | clarkb: so, I was thinking we'd create openstackci::nodepool_launcher class, by moving the current code from openstackci::nodepool into it | 20:53 |
pabelanger | aside from the new DAEMON syntax, we could reuse everything from today | 20:53 |
clarkb | gotcha | 20:53 |
pabelanger | if I am understanding you right, you want to keep openstackci::nodepool un touched | 20:54 |
pabelanger | but still okay with openstackci::nodepool_launcher | 20:54 |
clarkb | pabelanger: no I am ok with adding the nodepool_launcher to nodepool (wasn't originally but you pointed out we do that for the builder) | 20:54 |
clarkb | but make it v3 specific, right now I think its straddling both v2 and v3? | 20:54 |
pabelanger | right | 20:54 |
clarkb | eg if you don't have it set then your old split daemon setup will stop working too | 20:54 |
pabelanger | ya, so I tried to do that, but it actually meant a lot more puppet code, because of duplicate Class[nodepool_launcher] things | 20:55 |
pabelanger | so | 20:55 |
pabelanger | if we don't update openstackci::nodepool, keep the original code, then add nodepool_launcher.pp, I think we'll be okay | 20:56 |
pabelanger | then, after v3 lands, we can go back and dedupe | 20:56 |
SpamapS | jeblair: weirdly enough, setting concurrency to 2 made things _much_ worse. | 21:02 |
clarkb | pabelanger: ok. I think the biggest thing to avoid would be not taking the opportunity to make the interface to using v3 better as we roll it out. even if maybe that means a little duplication in places for a while | 21:02 |
clarkb | pabelanger: and I am pretty sure puppet is trying to push towards more composable bits rather than single thing already composed with all the flags | 21:02 |
SpamapS | jeblair: I'd have expected it to take twice as long, but it's at 15 minutes and lots of fail spewed | 21:03 |
SpamapS | jeblair: aha! | 21:16 |
SpamapS | it failed soft | 21:16 |
SpamapS | http://paste.openstack.org/show/598306 | 21:16 |
SpamapS | oh hm.. it cut off | 21:17 |
SpamapS | https://gist.github.com/anonymous/e0ffec24a61905b8d67ea3e2d36e2e24 <-- untruncated | 21:19 |
SpamapS | Exception: Timeout waiting for Zuul to settle | 21:19 |
SpamapS | stderr: 'fatal: repository '/home/clint/tmp/tmpfNCGOh/zuul-test/upstream/common-config' does not exist | 21:20 |
SpamapS | that seems like it might be a problem! | 21:20 |
SpamapS | GitCommandNotFound: Cmd('git') not found due to: OSError('[Errno 2] No such file or directory: '/home/clint/tmp/tmpfNCGOh/zuul-test/tmp3J1ByO/git/org/project3'') | 21:20 |
SpamapS | guessing there's somewhere else assuming the subdir name | 21:21 |
jeblair | SpamapS: if that happened after the test shutdown started, that might just be zuul continuing to run during shutdown | 21:22 |
jeblair | SpamapS: yeah, those errors are all after this: | 21:26 |
jeblair | 2017-02-09 12:45:54,707 zuul.test ERROR Timeout waiting for Zuul to settle | 21:26 |
jeblair | SpamapS: after that point, all bets are off as tmpdirs may be deleted while things are still running | 21:26 |
jeblair | SpamapS: looking at that, my current thinking is that maybe we just need to further increase the settle timeout (and maybe we should just do it for that test). but we should also probably take a close look at what that test is doing, and make sure we haven't broken something in zuul that makes it take much longer than it should. | 21:30 |
jeblair | SpamapS: (that might best be done with an eye to making sure the INFO log level is useful (because OMG debug) -- cleanup is sorely needed there) | 21:31 |
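
A minimal sketch of the per-test settle timeout jeblair suggests, assuming a polling helper whose timeout a single slow test can raise. The helper name, its parameters, and the is_settled callable are hypothetical; only the exception message is taken from the log above.

```python
import time


def wait_until_settled(is_settled, timeout=60.0, interval=0.1):
    """Poll is_settled() until it returns True or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if is_settled():
            return
        if time.monotonic() >= deadline:
            # Same wording as the failure quoted in the log above.
            raise TimeoutError("Timeout waiting for Zuul to settle")
        time.sleep(interval)
```

A test known to be slow under load could pass a larger timeout for itself, leaving the default alone for every other test.
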
*** Shrews has quit IRC | 21:35 | |
*** Shrews has joined #zuul | 21:36 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Implement node launching https://review.openstack.org/431523 | 21:44 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement nodepool 'list' command https://review.openstack.org/431647 | 21:45 |
Shrews | jeblair: hrm, not quite sure what's going on with node launching. the 'list' command isn't returning any data: http://logs.openstack.org/49/431649/3/check/gate-dsvm-nodepool/6926ac6/console.html#_2017-02-09_20_30_16_096126 | 21:52 |
Shrews | jeblair: will pick it up again tomorrow | 21:52 |
*** harlowja_ has quit IRC | 21:58 | |
pabelanger | Shrews: working on nodepool hold command | 22:02 |
pabelanger | question, however: why do we default to 0000000000 as the first node id? | 22:03 |
pabelanger | and not 0000000001 | 22:03 |
pabelanger | like images? | 22:03 |
*** harlowja has joined #zuul | 22:05 | |
pabelanger | Shrews: also, left a comment on 431647 | 22:07 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update nodepool list to use zookeeper https://review.openstack.org/431756 | 22:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update nodepool hold to use zookeeper https://review.openstack.org/431756 | 22:09 |
pabelanger | Shrews: ^, should I be using a lock? | 22:09 |
Shrews | pabelanger: it's safer to do so, yes | 22:17 |
Shrews | for the lock question, that is | 22:17 |
Shrews | pabelanger: as for the node id, i guess just assume the same as images | 22:18 |
pabelanger | Shrews: right, but I don't understand why our images first build is 0000000001 and our first node is 0000000000. Any ideas why that is? I would have expected both to start at 01 | 22:20 |
Shrews | jeblair: oh! i just realized why that test isn't working. i have yet to support min-ready, so nothing is being built | 22:20 |
Shrews | pabelanger: i think we went through that a while back. it has something to do with deep ZK internals | 22:20 |
Shrews | pabelanger: i thought so far everything was starting on 0000 | 22:22 |
Shrews | but my memory is likely failing me | 22:22 |
pabelanger | okay, weird | 22:22 |
pabelanger | unit tests for images, are 01 right now | 22:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update nodepool hold to use zookeeper https://review.openstack.org/431756 | 22:23 |
pabelanger | now with locking^ | 22:23 |
pabelanger | I'll move on to unhold, if ^ looks good | 22:25 |
Shrews | pabelanger: so, waitForNodeRequest returns a new request that should have request.id populated. You can use that instead of the hardcoded 000000000 | 22:26 |
Shrews | and avoid that whole "guess the number" mess | 22:26 |
pabelanger | k | 22:27 |
Shrews | pabelanger: ugh. no | 22:27 |
Shrews | it returns the request, not the node. duh. but request.nodes should have the node id | 22:28 |
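
A minimal sketch of Shrews's suggestion: read the node id from the fulfilled request's nodes list instead of hardcoding a ten-digit id. The request object below is a hypothetical stand-in built with SimpleNamespace; only the id and nodes attribute names come from the discussion above, and both helper functions are illustrative.

```python
from types import SimpleNamespace


def format_sequence_id(counter):
    """Zero-pad a counter to ten digits, like the ids quoted above."""
    return "%010d" % counter


def first_node_id(request):
    """Return the first node id recorded on a fulfilled request."""
    if not request.nodes:
        raise ValueError("request %s has no nodes" % request.id)
    return request.nodes[0]


# Hypothetical fulfilled request, shaped like the object described above.
request = SimpleNamespace(
    id=format_sequence_id(0),
    nodes=[format_sequence_id(0)],
)
```

With this shape a test holds whichever node the request reports, so it no longer matters whether numbering starts at 0000000000 or 0000000001.
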
jeblair | Shrews: aha, so 'list' is successfully failing! :) | 22:28 |
* Shrews really should EOD | 22:28 | |
Shrews | jeblair: yes | 22:28 |
Shrews | so i guess that's tomorrow's task | 22:29 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update nodepool hold to use zookeeper https://review.openstack.org/431756 | 22:30 |
Shrews | ok. i'm going away 4realz | 22:32 |
Shrews | night | 22:32 |
pabelanger | later | 22:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Inherit playbooks and modify job variance https://review.openstack.org/430483 | 23:27 |
jeblair | mordred: ^ for you to look at tomorrow | 23:28 |
jeblair | that should really be two separate changes, but i've been working on it for 2 days, so really just wanted to push something up. if folks need me to split it, i can. | 23:29 |
jeblair | it makes major changes to the Job class, but i think they're worth it. it clarifies a lot of gray areas, and i'm pretty sure i can actually explain how job inheritance and variance work now. | 23:31 |
mordred | jeblair: ++ | 23:43 |
mordred | jeblair: I think the most important outcome is a clear and succinct description of job inheritance and variance | 23:43 |
mordred | rbergeron: ^^ btw - we may be getting close to writing some new-user/end-user related doc type things | 23:44 |
rbergeron | oh like things i can do? ;) | 23:46 |
rbergeron | i mean im not on a plane and i love feeling useful | 23:47 |
* mordred hands jeblair and robyn some slices of pie he found over yonder | 23:53 | |
rbergeron | pie? | 23:55 |
rbergeron | mmm pie | 23:55 |