clarkb | the old behavior was to not build an image if it didn't have a target output format | 00:51 |
openstackgerrit | Merged openstack-infra/nodepool: Add image_name to UploadWorker INFO message https://review.openstack.org/404891 | 01:15 |
*** jamielennox is now known as jamielennox|away | 01:34 | |
pabelanger | sweet | 01:35 |
pabelanger | with nodepool passing the md5 / sha256 files into shade, uploads start immediately now | 01:36 |
pabelanger | much faster | 01:36 |
pabelanger | 1h20mins to build debian-jessie | 01:37 |
*** Shuo has quit IRC | 02:08 | |
*** rcarrillocruz has quit IRC | 02:12 | |
*** rcarrillocruz has joined #zuul | 02:16 | |
*** jamielennox|away is now known as jamielennox | 02:54 | |
*** saneax-_-|AFK is now known as saneax | 02:56 | |
*** jamielennox is now known as jamielennox|away | 03:17 | |
*** jamielennox|away is now known as jamielennox | 03:26 | |
*** jamielennox is now known as jamielennox|away | 03:47 | |
*** jamielennox|away is now known as jamielennox | 04:33 | |
*** saneax is now known as saneax-_-|AFK | 04:36 | |
*** saneax-_-|AFK is now known as saneax | 06:45 | |
*** willthames has quit IRC | 07:03 | |
*** willthames has joined #zuul | 07:16 | |
*** abregman has joined #zuul | 07:27 | |
*** Cibo_ has quit IRC | 07:31 | |
*** herlo has quit IRC | 08:42 | |
*** herlo has joined #zuul | 08:44 | |
*** hogepodge has quit IRC | 09:11 | |
*** hogepodge has joined #zuul | 09:12 | |
*** hogepodge has quit IRC | 09:31 | |
*** hogepodge has joined #zuul | 09:32 | |
*** openstack has joined #zuul | 10:06 | |
*** hashar has joined #zuul | 10:32 | |
openstackgerrit | Merged openstack-infra/nodepool: Test rotation of builds in nodepool-builder https://review.openstack.org/400004 | 10:45 |
openstackgerrit | Merged openstack-infra/zuul: Don't merge post-merge items https://review.openstack.org/404903 | 11:10 |
openstackgerrit | Merged openstack-infra/zuul: Define the internal noop job https://review.openstack.org/404864 | 11:14 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul: Cloner: Better infrastructure failure handling https://review.openstack.org/403559 | 11:33 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul: Keep existing loggers with fileConfig https://review.openstack.org/405333 | 12:12 |
*** abregman has quit IRC | 12:53 | |
*** jamielennox is now known as jamielennox|away | 13:00 | |
*** abregman has joined #zuul | 14:18 | |
*** abregman has quit IRC | 14:23 | |
*** abregman has joined #zuul | 14:23 | |
*** abregman has quit IRC | 14:25 | |
*** abregman has joined #zuul | 14:25 | |
*** saneax is now known as saneax-_-|AFK | 14:57 | |
pabelanger | morning | 15:00 |
pabelanger | nb01.o.o did its thing again last night | 15:01 |
pabelanger | all images uploaded | 15:01 |
pabelanger | Shrews: so, looks like you are right. nodepool-builder is currently pinning a CPU, even though the system is idle. We should be running the recent changes to the build / upload worker intervals | 15:03 |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63748&rra_id=all | 15:04 |
Shrews | pabelanger: i read those as mostly idle, no? | 15:10 |
pabelanger | ah | 15:11 |
pabelanger | I didn't link the right one | 15:11 |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63747&rra_id=all | 15:11 |
pabelanger | load is >1 all the time | 15:11 |
pabelanger | and it is nodepool-builder | 15:11 |
pabelanger | must be looping some place | 15:11 |
Shrews | pabelanger: how many build and upload threads? | 15:13 |
pabelanger | 1 builder, 16 uploads | 15:13 |
pabelanger | however, we might be able to halve the upload workers now | 15:13 |
Shrews | ok, so each thread sleeps 10s between checks. with that many, chances are one of them is doing something every few seconds. that's my best guess | 15:14 |
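The worker model Shrews describes here, where each thread sleeps 10 seconds between checks, can be sketched roughly as follows (a generic illustration of the polling pattern, not nodepool's actual worker code):

```python
import threading

class PollingWorker(threading.Thread):
    """Generic worker thread that wakes periodically to look for work."""

    POLL_INTERVAL = 10  # seconds between checks, per the discussion above

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()

    def check_for_work(self):
        pass  # placeholder: a real worker would poll ZooKeeper here

    def run(self):
        # Event.wait() doubles as the sleep; setting the event wakes the
        # thread immediately and ends the loop.
        while not self._stop_event.wait(self.POLL_INTERVAL):
            self.check_for_work()

    def stop(self):
        self._stop_event.set()
```

With 16+ such threads, some thread is statistically waking every second or so even when the system is idle, which matches the "mostly idle but never quiet" load being discussed.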
Shrews | i have to head out for a quick back adjustment. bbiab | 15:14 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Allow to fetch git packs via SSH https://review.openstack.org/405441 | 15:20 |
pabelanger | Shrews: looks like it might be our connection to zookeeper | 15:20 |
pabelanger | when I enable DEBUG logging for kazoo.client, and 1 upload / build worker, I see a spike to 100% | 15:21 |
pabelanger | so, with 16 threads, we must be hammering zk | 15:21 |
pabelanger | ya, looks like it | 15:22 |
Shrews | pabelanger: ah yeah. The kazoo client has its own threads for keepalive things. Not sure if that can be adjusted, or if we should even try. | 15:39 |
Shrews | Can look when I get back | 15:39 |
*** abregman has quit IRC | 16:01 | |
jeblair | well, there are 311 threads running, which seems surprisingly high | 16:05 |
jeblair | and apparently we don't support sigusr2 there so i accidentally killed it | 16:06 |
jeblair | however, it seems to have stopped logging around 15:24 | 16:07 |
pabelanger | jeblair: I reduced kazoo logging back to INFO and stopped / started nodepool-builder. So, we're currently idle | 16:09 |
pabelanger | and nothing to log | 16:09 |
jeblair | pabelanger: any idea why i'm not able to restart it? | 16:11 |
jeblair | ah there's the pid file | 16:11 |
pabelanger | odd, | 16:11 |
jeblair | ok it's running again | 16:11 |
pabelanger | never seen that | 16:11 |
jeblair | the pid file still existed | 16:11 |
jeblair | still running 311 threads, so at least it's not leaking, but jeepers that's a lot. | 16:13 |
pabelanger | I'm sure we can decrease our upload workers now, maybe 8 | 16:13 |
jeblair | pabelanger: why would we want to? | 16:14 |
pabelanger | save some threads? | 16:14 |
jeblair | pabelanger: well, i'd rather find out why we're spinning our wheels doing nothing | 16:14 |
pabelanger | ack | 16:14 |
pabelanger | even with 2 upload workers, it was still spiking to 100% CPU | 16:15 |
jeblair | i'm going to add the stack dump handler to it | 16:16 |
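A stack-dump signal handler of the kind jeblair mentions adding here typically looks something like this (a generic Python sketch of the technique, not the actual nodepool change):

```python
import signal
import sys
import threading
import traceback

def stack_dump_handler(signum, frame):
    """On SIGUSR2, print a stack trace for every live thread."""
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        print("Thread: %s (%d)" % (names.get(ident, "unknown"), ident))
        traceback.print_stack(stack)

signal.signal(signal.SIGUSR2, stack_dump_handler)
```

Once registered, `kill -USR2 <pid>` dumps every thread's stack to the logs, which is exactly what is needed to find out what 300+ threads are actually doing.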
pabelanger | Also, working on sorting our default outputs, for example: | 16:20 |
pabelanger | current output (unsorted): http://paste.openstack.org/show/591153/ | 16:21 |
pabelanger | updated output (sorted by, provider, image, age): http://paste.openstack.org/show/591152/ | 16:21 |
pabelanger | 2nd should be how we do it today in nodepool | 16:21 |
jeblair | oooooh | 16:22 |
pabelanger | another thing we could do is actually expose sorting options to CLI | 16:22 |
jeblair | the threads are because we did not centralize the provider managers | 16:23 |
jeblair | i should have caught that :( | 16:23 |
jeblair | so we have (builders+uploaders+cleanup)*providers | 16:24 |
jeblair | and there are 3 zk threads for each of those | 16:30 |
pabelanger | remote: https://review.openstack.org/405482 Sort output of image-list | 16:30 |
pabelanger | if we are happy with that, I'll add another patch for dib-image-list | 16:31 |
jeblair | pabelanger: you don't want to sort by build,upload? | 16:34 |
jeblair | er build,provider,upload i guess | 16:34 |
*** hashar has quit IRC | 16:34 | |
pabelanger | let me see what that looks like | 16:34 |
pabelanger | At min, I think we want provider then image | 16:36 |
pabelanger | then we can do build / upload | 16:37 |
pabelanger | that gives a nice format too | 16:37 |
jeblair | pabelanger: maybe ask in -infra? | 16:37 |
pabelanger | will do | 16:37 |
EmilienM | I have a stupid question | 16:43 |
EmilienM | a few months ago, I asked if with current version of zuul we could mix a zuul layout with files constraint and projects | 16:44 |
EmilienM | example: I want a job run only if it's for a specific project OR/AND some specific files in a project | 16:44 |
EmilienM | last time I asked, the answer was no, you can only do one or the other | 16:44 |
jeblair | EmilienM: is there more to your question? | 16:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list https://review.openstack.org/405482 | 16:50 |
EmilienM | jeblair: no | 16:50 |
EmilienM | jeblair: I'm not sure I explained correctly | 16:50 |
EmilienM | let me know if it's clear | 16:50 |
jeblair | EmilienM: well, i understand the question you asked a few months ago (was it that long?). i just don't understand what you're asking now (you didn't actually ask a question just now) -- are you asking if anything has changed? | 16:51 |
jeblair | and sorry, i'm not trying to be dense, i really don't know if that's what you want, or if you have some follow-up question... | 16:52 |
EmilienM | I would like to know if situation has changed and if now, I can create a zuul layout that says "run this job if we touch this file or/and this project" | 16:53 |
jeblair | EmilienM: no -- it's not likely to change before zuulv3, where you will be able to do that. zuulv2 is mostly frozen for new features. | 16:54 |
Shrews | jeblair: "did not centralize the provider managers"... Help me understand that a bit more? Not sure what that means/entails since current code just uses ProviderManager class methods | 16:54 |
Shrews | want to understand what I did wrong | 16:54 |
EmilienM | jeblair: fair enough, thanks! | 16:55 |
jeblair | EmilienM: np, and sorry. i'm working as fast as i can. :) | 16:55 |
Shrews | for i in range(1, 100): workers.append(jeblair.copy()) | 16:57 |
EmilienM | jeblair: no, thanks, sorry I didn't express it well :) | 16:58 |
jeblair | Shrews: the provider manager is meant to be a global object that all threads use to contact a single provider (it's a shade taskmanager) | 16:58 |
jeblair | Shrews: having said that... i'm not sure it's super important in this instance, since we're not actually doing that much cloud communication | 16:58 |
jeblair | Shrews: and i may have been off-base when i blamed it for the number of threads -- it actually shouldn't be spawning any threads | 16:59 |
Shrews | jeblair: right. not sure where in the code we break that usage model | 16:59 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 17:00 |
jeblair | Shrews: well, if we were to use it as intended, NodePoolBuilder would contain the provider managers, and the workers would use those copies, rather than the workers having their own copies | 17:00 |
Shrews | jeblair: do those copies come along with the config object then? | 17:02 |
Shrews | no intentional copying that i see | 17:02 |
Shrews | ah, i think so | 17:03 |
jeblair | Shrews: yeah, when you reconfigure the providermanagers, they get attached to the config, so it's the loadconfig/provider_manager.reconfigure that's happening in each worker thread that causes them to get their own | 17:04 |
Shrews | gotcha | 17:04 |
jeblair | however, let's not move that just yet :) | 17:04 |
jeblair | it may be advantageous for us to leave it as-is, since this way we can multiplex uploads to a single provider | 17:05 |
jeblair | i am still eyeing the zk connection as something we may want to centralize. i think it is responsible for 54 threads. | 17:06 |
jeblair | but i think the way we should proceed is to land 405529 and run that on nb01 and see what all those threads actually are | 17:07 |
Shrews | 54? wow | 17:07 |
jeblair | Shrews: yeah, i count 3 threads locally for each zk connection, and nb01 has 18 workers | 17:08 |
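The thread arithmetic here checks out (worker counts taken from the nb01 figures mentioned earlier in the log; the 3-threads-per-connection figure is jeblair's local observation of kazoo's internal threads):

```python
# Thread-count estimate for nb01: 1 build worker + 16 upload workers +
# 1 cleanup worker, each holding its own ZooKeeper connection, with
# roughly 3 kazoo-internal threads observed per connection.
build_workers = 1
upload_workers = 16
cleanup_workers = 1
zk_threads_per_connection = 3

workers = build_workers + upload_workers + cleanup_workers
zk_threads = workers * zk_threads_per_connection
print(workers, zk_threads)  # 18 54
```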
jeblair | i have to run an errand... Shrews, pabelanger: if you can land 405529 while i'm gone, i'd be grateful | 17:10 |
Shrews | centralizing the zk connection seems like a bad idea. it would serialize everything, including lock management | 17:10 |
Shrews | jeblair: i can +1 it, but can't land it ;) | 17:10 |
jeblair | Shrews: oh, would it actually serialize access? | 17:13 |
pabelanger | looking | 17:14 |
pabelanger | +2 | 17:17 |
pabelanger | 3 more hours until builds for today start | 17:20 |
pabelanger | should be flexing the cleanup worker today | 17:20 |
pabelanger | as we'll be rotating images | 17:20 |
Shrews | jeblair: i'm actually not sure which parts of the client are thread safe and which aren't, or if we'd need to switch to the async api | 17:20 |
Shrews | we'll have to explore it | 17:20 |
Shrews | transactions aren't, but we aren't using those so far | 17:22 |
openstackgerrit | Merged openstack-infra/nodepool: Have an ending line-feed on the generated id_rsa.pub file https://review.openstack.org/383496 | 17:32 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list https://review.openstack.org/405482 | 18:15 |
*** morgan is now known as morgred | 18:20 | |
*** morgred is now known as morgan | 18:21 | |
*** saneax-_-|AFK is now known as saneax | 18:23 | |
*** abregman has joined #zuul | 18:30 | |
jesusaur | I'm trying to piece together how zuul works in a post-jenkins world; how does zuul-launcher know to update job definitions? | 18:44 |
*** bhavik1 has joined #zuul | 18:46 | |
pabelanger | jesusaur: zuul-launcher reconfigure | 18:50 |
pabelanger | when JJB folder changes, we Exec that in puppet | 18:50 |
pabelanger | zuul will then reload JJB into memory | 18:50 |
jesusaur | pabelanger: thanks :) | 18:51 |
pabelanger | jesusaur: np | 18:51 |
pabelanger | jesusaur: there are a few commands for zuul-launcher now, everything from graceful restart to drop nodes | 18:51 |
Shrews | jesusaur: wow, 405529 had some epic fails (other than the pep8 thing). not sure what happened there | 18:53 |
Shrews | err, jeblair | 18:53 |
Shrews | sorry jesusaur | 18:53 |
jesusaur | no worries | 18:54 |
jlk | something else we've noticed in the new zuul land | 18:56 |
jlk | and maybe this is because we've got gearman running on its own | 18:56 |
jlk | but if we do a full restart of zuul-launcher, it seems to lose track of existing job nodes, and I have to rm the nodepool nodes so that nodepool launches them again and re-registers. | 18:57 |
jlk | DOes that sound right to anybody? | 18:57 |
pabelanger | yup | 18:57 |
pabelanger | so, you can do 2 things | 18:57 |
pabelanger | each time you restart zuul, delete all your online nodes | 18:58 |
pabelanger | or | 18:58 |
pabelanger | check_job_registration=False | 18:58 |
pabelanger | http://docs.openstack.org/infra/zuul/zuul.html#gearman | 18:58 |
pabelanger | IIRC, that will stop the NOT_REGISTERED message | 18:59 |
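The option pabelanger points at lives in the `[gearman]` section of zuul.conf (per the docs page linked above); a minimal fragment might look like this (server address is illustrative):

```ini
[gearman]
server=gearman.example.org
check_job_registration=false
```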
jlk | What are the implications of turning job registration off? | 18:59 |
pabelanger | zuul will try to launch a node | 18:59 |
jlk | what we see isn't a "NOT_REGISTERED" scenario, but instead sort of an infinite wait | 18:59 |
pabelanger | Hmm | 18:59 |
jlk | and as soon as the node shows up, the job goes through | 19:00 |
jlk | (new node) | 19:00 |
pabelanger | right | 19:00 |
pabelanger | 1 sec | 19:00 |
pabelanger | jlk: http://docs.openstack.org/infra/system-config/zuul.html#restarts | 19:00 |
pabelanger | you should follow that process for restarts | 19:00 |
pabelanger | it should outline how to handle the missing registrations | 19:01 |
pabelanger | you can also release your nodes back to nodepool too, before you stop | 19:01 |
jlk | kk | 19:02 |
pabelanger | actually, that is only zuul-launcher | 19:02 |
jlk | hopefully we don't have to restart nearly all that much | 19:02 |
pabelanger | my mistake | 19:02 |
pabelanger | jlk: Ya, we usually schedule them | 19:03 |
pabelanger | jlk: look into zuul-launcher graceful | 19:03 |
pabelanger | give that a try | 19:03 |
pabelanger | that should help | 19:03 |
jlk | alrighty | 19:04 |
jeblair | when zuul-launcher stops, it returns all nodes to nodepool | 19:04 |
jeblair | pabelanger, Shrews: my change didn't land :( | 19:05 |
pabelanger | jeblair: oh, I didn't approve | 19:05 |
pabelanger | sorry about that | 19:05 |
jeblair | pabelanger: i did, but it didn't pass tests | 19:05 |
Shrews | jeblair: yep. been looking at those failures, but... geez | 19:06 |
jlk | looks like graceful just does a stop | 19:06 |
jlk | and then need to start it back up again | 19:06 |
pabelanger | jlk: it keeps running jobs alive, so when we run it, it takes about 3 hours | 19:06 |
pabelanger | Oh, yes | 19:06 |
jeblair | jlk: fyi, none of this is relevant in zuulv3 | 19:06 |
jlk | jeblair: fair enough. Just trying to get a handle on running something to fully understand the changes of v3 | 19:07 |
*** abregman has quit IRC | 19:07 | |
jeblair | jlk: yep, just wanted to make sure you were aware so could plan accordingly :) | 19:07 |
pabelanger | we do have a playbook to handle some of this too today | 19:08 |
pabelanger | thankfully, restarts are minimal right now | 19:09 |
jlk | we're in somewhat deep dev mode, so we've got frequent code changes, so restarts are more common place for us currently | 19:10 |
pabelanger | http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/hard_restart_zuul_launchers.yaml and http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/restart_zuul_launchers.yaml are what we have today | 19:10 |
jeblair | pabelanger, Shrews: i see the problem with the tests | 19:14 |
jeblair | patch shortly | 19:14 |
pabelanger | mordred: ya, also not a fan of the magic numbers, open to suggestions | 19:15 |
jeblair | pabelanger: ^ what did i miss? | 19:20 |
pabelanger | jeblair: 405482 | 19:22 |
jeblair | pabelanger: wow i don't understand that | 19:23 |
clarkb | pabelanger: jeblair I think a more readable way to do that is change the loop orders | 19:24 |
pabelanger | jeblair: ya, some magic for sure | 19:24 |
jeblair | i think the loops are in the right order | 19:24 |
clarkb | oh ya its image build provider upload | 19:24 |
jeblair | we could add sorted() calls to each one, or change the zk methods to return them sorted | 19:25 |
pabelanger | Ya, I think might be cleaner | 19:25 |
pabelanger | I can work on that | 19:25 |
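The tuple-key sort being discussed can be sketched with plain `sorted()` calls; the field names below are illustrative only, not nodepool's actual data model:

```python
# Hypothetical image-upload rows; real nodepool rows come from ZooKeeper.
uploads = [
    {"provider": "rax-ord", "image": "ubuntu-xenial", "age": 3600},
    {"provider": "infracloud", "image": "ubuntu-xenial", "age": 7200},
    {"provider": "infracloud", "image": "debian-jessie", "age": 100},
]

# Sort by provider, then image name, then age -- a tuple key gives the
# multi-level ordering in one pass.
rows = sorted(uploads, key=lambda u: (u["provider"], u["image"], u["age"]))
for r in rows:
    print(r["provider"], r["image"], r["age"])
```

Whether the `sorted()` call belongs in the CLI output code or in the zk methods themselves is exactly the trade-off jeblair raises above.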
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 19:25 |
jeblair | pabelanger, Shrews: ^ | 19:25 |
pabelanger | jeblair: because I don't know: why did you need to update tests/__init__.py? | 19:28 |
Shrews | b/c he added the self.name attribute | 19:29 |
Shrews | to the workers | 19:29 |
pabelanger | ah, I see | 19:29 |
pabelanger | the final whitelist check | 19:30 |
pabelanger | thanks | 19:30 |
jeblair | they were slipping in under "Thread-" earlier ("apscheduler thread pool") | 19:30 |
*** openstackgerrit has quit IRC | 19:32 | |
*** openstackgerrit has joined #zuul | 19:34 | |
*** openstackgerrit has quit IRC | 19:36 | |
*** bhavik1 has quit IRC | 19:38 | |
*** saneax is now known as saneax-_-|AFK | 19:40 | |
*** openstackgerrit has joined #zuul | 19:41 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Fix variants not picking up negative matches. https://review.openstack.org/399871 | 19:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Add test for variant override https://review.openstack.org/399871 | 19:42 |
*** openstackgerrit has quit IRC | 19:43 | |
jeblair | SpamapS: i think i found one error in the tests you're adding in 399871. when i correct that and run it with my fix, it works. can you take a close look at that and make sure my analysis is correct? | 19:44 |
dmsimard | rcarrillocruz: o/ where can I see those ansible based jobs run for devstack you've been working on ? | 19:48 |
jeblair | dmsimard: https://review.openstack.org/#/q/topic:zuulv3+project:openstack-infra/devstack-gate | 19:49 |
jeblair | dmsimard: the first one has merged already | 19:51 |
mordred | we're working through transitioning d-g stepwise in small chunks | 19:52 |
SpamapS | jeblair: ditto here ;) | 19:52 |
*** openstackgerrit has joined #zuul | 19:52 | |
openstackgerrit | Merged openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 19:52 |
dmsimard | yeah, I'm trying to see where the ansible bits are running | 19:52 |
* dmsimard looks | 19:52 | |
*** jamielennox|away is now known as jamielennox | 19:53 | |
jeblair | dmsimard: note the callback plugin, so the output is changed slightly from standard | 19:53 |
dmsimard | jeblair: yeah, in fact I was curious to see what it would look like if I threw ARA in there | 19:55 |
jeblair | dmsimard: d-g is self testing, so please do... maybe you could put a change at the end of rcarrillocruz's stack so it has something to chew on; that could be really useful :) | 19:56 |
clarkb | keep in mind we do index the text logs for aggregate processing so we can't replace those, but having a thing alongside for specific ansible stuff could be nice | 19:58 |
jeblair | clarkb: yeah, i'm thinking at the very least a lot of the "setting up host this should take 10 minutes" stuff gets a lot better | 19:58 |
dmsimard | clarkb: I actually want a way to export reports in a elasticsearch-friendly manner at some point in time so we can index fields and such. ARA can already sort-of do that through cliff which supports json output for the CLI but it's not straightforward to crawl through all the results and dump them right now. | 20:00 |
pabelanger | come on time, move faster. 90mins until our next round of builds on nb01.o.o | 20:00 |
clarkb | ya it's also not necessary to solve right away as long as you keep the old text logs too since we already process those | 20:01 |
dmsimard | So instead of having logstash parse through console log output, you'd have real groks/filters/fields to work with | 20:01 |
pabelanger | clarkb: also, no upload failures yesterday | 20:01 |
clarkb | pabelanger: at all? seems like a list i saw had failures | 20:02 |
jeblair | clarkb, pabelanger: maybe our cloud providers haven't blacklisted our new builder ip. | 20:02 |
clarkb | the sort output one | 20:02 |
pabelanger | clarkb: those were 2 days ago | 20:02 |
clarkb | ah | 20:02 |
pabelanger | jeblair: ha | 20:02 |
rcarrillocruz | dmsimard: we were discussing yesterday about dropping role defaults and have a central vars file to put all d-g variables | 20:05 |
rcarrillocruz | https://review.openstack.org/#/c/404974/ | 20:05 |
rcarrillocruz | clarkb: ^ | 20:05 |
rcarrillocruz | just in case you start ansiblying other stuff | 20:05 |
clarkb | dmsimard: also we have tons of real fields :) it just happens that one of them starts as the entire message. But we inject a bunch of job info too | 20:05 |
rcarrillocruz | if we agree on ^, i suggest you put your vars on it | 20:05 |
dmsimard | rcarrillocruz: what's the last patch in your tree ? | 20:05 |
rcarrillocruz | clarkb when you get a chance, have a look, i pasted last night a paste link with how it looks | 20:05 |
dmsimard | rcarrillocruz: I'll send a poc patch with ara at the end just to see what it looks like | 20:06 |
rcarrillocruz | https://review.openstack.org/#/c/404243/ , although that one i need to iterate, it's wip now | 20:06 |
dmsimard | clarkb: fair, not mutually exclusive -- porque no los dos :p | 20:06 |
pabelanger | +2 | 20:08 |
pabelanger | makes sense | 20:08 |
pabelanger | jeblair: Shrews: anything I should be working on for nodepool? | 20:09 |
adam_g | jeblair: so I've been banging on test_scheduler:test_build_configuration, and am wondering why merge_mode is hard-coded to MERGER_MERGE_RESOLVE in v3 at https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3 | 20:10 |
jeblair | pabelanger: the sigusr2 patch landed, can you restart the builder? then we can get an idea of what those threads are | 20:10 |
pabelanger | jeblair: sure | 20:10 |
pabelanger | just waiting for git to update | 20:11 |
rcarrillocruz | erm, thx for the topic change jeblair | 20:11 |
jeblair | adam_g: that should just be the default, but it should be overridable in configuration; i don't know if that made it into configloader, if not, we should add it. | 20:11 |
pabelanger | jeblair: restarted | 20:12 |
adam_g | jeblair: configurable on the project? | 20:13 |
adam_g | oh, merge-mode | 20:13 |
jeblair | adam_g: yep, like "project: nova\n merge-mode: something" | 20:13 |
jeblair | yeah | 20:13 |
jeblair | pabelanger: i just sent it sigusr2 | 20:13 |
pabelanger | see it in debug logs | 20:14 |
dmsimard | jeblair, rcarrillocruz: there you go, let's wait and see: https://review.openstack.org/#/c/405613/ | 20:14 |
pabelanger | that's a lot of threads | 20:14 |
clarkb | rcarrillocruz: what does the second arg to default() do there? | 20:14 |
clarkb | all the docs I can find are for single arg default() | 20:14 |
adam_g | jeblair: ok yeah, doesn't look like its configurable ATM. will see about adding it and adding some tests for it | 20:15 |
jeblair | adam_g: cool, thanks! that's an easy one for us to overlook since right now, we're using the default exclusively | 20:15 |
jeblair | but definitely need to support the others | 20:15 |
jeblair | yeah, so we do have one thread for each provider for each worker | 20:16 |
clarkb | http://jinja.pocoo.org/docs/dev/templates/#default finally found the function signature | 20:17 |
jeblair | pabelanger, Shrews: ah, right because the taskmanager itself *is* a thread | 20:17 |
jeblair | so probably what we want to do is either decide that we want to serialize all access to a provider from a given process, or if we want to keep it parallel, stop using taskmanagers here | 20:18 |
jeblair | i lean toward option 2 | 20:19 |
jeblair | i'm going to grab some lunch | 20:19 |
rcarrillocruz | clarkb: lookup doesn't return None | 20:20 |
rcarrillocruz | but an empty string | 20:20 |
rcarrillocruz | thus we need the false thing | 20:21 |
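This is what the second argument clarkb asked about does: with the boolean flag set, falsy values like the empty string a lookup returns for a missing variable also fall through to the default, not just undefined variables. A minimal Jinja2 illustration:

```python
from jinja2 import Environment

env = Environment()

# Plain default() only fires for undefined variables, so an empty
# string passes straight through.
print(env.from_string("{{ '' | default('fallback') }}").render())

# With the boolean flag (second argument) set, falsy values such as ''
# also trigger the fallback -- "the false thing" mentioned above.
print(env.from_string("{{ '' | default('fallback', true) }}").render())
```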
rcarrillocruz | dmsimard: \o/ excited to have ARA! | 20:21 |
* rcarrillocruz goes for dinner, be back shortly | 20:21 | |
clarkb | rcarrillocruz: I thnik if mordred and jeblair are happy with 404974 we can go ahead and approve that then rebase things like https://review.openstack.org/#/c/402107/8 onto it | 20:22 |
rcarrillocruz | doesn't return None if var is not in environment, that is | 20:22 |
clarkb | rcarrillocruz: ya | 20:22 |
Shrews | pabelanger: uh, *shrug*. anything you want to pick up from storyboard is fine with me. i may experiment with using shared ZK connections for a bit, in case we decide to go that route (though I'm not certain how necessary it is yet) | 20:24 |
pabelanger | Shrews: k, let me check the board | 20:25 |
*** openstack has joined #zuul | 20:46 | |
pabelanger | 20mins, fedora-23 build starts | 20:58 |
dmsimard | rcarrillocruz: neat: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/host/localhost/index.html | 21:08 |
jeblair | very cool | 21:09 |
pabelanger | ya, that looks nice | 21:09 |
dmsimard | jeblair: so basically I wonder how this would work from a Zuul perspective -- like, could we get the whole thing recorded "first party" by Zuul when the "end user" job leverages Ansible ? | 21:10 |
dmsimard | I had done this a long time ago: https://review.openstack.org/#/c/330874/ but I sort of stopped there because what Zuul did with Ansible at that time was still fairly limited | 21:11 |
*** rcarrillocruz has quit IRC | 21:12 | |
pabelanger | I would imagine it being some sort of post playbook | 21:12 |
jeblair | dmsimard: yeah, i think so. probably by making it easy to add callback plugins. | 21:12 |
jeblair | dmsimard: we're going to have a pretty big focus on roles in v3 -- would it be possible to group the tasks by roles? | 21:12 |
jeblair | pabelanger: there are those two nodepool v3 stories i filed yesterday | 21:13 |
dmsimard | jeblair: more or less... last I checked, Ansible doesn't really pass the concept of roles to callbacks -- files are recorded and ARA can filter by files afterwards. Ex: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/file/3f036130-6be1-428f-97c5-d7e19f46d878/index.html | 21:14 |
jeblair | pabelanger, Shrews: i'm going to look into avoiding the taskmanagers | 21:14 |
pabelanger | jeblair: ack, let me get started on a test to reproduce it | 21:14 |
dmsimard | jeblair: the closest I have right now is to be able to filter by play properly (like, if there is a 1-to-1 relationship between plays and roles..) | 21:15 |
jeblair | dmsimard: got it -- the filter next to the role file is useful | 21:15 |
clarkb | dmsimard: firefox gets really unhappy with that | 21:15 |
clarkb | (thinks its a run away script) | 21:15 |
jeblair | clarkb: wfm | 21:15 |
dmsimard | clarkb: it does ? let me look | 21:15 |
clarkb | jeblair: could be my local browser then | 21:15 |
jeblair | dmsimard: yeah, we *could* do 1:1 here, but i don't know that we would end up doing that generally in v3 | 21:15 |
clarkb | (it eventually works thouhg) | 21:15 |
dmsimard | clarkb: ok. | 21:16 |
dmsimard | jeblair: a 1:1 relationship is an ugly constraint to work with, I'll ask the ansible devs if there's anything exposed that I could use cause I would definitely love that. | 21:16 |
jeblair | cool | 21:16 |
jeblair | mordred, Shrews: ^ fyi | 21:17 |
dmsimard | jeblair: back to my zuul question, though -- does Zuul fork out to another ansible process on the nodepool node? Or does it sort of run the whole thing straight from the ansible launcher? | 21:17 |
* dmsimard not familiar enough with new zuul architecture | 21:17 | |
jeblair | dmsimard: from the launcher | 21:17 |
pabelanger | jeblair: okay, I didn't comment since I didn't know how much work was involved. | 21:17 |
dmsimard | Ok, so if ARA would run on the launcher and everything else until the end is ansible, we could do something like http://status.openstack.org/openstack-health/#/ basically. | 21:18 |
pabelanger | I think how we do stackviz today works, assuming you can get access to all the data from the launcher | 21:18 |
jeblair | dmsimard: i think we would want to package up the results and publish them somewhere in a post playbook as pabelanger suggested | 21:19 |
jeblair | (launchers aren't meant to serve end-users) | 21:20 |
dmsimard | Yeah, stackviz is great -- ARA can generate a static version too. At some level of scale it gets sort of absurd, though. Like OpenStack-Ansible's gate has >6k tasks which ends up being >10k files and ~50MB gzipped .. all from a 6MB sqlite database file. | 21:20 |
clarkb | (which is basically how this test worked) | 21:20 |
pabelanger | indeed, that would be pretty slick if we did that | 21:20 |
clarkb | dmsimard: all the more reason to distribute it ? | 21:20 |
dmsimard | clarkb: but we don't /need/ to generate the static version | 21:20 |
jeblair | dmsimard: ah, yeah, sure we can copy the sqlite file over to something that can serve it | 21:21 |
dmsimard | It's a flask app that's super lightweight when using sqlite/mysql | 21:21 |
dmsimard | jeblair: yeah, I discussed with mtreinish to sort of retrieve the sqlite files the way he retrieves the tempest results | 21:21 |
clarkb | dmsimard: right but what happens when an openstack ansible change fails a bunch of tests and all of a sudden you have to render all of those at once off a single server | 21:21 |
pabelanger | clarkb: so, have a central ARA server that will generate things and archive them off some place? | 21:21 |
clarkb | pabelanger: dmsimard is suggesting that we have the launcher possibly host the contents | 21:22 |
pabelanger | Ah | 21:22 |
dmsimard | not really, hang on, let's start again | 21:22 |
jeblair | dmsimard: well, we will have different constraints with v3, in that we will be able to push things rather than pull, but still we'll have the idea of launcher collects data, then *something* is shipped off to *somewhere* at the end of the job. | 21:22 |
jeblair | (that could mean shipping a small data bundle to a central ara flask server) | 21:23 |
dmsimard | What's important about the launchers is to have the callback configured and saving stuff. The stuff can be saved to a central mysql server (not unlike openstack-health) directly through sqlalchemy -- or saved locally inside a sqlite database that'd be pushed/pulled and imported in a central place (again, like openstack-health) | 21:24 |
dmsimard | Generating the static version of the app is nuts at any kind of serious scale; beyond ~4k tasks it's not reasonable .. at least currently. So I'd tend to run the database driven flask app instead of generating the static files. | 21:24 |
jeblair | oh if the callback can stream to mysql server that's probably even easier then :) | 21:25 |
dmsimard | This runs off of the flask app with sqlite: http://46.231.133.111/ | 21:25 |
dmsimard | jeblair: yeah, it's just sqlalchemy .. http://ara.readthedocs.io/en/latest/configuration.html#ara-database | 21:25 |
dmsimard | it's magic*alchemy | 21:25 |
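[The database setting dmsimard links above is a SQLAlchemy-style connection URL, so pointing the callback at a central MySQL server instead of a local sqlite file is a configuration change rather than a code change. A minimal stand-in for that URL dispatch, using only the stdlib — the table and names here are illustrative, not ARA's actual schema or setting:]

```python
# Sketch of the idea behind ARA's database setting: the backend is
# selected by a SQLAlchemy-style connection URL, so swapping a local
# sqlite file for a central mysql server is configuration, not code.
# Names below are illustrative; see ara.readthedocs.io for the real
# setting. Only the sqlite "driver" is implemented in this sketch.
import sqlite3
from urllib.parse import urlparse

def open_backend(db_url):
    """Tiny stand-in for what SQLAlchemy does with a connection URL."""
    parsed = urlparse(db_url)
    scheme = parsed.scheme.split("+")[0]   # "mysql+pymysql" -> "mysql"
    if scheme == "sqlite":
        return sqlite3.connect(parsed.path or ":memory:")
    raise NotImplementedError("no driver in this sketch for %r" % scheme)

# The callback would record each task result into whichever backend
# the URL selects; an in-memory sqlite keeps the sketch self-contained.
conn = open_backend("sqlite://")
conn.execute("CREATE TABLE tasks (name TEXT, status TEXT)")
conn.execute("INSERT INTO tasks VALUES ('setup', 'ok')")
count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```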
clarkb | dmsimard: one benefit of the static generation is you pay the cost once for any single report and that cost is eaten by the test node itself. But if it's unwieldy for a web browser to handle I can see how that wouldn't work well | 21:25 |
jeblair | cool, so i feel pretty confident we can fairly easily slip this in to v3 when the time comes | 21:26 |
dmsimard | At scale, it's not a web browser issue .. I mean, you're still browsing one page at a time, right ? It just takes a long time to generate the static version and it's a stupid amount of files. | 21:26 |
clarkb | dmsimard: gotcha so the 50MB isn't a single file that the browser has to uncompress | 21:27 |
jeblair | (since the callback will run on the launcher, security concerns are lessened) | 21:27 |
clarkb | it's many little files | 21:27 |
clarkb | dmsimard: how long does it take? | 21:27 |
dmsimard | clarkb: right, it's 50MB in ~10k files for the openstack-ansible playbooks. 15 minutes on my laptop. | 21:27 |
clarkb | gotcha | 21:27 |
dmsimard | clarkb: it's otherwise a 6MB sqlite file that's instantaneously loaded | 21:27 |
dmsimard | This is probably the heaviest page from OSA: http://46.231.133.111/playbook/293c1e05-066d-44a6-b7e1-11cc4fc4a1ef/ | 21:27 |
clarkb | dmsimard: and you don't see that high cost reflected on dynamic loads of a centralized server? | 21:28 |
jeblair | dmsimard: have you tested what it might look like with 3600000 runs in a single db? | 21:28 |
clarkb | right ^ being the next thing :) | 21:28 |
dmsimard | clarkb: if there's no one browsing it, there won't be any load | 21:28 |
jeblair | (that's ~6 months for us) | 21:28 |
*** rcarrillocruz has joined #zuul | 21:28 | |
dmsimard | jeblair: Nope, I have no clue. tbh there are probably a fair bit of improvements to do at that kind of scale. Some features that are definitely not in yet. Searching, paging, etc. | 21:29 |
dmsimard | I'm only one guy and this is a side project though :( | 21:29 |
jhesketh | Morning | 21:29 |
jeblair | dmsimard: well, searching/paging probably isn't too important off the bat -- we can deep-link job results into it | 21:29 |
clarkb | there is definitely a trade off between generating things once at high cost vs generating only what you need on demand at potentially high cost | 21:29 |
dmsimard | doesn't require "generating" on demand | 21:29 |
clarkb | dmsimard: sure it does | 21:30 |
dmsimard | it's flask/jinja rendered in real time, database driven | 21:30 |
clarkb | dmsimard: you have to serve http and js with db data | 21:30 |
dmsimard | I don't see that as a potentially high cost | 21:30 |
clarkb | dmsimard: if it takes 15 minutes to do that for one job the total cost of all those pages is ~15 minutes | 21:30 |
clarkb | dmsimard: now if you get users to browse that job several times its still 15 minutes | 21:30 |
dmsimard | No, it's not | 21:31 |
clarkb | in the static generation case | 21:31 |
dmsimard | yeah, for static. | 21:31 |
jeblair | clarkb: though not all jobs get looked at | 21:31 |
clarkb | in the dynamic case it depends on how many pages they hit | 21:31 |
clarkb | jeblair: right | 21:31 |
dmsimard | Otherwise it's 0 minutes or 1 second for one page or something :p | 21:31 |
dmsimard | (dynamic) | 21:31 |
clarkb | my point being depending on browsing habits one may be better than the other | 21:31 |
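[The static-vs-dynamic trade-off clarkb describes can be put in rough numbers. The 15-minute static-generation figure comes from the discussion above; the per-page dynamic render cost is an assumed illustration, not a measurement:]

```python
# Back-of-the-envelope comparison of the two rendering strategies
# discussed above. The 15-minute figure for statically generating
# OSA's ~10k pages comes from the chat; the 1-second dynamic
# per-page render cost is an assumption for illustration.
STATIC_COST_S = 15 * 60    # generate every page once, up front
DYNAMIC_COST_S = 1.0       # render one page on demand (assumed)

def total_cost(page_views, static=True):
    """Total CPU-seconds spent on a report for a given number of views."""
    return STATIC_COST_S if static else page_views * DYNAMIC_COST_S

# Dynamic rendering is cheaper until views pass the break-even point;
# with these numbers that's 900 page views per job report.
break_even = STATIC_COST_S / DYNAMIC_COST_S
```

[As jeblair notes below, most job results are never looked at, which is what makes the dynamic approach attractive despite the repeated per-view cost.]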
mordred | yah - I think it'll be a thing where poking at it a few different ways will likely teach us a lot | 21:32 |
jeblair | if i had to guess, from a global carbon footprint perspective, the central server would probably be a win for us :) | 21:32 |
dmsimard | That said, this is part of the reason why I submitted https://review.openstack.org/#/c/397773/ | 21:33 |
clarkb | dmsimard: oh cool is that ready? | 21:33 |
dmsimard | It wfm on logs-dev | 21:33 |
clarkb | awesome will review | 21:33 |
rcarrillocruz | mordred , jeblair : you good with +A https://review.openstack.org/#/c/404974/ | 21:34 |
rcarrillocruz | if so, i'll rebase the chain on it and remove role defaults | 21:34 |
dmsimard | clarkb: ty :) | 21:35 |
pabelanger | doh, misread our DIB schedule, another 45mins until fedora-23 start on nb01.o.o | 21:35 |
clarkb | there are other benefits to consider too, for example rendering in job means you can self test changes to the hosting. Downside to that is old content won't get bugfixed when you push changes; only new things will. etc. I think one of openstack-health's biggest issues has just been the pure quantity of data | 21:36 |
clarkb | and since it tries to provide a global view we can't as easily split that workload out onto the test jobs themselves | 21:36 |
clarkb | but "we have too much data in the mysql db" is probably a good problem to have as far as problems go | 21:37 |
pabelanger | that's what I like about stackviz, the in job rendering. So I hope we could also do the same with ARA, even if we did a central approach | 21:37 |
*** abregman has joined #zuul | 21:38 | |
jeblair | rcarrillocruz, mordred: i +3 404974 but mordred should still see it | 21:39 |
rcarrillocruz | thx | 21:40 |
* mordred looking | 21:40 | |
mordred | yah - looks great | 21:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 21:56 |
jeblair | mordred, pabelanger, Shrews: ^ that should get rid of the bulk of our idle threads. even with that change, we can still rework the builder so that all the workers share providermanagers, and if we do that, it's actually a simple boolean switch for whether they serialize cloud api calls or not. | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 22:00 |
*** openstackgerrit has quit IRC | 22:03 | |
mordred | jeblair: that patch makes sense - and image uploads are certainly not what TaskManagers exist to help with | 22:04 |
rcarrillocruz | mordred: mind re-reviewing https://review.openstack.org/#/c/401975/ pls, I addressed clarkb's comments | 22:19 |
SpamapS | jeblair: indeed, 399871 looks correct now, thanks for spotting that. I'm pushing a new patch that fixes pep8. | 22:47 |
jeblair | SpamapS: thanks, sorry i missed the pep8 | 22:47 |
*** openstackgerrit has joined #zuul | 22:48 | |
openstackgerrit | K Jonathan Harker proposed openstack-infra/nodepool: Write per-label nodepool demand info to statsd https://review.openstack.org/246037 | 22:48 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Add test for variant override https://review.openstack.org/399871 | 22:49 |
*** abregman has quit IRC | 23:03 | |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool: Accept user-home in config validator https://review.openstack.org/404519 | 23:25 |
jeblair | adam_g: https://storyboard.openstack.org/#!/story/2000785 | 23:43 |
jeblair | SpamapS: take a look at https://storyboard.openstack.org/#!/board/39 | 23:46 |
jeblair | SpamapS: i wrote a quick script that uses the storyboard api to semi-automatically manage that board | 23:46 |
jeblair | SpamapS: the rules are: merged items are removed from the board, inprogress/review must show up in "in progress" or "blocked" (in progress is the default). todo must show up in "new", "backlog" or "todo" ("new" is the default). | 23:48 |
jeblair | SpamapS: so basically, a "todo" task it hasn't seen before will be added to "new". we can move that to backlog or todo. when someone grabs it, it will be moved to "in progress" automatically. we can then move it to/from blocked as needed. when it merges, it's removed from the board. | 23:49 |
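[The lane rules jeblair describes are mechanical enough to sketch. A hypothetical model of just the placement logic — the actual storyboard API calls are omitted, and lane names follow the chat:]

```python
# Sketch of the board-maintenance rules described above: merged tasks
# leave the board; inprogress/review tasks must sit in "in progress"
# or "blocked" (default "in progress"); todo tasks must sit in "new",
# "backlog" or "todo" (default "new"). Manual lane moves within the
# allowed set are preserved. Storyboard API interaction is omitted.
VALID_LANES = {
    "inprogress": {"in progress", "blocked"},
    "review": {"in progress", "blocked"},
    "todo": {"new", "backlog", "todo"},
}
DEFAULT_LANE = {
    "inprogress": "in progress",
    "review": "in progress",
    "todo": "new",
}

def place_task(status, current_lane=None):
    """Return the lane a task belongs in, or None to drop it."""
    if status == "merged":
        return None                  # merged items leave the board
    if current_lane in VALID_LANES[status]:
        return current_lane          # manual placement is respected
    return DEFAULT_LANE[status]      # otherwise fall back to default

lane = place_task("todo")            # a fresh todo task lands in "new"
```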
jeblair | (could easily have a merge lane as well, but i'm not sure how useful it is) | 23:49 |
jeblair | Zara, SotK: ^ fyi | 23:52 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement vote tests https://review.openstack.org/401061 | 23:54 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement status tests https://review.openstack.org/401062 | 23:54 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement reject tests https://review.openstack.org/401063 | 23:54 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!