Thursday, 2016-12-01

00:51 <clarkb> the old behavior was to not build an image if it didn't have a target output format
01:15 <openstackgerrit> Merged openstack-infra/nodepool: Add image_name to UploadWorker INFO message  https://review.openstack.org/404891
01:34 *** jamielennox is now known as jamielennox|away
01:35 <pabelanger> sweet
01:36 <pabelanger> with nodepool passing the md5 / sha256 files into shade, uploads start immediately now
01:36 <pabelanger> much faster
01:37 <pabelanger> 1h20mins to build debian-jessie
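The speedup pabelanger describes comes from handing shade precomputed checksums so it does not have to hash a multi-gigabyte image before starting the upload. A minimal sketch of the idea, assuming shade's create_image() md5/sha256 keyword arguments from this era; the cloud name and paths are illustrative:

```python
import shade

cloud = shade.openstack_cloud(cloud='example-cloud')  # illustrative cloud name

# nodepool-builder writes .md5 / .sha256 files next to each image; reading
# them here lets shade skip re-hashing the file before the upload begins.
with open('/opt/nodepool/images/debian-jessie.qcow2.md5') as f:
    md5 = f.read().split()[0]
with open('/opt/nodepool/images/debian-jessie.qcow2.sha256') as f:
    sha256 = f.read().split()[0]

cloud.create_image(
    'debian-jessie',
    filename='/opt/nodepool/images/debian-jessie.qcow2',
    md5=md5,
    sha256=sha256,
    wait=True,
)
```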
02:08 *** Shuo has quit IRC
02:12 *** rcarrillocruz has quit IRC
02:16 *** rcarrillocruz has joined #zuul
02:54 *** jamielennox|away is now known as jamielennox
02:56 *** saneax-_-|AFK is now known as saneax
03:17 *** jamielennox is now known as jamielennox|away
03:26 *** jamielennox|away is now known as jamielennox
03:47 *** jamielennox is now known as jamielennox|away
04:33 *** jamielennox|away is now known as jamielennox
04:36 *** saneax is now known as saneax-_-|AFK
06:45 *** saneax-_-|AFK is now known as saneax
07:03 *** willthames has quit IRC
07:16 *** willthames has joined #zuul
07:27 *** abregman has joined #zuul
07:31 *** Cibo_ has quit IRC
08:42 *** herlo has quit IRC
08:44 *** herlo has joined #zuul
09:11 *** hogepodge has quit IRC
09:12 *** hogepodge has joined #zuul
09:31 *** hogepodge has quit IRC
09:32 *** hogepodge has joined #zuul
10:06 *** openstack has joined #zuul
10:32 *** hashar has joined #zuul
10:45 <openstackgerrit> Merged openstack-infra/nodepool: Test rotation of builds in nodepool-builder  https://review.openstack.org/400004
11:10 <openstackgerrit> Merged openstack-infra/zuul: Don't merge post-merge items  https://review.openstack.org/404903
11:14 <openstackgerrit> Merged openstack-infra/zuul: Define the internal noop job  https://review.openstack.org/404864
11:33 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul: Cloner: Better infrastructure failure handling  https://review.openstack.org/403559
12:12 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul: Keep existing loggers with fileConfig  https://review.openstack.org/405333
12:53 *** abregman has quit IRC
13:00 *** jamielennox is now known as jamielennox|away
14:18 *** abregman has joined #zuul
14:23 *** abregman has quit IRC
14:23 *** abregman has joined #zuul
14:25 *** abregman has quit IRC
14:25 *** abregman has joined #zuul
14:57 *** saneax is now known as saneax-_-|AFK
15:00 <pabelanger> morning
15:01 <pabelanger> nb01.o.o did its thing again last night
15:02 <pabelanger> all images uploaded
15:03 <pabelanger> Shrews: so, looks like you are right. nodepool-builder is currently pinning a CPU, even though the system is idle.  We should be running the recent changes to the build / upload worker intervals
15:04 <pabelanger> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63748&rra_id=all
15:10 <Shrews> pabelanger: i read those as mostly idle, no?
15:11 <pabelanger> ah
15:11 <pabelanger> I didn't link the right one
15:11 <pabelanger> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63747&rra_id=all
15:11 <pabelanger> load is >1 all the time
15:11 <pabelanger> and it is nodepool-builder
15:11 <pabelanger> must be looping some place
15:13 <Shrews> pabelanger: how many build and upload threads?
15:13 <pabelanger> 1 builder, 16 uploaders
15:13 <pabelanger> however, we might be able to halve the uploaders now
15:14 <Shrews> ok, so each thread sleeps 10s between checks. with that many, chances are one of them is doing something every few seconds. that's my best guess
15:14 <Shrews> i have to head out for a quick back adjustment. bbiab
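The pattern Shrews describes is a plain poll-and-sleep worker loop. A minimal sketch of the shape, not the actual nodepool-builder code; the class and method names are illustrative:

```python
import threading
import time

class PollingWorker(threading.Thread):
    """Wake every `interval` seconds, check for work, go back to sleep."""

    def __init__(self, interval=10):
        super(PollingWorker, self).__init__()
        self.interval = interval
        self.running = True

    def run(self):
        while self.running:
            self.check_for_work()      # e.g. look for pending builds/uploads
            time.sleep(self.interval)

    def check_for_work(self):
        pass  # placeholder; the real workers consult shared state here

# With 17 such threads on a 10s interval, on average some thread wakes
# roughly every 0.6 seconds, which can read as constant low-level activity.
```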
15:20 <openstackgerrit> Jan Hruban proposed openstack-infra/zuul: Allow to fetch git packs via SSH  https://review.openstack.org/405441
15:20 <pabelanger> Shrews: looks like it might be our connection to zookeeper
15:21 <pabelanger> when I enable DEBUG logging for kazoo.client, and 1 upload / build worker, I see a spike to 100%
15:21 <pabelanger> so, with 16 threads, we must be hammering zk
15:22 <pabelanger> ya, looks like it
15:39 <Shrews> pabelanger: ah yeah. The kazoo client has its own threads for keepalive things. Not sure if that can be adjusted, or if we should even try.
15:39 <Shrews> Can look when I get back
16:01 *** abregman has quit IRC
16:05 <jeblair> well, there are 311 threads running, which seems surprisingly high
16:06 <jeblair> and apparently we don't support sigusr2 there so i accidentally killed it
16:07 <jeblair> however, it seems to have stopped logging around 15:24
16:09 <pabelanger> jeblair: I reduced kazoo logging back to INFO and stopped / started nodepool-builder.  So, we're currently idle
16:09 <pabelanger> and nothing to log
16:11 <jeblair> pabelanger: any idea why i'm not able to restart it?
16:11 <jeblair> ah there's the pid file
16:11 <pabelanger> odd,
16:11 <jeblair> ok it's running again
16:11 <pabelanger> never seen that
16:11 <jeblair> the pid file still existed
16:13 <jeblair> still running 311 threads, so at least it's not leaking, but jeepers that's a lot.
16:13 <pabelanger> I'm sure we can decrease our upload workers now, maybe 8
16:14 <jeblair> pabelanger: why would we want to?
16:14 <pabelanger> save some threads?
16:14 <jeblair> pabelanger: well, i'd rather find out why we're spinning our wheels doing nothing
16:14 <pabelanger> ack
16:15 <pabelanger> even with 2 upload workers, it was still spiking to 100% CPU
16:16 <jeblair> i'm going to add the stack dump handler to it
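A stack dump handler of the kind jeblair means is only a few lines: register a SIGUSR2 handler that logs the current stack of every thread. A minimal sketch of the approach (the actual nodepool patch is review 405529 below; the logger name here is illustrative):

```python
import logging
import signal
import sys
import threading
import traceback

def stack_dump_handler(signum, frame):
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    log = logging.getLogger('nodepool.stack_dump')
    for ident, stack in sys._current_frames().items():
        log.debug("Thread %s (%s):\n%s", ident, names.get(ident, 'unknown'),
                  ''.join(traceback.format_stack(stack)))

signal.signal(signal.SIGUSR2, stack_dump_handler)
```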
16:20 <pabelanger> Also, working on sorting our default outputs, for example:
16:21 <pabelanger> current output (unsorted): http://paste.openstack.org/show/591153/
16:21 <pabelanger> updated output (sorted by provider, image, age): http://paste.openstack.org/show/591152/
16:21 <pabelanger> the 2nd should be how we do it today in nodepool
16:22 <jeblair> oooooh
16:22 <pabelanger> another thing we could do is actually expose sorting options in the CLI
16:23 <jeblair> the threads are because we did not centralize the provider managers
16:23 <jeblair> i should have caught that :(
16:24 <jeblair> so we have (builders+uploaders+cleanup)*providers
16:30 <jeblair> and there are 3 zk threads for each of those
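Plugging in the numbers from earlier in the conversation (1 builder, 16 uploaders, plus a cleanup worker) shows how quickly this multiplies; the provider count here is purely illustrative:

```python
builders, uploaders, cleanup = 1, 16, 1
providers = 8              # illustrative; infra ran against several clouds

workers = builders + uploaders + cleanup   # 18 worker threads
managers = workers * providers             # a provider manager per worker, per provider
zk_threads = workers * 3                   # ~3 kazoo threads per ZK connection = 54
```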
16:30 <pabelanger> remote:   https://review.openstack.org/405482 Sort output of image-list
16:31 <pabelanger> if we are happy with that, I'll add another patch for dib-image-list
16:34 <jeblair> pabelanger: you don't want to sort by build,upload?
16:34 <jeblair> er, build,provider,upload i guess
16:34 *** hashar has quit IRC
16:34 <pabelanger> let me see what that looks like
16:36 <pabelanger> At min, I think we want provider then image
16:37 <pabelanger> then we can do build / upload
16:37 <pabelanger> that gives a nice format too
16:37 <jeblair> pabelanger: maybe ask in -infra?
16:37 <pabelanger> will do
16:43 <EmilienM> I have a stupid question
16:44 <EmilienM> a few months ago, I asked whether with the current version of zuul we could mix a zuul layout with files constraints and projects
16:44 <EmilienM> example: I want a job to run only if it's for a specific project OR/AND for some specific files in a project
16:44 <EmilienM> last time I asked, the answer was no, you can only do one or the other
16:48 <jeblair> EmilienM: is there more to your question?
16:50 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list  https://review.openstack.org/405482
16:50 <EmilienM> jeblair: no
16:50 <EmilienM> jeblair: I'm not sure I explained it correctly
16:50 <EmilienM> let me know if it's clear
16:51 <jeblair> EmilienM: well, i understand the question you asked a few months ago (was it that long?).  i just don't understand what you're asking now (you didn't actually ask a question just now) -- are you asking if anything has changed?
16:52 <jeblair> and sorry, i'm not trying to be dense, i really don't know if that's what you want, or if you have some follow-up question...
16:53 <EmilienM> I would like to know if the situation has changed and if I can now create a zuul layout that says "run this job if we touch this file or/and this project"
16:54 <jeblair> EmilienM: no -- it's not likely to change before zuulv3, where you will be able to do that.  zuulv2 is mostly frozen for new features.
16:54 <Shrews> jeblair: "did not centralize the provider managers"... Help me understand that a bit more? Not sure what that means/entails, since the current code just uses ProviderManager class methods
16:54 <Shrews> want to understand what I did wrong
16:55 <EmilienM> jeblair: fair enough, thanks!
16:55 <jeblair> EmilienM: np, and sorry.  i'm working as fast as i can.  :)
16:57 <Shrews> for i in range(1..100): workers.append(jeblair.copy())
16:58 <EmilienM> jeblair: no, thanks, sorry I didn't express it well :)
16:58 <jeblair> Shrews: the provider manager is meant to be a global object that all threads use to contact a single provider (it's a shade taskmanager)
16:58 <jeblair> Shrews: having said that... i'm not sure it's super important in this instance, since we're not actually doing that much cloud communication
16:59 <jeblair> Shrews: and i may have been off-base when i blamed it for the number of threads -- it actually shouldn't be spawning any threads
16:59 <Shrews> jeblair: right. not sure where in the code we break that usage model
17:00 <openstackgerrit> James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder  https://review.openstack.org/405529
17:00 <jeblair> Shrews: well, if we were to use it as intended, NodePoolBuilder would contain the provider managers, and the workers would use those copies, rather than the workers having their own copies
17:02 <Shrews> jeblair: do those copies come along with the config object then?
17:02 <Shrews> no intentional copying that i see
17:03 <Shrews> ah, i think so
17:04 <jeblair> Shrews: yeah, when you reconfigure the providermanagers, they get attached to the config, so it's the loadconfig/provider_manager.reconfigure that's happening in each worker thread that causes them to get their own
17:04 <Shrews> gotcha
17:04 <jeblair> however, let's not move that just yet :)
17:05 <jeblair> it may be advantageous for us to leave it as-is, since this way we can multiplex uploads to a single provider
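The distinction jeblair is drawing, in rough Python; the class and helper names are illustrative, not the actual nodepool code:

```python
import threading

def load_config():
    """Hypothetical stand-in for nodepool's config loading."""
    raise NotImplementedError

class ProviderManager(object):
    """Stand-in for nodepool's per-cloud manager (a shade taskmanager)."""
    def __init__(self, provider):
        self.provider = provider

# What the builder was doing: each worker thread loads its own config, and
# reconfigure attaches fresh managers to it, so managers (and their threads)
# scale as workers * providers.
class UploadWorker(threading.Thread):
    def run(self):
        config = load_config()
        self.managers = {p: ProviderManager(p) for p in config.providers}

# The intended model: the daemon owns one manager per provider and the
# workers share references to them.
class NodePoolBuilder(object):
    def __init__(self, config):
        self.provider_managers = {p: ProviderManager(p)
                                  for p in config.providers}
```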
17:06 <jeblair> i am still eyeing the zk connection as something we may want to centralize.  i think it is responsible for 54 threads.
17:07 <jeblair> but i think the way we should proceed is to land 405529, run that on nb01, and see what all those threads actually are
17:07 <Shrews> 54? wow
17:08 <jeblair> Shrews: yeah, i count 3 threads locally for each zk connection, and nb01 has 18 workers
17:10 <jeblair> i have to run an errand... Shrews, pabelanger: if you can land 405529 while i'm gone, i'd be grateful
17:10 <Shrews> centralizing the zk connection seems like a bad idea. it would serialize everything, including lock management
17:10 <Shrews> jeblair: i can +1 it, but can't land it  ;)
17:13 <jeblair> Shrews: oh, would it actually serialize access?
17:14 <pabelanger> looking
17:17 <pabelanger> +2
17:20 <pabelanger> 3 more hours until builds for today start
17:20 <pabelanger> should be flexing the cleanup worker today
17:20 <pabelanger> as we'll be rotating images
17:20 <Shrews> jeblair: i'm actually not sure which parts of the client are thread safe and which aren't, or if we'd need to switch to the async api
17:20 <Shrews> we'll have to explore it
17:22 <Shrews> transactions aren't, but we aren't using those so far
17:32 <openstackgerrit> Merged openstack-infra/nodepool: Have an ending line-feed on the generated id_rsa.pub file  https://review.openstack.org/383496
18:15 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list  https://review.openstack.org/405482
18:20 *** morgan is now known as morgred
18:21 *** morgred is now known as morgan
18:23 *** saneax-_-|AFK is now known as saneax
18:30 *** abregman has joined #zuul
18:44 <jesusaur> i'm trying to piece together how zuul works in a post-jenkins world; how does zuul-launcher know to update job definitions?
18:46 *** bhavik1 has joined #zuul
18:50 <pabelanger> jesusaur: zuul-launcher reconfigure
18:50 <pabelanger> when the JJB folder changes, we Exec that in puppet
18:50 <pabelanger> zuul will then reload JJB into memory
18:51 <jesusaur> pabelanger: thanks :)
18:51 <pabelanger> jesusaur: np
18:51 <pabelanger> jesusaur: there are a few commands for zuul-launcher now, everything from graceful restart to dropping nodes
18:53 <Shrews> jesusaur: wow, 405529 had some epic fails (other than the pep8 thing). not sure what happened there
18:53 <Shrews> err, jeblair
18:53 <Shrews> sorry jesusaur
18:54 <jesusaur> no worries
18:56 <jlk> something else we've noticed in the new zuul land
18:56 <jlk> and maybe this is because we've got gearman running on its own
18:57 <jlk> but if we do a full restart of zuul-launcher, it seems to lose track of existing job nodes, and I have to rm the nodepool nodes so that nodepool launches them again and re-registers.
18:57 <jlk> Does that sound right to anybody?
18:57 <pabelanger> yup
18:57 <pabelanger> so, you can do 2 things
18:58 <pabelanger> each time you restart zuul, delete all your online nodes
18:58 <pabelanger> or
18:58 <pabelanger> check_job_registration=False
18:58 <pabelanger> http://docs.openstack.org/infra/zuul/zuul.html#gearman
18:59 <pabelanger> IIRC, that will stop the NOT_REGISTERED message
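Per the linked docs, that option appears to live in the [gearman] section of zuul.conf in zuul v2; a minimal hedged sketch, with the server address illustrative:

```
[gearman]
server = gearman.example.org
check_job_registration = False
```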
18:59 <jlk> What are the implications of turning job registration off?
18:59 <pabelanger> zuul will try to launch a node
18:59 <jlk> what we see isn't a "NOT_REGISTERED" scenario, but instead sort of an infinite wait
18:59 <pabelanger> Hmm
19:00 <jlk> and as soon as the node shows up, the job goes through
19:00 <jlk> (new node)
19:00 <pabelanger> right
19:00 <pabelanger> 1 sec
19:00 <pabelanger> jlk: http://docs.openstack.org/infra/system-config/zuul.html#restarts
19:00 <pabelanger> you should follow that process for restarts
19:01 <pabelanger> it should outline how to handle the missing registrations
19:01 <pabelanger> you can also release your nodes back to nodepool too, before you stop
19:02 <jlk> kk
19:02 <pabelanger> actually, that is only zuul-launcher
19:02 <jlk> hopefully we don't have to restart nearly all that much
19:02 <pabelanger> my mistake
19:03 <pabelanger> jlk: Ya, we usually schedule them
19:03 <pabelanger> jlk: look into zuul-launcher graceful
19:03 <pabelanger> give that a try
19:03 <pabelanger> that should help
19:04 <jlk> alrighty
19:04 <jeblair> when zuul-launcher stops, it returns all nodes to nodepool
19:05 <jeblair> pabelanger, Shrews: my change didn't land :(
19:05 <pabelanger> jeblair: oh, I didn't approve
19:05 <pabelanger> sorry about that
19:05 <jeblair> pabelanger: i did, but it didn't pass tests
19:06 <Shrews> jeblair: yep. been looking at those failures, but... geez
19:06 <jlk> looks like graceful just does a stop
19:06 <jlk> and then you need to start it back up again
19:06 <pabelanger> jlk: it keeps running jobs alive, so when we run it, it takes about 3 hours
19:06 <pabelanger> Oh, yes
19:06 <jeblair> jlk: fyi, none of this is relevant in zuulv3
19:07 <jlk> jeblair: fair enough. Just trying to get a handle on running something to fully understand the changes in v3
19:07 *** abregman has quit IRC
19:07 <jeblair> jlk: yep, just wanted to make sure you were aware so you could plan accordingly :)
19:08 <pabelanger> we do have a playbook to handle some of this today too
19:09 <pabelanger> thankfully, restarts are minimal right now
19:10 <jlk> we're in somewhat deep dev mode, so we've got frequent code changes, so restarts are more commonplace for us currently
19:10 <pabelanger> http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/hard_restart_zuul_launchers.yaml and http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/restart_zuul_launchers.yaml are what we have today
19:14 <jeblair> pabelanger, Shrews: i see the problem with the tests
19:14 <jeblair> patch shortly
19:15 <pabelanger> mordred: ya, also not a fan of the magic numbers, open to suggestions
19:20 <jeblair> pabelanger: ^ what did i miss?
19:22 <pabelanger> jeblair: 405482
19:23 <jeblair> pabelanger: wow, i don't understand that
19:24 <clarkb> pabelanger: jeblair: I think a more readable way to do that is to change the loop order
19:24 <pabelanger> jeblair: ya, some magic for sure
19:24 <jeblair> i think the loops are in the right order
19:24 <clarkb> oh ya, it's image, build, provider, upload
19:25 <jeblair> we could add sorted() calls to each one, or change the zk methods to return them sorted
19:25 <pabelanger> Ya, I think that might be cleaner
19:25 <pabelanger> I can work on that
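The sorted() approach jeblair suggests is a small change to the nested listing loops. A sketch of the nesting clarkb describes (image, build, provider, upload); the ZK accessor and attribute names are hypothetical:

```python
# Nested listing sorted at each level; the zk accessors are illustrative.
for image in sorted(zk.get_image_names()):
    for build in sorted(zk.get_builds(image), key=lambda b: b.id):
        for provider in sorted(zk.get_providers(image, build)):
            for upload in sorted(zk.get_uploads(image, build, provider),
                                 key=lambda u: u.id):
                print(image, build.id, provider, upload.id)
```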
19:25 <openstackgerrit> James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder  https://review.openstack.org/405529
19:25 <jeblair> pabelanger, Shrews: ^
19:28 <pabelanger> jeblair: because I don't know: why did you need to update tests/__init__.py?
19:29 <Shrews> b/c he added the self.name attribute
19:29 <Shrews> to the workers
19:29 <pabelanger> ah, I see
19:30 <pabelanger> the final whitelist check
19:30 <pabelanger> thanks
19:30 <jeblair> they were slipping in under "Thread-" earlier ("apscheduler thread pool")
19:32 *** openstackgerrit has quit IRC
19:34 *** openstackgerrit has joined #zuul
19:36 *** openstackgerrit has quit IRC
19:38 *** bhavik1 has quit IRC
19:40 *** saneax is now known as saneax-_-|AFK
19:41 *** openstackgerrit has joined #zuul
19:41 <openstackgerrit> James E. Blair proposed openstack-infra/zuul: Fix variants not picking up negative matches.  https://review.openstack.org/399871
19:42 <openstackgerrit> James E. Blair proposed openstack-infra/zuul: Add test for variant override  https://review.openstack.org/399871
19:43 *** openstackgerrit has quit IRC
19:44 <jeblair> SpamapS: i think i found one error in the tests you're adding in 399871.  when i correct that and run it with my fix, it works.  can you take a close look at that and make sure my analysis is correct?
19:48 <dmsimard> rcarrillocruz: o/ where can I see those ansible based jobs run for devstack you've been working on?
19:49 <jeblair> dmsimard: https://review.openstack.org/#/q/topic:zuulv3+project:openstack-infra/devstack-gate
19:51 <jeblair> dmsimard: the first one has merged already
19:52 <mordred> we're working through transitioning d-g stepwise in small chunks
19:52 <SpamapS> jeblair: ditto here ;)
19:52 *** openstackgerrit has joined #zuul
19:52 <openstackgerrit> Merged openstack-infra/nodepool: Add stack dump handler to builder  https://review.openstack.org/405529
19:52 <dmsimard> yeah, I'm trying to see where the ansible bits are running
19:52 * dmsimard looks
19:53 *** jamielennox|away is now known as jamielennox
19:53 <jeblair> dmsimard: note the callback plugin, so the output is changed slightly from standard
19:55 <dmsimard> jeblair: yeah, in fact I was curious to see what it would look like if I threw ARA in there
19:56 <jeblair> dmsimard: d-g is self-testing, so please do... maybe you could put a change at the end of rcarrillocruz's stack so it has something to chew on; that could be really useful :)
19:58 <clarkb> keep in mind we do index the text logs for aggregate processing, so we can't replace those, but having a thing alongside for specific ansible stuff could be nice
19:58 <jeblair> clarkb: yeah, i'm thinking at the very least a lot of the "setting up host, this should take 10 minutes" stuff gets a lot better
20:00 <dmsimard> clarkb: I actually want a way to export reports in an elasticsearch-friendly manner at some point so we can index fields and such. ARA can already sort-of do that through cliff, which supports json output for the CLI, but it's not straightforward to crawl through all the results and dump them right now.
20:00 <pabelanger> come on time, move faster. 90mins until our next round of builds on nb01.o.o
20:01 <clarkb> ya, it's also not necessary to solve that right away as long as you keep the old text logs too, since we already process those
20:01 <dmsimard> So instead of having logstash parse through console log output, you'd have real groks/filters/fields to work with
20:01 <pabelanger> clarkb: also, no upload failures yesterday
20:02 <clarkb> pabelanger: at all? seems like a list i saw had failures
20:02 <jeblair> clarkb, pabelanger: maybe our cloud providers haven't blacklisted our new builder ip.
20:02 <clarkb> the sort output one
20:02 <pabelanger> clarkb: those were 2 days ago
20:02 <clarkb> ah
20:02 <pabelanger> jeblair: ha
20:05 <rcarrillocruz> dmsimard: we were discussing yesterday dropping role defaults and having a central vars file to put all d-g variables in
20:05 <rcarrillocruz> https://review.openstack.org/#/c/404974/
20:05 <rcarrillocruz> clarkb: ^
20:05 <rcarrillocruz> just in case you start ansiblifying other stuff
20:05 <clarkb> dmsimard: also we have tons of real fields :) it just happens that one of them starts as the entire message. But we inject a bunch of job info too
20:05 <rcarrillocruz> if we agree on ^, i suggest you put your vars in it
20:05 <dmsimard> rcarrillocruz: what's the last patch in your tree?
20:05 <rcarrillocruz> clarkb: when you get a chance, have a look; i posted a paste link last night showing how it looks
20:06 <dmsimard> rcarrillocruz: I'll send a poc patch with ara at the end just to see what it looks like
20:06 <rcarrillocruz> https://review.openstack.org/#/c/404243/ , although that one i need to iterate on; it's wip now
20:06 <dmsimard> clarkb: fair, not mutually exclusive -- why not both :p
20:08 <pabelanger> +2
20:08 <pabelanger> makes sense
20:09 <pabelanger> jeblair: Shrews: anything I should be working on for nodepool?
20:10 <adam_g> jeblair: so i've been banging on test_scheduler:test_build_configuration, and am wondering why merge_mode is hard-coded to MERGER_MERGE_RESOLVE in v3 at https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3
20:10 <jeblair> pabelanger: the sigusr2 patch landed, can you restart the builder?  then we can get an idea of what those threads are
20:10 <pabelanger> jeblair: sure
20:11 <pabelanger> just waiting for git to update
20:11 <rcarrillocruz> erm, thx for the topic change jeblair
20:11 <jeblair> adam_g: that should just be the default, but it should be overridable in configuration; i don't know if that made it into configloader; if not, we should add it.
20:12 <pabelanger> jeblair: restarted
20:13 <adam_g> jeblair: configurable on the project?
20:13 <adam_g> oh, merge-mode
20:13 <jeblair> adam_g: yep, like "project: nova\n merge-mode: something"
20:13 <jeblair> yeah
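In v3 configuration the override jeblair sketches would look roughly like the following; a hedged sketch, since the option had not yet been wired into configloader at the time of this exchange, and the exact syntax may have differed:

```yaml
- project:
    name: openstack/nova
    merge-mode: merge-resolve    # override the hard-coded default
```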
20:13 <jeblair> pabelanger: i just sent it sigusr2
20:14 <pabelanger> see it in the debug logs
20:14 <dmsimard> jeblair, rcarrillocruz: there you go, let's wait and see: https://review.openstack.org/#/c/405613/
20:14 <pabelanger> that's a lot of threads
20:14 <clarkb> rcarrillocruz: what does the second arg to default() do there?
20:14 <clarkb> all the docs I can find are for single-arg default()
20:15 <adam_g> jeblair: ok yeah, doesn't look like it's configurable ATM. will see about adding it and adding some tests for it
20:15 <jeblair> adam_g: cool, thanks!  that's an easy one for us to overlook, since right now we're using the default exclusively
20:15 <jeblair> but we definitely need to support the others
20:16 <jeblair> yeah, so we do have one thread for each provider for each worker
20:17 <clarkb> http://jinja.pocoo.org/docs/dev/templates/#default finally found the function signature
20:17 <jeblair> pabelanger, Shrews: ah, right, because the taskmanager itself *is* a thread
20:18 <jeblair> so probably what we want to do is either decide that we want to serialize all access to a provider from a given process, or, if we want to keep it parallel, stop using taskmanagers here
20:19 <jeblair> i lean toward option 2
20:19 <jeblair> i'm going to grab some lunch
20:20 <rcarrillocruz> clarkb: lookup doesn't return None
20:20 <rcarrillocruz> but an empty string
20:21 <rcarrillocruz> thus we need the false thing
20:21 <rcarrillocruz> dmsimard: \o/ excited to have ARA!
20:21 * rcarrillocruz goes for dinner, be back shortly
20:22 <clarkb> rcarrillocruz: I think if mordred and jeblair are happy with 404974 we can go ahead and approve that, then rebase things like https://review.openstack.org/#/c/402107/8 onto it
20:22 <rcarrillocruz> it doesn't return None if the var is not in the environment, that is
20:22 <clarkb> rcarrillocruz: ya
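For reference, the second argument clarkb asks about is Jinja2's default() filter boolean flag: when true, values that are merely falsy (such as the empty string Ansible's env lookup returns for an unset variable) are also replaced, not just undefined ones. A small illustrative example; the variable name is made up:

```python
from jinja2 import Environment

tmpl = Environment().from_string(
    "{{ devstack_branch | default('master', true) }}")
print(tmpl.render(devstack_branch=''))   # -> master ('' is falsy)
print(tmpl.render())                     # -> master (undefined variable)
```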
20:24 <Shrews> pabelanger: uh, *shrug*. anything you want to pick up from storyboard is fine with me. i may experiment with using shared ZK connections for a bit, in case we decide to go that route (though I'm not certain how necessary it is yet)
20:25 <pabelanger> Shrews: k, let me check the board
20:46 *** openstack has joined #zuul
20:58 <pabelanger> 20mins, fedora-23 build starts
21:08 <dmsimard> rcarrillocruz: neat: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/host/localhost/index.html
21:09 <jeblair> very cool
21:09 <pabelanger> ya, that looks nice
21:10 <dmsimard> jeblair: so basically I wonder how this would work from a Zuul perspective -- like, could we get the whole thing recorded "first party" by Zuul when the "end user" job leverages Ansible?
21:11 <dmsimard> I had done this a long time ago: https://review.openstack.org/#/c/330874/ but I sort of stopped there because what Zuul did with Ansible at that time was still fairly limited
21:12 *** rcarrillocruz has quit IRC
21:12 <pabelanger> I would imagine it being some sort of post playbook
21:12 <jeblair> dmsimard: yeah, i think so.  probably by making it easy to add callback plugins.
21:12 <jeblair> dmsimard: we're going to have a pretty big focus on roles in v3 -- would it be possible to group the tasks by roles?
21:13 <jeblair> pabelanger: there are those two nodepool v3 stories i filed yesterday
21:14 <dmsimard> jeblair: more or less... last I checked, Ansible doesn't really pass the concept of roles to callbacks -- files are recorded and ARA can filter by files afterwards. Ex: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/file/3f036130-6be1-428f-97c5-d7e19f46d878/index.html
21:14 <jeblair> pabelanger, Shrews: i'm going to look into avoiding the taskmanagers
21:14 <pabelanger> jeblair: ack, let me get started on a test to reproduce it
21:15 <dmsimard> jeblair: the closest I have right now is being able to filter by play properly (like, if there is a 1-to-1 relationship between plays and roles..)
21:15 <jeblair> dmsimard: got it -- the filter next to the role file is useful
21:15 <clarkb> dmsimard: firefox gets really unhappy with that
21:15 <clarkb> (thinks it's a runaway script)
21:15 <jeblair> clarkb: wfm
21:15 <dmsimard> clarkb: it does? let me look
21:15 <clarkb> jeblair: could be my local browser then
21:15 <jeblair> dmsimard: yeah, we *could* do 1:1 here, but i don't know that we would end up doing that generally in v3
21:15 <clarkb> (it eventually works though)
21:16 <dmsimard> clarkb: ok.
21:16 <dmsimard> jeblair: a 1:1 relationship is an ugly constraint to work with; I'll ask the ansible devs if there's anything exposed that I could use, cause I would definitely love that.
21:16 <jeblair> cool
21:17 <jeblair> mordred, Shrews: ^ fyi
21:17 <dmsimard> jeblair: back to my zuul question, though -- does Zuul fork out to another ansible process on the nodepool node? Or does it sort of run the whole thing straight from the launcher?
21:17 * dmsimard is not familiar enough with the new zuul architecture
21:17 <jeblair> dmsimard: from the launcher
21:17 <pabelanger> jeblair: okay, I didn't comment since I didn't know how much work was involved.
21:18 <dmsimard> Ok, so if ARA ran on the launcher and everything else until the end is ansible, we could do something like http://status.openstack.org/openstack-health/#/ basically.
21:18 <pabelanger> I think how we do stackviz today works, assuming you can get access to all the data from the launcher
21:19 <jeblair> dmsimard: i think we would want to package up the results and publish them somewhere in a post playbook, as pabelanger suggested
21:20 <jeblair> (launchers aren't meant to serve end-users)
21:20 <dmsimard> Yeah, stackviz is great -- ARA can generate a static version too. At some level of scale it gets sort of absurd, though. Like, OpenStack-Ansible's gate has >6k tasks, which ends up being >10k files and ~50MB gzipped .. all from a 6MB sqlite database file.
21:20 <clarkb> (which is basically how this test worked)
21:20 <pabelanger> indeed, that would be pretty slick if we did that
21:20 <clarkb> dmsimard: all the more reason to distribute it?
21:20 <dmsimard> clarkb: but we don't /need/ to generate the static version
21:21 <jeblair> dmsimard: ah, yeah, sure, we can copy the sqlite file over to something that can serve it
21:21 <dmsimard> It's a flask app that's super lightweight when using sqlite/mysql
21:21 <dmsimard> jeblair: yeah, I discussed with mtreinish sort of retrieving the sqlite files the way he retrieves the tempest results
21:21 <clarkb> dmsimard: right, but what happens when an openstack-ansible change fails a bunch of tests and all of a sudden you have to render all of those at once off a single server
21:21 <pabelanger> clarkb: so, have a central ARA server that will generate things and archive them off some place?
21:22 <clarkb> pabelanger: dmsimard is suggesting that we have the launcher possibly host the contents
21:22 <pabelanger> Ah
21:22 <dmsimard> not really, hang on, let's start again
21:22 <jeblair> dmsimard: well, we will have different constraints with v3, in that we will be able to push things rather than pull, but we'll still have the idea of the launcher collecting data, then *something* being shipped off to *somewhere* at the end of the job.
21:23 <jeblair> (that could mean shipping a small data bundle to a central ara flask server)
21:24 <dmsimard> What's important about the launchers is to have the callback configured and saving stuff. The stuff can be saved to a central mysql server (not unlike openstack-health) directly through sqlalchemy -- or saved locally inside a sqlite database that'd be pushed/pulled and imported in a central place (again, like openstack-health)
21:24 <dmsimard> Generating the static version of the app is nuts at any kind of serious scale; beyond 4k tasks it's not reasonable .. at least currently. So I'd tend to run the database-driven flask app instead of generating the static files.
21:25 <jeblair> oh, if the callback can stream to a mysql server that's probably even easier then :)
21:25 <dmsimard> This runs off of the flask app with sqlite: http://46.231.133.111/
21:25 <dmsimard> jeblair: yeah, it's just sqlalchemy .. http://ara.readthedocs.io/en/latest/configuration.html#ara-database
21:25 <dmsimard> it's magic*alchemy
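Per the configuration doc dmsimard links, ARA's storage location is just an SQLAlchemy connection string, so pointing the callback at a central MySQL server is a one-line setting; the host and credentials below are illustrative:

```
# Local collection: the callback on the launcher writes a sqlite file.
export ARA_DATABASE="sqlite:////var/lib/ara/ansible.sqlite"

# Central collection: any SQLAlchemy URL works, e.g. a shared MySQL server.
export ARA_DATABASE="mysql+pymysql://ara:secret@db.example.org/ara"
```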
21:25 <clarkb> dmsimard: one benefit of the static generation is you pay the cost once for any single report, and that cost is eaten by the test node itself. But if it's unwieldy for a web browser to handle, I can see how that wouldn't work well
21:26 <jeblair> cool, so i feel pretty confident we can fairly easily slip this into v3 when the time comes
21:26 <dmsimard> At scale, it's not a web browser issue .. I mean, you're still browsing one page at a time, right? It just takes a long time to generate the static version and it's a stupid amount of files.
21:27 <clarkb> dmsimard: gotcha, so the 50MB isn't a single file that the browser has to uncompress
21:27 <jeblair> (since the callback will run on the launcher, security concerns are lessened)
21:27 <clarkb> it's many little files
21:27 <clarkb> dmsimard: how long does it take?
21:27 <dmsimard> clarkb: right, it's 50MB in ~10k files for the openstack-ansible playbooks. 15 minutes on my laptop.
21:27 <clarkb> gotcha
21:27 <dmsimard> clarkb: it's otherwise a 6MB sqlite file that's loaded instantaneously
21:27 <dmsimard> This is probably the heaviest page from OSA: http://46.231.133.111/playbook/293c1e05-066d-44a6-b7e1-11cc4fc4a1ef/
21:28 <clarkb> dmsimard: and you don't see that high cost reflected on dynamic loads of a centralized server?
21:28 <jeblair> dmsimard: have you tested what it might look like with 3600000 runs in a single db?
21:28 <clarkb> right, ^ being the next thing :)
21:28 <dmsimard> clarkb: if there's no one browsing it, there won't be any load
21:28 <jeblair> (that's ~6 months for us)
21:28 *** rcarrillocruz has joined #zuul
21:29 <dmsimard> jeblair: Nope, I have no clue. tbh there are probably a fair few improvements to make at that kind of scale. Some features are definitely not in yet: searching, paging, etc.
21:29 <dmsimard> I'm only one guy and this is a side project though :(
21:29 <jhesketh> Morning
21:29 <jeblair> dmsimard: well, searching/paging probably isn't too important off the bat -- we can deep-link job results into it
21:29 <clarkb> there is definitely a trade-off between generating things once at high cost vs generating only what you need on demand at potentially high cost
21:29 <dmsimard> it doesn't require "generating" on demand
21:30 <clarkb> dmsimard: sure it does
21:30 <dmsimard> it's flask/jinja rendered in real time, database driven
21:30 <clarkb> dmsimard: you have to serve http and js with db data
21:30 <dmsimard> I don't see that as a potentially high cost
21:30 <clarkb> dmsimard: if it takes 15 minutes to do that for one job, the total cost of all those pages is ~15 minutes
21:30 <clarkb> dmsimard: now if you get users to browse that job several times, it's still 15 minutes
21:31 <dmsimard> No, it's not
21:31 <clarkb> in the static generation case
21:31 <dmsimard> yeah, for static.
21:31 <jeblair> clarkb: though not all jobs get looked at
21:31 <clarkb> in the dynamic case it depends on how many pages they hit
21:31 <clarkb> jeblair: right
21:31 <dmsimard> Otherwise it's 0 minutes, or 1 second for one page or something :p
21:31 <dmsimard> (dynamic)
21:31 <clarkb> my point being, depending on browsing habits one may be better than the other
21:32 <mordred> yah - I think it'll be a thing where poking at it a few different ways will likely teach us a lot
21:32 <jeblair> if i had to guess, from a global carbon footprint perspective, the central server would probably be a win for us :)
21:33 <dmsimard> That said, this is part of the reason why I submitted https://review.openstack.org/#/c/397773/
21:33 <clarkb> dmsimard: oh cool, is that ready?
21:33 <dmsimard> It wfm on logs-dev
21:33 <clarkb> awesome, will review
21:34 <rcarrillocruz> mordred, jeblair: you good with +A https://review.openstack.org/#/c/404974/
21:34 <rcarrillocruz> if so, i'll rebase the chain on it and remove the role defaults
21:35 <dmsimard> clarkb: ty :)
21:35 <pabelanger> doh, misread our DIB schedule; another 45mins until fedora-23 starts on nb01.o.o
21:36 <clarkb> there are other benefits to consider too; for example, rendering in-job means you can self-test changes to the hosting. The downside is that old content won't get bugfixed when you push changes, only new things will, etc. I think one of openstack-health's biggest issues has just been the pure quantity of data
21:36 <clarkb> and since it tries to provide a global view, we can't as easily split that workload out onto the test jobs themselves
21:37 <clarkb> but "we have too much data in the mysql db" is probably a good problem to have, as far as problems go
21:37 <pabelanger> that's what I like about stackviz, the in-job rendering. So I hope we could also do the same with ARA, even if we did a central approach
21:38 *** abregman has joined #zuul
21:39 <jeblair> rcarrillocruz, mordred: i +3'd 404974 but mordred should still see it
21:40 <rcarrillocruz> thx
21:40 * mordred looking
21:41 <mordred> yah - looks great
21:56 <openstackgerrit> James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder  https://review.openstack.org/405663
21:58 <jeblair> mordred, pabelanger, Shrews: ^ that should get rid of the bulk of our idle threads.  even with that change, we can still rework the builder so that all the workers share providermanagers, and if we do that, it's actually a simple boolean switch whether they serialize cloud api calls or not.
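The "simple boolean switch" jeblair mentions: once the workers share one manager per provider, a lock on that manager can optionally serialize its cloud calls. A minimal illustrative sketch, not the actual nodepool change:

```python
import threading

class SharedProviderManager(object):
    """One instance per provider; serialize=True forces one call at a time."""

    def __init__(self, provider, serialize=False):
        self.provider = provider
        self._lock = threading.Lock() if serialize else None

    def run_task(self, task):
        if self._lock:
            with self._lock:   # serialized: behaves like the taskmanager queue
                return task()
        return task()          # parallel: callers hit the cloud API directly
```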
22:00 <openstackgerrit> James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder  https://review.openstack.org/405663
22:03 *** openstackgerrit has quit IRC
22:04 <mordred> jeblair: that patch makes sense - and image uploads are certainly not what TaskManagers exist to help with
22:19 <rcarrillocruz> mordred: mind re-reviewing https://review.openstack.org/#/c/401975/ pls, i addressed clarkb's comments
22:47 <SpamapS> jeblair: indeed, 399871 looks correct now, thanks for spotting that. I'm pushing a new patch that fixes pep8.
22:47 <jeblair> SpamapS: thanks, sorry i missed the pep8
22:48 *** openstackgerrit has joined #zuul
22:48 <openstackgerrit> K Jonathan Harker proposed openstack-infra/nodepool: Write per-label nodepool demand info to statsd  https://review.openstack.org/246037
22:49 <openstackgerrit> Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Add test for variant override  https://review.openstack.org/399871
23:03 *** abregman has quit IRC
23:25 <openstackgerrit> Jamie Lennox proposed openstack-infra/nodepool: Accept user-home in config validator  https://review.openstack.org/404519
23:43 <jeblair> adam_g: https://storyboard.openstack.org/#!/story/2000785
23:46 <jeblair> SpamapS: take a look at https://storyboard.openstack.org/#!/board/39
23:46 <jeblair> SpamapS: i wrote a quick script that uses the storyboard api to semi-automatically manage that board
23:48 <jeblair> SpamapS: the rules are: merged items are removed from the board; inprogress/review must show up in "in progress" or "blocked" ("in progress" is the default); todo must show up in "new", "backlog" or "todo" ("new" is the default).
23:49 <jeblair> SpamapS: so basically, a "todo" task it hasn't seen before will be added to "new".  we can move that to backlog or todo.  when someone grabs it, it will be moved to "in progress" automatically.  we can then move it to/from blocked as needed.  when it merges, it's removed from the board.
23:49 <jeblair> (could easily have a merge lane as well, but i'm not sure how useful it is)
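jeblair's rules boil down to a small status-to-lane mapping. A hedged Python sketch of the logic; the storyboard API calls are omitted and the structure is illustrative, not his actual script:

```python
# For each task status: the default lane, plus the lanes a human may move it to.
LANES = {
    'merged':     (None, set()),                      # removed from the board
    'inprogress': ('in progress', {'in progress', 'blocked'}),
    'review':     ('in progress', {'in progress', 'blocked'}),
    'todo':       ('new', {'new', 'backlog', 'todo'}),
}

def target_lane(status, current_lane):
    """Return the lane a task belongs in, or None to drop it from the board."""
    default, allowed = LANES[status]
    if current_lane in allowed:
        return current_lane   # manual placement within the allowed lanes wins
    return default

# e.g. a freshly-seen todo task: target_lane('todo', None) -> 'new'
```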
23:52 <jeblair> Zara, SotK: ^ fyi
23:54 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement vote tests  https://review.openstack.org/401061
23:54 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement status tests  https://review.openstack.org/401062
23:54 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement reject tests  https://review.openstack.org/401063
