Friday, 2017-06-09

*** jamielennox|away is now known as jamielennox00:02
jamielennoxis there a way i can get zuul to scan my project-config00:03
jamielennoxs/scan/validate00:03
SpamapSmordred: I'm just poking at a thing that translates layout <-> layout and does not attempt to grok jjb00:13
SpamapSMy thinking is to keep it as simple and unambitious as possible. pipelines and projects to a central config is step 1, and then I want to try and have project job selection moved into git repos as .zuul.yaml00:14
SpamapSjamielennox: v2 yes, v3, start it up.00:14
SpamapSjamielennox: v3 has trouble with validation the way v2 did it so all we do is validate that the config can parse.00:15
jamielennoxSpamapS: yea, it's just leading to me debugging by push change to git, restart server, parse logs00:15
jamielennoxi'm not sure how exactly you could expose it with all the in-repo stuff, just asking00:15
SpamapSjamielennox: I think we could definitely write a validator that loads config.00:48
SpamapSjamielennox: it just hasn't been done.00:48
SpamapSlike, just read the zuul.conf and the full layout from every source, like you would at startup00:48
jamielennoxSpamapS: so realistically i don't even need that, just scan the file i give you and does it make sense00:49
jamielennoxoh, but you can't00:49
SpamapSOh yeah for that I think you can write a little cmdline entry point00:49
SpamapSWell you can00:49
SpamapSyou can run it through the voluptuous schema checker00:49
jamielennoxyea, but it depends where things like pipelines and connections are defined00:49
jamielennoxbecause gerrit: is a key, and if that's defined elsewhere your schema will fail00:49
jamielennoxzuul (at least used to) validate reviews against its own project-config so once you have it working it's not so bad00:50
jamielennoxbut if validation fails on startup it seems to take down a thread00:50
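For illustration, a minimal sketch of the "run it through the voluptuous schema checker" idea, assuming a made-up and heavily simplified schema (this is not Zuul's real layout schema, and as noted above a real check would also need the pipeline and connection definitions):

    # Sketch only: validate a layout fragment against a simplified,
    # hypothetical voluptuous schema. Not Zuul's actual validator.
    import sys

    import voluptuous as vs
    import yaml

    pipeline = vs.Schema({
        vs.Required('name'): str,
        'manager': vs.Any('independent', 'dependent'),
    }, extra=vs.ALLOW_EXTRA)

    layout = vs.Schema({
        'pipelines': [pipeline],
        'projects': list,
    }, extra=vs.ALLOW_EXTRA)

    def validate(path):
        with open(path) as f:
            data = yaml.safe_load(f)
        try:
            layout(data)
        except vs.MultipleInvalid as e:
            print("Validation failed: %s" % e)
            return 1
        print("OK")
        return 0

    if __name__ == '__main__':
        sys.exit(validate(sys.argv[1]))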
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Store cache expiry times per status object  https://review.openstack.org/46133000:57
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp  https://review.openstack.org/46133100:57
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key  https://review.openstack.org/46133200:57
SpamapSjamielennox: I think you could write a cli version that validates the whole config. Not so sure about fragments.00:59
*** jamielennox is now known as jamielennox|away01:00
SpamapSjamielennox|away: that said, for fragments... submit review, wait for angry report of failed config load?01:00
*** jamielennox|away is now known as jamielennox01:10
jamielennoxSpamapS: yea, i think once you're up and running you can rely on the reports01:11
jamielennoxjust for now the error is happening during startup and causing problems01:11
SpamapSon startup you can do the whole-config thing01:12
jamielennoxSpamapS: is there anything about bubblewrap that means the executor should run as root01:13
jamielennoxi caught something about it on IRC the other day but don't remember it exactly01:13
jamielennoxwe're currently running executor as a zuul user and i have to change the finger port to make that happen, which is fine, i don't care about users being able to finger the executors directly01:14
jamielennoxbut bubblewrap is the other thing i can think of that would be affected by this01:15
mordredSpamapS: oh! right - the layout conversion. nod01:16
SpamapSjamielennox: no, it should install setuid by default02:22
SpamapSjamielennox: OR if you have a very new kernel, it can do its thing with USER_NS stuff without setuid02:22
jamielennoxSpamapS: so you're saying i have to run the executor as root?02:43
tristanCjamielennox: bwrap should be root setuid so that it can be used by regular user03:25
*** dkranz has quit IRC03:26
*** dkranz has joined #zuul03:31
*** hashar has joined #zuul09:09
*** hashar has quit IRC09:23
*** hashar has joined #zuul09:40
*** hashar has quit IRC10:18
*** jkilpatr has joined #zuul10:59
ShrewstristanC: left you a comment on https://review.openstack.org/472128 along with the +2. was there a reason to avoid the 2 extra characters for 'list_address' ?12:37
Shrewsvs listen_address12:38
tristanCShrews: not at all, it's a mistake, should be listen_address12:40
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add webapp port and listen_address configuration  https://review.openstack.org/47212812:41
Shrewstfw the code you had semi-working yesterday no longer works at all today13:08
pabelangerShrews: tristanC: we should also make sure to update good.yaml fixtures too for 47212813:24
tristanCpabelanger: done13:56
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add webapp port and listen_address configuration  https://review.openstack.org/47212813:56
pabelangertristanC: thanks13:56
tristanCyou're welcome, thanks for the review!14:25
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add reporter for Federated Message Bus (fedmsg)  https://review.openstack.org/42686115:00
ShrewsSo, something has merged within the last few days to affect ansible jobs in unit tests15:07
Shrewsbecause what used to work, now hangs here: http://paste.openstack.org/show/612143/15:08
pabelangerdoesn't look to be using bubblewrap15:10
Shrewshrm. lemme rebuild the env. forgot to do that when i rebased15:12
Shrewspabelanger: you run fedora, yes? what's the package for bubblewrap?15:21
pabelangerShrews: should be bubblewrap for package name15:21
pabelangerthere is also bwrap-oci, but haven't tried that15:22
Shrewshrm, bubblewrap already installed, it seems15:23
Shrews*sigh*15:23
Shrewsok, well the pause was from my wait_for in my playbook. but now i'm not getting build.jobdir populated15:32
* Shrews takes a long lunch break to blow off frustration. bbl15:32
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add reporter for Federated Message Bus (fedmsg)  https://review.openstack.org/42686115:58
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Add Sphinx module for Zuul jobs  https://review.openstack.org/47274316:04
jeblairSpamapS, mordred: can you take a look at that change and its parents?   ^ i'm working on establishing a documentation framework for the zuul stdlib.  obviously that should be spun out into a new sphinx module, but incubating it there for the moment.16:09
SpamapSmmmmmmmmmmmmmm doc framework16:19
mordredjeblair, tristanC, pabelanger, jlk, SpamapS, Shrews: ok. I have sent email to the list on the REST question - sorry it took so long, there was a bunch of research needed doing16:22
mordredclarkb, jamielennox: ^^ you too - sorry for lack of ping16:28
SpamapSmordred: that's good readin16:47
pabelangerjeblair: I have restarted ze01.o.o. We look to be running now16:48
jeblairneat!16:49
jeblairpabelanger: how about we stop zuulv3-dev now16:49
pabelangerjeblair: yes16:49
jeblairi'll do that16:49
pabelangerhttp://zuulv3.openstack.org/16:49
pabelangershould also properly display things now16:49
jeblairzuulv3-dev is stopped16:50
jeblairi will recheck a change now16:50
jeblairhrm, i'm going to restart the scheduler16:51
pabelangerack16:52
pabelangerseen something that time16:52
SpamapShow does zuulv3.openstack.org know what status.json to get?17:02
jeblairSpamapS: it's in the apache config for now17:03
jeblair    RewriteRule ^/status.json$ http://127.0.0.1:8001/openstack/status.json [P]17:03
jeblair  RewriteRule ^/status/(.*) http://127.0.0.1:8001/openstack/status/$1 [P]17:03
jeblairwe should add a story about making the status page tenant aware, if there isn't one already17:04
mordredjeblair, SpamapS: yah - and we'll probably want to figure out an auth story aroud access to tenant status too (I do see that jamielennox already has patches for tenant-aware logs which should also allow a similar amount of protection)17:06
jeblairmordred: right.  for now, the story is, if you have private tenants, build that into your web server infrastructure.17:07
jeblair(like, use mod_auth_foo to restrict access to /private_tenant/....)17:08
jeblairauth in zuul itself would be a significant distraction at this point17:08
mordredjeblair: ++17:08
jeblairSpamapS: have you started any work regarding the new ssh context manager/wrapper thing in the merger?  (this came up in the context of test_timer_sshkey)17:18
SpamapSjeblair: none.. that's way down in the stack unfortunately.17:22
* jlk looks at the size of the scrollbar on mordred 's email, settles in for a long read.17:22
jeblairSpamapS: okay, i think it's about to pop onto the top of my stack17:22
jeblairSpamapS: pabelanger and i are suspecting that change is not at all working in reality17:22
jeblairi do not see how to use the context manager for an initial clone17:25
jeblairSpamapS, pabelanger: okay, i've confirmed that's the problem with the merger.  i will start working on a fix.  it will take a little bit, but i think we can do it fairly easily and comprehensively.17:34
jeblairpabelanger: i have to take care of a few things first; it will probably be a couple of hours until i'm ready with this change.17:38
jlkmordred: just to screw with your noodle, what if we skipped REST and went straight to GraphQL?17:38
pabelangerjeblair: understood17:39
pabelangerjlk: is GraphQL the new hotness these days?17:40
jlkpabelanger: seems like a buzzy word, but I'm not really clear on what problems it solves. I'm mostly aware of it because Github builds their website with graphql, and are now exposing it so that us app writers can get first-class access as they change the platform, instead of waiting for somebody to re-write everything for their REST API.17:42
jlkat some point, we're going to have GraphQL in zuul, to deal with github, at least on a client level.17:42
jeblairfwiw, i'm not sure 'rest api' actually describes what we'll end up doing with zuul anyway.  more like 'http api'.17:42
pabelangerYa, I've only heard of it from here and githubv4 (I think). Not sure what else is using it.  Looks like Facebook started it17:43
jlknod17:43
jeblair(there are almost zero actual REST apis in the world)17:44
SpamapSBecause that's hard and mostly an exercise in correctness not pragmatism? :)17:46
SpamapSLooks to me like GraphQL is more about being super expressive on the client side about what you want from the server and reducing round trips by allowing secondary lookups in the responses.17:47
SpamapSI could see that being useful for a better status responder17:47
SpamapSLike if no jobs are expanded you don't need the lists of jobs.17:48
jlkyeah I'm not seriously suggesting it.17:48
jlkalthough now that I re-read what you're saying17:48
SpamapSjlk: I actually think it's for making what we do with status.json more efficient.17:48
SpamapSI mean, status.json is 110 - 120 kB17:49
jlkMy basic read is that you can say "Go fetch me this info, and while you're there, fetch this adjacent info, and that info, and that over there too, and all the things attached to that"17:49
jlkor you can just say "I only want this tiny bit"17:49
SpamapSif you refresh that 1/s ... that's not a trivial amount of data if you have a lot of users watching.17:49
SpamapSjlk: its like SQL, for backend API queries. ;)17:49
jlkIt'd allow us to go from 4 or 5 round trips to the github API to a single call.17:50
SpamapSjlk: yeah seems like something worth it for Github to push, since it will likely reduce their bandwidth usage and server side wasted overhead.17:50
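As a rough illustration of the round-trip reduction being discussed, a single GraphQL request can fetch a pull request plus adjacent data that would otherwise take several REST calls (sketch only; the repository, PR number, and token handling are placeholders):

    # One GraphQL call to GitHub's v4 API fetching a PR plus its labels and
    # latest commit status; repo/PR values and the token are placeholders.
    import os
    import requests

    QUERY = """
    query {
      repository(owner: "example-org", name: "example-repo") {
        pullRequest(number: 42) {
          title
          mergeable
          labels(first: 10) { nodes { name } }
          commits(last: 1) { nodes { commit { status { state } } } }
        }
      }
    }
    """

    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY},
        headers={"Authorization": "bearer %s" % os.environ["GITHUB_TOKEN"]},
    )
    print(resp.json())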
SpamapSso I think it's a super valid thing to push for when we revisit status.json17:51
SpamapSalso I didn't know we had some kind of frontend for nodepool?17:51
SpamapSor is it just an HTTP backend17:51
jlkI could see these things being useful for monitoring17:52
jlkparticularly in a containerized world, where we're not going to run zuul _and_ a monitoring daemon17:52
jlkI want something external that can poke at it and tell me if it's healthy17:52
jlkand later data mining for efficiency fixes. "where are you spinning your wheels"17:52
SpamapSyeah those /health checks mordred mentioned17:53
SpamapSjlk: that's more statsd's job tho17:53
SpamapSjust let zuul spit that stuff out and you can go get answers from influxdb or datadog17:53
jlkhow does statsd work? what collects it? Do you have to tell it what to send things to?17:54
jlkbrb, walking the furry four legger.17:54
mordredSpamapS: /health checks are a thing k8s people want for prometheus18:00
mordredSpamapS: I'm not saying we implement them today - but it's a thing that will come up in the future18:01
SpamapSI like them for stuff too18:01
mordredyay18:01
SpamapSI used to have nagios ssh'ing into boxes and restarting our apache when /health timed out18:01
SpamapSthis isn't a new paradigm ;)18:01
SpamapSjlk: statsd is daemon that listens for a UDP protocol. You add little calls in your app when you want to increment counters and it sends them off to statsd. Zuul and nodepool already do that.18:02
SpamapSjlk: and then statsd takes those increments and puts them where you want.. graphite/influxdb/etc. Datadog speaks statsd too.18:03
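For reference, the statsd pattern described here is just fire-and-forget UDP metrics emitted from the app; a minimal sketch using the Python statsd client (the host, prefix, and metric names are illustrative, not Zuul's actual ones):

    # Counters, timers, and gauges are sent over UDP to the statsd daemon,
    # which aggregates them and forwards to graphite/influxdb/datadog.
    import statsd

    client = statsd.StatsClient('statsd.example.com', 8125, prefix='zuul')

    client.incr('event.received')               # bump a counter
    client.timing('job.duration', 1532)         # record a timer (milliseconds)
    client.gauge('executor.running_builds', 7)  # set a gauge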
SpamapSmordred: does my response make sense at all?18:18
SpamapSI'm kind of arguing that you have a nice clean "here's where the world talks to Zuul" spot, and then you can have any kind of messy backend weirdness you want behind that.18:19
mordredSpamapS: I goes to read18:35
mordredSpamapS: it does - although amusingly enough part of it (the zk async stream concern related to webhooks) is in the spec I wrote about ingestors that I told people to ignore that clarkb also suggested mqtt for. I don't want to actually get into that debate currently (I think there are pros and cons to each and we should discuss those) - but I don't think we have to for now18:40
mordredSpamapS: I would like to bikeshed on the one vs. two endpoints thought though18:41
mordredlemme respond to the email first though18:41
*** yolanda has quit IRC18:43
jlkSpamapS: right, so okay. The app has to support shuffling things off to statsd itself, and has to have configuration for _which_ statsd to shuffle them off to.18:45
*** yolanda has joined #zuul18:45
jlkSpamapS: that's a little different in that you need prior knowledge of this stuff IN the config file to run the service, making moving an image to different environments a bit more difficult (but not really since we already have some of that stuff).18:46
mordredjlk: it's actually usually configured via env vars though18:46
jlkwhereas if it were health things hanging off the API, it wouldn't matter where they're run, outside parties could poke at the health points.18:46
jlkmordred: I can't tell you how much I hate configuration via ENV vars.18:46
mordredme too - but you know it's the "right" way in cloud native, right? ;)18:46
mordredhttps://12factor.net/config18:47
mordredjlk: but in any case, in this particular instance it's a mechanism where communicating to statsd where the statsd is for your k8s would be fairly easy to accomplish without the app having a priori knowledge, no?18:47
jlkso..18:48
jlkyeah, statsd daemon startup should advertise its location to k8s, which will toss it in etcd18:48
mordredalso - to provide metrics to a /status endpoint, the app has to _save_ the metrics somewhere18:48
mordredjlk: ++18:49
jlkyour service still has to be able to read from etcd to, at startup, know where to send statsd stuff18:49
jlk(and within the cluster that place is going to be fairly static, due to k8s proxy things)18:49
clarkbmordred: it was mostly a response to having a spec for a thing that already exists mostly. I was saying you don't need a spec for that you just need to have zuul read from mqtt18:49
mordredclarkb: right. I'm not convinced that actually solves the use cases18:49
SpamapSjlk: it's push vs. pull all over again. :)18:49
mordredclarkb: which is why I think it'll be fun to talk about18:49
* SpamapS will ponder whilst at the gym18:49
jlkmordred: hrm, I hadn't thought about saving the metrics, just more of getting snapshots of state periodically. Hit the API point to get a snapshot of state, some external system does the time series analysis18:50
mordredjlk: yes. I think polling urls for some state is a FINE idea18:50
mordredlike, polling the zuul status page, for instance - would be a fine way to get an amount of data18:50
jlkI also think statsd is a fine idea.18:50
jlksince we have ALREADY built statsd support into the app18:51
mordredyah- I think having a /status AND also emitting to statsd is best of both worlds18:51
jlkwhich reminds me, should probably take a pass through the github driver code and litter some statsd about18:51
mordredI think there will be metrics we provide to statsd that would be hard to collect and provide to /status - but maybe that's fine18:51
jlkyeah I can see a separation of concerns18:53
jlkyou don't want lengthy analysis to happen when hitting the status/ url18:53
jlknot unless we do something like graphQL and allow taking a sip of the stats vs the firehose18:54
mordredSpamapS: ok. responded with more words than are likely necessary19:12
mordredjlk: agree19:13
pabelangerlynx finger://ze01.openstack.org19:18
pabelanger:D19:18
pabelangerthat is kinda cool19:18
pabelangercannot wait for jobs to be listed19:18
jeblairmordred, SpamapS: i also responded, in an apparently complementary way to mordred.  hopefully walked the right line between "lets decide the now stuff now, and talk about the later stuff later".19:19
jeblairpabelanger: "finger @ze01.openstack.org" should return a nice error message19:21
jeblairlacking a newline19:21
pabelangerjeblair: ya, was playing with that already19:22
pabelangerwas trying to see if any client supports a port other than 7919:23
pabelangerGNU finger apparently does, but that is not shipped in fedora19:23
mordredjeblair, Shrews: that finger command is the most exciting thing ever19:27
* Shrews very angry at finger things atm19:28
* mordred hands Shrews a box of fingers he found laying around19:28
jeblairmordred: you're always bringing game of thrones into things19:29
* Shrews gives middle finger back to mordred19:29
Shrewsdid i win? i feel like i just won19:29
jeblairShrews: yes but you gave the prize to mordred19:30
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107919:30
Shrewsif someone wants to explain why my ansible job never runs in that test (it did until i rebased yesterday), i would be terribly grateful19:31
Shrewsbecause i'm a bit fed up19:31
Shrewstest_log_streamer.py line 10219:32
mordredShrews: when you say "don't seem to have real sync for ansible jobs" ...19:33
Shrewswe can't pause ansible jobs with hold_jobs_in_build19:33
mordredShrews: http://git.openstack.org/cgit/openstack-infra/zuul/tree/tests/unit/test_inventory.py?h=feature/zuulv3#n2919:34
mordredShrews: self.executor_server.hold_jobs_in_build = True seems to work for me?19:34
Shrewsmordred: for fake builds19:34
mordredOOOHOHHHHHHHH19:34
mordredgotcha19:34
mordredsorry - and thanks for clarifying for my dumb brainhole19:35
Shrewsthe entire test attempts to keep the ansible log file around long enough so i can attempt to start the finger thingy and stream it19:35
Shrewswhich i almost had working yesterday (was streaming and getting contents)19:36
Shrewsnow nothing works19:36
clarkbmordred: I have responded to the great api thread of june 201719:36
pabelangerhttp://logs.openstack.org/79/471079/3/check/gate-zuul-python35/fd2756a/console.html#_2017-06-07_17_13_58_908303 seems to be a warning, but I don't see any other tasks after it19:36
Shrewspabelanger: that's another thing i've been avoiding. i have no idea why that's there19:37
pabelangerYa, I don't see any of the other tasks after welcome19:38
pabelangerpossible that ansible is just not running them now?19:38
mordredI think it's actually that something is failing in that method so no file is getting written19:38
* mordred looks19:38
pabelangerI'd actually expect http://logs.openstack.org/79/471079/3/check/gate-zuul-python35/fd2756a/console.html#_2017-06-07_17_13_58_909229 to be ok=4, since you have 4 tasks19:39
mordredShrews: mind if I push up an update with some added debugging?19:40
Shrewsmordred: go for it19:41
Shrewsi can't even get that locally. something missing from my local setup?19:42
jeblairShrews: remember the venv activation required when using ttrun on real ansible jobs?19:42
Shrewsjeblair: yup, got that19:42
jeblairdrat, out of ideas19:43
mordredShrews: I think you may have rebased on a commit that's in the middle of the updates to the callback plugin19:43
mordrednothing that should be causing an issue here though19:44
Shrewswelp, going to just rebase on an older commit that works19:45
Shrewsmordred: fwiw, i was getting that v2_playback method error before rebasing, too19:45
mordredShrews: oh - I say that ...19:45
mordredShrews: nope. there is a bugfix that went in after your rebase that is probably the thing screwing you19:46
Shrewsi thought maybe something was wrong with my test ansible config setup causing that19:46
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107919:46
mordredShrews: I missed an encode('utf-8') on the send to the socket part19:46
jeblairpy3 is the best!19:46
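The class of bug here, for anyone following along: in Python 3 a socket only accepts bytes, so a str payload has to be encoded first. A tiny sketch with a placeholder host and build id:

    # socket.send() requires bytes on py3; passing a str raises TypeError,
    # hence the missing .encode('utf-8') being the fix.
    import socket

    sock = socket.create_connection(("ze01.example.org", 19885))
    sock.send("some-build-uuid\n".encode("utf-8"))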
mordredShrews: that adds a debugging and a rebase19:47
mordredShrews: so either it'll work now, or we'll hopefully see the error message19:47
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107919:48
mordredsorry - one more fix19:48
jlkalright I'm out for the weekend. Would love some eyes on the github caching change object er, change.19:52
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger  https://review.openstack.org/47278819:55
jeblairpabelanger, SpamapS, mordred: ^ that's to fix the current blocker on ze0119:56
pabelangerlooking19:57
Shrewsmordred: the exception would seem to imply something wrong with my job yaml, but i don't see it20:00
* Shrews wishes for a separate utility for zuul-purposed yaml files20:06
jeblairShrews: what exception?20:11
Shrewsjeblair: http://logs.openstack.org/79/471079/6/check/gate-zuul-python35/0744d62/testr_results.html.gz20:11
jeblairShrews: s/image/label/20:11
jeblairShrews: https://review.openstack.org/472372 changed that20:12
Shrewsjeblair: other configs use image20:12
jeblairShrews: before yesterday, yes20:12
pabelangerAh, also didn't know we changed it20:13
jeblairtrying to get the breaking config changes in before we write too many real configs :)20:14
pabelangerya, I don't believe we are using that today in our jobs20:15
Shrewsah, mordred rebased it20:15
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107920:15
mordredShrews: oh - sorry - yah -rebased to get the utf-8 fix, forgot the label change20:16
Shrewsmordred: pushed up a PS to correct it20:16
mordredShrews: fingers crossed20:18
mordredjeblair: your patch has the sads20:18
Shrewsstill don't know why things won't run locally now. just going to have to depend on zuul, i guess20:19
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger  https://review.openstack.org/47278820:21
jeblairmordred: thanks.  let's try that ^.  :)20:21
pabelangermordred: jeblair: On the topic of nodepool-drivers, would a heat implementation fall under the current openstack driver via shade today? Or some new dedicated driver?  Was having some talks with tripleo-CI folks and the use case would be to have nodepool create their OVB resources directly when possible20:22
mordredpabelanger: that's a great question. I think a heat driver could be a separate driver, but I'd want it to use shade not use heatclient directly20:24
jeblairpabelanger: could that use case be covered by linch-pin?20:25
mordredpabelanger: it's also possible that making it a different driver doesn't make sense and just adding it as an option to the current one would make more sense20:25
pabelangerjeblair: yes, that is also possible.  I am really starting to like the idea of a generic ansible driver or we just say that is linch-pin thing20:25
pabelangermordred: understood20:26
mordredlike - I'm guessing it would be "boot this heat stack" rather than "boot this server" - so one could imagine having a "stack" option instead of an "image" option20:26
pabelangerI get the feeling it is more about how people would like to support it long term20:26
mordredbut I don't know enough20:26
pabelangermordred: ya, that was my simplest use case atm20:26
mordredpabelanger: it's also worth doing some thought into how we express that zuul-side and how the resources in question make it into the inventory20:26
jeblairyes, it's 'nodepool' not 'stackpool' :)20:27
mordredwhich is to say - I don't think I know enough to know whether it should be linch-pin, a modification to the current openstack driver or a new driver20:27
mordredbut - I can say that in all three of those cases the heat api interactions should all be via shade ;)20:27
mordredalso - the same question will need to be asked re: multi-node and linch-pin integration - how does that get expressed in zuul config and how does it map into inventories20:28
mordredso there is a worthwhile question to explore regardless of impl details20:28
jeblairyep.  i don't think we have time to fully explore it now.  i'd really like us to focus on getting v3 out the door.20:29
jeblairthese are important considerations, but let's add them to the backlog, not get distracted by them now.20:30
pabelangerYa, I don't want to distract on current efforts for sure20:30
mordredyah. agree. but definitely think there is a thing that is super worth digging in to when the time is right20:30
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107920:32
jeblairmordred, pabelanger: on that note, how about https://review.openstack.org/472472 ?20:32
jeblairpabelanger raised a good point there20:33
openstackgerritMerged openstack-infra/zuul-jobs master: Add initial license, docs, and other config  https://review.openstack.org/47241020:34
openstackgerritMerged openstack-infra/zuul-jobs master: Add revoke-sudo role  https://review.openstack.org/47246120:34
pabelangerYa, on the fence. If we wanted to enforce ansible, it would be a great thing to skip. But the opposite, it is an easy way to run a bash script20:34
jeblairand i guess the answer is how much should the "standard library" python jobs attempt to do?  after conversations with mordred, i'm encouraged to try to push it as far as possible.20:34
pabelanger+2 however, don't want to block20:35
pabelangerya, if we want to support it cool20:35
jeblairpabelanger: i think the idea for this job would be to make it as simple as possible for someone to "just run the python unit tests in a repo".  we might even add things to it later to support nose as well as testr, to try to make it universally applicable.20:35
jeblairpabelanger: so it would work as well for an openstack project as a random github project20:36
jeblairpabelanger: i've been digging into all the stuff we do (in run-tox.sh), and i actually think most of it *can* be universally applied.  which surprised me a bit.20:36
jeblair(as long as we do it carefully)20:36
jeblair(aside from the 50mb subunit limit)20:37
pabelangersure, wfm20:37
jeblairok, 1 test failure on the merger fix; going to refresh that now20:38
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger  https://review.openstack.org/47278820:39
jeblairpabelanger: i think the next step in that series is to start decomposing run-tox.sh into roles20:39
pabelangerYa, that seems right20:40
Shrewsmordred: http://logs.openstack.org/79/471079/7/check/gate-zuul-python35/b0b14f4/console.html#_2017-06-09_20_20_30_42976620:40
mordredShrews: ooh - that's fun20:42
Shrewsmissed encoding there20:42
Shrewsunless it's fixed elsewhere?20:42
mordredShrews: oh for the love of ... yes, that's the bug20:43
* mordred feels bad - was testing this locally with py27 - should stop doing that20:43
Shrewsmordred: we are at least now hitting my artificial failure point again. many many thanks20:44
mordredShrews: you got that encoding or want me to?20:46
Shrewsi got it20:46
mordredkk. cool20:46
mordredShrews: is that working locally again too?20:48
mordredby "working" I mean "breaking properly"20:48
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix zuul_streamer send() call for py35  https://review.openstack.org/47281620:48
jeblairyay passes tests now20:51
pabelangerYay tests20:52
mordredjeblair: ^^ that change if you get a sec to bump it in20:52
jeblairmordred, pabelanger: what do you think of the documentation-related bits i've started on in 472485 and 472743?20:52
jeblairmordred: +320:53
mordredjeblair: ++20:53
pabelangerjeblair: I like them, I was hoping to see the output20:53
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107920:54
Shrewsmordred: i don't think so20:54
jeblairpabelanger: yeah; that'll be easier with zuulv3.o.o up and running.  :)20:54
pabelanger++20:54
jeblairpabelanger: you can run 'tox -e docs' with it checked out locally though and you'll get something20:54
pabelangerAgree, I'm not sure I have them cloned locally yet :)20:56
pabelangerwill have to try that out in a bit, looks like heading out for some food momentarily20:56
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix git ssh env in merger  https://review.openstack.org/47278821:01
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix zuul_streamer send() call for py35  https://review.openstack.org/47281621:06
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107921:10
mordredShrews: zomg. patch 9 actually passed the test for py27!21:11
Shrewsmordred: that one has always passed21:11
mordredoh21:11
mordredoh right - because you skip if not py3521:11
Shrewsright21:12
Shrewsthe string exceptions are new. hoping that's some other random thing21:12
Shrewsmordred: locally, it seems to hang on the 'shell' task since i get no output in the ansible log for that task beyond the name. it seems to continue on zuul, but the current patchset should validate that21:14
Shrewsspecifically:  shell: echo 'Hello, World'21:15
mordredShrews: that sounds like the same issue I'm seeing just when I try to run a simple playbook not in the test suite ... but on the patch where I combined the two log streamers21:19
mordredI wonder ...21:20
Shrewsugh, mysterious StringException again21:20
Shrewsaaaaand nothing in the logs. wunderbar21:21
mordredShrews: http://logs.openstack.org/79/471079/10/check/gate-zuul-python35/3fb578e/console.html#_2017-06-09_21_17_20_20649021:21
Shrewswuh?21:22
mordredhrm - I dont think that's for your test ..21:22
jeblairoh yeah, that looks like the old "everything spews to the console" bug again21:22
mordredyah21:22
jeblairthat's a test timeout21:23
jeblairthere's no way we're going to extract its output (if any) from the console log21:23
Shrewsso perhaps my test IS hanging as well? i removed the short-circuit to prevent the hang21:24
Shrewshanging in zuul as well, i mean21:24
jeblairyep21:24
Shrewsi wonder what happens if i remove the shell task...21:25
* Shrews tests21:25
Shrewshah! it proceeds as normal21:26
mordredShrews: I've got a thing I want to toss up to try21:26
Shrewssomething funky with our shell module21:26
mordredwell - that's gonna be doing the stream-logs-from-remote-host code - which maybe isn't working so well in the test framework21:27
mordredShrews: oh- actually - you need to be runnign a zuul_console on the "remote" node for this to work at all21:27
* mordred feels really stupid21:27
jeblairoh yeah.21:28
mordredone -sec - lemme make you a quick patch thing21:28
jeblairthis test is going to need a good long comment.  :)21:28
mordredyah21:28
*** jkilpatr has quit IRC21:28
mordredso - if we don't test with shell, but instead test with non-shell things21:28
mordredit should be fine for testing the finger log streamer21:28
mordredsince non-shell things do not need zuul_console to produce output21:28
Shrewsk. that's fine. i really didn't need that shell thing anyway21:28
mordredyah- you just need things to produce output so you can test that you stream that output21:29
Shrewsyup21:29
Shrewsjeblair: this test has been a... experience... for sure21:30
Shrewsjeblair: can i do this without defining the 'nodes' in the job? I'm seeing entries for 'localhost' and 'ubuntu-xenial' in the ansible log and they're sort of conflicting when i go to remove the flag file21:33
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't wait for forever to join streamer  https://review.openstack.org/47283921:34
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Use display.display for executor debug messages  https://review.openstack.org/47284021:34
mordredShrews, jeblair: two minor changes to zuul_stream to consider21:34
jeblairShrews: yes; i think we always add 'localhost' in the tests.21:35
Shrewsmordred: lgtm, except the 2nd one has an unintentional indent21:37
jeblairmordred: 2 comments on the first one; a 0 and a -0.5.  :)21:39
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107921:41
jeblairmordred: does 840 put the "starting to log" message into the main job log?21:41
jeblairmordred: if so, i'm not sure i'm keen on that.  maybe we could run with '-vvv' for the tests instead?21:42
mordredjeblair: no - it does not21:46
SpamapSquite the spirited discussion on HTTP stuffs21:46
mordredjeblair: it's probably worth a comment somewhere21:46
jeblairmordred: ok.  i are confused.21:46
mordredjeblair: there are essentially now two places things can go - the log file we defined, and stdout21:46
jeblairmordred: got it, thanks.  :)21:47
jeblairmordred: then i'll be +2 on that after the fixup.21:47
mordredjeblair: self._display.display will put things on stdout - which it seemed like our test suite was capturing in some manner at least in an earlier link from pabelanger21:47
mordredcool21:47
jeblairmordred: yeah, and it'll end up in the zuul executor log, which is fine21:47
mordredjeblair: it's possible we may want to explore intentional uses of those two a little more21:47
jeblair*nod*21:48
mordredjeblair: re: waiting for 30 - tasks will not proceed any further if that is blocking21:48
mordredwe ok with that still?21:48
jeblairmordred: it should only matter if something has gone wrong, or if we're really backlogged reading the log.  so it seems okay to me....?21:49
mordred++21:50
jeblairpabelanger: sweet.  zuulv3 is up sufficiently that it is complaining about config errors related to image/label21:50
SpamapSalso... nobody's biting on my Netflix/zuul reference? ;-)21:51
mordredSpamapS: :)21:51
mordredSpamapS: I almost did, but then decided not to21:51
SpamapSgood choice21:53
SpamapSit was bad21:53
Shrewsmordred: ps11 fails as I expect now21:53
jeblair2017-06-09 21:54:36,390 INFO zuul.IndependentPipelineManager: Adding change <Change 0x7f1bfc2e4780 472483,1> to queue <ChangeQueue check: openstack-infra/zuul> in <Pipeline check>21:55
jeblairthat's progress21:55
jeblair2017-06-09 21:55:13,467 DEBUG zuul.AnsibleJob: [build: 77890c7afde348899105bccf5bcb71f3] Ansible output: b'fatal: [ubuntu-xenial]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \\"15.184.66.20\\". Make sure this host can be reached over ssh", "unreachable": true}'21:55
jeblairthat's unfortunate21:55
mordredjeblair: do you remember why we're doing the log streaming in zuul_stream as a subprocess and not a thread?21:55
SpamapSkey problems?21:56
jeblairmordred: i thought it was something about ansible modules, but i can't recall what.21:56
jeblairer plugins21:56
jeblairlike, something about that forced our hand.  but i have no idea.21:57
mordrednod21:57
jeblairmordred: should be easy to switch, since they have almost the same interface.21:57
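A quick sketch of why the swap is mechanical: threading.Thread and multiprocessing.Process expose essentially the same start/join interface (the streamer function below is just a placeholder, not the actual zuul_stream code):

    import multiprocessing
    import threading

    def stream_logs():
        pass  # read the console log and forward it

    # before: a separate process (sidesteps the GIL, adds IPC complexity)
    worker = multiprocessing.Process(target=stream_logs, daemon=True)
    # after: a thread inside the ansible process (simpler, shares state)
    worker = threading.Thread(target=stream_logs, daemon=True)

    worker.start()
    worker.join(timeout=30)  # cf. not waiting forever on the join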
jeblairSpamapS: highly likely21:57
* mordred is starting to worry about layers of subprocesses with things reading and writing sockets and files and whatnot21:57
SpamapSjeblair: is this trusted or untrusted? While I tested the ssh-agent stuff lightly locally... and the tests verify it works the way we hope it does.. there are a few new moving parts there...21:58
jeblairSpamapS: amusingly, i think *all* of our config right now is untrusted21:58
SpamapSfail closed FTW?21:58
jeblairheh21:58
Shrewsmordred: it does seem like that could get fragile rather quickly21:59
SpamapShow's that subprocess spawned btw?22:00
* SpamapS looks22:00
jeblairmultiprocessing i think22:00
SpamapS:q22:01
SpamapSyay vim reflex22:01
jeblairSpamapS: i think we have more local key problems on the host22:02
SpamapSI've always thought multiprocessing was only for stepping around the GIL.. if you have other issues, subprocess is the cleaner path.22:03
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Use display.display for executor debug messages  https://review.openstack.org/47284022:04
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't wait for forever to join streamer  https://review.openstack.org/47283922:04
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Use threads instead of processes in zuul_stream  https://review.openstack.org/47285022:04
mordredjeblair, Shrews: ^^ local testing shows that to work just fine22:04
mordredSpamapS: you too22:04
SpamapSmordred: is it possible we were worried about GIL contention?22:05
SpamapSlog streaming is going to be a constant load22:05
jeblairthis is inside of an ansible process22:05
SpamapSor is that only inside ansible-playbook and thus not such a concern?22:05
mordredya22:06
mordredthat22:06
mordredI think it was cargo-cult, not a specific decision22:06
SpamapSso we already have our own per-job CPU eater22:06
SpamapSack22:06
* SpamapS is about 25 minutes from EOD'ing early to begin a 3 day LEGOland extravaganza in beautiful Carlsbad CA...22:06
mordredooh fun22:06
SpamapSso.. my focus is already drifting22:06
* SpamapS pulls it together for a last review push22:07
mordredSpamapS: also - I agree, I don't know that moving status.json out of the scheduler is of immediate concern and just shifting that to be aiohttp in place for now seems fine22:07
SpamapSmordred: that stack is a little funny22:08
SpamapSadds the terminate() call and then removes it22:08
SpamapSmordred: I actually really love the idea of coalescing with Etag's of the hashed status.json or something.. but.. yeah, KISS says if you already need to rework your HTTP, just rework it to be the way you want it.22:09
jeblairSpamapS, mordred: you both +2d https://review.openstack.org/472485 which uses https://review.openstack.org/472483 -- does that mean you like the addition of that field?22:13
SpamapSjeblair: I do! +2'd22:16
mordredjeblair: yup22:19
clarkbmordred: fwiw the api services that just talk to the other services in openstack are what are being wsgi'd; there is enough application logic in them talking to backends to make that desirable22:21
clarkbbut maybe I still misunderstand what you are saying?22:21
mordredclarkb: yah - I'm saying that in either case we want to have the api services be separate services22:22
mordredclarkb: so either WSGI or aiohttp it'll be a thing thats job is just to handle http requests22:22
clarkbmordred: right, but the problem is not just that they are separate but that running the http server in python is sadness22:22
mordredclarkb: right - aiohttp is apparently _much_ better at that22:23
mordredone sec - lemme get you links22:23
SpamapSright that's basically the point of aiohttp22:23
clarkbso I see aiohttp as roughly equivalent to eventlet + whatever webserver that nova api (used to) give you22:23
SpamapSI have zero problem with WSGI.. but if we're already going python3.5+ and asyncio for streaming...22:23
mordredSpamapS: exactly22:23
SpamapSyeah I don't think comparing it to eventlet is fair at all22:24
clarkbthey are very similar in both design and use (if you don't use eventlet monkeypatching and instead fully cooperate)22:24
clarkbasyncio has the benefit of the syntax being nicer in new python though22:24
SpamapSeventlet is trying very hard to hide complexity from you, and in so doing creates a debugging nightmare and a compatibility nightmare (basically invalidates half of pypy's advantages because of its trickery)22:24
clarkbSpamapS: no that's not quite true, you can use eventlet fully explicitly without monkey patching iirc22:25
clarkbnow it happens that openstack doesn't22:25
SpamapSDid any of the openstack services not use monkeypatching?22:25
mordredclarkb: https://github.com/aio-libs/aiohttp/issues/234 - the bug about "add docs explaining how to use this in production" and the associated PR discuss that the situation is different22:25
mordredhttps://github.com/aio-libs/aiohttp/pull/23722:25
SpamapSAnd it's not just the monkeypatching that breaks eventlet for pypi22:25
SpamapSpypy22:25
clarkbpypy works fine now22:25
clarkbhas for years (vishy got it going iirc)22:26
SpamapSworks, does not improve your performance22:26
SpamapS(the way pypy should)22:26
clarkbSpamapS: according to intel it does22:26
SpamapSah maybe intel fixed eventlet22:26
clarkbI don't know how they made it happen but they did all kinds of testing and benchmarks22:26
mordredgiven that we _currently_ are doing fine with a paste server in a thread, I think aiohttp is likely to be fine for us too :)22:26
clarkbmordred: sort of we have to cache the response in apache for zuul22:26
clarkbbecause paste in a thread doesn't cut it22:26
SpamapSthere were some things that were causing the jit to run over and over IIRC22:26
clarkb(granted you could continue to do that)22:26
mordredyah22:26
clarkbspecifically for status.json because its big22:27
SpamapSI think we would definitely continue to do that.22:27
SpamapSAnd then refactor to read from zk when we have a zk to read from.22:27
mordredyup. but that'll be for later :)22:27
SpamapSand maybe by then we'll have more than 3 things to have in the web tier that might make WSGI+Flask a more compelling choice.22:28
SpamapS(like the admin requests)22:28
clarkbSpamapS: the recent thread about dropping pypy testing for cinderclient brought up the pypy + openstack + intel stuff and I think they had rough numbers there22:28
clarkbunfortunately they haven't done much upstream (at least not explicitly) that I have seen so its somewhat hand wavy still22:28
SpamapSclarkb: I remember seeing those and thinking it was surprising. As usual, my info is outdated. :-P22:28
clarkbSpamapS: I went to a local talk they gave on it using swift specifically and I want to say it was like 2x speedup but only after processes were running for ~5 minutes22:29
clarkbso it won't make $clitool better but for long lived services even with eventlet it can be quite beneficial22:29
SpamapSSounds like a win to me. :)22:29
SpamapSI figure aiohttp will finally give you twisted level performance without twisted level brain twisting22:31
mordredyah22:32
mordredmy main thinking is that we have to run log streaming anyway - and that's websockets in python and isn't served by wsgi. so we _could_ do wsgi for the other things, but since we're not actually an API service in that way, it doesn't seem like any win to justify 2 http technologies when we can just use one and likely be fine22:32
mordredespecially since people report that aiohttp performs very well22:32
SpamapSalso the API for aiohttp looks simple enough22:33
SpamapSit's not like we're saddling developers with something super weird22:33
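To make that concrete, a minimal aiohttp sketch of serving a status blob (the handler, data source, and port are illustrative, not the actual zuul-web implementation):

    from aiohttp import web

    async def status(request):
        # in a real service this would come from the scheduler or a cache
        data = {"pipelines": []}
        return web.json_response(data)

    app = web.Application()
    app.router.add_get('/status.json', status)

    if __name__ == '__main__':
        web.run_app(app, port=8001)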
clarkbmordred: ya I agree that keeping them similar is worthwhile22:33
clarkbmordred: I'm just afeared of deciding in a year it all has to be rewritten because of average response time22:33
SpamapSin a year it does all have to be rewritten22:34
SpamapSso we can scale out schedulers ;)22:34
jeblairfinger 5b7d2c04bd0a4a229167e1a89d2fa2ce@ze01.openstack.org22:36
jeblaireverybody run that22:36
mordredI am running that22:36
mordredjeblair: mine is not streaming anything but is sitting at 2017-06-09 22:33:12.248107 | TASK [openstack-info : Display networking information about zuul worker.]22:36
jeblairmordred: agreed22:36
mordredcool22:36
jeblairmordred: i am happy about the things it output up to that point22:37
clarkbcensored is a funny verb for "this writes too much data"22:37
clarkb:)22:37
jeblairi am less happy about the lack of things it output after that22:37
mordredjeblair: I will confirm that that is all the file on disk shows22:38
jeblairi'll ssh into the worker22:38
jeblairwhich, amusingly, would be easier if it had gotten around to printing its ip address22:38
mordredjeblair: hah22:38
jeblairthe only thing running on the worker is zuul       839  0.0  0.1 115788 10068 ?        Sl   22:33   0:00 /usr/bin/python /tmp/ansible_XtEykT/ansible_module_zuul_console.py22:41
mordredjeblair: is there a /tmp/console file?22:41
pabelangerI think ansible-playbook is defunct22:41
jeblairmordred: /tmp/console-17a1464795d146bcb85c9802956a908f.log on the worker has the complete output from openstack-info22:41
jeblair2017-06-09 22:33:17.593676 | [Zuul] Task exit code: 022:42
jeblairends with that22:42
mordredjeblair: ok. so somewhere the streaming borked22:42
mordredjeblair: what's the IP?22:42
pabelangeroh, we are also using python3 for ansible. Are we wanting that too?22:42
jeblairssh -i /var/lib/zuul/ssh/nodepool_id_rsa zuul@15.184.65.16722:42
jeblairssh -i /var/lib/zuul/ssh/nodepool_id_rsa zuul@15.184.65.16722:42
jeblairmordred: ^ you'll want to run that from ze0122:42
jeblairmordred: (pabelanger is fixing up missing root ssh keys)22:43
mordredjeblair: I was actually just trying to hit the console streamer22:43
jeblairah k22:43
pabelangerhttps://review.openstack.org/#/c/472853/ and https://review.openstack.org/#/c/472854 are nl01.o.o updates we'll need I think22:43
pabelangerI included flavor-name change too22:44
mordredjeblair: telnet 15.184.65.167 19885 opens the connection, then I put 17a1464795d146bcb85c9802956a908f in and hit enter, but it did not start streaming22:44
jeblairmordred: i also get that behavior22:45
mordredjeblair: maybe we should add some trace logging into zuul_console that we send to a file or something and see what it thinks is going on22:45
jeblairmordred: yeah; i'll see if i can eek anything out of the running process22:46
jeblairmordred: it's looping on this:22:47
jeblair[pid  1322] open("/tmp/console.log", O_RDONLY) = -1 ENOENT (No such file or directory)22:47
jeblair[pid  1322] select(0, NULL, NULL, NULL, {0, 500000} <unfinished ...>22:47
jeblair[pid  1269] <... select resumed> )      = 0 (Timeout)22:47
mordredjeblair: wow. it should never do that22:47
mordredjeblair: out of sync copies of python things?22:47
mordredjeblair: that's an old copy of zuul_console22:47
jeblairweird.  the version on ze01 looks current.22:48
mordredjeblair: we copy to /opt/zuul and work from there, right?22:49
jeblairwe restarted it only a couple hours ago; that change merged yesterday, right?22:49
mordredyah22:49
jeblairmordred: /var/lib/zuul22:49
jeblairso /var/lib/zuul/ansible/zuul/ansible/library/zuul_console.py  should be the operative version22:49
mordredjeblair: I agree - that looks like what I expect it to look like22:50
jeblair/tmp/5b7d2c04bd0a4a229167e1a89d2fa2ce/ansible/untrusted.cfg says library = /var/lib/zuul/ansible/zuul/ansible/library22:51
mordredand /var/lib/zuul/ansible/zuul/ansible/library is what's written to the unrusted.cfg22:51
jeblairwhich is correct22:51
mordredjeblair: it's like we have the same process in our heads22:51
jeblairthe start time of the zuul_console process corresponds with the start of the job (so it's not an old one somehow)22:54
mordredjeblair: I ran an updatedb and then a locate22:54
mordredjeblair: checking the contents of all of the zuul_console.py files, I see only one that doesn't reference console-{uuid}.log22:55
mordredwhich is /var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/zuul/ansible/library/zuul_console.py22:55
mordredbut that also doesn't reference /tmp/console.log22:55
mordredOH WAIT A SECOND22:55
jeblairthat's probably a master checkout22:55
mordredyah22:56
mordredjeblair: but zuul_console takes a filename as an argument22:56
jeblair(that's the working directory of the merger, so its contents shouldn't matter)22:56
mordredare we passing it a filename in the job content?22:56
pabelangerwe set it to /tmp/console.log in our prepare-workspace role22:56
mordredthat's the bug22:56
mordredwe need to stop doing that22:56
pabelangerk, what should it be?22:57
mordrednothing. just leave it out :)22:57
pabelangerk, we should likely disable the override then :)22:57
pabelangerI can patch, 1 sec22:57
mordredyah.22:57
jeblairoh, the uuid thing is the *default*22:57
mordredyou can also set it to '/tmp/console-{log_uuid}.log'22:58
pabelangerport? can that be left or removed?22:58
mordredyah - I think we need to rework that as a thing that has a parameter22:58
mordredpabelanger: just remove it for now22:58
pabelangerack22:58
mordredpabelanger: and we can rethink how we might want to allow this to be parameterized22:58
pabelangerHmm, how do you want to handle clean up on static nodes?22:58
mordredpabelanger: I have some thoughts on that - I'll write them up for folkses22:59
jeblairoh, wait, this has broken static node log streaming hasn't it?23:00
jeblairor has it?23:00
mordredshouldn't have - it will have broken cleaning up lurking log files23:00
jeblairah, gotcha23:00
pabelangerya, we just ensure /tmp/console.log was purged before23:00
pabelangerwe could wildcard it moving forward23:01
jeblairpabelanger: not a bad idea23:01
mordred++23:02
mordredjeblair, pabelanger: we could also send a "we're done, cleanup after yourself" to the on-node console streamer23:03
jeblairya23:03
pabelangeragree23:03
pabelangertmpreaper would also work I think23:04
mordredI actually started poking at "cleanup" the other day, but then got lost in the "refactor these two to be the same code"23:04
jeblairpabelanger: yeah, but tmpreaper is externalizing the cost of zuul ops onto the sysadmin23:04
pabelangerfinger is still working here :D23:04
jeblairbetter for us to clean up ourselves23:04
pabelangeragree23:04
jeblairfinger b82946f7a81a496bbfa45451606c34b2@ze01.openstack.org23:04
mordredI'll pull that thought back up and see if I can't get y'all a patch on monday23:05
pabelangerOh, ah. we hit ffi missing dependency23:05
jeblairShrews: fingering is happening ^ :)23:05
mordredand WORKING23:05
pabelangerya23:05
pabelangerNice, it is now logging our rsync attempts too23:06
clarkbpabelanger: and censoring them23:06
pabelangerOh, this is just pulling logs to executor23:06
mordredjeblair: what do you think of zuul itself doing a socket connection to each node on port 19885 and sending a cleanup command before it returns the nodes?23:06
mordredjeblair: we could also make it a thing in the base job's post playbook23:07
jeblairmordred: i think 19885 has to be read-only; so something in the playbook which logs into the host and sends a signal would be better23:07
clarkbmordred: could you use an at exit type construct for the service instead? I assume we expect the process to die23:07
mordredjeblair: kk23:07
mordredclarkb: no - nothing kills the process currently23:07
pabelangerguess we need to also publish properly to logs.o.o now too23:08
jeblair(we need it to continue running at least until the executor has finished streaming from it)23:08
jeblair(which is slightly *after* the thing it is running is finished)23:08
mordredjeblair: ok - maybe zuul_console should return the PID of the child it spawns23:08
mordredjeblair: and then we can run a post-playbook in the base job that signals that pid and tells it to shut down23:08
jeblairmordred: considering we just said "we're only going to wait 30 seconds for the stream to catch up", we could probably have the streamer wait 40 seconds, then clean up and exit.23:09
mordredjeblair: 40 seconds after what?23:09
mordredit doesn't know when the last task will have been performed23:09
mordred(there is a connect-disconnect per task now)23:09
clarkbmordred: will it get a signal when the parent dies (parents get a signal when children die)23:10
jeblairmordred: oh, right...  hrm.23:10
mordredclarkb: nope23:10
clarkbmaybe we can make parent explicitly send a signal?23:10
mordredclarkb: the parent is LONG gone on purpose23:10
jeblairmordred: pid/signal sounds best so far.23:10
clarkbwell whatever ends the job23:10
mordredclarkb: the first task of the first pre-playbook is "run zuul_console"23:10
clarkbcan lookup pid based on socket ownership or fd23:10
mordredclarkb: ooh - TIL ...23:11
mordredclarkb: what's the best way to lookup pid based on socket ownership?23:11
clarkbthats a good question :) I Know its possible, but not sure of a best way23:11
mordredheh :)23:12
clarkbmordred: looks like psutil23:12
mordredclarkb: sudo ss -lptn 'sport = :19885'23:13
clarkbyou can iterate over processes and for each process you can list the files23:13
mordredor sudo netstat -nlp | grep :1988523:13
clarkbya or lsof23:13
clarkbbut I am assuming you want to do it from python right?23:13
mordredno - it'll be an ansible task23:13
clarkbI think psutil may be easiest there23:13
clarkbah23:13
mordredbecause the only thing that will have context to know things are done is the base job's post playbook23:14
mordredclarkb: I don't have psutil installed anywhere23:14
clarkbmordred: it's a python module23:14
mordredoh - duh23:14
clarkbI think off pypi23:14
mordredyah - let's see if we can get this without needing to install stuff23:15
clarkbhttps://pypi.python.org/pypi/psutil23:15
clarkbya ss/netstat/lsof likely to work fine. Just not sure how consistently they are installed places23:15
clarkbI seem to recall needing to install lsof on centos23:15
mordrednetstat is pretty-much always around right?23:16
clarkbit is in /bin for me23:17
clarkbwhich I think means yes at least for suse23:17
clarkbmordred: looks like netstat is being killed as part of the switch to ip and ss is the new command23:17
clarkbso your original one is likely best23:18
mordredkill $(netstat -nlp | grep :19885 | awk '{print $7}' | cut -f1 -d/)23:18
mordredlovely23:18
mordredss has the hardest output to parse23:18
clarkbof course23:18
mordredroot@ubuntu-xenial-rax-ord-9191322:~# ss -lptn 'sport = :19885'23:18
mordredState      Recv-Q Send-Q Local Address:Port               Peer Address:Port23:19
mordredLISTEN     0      5           :::19885                   :::*                   users:(("python",pid=3440,fd=3))23:19
mordredSIGH23:19
clarkboh thats not too bad give me one sec23:19
jeblairmordred: hah, i thought SIGH was a State.23:19
clarkb| sed -n -e 's/.*,\(pid=[0-9]\+\),/\1/p'23:20
* jeblair renames nodepool states... SIGH. JUST_GET_IT_OVER_WITH. WHATS_THE_HOLDUP.23:20
clarkbSIGH has value 25623:21
mordredclarkb:  that gets me pid=3440fd=3))23:21
* clarkb tests more locally23:21
clarkboh right you just want pid23:22
clarkb| sed -n -e 's/.*,pid=\([0-9]\+\),.*/\1/p'23:22
mordredyes23:22
mordredthank you23:22
clarkbnot to derail anything but can you imagine sorting all this out for windows too?23:23
pabelangerjeblair: are we expecting the current jobs to get killed once job timeout happens?23:24
mordredoy23:25
mordredclarkb: well - this isn't going to work actually - because you have to be root to look up process by port23:25
clarkbmordred: even if that process is sharing a user with you?23:26
mordredhrm. maybe? lemme try23:26
* clarkb checks /proc23:26
jeblairpabelanger: yes23:26
clarkbmordred: things are owned by you in /proc so if utility does best effort it should work23:26
jeblairpabelanger: i'm inclined to just let that happen and check back later23:26
clarkbbut if it just bails out if not root then ya won't work23:26
mordredclarkb: ok. cool. you're right23:26
mordredit works23:26
clarkbcool23:27
ShrewsYay finger things being useful23:27
clarkbmordred: you definitely won't be able to do it for arbitrary users as not root though23:27
pabelangerjeblair: agree, happy to see what happens23:27
mordredclarkb, jeblair, pabelanger: https://review.openstack.org/47286623:28
jeblairmordred: is that going to put us in a catch-22?  it will create a log file that the callback plugin will want to stream?23:29
mordredjeblair: I was just about to say that23:29
mordredjeblair: I'll workon that next :)23:29
mordredjeblair: can probably write it as an option to the zuul_console module actually - so that we can just call "zuul_console: state=absent"23:30
jeblairmordred: interestingly, that's a DoS that your 30s timeout patch will prevent :)23:30
mordredjeblair: and have the zuul_console python module that gets copied over and that does not log to the shell log do the kill in that case23:30
jeblairmordred: ++23:31
mordredit'll also keep the logic in a zuul file and have the base job be easy and symmetrical for others23:31
clarkbmordred: that would be python then?23:33
clarkbprobably still don't want to rely on psutil?23:33
jeblairi think it could be problematic to use something not in the stdlib23:39
clarkbin that case still possible to iterate through /proc/$piddirsownedbyouruser/fd23:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add shutdown option for zuul_console  https://review.openstack.org/47286723:43
mordredclarkb, jeblair: ^^ how's that look?23:43
clarkbthat looks fine, I'm just trying to find a sane way to make it all python without the subprocess23:46
mordredI welcome that23:46
mordredI'm gonna pushup a quick update adding a comment23:46
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add shutdown option for zuul_console  https://review.openstack.org/47286723:47
mordredclarkb, jeblair: it is time for me to EOD - but I think that should at least put a placeholder in the approach we can use there23:51
jeblairmordred: lgtm; ++ to proc if clarkb works it out.23:51
clarkbmordred: https://github.com/giampaolo/psutil/blob/0a6953cfd59009b422b808b2c59e37077c0bdcb1/psutil/_pslinux.py#L1870 ya that does what I dscribe using /proc23:51
mordredclarkb: so we could potentially copypasta some and be not 100% terrible23:51
jeblairbsd licensed, should be ok23:52
clarkbso I think the general process is: list /proc, filter by process dirs owned by us, for each process dir readlink fd/*, if it matches the tcp port then return the pid23:52
clarkbyup licensing should be fine and I think we can mostly use that implementation too23:52
mordredcool23:52
mordredclarkb: if you get a while desire to do that before I do and update the change, I will not be offended23:52
mordreds/while/wild/23:53
clarkbthough looking in my /proc we may need to do a second lookup because I get things like 3 -> socket:[11248846] in fd23:53
jeblairclarkb: fdinfo?23:53
clarkbthat gives me pos, flags, and mnt_id. Apparently htat number in [] is an inode23:54
pabelangerwe seem to be getting a deprecated warning about commas as lists, but not sure where that is coming from just yet23:54
jeblairmordred: for monday: http://paste.openstack.org/show/612171/23:56
clarkbjeblair: mordred you read /proc/net/tcp which gives you all the connections in binary tabular form23:56
pabelangerwarning and error seem to go hand in hand: http://paste.openstack.org/show/612172/23:57
jeblairclarkb: is pid in there?23:57
clarkbso I think what we actually do is read /proc/net/tcp first to find inode for socket on correct port. Then look for that inode to find the process23:57
jeblairclarkb: oh gotcha23:58
clarkbjeblair: it doesn't look like pid is in there :) that would be too easy23:58
clarkbsl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode <- are the fields23:58
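A rough Python sketch of the /proc approach being outlined: read /proc/net/tcp to map the listening port to a socket inode, then scan the fd symlinks of processes we own for that inode (simplified, IPv4 only, and not the eventual zuul_console code):

    import os

    def inode_for_port(port):
        with open('/proc/net/tcp') as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                local_port = int(fields[1].split(':')[1], 16)  # hex-encoded
                if local_port == port and fields[3] == '0A':   # 0A == LISTEN
                    return fields[9]  # the inode column
        return None

    def pid_for_inode(inode):
        target = 'socket:[%s]' % inode
        for pid in filter(str.isdigit, os.listdir('/proc')):
            fd_dir = '/proc/%s/fd' % pid
            try:
                for fd in os.listdir(fd_dir):
                    if os.readlink(os.path.join(fd_dir, fd)) == target:
                        return int(pid)
            except OSError:
                continue  # not our process, or it already exited
        return None

    if __name__ == '__main__':
        inode = inode_for_port(19885)
        print(pid_for_inode(inode) if inode else None)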
jeblairpabelanger: are you sure those are related?23:58
jeblairpabelanger: i mean, it's outputting the comma warning on every invocation.  i would expect it to *also* show up in invocations with the log error.23:59
