Tuesday, 2017-06-06

jlkgit options do not appear to be rate limited, not git:// or ssh://. Maybe https://?  Not sure.00:00
jamielennoxright, we've known caching will fix much of this, most of the information does seem available via events, it's just an issue of timing and storing the data00:01
jlknod00:01
jlkso I assumed with a move to graphQL and with keeping a cache, and using eTags for github side calls, we'll fly under the limits radar.00:02
clarkbI think github rate limits git operations by erroring when it can't fulfill them :)00:02
jlklol00:02
mordredjlk, jamielennox: also, if there's data we need from the webhook that is causing us to need to query for it, that might also be a thing to be like "hey, gh, we're trying to do X without being bad rate limit people maybe can has another field in webhook?"00:03
mordredI mean, that doesn't always work - but sometimes it does00:03
jlkI'm building up a graphQL query right now that would get all the data we ever fetch from the individual API calls, so instead of 14 various API calls, we'd have one graphQL call00:03
mordredneat00:03
jamielennoxmordred: yea, the main place where that is obvious though is an intentional github design00:04
jlkmordred: we tried. The two gotchas are PR comments, and commit status.00:04
jamielennoxso statuses are set by commit hash not by PR, because the same commit can end up in multiple PRs00:04
jamielennoxjlk: so where are you doing that call from? where are we doing more than 2 or 3 requests sequentially?00:06
jamielennox(ignoring the stupid github3.py library fetching the PR objects each time)00:06
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Mutex repo updates and merge ops in executor  https://review.openstack.org/47057100:06
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix reporting on changes dequeued for deps  https://review.openstack.org/47114300:06
jlkwe're not doing any more than that sequentially, was just counting all the different functions we log API rate limits from00:06
jamielennoxya, we should remove that logging as well, statsd at best00:07
jlkfetching user info, project branches, PullReq itself, Pull file names, Pull by sha, pull reviews, repo permissions,00:07
jlkoh crap, I think I counted things where we _set_ stuff00:08
jlkgetCommitStatuses was missing from above list00:09
jlkso 8 separate API calls down to 100:09
jamielennoxjlk: yea, but those calls are not made from the same sections of code. are you thinking one big cached request?00:10
jlkright00:10
jlkmake one query when an event comes in, gather all that data00:11
jlkand have an _update() call for getChange which pulls the PR out of our cache, re-runs the query, and updates anything that's changed.00:11
jlkbasically how gerrit does it00:11
jamielennoxjlk: will be interesting to see anyway, not sure atm whether it will be an advantage to just fixing how we use the rest api00:14
jlkso things like getting a change, we get the PR, we get the files from the PR, we get the reviews, we get the status of the head ref.00:16
jlkjust the way the API works, that's 4 API calls00:16
jamielennoxyea, was thinking cache might help because we get an event for each review and each status thats set00:20
jamielennoxbut it seems common to basically use those as a trigger and ask github for a real state of the world rather than assume you're constructed state is correct00:21
jlkyeah, and we could have missed some of those events during a restart. or network glitch00:22
jlkanyway, I "solved" my earlier problem, just go with updatedAt. Now I'm on to figure out how to construct a list of changed files in a given PR00:22
jlkoh bother, there doesn't seem to be a way to GET the files of a PR in graphQL. wtaf.00:35
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Status branch protection checking for github  https://review.openstack.org/47117500:38
*** jamielennox is now known as jamielennox|away01:17
*** jamielennox|away is now known as jamielennox01:34
*** dkranz has quit IRC01:37
jlkWelp. Couple things missing from GraphQL, I can't get whether or not a user has push access to a repo (to judge the "level" of their review), and I can't get a listing of the files changed in a PR.01:37
jlkhttp://paste.openstack.org/show/611447/01:38
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763401:44
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add github reporter status-url config option  https://review.openstack.org/44979402:01
jeblairw00t, the 'mutex merger' change series (471161 -- 470571) is green now02:53
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add github reporter status-url config option  https://review.openstack.org/44979403:50
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add github reporter status-url config option  https://review.openstack.org/44979403:54
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Expose ref to the executor and url formatter  https://review.openstack.org/46345703:56
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add tenant to url formatting.  https://review.openstack.org/46362803:56
*** linuxlite3 has joined #zuul03:56
*** linuxlite3 has quit IRC03:58
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix race in gerrit+github test  https://review.openstack.org/47116104:06
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763404:13
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Expose ref to the executor and url formatter  https://review.openstack.org/46345704:53
openstackgerritJamie Lennox proposed openstack-infra/zuul feature/zuulv3: Add tenant to url formatting.  https://review.openstack.org/46362804:53
*** eventingmonkey has quit IRC06:47
*** eventingmonkey has joined #zuul06:52
*** hashar has joined #zuul07:21
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add override-branch property to job repos  https://review.openstack.org/46737507:29
*** hashar_ has joined #zuul07:30
*** hashar has quit IRC07:33
*** hashar_ is now known as hashar07:51
*** jamielennox is now known as jamielennox|away10:10
*** jamielennox|away is now known as jamielennox10:16
*** jkilpatr has quit IRC10:41
*** jkilpatr has joined #zuul10:59
*** smyers has quit IRC12:14
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: add support for custom ansible_port  https://review.openstack.org/46871012:25
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Don't report start on unmanaged projects  https://review.openstack.org/45571112:50
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix reporting on changes dequeued for deps  https://review.openstack.org/47114312:55
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Mutex repo updates and merge ops in executor  https://review.openstack.org/47057112:55
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Rename 'repos' job attribute to 'required-projects'  https://review.openstack.org/46737613:08
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add override-branch property to jobs  https://review.openstack.org/46777513:08
*** isaacb has joined #zuul13:29
*** dkranz has joined #zuul13:47
*** smyers has joined #zuul14:02
*** dougbtv_ is now known as dougbtv14:50
*** isaacb has quit IRC15:01
Shrewsholy smokes, i just saw a real-life Transformer. the new recycling truck has a yellow arm that comes out, grabs the recycling can, dumps it and sets the can back.15:09
Shrewsthis is high-tech for Durham15:09
gundalowLike this http://imgur.com/gallery/YiwxQKy15:12
*** hashar has quit IRC15:13
Shrewshaha, a little more elegant than that one  :)15:15
Shrewsexactly like this one: https://www.youtube.com/watch?v=mE0VDZzxTR015:16
jeblair467776 failed to merge because of this unrelated error, which i haven't seen before: http://paste.openstack.org/show/611527/15:37
jeblairthat's the http server failing to shutdown for some reason15:38
jeblairoh interesting, this is unique too: 2017-06-06 14:18:29,120 paste.httpserver.ThreadPool      INFO     1 workers didn't stop properly, and 0 zombies15:40
jeblairi think it just didn't wait long enough for the worker to die15:41
jeblairit gives each worker 0.5s to shutdown gracefully, then kills the thread15:41
jeblairit must have taken more than 0.5s to stop, but still stopped before it got around to killing the thread15:41
jeblairand there's no exception handling inside of paste for that case15:42
jeblairi'd be tempted to just try shutting down again, but the whole process starts with a socket.close(), and that's not going to work twice15:44
jeblaircould try calling self.thread_pool.shutdown(60) again...15:44
mordredjeblair: in https://review.openstack.org/#/c/467634/ I'm calling release but I'm left with jobs at the end - do I need to waitUntilSettled again after the release?15:51
jeblairmordred: yes15:51
jeblairmordred: the release will let them continue, they'll run asynchronously15:52
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763415:52
jeblairmordred: most tests end with a check that all the jobs have finished15:52
mordredok. it's going to work this time :)15:52
jeblairi just rechecked my stack; i'm going to try to figure out how to do a hg pull request on bitbucket in the mean time15:52
jeblair(i barely knew before, and now they changed *everything*)15:52
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add support for defining groups in nodesets  https://review.openstack.org/46761115:53
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini  https://review.openstack.org/46763415:53
mordredjeblair: wow, what do you need to hg for?15:53
jeblairmordred: i'm trying to fix the bug in paste above15:54
mordredjeblair: oh - so - there is a proposed openstack goal for "migrate off of paste" ... the rationale is that apparently paste is essentially a dead project15:55
jeblairwow, yep, that looks like it's the case.15:56
jeblairnice of them to mention that in the readme15:56
*** hashar has joined #zuul15:57
SpamapSso much backscroll15:58
SpamapSShrews: welcome to 2005 trash collection technology!15:58
Shrews[WARNING]: Failure using method (v2_playbook_on_task_start) in callback plugin (<ansible.plugins.callback.zuul_stream.CallbackModule object at 0x7f768f2c2710>): 'ansible_host'16:00
Shrewsso that's fun16:00
* Shrews hunts down his error16:01
mordredjeblair: so - maybe I really should start that thread about tristanC 's dashboard patch - because maybe it's time to have an answer to what we want out of our rest layer16:01
jeblairmordred: yeah, jamielennox has suggestions as well.  let's pick a destination.16:02
mordredjeblair: ++16:02
jeblair(we don't urgently need to arrive there, but i think we'll need to start the journey soon)16:02
* jeblair is still clicking around bitbucket16:02
mordredjeblair: I'm betting that picking the answer will be the hard part - we don't have a ton of http endpoints already, so I bet a migration to $whatever will be pretty simple16:03
* jeblair prepares a d2 roll16:04
jeblairhttps://bitbucket.org/ianb/paste/pull-requests/38/fix-error-on-httpserver-shutdown/diff16:04
jeblairthere, i have sent that off on its long journey into the void16:05
mordredjeblair: well done16:05
pabelangermordred: is there also an effort to move away from pecan too ?16:06
SpamapSCan we just wsgi, and maybe add a handy uwsgi integration for those who don't want more than that?16:07
SpamapSis pecan an http server?16:08
* SpamapS has never looked at it16:08
pabelangerya, I think it started in openstack then left the project, but looks pretty dead now16:08
SpamapSI thought it was just a REST building framework, but that you still need a thing to serve16:08
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Return resulting commits from merger  https://review.openstack.org/46777616:08
pabelangerwell, not may commit over the last 6 months16:09
pabelangerI think wsgi / uwsgi would be great. One less daemon to run16:10
pabelangerwell, once less zuul daemon16:10
pabelangerone*16:11
SpamapSSo the problem with that would be that we'd need to RPC between the scheduler and wsgi16:11
jeblairpabelanger: right, which means one more external dependency and configuration interaction16:11
SpamapSRight now the scheduler answers the web calls right?16:11
jeblairpabelanger: we don't just get to wash our hands of it.  :)  we're responsible for making sure users can actually get the whole thing up and running, one way or another.16:11
SpamapSuwsgi, I think, has an in-process server16:11
pabelangerjeblair: agree. something still needs to be configured16:12
SpamapShttps://uwsgi-docs.readthedocs.io/en/latest/Embed.html16:12
SpamapSoh hrm, that's the opposite I think16:12
jeblairSpamapS: yeah, rpcing shouldn't be too hard though.  we can throw in a gearman call for event injection.  on the log streaming side, we're already moving toward a gearman function to look up some data.16:12
SpamapSjeblair: oh good, that might be easier than I'd thought.16:13
SpamapSAlso, I just want to use the project because they are clearly wordsmiths; from their docs: "Now, we are ready to kick asses with uWSGI ninja awesomeness."16:13
jeblairmordred: don't forget the third leg of this -- the log streaming asyncio/websocket stuff16:14
mordredjeblair: yup16:14
SpamapSdoes asyncio have a powerful enough server implementation?16:14
*** hashar has quit IRC16:16
mordredjeblair, SpamapS, pabelanger: I think there are at least 2 questions: what, if anything, do we use for REST layer, and what do we use for http server - then as jeblair mentions, there's also the log streaming which may or may not want to be directly integrated16:16
mordredSpamapS: http://aiohttp.readthedocs.io/en/stable/ is the asyncio-based http framework thing16:19
SpamapShttp://aiohttp.readthedocs.io/en/stable/web.html#aiohttp-web16:19
SpamapSlooks pretty simple16:19
jeblairmordred: i think that's a fair summary.  to expand a bit: our rest layer (* it's not really rest) is super simple, so we should not get too bogged down in that.  and ideally, we would not ask users to run two different webservers, so being able to integrate log streaming with status page, webhooks, and dashboard would be ideal.16:20
mordredjeblair: agree16:22
jeblairSpamapS: regarding the git ssh wrapper stuff... i suspect there are problems16:24
jeblairSpamapS: ca5d0cac39b026bc23b74903478addd4f702173e is the most recent related change16:25
jeblairSpamapS: da90a50b794f18f74de0e2c7ec3210abf79dda24 is the prior one16:25
*** jamielennox is now known as jamielennox|away16:26
jeblairSpamapS: in short, i think the idea is to make sure we have the appropriate git command environment (with the right ssh key) set whenever we're running git commands on the merger which can use git+ssh16:26
mordredSpamapS: it does. otoh, interfacing it with any sort of wsgi-based api programming gets a little weird. HOWEVER - there is this: https://github.com/etianen/aiohttp-wsgi#design16:26
jeblairSpamapS: the v2 way was to write out little wrapper files, then set an env variable to tell git to use them.16:27
mordredSpamapS: which might allow us to have both options: "run with internal asyncio-based wsgi server" and a "run wsgi in uwsgi"16:27
jeblairSpamapS: jhesketh has moved us to a context manager in v316:27
SpamapSmordred: my question is why bother with wsgi based programming for this case if we can just build a REST API in the asyncio webserver directly?16:28
SpamapSjeblair: AHHHHH16:28
jeblairSpamapS: the test (test_timer_sshkey) was written because there was a path in the v2 code which would not actually invoke the ssh wrapper16:29
jeblairSpamapS: because of the way that the ssh wrapper was semi-persistent, i think that we actually papered over a bunch of ways that it could have otherwise been unset16:30
jeblairSpamapS: i suspect that now, in the v3 code with the context manager, there are *many* instances when we're running without the right environment16:30
SpamapSso with a context manager in place, it may actually be harder to find it?16:30
SpamapSOh and it's maybe not even being used where it needs to be?16:31
jeblairSpamapS: in particular, if you look at the merge function (_mergeItem), the context manager is used in the second half, but not the first half.  the first half performs a 'git remote update' which seems like it should need the ssh key info.16:31
jeblairSpamapS: i'm wondering if actually we should look into moving the context manager idea into the Repo() class (which is our wrapper around git.repo), so we can be more certain all the right commands have a context manager set up.16:32
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: add log streaming test  https://review.openstack.org/47107916:32
SpamapSjeblair: makes sense to me.16:33
jeblairSpamapS: either that, or adopt a pattern of establishing a context manager for the duration of the high level functions in Merger (like, for the entirety of _mergeItem)16:33
jeblairSpamapS: i suspect we're not seeing this problem because on zuulv3-dev, the zuul user's ssh private key is what we use with gerrit, so we aren't actually exercising this.  it's only an issue if the connection's ssh key is not in ~/.ssh/id_rsa.16:34
jeblairSpamapS: as for the test itself, i think once we establish the context manager the way we want, we shouldn't have to worry as much about "does the key get set up for this particular kind of job"; it should just be handled by the merger/repo api.  it might still be valuable to have a test of whether that's working, however, the mechanism would be completely different than the ssh wrapper file check that's there now.16:36
jeblairSpamapS: so in short, we probably just need to drop that test, and then see if there's some alternative way we can verify the context manager.16:37
SpamapSjeblair: I think if we build the context manager into Repo, a simple test of whether or not Repo calls it is enough.16:38
jeblairmordred, clarkb: i don't understand this failure: http://logs.openstack.org/77/467777/2/check/gate-zuul-python35/0dff23e/16:50
jeblairi think we've seen it a few times on the python3 tests16:50
jeblairthere's no testrepository file16:50
mordredjeblair: I do not understand that either16:51
jeblairbased on the console log (which has a whole bunch of stuff spewing out?  it's not supposed to do that), it stopped doing things after 7 minutes, and then sat there for the next 38 minutes.16:51
jeblairwhat's the tmpgh4exbjp file?16:52
clarkbjeblair: should be the temp subunit file that get renamed when things die gracefully. We capture it when its not so graceful16:53
jeblairthe perms seem wrong on that :(16:54
jeblair-rw------- 1 jenkins jenkins   85704 Jun  6 15:59 tmpgh4exbjp16:54
jeblairi just chmodded it16:55
jeblairthat's not giving me any more insight16:58
jeblairi don't see any indication of hung tests16:59
jeblairit looks like all the tests passed: 265 successful tests and 18 skips.  that matches the next successful py35 run.17:01
jeblairso why didn't testr finish?17:02
mordredjeblair: maybe something with a hung subprocess and testr wasn't able to properly shutdown?17:03
mordredjeblair: like if something had an orphaned/borked filehandle or similar?17:04
*** jamielennox|away is now known as jamielennox17:05
jeblairmordred: hrm.... maybe -- so testr runs (in this case 4) subprocesses for the tests.  within each of those subprocesses, we have a hard timeout for each test.  that did not trigger, so we know that each test completed.  but you're thinking the subprocess itself didn't exit, even though it finished running all of its tests...17:06
jeblairmordred: i wonder if all 4 subprocess would have to hang like that for us to see this, or if just one would be sufficient....17:06
mordredjeblair: dunno - and that'sa TOTAL stab in the dark too17:07
jeblairmordred: ya.  so far it fits the (limited) evidence :)17:07
jlko/17:09
jeblairi'm going to try to catch this in action17:09
jlkSpamapS: I think I left you a land mine. The build of z8s that produced an image I had commented out almost all the dib elements so that it was incredibly simple (and likely non-functional).17:09
clarkbjeblair: I think one is sufficient as testr waits for all to exit17:10
jlkSpamapS: in /root/etc/nodepool/nodepool.yaml17:10
jeblairclarkb, mordred: so i think the debug procedure for this is: when it's hung, get a process listing, open file listing, and possibly gdb stacktraces17:10
* jeblair writes a status json scraper17:11
SpamapSjeblair: +1 that's what I was going to suggest actually17:13
SpamapSjlk: ACK, I got side tracked playing with kompose17:13
SpamapSand Helm17:13
mordredSpamapS: ZOMG. I finally got https://review.openstack.org/#/c/467634/ right17:20
mordredSpamapS: or, rathre, I finally got the tests right17:20
mordredSpamapS: so - can haz review? also, the one before it (https://review.openstack.org/#/c/467611) you had -1'd for tests, but the tests are in that patch and now pass17:20
SpamapSYeah I'll hit that soon17:22
SpamapShey we are talking about shutting down BonnyCI's v2.5 and one thing I realized is we need to translate the layout17:22
SpamapSpabelanger: did you already start on a translator?17:22
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory  https://review.openstack.org/46763517:22
SpamapSI was thinking we could start one.17:22
mordredSpamapS: I do not believe we have started on a layout translator, no17:23
mordredSpamapS: I have some very old patches where Istarted thinking about standalone jjb translator -but those are likely bong at this point17:23
jeblairw00t i think i have a node with the problem17:24
pabelangerSpamapS: no translator yet17:24
jeblairhttps://etherpad.openstack.org/p/LXWKvYpOGt17:24
SpamapSjenkins  11410  0.0  0.8 3155492 67640 ?       S    16:48   0:00 python -m subunit.run discover -t ./ ./tests/unit --load-list /tmp/tmp.7mpWDK7dfm/tmp73pe3n8017:26
SpamapSthat one looks stuck17:26
jeblairapt-get install lsof :)17:28
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use the executor cached repos more often  https://review.openstack.org/46777717:29
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Check out the appropriate branch in executor  https://review.openstack.org/46777817:30
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_one_branch cloner test  https://review.openstack.org/46777917:30
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Release executor jobs on test exit  https://review.openstack.org/46855317:31
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add LoggerAdapter for executor jobs  https://review.openstack.org/46855417:31
SpamapSpython  11410 jenkins   28u     unix 0xffff880234a2cc00      0t0  129510 /tmp/tmp.7mpWDK7dfm/tmp1aaa0lim/zuul-test/lib/executor.socket type=STREAM17:31
SpamapSjeblair: RPC socket still open?17:31
SpamapSseems like that should not be17:32
jeblairagreed; also there's an epoll, so geard may still be running17:32
SpamapSso yeah, somewhere a test didn't properly assert end state. I wonder if we have tests that don't extend ZuulTestCase but do fire some of that up.17:33
jeblairtwo open socket's actually17:35
jlkI believe https://review.openstack.org/464252 is ready for review, closes out a TODO17:35
jeblairSpamapS: unfortunately, i didn't get the nice python backtrace for that thread17:36
jeblairbut we seem to be stuck in recv17:36
jeblairstrace says recvfrom 817:37
jeblairpython  11410 jenkins    8u     IPv4             163332      0t0     TCP localhost:7900->localhost:40776 (ESTABLISHED)17:37
jeblairso we're reading from the tcp connection17:37
jeblairi don't know what that connection is17:39
jeblairneither of those port numbers show up in the console log17:40
jlkI'm picking up task 4664, making config repo updates work on github.17:40
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Update the merger recent dict when saving the repo state  https://review.openstack.org/46959517:40
SpamapSjeblair: lsof should show both sides17:41
jeblairSpamapS: it does; it's the same process17:41
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Actually check out the branch in the jobdir  https://review.openstack.org/46855617:42
jeblairso it's probably the fake gearman server, or the fake zookeeper17:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test  https://review.openstack.org/46855717:42
SpamapSjeblair: I'd guess gearman, because I feel like the ZK flaps and polls a lot17:43
mordredjesusaur: in 449794 - does the status get set to the status url while the job is running then to the log url when it's done? or is this only about log url when done?17:43
jeblairSpamapS: yeah, actually, the fake gearman would be the only thing with both connections on the same process17:44
SpamapSoh right, zk is real zk17:45
SpamapSjeblair: so perhaps it's stuck in getJob?17:45
jeblairSpamapS: yeah, though could also be the server17:47
jlkComing up with a good verb to reflect that content in a repo/branch we care about has changed. Originally it was keyed off of "change-merged" event in gerrit, and in github it'll be off of a "push" event. I'm thinking "branch_updated" as a boolean. Any objections?17:48
jeblairjlk: none here17:49
jeblairSpamapS: i could really use a real python bt right now, but gdb isn't cooperating17:49
SpamapSjeblair: :(17:50
mordredjesusaur: nevermind. the previous patch in the stack answers my question17:53
jlkjeblair: unless I'm blind, I'm not seeing a test case that appears to test the functionality of a change being merged that has zuul.yaml content, leading to a reconfiguration.17:58
mordredjeblair: +2 on the stack up to 470442 - so pending getting the hung-tests thing figured, I think it's all good to go17:59
jeblairjlk: i think there's some coverage in test_v3.py in test_dynamic_config.  but only for a .zuul.yaml change landing (it ensures that the dynamic config becomes the static config after reconfiguration after it lands).  that should generally cover the same code, but a new test that explicitly tests what you describe would be a good addition i think.18:01
jeblairmordred: thx!18:01
jeblairi'm trying to understand how the python gdb stuff has changed with gdb7 and python318:01
* mordred hands jeblair hip wading boots18:02
jeblairhttps://docs.python.org/devguide/gdb.html18:02
mordredjeblair: oh - that's pretty neat - all you need now is that python-gdb.py file18:05
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Handle job or build is None in formatUrlPattern  https://review.openstack.org/47041518:07
jlkjeblair: what should happen if a (.)zuul.yaml file is removed? Should that cause a reconfigure too?18:08
jeblairjlk: erm, yes.  i'm pretty sure that's untested.  :)18:09
jlklol!18:09
jeblairjlk: (who would want to do such a thing?!)18:09
jlkwell, ¯\_(ツ)_/¯18:09
jlkI'm just adding code to the github driver to expose the files associated with a push18:09
jlkand it breaks them down by added/modified/removed. I figured we should probably do _something_ on a file removed18:10
jeblair++18:10
clarkbjeblair: for that tcp connection both ends are process 11410?18:13
jeblairclarkb: yep18:13
jeblairfinally!18:14
jeblairi have a python backtrace18:14
jeblair'apt-get install python3.5-examples' and then the rest is in the etherpad18:14
jeblairand we were wrong, it's not gear; it's the log streamer test18:15
jeblairi think that's enough info to work from now18:16
jeblairi'm going to try to get lunch before the infra meeting18:16
jlkPython style help. I'm using a list comprehension to fill a set, so I'm calling f.add(blah) as the "action" in the comprehension. So this line will return something like [None, None,  None...] that I just don't need. I was going to just assign that to "_" and never use it, but pep doesn't like that. Should I just let it "return" that value unassigned?18:45
*** mordred has quit IRC18:47
jlkmordred: re multiple ingestors, I was thinking on that topic the other day too, and I was going to "solve" it in Bonny with a caching load balancer with a single backend. So that if the backend went away, the LB would cache until it showed up again, or eventually timeout.18:47
clarkbjlk: can you paste the comprehension18:47
jlk        [[files.add(f) for f in c.get('added') + c.get('modified') +18:48
jlk          c.get('removed')] for c in event.get('commits').values()]18:48
clarkbjlk: you may just need something like [ f.add(x) for x in foo if x18:48
*** mordred has joined #zuul18:48
jlkI figured a set was more efficient at adding unique items than doing the if in a list comprehension.18:48
clarkbjlk: as an alternative f.add([x for x in foo if x])18:48
jlkoh hrm, yeah you can add a scalar?18:48
clarkboh not add() but there is a method iirc18:49
jlkhrm, no.18:49
jlknot with add()18:49
jlkfoo.update()18:49
jlkhrm, maybe not18:52
jlkoh I see why. Because it's a double comprehension I'm getting a list of lists return18:53
jlkdoing the f.add() inside the inner comprehension allows it to just flow, doesn't matter that the return is a list of lists.18:54
clarkbI think you can collapste them. [files.add(c.get()+c.get+c.get()) for c in vent.get(commits).values()] <- roughly18:55
jlkthat's still calling add() inside the comprehension, so you'll get an unused "return"18:56
clarkbya in this case just let it be c.get()+...+c.get() then do the files.update(comprehension here)18:57
clarkbjust pointing out that you don't need two lists I don't think18:57
jlkoooh, maybe I'll map() this18:57
clarkb(dependso n how you want to represent the data structure ultimately)18:57
jlkor I'll just move this back to real loops and not try to be cute.18:58
*** jkilpatr has quit IRC19:03
jlkoh map() also returns none19:04
*** hashar has joined #zuul19:07
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory  https://review.openstack.org/46763519:25
*** harlowja has quit IRC19:29
*** harlowja has joined #zuul19:50
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory  https://review.openstack.org/46763519:57
SpamapSjeblair: so, for the streamer recv code.. it looks like its non-blocking.... what gives?20:03
jeblairSpamapS: is it?  it's whatever a streamingserver does; i'm not familiar with it, and just started looking at it20:05
jeblairer, socketserver rather20:05
Shrewsstreamer issues?20:06
SpamapSShrews: https://etherpad.openstack.org/p/LXWKvYpOGt20:07
SpamapSsee the bottom20:07
SpamapSa test process is stuck in that thread20:07
SpamapS  File "/usr/lib/python3.5/socketserver.py", line 234, in serve_forever20:08
SpamapS    self._handle_request_noblock()20:08
Shrewsneat20:08
jeblairi can reproduce by starting a LogStreamer, connecting to it without sending anything, then stopping20:10
Shrewsif that's the serve_forever() thread, then i would expect non-blocking behavior (so we have the opportunity to stop the server)20:11
jeblairnote that in that case the *foreground* process exits, but since we're using a forking server, the background process handling the request remains20:11
jeblairShrews: no, that's the request handling subprocess20:12
jeblairthe serve_forever thread has completed20:12
SpamapSI can't really see what is blocking20:13
jeblairi think there are two potential things of interest: a) that request handling subprocess is blocking, and b) that one of those sticking around can cause testr to hang20:13
jeblairSpamapS: recv() is a blocking call20:13
mordredjeblair: I believe we can set a sockettimeout on that socket before we make the recv call20:13
SpamapSjeblair: wait, are these stacktraces bottom->down?20:14
mordredjeblair: because we're already i na handler so, the socketserver should have already done the select bits meaning we should be expecting the changeid in pretty short order20:14
* SpamapS was reading it as if the last thing was the place we were at20:14
Shrewsjeblair: ah, well the one log streamer test does just that: connects w/o sending anything then disconnects, but i've never seen it hang20:17
SpamapSShrews: likely a race20:18
jeblairSpamapS: ah yeah, innermost at top20:18
jeblairmordred: i agree20:19
jeblairis a socket recv timeout the right approach, or should we make a select loop?20:19
mordredjeblair: the socketserver already has a select loop20:20
SpamapSI thought we were asyncio'ing and had an event loop?20:20
SpamapSor this is just the log streamer, not the websocket20:20
mordredwhich is what is causing this handler to be run - the issue is that something connects then never sends data20:20
mordredSpamapS: yah20:20
jeblairmordred: i mean should we put a select loop in the handler method20:21
jeblair(it's technically the right thing to do because there's no guarantee that our request is going to arrive in one packet)20:21
SpamapSyeah, 2 lines before the spot that called into our request handler in socketserver.py is where the select was done on that socket20:21
mordredjeblair: I'm good with select loop in the handler20:22
jeblairhow long shall we give our clients to send their request?  10s?20:23
Shrewsi think that's more than reasonable20:24
Shrewsso ++20:25
jeblaircool, i'll hack this up real quick like20:25
*** dkranz has quit IRC20:32
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Extend in-repo config update support to github [wip]  https://review.openstack.org/47147720:32
jlkjeblair: ^^ that still needs the new tests we talked about, but passes existing tests. I'm going to play with the code with some real events stuffed in to see if it works right. A quick once-over would be appreciated.20:33
*** jamielennox has quit IRC20:34
mordredjlk: neat20:36
* jlk pedals off in search of calories.20:36
*** jkilpatr has joined #zuul20:38
*** jamielennox has joined #zuul20:38
mordredwoot! https://review.openstack.org/#/c/467635/ stack is green20:40
mordredjlk: when you get back, I'd love your eyes on that ^^ - it changes how we're ansibling a bit20:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use a select loop on log streamer socket reads  https://review.openstack.org/47148020:45
jeblairmordred, Shrews, SpamapS: ^ hopefully sufficiently paraniod20:45
mordredjeblair: it double decodes20:46
mordredjeblair: but otherwise looks great20:47
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Use a select loop on log streamer socket reads  https://review.openstack.org/47148020:48
jeblairmordred: derp, thx20:48
mordredjeblair: awesome. way to use gdb!20:49
mordredjeblair: maybe once we land that the rest of your stack can land20:49
jeblairmordred: i'm very happy to have cracked the py3 gdb mystery :)20:59
jeblairmordred: +2 on your stack21:04
Shrewsjeblair: lgtm21:10
jeblairjlk: generally looks good; i left a note about a potential gerrit improvement inline21:12
jeblairclarkb: are you interested in looking at changes 470049--470393 ?  they are the zuul-cloner tests morphing into executor git repo prep tests.21:17
jeblair(ignore the verified -1 or -2; they all pass tests)21:18
clarkbjeblair: I can, just getting back to computer post lunch now. Will take a look shortly21:19
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use a select loop on log streamer socket reads  https://review.openstack.org/47148021:20
jlkmordred: opened.21:24
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Replace equals None with is None  https://review.openstack.org/47107521:29
clarkbjeblair: I'm looking at https://review.openstack.org/#/c/470385/3/tests/unit/test_cloner.py and states is a list with one item in it, and self.builds should have a single build in it for integration. Why use a for loop there?21:34
clarkbthe outer for loop specifically21:35
jeblairclarkb: easy copypasta between tests.  we can probably refactor that into a helper method now.21:35
jeblairthat entire for loop is now identical in all tests21:36
clarkbgotcha21:36
clarkbI also left a comment in there about the debug logging not sure if that was intentional to have or just test debugging21:36
jeblairoh that's from the copy pasta too.  it's marginally useful.21:37
jeblair(really, there's probably one or two things in there that are useful, and it would be better to just log it/them)21:37
jeblairnah, it's not really useful. i'll drop it.21:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Factor out duplicate code in executor repo tests  https://review.openstack.org/47149221:42
jeblairclarkb: ^ :)21:42
clarkbthanks, assuming its at the tip of this stack I will get there eventually21:42
jeblair(yep21:42
clarkbhttps://review.openstack.org/#/c/470390/3 has interesting implications for grenade21:43
clarkbany idea how setup.py develop will handle that21:44
clarkbmordred: ^21:44
jeblairSpamapS: https://review.openstack.org/456090 is relevant to earlier discussion, but i don't think it changes any conclusions21:44
clarkbjeblair: out of curiousity why not rewrite https://review.openstack.org/#/c/470390/3/tests/unit/test_cloner.py to assert that checking out oldbranch and master works?21:44
clarkbI guess that doesn't really add any coverage compared to other tests?21:45
jeblairclarkb: oh, for grenade i was imagining that we would have devstack-gate clone a second copy on the host, so we'd still have separate old/new checkouts21:45
jeblairmordred: ^21:45
*** hashar has quit IRC21:45
clarkbgotcha21:46
jeblairclarkb, mordred: so we wouldn't change the underlying branch after having done a python install21:46
jeblairclarkb: and yeah, especially considering ^ i don't think that approach would add coverage21:46
clarkbya its up to the test framework more so than zuul to affect that21:47
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Verify executor branch checkout  https://review.openstack.org/46856421:47
jeblairyup21:47
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_project_override  https://review.openstack.org/47004921:48
mordredclarkb, jeblair: I agree re: grenade above21:50
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_periodic  https://review.openstack.org/47005321:52
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_post_checkout  https://review.openstack.org/47038121:53
jeblairmordred: one thought about https://review.openstack.org/467635 -- does combining inventory and vars.yaml into one file make it harder for folks to reproduce test runs?21:54
mordredjeblair: oh - that's an excellent point21:55
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Enable test_post_and_master_checkout  https://review.openstack.org/47038521:55
jeblairmordred: (previously, vars.yaml contained "things about the test run which are useful for local reproduction" and inventory had "things you will need to completely replace")21:55
mordredyah21:55
mordredkk. let's skip that one21:55
mordred(the groups and inventory is the more important one)21:56
jeblairmordred: ok.  i +2d it anyway, but just thought of that.  it's all hand-wavey until we actuall have a reproduce procedure anyway21:56
mordred++21:58
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Fix passing labels to Gerrit when they are not defined in All-Projects  https://review.openstack.org/46542021:58
jlkwell...21:59
jlkis the inventory file written out?21:59
jlkor do we just manage it internally?21:59
mordredyes, it's written out21:59
jlkso it's still there, just not as easily consumable22:00
mordredas well as the vars file - although we have not yet added saving those things to log server22:00
mordredjlk: well - today we're writing an inventory and a vars file - the change puts the vars into the inventory file22:00
mordredjlk: so rather than the user fetching vars.yaml and ignoring the written inventory, they'd need to fetch the inventory and then replace the host entries22:01
clarkbjeblair: https://review.openstack.org/#/c/470442/2 catches a legit bug in the test fixtures22:06
clarkbjeblair: I have approved it but it won't merge until we fix that (same thing with the cleanup child change)22:07
jeblairclarkb: ah yeah, tobiash's start-reporting change is going to expose race conditions for a while.22:10
mordredjeblair: dontcha love things that expose races?22:11
clarkbjeblair: in this case I think its because a project config defines a project other than itself?22:11
jeblairmordred: well, to be fair, the change *caused* the race :)22:11
clarkboh wait thats the test that tests that hence the exception22:11
jlkthat was a difficult sentence to parse.22:11
jeblairbasically, we should suspect any failure of an assertion on reporting counts to be an expression of a race from the start-reporting change22:12
jeblairclarkb: right.  a recheck would probably push that through.  but i'm trying to fix all the reporting count assertions as they come up22:12
jeblairso i'll do that one now22:12
jeblairhttps://review.openstack.org/455711 is the change in question22:13
jeblairjamielennox: i get what's going on in https://review.openstack.org/463457 now, and left a comment about a further update.  also, i'm curious about your thoughts on my second comment there (at 2017-05-19 20:39:43+0000) -- if we have all of the builds return their own log urls, can we automatically determine the buildset log url from that?22:35
jlkuh... I think I'm running into an executor bug22:37
jlkwhere it's trying to copy files that already exist?22:37
jeblairjlk: neat!  pastebin?22:38
jlkcapturing22:38
jlkthere's a lot going on here: http://paste.openstack.org/show/611566/22:39
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add github reporter status-url config option  https://review.openstack.org/44979422:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove test_git_cache_dir  https://review.openstack.org/47038922:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove test_upgrade  https://review.openstack.org/47039022:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove test_periodic_update  https://review.openstack.org/47039122:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove test_post_checkout  https://review.openstack.org/47039222:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Rename TestCloner TestExecutorRepos  https://review.openstack.org/47039322:43
jlkI'm guessing that shutil.copytree doesn't like the sub contents in there?22:43
jeblairjlk: there was a change to that recently... maybe "rm -rf /var/lib/zuul/ansible" and then try twice more?22:45
jeblairjlk: obviously the first time should work; will be interesting to see if the second time fails or not22:45
jlkso shutil.copytree says that the destination must not exist22:45
jlkbut there's no checking in zuul code to see if the destination already exists22:45
jeblairjlk: ah, then i bet this is a real bug22:45
jlkwhat is the intended behavior? Replace changed files? or initial copy only?22:46
jeblairmaybe the fix should just be to rmtree target_dir at the start of _copy_ansible_files22:46
jeblairjlk: i think replace (and remove obsolete) is most desirable22:46
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reject some untrusted config objects  https://review.openstack.org/47044222:48
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Factor out duplicate code in executor repo tests  https://review.openstack.org/47149222:48
jeblairclarkb: there's the rebase+fix ^22:48
jeblairalso, my gertty dashboard for zuulv3 is once again under 100 changes (oy!)22:49
clarkbjeblair: gerrit doesn't seem to know what the parent of that change is?22:51
clarkboh I wonder if it s a mereg commit22:51
jeblairshould be branch tip merge commit22:52
clarkbdoesn't the merger do that for us in the tests?22:52
jeblairclarkb: do what?22:52
clarkboh I get it your new test also needed the updated value22:52
clarkbjeblair: merge against target branch and provide that as zuul ref22:52
clarkbbut the problem was that you also need to change your assertion to 1 in the untrusted project test. I get it now22:53
jeblairyep22:53
jeblairthis series and another parallel series had an interaction, which is why it started failing in the gate (when the other series merged)22:53
jeblairi found it easiest to wait for this change's parent to merge, then rebase on branch tip so i'd have latest from both22:54
jeblairpabelanger: what's the zuulv3 relevance of https://review.openstack.org/413103 ?22:58
pabelangerjeblair: that was around talks about creating a minimal chroot for bwrap with DIB22:59
pabelangerwe can likely remove the topic for now22:59
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Factor out duplicate code in executor repo tests  https://review.openstack.org/47149222:59
pabelangersince we are using host FS23:00
jeblairpabelanger: k; done23:00
jeblairmordred: are https://review.openstack.org/411512 and https://review.openstack.org/411473 still relevant?23:01
jeblairmordred: i've marked the zk-shim changes WIP23:01
jeblairjamielennox: https://review.openstack.org/445656 has necessary reviews but could use an update for py323:02
jeblairShrews: https://review.openstack.org/464283  is +3able by you23:06
jeblairjhesketh: i'm abandoning some very old changes of yours which I don't think are relevant any more: https://review.openstack.org/304134 and https://review.openstack.org/34424823:13
jeblairadam_g, mordred, jlk: see comment on https://review.openstack.org/46302223:32
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Convert some leftover changeish mentions to ref  https://review.openstack.org/46571823:34
jlkblargh!23:35
jlkgithub goes down right when I'm trying to test against it.23:35
jeblairthey have had one 9 of uptime over the past month.23:36
jeblairokay, to be fair, there is a second nine in their uptime percentage.  it's just separated from the first by a couple of numbers which are not nines.23:38
jeblair98.4924%23:38
jlk95 in the past day23:39
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Reject some untrusted config objects  https://review.openstack.org/47044223:40
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Factor out duplicate code in executor repo tests  https://review.openstack.org/47149223:41
clarkbI wonder if they have to restart their services to deal with a memory leak too >_>23:41
jeblairyes, there aren't any other nines at all in today's uptime23:41
*** jkilpatr has quit IRC23:42
jeblairclarkb: i assume so, but they're more h-a so we usually don't notice.  :|23:42
openstackgerritMerged openstack-infra/zuul feature/zuulv3: sql-reporter: add support for Ref change  https://review.openstack.org/46645723:44
*** jkilpatr has joined #zuul23:45
*** jkilpatr has quit IRC23:50
jamielennoxjeblair: i'll fix that one up this morning shouldn't be too hard, i had actually thought it was merged already23:50
jamielennoxjeblair: re the buildset uuid23:50
jeblairjamielennox: cool.  and heh, yeah, i'm trying to clean house.  i'm getting confused about what's merged and what's not too.  :)23:51
jamielennoxjeblair: yea, i really don't mind what that buildset identifier is, the 'Z'+ format was always a little odd for this use case but i think buildset is the right location for the identifier and thought i may as well use it23:52
*** jkilpatr has joined #zuul23:53
jamielennoxit could maybe go on QueueItem instead, but i don't know why it would23:53
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Remove ansible files at startup before copy  https://review.openstack.org/47151823:53
jamielennoxturns out i completely missed a patch the first time i uploaded the series so i see why you didn't understand it23:54
jamielennox:)23:54
jlkjeblair: ^^ that seems to fix at least one part of second-startup. I think I hit another bit with the log streamer, but that's not fatal.23:54
jamielennoxre uwsgi - i don't think you want to use uwsgi from the python side, just expose it as wsgi and let people use what they want to host it23:55
*** jkilpatr has quit IRC23:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!