*** mattw4 has quit IRC | 00:23 | |
*** sanjayu_ has quit IRC | 00:33 | |
*** jamesmcarthur has quit IRC | 00:54 | |
*** swest has quit IRC | 01:20 | |
*** swest has joined #zuul | 01:35 | |
*** rlandy|ruck has quit IRC | 01:46 | |
*** bhavikdbavishi has joined #zuul | 03:04 | |
*** bhavikdbavishi has quit IRC | 03:10 | |
*** bhavikdbavishi has joined #zuul | 03:28 | |
*** swest has quit IRC | 04:33 | |
*** raukadah is now known as chandankumar | 05:11 | |
*** quiquell|off is now known as quiquell|rover | 05:49 | |
*** pcaruana has joined #zuul | 06:06 | |
*** swest has joined #zuul | 06:07 | |
*** swest has quit IRC | 06:11 | |
*** electrofelix has joined #zuul | 06:14 | |
*** swest has joined #zuul | 06:24 | |
*** bhavikdbavishi has quit IRC | 06:39 | |
*** bhavikdbavishi has joined #zuul | 06:41 | |
*** quiquell|rover is now known as quique|rover|brb | 06:42 | |
*** ericbarrett has quit IRC | 06:54 | |
*** saneax has joined #zuul | 07:04 | |
*** quique|rover|brb is now known as quiquell|rover | 07:12 | |
*** themroc has joined #zuul | 07:13 | |
*** arxcruz|off|23 is now known as arxcruz | 07:13 | |
*** jpena|off has joined #zuul | 07:43 | |
*** jpena|off is now known as jpena | 07:43 | |
*** gtema has joined #zuul | 08:00 | |
*** jangutter has joined #zuul | 08:18 | |
*** fbo_ has joined #zuul | 09:16 | |
*** bjackman has joined #zuul | 09:36 | |
*** bhavikdbavishi has quit IRC | 09:59 | |
*** threestrands has quit IRC | 10:15 | |
*** gtema has quit IRC | 10:25 | |
*** sshnaidm|afk is now known as sshnaidm | 10:40 | |
*** bhavikdbavishi has joined #zuul | 10:44 | |
*** bhavikdbavishi has quit IRC | 10:56 | |
*** gtema has joined #zuul | 11:07 | |
*** bhavikdbavishi has joined #zuul | 11:25 | |
*** bhavikdbavishi has quit IRC | 11:25 | |
*** bhavikdbavishi has joined #zuul | 11:26 | |
*** bhavikdbavishi1 has joined #zuul | 11:30 | |
*** gtema has quit IRC | 11:30 | |
*** bhavikdbavishi has quit IRC | 11:31 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 11:31 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** quiquell|rover is now known as quique|rover|lun | 11:35 | |
*** quique|rover|lun is now known as quique|rover|eat | 11:36 | |
*** bhavikdbavishi has quit IRC | 11:42 | |
*** bhavikdbavishi has joined #zuul | 11:43 | |
*** panda is now known as panda|lunch | 11:57 | |
*** quique|rover|eat is now known as quiquell|rover | 11:59 | |
*** rlandy has joined #zuul | 12:06 | |
*** rlandy is now known as rlandy|ruck | 12:07 | |
*** jpena|lunch is now known as jpena | 12:31 | |
pabelanger | morning! | 12:37 |
---|---|---|
pabelanger | corvus: Shrews: clarkb: do you think we can plan for a nodepool release this week before openinfra summit? I'd like to pick up the network_cli patch | 12:38 |
pabelanger | I'll have to confirm again which version opendev is running currently | 12:39 |
*** bhavikdbavishi has quit IRC | 12:42 | |
*** jamesmcarthur has joined #zuul | 12:46 | |
Shrews | pabelanger: We still need to restart production with the latest changes, which I'd very much like to do today, but I need https://review.opendev.org/654462 merged and puppeted first so I can try to find the mem leak. | 12:48 |
Shrews | In 6 days, launcher memory usage on nl01 has grown from 17% to 40%, so it's not good | 12:52 |
Shrews | fyi tobiash: ^^^ | 12:53 |
pabelanger | +3 | 12:57 |
Shrews | pabelanger: thx. while you are in the reviewing mood, could you look at the very similar change for zuul? https://review.opendev.org/654599 | 13:00 |
openstackgerrit | Monty Taylor proposed zuul/nodepool master: Update devstack settings and docs for opendev https://review.opendev.org/654230 | 13:03 |
pabelanger | Shrews: +2 | 13:05 |
mordred | Shrews: I *think* I just pushed up a dependent patch for that nodepool patch that will hopefully fix the nodepool patch | 13:07 |
Shrews | neat | 13:08 |
*** bjackman has quit IRC | 13:16 | |
*** jamesmcarthur has quit IRC | 13:18 | |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Improve proxy settings support for compose env https://review.opendev.org/655140 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Add some packages for basic python jobs https://review.opendev.org/655141 | 13:20 |
openstackgerrit | Darragh Bailey (electrofelix) proposed zuul/zuul master: Scale nodes up to 4 instances https://review.opendev.org/655142 | 13:20 |
*** panda|lunch is now known as panda | 13:23 | |
*** sshnaidm is now known as sshnaidm|afk | 13:26 | |
*** quiquell|rover is now known as quique|rover|lun | 13:26 | |
*** quique|rover|lun is now known as quique|rover|eat | 13:26 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 13:30 | |
*** bhavikdbavishi has joined #zuul | 13:35 | |
pabelanger | corvus: mordred: I'd like to point you to https://review.opendev.org/653497/ and add to your review pipelines. That seems to be the last(?) thing needed for multi-ansible zuul | 13:38 |
*** bhavikdbavishi has quit IRC | 13:39 | |
pabelanger | corvus: maybe today, you could help give some guidance on how best to solve: https://review.opendev.org/652424/ I'd like to use write-inventory role more for our network vendor images (with nested ansible) but we seem to hardcode which vars are allowed to be used. My first thought was to expose that to the user, with sane defaults for the role | 13:41 |
*** jamesmcarthur has joined #zuul | 13:46 | |
*** quique|rover|eat is now known as quiquell|rover | 13:47 | |
*** jamesmcarthur has quit IRC | 13:47 | |
*** jamesmcarthur has joined #zuul | 13:52 | |
*** sshnaidm|afk is now known as sshnaidm | 13:55 | |
corvus | between opendev cleanup and summit prep, i will probably not have a lot of time for zuul work this week; i apologize for that. hopefully things will return to normal after the summit. | 14:00 |
corvus | pabelanger, tobiash, mordred: i like that patch, but maybe we should increase the lru size there? | 14:00 |
corvus | i'm worried that it's exactly the size of our current number of supported ansible versions, and we're about to add a fourth. | 14:01 |
corvus | there's no reason that couldn't be, like 10, right? | 14:01 |
pabelanger | I could do 10, I only did 3 based on tobiash original comments | 14:02 |
tobiash | ++ for 10 | 14:02 |
pabelanger | k, incoming patch | 14:03 |
*** saneax has quit IRC | 14:03 | |
pabelanger | ... in a few minutes, going to reorg filesystem for new opendev domain :) | 14:04 |
webknjaz | Hi, I saw here https://github.com/ansible/ansible-runner/commit/795ee7b1edb7569ba52f7df381ca515af458a4b5 there's a commit trailer `Reviewed-by: https://github.com/ansible-zuul[bot]` | 14:21 |
webknjaz | Does Zuul itself add it? | 14:21 |
webknjaz | `https://github.com/ansible-zuul[bot]` link is wrong. `ansible-zuul[bot]` is a display name on GitHub in multiple places but it's a bot and the correct URL would be `https://github.com/apps/ansible-zuul` | 14:22 |
clarkb | pabelanger: ^ that may be a zuul you know about | 14:22 |
*** electrofelix has quit IRC | 14:22 | |
pabelanger | webknjaz: clarkb: yah, that is zuul.ansible.com | 14:27 |
webknjaz | that needs fixing | 14:27 |
pabelanger | will have to look at zuul github driver to see where that is being set | 14:27 |
pabelanger | I believe we just ask github to merge the commit, and this is the default message | 14:28 |
webknjaz | nope | 14:28 |
webknjaz | `Reviewed-by: ` is a custom trailer | 14:28 |
webknjaz | Sometimes it may add `Co-Authored-By: ` but that's it | 14:29 |
*** rlandy|ruck|mtg is now known as rlandy|ruck | 14:30 | |
pabelanger | https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubreporter.py#L167 | 14:30 |
pabelanger | that looks to be the code | 14:30 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Bump lru_cache size to 10 https://review.opendev.org/655173 | 14:44 |
webknjaz | pabelanger: `self.connection.getUserUri(username)` this probably doesn't know about the `[bot]` suffix | 14:46 |
pabelanger | I'm looking at the event now, to see if we can get info directly from it | 14:50 |
pabelanger | otherwise, we likely can add some logic, if [bot], then assume github app | 14:51 |
pabelanger | "html_url": "https://github.com/apps/ansible-zuul", and | 14:53 |
pabelanger | "type": "Bot", | 14:53 |
webknjaz | yep | 14:55 |
webknjaz | I was about to share the same payload :D | 14:56 |
*** sshnaidm is now known as sshnaidm|afk | 15:54 | |
*** quiquell|rover is now known as quiquell|off | 16:04 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Use user.html_url for github reporter messages https://review.opendev.org/655188 | 16:17 |
pabelanger | webknjaz: ^should be the fix | 16:17 |
pabelanger | however untested | 16:17 |
pabelanger | tobiash: you may be interested in ^ | 16:17 |
pabelanger | if you could help review the logic on github side | 16:18 |
tobiash | pabelanger: lgtm, but that is only part of the real fix, because zuul adds reviewed-by and just uses the source event account (which most of the time is zuul itself). So we should fix that too and really get the reviews of the pr. | 16:21 |
pabelanger | tobiash: yah, that would be cool | 16:22 |
tobiash | pabelanger: this is the problem: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubreporter.py#L173 | 16:22 |
tobiash | that's been on my todo list for a long time already but I haven't got to it yet | 16:23 |
*** mrhillsman is now known as openlab | 16:23 | |
*** openlab is now known as mrhillsman | 16:23 | |
*** mrhillsman is now known as openlab | 16:24 | |
*** openlab is now known as mrhillsman | 16:25 | |
pabelanger | tobiash: yah, I was struggling to figure out what source_event really was there. It would be helpful to maybe link that to github api | 16:25 |
tobiash | source_event is the last event that updated the pr afaik | 16:25 |
tobiash | it's from the change zuul data structure | 16:25 |
pabelanger | yah, that should be zuul reporting back the gate results to the PR | 16:29 |
*** themroc has quit IRC | 16:29 | |
*** mattw4 has joined #zuul | 16:45 | |
*** altlogbot_2 has quit IRC | 16:50 | |
*** jpena is now known as jpena|off | 16:52 | |
*** altlogbot_1 has joined #zuul | 16:56 | |
pabelanger | tobiash: Hmm, auth error: http://paste.openstack.org/show/749656/ | 17:02 |
pabelanger | but user has correct permissions | 17:02 |
pabelanger | guessing remote side flaked out again | 17:03 |
pabelanger | going to see if we can add a retry their | 17:03 |
pabelanger | there* | 17:03 |
pabelanger | I sent the event again from github ui and zuul did the right thing | 17:03 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github https://review.opendev.org/655204 | 17:23 |
pabelanger | tobiash: ^copypaste from our other function, if you'd like to review | 17:24 |
pabelanger | I have to run an errand for 30mins, back shortly | 17:24 |
*** jamesmcarthur has quit IRC | 17:30 | |
*** jamesmcarthur has joined #zuul | 17:31 | |
*** jamesmcarthur has quit IRC | 17:35 | |
tobiash | pabelanger: while that retry looks useful I don't think it solves this particular issue. Details later when I'm at pc to verify my hunch | 17:40 |
* tobiash has to run errands first as well | 17:42 | |
pabelanger | tobiash: kk | 17:54 |
*** themroc has joined #zuul | 18:11 | |
tobiash | pabelanger: hrm, my hunch was that the access token was expired, but looking at the code and your log this seems unlikely | 18:19 |
tobiash | because there should be a buffer of at least two minutes left after getGithubClient | 18:20 |
tobiash | pabelanger: is your clock running accurate? | 18:20 |
tobiash | I had this problem in our early days once because ntp/chrony didn't work unexpectedly | 18:21 |
pabelanger | tobiash: let me check drift, but ntp is running | 18:21 |
pabelanger | tobiash: hard to tell, but systemd-timesyncd is running. First time using this service | 18:26 |
pabelanger | but time is correct | 18:26 |
tobiash | pabelanger: this should give you more detail: timedatectl timesync-status | 18:27 |
pabelanger | http://paste.openstack.org/show/749661/ | 18:28 |
pabelanger | however, it seems it only runs when network configuration changed | 18:28 |
pabelanger | which means I either need to install chrony or look to see how to run systemd-timesyncd more often | 18:29 |
pabelanger | so, possible there is some drift on the server | 18:29 |
tobiash | pabelanger: ok, so there are two possibilities, clock drift or some weird hiccup in github, where the retry would only fix the latter | 18:31 |
pabelanger | okay, looking at man page | 18:32 |
pabelanger | http://paste.openstack.org/show/749662/ | 18:32 |
tobiash | however so far I only saw this kind of error if either github or zuul had a time drift | 18:32 |
pabelanger | okay, I'll have to audit logs and see how often this has happened | 18:35 |
*** jamesmcarthur has joined #zuul | 18:44 | |
*** jamesmcarthur_ has joined #zuul | 18:45 | |
*** themroc has quit IRC | 18:48 | |
*** jamesmcarthur has quit IRC | 18:49 | |
corvus | we're restarting the opendev zuul because it looks like there is a memory leak. first appeared after our april 16 restart: | 19:06 |
corvus | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=64792&rra_id=3&view_type=&graph_start=1555130676&graph_end=1555611492&graph_height=120&graph_width=500&title_font_size=12 | 19:06 |
corvus | i haven't tracked down what versions are involved yet | 19:06 |
corvus | Shrews: ^ fyi | 19:07 |
Shrews | yay for all the leaks! abandon ship? | 19:07 |
Shrews | corvus: did we put any zk caching into zuul itself? | 19:08 |
tobiash | not that I know | 19:09 |
-openstackstatus- NOTICE: the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically | 19:10 | |
tobiash | but there is a commit -> pr cache in the github driver that could have landed around that time | 19:12 |
clarkb | I thought I landed that a while back and that we bounded the upper size of that cache? | 19:14 |
tobiash | clarkb: I don't know if we checked the memory usage after it landed | 19:15 |
tobiash | https://review.opendev.org/637615 | 19:15 |
tobiash | landed 9 weeks ago | 19:15 |
tobiash | but I guess you restarted zuul more often | 19:16 |
clarkb | ya we definitely restarted gerrit soon after that landed | 19:17 |
clarkb | and then checked that it improved kata's experience | 19:17 |
tobiash | based on your status log the restart before april 16 was 2019-03-19 21:11:31 UTC restarted all of zuul at commit 77ffb70104959803a8ee70076845c185bd17ddc1 | 19:18 |
tobiash | I didn't find anything obvious during a quick scan through the history | 19:23 |
tobiash | I'll double check my deployment if I see something similar | 19:24 |
clarkb | is it possible my security fix is leaking? | 19:24 |
clarkb | (if it somehow causes configs to hang around in memory) | 19:24 |
tobiash | which one was it? | 19:25 |
clarkb | https://opendev.org/zuul/zuul/commit/41b6b0ea335866be27970f719d1ba7b256418fa4 | 19:25 |
corvus | that's certainly a good thing to check | 19:28 |
tobiash | mine looks similar: https://paste.pics/9aa6d3f5054b7aebd0fbebaa228530a6 | 19:32 |
*** jamesmcarthur_ has quit IRC | 19:32 | |
tobiash | also since around april 16 | 19:32 |
clarkb | if it is that change i would suspect the `return item.queue.pipeline.tenant.layout` lines as being the cause, as otherwise we return an untrusted layout as before | 19:34 |
tobiash | those are the upstream changes between my restarts: https://etherpad.openstack.org/p/pNVg8WQ4la | 19:36 |
tobiash | from those I think the security fix looks most likely | 19:36 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:56 |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241 | 19:58 |
clarkb | fwiw I don't see how it would cause it but can't rule it out and agree it is a likely source | 20:07 |
*** jamesmcarthur has joined #zuul | 20:08 | |
pabelanger | tobiash: which tool is that? | 20:09 |
tobiash | pabelanger: you mean the graph? | 20:10 |
pabelanger | tobiash: yah | 20:10 |
tobiash | it's grafana using prometheus as backend | 20:11 |
tobiash | And I think that data is coming from the kubelet exporter | 20:12 |
pabelanger | ah, for some reason didn't look like grafana | 20:12 |
*** jamesmcarthur has quit IRC | 20:12 | |
tobiash | I cropped it from the new explore view in grafana 6 | 20:17 |
tobiash | It allows you to on demand hack requests together without the need to create a dashboard first | 20:19 |
pabelanger | nice | 20:20 |
clarkb | tobiash: are the strikethroughs the result of a bisection? | 20:21 |
tobiash | No, just those where I'm pretty sure that they are unrelated | 20:22 |
tobiash | Like test improvements and web things | 20:22 |
clarkb | gotcha | 20:22 |
tobiash | But beware that this might be completely unrelated, a longer period graph in my env shows similar behavior for quite some time already, so my graphs might also be load related | 20:24 |
tobiash | So don't take that list for granted, it's just a first hunch | 20:25 |
clarkb | ya though ours definitely seems to have started on the 16th so I think the end point in your list is probably at least correct | 20:26 |
*** pcaruana has quit IRC | 20:32 | |
clarkb | https://opendev.org/zuul/zuul/src/commit/41b6b0ea335866be27970f719d1ba7b256418fa4/zuul/manager/__init__.py#L554-L568 is the block that ended up being very different in my fix | 20:32 |
clarkb | previously we returned the trusted_layout in that situation | 20:32 |
clarkb | maybe that is causing us to leak those pipeline layouts somehow? | 20:33 |
clarkb | (whereas before we would return the function local trusted layout and it would clean up) | 20:33 |
*** jamesmcarthur has joined #zuul | 20:39 | |
clarkb | heh my on disk path for zuul source is now src/zuul/zuul/zuul/manager/__init__.py | 20:40 |
clarkb | is that enough zuuls? | 20:40 |
*** jamesmcarthur has quit IRC | 20:46 | |
clarkb | ok I don't think it is the errors in that config handler that are related to this; at least the times don't seem to strongly correlate to when we have big rises in memory use | 20:50 |
clarkb | could still be something else in that change though | 20:51 |
*** jamesmcarthur has joined #zuul | 20:57 | |
*** jamesmcarthur has quit IRC | 21:01 | |
*** jamesmcarthur has joined #zuul | 21:06 | |
*** rfolco has quit IRC | 21:24 | |
*** tjgresha_nope has quit IRC | 21:41 | |
*** jamesmcarthur has quit IRC | 21:55 | |
*** jamesmcarthur has joined #zuul | 21:58 | |
*** jamesmcarthur has quit IRC | 22:15 | |
*** jangutter has quit IRC | 22:35 | |
*** jangutter has joined #zuul | 23:40 | |
*** rlandy|ruck has quit IRC | 23:40 | |
*** mattw4 has quit IRC | 23:59 |
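
Note on the 14:46 to 14:53 exchange about the `Reviewed-by:` trailer: the idea pabelanger describes (take `html_url` from the user object in the event, otherwise treat a `[bot]` login as a GitHub App) could be sketched roughly as below. This is not the Zuul code from https://review.opendev.org/655188. The function name `user_uri`, its `server` parameter, and the fallback that derives the app slug by stripping `[bot]` from the login are all assumptions made for illustration. Only the `html_url` and `type` fields come from the payload quoted in the log.

```python
def user_uri(user, server="github.com"):
    """Return a profile URL for a GitHub user object (a dict from a payload).

    Hypothetical helper for illustration only, not part of Zuul.
    """
    # Prefer the URL GitHub itself reports; for apps this is already
    # the /apps/<slug> form quoted in the log.
    html_url = user.get("html_url")
    if html_url:
        return html_url

    login = user.get("login", "")
    # Assumption: a bot login is the app slug followed by "[bot]".
    if user.get("type") == "Bot" and login.endswith("[bot]"):
        slug = login[: -len("[bot]")]
        return f"https://{server}/apps/{slug}"

    # Ordinary user account.
    return f"https://{server}/{login}"


bot = {"login": "ansible-zuul[bot]", "type": "Bot"}
print(user_uri(bot))
print(user_uri({"login": "webknjaz", "type": "User"}))
```

As tobiash points out at 16:21, this only addresses the URL format. Which account ends up in the trailer is a separate question.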
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!
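
Note on the 14:00 to 14:03 exchange about the LRU size: corvus's worry is that a cache sized exactly to the current number of Ansible versions stops being useful once one more version is added. A small stand-alone sketch with `functools.lru_cache` shows the effect. The `load` function, the `count_loads` helper, and the version strings are hypothetical stand-ins, not the code under review in https://review.opendev.org/653497/.

```python
from functools import lru_cache

calls = []  # records every real (uncached) load


def make_loader(maxsize):
    @lru_cache(maxsize=maxsize)
    def load(version):
        # Stand-in for whatever expensive per-version work is cached.
        calls.append(version)
        return f"setup for ansible {version}"
    return load


def count_loads(maxsize, versions, rounds=3):
    """Cycle through the versions and count how often the cache missed."""
    calls.clear()
    load = make_loader(maxsize)
    for _ in range(rounds):
        for version in versions:
            load(version)
    return len(calls)


versions = ["2.5", "2.6", "2.7", "2.8"]  # illustrative: four versions
print(count_loads(3, versions))   # maxsize 3: every call is a miss
print(count_loads(10, versions))  # maxsize 10: one load per version
```

Cycling through four keys with room for only three evicts each entry just before it is needed again, so the cache never hits. Any size comfortably above the number of versions, such as the 10 agreed on in the log, avoids that.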