Tuesday, 2019-04-23

[00:23] *** mattw4 has quit IRC
[00:33] *** sanjayu_ has quit IRC
[00:54] *** jamesmcarthur has quit IRC
[01:20] *** swest has quit IRC
[01:35] *** swest has joined #zuul
[01:46] *** rlandy|ruck has quit IRC
[03:04] *** bhavikdbavishi has joined #zuul
[03:10] *** bhavikdbavishi has quit IRC
[03:28] *** bhavikdbavishi has joined #zuul
[04:33] *** swest has quit IRC
[05:11] *** raukadah is now known as chandankumar
[05:49] *** quiquell|off is now known as quiquell|rover
[06:06] *** pcaruana has joined #zuul
[06:07] *** swest has joined #zuul
[06:11] *** swest has quit IRC
[06:14] *** electrofelix has joined #zuul
[06:24] *** swest has joined #zuul
[06:39] *** bhavikdbavishi has quit IRC
[06:41] *** bhavikdbavishi has joined #zuul
[06:42] *** quiquell|rover is now known as quique|rover|brb
[06:54] *** ericbarrett has quit IRC
[07:04] *** saneax has joined #zuul
[07:12] *** quique|rover|brb is now known as quiquell|rover
[07:13] *** themroc has joined #zuul
[07:13] *** arxcruz|off|23 is now known as arxcruz
[07:43] *** jpena|off has joined #zuul
[07:43] *** jpena|off is now known as jpena
[08:00] *** gtema has joined #zuul
[08:18] *** jangutter has joined #zuul
[09:16] *** fbo_ has joined #zuul
[09:36] *** bjackman has joined #zuul
[09:59] *** bhavikdbavishi has quit IRC
[10:15] *** threestrands has quit IRC
[10:25] *** gtema has quit IRC
[10:40] *** sshnaidm|afk is now known as sshnaidm
[10:44] *** bhavikdbavishi has joined #zuul
[10:56] *** bhavikdbavishi has quit IRC
[11:07] *** gtema has joined #zuul
[11:25] *** bhavikdbavishi has joined #zuul
[11:25] *** bhavikdbavishi has quit IRC
[11:26] *** bhavikdbavishi has joined #zuul
[11:30] *** bhavikdbavishi1 has joined #zuul
[11:30] *** gtema has quit IRC
[11:31] *** bhavikdbavishi has quit IRC
[11:31] *** bhavikdbavishi1 is now known as bhavikdbavishi
[11:33] *** jpena is now known as jpena|lunch
[11:35] *** quiquell|rover is now known as quique|rover|lun
[11:36] *** quique|rover|lun is now known as quique|rover|eat
[11:42] *** bhavikdbavishi has quit IRC
[11:43] *** bhavikdbavishi has joined #zuul
[11:57] *** panda is now known as panda|lunch
[11:59] *** quique|rover|eat is now known as quiquell|rover
[12:06] *** rlandy has joined #zuul
[12:07] *** rlandy is now known as rlandy|ruck
[12:31] *** jpena|lunch is now known as jpena
[12:37] <pabelanger> morning!
[12:38] <pabelanger> corvus: Shrews: clarkb: do you think we can plan for a nodepool release this week before the openinfra summit? I'd like to pick up the network_cli patch
[12:39] <pabelanger> I'll have to confirm again which version opendev is running currently
[12:42] *** bhavikdbavishi has quit IRC
[12:46] *** jamesmcarthur has joined #zuul
[12:48] <Shrews> pabelanger: We still need to restart production with the latest changes, which I'd very much like to do today, but I need https://review.opendev.org/654462 merged and puppeted first so I can try to find the mem leak.
[12:52] <Shrews> In 6 days, launcher memory usage on nl01 has grown from 17% to 40%, so it's not good
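A rough extrapolation of the nl01 figures above, assuming (purely for illustration) that the launcher's memory grows linearly: 23 percentage points over 6 days is about 3.8 points per day, which would fill the host roughly 16 days after the last reading. Real leaks are often not linear, so treat this only as an order-of-magnitude estimate.

```python
# Back-of-the-envelope extrapolation of the nl01 launcher memory growth.
# Assumption: growth is linear over time; real leaks often are not.
start_pct = 17.0   # memory usage at the start of the 6-day window
end_pct = 40.0     # memory usage after 6 days
days = 6.0

rate_per_day = (end_pct - start_pct) / days          # about 3.83 points/day
days_until_full = (100.0 - end_pct) / rate_per_day   # about 15.7 days

print(f"growth: {rate_per_day:.2f} points/day")
print(f"days until 100% at this rate: {days_until_full:.1f}")
```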
[12:53] <Shrews> fyi tobiash: ^^^
[12:57] <pabelanger> +3
[13:00] <Shrews> pabelanger: thx. while you are in the reviewing mood, could you look at the very similar change for zuul? https://review.opendev.org/654599
[13:03] <openstackgerrit> Monty Taylor proposed zuul/nodepool master: Update devstack settings and docs for opendev https://review.opendev.org/654230
[13:05] <pabelanger> Shrews: +2
[13:07] <mordred> Shrews: I *think* I just pushed up a dependent patch for that nodepool patch that will hopefully fix the nodepool patch
[13:08] <Shrews> neat
[13:16] *** bjackman has quit IRC
[13:18] *** jamesmcarthur has quit IRC
[13:20] <openstackgerrit> Darragh Bailey (electrofelix) proposed zuul/zuul master: Improve proxy settings support for compose env https://review.opendev.org/655140
[13:20] <openstackgerrit> Darragh Bailey (electrofelix) proposed zuul/zuul master: Add some packages for basic python jobs https://review.opendev.org/655141
[13:20] <openstackgerrit> Darragh Bailey (electrofelix) proposed zuul/zuul master: Scale nodes up to 4 instances https://review.opendev.org/655142
[13:23] *** panda|lunch is now known as panda
[13:26] *** sshnaidm is now known as sshnaidm|afk
[13:26] *** quiquell|rover is now known as quique|rover|lun
[13:26] *** quique|rover|lun is now known as quique|rover|eat
[13:30] *** rlandy|ruck is now known as rlandy|ruck|mtg
[13:35] *** bhavikdbavishi has joined #zuul
[13:38] <pabelanger> corvus: mordred: I'd like to point you to https://review.opendev.org/653497/ and ask you to add it to your review pipelines. That seems to be the last(?) thing needed for multi-ansible zuul
[13:39] *** bhavikdbavishi has quit IRC
[13:41] <pabelanger> corvus: maybe today, you could help give some guidance on how best to solve https://review.opendev.org/652424/ I'd like to use the write-inventory role more for our network vendor images (with nested ansible), but we seem to hardcode which vars are allowed to be used. My first thought was to expose that to the user, with sane defaults for the role
[13:46] *** jamesmcarthur has joined #zuul
[13:47] *** quique|rover|eat is now known as quiquell|rover
[13:47] *** jamesmcarthur has quit IRC
[13:52] *** jamesmcarthur has joined #zuul
[13:55] *** sshnaidm|afk is now known as sshnaidm
[14:00] <corvus> between opendev cleanup and summit prep, i will probably not have a lot of time for zuul work this week; i apologize for that. hopefully things will return to normal after the summit.
[14:00] <corvus> pabelanger, tobiash, mordred: i like that patch, but maybe we should increase the lru size there?
[14:01] <corvus> i'm worried that it's exactly the size of our current number of supported ansible versions, and we're about to add a fourth.
[14:01] <corvus> there's no reason that couldn't be, like, 10, right?
[14:02] <pabelanger> I could do 10, I only did 3 based on tobiash's original comments
[14:02] <tobiash> ++ for 10
[14:03] <pabelanger> k, incoming patch
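corvus's concern can be illustrated with Python's `functools.lru_cache`: if a cache holds exactly as many entries as there are supported Ansible versions, adding one more version and cycling through them in order evicts each entry just before it is needed again, so every lookup misses. The function names and version strings below are hypothetical stand-ins; the log does not show which Zuul function the patch caches.

```python
from functools import lru_cache

# Hypothetical per-Ansible-version lookups; counters record real computations.
calls = {"small": 0, "large": 0}

@lru_cache(maxsize=3)   # sized to the old number of supported versions
def lookup_small(version):
    calls["small"] += 1
    return f"ansible-{version}"

@lru_cache(maxsize=10)  # the size proposed above
def lookup_large(version):
    calls["large"] += 1
    return f"ansible-{version}"

versions = ["2.5", "2.6", "2.7", "2.8"]  # four versions (illustrative)
for _ in range(3):                       # cycle through them repeatedly
    for v in versions:
        lookup_small(v)
        lookup_large(v)

print(lookup_small.cache_info())  # no hits at all: LRU thrashing
print(lookup_large.cache_info())  # only the first pass misses
```

With `maxsize=3`, all 12 calls are misses; with `maxsize=10`, only the first 4 are.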
[14:03] *** saneax has quit IRC
[14:04] <pabelanger> ... in a few minutes, going to reorg filesystem for new opendev domain :)
[14:21] <webknjaz> Hi, I saw here https://github.com/ansible/ansible-runner/commit/795ee7b1edb7569ba52f7df381ca515af458a4b5 there's a commit trailer `Reviewed-by: https://github.com/ansible-zuul[bot]`
[14:21] <webknjaz> Does Zuul itself add it?
[14:22] <webknjaz> The `https://github.com/ansible-zuul[bot]` link is wrong. `ansible-zuul[bot]` is a display name on GitHub in multiple places, but it's a bot and the correct URL would be `https://github.com/apps/ansible-zuul`
[14:22] <clarkb> pabelanger: ^ that may be a zuul you know about
[14:22] *** electrofelix has quit IRC
[14:27] <pabelanger> webknjaz: clarkb: yah, that is zuul.ansible.com
[14:27] <webknjaz> that needs fixing
[14:27] <pabelanger> will have to look at the zuul github driver to see where that is being set
[14:28] <pabelanger> I believe we just ask github to merge the commit, and this is the default message
[14:28] <webknjaz> nope
[14:28] <webknjaz> `Reviewed-by: ` is a custom trailer
[14:29] <webknjaz> Sometimes it may add `Co-Authored-By: ` but that's it
[14:30] *** rlandy|ruck|mtg is now known as rlandy|ruck
[14:30] <pabelanger> https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubreporter.py#L167
[14:30] <pabelanger> that looks to be the code
[14:44] <openstackgerrit> Paul Belanger proposed zuul/zuul master: Bump lru_cache size to 10 https://review.opendev.org/655173
[14:46] <webknjaz> pabelanger: `self.connection.getUserUri(username)` probably doesn't know about the `[bot]` suffix
[14:50] <pabelanger> I'm looking at the event now, to see if we can get the info directly from it
[14:51] <pabelanger> otherwise, we can likely add some logic: if [bot], then assume github app
[14:53] <pabelanger>     "html_url": "https://github.com/apps/ansible-zuul", and
[14:53] <pabelanger>     "type": "Bot",
[14:55] <webknjaz> yep
[14:56] <webknjaz> I was about to share the same payload :D
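The fix discussed above amounts to trusting the URL GitHub reports for the account instead of building one from the login. A minimal sketch, assuming the webhook's user object carries `login`, `type` and `html_url` as in the payload pasted above; `naive_user_uri` and `user_uri` are hypothetical names, not Zuul's actual API.

```python
# Sketch: choose the "Reviewed-by:" link for a GitHub account.
# Assumption: `user` is the user object from a webhook payload.

def naive_user_uri(user):
    # Login-based URL; wrong for GitHub App bot accounts.
    return f"https://github.com/{user['login']}"

def user_uri(user):
    # Prefer the URL GitHub itself reports; fall back to the login.
    return user.get("html_url") or naive_user_uri(user)

bot = {
    "login": "ansible-zuul[bot]",
    "type": "Bot",
    "html_url": "https://github.com/apps/ansible-zuul",
}
print(naive_user_uri(bot))  # https://github.com/ansible-zuul[bot]
print(user_uri(bot))        # https://github.com/apps/ansible-zuul
```

Using `html_url` avoids special-casing the `[bot]` suffix, which was the fallback idea mentioned above.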
[15:54] *** sshnaidm is now known as sshnaidm|afk
[16:04] *** quiquell|rover is now known as quiquell|off
[16:17] <openstackgerrit> Paul Belanger proposed zuul/zuul master: Use user.html_url for github reporter messages https://review.opendev.org/655188
[16:17] <pabelanger> webknjaz: ^should be the fix
[16:17] <pabelanger> however untested
[16:17] <pabelanger> tobiash: you may be interested in ^
[16:18] <pabelanger> if you could help review the logic on the github side
[16:21] <tobiash> pabelanger: lgtm, but that is only part of the real fix, because zuul adds reviewed-by and just uses the source event account (which most of the time is zuul itself). So we should fix that too and really get the reviews of the pr.
[16:22] <pabelanger> tobiash: yah, that would be cool
[16:22] <tobiash> pabelanger: this is the problem: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubreporter.py#L173
[16:23] <tobiash> that's been on my todo list for a long time already but I haven't got to it yet
[16:23] *** mrhillsman is now known as openlab
[16:23] *** openlab is now known as mrhillsman
[16:24] *** mrhillsman is now known as openlab
[16:25] *** openlab is now known as mrhillsman
[16:25] <pabelanger> tobiash: yah, I was struggling to figure out what source_event really was there. It would be helpful to maybe link that to the github api
[16:25] <tobiash> source_event is the last event that updated the pr afaik
[16:25] <tobiash> it's from the zuul change data structure
[16:29] <pabelanger> yah, that should be zuul reporting the gate results back to the PR
[16:29] *** themroc has quit IRC
[16:45] *** mattw4 has joined #zuul
[16:50] *** altlogbot_2 has quit IRC
[16:52] *** jpena is now known as jpena|off
[16:56] *** altlogbot_1 has joined #zuul
[17:02] <pabelanger> tobiash: Hmm, auth error: http://paste.openstack.org/show/749656/
[17:02] <pabelanger> but the user has the correct permissions
[17:03] <pabelanger> guessing the remote side flaked out again
[17:03] <pabelanger> going to see if we can add a retry their
[17:03] <pabelanger> there*
[17:03] <pabelanger> I sent the event again from the github ui and zuul did the right thing
[17:23] <openstackgerrit> Paul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github https://review.opendev.org/655204
[17:24] <pabelanger> tobiash: ^copy-paste from our other function, if you'd like to review
[17:24] <pabelanger> I have to run an errand for 30mins, back shortly
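A generic retry wrapper of the kind change 655204 adds around `getPullReviews()`. This is a sketch under assumptions: the change copies Zuul's existing retry logic, which is not shown in this log, so `with_retries`, its parameters and the simulated failure are all hypothetical. A retry only helps with transient failures on GitHub's side; it cannot fix a persistent cause such as clock drift.

```python
import time

def with_retries(fetch, attempts=5, delay=1.0, retry_on=(Exception,)):
    """Call fetch(), retrying up to `attempts` times on `retry_on` errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts:
                raise          # out of attempts: surface the last error
            time.sleep(delay)  # wait before trying again

# Usage: a fake API call that fails twice, then succeeds.
state = {"calls": 0}

def flaky_fetch():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("simulated transient auth error")
    return ["review-1"]

print(with_retries(flaky_fetch, delay=0))  # ['review-1'] on the third call
```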
[17:30] *** jamesmcarthur has quit IRC
[17:31] *** jamesmcarthur has joined #zuul
[17:35] *** jamesmcarthur has quit IRC
[17:40] <tobiash> pabelanger: while that retry looks useful I don't think it solves this particular issue. Details later when I'm at a pc to verify my hunch
[17:42] * tobiash has to run errands first as well
[17:54] <pabelanger> tobiash: kk
[18:11] *** themroc has joined #zuul
[18:19] <tobiash> pabelanger: hrm, my hunch was that the access token was expired, but looking at the code and your log this seems unlikely
[18:20] <tobiash> because there should be a buffer of at least two minutes left after getGithubClient
[18:20] <tobiash> pabelanger: is your clock running accurately?
[18:21] <tobiash> I had this problem once in our early days because ntp/chrony unexpectedly didn't work
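Why clock drift can defeat the two-minute buffer tobiash mentions: the token's expiry is set by GitHub's clock, but the "is it still usable?" check runs against the local clock. A sketch under assumptions: `locally_usable` is a hypothetical helper, not Zuul's `getGithubClient` logic, and the times are made up. If the local clock runs more than the buffer behind, an already-expired token is reused and GitHub rejects it.

```python
from datetime import datetime, timedelta, timezone

BUFFER = timedelta(minutes=2)  # refresh margin mentioned in the log

def locally_usable(expires_at, local_now, buffer=BUFFER):
    # Hypothetical check: reuse the cached token only if at least `buffer`
    # remains before expiry, judged by the LOCAL clock.
    return expires_at - local_now >= buffer

expires_at = datetime(2019, 4, 23, 17, 0, tzinfo=timezone.utc)
github_now = datetime(2019, 4, 23, 17, 1, tzinfo=timezone.utc)  # already expired

would_fail = {}
for skew_minutes in (0, 5):
    # Local clock running `skew_minutes` behind GitHub's.
    local_now = github_now - timedelta(minutes=skew_minutes)
    reused = locally_usable(expires_at, local_now)
    expired = github_now >= expires_at
    would_fail[skew_minutes] = reused and expired  # stale token -> auth error

print(would_fail)  # {0: False, 5: True}
```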
[18:21] <pabelanger> tobiash: let me check drift, but ntp is running
[18:26] <pabelanger> tobiash: hard to tell, but systemd-timesyncd is running. First time using this service
[18:26] <pabelanger> but the time is correct
[18:27] <tobiash> pabelanger: this should give you more detail: timedatectl timesync-status
[18:28] <pabelanger> http://paste.openstack.org/show/749661/
[18:28] <pabelanger> however, it seems it only runs when the network configuration changes
[18:29] <pabelanger> which means I either need to install chrony or look into how to run systemd-timesyncd more often
[18:29] <pabelanger> so it's possible there is some drift on the server
[18:31] <tobiash> pabelanger: ok, so there are two possibilities: clock drift or some weird hiccup in github, and the retry would only fix the latter
[18:32] <pabelanger> okay, looking at the man page
[18:32] <pabelanger> http://paste.openstack.org/show/749662/
[18:32] <tobiash> however so far I've only seen this kind of error when either github or zuul had a time drift
[18:35] <pabelanger> okay, I'll have to audit the logs and see how often this has happened
[18:44] *** jamesmcarthur has joined #zuul
[18:45] *** jamesmcarthur_ has joined #zuul
[18:48] *** themroc has quit IRC
[18:49] *** jamesmcarthur has quit IRC
[19:06] <corvus> we're restarting the opendev zuul because it looks like there is a memory leak. it first appeared after our april 16 restart:
[19:06] <corvus> http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=64792&rra_id=3&view_type=&graph_start=1555130676&graph_end=1555611492&graph_height=120&graph_width=500&title_font_size=12
[19:06] <corvus> i haven't tracked down what versions are involved yet
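The `graph_start` and `graph_end` parameters in the cacti URL above are Unix timestamps. Decoding them shows that the graph covers roughly April 13 to April 18 (UTC), so the April 16 restart falls inside the plotted window.

```python
from datetime import datetime, timezone

# Unix timestamps copied from the cacti graph URL above.
graph_start = 1555130676
graph_end = 1555611492

for ts in (graph_start, graph_end):
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
# 2019-04-13T04:44:36+00:00
# 2019-04-18T18:18:12+00:00
```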
[19:07] <corvus> Shrews: ^ fyi
[19:07] <Shrews> yay for all the leaks! abandon ship?
[19:08] <Shrews> corvus: did we put any zk caching into zuul itself?
[19:09] <tobiash> not that I know of
[19:10] -openstackstatus- NOTICE: the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically
[19:12] <tobiash> but there is a commit -> pr cache in the github driver that could have landed around that time
[19:14] <clarkb> I thought I landed that a while back and that we bounded the upper size of that cache?
[19:15] <tobiash> clarkb: I don't know if we checked the memory usage after it landed
[19:15] <tobiash> https://review.opendev.org/637615
[19:15] <tobiash> landed 9 weeks ago
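clarkb's point is that a size-bounded cache cannot grow without limit, which makes it a less likely leak source. Below is a minimal bounded LRU mapping (for example commit SHA to PR number) built on `collections.OrderedDict`. It is illustrative only: the actual cache from change 637615 is not shown here, and `BoundedCache` and its default size are hypothetical.

```python
from collections import OrderedDict

class BoundedCache:
    """Hypothetical size-bounded LRU mapping, e.g. commit SHA -> PR number."""

    def __init__(self, max_size=1024):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)

cache = BoundedCache(max_size=2)
cache.put("sha-a", 101)
cache.put("sha-b", 102)
cache.get("sha-a")        # touch sha-a so sha-b becomes the oldest entry
cache.put("sha-c", 103)   # evicts sha-b
print(len(cache), cache.get("sha-b"))  # 2 None
```

The entry count stays bounded, though total memory still depends on how large each cached value is.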
[19:16] <tobiash> but I guess you restarted zuul more often
[19:17] <clarkb> ya we definitely restarted gerrit soon after that landed
[19:17] <clarkb> and then checked that it improved kata's experience
[19:18] <tobiash> based on your status log, the restart before april 16 was 2019-03-19 21:11:31 UTC restarted all of zuul at commit 77ffb70104959803a8ee70076845c185bd17ddc1
[19:23] <tobiash> I didn't find anything obvious during a quick scan through the history
[19:24] <tobiash> I'll double check whether I see something similar in my deployment
[19:24] <clarkb> is it possible my security fix is leaking?
[19:24] <clarkb> (if it somehow causes configs to hang around in memory)
[19:25] <tobiash> which one was it?
[19:25] <clarkb> https://opendev.org/zuul/zuul/commit/41b6b0ea335866be27970f719d1ba7b256418fa4
[19:28] <corvus> that's certainly a good thing to check
[19:32] <tobiash> mine looks similar: https://paste.pics/9aa6d3f5054b7aebd0fbebaa228530a6
[19:32] *** jamesmcarthur_ has quit IRC
[19:32] <tobiash> also since around april 16
[19:34] <clarkb> if it is that change i would suspect the `return item.queue.pipeline.tenant.layout` lines as the cause, as otherwise we return an untrusted layout as before
[19:36] <tobiash> those are the upstream changes between my restarts: https://etherpad.openstack.org/p/pNVg8WQ4la
[19:36] <tobiash> from those I think the security fix looks most likely
[19:56] <openstackgerrit> Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241
[19:58] <openstackgerrit> Sean McGinnis proposed zuul/zuul-jobs master: ensure-twine: Don't install --user if running in venv https://review.opendev.org/655241
[20:07] <clarkb> fwiw I don't see how it would cause it but can't rule it out, and agree it is a likely source
[20:08] *** jamesmcarthur has joined #zuul
[20:09] <pabelanger> tobiash: which tool is that?
[20:10] <tobiash> pabelanger: you mean the graph?
[20:10] <pabelanger> tobiash: yah
[20:11] <tobiash> it's grafana using prometheus as the backend
[20:12] <tobiash> And I think that data is coming from the kubelet exporter
[20:12] <pabelanger> ah, for some reason it didn't look like grafana
[20:12] *** jamesmcarthur has quit IRC
[20:17] <tobiash> I cropped it from the new explore view in grafana 6
[20:19] <tobiash> It lets you hack queries together on demand without needing to create a dashboard first
[20:20] <pabelanger> nice
[20:21] <clarkb> tobiash: are the strikethroughs the result of a bisection?
[20:22] <tobiash> No, just those where I'm pretty sure that they are unrelated
[20:22] <tobiash> Like test improvements and web things
[20:22] <clarkb> gotcha
[20:24] <tobiash> But beware that this might be completely unrelated: a longer-period graph in my env already shows similar behavior for quite some time, so my graphs might also be load related
[20:25] <tobiash> So don't take that list for granted, it's just a first hunch
[20:26] <clarkb> ya though ours definitely seems to have started on the 16th, so I think the end point in your list is probably at least correct
[20:32] *** pcaruana has quit IRC
[20:32] <clarkb> https://opendev.org/zuul/zuul/src/commit/41b6b0ea335866be27970f719d1ba7b256418fa4/zuul/manager/__init__.py#L554-L568 is the block that ended up being very different in my fix
[20:32] <clarkb> previously we returned the trusted_layout in that situation
[20:33] <clarkb> maybe that is causing us to leak those pipeline layouts somehow?
[20:33] <clarkb> (whereas before we would return the function-local trusted layout and it would get cleaned up)
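A toy model of clarkb's hypothesis: an object built inside a function is freed once the caller drops it, while one that is also reachable from a long-lived object stays alive. The classes below are hypothetical stand-ins, not Zuul's `Layout` or `Pipeline`, and this does not show that the security fix actually leaks. It only illustrates how a retained reference keeps memory alive, using `weakref` to observe which object survives.

```python
import gc
import weakref

class Layout:
    pass

class Pipeline:
    def __init__(self):
        self.layouts = []  # long-lived container

def build_local_layout():
    return Layout()  # only the caller ever holds a reference

def build_retained_layout(pipeline):
    layout = Layout()
    pipeline.layouts.append(layout)  # reference outlives the call
    return layout

pipeline = Pipeline()
local_ref = weakref.ref(build_local_layout())
retained_ref = weakref.ref(build_retained_layout(pipeline))
gc.collect()

print(local_ref() is None)     # True: the local layout was freed
print(retained_ref() is None)  # False: still reachable via the pipeline
```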
[20:39] *** jamesmcarthur has joined #zuul
[20:40] <clarkb> heh my on-disk path for zuul source is now src/zuul/zuul/zuul/manager/__init__.py
[20:40] <clarkb> is that enough zuuls?
[20:46] *** jamesmcarthur has quit IRC
[20:50] <clarkb> ok, I don't think the errors in that config handler are related to this; at least, the times don't seem to strongly correlate with when we have big rises in memory use
[20:51] <clarkb> could still be something else in that change though
[20:57] *** jamesmcarthur has joined #zuul
[21:01] *** jamesmcarthur has quit IRC
[21:06] *** jamesmcarthur has joined #zuul
[21:24] *** rfolco has quit IRC
[21:41] *** tjgresha_nope has quit IRC
[21:55] *** jamesmcarthur has quit IRC
[21:58] *** jamesmcarthur has joined #zuul
[22:15] *** jamesmcarthur has quit IRC
[22:35] *** jangutter has quit IRC
[23:40] *** jangutter has joined #zuul
[23:40] *** rlandy|ruck has quit IRC
[23:59] *** mattw4 has quit IRC
