Friday, 2020-09-11

00:19 *** sanjayu__ has quit IRC
00:31 *** sgw has quit IRC
00:50 *** zenkuro has quit IRC
01:03 *** hamalq_ has quit IRC
01:15 *** openstackgerrit has joined #zuul
01:15 <openstackgerrit> Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page  https://review.opendev.org/751140
01:16 <ianw> felixedel: thanks, i think your work looks great.  it makes some other bits look a bit old now :) i had a go at some PF4ness for the log viewer page ^
02:08 <openstackgerrit> Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page  https://review.opendev.org/751140
02:50 <openstackgerrit> Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page  https://review.opendev.org/751140
04:26 *** bstinson has quit IRC
04:33 *** evrardjp has quit IRC
04:33 *** evrardjp has joined #zuul
04:38 *** bstinson has joined #zuul
04:51 *** cloudnull has quit IRC
04:52 *** cloudnull has joined #zuul
05:19 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler  https://review.opendev.org/717269
05:38 <felixedel> ianw: Glad you like it :) Right, I had the same feeling yesterday. But currently I have too many PF4 changes open and I want to finish those first before starting with another new page :D
06:16 *** wxy has joined #zuul
06:55 *** jcapitao has joined #zuul
07:04 *** jcapitao has quit IRC
07:06 *** jcapitao has joined #zuul
07:21 *** jcapitao has quit IRC
07:24 *** saneax has joined #zuul
07:26 *** jcapitao has joined #zuul
07:26 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Fix memleak on zk session loss  https://review.opendev.org/751170
07:26 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Clear traceback before attaching exception to event  https://review.opendev.org/751171
07:26 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Remove source_event from Change objects  https://review.opendev.org/751172
07:39 *** hashar has joined #zuul
07:44 *** saneax has quit IRC
07:44 *** jpena|off is now known as jpena
07:45 *** jcapitao has quit IRC
08:01 *** LLIU82 has joined #zuul
08:02 <openstackgerrit> Lida Liu proposed zuul/zuul master: Add commit id to Change for mqtt reporter  https://review.opendev.org/722478
08:22 <mhu> felixedel, that's wise, any change you'd like reviewed in priority?
08:23 <tobiash> zuul-maint: we have a couple of memleak fixes we've been hunting throughout this week: https://review.opendev.org/#/q/project:zuul/zuul+topic:memleak-fixes
08:29 <felixedel> mu: https://review.opendev.org/#/c/741385/6 and https://review.opendev.org/#/c/746112/9 would be cool. But I have the feeling that the latter one must be rebased after we merged our "scroll issue fixes" and the modal changes (https://review.opendev.org/#/c/750875/1 + parents). The filtertoolbar works independently, though.
08:30 *** jcapitao has joined #zuul
08:30 <felixedel> ^mhu
08:38 *** ssbarnea has joined #zuul
08:43 <mhu> felixedel, oops disregard comments on obsolete PS, I guess they were kept in cache by firefox
08:47 *** tosky has joined #zuul
08:51 *** wuchunyang has joined #zuul
08:57 <felixedel> ianw, corvus: I've abandoned my other "try to fix the scroll issues" changes as I think they are no longer necessary once the config error modal and the related changes are merged.
09:00 *** vishalmanchanda has joined #zuul
09:02 *** harrymichal has joined #zuul
09:02 *** armstrongs has joined #zuul
09:06 *** ssbarnea has quit IRC
09:08 *** LLIU82 has quit IRC
09:25 *** harrymichal has quit IRC
09:27 *** zenkuro has joined #zuul
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler  https://review.opendev.org/717269
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Mandatory Zookeeper connection for ZuulWeb in tests  https://review.opendev.org/721254
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Driver event ingestion  https://review.opendev.org/717299
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Connect merger to Zookeeper  https://review.opendev.org/716221
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Connect fingergw to Zookeeper  https://review.opendev.org/716875
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: Connect executor to Zookeeper  https://review.opendev.org/716262
09:29 <openstackgerrit> Jan Kubovy proposed zuul/zuul master: WIP: Switch to using zookeeper instead of gearman for jobs (keep gearman for mergers)  https://review.opendev.org/744416
09:35 *** saneax has joined #zuul
09:41 *** sanjayu_ has joined #zuul
09:42 *** saneax has quit IRC
10:06 *** nils has joined #zuul
10:11 *** mnaser has quit IRC
10:11 *** gundalow has quit IRC
10:11 *** donnyd has quit IRC
10:11 *** ttx has quit IRC
10:11 *** andreykurilin has quit IRC
10:11 *** freefood has quit IRC
10:11 *** corvus has quit IRC
10:11 *** andreykurilin has joined #zuul
10:11 *** corvus has joined #zuul
10:11 *** gundalow has joined #zuul
10:11 *** donnyd has joined #zuul
10:16 *** freefood has joined #zuul
10:32 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Add console-stream subcommand  https://review.opendev.org/751238
10:35 <mhu> zuul-maint: https://review.opendev.org/#/c/749775/ needs the last +3!
10:35 *** wuchunyang has quit IRC
10:47 <tobiash> +3 with comment
10:47 *** jcapitao is now known as jcapitao_lunch
10:48 <tobiash> mhu: cool, you already thought about the encrypt subcommand :)
10:56 <openstackgerrit> Merged zuul/zuul-client master: Initialize repository  https://review.opendev.org/749775
10:56 <openstackgerrit> Merged zuul/zuul-client master: Add promote, release jobs  https://review.opendev.org/750193
11:03 *** sanjayu__ has joined #zuul
11:05 *** sanjayu_ has quit IRC
11:20 <tobiash> zuul-maint: this bugfix for vars returned via zuul_return in combination with retries would need another review: https://review.opendev.org/711002
11:28 *** sanjayu__ has quit IRC
11:34 *** hashar has quit IRC
11:40 *** ttx has joined #zuul
11:42 *** jpena is now known as jpena|lunch
11:49 <mhu> the promote jobs for zuul-client seem to be missing something https://zuul.opendev.org/t/zuul/builds?project=zuul%2Fzuul-client&pipeline=promote
11:59 *** jcapitao_lunch is now known as jcapitao
12:01 *** rfolco|ruck has joined #zuul
12:01 *** rlandy has joined #zuul
12:14 *** zenkuro has quit IRC
12:16 *** Goneri has joined #zuul
12:16 <tobiash> mhu: there is a jobname mismatch: opendev-tox-docs vs zuul-tox-docs
12:20 <mhu> tobiash, ah I see, I'll upload a patch
12:20 <mhu> I guess build-python-release is also missing
12:21 <tobiash> mhu: k, so just use zuul-tox-docs instead of opendev-tox-docs and the docs promote should work
12:21 <tobiash> nodepool does it like that as well
12:22 *** LLIU82 has joined #zuul
12:22 <tobiash> and gate misses build-python-release
12:22 <tobiash> yepp
12:23 <LLIU82> https://review.opendev.org/#/c/722478/ needs some review here
12:24 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Fix promote and release pipelines  https://review.opendev.org/751259
12:24 <mhu> tobiash, ^ should do it
12:24 <tobiash> +2
12:36 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Add cross testing with Zuul  https://review.opendev.org/751264
12:37 *** hashar has joined #zuul
12:39 *** zenkuro has joined #zuul
12:43 *** zenkuro has quit IRC
12:44 *** jpena|lunch is now known as jpena
12:55 <openstackgerrit> Merged zuul/nodepool master: [provider][aws] use one API call to create tags  https://review.opendev.org/746921
13:07 *** zenkuro has joined #zuul
13:09 *** LLIU82 has quit IRC
13:12 <openstackgerrit> Merged zuul/zuul-client master: Fix promote and release pipelines  https://review.opendev.org/751259
13:12 *** zenkuro has quit IRC
13:20 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Add cross testing with Zuul  https://review.opendev.org/751264
13:37 *** zenkuro has joined #zuul
13:40 *** saneax has joined #zuul
13:42 *** zenkuro has quit IRC
13:44 *** dmsimard has quit IRC
13:45 *** dmsimard has joined #zuul
13:48 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Ignore 500 errors when requesting pr files  https://review.opendev.org/751281
14:05 *** gmann is now known as gmann_afk
14:13 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Make default config files location a class attribute  https://review.opendev.org/751291
14:14 <openstackgerrit> Matthieu Huin proposed zuul/zuul master: Add zuul-client to requirements  https://review.opendev.org/750196
14:18 *** hashar has quit IRC
14:24 <tobiash> promote worked now: https://review.opendev.org/#/c/751259/ :)
14:25 <tobiash> and it's hosted: https://zuul-ci.org/docs/zuul-client/
14:25 <tobiash> (but not linked yet)
14:31 <mhu> tobiash, ah cool! Can we get an initial 0.0 release so that the project's on PyPI as well?
14:33 <tobiash> corvus is our release expert :)
14:34 <corvus> tobiash, mhu: i'll take a look in a bit
14:34 <mhu> thanks!
14:35 <fungi> mhu: you should be able to submit a change against the zuul/zuul-website repository to add it to the docs list
14:35 <fungi> when you're ready
14:35 <mhu> fungi, I'll have a look
14:39 <corvus> mhu, tobiash: commit 56981c76df188d78e8395260a19eee9e5ad16b54 (HEAD -> master, tag: 0.0.0, origin/master, origin/HEAD, refs/changes/59/751259/1)
14:39 <corvus> that look right?
14:40 <tobiash> lgtm
14:40 <mhu> corvus, yep! Should allow for the rest to get going
14:40 <mhu> thanks
14:41 <tobiash> mhu: thinking about https://review.opendev.org/750196 it might make sense to additionally expose those tests in their own job so they can be used in the zuul-client repo as integration tests
14:41 <mhu> tobiash, I was thinking of doing something like what's done with nodepool: https://review.opendev.org/#/c/751264/
14:41 <mhu> would that work for you?
14:42 <corvus> mhu: pushed
14:42 <mhu> \o/
14:43 <tobiash> mhu: wfm
14:44 *** sgw has joined #zuul
14:48 <openstackgerrit> Matthieu Huin proposed zuul/zuul-website master: Add link to zuul-client documentation  https://review.opendev.org/751312
14:49 <mhu> uh, there might be a problem with the archive name
14:49 <mhu> creating '/home/zuul/src/opendev.org/zuul/zuul-client/dist/zuul-0.0.0-py3-none-any.whl' and adding '.' to it
14:51 <fungi> i'm not getting whatever the problem is with that archive name
14:51 <corvus> oh that should say "zuul-client-0.0.0"?
14:51 <fungi> aha, yep
14:51 <fungi> it's likely needing to be tweaked in setup.cfg
14:51 <mhu> yep, exactly
14:51 <mhu> sorry about that
14:52 <fungi> the package name needs to not be inherited from the module name
14:52 <corvus> https://zuul.opendev.org/t/zuul/build/3dbd2b28fc46457eb10e1c2dddf14c98/log/job-output.txt#486
14:52 <corvus> congratulations, you just uploaded zuul 0.0.0
14:52 <corvus> https://pypi.org/project/zuul/0.0.0/
14:53 <mhu> ahaha
14:53 <corvus> i'm not laughing
14:53 <fungi> i can delete that release
14:53 <mhu> I mean, at least it didn't erase 3.19.1
14:53 <corvus> i'm more concerned about whether we overwrote a real release
14:54 <fungi> pypi (warehouse) thankfully won't allow you to replace a release or a file
14:54 <corvus> ok.  i thought there was a 0.0.0
14:54 <corvus> fungi: i am in favor of you deleting that
14:54 <fungi> it may have allowed us to upload additional files for 0.0.0 if their names were different than existing files for that release
14:55 <corvus> looks like it's just the new files
14:55 <fungi> though from what i could tell digging in zuul git repository history, we started versioning at 1.0.0
14:55 <fungi> you know, because we could ;)
14:55 <corvus> we may have had a 0.0.0 release with no files (due to the old pypi registration system)
14:55 <corvus> that's what i used to do when it could be done
14:56 <fungi> maybe. i used to just use "0"
14:56 <fungi> though i'm pretty sure the release would have been marked as many years ago rather than 7 minutes ago in that case
14:56 <corvus> even better
14:57 <fungi> pypi has normally kept the release timestamp the same even when new files are uploaded for an existing release (like adding more wheels or something)
14:59 <fungi> #status log deleted errant zuul-0.0.0-py3-none-any.whl and zuul-0.0.0.tar.gz files and the corresponding 0.0.0 release from the zuul project on pypi
14:59 <openstackstatus> fungi: finished logging
15:00 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Fix package metadata  https://review.opendev.org/751315
15:00 <fungi> going to see if i need to remove them from tarballs.o.o as well
15:01 <fungi> looks like we only upload branch tip artifacts there, no release artifacts or signatures: https://tarballs.opendev.org/zuul/zuul/
15:02 <fungi> and not since 2020-02-20 apparently
15:02 *** harrymichal has joined #zuul
15:05 <mhu> fungi, thanks for looking into that, https://review.opendev.org/#/c/751315/ should get things back to normal. My apologies for letting that typo go past!
15:06 <fungi> i think i reviewed the addition too and didn't spot it, so it's not all on you
15:07 <corvus> fungi: thx
15:07 <corvus> mhu: should that be zuulclient or zuul-client ?
15:10 <mhu> corvus, every other project is named zuul-something, I'll add the hyphen
15:11 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Fix package metadata  https://review.opendev.org/751315
15:12 <corvus> mhu: maybe capitalize that "a" at the start?  i think that will show up on pypi
15:12 <corvus> (it's a nit, but a big nit)
15:13 <openstackgerrit> Matthieu Huin proposed zuul/zuul-client master: Fix package metadata  https://review.opendev.org/751315
15:13 <mhu> voilà :)
15:13 <corvus> tobiash, fungi: ^ lgtm
15:15 *** harrymichal has quit IRC
15:18 <fungi> yep, when i build with that it produces zuul-client-0.0.1.dev1.tar.gz and zuul_client-0.0.1.dev1-py3-none-any.whl now, so should be all set
15:29 <openstackgerrit> Merged zuul/zuul-client master: Fix package metadata  https://review.opendev.org/751315
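[Editor's note: the metadata fix discussed above amounts to setting the distribution name explicitly in setup.cfg rather than letting it be inferred from the module name, which is why the wheel came out as "zuul-0.0.0". A minimal sketch, assuming pbr-style packaging; fields other than the name are illustrative, not the repo's actual values.]

```ini
# setup.cfg (sketch) -- "name" controls the published distribution and
# the sdist/wheel filenames; omitting it here is what let the package
# get built and uploaded as "zuul" instead of "zuul-client".
[metadata]
name = zuul-client
summary = A CLI for Zuul
description-file = README.rst
```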
15:39 *** harrymichal has joined #zuul
15:46 *** jcapitao has quit IRC
15:47 <corvus> mhu, fungi: commit 1d2301814b5c27e2e712e50a5beb0e96fccf3bab (HEAD -> master, tag: 0.0.1, origin/master, origin/HEAD, refs/changes/15/751315/3)
15:47 <corvus> look right?
15:48 <fungi> corvus: yep, that looks like my origin/HEAD too
15:49 <fungi> and is also the change i checked out and tested
15:50 <corvus> pushed
15:58 <openstackgerrit> Pierre-Louis Bonicoli proposed zuul/zuul master: gitlab: an "update" event isn't always a "labeled" action  https://review.opendev.org/750544
15:58 *** hamalq has joined #zuul
15:59 *** hamalq_ has joined #zuul
16:03 *** hamalq has quit IRC
16:08 *** harrymichal has quit IRC
16:31 *** saneax has quit IRC
16:39 *** rfolco|ruck is now known as rfolco|ruck|brb
16:43 <clarkb> is http://paste.openstack.org/show/797786/ a zuul bug?
16:43 <clarkb> I'm checking now if the problem persists in that repo, but maybe the executor crashed and leaked that file and that is another thing to scrub on startup?
16:43 <clarkb> the other thing that could be happening is having the build contexts leak out somehow?
16:45 <clarkb> -rw-r--r--   1 zuuld zuuld     0 Aug 25 07:18 index.lock <- uptime says 17 days which is ~ to that time
16:45 <clarkb> so ya I think the server may have crashed and we leaked the git index.lock
16:53 *** fdegir has quit IRC
16:53 <fungi> maybe it would be safest if executors cleaned their local repos at boot?
16:53 <fungi> at start, whatever
16:57 *** tobberydberg has quit IRC
16:57 <clarkb> ya I think we should add in a startup task to clear out index.lock files at least. Doing an update on all repos first would likely be very expensive
17:34 *** jpena is now known as jpena|off
17:37 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Clean up stale git index.lock files on merger startup  https://review.opendev.org/751370
17:37 <clarkb> something like that maybe for cleaning up the index.lock files. I was hoping the build dir cleanup had tests but I don't see any? Or maybe they were added after the original implementation and I need to look harder
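[Editor's note: the startup task being proposed can be sketched as a walk over the merger's git checkouts that removes any leftover index.lock. This is an illustrative sketch, not the code in the actual change; the root path and function name are hypothetical.]

```python
import os

def clean_stale_git_locks(repo_root):
    """Remove leftover .git/index.lock files under repo_root.

    Only safe at daemon startup, before any git operations run: a
    crashed process leaves the lock behind, and git then refuses to
    touch the repository until the file is deleted.
    """
    removed = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if os.path.basename(dirpath) == ".git" and "index.lock" in filenames:
            lock = os.path.join(dirpath, "index.lock")
            os.unlink(lock)
            removed.append(lock)
            dirnames[:] = []  # no need to descend further into .git
    return removed
```

A merger or executor would call this once with its git working-tree root before accepting any jobs.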
17:42 *** gmann_afk is now known as gmann
17:46 *** rfolco|ruck|brb is now known as rfolco|ruck
18:02 *** fdegir has joined #zuul
18:06 <fungi> could recent changes for zuul's webui have changed how api queries are being made? we're noticing a lot of "cache busting" (pragma: no-cache, cache-control: no-cache, max-age: 1) which we think has caused our deployment in opendev to no longer be able to offload requests onto apache mod_cache... zuul-web cpu utilization is quite high, response times are terrible, and apache definitely is not caching the status json for us
18:06 <fungi> has anyone else seen similar behavior recently?
18:07 <clarkb> and it seemed to start after we updated zuul-scheduler and zuul-web to pick up the hold build status in the db
18:08 <fungi> if that's intentional then we can probably come up with a workaround, just first trying to make sure we're not barking up the wrong tree
18:09 <fungi> our request volume doesn't look like it's particularly higher than usual, but we also weren't logging mod_cache info until yesterday so we can't be 100% certain we were successfully caching it before either
18:10 <fungi> it's just our only good explanation for what we're seeing at this point
18:10 <clarkb> ya we noticed that we lacked logging for this in the debugging process and have since added it
18:20 *** vishalmanchanda has quit IRC
18:25 <openstackgerrit> Clark Boylan proposed zuul/zuul master: Clean up stale git index.lock files on merger startup  https://review.opendev.org/751370
18:31 <corvus> clarkb: i don't always see a pragma header being sent
18:33 <corvus> clarkb: only when i do a shift-reload
18:34 <clarkb> corvus: the cache-control: no-cache seems to be there if I just let it sit and wait
18:34 <clarkb> as is the pragma
18:35 <clarkb> it also seems to be refreshing way more quickly than I would normally expect
18:36 <clarkb> every 5 seconds? (maybe this is related)
18:38 <corvus> every 5 seconds is normal
18:39 <clarkb> oh I thought it was 60 seconds for some reason
18:39 <corvus> the only way i can get mine to send a pragma is by shift reload.  i've never sent a cache-control header afaict
18:40 <clarkb> I see both on every request. Maybe it is browser specific: FF 81.0b7 here I'll check chrome now
18:40 <corvus> i do see one bug: it's supposed to stop sending the request if the browser window is not active, but that only happens if it's not in the middle of a request when the user switches focus.  if they do that while the request is outstanding, it does not disable the request loop.
18:40 <corvus> ff80.0.1
18:41 <corvus> i believe that changed recently
18:41 <corvus> (the request loop disabling code)
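[Editor's note: the bug corvus describes — the visibility check only takes effect when no request is in flight, so a tab hidden mid-request keeps polling — comes down to where the check sits in the refresh loop. The real code is the web UI's JavaScript; this is a language-neutral sketch in Python with hypothetical names.]

```python
import time

def status_refresh_loop(fetch_status, is_hidden, interval=5, max_cycles=100):
    """Poll fetch_status() every `interval` seconds while the page is visible.

    The key detail: visibility is re-checked on *every* cycle, after any
    in-flight request finishes.  Checking it only when the loop is first
    armed (the bug) means a tab hidden while a request is outstanding
    keeps polling indefinitely.
    """
    fetches = 0
    for _ in range(max_cycles):
        if is_hidden():          # re-evaluated each cycle, not just at start
            break
        fetch_status()
        fetches += 1
        time.sleep(interval)
    return fetches
```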
18:41 <clarkb> chrome doesn't seem to provide cache-control or pragma on the 5 second interval refreshes. Doing a shift reload now to see if it does then
18:42 <clarkb> shift reload in chrome set both
18:42 <corvus> then see if the next request clears them
18:42 <corvus> (or maybe the one after that if the loop gets out of sync)
18:42 <clarkb> yes the next regular interval (and subsequent intervals) has cleared them
18:43 <fungi> so maybe the performance decrease is due to no longer skipping refreshes on inactive windows, and we've actually been failing to cache stuff all along
18:43 <clarkb> if I'm the only one using broken ff beta then that doesn't explain why we aren't caching though
18:43 <clarkb> fungi: oh good call
18:44 <corvus> so we have 2 hypotheses: a) "pragma: no-cache" header sent consistently in ff beta.  b) refresh loop broken in js causing lots of extra requests from people leaving backgrounded status tabs open
18:44 <fungi> however, like i said, our actual request volume doesn't look particularly higher going by bandwidth utilization and packet rate
18:44 <corvus> ah, and c) web server cache config is broken?
18:44 <corvus> clarkb: do we know we are not caching?  if so, how do we know that?
18:45 <fungi> by adding cache details to the access log
18:45 <clarkb> that was a change we made yesterday to better understand this
18:45 <clarkb> and we see the static resources being handled by the caching system but not the status json
18:46 <fungi> yeah, when we started to dig into it we noticed that apache's cacheenable directive isn't documented as supporting regular expressions, so we tried changing that and got the static content to start caching, but not the status api calls for the multi-tenant vhost (but it's somehow caching them for the whitelabeled vhost)
18:47 <fungi> we need a wildcarded pattern match to cover the multi-tenant status api path, so tried putting cacheenable in a locationmatch instead, but it's still not getting cached
18:48 <clarkb> fungi: but also you tried it with specific tenant names and that wasn't caching either
18:48 <clarkb> and that is when we started to think it may be due to the requests themselves
18:48 <corvus> how can you tell a cache hit?
18:49 <corvus> i see that it says "cache miss" in the access log
18:49 <clarkb> I think it will say cache hit, but also something about caching it on this request
18:49  * clarkb looks
18:49 <fungi> oh, yep, instead of the locationmatch i tried to just cacheenable mem /api/tenant/openstack/status and we still weren't getting any hits
18:49 <fungi> corvus: grep for "cache hit"
18:50 <corvus> ah.  i don't see that for any status url
18:50 <clarkb> cache miss: attempting entity save and cache hit
18:50 <clarkb> you see both for /static
18:50 <corvus> i see either "cache miss" or nothing for status
18:50 <fungi> corvus: yep, that's the problem we're running into
18:50 <clarkb> so caching is working for that path but not status
18:50 <corvus> i thought someone said whitelabel status was being cached
18:51 <fungi> er, i don't recall if it was being cached, but it was at least hitting mod_cache according to the access log
18:51 <fungi> while the non-whitelabeled status api path was not that i could find
18:51 <fungi> even when just hardcoding it to one of the tenants and not including any wildcarding
18:51 <clarkb> corvus: yes if you filter by /api/status which is the whitelabeled path then you'll see both
18:51 <clarkb> (though not many cache hits they do exist)
18:51 <fungi> (right now it's configured for a locationmatch, but that's also not working)
18:52 <clarkb> but if you filter by /api/tenant/ you get nothing
18:58 *** tobberydberg has joined #zuul
18:59 <clarkb> corvus: fwiw I'm not entirely sure that we're caching the whitelabeled status correctly, just that the system agrees it is something to cache (and appears to have done so occasionally)
19:00 <corvus> clarkb: 4 times in the past day sounds pretty spurious
19:01 <fungi> yeah, i don't know that's even a frequency we can chalk up to not many people using the whitelabeled vhost these days
19:02 <corvus> there are lots of whitelabeled requests
19:03 <corvus> plenty of times we see multiple requests / second for whitelabel, so should be enough to hit the cache
19:05 <corvus> abusing the inactivity bug -- i now have 4 tabs reloading continually, all of them are cache misses
19:06 <corvus> and i do occasionally see duplicate content length, so it's very likely at least some of the time i'm hitting the internal zuul-web cache (so i know it's within the 1-second window)
19:13 *** johanssone_ has joined #zuul
19:15 *** johanssone has quit IRC
19:24 <clarkb> corvus: do you think the bug causing background refreshes is new enough to have been pulled in by the latest restart? or is that an old one?
19:32 <corvus> clarkb: i think it's new, lemme dig
19:32 <corvus> (was just out to lunch.  so to speak)
19:32 <clarkb> no worries I'm just finishing up mine too
19:33 <clarkb> it does seem like if something like that is new with the latest restart it makes a likely candidate for the issue
19:33 <clarkb> doesn't explain the apache caching problems but maybe we can solve that separately if this is the underlying problem
19:36 <AJaeger> clarkb, corvus, here's a reorg of the upload-logs roles in zuul-jobs, could you review the stack at https://review.opendev.org/#/c/742732/7, please?
19:37 <corvus> clarkb: 1ecbe58474 ed9d0446d5 70a7997197 are first-pass candidates from git blame; all authored in june/july timeframe (unsure when merged)
19:38 <clarkb> corvus: iirc our last restart of zuul-web was july 31 I feel like I checked that but need to find it in logs
19:38 <clarkb> oh that was before we restarted for scroll fixes
19:39 <corvus> 1ecbe58474 merged aug 24; ed9d0446d5 merged jul 13; 70a7997197 merged jul 7
19:40 <clarkb> we restarted on the 28th for scroll fixes
19:40 <clarkb> August 28 I mean
19:40 <corvus> so all of these were in place then
19:41 <corvus> and aug 24 is the most recent related change
19:41 <clarkb> ya so either it's the cause and no one complained until recently or it's something else
19:41 <corvus> i doubt it's that no one complained :)
19:42 *** hashar has joined #zuul
19:43 <clarkb> looking at gerrit the only web related change between the earlier build page restart and scroll fixes and this restart was the change we restarted for
19:43 <clarkb> which added the hold attribute to builds
19:44 <clarkb> maybe it's the scheduler then (since the zuul web process is largely just a fancy proxy for that)
19:44 <corvus> clarkb: the logs indicate that zuul web internal caching is working as expected
19:45 <corvus> clarkb: zuul-web is only making a request to the scheduler 1/sec
19:45 <corvus> (so zuul-web is protecting the scheduler, but nothing is protecting zuul-web)
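[Editor's note: the "zuul-web protects the scheduler" layer corvus mentions is a time-based cache — however many clients poll, the backend is consulted at most once per TTL. A minimal sketch of that pattern; illustrative only, not Zuul's actual implementation, and a real threaded server would also lock around the refresh.]

```python
import time

class TTLCache:
    """Serve a cached value, refreshing from the backend at most once per ttl."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self._fetch = fetch      # expensive backend call (e.g. a scheduler query)
        self._ttl = ttl
        self._clock = clock      # injectable for testing
        self._value = None
        self._expires = None     # None -> nothing cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value
```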
19:45 <clarkb> I guess it can also be an update in cherrypy ?
19:45 <clarkb> no new cherrypy releases
19:46 <corvus> check cheroot
19:46 <clarkb> https://pypi.org/project/cheroot/8.4.5/ is from august 24
19:46 <corvus> we unpinned on jul 14
19:46 <corvus> so that all should be the same then
19:49 <clarkb> when I looked yesterday (and now looking again) it seems the gearman requests finish in a reasonable amount of time
19:49 <clarkb> maybe not always super fast but hundreds of milliseconds not multiple whole seconds
19:49 <clarkb> but we don't log the job uuids so may have things mismatched
19:53 <clarkb> I'm able to reproduce what fungi said which is that direct requests to zuul-web are slow
19:53 <clarkb> which has me leaning back towards maybe we need better caching in apache to protect zuul-web, but also what changed here ?
19:57 <clarkb> corvus: I notice that 404s are slow too
19:57 <clarkb> corvus: almost like it is routing that is the problem
19:58 <clarkb> because we should 404 early for something like /api/tenant/zuul/shoulderror
19:58 <clarkb> and not do any expensive backend processing
19:58 <corvus> clarkb: i imagine zuul-web is just cpu starved and backlogged
20:00 <clarkb> ya it is a busy process (and it isn't forking, right)
20:01 <corvus> yes, it's threaded
20:02 <clarkb> so we're back to figuring out apache I guess. Maybe we need to increase the max ttl ?
20:04 <clarkb> perhaps the max-age value is hurting us?
20:04 <clarkb> though you'd expect apache would cache it for a second?
20:05 <corvus> yeah, pretty sure this worked at one point
20:05 <clarkb> I wonder if that is calculated against the last modified value and not when apache gets it
20:05 <clarkb> maybe that delta is > 1 due to the slowness
20:05 <corvus> if so, could be a simple tipping point scenario
20:06 <clarkb> "Well formed content that is intended to be cached should declare an explicit freshness lifetime with the Cache-Control header's max-age or s-maxage fields, or by including an Expires header." and "If a response does not include an Expires header but does include a Last-Modified header, mod_cache can infer a freshness lifetime based on a heuristic, which can be controlled through the use of the CacheLastModifiedFactor directive."
20:06 <clarkb> so I think that is a possibility here
20:06 <clarkb> reading about that CacheLastModifiedFactor directive now
20:10 <clarkb> max-age is meant to be taken from the time of request
20:10 <clarkb> so ya I think if zuul takes longer than a second to respond then we stop caching?
20:12 <clarkb> and perhaps the background refreshes are a component in tipping over
20:20 <clarkb> thinking out loud here, maybe we should bump that internal caching and max-age to 10 seconds?
20:21 <clarkb> then see if the behavior changes at all? (the longest request I saw was 9 seconds so expect 10 to be plenty)
20:22 <corvus> i'm not a fan of that
20:23 <corvus> i think a large installation like opendev either needs a good caching layer or more zuul-web instances
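[Editor's note: the caching layer under discussion is roughly the mod_cache setup already being attempted. A hypothetical vhost fragment is sketched below — the paths, provider ("mem", as fungi tried above), and values are illustrative and the real OpenDev vhost differs; CacheLastModifiedFactor and CacheDefaultExpire are the fallback knobs from the docs clarkb quotes, used only when the backend response carries no explicit max-age.]

```apache
# Hypothetical reverse-proxy vhost fragment for zuul-web
CacheQuickHandler off
CacheEnable mem /api/status
CacheEnable mem /api/tenant/
# If a response has Cache-Control: max-age, mod_cache honors it from
# the time of the request; otherwise fall back to heuristics/defaults:
CacheLastModifiedFactor 0.1
CacheDefaultExpire 1
ProxyPass        /api/ http://127.0.0.1:9000/api/
ProxyPassReverse /api/ http://127.0.0.1:9000/api/
```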
20:23 *** nils has quit IRC
20:25 <clarkb> hrm any idea if SO_REUSEADDR is set? if so we may be able to just run a few zuul-webs on the existing host
20:26 <clarkb> https://github.com/cherrypy/cherrypy/issues/1088 implies that it is set
20:27 <clarkb> webknjaz: ^ would it be crazy to start a few cherrypy processes on port 9000 and let the kernel decide where connections go?
20:30 <fungi> yeah, we have plenty of available processors, zuul-web seems to only want to use one
20:31 <fungi> well, one processor worth anyway
20:32 <fungi> though the work *seems* to get distributed across the processors according to top (if you hit the 1 key to expand the processor list)
20:32 <fungi> so i can't say for sure it would actually help
20:32 <clarkb> fungi: its a single python process with many threads and due to the GIL I think the only way to effectively use multiple cpus is to fork
20:33 <openstackgerrit> James E. Blair proposed zuul/zuul master: Correct visibility check in web JS  https://review.opendev.org/751425
20:33 <fungi> i wonder why top makes it look like those threads are distributed across processors then
20:33 <corvus> fungi: it's io heavy so it can use > 1 processor at a time
20:34 <corvus> but barely
20:35 <corvus> clarkb: is there no way to actually use apache to cache?
20:35 <clarkb> corvus: we may be able to tell apache2 to ignore the max-age sent by zuul-web
20:35 <corvus> clarkb, fungi, zbr:  https://review.opendev.org/751425 should fix the visibility check
20:36 <corvus> clarkb: i don't want to ignore it, i want it to honor it :)
20:36 <corvus> "cache this for one second after you get it"
20:37 <clarkb> corvus: there are lots of tunables I'm just not sure how to express that in this case
20:38 <fungi> and sorry i'm mostly on silent running for the past few... trying to get okonomiyaki grilled and consumed before the mordred hour
20:40 <corvus> fungi: grill extra and share
20:41 <fungi> it's too thick to fit through the fax machine
20:42 <fungi> (you're not supposed to press down on it while it cooks!)
20:47 <clarkb> looking at the js fix now
20:50 <clarkb> hrm I should go pour a whiskey for the mordred happy hour
20:52 <fungi> i have a bowl of sake
20:52 <clarkb> the way the air is here I bet sake would almost taste like whiskey
20:52 <clarkb> (not really the trees burning are not all oak)
20:53 <fungi> this is pretty terrible sake (the finest sake, gekkeikan, serving the imperial household by appointment!)
20:54 <fungi> i normally use it for cooking, but desperate times call for desperate sake
20:54 <fungi> our local grocery carries 1.5 liter bottles of the stuff
20:54 <clarkb> actually I've just remembered I have beer
20:55  * mordred has made a caipirinha - but probably won't try to fax it to anyone
21:09 *** rlandy is now known as rlandy|afk
21:14 <webknjaz> @clarkb: it is set but I haven't tried using it https://github.com/cherrypy/cheroot/pull/53/files#diff-b0366adf530cee9249c1888ba4f32260R1550
21:15 <clarkb> webknjaz: great well I expect https://review.opendev.org/751426 to test it
21:17 *** nils has joined #zuul
21:20 *** rfolco|ruck has quit IRC
21:53 *** rlandy|afk is now known as rlandy
21:59 *** hashar has quit IRC
22:13 *** nils has quit IRC
22:15 <clarkb> webknjaz: Timeout('Port 9000 not free on 127.0.0.1.') so it's not quite working, but I'm not likely to debug that further today. I'll poke at it more when I have time
22:17 <clarkb> "except when there is an active listening socket bound to the address." <- that may be the problem
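[Editor's note: what clarkb runs into here is the classic distinction between SO_REUSEADDR, which lets a new listener bind past a socket in TIME_WAIT but not alongside an *active* listener, and Linux's SO_REUSEPORT, which does allow several processes to listen on the same port with the kernel balancing connections between them. A minimal sketch of the latter, assuming Linux; this is not what cheroot does out of the box.]

```python
import socket

def make_reuseport_listener(host="127.0.0.1", port=9000):
    """Create a listener that other processes can bind alongside.

    With SO_REUSEPORT set on every listener *before* bind(), the kernel
    load-balances incoming connections across them -- the behavior the
    multiple-zuul-web experiment above was after.  SO_REUSEADDR alone
    fails once one listener is already active, hence the Timeout.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s
```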
23:13 *** hamalq has joined #zuul
23:14 *** hamalq_ has quit IRC
23:20 *** tosky has quit IRC
23:37 *** hamalq has quit IRC
23:39 *** armstrongs47 has joined #zuul
23:49 *** armstrongs47 has quit IRC

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!