Wednesday, 2019-08-28

*** noorul has joined #zuul00:15
*** noorul has quit IRC00:28
*** noorul has joined #zuul00:30
*** noorul has quit IRC00:40
*** noorul has joined #zuul00:40
*** noorul has quit IRC00:55
*** wxy-xiyuan has joined #zuul01:08
*** jamesmcarthur has joined #zuul01:22
*** bhavikdbavishi has joined #zuul01:30
*** noorul has joined #zuul01:32
*** noorul has quit IRC01:38
*** threestrands has joined #zuul01:39
*** bhavikdbavishi has quit IRC01:52
*** jamesmcarthur has quit IRC01:54
*** noorul has joined #zuul01:54
*** noorul has quit IRC01:59
openstackgerritJeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator  https://review.opendev.org/67731502:07
*** spsurya has joined #zuul02:11
*** noorul has joined #zuul02:15
*** noorul has quit IRC02:20
*** noorul has joined #zuul02:23
*** noorul has quit IRC02:31
*** noorul has joined #zuul02:36
*** bhavikdbavishi has joined #zuul03:14
*** bhavikdbavishi1 has joined #zuul03:19
*** bhavikdbavishi has quit IRC03:20
*** bhavikdbavishi1 is now known as bhavikdbavishi03:20
*** ianychoi has quit IRC03:29
*** ianychoi has joined #zuul03:30
*** rfolco has quit IRC03:32
*** noorul has quit IRC03:50
*** noorul has joined #zuul04:22
*** raukadah is now known as chkumar|rover04:47
*** bjackman has joined #zuul05:03
*** noorul has quit IRC05:06
*** jhesketh has quit IRC05:34
*** jhesketh has joined #zuul05:42
openstackgerritMerged zuul/zuul master: web: test trailing slash are removed from renderTree  https://review.opendev.org/67682406:14
*** sanjayu_ has joined #zuul06:16
openstackgerritBenedikt Löffler proposed zuul/zuul master: Report retried builds via sql reporter.  https://review.opendev.org/63350106:18
*** sanjayu_ has quit IRC06:50
openstackgerritMonty Taylor proposed zuul/zuul master: Add linter rule disallowing use of var  https://review.opendev.org/67384107:08
yoctozeptohey folks, any thoughts on: https://review.opendev.org/678273 ? please review before it rots :-)07:10
*** themroc has joined #zuul07:34
*** jpena|off is now known as jpena07:40
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109907:56
*** mhu has joined #zuul08:01
*** jangutter has joined #zuul08:03
*** yolanda__ is now known as yolanda08:30
*** sanjayu_ has joined #zuul09:09
*** sshnaidm|afk is now known as sshnaidm10:02
tobiashcorvus: do you want to have a look at 678895 (the ref fix)? Or shall we +a it? It has +2 from me and tristanC.10:44
*** hashar has joined #zuul11:22
corvustobiash: +3 thx11:29
*** hashar has quit IRC11:33
*** hashar has joined #zuul11:33
*** jpena is now known as jpena|lunch11:35
*** rlandy has joined #zuul11:52
*** rlandy is now known as rlandy|ruck11:53
*** rfolco has joined #zuul12:15
*** rlandy|ruck is now known as rlandy|ruck|mtg12:19
openstackgerritMerged zuul/zuul master: Check refs and revs for repo needing updates  https://review.opendev.org/67889512:21
*** jamesmcarthur has joined #zuul12:22
*** jamesmcarthur has quit IRC12:29
*** jpena|lunch is now known as jpena12:32
openstackgerritMerged zuul/zuul master: Add linter rule disallowing use of var  https://review.opendev.org/67384112:35
*** jamesmcarthur has joined #zuul12:48
*** bjackman has quit IRC12:49
*** bjackman has joined #zuul12:52
*** dkehn has quit IRC12:53
*** nhicher has joined #zuul13:21
mhuShrews, no prob, happy to be of help13:30
openstackgerritDavid Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API  https://review.opendev.org/67905713:32
Shrewsmhu: am I on the right track with that? ^^^^13:32
mhuShrews, I'm in meetings right now but I'll have a look ASAP13:32
Shrewsmhu: sure, no hurry. thx13:33
*** sanjayu_ has quit IRC13:34
*** bjackman has quit IRC13:34
*** sanjayu_ has joined #zuul13:34
openstackgerritDavid Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API  https://review.opendev.org/67905713:39
tobiashare opendev's executors backed by ssds?13:51
*** bhavikdbavishi has quit IRC13:59
*** bjackman has joined #zuul14:04
*** dkehn_ has joined #zuul14:14
*** sshnaidm has quit IRC14:17
*** brennen is now known as brennen|afk14:18
*** sshnaidm has joined #zuul14:19
*** openstackgerrit has quit IRC14:22
tristanCShrews: it seems like changes to the diskimage structs (e.g. adding a python-path) are not picked up by the provider, which still spawns/registers nodes with the previous python-path. It seems like we need to restart the launcher process14:23
fungitobiash: i don't believe so, looks like they're on whatever rackspace's default is for the rootfs and the ephemeral disk where we mount /var/lib/zuul14:23
fungipresumably "spinning rust" (via sata)14:23
tristanCShrews: any idea how to make provider reload diskimage definition when they change?14:24
tobiashfungi: I was wondering as your executors seem to perform much better than ours (which are ceph backed)14:24
fungitobiash: just a sec and i'll get more specifics14:24
tobiashbut we're currently in process of moving them to nvme disks14:24
tobiashtristanC: it should reload automatically afaik14:25
tobiashif not that seems like a bug14:26
fungitobiash: we've booted them from rackspace's "8 GB Performance" flavor in their dfw region, and this is not one of their special "ssd" flavors14:26
tobiashfungi: ah thanks14:26
tristanCtobiash: iiuc the openstack.config module keeps a copy of the diskimage but doesn't check for changes (the diskimages list is global vs the openstack provider's local list)14:26
tobiashtristanC: hrm, we should probably fix this14:27
fungitobiash: additional details, they're using ubuntu bionic (18.04.2 LTS) with linux 4.15.0-46-generic and the filesystems are formatted ext414:27
tobiashfungi: cool, thanks!14:27
fungiplease ask if you want any other details. i don't think anything besides the passwords is meant to be particularly secret ;)14:28
tristanCShrews: tobiash: oh my bad, the python-path change requires a new image upload14:28
*** jamesmcarthur has quit IRC14:29
*** jamesmcarthur has joined #zuul14:29
tobiashfungi: thanks, I was interested mainly in io performance characteristics14:29
*** jeliu_ has joined #zuul14:29
fungiour cacti graphs should show some i/o metrics if you haven't looked yet14:30
tobiashfungi: so your executors average about 4k iops14:32
fungineat, i hadn't looked but that sounds like a lot14:33
tobiashsounds like ssd ;)14:33
tobiashhttp://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=64197&rra_id=0&view_type=tree&graph_start=1566916407&graph_end=1567002807 for reference14:34
*** amotoki_ has quit IRC14:34
*** amotoki has joined #zuul14:35
*** jamesmcarthur has quit IRC14:42
*** openstackgerrit has joined #zuul14:47
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Switch to fetch-sphinx-tarball for tox-docs  https://review.opendev.org/67643014:47
*** jamesmcarthur has joined #zuul14:49
fungitobiash: one thing we've observed is that the sla in a lot of public providers use write-back caching instead of write-through, so we could simply be seeing numbers reflecting writes to memory there14:57
*** igordc has joined #zuul14:57
*** bjackman has quit IRC15:00
*** jamesmcarthur has quit IRC15:02
tobiashfungi: according to flavor description it seems to be (undefined) ssd: https://developer.rackspace.com/docs/cloud-servers/v2/general-api-info/flavors/15:04
*** themroc has quit IRC15:04
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job  https://review.opendev.org/67908215:06
fungihuh, neat. maybe it's just their default block storage which is on sata then, i know you have to request a special flavor of that to get ssd15:06
fungi(or at least you used to, maybe they've upgraded all their storage?)15:07
clarkbnote /var/lib/zuul is the ephemeral device15:08
clarkbwhich may be different hardware than the root disk15:08
fungiyep15:08
fungii haven't seen where they indicate what hardware serves their ephemeral disks15:08
mhuShrews: I've commented the review, you're close :) Next I'll help with the tests if you'd like15:11
Shrewsmhu: trying to figure out the tests now  :)15:11
mhuShrews, I think the existing ones for autoholding from the REST API should be a good starting point15:12
mhualthough they require auth15:12
*** jamesmcarthur has joined #zuul15:13
Shrewsmhu: what portion is the "boilerplate authentication/authorization code" ?15:14
Shrewsmhu: the part beginning with: rawToken = cherrypy.request.headers['Authorization'][len('Bearer '):]  ?15:15
mhuShrews, from "basic_error ..."15:16
mhuhttps://opendev.org/zuul/zuul/src/branch/master/zuul/web/__init__.py#L244 to https://opendev.org/zuul/zuul/src/branch/master/zuul/web/__init__.py#L262 for example15:17
*** jamesmcarthur has quit IRC15:18
mhuthis would be better factored into a single method ... I thought of having this as a decorator but was advised against it15:18
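
For context, a minimal sketch of how that repeated token-handling boilerplate might be pulled into a single helper. This is not Zuul's actual code; authenticate_token() and is_authorized() are stand-in names for whatever the web app already provides.

    import cherrypy

    def require_authorization(handler, tenant):
        """Extract the bearer token and check the caller may act on tenant."""
        try:
            raw_token = cherrypy.request.headers['Authorization'][len('Bearer '):]
        except KeyError:
            raise cherrypy.HTTPError(401, 'Missing or malformed Authorization header')
        claims = handler.authenticate_token(raw_token)   # hypothetical helper
        if not handler.is_authorized(claims, tenant):    # hypothetical helper
            raise cherrypy.HTTPError(403, 'Insufficient privileges')
        return claims
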
*** tosky has joined #zuul15:21
Shrewsmhu: that portion of code calls is_authorized() which requires a tenant parameter. How does that affect your suggestion to not use tenant in the api url?15:22
mhuShrews, IIUC autohold-info gets you the tenant info for the autohold request15:23
mhuso I'd suggest calling autohold-info first, fetch the tenant, call is_authorized, then proceed15:24
Shrewsok15:24
mhuthis way you can also catch errors when the request id is incorrect and return a 40415:24
toskyanyone up for re-reviewing https://review.opendev.org/#/c/674334/ ?15:29
openstackgerritDavid Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API  https://review.opendev.org/67905715:31
*** chkumar|rover is now known as raukadah15:31
Shrewsmhu: does that mean you'd suggest i use 404 instead of 500 within _autohold_info() if the rpc call fails?15:33
Shrewsi just copy-pasted that code from elsewhere15:33
mhuShrews: that depends on the type of error15:37
mhu4XX HTTP statuses are generally used for user-induced errors15:37
mhufor example 404 (Not Found) is an adequate return code if the request ID is incorrect15:38
mhu401 means Unauthorized, ie the user needs to authenticate (or needs more privileges) in order to perform an action15:38
*** sanjayu_ has quit IRC15:38
mhu500 is a catch-all code for server-side errors15:39
*** sanjayu_ has joined #zuul15:39
mhuso in your case 500 is correct, just not very informative15:39
mhubut maybe the RPC itself won't give much info15:39
Shrewsoh, i think i need to check for an empty dict and THEN return 40415:40
*** rlandy|ruck|mtg is now known as rlandy15:43
mhuShrews, yep15:44
openstackgerritDavid Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API  https://review.opendev.org/67905715:44
Shrewsthat should do it ^^15:44
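
A rough illustration of the status-code handling mhu describes (500 only for genuine server-side failures, 404 when the request id does not exist). The RPC call name and its return shape here are assumptions for the sketch, not Zuul's real interface.

    import json
    import cherrypy

    class AutoholdEndpointSketch:
        @cherrypy.expose
        def autohold_info(self, request_id):
            try:
                data = self.rpc.get_autohold_request(request_id)  # hypothetical RPC
            except Exception:
                raise cherrypy.HTTPError(500, 'Error fetching autohold request')
            if not data:
                # an empty result means the request id is unknown: 404, not 500
                raise cherrypy.HTTPError(404, 'Autohold request not found')
            cherrypy.response.headers['Content-Type'] = 'application/json'
            return json.dumps(data).encode('utf-8')
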
*** rlandy is now known as rlandy|brb15:46
*** panda is now known as panda|rover15:47
*** sshnaidm is now known as sshnaidm|afk15:49
*** noorul has joined #zuul16:02
*** jpena is now known as jpena|off16:05
noorulHow does the log collection work in Zuul?16:07
*** rlandy|brb is now known as rlandy16:07
noorulWhere is it actually stored?16:07
clarkbnoorul: wherever you configure it is the short answer. There are roles to upload logs to openstack swift storage locations (what we currently use) as well as rsync onto filesystems which you can serve with a webserver (what we used previously)16:10
noorulclar16:11
noorulclarkb: Actually the scripts are triggered remotely using Ansible. So the logs are stored on the node where Ansible runs. Is this the executor?16:12
clarkbnoorul: there is a bit of coordination between the executor and test nodes to make this work. Basically there are log collection roles that pull logs onto the executor from the test nodes, then roles which publish those logs from the executor to, say, swift or a fileserver16:15
clarkbso there are two steps. Collect logs into publication source dir, run publication role to publish publication source dir16:16
noorulAm I missing anything in the config http://paste.openstack.org/show/766638/ ?16:18
clarkbfor logging? no. The logging happens as part of your job config. Typically you will put it in your base job16:19
noorulThis is the base log http://paste.openstack.org/show/766639/16:20
clarkbfor opendev this is the chain of things that gets you logs: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L55 https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/post.yaml#L3-L4 https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/post-logs.yaml16:21
clarkbthe first bit is where the base job includes the playbooks then the first playbook collects logs and the second publishes them16:21
clarkbnoorul: ya so your post-run playbook(s) should coordinate publishing of logs16:24
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job  https://review.opendev.org/67908216:27
*** hashar has quit IRC16:33
mordredtristanC: ^^ interesting. how are you thinking of using that?16:34
clarkbmordred: I'm guessing that will be used to benchmark nodepool nodes16:36
mhuclarkb, yep16:37
mordredclarkb: ah - yeah. makes sense16:37
tristanCmordred: to test nodepool label performance, here is how: https://softwarefactory-project.io/r/#/c/16145/16:37
mhuand more generally cloud providers16:37
mordredfor some reason I was reading it as a test of nodepool itself which didn't make any sense. that makes much more sense16:37
clarkbopendev has used tempest for years for that since it isn't artificial and tends to map to our needs well16:37
clarkbit is actually a really good test of cpu and disk and network16:38
tristanCclarkb: well, we'd like to know what is causing a difference, e.g. check cpu, memory, io, network, ...16:38
clarkbya phoronix test suite is likely best if you want to examine specific items rather than a holistic "is this node fast enough to run our jobs"16:40
mordredyeah. seems like a good tool in the toolbox16:41
tristanCmordred: yeah, we figured that would be a nice addition to zuul-jobs :)16:43
AJaeger_mordred: the two week waiting period for https://review.opendev.org/676430 is over - and I just pushed an update for it to fix the problems I noticed. In case you want to ask for reviews ;)16:44
*** jeliu_ has quit IRC16:45
mordredtristanC: left a question/comment on it16:46
*** jamesmcarthur has joined #zuul17:00
noorulIf I make changes to the config-projects repo, will they get loaded automatically?17:01
*** jeliu_ has joined #zuul17:06
clarkbno, changes to config projects have to be merged before they take effect17:06
clarkb(this is for security reasons you don't want to expose secrets for example)17:06
noorulI did not get that17:10
clarkbchanges to config projects must be merged before they change how zuul operates. This ensures that humans can review the changes prior to implementing them which helps to avoid security problems with privileged info17:11
noorulclarkb: I directly pushed to master17:12
noorulMy main.yaml is here http://paste.openstack.org/show/766643/17:16
noorulI am not seeing all the roles under zuul-jobs under the tenant17:16
clarkbif you've pushed directly to master then I would expect zuul to pick it up. However I don't know how the bitbucket driver will handle that case17:17
tristanCnoorul: iirc, zuul may miss a direct push and thus skip reloading the config17:17
noorultristanC: So a restart of scheduler might help?17:18
clarkbtristanC: on Gerrit at least there should be an event for that case17:19
tristanCnoorul: if no ref-updated or change-merged event shows up in the scheduler log, then restarting the service will force it to use the latest config17:19
clarkband I would expect zuul to pick it up17:19
clarkb(but you'd have to push through gerrit not directly onto disk)17:19
noorulclarkb: I am not using Gerrit but Bitbucket server instead17:20
clarkbI know, I'm explaining that it should work in the gerrit case but may not in the other driver cases17:21
noorulOh I see17:21
*** panda|rover is now known as panda|rover|off17:23
*** noorul has quit IRC17:43
*** tosky_ has joined #zuul17:45
*** tosky has quit IRC17:45
*** tosky_ is now known as tosky17:45
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add autohold-info CLI command  https://review.opendev.org/66248718:00
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Record held node IDs with autohold request  https://review.opendev.org/66249818:00
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests  https://review.opendev.org/66376218:00
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold  https://review.opendev.org/66406018:00
openstackgerritDavid Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API  https://review.opendev.org/67905718:00
Shrewsyay bugs18:00
Shrewsmhu: ok, i have tests now. The only one that fails is test_autohold_delete() and that's because of authz failure. How do I do that correctly?18:01
Shrewsmhu: oh! the authz tests are in a different class.18:03
Shrewsyay, it works18:06
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API  https://review.opendev.org/67905718:08
Shrewscorvus: mordred: that should tie up the loose ends of the autohold revamp stuff and should be gtg now ^^^18:14
clarkbzuulians I'm wondering if we should consider a release this week to fix that "zuul tests the wrong commit" bug for people consuming releases?18:19
clarkbI'm deploying that fix on opendev now so we should know if it is working18:19
clarkbor at least doesn't regress further18:19
clarkb(the conditions under which it happens are somewhat specific)18:20
*** michael-beaver has joined #zuul18:23
Shrewsthat bug merges things into the wrong branch, yeah? If so, then yeah, a release sounds advisable18:23
*** igordc has quit IRC18:23
clarkbShrews: it causes zuul to checkout the wrong branch in the jobs18:23
clarkbso the jobs run against the wrong commit if they trip over the bug18:23
Shrewsah18:24
clarkbI think the correct commits are actually there too18:24
Shrewsstill seems worthy18:24
clarkbya18:24
*** armstrongs has joined #zuul18:36
*** jamesmcarthur has quit IRC18:43
*** armstrongs has quit IRC18:45
*** tosky has quit IRC18:56
openstackgerritClark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28  https://review.opendev.org/67911619:01
openstackgerritClark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28  https://review.opendev.org/67911619:06
openstackgerritClark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28  https://review.opendev.org/67911619:15
clarkbtristanC: ^ can you review that change19:17
EmilienMhi there, what is the "2. attempt" thing in zuul?19:18
EmilienM(I probably missed the feature announcement)19:18
EmilienMis it like an auto-recheck or?19:19
openstackgerritRonelle Landy proposed zuul/zuul-jobs master: Only use RHEL8 deps repo on Red Hat systems newer than 7  https://review.opendev.org/67912619:19
clarkbEmilienM: there are two major cases for it: either the job fails in a pre-run playbook and is restarted, or zuul identifies the failure as something external to the job and retries it19:20
clarkbEmilienM: in this case I've restarted all of the zuul executors which kills the jobs running on the executor that was stopped and reschedules them to another19:20
EmilienMin case #2, where is the list of known issues?19:20
clarkb(this was to update the deployment of our executors)19:20
fungialso not a new feature, but the fact that we're surfacing it in the builds dashboard is new19:21
clarkbEmilienM: I don't think there is a list as much as "this exit code from ansible means it has a network failure" type of deal19:21
clarkb+ gearman worker went away19:21
EmilienMok19:21
EmilienMthanks!19:21
fungiEmilienM: when you see a build result of RETRY_LIMIT that means that zuul saw failures it thought meant it should abort and requeue the build, but tried that repeatedly and finally gave up19:22
EmilienMit makes sense19:23
EmilienMnicely done!19:23
clarkbya zuul has done this since the jenkins days19:25
clarkbit's just always been a bit transparent to people unless they hit retry limits19:25
fungibecause: clouds (and the internet)19:25
fungistuff has a tendency to just spontaneously go away and never come back19:26
fungibuilding a castle on a foundation of sand19:26
clarkbI want to say the jenkins behavior of losing its ssh connection and not trying to reconnect but instead simply failing is what precipitated the feature19:27
EmilienMfungi: right i just didn't see it before in the UI19:27
mnaseruh19:40
mnaserno bucket sharding is happening for periodic jobs uploaded to swift btw19:40
mnaserex: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_periodic/opendev.org/openstack/operations-guide/master/propose-translation-update/ba10bde/19:40
mnaserthis has contributed to doing really bad things in our swift :\19:40
clarkbbecause one container is larger than the others?19:47
timburkemnaser, out of curiosity, roughly how many objects are in the container?19:49
clarkbtimburke: note mnasers cloud is ceph not swift19:49
mnaser^19:49
clarkbhavent heard complaints from the swift clouds19:50
mnaserclarkb: yes, and rados eventually needs to reshard buckets automatically19:50
timburke👍19:50
timburkestill, just curious :-)19:50
mnaserso it hits a limit then starts a reshard which takes forever19:50
mnaserlet me check19:50
clarkbtimburke: ya me too19:50
mnaserthe stats is sitting around for a while..19:52
clarkbmnaser: the way the prefix sharding works is it takes the zuul log path eg: logs/periodic/opendev.org and logs/68/678968/3/check and replaces the first / with a _ and that component becomes the container name20:05
clarkbso it is sharding the periodic logs, it is just sharding them into the same container20:05
mnaseroh hm20:06
mnaserright20:06
clarkbwe could change the zuul log path for periodic jobs to include their day of the month maybe?20:06
clarkbeg logs/periodic_$DoM/opendev.org20:06
clarkbthen you'd get 31 periodic job shards20:06
clarkbthere may be other methods that would work better?20:07
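
To make the scheme clarkb describes concrete, here is a small sketch (not the actual upload-logs-swift role code) of how the container name falls out of the log path, plus the day-of-month variant proposed above; the helper names are hypothetical.

    import datetime

    def split_log_path(log_path):
        """logs/68/678968/3/check -> ('logs_68', '678968/3/check')."""
        parts = log_path.split('/')
        container = '_'.join(parts[:2])   # periodic jobs all collapse to 'logs_periodic'
        obj = '/'.join(parts[2:])
        return container, obj

    def periodic_prefix_by_day(now=None):
        """Hypothetical variant: spread periodic logs over up to 31 containers."""
        now = now or datetime.datetime.utcnow()
        return 'logs/periodic_{:02d}'.format(now.day)
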
clarkblet me figure out how to push that up so we have a change we can poke at at least20:10
timburkemight produce a bit of a hot container -- surely no worse than we've got now, but you might want to consider using seconds instead of day. plus you'd get about twice as many20:15
openstackgerritClark Boylan proposed zuul/zuul-jobs master: WIP: Add day of month to periodic logs for swift sharding  https://review.opendev.org/67913520:16
clarkbtimburke: that's a good point. I avoided hour and minute because we launch the periodic jobs at the same time20:17
clarkbbut seconds should give us enough variance there20:17
openstackgerritClark Boylan proposed zuul/zuul-jobs master: WIP: Add current date seconds to periodic logs for swift sharding  https://review.opendev.org/67913520:18
clarkbtimburke: mnaser ^ there20:18
clarkbfrom the user standpoint my big concern is that people not using swift for logs may rely on webserver indexes that sort by name to present periodic jobs. I'm not sure if we want to assume the zuul dashboard is the primary consumption point for this stuff yet20:19
clarkb(fwiw I think it should be the primary point as it adds a bunch of functionality but we may not quite be there yet)20:19
clarkbin any case this may be good enough while people transition. I'll let others chime in20:19
timburkeyeah, that's a fair point. might be a point in favor of day, as it would have *some* sort of useful-ish meaning for a human20:22
clarkbtristanC: ^ you may have thoughts on that, will it be bad for softwarefactory for example20:41
clarkbpabelanger: ^ you too20:41
*** jamesmcarthur has joined #zuul20:44
fungii suppose we could reorder the path so that the build id comes first, which should be fairly entropic20:47
clarkbI think that becomes even harder for people to navigate without the dashboard though20:48
fungiis there a good reason that's a bad idea? (i assume it is or someone would have already suggested it as an obvious option)20:48
clarkbya it's the problem of people browsing a webserver root like logs.openstack.org/20:48
clarkbif you are looking for periodic jobs finding them would be hard if it was just build uuids20:48
clarkb(granted digging through 60 different dirs isn't easy either)20:49
fungiyeah, i see that as no worse than the lets-inject-seconds plan20:49
fungiand it would allow us to keep the paths no longer than they currently are20:49
clarkbwe'd probably go to full uuids so they would be longer20:50
clarkbto avoid collisions20:51
clarkbcurrently we avoid them by being at the end of the path20:51
clarkbso 7 chars is enough20:51
fungiwhy would collisions become a problem if they're not already with the parameters reordered?20:51
clarkbbecause right now they are uniquely identified by branch project name and job name and pipeline20:52
fungii don't propose we change that20:52
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job  https://review.opendev.org/67908220:53
fungibut you could just as uniquely identify them by build id, branch, project name, job name, and pipeline20:53
fungias by branch, project name, job name, pipeline, and build id20:53
clarkbthat gets weird to me if you go to the first dir and find multiple entries for the same id20:53
fungiare people going to the first dir now?20:53
clarkbyes that is what makes periodic special20:54
clarkbthe way you navigate periodic jobs on the old style log server is to go to the root, sort by date, and find what you want20:54
clarkbthis is why we reverted the swift logs stuff forever ago the first time20:54
fungii thought the upshot of object storage was that we started requiring folks to rely on the zuul dashboard as an index to the logs20:54
clarkbbecause we neglected the periodic jobs use case20:54
clarkbfungi: yes that is the question I had above, can we currently expect people to use the dashboard20:55
fungibecause we no longer guarantee that the log urls are predictable20:55
clarkbfungi: because this is a change to zuul-jobs it affects more than opendev20:55
fungioh, that ordering is hard-coded, not parameterized? hrm...20:55
clarkbfor opendev I have no problem changing those urls to arbitrary random strings20:55
clarkbbeause the dashboard is the primary consumption point20:55
fungiseems like it could be solved by templating the parameters and their order20:56
fungifwiw, i think injecting seconds early in the path creates basically the same problem20:56
clarkbya day of month sort of avoids it if you know the scheme20:57
clarkbwhich is ps120:57
fungiday of month still creates load clustering, i expect. we end up slamming the same shard with writes over the course of a day20:58
clarkbfungi: ya that was timburke's point20:58
clarkbhowever if the problem is total count of objects this may help20:58
clarkbmnaser: ^ would probably need to weigh in on that aspect of it20:58
fungiright, i'd defer to someone operating an impacted storage environment on that20:59
*** jamesmcarthur has quit IRC20:59
clarkbother options include using a fork of that role in opendev/base-jobs or similar for a bit and change it to whatever21:02
clarkbsince we have the dashboard it is safe ofr us21:02
clarkbor just accept that periodic logs via old webserver isn't going to be nice21:02
clarkband change it to whatever in zuul-jobs21:02
fungii do think if we made the path component ordering configurable, it would allow opendev to do something like build-id first and get better path-indicated sharding on platforms which do that sort of thing21:03
fungieven just 7 hex digits allows for a fairly insane number of shards21:04
fungi268 billion21:04
clarkbwhich might be a problem itself21:05
clarkbwe probably only want to do 2 or 3 digits that way21:05
clarkbto avoid creating too many containers21:05
fungi<sagan>billions and billions...</sagan>21:05
fungioh, that's container names?21:05
clarkbyes21:06
clarkbthe way it creates the container name is to take the first two components of the path and to combine them21:06
fungiin that case we could just prefix with 2 hex digits truncated from the build-id i guess21:06
clarkblogs/68/54368/3 -> container logs_68 with object 54368/3/...21:06
clarkbthe problem with periodic jobs is that it becomes logs_periodic for all periodic jobs21:07
fungifor some reason i thought it as that ceph/radosgw was using the first two path components to decide on the sharding21:07
clarkball other jobs either have a change number or ref prefix21:07
clarkbfungi: no we are deciding that in our swift upload role21:07
fungiyeah, if we're creating containers based on those, i agree even distribution over 268 billion possibilities ultimately probably means that over the course of the month we have roughly as many containers as builds21:08
clarkband then ceph is sharding within the container aiui21:08
clarkband that becomes a problem when a single container has too many objects?21:08
clarkbthat was my understanding of what mnaser said21:09
clarkbso if we divide the object count by 31 or 60 maybe we reduce the object count sufficiently to not be a problem21:09
fungiif we use a 2-hex-digit truncation then we top out at 256 possibilities which is a more reasonable container count probably21:10
clarkbyup, but it would be shared across all builds not just periodic builds (that should more evenly distribute the objects, which is a good thing)21:10
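
A sketch of the truncated-build-id alternative fungi suggests: two hex digits of the build UUID cap the container count at 256 (three would give 4096). The function name and the digit-count parameter are just illustrative assumptions.

    import uuid

    def container_for_build(build_uuid=None, digits=2):
        """16 ** digits possible containers: 16, 256, 4096, ..."""
        build_uuid = build_uuid or uuid.uuid4().hex
        return 'logs_{}'.format(build_uuid[:digits])
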
clarkbmnaser: ^ if you get a chance your input on what would help your ceph install would be valuable for figuring out the next step here21:19
fungiwhat i like about the truncated build uuid in the container name is that we get a clear upper bound on container count in each provider that way21:20
fungithough the quantization jump in either direction is to go to 16 or 4096 which might be extreme21:21
clarkbmy guess is 4096 is probably fine but 16 too small21:22
clarkbswift should be able to handle thousands of containers21:22
fungiso of those three options (16, 256, 4096) the middle one seems the most reasonable21:22
fungiand yeah, 4096 may be as well21:22
fungi65k containers, the next jump past 4k, is probably not21:23
clarkbya21:23
fungigranted, we *could* reencode and then truncate the build-id in whatever base we want, so sticking to powers of 16 is not absolutely necessary either21:24
fungibut as much as i might like to spend the rest of my afternoon on modular arithmetic, i probably need to get to mowing the lawn at some point21:25
tristanCclarkb: we do have users relying on $logserver/periodic/ to collect periodic logs, but we can break that by documenting zuul builds interface21:37
tristanCclarkb: or perhaps this new behavior can be toggled by a set-zuul-log-path-fact attribute?21:37
clarkbtristanC: ya that is what I'm working on now to make it a toggle21:38
tristanCusing an unified build-id based path sounds like a good idea, and we would likely enable this by default if it's optional21:39
tristanCour prune log script does take into account the different log path scheme, i'd be happy to make it more simple21:40
fungii do feel like that path is a bit of an api contract we shouldn't break without warning, but thankfully there are options to allow it to continue working as-is by default21:40
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Add option for object store friendly log paths  https://review.opendev.org/67914521:40
clarkbI've only removed the previous sharding prefixes and kept the rest of the paths as is21:41
clarkbthere is some useful info in there about the job and change that helps people when sharing urls, and I don't think that should go away21:41
fungithe main argument i see for sharding by date is that storage schemes which want to expire old logs can far more easily prune old paths that way21:41
fungiif we were stuck doing opendev on attached storage, i would have advocated for something like logical volume per day mounted at those subtrees, and then we could just umount and lvremove them at expiration21:43
fungiwhich would have been trivial compared to the days-long find -exec rm cronjobs we ran21:43
fungiof course, logging to swift, we can just set expiration times at the object level and forget about it21:44
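
As an aside on the per-object expiration fungi mentions, X-Delete-After is a standard Swift header; this python-swiftclient snippet is only an illustration, and the endpoint, credentials and object paths are placeholders.

    import swiftclient.client

    conn = swiftclient.client.Connection(authurl='https://swift.example.com/auth/v1.0',
                                         user='loguploader', key='secret')
    conn.put_object('logs_68', '678968/3/check/job-output.txt',
                    contents=open('job-output.txt', 'rb'),
                    headers={'X-Delete-After': str(30 * 24 * 3600)})  # expire in 30 days
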
pabelangerclarkb: I think we'd be okay with the change, most humans use builds UI to fetch periodic jobs in swift21:46
pabelangerin fact, we'd love to iterate on http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-June/000961.html for logs21:47
clarkbpabelanger: filtering does exist for start and end iirc21:48
clarkbbut it may not be exposed with a filter option in the list21:48
pabelangeryah, can't remember off the top of my head the issue there. But we'd want to be able to create weekly reports, and filter specific periodic jobs in that range21:50
clarkbyup I think that is doable today you just have to know what the parameter names are /me double checks21:51
clarkbwhich admittedly should be made easier21:51
*** jeliu_ has quit IRC21:52
clarkbah nope it's offset and limit that I'm thinking of, so it's the pagination problem21:52
clarkbno time bounding currently21:53
clarkbhrm manually setting skip and limit doesn't seem to work21:59
clarkbdid react change that?21:59
tristanCclarkb: the webui doesn't know about skip or limit filters, only the json endpoints interpret those22:03
clarkbI see22:03
clarkbseems like before you could just manipulate the builds url and it worked22:03
clarkbbut I guess that only works now if talking to the api directly?22:03
tristanCperhaps the old angular code did forward the query args22:04
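
For reference, querying the builds JSON endpoint directly with the pagination parameters discussed above might look like this; the host, tenant and field names are placeholders, and the exact query parameters (limit, skip) should be checked against your Zuul version.

    import requests

    resp = requests.get(
        'https://zuul.example.org/api/tenant/example/builds',
        params={'pipeline': 'periodic', 'job_name': 'propose-translation-update',
                'limit': 50, 'skip': 0})
    for build in resp.json():
        print(build.get('uuid'), build.get('result'), build.get('end_time'))
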
openstackgerritMerged zuul/nodepool master: Use fedora-29 instead of fedora-28  https://review.opendev.org/67911622:06
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Add option for object store friendly log paths  https://review.opendev.org/67914522:13
clarkbianw: ^ not much shorter than your suggestion but is namespaced now22:13
*** armstrongs has joined #zuul22:44
*** armstrongs has quit IRC22:48
ianwclarkb: did you want to go with it and run some test jobs?  not sure how urgent it is23:20
clarkbI think it isn't super urgent because we pulled vexxhost out already23:21
clarkbnext step may be to have mnaser confirm it should help, then work on testing23:22
clarkbno rush23:22
*** rfolco has quit IRC23:29
*** rlandy has quit IRC23:43
