*** noorul has joined #zuul | 00:15 | |
*** noorul has quit IRC | 00:28 | |
*** noorul has joined #zuul | 00:30 | |
*** noorul has quit IRC | 00:40 | |
*** noorul has joined #zuul | 00:40 | |
*** noorul has quit IRC | 00:55 | |
*** wxy-xiyuan has joined #zuul | 01:08 | |
*** jamesmcarthur has joined #zuul | 01:22 | |
*** bhavikdbavishi has joined #zuul | 01:30 | |
*** noorul has joined #zuul | 01:32 | |
*** noorul has quit IRC | 01:38 | |
*** threestrands has joined #zuul | 01:39 | |
*** bhavikdbavishi has quit IRC | 01:52 | |
*** jamesmcarthur has quit IRC | 01:54 | |
*** noorul has joined #zuul | 01:54 | |
*** noorul has quit IRC | 01:59 | |
openstackgerrit | Jeff Liu proposed zuul/zuul-operator master: Add PerconaXDB Cluster to Zuul-Operator https://review.opendev.org/677315 | 02:07 |
*** spsurya has joined #zuul | 02:11 | |
*** noorul has joined #zuul | 02:15 | |
*** noorul has quit IRC | 02:20 | |
*** noorul has joined #zuul | 02:23 | |
*** noorul has quit IRC | 02:31 | |
*** noorul has joined #zuul | 02:36 | |
*** bhavikdbavishi has joined #zuul | 03:14 | |
*** bhavikdbavishi1 has joined #zuul | 03:19 | |
*** bhavikdbavishi has quit IRC | 03:20 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:20 | |
*** ianychoi has quit IRC | 03:29 | |
*** ianychoi has joined #zuul | 03:30 | |
*** rfolco has quit IRC | 03:32 | |
*** noorul has quit IRC | 03:50 | |
*** noorul has joined #zuul | 04:22 | |
*** raukadah is now known as chkumar|rover | 04:47 | |
*** bjackman has joined #zuul | 05:03 | |
*** noorul has quit IRC | 05:06 | |
*** jhesketh has quit IRC | 05:34 | |
*** jhesketh has joined #zuul | 05:42 | |
openstackgerrit | Merged zuul/zuul master: web: test trailing slash are removed from renderTree https://review.opendev.org/676824 | 06:14 |
*** sanjayu_ has joined #zuul | 06:16 | |
openstackgerrit | Benedikt Löffler proposed zuul/zuul master: Report retried builds via sql reporter. https://review.opendev.org/633501 | 06:18 |
*** sanjayu_ has quit IRC | 06:50 | |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Add linter rule disallowing use of var https://review.opendev.org/673841 | 07:08 |
yoctozepto | hey folks, any thoughts on: https://review.opendev.org/678273 ? please review before it rots :-) | 07:10 |
*** themroc has joined #zuul | 07:34 | |
*** jpena|off is now known as jpena | 07:40 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint https://review.opendev.org/641099 | 07:56 |
*** mhu has joined #zuul | 08:01 | |
*** jangutter has joined #zuul | 08:03 | |
*** yolanda__ is now known as yolanda | 08:30 | |
*** sanjayu_ has joined #zuul | 09:09 | |
*** sshnaidm|afk is now known as sshnaidm | 10:02 | |
tobiash | corvus: do you want to have a look at 678895 (the ref fix)? Or shall we +a it? It has +2 from me and tristanC. | 10:44 |
*** hashar has joined #zuul | 11:22 | |
corvus | tobiash: +3 thx | 11:29 |
*** hashar has quit IRC | 11:33 | |
*** hashar has joined #zuul | 11:33 | |
*** jpena is now known as jpena|lunch | 11:35 | |
*** rlandy has joined #zuul | 11:52 | |
*** rlandy is now known as rlandy|ruck | 11:53 | |
*** rfolco has joined #zuul | 12:15 | |
*** rlandy|ruck is now known as rlandy|ruck|mtg | 12:19 | |
openstackgerrit | Merged zuul/zuul master: Check refs and revs for repo needing updates https://review.opendev.org/678895 | 12:21 |
*** jamesmcarthur has joined #zuul | 12:22 | |
*** jamesmcarthur has quit IRC | 12:29 | |
*** jpena|lunch is now known as jpena | 12:32 | |
openstackgerrit | Merged zuul/zuul master: Add linter rule disallowing use of var https://review.opendev.org/673841 | 12:35 |
*** jamesmcarthur has joined #zuul | 12:48 | |
*** bjackman has quit IRC | 12:49 | |
*** bjackman has joined #zuul | 12:52 | |
*** dkehn has quit IRC | 12:53 | |
*** nhicher has joined #zuul | 13:21 | |
mhu | Shrews, no prob, happy to be of help | 13:30 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 13:32 |
Shrews | mhu: am I on the right track with that? ^^^^ | 13:32 |
mhu | Shrews, I'm in meetings right now but I'll have a look ASAP | 13:32 |
Shrews | mhu: sure, no hurry. thx | 13:33 |
*** sanjayu_ has quit IRC | 13:34 | |
*** bjackman has quit IRC | 13:34 | |
*** sanjayu_ has joined #zuul | 13:34 | |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 13:39 |
tobiash | are opendev's executors backed by ssds? | 13:51 |
*** bhavikdbavishi has quit IRC | 13:59 | |
*** bjackman has joined #zuul | 14:04 | |
*** dkehn_ has joined #zuul | 14:14 | |
*** sshnaidm has quit IRC | 14:17 | |
*** brennen is now known as brennen|afk | 14:18 | |
*** sshnaidm has joined #zuul | 14:19 | |
*** openstackgerrit has quit IRC | 14:22 | |
tristanC | Shrews: it seems like changes to the diskimage structs (e.g. adding a python-path) are not picked up by providers, which still register the previous python-path. It seems like we need to restart the launcher process | 14:23 |
fungi | tobiash: i don't believe so, looks like they're on whatever rackspace's default is for the rootfs and the ephemeral disk where we mount /var/lib/zuul | 14:23 |
fungi | presumably "spinning rust" (via sata) | 14:23 |
tristanC | Shrews: any idea how to make provider reload diskimage definition when they change? | 14:24 |
tobiash | fungi: I was wondering as your executors seem to perform much better than ours (which are ceph backed) | 14:24 |
fungi | tobiash: just a sec and i'll get more specifics | 14:24 |
tobiash | but we're currently in process of moving them to nvme disks | 14:24 |
tobiash | tristanC: it should reload automatically afaik | 14:25 |
tobiash | if not that seems like a bug | 14:26 |
fungi | tobiash: we've booted them from rackspace's "8 GB Performance" flavor in their dfw region, and this is not one of their special "ssd" flavors | 14:26 |
tobiash | fungi: ah thanks | 14:26 |
tristanC | tobiash: IIUC the openstack.config module keeps a copy of the diskimage but doesn't check for changes (the diskimages list is global vs the openstack provider's local list) | 14:26 |
tobiash | tristanC: hrm, we should probably fix this | 14:27 |
fungi | tobiash: additional details, they're using ubuntu bionic (18.04.2 LTS) with linux 4.15.0-46-generic and the filesystems are formatted ext4 | 14:27 |
tobiash | fungi: cool, thanks! | 14:27 |
fungi | please ask if you want any other details. i don't think anything besides the passwords is meant to be particularly secret ;) | 14:28 |
tristanC | Shrews: tobiash: oh my bad, the python-path change requires a new image upload | 14:28 |
*** jamesmcarthur has quit IRC | 14:29 | |
*** jamesmcarthur has joined #zuul | 14:29 | |
tobiash | fungi: thanks, I was interested mainly in io performance characteristics | 14:29 |
*** jeliu_ has joined #zuul | 14:29 | |
fungi | our cacti graphs should show some i/o metrics if you haven't looked yet | 14:30 |
tobiash | fungi: so your executors average about 4k iops | 14:32 |
fungi | neat, i hadn't looked but that sounds like a lot | 14:33 |
tobiash | sounds like ssd ;) | 14:33 |
tobiash | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=64197&rra_id=0&view_type=tree&graph_start=1566916407&graph_end=1567002807 for reference | 14:34 |
*** amotoki_ has quit IRC | 14:34 | |
*** amotoki has joined #zuul | 14:35 | |
*** jamesmcarthur has quit IRC | 14:42 | |
*** openstackgerrit has joined #zuul | 14:47 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Switch to fetch-sphinx-tarball for tox-docs https://review.opendev.org/676430 | 14:47 |
*** jamesmcarthur has joined #zuul | 14:49 | |
fungi | tobiash: one thing we've observed is that the storage in a lot of public providers uses write-back caching instead of write-through, so we could simply be seeing numbers reflecting writes to memory there | 14:57 |
*** igordc has joined #zuul | 14:57 | |
*** bjackman has quit IRC | 15:00 | |
*** jamesmcarthur has quit IRC | 15:02 | |
tobiash | fungi: according to flavor description it seems to be (undefined) ssd: https://developer.rackspace.com/docs/cloud-servers/v2/general-api-info/flavors/ | 15:04 |
*** themroc has quit IRC | 15:04 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job https://review.opendev.org/679082 | 15:06 |
fungi | huh, neat. maybe it's just their default block storage which is on sata then, i know you have to request a special flavor of that to get ssd | 15:06 |
fungi | (or at least you used to, maybe they've upgraded all their storage?) | 15:07 |
clarkb | note /var/lib/zuul is the ephemeral device | 15:08 |
clarkb | which may be different hardware than the root disk | 15:08 |
fungi | yep | 15:08 |
fungi | i haven't seen where they indicate what hardware serves their ephemeral disks | 15:08 |
mhu | Shrews: I've commented on the review, you're close :) Next I'll help with the tests if you'd like | 15:11 |
Shrews | mhu: trying to figure out the tests now :) | 15:11 |
mhu | Shrews, I think the existing ones for autoholding from the REST API should be a good starting point | 15:12 |
mhu | although they require auth | 15:12 |
*** jamesmcarthur has joined #zuul | 15:13 | |
Shrews | mhu: what portion is the "boilerplate authentication/authorization code" ? | 15:14 |
Shrews | mhu: the part beginning with: rawToken = cherrypy.request.headers['Authorization'][len('Bearer '):] ? | 15:15 |
mhu | Shrews, from "basic_error ..." | 15:16 |
mhu | https://opendev.org/zuul/zuul/src/branch/master/zuul/web/__init__.py#L244 to https://opendev.org/zuul/zuul/src/branch/master/zuul/web/__init__.py#L262 for example | 15:17 |
*** jamesmcarthur has quit IRC | 15:18 | |
mhu | this would be better factored in a single method ... I thought of having this as a decorator but was advised against it | 15:18 |
*** tosky has joined #zuul | 15:21 | |
Shrews | mhu: that portion of code calls is_authorized() which requires a tenant parameter. How does that affect your suggestion to not use tenant in the api url? | 15:22 |
mhu | Shrews, IIUC autohold-info gets you the tenant info for the autohold request | 15:23 |
mhu | so I'd suggest calling autohold-info first, fetch the tenant, call is_authorized, then proceed | 15:24 |
Shrews | ok | 15:24 |
mhu | this way you can also catch errors when the request id is incorrect and return a 404 | 15:24 |
tosky | anyone up for re-reviewing https://review.opendev.org/#/c/674334/ ? | 15:29 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 15:31 |
*** chkumar|rover is now known as raukadah | 15:31 | |
Shrews | mhu: does that mean you'd suggest i use 404 instead of 500 within _autohold_info() if the rpc call fails? | 15:33 |
Shrews | i just copy-pasted that code from elsewhere | 15:33 |
mhu | Shrews: that depends on the type of error | 15:37 |
mhu | 4XX HTTP statuses are generally used for user-induced errors | 15:37 |
mhu | for example 404 (Not Found) is an adequate return code if the request ID is incorrect | 15:38 |
mhu | 401 means Unauthorized, ie the user needs more privileges in order to perform an action | 15:38 |
*** sanjayu_ has quit IRC | 15:38 | |
mhu | 500 is a catch-all code for server-side errors | 15:39 |
*** sanjayu_ has joined #zuul | 15:39 | |
mhu | so in your case 500 is correct, just not very informative | 15:39 |
mhu | but maybe the RPC itself won't give much info | 15:39 |
Shrews | oh, i think i need to check for an empty dict and THEN return 404 | 15:40 |
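The status-code logic mhu and Shrews settle on above can be sketched as follows. All names here (`HTTPError`, `get_autohold`, `is_authorized`) are illustrative stand-ins, not Zuul's actual web code: 404 when the autohold request id is unknown, 401 when the user lacks privileges, 500 left for genuine server-side failures.

```python
# Hypothetical sketch of the autohold-info/delete flow discussed above.
# Not Zuul's real API: get_autohold, delete_autohold, and is_authorized
# are stand-ins for the RPC calls and authz check in zuul-web.

class HTTPError(Exception):
    def __init__(self, status):
        self.status = status

def autohold_info(rpc, request_id):
    info = rpc.get_autohold(request_id)  # assumed to return {} if not found
    if not info:
        raise HTTPError(404)             # user error: bad request id
    return info

def autohold_delete(rpc, request_id, user, is_authorized):
    # As mhu suggests: fetch the autohold info first to learn the tenant,
    # authorize against that tenant, then proceed with the delete.
    info = autohold_info(rpc, request_id)
    if not is_authorized(user, info["tenant"]):
        raise HTTPError(401)
    rpc.delete_autohold(request_id)
```

This also naturally yields the 404-on-empty-dict behavior Shrews mentions, since a missing request surfaces before the authorization step.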
*** rlandy|ruck|mtg is now known as rlandy | 15:43 | |
mhu | Shrews, yep | 15:44 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 15:44 |
Shrews | that should do it ^^ | 15:44 |
*** rlandy is now known as rlandy|brb | 15:46 | |
*** panda is now known as panda|rover | 15:47 | |
*** sshnaidm is now known as sshnaidm|afk | 15:49 | |
*** noorul has joined #zuul | 16:02 | |
*** jpena is now known as jpena|off | 16:05 | |
noorul | How does the log collection work in Zuul? | 16:07 |
*** rlandy|brb is now known as rlandy | 16:07 | |
noorul | Where is it actually stored? | 16:07 |
clarkb | noorul: whereever you configure it is the short answer. There are roles to upload logs to openstack swift storage locations (what we currently use) as well as rsync onto filesystems which you can serve with a webserver (what we used previously) | 16:10 |
noorul | clar | 16:11 |
noorul | clarkb: Actually the script are triggered remotely using Ansible. So the logs are stored on the node where the Ansible runs. Is this the executor? | 16:12 |
clarkb | noorul: there is a bit of coordination between the executor and test nodes to make this work. Basically there are log collection roles that pull logs onto the executor from the test nodes, then roles which publish those logs from the executor to say swift or a fileserver | 16:15 |
clarkb | so there are two steps. Collect logs into publication source dir, run publication role to publish publication source dir | 16:16 |
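A rough sketch of the two-step flow clarkb describes, with hypothetical helper names. The real implementation is a set of Ansible roles in zuul-jobs (fetch/synchronize tasks plus an upload role), not Python; this just illustrates the shape: phase 1 collects logs from each node into a staging dir on the executor, phase 2 publishes that dir.

```python
import shutil
from pathlib import Path

# Hypothetical two-phase log handling, mirroring the flow above.

def collect_logs(node_log_dirs, staging_dir):
    # Phase 1: gather logs from each test node into a publication
    # source dir on the executor. In a real job this is an ansible
    # synchronize/fetch task against the remote node, not a local copy.
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for node, log_dir in node_log_dirs.items():
        shutil.copytree(log_dir, staging / node)
    return staging

def publish_logs(staging_dir, upload):
    # Phase 2: publish the staged files. upload() stands in for the
    # swift-upload or rsync-to-fileserver role.
    for path in sorted(Path(staging_dir).rglob("*")):
        if path.is_file():
            upload(path)
```

In Zuul terms, both phases belong in the base job's post-run playbooks, which is what clarkb points noorul at below.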
noorul | Am I missing anything in the config http://paste.openstack.org/show/766638/ ? | 16:18 |
clarkb | for logging? no. The logging happens as part of your job config. Typically you will put it in your base job | 16:19 |
noorul | This is the base log http://paste.openstack.org/show/766639/ | 16:20 |
clarkb | for opendev this is the chain of things that gets you logs: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L55 https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/post.yaml#L3-L4 https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/base/post-logs.yaml | 16:21 |
clarkb | the first bit is where the base job includes the playbooks then the first playbook collects logs and the second publishes them | 16:21 |
clarkb | noorul: ya so your post-run playbook(s) should coordinate publishing of logs | 16:24 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job https://review.opendev.org/679082 | 16:27 |
*** hashar has quit IRC | 16:33 | |
mordred | tristanC: ^^ interesting. how are you thinking of using that? | 16:34 |
clarkb | mordred: I'm guessing that will be used to benchmark nodepool nodes | 16:36 |
mhu | clarkb, yep | 16:37 |
mordred | clarkb: ah - yeah. makes sense | 16:37 |
tristanC | mordred: to test nodepool label performance, here is how: https://softwarefactory-project.io/r/#/c/16145/ | 16:37 |
mhu | and more generally cloud providers | 16:37 |
mordred | for some reason I was reading it as a test of nodepool itself which didn't make any sense. that makes much more sense | 16:37 |
clarkb | opendev has used tempest for years for that since it isn't artificial and tends to map to our needs well | 16:37 |
clarkb | it is actually a really good test of cpu and disk and network | 16:38 |
tristanC | clarkb: well, we'd like to know what is causing a difference, e.g. check cpu, memory, io, network, ... | 16:38 |
clarkb | ya phoronix test sutie is likely best if you want to examine specific items rather than a holistic "is this node fast enough to run our jobs" | 16:40 |
mordred | yeah. seems like a good tool in the toolbox | 16:41 |
tristanC | mordred: yeah, we figured that would be a nice addition to zuul-jobs :) | 16:43 |
AJaeger_ | mordred: the two week waiting period for https://review.opendev.org/676430 is over - and I just pushed an update for it to fix the problems I noticed. In case you want to ask for reviews ;) | 16:44 |
*** jeliu_ has quit IRC | 16:45 | |
mordred | tristanC: left a question/comment on it | 16:46 |
*** jamesmcarthur has joined #zuul | 17:00 | |
noorul | If I make changes to the config-projects repo, will they get automatically loaded? | 17:01 |
*** jeliu_ has joined #zuul | 17:06 | |
clarkb | no changes to config projects have to be merged before they take effect | 17:06 |
clarkb | (this is for security reasons you don't want to expose secrets for example) | 17:06 |
noorul | I did not get that | 17:10 |
clarkb | changes to config projects must be merged before they change how zuul operates. This ensures that humans can review the changes prior to implementing them which helps to avoid security problems with privileged info | 17:11 |
noorul | clarkb: I directly pushed to master | 17:12 |
noorul | My main.yaml is here http://paste.openstack.org/show/766643/ | 17:16 |
noorul | I am not seeing all the roles under zuul-jobs under the tenant | 17:16 |
clarkb | if you've pushed directly to master then I would expect zuul to pick it up. However I don't know how the bitbucket driver will handle that case | 17:17 |
tristanC | noorul: iirc, zuul may miss a direct push and thus skip reloading the config | 17:17 |
noorul | tristanC: So a restart of scheduler might help? | 17:18 |
clarkb | tristanC: on Gerrit at least there should be an event for that case | 17:19 |
tristanC | noorul: if no ref-updated or change merged happened in the scheduler log, then restarting the service will force using the latest config | 17:19 |
clarkb | and I would expect zuul to pick it up | 17:19 |
clarkb | (but you'd have to push through gerrit not directly onto disk) | 17:19 |
noorul | clarkb: I am not using Gerrit but Bitbucket server instead | 17:20 |
clarkb | I know, I'm explaining that it should work in the gerrit case but may not in the other driver cases | 17:21 |
noorul | Oh I see | 17:21 |
*** panda|rover is now known as panda|rover|off | 17:23 | |
*** noorul has quit IRC | 17:43 | |
*** tosky_ has joined #zuul | 17:45 | |
*** tosky has quit IRC | 17:45 | |
*** tosky_ is now known as tosky | 17:45 | |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Add autohold-info CLI command https://review.opendev.org/662487 | 18:00 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Record held node IDs with autohold request https://review.opendev.org/662498 | 18:00 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Auto-delete expired autohold requests https://review.opendev.org/663762 | 18:00 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Mark nodes as USED when deleting autohold https://review.opendev.org/664060 | 18:00 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: WIP: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 18:00 |
Shrews | yay bugs | 18:00 |
Shrews | mhu: ok, i have tests now. The only one that fails is test_autohold_delete() and that's because of authz failure. How do I do that correctly? | 18:01 |
Shrews | mhu: oh! the authz tests are in a different class. | 18:03 |
Shrews | yay, it works | 18:06 |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Add autohold delete/info commands to web API https://review.opendev.org/679057 | 18:08 |
Shrews | corvus: mordred: that should tie up the loose ends of the autohold revamp stuff and should be gtg now ^^^ | 18:14 |
clarkb | zuulians I'm wondering if we should consider a release this week to fix that zuul-tests-the-wrong-commit bug for people consuming releases? | 18:19 |
clarkb | I'm deploying that fix on opendev now so we should know if it is working | 18:19 |
clarkb | or at least doesn't regress further | 18:19 |
clarkb | (the conditions under which it happens are somewhat specific) | 18:20 |
*** michael-beaver has joined #zuul | 18:23 | |
Shrews | that bug merges things into the wrong branch, yeah? If so, then yeah, a release sounds advisable | 18:23 |
*** igordc has quit IRC | 18:23 | |
clarkb | Shrews: it causes zuul to checkout the wrong branch in the jobs | 18:23 |
clarkb | so the jobs run against the wrong commit if they trip over the bug | 18:23 |
Shrews | ah | 18:24 |
clarkb | I think the correct commits are actually there too | 18:24 |
Shrews | still seems worthy | 18:24 |
clarkb | ya | 18:24 |
*** armstrongs has joined #zuul | 18:36 | |
*** jamesmcarthur has quit IRC | 18:43 | |
*** armstrongs has quit IRC | 18:45 | |
*** tosky has quit IRC | 18:56 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28 https://review.opendev.org/679116 | 19:01 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28 https://review.opendev.org/679116 | 19:06 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Use fedora-29 instead of fedora-28 https://review.opendev.org/679116 | 19:15 |
clarkb | tristanC: ^ can you review that change | 19:17 |
EmilienM | hi there, what is the "2. attempt" thing in zuul? | 19:18 |
EmilienM | (I probably missed the feature announcement) | 19:18 |
EmilienM | is it like an auto-recheck or? | 19:19 |
openstackgerrit | Ronelle Landy proposed zuul/zuul-jobs master: Only use RHEL8 deps repo on Red Hat systems newer than 7 https://review.opendev.org/679126 | 19:19 |
clarkb | EmilienM: there are two major cases for it: either the job fails in pre-run playbook so is restarted or zuul identifies the failure as something external to the job so retries it | 19:20 |
clarkb | EmilienM: in this case I've restarted all of the zuul executors which kills the jobs running on the executor that was stopped and reschedules them to another | 19:20 |
EmilienM | in case #2,w here is the list of known issues? | 19:20 |
clarkb | (this was to update the deployment of our executors) | 19:20 |
fungi | also not a new feature, but the fact that we're surfacing it in the builds dashboard is new | 19:21 |
clarkb | EmilienM: I don't think there is a list as much as "this exit code from ansible means it has a network failre" type of deal | 19:21 |
clarkb | + gearman worker went away | 19:21 |
EmilienM | ok | 19:21 |
EmilienM | thanks! | 19:21 |
fungi | EmilienM: when you see a build result of RETRY_LIMIT that means that zuul saw failures it thought meant it should abort and requeue the build, but tried that repeatedly and finally gave up | 19:22 |
EmilienM | it makes sense | 19:23 |
EmilienM | nicely done! | 19:23 |
clarkb | ya zuul has done this since the jenkins days | 19:25 |
clarkb | its just always been a bit transparent to people unless they hit retry limits | 19:25 |
fungi | because: clouds (and the internet) | 19:25 |
fungi | stuff has a tendency to just spontaneously go away and never come back | 19:26 |
fungi | building a castle on a foundation of sand | 19:26 |
clarkb | I want to say the jenkins behavior of losing its ssh connection and not trying to reconnect but instead simply failing is what precipitated the feature | 19:27 |
EmilienM | fungi: right i just didn't see it before in the UI | 19:27 |
mnaser | uh | 19:40 |
mnaser | no bucket sharding is happening for periodic jobs uploaded to swift btw | 19:40 |
mnaser | ex: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_periodic/opendev.org/openstack/operations-guide/master/propose-translation-update/ba10bde/ | 19:40 |
mnaser | this has contributed to doing really bad things in our swift :\ | 19:40 |
clarkb | because one container is larger than the others? | 19:47 |
timburke | mnaser, out of curiosity, roughly how many objects are in the container? | 19:49 |
clarkb | timburke: note mnasers cloud is ceph not swift | 19:49 |
mnaser | ^ | 19:49 |
clarkb | havent heard complaints from the swift clouds | 19:50 |
mnaser | clarkb: yes, and rados eventually needs to reshard buckets automatically | 19:50 |
timburke | 👍 | 19:50 |
timburke | still, just curious :-) | 19:50 |
mnaser | so it hits a limit then starts a reshard which takes forevers | 19:50 |
mnaser | let me check | 19:50 |
clarkb | timburke: ya me too | 19:50 |
mnaser | the stats is sitting around for a while.. | 19:52 |
clarkb | mnaser: the way the prefix sharding works is it takes the zuul log path eg: logs/periodic/opendev.org and logs/68/678968/3/check and replaces the first / with a _ and that component becomes the container name | 20:05 |
clarkb | so it is sharding the periodic logs, it is just sharding them into the same container | 20:05 |
mnaser | oh hm | 20:06 |
mnaser | right | 20:06 |
clarkb | we could change the zuul log path for periodic jobs to include their day of the month maybe? | 20:06 |
clarkb | eg logs/periodic_$DoM/opendev.org | 20:06 |
clarkb | then you'd get 31 periodic job shards | 20:06 |
clarkb | there may be other methods that would work better? | 20:07 |
clarkb | let me figure out how to push that up so we have a change we can poke at at least | 20:10 |
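The sharding scheme clarkb describes above (first two path components joined with `_` become the container name), plus the proposed day-of-month variant, might look roughly like this. These are hypothetical helpers; the actual logic lives in the zuul-jobs swift upload role.

```python
# Hypothetical sketch of the container-name scheme described above:
# logs/68/54368/3/...  -> container "logs_68", object "54368/3/..."
# logs/periodic/...    -> container "logs_periodic" (the hot spot)

def shard_container(log_path):
    first, second, rest = log_path.split("/", 2)
    return "%s_%s" % (first, second), rest

def periodic_log_path(day_of_month, rest):
    # clarkb's proposal: inject the day of the month so periodic
    # logs spread over 31 containers instead of one.
    return "logs/periodic_%02d/%s" % (day_of_month, rest)
```

The variant using seconds instead of day (discussed just below) works the same way, only with 60 shards and no human-meaningful prefix.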
timburke | might produce a bit of a hot container -- surely no worse than we've got now, but you might want to consider using seconds instead of day. plus you'd get about twice as many | 20:15 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: WIP: Add day of month to periodic logs for swift sharding https://review.opendev.org/679135 | 20:16 |
clarkb | timburke: thats a good point. I avoided hour and minute because we launch the periodic jobs at the same time | 20:17 |
clarkb | but seconds should give us enough variance there | 20:17 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: WIP: Add current date seconds to periodic logs for swift sharding https://review.opendev.org/679135 | 20:18 |
clarkb | timburke: mnaser ^ there | 20:18 |
clarkb | from the user standpoint my big concern is people not using swift for logs may rely on webserver indexes that sort by name to present periodic jobs. I'm not sure if we want to be able to assume the zuul dashboard is the primary consumption point for this stuff yet | 20:19 |
clarkb | (fwiw I think it should be the primary point as it adds a bunch of functionality but we may not quite be there yet) | 20:19 |
clarkb | in any case this may be good enough while people transition. I'll let others chime in | 20:19 |
timburke | yeah, that's a fair point. might be a point in favor of day, as it would have *some* sort of useful-ish meaning for a human | 20:22 |
clarkb | tristanC: ^ you may have thoughts on that, will it be bad for softwarefactory for example | 20:41 |
clarkb | pabelanger: ^ you too | 20:41 |
*** jamesmcarthur has joined #zuul | 20:44 | |
fungi | i suppose we could reorder the path so that the build id comes first, which should be fairly entropic | 20:47 |
clarkb | I think that becomes even harder for people to navigate without the dashboard though | 20:48 |
fungi | is there a good reason that's a bad idea? (i assume it is or someone would have already suggested it as an obvious option) | 20:48 |
clarkb | ya its the hitting logs.openstack.org/ type webserver root problem | 20:48 |
clarkb | if you are looking for periodic jobs finding them would be hard if it was just build uuids | 20:48 |
clarkb | (granted digging through 60 different dirs isn't easy either) | 20:49 |
fungi | yeah, i see that as no worse than the lets-inject-seconds plan | 20:49 |
fungi | and it would allow us to keep the paths no longer than they currently are | 20:49 |
clarkb | we'd probably go to full uuids so they would be longer | 20:50 |
clarkb | to avoid collisions | 20:51 |
clarkb | currently we avoid them by being at the end of the path | 20:51 |
clarkb | so 7 chars is enough | 20:51 |
fungi | why would collisions become a problem if they're not already with the parameters reordered? | 20:51 |
clarkb | because right now they are uniquely identified by branch project name and job name and pipeline | 20:52 |
fungi | i don't propose we change that | 20:52 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: Add phoronix-test-suite job https://review.opendev.org/679082 | 20:53 |
fungi | but you could just as uniquely identify them by build id, branch, project name, job name, and pipeline | 20:53 |
fungi | as by branch, project name, job name, pipeline, and build id | 20:53 |
clarkb | that gets weird to me if you go to the first dir and find multiple entries for the same id | 20:53 |
fungi | are people going to the first dir now? | 20:53 |
clarkb | yes that is what makes periodic special | 20:54 |
clarkb | the way you navigate periodic jobs on the old style log server is to go to the root, sort by date, and find wat you want | 20:54 |
clarkb | this is why we reverted the swift logs stuff forever ago the first time | 20:54 |
fungi | i thought the upshot of object storage was that we started requiring folks to rely on the zuul dashboard as an index to the logs | 20:54 |
clarkb | because we neglected the periodic jobs use case | 20:54 |
clarkb | fungi: yes that is the question I had above, can we currently expect people to use the dashboard | 20:55 |
fungi | because we no longer guarantee that the log urls are predictable | 20:55 |
clarkb | fungi: because this is a change to zuul-jobs it affects more than opendev | 20:55 |
fungi | oh, that ordering is hard-coded, not parameterized? hrm... | 20:55 |
clarkb | for opendev I have no problem changing those urls to arbitrary random strings | 20:55 |
clarkb | beause the dashboard is the primary consumption point | 20:55 |
fungi | seems like it could be solved by templating the parameters and their order | 20:56 |
fungi | fwiw, i think injecting seconds early in the path creates basically the same problem | 20:56 |
clarkb | ya day of month sort of avoids it if you know the scheme | 20:57 |
clarkb | which is ps1 | 20:57 |
fungi | day of month still creates load clustering, i expect. we end up slamming the same shard with writes over the course of a day | 20:58 |
clarkb | fungi: ya that was timburke's point | 20:58 |
clarkb | however if the problem is total count of objects this may help | 20:58 |
clarkb | mnaser: ^ would probably need to weigh in on that aspect of it | 20:58 |
fungi | right, i'd defer to someone operating an impacted storage environment on that | 20:59 |
*** jamesmcarthur has quit IRC | 20:59 | |
clarkb | other options include using a fork of that role in opendev/base-jobs or similar for a bit and change it to whatever | 21:02 |
clarkb | since we have the dashboard it is safe ofr us | 21:02 |
clarkb | or just accept that periodic logs via old webserver isn't going to be nice | 21:02 |
clarkb | and change it to whatever in zuul-jobs | 21:02 |
fungi | i do think if we made the path component ordering configurable, it would allow opendev to do something like build-id first and get better path-indicated sharding on platforms which do that sort of thing | 21:03 |
fungi | even just 7 hex digits allows for a fairly insane number of shards | 21:04 |
fungi | 268 million | 21:04 |
clarkb | which might be a problem itself | 21:05 |
clarkb | we probably only want to do 2 or 3 digits that way | 21:05 |
clarkb | to avoid creating too many containers | 21:05 |
fungi | <sagan>billions and billions...</sagan> | 21:05 |
fungi | oh, that's container names? | 21:05 |
clarkb | yes | 21:06 |
clarkb | the way it creates the container name is to take the first two components of the path and to combine them | 21:06 |
fungi | in that case we could just prefix with 2 hex digits truncated from the build-id i guess | 21:06 |
clarkb | logs/68/54368/3 -> container logs_68 with object 54368/3/... | 21:06 |
clarkb | the problem with periodic jobs is that becomes logs_periodic for all periodic jobs | 21:07 |
fungi | for some reason i thought it as that ceph/radosgw was using the first two path components to decide on the sharding | 21:07 |
clarkb | all other jobs either have a change number or ref prefix | 21:07 |
clarkb | fungi: no we are deciding that in our swift upload role | 21:07 |
fungi | yeah, if we're creating containers based on those, i agree even distribution over 268 million possibilities ultimately probably means that over the course of the month we have roughly as many containers as builds | 21:08 |
clarkb | and then ceph is sharding within the container, AIUI | 21:08 |
clarkb | and that becomes a problem when a single container has too many objects? | 21:08 |
clarkb | that was my understanding of what mnaser said | 21:09 |
clarkb | so if we divide the object count by 31 or 60 maybe we reduce the object count sufficiently to not be a problem | 21:09 |
fungi | if we use a 2-hex-digit truncation then we top out at 256 possibilities which is a more reasonable container count probably | 21:10 |
clarkb | yup, but would be shared across all builds not just periodic builds (that should more evenly distribute the objects which is a good thing) | 21:10 |
clarkb | mnaser: ^ if you get a chance your input on what would help your ceph install would be valuable for figuring out the next step here | 21:19 |
fungi | what i like about the truncated build uuid in the container name is that we get a clear upper bound on container count in each provider that way | 21:20 |
fungi | though the quantization jump in either direction is to go to 16 or 4096 which might be extreme | 21:21 |
clarkb | my guess is 4096 is probably fine but 16 too small | 21:22 |
clarkb | swift should be able to handle thousands of containers | 21:22 |
fungi | so of those three options (16, 256, 4096) the middle one seems the most reasonable | 21:22 |
fungi | and yeah, 4096 may be as well | 21:22 |
fungi | 65k containers, the next jump past 4k, is probably not | 21:23 |
clarkb | ya | 21:23 |
fungi | granted, we *could* reencode and then truncate the build-id in whatever base we want, so sticking to powers of 16 is not absolutely necessary either | 21:24 |
fungi | but as much as i might like to spend the rest of my afternoon on modular arithmetic, i probably need to get to mowing the lawn at some point | 21:25 |
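The two bounding strategies discussed above (hex-digit truncation vs. re-encoding the build id modulo an arbitrary shard count) might look like this, assuming a dash-free hex build UUID (helper names are made up for illustration):

```python
def shard_prefix(build_id, hex_digits=2):
    """Truncate the build UUID to a fixed number of hex digits,
    capping the container count at 16**hex_digits (256 for 2)."""
    return build_id.replace("-", "")[:hex_digits]

def shard_mod(build_id, n_shards=1000):
    """Treat the UUID as an integer and take it mod n_shards, so the
    shard count need not be a power of 16."""
    return int(build_id.replace("-", ""), 16) % n_shards

bid = "c18994fac2ab4b788eeb3fa54e0d8b73"
print(shard_prefix(bid))      # 'c1' -> one of at most 256 containers
print(shard_mod(bid, 4096))   # one of exactly 4096 shards
```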
tristanC | clarkb: we do have users relying on $logserver/periodic/ to collect periodic logs, but we can break that by documenting zuul builds interface | 21:37 |
tristanC | clarkb: or perhaps this new behavior can be toggled by a set-zuul-log-path-fact attribute? | 21:37 |
clarkb | tristanC: ya that is what I'm working on now to make it a toggle | 21:38 |
tristanC | using a unified build-id based path sounds like a good idea, and we would likely enable this by default if it's optional | 21:39 |
tristanC | our prune log script does take into account the different log path scheme, i'd be happy to make it more simple | 21:40 |
fungi | i do feel like that path is a bit of an api contract we shouldn't break without warning, but thankfully there are options to allow it to continue working as-is by default | 21:40 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Add option for object store friendly log paths https://review.opendev.org/679145 | 21:40 |
clarkb | I've only removed the previous sharding prefixes and kept the rest of the paths as is | 21:41 |
clarkb | there is some useful info in there about what job and change and stuff that helps people when sharing urls that I don't think should go away | 21:41 |
fungi | the main argument i see for sharding by date is that storage schemes which want to expire old logs can far more easily prune old paths that way | 21:41 |
fungi | if we were stuck doing opendev on attached storage, i would have advocated for something like logical volume per day mounted at those subtrees, and then we could just umount and lvremove them at expiration | 21:43 |
fungi | which would have been trivial compared to the days-long find -exec rm cronjobs we ran | 21:43 |
fungi | of course, logging to swift, we can just set expiration times at the object level and forget about it | 21:44 |
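The object-level expiration fungi mentions uses Swift's `X-Delete-After` (seconds from now) or `X-Delete-At` (unix timestamp) headers, passed at upload time; a minimal sketch of building those headers, assuming a 30-day retention policy:

```python
import time

RETENTION_DAYS = 30  # assumed retention policy

def expiry_headers(days=RETENTION_DAYS):
    """Relative form: swift deletes the object once this many seconds
    elapse, so no pruning cron job is needed."""
    return {"X-Delete-After": str(days * 24 * 3600)}

def expiry_at_headers(days=RETENTION_DAYS):
    """Equivalent absolute form using a unix timestamp."""
    return {"X-Delete-At": str(int(time.time()) + days * 24 * 3600)}

print(expiry_headers())  # {'X-Delete-After': '2592000'}
```

A real upload would pass these as `headers` to e.g. `swiftclient`'s `put_object`.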
pabelanger | clarkb: I think we'd be okay with the change, most humans use builds UI to fetch periodic jobs in swift | 21:46 |
pabelanger | in fact, we'd love to iterate on http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-June/000961.html for logs | 21:47 |
clarkb | pabelanger: filtering does exist for start and end iirc | 21:48 |
clarkb | but it may not be exposed with a filter option in the list | 21:48 |
pabelanger | yah, can't remember off the top of my head the issue there. But we'd want to be able to create weekly reports, and filter specific periodic jobs in that range | 21:50 |
clarkb | yup I think that is doable today you just have to know what the parameter names are /me double checks | 21:51 |
clarkb | which admittedly should be made easier | 21:51 |
*** jeliu_ has quit IRC | 21:52 | |
clarkb | ah nope its offset and limit that I'm thinking of so its the pagination problem | 21:52 |
clarkb | no time bounding currently | 21:53 |
clarkb | hrm manually setting skip and limit doesn't seem to work | 21:59 |
clarkb | did react change that? | 21:59 |
tristanC | clarkb: the webui doesn't know about skip or limit filters, only the json endpoints interpret those | 22:03 |
clarkb | I see | 22:03 |
clarkb | seems like before you could just manipulate the builds url and it worked | 22:03 |
clarkb | but I guess that only works now if tlaking to the api directly? | 22:03 |
tristanC | perhaps the old angular code did forward the query args | 22:04 |
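Hitting the JSON endpoint directly, as tristanC suggests, amounts to constructing a query string against the builds API; a sketch with hypothetical base URL and tenant names (Zuul's builds endpoint accepts parameters such as `job_name`, `limit`, and `skip`):

```python
from urllib.parse import urlencode

def builds_url(base, tenant, **params):
    """Build a query against the JSON builds endpoint; the web UI
    does not forward these filters, but the API interprets them."""
    qs = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/api/tenant/{tenant}/builds?{qs}"

print(builds_url("https://zuul.example.org", "mytenant",
                 job_name="periodic-job", limit=50, skip=0))
```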
openstackgerrit | Merged zuul/nodepool master: Use fedora-29 instead of fedora-28 https://review.opendev.org/679116 | 22:06 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Add option for object store friendly log paths https://review.opendev.org/679145 | 22:13 |
clarkb | ianw: ^ not much shorter than your suggestion but is namespaced now | 22:13 |
*** armstrongs has joined #zuul | 22:44 | |
*** armstrongs has quit IRC | 22:48 | |
ianw | clarkb: did you want to go with it and run some test jobs? not sure how urgent it is | 23:20 |
clarkb | I think it isn't super urgent because we pulled vexxhost out already | 23:21 |
clarkb | next step may be to have mnaser confirm it should help then work on testing | 23:22 |
clarkb | no rush | 23:22 |
*** rfolco has quit IRC | 23:29 | |
*** rlandy has quit IRC | 23:43 |