xrkr | acoles: mattoliver: DHE: i have tried db vacuum.. the db file got reduced only from 74G to 68G. | 00:45 |
---|---|---|
mattoliver | well it was a little bloated then, but still that's a big container! | 00:55 |
mattoliver | If the container is still too big to really fit on any of your disks, then sharding would probably be your answer. That'll break it down into multiple smaller dbs that will be spread across the cluster. | 00:57 |
xrkr | Hi mattoliver: Any guidance on sharding pls. | 03:18 |
xrkr | acoles: mattoliver: DHE: I also see db_preallocation = True in my [container-updater] [container-auditor] [container-server] sections.. Would changing this to false help? | 04:07 |
mattoliver | xrkr: A good place to start for info at least is: https://docs.openstack.org/swift/latest/overview_container_sharding.html But we can help you get it going :) | 05:33 |
mattoliver | There is an auto-sharding option in the sharder, but this ISN'T supported or ready for prod use yet. Basically it allows the sharder to do leader election, scan for big containers, insert shardranges (points to shard at) and do the sharding.. one day we'd be able to just turn this on (or better, always have it on by default). But we can't do this yet. (I'm working on it). | 05:35 |
mattoliver | So instead at the moment it's operator driven. | 05:35 |
mattoliver | Firstly you need to be running the sharder daemons on your container servers. When they run, they actually drop recon data about your biggest (And best) candidate DBs for sharding. | 05:36 |
mattoliver | You can then go to ONLY one of the replicas of the container you want to shard and use the swift-manage-shard-ranges operator tool to find and insert the shardranges into the container and enable it for sharding. | 05:37 |
mattoliver | Using this tool, you can decide how you want to split it up.. in what size chunks. 1 million object rows. 500k :shrug:. | 05:38 |
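For anyone following along, a minimal sketch of the operator step mattoliver describes, run against one replica's DB; the path is a placeholder and 500000 rows per shard is only an example size, so check the sharding doc linked above and `swift-manage-shard-ranges --help` before relying on it:

```console
# on ONE container server that holds a replica of the oversized container:
# scan the DB, insert shard ranges of ~500k object rows each, and enable it for sharding
swift-manage-shard-ranges /path/to/<container_hash>.db find_and_replace 500000 --enable
```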
xrkr | Thanks mattoliver: Let me get this checked. How about my comment on db_preallocation ? Would that work? | 05:43 |
mattoliver | Once you've enabled the container, the sharder will, when it next visits it, see the shardranges the tool inserted and start sharding (splitting) the container up. How many it splits at a time (cleave_batch_size) is configurable in the sharder config. If you want it to happen faster you can set this to a higher number (or to the number of shardranges inserted, to do it all at once). | 05:44 |
mattoliver | Here's more information on the sharder settings: https://github.com/openstack/swift/blob/master/etc/container-server.conf-sample#L374 | 05:44 |
mattoliver | but again we can walk you through if you need help. | 05:45 |
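A sketch of the sharder settings being referenced, based on the sample config mattoliver links above; the values shown are believed to be the shipped defaults but are worth verifying against that file:

```ini
[container-sharder]
# object-row count at which a container is reported as a sharding candidate
shard_container_threshold = 1000000
# how many shard ranges are cleaved (split off) per sharder visit; raise it to shard faster
cleave_batch_size = 2
# the not-yet-production-ready automation mattoliver mentions; leave it off for now
auto_shard = false
```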
mattoliver | xrkr: db_preallocation is to allocate space for an expanding db. If there isn't enough room on the server you're syncing a new db to, I don't think it'll do anything. | 05:55 |
xrkr | Ok. Thanks mattoliver: I shall revert | 06:15 |
*** avanzaghi11 is now known as avanzaghi1 | 08:15 | |
acoles | xrkr: db_preallocation = False will, I think, help you get around the immediate problem because IIUC the server is currently trying and failing to preallocate ~68G. But longer term you may want to think about sharding that container. | 09:01 |
xrkr | Hi acoles: mattoliver thinks otherwise, however I am just thinking for my 3 site cluster, I do notice this db exists on one of 3 servers in each site. I was just wondering if I can create another temp configuration file and start the container server process just on those nodes where this db exists. I am unsure if this works though (provided db_preallocation = false will help us at all). | 11:58 |
mattoliver | my reading of the preallocation code is that it uses fallocate to pre-reserve some space for expansions, and it does it in steps. Turning it to false would mean it would not require as much space at the outset (putting space aside). So it does make things better.. but only if there is enough room on the remote device to actually house the DB. And I was under the impression you didn't have room to replicate. | 12:46 |
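To make the suggestion concrete, a sketch of the edit being discussed, reusing the section names xrkr listed at 04:07 (exact section placement may differ per deployment); the container services would need a restart for it to take effect:

```ini
# container-server.conf: stop pre-reserving space with fallocate for growing DBs
[container-server]
db_preallocation = false

[container-updater]
db_preallocation = false

[container-auditor]
db_preallocation = false
```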
acoles | @mattoliver IIUC the problem xrkr has is that the DB can't pre-allocate in situ, so container listing is failing because fallocate fails when the pending file is flushed to the DB | 12:49 |
opendevreview | Alistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code https://review.opendev.org/c/openstack/swift/+/925993 | 14:11 |
opendevreview | Alistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404 https://review.opendev.org/c/openstack/swift/+/926220 | 14:11 |
opendevreview | Elod Illes proposed openstack/python-swiftclient master: DNM: gate health test https://review.opendev.org/c/openstack/python-swiftclient/+/926336 | 16:10 |
opendevreview | Alistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code https://review.opendev.org/c/openstack/swift/+/925993 | 16:42 |
opendevreview | Alistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404 https://review.opendev.org/c/openstack/swift/+/926220 | 16:42 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Release 1.6.4 https://review.opendev.org/c/openstack/liberasurecode/+/917784 | 16:52 |
opendevreview | Merged openstack/liberasurecode master: Release 1.6.4 https://review.opendev.org/c/openstack/liberasurecode/+/917784 | 17:37 |
opendevreview | Clay Gerrard proposed openstack/swift master: wip: add some test infra https://review.opendev.org/c/openstack/swift/+/926349 | 17:42 |
mattoliver | @acoles: oh sorry, didn't realise it was that bad, I must've missed that part in scrollback! Yeah xrkr turn it off! | 20:25 |
mattoliver | Morning, I wonder if we're meeting today? Maybe I've just forgotten if it got cancelled 🙃 | 21:07 |
fulecorafa | Here wondering too | 21:07 |
acoles | o/ | 21:08 |
mattoliver | Is timburke around today? | 21:09 |
timburke | oh, right! meeting! | 21:09 |
timburke | sorry everybody | 21:09 |
timburke | #startmeeting swift | 21:09 |
opendevmeet | Meeting started Wed Aug 14 21:09:50 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:09 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:09 |
opendevmeet | The meeting name has been set to 'swift' | 21:09 |
timburke | who's here for the swift meeting? | 21:09 |
mattoliver | o/ | 21:10 |
fulecorafa | o/ | 21:10 |
acoles | \o | 21:10 |
acoles | I'm just here til 30 after the hour | 21:10 |
timburke | sorry for the delay -- between kids being home for this last week before going back to school and me getting nerd-sniped by some performance investigation, i'm not well prepared today ;-) | 21:11 |
mattoliver | lol, understandable! | 21:11 |
timburke | acoles, since you've got to cut out early anyway, is there anything you'd like to bring up? | 21:12 |
acoles | timburke: the patch to deprecate log_statsd_* options... | 21:13 |
acoles | https://review.opendev.org/c/openstack/swift/+/922518/ | 21:13 |
patch-bot | patch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets | 21:13 |
acoles | clayg: made some good observations about the decision to blow up if both the old and new options were found in conf: | 21:14 |
acoles | 1. that seems too harsh if the old and new options have the same value! | 21:14 |
timburke | fair | 21:15 |
acoles | 2. that prevents ops adding config options *in anticipation* of upgrade / problematic if a node isn't upgraded | 21:15 |
timburke | but then why move off the old options at all? suffer the warning until you can move with confidence! | 21:16 |
acoles | 3. with conf.d style it's easy to miss that there's a legacy option still in there (this would be mitigated by tolerating same value) | 21:18 |
acoles | we noted that if we turned it down to just a warning, then worst case some stats stop appearing and ops check logs and see warning and go fix conf | 21:18 |
acoles | anyways, I wanted to solicit more opinion... | 21:19 |
timburke | eh, it's certainly not a hill i'm looking to die on | 21:20 |
acoles | I certainly think that tolerating old-and-new-but-same-value without blowing up seems reasonable | 21:20 |
acoles | hehe, I'm not sure anyone wants to dis on this particular hill ;-) | 21:20 |
timburke | tolerating both, even with different values seems perfectly in line with other times that we've tried to push for renaming options | 21:20 |
acoles | we'd obvs make the warning pretty clear - "you gave me x and y and I'm ignoring y" | 21:21 |
mattoliver | yeah, supporting new but falling back to old seems the normal migration path. With warnings, that makes sense to me. But maybe I'm missing something obvious (it is early) :P | 21:22 |
acoles | the context is this comment https://review.opendev.org/c/openstack/swift/+/922518/comment/56eb874d_9707e7c5/ | 21:23 |
patch-bot | patch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets | 21:23 |
acoles | we're concerned that an op might update to new option in DEFAULT and not realise it overrides what was previously a different value in proxy-logging | 21:24 |
acoles | which led us to "blow up if old and new are present" ... but now reflecting on whether that is too brittle | 21:25 |
timburke | i think my perspective when thinking about going for an error was: if you've got log_statsd_host=foo, log_statsd_port=12345, statsd_host=bar -- having some but not all options coming from the correct place seems likely to be a misconfiguration | 21:25 |
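Rendered as config, the mixed case timburke describes might look like this (values illustrative, new-style option name per the patch under review):

```ini
[DEFAULT]
# legacy statsd options, still present
log_statsd_host = foo
log_statsd_port = 12345
# new-style name adopted for the host only -- host and port now disagree about
# which naming scheme is authoritative: warn, or refuse to start?
statsd_host = bar
```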
acoles | yup, but does misconfiguration result in warning or ValueError ??? | 21:26 |
mattoliver | oh I see. But if they know they're moving to the new (and have done so in DEFAULT) isn't it on them to finish the job.. I think a warning about the settings is enough to alert them. | 21:27 |
mattoliver | But I guess it depends on where we think metrics sit on importance in swift | 21:27 |
mattoliver | if it's critical then it should ValueError | 21:27 |
mattoliver | but it seems kinda optional (which may be wrong) so a warning is enough | 21:28 |
timburke | maybe my real concern is that i want a statsd_endpoint config option that looks like host:port -- which is definitely out of scope :P | 21:29 |
mattoliver | Downstream our metrics are super important, so I'd assume ValueError would be more useful, because we may not check/see warning for some time. | 21:29 |
mattoliver | so yeah, I think I finally clicked onto the dilemma :P | 21:30 |
timburke | eh, the warning's probably fine | 21:31 |
timburke | i've probably been overthinking it | 21:31 |
acoles | worth noting that downstream, with ValueError approach, we might have to do *work* (i.e. rip stuff out of controller) before we could use new metrics via ansible conf.d, so we might be stuck with legacy for a while | 21:31 |
mattoliver | gut still feels like a warning, as that's more in line with historic swift | 21:31 |
acoles | tell you what - I'll spin a warning only version and we can go back and forth more on gerrit - I've already used half the meeting time :) | 21:33 |
mattoliver | maybe we need a config check tool.. maybe swift-config can do the audit | 21:33 |
timburke | no worries | 21:33 |
mattoliver | +1 | 21:33 |
timburke | ok, next up | 21:34 |
mattoliver | get swift-config to print dep warnings and tell users they should always run new config through swift-config :) | 21:34 |
timburke | oh, i think i like that... we should probably use that tool more | 21:35 |
timburke | i realized that we're getting toward the end of the dalmatian cycle | 21:35 |
timburke | #link https://releases.openstack.org/dalmatian/schedule.html | 21:35 |
timburke | i need to be getting a client release together soonish | 21:36 |
mattoliver | kk, let's fire up priority reviews page :) | 21:36 |
timburke | i *did* finally get the liberasurecode release out that i'd meant to do months ago at least :-) | 21:36 |
mattoliver | \o/ | 21:37 |
timburke | and within the next couple weeks or so we should get a swift release out | 21:37 |
mattoliver | Then I want to make sure the account-quotas follow up lands before then. | 21:38 |
timburke | +1 | 21:38 |
timburke | i should take another look at that, too | 21:38 |
mattoliver | me 3 | 21:39 |
timburke | we probably ought to make a decision on https://review.opendev.org/c/openstack/swift/+/924795 by then, too -- either remove now, or add the warnings before release | 21:39 |
patch-bot | patch 924795 - swift - Remove legacy bin/ scripts - 3 patch sets | 21:39 |
mattoliver | yes. I added my vote ;) | 21:40 |
timburke | i saw! thanks | 21:40 |
mattoliver | maybe we give it a time limit and count the +/- 1's | 21:40 |
timburke | i'm sad we still haven't been able to feel comfortable merging https://review.opendev.org/c/openstack/swift/+/853590 | 21:41 |
patch-bot | patch 853590 - swift - Drop py2 support - 15 patch sets | 21:41 |
mattoliver | say by next meeting. | 21:41 |
mattoliver | ohh, maybe we should.. or is that rocking the boat too much and too quickly :P | 21:42 |
mattoliver | or do we merge that just after this release.. and that's the line? | 21:42 |
timburke | my main concern is similar to acoles's on the config warning/error question -- would it just be setting us up to have to do work downstream to work around it... | 21:43 |
mattoliver | well yeah, but that's downstream, and maybe the kick in the pants we need :P | 21:43 |
timburke | i suppose we could always merge it, then immediately propose a revert, which we can carry as needed :P | 21:45 |
mattoliver | lol | 21:45 |
mattoliver | well we have a few weeks at the most, let's probe downstream and see where the current blockers are.. but it would be nice to not worry about py2 anymore. I don't think anyone upstream in the community other than us has dep issues on py2 code in an older codebase | 21:46 |
timburke | no, i don't believe so either | 21:47 |
timburke | next ptg, we should have a straw poll on oldest python version that should still be supported by master -- because i think we could probably drop py36(and maybe even py37/py38?) as well | 21:48 |
mattoliver | k, let's put a pin in it for this meeting and re-discuss next time, when we might have some more data. | 21:48 |
mattoliver | oh good idea | 21:49 |
timburke | all right, i think those are the main things i've got | 21:49 |
timburke | anything else we should bring up this week? | 21:49 |
fulecorafa | If I may | 21:49 |
zaitcev | Enumerate the distros you're still willing to support, find what Python they ship, and there's your answer. | 21:49 |
timburke | fulecorafa, go ahead! | 21:50 |
fulecorafa | We're having some problems with some users either deleting enormous files or deleting a large quantity of them. Essentially any object deletion that takes a long time to resolve the HTTP request | 21:50 |
fulecorafa | From what I've tested, it seems like it is a simple problem of connection timeout because the operation takes a long time | 21:51 |
fulecorafa | However, I think this should open the possibility of making deletions async | 21:51 |
mattoliver | yeah interesting. async deletion. I guess the question is what status code do you get. | 21:52 |
mattoliver | did build delete ever get the keep-alive heartbeat love. | 21:52 |
mattoliver | *bulk delete | 21:52 |
timburke | 202 Accepted -- good for so many things :D | 21:53 |
fulecorafa | For the actual implementation, I think I would go in acoles' direction and make deletion markers. Although I remember there is something similar already there, even though I didn't find it in the repo lately... | 21:53 |
mattoliver | well maybe passing in a query arg to indicate an async delete might be ok | 21:53 |
zaitcev | As a workaround, could you do a bulk delete with just 1 entry? | 21:54 |
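For reference, a sketch of what zaitcev's workaround might look like on the Swift API (endpoint, token and object name are placeholders; the ?bulk-delete form comes from the bulk middleware, which as noted below may not be reachable through s3api):

```console
# "bulk" delete with a single entry, so the proxy can stream the response while it works;
# the body is a newline-separated list of /container/object paths
curl -X POST "https://swift.example.com/v1/AUTH_test?bulk-delete" \
     -H "X-Auth-Token: $TOKEN" -H "Content-Type: text/plain" \
     --data-binary "/my-container/my-huge-object"
```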
fulecorafa | That is an idea zaitcev and mattoliver, didn't try that yet | 21:54 |
mattoliver | yeah possibly | 21:54 |
timburke | mattoliver, i'd forgotten that heartbeating was opt-in -- but yeah, pretty sure bulk delete and slo both have it now | 21:54 |
fulecorafa | Will check the possibility | 21:54 |
timburke | fulecorafa, are your users using swift or s3 api? or both? | 21:54 |
fulecorafa | s3api mostly | 21:54 |
zaitcev | oh | 21:55 |
fulecorafa | Yep | 21:55 |
zaitcev | I'm not sure if ?bulk-delete is accessible through S3. | 21:55 |
zaitcev | Sorry I just wanted to get your users going while we're tinkering. | 21:55 |
mattoliver | otherwise I don't mind the idea of async, just need to think about how: drop a tombstone and wait for something to clean it up, or drop it into the expirer queue | 21:55 |
fulecorafa | There is a multi-delete controller, but we're having problem with that too | 21:55 |
timburke | oh, no! i was wrong about bulk -- and was in fact thinking about slo PUTs! https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L95-L100 | 21:56 |
fulecorafa | One idea I was having, so that we could keep backwards support for today's s3api | 21:56 |
mattoliver | timburke: yeah I was wondering if that was something I was going to add to the summer school students and why it was in my head :P | 21:57 |
fulecorafa | You could send a combination of config and query param requesting async, adding async support to the controllers where we want it. The configuration sets the default behaviour, while the query param overrides it | 21:57 |
mattoliver | i believe there is a feature request for it | 21:57 |
mattoliver | fulecorafa: sounds like we're on board, maybe we need to write up something (bug/feature request, or wait until next meeting to discuss further) so we can continue the discussion async | 21:59 |
mattoliver | see what I did there :P | 21:59 |
mattoliver | but think it's a great idea, and useful feature as objects can get pretty big :) | 21:59 |
fulecorafa | Thanks mattoliver. Wanted to be sure this was not available today. Since it is a nice touch, I will open a feature request for that soon then | 22:00 |
mattoliver | ta | 22:00 |
mattoliver | I think we're at time | 22:00 |
mattoliver | I did want to mention that I might have a patch to solve the early-active issue we see in getting auto shrinking happening in sharding: https://review.opendev.org/c/openstack/swift/+/926036 | 22:01 |
patch-bot | patch 926036 - swift - ShardRange: track last state change timestamp - 3 patch sets | 22:01 |
timburke | fulecorafa, you said it happens when deleting enormous files -- is allow_async_delete configured for slo? it should default to on which is what you'd want https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L1114 | 22:01 |
timburke | s3api should be trying to use that functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/s3request.py#L1518-L1535 | 22:02 |
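For reference, a sketch of the pieces timburke points at: the proxy-side SLO option plus the query-parameter form of the async delete (parameter name as recalled from the SLO middleware, so worth confirming against the linked source):

```ini
# proxy-server.conf
[filter:slo]
# lets a manifest DELETE return quickly and hand segment cleanup to the object expirer;
# s3api multipart deletes rely on this being enabled
allow_async_delete = true

# client-side equivalent on the Swift API (shown as a comment since this block is config):
# DELETE /v1/AUTH_test/container/manifest?multipart-manifest=delete&async=yes
```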
mattoliver | oh good call, I guess they probably are SLOs | 22:03 |
fulecorafa | Thx timburke, I didn't check that; either I didn't remember it, or we're on an old version where it didn't appear to me. | 22:03 |
timburke | oh, and bulk *always* wants to do that, but the option is called yield_frequency https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L1054-L1057 | 22:04 |
timburke | but i don't think s3api's bulk-delete-equivalent uses that | 22:05 |
fulecorafa | It doesn't ;-; | 22:06 |
timburke | the complete-multipart-upload code might be a useful starting point for similar functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/controllers/multi_upload.py#L788-L818 | 22:07 |
* mattoliver needs to go wrangle kids and get them ready for school. So I gotta drop. | 22:07 | |
timburke | all right, mattoliver's right, we're past time now -- i should wrap up | 22:08 |
timburke | thank you all for coming, and thank you for working on swift! | 22:08 |
timburke | #endmeeting | 22:08 |
opendevmeet | Meeting ended Wed Aug 14 22:08:26 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:08 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.html | 22:08 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.txt | 22:08 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.log.html | 22:08 |
timburke | i'd forgotten how all-async-deleted-segments-must-be-in-one-container was a self-imposed restriction: https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L1768-L1775 | 23:23 |
timburke | i think i was worried about the authorize callback potentially being somewhat expensive? | 23:24 |
timburke | or i was just trying to descope to cover only what was strictly needed for s3api | 23:24 |