opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:20 |
opendevreview | Clay Gerrard proposed openstack/swift master: test for ssync meta offset bug https://review.opendev.org/c/openstack/swift/+/874330 | 00:21 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:33 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:35 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:38 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:48 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:55 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:55 |
opendevreview | Matthew Oliver proposed openstack/swift master: docs: Add memcache.conf config doc https://review.opendev.org/c/openstack/swift/+/874720 | 05:20 |
opendevreview | Matthew Oliver proposed openstack/swift master: updater: add memcache shard update lookup support https://review.opendev.org/c/openstack/swift/+/874721 | 05:20 |
opendevreview | Tim Burke proposed openstack/swift master: container: Add delimiter-depth query param https://review.opendev.org/c/openstack/swift/+/829605 | 05:34 |
opendevreview | Tim Burke proposed openstack/swift master: staticweb: Work with prefix-based tempurls https://review.opendev.org/c/openstack/swift/+/810754 | 05:35 |
opendevreview | Tim Burke proposed openstack/swift master: replicator: Add sync_batches_per_revert option https://review.opendev.org/c/openstack/swift/+/839649 | 05:45 |
opendevreview | Matthew Oliver proposed openstack/swift master: db: shard up the DatabaseBroker pending files https://review.opendev.org/c/openstack/swift/+/830551 | 06:23 |
mattoliver | Just a rebase ^ | 06:23 |
mku11 | Hi, I have a weird situation that I could maybe get some pointers on to investigate further. We have 2 regions where container sync is enabled. Some customers upload objects with an expiration date. When I look at the sync reports, some containers can sync 130+ PUTs per container_time (60), but the number of deletes never exceeds 4. I can't find anything in the code that restricts | 10:31 |
mku11 | deletes. Unfortunately the containers where objects have an expiration date fall out of sync, I guess because not enough deletes are being synced | 10:31 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster https://review.opendev.org/c/openstack/swift/+/871843 | 13:45 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination https://review.opendev.org/c/openstack/swift/+/874781 | 14:43 |
opendevreview | Alistair Coles proposed openstack/swift master: ssync: Round-trip offsets in meta/ctype Timestamps https://review.opendev.org/c/openstack/swift/+/874184 | 15:40 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster https://review.opendev.org/c/openstack/swift/+/871843 | 16:02 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination https://review.opendev.org/c/openstack/swift/+/874781 | 16:02 |
mku11 | I found the problem with container sync and object expiration. In sync.py, object_delete is called without a retries parameter. Since the object in the other region is in most cases already deleted by the object expirer process, object_delete gets a 404 Not Found and retries the default 5 times, for a total of around 17 seconds, before continuing on. This gives a maximum of 4 | 17:47 |
mku11 | deletes per run and a very slow advance of the sync pointer | 17:47 |
mku11 | Perhaps object_delete in sync.py could better be called with retries=0 | 17:47 |
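For illustration, a rough sketch of mku11's idea -- tolerate the 404 on the remote delete instead of retrying -- with a hypothetical helper name and call signature; this is not the actual upstream fix (see the commit timburke links below):

```python
# Sketch only: delete_object is passed in, so its exact signature here is
# hypothetical; the point is retries=0 plus treating 404 as success.
from swift.common.exceptions import ClientException


def sync_delete(delete_object, url, token, container, obj, headers):
    """Delete the remote copy, treating 'already gone' as success so a
    404 doesn't burn ~17s of retries and stall the sync pointer."""
    try:
        delete_object(url, token=token, container=container, name=obj,
                      headers=headers, retries=0)
    except ClientException as err:
        if err.http_status != 404:
            raise
        # Remote object already deleted (e.g. by the expirer) -- carry on.
```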
timburke | mku11, what version of swift is this? sounds a lot like https://bugs.launchpad.net/swift/+bug/1849841 which should've been fixed in 2.25.0 (so, ussuri) by https://github.com/openstack/swift/commit/f68e22d4 | 18:30 |
mku11 | ah sorry, apparently I work with an old version (just got thrown into swift): newton. I had no luck digging up this bug with google. | 18:55 |
timburke | mku11, no worries! just wanted to make sure there wasn't something else affecting later versions :-) | 18:56 |
timburke | i'd definitely recommend upgrading when you get a chance, though -- there are so many *other* bugs we've fixed since then | 18:57 |
mku11 | I will surely give that attention but my first assignment is to move our swift from vm's to iron. When that is done I will look into upgrading. Thanks for the pointer to the patch | 18:59 |
opendevreview | Mandell proposed openstack/swift master: WIP Add grace period to object expirer https://review.opendev.org/c/openstack/swift/+/874806 | 20:15 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 20:16 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 20:16 |
opendevreview | Mandell proposed openstack/swift master: WIP Add grace period to object expirer https://review.opendev.org/c/openstack/swift/+/874806 | 20:38 |
indianwhocodes | howdy! | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Feb 22 21:00:35 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift team meeting? | 21:00 |
kota | o/ | 21:00 |
indianwhocodes | o/ | 21:00 |
mattoliver | o/ | 21:01 |
timburke | sorry for the last-minute cancellation last week -- i've been having some computer troubles, so everything just feels harder than it should be :-( | 21:01 |
mattoliver | Nps | 21:02 |
timburke | as usual, the agenda's at | 21:02 |
timburke | #link https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | first up | 21:02 |
timburke | #topic recovering expired objects | 21:02 |
indianwhocodes | my understanding is that we will have two headers, for the obj-server and proxy-server separately? | 21:03 |
timburke | we've had some users that accidentally let some data expire that they didn't mean to -- and it seemed like it'd be nice if there were a way to help them recover it | 21:03 |
mattoliver | Makes sense | 21:05 |
timburke | today, you can't really do that, at least not easily. the object-server will start responding 404 as soon as the expiration time has passed. you could do something with x-backend-replication:true and internal clients... but it's a little tricky and operator-intensive | 21:05 |
indianwhocodes | noted. | 21:06 |
timburke | indianwhocodes, yeah -- the way i'd imagine this working would be to have one client-facing header, then translate it to something else (possibly even just piggy-backing off the existing x-backend-replication header) when talking to the backend | 21:07 |
timburke | we've got a couple pieces of work started: first, allowing an intentional delay in the expirer processing queue entries | 21:07 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874806 | 21:07 |
timburke | and second, a client-facing API for retrieving expired data | 21:08 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874710 | 21:08 |
indianwhocodes | imo, grace_period sounds a bit weird | 21:09 |
mattoliver | oh yeah, x-backend-replication: true basically allows you to get the data, so translating a user header into this makes sense | 21:10 |
kota | ic | 21:11 |
timburke | i think there are two main (possibly related) questions: (1) does this seem like a reasonable feature? i.e., can we see this merging upstream as opposed to just being something we (nvidia) carry? | 21:11 |
timburke | and (2) what level of access would we want to require for this feature? reseller admin? swift owner? probably not all authed users; certainly not anonymous users | 21:12 |
timburke | i don't think we're likely to answer either of those this week, but i wanted to put them out there and draw attention to the patches so we can talk more about them next week or at the vPTG | 21:14 |
kota | good perspective | 21:14 |
mattoliver | I think it could be a useful tool to have. I mean, in an ideal world you'd just set a better X-Delete-At if you want to access it after that. | 21:14 |
mattoliver | and so long as people don't then expect a grace period on other objects that they could undelete. | 21:15 |
mattoliver | But the fact is we have a user who has the need, and that goes a long way | 21:15 |
mattoliver | I feel like it could be a good enhancement to expired objects and could be used to determine if it's been reclaimed (rather than looking at the timestamps of the request): a 404 with x-open-expired would mean it's gone | 21:16 |
mattoliver | and something a client could deal with | 21:16 |
kota | still from an operator perspective, could it be developed as a middleware? if it were a pluggable feature, like a debugging tool, then either way (maintained upstream or not) may be fine. my feeling. | 21:17 |
timburke | fwiw, i feel like the delay is not *so* far off from an operator deciding to just turn off expirers for a while -- but better in some significant ways | 21:18 |
timburke | kota, yes, absolutely -- at least for the retrieval side of things | 21:18 |
mattoliver | interesting kota, if it was a small middleware we could have an option to restrict it to an admin or authed user etc. | 21:19 |
timburke | (maybe this *is* going to be answerable this week :-) | 21:19 |
mattoliver | basically all it would do is look for x-open-expired (or whatever) and convert it to x-backend-replication: true | 21:19 |
mattoliver | and we can check to see if it was an admin request or not (if we enable that option). | 21:20 |
indianwhocodes | I think exposing it as a client api is enough, but of course my wsgi middleware knowledge is not on par | 21:21 |
mattoliver | it would be a small middleware, but allows opt in.. hmm | 21:21 |
timburke | i've got a few more topics, so i think i'll keep us moving | 21:21 |
mattoliver | Also something indianwhocodes can sink his teeth into. | 21:21 |
mattoliver | indianwhocodes: most of our apis are middlewares | 21:21 |
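As a rough sketch of the middleware idea mattoliver describes above -- the client-facing header name, the config option, and the admin check are all illustrative, not a merged design:

```python
from swift.common.swob import Request
from swift.common.utils import config_true_value


class OpenExpiredMiddleware(object):
    """Translate a client-facing X-Open-Expired header into the existing
    X-Backend-Replication backend header, optionally only for privileged
    requests."""

    def __init__(self, app, conf):
        self.app = app
        self.admin_only = config_true_value(conf.get('admin_only', 'true'))

    def __call__(self, env, start_response):
        req = Request(env)
        if config_true_value(req.headers.get('X-Open-Expired')):
            # Optionally restrict the feature to reseller-admin requests.
            if not self.admin_only or req.environ.get('reseller_request'):
                req.headers['X-Backend-Replication'] = 'true'
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = dict(global_conf, **local_conf)

    def factory(app):
        return OpenExpiredMiddleware(app, conf)
    return factory
```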
timburke | #topic ssyncing data with offsets and meta without | 21:21 |
mattoliver | oh this was an interesting bug | 21:23 |
timburke | we recently discovered an issue with syncing .data files that have offsets (specifically, because of object versioning, though we expect this would happen with reconciled data, too) that *also* have .meta on them that *do not* | 21:23 |
timburke | #link https://launchpad.net/bugs/2007643 | 21:23 |
timburke | acoles did a great job diagnosing the issue, writing up the bug, and coming up with a fix | 21:23 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874122 | 21:23 |
zaitcev | As always. | 21:24 |
timburke | clayg even wrote up a probe test | 21:24 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874330 | 21:24 |
timburke | and i've got this itch to make sure we can also ssync .metas that have offsets | 21:24 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874184 | 21:25 |
timburke | i don't think there's too much to discuss on the issue, but wanted to call it out as a recent body of work | 21:25 |
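For context, internalized Swift timestamps carry an offset as a 16-hex-digit suffix; a quick illustration (values made up) of what the ssync fix needs to round-trip for the meta/ctype timestamps:

```python
from swift.common.utils import Timestamp

t_plain = Timestamp(1677100000.12345)
t_offset = Timestamp(1677100000.12345, offset=1)
print(t_plain.internal)   # 1677100000.12345
print(t_offset.internal)  # 1677100000.12345_0000000000000001
```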
timburke | #topic http keepalive | 21:26 |
mattoliver | yeah, interesting bug that needed just the right combination of ssync, posts, versioned objects. Great work all of you! | 21:26 |
mattoliver | I'll go give a review to what needs it seeing as I wasn't involved | 21:27 |
timburke | some recent experiments showed that constraining max_clients could have a decent benefit to time to first byte latencies (and maybe overall performance? i forget) | 21:27 |
timburke | but it uncovered an annoying issue with our clients | 21:28 |
timburke | sometimes, a client would hold on to an idle connection for a while, just in case it decided to make another request. usually, this isn't much of a problem -- with high max_clients, we can keep some greenthreads watching the idle connection, no trouble | 21:29 |
mattoliver | yeah, we were trampolining on too many eventlet coroutines when we had too large a max_clients.. at least with our clients, workflows and hardware SKUs, tuning it down helped us... it was an interesting deep dive.. but thankfully our PTL is also an eventlet maintainer :) | 21:30 |
timburke | but with the constrained max_clients, we could find ourselves with all available greenthreads waiting on idle connections while new connections stayed in accept queues or waited to be handed off | 21:31 |
timburke | one of the frustrating things was that from swift's perspective, our TTFB still looked good -- but not from a client perspective :-( | 21:32 |
zaitcev | Ugh. | 21:32 |
mattoliver | yeah, that damn accept queue on the listening socket meant we hadn't accepted the connection yet, so our timers didn't start.. but the clients' had | 21:32 |
timburke | one option was to turn down client_timeout -- iirc we were running with the default 60s, which meant the idle connection would linger a pretty long time | 21:34 |
timburke | but turning it down too much would lead to increased 499s/408s as it looks like the client timed out during request processing | 21:34 |
timburke | i've been working on an eventlet patch to add a separate timeout for reading the start of the next request, and temoto seems on board | 21:35 |
timburke | #link https://github.com/eventlet/eventlet/pull/788 | 21:35 |
timburke | but i've also been trying to do a similar thing purely in swift | 21:36 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/873744 | 21:36 |
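To make the idea concrete, here's a toy sketch (not what either of the patches above actually does) of giving an idle keep-alive connection a shorter timeout while waiting for the start of the next request, then reverting to the normal client_timeout once bytes arrive; the option names and helper are hypothetical, and the socket is assumed to be an eventlet green socket:

```python
import socket
import eventlet

# Hypothetical option names, for illustration only.
CLIENT_TIMEOUT = 60.0          # normal timeout while a request is in flight
KEEPALIVE_IDLE_TIMEOUT = 5.0   # shorter timeout while waiting for the next request


def wait_for_next_request(sock):
    """Peek for the first byte of the next request on an idle keep-alive
    connection; give up quickly if the client just sits on the socket."""
    try:
        with eventlet.Timeout(KEEPALIVE_IDLE_TIMEOUT):
            first = sock.recv(1, socket.MSG_PEEK)
    except eventlet.Timeout:
        return None  # idle too long -- caller should close the connection
    if not first:
        return None  # client closed its end
    # A request has started; read it under the normal client timeout.
    with eventlet.Timeout(CLIENT_TIMEOUT):
        return sock.recv(65536)
```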
mattoliver | I managed to recreate the problem in a VSAIO with a bad client, and was able to "fix" it by limiting the wsgi server to HTTP/1.0, basically disconnecting after each request.. which isn't ideal but worked. your fix does this but better: break an idle connection (after a request, while waiting for a new one) after a given amount of time, allowing these bad clients to get disconnected if they're hogging a connection and not using it. | 21:36 |
timburke | i wanted to offer that background on the patch (since otherwise it seems a little crazy) | 21:37 |
mattoliver | I think for those patches background is the key :) | 21:38 |
timburke | and solicit some feedback on one point: if/when the eventlet patch merges, should i change the other one to just plumb in the timeout from config and call out that you need a fairly new eventlet? or continue setting the timeout so it works with old eventlet? | 21:39 |
mattoliver | oh, interesting. I guess it depends on how many others are seeing this bad client behaviour. If people aren't noticing maybe the former as it's less code for us to carry? | 21:40 |
mattoliver | or is it something we want to backport. | 21:40 |
mattoliver | if it is, then the latter, maybe at least until the next release is EOLed? | 21:41 |
mattoliver | (just thinking outloud) | 21:41 |
timburke | mattoliver, former was kind of my feeling, too -- it'll require that we (nvidia) remember to upgrade eventlet before the patch stops setting timeouts, but that should be fine | 21:42 |
mattoliver | yeah | 21:42 |
timburke | i don't expect this to be something to backport | 21:42 |
timburke | #topic per-policy quotas | 21:43 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/861282 | 21:43 |
mattoliver | the swift patch does do a little bit of timeout swapping that's harder to grok.. so great workaround, but I vote for newer eventlet long term | 21:43 |
timburke | i'm pretty sure i mentioned this patch not *so* long ago, but wanted to call out that the pre-reqs have seen some updates recently | 21:44 |
timburke | i'd still love to be able to use my nice all-flash policy without worrying about it filling up and causing problems :-) | 21:45 |
timburke | #topic vPTG | 21:46 |
timburke | so... i forgot to actually book rooms 😱 | 21:46 |
mattoliver | lol | 21:46 |
kota | wow | 21:46 |
timburke | i'll follow up with appropriate parties to make sure we have something, though -- don't expect it'll be a problem | 21:46 |
timburke | i'll stick with the same slots we've been using the past few vPTGs | 21:47 |
mattoliver | yeah, get what you can, I can always just drink a lot of coffee :P | 21:47 |
kota | ok. nps | 21:47 |
mattoliver | if worse comes to worst | 21:47 |
mattoliver | I've put some stuff in the topics etherpad | 21:48 |
timburke | remember how i said "everything just feels harder than it should be"? this is part of "everything" :P | 21:48 |
timburke | thanks mattoliver! | 21:48 |
mattoliver | #link https://etherpad.opendev.org/p/swift-ptg-bobcat | 21:48 |
mattoliver | lol | 21:48 |
mattoliver | Still probably missing a bunch there. All the topics we talked about today, if they haven't landed, might be interesting discussions. | 21:49 |
timburke | that's all i've got | 21:49 |
timburke | #topic open discussion | 21:49 |
mattoliver | other things people are working on, or have got languishing | 21:49 |
timburke | what else should we discuss this week? | 21:49 |
mattoliver | We've had more stuck shards, but we've already merged the patch that should stop the edge case. | 21:50 |
mattoliver | I've since also been opening the lids on sharding container-update and async pending stuff | 21:50 |
mattoliver | Done some brainstorming about what we can improve when a root is under too high a load.. | 21:51 |
mattoliver | But instead of putting it in here, I just wrote it up in the PTG etherpad... so if anyone is interested you can look / comment there | 21:51 |
mattoliver | or wait until the PTG and I might have some more POC and benchmarking done. | 21:52 |
mattoliver | basically object-updater memcache support, maybe a container-update header to the object-server to not do a container-update and just write directly to an async pending as a hold-off procedure. | 21:53 |
timburke | ooh, good thought... | 21:54 |
mattoliver | Also built a beefy SAIO on an object-server-SKU'd dev box, and I plan on seeing how my sharding-pending-files patch is doing, to see if it too will help in this area from the other side.. then I'll put it under load... but let's see how far I can get before the PTG. | 21:56 |
mattoliver | (probably need to use an A/C-SKU'd box; real h/w is a good next step). | 21:56 |
timburke | nice | 21:56 |
timburke | all right, i think i'll call it | 21:57 |
timburke | thank you all for coming, and thank you for working on swift! | 21:57 |
timburke | #endmeeting | 21:57 |
opendevmeet | Meeting ended Wed Feb 22 21:57:19 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.html | 21:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.txt | 21:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.log.html | 21:57 |
opendevreview | Tim Burke proposed openstack/swift master: Fix docstring regarding private method https://review.opendev.org/c/openstack/swift/+/874816 | 23:38 |