opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:20 |
opendevreview | Clay Gerrard proposed openstack/swift master: test for ssync meta offset bug https://review.opendev.org/c/openstack/swift/+/874330 | 00:21 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:33 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:35 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:38 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:48 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:55 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 00:55 |
opendevreview | Matthew Oliver proposed openstack/swift master: docs: Add memcache.conf config doc https://review.opendev.org/c/openstack/swift/+/874720 | 05:20 |
opendevreview | Matthew Oliver proposed openstack/swift master: updater: add memcache shard update lookup support https://review.opendev.org/c/openstack/swift/+/874721 | 05:20 |
opendevreview | Tim Burke proposed openstack/swift master: container: Add delimiter-depth query param https://review.opendev.org/c/openstack/swift/+/829605 | 05:34 |
opendevreview | Tim Burke proposed openstack/swift master: staticweb: Work with prefix-based tempurls https://review.opendev.org/c/openstack/swift/+/810754 | 05:35 |
opendevreview | Tim Burke proposed openstack/swift master: replicator: Add sync_batches_per_revert option https://review.opendev.org/c/openstack/swift/+/839649 | 05:45 |
opendevreview | Matthew Oliver proposed openstack/swift master: db: shard up the DatabaseBroker pending files https://review.opendev.org/c/openstack/swift/+/830551 | 06:23 |
mattoliver | Just a rebase ^ | 06:23 |
mku11 | Hi, I have a weird situation that I could maybe get some pointers on to investigate further. We have 2 regions where container sync is enabled. Some customers upload objects with an expiration date. When I look at the sync reports, some containers can sync 130+ PUTs per container_time (60), but the number of deletes never exceeds 4. I can't find anything in the code that restricts | 10:31 |
mku11 | deletes. Unfortunately the containers where objects have an expiration date fall out of sync, I guess because not enough deletes are being synced | 10:31 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster https://review.opendev.org/c/openstack/swift/+/871843 | 13:45 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination https://review.opendev.org/c/openstack/swift/+/874781 | 14:43 |
opendevreview | Alistair Coles proposed openstack/swift master: ssync: Round-trip offsets in meta/ctype Timestamps https://review.opendev.org/c/openstack/swift/+/874184 | 15:40 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: make misplaced objects lookup faster https://review.opendev.org/c/openstack/swift/+/871843 | 16:02 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: yield fewer rows that have no destination https://review.opendev.org/c/openstack/swift/+/874781 | 16:02 |
mku11 | I found the problem with container sync and object expiration. In sync.py, object_delete is called without a retries parameter. Since the object in the other region is in most cases already deleted by the object expirer process, object_delete gets a 404 Not Found and retries the default 5 times, for a total of around 17 seconds, before continuing on. This gives a maximum of 4 | 17:47 |
mku11 | deletes per run and a very slow advance of the sync pointer | 17:47 |
mku11 | Perhaps object_delete in sync.py could better be called with retries=0 | 17:47 |
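For illustration, a rough sketch of mku11's idea -- tolerate the 404 on the remote delete instead of retrying -- with a hypothetical helper name and call signature; this is not the actual upstream fix (see the commit timburke links below):

```python
# Sketch only: delete_object is passed in, so its exact signature here is
# hypothetical; the point is retries=0 plus treating 404 as success.
from swift.common.exceptions import ClientException


def sync_delete(delete_object, url, token, container, obj, headers):
    """Delete the remote copy, treating 'already gone' as success so a
    404 doesn't burn ~17s of retries and stall the sync pointer."""
    try:
        delete_object(url, token=token, container=container, name=obj,
                      headers=headers, retries=0)
    except ClientException as err:
        if err.http_status != 404:
            raise
        # Remote object already deleted (e.g. by the expirer) -- carry on.
```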
timburke | mku11, what version of swift is this? sounds a lot like https://bugs.launchpad.net/swift/+bug/1849841 which should've been fixed in 2.25.0 (so, ussuri) by https://github.com/openstack/swift/commit/f68e22d4 | 18:30 |
mku11 | ah sorry, apparently I work with an old version (just got thrown into swift): newton. I had no luck digging up this bug with google. | 18:55 |
timburke | mku11, no worries! just wanted to make sure there wasn't something else affecting later versions :-) | 18:56 |
timburke | i'd definitely recommend upgrading when you get a chance, though -- there are so many *other* bugs we've fixed since then | 18:57 |
mku11 | I will surely give that attention but my first assignment is to move our swift from vm's to iron. When that is done I will look into upgrading. Thanks for the pointer to the patch | 18:59 |
opendevreview | Mandell proposed openstack/swift master: WIP Add grace period to object expirer https://review.opendev.org/c/openstack/swift/+/874806 | 20:15 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 20:16 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: Add x-backend-open-expired to recover expired objects https://review.opendev.org/c/openstack/swift/+/874710 | 20:16 |
opendevreview | Mandell proposed openstack/swift master: WIP Add grace period to object expirer https://review.opendev.org/c/openstack/swift/+/874806 | 20:38 |
indianwhocodes | howdy! | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Feb 22 21:00:35 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift team meeting? | 21:00 |
kota | o/ | 21:00 |
indianwhocodes | o/ | 21:00 |
mattoliver | o/ | 21:01 |
timburke | sorry for the last-minute cancellation last week -- i've been having some computer troubles, so everything just feels harder than it should be :-( | 21:01 |
mattoliver | Nps | 21:02 |
timburke | as usual, the agenda's at | 21:02 |
timburke | #link https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | first up | 21:02 |
timburke | #topic recovering expired objects | 21:02 |
indianwhocodes | my understanding is that we will have two headers, for the obj-server and proxy-server separately? | 21:03 |
timburke | we've had some users that accidentally let some data expire that they didn't mean to -- and it seemed like it'd be nice if there were a way to help them recover it | 21:03 |
mattoliver | Makes sense | 21:05 |
timburke | today, you can't really do that, at least not easily. the object-server will start responding 404 as soon as the expiration time has passed. you could do something with x-backend-replication:true and internal clients... but it's a little tricky and operator-intensive | 21:05 |
indianwhocodes | noted. | 21:06 |
timburke | indianwhocodes, yeah -- the way i'd imagine this working would be to have one client-facing header, then translate it to something else (possibly even just piggy-backing off the existing x-backend-replication header) when talking to the backend | 21:07 |
timburke | we've got a couple pieces of work started: first, allowing an intentional delay in the expirer processing queue entries | 21:07 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874806 | 21:07 |
timburke | and second, a client-facing API for retrieving expired data | 21:08 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874710 | 21:08 |
indianwhocodes | imo, grace_period sounds a bit weird | 21:09 |
mattoliver | oh yeah, x-backend-replication: true basically allows you to get the data, so translating a user header into this makes sense | 21:10 |
kota | ic | 21:11 |
timburke | i think there are two main (possibly related) questions: (1) does this seem like a reasonable feature? i.e., can we see this merging upstream as opposed to just being something we (nvidia) carry? | 21:11 |
timburke | and (2) what level of access would we want to require for this feature? reseller admin? swift owner? probably not all authed users; certainly not anonymous users | 21:12 |
timburke | i don't think we're likely to answer either of those this week, but i wanted to put them out there and draw attention to the patches so we can talk more about them next week or at the vPTG | 21:14 |
kota | good perspective | 21:14 |
mattoliver | I think it could be a useful tool to have. I mean, in an ideal world you'd just set a better X-Delete-At if you want to access it after that. | 21:14 |
mattoliver | and so long as people don't then expect a grace period on other objects that they could undelete. | 21:15 |
mattoliver | But the fact is we have a user who has the need, and that goes a long way | 21:15 |
mattoliver | I feel like it could be a good enhancement to expired objects and could be used to determine if it's been reclaimed (rather than looking at the timestamps of the request): a 404 with x-open-expired would mean it's gone | 21:16 |
mattoliver | and something a client could deal with | 21:16 |
kota | still from an operator perspective, could it be developed as a middleware? if it were a pluggable feature, like a debugging tool, then either way (maintained upstream or not) may be fine. my feeling. | 21:17 |
timburke | fwiw, i feel like the delay is not *so* far off from an operator deciding to just turn off expirers for a while -- but better in some significant ways | 21:18 |
timburke | kota, yes, absolutely -- at least for the retrieval side of things | 21:18 |
mattoliver | interesting kota, if it was a small middleware we could have an option to restrict it to an admin or authed user etc. | 21:19 |
timburke | (maybe this *is* going to be answerable this week :-) | 21:19 |
mattoliver | basically all it would do is look for x-open-expired (or whatever) and convert it to x-backend-replication: true | 21:19 |
mattoliver | and we can check to see if it was an admin request or not (if we enable that option). | 21:20 |
indianwhocodes | I think exposing it as a client api is enough, but of course my wsgi middleware knowledge is not on par | 21:21 |
mattoliver | it would be a small middleware, but allows opt in.. hmm | 21:21 |
timburke | i've got a few more topics, so i think i'll keep us moving | 21:21 |
mattoliver | Also something indianwhocodes can sink his teeth into. | 21:21 |
mattoliver | indianwhocodes: most of our apis are middlewares | 21:21 |
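As a rough sketch of the middleware idea mattoliver describes above -- the client-facing header name, the config option, and the admin check are all illustrative, not a merged design:

```python
from swift.common.swob import Request
from swift.common.utils import config_true_value


class OpenExpiredMiddleware(object):
    """Translate a client-facing X-Open-Expired header into the existing
    X-Backend-Replication backend header, optionally only for privileged
    requests."""

    def __init__(self, app, conf):
        self.app = app
        self.admin_only = config_true_value(conf.get('admin_only', 'true'))

    def __call__(self, env, start_response):
        req = Request(env)
        if config_true_value(req.headers.get('X-Open-Expired')):
            # Optionally restrict the feature to reseller-admin requests.
            if not self.admin_only or req.environ.get('reseller_request'):
                req.headers['X-Backend-Replication'] = 'true'
        return self.app(env, start_response)


def filter_factory(global_conf, **local_conf):
    conf = dict(global_conf, **local_conf)

    def factory(app):
        return OpenExpiredMiddleware(app, conf)
    return factory
```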
timburke | #topic ssyncing data with offsets and meta without | 21:21 |
mattoliver | oh this was an interesting bug | 21:23 |
timburke | we recently discovered an issue with syncing .data files that have offsets (specifically, because of object versioning, though we expect this would happen with reconciled data, too) that *also* have .meta on them that *do not* | 21:23 |
timburke | #link https://launchpad.net/bugs/2007643 | 21:23 |
timburke | acoles did a great job diagnosing the issue, writing up the bug, and coming up with a fix | 21:23 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874122 | 21:23 |
zaitcev | As always. | 21:24 |
timburke | clayg even wrote up a probe test | 21:24 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874330 | 21:24 |
timburke | and i've got this itch to make sure we can also ssync .metas that have offsets | 21:24 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/874184 | 21:25 |
timburke | i don't think there's too much to discuss on the issue, but wanted to call it out as a recent body of work | 21:25 |
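For context, internalized Swift timestamps carry an offset as a 16-hex-digit suffix; a quick illustration (values made up) of what the ssync fix needs to round-trip for the meta/ctype timestamps:

```python
from swift.common.utils import Timestamp

t_plain = Timestamp(1677100000.12345)
t_offset = Timestamp(1677100000.12345, offset=1)
print(t_plain.internal)   # 1677100000.12345
print(t_offset.internal)  # 1677100000.12345_0000000000000001
```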
timburke | #topic http keepalive | 21:26 |
mattoliver | yeah, interesting bug that needed just the right combination of ssync, posts, versioned objects. Great work all of you! | 21:26 |
mattoliver | I'll go give a review to what needs it seeing as I wasn't involved | 21:27 |
timburke | some recent experiments showed that constraining max_clients could have a decent benefit to time to first byte latencies (and maybe overall performance? i forget) | 21:27 |
timburke | but it uncovered an annoying issue with our clients | 21:28 |
timburke | sometimes, a client would hold on to an idle connection for a while, just in case it decided to make another request. usually, this isn't much of a problem -- with high max_clients, we can keep some greenthreads watching the idle connection, no trouble | 21:29 |
mattoliver | yeah, we were trampolining on too many eventlet coroutines when we had too large a max_clients.. at least with our clients, workflows and hardware SKUs, tuning it down helped us... it was an interesting deep dive.. but thankfully our PTL is also an eventlet maintainer :) | 21:30 |
timburke | but with the constrained max_clients, we could find ourselves with all available greenthreads waiting on idle connections while new connections stayed in accept queues or waited to be handed off | 21:31 |
timburke | one of the frustrating things was that from swift's perspective, our TTFB still looked good -- but not from a client perspective :-( | 21:32 |
zaitcev | Ugh. | 21:32 |
mattoliver | yeah, that damn accept queue on the listening socket meant we hadn't accepted the connection yet, so our timers didn't start.. but the clients' had | 21:32 |
timburke | one option was to turn down client_timeout -- iirc we were running with the default 60s, which meant the idle connection would linger a pretty long time | 21:34 |
timburke | but turning it down too much would lead to increased 499s/408s as it looks like the client timed out during request processing | 21:34 |
timburke | i've been working on an eventlet patch to add a separate timeout for reading the start of the next request, and temoto seems on board | 21:35 |
timburke | #link https://github.com/eventlet/eventlet/pull/788 | 21:35 |
timburke | but i've also been trying to do a similar thing purely in swift | 21:36 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/873744 | 21:36 |
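To make the idea concrete, here's a toy sketch (not what either of the patches above actually does) of giving an idle keep-alive connection a shorter timeout while waiting for the start of the next request, then reverting to the normal client_timeout once bytes arrive; the option names and helper are hypothetical, and the socket is assumed to be an eventlet green socket:

```python
import socket
import eventlet

# Hypothetical option names, for illustration only.
CLIENT_TIMEOUT = 60.0          # normal timeout while a request is in flight
KEEPALIVE_IDLE_TIMEOUT = 5.0   # shorter timeout while waiting for the next request


def wait_for_next_request(sock):
    """Peek for the first byte of the next request on an idle keep-alive
    connection; give up quickly if the client just sits on the socket."""
    try:
        with eventlet.Timeout(KEEPALIVE_IDLE_TIMEOUT):
            first = sock.recv(1, socket.MSG_PEEK)
    except eventlet.Timeout:
        return None  # idle too long -- caller should close the connection
    if not first:
        return None  # client closed its end
    # A request has started; read it under the normal client timeout.
    with eventlet.Timeout(CLIENT_TIMEOUT):
        return sock.recv(65536)
```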
mattoliver | I managed to recreate the problem in a VSAIO with a bad client, and was able to "fix" it by limiting the wsgi server to HTTP/1.0, basically disconnecting after each request.. which isn't ideal but worked. your fix does this but better: break an idle connection (after a request, while waiting for a new one) after a given amount of time, allowing these bad clients to get disconnected if they're hogging a connection and not using it. | 21:36 |
timburke | i wanted to offer that background on the patch (since otherwise it seems a little crazy) | 21:37 |
mattoliver | I think for those patches background is the key :) | 21:38 |
timburke | and solicit some feedback on one point: if/when the eventlet patch merges, should i change the other one to just plumb in the timeout from config and call out that you need a fairly new eventlet? or continue setting the timeout so it works with old eventlet? | 21:39 |
mattoliver | oh, interesting. I guess it depends on how many others are seeing this bad client behaviour. If people aren't noticing maybe the former as it's less code for us to carry? | 21:40 |
mattoliver | or is it something we want to backport. | 21:40 |
mattoliver | if it is, then the latter, maybe at least until the next release is EOLed? | 21:41 |
mattoliver | (just thinking outloud) | 21:41 |
timburke | mattoliver, former was kind of my feeling, too -- it'll require that we (nvidia) remember to upgrade eventlet before the patch stops setting timeouts, but that should be fine | 21:42 |
mattoliver | yeah | 21:42 |
timburke | i don't expect this to be something to backport | 21:42 |
timburke | #topic per-policy quotas | 21:43 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/861282 | 21:43 |
mattoliver | the swift patch does do a little bit of timeout swapping that's harder to grok.. so great workaround, but I vote for newer eventlet long term | 21:43 |
timburke | i'm pretty sure i mentioned this patch not *so* long ago, but wanted to call out that the pre-reqs have seen some updates recently | 21:44 |
timburke | i'd still love to be able to use my nice all-flash policy without worrying about it filling up and causing problems :-) | 21:45 |
timburke | #topic vPTG | 21:46 |
timburke | so... i forgot to actually book rooms 😱 | 21:46 |
mattoliver | lol | 21:46 |
kota | wow | 21:46 |
timburke | i'll follow up with appropriate parties to make sure we have something, though -- don't expect it'll be a problem | 21:46 |
timburke | i'll stick with the same slots we've been using the past few vPTGs | 21:47 |
mattoliver | yeah, get what you can, I can always just drink a lot of coffee :P | 21:47 |
kota | ok. nps | 21:47 |
mattoliver | if worse comes to worst | 21:47 |
mattoliver | I've put some stuff in the topics etherpad | 21:48 |
timburke | remember how i said "everything just feels harder than it should be"? this is part of "everything" :P | 21:48 |
timburke | thanks mattoliver! | 21:48 |
mattoliver | #link https://etherpad.opendev.org/p/swift-ptg-bobcat | 21:48 |
mattoliver | lol | 21:48 |
mattoliver | Still probably missing a bunch there. All the topics we talked about today, if they haven't landed, might be interesting discussions. | 21:49 |
timburke | that's all i've got | 21:49 |
timburke | #topic open discussion | 21:49 |
mattoliver | other things people are working on, or have got languishing | 21:49 |
timburke | what else should we discuss this week? | 21:49 |
mattoliver | We've had more stuck shards, but we've already merged the patch that should stop the edge case. | 21:50 |
mattoliver | I've since also been opening the lids on sharding container-update and async pending stuff | 21:50 |
mattoliver | Done some brainstorming about what we can improve when a root is under too high a load.. | 21:51 |
mattoliver | But instead of putting it in here, I just wrote it up in the PTG etherpad... so if anyone is interested you can look / comment there | 21:51 |
mattoliver | or wait until the PTG and I might have some more POC and benchmarking done. | 21:52 |
mattoliver | basically object-updater memcache support, maybe a container-update header to the object-server to not do a container-update and just write directly to an async pending as a hold-off procedure. | 21:53 |
timburke | ooh, good thought... | 21:54 |
mattoliver | Also built a beefy SAIO on an object-server-SKU'd dev box, and I plan on seeing how my sharding-pending-files patch is doing, to see if it too will help in this area from the other side.. then I'll put it under load... but let's see how far I can get before the PTG. | 21:56 |
mattoliver | (probably need to use an A/C-SKU'd box; real h/w is a good next step). | 21:56 |
timburke | nice | 21:56 |
timburke | all right, i think i'll call it | 21:57 |
timburke | thank you all for coming, and thank you for working on swift! | 21:57 |
timburke | #endmeeting | 21:57 |
opendevmeet | Meeting ended Wed Feb 22 21:57:19 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.html | 21:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.txt | 21:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2023/swift.2023-02-22-21.00.log.html | 21:57 |
opendevreview | Tim Burke proposed openstack/swift master: Fix docstring regarding private method https://review.opendev.org/c/openstack/swift/+/874816 | 23:38 |