xrkr | acoles: mattoliver: DHE: i have tried db vacuum.. the db file got reduced only from 74G to 68G. | 00:45 |
---|---|---|
mattoliver | well it was a little bloated then, but still that's a big container! | 00:55 |
mattoliver | If the container is still too big to really fit on any of your disks, then sharding would probably be your answer. That'll break it down into multiple smaller dbs that will be spread across the cluster. | 00:57 |
xrkr | Hi mattoliver: Any guidance on sharding pls. | 03:18 |
xrkr | acoles: mattoliver: DHE: I also see db_preallocation = True in my [container-updater] [container-auditor] [container-server] sections.. Would changing this to false help? | 04:07 |
mattoliver | xrkr: A good place to start for info at least is: https://docs.openstack.org/swift/latest/overview_container_sharding.html But we can help you get it going :) | 05:33 |
mattoliver | There is an auto-sharding option in the sharder, but this ISN'T supported or ready for prod use yet. Basically it allows the sharder to do leader election, scan for big containers, insert shardranges (points to shard at) and do the sharding.. one day we'd be able to just turn this on (or better, always have it on by default). But we can't do this yet. (I'm working on it). | 05:35 |
mattoliver | So instead at the moment it's operator driven. | 05:35 |
mattoliver | Firstly you need to be running the sharder daemons on your container servers. When they run, they actually drop recon data about your biggest (And best) candidate DBs for sharding. | 05:36 |
mattoliver | You can then go to ONLY one of the replicas of the container you want to shard and use the swift-manage-shard-ranges operator tool to find and insert the shardranges into the container and enable it for sharding. | 05:37 |
mattoliver | Using this tool, you can decide how you want to split it up.. in what size chunks. 1 million object rows. 500k :shrug:. | 05:38 |
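For anyone following along, a minimal sketch of the operator step mattoliver describes, run against one replica's DB; the path is a placeholder and 500000 rows per shard is only an example size, so check the sharding doc linked above and `swift-manage-shard-ranges --help` before relying on it:

```console
# on ONE container server that holds a replica of the oversized container:
# scan the DB, insert shard ranges of ~500k object rows each, and enable it for sharding
swift-manage-shard-ranges /path/to/<container_hash>.db find_and_replace 500000 --enable
```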
xrkr | Thanks mattoliver: Let me get this checked. How about my comment on db_preallocation ? Would that work? | 05:43 |
mattoliver | Once you've enabled the container, the sharder will, when it next visits it, see the shardranges the tool inserted and start sharding (splitting) the container up. How many it splits at a time (cleave_batch_size) is configurable in the sharder config. If you want it to happen faster you can set this to a higher number (or to the number of shardranges inserted, to do it all at once). | 05:44 |
mattoliver | Here's more information on the sharder settings: https://github.com/openstack/swift/blob/master/etc/container-server.conf-sample#L374 | 05:44 |
mattoliver | but again we can walk you through if you need help. | 05:45 |
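A sketch of the sharder settings being referenced, based on the sample config mattoliver links above; the values shown are believed to be the shipped defaults but are worth verifying against that file:

```ini
[container-sharder]
# object-row count at which a container is reported as a sharding candidate
shard_container_threshold = 1000000
# how many shard ranges are cleaved (split off) per sharder visit; raise it to shard faster
cleave_batch_size = 2
# the not-yet-production-ready automation mattoliver mentions; leave it off for now
auto_shard = false
```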
mattoliver | xrkr: db_preallocation is to allocate space for an expanding db. If there isn't enough room on the server you're syncing a new db to, I don't think it'll do anything. | 05:55 |
xrkr | Ok. Thanks mattoliver: I shall revert | 06:15 |
*** avanzaghi11 is now known as avanzaghi1 | 08:15 | |
acoles | xrkr: db_preallocation = False will, I think, help you get around the immediate problem because IIUC the server is currently trying and failing to preallocate ~68G. But longer term you may want to think about sharding that container. | 09:01 |
xrkr | Hi acoles: mattoliver thinks otherwise, however I am just thinking for my 3 site cluster, I do notice this db exists on one of 3 servers in each site. I was just wondering if I can create another temp configuration file and start the container server process just on those nodes where this db exists. I am unsure if this works though (provided db_preallocation = false will help us at all). | 11:58 |
mattoliver | my reading of the preallocation code is that it uses fallocate to pre-reserve some space for expansions, and it does it in steps. Turning it to false would mean it would not require as much space at the outset (putting space aside). So it does make things better.. but only if there is enough room on the remote device to actually house the DB. And I was under the impression you didn't have room to replicate. | 12:46 |
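To make the suggestion concrete, a sketch of the edit being discussed, reusing the section names xrkr listed at 04:07 (exact section placement may differ per deployment); the container services would need a restart for it to take effect:

```ini
# container-server.conf: stop pre-reserving space with fallocate for growing DBs
[container-server]
db_preallocation = false

[container-updater]
db_preallocation = false

[container-auditor]
db_preallocation = false
```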
acoles | @mattoliver IIUC the problem xrkr has is that the DB can't pre-allocate in situ, so container listing is failing because fallocate fails when the pending file is flushed to the DB | 12:49 |
opendevreview | Alistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code https://review.opendev.org/c/openstack/swift/+/925993 | 14:11 |
opendevreview | Alistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404 https://review.opendev.org/c/openstack/swift/+/926220 | 14:11 |
opendevreview | Elod Illes proposed openstack/python-swiftclient master: DNM: gate health test https://review.opendev.org/c/openstack/python-swiftclient/+/926336 | 16:10 |
opendevreview | Alistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code https://review.opendev.org/c/openstack/swift/+/925993 | 16:42 |
opendevreview | Alistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404 https://review.opendev.org/c/openstack/swift/+/926220 | 16:42 |
opendevreview | Tim Burke proposed openstack/liberasurecode master: Release 1.6.4 https://review.opendev.org/c/openstack/liberasurecode/+/917784 | 16:52 |
opendevreview | Merged openstack/liberasurecode master: Release 1.6.4 https://review.opendev.org/c/openstack/liberasurecode/+/917784 | 17:37 |
opendevreview | Clay Gerrard proposed openstack/swift master: wip: add some test infra https://review.opendev.org/c/openstack/swift/+/926349 | 17:42 |
mattoliver | @acoles: oh sorry, didn't realise it was that bad, I must've missed that part in scrollback! Yeah xrkr turn it off! | 20:25 |
mattoliver | Morning, I wonder if we're meeting today? Maybe I've just forgotten if it got cancelled 🙃 | 21:07 |
fulecorafa | Here wondering too | 21:07 |
acoles | o/ | 21:08 |
mattoliver | Is timburke around today? | 21:09 |
timburke | oh, right! meeting! | 21:09 |
timburke | sorry everybody | 21:09 |
timburke | #startmeeting swift | 21:09 |
opendevmeet | Meeting started Wed Aug 14 21:09:50 2024 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:09 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:09 |
opendevmeet | The meeting name has been set to 'swift' | 21:09 |
timburke | who's here for the swift meeting? | 21:09 |
mattoliver | o/ | 21:10 |
fulecorafa | o/ | 21:10 |
acoles | \o | 21:10 |
acoles | I'm just here til 30 after the hour | 21:10 |
timburke | sorry for the delay -- between kids being home for this last week before going back to school and me getting nerd-sniped by some performance investigation, i'm not well prepared today ;-) | 21:11 |
mattoliver | lol, understandable! | 21:11 |
timburke | acoles, since you've got to cut out early anyway, is there anything you'd like to bring up? | 21:12 |
acoles | timburke: the patch to deprecate log_statsd_* options... | 21:13 |
acoles | https://review.opendev.org/c/openstack/swift/+/922518/ | 21:13 |
patch-bot | patch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets | 21:13 |
acoles | clayg: made some good observations about the decision to blow up if both the old and new options were found in conf: | 21:14 |
acoles | 1. that seems too harsh if the old and new options have the same value! | 21:14 |
timburke | fair | 21:15 |
acoles | 2. that prevents ops adding config options *in anticipation* of upgrade / problematic if a node isn't upgraded | 21:15 |
timburke | but then why move off the old options at all? suffer the warning until you can move with confidence! | 21:16 |
acoles | 3. with conf.d style it's easy to miss that there's a legacy option still in there (this would be mitigated by tolerating same value) | 21:18 |
acoles | we noted that if we turned it down to just a warning, then worst case some stats stop appearing and ops check logs and see warning and go fix conf | 21:18 |
acoles | anyways, I wanted to solicit more opinion... | 21:19 |
timburke | eh, it's certainly not a hill i'm looking to die on | 21:20 |
acoles | I certainly think that tolerating old-and-new-but-same-value without blowing up seems reasonable | 21:20 |
acoles | hehe, I'm not sure anyone wants to dis on this particular hill ;-) | 21:20 |
timburke | tolerating both, even with different values seems perfectly in line with other times that we've tried to push for renaming options | 21:20 |
acoles | we'd obvs make the warning pretty clear - "you gave me x and y and I'm ignoring y" | 21:21 |
mattoliver | yeah, supporting new but falling back to old seems the normal migration path. With warnings, that makes sense to me. But maybe I'm missing something obvious (it is early) :P | 21:22 |
acoles | the context is this comment https://review.opendev.org/c/openstack/swift/+/922518/comment/56eb874d_9707e7c5/ | 21:23 |
patch-bot | patch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets | 21:23 |
acoles | we're concerned that an op might update to new option in DEFAULT and not realise it overrides what was previously a different value in proxy-logging | 21:24 |
acoles | which led us to "blow up if old and new are present" ... but now reflecting on whether that is too brittle | 21:25 |
timburke | i think my perspective when thinking about going for an error was: if you've got log_statsd_host=foo, log_statsd_port=12345, statsd_host=bar -- having some but not all options coming from the correct place seems likely to be a misconfiguration | 21:25 |
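Rendered as config, the mixed case timburke describes might look like this (values illustrative, new-style option name per the patch under review):

```ini
[DEFAULT]
# legacy statsd options, still present
log_statsd_host = foo
log_statsd_port = 12345
# new-style name adopted for the host only -- host and port now disagree about
# which naming scheme is authoritative: warn, or refuse to start?
statsd_host = bar
```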
acoles | yup, but does misconfiguration result in warning or ValueError ??? | 21:26 |
mattoliver | oh I see. But if they know they're moving to the new (and have done so in DEFAULT) isn't it on them to finish the job.. I think a warning about the settings is enough to alert them. | 21:27 |
mattoliver | But I guess it depends on where we think metrics sit on importance in swift | 21:27 |
mattoliver | if it's critical then it should ValueError | 21:27 |
mattoliver | but it seems kinda optional (which may be wrong) so a warning is enough | 21:28 |
timburke | maybe my real concern is that i want a statsd_endpoint config option that looks like host:port -- which is definitely out of scope :P | 21:29 |
mattoliver | Downstream our metrics are super important, so I'd assume ValueError would be more useful, because we may not check/see warning for some time. | 21:29 |
mattoliver | so yeah, I think I finally clicked onto the dilemma :P | 21:30 |
timburke | eh, the warning's probably fine | 21:31 |
timburke | i've probably been overthinking it | 21:31 |
acoles | worth noting that downstream, with ValueError approach, we might have to do *work* (i.e. rip stuff out of controller) before we could use new metrics via ansible conf.d, so we might be stuck with legacy for a while | 21:31 |
mattoliver | gut still feels like a warning, as that's more in line with historic swift | 21:31 |
acoles | tell you what - I'll spin a warning only version and we can go back and forth more on gerrit - I've already used half the meeting time :) | 21:33 |
mattoliver | maybe we need a config check tool.. maybe swift-config can do the audit | 21:33 |
timburke | no worries | 21:33 |
mattoliver | +1 | 21:33 |
timburke | ok, next up | 21:34 |
mattoliver | get swift-config to print dep warnings and tell users they should always run new config through swift-config :) | 21:34 |
timburke | oh, i think i like that... we should probably use that tool more | 21:35 |
timburke | i realized that we're getting toward the end of the dalmatian cycle | 21:35 |
timburke | #link https://releases.openstack.org/dalmatian/schedule.html | 21:35 |
timburke | i need to be getting a client release together soonish | 21:36 |
mattoliver | kk, let's fire up priority reviews page :) | 21:36 |
timburke | i *did* finally get the liberasurecode release out that i'd meant to do months ago at least :-) | 21:36 |
mattoliver | \o/ | 21:37 |
timburke | and within the next couple weeks or so we should get a swift release out | 21:37 |
mattoliver | Then I want to make sure the account-quotas follow up lands before then. | 21:38 |
timburke | +1 | 21:38 |
timburke | i should take another look at that, too | 21:38 |
mattoliver | me 3 | 21:39 |
timburke | we probably ought to make a decision on https://review.opendev.org/c/openstack/swift/+/924795 by then, too -- either remove now, or add the warnings before release | 21:39 |
patch-bot | patch 924795 - swift - Remove legacy bin/ scripts - 3 patch sets | 21:39 |
mattoliver | yes. I added my vote ;) | 21:40 |
timburke | i saw! thanks | 21:40 |
mattoliver | maybe we give it a time limit and count the +/- 1's | 21:40 |
timburke | i'm sad we still haven't been able to feel comfortable merging https://review.opendev.org/c/openstack/swift/+/853590 | 21:41 |
patch-bot | patch 853590 - swift - Drop py2 support - 15 patch sets | 21:41 |
mattoliver | say by next meeting. | 21:41 |
mattoliver | ohh, maybe we should.. or is that rocking the boat too much and too quickly :P | 21:42 |
mattoliver | or do we merge that just after this release.. and that's the line? | 21:42 |
timburke | my main concern is similar to acoles's on the config warning/error question -- would it just be setting us up to have to do work downstream to work around it... | 21:43 |
mattoliver | well yeah, but that's downstream, and maybe the kick in the pants we need :P | 21:43 |
timburke | i suppose we could always merge it, then immediately propose a revert, which we can carry as needed :P | 21:45 |
mattoliver | lol | 21:45 |
mattoliver | well we have a few weeks at the most, let's probe downstream and see where the current blockers are.. but it would be nice to not worry about py2 anymore. I don't think anyone upstream in the community other than us has dep issues on py2 code in an older codebase | 21:46 |
timburke | no, i don't believe so either | 21:47 |
timburke | next ptg, we should have a straw poll on oldest python version that should still be supported by master -- because i think we could probably drop py36(and maybe even py37/py38?) as well | 21:48 |
mattoliver | k, let's put a pin in it for this meeting and re-discuss next time, when we might have some more data. | 21:48 |
mattoliver | oh good idea | 21:49 |
timburke | all right, i think those are the main things i've got | 21:49 |
timburke | anything else we should bring up this week? | 21:49 |
fulecorafa | If I may | 21:49 |
zaitcev | Enumerate the distros you're still willing to support, find what Python they ship, and there's your answer. | 21:49 |
timburke | fulecorafa, go ahead! | 21:50 |
fulecorafa | We're having some problems with some users either deleting enormous files or deleting a large quantity of them. Essentially any object deletion that takes a long time to resolve the HTTP request | 21:50 |
fulecorafa | From what I've tested, it seems like it is a simple problem of connection timeout because the operation takes a long time | 21:51 |
fulecorafa | However, I think this should open the possibility of making deletions async | 21:51 |
mattoliver | yeah interesting. async deletion. I guess the question is what status code do you get. | 21:52 |
mattoliver | did build delete ever get the keep-alive heartbeat love. | 21:52 |
mattoliver | *bulk delete | 21:52 |
timburke | 202 Accepted -- good for so many things :D | 21:53 |
fulecorafa | For the actual implementation, I think I would go in acoles' direction and make deletion markers. Although I remember there is something similar already there, even though I didn't find it in the repo lately... | 21:53 |
mattoliver | well maybe passing in a query arg to indicate an async delete might be ok | 21:53 |
zaitcev | As a workaround, could you do a bulk delete with just 1 entry? | 21:54 |
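For reference, a sketch of what zaitcev's workaround might look like on the Swift API (endpoint, token and object name are placeholders; the ?bulk-delete form comes from the bulk middleware, which as noted below may not be reachable through s3api):

```console
# "bulk" delete with a single entry, so the proxy can stream the response while it works;
# the body is a newline-separated list of /container/object paths
curl -X POST "https://swift.example.com/v1/AUTH_test?bulk-delete" \
     -H "X-Auth-Token: $TOKEN" -H "Content-Type: text/plain" \
     --data-binary "/my-container/my-huge-object"
```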
fulecorafa | That is an idea zaitcev and mattoliver, didn't try that yet | 21:54 |
mattoliver | yeah possibly | 21:54 |
timburke | mattoliver, i'd forgotten that heartbeating was opt-in -- but yeah, pretty sure bulk delete and slo both have it now | 21:54 |
fulecorafa | Will check the possibility | 21:54 |
timburke | fulecorafa, are your users using swift or s3 api? or both? | 21:54 |
fulecorafa | s3api mostly | 21:54 |
zaitcev | oh | 21:55 |
fulecorafa | Yep | 21:55 |
zaitcev | I'm not sure if ?bulk-delete is accessible through S3. | 21:55 |
zaitcev | Sorry I just wanted to get your users going while we're tinkering. | 21:55 |
mattoliver | otherwise I don't mind the idea of async, just need to think about how: drop a tombstone and wait for something to clean it up, or drop it into the expirer queue | 21:55 |
fulecorafa | There is a multi-delete controller, but we're having problem with that too | 21:55 |
timburke | oh, no! i was wrong about bulk -- and was in fact thinking about slo PUTs! https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L95-L100 | 21:56 |
fulecorafa | One idea I was having, so that we could keep backwards support for today's s3api | 21:56 |
mattoliver | timburke: yeah I was wondering if that was something I was going to add to the summer school students and why it was in my head :P | 21:57 |
fulecorafa | You could send a combination of config and query param requesting async, adding async support to the controllers where we want it. The configuration sets the default behaviour, while the query param overrides it | 21:57 |
mattoliver | i believe there is a feature request for it | 21:57 |
mattoliver | fulecorafa: sounds like we're on board, maybe we need to write up something (bug/feature request, or wait until next meeting to discuss further) so we can continue the discussion async | 21:59 |
mattoliver | see what I did there :P | 21:59 |
mattoliver | but think it's a great idea, and useful feature as objects can get pretty big :) | 21:59 |
fulecorafa | Thanks mattoliver. Wanted to be sure this was not available today. Since it is a nice touch, I will open a feature request for that soon then | 22:00 |
mattoliver | ta | 22:00 |
mattoliver | I think we're at time | 22:00 |
mattoliver | I did want to mention that I might have a patch to solve the early-active issue we see in getting auto shrinking happening in sharding: https://review.opendev.org/c/openstack/swift/+/926036 | 22:01 |
patch-bot | patch 926036 - swift - ShardRange: track last state change timestamp - 3 patch sets | 22:01 |
timburke | fulecorafa, you said it happens when deleting enormous files -- is allow_async_delete configured for slo? it should default to on which is what you'd want https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L1114 | 22:01 |
timburke | s3api should be trying to use that functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/s3request.py#L1518-L1535 | 22:02 |
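For reference, a sketch of the pieces timburke points at: the proxy-side SLO option plus the query-parameter form of the async delete (parameter name as recalled from the SLO middleware, so worth confirming against the linked source):

```ini
# proxy-server.conf
[filter:slo]
# lets a manifest DELETE return quickly and hand segment cleanup to the object expirer;
# s3api multipart deletes rely on this being enabled
allow_async_delete = true

# client-side equivalent on the Swift API (shown as a comment since this block is config):
# DELETE /v1/AUTH_test/container/manifest?multipart-manifest=delete&async=yes
```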
mattoliver | oh good call, I guess they probably are SLOs | 22:03 |
fulecorafa | Thx timburke, I didn't check that; either I didn't remember it, or we're on an old version where it didn't appear to me. | 22:03 |
timburke | oh, and bulk *always* wants to do that, but the option is called yield_frequency https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L1054-L1057 | 22:04 |
timburke | but i don't think s3api's bulk-delete-equivalent uses that | 22:05 |
fulecorafa | It doesn't ;-; | 22:06 |
timburke | the complete-multipart-upload code might be a useful starting point for similar functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/controllers/multi_upload.py#L788-L818 | 22:07 |
* mattoliver needs to go wrangle kids and get them ready for school. So I gotta drop. | 22:07 | |
timburke | all right, mattoliver's right, we're past time now -- i should wrap up | 22:08 |
timburke | thank you all for coming, and thank you for working on swift! | 22:08 |
timburke | #endmeeting | 22:08 |
opendevmeet | Meeting ended Wed Aug 14 22:08:26 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 22:08 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.html | 22:08 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.txt | 22:08 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.log.html | 22:08 |
timburke | i'd forgotten how all-async-deleted-segments-must-be-in-one-container was a self-imposed restriction: https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L1768-L1775 | 23:23 |
timburke | i think i was worried about the authorize callback potentially being somewhat expensive? | 23:24 |
timburke | or i was just trying to descope to cover only what was strictly needed for s3api | 23:24 |