Wednesday, 2024-08-14

xrkracoles: mattoliver: DHE: i have tried db vacuum.. the db file got reduced only from 74G to 68G. 00:45
mattoliverwell it was a little bloated then, but still that's a big container!00:55
mattoliverIf the container is still too big to really fit on any of your disks, then sharding would probably be your answer. That'll break it down into multiple smaller dbs that will be spread across the cluster.00:57
xrkrHi mattoliver: Any guidance on sharding pls. 03:18
xrkracoles: mattoliver: DHE: I also see db_preallocation = True in my [container-updater] [container-auditor] [container-server].. Changing this to false would help?04:07
mattoliverxrkr: A good place to start for info at least is: https://docs.openstack.org/swift/latest/overview_container_sharding.html But we can help you get it going :)05:33
mattoliverThere is an auto-sharding option in the sharder, this ISN'T supported or ready for prod use yet. But basically allows the sharder to do leader election, scan for big containers, insert shardranges (points to shard at) and does the sharding.. one day we'd be able to just turn this on (or better, always have it on by default). But we can't do this yet. (I'm working on it).05:35
mattoliverSo instead at the moment it's operator driven.05:35
mattoliverFirstly you need to be running the sharder daemons on your container servers. When they run, they actually drop recon data about your biggest (And best) candidate DBs for sharding. 05:36
mattoliverYou can then go to ONLY one of the replicas of the container you want to shard and use the swift-manage-shard-ranges operator tool to find and insert the shardranges into the container and enable it for sharding.05:37
mattoliverUsing this tool, you can decide how you want to split it up.. in what size chunks. 1 million object rows. 500k :shrug:. 05:38
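For readers following along, a rough sketch of that operator workflow (the paths, account/container names and the rows-per-shard number are placeholders; check swift-manage-shard-ranges --help on your Swift version for the exact subcommands and defaults):

    # find which servers/devices hold replicas of the big container's DB
    swift-get-nodes /etc/swift/container.ring.gz AUTH_myaccount mybigcontainer

    # then, on ONLY ONE of those replicas:
    CONTAINER_DB=/srv/node/sdb1/containers/<part>/<suffix>/<hash>/<hash>.db   # hypothetical path

    # inspect the DB's current sharding state and any existing shard ranges
    swift-manage-shard-ranges "$CONTAINER_DB" info

    # scan the DB, insert shard ranges of roughly 500k object rows each,
    # and enable the container for sharding in one step
    swift-manage-shard-ranges "$CONTAINER_DB" find_and_replace 500000 --enable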
xrkrThanks mattoliver: Let me get this checked. How about my comment on db_preallocation ? Would that work?05:43
mattoliverOnce you've enabled the container, the sharder when it next visits it will see the shardranges the tool inserted and start sharding (splitting) the container up. How many it splits at a time (cleave_batch_size) is configurable in the sharder config. If you want it to happen faster you can set this to a higher number (or the number of shardranges inserted to do it all at once). 05:44
mattoliverHere's more information on the sharder settings: https://github.com/openstack/swift/blob/master/etc/container-server.conf-sample#L374 05:44
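The sharder settings being referred to live in the [container-sharder] section of container-server.conf; a minimal sketch (the values shown are believed to be the sample-config defaults, so verify against the link above for your release):

    [container-sharder]
    # how many shard ranges get cleaved (split out) each time the sharder
    # visits a sharding container; raise this to shard faster
    cleave_batch_size = 2
    # object-row count above which a container is reported as a sharding candidate
    shard_container_threshold = 1000000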
mattoliverbut again we can walk you through if you need help.05:45
mattoliverxrkr: db_preallocation is to allocate space for an expanding db. if there isn't enough room on the server you're syncing a new db to, I don't think it'll do anything.05:55
xrkrOk. Thanks mattoliver: I shall revert06:15
*** avanzaghi11 is now known as avanzaghi108:15
acolesxrkr: db_preallocation = False will, I think, help you get around the immediate problem because IIUC the server is currently trying and failing to preallocate ~68G. But longer term you may want to think about sharding that container.09:01
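For anyone following along, turning the flag off is just a conf change plus a restart of the container services; a sketch mirroring the sections xrkr reported (which section the option is honoured in may vary by release, so check your sample conf):

    # /etc/swift/container-server.conf
    [container-server]
    db_preallocation = false

    [container-updater]
    db_preallocation = false

    [container-auditor]
    db_preallocation = false

    # then restart the container services (e.g. via swift-init) so they re-read the conf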
xrkrHi acoles: mattoliver thinks otherwise, however I am just thinking for my 3 site cluster, I do notice this db exists on one of the 3 servers in each site. I was just wondering if I can create another temp configuration file and start the container server process just on those nodes where this db exists. I am unsure if this works though (provided db_preallocation = false will help us at all).11:58
mattolivermy reading of the preallocation code is that it uses fallocate to pre-reserve some space for expansions. And it does it in steps. Turning it to false would mean it would not require as much space at the outset (putting space aside). So it does make things better.. but only if there is enough room on the remote device to actually house the DB. And I was under the impression you didn't have room to replicate.12:46
acoles@mattoliver IIUC the problem xrkr has is that the DB can't pre-allocate in situ, so container listing is failing because fallocate fails when the pending file is flushed to the DB12:49
opendevreviewAlistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code  https://review.opendev.org/c/openstack/swift/+/92599314:11
opendevreviewAlistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404  https://review.opendev.org/c/openstack/swift/+/92622014:11
opendevreviewElod Illes proposed openstack/python-swiftclient master: DNM: gate health test  https://review.opendev.org/c/openstack/python-swiftclient/+/92633616:10
opendevreviewAlistair Coles proposed openstack/swift master: Ignore 404s from handoffs when choosing response code  https://review.opendev.org/c/openstack/swift/+/92599316:42
opendevreviewAlistair Coles proposed openstack/swift master: object-server POST: return x-backend-timestamp in 404  https://review.opendev.org/c/openstack/swift/+/92622016:42
opendevreviewTim Burke proposed openstack/liberasurecode master: Release 1.6.4  https://review.opendev.org/c/openstack/liberasurecode/+/91778416:52
opendevreviewMerged openstack/liberasurecode master: Release 1.6.4  https://review.opendev.org/c/openstack/liberasurecode/+/91778417:37
opendevreviewClay Gerrard proposed openstack/swift master: wip: add some test infra  https://review.opendev.org/c/openstack/swift/+/92634917:42
mattoliver@acoles: oh sorry, didn't realise it was that bad, I must've missed that part in scrollback! Yeah xrkr turn it off!20:25
mattoliverMorning, I wonder if we're meeting today? Maybe I've just forgotten if it got cancelled 🙃21:07
fulecorafaHere wondering too21:07
acoleso/21:08
mattoliverIs timburke around today?21:09
timburkeoh, right! meeting!21:09
timburkesorry everybody21:09
timburke#startmeeting swift21:09
opendevmeetMeeting started Wed Aug 14 21:09:50 2024 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:09
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:09
opendevmeetThe meeting name has been set to 'swift'21:09
timburkewho's here for the swift meeting?21:09
mattolivero/21:10
fulecorafao/21:10
acoles\o21:10
acolesI'm just here til 30 after the hour21:10
timburkesorry for the delay -- between kids being home for this last week before going back to school and me getting nerd-sniped by some performance investigation, i'm not well prepared today ;-)21:11
mattoliverlol, understandable! 21:11
timburkeacoles, since you've got to cut out early anyway, is there anything you'd like to bring up?21:12
acolestimburke: the patch to deprecate log_statsd_* options...21:13
acoleshttps://review.opendev.org/c/openstack/swift/+/922518/21:13
patch-botpatch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets21:13
acolesclayg: made some good observations about the decision to blow up if both the old and new options were found in conf:21:14
acoles1. that seems too harsh if the old and new options have the same value!21:14
timburkefair21:15
acoles2. that prevents ops adding config options *in anticipation* of upgrade / problematic if a node isn't upgraded21:15
timburkebut then why move off the old options at all? suffer the warning until you can move with confidence!21:16
acoles3. with conf.d style it's easy to miss that there's a legacy option still in there (this would be mitigated by tolerating same value)21:18
acoleswe noted that if we turned it down to just a warning, then worst case some stats stop appearing and ops check logs and see warning and go fix conf21:18
acolesanyways, I wanted to solicit more opinion...21:19
timburkeeh, it's certainly not a hill i'm looking to die on21:20
acolesI certainly think that tolerating old-and-new-but-same-value without blowing up seems reasonable21:20
acoleshehe, I'm not sure anyone wants to dis on this particular hill ;-)21:20
timburketolerating both, even with different values seems perfectly in line with other times that we've tried to push for renaming options21:20
acoleswe'd obvs make the warning pretty clear - "you gave me x and y and I'm ignoring y"21:21
mattoliveryeah supporting new but falling back to old, with warnings, seems like the normal migration path to me. But maybe I'm missing something obvious (it is early) :P 21:22
acolesthe context is this comment https://review.opendev.org/c/openstack/swift/+/922518/comment/56eb874d_9707e7c5/21:23
patch-botpatch 922518 - swift - statsd: deprecate log_ prefix for options - 11 patch sets21:23
acoleswe're concerned that an op might update to new option in DEFAULT and not realise it overrides what was previously a different value in proxy-logging21:24
acoleswhich led us to "blow up if old and new are present" ... but now reflecting on whether that is too brittle21:25
timburkei think my perspective when thinking about going for an error was if you've got log_statsd_host=foo, log_statsd_port=12345, statsd_host=bar -- that having some but not all options coming from the correct place seems likely a misconfiguration21:25
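To make that concrete, the sort of conf the warning-vs-ValueError debate is about would look roughly like this (the new-style option names are per the patch under review, so treat them as provisional):

    [DEFAULT]
    # legacy names still present...
    log_statsd_host = foo
    log_statsd_port = 12345
    # ...alongside a new-style name pointing somewhere else
    statsd_host = bar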
acolesyup, but does misconfiguration result in warning or ValueError ???21:26
mattoliveroh I see. But if they know they're moving to the new (and done so in DEFAULT) isn't it on them to finish the job.. I think a warning about the settings is enough to alert them. 21:27
mattoliverBut I guess it depends on where we think metrics sit on importance in swift21:27
mattoliverif it's critical then it should ValueError21:27
mattoliverbut it seems kinda optional (which maybe wrong) so warning is enough21:28
timburkemaybe my real concern is that i want a statsd_endpoint config option that looks like host:port -- which is definitely out of scope :P21:29
mattoliverDownstream our metrics are super important, so I'd assume ValueError would be more useful, because we may not check/see warning for some time.21:29
mattoliverso yeah, I think I finally clicked onto the dilemma :P 21:30
timburkeeh, the warning's probably fine21:31
timburkei've probably been overthinking it21:31
acolesworth noting that downstream, with ValueError approach, we might have to do *work* (i.e. rip stuff out of controller) before we could use new metrics via ansible conf.d, so we might be stuck with legacy for a while21:31
mattolivergut still feels like a warning, as that's more in line with historic swift21:31
acolestell you what - I'll spin a warning only version and we can go back and forth more on gerrit - I've already used half the meeting time :)21:33
mattolivermaybe we need a config check tool.. maybe swift-config can do the audit 21:33
timburkeno worries21:33
mattoliver+121:33
timburkeok, next up21:34
mattoliverget swift-config to print dep warnings and tell users they should always run new config through swift-config :) 21:34
timburkeoh, i think i like that... we should probably use that tool more21:35
timburkei realized that we're getting toward the end of the dalmatian cycle21:35
timburke#link https://releases.openstack.org/dalmatian/schedule.html21:35
timburkei need to be getting a client release together soonish21:36
mattoliverkk, let's fire up priority reviews page :) 21:36
timburkei *did* finally get the liberasurecode release out that i'd meant to do months ago at least :-)21:36
mattoliver\o/21:37
timburkeand within the next couple weeks or so we should get a swift release out21:37
mattoliverThen I want to make sure the account-quotas follow up lands before then. 21:38
timburke+121:38
timburkei should take another look at that, too21:38
mattoliverme 321:39
timburkewe probably ought to make a decision on https://review.opendev.org/c/openstack/swift/+/924795 by then, too -- either remove now, or add the warnings before release21:39
patch-botpatch 924795 - swift - Remove legacy bin/ scripts - 3 patch sets21:39
mattoliveryes. I added my vote ;) 21:40
timburkei saw! thanks21:40
mattolivermaybe we give it a time limit and count the +/- 1's21:40
timburkei'm sad we still haven't been able to feel comfortable merging https://review.opendev.org/c/openstack/swift/+/85359021:41
patch-botpatch 853590 - swift - Drop py2 support - 15 patch sets21:41
mattoliversay by next meeting. 21:41
mattoliverohh, maybe we should.. or is that rocking the boat too much and too quickly :P 21:42
mattoliveror do we merge that just after this release.. and that's the line?21:42
timburkemy main concern is similar to acoles's on the config warning/error question -- would it just be setting us up to have to do work downstream to work around it...21:43
mattoliverwell yeah, but that's downstream, and maybe the kick in the pants we need :P 21:43
timburkei suppose we could always merge it, then immediately propose a revert, which we can carry as needed :P21:45
mattoliverlol21:45
mattoliverwell we have a few weeks at the most, let's probe downstream and see where the current blockers are.. but it would be nice to not worry about py2 anymore. I don't think anyone upstream in the community other than us has some dep issues on py2 code in an older codebase21:46
timburkeno, i don't believe so either21:47
timburkenext ptg, we should have a straw poll on oldest python version that should still be supported by master -- because i think we could probably drop py36(and maybe even py37/py38?) as well21:48
mattoliverk, let's put a pin in it for this meeting and re-discuss next time when we might have some more data. 21:48
mattoliveroh good idea21:49
timburkeall right, i think those are the main things i've got21:49
timburkeanything else we should bring up this week?21:49
fulecorafaIf I may21:49
zaitcevEnumerate the distros you're still willing to support, find what Python they ship, and there's your answer.21:49
timburkefulecorafa, go ahead!21:50
fulecorafaWe're having some problems with some users either deleting enormous files or deleting a large quantity of them. Essentially any delete objects request that takes some time to resolve the HTTP request21:50
fulecorafaFrom what I've tested, it seems like it is a simple problem of connection timeout because the operation takes a long time21:51
fulecorafaHowever, I think this should open the possibility of making deletions async21:51
mattoliveryeah interesting. async deletion. I guess the question is what status code do you get. 21:52
mattoliverdid build delete ever get the keep-alive heartbeat love. 21:52
mattoliver*bulk delete21:52
timburke202 Accepted -- good for so many things :D21:53
fulecorafaFor the actual implementation, I think I would go acoles direction to make deletion markers. Although I remember there is something similar already there, even though I didn't find it around in the repo lately...21:53
mattoliverwell maybe passing in a query arg to indicate an async delete might be ok21:53
zaitcevAs a workaround, could you do a bulk delete with just 1 entry?21:54
fulecorafaThat is an idea zaitcev and mattoliver, didn't try that yet21:54
mattoliveryeah possibly21:54
timburkemattoliver, i'd forgotten that heartbeating was opt-in -- but yeah, pretty sure bulk delete and slo both have it now21:54
fulecorafaWill check the possibility21:54
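For reference, the workaround zaitcev suggests uses the bulk middleware on the native Swift API: POST with ?bulk-delete and a newline-separated list of paths in the body, which can legitimately contain a single entry. A rough curl sketch, with the endpoint, token and names as placeholders:

    # delete one (large) object through the bulk-delete path; the proxy
    # streams whitespace back while the work proceeds, avoiding client timeouts
    curl -X POST "https://swift.example.com/v1/AUTH_test?bulk-delete" \
         -H "X-Auth-Token: $TOKEN" \
         -H "Content-Type: text/plain" \
         -H "Accept: application/json" \
         --data-binary '/my-container/my-huge-object'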
timburkefulecorafa, are your users using swift or s3 api? or both?21:54
fulecorafas3api mostly21:54
zaitcevoh21:55
fulecorafaYep21:55
zaitcevI'm not sure if ?bulk-delete is accessible through S3.21:55
zaitcevSorry I just wanted to get your users going while we're tinkering.21:55
mattoliverotherwise I don't mind the idea of async, just need to think about how: drop a tombstone and wait for something to clean it up, or drop it into the expirer queue21:55
fulecorafaThere is a multi-delete controller, but we're having problem with that too21:55
timburkeoh, no! i was wrong about bulk -- and was in fact thinking about slo PUTs! https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L95-L10021:56
fulecorafaOne idea I was having, such that we could give backwards support for today's s3api21:56
mattolivertimburke: yeah I was wondering if that was something I was going to add to the summer school students and why it was in my head :P 21:57
fulecorafaYou can send a combination of config and query param requesting async, creating controllers for async where we would want it. The configuration sets the default behaviour, while the query param overrides it21:57
mattoliverI believe there is a feature request for it21:57
mattoliverfulecorafa: sounds like we're on board, maybe we need to write up something (bug/feature request or wait until next meeting to discuss further) so we can continue the discussion async 21:59
mattoliversee what I did there :P 21:59
mattoliverbut think it's a great idea, and useful feature as objects can get pretty big :) 21:59
fulecorafaThanks mattoliver. Wanted to be sure this was not available today. Since it is a nice touch, I will open a feature request for that soon then22:00
mattoliverta22:00
mattoliverI think we're at time 22:00
mattoliverI did want to mention that I might have a patch solving the early-active issue we see when getting auto shrinking happening in sharding: https://review.opendev.org/c/openstack/swift/+/926036 22:01
patch-botpatch 926036 - swift - ShardRange: track last state change timestamp - 3 patch sets22:01
timburkefulecorafa, you said it happens when deleting enormous files -- is allow_async_delete configured for slo? it should default to on which is what you'd want https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L111422:01
timburkes3api should be trying to use that functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/s3request.py#L1518-L153522:02
mattoliveroh good call, I guess they probably are SLOs22:03
fulecorafaThx timburke, I didn't check that, didn't remember it, or we're on an old version where it didn't appear to me. 22:03
timburkeoh, and bulk *always* wants to do that, but the option is called yield_frequency https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L1054-L105722:04
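Putting the two knobs timburke linked side by side, a sketch of the relevant proxy-server.conf sections (the values are believed to be the current defaults; confirm against the sample conf for your release):

    [filter:slo]
    # let SLO DELETEs be handed off to the object expirer instead of deleting
    # every segment inline (roughly DELETE <manifest>?multipart-manifest=delete&async=on,
    # which s3api relies on for multipart-upload deletes)
    allow_async_delete = true

    [filter:bulk]
    # seconds between whitespace bytes sent to keep a long bulk delete's
    # client connection alive
    yield_frequency = 10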
timburkebut i don't think s3api's bulk-delete-equivalent uses that22:05
fulecorafaIt doesn't ;-;22:06
timburkethe complete-multipart-upload code might be a useful starting point for similar functionality: https://github.com/openstack/swift/blob/2.33.0/swift/common/middleware/s3api/controllers/multi_upload.py#L788-L81822:07
* mattoliver needs to go wrangle kids and get them ready for school. So I gotta drop.22:07
timburkeall right, mattoliver's right, we're past time now -- i should wrap up22:08
timburkethank you all for coming, and thank you for working on swift!22:08
timburke#endmeeting22:08
opendevmeetMeeting ended Wed Aug 14 22:08:26 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)22:08
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.html22:08
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.txt22:08
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2024/swift.2024-08-14-21.09.log.html22:08
timburkei'd forgotten how all-async-deleted-segments-must-be-in-one-container was a self-imposed restriction: https://github.com/openstack/swift/blob/master/swift/common/middleware/slo.py#L1768-L177523:23
timburkei think i was worried about the authorize callback potentially being somewhat expensive?23:24
timburkeor i was just trying to descope to cover only what was strictly needed for s3api23:24
