Wednesday, 2022-02-23

opendevreviewTim Burke proposed openstack/swift master: db: Attempt to clean up part dir post replication  https://review.opendev.org/c/openstack/swift/+/83053500:07
opendevreviewTim Burke proposed openstack/liberasurecode master: Add CentOS 9 Stream job  https://review.opendev.org/c/openstack/liberasurecode/+/82096905:38
opendevreviewMatthew Oliver proposed openstack/swift master: POC/WIP - db: shard up the DatabaseBroker pending files  https://review.opendev.org/c/openstack/swift/+/83055105:59
mattoliver^ that is _very_ rough, and still has debugging `import q` statements in it. So will fail tests. Just want to push it off my laptop before I go further down the rabbit hole :P06:01
afaranhahi, anyone know if there are currently issues when adding zuul jobs for wallaby and xena? We added fips jobs and they're not being run: https://review.opendev.org/c/openstack/swift/+/827901 https://review.opendev.org/c/openstack/swift/+/82790210:08
clarkbafaranha: can you point to where it isn't being run? The fips jobs appear to have run against 827901 and 827902 and there are no newer xena or wallaby changes that may have triggered the tests16:21
afaranhaclarkb,  shouldn't it run on the patch itself?16:24
clarkbafaranha: yes, it did16:25
afaranhai can't see its results on the patch itself16:25
afaranhain other projects, like nova, I can see it there16:26
afaranhahttps://review.opendev.org/c/openstack/nova/+/82789516:26
clarkbfor 827901 zuul +1'd on february 9. You can see them if you open the zuul summary too16:26
clarkbthey are there16:26
clarkbthey don't run in the gate so are not part of the gate testing16:26
clarkbbut the change didn't add them to the gate so that is expected16:27
afaranhaclarkb++ ow, nice, thanks for the explanation16:29
*** timburke_ is now known as timburke20:59
kotagood morning21:00
timburke#startmeeting swift21:00
opendevmeetMeeting started Wed Feb 23 21:00:30 2022 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.21:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.21:00
opendevmeetThe meeting name has been set to 'swift'21:00
timburkewho's here for the swift meeting?21:00
mattolivero/21:00
kotao/21:01
acoleso/21:02
timburkeas usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift21:02
timburke(though i've forgotten to update it :P)21:02
timburke#topic PTG21:03
timburkequick reminder to fill out the doodle poll to pick meeting times21:03
timburke#link https://doodle.com/poll/qs2pysgyb8nb36c221:03
kotaoh ok. will do soon21:03
timburkei'll get an etherpad up to collect development topics, too21:04
timburke#topic priority reviews21:04
timburkei updated the page at https://wiki.openstack.org/wiki/Swift/PriorityReviews21:05
timburkemostly to call out some patches i know we're running in prod21:05
timburkesome seem about ready to go -- expirer: Only try to delete empty containers (https://review.opendev.org/c/openstack/swift/+/825883) did just what we hoped it would, and we saw a precipitous drop in container deletes and listing shard range cache misses21:07
acolesyes that was a great improvement21:08
timburkeothers had somewhat more mixed results -- container-server: plumb includes down into _get_shard_range_rows (https://review.opendev.org/c/openstack/swift/+/569847) *maybe* had some impact on updater timings, but it was hard to say decidedly21:08
timburkethere was one that i wanted to check in on in particular21:08
timburke#link https://review.opendev.org/c/openstack/swift/+/80996921:08
timburkeSharding: a remote SR without an epoch can't replicate over one with an epoch21:09
timburkemattoliver, am i remembering right that the idea was to get the no-epoch SR to stick around so we could hunt down how it happened?21:09
mattoliverThat stops the reset, but I think it currently confines the problem to the problem node. 21:10
mattoliverBut if that problem node is a handoff then it might be fine. 21:10
mattoliverInterestingly we haven't seen the problem again since we started running it. 21:10
timburkewhat do we think about merging it sooner rather than later, and calling the problem fixed until we get new information?21:11
mattoliverYeah, kk, it does log when there is an issue, so it'll let people know. 21:11
acolesmight be worth adding broker.db_path to the warning?21:13
mattoliveroh yeah, good idea. 21:14
timburkeall right, that's about all i've got then21:15
timburke#topic open discussion21:15
mattoliverI haven't looked at the patch so will look today21:15
timburkewhat else should we bring up this week?21:15
mattoliverI added handoff_delete to the db replicators https://review.opendev.org/c/openstack/swift/+/82863721:16
mattoliverwhich helps when you need to drain a node and brings them closer to parity with the obj replicator21:16
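[Editor's note: assuming the new option mirrors the semantics of the object replicator's existing handoff_delete setting, the configuration for the patch above might look roughly like this sketch (section name and default shown are assumptions, not taken from the patch):]

```ini
[container-replicator]
# Hypothetical mirror of the object replicator's handoff_delete:
# by default a handoff DB is only removed once it has replicated
# to every primary node; setting an integer n allows removal after
# n successful replications, which helps drain nodes faster.
handoff_delete = 2
```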
mattoliverAlso been playing with concurrent container object puts to the same container and trying to understand the problems involved and attempting to improve things some more. 21:18
timburkenice! along the same lines, i wrote up https://review.opendev.org/c/openstack/swift/+/830535 to clean up part dirs more quickly when you're rebalancing DBs21:19
mattolivercool21:19
mattoliverIn initial testing, removing the container directory lock and sharding out the pending file and locking the pending file you're updating seems really promising. Getting far fewer directory lock timeouts21:19
mattoliverJust improves concurrent access to the server. So helps when running multiple workers21:20
mattolivercurrent POC WIP is https://review.opendev.org/c/openstack/swift/+/83055121:21
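[Editor's note: a minimal sketch of the sharded-pending-file idea discussed above — hypothetical file names and record format, not the code in the POC patch. Each PUT appends to a randomly chosen shard and locks only that shard, rather than the whole containers directory:]

```python
import fcntl
import os
import random

# Hypothetical shard count; a real implementation would presumably
# make this configurable.
PENDING_SHARDS = 4

def append_pending(db_dir, record, shards=PENDING_SHARDS):
    """Append a pending update to a randomly chosen shard file,
    taking an exclusive lock on just that shard instead of the
    whole directory."""
    shard = random.randrange(shards)
    path = os.path.join(db_dir, 'pending-%d' % shard)
    with open(path, 'ab') as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)
        try:
            # ':' as a record separator, loosely echoing Swift's
            # existing .pending format (an assumption here).
            fp.write(b':' + record)
            fp.flush()
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)
```

Two concurrent PUTs only contend when they happen to pick the same shard, which is the source of the reduced lock timeouts mentioned above.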
timburkeyeah, that looked promising -- anything to get a few more reqs/s out of the container-server21:22
mattoliverThat still has debugging and q statements in it. Just wanted to get it backed up off my laptop. 21:22
mattoliver+121:22
timburkeone thing i'm still curious about is what the curve looks like for number of container-server workers vs. max concurrent requests before clients start hitting timeouts21:24
mattoliveryeah, on my VSAIO it won't be as high as a real server :P 21:25
timburkestill, hopefully the curve would still look somewhat similar -- start off at some level, and as you add a *ton* of workers it drops pretty low because of all the contention -- but what happens in the middle?21:26
timburkei feel like that may push us toward something like a servers-per-port strategy21:26
mattoliveryup, can have a play. 21:26
mattolivercurrently I'm randomly choosing a pending file shard when a put comes in. I wonder if I could just have a shard per worker, or maybe it's shards per worker. 21:27
mattoliversome of the timeouts could also be due to the randomness of choosing a shard.21:27
acolesmattoliver: are you no longer locking the parent directory when appending to the pending file?21:28
mattolivernope, not unless it's a _commit_puts and we actually update the DB21:29
mattoliverbut I'm not sure what effect that has on other things like replication yet21:29
mattoliverbut I do lock the pending file being updated so we don't lose pending data.21:29
acolesbut not locking the pending file when flushing it?21:30
acolesdoes the parent dir lock also take lock on all the pending files?21:30
mattoliverI do lock them too, because we use a truncate on it21:30
timburkeyeah, i'd imagine you'd want to lock all the pending files (and the parent dir) when flushing21:31
acolesOIC down in commit_puts21:31
mattoliverbut I take a lock on a pending file while flushing it, and only while dealing with that one so a concurrent put could go use it again.21:31
mattolivertimburke: yup 21:32
timburkenice21:32
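[Editor's note: the flush behaviour being described — lock, apply, truncate, one shard at a time — might look roughly like this sketch (hypothetical names, not the POC's code). Only the shard currently being flushed is locked, so concurrent PUTs can keep appending to the others:]

```python
import fcntl
import os

def flush_pending(db_dir, apply_fn, shards=4):
    """Drain each pending shard in turn, holding only that shard's
    lock while it is read and truncated."""
    for shard in range(shards):
        path = os.path.join(db_dir, 'pending-%d' % shard)
        if not os.path.exists(path):
            continue
        with open(path, 'r+b') as fp:
            fcntl.flock(fp, fcntl.LOCK_EX)
            try:
                data = fp.read()
                if data:
                    apply_fn(data)   # e.g. merge the records into the DB
                fp.seek(0)
                fp.truncate()        # emptied under the same lock
            finally:
                fcntl.flock(fp, fcntl.LOCK_UN)
```

Truncating under the same lock is what prevents losing a record appended between the read and the truncate.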
timburkeif anyone has some spare time to think about a client-facing api change, i've got some users that'd appreciate something like https://review.opendev.org/c/openstack/swift/+/829605 - container: Add delimiter-depth query param21:32
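[Editor's note: a toy model of what a delimiter-depth parameter might mean — the function and exact semantics are assumptions for illustration, not what the patch implements. With delimiter=/ today, listings collapse at the first delimiter; a depth of 2 would collapse only below the second:]

```python
def rollup(names, delimiter='/', depth=1):
    """Toy delimiter-depth listing: names with more than `depth`
    delimited segments collapse to a shared prefix; shallower names
    are returned as-is."""
    out, seen = [], set()
    for name in sorted(names):
        parts = name.split(delimiter)
        if len(parts) > depth:
            prefix = delimiter.join(parts[:depth]) + delimiter
            if prefix not in seen:
                seen.add(prefix)
                out.append(prefix)
        else:
            out.append(name)
    return out

# rollup(['a/b/c', 'a/b/d', 'a/e', 'f'])           -> ['a/', 'f']
# rollup(['a/b/c', 'a/b/d', 'a/e', 'f'], depth=2)  -> ['a/b/', 'a/e', 'f']
```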
acolesI was wondering if it would be possible to direct updates to a pending file that isn't being flushed?21:33
mattoliveroh interesting! 21:34
timburkethat'd be fancy! do it as a ring ;-)21:34
acolese.g. if the pending files could be pinned to workers21:34
acolesor some kind of rotation21:34
mattoliverI like it!21:35
acolesmaybe just try 'em all til you get a lock, a bit like how we do multiple lock files21:35
mattoliveryeah can borrow that code as a start at least :) 21:36
mattoliveralso like the ring like approach.21:36
mattoliverWill have a play. thanks for the awesome ideas21:37
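[Editor's note: the "try 'em all til you get a lock" idea, in the spirit of Swift's multiple lock files, could be sketched like this — a hypothetical helper using non-blocking flock attempts, not code from any patch:]

```python
import errno
import fcntl
import os

def pick_unlocked_shard(db_dir, shards=4):
    """Try each pending shard with a non-blocking lock and return
    the first free one. If every shard is busy, block on shard 0.
    The caller writes to fp, then unlocks and closes it."""
    for shard in range(shards):
        path = os.path.join(db_dir, 'pending-%d' % shard)
        fp = open(path, 'ab')
        try:
            fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return shard, fp
        except OSError as err:
            fp.close()
            if err.errno not in (errno.EAGAIN, errno.EACCES):
                raise
    # All shards busy: fall back to waiting on the first one.
    fp = open(os.path.join(db_dir, 'pending-0'), 'ab')
    fcntl.flock(fp, fcntl.LOCK_EX)
    return 0, fp
```

Unlike random shard choice, this avoids blocking on a shard mid-flush whenever any other shard is free.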
timburkeall right, i think i'll call it21:38
timburkethank you all for coming, and thank you for working on swift!21:38
timburke#endmeeting21:38
opendevmeetMeeting ended Wed Feb 23 21:38:34 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)21:38
opendevmeetMinutes:        https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.html21:38
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.txt21:38
opendevmeetLog:            https://meetings.opendev.org/meetings/swift/2022/swift.2022-02-23-21.00.log.html21:38

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!