opendevreview | Alistair Coles proposed openstack/swift master: sharder: always get ranges from root while shrinking https://review.opendev.org/c/openstack/swift/+/857718 | 15:09 |
---|---|---|
opendevreview | Alistair Coles proposed openstack/swift master: sharder: refactor _audit_shard_container https://review.opendev.org/c/openstack/swift/+/859630 | 15:09 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: always merge child shard ranges fetched from root https://review.opendev.org/c/openstack/swift/+/858398 | 15:54 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: merge shard shard_ranges from root while sharding https://review.opendev.org/c/openstack/swift/+/852905 | 15:54 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: always merge child shard ranges fetched from root https://review.opendev.org/c/openstack/swift/+/858398 | 16:23 |
opendevreview | Alistair Coles proposed openstack/swift master: sharder: merge shard shard_ranges from root while sharding https://review.opendev.org/c/openstack/swift/+/852905 | 16:23 |
opendevreview | Merged openstack/swift master: sharder: refactor _audit_shard_container https://review.opendev.org/c/openstack/swift/+/859630 | 18:44 |
timburke | created the doodle poll for PTG times: https://doodle.com/meeting/organize/id/b4RryJJe?authToken=dGltLmJ1cmtlQGdtYWlsLmNvbTtUaW0gQnVya2U%3D.VKpXl57xMo1KNayeLq | 19:17 |
timburke | and an etherpad: https://etherpad.opendev.org/p/swift-ptg-antelope | 20:38 |
opendevreview | Merged openstack/swift master: Update master for stable/zed https://review.opendev.org/c/openstack/swift/+/859098 | 20:50 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Sep 28 21:00:42 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
mattoliver | o/ | 21:01 |
acoles | o/ | 21:01 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | it's a little sparse, mostly owing to me being out last week | 21:02 |
timburke | first up | 21:02 |
timburke | #topic PTG | 21:02 |
timburke | it's soon! a little over two weeks | 21:02 |
mattoliver | Oh wow, that is soon | 21:03 |
timburke | sorry for the short notice, but i did get a doodle poll up so i can block off some meeting times | 21:03 |
timburke | #link https://doodle.com/meeting/organize/id/b4RryJJe?authToken=dGltLmJ1cmtlQGdtYWlsLmNvbTtUaW0gQnVya2U%3D.VKpXl57xMo1KNayeLq | 21:03 |
zaitcev | I still haven't registered. | 21:04 |
timburke | ...and i maybe shouldn't have included the token in the query... oh well | 21:04 |
zaitcev | Maybe I should just go on my own. | 21:04 |
timburke | zaitcev, it's all-virtual! | 21:04 |
zaitcev | Oh. I thought it was in Columbus, Ohio. | 21:05 |
timburke | where'd i put that ML update... | 21:05 |
zaitcev | Booooring. | 21:05 |
timburke | #link https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029879.html | 21:05 |
timburke | caught me off guard, too -- part of why i'm racing to catch up | 21:06 |
acoles | Boring is in Oregon | 21:06 |
acoles | #link https://doodle.com/meeting/participate/id/b4RryJJe | 21:08 |
timburke | yeah, i'd still prefer an in-person meetup, too. but even when we thought it'd be in person, we needed someone to foot the airfare for acoles and mattoliver... | 21:08 |
timburke | anyway, i also created an etherpad to gather discussion topics | 21:08 |
timburke | #link https://etherpad.opendev.org/p/swift-ptg-antelope | 21:08 |
acoles | are the doodle times UTC? | 21:09 |
timburke | should be | 21:09 |
timburke | i feel like the doodle interface got worse :-( i thought you could switch timezones fairly easily before... | 21:09 |
timburke | oh -- it should give you times in your local TZ i think -- over on the left in a private tab i see "United States - Los Angeles, San Diego, San Jose, San Francisco" | 21:11 |
timburke | and if you haven't registered yet, please do | 21:12 |
timburke | #link https://openinfra.dev/ptg/ | 21:12 |
timburke | that's all i've got for the PTG -- hope to see you all there! i'll also bring it up with a few other nvidians that aren't in IRC much | 21:13 |
timburke | #topic ring v2 - replication improvements | 21:14 |
acoles | I'm registered and I've booked all the travel I need ;-) | 21:15 |
timburke | so i've continued building on the ring v2 work -- matt added the last primaries table, and i made the proxy smart enough to use it | 21:15 |
timburke | next i wanted to improve the replicator/reconstructor to use it, too | 21:16 |
timburke | mattoliver already had a patch to update the reconstructor | 21:16 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/835001 | 21:16 |
mattoliver | Oh nice! | 21:16 |
mattoliver | I'll give them a review and a play | 21:17 |
timburke | but i realized it wouldn't work for the replicator -- the approach was fairly deep in the ssync protocol, and worked on an individual diskfile at a time | 21:17 |
timburke | so i took a stab at updating the replicator at the suffix-comparison level | 21:17 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/859349 | 21:18 |
timburke | the idea is to gather a bunch of remote suffixes (including from old primaries) before starting to rsync anything | 21:19 |
mattoliver | oh interesting | 21:20 |
timburke | and based on those suffixes, apply some heuristics to decide whether a node (including the local node!) is a mostly-up-to-date primary, a mostly-full old primary, or a mostly-empty new primary | 21:21 |
timburke | if the local node seems to be filling, bail early -- like, before we even do local rehashing | 21:21 |
timburke | if one of the remotes seems to be filling and there's a fairly full old-primary, skip replicating to that specific remote | 21:22 |
timburke | while working through this and running some experiments at home, i realized we currently do way too much rehashing on those filling primaries | 21:24 |
mattoliver | sounds really interesting, being able to make some smart determinations based off the suffixes in the part sounds really clever. | 21:25 |
timburke | it's a problem because rsync will pre-create a bunch of dirs before starting to fill them -- and when we rehash during an inbound transfer, we clean them up, and the remote rsync hits a failure later | 21:26 |
timburke | so to avoid that, i also added a new REPLICATE api to just consolidate hashes, without rehashing the invalid suffixes | 21:27 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/859348 | 21:27 |
mattoliver | can that confuse that suffix dirs are on a remote node if another comes and rsyncs at the same time? | 21:27 |
mattoliver | *what | 21:28 |
mattoliver | and confuse your new heuristics | 21:28 |
mattoliver | re: rsync creating a bunch of dirs before filling them up. Or is it more one suffix at a time as it's rsyncing. | 21:29 |
timburke | the heuristics are all based on "knowledge" of a suffix -- they don't actually look at the hash value | 21:29 |
mattoliver | oh yeah, so long as we don't rehash, we won't know of the "rsync dirs" | 21:30 |
timburke | my hope is to get far enough along this path that i can run some more experiments with it at home and have some pretty graphs of improvements to show for the PTG | 21:31 |
mattoliver | Nice, sounds really cool | 21:31 |
timburke | anyway, that's all i've got | 21:31 |
timburke | #topic open discussion | 21:31 |
acoles | yeah, lots of progress | 21:31 |
timburke | what else should we talk about this week? | 21:32 |
timburke | acoles, i saw a bunch of shard range patches earlier today | 21:33 |
acoles | yes I have a chain of patches aimed at improving when shards update their sub-shards | 21:33 |
acoles | all motivated by our painful experience with some shards that got stuck while sharding | 21:34 |
acoles | briefly, the shards had an incomplete set of sub-shards to which they could cleave, so could not complete sharding... | 21:34 |
acoles | ...the *root* had a complete set of shard ranges but the shard never merges what the root has (even though the root ranges are fetched during audit)...so shard remains stuck | 21:35 |
timburke | is it mostly a matter of needing reviews, or is there more discussion that would be useful? | 21:36 |
acoles | the fix allows shard ranges from the root to be merged into the shard, but only if the result appears to be a valid set of shard ranges | 21:37 |
acoles | timburke: I have one last tweak to make on the last patch (I have a comment to that effect), but the first 2 are I hope good to go | 21:37 |
acoles | https://review.opendev.org/c/openstack/swift/+/857718/7 | 21:38 |
timburke | 👍 | 21:38 |
mattoliver | ^ and the v2 patches and follow ups, I've really been meaning to review and get into, but been distracted with a work priority. | 21:38 |
acoles | https://review.opendev.org/c/openstack/swift/+/858398/ | 21:38 |
mattoliver | Will try and get to them all over the next week. | 21:38 |
mattoliver | On another note, I haven't been too idle. For those who want to try out the new OpenTelemetry version of Tracing, I've got a branch and a PR on the VSAIO repo that you can use. Seems to be working. Still might be some OpenTracing bits I missed, but I think I got them all. https://github.com/NVIDIA/vagrant-swift-all-in-one/pull/134 | 21:39 |
acoles | mattoliver: nice! | 21:39 |
timburke | mattoliver, thanks! don't worry too much :-) especially those later patches in the chain, there's a lot of testing gaps... | 21:39 |
mattoliver | Oh and I've also been migrating the OpenTracing POC over to OpenTelemetry | 21:39 |
mattoliver | and cleaning it up quite a bit at the same time. | 21:40 |
mattoliver | #link https://review.opendev.org/c/openstack/swift/+/857559 | 21:40 |
mattoliver | is the OTel version | 21:40 |
clarkb | Zuul is adding tracing support with testing using jaeger iirc. There might be useful bits there to copy over to swift if you end up wanting to test this | 21:41 |
mattoliver | Also found a better way of getting the spans into eventlet pools and piles. So the code is much cleaner | 21:41 |
mattoliver | oh nice! clarkb | 21:41 |
mattoliver | I'll take a look! | 21:41 |
mattoliver | Working on an in-memory exporter version of the tracer that we can interrogate, in essence a tracer for unittests | 21:42 |
timburke | nice! | 21:43 |
mattoliver | or rather to unit test the tracing code in swift, not trace unittests :P | 21:43 |
timburke | that's how i'd interpreted it ;-) | 21:44 |
mattoliver | great :P | 21:44 |
acoles | I need to drop, 👋 | 21:45 |
timburke | seems like we're winding down anyway | 21:46 |
timburke | thank you all for coming, and thank you for working on swift! | 21:46 |
timburke | #endmeeting | 21:46 |
opendevmeet | Meeting ended Wed Sep 28 21:46:34 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:46 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2022/swift.2022-09-28-21.00.html | 21:46 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2022/swift.2022-09-28-21.00.txt | 21:46 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2022/swift.2022-09-28-21.00.log.html | 21:46 |
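
Note on the "ring v2 - replication improvements" topic: the replicator heuristics timburke describes (gather remote suffixes first, classify each node, bail early when the local node is filling, skip a filling remote when a fairly full old primary exists) could be sketched roughly as below. This is not the code from the patch under review. The function names, the labels, the `full_ratio`/`empty_ratio` thresholds, and the rule that a full old primary other than the local node is what triggers the skip are all assumptions made for illustration. As in the discussion, only knowledge of a suffix is used, never its hash value.

```python
def classify_nodes(suffixes_by_node, old_primaries,
                   full_ratio=0.9, empty_ratio=0.1):
    """Label each node by the share of all known suffixes it knows about.

    Thresholds and label names are hypothetical, not taken from Swift.
    """
    all_suffixes = set().union(*suffixes_by_node.values())
    labels = {}
    for node, suffixes in suffixes_by_node.items():
        coverage = len(suffixes) / len(all_suffixes) if all_suffixes else 1.0
        if node in old_primaries:
            labels[node] = ('full-old-primary' if coverage >= full_ratio
                            else 'old-primary')
        elif coverage <= empty_ratio:
            labels[node] = 'filling'      # mostly-empty new primary
        else:
            labels[node] = 'up-to-date'   # mostly-up-to-date primary
    return labels


def plan_replication(local, suffixes_by_node, old_primaries):
    """Return the remote primaries the local node should replicate to."""
    labels = classify_nodes(suffixes_by_node, old_primaries)
    if labels[local] == 'filling':
        # Local node is still being filled: bail before any local rehashing.
        return []
    # Assumption: only a full old primary *other than us* justifies skipping.
    full_old_elsewhere = any(
        labels[node] == 'full-old-primary'
        for node in old_primaries if node != local)
    targets = []
    for node, label in labels.items():
        if node == local or node in old_primaries:
            continue
        if label == 'filling' and full_old_elsewhere:
            continue  # leave this remote to the old primary
        targets.append(node)
    return targets
```

With three current primaries `a`, `b`, `c` (where `c` is empty) and one full old primary `old`, this sketch has `a` replicate only to `b`, has `c` do nothing, and has `old` push to all three.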
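
Note on the shard range discussion: acoles says the fix merges root shard ranges into a stuck shard "only if the result appears to be a valid set of shard ranges". The log does not say what "valid" means, so the sketch below assumes one plausible reading: the merged ranges must tile the shard's namespace with no gaps or overlaps. Ranges are modelled as plain `(lower, upper)` string tuples rather than Swift's real shard range objects, and both function names are hypothetical.

```python
def covers_namespace(ranges, lower, upper):
    """True if the (lower, upper) ranges tile lower..upper exactly.

    Assumed meaning of "valid": each range starts where the previous one
    ended, so there are no gaps and no overlaps.
    """
    cursor = lower
    for r_lower, r_upper in sorted(ranges):
        if r_lower != cursor:
            return False  # gap or overlap
        cursor = r_upper
    return cursor == upper


def merge_from_root(own_ranges, root_ranges, lower, upper):
    """Merge root ranges into the shard's own, but only if still valid."""
    candidate = set(own_ranges) | set(root_ranges)
    if covers_namespace(candidate, lower, upper):
        return sorted(candidate)
    # Otherwise keep what the shard already had.
    return sorted(own_ranges)
```

In this model, a shard holding only `('a', 'm')` accepts a root set that adds `('m', 'z')`, which would let it finish cleaving, but rejects a root set that adds `('n', 'z')` because that leaves a gap.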