opendevreview | Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings https://review.opendev.org/c/openstack/swift/+/811247 | 01:04 |
opendevreview | Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings https://review.opendev.org/c/openstack/swift/+/811247 | 03:59 |
timburke__ | clayg, any ideas on how to make my changes at https://review.opendev.org/c/openstack/swift/+/732996/2/swift/container/backend.py more readable? | 19:17 |
clayg | maybe the readability matters less if my question about "is it for some reason more common in this specific call site" is answered "yes" | 19:22 |
clayg | you found the other get_brokers()[0] call site and called it "probably safer" - and I guess we only ever blow up when trying to get brokers[0]? cause shortly after we reap brokers[0] (the old db) we're back to only one db? | 19:23 |
clayg | I just assumed if we had a broker.get_old_db() or something we could encapsulate the race - i'm not sure what it does when it finds it? refresh db state and return the new broker? raise a more specific exception as part of the contract? | 19:25 |
timburke__ | hmm... maybe that could work... refresh and return new broker, then get state... | 19:31 |
timburke__ | but yeah, i suspect it's way more common at that particular call site -- because in the window where the race is tightest, we should mainly be looking at the old DB for the sake of stats | 19:33 |
reid_g | Hello, Is this IRC channel a good place to ask questions about swift usage/troubleshooting or is it more for development? | 19:38 |
timburke__ | reid_g, it's for both! what's your question? | 19:39 |
*** timburke__ is now known as timburke | 19:39 |
reid_g | Cool! I've emailed before and had clayg respond a few times. Thought it would be fun to join IRC | 19:40 |
clayg | reid_g: IRC is SO MUCH FUN 😉 | 19:41 |
reid_g | We are using EC and recently added new nodes. We kicked off a rebalance and it looks to have mostly gone smoothly. | 19:41 |
reid_g | Went from ~120K handoffs across 4 rings to ~500, but it has been stuck there for a day. | 19:42 |
reid_g | wondering how to tell what is causing the stragglers | 19:42 |
reid_g | Tried running the reconstructor with a copy of the .conf (logging set to debug) and -o -v -p against one of these handoff partitions, but it doesn't push the fragment to the correct device. | 19:43 |
reid_g | Also went to what I thought was a neighbor and ran the reconstructor on the partition, but the missing fragment didn't get created on the missing primary. | 19:43 |
reid_g | My understanding is that the reconstructor should push a handoff to the correct location or recreate the missing fragment when run from a neighbor. | 19:43 |
reid_g | We identify handoffs as being partitions that don't belong to the host/device. Not sure if this is the correct terminology for data that needs to be moved after a rebalance. | 19:44 |
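(For reference, a rough sketch of that handoff check against the ring — the ring path, policy index, device name, and mount point below are assumptions for illustration, not details from reid_g's cluster:)

```python
# Sketch: list partitions on a local device that the ring does NOT assign
# to that device, i.e. handoff partitions left over after a rebalance.
import os
from swift.common.ring import Ring

ring = Ring('/etc/swift/object-1.ring.gz')    # assumed EC ring file
device = 'sdb1'                               # assumed local device name
part_dir = '/srv/node/%s/objects-1' % device  # assumed dir for policy index 1

for part in sorted(os.listdir(part_dir)):
    if not part.isdigit():
        continue  # skip tmp dirs and the like
    primaries = ring.get_part_nodes(int(part))
    # simplified: on a real cluster you'd also match ip/port, since the
    # same device name can exist on many servers
    if not any(node['device'] == device for node in primaries):
        print('handoff partition on %s: %s' % (device, part))
```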
clayg | off the cuff: probably bugs - we had to fix more than one EC handoff-draining bug | 19:52 |
clayg | one of the really bad ones that only got closed recently had to do with expired fragments; I think @acoles fixed it | 19:52 |
clayg | had to extend the SSYNC protocol and everything! he's a mad scientist | 19:53 |
clayg | oh, not expired - non-durable https://review.opendev.org/c/openstack/swift/+/770047 | 19:54 |
clayg | but I'm pretty sure that's only the most recent example | 19:54 |
clayg | same shit different metadata https://review.opendev.org/c/openstack/swift/+/456921 | 19:56 |
clayg | @acoles probably had a whole 'nother career as a network scientist in between fixing those bugs 🙄 | 19:56 |
reid_g | This particular object I'm looking at isn't deleted/expired. No meta when I use salt to run `ls` across all the nodes | 19:57 |
reid_g | What is a durable vs non-durable fragment? | 19:57 |
reid_g | We did upgrade all of these clusters to Ussuri, but that's the latest available for 18.04. | 20:00 |
timburke | when we write EC data, there's a two (or, really, three) phase commit -- in one phase we write and fsync the new data to all nodes but don't mark it "durable" (so it won't be considered authoritative, we won't clean up whatever old data may have been at that name, and given enough time, the non-durable data will get cleaned up similar to a tombstone) | 20:01 |
timburke | provided enough backend nodes ack that phase, we tell them all to switch it over to durable | 20:01 |
timburke | you can tell whether a frag is durable or not just by looking at the name -- durable data will end in #<frag number>#d.data, while non-durable will just be #<frag number>.data (or, on pretty old swift, there'd be a separate .durable file) | 20:03 |
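(A minimal illustration of that naming convention — a simplified parser for the sake of example, not Swift's own diskfile code, and it ignores the old separate-.durable layout:)

```python
# Sketch: classify an EC fragment .data filename as durable or non-durable
# based on the #<frag index>#d.data vs #<frag index>.data convention.
def parse_frag_name(filename):
    """Return (timestamp, frag_index, durable) for an EC .data file."""
    stem, ext = filename.rsplit('.', 1)
    if ext != 'data':
        raise ValueError('not a .data file: %r' % filename)
    parts = stem.split('#')
    timestamp, frag_index = parts[0], int(parts[1])
    durable = len(parts) > 2 and parts[2] == 'd'
    return timestamp, frag_index, durable

print(parse_frag_name('1632511385.76093#8#d.data'))  # durable frag 8
print(parse_frag_name('1632511385.76093#8.data'))    # non-durable frag 8
```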
reid_g | So it looks like this isn't durable 1632511385.76093#8.data | 20:04 |
reid_g | This ^ fragment is one that is in the pre-rebalance location. | 20:05 |
reid_g | So may be related to 770047 above | 20:06 |
reid_g | If I go to a neighbor and run object-reconstructor, shouldn't that create the missing fragment on the device that is missing? | 20:06 |
reid_g | How many nodes need to say OK to make an EC object durable? The above example is from a 10+4 EC ring | 20:13 |
reid_g | Reading docs | 20:17 |
reid_g | Seems like it should be close to realtime looking at the high level example | 20:43 |
timburke | reid_g, what do the other nodes say? do they have durable data? | 20:46 |
timburke | (sorry, suddenly had to drop off for childcare) | 20:46 |
reid_g | No. the other nodes do not show durable (no #d in the file name) | 20:46 |
timburke | for a 10+4 policy, we want 11 acks before marking durable. as long as at least one node marks it durable, it should propagate pretty quickly, even if other nodes missed the second phase. if we don't get enough acks, nobody gets marked durable, the client gets back a 503, and the data on-disk never *will* get marked durable | 20:48 |
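(Roughly how that ack threshold falls out — a simplified sketch assuming the common case where one fragment beyond the data count is enough, rather than the exact value Swift derives from the policy's erasure-coding parameters:)

```python
# Sketch: acks needed before an EC write is marked durable, simplified.
# Assumes the usual "all data fragments plus one" rule; the real number
# comes from the storage policy itself.
def ec_write_quorum(ndata, nparity):
    return ndata + 1

print(ec_write_quorum(10, 4))  # -> 11 acks for a 10+4 policy
```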
timburke | it's a little curious, tho -- i would've expected the client to retry the upload after getting back the 503, so the other nodes would hopefully have durable data with a later timestamp | 20:50 |
reid_g | O.O | 20:50 |
reid_g | Going to check with the application team if they expect this object to be functioning. | 20:50 |
timburke | maybe also double check whether it shows up in listings (given the description thus far, i expect it doesn't) | 20:51 |
reid_g | What do you mean in "listings" | 20:51 |
timburke | when the client does a GET at the container level | 20:52 |
timburke | you can also go grepping for the object name in logs, confirm the response status was sent back to the client | 20:54 |
timburke | er...status *that* was sent... | 20:55 |
kota | morning | 20:59 |
reid_g | Good news is that the application storing the data doesn't know about that object so it probably handled a failure. | 21:00 |
timburke | \o/ | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Sep 29 21:00:39 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
kota | hi | 21:01 |
mattoliver | o/ | 21:01 |
acoles | o/ | 21:02 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | #topic gate | 21:02 |
timburke | the tempest and grenade failures should be resolved now! i seem to recall clayg and acoles noticing them being a problem | 21:03 |
timburke | #link http://lists.openstack.org/pipermail/openstack-discuss/2021-September/025128.html | 21:03 |
timburke | #topic xena release | 21:04 |
timburke | the stable branch has been cut! | 21:04 |
clayg | Woot! | 21:04 |
mattoliver | Nice | 21:04 |
timburke | i'd actually kinda meant to do another release before xena so we wouldn't be shipping code from June, but oh well -- i dropped the ball a bit | 21:05 |
timburke | not the end of the world, of course. 2.28.0 is a great release ;-) | 21:06 |
timburke | #topic ring v2 | 21:06 |
timburke | thanks for the review mattoliver! i haven't had a chance to look through them much, but i'll plan on responding this week | 21:07 |
mattoliver | Nps, still have more of the chain to work through, will do that today. | 21:07 |
timburke | #topic root epoch reset | 21:08 |
timburke | it looked like there was a good bit of progress on this, kinda split between... | 21:09 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/807824 | 21:09 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/809969 | 21:09 |
mattoliver | Yeah, so the situation is we have had an epoch reset in the cluster a few times | 21:10 |
mattoliver | But still can't reproduce without physically breaking it (in a probe test) | 21:11 |
mattoliver | But we know once an own_shard_range is shared it should ALWAYS have an epoch set | 21:12 |
mattoliver | The first patch is one that stops merging a remote own_shard_range with the local one if the local has an epoch and the remote doesn't, down in the container broker. | 21:13 |
mattoliver | This "should" fix it.. but hard to know because we still don't know the cause | 21:14 |
mattoliver | The second is one that doesn't go down into the guts of shard merging, only acting on replication. It will block a node without an epoch from replicating to its neighbours | 21:15 |
mattoliver | Not as universal as the first, but will give us better logging and a pause to help diagnose the bug. | 21:15 |
mattoliver | We're thinking about rolling out the second temporarily to catch when/if it happens again so we can track the bugger down. | 21:16 |
timburke | sounds good | 21:17 |
timburke | #topic staticweb + tempurl-with-prefix | 21:17 |
timburke | so this was an idea i had while thinking about how to share something out of my home cluster | 21:18 |
timburke | i wanted to share a set of files, but not require that i send a separate tempurl for each or provide swift creds | 21:19 |
timburke | so i came up with | 21:19 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/810754 | 21:19 |
timburke | the core of the change is in staticweb -- basically, do staticweb listings when auth'ed via tempurl and carry the prefix-based tempurl to the links that we build | 21:21 |
timburke | i wanted to get people's thoughts on it, and see how we feel about the increase in privileges (prefix-based tempurl can now do listings -- but only if staticweb is enabled and only within the prefix) | 21:22 |
mattoliver | Oh that's kinda cool. You can share a list of links without making the container publicly readable. | 21:23 |
mattoliver | Need to have a better look and play first though | 21:23 |
timburke | yup :-) | 21:23 |
timburke | and it needs tests -- skimped there, figuring i ought to get a bit more buy-in first | 21:24 |
timburke | anyway, just wanted to raise a bit of attention for it | 21:25 |
timburke | that's all i've got | 21:25 |
timburke | #topic open discussion | 21:25 |
timburke | what else should we bring up this week? | 21:25 |
mattoliver | PTG is not far away - get topics in, or just plan to come and hang with us virtually for a bit | 21:27 |
kota | is the schedule fixed? | 21:28 |
timburke | i believe so, but will double check for next week | 21:29 |
kota | okay | 21:29 |
timburke | all right, i think i'll call it | 21:32 |
timburke | thank you all for coming, and thank you for working on swift! | 21:32 |
timburke | #endmeeting | 21:32 |
opendevmeet | Meeting ended Wed Sep 29 21:32:48 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:32 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.html | 21:32 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.txt | 21:32 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.log.html | 21:32 |
mattoliver | Thanks timburke__ | 21:33 |
mattoliver | Time for breakfast 😀 | 21:33 |
zaitcev | To hang in virtually is my plan. | 21:36 |
reid_g | timburke - checked the logs, and this object looks like it got a 7/14 timeout on the commit status of the PUT, so that would be why it wasn't durable. | 21:44 |
reid_g | This object should be cleaned up at reclaim_age since it didn't get the durable flag? | 21:44 |
acoles | reid_g: you may be suffering from bug https://bugs.launchpad.net/swift/+bug/1778002 - during rebalance, EC fragments should move from what has become a handoff node to their new primary node. But *non-durable* EC frags wouldn't be moved. A durable frag is identified by a filename with #d in it such as 1234567890.00000#1#d.data whereas 1234567890.00000#1.data is non-durable. | 21:44 |
acoles | reid_g: yep, I was about to say, the non-durable will eventually be removed after reclaim age has passed. | 21:45 |
timburke | makes sense. yup! it'll get cleaned up after a reclaim age -- or you could just delete it manually | 21:45 |
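(Roughly, that cleanup decision looks like this — a simplified sketch, with the common one-week reclaim_age default assumed rather than read from this cluster's config:)

```python
# Sketch: is a non-durable EC frag old enough to be reclaimed?
# reclaim_age here is the assumed one-week default, not this cluster's value.
import time

RECLAIM_AGE = 7 * 24 * 60 * 60  # seconds

def reclaimable(filename, now=None):
    now = time.time() if now is None else now
    timestamp = float(filename.split('#', 1)[0])
    return now - timestamp > RECLAIM_AGE

print(reclaimable('1632511385.76093#8.data'))
```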
acoles | or, when you upgrade to Wallaby :) the bug was fixed in Wallaby, i.e. the non-durable gets moved to the primary, but it still isn't made durable unless there's another durable frag. | 21:46 |
reid_g | Nice at least we know why we have some things sitting around. I am guessing that the cluster got a bit busy during the rebalance and caused some timeouts. | 21:46 |
acoles | in the meantime the handoff nags around annoyingly | 21:47 |
acoles | s/nags/hangs/ | 21:47 |
reid_g | Yeah. We use them as the metric to know when the rebalance is done. | 21:47 |
* acoles heading to bed | 21:49 |
reid_g | Thanks for the help today! Time to go take care of kid | 21:50 |
timburke | reid_g, glad to help! pop by again any time you need some help | 22:17 |
opendevreview | Tim Burke proposed openstack/swift master: sharding: Raise fewer errors when the on-disk files change out from under us https://review.opendev.org/c/openstack/swift/+/732996 | 22:25 |