opendevreview | Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings https://review.opendev.org/c/openstack/swift/+/811247 | 01:04 |
opendevreview | Timur Alperovich proposed openstack/swift master: Bug: fix s3api multipart parts listings https://review.opendev.org/c/openstack/swift/+/811247 | 03:59 |
timburke__ | clayg, any ideas on how to make my changes at https://review.opendev.org/c/openstack/swift/+/732996/2/swift/container/backend.py more readable? | 19:17 |
clayg | maybe the readability matters less if my question about "is it for some reason more common in this specific call site" is answered "yes" | 19:22 |
clayg | you found the other get_brokers()[0] call site and called it "probably safer" - and I guess we only ever blow up when trying to get brokers[0]? cause shortly after we reap brokers[0] (the old db) we're back to only one db? | 19:23 |
clayg | I just assumed if we had a broker.get_old_db() or something we could encapsulate the race - i'm not sure what it does when it finds it? refresh db state and return the new broker? raise a more specific exception as part of the contract? | 19:25 |
timburke__ | hmm... maybe that could work... refresh and return new broker, then get state... | 19:31 |
timburke__ | but yeah, i suspect it's way more common at that particular call site -- because in the window where the race is tightest, we should mainly be looking at the old DB for the sake of stats | 19:33 |
reid_g | Hello, Is this IRC channel a good place to ask questions about swift usage/troubleshooting or is it more for development? | 19:38 |
timburke__ | reid_g, it's for both! what's your question? | 19:39 |
*** timburke__ is now known as timburke | 19:39 |
reid_g | Cool! I've emailed before and had clayg respond a few times. Thought it would be fun to join IRC | 19:40 |
clayg | reid_g: IRC is SO MUCH FUN 😉 | 19:41 |
reid_g | We are using EC and recently added new nodes. We kicked off a rebalance and it looks to have mostly gone smoothly. | 19:41 |
reid_g | Went from ~120K handoffs across 4 rings to ~500, but it has been stuck there for a day. | 19:42 |
reid_g | wondering how to tell what is causing the stragglers | 19:42 |
reid_g | Tried running the reconstructor with a copy of the .conf (logging set to debug) and -o -v -p against one of these handoff partitions, but it doesn't push the fragment to the correct device. | 19:43 |
reid_g | Also went to what I thought was a neighbor and ran the reconstructor on the partition, but the missing fragment didn't get created on the missing primary. | 19:43 |
reid_g | My understanding is that the reconstructor should push a handoff to the correct location or recreate the missing fragment when run from a neighbor. | 19:43 |
reid_g | We identify handoffs as being partitions that don't belong to the host/device. Not sure if this is the correct terminology for data that needs to be moved after a rebalance. | 19:44 |
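(For reference, a rough sketch of that handoff check against the ring — the ring path, policy index, device name, and mount point below are assumptions for illustration, not details from reid_g's cluster:)

```python
# Sketch: list partitions on a local device that the ring does NOT assign
# to that device, i.e. handoff partitions left over after a rebalance.
import os
from swift.common.ring import Ring

ring = Ring('/etc/swift/object-1.ring.gz')    # assumed EC ring file
device = 'sdb1'                               # assumed local device name
part_dir = '/srv/node/%s/objects-1' % device  # assumed dir for policy index 1

for part in sorted(os.listdir(part_dir)):
    if not part.isdigit():
        continue  # skip tmp dirs and the like
    primaries = ring.get_part_nodes(int(part))
    # simplified: on a real cluster you'd also match ip/port, since the
    # same device name can exist on many servers
    if not any(node['device'] == device for node in primaries):
        print('handoff partition on %s: %s' % (device, part))
```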
clayg | off the cuff: probably bugs - we had to fix more than one EC handoff-draining bug | 19:52 |
clayg | one of the really bad ones that only got closed recently had to do with expired fragments; I think @acoles fixed it | 19:52 |
clayg | had to extend the SSYNC protocol and everything! he's a mad scientist | 19:53 |
clayg | oh, not expired - non-durable https://review.opendev.org/c/openstack/swift/+/770047 | 19:54 |
clayg | but I'm pretty sure that's only the most recent example | 19:54 |
clayg | same shit different metadata https://review.opendev.org/c/openstack/swift/+/456921 | 19:56 |
clayg | @acoles probably had a whole 'nother career as a network scientist in between fixing those bugs 🙄 | 19:56 |
reid_g | This particular object I'm looking at isn't deleted/expired. No meta when I use salt to run `ls` across all the nodes | 19:57 |
reid_g | What is a durable vs non-durable fragment? | 19:57 |
reid_g | We did upgrade all of these clusters to Ussuri, but that's the latest available for 18.04. | 20:00 |
timburke | when we write EC data, there's a two (or, really, three) phase commit -- in one phase we write and fsync the new data to all nodes but don't mark it "durable" (so it won't be considered authoritative, we won't clean up whatever old data may have been at that name, and given enough time, the non-durable data will get cleaned up similar to a tombstone) | 20:01 |
timburke | provided enough backend nodes ack that phase, we tell them all to switch it over to durable | 20:01 |
timburke | you can tell whether a frag is durable or not just by looking at the name -- durable data will end in #<frag number>#d.data, while non-durable will just be #<frag number>.data (or, on pretty old swift, there'd be a separate .durable file) | 20:03 |
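(A minimal illustration of that naming convention — a simplified parser for the sake of example, not Swift's own diskfile code, and it ignores the old separate-.durable layout:)

```python
# Sketch: classify an EC fragment .data filename as durable or non-durable
# based on the #<frag index>#d.data vs #<frag index>.data convention.
def parse_frag_name(filename):
    """Return (timestamp, frag_index, durable) for an EC .data file."""
    stem, ext = filename.rsplit('.', 1)
    if ext != 'data':
        raise ValueError('not a .data file: %r' % filename)
    parts = stem.split('#')
    timestamp, frag_index = parts[0], int(parts[1])
    durable = len(parts) > 2 and parts[2] == 'd'
    return timestamp, frag_index, durable

print(parse_frag_name('1632511385.76093#8#d.data'))  # durable frag 8
print(parse_frag_name('1632511385.76093#8.data'))    # non-durable frag 8
```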
reid_g | So it looks like this isn't durable 1632511385.76093#8.data | 20:04 |
reid_g | This ^ fragment is one that is in the pre-rebalance location. | 20:05 |
reid_g | So may be related to 770047 above | 20:06 |
reid_g | If I go to a neighbor and run object-reconstructor, shouldn't that create the missing fragment on the device that is missing? | 20:06 |
reid_g | How many nodes need to say OK to make an EC object durable? The above example is from a 10+4 EC ring | 20:13 |
reid_g | Reading docs | 20:17 |
reid_g | Seems like it should be close to realtime looking at the high level example | 20:43 |
timburke | reid_g, what do the other nodes say? do they have durable data? | 20:46 |
timburke | (sorry, suddenly had to drop off for childcare) | 20:46 |
reid_g | No. the other nodes do not show durable (no #d in the file name) | 20:46 |
timburke | for a 10+4 policy, we want 11 acks before marking durable. as long as at least one node marks it durable, it should propagate pretty quickly, even if other nodes missed the second phase. if we don't get enough acks, nobody gets marked durable, the client gets back a 503, and the data on-disk never *will* get marked durable | 20:48 |
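(Roughly how that ack threshold falls out — a simplified sketch assuming the common case where one fragment beyond the data count is enough, rather than the exact value Swift derives from the policy's erasure-coding parameters:)

```python
# Sketch: acks needed before an EC write is marked durable, simplified.
# Assumes the usual "all data fragments plus one" rule; the real number
# comes from the storage policy itself.
def ec_write_quorum(ndata, nparity):
    return ndata + 1

print(ec_write_quorum(10, 4))  # -> 11 acks for a 10+4 policy
```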
timburke | it's a little curious, tho -- i would've expected the client to retry the upload after getting back the 503, so the other nodes would hopefully have durable data with a later timestamp | 20:50 |
reid_g | O.O | 20:50 |
reid_g | Going to check with the application team if they expect this object to be functioning. | 20:50 |
timburke | maybe also double check whether it shows up in listings (given the description thus far, i expect it doesn't) | 20:51 |
reid_g | What do you mean in "listings" | 20:51 |
timburke | when the client does a GET at the container level | 20:52 |
timburke | you can also go grepping for the object name in logs, confirm the response status was sent back to the client | 20:54 |
timburke | er...status *that* was sent... | 20:55 |
kota | morning | 20:59 |
reid_g | Good news is that the application storing the data doesn't know about that object so it probably handled a failure. | 21:00 |
timburke | \o/ | 21:00 |
timburke | #startmeeting swift | 21:00 |
opendevmeet | Meeting started Wed Sep 29 21:00:39 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. | 21:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 21:00 |
opendevmeet | The meeting name has been set to 'swift' | 21:00 |
timburke | who's here for the swift meeting? | 21:00 |
kota | hi | 21:01 |
mattoliver | o/ | 21:01 |
acoles | o/ | 21:02 |
timburke | as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift | 21:02 |
timburke | #topic gate | 21:02 |
timburke | the tempest and grenade failures should be resolved now! i seem to recall clayg and acoles noticing them being a problem | 21:03 |
timburke | #link http://lists.openstack.org/pipermail/openstack-discuss/2021-September/025128.html | 21:03 |
timburke | #topic xena release | 21:04 |
timburke | the stable branch has been cut! | 21:04 |
clayg | Woot! | 21:04 |
mattoliver | Nice | 21:04 |
timburke | i'd actually kinda meant to do another release before xena so we wouldn't be shipping code from June, but oh well -- i dropped the ball a bit | 21:05 |
timburke | not the end of the world, of course. 2.28.0 is a great release ;-) | 21:06 |
timburke | #topic ring v2 | 21:06 |
timburke | thanks for the review mattoliver! i haven't had a chance to look through them much, but i'll plan on responding this week | 21:07 |
mattoliver | Nps, still have more of the chain to work through, will do that today. | 21:07 |
timburke | #topic root epoch reset | 21:08 |
timburke | it looked like there was a good bit of progress on this, kinda split between... | 21:09 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/807824 | 21:09 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/809969 | 21:09 |
mattoliver | Yeah, so the situation is we have had an epoch reset in the cluster a few times | 21:10 |
mattoliver | But still can't reproduce without physically breaking it (in a probe test) | 21:11 |
mattoliver | But we know once an own_shard_range is shared it should ALWAYS have an epoch set | 21:12 |
mattoliver | The first patch is one that stops merging a remote own_shard_range with the local one if the local has an epoch and the remote doesn't, down in the container broker. | 21:13 |
mattoliver | This "should" fix it.. but hard to know because we still don't know the cause | 21:14 |
mattoliver | The second is one that doesn't go down into the guts of shard merging, only acting on replication. It will block a node without an epoch from replicating to its neighbours | 21:15 |
mattoliver | Not as universal as the first, but will give us better logging and a pause to help diagnose the bug. | 21:15 |
mattoliver | We're thinking about rolling out the second temporarily to catch when/if it happens again so we can track the bugger down. | 21:16 |
timburke | sounds good | 21:17 |
timburke | #topic staticweb + tempurl-with-prefix | 21:17 |
timburke | so this was an idea i had while thinking about how to share something out of my home cluster | 21:18 |
timburke | i wanted to share a set of files, but not require that i send a separate tempurl for each or provide swift creds | 21:19 |
timburke | so i came up with | 21:19 |
timburke | #link https://review.opendev.org/c/openstack/swift/+/810754 | 21:19 |
timburke | the core of the change is in staticweb -- basically, do staticweb listings when auth'ed via tempurl and carry the prefix-based tempurl to the links that we build | 21:21 |
timburke | i wanted to get people's thoughts on it, and see how we feel about the increase in privileges (prefix-based tempurl can now do listings -- but only if staticweb is enabled and only within the prefix) | 21:22 |
mattoliver | Oh that's kinda cool. You can share a list of links without making the container publicly readable. | 21:23 |
mattoliver | Need to have a better look and play first though | 21:23 |
timburke | yup :-) | 21:23 |
timburke | and it needs tests -- skimped there, figuring i ought to get a bit more buy-in first | 21:24 |
timburke | anyway, just wanted to raise a bit of attention for it | 21:25 |
timburke | that's all i've got | 21:25 |
timburke | #topic open discussion | 21:25 |
timburke | what else should we bring up this week? | 21:25 |
mattoliver | PTG is not far away - get topics in, or just plan to come and hang with us virtually for a bit | 21:27 |
kota | is the schedule fixed? | 21:28 |
timburke | i believe so, but will double check for next week | 21:29 |
kota | okay | 21:29 |
timburke | all right, i think i'll call it | 21:32 |
timburke | thank you all for coming, and thank you for working on swift! | 21:32 |
timburke | #endmeeting | 21:32 |
opendevmeet | Meeting ended Wed Sep 29 21:32:48 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 21:32 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.html | 21:32 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.txt | 21:32 |
opendevmeet | Log: https://meetings.opendev.org/meetings/swift/2021/swift.2021-09-29-21.00.log.html | 21:32 |
mattoliver | Thanks timburke__ | 21:33 |
mattoliver | Time for breakfast 😀 | 21:33 |
zaitcev | To hang in virtually is my plan. | 21:36 |
reid_g | timburke - checked the logs, and this object looks like it got a 7/14 timeout on the commit status of the PUT, so that would be why it wasn't durable. | 21:44 |
reid_g | This object should be cleaned up at reclaim_age since it didn't get the durable flag? | 21:44 |
acoles | reid_g: you may be suffering from bug https://bugs.launchpad.net/swift/+bug/1778002 - during rebalance, EC fragments should move from what has become a handoff node to their new primary node. But *non-durable* EC frags wouldn't be moved. A durable frag is identified by a filename with #d in it such as 1234567890.00000#1#d.data whereas 1234567890.00000#1.data is non-durable. | 21:44 |
acoles | reid_g: yep, I was about to say, the non-durable will eventually be removed after reclaim age has passed. | 21:45 |
timburke | makes sense. yup! it'll get cleaned up after a reclaim age -- or you could just delete it manually | 21:45 |
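(Roughly, that cleanup decision looks like this — a simplified sketch, with the common one-week reclaim_age default assumed rather than read from this cluster's config:)

```python
# Sketch: is a non-durable EC frag old enough to be reclaimed?
# reclaim_age here is the assumed one-week default, not this cluster's value.
import time

RECLAIM_AGE = 7 * 24 * 60 * 60  # seconds

def reclaimable(filename, now=None):
    now = time.time() if now is None else now
    timestamp = float(filename.split('#', 1)[0])
    return now - timestamp > RECLAIM_AGE

print(reclaimable('1632511385.76093#8.data'))
```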
acoles | or, when you upgrade to Wallaby :) the bug was fixed in Wallaby, i.e. the non-durable gets moved to the primary, but it still isn't made durable unless there's another durable frag. | 21:46 |
reid_g | Nice at least we know why we have some things sitting around. I am guessing that the cluster got a bit busy during the rebalance and caused some timeouts. | 21:46 |
acoles | in the meantime the handoff nags around annoyingly | 21:47 |
acoles | s/nags/hangs/ | 21:47 |
reid_g | Yeah. We use them as the metric to know when the rebalance is done. | 21:47 |
* acoles heading to bed | 21:49 |
reid_g | Thanks for the help today! Time to go take care of kid | 21:50 |
timburke | reid_g, glad to help! pop by again any time you need some help | 22:17 |
opendevreview | Tim Burke proposed openstack/swift master: sharding: Raise fewer errors when the on-disk files change out from under us https://review.opendev.org/c/openstack/swift/+/732996 | 22:25 |