*** rcernin has quit IRC | 00:30 | |
*** ccamacho has quit IRC | 00:51 | |
*** rcernin has joined #openstack-swift | 01:45 | |
*** baojg has quit IRC | 02:47 | |
*** baojg has joined #openstack-swift | 02:48 | |
*** gkadam has joined #openstack-swift | 03:57 | |
*** e0ne has joined #openstack-swift | 06:09 | |
*** e0ne has quit IRC | 06:18 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: PDF Documentation Build tox target https://review.opendev.org/679898 | 06:18 |
*** ccamacho has joined #openstack-swift | 06:42 | |
*** rcernin has quit IRC | 07:02 | |
*** tesseract has joined #openstack-swift | 07:13 | |
*** aluria has quit IRC | 07:33 | |
*** aluria has joined #openstack-swift | 07:38 | |
*** pcaruana has joined #openstack-swift | 07:49 | |
*** e0ne has joined #openstack-swift | 08:19 | |
*** tkajinam has quit IRC | 08:42 | |
*** rcernin has joined #openstack-swift | 10:08 | |
*** spsurya has joined #openstack-swift | 10:21 | |
*** rcernin has quit IRC | 12:22 | |
*** gkadam has quit IRC | 12:55 | |
*** BjoernT has joined #openstack-swift | 12:56 | |
*** e0ne has quit IRC | 13:19 | |
*** camelCaser has quit IRC | 13:28 | |
*** BjoernT_ has joined #openstack-swift | 13:41 | |
*** BjoernT has quit IRC | 13:44 | |
*** NM has joined #openstack-swift | 13:48 | |
*** e0ne has joined #openstack-swift | 13:50 | |
NM | Hello everyone. Mind if someone can point me in a direction: one of our sharded containers is reporting zero objects and its "X-Container-Sharding" header returns False. The container.recon confirms it's sharded ("state": "sharded"). | 13:52 |
NM | One thing we found out: although we work with 3 replicas, there are 6 db files: 3 regular ones and 3 handoffs. Are there any tips to "fix" this container? | 13:52 |
*** BjoernT_ has quit IRC | 13:58 | |
*** BjoernT has joined #openstack-swift | 14:08 | |
*** BjoernT_ has joined #openstack-swift | 14:12 | |
*** BjoernT has quit IRC | 14:13 | |
*** zaitcev has joined #openstack-swift | 14:59 | |
*** ChanServ sets mode: +v zaitcev | 14:59 | |
*** diablo_rojo__ has joined #openstack-swift | 15:05 | |
timburke | NM: sounds a lot like https://bugs.launchpad.net/swift/+bug/1839355 | 15:06 |
openstack | Launchpad bug 1839355 in OpenStack Object Storage (swift) "container-sharder should keep cleaving when there are no rows" [Undecided,In progress] - Assigned to Matthew Oliver (matt-0) | 15:06 |
timburke | good news is, mattoliverau's proposed a fix at https://review.opendev.org/#/c/675820/ | 15:06 |
patchbot | patch 675820 - swift - sharder: Keep cleaving on empty shard ranges - 4 patch sets | 15:06 |
*** gyee has joined #openstack-swift | 15:07 | |
*** NM has quit IRC | 15:52 | |
*** diablo_rojo__ is now known as diablo_rojo | 16:00 | |
openstackgerrit | Thiago da Silva proposed openstack/swift master: WIP: new versioning mode as separate middleware https://review.opendev.org/681054 | 16:00 |
*** tesseract has quit IRC | 16:07 | |
*** e0ne has quit IRC | 16:10 | |
*** spsurya has quit IRC | 16:27 | |
*** diablo_rojo has quit IRC | 16:49 | |
*** diablo_rojo has joined #openstack-swift | 17:02 | |
*** NM has joined #openstack-swift | 17:09 | |
*** camelCaser has joined #openstack-swift | 17:19 | |
NM | thanks timburke. Meanwhile, is there any workaround? Is it safe to delete the DBs on the handoff nodes? | 17:20 |
*** camelCaser has quit IRC | 17:28 | |
*** camelCaser has joined #openstack-swift | 17:28 | |
timburke | NM, most likely yes, it's safe. it also shouldn't really be *harming* anything, though... | 17:58 |
clayg | tdasilva: timburke: should I rebase p 673682 on top of p 681054 - or wait until we can talk more about it? | 17:58 |
patchbot | https://review.opendev.org/#/c/673682/ - swift - s3api: Implement versioning status API - 2 patch sets | 17:58 |
patchbot | https://review.opendev.org/#/c/681054/ - swift - WIP: new versioning mode as separate middleware - 1 patch set | 17:58 |
timburke | NM, why are we worried about it? is it making the account stats flop around or something? | 17:58 |
clayg | mattoliverau: IIRC, you seemed pretty vocal about versioned writes in the meeting last week - would you be available to talk about what our strategy should be ahead of the Wednesday meeting? | 17:59 |
timburke | clayg, heh, i've been tinkering with a rebase of that on top of https://review.opendev.org/#/c/678962/ ;-) | 17:59 |
patchbot | patch 678962 - swift - WIP: Add another versioning mode with a new naming... - 2 patch sets | 17:59 |
clayg | timburke: I think the big question is going to be "can we commit to land this BEFORE we go to China" ??? | 17:59 |
timburke | 👍 | 18:00 |
tdasilva | re the idea of auto-creating the versioned container, is it out of the question to create it in a .account? | 18:01 |
timburke | tdasilva, i think that'd throw off the account usage reporting a lot... | 18:02 |
*** zaitcev has quit IRC | 18:02 | |
clayg | ^ 👍 | 18:02 |
tdasilva | I know there was a concern about accounting, but I was wondering if versions should actually be a separate "count" from objects? that way users would know how many versions they have for a given account | 18:02 |
clayg | I mean if the counts and bytes went.. oh.. i see what you mean... | 18:03 |
clayg | but the bytes need to be billable | 18:03 |
clayg | oh but still - "bytes of version" vs "bytes of objects" | 18:03 |
timburke | even if your billing system were to assign bytes-used from .versions-<acct> to <acct>, the client gets no insight into which containers are costing them | 18:03 |
clayg | hrmm... | 18:03 |
tdasilva | right, but should that be a separate stat? | 18:03 |
tdasilva | right | 18:03 |
tdasilva | timburke: they get from the ?versions list, no? | 18:04 |
clayg | 😬 we need to think very carefully about this 🤔 | 18:04 |
tdasilva | and the container stat could also have a count of versions? maybe? | 18:04 |
clayg | we should definitely look at how s3 does it 🤣 | 18:04 |
tdasilva | ^^^ ! | 18:05 |
tdasilva | i think they charge even for delete markers | 18:05 |
clayg | yeah - two HEAD requests to fulfill a single client request is fine... if that's all we needed and it works better i think i could get behind that | 18:05 |
timburke | tdasilva, so you'd need to do a ?versions HEAD to each of your... hundreds? thousands? of containers to get an aggregate view? there's definite value in having bytes/objects show up in GET account... | 18:05 |
clayg | yes HEAD on account needs to have all the bytes - but we could have "bytes" and "bytes of versioned" returned by the API - I'm not seeing a good reason to argue that wouldn't work | 18:06 |
timburke | i guess maybe you could have it do two GETs for the client request, one for the base account, one for the .versions guy... in some way merge the listings... | 18:06 |
clayg | I mean you could even like *annotate* the containers in the bucket list | 18:07 |
clayg | does s3 have bucket accounting in list-buckets? | 18:07 |
timburke | what happens when you have a versions container but no base container? | 18:07 |
timburke | clayg, nope :P Name and CreationDate as i recall | 18:08 |
clayg | re-vivify? | 18:08 |
tdasilva | timburke: should that be allowed to happen? | 18:08 |
clayg | tdasilva: NO | 18:08 |
clayg | but... eventual consistency 😞 | 18:08 |
timburke | ...but eventual consistency will guarantee that it will at some point ;-) | 18:09 |
clayg | we can consider it an error case tho - and degrade gracefully as long as it's well understood | 18:09 |
tdasilva | going...i need to step out a bit for dinner, be back later | 18:10 |
clayg | as long as it's deterministic and we can explain how the client should proceed depending on what they want - I feel like that'd be workable. | 18:10 |
clayg | so basically re-vivify - every versions container gets an entry in the container listing - if there's not one there you have to bake it up - and the client can either delete all the versions or recreate the container | 18:11 |
clayg | i'm just spitballin | 18:11 |
clayg | I still think we should say "you can't delete this container until you've deleted all the versions" - we'll get it right most of the time | 18:11 |
*** e0ne has joined #openstack-swift | 18:12 | |
NM | timburke: not sure if that's the reason, but I'm getting some HTTP 503s when I try to list the container contents. When I look at my proxy error log I see this message: "ERROR with Container server x.x.x.x:6001/sdb "#012HTTPException: got more than 126 headers" | 18:15 |
*** zaitcev has joined #openstack-swift | 18:15 | |
*** ChanServ sets mode: +v zaitcev | 18:15 | |
timburke | NM, have you tried curling the container server directly and seeing how many headers are returned? you might try bumping up extra_header_count in swift.conf... | 18:18 |
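For reference, extra_header_count is a [swift-constraints] option in /etc/swift/swift.conf; a sketch with an arbitrary illustrative value (services need a restart to pick it up):

    [swift-constraints]
    # Illustrative value only: allow enough extra headers per request to cover
    # the accumulated X-Container-Sysmeta-Shard-Context-* entries (default 0).
    extra_header_count = 1500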
NM | If I curl the container-server directly, it returns 1288 headers of X-Container-Sysmeta-Shard-Context-GUIDNUMBER | 18:19 |
NM | (I was typing that :) ) | 18:19 |
timburke | O.o | 18:19 |
NM | And they all have the same values: {"max_row": -1, "ranges_todo": 0, "ranges_done": 8, "cleaving_done": true, "last_cleave_to_row": null, "misplaced_done": true, "cursor": "", "cleave_to_row": -1, "ref": "SOME UID IN THE HEADER"} | 18:20 |
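A hedged sketch of that direct check in Python -- the node address, device, partition, account and container below are placeholders, and the stdlib http.client caps response-header parsing at 100 by default, so that limit has to be raised just to count them:

    # Count cleaving-context sysmeta headers on a direct container-server HEAD.
    import http.client

    http.client._MAXHEADERS = 10000   # stdlib default is 100 -- too low here

    conn = http.client.HTTPConnection("203.0.113.10", 6001)   # placeholder node
    conn.request("HEAD", "/sdb/12345/AUTH_test/mycontainer")  # placeholder path
    resp = conn.getresponse()
    contexts = [name for name, value in resp.getheaders()
                if name.lower().startswith("x-container-sysmeta-shard-context-")]
    print("%d cleaving-context headers" % len(contexts))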
timburke | i expect the primaries are also going to have problems then... we really need to do something to clean up old records :-( | 18:21 |
timburke | you should be able to POST to clear those -- i forget if we support x-remove-container-sysmeta-* like we do for user meta or not though. worst case, curl lets you specify a blank header with something like -H 'X-Container-Sysmeta-Shard-Context-GUIDNUMBER;' | 18:23 |
NM | timburke: So I can assume it's safe to delete then, right? I didn't find anything about this header and I was somewhat curious about it. | 18:26 |
timburke | NM, we use it to track sharding progress so we know when it's safe to delete an old DB. if it's not present, we might reshard a container, but that's ok -- everything will continue moving toward the correct state | 18:28 |
*** e0ne has quit IRC | 18:34 | |
timburke | NM, filed https://bugs.launchpad.net/swift/+bug/1843313 | 18:38 |
openstack | Launchpad bug 1843313 in OpenStack Object Storage (swift) "Sharding handoffs creates a *ton* of container-server headers" [Undecided,New] | 18:38 |
timburke | if you want all the nitty-gritty details, look for "CleavingContext" or "cleaving_context" https://github.com/openstack/swift/blob/master/swift/container/sharder.py | 18:40 |
timburke | there's a little bit about that metadata in https://docs.openstack.org/swift/latest/overview_container_sharding.html#cleaving-shard-containers -- we should probably include an example of the header, though, to help with search-ability | 18:42 |
*** NM has quit IRC | 18:54 | |
*** diablo_rojo has quit IRC | 18:56 | |
*** diablo_rojo has joined #openstack-swift | 18:56 | |
*** BjoernT has joined #openstack-swift | 18:59 | |
*** BjoernT_ has quit IRC | 19:00 | |
*** NM has joined #openstack-swift | 19:02 | |
*** e0ne has joined #openstack-swift | 19:03 | |
*** e0ne has quit IRC | 19:09 | |
*** e0ne has joined #openstack-swift | 19:10 | |
*** BjoernT_ has joined #openstack-swift | 19:12 | |
*** BjoernT has quit IRC | 19:13 | |
*** henriqueof1 has joined #openstack-swift | 19:18 | |
*** baojg has quit IRC | 19:19 | |
*** henriqueof1 has quit IRC | 19:20 | |
*** henriqueof has joined #openstack-swift | 19:20 | |
*** NM has quit IRC | 19:44 | |
*** NM has joined #openstack-swift | 19:44 | |
clayg | dude, I have no idea what we're going to do about the listing ordering thing - what a hoser - it needs a column 😢 | 19:54 |
timburke | clayg, the nice thing about '\x01' is that nothing can squeeze in before it ;-) | 20:26 |
timburke | my naming scheme with '\x01\x01' gets even better with https://review.opendev.org/#/c/609843/ | 20:27 |
patchbot | patch 609843 - swift - Allow arbitrary UTF-8 strings as delimiters in lis... - 4 patch sets | 20:27 |
clayg | I mean if it *works* that's *amazing* - I wonder if we could make it an implementation detail to the database somehow? | 20:28 |
clayg | Maybe it's only at the container server or broker just for versioned containers - but the API can still be sane looking? | 20:28 |
clayg | why do you need \x01\x01 if \x01 works? | 20:28 |
clayg | a versioned container could potentially have a lot of context - we could know that \x01<timestamp> is like *not* part of the name when returning results | 20:29 |
timburke | makes it more likely that we'll get the right sort order even if clients include \x01 in names | 20:29 |
timburke | (provided *they* don't double them up, *too*) | 20:29 |
clayg | so it can still go south if someone puts \x01 or \x01\x01 anywhere *in* the name? 😞 | 20:30 |
clayg | I mean if it doesn't *work* - I don't think it's worth it - let's just rewrite the container-server - if it works!? awesome kludge! 👍 | 20:31 |
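A minimal illustration of the sort-order argument, with made-up names:

    # Made-up names: version rows carry a \x01-delimited suffix; a client name
    # that itself contains \x01 can land inside that block unless the
    # delimiter is doubled up.
    versions_single = ["obj\x01" + ts for ts in ("0001", "0002")]
    versions_double = ["obj\x01\x01" + ts for ts in ("0001", "0002")]
    client_name = "obj\x010001a"   # a client-chosen name containing \x01

    print(sorted(versions_single + [client_name]))
    # ['obj\x010001', 'obj\x010001a', 'obj\x010002'] -- interleaved

    print(sorted(versions_double + [client_name]))
    # ['obj\x01\x010001', 'obj\x01\x010002', 'obj\x010001a'] -- version rows
    # stay together, until a client doubles up \x01 too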
*** notmyname has quit IRC | 20:32 | |
timburke | idk about getting this down to the broker -- one of the things i really like about how VW works today is that you've got enough visibility as a client to be able to repair things even if there's a bug in how you list/retrieve versions | 20:32 |
*** patchbot has quit IRC | 20:32 | |
timburke | fwiw, \x00 would be a kludge that'd *always* work... provided we continue restricting *clients* from creating things with null bytes when we open it up for *internal* requests | 20:35 |
NM | timburke: thanks for opening a bug report. I can't see the headers if I send the request to the proxies; I can only see them when I do a GET to the container-server directly. I tried to send a POST to the container-server but it complains about the "Missing X-Timestamp header". | 20:35 |
clayg | I like that *operators* have flexibility to "fix" things - I *hate* that versioned writes today makes us worry all the time about "well what if a client goes behind the curtain and messes everything up!" 😞 | 20:35 |
clayg | timburke: yes! let's do \x00 then? why doesn't it "always" work - why would you suggest \x01\x01 then!? 😕 | 20:36 |
timburke | clayg, i'm not sure how deep the "no null bytes in paths" thing goes, though. some unlikely-to-be-used but already valid string seemed "safe enough" | 20:38 |
timburke | *shrug* | 20:38 |
timburke | if clients go and mess things up behind our back, they're on their own; it's undefined behavior. we should do our best to do something "reasonable", but as long as it's not a 500... | 20:39 |
timburke | NM, add a -H 'X-Timestamp: 1568061615.94565' or so. it's more or less just a unix timestamp | 20:41 |
timburke | use something like `python -c 'from swift.common.utils import Timestamp; print(Timestamp.now().internal)'` if you *really* want it to be up-to-date ;-) | 20:41 |
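Putting the blank-header trick and the X-Timestamp requirement together, a hedged Python sketch of such a direct container-server POST -- every concrete value below is a placeholder, and an empty value is what removes a sysmeta entry:

    # Sketch only: clear one stale cleaving-context entry by POSTing an empty
    # value straight at a container server.
    import time
    from http.client import HTTPConnection

    ctx_ref = "REF-UUID-FROM-THE-HEADER"   # placeholder "ref" value
    path = "/%s/%s/%s/%s" % ("sdb", "12345", "AUTH_test", "mycontainer")

    conn = HTTPConnection("203.0.113.10", 6001)   # placeholder container node
    conn.request("POST", path, headers={
        "X-Timestamp": "%.5f" % time.time(),   # container servers require this
        "X-Container-Sysmeta-Shard-Context-" + ctx_ref: "",   # empty => remove
    })
    resp = conn.getresponse()
    print(resp.status, resp.reason)   # expect 204 on success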
*** patchbot has joined #openstack-swift | 20:42 | |
*** notmyname has joined #openstack-swift | 20:42 | |
*** ChanServ sets mode: +v notmyname | 20:42 | |
*** mvkr has joined #openstack-swift | 20:48 | |
NM | timburke: that is what I call up-to-date :) Anyway, do you see any relation between these headers and the container reporting 0 objects and not listing its own objects? | 21:04 |
NM | (And also the x-container-sharding as False) | 21:05 |
DHE | anyone see a problem if I had objects updated every ~5 seconds? | 21:05 |
timburke | NM, once the primaries get all those headers, the proxies are always going to get the same 503 from them that you were seeing. presumably there's a container PUT at some point when clients go to upload, which is why we've got a bunch of handoffs and the heap of cleaving metadata. the freshly-created handoff containers would be the only things responsive, so... no objects :-( | 21:10 |
timburke | i'm guessing that replication's not real happy right now, either... | 21:10 |
timburke | DHE, how many objects are we talking? | 21:13 |
timburke | and why are we writing them every 5s? are clients going to run into any trouble if they get a stale read? | 21:15 |
NM | timburke: should I be worried about this container? At least the containers on .shard_AUTH are responding correctly. | 21:19 |
timburke | NM, i think once you get the old cleaving sysmeta out of there, it should sort itself out. you might need to do that periodically for a bit though, at least until https://bugs.launchpad.net/swift/+bug/1843313 is fixed and you can upgrade to a version that includes the fix :-( sorry | 21:21 |
openstack | Launchpad bug 1843313 in OpenStack Object Storage (swift) "Sharding handoffs creates a *ton* of container-server headers" [Undecided,New] | 21:21 |
timburke | NM, how long has the container been sharded? | 21:21 |
timburke | is it on your default storage policy, or some other one? | 21:22 |
timburke | if it's non-default, https://bugs.launchpad.net/swift/+bug/1836082 is a worry... | 21:23 |
openstack | Launchpad bug 1836082 in OpenStack Object Storage (swift) "Reconciler-enqueuing needs to be shard-aware" [High,Fix released] | 21:23 |
timburke | what version of swift are you on? | 21:23 |
timburke | (probably should have been my *first* question ;-) | 21:23 |
NM | timburke: LOL. I should also have said that earlier. Sharding: It was done 2 or 3 months ago. Replicas: I'm using the default (3 replicas). Version: 2.20.0 | 21:24 |
timburke | replicated vs ec is good to know, but i was thinking more about how many storage policies you've got defined in swift.conf and whether this container's policy matches whichever is flagged as default in that config | 21:26 |
timburke | that reconciler bug definitely affects 2.20.0 | 21:27 |
NM | We don't use EC right now. One thing about this container (I don't know if it's relevant): it receives lots of tar.gz files to be unpacked and stored. | 21:28 |
DHE | timburke: a lot of objects, but maybe 2000 will be kept busy... | 21:28 |
DHE | timburke: I'm okay with the previous version being read, MAYBE 2, but that's about my limit... | 21:30 |
DHE | the only failure scenario I can think of is a brief window when an object server is down, comes back up, and there's a ~5 second window where a user could fetch an ancient version | 21:30 |
DHE | where 5 minutes is considered ancient | 21:30 |
timburke | NM, that might explain where the container PUTs are coming from: https://github.com/openstack/swift/blob/2.20.0/swift/common/middleware/bulk.py#L303-L309 | 21:32 |
timburke | NM, couple what you're seeing with https://bugs.launchpad.net/swift/+bug/1833612 -- yeah, that's probably gonna create a heap of handoffs | 21:32 |
openstack | Launchpad bug 1833612 in OpenStack Object Storage (swift) "Overloaded container can get erroneously cached as 404" [Undecided,Fix released] | 21:32 |
NM | timburke: Hummm… Do you see any way to fix this? Like, list all shards and use this list to 'feed' the original container? | 21:38 |
*** e0ne has quit IRC | 21:40 | |
timburke | NM, i *think* the shards should be ok. it's definitely worth correcting me if i'm wrong though! if you do a direct POST to clear the unneeded headers to just one of the primaries, it'll propagate to the other replicas, including the handoffs | 21:42 |
timburke | and it should fix that replica pretty much immediately. you'll probably have to wait for it to no longer be error-limited, though | 21:44 |
timburke | DHE, will the client be smart enough to see that the ancient version is ancient and either retry or bomb out? how big's the cluster? what kind of policy will those objects use? trying to figure out how many objects are likely to land on the same disk and cause contention... | 21:47 |
timburke | and what's the read/write ratio? given how often we're writing, i'd expect that to dominate, but just want to confirm | 21:49 |
*** BjoernT_ has quit IRC | 21:54 | |
DHE | timburke: it's been ordered, but hardware is still a few weeks out. looking at a somewhat geographically diverse cluster. it's basically being used for live TV. might be no hits, might be 1000 hits per second. but I plan to put some edge caching in place in front of the proxy servers, even if it's on the same host. | 22:17 |
DHE | hmm.. maybe I could just set an expiration time of like 30 seconds... | 22:18 |
timburke | DHE, and use a naming scheme that will give each write a distinct name. let the client derive the expected name based on current time (or maybe even better, some server-provided time) | 22:20 |
DHE | I don't have that luxury | 22:21 |
timburke | with client time, you've got to worry about some client that thinks it's in the future. with server time, you've gotta worry about caching | 22:21 |
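For illustration, a hedged sketch of the time-derived naming timburke describes; the prefix and 5-second slot are assumptions:

    # Illustrative only: derive an object name from a 5-second time slot so a
    # writer and a later reader compute the same name independently.
    import time

    SLOT = 5   # seconds per segment, to match the ~5s write cadence

    def segment_name(prefix="live/segment", now=None):
        now = time.time() if now is None else now
        return "%s-%010d" % (prefix, int(now // SLOT) * SLOT)

    # the writer PUTs segment_name(); a reader derives the same name, or steps
    # back one slot (now - SLOT) if the newest segment isn't visible yet
    print(segment_name())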
DHE | I've been pushing back that swift wasn't optimal for live content, cache or not | 22:22 |
timburke | yeah, it's definitely got me a little nervous... likely to see async pendings piling up, but that's not *so* bad. stale reads sound like a much more likely problem, and more customer-facing | 22:25 |
timburke | how big's 5s worth of content? i wonder if you could serve it more or less straight from memcached... | 22:27 |
timburke | swift is great at durability. our scaling and performance are pretty good, but way less so when you're hammering just a handful of objects. need that nice, broad distribution of load | 22:29 |
timburke | meanwhile, this use-case seems to be tossing the durability aspect out the window | 22:30 |
timburke | definitely tee it off to swift so you can use it for on-demand! i'm less sure about the live stream | 22:31 |
*** tkajinam has joined #openstack-swift | 22:55 | |
*** rcernin has joined #openstack-swift | 22:59 | |
NM | timburke: I've tried curl -v -H "X-Container-Sysmeta-Shard-Context-7ddb3….-547595833ff8;" --data "" -H"X-Timestamp: 1568070332.21750" http://MY_IP:6001/sda/42306/AUTH_643f797035bf416ba8001e95947622c0/components | 23:13 |
NM | And curl -v -H "X-Remove-Container-Sysmeta-Shard-Context-7ddb3….-547595833ff8: x" --data "" -H"X-Timestamp: 1568070332.21750" http://MY_IP:6001/sda/42306/AUTH_643f797035bf416ba8001e95947622c0/components | 23:14 |
*** threestrands has joined #openstack-swift | 23:14 | |
NM | Neither of them seems to work. At some point I successfully set the header to "{}" but after some time it went back to the data about the sharding process. | 23:15 |
timburke | this was for one of the UUIDs that claimed "cleaving_done": true, "misplaced_done": true, yeah? hmm... | 23:17 |
NM | Yeah! {"max_row": -1, "ranges_todo": 0, "ranges_done": 8, "cleaving_done": true, "last_cleave_to_row": null, "misplaced_done": true, "cursor": "", "cleave_to_row": -1, "ref": "7dd…"} | 23:18 |
timburke | there's a chance that it just happened to be one of the handoffs out there sharding, i suppose... | 23:19 |
NM | I see. My last shot was to send the POST to all container servers at once, but that didn't work either. | 23:21 |
timburke | at like 6/1000 or so, the odds seem against it, though... | 23:23 |
NM | The 3 primary DBs are sharded. Handoffs 4 and 5 are unsharded and handoff 6 is sharding. | 23:26 |
NM | Considering the "X-Backend-Sharding-State" header | 23:27 |
timburke | have you looked further out in the handoff list? i'm a little worried that replication will poison some of the handoffs so *they'd* start responding with too many headers, too... | 23:28 |
NM | One handoff is "poisoned" - the one that says it's sharding. The other 2 are not, but they say "X-Backend-Sharding-State: unsharded" | 23:33 |
timburke | NM, what about handoffs 7, 8? might want to add a --all to your swift-get-nodes to see more handoffs | 23:43 |
timburke | sorry, i gotta head out... the surgery might be a little more involved than i'd originally hoped, sorry NM :-( | 23:45 |
NM | timburke: sure! Thanks anyway. Tomorrow I'll get back to this. | 23:47 |
NM | timburke: (Do you mean real surgery or are you talking about swift?) | 23:48 |
*** NM has quit IRC | 23:50 |