*** rcernin has quit IRC | 00:30 | |
*** ccamacho has quit IRC | 00:51 | |
*** rcernin has joined #openstack-swift | 01:45 | |
*** baojg has quit IRC | 02:47 | |
*** baojg has joined #openstack-swift | 02:48 | |
*** gkadam has joined #openstack-swift | 03:57 | |
*** e0ne has joined #openstack-swift | 06:09 | |
*** e0ne has quit IRC | 06:18 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: PDF Documentation Build tox target https://review.opendev.org/679898 | 06:18 |
*** ccamacho has joined #openstack-swift | 06:42 | |
*** rcernin has quit IRC | 07:02 | |
*** tesseract has joined #openstack-swift | 07:13 | |
*** aluria has quit IRC | 07:33 | |
*** aluria has joined #openstack-swift | 07:38 | |
*** pcaruana has joined #openstack-swift | 07:49 | |
*** e0ne has joined #openstack-swift | 08:19 | |
*** tkajinam has quit IRC | 08:42 | |
*** rcernin has joined #openstack-swift | 10:08 | |
*** spsurya has joined #openstack-swift | 10:21 | |
*** rcernin has quit IRC | 12:22 | |
*** gkadam has quit IRC | 12:55 | |
*** BjoernT has joined #openstack-swift | 12:56 | |
*** e0ne has quit IRC | 13:19 | |
*** camelCaser has quit IRC | 13:28 | |
*** BjoernT_ has joined #openstack-swift | 13:41 | |
*** BjoernT has quit IRC | 13:44 | |
*** NM has joined #openstack-swift | 13:48 | |
*** e0ne has joined #openstack-swift | 13:50 | |
NM | Hello everyone. Mind if someone can point me in a direction: one of our sharded containers is reporting zero objects and its "X-Container-Sharding" header returns False. The container.recon confirms it's sharded ("state": "sharded"). | 13:52 |
NM | One thing we found out: although we work with 3 replicas, there are 6 db files: 3 regular ones and 3 handoffs. Are there any tips to "fix" this container? | 13:52 |
*** BjoernT_ has quit IRC | 13:58 | |
*** BjoernT has joined #openstack-swift | 14:08 | |
*** BjoernT_ has joined #openstack-swift | 14:12 | |
*** BjoernT has quit IRC | 14:13 | |
*** zaitcev has joined #openstack-swift | 14:59 | |
*** ChanServ sets mode: +v zaitcev | 14:59 | |
*** diablo_rojo__ has joined #openstack-swift | 15:05 | |
timburke | NM: sounds a lot like https://bugs.launchpad.net/swift/+bug/1839355 | 15:06 |
openstack | Launchpad bug 1839355 in OpenStack Object Storage (swift) "container-sharder should keep cleaving when there are no rows" [Undecided,In progress] - Assigned to Matthew Oliver (matt-0) | 15:06 |
timburke | good news is, mattoliverau's proposed a fix at https://review.opendev.org/#/c/675820/ | 15:06 |
patchbot | patch 675820 - swift - sharder: Keep cleaving on empty shard ranges - 4 patch sets | 15:06 |
*** gyee has joined #openstack-swift | 15:07 | |
*** NM has quit IRC | 15:52 | |
*** diablo_rojo__ is now known as diablo_rojo | 16:00 | |
openstackgerrit | Thiago da Silva proposed openstack/swift master: WIP: new versioning mode as separate middleware https://review.opendev.org/681054 | 16:00 |
*** tesseract has quit IRC | 16:07 | |
*** e0ne has quit IRC | 16:10 | |
*** spsurya has quit IRC | 16:27 | |
*** diablo_rojo has quit IRC | 16:49 | |
*** diablo_rojo has joined #openstack-swift | 17:02 | |
*** NM has joined #openstack-swift | 17:09 | |
*** camelCaser has joined #openstack-swift | 17:19 | |
NM | thanks timburke. Meanwhile, is there any workaround? Is it safe to delete the DBs on the handoff nodes? | 17:20 |
*** camelCaser has quit IRC | 17:28 | |
*** camelCaser has joined #openstack-swift | 17:28 | |
timburke | NM, most likely yes, it's safe. it also shouldn't really be *harming* anything, though... | 17:58 |
clayg | tdasilva: timburke: should I rebase p 673682 on top of p 681054 - or wait until we can talk more about it? | 17:58 |
patchbot | https://review.opendev.org/#/c/673682/ - swift - s3api: Implement versioning status API - 2 patch sets | 17:58 |
patchbot | https://review.opendev.org/#/c/681054/ - swift - WIP: new versioning mode as separate middleware - 1 patch set | 17:58 |
timburke | NM, why are we worried about it? is it making the account stats flop around or something? | 17:58 |
clayg | mattoliverau: IIRC, you seemed pretty vocal about versioned writes in the meeting last week - would you be available to talk about what our strategy should be ahead of the Wednesday meeting? | 17:59 |
timburke | clayg, heh, i've been tinkering with a rebase of that on top of https://review.opendev.org/#/c/678962/ ;-) | 17:59 |
patchbot | patch 678962 - swift - WIP: Add another versioning mode with a new naming... - 2 patch sets | 17:59 |
clayg | timburke: I think the big question is going to be "can we commit to land this BEFORE we go to China" ??? | 17:59 |
timburke | 👍 | 18:00 |
tdasilva | re the idea of auto-creating the versioned container, is it out of the question to create it in a .account? | 18:01 |
timburke | tdasilva, i think that'd throw off the account usage reporting a lot... | 18:02 |
*** zaitcev has quit IRC | 18:02 | |
clayg | ^ 👍 | 18:02 |
tdasilva | I know there was a concern about accounting, but I was wondering if versions should actually be a separate "count" from objects? that way users would know how many versions they have for a given account | 18:02 |
clayg | I mean if the counts and bytes went.. oh.. i see what you mean... | 18:03 |
clayg | but the bytes need to be billable | 18:03 |
clayg | oh but still - "bytes of version" vs "bytes of objects" | 18:03 |
timburke | even if your billing system were to assign bytes-used from .versions-<acct> to <acct>, the client gets no insight into which containers are costing them | 18:03 |
clayg | hrmm... | 18:03 |
tdasilva | right, but should that be a separate stat? | 18:03 |
tdasilva | right | 18:03 |
tdasilva | timburke: they get from the ?versions list, no? | 18:04 |
clayg | 😬 we need to think very carefully about this 🤔 | 18:04 |
tdasilva | and the container stat could also have a count of versions? maybe? | 18:04 |
clayg | we should definitely look at how s3 does it 🤣 | 18:04 |
tdasilva | ^^^ ! | 18:05 |
tdasilva | i think they charge even for delete markers | 18:05 |
clayg | yeah - two HEAD requests to fulfill a single client request is fine... if that's all we needed and it works better i think i could get behind that | 18:05 |
timburke | tdasilva, so you'd need to do a ?versions HEAD to each of your... hundreds? thousands? of containers to get an aggregate view? there's definite value in having bytes/objects show up in GET account... | 18:05 |
clayg | yes HEAD on account needs to have all the bytes - but we could have "bytes" and "bytes of versioned" returned by the API - I'm not seeing a good reason to argue that wouldn't work | 18:06 |
timburke | i guess maybe you could have it do two GETs for the client request, one for the base account, one for the .versions guy... in some way merge the listings... | 18:06 |
clayg | I mean you could even like *annotate* the containers in the bucket list | 18:07 |
clayg | does s3 have bucket accounting in list-buckets? | 18:07 |
timburke | what happens when you have a versions container but no base container? | 18:07 |
timburke | clayg, nope :P Name and CreationDate as i recall | 18:08 |
clayg | re-vivify? | 18:08 |
tdasilva | timburke: should that be allowed to happen? | 18:08 |
clayg | tdasilva: NO | 18:08 |
clayg | but... eventual consistency 😞 | 18:08 |
timburke | ...but eventual consistency will guarantee that it will at some point ;-) | 18:09 |
clayg | we can consider it an error case tho - and degrade gracefully as long as it's well understood | 18:09 |
tdasilva | going...i need to step out a bit for dinner, be back later | 18:10 |
clayg | as long as it's deterministic and we can explain how the client should proceed depending on what they want - I feel like that'd be workable. | 18:10 |
clayg | so basically re-vivify - every versions container gets an entry in the container listing - if there's not one there you have to bake it up - and the client can either delete all the versions or recreate the container | 18:11 |
clayg | i'm just spitballin | 18:11 |
clayg | I still think we should say "you can't delete this container until you've deleted all the versions" - we'll get it right most of the time | 18:11 |
*** e0ne has joined #openstack-swift | 18:12 | |
NM | timburke: not sure if that's the reason, but I'm getting some HTTP 503s when I try to list the container contents. When I look at my proxy error log I see this message: "ERROR with Container server x.x.x.x:6001/sdb "#012HTTPException: got more than 126 headers" | 18:15 |
*** zaitcev has joined #openstack-swift | 18:15 | |
*** ChanServ sets mode: +v zaitcev | 18:15 | |
timburke | NM, have you tried curling the container server directly and seeing how many headers are returned? you might try bumping up extra_header_count in swift.conf... | 18:18 |
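For reference, extra_header_count is a [swift-constraints] option in /etc/swift/swift.conf; a sketch with an arbitrary illustrative value (services need a restart to pick it up):

    [swift-constraints]
    # Illustrative value only: allow enough extra headers per request to cover
    # the accumulated X-Container-Sysmeta-Shard-Context-* entries (default 0).
    extra_header_count = 1500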
NM | If I curl the container-server directly, it returns 1288 headers of X-Container-Sysmeta-Shard-Context-GUIDNUMBER | 18:19 |
NM | (I was typing that :) ) | 18:19 |
timburke | O.o | 18:19 |
NM | And they all have the same values: {"max_row": -1, "ranges_todo": 0, "ranges_done": 8, "cleaving_done": true, "last_cleave_to_row": null, "misplaced_done": true, "cursor": "", "cleave_to_row": -1, "ref": "SOME UID IN THE HEADER"} | 18:20 |
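A hedged sketch of that direct check in Python -- the node address, device, partition, account and container below are placeholders, and the stdlib http.client caps response-header parsing at 100 by default, so that limit has to be raised just to count them:

    # Count cleaving-context sysmeta headers on a direct container-server HEAD.
    import http.client

    http.client._MAXHEADERS = 10000   # stdlib default is 100 -- too low here

    conn = http.client.HTTPConnection("203.0.113.10", 6001)   # placeholder node
    conn.request("HEAD", "/sdb/12345/AUTH_test/mycontainer")  # placeholder path
    resp = conn.getresponse()
    contexts = [name for name, value in resp.getheaders()
                if name.lower().startswith("x-container-sysmeta-shard-context-")]
    print("%d cleaving-context headers" % len(contexts))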
timburke | i expect the primaries are also going to have problems then... we really need to do something to clean up old records :-( | 18:21 |
timburke | you should be able to POST to clear those -- i forget if we support x-remove-container-sysmeta-* like we do for user meta or not though. worst case, curl lets you specify a blank header with something like -H 'X-Container-Sysmeta-Shard-Context-GUIDNUMBER;' | 18:23 |
NM | timburke: So I can assume it's safe to delete then, right? I didn't find anything about this header and I was somewhat curious about it. | 18:26 |
timburke | NM, we use it to track sharding progress so we know when it's safe to delete an old DB. if it's not present, we might reshard a container, but that's ok -- everything will continue moving toward the correct state | 18:28 |
*** e0ne has quit IRC | 18:34 | |
timburke | NM, filed https://bugs.launchpad.net/swift/+bug/1843313 | 18:38 |
openstack | Launchpad bug 1843313 in OpenStack Object Storage (swift) "Sharding handoffs creates a *ton* of container-server headers" [Undecided,New] | 18:38 |
timburke | if you want all the nitty-gritty details, look for "CleavingContext" or "cleaving_context" https://github.com/openstack/swift/blob/master/swift/container/sharder.py | 18:40 |
timburke | there's a little bit about that metadata in https://docs.openstack.org/swift/latest/overview_container_sharding.html#cleaving-shard-containers -- we should probably include an example of the header, though, to help with search-ability | 18:42 |
*** NM has quit IRC | 18:54 | |
*** diablo_rojo has quit IRC | 18:56 | |
*** diablo_rojo has joined #openstack-swift | 18:56 | |
*** BjoernT has joined #openstack-swift | 18:59 | |
*** BjoernT_ has quit IRC | 19:00 | |
*** NM has joined #openstack-swift | 19:02 | |
*** e0ne has joined #openstack-swift | 19:03 | |
*** e0ne has quit IRC | 19:09 | |
*** e0ne has joined #openstack-swift | 19:10 | |
*** BjoernT_ has joined #openstack-swift | 19:12 | |
*** BjoernT has quit IRC | 19:13 | |
*** henriqueof1 has joined #openstack-swift | 19:18 | |
*** baojg has quit IRC | 19:19 | |
*** henriqueof1 has quit IRC | 19:20 | |
*** henriqueof has joined #openstack-swift | 19:20 | |
*** NM has quit IRC | 19:44 | |
*** NM has joined #openstack-swift | 19:44 | |
clayg | dude, I have no idea what we're going to do about the listing ordering thing - what a hoser - it needs a column 😢 | 19:54 |
timburke | clayg, the nice thing about '\x01' is that nothing can squeeze in before it ;-) | 20:26 |
timburke | my naming scheme with '\x01\x01' gets even better with https://review.opendev.org/#/c/609843/ | 20:27 |
patchbot | patch 609843 - swift - Allow arbitrary UTF-8 strings as delimiters in lis... - 4 patch sets | 20:27 |
clayg | I mean if it *works* that's *amazing* - I wonder if we could make it an implementation detail to the database somehow? | 20:28 |
clayg | Maybe it's only at the container server or broker just for versioned containers - but the API can still be sane looking? | 20:28 |
clayg | why do you need \x01\x01 if \x01 works? | 20:28 |
clayg | a versioned container could potentially have a lot of context - we could know that \x01<timestamp> is like *not* part of the name when returning results | 20:29 |
timburke | makes it more likely that we'll get the right sort order even if clients include \x01 in names | 20:29 |
timburke | (provided *they* don't double them up, *too*) | 20:29 |
clayg | so it can still go south if someone puts \x01 or \x01\x01 anywhere *in* the name? 😞 | 20:30 |
clayg | I mean if it doesn't *work* - I don't think it's worth it - let's just rewrite the container-server - if it works!? awesome kludge! 👍 | 20:31 |
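A minimal illustration of the sort-order argument, with made-up names:

    # Made-up names: version rows carry a \x01-delimited suffix; a client name
    # that itself contains \x01 can land inside that block unless the
    # delimiter is doubled up.
    versions_single = ["obj\x01" + ts for ts in ("0001", "0002")]
    versions_double = ["obj\x01\x01" + ts for ts in ("0001", "0002")]
    client_name = "obj\x010001a"   # a client-chosen name containing \x01

    print(sorted(versions_single + [client_name]))
    # ['obj\x010001', 'obj\x010001a', 'obj\x010002'] -- interleaved

    print(sorted(versions_double + [client_name]))
    # ['obj\x01\x010001', 'obj\x01\x010002', 'obj\x010001a'] -- version rows
    # stay together, until a client doubles up \x01 too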
*** notmyname has quit IRC | 20:32 | |
timburke | idk about getting this down to the broker -- one of the things i really like about how VW works today is that you've got enough visibility as a client to be able to repair things even if there's a bug in how you list/retrieve versions | 20:32 |
*** patchbot has quit IRC | 20:32 | |
timburke | fwiw, \x00 would be a kludge that'd *always* work... provided we continue restricting *clients* from creating things with null bytes when we open it up for *internal* requests | 20:35 |
NM | timburke: thanks for opening a bug report. I can't see the headers if I send the request to the proxies; I can only see them when I do a GET to the container-server directly. I tried to send a POST to the container-server but it complains about the "Missing X-Timestamp header". | 20:35 |
clayg | I like that *operators* have flexibility to "fix" things - I *hate* that versioned writes today makes us worry all the time about "well what if a client goes behind the curtain and messes everything up!" 😞 | 20:35 |
clayg | timburke: yes! let's do \x00 then? why doesn't it "always" work - why would you suggest \x01\x01 then!? 😕 | 20:36 |
timburke | clayg, i'm not sure how deep the "no null bytes in paths" thing goes, though. some unlikely-to-be-used but already valid string seemed "safe enough" | 20:38 |
timburke | *shrug* | 20:38 |
timburke | if clients go and mess things up behind our back, they're on their own; it's undefined behavior. we should do our best to do something "reasonable", but as long as it's not a 500... | 20:39 |
timburke | NM, add a -H 'X-Timestamp: 1568061615.94565' or so. it's more or less just a unix timestamp | 20:41 |
timburke | use something like `python -c 'from swift.common.utils import Timestamp; print(Timestamp.now().internal)'` if you *really* want it to be up-to-date ;-) | 20:41 |
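Putting the blank-header trick and the X-Timestamp requirement together, a hedged Python sketch of such a direct container-server POST -- every concrete value below is a placeholder, and an empty value is what removes a sysmeta entry:

    # Sketch only: clear one stale cleaving-context entry by POSTing an empty
    # value straight at a container server.
    import time
    from http.client import HTTPConnection

    ctx_ref = "REF-UUID-FROM-THE-HEADER"   # placeholder "ref" value
    path = "/%s/%s/%s/%s" % ("sdb", "12345", "AUTH_test", "mycontainer")

    conn = HTTPConnection("203.0.113.10", 6001)   # placeholder container node
    conn.request("POST", path, headers={
        "X-Timestamp": "%.5f" % time.time(),   # container servers require this
        "X-Container-Sysmeta-Shard-Context-" + ctx_ref: "",   # empty => remove
    })
    resp = conn.getresponse()
    print(resp.status, resp.reason)   # expect 204 on success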
*** patchbot has joined #openstack-swift | 20:42 | |
*** notmyname has joined #openstack-swift | 20:42 | |
*** ChanServ sets mode: +v notmyname | 20:42 | |
*** mvkr has joined #openstack-swift | 20:48 | |
NM | timburke: that is what I call up-to-date :) Anyway, do you see any relation between these headers and the container reporting 0 objects and not listing its own objects? | 21:04 |
NM | (And also the x-container-sharding as False) | 21:05 |
DHE | anyone see a problem if I had objects updated every ~5 seconds? | 21:05 |
timburke | NM, once the primaries get all those headers, the proxies are always going to get the same 503 from them that you were seeing. presumably there's a container PUT at some point when clients go to upload, which is why we've got a bunch of handoffs and the heap of cleaving metadata. the freshly-created handoff containers would be the only things responsive, so... no objects :-( | 21:10 |
timburke | i'm guessing that replication's not real happy right now, either... | 21:10 |
timburke | DHE, how many objects are we talking? | 21:13 |
timburke | and why are we writing them every 5s? are clients going to run into any trouble if they get a stale read? | 21:15 |
NM | timburke: should I be worried about this container? At least the containers on .shard_AUTH are responding correctly. | 21:19 |
timburke | NM, i think once you get the old cleaving sysmeta out of there, it should sort itself out. you might need to do that periodically for a bit though, at least until https://bugs.launchpad.net/swift/+bug/1843313 is fixed and you can upgrade to a version that includes the fix :-( sorry | 21:21 |
openstack | Launchpad bug 1843313 in OpenStack Object Storage (swift) "Sharding handoffs creates a *ton* of container-server headers" [Undecided,New] | 21:21 |
timburke | NM, how long has the container been sharded? | 21:21 |
timburke | is it on your default storage policy, or some other one? | 21:22 |
timburke | if it's non-default, https://bugs.launchpad.net/swift/+bug/1836082 is a worry... | 21:23 |
openstack | Launchpad bug 1836082 in OpenStack Object Storage (swift) "Reconciler-enqueuing needs to be shard-aware" [High,Fix released] | 21:23 |
timburke | what version of swift are you on? | 21:23 |
timburke | (probably should have been my *first* question ;-) | 21:23 |
NM | timburke: LOL. I should also have said that earlier. Sharding: It was done 2 or 3 months ago. Replicas: I'm using the default (3 replicas). Version: 2.20.0 | 21:24 |
timburke | replicated vs ec is good to know, but i was thinking more about how many storage policies you've got defined in swift.conf and whether this container's policy matches whichever is flagged as default in that config | 21:26 |
timburke | that reconciler bug definitely affects 2.20.0 | 21:27 |
NM | We don't use EC right now. One thing about this container (I don't know if it's relevant): it receives lots of tar.gz files to be unpacked and stored. | 21:28 |
DHE | timburke: a lot of objects, but maybe 2000 will be kept busy... | 21:28 |
DHE | timburke: I'm okay with the previous version being read, MAYBE 2, but that's about my limit... | 21:30 |
DHE | the only failure scenario I can think of is a brief window when an object server is down, comes back up, and there's a ~5 second window where a user could fetch an ancient version | 21:30 |
DHE | where 5 minutes is considered ancient | 21:30 |
timburke | NM, that might explain where the container PUTs are coming from: https://github.com/openstack/swift/blob/2.20.0/swift/common/middleware/bulk.py#L303-L309 | 21:32 |
timburke | NM, couple what you're seeing with https://bugs.launchpad.net/swift/+bug/1833612 -- yeah, that's probably gonna create a heap of handoffs | 21:32 |
openstack | Launchpad bug 1833612 in OpenStack Object Storage (swift) "Overloaded container can get erroneously cached as 404" [Undecided,Fix released] | 21:32 |
NM | timburke: Hummm… Do you see any way to fix this? Like, list all shards and use this list to 'feed' the original container? | 21:38 |
*** e0ne has quit IRC | 21:40 | |
timburke | NM, i *think* the shards should be ok. it's definitely worth correcting me if i'm wrong though! if you do a direct POST to clear the unneeded headers to just one of the primaries, it'll propagate to the other replicas, including the handoffs | 21:42 |
timburke | and it should fix that replica pretty much immediately. you'll probably have to wait for it to no longer be error-limited, though | 21:44 |
timburke | DHE, will the client be smart enough to see that the ancient version is ancient and either retry or bomb out? how big's the cluster? what kind of policy will those objects use? trying to figure out how many objects are likely to land on the same disk and cause contention... | 21:47 |
timburke | and what's the read/write ratio? given how often we're writing, i'd expect that to dominate, but just want to confirm | 21:49 |
*** BjoernT_ has quit IRC | 21:54 | |
DHE | timburke: it's been ordered, but hardware is still a few weeks out. looking at a somewhat geographically diverse cluster. it's basically being used for live TV. might be no hits, might be 1000 hits per second. but I plan to put some edge caching in place in front of the proxy servers, even if it's on the same host. | 22:17 |
DHE | hmm.. maybe I could just set an expiration time of like 30 seconds... | 22:18 |
timburke | DHE, and use a naming scheme that will give each write a distinct name. let the client derive the expected name based on current time (or maybe even better, some server-provided time) | 22:20 |
DHE | I don't have that luxury | 22:21 |
timburke | with client time, you've got to worry about some client that thinks it's in the future. with server time, you've gotta worry about caching | 22:21 |
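For illustration, a hedged sketch of the time-derived naming timburke describes; the prefix and 5-second slot are assumptions:

    # Illustrative only: derive an object name from a 5-second time slot so a
    # writer and a later reader compute the same name independently.
    import time

    SLOT = 5   # seconds per segment, to match the ~5s write cadence

    def segment_name(prefix="live/segment", now=None):
        now = time.time() if now is None else now
        return "%s-%010d" % (prefix, int(now // SLOT) * SLOT)

    # the writer PUTs segment_name(); a reader derives the same name, or steps
    # back one slot (now - SLOT) if the newest segment isn't visible yet
    print(segment_name())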
DHE | I've been pushing back that swift wasn't optimal for live content, cache or not | 22:22 |
timburke | yeah, it's definitely got me a little nervous... likely to see async pendings piling up, but that's not *so* bad. stale reads sound like a much more likely problem, and more customer-facing | 22:25 |
timburke | how big's 5s worth of content? i wonder if you could serve it more or less straight from memcached... | 22:27 |
timburke | swift is great at durability. our scaling and performance are pretty good, but way less so when you're hammering just a handful of objects. need that nice, broad distribution of load | 22:29 |
timburke | meanwhile, this use-case seems to be tossing the durability aspect out the window | 22:30 |
timburke | definitely tee it off to swift so you can use it for on-demand! i'm less sure about the live stream | 22:31 |
*** tkajinam has joined #openstack-swift | 22:55 | |
*** rcernin has joined #openstack-swift | 22:59 | |
NM | timburke: I've tried curl -v -H "X-Container-Sysmeta-Shard-Context-7ddb3….-547595833ff8;" --data "" -H"X-Timestamp: 1568070332.21750" http://MY_IP:6001/sda/42306/AUTH_643f797035bf416ba8001e95947622c0/components | 23:13 |
NM | And curl -v -H "X-Remove-Container-Sysmeta-Shard-Context-7ddb3….-547595833ff8: x" --data "" -H"X-Timestamp: 1568070332.21750" http://MY_IP:6001/sda/42306/AUTH_643f797035bf416ba8001e95947622c0/components | 23:14 |
*** threestrands has joined #openstack-swift | 23:14 | |
NM | Neither of them seems to work. At some point I successfully set the header to "{}" but after some time it went back to the data about the sharding process. | 23:15 |
timburke | this was for one of the UUIDs that claimed "cleaving_done": true, "misplaced_done": true, yeah? hmm... | 23:17 |
NM | Yeah! {"max_row": -1, "ranges_todo": 0, "ranges_done": 8, "cleaving_done": true, "last_cleave_to_row": null, "misplaced_done": true, "cursor": "", "cleave_to_row": -1, "ref": "7dd…"} | 23:18 |
timburke | there's a chance that it just happened to be one of the handoffs out there sharding, i suppose... | 23:19 |
NM | I see. My last shot was to send the POST to all container servers at once, but that didn't work either. | 23:21 |
timburke | at like 6/1000 or so, the odds seem against it, though... | 23:23 |
NM | The 3 primary DBs are sharded. Handoffs 4 and 5 are unsharded and handoff 6 is sharding. | 23:26 |
NM | Considering the "X-Backend-Sharding-State" header | 23:27 |
timburke | have you looked further out in the handoff list? i'm a little worried that replication will poison some of the handoffs so *they'd* start responding with too many headers, too... | 23:28 |
NM | One handoff is "poisoned" - the one that says it's sharding. The other 2 are not, but they say "X-Backend-Sharding-State: unsharded" | 23:33 |
timburke | NM, what about handoffs 7, 8? might want to add a --all to your swift-get-nodes to see more handoffs | 23:43 |
timburke | sorry, i gotta head out... the surgery might be a little more involved than i'd originally hoped, sorry NM :-( | 23:45 |
NM | timburke: sure! Thanks anyway. Tomorrow I'll get back to this. | 23:47 |
NM | timburke: (Do you mean real surgery or are you talking about swift?) | 23:48 |
*** NM has quit IRC | 23:50 |