*** tkajinam has joined #openstack-swift | 00:02 | |
*** psachin has joined #openstack-swift | 02:43 | |
*** gkadam has joined #openstack-swift | 03:19 | |
*** rcernin has quit IRC | 03:35 | |
*** baojg has joined #openstack-swift | 03:48 | |
*** rcernin has joined #openstack-swift | 03:50 | |
*** rcernin has quit IRC | 03:57 | |
*** rcernin has joined #openstack-swift | 03:58 | |
*** baojg has quit IRC | 04:12 | |
*** baojg has joined #openstack-swift | 04:18 | |
*** gkadam has quit IRC | 04:21 | |
*** baojg has quit IRC | 04:23 | |
*** godog has quit IRC | 04:26 | |
*** baojg has joined #openstack-swift | 05:11 | |
*** tkajinam_ has joined #openstack-swift | 05:16 | |
*** tkajinam has quit IRC | 05:18 | |
*** baojg has quit IRC | 05:22 | |
*** baojg has joined #openstack-swift | 05:23 | |
*** tkajinam__ has joined #openstack-swift | 05:25 | |
*** tkajinam_ has quit IRC | 05:27 | |
*** e0ne has joined #openstack-swift | 05:52 | |
*** e0ne has quit IRC | 05:53 | |
*** tkajinam_ has joined #openstack-swift | 06:09 | |
*** tkajinam__ has quit IRC | 06:12 | |
*** spsurya has joined #openstack-swift | 06:21 | |
*** ccamacho has joined #openstack-swift | 07:12 | |
*** baojg has quit IRC | 07:36 | |
*** pcaruana has joined #openstack-swift | 07:41 | |
*** gkadam has joined #openstack-swift | 07:51 | |
*** baojg has joined #openstack-swift | 08:03 | |
*** tkajinam_ has quit IRC | 08:12 | |
*** godog has joined #openstack-swift | 08:18 | |
*** rcernin has quit IRC | 08:56 | |
*** mikecmpbll has quit IRC | 09:04 | |
*** hseipp has joined #openstack-swift | 09:05 | |
*** mikecmpbll has joined #openstack-swift | 09:18 | |
*** baojg has quit IRC | 09:57 | |
*** baojg has joined #openstack-swift | 09:58 | |
*** baojg has quit IRC | 10:09 | |
*** e0ne has joined #openstack-swift | 10:24 | |
*** mahatic has joined #openstack-swift | 10:51 | |
*** ChanServ sets mode: +v mahatic | 10:51 | |
*** mvkr has quit IRC | 11:34 | |
*** mvkr has joined #openstack-swift | 12:06 | |
*** e0ne has quit IRC | 12:25 | |
*** e0ne has joined #openstack-swift | 12:30 | |
*** baojg has joined #openstack-swift | 13:00 | |
*** psachin has quit IRC | 13:16 | |
*** e0ne has quit IRC | 14:05 | |
*** e0ne has joined #openstack-swift | 14:08 | |
*** openstackgerrit has joined #openstack-swift | 15:24 | |
openstackgerrit | Thiago da Silva proposed openstack/swift master: Remove duplicate statement https://review.openstack.org/632486 | 15:24 |
*** ccamacho has quit IRC | 15:37 | |
*** ccamacho has joined #openstack-swift | 15:37 | |
*** ybunker has joined #openstack-swift | 15:46 | |
ybunker | hi all, quick question.. I have a swift cluster and some of the obj drives are getting more used than others; for example, some are at 95% of used space and others are at 82%, on the same node and also on different nodes.. the weights in the object ring are the same for those drives.. any ideas what could be going on here? also, is there a way to stop "storing" data on those 95%-used disks? | 15:48 |
*** openstackgerrit has quit IRC | 15:51 | |
*** ianychoi has joined #openstack-swift | 16:26 | |
*** ccamacho has quit IRC | 16:33 | |
*** e0ne has quit IRC | 16:38 | |
*** e0ne has joined #openstack-swift | 16:39 | |
*** hseipp has quit IRC | 16:42 | |
*** pcaruana has quit IRC | 17:02 | |
*** e0ne has quit IRC | 17:02 | |
DHE | do you have a sane number of partitions? sounds like you may have too few | 17:06 |
DHE | and no, you can't just stop using certain drives. swift needs to be able to consistently predict where an object is located by name alone | 17:06 |
*** ccamacho has joined #openstack-swift | 17:16 | |
ybunker | DHE: I've got 8192 partitions, with a replica count of 3, 1 region, 8 zones and 72 devices | 17:21 |
DHE | seems okay... | 17:25 |
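(A quick sanity check on the geometry ybunker just described; this is only back-of-the-envelope arithmetic, not output from the cluster:)

```python
# Ring geometry from the conversation: 8192 partitions, 3 replicas, 72 devices.
partitions, replicas, devices = 8192, 3, 72

# Partition-replicas spread across devices of equal weight:
per_device = partitions * replicas / devices
print(per_device)  # ~341 partition-replicas per device

# ~341 per device is plenty of granularity, so the 82% vs 95% imbalance is
# unlikely to come from too few partitions alone; stale rings or uneven
# weights/object sizes are more likely culprits.
```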
ybunker | DHE: don't know where else to look, and every day the used space is growing | 17:26 |
DHE | are the overused devices (disks) consistently on the same nodes? or does one node have a mixture of high- and low-usage disks? | 17:28 |
tdasilva | it's also a good idea to double-check that none of your nodes have old rings... swift-recon --md5 will check all rings for you | 17:30 |
DHE | that's a good point... | 17:30 |
ybunker | thanks a lot guys will check on that | 17:31 |
DHE | I'm assuming 72 disks here, not 72 nodes/hosts/servers | 17:31 |
*** mikecmpbll has quit IRC | 17:43 | |
*** ccamacho has quit IRC | 17:51 | |
ybunker | 72 disks yes, with 8 data nodes | 17:52 |
ybunker | the rings are the same for all the nodes, so that's not the problem :( | 17:55 |
timburke | ybunker: sounds like you're getting into a cluster-full situation, which typically sucks :-( even those ~80% full drives aren't going to be super happy; fwiw my usual recommendation is to try to keep drives under ~75% full | 18:00 |
ybunker | we are planning to add 4x more data nodes.. but it will take at least a month... :S | 18:01 |
timburke | ...can we delete some data? | 18:01 |
*** pcaruana has joined #openstack-swift | 18:01 | |
ybunker | two data nodes were at 60~70% used, so i changed the weights of those nodes so more data balances there.. but the other nodes, instead of freeing up a little space, just grow and grow :( | 18:02 |
timburke | the core trouble is that those full drives are going to start responding 507, even for a lot of replication requests, which means that the remaining drives will fill up *even more quickly*, and you'll probably get some super-replicated data | 18:03 |
timburke | if you're confident that your drives are healthy and unlikely to fail in the next couple months (to give you time to not only get the new hardware in place but also get replication to settle), you might want to look at the handoffs_first and handoffs_delete options.... i'd feel much more comfortable recommending them if you already had the new hardware in place, though, and just needed to make replication go faster | 18:05 |
ybunker | the thing is that the cluster has millions and millions of images.., mmm is it possible at some point to delete some of the replicas? | 18:05 |
timburke | see https://github.com/openstack/swift/blob/2.20.0/etc/object-server.conf-sample#L279-L296 for the config options | 18:06 |
ybunker | timburke: thanks a lot, let me take a look on that | 18:06 |
ybunker | timburke: are those options available in the juno release? 2.2.0? | 18:07 |
timburke | you could reduce the replica count for the ring... but it'll come at a cost to durability, and probably wouldn't be a quick fix. i wouldn't recommend it unless you already know you want a two-replica policy or something | 18:08 |
timburke | should go back fairly far... but then, juno's pretty old... lemme see... | 18:08 |
timburke | looks like you're good: https://github.com/openstack/swift/commit/e078dc3da05ce9e7c2b36e05686d28101381eec8 | 18:09 |
timburke | (missing sample config got added in 1.13.0) | 18:10 |
ybunker | thanks :), so handoffs_first should be changed to True, and leave handoff_delete at auto | 18:11 |
ybunker | oh sorry to 2 | 18:12 |
timburke | probably? you'll definitely want to have handoffs_first=true when rebalancing... and yeah, handoff_delete=2 seems not-crazy | 18:12 |
timburke | once the new hardware's in place and you've had a few good replication cycles, you'll want to take those back to the defaults | 18:13 |
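(For reference, a minimal sketch of the temporary settings discussed above, as they would appear in the [object-replicator] section of the object-server config; the values are the ones suggested in the conversation and should be reverted to the defaults once the new hardware is in and replication has settled:)

```ini
[object-replicator]
# Temporary settings while rebalancing onto new capacity:
# process handoff partitions before primaries...
handoffs_first = True
# ...and allow removing a handoff copy once 2 of the 3 primary nodes have
# confirmed they hold the data (the default, "auto", waits for all primaries)
handoff_delete = 2
```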
ybunker | another problem we've got is that we can't let the object-replicator process run all day; we start the process in a specific window and then stop it, because latency goes through the roof | 18:13 |
ybunker | so the obj-replicator process runs for about 4 hours a day | 18:14 |
timburke | good news is, more disks should definitely help with that | 18:15 |
timburke | are your auditors on the same schedule? | 18:16 |
ybunker | yes | 18:16 |
timburke | makes sense. anything you can do to avoid the disk-thrashing, i'd imagine... | 18:17 |
ybunker | do I need any special configuration on the object-auditor? i just have concurrency = 1, files_per_second = 1, zero_byte_files_per_second = 5 and bytes_per_second = 1000 | 18:19 |
*** pcaruana has quit IRC | 18:23 | |
timburke | seems... ok-ish, i guess? how far does it get in that 4hr window? i feel like with that tuning, you should be able to have them running continuously without really impacting client traffic... | 18:23 |
timburke | i'd be inclined to increase concurrency to # of disks on the node, but that's me | 18:25 |
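(A hedged sketch of where those auditor knobs live, in the [object-auditor] section of the object-server config; the rate limits are ybunker's current values, and concurrency = 9 is an assumption based on timburke's suggestion of one per disk, with 72 disks spread over 8 nodes:)

```ini
[object-auditor]
# current rate limits from the conversation
files_per_second = 1
bytes_per_second = 1000
zero_byte_files_per_second = 5
# timburke's suggestion: roughly one auditor per disk on the node
# (72 disks / 8 nodes = 9; assumes disks are spread evenly)
concurrency = 9
```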
*** gkadam has quit IRC | 18:26 | |
timburke | how long is that cycle time? i feel like it must take a while... | 18:26 |
ybunker | on object-replicator i had concurrency = 2 and replicator_workers = 6 | 18:27 |
timburke | yeah, i think i like that one better. the auditor doesn't have the same concurrency/workers split iirc | 18:28 |
timburke | i think they might not even mean the same thing :-( | 18:28 |
timburke | ugh, yeah: https://review.openstack.org/#/c/572571/ | 18:29 |
patchbot | patch 572571 - swift - object-auditor: change "concurrency" to "auditor_w... - 1 patch set | 18:29 |
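(To make that naming mismatch concrete, a sketch of the replicator settings ybunker mentions just above, with comments reflecting the behaviour the patch is trying to clarify:)

```ini
[object-replicator]
# for the replicator, "concurrency" is the number of replication jobs run in
# parallel per worker, and "replicator_workers" is the number of worker
# processes; the object-auditor has no such split, and its single
# "concurrency" option really behaves like a workers count (hence the patch
# above renaming it to "auditor_workers" in the sample configs)
concurrency = 2
replicator_workers = 6
```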
*** ybunker has quit IRC | 18:34 | |
*** ybunker has joined #openstack-swift | 18:34 | |
*** ybunker has quit IRC | 18:45 | |
*** ybunker has joined #openstack-swift | 18:45 | |
*** openstackgerrit has joined #openstack-swift | 18:45 | |
openstackgerrit | Tim Burke proposed openstack/swift master: object-auditor: change "concurrency" to "auditor_workers" in configs https://review.openstack.org/572571 | 18:45 |
*** mikecmpbll has joined #openstack-swift | 18:47 | |
openstackgerrit | Tim Burke proposed openstack/swift master: object-auditor: change "concurrency" to "auditor_workers" in configs https://review.openstack.org/572571 | 18:53 |
*** baojg has quit IRC | 18:57 | |
*** baojg has joined #openstack-swift | 18:57 | |
*** baojg has quit IRC | 18:58 | |
*** e0ne has joined #openstack-swift | 19:07 | |
*** takamatsu has quit IRC | 19:27 | |
ybunker | ok so I disable object-replicator on the nodes that have more capacity | 19:38 |
ybunker | then do I flip on handoffs_first and drop handoff_delete to 2 for the object-replicator on all the nodes, or do i have to change that just on the most-full nodes? | 19:38 |
*** e0ne has quit IRC | 19:42 | |
*** ybunker has quit IRC | 19:43 | |
timburke | that first bit sounds a little terrifying. why are we turning the replicator off entirely? as for the config changes, i always find it easier to reason about a cluster when i have configs as uniform as possible across the nodes... i think i'd do that on all of them | 19:46 |
*** pcaruana has joined #openstack-swift | 19:50 | |
*** pcaruana has quit IRC | 20:11 | |
DHE | also remember that replication is push based. It seems to me running the replicator is more likely to allow a host to delete objects once it realizes that it is a handoff node and the primaries are all healthy. (is that something a replicator does?) | 20:14 |
*** pcaruana has joined #openstack-swift | 20:24 | |
*** portante has left #openstack-swift | 20:32 | |
zaitcev | I tried to make everything less complicated for the container server, and it went very poorly. | 21:01 |
zaitcev | I mean less complicated than my previous patch, which had ShardRange(row[0].decode('utf-8'), row[1:]) | 21:02 |
zaitcev | The biggest problem is the code that insists on using the nul character for SQL markers. | 21:03 |
zaitcev | Like... m = x + b'\x00', then sql("SELECT FROM table WHERE name < ?", m) | 21:05 |
zaitcev | There's NO WAY that I can see to use unicode there | 21:05 |
clayg | timburke: thanks for pointing me at p 437523 and p 609843 - those are both good to keep on the radar | 21:13 |
patchbot | https://review.openstack.org/#/c/437523/ - swift - Store version id when copying object to archive - 9 patch sets | 21:13 |
patchbot | https://review.openstack.org/#/c/609843/ - swift - Allow arbitrary UTF-8 strings as delimiters in con... - 2 patch sets | 21:13 |
zaitcev | Does anyone remember what that zero actually does? | 21:14 |
* zaitcev pokes mattoliverau | 21:17 | |
zaitcev | https://github.com/openstack/swift/blob/master/swift/container/sharder.py#L237 | 21:17 |
zaitcev | https://github.com/openstack/swift/blob/master/swift/common/utils.py#L4799 | 21:18 |
zaitcev | (the latter is actually bogus in py3, but never mind) | 21:18 |
*** pcaruana has quit IRC | 21:19 | |
*** baojg has joined #openstack-swift | 21:21 | |
*** baojg has quit IRC | 21:27 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Fix socket leak on object-server death https://review.openstack.org/575254 | 21:39 |
timburke | zaitcev: we can't use u'\x00'? the idea is that `name == x` should be included, but no other valid object name after that. though i can't remember now why we didn't use `name <= ?`... | 21:45 |
timburke | why is that last one bogus on py3? | 21:46 |
zaitcev | wait, what | 21:49 |
zaitcev | oh, so a NUL is a valid unicode character | 21:49 |
zaitcev | timburke: thanks a lot, I have something to re-think here. | 21:52 |
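(A tiny illustration of timburke's point; the name here is made up and this is not the sharder code itself, just the string ordering it relies on:)

```python
# u'\x00' (NUL) is a valid Unicode code point, and the lowest one, so the same
# trick used with b'\x00' on bytes works on text strings too.
x = u'some/object/name'
marker = x + u'\x00'

assert x < marker               # x itself satisfies "name < marker"
assert (x + u'!') > marker      # but any real name extending x sorts after
                                # the marker, so it is excluded
# In other words, "WHERE name < x + '\x00'" behaves like "WHERE name <= x" for
# valid object names, just written as a strict comparison.
```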
*** rcernin has joined #openstack-swift | 22:10 | |
*** rcernin has quit IRC | 22:17 | |
*** rcernin has joined #openstack-swift | 22:19 | |
*** lifeless_ is now known as lifeless | 22:34 | |
*** baojg has joined #openstack-swift | 22:52 | |
*** tkajinam has joined #openstack-swift | 22:54 | |
openstackgerrit | Merged openstack/swift master: Remove duplicate statement https://review.openstack.org/632486 | 23:12 |
timburke | so this is weird. func testing https://review.openstack.org/#/c/575254/ (which needs another patchset; i was dumb in my last one), i put some russian-roulette middleware in my object server pipelines, try to pull down something sizeable, and one of two things happens | 23:14 |
patchbot | patch 575254 - swift - Fix socket leak on object-server death - 3 patch sets | 23:14 |
timburke | either i see three object server deaths and a traceback in the proxy that ends with ShortReadError | 23:14 |
timburke | (which is good, that's the behavior i want) | 23:14 |
timburke | or i see *one* object server death and a traceback coming out of catch_errors that ends with BadResponseLength | 23:15 |
timburke | and i can't seem to figure out where i'm getting a response body file that wouldn't have my ByteCountEnforcer :-( | 23:16 |
timburke | i even tried pushing the wrapping up into utils/request_helpers... | 23:18 |
zaitcev | ugh | 23:59 |