*** annegentle has joined #openstack-swift | 00:09 | |
*** zhill_ has joined #openstack-swift | 00:11 | |
*** zhill_ has quit IRC | 00:11 | |
*** annegentle has quit IRC | 00:14 | |
ho | good morning! | 00:18 |
notmyname | good morning ho | 00:18 |
ho | notmyname: hello | 00:19 |
*** remix_tj has quit IRC | 00:21 | |
*** annegentle has joined #openstack-swift | 00:22 | |
ho | acoles: "why did 'w' get missed out :)" <=== (^-^;) | 00:35 |
ho | acoles: in Japan, (^-^;) means I'm embarrassed and breaking out in a cold sweat :-) | 00:43 |
mattoliverau | ho: morning | 00:46 |
ho | mattoliverau: morning! | 00:49 |
*** blmartin has joined #openstack-swift | 00:51 | |
*** blmartin_ has joined #openstack-swift | 00:51 | |
*** chlong has quit IRC | 00:52 | |
*** chlong has joined #openstack-swift | 00:54 | |
*** lpabon has joined #openstack-swift | 01:03 | |
*** lpabon has quit IRC | 01:03 | |
*** annegentle has quit IRC | 01:06 | |
*** kota_ has joined #openstack-swift | 01:13 | |
*** ChanServ sets mode: +v kota_ | 01:13 | |
kota_ | good morning | 01:13 |
notmyname | hello | 01:16 |
kota_ | notmyname: hi :) | 01:18 |
*** hugespoon has left #openstack-swift | 01:49 | |
*** blmartin has quit IRC | 01:56 | |
*** blmartin_ has quit IRC | 01:56 | |
*** jkugel has joined #openstack-swift | 02:12 | |
clayg | hi! | 03:00 |
ho | hello clayg! | 03:06 |
*** zul has quit IRC | 03:08 | |
*** asettle is now known as asettle-afk | 03:10 | |
*** zul has joined #openstack-swift | 03:20 | |
*** asettle-afk is now known as asettle | 03:48 | |
*** kota_ has quit IRC | 04:12 | |
*** jamielennox is now known as jamielennox|away | 04:32 | |
ho | patch #187489 hit a liberasurecode error in the gate: http://paste.openstack.org/show/264868/. | 04:47 |
ho | liberasurecode_backend_open: dynamic linking error libJerasure.so: cannot open shared object file: No such file or directory | 04:47 |
ho | are there any env changes in the gate? | 04:50 |
*** ahonda has quit IRC | 04:58 | |
swifterdarrell | ho: is jerasure installed too? | 04:58 |
*** ppai has joined #openstack-swift | 05:00 | |
ho | swifterdarrell: i'm not sure. do you know how to check it in the gate? | 05:04 |
swifterdarrell | ho: no idea | 05:04 |
swifterdarrell | ho: ask the infra guys? | 05:04 |
ho | swifterdarrell: do you know a gu | 05:05 |
swifterdarrell | ho: but based on the error, I'd first make sure libjerasure is actually installed | 05:05 |
ho | something is wrong with my keyboard... | 05:05 |
swifterdarrell | ho: I think monty is one of them? not sure the full set of infra folks | 05:06 |
swifterdarrell | ho: try #openstack-infra channel? | 05:06 |
ho | swifterdarrell: thanks! I will ask the infra guys about this | 05:06 |
swifterdarrell | ho: np, good luck! | 05:07 |
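A quick way to check whether libJerasure is actually visible to the dynamic loader on a node is to query the linker cache; this is a generic Linux check rather than the gate's actual setup, and the liberasurecode path shown is an assumption about a typical install:

    # is libJerasure registered with the dynamic linker at all?
    ldconfig -p | grep -i jerasure

    # which shared libraries does liberasurecode pull in, and are any missing?
    ldd /usr/local/lib/liberasurecode.so | grep -i 'not found'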
swifterdarrell | portante: good news! I'm clearly seeing that one bad disk per node totally fucks object-server, even with workers = 90 (3x disks). Details to come, but it's night and day | 05:11 |
portante | nice | 05:12 |
swifterdarrell | portante: now testing with 1 bad disk in one of the storage nodes, then I'll do a control group with no bad disks... just normal unfettered swift-object-auditor | 05:12 |
portante | great | 05:12 |
portante | so then you'll do the servers-per-port runs with that | 05:12 |
swifterdarrell | portante: sneak peek https://www.dropbox.com/s/o9sapxkvnr2brip/Screenshot%202015-06-04%2022.12.45.png?dl=0 | 05:12 |
swifterdarrell | portante: ya | 05:12 |
* portante looks | 05:12 | |
swifterdarrell | portante: that graph has 4 runs of servers_per_disk=3 w/one hammered disk per storage node, then 4 runs of workers=90, also w/one hammered disk per node | 05:13 |
swifterdarrell | portante: far right is the first run w/workers=90 with only one hammered drive in one of two storage nodes | 05:13 |
portante | which means the object servers are not all engaged handling requests; a few of them are getting the requests. a top on the system would probably bear that out | 05:14 |
swifterdarrell | portante: I have a threaded python script that uses directio python module to issue random reads directly (O_DIRECT) to the raw block device of one disk; it gets I/O queue full to 128 with await times between 500 and 700+ ms | 05:15 |
portante | pummeled | 05:15 |
portante | where in that graph is the servers-per-port stuff | 05:15 |
swifterdarrell | portante: ya, the longer blocking I/Os to the bad disk should interrupt servicing of I/O to other non-bad disks by that same, unlucky, swift-object-server | 05:15 |
swifterdarrell | portante: servers_per_port=3 are the left 4 runs | 05:16 |
swifterdarrell | portante: (the ones that don't look like shit) | 05:16 |
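The disk-hammering approach swifterdarrell describes can be approximated with stdlib-only O_DIRECT reads. This is not his actual script (his is threaded and uses the directio module; the real one is linked later as io_hammer.py), just a minimal single-threaded sketch for Python 3 on Linux, and the device path, block size, and read size are assumptions:

    import mmap, os, random

    DEV = "/dev/sdb"          # assumption: raw block device of the disk to hammer (needs root)
    BLOCK = 4096              # O_DIRECT wants block-aligned offsets and sizes
    READ_SIZE = 64 * BLOCK    # 256 KiB per read

    # O_DIRECT also needs a memory-aligned buffer; anonymous mmap pages are page-aligned
    buf = mmap.mmap(-1, READ_SIZE)

    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
    dev_size = os.lseek(fd, 0, os.SEEK_END)

    while True:  # hammer until killed
        # pick a random, block-aligned offset somewhere on the device
        offset = random.randrange(0, dev_size - READ_SIZE) // BLOCK * BLOCK
        os.lseek(fd, offset, os.SEEK_SET)
        os.readv(fd, [buf])  # direct (page-cache-bypassing) read into the aligned buffer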
*** SkyRocknRoll has joined #openstack-swift | 05:17 | |
swifterdarrell | portante: this is ssbench-master run-scenario -U bench2 -S http://127.0.0.1/v1/AUTH_bench2 -f Real_base2_simplified.scenario -u 120 -r 1200 | 05:17 |
swifterdarrell | portante: 120 concurrency | 05:17 |
swifterdarrell | portante: I simplified the realistic scenario to reduce noise; cutting down concurrency & whatnot for all background daemons other than swift-object-auditor also reduced noise | 05:17 |
portante | okay | 05:17 |
portante | so in that graph, left y-axis is min/avg/max response time, and right is requests per sec | 05:18 |
swifterdarrell | portante: so I'm seeing even one hammered disk screws things up... this is 1 hammered disk out of ~60 total obj disks in cluster | 05:18 |
swifterdarrell | portante: yup | 05:19 |
swifterdarrell | portante: so lower black line is bad, and higher red line is bad | 05:19 |
openstackgerrit | Kota Tsuyuzaki proposed openstack/swift: Fix the missing SLO state on fast-post https://review.openstack.org/182564 | 05:19 |
swifterdarrell | portante: I'll have the actual numbers later (i have all the raw data for these runs) | 05:19 |
portante | wow, so my guess is that if you were to lower the maxclients you would see less of that effect with 90 workers | 05:19 |
portante | and the flat lines just mean test wasn't running | 05:20 |
swifterdarrell | portante: flat line was dinner + True Detective :) | 05:20 |
portante | cute | 05:20 |
swifterdarrell | =) | 05:20 |
swifterdarrell | portante: so you think lowering the default 1024 max clients will improve those runs on the right? the workers=90? | 05:21 |
swifterdarrell | portante: maybe I'll try that in the morning (still w/one bad disk); that'll be cheap & easy to test | 05:21 |
portante | it should because it will allow more of the workers to participate fully | 05:21 |
portante | eventlet accept greenlet is greedy | 05:22 |
swifterdarrell | portante: i'm not convinced that starvation's happening, necessarily... I think the problem is that any object-server (on the affected node) can get fucked by that bad disk | 05:22 |
portante | it'll just keep posting accepts and gobbling up, as long as that process gets on the run queue | 05:22 |
swifterdarrell | portante: but I'm interested in the experiment :) | 05:22 |
portante | certainly, but the more greedy the object server the larger the effect | 05:22 |
portante | servers-per-port is by nature not greedy | 05:23 |
swifterdarrell | portante: I see CPU consumption of swift-object-server being uneven, but how can I tell there aren't outstanding I/Os blocked for the others? i.e. lack of CPU consumption doesn't mean reqs aren't being processed | 05:23 |
swifterdarrell | portante: makes sense; what value would you suggest? | 05:24 |
portante | I would drop it down to 5 or something | 05:24 |
swifterdarrell | portante: k | 05:24 |
portante | maybe even 1 if 90 workers covers all your requests | 05:25 |
swifterdarrell | portante: let's see... 120 clients w/some PUT amplification is up to ~360 concurrent reqs, divided by 180 total workers betw 2 servers, is 2 | 05:25 |
swifterdarrell | portante: so I'll try it w/2 | 05:25 |
portante | sounds reasonable | 05:26 |
portante | so what percentage of the outstanding requests would be for the bad disk in this controlled experiment? | 05:26 |
swifterdarrell | portante: thanks for the idea! I won't get a chance to run it 'til tomorrow morning | 05:27 |
swifterdarrell | portante: heading to bed soon | 05:27 |
portante | I should be myself | 05:27 |
swifterdarrell | portante: all subject to Swift's dispersion (md5) | 05:27 |
swifterdarrell | portante: 1 bad disk (latest runs) out of 60 | 05:27 |
portante | 60 across all servers, right? | 05:28 |
swifterdarrell | portante: ya | 05:28 |
swifterdarrell | portante: like 29 and 32 i think | 05:28 |
swifterdarrell | portante: so actually 61, but whatever | 05:28 |
portante | so my guess is that what you'll see is that servers-per-disk still works, maybe not too bad, but better than normal by a lot | 05:28 |
swifterdarrell | P that a GET hits it is... 1 / (61 / 3) ~= 5% | 05:29 |
portante | but that is just a guess, it depends on how well the requests get spread out | 05:29 |
*** Triveni has joined #openstack-swift | 05:29 | |
swifterdarrell | P that a PUT hits it is... same, I guess | 05:29 |
portante | this is good stuff | 05:30 |
swifterdarrell | portante: I'll ping you tomorrow when I have something | 05:30 |
swifterdarrell | portante: g'night! | 05:30 |
portante | 'night! | 05:30 |
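For readers following along, the two knobs being discussed live in the object server's [DEFAULT] section. The values below simply mirror the experiment (90 workers, max_clients dropped from its 1024 default to roughly client concurrency divided by total workers); they are illustrative, not a recommendation:

    [DEFAULT]
    # number of object-server worker processes for the node (~3x its ~30 disks)
    workers = 90
    # cap on concurrent connections each worker's eventlet hub will accept;
    # the default is 1024, the experiment tries 2 to spread requests across workers
    max_clients = 2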
*** mitz has quit IRC | 05:41 | |
ho | clayg, notmyname, torgomatic: macaque's info: https://www.youtube.com/watch?v=hWstAme0eQs | 05:56 |
cschwede | Good Morning! | 06:23 |
*** jistr has joined #openstack-swift | 06:57 | |
ho | cschwede: Morning! | 07:03 |
openstackgerrit | Christian Schwede proposed openstack/python-swiftclient: Add connection release test https://review.openstack.org/159076 | 07:19 |
*** remix_tj has joined #openstack-swift | 07:27 | |
*** mmcardle has joined #openstack-swift | 07:42 | |
*** jistr is now known as jistr|biab | 07:47 | |
*** hseipp has joined #openstack-swift | 08:09 | |
*** acoles_away is now known as acoles | 08:27 | |
acoles | ho: i did put :) after my comment. just amused me that the values accidentally ended up using v,x,y,z :P | 08:29 |
*** chlong has quit IRC | 08:34 | |
*** foexle has joined #openstack-swift | 08:35 | |
ho | acoles: i didn't notice it. i exposed my weakness in english :-) | 08:45 |
acoles | ho: did you get any idea why the unit tests failed with the libjerasure.so problem? i see it on other jenkins jobs too | 08:46 |
ho | acoles: I asked the infra guys but i haven't got any response yet. | 08:48 |
ho | acoles: FYI: 14:09:58 (ho) hello, patch #187489 got an error regarding liberasurecode in the gate. I would like to know whether libJerasure is installed or not. http://paste.openstack.org/show/264868/ | 08:50 |
*** jistr|biab is now known as jistr | 08:50 | |
acoles | ho: thanks | 08:51 |
ho | acoles: you are welcome! | 08:51 |
acoles | ho: that was a good addition to the test btw | 08:52 |
ho | acoles: thanks! :) | 08:54 |
*** jistr has quit IRC | 08:59 | |
*** jistr has joined #openstack-swift | 09:19 | |
cschwede | acoles: do you remember if we have an „official“ abandon policy? I wanted to abandon a few stale patches (not mine). | 09:20 |
*** geaaru has joined #openstack-swift | 09:21 | |
*** marzif_ has joined #openstack-swift | 09:21 | |
acoles | cschwede: hi. iirc swift policy is that mattoliverau has a bot that finds old patches (4 weeks with -1 ??), mails the author to warn them of likely abandonment, then after 2 weeks he or notmyname abandons them. | 09:22 |
cschwede | morning :) yes, that’s for swift - but what about swiftclient? i think we don’t do it there | 09:24 |
cschwede | https://review.openstack.org/#/q/status:open+project:openstack/python-swiftclient++label:Code-Review%253C%253D-1,n,z | 09:24 |
cschwede | acoles: ^^ | 09:24 |
acoles | cschwede: i guess an improvement might be to place a comment on the patch when warning is given, then any core could abandon after another 2 weeks. | 09:24 |
acoles | cschwede: hmm, you are most likely right, maybe mattoliverau only does it for swift | 09:24 |
acoles | looking... | 09:25 |
cschwede | https://review.openstack.org/#/q/status:open+AND+project:openstack/python-swiftclient+AND+%28label:Verified-1+OR+label:Code-Review%253C%253D-1%29,n,z | 09:26 |
cschwede | acoles: ^^ is a better view, all patches that got either a -1/-2 from a reviewer or jenkins | 09:27 |
*** SkyRocknRoll has quit IRC | 09:27 | |
cschwede | that’s more than half of all swiftclient patches | 09:27 |
acoles | patch 158701 i will ping author (hp) | 09:27 |
patchbot | acoles: https://review.openstack.org/#/c/158701/ | 09:27 |
cschwede | heh, i was looking at that patch just a few minutes ago :) | 09:28 |
acoles | cschwede: so we don't abandon if WIP, i think | 09:32 |
acoles | cschwede: global requirements notmyname always -2's | 09:33 |
cschwede | makes sense | 09:34 |
*** aix has joined #openstack-swift | 09:34 | |
acoles | patch 148791 appears to be superseded by patch 185629 | 09:36 |
patchbot | acoles: https://review.openstack.org/#/c/148791/ | 09:36 |
acoles | cschwede: shall i abandon 148791? | 09:37 |
cschwede | i was thinking the same | 09:37 |
acoles | doing it now | 09:37 |
cschwede | i abandoned a patch earlier and left the following msg: | 09:37 |
cschwede | „There has been no change to this patch in nearly a year, thus abandoning this patch to clear the list of incoming reviews. | 09:38 |
cschwede | Please feel free to restore this patch in case you are still working on it. Thank you!“ | 09:38 |
cschwede | on patch 109802 | 09:38 |
patchbot | cschwede: https://review.openstack.org/#/c/109802/ | 09:38 |
*** SkyRocknRoll has joined #openstack-swift | 09:40 | |
acoles | cschwede: i will abandon patch 160169 because i think the bug is already fixed | 09:45 |
patchbot | acoles: https://review.openstack.org/#/c/160169/ | 09:45 |
cschwede | yes, and there was also no response to your comment | 09:46 |
acoles | cschwede: does an 'abandon' add to our stackalytics scores :P | 09:48 |
cschwede | i don’t think so? | 09:48 |
acoles | awww | 09:48 |
acoles | cschwede: that leaves patch 116065 that is old and not WIP | 09:51 |
patchbot | acoles: https://review.openstack.org/#/c/116065/ | 09:51 |
*** shlee322 has joined #openstack-swift | 09:51 | |
cschwede | yes, and i agree with Darrell's comment on that patch. probably a candidate for abandoning too | 09:52 |
acoles | go for it | 09:53 |
acoles | author can always restore if they disagree | 09:53 |
cschwede | done | 09:55 |
acoles | patch 172791 is not quite so old so i left a comment asking if more work is planned. also, it has had no human review. | 09:58 |
patchbot | acoles: https://review.openstack.org/#/c/172791/ | 09:58 |
*** dmorita has quit IRC | 09:58 | |
acoles | cschwede: i have fresh coffee and croissant waiting for me. bbiab :) | 09:58 |
openstackgerrit | Christian Schwede proposed openstack/python-swiftclient: Add ability to download objects to particular folder. https://review.openstack.org/160283 | 09:59 |
cschwede | acoles: coffee sounds great - enjoy! | 09:59 |
*** Triveni has quit IRC | 10:13 | |
*** shlee322 has quit IRC | 10:18 | |
*** shlee322 has joined #openstack-swift | 10:32 | |
*** ho has quit IRC | 10:41 | |
*** wasmum has quit IRC | 10:55 | |
*** marzif_ has quit IRC | 11:11 | |
*** marzif_ has joined #openstack-swift | 11:12 | |
*** marzif_ has quit IRC | 11:13 | |
*** marzif_ has joined #openstack-swift | 11:13 | |
*** wasmum has joined #openstack-swift | 11:19 | |
*** blmartin_ has joined #openstack-swift | 11:33 | |
*** jkugel has quit IRC | 11:34 | |
*** shlee322 has quit IRC | 11:56 | |
*** wbhuber has joined #openstack-swift | 11:58 | |
*** geaaru has quit IRC | 12:08 | |
*** geaaru has joined #openstack-swift | 12:09 | |
*** km has quit IRC | 12:11 | |
*** mmcardle has quit IRC | 12:13 | |
openstackgerrit | Alistair Coles proposed openstack/swift: Make test_proxy work independent of evn vars https://review.openstack.org/188756 | 12:15 |
openstackgerrit | Alistair Coles proposed openstack/swift: Make test_proxy work independent of env vars https://review.openstack.org/188756 | 12:15 |
*** zul has quit IRC | 12:16 | |
*** zul has joined #openstack-swift | 12:16 | |
*** thurloat_isgone is now known as thurloat | 12:21 | |
*** sc has quit IRC | 12:23 | |
*** mmcardle has joined #openstack-swift | 12:25 | |
*** MVenesio has joined #openstack-swift | 12:25 | |
*** MVenesio has quit IRC | 12:25 | |
*** sc has joined #openstack-swift | 12:26 | |
*** wbhuber has quit IRC | 12:27 | |
*** annegentle has joined #openstack-swift | 12:32 | |
*** blmartin_ has quit IRC | 12:38 | |
openstackgerrit | Prashanth Pai proposed openstack/swift: Make object creation more atomic in Linux https://review.openstack.org/162243 | 12:59 |
*** jkugel has joined #openstack-swift | 13:00 | |
*** ppai has quit IRC | 13:01 | |
*** jkugel1 has joined #openstack-swift | 13:02 | |
*** jkugel has quit IRC | 13:05 | |
*** kei_yama has quit IRC | 13:07 | |
*** petertr7_away is now known as petertr7 | 13:07 | |
*** SkyRocknRoll has quit IRC | 13:18 | |
*** wbhuber has joined #openstack-swift | 13:19 | |
*** blmartin has joined #openstack-swift | 13:23 | |
*** acoles is now known as acoles_away | 13:37 | |
tdasilva | good morning | 13:38 |
cschwede | Hello Thiago! | 13:42 |
tdasilva | cschwede: hi! | 13:43 |
*** acampbell has joined #openstack-swift | 13:43 | |
*** acampbel11 has joined #openstack-swift | 13:44 | |
*** jrichli has joined #openstack-swift | 14:06 | |
*** esker has joined #openstack-swift | 14:09 | |
*** esker has quit IRC | 14:14 | |
*** esker has joined #openstack-swift | 14:15 | |
*** acoles_away is now known as acoles | 14:15 | |
*** foexle has quit IRC | 14:27 | |
swifterdarrell | portante: initial results for workers=90 + max_clients=2 do not look good | 14:29 |
*** thurloat is now known as thurloat_isgone | 14:30 | |
*** acampbel11 has quit IRC | 14:38 | |
*** thurloat_isgone is now known as thurloat | 14:41 | |
swifterdarrell | portante: ya, we'll see how consistent the 4 runs are, but the first run of workers=90 + max_clients=2 is worse than workers=90 + max_clients=1024 | 14:43 |
*** minwoob has joined #openstack-swift | 14:50 | |
portante | swifterdarrell: bummer | 15:12 |
*** B4rker has joined #openstack-swift | 15:12 | |
swifterdarrell | portante: cluster might have just gotten cold overnight; the 2nd run of max_clients=2 is looking very similar to the max_clients=1024 | 15:13 |
swifterdarrell | portante: PUTs: https://www.dropbox.com/s/kpb598iouccs3cw/Screenshot%202015-06-05%2008.13.43.png?dl=0 | 15:13 |
swifterdarrell | portante: GETs: https://www.dropbox.com/s/rjeq83p95ofy2mq/Screenshot%202015-06-05%2008.14.00.png?dl=0 | 15:14 |
swifterdarrell | portante: far right are the (still going) max_clients=2 runs | 15:14 |
swifterdarrell | portante: they're identical to the middle runs: 1 bad disk out of 61; swift-object-auditor running full speed, workers=90; only difference is max_clients | 15:14 |
swifterdarrell | portante: and servers_per_disk=3 with two bad disks (one per server) is better than all the workers=90 runs w/only 1 bad disk | 15:16 |
swifterdarrell | portante: so I'm still pretty sure that the I/O isolation is really important | 15:16 |
swifterdarrell | portante: (what threads-per-disk and servers_per_port really try to get at ) | 15:16 |
swifterdarrell | portante: funny story: one of our guys is onsite w/Intel doing EC benchmarking on a much larger cluster than I've got for my testing and their results were being interfered with by like 1 or two naughty disks | 15:18 |
*** gyee_ has joined #openstack-swift | 15:38 | |
*** janonymous_ has joined #openstack-swift | 15:41 | |
*** janonymous_ has quit IRC | 15:46 | |
*** david-lyle has quit IRC | 15:46 | |
*** david-lyle has joined #openstack-swift | 15:46 | |
*** SkyRocknRoll has joined #openstack-swift | 15:50 | |
portante | swifterdarrell: yes | 15:50 |
portante | ;) that is the reality, too often we work with "pristine" environments and expect that is what customers have | 15:51 |
portante | ;) | 15:51 |
*** shlee322 has joined #openstack-swift | 15:55 | |
*** zaitcev has joined #openstack-swift | 15:57 | |
*** ChanServ sets mode: +v zaitcev | 15:57 | |
swifterdarrell | portante: little more data for the max_clients=2: https://www.dropbox.com/s/2oijdlsmng4ih0r/Screenshot%202015-06-05%2009.03.16.png?dl=0 | 16:03 |
swifterdarrell | portante: you can see it's very ball-park with max_clients=1024 | 16:03 |
swifterdarrell | portante: (tossing the first run as an outlier) | 16:03 |
*** jistr has quit IRC | 16:03 | |
swifterdarrell | portante: I'm going to halt the max_clients=2 runs and proceed with the rest of my targets | 16:04 |
*** B4rker has quit IRC | 16:07 | |
*** B4rker has joined #openstack-swift | 16:10 | |
*** B4rker has quit IRC | 16:14 | |
*** B4rker has joined #openstack-swift | 16:16 | |
*** jordanP has joined #openstack-swift | 16:29 | |
*** Fin1te has joined #openstack-swift | 16:29 | |
portante | swifterdarrell: so the timeline for the maxclients=1024 is 22:00 - 00:00, and maxclients=2 is 07:00 till end? | 16:38 |
*** breitz has quit IRC | 16:38 | |
*** breitz has joined #openstack-swift | 16:39 | |
portante | swifterdarrell: are there 4 phases to the test which might match those peaks and valleys? | 16:43 |
*** acoles is now known as acoles_away | 16:45 | |
*** annegentle has quit IRC | 16:46 | |
*** jordanP has quit IRC | 16:48 | |
openstackgerrit | Tim Burke proposed openstack/python-swiftclient: Add ability to download objects to particular folder. https://review.openstack.org/160283 | 16:51 |
*** mmcardle has quit IRC | 16:53 | |
*** annegentle has joined #openstack-swift | 16:54 | |
notmyname | good morning | 16:54 |
*** breitz has quit IRC | 16:57 | |
*** SkyRocknRoll has quit IRC | 16:59 | |
*** annegentle has quit IRC | 17:08 | |
*** Fin1te has quit IRC | 17:10 | |
peluse | morning | 17:10 |
*** SkyRocknRoll has joined #openstack-swift | 17:11 | |
*** dimasot has joined #openstack-swift | 17:13 | |
*** zhill_ has joined #openstack-swift | 17:15 | |
*** geaaru has quit IRC | 17:18 | |
peluse | jrichli, you there | 17:27 |
*** harlowja has quit IRC | 17:27 | |
*** SkyRocknRoll has quit IRC | 17:32 | |
*** harlowja has joined #openstack-swift | 17:32 | |
*** lastops has joined #openstack-swift | 17:33 | |
*** SkyRocknRoll has joined #openstack-swift | 17:45 | |
*** marzif_ has quit IRC | 17:45 | |
*** B4rker has quit IRC | 17:46 | |
*** annegentle has joined #openstack-swift | 17:50 | |
*** marzif_ has joined #openstack-swift | 17:51 | |
*** B4rker has joined #openstack-swift | 17:53 | |
*** hseipp has left #openstack-swift | 17:53 | |
*** B4rker has quit IRC | 17:58 | |
*** B4rker has joined #openstack-swift | 17:59 | |
jrichli | peluse: I am back. what can I do for you? | 18:01 |
*** proteusguy has joined #openstack-swift | 18:04 | |
*** Fin1te has joined #openstack-swift | 18:33 | |
*** harlowja has quit IRC | 18:46 | |
*** proteusguy has quit IRC | 18:47 | |
*** gyee_ has quit IRC | 18:47 | |
*** themadcanudist has joined #openstack-swift | 18:48 | |
tdasilva | so i probably missed this conversation, but what's the status with the py27 tests failing? | 18:49 |
themadcanudist | hey guys, i ran a $container->deleteAllObjects(); from the php opencloud SDK and the command translated to "DELETE /v1/AUTH_$tenant_id%3Fbulk-delete%3D1" | 18:50 |
themadcanudist | which ended up deleting the account!! Now I can't do anything.. i get a 403 "recently deleted" message | 18:50 |
themadcanudist | is this supposed to be possible? | 18:50 |
*** harlowja has joined #openstack-swift | 18:53 | |
*** serverascode has quit IRC | 18:54 | |
*** briancurtin has quit IRC | 18:54 | |
*** zhiyan has quit IRC | 18:54 | |
*** nottrobin has quit IRC | 18:54 | |
*** odsail has joined #openstack-swift | 18:55 | |
*** lastops has quit IRC | 19:00 | |
redbo | themadcanudist: what version of swift are you running? It's possible that could happen if you have an old version of swift and you don't have the bulk delete middleware running and you have the proxy set up to do account management. | 19:02 |
redbo | and you're using a client that doesn't use POST for bulk deletes | 19:04 |
portante | POST for bulk deletes, don't tell the REST police! | 19:04 |
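For contrast, when the bulk middleware is in the proxy pipeline, a client-side bulk delete normally targets specific objects: a POST (or, on some releases, DELETE) to the account URL with a ?bulk-delete query parameter and a newline-separated list of container/object names in a text/plain body. A rough curl sketch, with the token, endpoint, and object names as placeholders:

    # delete two named objects via the bulk middleware (all names are placeholders)
    curl -X POST \
         -H "X-Auth-Token: $TOKEN" \
         -H "Content-Type: text/plain" \
         --data-binary $'mycontainer/obj1\nmycontainer/obj2' \
         "http://proxy.example.com/v1/AUTH_test?bulk-delete"

This never issues a bare DELETE on the account itself, which is what bit themadcanudist above.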
swifterdarrell | portante: probably? each "run" was subsequent 20-min ssbench runs w/like 120s sleep in between. So there should have been 4 humps from max_clients=1024 and 2 or 3 from the in-progress max_clients=2 at the time I took those screenshots | 19:04 |
portante | swifterdarrell: k thanks | 19:05 |
*** lastops has joined #openstack-swift | 19:10 | |
*** annegentle has quit IRC | 19:14 | |
*** SkyRocknRoll has quit IRC | 19:14 | |
*** lastops has quit IRC | 19:15 | |
*** odsail has quit IRC | 19:18 | |
*** lastops has joined #openstack-swift | 19:22 | |
*** lastops has quit IRC | 19:25 | |
themadcanudist | redbo: That's likely it | 19:26 |
themadcanudist | can I disable that functionality!? | 19:26 |
themadcanudist | it's *super-dangerous* | 19:26 |
*** acampbell has quit IRC | 19:26 | |
themadcanudist | I just blew away my test account by running a bulkdelete php api call *on a container* | 19:26 |
themadcanudist | not swift's fault, except for the fact that it allows that behaviour | 19:27 |
*** Fin1te has quit IRC | 19:29 | |
openstackgerrit | Minwoo Bae proposed openstack/swift: The hash_cleanup_listdir function should only be called when necessary. https://review.openstack.org/178317 | 19:30 |
*** silor has joined #openstack-swift | 19:31 | |
*** silor has quit IRC | 19:37 | |
*** B4rker has quit IRC | 19:40 | |
*** ptb has joined #openstack-swift | 19:42 | |
ptb | In a multi-region cluster (4 replicas - 2/2), a 404 on HEAD will go over the wire to check the 2 remote disks and 2 secondary locations. Ideally, when read affinity is enabled, would it make sense to have an additional setting to only check the local region for an object, if that setting is enabled? | 19:48 |
notmyname | ptb: even knowing that setting will result in false 404s to the client? | 19:51 |
*** annegentle has joined #openstack-swift | 19:54 | |
ptb | agreed, what I am trying to solve for is the extra latency over the wire to the remote region | 19:54 |
ptb | b4 returning a 404 that is. 8) | 19:56 |
notmyname | ptb: yeah, that makes sense. just wanted to know if you still wanted the config option even knowing the tradeoffs | 19:59 |
notmyname | ptb: so it sounds like you're ok with a 404 for data that has been stored in the system but isn't in the current region | 20:00 |
notmyname | is that true? | 20:00 |
ptb | Exactly! There is an application pattern in use that always checks b4 writing files. | 20:00 |
ptb | In a multi-region replication scenario having such a setting would save a great deal of over the wire traffic to the remote regions with a HEAD 404 | 20:02 |
notmyname | yeah, I understand how it's good for speeding up 404s | 20:03 |
notmyname | but if you have an issue in one region and those 2 replicas aren't available (but they are available in the other region), then you'll ask for it and get a 404 to the client | 20:03 |
*** annegentle has quit IRC | 20:04 | |
notmyname | I'm thinking of data that is actually already in the cluster | 20:04 |
themadcanudist | redbo: Is there a way to disable the functionality that allows for older swift servers to have the account sent a DELETE? | 20:04 |
*** annegentle has joined #openstack-swift | 20:04 | |
ptb | Yes...this is why I think it should be a setting. Or in a multi-region >2 you could even have a single fallback region. | 20:04 |
*** petertr7 is now known as petertr7_away | 20:06 | |
ptb | I have regions A,B,C,D and perhaps could set a fallback region to check if the local lookup fails - as opposed to the current (albeit correct) behavior of checking all locations? | 20:07 |
notmyname | a specific fallback is already accounted for with the read_affinity setting (you can prefer one region or zone over another) | 20:08 |
notmyname | but yeah, you're asking for a config option to return the wrong answer really fast (in some cases) | 20:09 |
notmyname | granted, the common case would just make 404s faster | 20:09 |
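For reference, read affinity is configured per proxy in proxy-server.conf. The snippet below is a generic two-region example (region numbers and priorities are made up); it biases which replicas the proxy tries first but does not stop it from falling through to the remote region on a local miss, which is exactly the extra behavior ptb is asking to be able to switch off:

    [app:proxy-server]
    use = egg:swift#proxy
    sorting_method = affinity
    # prefer replicas in region 1, then region 2 (lower value = higher preference)
    read_affinity = r1=100, r2=200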
ptb | Yes. Ideally that is the problem to address - returning 404s faster in a multi-region cluster during a HEAD operation | 20:10 |
ptb | As we add regions, customers are expecting the same response times...trying to see if there are options to help meet their expectations. 8) | 20:11 |
ptb | Which also sends another +1 for your async container fix! | 20:12 |
notmyname | have you considered not doing the initial HEAD and instead doing a PUT with the "If-None-Match: *" header? | 20:13 |
* notmyname needs to get back to that patch soon | 20:13 | |
ptb | I hadn't...will experiment. | 20:13 |
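The suggestion here is to skip the existence-check HEAD entirely and let the PUT itself refuse to overwrite: Swift honors If-None-Match: * on object PUTs and answers 412 Precondition Failed if the object already exists. An illustrative curl, with the endpoint, token, and names as placeholders:

    # create the object only if it does not already exist;
    # if it does, the proxy returns 412 Precondition Failed
    curl -X PUT \
         -H "X-Auth-Token: $TOKEN" \
         -H "If-None-Match: *" \
         -T ./localfile \
         "http://proxy.example.com/v1/AUTH_test/mycontainer/myobject"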
*** charlesw has joined #openstack-swift | 20:14 | |
redbo | themadcanudist: if allow_account_management is set to "yes" in the proxy configs, that should be removed/set to no. | 20:14 |
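For anyone auditing their own deployment, that setting lives in the proxy app section of proxy-server.conf; a minimal excerpt with illustrative values:

    [app:proxy-server]
    use = egg:swift#proxy
    # when true, clients can PUT/DELETE whole accounts through the proxy;
    # leave it off unless something (e.g. a reseller system) really needs it
    allow_account_management = false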
notmyname | ptb: https://gist.github.com/notmyname/a63de6ce99d6f1df857b | 20:15 |
themadcanudist | last question redbo… if I accidently deleted an account like that, can I udnelete it and will the objects and containers still exist? | 20:15 |
themadcanudist | cuz righ tnow i'm getting a 403 "recently deleted" | 20:15 |
notmyname | themadcanudist: from the swift "ideas" page: "utility to "undelete" accounts, as described in http://docs.openstack.org/developer/swift/overview_reaper.html" | 20:16 |
*** blmartin has quit IRC | 20:16 | |
redbo | themadcanudist: it depends. if you're running the swift-account-reaper, that's what actually deletes things. Oh yeah just read that page. | 20:17 |
*** tellesnobrega_ has joined #openstack-swift | 20:17 | |
themadcanudist | yeah | 20:17 |
notmyname | themadcanudist: unfortunately, the current way to do that is to twiddle the bit in the account db | 20:17 |
themadcanudist | right | 20:17 |
themadcanudist | and mapping the account -> /srv/node/*/accounts/$HASH ? | 20:19 |
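On the mapping question, swift-get-nodes does that lookup from the ring; the ring path and account name below are placeholders. Its output lists the devices and on-disk paths (the account DB is a sqlite file under .../accounts/<partition>/<suffix>/<hash>/<hash>.db) along with ready-made curl and ssh lines. Actually flipping the deleted status back is then a manual edit of that DB (the "twiddle the bit" notmyname mentions) and should be done with care:

    # show which devices and paths hold the account DBs for this account
    swift-get-nodes /etc/swift/account.ring.gz AUTH_test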
notmyname | ptb: can you add that idea of a config option as you described it as a bug on launchpad? https://bugs.launchpad.net/swift | 20:19 |
*** thurloat is now known as thurloat_isgone | 20:20 | |
ptb | Indeed...will do. Thanks! | 20:23 |
notmyname | ptb: thanks | 20:27 |
minwoob | "Invalid arguments passed to liberasurecode_instance_create" -- has anyone seen this error lately, when testing a patch? | 20:34 |
minwoob | It isn't showing up locally, but when I push it to gerrit, the gate test seems to fail. | 20:35 |
minwoob | On my local system it is fine. | 20:35 |
minwoob | http://logs.openstack.org/17/178317/7/check/gate-swift-python27/889998b/console.html | 20:36 |
*** tellesnobrega_ has quit IRC | 20:38 | |
*** tellesnobrega_ has joined #openstack-swift | 20:39 | |
torgomatic | themadcanudist: fwiw, more-recent versions of Swift disallow account DELETE calls with a query string for just that reason | 20:41 |
*** nottrobin has joined #openstack-swift | 20:46 | |
*** serverascode has joined #openstack-swift | 20:51 | |
*** zhiyan has joined #openstack-swift | 20:54 | |
*** themadcanudist has quit IRC | 20:59 | |
peluse | minwoob, just approved so lets see if it happens again... | 20:59 |
minwoob | peluse: All right. Thank you. | 21:01 |
peluse | there've been some issues in the past up there; I didn't deal with them, though, so if it still pukes we'll find someone who did :) it works locally for me as well, though | 21:02 |
*** briancurtin has joined #openstack-swift | 21:05 | |
minwoob | Okay. | 21:07 |
minwoob | peluse: Do you mean that problem has been observed before, or just in general that the gate tests occasionally have exhibited strange behaviors? | 21:10 |
*** tellesnobrega_ has quit IRC | 21:11 | |
peluse | minwoob, in general issues with getting liberasurecode/pyeclib set up correctly on those systems... | 21:13 |
peluse | bah, still failed. | 21:15 |
peluse | probably little chance of getting help til Mon... | 21:15 |
*** lastops has joined #openstack-swift | 21:18 | |
*** doxavore has joined #openstack-swift | 21:21 | |
*** lastops has quit IRC | 21:23 | |
*** shlee322 has quit IRC | 21:26 | |
*** shlee322 has joined #openstack-swift | 21:27 | |
*** esker has quit IRC | 21:29 | |
*** jrichli has quit IRC | 21:31 | |
*** shlee322 has quit IRC | 21:36 | |
swifterdarrell | peluse: speaking of PyECLib & friends, the 1.0.7m versions break w/some distribute/pip/setuptools combination... I don't have anything more specific than that, but just FYI it can cause trouble | 21:38 |
swifterdarrell | peluse: i.e. I had 1.0.7m installed and the requirement ">=1.0.7" was unmet and my proxies wouldn't start | 21:39 |
swifterdarrell | peluse: arguably, I should have just packaged "1.0.7m" as "1.0.7" (a little white lie) | 21:39 |
swifterdarrell | peluse: (this was for PyECLib) | 21:39 |
*** openstack has joined #openstack-swift | 21:43 | |
torgomatic | swifterdarrell: that is pretty darn convincing | 21:43 |
peluse | swifterdarrell, interesting info on pyeclib. tsg is out of the country, I'll pass it on to him and Kevin though | 21:43 |
swifterdarrell | peluse: thx; don't think there's a real fix beyond getting gate to use actual libraries and get rid of the "m" | 21:44 |
swifterdarrell | peluse: (so the PyPi version also just uses liberasurecode vs. some bundled C stuff) | 21:44 |
swifterdarrell | torgomatic: yup... | 21:44 |
swifterdarrell | torgomatic: night and day | 21:44 |
notmyname | swifterdarrell: run "2" is with the failing disk(s)? | 21:44 |
swifterdarrell | notmyname: numbers like "0;" and "2;" indicate how many disks were made slow | 21:45 |
notmyname | ah ok | 21:45 |
swifterdarrell | acoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: so comparing "0;..." to "2;..." shows how badly 2 slow disks (1 per storage node, with 61 total disks in cluster) hurt | 21:46 |
notmyname | yup | 21:46 |
notmyname | night and day :) | 21:46 |
zaitcev | V.curious | 21:46 |
notmyname | the max latency is the scariest part (of workers=90) | 21:47 |
openstackgerrit | Darrell Bishop proposed openstack/swift: Allow 1+ object-servers-per-disk deployment https://review.openstack.org/184189 | 21:48 |
torgomatic | swifterdarrell: so that's 61 total disks in a cluster of 2 nodes, where each storage node had 1 slow disk? | 21:49 |
torgomatic | or is that 2 slow disks per storage node? | 21:50 |
peluse | wow | 21:51 |
peluse | we just experienced those same type things here in our cluster... | 21:52 |
notmyname | dfg_: glange: redbo: y'all should definitely check out those results and the patch ^ | 21:52 |
swifterdarrell | acoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: updated the gist to clarify | 21:56 |
swifterdarrell | acoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: "0;..." is no slow disks (just normal swift-object-auditor) "1;..." is one slow disk (out of 61 total disks in cluster) in one of 2 storage nodes "2;..." is two slow disks (out of 61 total disks in cluster), one per each of the 2 storage nodes | 21:57 |
torgomatic | swifterdarrell: thanks | 21:57 |
swifterdarrell | torgomatic: np | 21:57 |
zaitcev | I saw that | 21:57 |
zaitcev | Thanks | 21:57 |
swifterdarrell | acoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: also updated gist to include the make-a-drive-slow script I used; a value of 256 kills a disk pretty good (await between 500 and 1000+ms) | 21:59 |
openstackgerrit | Michael Barton proposed openstack/swift: go: restructure cmd/hummingbird.go https://review.openstack.org/188939 | 21:59 |
swifterdarrell | https://gist.github.com/dbishop/fd0ab067babdecfb07ca#file-io_hammer-py | 21:59 |
*** marzif_ has quit IRC | 22:01 | |
swifterdarrell | acoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: here's earlier results that illustrate the threads_per_disk overhead: https://gist.github.com/dbishop/1d14755fedc86a161718#file-tabular_results-md | 22:02 |
mattoliverau | Wow! | 22:04 |
peluse | swifterdarrell, so I guess the question is how many clusters out there are being hit by this but their admins have no idea? | 22:07 |
swifterdarrell | peluse: probably a lot? | 22:07 |
notmyname | all of them? ;-) | 22:07 |
notmyname | peluse: only the ones that have drives that fail | 22:08 |
peluse | safe bet I think! | 22:08 |
peluse | fail or just intermittently crappy perf? | 22:08 |
swifterdarrell | peluse: Intel saw it in their testing w/COSBench way back, like 4 design summits ago? that was what prompted the threads_per_disk change | 22:08 |
peluse | yup, I remember - that was jiangang | 22:08 |
peluse | portland summit | 22:08 |
swifterdarrell | peluse: notmyname: declining perf is worse than total failure, I think | 22:08 |
notmyname | yeah, bad drive. or overloaded drive | 22:08 |
notmyname | swifterdarrell: yeah | 22:08 |
swifterdarrell | peluse: ya! portland | 22:09 |
notmyname | good: working drive. bad: broken drive. worst: drive that isn't broken but is slow | 22:09 |
redbo | was this not common knowledge? | 22:09 |
peluse | I didn't think the extent was common knowledge but could just be me | 22:10 |
swifterdarrell | redbo: which part? the pain of slow disks has been common knowledge since at least the portland summit | 22:10 |
* peluse means by extent the drastic data that swifterdarrell just showed vs 'yeah there's impact' | 22:10 | |
*** charlesw has quit IRC | 22:10 | |
swifterdarrell | redbo: Mercado Libre deployed object servers per disk via ring device port differentiation quite a while ago and talked about it, so that's been common knowledge | 22:11 |
swifterdarrell | redbo: I don't know about common knowledge, but we've seen too-high overhead w/threads_per_disk and no longer recommend it | 22:11 |
peluse | maybe ripping it out would be a good low priority todo item at some point.... | 22:12 |
*** dimasot has quit IRC | 22:12 | |
notmyname | the new thing here is multiple listeners per port when there's one drive per port (right?). the mercado libre situation was 1:1 port:drive | 22:12 |
swifterdarrell | peluse: I've been terrified of blocking I/O calls starving eventlet hub for quite a while now | 22:12 |
swifterdarrell | peluse: servers_per_port has been on my hitlist for a long time... but finally had enough time to actually work on it | 22:12 |
peluse | so they're not keeping you busy enough, is that what you're saying? :) | 22:13 |
swifterdarrell | peluse: haha | 22:13 |
notmyname | and swifterdarrell's results show that the 1 worker per drive isn't always good (or much better than just a lot of workers wrt max latency). but multiple workers per port is great for smoothing out latency and keeping it low | 22:13 |
swifterdarrell | notmyname: ya, servers_per_port=1 didn't cut it | 22:14 |
swifterdarrell | notmyname: 3 was the sweet spot for my 30-disk nodes; not sure how that'd change for a 60 or 80-disk storage node | 22:14 |
notmyname | I think the new thing here is that multiple servers per port where each port is a different drive | 22:14 |
notmyname | s/that// | 22:15 |
portante | swifterdarrell: nice work, I see that maxclients=2 with 90 workers only helps the 99%, but the average is still high, so that really shows how important the server per port method is | 22:17 |
openstackgerrit | Darrell Bishop proposed openstack/swift: Allow 1+ object-servers-per-disk deployment https://review.openstack.org/184189 | 22:17 |
swifterdarrell | portante: ya, the max_clients=2 was a wash | 22:18 |
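For anyone wanting to experiment with the patch under review (184189), the rough shape of a servers-per-port deployment is: give every disk its own port in the object ring, then tell the object server how many server processes to run per ring port. The snippet below is a sketch under that assumption, with IPs, ports, device names, and weights made up:

    # object-server.conf
    [DEFAULT]
    # spawn this many object-server processes for each unique port in the ring
    servers_per_port = 3

    # object ring: one port per disk on the same node, e.g.
    #   swift-ring-builder object.builder add r1z1-10.0.0.10:6001/sdb 100
    #   swift-ring-builder object.builder add r1z1-10.0.0.10:6002/sdc 100
    #   swift-ring-builder object.builder rebalance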
*** zhill__ has joined #openstack-swift | 22:19 | |
*** jkugel1 has quit IRC | 22:19 | |
*** zhill_ has quit IRC | 22:20 | |
*** bi_fa_fu has joined #openstack-swift | 22:21 | |
*** themadcanudist has joined #openstack-swift | 22:24 | |
themadcanudist | torgomatic: thanks! | 22:25 |
*** ptb has quit IRC | 22:28 | |
*** openstackgerrit has quit IRC | 22:37 | |
*** openstackgerrit has joined #openstack-swift | 22:37 | |
*** lcurtis has quit IRC | 22:41 | |
*** wbhuber has quit IRC | 22:46 | |
*** doxavore has quit IRC | 22:48 | |
*** ozialien has joined #openstack-swift | 22:49 | |
*** zhill__ has quit IRC | 22:53 | |
*** zhill_ has joined #openstack-swift | 22:55 | |
*** zhill_ is now known as zhill_mbp | 22:57 | |
*** petertr7_away is now known as petertr7 | 22:58 | |
*** ozialien has quit IRC | 23:02 | |
*** ozialien has joined #openstack-swift | 23:05 | |
*** petertr7 is now known as petertr7_away | 23:07 | |
*** annegentle has quit IRC | 23:18 | |
openstackgerrit | Michael Barton proposed openstack/swift: go: restructure cmd/hummingbird.go https://review.openstack.org/188939 | 23:31 |
*** ozialien has quit IRC | 23:33 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!