Tuesday, 2021-01-19

*** baojg has joined #openstack-swift		01:34
*** rcernin has quit IRC		02:26
*** rcernin has joined #openstack-swift		02:42
*** rcernin has quit IRC		02:44
*** rcernin has joined #openstack-swift		02:44
openstackgerrit	Matthew Oliver proposed openstack/swift master: Add root acceptor as root if root has been deleted https://review.opendev.org/c/openstack/swift/+/771343	05:15
*** evrardjp has quit IRC		05:33
*** evrardjp has joined #openstack-swift		05:33
*** m75abrams has joined #openstack-swift		05:36
*** dsariel has joined #openstack-swift		06:12
*** rcernin has quit IRC		07:19
*** rpittau|afk is now known as rpittau		08:07
*** hoonetorg has quit IRC		08:51
*** hoonetorg has joined #openstack-swift		08:53
*** dsariel has quit IRC		09:15
*** dsariel has joined #openstack-swift		09:16
*** dsariel has quit IRC		09:17
*** dsariel has joined #openstack-swift		09:17
*** rcernin has joined #openstack-swift		09:43
*** dsariel has quit IRC		10:19
*** dsariel has joined #openstack-swift		10:20
*** rcernin has quit IRC		10:28
*** rcernin has joined #openstack-swift		11:13
*** baojg has quit IRC		11:28
*** baojg has joined #openstack-swift		11:29
*** baojg has quit IRC		11:29
*** baojg has joined #openstack-swift		11:29
*** baojg has quit IRC		11:30
*** baojg has joined #openstack-swift		11:30
*** baojg has quit IRC		11:31
*** baojg has joined #openstack-swift		11:31
*** baojg has quit IRC		11:32
*** baojg has joined #openstack-swift		11:32
*** baojg has quit IRC		11:32
*** baojg has joined #openstack-swift		11:33
*** baojg has quit IRC		11:33
*** baojg has joined #openstack-swift		11:34
*** baojg has quit IRC		11:34
*** baojg has joined #openstack-swift		11:34
openstackgerrit	Alistair Coles proposed openstack/swift master: Fix 503s from EC GETs of objects with POST metadata https://review.opendev.org/c/openstack/swift/+/771089	11:34
*** baojg has quit IRC		11:35
*** baojg has joined #openstack-swift		11:35
*** baojg has quit IRC		11:36
*** baojg has joined #openstack-swift		11:36
*** baojg has quit IRC		11:36
*** baojg has joined #openstack-swift		11:37
*** rcernin has quit IRC		11:37
*** baojg has quit IRC		11:37
*** baojg has joined #openstack-swift		11:38
*** baojg has quit IRC		11:43
*** paladox has quit IRC		11:57
*** lifeless has quit IRC		11:57
*** DHE has quit IRC		11:57
*** paladox has joined #openstack-swift		11:57
*** lifeless has joined #openstack-swift		11:58
*** DHE has joined #openstack-swift		11:58
openstackgerrit	Merged openstack/swift master: s3api: Get rid of slo_enabled flag https://review.opendev.org/c/openstack/swift/+/770685	12:02
*** rcernin has joined #openstack-swift		14:42
openstackgerrit	Alistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world https://review.opendev.org/c/openstack/swift/+/771467	14:45
*** m75abrams has quit IRC		14:53
*** rcernin has quit IRC		15:01
*** m75abrams has joined #openstack-swift		15:24
*** klamath_atx has joined #openstack-swift		15:45
timburke_	good morning	15:51
timburke_	DHE, good to know on the proxy hang -- i'll try to get something approaching a repro with pre https://review.opendev.org/c/openstack/swift/+/752593, then test again with something more like master. assuming that doesn't fly, i'll try moving the fix to before monkey-patching (which seems like it should do it for both original and patched versions)	15:56
*** m75abrams has quit IRC		16:00
*** klamath_atx has quit IRC		16:01
*** diablo_rojo has joined #openstack-swift		16:01
*** m75abrams has joined #openstack-swift		16:03
*** jv has quit IRC		16:21
*** jv has joined #openstack-swift		16:34
*** hoonetorg has quit IRC		16:49
*** hoonetorg has joined #openstack-swift		16:49
*** jv has quit IRC		17:10
*** m75abrams has quit IRC		17:19
*** jv has joined #openstack-swift		17:25
seongsoocho	Hi, when just one disk is unmounted from an object-server, all object-servers respond slowly to REPLICATE requests. As a result, the object-replicator runs very slowly, and I see the message 'Nothing replicated for 2400.06150103 seconds' in the log file. Is this normal operation?	17:33
*** gyee has joined #openstack-swift		18:16
*** rpittau is now known as rpittau|afk		18:41
openstackgerrit	Tim Burke proposed openstack/swift master: obj: Include timeout value when logging long-running rsyncs https://review.opendev.org/c/openstack/swift/+/771504	18:44
timburke_	seongsoocho, that sounds like normal operation. when a disk responds as unmounted, the replicator will assume that the disk has "failed in place" and work to ensure full durability by replicating to the first-available handoff. the "Nothing replicated ..." messages are usually because it's waiting on a long-running rsync (which makes sense if it needs to copy the whole partition)	19:02
timburke_	note that if the server never responds, otoh, the replicator assumes that it's a transient failure and will not replicate to handoffs	19:03
seongsoocho	timburke_: About 24 hours have passed since it was unmounted. And looking at the replication network traffic, it seems that replication to the first handoff node is over. Will replicating to the first handoff node affect the slow response speed of the object-server's REPLICATE api?	19:09
timburke_	every partition dir on every disk has a hashes.pkl file that has a kind of a checksum of the files present in that partition on that disk. after rsyncing a whole partition, the receiver will need to recalculate that checksum, which can be pretty io intensive. this can cause slow responses for requests to that disk (REPLICATE or otherwise, and within that partition or otherwise)	19:18
timburke_	still, i'm surprised that it's still impacting things that much a day on...	19:20
timburke_	it's probably worth looking at iostat/iotop on a slow server	19:23
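
To illustrate why recalculating the per-partition checksum described above can be IO intensive, here is a minimal sketch. It is a simplified illustration and not Swift's actual code: the directory layout (partition/suffix/object-directory/files), the function name `suffix_checksums`, and the choice of MD5 over file names are all assumptions made for the example. The point is only that every object directory has to be listed before a checksum can be produced.

```python
import hashlib
import os
import tempfile


def suffix_checksums(partition_dir):
    """Map each sub-directory of a partition to an MD5 over the file names below it.

    Hypothetical helper for illustration; the real hashes.pkl logic may differ.
    """
    checksums = {}
    for suffix in sorted(os.listdir(partition_dir)):
        suffix_path = os.path.join(partition_dir, suffix)
        if not os.path.isdir(suffix_path):
            continue  # skip plain files, e.g. a cached checksum file
        digest = hashlib.md5()
        # Each object directory is listed in turn: many small, scattered reads.
        for obj_dir in sorted(os.listdir(suffix_path)):
            obj_path = os.path.join(suffix_path, obj_dir)
            if not os.path.isdir(obj_path):
                continue
            for name in sorted(os.listdir(obj_path)):
                digest.update(name.encode("utf-8"))
        checksums[suffix] = digest.hexdigest()
    return checksums


# Tiny demonstration on a throwaway directory tree (all names are made up).
with tempfile.TemporaryDirectory() as part:
    obj = os.path.join(part, "abc", "0123abc")
    os.makedirs(obj)
    open(os.path.join(obj, "1611082892.00000.data"), "w").close()
    before = suffix_checksums(part)
    open(os.path.join(obj, "1611083000.00000.meta"), "w").close()
    after = suffix_checksums(part)

print(before != after)
```

After a whole partition has been rsynced, a receiver working this way would have to walk every directory it just received, which is consistent with the slow responses described above.
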
*** lifeless has quit IRC		19:27
*** lifeless has joined #openstack-swift		19:27
seongsoocho	disk io is not that high on a slow server. (It's weird.. ) I will wait a little longer. thanks!	19:30
openstackgerrit	Clay Gerrard proposed openstack/swift master: Do not reclaim sharded roots until they shrink https://review.opendev.org/c/openstack/swift/+/771086	19:44
clayg	oh hrm... looks like I may have missed an opportunity to address some review comments - i'll mark it WIP	19:45
clayg	seongsoocho: did you say you already know why cycle time is slow, and it's the REPLICATE requests? i don't see why a REPLICATE request to a mounted disk would be slow because another disk is unmounted on a different server - maybe bouncing services could clean up some tar pit object server?	19:50
clayg	IME REPLICATE requests that have to do a re-hash have always been kinda slow - maybe you just didn't notice and that's not actually what changed between when your cycle times were ok, and now?	19:51
timburke_	fwiw, i know i often see low %util but high %iowait at home -- not sure if that's mostly a result of running with SMR disks or what, though	19:52
clayg	what Tim said about "unmounting a disk causes replication" is 100% true - if that's all that's going on that's normal	19:52
clayg	timburke_: for sure, iowait can get tanked by random reads - which is basically what a re-hash is doing 🤮	19:52
timburke_	the funny thing is that %util for the disk isn't just low, it's 0. ditto all the per-disk stats	19:53
timburke_	like, in the middle of a `iostat -x 5` i get back a set of stats like http://paste.openstack.org/show/801743/	19:57
seongsoocho	https://www.irccloud.com/pastebin/5s8sEwPp/	20:01
seongsoocho	oh..	20:01
seongsoocho	this is the current object-server log.	20:02
seongsoocho	[19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 0	20:02
seongsoocho	[19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 0	20:02
seongsoocho	[19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 0	20:02
seongsoocho	[19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 0	20:02
seongsoocho	Before the replication to the handoff node, the response time was about 0.0x seconds.	20:02
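
The timings in the log lines pasted above can be compared more easily with a small script. The following is a rough sketch that assumes the object-server log format is exactly as pasted, and that the decimal number after the user-agent string is the response time in seconds (as the discussion implies). The regular expression and the function name `replicate_times` are made up for this example.

```python
import re
from collections import defaultdict

# Matches the pasted lines: timestamp, REPLICATE path, status, size,
# two quoted fields, the quoted user agent, then the time in seconds.
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "REPLICATE (?P<path>\S+)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "[^"]*" "(?P<agent>[^"]*)" (?P<seconds>\d+\.\d+)'
)


def replicate_times(lines):
    """Group REPLICATE response times (seconds) by user-agent string."""
    by_agent = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            by_agent[match.group("agent")].append(float(match.group("seconds")))
    return dict(by_agent)


SAMPLE = [
    '[19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 0',
    '[19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 0',
    '[19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 0',
    '[19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 0',
]

times = replicate_times(SAMPLE)
for agent, values in sorted(times.items()):
    print(agent, values)
```

On these four lines, the two slower responses both carry the user agent "object-replicator 7361" and the two faster ones "object-replicator 30793"; four samples are too few to draw a conclusion from.
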
*** Jeffrey4l has quit IRC		20:04
seongsoocho	I don't know why the response time has changed from before. But I think this is why the replicator slows down.	20:08
*** openstackgerrit has quit IRC		20:12
*** Jeffrey4l has joined #openstack-swift		20:13
*** openstackgerrit has joined #openstack-swift		20:23
openstackgerrit	Alistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world https://review.opendev.org/c/openstack/swift/+/771467	20:23
timburke_	man, trying to repro https://bugs.launchpad.net/swift/+bug/1895739 is making me notice other weird things, too... somehow, with a 3x replicated policy and only 5 disks in my cluster, i'm getting 7x "Client disconnected on read of ..." messages with the same txn id??	20:31
openstack	Launchpad bug 1895739 in OpenStack Object Storage (swift) "Proxy server sometimes deadlocks while logging client disconnect" [Undecided,In progress]	20:31
timburke_	client_ip is sometimes present, sometimes not... and some log lines are missing txn id entirely... maybe i should try applying https://review.opendev.org/c/openstack/swift/+/761475 and see if i can get any more insight? though that was mostly targeting EC...	20:34
timburke_	i should probably also up-rev swift -- it's not even on 2.26.0 yet (though it's only a couple commits or so behind there)	20:36
timburke_	:-/ that didn't help much; still see error logs with no txn id, no client ip...	20:41
*** Jeffrey4l has quit IRC		20:50
*** Jeffrey4l has joined #openstack-swift		20:51
openstackgerrit	Clay Gerrard proposed openstack/swift master: Debug EC multipart/byteranges responses https://review.opendev.org/c/openstack/swift/+/761475	20:57
openstackgerrit	Clay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async https://review.opendev.org/c/openstack/swift/+/648263	21:06
clayg	little rebase action pre-package 🥳	21:07
openstackgerrit	Tim Burke proposed openstack/swift master: relinker: Track part_power/next_part_power in state file https://review.opendev.org/c/openstack/swift/+/769855	21:18
*** priteau has quit IRC		21:35
mattoliverau	morning	22:00
*** rcernin has joined #openstack-swift		22:09
*** dsariel has quit IRC		22:18
*** openstackgerrit has quit IRC		22:59
*** openstackgerrit has joined #openstack-swift		23:00
openstackgerrit	Tim Burke proposed openstack/swift master: s3api: Break S3Request.__init__ signature less https://review.opendev.org/c/openstack/swift/+/771526	23:00
timburke_	clayg, give ^^^ a try	23:00
*** klamath_atx has joined #openstack-swift		23:05
