*** baojg has joined #openstack-swift | 01:34 | |
*** rcernin has quit IRC | 02:26 | |
*** rcernin has joined #openstack-swift | 02:42 | |
*** rcernin has quit IRC | 02:44 | |
*** rcernin has joined #openstack-swift | 02:44 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: Add root acceptor as root if root has been deleted https://review.opendev.org/c/openstack/swift/+/771343 | 05:15 |
---|---|---|
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-swift | 05:33 | |
*** m75abrams has joined #openstack-swift | 05:36 | |
*** dsariel has joined #openstack-swift | 06:12 | |
*** rcernin has quit IRC | 07:19 | |
*** rpittau|afk is now known as rpittau | 08:07 | |
*** hoonetorg has quit IRC | 08:51 | |
*** hoonetorg has joined #openstack-swift | 08:53 | |
*** dsariel has quit IRC | 09:15 | |
*** dsariel has joined #openstack-swift | 09:16 | |
*** dsariel has quit IRC | 09:17 | |
*** dsariel has joined #openstack-swift | 09:17 | |
*** rcernin has joined #openstack-swift | 09:43 | |
*** dsariel has quit IRC | 10:19 | |
*** dsariel has joined #openstack-swift | 10:20 | |
*** rcernin has quit IRC | 10:28 | |
*** rcernin has joined #openstack-swift | 11:13 | |
*** baojg has quit IRC | 11:28 | |
*** baojg has joined #openstack-swift | 11:29 | |
*** baojg has quit IRC | 11:29 | |
*** baojg has joined #openstack-swift | 11:29 | |
*** baojg has quit IRC | 11:30 | |
*** baojg has joined #openstack-swift | 11:30 | |
*** baojg has quit IRC | 11:31 | |
*** baojg has joined #openstack-swift | 11:31 | |
*** baojg has quit IRC | 11:32 | |
*** baojg has joined #openstack-swift | 11:32 | |
*** baojg has quit IRC | 11:32 | |
*** baojg has joined #openstack-swift | 11:33 | |
*** baojg has quit IRC | 11:33 | |
*** baojg has joined #openstack-swift | 11:34 | |
*** baojg has quit IRC | 11:34 | |
*** baojg has joined #openstack-swift | 11:34 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: Fix 503s from EC GETs of objects with POST metadata https://review.opendev.org/c/openstack/swift/+/771089 | 11:34 |
*** baojg has quit IRC | 11:35 | |
*** baojg has joined #openstack-swift | 11:35 | |
*** baojg has quit IRC | 11:36 | |
*** baojg has joined #openstack-swift | 11:36 | |
*** baojg has quit IRC | 11:36 | |
*** baojg has joined #openstack-swift | 11:37 | |
*** rcernin has quit IRC | 11:37 | |
*** baojg has quit IRC | 11:37 | |
*** baojg has joined #openstack-swift | 11:38 | |
*** baojg has quit IRC | 11:43 | |
*** paladox has quit IRC | 11:57 | |
*** lifeless has quit IRC | 11:57 | |
*** DHE has quit IRC | 11:57 | |
*** paladox has joined #openstack-swift | 11:57 | |
*** lifeless has joined #openstack-swift | 11:58 | |
*** DHE has joined #openstack-swift | 11:58 | |
openstackgerrit | Merged openstack/swift master: s3api: Get rid of slo_enabled flag https://review.opendev.org/c/openstack/swift/+/770685 | 12:02 |
*** rcernin has joined #openstack-swift | 14:42 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world https://review.opendev.org/c/openstack/swift/+/771467 | 14:45 |
*** m75abrams has quit IRC | 14:53 | |
*** rcernin has quit IRC | 15:01 | |
*** m75abrams has joined #openstack-swift | 15:24 | |
*** klamath_atx has joined #openstack-swift | 15:45 | |
timburke_ | good morning | 15:51 |
timburke_ | DHE, good to know on the proxy hang -- i'll try to get something approaching a repro with pre https://review.opendev.org/c/openstack/swift/+/752593, then test again with something more like master. assuming that doesn't fly, i'll try moving the fix to *before* monkey-patching (which seems like it should do it for both original and patched versions) | 15:56 |
*** m75abrams has quit IRC | 16:00 | |
*** klamath_atx has quit IRC | 16:01 | |
*** diablo_rojo has joined #openstack-swift | 16:01 | |
*** m75abrams has joined #openstack-swift | 16:03 | |
*** jv has quit IRC | 16:21 | |
*** jv has joined #openstack-swift | 16:34 | |
*** hoonetorg has quit IRC | 16:49 | |
*** hoonetorg has joined #openstack-swift | 16:49 | |
*** jv has quit IRC | 17:10 | |
*** m75abrams has quit IRC | 17:19 | |
*** jv has joined #openstack-swift | 17:25 | |
seongsoocho | Hi. When just one disk is unmounted from an object-server, all object-servers respond slowly to REPLICATE requests. As a result, the object-replicator runs very slowly, and I see the message 'Nothing replicated for 2400.06150103 seconds' in the log file. Is this normal operation? | 17:33 |
*** gyee has joined #openstack-swift | 18:16 | |
*** rpittau is now known as rpittau|afk | 18:41 | |
openstackgerrit | Tim Burke proposed openstack/swift master: obj: Include timeout value when logging long-running rsyncs https://review.opendev.org/c/openstack/swift/+/771504 | 18:44 |
timburke_ | seongsoocho, that sounds like normal operation. when a disk responds as unmounted, the replicator will assume that the disk has "failed in place" and work to ensure full durability by replicating to the first-available handoff. the "Nothing replicated ..." messages are usually because it's waiting on a long-running rsync (which makes sense if it needs to copy the whole partition) | 19:02 |
timburke_ | note that if the server *never responds*, otoh, the replicator assumes that it's a transient failure and will *not* replicate to handoffs | 19:03 |
seongsoocho | timburke_: About 24 hours have passed since it was unmounted. And looking at the replication network traffic, it seems that replication to the first handoff node is over. Could replicating to the first handoff node be what slows down responses from the object-server's REPLICATE api? | 19:09 |
timburke_ | every partition dir on every disk has a hashes.pkl file that has a kind of a checksum of the files present in that partition on that disk. after rsyncing a whole partition, the receiver will need to recalculate that checksum, which can be pretty io intensive. this can cause slow responses for requests to that disk (REPLICATE or otherwise, and within that partition or otherwise) | 19:18 |
timburke_ | still, i'm surprised that it's still impacting things *that much* a day on... | 19:20 |
timburke_ | it's probably worth looking at iostat/iotop on a slow server | 19:23 |
*** lifeless has quit IRC | 19:27 | |
*** lifeless has joined #openstack-swift | 19:27 | |
seongsoocho | disk io is not that high on a slow server. (It's weird.. ) I will wait a little longer. thanks! | 19:30 |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Do not reclaim sharded roots until they shrink https://review.opendev.org/c/openstack/swift/+/771086 | 19:44 |
clayg | oh hrm... looks like I may have missed an opportunity to address some review comments - i'll mark it WIP | 19:45 |
clayg | seongsoocho: did you say you already know *why* cycle time is slow, and it's the REPLICATE requests? i don't see why a REPLICATE request to a mounted disk would be slow because another disk is unmounted *on a different server* - maybe bouncing services could clean up some tar pit object server? | 19:50 |
clayg | IME REPLICATE requests that have to do a re-hash have always been kinda slow - maybe you just didn't notice and that's not actually what *changed* between when your cycle times were ok, and now? | 19:51 |
timburke_ | fwiw, i know i often see low %util but high %iowait at home -- not sure if that's mostly a result of running with SMR disks or what, though | 19:52 |
clayg | what Tim said about "unmounting a disk causes replication" is 100% true - if that's all that's going on that's normal | 19:52 |
clayg | timburke_: for sure, iowait can get tanked by random reads - which is basically what a re-hash is doing 🤮 | 19:52 |
timburke_ | the funny thing is that %util for the disk isn't just *low*, it's *0*. ditto *all* the per-disk stats | 19:53 |
timburke_ | like, in the middle of a `iostat -x 5` i get back a set of stats like http://paste.openstack.org/show/801743/ | 19:57 |
seongsoocho | https://www.irccloud.com/pastebin/5s8sEwPp/ | 20:01 |
seongsoocho | oh.. | 20:01 |
seongsoocho | this is the current object-server log. | 20:02 |
seongsoocho | [19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 0 | 20:02 |
seongsoocho | [19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 0 | 20:02 |
seongsoocho | [19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 0 | 20:02 |
seongsoocho | [19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 0 | 20:02 |
seongsoocho | Before the replication to the handoff node, the response time was about 0.0x seconds. | 20:02 |
*** Jeffrey4l has quit IRC | 20:04 | |
seongsoocho | I don't know why the response time has changed from before. But I think this is why the replicator slows down. | 20:08 |
*** openstackgerrit has quit IRC | 20:12 | |
*** Jeffrey4l has joined #openstack-swift | 20:13 | |
*** openstackgerrit has joined #openstack-swift | 20:23 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: s3api: actually execute check_pipeline in real world https://review.opendev.org/c/openstack/swift/+/771467 | 20:23 |
timburke_ | man, trying to repro https://bugs.launchpad.net/swift/+bug/1895739 is making me notice other weird things, too... somehow, with a 3x replicated policy and only 5 disks in my cluster, i'm getting 7x "Client disconnected on read of ..." messages with the same txn id?? | 20:31 |
openstack | Launchpad bug 1895739 in OpenStack Object Storage (swift) "Proxy server sometimes deadlocks while logging client disconnect" [Undecided,In progress] | 20:31 |
timburke_ | client_ip is sometimes present, sometimes not... and some log lines are missing txn id entirely... maybe i should try applying https://review.opendev.org/c/openstack/swift/+/761475 and see if i can get any more insight? though that was mostly targeting EC... | 20:34 |
timburke_ | i should probably also up-rev swift -- it's not even on 2.26.0 yet (though it's only a couple commits or so behind there) | 20:36 |
timburke_ | :-/ that didn't help much; still see error logs with no txn id, no client ip... | 20:41 |
*** Jeffrey4l has quit IRC | 20:50 | |
*** Jeffrey4l has joined #openstack-swift | 20:51 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: Debug EC multipart/byteranges responses https://review.opendev.org/c/openstack/swift/+/761475 | 20:57 |
openstackgerrit | Clay Gerrard proposed openstack/swift master: WIP: s3api: Make multi-deletes async https://review.opendev.org/c/openstack/swift/+/648263 | 21:06 |
clayg | little rebase action pre-package 🥳 | 21:07 |
openstackgerrit | Tim Burke proposed openstack/swift master: relinker: Track part_power/next_part_power in state file https://review.opendev.org/c/openstack/swift/+/769855 | 21:18 |
*** priteau has quit IRC | 21:35 | |
mattoliverau | morning | 22:00 |
*** rcernin has joined #openstack-swift | 22:09 | |
*** dsariel has quit IRC | 22:18 | |
*** openstackgerrit has quit IRC | 22:59 | |
*** openstackgerrit has joined #openstack-swift | 23:00 | |
openstackgerrit | Tim Burke proposed openstack/swift master: s3api: Break S3Request.__init__ signature less https://review.opendev.org/c/openstack/swift/+/771526 | 23:00 |
timburke_ | clayg, give ^^^ a try | 23:00 |
*** klamath_atx has joined #openstack-swift | 23:05 |
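
The REPLICATE timings pasted at 20:02 can be compared per sender with a few lines of Python. This is only a sketch, not a Swift tool: the regular expression is fitted to the four log lines quoted in the channel, the function name `replicate_times_by_sender` is hypothetical, and treating the number after `object-replicator` as a sender identifier (presumably the replicator's PID) is an assumption. Other object-server log formats would need a different pattern.

```python
import re
from collections import defaultdict
from statistics import mean

# Pattern fitted to the log lines quoted in the channel (assumed layout):
# [timestamp] "REPLICATE /device/partition" status bytes "-" "-" "object-replicator N" seconds ...
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "REPLICATE (?P<path>\S+)" (?P<status>\d{3}) \d+ '
    r'"[^"]*" "[^"]*" "object-replicator (?P<sender>\d+)" (?P<secs>\d+\.\d+)'
)


def replicate_times_by_sender(lines):
    """Return the mean REPLICATE response time in seconds per sender id.

    Lines that do not match the assumed format are skipped.
    """
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            times[match.group("sender")].append(float(match.group("secs")))
    return {sender: mean(values) for sender, values in times.items()}


# The four lines from the channel, copied verbatim.
SAMPLE = [
    '[19/Jan/2021:18:41:32 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0217 "-" 2148 0',
    '[19/Jan/2021:18:59:20 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.7141 "-" 2166 0',
    '[19/Jan/2021:19:03:54 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 30793" 0.0194 "-" 2166 0',
    '[19/Jan/2021:19:23:43 +0000] "REPLICATE /sdc/934" 200 169030 "-" "-" "object-replicator 7361" 0.5397 "-" 2164 0',
]

for sender, avg in sorted(replicate_times_by_sender(SAMPLE).items()):
    print(f"object-replicator {sender}: mean {avg:.4f}s")
```

In the four quoted lines, both slow responses carry the `7361` suffix and both fast ones carry `30793`. With so few samples this proves nothing, but grouping a larger slice of the log the same way might show whether the slowness follows one sender rather than the receiving disk.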