opendevreview | Timur Alperovich proposed openstack/swift master: Fix multipart upload listings https://review.opendev.org/c/openstack/swift/+/813715 | 00:21 |
---|---|---|
reid_g | Hey guys, what would be the path to diagnosing 'ERROR with Object server 10.40.100.92:6001/d1 re: Trying to get final status of PUT to' messages? We started receiving a lot of them with some increased traffic. | 13:42 |
DHE | timeouts? | 13:42 |
reid_g | Yeah Timeout (10.0s) | 13:42 |
reid_g | Are you saying to tweak timeouts? | 13:46 |
reid_g | Which have you changed? | 13:46 |
DHE | umm, none. 10 second timeout seems downright generous, unless you have some crazy fast networking and large objects? | 13:46 |
DHE | if it's consistently the same server and device (d1) then I'd look at this specific disk's health | 13:47 |
reid_g | We have 10G on that cluster but the items are probably small. | 14:05 |
reid_g | It's pretty much every server showing these errors. Should we be doing rolling restart of swift services occasionally? | 14:06 |
reid_g | 10s is the default for that Timeout btw | 14:11 |
DHE | I'm not a dev, but my guess would be large files uploaded very rapidly... 10G could upload a full size 5G object in 5 seconds at peak, but a spinning disk can't possibly flush the data within 10 seconds... depending on write cache sizes of course | 14:17 |
DHE | actually it's 512 MB limit of dirty data by default... | 14:18 |
DHE | that's my guess... busy cluster, large objects, dirty data sync-out... | 14:19 |
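DHE's timing argument above can be checked with back-of-the-envelope arithmetic. This is a sketch with assumed numbers: the 150 MB/s sustained write rate for a single spinning disk is an illustration, not a measurement from this cluster.

```python
# Sanity-check the "disk can't flush in time" hypothesis.
NIC_MB_PER_S = 10_000 / 8   # 10 Gb/s line rate ~= 1250 MB/s
OBJECT_MB = 5_000           # ~5 GB object (Swift's default max object size)
DISK_MB_PER_S = 150         # assumed sustained write rate of one spinning disk
NODE_TIMEOUT_S = 10         # the 10.0 s timeout reid_g reported

transfer_s = OBJECT_MB / NIC_MB_PER_S   # time to receive the object over the NIC
flush_s = OBJECT_MB / DISK_MB_PER_S     # time to sync it all to one disk

print(f"receive: {transfer_s:.1f}s, flush: {flush_s:.1f}s")  # ~4.0s vs ~33.3s
```

Under these assumptions the network delivers the object in about 4 seconds, while a single disk needs over 30 seconds to flush it, comfortably blowing past a 10-second timeout if the fsync happens all at once.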
reid_g | We are currently copying from an old hodgepodge cluster to this newer cluster. Fixed an issue on the old cluster and the GET 200 doubled and HEAD 200 x4. So I think the transfer sped up and is causing more strain on the new cluster. | 14:49 |
DHE | multiple simultaneous uploads at high speeds? yeah I'm thinking disks might just be getting too busy to fsync() in reasonable amounts of time... | 15:10 |
DHE | raising timeouts is probably a good idea | 15:10 |
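The 10-second timeout in the error reid_g quoted matches the proxy's `node_timeout` default. A minimal sketch of raising it in `proxy-server.conf` — the value 30 is illustrative, not a recommendation from the channel:

```ini
# proxy-server.conf (illustrative value, not a tested recommendation)
[app:proxy-server]
use = egg:swift#proxy
# How long the proxy waits on each storage node for a chunk or for the
# final status of a PUT. Default is 10 seconds.
node_timeout = 30
```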
reid_g | Is/was it possible for just the SLO manifest to get written and none of the segments made it? | 17:35 |
reid_g | What is "512 MB limit of dirty data by default"? | 18:19 |
DHE | [app:object-server] mb_per_sync = 512 | 18:56 |
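For context on the option DHE quotes: `mb_per_sync` tells the object server to sync dirty data to disk incrementally during a PUT rather than in one large flush at the end. A sketch of where it lives, using the default value:

```ini
# object-server.conf
[app:object-server]
use = egg:swift#object
# During an object PUT, flush the file to disk every N megabytes so
# dirty pages are written out incrementally. 512 is the default.
mb_per_sync = 512
```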
opendevreview | Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 | 19:14 |
reid_g | Not sure if it is only our version of Swift (2.2.0 Juno), but if you try to GET an object where only the SLO manifest exists, the proxy will return no data (404 exists in logs for the fragment). | 19:23 |
zaitcev | I see timburke is not online today. | 21:12 |
acoles | zaitcev: timburke is out this week, IIRC he cancelled the meeting for this week and next (thanksgiving) | 21:15 |
opendevreview | Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 | 21:58 |
opendevreview | Clay Gerrard proposed openstack/swift master: DNM: playing with ssync EAGAIN https://review.opendev.org/c/openstack/swift/+/818296 | 22:02 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!