opendevreview | Timur Alperovich proposed openstack/swift master: Fix multipart upload listings https://review.opendev.org/c/openstack/swift/+/813715 | 00:21 |
---|---|---|
reid_g | Hey guys, what would be the path to diagnosing 'ERROR with Object server 10.40.100.92:6001/d1 re: Trying to get final status of PUT to' messages? We started receiving a lot of them with some increased traffic. | 13:42 |
DHE | timeouts? | 13:42 |
reid_g | Yeah Timeout (10.0s) | 13:42 |
reid_g | Are you saying to tweak timeouts? | 13:46 |
reid_g | Which have you changed? | 13:46 |
DHE | umm, none. 10 second timeout seems downright generous, unless you have some crazy fast networking and large objects? | 13:46 |
DHE | if it's consistently the same server and device (d1) then I'd look at this specific disk's health | 13:47 |
reid_g | We have 10G on that cluster but the items are probably small. | 14:05 |
reid_g | It's pretty much every server showing these errors. Should we be doing rolling restart of swift services occasionally? | 14:06 |
reid_g | 10s is the default for that Timeout btw | 14:11 |
DHE | I'm not a dev, but my guess would be large files uploaded very rapidly... 10G could upload a full size 5G object in 5 seconds at peak, but a spinning disk can't possibly flush the data within 10 seconds... depending on write cache sizes of course | 14:17 |
DHE | actually it's 512 MB limit of dirty data by default... | 14:18 |
DHE | that's my guess... busy cluster, large objects, dirty data sync-out... | 14:19 |
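DHE's timing argument above can be checked with back-of-the-envelope arithmetic. This is a sketch with assumed numbers: the 150 MB/s sustained write rate for a single spinning disk is an illustration, not a measurement from this cluster.

```python
# Sanity-check the "disk can't flush in time" hypothesis.
NIC_MB_PER_S = 10_000 / 8   # 10 Gb/s line rate ~= 1250 MB/s
OBJECT_MB = 5_000           # ~5 GB object (Swift's default max object size)
DISK_MB_PER_S = 150         # assumed sustained write rate of one spinning disk
NODE_TIMEOUT_S = 10         # the 10.0 s timeout reid_g reported

transfer_s = OBJECT_MB / NIC_MB_PER_S   # time to receive the object over the NIC
flush_s = OBJECT_MB / DISK_MB_PER_S     # time to sync it all to one disk

print(f"receive: {transfer_s:.1f}s, flush: {flush_s:.1f}s")  # ~4.0s vs ~33.3s
```

Under these assumptions the network delivers the object in about 4 seconds, while a single disk needs over 30 seconds to flush it, comfortably blowing past a 10-second timeout if the fsync happens all at once.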
reid_g | We are currently copying from an old hodgepodge cluster to this newer cluster. Fixed an issue on the old cluster and the GET 200 doubled and HEAD 200 x4. So I think the transfer sped up and is causing more strain on the new cluster. | 14:49 |
DHE | multiple simultaneous uploads at high speeds? yeah I'm thinking disks might just be getting too busy to fsync() in reasonable amounts of time... | 15:10 |
DHE | raising timeouts is probably a good idea | 15:10 |
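The 10-second timeout in the error reid_g quoted matches the proxy's `node_timeout` default. A minimal sketch of raising it in `proxy-server.conf` — the value 30 is illustrative, not a recommendation from the channel:

```ini
# proxy-server.conf (illustrative value, not a tested recommendation)
[app:proxy-server]
use = egg:swift#proxy
# How long the proxy waits on each storage node for a chunk or for the
# final status of a PUT. Default is 10 seconds.
node_timeout = 30
```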
reid_g | Is/was it possible for just the SLO manifest to get written and none of the segments made it? | 17:35 |
reid_g | What is "512 MB limit of dirty data by default"? | 18:19 |
DHE | [app:object-server] mb_per_sync = 512 | 18:56 |
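For context on the option DHE quotes: `mb_per_sync` tells the object server to sync dirty data to disk incrementally during a PUT rather than in one large flush at the end. A sketch of where it lives, using the default value:

```ini
# object-server.conf
[app:object-server]
use = egg:swift#object
# During an object PUT, flush the file to disk every N megabytes so
# dirty pages are written out incrementally. 512 is the default.
mb_per_sync = 512
```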
opendevreview | Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 | 19:14 |
reid_g | Not sure if it is only our version of Swift (2.2.0 Juno), but if you try to GET an object where only the SLO manifest exists, the proxy will return no data (404 exists in logs for the fragment). | 19:23 |
zaitcev | I see timburke is not online today. | 21:12 |
acoles | zaitcev: timburke is out this week, IIRC he cancelled the meeting for this week and next (thanksgiving) | 21:15 |
opendevreview | Clay Gerrard proposed openstack/swift master: wip: testing gate fix https://review.opendev.org/c/openstack/swift/+/818107 | 21:58 |
opendevreview | Clay Gerrard proposed openstack/swift master: DNM: playing with ssync EAGAIN https://review.opendev.org/c/openstack/swift/+/818296 | 22:02 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!