Friday, 2018-06-15

timburkenotmyname: it will be if/when https://review.openstack.org/#/c/575568/ lands00:00
patchbotpatch 575568 - requirements - Blacklist eventlet 0.23.000:00
notmynameperfect!00:00
openstackgerritTim Burke proposed openstack/swift master: Only try to fetch or sync shard ranges if the remote supports sharding  https://review.openstack.org/57381600:00
notmynameI'll approve ours, and it won't do anything until the other one lands ;-)00:00
timburkei just got sick of needing to run down eventlet-caused test failures for people...00:01
openstackgerritTim Burke proposed openstack/swift master: Use path_qs instead of reinventing it  https://review.openstack.org/57557200:08
*** mikecmpbll has quit IRC00:09
*** germs has joined #openstack-swift00:45
*** germs has quit IRC00:45
*** germs has joined #openstack-swift00:45
*** germs has quit IRC00:50
*** two_tired has joined #openstack-swift01:06
*** gyee has quit IRC01:09
*** lifeless_ has quit IRC01:13
*** lifeless has joined #openstack-swift01:13
*** lifeless_ has joined #openstack-swift01:24
*** lifeless has quit IRC01:25
mattoliveraumorning, later start on the interweb this morning, as the wife had a check up01:30
*** lifeless has joined #openstack-swift01:40
*** lifeless_ has quit IRC01:41
*** lifeless has quit IRC01:54
*** lifeless has joined #openstack-swift02:00
*** two_tired has quit IRC02:14
*** mwheckmann has joined #openstack-swift02:32
*** germs has joined #openstack-swift02:46
*** germs has quit IRC02:46
*** germs has joined #openstack-swift02:46
*** germs has quit IRC02:50
*** germs has joined #openstack-swift02:50
*** germs has quit IRC02:50
*** germs has joined #openstack-swift02:50
*** germs has quit IRC03:00
*** d0ugal_ has joined #openstack-swift03:17
*** d0ugal has quit IRC03:17
*** psachin has joined #openstack-swift03:28
*** mwheckmann has quit IRC03:48
openstackgerritMerged openstack/swift master: Experimental swift-ring-composer CLI to build composite rings  https://review.openstack.org/45150704:27
openstackgerritMerged openstack/swift master: Improve path handling in proxy_logging  https://review.openstack.org/56335404:33
*** germs has joined #openstack-swift04:37
*** germs has quit IRC04:37
*** germs has joined #openstack-swift04:37
*** germs has quit IRC04:41
*** links has joined #openstack-swift04:55
*** pcaruana has quit IRC05:18
openstackgerritMerged openstack/swift master: Use path_qs instead of reinventing it  https://review.openstack.org/57557205:41
*** cbartz has joined #openstack-swift05:49
*** cah_link has joined #openstack-swift05:49
*** silor has joined #openstack-swift06:11
*** silor has quit IRC06:36
*** gkadam has joined #openstack-swift06:38
*** pcaruana has joined #openstack-swift06:44
*** hseipp has joined #openstack-swift06:44
*** rcernin has quit IRC07:08
*** germs has joined #openstack-swift07:08
*** germs has quit IRC07:08
*** germs has joined #openstack-swift07:08
*** germs has quit IRC07:13
*** tesseract has joined #openstack-swift07:17
*** silor has joined #openstack-swift07:18
*** geaaru has joined #openstack-swift07:24
acolesgood morning07:36
acolestimburke: torgomatic_ zaitcev - thanks for clarifying re p 575511, I'd not appreciated the lack of support for 100 Continue headers and got too focussed on us just needing to eliminate the non-compliant bytes07:36
patchbothttps://review.openstack.org/#/c/575511/ - swift - WIP: PUT POST: Use 100 Continue response header to...07:36
*** ccamacho has joined #openstack-swift07:51
*** hseipp has quit IRC07:59
*** d0ugal_ has quit IRC08:04
*** d0ugal has joined #openstack-swift08:04
*** d0ugal has joined #openstack-swift08:04
*** mikecmpbll has joined #openstack-swift08:06
*** threestrands has quit IRC08:20
*** silor has quit IRC08:37
*** lifeless has quit IRC08:57
*** germs has joined #openstack-swift09:09
*** germs has quit IRC09:09
*** germs has joined #openstack-swift09:09
*** lifeless has joined #openstack-swift09:10
*** germs has quit IRC09:14
*** silor has joined #openstack-swift09:30
*** psachin has quit IRC09:38
*** lifeless has quit IRC09:38
*** lifeless has joined #openstack-swift09:45
*** rcernin has joined #openstack-swift10:06
*** cbartz has quit IRC10:09
*** silor1 has joined #openstack-swift10:09
*** silor has quit IRC10:11
*** silor1 is now known as silor10:11
*** sundbp has joined #openstack-swift10:13
sundbpI'm trying to understand what the expected failure mode is for the following: 9 node cluster, using 1 region with 3 zones with 3 servers each. replication factor = 3. 1 zone goes down. I seem to get some failures in this situation, but I expected to have no service disruption. If I remove the failed zone from the ring all is well. Trying to understand if I've misconfigured things, or if having to update the ring is expected?10:16
sundbp(it's not ALL failures, but some. i don't yet see a clear pattern to which requests work and which don't)10:16
*** lifeless_ has joined #openstack-swift10:22
*** lifeless has quit IRC10:23
DHEfailure as in undesirable HTTP response code (4xx or 5xx), or the proxy server hanging?10:28
DHEalso did you set write affinity in the configs? (there's a few places it can go)10:29
*** lifeless_ has quit IRC10:39
*** lifeless has joined #openstack-swift10:41
sundbpoddly i can't find 4xx or 5xx, but that's what I'm assuming10:45
sundbpno write affinity10:45
sundbponce i update ring to not have the failed zone (the 3 network partitioned servers) all is well.10:46
sundbpi do see connecttimeout errors in logs to those 3 hosts, but i expected that. i expected it to fail for those servers, but succeed on the other 2 servers in the 3x replication, and all would be well as >50%10:47
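The ">50%" reasoning above can be sketched as a small majority check. This is a minimal illustration, not Swift's own quorum code: the helper names `majority` and `write_succeeds` are hypothetical, and it assumes the ring places exactly one of the 3 replicas in each of the 3 zones.

```python
def majority(replicas: int) -> int:
    """Smallest number of successful responses that is more than 50%."""
    return replicas // 2 + 1


def write_succeeds(replicas: int, reachable: int) -> bool:
    """True when enough replica locations answered to form a majority."""
    return reachable >= majority(replicas)


replicas = 3
failed_zones = 1
# Assumption: one replica per zone, so each failed zone costs one replica.
reachable = replicas - failed_zones

print("needed:", majority(replicas))
print("reachable:", reachable)
print("write succeeds:", write_succeeds(replicas, reachable))
```

Under these assumptions a single failed zone still leaves a majority, which matches the expectation that requests should keep succeeding without a ring change.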
sundbpDHE: (forgot to prefix with your nick)10:53
*** lifeless_ has joined #openstack-swift10:55
sundbpDHE: i've got about 300 instances of an app working vs this swift cluster. i seemed to have intermittent failures for say 15 at a time. they'd come and go. they all do a heartbeat by writing a file to swift and reading it back as part of a test. some of those tests would fail intermittently. until I updated the ring, then everything solid.10:55
*** lifeless has quit IRC10:56
*** lifeless has joined #openstack-swift11:00
*** lifeless_ has quit IRC11:01
acolessundbp: what is the nature of the failures? failed PUT, failed GET, or unexpected content, or...?11:03
*** lifeless has quit IRC11:05
*** germs has joined #openstack-swift11:10
*** germs has quit IRC11:10
*** germs has joined #openstack-swift11:10
*** lifeless has joined #openstack-swift11:10
*** hseipp has joined #openstack-swift11:10
*** germs has quit IRC11:14
*** silor has quit IRC11:15
*** lifeless_ has joined #openstack-swift11:15
*** lifeless has quit IRC11:15
sundbpacoles: I'm trying to track that down. What I do know is that the heartbeat checks that upload a file and download it intermittently failed. but i'm really struggling to find any actual log content showing the failures. as updating the ring immediately removed ALL errors (that had been ongoing for 2h) it seems almost a given it was related to swift.11:24
*** cbartz has joined #openstack-swift11:26
sundbpthis is a chart showing health check. the first few red bars are the network partition causing multiple issues, it goes back to green when i fix non-swift related things, then the 2 red bars later are intermittent failures related to swift. i see such for all instances (not at same time, had about 5% at any given time in that period). once i removed the 3 partitioned hosts from the ring around 08:00 all fails went away.11:31
sundbpand i can see that the response times went back down to normal'ish for this health test.11:32
sundbptypical log swift-side from heartbeat: https://gist.github.com/sundbp/10c6a8cfd69436b2d5bf6bf1b5ac0dc311:34
sundbplooks alright to me - the ERRORs are connections to the hosts that are down (network partitioned). but it goes ahead and the PUT/GET is 2xx.11:34
acolessundbp: do you have more than one proxy server?11:40
sundbpacoles: i have 3, 1 which was partitioned, so 2 were running at the time.11:42
sundbpacoles: (the logs are very similar for both)11:42
acolesI'm just wondering if your client requests are hitting proxies on either side of network partition, so a PUT goes to one side then the GET for same object hits other side and fails, or gets stale data11:43
sundbpacoles: good theory, but unlikely. once those health checks started going green was because i completely disabled the partitioned DC and had moved/restarted everything to not depend on anything there.11:45
sundbpacoles: so the 2 remaining DCs were in fine connectivity with each other, and neither had any contact with the 3rd (a broken circuit card of some sort in that DC).11:45
acolesoh, so those logs are for the healthy system?11:46
sundbpthe only thing i can see in logs now after digging around a lot is client app side i see some read timeouts (intermittent) and in swift logs I see "proxy-server: Client disconnected on read (txn: tx77877c4ef47d4757a0d44-005b234be2)"11:46
sundbpyep, healthy side.11:47
sundbpi can't correlate exactly but it seems to me the 2 logs agree, i.e. they refer to same timeout.11:48
sundbpi think that's the source of the intermittent fails. question is why there were some timeouts..11:48
*** mikecmpbll has quit IRC11:50
sundbpand why did these timeouts go away once I updated ring..11:51
acolessundbp: I have to leave now, sorry I couldn't help more11:52
sundbpacoles: thanks for trying!11:52
*** hseipp has quit IRC12:10
openstackgerritAlistair Coles proposed openstack/swift master: WIP: PUT POST: simplify object server POST handler  https://review.openstack.org/57551212:13
*** mwheckmann has joined #openstack-swift12:29
*** kei_yama has quit IRC12:37
*** lifeless_ has quit IRC12:46
*** rcernin has quit IRC13:02
*** germs has joined #openstack-swift13:11
*** germs has quit IRC13:11
*** germs has joined #openstack-swift13:11
*** germs has quit IRC13:15
*** links has quit IRC13:27
*** cbartz has quit IRC14:15
*** mvenesio has joined #openstack-swift14:22
*** germs has joined #openstack-swift14:23
*** germs has joined #openstack-swift14:23
*** cah_link has quit IRC14:26
*** mrjk__ has quit IRC14:41
*** mrjk has joined #openstack-swift14:42
*** pcaruana has quit IRC15:01
sundbpacoles: think i worked out at least 1 part of mystery. proxy-server set to use a 10s connect timeout, and then the client had 10s read timeout set, meaning we had a race between the client timing out and getting a successful outcome. hence no errors in swift logs, just client disconnects, and same on the client side: timeouts but no error results.15:21
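The timeout race described above can be sketched with a deliberately simplified timing model. Everything here is an assumption for illustration: the function names are hypothetical, the proxy is assumed to wait out its full connect timeout on each unreachable node before trying the next one, and the 0.05s healthy latency is an invented figure.

```python
def response_time(connect_timeout: float, dead_nodes_tried: int,
                  healthy_latency: float) -> float:
    """Time until the proxy answers, under the simplified model."""
    return connect_timeout * dead_nodes_tried + healthy_latency


def client_times_out(client_read_timeout: float, connect_timeout: float,
                     dead_nodes_tried: int, healthy_latency: float) -> bool:
    """True when the client gives up before the proxy can answer."""
    elapsed = response_time(connect_timeout, dead_nodes_tried, healthy_latency)
    return elapsed >= client_read_timeout


# Dead node still in the ring: the proxy burns 10s before reaching a live node.
print(client_times_out(10.0, 10.0, dead_nodes_tried=1, healthy_latency=0.05))

# Dead nodes removed from the ring: no connect timeout is ever waited on.
print(client_times_out(10.0, 10.0, dead_nodes_tried=0, healthy_latency=0.05))
```

In this model the request only fails when a dead node happens to be tried first, which would be consistent with the failures being intermittent and with them disappearing once the ring was updated.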
*** armaan has joined #openstack-swift15:48
*** gyee has joined #openstack-swift15:52
notmynamegood morning15:59
*** armaan has quit IRC16:23
timburkegood morning16:33
*** gkadam has quit IRC16:43
*** tesseract has quit IRC17:01
*** germs has quit IRC17:24
*** germs has joined #openstack-swift17:24
*** germs has quit IRC17:24
*** germs has joined #openstack-swift17:24
*** ccamacho has quit IRC17:24
*** germs has quit IRC17:24
openstackgerritTim Burke proposed openstack/swift master: func tests: Rename storage_url to storage_path  https://review.openstack.org/57490017:32
openstackgerritTim Burke proposed openstack/swift master: Tighten up staticweb redirect test  https://review.openstack.org/57490117:32
timburkehow old of a python-swiftclient do we reasonably want to support when running swift's func tests?17:35
timburkehttps://github.com/openstack/requirements/blob/master/lower-constraints.txt currently lists 3.2.0, which is about a year and a half old... i feel like we could probably go back further if we wanted, though...17:37
openstackgerritTim Burke proposed openstack/swift master: Blacklist eventlet 0.23.0  https://review.openstack.org/57556917:43
openstackgerritTim Burke proposed openstack/swift master: Set lower bounds on all requirements and test-requirements  https://review.openstack.org/57580617:43
*** gkadam has joined #openstack-swift17:44
*** geaaru has quit IRC17:52
notmynametimburke: referencing the discussion we had in dublin, we talked about having a 2 year support window. I'm not sure if that needs to apply here, but it's a good start, IMO17:54
notmyname(yes, that discussion was more about swift, not the client, but as a general guide, I don't think it's bad)17:54
openstackgerritTim Burke proposed openstack/swift master: Add debugging info to SignatureDoesNotMatch responses  https://review.openstack.org/57580817:59
*** gkadam_ has joined #openstack-swift18:33
openstackgerritSamuel Merritt proposed openstack/swift master: Fix socket leak on object-server death  https://review.openstack.org/57525418:35
*** gkadam has quit IRC18:37
openstackgerritTim Burke proposed openstack/swift master: Support long-running multipart uploads  https://review.openstack.org/57581818:38
*** cah_link has joined #openstack-swift18:54
timburke:-( i forgot to check the s3api func tests when i did https://review.openstack.org/#/c/570604/18:55
patchbotpatch 570604 - swift - Fix up insecure behavior for functional tests (MERGED)18:55
openstackgerritTim Burke proposed openstack/swift master: Properly handle custom metadata upon an object COPY operation  https://review.openstack.org/57582419:16
*** brimestoned has joined #openstack-swift19:29
*** brimestoned has left #openstack-swift19:29
*** lifeless has joined #openstack-swift19:34
*** mvenesio has quit IRC19:36
openstackgerritTim Burke proposed openstack/swift master: Change PUT bucket conflict error  https://review.openstack.org/57582919:51
openstackgerritTim Burke proposed openstack/swift master: s3api: Stop debug-logging the entire wsgi environment  https://review.openstack.org/57583420:02
*** lifeless has quit IRC20:02
*** lifeless has joined #openstack-swift20:03
openstackgerritTim Burke proposed openstack/swift master: Give better errors for malformed credentials  https://review.openstack.org/57583620:15
openstackgerritTim Burke proposed openstack/swift master: Listing of versioned objects when versioning is not enabled  https://review.openstack.org/57583820:24
*** geaaru has joined #openstack-swift20:28
*** _david_-_ has joined #openstack-swift20:30
openstackgerritTim Burke proposed openstack/swift master: Fix the deletion of non-existent keys  https://review.openstack.org/57584220:30
_david_-_I'm having difficulty figuring out why one node in a 9 node cluster has multiple disks that are 10% above the average disk fullness.  The disk capacity looks good for the weight set in the object ring. Any ideas ?20:33
_david_-_swift-container-replicator is running. I have not changed weights in months, no disks added or deleted.20:36
notmynamecould be "unlucky" hashing where large objects get placed on the same drive20:37
zaitcevI'd look into the dark data, personally... Maybe that box has some history of reboots or running a disparate version, or ooms often.20:43
timburke_david_-_: you said the object ring looks right, but then were talking about the *container* replicator... do you know whether it's object or container data throwing things off?20:47
*** lifeless has quit IRC20:49
timburkei'd also wonder about what sorts of stats the replicators are emitting... it may well be that all the replication services are running fine on that node, but it isn't deleting handoff data because it has trouble talking to one of the primaries for it20:49
*** lifeless has joined #openstack-swift20:50
_david_-_timburke: I du'ed the containers folder in one of the disks, it's only 1.3G so it must be object data. We typically have large numbers of objects in a smaller number of containers.20:57
openstackgerritTim Burke proposed openstack/swift master: Support long-running multipart uploads  https://review.openstack.org/57581820:59
timburkethat's the way it usually goes :-)21:01
timburkei was also thinking about singular containers with a whole lot of objects, though, such that the container db itself is on the order of tens of GBs. sounds like that isn't your problem21:01
timburkewhat kind of stats are the object replicators emitting?21:01
*** gkadam_ has quit IRC21:05
_david_-_object-replicator: 51931/70612 (73.54%) partitions replicated in 1800.04s (28.85/sec, 10m remaining) object-replicator: 103836 successes, 180 failures object-replicator: 16411053 suffixes checked - 0.01% hashed, 0.00% synced object-replicator: Partition times: max 10.4714s, min 0.0096s, med 0.0125s21:09
_david_-_I think I have one or two disks in cluster unmounted at the moment, so that might explain the failures. The failure number seems to stick at 180 across multiple output cycles21:11
timburkeand even so, ~0.2% failure rate seems not bad... hmm...21:14
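The failure rate mentioned above can be checked from the replicator stats quoted earlier. This is a small worked calculation that assumes the rate is failures divided by successes plus failures; the variable names are illustrative only.

```python
# Figures copied from the object-replicator log line quoted in the channel.
successes = 103836
failures = 180
partitions_done = 51931
partitions_total = 70612

# Assumption: every attempt is counted as either a success or a failure.
failure_rate = failures / (successes + failures)
progress = partitions_done / partitions_total

print(f"failure rate: {failure_rate * 100:.2f}%")
print(f"partitions replicated: {progress * 100:.2f}%")
```

The result lands just under the "~0.2%" estimate, so the failures alone look too small to account for disks that are 10% fuller than average.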
_david_-_It feels almost like one or more drives were mounted on the wrong /srv/node mountpoints and then they were moved to the correct mount points. Even if that were the case I would assume the replicator would 'fix it'21:21
openstackgerritMerged openstack/swift master: Only try to fetch or sync shard ranges if the remote supports sharding  https://review.openstack.org/57381621:30
openstackgerritMerged openstack/swift master: Set lower bounds on all requirements and test-requirements  https://review.openstack.org/57580621:31
_david_-_another issue... I've just discovered that the x-account-object-count and x-account-bytes-used in the Account DBs are not being updated.  Has anyone heard of that issue before ?22:00
_david_-_x-account-container-count does seem to be being updated22:00
*** lifeless has quit IRC22:01
timburkeare the container-updaters running? they'd be in charge of reporting bytes/counts from the container dbs up to the account dbs22:03
*** sundbp has quit IRC22:03
timburkeand assuming they *are* running, are they more or less keeping up?22:03
openstackgerritTim Burke proposed openstack/swift master: Include '-' in multipart ETags  https://review.openstack.org/57586022:04
_david_-_timburke: hmm. the last time we restarted swift-container-updater across the ACO nodes was 1st Apr, which is when the stats stopped updating..22:15
_david_-_that was when we upgraded from newton to ocata22:17
timburkethat sounds suspicious... do you have any logs from the updaters? if they failed to restart, would there be something to try to bring them back up? does ps or top say that they're running?22:17
_david_-_timburke: ps shows them running, yes.  No logs in the day logs the past few days, both 'main' and 'error'22:17
timburke...which version from ocata? hmm... did we ever tag after backporting https://github.com/openstack/swift/commit/69c715c?22:18
timburkealso related: https://github.com/openstack/swift/commit/633998922:19
timburke'cause https://bugs.launchpad.net/swift/+bug/1722951 sounds *very much* like your problem...22:19
openstackLaunchpad bug 1722951 in OpenStack Object Storage (swift) "Container updater may be stuck and not make progress" [High,Fix released] - Assigned to Samuel Merritt (torgomatic)22:19
_david_-_in ubuntu's cloudarchive versioning,  I have 2.15.1-0ubuntu3~cloud022:20
_david_-_I have the change from https://github.com/openstack/swift/commit/69c715c?  but I don't have the code change from https://bugs.launchpad.net/swift/+bug/172295122:23
openstackLaunchpad bug 1722951 in OpenStack Object Storage (swift) "Container updater may be stuck and not make progress" [High,Fix released] - Assigned to Samuel Merritt (torgomatic)22:23
*** mwheckmann has quit IRC22:25
timburkenotmyname: we should think about tagging a 2.15.2 and 2.13.2 for pike and ocata, respectively...22:25
timburke_david_-_: you can fix it without a code change by ensuring the updaters (ideally, all swift processes) start with EVENTLET_HUB=poll or EVENTLET_HUB=selects in the environment22:29
_david_-_Thanks timburke . Would ' $ EVENTLET_HUB=poll swift-init restart <name> '  do the job ?22:32
timburkei believe so? i forget for sure22:35
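The workaround above relies on a child process inheriting EVENTLET_HUB from whatever launches it. The sketch below only demonstrates that inheritance with a throwaway Python child process; `env_with_hub` is a hypothetical helper, and nothing here invokes swift-init or confirms how it passes the environment on to the daemons it starts.

```python
import os
import subprocess
import sys


def env_with_hub(hub="poll", base=None):
    """Return a copy of the environment with EVENTLET_HUB set to `hub`."""
    env = dict(os.environ if base is None else base)
    env["EVENTLET_HUB"] = hub
    return env


# Stand-in child process: it just reports the variable it was started with.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['EVENTLET_HUB'])"],
    env=env_with_hub("poll"),
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

Whether a given service manager preserves the variable is worth verifying on the node itself, for example by inspecting the environment of the running updater process after the restart.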
notmynametimburke: have we landed many (any?) backports?22:40
timburkenotmyname: pike has four patches of interest, most notably the one that fixes _david_-_'s updater issue22:41
notmynameah22:42
timburkeocata has that same bug fix, though it's less important since it doesn't have the PipeMutex yet22:42
*** lifeless has joined #openstack-swift22:49
*** cah_link has quit IRC22:56
*** gyee has quit IRC23:24
openstackgerritSamuel Merritt proposed openstack/swift master: Rename test_except.py -> test_catch_errors.py  https://review.openstack.org/57587523:34
openstackgerritSamuel Merritt proposed openstack/swift master: Enforce Content-Length in catch_errors  https://review.openstack.org/57587623:34
openstackgerritTim Burke proposed openstack/swift master: Support long-running multipart uploads  https://review.openstack.org/57581823:58
