*** aagrawal has joined #openstack-swift | 00:22 | |
*** abhinavtechie has quit IRC | 00:22 | |
*** abhinavtechie has joined #openstack-swift | 00:27 | |
*** aagrawal has quit IRC | 00:31 | |
*** tovin07__ has joined #openstack-swift | 00:43 | |
*** gyee has quit IRC | 00:56 | |
*** JimCheung has quit IRC | 01:08 | |
*** psachin has joined #openstack-swift | 01:10 | |
kota_ | good morning | 01:21 |
*** mat128 has joined #openstack-swift | 01:21 | |
kota_ | notmyname: great news | 01:23 |
*** cshastri has joined #openstack-swift | 01:26 | |
timburke | kota_: o/ | 01:26 |
kota_ | timburke: o/ | 01:26 |
acoles | kota_: good morning | 01:27 |
kota_ | acoles: morning, oh you're at SFO? | 01:27 |
acoles | kota_: yes for this week | 01:28 |
kota_ | makes sense | 01:28 |
*** mvk has quit IRC | 01:57 | |
*** mvk has joined #openstack-swift | 01:59 | |
openstackgerrit | Merged openstack/swift feature/deep: Merge remote-tracking branch 'origin/master' into feature/deep https://review.openstack.org/525269 | 02:52 |
mattoliverau | acoles: o/ | 02:57 |
mattoliverau | kota_: o/ | 02:57 |
*** links has joined #openstack-swift | 03:07 | |
*** threestrands has joined #openstack-swift | 03:27 | |
kota_ | mattoliverau: the symlink patch is making progress. I'm almost done with the functest cleanup and am waiting on m_kazuhiro's review. I think it will be squashed into the main patch in 1-2 days. | 03:32 |
kota_ | mattoliverau: and it isn't showing any significant issues, so you can continue reviewing the symlink patch, except for the func tests, imo. | 03:33 |
*** kei_yama has quit IRC | 03:49 | |
*** abhinavtechie has quit IRC | 03:53 | |
mattoliverau | kota_: great, thanks :) | 03:55 |
*** kei_yama has joined #openstack-swift | 04:01 | |
*** psachin has quit IRC | 04:15 | |
*** abhitechie has joined #openstack-swift | 04:20 | |
*** chsc has joined #openstack-swift | 04:28 | |
*** psachin has joined #openstack-swift | 04:30 | |
*** psachin has quit IRC | 04:32 | |
*** chsc has quit IRC | 04:35 | |
*** ianychoi has joined #openstack-swift | 05:07 | |
*** psachin has joined #openstack-swift | 05:10 | |
*** klrmn has quit IRC | 05:21 | |
*** threestrands has quit IRC | 05:25 | |
*** two_tired has quit IRC | 05:41 | |
*** mat128 has quit IRC | 05:47 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: Add a -L or --list to recon to list all results https://review.openstack.org/525039 | 05:49 |
mattoliverau | ^^ just a few fixes. Just needs some tests for the new helper methods and maybe some -L print tests. | 05:49 |
*** vinsh_ has quit IRC | 06:39 | |
*** armaan has joined #openstack-swift | 06:40 | |
*** armaan has quit IRC | 06:54 | |
*** vinsh has joined #openstack-swift | 07:11 | |
*** pcaruana has joined #openstack-swift | 07:32 | |
*** hseipp has joined #openstack-swift | 07:44 | |
*** rcernin has quit IRC | 07:48 | |
*** neonpastor has quit IRC | 08:00 | |
*** neonpastor has joined #openstack-swift | 08:02 | |
openstackgerrit | Van Hung Pham proposed openstack/swift master: Replace assertTrue(isinstance()) with assertIsInstance() https://review.openstack.org/475639 | 08:17 |
*** armaan has joined #openstack-swift | 08:22 | |
*** tesseract has joined #openstack-swift | 08:24 | |
*** rcernin has joined #openstack-swift | 08:33 | |
*** gkadam has joined #openstack-swift | 08:38 | |
*** pcaruana has quit IRC | 08:40 | |
*** geaaru has joined #openstack-swift | 08:52 | |
*** cbartz has joined #openstack-swift | 09:07 | |
*** armaan has quit IRC | 09:10 | |
*** armaan has joined #openstack-swift | 09:20 | |
*** gleblanc has quit IRC | 09:35 | |
*** abhitechie has quit IRC | 09:54 | |
*** mvk has quit IRC | 09:54 | |
*** amito has joined #openstack-swift | 10:05 | |
amito | Hi, our cinder CI has been failing for the last couple of days. I looked in the logs and it seems glance is failing consistently on "BackendException: Cannot find swift service endpoint : The request you have made requires authentication. (HTTP 401)". Any idea? | 10:05 |
*** HCLTech-SSW has joined #openstack-swift | 10:09 | |
*** armaan has quit IRC | 10:09 | |
*** mvk has joined #openstack-swift | 10:22 | |
*** kei_yama has quit IRC | 10:26 | |
*** rcernin has quit IRC | 10:30 | |
*** tovin07__ has quit IRC | 10:30 | |
*** ianychoi has quit IRC | 10:55 | |
*** ianychoi has joined #openstack-swift | 10:55 | |
*** SkyRocknRoll has joined #openstack-swift | 10:57 | |
*** HCLTech-SSW has quit IRC | 11:03 | |
*** cshastri has quit IRC | 11:05 | |
*** kukacz has joined #openstack-swift | 11:22 | |
*** cshastri has joined #openstack-swift | 11:47 | |
*** silor has joined #openstack-swift | 11:50 | |
openstackgerrit | Kazuhiro MIYAHARA proposed openstack/swift master: Cleanup Symlink Functional Tests https://review.openstack.org/524203 | 11:59 |
*** armaan has joined #openstack-swift | 12:03 | |
*** cshastri has quit IRC | 12:03 | |
*** ^andrea^ has quit IRC | 12:07 | |
*** oshritf has quit IRC | 12:15 | |
*** zhurong has joined #openstack-swift | 12:45 | |
*** zhurong has quit IRC | 13:02 | |
*** zhurong has joined #openstack-swift | 13:03 | |
*** cshastri has joined #openstack-swift | 13:06 | |
*** links has quit IRC | 13:22 | |
*** SkyRocknRoll has quit IRC | 13:23 | |
*** zhurong has quit IRC | 13:27 | |
*** cshastri has quit IRC | 13:27 | |
*** psachin has quit IRC | 13:33 | |
*** silor1 has joined #openstack-swift | 13:47 | |
*** silor has quit IRC | 13:47 | |
*** silor1 is now known as silor | 13:48 | |
*** mat128 has joined #openstack-swift | 14:03 | |
*** geaaru has quit IRC | 14:24 | |
*** armaan has quit IRC | 14:29 | |
*** armaan has joined #openstack-swift | 14:31 | |
*** _ix has joined #openstack-swift | 14:42 | |
_ix | Good morning folks. I've got a Mitaka cluster with some eight nodes running, and it's largely been pretty great. | 14:43 |
*** geaaru has joined #openstack-swift | 14:43 | |
_ix | Unfortunately, we had some hardware issues that made us reconfigure the rings, and a colleague made the silly mistake of taking the services down without adjusting the rings for about 6 weeks. We ended up reformatting the drives on the problem node and re-introducing it to the cluster. | 14:44 |
_ix | Replication took about 5 days, after some adjustments to the rsync configuration, several re-adjustments to the rings, and a lot of anxiety; it completed on Sunday night. | 14:45 |
_ix | Anyway, the outstanding issues appear to be related to that six week period of the problem node being offline. The majority of the cluster at normal weights is sitting at some 50-60% disk utilization, while the problem node is above 90% across its disks. | 14:47 |
_ix | There are a number of objects with X-Delete-At values set at 1504*, but there don't appear to be any matching containers in the .expiring_objects account that relate to those timestamps. Essentially, swift appears to be unaware of these files. | 14:48 |
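A quick way to sanity-check that observation is to compute which .expiring_objects container the proxy would have queued each delete in. The sketch below is an editorial illustration that assumes the default expiring_objects_container_divisor of 86400 and the usual queue-entry naming; if a HEAD on the computed container in the .expiring_objects account 404s, the expirer really has no record of the object.

```python
# Sketch: for a given X-Delete-At, work out the .expiring_objects container
# the proxy would have used (assumes the default divisor of 86400 and the
# "<delete-at>-<account>/<container>/<object>" queue-entry naming).
DIVISOR = 86400  # assumed default expiring_objects_container_divisor

def expirer_container(x_delete_at, divisor=DIVISOR):
    # container names are the delete-at time rounded down to the divisor,
    # zero-padded to 10 digits
    return '%010d' % (int(x_delete_at) // divisor * divisor)

def expirer_entry(x_delete_at, account, container, obj):
    return '%d-%s/%s/%s' % (int(x_delete_at), account, container, obj)

print(expirer_container(1504000000))  # -> '1503964800'
```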
_ix | My question is: if I understand this correctly and there are a number of .data files lingering on my problem node, is there a way to ensure these files are removed in a cleanup (I guess the Mitaka auditor doesn't take care of this)? | 14:50 |
rledisez | _ix: what could have happened is that the objects were deleted while your node was offline, and the tombstones were reclaimed before your node came back online, so they are now "dark data", they should not be here anymore, but swift can't know it must delete them | 14:50 |
_ix | rledisez: So, I'm on the right track? | 14:51 |
_ix | Are you aware of any safe ways to remove the dark data outside of swift? | 14:52 |
tdasilva | but if the node that was down for 6 weeks had the drives reformatted, how would you get dark data there? | 14:56 |
_ix | tdasilva: That's another question that I had. | 14:56 |
_ix | We had some serious problems getting the balance correct when we re-introduced this node with empty disks, so I asked my colleagues to drop objects they no longer needed to reduce the replication time. My assumption is that some of that data was tombstoned and the reclaim age ran out before we achieved stability. | 14:58 |
_ix | That's merely conjecture, though. I'll double check the reclaim age and see if it's any less than the default. | 14:59 |
_ix | I'm assuming the default was 7 days in Mitaka, too. If that's the case, this value hasn't been explicitly set, and it should be 7 days. | 15:01 |
_ix | I am interested in the *why* of these circumstances, but I'm also interested in the *how*, that is, how do I clean up after our mistakes? | 15:03 |
_ix | So object manifests/segments with X-Delete-At values but no associated .expiring_objects container means they're dark data. How do we clean up dark data? | 15:04 |
*** silor has quit IRC | 15:14 | |
*** silor has joined #openstack-swift | 15:14 | |
*** klrmn has joined #openstack-swift | 15:14 | |
*** klrmn has quit IRC | 15:16 | |
_ix | From my reading of others' experiences, this doesn't appear to be a very easy question to answer. | 15:17 |
tdasilva | _ix: clayg might be a good person to ask about dark data, but he is in PST time zone | 15:21 |
_ix | tdasilva: Thanks! I'm actually reading over some conversations on eavesdrop and that sounds like a safe bet. Do you happen to know what time zone redbo is in? | 15:23 |
tdasilva | redbo is in CST i believe | 15:26 |
*** armaan has quit IRC | 15:30 | |
*** armaan has joined #openstack-swift | 15:30 | |
*** openstackgerrit has quit IRC | 15:48 | |
*** klrmn has joined #openstack-swift | 16:00 | |
*** armaan has quit IRC | 16:04 | |
*** oshritf has joined #openstack-swift | 16:38 | |
*** mvk has quit IRC | 16:42 | |
*** oshritf has quit IRC | 16:47 | |
*** abhitechie has joined #openstack-swift | 16:47 | |
*** chsc has joined #openstack-swift | 16:50 | |
frankkahle | i have done another install of openstack-swift. I have made sure all of the requirements are met and i have compiled 1.5 of liberasurecode. I assume that i should see only .... (dots) and no "E" when i run the unit tests, correct? | 17:03 |
*** kallenp has joined #openstack-swift | 17:09 | |
notmyname | good morning | 17:13 |
notmyname | frankkahle: correct | 17:13 |
*** cbartz has quit IRC | 17:13 | |
*** mvk has joined #openstack-swift | 17:14 | |
frankkahle | ok so I got an E, and hit control-C to abort it and saw the error, "ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware)" , how do i debug this? | 17:15 |
*** kallenp has left #openstack-swift | 17:19 | |
*** pcaruana has joined #openstack-swift | 17:22 | |
notmyname | frankkahle: unit tests should all pass on any server where you install swift, but the functests are what's really interesting/important for you to validate a production cluster. | 17:24 |
*** hseipp has quit IRC | 17:25 | |
notmyname | the module path in the error is the python import path for it | 17:25 |
notmyname | eg https://github.com/openstack/swift/blob/master/test/unit/common/middleware/test_memcache.py#L99 | 17:25 |
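For instance, assuming the working directory is the root of the swift repo and the test requirements are installed, that single test can be loaded by its import path with the stdlib runner (an illustrative invocation, not the project's canonical one):

```python
# run the one failing test by its python import path
import unittest

suite = unittest.defaultTestLoader.loadTestsFromName(
    'test.unit.common.middleware.test_memcache.'
    'TestCacheMiddleware.test_real_config')
unittest.TextTestRunner(verbosity=2).run(suite)
```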
*** pcaruana has quit IRC | 17:26 | |
*** pcaruana has joined #openstack-swift | 17:27 | |
*** ukaynar has joined #openstack-swift | 17:30 | |
*** gkadam has quit IRC | 17:30 | |
_ix | clayg: Are you around this morning? | 17:31 |
timburke | don't see him at his desk yet, but iirc he's planning on coming in | 17:32 |
_ix | Thanks, Tim. I hope he's in the mood to talk about dark data. | 17:38 |
_ix | I feel like one must whisper when saying... dark data. | 17:38 |
*** pcaruana has quit IRC | 17:58 | |
*** itlinux has joined #openstack-swift | 17:58 | |
clayg | I’m on the train. I’m not sure what to do about expired manifests and unexpired segments. It’s not really dark data if it’s in the container listing. It’s just orphaned segments. | 18:03 |
_ix | clayg: Thanks for weighing in. That gives me something more to consider. | 18:09 |
*** oshritf has joined #openstack-swift | 18:13 | |
*** oshritf has quit IRC | 18:15 | |
*** silor has quit IRC | 18:16 | |
*** tesseract has quit IRC | 18:21 | |
_ix | I've looked again, and the data is not in the container listings. If it were, I suppose it would be trivial to delete. | 18:26 |
_ix | So, does that in turn qualify this as dark data? | 18:27 |
*** oshritf has joined #openstack-swift | 18:31 | |
_ix | The data is in a state that we coined... | 18:33 |
_ix | expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly state. | 18:33 |
*** dcourtoi has quit IRC | 18:38 | |
*** openstackgerrit has joined #openstack-swift | 18:39 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: Refactor proxy-server conf loading to a utils function https://review.openstack.org/525728 | 18:39 |
acoles | I'd almost forgotten how to do an upstream patch! :) | 18:39 |
*** oshritf has quit IRC | 18:42 | |
*** dcourtoi has joined #openstack-swift | 18:53 | |
*** armaan has joined #openstack-swift | 18:55 | |
*** armaan has quit IRC | 19:02 | |
*** armaan has joined #openstack-swift | 19:03 | |
frankkahle | i'm about ready to give up....I have now built 6 vms in total, ranging from ubuntu 14, 16, 17 and multiple centos versions, based on the instructions here (https://docs.openstack.org/swift/latest/development_saio.html#common-dev-section), carefully done by the book. built my own version of liberasurecode from git, and yet i still cannot get the unit tests to run.... | 19:15 |
tdasilva | frankkahle: what's the unit test error you are seeing? may I also suggest trying one of these: https://docs.openstack.org/swift/latest/associated_projects.html#developer-tools ? | 19:17 |
tdasilva | frankkahle: i maintain this: https://github.com/thiagodasilva/ansible-saio and it works pretty well for me... although i'll be honest and say that rhel7.4 has a regression and some unit tests will fail, but if you run with a rhel7.3 image it should work fine | 19:19 |
frankkahle | well i started the unit tests and got a bunch of dots, then saw some 'E's and hit control-c and saw this error (ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware) | 19:19 |
tdasilva | what's the error? | 19:21 |
* tdasilva wonders if it has to do with the tmpdir issue | 19:21 | |
frankkahle | oh maybe something to do with the cryptography not being a high enough version??? | 19:21 |
*** klrmn has quit IRC | 19:22 | |
tdasilva | frankkahle: let the unit tests run and post the errors to http://paste.openstack.org/ | 19:22 |
frankkahle | this is the bottom of the error... ContextualVersionConflict: (cryptography 1.2.3 (/usr/lib/python2.7/dist-packages), Requirement.parse('cryptography!=2.0,>=1.6'), set(['swift'])) | 19:23 |
frankkahle | should i upgrade that somehow? | 19:24 |
acoles | frankkahle: have you tried running the tests using `tox -e py27 -r` | 19:24 |
frankkahle | and what is tox? | 19:25 |
frankkahle | BTW running on ubuntu 16.04.3 LTS | 19:26 |
acoles | frankkahle: https://docs.openstack.org/swift/latest/development_guidelines.html | 19:26 |
frankkahle | hmm, lol, says it cannot find tox.ini | 19:29 |
acoles | cd to the root dir of your swift repo | 19:31 |
frankkahle | lol, ok that is running | 19:31 |
*** klrmn has joined #openstack-swift | 19:34 | |
*** joeljwright has joined #openstack-swift | 19:36 | |
*** ChanServ sets mode: +v joeljwright | 19:36 | |
_ix | clayg: any additional thoughts on expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly files? | 19:37 |
clayg | _ix: sounds like my original classification may have been incorrect | 19:38 |
frankkahle | it's still running. question: should the unit tests be run as a sudo user? | 19:39 |
clayg | an object .data file on-disk (expired or otherwise) that does have a row in it's containers listing is exactly the definition of "dark data" | 19:39 |
clayg | an expired but not yet reaped object would have a row in it's container until the expirer deletes it. so it's probably somewhat inconsequential that the dark data you're finding is expired... | 19:40 |
acoles | clayg: did you mean to type 'done NOT have a row' above? | 19:43 |
clayg | _ix: dark data happens when somehow an object .data file persists on a non-primary node longer than the configured reclaim_age (i.e. if you disconnect a node from the primaries for some time, issue a DELETE for some data, wait for the tombstones to get reclaimed, then somehow reconnect the orphaned data to the rest of the cluster - which results in the orphaned stale data being repaired in the object tier w/o | 19:44 |
clayg | any record of the earlier, now reclaimed, tombstone/DELETE) | 19:44 |
*** klrmn has quit IRC | 19:44 | |
acoles | frankkahle: you shouldn't need to sudo the unit tests | 19:46 |
clayg | acoles: i'm trying to read it again, I think I meant what I said... an expired .data file (i.e. a .data that exists after the x-delete-at metadata) WOULD have a row in the container - until the object-expirer reaps it ... unless it's dark data | 19:46 |
acoles | clayg: yeah, but the line before 'an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data"' | 19:47 |
frankkahle | i had to sudo to get the tox command running... and it's showing a lot of OKs so far | 19:47 |
clayg | acoles: yup thank you | 19:48 |
*** klrmn has joined #openstack-swift | 19:48 | |
clayg | _ix: an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data" | 19:48 |
acoles | clayg: teamwork! | 19:49 |
clayg | anyway, the fix is basically to just issue a DELETE request through the proxy for any objects you find that need to be deleted - if the containers are deleted you might need to recreate them to get the storage-policy correct, or use a script that will let you set x-backend-storage-policy-override | 19:49 |
clayg | enumeration of dark data is the hardest part | 19:50 |
clayg | i don't have a great example of either... | 19:52 |
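As a rough illustration of the "issue a DELETE through the proxy" step, here is a minimal python-swiftclient sketch; the auth settings and the dark_data_names.txt input are placeholders, and recreating deleted containers with the correct storage policy (or setting x-backend-storage-policy-override) is not covered here:

```python
# Sketch: DELETE a list of known dark-data objects through the proxy.
# Auth URL, credentials and the input file are placeholders.
from swiftclient import client as swift_client
from swiftclient.exceptions import ClientException

conn = swift_client.Connection(
    authurl='http://keystone.example.com:5000/v2.0',  # placeholder
    user='tenant:user', key='secret', auth_version='2')

with open('dark_data_names.txt') as fp:  # one "container/object" per line
    for line in fp:
        container, obj = line.strip().split('/', 1)
        try:
            conn.delete_object(container, obj)
        except ClientException as err:
            if err.http_status == 404:
                # container may have been deleted; it would need to be
                # recreated (with the right policy) before the DELETE lands
                print('404 for %s/%s' % (container, obj))
            else:
                raise
```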
*** joeljwright has quit IRC | 19:56 | |
*** joeljwright has joined #openstack-swift | 20:02 | |
*** ChanServ sets mode: +v joeljwright | 20:02 | |
*** chsc has quit IRC | 20:03 | |
*** gkadam has joined #openstack-swift | 20:05 | |
*** armaan has quit IRC | 20:07 | |
_ix | clayg: Thanks for the advice. | 20:08 |
clayg | one intensive audit I've done in the past involved getting a list of all names of all .data files on disk from object metadata (similar to swift-object-info) then doing container listings on all the accounts discovered and digging into any files on disk that didn't show up in the container listings... | 20:11 |
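A sketch of the first half of that audit: walk a device's objects directory and pull the swift name out of each .data file's xattr metadata, the same metadata swift-object-info reads. The read_metadata import is an assumption about where the helper lives in a Mitaka-era tree:

```python
# Sketch: collect the swift name of every .data file under a device's
# objects dir, for later cross-checking against container listings.
import os
from swift.obj.diskfile import read_metadata  # assumed Mitaka-era location

def ondisk_names(objects_dir):
    for root, _dirs, files in os.walk(objects_dir):
        for fname in files:
            if not fname.endswith('.data'):
                continue
            try:
                with open(os.path.join(root, fname), 'rb') as fp:
                    meta = read_metadata(fp)
            except (IOError, OSError):
                continue
            yield meta['name']  # "/account/container/object"

if __name__ == '__main__':
    for name in ondisk_names('/srv/node/sdb1/objects'):
        print(name)
```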
_ix | It seems like a few things would have to go wrong in order for dark data to be created, however... | 20:14 |
clayg | yes, generally - there are no known *open* bugs that cause dark data | 20:15 |
_ix | The sequence of events that you mentioned here is difficult to compare to our own. Indeed, the node was taken offline. But, before rejoining the cluster, the drives were wiped. | 20:15 |
_ix | clayg: Well, we're running Mitaka, but I think the bugs that we saw annotated were fixed as of... Newton or Ocata. | 20:16 |
clayg | that's a possibility... | 20:16 |
_ix | Like I said above, I read through some chat logs where an xfs bulkstat or similar was discussed with redbo. Does that ring any bells for you? | 20:18 |
clayg | afaik nothing ever came of that investigation - and people have made do just walking the object trees like the auditor already does | 20:18 |
clayg | auditor hooks that @torgomatic was working on might have been an option... | 20:18 |
redbo | Did you remove the node from the ring right away? Any handoffs that take longer than a week to clear have the same problem. | 20:20 |
_ix | No, and that's probably where the first major mistake was made. | 20:20 |
_ix | Another engineer just took it offline without making adjustments to the ring. | 20:21 |
_ix | It sat for some six weeks, with rsync logs stacking up with errors reaching that node. I'm trying to forget this blunder. | 20:22 |
_ix | And indeed, your week figure is pretty accurate. I think it took about six days to bring the node back into the cluster. | 20:23 |
redbo | So yeah, we bulkstat to get a list of all the objects that actually exist in the system, and then dump all container listings to get a list of everything that's in containers. Then cross-collate them to find out what objects aren't in listings. But it's not wrapped up all nice and pretty. | 20:24 |
redbo | Because we have too many objects to just put all of that data in a database and do a set comparison. | 20:26 |
redbo | I say we, I'm not working on that anymore. | 20:26 |
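One way to do that comparison without a database: dump both name lists to files, sort them externally (GNU sort copes with files much larger than memory), and stream a merge-style set difference. A minimal sketch, assuming one name per line in each pre-sorted file:

```python
# Sketch: stream two sorted name lists and yield on-disk names that have
# no row in any container listing (dark data candidates).
def missing_from_listings(ondisk_path, listings_path):
    with open(ondisk_path) as ondisk, open(listings_path) as listed:
        listed_iter = iter(listed)
        current = next(listed_iter, None)
        for line in ondisk:
            name = line.rstrip('\n')
            # advance the listings cursor until it catches up with this name
            while current is not None and current.rstrip('\n') < name:
                current = next(listed_iter, None)
            if current is None or current.rstrip('\n') != name:
                yield name

if __name__ == '__main__':
    for name in missing_from_listings('ondisk_sorted.txt',
                                      'listings_sorted.txt'):
        print(name)
```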
*** geaaru has quit IRC | 20:26 | |
_ix | redbo: I don't mind doing some work. I think the disk utilization pressure has been relieved after beginning to manage the ring definitions appropriately and rebalancing the cluster. | 20:27 |
_ix | Can I assume this is what we're talking about https://github.com/redbo/python-xfs ? | 20:28 |
_ix | I think we can even do without the node that's at issue... if we're patient and adjust the weights to 0 on all of the drives attached to this node, can I assume that eventually we'll bottom out at a point > 0, and wipe those disks once more? | 20:30 |
redbo | Isn't this a dark data thing? If you drop the weights of those drives, it'll just replicate that dark data out. | 20:32 |
* _ix thinks | 20:33 | |
_ix | I'm not sure how the replicator works. I assumed that only non-expired data gets replicated. | 20:34 |
redbo | It's not that smart. So if all of your dark data is expired, you're lucky and can probably clear it with a custom audit type thing. | 20:37 |
_ix | Can you say more about a custom audit type thing? | 20:38 |
_ix | The .data files we've come across so far in our investigations have all been expired. | 20:38 |
redbo | I don't know, just where the auditor pulls the metadata, you could check to see if it's expired and throw it away. | 20:38 |
redbo | Like clayg said, torgomatic worked on making pluggable auditor modules there, but I don't know what happened with that. | 20:39 |
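And a minimal sketch of the "check the metadata and throw it away if expired" idea; it only reports candidates rather than deleting anything, and uses the same assumed read_metadata helper as above:

```python
# Sketch: flag .data files whose X-Delete-At is already in the past.
# Report only; actual removal should go through DELETEs or a vetted script.
import os
import time
from swift.obj.diskfile import read_metadata  # assumed helper location

def expired_data_files(objects_dir, now=None):
    now = time.time() if now is None else now
    for root, _dirs, files in os.walk(objects_dir):
        for fname in files:
            if not fname.endswith('.data'):
                continue
            path = os.path.join(root, fname)
            with open(path, 'rb') as fp:
                meta = read_metadata(fp)
            delete_at = meta.get('X-Delete-At')
            if delete_at is not None and float(delete_at) < now:
                yield path, meta.get('name')

if __name__ == '__main__':
    for path, name in expired_data_files('/srv/node/sdb1/objects'):
        print('expired: %s (%s)' % (name, path))
```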
_ix | OK. Well, thanks very much for the discussion. I really appreciate your taking the time. | 20:41 |
redbo | I had to appear, my name was said 3 times. | 20:43 |
*** gyee has joined #openstack-swift | 20:44 | |
*** klrmn_ has joined #openstack-swift | 20:51 | |
openstackgerrit | John Dickinson proposed openstack/swift master: Added swift version to recon cli https://review.openstack.org/413991 | 21:18 |
*** mat128 has quit IRC | 21:34 | |
*** gkadam has quit IRC | 21:38 | |
*** itlinux has quit IRC | 21:39 | |
*** itlinux has joined #openstack-swift | 21:44 | |
*** joeljwright has quit IRC | 21:47 | |
*** nadeem_ has joined #openstack-swift | 21:53 | |
*** nadeem_ has quit IRC | 21:53 | |
openstackgerrit | Tim Burke proposed openstack/swift master: tempurl: Make the digest algorithm configurable https://review.openstack.org/525770 | 21:55 |
openstackgerrit | Tim Burke proposed openstack/swift master: tempurl: Deprecate sha1 signatures https://review.openstack.org/525771 | 21:55 |
*** flwang has quit IRC | 21:56 | |
*** flwang has joined #openstack-swift | 22:01 | |
*** threestrands has joined #openstack-swift | 22:05 | |
*** threestrands has quit IRC | 22:05 | |
*** threestrands has joined #openstack-swift | 22:05 | |
*** rcernin has joined #openstack-swift | 22:05 | |
mattoliverau | morning | 22:23 |
*** klrmn has quit IRC | 22:28 | |
*** klrmn_ has quit IRC | 23:07 | |
*** kei_yama has joined #openstack-swift | 23:22 | |
*** manous_ has joined #openstack-swift | 23:29 | |
*** _ix has quit IRC | 23:36 | |
*** manous_ has quit IRC | 23:40 |