*** aagrawal has joined #openstack-swift | 00:22 | |
*** abhinavtechie has quit IRC | 00:22 | |
*** abhinavtechie has joined #openstack-swift | 00:27 | |
*** aagrawal has quit IRC | 00:31 | |
*** tovin07__ has joined #openstack-swift | 00:43 | |
*** gyee has quit IRC | 00:56 | |
*** JimCheung has quit IRC | 01:08 | |
*** psachin has joined #openstack-swift | 01:10 | |
kota_ | good morning | 01:21 |
*** mat128 has joined #openstack-swift | 01:21 | |
kota_ | notmyname: great news | 01:23 |
*** cshastri has joined #openstack-swift | 01:26 | |
timburke | kota_: o/ | 01:26 |
kota_ | timburke: o/ | 01:26 |
acoles | kota_: good morning | 01:27 |
kota_ | acoles: morning, oh you're at SFO? | 01:27 |
acoles | kota_: yes for this week | 01:28 |
kota_ | makes sense | 01:28 |
*** mvk has quit IRC | 01:57 | |
*** mvk has joined #openstack-swift | 01:59 | |
openstackgerrit | Merged openstack/swift feature/deep: Merge remote-tracking branch 'origin/master' into feature/deep https://review.openstack.org/525269 | 02:52 |
mattoliverau | acoles: o/ | 02:57 |
mattoliverau | kota_: o/ | 02:57 |
*** links has joined #openstack-swift | 03:07 | |
*** threestrands has joined #openstack-swift | 03:27 | |
kota_ | mattoliverau: the symlink patch is making progress. I'm almost done with the functest cleanup and am waiting on m_kazuhiro's review. I think it will be squashed into the main patch in 1-2 days. | 03:32 |
kota_ | mattoliverau: and it isn't showing any significant issues, so you can continue reviewing the symlink patch, except for the func tests, imo. | 03:33 |
*** kei_yama has quit IRC | 03:49 | |
*** abhinavtechie has quit IRC | 03:53 | |
mattoliverau | kota_: great, thanks :) | 03:55 |
*** kei_yama has joined #openstack-swift | 04:01 | |
*** psachin has quit IRC | 04:15 | |
*** abhitechie has joined #openstack-swift | 04:20 | |
*** chsc has joined #openstack-swift | 04:28 | |
*** psachin has joined #openstack-swift | 04:30 | |
*** psachin has quit IRC | 04:32 | |
*** chsc has quit IRC | 04:35 | |
*** ianychoi has joined #openstack-swift | 05:07 | |
*** psachin has joined #openstack-swift | 05:10 | |
*** klrmn has quit IRC | 05:21 | |
*** threestrands has quit IRC | 05:25 | |
*** two_tired has quit IRC | 05:41 | |
*** mat128 has quit IRC | 05:47 | |
openstackgerrit | Matthew Oliver proposed openstack/swift master: Add a -L or --list to recon to list all results https://review.openstack.org/525039 | 05:49 |
mattoliverau | ^^ just a few fixes. Just needs some tests for the new helper methods and maybe some -L print tests. | 05:49 |
*** vinsh_ has quit IRC | 06:39 | |
*** armaan has joined #openstack-swift | 06:40 | |
*** armaan has quit IRC | 06:54 | |
*** vinsh has joined #openstack-swift | 07:11 | |
*** pcaruana has joined #openstack-swift | 07:32 | |
*** hseipp has joined #openstack-swift | 07:44 | |
*** rcernin has quit IRC | 07:48 | |
*** neonpastor has quit IRC | 08:00 | |
*** neonpastor has joined #openstack-swift | 08:02 | |
openstackgerrit | Van Hung Pham proposed openstack/swift master: Replace assertTrue(isinstance()) with assertIsInstance() https://review.openstack.org/475639 | 08:17 |
*** armaan has joined #openstack-swift | 08:22 | |
*** tesseract has joined #openstack-swift | 08:24 | |
*** rcernin has joined #openstack-swift | 08:33 | |
*** gkadam has joined #openstack-swift | 08:38 | |
*** pcaruana has quit IRC | 08:40 | |
*** geaaru has joined #openstack-swift | 08:52 | |
*** cbartz has joined #openstack-swift | 09:07 | |
*** armaan has quit IRC | 09:10 | |
*** armaan has joined #openstack-swift | 09:20 | |
*** gleblanc has quit IRC | 09:35 | |
*** abhitechie has quit IRC | 09:54 | |
*** mvk has quit IRC | 09:54 | |
*** amito has joined #openstack-swift | 10:05 | |
amito | Hi, our cinder CI has been failing for the last couple of days. I looked in the logs and it seems glance is failing consistently on "BackendException: Cannot find swift service endpoint : The request you have made requires authentication. (HTTP 401)". Any idea? | 10:05 |
*** HCLTech-SSW has joined #openstack-swift | 10:09 | |
*** armaan has quit IRC | 10:09 | |
*** mvk has joined #openstack-swift | 10:22 | |
*** kei_yama has quit IRC | 10:26 | |
*** rcernin has quit IRC | 10:30 | |
*** tovin07__ has quit IRC | 10:30 | |
*** ianychoi has quit IRC | 10:55 | |
*** ianychoi has joined #openstack-swift | 10:55 | |
*** SkyRocknRoll has joined #openstack-swift | 10:57 | |
*** HCLTech-SSW has quit IRC | 11:03 | |
*** cshastri has quit IRC | 11:05 | |
*** kukacz has joined #openstack-swift | 11:22 | |
*** cshastri has joined #openstack-swift | 11:47 | |
*** silor has joined #openstack-swift | 11:50 | |
openstackgerrit | Kazuhiro MIYAHARA proposed openstack/swift master: Cleanup Symlink Functional Tests https://review.openstack.org/524203 | 11:59 |
*** armaan has joined #openstack-swift | 12:03 | |
*** cshastri has quit IRC | 12:03 | |
*** ^andrea^ has quit IRC | 12:07 | |
*** oshritf has quit IRC | 12:15 | |
*** zhurong has joined #openstack-swift | 12:45 | |
*** zhurong has quit IRC | 13:02 | |
*** zhurong has joined #openstack-swift | 13:03 | |
*** cshastri has joined #openstack-swift | 13:06 | |
*** links has quit IRC | 13:22 | |
*** SkyRocknRoll has quit IRC | 13:23 | |
*** zhurong has quit IRC | 13:27 | |
*** cshastri has quit IRC | 13:27 | |
*** psachin has quit IRC | 13:33 | |
*** silor1 has joined #openstack-swift | 13:47 | |
*** silor has quit IRC | 13:47 | |
*** silor1 is now known as silor | 13:48 | |
*** mat128 has joined #openstack-swift | 14:03 | |
*** geaaru has quit IRC | 14:24 | |
*** armaan has quit IRC | 14:29 | |
*** armaan has joined #openstack-swift | 14:31 | |
*** _ix has joined #openstack-swift | 14:42 | |
_ix | Good morning folks. I've got a Mitaka cluster with some eight nodes running, and it's largely been pretty great. | 14:43 |
*** geaaru has joined #openstack-swift | 14:43 | |
_ix | Unfortunately, we had some hardware issues that made us reconfigure the rings, and a colleague made the silly mistake of taking the services down without adjusting the rings for about 6 weeks. We ended up reformatting the drives on the problem node and re-introducing it to the cluster. | 14:44 |
_ix | Replication took about 5 days, after some adjustments to the rsync configuration, several re-adjustments to the rings, and a lot of anxiety; it completed on Sunday night. | 14:45 |
_ix | Anyway, the outstanding issues appear to be related to that six week period of the problem node being offline. The majority of the cluster at normal weights is sitting at some 50-60% disk utilization, while the problem node is above 90% across its disks. | 14:47 |
_ix | There are a number of objects with X-Delete-At values set at 1504*, but there don't appear to be any matching containers in the .expiring_objects account that relate to those timestamps. Essentially, swift appears to be unaware of these files. | 14:48 |
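A quick way to sanity-check that observation is to compute which .expiring_objects container the proxy would have queued each delete in. The sketch below is an editorial illustration that assumes the default expiring_objects_container_divisor of 86400 and the usual queue-entry naming; if a HEAD on the computed container in the .expiring_objects account 404s, the expirer really has no record of the object.

```python
# Sketch: for a given X-Delete-At, work out the .expiring_objects container
# the proxy would have used (assumes the default divisor of 86400 and the
# "<delete-at>-<account>/<container>/<object>" queue-entry naming).
DIVISOR = 86400  # assumed default expiring_objects_container_divisor

def expirer_container(x_delete_at, divisor=DIVISOR):
    # container names are the delete-at time rounded down to the divisor,
    # zero-padded to 10 digits
    return '%010d' % (int(x_delete_at) // divisor * divisor)

def expirer_entry(x_delete_at, account, container, obj):
    return '%d-%s/%s/%s' % (int(x_delete_at), account, container, obj)

print(expirer_container(1504000000))  # -> '1503964800'
```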
_ix | My question is: if I understand this correctly and there are a number of .data files lingering on my problem node, is there a way to ensure these files are removed in a cleanup (I guess the Mitaka auditor doesn't take care of this)? | 14:50 |
rledisez | _ix: what could have happened is that the objects were deleted while your node was offline, and the tombstones were reclaimed before your node came back online, so they are now "dark data", they should not be here anymore, but swift can't know it must delete them | 14:50 |
_ix | rledisez: So, I'm on the right track? | 14:51 |
_ix | Are you aware of any safe ways to remove the dark data outside of swift? | 14:52 |
tdasilva | but if the node that was down for 6 weeks had the drives reformatted, how would you get dark data there? | 14:56 |
_ix | tdasilva: That's another question that I had. | 14:56 |
_ix | We had some serious problems getting the balance correct when we re-introduced this node with empty disks, so I asked my colleagues to drop objects they no longer needed to reduce the replication time. My assumption is that some of that data was tombstoned and the reclaim age ran out before we achieved stability. | 14:58 |
_ix | That's merely conjecture, though. I'll double check the reclaim age and see if it's any less than the default. | 14:59 |
_ix | I'm assuming the default was 7 days in Mitaka, too. If that's the case, this value hasn't been explicitly set, and it should be 7 days. | 15:01 |
_ix | I am interested in the *why* of these circumstances, but I'm also interested in the *how*, that is, how do I clean up after our mistakes? | 15:03 |
_ix | So object manifests/segments with X-Delete-At values but no associated .expiring_objects container means they're dark data. How do we clean up dark data? | 15:04 |
*** silor has quit IRC | 15:14 | |
*** silor has joined #openstack-swift | 15:14 | |
*** klrmn has joined #openstack-swift | 15:14 | |
*** klrmn has quit IRC | 15:16 | |
_ix | From my reading of others' experiences, this doesn't appear to be a very easy question to answer. | 15:17 |
tdasilva | _ix: clayg might be a good person to ask about dark data, but he is in PST time zone | 15:21 |
_ix | tdasilva: Thanks! I'm actually reading over some conversations on eavesdrop and that sounds like a safe bet. Do you happen to know what time zone redbo is in? | 15:23 |
tdasilva | redbo is in CST i believe | 15:26 |
*** armaan has quit IRC | 15:30 | |
*** armaan has joined #openstack-swift | 15:30 | |
*** openstackgerrit has quit IRC | 15:48 | |
*** klrmn has joined #openstack-swift | 16:00 | |
*** armaan has quit IRC | 16:04 | |
*** oshritf has joined #openstack-swift | 16:38 | |
*** mvk has quit IRC | 16:42 | |
*** oshritf has quit IRC | 16:47 | |
*** abhitechie has joined #openstack-swift | 16:47 | |
*** chsc has joined #openstack-swift | 16:50 | |
frankkahle | i have done another install of openstack-swift. I have made sure all of the requirements are met and i have compiled 1.5 of liberasurecode. I assume that i should see only .... (dots) and no "E" when i run the unit tests, correct? | 17:03 |
*** kallenp has joined #openstack-swift | 17:09 | |
notmyname | good morning | 17:13 |
notmyname | frankkahle: correct | 17:13 |
*** cbartz has quit IRC | 17:13 | |
*** mvk has joined #openstack-swift | 17:14 | |
frankkahle | ok so I got an E, and hit control-C to abort it and saw the error, "ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware)" , how do i debug this? | 17:15 |
*** kallenp has left #openstack-swift | 17:19 | |
*** pcaruana has joined #openstack-swift | 17:22 | |
notmyname | frankkahle: unit tests should all pass on any server where you install swift, but the functests are what's really interesting/important for you to validate a production cluster. | 17:24 |
*** hseipp has quit IRC | 17:25 | |
notmyname | the module path in the error is the python import path for it | 17:25 |
notmyname | eg https://github.com/openstack/swift/blob/master/test/unit/common/middleware/test_memcache.py#L99 | 17:25 |
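For instance, assuming the working directory is the root of the swift repo and the test requirements are installed, that single test can be loaded by its import path with the stdlib runner (an illustrative invocation, not the project's canonical one):

```python
# run the one failing test by its python import path
import unittest

suite = unittest.defaultTestLoader.loadTestsFromName(
    'test.unit.common.middleware.test_memcache.'
    'TestCacheMiddleware.test_real_config')
unittest.TextTestRunner(verbosity=2).run(suite)
```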
*** pcaruana has quit IRC | 17:26 | |
*** pcaruana has joined #openstack-swift | 17:27 | |
*** ukaynar has joined #openstack-swift | 17:30 | |
*** gkadam has quit IRC | 17:30 | |
_ix | clayg: Are you around this morning? | 17:31 |
timburke | don't see him at his desk yet, but iirc he's planning on coming in | 17:32 |
_ix | Thanks, Tim. I hope he's in the mood to talk about dark data. | 17:38 |
_ix | I feel like one must whisper when saying... dark data. | 17:38 |
*** pcaruana has quit IRC | 17:58 | |
*** itlinux has joined #openstack-swift | 17:58 | |
clayg | I’m on the train. I’m not sure what to do about expired manifests and unexpired segments. It’s not really dark data if it’s in the container listing. It’s just orphaned segments. | 18:03 |
_ix | clayg: Thanks for weighing in. That gives me something more to consider. | 18:09 |
*** oshritf has joined #openstack-swift | 18:13 | |
*** oshritf has quit IRC | 18:15 | |
*** silor has quit IRC | 18:16 | |
*** tesseract has quit IRC | 18:21 | |
_ix | I've looked again, and the data is not in the container listings. If it were, I suppose it would be trivial to delete. | 18:26 |
_ix | So, does that in turn qualify this as dark data? | 18:27 |
*** oshritf has joined #openstack-swift | 18:31 | |
_ix | The data is in a state that we coined... | 18:33 |
_ix | expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly state. | 18:33 |
*** dcourtoi has quit IRC | 18:38 | |
*** openstackgerrit has joined #openstack-swift | 18:39 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: Refactor proxy-server conf loading to a utils function https://review.openstack.org/525728 | 18:39 |
acoles | I'd almost forgotten how to do an upstream patch! :) | 18:39 |
*** oshritf has quit IRC | 18:42 | |
*** dcourtoi has joined #openstack-swift | 18:53 | |
*** armaan has joined #openstack-swift | 18:55 | |
*** armaan has quit IRC | 19:02 | |
*** armaan has joined #openstack-swift | 19:03 | |
frankkahle | i'm about ready to give up....I have now built 6 vms in total, ranging from ubuntu 14, 16, 17 and multiple centos versions, based on the instructions here (https://docs.openstack.org/swift/latest/development_saio.html#common-dev-section), carefully done by the book. built my own version of liberasurecode from git, and yet i still cannot get the unit tests to run.... | 19:15 |
tdasilva | frankkahle: what's the unit test error you are seeing? may I also suggest trying one of these: https://docs.openstack.org/swift/latest/associated_projects.html#developer-tools ? | 19:17 |
tdasilva | frankkahle: i maintain this: https://github.com/thiagodasilva/ansible-saio and it works pretty well for me... although i'll be honest and say that rhel7.4 has a regression and some unit tests will fail, but if you run with a rhel7.3 image it should work fine | 19:19 |
frankkahle | well i started the unit tests and got a bunch of dots, then saw some 'E's and hit control-c and saw this error (ERROR: test_real_config (test.unit.common.middleware.test_memcache.TestCacheMiddleware) | 19:19 |
tdasilva | what's the error? | 19:21 |
* tdasilva wonders if it has to do with the tmpdir issue | 19:21 | |
frankkahle | oh maybe something to do with the cryptography not being a high enough version??? | 19:21 |
*** klrmn has quit IRC | 19:22 | |
tdasilva | frankkahle: let the unit tests run and post the errors to http://paste.openstack.org/ | 19:22 |
frankkahle | this is the bottom of the error... ContextualVersionConflict: (cryptography 1.2.3 (/usr/lib/python2.7/dist-packages), Requirement.parse('cryptography!=2.0,>=1.6'), set(['swift'])) | 19:23 |
frankkahle | should i upgrade that somehow? | 19:24 |
acoles | frankkahle: have you tried running the tests using `tox -e py27 -r` | 19:24 |
frankkahle | and what is tox? | 19:25 |
frankkahle | BTW running on ubuntu 16.04.3 LTS | 19:26 |
acoles | frankkahle: https://docs.openstack.org/swift/latest/development_guidelines.html | 19:26 |
frankkahle | hmm, lol, says it cannot find tox.ini | 19:29 |
acoles | cd to the root dir of your swift repo | 19:31 |
frankkahle | lol, ok that is running | 19:31 |
*** klrmn has joined #openstack-swift | 19:34 | |
*** joeljwright has joined #openstack-swift | 19:36 | |
*** ChanServ sets mode: +v joeljwright | 19:36 | |
_ix | clayg: any additional thoughts on expired-but-not-yet-deleted-and-not-in-the-.expiring_objects-containers-to-be-deleted-properly files? | 19:37 |
clayg | _ix: sounds like my original classification may have been incorrect | 19:38 |
frankkahle | it's still running. question: should the unit tests be run as a sudo user? | 19:39 |
clayg | an object .data file on-disk (expired or otherwise) that does have a row in it's containers listing is exactly the definition of "dark data" | 19:39 |
clayg | an expired but not yet reaped object would have a row in it's container until the expirer deletes it. so it's probably somewhat inconsequential that the dark data you're finding is expired... | 19:40 |
acoles | clayg: did you mean to type 'done NOT have a row' above? | 19:43 |
clayg | _ix: dark data happens when somehow an object .data file persists on a non-primary node longer than the configured reclaim_age (i.e. if you disconnect a node from the primaries for some time, issue a DELETE for some data, wait for the tombstones to get reclaimed, then somehow reconnect the orphaned data to the rest of the cluster - which results in the orphaned stale data being repaired in the object tier w/o | 19:44 |
clayg | any record of the earlier, now reclaimed, tombstone/DELETE) | 19:44 |
*** klrmn has quit IRC | 19:44 | |
acoles | frankkahle: you shouldn't need to sudo the unit tests | 19:46 |
clayg | acoles: i'm trying to read it again, I think I meant what I said... an expired .data file (i.e. a .data that exists after the x-delete-at metadata) WOULD have a row in the container - until the object-expirer reaps it ... unless it's dark data | 19:46 |
acoles | clayg: yeah, but the line before 'an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data"' | 19:47 |
frankkahle | i had to sudo to get the tox command running... and it's showing a lot of OKs so far | 19:47 |
clayg | acoles: yup thank you | 19:48 |
*** klrmn has joined #openstack-swift | 19:48 | |
clayg | _ix: an object .data file on-disk (expired or otherwise) that does NOT have a row in it's containers listing is exactly the definition of "dark data" | 19:48 |
acoles | clayg: teamwork! | 19:49 |
clayg | anyway, the fix is basically to just issue a DELETE request through the proxy for any objects you find that need to be deleted - if the containers are deleted you might need to recreate them to get the storage-policy correct, or use a script that will let you set x-backend-storage-policy-override | 19:49 |
clayg | enumeration of dark data is the hardest part | 19:50 |
clayg | i don't have a great example of either... | 19:52 |
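As a rough illustration of the "issue a DELETE through the proxy" step, here is a minimal python-swiftclient sketch; the auth settings and the dark_data_names.txt input are placeholders, and recreating deleted containers with the correct storage policy (or setting x-backend-storage-policy-override) is not covered here:

```python
# Sketch: DELETE a list of known dark-data objects through the proxy.
# Auth URL, credentials and the input file are placeholders.
from swiftclient import client as swift_client
from swiftclient.exceptions import ClientException

conn = swift_client.Connection(
    authurl='http://keystone.example.com:5000/v2.0',  # placeholder
    user='tenant:user', key='secret', auth_version='2')

with open('dark_data_names.txt') as fp:  # one "container/object" per line
    for line in fp:
        container, obj = line.strip().split('/', 1)
        try:
            conn.delete_object(container, obj)
        except ClientException as err:
            if err.http_status == 404:
                # container may have been deleted; it would need to be
                # recreated (with the right policy) before the DELETE lands
                print('404 for %s/%s' % (container, obj))
            else:
                raise
```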
*** joeljwright has quit IRC | 19:56 | |
*** joeljwright has joined #openstack-swift | 20:02 | |
*** ChanServ sets mode: +v joeljwright | 20:02 | |
*** chsc has quit IRC | 20:03 | |
*** gkadam has joined #openstack-swift | 20:05 | |
*** armaan has quit IRC | 20:07 | |
_ix | clayg: Thanks for the advice. | 20:08 |
clayg | one intensive audit I've done in the past involved getting a list of all names of all .data files on disk from object metadata (similar to swift-object-info) then doing container listings on all the accounts discovered and digging into any files on disk that didn't show up in the container listings... | 20:11 |
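A sketch of the first half of that audit: walk a device's objects directory and pull the swift name out of each .data file's xattr metadata, the same metadata swift-object-info reads. The read_metadata import is an assumption about where the helper lives in a Mitaka-era tree:

```python
# Sketch: collect the swift name of every .data file under a device's
# objects dir, for later cross-checking against container listings.
import os
from swift.obj.diskfile import read_metadata  # assumed Mitaka-era location

def ondisk_names(objects_dir):
    for root, _dirs, files in os.walk(objects_dir):
        for fname in files:
            if not fname.endswith('.data'):
                continue
            try:
                with open(os.path.join(root, fname), 'rb') as fp:
                    meta = read_metadata(fp)
            except (IOError, OSError):
                continue
            yield meta['name']  # "/account/container/object"

if __name__ == '__main__':
    for name in ondisk_names('/srv/node/sdb1/objects'):
        print(name)
```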
_ix | It seems like a few things would have to go wrong in order for dark data to be created, however... | 20:14 |
clayg | yes, generally - there are no known *open* bugs that cause dark data | 20:15 |
_ix | The sequence of events that you mentioned here is difficult to compare to our own. Indeed, the node was taken offline. But, before rejoining the cluster, the drives were wiped. | 20:15 |
_ix | clayg: Well, we're running Mitaka, but I think the bugs that we saw annotated were fixed as of... Newton or Ocata. | 20:16 |
clayg | that's a possibility... | 20:16 |
_ix | Like I said above, I read through some chat logs where an xfs bulkstat or similar was discussed with redbo. Does that ring any bells for you? | 20:18 |
clayg | afaik nothing ever came of that investigation - and people have made do just walking the object trees like the auditor already does | 20:18 |
clayg | auditor hooks that @torgomatic was working on might have been an option... | 20:18 |
redbo | Did you remove the node from the ring right away? Any handoffs that take longer than a week to clear have the same problem. | 20:20 |
_ix | No, and that's probably where the first major mistake was made. | 20:20 |
_ix | Another engineer just took it offline without making adjustments to the ring. | 20:21 |
_ix | It sat for some six weeks, with rsync logs stacking up with errors reaching that node. I'm trying to forget this blunder. | 20:22 |
_ix | And indeed, your week figure is pretty accurate. I think it took about six days to bring the node back into the cluster. | 20:23 |
redbo | So yeah, we bulkstat to get a list of all the objects that actually exist in the system, and then dump all container listings to get a list of everything that's in containers. Then cross-collate them to find out what objects aren't in listings. But it's not wrapped up all nice and pretty. | 20:24 |
redbo | Because we have too many objects to just put all of that data in a database and do a set comparison. | 20:26 |
redbo | I say we, I'm not working on that anymore. | 20:26 |
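One way to do that comparison without a database: dump both name lists to files, sort them externally (GNU sort copes with files much larger than memory), and stream a merge-style set difference. A minimal sketch, assuming one name per line in each pre-sorted file:

```python
# Sketch: stream two sorted name lists and yield on-disk names that have
# no row in any container listing (dark data candidates).
def missing_from_listings(ondisk_path, listings_path):
    with open(ondisk_path) as ondisk, open(listings_path) as listed:
        listed_iter = iter(listed)
        current = next(listed_iter, None)
        for line in ondisk:
            name = line.rstrip('\n')
            # advance the listings cursor until it catches up with this name
            while current is not None and current.rstrip('\n') < name:
                current = next(listed_iter, None)
            if current is None or current.rstrip('\n') != name:
                yield name

if __name__ == '__main__':
    for name in missing_from_listings('ondisk_sorted.txt',
                                      'listings_sorted.txt'):
        print(name)
```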
*** geaaru has quit IRC | 20:26 | |
_ix | redbo: I don't mind doing some work. I think the disk utilization pressure has been relieved after beginning to manage the ring definitions appropriately and rebalancing the cluster. | 20:27 |
_ix | Can I assume this is what we're talking about https://github.com/redbo/python-xfs ? | 20:28 |
_ix | I think we can even do without the node that's at issue... if we're patient and adjust the weights to 0 on all of the drives attached to this node, can I assume that eventually we'll bottom out at a point > 0, and wipe those disks once more? | 20:30 |
redbo | Isn't this a dark data thing? If you drop the weights of those drives, it'll just replicate that dark data out. | 20:32 |
* _ix thinks | 20:33 | |
_ix | I'm not sure how the replicator works. I assumed that only non-expired data gets replicated. | 20:34 |
redbo | It's not that smart. So if all of your dark data is expired, you're lucky and can probably clear it with a custom audit type thing. | 20:37 |
_ix | Can you say more about a custom audit type thing? | 20:38 |
_ix | The .data files we've come across so far in our investigations have all been expired. | 20:38 |
redbo | I don't know, just where the auditor pulls the metadata, you could check to see if it's expired and throw it away. | 20:38 |
redbo | Like clayg said, torgomatic worked on making pluggable auditor modules there, but I don't know what happened with that. | 20:39 |
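And a minimal sketch of the "check the metadata and throw it away if expired" idea; it only reports candidates rather than deleting anything, and uses the same assumed read_metadata helper as above:

```python
# Sketch: flag .data files whose X-Delete-At is already in the past.
# Report only; actual removal should go through DELETEs or a vetted script.
import os
import time
from swift.obj.diskfile import read_metadata  # assumed helper location

def expired_data_files(objects_dir, now=None):
    now = time.time() if now is None else now
    for root, _dirs, files in os.walk(objects_dir):
        for fname in files:
            if not fname.endswith('.data'):
                continue
            path = os.path.join(root, fname)
            with open(path, 'rb') as fp:
                meta = read_metadata(fp)
            delete_at = meta.get('X-Delete-At')
            if delete_at is not None and float(delete_at) < now:
                yield path, meta.get('name')

if __name__ == '__main__':
    for path, name in expired_data_files('/srv/node/sdb1/objects'):
        print('expired: %s (%s)' % (name, path))
```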
_ix | OK. Well, thanks very much for the discussion. I really appreciate your taking the time. | 20:41 |
redbo | I had to appear, my name was said 3 times. | 20:43 |
*** gyee has joined #openstack-swift | 20:44 | |
*** klrmn_ has joined #openstack-swift | 20:51 | |
openstackgerrit | John Dickinson proposed openstack/swift master: Added swift version to recon cli https://review.openstack.org/413991 | 21:18 |
*** mat128 has quit IRC | 21:34 | |
*** gkadam has quit IRC | 21:38 | |
*** itlinux has quit IRC | 21:39 | |
*** itlinux has joined #openstack-swift | 21:44 | |
*** joeljwright has quit IRC | 21:47 | |
*** nadeem_ has joined #openstack-swift | 21:53 | |
*** nadeem_ has quit IRC | 21:53 | |
openstackgerrit | Tim Burke proposed openstack/swift master: tempurl: Make the digest algorithm configurable https://review.openstack.org/525770 | 21:55 |
openstackgerrit | Tim Burke proposed openstack/swift master: tempurl: Deprecate sha1 signatures https://review.openstack.org/525771 | 21:55 |
*** flwang has quit IRC | 21:56 | |
*** flwang has joined #openstack-swift | 22:01 | |
*** threestrands has joined #openstack-swift | 22:05 | |
*** threestrands has quit IRC | 22:05 | |
*** threestrands has joined #openstack-swift | 22:05 | |
*** rcernin has joined #openstack-swift | 22:05 | |
mattoliverau | morning | 22:23 |
*** klrmn has quit IRC | 22:28 | |
*** klrmn_ has quit IRC | 23:07 | |
*** kei_yama has joined #openstack-swift | 23:22 | |
*** manous_ has joined #openstack-swift | 23:29 | |
*** _ix has quit IRC | 23:36 | |
*** manous_ has quit IRC | 23:40 |