openstackgerrit | Pete Zaitcev proposed openstack/swift master: py3: port bulk middleware https://review.openstack.org/619303 | 00:12 |
*** hoonetorg has quit IRC | 00:24 | |
*** hoonetorg has joined #openstack-swift | 00:26 | |
*** itlinux has joined #openstack-swift | 00:58 | |
*** gyee has quit IRC | 01:21 | |
*** itlinux_ has joined #openstack-swift | 01:38 | |
*** itlinux_ has quit IRC | 01:40 | |
*** itlinux has quit IRC | 01:41 | |
*** psachin has joined #openstack-swift | 02:43 | |
notmyname | kota_: rledisez: mattoliverau: looking at the meeting schedule for tomorrow (because i've got a potential conflict, or at least time crunch)... | 04:43 |
notmyname | I wanted to follow up on a couple of things that were mentioned last week | 04:43 |
kota_ | notmyname: o/ | 04:44 |
notmyname | (1) for kota_ and rledisez, what's the status of gate jobs on feature/losf? | 04:44 |
notmyname | (2) for rledisez, did you do anything about the docs that are "encouraging" device names in lieu of labels or uuids? (at least file a bug, if not a patch) | 04:44 |
notmyname | IMO, these questions can have an async answer, and if there is no follow-up needed, then we don't need a meeting tomorrow | 04:45 |
kota_ | notmyname: it hasn't progressed much yet. I briefly looked at the dsvm gate job and the zuul docs, but I haven't figured out why we dropped the dsvm gate on the losf branch. | 04:45 |
notmyname | ok | 04:45 |
kota_ | I'll ask the infra team this week. | 04:45 |
notmyname | kota_: if you need help with getting that fixed, please ask | 04:45 |
notmyname | sounds good | 04:45 |
*** ianychoi_ has joined #openstack-swift | 05:24 | |
*** ianychoi has quit IRC | 05:28 | |
*** hoonetorg has quit IRC | 05:33 | |
*** hoonetorg has joined #openstack-swift | 05:50 | |
*** e0ne has joined #openstack-swift | 06:31 | |
*** zaitcev has quit IRC | 07:13 | |
*** ccamacho has joined #openstack-swift | 07:41 | |
*** e0ne has quit IRC | 07:47 | |
*** e0ne has joined #openstack-swift | 07:54 | |
*** admin6_ has joined #openstack-swift | 07:58 | |
*** admin6 has quit IRC | 08:00 | |
*** admin6_ is now known as admin6 | 08:00 | |
*** hseipp has joined #openstack-swift | 08:01 | |
*** tkajinam has quit IRC | 08:09 | |
*** e0ne has quit IRC | 08:16 | |
*** pcaruana has joined #openstack-swift | 08:29 | |
rledisez | notmyname: about the device name vs label/uuid, i filed a bug ( https://bugs.launchpad.net/swift/+bug/1817966 ) and answered the mail. no patch yet, still on my todo list | 08:52 |
openstack | Launchpad bug 1817966 in OpenStack Object Storage (swift) "Encourage the use of static device names in fstab" [Undecided,New] | 08:52 |
rledisez | clayg: I've thought many times about removing the REPLICATE call after data transmission. I'm pretty sure I did it once in a specific situation, but never permanently; I was not totally sure of all the consequences. But it seems reasonable to do it with SSYNC | 08:59 |
admin6 | hi team, do you have any idea why one disk, with the same size as the other disks of a node and the same relative weight in the ring, could suddenly start to be filled up by swift? Even after reducing its weight it continues to grow, up to 100% full. And furthermore, I now have the same behavior on all the disks of this server… | 09:30 |
rledisez | admin6: can you check the folders other than objects (tmp, quarantines, etc…) to see where it is growing? | 09:44 |
rledisez | we already had a case where an object was replicated, then quarantined, then replicated, etc... | 09:45 |
admin6 | rledisez: it didn't seem to be that. I've checked the size of the quarantined dir on these disks and the values are standard, about the same compared to another server. | 09:57 |
admin6 | rledisez: I've also checked the async_pending and tmp dirs. | 10:01 |
rledisez | admin6: then it is probably dispersion | 10:06 |
admin6 | rledisez: could you be a bit more precise? I know I have a really bad dispersion currently, because I'm trying to reduce the number of zones in my ring from 6 to 4 (or maybe 5 zones), but I've paused this project for a while. Here is the dispersion of this ring: Dispersion is 24.258486, Balance is 21.684194, Overload is 10.00%, Required overload is 358.386218%, Worst tier is 43.564049 (r1z2-10.10.1.52) | 10:11 |
rledisez | i'm talking dispersion on disk, not in the ring. it means the replicator/reconstructor still need to work to put all data back in place. are you using ssync or rsync ? | 10:33 |
rledisez | there is no real tool to check that easily | 10:33 |
rledisez | you basically need to request the ring for the partition placed on a given device | 10:34 |
rledisez | then compare with what's on disk (ls /srv/node/…/objects/) | 10:34 |
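A minimal sketch of the comparison rledisez describes, assuming the object ring lives at /etc/swift/object.ring.gz, the disk is mounted under /srv/node, and using a placeholder device name (adjust both for your cluster):

```python
# Rough comparison of ring-assigned partitions vs. partition dirs on disk.
# Anything on disk but not assigned is handoff/stale data that the
# replicator or reconstructor still has to move off this device.
import os
from swift.common.ring import Ring

ring = Ring('/etc/swift/object.ring.gz')
device = 'sdb1'                                  # placeholder device name
objects_dir = '/srv/node/%s/objects' % device    # objects-N for other policies

assigned = set()
for part in range(ring.partition_count):
    if any(node['device'] == device for node in ring.get_part_nodes(part)):
        assigned.add(str(part))

on_disk = set(name for name in os.listdir(objects_dir) if name.isdigit())
print('assigned by ring : %d' % len(assigned))
print('present on disk  : %d' % len(on_disk))
print('on disk, not assigned (handoffs/stale): %d' % len(on_disk - assigned))
```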
*** e0ne has joined #openstack-swift | 10:35 | |
admin6 | rledisez: I’m using ssync with swift 2.17. How can I get from the ring the list of partitions placed on a given drive? | 10:53 |
*** FlorianFa has quit IRC | 11:01 | |
*** FlorianFa has joined #openstack-swift | 11:02 | |
*** pcaruana has quit IRC | 11:04 | |
*** ianychoi_ is now known as ianychoi | 11:20 | |
*** pcaruana has joined #openstack-swift | 11:32 | |
admin6 | Do you know how I can request a ring to get the list of partitions placed on a given drive ? | 11:38 |
admin6 | sorry, swift-ring-builder list_parts seems to be a good candidate for my previous question ;-) | 12:01 |
*** henriqueof has joined #openstack-swift | 12:02 | |
*** mvkr has joined #openstack-swift | 12:13 | |
*** henriqueof has quit IRC | 12:16 | |
*** ybunker has joined #openstack-swift | 12:28 | |
ybunker | hi all, question: I just added two new storage nodes to an existing cluster (queens), and I noticed that the account/container files are not replicating at all. obj files are working fine, but acct and cont are not. Any ideas? In the swift_container_server.error file the only thing I'm seeing is the following err msg: | 12:29 |
ybunker | container-replicator: Can't find itself 127.0.0.1, ::1, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, xxxx::xxxx:xxxx:xxxx:xxxx with port 5103 in ring file, not replicating | 12:29 |
ybunker | and the swift_container_server.log file is full of 404 errors on PUT operations | 12:30 |
ybunker | any ideas? | 12:35 |
ybunker | anyone? | 12:45 |
ybunker | the container ring file (http://pasted.co/3314af99) | 12:54 |
ybunker | nodes 10.1.1.19 and 10.1.1.20 are the new ones... | 12:54 |
ybunker | ?? | 13:19 |
ybunker | in the rsyncd log I'm seeing this error msg (unknown module 'container' tried) | 13:31 |
admin6 | ybunker: maybe you forgot to declare the account and container sections in rsyncd.conf on the new servers? | 13:39 |
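For reference, a minimal sketch of the rsyncd.conf modules the account/container/object replicators expect by default (paths and uid/gid are assumptions; the module names must match what the replicators request, and "unknown module 'container' tried" usually means the [container] module is missing or named differently):

```ini
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[account]
path = /srv/node
read only = false
lock file = /var/lock/account.lock

[container]
path = /srv/node
read only = false
lock file = /var/lock/container.lock

[object]
path = /srv/node
read only = false
lock file = /var/lock/object.lock
```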
ybunker | admin6: I did it already (http://pasted.co/e62c72a9) | 13:43 |
ybunker | I don't know what else to look for... account and container are not replicating at all. I checked the files inside /srv/node/ for acct and cont and there are no files at all either | 13:44 |
ybunker | is there a way to "manually" copy from other nodes, at least to lower the 404 errors a little bit? | 13:51 |
*** pcaruana has quit IRC | 13:51 | |
ybunker | ? | 13:57 |
*** pcaruana has joined #openstack-swift | 14:01 | |
ybunker | anyone could give a hint on this? i really need to get this thing running ... :S | 14:07 |
DHE | the replication service basically runs "ip addr show" to see what IPs are on this host to find itself (all entries for all disks/devices) in the container ring file, on the port(s) it listens on. it failed to find itself. | 14:08 |
DHE | there is no 10.1.1.19 in your container ringbuilder | 14:09 |
DHE | oh wait, there it is but out of order. n/m that | 14:09 |
*** FlorianFa has quit IRC | 14:09 | |
ybunker | DHE: mm what do you mean with out of order? | 14:18 |
DHE | your paste doesn't list devices in IP address order. but that's my fault for not paying enough attention | 14:21 |
ybunker | DHE: oh I see, yeah I don't know why it put server 19 (10.1.1.19) with that ID. Anyway, besides that I can't find any misconfig or anything different from the other servers. | 14:23 |
ybunker | any idea? | 14:49 |
ybunker | I compared the rsyncd file with all the other nodes in the cluster and they are the same, except of course for the address, which changes on each node | 14:49 |
ybunker | also checked the container-server.conf file, it's the same on all the nodes | 14:50 |
ybunker | I also verified that rsync is accepting connections for all servers http://pasted.co/21ab29c8 | 14:59 |
ybunker | mmm on the logs of the container now is showing: container-sync: Skipping tmp as it is not mounted | 15:09 |
ybunker | container-sync: Skipping containers as it is not mounted | 15:09 |
ybunker | container-sync: Skipping accounts as it is not mounted | 15:09 |
admin6 | Hi all. I’m still working on my "disk full" problem for an erasure coding ring. As rledisez suggested, I had a look at the "disk dispersion" and I found a big delta between the list_parts declared in the ring (14127 declared on this disk) and the existing directories in the object folder (57000+ directories). Looking into some of the 43000 additional dirs that are not listed in list_parts, I see a lot of directories | 15:21 |
admin6 | filled with real data (valid fragments of objects) that have been accessed recently, but have nothing to do there, as they are not among the primary fragment placements nor the first 12 handoffs. It looks like they are old handoffs that are parsed by some swift process but never cleaned up. Might I have missed something in my reconstructor config that prevents cleaning these files? | 15:21 |
notmyname | rledisez: alecuyer: how do you feel about skipping today's meeting? but I see you added something about grpc to the agenda... | 15:30 |
rledisez | notmyname: it can be skipped, nothing urgent | 15:32 |
notmyname | rledisez: ok | 15:32 |
ybunker | any ideas? I'm kind of stuck here | 15:47 |
*** pcaruana has quit IRC | 15:53 | |
*** pcaruana has joined #openstack-swift | 16:06 | |
*** ccamacho has quit IRC | 16:27 | |
ybunker | ... | 16:36 |
clayg | admin6: handoffs_only + reconstructor_workers = #_of_disks should get you back on track | 16:51 |
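In object-server.conf terms, clayg's suggestion looks roughly like the following sketch (handoffs_only and reconstructor_workers are real [object-reconstructor] options; the worker count is just an example for a 12-disk node):

```ini
[object-reconstructor]
# only push handoff partitions back to their primaries, skip normal repair work
handoffs_only = true
# roughly one worker per disk so every device gets drained in parallel
reconstructor_workers = 12
# remember to set handoffs_only back to false once the handoffs have drained
```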
clayg | ybunker: for some reason it sounds like the devices config option is off a directory - all account/container/object config should be /srv/node | 16:53 |
ybunker | clayg: acct and cont are inside /srv/node/1, /srv/node/2 and /srv/node/3 | 16:55 |
*** pcaruana has quit IRC | 16:55 | |
clayg | admin6: there's a couple of open bugs I know about that can cause EC handoff partition not to remove cleanly -> https://bugs.launchpad.net/swift/+bug/1816501 is one... | 16:55 |
openstack | Launchpad bug 1816501 in OpenStack Object Storage (swift) "reconstructor doesn't remove empty handoff dirs with reclaimed tombstones" [Undecided,New] | 16:55 |
clayg | admin6: also https://bugs.launchpad.net/swift/+bug/1778002 | 16:56 |
openstack | Launchpad bug 1778002 in OpenStack Object Storage (swift) "EC non-durable fragment won't be deleted by reconstructor. " [Medium,Confirmed] | 16:56 |
admin6 | clayg: thanks, but I’ve already been running with these options set since yesterday, and it has had no real effect on my disk usage. :-( | 16:56 |
clayg | ybunker: that's a little different than normal - most of the time a single node has /srv/node/<device>/[account|container|object] | 16:57 |
clayg | so the accounts|containers|objects dirs are all parallel and all the nodes devices option is /srv/node | 16:57 |
clayg | this is useful esp if your a&c and o share disks - you just mount everything into /srv/node/<device> | 16:57 |
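A sketch of the conventional layout clayg is describing, where every backend server config points `devices` at the same root and each disk is mounted one level below it (the device name is a placeholder):

```ini
# On-disk layout:
#   /srv/node/sdb1/accounts
#   /srv/node/sdb1/containers
#   /srv/node/sdb1/objects
#
# account-server.conf / container-server.conf / object-server.conf:
[DEFAULT]
devices = /srv/node
mount_check = true
```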
ybunker | clayg: i've /etc/swift/account-server/1.conf... 2.conf... 3.conf... r_1.conf...r_2.conf and r_3.conf on all the data nodes | 16:57 |
clayg | ybunker: you CAN set it up the way you described - but I bet you have to be extra careful with your rsync.conf | 16:58 |
ybunker | clayg: yeah... the thing is that the rsync conf is the same for all the nodes.. and actually objects are being replicated.. but acct and cont are not :( | 16:58 |
clayg | separate configs for front-end and backend services is not uncommon, you certainly CAN manage all account services in a single config - or weird setups like the SAIO have multiple configs to simulate multiple "nodes" on a single machine | 16:59 |
clayg | ybunker: so maybe the rsync config is correct for object (you said /srv/node/1?) and so then it's wrong for a&c? | 17:00 |
ybunker | clayg: objects are on /srv/node/4 .../5... /12, and acct cont on /srv/node/1... /2... /3 | 17:00 |
clayg | admin6: for the referenced bugs something like https://gist.github.com/clayg/7a975ef3b34828c5ac7db05a519b6e8a might help 🤷♂️ | 17:00 |
ybunker | i check permissions and are the same for all the directories | 17:01 |
clayg | admin6: but if you have DATA in the disks really you should just need to crank up the reconstructor handoff_only workers and let her run | 17:01 |
clayg | what is the reconstructor *doing* - are the disks "busy" (i.e. iostat -dmx 2) | 17:01 |
*** ccamacho has joined #openstack-swift | 17:02 | |
clayg | ybunker: maybe paste your `account-replicator` config, and your `rsync.conf` and the output of `swift-account-replicator /replicator.conf once verbose` | 17:03 |
clayg | ybunker: it seems like you have a slightly unorthodox configuration, and probably something where the default is normally fine needs to be changed so all the pieces are wired up correctly | 17:03 |
clayg | there's probably a hint in the logs - you just need to cut through the noise and zoom in on what's really broken (the error msg may be only kind of "indirectly" related) | 17:04 |
clayg | ybunker: I'd guess it's something to do with rsync_module maybe - that stuff is trixy | 17:06 |
ybunker | clayg: let me post the config files, hopefully is something like that | 17:06 |
*** gyee has joined #openstack-swift | 17:06 | |
clayg | admin6: I need to bounce off for a bit - good luck! the reconstructor is a beast - you can get it tuned - don't forget to check your incoming concurrency settings (reconstructor talks to object-server - there's a replication related concurrency setting you may need to open up) | 17:07 |
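The "incoming" concurrency setting clayg alludes to lives on the object-server side; a hedged example (these are real object-server.conf options, the values are only illustrative):

```ini
[object-server]
# cap on concurrent incoming SSYNC/REPLICATE requests; 0 means unlimited
replication_concurrency = 0
# per-disk cap on concurrent replication requests
replication_concurrency_per_device = 1
```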
admin6 | clayg: thanks a lot | 17:07 |
clayg | admin6: if you've never done a rebalance before there's lots of misconfigs that could be in effect (see ybunker's current crisis) - maybe https://bugs.launchpad.net/swift/+bug/1446873 ??? | 17:08 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Confirmed] | 17:08 |
clayg | ybunker: admin6: either of you all going to come see us in Denver for the Summit!? Swift party time! I'll buy you a beer 🍻 | 17:09 |
admin6 | clayg: no chance of seeing me there unfortunately, but I'd really enjoy having a beer with you at another swift party some time. | 17:11 |
ybunker | clayg: here are the config details (http://pasted.co/34e6f214) | 17:15 |
*** ccamacho has quit IRC | 17:15 | |
guimaluf | hi all, when I run swift-ring-builder dispersion on my object.builder I'm getting 0.58% dispersion, which means 189 partitions within the same region. How can I fix this? Should I increase weight on one node in r1? or decrease weight in r2? any suggestion would be appreciated | 17:16 |
*** ccamacho has joined #openstack-swift | 17:19 | |
ybunker | clayg: it seems that container-replicator has nothing to replicate at all... http://pasted.co/15f955ef | 17:24 |
*** hseipp has quit IRC | 17:27 | |
*** e0ne has quit IRC | 17:34 | |
ybunker | clayg: did you find anything wrong on the configs? | 17:53 |
ybunker | can I push the data from acct cont from another of the nodes? | 17:54 |
*** ccamacho has quit IRC | 18:00 | |
*** psachin has quit IRC | 18:05 | |
*** zaitcev has joined #openstack-swift | 18:13 | |
*** ChanServ sets mode: +v zaitcev | 18:13 | |
ybunker | ? | 18:18 |
ybunker | ok.. now im getting: | 18:19 |
ybunker | object-replicator: rsync error: error starting client-server protocol (code 5) at main.c(1653) [sender=3.1.1] | 18:19 |
ybunker | clayg: find that on some proxy nodes: | 18:37 |
ybunker | Ring file account.ring.gz is obsolete | 18:37 |
*** ybunker has quit IRC | 19:33 | |
*** e0ne has joined #openstack-swift | 20:03 | |
*** e0ne has quit IRC | 20:21 | |
clayg | that whole "early quorum" stuff maybe isn't so great when the early part of the quorum is an error | 20:21 |
mattoliverau | So are we meeting today? it seems both of notmyname's things were answered | 20:32 |
clayg | that post_quorum_timeout is a heck of a setting | 20:53 |
kota_ | morning | 21:00 |
* kota_ is scrolling back to know if the meeting happens | 21:01 | |
mattoliverau | kota_ I don't know I asked the question | 21:02 |
kota_ | mattoliverau: o/ | 21:03 |
mattoliverau | kota_: notmyname asked rledisez if he's ok if the meeting is skipped and seemed to be ok with it.. maybe that means it is being skipped (reading scrollback) | 21:05 |
mattoliverau | So /me might go and eat breakfast :) | 21:06 |
kota_ | mattoliverau: got it, thanks :) | 21:07 |
kota_ | my wife and kids are still asleep so let's go back to my bed :P | 21:08 |
kota_ | oic, i found the talk line about skipping meeting. | 21:10 |
mattoliverau | kota_: yeah go sleep while you can ;) see you a little later here :) | 21:11 |
*** e0ne has joined #openstack-swift | 21:13 | |
notmyname | kota_: mattoliverau: yeah, thanks for being flexible :-) | 21:16 |
zaitcev | timburke: do you think Request.path_info returns WSGI string or native string? Seems like returning WSGI in the current code, but what would you do if you had a clean slate? | 21:35 |
timburke | zaitcev, i think i'd keep it as wsgi, just since we're proxying straight through to env[PATH_INFO] | 21:44 |
timburke | we *do* at least have Request.swift_entity_path to get us native strings... | 21:45 |
zaitcev | timburke: okay. That decision means a ton of wsgi_to_str, is all. | 21:45 |
timburke | *maybe* Request.path should do that, too? | 21:45 |
timburke | what about introducing a Request.path_info_str property or something? | 21:46 |
zaitcev | I hit it in slo, and I'm going to roll those extra wsgi_to_str in other modules. Tests didn't cover it adequately. | 21:46 |
zaitcev | Hmm. | 21:46 |
zaitcev | Let's start with adding wsgi_to_str everywhere and then maybe path_info_str if it gets too much. The annoying path really is the self.split_path, because that one has v,a,c,o = self.split_path() pattern and inserting conversions is annoying. | 21:47 |
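A tiny illustration (not a patch) of the pattern zaitcev is describing: on py3, Request.split_path() hands back WSGI strings, so each component a middleware wants as a native string needs its own wsgi_to_str() call. The helper name below is hypothetical; the imports are the real swob names:

```python
from swift.common.swob import Request, wsgi_to_str

def native_container_name(env):
    """Return the container segment of the request path as a native str."""
    req = Request(env)
    # version/account/container/object all come back as WSGI strings
    vrs, account, container, obj = req.split_path(3, 4, True)
    # the per-component conversion is the "annoying" part being discussed
    return wsgi_to_str(container)
```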
*** e0ne has quit IRC | 21:58 | |
*** e0ne has joined #openstack-swift | 22:01 | |
*** e0ne has quit IRC | 22:01 | |
*** rcernin has joined #openstack-swift | 22:52 | |
clayg | oh how wise he is... https://bugs.launchpad.net/swift/+bug/1503161/comments/17 | 22:53 |
openstack | Launchpad bug 1503161 in OpenStack Object Storage (swift) "[Re-open in 2015 Oct] DELETE operation not write affinity aware" [Medium,Fix released] - Assigned to Lingxian Kong (kong) | 22:53 |
*** tkajinam has joined #openstack-swift | 23:01 | |
timburke | i don't even remember what that was about | 23:04 |