openstackgerrit | Pete Zaitcev proposed openstack/swift master: py3: port bulk middleware https://review.openstack.org/619303 | 00:12 |
*** hoonetorg has quit IRC | 00:24 | |
*** hoonetorg has joined #openstack-swift | 00:26 | |
*** itlinux has joined #openstack-swift | 00:58 | |
*** gyee has quit IRC | 01:21 | |
*** itlinux_ has joined #openstack-swift | 01:38 | |
*** itlinux_ has quit IRC | 01:40 | |
*** itlinux has quit IRC | 01:41 | |
*** psachin has joined #openstack-swift | 02:43 | |
notmyname | kota_: rledisez: mattoliverau: looking at the meeting schedule for tomorrow (because i've got a potential conflict, or at least time crunch)... | 04:43 |
notmyname | I wanted to follow up on a couple of things that were mentioned last week | 04:43 |
kota_ | notmyname: o/ | 04:44 |
notmyname | (1) for kota_ and rledisez, what's the status of gate jobs on feature/losf? | 04:44 |
notmyname | (2) for rledisez, did you do anything about the docs that are "encouraging" device names in lieu of labels or uuids? (at least file a bug, if not a patch) | 04:44 |
notmyname | IMO, these questions can have an async answer, and if there is no follow-up needed, then we don't need a meeting tomorrow | 04:45 |
kota_ | notmyname: it hasn't progressed much yet. I briefly looked at the dsvm gate job and the zuul docs, but I haven't figured out why we dropped the dsvm gate on the losf branch. | 04:45 |
notmyname | ok | 04:45 |
kota_ | I'll ask the infra team this week. | 04:45 |
notmyname | kota_: if you need help with getting that fixed, please ask | 04:45 |
notmyname | sounds good | 04:45 |
*** ianychoi_ has joined #openstack-swift | 05:24 | |
*** ianychoi has quit IRC | 05:28 | |
*** hoonetorg has quit IRC | 05:33 | |
*** hoonetorg has joined #openstack-swift | 05:50 | |
*** e0ne has joined #openstack-swift | 06:31 | |
*** zaitcev has quit IRC | 07:13 | |
*** ccamacho has joined #openstack-swift | 07:41 | |
*** e0ne has quit IRC | 07:47 | |
*** e0ne has joined #openstack-swift | 07:54 | |
*** admin6_ has joined #openstack-swift | 07:58 | |
*** admin6 has quit IRC | 08:00 | |
*** admin6_ is now known as admin6 | 08:00 | |
*** hseipp has joined #openstack-swift | 08:01 | |
*** tkajinam has quit IRC | 08:09 | |
*** e0ne has quit IRC | 08:16 | |
*** pcaruana has joined #openstack-swift | 08:29 | |
rledisez | notmyname: about the device name vs label/uuid, i filed a bug ( https://bugs.launchpad.net/swift/+bug/1817966 ) and answered the mail. no patch yet, still on my todo list | 08:52 |
openstack | Launchpad bug 1817966 in OpenStack Object Storage (swift) "Encourage the use of static device names in fstab" [Undecided,New] | 08:52 |
rledisez | clayg: I've thought many times about removing the REPLICATE call after data transmission. I'm pretty sure I did it once in a specific situation, but never permanently; I was not totally sure of all the consequences. But it seems reasonable to do it with SSYNC | 08:59 |
admin6 | hi team, do you have any idea why one disk, with the same size as the other disks of a node and the same relative weight in the ring, could suddenly start to be filled up by swift? Even after reducing its weight it continues to grow, up to 100% full. And furthermore, I now have the same behavior on all the disks of this server… | 09:30 |
rledisez | admin6: can you check the folders other than objects (tmp, quarantines, etc…) to see where it is growing? | 09:44 |
rledisez | we already had a case where an object was replicated, then quarantined, then replicated, etc... | 09:45 |
admin6 | rledisez: it didn't seem to be that. I've checked the size of the quarantined dir on these disks and the values are standard, about the same compared to another server. | 09:57 |
admin6 | rledisez: I've also checked the async_pending and tmp dirs. | 10:01 |
rledisez | admin6: then it is probably dispersion | 10:06 |
admin6 | rledisez: could you be a bit more precise? I know I have a really bad dispersion currently, because I'm trying to reduce the number of zones in my ring from 6 to 4 (or maybe 5 zones), but I've paused this project for a while. Here is the dispersion of this ring: Dispersion is 24.258486, Balance is 21.684194, Overload is 10.00%, Required overload is 358.386218%, Worst tier is 43.564049 (r1z2-10.10.1.52) | 10:11 |
rledisez | i'm talking dispersion on disk, not in the ring. it means the replicator/reconstructor still need to work to put all data back in place. are you using ssync or rsync ? | 10:33 |
rledisez | there is no real tool to check that easily | 10:33 |
rledisez | you basically need to request the ring for the partition placed on a given device | 10:34 |
rledisez | then compare with what's on disk (ls /srv/node/…/objects/) | 10:34 |
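A minimal sketch of the comparison rledisez describes, assuming the object ring lives at /etc/swift/object.ring.gz, the disk is mounted under /srv/node, and using a placeholder device name (adjust both for your cluster):

```python
# Rough comparison of ring-assigned partitions vs. partition dirs on disk.
# Anything on disk but not assigned is handoff/stale data that the
# replicator or reconstructor still has to move off this device.
import os
from swift.common.ring import Ring

ring = Ring('/etc/swift/object.ring.gz')
device = 'sdb1'                                  # placeholder device name
objects_dir = '/srv/node/%s/objects' % device    # objects-N for other policies

assigned = set()
for part in range(ring.partition_count):
    if any(node['device'] == device for node in ring.get_part_nodes(part)):
        assigned.add(str(part))

on_disk = set(name for name in os.listdir(objects_dir) if name.isdigit())
print('assigned by ring : %d' % len(assigned))
print('present on disk  : %d' % len(on_disk))
print('on disk, not assigned (handoffs/stale): %d' % len(on_disk - assigned))
```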
*** e0ne has joined #openstack-swift | 10:35 | |
admin6 | rledisez: I’m using ssync with swift 2.17. How can I get from the ring the list of partitions placed on a given drive? | 10:53 |
*** FlorianFa has quit IRC | 11:01 | |
*** FlorianFa has joined #openstack-swift | 11:02 | |
*** pcaruana has quit IRC | 11:04 | |
*** ianychoi_ is now known as ianychoi | 11:20 | |
*** pcaruana has joined #openstack-swift | 11:32 | |
admin6 | Do you know how I can request a ring to get the list of partitions placed on a given drive ? | 11:38 |
admin6 | sorry, swift-ring-builder list_parts seems to be a good candidate for my previous question ;-) | 12:01 |
*** henriqueof has joined #openstack-swift | 12:02 | |
*** mvkr has joined #openstack-swift | 12:13 | |
*** henriqueof has quit IRC | 12:16 | |
*** ybunker has joined #openstack-swift | 12:28 | |
ybunker | hi all, question: I just added two new storage nodes to an existing cluster (queens), and I noticed that the account/container files are not replicating at all. obj files are working fine, but acct and cont are not. Any ideas? In the swift_container_server.error file the only thing I'm seeing is the following err msg: | 12:29 |
ybunker | container-replicator: Can't find itself 127.0.0.1, ::1, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, fxxx::xxxx:xxxx:xxxx:xxxx, 1x.xx.xx.xx, xxxx::xxxx:xxxx:xxxx:xxxx with port 5103 in ring file, not replicating | 12:29 |
ybunker | and the swift_container_server.log file is full of 404 errors on PUT operations | 12:30 |
ybunker | any ideas? | 12:35 |
ybunker | anyone? | 12:45 |
ybunker | the container ring file (http://pasted.co/3314af99) | 12:54 |
ybunker | nodes 10.1.1.19 and 10.1.1.20 are the new ones... | 12:54 |
ybunker | ?? | 13:19 |
ybunker | in the rsyncd log I'm seeing this error msg (unknown module 'container' tried) | 13:31 |
admin6 | ybunker: maybe you forgot to declare the account and container sections in rsyncd.conf on the new servers? | 13:39 |
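For reference, a minimal sketch of the rsyncd.conf modules the account/container/object replicators expect by default (paths and uid/gid are assumptions; the module names must match what the replicators request, and "unknown module 'container' tried" usually means the [container] module is missing or named differently):

```ini
uid = swift
gid = swift
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid

[account]
path = /srv/node
read only = false
lock file = /var/lock/account.lock

[container]
path = /srv/node
read only = false
lock file = /var/lock/container.lock

[object]
path = /srv/node
read only = false
lock file = /var/lock/object.lock
```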
ybunker | admin6: I did it already (http://pasted.co/e62c72a9) | 13:43 |
ybunker | I don't know what else to look for... account and container are not replicating at all. I checked the files inside /srv/node/ for acct and cont and there are no files at all either | 13:44 |
ybunker | is there a way to "manually" copy from other nodes, at least to lower the 404 errors a little bit? | 13:51 |
*** pcaruana has quit IRC | 13:51 | |
ybunker | ? | 13:57 |
*** pcaruana has joined #openstack-swift | 14:01 | |
ybunker | anyone could give a hint on this? i really need to get this thing running ... :S | 14:07 |
DHE | the replication service basically runs "ip addr show" to see what IPs are on this host to find itself (all entries for all disks/devices) in the container ring file, on the port(s) it listens on. it failed to find itself. | 14:08 |
DHE | there is no 10.1.1.19 in your container ringbuilder | 14:09 |
DHE | oh wait, there it is but out of order. n/m that | 14:09 |
*** FlorianFa has quit IRC | 14:09 | |
ybunker | DHE: mm what do you mean with out of order? | 14:18 |
DHE | your paste doesn't list devices in IP address order. but that's my fault for not paying enough attention | 14:21 |
ybunker | DHE: oh I see, yeah I don't know why it put server 19 (10.1.1.19) with that ID. Anyway, besides that I can't find any misconfig or anything different from the other servers. | 14:23 |
ybunker | any idea? | 14:49 |
ybunker | I compared the rsyncd file with all the other nodes in the cluster and they are the same, except of course for the address, which changes on each node | 14:49 |
ybunker | also checked the container-server.conf file, it's the same on all the nodes | 14:50 |
ybunker | I also verified that rsync is accepting connections for all servers http://pasted.co/21ab29c8 | 14:59 |
ybunker | mmm on the logs of the container now is showing: container-sync: Skipping tmp as it is not mounted | 15:09 |
ybunker | container-sync: Skipping containers as it is not mounted | 15:09 |
ybunker | container-sync: Skipping accounts as it is not mounted | 15:09 |
admin6 | Hi all. I’m still working on my "disk full" problem for an erasure coding ring. As rledisez suggested, I had a look at the "disk dispersion" and I found a big delta between the list_parts declared in the ring (14127 declared on this disk) and the existing directories in the object folder (57000+ directories). Looking into some of the 43000 additional dirs that are not listed in list_parts, I see a lot of directories | 15:21 |
admin6 | filled with real data (valid fragments of objects) that have been accessed recently, but have nothing to do there, as they are not among the primary fragment placements nor the first 12 handoffs. It looks like they are old handoffs that are parsed by some swift process but never cleaned up. Might I have missed something in my reconstructor config that prevents cleaning these files? | 15:21 |
notmyname | rledisez: alecuyer: how do you feel about skipping today's meeting? but I see you added something about grpc to the agenda... | 15:30 |
rledisez | notmyname: it can be skipped, nothing urgent | 15:32 |
notmyname | rledisez: ok | 15:32 |
ybunker | any ideas? I'm kind of stuck here | 15:47 |
*** pcaruana has quit IRC | 15:53 | |
*** pcaruana has joined #openstack-swift | 16:06 | |
*** ccamacho has quit IRC | 16:27 | |
ybunker | ... | 16:36 |
clayg | admin6: handoffs_only + reconstructor_workers = #_of_disks should get you back on track | 16:51 |
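In object-server.conf terms, clayg's suggestion looks roughly like the following sketch (handoffs_only and reconstructor_workers are real [object-reconstructor] options; the worker count is just an example for a 12-disk node):

```ini
[object-reconstructor]
# only push handoff partitions back to their primaries, skip normal repair work
handoffs_only = true
# roughly one worker per disk so every device gets drained in parallel
reconstructor_workers = 12
# remember to set handoffs_only back to false once the handoffs have drained
```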
clayg | ybunker: for some reason it sounds like the devices config option is off a directory - all account/container/object config should be /srv/node | 16:53 |
ybunker | clayg: acct and cont are inside /srv/node/1, /srv/node/2 and /srv/node/3 | 16:55 |
*** pcaruana has quit IRC | 16:55 | |
clayg | admin6: there's a couple of open bugs I know about that can cause EC handoff partition not to remove cleanly -> https://bugs.launchpad.net/swift/+bug/1816501 is one... | 16:55 |
openstack | Launchpad bug 1816501 in OpenStack Object Storage (swift) "reconstructor doesn't remove empty handoff dirs with reclaimed tombstones" [Undecided,New] | 16:55 |
clayg | admin6: also https://bugs.launchpad.net/swift/+bug/1778002 | 16:56 |
openstack | Launchpad bug 1778002 in OpenStack Object Storage (swift) "EC non-durable fragment won't be deleted by reconstructor. " [Medium,Confirmed] | 16:56 |
admin6 | clayg: thanks, but I’ve already been running with these options set since yesterday, and it has had no real effect on my disk usage. :-( | 16:56 |
clayg | ybunker: that's a little different than normal - most of the time a single node has /srv/node/<device>/[account|container|object] | 16:57 |
clayg | so the accounts|containers|objects dirs are all parallel and all the nodes devices option is /srv/node | 16:57 |
clayg | this is useful esp if your a&c and o share disks - you just mount everything into /srv/node/<device> | 16:57 |
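A sketch of the conventional layout clayg is describing, where every backend server config points `devices` at the same root and each disk is mounted one level below it (the device name is a placeholder):

```ini
# On-disk layout:
#   /srv/node/sdb1/accounts
#   /srv/node/sdb1/containers
#   /srv/node/sdb1/objects
#
# account-server.conf / container-server.conf / object-server.conf:
[DEFAULT]
devices = /srv/node
mount_check = true
```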
ybunker | clayg: i've /etc/swift/account-server/1.conf... 2.conf... 3.conf... r_1.conf...r_2.conf and r_3.conf on all the data nodes | 16:57 |
clayg | ybunker: you CAN set it up the way you described - but I bet you have to be extra careful with your rsync.conf | 16:58 |
ybunker | clayg: yeah... the thing is that the rsync conf is the same for all the nodes.. and actually objects are being replicated.. but acct and cont are not :( | 16:58 |
clayg | separate configs for front-end and backend services is not uncommon, you certainly CAN manage all account services in a single config - or weird setups like the SAIO have multiple configs to simulate multiple "nodes" on a single machine | 16:59 |
clayg | ybunker: so maybe the rsync config is correct for object (you said /srv/node/1?) and so then it's wrong for a&c? | 17:00 |
ybunker | clayg: objects are on /srv/node/4 .../5... /12, and acct cont on /srv/node/1... /2... /3 | 17:00 |
clayg | admin6: for the referenced bugs something like https://gist.github.com/clayg/7a975ef3b34828c5ac7db05a519b6e8a might help 🤷♂️ | 17:00 |
ybunker | i check permissions and are the same for all the directories | 17:01 |
clayg | admin6: but if you have DATA in the disks really you should just need to crank up the reconstructor handoff_only workers and let her run | 17:01 |
clayg | what is the reconstructor *doing* - are the disks "busy" (i.e. iostat -dmx 2) | 17:01 |
*** ccamacho has joined #openstack-swift | 17:02 | |
clayg | ybunker: maybe paste your `account-replicator` config, and your `rsync.conf` and the output of `swift-account-replicator /replicator.conf once verbose` | 17:03 |
clayg | ybunker: it seems like you have a slightly unorthodox configuration, and probably something where the default is normally fine needs to be changed so all the pieces are wired up correctly | 17:03 |
clayg | there's probably a hint in the logs - you just need to cut through the noise and zoom in on what's really broken (the error msg may be only kind of "indirectly" related) | 17:04 |
clayg | ybunker: I'd guess it's something to do with rsync_module maybe - that stuff is trixy | 17:06 |
ybunker | clayg: let me post the config files, hopefully is something like that | 17:06 |
*** gyee has joined #openstack-swift | 17:06 | |
clayg | admin6: I need to bounce off for a bit - good luck! the reconstructor is a beast - you can get it tuned - don't forget to check your incoming concurrency settings (reconstructor talks to object-server - there's a replication related concurrency setting you may need to open up) | 17:07 |
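The "incoming" concurrency setting clayg alludes to lives on the object-server side; a hedged example (these are real object-server.conf options, the values are only illustrative):

```ini
[object-server]
# cap on concurrent incoming SSYNC/REPLICATE requests; 0 means unlimited
replication_concurrency = 0
# per-disk cap on concurrent replication requests
replication_concurrency_per_device = 1
```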
admin6 | clayg: thanks a lot | 17:07 |
clayg | admin6: if you've never done a rebalance before there's lots of misconfigs that could be in effect (see ybunker's current crisis) - maybe https://bugs.launchpad.net/swift/+bug/1446873 ??? | 17:08 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Confirmed] | 17:08 |
clayg | ybunker: admin6: either of you all going to come see us in Denver for the Summit!? Swift party time! I'll buy you a beer 🍻 | 17:09 |
admin6 | clayg: no chance of seeing me there unfortunately, but I'd really enjoy having a beer with you at another swift party some time. | 17:11 |
ybunker | clayg: here are the config details (http://pasted.co/34e6f214) | 17:15 |
*** ccamacho has quit IRC | 17:15 | |
guimaluf | hi all, when I run swift-ring-builder dispersion on my object.builder I'm getting 0.58% dispersion, which means 189 partitions within the same region. How can I fix this? Should I increase weight on one node in r1? or decrease weight in r2? any suggestion would be appreciated | 17:16 |
*** ccamacho has joined #openstack-swift | 17:19 | |
ybunker | clayg: it seems that container-replicator has nothing to replicate at all... http://pasted.co/15f955ef | 17:24 |
*** hseipp has quit IRC | 17:27 | |
*** e0ne has quit IRC | 17:34 | |
ybunker | clayg: did you find anything wrong on the configs? | 17:53 |
ybunker | can I push the data from acct cont from another of the nodes? | 17:54 |
*** ccamacho has quit IRC | 18:00 | |
*** psachin has quit IRC | 18:05 | |
*** zaitcev has joined #openstack-swift | 18:13 | |
*** ChanServ sets mode: +v zaitcev | 18:13 | |
ybunker | ? | 18:18 |
ybunker | ok.. now im getting: | 18:19 |
ybunker | object-replicator: rsync error: error starting client-server protocol (code 5) at main.c(1653) [sender=3.1.1] | 18:19 |
ybunker | clayg: find that on some proxy nodes: | 18:37 |
ybunker | Ring file account.ring.gz is obsolete | 18:37 |
*** ybunker has quit IRC | 19:33 | |
*** e0ne has joined #openstack-swift | 20:03 | |
*** e0ne has quit IRC | 20:21 | |
clayg | that whole "early quorum" stuff maybe isn't so great when the early part of the quorum is an error | 20:21 |
mattoliverau | So are we meeting today? it seems both of notmyname's things were answered | 20:32 |
clayg | that post_quorum_timeout is a heck of a setting | 20:53 |
kota_ | morning | 21:00 |
* kota_ is scrolling back to know if the meeting happens | 21:01 | |
mattoliverau | kota_ I don't know I asked the question | 21:02 |
kota_ | mattoliverau: o/ | 21:03 |
mattoliverau | kota_: notmyname asked rledisez if he's ok if the meeting is skipped and seemed to be ok with it.. maybe that means it is being skipped (reading scrollback) | 21:05 |
mattoliverau | So /me might go and eat breakfast :) | 21:06 |
kota_ | mattoliverau: got it, thanks :) | 21:07 |
kota_ | my wife and kids are still asleep so let's go back to my bed :P | 21:08 |
kota_ | oic, i found the talk line about skipping meeting. | 21:10 |
mattoliverau | kota_: yeah go sleep while you can ;) see you a little later here :) | 21:11 |
*** e0ne has joined #openstack-swift | 21:13 | |
notmyname | kota_: mattoliverau: yeah, thanks for being flexible :-) | 21:16 |
zaitcev | timburke: do you think Request.path_info returns WSGI string or native string? Seems like returning WSGI in the current code, but what would you do if you had a clean slate? | 21:35 |
timburke | zaitcev, i think i'd keep it as wsgi, just since we're proxying straight through to env[PATH_INFO] | 21:44 |
timburke | we *do* at least have Request.swift_entity_path to get us native strings... | 21:45 |
zaitcev | timburke: okay. That decision means a ton of wsgi_to_str, is all. | 21:45 |
timburke | *maybe* Request.path should do that, too? | 21:45 |
timburke | what about introducing a Request.path_info_str property or something? | 21:46 |
zaitcev | I hit it in slo, and I'm going to roll those extra wsgi_to_str in other modules. Tests didn't cover it adequately. | 21:46 |
zaitcev | Hmm. | 21:46 |
zaitcev | Let's start with adding wsgi_to_str everywhere and then maybe path_info_str if it gets too much. The annoying path really is the self.split_path, because that one has v,a,c,o = self.split_path() pattern and inserting conversions is annoying. | 21:47 |
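A tiny illustration (not a patch) of the pattern zaitcev is describing: on py3, Request.split_path() hands back WSGI strings, so each component a middleware wants as a native string needs its own wsgi_to_str() call. The helper name below is hypothetical; the imports are the real swob names:

```python
from swift.common.swob import Request, wsgi_to_str

def native_container_name(env):
    """Return the container segment of the request path as a native str."""
    req = Request(env)
    # version/account/container/object all come back as WSGI strings
    vrs, account, container, obj = req.split_path(3, 4, True)
    # the per-component conversion is the "annoying" part being discussed
    return wsgi_to_str(container)
```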
*** e0ne has quit IRC | 21:58 | |
*** e0ne has joined #openstack-swift | 22:01 | |
*** e0ne has quit IRC | 22:01 | |
*** rcernin has joined #openstack-swift | 22:52 | |
clayg | oh how wise he is... https://bugs.launchpad.net/swift/+bug/1503161/comments/17 | 22:53 |
openstack | Launchpad bug 1503161 in OpenStack Object Storage (swift) "[Re-open in 2015 Oct] DELETE operation not write affinity aware" [Medium,Fix released] - Assigned to Lingxian Kong (kong) | 22:53 |
*** tkajinam has joined #openstack-swift | 23:01 | |
timburke | i don't even remember what that was about | 23:04 |