Monday, 2019-04-01

*** mrjk has quit IRC00:05
*** mrjk has joined #openstack-swift00:06
*** mikecmpbll has quit IRC00:45
*** irclogbot_0 has quit IRC02:19
*** psachin has joined #openstack-swift03:17
*** psachin has quit IRC05:09
*** psachin has joined #openstack-swift05:15
*** ianychoi has quit IRC05:23
*** ianychoi has joined #openstack-swift05:24
*** psachin has quit IRC05:57
*** psachin has joined #openstack-swift06:05
*** ccamacho has quit IRC06:53
*** rcernin has quit IRC06:58
*** pcaruana has joined #openstack-swift06:58
*** pcaruana has quit IRC07:02
*** pcaruana has joined #openstack-swift07:02
*** psachin has quit IRC07:08
*** renich has joined #openstack-swift07:08
*** rdejoux has joined #openstack-swift07:12
*** psachin has joined #openstack-swift07:16
*** psachin has quit IRC07:16
*** ccamacho has joined #openstack-swift07:22
*** e0ne has joined #openstack-swift07:40
*** mikecmpbll has joined #openstack-swift07:43
*** gkadam has joined #openstack-swift08:02
*** e0ne has quit IRC08:11
*** e0ne has joined #openstack-swift08:17
*** renich has quit IRC08:19
*** gkadam has quit IRC08:32
*** tkajinam has quit IRC08:34
*** mikecmpbll has quit IRC08:38
*** mikecmpb_ has joined #openstack-swift08:39
*** e0ne has quit IRC08:50
*** e0ne has joined #openstack-swift08:59
*** e0ne has quit IRC09:16
*** gkadam has joined #openstack-swift09:28
*** e0ne has joined #openstack-swift09:35
*** e0ne has quit IRC10:45
*** e0ne has joined #openstack-swift11:03
*** e0ne has quit IRC11:10
*** ybunker has joined #openstack-swift11:27
*** e0ne has joined #openstack-swift12:11
*** rcernin has joined #openstack-swift12:19
zigoHi there !12:24
zigoI was wondering, how can I simulate a broken HDD, so that the drive audit does its job of commenting out the entry in /etc/fstab, etc.?12:24
*** e0ne has quit IRC12:38
*** mvkr has quit IRC12:49
zigoI could remove a drive with qemu's console, now how do I force Swift to audit the drive and remove it from fstab like with broken hdds?13:02
*** mvkr has joined #openstack-swift13:16
*** irclogbot_3 has joined #openstack-swift13:27
*** rcernin has quit IRC13:29
*** mrjk has quit IRC13:45
*** mrjk has joined #openstack-swift13:48
*** e0ne has joined #openstack-swift14:06
*** rchurch has joined #openstack-swift14:27
ybunkerhi all, quick question: is there a way to manually delete partitions when drives are 100% full? we can't run object-replicator during the day because of excessive latency, so we run it in a daily maintenance window (4h); we already set the handsoff_ parameters but the drives are still at 100% used space14:43
ybunkerso i was wondering if it's possible to manually free up some space there14:43
ybunkerhas anyone run into this kind of situation?14:44
*** itlinux_ has quit IRC14:50
*** e0ne has quit IRC15:24
*** manous has joined #openstack-swift15:30
manoushi All15:30
manoushow can i solve this issue https://paste.fedoraproject.org/paste/4xm~fOEhTLOMqdxmIecMBQ15:31
*** e0ne has joined #openstack-swift15:44
*** itlinux has joined #openstack-swift15:46
*** e0ne has quit IRC15:47
openstackgerritTim Burke proposed openstack/swift master: WIP: s3api: Make multi-deletes async  https://review.openstack.org/64826315:50
timburkegood morning15:50
*** renich has joined #openstack-swift16:29
*** mikecmpb_ has quit IRC16:34
ybunkeranyone?16:39
*** e0ne has joined #openstack-swift16:45
*** gkadam has quit IRC17:05
*** e0ne has quit IRC17:35
*** zigo has quit IRC17:37
claygso with p 571906 once you realize that unquoted symlinks work poorly, and mostly on accident - and that quoted symlinks always work on purpose - it becomes easy to start to think "well, let's just get rid of unquoted symlinks with normalization on the way in, and all the unquoted symlinks we already have on disk that currently work will continue to work"17:50
patchbothttps://review.openstack.org/#/c/571906/ - swift - Make symlink work with Unicode account names - 4 patch sets17:50
claygi think it's a good bug fix personally17:50
claygall credit to timburke17:51
*** e0ne has joined #openstack-swift17:57
*** mvkr has quit IRC18:01
*** rdejoux has quit IRC18:02
timburkefwiw, i think we'll almost certainly want to get that in before trying to port symlink to py3, too18:07
timburkeonce you've got that loaded in your head, you might want to look at https://review.openstack.org/#/c/571907/ too18:10
patchbotpatch 571907 - swift - Make staticweb return URL-encoded Location headers - 2 patch sets18:10
timburkethose were both part of a long chain leading toward https://review.openstack.org/#/c/571908/ -- i don't actually remember what the failures on that were now, though...18:11
patchbotpatch 571908 - swift - Support Unicode in account and user names during f... - 1 patch set18:11
*** klamath has joined #openstack-swift18:18
klamathHowdy, wondering if anyone is around to look at a weird container error im seeing18:18
klamathseeing this error on liberty when trying to stat a container:18:20
klamathApr  1 17:55:21 908172-r2-z2-swiftstorage008 container-server: ERROR __call__ error with GET /disk48/67379/AUTH_XXXXXX/XXXXXX :18:20
klamathTraceback (most recent call last):18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/container/server.py", line 582, in __call__18:20
klamath    res = method(req)18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/common/utils.py", line 2693, in wrapped18:20
klamath    return func(*a, **kw)18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/common/utils.py", line 1230, in _timing_stats18:20
klamath    resp = func(ctrl, *args, **kwargs)18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/container/server.py", line 469, in GET18:20
klamath    resp_headers = gen_resp_headers(info, is_deleted=is_deleted)18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/container/server.py", line 54, in gen_resp_headers18:20
klamath    'X-Backend-Timestamp': Timestamp(info.get('created_at', 0)).internal,18:20
klamath  File "/openstack/venvs/swift-12.0.13/lib/python2.7/site-packages/swift/common/utils.py", line 756, in __init__18:20
klamath    self.timestamp = float(parts.pop(0))18:20
klamathValueError: invalid literal for float(): 14870&4222&3r98018:20
*** e0ne has quit IRC18:28
*** itlinux has quit IRC18:44
claygthat looks like maybe a fast post related timestamp encoding maybe?18:44
klamathcan you explain more clayg?18:45
claygwell, i need to eat some lunch... i was gunna search launchpad for a value error tho.. isn't liberty kinda old-ish?18:47
klamathyes liberty is oldish, the problem just started a few days ago18:47
*** mvkr has joined #openstack-swift18:48
claygORLY!  what changed!?18:49
claygso you'll probably want to find the sqlite databases and compare the object rows in the problem db to a container that doesn't seem to have this problem.18:49
claygyou may also want to verify whether all three copies of the sqlite database have weird rows... let me see if I can look at a raw composite timestamp real quick18:50
klamathnothing changed on the 29th, all 4 container servers are reporting the same weird container listing times18:51
claygcan you tell if it's for more than one object?18:52
klamathit is affecting the container itself, can't pull any stat listings from those bad containers with timestamps, looking at around 21 total containers having this problem18:53
claygok, can you find the sqlite db's on disk?  maybe using swift-get-nodes - i see you redacted the path AUTH_XXXXXX/XXXXXX18:54
claygit's on disk48 in partition 67379 *somewhere*18:54
*** itlinux has joined #openstack-swift18:55
*** renich has quit IRC18:55
klamathyes we found the db on disk and looked at the container in question18:57
*** BjoernT has joined #openstack-swift18:57
klamathINSERT INTO "container_info" VALUES('AUTH_XXX','XXXXX','14870&4222&3r980','1487064222.32110','0','1487064222.32110','0',63819,2453015870,'e7d93b85101b026fad7275a0d8927e3b','75b35dfa-9081-468c-aa3a-eb3399a96768','','1487064222.32110','',-1,-1,0,889692);18:57
klamath18:57
klamathCREATE TABLE container_info (18:57
klamath        account TEXT,18:57
klamath        container TEXT,18:57
klamath        created_at TEXT,18:57
klamath        put_timestamp TEXT DEFAULT '0',18:57
klamath        delete_timestamp TEXT DEFAULT '0',18:57
klamath        reported_put_timestamp TEXT DEFAULT '0',18:57
klamath        reported_delete_timestamp TEXT DEFAULT '0',18:57
klamath        reported_object_count INTEGER DEFAULT 0,18:57
klamath        reported_bytes_used INTEGER DEFAULT 0,18:58
klamath        hash TEXT default '00000000000000000000000000000000',18:58
klamath        id TEXT,18:58
klamath        status TEXT DEFAULT '',18:58
klamath        status_changed_at TEXT DEFAULT '0',18:58
klamath        metadata TEXT DEFAULT '',18:58
klamath        x_container_sync_point1 INTEGER DEFAULT -1,18:58
klamath        x_container_sync_point2 INTEGER DEFAULT -1,18:58
klamath        storage_policy_index INTEGER DEFAULT 0,18:58
klamath        reconciler_sync_point INTEGER DEFAULT -118:58
klamath    );18:58
klamathproblem is with the delete_timestamp and the non numeric values stored in it18:58
timburkei'm guessing its some db corruption -- the string length is right for a timestamp, and '&' and '6' or '&' and '.' are just a few bitflips away from each other19:08
timburkeeven '2' and 'r' are just one bitflip away...19:09
klamathany way to update these?19:09
klamathwould posting to the container update the delete_timestamp?19:10
timburkewas it the delete, or the create timestamp that was causing trouble? i thought create...19:12
timburkefrom a sqlite3 prompt, something like `UPDATE container_info SET created_at='1487064222.32980';` would probably do19:12
timburkedo all replicas have that, or was it at least limited to just one db?19:12
klamathwould you need to acquire a lock on all dbs or just one and have it propagate out?19:12
klamathall dbs are showing the same bad timestamp19:13
timburkemight be worth looking around to see if you can establish a consensus about what it *should* be19:13
timburke:-(19:13
claygtimburke: you must have guessed db corruption and then played with the bytes?  You don't just intuitively KNOW that '&' and '6' are near each other in an ascii table!?  DO YOU!?19:13
claygtimburke: object will have an x-timestamp - might not even be corrupted since we checksum object metadata19:13
* timburke shrugs innocently19:14
klamathwe havent made any changes to the db at this point, just ro19:14
klamathyea it appears in this case created_at is corrupt19:14
claygif it's at all useful, composite timestamps look like:            created_at = 1554146043.88703+991803+019:16
claygso... my guess was wrong.19:16
timburkeso, a thing worth noting: now that we've identified at least *one* corrupt db... and likely seen that corruption spreading to *other* dbs... i'm more than a little worried about what *else* might be corrupted19:16
claygklamath: yeah post to the object with some bs metadata might be good enough... i don't know liberty...19:16
claygtimburke: try not to stress about that and just put "more checksumming of sqlite data" on the todo list somewhere19:17
claygsqlite has some internal checksumming - it might be interesting to dig into how it managed to fail in this case19:18
claygklamath: what version of sqlite are you running!?19:18
claygtimburke: I think the newest version of the replicator might have to be a bit smarter about having to parse rows (you were looking at merge_items recently) - it's possible it wouldn't have been able to propagate the corruption19:19
klamath2.8.1719:20
klamathany pointers on a metadata update that would trigger a container update?19:23
BjoernTwe run sqlite3, which introduced a new locking mechanism: "SQLite Version 3.0.0 introduced a new locking and journaling mechanism designed to improve concurrency over SQLite version 2 and to reduce the writer starvation problem. The new mechanism also allows atomic commits of transactions involving multiple database files. This document describes the new locking mechanism. The intended audience is programmers who want to understand and/or modify the pager19:24
BjoernT code and reviewers working to verify the design of SQLite version 3." That document also has a section on "How To Corrupt Your Database Files"19:24
BjoernT libsqlite3-0:amd64                    3.8.2-1ubuntu2.1                 amd64        SQLite 3 shared library19:25
*** e0ne has joined #openstack-swift19:27
BjoernTperhaps nobarrier screwed us over here19:27
timburkeclayg, i guess it'd probably be worth pulling https://github.com/openstack/swift/blob/2.21.0/swift/common/db.py#L566-L570 out of SQL -- parse the values as actual timestamps, make the comparisons in python and store the greater...19:29
timburkeklamath, it's particularly tricky because it's created_at that's corrupted -- and that only gets set (as i recall) during the broker's _initialize, so only when you don't already have a db file on disk19:30
claygOh, it’s the container info 😂19:31
timburkeif it were put_timestamp instead, you could probably just issue a new PUT for the container, but as it is... not sure there's a good way to fix this via the swift API19:31
klamathcan we manually update that created_at on the sqlite level and have it replicate out?19:31
*** ybunker has quit IRC19:31
claygThat actually explains how the corruption spread a little better. But not between db. Common disk maybe?19:34
timburkemight be safest to stop the container replicators on the affected nodes, manually run the update, then restart replicators. the trouble is that the '&' is going to compare less than '6', so the corrupt timestamp will win out during replication19:34
claygHaha19:34
timburkemaybe if you don't mind an inaccurate created_at, you could set it to 1486964222.32980 instead of 1487064222.32980?19:35
*** spsurya has quit IRC19:36
timburkeand be very very happy that the corruption didn't occur in that leading digit ;-)19:36
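
The ordering concern above can be checked with plain string comparison. This is a minimal sketch, assuming (as the discussion implies) that replication keeps whichever created_at string compares lowest; the `surviving_created_at` helper is hypothetical and only models that rule, it is not Swift's code.

```python
# Values quoted in the discussion above.
CORRUPT = "14870&4222&3r980"
INTENDED = "1487064222.32980"
EARLIER = "1486964222.32980"   # the deliberately earlier replacement


def surviving_created_at(*values):
    """Hypothetical model of 'lowest string wins' (lexicographic order)."""
    return min(values)


# '&' (0x26) sorts before '6' (0x36), so the corrupt value beats the intended one.
print(surviving_created_at(CORRUPT, INTENDED))

# '14869...' sorts before '14870...', so the earlier replacement beats the corrupt value.
print(surviving_created_at(CORRUPT, EARLIER))
```

Under that assumption, writing back the intended value would lose to any replica still holding the corrupt one, while the slightly earlier value would not.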
*** manous has quit IRC19:40
klamathany risk in increasing the timestamp timburke?19:41
klamathi just tried using swiftly to put a file into that bad container and it uploaded, but i still can't pull a container listing or any info from that container19:43
timburkeklamath, i think the risks to using an earlier timestamp are fairly low -- fortunately, the position of the corruption means that it'll only change the created_at by a couple days or so19:48
timburke`UPDATE container_info SET created_at='1486964222.32980';` is seeming better and better19:49
timburkedon't have to stop the replicators, should be able to do it on just one affected db...19:50
timburkestill might take a bit to have the replicators propagate it out to all replicas, though19:50
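
For anyone who prefers to script the UPDATE rather than type it at a sqlite3 prompt, here is a minimal sketch using Python's standard `sqlite3` module. It runs against an in-memory table with a reduced subset of the container_info columns pasted earlier; the `repair_created_at` helper is hypothetical, and on a real node you would presumably try this on a copy of the db file first.

```python
import sqlite3


def repair_created_at(conn, bad, good):
    """Replace a known-bad created_at value; returns the number of rows changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE container_info SET created_at = ? WHERE created_at = ?",
            (good, bad),
        )
    return cur.rowcount


# Stand-in database: only a few of the real columns, for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE container_info ("
    "account TEXT, container TEXT, created_at TEXT, "
    "put_timestamp TEXT DEFAULT '0')"
)
conn.execute(
    "INSERT INTO container_info VALUES (?, ?, ?, ?)",
    ("AUTH_XXX", "XXXXX", "14870&4222&3r980", "1487064222.32110"),
)

changed = repair_created_at(conn, "14870&4222&3r980", "1486964222.32980")
print(changed, conn.execute("SELECT created_at FROM container_info").fetchone()[0])
```

Matching on the bad value in the WHERE clause is a design choice for the sketch: it makes the statement a no-op on a db that does not carry the corruption.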
BjoernThow sure are you that only one bit flipped and not multiple ?19:51
BjoernTlooking at this timestamp puts us in 197419:51
timburkeeh? i'm seeing feb 2017...19:52
BjoernTits milli seconds ?19:52
timburkeso my thinking is that '14870&4222&3r980' was *supposed* to be '1487064222.32980' -- which required a total of four bitflips19:53
BjoernToh, I was reading the r as the .19:53
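
The date question can be settled by treating the value as seconds since the epoch. A minimal sketch with the standard library only; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Interpret a 'seconds.fraction' timestamp string as a UTC datetime."""
    return datetime.fromtimestamp(float(ts), tz=timezone.utc)


intended = to_utc("1487064222.32980")
earlier = to_utc("1486964222.32980")

print(intended.date())                        # calendar date of the intended value
print((intended - earlier).total_seconds())   # gap introduced by the earlier value
```

Read as seconds, the intended value falls in February 2017, and the proposed earlier value differs from it by roughly 100000 seconds, a little over one day.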
timburkeoh! maybe only 3 flips... for some reason i thought one of them required two flips...19:56
timburkeoh, i was trying to go . -> 6 when i needed to be going & -> . and & -> 619:58
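
The flip count is easy to verify mechanically. A minimal sketch that XORs the character codes of the corrupt and intended strings; both function names are hypothetical.

```python
CORRUPT = "14870&4222&3r980"
INTENDED = "1487064222.32980"


def bit_flips(a, b):
    """Total number of differing bits between two equal-length ASCII strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(bin(ord(x) ^ ord(y)).count("1") for x, y in zip(a, b))


def differing_chars(a, b):
    """Positions (0-based) where the strings differ, with both characters."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_chars(CORRUPT, INTENDED))
print(bit_flips(CORRUPT, INTENDED))
```

Three characters differ, and each pair is a single bit apart, which agrees with the revised count of three flips.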
klamaththat fixed the problem on this one container20:23
BjoernTgot a new date, lol 1478352919.<972920:25
BjoernT< = 4 ?20:25
BjoernTor  820:26
BjoernTprobably doesn't matter as it's subseconds20:26
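
To narrow down what a stray character might have been, one can list the valid timestamp characters that sit exactly one bit away from it. A minimal sketch, assuming a single bit flip per character as in the earlier container; `one_flip_candidates` and `TIMESTAMP_CHARS` are hypothetical names.

```python
# Characters that can legitimately appear in a plain (non-composite) timestamp.
TIMESTAMP_CHARS = "0123456789."


def one_flip_candidates(ch, alphabet=TIMESTAMP_CHARS):
    """Characters in `alphabet` whose ASCII code differs from `ch` by one bit."""
    return [c for c in alphabet if bin(ord(c) ^ ord(ch)).count("1") == 1]


print(one_flip_candidates("<"))
print(one_flip_candidates("&"))
```

For '<' this leaves both '4' and '8' as candidates, so a single flip alone cannot decide between them.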
BjoernTinterestingly it is always just container_info20:38
*** itlinux has quit IRC20:58
*** itlinux has joined #openstack-swift20:59
*** e0ne has quit IRC21:03
*** pcaruana has quit IRC21:17
*** samueldmq has joined #openstack-swift21:36
*** itlinux has quit IRC21:43
*** ccamacho has quit IRC21:43
claygklamath: WTFG!!!21:43
claygtell your boss you get a raise21:44
*** itlinux has joined #openstack-swift21:44
*** itlinux has quit IRC21:44
*** BjoernT has quit IRC22:01
*** renich has joined #openstack-swift22:31
*** rcernin has joined #openstack-swift22:40
*** tkajinam has joined #openstack-swift22:56
*** mikecmpbll has joined #openstack-swift23:15
*** itlinux has joined #openstack-swift23:24
*** renich has quit IRC23:24
*** renich has joined #openstack-swift23:38
*** openstackgerrit has quit IRC23:56
*** BjoernT has joined #openstack-swift23:57
*** timburke has quit IRC23:58
