acoles | ho: there must be something more important than 169035 ;) | 00:01 |
---|---|---|
acoles | notmyname: ^^ ? | 00:01 |
*** vinsh has quit IRC | 00:01 | |
ho | acoles: yeah, 169035 is not a big one. | 00:02 |
notmyname | acoles: ho: I defer to peluse and clayg about https://review.openstack.org/#/c/131872/ | 00:02 |
*** annegentle has quit IRC | 00:02 | |
notmyname | ho: this one is easy, and I want it to be in kilo (for non-technical reasons--for DefCore reasons) https://review.openstack.org/#/c/167828/ | 00:02 |
notmyname | ho: and torgomatic just pushed up a new version of multi-range for EC. that would be great to get in (but since [my] this morning I've been assuming it won't) | 00:04 |
ho | notmyname: acoles: thanks! I will review 167828 first then try 131872 | 00:04 |
notmyname | ho: thanks! | 00:04 |
*** zhill has quit IRC | 00:06 | |
clayg | what'd i do? | 00:07 |
notmyname | clayg: I was deferring to you (and peluse) on what to do next with the reconstructor patch | 00:08 |
clayg | notmyname: oh, nothing really much to review - i'm still working on the reconstructor tests | 00:09 |
yuan | torgomatic, for 168254, there are some other more complicated cases, like mismatched versions on the first K nodes, if we could read from (k+m) nodes then we have more chance to decode the object | 00:10 |
torgomatic | yuan: true, although the code we have on feature/ec today just gives up when it finds mismatched fragment archives | 00:10 |
yuan | torgomatic, yeah I made some updates on that part of code, it tries to get all the responses from k+m nodes and check if there's enough data to decode for one version | 00:12 |
torgomatic | yuan: ok, I'll take another look | 00:12 |
yuan | thanks, the general idea was to count the etags found and check if there's sum(one etag) >= ecpolicy.ec_ndata | 00:14 |
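The etag-counting idea yuan describes could be sketched roughly as below. This is only an illustration, not the code on feature/ec: the function name `decodable_etag` and the plain list of etags are assumptions, and `ec_ndata` stands in for `ecpolicy.ec_ndata`.

```python
from collections import Counter


def decodable_etag(fragment_etags, ec_ndata):
    """Return an etag reported by at least ec_ndata fragments, else None.

    fragment_etags holds one etag per responding node (hypothetical input);
    None entries stand for nodes that returned no usable fragment.
    """
    counts = Counter(etag for etag in fragment_etags if etag is not None)
    # most_common() yields (etag, count) pairs, highest count first.
    for etag, seen in counts.most_common():
        if seen >= ec_ndata:
            return etag
    return None
```

Reading from all k+m nodes gives the counter more responses to work with, so one version is more likely to reach the `ec_ndata` threshold even when the first k nodes disagree.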
openstackgerrit | Alistair Coles proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 00:17 |
openstackgerrit | Alistair Coles proposed openstack/swift: Fix ssync sender cleanup of reverted fragment files https://review.openstack.org/169052 | 00:17 |
*** rdaly2 has joined #openstack-swift | 00:17 | |
acoles | ho: ^^ patch 169052 is ready for review too :) | 00:19 |
patchbot | acoles: https://review.openstack.org/#/c/169052/ | 00:19 |
acoles | peluse: clayg: ^^ i think that's heading the right way but i'm heading to bed! | 00:19 |
*** tsg has quit IRC | 00:20 | |
*** acoles is now known as acoles_away | 00:21 | |
*** rdaly2 has quit IRC | 00:21 | |
*** Tahmina has quit IRC | 00:27 | |
*** dmorita has joined #openstack-swift | 00:31 | |
*** lcurtis has quit IRC | 00:34 | |
*** kota_ has joined #openstack-swift | 00:36 | |
kota_ | morning, everyone :) | 00:37 |
ho | kota_: morning! | 00:40 |
openstackgerrit | Samuel Merritt proposed openstack/swift: Add some debug output to the ring builder https://review.openstack.org/146945 | 00:47 |
openstackgerrit | John Dickinson proposed openstack/swift: Check if REST API version is valid https://review.openstack.org/168509 | 00:48 |
notmyname | ok, I'll be back later tonight | 00:52 |
*** Nadeem_ has quit IRC | 00:54 | |
*** annegentle has joined #openstack-swift | 00:58 | |
mattoliverau | kota_: morning | 00:58 |
mattoliverau | notmyname: o/ | 00:59 |
kota_ | ho, mattoliverau: o/ | 00:59 |
openstackgerrit | paul luse proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 01:07 |
*** annegentle has quit IRC | 01:12 | |
*** annegentle has joined #openstack-swift | 01:13 | |
*** haigang has joined #openstack-swift | 01:13 | |
*** mitz has quit IRC | 01:26 | |
*** vinsh has joined #openstack-swift | 01:29 | |
*** mitz has joined #openstack-swift | 01:36 | |
*** annegentle has quit IRC | 01:42 | |
*** tsg has joined #openstack-swift | 01:49 | |
*** panbalag has joined #openstack-swift | 01:59 | |
*** jrichli has quit IRC | 02:05 | |
*** jrichli has joined #openstack-swift | 02:05 | |
openstackgerrit | Janie Richling proposed openstack/swift: WIP - Provides a simple skeleton of middleware for encryption feature. https://review.openstack.org/157907 | 02:06 |
*** panbalag has joined #openstack-swift | 02:12 | |
*** annegentle has joined #openstack-swift | 02:15 | |
*** vinsh has quit IRC | 02:16 | |
*** haomaiwang has joined #openstack-swift | 02:17 | |
*** lcurtis has joined #openstack-swift | 02:22 | |
*** Gues_____ has quit IRC | 02:25 | |
openstackgerrit | Hisashi Osanai proposed openstack/swift: Remove sudo from resetswift command https://review.openstack.org/169142 | 02:25 |
*** jrichli has quit IRC | 02:32 | |
*** Gues_____ has joined #openstack-swift | 02:35 | |
*** lcurtis has quit IRC | 02:37 | |
*** bkopilov has quit IRC | 02:38 | |
*** haigang has quit IRC | 02:57 | |
*** erlon has quit IRC | 03:01 | |
*** tsg has quit IRC | 03:19 | |
*** Gues_____ has quit IRC | 03:20 | |
*** haigang has joined #openstack-swift | 03:22 | |
*** kei_yama has joined #openstack-swift | 03:25 | |
*** vinsh has joined #openstack-swift | 03:26 | |
*** haigang has quit IRC | 03:27 | |
*** vinsh has quit IRC | 03:28 | |
*** vinsh has joined #openstack-swift | 03:29 | |
*** vinsh has quit IRC | 03:31 | |
*** panbalag has quit IRC | 03:43 | |
*** haigang has joined #openstack-swift | 03:55 | |
*** dmorita has quit IRC | 04:00 | |
*** bkopilov has joined #openstack-swift | 04:08 | |
*** jasondotstar has quit IRC | 04:12 | |
*** jasondotstar has joined #openstack-swift | 04:14 | |
*** bkopilov has quit IRC | 04:20 | |
*** zaitcev has quit IRC | 04:21 | |
*** annegentle has quit IRC | 04:40 | |
*** ppai has joined #openstack-swift | 04:41 | |
*** goodes has quit IRC | 04:41 | |
*** goodes has joined #openstack-swift | 04:43 | |
*** annegentle has joined #openstack-swift | 04:46 | |
*** annegentle has quit IRC | 04:49 | |
clayg | no more sudo! | 04:54 |
*** SkyRocknRoll has joined #openstack-swift | 04:55 | |
ho | clayg: sorry. it's my mis-understanding... | 04:56 |
clayg | ho: no i'm sure it's fine - i was just catching up in channel | 04:56 |
clayg | i've been ignoring it writing reconstructor unittests | 04:56 |
ho | clayg: can i ask the patch num for your unittest? | 04:57 |
clayg | i haven't submitted yet - i'm going to drop them right on top of patch 131872 | 04:58 |
patchbot | clayg: https://review.openstack.org/#/c/131872/ | 04:58 |
clayg | i guess I could kick up what I have | 04:58 |
clayg | meh, i need to write some more | 04:58 |
*** silor has joined #openstack-swift | 05:00 | |
ho | clayg: thanks for the info. this patch is my next target. :) | 05:05 |
ho | s/target/target of review | 05:06 |
*** kota_ has quit IRC | 05:07 | |
*** kota_ has joined #openstack-swift | 05:11 | |
*** dmorita has joined #openstack-swift | 05:12 | |
clayg | ho: a'igth - i guess i'll push up what I've got | 05:13 |
*** dmorita has quit IRC | 05:21 | |
clayg | ho: whoa - bit of a mess with the rebasin' | 05:24 |
ho | clayg: haha. conflict with your previous patch? (i forgot the num) | 05:26 |
openstackgerrit | Clay Gerrard proposed openstack/swift: wip: ec reconstructor probe test https://review.openstack.org/164291 | 05:26 |
openstackgerrit | Clay Gerrard proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 05:26 |
clayg | i hope i didn't lose anyone's fixes there | 05:28 |
*** dmorita has joined #openstack-swift | 05:31 | |
openstackgerrit | Clay Gerrard proposed openstack/swift: Fix ssync sender cleanup of reverted fragment files https://review.openstack.org/169052 | 05:31 |
mattoliverau | clayg: maybe you need to come start a swiftstack office in Oz, cause you seem to be living the timezone :P | 05:38 |
clayg | mattoliverau: maybe if I did it'd help me get back on normal time? | 05:44 |
*** reed has quit IRC | 05:47 | |
mattoliverau | lol, maybe | 05:49 |
*** nshaikh has joined #openstack-swift | 05:52 | |
*** annegentle has joined #openstack-swift | 05:58 | |
*** annegentle has quit IRC | 06:02 | |
cschwede | Good Morning! | 06:03 |
ho | cschwede: good morning! | 06:04 |
mattoliverau | cschwede: morning! What are you doing up so early! | 06:05 |
cschwede | mattoliverau: it’s 8am over here, normal time for me | 06:06 |
mattoliverau | oh, cool, as we have a little git of cross over (until I need to change my clocks in like a week or something) | 06:07 |
mattoliverau | s/git/bit/ | 06:07 |
cschwede | we just changed clocks the weekend, normally i would start one hour later | 06:07 |
mattoliverau | cschwede: ahh, so that's why.. otherwise I'd just assume you used to ignore me first thing in the morning :P | 06:14 |
cschwede | mattoliverau: no worries, i’m not ignoring you! sometimes i’m just a silent observer | 06:16 |
mattoliverau | lol | 06:16 |
*** haigang has quit IRC | 06:27 | |
*** haigang has joined #openstack-swift | 06:27 | |
*** haigang has quit IRC | 06:32 | |
notmyname | cschwede: I found the error in https://review.openstack.org/#/c/168509 | 06:32 |
notmyname | I unintentionally found it | 06:32 |
cschwede | notmyname: oh, great - i’m just debugging it. what’s the reason? | 06:32 |
notmyname | you didn't clean up constraints after setting it. I'm about to push a new version | 06:33 |
cschwede | ah, i think it’s a re-used swift.conf? | 06:33 |
cschwede | :) just recognized that too. thx for debugging! | 06:33 |
openstackgerrit | John Dickinson proposed openstack/swift: Check if REST API version is valid https://review.openstack.org/168509 | 06:33 |
cschwede | btw, the new APIVersionError is a good idea | 06:34 |
notmyname | cschwede: also check my revert to test_chunked_put_bad_version and test_chunked_put_bad_path. they pass now (with the version on master) | 06:35 |
notmyname | cschwede: I'm guessing you had changed those to make some tests pass? | 06:35 |
notmyname | cschwede: are you on the openstack operators list? | 06:37 |
notmyname | cschwede: didn't you say you had a customer in Istanbul? http://lists.openstack.org/pipermail/openstack-operators/2015-March/006675.html | 06:37 |
cschwede | notmyname: yes, i modified test_chunked_put_bad_version to make the tests pass. but looking at it now my change is not necessary | 06:38 |
notmyname | cschwede: ah. feel free to push over it :-) | 06:38 |
clayg | notmyname: you're up late! | 06:38 |
clayg | notmyname: did you ever hear back from infra on the review-ec branch? | 06:38 |
cschwede | i read the operators list, yes; but that is not my customer | 06:38 |
notmyname | clayg: not yet. I expect ttx to be up in about an hour (he's one hour later than cschwede if my geography is correct) | 06:40 |
*** silor has quit IRC | 06:40 | |
cschwede | notmyname: France? Same time, 8:40 am now | 06:40 |
notmyname | ah ok | 06:41 |
clayg | i'm seeing this random eventlet.switch bug on some of the feature/ec changes -> http://logs.openstack.org/72/131872/48/check/gate-swift-python27/e998cc0/nose_results.html | 06:41 |
cschwede | they also switched to DST on the weekend | 06:41 |
cschwede | notmyname: clayg: do you have a new stronger coffee brand in the office? looks like you guys are no longer sleeping | 06:41 |
notmyname | if I haven't heard from ttx by the time I get up tomorrow, I'll find other people to DOITNOW! | 06:42 |
clayg | it's like power naps | 06:42 |
*** jamielennox is now known as jamielennox|away | 06:49 | |
clayg | it's like eventlet.sleep actually | 06:49 |
openstackgerrit | Clay Gerrard proposed openstack/swift: wip: ec reconstructor probe test https://review.openstack.org/164291 | 06:56 |
openstackgerrit | Clay Gerrard proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 06:56 |
clayg | test ERROR: ERROR __call__ error with PUT /sda1/p/a/c/o : Timeout (1s) <- that's the same thing - in switch | 06:57 |
clayg | something's borked :\ | 06:57 |
clayg | portante: was always really good at tracking this crap down | 06:57 |
clayg | i wonder if it's just on feature/ec | 06:57 |
*** annegentle has joined #openstack-swift | 06:58 | |
*** annegentle has quit IRC | 07:04 | |
notmyname | I just saw an email from ttx to a mailing list, so I know he's up | 07:04 |
notmyname | clayg: basically, you'll know when the new feature/ec_review branch is there when you `git fetch gerrit` and see it download | 07:05 |
notmyname | so I'm hoping he'll get to it soon | 07:05 |
notmyname | ok, I'm going to bed. talk to you in a few hours | 07:06 |
*** haigang has joined #openstack-swift | 07:12 | |
*** silor has joined #openstack-swift | 07:17 | |
*** someonespace has joined #openstack-swift | 07:22 | |
*** bkopilov has joined #openstack-swift | 07:28 | |
*** geaaru has joined #openstack-swift | 07:42 | |
*** jistr has joined #openstack-swift | 07:43 | |
*** annegentle has joined #openstack-swift | 07:59 | |
*** annegentle has quit IRC | 08:05 | |
cschwede | hmm, looks like a dependency is missing on the gate, saw that error on at least two different jobs | 08:28 |
cschwede | https://bugs.launchpad.net/openstack-ci/+bug/1334550 | 08:28 |
openstack | Launchpad bug 1334550 in OpenStack-Gate "Could not find any downloads that satisfy the requirement X" [Low,Fix released] | 08:28 |
*** haigang has quit IRC | 08:38 | |
*** jordanP has joined #openstack-swift | 08:46 | |
*** acoles_away is now known as acoles | 08:47 | |
acoles | morning/evening | 08:49 |
-openstackstatus- NOTICE: CI Check/Gate pipelines currently stuck due to a bad dependency creeping in the system. No need to recheck your patches at the moment. | 08:55 | |
*** ChanServ changes topic to "CI Check/Gate pipelines currently stuck due to a bad dependency creeping in the system. No need to recheck your patches at the moment." | 08:55 | |
*** annegentle has joined #openstack-swift | 09:00 | |
*** annegentle has quit IRC | 09:06 | |
ho | acoles: morning! | 09:06 |
openstackgerrit | Christian Schwede proposed openstack/swift: Check if device name is valid when adding to the ring https://review.openstack.org/169231 | 09:20 |
*** ho has quit IRC | 09:34 | |
*** jamielennox|away is now known as jamielennox | 09:41 | |
*** silor has quit IRC | 09:43 | |
*** jamielennox is now known as jamielennox|away | 09:47 | |
*** annegentle has joined #openstack-swift | 10:01 | |
*** annegentle has quit IRC | 10:06 | |
openstackgerrit | Alistair Coles proposed openstack/swift: Fix ssync sender cleanup of reverted fragment files https://review.openstack.org/169052 | 10:09 |
*** dmorita has quit IRC | 10:34 | |
*** haomaiwang has quit IRC | 10:36 | |
*** haomaiwang has joined #openstack-swift | 10:54 | |
*** silor has joined #openstack-swift | 11:02 | |
*** annegentle has joined #openstack-swift | 11:02 | |
*** annegentle has quit IRC | 11:07 | |
*** nshaikh has quit IRC | 11:20 | |
*** nshaikh has joined #openstack-swift | 11:31 | |
*** jistr is now known as jistr|english | 11:32 | |
*** jistr|english is now known as jistr|class | 11:33 | |
*** kei_yama has quit IRC | 11:48 | |
*** ChanServ changes topic to "Review Dashboard: http://goo.gl/uRzLBX | Overview Dashboard: http://goo.gl/2By1qv | EC status: https://gist.github.com/notmyname/fd006c061ccb28e8ecfc | Logs: http://eavesdrop.openstack.org/irclogs/%23openstack-swift/" | 11:50 | |
-openstackstatus- NOTICE: Check/Gate unstuck, feel free to recheck your abusively-failed changes. | 11:50 | |
straycat | Okay, I created two identical ring builder files | 11:53 |
straycat | a.builder and b.builder | 11:53 |
*** kota_ has quit IRC | 11:53 | |
straycat | diff a.builder and b.builder shows they're the same | 11:53 |
*** ujjain has quit IRC | 11:54 | |
straycat | but after a rebalance with the same seed diff says they differ? | 11:54 |
straycat | that wasn't exactly what i was expecting | 11:54 |
cschwede | straycat: iirc, there is a timestamp in the ring file, thus the difference | 12:01 |
straycat | I was hoping it'd be something like that | 12:02 |
*** km has quit IRC | 12:02 | |
*** annegentle has joined #openstack-swift | 12:03 | |
*** annegentle has quit IRC | 12:08 | |
*** erlon has joined #openstack-swift | 12:09 | |
*** panbalag has joined #openstack-swift | 12:10 | |
*** ppai has quit IRC | 12:34 | |
*** mahatic has joined #openstack-swift | 12:38 | |
*** bkopilov has quit IRC | 12:41 | |
cschwede | straycat: use the following to print your rings, and compare your output using diff: https://gist.github.com/cschwede/4c60f89a86bd238d309a | 12:42 |
cschwede | if i use a seed the ringdata is identical except for the _last_part_moves_epoch | 12:42 |
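A comparison along the lines cschwede suggests, ignoring the one field that is expected to differ, might look like the sketch below. It works on plain dictionaries; how the ring data gets dumped into a dict is left to the gist linked above, and every key name in the usage example other than `_last_part_moves_epoch` is made up for illustration.

```python
_MISSING = object()


def differing_keys(ring_a, ring_b, ignore=("_last_part_moves_epoch",)):
    """List the keys whose values differ between two ring-data dicts.

    Keys named in `ignore` are skipped, so a timestamp-like field does
    not make otherwise identical rings look different.
    """
    keys = (set(ring_a) | set(ring_b)) - set(ignore)
    return sorted(
        key for key in keys
        if ring_a.get(key, _MISSING) != ring_b.get(key, _MISSING)
    )


# Illustrative data only; these are not real builder contents.
a = {"parts": 1024, "replicas": 3, "_last_part_moves_epoch": 1427800000}
b = {"parts": 1024, "replicas": 3, "_last_part_moves_epoch": 1427800042}
print(differing_keys(a, b))
```

An empty list means the two dumps agree on everything except the ignored fields.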
*** tongli has joined #openstack-swift | 12:49 | |
*** jistr|class is now known as jistr | 12:58 | |
*** annegentle has joined #openstack-swift | 13:02 | |
*** petertr7 has joined #openstack-swift | 13:10 | |
peluse | clayg, that 1s timeout thing on feature/ec was introduced between set 44 and 45.... | 13:15 |
*** logan2 has quit IRC | 13:16 | |
*** ujjain has joined #openstack-swift | 13:17 | |
peluse | clayg, and someone mentioned the test ZBF thing, just a simple fix. Looking into the job cardinality comment you mentioned, intent was as you mentioned and I think I see where things strayed | 13:17 |
peluse | man this is the crappiest code I've ever written in my life... I've got to find someone to blame for this! | 13:19 |
*** logan2 has joined #openstack-swift | 13:19 | |
*** nshaikh has quit IRC | 13:21 | |
*** trex has joined #openstack-swift | 13:40 | |
*** tsg_ has joined #openstack-swift | 13:46 | |
*** logan2 has quit IRC | 13:53 | |
*** rdaly2 has joined #openstack-swift | 13:56 | |
*** logan2 has joined #openstack-swift | 13:56 | |
straycat | cschwede, Okay, given that's the offset all the last_part_moves are based on that feels slightly unsafe, though I guess on a timescale of hours it's not going to be a problem | 13:57 |
*** rdaly2 has quit IRC | 14:01 | |
*** jrichli has joined #openstack-swift | 14:17 | |
*** bkopilov has joined #openstack-swift | 14:26 | |
*** tsg_ has quit IRC | 14:27 | |
*** blankspace has joined #openstack-swift | 14:37 | |
*** emptyspace has joined #openstack-swift | 14:39 | |
*** someonespace has quit IRC | 14:40 | |
*** blankspace has quit IRC | 14:43 | |
*** blankspace has joined #openstack-swift | 14:52 | |
*** emptyspace has quit IRC | 14:56 | |
*** blankspace has quit IRC | 15:01 | |
*** SkyRocknRoll has quit IRC | 15:02 | |
*** annegentle has quit IRC | 15:08 | |
*** annegentle has joined #openstack-swift | 15:13 | |
straycat | cschwede, Ahh sorry I might be being stupid here, that value is only going to be used by the rebalance process, and that's only ever performed by running swift-ring-builder foo.builder rebalance ? | 15:13 |
*** reed has joined #openstack-swift | 15:13 | |
straycat | If that's the case then I don't need to worry about the differing value for _last_part_moves_epoch | 15:14 |
cschwede | straycat: yes, exactly. even if you do a rebalance the assignment of the partitions should be identical, given that you apply the same changes everywhere, use the same seed and a min_part_hours of less than the time difference to your last rebalance | 15:20 |
*** zigo__ is now known as zigo | 15:23 | |
*** mahatic has quit IRC | 15:24 | |
straycat | cschwede, "and a min_part_hours of less than the time difference to your last rebalance" thereby forcing all partitions to be rebalanced regardless of _last_part_moves ? | 15:27 |
*** zaitcev has joined #openstack-swift | 15:28 | |
*** ChanServ sets mode: +v zaitcev | 15:28 | |
cschwede | straycat: min_part_hours is the amount of hours that needs to pass before a partition is allowed to move again. thus if you have different values, some partitions on builderfile A might move, but not on B (if you rebalance). But if it is 0 for example, the same rules apply everywhere, no matter the time that has passed | 15:30 |
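The rule cschwede describes can be modelled in a couple of lines. This is a simplified sketch of the idea, not the ring builder's actual code; `can_move` and its arguments are hypothetical names.

```python
def can_move(hours_since_last_move, min_part_hours):
    """A partition may move again once min_part_hours have elapsed."""
    return hours_since_last_move >= min_part_hours


# Two builder files whose clocks disagree about the last move (assumed values).
hours_on_a, hours_on_b = 3, 1

# With min_part_hours = 2 the builders disagree about the same partition...
print(can_move(hours_on_a, 2), can_move(hours_on_b, 2))

# ...but with min_part_hours = 0 both always allow the move.
print(can_move(hours_on_a, 0), can_move(hours_on_b, 0))
```

This is why a `min_part_hours` of 0, or one smaller than the time since the last rebalance on every builder, keeps independently rebalanced builder files in step.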
clayg | introduced between set 44 and set 45 ? | 15:32 |
straycat | cschwede, makes sense thanks | 15:33 |
cschwede | straycat: you’re welcome :) | 15:33 |
clayg | cschwede: i've been wondering if balancing ec rings with like 10-16 "replicas" is going to have strange behaviors because of min_part_hours | 15:34 |
cschwede | clayg: depends on the total amount of disks? | 15:35 |
clayg | i just mean if only one replica of a part is going to move every min_part_hours and you have 10-16 of them - that's like two weeks to move everything if you're... idk migrating zones or something | 15:36 |
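clayg's "like two weeks" estimate can be checked with simple arithmetic. The sketch assumes the worst case he describes (one replica of a partition moves per `min_part_hours` window) and a `min_part_hours` of 24, which is an assumed value and not one stated in the channel.

```python
def days_to_move_all_replicas(replicas, min_part_hours):
    """Worst-case days to move every replica of one partition,
    if only one replica may move per min_part_hours window."""
    return replicas * min_part_hours / 24.0


for replicas in (10, 16):
    print(replicas, days_to_move_all_replicas(replicas, 24))
```

Under those assumptions a 10 to 16 "replica" EC ring takes 10 to 16 days, which brackets the two-week figure.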
clayg | acoles: cschwede: peluse: so you guys say we know something about the random Timeout (1s) failures? | 15:38 |
clayg | is it happening on master reviews too? | 15:40 |
cschwede | clayg: hmm, so you mean we need to think about the rebalancing for EC? Allowing to move more than one replica of a partition at a time? | 15:43 |
cschwede | clayg: sorry, now, i don’t know the reason for the timeout :( | 15:43 |
clayg | cschwede: well we shouldn't do anything yet - i just haven't really spent anytime with real world rebalancing of 12 replica rings :P | 15:46 |
clayg | cschwede: I see 'em on my laptop sometimes honestly - I think something is going on | 15:46 |
cschwede | clayg: agreed - thinking alone about juggling 12 instead of 1 replicas at a time makes me dizzy | 15:47 |
acoles | clayg: i've seen some of those timeout test failures but not investigated | 15:53 |
*** bobby2_ is now known as bobby2 | 15:54 | |
clayg | acoles: :'( | 15:59 |
clayg | i'm worried its going to take someone smart looking at it | 15:59 |
acoles | rules me out then | 15:59 |
clayg | see how quick you were with that | 16:00 |
clayg | tdasilva: how'd you do that etherpad thing - can I get everyone to start helping capture the tests and line numbers that are breaking? | 16:01 |
tdasilva | clayg: one sec, let me create one | 16:02 |
acoles | clayg: after my experiment last night i have concluded that my body can't take coding til early hours of morning ;) | 16:02 |
clayg | acoles: you never know until you try | 16:02 |
*** tsg_ has joined #openstack-swift | 16:03 | |
tdasilva | clayg: https://etherpad.openstack.org/p/swift_timeout_test_failure | 16:03 |
clayg | acoles: I've found injecting coffee into the bloodstream helps recovery in the morning | 16:03 |
acoles | acoles: i'm just gettin too darn old | 16:05 |
glange | acoles: your are talking to yourself | 16:05 |
glange | you | 16:05 |
acoles | s/acoles/clayg/ see what i mean! | 16:05 |
glange | :) | 16:05 |
*** silor has quit IRC | 16:05 | |
acoles | glange: its the only way to get agreement ;) | 16:06 |
notmyname | good morning | 16:07 |
*** jistr has quit IRC | 16:09 | |
clayg | lol! | 16:12 |
peluse | clayg, i thought the 1s timeout was only on the ecrecon branch, no? | 16:12 |
clayg | oh is that what's going on? peluse, i have no idea | 16:14 |
clayg | i'm just going to keep running tests in a loop until i get a clue | 16:14 |
peluse | OK, I'll look at current feature/ec | 16:14 |
peluse | on ECrecon it started at patch set 45 though | 16:15 |
clayg | oh that was easy | 16:15 |
notmyname | clayg: feature/ec_review is available | 16:15 |
notmyname | just heard from ttx that summit room assignments should be finished by the end of next week. today's cross-project meeting has that as an agenda item | 16:15 |
clayg | notmyname: ok - do you want to start reviewing with or without the bugs? | 16:16 |
notmyname | I just got in and haven't caught up from what happened during the night | 16:16 |
notmyname | where are we right now? | 16:17 |
clayg | notmyname: peluse and I are going to make the reconstructor awesome by the power of unittests! | 16:17 |
notmyname | yay | 16:18 |
clayg | notmyname: and no one is getting any work done because of this stupid reoccuring failure with Timeout (1s) on feature/ec (or anything based on the reconstructor) | 16:18 |
notmyname | is that what's blocking https://review.openstack.org/#/c/169035/ | 16:18 |
peluse | clayg, just so we know we're talking about the same thing, exactly which test(s) are you seeing hit by this? | 16:19 |
clayg | peluse: i'm trying to take notes as I go https://etherpad.openstack.org/p/swift_timeout_test_failure | 16:20 |
peluse | notmyname, that one looks like it needs a recheck. I'm not aware of it being associated with the timeout thing but could be wrong | 16:20 |
peluse | clay, OK | 16:20 |
notmyname | peluse: ok, just triggered the recheck | 16:20 |
peluse | clayg, seems like the 1s TO thing is only on the ECrecon branch and only when you include the ECrecon tests (and again started with patch set 45) if you want to continue adding more unit tests I'll go hunt down exactly what is causing it | 16:26 |
clayg | i think that was just the gate thing the status messages were talking about "Could not install requirement XStatic-Angular-Irdragndrop" | 16:27 |
clayg | peluse: *DEAL* | 16:27 |
clayg | acoles: what are you up to? just slackin? | 16:28 |
acoles | ! | 16:28 |
acoles | feet up | 16:28 |
acoles | kickin back | 16:28 |
clayg | i swear like three times in the last two weeks i was just like "I ... can't ... write ... another ... line ... of ... code ..." | 16:28 |
acoles | clayg: i am nearly done writing the mother of all ssync tests | 16:29 |
clayg | nice! is it an integration test - do you spin up servers!? | 16:29 |
acoles | clayg: its gonna test end to end, on disk files to on disk | 16:30 |
acoles | clayg: i was going to add it to this patch https://review.openstack.org/#/c/169052/ | 16:31 |
notmyname | neat | 16:31 |
clayg | acoles: that's flipping *great* | 16:32 |
clayg | i can't wait to see it | 16:32 |
clayg | push it now | 16:32 |
acoles | clayg: peluse : i think we need the fixes in 169052 | 16:32 |
clayg | i don't care if it fails | 16:32 |
clayg | the 1s TO fixes? | 16:32 |
acoles | clayg: now i have oversold it i'm worried | 16:32 |
clayg | peluse: says anything that depends on the reconstructor breaks | 16:33 |
clayg | acoles: SHUT UP AND TAKE MY MONEY | 16:33 |
acoles | clayg no 169052 is ssync/clean up reverted FI's fixes | 16:33 |
acoles | clayg: gotta check my code carefully for farts ;) | 16:34 |
clayg | lol! | 16:34 |
* acoles will never enjoy beethoven again | 16:35 | |
clayg | well - it's just when you hear it you'll get gassy | 16:35 |
acoles | you're making it worse! | 16:35 |
peluse | clayg, narrowed down: it's in the hokey setup of the old set of unit tests. maybe we should clean those out sooner than later but first let me see if I can get a bit closer to root cause to make sure it really is something we don't give a shit about | 16:38 |
clayg | peluse: weird - thanks | 16:38 |
*** annegentle has quit IRC | 16:40 | |
clayg | man so like if you're doing a 10+2 scheme and you need to rebuild a fragment, you can only have *one* other node down or else you can't rebuild | 16:45 |
clayg | I guess that's what the 2 is for :\ | 16:45 |
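The arithmetic behind clayg's observation, as a small sketch: with `ndata` data fragments and `nparity` parity fragments, any `ndata` fragments are enough to decode, and the fragment being rebuilt is already missing. The function name is hypothetical.

```python
def extra_failures_tolerated_during_rebuild(ndata, nparity):
    """How many more nodes may be down while rebuilding one lost fragment.

    One fragment is already gone, leaving ndata + nparity - 1 candidates,
    of which ndata are needed to decode.
    """
    surviving = ndata + nparity - 1
    return surviving - ndata


print(extra_failures_tolerated_during_rebuild(10, 2))
```

For 10+2 the result is 1, matching "only *one* other node down"; in general it is `nparity - 1`.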
*** jordanP has quit IRC | 16:45 | |
*** vinsh has joined #openstack-swift | 16:48 | |
zaitcev | Guys, remember that we have eventlet>=0.16.1,!=0.17.0? Apparently 0.17.1 satisfies it. | 16:58 |
zaitcev | Yay | 16:58 |
notmyname | zaitcev: that makes sense, right? | 16:58 |
notmyname | "anything bigger than 16.1 except for 17.0" | 16:59 |
zaitcev | Right, as long as EC works with it | 16:59 |
notmyname | I'm frankly amazed that we haven't (AFAIK) seen two openstack projects with incompatible dependency versions (eg proj A requires dep v1, but proj B requires dep v2. of course proj A breaks with dep v2) | 17:00 |
*** welldannit has quit IRC | 17:00 | |
notmyname | zaitcev: the gate is running whatever the highest version available that meets the version requirements, so all the tests have used 0.17.1 since it's been available. so I'm not too worried about EC breaking with 0.17.1 | 17:01 |
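Why 0.17.1 satisfies `eventlet>=0.16.1,!=0.17.0` can be shown with a toy checker. This is a deliberately simplified sketch that only handles dotted numeric versions; real installers follow fuller version-matching rules, and the helper names here are made up for illustration.

```python
import operator

_OPS = {
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
    "==": operator.eq, ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '0.17.1' into (0, 17, 1) so tuples compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated list of clauses."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before their one-character prefixes.
        for symbol in (">=", "<=", "!=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not _OPS[symbol](candidate, bound):
                    return False
                break
        else:
            raise ValueError("unsupported clause: %r" % clause)
    return True


for version in ("0.16.0", "0.16.1", "0.17.0", "0.17.1"):
    print(version, satisfies(version, ">=0.16.1,!=0.17.0"))
```

Every clause must hold at once, so 0.17.0 is excluded by `!=0.17.0` while 0.17.1 passes both clauses.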
zaitcev | Thank that implicit keyword arguments in Python. They allow for easy compatibility in most cases without burdening developers with things like namespaces. | 17:02 |
notmyname | (which is also one reason I try to keep my saio running with the earliest supported versions of dependencies) | 17:03 |
clayg | notmyname: you're doing gods work man | 17:19 |
clayg | i implemented the ts method on like *one* test case now where there's not a timestamp just waiting for me I'm all put out | 17:21 |
peluse | clayg, just updated the etherpad wrt the 1s TO thing. Have to run to a mandatory meeting for 90 min or so | 17:23 |
clayg | acoles: https://etherpad.openstack.org/p/swift_timeout_test_failure <- peluse says your durable files are too durable | 17:26 |
acoles | clayg: looking... | 17:35 |
*** silor has joined #openstack-swift | 17:35 | |
*** zhill has joined #openstack-swift | 17:40 | |
*** annegentle has joined #openstack-swift | 17:41 | |
*** geaaru has quit IRC | 17:44 | |
openstackgerrit | Alistair Coles proposed openstack/swift: Fix ssync sender cleanup of reverted fragment files https://review.openstack.org/169052 | 17:44 |
acoles | clayg: there's the ssync tests ^^, still got a failure so wip | 17:44 |
*** annegentle has quit IRC | 17:46 | |
*** bkopilov has quit IRC | 17:56 | |
*** bkopilov has joined #openstack-swift | 17:56 | |
clayg | acoles: that's cool tho | 18:01 |
openstackgerrit | Christian Schwede proposed openstack/swift: Check if device name is valid when adding to the ring https://review.openstack.org/169231 | 18:04 |
*** jamielennox|away is now known as jamielennox | 18:04 | |
*** dencaval has joined #openstack-swift | 18:15 | |
openstackgerrit | Janie Richling proposed openstack/swift: WIP - Provides a simple skeleton of middleware for encryption feature. https://review.openstack.org/157907 | 18:30 |
*** zhill_ has joined #openstack-swift | 18:32 | |
wer | I have one (of 8) servers that isn't really clearing its async work: "async_pending": 98258. They are not clearing on this one host. | 18:37 |
*** silor has quit IRC | 18:39 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/swift: Updated from global requirements https://review.openstack.org/88736 | 18:39 |
clayg | wer: you got a container server offline atm? | 18:51 |
*** annegentle has joined #openstack-swift | 18:52 | |
clayg | weird that it's just one server - network routes? anything change on the machine recently? old ring? | 18:52 |
*** fbo has quit IRC | 18:52 | |
wer | no clayg. Everyone is online. All these machines are identical and on the same 10 gig switch with no known connectivity issues. | 18:52 |
wer | rings have not changed in months. | 18:53 |
wer | I run a pretty steady state.... but the other day I noticed one object server on another machine that was cpu'ing more than the others..... And then this one server that isn't moving its async stuff. | 18:54 |
wer | I reloaded the object server... but was tempted to actually restart it and see if things move. I'm also suspicious that I might have a container hotspotting or something but I have not identified anything yet. | 18:56 |
wer | or maybe a fragmented sqlite. It's strange behavior. It's kinda stuck :) | 18:57 |
wer | I actually had everything running really well for 6 months. But I needed to shard a container. Cause I had like 3million objects and did a lot of deletes each night. So I sharded that container. I do feel like I see more timeouts than on the old cluster with a single larger container.... but the timeouts are still very low. | 19:00 |
clayg | hmm... that is interesting | 19:00 |
clayg | ok, well when you don't know what the problem is TO THE LOGS! | 19:01 |
clayg | object-updater is responsible for that - you could stop him, and then run swift-init object-updater once -nv from the command line to get the logs on the console | 19:01 |
wer | lol the only thing I've identified are some timeouts. And I hate them. Maybe I don't have enough object servers listening or something. | 19:02 |
wer | object-updater does the async stuff? | 19:02 |
*** fbo has joined #openstack-swift | 19:04 | |
wer | he's not running ?! :P | 19:05 |
wer | wtf | 19:05 |
openstackgerrit | Clay Gerrard proposed openstack/swift: wip: ec reconstructor probe test https://review.openstack.org/164291 | 19:10 |
openstackgerrit | Clay Gerrard proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 19:10 |
*** silor has joined #openstack-swift | 19:11 | |
*** zhill_ has quit IRC | 19:14 | |
wer | It's working on it clayg. Looks like the object updater is getting some timeouts on one of my mounts. hrm. it swept 4 disks fine and is timing out now on three more. I've got something to dig at I guess. | 19:15 |
clayg | wer: nice - keep us posted - good luck! | 19:16 |
wer | k thanks! I see a few failures.... but I can only assume this is because it's probably been dead for so long | 19:16 |
wer | d133 completed: 240.01s, 4096 successes, 3 failures | 19:16 |
wer | I've got plenty of ram. workers = 24 max_clients=512 on the object server. The really messed up thing is that I get about 10million requests a day for objects that are going to be 404. And I think that uses up my connections. But I didn't have this issue previously. I might bump max clients if I can identify it as being hit. I really don't know. | 19:20 |
*** reed has quit IRC | 19:36 | |
*** reed has joined #openstack-swift | 19:36 | |
wer | ok, damn. I was missing two container-replicator and an account-replicator process... I'll put some checks in place to trend if these die, or if I had some one time occurrence that I missed. serves me right for turning my back on things I guess. | 19:37 |
acoles | clayg: so about these timeouts, am i missing something re the commit(), i see random tests fail due to time rounding without running the recon tests | 19:38 |
acoles | s/recon/??/ | 19:38 |
clayg | recoder | 19:38 |
acoles | clayg: i wrote stuff on the etherpad | 19:39 |
clayg | because it RE-EC-EnCodes them! | 19:39 |
acoles | oh no i lit that fuse again :P | 19:39 |
clayg | acoles: so but i mostly only see this specific Timeout 1s errors when I run the reconstructor tests | 19:39 |
clayg | maybe there's a test here or there that has some timing issues rounding stuff - but the Timeout (1s) failures are pretty consistent for me | 19:40 |
acoles | clayg: maybe the ts iter pattern is flaky? are you using that a lot in the recodifier tests | 19:40 |
clayg | acoles: *maybe* but I started that back in the reconciler tests | 19:41 |
clayg | anyway - i'm not seeing like assertion failures or some timestamp doesn't equal another - i'm seeing the hub.swift raising timeouts - peluse says it's because stuff is slow | 19:41 |
acoles | clayg: ok gotcha | 19:42 |
acoles | clayg: i'll go dog some more | 19:42 |
acoles | argh dig!! | 19:42 |
clayg | heh | 19:42 |
clayg | i'm going to eat something and head into the office for awhile - catch up with notmyname - unless - is he out today? | 19:42 |
acoles | taxes? ;) | 19:43 |
*** annegentle has quit IRC | 19:51 | |
peluse | acoles, just got back from meeting-ville and read your thing on the etherpad | 20:02 |
acoles | peluse: maybe i was barking up wrong tree? | 20:03 |
peluse | acoles, not sure I "get it" just yet though :) | 20:03 |
acoles | peluse: well i wrote it quick so bit of a brian dump | 20:03 |
acoles | whoever brian is | 20:04 |
peluse | acoles, if I don't use the commit() method the things work fine, it seems to be when its used as part of the class setup() somehow?? How does that relate to what you saw wrt timestamps (or maybe it doesnt) | 20:04 |
acoles | peluse: not sure it does, can you point me to example of where its used as part of a class setup()? | 20:04 |
peluse | but honestly I think we're scrapping that whole set of tests including the setup and nowhere else (that I can find) do we create a bunch of files in setup() | 20:05 |
peluse | sure, look in... (copying) | 20:05 |
peluse | setup() of TestGlobalSetupObjectReconstructor() | 20:05 |
acoles | peluse: k, will do | 20:05 |
peluse | crap, another mtg, will be semi-online here for a few | 20:06 |
acoles | peluse: unrelated, i have a fix for test_removes_zbf in the test_reconstructor, what shall i do with it? paste it for you or clayg, or push over the recon patch (it's like 5 lines) | 20:07 |
peluse | I have one ready too :) | 20:07 |
peluse | one liner | 20:07 |
peluse | but paste and lets compare | 20:07 |
peluse | list(self.reconstructor.collect_parts()) | 20:08 |
peluse | self.assertFalse(os.path.exists(pol_1_part_1_path)) | 20:08 |
acoles | peluse: http://paste.openstack.org/show/197736/ | 20:08 |
acoles | peluse: yeah that works too :D | 20:09 |
peluse | OK, have to talk on this other call.. back in a few min | 20:09 |
*** dencaval has quit IRC | 20:09 | |
acoles | k i'm heading home will touch base later | 20:09 |
peluse | OK | 20:09 |
*** annegentle has joined #openstack-swift | 20:10 | |
*** acoles is now known as acoles_away | 20:11 | |
*** silor has quit IRC | 20:12 | |
peluse | notmyname, anyone - is there any plan (timeframe wise) for py 3.0? | 20:19 |
peluse | for swift of course... we have a team doing python optimizations and it doesn't look like there will be a 2.8 to contribute them to... | 20:20 |
*** jogriffin has joined #openstack-swift | 20:24 | |
*** annegentle has quit IRC | 20:30 | |
openstackgerrit | Merged openstack/swift: Even more cleanup to EC on-disk file cleanup https://review.openstack.org/169035 | 20:30 |
*** annegentle has joined #openstack-swift | 20:34 | |
*** david-lyle has quit IRC | 20:40 | |
notmyname | peluse: not as far as I know. nobody is really working on it | 20:40 |
peluse | yeah, I thought it had some steam last year and seemed to die off | 20:43 |
notmyname | peluse: only client side | 20:43 |
peluse | ahh | 20:44 |
notmyname | peluse: or something called ec got everyone busy | 20:44 |
peluse | yay :) | 20:44 |
clayg | notmyname: i think everyone is still waiting on eventlet to support py3 | 20:50 |
clayg | notmyname: but I think temoto is making progress | 20:50 |
notmyname | ya. I've heard good things | 20:51 |
clayg | i lost the ether pad - does someone have the link? | 20:51 |
clayg | peluse: acoles_away: unless you have the fix already somehow? | 20:51 |
peluse | on the phone, almost done | 20:52 |
clayg | peluse: i don't know if this works in your office - but around here there's like two magical letters that get you out of just about anything | 20:53 |
peluse | FO? | 20:54 |
clayg | heh | 20:54 |
clayg | ok, i see why that collect jobs test was broken - but what does that have to do with the Timeout (1s) thing? | 20:54 |
peluse | nada | 20:56 |
clayg | srly, i lost the etherpad link | 20:56 |
clayg | nm, lastlog had it | 20:56 |
clayg | peluse: the other failures that acoles pointed out make sense enough I guess - but again nothing to do with Timeout (1s) | 20:59 |
peluse | coming | 20:59 |
peluse | agree | 20:59 |
peluse | https://etherpad.openstack.org/p/swift_timeout_test_failure | 20:59 |
clayg | have you tried just commenting out the GlobalTestReconstructor | 20:59 |
peluse | there it is :) | 20:59 |
*** thumpba has joined #openstack-swift | 20:59 | |
peluse | clayg, yes, that works like a champ | 20:59 |
clayg | peluse: thanks | 20:59 |
clayg | peluse: oh | 20:59 |
clayg | ok, well at least we have one option | 20:59 |
peluse | that's why I think maybe this isn't worth chasing down but wanted to dig just a little deeper to make sure it wasn't something real that was just exposed by my goofy test setup | 21:00 |
clayg | peluse: ... right | 21:00 |
clayg | :\ | 21:00 |
clayg | peluse: well at least we know we have a way to cut and run | 21:00 |
clayg | peluse: can you keep on it and I'll get acoles fixes and look into those other tests? | 21:00 |
peluse | but I've been on the damned phone since acoles went to lunch - off now so will poke a bit more and I think maybe just remove all the global tests and port the ones that make sense into the class you added (the piss ant ones for coverage of things like check_rings) | 21:01 |
peluse | yup | 21:01 |
*** annegentle has quit IRC | 21:01 | |
*** annegentle has joined #openstack-swift | 21:01 | |
clayg | peluse: yeah i'm on board with that plan too | 21:05 |
clayg | peluse: but I looked and could not see what on earth that setup could be doing that leaked into any other tests!? | 21:06 |
clayg | acoles_away: I can't get test_object_delete_at_async_update to fail in a tight loop - what you said makes sense - but I can't get it to do it | 21:06 |
*** annegentle has quit IRC | 21:07 | |
clayg | acoles_away: from your description the problem would be easily fixed by ts = (utils.Timestamp(t) for t in itertools.count(int(time() + 1))) | 21:07 |
*** Nadeem has joined #openstack-swift | 21:11 | |
Nadeem | Hello folks, I was wondering how could I propose a skip on Tempest tests for https://review.openstack.org/#/c/150149/ | 21:13 |
Nadeem | As per https://github.com/openstack/tempest/blob/master/HACKING.rst#2-bug-fix-on-core-project-needing-tempest-changes I need to propose a skip on Tempest tests | 21:14 |
clayg | Nadeem: how about we *not* change the API? | 21:16 |
Nadeem | @clayg Well currently we are not following the RFC. As per RFC 2616 section 10.3.5 & section 4.3, 304 Not Modified should not include entity headers like Content-Length & Content-Type. | 21:19 |
Nadeem | This change allows us to be compliant with the RFC. | 21:20 |
clayg | we're compliant | 21:20 |
clayg | it said should not | 21:20 |
clayg | acoles_away: yay it failed! | 21:23 |
*** annegentle has joined #openstack-swift | 21:23 | |
wer | so my async problem is gone. I'm getting occasional timeouts and it looks like the object-server is giving a 499 on occasion. The client talking with the proxy-server is returned a 408 under these conditions. Any pointers where to poke at these 499's? | 21:27 |
peluse | clayg, interesting finding on the 1s timeout thing - if you change the order of tests fed into nosetests it goes away (see etherpad). have a 30 min phone call now.... | 21:30 |
clayg | wer: all of those say that the client talking to the proxy stopped sending data - probably on a PUT - maybe they have a timeout and got bored? | 21:32 |
wer | yeah I wanted to blame the client.... but I wasn't sure | 21:32 |
wer | is that what that says to you? :) | 21:32 |
clayg | peluse: yeah I think i observed that too | 21:32 |
peluse | and if you add --processes=4 (or something) you can run them with the reconstructor first and it works | 21:33 |
clayg | wer: well it may not be their *fault* if the service is being slow | 21:33 |
clayg | --processes=4 !? where do you come up with this stuff | 21:33 |
peluse | shotgun troubleshooting baby :) | 21:33 |
clayg | wer: but I'm not sure how much more you can get from the 408's - the code paths that hit that mean that the proxy thinks that there was a timeout reading from the client | 21:34 |
peluse | I'm thinking we can kill the global tests at this point, this seems like a nosetest thing that doesn't like how much time we're spending in class setup() | 21:34 |
*** bkopilov has quit IRC | 21:34 | |
clayg | wer: client_timeout setting in the proxy server config | 21:34 |
clayg | wer: you might go digging into the specific transaction id's - and see if you can engage the client making the request - it's possible they may be seeing something different on their end | 21:35 |
clayg | wer: another thing that can cause a timeout to pop is something starving the reactor - like a piece of middleware doing a blocking operation | 21:35 |
clayg | er... "hub" in eventlet parlance | 21:36 |
clayg | peluse: what? too much time in setUp? blaming nosetests? not sure I buy that | 21:36 |
clayg | acoles_away: well the test delete-at bugs only seem to pop on the feature/ec branch | 21:37 |
clayg | peluse: maybe those tests are leaking their background processes somehow and causing a bunch of background noise in the eventlet hub? | 21:37 |
peluse | and only when you run the tests with reconstructor first (from my testing) | 21:37 |
clayg | wer: ^ speaking of starving the hub! :D | 21:37 |
wer | clayg: I totally have some middleware doing the sharding.... | 21:38 |
peluse | clayg, OK, I'll look some more | 21:38 |
clayg | wer: is it doing *blocking* requests - or all green (like calling into the app) | 21:38 |
*** bkopilov has joined #openstack-swift | 21:39 | |
*** acoles_away is now known as acoles | 21:39 | |
wer | ug. clayg I could barely speak wsgi... I wrote it. I don't think it should be blocking and just sits in the pipeline :/ | 21:40 |
*** jrichli has quit IRC | 21:40 | |
mattoliverau | Morning, phew reading scroll back took a while.. So fun test errors huh | 21:40 |
*** thumpba_ has joined #openstack-swift | 21:40 | |
clayg | wer: well maybe it's fine! | 21:40 |
*** thumpba has quit IRC | 21:40 | |
peluse | clayg, so with this option it still fails... | 21:41 |
clayg | wer: is it publicly available - you could probably trick someone into looking it over - cschwede and mattoliverau are into that crazy stuff | 21:41 |
peluse | --process-restartworker | 21:41 |
peluse | If set, will restart each worker process once their tests are done, this helps control memory leaks from killing the system. [NOSE_PROCESS_RESTARTWORKER] | 21:41 |
clayg | peluse: well if there's only one worker? | 21:41 |
*** thumpba has joined #openstack-swift | 21:41 | |
clayg | peluse: you might be able to limit it to a specific global reconstructor test | 21:42 |
peluse | ya | 21:42 |
*** thumpba_ has quit IRC | 21:45 | |
clayg | peluse: maybe the rings are leaking - and the object server tests are actually trying to connect to localhost or something? | 21:47 |
acoles | clayg: peluse: i'm not here for long but wondering what do you want me to focus on tomorrow? dig into the timeout issue more, keep going on ssync tests? | 21:50 |
peluse | my vote would be ssync tests | 21:51 |
*** erlon has quit IRC | 21:51 | |
clayg | peluse: I have something that fixed the issue for me locally | 21:51 |
peluse | really? do share | 21:52 |
clayg | peluse: https://gist.github.com/clayg/24e882c4ee9f786e5312 | 21:52 |
acoles | peluse: ok | 21:53 |
clayg | i'm going to let that run for awhile and then push up the fix | 21:53 |
clayg | acoles: yeah it'd be great if you could get those ssync tests working! | 21:53 |
peluse | OK, so that bit of hackery you replaced was solving a problem that I was getting when running tox or nosetests at the obj directory level - I mentioned it to you at the hackathon I think | 21:54 |
acoles | i was just looking at that global testdir setup | 21:54 |
acoles | clayg: i think the test failures were just the ones in the underlying recon patch | 21:54 |
acoles | clayg: but there's more scenario coverage i can add | 21:55 |
clayg | tox or nosetests at the obj directory level? | 21:56 |
clayg | peluse: i must have missed that | 21:56 |
clayg | acoles: oh ok - well that sounds promising | 21:57 |
acoles | clayg: peluse : do you want me to keep patch 169052 dependent or squash into the reconstructor patch?? | 21:57 |
patchbot | acoles: https://review.openstack.org/#/c/169052/ | 21:57 |
peluse | probably standalone | 21:58 |
clayg | acoles: oh i see what you mean - ummm... | 21:58 |
peluse | what am I missing? | 21:59 |
peluse | brb | 21:59 |
clayg | well he's cleaning up code that we're adding in patch 131872 - and adding tests | 22:00 |
patchbot | clayg: https://review.openstack.org/#/c/131872/ | 22:00 |
clayg | in the long run it probably doesn't matter | 22:00 |
clayg | all of that ssync reconstructor stuff will probably be in a single change when reviewing on master | 22:00 |
acoles | clayg: so according to jenkins the ssync tests in 169052 are good its just the reconstructor test failures showing up | 22:01 |
clayg | torgomatic: I think I ran off poor Nadeem - why are you encouraging patch 150149 | 22:01 |
patchbot | clayg: https://review.openstack.org/#/c/150149/ | 22:01 |
torgomatic | clayg: just wondering what was going on with it; either it should be abandoned or it should make progress | 22:01 |
acoles | clayg: so its more manageable separate just as long as folks reviewing 131872 realise there's some fixes up the chain | 22:02 |
torgomatic | of course, the fact that his response was basically "the directions say do X; what does X mean?" | 22:02 |
torgomatic | ...doesn't fill me with confidence that it will make progress | 22:02 |
clayg | torgomatic: seems obviously a v1.1 api thing - "backwards incompatible with some clients (e.g. Tempest)" seems like one of those "valid reasons in particular circumstances" that rfc 2119 was talking about when it described *should* not | 22:02 |
clayg | acoles: idk, i'll rebase it when I push up some of these other reconstructor fixes (the timeout 1s thing for sure) - i guess I'll keep it separate for now | 22:04 |
acoles | clayg: thats fine by me. | 22:05 |
notmyname | clayg: /cc torgomatic you've convinced me. after looking at it again and thinking some, I'll -2 that patch | 22:05 |
clayg | notmyname: i'm *always* down on changing the public api | 22:05 |
acoles | clayg: i gotta try and sneak in the ssync protocol change somehow too | 22:06 |
torgomatic | sounds good; whatever gets it out of a state that's waiting on feedback | 22:06 |
clayg | notmyname: If we don't do a good job keeping these warts around we'll never have a good reason to work on the recrapifier middleware! | 22:06 |
notmyname | :-) | 22:06 |
torgomatic | retroencrapulator | 22:06 |
Nadeem | @clayg I didn't run off :) I got distracted in another chat...well the RFC 2616 in Sec 4.3 says that 304 shouldn't have any Message body. Hence it should not have any Content-Length/Content-Type. | 22:06 |
clayg | acoles: whoa did I say that out loud - i was totally thinking you sneak that in somehow! | 22:06 |
notmyname | clayg: ya, my initial look was more agreeing with the concept. I hadn't considered the implications. now, -2 | 22:07 |
acoles | clayg: yup i'm just trying to think how to do it without churning all of the existing tests :/ | 22:07 |
clayg | acoles: stupid tests - you won't even notice it when it comes in with all the other changes | 22:08 |
notmyname | ok, whew. out of meetings, I hope, for the day | 22:08 |
clayg | peluse: OH NO! it failed again! it's less likely apparently - but still happens :'( | 22:14 |
acoles | ok i'm calling it a day | 22:14 |
peluse | bah | 22:15 |
peluse | same failure mode? | 22:15 |
peluse | either way what you posted is simpler than what was there, wanna push it and I'll work on it from that point? | 22:15 |
clayg | peluse: it's the same failure mode | 22:16 |
peluse | acoles, have a good one man | 22:17 |
clayg | peluse: idk, i'm trying to see if i can isolate to a specific set of tests - i don't really think it's the setUp now? | 22:17 |
peluse | OK, its still isolated to the global tests though right? I can go through them one at a time | 22:18 |
openstackgerrit | Merged openstack/python-swiftclient: Include unsupported url scheme with ClientException https://review.openstack.org/158248 | 22:18 |
*** acoles is now known as acoles_away | 22:19 | |
*** thumpba has quit IRC | 22:20 | |
peluse | well, that's what I'll do since wherever it is in there I created it - ugh | 22:20 |
clayg | peluse: maybe it's the heartbeater stuff? | 22:22 |
peluse | I'll get there... just adding them in one at a time til it breaks :) | 22:22 |
peluse | there aren't that many | 22:22 |
clayg | peluse: well I think when you get to porting the ones that are causing the issues they'll - hehe - yeah that | 22:22 |
clayg | peluse: ok I think i have it isolated to those 4 test process_job_all_* tests | 22:26 |
clayg | test_process_job_all_timeout i think - obviously it's *all* timeout! | 22:27 |
peluse | I'm almost to that point, will let you know if its the same for me. | 22:27 |
*** Nadeem has left #openstack-swift | 22:28 | |
*** tacotuesday has joined #openstack-swift | 22:30 | |
peluse | heh, yeah. just that one which is of course the last one I added back in... | 22:31 |
*** trex has quit IRC | 22:31 | |
*** jogriffin has quit IRC | 22:33 | |
peluse | so increasing the mocked timeout seems to work - could it simply be that the large amount of crap in setup is right on the 1s mark? | 22:35 |
*** tacotuesday has quit IRC | 22:35 | |
peluse | because remember it works if I don't use the .commit() method as well (which is a shitpile less code) | 22:35 |
clayg | peluse: idk, i think there was just some junk timeouts getting stuck in the hub or something | 22:37 |
clayg | ok i'm going to submit with the skiptest in there | 22:37 |
*** annegentle has quit IRC | 22:38 | |
peluse | sounds good. weird though, 5 sec still fails after a few runs (just that single test in the class) but 10 sec runs over and over | 22:38 |
openstackgerrit | Clay Gerrard proposed openstack/swift: wip: ec reconstructor probe test https://review.openstack.org/164291 | 22:39 |
openstackgerrit | Clay Gerrard proposed openstack/swift: Erasure Code Reconstructor https://review.openstack.org/131872 | 22:39 |
peluse | running now... | 22:40 |
*** annegentle has joined #openstack-swift | 22:41 | |
peluse | I'll also rebase acoles' dependent patch | 22:42 |
clayg | peluse: oh shit - i was already doing that | 22:44 |
clayg | peluse: now I don't want to type yes because you may beat me to it and the world could end?! | 22:45 |
peluse | it likely would end :) | 22:45 |
peluse | my tox job is almost done | 22:45 |
openstackgerrit | Clay Gerrard proposed openstack/swift: Fix ssync sender cleanup of reverted fragment files https://review.openstack.org/169052 | 22:45 |
clayg | hahah! | 22:45 |
clayg | peluse: the trick is to not run tox ;) | 22:46 |
peluse | you bastard! | 22:46 |
peluse | so I had a failure in tox though.... | 22:47 |
clayg | notmyname: torgomatic: why can't we have https://review.openstack.org/#/c/143791/ on feature/ec - it's a good and useful change! | 22:47 |
clayg | peluse: unrelated | 22:47 |
clayg | peluse: maybe? | 22:47 |
clayg | peluse: :P | 22:47 |
peluse | likely... test_version_manifest_utf8_container_utf_object | 22:47 |
clayg | so you may have the last push after all :D | 22:47 |
notmyname | clayg: we can. same story as the multi-range get | 22:48 |
clayg | notmyname: mattoliverau: yeah why can't we have multi-range get on feature/ec - it's a good and useful change! | 22:49 |
clayg | $ git diff master \| wc -l 21181 | 22:50 |
clayg | psshshhsththt - this is going to be *easy* | 22:50 |
*** devlaps has joined #openstack-swift | 22:50 | |
notmyname | clayg: smaller than storage policies ;-) | 22:50 |
*** devlaps has quit IRC | 22:54 | |
peluse | clayg, that failure above is on the latest ecrecon patch and can be hit with just running ./.unittests | 23:06 |
peluse | (and not on latest feature/ec) | 23:07 |
clayg | peluse: what failure now? | 23:10 |
clayg | peluse: can you fix it? | 23:10 |
clayg | peluse: it doesn't look like jenkins has chimed in on patch 131872 yet | 23:10 |
patchbot | clayg: https://review.openstack.org/#/c/131872/ | 23:10 |
peluse | test_version_manifest_utf8_container ERROR 1.36 | 23:11 |
peluse | test_version_manifest_utf8_container_utf_object ERROR 0.01 | 23:11 |
clayg | so with all of the ECObjectController refactoring - do we still need all of the proxy specific ec methods on the storage policy? | 23:11 |
clayg | torgomatic: ^ | 23:11 |
peluse | yeah, I'll look. would have thunk it was me but passes on feature/ec | 23:12 |
clayg | peluse: ok well it's probably something I added I guess? I'll look into if it jenkins kicks it back before you find a fix | 23:12 |
peluse | you'll likely have dinner, a few shots of scotch and breakfast before I find a fix :) | 23:13 |
* peluse is not having a good week | 23:13 | |
*** annegentle has quit IRC | 23:13 | |
clayg | lol | 23:14 |
clayg | peluse: ok well then maybe ignore it and try to fill in some of the reconstructor tests - decide what if anything we're to do with the all_timeout test - and think about what you want to do build_jobs? | 23:14 |
clayg | peluse: or tell me what to do - because i'm just piddlin' around with getting some changes ready on review/ec | 23:15 |
clayg | err... feature/ec_review | 23:15 |
clayg | which *does* kind of sound like a feature! | 23:15 |
peluse | clayg, looks like it came in with patch set 45 (same as the crazy timeout) | 23:17 |
peluse | let me see if it runs w/o the global tests | 23:17 |
*** ChanServ changes topic to "EC Merge plan: https://etherpad.openstack.org/p/ec_merge_plan | Review Dashboard: http://goo.gl/uRzLBX | Overview Dashboard: http://goo.gl/2By1qv | Logs: http://eavesdrop.openstack.org/irclogs/%23openstack-swift/" | 23:18 | |
notmyname | https://etherpad.openstack.org/p/ec_merge_plan | 23:18 |
notmyname | there's the plan to merge ec to master | 23:18 |
notmyname | to be discussed more at tomorrow's meeting | 23:19 |
clayg | peluse: if stupid jenkins would get off its can and give me a traceback I might be able to look at it! | 23:20 |
*** petertr7 has quit IRC | 23:21 | |
peluse | clayg, OK wait I lied about the patch set 45 thing - different failures (expected).... will keep searching for where it went off the rails | 23:22 |
peluse | clayg, jenkins says "it still stucks" -- https://jenkins01.openstack.org/job/gate-swift-python27/4349/console (same error I see locally) | 23:27 |
*** kei_yama has joined #openstack-swift | 23:27 | |
clayg | peluse: why isn't that posted on the patch yet? | 23:28 |
peluse | I went to zuul and clicked in on it - I don't think its totally done yet | 23:29 |
peluse | so its passed for me twice with 49 and failed twice with 50 | 23:30 |
*** tongli has quit IRC | 23:30 | |
clayg | bah this is so tedious | 23:31 |
peluse | and makes so little sense | 23:32 |
mattoliverau | Sorry been in a meeting | 23:35 |
mattoliverau | clayg: multi-range gets would be awesome in EC.. so long as they work.. in my testing they were cutting off the end of the final boundary (miscalculate content-length maybe?) making incomplete multi-part. torgomatic has uploaded a new patchset so I should go test that I guess. | 23:38 |
*** km has joined #openstack-swift | 23:38 | |
* peluse feels like a blind squirrel trying to find a nut | 23:41 | |
clayg | peluse: you said you can reproduce it locally right - have you found a series of tests that cause the issue - beyond.... all of them? | 23:42 |
clayg | peluse: you said adding raise SkipTest() in the GlobalReconstrctor tests didn't make it stop for you? | 23:42 |
peluse | not yet, I ruled out the crappy global tests and am now just looking at unsuspecting changes in patch set 50 | 23:42 |
peluse | correct | 23:43 |
clayg | test coupling is the worst thing ever invented | 23:43 |
*** ho has joined #openstack-swift | 23:52 | |
clayg | peluse: while true; do nosetests swift/test/unit/obj/test_reconstructor.py swift/test/unit/proxy/; if [ $? -ne 0 ]; then break ; fi; done seems to fail pretty reliably for me | 23:53 |
peluse | this is bananas - looks like its one of the tests in the other class - the new ones | 23:54 |
*** zhill has quit IRC | 23:55 | |
clayg | maybe it's worth understanding why that *_all_timeout test was so bad? | 23:55 |
torgomatic | one thing that's gotten me before is a proxy server object (proxy.server.Application) being shared between tests and deciding to error-limit my fake nodes because I had fake errors | 23:56 |
torgomatic | and then that leaks to the next test case and the proxy gets all snobby about who it'll talk to | 23:57 |
mattoliverau | So on the latest EC reconstructor patch, I get the timeout error on test_reconstructor_skips_bogus_partition_dirs, but only when I run all tests on TestGlobalSetupObjectReconstructor (and probably when running the whole suite). Running just the test seems to be fine.. so must be resource cleaning somewhere. | 23:57 |
mattoliverau | I know you've discussed it to death, but hey I was sleeping :P I'm going to take a quick debugging look in case fresh eyes help | 23:58 |
peluse | mattoliverau, are you running all unittests? | 23:58 |
peluse | because what you mention is a new one that at least I haven't seen | 23:58 |
mattoliverau | peluse: I wanted to recreate the issue without waiting as long, and seems to trigger if I just run all the tests in TestGlobalSetupObjectReconstructor class. | 23:59 |
clayg | peluse: https://github.com/simplegeo/eventlet/blob/master/eventlet/timeout.py#L76 | 23:59 |
peluse | mattoliverau, so whatever is behind this series of strange issues it seems to depend on how the test is run | 23:59 |
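
The timestamp iterator clayg suggests at 21:07 can be sketched without a Swift checkout. This is a minimal illustration, not the code from the patch: `make_ts_iter` is a hypothetical helper name, and the zero-padded string format is a stand-in assumption for what `utils.Timestamp` would produce. The point is only that starting from a whole second just past "now" and counting upward gives each call a distinct, strictly increasing value, so tests are less likely to trip over sub-second rounding.

```python
import itertools
import time


def make_ts_iter(start=None):
    """Return a generator of strictly increasing whole-second timestamps.

    Hypothetical helper: the formatted string stands in for Swift's
    utils.Timestamp so this sketch runs without Swift installed.
    """
    if start is None:
        # Truncate to a whole second and step past "now", mirroring
        # itertools.count(int(time() + 1)) from the chat.
        start = int(time.time() + 1)
    # Zero-padded fixed-width strings sort the same way as the numbers.
    return ("%016.5f" % t for t in itertools.count(start))


ts_iter = make_ts_iter()
put_ts = next(ts_iter)
delete_ts = next(ts_iter)  # exactly one second after put_ts
```

A test would call `next(ts_iter)` wherever it needs a fresh timestamp, rather than reading the clock repeatedly.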