*** thorst has joined #openstack-powervm | 00:00 | |
*** thorst has quit IRC | 00:02 | |
*** thorst has joined #openstack-powervm | 00:05 | |
*** apearson has joined #openstack-powervm | 00:13 | |
*** thorst has quit IRC | 00:23 | |
*** thorst has joined #openstack-powervm | 00:23 | |
*** thorst has quit IRC | 00:27 | |
*** thorst has joined #openstack-powervm | 00:38 | |
*** edmondsw has joined #openstack-powervm | 00:42 | |
*** edmondsw has quit IRC | 00:47 | |
*** svenkat has quit IRC | 01:09 | |
*** svenkat has joined #openstack-powervm | 01:12 | |
*** svenkat has quit IRC | 01:19 | |
*** thorst has quit IRC | 01:24 | |
*** apearson has quit IRC | 01:30 | |
*** apearson has joined #openstack-powervm | 01:30 | |
*** edmondsw has joined #openstack-powervm | 02:30 | |
*** edmondsw has quit IRC | 02:35 | |
*** apearson has joined #openstack-powervm | 02:41 | |
*** apearson has quit IRC | 03:21 | |
*** apearson has joined #openstack-powervm | 03:22 | |
*** apearson has quit IRC | 03:23 | |
*** apearson has joined #openstack-powervm | 03:26 | |
*** esberglu has quit IRC | 03:59 | |
*** thorst has joined #openstack-powervm | 04:25 | |
*** thorst has quit IRC | 04:30 | |
*** esberglu has joined #openstack-powervm | 04:49 | |
*** esberglu has quit IRC | 04:53 | |
*** apearson has quit IRC | 05:00 | |
*** apearson has joined #openstack-powervm | 05:01 | |
*** apearson has quit IRC | 05:01 | |
*** apearson has joined #openstack-powervm | 05:01 | |
*** apearson has quit IRC | 05:02 | |
*** apearson has joined #openstack-powervm | 05:03 | |
*** apearson has quit IRC | 05:03 | |
*** thorst has joined #openstack-powervm | 05:58 | |
*** thorst has quit IRC | 06:03 | |
*** edmondsw has joined #openstack-powervm | 06:06 | |
*** edmondsw has quit IRC | 06:11 | |
*** edmondsw has joined #openstack-powervm | 07:54 | |
*** edmondsw has quit IRC | 07:59 | |
*** thorst has joined #openstack-powervm | 07:59 | |
*** thorst has quit IRC | 08:04 | |
*** esberglu has joined #openstack-powervm | 08:28 | |
*** esberglu has quit IRC | 08:32 | |
*** edmondsw has joined #openstack-powervm | 09:43 | |
*** edmondsw has quit IRC | 09:47 | |
*** thorst has joined #openstack-powervm | 10:00 | |
*** thorst has quit IRC | 10:05 | |
*** esberglu has joined #openstack-powervm | 10:18 | |
*** esberglu has quit IRC | 10:21 | |
*** thorst has joined #openstack-powervm | 10:42 | |
*** thorst has quit IRC | 10:54 | |
*** thorst has joined #openstack-powervm | 10:54 | |
*** thorst has quit IRC | 10:59 | |
*** svenkat has joined #openstack-powervm | 11:40 | |
*** svenkat_ has joined #openstack-powervm | 11:44 | |
*** svenkat has quit IRC | 11:44 | |
*** svenkat_ is now known as svenkat | 11:44 | |
*** smatzek has joined #openstack-powervm | 11:46 | |
*** smatzek_ has joined #openstack-powervm | 11:48 | |
*** smatzek has quit IRC | 11:48 | |
*** thorst has joined #openstack-powervm | 11:51 | |
*** edmondsw has joined #openstack-powervm | 11:57 | |
*** esberglu has joined #openstack-powervm | 12:04 | |
*** esberglu has quit IRC | 12:09 | |
*** esberglu has joined #openstack-powervm | 12:58 | |
esberglu | #startmeeting powervm_driver_meeting | 13:01 |
---|---|---|
openstack | Meeting started Tue Aug 8 13:01:02 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot. | 13:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 13:01 |
openstack | The meeting name has been set to 'powervm_driver_meeting' | 13:01 |
mdrabe | o/ | 13:01 |
edmondsw | o/ | 13:01 |
esberglu | #topic In Tree Driver | 13:02 |
esberglu | #link https://etherpad.openstack.org/p/powervm-in-tree-todos | 13:02 |
esberglu | I don't think there is anything new IT | 13:03 |
edmondsw | right | 13:03 |
esberglu | #topic Out Of Tree Driver | 13:03 |
edmondsw | thorst please check 5645 | 13:04 |
thorst | edmondsw: yes sir. | 13:05 |
edmondsw | I think that's all we've got going OOT at the moment | 13:05 |
esberglu | #topic PCI Passthrough | 13:06 |
esberglu | Anything new here? | 13:06 |
edmondsw | I don't think we've made any progress here yet. efried is finishing up some auth work and then we can start to make progress | 13:07 |
efried | o/ | 13:07 |
efried | Yeah, what edmondsw said. | 13:07 |
esberglu | #topic PowerVM CI | 13:08 |
esberglu | Tested the devstack-generated tempest.conf one last time for all runs last night; all looked good | 13:09 |
edmondsw | great | 13:09 |
esberglu | Got the +2 from edmondsw, anyone else want to look before I merge? | 13:09 |
esberglu | Tempest bugs are getting worked through | 13:10 |
edmondsw | do we need to be opening a LP bug about those 2 tests having the same id? | 13:10 |
efried | esberglu I don't need to look again. | 13:10 |
esberglu | edmondsw: I think that it is intentional for those 2 | 13:10 |
efried | If it's tested and edmondsw is happy, I'm happy. | 13:10 |
esberglu | They are the same test, just different microversions | 13:10 |
edmondsw | I'd rather we weren't having to skip a couple new tests, but that seems a small price to pay to get this in | 13:11 |
edmondsw | I hope there's a todo to figure that out and get those unskipped? | 13:11 |
esberglu | edmondsw: Yeah I was going to add it to the list once I merged | 13:11 |
edmondsw | yeah, I know it's kinda the same test... still thought they should probably have different ids but maybe not | 13:11 |
edmondsw | esberglu I'd go ahead and add it just to make sure we don't forget :) | 13:12 |
esberglu | I can disable the 2.48 version of the tests by setting the max_microversion | 13:12 |
edmondsw | I'd rather not | 13:12 |
esberglu | But I'm not familiar enough with compute microversions to know if that's really what we want | 13:12 |
esberglu | I didn't think so either | 13:12 |
efried | Can I get some background here? | 13:12 |
efried | Two different tests testing the same thing over different microversions of the API ought to have different UUIDs. I very much doubt that was intentional. | 13:13 |
efried | And we should be able to handle both microversions in our env. If we can't, and that's passing in the world at large, it's our bug. | 13:13 |
edmondsw | efried check 5598 | 13:14 |
esberglu | https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_server_diagnostics.py | 13:14 |
esberglu | I'm guessing whoever made the V248 test there just copied the original test case and didn't change the ID | 13:14 |
edmondsw | efried I expect efried is right, but I didn't look at how the test is actually written... is it one method, so one id, but run twice somehow? | 13:14 |
edmondsw | esberglu ah in that case it does sound like a bug | 13:15 |
efried | esberglu I suspect that's what happened. | 13:15 |
esberglu | Anyways I can look into it | 13:15 |
edmondsw | esberglu open the LP bug... worst case they reject it | 13:15 |
edmondsw | tx | 13:15 |
esberglu | Yep | 13:15 |
esberglu | Other bugs... | 13:15 |
esberglu | There was a bug in tempest where the REST requests would time out | 13:16 |
esberglu | efried made a loop to see if it was permanent or temporary | 13:16 |
esberglu | https://review.openstack.org/#/c/491003/ | 13:16 |
esberglu | With that patched in, we are no longer seeing that timeout | 13:16 |
esberglu | But we still need to find out what's causing the timeout and make a long term solution | 13:17 |
edmondsw | ++ | 13:17 |
esberglu | hsien got to the bottom of the internal server error 500's | 13:17 |
efried | oh, do tell | 13:17 |
edmondsw | sweet | 13:18 |
edmondsw | 5657 | 13:18 |
esberglu | There was an issue with the VIOS busy rc not being honored, so the request wasn't being retried | 13:18 |
efried | btw, that loop fixup should have logged a message when we hit it. We should look for that log message and see how many times it hits per test. I suspect the very next try went through. Which probably means it's a threading problem at the server side of that call. | 13:19 |
esberglu | efried: Will do | 13:19 |
efried | esberglu Another experiment that might be worthwhile is knocking our threading level down. It's possible we're just timing out due to load. | 13:20 |
efried | Though... it seems like it would always hit on one or more of the same three or four tests, nah? | 13:21 |
esberglu | efried: Yeah same handful of tests | 13:21 |
edmondsw | esberglu you also had something about discover_hosts on the agenda? | 13:22 |
edmondsw | did we get that all straight? | 13:22 |
edmondsw | looks like the CI has been better | 13:22 |
esberglu | edmondsw: Was just going to say that our fix is working there | 13:22 |
edmondsw | awesome | 13:22 |
esberglu | Yep with that and efried's retry loop success rates are up | 13:22 |
esberglu | hsien's fix is +2 so should be in soon, then I will update the systems | 13:23 |
efried | edmondsw It needs to be noted that the retry loop is in tempest code, not our code. | 13:23 |
efried | So it's not a long-term fix (unless we can make the case that it should be submitted to tempest itself). | 13:23 |
edmondsw | efried right, we need to figure out what's going on there and how to fix it permanently | 13:24 |
efried | Yeah, cause I don't think it's a good idea for us to be running long-term with a tempest patch. | 13:24 |
edmondsw | ++ | 13:24 |
esberglu | ++ | 13:24 |
edmondsw | that on the todo list, esberglu? | 13:24 |
edmondsw | at the top? :) | 13:25 |
esberglu | edmondsw: I need to do an update of the list after the meeting but yeah it will be | 13:25 |
edmondsw | cool | 13:25 |
edmondsw | I was going to ask about http://184.172.12.213/92/474892/6/check/nova-in-tree-pvm/2922a78/ | 13:25 |
edmondsw | I'm pretty sure I've seen that kind of failure before... but can't remember where it ended up | 13:26 |
esberglu | edmondsw: Yeah I saw that. I think when I removed a bunch of tests from the skip list with the networking api extension change some may have introduced new issues | 13:26 |
esberglu | I know we have had those before, can't remember what our solution was | 13:27 |
edmondsw | ok, that makes sense. cuz I thought we'd fixed that, but it was probably with a skip | 13:27 |
esberglu | edmondsw: IIRC it's an issue with tests interfering with each other | 13:28 |
esberglu | That's all for CI | 13:29 |
esberglu | #topic Driver Testing | 13:29 |
esberglu | Any progress? | 13:29 |
edmondsw | I opened RTC stories for testing | 13:30 |
edmondsw | I ordered them such that we'd validate vSCSI, FC, and LPM with the OOT driver before coming back to iSCSI | 13:30 |
edmondsw | give us some time to do the dev work on iSCSI | 13:30 |
edmondsw | don't see jay1_ on to discuss further | 13:31 |
edmondsw | chhavi fyi ^ | 13:31 |
esberglu | #topic Open Discussion | 13:33 |
esberglu | Any last words? | 13:33 |
edmondsw | I finally got devstack working! ;) | 13:33 |
edmondsw | so there are a bunch of additions to https://etherpad.openstack.org/p/powervm_stacking_issues | 13:33 |
esberglu | Woohoo! | 13:33 |
edmondsw | that last one was really weird... hope that's really the fix, and it wasn't just coincidence that it worked after that | 13:34 |
edmondsw | I'm pretty sure it's legit | 13:34 |
edmondsw | that's it from me | 13:35 |
esberglu | Thanks for joining | 13:35 |
esberglu | #endmeeting | 13:35 |
openstack | Meeting ended Tue Aug 8 13:35:32 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 13:35 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.html | 13:35 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.txt | 13:35 |
openstack | Log: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.log.html | 13:35 |
*** smatzek_ has quit IRC | 13:43 | |
*** smatzek has joined #openstack-powervm | 14:01 | |
*** smatzek has quit IRC | 14:05 | |
*** smatzek has joined #openstack-powervm | 14:06 | |
*** tjakobs has joined #openstack-powervm | 14:07 | |
*** smatzek has quit IRC | 14:46 | |
thorst | efried: 5620 - I'm going to merge that | 14:56 |
efried | thorst Do it. | 14:56 |
*** smatzek has joined #openstack-powervm | 15:22 | |
esberglu | CI environment is updated with hsien's vios busy fix. Should hopefully see a decent chunk of failures disappear | 15:37 |
efried | esberglu Cool. Confirm that one-and-done behavior of that timeout yet? | 15:59 |
*** smatzek has quit IRC | 15:59 | |
*** smatzek has joined #openstack-powervm | 16:00 | |
edmondsw | esberglu efried timeouts don't seem to be a thing of the past yet: http://184.172.12.213/94/490994/3/check/nova-in-tree-pvm/5ce0e6c/ | 16:29 |
esberglu | efried: Nope not yet | 16:32 |
efried | edmondsw esberglu I do believe those are *real* timeouts. | 16:32 |
efried | actual test timeouts. | 16:32 |
esberglu | edmondsw: efried: Yep. Those are timeouts waiting for servers to reach a certain status | 16:32 |
esberglu | Whereas the other timeouts were REST request timeouts | 16:33 |
edmondsw | i.e. indicating there is a problem with the patch being proposed, or something for us to dig into? | 16:33 |
efried | But it's those same effin tests still. | 16:34 |
esberglu | Not diagnosed fully yet. Definitely something we need to dig into | 16:34 |
efried | edmondsw The patch isn't being proposed :) At least not yet. It's a stopgap to help us identify whether there's an actual problem that needs to be solved. | 16:34 |
edmondsw | efried not what I meant | 16:34 |
edmondsw | I meant the patch that the CI is testing, not the patch that is supposed to help us avoid timeouts | 16:35 |
efried | gotcha. | 16:35 |
efried | Well, the fact that it's these same bloody tests... | 16:35 |
edmondsw | yeah, it's not a good sign | 16:35 |
efried | esberglu Tried bumping the concurrency down yet? | 16:35 |
esberglu | efried: We did that at some point in the past with no success | 16:36 |
efried | Wellllll | 16:36 |
efried | That would have been a different timeout. | 16:36 |
esberglu | efried: No we tried it for the actual server timeouts | 16:37 |
efried | I'll grant that we MAY still need to figure out why those REST requests are timing out. But the fact that the SAME TESTS are still hitting overall test case timeouts pretty much indicates that, for the function these tests are hitting, our performance sucks. | 16:38 |
efried | That would be corroborated if we don't see the same failures when we reduce the concurrency and/or increase the test timeout. | 16:38 |
efried | And those actions would have been inconclusive before because we were hitting that REST request timeout instead, which was masking this 'un. | 16:39 |
esberglu | The REST timeout wasn't masking the other timeouts | 16:40 |
esberglu | I agree with everything else though | 16:42 |
efried | esberglu The exception we were seeing without the retry loop was in the REST request. It wasn't getting to the overall test timeout, was it? | 16:45 |
esberglu | efried: If a test hit the REST timeout it wouldn't get to the overall timeout. But tests would still get to the overall timeout and not hit the REST timeout | 16:46 |
efried | esberglu Okay, if you're sure which one we were seeing under which configuration. | 16:47 |
*** smatzek has quit IRC | 16:53 | |
efried | esberglu What's the overall single-test timeout? | 16:55 |
esberglu | 1200 | 16:56 |
esberglu | 20 min | 16:56 |
*** kylek3h has joined #openstack-powervm | 17:04 | |
*** smatzek has joined #openstack-powervm | 17:17 | |
*** chhavi has quit IRC | 17:33 | |
esberglu | efried: edmondsw: I think the fix from hsien is causing problems | 18:28 |
efried | oh goodie. What kind of problems? | 18:28 |
efried | We getting extra VIOS_BUSYs? | 18:28 |
esberglu | efried: Not sure. Just got back from lunch and seeing a bunch of runs like this | 18:28 |
esberglu | http://184.172.12.213/66/491866/1/check/nova-out-of-tree-pvm/d069a7f/powervm_os_ci.html | 18:28 |
esberglu | Could be something else and just coincidental timing, diving into the logs now | 18:29 |
edmondsw | lovely | 18:30 |
esberglu | Looks more like what we would see with a cells/placement issue | 18:31 |
efried | esberglu Yeah, compute didn't even start. | 18:31 |
esberglu | efried: Looks like nothing started | 18:34 |
efried | yeah | 18:34 |
efried | though stack appears to have succeeded. | 18:34 |
esberglu | Well this should be fun to debug | 18:35 |
esberglu | efried: I have a test node up for something else that hit this | 18:37 |
esberglu | "Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable." | 18:37 |
esberglu | Might just not be logging properly | 18:37 |
efried | uck | 18:38 |
edmondsw | esberglu I was seeing log rotation while I was trying to stack the other day. Very annoying | 18:41 |
edmondsw | hit an issue, go away for a few hours and then when you come back to it the logs aren't there anymore | 18:41 |
edmondsw | I'm not a fan of journalctl so far... | 18:42 |
esberglu | Me neither | 18:42 |
edmondsw | hopefully there's a way to get it to save gzipped log files? | 18:45 |
esberglu | Get what to save gzipped log files? | 18:45 |
esberglu | This journalctl issue is hitting the passing runs too | 18:46 |
esberglu | So we don't have CI logs for the time being | 18:46 |
edmondsw | journald | 18:47 |
efried | There must be a way to set the rotate size | 18:47 |
esberglu | /etc/systemd/journald.conf | 18:47 |
esberglu | That's the conf for it, looking at the options now | 18:48 |
edmondsw | I don't see anything about rotation, just retention | 18:49 |
edmondsw | which probably makes sense... if it's managing everything for you without files, there aren't files to rotate | 18:50 |
esberglu | Looks like it saves the journals to /run/log/journal | 18:51 |
esberglu | I wonder if we are filling that up | 18:51 |
edmondsw | look at the "MaxFile" ones | 18:52 |
edmondsw | journalctl does say that "Output is interleaved from all accessible journal files, whether they are rotated or currently being written", so it's not that it's only reading from the current file | 18:54 |
edmondsw | esberglu it looks to me like we should unset SystemMaxFiles and RuntimeMaxFiles and instead use SystemMaxFileSize and RuntimeMaxFileSize | 18:56 |
edmondsw | oh, nm... you need both | 18:57 |
edmondsw | but one or the other probably needs to be increased | 18:58 |
esberglu | edmondsw: Spawning a system to give it a try | 19:00 |
*** dwayne has quit IRC | 19:10 | |
esberglu | edmondsw: Tried using the RuntimeMaxFileSize with no luck. There is another option to save the logs to /var/log/journal | 21:01 |
esberglu | instead of /run/log/journal | 21:01 |
esberglu | Which seems to have worked | 21:01 |
edmondsw | cool | 21:01 |
esberglu | Trying it out on a fresh system for a full run now | 21:02 |
*** svenkat has quit IRC | 21:04 | |
*** thorst has quit IRC | 21:37 | |
*** edmondsw has quit IRC | 21:38 | |
*** smatzek has quit IRC | 22:03 | |
*** esberglu has quit IRC | 22:03 | |
*** esberglu has joined #openstack-powervm | 22:04 | |
*** esberglu has quit IRC | 22:08 | |
*** esberglu has joined #openstack-powervm | 22:16 | |
*** tjakobs has quit IRC | 22:34 | |
*** thorst has joined #openstack-powervm | 22:38 | |
*** thorst has quit IRC | 22:43 | |
*** apearson has joined #openstack-powervm | 23:32 | |
*** apearson has quit IRC | 23:40 |
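
For reference, the stopgap discussed in the meeting (a retry loop around the tempest REST request that logs a message each time it fires) could look roughly like the sketch below. This is not the patch from review 491003; `call_with_retry`, its parameters, and the use of `TimeoutError` are hypothetical names chosen for illustration. The point is the shape: retry a bounded number of times, and log every retry so the logs can later show whether the very next attempt went through.

```python
import logging
import time

LOG = logging.getLogger(__name__)


def call_with_retry(request, attempts=3, delay=1.0, retry_on=(TimeoutError,)):
    """Call request() and retry on timeout, logging each retry.

    Hypothetical helper: `request` is any zero-argument callable standing in
    for the REST call; `retry_on` lists the exception types treated as
    temporary failures.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except retry_on as exc:
            if attempt == attempts:
                # Out of attempts: surface the original timeout to the test.
                raise
            # This message is what you would grep for to count retries per test.
            LOG.warning("REST request timed out (attempt %d of %d): %s; retrying",
                        attempt, attempts, exc)
            time.sleep(delay)
```

Counting the warning lines per test run would show whether the timeout is "one-and-done" (a single retry always succeeds) or persistent, which is the question raised at 13:19 and 15:59.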
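
On the journald issue from 18:37 onward: the log does not name the option that moved the journals from /run/log/journal to /var/log/journal. In `journald.conf` that behavior is governed by the `Storage=` setting in the `[Journal]` section, so the sketch below assumes that is what was changed. It is a small, hypothetical check a CI node could run against the text of /etc/systemd/journald.conf; the function names are made up for illustration.

```python
import configparser


def journal_storage_mode(conf_text):
    """Return the Storage= value from journald.conf text ("auto" if unset)."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names as written (e.g. "Storage")
    parser.read_string(conf_text)
    # Commented-out defaults such as "#Storage=auto" are ignored by the parser.
    return parser.get("Journal", "Storage", fallback="auto")


def writes_to_var_log(mode, var_log_journal_exists):
    """Whether journald keeps journals under /var/log/journal for this mode."""
    if mode == "persistent":
        return True  # journald creates /var/log/journal if it is missing
    if mode == "auto":
        return var_log_journal_exists  # otherwise falls back to /run/log/journal
    return False  # "volatile" uses /run/log/journal only; "none" stores nothing


sample = "[Journal]\n#Storage=auto\nStorage=persistent\nRuntimeMaxFileSize=64M\n"
print(journal_storage_mode(sample))
```

With `Storage=auto` (the default) and no /var/log/journal directory, journals live only under /run, which matches the location seen at 18:51.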