*** thorst has joined #openstack-powervm | 00:12 | |
*** thorst has quit IRC | 00:13 | |
*** thorst has joined #openstack-powervm | 00:43 | |
*** smatzek has joined #openstack-powervm | 00:59 | |
*** thorst has quit IRC | 01:02 | |
*** jayasankar_ has joined #openstack-powervm | 01:05 | |
*** thorst has joined #openstack-powervm | 01:11 | |
*** thorst has quit IRC | 01:11 | |
*** AndyWojo has quit IRC | 01:25 | |
*** AndyWojo has joined #openstack-powervm | 01:27 | |
*** smatzek has quit IRC | 01:57 | |
*** thorst has joined #openstack-powervm | 02:12 | |
*** thorst has quit IRC | 02:17 | |
*** esberglu has joined #openstack-powervm | 02:36 | |
*** esberglu has quit IRC | 02:40 | |
*** thorst has joined #openstack-powervm | 03:13 | |
*** thorst has quit IRC | 03:33 | |
*** thorst has joined #openstack-powervm | 04:30 | |
*** thorst has quit IRC | 04:34 | |
*** thorst has joined #openstack-powervm | 06:31 | |
*** thorst has quit IRC | 06:35 | |
*** k0da has joined #openstack-powervm | 08:06 | |
*** thorst has joined #openstack-powervm | 08:33 | |
*** thorst has quit IRC | 08:52 | |
*** thorst has joined #openstack-powervm | 09:47 | |
*** k0da has quit IRC | 09:52 | |
*** thorst has quit IRC | 10:19 | |
*** jayasankar_ has quit IRC | 10:39 | |
*** jayasankar_ has joined #openstack-powervm | 10:52 | |
*** smatzek has joined #openstack-powervm | 11:10 | |
*** thorst has joined #openstack-powervm | 11:20 | |
*** thorst has quit IRC | 11:25 | |
*** thorst has joined #openstack-powervm | 11:45 | |
thorst | efried adreznec: know of any tools that test the underlying cloud performance. Not things like how well do the APIs scale (that's Rally), but how well does I/O run across X VMs with Y hosts | 12:41 |
---|---|---|
*** apearson has joined #openstack-powervm | 12:44 | |
*** edmondsw has joined #openstack-powervm | 12:45 | |
*** nbante has joined #openstack-powervm | 12:45 | |
thorst | looks like there is a SPECCloud benchmark | 12:48 |
*** mdrabe has joined #openstack-powervm | 12:54 | |
*** esberglu has joined #openstack-powervm | 12:54 | |
esberglu | #startmeeting powervm_driver_meeting | 13:00 |
openstack | Meeting started Tue Mar 28 13:00:05 2017 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot. | 13:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 13:00 |
openstack | The meeting name has been set to 'powervm_driver_meeting' | 13:00 |
thorst | o/ | 13:00 |
efried | o/ | 13:01 |
esberglu | #topic In Tree Driver | 13:02 |
esberglu | efried: I was looking through the changesets and CI really didn't like power on/off and after | 13:02 |
esberglu | Oh wait nvm | 13:03 |
efried | Yeah, I don't think that was us. | 13:03 |
efried | Something fundamental done broke. | 13:03 |
esberglu | Yeah I think it was an issue with certain CIs | 13:03 |
esberglu | But looks like verified +1s now | 13:03 |
esberglu | So I owe you a couple reviews | 13:04 |
efried | I don't know if you caught this comment last week, but... | 13:04 |
efried | If you ever feel down about our CI success rate, just look at freakin xenserver. | 13:05 |
efried | That guy fails like 50% of the time. | 13:05 |
thorst | :-) | 13:05 |
efried | And that one is gating. | 13:05 |
efried | Does anyone have an OOT setup with recent (last 2 weeks, say) nova code underneath it where we can verify that glance bug? | 13:07 |
esberglu | Yeah I think we are fine from a success rate view. At least most days :-) | 13:07 |
efried | I'm still restacking my test system to debug it. But I need to prove that it affects OOT so I can open a launchpad bug. | 13:07 |
thorst | efried: I don't | 13:07 |
adreznec | efried: Nope, still on Ocata over here | 13:09 |
jayasankar_ | efried: I'm reconfiguring neo34 for OOT, got stuck with issues, which I'm looking into .. | 13:09 |
efried | Okay. | 13:09 |
efried | Otherwise in-tree just need reviews, at least up to 'console'. (I don't want to move SSP into the ready list until we figure this bug out.) | 13:10 |
efried | thorst I may need your help with the bug | 13:10 |
efried | "Monkey patch the glance API code in nova" is my only solution right now. | 13:11 |
thorst | uhhh, that's awful | 13:11 |
thorst | if you have a setup where it's borked I can take a peek | 13:11 |
efried | Yeah, I assume that's not a viable solution. | 13:11 |
thorst | totes not viable. | 13:11 |
efried | thorst I ought to have that by the time this meeting is over. Stacking now. And that always succeeds. | 13:12 |
thorst | cool | 13:12 |
adreznec | Should be ready in 10 minutes then efried | 13:12 |
adreznec | :) | 13:12 |
efried | btw, wanna queue up a topic for after the meeting: I have a sneaking suspicion that, when a system has been running for a long time, things go pear-shaped. | 13:12 |
esberglu | noted | 13:13 |
adreznec | That sounds bad, but ok | 13:13 |
thorst | yeah, curious about that too...because we've been running CI for months | 13:13 |
thorst | but...post scrum topic | 13:13 |
esberglu | #topic OOT Driver | 13:14 |
esberglu | Anyone have anything here? | 13:14 |
efried | Wellll... | 13:14 |
efried | I've been accumulating changes from in-tree to backport to OOT. | 13:15 |
efried | I have some of them in a (not-yet-proposed) commit. | 13:15 |
efried | But some things have come up that will require a much wider effort. | 13:15 |
efried | Like autospeccing. | 13:15 |
thorst | I know Shyama will be proposing fixes for LPM w.r.t. Cinder and File backed volumes. | 13:15 |
thorst | she's taking over a change set from me | 13:15 |
efried | I guess I don't really have an action item to propose here, but I do want to announce that I'll be requiring new UT to autospec anything coming from pypowervm from this point forward. | 13:16 |
thorst | fair enough... | 13:16 |
efried | And it won't hurt my feelings if people want to go retrofit existing UTs with autospec. | 13:16 |
adreznec | The ephemeral file support is still on hold until we can get those pesky REST changes implemented. Probably a couple sprints out still tbh | 13:16 |
thorst | adreznec: and then we need pypowervm updates? | 13:17 |
adreznec | Yeah | 13:17 |
adreznec | Once the REST side is done | 13:17 |
thorst | good thing we have a new versioning approach there | 13:17 |
adreznec | :) | 13:17 |
adreznec | Yeah we'll have to keep that as a topic | 13:17 |
adreznec | Deciding when we need to do a version bump there | 13:18 |
adreznec | FYI it looks like the change to add a global-reqs job for nova-powervm got stuck (https://review.openstack.org/#/c/440852/) | 13:20 |
adreznec | The corresponding deps merged but it didn't go in. Just bumped it | 13:21 |
adreznec | Do we want to add g-r jobs for networking-powervm and ceilometer-powervm? | 13:21 |
esberglu | Probably | 13:21 |
thorst | we should I'd think | 13:21 |
adreznec | Ok | 13:21 |
adreznec | I can toss those up a bit later here | 13:22 |
adreznec | Fairly straightforward | 13:22 |
esberglu | Cool | 13:22 |
esberglu | Anything else OOT before we move on? | 13:22 |
esberglu | #topic CI | 13:24 |
esberglu | I've got a bunch of stuff here | 13:24 |
esberglu | I believe we are ready to move up the IT CI patches to console? | 13:25 |
esberglu | And then add the corresponding whitelist change | 13:25 |
adreznec | Sounds like it | 13:26 |
esberglu | Then we can start getting some volume through and hunt down any issues | 13:26 |
esberglu | So I will put up that patch today | 13:26 |
esberglu | Other than that there are a few things I want to get working | 13:27 |
esberglu | I want to get all branches running on master tempest | 13:27 |
esberglu | ocata and master are fine | 13:27 |
esberglu | newton is passing everything but 3 tests | 13:27 |
esberglu | So I need to figure those failures out and then we can move it up for newton | 13:28 |
esberglu | I also want to get the undercloud moved from newton to ocata | 13:28 |
*** jwcroppe has quit IRC | 13:28 | |
esberglu | It seems like we have a lull where I can try to get that going on staging | 13:28 |
esberglu | I'm guessing it's going to be a bigger endeavor than just checking out a different branch | 13:29 |
esberglu | Then the last big change is to fix the goofy networking stuff | 13:29 |
esberglu | Right now the IT and OOT networking is different | 13:30 |
thorst | did we ever dig up that OVS note? | 13:30 |
esberglu | And OOT networks are being created in prep_devstack.sh while IT is using the os_ci_tempest.sh | 13:31 |
esberglu | And it's just bad | 13:31 |
*** smatzek has quit IRC | 13:31 | |
esberglu | thorst: Was gonna talk to you about that today if you have time | 13:31 |
thorst | I'm free between 12-3 to chat about that | 13:32 |
thorst | just need to find that note...I have no idea where that thing is :-) | 13:32 |
*** dwayne__ has joined #openstack-powervm | 13:32 | |
esberglu | Okay I'll hunt it down after this | 13:32 |
thorst | I seem to remember me thinking it was brilliant at the time, but I've since forgotten what that idea is | 13:32 |
esberglu | That's all I have for CI | 13:34 |
jayasankar_ | esberglu: We don't have any tests specific to SVC + FC in CI right ? | 13:35 |
thorst | jayasankar_: we do not. | 13:37 |
thorst | no cinder in the CI | 13:37 |
jayasankar_ | Okay.. | 13:37 |
esberglu | Yep. That's why we are having you take a look | 13:37 |
adreznec | jayasankar_: The only storage in the CI today is SSP | 13:38 |
efried | And using remote upload, at that. | 13:38 |
efried | which is why we didn't see problems three weeks ago. | 13:38 |
esberglu | #topic Open Discussion | 13:39 |
esberglu | efried: You had something here? | 13:39 |
efried | My test system was up, not doing anything, for a couple of weeks. | 13:39 |
efried | When I got back to it, it was broken. | 13:40 |
efried | I've been looking at it while we've been talking, here, and I believe I've narrowed it down to the VIOS being hosed. | 13:40 |
efried | I know at least the cluster is screwed. | 13:40 |
efried | At the moment I'm trying to figure out if it could be because another system was in the cluster, and it may have inadvertently used the cluster disks for something. | 13:41 |
adreznec | Networking issues maybe? | 13:41 |
thorst | adreznec: networking never fails | 13:41 |
efried | Mm, could be part of it, I suppose. Got a weird error listing the cluster - it was saying the localhost was only reachable through the repository disk. | 13:41 |
efried | Anyway, purely anecdotally, this isn't the first time I've experienced this - left a neo alone for "a while" and come back to find it borked. | 13:42 |
*** jwcroppe has joined #openstack-powervm | 13:42 | |
adreznec | We've had systems up and running for many weeks without notable issues | 13:42 |
thorst | efried: could be shared disk issues. | 13:42 |
efried | Okay, we have? Then I'm happy. | 13:42 |
efried | Yeah. | 13:42 |
efried | I need to be reminded where that SAN is so I can make sure those disks are gone from the other neo. | 13:42 |
efried | And I'll contact Uma to see if she can recover it to some normal state. I can't get anything going wrt the cluster right now. | 13:43 |
nbante | esberglu: I need help configuring tempest in OSA. I've been stuck there for the last few weeks. | 13:43 |
esberglu | nbante: I'm in the same boat. I just got an OSA deployment to complete the full run_playbooks script yesterday for the first time since picking OSA back up | 13:45 |
thorst | efried: I can send you the v7k | 13:45 |
nbante | nice.. | 13:45 |
nbante | I faced so many issues during setup, but now I'm stuck in tempest | 13:46 |
adreznec | nbante: esberglu are these AIO? | 13:46 |
esberglu | Yeah mine is | 13:46 |
adreznec | If so, once the AIO is running you should just be able to use the gate-check-commit.sh script in the OSA repo I think | 13:47 |
adreznec | A subset of which is running tempest against the AIO | 13:47 |
nbante | AIO? | 13:47 |
adreznec | All in One | 13:47 |
esberglu | All in one | 13:47 |
nbante | ok | 13:47 |
adreznec | That'll do a bit more than just tempest, but it'll be the same level of testing they'd do in the gate | 13:48 |
adreznec | Which is what we'd ideally want | 13:48 |
nbante | adreznec: do you have a link where I can get that script? I'll try to run that as well. | 13:50 |
*** k0da has joined #openstack-powervm | 13:50 | |
adreznec | nbante: It's in the scripts subdirectory of the main OSA repo | 13:50 |
adreznec | https://github.com/openstack/openstack-ansible/blob/master/scripts/gate-check-commit.sh | 13:50 |
adreznec | So if you have OSA cloned down, you should already have it | 13:50 |
adreznec | in openstack-ansible/scripts/ | 13:50 |
nbante | I already cloned it down, so it should be there. Will try to run it and share the result with you | 13:51 |
*** smatzek has joined #openstack-powervm | 13:54 | |
esberglu | Any final topics before I end the meeting? | 13:56 |
jayasankar_ | is there any planned schedule for IT deliverable ? | 13:57 |
jayasankar_ | both IT and OOT ? or it is like by 2Q we have to complete both ? | 13:58 |
thorst | jayasankar_: the OOT is there today. IT needs to be done as patches are proposed up | 14:01 |
thorst | the core reviewers hold the key to when things get merged in... | 14:01 |
thorst | (we are not core reviewers) | 14:01 |
thorst | so the net is, IT needs to be tested as efried proposes them up :-) | 14:01 |
*** edmondsw has quit IRC | 14:02 | |
jayasankar_ | Okay. | 14:02 |
esberglu | Thanks for joining | 14:03 |
esberglu | #endmeeting | 14:03 |
openstack | Meeting ended Tue Mar 28 14:03:21 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 14:03 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.html | 14:03 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.txt | 14:03 |
openstack | Log: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-03-28-13.00.log.html | 14:03 |
efried | Okay, one more time - how do I see the size of a PV in ioscli? | 14:04 |
*** edmondsw has joined #openstack-powervm | 14:05 | |
*** nbante has quit IRC | 14:07 | |
efried | got it. geez | 14:07 |
*** burgerk has joined #openstack-powervm | 14:28 | |
*** edmondsw has quit IRC | 14:38 | |
*** edmondsw has joined #openstack-powervm | 14:41 | |
*** tjakobs has joined #openstack-powervm | 14:42 | |
*** jpasqualetto has joined #openstack-powervm | 14:59 | |
*** kylek3h_ has joined #openstack-powervm | 15:08 | |
*** kylek3h has quit IRC | 15:10 | |
*** openstackgerrit has joined #openstack-powervm | 16:34 | |
openstackgerrit | Shyama proposed openstack/nova-powervm master: Update file i/o to support slot map https://review.openstack.org/432322 | 16:34 |
*** jayasankar_ has quit IRC | 16:41 | |
*** nbante has joined #openstack-powervm | 16:58 | |
openstackgerrit | Shyama proposed openstack/nova-powervm master: Update file i/o to support slot map https://review.openstack.org/432322 | 17:15 |
*** shyama has joined #openstack-powervm | 17:20 | |
*** nbante has quit IRC | 18:00 | |
*** shyama has quit IRC | 18:25 | |
thorst | efried: I'm going to +2 this. Any reason to hold off on a W+1 ? https://review.openstack.org/#/c/448381/2 | 18:39 |
efried | thorst fine by me | 18:48 |
openstackgerrit | Merged openstack/nova-powervm master: Config Option for Endpoint Type for Swift https://review.openstack.org/448381 | 19:02 |
*** jpasqualetto has quit IRC | 19:15 | |
*** jpasqualetto has joined #openstack-powervm | 19:28 | |
*** mdrabe has quit IRC | 19:35 | |
efried | thorst adreznec Well, in my latest stack, the upload is hanging the entire compute process. | 19:56 |
*** mdrabe has joined #openstack-powervm | 19:59 | |
efried | It's hanging on open() of the pipe. | 20:17 |
efried | wtf could cause that?? | 20:17 |
*** smatzek has quit IRC | 20:20 | |
thorst | efried: we had that for a long while...I'm thinking.... | 20:22 |
efried | open() shouldn't be able to hang. There's no such thing as a non-advisory file lock in Linux, is there? | 20:23 |
thorst | I thought when we had the 'open' hang it was actually hanging on the upload pipe | 20:24 |
thorst | and it was a super esoteric code path to figure that out | 20:24 |
efried | Well, I've put debug printfs in the glance code itself, and I get my debug statement right before open() and not after. | 20:25 |
efried | So it's not trying to write anything yet. | 20:25 |
efried | And yes, it's trying to open() the fifo itself. | 20:25 |
efried | btw, I can reproduce the hang by trying to echo > the fifo from the shell. So the hang is at the syscall level. | 20:28 |
adreznec | efried: thorst So I'm not all that familiar with the new upload code, but are we setting os.O_NONBLOCK on the pipe when we open it? | 20:29 |
efried | "we" aren't opening it. | 20:29 |
efried | nova/image/glance.py is opening it. | 20:29 |
efried | And no, it's just saying 'wb' | 20:29 |
efried | Actually, is O_NONBLOCK an available flag on open()? I don't think it is. | 20:31 |
efried | It's available on lock | 20:32 |
efried | Any case, we don't have control over that. | 20:34 |
efried | But hey, it looks like it may actually be because the reader needs to open the sucker first. | 20:34 |
efried | So - why isn't the REST API opening the pipe from their end? | 20:34 |
efried | This is gonna be fun to figure out. | 20:34 |
efried | Okay, I manually called the dummy REST API upload function from an ipython session, and it kicked the compute thread in the pants, allowing it to complete (though it still errored). | 20:41 |
efried | Could I be single-threaded?? | 20:41 |
efried | How would I find that out? | 20:41 |
efried | Hm. | 20:45 |
efried | Is there a reason we're threading this at all? | 20:45 |
efried | Does the upload_file thread actually block until the pipe is fully written?? | 20:46 |
*** esberglu has quit IRC | 20:47 | |
*** esberglu has joined #openstack-powervm | 20:48 | |
*** esberglu has quit IRC | 20:52 | |
efried | rat farts. it does. | 20:57 |
efried | So this hangs the entire compute process. | 21:03 |
efried | Like, everything stops. | 21:04 |
*** apearson has quit IRC | 21:20 | |
efried | adreznec thorst How can I tell how many threads I've got? | 21:20 |
efried | neo@neo40:/opt/pvm-rest/data/fileupload$ lscpu | 21:21 |
efried | Architecture: ppc64le | 21:21 |
efried | Byte Order: Little Endian | 21:21 |
efried | CPU(s): 16 | 21:21 |
efried | On-line CPU(s) list: 0-15 | 21:21 |
efried | Thread(s) per core: 8 | 21:21 |
efried | Core(s) per socket: 1 | 21:21 |
efried | Socket(s): 2 | 21:21 |
efried | NUMA node(s): 1 | 21:21 |
efried | Model: 2.1 (pvr 004b 0201) | 21:21 |
efried | Model name: POWER8 (architected), altivec supported | 21:21 |
efried | Hypervisor vendor: pHyp | 21:21 |
efried | Virtualization type: para | 21:21 |
efried | L1d cache: 64K | 21:21 |
efried | L1i cache: 32K | 21:21 |
efried | NUMA node0 CPU(s): 0-15 | 21:21 |
efried | So. Why is the hanging open() causing the entire compute process to hang? That is the question of the day. | 21:22 |
efried | Booya, at least I figured out the EINVAL. fsync on a FIFO. | 21:24 |
*** burgerk has quit IRC | 21:28 | |
*** edmondsw has quit IRC | 21:42 | |
*** edmondsw has joined #openstack-powervm | 21:42 | |
*** jpasqualetto has quit IRC | 21:47 | |
*** edmondsw has quit IRC | 21:47 | |
*** thorst has quit IRC | 21:51 | |
*** jpasqualetto has joined #openstack-powervm | 22:04 | |
*** edmondsw has joined #openstack-powervm | 22:12 | |
*** tjakobs has quit IRC | 22:12 | |
*** edmondsw has quit IRC | 22:16 | |
*** jpasqualetto has quit IRC | 22:21 | |
*** k0da has quit IRC | 22:40 | |
*** jwcroppe has quit IRC | 22:49 | |
*** jwcroppe has joined #openstack-powervm | 22:49 | |
*** jwcroppe has quit IRC | 22:50 | |
*** jwcroppe has joined #openstack-powervm | 22:51 | |
*** thorst has joined #openstack-powervm | 22:51 | |
*** thorst has quit IRC | 22:55 | |
*** jwcroppe has quit IRC | 22:56 | |
*** thorst has joined #openstack-powervm | 23:08 | |
thorst | efried: ppc64smt I think? Or just cat /proc/cpuinfo | 23:19 |
*** jwcroppe has joined #openstack-powervm | 23:22 | |
*** thorst has quit IRC | 23:25 | |
*** thorst has joined #openstack-powervm | 23:25 | |
*** thorst has quit IRC | 23:29 | |
*** thorst has joined #openstack-powervm | 23:56 |
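
Note on the FIFO discussion (20:17 to 21:24): the behaviour efried describes matches standard POSIX FIFO semantics. A plain write-only open() of a FIFO blocks until some process opens the read end, and O_NONBLOCK is in fact a valid open() flag, in which case a write-only open with no reader fails with ENXIO instead of blocking. On Linux, fsync() on a FIFO returns EINVAL. The following is a minimal sketch of those three behaviours using only the Python standard library; the helper names `probe_fifo_writer` and `fsync_errno_on_fifo` are hypothetical and are not taken from nova, glance, or pypowervm. Why the whole compute process stalled is not settled in the log; if the caller runs under a cooperative threading library, a blocking open() would stall every green thread, but that is an assumption here.

```python
import errno
import os
import tempfile
import threading


def probe_fifo_writer(path):
    """Try a non-blocking write-only open of a FIFO.

    Returns the errno name on failure (ENXIO when no reader has the
    FIFO open), or None if the open succeeded.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    os.close(fd)
    return None


def fsync_errno_on_fifo(path):
    """Open both ends of a FIFO, write one byte, then attempt fsync.

    Returns a dict with the bytes the reader received and the errno
    name raised by fsync (None if fsync succeeded on this platform).
    """
    result = {}

    def reader():
        # Blocks until the writer below opens the other end.
        with open(path, 'rb') as f:
            result['data'] = f.read()

    t = threading.Thread(target=reader)
    t.start()
    # A plain 'wb' open blocks until the reader thread opens the FIFO.
    with open(path, 'wb') as f:
        f.write(b'x')
        f.flush()
        try:
            os.fsync(f.fileno())
            result['fsync'] = None
        except OSError as exc:
            result['fsync'] = errno.errorcode[exc.errno]
    t.join()
    return result


if __name__ == '__main__':
    fifo = os.path.join(tempfile.mkdtemp(), 'demo_fifo')
    os.mkfifo(fifo)
    print('non-blocking open without reader:', probe_fifo_writer(fifo))
    print('fsync on FIFO:', fsync_errno_on_fifo(fifo))
```

The non-blocking probe is only useful for diagnosis: it shows whether a reader is attached without risking a hang. It does not help when the open() call lives in code the caller does not control, which is the situation described at 20:34.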
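
Note on the autospec requirement (13:16): autospeccing makes a mock enforce the real object's call signature, so a unit test fails when the code under test calls a dependency with the wrong arguments. The following is a minimal sketch using `unittest.mock.create_autospec`; `FakeAdapter`, `fetch`, and `rejects_bad_call` are hypothetical stand-ins written for illustration and are not the pypowervm API.

```python
from unittest import mock


class FakeAdapter:
    """Hypothetical stand-in for a class imported from a library."""

    def read(self, root_type, root_id=None):
        raise NotImplementedError


def fetch(adapter, uuid):
    """Code under test: calls the dependency with a valid signature."""
    return adapter.read('LogicalPartition', root_id=uuid)


def rejects_bad_call(mocked):
    """Return True if the mock raises TypeError for a call whose
    arguments do not match FakeAdapter.read's signature."""
    try:
        mocked.read(1, 2, 3, 4)
    except TypeError:
        return True
    return False


if __name__ == '__main__':
    loose = mock.Mock()
    strict = mock.create_autospec(FakeAdapter, instance=True)

    # A plain Mock accepts any arguments, hiding signature mistakes.
    print('plain Mock rejects bad call:', rejects_bad_call(loose))
    # An autospecced mock checks arguments against the real method.
    print('autospec rejects bad call:', rejects_bad_call(strict))

    fetch(strict, 'abc')
    strict.read.assert_called_once_with('LogicalPartition', root_id='abc')
```

The same effect is available through `mock.patch(..., autospec=True)` when patching a name in place rather than building the mock by hand.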