*** adreznec has joined #openstack-powervm | 00:01 | |
*** apearson has quit IRC | 00:04 | |
*** k0da has quit IRC | 00:08 | |
*** chas__ has quit IRC | 00:08 | |
*** chas has joined #openstack-powervm | 00:08 | |
*** chas has quit IRC | 00:13 | |
*** chas has joined #openstack-powervm | 03:10 | |
*** chas has quit IRC | 03:15 | |
*** adreznec has quit IRC | 03:34 | |
*** adreznec has joined #openstack-powervm | 03:34 | |
*** chas has joined #openstack-powervm | 05:11 | |
*** chas has quit IRC | 05:16 | |
*** kotra03 has joined #openstack-powervm | 05:22 | |
*** chas has joined #openstack-powervm | 07:12 | |
*** chas has quit IRC | 07:17 | |
*** chas has joined #openstack-powervm | 07:25 | |
*** k0da has joined #openstack-powervm | 08:21 | |
*** k0da has quit IRC | 09:18 | |
*** jpasqualetto has joined #openstack-powervm | 10:44 | |
*** jpasqualetto has quit IRC | 11:34 | |
*** smatzek has joined #openstack-powervm | 11:41 | |
*** jpasqualetto has joined #openstack-powervm | 11:46 | |
*** efried has quit IRC | 12:19 | |
*** efried has joined #openstack-powervm | 12:20 | |
*** tblakes has joined #openstack-powervm | 13:00 | |
*** kotra03 has quit IRC | 13:06 | |
*** svenkat has joined #openstack-powervm | 13:28 | |
*** thorst has joined #openstack-powervm | 13:30 | |
efried | thorst, adreznec, esberglu: agenda item for today's meeting: We need to figure out how to have separate skip lists for CI run against in-tree and out-of-tree. | 13:38 |
---|---|---|
*** mdrabe has joined #openstack-powervm | 13:44 | |
thorst | efried: https://wiki.openstack.org/wiki/Meetings/Nova | 13:50 |
thorst | per our earlier discussion | 13:50 |
thorst | and agree on topic | 13:50 |
thorst | efried: you also need to chat with qing wu about how we want the driver work to continue. Now that you're back I think the two of you work as a team on it. | 13:51 |
*** jwcroppe has quit IRC | 13:53 | |
*** jwcroppe has joined #openstack-powervm | 13:53 | |
*** wangqwsh has joined #openstack-powervm | 13:54 | |
*** apearson has joined #openstack-powervm | 13:55 | |
*** jwcroppe_ has joined #openstack-powervm | 13:57 | |
*** tblakes has quit IRC | 13:58 | |
*** jwcroppe has quit IRC | 13:58 | |
efried | thorst, ack. | 14:02 |
efried | Are adreznec and/or esberglu working today? To run the meeting? | 14:02 |
thorst | checking... | 14:02 |
thorst | I don't see either online | 14:04 |
thorst | we could have an informal meeting | 14:04 |
efried | We need esberglu for my topic. | 14:05 |
thorst | true... | 14:05 |
thorst | but we have wangqwsh here, so we could have a nova driver discussion | 14:05 |
efried | yuh | 14:05 |
adreznec | Sorry, running late today | 14:06 |
adreznec | I think Eric is out though | 14:06 |
adreznec | Until next week or something | 14:06 |
thorst | do we want to talk through the driver stuff? | 14:08 |
thorst | namespace change...how the WIP change sets are going...etc... | 14:08 |
adreznec | May as well | 14:08 |
adreznec | #startmeeting powervm_driver_meeting | 14:08 |
openstack | Meeting started Tue Jan 3 14:08:45 2017 UTC and is due to finish in 60 minutes. The chair is adreznec. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:08 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:08 |
openstack | The meeting name has been set to 'powervm_driver_meeting' | 14:08 |
adreznec | Super formal now | 14:09 |
thorst | so namespace change? | 14:09 |
adreznec | #topic In-tree driver status | 14:09 |
adreznec | So where did we leave off there? I know the three of us each had discussions before the holidays | 14:10 |
efried | #link https://review.openstack.org/413736 | 14:10 |
adreznec | And efried sent out a note on things | 14:10 |
thorst | I think we all agreed we need a namespace change | 14:10 |
thorst | need being flexible... It's the least evil | 14:10 |
adreznec | Yeah, no good way around it | 14:11 |
thorst | efried's got that link...we have nova_powervm, but in the e-mail he suggested powervm_ext | 14:11 |
thorst | I kinda like powervm_ext better... | 14:11 |
adreznec | yeah, I know you and I had discussed something similar | 14:12 |
adreznec | I'm in agreement as well - makes things clear to users that this is the external driver | 14:12 |
thorst | what did the hyper-v guys do though? | 14:12 |
adreznec | they just called it compute-hyperv | 14:12 |
thorst | I know you had looked that up earlier. | 14:12 |
adreznec | Same as the project name | 14:12 |
thorst | ahh, I think powervm-ext is better? | 14:12 |
thorst | at least for our needs... | 14:12 |
*** tblakes has joined #openstack-powervm | 14:13 | |
efried | I'm fine with that. Any other opinions? | 14:13 |
efried | #action efried to change namespace from nova_powervm to powervm_ext | 14:13 |
adreznec | their full oot namespace is nova.virt.compute_hyperv | 14:13 |
adreznec | I don't think there's any real standard though | 14:14 |
adreznec | So let's go with powervm_ext | 14:14 |
thorst | rip it | 14:14 |
efried | And | 14:14 |
efried | #action wangqwsh to confirm this fixes whatever issues he was seeing. | 14:14 |
efried | But before we merge, we need to resolve the other stuff I brought up. Lemme reread... | 14:15 |
adreznec | Yeah, was just bringing up that email | 14:15 |
thorst | your bullet-point 3...do we need networking-powervm changes as well to support this | 14:15 |
thorst | basically, one solution would be the networking-powervm agent listens to powervm or powervm-ext... | 14:16 |
adreznec | Yep | 14:16 |
thorst | though, I'm not sure that is really the case... | 14:16 |
adreznec | I guess that depends | 14:16 |
thorst | I think both agent types know how to deal with pvm_sea, much like KVM, hyper-v and PowerVM all know how to deal with a type of 'ovs' | 14:16 |
adreznec | Are we supporting SEA with the in-tree driver today? | 14:16 |
thorst | no, but eventually we will... | 14:16 |
efried | What kind of network does the current in-tree driver support? | 14:17 |
adreznec | So I guess the question is do we just change it to powervm_ext until we do | 14:17 |
efried | We're not going to pass squat in CI without a network. | 14:17 |
adreznec | or do we allow both for now | 14:17 |
adreznec | I think LBr/OvS for simplicity? | 14:17 |
*** tlian has joined #openstack-powervm | 14:17 | |
efried | Is the code for those guys in place in the first change set? | 14:18 |
*** kylek3h has joined #openstack-powervm | 14:18 | |
efried | Cause there once again we're inflating that change set, which we're really wanting to avoid. | 14:18 |
thorst | so networking isn't going to come in yet. That was going to be WIP6 | 14:19 |
thorst | I didn't get to that one yet | 14:19 |
thorst | in our BP we said that we'd use OVS, but that has CI changes that we need to work through | 14:19 |
thorst | and yeah...we're not going to get far on Tempest with these initial change sets. That's chicken/egg...known problem | 14:19 |
efried | Meaning we're going to accept failing CI initially? Is the community going to go for that? I've been getting the impression they wouldn't. | 14:20 |
thorst | I think it depends. | 14:21 |
efried | Or back to the discussion on paring down CI for in-tree-only | 14:21 |
thorst | having the passing powervm_ext CI will help | 14:21 |
adreznec | Right | 14:21 |
thorst | no, we have to have two CI's. | 14:21 |
thorst | one in tree, one out of tree | 14:21 |
thorst | and it's really 'one CI', just running two different jobs. | 14:21 |
thorst | different tests for the in-tree (while we get support in - though I suspect change set 1 won't pass much of anything) | 14:22 |
efried | Well, wanted to punt that to the other #topic, but briefly: we're going to have a volume problem if we literally run two CIs on every change set. | 14:22 |
thorst | efried: I'm not sure about that...we're sitting OK volume wise atm | 14:22 |
thorst | like to the point we could run much more. | 14:22 |
thorst | we solved a big bottleneck mid Dec. | 14:22 |
efried | Okay, good deal. | 14:23 |
efried | So back to VIRT_DRIVER. | 14:23 |
thorst | and if we are bad on capacity, we'll figure it out | 14:23 |
efried | I still contend we need to understand more about what that thing means before we start mucking with it. | 14:24 |
adreznec | efried: we'll also theoretically be getting more systems... details are still being worked there | 14:24 |
adreznec | In devstack, VIRT_DRIVER is what enables the plugins for a nova driver | 14:24 |
adreznec | So for example, if you set VIRT_DRIVER= libvirt, it'll run https://github.com/openstack-dev/devstack/blob/d0df7c88f2c4d8e929c635beca55e6efc69be2f5/lib/nova_plugins/hypervisor-libvirt | 14:25 |
thorst | adreznec: that brings up a good point...do we need a devstack change for powervm as it goes in tree (maybe that's a discussion for later, once we're further along) | 14:26 |
adreznec | There are a couple of other misc places it pops up as well, checks for setting specific variables in things like OVS config if a specific VIRT_DRIVER is specified | 14:26 |
adreznec | Yes | 14:26 |
adreznec | We need lib/nova_plugins/hypervisor-powervm | 14:26 |
efried | That's what wangqwsh was going to work on, per email thread, yes? | 14:26 |
adreznec | I thought so | 14:27 |
wangqwsh | yes, hypervisor-powervm is used to fix the VIRT_DRIVER type in devstack | 14:28 |
efried | Perhaps the question is whether it's actually appropriate for networking-powervm to be using VIRT_DRIVER to decide which network plugins it loads. | 14:29 |
efried | Versus, perhaps, a conf setting. | 14:30 |
thorst | efried: I contend it doesn't...where are you seeing that it does? | 14:30 |
*** jpasqualetto has quit IRC | 14:30 | |
efried | #link https://github.com/openstack/networking-powervm/blob/master/devstack/settings#L6 | 14:31 |
thorst | ahhh, OK. | 14:31 |
thorst | networking-powervm's *devstack* (which I guess I was missing) assumes that. | 14:31 |
adreznec | Yeah | 14:32 |
adreznec | I mean we could definitely find something else to key off | 14:32 |
thorst | ok...I was off in kansas then. | 14:32 |
thorst | well, I think that's a good default. | 14:32 |
adreznec | Just seemed like a fair thing to do at the time as there was no case to ever load the driver without having VIRT_DRIVER=powervm | 14:32 |
thorst | right...and at the time it was the only one that worked. | 14:32 |
thorst | now we have OVS | 14:32 |
adreznec | I think we have three options at this point | 14:32 |
*** edmondsw has joined #openstack-powervm | 14:32 | |
thorst | but I think if using powervm_ext, I think it is appropriate to default to enabling the sea-agt and sriov-agt | 14:33 |
thorst | I'd assert it's always OK (if PowerVM anything) to start the sriov-agt | 14:33 |
efried | So the code is smart enough not to blow up if VIRT_DRIVER=foo and there's no hypervisor-foo - else our oot driver would have blown up. | 14:33 |
adreznec | 1) Make it an explicit conf setting. 2) Change it to just powervm_ext and add powervm back later. 3) Support both | 14:33 |
efried | I vote for #1. | 14:34 |
adreznec | And yeah efried, devstack will ignore things if hypervisor-foo doesn't exist | 14:34 |
efried | OOT isn't using it for anything else at the moment. | 14:34 |
*** smatzek has quit IRC | 14:34 | |
efried | Today we have [ml2] with mechanism_drivers={comma-separated list, e.g. pvm_sea,pvm_sriov} | 14:37 |
efried | in ml2_conf.ini | 14:37 |
efried | Now, I still don't really get what's a plugin and what's ML2 and what's a driver and what's a ... whatever. | 14:37 |
*** edmondsw_ has joined #openstack-powervm | 14:38 | |
thorst | efried: yeah, its weird... | 14:38 |
efried | So if it's not appropriate for us just to key off of the above, we could make a similar conf option that tells us what to load up. | 14:38 |
*** edmondsw_ has quit IRC | 14:38 | |
thorst | I think I'm OK with option 1...but we should call that out in nova_powervm's change set that you now need to explicitly set the neutron plugin | 14:38 |
adreznec | I'd be okay with either, we just need to be explicit on the decision | 14:38 |
thorst | actually, I think that devstack defaults to ovs now | 14:38 |
thorst | so we should probably default to that | 14:38 |
*** tlian2 has joined #openstack-powervm | 14:38 | |
thorst | but if they explicitly set sea, we should disable OVS... | 14:38 |
efried | Is this a devstack thing, or a runtime thing? | 14:39 |
thorst | (we've gone down a hell of a rabbit hole) | 14:39 |
thorst | devstack thing | 14:39 |
efried | Okay, so local.conf, not nova/neutron.conf | 14:39 |
thorst | right. | 14:39 |
thorst | nova will figure out what to do based off the vif binding | 14:39 |
efried | Ah, and blow up if there's no appropriate driver/plugin/widget/thingy. | 14:40 |
thorst | key is...only one L2-ish plugin can run at once (with the exception of macvtap and sr-iov, which can run in parallel) | 14:40 |
thorst | if running sea, can't run ovs. And vice versa | 14:40 |
efried | Okay, so we need #action somebody to propose this to networking-powervm. | 14:41 |
*** tlian has quit IRC | 14:41 | |
efried | volunteer? | 14:41 |
efried | #crickets | 14:41 |
adreznec | I can throw something together | 14:41 |
thorst | lol - as I was typing he saved me | 14:41 |
efried | Thanks. | 14:41 |
thorst | phew | 14:41 |
efried | #action adreznec to propose networking-powervm change to move from VIRT_DRIVER to local.conf option to decide which networking thingies to install. | 14:42 |
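
The selection rule discussed above (default to OVS, let an explicit SEA choice turn OVS off, and allow SR-IOV alongside either) could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the networking-powervm devstack code: the function name `select_agents` and the agent labels `ovs`, `sea`, and `sriov` are hypothetical.

```python
# Hypothetical sketch of the devstack-side decision described in the discussion.
# The agent labels and the function name are illustrative assumptions.

EXCLUSIVE_L2_AGENTS = {"ovs", "sea"}   # only one of these may run on a host
PARALLEL_AGENTS = {"sriov"}            # may run next to either of the above


def select_agents(requested=None):
    """Return the sorted list of agents to start for a requested set."""
    requested = set(requested or [])

    unknown = requested - EXCLUSIVE_L2_AGENTS - PARALLEL_AGENTS
    if unknown:
        raise ValueError("unknown agent(s): %s" % ", ".join(sorted(unknown)))

    exclusive = requested & EXCLUSIVE_L2_AGENTS
    if len(exclusive) > 1:
        raise ValueError("sea and ovs cannot run at the same time")
    if not exclusive:
        # Nothing chosen explicitly: fall back to the OVS default.
        exclusive = {"ovs"}

    return sorted(exclusive | (requested & PARALLEL_AGENTS))
```

Keeping the exclusive and parallel sets separate makes the "SEA or OVS, never both" constraint a single check, and asking for SEA explicitly is what disables the OVS default.
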
adreznec | Huzzah | 14:42 |
efried | What's next? | 14:44 |
efried | We may be able to get some mileage out of the dual-CI discussion without esberglu here. | 14:44 |
efried | And load him up with corresponding #actions for when he gets back ;-) | 14:44 |
efried | wanna? | 14:45 |
thorst | heh, lets just take a moment to congratulate him (without him here) on how well the CI has been running | 14:45 |
* efried bows head for introspective moment of silence. | 14:46 | |
*** jpasqualetto has joined #openstack-powervm | 14:47 | |
thorst | so, what's the discussion CI wise. | 14:47 |
thorst | single CI - run two jobs for each nova patch. One in tree, one out of tree. Different config files. Configs changing rapidly for in tree as function gets added. | 14:47 |
adreznec | Yeah | 14:48 |
adreznec | Different configs and different test lists | 14:48 |
*** jwcroppe_ has quit IRC | 14:48 | |
thorst | we want to use OVS...which will be a deeper discussion | 14:48 |
thorst | but one we've been having (lightly) for the OSA discussion | 14:48 |
thorst | so this will help our OSA CI as well | 14:48 |
efried | Anybody remember where we are wrt converting test names to idempotent IDs? | 14:49 |
thorst | he has a patch set up for that | 14:49 |
thorst | not sure if merged. | 14:49 |
efried | link? | 14:49 |
thorst | its on morpheus...let me look up | 14:50 |
efried | got it | 14:50 |
efried | 4650 | 14:50 |
efried | ah, good, he's got the test names in comments next to the IDs. | 14:50 |
efried | That's what I wanted to make sure of. | 14:50 |
efried | Else maintainability nightmare. | 14:50 |
thorst | efried: agree. | 14:51 |
efried | I gave that change set +2 | 14:51 |
thorst | its been a month :-) | 14:52 |
efried | We'll let esberglu merge it, cause I want him to make sure it really works (i.e. getting parsed properly) | 14:52 |
efried | added comment to that effect. | 14:53 |
efried | What else? | 14:53 |
efried | (is there a way to queue this dual-CI agenda item up for next time when esberglu is here?) | 14:54 |
thorst | efried: make an action for him to lead that discussion? | 14:55 |
efried | #action esberglu to drive discussion: two CI runs per change set: in- and out-of-tree. In-tree much smaller, with plan to grow in lockstep with in-tree driver merges. | 14:57 |
efried | (where "smaller" really means "larger skip list") | 14:57 |
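
The idea of skip lists keyed by idempotent ID, with the test name kept in a trailing comment, and an in-tree list that is simply larger than the out-of-tree one, might look like the sketch below. The file layout (one ID per line, name after `#`), the function names, and the sample IDs are all assumptions made for illustration; the real CI skip-list format is not shown in this log.

```python
# Hypothetical skip-list handling; the format and all IDs here are made up.

def parse_skip_list(text):
    """Map each idempotent test ID to the test name in its trailing comment."""
    skips = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a pure comment line
        test_id, _, comment = line.partition("#")
        skips[test_id.strip()] = comment.strip()
    return skips


def extra_in_tree_skips(in_tree, out_of_tree):
    """IDs skipped only for the in-tree run, i.e. the gap still to close."""
    return sorted(set(in_tree) - set(out_of_tree))


OUT_OF_TREE = """
00000000-0000-0000-0000-000000000001  # test_example_one
"""

IN_TREE = """
00000000-0000-0000-0000-000000000001  # test_example_one
00000000-0000-0000-0000-000000000002  # test_example_two
"""

out_skips = parse_skip_list(OUT_OF_TREE)
in_skips = parse_skip_list(IN_TREE)
for test_id in extra_in_tree_skips(in_skips, out_skips):
    print(test_id, in_skips[test_id])
```

Tracking the difference between the two lists gives a concrete measure of how far the in-tree run trails the out-of-tree one as driver changes merge.
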
efried | We done for now? | 14:58 |
thorst | yeah, lets call it | 14:58 |
thorst | glad to have you back efried :-D | 14:59 |
efried | Glad to be here. | 14:59 |
efried | adreznec, gavel? | 14:59 |
efried | (he's waiting for precisely top of the hour) | 14:59 |
efried | (or getting coffee) | 15:00 |
*** smatzek has joined #openstack-powervm | 15:01 | |
adreznec | #endmeeting | 15:02 |
openstack | Meeting ended Tue Jan 3 15:02:00 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:02 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-01-03-14.08.html | 15:02 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-01-03-14.08.txt | 15:02 |
openstack | Log: http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-01-03-14.08.log.html | 15:02 |
adreznec | Sorry, Steve stopped by my office | 15:02 |
efried | adreznec, thorst - can you help me out here? Is this still accurate? https://review.openstack.org/#/c/413736/2/doc/source/devref/usage.rst@73 | 15:04 |
efried | What's the significance of --namespace nova_powervm in that command? | 15:05 |
efried | Does that correspond to the [powervm] section in the conf? I.e. should that have been --namespace powervm anyway? | 15:06 |
thorst | I think it was to get us to load properly...but not entirely sure... | 15:06 |
thorst | yeah... | 15:06 |
efried | actually no, it appears as though the --namespace is the root package name. The [powervm] comes from the opt code. | 15:08 |
efried | So that line stays the same (in particular, does _not_ change to powervm_ext) | 15:08 |
thorst | right. | 15:09 |
thorst | conf should all be the same between the two | 15:09 |
thorst | with powervm_ext being a superset of powervm in tree | 15:09 |
thorst | for now | 15:09 |
efried | Cause we're changing nova_powervm/virt/powervm/nova_powervm to nova_powervm/virt/powervm_ext, not powervm_ext/virt/powervm/powervm_ext | 15:10 |
thorst | correct - nova_powervm/virt/powervm_ext | 15:10 |
adreznec | yeah, both the nova_powervm/virt and nova/virt paths should end up as powervm_ext | 15:11 |
openstackgerrit | Eric Fried proposed openstack/nova-powervm: Change namespace to nova.virt.powervm_ext https://review.openstack.org/413736 | 15:15 |
*** kotra03 has joined #openstack-powervm | 15:15 | |
*** jwcroppe has joined #openstack-powervm | 15:15 | |
efried | adreznec, oh, I was going to leave nova_powervm/virt/powervm alone. | 15:15 |
*** tjakobs has joined #openstack-powervm | 15:16 | |
adreznec | efried: I suppose that works too | 15:16 |
efried | Cause a) it's already a different namespace, and b) it's hidden by nova/virt/powervm_ext | 15:16 |
adreznec | Just saw thorst mention making it nova_powervm/virt/powervm_ext there | 15:16 |
efried | Sorry, that was my bad. | 15:17 |
efried | That would make the change set a LOT bigger. If we can get away with it this way, let's do that. | 15:17 |
adreznec | Yep, agreed | 15:17 |
efried | adreznec, thorst - so having been congratulatory about the CI running smoothly, it seems like we've been failing a lot of runs (most? all?) | 15:22 |
thorst | not all | 15:22 |
efried | E.g. http://184.172.12.213/42/415142/10/check/nova-pvm-dsvm-tempest-full/7d3fcbf/powervm_os_ci.html | 15:22 |
thorst | go check the jenkins | 15:22 |
thorst | many are passing ( at least they were a few days back ) | 15:22 |
efried | Well, I know I've been deleting tens of emails a day about failures. | 15:23 |
thorst | yeah | 15:23 |
efried | Above only has two failed tests. So we not gonna worry about it til esberglu gets back? | 15:23 |
thorst | same...but we expect failures | 15:23 |
thorst | change sets will fail. | 15:23 |
thorst | we need to compare to the KVM results | 15:23 |
thorst | which esberglu was doing before break | 15:24 |
adreznec | We're failing more than KVM right now | 15:24 |
adreznec | http://ci-watch.tintri.com/project?project=nova | 15:24 |
adreznec | Something must be wonky | 15:24 |
efried | okay, so if there's a whole column of Xes, it's likely the change set that's effed; but if there's mostly greens and we're red, it's probably us. | 15:26 |
efried | So - important to investigate ASAP, or wait for esberglu? | 15:26 |
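
The rule of thumb above (a whole column of failures points at the change set, while mostly green with our CI red points at us) can be written down as a small triage helper. This is only a sketch: the function name `triage`, the CI labels, and the 50% threshold are assumptions, and it does not query ci-watch or any real service.

```python
# Hypothetical triage helper for one change set's third-party CI results.

def triage(results, ours="powervm", threshold=0.5):
    """Classify one change set from a mapping of CI name to pass/fail.

    Returns "ok" if our CI passed, "patch" if most other CIs also failed,
    "ours" if the others mostly passed, and "unknown" when there is nothing
    to compare against.
    """
    if ours not in results:
        return "unknown"
    if results[ours]:
        return "ok"

    others = [passed for name, passed in results.items() if name != ours]
    if not others:
        return "unknown"

    failed_share = others.count(False) / len(others)
    return "patch" if failed_share > threshold else "ours"


print(triage({"powervm": False, "kvm": True, "hyperv": True}))
print(triage({"powervm": False, "kvm": False, "hyperv": False}))
```
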
*** wangqwsh has quit IRC | 15:33 | |
thorst | hmm...lets see if its an obvious problem? | 15:58 |
thorst | efried: ^^ | 15:58 |
thorst | like a SSP filled up or something | 15:58 |
efried | thorst, in the above example, they're both timeouts. | 16:00 |
thorst | so one question...did new tests get added? | 16:01 |
thorst | that we don't support...one is a migration test | 16:01 |
thorst | are the two failures consistent I guess...it's always those tests or it's random | 16:02 |
efried | thorst, at a glance, appears regular but not consistent. | 16:03 |
thorst | hmmm | 16:05 |
thorst | next question. Is it always on specific hosts | 16:05 |
efried | thorst, how do I tell? | 16:05 |
thorst | log parsing I think | 16:06 |
thorst | looking for the key | 16:06 |
efried | console log appears to have an IP. | 16:06 |
thorst | which it shouldn't.... | 16:07 |
thorst | and I think that is the client address...not the NL address | 16:07 |
*** jpasqualetto has quit IRC | 16:07 | |
efried | in the console log? | 16:07 |
efried | at the very top? 9. address? | 16:07 |
efried | (which, by the way, is not the same for any of the failures I'm looking at) | 16:08 |
thorst | yeah, that's a client address | 16:08 |
thorst | so it'll never be the same | 16:08 |
efried | mmph | 16:09 |
thorst | I know there's a way to do it | 16:09 |
thorst | :-) | 16:09 |
thorst | I just forget how | 16:09 |
efried | Used to be a neoXX name in the log, but not anymore. | 16:09 |
thorst | yeah....probably because we eased up logging | 16:10 |
efried | I thought we got rid of this stupid "No value for $TERM" thing. | 16:11 |
thorst | managed system UUID | 16:12 |
thorst | we'll have to work with that for now... | 16:12 |
efried | gross | 16:14 |
efried | thorst, so do we need to keep digging into this now, or is it okay to wait for esberglu? | 16:14 |
efried | thorst, is our CI voting on nova-powervm changes? | 16:19 |
efried | gating, that is | 16:20 |
efried | See https://review.openstack.org/#/c/413736/ which is marked -1 | 16:20 |
*** jpasqualetto has joined #openstack-powervm | 16:20 | |
adreznec | efried: theoretically it is | 16:20 |
adreznec | At least according to esberglu | 16:20 |
efried | Seems to be working ;-) | 16:20 |
*** jpasqualetto has quit IRC | 16:21 | |
thorst | efried: lets keep digging | 16:23 |
thorst | it is odd... | 16:23 |
thorst | that some are passing. | 16:23 |
efried | thorst, okeydokey. | 16:24 |
thorst | efried: simple fix to get us going may be to just disable those tests for now... | 16:28 |
thorst | looks like it is three tests... | 16:28 |
thorst | if we don't have sorted before EOD, we should probably just push that through quick. | 16:28 |
efried | Looking at the first one (test_list_migrations_in_flavor_resize_situation) - creation succeeds; resize "succeeds" (or at least completes), and it's the confirmResize that's timing out. | 16:29 |
thorst | I wonder if its a change I made... | 16:29 |
thorst | sec... | 16:30 |
thorst | efried: https://review.openstack.org/#/c/391164/ | 16:30 |
thorst | could that play into it? | 16:30 |
efried | because not updating status? That seems like a decent candidate. | 16:31 |
efried | would explain intermittent-ness of failures. | 16:32 |
efried | thorst, don't we eventually emit the event anyway? | 16:32 |
thorst | unless. | 16:32 |
thorst | something overrides it | 16:32 |
thorst | power-off -> power-on means we wouldn't issue the power-off | 16:33 |
thorst | because it is a reflective state. | 16:33 |
thorst | I'm not sure if that plays into resize... | 16:33 |
efried | thorst, doesn't look like it. We just POST the new specs to the LPAR. | 16:35 |
thorst | hmm | 16:38 |
efried | wtf is the resize method named in the driver? | 16:38 |
efried | Only Resize task call I see in driver is in finish_migration | 16:39 |
thorst | finish_migration | 16:39 |
thorst | because resize is a migration | 16:39 |
thorst | to the same host | 16:39 |
thorst | its a super weird thing. | 16:40 |
efried | okay, then I guess we are powering it off to do the resize. | 16:40 |
thorst | yeah, I think we are | 16:43 |
thorst | ooo, could be a power off timeout? | 16:43 |
thorst | (though that should be quick...its just sitting in SMS I think) | 16:44 |
efried | thorst, do you have the compute log open? | 16:44 |
thorst | which one...I've been perusing many | 16:44 |
efried | I'm looking at this one: http://184.172.12.213/42/415142/10/check/nova-pvm-dsvm-tempest-full/7d3fcbf/ | 16:45 |
efried | instance is 2af9198d-b098-4919-ab38-7aa4c751d269 | 16:46 |
thorst | hmm...all within 10 minutes basically | 16:48 |
thorst | and it timed out in the test... | 16:48 |
efried | 2017-01-03 07:37:19.467 issue power-on. | 16:49 |
efried | 2017-01-03 07:37:20.755 that completes. | 16:49 |
efried | 2017-01-03 07:37:23.443 Resized/migrated instance is powered off. Setting vm_state to 'stopped' | 16:49 |
efried | 2017-01-03 07:37:25.514 is where it gets interesting. | 16:49 |
thorst | yeah, looks like it goes spastic then. | 16:50 |
efried | looks like the DB still says it's stopped, so instead of updating the DB, it actually stops the VM! | 16:50 |
efried | That seems wrong. | 16:50 |
thorst | efried: no, that's the purpose of sync_power_state I thought | 16:50 |
thorst | to bring it in line with the db. | 16:50 |
thorst | but maybe we shouldn't sync the power state on VMs with active tasks? | 16:50 |
thorst | though...its the compute manager that issues that. | 16:51 |
efried | right | 16:51 |
efried | So we need to back up and figure out where the misstep occurs. | 16:52 |
efried | 2017-01-03 07:37:10.197 we start powering off. | 16:56 |
efried | 2017-01-03 07:37:10.205 lifecycle event 'Started' | 16:56 |
efried | but 2017-01-03 07:37:10.429 skips syncing the power state because it's migrating. | 16:56 |
thorst | why did we start it again is the question. | 16:56 |
efried | So yeah, could be because we're delaying the event because we're SHUTTING_DOWN | 16:57 |
efried | We didn't. | 16:57 |
efried | uh | 16:57 |
thorst | but this timestamp: 2017-01-03 07:37:25.751 | 16:57 |
thorst | indicates we had an event that told us we were starting | 16:57 |
thorst | which I thought just flew through | 16:58 |
*** kotra03 has quit IRC | 16:59 | |
efried | Is that the same event that's been bouncing around since we first skipped it at 2017-01-03 07:37:10.429 ? | 16:59 |
efried | Does it do that? Keep events queued while operations are ongoing? | 16:59 |
thorst | sec | 16:59 |
efried | looks like maybe not. The events are tagged with a ms-since-epoch timestamp, which is different between those two events. | 17:00 |
efried | Why does this happen: 2017-01-03 07:37:23.443 ?? | 17:04 |
efried | three seconds earlier, power-on completed. | 17:05 |
efried | thorst, what does resize actually do? Why do we do this rename business? Are there actually two instances with the same UUID at some point? | 17:08 |
*** jpasqualetto has joined #openstack-powervm | 17:09 | |
*** chas has quit IRC | 17:09 | |
thorst | efried: kylek3h is actually the one who wrote it...but my understanding is that it changes the name because at some point there are two LPARs (the old size and the new size) on the host (if same host) | 17:09 |
*** chas has joined #openstack-powervm | 17:09 | |
efried | So we could be seeing power on/off for those separate instances? | 17:10 |
efried | How does that work on our hypervisor if they have the same UUID? | 17:10 |
thorst | good questions...looking. kylek3h you around to help answer that quick? | 17:14 |
*** chas has quit IRC | 17:14 | |
kylek3h | What's up? | 17:14 |
thorst | kylek3h: with a resize on the same host...do we ever have two partitions matching? | 17:16 |
thorst | sorry... | 17:16 |
thorst | two partitions representing the same VM? | 17:16 |
thorst | I've dug in a bit more now...doesn't look like we actually have two VMs on the same host. We just rename the instance to help us identify it later. | 17:16 |
kylek3h | right...should just be a rename....it's been a while since I looked at all that. | 17:17 |
thorst | yeah, our CI is just hitting issues with it and we're trying to decompile the intention :-) | 17:17 |
kylek3h | could have two with the same uuid on different systems but they shouldn't be running at the same time. | 17:17 |
thorst | yeah, that seems reasonable... | 17:18 |
thorst | but on the same host, we only have one VM. | 17:18 |
kylek3h | I thought I had a big comment above some of that code documenting the semantics... | 17:18 |
thorst | yeah, we're rushing. :-) | 17:20 |
thorst | but its pretty obvious now. | 17:20 |
efried | So here's what I don't understand: | 17:21 |
efried | 2017-01-03 07:37:20.755 Task pwr_vm completed in 1 seconds | 17:21 |
efried | Then three seconds later: | 17:21 |
efried | 2017-01-03 07:37:23.443 Resized/migrated instance is powered off. Setting vm_state to 'stopped'. | 17:21 |
efried | I think the problem is that the power-on finished, but for whatever reason nova didn't get the event and set the DB state to active. | 17:21 |
efried | So when we get to :23.443, it still has cached (somewhere?? not in the DB??) that it's off, so it sets it to stopped in the DB. | 17:21 |
efried | Maybe that event is what's hitting at | 17:23 |
efried | 07:37:25.515 Emitting event <LifecycleEvent: 1483450645.52, 2af9198d-b098-4919-ab38-7aa4c751d269 => Started> | 17:23 |
efried | ...which is too late. Though I still don't get why that event triggers sync_power_state, and/or why sync_power_state overrides the event and/or real state of the partition. | 17:23 |
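
When lining up lifecycle events against compute-log lines, it can help to convert the event tag to wall-clock time. The value quoted above, 1483450645.52, reads as seconds since the Unix epoch (with a fractional part) rather than milliseconds. A minimal Python sketch of the conversion follows; the helper name `event_time_utc` is an assumption for illustration.

```python
from datetime import datetime, timezone


def event_time_utc(stamp):
    """Convert a seconds-since-epoch event tag to an aware UTC datetime."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


when = event_time_utc(1483450645.52)
print(when.strftime("%Y-%m-%d %H:%M:%S"))  # 2017-01-03 13:37:25
```

That is 13:37:25 UTC, which matches the 07:37:25 log line if the compute log timestamps are six hours behind UTC (an inference from the numbers, not something stated in the log).
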
thorst | Yeah, I have the code here...sec, let me find it on github | 17:25 |
thorst | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3519-L3527 | 17:26 |
thorst | I wonder if we just need to update the 'instance.power_state' as part of the power on/off jobs... | 17:26 |
efried | thorst, that code is conditioned on the VM power state being SHUTDOWN, though - which it shouldn't be, should it? | 17:28 |
thorst | I don't see any other driver having to do that... | 17:28 |
efried | Where does instance.power_state normally get set? | 17:31 |
thorst | in the compute manager. | 17:32 |
thorst | it calls _get_power_state in the manager. Which then calls to 'get_info' in the driver. Which then calls down to 'InstanceInfo' in vm.py | 17:33 |
thorst | so power_state should always be the real state of the VM (at least if its set via _get_power_state) | 17:34 |
thorst | bbiab | 17:35 |
*** thorst is now known as thorst_afk | 17:35 | |
*** thorst_afk is now known as thorst | 17:51 | |
thorst | efried: well, turns out esberglu isn't back until the 16th | 17:51 |
thorst | efried: I have a patch I'm going to propose up...see if it fixes that issue... | 18:00 |
openstackgerrit | Drew Thorstensen (thorst) proposed openstack/nova-powervm: Check for instance state fix https://review.openstack.org/416315 | 18:05 |
thorst | efried: ^^ | 18:05 |
openstackgerrit | Drew Thorstensen (thorst) proposed openstack/nova-powervm: Check for instance state fix https://review.openstack.org/416315 | 18:06 |
efried | thorst, lgtm. If CI passes, though, we don't necessarily know the problem is fixed, do we? | 18:14 |
efried | bbiab | 18:15 |
thorst | efried: yeah, who knows if that's a legit fix or not. | 18:24 |
thorst | just seemed like it could be the cause | 18:24 |
*** k0da has joined #openstack-powervm | 18:37 | |
*** chas has joined #openstack-powervm | 19:10 | |
efried | thorst, fyi, the other failure in that run looks to be the same issue. | 19:14 |
*** chas has quit IRC | 19:15 | |
efried | ...and your proposal passed CI, for whatever that's worth. | 19:15 |
adreznec | efried: thorst should we run rechecks on that one a few times? | 19:16 |
efried | adreznec, good idea. | 19:16 |
adreznec | For a bit more confidence? | 19:16 |
*** efried has quit IRC | 19:35 | |
*** adi_____ has quit IRC | 19:35 | |
*** efried has joined #openstack-powervm | 19:36 | |
*** adi_____ has joined #openstack-powervm | 19:41 | |
thorst | agree on that | 19:43 |
thorst | thx for kicking it off | 19:43 |
efried | thorst, where did we land on the whole upload thing? | 19:47 |
efried | Did y'all manage to sort that out without my interference? | 19:48 |
efried | ...err, "help"? | 19:48 |
thorst | well, we have upload working | 19:48 |
thorst | remote upload through the function is still in that hack'ed patch set | 19:48 |
thorst | which I am still a proponent of | 19:48 |
thorst | but we've got the glance case sorted out...no complaints that I'm aware of | 19:48 |
thorst | let me ask kris | 19:48 |
thorst | according to Kris - was working well, no one has tried it this new year. | 19:57 |
*** kriskend has joined #openstack-powervm | 19:57 | |
adreznec | thorst: Yeah, we haven't heard anything more on it after delivering the full package | 20:01 |
adreznec | Holidays and all | 20:01 |
*** openstackgerrit has quit IRC | 20:03 | |
adreznec | FYI thorst efried - Not that it directly impacts us right now, but we may want to keep an eye on the OSA SRIOV patch series (e.g. https://review.openstack.org/#/c/415903/) to see how things develop from a compatibility perspective | 20:14 |
thorst | efried: 3 tests failed in this pass. | 20:39 |
thorst | its uploading logs now... | 20:39 |
*** jpasqualetto has joined #openstack-powervm | 20:40 | |
thorst | same failures as before... | 20:40 |
efried | thorst, what's docker? | 20:54 |
efried | (adreznec ^^) | 20:57 |
efried | never mind, I think I get it. | 20:59 |
efried | I don't know if this is good exposure, but they used nova-powervm as an example/template for their OOT driver: https://review.openstack.org/#/c/408148/5 | 21:00 |
efried | I guess imitation is the sincerest form of flattery, and all that. | 21:00 |
thorst | :-) neat | 21:01 |
thorst | efried: so I'm looking at the difference between a good run and a bad run. | 21:05 |
thorst | same patch set | 21:05 |
efried | coo | 21:05 |
*** chas has joined #openstack-powervm | 21:11 | |
*** chas has quit IRC | 21:15 | |
thorst | definitely the event thing... | 21:18 |
thorst | well...wait...I got my success and failure ones reversed. | 21:19 |
thorst | o, no I didn't... | 21:19 |
thorst | I think I see the fix... | 21:20 |
*** smatzek has quit IRC | 21:20 | |
*** openstackgerrit has joined #openstack-powervm | 21:28 | |
openstackgerrit | Drew Thorstensen (thorst) proposed openstack/nova-powervm: Check for instance state fix https://review.openstack.org/416315 | 21:28 |
*** chas has joined #openstack-powervm | 21:51 | |
*** chas has quit IRC | 21:56 | |
*** jwcroppe has quit IRC | 21:58 | |
*** smatzek has joined #openstack-powervm | 22:01 | |
*** svenkat has quit IRC | 22:02 | |
*** kriskend has quit IRC | 22:05 | |
*** thorst has quit IRC | 22:07 | |
*** thorst has joined #openstack-powervm | 22:08 | |
*** chas has joined #openstack-powervm | 22:12 | |
*** thorst has quit IRC | 22:13 | |
*** chas has quit IRC | 22:16 | |
*** edmondsw has quit IRC | 22:21 | |
*** edmondsw has joined #openstack-powervm | 22:21 | |
*** tjakobs has quit IRC | 22:24 | |
*** edmondsw has quit IRC | 22:25 | |
*** thorst has joined #openstack-powervm | 22:32 | |
*** thorst has quit IRC | 22:37 | |
*** tblakes has quit IRC | 22:40 | |
*** smatzek has quit IRC | 22:44 | |
*** thorst has joined #openstack-powervm | 22:53 | |
*** thorst has quit IRC | 22:57 | |
*** mdrabe has quit IRC | 23:01 | |
*** openstack has joined #openstack-powervm | 23:15 | |
*** k0da has quit IRC | 23:27 | |
*** apearson has quit IRC | 23:29 |