*** chyka has quit IRC | 00:00 | |
rybridges | not much seems pertinent in the libvirtd log | 00:00 |
---|---|---|
rybridges | I dont think that this is related to libvirtd or qemu actually | 00:02 |
*** mingyu has quit IRC | 00:02 | |
rybridges | we are running a juno deployment with identical libvirtd / qemu versions and config | 00:02 |
rybridges | and we do not see this problem | 00:02 |
rybridges | but when we run the same libvirtd/qemu setup with the ocata codebase, we see this issue | 00:03 |
*** rcernin has quit IRC | 00:09 | |
*** rcernin_ has joined #openstack-nova | 00:09 | |
rybridges | could be a problem with the libvirt-python version in ocata | 00:26 |
rybridges | the upper constraints is capped at 2.5.0 | 00:26 |
rybridges | but that is completely broken in rhel environments, cant even install it | 00:26 |
rybridges | so we tried 3.5.0 | 00:26 |
rybridges | that wasnt working | 00:26 |
rybridges | tried 3.10.0 | 00:27 |
rybridges | also not working | 00:27 |
rybridges | now trying 3.7.0 | 00:27 |
rybridges | and it seems to be working | 00:27 |
rybridges | i have suspended 40 instances without error | 00:27 |
*** yangyapeng has joined #openstack-nova | 00:27 | |
rybridges | doh | 00:27 |
rybridges | take that back | 00:28 |
rybridges | tried 20 in parallel | 00:28 |
rybridges | still got a few errors | 00:28 |
clarkb | rybridges: libvirt-python is supposed to be compatible with any libvirt that is the same release as it or an older release. so libvirt-python 3.0 can talk to libvirt 2.5 but libvirt-python 2.5 can't talk to libvirt 3.0 | 00:31 |
clarkb | this is why rhel 7.4 broke the 2.5.0 cap | 00:31 |
clarkb | (they did a major upgrade of libvirt) | 00:31 |
*** yangyapeng has quit IRC | 00:31 | |
rybridges | right | 00:32 |
rybridges | yes | 00:32 |
rybridges | 2.5 breaks on later versions of libvirt | 00:32 |
rybridges | so we had to switch up | 00:33 |
*** kumarmn has joined #openstack-nova | 00:35 | |
*** hiro-kobayashi has joined #openstack-nova | 00:35 | |
*** marst has quit IRC | 00:35 | |
*** trinaths has joined #openstack-nova | 00:36 | |
*** tuanla____ has joined #openstack-nova | 00:38 | |
*** kumarmn has quit IRC | 00:40 | |
*** moshele has joined #openstack-nova | 00:40 | |
*** rcernin_ has quit IRC | 00:50 | |
*** psachin has joined #openstack-nova | 00:53 | |
*** annp has quit IRC | 00:53 | |
*** annp has joined #openstack-nova | 00:53 | |
*** jose-phillips has quit IRC | 00:53 | |
*** hoangcx has quit IRC | 00:53 | |
*** hieulq has quit IRC | 00:53 | |
*** daidv has quit IRC | 00:53 | |
*** tuanla____ has quit IRC | 00:53 | |
*** daidv has joined #openstack-nova | 00:54 | |
*** tuanla____ has joined #openstack-nova | 00:54 | |
*** hoangcx has joined #openstack-nova | 00:54 | |
*** hieulq has joined #openstack-nova | 00:54 | |
*** jose-phillips has joined #openstack-nova | 00:55 | |
*** catintheroof has joined #openstack-nova | 00:56 | |
*** edmondsw has joined #openstack-nova | 00:58 | |
*** moshele has quit IRC | 01:00 | |
*** chyka has joined #openstack-nova | 01:01 | |
*** gyee has quit IRC | 01:01 | |
*** phuongnh has joined #openstack-nova | 01:02 | |
*** moshele has joined #openstack-nova | 01:03 | |
*** huanxie has quit IRC | 01:04 | |
*** huanxie has joined #openstack-nova | 01:06 | |
*** chyka has quit IRC | 01:06 | |
*** salv-orlando has quit IRC | 01:13 | |
*** salv-orlando has joined #openstack-nova | 01:14 | |
*** yangyapeng has joined #openstack-nova | 01:16 | |
*** yangyapeng has quit IRC | 01:17 | |
*** yangyapeng has joined #openstack-nova | 01:17 | |
*** salv-orlando has quit IRC | 01:18 | |
*** catintheroof has quit IRC | 01:21 | |
*** vishwanathj has joined #openstack-nova | 01:30 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova master: Updated from global requirements https://review.openstack.org/528881 | 01:31 |
*** Apoorva_ has joined #openstack-nova | 01:32 | |
*** linkmark has quit IRC | 01:33 | |
*** Apoorva has quit IRC | 01:36 | |
*** Apoorva_ has quit IRC | 01:37 | |
mriedem | alex_xu: here is a question about something from long ago https://review.openstack.org/#/c/97727/ | 01:44 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/python-novaclient master: Updated from global requirements https://review.openstack.org/528911 | 01:44 |
mriedem | alex_xu: why does populate_retry not check for MaxRetriesExceeded if max_attempts = 1? | 01:44 |
mriedem | i realize that means reschedules are disabled, but why wouldn't we compare num_attempts > max_attempts? | 01:45 |
*** Dinesh_Bhor has joined #openstack-nova | 01:46 | |
mriedem | i guess that's what the code always did... | 01:46 |
mriedem | goes way back to https://review.openstack.org/#/c/9540/ | 01:48 |
*** claudiub has joined #openstack-nova | 01:48 | |
mriedem | oh nvm, i know why | 01:51 |
mriedem | if max_attempts == 1, we never set the retry key in the filter properties passed to compute | 01:51 |
*** trungnv has joined #openstack-nova | 01:52 | |
mriedem | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1855 | 01:52 |
mriedem | and then we don't reschedule | 01:52 |
openstackgerrit | OpenStack Proposal Bot proposed openstack/nova master: Updated from global requirements https://review.openstack.org/528881 | 01:53 |
*** edmondsw has quit IRC | 01:53 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Don't try to delete build requests on MaxRetriesExceeded https://review.openstack.org/528835 | 01:53 |
mriedem | melwitt: % | 01:53 |
mriedem | ^ | 01:53 |
mriedem | gah, that also goes back to newton | 01:54 |
*** andreas_s has joined #openstack-nova | 01:55 | |
*** andreas_s has quit IRC | 01:59 | |
*** penick has quit IRC | 02:07 | |
*** moshele has quit IRC | 02:08 | |
*** rcernin has joined #openstack-nova | 02:11 | |
*** zhangjl has joined #openstack-nova | 02:11 | |
*** zhangjl has left #openstack-nova | 02:11 | |
*** moshele has joined #openstack-nova | 02:12 | |
*** salv-orlando has joined #openstack-nova | 02:14 | |
*** moshele has quit IRC | 02:15 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Don't try to delete build request during a reschedule https://review.openstack.org/528835 | 02:16 |
*** r-daneel has quit IRC | 02:17 | |
rybridges | so I think i found the root problem with suspend | 02:17 |
rybridges | when i run suspend like this: openstack server suspend <uuid1> <uuid2> <uuid3> <uuid4> <uuid5>..... <uuid40> | 02:17 |
rybridges | almost all of the instances go to error state | 02:17 |
rybridges | but | 02:17 |
*** psachin has quit IRC | 02:18 | |
rybridges | when I run suspend in a simple for loop | 02:18 |
rybridges | like this: | 02:18 |
rybridges | for i in {1..20} | 02:18 |
rybridges | do | 02:18 |
*** trinaths has left #openstack-nova | 02:18 | |
rybridges | openstack server suspend ryan-rhel68-$i & | 02:18 |
rybridges | done | 02:19 |
rybridges | i get no errors | 02:19 |
rybridges | all of the instances go to suspended state (and NOT error state like the first command) | 02:19 |
mriedem | the compute api only takes a single instance for suspend, | 02:19 |
*** salv-orlando has quit IRC | 02:20 | |
mriedem | so not sure what osc cli is doing | 02:20 |
mriedem | looks like it should be doing the same thing as you are, in a loop https://github.com/openstack/python-openstackclient/blob/master/openstackclient/compute/v2/server.py#L2131 | 02:20 |
*** moshele has joined #openstack-nova | 02:20 | |
*** tuanla____ has quit IRC | 02:20 | |
*** daidv has quit IRC | 02:20 | |
rybridges | right | 02:21 |
rybridges | i was just looking at that | 02:21 |
rybridges | it looks like it should do essentially the same thing as the loop | 02:21 |
rybridges | but its not | 02:21 |
mriedem | what is the actual error in the nova logs? | 02:21 |
rybridges | because 80% of the instances go to error state | 02:21 |
*** daidv has joined #openstack-nova | 02:21 | |
*** tuanla____ has joined #openstack-nova | 02:21 | |
rybridges | https://pastebin.com/jTedyZVJ | 02:21 |
rybridges | weird libvirt error | 02:21 |
rybridges | but i dont get that at all when i call suspend in a loop from a shell scrip | 02:22 |
rybridges | script* | 02:22 |
mriedem | huh, shouldn't make any difference | 02:22 |
mriedem | definitely looks like you're killing libvirt | 02:22 |
mriedem | seeing libvirt crash in the libvirtd logs or syslog? | 02:22 |
rybridges | i checked the libvirtd log | 02:23 |
rybridges | and did not see anything useful at all | 02:23 |
rybridges | not really any errors that seem meaningful | 02:23 |
rybridges | even if that was the case | 02:23 |
mriedem | i don't know why it would be any different | 02:23 |
rybridges | why would running the command in one way crash it and running the command in another way be just fine | 02:23 |
mriedem | either way you're running it | 02:23 |
rybridges | yea | 02:23 |
rybridges | it is though, i have 4 ocata clusters | 02:24 |
mriedem | unless there is some timing difference | 02:24 |
rybridges | all of them the behavior is like this | 02:24 |
rybridges | we have an ntp server | 02:24 |
rybridges | it also cant be timing | 02:24 |
rybridges | because if it was | 02:24 |
rybridges | it would be reproducible with both commands | 02:24 |
rybridges | right? | 02:24 |
rybridges | unless | 02:24 |
rybridges | one command is doing something different than the other | 02:24 |
mriedem | well, | 02:24 |
rybridges | do you know if that .suspend() call is asynch? | 02:24 |
mriedem | there is overhead to simply issuing an osc command | 02:25 |
mriedem | it is | 02:25 |
mriedem | https://github.com/openstack/python-openstackclient/blob/master/openstackclient/compute/v2/server.py#L2131 | 02:25 |
mriedem | oops | 02:25 |
mriedem | https://github.com/openstack/python-openstackclient/blob/master/openstackclient/compute/v2/server.py#L2131 | 02:25 |
mriedem | damn | 02:25 |
mriedem | anyway yeah it's an rpc cast from api to compute | 02:25 |
rybridges | right | 02:25 |
mriedem | so i'm wondering if your script is hitting the osc overhead just enough that each iteration is slow enough | 02:25 |
*** claudiub has quit IRC | 02:25 | |
rybridges | hmm could be | 02:26 |
mriedem | but when doing them in batch via osc itself, it doesn't have the per-issue overhead | 02:26 |
rybridges | in theory, you would think that running the script would actually be calling that .suspend() method slower than passing all the uuids | 02:26 |
mriedem | try running both using timeit? | 02:26 |
mriedem | that's what i'm saying, | 02:26 |
mriedem | i think the script way is slower | 02:26 |
mriedem | and you're slowing it down, effectively load balancing :) | 02:26 |
rybridges | yea that makes sense | 02:26 |
mriedem | so you don't DoS libvirt | 02:26 |
*** gcb has joined #openstack-nova | 02:27 | |
mriedem | i didn't know osc actually let you specify a list of uuids to perform some action | 02:28 |
rybridges | well | 02:28 |
rybridges | it wasnt always like that | 02:28 |
rybridges | in juno we could not do that for the suspend command | 02:28 |
mriedem | yeah but now you guys are all upgraded to ocata | 02:28 |
mriedem | and have shiny new ways to kill yourselves | 02:28 |
rybridges | lololol | 02:29 |
*** Apoorva has joined #openstack-nova | 02:29 | |
lbragstad | mriedem: responded with more context/questions, hopefully it's clearer https://review.openstack.org/#/c/525772/1 | 02:31 |
mriedem | lbragstad: i think v1 of this thing needs to probably default to allowing whatever we support today, | 02:32 |
mriedem | which is admin == god | 02:32 |
mriedem | so in this thing, god == system scope | 02:32 |
mriedem | yes? | 02:32 |
lbragstad | so - ['system', 'project'] | 02:32 |
mriedem | yeah, | 02:33 |
lbragstad | because right now if you're admin you're god | 02:33 |
mriedem | and then for deployments that are doing a god -> project admin -> sheep setup, they can tweak their policy | 02:33 |
lbragstad | and can do anything everywhere | 02:33 |
mriedem | cburgess: ^ | 02:33 |
mriedem | cburgess would be a good person to ask because i think he's in the god role | 02:33 |
mriedem | i.e. the hosting company operator | 02:33 |
lbragstad | right | 02:34 |
lbragstad | so the big question is, how much power do i want to give customers without giving them the power to hose my deployment | 02:34 |
*** psachin has joined #openstack-nova | 02:36 | |
mriedem | today by default its all or none right? | 02:37 |
mriedem | admin or not admin | 02:37 |
lbragstad | pretty much | 02:37 |
mriedem | ok so i would think in queens, anything that's an admin rule by default today, would be system and project scopes | 02:38 |
mriedem | for compat | 02:38 |
mriedem | then over time you could start restricting the defaults from system to just project with release notes | 02:38 |
*** trungnv has quit IRC | 02:38 | |
mriedem | these are just defaults in the code, and can be overridden | 02:38 |
lbragstad | so - i kinda tried to go about doing that here: https://review.openstack.org/#/c/528847/1 | 02:38 |
*** AlexeyAbashkin has joined #openstack-nova | 02:38 | |
lbragstad | and i'd be super curious to get cburgess' feedback on that | 02:38 |
mriedem | oh so you have a global switch | 02:39 |
lbragstad | where an operator can go through and flip that switch once they have the right role infrastructure in place | 02:39 |
lbragstad | and they have audited their users to have the right roles | 02:39 |
rybridges | so the whole reason why i was asking about suspend originally is because snapshots were failing | 02:39 |
rybridges | and the snapshot flow (to my knowledge) is suspend > snapshot > resume | 02:39 |
lbragstad | (e.g. bob had the admin role but based on good faith, he didn't hose my deployment) | 02:39 |
rybridges | and it was always failing on suspend | 02:40 |
rybridges | and they still fail most of the time on suspend | 02:40 |
rybridges | with the same error above | 02:40 |
mriedem | rybridges: what libvirt calls suspend is likely != the compute api suspend | 02:40 |
rybridges | even though i cannot reproduce the error with suspending on the cli with the loop | 02:40 |
mriedem | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1786 | 02:41 |
mriedem | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L2686 | 02:41 |
mriedem | former is what libvirt calls on the guest during a snapshot | 02:41 |
mriedem | latter is what you get with 'openstack server suspend' | 02:42 |
rybridges | oh | 02:42 |
rybridges | ok that is interesting | 02:42 |
mriedem | oh jeez, nvm | 02:43 |
mriedem | self.suspend(context, instance) | 02:43 |
mriedem | derp | 02:43 |
mriedem | you're right | 02:43 |
*** AlexeyAbashkin has quit IRC | 02:43 | |
mriedem | i was thinking of this https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L600 | 02:43 |
mriedem | rybridges: did you see where/why the snapshot was actually failing? have you tried doing live snapshots? | 02:43 |
rybridges | the snapshots are failing with the exact same error as i posted in the pastebin above | 02:44 |
mriedem | you might want to try live snapshot if libvirt / qemu on the host is new enough | 02:44 |
rybridges | it looks like it is just failing on the suspend | 02:44 |
mriedem | we don't call suspend if you do a live snapshot | 02:44 |
rybridges | we are running the latest libvirt / qemu that is available for rhel7 | 02:44 |
mriedem | which is what? | 02:45 |
mriedem | https://github.com/openstack/nova/blob/stable/ocata/nova/conf/workarounds.py#L68 | 02:45 |
*** Tom-Tom has joined #openstack-nova | 02:45 | |
rybridges | can you do live snapshot from horizon? | 02:45 |
mriedem | live vs cold is a config option in nova-compute in this case | 02:45 |
rybridges | i dont see where to do that | 02:46 |
mriedem | by default it's cold | 02:46 |
mriedem | we removed that in queens https://github.com/openstack/nova/commit/980d0fcd75c2b15ccb0af857a9848031919c6c7d | 02:46 |
mriedem | so now it's always live | 02:46 |
mriedem | well, live by default | 02:46 |
rybridges | ok this is very interesting | 02:47 |
mriedem | using libvirt 3.6.0 and qemu 2.10 we haven't seen issues with live snapshot in CI | 02:47 |
rybridges | i will try this now | 02:47 |
rybridges | ok | 02:47 |
mriedem | used to see about a 25% failure rate with live snapshot using libvirt 1.2.2 back in the day | 02:47 |
rybridges | ok | 02:48 |
*** windsn has quit IRC | 02:48 | |
rybridges | you said that conf option should be on the hypervisor right? | 02:48 |
rybridges | for nova compute | 02:48 |
mriedem | yeah | 02:48 |
rybridges | not in nova api | 02:48 |
rybridges | ok | 02:48 |
mriedem | it's read from the nova-compute service | 02:48 |
mriedem | if that works, penick owes me a ginger ale in dublin | 02:49 |
mriedem | either way i'm hanging it up for the night | 02:49 |
rybridges | haha | 02:50 |
rybridges | i will be in ireland for ptg in feb | 02:50 |
rybridges | so ill get you a ginger ale too | 00:50 |
*** mriedem has quit IRC | 02:56 | |
*** Dinesh_Bhor has quit IRC | 03:01 | |
*** yamahata has quit IRC | 03:12 | |
*** edmondsw has joined #openstack-nova | 03:15 | |
*** salv-orlando has joined #openstack-nova | 03:15 | |
*** abhishekk has joined #openstack-nova | 03:16 | |
*** mingyu has joined #openstack-nova | 03:17 | |
*** yamahata has joined #openstack-nova | 03:17 | |
*** edmondsw has quit IRC | 03:19 | |
*** salv-orlando has quit IRC | 03:20 | |
*** armax has quit IRC | 03:22 | |
*** mingyu has quit IRC | 03:22 | |
*** markvoelker has joined #openstack-nova | 03:24 | |
*** takashin has quit IRC | 03:25 | |
*** yamahata has quit IRC | 03:27 | |
*** lyan has quit IRC | 03:30 | |
*** tbachman has quit IRC | 03:42 | |
*** tbachman has joined #openstack-nova | 03:43 | |
*** mingyu has joined #openstack-nova | 03:45 | |
*** mingyu has quit IRC | 03:49 | |
*** dave-mccowan has quit IRC | 03:51 | |
*** Dinesh_Bhor has joined #openstack-nova | 03:52 | |
*** sridharg has joined #openstack-nova | 03:55 | |
*** markvoelker has quit IRC | 03:58 | |
*** armax has joined #openstack-nova | 04:01 | |
*** Apoorva has quit IRC | 04:04 | |
*** itlinux_ has joined #openstack-nova | 04:04 | |
*** salv-orlando has joined #openstack-nova | 04:16 | |
*** gouthamr has quit IRC | 04:18 | |
*** fragatina has quit IRC | 04:20 | |
*** fragatina has joined #openstack-nova | 04:20 | |
*** salv-orlando has quit IRC | 04:21 | |
*** takashin has joined #openstack-nova | 04:25 | |
*** psachin has quit IRC | 04:30 | |
*** psachin has joined #openstack-nova | 04:35 | |
*** andreas_s has joined #openstack-nova | 04:40 | |
*** Tom-Tom has quit IRC | 04:41 | |
*** janki has joined #openstack-nova | 04:43 | |
*** andreas_s has quit IRC | 04:45 | |
*** ratailor has joined #openstack-nova | 04:48 | |
*** phuongnh has quit IRC | 04:51 | |
*** annp has quit IRC | 04:51 | |
*** huanxie has quit IRC | 04:51 | |
*** phuongnh has joined #openstack-nova | 04:51 | |
*** annp has joined #openstack-nova | 04:52 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: api-ref: Verify parameters in servers.inc https://review.openstack.org/528201 | 04:52 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: api-ref: Verify parameters in servers.inc https://review.openstack.org/528201 | 04:52 |
*** armax has quit IRC | 04:53 | |
*** daidv has quit IRC | 04:54 | |
*** hieulq has quit IRC | 04:54 | |
*** tuanla____ has quit IRC | 04:54 | |
*** hoangcx has quit IRC | 04:54 | |
*** huanxie has joined #openstack-nova | 04:54 | |
*** tuanla____ has joined #openstack-nova | 04:55 | |
*** daidv has joined #openstack-nova | 04:55 | |
*** hieulq has joined #openstack-nova | 04:55 | |
*** hoangcx has joined #openstack-nova | 04:55 | |
*** markvoelker has joined #openstack-nova | 04:55 | |
*** gcb has quit IRC | 04:56 | |
*** yamamoto has joined #openstack-nova | 04:57 | |
*** gcb has joined #openstack-nova | 04:58 | |
*** yamamoto has quit IRC | 05:07 | |
*** psachin has quit IRC | 05:07 | |
*** yamamoto has joined #openstack-nova | 05:08 | |
*** janki has quit IRC | 05:09 | |
*** janki has joined #openstack-nova | 05:10 | |
*** Tom-Tom has joined #openstack-nova | 05:11 | |
*** psachin has joined #openstack-nova | 05:17 | |
*** salv-orlando has joined #openstack-nova | 05:17 | |
openstackgerrit | Nakanishi Tomotaka proposed openstack/nova master: Use Placement API to check resource usage https://review.openstack.org/528953 | 05:21 |
*** salv-orlando has quit IRC | 05:22 | |
*** armax has joined #openstack-nova | 05:22 | |
*** yamamoto has quit IRC | 05:27 | |
*** markvoelker has quit IRC | 05:28 | |
*** armax has quit IRC | 05:28 | |
*** yamamoto has joined #openstack-nova | 05:30 | |
*** penick has joined #openstack-nova | 05:32 | |
*** tuanla____ has quit IRC | 05:33 | |
*** Dinesh_Bhor has quit IRC | 05:34 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:34 | |
*** yamamoto has quit IRC | 05:36 | |
*** penick has quit IRC | 05:36 | |
*** penick_ has joined #openstack-nova | 05:36 | |
*** Dinesh_Bhor has quit IRC | 05:38 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:39 | |
*** Dinesh_Bhor has quit IRC | 05:42 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:43 | |
*** links has joined #openstack-nova | 05:47 | |
*** chyka has joined #openstack-nova | 05:50 | |
*** chyka has quit IRC | 05:51 | |
*** tuanla____ has joined #openstack-nova | 05:58 | |
*** Tom-Tom has quit IRC | 06:02 | |
openstackgerrit | Rajesh Tailor proposed openstack/nova master: Host addition host-aggregate should be case-sensitive https://review.openstack.org/498334 | 06:05 |
openstackgerrit | Rajesh Tailor proposed openstack/nova master: Fix case-sensitivity for metadata keys https://review.openstack.org/504885 | 06:06 |
*** itlinux_ has quit IRC | 06:08 | |
*** annp has quit IRC | 06:08 | |
*** annp has joined #openstack-nova | 06:09 | |
*** afazekas has quit IRC | 06:11 | |
*** afazekas has joined #openstack-nova | 06:11 | |
*** penick_ has quit IRC | 06:12 | |
*** karthiks has joined #openstack-nova | 06:13 | |
*** namnh has joined #openstack-nova | 06:13 | |
*** itlinux_ has joined #openstack-nova | 06:14 | |
*** itlinux_ has quit IRC | 06:15 | |
*** chyka has joined #openstack-nova | 06:15 | |
*** chyka has quit IRC | 06:20 | |
*** armax has joined #openstack-nova | 06:21 | |
*** janki has quit IRC | 06:22 | |
*** Dinesh_Bhor has quit IRC | 06:23 | |
*** markvoelker has joined #openstack-nova | 06:25 | |
*** armax has quit IRC | 06:28 | |
*** yamamoto has joined #openstack-nova | 06:35 | |
*** hiro-kobayashi has quit IRC | 06:36 | |
*** Dinesh_Bhor has joined #openstack-nova | 06:37 | |
*** moshele has quit IRC | 06:40 | |
*** Tom-Tom has joined #openstack-nova | 06:41 | |
*** yamamoto has quit IRC | 06:41 | |
*** trungnv has joined #openstack-nova | 06:43 | |
*** moshele has joined #openstack-nova | 06:49 | |
*** edmondsw has joined #openstack-nova | 06:51 | |
*** moshele has quit IRC | 06:55 | |
*** edmondsw has quit IRC | 06:55 | |
*** markvoelker has quit IRC | 06:59 | |
*** _gryf has joined #openstack-nova | 06:59 | |
*** jchhatbar has joined #openstack-nova | 07:00 | |
openstackgerrit | jichenjc proposed openstack/nova master: Remove 'nova-manage shell' command https://review.openstack.org/521835 | 07:00 |
openstackgerrit | jichenjc proposed openstack/nova master: Remove 'nova-manage account' and 'nova-manage project' https://review.openstack.org/521833 | 07:00 |
openstackgerrit | jichenjc proposed openstack/nova master: Remove 'nova-manage logs' command https://review.openstack.org/522133 | 07:00 |
*** kumarmn has joined #openstack-nova | 07:00 | |
*** kumarmn has quit IRC | 07:05 | |
*** chyka has joined #openstack-nova | 07:08 | |
*** armax has joined #openstack-nova | 07:10 | |
*** armax has quit IRC | 07:11 | |
*** ludo has joined #openstack-nova | 07:12 | |
*** ludo is now known as Guest72028 | 07:12 | |
Guest72028 | Hello All | 07:12 |
*** chyka has quit IRC | 07:13 | |
*** pchavva has quit IRC | 07:14 | |
*** andreas_s has joined #openstack-nova | 07:18 | |
*** salv-orlando has joined #openstack-nova | 07:19 | |
Guest72028 | I have a question for nova specialists about nova resources: is it possible to have spare compute nodes? is it possible to reserve compute resources, or to prioritize the rebuild order of VMs during the evacuation process? | 07:19 |
*** claudiub has joined #openstack-nova | 07:20 | |
*** nore_rabel has joined #openstack-nova | 07:22 | |
*** salv-orlando has quit IRC | 07:23 | |
*** rcernin has quit IRC | 07:31 | |
*** Eran_Kuris has quit IRC | 07:32 | |
*** ircuser-1 has joined #openstack-nova | 07:39 | |
*** tetsuro has quit IRC | 07:43 | |
*** yamamoto has joined #openstack-nova | 07:45 | |
*** Dave has quit IRC | 07:48 | |
*** Dave has joined #openstack-nova | 07:49 | |
*** _gryf has quit IRC | 07:52 | |
*** AlexeyAbashkin has joined #openstack-nova | 07:52 | |
*** hoangcx has quit IRC | 07:53 | |
*** salv-orlando has joined #openstack-nova | 07:54 | |
*** sahid has joined #openstack-nova | 07:55 | |
*** takashin has left #openstack-nova | 08:00 | |
*** rcernin has joined #openstack-nova | 08:02 | |
*** hoangcx has joined #openstack-nova | 08:05 | |
*** Dinesh_Bhor has quit IRC | 08:07 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:08 | |
*** ralonsoh has joined #openstack-nova | 08:10 | |
*** daidv has quit IRC | 08:10 | |
*** hieulq has quit IRC | 08:10 | |
*** tuanla____ has quit IRC | 08:10 | |
*** tuanla____ has joined #openstack-nova | 08:11 | |
*** daidv has joined #openstack-nova | 08:11 | |
*** hieulq has joined #openstack-nova | 08:11 | |
*** Dinesh_Bhor has quit IRC | 08:17 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:18 | |
*** Dinesh_Bhor has quit IRC | 08:21 | |
*** xinliang has quit IRC | 08:22 | |
*** alexchadin has joined #openstack-nova | 08:22 | |
*** moshele has joined #openstack-nova | 08:24 | |
*** Dinesh_Bhor has joined #openstack-nova | 08:24 | |
openstackgerrit | 龚肖 proposed openstack/nova stable/pike: compute: Catch binding failed exception while init host https://review.openstack.org/528985 | 08:25 |
*** Guest72028 has left #openstack-nova | 08:25 | |
*** andreas__ has joined #openstack-nova | 08:30 | |
*** andreas_s has quit IRC | 08:34 | |
*** xinliang has joined #openstack-nova | 08:34 | |
openstackgerrit | TommyLike proposed openstack/nova master: Remove redundant try/except block when authorize https://review.openstack.org/528991 | 08:35 |
openstackgerrit | Mr Rambo proposed openstack/nova master: Fix the problems that volume-backed server rebuild https://review.openstack.org/528994 | 08:35 |
*** edmondsw has joined #openstack-nova | 08:39 | |
*** Dinesh_Bhor has quit IRC | 08:43 | |
*** edmondsw has quit IRC | 08:44 | |
*** damien_r has joined #openstack-nova | 08:45 | |
*** salv-orlando has quit IRC | 08:45 | |
*** salv-orlando has joined #openstack-nova | 08:46 | |
*** alexchadin has quit IRC | 08:47 | |
*** mdnadeem has joined #openstack-nova | 08:47 | |
*** alexchadin has joined #openstack-nova | 08:48 | |
*** priteau has joined #openstack-nova | 08:50 | |
*** jpena|off is now known as jpena | 08:51 | |
*** salv-orlando has quit IRC | 08:51 | |
*** trungnv has quit IRC | 08:52 | |
*** karthiks has quit IRC | 08:56 | |
*** markvoelker has joined #openstack-nova | 08:56 | |
*** cdent has joined #openstack-nova | 08:57 | |
*** chyka has joined #openstack-nova | 08:58 | |
*** salv-orlando has joined #openstack-nova | 08:58 | |
*** brault has joined #openstack-nova | 08:59 | |
*** andreas_s has joined #openstack-nova | 09:00 | |
*** chyka has quit IRC | 09:02 | |
*** andreas__ has quit IRC | 09:04 | |
lyarwood | mdbooth: thanks for the reviews yesterday btw, is your uuid series ready for review? | 09:05 |
*** Dinesh_Bhor has joined #openstack-nova | 09:05 | |
mdbooth | lyarwood: Mostly, yes. | 09:07 |
mdbooth | lyarwood: Well, most of it | 09:07 |
mdbooth | lyarwood: I'd very much like your input on this one: https://review.openstack.org/#/c/528363/ | 09:08 |
mdbooth | But I'm about to rebase that into the main series, because I need it for the next patch | 09:08 |
*** karthiks has joined #openstack-nova | 09:08 | |
mdbooth | The series is here: https://review.openstack.org/#/q/topic:bp/local-disk-serial-numbers+(status:open+OR+status:merged) | 09:09 |
*** Dinesh_Bhor has quit IRC | 09:10 | |
lyarwood | mdbooth: ack, looking | 09:10 |
*** ttsiouts has quit IRC | 09:11 | |
*** ttsiouts has joined #openstack-nova | 09:11 | |
mdbooth | lyarwood: Thanks | 09:13 |
*** Dinesh_Bhor has joined #openstack-nova | 09:22 | |
*** Dinesh_Bhor has quit IRC | 09:24 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:25 | |
*** mvk has quit IRC | 09:25 | |
*** jaianshu has joined #openstack-nova | 09:27 | |
*** markvoelker has quit IRC | 09:29 | |
*** Dinesh_Bhor has quit IRC | 09:30 | |
*** lucas-afk is now known as lucasagomes | 09:32 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:33 | |
*** derekh has joined #openstack-nova | 09:36 | |
openstackgerrit | Mr Rambo proposed openstack/nova master: Fix the problems that volume-backed server rebuild https://review.openstack.org/528740 | 09:37 |
*** josecastroleon has joined #openstack-nova | 09:37 | |
maciejjozefczyk | gibi jaypipes: Hello, thanks for your comments :) Please check my response https://review.openstack.org/#/c/520024/ Thank you! | 09:41 |
*** Dinesh_Bhor has quit IRC | 09:42 | |
*** ratailor has quit IRC | 09:44 | |
*** yangyapeng has quit IRC | 09:45 | |
*** mvk has joined #openstack-nova | 09:52 | |
*** afazekas has quit IRC | 10:01 | |
*** namnh has quit IRC | 10:01 | |
lyarwood | mdbooth: so the DriverVolumeBlockDevice change LGTM, could we set self.connection_info earlier and just pass that around within the method? | 10:02 |
mdbooth | lyarwood: Which function are you referring to specifically? | 10:05 |
lyarwood | mdbooth: _legacy_volume_attach or _volume_attach in block_device.py | 10:05 |
*** josecastroleon has quit IRC | 10:06 | |
mdbooth | Sec, just reading and digesting your review comment | 10:06 |
mdbooth | lyarwood: Ah, you're talking about *not* changing the interface? | 10:07 |
*** josecastroleon has joined #openstack-nova | 10:07 | |
mdbooth | And continuing to pass connection info explicitly? | 10:07 |
*** afazekas has joined #openstack-nova | 10:07 | |
lyarwood | mdbooth: no, just setting self.connection_info when we actually fetch it from cinder | 10:07 |
mdbooth | Ah... | 10:07 |
mdbooth | Yep, that would be cleaner. | 10:08 |
*** norman has joined #openstack-nova | 10:08 | |
mdbooth | It would make the patch a bit noisier, though... | 10:08 |
mdbooth | And it's already pretty noisy. | 10:08 |
mdbooth | Hmm... | 10:09 |
lyarwood | yeah I assumed that's why you didn't do it tbh | 10:09 |
norman | hi all, does anyone know why the domain XML of a live-migrated instance has <features> under the <cpu> section, but newly booted instances do not? | 10:16 |
norman | I tried going through the code but failed to find any clues. I am still using Mitaka, so I'm not sure whether newer versions behave the same | 10:17 |
Tahvok | Hey guys, I'm unable to find a good example of ComputeCapabilitiesFilter. Do you apply it on the flavor's metadata or on the compute host somehow? If it should be on the compute host, where exactly do I specify my 'capabilities' filter? | 10:17 |
*** MikeG451 has joined #openstack-nova | 10:18 | |
*** karthiks has quit IRC | 10:20 | |
*** fragatin_ has joined #openstack-nova | 10:22 | |
*** fragatina has quit IRC | 10:22 | |
*** norman has quit IRC | 10:23 | |
*** psachin has quit IRC | 10:24 | |
*** norman has joined #openstack-nova | 10:24 | |
*** namnh has joined #openstack-nova | 10:25 | |
*** namnh has quit IRC | 10:25 | |
*** markvoelker has joined #openstack-nova | 10:26 | |
*** edmondsw has joined #openstack-nova | 10:27 | |
openstackgerrit | Merged openstack/nova stable/ocata: Make request_spec.spec MediumText https://review.openstack.org/528332 | 10:28 |
openstackgerrit | Merged openstack/nova master: Fix the formatting for 2.56 in the compute REST API history doc https://review.openstack.org/528114 | 10:28 |
*** moshele has quit IRC | 10:29 | |
*** mvk has quit IRC | 10:31 | |
*** edmondsw has quit IRC | 10:32 | |
*** mvk has joined #openstack-nova | 10:32 | |
*** phuongnh has quit IRC | 10:37 | |
*** yikun has quit IRC | 10:37 | |
*** josecastroleon has quit IRC | 10:37 | |
*** yikun has joined #openstack-nova | 10:37 | |
*** sambetts|afk is now known as sambetts | 10:39 | |
lyarwood | kashyap: http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/screen-n-cpu.txt.gz#_2017-12-18_20_20_13_894 - seeing this on stable/newton, LM failure for a single paused instance, looks like the remote libvirtd didn't respond in time, have you seen this before? | 10:41 |
lyarwood | kashyap: http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/job-output.txt.gz#_2017-12-18_20_20_56_230121 is the tempest failure | 10:41 |
* kashyap clicks | 10:41 | |
*** jchhatbar is now known as janki | 10:41 | |
kashyap | lyarwood: Is this only stable/newton? | 10:42 |
lyarwood | kashyap: that's the only place I've seen it thus far | 10:42 |
*** inara has quit IRC | 10:42 | |
* kashyap checks the other libvirt log to see about the timeout | 10:42 | |
* kashyap notes to himself: If you see "subnode-2" in the URL, that means it's the 'source' host. | 10:43 | |
*** norman has quit IRC | 10:43 | |
*** brault has quit IRC | 10:44 | |
*** inara has joined #openstack-nova | 10:44 | |
*** brault has joined #openstack-nova | 10:45 | |
*** karthiks has joined #openstack-nova | 10:46 | |
*** yangyapeng has joined #openstack-nova | 10:47 | |
*** brault has quit IRC | 10:50 | |
kashyap | lyarwood: So I looked at all the logs, one thing that potentially jumps out at me is in the source QEMU log: | 10:50 |
kashyap | [...] | 10:50 |
kashyap | [...] | 10:50 |
kashyap | warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5] | 10:50 |
kashyap | main-loop: WARNING: I/O thread spun for 1000 iterations | 10:50 |
kashyap | Now, that warning isn't really an egregious error (& upstream QEMU is aware of it; it's a hard thing to fix), but that might be contributing to it | 10:50 |
kashyap | Where "it" is the timeout we see: | 10:51 |
kashyap | 2017-12-18 20:20:13.880+0000: 16820: error : virKeepAliveTimerInternal:143 : internal error: connection closed due to keepalive timeout | 10:51 |
kashyap | 2017-12-18 20:20:13.881+0000: 16825: error : virKeepAliveTimerInternal:143 : internal error: connection closed due to keepalive timeout | 10:51 |
kashyap | (From the source) | 10:51 |
*** brault has joined #openstack-nova | 10:51 | |
* kashyap checks with a migration dev | 10:51 | |
lyarwood | kashyap: kk, so the dest was stuck, didn't send a keepalive and everything dies? | 10:52 |
kashyap | lyarwood: Yeah, from the destination libvirtd log: | 10:52 |
kashyap | 2017-12-18 20:20:19.394+0000: 23816: error : qemuMonitorIOWrite:545 : Unable to write to monitor: Broken pipe | 10:52 |
kashyap | The above means libvirt lost access to the QMP socket connection, i.e. VM died | 10:53 |
*** yangyapeng has quit IRC | 10:53 | |
lyarwood | kashyap: it's paused, that shouldn't cause the QMP socket to die however right? | 10:53 |
*** huanxie has quit IRC | 10:54 | |
*** huanxie has joined #openstack-nova | 10:54 | |
*** yamamoto has quit IRC | 10:55 | |
kashyap | lyarwood: Yeah, clearly something is wonky. I'll check this w/ Dave Gilbert as he meditates on migration | 10:55 |
*** yamamoto has joined #openstack-nova | 10:55 | |
kashyap | But the process _is_ killed, as we see from the destination (http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/libvirt/qemu/instance-00000004.txt.gz): | 10:55 |
kashyap | 2017-12-18T20:17:51.612132Z qemu-system-x86_64: terminating on signal 15 from pid 16820 | 10:55 |
*** Tom-Tom has quit IRC | 10:56 | |
*** yamamoto has quit IRC | 10:56 | |
*** yamamoto has joined #openstack-nova | 10:56 | |
*** fragatina has joined #openstack-nova | 10:57 | |
*** abhishekk has quit IRC | 10:57 | |
*** fragatin_ has quit IRC | 10:57 | |
openstackgerrit | Chen Hanxiao proposed openstack/nova master: libvirt: don't call sync_guest_time if qga is not enabled https://review.openstack.org/524836 | 10:58 |
lyarwood | kashyap: kk, thanks, I'm going to recheck this change and see if we can hit it again | 10:58 |
kashyap | So it is the live block migration, right | 10:58 |
kashyap | (To pause it) | 10:58 |
*** damien_r has left #openstack-nova | 10:58 | |
kashyap | LiveMigrationTest.test_live_block_migration_paused | 10:58 |
lyarwood | kashyap: yup, LM of a paused instance without shared storage | 10:59 |
*** markvoelker has quit IRC | 11:00 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Rename block_device_info_get_root https://review.openstack.org/529028 | 11:01 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add local_root to block_device_info https://review.openstack.org/529029 | 11:01 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Expose driver_block_device fields as attributes https://review.openstack.org/528362 | 11:03 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Pass DriverBlockDevice to driver.attach_volume https://review.openstack.org/528363 | 11:03 |
*** yangyapeng has joined #openstack-nova | 11:03 | |
*** yangyapeng has quit IRC | 11:08 | |
*** salv-orlando has quit IRC | 11:09 | |
*** salv-orlando has joined #openstack-nova | 11:09 | |
*** yamamoto has quit IRC | 11:09 | |
*** andreas_s has quit IRC | 11:10 | |
*** andreas_s has joined #openstack-nova | 11:11 | |
openstackgerrit | Merged openstack/nova master: Implement query param schema for migration index https://review.openstack.org/518644 | 11:13 |
*** yamamoto has joined #openstack-nova | 11:13 | |
kashyap | lyarwood: So a couple of things from interacting w/ Dan & Dave from QEMU: | 11:14 |
*** salv-orlando has quit IRC | 11:14 | |
kashyap | (1) You see the _later_ log messages (on destination) are 3 before the earlier log message. | 11:14 |
kashyap | So that's some weird timestamps there. | 11:15 |
kashyap | (2) We're not the first to hit this case; there's this existing bug https://bugzilla.redhat.com/show_bug.cgi?id=1367620 ("storage migration fails due to keepalive timeout") | 11:17 |
openstack | bugzilla.redhat.com bug 1367620 in libvirt "storage migration fails due to keepalive timeout" [High,Assigned] - Assigned to jdenemar | 11:17 |
*** yamamoto has quit IRC | 11:18 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Expose BDM uuid to drivers https://review.openstack.org/529037 | 11:18 |
kashyap | lyarwood: Wonder if you have a link to how many times this was hit in the past / this week? | 11:19 |
lyarwood | kashyap: I don't have one to hand now but I can create one, and an upstream bug for this | 11:21 |
mdbooth | lyarwood: Don't know if you're still looking at it, but I've been messing with that series this morning. | 11:22 |
*** AlexeyAbashkin has quit IRC | 11:22 | |
lyarwood | assuming we've seen it more than once | 11:22 |
mdbooth | Not quite finished yet. | 11:22 |
lyarwood | mdbooth: kk I stopped after the earlier change sorry | 11:22 |
mdbooth | lyarwood: NP. I'd have been messing you about anyway. | 11:23 |
kashyap | lyarwood: I'm creating a potential reproducer, and can file one with that | 11:23 |
kashyap | lyarwood: But if you've already drafted a bug / issue, go ahead & submit it | 11:23 |
* kashyap looks at logstash meanwhile | 11:24 | |
*** andreas_s has quit IRC | 11:24 | |
*** yangyapeng has joined #openstack-nova | 11:24 | |
*** huanxie has quit IRC | 11:25 | |
*** andreas_s has joined #openstack-nova | 11:26 | |
openstackgerrit | Merged openstack/nova master: Remove 'nova-manage shell' command https://review.openstack.org/521835 | 11:27 |
*** gszasz has joined #openstack-nova | 11:28 | |
*** yangyapeng has quit IRC | 11:29 | |
*** huanxie has joined #openstack-nova | 11:30 | |
*** andreas_s has quit IRC | 11:30 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Give volume DriverBlockDevice classes a common prefix https://review.openstack.org/526346 | 11:32 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add DriverLocalImageBlockDevice https://review.openstack.org/526347 | 11:32 |
*** david_8 has quit IRC | 11:33 | |
*** carthaca_ has quit IRC | 11:33 | |
*** mkoderer_ has quit IRC | 11:33 | |
*** tpatzig_4 has quit IRC | 11:33 | |
*** tpatzig_5 has joined #openstack-nova | 11:34 | |
*** carthaca_1 has joined #openstack-nova | 11:34 | |
*** david_9 has joined #openstack-nova | 11:34 | |
*** dgonzalez_ has joined #openstack-nova | 11:34 | |
*** carthaca_ has joined #openstack-nova | 11:34 | |
*** mkoderer_ has joined #openstack-nova | 11:34 | |
openstackgerrit | Matthew Booth proposed openstack/nova master: Use real block_device_info data in test_blockinfo https://review.openstack.org/527916 | 11:34 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Rename block_device_info_get_root https://review.openstack.org/529028 | 11:34 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Add local_root to block_device_info https://review.openstack.org/529029 | 11:34 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Expose driver_block_device fields as attributes https://review.openstack.org/528362 | 11:34 |
openstackgerrit | Matthew Booth proposed openstack/nova master: Pass DriverBlockDevice to driver.attach_volume https://review.openstack.org/528363 | 11:34 |
*** dgonzalez_ has quit IRC | 11:36 | |
*** carthaca_1 has quit IRC | 11:36 | |
*** AlexeyAbashkin has joined #openstack-nova | 11:38 | |
*** brault has quit IRC | 11:39 | |
*** andreas_s has joined #openstack-nova | 11:40 | |
*** alexchadin has quit IRC | 11:44 | |
*** yangyapeng has joined #openstack-nova | 11:45 | |
*** brault has joined #openstack-nova | 11:45 | |
*** yangyapeng has quit IRC | 11:49 | |
*** andreas_s has quit IRC | 11:50 | |
*** brault has quit IRC | 11:50 | |
*** andreas_s has joined #openstack-nova | 11:51 | |
*** andreas_s has quit IRC | 11:53 | |
*** andreas_s has joined #openstack-nova | 11:53 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Change 'InstancePCIRequest' spec field https://review.openstack.org/449257 | 11:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Add Neutron port capabilities to devspec in request https://review.openstack.org/451777 | 11:53 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Format NIC features using os-traits definitions https://review.openstack.org/466051 | 11:53 |
*** smatzek has joined #openstack-nova | 11:56 | |
*** markvoelker has joined #openstack-nova | 11:57 | |
*** gszasz has quit IRC | 11:59 | |
*** gszasz has joined #openstack-nova | 12:00 | |
stephenfin | ralonsoh: If you have the time to address it, I rebased and left a question on https://review.openstack.org/#/c/449257/ | 12:00 |
openstackgerrit | Merged openstack/nova master: Pass mountpoint to volume attachment_update https://review.openstack.org/527468 | 12:00 |
*** huanxie has quit IRC | 12:01 | |
ralonsoh | stephenfin: I'll take a look at those patches on Friday. This patch depends on two other patches, 449257 and 451777 | 12:02 |
stephenfin | ralonsoh: 449257 is the one I'm referring to :) | 12:02 |
ralonsoh | stephenfin: https://review.openstack.org/#/c/466051/ shouldn't be on top of master | 12:02 |
ralonsoh | stephenfin: ok, thanks! I'll take a look at those patches on Friday | 12:03 |
stephenfin | ralonsoh: It isn't - it's on top of 449257 and 451777 | 12:03 |
ralonsoh | stephenfin: I was lookin at the wrong patch | 12:03 |
ralonsoh | looking | 12:04 |
gibi | jaypipes: some extra madness for the server_group functional tests https://bugs.launchpad.net/nova/+bug/1739013 | 12:04 |
openstack | Launchpad bug 1739013 in OpenStack Compute (nova) "nova.tests.functional.test_server_group.ServerGroupTest*.test_evacuate_with_anti_affinity does not validate that evacuation really happens" [Undecided,New] | 12:04 |
*** yangyapeng has joined #openstack-nova | 12:05 | |
*** huanxie has joined #openstack-nova | 12:05 | |
*** yamamoto has joined #openstack-nova | 12:06 | |
*** alexchadin has joined #openstack-nova | 12:06 | |
openstackgerrit | Yikun Jiang (Kero) proposed openstack/python-novaclient master: Microversion 2.58 - Instance actions list pagination https://review.openstack.org/528601 | 12:07 |
*** Brin has joined #openstack-nova | 12:09 | |
*** salv-orlando has joined #openstack-nova | 12:10 | |
*** yamamoto has quit IRC | 12:10 | |
*** yangyapeng has quit IRC | 12:11 | |
*** zhangbailin_ has joined #openstack-nova | 12:12 | |
*** Brin has quit IRC | 12:12 | |
*** salv-orlando has quit IRC | 12:14 | |
*** edmondsw has joined #openstack-nova | 12:16 | |
*** annp has quit IRC | 12:17 | |
*** zhangbailin_ has quit IRC | 12:17 | |
jaypipes | gibi: not sure how much more madness you can get in that... :) | 12:17 |
ebbex | I've created a server with swap, where both root and swap are rbd, yet I have a "huge" swap-file on my compute node under nova/instances/_base/, and I see in the logs a "nova-rootwrap touch -c ...ova/instances/_base/swap_16384" going off about once a minute on that compute node. Where can I read up on the code that creates that file, and how long is the swap-file supposed to stay there? | 12:18 |
jaypipes | ebbex: the swap file should stay there for the life of the VM (since it's the swap content for the image...) | 12:19 |
jaypipes | ebbex: though I'm not sure why you'd see the touch -c command show up more than once. that's odd... | 12:20 |
*** edmondsw has quit IRC | 12:20 | |
ebbex | "virsh domblklist instance-00000041" gives: vdb vms/6e366e4b-2d87-48ac-a99c-999706e7e4f0_disk.swap, (on the ceph cluster) which I take it is where the instance gets to write swap to, right? No actual writes going to the _base/swap | 12:22 |
*** links has quit IRC | 12:23 | |
ebbex | I'm afraid that we might end up with a full disk thanks to swap images on our computenode as we have really small disks there. Yet vast amounts of storage on ceph. | 12:24 |
jaypipes | ebbex: hmm, I'm not sure... mdbooth you around? | 12:25 |
lyarwood | mdbooth: ^ that smells like a bug, looksing at the code the fetch_func for creating swap is always _create_swap in nova/virt/libvirt/driver.py | 12:26 |
lyarwood | looksing | 12:26 |
lyarwood | :| | 12:26 |
jaypipes | mdbooth: does swap file get fulfilled by local disk even when ceph is used? | 12:26 |
jaypipes | oh, hey lyarwood :) | 12:26 |
lyarwood | \o_ morning | 12:26 |
*** mlavalle has joined #openstack-nova | 12:28 | |
ebbex | jaypipes: Yeah, I think it's kinda odd touching the file every minute, if the ImageCache tries something like _remove_old_enough*. I don't really understand how it's all supposed to hang together. | 12:28 |
*** yangyapeng has joined #openstack-nova | 12:29 | |
jaypipes | ebbex: we've sent up the bat-signal for mdbooth :) hopefully he can share his insight on this (I'm afraid I'm not proficient enough in this area of the codebase) | 12:29 |
*** markvoelker has quit IRC | 12:29 | |
*** brault has joined #openstack-nova | 12:30 | |
ebbex | :) | 12:30 |
*** gszasz has quit IRC | 12:33 | |
*** yangyapeng has quit IRC | 12:34 | |
*** huanxie has quit IRC | 12:36 | |
*** yamamoto has joined #openstack-nova | 12:38 | |
*** yamamoto has quit IRC | 12:39 | |
*** tuanla____ has quit IRC | 12:39 | |
*** lucasagomes is now known as lucas-hungry | 12:42 | |
jaypipes | stephenfin: I'd just go ahead and take over that InstancePCIRequest patch... ralonsoh, you cool with that? | 12:42 |
*** huanxie has joined #openstack-nova | 12:44 | |
*** gszasz has joined #openstack-nova | 12:45 | |
*** alexchadin has quit IRC | 12:46 | |
ralonsoh | jaypipes, stephenfin: but I've been taking care of my remaining patches. Anyway, if doing this gets this 8-month-old patch merged, it's OK | 12:47 |
openstackgerrit | Claudiu Belu proposed openstack/nova master: tests: autospecs all the mock.patch usages https://review.openstack.org/470775 | 12:48 |
*** jpena is now known as jpena|lunch | 12:48 | |
*** weshay_pto is now known as weshay | 12:48 | |
*** yangyapeng has joined #openstack-nova | 12:49 | |
*** aarefiev has joined #openstack-nova | 12:50 | |
jaypipes | ralonsoh: cool. it's just that stephenfin has been making some other changes around InstancePCIRequest object to support NUMA PCI affinity policy, so I thought it would be easier to have him take it over. | 12:52 |
*** ralonsoh has quit IRC | 12:53 | |
*** yangyapeng has quit IRC | 12:53 | |
jaypipes | gibi: are you planning on pushing a patch around that test_server_groups.py bug? | 12:54 |
*** claudiub|2 has joined #openstack-nova | 12:57 | |
*** zhurong has joined #openstack-nova | 13:00 | |
*** claudiub has quit IRC | 13:00 | |
*** zhurong has quit IRC | 13:02 | |
*** zhurong has joined #openstack-nova | 13:03 | |
*** claudiub has joined #openstack-nova | 13:04 | |
gibi | jaypipes: yes, I'm working on that right now | 13:07 |
jaypipes | gibi: cool. | 13:07 |
*** claudiub|2 has quit IRC | 13:07 | |
*** yangyapeng has joined #openstack-nova | 13:10 | |
*** salv-orlando has joined #openstack-nova | 13:11 | |
*** janki has quit IRC | 13:11 | |
*** janki has joined #openstack-nova | 13:11 | |
*** catintheroof has joined #openstack-nova | 13:11 | |
*** catintheroof has quit IRC | 13:12 | |
openstackgerrit | Merged openstack/nova master: Deprecate configurable Hide Server Address Feature https://review.openstack.org/526297 | 13:12 |
*** catintheroof has joined #openstack-nova | 13:12 | |
*** huanxie has quit IRC | 13:14 | |
*** yangyapeng has quit IRC | 13:15 | |
*** salv-orlando has quit IRC | 13:15 | |
*** links has joined #openstack-nova | 13:18 | |
*** r-daneel has joined #openstack-nova | 13:19 | |
*** rcernin has quit IRC | 13:20 | |
*** huanxie has joined #openstack-nova | 13:20 | |
*** r-daneel has quit IRC | 13:20 | |
*** zhurong has quit IRC | 13:25 | |
*** markvoelker has joined #openstack-nova | 13:27 | |
*** markvoelker has quit IRC | 13:28 | |
*** jaianshu has quit IRC | 13:28 | |
*** markvoelker has joined #openstack-nova | 13:29 | |
*** yangyapeng has joined #openstack-nova | 13:30 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix false positive server group functional tests https://review.openstack.org/529063 | 13:30 |
gibi | jaypipes: ^^ | 13:30 |
jaypipes | cool, thanks | 13:30 |
*** Tom-Tom has joined #openstack-nova | 13:31 | |
cdent | gibi: since you've spent a lot of time in the servers functional tests, can you recall how good the coverage is for the various migrations? I'm hoping that existing tests cover https://review.openstack.org/#/c/528089/ | 13:34 |
*** r-daneel has joined #openstack-nova | 13:34 | |
*** lucas-hungry is now known as lucasagomes | 13:34 | |
*** yangyapeng has quit IRC | 13:35 | |
*** diga has joined #openstack-nova | 13:36 | |
*** stvnoyes has joined #openstack-nova | 13:36 | |
*** pchavva has joined #openstack-nova | 13:38 | |
*** ralonsoh has joined #openstack-nova | 13:39 | |
*** yangyapeng has joined #openstack-nova | 13:39 | |
*** yamamoto has joined #openstack-nova | 13:40 | |
*** liverpooler has joined #openstack-nova | 13:40 | |
*** yangyapeng has quit IRC | 13:44 | |
*** dave-mccowan has joined #openstack-nova | 13:45 | |
*** yamamoto has quit IRC | 13:48 | |
*** jpena|lunch is now known as jpena | 13:49 | |
*** huanxie has quit IRC | 13:51 | |
*** Tom-Tom_ has joined #openstack-nova | 13:51 | |
*** mingyu has joined #openstack-nova | 13:52 | |
*** Tom-Tom has quit IRC | 13:54 | |
*** huanxie has joined #openstack-nova | 13:56 | |
*** mriedem has joined #openstack-nova | 13:58 | |
*** lyan has joined #openstack-nova | 14:01 | |
bauzas | mmm, blaming libvirt/driver.py seems a bit habit :p | 14:02 |
bauzas | s/bit/bad | 14:02 |
*** links has quit IRC | 14:03 | |
*** yangyapeng has joined #openstack-nova | 14:06 | |
gibi | cdent: I think it is safe to assume that the changes in https://review.openstack.org/#/c/528089/ are covered by the existing functional tests | 14:10 |
gibi | cdent: I put the patch on my review list | 14:11 |
cdent | thanks gibi | 14:11 |
*** salv-orlando has joined #openstack-nova | 14:11 | |
mriedem | edleafe: the failure on https://review.openstack.org/#/c/511358/ is because we aren't removing the existing allocations for the instance (from the tried and failed host) before we try allocating resources on the alternate | 14:11 |
mriedem | so the report client thinks we're doing a move operation, which we aren't | 14:12 |
*** yangyapeng has quit IRC | 14:12 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix false positive server group functional tests https://review.openstack.org/529063 | 14:13 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix false positive server group functional tests https://review.openstack.org/529063 | 14:15 |
mriedem | _move_operation_alloc_request is broken if we get the allocation candidates using 1.12 | 14:15 |
*** salv-orlando has quit IRC | 14:16 | |
edleafe | mriedem: ok, just settling in. Will look over that shortly | 14:17 |
*** links has joined #openstack-nova | 14:17 | |
*** abhishekk has joined #openstack-nova | 14:18 | |
*** abhishekk_ has joined #openstack-nova | 14:19 | |
*** Tom-Tom_ has quit IRC | 14:19 | |
*** Tom-Tom has joined #openstack-nova | 14:19 | |
mriedem | i'll open a bug for the _move_operation_alloc_request thing | 14:20 |
*** salv-orlando has joined #openstack-nova | 14:23 | |
mriedem | https://bugs.launchpad.net/nova/+bug/1739042 | 14:23 |
openstack | Launchpad bug 1739042 in OpenStack Compute (nova) "_move_operation_alloc_request fails with TypeError when using 1.12 version allocation request" [Undecided,New] | 14:23 |
kashyap | mriedem: When you get a moment, my 'logstash' foo isn't helping me; I want to see how many times this error has occurred: "error: connection closed due to keepalive timeout" | 14:24 |
kashyap | Putting it verbatim here http://logstash.openstack.org/#/dashboard/file/logstash.json | 14:24 |
*** Tom-Tom has quit IRC | 14:24 | |
kashyap | Didn't help. | 14:24 |
mriedem | kashyap: where is it originating from? | 14:24 |
kashyap | mriedem: stable/newton | 14:24 |
mriedem | which file? | 14:24 |
kashyap | Let me get a link | 14:24 |
kashyap | mriedem: There - http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/job-output.txt.gz#_2017-12-18_20_20_56_230121 | 14:24 |
*** yangyapeng has joined #openstack-nova | 14:24 | |
kashyap | It's this one: LiveMigrationTest.test_live_block_migration_paused | 14:24 |
mriedem | i don't see "error: connection closed due to keepalive timeout" in there at all | 14:25 |
kashyap | I debugged it a bit this morning w/ upstream libvirt & QEMU folks. And I'm setting up a reproducer to see if I can get to it | 14:25 |
kashyap | mriedem: Ah, sorry; that error actually comes from libvirtd log, let me get that link | 14:25 |
mriedem | we don't index the libvirtd logs | 14:25 |
mriedem | which is why it's not in logstash | 14:25 |
lyarwood | it's also in n-cpu FWIW | 14:25 |
lyarwood | http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/screen-n-cpu.txt.gz#_2017-12-18_20_20_13_894 | 14:26 |
kashyap | mriedem: There - http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/libvirt/libvirtd.txt.gz#_2017-12-18_20_20_13_880 | 14:26 |
kashyap | Ah-ha | 14:26 |
kashyap | mriedem: Any reason we don't index it? | 14:26 |
mriedem | http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/screen-n-cpu.txt.gz#_2017-12-18_20_20_13_894 is debug | 14:26 |
mriedem | we index INFO+ | 14:26 |
*** huanxie has quit IRC | 14:26 | |
mriedem | we don't index libvirtd because it kills the indexer | 14:26 |
mriedem | too much content | 14:26 |
* kashyap nods | 14:26 | |
kashyap | Okay, the screen-n-cpu.txt has it | 14:26 |
lyarwood | mriedem: it's also above in ERROR | 14:26 |
kashyap | Yeah, it's in ERROR | 14:27 |
mriedem | http://logs.openstack.org/38/528338/4/check/legacy-tempest-dsvm-multinode-live-migration/d867726/logs/subnode-2/screen-n-cpu.txt.gz#_2017-12-18_20_20_13_893 | 14:27 |
mriedem | ok that should work | 14:27 |
kashyap | mriedem: Do you know how it could kill the index? Due to its size? | 14:27 |
mriedem | kashyap: yes | 14:27 |
mriedem | size | 14:27 |
mriedem | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Live%20Migration%20failure%3A%20internal%20error%3A%20connection%20closed%20due%20to%20keepalive%20timeout%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d | 14:28 |
kashyap | Ah, interesting, even ~217K is too much? | 14:28 |
kashyap | (Anyway, that's fine.) | 14:28 |
* kashyap clicks | 14:28 | |
kashyap | mriedem++ | 14:28 |
mriedem | kashyap: it's that times however many jobs we run PER DAY | 14:28 |
kashyap | No Karma bot | 14:28 |
*** smatzek has quit IRC | 14:29 | |
*** yangyapeng has quit IRC | 14:29 | |
kashyap | Okay, only 2 hits so far; thanks mriedem | 14:29 |
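For readers who want to adapt the logstash link mriedem posted above, here is a minimal sketch that decodes its query string. It only unpacks the URL already shown in the log; the variable names are illustrative and nothing here talks to the logstash service.

```python
from urllib.parse import parse_qs, urlsplit

# The dashboard link from the log, split only for readability.
url = (
    "http://logstash.openstack.org/#dashboard/file/logstash.json?"
    "query=message%3A%5C%22Live%20Migration%20failure%3A%20internal%20error"
    "%3A%20connection%20closed%20due%20to%20keepalive%20timeout%5C%22"
    "%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d"
)

# The dashboard keeps its parameters in the URL fragment (after '#'),
# so take the fragment and parse whatever follows the '?'.
fragment = urlsplit(url).fragment
params = parse_qs(fragment.split("?", 1)[1])

query = params["query"][0]      # message:... AND tags:... search expression
time_range = params["from"][0]  # look-back window

print(query)
print(time_range)
```

The decoded expression matches on the nova-compute ERROR message and restricts results to the indexed `screen-n-cpu.txt` file, which is why it finds hits even though the libvirtd log itself is not indexed.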
mriedem | yeah, but all on newton | 14:30 |
kashyap | Right | 14:30 |
*** andreas_s has quit IRC | 14:30 | |
mriedem | so i'm guessing it's related to the version of libvirt/qemu we have in newton | 14:30 |
kashyap | (Don't know the root cause of it yet; could be QEMU, could be libvirt. There are 2 other bugs filed for them - https://bugzilla.redhat.com/show_bug.cgi?id=1367620) | 14:30 |
openstack | bugzilla.redhat.com bug 1367620 in libvirt "storage migration fails due to keepalive timeout" [High,Assigned] - Assigned to jdenemar | 14:30 |
*** kumarmn has joined #openstack-nova | 14:30 | |
mriedem | i don't think we're using UCA packages in newton jobs | 14:30 |
*** andreas_s has joined #openstack-nova | 14:30 | |
mriedem | ^ is libvirt 1.3.1 and qemu 2.5 | 14:31 |
mriedem | we're way newer than that on queens | 14:31 |
kashyap | Yeah, saw the versions earlier in the day | 14:31 |
*** huanxie has joined #openstack-nova | 14:31 | |
kashyap | Is it worth it to use UCA in that case? Maybe not, for these rare one-off cases | 14:31 |
mriedem | not at this point for newton | 14:32 |
kashyap | Yep, noted. | 14:32 |
mriedem | bauzas: this is a regression introduced in newton https://review.openstack.org/#/c/528835/ - would be good to get your review on that | 14:32 |
openstackgerrit | Bernhard M. Wiedemann proposed openstack/nova master: Fix 4 doc typos https://review.openstack.org/529084 | 14:33 |
bauzas | mriedem: ack, looking | 14:35 |
*** abhishekk has quit IRC | 14:35 | |
bauzas | mriedem: ah, good call | 14:36 |
bauzas | I remember we had a shit number of races for the BuildRequest object | 14:37 |
edleafe | mriedem: about the func test failure: this line should de-allocate against the instance: https://review.openstack.org/#/c/511358/43/nova/compute/manager.py@1778 | 14:38 |
mriedem | right so what was added there in newton was just for novalidhost on the initial create | 14:38 |
mriedem | but didn't take into account reschedules | 14:39 |
mdbooth | mriedem: Any chance you could have another look at the BDM uuid patches? https://review.openstack.org/#/c/242602/25 and the following 2 are the ones which do the db modification. I addressed your review comments. | 14:39 |
bauzas | mriedem: the point is that we were not having cell conductors yet | 14:39 |
mriedem | edleafe: ah, well, that's a race :) | 14:39 |
mriedem | edleafe: we cast to build_instances *before* compute cleans up the allocations | 14:40 |
bauzas | mriedem: now that we reschedule per cell conductors, yes it's a problem | 14:40 |
mriedem | bauzas: you could still run newton in split MQ mode | 14:40 |
mriedem | and split db | 14:40 |
mriedem | i think anyway | 14:40 |
edleafe | mriedem: so it's only locked for build, not claim | 14:40 |
bauzas | mriedem: sure | 14:41 |
mriedem | edleafe: the lock in compute doesn't matter | 14:41 |
mriedem | compute rpc casts to conductor build_instances | 14:41 |
mriedem | and then goes to delete the allocation for the instance | 14:41 |
edleafe | mriedem: that's my point - it's only locking builds for that host | 14:42 |
mriedem | in fact, this could overwrite what conductor claims on the alternate if the timing window hits it just right | 14:42 |
edleafe | I'll move the allocation cleanup so it is run before the cast | 14:42 |
mriedem | edleafe: you can't just move it, | 14:42 |
mriedem | it's there for reschedules and any other kind of failure | 14:43 |
mriedem | edleafe: i think this: | 14:43 |
mriedem | fails = (build_results.FAILED, | 14:43 |
mriedem | build_results.RESCHEDULED) | 14:43 |
edleafe | mriedem: all of the other cleanups are in _do_build_and_run_instance() | 14:43 |
mriedem | becomes just build_results.FAILED | 14:43 |
mriedem | but if we change that then self._build_failed() won't get called... | 14:44 |
bauzas | mriedem: looking at http://www.voidspace.org.uk/python/mock/magicmock.html#mock.NonCallableMagicMock | 14:45 |
*** andreas_s has quit IRC | 14:45 | |
bauzas | mriedem: it means that if we call it, we would get an exception? | 14:45 |
mriedem | bauzas: yes | 14:45 |
bauzas | interesting | 14:45 |
bauzas | I didn't know that | 14:45 |
mriedem | edleafe: so if you're going to leave the cleanup in the compute, then i think we can only call https://review.openstack.org/#/c/511358/43/nova/compute/manager.py@1778 if result == build_results.FAILED in that block | 14:45 |
*** andreas_s has joined #openstack-nova | 14:45 | |
mriedem | because we still need to call self._build_failed() | 14:45 |
mriedem | and then *add* rt.reportclient.delete_allocation_for_instance(instance.uuid) right before we cast to build_instances | 14:46 |
mriedem | yeah? | 14:46 |
edleafe | mriedem: I can split the code running under that conditional so that the deallocation only runs for FAILED, but the rest runs for both | 14:46 |
*** burt has joined #openstack-nova | 14:46 | |
bauzas | mriedem: any reason why you're not just using http://www.voidspace.org.uk/python/mock/mock.html#mock.Mock.called ? | 14:46 |
mriedem | bauzas: one less thing to do | 14:46 |
edleafe | yeah, that's where I was going to move it to. I'll just copy the call. | 14:47 |
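The ordering problem described above (compute casts to the conductor and only afterwards deletes the instance's allocation) can be illustrated with a toy model. This is a sketch under stated assumptions: `FakePlacement`, `reschedule_buggy` and `reschedule_fixed` are hypothetical stand-ins, not nova's real classes, and the "cast" is modelled as the conductor claiming immediately, which is the worst-case timing of the race.

```python
# Toy model only: every name below is hypothetical, not nova's real API.

class FakePlacement:
    """Tracks which host holds the allocation for each instance."""

    def __init__(self):
        self.allocations = {}

    def claim(self, instance_uuid, host):
        self.allocations[instance_uuid] = host

    def delete_allocation_for_instance(self, instance_uuid):
        self.allocations.pop(instance_uuid, None)


def reschedule_buggy(placement, instance_uuid, alternate):
    # Compute hands off first; in the worst-case timing the conductor
    # claims on the alternate host before compute has cleaned up.
    placement.claim(instance_uuid, alternate)
    # Compute's late cleanup then wipes the claim made for the alternate.
    placement.delete_allocation_for_instance(instance_uuid)


def reschedule_fixed(placement, instance_uuid, alternate):
    # Remove the failed host's allocation before handing off, so the
    # conductor's claim on the alternate is the last write.
    placement.delete_allocation_for_instance(instance_uuid)
    placement.claim(instance_uuid, alternate)


buggy = FakePlacement()
buggy.claim("inst-1", "host-a")
reschedule_buggy(buggy, "inst-1", "host-b")

fixed = FakePlacement()
fixed.claim("inst-1", "host-a")
reschedule_fixed(fixed, "inst-1", "host-b")

print(buggy.allocations)
print(fixed.allocations)
```

In the buggy ordering the instance ends up with no allocation at all; in the fixed ordering it holds an allocation on the alternate host. The real change also has to keep calling `self._build_failed()` for both FAILED and RESCHEDULED results, which this toy does not model.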
*** gouthamr has joined #openstack-nova | 14:47 | |
mriedem | bauzas: NonCallableMock just does the thing i already want | 14:47 |
bauzas | I see | 14:47 |
bauzas | anyway, I don't want to discuss the pattern | 14:47 |
*** felipemonteiro has joined #openstack-nova | 14:47 | |
*** cleong has joined #openstack-nova | 14:47 | |
bauzas | my point is just that when reviewing the change, we need to understand that noncallablemock already supports that | 14:47 |
*** andreas_s has quit IRC | 14:48 | |
bauzas | without needing to verify the call count | 14:48 |
*** andreas_s has joined #openstack-nova | 14:48 | |
bauzas | less explicit, but interesting tho | 14:48 |
mriedem | we = you? | 14:48 |
mriedem | now you know :) | 14:48 |
mriedem | i expect to see it in all of your new tests now | 14:48 |
bauzas | heh | 14:49 |
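The two test styles discussed above can be compared in a short sketch. It uses the standard library's `unittest.mock` (the same API as the `mock` package linked in the log); the `notify` helper is a made-up function for illustration, not code from the patch under review.

```python
from unittest import mock


def notify(callback, should_call):
    """Hypothetical code under test: invokes the callback only when asked."""
    if should_call:
        callback()


# Style 1: NonCallableMock. Nothing needs asserting afterwards, because
# any attempt to call the mock raises TypeError on the spot.
never_called = mock.NonCallableMock()
notify(never_called, should_call=False)

try:
    notify(mock.NonCallableMock(), should_call=True)
except TypeError:
    raised = True
else:
    raised = False

# Style 2: a regular Mock plus an explicit check of the 'called' attribute.
explicit = mock.Mock()
notify(explicit, should_call=False)
was_called = explicit.called

print(raised, was_called)
```

The first style is terser but, as noted in the discussion, less explicit: a reviewer has to know that the mock itself enforces "must not be called".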
openstackgerrit | Jackie Truong proposed openstack/python-novaclient master: Microversion 2.59 - Add trusted_image_certificates https://review.openstack.org/500396 | 14:50 |
*** jmlowe has joined #openstack-nova | 14:51 | |
*** gszasz has quit IRC | 14:57 | |
*** smatzek has joined #openstack-nova | 14:58 | |
*** pchavva has quit IRC | 14:59 | |
openstackgerrit | Merged openstack/nova stable/newton: Make request_spec.spec MediumText https://review.openstack.org/528338 | 15:00 |
*** yangyapeng has joined #openstack-nova | 15:01 | |
mriedem | huh https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/newton | 15:02 |
mriedem | to eol or not to eol | 15:02 |
lyarwood | \o/ | 15:02 |
*** huanxie has quit IRC | 15:02 | |
mriedem | i worry about not having https://review.openstack.org/#/c/528835/ in newton | 15:02 |
*** pchavva has joined #openstack-nova | 15:03 | |
*** andreas_s has quit IRC | 15:03 | |
mriedem | but i also don't know how many people running newton are going to have conductor split out yet and not have the cell conductor configured to hit the api db | 15:03 |
mriedem | as bauzas noted, probably not a real worry | 15:03 |
*** andreas_s has joined #openstack-nova | 15:03 | |
*** salv-orlando has quit IRC | 15:04 | |
*** salv-orlando has joined #openstack-nova | 15:04 | |
mriedem | unrelated, i also started thinking about https://review.openstack.org/#/q/topic:fix-bfv-boot-resources+(status:open+OR+status:merged) again... | 15:06 |
mriedem | and whether or not we should just take on the debt since shared provider modeling is who knows how far off yet | 15:06 |
*** huanxie has joined #openstack-nova | 15:08 | |
*** salv-orlando has quit IRC | 15:09 | |
* cdent feels shame | 15:09 | |
*** mingyu has quit IRC | 15:10 | |
*** marst has joined #openstack-nova | 15:10 | |
*** andreas_s has quit IRC | 15:12 | |
*** gszasz has joined #openstack-nova | 15:13 | |
mriedem | no shame intended | 15:13 |
*** andreas_s has joined #openstack-nova | 15:13 | |
mriedem | it's that we put that off for a few releases because we were saying placement would fix the problem, and we haven't yet, and people (ops) ask for it at least once per cycle | 15:13 |
jaypipes | cdent shaming is indeed the best kind of shaming. second only to pug shaming. | 15:14 |
openstackgerrit | Merged openstack/nova master: [placement] Add x-openstack-request-id in API ref https://review.openstack.org/523007 | 15:14 |
cdent | mriedem: don't worry, I'll feel shame, even for things entirely outside my control and/or the result of perfectly reasonable decision making processes | 15:15 |
cdent | I may be part pug | 15:15 |
*** salv-orlando has joined #openstack-nova | 15:15 | |
*** aarefiev has quit IRC | 15:20 | |
*** chyka has joined #openstack-nova | 15:20 | |
*** sahid has quit IRC | 15:20 | |
*** armax has joined #openstack-nova | 15:20 | |
*** andreas_s has quit IRC | 15:23 | |
*** karthiks has quit IRC | 15:25 | |
*** chyka has quit IRC | 15:26 | |
*** chyka has joined #openstack-nova | 15:26 | |
*** andreas_s has joined #openstack-nova | 15:27 | |
maciejjozefczyk | jaypipes: Hey :) I responded to your comment https://review.openstack.org/#/c/520024/ Could you please check it? Could we discuss it once you've checked it? Maybe at Thursday's meeting? Thanks :) | 15:29 |
*** felipemonteiro has quit IRC | 15:29 | |
openstackgerrit | Jackie Truong proposed openstack/python-novaclient master: Microversion 2.59 - Add trusted_image_certificates https://review.openstack.org/500396 | 15:30 |
*** felipemonteiro has joined #openstack-nova | 15:30 | |
*** chyka has quit IRC | 15:31 | |
*** awaugama has joined #openstack-nova | 15:34 | |
*** andreas_s has quit IRC | 15:34 | |
*** andreas_s has joined #openstack-nova | 15:34 | |
*** slunkad_ has quit IRC | 15:36 | |
*** liverpooler has quit IRC | 15:38 | |
jaypipes | maciejjozefczyk: I should be able to get to that patch today, yes. | 15:38 |
*** eharney has joined #openstack-nova | 15:38 | |
*** huanxie has quit IRC | 15:38 | |
*** slunkad has joined #openstack-nova | 15:39 | |
maciejjozefczyk | jaypipes: thanks a lot :) | 15:39 |
*** gszasz has quit IRC | 15:39 | |
*** yikun_jiang has joined #openstack-nova | 15:41 | |
*** slunkad has quit IRC | 15:43 | |
*** yikun has quit IRC | 15:44 | |
*** huanxie has joined #openstack-nova | 15:44 | |
*** liusheng has quit IRC | 15:45 | |
*** liusheng has joined #openstack-nova | 15:45 | |
*** elod has quit IRC | 15:47 | |
*** liverpooler has joined #openstack-nova | 15:48 | |
*** gszasz has joined #openstack-nova | 15:51 | |
*** edmondsw has joined #openstack-nova | 15:52 | |
*** salv-orlando has quit IRC | 15:52 | |
*** salv-orlando has joined #openstack-nova | 15:53 | |
*** nore_rabel has quit IRC | 15:54 | |
*** edmondsw has quit IRC | 15:56 | |
*** salv-orlando has quit IRC | 15:57 | |
*** slunkad has joined #openstack-nova | 15:59 | |
mriedem | lyarwood: artom: bauzas: did we or did we not say that we needed a minor version bump on stable for the release with the schema migration? | 16:07 |
lyarwood | mriedem: we don't \need\ it for anything but I think we agreed it would be nice to have a minor version bump for this, yes. | 16:07 |
*** josecastroleon has joined #openstack-nova | 16:10 | |
mriedem | ok here is ocata https://review.openstack.org/529100 | 16:11 |
*** jmlowe has quit IRC | 16:13 | |
mriedem | and newton: https://review.openstack.org/529102 | 16:14 |
*** huanxie has quit IRC | 16:14 | |
*** damien_r has joined #openstack-nova | 16:20 | |
*** huanxie has joined #openstack-nova | 16:20 | |
*** brault has quit IRC | 16:20 | |
*** brault has joined #openstack-nova | 16:21 | |
*** salv-orlando has joined #openstack-nova | 16:25 | |
*** brault_ has joined #openstack-nova | 16:25 | |
*** brault has quit IRC | 16:26 | |
*** andreas_s has quit IRC | 16:26 | |
*** andreas_s has joined #openstack-nova | 16:26 | |
openstackgerrit | Merged openstack/nova master: Updated from global requirements https://review.openstack.org/528881 | 16:27 |
*** janki has quit IRC | 16:31 | |
mriedem | jaypipes: on maciejjozefczyk's patch, i'm assuming the shutdown instances thing is a problem because of _update_usage_from_instance which is called between the initial compute node update and the final one, | 16:32 |
mriedem | and _update_usage_from_instance calls self.stats.update_stats_for_instance(instance, is_removed_instance) | 16:32 |
mriedem | which looks at things like vm_sate | 16:32 |
mriedem | *state | 16:32 |
jaypipes | yeah | 16:32 |
mriedem | and calls _update_usage | 16:33 |
mriedem | i'm not sure wth cn.current_workload = self.stats.calculate_workload() is for | 16:33 |
mriedem | no filters use that, it's just for reporting out of the API i guess | 16:34 |
*** moshele has joined #openstack-nova | 16:34 | |
jaypipes | mriedem: switched my vote on it. | 16:34 |
*** r-daneel has quit IRC | 16:38 | |
openstackgerrit | Merged openstack/python-novaclient master: Updated from global requirements https://review.openstack.org/528911 | 16:39 |
mriedem | jaypipes: i think he still has changes to make | 16:39 |
mriedem | per my earlier review | 16:39 |
mriedem | in _check_for_nodes_rebalance | 16:39 |
*** moshele has quit IRC | 16:39 | |
*** andreas_s has quit IRC | 16:40 | |
*** gyee has joined #openstack-nova | 16:42 | |
*** penick has joined #openstack-nova | 16:42 | |
jaypipes | mriedem: sure, though that's only going to be valid for baremetal nodes... | 16:46 |
jaypipes | mriedem: not sure there's much of a race interval for that... but maybe | 16:47 |
*** huanxie has quit IRC | 16:50 | |
*** andreas_s has joined #openstack-nova | 16:53 | |
*** huanxie has joined #openstack-nova | 16:56 | |
*** lucasagomes is now known as lucas-afk | 16:56 | |
*** sridharg has quit IRC | 16:56 | |
*** AlexeyAbashkin has quit IRC | 17:01 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: console: introduce framework for RFB authentication https://review.openstack.org/345397 | 17:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: console: introduce the VeNCrypt RFB authentication scheme https://review.openstack.org/345398 | 17:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: console: Provide an RFB security proxy implementation https://review.openstack.org/345399 | 17:01 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Document TLS security setup for noVNC proxy https://review.openstack.org/500544 | 17:01 |
*** salv-orlando has quit IRC | 17:01 | |
*** salv-orl_ has joined #openstack-nova | 17:01 | |
cdent | jaypipes: speaking of cdent shaming, I was hoping you were going to shame me for my infinite resource classes crack in the latest placement update | 17:02 |
*** andreas_s has quit IRC | 17:03 | |
jaypipes | cdent: haven't gotten that far yet. | 17:04 |
jaypipes | cdent: still trying to wrestle with friggin server groups. | 17:05 |
*** r-daneel has joined #openstack-nova | 17:07 | |
*** imacdonn has quit IRC | 17:11 | |
*** imacdonn has joined #openstack-nova | 17:11 | |
*** ludovic_ has joined #openstack-nova | 17:13 | |
ludovic_ | Hello everyone | 17:13 |
stephenfin | ludovic_: o/ | 17:13 |
openstackgerrit | Merged openstack/os-vif master: Check if interface belongs to a Linux Bridge before removing https://review.openstack.org/526079 | 17:14 |
mnaser | Spec-ing out new compute and in the interest of deployers and doing things in open we’re planning to publish the document ... I wanted to gather some feedback on what sort of cpu overcommit you’ve run/seen people run? | 17:14 |
ludovic_ | Maybe someone can help me to understand the Filter Scheduler ? I have found a strange scheduler behaviour while host-evacuate | 17:14 |
ludovic_ | I have two compute Nodes. The first compute Node with 31 instances ( total RAM allocated = 177152 M) , the second one with 2 instances ( Total RAM allocated = 24576 ) | 17:18 |
ludovic_ | RAM of computes Nodes is 196483 (memory_mb) | 17:19 |
ludovic_ | Host-evacuate worked but 5 VMs were in ERROR with insufficient memory (nova-compute log) and one was in NO STATE (No Host found by RAM Filter) | 17:20 |
ludovic_ | I expected the same ERROR on these 6 instances | 17:21 |
ludovic_ | I used tripleO to deploy my OpenStack environment | 17:22 |
ludovic_ | with Ocata repository | 17:22 |
ludovic_ | nova.conf : enabled_filters=RetryFilter,AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter | 17:24 |
*** mvk has quit IRC | 17:25 | |
ludovic_ | Maybe someone can explain this situation ? Thanks a lot | 17:25 |
ludovic_ | nova specialist ? | 17:26 |
*** andreas_s has joined #openstack-nova | 17:26 | |
*** huanxie has quit IRC | 17:26 | |
*** shaner has quit IRC | 17:27 | |
ludovic_ | has anyone already written a scheduler filter to prioritize rebuild of instances ? | 17:29 |
*** ralonsoh has quit IRC | 17:30 | |
*** huanxie has joined #openstack-nova | 17:32 | |
*** brault_ has quit IRC | 17:38 | |
*** brault has joined #openstack-nova | 17:38 | |
*** edmondsw has joined #openstack-nova | 17:40 | |
*** brault has quit IRC | 17:43 | |
*** andreas_s has quit IRC | 17:43 | |
*** Apoorva has joined #openstack-nova | 17:43 | |
*** Apoorva has quit IRC | 17:43 | |
*** Apoorva has joined #openstack-nova | 17:44 | |
jaypipes | mriedem: http://paste.openstack.org/show/629349/ .. is there some other place other than nova/api/openstack/api_version_request.py that I need to bump a max microversion? | 17:44 |
*** edmondsw has quit IRC | 17:44 | |
* jaypipes only used to the placement microversion dance, not the main nova api one.. | 17:45 | |
*** derekh has quit IRC | 17:47 | |
*** andreas_s has joined #openstack-nova | 17:48 | |
*** diga has quit IRC | 17:48 | |
*** catintheroof has quit IRC | 17:50 | |
*** shaner has joined #openstack-nova | 17:50 | |
cfriesen | question about the DB interfaces...why does some code go through nova.db.api and other code directly uses nova.db.sqlalchemy.api ? | 17:50 |
*** josecastroleon has quit IRC | 17:51 | |
*** catintheroof has joined #openstack-nova | 17:51 | |
*** mdnadeem has quit IRC | 17:52 | |
*** damien_r has quit IRC | 17:52 | |
*** alee is now known as alee_lunch | 17:54 | |
mriedem | jaypipes: yeah the version samples | 17:57 |
mriedem | jaypipes: https://github.com/openstack/nova/tree/master/doc/api_samples/versions | 17:58 |
jaypipes | mriedem: I looked there but all I see is an interpolation marker for max_api_version | 17:58 |
jaypipes | mriedem: gah, never mind. | 17:59 |
jaypipes | mriedem: sigh... | 17:59 |
jaypipes | mriedem: was looking in nova/tests/functional/api_samples/ | 17:59 |
jaypipes | mriedem: have I mentioned I hate these? :) | 17:59 |
*** r-daneel_ has joined #openstack-nova | 18:01 | |
*** r-daneel has quit IRC | 18:01 | |
*** r-daneel_ is now known as r-daneel | 18:01 | |
*** huanxie has quit IRC | 18:02 | |
*** andreas_s has quit IRC | 18:02 | |
*** sambetts is now known as sambetts|afk | 18:03 | |
cfriesen | is all quota information now going into the API DB? or will we still put some in the main DB? | 18:04 |
openstackgerrit | Merged openstack/nova master: Fix 4 doc typos https://review.openstack.org/529084 | 18:05 |
jaypipes | melwitt: see cfriesen's ? above... | 18:06 |
*** catintheroof has quit IRC | 18:07 | |
*** catintheroof has joined #openstack-nova | 18:08 | |
*** huanxie has joined #openstack-nova | 18:08 | |
*** harlowja has joined #openstack-nova | 18:15 | |
*** chyka has joined #openstack-nova | 18:18 | |
*** corey_ has joined #openstack-nova | 18:21 | |
*** cleong has quit IRC | 18:22 | |
*** AlexeyAbashkin has joined #openstack-nova | 18:22 | |
*** links has quit IRC | 18:23 | |
*** corey_ is now known as cleong | 18:24 | |
*** AlexeyAbashkin has quit IRC | 18:27 | |
*** jpena is now known as jpena|off | 18:33 | |
*** cfriesen has quit IRC | 18:33 | |
*** cfriesen has joined #openstack-nova | 18:33 | |
*** andreas_s has joined #openstack-nova | 18:39 | |
*** gszasz has quit IRC | 18:39 | |
*** huanxie has quit IRC | 18:40 | |
openstackgerrit | Merged openstack/python-novaclient master: CommandError is raised for invalid server fields https://review.openstack.org/525110 | 18:40 |
*** burt has quit IRC | 18:40 | |
*** burt has joined #openstack-nova | 18:43 | |
*** andreas_s has quit IRC | 18:43 | |
*** huanxie has joined #openstack-nova | 18:44 | |
*** chyka has quit IRC | 18:45 | |
*** AlexeyAbashkin has joined #openstack-nova | 18:46 | |
*** cleong has quit IRC | 18:47 | |
*** chyka_ has joined #openstack-nova | 18:49 | |
*** chyka_ has quit IRC | 18:49 | |
*** cleong has joined #openstack-nova | 18:49 | |
*** chyka_ has joined #openstack-nova | 18:50 | |
mriedem | cfriesen: api db | 18:51 |
mriedem | cfriesen: starting in pike we don't use the usages or reservations tables anymore | 18:51 |
*** brault has joined #openstack-nova | 18:52 | |
*** brault has quit IRC | 18:53 | |
*** brault has joined #openstack-nova | 18:54 | |
*** chyka_ has quit IRC | 18:54 | |
*** dtantsur is now known as dtantsur|afk | 18:56 | |
*** catintheroof has quit IRC | 18:56 | |
*** cdent has quit IRC | 18:56 | |
*** r-daneel has quit IRC | 18:57 | |
*** AlexeyAbashkin has quit IRC | 18:58 | |
*** elod has joined #openstack-nova | 18:59 | |
openstackgerrit | Jay Pipes proposed openstack/nova-specs master: Support aggregate affinity scheduler filters https://review.openstack.org/529135 | 19:01 |
*** andreas_s has joined #openstack-nova | 19:01 | |
*** sbezverk has joined #openstack-nova | 19:02 | |
*** r-daneel has joined #openstack-nova | 19:03 | |
*** alee_lunch is now known as alee | 19:04 | |
*** chyka has joined #openstack-nova | 19:12 | |
*** huanxie has quit IRC | 19:14 | |
*** andreas_s has quit IRC | 19:15 | |
*** chyka has quit IRC | 19:17 | |
*** huanxie has joined #openstack-nova | 19:20 | |
*** kumarmn has quit IRC | 19:21 | |
*** ludovic_ has quit IRC | 19:27 | |
*** edmondsw has joined #openstack-nova | 19:28 | |
*** catintheroof has joined #openstack-nova | 19:29 | |
*** awaugama has quit IRC | 19:29 | |
openstackgerrit | Jay Pipes proposed openstack/nova-specs master: Support aggregate affinity scheduler filters https://review.openstack.org/529135 | 19:31 |
*** damien_r has joined #openstack-nova | 19:32 | |
*** edmondsw has quit IRC | 19:32 | |
*** edmondsw has joined #openstack-nova | 19:37 | |
*** openstack has joined #openstack-nova | 19:43 | |
*** ChanServ sets mode: +o openstack | 19:43 | |
*** edmondsw has quit IRC | 19:44 | |
*** gouthamr has quit IRC | 19:45 | |
*** nore_rabel has joined #openstack-nova | 19:45 | |
*** yamahata has joined #openstack-nova | 19:47 | |
*** claudiub|2 has joined #openstack-nova | 19:48 | |
*** nore__ has joined #openstack-nova | 19:49 | |
*** claudiub has quit IRC | 19:50 | |
*** huanxie has quit IRC | 19:50 | |
*** itlinux has quit IRC | 19:52 | |
*** itlinux has joined #openstack-nova | 19:52 | |
*** fragatina has quit IRC | 19:56 | |
*** huanxie has joined #openstack-nova | 19:56 | |
*** damien_r has quit IRC | 19:59 | |
*** gouthamr has joined #openstack-nova | 20:00 | |
*** gouthamr has quit IRC | 20:01 | |
*** nore_rabel has quit IRC | 20:04 | |
*** nore__ has quit IRC | 20:05 | |
rybridges | So I am seeing errors every time i make a snapshot | 20:11 |
*** thingee has left #openstack-nova | 20:11 | |
rybridges | regardless of whether i use disable_libvirt_livesnapshot = true or disable_libvirt_livesnapshot = false | 20:11 |
rybridges | the error is the same every time | 20:11 |
rybridges | it happens on the hypervisor | 20:11 |
rybridges | this is nova-compute.log https://pastebin.com/WiqYtZzF | 20:12 |
rybridges | this is the error for libvirt log: https://pastebin.com/4NeSuNvX | 20:13 |
rybridges | its impossible to take a snapshot on ocata from the horizon ui | 20:14 |
rybridges | as far as we can tell | 20:14 |
*** mvk has joined #openstack-nova | 20:16 | |
mriedem | https://www.jrssite.com/wordpress/?p=302 | 20:18 |
mriedem | https://bugs.launchpad.net/nova/+bug/1381153 | 20:18 |
openstack | Launchpad bug 1381153 in OpenStack Compute (nova) "Cannot create instance live snapshots in Centos7 (icehouse)" [Undecided,Invalid] | 20:18 |
*** claudiub|2 has quit IRC | 20:23 | |
*** huanxie has quit IRC | 20:23 | |
*** huanxie has joined #openstack-nova | 20:23 | |
*** huanxie has quit IRC | 20:26 | |
*** jmlowe has joined #openstack-nova | 20:28 | |
cfriesen | mriedem: thanks for the quota answer earlier...in objects.Quotas we're still looking at both the api DB and the main DB. is that left over from Pike? Presumably now we could remove the main DB access? | 20:28 |
mriedem | cfriesen: that's for the online data migration | 20:30 |
alee | hi - does anyone know how to force nova to re-fetch images from glance? | 20:30 |
mriedem | so we can't remove that until people have run through the online data migrations, | 20:30 |
cfriesen | mriedem: that should have happened in Pike though, right? so we could remove it in Q? | 20:30 |
mriedem | and we have a blocker schema migration in place to enforce that there are no quota limits/classes in the main cell db | 20:30 |
mriedem | cfriesen: you'd have to add a blocker migration | 20:30 |
cfriesen | mriedem: okay | 20:31 |
*** nore_rabel has joined #openstack-nova | 20:32 | |
*** nore__ has joined #openstack-nova | 20:32 | |
*** huanxie has joined #openstack-nova | 20:32 | |
*** AlexeyAbashkin has joined #openstack-nova | 20:37 | |
*** kumarmn has joined #openstack-nova | 20:38 | |
*** kumarmn has quit IRC | 20:41 | |
*** kumarmn has joined #openstack-nova | 20:41 | |
*** AlexeyAbashkin has quit IRC | 20:41 | |
*** nore_rabel has quit IRC | 20:46 | |
*** nore__ has quit IRC | 20:46 | |
*** nore_rabel has joined #openstack-nova | 20:46 | |
*** Apoorva has quit IRC | 20:51 | |
*** penick has quit IRC | 20:53 | |
*** smatzek has quit IRC | 20:54 | |
*** ludovic has joined #openstack-nova | 21:02 | |
*** chyka has joined #openstack-nova | 21:02 | |
*** catintheroof has quit IRC | 21:02 | |
*** huanxie has quit IRC | 21:02 | |
*** catintheroof has joined #openstack-nova | 21:03 | |
openstackgerrit | Ed Leafe proposed openstack/nova master: Make conductor pass and use host_lists https://review.openstack.org/511358 | 21:04 |
openstackgerrit | Ed Leafe proposed openstack/nova master: Change compute RPC to use alternates for resize https://review.openstack.org/526436 | 21:04 |
edleafe | mriedem: ^^ should address the functional test failures | 21:04 |
*** MikeG451 has quit IRC | 21:05 | |
ludovic | Hi, I'd like to ask a few questions to nova experts: are there any tips to prioritize the evacuation order for instances | 21:06 |
*** chyka has quit IRC | 21:06 | |
openstackgerrit | Merged openstack/nova master: Some nit fix in multi_cell_list https://review.openstack.org/527597 | 21:06 |
openstackgerrit | Merged openstack/nova master: doc: add note about fixing admin-only APIs without a microversion https://review.openstack.org/527421 | 21:07 |
ludovic | or is it possible to reserve spare nodes to ensure host-evacuation | 21:07 |
*** catintheroof has quit IRC | 21:07 | |
mriedem | ludovic: no and no, | 21:08 |
mriedem | you can specify a target host during evacuate, but that doesn't reserve it | 21:08 |
*** huanxie has joined #openstack-nova | 21:08 | |
mriedem | and you evacuate one instance at a time, | 21:08 |
mriedem | so if there is priority, the caller handles that | 21:08 |
*** cleong has quit IRC | 21:10 | |
ludovic | ok, so we can't ensure a good SLA | 21:10 |
ludovic | if we can't reserve resources to securely evacuate all workload | 21:11 |
ludovic | the SLA is impacted | 21:11 |
ludovic | mriedem: does the scheduler offer a trick to overcome this? | 21:15 |
*** edmondsw has joined #openstack-nova | 21:17 | |
mriedem | ludovic: how is this any different than the scheduler correctly picking a host during the initial server create? | 21:19 |
mriedem | besides reschedules | 21:19 |
ludovic | excuse me, maybe we can have a private discussion about the Filter scheduler VS host-evacuate process with HA compute instances ? | 21:20 |
*** Apoorva has joined #openstack-nova | 21:20 | |
mriedem | why private? | 21:20 |
*** penick has joined #openstack-nova | 21:20 | |
ludovic | in order to not disturb the room | 21:21 |
ludovic | np | 21:21 |
ludovic | I'm testing the HA compute instances with tripleo deployment | 21:21 |
ludovic | so the nova-evacuate pacemaker process is working well | 21:22 |
ludovic | but i was surprised by the Ram Filter scheduler | 21:22 |
*** edmondsw has quit IRC | 21:22 | |
ludovic | to explain the case , I have two compute Nodes. The first compute Node with 31 instances ( total RAM allocated = 177152 M) , the second one with 2 instances ( Total RAM allocated = 24576 ) | 21:24 |
ludovic | RAM of computes Nodes is 196483 (memory_mb) | 21:24 |
ludovic | The Host-evacuate worked but 5 VMs were in ERROR with insufficient memory (nova-compute log) and one was in NO STATE (No Host found by RAM Filter) | 21:26 |
ludovic | I expected the same ERROR on these 6 instances | 21:26 |
ludovic | HOST NOT FOUND or ERROR with insufficient memory | 21:27 |
mriedem | so you're evacuating the 31 instances on the one compute node to the other compute node with 2 instances? | 21:27 |
ludovic | So i don't understand the result | 21:27 |
ludovic | yes | 21:28 |
mriedem | and 5 of 31 fail | 21:28 |
ludovic | 6 | 21:28 |
mriedem | which release? | 21:28 |
ludovic | 5 with insufficient memory and 1 with no valid host found | 21:28 |
mriedem | if you're in pike+ and using the FilterScheduler, you should remove the RamFilter | 21:28 |
ludovic | ocata | 21:28 |
mriedem | so the problem is, | 21:29 |
mriedem | the scheduler has a point in time snapshot of the resources from the compute, per request, | 21:29 |
ludovic | i use ocata for the moment | 21:29 |
mriedem | and the ram usage doesn't change until one of the evacuations makes it to the compute and claims those resources, and updates the compute node record in the db, which the scheduler will read in the next scheduling attempt | 21:29 |
mriedem | so the problem is if you're sending all 31 evacuate requests in a for loop, for example, with no time in between for the scheduler to catch up to the changes in the computes, | 21:30 |
mriedem | the scheduler thinks the compute is fine and sends the instance there, but the claim on the compute might fail because another request claimed those resources in the meantime | 21:30 |
ludovic | yes exactly | 21:31 |
mriedem | this should be fixed in pike, | 21:31 |
mriedem | because in pike, the RamFilter can be removed and the FilterScheduler uses the Placement service to claim the resources during scheduling | 21:31 |
mriedem | any claim collisions on the compute node in pike due to concurrent requests will be retried up to 3 times | 21:32 |
mriedem | otherwise what you're hitting is latent behavior | 21:32 |
ludovic | ah ok so with ocata the process is not efficient enough ? | 21:32 |
mriedem | correct; the late claim on the compute has always been a known issue with scheduling | 21:32 |
mriedem | with server create, if you hit this, we would reschedule to another compute | 21:33 |
mriedem | we don't do reschedules with evacuate though | 21:33 |
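
The race described above can be sketched with a toy model. This is not nova code: the function names and the numbers are hypothetical, and the model is deliberately simplified (a single target host, and a scheduler snapshot that is never refreshed during the burst). It only shows why a burst of requests can pass the scheduler's RAM check and then fail the late claim on the compute, and how claiming during scheduling turns those failures into scheduler rejections instead.

```python
def schedule_with_stale_snapshot(free_mb, requests_mb):
    """Late-claim model: every request in the burst is filtered against
    the same point-in-time snapshot; the claim happens afterwards on the
    compute host."""
    snapshot = free_mb  # scheduler's view, not updated during the burst
    passed_filter = [r for r in requests_mb if r <= snapshot]
    rejected_by_scheduler = [r for r in requests_mb if r > snapshot]

    remaining = free_mb  # what the compute host really has left
    claimed, failed_claim = [], []
    for r in passed_filter:
        if r <= remaining:
            remaining -= r
            claimed.append(r)
        else:
            failed_claim.append(r)  # "insufficient memory" on the compute
    return claimed, failed_claim, rejected_by_scheduler


def schedule_with_claim_at_scheduling(free_mb, requests_mb):
    """Claim-during-scheduling model: each decision already sees the
    resources consumed by earlier requests."""
    remaining = free_mb
    claimed, rejected_by_scheduler = [], []
    for r in requests_mb:
        if r <= remaining:
            remaining -= r
            claimed.append(r)
        else:
            rejected_by_scheduler.append(r)  # "no valid host" up front
    return claimed, rejected_by_scheduler
```

With 10240 MB free and four 4096 MB requests, the first model lets all four through the filter and two then fail their claim on the compute; the second model rejects the last two at scheduling time.
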
ludovic | But is it possible to influence the evacuate with max_concurrent_build ? | 21:33 |
ludovic | in nova.conf ? | 21:33 |
ludovic | by default this parameter is 10 , if we reduce to one ? | 21:34 |
mriedem | no max_concurrent_build is for server create, not evacuate | 21:36 |
ludovic | ah ok i understand . | 21:36 |
mriedem | there is no option like that for limiting evacuates | 21:36 |
mriedem | that's not to say one couldn't be added, but adding that in queens or pike doesn't make sense when we've solved this part of the problem in the scheduler | 21:37 |
mriedem | still, it seems reasonable to allow limiting the number of concurrent evacuates on a given compute, since we do that for spawn and live migrate | 21:38 |
mriedem | i wouldn't be opposed to adding something like that | 21:39 |
*** huanxie has quit IRC | 21:40 | |
ludovic | ok thank you for your explanations, it's precious for me because I'm testing openstack for a big French Company you know | 21:40 |
ludovic | And the goal is to see if we can in the future replace the massive usage of VMware with OpenStack you know | 21:41 |
mriedem | oolala | 21:41 |
ludovic | near future | 21:42 |
*** pchavva has quit IRC | 21:42 | |
mriedem | ok. would be cool if you could test this out on a pike deployment. | 21:42 |
mriedem | to make sure the filter scheduler + placement is correctly handling this for you | 21:42 |
ludovic | That's why i asked if reservation and /or prioritizing exist under OpenStack | 21:42 |
mriedem | remember to remove the RamFilter in pike if you're using the FilterScheduler since (1) it's redundant and (2) it will remove the memory_mb claim in the compute | 21:42 |
ludovic | ok | 21:43 |
*** huanxie has joined #openstack-nova | 21:44 | |
ludovic | don't you think it will be interesting to add the possibility to have spare Compute Nodes ? | 21:44 |
ludovic | with aggregate host spare for example | 21:45 |
mriedem | you mean build something into nova to mark specific computes as only used for evacuate? | 21:45 |
ludovic | or in an aggregate, to allow prioritizing important workloads when evacuating | 21:45 |
ludovic | yes for example | 21:45 |
mriedem | yeah idk, maybe. i wouldn't want to change the evacuate api to pass through scheduler hints probably. | 21:46 |
ludovic | That would ensure a very good SLA because the evacuate process would be secured | 21:46 |
*** priteau_ has joined #openstack-nova | 21:46 | |
mriedem | i don't know how many deployments just have compute nodes lying around as spares for evacuate | 21:46 |
mriedem | you can also control which host is used client-side | 21:47 |
mriedem | as noted | 21:47 |
mriedem | so as a client, if you have a special "evacuate" host aggregate, you could round robin through those hosts and send it with the evacuate request | 21:48 |
mriedem | but, as your evacuate aggregate starts to fill it, it is no longer spare capacity | 21:48 |
mriedem | so... | 21:48 |
mriedem | *fill up | 21:48 |
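
The client-side approach suggested above (the caller decides the order and round-robins over a dedicated set of hosts) might look roughly like the sketch below. It only plans the (instance, target host) pairs and makes no nova API calls; `plan_evacuations`, the priority values, and the host names are all hypothetical. Each pair would then be sent as a separate evacuate request with the target host specified. As pointed out in the discussion, nothing here reserves capacity, so the spare hosts can still fill up.

```python
from itertools import cycle


def plan_evacuations(instances, spare_hosts):
    """Return (instance_name, target_host) pairs.

    instances: list of (name, priority) tuples; lower priority value
    means "evacuate first". Ordering is entirely the caller's policy.
    spare_hosts: hosts in the caller's hypothetical "evacuate" aggregate.
    """
    if not spare_hosts:
        raise ValueError("no spare hosts to evacuate to")
    targets = cycle(spare_hosts)  # simple round robin over the hosts
    ordered = sorted(instances, key=lambda item: item[1])
    return [(name, next(targets)) for name, _priority in ordered]


plan = plan_evacuations(
    [("batch", 2), ("db", 0), ("web", 1)],
    ["spare-1", "spare-2"],
)
```

Here `plan` orders the instances as db, web, batch and alternates them between spare-1 and spare-2.
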
openstackgerrit | Ed Leafe proposed openstack/nova master: Make conductor pass and use host_lists https://review.openstack.org/511358 | 21:49 |
openstackgerrit | Ed Leafe proposed openstack/nova master: Change compute RPC to use alternates for resize https://review.openstack.org/526436 | 21:49 |
edleafe | ^^ fixed pep8 booboo | 21:50 |
*** priteau has quit IRC | 21:50 | |
ludovic | it seems to not be easy to design ... | 21:50 |
ludovic | evacuate is temporary | 21:50 |
ludovic | until the source node is repaired | 21:51 |
ludovic | so the goal is to failback and so free the evacuate aggregate | 21:51 |
mriedem | ludovic: sure, that's why it's not something built natively into nova | 21:52 |
mriedem | nova provides the API so a higher level service can orchestrate whatever you need here | 21:52 |
mriedem | edleafe: ack, will run the ironic patch on that | 21:53 |
jose-phillips | hi any idea | 21:55 |
*** nore_rabel has quit IRC | 21:55 | |
jose-phillips | why i'm getting this error on devstack | 21:56 |
*** wind has joined #openstack-nova | 21:56 | |
jose-phillips | can't apply process capabilities -1 | 21:59 |
jose-phillips | using qemu | 21:59 |
rybridges | for the record mriedem, i tried recompiling qemu like you said... took the latest version i could find here http://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/ | 22:00 |
wind | Hi, I'm trying to make a rest-api call from ironic to nova, was hoping if someone could gimme a hint how to do so ... Ironic.conf doesn't have anything in there for nova, so should i build a keystoneclient session, and then use that to retrieve the token, and compute-api endpoint and then send the GET & PUT Commands | 22:00 |
rybridges | still get the exact same error | 22:00 |
wind | Any suggestions would be really helpful.... I'm new to openstack | 22:01 |
*** kumarmn has quit IRC | 22:04 | |
*** kumarmn has joined #openstack-nova | 22:05 | |
*** eharney has quit IRC | 22:05 | |
mnaser | 2017-12-19 22:04:53.593 44047 DEBUG nova.compute.resource_tracker [req-2858b7b9-d273-4347-aff3-dfa3fa10f4cd - - - - -] We're on a Pike compute host in a deployment with Ocata compute hosts. Auto-correcting allocations to handle Ocata-style assumptions. _update_usage_from_instance /usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py:1042 <=== wouldnt this be nice if this was a warning instead of | 22:05 |
mnaser | DEBUG? | 22:05 |
mnaser | given that I just found out Nova thinks there are Ocata hosts in my all-pike installation (i guess something is wrong somewhere) | 22:06 |
ludovic | mriedem: Just to be sure, in order to influence the host-evacuate process, i don't have other options than using host aggregate with flavor and extra_specs ? | 22:08 |
mnaser | ludovic: yes, but afaik migrations use the flavor and extra_specs that it had at the moment of provisioning | 22:08 |
mnaser | ex: if your flavor A had aggregate "foo" when a VM is booted, and you changed the aggregate to "bar", and then do a migration/evacuate, it will still try to look for "foo" | 22:09 |
*** priteau_ has quit IRC | 22:09 | |
mnaser | afaik that has been my experience but i might be wrong? | 22:09 |
*** priteau has joined #openstack-nova | 22:09 | |
ludovic | oh yes so it's necessary to provide two aggregates at the moment of provisioning "prod" and "spare" | 22:10 |
mnaser | well ideally if you launched an instance in 'prod', you dont want it to be evacuated into 'staging' | 22:11 |
mnaser | because maybe your host aggregate hardware in staging gets turned off at 5pm | 22:11 |
* mnaser goes back to wrestling nova not getting/processing any live migrations | 22:12 | |
*** claudiub|2 has joined #openstack-nova | 22:12 | |
mnaser | oh would you look at that | 22:13 |
mnaser | "Unable to submit allocation for instance 42e2e0cd-0dd2-48c0-b873-ed9cd08a451a" .. placement returning 400, JSON does not validate: None is not of type 'string' ... instance['project_id'] == None somehow?! | 22:13 |
*** kumarmn has quit IRC | 22:14 | |
mnaser | does this ring any bells to anyone or should i start diving in | 22:14 |
*** priteau has quit IRC | 22:14 | |
penick | Working hard to get live migrations functional is like cranking on the handle for a jack-in-the-box. Except the thing in the box is a fist. And it punches you in the face. | 22:14 |
*** huanxie has quit IRC | 22:14 | |
ludovic | mnaser: that sounds like the JSON Filter not working, no ? | 22:15 |
*** kumarmn has joined #openstack-nova | 22:15 | |
mnaser | penick: its always worked, but i guess this is a weird pike corner case, i see this - https://bugs.launchpad.net/nova/+bug/1701129 but it should be in pike which we're running | 22:16 |
openstack | Launchpad bug 1701129 in OpenStack Compute (nova) "Functional tests fail intermittently with 400 Bad Request from placement" [Low,Fix released] - Assigned to melanie witt (melwitt) | 22:16 |
* mnaser thinks | 22:16 | |
*** penick has quit IRC | 22:19 | |
*** huanxie has joined #openstack-nova | 22:20 | |
mnaser | i guess for some reason RequestSpec is getting an empty project_id | 22:21 |
openstackgerrit | Lance Bragstad proposed openstack/nova master: Add scope_types to server policies https://review.openstack.org/525772 | 22:21 |
openstackgerrit | Merged openstack/nova master: Convert ext filesystem resizes to privsep. https://review.openstack.org/517516 | 22:23 |
openstackgerrit | Merged openstack/nova master: Move flushing block devices to privsep. https://review.openstack.org/519010 | 22:23 |
openstackgerrit | Merged openstack/nova master: [placement] Separate API schemas (resource_class) https://review.openstack.org/520611 | 22:23 |
openstackgerrit | Merged openstack/nova master: Update nova-status and docs for nova-compute requiring placement 1.14 https://review.openstack.org/526505 | 22:23 |
*** gouthamr has joined #openstack-nova | 22:23 | |
openstackgerrit | Merged openstack/nova master: Deduplicate functional test code https://review.openstack.org/526227 | 22:23 |
openstackgerrit | Merged openstack/nova master: Fix possible TypeError in VIF.fixed_ips https://review.openstack.org/527920 | 22:24 |
*** lyan has quit IRC | 22:24 | |
*** penick has joined #openstack-nova | 22:25 | |
*** rcernin has joined #openstack-nova | 22:28 | |
*** salv-orl_ has quit IRC | 22:36 | |
*** salv-orlando has joined #openstack-nova | 22:37 | |
*** penick has quit IRC | 22:40 | |
*** marst has quit IRC | 22:41 | |
*** salv-orlando has quit IRC | 22:41 | |
*** salv-orlando has joined #openstack-nova | 22:42 | |
*** penick has joined #openstack-nova | 22:43 | |
mnaser | instance = common.get_instance(self.compute_api, context, id) <== would anyone know if this supplies project_id by default? | 22:49 |
mnaser | because that's the instance which is passed down to conductor and by the time its at the scheduler, instance.project_id == None which then in turn makes it fail the request to the placement api | 22:50 |
*** huanxie has quit IRC | 22:50 | |
mnaser | further investigation - {"project_id": null, "user_id": "695d5f386eed440cb0e38455e1afdc9e", "allocations": [{"resource_provider": {"uuid": "5d5c5177-29bb-484f-9cc6-928360afa195"}, "resources": {"MEMORY_MB": 512, "VCPU": 2, "DISK_GB": 20}}, {"resource_provider": {"uuid": "4e43861e-ee36-40b7-ba7b-2239b46a1609"}, "resources": {"VCPU": 2, "MEMORY_MB": 512, "DISK_GB": 20}}]} .. for some reason, user_id comes in but | 22:55 |
mnaser | project_id doesn't. fwiw, this is a server created in 2015. | 22:55 |
*** huanxie has joined #openstack-nova | 22:56 | |
mnaser | the user_id is the user of the one executing the live migration, not the owner of the instance oddly enough | 22:57 |
*** lennyb has quit IRC | 22:58 | |
*** lennyb has joined #openstack-nova | 23:00 | |
*** edmondsw has joined #openstack-nova | 23:06 | |
mnaser | ok.. request_spec record has project_id set to null for that vm | 23:06 |
mnaser | in the database | 23:06 |
mnaser | why and how.. :( | 23:06 |
mriedem | hmm, not sure why the project_id would be null | 23:06 |
mriedem | should come off the context | 23:07 |
mriedem | sorry, was on a call for the last hour | 23:07 |
mnaser | mriedem: no problem, its null because .. its null in the request_specs table too.. | 23:07 |
mnaser | i wonder why | 23:07 |
mriedem | you said it's a really old instance right? | 23:07 |
mnaser | yes mriedem | 23:07 |
mriedem | ok reqspec is created here https://github.com/openstack/nova/blob/16.0.4/nova/compute/api.py#L899 | 23:08 |
mnaser | the created_at for the requestspec is "2017-03-07 02:28:47" | 23:08 |
mnaser | but no updated_at | 23:08 |
mriedem | https://github.com/openstack/nova/blob/16.0.4/nova/objects/request_spec.py#L411 | 23:08 |
mriedem | 2017-03-07 is ocata right? | 23:09 |
mriedem | i'm wondering if this was a request spec created for an older instance | 23:09 |
mriedem | what's the created_at on the instance? | 23:09 |
mnaser | it was | 23:09 |
mnaser | 2015 created_at, 2017 requestspec | 23:09 |
mriedem | ok in ocata this is the routine for creating requestspecs for old instances | 23:09 |
mriedem | https://github.com/openstack/nova/blob/stable/ocata/nova/objects/request_spec.py#L590 | 23:09 |
mriedem | which https://github.com/openstack/nova/blob/stable/ocata/nova/objects/request_spec.py#L405 | 23:10 |
mriedem | however, | 23:10 |
mriedem | if that's an admin context, from the online data migration, it won't have a project id... | 23:10 |
*** edmondsw has quit IRC | 23:10 | |
mriedem | https://github.com/openstack/nova/blob/stable/ocata/nova/cmd/manage.py#L776 | 23:11 |
mnaser | which explains how we landed in this case | 23:11 |
mriedem | https://github.com/openstack/nova/blob/stable/ocata/nova/context.py#L313 | 23:11 |
mriedem | yup | 23:11 |
mnaser | i guess its probably not the only one | 23:11 |
mriedem | probably not | 23:11 |
mriedem | ok so you're hitting this trying to live migrate that instance right? | 23:12 |
mnaser | mriedem: yes but i believe that any operations involving placement will likely fail | 23:12 |
mriedem | so that's this http://git.openstack.org/cgit/openstack/nova/tree/nova/scheduler/client/report.py#n1141 | 23:12 |
mriedem | the scheduler is trying to create allocations in placement on the target node for that instance | 23:12 |
mnaser | correct, and because im not forcing it, it goes through the scheduler | 23:13 |
mnaser | and the scheduler tacks on project_id from the request_spec | 23:13 |
mriedem | yup https://github.com/openstack/nova/blob/16.0.4/nova/scheduler/filter_scheduler.py#L287 | 23:14 |
mriedem | and in this case, the instance project_id is likely != the context.project_id because the context is the admin user | 23:14 |
mriedem | doing the live migration | 23:14 |
mriedem | SOB | 23:14 |
mnaser | i looked at the number of request_specs | 23:15 |
mnaser | and its pretty terrifying to have to update it all | 23:15 |
mnaser | lol | 23:15 |
mriedem | the number of reqspecs that don't have a project_id set? | 23:15 |
*** burt has quit IRC | 23:15 | |
mnaser | i didnt want to run that query because im pretty sure ill burn down the sql server | 23:15 |
mnaser | close to a million records and i probably would have to wildcard match it | 23:15 |
mriedem | select count(*) from nova_api.request_specs where project_id is null and deleted = 0; | 23:16 |
mriedem | ? | 23:16 |
mnaser | request_specs contains a json thingy called 'spec' | 23:16 |
mnaser | {"nova_object.version": "1.5", ...} | 23:17 |
mriedem | oh right | 23:17 |
mriedem | yeah the request_specs.spec is a serialized json blob of the object | 23:17 |
mriedem | so forget your db query | 23:17 |
mriedem | jaypipes: ^ | 23:17 |
mriedem | mnaser: well, i could hack something up for you quickish | 23:17 |
mriedem | mnaser: have you reported a bug yet? | 23:17 |
mnaser | mriedem: i havent yet, i just kinda discovered how i ended up here with your information | 23:18 |
mnaser | (i got as far as .. request spec doesnt have project id) but the online migration confirms it | 23:18 |
mriedem | ok, i can start hacking up a workaround if you can report a bug | 23:18 |
* mriedem wonders if we should hold up the newton eol for this | 23:18 | |
mnaser | mriedem: just out of curiosity, is project_id/user_id actually used by the placement api ? | 23:18 |
mriedem | not yet | 23:19 |
mnaser | but i guess we dont want to make it from bad to worse | 23:19 |
mriedem | the long-term idea is we can leverage the allocations with the project/user information for doing things like counting quotas without iterating the cells | 23:19 |
mnaser | gotcha | 23:20 |
mnaser | alright let me write up a bug | 23:20 |
mriedem | this would be very wrong for that though https://github.com/openstack/nova/blob/16.0.4/nova/scheduler/filter_scheduler.py#L293 | 23:20 |
mriedem | if we're live migrating or evacuating | 23:20 |
mnaser | i guess thats why it says todo :> | 23:21 |
mriedem | heh | 23:21 |
mriedem | melwitt: ^ a todo to keep in mind if we ever want to use placement allocations to mine data for counting quotas | 23:21 |
mriedem | we aren't storing the correct user_id for all allocations | 23:21 |
melwitt | so we should have one claim per allocation or? | 23:23 |
mriedem | when migrating or evacuating, by default the context is the admin | 23:24 |
mriedem | b/c those are admin apis | 23:24 |
mriedem | so the user_id we're storing in the allocation for the instance is from the admin, but the project_id should come from the instance, which is the user | 23:24 |
melwitt | yeah, I see. guh | 23:24 |
*** felipemonteiro has quit IRC | 23:24 | |
*** felipemonteiro has joined #openstack-nova | 23:24 | |
*** kumarmn has quit IRC | 23:25 | |
*** kumarmn has joined #openstack-nova | 23:26 | |
*** huanxie has quit IRC | 23:27 | |
*** andreas_s has joined #openstack-nova | 23:28 | |
melwitt | does it maybe work out because allocations are updated by the compute host every update interval? would it auto heal the user/project once we fix it? | 23:28 |
mriedem | no | 23:29 |
mriedem | computes don't mess with allocations once you're upgraded to pike | 23:29 |
mnaser | mriedem: https://bugs.launchpad.net/nova/+bug/1739318 | 23:30 |
openstack | Launchpad bug 1739318 in OpenStack Compute (nova) "Online data migration context does not contain project_id" [Undecided,New] | 23:30 |
mriedem | mnaser: thanks | 23:30 |
melwitt | hm, I thought that's what update_available_resource did | 23:30 |
mriedem | melwitt: used to did | 23:30 |
mnaser | also looks like the claim resources which did `project_id = spec_obj.project_id` was moved to scheduler utils | 23:30 |
melwitt | damn | 23:30 |
mnaser | so that might make things more challenging to backport. | 23:30 |
*** kumarmn has quit IRC | 23:30 | |
mnaser | (or if you have to solve the user_id one) | 23:31 |
*** kumarmn has joined #openstack-nova | 23:31 | |
*** huanxie has joined #openstack-nova | 23:32 | |
*** andreas_s has quit IRC | 23:32 | |
*** moshele has joined #openstack-nova | 23:32 | |
*** itlinux has quit IRC | 23:33 | |
mnaser | mriedem: so is it time to write a little script to iterate all request specs, and for those with a null project_id, look up the project_id from the instances table and update the spec with it? | 23:36 |
mriedem | mnaser: i think we might need that too, but i have a workaround i think we can use for now, | 23:38 |
mriedem | plus fixing that busted migration routine for people that haven't hit this yet | 23:38 |
mnaser | mriedem: im working on a small fix for that busted migration routine as it seems pretty trivial | 23:38 |
*** penick has quit IRC | 23:39 | |
mnaser | mriedem: im noticing a lot (most fields) are nullable=True .. can I drop that for project_id or is that a design decision? | 23:41 |
mnaser | if i cant drop it, i can raise an exception in from_components if context.project_id is none (and add a unit test for that), then fix the layer above it to make sure it always supplies a project_id | 23:42 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Use instance.project_id when creating request specs for old instances https://review.openstack.org/529184 | 23:42 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Workaround missing RequestSpec.project_id when moving an instance https://review.openstack.org/529185 | 23:42 |
mriedem | mnaser: this is my start ^ | 23:42 |
mnaser | oh okay :P | 23:42 |
mnaser | c'mon gerrit | 23:43 |
*** catinthe_ has joined #openstack-nova | 23:43 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Workaround missing RequestSpec.project_id when moving an instance https://review.openstack.org/529185 | 23:45 |
mriedem | ^ handles the other cases | 23:45 |
mriedem | tonyb: think we might want to hold up https://review.openstack.org/#/c/529102/ for https://review.openstack.org/529184 | 23:47 |
mnaser | mriedem: the patch for the fix looks good, but just a question, do you want to drop nullable=True to make sure that it will never save (in case we ever likely run into this again?) | 23:47 |
mriedem | mnaser: that will require a version bump on the object and isn't something we can backport | 23:47 |
mriedem | it's something we can do on master, but not critical atm | 23:47 |
mnaser | ah okay, figured there was a reason behind it | 23:47 |
mriedem | i'll leave a todo | 23:48 |
*** kumarmn has quit IRC | 23:48 | |
mriedem | mnaser: i don't suppose you have a recreate of this in staging that you can test out with the workaround patch? | 23:49 |
mnaser | mriedem: i dont think i can recreate this scenario.. we just rebuilt our local dev cloud from scratch a few weeks ago :( | 23:50 |
mnaser | it was too bad because it was running since newton | 23:50 |
mriedem | ok, we could probably recreate it though with devstack. create a new instance, delete its request spec from the db directly, then run the migration routine | 23:50 |
mriedem | then try to migrate that instance | 23:50 |
mnaser | mriedem: we probably dont have to get that far, probably seeing project_id non null in request_specs table would probably be enough to show that this bug specifically was resolved | 23:51 |
openstackgerrit | Merged openstack/nova master: Deduplicate instance.create notification samples https://review.openstack.org/523456 | 23:51 |
mriedem | true | 23:52 |
mriedem | i mean, you could just test this in prod, but...i didn't want to ask | 23:52 |
mnaser | mriedem: i could probably patch up the live migration one only | 23:52 |
*** catintheroof has joined #openstack-nova | 23:52 | |
mnaser | since really nothing can break there because its an admin api only | 23:52 |
*** catintheroof has quit IRC | 23:52 | |
mnaser | i wouldnt be able to test the migrate and conductor changes as those are too critical tbh | 23:52 |
*** catintheroof has joined #openstack-nova | 23:53 | |
mnaser | by conductor, the conductor manager change that is | 23:53 |
mriedem | yeah | 23:55 |
mnaser | oh fun times | 23:55 |
mnaser | this will conflict in stable/pike | 23:55 |
mnaser | _get_request_spec_for_select_destinations doesnt exist in stable/pike .. not in my stable/pike | 23:55 |
*** catinthe_ has quit IRC | 23:55 | |
mnaser | tasks | 23:56 |
mriedem | that code is in _find_destination in pike | 23:57 |
*** ludovic has quit IRC | 23:57 | |
*** catintheroof has quit IRC | 23:58 | |
mriedem | https://github.com/openstack/nova/blob/stable/pike/nova/conductor/tasks/live_migrate.py#L264 | 23:58 |
mnaser | ok i see | 23:58 |
*** jdurgin has quit IRC | 23:59 |
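
The backfill script mnaser describes at 23:36 (walk every request spec, find the ones whose serialized `spec` blob has a null project_id, and fill it in from the owning instance) could look roughly like the sketch below. This is only an illustration, not the fix proposed in the reviews above. It works on plain in-memory rows and does not touch a database. The `nova_object.data` key is an assumption about how the serialized blob is laid out (the log only shows `nova_object.version`), and the function names, the row layout, and the project id in the demo are all hypothetical.

```python
import json

# Assumed key under which the serialized object keeps its fields.
# Only "nova_object.version" is visible in the log above.
DATA_KEY = "nova_object.data"


def needs_project_id(spec_json):
    """Return True if the serialized request spec has no project_id."""
    data = json.loads(spec_json).get(DATA_KEY, {})
    return data.get("project_id") is None


def backfill_project_ids(spec_rows, instance_projects):
    """Build updated spec blobs for request specs missing a project_id.

    spec_rows: iterable of dicts with "instance_uuid" and "spec" (JSON text),
        a hypothetical stand-in for rows read from the request_specs table.
    instance_projects: dict mapping instance uuid -> project_id, a
        hypothetical stand-in for a lookup against the instances table.

    Returns a list of (instance_uuid, new_spec_json) tuples. Rows that are
    already fine, or whose instance is unknown, are skipped untouched.
    """
    updates = []
    for row in spec_rows:
        if not needs_project_id(row["spec"]):
            continue
        project_id = instance_projects.get(row["instance_uuid"])
        if project_id is None:
            # No owner found; leave this one for manual inspection.
            continue
        primitive = json.loads(row["spec"])
        primitive.setdefault(DATA_KEY, {})["project_id"] = project_id
        updates.append((row["instance_uuid"], json.dumps(primitive)))
    return updates


if __name__ == "__main__":
    # The instance uuid is the one from the log; the project id is made up.
    uuid = "42e2e0cd-0dd2-48c0-b873-ed9cd08a451a"
    broken = json.dumps({
        "nova_object.version": "1.5",
        DATA_KEY: {"project_id": None, "instance_uuid": uuid},
    })
    rows = [{"instance_uuid": uuid, "spec": broken}]
    for instance_uuid, new_spec in backfill_project_ids(
            rows, {uuid: "example-project-id"}):
        print(instance_uuid, new_spec)
```

Returning the proposed updates instead of writing them keeps the sketch safe to dry-run: with close to a million records, as mentioned at 23:15, it is probably worth reviewing the list and applying it in batches rather than in one pass.
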