Thursday, 2018-08-09

*** gyee has quit IRC00:00
*** BrinZhang has quit IRC00:04
*** r-daneel has quit IRC00:10
*** zhurong has joined #openstack-nova00:12
*** tetsuro has joined #openstack-nova00:15
openstackgerritMerged openstack/nova master: Add functional test for forced live migration rollback allocs  https://review.openstack.org/58663600:24
*** erlon has quit IRC00:26
*** moshele has quit IRC00:35
*** gbarros has joined #openstack-nova00:40
*** mhen has quit IRC01:24
*** mhen has joined #openstack-nova01:25
*** ircuser-1 has quit IRC01:27
openstackgerritMerged openstack/nova stable/ocata: [stable only] Add functional regression test for bug 1783613  https://review.openstack.org/58841601:36
openstackbug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt)01:36
*** gbarros has quit IRC01:39
*** Dinesh_Bhor has joined #openstack-nova01:44
*** zhurong has quit IRC01:47
*** gbarros has joined #openstack-nova01:48
*** gbarros has quit IRC01:52
*** gbarros has joined #openstack-nova01:56
*** mrsoul has quit IRC01:59
*** Dinesh_Bhor has quit IRC02:03
*** gbarros has quit IRC02:04
*** mriedem has quit IRC02:08
openstackgerritChen proposed openstack/nova stable/queens: Fix bad links for admin-guide  https://review.openstack.org/59006802:08
*** Dinesh_Bhor has joined #openstack-nova02:10
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Update nova network info when doing rebuild for evacuate operation  https://review.openstack.org/59007002:13
openstackgerritChen proposed openstack/nova stable/queens: Fix bad links for admin-guide  https://review.openstack.org/59006802:16
openstackgerritChen proposed openstack/nova stable/pike: Fix bad links for admin-guide  https://review.openstack.org/59007202:16
*** Bhujay has joined #openstack-nova02:30
*** Bhujay has quit IRC02:30
*** Bhujay has joined #openstack-nova02:31
*** zhurong has joined #openstack-nova02:33
*** Nel1x has joined #openstack-nova02:36
openstackgerritChen proposed openstack/nova master: Update ssh configuration doc  https://review.openstack.org/58984402:39
openstackgerritGhanshyam Mann proposed openstack/nova master: Update the parameter explain when updating a volume attachment  https://review.openstack.org/56518102:45
*** dklyle has joined #openstack-nova02:46
*** psachin has joined #openstack-nova02:47
*** hongbin has joined #openstack-nova02:49
*** Bhujay has quit IRC02:51
openstackgerritTakashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29  https://review.openstack.org/58940702:56
openstackgerritVishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0.  https://review.openstack.org/59008103:12
openstackgerritChen proposed openstack/nova master: Trivial fix on migration doc  https://review.openstack.org/58902803:17
*** hongbin has quit IRC03:41
*** dklyle has quit IRC03:42
*** zhurong has quit IRC03:53
*** _ix has quit IRC03:54
*** Dinesh_Bhor has quit IRC03:55
*** janki has joined #openstack-nova03:57
*** hemna_ has quit IRC03:57
*** ratailor has joined #openstack-nova04:11
openstackgerritMerged openstack/nova stable/queens: Fix bad links for admin-guide  https://review.openstack.org/59006804:13
*** Nel1x has quit IRC04:14
*** liuyulong has joined #openstack-nova04:22
*** Bhujay has joined #openstack-nova04:32
*** Bhujay has quit IRC04:34
*** markvoelker has joined #openstack-nova04:41
*** Dinesh_Bhor has joined #openstack-nova04:54
*** ratailor has quit IRC04:57
*** ratailor has joined #openstack-nova04:58
*** ircuser-1 has joined #openstack-nova05:04
*** udesale has joined #openstack-nova05:06
openstackgerritVishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0.  https://review.openstack.org/59008105:32
*** nicolasbock has joined #openstack-nova05:34
*** tetsuro has quit IRC05:35
*** links has joined #openstack-nova05:50
openstackgerritVishakha Agarwal proposed openstack/nova master: Quota details for key_pair "in_use" is 0.  https://review.openstack.org/59008106:05
*** hamzy_ has quit IRC06:29
*** hamzy_ has joined #openstack-nova06:29
*** nicolasbock has quit IRC06:35
*** pcaruana has joined #openstack-nova06:38
*** nicolasbock has joined #openstack-nova06:41
*** holser_ has joined #openstack-nova06:45
*** udesale has quit IRC06:54
*** evrardjp has joined #openstack-nova06:55
openstackgerritSergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI  https://review.openstack.org/58954806:55
*** hshiina has joined #openstack-nova06:56
*** ccamacho has joined #openstack-nova06:58
*** udesale has joined #openstack-nova06:58
*** ratailor_ has joined #openstack-nova07:00
*** ratailor has quit IRC07:03
*** luksky has joined #openstack-nova07:03
*** ispp has joined #openstack-nova07:06
*** stakeda has joined #openstack-nova07:06
*** rmart04 has joined #openstack-nova07:14
openstackgerritliuyamin proposed openstack/python-novaclient master: Replace os-client-config to openstacksdk  https://review.openstack.org/59014107:15
*** Bhujay has joined #openstack-nova07:16
*** rcernin has quit IRC07:16
*** tetsuro has joined #openstack-nova07:16
*** zhangbailin_ has quit IRC07:21
*** zhangbailin_ has joined #openstack-nova07:21
*** Bhujay has quit IRC07:22
*** dpawlik has joined #openstack-nova07:22
openstackgerritTetsuro Nakamura proposed openstack/nova master: Adds a test for _get_provider_ids_matching()  https://review.openstack.org/59015007:23
*** dpawlik has quit IRC07:24
*** dpawlik has joined #openstack-nova07:24
*** zhangbailin_ has quit IRC07:26
*** BrinZhang has joined #openstack-nova07:26
*** rmart04_ has joined #openstack-nova07:26
*** rmart04 has quit IRC07:26
*** rmart04_ is now known as rmart0407:26
*** brinzh has joined #openstack-nova07:27
*** BrinZhang has quit IRC07:27
*** ratailor_ has quit IRC07:34
*** ratailor__ has joined #openstack-nova07:35
*** Bhujay has joined #openstack-nova07:40
*** ratailor_ has joined #openstack-nova07:50
*** jpena|off is now known as jpena07:50
*** XueFeng has quit IRC07:50
*** ratailor__ has quit IRC07:52
*** udesale has quit IRC07:53
*** johnthetubaguy has joined #openstack-nova07:54
gibimelwitt, mriedem: opened a versioned notification bp for stein https://blueprints.launchpad.net/nova/+spec/versioned-notification-transformation-stein08:04
*** tetsuro has quit IRC08:11
*** _ix has joined #openstack-nova08:11
*** rmart04 has quit IRC08:13
*** mdbooth has joined #openstack-nova08:17
*** Dinesh_Bhor has quit IRC08:17
mdboothlyarwood: Passing: https://review.openstack.org/#/c/587013/ !08:19
*** tssurya has joined #openstack-nova08:19
*** dulek has joined #openstack-nova08:21
mdboothlyarwood: Also passed without the rebase workaround. I'll merge them and resubmit.08:21
*** Dinesh_Bhor has joined #openstack-nova08:23
*** udesale has joined #openstack-nova08:24
openstackgerritMatthew Booth proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow  https://review.openstack.org/58701308:27
*** dtantsur|afk is now known as dtantsur08:33
openstackgerritJose Castro Leon proposed openstack/nova master: Fix get_device_path from network mounted volume  https://review.openstack.org/59018808:36
*** janki has quit IRC08:36
*** ratailor_ has quit IRC08:46
*** ratailor_ has joined #openstack-nova08:46
*** priteau has quit IRC08:47
*** priteau has joined #openstack-nova08:48
openstackgerritMatthew Booth proposed openstack/nova master: Add regression test for bug#1784353  https://review.openstack.org/58701408:49
*** rmart04 has joined #openstack-nova09:00
*** cdent has joined #openstack-nova09:08
*** ispp has quit IRC09:10
*** Dinesh_Bhor has quit IRC09:11
lyarwoodmdbooth: cool, thanks again for working through this :)09:13
mdboothlyarwood: np.09:14
mdboothlyarwood: Started accidentally while reviewing and seemed silly to stop :)09:14
*** josecastroleon has quit IRC09:14
*** janki has joined #openstack-nova09:15
*** josecastroleon has joined #openstack-nova09:19
lyarwoodmdbooth: are you going to rebase https://review.openstack.org/#/c/587071/ ?09:21
lyarwoodmdbooth: np if not, I have time to work on it this morning finally09:21
mdboothlyarwood: I was going to leave that for you, that's the real bit :)09:21
lyarwoodmdbooth: tis cool, thanks again09:21
mdboothlyarwood: Incidentally, did you consider the 'fix it in compute' approach09:22
mdboothI know that's where the patch started, then you moved to conductor09:22
mdboothBut in fixing the fixture I came across other cleanup in compute which already does exactly what I was talking about09:22
lyarwoodmdbooth: yeah the remove_volume_connections call09:23
mdboothYeah09:23
lyarwoodmdbooth: yeah I'll take a look now, it's a shame to flip back again but meh09:24
mdboothlyarwood: I'm not saying do it, just asking if it's feasible/worth considering09:25
mdboothOr if you've already considered and rejected it, in fact09:25
lyarwoodmdbooth: yeah understood, I think it is feasible and ultimately a better approach, I just haven't looked into the knock-on impact of changing shutdown_instance.09:28
mdboothlyarwood: ack09:28
*** amarao has joined #openstack-nova09:32
*** derekh has joined #openstack-nova09:35
*** goutham1 has joined #openstack-nova09:37
*** Dinesh_Bhor has joined #openstack-nova09:38
goutham1Hi all, I am facing this issue in rally: when I try to create a deployment it throws this error: Env manager got invalid spec:09:39
goutham1["There is no Platform plugin with name: 'existing@openstack'"]09:39
goutham1any idea on how to fix it ??09:39
*** dpawlik has quit IRC09:39
goutham1rally deployment create --fromenv --name=existing09:39
goutham1Env manager got invalid spec:09:39
goutham1["There is no Platform plugin with name: 'existing@openstack'"]09:39
goutham1it shows something of this sort any idea on how to fix this ??09:40
*** dpawlik has joined #openstack-nova09:41
mdboothgoutham1: You'll want to try #openstack for user issues, I think09:41
goutham1mdbooth: thanks09:41
*** dpawlik has quit IRC09:45
*** dpawlik has joined #openstack-nova09:49
tobascogibi: maybe a stupid question but is the os-server-external-events interface related to legacy or versioned notifications in nova?09:52
gibitobasco: no, it isn't09:52
gibitobasco: os-server-external-events is in the REST API09:52
tobascook thanks09:53
gibitobasco: when I say nova notifications I mean notification emitted on the notifications or versioned_notification RPC  topic09:53
*** neiljerram has joined #openstack-nova09:55
neiljerramGood morning everyone.09:55
*** luksky has quit IRC09:56
neiljerramI am struggling with a problem in Queens where I can do novaclient.images.list() if novaclient is for the admin tenant, but I get 401 if novaclient is for some other tenant/project.09:58
neiljerramThis was working in a Pike installation, and using Keystone v2 for authentication.  In my Queens install I don't have Keystone v2 so am now using Keystone v3 for auth.09:59
*** maciejjozefczyk has quit IRC09:59
neiljerramAny thoughts?09:59
neiljerramI believe any tenant/project should be able to list images, right?10:00
*** maciejjozefczyk has joined #openstack-nova10:01
*** goutham1 has quit IRC10:04
neiljerramWhen I do novaclient.images.list() with a non-admin tenant, and get 401, there is no new logging in nova-api.log.  (Whereas when I do a successful list with the admin tenant, I see a 200 log line in nova-api.log.)10:04
neiljerramTherefore I guess that this 401 is coming from some middleware before nova-api?  But I don't know how to debug or see any logging for that middleware...10:05
neiljerramAh, just realized that I should be asking all this in #openstack instead...10:09
openstackgerritBalazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client  https://review.openstack.org/58366710:11
*** claudiub|2 has joined #openstack-nova10:17
*** claudiub|2 is now known as claudiub10:17
openstackgerritSergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI  https://review.openstack.org/58954810:18
mdboothneiljerram: :) You're also best talking to glance directly for listing images. Pretty sure nova would just proxy it. Actually, I wonder if novaclient just talks to glance instead? When can we kill that, btw?10:21
*** Dinesh_Bhor has quit IRC10:22
neiljerrammdbooth, I think you're right that nova proxies this to glance.  When I try this with an admin tenant, I see a 200 log line in nova-api.log - which I think means that it can't be going directly to glance; right?10:22
mdboothneiljerram: Yeah, it means we saw it.10:23
neiljerrammdbooth, I think my problem may be more to do with not understanding how users and tokens work in Keystone v3...10:24
*** bhagyashris has joined #openstack-nova10:25
*** stakeda has quit IRC10:26
neiljerrammdbooth, If it's OK to ask this here: if I have just created some new project and user (in that project), do I also need to explicitly set up some token(s) for that user?  Or will that happen under the covers when I first try to do something with that user?10:26
*** goutham1 has joined #openstack-nova10:26
*** ratailor_ has quit IRC10:33
*** ratailor has joined #openstack-nova10:33
*** hshiina has quit IRC10:34
*** goutham1 has quit IRC10:34
*** luksky has joined #openstack-nova10:35
openstackgerritBalazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client  https://review.openstack.org/58366710:40
*** sahid has joined #openstack-nova10:45
*** erlon has joined #openstack-nova10:49
lyarwoodmdbooth: https://review.openstack.org/#/c/589567/3 - would you mind taking a look at my latest comment here, basically we wanted to introduce a workaround that avoids using qemu-img, to keep the RT from blocking other operations, however this path is also used during LM to get the sizes of disks before we recreate on the dest if required. I'm tempted to just close this change and move on but I might be10:55
lyarwoodmissing something here.10:55
mdboothlyarwood: ack. Looking now.10:55
lyarwoodmdbooth: note the logic in the current change is all backwards, I was rewriting it to always default to using os.path.getsize and only use qemu-img when the workaround was enabled.10:56
lyarwoodmdbooth: but that reintroduces 177064010:56
mdboothlyarwood: We should really just write that data into disk.info and read it from there.10:58
lyarwoodmdbooth: yeah looks like get_instance_disk_info always checks the config11:00
lyarwoodthe actual instance config that is, not disk.info11:01
*** jpena is now known as jpena|lunch11:01
lyarwoodmdbooth: brb need to drop for 10 mins11:02
mdboothlyarwood: k11:02
*** holser_ has quit IRC11:06
openstackgerritChen proposed openstack/nova master: Add additional info to resource provider aggregates update API  https://review.openstack.org/59024311:07
*** maciejjozefczyk has quit IRC11:08
*** maciejjozefczyk has joined #openstack-nova11:09
*** maciejjozefczyk has quit IRC11:10
*** brinzh has quit IRC11:12
mdboothlyarwood: If you're back, why the hell didn't we just stat it?11:15
mdboothlyarwood: Probably because qemu-img is there and we weren't considering the performance impact.11:15
mdboothlyarwood: But for allocated size, stat should give us exactly what we want, be really fast, and we don't need a workaround.11:16
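The stat idea above can be sketched as follows. This is a minimal illustration, not nova's code: the helper names `apparent_size` and `allocated_size` are hypothetical, and it assumes a POSIX system where `os.stat` exposes `st_blocks`, which Python documents as a count of 512-byte blocks.

```python
import os
import tempfile


def apparent_size(path):
    """Logical length of the file in bytes (what os.path.getsize reports)."""
    return os.path.getsize(path)


def allocated_size(path):
    """Bytes actually allocated on disk, from a single stat() call.

    st_blocks counts 512-byte blocks; it is not available on Windows.
    """
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    # Build a sparse 1 MiB file: the length is set but no data is written.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(1024 * 1024)
        path = f.name
    try:
        print("apparent:", apparent_size(path))
        print("allocated:", allocated_size(path))
    finally:
        os.unlink(path)
```

On a filesystem that supports sparse files the allocated figure is typically far below the apparent one, and neither call spawns a process. For a raw image the apparent size is also the guest-visible size; for qcow2 it is not.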
openstackgerritMerged openstack/nova master: Docs: Add guide to migrate instance with snapshot  https://review.openstack.org/58444211:27
*** sambetts|afk has quit IRC11:29
*** sambetts_ has joined #openstack-nova11:31
*** maciejjozefczyk has joined #openstack-nova11:32
openstackgerritMatthew Booth proposed openstack/nova master: Improve performance of get_allocated_disk_size  https://review.openstack.org/59025311:33
mdboothlyarwood: How about ^^^ instead? Just throwing that out as a proposal.11:33
*** sambetts_ is now known as sambetts11:35
*** sambetts is now known as sambetts|afk11:35
* mdbooth -> lunch11:35
*** bhagyashris has quit IRC11:35
*** links has quit IRC11:36
openstackgerritMerged openstack/nova master: Fix host validity check for live-migration  https://review.openstack.org/40100911:36
*** sambetts|afk has quit IRC11:46
*** sambetts_ has joined #openstack-nova11:47
*** s10 has joined #openstack-nova11:52
openstackgerritRajesh Tailor proposed openstack/nova stable/queens: Fix host validity check for live-migration  https://review.openstack.org/59026211:55
openstackgerritRajesh Tailor proposed openstack/nova stable/pike: Fix host validity check for live-migration  https://review.openstack.org/59026311:58
*** jpena|lunch is now known as jpena11:58
lyarwoodmdbooth: sorry that took a while11:59
* lyarwood reads12:00
lyarwoodmdbooth: yeah but we also need the virtual size12:00
mdboothlyarwood: Where?12:00
lyarwoodmdbooth: and that's the more important value tbh12:00
*** ratailor has quit IRC12:01
lyarwoodmdbooth: so we need it to work out over_committed_disk_size but also during LM to ensure we create the dest disks correctly12:01
lyarwoodmdbooth: your change is still good12:02
lyarwoodmdbooth: but I don't think it stops us from calling qemu-img to get the virtual size12:02
mdboothlyarwood: Yes, you're right12:02
mdboothSomething somewhere said the regression was introduced in a particular change, and that change only added get_allocated_disk_size12:03
lyarwoodmdbooth: yeah the bug for this highlights that change first I think12:03
*** sambetts_ has quit IRC12:04
mdboothOk. We could also eliminate get_disk_size, but that would be more complex12:04
lyarwoodmdbooth: that introduced the first call to qemu-img, then we noticed that broke LM so I introduced the virtual size call12:04
*** sambetts_ has joined #openstack-nova12:04
mdboothWe'd have to cache it12:04
lyarwoodmdbooth: yeah I think we can do that for virtual size12:05
mdboothI think. Not hard, but harder.12:05
mdboothPossibly not worth it harder12:05
mdboothIncidentally, that is the only use of get_allocated_disk_size12:05
openstackgerritMerged openstack/nova master: Update nova network info when doing rebuild for evacuate operation  https://review.openstack.org/38285312:06
lyarwoodmdbooth: yeah as I introduced it to fix the original over commit issue a while ago12:06
lyarwoodmdbooth: where we originally used os.path.getsize12:06
lyarwoodmdbooth: I guess we could just use that for the virtual size for files and avoid the call to qemu-img12:07
lyarwoodmdbooth: and use your change to get the allocated size12:08
lyarwoodmdbooth: then everyone is happy12:08
mdboothlyarwood: Ok, so now it's a judgement call. The workaround is frankly ugly and puts the onus on users to fix it. However, there shouldn't be many of those users and it's technically simpler.12:08
mdboothI think os.path.getsize() is unreliable12:09
mdboothDepends what qcow2 allocation we use12:09
lyarwoodtrue12:09
mdboothqemu-img is the right tool to use for that12:09
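For reference, a rough sketch of asking qemu-img for the virtual size. It assumes `qemu-img` is on the PATH and relies on the `virtual-size` field of `qemu-img info --output=json`; the function names are hypothetical, and this does not use nova's own wrapper around qemu-img.

```python
import json
import subprocess


def parse_virtual_size(info_json):
    """Extract the virtual size in bytes from `qemu-img info --output=json` text."""
    return int(json.loads(info_json)["virtual-size"])


def get_virtual_size(path):
    """Ask qemu-img for the guest-visible size of a disk image.

    This spawns one process per call, which is the cost being discussed.
    """
    out = subprocess.run(
        ["qemu-img", "info", "--output=json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_virtual_size(out)
```

Parsing is kept separate from the subprocess call so it can be exercised without qemu-img installed.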
mdboothlyarwood: So my fix is only going to halve the performance impact. Is that enough?12:10
lyarwoodmdbooth: it's not going to change the impact12:11
lyarwoodmdbooth: we still make a single qemu-img call for the virtual size12:11
lyarwoodmdbooth: we only make one now12:11
mdboothlyarwood: Well we'll only call it once instead of twice, right?12:11
lyarwoodmdbooth: nope, https://review.openstack.org/#/c/589513/ reduced it down to one12:12
mdboothlyarwood: Ah, with that landed there are *no* calls to get_allocated_disk_size12:12
mdbooth\o/ dead code12:12
lyarwoodmdbooth: well before you rm -rf it12:13
lyarwoodmdbooth: I still think we could use the getsize approach for raw disks12:13
lyarwoodmdbooth: and your stat call12:13
lyarwoodmdbooth: that way, no workaround, just an additional bugfix for RAW disks12:14
mdboothHonestly, I'd prefer to avoid doing anything complicated for a legacy code path. Adding if <raw>, elif <qcow2>, elif <lvm>...12:14
mdboothDoesn't seem like a good plan. I'll take your hack over that.12:15
lyarwoodmdbooth: well we already do that here anyway tbh12:16
lyarwooddisk_type == file driver_type == ploop etc12:16
*** eharney has quit IRC12:17
mdboothlyarwood: Right, but we'd be adding at least 1 new code path. Also, we'd still be slow for qcow2 at least sometimes.12:23
mdboothlyarwood: I'm ambivalent here in case you hadn't picked up :) I'm not necessarily against your hack, just thinking if it's worth the effort to do better.12:25
lyarwoodmdbooth: yeah I really don't enjoy touching this stuff tbh as something always comes up but I think the stat/getsize approach for RAW is the best we can offer12:26
mdboothIncidentally, what scheme do we now have which isn't interested in actual disk usage?12:27
mdboothThat seems odd.12:27
s10update_available_resource() with this call to get_allocated_disk_size is blocking some other operations only in post_live_migration: https://github.com/openstack/nova/commit/ab1e48f4683315db631be3f0995be6258edf6997 ? Do we really need this call here now?12:27
mdbooths10: Intuitively I'd say yes, but based on the same assumptions which tell me we should still be interested in actual disk usage.12:29
mdbooth...which we're apparently not.12:29
lyarwoodyeah placement should handle that now, so I think we can actually remove this?12:30
mdboothlyarwood: But how does placement handle it?12:30
mdboothPlacement doesn't know anything about actual disk usage which the hypervisor didn't tell it.12:30
lyarwoodmdbooth: yeah but I didn't think that was coming from the RT but I'm likely wrong.12:31
mdboothUnless we deprecated disk overcommit?12:32
*** gbarros has joined #openstack-nova12:34
*** sambetts_ has quit IRC12:36
*** sambetts_ has joined #openstack-nova12:37
mdboothlyarwood: An alternate (but not necessarily 'better'): write virtual size into disk.info. It's possible to do this in a backwards compatible way. We would just read it out for virtual size, update it automatically if it's missing so we don't have to fetch it again, and use state for allocated size.12:39
mdbooths/state/stat/12:39
mdboothlyarwood: It would be fast, accurate, and secure.12:39
mdboothIt would also be a bit more complex, so only worth it if we continue to need the data.12:40
mdboothIf a fast get_disk_size() is required ongoing, I think we should do ^^^. If not, I think we should go with the workaround.12:40
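The caching proposal above could look roughly like this. It is a sketch under stated assumptions, not the real disk.info handling: the cache file layout (a JSON map of disk path to virtual size), the function name `cached_virtual_size`, and the `measure` callback are all hypothetical.

```python
import json
import os


def cached_virtual_size(cache_path, disk_path, measure):
    """Return the virtual size of disk_path, measuring at most once.

    cache_path: hypothetical JSON file mapping disk path -> virtual size.
    measure: callable(disk_path) -> int, e.g. a qemu-img based lookup.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    if disk_path not in cache:
        # Missing entry: measure once and persist it for later calls.
        cache[disk_path] = measure(disk_path)
        tmp = cache_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(cache, f)
        os.replace(tmp, cache_path)  # atomic rename on POSIX

    return cache[disk_path]
```

An existing file without the entry is simply filled in on first read, which is the backwards-compatible behaviour described. The sketch does not invalidate the entry when a disk is resized; a real implementation would need to.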
*** oanson has joined #openstack-nova12:42
*** edmondsw has joined #openstack-nova12:45
*** lbragstad has joined #openstack-nova12:45
openstackgerritLee Yarwood proposed openstack/nova master: WIP libvirt: rewrite _get_instance_disk_info_from_config  https://review.openstack.org/58956712:47
lyarwoodmdbooth: I'd rather keep this simple if possible, what about ^12:47
* lyarwood isn't sure about the ploop logic tbh12:49
mdboothlyarwood: That doesn't eliminate the qemu-img call, though12:49
mdboothFor qcow212:49
lyarwoodmdbooth: yeah I don't think we can12:49
lyarwoodmdbooth: the issue was reported against RAW FWIW12:49
mdboothYou can if you cache it12:49
lyarwoodtrue12:49
mdboothAnd then you've also got 1 less code path to test12:50
lyarwoodwell you still need it if it isn't cached12:50
lyarwoodfor now12:50
lyarwoodbut longer term we can remove it12:50
mdboothRight, but you can put that in a utility call with separate tests12:51
lyarwoodmdbooth: are there util methods for reading/writing to disk.info btw?12:51
mdboothNo, we'd need to pull it out of imagebackend12:51
mdbooth(A good thing)12:51
*** _ix has quit IRC12:52
mdboothlyarwood: But that's conditional on us continuing to need this stuff. If it has a limited shelf life it's not worth it.12:54
*** sambetts_ has quit IRC12:54
mdboothAlthough I still don't understand why we don't need allocated disk any more.12:55
*** sambetts_ has joined #openstack-nova12:56
lyarwoodmdbooth: well we still need it for LM12:57
*** josecastroleon has quit IRC12:57
*** josecastroleon has joined #openstack-nova12:57
lyarwoodmdbooth: tbh I'd rather land something simple like this as a bugfix and then work to switch to disk.info outside of this in a bp or something12:57
*** eharney has joined #openstack-nova12:58
mdboothlyarwood: Sure. I'd prefer the workaround over the extra code paths for sure.12:59
lyarwoodmdbooth: wait, getting confused now, which workaround?12:59
lyarwoodmdbooth: os.stat?12:59
mdboothlyarwood: No, your original one.12:59
mdboothos.stat() doesn't fix it, we established that12:59
*** _ix has joined #openstack-nova13:00
mdboothlyarwood: So... land your original workaround, with the disk.info thing in reserve.13:01
lyarwoodmdbooth: that workaround breaks LM13:01
lyarwoodmdbooth: that's why I'm suggesting using os.stat and os.path.getsize for RAW at least13:01
mdboothHow does it break LM, btw?13:02
lyarwoodmdbooth: see my comment, LM with non-shared storage where we are creating images on the dest in pre_live_migration13:03
lyarwoodmdbooth: if we don't get an accurate virtual size we end up creating images that are too small13:03
*** josecastroleon has quit IRC13:03
lyarwoodhttps://review.openstack.org/#/c/589567/3 - my comment there sorry13:04
*** josecastroleon has joined #openstack-nova13:04
lyarwoodhttps://bugs.launchpad.net/nova/+bug/177064013:04
openstackLaunchpad bug 1770640 in nova (Ubuntu Bionic) "live block migration of instance with vfat config drive fails" [High,Fix committed]13:04
mdboothlyarwood: We shouldn't be live migrating a config disk anyway13:10
mdboothThat sounds like a different bug13:10
mdboothWe should just host->host copy it13:10
lyarwoodmdbooth: yeah the issue wasn't with the config disk but the main instance disk iirc13:11
mdboothOk.13:12
lyarwoodhmm actually that's vdb13:12
lyarwoodbut you can see we are mirroring13:12
*** holser_ has joined #openstack-nova13:12
*** sambetts_ has quit IRC13:15
odyssey4meHi folks. Is there a conf entry for the number of workers nova-scheduler fires up?13:17
mdboothlyarwood: Ok, now I understand the interaction.13:17
odyssey4meI can't seem to find one in the references.13:17
*** sambetts_ has joined #openstack-nova13:18
odyssey4meok, it would appear that there is one: https://github.com/openstack/nova/blob/master/nova/cmd/scheduler.py#L4913:19
*** sambetts_ has quit IRC13:22
*** sambetts_ has joined #openstack-nova13:23
*** mriedem has joined #openstack-nova13:24
*** amarao has left #openstack-nova13:28
*** gbarros has quit IRC13:28
mriedemdansmith: i'm +2 on https://review.openstack.org/#/c/582413/ if you want to re-apply your +213:28
mriedembauzas: can you go through these backports? https://review.openstack.org/#/q/topic:bug/1784705+status:open13:29
*** edmondsw has quit IRC13:29
mriedemmel said she was looking to cut stable releases today13:29
mriedemso i'm going to try and flush some of these out13:29
*** _ix has quit IRC13:29
dansmithokay13:30
*** edmondsw has joined #openstack-nova13:37
*** sambetts_ has quit IRC13:37
*** sambetts has joined #openstack-nova13:38
*** jistr is now known as jistr|call13:39
mriedemstephenfin_: e. gads. https://review.openstack.org/#/q/topic:bug/1746393+status:open13:40
*** edmondsw has quit IRC13:41
mriedemfeels like a feature as a bug fix13:41
mriedemespecially nervous when we have 0 CI of any of this stuff13:42
*** ccamacho has quit IRC13:44
*** edmondsw has joined #openstack-nova13:44
*** ccamacho has joined #openstack-nova13:44
*** eharney has quit IRC13:45
mriedemsean-k-mooney[m]: i guess the intel 3rd party PCI/NFV CI must be dead huh?13:46
mriedemdansmith: here are the queens backports with a +2 ready to go https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/queens+label:Code-Review=2 - the rest are in stephen's series, which i'd want to hold off on for a bit13:47
*** ccamacho has quit IRC13:48
mriedemoh and https://review.openstack.org/#/c/590062/ would be nice - that fix sat for over a year13:49
*** eharney has joined #openstack-nova13:49
melwittnova meeting in 10 minutes13:50
*** sean-k-mooney has joined #openstack-nova13:51
melwittcdent: is this bug considered closed/fixed now that both patches have landed? neither patch used Closes-Bug in the commit message https://bugs.launchpad.net/nova/+bug/178605513:53
openstackLaunchpad bug 1786055 in OpenStack Compute (nova) "performance degradation in placement with large number of resource providers" [High,In progress] - Assigned to Chris Dent (cdent)13:53
*** takashin has joined #openstack-nova13:53
cdentmelwitt: hmmm. There is more that can be done, but not likely that more will be done _now_, so I would guess closed is probably a reasonable state. The major factor has been addressed. Fixing the rest will involve considerable refactoring13:54
*** sambetts has quit IRC13:55
melwittcdent: I see, thanks13:56
*** sambetts_ has joined #openstack-nova13:58
*** awaugama has joined #openstack-nova13:58
*** jistr|call is now known as jistr13:59
*** amotoki_ is now known as amotoki14:01
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Use os.stat and os.path.getsize for RAW disk inspection  https://review.openstack.org/58956714:01
*** _ix has joined #openstack-nova14:03
*** gbarros has joined #openstack-nova14:05
*** sambetts_ has quit IRC14:05
*** jaypipes has quit IRC14:06
*** sambetts_ has joined #openstack-nova14:07
*** josecastroleon has quit IRC14:08
*** jaypipes has joined #openstack-nova14:10
mriedemi guess we don't need to wait for translations https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:zanata/translations14:17
*** josecastroleon has joined #openstack-nova14:19
melwittyeah, I was thinking it seems like openstack doesn't do translations anymore but I didn't know how to check. I'll add that link to the release checklist wiki14:20
melwittI know we don't translate log messages14:20
melwittbut other user-facing messages, do we still translate those? I wasn't sure14:20
efriedafaik we're still supposed to _() for exception messages.14:21
lyarwoodmdbooth: remind me again where the compute code was that deleted and recreated an attachment?14:21
melwittaye, I have seen that14:21
*** gbarros has quit IRC14:21
*** sambetts_ has quit IRC14:21
mdboothlyarwood: _terminate_volume_connections14:22
lyarwoodmdbooth: urgh was looking at remove_volume_connection14:24
*** sambetts_ has joined #openstack-nova14:24
mdboothlyarwood: IIRC it was triggering a bug in the cinder fixture, which assumed only 1 attachment14:25
mdboothBut with this we've briefly got 2 attachments14:25
*** ccamacho has joined #openstack-nova14:25
*** ccamacho has quit IRC14:25
*** ccamacho has joined #openstack-nova14:26
mdboothlyarwood: I don't love the raw-only fix, tbh, because I think it increases the test and maintenance burden. If that code needs to live on I'd prefer to bring it together somehow.14:26
mdboothI'll abandon the stat thing, though, because as you point out it's not a solution14:27
lyarwoodmdbooth: kk, well it improves performance for the raw images user that reported the issue in the short term until we start using disk.info to store the virtual size14:28
lyarwoodmdbooth: and given that means we also need to refactor code out of imagebackend I'd rather land something simple first then focus on that14:29
mdboothI don't think it's a refactor, btw. Just code motion really iirc.14:30
mdboothWould just be moving it elsewhere to make it easier to call.14:30
mriedemGET /kashyap returns me a 40414:31
mriedemis his nick not registered?14:31
mdboothmriedem: Yep. He's out for a few more days yet, I think14:31
mriedemblarg14:31
mriedemwanted him to read the "nova-compute choosing incorrect qemu binary when scheduling 'alternate' (ppc64, armv7l) architectures?" thread in the ML14:31
mriedemwondering if there is a good reason why we don't set guest.arch in the libvirt domain xml based on the hw_architecture image property14:32
mriedemmaybe someone could ask danpb?14:32
mdboothstephenfin_: might have an opinion14:33
mdboothmriedem: Is it tagged [nova]?14:33
* mdbooth doesn't see it14:34
openstackgerritLee Yarwood proposed openstack/nova master: Add regression test for bug#1784353  https://review.openstack.org/58701414:34
openstackgerritLee Yarwood proposed openstack/nova master: WIP compute: Terminate volume connections during _shutdown_instance  https://review.openstack.org/59034814:34
lyarwoodmdbooth: ^ that's the terminate_volume_connections alternative btw, without unit test changes14:34
mdboothlyarwood: That was fast! Looking.14:34
mdboothmriedem: Do you have an opinion on ^^^, btw?14:35
mdboothmriedem: Basically leaves us with a blank attachment when calling _shutdown_instance14:36
lyarwoodhmm that removes the call to detach with cinderv214:36
lyarwoodwhy would we do that during shutdown?14:37
mdboothlyarwood: remind me what v2 detach() does14:38
mriedemmdbooth: seems fun at first since it's very much the same code,14:38
mriedembut see inline comments14:38
*** sambetts_ has quit IRC14:39
mdboothterminate_connection() causes the storage backend to kill the connection14:39
mriedemmdbooth: v2 detach is just the os-detach volume action which changes the volume status to 'available'14:39
mriedemdoes nothing on the volume backend14:39
mdboothmriedem: Got it.14:39
mriedemso if you did this,14:40
*** sambetts_ has joined #openstack-nova14:40
*** hongbin has joined #openstack-nova14:41
mriedemyou'd have to have volume attachment cleanup code in both compute manager if we don't reschedule (max_attempts=1 or force_hosts/nodes is set) or if we do reschedule and conductor build_instances hits MaxRetriesExceeded14:41
mriedemwhich is essentially what we have for cleaning up ports14:41
sean-k-mooneymdbooth: stephenfin_ is on vacation for the next week and a half just fyi14:41
mriedemi assume that was meant for me14:42
mdboothmriedem: See, over here in communist Europe everybody takes vacation in August14:42
sean-k-mooneymriedem: well both you and mdbooth since he suggested stephenfin_ would have an opinion on something14:42
sean-k-mooneyi was still scrolling back to see what14:42
*** dtantsur is now known as dtantsur|brb14:43
lyarwoodmriedem: kk thanks, so this doesn't really simplify the fix at all14:43
mriedemlyarwood: not really14:43
mriedemreschedules are a minefield14:44
mriedemmdbooth: is that what bauzas is doing as well?14:44
mriedemfrance gets august off right?14:44
lyarwoodpretty much14:44
sean-k-mooneymriedem: oh ye were talking about setting the arch in libvirtxml based on hw_architecture14:44
mriedemfor wine and cheese and love making14:44
mdboothmriedem: Yep14:44
melwittefried: haha, I am literally looking at that review already right now14:44
efriedmelwitt: Cool. I wanted to make sure it got attention from someone who knows how to spell "quota" (which ain't me).14:45
dansmithmriedem: question in here: https://review.openstack.org/#/c/590062/114:45
dansmithI know it's a backport, but want to make sure I understand at least14:45
melwittefried: yeah, makes sense. I'm conflicted about the sentence they're proposing because it's not true that it's not possible to count keypairs. it's just, for legacy reasons (before pike) it always returned zero so I kept the behavior with the quota work in pike14:46
mdboothmriedem lyarwood: I think that approach is fundamentally good. We do need to think about the attachment cleanup after the last reschedule failure, but if we're not doing that then we've always been leaking there.14:46
efriedmelwitt: Ah, okay, then perhaps it should just say, "For legacy reasons, this value is always zero. We'll fix it in a future microversion. Maybe. If you're lucky."14:46
mdboothThat is, we can leak there right now, because it's possible to schedule to a compute and have it fail before touching volumes, so the 'reservation' still exists after failure.14:47
mdbooth^^^ * 3 == leak14:47
melwittefried: haha, yeah.14:47
mdboothUnless we already handle it14:47
mriedemmdbooth: what i'd be most comfortable with is if we suspect we leak today, that we add a functional regression test for that which does reschedules with a volume attached, and asserts at the end of the reschedules when we get novalidhost (for max retries exceeded) that we've cleaned up all attachments to the volume14:48
mriedemmdbooth: because this is too hairy to go on based on review alone14:49
mriedem*functional test (not really a regression if it's always leaked)14:49
mriedemthat may have been fixed recently though14:49
mriedemafter about 7 years of being broken14:49
mdboothGot it. I did a bit of an audit here, btw: https://review.openstack.org/#/c/587071/9/nova/tests/unit/conductor/test_conductor.py@100414:50
mriedemI8b1c05317734e14ea73dc868941351bb31210bf014:51
*** hvvcben has joined #openstack-nova14:51
mriedemyeah so we'll call _cleanup_volumes which will detach if we abort the build14:52
mriedembut not if we reschedule14:52
*** priteau has quit IRC14:52
mriedemand conductor doesn't do any volume cleanup on MaxRetriesExceeded14:52
mriedemso that's probably a separate bug,14:52
mriedemand would benefit from a functional test since it involves more than a single service14:52
mriedem(really 3 - conductor and 2 computes for the reschedule)14:52
mriedemwe might already have a func test that does reschedules with a volume attached14:53
mriedemi don't see one though14:54
mriedembut shouldn't be hard to write14:54
lyarwoodmdbooth: https://review.openstack.org/#/c/587014/ does that14:58
mdboothlyarwood: Thought I'd seen it recently :)14:59
hvvcbenHi - probably a newb question, and if this isn't the correct channel please advise. I am trying to rework an old neutron ML2 driver for Queens and am having issues with nova-compute during port creation, because bind_host_id = $nodeID is not set during instance create. The nova-compute API call to neutron to create the port doesn't have bind_host_id set. On older versions like Mitaka, that parameter is in15:00
hvvcben the API call, i.e. "binding:host_id": "mymitkaComputehost",15:00
hvvcbenany help or links to doc pertaining to these changes would be greatly appreciated15:00
*** Swami has joined #openstack-nova15:03
*** ivve has quit IRC15:03
*** janki has quit IRC15:03
mriedemmelwitt: is your link to irc in https://review.openstack.org/#/c/589972/ wrong?15:03
melwittoh, yeah it is now because I used "latest". derp. I think I've done that a few times lately15:04
melwittadded a new comment with the right link15:05
hvvcbenmitaka nova-compute api call to neutron = "binding:host_id": "mymitkaComputehost",  Queens nova-compute api call to neutron is more like "binding:host_id": "",15:06
mriedemhvvcben: i see bind_host_id in the neutronv2/api.py code in queens15:06
mriedemare you saying bind_host_id isn't being passed down from the compute manager to allocate_for_instance?15:06
hvvcbenyes,15:07
mriedemi think that was only ever used by the ironic driver15:07
*** rmart04 has quit IRC15:07
mriedemit's still used in queens https://github.com/openstack/nova/blob/stable/queens/nova/compute/manager.py#L139015:08
hvvcbenI am just having trouble figuring out why it does it in mitaka and not in later version, I was thinking the port creation process has been modified and that value would come later in the process15:08
mriedembut as i said, that would only ever have a value for ironic15:08
hvvcbenthis particular driver does interact with hardware and in its present state fails if no Host_id is passed15:09
hvvcbenhardware meaning switch hardware15:10
mriedemthe only difference i see when setting binding:host_id between mitaka and queens is that in mitaka we only set that if the neutron port binding extension was available, and we stopped looking for that sometime later and just assumed it would be available15:11
hvvcbeni was just curious, since i have a default install of mitaka and it passes it (using openvswitch as the driver) and the queens version doesn't: was there some point where that was changed? That's what I am having trouble finding. I thought it may relate to live migration15:11
mriedemyou said you're trying to create an instance, not live migrate it, right?15:11
mriedemthere is nothing immediately obviously different between mitaka and queens in how binding:host_id is handled,15:12
hvvcbenyes yes, but I thought there was some rework done on port creation that affected how nova and neutron deal with port creation as a whole, in an effort to smooth out live migrations15:12
mriedemso you're going to have to debug15:12
mriedemthat's in rocky15:12
hvvcbenyes been trying15:12
mriedemmaybe you mean the migrating_to stuff?15:13
mriedemfor dvr15:13
*** sambetts_ has quit IRC15:13
mriedemif you're not live migrating, you wouldn't hit any of that so shouldn't be a problem15:13
hvvcbengotcha.  has the port creation process changed significantly from mitaka to queens?15:14
hvvcbenas far as what nova does etc?15:14
mriedemyou're talking about like a 2 year window of dev here :)15:16
mriedemi'm not aware of anything significant changing in that flow in that time though, no15:16
*** sambetts_ has joined #openstack-nova15:16
hvvcben: ) I know i know15:17
mriedemare you sure you're not using now-invalid config in queens?15:17
mriedemlike, we could have deprecated some config options in mitaka/newton and they are gone by the time you get to queens15:17
hvvcben ... appreciate i will dig further..  I just mainly need to find a way to get the nova-compute host_id and pass it to neutron in a way during create_port_precommit15:17
*** s10 has quit IRC15:18
mriedemwhich virt driver are you using?15:18
mriedemlibvirt?15:18
mriedemhttps://github.com/openstack/nova/blob/mitaka-eol/nova/virt/driver.py#L158715:18
hvvcbenprobably all of the above...  -- the driver was designed to work with Mitaka and not maintained; there have been quite a few changes in neutron since then obviously (in a good way)15:19
hvvcbenyes it is libvirt15:19
mriedemhttps://github.com/openstack/nova/blob/stable/queens/nova/virt/driver.py#L165715:19
mriedemthe only other thing i can think is by the time we call network_binding_host_id in the compute manager, the instance.host field isn't set yet15:19
hvvcbenyea i think that may be part of a port staging process now where back then  it was more like "Create it right now"15:20
mriedemmelwitt: i'm +2 on the reno https://review.openstack.org/589303 and the rpc alias https://review.openstack.org/589972 so you will need to bug another core15:20
melwittmriedem: ack, thanks15:20
*** Bhujay has quit IRC15:21
mriedemhvvcben: shouldn't have changed this, the ResourceTracker.instance_claim sets the instance.host,15:22
mriedemand that happens before we start the network allocation stuff15:22
openstackgerritJay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple  https://review.openstack.org/59004115:22
openstackgerritJay Pipes proposed openstack/nova master: placement: use simple code paths when possible  https://review.openstack.org/59038815:22
*** alex_xu has quit IRC15:24
hvvcbenthanks mriedem: thanks for the assistance15:25
mriedemdansmith: replied in https://review.openstack.org/#/c/590062/15:26
mriedemhvvcben: np, good luck15:26
openstackgerritEric Fried proposed openstack/nova master: Nix 'new in 1.19' from 1.19 sections for rp aggs  https://review.openstack.org/59038915:27
*** hvvcben has quit IRC15:27
*** dtantsur|brb is now known as dtantsur15:29
*** ccamacho has quit IRC15:29
openstackgerritLee Yarwood proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow  https://review.openstack.org/58701315:29
openstackgerritLee Yarwood proposed openstack/nova master: Add regression test for bug#1784353  https://review.openstack.org/58701415:29
openstackgerritLee Yarwood proposed openstack/nova master: conductor: Recreate volume attachments during a reschedule  https://review.openstack.org/58707115:29
dansmithmriedem: ah dang, I saw update_cells and stopped reading15:31
dansmiththinking that was just the v1 sync thing like instance save15:32
dansmithso nevermind15:32
*** takashin has left #openstack-nova15:37
*** priteau has joined #openstack-nova15:38
mriedemlyarwood: can you hit this? https://review.openstack.org/#/c/590062/15:39
*** dklyle has joined #openstack-nova15:39
lyarwoodmriedem: yup looking15:43
*** psachin has quit IRC15:44
*** rmart04 has joined #openstack-nova15:48
*** tssurya has quit IRC15:48
*** sahid has quit IRC15:49
melwittmriedem: I just happened upon the patch for adding the zvm driver to the support matrix https://review.openstack.org/53272015:53
melwittother doc updates are stacked on top15:53
melwittand I found that no reno was added for the zvm driver at the time of the changes, so I think someone needs to add that15:54
mriedemif you want it, it's likely going to have to be you15:55
mriedemjichen is probably gone for the day15:55
melwittyeah. I was thinking that, given the time factor15:56
openstackgerritJay Pipes proposed openstack/nova master: placement: use simple code paths when possible  https://review.openstack.org/59038815:56
openstackgerritJay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple  https://review.openstack.org/59004115:56
*** dklyle has quit IRC15:59
*** dklyle has joined #openstack-nova16:01
openstackgerritEric Fried proposed openstack/nova master: Adds a test for _get_provider_ids_matching()  https://review.openstack.org/59015016:04
*** jpena is now known as jpena|off16:07
*** Bhujay has joined #openstack-nova16:08
mriedem-1 on the zvm feature support matrix patch16:10
mriedemso if we do an rc2, the zvm docs and such might need to fall into that16:10
melwittack16:10
mriedemthe only mention that it was added will be in your prelude reno16:11
*** holser_ has quit IRC16:12
melwittI know ... I'm writing up its own reno based on the patches, and hopefully efried can help. I would like it to have its own reno with the details and not have the only mention be in the prelude16:12
melwittI didn't realize it was missing a reno of its own16:13
mriedemok, i personally don't think we should hold up https://review.openstack.org/#/c/589303/ on that,16:13
mriedembut ok16:14
efriedI would only be guessing in writing up that reno. I guess it prolly needs to be done by EOB though, huh?16:14
efriedmriedem: No, I agree, I was holding on the other issue.16:14
*** derekh has quit IRC16:14
mriedemthe grammar nit?16:14
mriedemthen let's just fix it inline and approve?16:14
efriedyeah, sounds good.16:15
mriedemmelwitt: ^?16:15
melwittmriedem: you think it's ok for that to be the only mention? if so, I'm fine with it. you know a lot more about this than I do16:15
mriedemi'm fine with it16:15
mriedemthe bigger docs series is what really matters16:15
mriedembut that's not ready for rc116:15
efriedI can try to rip out a reno real quick16:15
melwittok, inline fix it and approve is cool with me then16:15
mriedemyou can always do the docs in rc216:16
efriedmriedem: I'll do the grammar fix and push it.16:16
mriedemack16:16
openstackgerritEric Fried proposed openstack/nova master: Add a prelude release note for the 18.0.0 Rocky GA  https://review.openstack.org/58930316:16
melwittok, I wasn't sure if a docs-only thing was rc2 worthy. if it is, then we can do that16:16
efriedmelwitt, mriedem: done. If we're not worried about getting the reno landed today, I'll happily wait for jichenjc to do it.16:17
mriedemi'm not losing sleep over a detailed reno for the zvm driver which does a very small number of things - the docs are more important to me16:18
mriedemthe prelude mentions it16:18
mriedemif someone wants to learn more, they can find the driver docs16:18
mriedemthe way i look at renos, if it's really detailed, it likely needs to be a doc, because release notes are a one time only thing16:19
mriedemmelwitt: should probably hold up stable releases on https://review.openstack.org/#/q/topic:bug/1784705+status:open16:22
melwittok, that's helpful. fwiw, I was thinking basic detail in the reno like, what operations are supported (spawn, destroy, snapshot, get console output, power actions)16:22
melwittmriedem: ok, will do16:22
mriedemanyone using ironic + ComputeCapabilitiesFilter will be hit by those16:22
mriedemmelwitt: supported ops for the zvm driver are in the feature support matrix16:22
mriedemso i wouldn't put that in the reno16:22
melwittok. efried, we don't need an additional reno ^16:23
efriedOkay, wfm. jichenjc, in case you're snooping later ^16:24
efriedThe prelude could have linked to the admin config doc, if it was landed, but it ain't, so...16:24
*** rmart04 has quit IRC16:25
*** rmart04 has joined #openstack-nova16:25
melwittbased on what mriedem said, we can have a rc2 because of the docs, and land them there16:26
melwittso I'll add those to https://etherpad.openstack.org/p/nova-rocky-release-candidate-todo16:27
melwittok he already added them, thanks16:27
mriedemlyarwood: should i start reviewing https://review.openstack.org/#/c/587013/ again or do you expect more from mdbooth/16:28
mriedem?16:28
lyarwoodmriedem: that should be good now16:28
melwittdansmith, lyarwood: could you pls review these changes for the ironic bug, we're holding the stable releases on those fixes https://review.openstack.org/#/q/topic:bug/1784705+status:open16:31
*** awaugama has quit IRC16:31
lyarwoodmelwitt: ack'd the Pike changes16:32
melwittthanks16:33
*** rmart04 has quit IRC16:35
*** hongbin has quit IRC16:37
*** Swami has quit IRC16:37
*** s10 has joined #openstack-nova16:38
mriedemlyarwood: just a couple of small things in https://review.openstack.org/#/c/587013/16:38
mriedemrandom musings in your functional test too; i.e. i wonder how many volumes we orphan when nova creates the root volume and we reschedule16:44
mriedemnice fun way to go over volume quota16:44
lyarwoodmriedem: hmmm I forgot the compute did that, does it not see the existing bdm?16:46
mriedemthe existing bdm will have source_type=image on it or whatever16:48
mriedemright?16:48
mriedemwe don't update that16:48
mriedemso it will be transformed to a DriverImageBlockDevice or whatever16:48
openstackgerritMerged openstack/nova stable/pike: Fix bad links for admin-guide  https://review.openstack.org/59007216:48
mriedemDriverVolImageBlockDevice16:48
mriedemanyway, haven't tested it, but i'm pretty sure that's been busted since forever16:48
mriedemwe cleanup after ourselves for ports but not volumes16:49
lyarwoodah right understood, should be easy enough to show in another functional test16:51
mriedemmaybe.....i'm not sure the fixture is setup for that really16:51
mriedemdevstack is probably much easier/faster to start16:51
mriedemif you have 2 nodes...16:52
lyarwoodmriedem: I don't have two nodes to hand but I'll make a note to give this a go16:57
*** s10 has quit IRC16:57
*** itlinux has joined #openstack-nova16:58
openstackgerritLee Yarwood proposed openstack/nova master: block_device: Rollback volumes to in-use on DeviceDetachFailed  https://review.openstack.org/59043917:21
*** udesale has quit IRC17:23
*** luksky has quit IRC17:27
*** cdent has quit IRC17:29
openstackgerritMatt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope  https://review.openstack.org/59044517:34
*** gouthamr is now known as gouthamr_away17:37
openstackgerritMatt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope  https://review.openstack.org/59044517:38
*** Bhujay has quit IRC17:39
openstackgerritMerged openstack/nova master: Update the parameter explain when updating a volume attachment  https://review.openstack.org/56518117:44
*** rmart04 has joined #openstack-nova17:47
*** gyee has joined #openstack-nova17:47
*** rmart04 has quit IRC17:50
*** rmart04 has joined #openstack-nova17:51
*** psachin has joined #openstack-nova17:51
*** awaugama has joined #openstack-nova18:07
*** priteau has quit IRC18:15
*** sambetts_ has quit IRC18:18
*** sambetts_ has joined #openstack-nova18:21
*** panda|ruck is now known as panda|ruck|off18:22
*** dtantsur is now known as dtantsur|afk18:24
*** owalsh has quit IRC18:50
*** owalsh has joined #openstack-nova18:51
*** gbarros has joined #openstack-nova18:58
*** luksky has joined #openstack-nova18:58
*** rmart04 has quit IRC19:02
*** mriedem has quit IRC19:11
openstackgerritMatt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope  https://review.openstack.org/59044519:11
*** mriedem has joined #openstack-nova19:11
openstackgerritMatt Riedemann proposed openstack/nova master: Fix image-defined numa claims during evacuate  https://review.openstack.org/58865719:15
openstackgerritMatt Riedemann proposed openstack/nova master: Add encrypted volume support to feature matrix docs  https://review.openstack.org/57025519:18
openstackgerritMatt Riedemann proposed openstack/nova master: api-ref: fix GET /flavors?is_public description  https://review.openstack.org/58809219:28
*** pcaruana has quit IRC19:33
*** gbarros has quit IRC19:35
*** prometheanfire has joined #openstack-nova19:37
prometheanfireis https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R6938 run multiple times or only checked once?19:37
openstackgerritMerged openstack/nova stable/queens: Fix host validity check for live-migration  https://review.openstack.org/59026219:37
openstackgerritMerged openstack/nova master: [placement] api-ref: add description for 1.29  https://review.openstack.org/58940719:37
prometheanfirelive migrations seem to be limited to 1M a sec and never increase19:37
melwitthm19:44
melwittdo you know anything about that mriedem ^19:45
mriedemprometheanfire: linuxbridge?19:45
mriedemhttps://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R538019:46
mriedemare you using linuxbridge i mean19:46
prometheanfireya, lb19:47
prometheanfirevlan interface on the VM19:47
mriedemwell, we should be waiting on network-vif-plugged events from neutron and if we get them, we set the bw back up and resume the live migration, else we should fail the live migration19:48
mriedemi'm assuming you have vif_plugging_timeout left at the default config of 300?19:48
prometheanfirethe migration finishes, it's just slow19:49
prometheanfireI don't think we changed it19:49
mriedemdoes it finish in under 5 minutes?19:49
prometheanfiretakes ~10 min19:49
prometheanfiredebug log19:49
prometheanfirehttps://gist.githubusercontent.com/mheler/475d21b741aa58f320a456c3ac0d0f45/raw/ff76c45c3b6968b0d0514a9a7dcf478f054f6b70/gistfile1.txt19:49
mriedemwe should have either timed out and failed by then or reconfigured the guest to go back to the normal bw19:49
prometheanfirewhich includes x-auth info, great19:50
mriedemi don't see either the timeout or "VIF events received, continuing migration" messages in those logs19:50
prometheanfireya, neither do I, which is why I'm confused19:50
melwittI'm not seeing that message "LOG.debug('VIF events received, continuing migration with max bandwidth configured" in your logs19:51
prometheanfirethat was the first thing I looked for19:51
mriedem_http_log_request /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/keystoneauth1/session.py:375 for that x-auth thing19:51
melwittbah, lag19:51
mriedembesides me, sahid and dansmith are the other two that know about that change,19:51
mriedembut are you sure you actually have that code?19:51
mriedemwould be nice if we logged something like "waiting for events" in that block19:52
prometheanfireya, set using b58c7f033771e3ea228e4b40c796d1bc95a087f5 from nova19:52
mriedemprometheanfire: well your token thing isn't a problem :) https://github.com/openstack/keystoneauth/blob/master/keystoneauth1/session.py#L37119:54
mriedemit's redacted19:54
mriedemprometheanfire: do you know the instance id in question here?19:54
mriedemchecking logs w/o an instance id is kind of hard19:54
prometheanfireyes19:54
mriedemalso,19:54
mriedemare these logs from the source or dest host?19:55
mriedemb/c what we're looking for would be source host19:55
prometheanfirethe logs are from grepping it for c37d7489-a67b-47ea-a4f7-9323804cc55219:55
prometheanfireya, source19:55
mriedem2018-08-09 21:20:20.557 12111 DEBUG nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Received event network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 external_instance_event /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/compute/manager.py:707119:56
mriedem2018-08-09 21:20:20.558 12111 DEBUG nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] No waiting events found dispatching network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 pop_instance_event /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/compute/manager.py:3619:56
mriedem2018-08-09 21:20:20.559 12111 WARNING nova.compute.manager [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Received unexpected event network-vif-plugged-490f6f25-8b88-487c-a76b-62d16e3c0da1 for instance19:56
prometheanfirethis is a pike install btw19:56
mriedemmaybe we're getting the event before we're waiting for it?19:56
prometheanfirebut it was backported to pike, so meh19:56
prometheanfireperhaps19:56
mriedemwe could also be getting ^ from the vif plug that happens on the dest host during pre-live migration19:57
mriedemthe events are going to go to the source host19:57
mriedemwhich isn't waiting for those19:57
prometheanfirenot yet at least, ya19:57
*** eharney has quit IRC19:57
mriedemhttps://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R689919:57
mriedem"They are going to be created by libvirt at the very beginning of the live-migration process."19:58
prometheanfireyep19:58
prometheanfireI read that :P19:58
mriedemthat must mean plug_vifs during pre-live migration on the dest host19:58
mriedemwhich triggers the event from neutron to the source host19:58
mriedemand we're getting it before we start waiting it looks like19:58
prometheanfirewhich isn't waiting yet?19:59
prometheanfireya19:59
mriedembut, you should then hit this https://github.com/openstack/nova/commit/ff747792b8f5aefe1bebb01bdf49dacc01353348#diff-f4019782d93a196a0d026479e6aa61b1R693319:59
mriedemand the migration should fail19:59
prometheanfirealso yes19:59
mriedemadded https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L675 in rocky but that doesn't help you here on pike20:00
prometheanfireyarp20:01
mriedemwe can't backport that either b/c it's got rpc changes in it20:01
mriedemso maybe the raised MigrationError isn't really doing anything?20:01
* prometheanfire shrugs20:01
mriedemi expect you to know exactly how all of this nova code works20:02
prometheanfirelolol20:02
*** psachin has quit IRC20:02
prometheanfireya, I'm honestly surprised that it finished, I expected it to fail20:02
mriedemno it should raise up,20:02
mriedemi was thinking we might be threaded but that's here https://github.com/openstack/nova/blob/ff747792b8f5aefe1bebb01bdf49dacc01353348/nova/virt/libvirt/driver.py#L692820:03
prometheanfiremaybe eventlet.timeout.Timeout isn't the actual error getting raised?20:03
openstackgerritMerged openstack/nova stable/queens: Update nova network info when doing rebuild for evacuate operation  https://review.openstack.org/59006220:03
prometheanfiremaybe we finish the migration before the timeout occurs?20:04
mriedemTimeout is the right error20:04
mriedemprometheanfire: well that's why i asked how long it took,20:04
mriedembut the default timeout is 5 min20:04
mriedemyou said it completed in 10 min20:04
prometheanfirewent from 21:20 to 21:25 ish20:05
mriedemoh, well that's not 10 min L(20:05
mriedem:)20:05
prometheanfireya20:05
mriedemso yeah i bet you completed before the timeout20:05
*** gbarros has joined #openstack-nova20:05
prometheanfire21:20:21.043 to 21:27:12.788 at least20:06
mriedembut honestly,20:06
mriedemthe threading here is messing with my head20:06
prometheanfireMigration running for 410 secs20:06
prometheanfireso over 5 min20:06
prometheanfireyep20:06
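As a quick check of the arithmetic behind "so over 5 min", here is a minimal sketch that computes the elapsed time between the two timestamps quoted above and compares it with the 300-second vif_plugging_timeout default mentioned earlier in the conversation. The variable names are illustrative only, and the timestamps are copied from the chat rather than parsed out of the linked log.

```python
from datetime import datetime

# Timestamps quoted in the chat (time of day only; both fall on the same date).
FMT = "%H:%M:%S.%f"
start = datetime.strptime("21:20:21.043", FMT)
end = datetime.strptime("21:27:12.788", FMT)

# Elapsed wall-clock time of the migration, in seconds.
elapsed = (end - start).total_seconds()

# Default timeout as stated in the chat ("default config of 300").
VIF_PLUGGING_TIMEOUT = 300

exceeded = elapsed > VIF_PLUGGING_TIMEOUT
print(f"elapsed={elapsed:.3f}s, exceeded timeout: {exceeded}")
```

The result is roughly in line with the "Migration running for 410 secs" message: the migration outlasted the timeout window by well over a minute, which is why the missing timeout message is puzzling.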
mriedemwait_for_instance_event is meant to register events to wait20:06
mriedemthen run some code and wait or timeout20:06
mriedem"opthread = utils.spawn(self._live_migration_operation" is what code gets run20:07
mriedembut that should mean we wait until we do "opthread.link(thread_finished, finish_event)"20:08
mriedemeven if we get the timeout, it seems this is pretty dangerous if we've already started the live migration in the hypervisor20:08
mriedemi +2ed this code too...20:09
prometheanfireso we can blame you :P20:09
mriedemkinda need dansmith here20:09
prometheanfireya20:09
mriedemi still don't know why you wouldn't see the timeout message20:09
prometheanfiresame20:09
mriedemfrom your log20:09
mriedem2018-08-09 21:27:20.197 12111 DEBUG nova.virt.libvirt.driver [req-84ca4a17-0d3d-4597-91d6-f5721989dd41 143ee57edd4d4e3b9a165d375d0e7e1a a727713d2c0a4ed69b730d9cb2116af6 - default default] [instance: c37d7489-a67b-47ea-a4f7-9323804cc552] Migration operation thread notification thread_finished /openstack/venvs/nova-r16.2.2/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:695220:09
mriedemsahid will be around in the morning if you can catch him20:11
dansmithcatch me up?20:11
*** nicolasbock has quit IRC20:11
mriedemdansmith: prometheanfire is running a linuxbridge live migration in pike,20:11
mriedemwith that bw slow down patch of sahid's20:11
mriedemthe live migration takes longer than our default vif plugging timeout20:12
dansmiththat got backported I assume?20:12
mriedemyeah (from us). and looks like we actually get the event before sahid's code registers to wait,20:12
mriedembut the weird thing is we don't get the timeout event after 5 minutes20:12
mriedemhttps://gist.githubusercontent.com/mheler/475d21b741aa58f320a456c3ac0d0f45/raw/ff76c45c3b6968b0d0514a9a7dcf478f054f6b70/gistfile1.txt20:12
mriedeminstance is c37d7489-a67b-47ea-a4f7-9323804cc55220:12
dansmithif it comes before we register it gets dropped,20:12
prometheanfirethat's the sender20:12
dansmithbut obviously the point is it's supposed to come after the register as you said20:13
mriedemgets dropped but won't we wait for something that doesn't come and timeout?20:13
dansmithshould timeout yeah20:13
mriedemthe network-vif-plugged is triggered via plug_vifs during pre_live_migration on the dest,20:13
mriedemwhich happens before his code runs to register the waiter20:13
mriedemso it's a total race window20:14
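The race described above can be sketched with a toy event registry. This is only an illustration under stated assumptions: EventRegistry, register and dispatch are hypothetical names and are not nova's actual classes or methods. The sketch shows the two orderings: an event dispatched before any waiter is registered is dropped (compare the "No waiting events found" log line), so a waiter registered afterwards can only time out, whereas registering first lets the event be delivered.

```python
import threading


class EventRegistry:
    """Toy stand-in for per-instance event waiters (hypothetical, not nova code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}

    def register(self, name):
        # Create a waiter for the named event and remember it.
        waiter = threading.Event()
        with self._lock:
            self._waiters[name] = waiter
        return waiter

    def dispatch(self, name):
        # Deliver the event to a registered waiter, or drop it if nobody waits.
        with self._lock:
            waiter = self._waiters.pop(name, None)
        if waiter is None:
            return False  # dropped: no one was waiting yet
        waiter.set()
        return True


EVENT = "network-vif-plugged"

# Racy ordering: the event arrives before the waiter is registered.
racy = EventRegistry()
early_delivered = racy.dispatch(EVENT)       # event is dropped
racy_waiter = racy.register(EVENT)
racy_received = racy_waiter.wait(0.05)       # short timeout for the demo

# Intended ordering: register first, then the event arrives.
ok = EventRegistry()
ok_waiter = ok.register(EVENT)
late_delivered = ok.dispatch(EVENT)
ok_received = ok_waiter.wait(0.05)

print(early_delivered, racy_received, late_delivered, ok_received)
```

In the racy ordering the only possible outcome for the waiter is a timeout, which matches the expectation voiced in the chat that the migration should eventually fail rather than continue silently.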
dansmithugh20:14
mriedemwhich is why we added https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L67520:14
mriedembut not backportable20:14
dansmithI thought it gets triggered by the actual guest starting on the other side, which came from the actual live migration op20:14
mriedemi'd need sahid to confirm that20:14
mriedembut network-vif-plugged, as far as i know, comes from plug_vifs on the dest during pre_live_migration20:15
mriedemwhich is before his code runs20:15
mriedemon the source20:15
dansmithso,20:15
dansmiththe even comes from the tap being created actually20:15
dansmith*event20:15
dansmithso maybe plug is creating a tap before libvirt does but I'm not sure how we'd give it to it20:16
mriedemprometheanfire: i'm assuming you have this https://review.openstack.org/#/c/586965/20:16
mriedem^ fix for the pike backport20:16
prometheanfireya20:17
dansmithI learned this after we were working on that patch though20:17
prometheanfirethat's within the sha I posted earlier20:17
prometheanfireotherwise it wouldn't succeed at all :P20:17
openstackgerritJay Pipes proposed openstack/nova master: placement: use simple code paths when possible  https://review.openstack.org/59038820:17
openstackgerritJay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple  https://review.openstack.org/59004120:17
mriedemprometheanfire: yeah20:17
mriedemvery obvious explosion20:17
openstackgerritJay Pipes proposed openstack/nova master: Adds a test for _get_provider_ids_matching()  https://review.openstack.org/59015020:17
mriedemprometheanfire: you have this? https://review.openstack.org/#/c/510013/20:18
sean-k-mooneymriedem: network-vif-plugged comes from neutron when it finishes wiring up the port20:18
mriedemprometheanfire: this one was fun in that it depended on neutron backports as well20:18
sean-k-mooneyalso i just saw plugging stuff so need to scroll back to get context20:18
prometheanfirethat one I'm not sure, but probably20:18
mriedemmight want to check20:19
*** awaugama has quit IRC20:19
prometheanfirechecking20:19
prometheanfiremerged dec 3 into stable pike https://github.com/openstack/neutron/commits/stable/pike?after=ad8f00236cc57ce9a8f077dd2d32c6fada00e817+13920:20
openstackgerritMatt Riedemann proposed openstack/nova master: Handle binding_failed vif plug errors on compute restart  https://review.openstack.org/58749820:22
dansmithif the event comes from stuff we're doing in pre_dest, it seems unlikely we'd ever win the race in gate20:22
prometheanfireusing at least this version of neutron https://github.com/openstack/openstack-ansible/blob/5c341a7bada78edab5f3d132d55adb00eaf2413f/playbooks/defaults/repo_packages/openstack_services.yml#L12520:23
prometheanfirewhich is from 2018 in may20:23
mriedemidk this is where i thought we'd generate the event https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L777520:23
prometheanfireok, I have to go for a bit, but will be back20:24
dansmithwell,20:24
*** tbachman has quit IRC20:24
dansmithI did some debugging on plug stuff with the godaddy people a month or so ago,20:24
dansmithand read through all the neutron code related to this20:24
dansmithand I was surprised to learn that actually what happens is,20:24
dansmithsomething creates an interface with the right name,20:25
dansmitha periodic in the neutron agent notices, hooks it up and sends the event20:25
dansmithso it's a little less connected to us than I would have thought20:25
mriedemi knew the linuxbridge-agent does polling only b/c sean mooney explained that when we had the issue with waiting for hard reboot events for linuxbridge in the gate20:26
mriedemovs agent listens for an actual event from ovs itself20:26
dansmithmaybe that's why we can win it in the gate, since there was a polling loop20:26
*** Sundar has joined #openstack-nova20:26
mriedemwe use ovs in the gate20:27
mriedemfor most everything20:27
openstackgerritMatt Riedemann proposed openstack/nova master: Handle binding_failed vif plug errors on compute restart  https://review.openstack.org/58749820:27
dansmithI thought we had a LB job?20:27
mriedemneutron has a couple of lb jobs i think20:27
dansmithregardless, you did a LB migration job and it passed a few times at least20:27
mriedemi had rigged up a patch that ran linuxbridge multinode to test sahid's patch20:27
mriedemyeah, could have won the race though right?20:27
Sundarefried: Please ping me when you have a moment20:27
efriedHi Sundar, what's up?20:28
dansmithI guess because of the polling loop we have a decent chance20:28
mriedemand all of our controller services on a single node slowing things down maybe, idk20:28
dansmithbecause pre-dest is pretty long before we get to the event wait, which seems ...crazy to ever win it20:28
mriedemdon't know what prometheanfire's env setup is like20:28
dansmithif so, presumably this patch broke live migration for anyone with a fast system20:28
*** owalsh_ has joined #openstack-nova20:29
mriedemgod i hope so20:29
mriedemthat would be magical20:29
dansmithso, sahid has been arguing to make the timeout non-fatal, reportedly because someone was using a custom driver or something20:29
dansmithbut I wonder if it's actually because this is actually totes broken20:29
SundarWe were discussing the relative roles of plugins and drivers. It seems to me that the distinction is not a hard one. We just need some extension agent with clear APIs for both os-acc and Cyborg. It could be the same module providing two sets of APIs. Does that work for you?20:29
efriedSundar: Absolutely.20:30
dansmithmriedem: melwitt: tbh, if we really think this is that broken (and sounds like it is) then we should probably revert it from the release immediately20:30
mriedemwell the other thing i was saying above was,20:31
melwittoof20:31
SundarGreat. I was trying to delineate the two and it was getting rather ambiguous. Good to have this sorted out. Will send out the os-acc spec with this update20:31
mriedemeven if we get the TimeoutError, we raise MigrationError or whatever,20:31
mriedembut i *think* by that point we've already started the guest transfer20:31
mriedemb/c we call _live_migration_operation20:31
efriedSundar: Well, we should still delineate the two, if they're going to be *able* to be separate modules.20:31
dansmithcorrect20:31
mriedemthe nwait20:31
*** owalsh has quit IRC20:31
mriedemso starting the guest transfer and then timing out on the waiter b/c we registered late kills the nova live migration option but not what's in the hypervisor20:32
mriedemright?20:32
mriedems/option/operation/20:32
dansmithmriedem: um, what?20:32
mriedemif we get the timeout and raise MigrationError20:32
mriedembut have already started the guest transfer20:32
mriedemthe only thing that happens in nova is we call the _rollback_live_migration code,20:33
mriedemwe don't attempt to kill any live migration job that's running in libvirt20:33
mriedemright?20:33
Sundarefried: There will certainly be two sets of APIs: one for instance half of the attach and one for the device half. But both can do device-specific, platform-specific and vendor-specific actions.20:33
efriedSundar: I dig it.20:33
efriedSundar: And I like the idea of being able to supply that code in one module or two.20:33
dansmithlemme look20:33
mriedemin other words,20:34
*** owalsh has joined #openstack-nova20:34
mriedemnova will say "migration failed" but the guest might actually get transferred20:34
mriedemjust really f'ing slowly20:34
dansmithwell, yeah, I mean, the point of this code was to not raise the speed limit until it came20:34
*** owalsh_ has quit IRC20:35
mriedemso did sahid want the timeout to just log and we'd have a finally that always set the bw back up?20:35
dansmithyes20:35
mriedemgiven what seems to be a pretty easy race to fail, that seems like it would have been better20:36
dansmithwhich means you let it go to the other side but without networking20:36
Sundarefried: So, it may be superfluous to have two separate modules, which are separately loaded by Stevedore. os-acc would have to load both, and the distinction in terms of what each module does seems to come down to APIs, rather than anything else. So, we might as well define two sets of APIs, and have one module do both. Internally, of course, the module may have separate packages/sub-modules for different functionalities.20:36
dansmithmriedem: but the goal of the patch wasn't to "maybe catch the plug event", so if it never came in, it really should stop20:36
dansmithmriedem: so I think it should cancel20:36
dansmithwhich I said on the patch a couple of times, but I guess we never even got it that far20:37
SundarWe may also provide common functions in os-acc for specific hypervisors20:37
mriedemdansmith: ok, well the waiter is in the wrong place then, and https://review.openstack.org/#/c/558001/ was the right thing,20:37
mriedembut not backportable20:37
Sundarwhich any driver/plugin/module can invoke20:37
efriedSundar: Offhand I don't see a problem with that. Is it ever going to be the case that you need to run one but not both (i.e. a driver but not its corresponding plugin, or vice versa) on a given system?20:37
sean-k-mooneyefried: Sundar: provided there is a well-defined versioned interface it's ok, but Sundar, i don't think os-acc should be able to alter the hypervisor context20:37
efriedSundar: Let's move to #openstack-cyborg so we're not cross-talking with the others.20:38
sean-k-mooneye.g. just like os-vif os-acc should not be able to modify the libvirt xml20:38
dansmithmriedem: yeah, I was just looking through compute manager wondering why the fsck it was in there too20:38
dansmithmriedem: does that not work for LB for some reason?20:38
*** owalsh_ has joined #openstack-nova20:39
mriedemdoes what not work?20:39
openstackgerritMerged openstack/nova stable/queens: Reload oslo_context after calling monkey_patch()  https://review.openstack.org/58924920:39
mriedemhttps://review.openstack.org/#/c/558001/ ?20:39
dansmithyeah20:39
openstackgerritMerged openstack/nova stable/queens: Fix message for unexpected external event  https://review.openstack.org/58950520:39
*** owalsh has quit IRC20:39
mriedemprometheanfire is failing in pike20:39
mriedemhttps://review.openstack.org/#/c/558001/ is rocky20:39
openstackgerritMerged openstack/nova master: Trivial fix on migration doc  https://review.openstack.org/58902820:39
mriedemb/c we backported sahid's patch20:39
openstackgerritMerged openstack/nova master: Add a prelude release note for the 18.0.0 Rocky GA  https://review.openstack.org/58930320:39
Sundarsean-k-mooney: os-acc may provide device-specific XML snippets, for example, which libvirt driver would compose into a domain XML.20:39
Sundarefried: Sure, joined #openstack-cyborg20:40
dansmithmriedem: no, I realize that20:40
sean-k-mooneySundar: i really hope not or that code should live in the nova tree20:40
efriedsean-k-mooney: Can you join us in -cyborg?20:40
sean-k-mooneyefried: sure20:40
dansmithmriedem: what I'm saying is, because the event gets triggered from pre-migration, the wait should really be up a level in compute manager, which you added in rocky20:40
dansmithmriedem: and I'm asking if there's some reason why the wait in compute manager can't work with LB20:41
dansmithmriedem: so we like just rip sahid's stuff out of everywhere and make sure that you're including events in the compute manager wait20:41
sean-k-mooneyefried: #openstack-cyborg?20:42
efriedyes20:42
mriedemdansmith: it should work for LB as far as i know20:44
dansmithseems like it20:44
mriedemthe only backend i know that won't work,20:44
mriedemis ODL20:44
dansmithin fact20:45
mriedembecause that doesn't send events on vif plug/unplug, only port host binding changes20:45
sean-k-mooneymriedem: lb polls for new interfaces and can miss the addition and removal of interfaces in some cases20:45
dansmithmriedem: you're not only waiting for ovs interfaces there right?20:45
sean-k-mooneyso we can rely on lb to emit the event20:45
mriedemdansmith: correct20:45
sean-k-mooneyor rather the lb l2 agent20:45
dansmithmriedem: so you should be eating them up there, and then his wait is definitely never going to get them right?20:45
dansmithso in pike, I expect it races,20:46
dansmithand in rocky it never ever works at all20:46
mriedemyeah maybe20:47
mriedemi could dig up my 2 node lb ci patch,20:47
dansmithoh sweet baby jesus thank you for this day20:47
mriedemand enable this waiter in nova on master,20:47
mriedemand we'd have to probably turn the vif plugging timeout way down to actually see if we hit a timeout20:47
mriedemotherwise i'd expect in the gate, live migration with a tiny cirros guest not doing anything transfers pretty fast20:47
dansmithyeah20:48
mriedemheh, and i was just going to start mowing and packing20:48
dansmithmelwitt: so, honestly, reverting sahid's thing for rocky needs to be high prio I think20:48
dansmithmelwitt: live migration with LB is completely broken I expect20:48
mriedemi'll update that ci patch20:48
melwittok, so is this a RC1 thing or a RC2 thing?20:49
dansmithmelwitt: and it probably needs to be reverted out of the older releases too20:49
dansmithmelwitt: your call but RCsomething, IMHO20:49
prometheanfireback20:49
prometheanfirebut leaving soonish20:49
dansmithI would think we could do something like what mriedem did in rocky for those releases20:49
melwittok, definitely RC2. trying to do it by the end of today will be hard unless it gets approved, stat20:50
melwitt*definitely RC2, at least20:50
dansmithwell, maybe not since he needed a signal from the remote machine that it was going to do the wait ...20:50
prometheanfiremriedem: I'll try to get a bug reported if you think that's the next step20:51
dansmithalthough the event should still trigger20:51
mriedemprometheanfire: yes please20:51
dansmithprometheanfire: yes20:51
mriedemwe'll track that for rc220:51
prometheanfirerc2?20:51
mriedemwas going to try and see how long a guest transfer takes in the gate20:51
mriedemprometheanfire: today is release candidate 1 day20:51
prometheanfireI thought this didn't hit rocky20:51
prometheanfirebut I'll leave that to you20:51
mriedemprometheanfire: new wrinkle20:51
*** lbragstad has quit IRC20:51
prometheanfireoh, nice20:51
dansmithworse wrinkle20:52
mriedem(1) probably race fail on pike20:52
prometheanfirehappy to help :P20:52
mriedem(2) totes broken on master20:52
prometheanfireeven better20:52
dansmithprometheanfire: hook me up with a bug number and I'll propose the revert20:53
dansmithand I can comment on the bug with all the deets20:53
*** gouthamr_away is now known as gouthamr20:53
dansmithsince mriedem will be busy with the ci patch and packing for da nang20:53
mriedemisn't da nang vietnam?20:53
dansmithyes20:54
openstackgerritMatt Riedemann proposed openstack/nova master: placement: ignore policy scope check failures if not enforcing scope  https://review.openstack.org/59044520:54
prometheanfireI'm hoping to get the user to report, but I will if he went home20:54
dansmiththe revert is complete conflict20:55
dansmithwonderful.20:55
dansmiththe other benefit of doing this in the manager is that we don't need the silly artificial speed limit20:56
dansmithalthough we probably need sahid and libvirt people to confirm that there's not something we're missing here20:56
dansmithbecause I asked him specifically about doing this early on in his patch and he said it wasn't possible, but I believed that qemu/libvirt on the dest machine were responsible for the plugging at the time20:57
dansmithand maybe he did too20:57
mriedemmeanwhile, our granite seller is being an ass and i have to get back to our plumber21:00
dansmith#firstworldrichpersonproblems21:01
mriedemok so looking at a job, we register waiting for events starting here http://logs.openstack.org/98/587498/1/check/nova-live-migration/5ff805a/logs/screen-n-cpu.txt.gz#_Jul_31_17_18_19_56299921:06
mriedemJul 31 17:18:19.562999 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: DEBUG nova.compute.manager [None req-942f438c-3cbb-4ce7-8afb-ecd250c98f75 tempest-LiveAutoBlockMigrationV225Test-1515676049 tempest-LiveAutoBlockMigrationV225Test-1515676049] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Preparing to wait for external event network-vif-plugged-6b030652-5fe6-471a-b7ec-0b70e95159a4 {{(pid=2320) prepare_for_instanc21:06
mriedement /opt/stack/new/nova/nova/compute/manager.py:328}}21:06
mriedempre_live_migration takes about 7 seconds21:07
mriedemJul 31 17:18:26.178239 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: INFO nova.compute.manager [None req-994d9893-545f-47b9-b93e-1e21cb439db7 tempest-LiveMigrationTest-233418614 tempest-LiveMigrationTest-233418614] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Took 6.61 seconds for pre_live_migration on destination host ubuntu-xenial-inap-mtl01-0001077053.21:07
mriedemJul 31 17:18:26.237174 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: DEBUG nova.virt.libvirt.driver [None req-994d9893-545f-47b9-b93e-1e21cb439db7 tempest-LiveMigrationTest-233418614 tempest-LiveMigrationTest-233418614] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] Starting monitoring of live migration {{(pid=2320) _live_migration /opt/stack/new/nova/nova/virt/libvirt/driver.py:7555}}21:09
mriedemstart monitoring the live migration there ^21:09
mriedemlive migration complete:21:09
mriedemJul 31 17:18:27.486275 ubuntu-xenial-inap-mtl01-0001077052 nova-compute[2320]: INFO nova.compute.manager [None req-14ccce2e-8610-47b7-aba2-77f6fc468b61 tempest-LiveMigrationRemoteConsolesV26Test-255629851 tempest-LiveMigrationRemoteConsolesV26Test-255629851] [instance: 7f68c430-f565-433a-8f87-27b9a00d29a0] VM Migration completed (Lifecycle Event)21:09
mriedemheh 1 second?21:10
dansmithmakes sense.. I doubt a cirros guest has more than a hundred meg of dirty ram21:10
dansmithwhich is 1 second at gigE21:11
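The back-of-the-envelope estimate above can be checked with a short calculation. This is a rough sketch only: the function name is hypothetical, and it ignores protocol overhead and pages re-dirtied during the copy, both of which would lengthen a real transfer.

```python
def transfer_seconds(dirty_mb, link_mbit_per_s):
    """Seconds to copy `dirty_mb` megabytes over a link of
    `link_mbit_per_s` megabits per second (8 bits per byte)."""
    return dirty_mb * 8 / link_mbit_per_s


# About 100 MB of dirty RAM over gigabit Ethernet (1000 Mbit/s)
gige_estimate = transfer_seconds(100, 1000)
```

Under these assumptions the result is a little under one second, in line with the roughly one second observed in the job log.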
mriedemi never see "VIF events received, continuing migration"21:11
dansmithit's not LB21:11
dansmithright?21:11
mriedemoh right duh21:11
prometheanfireyou'll probably want to retitle the bug https://bugs.launchpad.net/nova/+bug/178634621:11
openstackLaunchpad bug 1786346 in OpenStack Compute (nova) "live migrations slow" [Undecided,New]21:11
prometheanfiremriedem: dansmith ^21:11
dansmithprometheanfire: thanks21:11
prometheanfireif you can let me know when you update the bug with details I'd appreciate it21:11
dansmithI'm trying to get the revert to even pass tests and then I will21:12
prometheanfirethanks21:14
*** dosaboy has joined #openstack-nova21:14
*** dave-mccowan has quit IRC21:17
mriedemdansmith: ok so https://review.openstack.org/553608 should do the wait in compute now21:18
*** rmart04 has joined #openstack-nova21:18
dansmithmriedem: cool, updating the bug now and working on the revert in parallel, so we can make that depend on the revert to be sure we don't get the timeout message at least right?21:19
*** mhen has quit IRC21:19
mriedemwell, that's why i was looking at timings,21:19
mriedembecause this means we'll consume the network-vif-plugged event from pre_live_migration before we call driver.live_migration which does the bw stuff,21:20
*** gouthamr is now known as gouthamr|brb21:20
mriedemthe event won't come for that waiter,21:20
mriedembut the guest transfer is so fast, won't we just finish the operation before we ever had a chance to timeout?20:20
mriedemlike, do i need a patch that puts a fake sleep in the driver's live migration metohd?21:20
mriedem*method21:20
*** gouthamr|brb is now known as gouthamr21:20
*** rmart04 has quit IRC21:20
*** gouthamr is now known as gouthamr|brb21:21
dansmitheven at 1MB/s?21:21
mriedemcould set the vif_plugging_timeout to like 1 minute, and add a 30 second sleep in the driver21:21
mriedemwell,21:21
dansmithshould go slower there, but I guess it won't take long enough21:21
mriedemmaybe not, but the test will timeout before the 5 minute vif_plugging_timeout i think21:21
dansmithyeah okay so we'll have to force it down I guess21:21
mriedemjust wondering if i should set the vif_plugging_timeout to like 1 minute21:22
dansmith30sec but yeah21:23
*** mhen has joined #openstack-nova21:23
mriedemok updated; hopefully my zuul fu is strong enough21:23
jaypipesmelwitt, dansmith: regarding https://review.openstack.org/#/c/540258, even if we fix the scheduler/top-level issues around server group affinity and multiple cells, that's still not going to fix the eleventh-hour on-the-compute-node checks that currently run just for affinity groups, though, right? I mean, the computes can't talk cross-cell anyway so there would be no way for those on-compute-node checks to run...21:23
mriedemjaypipes: yes https://review.openstack.org/#/c/540258/8/nova/scheduler/utils.py@73821:24
mriedem"Also note that we could be racing if we have multiple server create  requests for the same affinity group and the scheduler decides to put  them each in different cells - the late affinity check in the compute  won't resolve that because the upcall check is targeted to the cell the  compute is in, and won't see any other hosts for other members in other  cells. Separate bug though..."21:25
*** owalsh_ is now known as owalsh21:25
jaypipesmriedem: ack, ok, just wanted to verify I wasn't crazypants.21:26
*** slaweq has quit IRC21:26
mriedemgood to have more than just me thinking that21:26
mriedem*that it's an issue..21:27
mriedemnot that you're (not?) crazy21:27
dansmithprometheanfire: commented on the bug21:27
melwittyeah, it makes sense. I was re-thinking about the upcalls described in https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls and it's true that they can only work if you're single cell21:28
melwittonce you're multi-cell/split-MQ I think none of them can work21:28
smcginnismelwitt: Howdy. How are things coming along for the RC?21:29
mriedemha21:30
*** rmart04 has joined #openstack-nova21:30
dansmithsmcginnis: awesome21:30
dansmiththey're going awesome21:30
dansmiththanks for asking21:30
melwitthaha ....21:30
smcginnisAnd I know no one here would ever be sarcastic so... that's great!21:31
smcginnis:P21:31
melwittsmcginnis: we stumbled upon something that we'll need to fix for rc2. but as for rc1, I'm waiting on the RPC version alias patch to land, then will propose the release for rc121:31
smcginnismelwitt: Cool, sounds good. I know a few others already know they will need to get an RC2, so that's no big deal. Thanks.21:32
mriedemdansmith: so one comment on the bug,21:33
mriedemby default the compute manager won't wait for the event21:33
mriedemthe config is false for backward compat21:33
melwittmriedem: I guess, since we're having a rc2, should I just leave the rpc version alias for then? or get it for rc1?21:33
melwittsmcginnis: we're not alone... :)21:33
dansmithmriedem: oh? I didn't see a config valve21:34
mriedemyes, live_migration_wait_for_vif_plug21:34
mriedemb/c not all network backends send the event for just vif plugging (ODL)21:34
*** rmart04 has quit IRC21:34
smcginnismelwitt: We can wait a bit. Unless you think it will be several hours yet.21:34
mriedemmelwitt: probably fine either way21:35
dansmithmriedem: ah, so we handled that in sahid's patch by looking for the vif_Type21:35
melwittmriedem: thx. probably will just go ahead now and let that be in rc221:35
dansmithmaking people know to opt into proper behavior kinda sucks21:35
melwittthe version alias was rechecked at 13:30 so it's got awhile before it will have a chance to merge21:36
mriedemtrue,21:36
mriedembut ODL shows up as ovs vif_type21:36
dansmithoh21:36
dansmithwell, that sucks21:36
mriedemwhich means we have no idea how to not wait for ODL21:36
mriedemremember the hard reboot wait fiasco?21:36
dansmithwell, I'd say let them opt out21:36
dansmithpersonally21:36
dansmithbut whatever21:36
mriedemthat's why i plan on making the option True by default in stein21:36
melwittsmcginnis: the version alias patch was rechecked an hour ago, so it'll be awhile to merge, if the gate doesn't fail on us again. but since we're having a rc2, that patch will be fine to go into rc2, so I can just cut rc1 now21:37
dansmithoh I see, I just read your comment21:37
dansmithokay21:37
mriedemi might have also defaulted to False when i originally thought we'd backport this21:37
smcginnismelwitt: Up to you. I'm fine waiting awhile.21:37
mriedemsmcginnis doesn't have to go to china tomorrow21:37
melwittheh21:37
smcginnis:)21:38
openstackgerritDan Smith proposed openstack/nova master: Revert "libvirt: slow live-migration to ensure network is ready"  https://review.openstack.org/59053821:40
dansmiththat was a super nasty revert, fyi21:40
dansmithso look at it with critical eyes21:40
melwittmriedem, get out soul crusher #321:40
*** gouthamr|brb is now known as gouthamr21:40
melwittsmcginnis: am I to include cycle-highlights in the patch? I see that was done for queens21:45
melwitt*in the release patch21:45
prometheanfiredansmith: thanks21:46
smcginnismelwitt: Ideally, yes. Marketing type folks would love to have that.21:48
melwittok, I will include them21:48
smcginnismelwitt: It can be a follow up patch too though.21:48
melwittthanks21:48
*** lbragstad has joined #openstack-nova21:55
mriedemdansmith: ok done21:56
mriedemthe params stuff looks like it made that terrible21:56
dansmithyes, yes it did21:57
dansmithmriedem:  you want a reno that says what exactly?21:57
*** mchlumsky has quit IRC21:58
mriedemwell we can revert this because the original bug is fixed with the new config option right?21:58
dansmiththat bug $orig was solved automatically but because of bug $new you must now enable $conf?21:58
mriedemright21:58
dansmiththe original bug was arguably less bad than the current state21:58
dansmithokay21:58
mriedemthe chances of anyone even having picked up that fix on stable already and be relying on it are pretty slim, at least for upstream, but you guys sound like you had at least one major customer that needed this21:59
dansmiththey did, but it never worked for them.. I think I now know why :)21:59
dansmithmriedem: and are you asking me to actually clean up those tests here or just commenting about later reverts?22:01
melwittmriedem: rc1 release proposed https://review.openstack.org/59057422:04
melwittI tried to pick the top highlights, let me know if I should add/remove based on your opinion22:05
mriedemdansmith: just commenting, and that we can clean up that other unused stuff in the later separate revert22:05
openstackgerritDan Smith proposed openstack/nova master: Revert "libvirt: slow live-migration to ensure network is ready"  https://review.openstack.org/59053822:07
*** luksky has quit IRC22:07
*** neiljerram has quit IRC22:08
dansmithsince we're doing this in rc2, and since melwitt will be up early tomorrow to ping him anyway, I assume we're going to wait for sahid's ack before putting this in?22:08
mriedemmelwitt: lgtm22:08
mriedemyeah, also waiting on the recreate in my ci patch22:09
dansmithyeah22:09
melwittyeah, let's talk to sahid tomorrow22:09
*** rcernin has joined #openstack-nova22:09
*** slaweq has joined #openstack-nova22:11
*** imacdonn has quit IRC22:12
*** imacdonn has joined #openstack-nova22:12
*** tobasco is now known as tobias-urdin22:14
mriedemand,22:15
mriedemjust wrapped up my plumbing thing22:15
mriedemit's all coming together22:15
*** slaweq has quit IRC22:15
melwittgranite all stars22:15
mriedemwell you see the granite tops come with a free sink but i need to know the dimensions for the plumber otherwise we needed to order our own which costs extra obviously and there is a time crunch and just omwoeoweitew22:16
melwitthaha22:16
* mriedem goes to mow the lawn - the CI job is running tempest now22:21
mriedemhttps://review.openstack.org/#/c/553608/22:21
*** itlinux has quit IRC22:23
*** _ix has quit IRC22:29
openstackgerritMerged openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races  https://review.openstack.org/58241322:31
openstackgerritMerged openstack/nova master: Update ssh configuration doc  https://review.openstack.org/58984422:31
openstackgerritMerged openstack/nova stable/queens: [placement] Retry allocation writes server side  https://review.openstack.org/58856922:35
openstackgerritMerged openstack/nova stable/pike: Reload oslo_context after calling monkey_patch()  https://review.openstack.org/58925122:38
*** sambetts_ has quit IRC22:52
*** claudiub has quit IRC22:55
*** evrardjp has quit IRC22:55
*** sambetts_ has joined #openstack-nova22:58
*** sambetts_ has quit IRC23:02
openstackgerritmelanie witt proposed openstack/nova master: Add functional test for affinity with multiple cells  https://review.openstack.org/58507323:02
openstackgerritmelanie witt proposed openstack/nova master: Make scheduler.utils.setup_instance_group query all cells  https://review.openstack.org/54025823:02
*** sambetts_ has joined #openstack-nova23:06
*** gyee has quit IRC23:06
melwittguess I'll be holding off on stable releases because of the slow live migration issue23:06
*** slaweq has joined #openstack-nova23:11
*** gbarros has quit IRC23:11
*** slaweq has quit IRC23:16
*** efried has quit IRC23:31
*** efried has joined #openstack-nova23:31
*** gbarros has joined #openstack-nova23:50
*** slagle has joined #openstack-nova23:55
