*** liuyulong has quit IRC | 00:08 | |
*** hamzy has joined #openstack-nova | 00:09 | |
*** tonyb has quit IRC | 00:16 | |
*** tetsuro_ has joined #openstack-nova | 00:28 | |
*** tetsuro_ has quit IRC | 00:36 | |
*** gyee has quit IRC | 00:36 | |
*** edmondsw has joined #openstack-nova | 00:37 | |
*** edmondsw has quit IRC | 00:42 | |
*** tbachman has quit IRC | 00:43 | |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races https://review.openstack.org/582413 | 00:58 |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: Add functional regression test for bug 1783613 https://review.openstack.org/588416 | 00:58 |
openstack | bug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt) | 00:58 |
openstackgerrit | Merged openstack/nova master: In Python3.7 async is a keyword [1] https://review.openstack.org/584365 | 00:58 |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: [stable only] Add functional regression test for bug 1783613 https://review.openstack.org/588416 | 00:59 |
openstack | bug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt) | 00:59 |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races https://review.openstack.org/582413 | 00:59 |
*** mrsoul has joined #openstack-nova | 01:06 | |
*** frankwang has joined #openstack-nova | 01:14 | |
melwitt | mriedem_afk: I added a functional regression test that might help demonstrate the bug ^ | 01:15 |
melwitt | customer hit an issue around this so I proposed it upstream too in case it can help | 01:18 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 01:30 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 01:31 |
*** mriedem_afk has quit IRC | 01:34 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 01:34 |
*** tetsuro_ has joined #openstack-nova | 01:36 | |
*** hongbin has joined #openstack-nova | 01:44 | |
lbragstad | melwitt: oh - so don't try and support per user quotas? | 01:44 |
*** gbarros has joined #openstack-nova | 01:47 | |
openstackgerrit | Zhenyu Zheng proposed openstack/nova master: Update installation guide to be more clear about cellsv2 https://review.openstack.org/584244 | 01:51 |
openstackgerrit | zhufl proposed openstack/nova master: Fix none-ascii char in doc https://review.openstack.org/588422 | 01:56 |
*** gbarros has quit IRC | 01:57 | |
*** tbachman has joined #openstack-nova | 01:57 | |
*** Dinesh_Bhor has joined #openstack-nova | 01:57 | |
*** tetsuro_ has quit IRC | 02:04 | |
*** threestrands has joined #openstack-nova | 02:11 | |
*** tonyb has joined #openstack-nova | 02:22 | |
*** threestrands has quit IRC | 02:23 | |
*** edmondsw has joined #openstack-nova | 02:25 | |
*** edmondsw has quit IRC | 02:30 | |
*** Kevin_Zheng has joined #openstack-nova | 02:34 | |
*** psachin has joined #openstack-nova | 02:34 | |
*** dave-mccowan has quit IRC | 02:35 | |
openstackgerrit | Vishakha Agarwal proposed openstack/nova master: No change in field 'updated' in server https://review.openstack.org/586446 | 03:03 |
*** tbachman_ has joined #openstack-nova | 03:09 | |
*** tbachman has quit IRC | 03:12 | |
*** tbachman_ is now known as tbachman | 03:12 | |
*** frankwang has quit IRC | 03:31 | |
*** udesale has joined #openstack-nova | 03:31 | |
*** frankwang has joined #openstack-nova | 03:31 | |
*** itlinux_ has joined #openstack-nova | 03:33 | |
*** hongbin has quit IRC | 03:52 | |
*** Dinesh_Bhor has quit IRC | 03:59 | |
*** ratailor has joined #openstack-nova | 03:59 | |
*** frankwang has quit IRC | 04:05 | |
*** diga has joined #openstack-nova | 04:14 | |
*** sridharg has joined #openstack-nova | 04:17 | |
*** frankwang has joined #openstack-nova | 04:20 | |
*** liuyulong has joined #openstack-nova | 04:32 | |
*** udesale has quit IRC | 04:35 | |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races https://review.openstack.org/582413 | 04:38 |
*** dklyle has quit IRC | 04:39 | |
*** hshiina has joined #openstack-nova | 04:47 | |
*** tetsuro_ has joined #openstack-nova | 04:55 | |
*** udesale has joined #openstack-nova | 04:55 | |
*** frankwang has quit IRC | 04:58 | |
*** tetsuro_ has quit IRC | 05:02 | |
*** Dinesh_Bhor has joined #openstack-nova | 05:04 | |
*** tetsuro_ has joined #openstack-nova | 05:08 | |
*** tetsuro_ has quit IRC | 05:10 | |
*** tetsuro__ has joined #openstack-nova | 05:10 | |
*** pmannidi has joined #openstack-nova | 05:13 | |
*** frankwang has joined #openstack-nova | 05:37 | |
*** frankwang has quit IRC | 05:38 | |
*** frankwang has joined #openstack-nova | 05:39 | |
*** jaosorior has quit IRC | 05:40 | |
*** jaosorior has joined #openstack-nova | 05:41 | |
*** janki has joined #openstack-nova | 05:44 | |
*** tetsuro__ has quit IRC | 05:46 | |
*** tetsuro_ has joined #openstack-nova | 05:49 | |
openstackgerrit | Vishakha Agarwal proposed openstack/nova master: 'Updated_at' is NULL when show aggregate info https://review.openstack.org/580271 | 05:57 |
*** tetsuro_ has quit IRC | 05:57 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 06:06 |
*** Luzi has joined #openstack-nova | 06:13 | |
*** sridharg has quit IRC | 06:22 | |
*** gibi is now known as giblet | 06:25 | |
*** chason has quit IRC | 06:32 | |
*** chason[m] has quit IRC | 06:32 | |
*** chason has joined #openstack-nova | 06:44 | |
*** ccamacho has joined #openstack-nova | 06:45 | |
alex_xu | stephenfin: sorry, I just sent that to the wrong channel. The sample files were deleted by this commit https://review.openstack.org/#/c/149129/, and the API sample test doesn't actually validate the request body, so nothing complained; those files are just for documentation. | 06:48 |
gmann | alex_xu: stephenfin this will fix - https://review.openstack.org/#/c/588420/4 | 06:52 |
*** tetsuro_ has joined #openstack-nova | 06:52 | |
alex_xu | gmann: thanks | 06:53 |
alex_xu | gmann: but I'm wondering why we cleaned up those empty files in https://review.openstack.org/#/c/149129/ | 06:54 |
*** rcernin has quit IRC | 06:54 | |
gmann | alex_xu: not sure why we removed them, maybe because they are just empty | 06:55 |
alex_xu | gmann: yea, anyway, your fix is better | 06:56 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 07:00 |
gmann | alex_xu: done ^^ | 07:00 |
*** chason has quit IRC | 07:01 | |
*** trungnv has quit IRC | 07:01 | |
*** chason has joined #openstack-nova | 07:02 | |
*** annp has quit IRC | 07:02 | |
alex_xu | gmann: thanks | 07:02 |
*** tetsuro_ has quit IRC | 07:03 | |
*** kaisers has quit IRC | 07:03 | |
*** tetsuro_ has joined #openstack-nova | 07:04 | |
*** blkart has quit IRC | 07:07 | |
*** chason has quit IRC | 07:08 | |
*** pmannidi has quit IRC | 07:16 | |
openstackgerrit | Yongli He proposed openstack/nova master: Load expected attr pci_devices while migrate https://review.openstack.org/588455 | 07:23 |
openstackgerrit | Merged openstack/nova master: Add another up-call to the cells v2 caveats list https://review.openstack.org/581910 | 07:31 |
*** tetsuro_ has quit IRC | 07:41 | |
*** chason has joined #openstack-nova | 07:44 | |
*** tetsuro_ has joined #openstack-nova | 07:44 | |
*** mschuppert has joined #openstack-nova | 07:49 | |
*** tetsuro__ has joined #openstack-nova | 07:54 | |
*** tetsuro_ has quit IRC | 07:55 | |
*** Bhujay has joined #openstack-nova | 07:56 | |
*** dtantsur|afk is now known as dtantsur | 08:01 | |
*** tommylikehu is now known as tommylikehu2 | 08:02 | |
*** tommylikehu2 is now known as tommylikehu | 08:03 | |
*** derekh has joined #openstack-nova | 08:03 | |
*** tommylikehu is now known as tommylikehu_afk | 08:04 | |
*** Bhujay has quit IRC | 08:05 | |
*** tetsuro__ has quit IRC | 08:07 | |
*** tommylikehu_afk is now known as tommylikehu | 08:09 | |
*** jpena|off is now known as jpena | 08:20 | |
giblet | stephenfin: you found it, so you can make sure it gets fixed ;) https://review.openstack.org/#/c/588420/ | 08:22 |
*** tetsuro_ has joined #openstack-nova | 08:27 | |
openstackgerrit | Yikun Jiang (Kero) proposed openstack/nova master: Fix nits in resource_provider.py https://review.openstack.org/588470 | 08:28 |
*** vivsoni_ has quit IRC | 08:29 | |
*** tetsuro_ has quit IRC | 08:40 | |
*** tetsuro_ has joined #openstack-nova | 08:41 | |
*** tssurya has joined #openstack-nova | 08:49 | |
*** tssurya has quit IRC | 08:50 | |
*** sambetts_ is now known as sambetts | 09:03 | |
*** jaosorior has quit IRC | 09:03 | |
*** Dinesh_Bhor has quit IRC | 09:07 | |
*** cdent has joined #openstack-nova | 09:09 | |
*** chason has quit IRC | 09:10 | |
tobasco | is there any manual process that needs to be performed if you get a lot of this? | 09:12 |
tobasco | http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/resource_tracker.py#n1308 | 09:12 |
tobasco | why wouldn't it clear allocations if the instance doesn't exist? | 09:12 |
openstackgerrit | zhufl proposed openstack/nova master: xx_instance_type_id in list_migrations should be integer https://review.openstack.org/588481 | 09:12 |
*** frankwang has quit IRC | 09:18 | |
*** frankwang has joined #openstack-nova | 09:19 | |
*** deepak_mourya has quit IRC | 09:25 | |
*** chason has joined #openstack-nova | 09:28 | |
*** hshiina has quit IRC | 09:29 | |
*** Dinesh_Bhor has joined #openstack-nova | 09:29 | |
*** avolkov has joined #openstack-nova | 09:31 | |
*** tetsuro_ has quit IRC | 09:31 | |
*** vivsoni has joined #openstack-nova | 09:31 | |
*** Dinesh_Bhor has quit IRC | 09:41 | |
*** liuyulong has quit IRC | 09:43 | |
openstackgerrit | Yikun Jiang (Kero) proposed openstack/nova master: Fix nits in resource_provider.py https://review.openstack.org/588470 | 09:55 |
*** obre is now known as obre_ | 09:58 | |
*** obre_ is now known as obre | 09:58 | |
*** Dinesh_Bhor has joined #openstack-nova | 10:03 | |
*** obre has quit IRC | 10:04 | |
*** obre has joined #openstack-nova | 10:06 | |
*** chason has quit IRC | 10:06 | |
*** chason has joined #openstack-nova | 10:07 | |
*** chason has quit IRC | 10:12 | |
* giblet takes the rest of the day easy | 10:15 | |
*** fanzhang has quit IRC | 10:18 | |
cdent | giblet++ | 10:20 |
*** panda|rover has joined #openstack-nova | 10:22 | |
panda|rover | Hi, I'm trying to gather console logs for nova instances, but it seems the logs reset at boot. Is there a way to keep the console log persistent across reboots? | 10:29 |
*** Dinesh_Bhor has quit IRC | 10:35 | |
*** frankwang has quit IRC | 10:39 | |
*** cdent has quit IRC | 10:43 | |
*** diga has quit IRC | 10:54 | |
openstackgerrit | Chen proposed openstack/nova master: Revert task_state to none for LM failure due to invalid dest https://review.openstack.org/588512 | 11:01 |
*** jpena is now known as jpena|lunch | 11:03 | |
*** Yingxin has quit IRC | 11:07 | |
sean-k-mooney | tobasco: there are some bugs related to live migration that can cause allocations to leak | 11:13 |
*** amarao has joined #openstack-nova | 11:14 | |
amarao | Hello. I found that if I remove the image an instance was booted from, migration no longer uses the proper aggregate based on that image's metadata. Does anyone know anything about this? | 11:15 |
tobasco | so we've been pounding our cloud with rally, so if my logs contain an excessive number of such statements, that would probably be from rally live migrations | 11:15 |
tobasco | should I be worried? I assume I would want to release those allocations manually somehow | 11:15 |
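A minimal sketch of how an operator might spot the leaked allocations tobasco describes, assuming the JSON body of Placement's `GET /resource_providers/{uuid}/allocations` has already been fetched and that the UUIDs of the instances and in-progress migrations on that host are known (migrations can hold allocations during a move). The helper name and inputs are hypothetical; it only reports candidates, and each one should be checked before its allocations are removed, for example with `DELETE /allocations/{consumer_uuid}`.

```python
from typing import Dict, Iterable, List


def find_orphaned_consumers(
    provider_allocations: Dict,
    instance_uuids: Iterable[str],
    migration_uuids: Iterable[str] = (),
) -> List[str]:
    """Hypothetical helper: consumers holding allocations on this provider
    that match no known instance or migration on the host."""
    known = set(instance_uuids) | set(migration_uuids)
    consumers = provider_allocations.get("allocations", {})
    # The response is a dict keyed by consumer UUID; sort for stable output.
    return sorted(c for c in consumers if c not in known)


body = {
    "resource_provider_generation": 7,
    "allocations": {
        "inst-a": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
        "inst-gone": {"resources": {"VCPU": 1, "MEMORY_MB": 512}},
        "mig-1": {"resources": {"VCPU": 2, "MEMORY_MB": 2048}},
    },
}
print(find_orphaned_consumers(body, ["inst-a"], ["mig-1"]))  # ['inst-gone']
```

Leaving deletion out of the helper keeps it safe to run against a busy cloud: a consumer that looks orphaned may belong to an operation still in flight.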
*** tssurya has joined #openstack-nova | 11:17 | |
*** vivsoni has quit IRC | 11:17 | |
*** tbachman_ has joined #openstack-nova | 11:22 | |
*** tbachman has quit IRC | 11:25 | |
*** tbachman_ is now known as tbachman | 11:25 | |
*** cdent has joined #openstack-nova | 11:26 | |
*** tbachman has quit IRC | 11:28 | |
openstackgerrit | Merged openstack/nova master: Remove unused request API sample template https://review.openstack.org/588420 | 11:29 |
*** dave-mccowan has joined #openstack-nova | 11:44 | |
*** jpena|lunch is now known as jpena | 11:57 | |
*** _pewp_ has quit IRC | 12:03 | |
*** _pewp_ has joined #openstack-nova | 12:04 | |
*** hemna_ has quit IRC | 12:04 | |
openstackgerrit | Liam Young proposed openstack/nova master: Target metadata requests at the correct cell. https://review.openstack.org/588520 | 12:06 |
*** ratailor has quit IRC | 12:08 | |
*** tbachman has joined #openstack-nova | 12:13 | |
openstackgerrit | Merged openstack/nova master: Docs: Add Placement to Nova system architecture https://review.openstack.org/584338 | 12:13 |
*** rmart04 has joined #openstack-nova | 12:14 | |
*** hemna_ has joined #openstack-nova | 12:15 | |
*** panda|rover is now known as panda|rover|off | 12:17 | |
*** tbachman has quit IRC | 12:17 | |
*** tbachman has joined #openstack-nova | 12:19 | |
openstackgerrit | Liam Young proposed openstack/nova master: Remove Neutron MetaAPIProxy from cellsv2-layout https://review.openstack.org/588525 | 12:20 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: Cleanup comp_node, res_prov, services, aggregate_hosts during cell deletion https://review.openstack.org/546660 | 12:30 |
*** mriedem has joined #openstack-nova | 12:39 | |
mriedem | cfriesen: what was the reason for needing a POST /os-services API to create nova-compute services on a given host? https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-f3afe2522f9c92f5705f0ff5cf343865R246 | 12:40 |
mriedem | which is also, btw, not multi-cell aware since it doesn't rely on the host mapping | 12:40 |
mriedem | sean-k-mooney: check this out https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-99e4b3f7232bf35155ff8b590b0ea589R44 | 12:44 |
sean-k-mooney | mriedem: clicking, but not sure I want to | 12:47 |
sean-k-mooney | haha | 12:47 |
*** efried is now known as fried_rice | 12:48 | |
sean-k-mooney | mriedem: that's in the API. That is not a bad idea, to be honest | 12:48 |
sean-k-mooney | mriedem: we don't document in the API that, when using the libvirt driver, we detach all PCI/SR-IOV devices on suspend, which is hostile to a guest application that was using them | 12:49 |
sean-k-mooney | pause would not detach the devices. | 12:49 |
mriedem | sean-k-mooney: i know https://bugs.launchpad.net/nova/+bug/1785246 | 12:51 |
openstack | Launchpad bug 1785246 in OpenStack Compute (nova) "Compute API reference should describe pause and suspend operations" [Medium,Confirmed] | 12:51 |
sean-k-mooney | mriedem: we likely can do this where I wanted to in the snapshot case after talking to dansmith, due to concern about data corruption from not flushing buffers, but this seems OK | 12:51 |
sean-k-mooney | ya, I was thinking about that yesterday after we were talking about it. | 12:51 |
mriedem | windriver could have just disabled the suspend/resume apis with policy | 12:52 |
mriedem | rather than change the behavior | 12:52 |
sean-k-mooney | it's an implementation detail of the libvirt driver; I'm not sure we should be leaking it through the API | 12:52 |
sean-k-mooney | that said, we should document it | 12:52 |
bauzas | mriedem: hola | 12:53 |
sean-k-mooney | fried_rice: any idea if the IBM drivers detach PCI devices from the guest on suspend? | 12:53 |
bauzas | mriedem: I was thinking on cherry-picking https://review.openstack.org/#/c/584204 (I mean, the series) to Queens | 12:54 |
bauzas | mriedem: you okay with it ? | 12:54 |
fried_rice | sean-k-mooney: I would only be able to answer for PowerVM, and... It's possible suspend is an operation we don't support. /me checks support matrix... | 12:55 |
sean-k-mooney | fried_rice: I'm looking at the in-tree driver now. | 12:55 |
mriedem | bauzas: no | 12:55 |
mriedem | RequestSpec.is_bfv is an rpc api bump | 12:56 |
mriedem | so not backportable | 12:56 |
bauzas | oh right | 12:56 |
mriedem | dansmith and i talked about that while he was writing the patch | 12:56 |
sean-k-mooney | mriedem: I might start working on that docs bug, by the way, but I'll need stephenfin etc. to check it since my written expression is not always valid English :) | 12:56 |
fried_rice | sean-k-mooney: https://docs.openstack.org/nova/latest/user/support-matrix.html#operation_suspend_driver_powervm | 12:57 |
mriedem | sean-k-mooney: sure | 12:57 |
mriedem | i'm also not saying we should copy the libvirt description of those operations into the api, | 12:57 |
mriedem | if it's not the same behavior across virt drivers | 12:57 |
sean-k-mooney | mriedem: I agree, but we should likely add a note for the different backends. https://docs.openstack.org/nova/latest/user/support-matrix.html#operation_suspend should probably have a note too | 12:59 |
*** nicolasbock has joined #openstack-nova | 13:00 | |
bauzas | mriedem: for some reason, I missed https://review.openstack.org/#/c/580720/ in the series | 13:00 |
bauzas | my bad | 13:00 |
*** edmondsw_ has joined #openstack-nova | 13:00 | |
*** eharney has joined #openstack-nova | 13:07 | |
sean-k-mooney | interesting... the xenapi driver appears to just suspend. Both hyperv and vsphere end up delegating to their respective hypervisors' suspend, so this appears to only be a thing for libvirt. | 13:08 |
*** gbarros has joined #openstack-nova | 13:12 | |
fried_rice | Anyone from HyperV around? | 13:12 |
sean-k-mooney | fried_rice: i guess not | 13:17 |
sean-k-mooney | brb going for lunch/coffee | 13:17 |
fried_rice | doesn't matter, I found what I needed. | 13:17 |
openstackgerrit | Merged openstack/nova master: Scrub hw:cpu_model from API samples https://review.openstack.org/588371 | 13:19 |
fried_rice | stephenfin: Any justification for something like this https://review.openstack.org/#/c/588422/ ? | 13:20 |
*** dave-mccowan has quit IRC | 13:25 | |
*** mriedem is now known as mriedem_afk | 13:25 | |
*** tbachman has quit IRC | 13:26 | |
*** lbragstad has quit IRC | 13:26 | |
stephenfin | fried_rice: Not that I'm aware of, anyway | 13:27 |
stephenfin | Purely a nice-to-have | 13:27 |
fried_rice | stephenfin: ight, thanks for the look. I'm not opposed to approving the thing once they fix it, I guess. | 13:27 |
stephenfin | likewise | 13:28 |
fried_rice | Rather than saying it's a fin u cannot do (hint) | 13:28 |
*** stephenfin is now known as finucannot | 13:28 | |
finucannot | noted | 13:28 |
finucannot | :) | 13:28 |
*** psachin has quit IRC | 13:33 | |
*** frankwang has joined #openstack-nova | 13:36 | |
*** frankwang has quit IRC | 13:40 | |
openstackgerrit | Chris Dent proposed openstack/nova stable/queens: [placement] Retry allocation writes server side https://review.openstack.org/588569 | 13:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] Debug log per granular request group https://review.openstack.org/588350 | 13:41 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: Don't filter out sibling sets with one core https://review.openstack.org/588570 | 13:42 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: Ensure emulator threads are always calculated https://review.openstack.org/588571 | 13:42 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: Always pass 'NUMACell.siblings' to _pack_instance_onto_cores' https://review.openstack.org/588572 | 13:42 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: trivialfix: cleanup _pack_instance_onto_cores() https://review.openstack.org/588573 | 13:42 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: Add unit tests for EmulatorThreadsTestCase https://review.openstack.org/588574 | 13:42 |
openstackgerrit | Stephen Finucane proposed openstack/nova stable/queens: Not use thread alloc policy for emulator thread https://review.openstack.org/588575 | 13:42 |
*** gbarros has quit IRC | 13:42 | |
finucannot | lyarwood: Fancy sticking those on your review queue? ^ | 13:43 |
lyarwood | finucannot: sure thing | 13:44 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 13:44 |
*** awaugama has joined #openstack-nova | 13:45 | |
*** _ix has joined #openstack-nova | 13:46 | |
*** jaypipes is now known as leakypipes | 13:46 | |
*** antosh has joined #openstack-nova | 13:47 | |
*** bnemec is now known as beekneemech | 13:51 | |
*** gbarros has joined #openstack-nova | 13:53 | |
*** damien_r has quit IRC | 14:00 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Move resource_class_cache into placement hierarchy https://review.openstack.org/584085 | 14:02 |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process https://review.openstack.org/584086 | 14:02 |
*** tbachman has joined #openstack-nova | 14:05 | |
*** alex_xu has quit IRC | 14:08 | |
*** edmondsw_ is now known as edmondsw | 14:08 | |
*** Luzi has quit IRC | 14:15 | |
*** rpittau has quit IRC | 14:18 | |
*** _ix has quit IRC | 14:22 | |
*** cdent has quit IRC | 14:23 | |
melwitt | dansmith: do you understand this bug? says guests can't retrieve metadata from the metadata API with multiple cells https://bugs.launchpad.net/nova/+bug/1785235 cc gnuoy | 14:26 |
openstack | Launchpad bug 1785235 in OpenStack Compute (nova) "metadata retrieval fails when using a global nova-api-metadata service" [Undecided,In progress] - Assigned to Liam Young (gnuoy) | 14:26 |
dansmith | well, I understand the words in the bug | 14:27 |
melwitt | I had thought guests retrieved metadata over http, not the MQ | 14:27 |
cfriesen | mriedem_afk: I think the idea was to allow a management layer to create a new compute node in the DB so that we can "disable" it, set up system-generated host aggregates, boot the node, do some health checks, then "enable" it once everything is ready. | 14:27 |
*** gbarros has quit IRC | 14:27 | |
dansmith | I also understand that I'm going to -2 the code change | 14:27 |
dansmith | melwitt: of course they do, | 14:27 |
melwitt | okay. I didn't understand either | 14:27 |
dansmith | but they're saying that metadata service then doesn't hit the right db as a result | 14:28 |
*** artom has joined #openstack-nova | 14:28 | |
*** janki has quit IRC | 14:28 | |
gnuoy | yep | 14:28 |
melwitt | oh, I see | 14:28 |
*** gbarros has joined #openstack-nova | 14:28 | |
melwitt | so, this method, or another one that we missed cell targeting in? https://github.com/openstack/nova/blob/master/nova/api/metadata/base.py#L677 | 14:29 |
dansmith | melwitt: L692 | 14:30 |
*** antosh has quit IRC | 14:30 | |
melwitt | right | 14:30 |
melwitt | okay | 14:31 |
dansmith | commented | 14:31 |
gnuoy | dansmith, I see context.target_cell getting called and adding the cells mq and db endpoints to cctxt | 14:32 |
melwitt | cool thanks | 14:32 |
gnuoy | but when the request executes those endpoints are ignored | 14:32 |
dansmith | gnuoy: did you open this bug? | 14:33 |
gnuoy | I did | 14:33 |
dansmith | gnuoy: please show some logs and config | 14:33 |
*** hongbin has joined #openstack-nova | 14:33 | |
gnuoy | sure | 14:34 |
dansmith | gnuoy: are you running a standalone metadata server with cmd/api_metadata ? | 14:36 |
gnuoy | I am running a standalone metadata server. I don't follow the second part of the question | 14:38 |
dansmith | gnuoy: okay I think I see what's going on and why you want to make the change you're making | 14:39 |
gnuoy | ah, cool! | 14:39 |
gnuoy | dansmith, that change was a starter for 10, I'm happy to update it | 14:39 |
dansmith | gnuoy: so there are two ways to run metadata, either part of the regular api with enabled_apis= | 14:39 |
dansmith | and then with the standalone cmd/api_metadata thing which just starts up a standalone metadata server | 14:40 |
dansmith | the latter always forces the indirection api into place, | 14:40 |
dansmith | which was really intended for the case where you're running metadata on each compute node | 14:40 |
gnuoy | yep, I saw that | 14:40 |
gnuoy | ah | 14:40 |
dansmith | for the global case, you really should be running the regular api server, and just not enable the osapi api if you want to only run metadata | 14:40 |
dansmith | that will not install the indirection handler and do proper switching | 14:41 |
dansmith | and will perform better | 14:41 |
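dansmith's suggestion is a configuration change rather than a code change: run the regular API service with only the metadata API enabled. A minimal sketch, assuming the standard `[DEFAULT] enabled_apis` option in `nova.conf` (a comma-separated list whose default is `osapi_compute,metadata`); the helper functions are hypothetical and only report whether a given config would serve metadata alone.

```python
import configparser

# nova's documented default when enabled_apis is not set
DEFAULT_ENABLED_APIS = ["osapi_compute", "metadata"]


def enabled_apis(conf_text: str) -> list:
    """Hypothetical helper: read [DEFAULT] enabled_apis from nova.conf text."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "enabled_apis", fallback=None)
    if raw is None:
        return list(DEFAULT_ENABLED_APIS)
    return [api.strip() for api in raw.split(",") if api.strip()]


def serves_metadata_only(conf_text: str) -> bool:
    """True if the regular API service would start metadata but not osapi_compute."""
    return enabled_apis(conf_text) == ["metadata"]


metadata_only_conf = """
[DEFAULT]
enabled_apis = metadata
"""
print(serves_metadata_only(metadata_only_conf))  # True
```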
gnuoy | dansmith, ok, I will give that a try, thanks. | 14:41 |
dansmith | gnuoy: please confirm for us and assuming that shakes out, we should write a doc change to fix this bug | 14:42 |
gnuoy | dansmith, absolutely, thanks | 14:42 |
mriedem_afk | cfriesen: there is a config option to keep compute services disabled when they are first created | 14:43 |
*** mriedem_afk is now known as mriedem | 14:43 | |
gnuoy | dansmith, if the indirection API is not used, will the metadata service expect to be able to talk directly to the cells' individual DBs? | 14:44 |
dansmith | gnuoy: yeah | 14:44 |
gnuoy | oh, hmm, ok | 14:44 |
dansmith | but it's global, and thus the access pattern will look like all those other global ones | 14:45 |
gnuoy | right, I see | 14:45 |
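The exchange above can be summarized as a toy model, not nova code: with the indirection API forced on, object lookups are remoted to conductor, so the cell database attached to the context by cell targeting does not decide where the query runs (the behaviour gnuoy observed); without indirection, the service queries the targeted cell database itself. `ToyContext` and `db_used_for_lookup` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyContext:
    """Toy stand-in for a request context after cell targeting."""
    cell_db: Optional[str] = None  # what targeting would attach to the context


def db_used_for_lookup(ctxt: ToyContext, use_indirection: bool,
                       conductor_db: str) -> str:
    """Toy model of which database answers an object lookup."""
    if use_indirection:
        # Toy assumption: the remoted call is answered by a conductor using
        # its own configured database, so ctxt.cell_db is not consulted.
        return conductor_db
    # Without indirection the service talks to the targeted cell DB directly.
    if ctxt.cell_db is None:
        raise ValueError("context was never targeted at a cell")
    return ctxt.cell_db


ctxt = ToyContext(cell_db="cell2-db")
print(db_used_for_lookup(ctxt, use_indirection=True, conductor_db="cell1-db"))   # cell1-db
print(db_used_for_lookup(ctxt, use_indirection=False, conductor_db="cell1-db"))  # cell2-db
```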
mriedem | cfriesen: was a bug reported upstream for this? https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R3359 | 14:51 |
*** janki has joined #openstack-nova | 14:51 | |
*** tbachman has quit IRC | 14:52 | |
mriedem | seems like something that could be dealt with during _init_instance on restart of the compute service | 14:54 |
*** tbachman has joined #openstack-nova | 14:56 | |
mriedem | cfriesen: heh i can get behind this :) https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R4217 | 14:57 |
melwitt | can someone sanity check me on this? when configured to use qcow2 images with libvirt, the backing file on each compute host is expected to be raw format (not qcow2) being that each instance created on the compute will be a COW copy of it? | 14:59 |
melwitt | I'm trying to triage this https://bugs.launchpad.net/nova/+bug/1774730 | 15:00 |
openstack | Launchpad bug 1774730 in OpenStack Compute (nova) "Compute node convert qcow2 to raw even if force_raw_images=false" [Undecided,New] | 15:00 |
*** cdent has joined #openstack-nova | 15:01 | |
mriedem | cfriesen: i could have sworn we had a patch up for this upstream too https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R4226 | 15:01 |
melwitt | mdbooth: if you're around, question about force_raw_images=False and images_type=qcow2, the backing file is expected to be raw format right? since each instance is a COW copy of it? ^ | 15:03 |
*** antosh has joined #openstack-nova | 15:04 | |
mdbooth | melwitt: I *think* the backing file is allowed to be qcow2 in that case | 15:04 |
mdbooth | We do some sanity checking on it when we import it | 15:04 |
mdbooth | So it's not allowed to have a backing file iirc | 15:04 |
melwitt | mdbooth: oh, hm okay. then there might be a bug there. thank you for the info | 15:05 |
mdbooth | qcow2 can have a backing file of qcow2 | 15:05 |
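mdbooth's point, that a cached base image may itself be qcow2 as long as it has no backing file of its own, can be checked from the image header. A minimal sketch, assuming the published qcow2 header layout (4-byte magic `QFI\xfb`, big-endian version, 8-byte backing-file offset, 4-byte backing-file name length, with an offset of 0 meaning no backing file). This is an illustration only; nova's own image handling goes through `qemu-img info` rather than parsing headers like this.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"
# magic (4s), version (uint32), backing_file_offset (uint64),
# backing_file_size (uint32); all big-endian, no padding.
_HEADER = struct.Struct(">4sIQI")


def describe_image_header(header: bytes) -> dict:
    """Classify an image from its leading bytes (illustrative sketch)."""
    if len(header) < _HEADER.size or header[:4] != QCOW2_MAGIC:
        # No qcow2 magic: treated as raw here (real tools detect more formats).
        return {"format": "raw", "has_backing_file": False}
    _magic, version, backing_offset, backing_size = _HEADER.unpack_from(header)
    return {
        "format": "qcow2",
        "version": version,
        # The spec requires offset 0 when there is no backing file.
        "has_backing_file": backing_offset != 0 and backing_size > 0,
    }


def read_image_header(path: str) -> dict:
    """Read just enough of a local image file to classify it."""
    with open(path, "rb") as f:
        return describe_image_header(f.read(_HEADER.size))
```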
mdbooth | melwitt: Got a link? | 15:05 |
melwitt | mdbooth: yes https://bugs.launchpad.net/nova/+bug/1774730 | 15:05 |
openstack | Launchpad bug 1774730 in OpenStack Compute (nova) "Compute node convert qcow2 to raw even if force_raw_images=false" [Undecided,New] | 15:05 |
* mdbooth clicks | 15:05 | |
*** tbachman has quit IRC | 15:06 | |
mdbooth | melwitt: lemme have a quick dig | 15:07 |
cfriesen | mriedem: the first one you mentioned (15min ago) was flagged as upstreamable, checking if we ever actually tried | 15:07 |
leakypipes | cfriesen: if the exact same request to placement "works" (i.e. returns >0 results) in one moment, and then "fails" (returns 0 results) a short time after, that isn't a "failure of the service's SLA". | 15:08 |
leakypipes | cfriesen: it's not a failure to return 0 results. | 15:08 |
leakypipes | cfriesen: the capacity to meet that particular request may easily have been exceeded by the first "successful" request's claim of those resources. | 15:09 |
leakypipes | cfriesen: SLAs are for things like "mean time to recover" or "mean time to respond". not for everyday occurrences and normal business of a service. | 15:10 |
leakypipes | cfriesen: for instance, if WRS claimed to its customers that the placement service would always return a result within 50 milliseconds, and placement returned a result in 2 seconds, that would be a failure of the SLA. But if the placement service returns 0 results in 20 milliseconds, that's not a failure of the SLA. | 15:11 |
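leakypipes' distinction can be made concrete: an SLA of the kind he describes is judged on response time, not on how many candidates come back. A minimal sketch, assuming a hypothetical 50 ms latency target like the one in his example; the names are illustrative and not part of any placement API.

```python
from dataclasses import dataclass


@dataclass
class PlacementResponse:
    latency_ms: float
    num_candidates: int  # e.g. allocation candidates returned


def breaches_latency_sla(resp: PlacementResponse, target_ms: float = 50.0) -> bool:
    """Only slowness counts against this hypothetical SLA.

    An empty result is a normal answer: the capacity may simply have been
    claimed by an earlier request.
    """
    return resp.latency_ms > target_ms


print(breaches_latency_sla(PlacementResponse(latency_ms=20, num_candidates=0)))    # False
print(breaches_latency_sla(PlacementResponse(latency_ms=2000, num_candidates=5)))  # True
```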
cfriesen | leakypipes: doesn't that depend what's in the SLA? If I'm Netflix, I could have an agreement with Amazon saying that I'll always be able to burst by X additional resources. | 15:12 |
leakypipes | cfriesen: that has nothing to do with the placement service, and you know it. :) | 15:13 |
fried_rice | oh, this conversation is happening. | 15:13 |
fried_rice | I just sent this in an email, prettymuch. | 15:13 |
cfriesen | leakypipes: true, but it's a failure to provide resources that are supposed to be available. it's not an exception in placement, but it's arguably exceptional for the provider. | 15:14 |
cfriesen | like I said in my email, I'm of two minds. | 15:14 |
fried_rice | ^ this exactly. | 15:14 |
leakypipes | cfriesen: who said the resources are "supposed to be available"? that's crazy talk, friend. | 15:14 |
cfriesen | leakypipes: my hypothetical guarantee to netflix that they can always burst by X | 15:15 |
sean-k-mooney | cfriesen: isn't that what blazar is for? | 15:15 |
fried_rice | Nova certainly isn't in the business of enforcing, or even knowing about SLAs | 15:15 |
fried_rice | yeah, was gonna say, that's some service way above nova. | 15:15 |
leakypipes | fried_rice: or *multiple* services (running as a SaaS system or orchestrator of some sort or whatever) | 15:15 |
fried_rice | fosho | 15:15 |
leakypipes | cfriesen: nobody other than a service provider can or would make such a guarantee. | 15:16 |
mdbooth | melwitt: fwiw, I can't immediately see how that's possible. | 15:16 |
leakypipes | cfriesen: we aren't a service provider. we're a placement service. | 15:16 |
fried_rice | We're talking about FFDC where the second F is only a F from the perspective of something waaay above placement. | 15:16 |
cfriesen | fried_rice: okay, but now you have those services trying to figure out why the request couldn't be met, and I can see how it would be nice to have an exception object with nice logs in it rather than sending an operator digging through logs. | 15:16 |
melwitt | mdbooth: okay, that is odd. thank you for taking a look | 15:16 |
leakypipes | cfriesen, fried_rice: move this to #openstack-placement before we get told off... | 15:17 |
fried_rice | yuh, swhy I didn't notice the conversation until after I had sent my email. | 15:17 |
mdbooth | melwitt: I'd definitely want to see logs and config. Immediate suspect is conf error. | 15:17 |
melwitt | mdbooth: okay, that's helpful. I'll ask the reporter for more info | 15:18 |
melwitt | thanks | 15:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Avoid live migrate to same host https://review.openstack.org/542689 | 15:24 |
*** pooja_jadhav has quit IRC | 15:25 | |
*** amarao has quit IRC | 15:27 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process https://review.openstack.org/584086 | 15:28 |
cfriesen | mriedem: I don't see an upstream bug report. probably just got missed, so I opened one. https://bugs.launchpad.net/nova/+bug/1785270 | 15:28 |
openstack | Launchpad bug 1785270 in OpenStack Compute (nova) "allow confirmation of resize/migration for migrations in "confirming" status" [Undecided,New] | 15:28 |
*** tbachman has joined #openstack-nova | 15:29 | |
*** janki has quit IRC | 15:29 | |
mriedem | danke | 15:33 |
cfriesen | mriedem: I think we now have the ability to set RUN_ON_REBUILD to enforce validating the image on rebuild. I'm not aware of a similar thing to enforce always going through the scheduler for live migration, though I think I talked about it with dansmith. | 15:34 |
cfriesen | gotta step out for a bit....back later. | 15:36 |
*** cfriesen is now known as cfriesen_afk | 15:36 | |
mriedem | cfriesen_afk: i see you guys removed the force flag for live migrate so you can't do that, which means you'd go through the scheduler, but that *doesnt* apply to live migrations before the microversion that added the force flag because in those cases, simply specifying a host bypasses the scheduler | 15:38 |
mriedem | RUN_ON_REBUILD is only b/c we don't actually move hosts on rebuild | 15:38 |
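The force-flag point above comes down to a small decision rule: at older microversions, naming a host is enough to skip the scheduler. At newer ones, the host is only forced when `force` is true. The sketch below is hypothetical and not nova's API code. It assumes the force flag was added to `os-migrateLive` in microversion 2.30 and represents microversions as `(major, minor)` tuples.

```python
from typing import Optional, Tuple

# Assumption: microversion 2.30 introduced the "force" flag for live migrate.
FORCE_FLAG_MICROVERSION = (2, 30)


def bypasses_scheduler(microversion: Tuple[int, int], host: Optional[str],
                       force: bool = False) -> bool:
    """Hypothetical helper: would this live-migrate request skip scheduling?"""
    if host is None:
        # No target host requested: the scheduler always picks one.
        return False
    if microversion < FORCE_FLAG_MICROVERSION:
        # Pre-2.30 requests: specifying a host alone bypasses the scheduler.
        return True
    # 2.30+: the requested host is validated by the scheduler unless forced.
    return force
```

Removing the force flag, as StarlingX did, only closes the second path. Requests made at an older microversion still reach the first branch.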
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deprecate upgrade_levels options for deprecated/removed services https://review.openstack.org/588607 | 15:41 |
mriedem | dansmith: might want to get that into rocky ^ | 15:41 |
mriedem | to start the timer | 15:41 |
mriedem | melwitt: ^ you too given nova-consoleauth | 15:41 |
*** Shilpa has quit IRC | 15:43 | |
*** itlinux_ has quit IRC | 15:43 | |
melwitt | ack | 15:43 |
dansmith | yar | 15:46 |
*** itlinux has joined #openstack-nova | 15:49 | |
melwitt | mriedem: TYPO | 15:49 |
melwitt | in the reno | 15:50 |
dansmith | WUT? NO. | 15:50 |
melwitt | YUH HUH | 15:50 |
dansmith | zomg | 15:50 |
dansmith | let it be known on the third day of august, the year of our lord 2018... | 15:50 |
* dansmith thinks saying "the year of our lord" legals-up the verbiage | 15:51 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deprecate upgrade_levels options for deprecated/removed services https://review.openstack.org/588607 | 15:53 |
* mriedem finds the tc-approved code review etiquette guidelines | 15:53 | |
mriedem | dansmith: you want to hit jichen's change below that also? | 15:53 |
dansmith | meh | 15:55 |
mriedem | i thought it was useful b/c at least one person thought "auto" could be applied to all | 15:56 |
dansmith | meh | 15:56 |
mriedem | MEH?! | 15:57 |
dansmith | MEH. | 15:57 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Reno for notification-transformation-rocky https://review.openstack.org/588403 | 15:57 |
mriedem | so i guess +r has killed casual nick friday? | 15:57 |
*** ccamacho has quit IRC | 15:57 | |
dansmith | no | 15:57 |
dansmith | it has killed other things, but not casual friday | 15:57 |
*** mriedem is now known as hansmoleman | 15:57 | |
-openstackstatus- NOTICE: The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result. | 16:03 | |
leakypipes | sean-k-mooney: I'm good with https://review.openstack.org/#/c/587378 going into os-vif. I will leave it to the submitter to try and get a stable branch of nova to bring it in via requirements.txt though (0.00001% chance of that happening) | 16:03 |
*** tssurya has quit IRC | 16:05 | |
*** ccamacho has joined #openstack-nova | 16:09 | |
*** gyee has joined #openstack-nova | 16:09 | |
fried_rice | finucannot: If [testenv] defines `commands` and [myenv] defines `commands`, does [myenv] actually execute testenv.commands + myenv.commands?? | 16:10 |
finucannot | fried_rice: No, [myenv] commands will override [testenv] commands | 16:10 |
fried_rice | finucannot: So that's what I thought, but snot the behavior I'm seeing :( | 16:11 |
finucannot | Got a paste? | 16:11 |
hansmoleman | cfriesen_afk: found it https://review.openstack.org/#/c/401009/ | 16:12 |
fried_rice | finucannot: agh, ignore me, pebcak | 16:13 |
*** melwitt is now known as jgwentworth | 16:14 | |
fried_rice | jgwentworth: How did you manage that? Got password? | 16:15 |
jgwentworth | yeah | 16:15 |
*** gbarros has quit IRC | 16:16 | |
*** gbarros has joined #openstack-nova | 16:18 | |
*** fried_rice is now known as fried_rolls | 16:19 | |
*** udesale has quit IRC | 16:20 | |
*** gbarros has quit IRC | 16:25 | |
*** rmart04 has left #openstack-nova | 16:26 | |
*** gbarros has joined #openstack-nova | 16:28 | |
cdent | have I got names right? hansmoleman is matt, jgwentworth is mel? | 16:29 |
jgwentworth | correct | 16:29 |
*** psachin has joined #openstack-nova | 16:29 | |
*** imacdonn has quit IRC | 16:37 | |
*** imacdonn has joined #openstack-nova | 16:37 | |
leakypipes | jgwentworth: weird... why do we have stable versions of dependent libraries when we follow a semver release model? | 16:40 |
mdbooth | dansmith: For interest, the fun LM bug we discussed earlier was sorta filed before https://bugs.launchpad.net/nova/+bug/1628606 . I don't think we appreciated the potential for the thing ending up running in 2 places at once, though. | 16:44 |
openstack | Launchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,Confirmed] | 16:44 |
*** jpena is now known as jpena|off | 16:44 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove unused flavor_delete_info() method https://review.openstack.org/588621 | 16:45 |
*** openstackgerrit has quit IRC | 16:49 | |
leakypipes | jgwentworth: I suppose so we can backport stuff... | 16:50 |
leakypipes | (shows you how much I keep up with stable stuff...) | 16:50 |
mdbooth | I just added a comment to https://bugs.launchpad.net/nova/+bug/1628606 . I think it's pretty serious. | 16:53 |
openstack | Launchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,Confirmed] | 16:53 |
mdbooth | I can't update the importance, though, not that it matters all that much I guess. | 16:53 |
*** tbachman has quit IRC | 16:55 | |
*** openstackgerrit has joined #openstack-nova | 16:56 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 16:56 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Add regression test for bug#1784353 https://review.openstack.org/587014 | 16:56 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: compute: Recreate volume attachments during a reschedule https://review.openstack.org/587071 | 16:56 |
*** derekh is now known as derekh_afk | 17:00 | |
*** antosh has quit IRC | 17:03 | |
hansmoleman | lyarwood: did you talk to dansmith at all about re-creating the volume attachment record in conductor vs compute when rescheduling? ^ | 17:03 |
hansmoleman | https://review.openstack.org/#/c/587071/3/nova/compute/manager.py@1611 | 17:04 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix host validity check for live-migration https://review.openstack.org/401009 | 17:05 |
jgwentworth | leakypipes: yeah, I don't know the nitty gritty on how or why it works but yeah, if we backport fixes to stable and release a new lib version from stable, people can receive fixes via the upper-constraints requirement bump if they're on an older release. I guess maybe it's more an artifact of how the deployment tools usually work? not sure | 17:07 |
lyarwood | hansmoleman: I did not, just had 30 mins to work on this now at the end of the day | 17:07 |
jgwentworth | mdbooth: will take a look. fyi to update importance of a bug, all you have to do is join the bugs team (open team) https://launchpad.net/~nova-bugs | 17:09 |
jgwentworth | being a team member gives permission to update importance | 17:09 |
*** mdbooth has quit IRC | 17:09 | |
jgwentworth | bug from 2016, still a thing? yeesh | 17:10 |
*** amarao has joined #openstack-nova | 17:15 | |
*** dtantsur is now known as dtantsur|afk | 17:20 | |
*** amarao has quit IRC | 17:23 | |
*** gouthamr has quit IRC | 17:23 | |
*** tbachman has joined #openstack-nova | 17:24 | |
hansmoleman | cfriesen_afk: it's interesting you added this _await_volume_detached thing in compute https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R2649 - i thought os-detach in cinder was a synchronous operation, so what issues were you hitting that required that waiter? | 17:25 |
*** cfriesen_afk is now known as cfriesen | 17:25 | |
*** erlon has quit IRC | 17:26 | |
*** gouthamr has joined #openstack-nova | 17:26 | |
hansmoleman | unless maybe you made detach async on the cinder side...? | 17:26 |
*** tbachman_ has joined #openstack-nova | 17:27 | |
cfriesen | hansmoleman: Isn't https://review.openstack.org/#/c/401009/ just about checking whether the specified host exists, not whether it satisfies filters? | 17:27 |
*** tbachman has quit IRC | 17:28 | |
*** tbachman_ is now known as tbachman | 17:28 | |
hansmoleman | cfriesen: correct, but it does it before changing the instance task_state to 'migrating' | 17:29 |
-openstackstatus- NOTICE: Project renames and review.openstack.org downtime are complete without any major issue. | 17:29 | |
hansmoleman | as opposed to where it was before https://review.openstack.org/#/c/401009/13/nova/compute/api.py@a4369 | 17:29 |
hansmoleman | if we failed at ^ we'd leave the instance task_state stuck in 'migrating' | 17:29 |
hansmoleman | the scheduler (or conductor in the case of force) checks if the specific host is valid | 17:29 |
*** psachin has quit IRC | 17:29 | |
hansmoleman | seems like this should go upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R3245 | 17:30 |
hansmoleman | for https://review.openstack.org/#/c/401009/ we should really probably just have a @reverts_task_state decorator in compute api like we do in the manger | 17:31 |
hansmoleman | *manager | 17:31 |
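The `@reverts_task_state` idea can be illustrated with a toy model. If the API sets `task_state` to `'migrating'` and then fails the host check, the decorator resets the state before re-raising, so the instance is not left stuck. This is a minimal sketch in the spirit of the compute manager's decorator. `Instance`, `ComputeAPI` and `live_migrate` here are hypothetical stand-ins, not nova objects.

```python
import functools


class Instance:
    """Hypothetical stand-in for a nova Instance object."""

    def __init__(self):
        self.task_state = None

    def save(self):
        pass  # a real object would persist task_state to the database here


def reverts_task_state(func):
    @functools.wraps(func)
    def wrapper(self, instance, *args, **kwargs):
        try:
            return func(self, instance, *args, **kwargs)
        except Exception:
            # Reset task_state so the instance is not left in 'migrating',
            # then re-raise the original error to the caller.
            instance.task_state = None
            instance.save()
            raise
    return wrapper


class ComputeAPI:
    @reverts_task_state
    def live_migrate(self, instance, host):
        instance.task_state = 'migrating'
        instance.save()
        if host != 'compute1':  # hypothetical host validity check
            raise ValueError('compute host %s could not be found' % host)
        return host
```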
*** gouthamr has quit IRC | 17:33 | |
cfriesen | hansmoleman: about the live migration stuff...we wanted to ensure that any operation involving node selection actually went through the scheduler | 17:36 |
hansmoleman | yeah so for evacuate, | 17:36 |
hansmoleman | https://github.com/openstack/nova/blob/3ac6deb94c5a07e42611158a680bd26febe79d6d/nova/compute/manager.py#L3170 | 17:36 |
hansmoleman | cfriesen: yeah i realize - that's not what that bug fix above is about | 17:37 |
hansmoleman | for evacuate we'll update the port's binding profile to set the migrating_to attr (which is really supposed to only be for live migration with DVR i think...), | 17:37 |
hansmoleman | heh and then we'll wipe that out in _update_port_binding_for_instance immediately after | 17:39 |
hansmoleman | yeah so it probably makes more sense to get the refreshed nw info during evacuate since we update the port binding for the dest host right before | 17:40 |
hansmoleman | https://github.com/openstack/nova/blob/3ac6deb94c5a07e42611158a680bd26febe79d6d/nova/compute/manager.py#L3182 | 17:40 |
cfriesen | the _await_volume_detached() thing was actually grabbed from https://bugs.launchpad.net/nova/+bug/1527623 | 17:40 |
openstack | Launchpad bug 1527623 in OpenStack Compute (nova) "Nova might orphan volumes when it's racing to delete a volume-backed instance" [Medium,In progress] - Assigned to ChangBo Guo(gcb) (glongwave) | 17:40 |
*** hemna_ has quit IRC | 17:41 | |
hansmoleman | i think that might be old / bogus now, | 17:43 |
hansmoleman | seeing that in CI logs now, it's hitting on TestVolumeBootPattern, | 17:43 |
hansmoleman | and i think the race is that tempest doesn't wait to cleanup the volume snapshots first | 17:43 |
hansmoleman | before deleting the server | 17:43 |
hansmoleman | i thought jgwentworth had a patch for that | 17:43 |
hansmoleman | looking at the cinder API, os-detach is synchronous, it's an rpc call from volume api to volume manager | 17:43 |
jgwentworth | sounds familiar. let me check | 17:44 |
jgwentworth | the one I'm thinking of is this, not sure if that's the same thing you're talking about https://review.openstack.org/571336 | 17:45 |
jgwentworth | no, mine is about test_volume_backup | 17:45 |
jgwentworth | I had a different one that got merged, let me find that | 17:46 |
cfriesen | hansmoleman: did cinder change os-detach from async to sync? your bug report said it was async. | 17:47 |
hansmoleman | cfriesen: not sure, but that might have been faulty triage by me at the time, not sure | 17:47 |
hansmoleman | it's an old bug | 17:47 |
hansmoleman | i might have assumed it was async b/c it's async in nova | 17:47 |
hansmoleman | but lots of the cinder api is synchronous | 17:47 |
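A waiter like `_await_volume_detached` only earns its keep if `os-detach` returns before the detach is complete. The thread above suggests it is a synchronous RPC call in Cinder. For illustration, a generic polling waiter of that shape is sketched below. The function signature, the `'available'` status check and the timing parameters are assumptions, not the StarlingX implementation.

```python
import time
from typing import Callable


def await_volume_detached(get_status: Callable[[], str],
                          timeout: float = 10.0,
                          interval: float = 0.5) -> bool:
    """Hypothetical waiter: poll until the volume is 'available' or time out."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == 'available':
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```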
cfriesen | I'm going to have to go through irc history and start opening starlingx storyboard bugs. :) | 17:48 |
jgwentworth | this one https://review.openstack.org/#/c/565601/8/tempest/scenario/test_volume_boot_pattern.py but I didn't change anything about deletion of the server. because for rbd, you have to delete the server first before you delete the volume, the server booted from the volume is dependent on the volume | 17:49 |
hansmoleman | ugh | 17:49 |
jgwentworth | is it backward for non-rbd perhaps? | 17:49 |
hansmoleman | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20delete%20volume%5C%22%20AND%20message%3A%5C%22due%20to%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d shows up in non-ceph jobs though | 17:50 |
jgwentworth | I see. my brain is getting confused trying to think of how the dependency works for non-ceph, but it might be the case that the order has to be different depending on ceph vs non-ceph | 17:50 |
jgwentworth | I can't remember it now but I thought when I tried to delete the volume snapshot first, the test failed | 17:51 |
* jgwentworth looks at the previous PS's | 17:51 | |
hansmoleman | yes you said that in there, | 17:52 |
hansmoleman | "Hm, this actually makes the previously passing ceph job fail. And thinking about it, this is backwards -- we shouldn't delete the volume snapshot first if the created volume depends on it. The created volume should be deleted first, and then that would allow the volume snapshot to be deleted (in the case of ceph)." | 17:52 |
jgwentworth | ah, only failed for ceph | 17:52 |
jgwentworth | okay, so do we need different behavior depending on whether it's ceph? | 17:52 |
hansmoleman | the issue is that we have to delete the volume snapshot before attempting to delete the volume, and nova-compute deletes the volume, | 17:53 |
hansmoleman | was there an issue with trying to just delete the volume snapshot before the server with ceph? | 17:53 |
hansmoleman | https://review.openstack.org/#/c/565601/5 is what i was +1 on | 17:54 |
hansmoleman | and you said that made ceph fail? | 17:54 |
jgwentworth | yes, it failed every time with ceph. so I thought maybe the comment was backward | 17:54 |
hansmoleman | ceph is backward | 17:54 |
hansmoleman | the cinder api shouldn't really behave differently depending on the volume type | 17:54 |
hansmoleman | otherwise client side tooling would always need a "if volume_type=='rbd'" condition | 17:55 |
jgwentworth | it has something to do with the references in ceph. trying to see what it was exactly | 17:55 |
jgwentworth | yeah | 17:55 |
hansmoleman | idk how you delete a ceph volume that has snapshots then, | 17:57 |
hansmoleman | because if you can't remove the snapshots until the volume is gone, | 17:57 |
hansmoleman | and you can't delete the volume while it has snapshots, | 17:57 |
jgwentworth | you have to delete the servers that reference it first, I think | 17:57 |
hansmoleman | then wtf | 17:57 |
hansmoleman | ugh | 17:57 |
hansmoleman | so for rbd volume-backed servers with snapshots, you just always orphan the volumes? | 17:57 |
jgwentworth | well, in the case of the test, what happens is the server is deleted and then nova deletes the volume snapshot after | 17:58 |
hansmoleman | nova doesn't delete volume snapshots | 17:59 |
hansmoleman | nova-compute will attempt to delete the volume because bdm.delete_on_termination, | 17:59 |
jgwentworth | I mean, the virt domain gets destroyed and then compute deletes the underlying volume if delete_on_termination=True | 17:59 |
hansmoleman | which fails if the volume has snapshots | 17:59 |
*** Kevin_Zheng has quit IRC | 17:59 | |
jgwentworth | okay, I think I'm lacking on knowledge of volume snapshots. let me dig into how this works (non-ceph vs ceph) and propose a change to put the comment back and explain the ceph part | 18:01 |
*** sambetts is now known as sambetts|afk | 18:02 | |
jgwentworth | I didn't realize they were different and thought I had made a mistake with trying to delete the snapshot first because that change failed the ceph job, whereas leaving it alone had both jobs passing | 18:02 |
*** ccamacho has quit IRC | 18:04 | |
*** ccamacho has joined #openstack-nova | 18:05 | |
*** ccamacho has quit IRC | 18:09 | |
*** openstackgerrit has quit IRC | 18:19 | |
*** derekh_afk has quit IRC | 18:21 | |
*** gbarros has quit IRC | 18:23 | |
*** gbarros has joined #openstack-nova | 18:26 | |
*** gbarros has quit IRC | 18:31 | |
*** gbarros has joined #openstack-nova | 18:41 | |
*** hemna_ has joined #openstack-nova | 18:48 | |
*** openstackgerrit has joined #openstack-nova | 18:49 | |
openstackgerrit | Merged openstack/nova master: Reno for notification-transformation-rocky https://review.openstack.org/588403 | 18:49 |
*** gbarros has quit IRC | 18:51 | |
*** gbarros has joined #openstack-nova | 18:52 | |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process https://review.openstack.org/584086 | 18:55 |
*** gouthamr has joined #openstack-nova | 19:04 | |
hansmoleman | cfriesen: what does the VIM do for a "crashed" instance in order to recover it? | 19:07 |
*** gbarros has quit IRC | 19:11 | |
hansmoleman | seems this could go upstream in some form https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R7696 - looks like it cleans up old files when a previously evacuated source host comes back online and the instances are off it now (and were being resized when the source was evacuated?) | 19:13 |
*** derekh has joined #openstack-nova | 19:13 | |
*** derekh has quit IRC | 19:15 | |
hansmoleman | i thought we already had something like this upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R7967 | 19:16 |
openstackgerrit | Merged openstack/nova master: [placement] Move resource_class_cache into placement hierarchy https://review.openstack.org/584085 | 19:17 |
*** fried_rolls is now known as fried_rice | 19:18 | |
cfriesen | hansmoleman: for a crashed instance it just does a "stop/start" sequence I think. More generally for an ERROR instance it will cycle through gradually more extreme options (reboot, rebuild, evacuate, etc.) depending on the node state | 19:19 |
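The recovery behaviour cfriesen describes can be pictured as an escalation ladder. A crashed instance gets a stop/start, and an ERROR instance is walked through progressively heavier actions, with evacuation as the option when the node itself is down. The ordering and the `host_up` rule below are illustrative assumptions drawn from the description above, not the VIM's actual logic.

```python
from typing import List, Optional

# Hypothetical escalation order, lightest action first.
ESCALATION = ['reboot', 'rebuild', 'evacuate']


def next_recovery_action(vm_state: str, already_tried: List[str],
                         host_up: bool) -> Optional[str]:
    """Pick the next recovery step, or None once every option is exhausted."""
    if vm_state == 'crashed':
        return 'stop/start'
    if not host_up:
        # In-place actions cannot help when the node itself is down.
        return 'evacuate' if 'evacuate' not in already_tried else None
    for action in ESCALATION:
        if action not in already_tried:
            return action
    return None
```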
hansmoleman | ok and looking at the _cleanup_running_orphan_instances periodic, and the upstream ComputeManager.init_host, it looks like we don't have something for that, | 19:20 |
hansmoleman | b/c on compute startup we'll only work with instances still in the db and only destroy guests from the hypervisor that have been evacuated to another host | 19:20 |
hansmoleman | so i guess in that case, the compute host went down when the user deleted the instance from the db | 19:21 |
hansmoleman | then the compute comes back up and the instance isn't in the db but it's consuming resources on the hypervisor | 19:21 |
cfriesen | hansmoleman: I think we also got running orphan instance from things like failures during migration | 19:23 |
hansmoleman | so https://bugs.launchpad.net/nova/+bug/1285000 | 19:24 |
openstack | Launchpad bug 1285000 in OpenStack Compute (nova) pike "instance data resides on destination node when vm is deleted during live-migration" [Medium,Fix released] - Assigned to Maciej Jozefczyk (maciej.jozefczyk) | 19:24 |
cfriesen | that'd be one possibility. you could also get something like what mdbooth commented on where a live migration never runs "post live migration at destination" and the system gets into a weird state | 19:26 |
cfriesen | the orphan audit dates from havana, when things were not quite as robust as they are now | 19:28 |
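The core of a running-orphan audit, as described above, is a set difference: guests the hypervisor reports that have no instance record in the database. A minimal sketch is below. It assumes both sides are available as collections of instance UUIDs. How they are gathered (for example, from the driver's domain list and a DB query) is outside the sketch.

```python
from typing import Iterable, Set


def find_running_orphans(hypervisor_uuids: Iterable[str],
                         db_uuids: Iterable[str]) -> Set[str]:
    """Guests present on the hypervisor but unknown to the database."""
    return set(hypervisor_uuids) - set(db_uuids)
```

A real audit would also need to skip guests tied to in-progress operations, such as an incoming live migration. Otherwise it could destroy a guest that is merely mid-move.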
hansmoleman | ack | 19:30 |
*** cdent has quit IRC | 19:35 | |
hansmoleman | cfriesen: interesting https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-afb9c0c0ca5276c7eacd987bbf51d8e6R447 | 19:39 |
hansmoleman | does upstream retrieve the volume_image_metadata properly for volume-backed scheduling with things like the NUMATopologyFilter? | 19:39 |
hansmoleman | looks like we should, compute API _get_bdm_image_metadata | 19:41 |
hansmoleman | gets the volume image metadata from the volume | 19:41 |
hansmoleman | https://bugs.launchpad.net/nova/+bug/1785318 | 19:53 |
openstack | Launchpad bug 1785318 in OpenStack Compute (nova) "evacuate rebuild claim will not use any image_meta for volume-backed instances" [Medium,Triaged] | 19:53 |
*** mdrabe has quit IRC | 19:54 | |
*** mdrabe has joined #openstack-nova | 19:56 | |
*** gouthamr has quit IRC | 20:00 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove old check_attach version check in API https://review.openstack.org/588348 | 20:05 |
*** artom has quit IRC | 20:07 | |
hansmoleman | fried_rice: ^ updated the commit message on that for the del instance.id thing in the tests | 20:09 |
hansmoleman | also updated some really really old comment | 20:09 |
cfriesen | hansmoleman: thanks for opening that bug report. I believe we have evacuate working reliably with NumaTopology. | 20:14 |
hansmoleman | yeah i don't see why it wouldn't | 20:14 |
hansmoleman | i was thinking of the isolated hosts filter, | 20:14 |
hansmoleman | which relies on the image id which we don't have in the request spec for bfv | 20:14 |
hansmoleman | https://review.openstack.org/#/c/543263/ | 20:15 |
cfriesen | hansmoleman: there will likely be other cases where we fixed something and then it's been fixed upstream and we ported the change without retesting against upstream first. | 20:15 |
hansmoleman | ^ still not sure *why* or if it was intentional that we don't store the RequestSpec.image.id for bfv instances | 20:15 |
hansmoleman | we also just don't really test the isolated hosts filter | 20:16 |
*** harlowja has joined #openstack-nova | 20:21 | |
*** gbarros has joined #openstack-nova | 20:35 | |
hansmoleman | cfriesen: i'll have a patch up for that shortly, slightly different than what's in starlingx | 20:48 |
*** beekneemech is now known as bnemec-pto | 20:55 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix image-defined numa claims during evacuate https://review.openstack.org/588657 | 20:59 |
*** awaugama has quit IRC | 21:02 | |
*** gbarros has quit IRC | 21:02 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Remove redundant _update()s https://review.openstack.org/588091 | 21:22 |
*** avolkov has quit IRC | 21:26 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Optimize AZ lookup during schedule_and_build_instances https://review.openstack.org/588665 | 21:29 |
*** gbarros has joined #openstack-nova | 21:33 | |
hansmoleman | dansmith: looks like another place for the long rpc timeout https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-2a50b2dbeb123b515ebb4b917ae1cb2bR751 | 21:37 |
*** pcaruana has quit IRC | 21:38 | |
*** gbarros has quit IRC | 21:43 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Use CONF.long_rpc_timeout in post_live_migration_at_destination https://review.openstack.org/588668 | 21:47 |
*** gouthamr has joined #openstack-nova | 21:47 | |
hansmoleman | cfriesen: i thought the api change to return the server group that each server is in was kind of interesting, | 21:50 |
hansmoleman | but that kind of sucks for performance if you're listing servers with details for 1000 servers | 21:51 |
hansmoleman | since it's an api db query per instance | 21:51 |
hansmoleman | alternatively that could be done by storing the instance group info in instance_extra with the instance, | 21:51 |
hansmoleman | or adding a member filter to GET /os-server-groups | 21:51 |
hansmoleman | so you could just get server groups by a given member server | 21:51 |
hansmoleman | that list would always return at most 1 since a server can't be in more than one group | 21:52 |
cfriesen | hansmoleman: that implementation was driven partly by trying to make it as easy to port as possible. | 21:55 |
hansmoleman | yeah i get that | 21:56 |
hansmoleman | i think the server group is only returned if the wrs-header is present too... | 21:56 |
cfriesen | yes | 21:56 |
hansmoleman | was also thinking we could do like a GET /servers/{id}/group but that is pretty boring, it would just return the server group subresource | 21:57 |
hansmoleman | anywho | 21:57 |
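The performance concern with a per-server group lookup is N queries for N servers. Two cheaper shapes are discussed: building a member-to-group index once, or filtering groups by member. Since a server belongs to at most one group, the member filter returns zero or one result. The sketch below models both in memory. `ServerGroup` and the member filter are hypothetical illustrations, not an existing nova API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerGroup:
    id: str
    members: List[str] = field(default_factory=list)


def groups_by_member(groups: List[ServerGroup]) -> Dict[str, str]:
    """Build a member-uuid -> group-id index in a single pass."""
    index: Dict[str, str] = {}
    for group in groups:
        for member in group.members:
            # A server is in at most one group, so keys never collide.
            index[member] = group.id
    return index


def filter_groups_by_member(groups: List[ServerGroup],
                            member: str) -> List[ServerGroup]:
    """Model of a hypothetical GET /os-server-groups?member=<uuid> filter."""
    return [g for g in groups if member in g.members]
```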
cfriesen | heading out? | 21:59 |
hansmoleman | jgwentworth: on https://bugs.launchpad.net/nova/+bug/1781286 i just remembered, | 21:59 |
openstack | Launchpad bug 1781286 in OpenStack Compute (nova) "CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call" [Medium,Triaged] | 21:59 |
hansmoleman | cfriesen: of course not | 21:59 |
hansmoleman | jgwentworth: one way i thought about fixing that was adding a different migrate_server (or whatever it's called) method in conductor that's not trying to target a cell, | 21:59 |
hansmoleman | because that's how build_resources works, it's not using the @targets_cell decorator because it's called from the api for scheduling and from the compute for reschedules | 21:59 |
hansmoleman | so really we should do similar for rescheduling a resize/cold migrate | 22:00 |
cfriesen | hansmoleman: as a heads-up, I'm on vacation next week. you'll have to save up your questions. :) | 22:00 |
hansmoleman | cfriesen: damn | 22:00 |
hansmoleman | i'm leaving for china on friday | 22:00 |
cfriesen | you can try asking Dean | 22:00 |
hansmoleman | given i'm gotten through the api extension and nova/compute/* changes so far, i think i'm pretty good | 22:01 |
hansmoleman | i expect some scheduler stuff but mostly linked to what i've already seen | 22:01 |
hansmoleman | *i've | 22:01 |
cfriesen | there are some changes around the server group affinity to close some races in the scheduler | 22:01 |
hansmoleman | jgwentworth: reason i bring that up now, is if we fixed the bug that way, it's a new rpc api method which isn't backportable, | 22:02 |
hansmoleman | so we'd have to get that done before rc if we want it in rocky | 22:02 |
hansmoleman | i also thought about putting the az per host on the Selection object that we pass down for alternates, | 22:04 |
hansmoleman | but that's an rpc api version bump on the Selection object, so again, not backportable | 22:04 |
hansmoleman | ew it's actually 2 separate but very similar bugs and we'd need both | 22:06 |
hansmoleman | @targets_cell looks up the instance mapping which fails if the cell conductor isn't configured with the api db | 22:06 |
hansmoleman | looking up the host az fails if not configured for the api db | 22:07 |
*** Guest43093 is now known as amrith | 22:12 | |
hansmoleman | mnaser: i'm assuming you have [api_database]/connection configured in your cell conductors? or are you just using a single global conductor? | 22:14 |
* mnaser has to whois for a second | 22:15 | |
mnaser | s/has/had/ | 22:15 |
mnaser | hansmoleman: right now just a global conductor but we'll be switching to cell conductor soon | 22:15 |
*** hongbin has quit IRC | 22:19 | |
hansmoleman | jgwentworth: just added https://blueprints.launchpad.net/nova/+spec/fix-reschedule-up-calls for stein and to the ptg etherpad; i think it's too much change at this point to try and rush a fix for rocky | 22:33 |
hansmoleman | especially given it's been broken since pike | 22:34 |
hansmoleman | cfriesen: hmm looks like a bug upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-378a96ec6159d0a2f8ec7ab71bc3843bR921 - i know we change the request spec on resize, but do we revert the request_spec.flavor on revert resize? | 22:36 |
hansmoleman | another good reason why we can't trust the request spec a lot of the time.. | 22:37 |
openstackgerrit | Merged openstack/os-vif master: Support for OVS DB TCP socket communication. https://review.openstack.org/587378 | 22:41 |
*** derekh has joined #openstack-nova | 22:43 | |
*** nicolasbock has quit IRC | 22:43 | |
hansmoleman | https://bugs.launchpad.net/nova/+bug/1785339 | 22:45 |
openstack | Launchpad bug 1785339 in OpenStack Compute (nova) "RequestSpec.flavor is not reverted on resize revert" [Medium,Triaged] | 22:45 |
hansmoleman | busted since newton | 22:45 |
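The RequestSpec.flavor bug reported above is easy to see in a toy model. Resize points the request spec at the new flavor, so revert has to point it back. If it does not, any later scheduling that trusts the spec uses the wrong flavor. This is a simplified model, not nova's resize flow or the proposed patch. Flavors are plain strings, and the instance's flavor is swapped immediately rather than at finish_resize.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSpec:
    flavor: str


@dataclass
class Instance:
    flavor: str
    old_flavor: Optional[str] = None


def resize(instance: Instance, spec: RequestSpec, new_flavor: str) -> None:
    instance.old_flavor = instance.flavor
    instance.flavor = new_flavor
    # The request spec is moved to the new flavor for scheduling.
    spec.flavor = new_flavor


def revert_resize(instance: Instance, spec: RequestSpec) -> None:
    instance.flavor = instance.old_flavor
    # The step reported missing: without it the spec keeps the new flavor.
    spec.flavor = instance.old_flavor
    instance.old_flavor = None
```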
*** mschuppert has quit IRC | 23:03 | |
*** derekh has quit IRC | 23:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update RequestSpec.flavor on resize_revert https://review.openstack.org/588689 | 23:14 |
openstackgerrit | Merged openstack/nova master: Remove unused flavor_delete_info() method https://review.openstack.org/588621 | 23:18 |
hansmoleman | cfriesen: i see what you mean about affinity races https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-d29c9372baf108a281712642550918dcR88 | 23:24 |
hansmoleman | should probably report a bug for that | 23:24 |
hansmoleman | since we don't do the late anti-affinity check on compute like we do for server create and evacuate (which would be up-calls for cells v2 now too) | 23:25 |
hansmoleman | oh heh https://bugs.launchpad.net/nova/+bug/1600251 | 23:25 |
openstack | Launchpad bug 1600251 in OpenStack Compute (nova) "live migration does not honor server group policy" [High,Confirmed] | 23:25 |
*** tetsuro_ has joined #openstack-nova | 23:29 | |
hansmoleman | i think part of that is fixed with https://review.openstack.org/#/c/527799/ | 23:30 |
hansmoleman | but not sure where we re-calculate the group members prior to scheduling | 23:31 |
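A late anti-affinity check re-reads where the group's other members currently live. It runs on the destination just before the move and rejects the host if another member is already there. This catches races in which concurrent scheduling decisions picked the same host. The sketch below assumes the freshly recalculated member-to-host mapping is passed in. It covers only the anti-affinity policy, and the function name is hypothetical.

```python
from typing import Dict


def violates_anti_affinity(dest_host: str, instance_uuid: str,
                           member_hosts: Dict[str, str]) -> bool:
    """True if a different member of the group already lives on dest_host."""
    return any(host == dest_host and uuid != instance_uuid
               for uuid, host in member_hosts.items())
```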
*** harlowja has quit IRC | 23:48 |