Friday, 2018-08-03

*** liuyulong has quit IRC00:08
*** hamzy has joined #openstack-nova00:09
*** tonyb has quit IRC00:16
*** tetsuro_ has joined #openstack-nova00:28
*** tetsuro_ has quit IRC00:36
*** gyee has quit IRC00:36
*** edmondsw has joined #openstack-nova00:37
*** edmondsw has quit IRC00:42
*** tbachman has quit IRC00:43
openstackgerritmelanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races  https://review.openstack.org/58241300:58
openstackgerritmelanie witt proposed openstack/nova stable/ocata: Add functional regression test for bug 1783613  https://review.openstack.org/58841600:58
openstackbug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt)00:58
openstackgerritMerged openstack/nova master: In Python3.7 async is a keyword [1]  https://review.openstack.org/58436500:58
openstackgerritmelanie witt proposed openstack/nova stable/ocata: [stable only] Add functional regression test for bug 1783613  https://review.openstack.org/58841600:59
openstackbug 1783613 in OpenStack Compute (nova) ocata "[ocata only] quota usage not decremented during boot/delete race" [Undecided,In progress] https://launchpad.net/bugs/1783613 - Assigned to melanie witt (melwitt)00:59
openstackgerritmelanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races  https://review.openstack.org/58241300:59
*** mrsoul has joined #openstack-nova01:06
*** frankwang has joined #openstack-nova01:14
melwittmriedem_afk: I added a functional regression test that might help demonstrate the bug ^01:15
melwittcustomer hit an issue around this so I proposed it upstream too in case it can help01:18
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842001:30
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842001:31
*** mriedem_afk has quit IRC01:34
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842001:34
*** tetsuro_ has joined #openstack-nova01:36
*** hongbin has joined #openstack-nova01:44
lbragstadmelwitt: oh - so don't try and support per user quotas?01:44
*** gbarros has joined #openstack-nova01:47
openstackgerritZhenyu Zheng proposed openstack/nova master: Update installation guide to be more clear about cellsv2  https://review.openstack.org/58424401:51
openstackgerritzhufl proposed openstack/nova master: Fix none-ascii char in doc  https://review.openstack.org/58842201:56
*** gbarros has quit IRC01:57
*** tbachman has joined #openstack-nova01:57
*** Dinesh_Bhor has joined #openstack-nova01:57
*** tetsuro_ has quit IRC02:04
*** threestrands has joined #openstack-nova02:11
*** tonyb has joined #openstack-nova02:22
*** threestrands has quit IRC02:23
*** edmondsw has joined #openstack-nova02:25
*** edmondsw has quit IRC02:30
*** Kevin_Zheng has joined #openstack-nova02:34
*** psachin has joined #openstack-nova02:34
*** dave-mccowan has quit IRC02:35
openstackgerritVishakha Agarwal proposed openstack/nova master: No change in  field 'updated' in server  https://review.openstack.org/58644603:03
*** tbachman_ has joined #openstack-nova03:09
*** tbachman has quit IRC03:12
*** tbachman_ is now known as tbachman03:12
*** frankwang has quit IRC03:31
*** udesale has joined #openstack-nova03:31
*** frankwang has joined #openstack-nova03:31
*** itlinux_ has joined #openstack-nova03:33
*** hongbin has quit IRC03:52
*** Dinesh_Bhor has quit IRC03:59
*** ratailor has joined #openstack-nova03:59
*** frankwang has quit IRC04:05
*** diga has joined #openstack-nova04:14
*** sridharg has joined #openstack-nova04:17
*** frankwang has joined #openstack-nova04:20
*** liuyulong has joined #openstack-nova04:32
*** udesale has quit IRC04:35
openstackgerritmelanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races  https://review.openstack.org/58241304:38
*** dklyle has quit IRC04:39
*** hshiina has joined #openstack-nova04:47
*** tetsuro_ has joined #openstack-nova04:55
*** udesale has joined #openstack-nova04:55
*** frankwang has quit IRC04:58
*** tetsuro_ has quit IRC05:02
*** Dinesh_Bhor has joined #openstack-nova05:04
*** tetsuro_ has joined #openstack-nova05:08
*** tetsuro_ has quit IRC05:10
*** tetsuro__ has joined #openstack-nova05:10
*** pmannidi has joined #openstack-nova05:13
*** frankwang has joined #openstack-nova05:37
*** frankwang has quit IRC05:38
*** frankwang has joined #openstack-nova05:39
*** jaosorior has quit IRC05:40
*** jaosorior has joined #openstack-nova05:41
*** janki has joined #openstack-nova05:44
*** tetsuro__ has quit IRC05:46
*** tetsuro_ has joined #openstack-nova05:49
openstackgerritVishakha Agarwal proposed openstack/nova master: 'Updated_at' is NULL when show aggregate info  https://review.openstack.org/58027105:57
*** tetsuro_ has quit IRC05:57
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842006:06
*** Luzi has joined #openstack-nova06:13
*** sridharg has quit IRC06:22
*** gibi is now known as giblet06:25
*** chason has quit IRC06:32
*** chason[m] has quit IRC06:32
*** chason has joined #openstack-nova06:44
*** ccamacho has joined #openstack-nova06:45
alex_xustephenfin: sorry, I just sent this to the wrong channel. The sample files were deleted by this commit https://review.openstack.org/#/c/149129/, and actually the API sample test doesn't validate the request body, so nothing complains; those files are really just for documentation.06:48
gmannalex_xu: stephenfin this will fix - https://review.openstack.org/#/c/588420/406:52
*** tetsuro_ has joined #openstack-nova06:52
alex_xugmann: thanks06:53
alex_xugmann: but I'm wondering why we cleaned up those empty files at https://review.openstack.org/#/c/149129/06:54
*** rcernin has quit IRC06:54
gmannalex_xu: not sure why we removed may be because their are just empty06:55
gmannthey are just empty06:55
alex_xugmann: yea, anyway, your fix is better06:56
openstackgerritGhanshyam Mann proposed openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842007:00
gmannalex_xu: done ^^07:00
*** chason has quit IRC07:01
*** trungnv has quit IRC07:01
*** chason has joined #openstack-nova07:02
*** annp has quit IRC07:02
alex_xugmann: thanks07:02
*** tetsuro_ has quit IRC07:03
*** kaisers has quit IRC07:03
*** tetsuro_ has joined #openstack-nova07:04
*** blkart has quit IRC07:07
*** chason has quit IRC07:08
*** pmannidi has quit IRC07:16
openstackgerritYongli He proposed openstack/nova master: Load expected attr pci_devices while migrate  https://review.openstack.org/58845507:23
openstackgerritMerged openstack/nova master: Add another up-call to the cells v2 caveats list  https://review.openstack.org/58191007:31
*** tetsuro_ has quit IRC07:41
*** chason has joined #openstack-nova07:44
*** tetsuro_ has joined #openstack-nova07:44
*** mschuppert has joined #openstack-nova07:49
*** tetsuro__ has joined #openstack-nova07:54
*** tetsuro_ has quit IRC07:55
*** Bhujay has joined #openstack-nova07:56
*** dtantsur|afk is now known as dtantsur08:01
*** tommylikehu is now known as tommylikehu208:02
*** tommylikehu2 is now known as tommylikehu08:03
*** derekh has joined #openstack-nova08:03
*** tommylikehu is now known as tommylikehu_afk08:04
*** Bhujay has quit IRC08:05
*** tetsuro__ has quit IRC08:07
*** tommylikehu_afk is now known as tommylikehu08:09
*** jpena|off is now known as jpena08:20
gibletstephenfin: you found it you can make sure it is fixed ;) https://review.openstack.org/#/c/588420/08:22
*** tetsuro_ has joined #openstack-nova08:27
openstackgerritYikun Jiang (Kero) proposed openstack/nova master: Fix nits in resource_provider.py  https://review.openstack.org/58847008:28
*** vivsoni_ has quit IRC08:29
*** tetsuro_ has quit IRC08:40
*** tetsuro_ has joined #openstack-nova08:41
*** tssurya has joined #openstack-nova08:49
*** tssurya has quit IRC08:50
*** sambetts_ is now known as sambetts09:03
*** jaosorior has quit IRC09:03
*** Dinesh_Bhor has quit IRC09:07
*** cdent has joined #openstack-nova09:09
*** chason has quit IRC09:10
tobascois there any manual process that needs to be performed if you get a lot of this?09:12
tobascohttp://git.openstack.org/cgit/openstack/nova/tree/nova/compute/resource_tracker.py#n130809:12
tobascowhy wouldn't it clear allocations if the instance doesn't exist?09:12
openstackgerritzhufl proposed openstack/nova master: xx_instance_type_id in list_migrations should be integer  https://review.openstack.org/58848109:12
*** frankwang has quit IRC09:18
*** frankwang has joined #openstack-nova09:19
*** deepak_mourya has quit IRC09:25
*** chason has joined #openstack-nova09:28
*** hshiina has quit IRC09:29
*** Dinesh_Bhor has joined #openstack-nova09:29
*** avolkov has joined #openstack-nova09:31
*** tetsuro_ has quit IRC09:31
*** vivsoni has joined #openstack-nova09:31
*** Dinesh_Bhor has quit IRC09:41
*** liuyulong has quit IRC09:43
openstackgerritYikun Jiang (Kero) proposed openstack/nova master: Fix nits in resource_provider.py  https://review.openstack.org/58847009:55
*** obre is now known as obre_09:58
*** obre_ is now known as obre09:58
*** Dinesh_Bhor has joined #openstack-nova10:03
*** obre has quit IRC10:04
*** obre has joined #openstack-nova10:06
*** chason has quit IRC10:06
*** chason has joined #openstack-nova10:07
*** chason has quit IRC10:12
* giblet takes the rest of the day easy10:15
*** fanzhang has quit IRC10:18
cdentgiblet++10:20
*** panda|rover has joined #openstack-nova10:22
panda|roverHi, I'm trying to gather console logs for nova instances, but it seems the logs reset at boot. Is there a way to keep the console log persistent across reboots?10:29
*** Dinesh_Bhor has quit IRC10:35
*** frankwang has quit IRC10:39
*** cdent has quit IRC10:43
*** diga has quit IRC10:54
openstackgerritChen proposed openstack/nova master: Revert task_state to none for LM failure due to invalid dest  https://review.openstack.org/58851211:01
*** jpena is now known as jpena|lunch11:03
*** Yingxin has quit IRC11:07
sean-k-mooneytobasco: there are some bugs related to live migration that can cause allocations to leak11:13
*** amarao has joined #openstack-nova11:14
amaraoHello. I found that if I remove the image an instance was booted from, migration no longer uses the proper aggregate based on that image's metadata. Does anyone know anything about this?11:15
tobascoso we've been pounding our cloud with rally, so if my logs contain an excessive number of such statements, that would probably be after rally live migrations11:15
tobascoshould I be worried? I assume I would want to release those allocations manually somehow11:15
*** tssurya has joined #openstack-nova11:17
*** vivsoni has quit IRC11:17
*** tbachman_ has joined #openstack-nova11:22
*** tbachman has quit IRC11:25
*** tbachman_ is now known as tbachman11:25
*** cdent has joined #openstack-nova11:26
*** tbachman has quit IRC11:28
openstackgerritMerged openstack/nova master: Remove unused request API sample template  https://review.openstack.org/58842011:29
*** dave-mccowan has joined #openstack-nova11:44
*** jpena|lunch is now known as jpena11:57
*** _pewp_ has quit IRC12:03
*** _pewp_ has joined #openstack-nova12:04
*** hemna_ has quit IRC12:04
openstackgerritLiam Young proposed openstack/nova master: Target metadata requests at the correct cell.  https://review.openstack.org/58852012:06
*** ratailor has quit IRC12:08
*** tbachman has joined #openstack-nova12:13
openstackgerritMerged openstack/nova master: Docs: Add Placement to Nova system architecture  https://review.openstack.org/58433812:13
*** rmart04 has joined #openstack-nova12:14
*** hemna_ has joined #openstack-nova12:15
*** panda|rover is now known as panda|rover|off12:17
*** tbachman has quit IRC12:17
*** tbachman has joined #openstack-nova12:19
openstackgerritLiam Young proposed openstack/nova master: Remove Neutron MetaAPIProxy from cellsv2-layout  https://review.openstack.org/58852512:20
openstackgerritSurya Seetharaman proposed openstack/nova master: Cleanup comp_node, res_prov, services, aggregate_hosts during cell deletion  https://review.openstack.org/54666012:30
*** mriedem has joined #openstack-nova12:39
mriedemcfriesen: what was the reason for needing a POST /os-services API to create nova-compute services on a given host? https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-f3afe2522f9c92f5705f0ff5cf343865R24612:40
mriedemwhich is also, btw, not multi-cell aware since it doesn't rely on the host mapping12:40
mriedemsean-k-mooney: check this out https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-99e4b3f7232bf35155ff8b590b0ea589R4412:44
sean-k-mooneymriedem: clicking but not sure i want to12:47
sean-k-mooneyhaha12:47
*** efried is now known as fried_rice12:48
sean-k-mooneymriedem: that in the api. that is not a bad idea to be honest12:48
sean-k-mooneymriedem: we don't document in the api that when using the libvirt driver we detach all pci/sriov devices on suspend, which is hostile to a guest application that was using them12:49
sean-k-mooneypause would not detach the devices.12:49
mriedemsean-k-mooney: i know https://bugs.launchpad.net/nova/+bug/178524612:51
openstackLaunchpad bug 1785246 in OpenStack Compute (nova) "Compute API reference should describe pause and suspend operations" [Medium,Confirmed]12:51
sean-k-mooneymriedem: we likely can do this where i wanted to in the snapshot case after talking to dansmith, due to concern about data corruption by not flushing buffers, but this seems ok12:51
sean-k-mooneyya i was thinking about that yesterday after we were talking about it.12:51
mriedemwindriver could have just disabled the suspend/resume apis with policy12:52
mriedemrather than change the behavior12:52
sean-k-mooneyit's an implementation detail of the libvirt driver, i'm not sure we should be leaking it through the api12:52
sean-k-mooneythat said we should document it12:52
bauzasmriedem: hola12:53
sean-k-mooneyfried_rice: any idea if the ibm drivers detach pcidevices from the guest on suspend12:53
bauzasmriedem: I was thinking on cherry-picking https://review.openstack.org/#/c/584204 (I mean, the series) to Queens12:54
bauzasmriedem: you okay with it ?12:54
fried_ricesean-k-mooney: I would only be able to answer for PowerVM, and... It's possible suspend is an operation we don't support. /me checks support matrix...12:55
sean-k-mooneyfried_rice: im looking at the intree driver now.12:55
mriedembauzas: no12:55
mriedemRequestSpec.is_bfv is an rpc api bump12:56
mriedemso not backportable12:56
bauzasoh right12:56
mriedemdansmith and i talked about that while he was writing the patch12:56
sean-k-mooneymriedem: i might start working on that docs bug by the way, but i'll need stephenfin etc. to check it since my written expression is not always well valid english :)12:56
fried_ricesean-k-mooney: https://docs.openstack.org/nova/latest/user/support-matrix.html#operation_suspend_driver_powervm12:57
mriedemsean-k-mooney: sure12:57
mriedemi'm also not saying we should copy the libvirt description of those operations into the api,12:57
mriedemif it's not the same behavior across virt drivers12:57
sean-k-mooneymriedem: i agree but we should likely add a note for the different backends. https://docs.openstack.org/nova/latest/user/support-matrix.html#operation_suspend should probably have a note too12:59
*** nicolasbock has joined #openstack-nova13:00
bauzasmriedem: for some reason, I missed https://review.openstack.org/#/c/580720/ in the series13:00
bauzasmy bad13:00
*** edmondsw_ has joined #openstack-nova13:00
*** eharney has joined #openstack-nova13:07
sean-k-mooneyinteresting... the xenapi driver appears to just suspend. both hyperv and vsphere end up delegating to their respective hypervisor's suspend. as a result this appears to only be a thing for libvirt.13:08
*** gbarros has joined #openstack-nova13:12
fried_riceAnyone from HyperV around?13:12
sean-k-mooneyfried_rice: i guess not13:17
sean-k-mooneybrb going for lunch/coffee13:17
fried_ricedoesn't matter, I found what I needed.13:17
openstackgerritMerged openstack/nova master: Scrub hw:cpu_model from API samples  https://review.openstack.org/58837113:19
fried_ricestephenfin: Any justification for something like this https://review.openstack.org/#/c/588422/ ?13:20
*** dave-mccowan has quit IRC13:25
*** mriedem is now known as mriedem_afk13:25
*** tbachman has quit IRC13:26
*** lbragstad has quit IRC13:26
stephenfinfried_rice: Not that I'm aware of, anyway13:27
stephenfinPurely a nice to have13:27
fried_ricestephenfin: ight, thanks for the look. I'm not opposed to approving the thing once they fix it, I guess.13:27
stephenfinlikewise13:28
fried_riceRather than saying it's a fin u cannot do (hint)13:28
*** stephenfin is now known as finucannot13:28
finucannotnoted13:28
finucannot:)13:28
*** psachin has quit IRC13:33
*** frankwang has joined #openstack-nova13:36
*** frankwang has quit IRC13:40
openstackgerritChris Dent proposed openstack/nova stable/queens: [placement] Retry allocation writes server side  https://review.openstack.org/58856913:41
openstackgerritEric Fried proposed openstack/nova master: [placement] Debug log per granular request group  https://review.openstack.org/58835013:41
openstackgerritStephen Finucane proposed openstack/nova stable/queens: Don't filter out sibling sets with one core  https://review.openstack.org/58857013:42
openstackgerritStephen Finucane proposed openstack/nova stable/queens: Ensure emulator threads are always calculated  https://review.openstack.org/58857113:42
openstackgerritStephen Finucane proposed openstack/nova stable/queens: Always pass 'NUMACell.siblings' to _pack_instance_onto_cores'  https://review.openstack.org/58857213:42
openstackgerritStephen Finucane proposed openstack/nova stable/queens: trivialfix: cleanup _pack_instance_onto_cores()  https://review.openstack.org/58857313:42
openstackgerritStephen Finucane proposed openstack/nova stable/queens: Add unit tests for EmulatorThreadsTestCase  https://review.openstack.org/58857413:42
openstackgerritStephen Finucane proposed openstack/nova stable/queens: Not use thread alloc policy for emulator thread  https://review.openstack.org/58857513:42
*** gbarros has quit IRC13:42
finucannotlyarwood: Fancy sticking those on your review queue? ^13:43
lyarwoodfinucannot: sure thing13:44
openstackgerritBalazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client  https://review.openstack.org/58366713:44
*** awaugama has joined #openstack-nova13:45
*** _ix has joined #openstack-nova13:46
*** jaypipes is now known as leakypipes13:46
*** antosh has joined #openstack-nova13:47
*** bnemec is now known as beekneemech13:51
*** gbarros has joined #openstack-nova13:53
*** damien_r has quit IRC14:00
openstackgerritChris Dent proposed openstack/nova master: [placement] Move resource_class_cache into placement hierarchy  https://review.openstack.org/58408514:02
openstackgerritChris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process  https://review.openstack.org/58408614:02
*** tbachman has joined #openstack-nova14:05
*** alex_xu has quit IRC14:08
*** edmondsw_ is now known as edmondsw14:08
*** Luzi has quit IRC14:15
*** rpittau has quit IRC14:18
*** _ix has quit IRC14:22
*** cdent has quit IRC14:23
melwittdansmith: do you understand this bug? says guests can't retrieve metadata from the metadata API with multiple cells https://bugs.launchpad.net/nova/+bug/1785235 cc gnuoy14:26
openstackLaunchpad bug 1785235 in OpenStack Compute (nova) "metadata retrieval fails when using a global nova-api-metadata service" [Undecided,In progress] - Assigned to Liam Young (gnuoy)14:26
dansmithwell, I understand the words in the bug14:27
melwittI had thought guests retrieved metadata over http, not the MQ14:27
cfriesenmriedem_afk: I think the idea was to allow a management layer to create a new compute node in the DB so that we can "disable" it, set up system-generated host aggregates, boot the node, do some health checks, then "enable" it once everything is ready.14:27
*** gbarros has quit IRC14:27
dansmithI also understand that I'm going to -2 the code change14:27
dansmithmelwitt: of course they do,14:27
melwittokay. I didn't understand either14:27
dansmithbut they're saying that metadata service then doesn't hit the right db as a result14:28
*** artom has joined #openstack-nova14:28
*** janki has quit IRC14:28
gnuoyyep14:28
melwittoh, I see14:28
*** gbarros has joined #openstack-nova14:28
melwittso, this method, or another one that we missed cell targeting in? https://github.com/openstack/nova/blob/master/nova/api/metadata/base.py#L67714:29
dansmithmelwitt: L69214:30
*** antosh has quit IRC14:30
melwittright14:30
melwittokay14:31
dansmithcommented14:31
gnuoydansmith, I see context.target_cell getting called and adding the cells mq and db endpoints to cctxt14:32
melwittcool thanks14:32
gnuoybut when the request executes those endpoints are ignored14:32
dansmithgnuoy: did you open this bug?14:33
gnuoyI did14:33
dansmithgnuoy: please show some logs and config14:33
*** hongbin has joined #openstack-nova14:33
gnuoysure14:34
dansmithgnuoy: are you running a standalone metadata server with cmd/api_metadata ?14:36
gnuoyI am running a standalone metadata server. I don't follow the second part of the question14:38
dansmithgnuoy: okay I think I see what's going on and why you want to make the change you're making14:39
gnuoyah, cool!14:39
gnuoydansmith, that change was a starter for 10, I'm happy to update it14:39
dansmithgnuoy: so there are two ways to run metadata, either part of the regular api with enabled_apis=14:39
dansmithand then with the standalone cmd/api_metadata thing which just starts up a standalone metadata server14:40
dansmiththe latter always forces the indirection api into place,14:40
dansmithwhich was really intended for the case where you're running metadata on each compute node14:40
gnuoyyep, I saw that14:40
gnuoyah14:40
dansmithfor the global case, you really should be running the regular api server, and just not enable the osapi api if you want to only run metadata14:40
dansmiththat will not install the indirection handler and do proper switching14:41
dansmithand will perform better14:41
gnuoydansmith, ok, I will give that a try, thanks.14:41
dansmithgnuoy: please confirm for us and assuming that shakes out, we should write a doc change to fix this bug14:42
gnuoydansmith, absolutely, thanks14:42
mriedem_afkcfriesen: there is a config option to keep compute services disabled when they are first created14:43
*** mriedem_afk is now known as mriedem14:43
gnuoydansmith, If the indirection api is not used will the metadata service expect to be able to talk directly to the cells individual dbs?14:44
dansmithgnuoy: yeah14:44
gnuoyoh, hmm, ok14:44
dansmithbut it's global, and thus the access pattern will look like all those other global ones14:45
gnuoyright, I see14:45
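Editor's sketch of the behaviour dansmith describes above. This is a toy model and not nova code: the dictionaries and both function names are hypothetical, and it only assumes what the log states, namely that the standalone metadata server routes every query through the indirection API while the regular API server targets the cell the instance lives in.

```python
# Toy model only: none of these names exist in nova.
# Two cell databases, each holding the instances that live in that cell.
CELL_DATABASES = {
    "cell1": {"inst-a": {"hostname": "vm-a"}},
    "cell2": {"inst-b": {"hostname": "vm-b"}},
}

# Global mapping of instance to cell, standing in for the API database.
INSTANCE_MAPPINGS = {"inst-a": "cell1", "inst-b": "cell2"}


def lookup_via_indirection(instance_id, conductor_cell):
    """Standalone mode: every query lands in one conductor's cell database,
    whatever cell the instance is actually mapped to."""
    return CELL_DATABASES[conductor_cell].get(instance_id)


def lookup_with_targeting(instance_id):
    """Regular API server mode: find the instance's cell first, then query
    that cell's database directly."""
    cell = INSTANCE_MAPPINGS[instance_id]
    return CELL_DATABASES[cell].get(instance_id)


if __name__ == "__main__":
    # An instance in cell2 is invisible when all queries go to cell1.
    print(lookup_via_indirection("inst-b", "cell1"))
    print(lookup_with_targeting("inst-b"))
```

In this model the targeted lookup needs direct access to every cell database, which matches dansmith's answer to gnuoy's last question.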
mriedemcfriesen: was a bug reported upstream for this? https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R335914:51
*** janki has joined #openstack-nova14:51
*** tbachman has quit IRC14:52
mriedemseems like something that could be dealt with during _init_instance on restart of the compute service14:54
*** tbachman has joined #openstack-nova14:56
mriedemcfriesen: heh i can get behind this :) https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R421714:57
melwittcan someone sanity check me on this? when configured to use qcow2 images with libvirt, the backing file on each compute host is expected to be raw format (not qcow2) being that each instance created on the compute will be a COW copy of it?14:59
melwittI'm trying to triage this https://bugs.launchpad.net/nova/+bug/177473015:00
openstackLaunchpad bug 1774730 in OpenStack Compute (nova) "Compute node convert qcow2 to raw even if force_raw_images=false" [Undecided,New]15:00
*** cdent has joined #openstack-nova15:01
mriedemcfriesen: i could have sworn we had a patch up for this upstream too https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-516904cc81cade24a9122ecf96707bf0R422615:01
melwittmdbooth: if you're around, question about force_raw_images=False and images_type=qcow2, the backing file is expected to be raw format right? since each instance is a COW copy of it? ^15:03
*** antosh has joined #openstack-nova15:04
mdboothmelwitt: I *think* the backing file is allowed to be qcow2 in that case15:04
mdboothWe do some sanity checking on it when we import it15:04
mdboothSo it's not allowed to have a backing file iirc15:04
melwittmdbooth: oh, hm okay. then we might have a bug there. thank you for the info15:05
mdboothqcow2 can have a backing file of qcow215:05
mdboothmelwitt: Got a link?15:05
melwittmdbooth: yes https://bugs.launchpad.net/nova/+bug/177473015:05
openstackLaunchpad bug 1774730 in OpenStack Compute (nova) "Compute node convert qcow2 to raw even if force_raw_images=false" [Undecided,New]15:05
* mdbooth clicks15:05
*** tbachman has quit IRC15:06
mdboothmelwitt: lemme have a quick dig15:07
cfriesenmriedem: the first one you mentioned (15min ago) was flagged as upstreamable, checking if we ever actually tried15:07
leakypipescfriesen: if the exact same request to placement "works" (i.e. returns >0 results) in one moment, and then "fails" (returns 0 results) a short time after, that isn't a "failure of the service's SLA".15:08
leakypipescfriesen: it's not a failure to return 0 results.15:08
leakypipescfriesen: the capacity to meet that particular request may easily have been exceeded by the first "successful" request's claim of those resources.15:09
leakypipescfriesen: SLAs are for things like "mean time to recover" or "mean time to respond". not for everyday occurrences and normal business of a service.15:10
leakypipescfriesen: for instance, if WRS claimed to its customers that the placement service would always return a result within 50 milliseconds, and placement returned a result in 2 seconds, that would be a failure of the SLA. But if the placement service returns 0 results in 20 milliseconds, that's not a failure of the SLA.15:11
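Editor's sketch of the distinction leakypipes draws in the message above, using the numbers from that message. The function name and return labels are hypothetical, and the 50 ms threshold is only the example figure quoted in the log, not a real placement guarantee.

```python
def classify_response(elapsed_ms, result_count, sla_ms=50):
    """Classify a placement response under the hypothetical response-time SLA.

    Only the response time can break this SLA. An empty result set that
    arrives in time is a normal answer meaning "no capacity right now".
    """
    if elapsed_ms > sla_ms:
        return "sla_violation"
    if result_count == 0:
        return "no_capacity"
    return "ok"


if __name__ == "__main__":
    print(classify_response(2000, 3))  # slow answer, results present
    print(classify_response(20, 0))    # fast answer, zero results
```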
cfriesenleakypipes: doesn't that depend what's in the SLA?  If I'm Netflix, I could have an agreement with Amazon saying that I'll always be able to burst by X additional resources.15:12
leakypipescfriesen: that has nothing to do with the placement service, and you know it. :)15:13
fried_riceoh, this conversation is happening.15:13
fried_riceI just sent this in an email, prettymuch.15:13
cfriesenleakypipes: true, but it's a failure to provide resources that are supposed to be available.  it's not an exception in placement, but it's arguably exceptional for the provider.15:14
cfriesenlike I said in my email, I'm of two minds.15:14
fried_rice^ this exactly.15:14
leakypipescfriesen: who said the resources are "supposed to be available"? that's crazy talk, friend.15:14
cfriesenleakypipes: my hypothetical guarantee to netflix that they can always burst by X15:15
sean-k-mooneycfriesen: isnt that what blazer is for15:15
fried_riceNova certainly isn't in the business of enforcing, or even knowing about SLAs15:15
fried_riceyeah, was gonna say, that's some service way above nova.15:15
leakypipesfried_rice: or *multiple* services (running as a SaaS system or orchestrator of some sort or whatever)15:15
fried_ricefosho15:15
leakypipescfriesen: nobody other than a service provider can or would make such a guarantee.15:16
mdboothmelwitt: fwiw, I can't immediately see how that's possible.15:16
leakypipescfriesen: we aren't a service provider. we're a placement service.15:16
fried_riceWe're talking about FFDC where the second F is only a F from the perspective of something waaay above placement.15:16
cfriesenfried_rice: okay, but now you have those services trying to figure out why the request couldn't be met, and I can see how it would be nice to have an exception object with nice logs in it rather than sending an operator digging through logs.15:16
melwittmdbooth: okay, that is odd. thank you for taking a look15:16
leakypipescfriesen, fried_rice: move this to #openstack-placement before we get told off...15:17
fried_riceyuh, swhy I didn't notice the conversation until after I had sent my email.15:17
mdboothmelwitt: I'd definitely want to see logs and config. Immediate suspect is conf error.15:17
melwittmdbooth: okay, that's helpful. I'll ask the reporter for more info15:18
melwittthanks15:18
openstackgerritMatt Riedemann proposed openstack/nova master: Avoid live migrate to same host  https://review.openstack.org/54268915:24
*** pooja_jadhav has quit IRC15:25
*** amarao has quit IRC15:27
openstackgerritChris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process  https://review.openstack.org/58408615:28
cfriesenmriedem: I don't see an upstream bug report.  probably just got missed, so I opened one.  https://bugs.launchpad.net/nova/+bug/178527015:28
openstackLaunchpad bug 1785270 in OpenStack Compute (nova) "allow confirmation of resize/migration for migrations in "confirming" status" [Undecided,New]15:28
*** tbachman has joined #openstack-nova15:29
*** janki has quit IRC15:29
mriedemdanke15:33
cfriesenmriedem: I think we now have the ability to set RUN_ON_REBUILD to enforce validating the image on rebuild.  I'm not aware of a similar thing to enforce always going through the scheduler for live migration, though I think I talked about it with dansmith.15:34
cfriesengotta step out for a bit....back later.15:36
*** cfriesen is now known as cfriesen_afk15:36
mriedemcfriesen_afk: i see you guys removed the force flag for live migrate so you can't do that, which means you'd go through the scheduler, but that *doesnt* apply to live migrations before the microversion that added the force flag because in those cases, simply specifying a host bypasses the scheduler15:38
mriedemRUN_ON_REBUILD is only b/c we don't actually move hosts on rebuild15:38
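Editor's sketch of the rule mriedem states for live migration. The function is hypothetical and not nova code. The log does not say which microversion added the force flag; the value 2.30 below is the editor's assumption and should be checked against the compute API microversion history.

```python
# Assumption (not stated in the log): the microversion that added the
# "force" flag to the live-migrate action.
FORCE_FLAG_MICROVERSION = (2, 30)


def bypasses_scheduler(microversion, host=None, force=False):
    """Return True if a live-migrate request would skip the scheduler.

    microversion is a (major, minor) tuple, for example (2, 25).
    """
    if host is None:
        # No target host requested: the scheduler always picks one.
        return False
    if microversion < FORCE_FLAG_MICROVERSION:
        # Older requests: simply naming a host bypasses the scheduler.
        return True
    # Newer requests: the host is validated unless force was requested.
    return force


if __name__ == "__main__":
    print(bypasses_scheduler((2, 25), host="compute-2"))
    print(bypasses_scheduler((2, 30), host="compute-2"))
    print(bypasses_scheduler((2, 30), host="compute-2", force=True))
```

Removing the force flag, as the StarlingX change does, only closes the last branch; the older-microversion branch still bypasses the scheduler, which is the gap mriedem points out.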
openstackgerritMatt Riedemann proposed openstack/nova master: Deprecate upgrade_levels options for deprecated/removed services  https://review.openstack.org/58860715:41
mriedemdansmith: might want to get that into rocky ^15:41
mriedemto start the timer15:41
mriedemmelwitt: ^ you too given nova-consoleauth15:41
*** Shilpa has quit IRC15:43
*** itlinux_ has quit IRC15:43
melwittack15:43
dansmithyar15:46
*** itlinux has joined #openstack-nova15:49
melwittmriedem: TYPO15:49
melwittin the reno15:50
dansmithWUT? NO.15:50
melwittYUH HUH15:50
dansmithzomg15:50
dansmithlet it be known on the third day of august, the year of our lord 2018...15:50
* dansmith thinks saying "the year of our lord" legals-up the verbiage15:51
openstackgerritMatt Riedemann proposed openstack/nova master: Deprecate upgrade_levels options for deprecated/removed services  https://review.openstack.org/58860715:53
* mriedem finds the tc-approved code review etiquette guidelines15:53
mriedemdansmith: you want to hit jichen's change below that also?15:53
dansmithmeh15:55
mriedemi thought it was useful b/c at least one person thought "auto" could be applied to all15:56
dansmithmeh15:56
mriedemMEH?!15:57
dansmithMEH.15:57
openstackgerritMatt Riedemann proposed openstack/nova master: Reno for notification-transformation-rocky  https://review.openstack.org/58840315:57
mriedemso i guess +r has killed casual nick friday?15:57
*** ccamacho has quit IRC15:57
dansmithno15:57
dansmithit has killed other things, but not casual friday15:57
*** mriedem is now known as hansmoleman15:57
-openstackstatus- NOTICE: The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result.16:03
leakypipessean-k-mooney: I'm good with https://review.openstack.org/#/c/587378 going into os-vif. I will leave it to the submitter to try and get a stable branch of nova to bring it in via requirements.txt though (0.00001% chance of that happening)16:03
*** tssurya has quit IRC16:05
*** ccamacho has joined #openstack-nova16:09
*** gyee has joined #openstack-nova16:09
fried_ricefinucannot: If [testenv] defines `commands` and [myenv] defines `commands`, does [myenv] actually execute testenv.commands + myenv.commands??16:10
finucannotfried_rice: No, [myenv] commands will override [testenv] commands16:10
fried_ricefinucannot: So that's what I thought, but it's not the behavior I'm seeing :(16:11
finucannotGot a paste?16:11
hansmolemancfriesen_afk: found it https://review.openstack.org/#/c/401009/16:12
fried_ricefinucannot: agh, ignore me, pebcak16:13
*** melwitt is now known as jgwentworth16:14
fried_ricejgwentworth: How did you manage that? Got password?16:15
jgwentworthyeah16:15
*** gbarros has quit IRC16:16
*** gbarros has joined #openstack-nova16:18
*** fried_rice is now known as fried_rolls16:19
*** udesale has quit IRC16:20
*** gbarros has quit IRC16:25
*** rmart04 has left #openstack-nova16:26
*** gbarros has joined #openstack-nova16:28
cdenthave I got names right? hansmoleman is matt, jgwentworth is mel?16:29
jgwentworthcorrect16:29
*** psachin has joined #openstack-nova16:29
*** imacdonn has quit IRC16:37
*** imacdonn has joined #openstack-nova16:37
leakypipesjgwentworth: weird... why do we have stable versions of dependent libraries when we follow a semver release model?16:40
mdboothdansmith: For interest, the fun LM bug we discussed earlier was sorta filed before https://bugs.launchpad.net/nova/+bug/1628606 . I don't think we appreciated the potential for the thing ending up running in 2 places at once, though.16:44
openstackLaunchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,Confirmed]16:44
*** jpena is now known as jpena|off16:44
openstackgerritMatt Riedemann proposed openstack/nova master: Remove unused flavor_delete_info() method  https://review.openstack.org/58862116:45
*** openstackgerrit has quit IRC16:49
leakypipesjgwentworth: I suppose so we can backport stuff...16:50
leakypipes(shows you how much I keep up with stable stuff...)16:50
mdboothI just added a comment to https://bugs.launchpad.net/nova/+bug/1628606 . I think it's pretty serious.16:53
openstackLaunchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,Confirmed]16:53
mdboothI can't update the importance, though, not that it matters all that much I guess.16:53
*** tbachman has quit IRC16:55
*** openstackgerrit has joined #openstack-nova16:56
openstackgerritLee Yarwood proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow  https://review.openstack.org/58701316:56
openstackgerritLee Yarwood proposed openstack/nova master: Add regression test for bug#1784353  https://review.openstack.org/58701416:56
openstackgerritLee Yarwood proposed openstack/nova master: compute: Recreate volume attachments during a reschedule  https://review.openstack.org/58707116:56
*** derekh is now known as derekh_afk17:00
*** antosh has quit IRC17:03
hansmolemanlyarwood: did you talk to dansmith at all about re-creating the volume attachment record in conductor vs compute when rescheduling? ^17:03
hansmolemanhttps://review.openstack.org/#/c/587071/3/nova/compute/manager.py@161117:04
openstackgerritMatt Riedemann proposed openstack/nova master: Fix host validity check for live-migration  https://review.openstack.org/40100917:05
jgwentworthleakypipes: yeah, I don't know the nitty gritty on how or why it works but yeah, if we backport fixes to stable and release a new lib version from stable, people can receive fixes via the upper-constraints requirement bump if they're on an older release. I guess maybe it's more an artifact of how the deployment tools usually work? not sure17:07
lyarwoodhansmoleman: I did not, just had 30 mins to work on this now at the end of the day17:07
jgwentworthmdbooth: will take a look. fyi to update importance of a bug, all you have to do is join the bugs team (open team) https://launchpad.net/~nova-bugs17:09
jgwentworthbeing a team member gives permission to update importance17:09
*** mdbooth has quit IRC17:09
jgwentworthbug from 2016, still a thing? yeesh17:10
*** amarao has joined #openstack-nova17:15
*** dtantsur is now known as dtantsur|afk17:20
*** amarao has quit IRC17:23
*** gouthamr has quit IRC17:23
*** tbachman has joined #openstack-nova17:24
hansmolemancfriesen_afk: it's interesting you added this _await_volume_detached thing in compute https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R2649 - i thought os-detach in cinder was a synchronous operation, so what issues were you hitting that required that waiter?17:25
*** cfriesen_afk is now known as cfriesen17:25
*** erlon has quit IRC17:26
*** gouthamr has joined #openstack-nova17:26
hansmolemanunless maybe you made detach async on the cinder side...?17:26
*** tbachman_ has joined #openstack-nova17:27
cfriesenhansmoleman: Isn't https://review.openstack.org/#/c/401009/ just about checking whether the specified host exists, not whether it satisfies filters?17:27
*** tbachman has quit IRC17:28
*** tbachman_ is now known as tbachman17:28
hansmolemancfriesen: correct, but it does it before changing the instance task_state to 'migrating'17:29
-openstackstatus- NOTICE: Project renames and review.openstack.org downtime are complete without any major issue.17:29
hansmolemanas opposed to where it was before https://review.openstack.org/#/c/401009/13/nova/compute/api.py@a436917:29
hansmolemanif we failed at ^ we'd leave the instance task_state stuck in 'migrating'17:29
hansmolemanthe scheduler (or conductor in the case of force) if the specific host is valid17:29
*** psachin has quit IRC17:29
hansmolemanseems like this should go upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R324517:30
hansmolemanfor https://review.openstack.org/#/c/401009/ we should really probably just have a @reverts_task_state decorator in compute api like we do in the manger17:31
hansmoleman*manager17:31
*** gouthamr has quit IRC17:33
cfriesenhansmoleman: about the live migration stuff...we wanted to ensure that any operation involving node selection actually went through the scheduler17:36
hansmolemanyeah so for evacuate,17:36
hansmolemanhttps://github.com/openstack/nova/blob/3ac6deb94c5a07e42611158a680bd26febe79d6d/nova/compute/manager.py#L317017:36
hansmolemancfriesen: yeah i realize - that's not what that bug fix above is about17:37
hansmolemanfor evacuate we'll update the port's binding profile to set the migrating_to attr (which is really supposed to only be for live migration with DVR i think...),17:37
hansmolemanheh and then we'll wipe that out in _update_port_binding_for_instance immediately after17:39
hansmolemanyeah so it probably makes more sense to get the refreshed nw info during evacuate since we update the port binding for the dest host right before17:40
hansmolemanhttps://github.com/openstack/nova/blob/3ac6deb94c5a07e42611158a680bd26febe79d6d/nova/compute/manager.py#L318217:40
cfriesenthe _await_volume_detached() thing was actually grabbed from https://bugs.launchpad.net/nova/+bug/152762317:40
openstackLaunchpad bug 1527623 in OpenStack Compute (nova) "Nova might orphan volumes when it's racing to delete a volume-backed instance" [Medium,In progress] - Assigned to ChangBo Guo(gcb) (glongwave)17:40
*** hemna_ has quit IRC17:41
hansmolemani think that might be old / bogus now,17:43
hansmolemanseeing that in CI logs now, it's hitting on TestVolumeBootPattern,17:43
hansmolemanand i think the race is that tempest doesn't wait to cleanup the volume snapshots first17:43
hansmolemanbefore deleting the server17:43
hansmolemani thought jgwentworth had a patch for that17:43
hansmolemanlooking at the cinder API, os-detach is synchronous, it's an rpc call from volume api to volume manager17:43
jgwentworthsounds familiar. let me check17:44
jgwentworththe one I'm thinking of is this, not sure if that's the same thing you're talking about https://review.openstack.org/57133617:45
jgwentworthno, mine is about test_volume_backup17:45
jgwentworthI had a different one that got merged, let me find that17:46
cfriesenhansmoleman: did cinder change os-detach from async to sync?  your bug report said it was async.17:47
hansmolemancfriesen: not sure, but that might have been faulty triage by me at the time, not sure17:47
hansmolemanit's an old bug17:47
hansmolemani might have assumed it was async b/c it's async in nova17:47
hansmolemanbut lots of the cinder api is synchronous17:47
cfriesenI'm going to have to go through irc history and start opening starlingx storyboard bugs. :)17:48
jgwentworththis one https://review.openstack.org/#/c/565601/8/tempest/scenario/test_volume_boot_pattern.py but I didn't change anything about deletion of the server. because for rbd, you have to delete the server first before you delete the volume, the server booted from the volume is dependent on the volume17:49
hansmolemanugh17:49
jgwentworthis it backward for non-rbd perhaps?17:49
hansmolemanhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Failed%20to%20delete%20volume%5C%22%20AND%20message%3A%5C%22due%20to%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d shows up in non-ceph jobs though17:50
jgwentworthI see. my brain is getting confused trying to think of how the dependency works for non-ceph, but it might be the case that the order has to be different depending on ceph vs non-ceph17:50
jgwentworthI can't remember it now but I thought when I tried to delete the volume snapshot first, the test failed17:51
* jgwentworth looks at the previous PS's17:51
hansmolemanyes you said that in there,17:52
hansmoleman"Hm, this actually makes the previously passing ceph job fail. And thinking about it, this is backwards -- we shouldn't delete the volume snapshot first if the created volume depends on it. The created volume should be deleted first, and then that would allow the volume snapshot to be deleted (in the case of ceph)."17:52
jgwentworthah, only failed for ceph17:52
jgwentworthokay, so do we need different behavior depending on whether it's ceph?17:52
hansmolemanthe issue is that we have to delete the volume snapshot before attempting to delete the volume, and nova-compute deletes the volume,17:53
hansmolemanwas there an issue with trying to just delete the volume snapshot before the server with ceph?17:53
hansmolemanhttps://review.openstack.org/#/c/565601/5 is what i was +1 on17:54
hansmolemanand you said that made ceph fail?17:54
jgwentworthyes, it failed every time with ceph. so I thought maybe the comment was backward17:54
hansmolemanceph is backward17:54
hansmolemanthe cinder api shouldn't really behave differently depending on the volume type17:54
hansmolemanotherwise client side tooling would always need a "if volume_type=='rbd'" condition17:55
jgwentworthit has something to do with the references in ceph. trying to see what it was exactly17:55
jgwentworthyeah17:55
hansmolemanidk how you delete a ceph volume that has snapshots then,17:57
hansmolemanbecause if you can't remove the snapshots until the volume is gone,17:57
hansmolemanand you can't delete the volume while it has snapshots,17:57
jgwentworthyou have to delete the servers that reference it first, I think17:57
hansmolemanthen wtf17:57
hansmolemanugh17:57
hansmolemanso for rbd volume-backed servers with snapshots, you just always orphan the volumes?17:57
jgwentworthwell, in the case of the test, what happens is the server is deleted and then nova deletes the volume snapshot after17:58
hansmolemannova doesn't delete volume snapshots17:59
hansmolemannova-compute will attempt to delete the volume because bdm.delete_on_termination,17:59
jgwentworthI mean, the virt domain gets destroyed and then compute deletes the underlying volume if delete_on_termination=True17:59
hansmolemanwhich fails if the volume has snapshots17:59
*** Kevin_Zheng has quit IRC17:59
jgwentworthokay, I think I'm lacking on knowledge of volume snapshots. let me dig into how this works (non-ceph vs ceph) and propose a change to put the comment back and explain the ceph part18:01
*** sambetts is now known as sambetts|afk18:02
jgwentworthI didn't realize they were different and thought I had made a mistake with trying to delete the snapshot first because that change failed the ceph job, whereas leaving it alone had both jobs passing18:02
*** ccamacho has quit IRC18:04
*** ccamacho has joined #openstack-nova18:05
*** ccamacho has quit IRC18:09
*** openstackgerrit has quit IRC18:19
*** derekh_afk has quit IRC18:21
*** gbarros has quit IRC18:23
*** gbarros has joined #openstack-nova18:26
*** gbarros has quit IRC18:31
*** gbarros has joined #openstack-nova18:41
*** hemna_ has joined #openstack-nova18:48
*** openstackgerrit has joined #openstack-nova18:49
openstackgerritMerged openstack/nova master: Reno for notification-transformation-rocky  https://review.openstack.org/58840318:49
*** gbarros has quit IRC18:51
*** gbarros has joined #openstack-nova18:52
openstackgerritChris Dent proposed openstack/nova master: [placement] ensure_rc_cache only at start of process  https://review.openstack.org/58408618:55
*** gouthamr has joined #openstack-nova19:04
hansmolemancfriesen: what does the VIM do for a "crashed" instance in order to recover it?19:07
*** gbarros has quit IRC19:11
hansmolemanseems this could go upstream in some form https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R7696 - looks like it cleans up old files when a previously evacuated source host comes back online and the instances are off it now (and were being resized when the source was evacuated?)19:13
*** derekh has joined #openstack-nova19:13
*** derekh has quit IRC19:15
hansmolemani thought we already had something like this upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-77f9348ab09642ba46409b6828af4af0R796719:16
openstackgerritMerged openstack/nova master: [placement] Move resource_class_cache into placement hierarchy  https://review.openstack.org/58408519:17
*** fried_rolls is now known as fried_rice19:18
cfriesenhansmoleman: for a crashed instance it just does a "stop/start" sequence I think.  More generally for an ERROR instance it will cycle through gradually more extreme options (reboot, rebuild, evacuate, etc.) depending on the node state19:19
hansmolemanok and looking at the _cleanup_running_orphan_instances periodic, and the upstream ComputeManager.init_host, it looks like we don't have something for that,19:20
hansmolemanb/c on compute startup we'll only work with instances still in the db and only destroy guests from the hypervisor that have been evacuated to another host19:20
hansmolemanso i guess in that case, the compute host went down when the user deleted the instance from the db19:21
hansmolemanthen the compute comes back up and the instance isn't in the db but it's consuming resources on the hypervisor19:21
cfriesenhansmoleman: I think we also got running orphan instance from things like failures during migration19:23
hansmolemanso https://bugs.launchpad.net/nova/+bug/128500019:24
openstackLaunchpad bug 1285000 in OpenStack Compute (nova) pike "instance data resides on destination node when vm is deleted during live-migration" [Medium,Fix released] - Assigned to Maciej Jozefczyk (maciej.jozefczyk)19:24
cfriesenthat'd be one possibility.  you could also get something like what mdbooth commented on where a live migration never runs "post live migration at destination" and the system gets into a weird state19:26
cfriesenthe orphan audit dates from havana, when things were not quite as robust as they are now19:28
hansmolemanack19:30
*** cdent has quit IRC19:35
hansmolemancfriesen: interesting https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-afb9c0c0ca5276c7eacd987bbf51d8e6R44719:39
hansmolemandoes upstream retrieve the volume_image_metadata properly for volume-backed scheduling with things like the NUMATopologyFilter?19:39
hansmolemanlooks like we should, compute API _get_bdm_image_metadata19:41
hansmolemangets the volume image metadata from the volume19:41
hansmolemanhttps://bugs.launchpad.net/nova/+bug/178531819:53
openstackLaunchpad bug 1785318 in OpenStack Compute (nova) "evacuate rebuild claim will not use any image_meta for volume-backed instances" [Medium,Triaged]19:53
*** mdrabe has quit IRC19:54
*** mdrabe has joined #openstack-nova19:56
*** gouthamr has quit IRC20:00
openstackgerritMatt Riedemann proposed openstack/nova master: Remove old check_attach version check in API  https://review.openstack.org/58834820:05
*** artom has quit IRC20:07
hansmolemanfried_rice: ^ updated the commit message on that for the del instance.id thing in the tests20:09
hansmolemanalso updated some really really old comment20:09
cfriesenhansmoleman: thanks for opening that bug report.  I believe we have evacuate working reliably with NumaTopology.20:14
hansmolemanyeah i don't see why it wouldn't20:14
hansmolemani was thinking of the isolated hosts filter,20:14
hansmolemanwhich relies on the image id which we don't have in the request spec for bfv20:14
hansmolemanhttps://review.openstack.org/#/c/543263/20:15
cfriesenhansmoleman: there will likely be other cases where we fixed something and then it's been fixed upstream and we ported the change without retesting against upstream first.20:15
hansmoleman^ still not sure *why* or if it was intentional that we don't store the RequestSpec.image.id for bfv instances20:15
hansmolemanwe also just don't really test the isolated hosts filter20:16
*** harlowja has joined #openstack-nova20:21
*** gbarros has joined #openstack-nova20:35
hansmolemancfriesen: i'll have a patch up for that shortly, slightly different than what's in starlingx20:48
*** beekneemech is now known as bnemec-pto20:55
openstackgerritMatt Riedemann proposed openstack/nova master: Fix image-defined numa claims during evacuate  https://review.openstack.org/58865720:59
*** awaugama has quit IRC21:02
*** gbarros has quit IRC21:02
openstackgerritEric Fried proposed openstack/nova master: Remove redundant _update()s  https://review.openstack.org/58809121:22
*** avolkov has quit IRC21:26
openstackgerritMatt Riedemann proposed openstack/nova master: Optimize AZ lookup during schedule_and_build_instances  https://review.openstack.org/58866521:29
*** gbarros has joined #openstack-nova21:33
hansmolemandansmith: looks like another place for the long rpc timeout https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-2a50b2dbeb123b515ebb4b917ae1cb2bR75121:37
*** pcaruana has quit IRC21:38
*** gbarros has quit IRC21:43
openstackgerritMatt Riedemann proposed openstack/nova master: Use CONF.long_rpc_timeout in post_live_migration_at_destination  https://review.openstack.org/58866821:47
*** gouthamr has joined #openstack-nova21:47
hansmolemancfriesen: i thought the api change to return the server group that each server is in was kind of interesting,21:50
hansmolemanbut that kind of sucks for performance if you're listing servers with details for 1000 servers21:51
hansmolemansince it's an api db query per instance21:51
hansmolemanalternatively that could be done by storing the instance group info in instance_extra with the instance,21:51
hansmolemanor adding a member filter to GET /os-server-groups21:51
hansmolemanso you could just get server groups by a given member server21:51
hansmolemanthat list would always return at most 1 since a server can't be in more than one group21:52
cfriesenhansmoleman: that implementation was driven partly by trying to make it as easy to port as possible.21:55
hansmolemanyeah i get that21:56
hansmolemani think the server group is only returned if the wrs-header is present too...21:56
cfriesenyes21:56
hansmolemanwas also thinking we could do like a GET /servers/{id}/group but that is pretty boring, it would just return the server group subresource21:57
hansmolemananywho21:57
cfriesenheading out?21:59
hansmolemanjgwentworth: on https://bugs.launchpad.net/nova/+bug/1781286 i just remembered,21:59
openstackLaunchpad bug 1781286 in OpenStack Compute (nova) "CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call" [Medium,Triaged]21:59
hansmolemancfriesen: of course not21:59
hansmolemanjgwentworth: one way i thought about fixing that was adding a different migrate_server (or whatever it's called) method in conductor that's not trying to target a cell,21:59
hansmolemanbecause that's how build_resources works, it's not using the @targets_cell decorator because it's called from the api for scheduling and from the compute for reschedules21:59
hansmolemanso really we should do similar for rescheduling a resize/cold migrate22:00
cfriesenhansmoleman: as a heads-up, I'm on vacation next week.  you'll have to save up your questions. :)22:00
hansmolemancfriesen: damn22:00
hansmolemani'm leaving for china on friday22:00
cfriesenyou can try asking Dean22:00
hansmolemangiven i'm gotten through the api extension and nova/compute/* changes so far, i think i'm pretty good22:01
hansmolemani expect some scheduler stuff but mostly linked to what i've already seen22:01
hansmoleman*i've22:01
cfriesenthere are some changes around the server group affinity to close some races in the scheduler22:01
hansmolemanjgwentworth: reason i bring that up now, is if we fixed the bug that way, it's a new rpc api method which isn't backportable,22:02
hansmolemanso we'd have to get that done before rc if we want it in rocky22:02
hansmolemani also thought about putting the az per host on the Selection object that we pass down for alternates,22:04
hansmolemanbut that's an rpc api version bump on the Selection object, so again, not backportable22:04
hansmolemanew it's actually 2 separate but very similar bugs and we'd need both22:06
hansmoleman@targets_cell looks up the instance mapping which fails if the cell conductor isn't configured with the api db22:06
hansmolemanlooking up the host az fails if not configured for the api db22:07
*** Guest43093 is now known as amrith22:12
hansmolemanmnaser: i'm assuming you have [api_database]/connection configured in your cell conductors? or are you just using a single global conductor?22:14
* mnaser has to whois for a second22:15
mnasers/has/had/22:15
mnaserhansmoleman: right now just a global conductor but we'll be switching to cell conductor soon22:15
*** hongbin has quit IRC22:19
hansmolemanjgwentworth: just added https://blueprints.launchpad.net/nova/+spec/fix-reschedule-up-calls for stein and to the ptg etherpad; i think it's too much change at this point to try and rush a fix for rocky22:33
hansmolemanespecially given it's been broken since pike22:34
hansmolemancfriesen: hmm looks like a bug upstream https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-378a96ec6159d0a2f8ec7ab71bc3843bR921 - i know we change the request spec on resize, but do we revert the request_spec.flavor on revert resize?22:36
hansmolemananother good reason why we can't trust the request spec a lot of the time..22:37
openstackgerritMerged openstack/os-vif master: Support for OVS DB TCP socket communication.  https://review.openstack.org/58737822:41
*** derekh has joined #openstack-nova22:43
*** nicolasbock has quit IRC22:43
hansmolemanhttps://bugs.launchpad.net/nova/+bug/178533922:45
openstackLaunchpad bug 1785339 in OpenStack Compute (nova) "RequestSpec.flavor is not reverted on resize revert" [Medium,Triaged]22:45
hansmolemanbusted since newton22:45
*** mschuppert has quit IRC23:03
*** derekh has quit IRC23:08
openstackgerritMatt Riedemann proposed openstack/nova master: Update RequestSpec.flavor on resize_revert  https://review.openstack.org/58868923:14
openstackgerritMerged openstack/nova master: Remove unused flavor_delete_info() method  https://review.openstack.org/58862123:18
hansmolemancfriesen: i see what you mean about affinity races https://github.com/starlingx-staging/stx-nova/commit/71acfeae0d1c59fdc77704527d763bd85a276f9a#diff-d29c9372baf108a281712642550918dcR8823:24
hansmolemanshould probably report a bug for that23:24
hansmolemansince we don't do the late anti-affinity check on compute like we do for server create and evacuate (which would be up-calls for cells v2 now too)23:25
hansmolemanoh heh https://bugs.launchpad.net/nova/+bug/160025123:25
openstackLaunchpad bug 1600251 in OpenStack Compute (nova) "live migration does not honor server group policy" [High,Confirmed]23:25
*** tetsuro_ has joined #openstack-nova23:29
hansmolemani think part of that is fixed with https://review.openstack.org/#/c/527799/23:30
hansmolemanbut not sure where we re-calculate the group members prior to scheduling23:31
*** harlowja has quit IRC23:48
