Thursday, 2020-02-13

*** Liang__ has joined #openstack-nova00:02
*** nicolasbock has quit IRC00:03
openstackgerritSundar Nadathur proposed openstack/nova master: Create and bind Cyborg ARQs.  https://review.opendev.org/63124400:05
openstackgerritSundar Nadathur proposed openstack/nova master: Pass accelerator requests to each virt driver from compute manager.  https://review.opendev.org/69858100:05
openstackgerritSundar Nadathur proposed openstack/nova master: Compose accelerator PCI devices into domain XML in libvirt driver.  https://review.opendev.org/63124500:05
openstackgerritSundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted.  https://review.opendev.org/67373500:05
openstackgerritSundar Nadathur proposed openstack/nova master: Enable hard/soft reboot with accelerators.  https://review.opendev.org/69794000:05
openstackgerritSundar Nadathur proposed openstack/nova master: Enable start/stop of instances with accelerators.  https://review.opendev.org/69955300:05
openstackgerritSundar Nadathur proposed openstack/nova master: Enable and use COMPUTE_ACCELERATORS trait.  https://review.opendev.org/69955400:05
openstackgerritSundar Nadathur proposed openstack/nova master: Bump compute rpcapi version and reduce Cyborg calls.  https://review.opendev.org/70422700:05
openstackgerritSundar Nadathur proposed openstack/nova master: Add cyborg tempest job.  https://review.opendev.org/67099900:05
*** liuyulong has quit IRC00:11
*** migawa|lunch is now known as migawa|AFK00:19
*** xiaolin has joined #openstack-nova00:24
*** samc-bbc has quit IRC00:26
*** samc-bbc has joined #openstack-nova00:27
*** tetsuro has quit IRC00:33
*** tetsuro_ has joined #openstack-nova00:33
*** gyee has quit IRC00:50
*** mlavalle has quit IRC01:17
*** tbachman has joined #openstack-nova01:32
*** ileixe has joined #openstack-nova01:43
*** tbachman has quit IRC02:06
*** zhanglong has joined #openstack-nova02:15
*** dave-mccowan has joined #openstack-nova02:23
*** zhanglong has quit IRC02:27
*** zhanglong has joined #openstack-nova02:31
*** lbragstad has quit IRC02:37
*** zhanglong has quit IRC02:45
*** zhanglong has joined #openstack-nova02:47
*** vishalmanchanda has joined #openstack-nova02:59
*** brinzhang has joined #openstack-nova03:05
*** migawa|AFK is now known as migawa03:05
*** mkrai has joined #openstack-nova03:15
*** zhanglong has quit IRC03:17
*** spatel has joined #openstack-nova03:39
*** spatel has quit IRC03:44
*** tetsuro_ has quit IRC04:03
*** tetsuro has joined #openstack-nova04:04
*** udesale has joined #openstack-nova04:07
*** migawa is now known as migawa|lunch|AFK04:18
*** igordc has joined #openstack-nova04:37
*** ileixe has quit IRC04:42
*** igordc has quit IRC04:48
*** igordc has joined #openstack-nova04:48
*** igordc has quit IRC04:48
*** migawa|lunch|AFK is now known as migawa|lunch04:52
*** ileixe has joined #openstack-nova04:52
*** yaawang has joined #openstack-nova05:13
*** Liang__ has quit IRC05:28
*** Liang__ has joined #openstack-nova05:30
*** evrardjp has quit IRC05:34
*** evrardjp has joined #openstack-nova05:34
*** yaawang has quit IRC05:50
*** spatel has joined #openstack-nova06:00
*** spatel has quit IRC06:05
*** zhanglong has joined #openstack-nova06:12
*** migawa|lunch is now known as migawa|AFK06:24
*** migawa|AFK is now known as migawa06:25
*** ccamacho has quit IRC06:39
*** tetsuro has quit IRC06:51
*** tetsuro_ has joined #openstack-nova06:51
*** spatel has joined #openstack-nova06:55
*** spatel has quit IRC07:00
*** ociuhandu has joined #openstack-nova07:30
*** ociuhandu has quit IRC07:35
*** imacdonn has quit IRC07:54
*** imacdonn has joined #openstack-nova07:54
*** maciejjozefczyk has joined #openstack-nova07:57
*** kevinz has joined #openstack-nova08:08
*** slaweq has joined #openstack-nova08:10
*** ccamacho has joined #openstack-nova08:13
kevinzHi Nova, Linaro has already donated some machines to the Infra team, and the nodes are ready in nodepool. We'd like to enable Arm64 CI for Nova (libvirt driver). Are there any guidelines for me to get involved?08:15
*** ralonsoh has joined #openstack-nova08:18
*** tosky has joined #openstack-nova08:19
*** tesseract has joined #openstack-nova08:20
kevinzcreated a bug here to track this: https://bugs.launchpad.net/nova/+bug/186305808:21
openstackLaunchpad bug 1863058 in OpenStack Compute (nova) "Arm64 CI for Nova" [Undecided,New]08:21
*** ivve has joined #openstack-nova08:24
*** ivve has quit IRC08:27
*** tkajinam has quit IRC08:29
*** amoralej|off is now known as amoralej08:29
*** huaqiang has quit IRC08:30
lyarwoodkevinz: I guess if these hosts are already in nodepool then we can use them directly without third party CI?08:49
kevinzlyarwood: hi, yes these machines are now in nodepool.08:49
kevinzlyarwood: https://opendev.org/openstack/project-config/src/branch/master/nodepool/nl03.openstack.org.yaml#L41408:50
kevinzAnd for arm64 nodes, there is another pipeline https://review.opendev.org/#/c/69860608:51
kevinzdue to the lack of nodes08:51
kevinzcalled "check-arm64"08:51
lyarwoodhttps://github.com/openstack/project-config/blob/b393e477951ba3a38c63565c0824f4cf95ae292d/zuul.d/pipelines.yaml#L349-L376 - yeah just found that08:54
lyarwoodso I assume we'd need to add that pipeline alongside check etc in .zuul.yaml - https://github.com/openstack/nova/blob/554a6ffa837ba915c06c8ae70c339e911c9c9303/.zuul.yaml#L213-L33608:55
lyarwoodand define some jobs08:55
lyarwoodkevinz: I'd raise this on the weekly meeting later today if you're around.08:56
lyarwoodor the ML if you're not08:57
kevinzlyarwood: Thanks, We can talk via ML first. as today's meeting is quite early for me :D08:58
kevinzwe should define some jobs for this CI08:58
kevinzlyarwood: The meeting next week (UTC 14) works for me09:00
lyarwoodkevinz: ack understood, well in that case feel free to propose a change and we can always talk about things directly there as well :)09:01
kevinzlyarwood: no problem, I will. Thanks a lot09:01
*** tetsuro_ has quit IRC09:03
lyarwoodnp :)09:04
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Always provide the size in bytes when calling virDomainBlockResize  https://review.opendev.org/70759009:04
openstackgerritLee Yarwood proposed openstack/nova master: images: Remove Libvirt specific configurable use from qemu_img_info  https://review.opendev.org/70759109:04
*** tetsuro has joined #openstack-nova09:07
*** huaqiang has joined #openstack-nova09:14
openstackgerritLee Yarwood proposed openstack/nova master: DNM - Test TEMPEST_EXTEND_ATTACHED_ENCRYPTED_VOLUME  https://review.opendev.org/70759309:21
*** derekh has joined #openstack-nova09:29
*** bbowen has quit IRC09:34
*** bbowen has joined #openstack-nova09:35
*** ociuhandu has joined #openstack-nova09:37
*** ociuhandu has quit IRC09:40
*** ileixe has quit IRC09:52
*** ociuhandu has joined #openstack-nova09:53
*** ileixe has joined #openstack-nova09:57
huaqiangstephenfin: sean-k-mooney: alex_xu: Thanks for the review. I have several things I need to discuss further with you when you are around.10:00
*** ociuhandu has quit IRC10:01
*** ociuhandu has joined #openstack-nova10:01
*** ociuhandu has quit IRC10:02
*** ociuhandu has joined #openstack-nova10:02
*** ociuhandu has quit IRC10:05
*** dtantsur|afk is now known as dtantsur10:05
*** ociuhandu has joined #openstack-nova10:05
*** bbowen has quit IRC10:11
*** bbowen has joined #openstack-nova10:11
*** ociuhandu has quit IRC10:13
*** xiaolin has quit IRC10:13
*** ileixe has quit IRC10:16
huaqiangsean-k-mooney: for the mixed instance spec, in the cpu policy matrix, when neither 'hw:cpu_policy' nor 'hw_cpu_policy' is defined, I think the final result should not be 'shared', which is what you suggested in your review.10:16
huaqiangbecause in this case, say again, no 'hw_cpu_policy' in image property and no 'hw:cpu_policy' in flavor extra specs,10:17
*** ileixe has joined #openstack-nova10:17
huaqiangthe final instance CPU allocation policy is determined by 'resources:(P|V)CPU'10:18
huaqiangit might be 'dedicated' or 'mixed'10:18
*** ileixe has quit IRC10:21
openstackgerritSundar Nadathur proposed openstack/nova master: Delete ARQs for an instance when the instance is deleted.  https://review.opendev.org/67373510:37
openstackgerritSundar Nadathur proposed openstack/nova master: Enable hard/soft reboot with accelerators.  https://review.opendev.org/69794010:37
openstackgerritSundar Nadathur proposed openstack/nova master: Enable start/stop of instances with accelerators.  https://review.opendev.org/69955310:37
openstackgerritSundar Nadathur proposed openstack/nova master: Enable and use COMPUTE_ACCELERATORS trait.  https://review.opendev.org/69955410:37
openstackgerritSundar Nadathur proposed openstack/nova master: Bump compute rpcapi version and reduce Cyborg calls.  https://review.opendev.org/70422710:37
openstackgerritSundar Nadathur proposed openstack/nova master: Add cyborg tempest job.  https://review.opendev.org/67099910:37
bauzasstephenfin: sean-k-mooney: thanks for the comments on https://review.opendev.org/#/c/552924/, I'll try to update the spec by today around 2pm CET10:39
*** spatel has joined #openstack-nova10:41
*** spatel has quit IRC10:46
alex_xustephenfin: I replied about the image metadata, but we don't have any specific use case for image metadata. I'm just thinking of the generic use case we may have in nova https://review.opendev.org/#/c/668656/19/specs/ussuri/approved/use-pcpu-vcpu-in-one-instance.rst@122. So I'm not insisting on that, just trying to make sure that isn't what we want10:47
alex_xusean-k-mooney: I prefer the service version, since I think the traits will become useless after the upgrade. And this isn't a feature we have to support in the middle of an upgrade. https://review.opendev.org/#/c/668656/19/specs/ussuri/approved/use-pcpu-vcpu-in-one-instance.rst@34810:49
alex_xuhuaqiang: ^10:49
*** xiaolin has joined #openstack-nova10:58
*** zhanglong has quit IRC11:02
*** udesale has quit IRC11:02
*** zhanglong has joined #openstack-nova11:03
*** dtantsur is now known as dtantsur|brb11:22
openstackgerritLiang Fang proposed openstack/nova-specs master: Support volume local cache  https://review.opendev.org/68907011:25
*** jawad_axd has joined #openstack-nova11:25
*** hamzy_ is now known as hamzy11:26
*** ivve has joined #openstack-nova11:39
*** mkrai has quit IRC11:40
*** mkrai has joined #openstack-nova11:50
*** Liang__ has quit IRC11:57
stephenfinalex_xu: Need to read your reply for the image metadata bit, but for traits vs. service version, by traits do you mean capabilities?11:58
*** vishalmanchanda has quit IRC11:58
stephenfini.e. this virt driver can create/handle mixed instances11:59
sean-k-mooneyalex_xu: i just added a reason to keep the trait11:59
sean-k-mooneyalex_xu: specifically, if you have mixed hypervisors, e.g. hyper-v and libvirt, the compute capability trait will be useful to select just the libvirt hosts via placement11:59
*** ociuhandu has joined #openstack-nova11:59
*** amoralej is now known as amoralej|lunch12:00
sean-k-mooneyalex_xu: huaqiang: stephenfin ^ what do you think, is that enough reason to use the trait?12:00
*** Liang__ has joined #openstack-nova12:01
sean-k-mooneystephenfin: we have standardised compute capabilities as traits in os-traits12:01
sean-k-mooneystephenfin: so when we add a new compute capability we now also add a trait for that12:01
sean-k-mooneybasically that is what the compute namespace is for. not entirely but more or less https://github.com/openstack/os-traits/tree/master/os_traits/compute12:03
sean-k-mooneyfor example the COMPUTE_VOLUME_MULTI_ATTACH trait https://github.com/openstack/os-traits/blob/master/os_traits/compute/volume.py#L2412:03
sean-k-mooneyor same host cold migrate for vsphere https://github.com/openstack/os-traits/blob/master/os_traits/compute/__init__.py#L30 i think mixed cpu support makes sense as it is not otherwise discoverable via placement and i think it is something we want to schedule on12:05
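For reference, the trait-based selection described above could look roughly like the sketch below. It builds a placement `GET /allocation_candidates` query string using the `resources` and `required` parameters, where a `!` prefix marks a forbidden trait. The trait name `COMPUTE_MIXED_CPU_EXAMPLE` and the helper function are hypothetical; the log does not name the trait that would be added for mixed-CPU support.

```python
from urllib.parse import urlencode

# Hypothetical trait name, for illustration only. The discussion does not
# say what a mixed-CPU capability trait would be called.
MIXED_CPU_TRAIT = "COMPUTE_MIXED_CPU_EXAMPLE"


def allocation_candidates_query(vcpus, pcpus, required_traits=(), forbidden_traits=()):
    """Build a placement allocation-candidates query string (sketch).

    Resource classes with a zero amount are left out of the request.
    Forbidden traits are expressed with the '!' prefix in `required`.
    """
    amounts = (("VCPU", vcpus), ("PCPU", pcpus))
    resources = ",".join(f"{rc}:{n}" for rc, n in amounts if n)
    traits = list(required_traits) + [f"!{t}" for t in forbidden_traits]
    params = {"resources": resources}
    if traits:
        params["required"] = ",".join(traits)
    # Keep ':', ',' and '!' readable instead of percent-encoding them.
    return "/allocation_candidates?" + urlencode(params, safe=":,!")


print(allocation_candidates_query(2, 2, required_traits=[MIXED_CPU_TRAIT]))
```

With a required trait, only hosts whose driver reports it are returned, which is the mixed-hypervisor case raised above. A service version check would not appear in the placement request at all.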
*** ociuhandu has quit IRC12:05
*** nicolasbock has joined #openstack-nova12:07
sean-k-mooneyalex_xu: with that said i won't -1 if you don't add the trait and decide to go with the service version bump, i just don't like using the service version as a proxy for specific features if we can avoid it.12:07
stephenfinsean-k-mooney: not all of them though12:21
stephenfinwe have traits for e.g. the 'supports_image_type_ploop' capability12:21
stephenfinbut I don't see any for 'supports_pcpus'12:22
stephenfinwhat's the advantage of the trait approach?12:22
stephenfin(for my own reference)12:22
*** ociuhandu has joined #openstack-nova12:22
*** derekh has quit IRC12:23
alex_xusean-k-mooney: that is a good point. I didn't think about it. I agree that with mixed hypervisors it is useful12:23
*** ccamacho has quit IRC12:25
alex_xusean-k-mooney: we have hypervisors that don't support NUMA, right? but we don't have traits for them either12:25
alex_xustephenfin: we don't need supports_cpus, we probably need supports_dedicated or support_numa, I think we have some hypervisors that don't support those12:26
*** ociuhandu has quit IRC12:28
sean-k-mooneyalex_xu: we do, although numa with Sylvain's spec would be reflected in the topology of the RPs12:35
sean-k-mooneyso it would no longer need it12:35
*** ociuhandu has joined #openstack-nova12:35
alex_xuah, right12:35
sean-k-mooneynuma is supported by hyper-v and libvirt today12:35
*** tetsuro_ has joined #openstack-nova12:40
*** ociuhandu has quit IRC12:40
alex_xusean-k-mooney: I guess libvirt is the only virt driver that reports pcpu12:40
*** tetsuro has quit IRC12:44
*** tetsuro_ has quit IRC12:45
huaqiangsean-k-mooney: about the white-box test, it was the decision made at the Shanghai PTG meeting, I need stephenfin's opinion12:46
huaqiangIf I remember correctly, he proposed the test. I'd like to know if he still holds the same opinion now12:47
*** mkrai has quit IRC12:48
huaqiangstephenfin: If I add functional tests for the proposed mixed instance spec, do you still think the white-box tempest plugin should be a test that I have to pass?12:49
huaqiangs/add functional tests/add functional tests in intree NUMA test cases/12:51
alex_xusean-k-mooney: I still prefer the service version for now, since it is the legacy way to figure out the upgrade status. And the trait support_mix isn't useful, since libvirt is the only driver that supports pcpu. So it doesn't feel good to add an extra trait to the placement request, which needs extra filtering and a db query inside placement, when it isn't very useful for now. Maybe we need that trait in the future, if12:55
alex_xuwe have other virt drivers that support pcpu and vcpu on the same host.12:55
*** zhanglong has quit IRC13:00
*** mkrai has joined #openstack-nova13:00
*** nweinber has joined #openstack-nova13:01
sean-k-mooneyok I'll leave it up to you to decide13:03
*** adriant has quit IRC13:04
*** adriant has joined #openstack-nova13:04
sean-k-mooneyregarding whitebox stephenfin i think you will agree it is a nice to have but not a hard requirement13:04
*** zhanglong has joined #openstack-nova13:04
sean-k-mooneyfrom a downstream perspective we will need to have this tested with whitebox before we can support it in the osp product but it should not be a requirement for merging upstream13:05
sean-k-mooneyespecially since we don't currently have a whitebox job running against nova13:05
alex_xusean-k-mooney: thanks13:06
*** tbachman has joined #openstack-nova13:07
openstackgerritLee Yarwood proposed openstack/nova master: DNM - Test TEMPEST_EXTEND_ATTACHED_ENCRYPTED_VOLUME  https://review.opendev.org/70759313:15
*** mkrai has quit IRC13:16
*** ccamacho has joined #openstack-nova13:19
*** slaweq has quit IRC13:21
*** rosmaita has joined #openstack-nova13:23
*** eharney has quit IRC13:24
gibicores, volume local cache discussion will start soon on  https://bluejeans.com/322852897313:24
gibior even anybody who is interested13:24
*** slaweq has joined #openstack-nova13:25
*** eharney has joined #openstack-nova13:26
*** belmoreira has joined #openstack-nova13:27
*** ociuhandu has joined #openstack-nova13:27
*** amoralej|lunch is now known as amoralej13:32
gibisean-k-mooney: ^^13:32
stephenfinefried, gibi, dansmith: API question: should the `validation` parameter for the extra spec validation be part of the POST/PUT request body or the query string?13:38
*** martinkennelly has joined #openstack-nova13:38
stephenfinwe seem to do the former for things like server hints https://docs.openstack.org/api-ref/compute/?expanded=create-extra-specs-for-a-flavor-detail,create-server-detail#id1113:38
*** ociuhandu has quit IRC13:39
*** ociuhandu has joined #openstack-nova13:39
mnaserjust wondering if we can have more eyes on: https://review.opendev.org/#/c/670112/ -- its really helpful and still remains useful to this day :>13:41
sean-k-mooneystephenfin: query arg13:41
sean-k-mooneyi think13:41
sean-k-mooneyalthough if we dont have other args in the query args13:41
sean-k-mooneythen maybe the body13:41
sean-k-mooneyquery arg feels more natural13:41
sean-k-mooneyas it's not part of the data of the flavor resource13:42
*** brinzhang has quit IRC13:43
stephenfinHmm, yeah, you could make the same argument for server hints though13:44
*** brinzhang has joined #openstack-nova13:44
*** tbachman has quit IRC13:45
*** icarusfactor has quit IRC13:49
*** tetsuro has joined #openstack-nova13:50
*** martinkennelly has quit IRC13:58
*** ociuhandu has quit IRC13:59
*** huaqiang has quit IRC14:05
openstackgerritSylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs  https://review.opendev.org/55292414:06
*** huaqiang has joined #openstack-nova14:08
*** tbachman has joined #openstack-nova14:10
bauzasefried: dansmith: gibi: sean-k-mooney: stephenfin: ^14:10
*** lbragstad has joined #openstack-nova14:11
*** lbragstad has quit IRC14:11
*** lbragstad has joined #openstack-nova14:12
*** Liang__ is now known as LiangFang14:15
*** zhanglong has quit IRC14:19
*** dtantsur|brb is now known as dtantsur14:24
openstackgerritMerged openstack/nova master: Make RBD imagebackend flatten method idempotent  https://review.opendev.org/70433014:25
*** maciejjozefczyk has quit IRC14:26
*** spatel has joined #openstack-nova14:27
*** mriedem has joined #openstack-nova14:29
*** udesale has joined #openstack-nova14:29
openstackgerritLee Yarwood proposed openstack/nova stable/train: Make RBD imagebackend flatten method idempotent  https://review.opendev.org/70765014:29
*** spatel has quit IRC14:32
*** tetsuro has quit IRC14:37
huaqiangsean-k-mooney: thanks. I have removed the white-box content and sent the spec again.14:38
*** ociuhandu has joined #openstack-nova14:38
openstackgerritHuachang Wang proposed openstack/nova-specs master: Use PCPU and VCPU in one instance  https://review.opendev.org/66865614:39
huaqiangstephenfin: sean-k-mooney: alex_xu: the mixed instance spec is updated, please review. Thanks14:40
*** ociuhandu has quit IRC14:45
*** maciejjozefczyk has joined #openstack-nova14:48
*** jawad_axd has quit IRC14:49
dansmithstephenfin: agree query arg feels more right-er14:52
*** jawad_axd has joined #openstack-nova14:56
*** dtantsur is now known as dtantsur|afk14:58
*** jawad_axd has quit IRC15:01
efriedstephenfin: I vote qparam too.15:10
stephenfinsweet. qparam it is15:10
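A rough sketch of what the query-parameter choice means for a request: the `validation` flag travels in the URL, while the body carries only the flavor extra specs themselves. The helper name and the example value `off` are hypothetical; the log only names the parameter, not its accepted values.

```python
import json
from urllib.parse import parse_qs, urlsplit


def split_extra_specs_request(url, body):
    """Separate the validation flag (query string) from the resource data (body).

    Returns (validation, extra_specs). `validation` is None when the
    query string does not carry the parameter.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; take the last value if repeated.
    validation = query.get("validation", [None])[-1]
    extra_specs = json.loads(body)["extra_specs"]
    return validation, extra_specs


# 'off' is an assumed example value, not one confirmed in the discussion.
url = "/flavors/1/os-extra_specs?validation=off"
body = '{"extra_specs": {"hw:cpu_policy": "dedicated"}}'
print(split_extra_specs_request(url, body))
```

Keeping the flag out of the body means the stored extra specs are exactly what the client sent as resource data, which matches the argument that the flag is not part of the flavor resource.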
*** mlavalle has joined #openstack-nova15:26
*** derekh has joined #openstack-nova15:27
*** priteau has joined #openstack-nova15:52
*** artom has joined #openstack-nova15:59
*** eharney has quit IRC16:02
*** martinkennelly has joined #openstack-nova16:10
*** udesale_ has joined #openstack-nova16:14
*** udesale has quit IRC16:14
*** ociuhandu has joined #openstack-nova16:16
gibistephenfin: I vote for query arg as Sean stated it is not part of the entity you actually create or modify16:16
*** jmlowe has joined #openstack-nova16:18
efriedbauzas: I'm very close on the NUMA RP spec; +2 if you just flip the defaults as noted. But (despite him saying he's not blocking) I want to convince stephenfin that this is the way we should go.16:21
*** tosky has quit IRC16:21
efriedsean-k-mooney: do you agree with my notes on the default for implicit numa nodes?16:21
efriedhttps://review.opendev.org/#/c/552924/20/specs/ussuri/approved/numa-topology-with-rps.rst@26916:22
stephenfinefried: So I'm clear, what's the objection to a "I want this host to report/not report NUMA"?16:24
stephenfindansmith too ^16:24
efriedstephenfin: no objection. That's being provided. But I want the default to be "report NUMA".16:24
stephenfinIt's being provided temporarily though, not long term16:25
stephenfinWhy not do this long-term16:25
efriedah, okay:16:25
*** maciejjozefczyk has quit IRC16:25
*** READ10 has joined #openstack-nova16:26
efriedThe way dansmith explained it, the only reason we don't always report NUMA and create real NUMA topologies for guests is because it's hard. But no consumer *actually* wants a guest that doesn't affine its memory to CPUs; they're taking a significant performance hit because we haven't solved this problem in nova.16:27
efriedI'm... paraphrasing. Dan was more eloquent about it.16:27
dansmithdoubtful, but yeah16:27
efriedBut we have to balance that against fitting.16:27
dansmithstephenfin: I don't want to have two "modes" for the compute service to operate in long term16:28
efriedagain paraphrasing dansmith, nova explicitly disclaims the ability to fit that last VM on that last almost-full host. So if we're compromising, that's where we're compromising.16:28
stephenfinBut we could provide NUMA information to the guest. It would just match the topology of the host16:29
efriedthat's exactly what we're doing.16:29
efriedthe U proposal does it imperfectly, 80/20.16:29
efriedFor V we can work on can_split to get that other 20%16:29
efriedand then we can remove the [workaround].16:29
stephenfinFrom the libvirt XML perspective, yeah, but not from placement perspective16:30
efriedsorry, wha?16:31
stephenfinThe only guests that *need* NUMA affinity are pinned instances, yeah?16:34
stephenfinand those with hugepage, but that's a self-inflicted wound16:34
stephenfinthe pinned ones need it because their cores are pinned to host cores from a specific NUMA node16:35
stephenfinwhereas unpinned instances are floating across all (enabled) host cores16:35
stephenfinso they naturally have affinity to everything, even though we don't properly expose that information to the guest16:36
efrieddoes memory float too?16:37
*** ociuhandu has quit IRC16:37
stephenfinif you don't provide a pagesize, yes16:37
stephenfinbut otherwise, no. that's the self-inflicted wound I talked about above16:37
efriedThen why are we modeling 4k pages under NUMA nodes?16:38
efriedWait, *can* you pin 4k pages?16:38
stephenfinI _think_ so, yeah16:39
stephenfinif you use the strict mem policy16:39
stephenfinactually, I don't think it's pinning in the traditional sense16:39
stephenfinbecause they can be shared16:39
*** udesale_ has quit IRC16:40
stephenfinsorry, I'm in a meeting so I can't formulate my thoughts properly. gimme 2016:40
*** jmlowe has quit IRC16:40
*** ociuhandu has joined #openstack-nova16:42
efriedIght. I don't have the depth of understanding to refute the "unpinned/floating" argument, which I was kinda advocating the other day (albeit probably without specifics). Going to need dansmith to take that on.16:42
*** gyee has joined #openstack-nova16:43
dansmithI think maybe he's talking about the case where you're overcommitting memory16:45
dansmithI'm also really not an expert on the low-level details, so maybe we've gotten lost in the woods a bit,16:47
efriedtobiash: What's the word on https://review.opendev.org/#/c/572805/ ? Spec freeze is today.16:49
dansmithI'm trying to translate what I know of reports of what people do, vs. what they would like to do, and what makes sense into what we should be doing16:50
*** Sundar has joined #openstack-nova16:50
tobiashefried: sorry, I was busy with other things in the meantime, I fear it has to be postponed to the next release :(16:51
efriedtobiash: Okay, thanks, I'll do that.16:51
tobiashthanks a lot16:51
*** tesseract has quit IRC16:59
bauzasefried: sorry, was in a meeting16:59
* bauzas reads any comments on the spec17:00
efriedbauzas: I really just want to know what you think of my French17:00
bauzasLOL17:01
gibiefried: based on Tushar's comment on the spec bp/support-shared-storage-resource-provider can be deferred out from U17:01
* efried looks...17:01
bauzasefried: we have some french continuous present but not really like yours17:02
efriedno, like I said, I don't see anybody ever saying or writing anything like that.17:02
bauzasefried: anyway, I see your -1 but I intentionally flipped the default to *not* reshape as discussed between sean-k-mooney, stephenfin and I17:02
efriedbauzas: to me, that's the crux17:02
efriedIf we don't reshape by default, everything changes.17:03
bauzasefried: because of the potential for regressions we could get17:03
bauzasefried: I know17:03
bauzasefried: and I wanted to discuss this with you17:03
bauzasbecause I'm very afraid of any potential issue we would have in Ussuri17:03
efriedstephenfin and dansmith need to be involved, but I think they're on calls rn17:03
bauzasif we flip to changing the world17:03
bauzasmy point is, I don't know the figure but not all clouds care about NUMA17:04
bauzasfor those clouds, I'd prefer us to not change their lives17:04
bauzasand pretending there will be no regressions17:04
efriedSo dansmith's argument was "don't care about NUMA" does *not* mean "give me shitty performance".17:05
bauzason the other hand, if we allow a default to be "no-op", then we can work on the NUMA implementation seamlessly and iteratively like we did for Cells v217:05
efriedThe [workaround] and reversible reshape gives you the way to deal with regressions.17:05
dansmithI don't really understand.. I thought the agreement was to default the new behavior off for U, not reshape by default, let people opt-in during U and then flip the default (or remove it) for V?17:05
efriedugh, no, if we were doing that, there would be no good motivation to hack the splitting thing in.17:05
efriedAnd also no motivation for operators to opt in.17:06
efriedso it would be a waste of effort.17:06
bauzasdansmith: that's what i wrote but looks like the outcome of tuesday's discussion between you, sean-k-mooney and efried was the other way17:06
dansmithbauzas: not that I remember17:06
dansmithbut I'll admit to being completely exhausted by this conversation17:06
bauzasefried: dansmith: floor is yours17:06
bauzasdansmith: and tbh, me too17:06
efriedyou people need to do more cardio17:07
bauzasI do gym twice a week17:07
stephenfinlyarwood: comments on https://review.opendev.org/#/c/706880/17:07
bauzas(and skiing, but that's irrelevant)17:07
bauzasefried: anyway, my point is,17:07
stephenfindansmith, efried, bauzas: so, from the top17:07
bauzaswe can't ask operators to modify their configs *before* they upgrade or *before* they restart their clouds17:08
stephenfinfrom what CERN are saying, they're already dividing their hosts into those for NUMA and those for not NUMA17:08
efriednope. We're not asking that.17:08
bauzasefried: I know, but what you propose will frighten them17:08
stephenfinso I'm not sure why we can't do the same in placement17:08
bauzasbecause, we flip to NUMA everywhere17:08
bauzasfor CERN, that would mean non-NUMA cells would be NUMA-speaking from Ussuri17:09
efriedwhich will happen at some point anyway.17:09
bauzasand they would have to let them speak... what? after this17:09
bauzasefried: I don't disagree with you, and I think this could be Victoria17:09
efriedif we default 'off', nobody is going to switch it on. Then in V (or whenever) we switch the default and have this issue.17:09
stephenfinif you care about NUMA affinity, configure things so placement speaks NUMA, otherwise YAGNI17:10
bauzasbut not Ussuri17:10
bauzasstephenfin: that's what I propose17:10
stephenfinit seems so much simpler17:10
bauzashonestly, my vision of the work to do is :17:11
bauzas'plumb, plumb, plumb things on one side, and mark this feature as opt-in'17:11
bauzasin the eventuality of a very bad situation close to RC1, then we just add an 'EXPERIMENTAL' flag on the option17:12
bauzasboom, problem solved.17:12
efriedstephenfin: so in that scenario, you upgrade your control plane, and then any hw:numa*-havin flavors will simply refuse to land until you've upgraded *and* opted-in some hosts.17:12
stephenfinno17:12
efriedor that's what the 'fallback' query is for17:12
stephenfinyeah, short term fallback query like we do for PCPU17:13
efriedso, still you're doing two queries and either merging the results or violating pack/spread and server affinity groups.17:13
stephenfinwith a big ass warning saying "you're using this host for NUMA instances - update configuration now or perish in a future release"17:14
stephenfinyeah, but some hosts can choose to never opt-in17:14
stephenfinbecause they don't care17:14
efriedAnd you can land NUMA-aware flavors on either kind of host17:14
bauzasyeah, that's my thoughts17:14
efriedWhat about NUMA-agnostic flavors? Those can only land on un-upgraded or un-reshaped hosts, right?17:14
efriedSo one-way segregation?17:14
stephenfinthey never boot pinned instances and their instance floats across all (enabled) host cores as before17:15
bauzasefried: non-NUMA would stick with non-NUMA hosts17:15
stephenfinnon-NUMA or non-upgraded17:15
bauzasright17:15
stephenfinbecause we can't distinguish17:15
bauzascorrect, we just say "not those hosts"17:15
efriedwell, we could distinguish if we wanted to.17:15
bauzasthru a forbidden trait17:15
stephenfinnot without operator intervention17:16
stephenfinthe operator would have to do something to say "this host is intended to be a non-NUMA host"17:16
efriedwe could make the segregation complete by simply adding a trait to (even unreshaped) U hosts.17:16
stephenfinhow do you tell the difference between unreshaped and intentionally non-NUMA hosts?17:17
efriedunreshaped U is intentionally non-NUMA.17:17
stephenfinit can't be - you'd break upgrades17:17
efriedum17:18
efriedyes17:18
efriedthat's what we're talking about doing.17:18
stephenfinthe query for NUMA-based instances in U would be "all NUMA hosts + all unreshaped hosts"17:18
stephenfinthe query for non-NUMA-based instances would be "all non-NUMA hosts + all unreshaped hosts"17:18
efriedor "all NUMA U hosts + all unreshaped pre-U hosts" and "all non-NUMA U hosts + all unreshaped pre-U hosts"17:18
efriedbecause I thought we were trying to segregate from U+17:19
stephenfinagain, you'll break upgrades17:19
efriedhow so?17:19
stephenfinyou might not be reshaping17:20
stephenfinif NUMA'ness is optional long term17:20
stephenfinby V, a host will identify itself as either caring about NUMA or not caring17:21
stephenfinbut before then, we're in an uncertain state where the host _might_ be NUMA or might not17:21
stephenfinand we'd need the operator to do something to tell us which one it is17:21
bauzasstephenfin: if the operator doesn't reshape, then all hosts are non-NUMA17:21
bauzaswe don't need to distinguish them17:22
lyarwoodstephenfin: thanks, just sent some comments back. FWIW it's part of this bugfix series https://review.opendev.org/#/q/topic:bug/186107117:22
bauzasit's just that we gonna add a specific forbidden trait for ensuring either way that non-NUMA instances can't land on NUMA hosts17:22
stephenfinbauzas: how will you ever kill the fallback query in that case?17:23
bauzasif the operator starts defining NUMA hosts, then they will shard their cloud, but I'm cool with it17:23
bauzasstephenfin: the fallback query should only be for 'NUMA-aware' instances17:24
bauzas.... aaaaand I probably messed this up17:24
bauzas(in the last rev of the spec)17:24
stephenfinwe want to make sure a non-NUMA instance will not land on a NUMA host, but long term shouldn't we also make sure a NUMA instance won't land on a non-NUMA host?17:24
bauzasstephenfin: yeah17:24
stephenfinokay, then you need to find some way to indicate that yes, this *really* is a non-NUMA host17:25
stephenfinthat way your queries can be "give me all NUMA hosts and all unconfigured hosts, but *not* any non-NUMA hosts"17:26
stephenfinand vice versa17:26
stephenfinright?17:26
bauzassec, wrapping up things in my mind17:26
*** dave-mccowan has quit IRC17:26
efriedso by your proposal, we actually need a three-way conf opt in U.17:27
bauzasthere are two timeframes in my mind17:27
bauzasUssuri where hosts can be unconfigured17:27
bauzas(because default is no reshape)17:27
stephenfinin a future release, those would simply become "give me all NUMA hosts" or "give me all non-NUMA hosts", depending on your instance type17:27
bauzasVictoria where all hosts are configured17:27
stephenfinefried: yeah, I was thinking a boolean that defaults to None17:27
stephenfinI think we can do that17:27
stephenfinnone/unset17:28
efried- "This host is NUMA" ==> reshape, only land hw:numa* flavors17:28
efried- "This host is not NUMA" ==> no reshape, only land non-hw:numa* flavors17:28
efried- None (default in U) ==> no reshape, looks just like a T host, land either type of flavor17:28
stephenfinyup17:28
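The three-way option summarised just above can be modelled as a small decision function. This is only a sketch of the behaviour described in the discussion, not code from nova: the names `host_accepts_flavor` and `host_is_reshaped` are hypothetical, and the option value is assumed to be `True`, `False`, or `None` (unset).

```python
from typing import Optional


def host_is_reshaped(enable_numa: Optional[bool]) -> bool:
    """Only a host explicitly marked as NUMA is reshaped."""
    return enable_numa is True


def host_accepts_flavor(enable_numa: Optional[bool], flavor_wants_numa: bool) -> bool:
    """Return True if a host with this setting may take this kind of flavor.

    flavor_wants_numa stands for "the flavor carries hw:numa* extra specs".
    """
    if enable_numa is None:
        # Unset (the proposed default in U): looks like a T host,
        # so either type of flavor may land on it.
        return True
    # Explicitly configured hosts only take the matching flavor type.
    return enable_numa == flavor_wants_numa
```

With this model, removing the `None` case in a later release leaves a strict two-way segregation, which is the long-term state discussed above.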
bauzasI can write this17:28
efriedand then, what, make None illegal in V??17:28
*** maciejjozefczyk has joined #openstack-nova17:28
bauzasefried: I'm cool with it17:28
efriedThus breaking upgrades??17:28
bauzasnope17:28
stephenfinV, W, X, ... at some point in the future17:28
bauzasbecaue17:28
bauzasbecause,17:29
bauzaswe can test things17:29
bauzasand see 'okay, look, this is harmless'17:29
bauzasso, once we all agree, we remove the None value17:29
stephenfinessentially this would become one of the things you have to configure17:29
stephenfinlike 'compute_driver'17:29
bauzasand de facto all instances act upon NUMA checking17:29
efriedI mean, if we're going to segregate eventually, then at some point we "break upgrades".17:29
efriedbtw, dansmith specifically said he didn't want two modes long term.17:30
stephenfinyeah, but by that point they'll have had a couple of cycles of warnings saying "yo, you *really* need to set this config option"17:30
stephenfinefried: yeah, I don't understand why that's a bad thing17:30
dansmithI officially give up, please proceed.17:31
efriedsigh17:31
stephenfinI get that all instances should have some kind of NUMA awareness17:31
efriedokay, back to PS1617:31
efriedstephenfin: tbc, if we go this route, we don't need can_split ever, right?17:32
stephenfinbut it's a nice-to-have and I don't imagine everyone really cares17:33
* stephenfin doesn't care what NUMA node Chrome is running on17:33
stephenfinefried: correct17:33
stephenfinif we're going with the "everything is mapped to NUMA", then I think we should move the ball forward on 'can_split' instead17:33
stephenfinbecause if we don't, it won't ever happen :)17:34
*** evrardjp has quit IRC17:34
stephenfinimplement that, then use it for NUMA in V17:34
*** evrardjp has joined #openstack-nova17:34
bauzasfolks, you lost me17:34
stephenfinbut as cdent saw from the openstack-discuss thread, no one's really asking for their NUMA-based instance to coexist alongside their "I don't care about NUMA"-based instances17:35
stephenfinbauzas: A boolean '[compute] enable_numa' option that defaults to unset (None)17:35
efriedbauzas: that ^, but otherwise PS16.17:35
stephenfinwhen unset, we start flashing a warning saying "you need to decide if this host is meant for NUMA-based instances or not"17:36
stephenfini.e. "go configure this option"17:36
bauzasand no "everything is NUMA and good luck finding a host that can fit your non-NUMA instance"?17:36
stephenfinnot needed, IMO17:36
bauzasyeah I agree17:37
stephenfinit's so much more additional complexity for idk how much gain17:37
bauzasok, it's 6:37pm here and I will have to eat soon17:37
bauzasI'm rushing over providing another round17:37
*** martinkennelly has quit IRC17:38
stephenfinYeah, I've to go but feel free to +2 in my absence if the spec roughly maps to the above ^^^ I'm onboard with that approach17:38
efriedAs PTL I decree that we can do the final approvals tomorrow morning.17:39
efriedrather than try to rush it through "tonight".17:39
stephenfinsounds good to me (y)17:40
bauzasefried: I appreciate your help but I'll still stick with working on a rev tonight17:42
efriedk17:42
stephenfinhuaqiang: https://review.opendev.org/#/c/668656/ acked too, btw. Thanks for sticking with that17:42
efriedsaying, I won't proxy stephenfin's +2 tonight; it's fine to wait til morning for that.17:42
efriedah, woot17:42
efriedgibi: re DISK_GB, save me reading the comment history, are you saying that the nova spec will be dependent on the placement change?17:45
*** ociuhandu_ has joined #openstack-nova17:45
efried...and because the placement change won't happen in U, therefore the nova bp can be deferred?17:45
*** ociuhandu has quit IRC17:48
*** eharney has joined #openstack-nova17:48
*** ociuhandu_ has quit IRC17:50
*** Sundar has quit IRC17:57
openstackgerritLee Yarwood proposed openstack/nova master: virt: Provide block_device_info during rescue  https://review.opendev.org/70081117:58
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Add support for stable device rescue  https://review.opendev.org/70081217:58
openstackgerritLee Yarwood proposed openstack/nova master: compute: Report COMPUTE_RESCUE_BFV and check during rescue  https://review.opendev.org/70142917:58
openstackgerritLee Yarwood proposed openstack/nova master: api: Introduce microverion 2.82 allowing boot from volume rescue  https://review.opendev.org/70143017:58
openstackgerritLee Yarwood proposed openstack/nova master: compute: Extract _get_bdm_image_metadata into nova.utils  https://review.opendev.org/70521217:58
*** derekh has quit IRC17:59
efriedbrinzhang: What's the story on https://review.opendev.org/#/c/580336/ (bp/destroy-instance-with-datavolume)? We're at spec freeze...18:00
openstackgerritMerged openstack/nova-specs master: Use PCPU and VCPU in one instance  https://review.opendev.org/66865618:02
gmannefried: can you remove -2 from this now as spec is merged and good to code- https://review.opendev.org/#/c/701609/18:03
efriedgmann: Since we're at spec freeze, we should probably wait until we've decided which unfinished blueprints should be Direction:Approved.18:04
efriedIf the code were ready, that would be different, but...18:04
gmannefried: code is in progress so i am not sure if the author is still confused by the -218:05
efriedgmann: We can help educate the author :P18:05
gmannbut ok to wait till Direction:Approved decision18:05
*** maciejjozefczyk has quit IRC18:06
gmanncommented on review the same.18:07
*** amoralej is now known as amoralej|off18:12
efriedmelwitt: are you now owning nova-audit? (https://review.opendev.org/#/c/693226/)18:20
melwittefried: I didn't want to but I think the answer is technically yes because dansmith lost interest18:20
efriedmelwitt: well, I ask because we're at spec freeze, so you need to get a couple cores on board, ahem, today if it's going to happen in ussuri.18:21
bauzasefried: melwitt: FWIW, this is related https://review.opendev.org/#/c/670112/18:22
efriedit is?18:22
bauzastechnically, it's just a rename18:22
bauzasbut the intent of the spec is to provide a new specific command AFAICR18:23
bauzasthis change ^ would just be another subcommand18:23
melwittefried: yeah, I don't think that's going to happen. operators are interested but the spec didn't attract review from cores thus far and I don't think I could wrangle two that would not be considered part owners by the end of today18:24
efriedmelwitt: if "tomorrow" would make the difference, I'm fine with that. Or do you just want me to defer?18:25
melwittbauzas: the intent of the spec is to organize all of the heal commands in one place and make them runnable as a daemon service so that they automatically heal your cloud periodically18:26
bauzasoh missed the last part18:27
bauzasgtk18:27
*** Sundar has joined #openstack-nova18:27
openstackgerritSylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs  https://review.opendev.org/55292418:28
bauzasefried: ^18:28
efriedack18:28
*** igordc has joined #openstack-nova18:28
*** igordc has quit IRC18:28
bauzasanyway, bailing out18:29
melwittefried: I guess yeah if you'll give it till tomorrow, I'll send some email and see if anyone's willing to review. if there's not interest after that, then punt it18:29
Sundardansmith: If https://review.opendev.org/#/c/673735/37/nova/conductor/manager.py@524 is not the right place to delete ARQs on a reschedule, do you have any suggestion for a better place? I could do it in the callers.18:29
efriedmelwitt: ack. I'm adding it (with other open specs) to today's meeting agenda, if you want to drum up interest there.18:29
dansmithmelwitt: efried it seems highly unlikely that anything would get implemented in U either way, so I'm not sure it's worth that18:29
*** priteau has quit IRC18:29
dansmithI thought we were supposed to be trying to reduce the number of things we approved that aren't likely to make it,18:30
dansmithbut it kinda seems like we're doing the same ol' kind of behavior18:30
dansmithSundar: do it where it needs to be done, not inside a thing called something else.. so yes, wherever that's called from that is the right place18:31
melwittdansmith, efried: well, I could implement it quickly/dumbly (I'm imagining just moving the commands and adding a service) but getting review would be another story. worst case it sits there ready to go for V if ppl can't review in time. so, I dunno18:32
efrieddansmith: Yes, intend to do a sweep of Definition:Approved blueprints "soon" to decide which of those we can/should defer.18:33
efried"spec freeze" -- no more definition approvals -- is what's happening now.18:34
openstackgerritJohn Garbutt proposed openstack/nova-specs master: Add Unified Limits Spec  https://review.opendev.org/60220118:37
efriedjohnthetubaguy: Save me looking, did you squash the fup?18:45
efried(abandon if so)18:45
*** dking_desktop has joined #openstack-nova18:50
dking_desktopI'm attempting to troubleshoot why I get the "No valid host was found." error when attempting to create a baremetal server, and just found this when I enabled debugging for the nova-scheduler: compute_status_filter request filter added forbidden trait COMPUTE_STATUS_DISABLED18:51
dking_desktopCould that be the reason why I'm not able to find a valid host? How would I troubleshoot this further?18:52
*** ralonsoh has quit IRC18:52
efrieddking_desktop: We always add that trait. It's only going to have an effect if the compute host is exposing that trait. You can check with a command like19:03
efriedopenstack resource provider trait list $host_uuid19:03
efried(I may not have the syntax exactly right -- see the docs)19:04
*** jawad_axd has joined #openstack-nova19:06
dking_desktopI'm using Ironic if that helps. I don't see anything for "openstack resource". Is it "openstack service provider list"?19:07
dking_desktopOh, maybe "openstack baremetal node trait list"19:08
dking_desktopefried: I tried "openstack baremetal node trait list <UUID>", but that gave no results. Is that the problem?19:09
efrieddking_desktop: You need to install the osc-placement plugin to get the 'resource provider' subcommands19:10
efriedpip install osc-placement (or equivalent for your distro)19:10
efriedCOMPUTE_STATUS_DISABLED isn't a trait that ironic itself would know about.19:10
*** jawad_axd has quit IRC19:11
dking_desktopOdd. I get "Operation or argument is not supported with version 1.0; requires at least version 1.6"19:12
dking_desktopI wonder what software that refers to. The osc-placement package should be 1.8.0.19:14
dking_desktoppython-openstackclient is 4.0.0. Is there another way to check? It's good to know that isn't specifically an Ironic thing. However, my regular VMs work fine. It's only the baremetal nodes causing me trouble.19:16
dking_desktopOh, I add that to the command line. Okay, I can run that, but no mention of the above trait.19:19
*** READ10 has quit IRC19:20
dking_desktophttp://paste.openstack.org/show/789544/19:21
dking_desktopefried: I notice that the above output doesn't show nearly as much information as I see for my compute node. Would the problem be that there's just not any information there about the CPU, etc.?19:25
efrieddking_desktop: sorry, yes, you need to specify a microversion for almost every OSC command with placement, as OSC defaults to 1.0 and very little in placement worked at that microversion. You can use an environment variable if you'd rather not have to think about it with every command.19:26
efrieddking_desktop: When you say "as I see for my compute node", you mean a libvirt host?19:27
efriedis that what compute1.stack1 is?19:27
dking_desktopCorrect19:30
efriedNext thing to look at is the inventory of your ironic node vs. the flavor you're trying to deploy.19:30
efriedIf you're seeing COMPUTE_STATUS_DISABLED in play, your control plane is at least at Train, which means your node is supposed to be at least at Stein, by which time we had cut ironic over to reporting single-unit custom resource classes.19:30
efriedsomething like19:31
efriedopenstack resource provider inventory list (or maybe show) $node_uuid19:31
efriedshould show you that.19:31
dking_desktopI'm using train.19:31
efriedokay, so you might want to make life easier with19:32
efriedexport OS_PLACEMENT_API_VERSION=1.3619:32
efried(I think I'm spelling that var name right)19:32
efriedThen you won't have to add --os-placement-api-version with every command.19:32
dking_desktopOkay, that shows me the resource class and a few other pieces of info: http://paste.openstack.org/show/789545/19:32
efriedGreat. So the flavor you're using should be asking for resources:CUSTOM_BAREMETAL_RESOURCE_CLASS=119:34
*** READ10 has joined #openstack-nova19:34
efriedis it?19:34
dking_desktopYes, it is: http://paste.openstack.org/show/789546/19:35
efriedthe fact that your trait list didn't show COMPUTE_STATUS_DISABLED, and your inventory showed reserved=0, is two out of the three things that should make this node eligible for scheduling.19:35
dking_desktopThat sounds good. Any idea why I would be getting "No valid host was found."19:36
efriedokay, I need to go check whether you need to do something special for the ram and disk (set them to zero) but I think those should be ignored. Meanwhile, the last of the three ^ things is to make sure there's no allocation present.19:36
efriedopenstack resource provider allocation list (or show?) $node_uuid19:36
dking_desktopThat's empty.19:37
efriedOkay. Easier than me finding that code would be checking your placement logs.19:37
efriedLook for a line that has a GET call to the /allocation_candidates route with a querystring that includes CUSTOM_BAREMETAL_RESOURCE_CLASS19:38
efriedOkay, yeah, it looks like you need to set the flavor VCPUs to zero to make this work.19:40
efrieddking_desktop: like this: https://docs.openstack.org/ironic/latest/install/configure-nova-flavors19:41
efried(not just VCPU, MEMORY_MB and DISK_GB too)19:42
dking_desktopefried: Sorry, took me a minute to find it: http://paste.openstack.org/show/789547/19:42
dking_desktopAh, so the VCPUs could be the problem? Let me see if I can update that.19:42
efriedYup, so you see where that query is *also* asking for DISK_GB%3A1%2CMEMORY_MB%3A512%2CVCPU%3A1 (DISK_GB:1,MEMORY_MB:512,VCPU:1)?19:43
efriedyour baremetal node's inventory doesn't have any of those resources.19:44
efriedThose are being fed in from your base flavor's disk/ram/vcpus19:44
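To make the mismatch concrete, the encoded fragment quoted above can be decoded and compared against the node's inventory. A minimal sketch: `parse_resources` is a hypothetical helper, and the inventory dict is an assumption based on the single custom resource class mentioned earlier in the conversation.

```python
from urllib.parse import unquote


def parse_resources(encoded: str) -> dict:
    """Decode a 'NAME%3AAMOUNT%2C...' fragment into {name: amount}."""
    decoded = unquote(encoded)  # %3A -> ':' and %2C -> ','
    resources = {}
    for item in decoded.split(","):
        name, amount = item.split(":")
        resources[name] = int(amount)
    return resources


requested = parse_resources("DISK_GB%3A1%2CMEMORY_MB%3A512%2CVCPU%3A1")

# Assumed inventory of the baremetal node: one unit of a custom class only.
inventory = {"CUSTOM_BAREMETAL_RESOURCE_CLASS": 1}

# Resource classes the query asks for that the node cannot provide.
missing = sorted(set(requested) - set(inventory))
print(missing)
```

Any non-empty `missing` list means the node cannot be an allocation candidate for that request.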
*** READ10 has quit IRC19:45
efriedSo the fix (ahem, it's a hack, I am ashamed) is to explicitly override those with zeros to take them out of the query.19:45
efriedAnd I think we did that hack because letting you set the base flavor values to real zeros would have blown up the code in a billion places19:46
efrieddking_desktop: anyway, if you follow https://docs.openstack.org/ironic/latest/install/configure-nova-flavors you should be able to make it work.19:46
dking_desktopI'm having trouble getting it to accept 0.19:47
*** jawad_axd has joined #openstack-nova19:48
efrieddking_desktop: You can't set the base flavor properties to zero. You have to set *additional* extra specs for resources:{VCPU, MEMORY_MB, DISK_GB}19:48
efried... to zero19:48
efriedThat will cause the scheduler to ignore the base vcpus/ram/disk values.19:49
efried...which can be set to whatever.19:49
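The effect of the zero overrides can be sketched as follows. This is a simplified model of the behaviour described above, not nova's actual scheduler code; `requested_resources` and the flavor values are assumptions for illustration.

```python
def requested_resources(vcpus: int, ram_mb: int, disk_gb: int, extra_specs: dict) -> dict:
    """Combine base flavor values with 'resources:*' extra spec overrides."""
    # Start from the base flavor properties.
    resources = {"VCPU": vcpus, "MEMORY_MB": ram_mb, "DISK_GB": disk_gb}
    # Explicit 'resources:<CLASS>' extra specs override or extend them.
    for key, value in extra_specs.items():
        if key.startswith("resources:"):
            resources[key[len("resources:"):]] = int(value)
    # A zero amount drops the resource class from the request entirely.
    return {name: amount for name, amount in resources.items() if amount > 0}


baremetal_extra_specs = {
    "resources:CUSTOM_BAREMETAL_RESOURCE_CLASS": "1",
    "resources:VCPU": "0",
    "resources:MEMORY_MB": "0",
    "resources:DISK_GB": "0",
}
print(requested_resources(1, 512, 1, baremetal_extra_specs))
```

In this model, without the three zero overrides the request would still contain VCPU, MEMORY_MB and DISK_GB, which the baremetal node's inventory does not offer.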
dking_desktopAh. Yes, that's what's in the article, so that makes sense. Let me try that.19:50
*** jawad_axd has quit IRC19:53
dking_desktopGreat! That got much further. The build still failed, and I need to investigate that, but at least it started trying.19:55
efriedOkay, good deal.19:56
dking_desktopI got: 'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance cc78edb5-0268-4d71-a48d-61608b532d6f.'19:59
dking_desktopDo you know where in the logs I might see where that failed?19:59
*** lbragsta_ has joined #openstack-nova20:00
dking_desktopOh, I see it in nova-compute-ironic.log. It seems that the deploy image needs to be a UUID and not a name.20:01
efrieddking_desktop: I assume that was in your controller logs. You want to look in the compute log on the host that owns your ironic node to see why it bounced.20:01
dking_desktopYeah. nova-compute-ironic.log showed: Validation of image href deploy-initrd failed, reason: Scheme-less image href is not a UUID.20:02
dking_desktopefried: Thank you very much for your help! I wouldn't have been able to make progress today without it.20:06
efrieddking_desktop: You're welcome.20:07
efrieddking_desktop: Reading between the lines, you're trying $old_release configurations/images/etc against $new_release code, and running into stuff we changed along the way.20:08
*** jawad_axd has joined #openstack-nova20:09
efrieddking_desktop: since it's ironic you're trying to deploy, you might also try the #openstack-ironic channel, as they'll generally be more familiar with the quirks in that area. dtantsur|afk I think would be particularly helpful, though he's Eastern Europe so best to hit him earlier in the day.20:09
dking_desktopEarlier, I think I wasn't able to find any good documentation. My google searches kept taking me to outdated documentation, and so I think I started using that, only to find trouble down the road after I'd forgotten where I'd found the information.20:10
efriedugh20:10
dking_desktopAnd yes, the timezone issues make things difficult. I do most of my work while they're sleeping, though they're helpful early in my day.20:11
efriedwell, what might work *sometimes* is, if you hit a page under docs.openstack.org/$proj/$rele/..., try replacing $rele with `latest`20:11
efriedYou'll either get the latest instructions, or you'll get a 404, which means whatever you're trying to do won't work anyway :P20:11
efried(in your case s/$rele/train/)20:12
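The URL trick above is mechanical enough to script. A small sketch, assuming the docs.openstack.org/$proj/$rele/... layout just described; `docs_url_for_release` is a hypothetical helper and the example URL with `queens` is made up for illustration.

```python
import re

# Matches the project segment (kept) and the release segment (replaced).
_DOCS_PATTERN = re.compile(r"^(https://docs\.openstack\.org/[^/]+/)[^/]+/")


def docs_url_for_release(url: str, release: str = "latest") -> str:
    """Swap the release segment of a docs.openstack.org URL."""
    return _DOCS_PATTERN.sub(lambda m: m.group(1) + release + "/", url, count=1)


old = "https://docs.openstack.org/ironic/queens/install/configure-nova-flavors.html"
print(docs_url_for_release(old, "train"))
```

URLs that do not follow that layout are returned unchanged, and the rewritten page may still be a 404 if it does not exist for the target release.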
efriedbauzas, sean-k-mooney, stephenfin: Responded on https://review.opendev.org/#/c/552924/ (NUMA RPs).20:12
efriedI would update it myself, but then stephenfin and gibi would have to be the approvers.20:13
*** jawad_axd has quit IRC20:13
efriedI guess we pretty much have to wait for morning anyway, so I'll just leave it.20:13
dking_desktopThank you so much. I think that now that I'm seeing the correct documentation, I may just start over and read each piece again in order.20:15
efrieddking_desktop: Cool. If in doing that you find discrepancies (or even typos), please propose edits.20:16
efried...or open bugs20:16
efried...or at least shout in here and we'll help you do ^20:16
dking_desktopThanks. I submitted some a while back, and I've been the inspiration for a few more fixes already. Even if I'm having trouble, at least, there's updates being made somewhere.20:17
*** rosmaita has left #openstack-nova20:18
*** factor has joined #openstack-nova20:18
*** brinzhang has quit IRC20:27
*** jawad_axd has joined #openstack-nova20:29
efriedNova meeting in half an hour in #openstack-meeting.20:30
efriedLots to discuss. Please plan to attend.20:30
*** jmlowe has joined #openstack-nova20:32
sean-k-mooneyefried: sorry i was picking up the keys to my new house this afternoon. so i missed the spec discussions. im just getting dinner now but ill try and look at them in an hour or so.20:33
sean-k-mooneyi can try and attend the nova meeting too20:33
*** jawad_axd has quit IRC20:34
efriedsean-k-mooney: congrats on the house20:35
sean-k-mooneynow  all i need is furniture, broadband, utilities and to move all my stuff :)20:36
*** Sundar has quit IRC20:40
efriedsean-k-mooney: would be nice to have a nova-side delegate who attended the vol local cache meeting.20:41
efriedI'm watching the replay, but I'm afraid it's not going to help me understand whether this is going to fly for U.20:41
efriedI assume lyarwood is sleeping?20:42
efriedsean-k-mooney: also, I added you to the os-vif release patch. Assume no reason not to merge that? https://review.opendev.org/70701820:43
*** jawad_axd has joined #openstack-nova20:50
*** jawad_axd has quit IRC20:54
*** nweinber has quit IRC21:08
sean-k-mooneynot that i know of. i will check the review queue and +1 it soon21:10
*** N3l1x has quit IRC21:13
*** xek has quit IRC21:25
*** lbragsta_ has quit IRC21:29
*** penick has joined #openstack-nova21:31
sean-k-mooneymelwitt: it sounds like the quota/flavor counting is borderline a bug fix, but ya if you think it's better to do in V then that's fine21:33
sean-k-mooneyi say that because once we fix it upstream im sure we will be asked to backport it downstream21:34
*** penick has quit IRC21:34
melwittsean-k-mooney: normally it would be but I think because we need to leverage a new feature in placement and as part of that we'd have to migrate all of nova's allocations to have proper consumer types set, it's too big to be a bug fix. it's a spec21:35
sean-k-mooneyah right ya that makes sense21:35
sean-k-mooneyi didnt think about the need to do a data migration of the allocations.21:41
*** jmlowe has quit IRC21:46
*** penick has joined #openstack-nova21:50
*** slaweq has quit IRC22:01
*** jmlowe has joined #openstack-nova22:02
*** jmlowe has quit IRC22:08
*** slaweq has joined #openstack-nova22:11
*** jawad_axd has joined #openstack-nova22:13
efriedsean-k-mooney: you know that the num_implicit_numa_nodes thing is dead as of the latest rev, right?22:15
efriedso, your last comment is n/a?22:15
*** slaweq has quit IRC22:16
sean-k-mooneyi just noticed that now22:17
sean-k-mooneyso we are not doing the splitting?22:17
*** eharney has quit IRC22:18
sean-k-mooneyi haven't fully got up to speed with what changed since v19-2122:18
*** jawad_axd has quit IRC22:18
*** jmlowe has joined #openstack-nova22:18
sean-k-mooneyi see that the reporting is now a tristate True|False|None22:18
efriedright. But that's not explained at all in the doc; it needs to be.22:19
sean-k-mooneyare we still going to do the implicit numa generation for non-numa guests? or has that been removed too22:19
*** penick has quit IRC22:21
efriedremoved22:22
efriedsean-k-mooney: we're back to segregating22:22
sean-k-mooneyok22:22
sean-k-mooneyi might still propose my automatic asymmetric numa node change as a bug fix then22:23
sean-k-mooneythen someone can tell me it's a feature and it can wait until V22:23
*** penick has joined #openstack-nova22:24
efriedYou mean:22:24
efriedToday if you say hw:numa_nodes=$x and we can't split $x evenly we bounce;22:24
efriedWith your fix we would split asymmetrically, as close to evenly as possible22:24
efried?22:24
*** brinzhang has joined #openstack-nova22:24
sean-k-mooneyyep22:24
sean-k-mooneyno other change22:24
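The "as close to evenly as possible" split can be illustrated like this. It is a sketch of the idea only, not the function in nova/virt/hardware.py; `split_cpus` is a hypothetical name and the numbers are examples.

```python
def split_cpus(total_cpus: int, numa_nodes: int) -> list:
    """Distribute CPUs over NUMA nodes, allowing an asymmetric result."""
    if numa_nodes < 1 or total_cpus < numa_nodes:
        raise ValueError("need at least one CPU per NUMA node")
    base, extra = divmod(total_cpus, numa_nodes)
    # The first 'extra' nodes get one CPU more than the rest.
    return [base + 1 if index < extra else base for index in range(numa_nodes)]


print(split_cpus(4, 2))  # even split
print(split_cpus(5, 2))  # asymmetric split
```

In this sketch the sizes never differ by more than one CPU, and an evenly divisible request produces the same result as a strictly symmetric split would.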
efriedWhat's the error for that bounce today?22:25
efriedFrom the API, I imagine?22:25
sean-k-mooneywe have an exception we raise from the api yes22:25
sean-k-mooneythat says we can't generate an asymmetric numa node configuration and you have to manually set it in the flavor/image22:25
sean-k-mooneyso it fails only at server boot but before we even create a build request22:26
efriedSo yeah, I'm not sure if that counts as a bug fix or a feature. I'm also not sure whether it's important that it be discoverable or optional.22:26
sean-k-mooneyi think you will get a 40022:26
efriedI can see the argument either way.22:26
sean-k-mooneyit's a trivial change to this function https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L1564-L157922:27
efriedyeah, I understand the change22:28
efriedmaybe gmann could weigh in as to whether it would need a microversion.22:28
efriedI would be okay without one, I think. Basically today if you have flavors that look like that they are useless.22:28
sean-k-mooneyit currently returns a 400 https://github.com/openstack/nova/blob/0d3aeb0287a0619695c9b9e17c2dec49099876a5/nova/exception.py#L177622:29
efriedand it's not like you're sitting around trying them again and again to see if maybe they work now.22:29
efriedyou would have tried them, gotten the bounce, and (if you understood the issue) just deleted them or modified them to work.22:29
efriedso this would just allow you to start making new flavors that aren't subject to that limitation.22:29
dking_desktopI know this is the wrong place for this, but would anybody here happen to know, when a baremetal server is being deployed with "openstack server create ...", and it reboots to get a DHCP request, what service should be handling the DHCP request? I'm assuming that it's attempting to get the deploy_image from glance somehow.22:29
sean-k-mooneywell no so you can set them in the image22:29
efriedmeh, same same.22:30
* efried steps aside and lets sean-k-mooney handle question with "DHCP" in it...22:30
sean-k-mooneyefried: the point is the flavor may have had 5 cpus and you tried to use it with an image that asked for 2 numa nodes22:30
efriedoh, I see. Then it's not quite so simple.22:30
sean-k-mooneyya so today it would fail22:30
efriedI didn't know you could ask for numa topo via the image. But that makes sense now I think about it.22:30
sean-k-mooneywith a tiny change it would work22:31
dking_desktopsean-k-mooney: Would you happen to have any idea there? I think the folks over in #openstack-ironic are overseas and sleeping.22:31
*** penick has quit IRC22:31
sean-k-mooneydking_desktop: the dhcp request is handled by the neutron dhcp agent22:31
sean-k-mooneythe way it works: as part of the dhcp response we pass a dhcp option that tells the server where to find the ipxe image22:32
sean-k-mooneythat then deploys the ironic python agent22:32
sean-k-mooneywhich connects to glance and streams the image onto the local disk of the ironic server22:32
efriedI'm out.22:33
efriedGood luck.22:33
efriedo/22:33
dking_desktopefried: Have a great night!22:33
*** jawad_axd has joined #openstack-nova22:34
sean-k-mooneydking_desktop: did that answer your question22:34
sean-k-mooneyby default the deploy image with the ironic python agent is served off a tftp share that is PXE booted, not from glance22:34
dking_desktopsean-k-mooney: Great! I suspected that. I'm looking at the neutron-dhcp-agent container. I see that it's running dnsmasq, but I can't see it listening anywhere.22:35
sean-k-mooneydking_desktop: that said i know ironic have been working on redfish and http boot22:35
sean-k-mooneydking_desktop: it will be running in a network namespace22:35
dking_desktopI'd love to use Redfish. Unfortunately, I think there's a flaw in my server's redfish implementation that causes it to fail when moving it to manage.22:35
sean-k-mooneyya i have seen that, although i have only worked with preproduction servers that had redfish support so i was just happy it booted :)22:36
*** jmlowe has quit IRC22:36
*** jawad_axd has quit IRC22:38
dking_desktopI do like redfish, though. I'm using it for everything outside of openstack.22:39
sean-k-mooneyengineering samples of motherboards or alpha bios roms are not your friend when trying to get redfish to work however22:40
dking_desktopIt seems that I'm not familiar with network namespaces. That's something new I suppose that I'll need to learn about. I did find the configuration file, though. It only has one line, and that's for "log-facility".22:41
sean-k-mooneydking_desktop: so anyway if you log into the network node and do "ip netns" you should see a bunch of network namespaces22:41
sean-k-mooneythe one where dnsmasq is running will be dhcp_<network uuid> i think22:42
dking_desktopThere's just a couple of them at the moment, and one is the "qdhcp-..."22:42
sean-k-mooneyyep that is likely the one22:42
sean-k-mooneyq stands for quantum which is what neutron was originally called22:43
*** penick has joined #openstack-nova22:43
sean-k-mooneyso if you do "sudo ip netns exec qdhcp-.... bash" you will spawn a bash shell in the network namespace22:44
sean-k-mooneythen you can do "netstat -nlp"22:44
sean-k-mooneyand you should see it listening on port 53?22:44
sean-k-mooneythat is the dhcp port right22:44
dking_desktopI was thinking it was port 67. I think 53 is DNS.22:45
dking_desktopBut yes, both are there.22:45
sean-k-mooneyah you are right 53 is dns22:45
*** mriedem has quit IRC22:45
dking_desktopThat's pretty neat. I see that I still have much to learn.22:46
sean-k-mooneyso if you install tcpdump or tshark you should be able to dump the dhcp packets22:46
sean-k-mooneyi prefer tshark (the cli for wireshark) since it prints the packets more nicely22:46
sean-k-mooneyso "tshark -i <interface> -V dhcp"22:47
sean-k-mooneythe -V is what prints the full packet22:47
sean-k-mooneyit might not recognise dhcp in which case you would do 'tshark -i <interface> -V udp port 67 or 68'22:48
*** jmlowe has joined #openstack-nova22:49
dking_desktopI've been using tcpdump. So, I see that inside the network namespace, my devices are limited to just the loopback, and another, which I'm assuming is from an ovs bridge port.22:49
sean-k-mooneyyes22:49
sean-k-mooneywhat is the actual issue you are having by the way22:51
dking_desktopIs that for the provisioning_network?22:51
sean-k-mooneyso it depends on how you have it set up. i believe you can either use a separate provisioning network with a dnsmasq managed by ironic or you can use a neutron network22:51
sean-k-mooneyi should point out that i have not used ironic in about 4 releases so they could have changed things.22:52
dking_desktopThe issue is that I'm trying to deploy a baremetal server. Where I'm at currently is that I have created the baremetal node, introspected it, provided it, and I'm attempting to "openstack server create". I see that the node reboots and sends a DHCP request, but it gets no response, so it never completes the BUILD.22:52
sean-k-mooneyah ok22:53
sean-k-mooneyis your provisioning network a neutron network22:53
sean-k-mooneyif so did you make it a flat network22:53
dking_desktopYes, but I'm pretty sure I set it up incorrectly. I'm still trying to get familiar with openstack networking.22:53
sean-k-mooneyor are you using the external provisioning network approach where the network is not managed by openstack22:53
sean-k-mooneydking_desktop: i think the issue you are hitting is that ironic only optionally uses neutron22:54
*** TxGirlGeek has joined #openstack-nova22:54
sean-k-mooneyin older releases provisioning was handled by a non-neutron network22:54
sean-k-mooneyin more recent releases they use neutron22:55
sean-k-mooneynot all the docs are clear on what you should do in each case22:55
sean-k-mooneyi assume you are using enabled_network_interfaces=noop,flat,neutron and default_network_interface=neutron22:56
dking_desktopI'm using train, currently. I'm open to whatever option works. I saw somewhere in the documentation that I should set cleaning_network, and then I got a complaint that I should set provisioning_network also. I didn't find any documentation, so I just made a flat network, and tried using that.22:56
sean-k-mooneythis is the relevant doc i think https://docs.openstack.org/ironic/train/install/configure-tenant-networks.html22:57
dking_desktopenabled_network_interfaces = flat,neutron, and I don't have a default_network_interface.22:58
sean-k-mooneydking_desktop: i think that is ok22:58
*** jmlowe has quit IRC22:58
sean-k-mooneyit says if default_network_interface is not set the default network interface is determined by looking at the [dhcp]dhcp_provider22:59
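The fallback rule quoted above can be sketched in Python. This is only an illustration, not ironic's own code: the function name resolve_default_network_interface is hypothetical, and the mapping used here (a neutron DHCP provider selects flat, any other provider selects noop, and dhcp_provider itself defaults to neutron) is an assumption based on the Train documentation linked above.

```python
import configparser


def resolve_default_network_interface(conf_text):
    """Guess which network interface ironic would default to.

    Hypothetical helper: mirrors the documented fallback, not ironic's code.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # An explicit setting always wins.
    explicit = parser.get("DEFAULT", "default_network_interface", fallback=None)
    if explicit:
        return explicit

    # Assumption: dhcp_provider defaults to "neutron" when it is not set.
    provider = parser.get("dhcp", "dhcp_provider", fallback="neutron")

    # Assumption: neutron provider -> flat, anything else -> noop.
    return "flat" if provider == "neutron" else "noop"


example_conf = """
[DEFAULT]
enabled_network_interfaces = flat,neutron

[dhcp]
dhcp_provider = neutron
"""

print(resolve_default_network_interface(example_conf))
```

With the configuration dking_desktop describes (flat,neutron enabled and no default_network_interface), this sketch would land on flat as long as the DHCP provider is neutron.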
sean-k-mooneydking_desktop: did you disable security groups for your provisioning and cleaning network23:00
dking_desktopI was just reading about that. I did not.23:02
dking_desktopI suppose that I should set cleaning_network_security_groups and provisioning_network_security_groups ? Are those the group names or IDs?23:02
sean-k-mooneyusually the uuid23:03
sean-k-mooneyif introspection is working then you are 90% of the way there23:04
sean-k-mooneyas that means 1 ironic can manage the hardware over ipmi23:04
*** tkajinam has joined #openstack-nova23:04
sean-k-mooney2 it can serve the introspection ram disk23:04
dking_desktopYep. It took quite some time to get that working.23:04
dking_desktopSo, I know that it at least can get a ram disk to boot. I just have to figure out how to get the networking straight for provisioning.23:05
*** nweinber has joined #openstack-nova23:06
sean-k-mooneyya unfortunately i think you will have to ask either the ironic or neutron folks23:07
sean-k-mooneyi have done it years ago but i dont use ironic often so i have forgotten most of it23:07
openstackgerritBrian Rosmaita proposed openstack/nova master: Reject boot request for unsupported images  https://review.opendev.org/70773823:08
sean-k-mooneydking_desktop: do you need multi tenancy by the way for the provisioning network23:09
sean-k-mooneyif its a private cloud you could look at the simpler flat configuration23:10
dking_desktopI might need to do that. Right now, I want to leave my options open.23:10
*** ivve has quit IRC23:11
*** huaqiang has quit IRC23:11
*** slaweq has joined #openstack-nova23:11
dking_desktopIs the "provisioning_network" only to get the ramdisk booted and deploy the server? So, once that's done, it's either not necessary, or perhaps only for status updates?23:11
*** huaqiang has joined #openstack-nova23:11
sean-k-mooneyyes basically23:12
sean-k-mooneyit is the network that needs to have connectivity to where the image is located23:12
sean-k-mooneyand the tftp server23:12
sean-k-mooneyonce the ironic node is provisioned it will normally use a different interface for the tenant to ssh in/have network connectivity out onto the datacenter23:13
sean-k-mooneydking_desktop: you might be hitting this by the way https://docs.openstack.org/ironic/train/admin/troubleshooting.html#dhcp-during-pxe-or-ipxe-is-inconsistent-or-unreliable23:15
dking_desktopSo, maybe you can help me here. Inside of the network namespace, I'm not seeing any DHCP requests. That explains why I didn't see anything logged and no responses.23:15
*** slaweq has quit IRC23:16
sean-k-mooneyya so its possible the dhcp request is being dropped by the switch before it gets to the controller23:16
dking_desktopSo, how _should_ the packets be getting there? I see that this network interface is an ovs port inside of br-int. I know that br-int is patched to br-ex.23:17
*** nweinber has quit IRC23:17
sean-k-mooneyyes and the br-ex should have a physical interface attached23:18
dking_desktopThe server is booting up using DHCP/PXE, but it is on a trunked port, so the packets are coming in untagged. I know that's caused me trouble before.23:18
sean-k-mooneyright so if the neutron network is a flat network23:18
*** TxGirlGeek has quit IRC23:18
sean-k-mooneythen it should be untagged from the server, get to the top of rack switch and remain untagged23:19
sean-k-mooneythen as its a broadcast it will flood23:19
dking_desktopIt does. It's attached to bond0. So, does br-ex send DHCP broadcasts to br-int, and then it sends them to all of its ports? That doesn't sound right.23:19
sean-k-mooneyeventually make it to the controller23:19
sean-k-mooneywhen it arrives in the controller it will enter the br-ex. it will be vlan tagged with a local vlan and then be flooded to only the ports of that vlan23:20
sean-k-mooneythen the tag will be stripped when it is sent to the dhcp namespace23:20
sean-k-mooneyso if you do a tcpdump on the bond you should see the request23:20
sean-k-mooneyif its getting that far23:21
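When checking a capture taken on the bond, the two things to look for are whether a frame carries an 802.1Q tag and whether it is a DHCP request (UDP port 68 to port 67). A rough Python sketch of that check on a raw Ethernet frame follows. The helper name describe_frame is hypothetical, and it assumes IPv4 over Ethernet with at most one VLAN tag; it is not part of neutron, ironic or tcpdump.

```python
import struct

ETH_P_8021Q = 0x8100  # ethertype that marks an 802.1Q VLAN tag
ETH_P_IP = 0x0800     # ethertype for IPv4
IPPROTO_UDP = 17


def describe_frame(frame):
    """Return (vlan_id or None, is_dhcp_request) for a raw Ethernet frame."""
    # Bytes 0-11 are the destination and source MACs; 12-13 hold the ethertype.
    ethertype = struct.unpack("!H", frame[12:14])[0]
    offset = 14
    vlan_id = None

    if ethertype == ETH_P_8021Q:
        # Tagged frame: 2 bytes of tag control info, then the real ethertype.
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan_id = tci & 0x0FFF  # low 12 bits are the VLAN ID
        offset = 18

    if ethertype != ETH_P_IP:
        return vlan_id, False

    ip_header_len = (frame[offset] & 0x0F) * 4  # IHL is in 32-bit words
    if frame[offset + 9] != IPPROTO_UDP:
        return vlan_id, False

    udp_start = offset + ip_header_len
    src_port, dst_port = struct.unpack("!HH", frame[udp_start:udp_start + 4])
    # A DHCP client sends from port 68 to the server port 67.
    return vlan_id, (src_port == 68 and dst_port == 67)
```

On a flat network, a request captured on the physical bond would be expected to come back with no VLAN ID; the local VLAN tag described above only exists inside the integration bridge.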
dking_desktopYes, I see them on the requests. In order to get ironic dnsmasq to work, though, I had to bring up the br-ex interface with an IP address. Could that be messing with this?23:22
sean-k-mooneyperhaps. the br-ex normally should not require an ip23:23
dking_desktopSo, the baremetal server sends a DHCP request, it goes through the chassis switch, to the ToR, and then from there to the controller, and I see the data coming in on bond0.23:23
sean-k-mooneyso you deployed a second dnsmasq for ironic23:24
sean-k-mooneythat is valid but you have to set the dhcp provider i believe23:24
dking_desktopMaybe not, but without it, I couldn't get ironic's dnsmasq to be able to see the packets. So, it was a hack. Would there have been a better way? Folks in the other channel were recommending that I have untagged packets tagged at the switch port, but so far, that's not been working.23:24
sean-k-mooneyso that is the old way to do it im not sure if its still required or the default.23:25
sean-k-mooneywhen ironic was first created it handled almost all its networking itself23:25
sean-k-mooneythen neutron was added after23:25
dking_desktopIronic handles its own dnsmasq. It works fine once I manually changed the interface to br-ex and put an IP on br-ex to bring it up.23:25
sean-k-mooneyslowly over the last few years they have been moving to using neutron where possible23:25
sean-k-mooneyya23:26
sean-k-mooneythat was how i deployed previously23:26
sean-k-mooneyif you do a tcpdump on br-ex23:26
sean-k-mooneydo you see the dhcp request23:26
dking_desktopYes, I can see them on br-ex23:27
sean-k-mooneyand they are not vlan tagged23:27
dking_desktopCorrect23:28
sean-k-mooneyi have had issues with the default route and arp that caused the responses to not be sent by the br-ex in the past23:28
sean-k-mooneydo you have a second interface on the same subnet23:28
dking_desktopWhich subnet is that? The one that I setup for br-ex?23:28
sean-k-mooneyyes23:29
sean-k-mooneyi have had issues in the past where i have added an ip to the br-ex and received packets but had the reply sent via ens3 because it had an ip in the same subnet but a better metric23:29
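The failure mode described above comes down to how an outgoing route is chosen when two interfaces sit in the same subnet. Below is a simplified Python model of that choice: longest matching prefix first, then lowest metric. The function pick_route, the example addresses and the metric values are all made up for illustration, and the real kernel lookup has more inputs (policy rules, source address, and so on).

```python
import ipaddress


def pick_route(routes, destination):
    """Pick a route as a simplified kernel would.

    routes: list of (prefix, device, metric) tuples.
    Returns the winning tuple, or None if nothing matches.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r[0])]
    if not candidates:
        return None
    # Longest prefix wins; ties are broken by the lowest metric.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r[0]).prefixlen, r[2]),
    )


# Hypothetical routing table: both devices have an IP in the same subnet.
example_routes = [
    ("192.0.2.0/24", "br-ex", 425),
    ("192.0.2.0/24", "ens3", 100),
]

print(pick_route(example_routes, "192.0.2.50"))
```

In this made-up table the reply leaves via ens3 even though the request arrived on br-ex, which is the symptom being described. Listing the routes for the subnet, as dking_desktop does next, is how to rule it out.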
dking_desktopI don't think so. I set that up to use an IP from the range I've been using for untagged packets.23:29
*** spatel has joined #openstack-nova23:30
dking_desktopNo, the only route for that subnet is through dev br-ex.23:30
sean-k-mooneyok this sounds very familiar but i dont recall the cause.23:31
dking_desktopFrom inside of the qdhcp-* namespace, "tcpdump -i any -nne -xx -Avvvv" hasn't shown any packets yet.23:31
sean-k-mooneyyes so i dont think it will since the initial boot will go to the provisioning network23:32
*** lbragstad_ has joined #openstack-nova23:32
sean-k-mooneyi suspect if you check the uuid that is the dhcp agent for the tenant network23:32
sean-k-mooneynot the provisioning network23:32
sean-k-mooneyanyway its getting late and im out of ideas so ill have to leave it there23:33
*** lbragstad has quit IRC23:33
*** spatel has quit IRC23:34
dking_desktopOh, okay. I see now that the * in qdhcp-* is actually an ID for a network. Exactly, it's the tenant network.23:35
dking_desktopLet me check that I setup DHCP for the provisioning network.23:35
dking_desktopOh, I did not enable DHCP for the provisioning network. I can enable it. Is that what's supposed to happen?23:36
sean-k-mooneyif you didnt enable it in the subnet then neutron would not create the namespace or spawn the dnsmasq process for it so that could be it23:36
sean-k-mooneythere is one way to find out :) but i think if you use the neutron networking interface driver then yes you should turn it on23:37
dking_desktopLet me try that. But if that's the case, would the DNS be the right one, with the PXE information in it?23:37
sean-k-mooneyif you use the flat network interface driver i think you deploy a separate dnsmasq for ironic as you did manually23:37
sean-k-mooneyhonestly im at the edge of my knowledge here as i said its been a while but i cant recall23:38
dking_desktopYour help has been very enlightening. Also, I just saw some packets in that network, and it does seem to be set up for PXE. I'm going to try creating a server and see if that works.23:42
*** lbragstad__ has joined #openstack-nova23:42
dking_desktopBut even if not, I've learned much, so thank you very much!23:42
*** lbragstad_ has quit IRC23:44
sean-k-mooneythis is really really old but if you have not seen it before its how neutron ovs networking used to work23:45
sean-k-mooneyhttps://www.rdoproject.org/networking/networking-in-too-much-detail/23:45
sean-k-mooneyits now simpler23:45
sean-k-mooneybut it is a good thing to read over at least once even if its not how it works exactly today23:45
* sean-k-mooney feels old since this was published after i started working on openstack and figuring this stuff out23:46
dking_desktopThank you. I'll check that out. It may help with some of the dark spots in my knowledge.23:48
*** lbragstad_ has joined #openstack-nova23:49
*** lbragstad__ has quit IRC23:51
*** lbragstad__ has joined #openstack-nova23:56
*** lbragstad_ has quit IRC23:57
