Thursday, 2019-07-25

*** artom has joined #openstack-nova00:00
*** xek has joined #openstack-nova00:01
*** whoami-rajat has quit IRC00:01
*** mlavalle has quit IRC00:04
*** jistr has quit IRC00:15
*** jistr has joined #openstack-nova00:15
*** gyee has quit IRC00:49
*** ricolin has joined #openstack-nova00:55
*** igordc has quit IRC01:04
*** slaweq has joined #openstack-nova01:11
*** slaweq has quit IRC01:15
*** KeithMnemonic has quit IRC01:16
*** brinzhang has joined #openstack-nova01:28
*** mvkr_ has quit IRC01:41
*** mriedem has quit IRC01:49
*** mvkr_ has joined #openstack-nova01:54
*** whoami-rajat has joined #openstack-nova03:06
*** slaweq has joined #openstack-nova03:11
*** slaweq has quit IRC03:16
*** psachin has joined #openstack-nova03:38
*** rcernin has quit IRC03:55
openstackgerritZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store  https://review.opendev.org/62164604:01
*** udesale has joined #openstack-nova04:06
*** etp has joined #openstack-nova04:07
*** pcaruana has joined #openstack-nova04:44
*** boxiang has quit IRC04:48
*** boxiang has joined #openstack-nova04:48
*** amodi has quit IRC04:49
*** pcaruana has quit IRC04:56
*** Luzi has joined #openstack-nova05:03
openstackgerritmelanie witt proposed openstack/nova-specs master: Propose policy rule for host status UNKNOWN  https://review.opendev.org/66618105:05
*** slaweq has joined #openstack-nova05:11
*** slaweq has quit IRC05:16
*** ratailor has joined #openstack-nova05:30
*** rcernin has joined #openstack-nova05:33
*** brault has quit IRC05:35
*** tetsuro has joined #openstack-nova05:56
*** jaosorior has quit IRC06:03
*** rcernin has quit IRC06:03
*** tetsuro has quit IRC06:04
*** shilpasd has joined #openstack-nova06:06
*** belmoreira has joined #openstack-nova06:08
*** brinzhang has quit IRC06:10
*** brinzhang has joined #openstack-nova06:11
*** slaweq has joined #openstack-nova06:11
*** mkrai has joined #openstack-nova06:13
*** slaweq has quit IRC06:15
*** rcernin has joined #openstack-nova06:18
*** jaosorior has joined #openstack-nova06:20
*** pcaruana has joined #openstack-nova06:21
*** rcernin has quit IRC06:21
*** rcernin has joined #openstack-nova06:21
*** dpawlik has joined #openstack-nova06:31
*** maciejjozefczyk has joined #openstack-nova06:32
*** slaweq has joined #openstack-nova06:33
*** udesale has quit IRC06:33
*** udesale has joined #openstack-nova06:34
*** luksky123 has joined #openstack-nova06:48
-openstackstatus- NOTICE: The git service on opendev.org is currently down.06:50
*** ChanServ changes topic to "The git service on opendev.org is currently down."06:50
*** dpawlik has quit IRC06:50
*** yaawang has quit IRC06:55
*** yaawang has joined #openstack-nova06:57
*** ChipOManiac has joined #openstack-nova06:58
openstackgerritZHOU YAO proposed openstack/nova master: Preserve UEFI NVRAM variable store  https://review.opendev.org/62164607:04
*** rcernin has quit IRC07:06
*** tesseract has joined #openstack-nova07:15
*** udesale has quit IRC07:16
*** adriant has quit IRC07:17
*** dpawlik has joined #openstack-nova07:17
*** udesale has joined #openstack-nova07:18
*** adriant has joined #openstack-nova07:18
*** rpittau|afk is now known as rpittau07:22
ChipOManiacHi guys. We have an openstack cluster with three KVM compute hosts that we set up via openstack-ansible. We've set up a single Nova-LXD compute unit and then imported a tgz ubuntu cloud image into our images list.07:25
ChipOManiacOur problem seems to be with launching any LXD instances on this new Nova-LXD compute host.07:26
ChipOManiacIf we try starting an LXD instance with this image, Nova creates a KVM instance and tries to boot the tgz with it.07:27
ChipOManiacObviously that won't work.07:27
ChipOManiacIs there any way to make the LXD instance work here?07:28
ChipOManiacI've seen Ubuntu charms deployments have a 'root-tar' image format. I don't see that here in our openstack-ansible. Is there any way for me to add this disk format?07:29
*** boxiang has quit IRC07:30
*** boxiang_ has joined #openstack-nova07:30
*** tssurya has joined #openstack-nova07:33
*** bhagyashris has joined #openstack-nova07:38
*** priteau has joined #openstack-nova07:45
*** tkajinam has quit IRC07:53
*** tkajinam has joined #openstack-nova07:53
*** jaosorior has quit IRC07:57
*** ralonsoh has joined #openstack-nova08:13
*** ttsiouts has joined #openstack-nova08:18
kashyapaspiers: Catching up with the long scroll here...buried in several things08:20
*** cdent has joined #openstack-nova08:22
*** belmoreira has quit IRC08:28
*** panda has quit IRC08:28
*** belmoreira has joined #openstack-nova08:29
*** belmoreira has quit IRC08:29
*** panda has joined #openstack-nova08:31
*** mrch_ has joined #openstack-nova08:33
-openstackstatus- NOTICE: Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers.08:33
*** ChanServ changes topic to "Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers."08:33
*** tkajinam has quit IRC08:36
*** tetsuro has joined #openstack-nova08:37
*** ChanServ changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack."08:40
-openstackstatus- NOTICE: The problem in our cloud provider has been fixed, services should be working again08:40
*** luksky123 has quit IRC08:54
openstackgerritMerged openstack/nova master: Correct project/user id descriptions for os-instance-actions  https://review.opendev.org/67002708:58
stephenfinalex_xu: Would you be okay with me fixing this up in a follow-up?09:01
stephenfin<mschuppert> kashyap: yes. when you'd reserve one host per supported version and can not get multiple versions on one host.09:01
stephenfin<mschuppert> kashyap: like I mentioned yesterday . the devnest pool is a shared pool for multiple DFGs. there are only 2 systems or so which are really exclusive for compute.09:01
stephenfin<kashyap> mschuppert: Hmm, didn't realize that fully09:01
stephenfinwhoops09:01
stephenfinhttps://review.opendev.org/#/c/551026/09:01
stephenfinHexChat's copy-paste behaviour is FUBAR09:01
alex_xustephenfin: ok, i'm cool with that09:02
*** luksky123 has joined #openstack-nova09:08
openstackgerritStephen Finucane proposed openstack/nova master: Follow-up for I2936ce8cb293dc80e1a426094fdae6e675461470  https://review.opendev.org/67266909:08
stephenfinalex_xu: ^09:08
*** ivve has joined #openstack-nova09:08
alex_xuthanks09:08
*** lpetrut has joined #openstack-nova09:15
*** lpetrut has quit IRC09:16
*** lennyb has joined #openstack-nova09:16
*** lpetrut has joined #openstack-nova09:16
kashyapaspiers: On "what's wrong with option 1 -- do we need to pass virt_type?" -- if libvirt "knows" that KVM is available, then if you pass virt_type as None to getDomainCapabilities(), it defaults to KVM09:17
kashyapI've done a bunch of quick tests09:17
*** tetsuro has quit IRC09:28
*** tetsuro has joined #openstack-nova09:29
aspierskashyap, sean-k-mooney: I got a reply from our libvirt guy09:29
aspiershe said "domcapabilities should be the same with or without virttype"09:29
aspiersso I'm not sure why it's a parameter in the API call09:29
aspiers"default on a kvm host would be virttype=kvm"09:30
*** luksky123 has quit IRC09:31
kashyapaspiers: I'm getting you some diffs and results.  1 sec, uploading the files09:31
kashyaphttps://kashyapc.fedorapeople.org/domCapabilities/domCapabilities_without_virt_type.txt09:32
kashyaphttps://kashyapc.fedorapeople.org/domCapabilities/domCapabilities_with_virt_type_kvm.txt09:32
kashyapaspiers: If you `diff` them, you'd see no `diff` (besides one line of unrelated noise)09:32
aspiersok09:33
kashyapaspiers: But ... as you guessed, if you explicitly supply virt_type as 'qemu', you would see a significant difference09:33
kashyapHere is the 'diff': https://kashyapc.fedorapeople.org/domCapabilities/diff_domCapabilities_of_virt_type_kvm_and_qemu.txt09:33
*** takashin has left #openstack-nova09:34
*** luksky123 has joined #openstack-nova09:35
kashyapaspiers: In other words, "your guy" is of course correct :-)09:35
aspierskashyap: thanks09:52
kashyapaspiers: I'm still finishing something; but yes, option-4 is what I'd lean towards (at 'debug' level)09:53
aspiersI think that's what sean-k-mooney's new PS implemented09:53
kashyapAnd also yes to sean-k-mooney's point: we _do_ want to use 'virt_type' when we know it, as that will ensure the right CPU features are reported.09:54
kashyapaspiers: Ah, okay.  I'm lagging behind, as I was basing my comments on your IRC exchange linked in the review.09:54
kashyap(Didn't refresh)09:54
aspierskashyap: https://review.opendev.org/#/c/670189/9..10/nova/virt/libvirt/host.py09:54
kashyapThank you09:54
bhagyashrisstephenfin: I went through all your patches, applied them in my environment, and did some testing; I have a few observations09:58
stephenfinshoot09:58
bhagyashrisstephenfin: 1. I am able to create a pinned instance the old way on your patches. You can see the steps I followed to create the instance here: http://paste.openstack.org/show/754840/09:59
bhagyashrisstephenfin: 2. I have also checked a few scenarios and saw some issues there. You can see them here: http://paste.openstack.org/show/754841/09:59
stephenfinYeah, I'd expect it to consume both VCPU and PCPU because I don't have the handling code you do. Your implementation is better in that regard10:01
stephenfinThe other two scenarios are interesting. I wonder what I'm hitting there10:01
bhagyashrisstephenfin: yeah it's handling this case ....10:01
bhagyashrisstephenfin: I know where is point that's reporting wrong inventory10:02
stephenfinOh yeah?10:04
bhagyashrisHere https://review.opendev.org/#/c/671793/4/nova/virt/libvirt/driver.py@6852 if I set [compute] cpu_dedicated_set then it reports VCPU resources as well10:04
bhagyashrisstephenfin: because in the self._get_vcpu_total() method you check if vcpu_pin_set, elif CONF.compute.cpu_shared_set, else all the host_cpus10:05
stephenfinAh, yes. So before that fallthrough case I need a final conditional to check if cpu_dedicated_set is set and return nothing if so10:06
stephenfinAnd ditto for the '_get_vcpu_total' method10:06
bhagyashrisyes10:06
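The fix stephenfin describes is an ordering question: which config option wins when deciding what to report as shared (VCPU) inventory. A hedged sketch of that precedence, using a simplified stand-in function rather than Nova's real `_get_vcpu_total` and config plumbing:

```python
# Simplified stand-in for the conditional chain discussed above. The
# parameter names mirror the real options (vcpu_pin_set, cpu_shared_set,
# cpu_dedicated_set), but the function itself is illustrative, not Nova's.

def get_vcpu_available(host_cpus, vcpu_pin_set=None,
                       cpu_shared_set=None, cpu_dedicated_set=None):
    """Return the set of host CPUs usable for shared (VCPU) guests."""
    if vcpu_pin_set:                  # legacy option takes precedence
        return set(vcpu_pin_set) & set(host_cpus)
    if cpu_shared_set:                # explicit shared pool
        return set(cpu_shared_set) & set(host_cpus)
    if cpu_dedicated_set:             # dedicated-only host: report no VCPUs
        return set()                  # <- the missing final conditional
    return set(host_cpus)             # fallthrough: everything is shared

host = range(8)
print(get_vcpu_available(host, cpu_dedicated_set={0, 1, 2, 3}))  # set()
```

Without the third conditional, a host configured only with cpu_dedicated_set falls through to the last branch and wrongly reports every CPU as VCPU inventory, which is the bug bhagyashris hit.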
bhagyashrisstephenfin: I will keep testing and review your patches10:07
stephenfinbhagyashris: The one I'm most interested in your thoughts on is https://review.opendev.org/#/c/671800/7/nova/objects/numa.py10:07
stephenfinBecause I think that and the changes to InstanceNUMACell are the biggest differences we have10:08
stephenfinI don't know what we do with old NUMACell objects. For those, 'cpu_usage' can contain usage of either pinned (PCPU) or unpinned (VCPU) instance vCPUs10:09
bhagyashrisstephenfin: yeah, the same question was in my mind10:09
stephenfinand we don't ever rebuild the objects from scratch10:09
stephenfininstead, we use that numa_usage_from_instances function to add or subtract usage based on a provided instance NUMA topology10:10
stephenfinI'm thinking it might make sense to start retrieving all instances associated with a host and building the host NUMA topology object from scratch each time10:11
stephenfinbut that would involve a join on the instance extra table in some places10:11
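The trade-off stephenfin raises, incrementally adding/subtracting usage versus rebuilding it from every instance on the host, can be sketched with a toy usage counter (these are not Nova's real NUMACell objects):

```python
# Toy model of the two accounting strategies discussed: mutate a running
# usage counter per instance, or recompute it from the full instance list.
# Both should agree; the incremental style can drift if an update is missed.

def apply_usage(cell_usage, instance_vcpus, sign=1):
    """Incremental style: add (sign=1) or subtract (sign=-1) one instance."""
    return cell_usage + sign * instance_vcpus

def rebuild_usage(all_instance_vcpus):
    """Rebuild-from-scratch style: recompute from every instance on the host."""
    return sum(all_instance_vcpus)

usage = 0
for vcpus in (4, 2, 8):                  # three instances land on the host
    usage = apply_usage(usage, vcpus)
usage = apply_usage(usage, 2, sign=-1)   # the 2-vCPU instance is deleted

assert usage == rebuild_usage([4, 8])    # the two strategies agree
print(usage)
```

The rebuild style is what stephenfin suggests moving to; its cost is the extra database join on instance_extra he mentions, since it needs every instance NUMA topology each time.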
*** mkrai has quit IRC10:12
bhagyashrisstephenfin: yeah that is one option10:14
bhagyashrisstephenfin: what's your opinion about my change? I mean, I made changes in both InstanceNUMATopology and host NUMATopology10:15
stephenfinYeah, as noted I'm not sure if it's necessary yet. We have the 'cpu_policy' field on that object so we're already able to tell if 'cpuset' describes VCPUs or PCPUs10:16
stephenfinThat will change when we support both types in the same instance, but we're doing that separately10:17
stephenfinSpeaking of which, I need to review that spec again today10:17
*** ttsiouts has quit IRC10:17
stephenfinSo I don't mind having it, but I think it might be unnecessary for now and possibly make things a little more complicated than necessary10:17
*** ttsiouts has joined #openstack-nova10:18
bhagyashrisstephenfin: okay, but what I thought is: we are going to support both VCPU and PCPU in the future anyway, so keeping it now won't cause any problem, and I have added the API and scheduler checks that don't allow both PCPU and VCPU in one request10:19
bhagyashrisstephenfin: so in future there will be just matter of removing that check10:20
stephenfinYup, I get that. Maybe it makes sense. I haven't really parsed how much complexity it adds so maybe it's not an issue10:20
stephenfinI just wanted to highlight that it wasn't 100% necessary yet, if that makes sense10:20
sean-k-mooneywell we will need to add a field to store the mask of pinned cores10:20
sean-k-mooneywhich we should not add until we need it10:20
sean-k-mooneye.g. we should not make object changes that would only be required when we allow mixed instances10:21
sean-k-mooneyuntil we support that10:21
stephenfinYeah, that's my gut feeling too10:21
stephenfinYAGNI or something like that10:21
bhagyashrisstephenfin, sean-k-mooney: ok.10:22
*** ttsiouts has quit IRC10:22
stephenfinbhagyashris: On the plus side, it should be very easy to reuse it if/when that spec to allow mixed instances gets merged, so that's a win :)10:23
bhagyashrisstephenfin, sean-k-mooney: means for now we will consider the cpuset only10:24
bhagyashrisstephenfin: ok10:24
bhagyashrisstephenfin: and one more point: we are going to allow the new flavor extra spec syntax like "resources:PCPU=<no of cpus>", right?10:26
stephenfinI think we decided we would have to, yes10:26
stephenfinThough it wouldn't be the preferred option, of course10:26
*** brtknr has quit IRC10:27
bhagyashrisstephenfin: and I saw that this is not yet implemented in your series of patches ... and I have implemented that, so maybe we can use that code10:27
stephenfinbhagyashris: I'm working on rebasing your stuff into my series (keeping authorship, of course) to do just that now :)10:28
*** cdent has quit IRC10:29
bhagyashrisstephenfin: also the upgrade and reshape part is not in your series of patches, and I have submitted the patch https://review.opendev.org/#/c/672224/1 to do so. I will address all your review comments on it and will upload the patch (this is priority work for me)10:30
bhagyashrisstephenfin: now the only part remaining is which one will be the better option: this one https://review.opendev.org/#/c/672223/1 or this one https://review.opendev.org/#/c/671801/710:32
*** brtknr has joined #openstack-nova10:33
stephenfinYeah, pretty much10:34
bhagyashrisstephenfin: I guess this one will be the better option https://review.opendev.org/#/c/672223/1 because it's simpler and also, as mentioned in the spec, we will use the scheduler profiler https://review.opendev.org/#/c/555081/28/specs/train/approved/cpu-resources.rst@45110:34
stephenfinI'm still on the fence, personally10:34
stephenfinYeah, definitely a lot less work there10:34
stephenfinThe only thing is that rewriting the user's extra specs feels a little wrong. I'm not sure why10:35
stephenfinAssuming these are persisted somewhere10:35
bhagyashrisstephenfin: yeah, but it looks simpler, it's less work, and there is no overhead of a conf option either10:36
stephenfinI think we still need the config option, no?10:37
stephenfinOtherwise how do we say "don't start converting these extra specs yet because I don't have enough hosts reporting PCPU inventory" ?10:37
bhagyashrisstephenfin: ohh ok10:39
stephenfinI'm not crazy, right? We do need an option for that, yeah?10:40
bhagyashrisstephenfin: will wait for others opinion then10:40
bhagyashrisstephenfin: the TODOs are 1. the aliasing of flavor extra specs 2. the upgrade-related stuff ... of those, for the first I will wait for other opinions, and I will take the upgrade part as high priority10:45
bhagyashrisstephenfin: what's your opinion?10:45
*** cdent has joined #openstack-nova10:46
*** etp has quit IRC10:46
*** jaosorior has joined #openstack-nova10:47
openstackgerritMaksim Malchuk proposed openstack/nova stable/queens: fix cellv2 delete_host  https://review.opendev.org/67269010:49
*** bhagyashris has quit IRC10:50
*** brtknr has quit IRC10:56
*** brtknr has joined #openstack-nova10:56
openstackgerritStephen Finucane proposed openstack/nova master: Follow-up for I2936ce8cb293dc80e1a426094fdae6e675461470  https://review.opendev.org/67266911:01
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement  https://review.opendev.org/67179311:01
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Rename exception argument  https://review.opendev.org/67179511:01
openstackgerritStephen Finucane proposed openstack/nova master: trivial: Remove unused function parameter  https://review.opendev.org/67179611:01
openstackgerritStephen Finucane proposed openstack/nova master: Remove 'hardware.get_host_numa_usage_from_instance'  https://review.opendev.org/67179711:01
openstackgerritStephen Finucane proposed openstack/nova master: Remove 'hardware.host_topology_and_format_from_host'  https://review.opendev.org/67179811:01
openstackgerritStephen Finucane proposed openstack/nova master: Remove 'hardware.instance_topology_from_instance'  https://review.opendev.org/67179911:01
openstackgerritStephen Finucane proposed openstack/nova master: Rework 'hardware.numa_usage_from_instances'  https://review.opendev.org/67256511:01
openstackgerritStephen Finucane proposed openstack/nova master: tests: Split NUMA object tests  https://review.opendev.org/67233611:01
openstackgerritStephen Finucane proposed openstack/nova master: WIP: hardware: Differentiate between shared and dedicated CPUs  https://review.opendev.org/67180011:01
openstackgerritStephen Finucane proposed openstack/nova master: Add support translating CPU policy extra specs, image meta  https://review.opendev.org/67180111:01
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available'  https://review.opendev.org/67269311:01
*** adriant has quit IRC11:07
*** adriant has joined #openstack-nova11:07
openstackgerritStephen Finucane proposed openstack/nova master: objects: Remove ConsoleAuthToken.to_dict  https://review.opendev.org/65297011:08
openstackgerritStephen Finucane proposed openstack/nova master: WIP! docs: Rework nova console diagram  https://review.opendev.org/66014711:08
openstackgerritStephen Finucane proposed openstack/nova master: docs: Integrate 'sphinx.ext.imgconverter'  https://review.opendev.org/66569311:08
*** jaosorior has quit IRC11:08
*** udesale has quit IRC11:13
*** ChipOManiac has quit IRC11:19
slaweqsean-k-mooney: hi again11:20
sean-k-mooneyo/11:20
slaweqsean-k-mooney: again about https://bugs.launchpad.net/neutron/+bug/1836642 - I replied to Your last comment11:20
openstackLaunchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)11:20
slaweqcan You check it?11:20
sean-k-mooneyya i saw11:21
sean-k-mooneywhat i think happened there was: the first call took 13 seconds to complete; a second call was received while the first was completing, and that took 2 seconds since the first call had not completed11:22
slaweqsean-k-mooney: no, first call took 2 seconds11:23
slaweqand second took 1311:23
slaweqfirst call is always for /instance-id11:23
sean-k-mooneyok its probably a cache miss then11:23
slaweqand that would be retried if it took more than 10 seconds - that's how the ec2-metadata script from cirros works11:23
slaweqand second call (after first was completed fine) was for public-keys and this took 13 seconds11:24
slaweqand this isn't retried by ec2-metadata script so it failed11:24
*** tetsuro has quit IRC11:24
sean-k-mooneyright ok11:25
slaweqsean-k-mooney: so I think that it's like that because each of those requests are processed by different worker thus for each worker data isn't cached11:26
slaweqsean-k-mooney: could it be like that?11:26
sean-k-mooneyi think we are using memcached to cache it but if its in-process, yes11:26
sean-k-mooneyif its memcache it still takes a while to propagate11:27
sean-k-mooneythose request are ~500 ms apart11:27
slaweqcan You maybe check it to be sure?11:27
slaweq500ms when? usually when it's fine, right?11:28
sean-k-mooney_Jul_09_17_23_47_872908 - _Jul_09_17_23_47_351149 ~ 500ms11:28
sean-k-mooneyactually i guess those are the log messages for the completions?11:29
slaweqyes11:29
slaweqand there is time in each line11:29
slaweq"time: 13.2546282"11:29
slaweqthis is how long this took to process this request and send response11:29
sean-k-mooneyanyway ill check11:30
slaweqsean-k-mooney: thx11:30
openstackgerritMerged openstack/nova master: Remove deprecated CPU, RAM, disk claiming in resource tracker  https://review.opendev.org/55102611:36
openstackgerritMerged openstack/nova master: Pass extra_specs to flavor in vif tests  https://review.opendev.org/66255611:36
sean-k-mooneyso it checks the cache here https://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/handler.py#L7711:36
sean-k-mooneywhich is initialised here https://opendev.org/openstack/nova/src/branch/master/nova/api/metadata/handler.py#L4711:37
sean-k-mooneyi think this delegates to oslo cache11:37
sean-k-mooneyso we call this https://github.com/openstack/nova/blob/master/nova/cache_utils.py#L4711:38
*** stakeda has quit IRC11:39
sean-k-mooneythere is no cache section defined in the nova.conf http://logs.openstack.org/35/521035/8/check/tempest-full/031b0b9/controller/logs/etc/nova/nova_conf.txt.gz11:40
sean-k-mooneymeaning we fall back to oslo_cache's default, which is an in-memory dogpile dict cache11:41
*** pcaruana has quit IRC11:42
sean-k-mooneyslaweq: so looking at the metadata api's uwsgi config we are running 1 thread per process and 2 processes11:42
sean-k-mooneyso there are two workers, each of which has a separate in-memory dict cache11:43
*** igordc has joined #openstack-nova11:43
sean-k-mooneyslaweq: so this is likely just because oslo.cache is not configured to use memcached in that job11:43
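The failure mode sean-k-mooney diagnoses can be modelled in a few lines: two worker processes, each with its own in-process dict cache, so a value cached by one worker is still a miss for the other. The Worker class, key, and value here are illustrative only:

```python
# Toy model of per-process dict caching behind a multi-worker API. Each
# Worker stands in for a separate uwsgi process with its own dogpile
# memory-backend cache; nothing here is Nova's real code.

class Worker:
    def __init__(self):
        self._cache = {}          # per-process dict cache, not shared
        self.backend_calls = 0    # how often we paid the slow path

    def get_metadata(self, key):
        if key not in self._cache:
            self.backend_calls += 1               # slow path: real lookup
            self._cache[key] = 'value-for-%s' % key
        return self._cache[key]

worker_a, worker_b = Worker(), Worker()
worker_a.get_metadata('instance-id')   # first request fills A's cache
worker_b.get_metadata('instance-id')   # routed to B: separate cache, slow again

print(worker_a.backend_calls, worker_b.backend_calls)
```

With a shared memcached backend both workers would read the same cache, so only the very first request per key pays the slow path, which is the fix discussed below.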
slaweqsean-k-mooney: so do You think that enabling memcache "globally" in all tempest based jobs would be good and could help to solve/workaround this issue?11:44
sean-k-mooneyi think it would not only solve this issue but significantly speed up the gate11:44
sean-k-mooneywell maybe not11:44
slaweqthen always during the /public-keys/ call we would already have cached data for the instance11:44
sean-k-mooneyit is still doing an in memory dict cache11:45
sean-k-mooneybut it probably would resolve this issue11:45
sean-k-mooneyyep11:45
slaweqsean-k-mooney: ok, that's something11:45
slaweqcan You maybe write this in comment in launchpad?11:45
slaweqI will try to propose patch to tempest repo probably11:45
slaweqor devstack11:45
sean-k-mooneysure. keystone is already configured to use memcached in that job11:47
sean-k-mooneyhttp://logs.openstack.org/35/521035/8/check/tempest-full/031b0b9/controller/logs/etc/keystone/keystone_conf.txt.gz11:47
sean-k-mooneyso it should be trivial to copy the exact same caching config to nova; the config options are identical because they just come from oslo11:47
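Copying keystone's caching setup into nova.conf would look roughly like the fragment below. The option names (`enabled`, `backend`, `memcache_servers`) are standard oslo.cache options; the backend choice and server address are placeholders:

```ini
[cache]
# Illustrative oslo.cache configuration; values are placeholders, not
# copied from the job's real keystone.conf.
enabled = true
backend = dogpile.cache.memcached
memcache_servers = 127.0.0.1:11211
```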
slaweqsean-k-mooney: thx a lot11:47
slaweqI will propose patch for this today11:47
slaweqand we will see how it will work :)11:48
slaweqbut that is some idea which may solve this problem in gate and improve gate stability for all projects in fact :)11:48
sean-k-mooneyactually i think oslo.cache's default is the null cache11:48
sean-k-mooneyso i would expect this to speed up the gate jobs11:48
sean-k-mooneythe null cache never caches anything11:49
openstackgerritAdam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities()  https://review.opendev.org/67018911:49
*** ksdean has joined #openstack-nova11:49
aspierssean-k-mooney: ^^^ fixed a few minor typos/grammar issues in the comments but there is still one issue11:49
*** ttsiouts has joined #openstack-nova11:50
sean-k-mooneyaspiers: :) i think there always will be11:50
sean-k-mooneyaspiers: what is the latest one?11:50
aspierssean-k-mooney: just posted on the review11:50
*** mriedem has joined #openstack-nova11:51
sean-k-mooneywe dont have an easy way to check the exception unless we parse the error message text11:54
aspiersyes that's what I was suggesting11:54
aspiersit's not great but better than nothing11:55
sean-k-mooneyyes and i think that is a bad idea11:55
sean-k-mooneyit makes it rather fragile11:55
aspierswell OK then the debug message should be more honest and not assume what it doesn't know11:55
aspiersit's fine if it says it's guessing the issue11:55
sean-k-mooneythe debug message just states that we are skipping the arch because it's incompatible with the virt type and machine type11:55
sean-k-mooneywe dont state what the incompatibility is11:55
aspiersbut it doesn't know that11:56
aspiersit could fail due to libvirtd issues11:56
sean-k-mooneyit does11:56
aspierse.g. libvirtd crashes 1usec beforehand11:56
aspiersand then you get a misleading debug message11:56
sean-k-mooneyso i originally was going to not have a debug message at all11:56
aspiersno it's good to have one11:56
sean-k-mooneyso we can delete it if we want too11:56
aspiersit should just not risk being wrong11:56
aspiersit's fine if it says "this is *probably* what happened"11:57
aspiersif that's the most likely thing11:57
sean-k-mooneythen we can simply state we are skipping the arch and not say why11:57
aspiersif that's the most likely thing it's better to expose the guess11:57
aspierssince that's potentially more helpful to the operator/dev than not guessing11:57
aspiersas long as it's not misleading11:57
sean-k-mooneye.g. Skipping arch: because libvirt raised an error, check your libvirt logs for more info11:57
aspiersthink of it from the operator perspective11:57
aspiersno the libvirt logs might not reveal anything11:58
sean-k-mooneyaspiers: this is never meant to be read by operators11:58
aspiersif it was incompatible11:58
sean-k-mooneythat is why its at debug level11:58
aspiersif you don't believe operators read DEBUG you live in a different universe ...11:58
aspiers:)11:58
sean-k-mooneythey might but this is not intended for them11:58
aspiersanyway it doesn't matter who is reading it11:58
sean-k-mooneyare you ok with the message i suggested above11:58
aspiersthe point is that the message needs to be a) not misleading b) as helpful as possible11:59
*** sapd1_ has joined #openstack-nova11:59
aspiersOK I will paste a suggested message here, 1 sec11:59
*** sapd1 has quit IRC11:59
sean-k-mooney "Skipping arch: %s becasue libvirt raised an error, check you libvirt logs for more info."11:59
aspiersnope12:00
aspierslike I said libvirt logs might not help12:00
aspiersand in this case we know we might be able to help by guessing the likely cause12:00
sean-k-mooneyyou said you dont want it to be missleading12:00
aspiersyes that is a)12:00
aspiersbut also b)12:01
sean-k-mooneythe current error message is our best guess at why the error was raised12:01
aspiersYes but it's not honest that it's a guess12:01
aspiersThis is better: "Failed to retrieve domain caps from libvirt for arch %s; maybe incompatible with virt_type %s / machine_type %s?"12:01
sean-k-mooneyit is honest it was a summary of the error message12:02
sean-k-mooneywe could just print the error message we get back from libvirt12:02
aspiersYes good idea12:02
aspiersThis is better: "Failed to retrieve domain caps from libvirt for arch %(arch)s (%(error)s); maybe incompatible with virt_type %(virt_type)s / machine_type %(mach_type)s?"12:02
sean-k-mooneyi wanted to avoid the stack trace but we should be able to just get the message12:02
*** dpawlik has quit IRC12:02
aspiersthat last one includes the libvirt error message ^^^12:03
sean-k-mooneyi would not put the error in the middle12:03
sean-k-mooneyi would put it at the end12:03
sean-k-mooneyi guess its not that long12:04
sean-k-mooneyhttp://paste.openstack.org/show/754776/12:04
sean-k-mooneyits invalid argument: KVM is not supported by '/usr/bin/qemu-system-alpha' on this host12:04
sean-k-mooneyat least in the case where the virt type is the issue12:04
aspiersOK good point12:04
*** ratailor has quit IRC12:05
aspiers"Failed to retrieve domain caps from libvirt for arch %(arch)s; maybe incompatible with virt_type %(virt_type)s / machine_type %(mach_type)s? libvirt error was: %(error)s"12:05
sean-k-mooneyya im fine with that12:05
aspiersor actually12:05
aspierssince the libvirt error is already helpful enough12:05
aspiers"Failed to retrieve domain caps from libvirt for arch %(arch)s / virt_type %(virt_type)s / machine_type %(mach_type)s; libvirt error was: %(error)s"12:06
sean-k-mooneysure we dont need to make it a question12:06
sean-k-mooneysince the lbvirt error states what the issue was12:06
* kashyap follows the discussion12:06
kashyapaspiers: Can't we do a check to determine the arch to 'virt_type' compatibility?12:07
sean-k-mooneyno12:07
aspierssean-k-mooney: exactly, I removed the question mark12:07
sean-k-mooneythat is libvirt job12:07
* kashyap still reading12:07
aspierskashyap: I agree with sean-k-mooney here12:07
aspiersnova shouldn't know about that12:07
kashyapsean-k-mooney: I meant, _using_ libvirt's reported results, of course12:07
kashyapaspiers: Yeah, just thinking out loud12:07
kashyapI agree that Nova sholdn't know about it12:08
sean-k-mooneyright if kvm ever adds support for acceleration of non-native instructions i dont want to have to modify nova12:08
aspiersso nova is not checking compatibility, it's just trying to get dom caps12:08
kashyapaspiers: typo: "caps" --> "capabilities"12:08
aspiersif that fails we report the error from libvirt plus details of the API parameters12:08
aspierskashyap: my wrists hurt so I am trying to reduce typing :-p12:08
aspierstypos in IRC are allowed!12:08
aspiersjust not in code12:09
kashyapaspiers: No-no, in the _error_ message I mean12:09
aspiersoh :)12:09
aspiersok sure12:09
kashyapOf course, it's fine here ;-)  I'm not _that_ pedantic12:09
aspiers"Failed to retrieve domain capabilities from libvirt for arch %(arch)s / virt_type %(virt_type)s / machine_type %(mach_type)s; libvirt error was: %(error)s"12:09
kashyapaspiers: Yes, sounds clear and truthful12:09
aspierskashyap: haha well I am sometimes so I wasn't ruling out that you might be too ;-)12:09
sean-k-mooneyya that looks fine12:09
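The agreed message uses oslo-style deferred %(name)s formatting, where the logger interpolates the parameter dict only when debug logging is actually enabled. A small sketch with the stdlib logger and illustrative values (the error string is taken from sean-k-mooney's paste above):

```python
# Sketch of the agreed debug message. The message text comes from the
# discussion above; LOG name, arch, virt_type and machine_type values are
# illustrative.
import logging

logging.basicConfig(level=logging.DEBUG)
LOG = logging.getLogger(__name__)

msg = ('Failed to retrieve domain capabilities from libvirt for arch '
       '%(arch)s / virt_type %(virt_type)s / machine_type %(mach_type)s; '
       'libvirt error was: %(error)s')
params = {'arch': 'alpha', 'virt_type': 'kvm', 'mach_type': None,
          'error': "invalid argument: KVM is not supported by "
                   "'/usr/bin/qemu-system-alpha' on this host"}

# Passing the dict as the single argument defers interpolation to the
# logging framework, the idiom oslo.log also relies on.
LOG.debug(msg, params)

print(msg % params)   # what the rendered line looks like
```

Deferring interpolation matters here because this fires per-arch on every capability refresh; if debug logging is off, the string is never built.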
aspierssean-k-mooney: OK want me to change it or you?12:09
sean-k-mooneysure12:10
sean-k-mooneyi am about to grab lunch12:10
aspierssure == me? :)12:10
aspiersOK12:10
aspiersbefore you go12:10
sean-k-mooneyif you havent by the time i get back ill do it12:10
aspiersI have some semi-exciting news12:10
aspierscheck this out12:10
sean-k-mooneyoh?12:10
aspiersJul 25 01:29:37 della5s17 nova-compute[6543]: DEBUG nova.scheduler.client.report [None req-ce6afb17-2d5f-489a-bfda-667053513883 None None] Inventory has not changed for provider 54d4029e-c36b-4bd3-b922-ab4cdefba128 based on inventory data: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 128, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 128}, u'MEMORY_MB': {u'allocation_ratio': 1.5,12:10
aspiersu'total': 128452, u'reserved': 512, u'step_size': 1, u'min_unit': 1, u'max_unit': 128452}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 95, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 95}, u'MEM_ENCRYPTION_CONTEXT': {u'allocation_ratio': 1.0, u'total': 2147483647, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 2147483647}} {{(pid=6543) set_inventory_for_provider12:10
aspiers/opt/stack/nova/nova/scheduler/client/report.py:912}}12:10
aspiersoops, sorry for linebreaks12:10
aspiersthat's from a real SEV system12:10
aspierstotal is "infinite" cos I didn't configure the nova.conf option yet12:11
aspiersI'm gonna set to 16 now12:11
sean-k-mooneycool12:11
kashyapaspiers: Hehe12:11
sean-k-mooneyi also reached a milestone yesterday where i fully tested the vPMU feature and image metadata prefilter stuff i have been working on12:11
aspiersnice!12:12
aspiersalso12:12
aspiersJul 25 01:24:29 della5s17 nova-compute[6543]: DEBUG nova.scheduler.client.report [None req-ce6afb17-2d5f-489a-bfda-667053513883 None None] Refreshing trait associations for resource provider 54d4029e-c36b-4bd3-b922-ab4cdefba128, traits:12:12
aspiersHW_CPU_X86_SSE,COMPUTE_IMAGE_TYPE_ISO,COMPUTE_NET_ATTACH_INTERFACE,COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG,HW_CPU_X86_AMD_SEV,COMPUTE_IMAGE_TYPE_AKI,COMPUTE_IMAGE_TYPE_ARI,COMPUTE_IMAGE_TYPE_QCOW2,COMPUTE_TRUSTED_CERTS,HW_CPU_X86_SVM,COMPUTE_DEVICE_TAGGING,COMPUTE_VOLUME_ATTACH_WITH_TAG,COMPUTE_VOLUME_MULTI_ATTACH,HW_CPU_X86_SSE2,COMPUTE_IMAGE_TYPE_AMI,COMPUTE_VOLUME_EXTEND,HW_CPU_X86_MMX,COMPUTE_IMAGE_12:12
sean-k-mooneyit was really nice to see the traits showing up in placement for the prefilter and the transform happening12:12
aspiersTYPE_RAW {{(pid=6543) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:796}}12:12
aspiersoh, line breaks are from IRC max line limit12:12
aspiersJul 25 01:13:25 della5s17 nova-compute[6543]:  {{(pid=6543) _get_domain_capabilities /opt/stack/nova/nova/virt/libvirt/host.py:831}}12:12
aspiersJul 25 01:13:25 della5s17 nova-compute[6543]: DEBUG nova.virt.libvirt.host [-] Checking SEV support for arch x86_64 and machine type pc {{(pid=6543) _set_amd_sev_support /opt/stack/nova/nova/virt/libvirt/host.py:12:12
aspiers1120}}12:12
aspiersJul 25 01:13:25 della5s17 nova-compute[6543]: INFO nova.virt.libvirt.host [-] AMD SEV support detected12:12
aspiersso I'm finally ready to test booting a real SEV VM through nova :-O12:13
*** takashin has joined #openstack-nova12:13
sean-k-mooneynice12:14
sean-k-mooneyi have added both my features to the runway queue so hopefully we will see sev on that list soon too12:14
aspiersit's already in the queue12:15
aspiersI thought the series was already ready before I went on vacation so I added it then12:15
sean-k-mooneyi dont see it12:16
aspiersof course I was wrong12:16
aspiersah, got removed again12:16
sean-k-mooneyya because it was in merge conflict12:16
aspiersnot when I added it12:16
sean-k-mooneysure but it went into merge conflict when you were on vacation12:17
aspiersyes12:17
aspiersat least that bit wasn't my fault ;-)12:17
sean-k-mooney:) damb cores merging code12:17
sean-k-mooneyits almost as if its part of there jobs12:17
aspiersI know, disgraceful behaviour12:18
*** mrch_ has quit IRC12:20
sean-k-mooneyok, shower, lunch and then back before a meeting at 2. that should be totally doable...12:21
*** dpawlik has joined #openstack-nova12:21
aspiersgood luck12:21
*** pcaruana has joined #openstack-nova12:22
mriedembrinzhang: https://review.opendev.org/#/c/612949/1012:24
mriedemalex_xu: ^12:24
mriedemi think what i've described as the main proposed change (in my comments) makes sense, but the spec overall is a bit confusing since it seems to mix in lots of different things12:24
mriedemif the spec can be cleaned up in time for the spec freeze i'll probably be +2 on it12:24
mriedemi do appreciate that the entire dev team from inspur has +1ed the spec without leaving any comments though :)12:25
aspiersmriedem: since I'm back from vacation and have got the SEV series back to (AFAICS) 100% health, I've readded to the runway12:26
mriedemwithin like 5 minutes of each other :)12:26
mriedemaspiers: ok12:26
openstackgerritDaniel Speichert proposed openstack/nova-specs master: Directly download and upload images to RBD  https://review.opendev.org/65890312:27
*** tbachman has quit IRC12:40
aspierssean-k-mooney: almost got the PS ready12:55
efriedaspiers: also, it's generally discouraged to put a series in a runway slot if the devs are going to be on vacation or otherwise unavailable during the window to address comments (and merge conflicts).12:56
efriedsee last bullet under "Requirements for being eligible..."12:57
aspiersefried: ack. IIRC it was far back in the queue so didn't seem to have much chance of getting addressed while I was away12:57
aspiersbut maybe I miscalculated12:57
aspiersalso, I was expecting a colleague to be available while I was away12:57
aspiersbut I think he got busy with other commitments12:58
efriedaspiers: What should have happened is whoever actually moved it from the queue to a slot should have asked about dev availability.12:58
aspierslife happens ...12:58
efriedanyway, nbd12:58
aspiersI'm here now anyway :)12:58
aspiersand it's all looking pretty good12:58
efriedit's not like nuclear missile launches will or will not happen based on how tightly we run our runway queue.12:59
aspiersthey won't? :-o12:59
aspiersdamn, I'm in the wrong business12:59
cdentefried: speak for yourself efried, I run my small republic's military on the comings and goings of openstack etherpads13:01
cdentand we have da bomb13:01
aspiersI was only in OpenStack because I thought I was helping hurry the apocalypse along13:01
cdent_exactly_13:01
*** gryf has quit IRC13:01
aspierscdent: is it hotter than hell where you are too?13:01
aspiersmaybe the apocalypse has already started13:02
cdentaspiers: no. it's hotter than normal, but since I'm in cornwall by the sea, it's a balmy and breezy 2413:02
efriedIt's unseasonably cool here in central texas.13:02
cdentso apparently the apocalypse is upcountry, which makes sense13:02
aspiersbah13:02
* cdent blames boris13:02
aspiersI guess London deserves the wrath of satan more than the rest of the country13:02
cdentyes13:02
aspierss/guess/know/13:03
*** tbachman has joined #openstack-nova13:07
*** udesale has joined #openstack-nova13:16
*** mvkr_ has quit IRC13:17
*** jhesketh has quit IRC13:22
openstackgerritAdam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities()  https://review.opendev.org/67018913:23
*** jaosorior has joined #openstack-nova13:23
*** marta_lais has joined #openstack-nova13:24
efriedstephenfin: I'm not sure what cores are able to review --^ but I thought you might be one of them?13:24
efriedI know it's not me :(13:24
aspierssean-k-mooney, kashyap: new version up ^^^13:25
*** jhesketh has joined #openstack-nova13:26
openstackgerritAdam Spiers proposed openstack/nova master: libvirt: harden Host.get_domain_capabilities()  https://review.opendev.org/67018913:29
aspierskashyap: ah, just saw your nits - addressed13:29
*** _hemna has joined #openstack-nova13:30
mriedemefried: fyi i'll be running my kid to a class thing during the meeting so won't be around13:30
*** mkrai has joined #openstack-nova13:31
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Revert "[libvirt] Filter hypervisor_type by virt_type"  https://review.opendev.org/67272313:31
*** Luzi has quit IRC13:33
*** _hemna has quit IRC13:35
kashyapaspiers: Thanks :-)13:35
efriedmeeting, right, meeting.13:35
*** ricolin has quit IRC13:36
stephenfinefried: It's on my list. Got to get to the combined VCPU/PCPU instances spec first though13:42
* kashyap reads the interesting revert above13:42
openstackgerritMerged openstack/nova master: Remove super old unnecessary TODO from API start() method  https://review.opendev.org/67233013:43
openstackgerritMerged openstack/nova master: Completely remove fake_libvirt_utils.  https://review.opendev.org/64389713:43
*** jaosorior has quit IRC13:43
openstackgerritMerged openstack/nova master: Remove usused umask argument to virt.libvirt.utils.write_to_file  https://review.opendev.org/64508613:43
*** amodi has joined #openstack-nova13:45
efriedthanks stephenfin, specs are much more important rn.13:53
*** mvkr_ has joined #openstack-nova13:58
sean-k-mooneyefried: cob today is spec freeze right13:58
efriedsean-k-mooney: Yes. http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008019.html13:59
*** mkrai has quit IRC14:00
sean-k-mooneyyep read that yesterday but its been a busy few days14:00
dansmithsean-k-mooney: it's been a few days since yesterday?14:00
efriednova meeting now in -meeting14:00
openstackgerritMerged openstack/nova master: Revert "[libvirt] Filter hypervisor_type by virt_type"  https://review.opendev.org/67255914:01
sean-k-mooneyno, just in general; i was travelling from cashel to shannon after spending a few days at my mother's since her car broke down14:01
openstackgerritMerged openstack/nova master: Consts for need_healing  https://review.opendev.org/67228414:01
openstackgerritMerged openstack/nova master: Fix cleaning up console tokens  https://review.opendev.org/63771614:01
openstackgerritMerged openstack/nova master: Disambiguate logs in delete_allocation_for_instance  https://review.opendev.org/67186914:01
openstackgerritMerged openstack/nova master: Remove old TODO about forced_host policy check  https://review.opendev.org/66947414:01
dansmithsean-k-mooney: heh, it was just a funny mind-o highlighting the business (i.e. two days felt like three)14:01
openstackgerritMerged openstack/nova master: Avoid logging traceback when detach device not found  https://review.opendev.org/67164014:01
dansmither, busy-ness14:01
*** pots has joined #openstack-nova14:02
sean-k-mooneyha ya14:02
sean-k-mooneyspeaking of specs i agree that https://review.opendev.org/#/c/608696/ and https://review.opendev.org/#/c/602201/ are the closest of the remaining set14:05
*** luksky123 has quit IRC14:07
*** ttsiouts has quit IRC14:09
*** ttsiouts has joined #openstack-nova14:09
*** ttsiouts has quit IRC14:14
*** dpawlik has quit IRC14:28
*** ccamacho has joined #openstack-nova14:32
*** ttsiouts has joined #openstack-nova14:34
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Avoid crashing while getting libvirt capabilities with unknown arch names  https://review.opendev.org/67274614:35
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Revert "[libvirt] Filter hypervisor_type by virt_type"  https://review.opendev.org/67274714:35
artomDid something change with placement in functional tests recently? I swear my NUMA LM func test was getting to updating the XML last night, but this morning it's failing on placement not giving any allocations14:40
*** yikun has quit IRC14:40
*** jmlowe has quit IRC14:42
cdentartom you have an example of the query that's being made, that might clarify things14:42
cdentjust today some cache adjustments were merged. when did things last work?14:43
artomLast night (EDT)14:44
artom27.0.0.1 "GET /placement/allocation_candidates?limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A3" status: 200 len: 53 microversion: 1.3114:44
artomGot no allocation candidates from the Placement API. This could be due to insufficient resources or a temporary occurrence as compute nodes start up.14:44
* artom tries recreating the tox venv14:45
artom(Which I guess makes even less sense - if it was the same as before, what changed?)14:45
cdentartom: got a patch up so I can play with it myself?14:47
cdentwhen did compute status disabled support merge in nova?14:48
artomcdent, https://review.opendev.org/#/c/672595, but I had to rebase it locally, so hang on14:48
cdentaye aye14:48
mriedemcdent: last week i think14:49
mriedemor 2 weeks ago14:49
*** mlavalle has joined #openstack-nova14:50
*** lbragstad has joined #openstack-nova14:51
openstackgerritArtom Lifshitz proposed openstack/nova master: [WIP-until-series-is-ready] Introduce live_migration_claim()  https://review.opendev.org/63566914:51
openstackgerritArtom Lifshitz proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482714:51
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for sending NUMAMigrateData to the source  https://review.opendev.org/63482814:51
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source  https://review.opendev.org/63522914:51
openstackgerritArtom Lifshitz proposed openstack/nova master: RPC changes to prepare for NUMA live migration  https://review.opendev.org/63460514:51
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460614:51
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002114:51
openstackgerritArtom Lifshitz proposed openstack/nova master: [WIP] Functional tests for NUMA live migration  https://review.opendev.org/67259514:51
artom^^ There, fixed the merge conflicts14:51
artomFor all I know it's not even placement...14:52
cdentthanks artom will try it from my side of the world14:52
artomcdent, appreciated, much thanks :)14:52
*** ccamacho has quit IRC14:53
efriedartom: https://review.opendev.org/#/c/668752/14:55
efried~2w ago14:55
mriedemi'm pretty sure required=%21COMPUTE_STATUS_DISABLED is saying !disabled14:57
mriedemi.e. forbidden trait14:57
cdentyes14:57
artomYeah, that's !14:57
sean-k-mooneymriedem: yep it is14:59
sean-k-mooneyalthough ! does not technically have to be url encoded15:00
sean-k-mooneybut %21 is the encoding for !15:00
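The encoding sean-k-mooney describes can be reproduced with the standard library: `urlencode` percent-encodes `!` as `%21` and `:`/`,` as `%3A`/`%2C`, matching the shape of the GET query artom pasted. A sketch using the parameters from that request (the leading `!` marks COMPUTE_STATUS_DISABLED as a forbidden trait):

```python
from urllib.parse import unquote, urlencode

# Parameters taken from the allocation_candidates request in the log.
params = {
    'limit': 1000,
    'required': '!COMPUTE_STATUS_DISABLED',
    'resources': 'DISK_GB:20,MEMORY_MB:2048,VCPU:3',
}
query = urlencode(params)
print(query)           # '!' becomes %21, ':' becomes %3A, ',' becomes %2C
print(unquote(query))  # readable form with the original characters restored
```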
*** takashin has left #openstack-nova15:01
*** brault has joined #openstack-nova15:03
*** artom has quit IRC15:06
*** TxGirlGeek has joined #openstack-nova15:07
*** jmlowe has joined #openstack-nova15:07
*** mdbooth has quit IRC15:10
*** artom has joined #openstack-nova15:14
*** ratailor has joined #openstack-nova15:16
*** wwriverrat has joined #openstack-nova15:16
*** dklyle has quit IRC15:17
*** _erlon_ has joined #openstack-nova15:18
*** dklyle has joined #openstack-nova15:18
*** mkrai has joined #openstack-nova15:18
*** ricolin has joined #openstack-nova15:22
*** gryf has joined #openstack-nova15:27
efriedsean-k-mooney: A requirement for cycle-with-intermediary projects is a m-2 release. os-vif qualifies.15:28
efriedIt looks like there have been half a dozen or so commits since the last release, of which only one looks like it has any meat (https://review.opendev.org/#/c/658786/)15:28
efriedCan we do a release now?15:28
efriedjangutter: ^15:28
*** maciejjozefczyk has quit IRC15:32
sean-k-mooneyefried: actually we just need to have 1 release, we dont need one at each milestone15:33
sean-k-mooneythat was cycles-with-milestones15:33
sean-k-mooneybut ill check and get back to you later15:33
sean-k-mooneyjust on a meeting15:33
*** ricolin_ has joined #openstack-nova15:33
stephenfinefried, mriedem: Can I have pre-commit? Pretty please? https://review.opendev.org/#/c/665518/15:34
stephenfinI've to respin ~8 patches because I forgot to run pep8 :'(  https://review.opendev.org/#/c/671797/15:35
stephenfinNot that I'm going to bother yet. I'll fix it when I need to respin15:35
*** ricolin has quit IRC15:36
*** pchavva has joined #openstack-nova15:37
*** pchavva has left #openstack-nova15:37
jangutterefried: regarding the meat, I'm happy for it to get barbecued into a release. note that there has been some follow-on stuff (unmerged) that happened afterwards too.15:37
*** ricolin_ is now known as ricolin15:38
jangutterefried: specifically https://review.opendev.org/#/c/665965/15:39
cdentartom: I responded on your thing. I can see where things go wrong, but not why15:40
cdentwhat I mean is I can answer the "why" but not the "why of the why"15:40
jangutterefried: my view (stated in the os-vif review) is that I don't think the follow-on is necessary, but I don't feel strongly enough to oppose it.15:40
artomcdent, aha, that's already very helpful15:41
cdentartom: good. i'll be curious to here what the missing piece is15:41
cdentand hear too15:41
artomI know some stuff changed recently around fakelibvirt15:41
artomMaybe a side effect of that was changing the default flavor and/or compute disk size?15:42
mriedemstephenfin: i'll defer your pre-commit request to dansmith15:42
cdentsounds likely15:42
artomI shall dig, right after this meeting15:42
artomWhich is done, thank god15:42
stephenfinugh, but he's the worst15:42
mriedemi'm an old bugbear so i don't care about pre-commit15:42
stephenfinUm, I mean...the best <315:42
mriedemand will bitch to no end if it makes me do extra things15:42
*** ratailor has quit IRC15:43
artommriedem's artistic left brain half is actually a tox venv for running pep815:43
sean-k-mooneymriedem: well if you dont install it it wont do anything15:43
stephenfinIt should make you do less things, since you won't need to remember to run fast815:43
stephenfinbut it is different things15:43
stephenfin...unless we backport15:43
stephenfin...which I would be game to do15:44
*** gyee has joined #openstack-nova15:44
mriedemi would not15:44
*** avolkov has joined #openstack-nova15:45
sean-k-mooneyanyway the important thing about the pre-commit stuff is it should not impact anyone that does not want to use it15:45
*** cdent has left #openstack-nova15:45
sean-k-mooneythey can continue to use tox and the gate will continute to use tox15:45
*** cdent has joined #openstack-nova15:45
stephenfin'zactly. It's purely opt-in15:45
sean-k-mooneyfor those that want to use it they can install the tool and let it do its thing15:45
stephenfinsean-k-mooney: I added the tab to spaces converter thing too, if that sweetens the deal for you15:46
*** artom has quit IRC15:46
sean-k-mooney:)15:46
stephenfinsome pre-commit >>> no pre-commit15:46
sean-k-mooneyyes yes it does15:46
sean-k-mooneyi already likeed it however15:46
openstackgerritMatt Riedemann proposed openstack/nova master: api-ref: touch up the os-services docs  https://review.opendev.org/67257115:47
*** altlogbot_1 has quit IRC15:48
openstackgerritMatt Riedemann proposed openstack/nova master: api-ref: touch up the os-services docs  https://review.opendev.org/67257115:49
*** altlogbot_3 has joined #openstack-nova15:50
*** tesseract has quit IRC15:50
sean-k-mooneyefried: alex_xu stephenfin i have summarised where i stand on the mix cpu spec in my last top level comment https://review.opendev.org/#/c/668656/515:55
sean-k-mooneyif that seems fair i think we could proceed with it conditionally on that restricted timetable15:56
sean-k-mooneyotherwise i would move this to backlog/U15:56
jangutterstephenfin: tab to space is low hanging fruit. If you really want a holy war, enforce "one space after period."15:57
sean-k-mooneyjangutter: the tab to space thing is because we manually enforce no tabs15:57
janguttersean-k-mooney: yep, pep8 will just do a late fail for you.15:58
sean-k-mooneyso the pre-commit hook would detect it for you15:58
sean-k-mooneyjangutter: only on python 3 i think15:58
sean-k-mooneyon 2 i think it will allow it as long as you dont mix15:58
mriedemoooooo yeahhh https://www.youtube.com/watch?v=Lrle0x_DHBM15:59
openstackgerritSurya Seetharaman proposed openstack/nova master: API microversion 2.75: Add 'power-update' external event  https://review.opendev.org/64561115:59
*** ttsiouts has quit IRC16:00
*** shilpasd has quit IRC16:03
*** brault has quit IRC16:04
*** JamesBenson has joined #openstack-nova16:04
*** tssurya has quit IRC16:04
*** jmlowe has quit IRC16:04
*** brault has joined #openstack-nova16:05
*** mkrai has quit IRC16:05
efriedstephenfin: Isn't pre-commit something you can carry locally at will?16:07
stephenfinI'd need to add it to '.gitignore' but otherwise yes16:08
efriedI mean, given the level of meh about putting it in the codebase, that just semes like the better option, doesn't it?16:09
efriedIt would be, like, two of you using it?16:09
*** brault has quit IRC16:09
stephenfinPerhaps, but it just rubs me the wrong way if we're being honest16:11
stephenfinI'm not sure why we should be so averse to trying new things, especially when those things are opt-in and don't affect the end product in any way16:11
Nick_Awhat is the correct way to enable maintenance mode on a hypervisor to prevent new instances from spawning on it? https://docs.openstack.org/python-openstackclient/rocky/cli/command-objects/host.html doesn't seem to work - "Not Implemented" error16:11
*** artom has joined #openstack-nova16:12
Nick_ANever mind - we found it16:12
artomcdent, https://review.opendev.org/#/c/644793/12/nova/tests/functional/libvirt/test_numa_servers.py culprit found16:12
cdentwoot16:12
artomcdent, thanks again for your prompt help!16:12
cdentyou're welcome. was there a line number associated with that link?16:13
cdentor just the general concept16:13
artomcdent, in general - more specifically, it removed a monkey patch in setup: https://review.opendev.org/#/c/644793/12/nova/tests/functional/libvirt/base.py16:14
artomSo then had to mock a bunch of stuff for each test16:14
cdentah ha16:14
artomWhich I wasn't mocking, as I wrote the test before that landed16:14
artomAnd thus was relying on the setUp monkeypatch, which got pulled from under me16:15
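The failure mode artom describes — a test silently relying on a patch applied in a shared `setUp()`, which breaks the moment that patch is removed upstream — can be sketched in miniature. All names here are hypothetical, invented only to illustrate the pattern:

```python
import unittest
from unittest import mock


class Placement:
    """Stand-in for the API the removed monkeypatch used to fake."""
    def get_allocations(self):
        raise RuntimeError('would talk to real placement')


placement = Placement()


class Base(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # The shared patch that later got removed upstream:
        patcher = mock.patch.object(placement, 'get_allocations',
                                    return_value={'VCPU': 3})
        patcher.start()
        self.addCleanup(patcher.stop)


class TestServer(Base):
    def test_allocations(self):
        # Passes only while Base.setUp() installs the patch; once the
        # patch is pulled out of setUp(), this raises instead.
        self.assertEqual({'VCPU': 3}, placement.get_allocations())


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestServer))
print(result.wasSuccessful())
```

The fix, as artom did, is for each test to declare the mocks it needs rather than inherit them implicitly.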
cdentyowsa16:15
artomTBH, the commit message doesn't do a great job of explaining *why* it was necessary to remove that monkeypatch16:15
stephenfinidk. I do a lot of reviews. I write some code. I'm a decent community member, in general. Why do I have to pull teeth to get something in that I'm saying helps my productivity and doesn't hamper anyone else. It's frustrating.16:15
artomBut I assume there's a larger context that I'm not aware of16:16
efriedstephenfin: +2 on the basis of you really wanting it.16:16
stephenfinCheers16:16
efriedstephenfin: but to answer your question, because it's trivial for you to do it locally without "polluting" the nova codebase with something that's irrelevant to nova.16:17
cdentstephenfin++16:17
stephenfinMy counterpoint to that is that I've done, and continue to do, a lot of unpolluting16:17
efriedbe like me putting a pycharm config16:17
artomSounds like we need the equivalent of a carbon tax16:18
efriedbig difference between obsolete nova code and something that was never relevant to nova.16:18
*** brinzhang_ has quit IRC16:18
*** brinzhang has quit IRC16:18
stephenfinrelevant to nova developers though16:19
stephenfinwho are as important, if not more important, than the code16:20
efriedas relevant to any project's developers, nah?16:20
efriedare you going on a crusade to propose this same thing to all projects you work on?16:20
artomcdent, woot, I'm back to last night's failure16:21
* artom starts tacking fakelibvirt's broken getXML()16:21
cdentsomething reasonable now16:21
artom*tackling16:21
stephenfinI'll probably add it to one or two of my personal projects, maybe Sphinx too, but I wouldn't be touching oslo and the likes, no16:21
kashyapartom: Hehe, one letter changes the meaning, doesn't it :D16:21
stephenfinBasically anywhere where I'm likely to be undertaking large feature work consisting of many patches16:21
artomkashyap, at least I wasn't tickling it16:22
kashyapmriedem: Try this, not sure if that's your glass of (root?) beer -- https://www.youtube.com/watch?v=jBo870lVUyc16:23
kashyap[Preferably with a good quality headset / speaker]16:23
cdentoh. that's nice.16:27
*** lpetrut has quit IRC16:29
kashyapJimmy Smith++16:29
*** rpittau is now known as rpittau|afk16:36
*** ricolin has quit IRC16:39
*** cdent has quit IRC16:39
*** igordc has quit IRC16:58
*** igordc has joined #openstack-nova16:58
*** ivve has quit IRC17:01
mriedemwhat in tarnations, created devstack from master today, create a server, n-cpu logs say the guest was created in the hypervisor, and then things just hang - and virsh list doesn't show anything17:02
mriedemwtf17:02
sean-k-mooneymriedem: im guessing libvirt crashed17:05
sean-k-mooneyeither that or you need to run virsh list with either sudo or --all17:05
sean-k-mooneyactually if it hung then ignore the last bit17:06
mriedemoh right sudo virsh list17:06
mriedemlibvirtd is green17:06
mriedemthe domain is just hung in paused state17:06
sean-k-mooneythe domain or the nova compute agent17:07
mriedemthe domain17:07
mriedem$ sudo virsh list --all17:07
mriedem Id    Name                           State17:07
mriedem----------------------------------------------------17:07
mriedem 3     instance-00000003              paused17:07
sean-k-mooneyand in nova its active17:08
mriedemno17:08
mriedemit's building b/c the libvirt driver is waiting for the power state to change from paused to running17:08
sean-k-mooney oh, and nova already told it to unpause? we start the domain in the paused state.  i wonder if the qemu monitor has hung17:09
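The "building" state mriedem sees matches what sean-k-mooney describes: the libvirt driver waits for the domain's power state to leave paused. A simplified, hypothetical sketch of such a polling loop — not the actual nova code, just the shape of the wait:

```python
import time

PAUSED, RUNNING = 'paused', 'running'


def wait_for_running(get_power_state, timeout=5.0, interval=0.05):
    """Poll until the domain leaves PAUSED or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_power_state() == RUNNING:
            return True
        time.sleep(interval)
    return False  # stayed paused, like the hung guest in the log


# A guest that unpauses on the third poll:
states = iter([PAUSED, PAUSED, RUNNING])
print(wait_for_running(lambda: next(states)))
```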
*** jmlowe has joined #openstack-nova17:13
mriedemfun17:14
mriedemJul 25 17:13:36 devstack libvirt-guests.sh[18879]: Timeout expired while shutting down domains17:14
mriedemJul 25 17:13:36 devstack systemd[1]: libvirt-guests.service: Control process exited, code=exited status=117:14
mriedemtrying to restart libvirt-guests17:14
sean-k-mooneyis this a clean install of  ubuntu 18.04?17:15
mriedemwell from a vexxhost image of 18.04 but yeah17:15
sean-k-mooneystrange, i personally havent had any issue with 18.04; i did an install friday17:16
mriedemme neither17:16
sean-k-mooneyare the vexhost image available for download17:17
mriedemidk, i'm trashing this vm17:18
*** jaypipes has quit IRC17:18
mnaserthe vexxhost images are straight up the ones shipped by ubuntu17:19
sean-k-mooneyya i would just start over too to be honest. i suspect its something to do with libvirt/qemu or maybe apparmor but i would start clean17:19
sean-k-mooneymnaser: the cloud images17:19
mnaseryep17:19
mnaseronly thing we do is convert from qemu to raw17:19
mnaserand upload17:19
sean-k-mooneyim guessing ye are using ceph as a backend then17:19
*** betherly has joined #openstack-nova17:19
mnaserindeed :)17:20
kashyapOn Fedora, the 'libvirt-guests' thing isn't even enabled:17:20
kashyap$> systemctl status libvirt-guests17:20
kashyap● libvirt-guests.service - Suspend/Resume Running libvirt Guests17:20
kashyap   Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; disabled; vendor preset: disabled)17:20
kashyap   Active: inactive (dead)17:20
kashyap...17:20
kashyapBut yeah, that timeout of 'libvirt-guests' looks spurious enough, might as well start over.17:20
sean-k-mooneykashyap: would that not cause filesystem corruption if you did not suspend them on rebooting the host17:21
kashyap(Also, not sure if that paused instance's QEMU process went 'defunct')17:21
efriedfollowing up re os-vif and python-novaclient releases: Libs are required to do one release per milestone. os-vif was last released at m1, so we can expect the release team to propose that one. python-novaclient was released a couple weeks ago, so we're probably good on that one.17:21
sean-k-mooneykashyap: im guessing the qemu monitor process stopped processing messages from libvirt17:21
sean-k-mooneyefried: ok there is one think i would like to fix soonish but im only starting on it today17:22
kashyapYeah, but that doesn't tell us why.  It could be any no. of reasons17:22
sean-k-mooneykashyap: yep its probably quicker to kill it and spin up a clean vm17:23
sean-k-mooneyif mriedem hits it again we can take another look17:23
kashyapsean-k-mooney: Yeah, on FS corruption, possibly "enterprise distros" would enable it17:24
*** ralonsoh has quit IRC17:24
*** betherly has quit IRC17:24
sean-k-mooneyit looks like ubuntu just enables it by default to be safe17:24
*** JamesBenson has quit IRC17:25
*** igordc has quit IRC17:28
*** vishwanathj has quit IRC17:30
kashyapsean-k-mooney: RHEL doesn't either, BTW.  And one can configure what action 'libvirt-guests' can take on host shutdown17:31
sean-k-mooneyok17:31
sean-k-mooneywell that is not related to the issue mriedem was having17:32
sean-k-mooneythe issue he was having was that the vm hung17:32
sean-k-mooneyand then the libvirt-guests script also hung on shutdown for the same reason17:32
sean-k-mooneythere is a timeout in the service file; if i remember correctly it waits for up to 2 minutes17:33
sean-k-mooneyand it continue with the system shutdown if it takes longer then that17:33
*** udesale has quit IRC17:34
kashyapI wasn't saying it is related.  On your FS corruption: no, it is admin / higher-level tool's responsibility to ensure your guests will quiesce its FS.17:34
* kashyap --> needs to run shortly17:34
*** vishwanathj has joined #openstack-nova17:34
kashyap(And yes, there is a timeout: check SHUTDOWN_TIMEOUT in /etc/sysconfig/libvirt-guests)17:35
sean-k-mooneyack17:35
kashyapDefault is 5 minutes.17:35
*** mvkr_ has quit IRC17:35
sean-k-mooneyya i have seen it in the console output when i have rebooted systems in the past; i just noticed it had one but never really looked that closely17:35
*** marta_lais has quit IRC17:38
melwittdansmith, mriedem: would like to have your review on a change to remove the "last context manager" from the CellDatabases fixture https://review.opendev.org/672604. this came up again while I was working on adding a func test to Kevin_Zheng's multi-cell nova-manage db archive_deleted_rows patch https://review.opendev.org/507486, which has been of high priority interest downstream lately17:39
openstackgerritEric Fried proposed openstack/nova master: WIP: Process [compute] in $NOVA_CPU_CONF in nova-next  https://review.opendev.org/67280017:39
sean-k-mooneymelwitt: by the way has anyone reviewed the unified limits spec for the api subteam? not sure who that would be17:42
efriedsean-k-mooney: Isn't gmann "the api subteam"?17:44
sean-k-mooneyefried: i guess so? i wasnt sure who was on it. but i didnt want to see that spec slip through the cracks if they were not around to review it17:45
efriedAgree.17:46
melwittsean-k-mooney: no, not yet. people I usually ask about api stuff are alex_xu, gmann17:46
efriedI think I added gmann to that spec for that reason, but not sure if he looked.17:46
sean-k-mooneysimilarly with the image encryption spec.17:46
efriedI'm sort of delegating, like "encouraging" the spec owners to track down whoever is needed.17:46
sean-k-mooneyoh the provider yaml spec merged17:48
sean-k-mooneycool i should read the final version17:48
sean-k-mooneyjohnthetubaguy: melwitt: if ye feel like reviewing a spec that is close https://review.opendev.org/#/c/608696/ im happy to do a little reaching out on behalf of the spec owner :)17:49
*** psachin has quit IRC17:50
*** dklyle has quit IRC18:11
*** dklyle has joined #openstack-nova18:12
*** priteau has quit IRC18:13
dansmithmelwitt: okay, I'm generally pretty wary of changing that stuff (or even trying to load enough context to review that). I'm not sure I'll get to that point before I leave tomorrow, but...ack :)18:14
openstackgerritMerged openstack/nova master: Remove 'nova.virt.driver.ComputeDriver.estimate_instance_overhead'  https://review.opendev.org/67210618:16
melwittdansmith: ok, thanks for letting me know. it was tough for me to load the context myself, so I understand. I wanted to ideally have you review since the patch involves ripping out the stuff that you had to add with the _cell_lock18:18
melwittI think it makes the fixture much simpler, but definitely want to run it by you in case I missed something18:20
dansmithyeah I'm just afraid of it breaking something subtle which we don't find for a couple months and then think we need to fix it by changing the real code when in fact the fixture is too relaxed or something18:22
dansmithbut that's just because of how hard it was to get it right in the first place, of course18:22
*** igordc has joined #openstack-nova18:22
*** brault has joined #openstack-nova18:23
melwittoh, you mean something to do with racing tests appearing like real bugs when there's really just an issue with the fixture? yeah, I can understand that concern. as far as I can tell, my proposed patch removes all changing of global state, so I'd think there won't be an issue. but those are famous last words, I know18:27
*** brault has quit IRC18:27
*** brault has joined #openstack-nova18:27
melwittif we're too afraid to change the fixture, then we will hopefully be able to accept the multi-cell nova-manage patch's func test not being full coverage because of the faking that the CellDatabases fixture does. my primary objective is to get the multi-cell nova-manage db archive_deleted_rows done18:29
*** JamesBenson has joined #openstack-nova18:29
dansmithokay, I'm not sure why that is harder than other cell iteration things we do in tests, but I'd be much more inclined to accept more mockery  (since that's really a trivial operation) vs. blocking that on rearchitecting the fixture. But, I haven't looked enough into why that's a problem to say really18:31
melwittand while working on that, its func test was not failing when it should have been (bug in a patchset), and I found it wasn't failing properly because of the "last context manager" faking in the fixture18:31
dansmithmaybe I can try to do that before tomorrow at least18:31
melwittdansmith: tl;dr is the func test is written correctly and is good, but it did _not_ catch a bug in the proposed multi-cell archive impl because of the faking in the fixture. the fixture auto-targets untargeted database access to the last targeted database or the default database. the former hid the bug in the impl because the fixture auto-targeted something that was not targeted in real life and needed to be targeted in real life18:34
melwittI hope that makes sense18:34
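(Editor's note: a minimal illustration of the "last targeted context manager" fallback melwitt describes — names here are hypothetical stand-ins, not the real nova CellDatabases fixture.)

```python
# Hypothetical sketch of the auto-targeting behavior discussed above:
# untargeted db access silently falls back to the *last targeted* db,
# which can hide a missing target_cell() call in the code under test.

class FakeCellDatabases:
    def __init__(self, default_db):
        self._default_db = default_db
        self._last_ctxt_mgr = None  # global state the fixture mutates

    def target_cell(self, cell_db):
        # Targeted access: remember this cell as "last targeted".
        self._last_ctxt_mgr = cell_db
        return cell_db

    def get_db(self):
        # Untargeted access: fall back to the last targeted db if any,
        # otherwise to the default db (the only behavior that matches
        # real deployments).
        if self._last_ctxt_mgr is not None:
            return self._last_ctxt_mgr
        return self._default_db


fix = FakeCellDatabases(default_db='cell0')
fix.target_cell('cell1')
# Code under test forgot to target a cell, but the fixture silently
# routes it to cell1, so the bug is not caught:
assert fix.get_db() == 'cell1'
```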
mriedemi'd also rather figure out why the func tests on the archive patch don't work rather than block on redoing the fixture, but i don't know what the issues were,18:35
mriedemhaving said that,18:35
*** ivve has joined #openstack-nova18:35
mriedemi have a func test in my cross-cell resize series that does db archive on all 3 cells in the test https://review.opendev.org/#/c/651650/22/nova/tests/functional/test_cross_cell_migrate.py18:35
dansmithbut the real code sends untargeted stuff to the default db in the config, which is why the fixture does18:35
melwittmriedem: I did figure it out, it was because of the auto-targeting by "last targeted database" if untargeted18:35
dansmithmriedem: yeah, even still, I'd be happy with just a unit test to make sure that archive is calling archive on all the cell mappings.. it18:36
dansmithis such a trivial op I don't really know that we need much more than that,18:36
mriedemok i guess i mean "i'd rather figure out an easier way to make the new tests work with the existing fixture"18:36
melwittdansmith: but the fixture also sends untargeted stuff to the "last targeted db" first. I think it should only send untargeted stuff to the default db18:36
dansmithand we run archive in tempest jobs, which should hit cell0 and cell1 if we make it run all cells18:36
mriedemalso,18:36
melwitt(which is what my patch is doing)18:36
mriedemi was going to say - nova-next runs archive_deleted_rows, we can and should make it run on all cells18:36
mriedemwhich will hit cell0 and cell1 as dansmith said18:36
mriedemso....then we'd have real integration test coverage18:37
dansmithmelwitt: everything in the compute node is untargeted though, which is why I don't really know how you can change that globally and have it match the real world18:37
dansmithbut.. I haven't read it so I dunno18:37
mriedemof the cli, which is better than the functional stuff anyway18:37
dansmithmriedem: agreed18:37
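(Editor's note: dansmith's point — that archiving across cells is just "call archive once per cell mapping" — can be sketched roughly as below; the names are simplified stand-ins, not nova's real API.)

```python
# Rough sketch of archiving deleted rows across all cells. In real
# nova the loop body would be wrapped in context.target_cell(); here
# archive_fn is a plain callable so the shape is easy to unit test.

def archive_all_cells(cell_mappings, archive_fn):
    """Run the archive operation once per cell, return per-cell results."""
    results = {}
    for cell in cell_mappings:
        results[cell] = archive_fn(cell)
    return results


counts = archive_all_cells(
    ['cell0', 'cell1'],
    lambda cell: {'instances': 0},  # fake per-cell archive result
)
assert set(counts) == {'cell0', 'cell1'}
```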
*** brinzhang has joined #openstack-nova18:37
*** brinzhang_ has joined #openstack-nova18:37
melwittdansmith: I think maybe things are getting confused. in the fixture we have two ways of targeting untargeted stuff. one is sending it to the default db (good) and one is sending it to the last db that was targeted (bad IMHO)18:38
*** mrch_ has joined #openstack-nova18:38
mriedemi have run afoul of that latter behavior18:38
dansmithI understand18:38
mriedemagree it's not fun18:38
mriedembut i think i've found ways around that in my multi-cell func testing18:38
melwittso y'all actually like the last targeted db thing? I'm open to that, just didn't think anyone would think they wanted to keep it18:39
mriedemthis one https://review.opendev.org/#/c/641179/18:39
*** eharney has quit IRC18:39
mriedemi remember i was hitting weirdness because of that 'last targeted context' thing as well18:39
mriedemi'm not saying i like it18:39
mriedembut i also don't like redoing the whole thing per se18:39
mriedemwhen there are maybe other ways18:39
melwittI'm +1 on the real integration testing, that's fine by me. but I just thought it would make the fixture a lot simpler and less big hiding to remove that bit about "last targeted db"18:40
mriedemit's like touching the old quotas code - i can, but don't want to if i can help it18:40
mriedemw/o looking deep into your change idk18:40
mriedemi wouldn't abandon it,18:40
mriedembut i wouldn't block the other thing on it either18:40
mriedemi'd get the integration testing in nova-next working18:40
*** betherly has joined #openstack-nova18:41
melwittthat's fair. it's less like redoing and more like "delete all the self._last_ctxt_mgr" but yeah, when you get around to it, take a look and see if you hate it18:41
mriedemwhich should be like, 1 line18:41
melwittdon't get me wrong, I'm totally fine with that. as long as it gets tested, I'm happy. I was honed in on trying to make the func test work 100%18:42
*** brinzhang_ has quit IRC18:42
*** brinzhang has quit IRC18:42
*** brinzhang has joined #openstack-nova18:43
*** brinzhang_ has joined #openstack-nova18:43
melwittand thought people might be happy to see all the _last_ctxt_mgr stuff deleted from the fixture, no more global state changing, much simpler18:43
melwittI'll rebase the multi-cell archive patch on top of a different change to add --all-cells to nova-next18:45
*** betherly has quit IRC18:45
*** lbragstad has quit IRC18:51
*** mriedem has quit IRC18:54
*** BjoernT has joined #openstack-nova18:59
*** betherly has joined #openstack-nova19:01
*** mriedem has joined #openstack-nova19:03
mriedemefried: ha http://lists.openstack.org/pipermail/openstack-discuss/2019-July/008037.html19:04
*** betherly has quit IRC19:05
*** igordc has quit IRC19:09
*** xek_ has joined #openstack-nova19:15
*** jaypipes has joined #openstack-nova19:16
*** xek has quit IRC19:17
*** TxGirlGeek has quit IRC19:21
openstackgerritMerged openstack/nova master: api-ref: touch up the os-services docs  https://review.opendev.org/67257119:25
*** igordc has joined #openstack-nova19:25
*** igordc has quit IRC19:32
artomThat actually went pretty well.19:33
* artom has minimal scaffolding in place to pass the NUMA LM func test19:34
*** bbowen has quit IRC19:41
*** vishwanathj has quit IRC19:45
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Support delete_on_termination in volume attach api  https://review.opendev.org/61294919:45
mriedemi cleaned this up ^ it's pretty straight-forward19:45
*** dasp has quit IRC19:51
*** dasp has joined #openstack-nova19:54
efriedmriedem: approved that. Seems like an easy win.20:02
*** betherly has joined #openstack-nova20:02
mriedemack, i'm cleaning up https://review.opendev.org/#/c/667894/ now20:03
efriedmriedem: make all the names match up too if you please20:04
efriedmriedem: the bp is at https://blueprints.launchpad.net/nova/+spec/add-user-id-field-to-the-migrations-table20:05
efried(so the path at the top of the spec is right, but the file path, commit message, and topic are wrong)20:06
*** betherly has quit IRC20:07
*** igordc has joined #openstack-nova20:08
openstackgerritMerged openstack/nova-specs master: Support delete_on_termination in volume attach api  https://review.opendev.org/61294920:09
efriedsean-k-mooney, stephenfin: Did we ever figure out whether/how you could tell from within a guest which / how many CPUs are pinned? Was that going to be via "metadata"?20:10
sean-k-mooneyefried: that is what the latest spec says, yes20:10
efriedokay, I was about to dig into it. Unfortunately, I don't see me being competent to update it such as to get it approved today.20:11
sean-k-mooneyhttps://review.opendev.org/#/c/668656/5/specs/train/approved/use-pcpu-vcpu-in-one-instance.rst@17320:12
sean-k-mooneyThe metadata API will be extended to dedicated cpu info with new version.20:12
sean-k-mooneyThe new field will be added to the `meta_data.json`::20:12
sean-k-mooney    dedicated_cpu=<cpuset string>20:12
sean-k-mooneyThe ``cpuset string`` indicated the instance cpus which are running on20:12
efriedah nice20:12
sean-k-mooneydedicated pCPU.20:12
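(Editor's note: the `<cpuset string>` in the pasted spec draft uses the usual kernel/libvirt cpuset list syntax, e.g. `0-3,8,10-11`; a guest could parse the proposed `dedicated_cpu` field along these lines. The field name comes from the spec draft quoted above and may change.)

```python
def parse_cpuset(spec):
    """Parse a cpuset string like '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


assert parse_cpuset('0-3,8,10-11') == {0, 1, 2, 3, 8, 10, 11}
```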
sean-k-mooneyyou could also use that new NUMA topology API if we approved that i guess20:13
sean-k-mooneyalthough that is not really from within the guest20:13
efriedI would want stephenfin to be a +2 on that spec anyway, so I guess we'll see if alex_xu et al can polish it up and request sfe, since we seem to have general agreement with caveats as noted.20:14
efriedI'll sent a note on sfe process (which apparently I'll be making up) either tomorrow or Monday.20:15
sean-k-mooneydoes the timeline i set out make sense to you20:15
efriedoh, totally.20:15
*** gyee has quit IRC20:22
*** wwriverrat has quit IRC20:26
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations  https://review.opendev.org/66789420:28
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations  https://review.opendev.org/66789420:29
mriedemfirst is the diff, second is the file rename20:29
mriedemonly open question on ^ is if we should add user_id/project_id as request filter params to GET /os-migrations20:31
mriedemi think we might as well20:31
*** zbr_ has quit IRC20:35
efriedsean-k-mooney: please vet that I represented you correctly in the whiteboard https://blueprints.launchpad.net/nova/+spec/use-pcpu-and-vcpu-in-one-instance20:35
sean-k-mooneysure20:37
* sean-k-mooney clicks20:37
*** zbr has joined #openstack-nova20:37
sean-k-mooneyyep20:38
efriedthx20:38
sean-k-mooneywithout all the typos in my original comment :)20:38
gmannefried: melwitt sean-k-mooney  ack I will check the spec. it was in my list but did not get time to review.20:38
efriedthanks gmann20:38
melwittcool gmann20:39
efriedmriedem: where are request filter params for GET /os-migrations documented in the api-ref?20:41
efriedoh, are they the top-level fields hidden, host, instance_uuid etc?20:42
efrieduck20:42
efriedmriedem: but yeah, I think it makes sense to add this in there.20:43
mriedemyeah20:43
mriedemi frequently filter on migration_type and instance_uuid in functional tests since GET /servers/{server_id}/migrations is hard-coded to only be in-progress live migrations20:44
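(Editor's note: the filters mriedem mentions go in the query string of GET /os-migrations; a sketch of building such a request path — the UUID value is made up.)

```python
from urllib.parse import urlencode

# Build a filtered os-migrations request path. The filter parameter
# names (migration_type, instance_uuid) are the ones discussed above;
# the UUID here is an illustrative placeholder.
params = {
    'migration_type': 'live-migration',
    'instance_uuid': '11111111-2222-3333-4444-555555555555',
}
path = '/os-migrations?' + urlencode(params)
print(path)
```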
*** boxiang_ has quit IRC20:48
*** boxiang_ has joined #openstack-nova20:48
*** mriedem has quit IRC20:52
*** mriedem has joined #openstack-nova20:53
efriedmriedem: chuck that filter field in there and I'm +220:54
*** BjoernT has quit IRC20:55
mriedemalright then20:55
efriednot sure if you're still comfortable being the other +220:55
mriedemo20:55
mriedem*i'm like a 1.520:55
efriedsean-k-mooney: want to throw a +1 on there to push us over the edge?20:56
efried...once mriedem has updated20:56
sean-k-mooneyefried: link?20:56
efriedhttps://review.opendev.org/#/c/667894/ -- stand by for PS420:56
sean-k-mooneyoh i have looked at that before sure ill re review20:57
*** gyee has joined #openstack-nova20:59
*** bbowen has joined #openstack-nova21:02
*** betherly has joined #openstack-nova21:03
mriedembuild specs docs takes awhile21:05
mriedem*building21:05
openstackgerritArtom Lifshitz proposed openstack/nova master: [WIP-until-series-is-ready] Introduce live_migration_claim()  https://review.opendev.org/63566921:05
sean-k-mooneyim reviewing v3 in the mean time and then ill revew the delta21:05
openstackgerritArtom Lifshitz proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482721:05
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for sending NUMAMigrateData to the source  https://review.opendev.org/63482821:05
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source  https://review.opendev.org/63522921:05
openstackgerritArtom Lifshitz proposed openstack/nova master: RPC changes to prepare for NUMA live migration  https://review.opendev.org/63460521:05
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460621:05
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002121:05
openstackgerritArtom Lifshitz proposed openstack/nova master: [WIP] Functional test for NUMA live migration  https://review.opendev.org/67259521:05
efriedmriedem: I use this: http://paste.openstack.org/raw/754874/21:06
*** betherly has quit IRC21:08
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Add user_id to the migrations  https://review.opendev.org/66789421:08
efriedoh, I retract my 'uck' from earlier. Didn't pick up that these were in the querystring. That's dandy.21:10
*** zbr has quit IRC21:11
sean-k-mooneyya that is how we normally do filtering21:11
mriedemyeah i wasn't sure why you were ucking21:12
sean-k-mooneyefried: has intel not given you a server with 70 billion cores to run cirros vms on yet21:12
efriedpshht, what do you think?21:13
sean-k-mooneythey are great for building docs or running unit tests21:13
sean-k-mooneyi had 3 racks worth of servers in my name when i left so yes?21:13
efriedeven if I had 70 billion cores, I would still want to build only the spec I modified.21:13
sean-k-mooneyyou could add that script to the tools directory and add it as a tox target21:14
mriedemeric is in HMC withdrawals21:14
sean-k-mooneylike fast821:14
efriedthat's a good idea.21:15
*** slaweq has quit IRC21:15
efriedwhere do those tools live?21:16
mriedemthey are just scripts in the repo21:17
sean-k-mooneyhere https://github.com/openstack/nova-specs/tree/master/tools21:17
mriedemthe tox target calls them and passes through the args21:17
efriedoh, I thought there was a central repo21:17
sean-k-mooneyno21:17
sean-k-mooneywe copy-paste all the things21:17
sean-k-mooneythere is a cookiecutter template somewhere21:18
mriedemoslo-specs-incubator duh21:18
sean-k-mooneybut this would likely only be for tests although i guess you could use it in nova21:18
sean-k-mooneyi still find the oslo incubator graduation script to be magical21:19
sean-k-mooneywe should have used it for placement extraction actually, but hindsight21:19
mriedemi was joking21:20
mriedemi just like to make oslo-incubator jokes to feel worldly21:20
mriedemfeel like a BIG MAN21:20
sean-k-mooneysure but i really liked this script https://github.com/openstack/oslo-incubator/blob/stable/kilo/tools/filter_git_history.sh21:20
sean-k-mooneyi also really hated it after the 4th or 5th time i imported changes from neutron into networking-ovs-dpdk21:22
sean-k-mooneybut it was nice to be able to maintain the history21:22
openstackgerritEric Fried proposed openstack/nova master: Process [compute] in $NOVA_CPU_CONF in nova-next  https://review.opendev.org/67280021:24
efriedmriedem: Should nova-cpu.conf need [[api_]database]/connection ?21:26
sean-k-mooneyno...21:27
*** zbr has joined #openstack-nova21:27
sean-k-mooneyi dont think it should but mriedem or dansmith would know21:27
mriedemefried: no21:28
*** whoami-rajat has quit IRC21:28
*** pcaruana has quit IRC21:28
mriedemwhether or not the cell conductor service needs to hit the api db depends on if you're allowing up-calls21:29
mriedemwhich most people are21:29
mriedembecause we haven't closed all of those gaps21:29
mriedemhttps://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls21:29
*** ivve has quit IRC21:30
sean-k-mooneyby the way i was talking to slaweq this morning about https://bugs.launchpad.net/neutron/+bug/1836642 and im pretty sure we root-caused this to the fact that nova is not configured to use memcache in the gate and we are falling back to using the dogpile.null cache implementation21:30
openstackLaunchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)21:30
efriedred herring21:31
efried(db conns from compute)21:31
mriedemsean-k-mooney: coincidentally i just saw he added me to https://review.opendev.org/#/c/672715/121:32
sean-k-mooneywe are hitting ^ downstream too so in at least the downstream case i dont think its a duplicate of https://bugs.launchpad.net/openstack-gate/+bug/180801021:32
openstackLaunchpad bug 1808010 in OpenStack Compute (nova) "Tempest cirros ssh setup fails due to lack of disk space causing config-drive setup to fail forcing fallback to metadata server which fails due to hitting 10 second timeout." [Medium,Confirmed]21:32
*** zbr has quit IRC21:32
sean-k-mooneymriedem: oh cool ya just wanted to give ye an fyi as he said he was working on a fix in tempest/devstack21:33
*** panda has quit IRC21:34
*** panda has joined #openstack-nova21:34
*** brault has quit IRC21:36
*** JamesBenson has quit IRC21:57
*** JamesBenson has joined #openstack-nova22:00
*** TxGirlGeek has joined #openstack-nova22:01
*** JamesBenson has quit IRC22:04
*** slaweq has joined #openstack-nova22:11
*** eandersson has joined #openstack-nova22:15
openstackgerritmelanie witt proposed openstack/nova stable/stein: Avoid logging traceback when detach device not found  https://review.opendev.org/67283322:15
*** slaweq has quit IRC22:16
*** betherly has joined #openstack-nova22:16
*** rcernin has joined #openstack-nova22:16
*** betherly has quit IRC22:21
openstackgerritsean mooney proposed openstack/os-vif master: only disable mac ageing for ovs hybrid plug  https://review.opendev.org/67283422:22
eanderssonAnyone seen VMs failed scheduling getting stuck in building/scheduling indefinitely?22:28
eandersson> There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for22:29
eandersson> MaxRetriesExceeded: Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance22:29
eanderssonHit a scheduling race condition which is fine, but then just got stuck (never into an error'd state)22:29
sean-k-mooneyit should not get stuck indefinitely, it should go to an error state22:29
eanderssonYea - exactly22:29
eanderssonWe have seen this a few times when something like Senlin is aggressively scaling up22:30
eanderssonAnd of course Senlin is unhappy because it just sits there waiting for it to go into ACTIVE (or ERROR'd) state.22:31
sean-k-mooneyif you are using any numa related resources like hugepages or cpu pinning, or if you are using sriov/pci passthrough, these are not tracked in placement, so if you have more than 1 scheduler with more than 1 worker they can race22:31
eanderssonYea - that is exactly it.22:31
eanderssonWe are fine with it failing due to the race condition.22:32
sean-k-mooneyif we make it to 2020 then this should be fixed in U when all that stuff is in placement. but back to your current problem, are there any errors in the conductor that could indicate why the instance was not put into error state22:33
eanderssonNope just the above errors.22:33
eanderssonStarts with the expected22:34
eandersson> Free vcpu 0.00 VCPU < requested 20 VCPU22:34
sean-k-mooneywhat release are you running?22:34
sean-k-mooneythis is where that error is being raised by the way https://github.com/openstack/nova/blob/3370f0f03ce17aaf3a7ebaa95d497f62bef238c0/nova/conductor/manager.py#L63022:35
eanderssonRocky22:35
sean-k-mooneyhave you disabled the core,ram and disk filters22:35
eanderssonWe have not22:36
sean-k-mooneyhttp://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html22:36
sean-k-mooneythey have been deprecated and should not be enabled after ocata22:36
*** ccstone has joined #openstack-nova22:36
sean-k-mooneywe stopped reporting the info it used, i think in rocky or stein22:37
sean-k-mooneyso it might not have been a race, we could have eliminated all the hosts because the filter did not work22:38
eanderssonI feel like that would have been a more visible problem though22:40
eanderssonWe are seeing this happen in <0.1% of cases22:40
sean-k-mooneyya i think if we got to this part of the code its not the filters22:41
sean-k-mooneyyou should turn them off however as they will break when you upgrade to stein22:41
sean-k-mooneyso after we log that message we should raise here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L60122:42
sean-k-mooneyand end up just below it here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L61522:42
sean-k-mooneyat which point we should notify that the vm state should be error22:43
sean-k-mooneyhttps://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/utils.py#L573 is where the exception is being logged as a warning22:47
sean-k-mooneyand we do an instance.save right after it22:47
sean-k-mooneythat save should have updated the instance to error22:48
eanderssonDoes overcommit no longer work in aggregates?22:48
sean-k-mooneywe broke that22:49
sean-k-mooneyin ocata22:49
sean-k-mooneyfrom ocata on you need to set the overcommit per node22:51
sean-k-mooneyin stein we implemented https://github.com/openstack/nova-specs/blob/master/specs/stein/implemented/initial-allocation-ratios.rst22:51
sean-k-mooneywhich allows you to manage allocation ratios directly via placement22:51
sean-k-mooneyand only specify an initial option in the config22:52
sean-k-mooneyeandersson: but yes that is why melwitt sent http://lists.openstack.org/pipermail/openstack-dev/2018-January/126283.html and why we deprecated the core, ram and disk filters and their Aggregate* counterparts22:53
eanderssonWe might have placement slightly misconfigured22:56
mriedemhttps://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#allocation-ratios on the scheduler allocation ratio + placement config stuff22:57
mriedemthe initial* options are only new in stein22:57
mriedemso that doesn't help you in rocky22:57
sean-k-mooneyeandersson: in rocky the nova compute agent will continually set the ratios back to whatever is in the compute node config22:58
mriedemyou can either override allocation ratios per compute or override the providers in placement...but i think compute will overwrite anything you set out of band22:58
melwitteandersson: >= ocata there's no notion of a per aggregate allocation ratio. so you have to set them separately per compute host nova.conf22:58
sean-k-mooneyor use the default in code if not set22:58
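(Editor's note: per-compute overcommit from ocata through rocky is set in each compute node's nova.conf; the values below are illustrative, not recommendations.)

```ini
# Illustrative nova.conf fragment on a compute node (ocata..rocky).
# A ratio of 0.0 means "use the in-code default"; explicit values
# override it per host.
[DEFAULT]
cpu_allocation_ratio = 4.0
ram_allocation_ratio = 1.0
disk_allocation_ratio = 1.0
```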
eanderssonWe set the computes wrong for non-overcommited at the moment22:59
eanderssonThat could be causing the issue22:59
mriedemnote that you might not even be getting this far https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L61922:59
sean-k-mooneyits tricky because it will initially appear to work fine until you start filling your hosts22:59
mriedem^ is only if you hit a primary host, it fails and you reschedule22:59
mriedemif initial scheduling fails, you should get NoValidHost and the instances should be put into ERROR status23:00
mriedemif initial scheduling fails, you should go here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L124023:00
mriedemand the instances go into cell0 with ERROR status23:01
mriedemif your nova_cell0 db fell over then you're missing some updates...23:01
eandersson> | OS-EXT-STS:task_state               | scheduling23:01
eandersson> | OS-EXT-STS:vm_state                 | building23:01
eanderssonIt was stuck like this until we deleted btw23:01
eandersson12 hours later23:01
mriedemwas the instance ever reported as being on a host?23:01
mriedemor in cell0?23:01
sean-k-mooneymriedem: for some reason that is not happening and the error eandersson is seeing is logged from here https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/utils.py#L57323:01
eanderssonIt tried to schedule, but the moment it did it failed with23:01
eandersson> Free vcpu 0.00 VCPU < requested 20 VCPU23:01
mriedemif something fell over in conductor, like the db insert/update, you should have had error logs23:02
mriedemsean-k-mooney: that utils code is called from multple places in conductor23:02
eanderssonOnly other time I have seen this happen was when we moved nova-conductor to a new host23:02
eanderssonand forgot that the db is configured in mysql23:02
sean-k-mooneyyes23:02
mriedemsean-k-mooney: https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L120523:02
sean-k-mooneyi tracked it from here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L60123:02
sean-k-mooneybased on the message that was logged23:02
mriedemok i think i've reported a bug that we could be failing to set instances to ERROR in build_instances if something fails, i remember talking with gibi about it23:03
*** dtruong has joined #openstack-nova23:03
melwitt+1 look for db-related errors in the log. that is how I've seen other situations internally where instance got stuck in building/scheduling state23:03
sean-k-mooney eandersson are you seeing the "'Setting instance to %s state.'" message23:04
eanderssonDoes that have the instance uuid?23:04
sean-k-mooneyyes23:04
mriedemrpc could have also fallen over23:05
eanderssonThen no23:05
sean-k-mooneymriedem: right that is the only thing between those two lines that could fail23:05
mriedemin that case you'd probably have MessagingTimeouts for the db save rpc calls23:05
eanderssonI have 1-2 MySQL server has gone away in the logs, but nothing near the time that happened23:06
eandersson(also those only failed on select 1)23:07
sean-k-mooneywell i was wondering if the notifier = rpc.get_notifier(service) line is where it stopped23:07
sean-k-mooneyso it might not be related to the db23:07
sean-k-mooneybut to rabbit23:07
*** _erlon_ has quit IRC23:07
eanderssonhmm does placement do rpc?23:09
eanderssonor would this be within nova only?23:09
sean-k-mooneythis is in nova23:09
sean-k-mooneyin the conductor23:09
sean-k-mooneyand no, placement does not do any rpc as far as i am aware23:09
sean-k-mooneyits just a wsgi app in front of a db23:10
eanderssonOne thing I don't like with oslo.messaging is that it ack's the message before it gets processed23:10
eanderssonoh23:14
eandersson> Exception during message handling23:14
eandersson> Exception during message handling: MaxRetriesExceeded: Exceeded maximum number of retries.23:14
eanderssonThat is the error I posted above23:15
eanderssonAnd this is from a normal failure23:15
eandersson> Setting instance to ERROR state.: MaxRetriesExceeded: Exceeded maximum number of retries.23:15
mriedemyeah that's this https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/utils.py#L73023:17
mriedemcalled from https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L61923:17
mriedemthe instance change should be saved here https://opendev.org/openstack/nova/src/branch/master/nova/scheduler/utils.py#L73623:17
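(Editor's note: the pattern being traced through conductor and scheduler utils is roughly "catch the scheduling failure, flip the instance to ERROR, save it". A stripped-down sketch with fake objects — this is not nova's actual code path.)

```python
# Sketch of "set the instance to ERROR and save it" on a scheduling
# failure. FakeInstance stands in for the real nova Instance object.

class MaxRetriesExceeded(Exception):
    pass


class FakeInstance:
    def __init__(self):
        self.vm_state = 'building'
        self.task_state = 'scheduling'
        self.saved = False

    def save(self):
        # Real code persists this to the cell db via conductor/RPC.
        # If *this* step fails (db or rabbit outage), the instance
        # stays stuck in building/scheduling -- the reported symptom.
        self.saved = True


def handle_build_failure(instance):
    instance.vm_state = 'error'
    instance.task_state = None
    instance.save()


inst = FakeInstance()
try:
    raise MaxRetriesExceeded('Exceeded maximum number of retries.')
except MaxRetriesExceeded:
    handle_build_failure(inst)
assert inst.vm_state == 'error' and inst.saved
```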
*** mriedem has quit IRC23:18
*** betherly has joined #openstack-nova23:18
sean-k-mooneyright but in the case where it remains in building we get Exception during message handling:... instead of Setting instance to ERROR state.:...23:19
sean-k-mooneyeandersson: is ^ correct23:19
eanderssonhttps://opendev.org/openstack/oslo.messaging/src/branch/master/oslo_messaging/rpc/server.py#L17423:19
eanderssonThis could be anything :'(23:20
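(Editor's note: a toy model of the ack-then-dispatch ordering eandersson is describing — if the handler dies after the ack, the message is gone and never retried. This mimics the failure mode; it is not oslo.messaging code.)

```python
from collections import deque

# Toy "ack first, process second" dispatcher.
queue = deque(['build-instance-request'])
acked, processed = [], []

def dispatch(msg, handler):
    acked.append(msg)       # acked first...
    handler(msg)            # ...then processed; a crash here loses it
    processed.append(msg)   # never reached if the handler raises

def crashing_handler(msg):
    raise RuntimeError('handler died after ack')

msg = queue.popleft()
try:
    dispatch(msg, crashing_handler)
except RuntimeError:
    pass

# The message was acked but never processed, and is no longer queued.
assert acked == ['build-instance-request']
assert processed == []
assert not queue
```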
*** JamesBenson has joined #openstack-nova23:20
*** betherly has quit IRC23:23
sean-k-mooneyeandersson: it might be good to check your rabbitmq server logs and see if there are any errors23:23
sean-k-mooneyalthough i honestly dont really know how that code works23:23
eanderssonhttp://paste.openstack.org/show/754875/23:24
eanderssonsean-k-mooney, I honestly don't think it's a rmq issue directly23:24
*** JamesBenson has quit IRC23:24
*** brinzhang has quit IRC23:24
sean-k-mooneyok its unlikely that its related to that placement error23:25
sean-k-mooneythat should happen before we try to build the instance23:25
*** brinzhang has joined #openstack-nova23:25
eanderssonThat log was from the same millisecond23:25
eanderssonI found two Messaging errors and both are the same issue23:25
sean-k-mooneyits possible that we only have 1 candidate host23:26
sean-k-mooneyand that we raced23:26
eanderssonOther than that no other oslo messaging issues23:26
sean-k-mooneyas a result of the fact overcommit is not working23:26
eanderssonIf there was a rmq issue I am sure other services or request would have failed23:26
sean-k-mooneyneutron would be the first23:26
eanderssonWe have a pretty massive deployment so always a lot of things going on23:26
sean-k-mooneyalthough they have reduced the rpc traffic a lot in the last release or two23:27
sean-k-mooneyoh the same message is also logged form here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L67623:29
artomHeh, for what it's worth, that per-compute libvirt connection mocking has issues:23:30
artom"    2019-07-25 19:29:16,889 ERROR [nova.virt.libvirt.host] Hostname has changed from test_compute0 to test_compute1. A restart is required to take effect."23:30
sean-k-mooneythe other place was the cellv1 version23:30
sean-k-mooneyactually, never mind, its the same function23:31
*** tjgresha has quit IRC23:31
sean-k-mooneyso if this is a retry then we are executing https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L642-L67623:33
sean-k-mooneyand on line 655 we are trying to claim the alternate hosts, which is failing with http://paste.openstack.org/show/754875/ because overcommit is not working23:34
eanderssonYea pretty sure the error is within oslo.messaging23:34
sean-k-mooneyit might not be23:35
sean-k-mooneywhen we raise the exception from here https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/manager.py#L67623:35
eanderssonoh23:35
sean-k-mooneyit is not caught locally in this function23:35
*** brault has joined #openstack-nova23:36
sean-k-mooneyi should probably go to bed. there is likely a bug there but im too tired to track it this evening23:39
mnaserefried: i think you were working on moving nova to use openstacksdk ? do you have some of the commit you did in nova that did that? trying to do something similar for another project23:40
*** brault has quit IRC23:40
eanderssonThanks for the help sean-k-mooney23:42
eanderssonI'll create a bug report23:42
gmannjohnthetubaguy: melwitt sean-k-mooney efried added my comment/query on unified limit spec.  I am not very clear about how we will handle GET for limits which are not going to move to new unified limits(for example server_groups).23:42
gmannI am almost ok for proxy APIs (as per operators interest) if HTTPGone on those APIs is not acceptable.23:43
melwittgmann: thanks for reviewing. I am not 100% sure operators would be opposed to having to use an older microversion to use the APIs *but* I think the thing that makes it weird is that unified limits would be opt-in. so that is where I'm uneasy with removing the proxy APIs in a new microversion. wanted your opinion on that aspect as well23:45
gmannmelwitt: 'removing proxy API in new microversion' and keeping them working for the older ones seems to bring no benefit and even more maintenance. I was thinking we say those APIs are gone (410 response code HTTPGone) because the nova quota system is gone, without a microversion bump. similar approach as the nova-network & n-cert case.23:50
gmannand before we do that we can trigger the notification to users via deprecating those APIs23:51
melwittgmann: oh, sorry, I didn't know HTTPGone is different. my bad23:51
melwittyeah, I'd like to be able to do that but wasn't sure about the API perspective of deprecating something when the new thing is opt-in and not on by default23:52
*** brinzhang_ has quit IRC23:52
gmannmelwitt: https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/floating_ip_dns.py#L2523:52
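(Editor's note: the pattern in the linked floating_ip_dns.py is a controller whose every action answers 410 Gone; a minimal stdlib-only sketch of the same idea. Nova's real code uses webob's HTTPGone — this stand-in just returns the status and a message.)

```python
from http import HTTPStatus

class GoneController:
    """Every action on a removed proxy API answers 410 Gone."""

    def _gone(self):
        return HTTPStatus.GONE, 'This API is deprecated and removed.'

    def index(self, req=None):
        return self._gone()

    def show(self, req=None, id=None):
        return self._gone()


status, body = GoneController().index()
assert status == HTTPStatus.GONE  # 410
```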
eanderssonsean-k-mooney, https://bugs.launchpad.net/nova/+bug/183795523:53
openstackLaunchpad bug 1837955 in OpenStack Compute (nova) "MaxRetriesExceeded sometime fails with messaging exception" [Undecided,New]23:53
*** brinzhang_ has joined #openstack-nova23:53
gmannmelwitt: the only thing that makes me feel uncomfortable doing that was the comment johnthetubaguy added in the alternatives section about the Forum discussion with operators about keeping the old tooling. But I hope that is only for a transition period, not permanently23:54
*** smcginnis has quit IRC23:56
gmannmelwitt: and the second point is about the GET quotas API for the limits which are not moving to the new system. keep the existing GET quotas APIs for them?23:57
melwittgmann: ok. the treatment of /os-quota-sets and /os-quota-class-sets is definitely temporary for the transition. the /limits API we're not completely sure because if we HTTPGone that one, there will be no more ability for users to show limits + usage in one API. but if community could be OK with having to go to two different API (keystone for limits and placement for usage) then we could 410 /limits too in the future when unified23:57
melwitt limits mode is on by default23:57
gmannmelwitt: +1 we can decide the limit thing later once hierarchical unified limits are there.23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!