Tuesday, 2019-08-06

*** mriedem has quit IRC00:38
*** altlogbot_3 has quit IRC01:37
*** altlogbot_0 has joined #openstack-placement01:38
*** tetsuro has joined #openstack-placement01:39
*** tetsuro has quit IRC02:12
*** tetsuro has joined #openstack-placement02:44
*** tetsuro_ has joined #openstack-placement02:51
*** tetsuro has quit IRC02:53
*** tetsuro_ has quit IRC03:17
*** tetsuro has joined #openstack-placement03:55
*** ykarel|away has joined #openstack-placement04:05
*** tetsuro has quit IRC05:23
*** tetsuro has joined #openstack-placement05:24
*** tetsuro has quit IRC05:27
*** tetsuro has joined #openstack-placement05:30
*** ykarel|away is now known as ykarel05:39
openstackgerritMerged openstack/placement master: Use TraitCache for Trait.get_by_name  https://review.opendev.org/67375005:48
*** belmoreira has joined #openstack-placement06:36
*** belmoreira has quit IRC06:37
*** belmoreira has joined #openstack-placement06:37
*** tssurya has joined #openstack-placement07:04
*** cdent has joined #openstack-placement07:44
*** ykarel is now known as ykarel|lunch08:02
*** helenafm has joined #openstack-placement08:12
*** tetsuro has quit IRC08:14
openstackgerritChris Dent proposed openstack/placement master: Run nested-perfload parallel correctly  https://review.opendev.org/67350508:19
openstackgerritChris Dent proposed openstack/placement master: Implement a more complex nested-perfload topology  https://review.opendev.org/67351308:19
openstackgerritChris Dent proposed openstack/placement master: Add apache benchmark (ab) to end of perfload jobs  https://review.opendev.org/67354008:19
openstackgerritChris Dent proposed openstack/placement master: Optimize trait creation to check existence first  https://review.opendev.org/67355508:25
openstackgerritChris Dent proposed openstack/placement master: Add RequestWideSearchContext.summaries_by_id  https://review.opendev.org/67425408:33
openstackgerritChris Dent proposed openstack/placement master: Further optimize _build_provider_summaries  https://review.opendev.org/67434908:33
openstackgerritChris Dent proposed openstack/placement master: Track usage info on RequestWideSearchContext  https://review.opendev.org/67458108:33
openstackgerritChris Dent proposed openstack/placement master: Make _get_trees_with_traits return a set  https://review.opendev.org/67463008:33
openstackgerritChris Dent proposed openstack/placement master: Use expanding bindparam in extend_usages_by_provider_tree  https://review.opendev.org/67464708:33
openstackgerritChris Dent proposed openstack/placement master: WIP: Use orjson in python3 for allocation candidate dump  https://review.opendev.org/67466108:33
*** e0ne has joined #openstack-placement08:35
openstackgerritMerged openstack/osc-placement master: Add Python 3 Train unit tests  https://review.opendev.org/66947808:46
*** tetsuro has joined #openstack-placement08:58
*** tetsuro has quit IRC09:00
*** tetsuro has joined #openstack-placement09:01
*** tetsuro has quit IRC09:03
*** ykarel_ has joined #openstack-placement10:17
*** ykarel|lunch has quit IRC10:19
*** ykarel_ is now known as ykarel10:27
*** ykarel_ has joined #openstack-placement10:31
*** ykarel has quit IRC10:34
*** ykarel_ is now known as ykarel10:42
*** ykarel is now known as ykarel|afk11:47
*** ykarel|afk is now known as ykarel12:11
edleafecdent: Thought you might enjoy this: https://nedbatchelder.com//blog/201908/why_your_mock_doesnt_work.html12:39
edleafeI know you know the concepts behind it, but I thought this was the clearest explanation of why mocking can give unexpected results.12:39
cdentyeah, saw that a couple days ago12:39
cdent"At this point, you might be concerned: it seems like mocking is kind of delicate. "12:41
*** ykarel_ has joined #openstack-placement12:42
*** ykarel has quit IRC12:44
edleafeYep12:45
*** ykarel_ is now known as ykarel12:54
*** mriedem has joined #openstack-placement13:00
*** ykarel_ has joined #openstack-placement13:16
*** ykarel has quit IRC13:18
*** ykarel_ is now known as ykarel|afk13:19
*** ykarel_ has joined #openstack-placement13:38
*** ykarel|afk has quit IRC13:40
*** belmoreira has quit IRC13:41
*** belmoreira has joined #openstack-placement13:43
*** ykarel__ has joined #openstack-placement13:45
*** ykarel_ has quit IRC13:47
*** ykarel__ is now known as ykarel13:48
*** ykarel has quit IRC14:11
*** ykarel has joined #openstack-placement14:12
*** altlogbot_0 has quit IRC14:12
*** N3l1x has joined #openstack-placement14:14
*** N3l1x_ has joined #openstack-placement14:14
*** altlogbot_1 has joined #openstack-placement14:14
*** N3l1x_ has quit IRC14:15
*** belmoreira has quit IRC14:49
*** belmoreira has joined #openstack-placement15:05
*** belmoreira has quit IRC15:13
*** ykarel is now known as ykarel|away15:19
*** helenafm has quit IRC15:49
*** tssurya has quit IRC16:15
*** e0ne has quit IRC16:48
* cdent waves16:48
*** cdent has quit IRC16:48
*** N3l1x has quit IRC16:49
*** e0ne has joined #openstack-placement18:17
*** ykarel|away has quit IRC18:31
*** e0ne has quit IRC18:31
*** e0ne has joined #openstack-placement19:04
*** mriedem has quit IRC19:08
*** mriedem has joined #openstack-placement19:09
*** spatel has joined #openstack-placement19:10
spatelFolks, I need help here, I hit this bug https://bugs.launchpad.net/nova/+bug/182947919:10
openstackLaunchpad bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,Triaged]19:10
spatelI have deleted the compute service and rebuilt the node and now I'm getting this error - http://paste.openstack.org/show/755583/19:11
spatelHow do I delete the old uuid from the placement service?19:11
*** e0ne has quit IRC19:13
*** e0ne has joined #openstack-placement19:14
*** e0ne has quit IRC19:15
sean-k-mooneySaid a different way: when I triaged ^ my assertion was that nova could not delete the RP when the compute service was removed, due to existing allocations19:22
sean-k-mooneyso spatel needs to know how to list the allocations on the RP and delete them and the RP via curl19:22
sean-k-mooneyso that when they restart the compute agent it can create a new RP and not get a conflict on the RP name19:23
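[editor's note: a minimal curl sketch of the cleanup sean-k-mooney describes, not a verbatim recipe; $TOKEN, $PLACEMENT, $RP_UUID and $CONSUMER_UUID are hypothetical placeholders for an admin token, the placement endpoint URL, the stale provider's uuid, and a consumer uuid taken from the first response]
    # list the allocations held against the resource provider
    curl -H "X-Auth-Token: $TOKEN" $PLACEMENT/resource_providers/$RP_UUID/allocations
    # delete all allocations for one consumer; repeat for each consumer uuid found above
    curl -X DELETE -H "X-Auth-Token: $TOKEN" $PLACEMENT/allocations/$CONSUMER_UUID
    # once no allocations remain, the provider itself can be deleted
    curl -X DELETE -H "X-Auth-Token: $TOKEN" $PLACEMENT/resource_providers/$RP_UUID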
sean-k-mooneythere is a bug open for this in nova that mriedem was working on at one point I think19:23
sean-k-mooneybut I'm about to get dinner and am signing off for the day, so I can't walk spatel through doing this.19:23
spatelthanks sean-k-mooney19:24
mriedemsean-k-mooney: one of these it sounds like https://review.opendev.org/#/c/663737/19:29
mriedembug 1829479 or bug 181783319:29
openstackbug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,Triaged] https://launchpad.net/bugs/182947919:29
openstackbug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Undecided,In progress] https://launchpad.net/bugs/1817833 - Assigned to xulei (605423512-j)19:29
mriedemoh right spatel said that :)19:29
spatelmriedem: I am stuck here :(19:30
spatelfinding a way to delete the RP19:30
mriedemyou have to find the migration uuids which are the consumers with allocations against the evacuated node's resource provider,19:30
mriedemyou should be able to list migrations by migration type (evacuate) and host19:30
spatelmriedem: my issue isn't related to migration19:31
mriedemit's related to evacuate,19:31
mriedembut under the covers nova creates an entry in the 'migrations' table in the cell db19:31
spatelI had a working compute node in the cluster which I rebuilt to adjust disk size, but my mistake was that I deleted the compute service :(19:31
mriedemand that migration record has a uuid which is the placement allocation consumer of the source node resources during the evacuate19:32
spatelhmmm19:32
spatelmriedem: interesting, how do I find the migration uuid here?19:33
mriedemhttps://docs.openstack.org/api-ref/compute/?expanded=list-migrations-detail#list-migrations19:33
mriedemusing microversion >= 2.5919:33
mriedemyou can filter on migration_type=evacuation and source_compute=<host that you evacuated>19:34
mriedemthen cross check that https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-show (openstack resource provider show --allocations <rp_uuid>)19:35
mriedemfor each of the matching migration allocation consumers, you need to delete those using https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-delete19:35
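[editor's note: a sketch of the sequence mriedem outlines, assuming admin credentials and the osc-placement plugin installed; <rp_uuid> and <migration_uuid> are placeholders]
    # list migrations; microversion >= 2.59 includes each migration's uuid
    nova --os-compute-api-version 2.59 migration-list
    # show which consumers hold allocations against the old provider
    openstack resource provider show --allocations <rp_uuid>
    # delete the allocations of each matching evacuation migration consumer
    openstack resource provider allocation delete <migration_uuid>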
spatelmriedem: I think this is too much for me to eat.. I am new and trying to understand what these documents are saying..19:36
spatelis there a command or something with which I can list and delete stuff?19:36
mriedemyou're not that new - you've been around for at least a year asking sean-k-mooney to help you with stuff19:36
mriedemthose links are to the placement commands19:36
spatelyes, but I'm not good at this placement domain :)19:36
mriedemyou can use https://docs.openstack.org/python-novaclient/latest/cli/nova.html#nova-migration-list for listing migrations with the nova cli19:36
spateldo I need the osc-placement plugin?19:37
mriedemyes19:37
mriedemjust pip install it19:37
spateljust did - pip install osc-placement19:37
mriedemunfortunately 'nova migration-list' doesn't have a specific filter option for migration_type or source_compute19:38
spatelmriedem: hey -  openstack resource provider list19:40
spatelI can see the list now.. so my osc plugin is working at least19:40
mriedemyeah you can filter by hostname19:40
mriedemopenstack resource provider list --name <hostname>19:41
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider list | grep ostack-compute-bld-sriov-2-1.v1v0x.net19:41
-spatel- | 93d7ff00-d4ee-4b7c-9d23-8265554ed99b | ostack-compute-bld-sriov-2-1.v1v0x.net | 2 |19:41
spatelcan I delete this uuid?19:41
mriedemno, placement won't let you because it has allocations against it19:42
mriedemyou can try but I don't think it will work19:42
mriedemopenstack resource provider delete 93d7ff00-d4ee-4b7c-9d23-8265554ed99b19:42
spateldone, it's gone19:42
spatellet me restart the compute agent services19:42
spatel[root@ostack-compute-bld-sriov-2-1 ~]# systemctl restart nova*19:43
spatelchecking logs for placement errors19:44
spatelso far logs are clean.. let me try to build an instance19:45
spatelmriedem: so the placement-related error is gone but it looks like nova still isn't happy19:48
spatelduring vm build I get the error - {"message": "No valid host was found. There are not enough hosts available19:49
spatelmy /var/log/nova/nova-compute.log logs are frozen, normally these logs are very chatty19:50
spatelI can see19:50
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider list | grep ostack-compute-bld-sriov-2-1.v1v0x.net19:50
-spatel- | 5f38b898-cc22-49b6-935d-847d1b440bdc | ostack-compute-bld-sriov-2-1.v1v0x.net | 3 |19:50
*** e0ne has joined #openstack-placement19:51
spatellet me restart placement service on controller nodes19:51
mriedemnovalidhost is a scheduling issue,19:53
mriedemso check scheduler logs19:53
mriedemand/or placement-api logs19:53
mriedemmight need to enable debug logging to see which filter(s) rejected the request19:53
spatelin scheduler logs I am seeing - Received a sync request from an unknown host 'ostack-compute-bld-sriov-2-1.v1v0x.net'. Re-created its InstanceList.19:54
spatelbut it's INFO19:54
spatelscheduler / placement logs are clean, not a single error anywhere19:56
spatelvery strange20:07
spatelall logs are clean20:07
mriedemthat sync thing is unrelated20:08
spatelI am running nova-compute in debug and didn't find any error or issue20:08
mriedemthe novalidhost error might be in the conductor logs on the nova side20:08
mriedemplacement-api logs would have filtering at debug level20:08
spatellooks like nova-compute is not updating resources to the scheduler20:08
mriedemnova-compute is reporting information to placement, not the scheduler20:08
mriedemand the scheduler is making decisions by asking placement what's available20:09
mriedemwhich release are you running?20:09
spatelstein20:09
mriedemthen if you enable debug in placement you should see some logging about filtering allocation candidates20:10
mriedemduring a scheduling request from nova20:10
spatelyou know what the interesting thing is... I re-kicked 15 compute nodes and all of them are showing the same behavior20:10
spatelfor testing I built a new compute node (new, this wasn't there before) and it is working fine..20:11
spatellooks like when you re-kick an existing compute node something somehow doesn't like it..20:11
spatelmriedem: let me enable debug in placement, I have 3 controller nodes so let me try one20:12
spatelmriedem: is this the correct file - /etc/uwsgi/nova-api-os-compute.ini20:14
spatelor /etc/nova/nova.conf20:15
spatellet me try nova.conf first20:16
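[editor's note: in Stein placement is an extracted service, so the flag normally belongs in the placement config rather than nova.conf; a sketch assuming a stock install with /etc/placement/placement.conf, with the placement api/uwsgi service restarted afterwards]
    [DEFAULT]
    debug = True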
spatelI can see DEBUG in logs but nothing interesting20:21
mriedemwell are you actually trying to schedule a new vm?20:21
spatelhttp://paste.openstack.org/show/755585/20:21
mriedemthere probably isn't anything interesting at steady state20:21
spatelYes, I'm trying to build a new vm and it's saying no host available20:21
spatelif I try to build a vm on another compute it works20:22
spatelthese 15 compute nodes are total zombies now, I can't build anything even though they are in the hypervisor list20:22
spatelwhy is the compute node not sending periodic updates to placement?20:24
spatelDo you think this is the smoking gun in the nova-compute.log file - http://paste.openstack.org/show/755586/20:25
spatelLock "compute_resources" acquired by "nova.compute.resource_tracker._update_available_resource"20:25
spatelbunch of Lock statements20:25
spatelmriedem: ^^20:27
mriedemthat's from the update_available_resource periodic task,20:30
mriedemit's normal20:30
mriedemruns every minute by default20:30
mriedemcheck that there is inventory for the provider, openstack resource provider inventory list 5f38b898-cc22-49b6-935d-847d1b440bdc20:31
-spatel- [root@ostack-osa-2 ~ (admin)]> openstack resource provider inventory list 5f38b898-cc22-49b6-935d-847d1b440bdc20:33
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+20:33
-spatel- | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |20:33
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+20:33
-spatel- | VCPU | 2.0 | 32 | 0 | 1 | 1 | 32 |20:33
-spatel- | MEMORY_MB | 1.0 | 65501 | 2048 | 1 | 1 | 65501 |20:33
-spatel- | DISK_GB | 1.0 | 431 | 0 | 1 | 1 | 431 |20:33
-spatel- +----------------+------------------+----------+----------+-----------+----------+-------+20:33
spateleverything looks OK20:33
spatellooks like somewhere it's holding onto some old data which it doesn't like..20:37
spatelI can try to re-kick the machine and re-add it20:37
mriedemwell you can check your scheduler logs for this https://github.com/openstack/nova/blob/stable/stein/nova/scheduler/manager.py#L14920:39
mriedemif you see that, it means placement is filtering things out and enabling debug logs on the placement side should show what is filtering the allocation candidates,20:39
mriedemif placement is returning candidates, then the scheduler debug logs should show which filters are kicking out the host(s)20:40
spatellet me try that20:40
spatelgrep -i "Got no allocation candidates from the Placement" /var/log/nova/nova-scheduler.log20:42
spatelnothing found on all 3 controller nodes20:42
mriedemthen you should see logs from your enabled filters rejecting hosts20:43
mriedemin here https://github.com/openstack/nova/blob/stable/stein/nova/filters.py#L6820:44
mriedem^ gives the summary20:44
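[editor's note: a hedged example of what to grep for; these messages come from nova's scheduler filter logging and the exact wording can vary by release]
    grep "returned 0 hosts" /var/log/nova/nova-scheduler.log
    grep "Filtering removed all hosts" /var/log/nova/nova-scheduler.log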
spatellet me dig20:45
mriedemI have to run20:46
*** mriedem is now known as mriedem_afk20:46
*** e0ne has quit IRC20:48
*** spatel has quit IRC21:09
*** mriedem_afk is now known as mriedem21:20
*** takashin has joined #openstack-placement23:27
*** mriedem has quit IRC23:40
