Monday, 2019-08-26

openstackgerritTakashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (22)  https://review.opendev.org/57671200:16
*** markvoelker has joined #openstack-nova00:20
*** markvoelker has quit IRC00:25
*** itlinux_ has joined #openstack-nova00:46
*** itlinux has quit IRC00:49
*** hongbin has joined #openstack-nova00:51
*** yedongcan has joined #openstack-nova00:55
*** brinzhang has joined #openstack-nova00:57
*** BjoernT has joined #openstack-nova01:25
*** BjoernT has quit IRC01:37
*** sapd1 has joined #openstack-nova01:42
*** larainema has joined #openstack-nova01:49
*** gbarros has quit IRC01:59
openstackgerritzhufl proposed openstack/nova master: [Trivial]Remove unused helper get_vm_ref_from_name  https://review.opendev.org/67844402:00
*** BjoernT has joined #openstack-nova02:03
openstackgerritzhufl proposed openstack/nova master: [Trivial]Remove unused helper _get_min_service_version  https://review.opendev.org/67844602:15
*** zhubx has quit IRC02:17
*** boxiang has joined #openstack-nova02:18
*** boxiang has quit IRC02:22
*** boxiang has joined #openstack-nova02:22
openstackgerritya.wang proposed openstack/nova master: vCPU model selection  https://review.opendev.org/67029802:38
openstackgerritya.wang proposed openstack/nova master: Add compatibility checks for CPU mode and CPU models and extra flags  https://review.opendev.org/67029902:38
openstackgerritya.wang proposed openstack/nova master: Support report multi CPU model traits  https://review.opendev.org/67030002:38
*** zhubx has joined #openstack-nova02:38
*** boxiang has quit IRC02:40
*** masayukig has joined #openstack-nova02:41
*** sapd1 has quit IRC02:54
*** sapd1_x has joined #openstack-nova02:54
*** markvoelker has joined #openstack-nova02:55
openstackgerritBhagyashri Shewale proposed openstack/nova master: tests: Split NUMA object tests  https://review.opendev.org/67233603:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: scheduler: Flatten 'ResourceRequest.from_extra_specs', 'from_image_props'  https://review.opendev.org/67489403:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: objects: Remove legacy '_from_dict' functions  https://review.opendev.org/53741403:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: Remove 'hw:cpu_policy', 'hw:mem_page_size' extra specs from API samples  https://review.opendev.org/67533803:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: libvirt: Start reporting PCPU inventory to placement  https://review.opendev.org/67179303:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: libvirt: '_get_(v|p)cpu_total' to '_get_(v|p)cpu_available'  https://review.opendev.org/67269303:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: trivial: Rewrap definitions of 'NUMACell'  https://review.opendev.org/67439503:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: hardware: Differentiate between shared and dedicated CPUs  https://review.opendev.org/67180003:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: objects: Rename 'fields' import to 'obj_fields'  https://review.opendev.org/67410303:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: libvirt: Start reporting 'HW_CPU_HYPERTHREADING' trait  https://review.opendev.org/67557103:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180103:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: libvirt: Fold in argument to '_update_provider_tree_for_vgpu'  https://review.opendev.org/67672903:00
openstackgerritBhagyashri Shewale proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489503:00
*** markvoelker has quit IRC03:00
*** yaawang has quit IRC03:03
*** yaawang has joined #openstack-nova03:03
*** boxiang has joined #openstack-nova03:03
*** zhubx has quit IRC03:06
*** cervigni has joined #openstack-nova03:08
*** yedongcan has quit IRC03:08
cervigniis there anyone online now that can help with VGPU scheduling?03:10
*** rcernin_ has joined #openstack-nova03:15
*** rcernin has quit IRC03:15
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844703:25
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResouceList objs  https://review.opendev.org/67844803:25
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844903:25
openstackgerritLuyao Zhong proposed openstack/nova master: Retrive the allocations early  https://review.opendev.org/67845003:25
openstackgerritLuyao Zhong proposed openstack/nova master: Track orphan instances and error migrations in resource tracker  https://review.opendev.org/67845103:25
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845203:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67845303:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845403:25
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845503:25
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845603:25
*** brinzhang has quit IRC03:26
*** brinzhang has joined #openstack-nova03:27
*** BjoernT has quit IRC03:39
*** ircuser-1 has quit IRC03:41
*** hamzy_ has joined #openstack-nova03:48
*** hamzy has quit IRC03:51
*** mkrai has joined #openstack-nova03:56
*** yedongcan has joined #openstack-nova04:01
*** hongbin has quit IRC04:02
*** udesale has joined #openstack-nova04:05
*** yedongcan has quit IRC04:13
*** yedongcan has joined #openstack-nova04:14
*** markvoelker has joined #openstack-nova04:20
*** markvoelker has quit IRC04:25
*** itlinux has joined #openstack-nova04:31
*** itlinux_ has quit IRC04:34
*** threestrands has joined #openstack-nova04:37
*** threestrands has quit IRC04:37
*** threestrands has joined #openstack-nova04:37
*** bhagyashris has joined #openstack-nova05:11
*** beekneemech has quit IRC05:16
*** bnemec has joined #openstack-nova05:20
openstackgerritzhufl proposed openstack/nova master: [Trivial]Remove unused helper _get_min_service_version  https://review.opendev.org/67844605:40
*** Luzi has joined #openstack-nova05:43
openstackgerritzhufl proposed openstack/nova master: [Trivial]Remove unused helper get_vm_ref_from_name  https://review.opendev.org/67844405:49
*** jaosorior has joined #openstack-nova05:51
openstackgerritTakashi NATSUME proposed openstack/nova master: Remove descriptions of nonexistent hacking rules  https://review.opendev.org/67846205:56
*** mkrai has quit IRC06:14
*** mkrai has joined #openstack-nova06:16
*** rcernin_ has quit IRC06:18
*** lpetrut has joined #openstack-nova06:18
*** dpawlik has joined #openstack-nova06:23
*** sridharg has joined #openstack-nova06:34
*** deepak_mourya_ has joined #openstack-nova06:34
*** takamatsu has joined #openstack-nova06:34
*** takamatsu has quit IRC06:42
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844706:45
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResouceList objs  https://review.opendev.org/67844806:45
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844906:45
openstackgerritLuyao Zhong proposed openstack/nova master: Retrive the allocations early  https://review.opendev.org/67845006:45
openstackgerritLuyao Zhong proposed openstack/nova master: Track orphan instances and error migrations in resource tracker  https://review.opendev.org/67845106:45
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845206:45
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67845306:45
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845406:45
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845506:45
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845606:45
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847006:45
*** bhagyashris has quit IRC06:47
*** aojea has joined #openstack-nova06:58
*** kashyap has joined #openstack-nova06:59
*** trident has quit IRC07:01
*** itlinux has quit IRC07:03
bauzasgood morning Nova07:04
* bauzas is back after a long time07:04
gibigood mornin bauzas07:04
*** itlinux has joined #openstack-nova07:04
gibiwelcome back07:04
alex_xugood morning gibi bauzas07:04
bauzasthanks07:04
gibihi alex_xu07:04
*** shilpasd has joined #openstack-nova07:04
*** damien_r has joined #openstack-nova07:04
bauzasgibi: good morning07:05
bauzasgibi: I'll slowly catch up this morning, but if you need any reviews for your series (I have no idea of its status yet), you can ping me07:05
bauzasI haven't forgotten train-3 is soon :)07:06
* kashyap just back after a long time, too :D07:08
kashyapStarts processing The Pile(tm).07:09
*** trident has joined #openstack-nova07:10
gibibauzas: the bottom of the support-move-ops-with-qos-ports series is here: https://review.opendev.org/#/c/655110 mriedem has votes on the first couple of patches. The current top of the series has the full support for cold migrate (and possibly resize)07:12
gibibauzas: I appreciate any reviews :)07:12
*** jawad_axd has joined #openstack-nova07:13
lpetrutHi, a quick question about failed live migrations: should the instances end up in error state even if the rollback is successful? this behavior was introduced by https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/compute/manager.py#L6813-L6820 a couple of years ago.07:13
lpetrutlooks like the libvirt driver won't raise an exception when the instance is recovered: https://github.com/openstack/nova/blob/62f6a0a1bc6c4b24621e1c2e927177f99501bef3/nova/virt/libvirt/driver.py#L8080-L808607:13
lpetrutI'm thinking about doing the same with the Hyper-V driver: avoid propagating migration exceptions if the instance is recovered.07:13
openstackgerritTakashi NATSUME proposed openstack/python-novaclient master: Follow up for microversion 2.75  https://review.opendev.org/67847307:20
openstackgerritBalazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate  https://review.opendev.org/65642207:20
openstackgerritBalazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations  https://review.opendev.org/65511407:20
openstackgerritBalazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth  https://review.opendev.org/65510907:20
lpetrutalthough the admin may just reset the state, not setting the instance to error state makes it clear that no further steps are required in order to fully recover the instance, reducing the amount of time required for debugging07:21
*** threestrands has quit IRC07:24
*** xek has joined #openstack-nova07:25
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate server with ports having resource request  https://review.opendev.org/65511307:27
openstackgerritBalazs Gibizer proposed openstack/nova master: Make _rever_allocation nested allocation aware  https://review.opendev.org/67613807:27
openstackgerritBalazs Gibizer proposed openstack/nova master: Support reverting migration / resize with bandwidth  https://review.opendev.org/67614007:32
openstackgerritBalazs Gibizer proposed openstack/nova master: Func test for migrate re-schedule with bandwidth  https://review.opendev.org/67697207:34
openstackgerritBalazs Gibizer proposed openstack/nova master: Support migrating SRIOV port with bandwidth  https://review.opendev.org/67698007:37
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149707:37
openstackgerritAkihiro Motoki proposed openstack/nova master: PDF documentation build  https://review.opendev.org/67673007:41
openstackgerritBalazs Gibizer proposed openstack/nova master: Allow migrating server with port resource request  https://review.opendev.org/67149707:44
*** phasespace has joined #openstack-nova07:46
openstackgerritLucian Petrut proposed openstack/nova master: Avoid error state for recovered instances after failed migrations  https://review.opendev.org/67848107:52
*** ivve has joined #openstack-nova07:54
openstackgerritLucian Petrut proposed openstack/nova master: Avoid error state for recovered instances after failed migrations  https://review.opendev.org/67848107:55
*** aojea has quit IRC07:57
*** markvoelker has joined #openstack-nova08:02
*** jangutter has joined #openstack-nova08:06
*** markvoelker has quit IRC08:07
*** derekh has joined #openstack-nova08:09
*** janki has joined #openstack-nova08:29
*** tkajinam has quit IRC08:30
*** janki has quit IRC08:32
*** janki has joined #openstack-nova08:34
*** avolkov has joined #openstack-nova08:35
openstackgerritLucian Petrut proposed openstack/nova master: Avoid error state for recovered instances after failed migrations  https://review.opendev.org/67848108:37
*** ralonsoh has joined #openstack-nova08:40
*** dtantsur|afk is now known as dtantsur08:46
*** rcernin_ has joined #openstack-nova09:14
*** damien_r has quit IRC09:16
stephenfinsean-k-mooney: https://review.opendev.org/67849709:16
*** brinzhang_ has joined #openstack-nova09:17
*** brinzhang has quit IRC09:21
sean-k-mooneyI'm not sure how setup_develop works, but will that pass mysql as an extra or just install an additional package via pip?09:21
alex_xusean-k-mooney: stephenfin we can't revert a db migration script, right? we should add a new script to revert the upgrade from the old one, is that correct?09:27
*** yedongcan has quit IRC09:27
sean-k-mooneyalex_xu: we should not do a revert in general09:27
sean-k-mooneyare you asking for the pmem column09:28
alex_xusean-k-mooney: yes09:28
alex_xuand we already have another upgrade script after vpmem09:28
sean-k-mooneyjust add a new migration that drops the column and adds the devices column09:28
alex_xugot it09:28
sean-k-mooneyif we had made a release it would be a bit different as we would have had to do an online migration to move the data09:29
alex_xuluyao: ^, this doesn't sound right https://review.opendev.org/#/c/678447/2/nova/db/sqlalchemy/migrate_repo/versions/398_add_resources.py09:29
sean-k-mooneybut since i doubt anyone has master deployed with vpmem in production it should be fine to drop and replace09:30
alex_xuyea, so it's good we aren't too late09:30
alex_xuyes, in the past we had to take care about that. but for now, I'm just not sure whether we should keep that policy09:31
*** takashin has left #openstack-nova09:32
sean-k-mooney i think the problem with just changing the new column name would be that it might mess up sqlalchemy's migration logic? or maybe I'm thinking about alembic. one of those computes a hash of the migration to generate a unique version number for the db09:33
sean-k-mooneyso i think beyond the fact it would not fix any db that had run the old migration it could break other migrations.09:33
alex_xuyes, I think so09:34
alex_xuthen you have to manually tune the migration version09:34
sean-k-mooneythinking about that a bit more i think that is only with alembic but still better to avoid it09:34
kashyapaspiers: Hey, just catching up with changes post PTO.  Will respond once I finish processing The Pile.09:34
kashyaps/changes/patches/09:35
alex_xusean-k-mooney: yea, agree with you09:35
sean-k-mooneystephenfin: ah yes it is extras  https://opendev.org/openstack/devstack/src/branch/master/inc/python#L41909:41
*** rcernin_ has quit IRC09:42
luyaosean-k-mooney, alex_xu: Got it! I will add a patch to remove the vpmems column first. Thank you.09:43
alex_xuluyao: you don't need a separate patch to remove the vpmem column; you can have a new script do the removal and add the new resources column at the same time09:44
sean-k-mooneyluyao: basically don't update the migration script that added the vpmem column, just add another to remove it and add the resources column09:49
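The advice above (leave the old migration untouched and append a new one that drops the vpmems column and adds the resources column) can be illustrated with a minimal, library-free sketch. This is not nova's sqlalchemy-migrate code: the function names, the version numbers, and the "other_column" step are hypothetical, and the schema is modelled as a plain set of column names.

```python
# Hypothetical model of a versioned migration chain. Only the column names
# "vpmems" and "resources" come from the discussion; everything else is
# made up for illustration.

def m001_add_vpmems(columns):
    # The migration that already merged. It is never edited after the fact.
    columns.add("vpmems")


def m002_other_change(columns):
    # Stands in for the unrelated migration that landed after vpmems.
    columns.add("other_column")


def m003_replace_vpmems_with_resources(columns):
    # The new migration: drop the old column and add the new one in one step.
    columns.discard("vpmems")
    columns.add("resources")


MIGRATIONS = [
    m001_add_vpmems,
    m002_other_change,
    m003_replace_vpmems_with_resources,
]


def upgrade(columns, current_version):
    """Apply every migration newer than current_version, in order."""
    for version, step in enumerate(MIGRATIONS, start=1):
        if version > current_version:
            step(columns)
            current_version = version
    return current_version


if __name__ == "__main__":
    # A database that already ran the old vpmems migration.
    existing = {"vpmems", "other_column"}
    print(upgrade(existing, 2), sorted(existing))
```

Because step 1 is unchanged, a database that already ran it and a fresh database both converge on the same schema after step 3, which is the property that editing the old script in place would lose.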
*** bhagyashris__ has joined #openstack-nova09:57
openstackgerritBrin Zhang proposed openstack/nova master: Specify availability_zone to unshelve  https://review.opendev.org/66385109:57
bhagyashris__stephenfin: Hi, I just addressed the review comments that you gave on patch https://review.opendev.org/#/c/674895/17 related to the functional test and uploaded it for community review09:59
openstackgerritStephen Finucane proposed openstack/nova master: setup.cfg: Cleanup  https://review.opendev.org/67796910:00
openstackgerritStephen Finucane proposed openstack/nova master: requirements: Move DB dependencies to 'extras'  https://review.opendev.org/67747510:00
stephenfinbhagyashris__: That's failing pretty hard at the moment, it seems?10:01
stephenfinIs there more patches expected?10:02
stephenfin*are10:02
bhagyashris__stephenfin: I saw the functional test cases are failing because of changes in the dependent patches. I haven't yet checked where it's failing, so are you going to check that point or should I?10:03
stephenfinIt's late for you, right? I can do it if so10:03
stephenfinI need to fix the base patch anyway10:03
bhagyashris__stephenfin: I'm just confirming because most of the time we end up reworking the same point10:04
luyaosean-k-mooney, alex_xu: 'migration script' you mean is  the files under nova/db/sqlalchemy/migrate_repo/versions/  right?10:05
bhagyashris__Yeah it's almost EOD here but I can do it by tomorrow10:05
bhagyashris__stephenfin: if you are ok ?10:05
stephenfinYup, I can do it10:05
*** markvoelker has joined #openstack-nova10:05
stephenfinOh, sorry10:05
stephenfinYeah, tomorrow is good10:05
stephenfinI'll work on fixing that other patch though, since I broke it10:06
stephenfinand leave the last one to you10:06
bhagyashris__stephenfin:  okay. Just reconfirming -> you will fix the main issue and after that I will continue working on patch https://review.opendev.org/#/c/674895/17  to fix the functional test after your changes.10:08
bhagyashris__stephenfin:  right?10:08
stephenfinsure10:08
bhagyashris__stephenfin: ok Thank you :)10:09
*** bbowen_ has joined #openstack-nova10:10
*** markvoelker has quit IRC10:10
*** bbowen has quit IRC10:11
*** xek has quit IRC10:11
sean-k-mooneystephenfin: so i just did a thing and it worked10:14
*** bbowen_ has quit IRC10:15
*** bbowen has joined #openstack-nova10:16
sean-k-mooneystephenfin: i finally tested my theory about nested pcie passthrough. and yes you can do it if you add an iommu to the l1 guest and tweak the pcie layout so that the device is in a separate iommu group10:16
*** jaosorior has quit IRC10:26
*** xek has joined #openstack-nova10:26
*** markvoelker has joined #openstack-nova10:35
*** sapd1_x has quit IRC10:39
*** markvoelker has quit IRC10:40
bhagyashris__melwitt,  mriedem, efried,  alex_xu: Request you to review https://review.opendev.org/#/c/612626/ .10:45
*** bhagyashris__ has quit IRC10:47
*** mkrai has quit IRC10:58
*** udesale has quit IRC11:04
*** francoisp has joined #openstack-nova11:07
*** tesseract has joined #openstack-nova11:12
*** ccamacho has joined #openstack-nova11:21
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180111:24
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Fold in argument to '_update_provider_tree_for_vgpu'  https://review.opendev.org/67672911:24
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489511:24
*** Luzi has quit IRC11:25
*** Luzi has joined #openstack-nova11:26
*** Garyx has quit IRC11:43
*** jaosorior has joined #openstack-nova11:44
*** jroll has quit IRC11:44
*** jroll has joined #openstack-nova11:45
*** rcernin_ has joined #openstack-nova11:53
shilpasdstephenfin: need discussion related to python3-train patch for masakari-monitors https://review.opendev.org/#/c/669387/ for py27/py36 and py37, here observed 'import libvirt' is an issue11:59
shilpasdstephenfin: can you pl give me some pointers to resolve this11:59
*** derekh has quit IRC12:00
*** markvoelker has joined #openstack-nova12:00
*** weshay_MOD is now known as weshay12:07
*** larainema has quit IRC12:08
*** artom has joined #openstack-nova12:09
*** rcernin_ has quit IRC12:32
*** nweinber has joined #openstack-nova12:36
openstackgerritLuyao Zhong proposed openstack/nova master: db: Add resources column in instance_extra table  https://review.opendev.org/67844712:39
openstackgerritLuyao Zhong proposed openstack/nova master: object: Introduce Resource and ResouceList objs  https://review.opendev.org/67844812:39
openstackgerritLuyao Zhong proposed openstack/nova master: Add resources dict into _Provider  https://review.opendev.org/67844912:39
openstackgerritLuyao Zhong proposed openstack/nova master: Retrive the allocations early  https://review.opendev.org/67845012:39
openstackgerritLuyao Zhong proposed openstack/nova master: Track orphan instances and error migrations in resource tracker  https://review.opendev.org/67845112:39
openstackgerritLuyao Zhong proposed openstack/nova master: Claim resources in resource tracker  https://review.opendev.org/67845212:39
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Enable driver configuring PMEM namespaces  https://review.opendev.org/67845312:39
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: report VPMEM resources by provider tree  https://review.opendev.org/67845412:39
openstackgerritLuyao Zhong proposed openstack/nova master: libvirt: Support VM creation with vpmems and vpmems cleanup  https://review.opendev.org/67845512:39
openstackgerritLuyao Zhong proposed openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845612:39
openstackgerritLuyao Zhong proposed openstack/nova master: Add functional tests for virtual persistent memory  https://review.opendev.org/67847012:39
*** xek_ has joined #openstack-nova12:42
*** mgoddard has quit IRC12:47
*** xek_ has quit IRC12:47
*** hemna has quit IRC12:47
*** mgoddard has joined #openstack-nova12:47
*** martinkennelly has joined #openstack-nova12:51
*** gbarros has joined #openstack-nova12:53
*** jmlowe has quit IRC12:56
*** slaweq has joined #openstack-nova12:58
*** macz has joined #openstack-nova13:04
*** boxiang_ has joined #openstack-nova13:09
*** boxiang_ has left #openstack-nova13:09
*** eharney has quit IRC13:09
*** boxiang_ has joined #openstack-nova13:10
*** boxiang_ has left #openstack-nova13:10
*** boxiang_ has joined #openstack-nova13:10
*** boxiang_ has left #openstack-nova13:11
*** boxiang_ has joined #openstack-nova13:11
*** janki has quit IRC13:12
*** boxiang_ has quit IRC13:13
*** boxiang_ has joined #openstack-nova13:14
*** jmlowe has joined #openstack-nova13:16
*** gbarros has quit IRC13:16
*** boxiang_ has quit IRC13:18
*** KeithMnemonic has joined #openstack-nova13:19
*** dave-mccowan has joined #openstack-nova13:19
*** BjoernT has joined #openstack-nova13:23
*** gbarros has joined #openstack-nova13:27
*** tbachman has joined #openstack-nova13:29
*** slaweq has quit IRC13:36
*** boxiang_ has joined #openstack-nova13:37
*** Luzi has quit IRC13:44
*** hemna has joined #openstack-nova13:46
*** KeithMnemonic has quit IRC13:48
*** jawad_axd has quit IRC13:53
*** BjoernT_ has joined #openstack-nova13:56
*** mgoddard has quit IRC13:56
*** mgoddard has joined #openstack-nova13:58
*** BjoernT has quit IRC13:59
*** mriedem has joined #openstack-nova13:59
*** eharney has joined #openstack-nova14:01
*** boxiang has quit IRC14:18
*** boxiang has joined #openstack-nova14:18
efriedmriedem, melwitt, dansmith: Could I please get eyes on https://review.opendev.org/#/c/678237/ to unwedge u-c https://review.opendev.org/#/c/678207/14:20
*** eharney has quit IRC14:20
mriedem" We get to stop doing this soon, I promise." fool me once, shame on you, fool me twice...14:21
efriedbut mordred *also* promises14:21
sean-k-mooneydid we "fix" something like this recently14:22
efriedsean-k-mooney: you mean https://review.opendev.org/#/c/676495/ ?14:22
sean-k-mooneyyes14:22
efriedYou mean, every time we release a new sdk we have to re-fix our fixture that stubs out something private in sdk? Yes.14:22
efriedthe long term solution is to not stub out private sdk stuff.14:22
mordredsrrsly14:23
mordredyes14:23
efriedsdk is going to expose a fixture that we can simply import and use14:23
sean-k-mooneyyep seems like what we should be doing14:23
mordredthe sdk is going to give you the happy fixture14:23
mordredand you will use that14:23
mordredand it will be tested in the sdk14:23
mordredso we won't break you14:23
efriedright, and then as they change internals, they'll change the fixture, and we'll automatically "keep up".14:23
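The idea discussed above (the library ships its own test fixture, so consumers stop stubbing private internals) can be sketched as follows. Everything here is hypothetical: `Connection`, `_discover` and `connection_fixture` are stand-ins invented for illustration, not the real openstacksdk API.

```python
import contextlib
from unittest import mock


class Connection:
    """Stand-in for an SDK class; '_discover' is a private detail."""

    def _discover(self):
        # In a real SDK this would talk to the network.
        raise RuntimeError("would talk to the network")

    def endpoint(self):
        return self._discover()


@contextlib.contextmanager
def connection_fixture(url="http://fake.example/"):
    """Hypothetical public fixture shipped and tested by the SDK itself.

    Only this function knows about the private '_discover' method, so if
    the SDK renames it, the fixture changes in the same release.
    """
    with mock.patch.object(Connection, "_discover", return_value=url):
        yield


def consumer_code():
    # Code under test in the consuming project; uses only the public API.
    return Connection().endpoint()


if __name__ == "__main__":
    with connection_fixture():
        print(consumer_code())
```

The consuming project's tests only import `connection_fixture`; they never name `_discover`, so a new SDK release that reshuffles its internals does not require a matching fix on the consumer side.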
mriedemi might as well just fast approve this14:23
mriedemit's test only and holding up the u-c change14:23
*** eharney has joined #openstack-nova14:23
efriedyes, the u-c patch proves it works14:24
sean-k-mooneyefried: we try to do that with placement but we have been broken by the placement fixture in the past14:24
efriedand the nova patch proves it still works at 0.34.014:24
efriedsure, but less likely14:24
sean-k-mooneywe will need to similarly pull the sdk out of requirements and install it separately in tox right14:24
sean-k-mooneye.g. to consume it from master?14:24
efriedeh? Why would we need to do that?14:24
sean-k-mooneythe same reason we do it for the placement fixture14:25
efriedoh, no, tracking against real releases should be fine.14:25
sean-k-mooneyok14:25
efriedWhen we've needed to pre-test a feature, we twiddle the project list in .zuul.yaml14:25
sean-k-mooneyi guess for placement we do it to get early access to features14:25
efriedyes, which is actually probably not necessary anymore14:25
efriedactually probably not advisable.14:25
efriedwe should bring that up with cdent when he gets back.14:26
sean-k-mooneyyes the second one14:26
efried(which I thought was today, but I guess not)14:26
sean-k-mooneyi think it is still necessary for some features but we have a backlog of features to catch up on14:26
openstackgerritMerged openstack/nova master: Document archive_deleted_rows return codes  https://review.opendev.org/67781914:26
efriedsean-k-mooney: I actually don't remember the last time we did something in nova that *needed* to track against unreleased placement.14:27
efriedand iiuc having it set up this way allows us to do things like... merge a nova feature before placement is released accordingly, thereby breaking nova (cf. the aforementioned fixture issue).14:27
sean-k-mooneywell all of the nested stuff would have needed it if we did that this cycle.14:27
efriedyeah, but we can just release placement14:27
efriedreleases are pretty cheap.14:27
sean-k-mooneymaybe. i guess depends on is really the way to test this stuff14:28
efriedright, I'm saying, I'm not sure there's a good reason for us to have placement in required projects by default.14:29
sean-k-mooneyi wasn't referring to required projects specifically14:29
efriedthat's how we end up with nova running against master placement in the gate, nah?14:30
mriedemyou need placement in required projects in the gate jobs b/c it's a service not a library14:30
mriedemif you want to freeze the fixture or something, you'd need to make it a library i think14:30
openstackgerritMerged openstack/nova master: Change nova-manage unexpected error return code to 255  https://review.opendev.org/67783214:31
efrieddo we have keystone/glance/ironic/neutron/cinder in required projects too?14:31
mriedemiow, if placement isn't in required-projects, gate jobs aren't going to run with placement from master14:31
* efried looks14:31
mriedemyes14:31
openstackgerritMerged openstack/nova master: Document map_instances return codes in table format  https://review.opendev.org/67783514:31
efriedokay, maybe I'm thinking of placement as more of a lib14:31
dansmithartom: you working on the test failures? I haven't looked to see what they are, but I assume they're relevant since all of them seem to be failing14:32
mriedemhttps://github.com/openstack/devstack/blob/master/.zuul.yaml#L36814:32
artomdansmith, yeo14:32
artom*yep14:32
artomdansmith, wanted to ask you, we can't fully remove an RPC param until the next major version bump, right?14:33
*** zigo has joined #openstack-nova14:33
artomI wanted to remove destroy_disks from the rollback_live_migration call and have the method on the destination do all the deciding, since _live_migration_flags isn't host-dependent, it only cares about migrate_data.14:33
artomBut we have to keep compatibility code until RPC 6.0, right?14:33
dansmithartom: fully or otherwise, yeah14:34
*** phasespace has quit IRC14:35
artomdansmith, wait, otherwise? So not at all? RPC params are only additive?14:35
dansmithartom: yeah, meaning if you supported the letter and intent of the law in the current version, you can't stop honoring either until the next bump14:36
mriedemwhat does destroy_disks have to do with numa?14:36
dansmithartom: in that, you can't just accept and ignore some flag that has some meaning14:36
artommriedem, it's convoluted, but basically, we need to call rollback_live_migration_at_destination if we did a claim and we need to drop it14:37
artomExcept now, it's only called if do_cleanup is True (as determined by _live_migration_flags)14:37
mriedemyeah i remember that14:38
artomThere's a bunch of possibilities around that, but the end result is, everything becomes easier if we just always call rollback_live_migration_at_destination, and let it decide what cleanup needs to be done14:38
mriedemyou mean rollback_live_migration_at_destination right?14:38
artomInstead of passing it booleans over RPC14:38
artomExcept doing that would technically require removing the destroy_disks param14:38
dansmithartom: don't we pass that flag because there are cases where the other side can't decide what to do?14:39
*** mlavalle has joined #openstack-nova14:40
artomdansmith, might have been the case at one time, but doesn't look like it now. https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6946-L6979 doesn't depend on anything but migrate_data, which both sides have14:41
*** markvoelker has quit IRC14:41
artomUnless I'm really blind14:41
artomI guess it was done to save an RPC call if we didn't need it?14:41
artomsean-k-mooney, could you weigh in please: https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py@30714:43
dansmithhmm, okay.. I really thought we had serious issues with that logic,14:44
dansmithbut perhaps the passing of migrate data to more places made that better or something14:44
dansmithit's been a while14:44
dansmithmriedem probably remembers better14:44
*** markvoelker has joined #openstack-nova14:44
mriedemi would have to dig, _live_migration_cleanup_flags sucks14:44
dansmithyeah14:44
mriedemi had to workaround the same kind of thing where do_cleanup was False but i still needed to do network_api.setup_networks_on_host for the dest in the neutron case with port bindings14:45
mriedemhttps://review.opendev.org/#/q/I658e0a749e842163ed74f82c975bcaf19f9f7f0714:46
mriedemartom: so you just need the source to make a call to the dest to rollback the claim on failure right? regardless of shared storage or not14:47
artommriedem, yep14:47
artomI could always just add a new RPC method14:47
mriedemdansmith probably won't like what i'm about to suggest, but it might be better to just write a new rpc method,14:47
artomSeems weird to do that since we already have a thing called "rollback"14:47
mriedemrather than munge it into this steaming pile o shit14:47
artomYou don't like wobbly shit castles?14:48
artomIf dansmith's onboard I'm all for that14:48
mriedemwe'd only make this new call if we know, from the source, that there would be something to cleanup, right? meaning the instance has numa claims in the migrate_data object or something?14:48
dansmithmriedem: I dunno why you'd say that... making another call that does what we want instead of calling this one and expecting very specific behavior is better, IMHO14:48
mriedemdansmith: b/c of our prep_resize hullabaloo a couple of weeks back14:49
artom'tis settled.14:49
* artom hax14:49
dansmithmriedem: yeah, I mean, okay fair point, but that seemed a little more knife edge single-case to me14:49
*** lpetrut has quit IRC14:55
*** mkrai has joined #openstack-nova14:57
*** belmoreira has joined #openstack-nova14:57
sean-k-mooneyartom: oh hi yes15:03
mriedemif i re-use the existing prep_resize, i will either have to pass it a new param to control the logic or change it to check if the migration.cross_cell_move flag is set and True (which won't work for old computes, but that could be filtered out in the api or conductor based on compute service version), but either way i'd need to change it to say (1) don't reschedule and (2) don't rpc cast to resize_instance on the source15:03
sean-k-mooneyhad neutron open for some reason ill take a look15:03
*** boxiang_ has quit IRC15:03
*** boxiang_ has joined #openstack-nova15:03
*** boxiang_ has quit IRC15:04
*** boxiang_ has joined #openstack-nova15:05
*** tbachman has quit IRC15:05
*** boxiang_ has quit IRC15:06
*** boxiang_ has joined #openstack-nova15:07
*** boxiang_ has quit IRC15:08
*** boxiang_ has joined #openstack-nova15:08
dansmithartom: the new call would be "unclaim $this on the dest" yeah?15:09
sean-k-mooneywe used the migration_data for a few reasons, one of which was that move claims were only used for cold migration and we did not want to have to depend on them for sriov migration.15:09
artomdansmith, yep15:09
*** phillw has joined #openstack-nova15:10
sean-k-mooneywe already had the old vif bindings for the source and the new vif bindings for the dest in the migration data15:10
dansmithartom: and what if the source rebooted because of the failure and never calls that?15:10
sean-k-mooneyso it was simple to extend that to also have the pci info15:10
dansmithartom: I don't think I was clear about where you're persisting anything from the claim you're doing, other than in memory15:10
sean-k-mooneythe only thing we needed was the pci address which is stored in the vif's port profile15:11
artomdansmith, claims aren't persisted anywhere. The instance has a migration context though15:11
artomdansmith, the update resources periodic would clean up if we don't drop the claim "manually", I believe15:11
dansmithartom: right, the claims themselves aren't, but15:12
*** dr_gogeta86 has joined #openstack-nova15:12
dr_gogeta86hi15:12
dansmithyou mean the periodic would find that the migration wasn't associated with us anymore and clean up the in-memory state on the dest15:12
*** phillw has left #openstack-nova15:12
artomdansmith, wait, what do you mean by in-memory state?15:12
artomdansmith, it would update the available resources in the database...15:13
dansmithartom: that's my point, I'm not sure where we're accounting for these resources in the database15:13
sean-k-mooneyartom: in the host_cell object in the numa topology blob in the compute node15:13
artomdansmith, ^^ there you go :)15:14
dansmithwhere are we updating that?15:14
*** rajinir has joined #openstack-nova15:14
artomdansmith, lemme dig15:14
dansmithbecause I don't see it15:15
artomKnowing sean-k-mooney he might pull it out in seconds15:15
sean-k-mooneydansmith: when we claim the cpus and hugepages in pre live migration at dest we should update the db15:15
*** factor has joined #openstack-nova15:15
sean-k-mooneythat is where we claim the pci devices for sriov live migration15:15
dansmithsean-k-mooney: yeah, stop using words like "should" and handwaving with "the database".. because that doesn't help :)15:15
dansmithsure, pci devices are claimed with the manager, I get that15:16
dansmithI don't think we ever claimed the basic resources in the db before we removed all that stuff15:16
dansmithso I'm not sure where this is happening15:16
dansmithmriedem: do you know?15:16
artomdansmith, we don't care about the basic resources anymore, do we?15:17
artomThey're all in placement15:17
dansmithartom: no, that's my point exactly15:17
artomI was talking about specific CPUs and stuff15:17
sean-k-mooneywell i know that is where we do it for sriov. i know artom is calling the hardware.py module to generate the dest numa topology and i believe he is claiming the resources at that point. ill check his patches15:17
artomdansmith, ok, then I didn't understand your point15:17
dansmithsean-k-mooney: again, that's too vague15:17
dansmithartom: I'm just comparing what you're doing here to what we *used* to do with the basic resources, which is effectively what you're "claiming" that you're doing here15:18
mriedemdansmith: well, sort of - we'd update the usage values on the ComputeNode15:18
dansmithmriedem: usage for ram and cpu etc right?15:18
mriedemand disk yeah15:18
artomdansmith, I'm trying to find the specific call that mriedem's talking about15:18
mriedemhttps://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L105915:19
artomThere's a lot of layers, so it's taking a while15:19
dansmithmriedem: did we use that for actual scheduling though? I didn't think we did, but even still, I'm not sure what usage info we have for a complex numa15:19
mriedempretty sure the core/disk/ram filters looked at the _used values15:19
artomdansmith, so then I think https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L1123 would be the NUMA stuff15:19
mriedemhttps://github.com/openstack/nova/blob/stable/stein/nova/scheduler/filters/core_filter.py#L6515:20
artomThough I haven't traced how that's called, exactly15:20
dansmithmriedem: okay I didn't think they did, but it's been a long time15:20
dansmithmriedem: okay, yeah, host state is coming back to me a bit15:20
mriedemright, HostState is a wrapper over ComputeNode15:20
dansmithartom: that's a blank line15:20
mriedemi recently found out one of the weighers looks at one of these values as well15:20
artomdansmith, dammit, I was basing it on my local code15:20
artomsec15:20
mriedemah yes https://github.com/openstack/nova/blob/stable/stein/nova/scheduler/weights/cpu.py#L4315:21
dansmithmriedem: okay I've dumped all of that out of my head as no longer necessary, thanks15:21
dansmithokay, so _update_usage() is writing a new numa topo for the host15:21
artomdansmith, https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L109515:22
sean-k-mooneyyes the weighers do look at the numa info among other things15:22
dansmithartom: okay _get_usage_dict() is what I'm looking for I think15:23
*** alex_xu has quit IRC15:23
dansmithit looks like we write that compute node record a bunch of times?15:23
dansmithbecause we call it from update-from-migration, which generates usage and saves it, and would do that again for other migrations and other instances15:24
artomYeah, I can't imagine we were too efficient with that15:24
artomThat code is a whole bunch of _update_blah calling each other15:24
artomIt works though15:24
sean-k-mooneyso artom is actually creating the live migration claim here https://review.opendev.org/#/c/634606/54/nova/compute/manager.py@6458 in heck_can_live_migrate_destination15:24
*** alex_xu has joined #openstack-nova15:25
sean-k-mooney*check_can_live_migrate_destination15:25
artomAt the end of train, Windriver found that the update resources periodic task would remove the new instance's usage because IIRC instance.numa_topology wasn't being updated15:25
mriedemend of train?15:25
mriedemrocky?15:25
artomErr, stein15:25
artomI always mix those up15:25
artomLast cycle15:26
dansmithartom: so to summarize, this new call is a short circuit to the dest to remove the migration usage from the compute node record's topology, which would also happen if the periodic ran and noticed the migration was no longer relevant to it15:26
artomdansmith, yep15:27
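The two cleanup paths dansmith summarizes above (an explicit call from the source, or the periodic noticing a migration that is no longer relevant) can be sketched as follows. This is a minimal illustration, not nova's resource tracker: `DestinationTracker`, `drop_claim` and `periodic_cleanup` are hypothetical names, and the claim is modelled as a plain set of CPU ids.

```python
class DestinationTracker:
    """Hypothetical stand-in for the destination host's usage accounting."""

    def __init__(self):
        # migration id -> set of host CPU ids claimed for that migration
        self.claims = {}

    def claim(self, migration_id, cpus):
        self.claims[migration_id] = set(cpus)

    def drop_claim(self, migration_id):
        """Short circuit: the source asks the dest to drop the claim now."""
        return self.claims.pop(migration_id, None) is not None

    def periodic_cleanup(self, in_progress_ids):
        """Fallback: drop claims whose migration is no longer in progress."""
        stale = [m for m in self.claims if m not in in_progress_ids]
        for migration_id in stale:
            del self.claims[migration_id]
        return stale

    def used_cpus(self):
        used = set()
        for cpus in self.claims.values():
            used |= cpus
        return used


tracker = DestinationTracker()
tracker.claim("mig-1", [0, 1])
tracker.claim("mig-2", [2, 3])

# Source is alive and reports the failure: explicit rollback.
tracker.drop_claim("mig-1")

# Source rebooted and never called back: the periodic catches it later.
tracker.periodic_cleanup(in_progress_ids=set())
```

Either path ends with the same usage on the destination; the explicit call only shortens the window during which the stale claim is counted.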
dansmithI think part of my problem is, I've mixed up the new-world placement arrangement, with the counting quotas stuff, and then the few remaining things that still use this system15:27
dansmithheaven forbid we account for numbers in the same place15:27
openstackgerritMatt Riedemann proposed openstack/python-novaclient master: Clarify --migration-type migration value as cold migration  https://review.opendev.org/67859315:28
sean-k-mooneywell this is really still used to account for assignment of individual resources rather than count the number of resources available15:28
artomdansmith, what sean-k-mooney said. We need to track specific resources like CPU cores, which placement will never do, so this will never fully go away15:29
dansmithin the pci and numa case you mean15:29
mriedemwhat is "this"?15:29
dansmithsean-k-mooney: ^15:29
*** macz has quit IRC15:29
artomthis == the resource tracker, and claims15:29
sean-k-mooneyno we need to do it for pinned cpus too15:29
mriedemBBBZZZTTT15:29
sean-k-mooneyhugepages can be totally done in placement however15:29
artomWe need to say CPU 3 is used, not just "we have 1 CPU used out of 4"15:29
sean-k-mooneywe just need to do the work to move it15:29
mriedembefore removing the claims for them, the RT also "tracked" vcpu/ram/disk, just not at some individual device level15:29
mriedemso the RT did both before train15:29
artommriedem, yep15:29
mriedemhence me and dansmith asking "what is this?"15:30
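The distinction artom draws above, between "1 CPU used out of 4" and "CPU 3 is used", can be sketched as two kinds of accounting. Both classes below are hypothetical illustrations and do not mirror placement's or the resource tracker's actual data model.

```python
class CountingInventory:
    """Count-style accounting: only how many units are in use."""

    def __init__(self, total):
        self.total = total
        self.used = 0

    def claim(self, amount):
        if self.used + amount > self.total:
            raise ValueError("not enough capacity")
        self.used += amount


class PinnedCpuTracker:
    """Assignment-style accounting: which specific CPUs are in use."""

    def __init__(self, cpu_ids):
        self.free = set(cpu_ids)
        self.pinned = {}  # instance name -> set of host CPU ids

    def claim(self, instance, cpu_ids):
        wanted = set(cpu_ids)
        if not wanted <= self.free:
            taken = sorted(wanted - self.free)
            raise ValueError("CPUs already pinned: %s" % taken)
        self.free -= wanted
        self.pinned[instance] = wanted

    def release(self, instance):
        self.free |= self.pinned.pop(instance, set())


counts = CountingInventory(total=4)
counts.claim(1)          # knows only that 1 of 4 is used

pins = PinnedCpuTracker(cpu_ids=[0, 1, 2, 3])
pins.claim("vm-a", [3])  # knows that CPU 3 specifically is taken
```

A second instance asking for CPU 3 passes the count check (3 units remain) but is rejected by the pinning tracker, which is the case the counting model cannot express.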
*** hamzy_ is now known as hamzy15:30
sean-k-mooneyartom: there is a part that i'm not fully following in your code. i need to do a full review again but where are you calling into the hardware module to calculate the new numa topology/cpus/hugepages15:32
*** mtanino has joined #openstack-nova15:32
artomsean-k-mooney, not directly, it's hidden in the claim15:32
sean-k-mooneythat code definitely existed in the stein version but i have not traced it in the new version15:32
sean-k-mooneyok can i redeploy this again today and test it by the way15:33
artomsean-k-mooney, yeah, at first I wanted to avoid claims entirely because their code is, umm, interesting15:33
dansmithartom: you're calling that in the claim to validate page_size,15:33
artomBut if we don't want races we have to use them15:33
dansmithbut nothing else right?15:33
dansmithfrom the claim I mean15:33
artomdansmith, the page size is special because the "normal" virt.hardware code would allow a page size change15:33
dansmithhe probably wants _get_live_migrate_numa_info() in the libvirt patch15:33
dansmithhttps://review.opendev.org/#/c/634828/48/nova/virt/libvirt/driver.py15:34
artomSince it only goes by the flavor extra spec15:34
artomSo we need extra logic to block it15:34
dansmithyeah, I know15:34
dansmithhe's asking where the new guest numa topo gets created I think15:34
sean-k-mooneydansmith: artom should be calling the hardware module/RT to calculate a new numa topology, validate page size and claim pCPUs on the destination15:34
dansmithsean-k-mooney: yeah, I'm not sure about that last bit15:34
* artom searches15:34
dansmithvalidate page size yes, in the claim15:34
dansmithcalculate a new topology, in the above link15:35
dansmithbut not sure I see where the dest claims that, tbh15:35
sean-k-mooneydansmith: it should all be done in can_fit_numa_to_host or something like that let me see if i can find it15:35
artomsean-k-mooney, https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L13815:35
sean-k-mooney yes15:35
*** brinzhang_ has quit IRC15:35
sean-k-mooney hardware.numa_fit_instance_to_host15:35
sean-k-mooneyis the entry point to all the numa/hugepage/pinning code15:36
*** brinzhang_ has joined #openstack-nova15:36
dansmithoh, that's already there because of regular migrations right?15:36
*** boxiang_ has quit IRC15:36
sean-k-mooneyit looks at all those requests and returns a numa topology object that fulfils all of the requests or raises an exception15:36
artomdansmith, that's actually the non-move Claim, so even booting instances use it15:36
dansmithokay15:36
sean-k-mooneydansmith: it's there for cold migration15:36
dansmithyeah15:36
dansmithwhatever15:37
sean-k-mooneyif hardware.numa_fit_instance_to_host does not raise one of like 50 exceptions then the host is valid and it returns the numa topology15:38
artomexception.TooHardIGiveUpDoItYourself15:39
artomexception.TheCloudWasAMistak15:39
*** mtanino has quit IRC15:39
*** efried has quit IRC15:42
*** gyee has joined #openstack-nova15:45
*** boxiang has quit IRC15:48
*** tbachman has joined #openstack-nova15:49
*** xek has quit IRC15:51
*** efried has joined #openstack-nova15:51
*** boxiang has joined #openstack-nova15:51
*** boxiang has quit IRC15:52
*** boxiang has joined #openstack-nova15:52
*** factor has quit IRC15:53
*** factor has joined #openstack-nova15:54
*** slaweq has joined #openstack-nova15:56
*** mkrai has quit IRC15:56
sean-k-mooneyartom: :) that might be clearer than some of the messages we raise15:58
*** ivve has quit IRC16:00
artomsean-k-mooney, hah! exception.TurnBackWhileYouCan16:01
*** belmoreira has quit IRC16:03
*** ircuser-1 has joined #openstack-nova16:24
*** gbarros has quit IRC16:27
*** gbarros has joined #openstack-nova16:28
stephenfinbauzas: If you haven't already gone home, want to send this through? https://review.opendev.org/#/c/672336/16:30
bauzasstephenfin: sure, that will help not going crazy with my visa application form16:31
*** nicolasbock has joined #openstack-nova16:32
bauzasstephenfin: ouch, a bit hard to review given we need to make sure we don't regress on the test coverage16:32
bauzasdid efried check it too?16:32
efriedyeah, I did.16:33
stephenfinYeah, I haven't removed anything16:33
stephenfinIt's just moving things around16:33
stephenfinI add a good few things to that later so I wanted it broken up16:34
*** boxiang has quit IRC16:35
*** boxiang has joined #openstack-nova16:35
*** igordc has joined #openstack-nova16:37
*** factor has quit IRC16:39
bauzasok, let's make a trust bond16:43
*** ociuhandu has joined #openstack-nova16:44
*** boxiang has quit IRC16:45
*** boxiang has joined #openstack-nova16:45
efriedfungi: stephenfin is quoting you here https://review.opendev.org/#/c/677969/ as agreeing we don't need eggs. If that's so, would you mind throwing a +1 on there for me?16:47
fungiefried: sure, just a sec16:47
efriedfungi, stephenfin: okay, just as a sanity check, I codesearched: http://codesearch.openstack.org/?q=egg%3Dnova&i=nope&files=&repos=16:50
*** ociuhandu has quit IRC16:51
fungias far as i know, that's merely how you tell pip what to name the package it's installing. in modern usage it's actually going to build a wheel from the code there16:51
fungibecause it's not starting from an actual package, pip doesn't know what the package name for that repository should be16:52
fungii believe the name of the option is a bit of unfortunate legacy naming for the sake of backwards compatibility16:52
efriedo...kay16:52
fungii'll double-check the reference materials to confirm16:53
*** slaweq has quit IRC16:55
*** tesseract has quit IRC16:57
efrieddansmith: Are we allowed to rename a field like this https://review.opendev.org/#/c/678447/3/nova/db/sqlalchemy/models.py or do we have to burn it and just create a new one?17:00
dansmithefried: no you can't do that17:00
*** eharney has quit IRC17:00
*** slaweq has joined #openstack-nova17:00
dansmithefried: well, wait, nothing has ever been written there right?17:00
efriedright17:01
dansmiththat one specifically17:01
efriedright17:01
dansmithtechnically you could although it's... icky17:01
efriednever used for anything at all17:01
dansmiththis is why, btw, we should never merge db changes until the patches above are at least close17:01
efriedhttps://review.opendev.org/#/c/678447/3/nova/db/sqlalchemy/migrate_repo/versions/401_add_resources.py help?17:01
sean-k-mooneyefried: i was discussing this with alex_xu this morning17:01
efriedyeah, I still have the bruises from the original smackdown dansmith17:01
sean-k-mooneyi was suggesting just having another migration to nuke it and add the new field17:01
sean-k-mooneygiven that nothing has used it and we have not released with it17:02
sean-k-mooneythe previous version did this which was worse https://review.opendev.org/#/c/678447/2/nova/db/sqlalchemy/migrate_repo/versions/398_add_resources.py17:03
efrieddansmith: I'll -2 this for reasons stated if you want to review/approve it.17:03
*** slaweq_ has joined #openstack-nova17:05
*** slaweq has quit IRC17:05
*** ociuhandu has joined #openstack-nova17:09
fungiefried: stephenfin: https://pip.pypa.io/en/stable/user_guide/#requirements-files #4 has an example of that syntax, and the terminology is still semi-relevant since sdists include a packagename.egg-info directory for their metadata, but the egg_info block in setup.cfg looks like it's related to old d2to1 functionality related to some sphinx extension for reporting version information (which makes sense17:12
fungigiven the names of the options in it)17:12
fungithat has appeared in nova's setup.cfg since 2010 according to git, and seems to have been cargo-culted into numerous projects who copied their setup.cfg from nova's17:12
*** ociuhandu has quit IRC17:13
efriedshocking17:13
fungiit seems to be entirely vestigial at this stage, and a lot of newer projects don't include it (those newer projects which do include it have, i suspect, just copied it from older projects not knowing any better)17:13
efriedokay, well, I suppose it's probably an easy enough thing to revert & backport if we find out in four years that we broke somebody.17:14
efriedThanks fungi17:14
fungiyeah, if all the copying started from nova, then maybe the undoing also has to start there ;)17:14
fungimordred and/or dhellmann may have sufficient context to explain what that block was actually for once upon a time17:15
fungithe fact that it references svn dates it nicely17:15
*** shilpasd has quit IRC17:16
mordredfungi: what did I do?17:18
mordredoh that. I'd love if it went away17:19
fungimordred: discussing the necessity of the [egg_info] block in setup.cfg17:19
fungi(or lack of necessity)17:19
mordredoh - wait - I thought it was the boilerplate in setup.py for eventlet ... one sec17:19
fungimordred: egg_info.{tag_build,tag_date,tag_svn_revision} was something to do with a sphinx extension?17:20
fungii tried piecing together old commit messages and changelog entries from d2to1 to work out why it was there17:21
mordredah. I believe that is some text that setuptools would write to setup.cfg if it wasn't there17:21
mordredso we added it to avoid it being added and being a diff when running jobs17:22
*** martinkennelly has quit IRC17:22
dansmithefried: did you mean +2?17:22
mordredI'm guessing setuptools has since stopped writing random crap to setup.cfg senselessly :)17:22
fungifwiw i've not seen setuptools write anything into setup.cfg files in modern usage17:22
mordredyeah17:22
mordredit's possible it was a distutils thing even17:22
efrieddansmith: I mean I'm putting a -2 to hold the series, so that you can review and +2 it without fear of it accidentally merging like last time.17:22
mordredlike - that's some OLD cruft17:22
dansmithefried: ah, cool17:23
fungithanks mordred!17:23
*** dtantsur is now known as dtantsur|afk17:25
efriedmriedem: would I be correct in surmising that "cold migration" existed in the world before "live migration"17:26
*** ralonsoh has quit IRC17:26
sean-k-mooneyefried: for most hypervisors yes17:26
sean-k-mooneywhen nova was created both were a thing17:27
efriedI can't think of any other reason why "migration" means "cold migration" in the API.17:27
sean-k-mooneybut i think live migration was added later17:27
sean-k-mooneycold migration works in more cases than live migration can17:27
sean-k-mooneyfor example you could cold migrate an ironic server or rsd system in principle17:28
sean-k-mooneylive migrating either would be much harder even if it was possible which it is not today17:28
sean-k-mooneyso having cold migration which is a lower barrier to entry be the default makes sense17:28
sean-k-mooneyefried: i dont have a link but extras are intended for optional dependencies or tools that are needed for building but not using a project. we do not need postgres or mysql to run our tests we can use sqlite to execute all the db tests17:31
sean-k-mooneywe opportunistically try to use mysql or postgres if they are installed but its not a requirement17:31
sean-k-mooneywhat stephen's change would allow you to do in theory is git clone nova and use tox to run the tests without having to install the mysql or postgres client17:32
sean-k-mooneyalthough i think https://review.opendev.org/#/c/677475/3/tox.ini might undo that last bit17:32
sean-k-mooneyso i like the patch in general but if https://review.opendev.org/#/c/677475/3/tox.ini is still forcing them to be installed i dont see any point in doing it17:33
sean-k-mooneyideally we would not force them to be installed and instead modify the gate job to install them17:33
mriedemefried: i think you might be reading too much into the names of those Migration.migration_type values17:36
mriedemyay in the beginning there was migrate and migrateLive, and thou said to thee,17:37
mriedemlet there be a type https://review.opendev.org/#/c/181110/6/nova/objects/migration.py17:37
mriedemand it was so17:37
sean-k-mooneyis it bad that my mind goes to monty python first17:38
sean-k-mooneyrobustify-evacuate you say. how did that go17:40
dansmithsean-k-mooney: you should have seen it before robustification17:43
*** macz has joined #openstack-nova17:44
sean-k-mooneyi unfortunately did see some of it. but i was only a year or so into dealing with openstack at that point and was still mainly looking at neutron at that point17:45
sean-k-mooneywe used to pass a lot more dicts of random garbage around back then17:45
mriedemand lo before the times of the great robustification there was much deleting of servers and gnashing of teeth17:46
sean-k-mooneya lot of the legacy systems in nova are kind of like rabbitmq. sure they can cause you pain but they generally work well enough that its more pain to replace them with something else so we dont. better the dragons you know and all that17:48
*** macz has quit IRC17:49
*** macz has joined #openstack-nova17:52
*** eharney has joined #openstack-nova18:03
*** boxiang has quit IRC18:06
*** boxiang has joined #openstack-nova18:06
*** boxiang has quit IRC18:09
*** boxiang has joined #openstack-nova18:09
*** tbachman has quit IRC18:09
*** boxiang has quit IRC18:10
*** boxiang has joined #openstack-nova18:10
*** zhubx has joined #openstack-nova18:22
*** boxiang has quit IRC18:25
artom*sigh*18:32
artomnew_ != _new18:32
efriedartom: neither one is reserved or anything, right?18:39
*** trident has quit IRC18:40
artomefried, hah, no, this isn't Python, it's migration context prefix grumbling18:40
*** trident has joined #openstack-nova18:40
*** gbarros has quit IRC18:41
dansmithsean-k-mooney: you know we're looking for your input here right? https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py18:42
efrieddansmith: In order to compare two OVOs, it is (apparently) necessary to override __eq__. Is nova.objects.numa.all_things_equal the accepted way to do this?18:44
openstackgerritArtom Lifshitz proposed openstack/nova master: Introduce live_migration_claim()  https://review.opendev.org/63566918:44
openstackgerritArtom Lifshitz proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482718:44
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for augmenting migrate_data with info from claims  https://review.opendev.org/63482818:44
openstackgerritArtom Lifshitz proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source  https://review.opendev.org/63522918:44
openstackgerritArtom Lifshitz proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460618:44
openstackgerritArtom Lifshitz proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002118:44
openstackgerritArtom Lifshitz proposed openstack/nova master: Functional test for NUMA live migration  https://review.opendev.org/67259518:44
dansmithefried: well, I don't know without looking, but let me tell you why it's not built in:18:44
dansmithefried: since OVOs can have only part of the database state loaded, we don't provide a default equality operator because determining equality with missing things may be doable in some cases and not others18:45
* artom writes a func test for rollback while CI runs on ^^^18:45
dansmithefried: and with a nested object the parent and child may have differing rules there18:45
efrieddansmith: ack, thanks for the explanation.18:46
mriedemtrue story, i was writing some ovo compare stuff for cross cell resize and had masked bugs because something was passing b/c one of the objects didn't have a field loaded18:46
dansmithefried: so for an object with primitive field types, then yes that should work, but if it has any nested objects, it won't18:46
openstackgerritMerged openstack/nova master: Update SDK fixture for openstacksdk 0.35.0  https://review.opendev.org/67823718:46
dansmithefried: for some objects, "does the row id match" is good enough, and for others, you need to compare the contents in more depth18:47
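dansmith's point about partially loaded objects, and the masked bug mriedem describes, can be sketched with a toy object model. This does not use oslo.versionedobjects; `PartialObject`, `lenient_equal` and `strict_equal` are hypothetical names chosen for illustration only.

```python
class UnloadedField(Exception):
    """Raised when a comparison needs a field that was never loaded."""


class PartialObject:
    """Toy object whose fields may or may not have been loaded."""

    fields = ("id", "host", "numa_topology")

    def __init__(self, **loaded):
        self._loaded = dict(loaded)

    def is_set(self, name):
        return name in self._loaded

    def get(self, name):
        return self._loaded[name]


def lenient_equal(a, b):
    """Compare only fields loaded on both sides; can mask differences."""
    return all(
        a.get(f) == b.get(f)
        for f in a.fields
        if a.is_set(f) and b.is_set(f)
    )


def strict_equal(a, b):
    """Refuse to answer unless every field is loaded on both sides."""
    for f in a.fields:
        if not (a.is_set(f) and b.is_set(f)):
            raise UnloadedField(f)
    return all(a.get(f) == b.get(f) for f in a.fields)


full = PartialObject(id=1, host="src", numa_topology="cell0")
partial = PartialObject(id=1, host="src")  # numa_topology never loaded

# Reports equality even though one side is missing a field.
looks_equal = lenient_equal(full, partial)
```

Neither policy is right for every object, which is one way to read why no default equality operator is provided: the object's author has to decide what an unloaded field means for comparison.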
*** brault has joined #openstack-nova18:51
efrieddansmith: so for this Resource thing, we're going to be asking the virt driver to populate the (local) resources associated with a provider every time RT update runs (so on periodics, instance ops, etc). I would think we want to avoid writing the same records back to the database every iteration.18:53
efriedWhich means we need to be able to compare with existing18:53
efriedwhat decides how much of the object gets loaded from the db?18:53
dansmithefried: the code that loads it from the db18:54
dansmithefried: this is a performance-hating blob-based object right? in that case, it always comes out whole18:54
efriedThe touchpoint with the db is that InstanceExtra gets a resources field which is of type ResourceList18:55
*** brault has quit IRC18:56
efriedso ultimately I guess we need to be able to compare a ResourceList we get/build from the virt driver with whatever's in Instance.extras18:56
*** markvoelker has quit IRC18:57
*** xek_ has joined #openstack-nova18:58
*** ivve has joined #openstack-nova18:59
sean-k-mooneydansmith: yes although i was not sure if we already covered it on irc19:00
sean-k-mooneyill respond on the review https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py19:01
*** slaweq_ has quit IRC19:01
*** tbachman has joined #openstack-nova19:08
*** markvoelker has joined #openstack-nova19:11
artomIn an artificial func test scenario with 2 compute hosts, we boot 2 instances. The first one can only fit on one host, the second one on either. Can we be 100% it will be placed on the host that does *not* have the first instance?19:11
artom100% sure?19:11
artomIOW, in that contrived scenario, how deterministic is scheduling?19:11
dansmithif it can't fit, it should be fully deterministic19:12
artomdansmith, the first instance, yeah. But the second instance can go on either host.19:12
dansmiththere are a couple tests that add a weigher which always prefers the first host to help it be deterministic in the order it selects even the first one19:12
dansmithartom: so you're asking if the second one, which will *still* fit will go to one specifically?19:12
*** tbachman has quit IRC19:13
sean-k-mooneyif you boot it in that order then its deterministic19:13
artomdansmith, yeah. The scheduler's choice is "host A with another instance" or "host B with no instances"19:13
dansmithsean-k-mooney: not if host_subset_size != 119:13
sean-k-mooneyif you want to force it to the other host i think we added a --host option recently19:13
dansmithartom: I would rather you write the test to be more specific than that if possible19:13
artomdansmith, so force it with a weigher to a host?19:14
artomIt doesn't change the mechanics of the test, just the numbers that we assert.19:14
sean-k-mooneyis the scenario that the first instance can only fit on the first host and the second can fit on either (but not after the first is booted)19:14
artomsean-k-mooney, no, the second can fit on either regardless of the first instance.19:14
sean-k-mooneyoh19:14
dansmithartom: yeah, I don't really like altering the behavior in a functional test, but scheduling determinism bugs in functional tests are annoying,19:15
dansmithartom: especially if they work now, but start to fail the obscure assumptions made in a year19:15
sean-k-mooneythen use https://github.com/openstack/nova-specs/blob/master/specs/train/approved/add-host-and-hypervisor-hostname-flag-to-create-server.rst19:15
artomdansmith, well, what I currently have is just an if19:15
artomif hostA; assert hostA things; else assert hostB things19:15
artom(really just the number of the pinned CPUs)19:15
dansmithartom: meaning the test tolerates it being either place?19:15
sean-k-mooneyi think we merged the code for that a few weeks ago right19:15
artomdansmith, exactly19:15
dansmithartom: kinda seems like that's begging for testing both scenarios, no?19:16
dansmithartom: can you boot a third like the second that will have to go in the spot not taken by the second instance and then assert it all?19:16
artomdansmith, yeah, but why?19:16
artomThis is all just a "forcing" function to make sure that when I live migrate it, the XML will have to get updated19:16
dansmithartom: well, if you put it on the second host, I'll ask if you're sure it would have worked if co-located19:17
sean-k-mooneywe are litrally adding a feature this cycle to allow this.19:17
dansmithartom: and if it is co-located, I'll ask if it would have worked if by itself :D19:17
dansmithartom: ah, right, forgot about the migration19:17
dansmithartom: so you can migrate it to and fro and cover both cases right?19:17
sean-k-mooneyand its marked as implemented https://blueprints.launchpad.net/nova/+spec/add-host-and-hypervisor-hostname-flag-to-create-server19:17
dansmithi.e. migrate twice instead of boot thrice19:17
artomIt's about forcing the migration to move the server to a cell that has different CPU numbers19:18
*** tbachman has joined #openstack-nova19:18
sean-k-mooneyyou are trying to force the xml to be regenerated19:18
artomYeah19:18
*** damien_r has joined #openstack-nova19:18
dansmithartom: right, so two migrations will cover the bases of migrating to an empty host which could take anything and to a host where it can't (as long as it didn't choose the same by luck)19:18
artomThe idea is to have a server with 2 NUMA cells: 2 CPUs and 3 CPUs19:18
artomAnd another with just a single cell: 2 CPUs19:19
sean-k-mooneyso why not use the host/hypervisor_hostname parameters on server create19:19
artomBut such that the 2-CPU cells have different physical CPUs in them19:19
sean-k-mooneyit will allow you to select the host and it will use the scheduler to validate it is suitable19:19
artomSo host2: [0, 1], host3: [0, 1, 2] [3, 4]19:19
artom"Fill" the 3-CPU cell with a first instance19:20
artomThen boot the 2-CPU instance19:20
dansmithartom: is this for a tempest test or a functional test?19:20
artomI don't care where it goes, as long as I live migrate it, its CPU pins have changed19:20
artomdansmith, functional19:20
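The layout artom describes (host2: [0, 1], host3: [0, 1, 2] [3, 4]) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `pin` helper and its "first cell that fits, lowest free CPUs" rule are hypothetical and are not nova's actual scheduling or pinning code. It shows why the 2-CPU instance ends up with different pins depending on the host, so a live migration in either direction has to rewrite the XML.

```python
# Toy model of the scenario above; `pin` is a hypothetical helper,
# not nova's real NUMA fitting logic.
from typing import List, Optional, Set


def pin(cells: List[Set[int]], used: Set[int], wanted: int) -> Optional[List[int]]:
    """Return the lowest free CPUs of the first cell that can hold `wanted`."""
    for cell in cells:
        free = sorted(cell - used)
        if len(free) >= wanted:
            return free[:wanted]
    return None


# Host topologies as given in the discussion.
host2 = [{0, 1}]
host3 = [{0, 1, 2}, {3, 4}]

# "Fill" the 3-CPU cell on host3 with a first instance.
filler_pins = pin(host3, set(), 3)

# The 2-CPU instance can only use the remaining cell on host3,
# or the single cell on host2.
pins_on_host3 = pin(host3, set(filler_pins), 2)
pins_on_host2 = pin(host2, set(), 2)
```

Whichever host the 2-CPU instance lands on first, the pins on the other host differ, which is the "forcing function" being discussed.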
dansmithokay, I think we have other tests that migrate twice just to make sure we hit both hosts, regardless of the ordering19:21
dansmithwhich I prefer to modifying the scheduler, if it will cover us19:21
dansmithtargeting the build works too I guess, but it's also not the more realistic case you're trying to replicate19:21
sean-k-mooneywe currently don't have any live migration tests that succeed and use the fakelibvirt driver in the functional tests19:21
artomYey, I'm first :(19:22
sean-k-mooneyso ignoring the lack of live migration tests i don't see what the big deal is19:22
sean-k-mooneyyou start each compute node with a different config with non-overlapping cpus19:22
sean-k-mooneyin the dedicate_cpu_set19:23
sean-k-mooneythen spawn a single vm19:23
sean-k-mooneyand migrate it to the other node19:23
sean-k-mooney and you're done19:23
artomUhh, literally *ALL* tests are failing with "keystoneauth1.exceptions.auth_plugins.MissingAuthPlugin: An auth plugin is required to determine endpoint URL"19:23
artomI think devstack bork19:23
*** sridharg has quit IRC19:23
sean-k-mooneyyou need to rebase19:23
sean-k-mooneythis is fixed on master19:23
artomack19:24
*** efried has quit IRC19:24
artomWhen I push the new func test19:24
sean-k-mooneyare you going to do ^19:24
artomI gotta run for a daughter school thing (yes, she's starting school!)19:24
artomsean-k-mooney, can't have different configs in func tests19:25
sean-k-mooneyyes you can19:25
artomNope, CONF is global19:25
sean-k-mooneyyou need to change the config flag and start the compute agent serially19:25
sean-k-mooneyit does not have to be with mocks and we read most of the values on start up19:26
dansmithsean-k-mooney: that only works for a subset of cases19:26
* artom -> away19:26
sean-k-mooneydansmith: yes but i think it will work in this case. i do not believe we parse the config on every iteration of the periodic tasks.19:30
sean-k-mooneyif we do we should stop and cache it. we don't support changing this config at runtime.19:31
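The disagreement above (global CONF versus values read once at startup) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `CONF` is a plain dict rather than oslo.config, `FakeCompute` is a hypothetical stand-in for a compute service, and `dedicated_cpus` is a made-up option name. It shows that starting services serially works for values cached at init, while any code path that re-reads the global sees whatever was set last.

```python
# Hypothetical stand-ins: a dict instead of oslo.config, a fake service class.
CONF = {"dedicated_cpus": ""}


class FakeCompute:
    def __init__(self, host):
        self.host = host
        # Value read once at startup and cached on the instance.
        self.dedicated_at_start = CONF["dedicated_cpus"]

    def cached_view(self):
        return self.dedicated_at_start

    def live_view(self):
        # Re-reads the global on every call, so it sees later changes.
        return CONF["dedicated_cpus"]


# Change the flag, then start each service in turn.
CONF["dedicated_cpus"] = "0-1"
host_a = FakeCompute("host_a")
CONF["dedicated_cpus"] = "2-3"
host_b = FakeCompute("host_b")
```

In this sketch `host_a.cached_view()` keeps its own value, but `host_a.live_view()` returns host_b's setting, which is the "only works for a subset of cases" caveat.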
*** damien_r has quit IRC19:31
dansmithit's not just the periodic that matters of course19:35
dansmith% fgrep 'CONF.' -r nova/virt/hardware.py | wc -l19:36
dansmith919:36
sean-k-mooneytrue but i think it would work in this case. the simple way to test this would be to just try it19:36
sean-k-mooneyat least 7 of those should only be checked at startup i think19:37
dansmithI'm just saying, this stuff is spread everywhere19:38
sean-k-mooneyi think all of them are just used in init19:38
sean-k-mooneyya it is19:38
sean-k-mooneyi just think/hope this is simpler to do than it first seems19:38
sean-k-mooneyif we test this simply with conf settings that would be better than creating a complicated test setup19:39
sean-k-mooneyanyway i replied in https://review.opendev.org/#/c/635669/39/nova/compute/resource_tracker.py@304 not sure if that is what ye wanted.19:44
sean-k-mooneyill redeploy with artoms code shortly and start testing it again.19:45
*** gbarros has joined #openstack-nova19:51
*** mlavalle has quit IRC19:54
*** xek_ has quit IRC20:00
*** efried has joined #openstack-nova20:02
TheJuliamriedem: remember that seemingly weird rebalance race on stable/stein that we spotted with ironic last week? I  just spotted it on master branch testing https://d01b2e57f0a56cb7edf0-b6bc206936c08bb07a5f77cfa916a2d4.ssl.cf5.rackcdn.com/678298/4/check/ironic-tempest-ipa-wholedisk-direct-tinyipa-multinode/92c65ac/compute1/logs/screen-n-cpu.txt.gz <--  nova 19.1.0 :\20:02
sean-k-mooneyTheJulia: implying that backporting the other fix may not resolve the issue20:04
sean-k-mooneyassuming on master it still had the db issue20:04
TheJuliaindeed :(20:05
sean-k-mooney those errors seem to be indicating that the RP is not found20:06
*** mlavalle has joined #openstack-nova20:06
sean-k-mooneyim not sing the db conflict on the compute node uuid20:06
TheJuliaseeing?20:06
*** efried has quit IRC20:07
sean-k-mooneyyes :)20:07
sean-k-mooneyi dropped the ee it seems20:07
TheJuliano worries, my brain does similar things20:08
TheJuliaso without the conflict.... hmmmm20:08
* TheJulia wonders if now is a good time for scotch20:08
sean-k-mooneybut it might be that the other compute node is not recreating the rp20:09
sean-k-mooneyor it is but we race20:10
TheJuliaI'll go with it is but we race20:11
* TheJulia will get something to sip on and continue digging shortly20:12
*** efried has joined #openstack-nova20:13
*** xek has joined #openstack-nova20:13
mriedemumm yeah doesn't look like the same issue, no db conflicts on restart of the service that i see20:15
mriedemi see this:20:15
mriedemAug 26 18:41:38.367990 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode eabf1567-cb9f-4a93-b77e-44a80ddd50b0 moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-001044331920:15
mriedemAug 26 18:41:38.818412 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: INFO nova.compute.resource_tracker [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] ComputeNode 61dbc9c7-828b-4c42-b19c-a3716037965f moving from ubuntu-bionic-rax-ord-0010443317 to ubuntu-bionic-rax-ord-001044331920:16
mriedemand then a bunch of errors about resource provider not found20:16
mriedemAug 26 18:41:38.992017 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.scheduler.client.report [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] [req-208b7275-abc4-4445-9551-8edbce49e0e1] Failed to retrieve traits from placement API for resource provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f. Got 404: {"errors": [{"status": 404, "request_id": "req-208b7275-abc4-4445-9551-8edbce49e0e1", "detail"20:17
mriedemhe resource could not be found.\n\n No resource provider with uuid 61dbc9c7-828b-4c42-b19c-a3716037965f found: No resource provider with uuid 61dbc9c7-828b-4c42-b19c-a3716037965f found  ", "title": "Not Found"}]}.20:17
sean-k-mooneyit looks like the periodic task that cleans up allocations ran before the compute node rp was recreated by the other compute service20:17
mriedemthe traceback hits get_provider_tree_and_ensure_root which should ensure the provider exists, so i'm not sure why it wouldn't,20:18
mriedemunless it's only looking at its local provider tree cache and that is stale or something20:18
TheJuliathat does seem to possibly be the case. just glancing at records for eabf1567-cb9f-4a93-b77e-44a80ddd50b0, it is all over both compute logs20:23
TheJuliain the same time window it looks like...20:24
*** david-lyle has quit IRC20:25
*** slaweq has joined #openstack-nova20:29
openstackgerritMerged openstack/nova master: Remove descriptions of nonexistent hacking rules  https://review.opendev.org/67846220:29
TheJuliahmmmmm.... I wonder20:36
*** jmlowe has quit IRC20:38
*** dklyle has joined #openstack-nova20:39
openstackgerritMerged openstack/nova master: tests: Split NUMA object tests  https://review.opendev.org/67233620:41
TheJuliaI'm starting to wonder, at least digging at it, if things didn't moderately correct themselves at least resource tracker wise as time went on. Looks like everything changed to no longer reserved around 18:46, placement reports still one node reserved 3 minutes later20:47
*** nweinber has quit IRC20:55
*** slaweq has quit IRC21:01
*** macz has quit IRC21:05
*** lpetrut has joined #openstack-nova21:07
*** CeeMac has quit IRC21:07
*** xek has quit IRC21:07
mriedemi think this is bogus21:09
mriedemAug 26 18:41:38.992017 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: ERROR nova.scheduler.client.report [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] [req-208b7275-abc4-4445-9551-8edbce49e0e1] Failed to retrieve traits from placement API for resource provider with UUID 61dbc9c7-828b-4c42-b19c-a3716037965f. Got 404: {"errors": [{"status": 404, "request_id": "req-208b7275-abc4-4445-9551-8edbce49e0e1", "detail"21:09
mriedemhe resource could not be found.\n\n No resource provider with uuid 61dbc9c7-828b-4c42-b19c-a3716037965f found: No resource provider with uuid 61dbc9c7-828b-4c42-b19c-a3716037965f found  ", "title": "Not Found"}]}.21:09
mriedembecause right before that we get inventories and aggregates for the provider21:09
mriedemAug 26 18:41:38.881026 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing inventories for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:761}}21:09
mriedemAug 26 18:41:38.953685 ubuntu-bionic-rax-ord-0010443319 nova-compute[747]: DEBUG nova.scheduler.client.report [None req-a894abee-a2f1-4423-8ede-2a1b9eef28a4 None None] Refreshing aggregate associations for resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f, aggregates: None {{(pid=747) _refresh_associations /opt/stack/nova/nova/scheduler/client/report.py:770}}21:09
mriedemunless at the same time, the other host is deleting the resource provider21:12
mriedembecause the driver no longer reports it there21:12
sean-k-mooneywell that could happen. we are not synchronising any of this21:12
mriedemAug 26 18:41:38.832749 ubuntu-bionic-rax-ord-0010443317 nova-compute[19290]: INFO nova.compute.manager [None req-d5a9c4b6-f197-4f6c-8b12-8f736bbdb11c None None] Deleting orphan compute node 6 hypervisor host is 61dbc9c7-828b-4c42-b19c-a3716037965f, nodes are set([u'1d23263a-31d4-49d9-ad68-be19219c3bae', u'be80f41d-73ed-46ad-b8e4-cefb0193de36', u'f3c6add0-3eda-47d9-9624-c1f73d488066', u'2c909342-b5dc-4203-b9cb-05a8f29c6c35', u21:12
mriedem1f5d8-8b39-4d03-8423-8e8404128ece']) Aug 26 18:41:38.962237 ubuntu-bionic-rax-ord-0010443317 nova-compute[19290]: INFO nova.scheduler.client.report [None req-d5a9c4b6-f197-4f6c-8b12-8f736bbdb11c None None] Deleted resource provider 61dbc9c7-828b-4c42-b19c-a3716037965f21:13
mriedem^ is the other host21:13
mriedemyup, 18:41:3821:13
mriedemso the new host starts refreshing its RT, the old host deletes the provider, then the new host blows up in the RT, but eventually corrects itself on the next run21:14
*** trident has quit IRC21:14
mriedemdId SoMeOnE sAy EtCd?!?!21:14
sean-k-mooney... and follow cinders lead21:14
mriedemi'm joking21:15
sean-k-mooneyi know21:15
sean-k-mooneya distributed lock could prevent this by deadlocking the system so we don't get there21:16
sean-k-mooneywe probably need to catch the error and if it happens remove the compute node from whatever data structure we were iterating over and move on21:17
mriedemi think the bug is that we stash the node on the new host in RT.compute_nodes, then hit the failure, and the next pass in the RT periodic task we see the node is already in the RT.compute_nodes and don't attempt to go through the _update and flush to placement flow, so the new host doesn't recreate the provider until inventory changes21:18
mriedemso find the thing on the new host and store it here https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L51321:18
mriedemblow up here https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L51621:19
mriedemand on the next pass find the node here https://github.com/openstack/nova/blob/71478c3eedd95e2eeb219f47460603221ee249b9/nova/compute/resource_tracker.py#L54621:19
mriedembut don't call _update21:19
mriedemuntil inventory changes21:19
mriedemhence the RP not showing up again for awhile21:19
mriedemTheJulia: ^21:19
mriedema la https://review.opendev.org/#/q/I9fa1d509a3de405d6246fb8670612c65c10cc93b21:19
*** trident has joined #openstack-nova21:20
*** markvoelker has quit IRC21:21
*** lpetrut has quit IRC21:22
mriedemoh actually,21:22
mriedembecause we put the provider back into the cache and then failed, we don't attempt to create it later in the new host either21:22
mriedemso the provider tree cache is hiding the fact the provider doesn't actually exist anymore21:23
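The sequence mriedem walks through can be reproduced with a toy model. All names here (`FakePlacement`, `FakeNewHost`, `periodic`, the shortened UUID) are hypothetical stand-ins for placement, RT.compute_nodes and the ProviderTree cache; none of this is nova's real code. The sketch only shows the shape of the race: the node and provider get cached, the other host deletes the provider mid-pass, and later passes skip the update because both caches still claim everything is in place.

```python
# Hypothetical stand-ins for placement, RT.compute_nodes and the provider cache.
class FakePlacement:
    def __init__(self, providers):
        self.providers = set(providers)


class FakeNewHost:
    def __init__(self, placement):
        self.placement = placement
        self.compute_nodes = set()    # nodes this host already tracks
        self.provider_cache = set()   # providers this host believes exist

    def periodic(self, node, race=None):
        if node in self.compute_nodes:
            # Already tracked: no update/flush to placement on this pass.
            return "skipped"
        self.compute_nodes.add(node)
        if node not in self.provider_cache:
            self.placement.providers.add(node)  # create if missing
            self.provider_cache.add(node)
        if race is not None:
            race()  # the old host acts in the middle of our pass
        if node not in self.placement.providers:
            raise LookupError("404: no resource provider %s" % node)
        return "updated"


node = "61dbc9c7"
placement = FakePlacement([node])
new_host = FakeNewHost(placement)

try:
    # Old host deletes the provider while the new host is mid-refresh.
    first_pass = new_host.periodic(
        node, race=lambda: placement.providers.discard(node))
except LookupError:
    first_pass = "failed"

second_pass = new_host.periodic(node)
cache_says_exists = node in new_host.provider_cache
actually_exists = node in placement.providers
```

After the failed first pass, the second pass is skipped and the cache still reports the provider as present even though it is gone from the fake placement.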
mriedemefried: fun stuff ^21:24
mriedemguess i'll just report a bug for now21:25
*** trident has quit IRC21:25
*** igordc has quit IRC21:26
efriedmriedem: Under what circumstances do we put a provider into the cache and then fail?21:27
efriedis this during update_from_provider_tree somehow?21:28
mriedemyeah, it will probably be more clear when i post the bug report21:28
* efried awaits21:29
*** trident has joined #openstack-nova21:33
*** tonyb[m] has quit IRC21:35
*** dosaboy has joined #openstack-nova21:35
*** dosaboy has quit IRC21:39
*** dosaboy has joined #openstack-nova21:39
*** tonyb has quit IRC21:39
mriedemefried: TheJulia: https://bugs.launchpad.net/nova/+bug/184148121:39
openstackLaunchpad bug 1841481 in OpenStack Compute (nova) "Race during ironic re-balance corrupts local RT ProviderTree and compute_nodes cache" [Medium,Triaged]21:39
* TheJulia reads21:39
*** dosaboy has quit IRC21:40
*** dosaboy has joined #openstack-nova21:41
TheJuliafunnn..... and by fun, I think I mean anti-fun21:42
openstackgerritsean mooney proposed openstack/nova master: Libvirt: report storage bus traits  https://review.opendev.org/66691421:44
openstackgerritsean mooney proposed openstack/nova master: libvirt: use domain capabilities to get supported device models  https://review.opendev.org/66691521:44
openstackgerritsean mooney proposed openstack/nova master: Add transform_image_metadata request filter  https://review.opendev.org/66577521:44
fungican anyone double-check for me that bug 1837252 was introduced in os-vif 1.12.0? sean-k-mooney said it "was introduced in stein" but there were multiple releases in master before it branched to stable/stein21:45
openstackbug 1837252 in os-vif stein "IFLA_BR_AGEING_TIME of 0 causes flooding across bridges" [High,In progress] https://launchpad.net/bugs/1837252 - Assigned to sean mooney (sean-k-mooney)21:45
fungijust want to be sure before i go sending this to mitre, so i don't have to update them with errata21:45
sean-k-mooneyi think the exact commit is in the bug somewhere but ya ill check21:45
sean-k-mooneyit was right at the end of the cycle21:45
fungioh, sorry if i overlooked it, checking21:46
fungi5027ce821:46
sean-k-mooneyhttps://github.com/openstack/os-vif/commit/1f6fed6a69e9fd386e421f3cacae97c11cdd7c7521:47
sean-k-mooneyso 1.15.021:47
fungiaha, thanks21:47
sean-k-mooneythat's the commit where i swapped the linux bridge plugin to use the common code introduced in the previous commit21:48
fungiany chance you can skim the rest of the impact description in comment #17 there and see if it makes sense, aside from the affected versions obviously being an incorrect guess?21:48
* sean-k-mooney clicks21:49
sean-k-mooneyim not actually sure if it disables mac learning or if mac learning is still enabled but macs are never unlearnt21:50
sean-k-mooneyboth would result in flooding in different situations21:50
sean-k-mooneyfrom the initial bug i would have suspected the former21:52
sean-k-mooneysorry the latter21:52
fungihrm, yeah at first i thought it simply filled the mac table, but logan- suggested it disabled learning21:53
sean-k-mooneythe fdb does have macs21:53
sean-k-mooneymaybe ip route has some info21:53
sean-k-mooneyor ip tools21:53
fungihence the diff between the impact descriptions in comment #15 and #1721:53
sean-k-mooneybrctl setageing <brname> <time> sets the ethernet (MAC) address ageing time, in seconds. After <time> seconds of not having seen a frame coming from a certain address, the bridge21:54
sean-k-mooney       will time out (delete) that address from the Forwarding DataBase (fdb).21:54
sean-k-mooneyso i would think it never deletes the macs21:55
sean-k-mooneyso it would only flood if the mac was learned on two ports21:55
mriedemmelwitt: dansmith: if either of you are around can we move this stein backport series through please? https://review.opendev.org/#/q/topic:bug/1839560+branch:stable/stein21:55
sean-k-mooneythat would only happen if the vm was migrated21:55
fungisean-k-mooney: or if the table filled i guess?21:56
sean-k-mooneyi dont think there is a limit on the table size21:56
fungithere is bound to be some practical limit on table size, whether it is reachable or not is another matter ;)21:56
sean-k-mooneymaybe there is but the docs on linux bridge are rather limited21:56
dansmithmriedem: hit me tomorrow and I will, I gotta run21:57
funginot sure what sort of layer 2 filters are generally in place to prevent poisoning, but could an instance spoof frames from random macs to overrun the bridge table? or spoof specific macs for other victim instances so they show up on multiple ports?21:58
dansmithnm, I'll just do it now21:58
* dansmith feels mriedem's guilty stare21:58
sean-k-mooneyfungi: neutron adds iptables rules to prevent mac spoofing and nova used to also21:59
melwittheh, dansmith wins21:59
fungisean-k-mooney: yeah, and i think that would be a more general concern beyond the scope of this particular bug anyway21:59
mriedemdansmith: <322:00
fungisean-k-mooney: so anyway, if the mac really does need to appear on multiple ports (e.g. because the instance was migrated) then that does significantly reduce the risk22:00
* dansmith blows mriedem a kiss as he rides off into the sunset22:00
fungisean-k-mooney: i'll update the bug with a summary of the discussion here and see if anyone contests these assertions22:01
sean-k-mooneyreduce yes but not eliminate so its still a concern22:02
sean-k-mooneycool22:02
fungiabsolutely22:03
sean-k-mooneythe primary concern is really the external network / cross tenant shared networks22:03
fungibut good to capture the exact risk scenario in the impact description and advisory22:03
fungithanks!22:03
sean-k-mooneysecurity groups are still applied so within a tenant network it's less concerning but still not good22:03
sean-k-mooneyno worries22:03
*** markvoelker has joined #openstack-nova22:05
*** trident has quit IRC22:05
*** markvoelker has quit IRC22:10
melwittsean-k-mooney: I just read through your discussion about the os-vif bug, does the bug only affect VMs that are being migrated or can it affect more? sorry I didn't quite get that part22:10
*** damien_r has joined #openstack-nova22:11
sean-k-mooneymelwitt: i originally thought it would only affect vms that were moved as my understanding of setting ageing=0 was that all learned mac addresses would be permanent and never age out22:12
sean-k-mooneyalthough in the bug the assertion was made that ageing=0 disables mac learning entirely and causes all packets to flood22:13
sean-k-mooneyi am not sure which is correct22:13
sean-k-mooneythe scope of the bug obviously changes significantly based on that22:13
*** trident has joined #openstack-nova22:14
melwitthm, yeah. ok22:14
fungisean-k-mooney: looking more closely at the example in the bug description, it looks like the macs there are only duplicated between master and vlan1 entries for the same port?22:15
sean-k-mooneythat confused me a bit22:16
fungii don't see the same macs appearing on different port numbers22:16
*** mlavalle has quit IRC22:16
sean-k-mooneyfor linux bridge we should not be adding vlans to the tap devices22:16
fungiso i think it's just vlan0/trunk and vlan1/tagged both being listed?22:17
sean-k-mooneymy understanding was that the linux bridge driver applied the vlan to the veth pair that connected the tenant bridge to the br-int22:18
sean-k-mooneyso the ports on the tenant network should be untagged22:18
fungialso it's unclear to me whether those entries are marked permanent because they were added through some other mechanism than learning, or whether they're showing up as permanent because their ageing time is 022:19
sean-k-mooneyi can deploy with linux bridge and try to see what happens22:19
sean-k-mooneyor permanent because the port exists on the bridge22:20
fungiif you get a chance that would be stellar, but if not i'll dig in the docs when i get a moment22:20
sean-k-mooneyit looks to me like the only macs in the fdb are for the local ports22:21
sean-k-mooneyimplying that the macs are never learned22:21
sean-k-mooneyfor non local ports22:21
sean-k-mooneynormally we would expect the eno1.102 port to have the macs of vms on other hosts listed22:22
sean-k-mooneyif that is the case the description in 17 is closer to being correct as we would flood any packet to a non local port22:23
sean-k-mooneywhich would mean i need a 2 node deployment to test22:23
fungiaha, so it could be that the problem is traffic addressed to a non-local port is flooded to all ports including the local ports22:24
sean-k-mooneywhich i guess i can do i just need to set it up but i likely won't get to it until tomorrow22:24
sean-k-mooneyfungi: yep22:24
sean-k-mooneythat's what im suspecting22:24
fungithat would make more sense in light of the example bridge table and the other suggestions in the ensuing bug discussion22:25
fungithanks!22:25
*** eharney has quit IRC22:26
*** BjoernT_ has quit IRC22:30
sean-k-mooneyartom: do you mind if i rebase your series to fix the keystone_auth issues22:30
sean-k-mooneyartom: just running the test locally to confirm22:30
*** ivve has quit IRC22:32
sean-k-mooneyill quickly deploy and test it locally then sign off for the night22:32
*** mriedem has quit IRC22:36
openstackgerritsean mooney proposed openstack/nova master: Introduce live_migration_claim()  https://review.opendev.org/63566922:39
openstackgerritsean mooney proposed openstack/nova master: New objects for NUMA live migration  https://review.opendev.org/63482722:39
openstackgerritsean mooney proposed openstack/nova master: LM: add support for augmenting migrate_data with info from claims  https://review.opendev.org/63482822:39
openstackgerritsean mooney proposed openstack/nova master: LM: add support for updating NUMA-related XML on the source  https://review.opendev.org/63522922:39
openstackgerritsean mooney proposed openstack/nova master: NUMA live migration support  https://review.opendev.org/63460622:39
openstackgerritsean mooney proposed openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002122:39
openstackgerritsean mooney proposed openstack/nova master: Functional test for NUMA live migration  https://review.opendev.org/67259522:39
*** dklyle has quit IRC22:40
*** jmlowe has joined #openstack-nova22:42
*** rcernin has joined #openstack-nova22:45
*** tbachman has quit IRC22:51
*** tkajinam has joined #openstack-nova23:02
*** tbachman has joined #openstack-nova23:04
*** slaweq has joined #openstack-nova23:11
*** dave-mccowan has quit IRC23:11
sean-k-mooneyartom: it looks like you're getting closer. did you modify the hugepage code.23:16
*** slaweq has quit IRC23:16
sean-k-mooneyi think migration with cpu pinning is now working but i think the hugepage migration is not23:16
sean-k-mooneyi will test it more correctly tomorrow by forcing the cpu sets to not overlap on each host and actually allocating hugepages rather than forcing small pages with hw:mem_page_size=small23:17
*** dklyle has joined #openstack-nova23:26
*** dave-mccowan has joined #openstack-nova23:33
*** tbachman has quit IRC23:35
openstackgerritMerged openstack/nova master: Add test for create server with integer AZ  https://review.opendev.org/67832823:36
*** markvoelker has joined #openstack-nova23:41
*** avolkov has quit IRC23:45
*** markvoelker has quit IRC23:46
artomsean-k-mooney, ugh, I had local changes I wanted to push23:51
artomYou just rebased them?23:51
artomI guess I'll keep my local branch, rebase, then push23:52
