Wednesday, 2018-10-03

openstackgerrit: Merged openstack/nova stable/rocky: Delete instance_id_mappings record in instance_destroy  https://review.openstack.org/604373  00:56
openstackgerrit: Merged openstack/nova stable/rocky: Fix stacktraces with redis caching backend  https://review.openstack.org/606895  01:09
openstackgerrit: Merged openstack/nova stable/rocky: Null out instance.availability_zone on shelve offload  https://review.openstack.org/606086  01:09
openstackgerrit: Merged openstack/nova stable/rocky: XenAPI/Stops the migration of volume backed VHDS  https://review.openstack.org/604203  01:13
openstackgerrit: Merged openstack/nova stable/pike: Follow devstack-plugin-ceph job rename  https://review.openstack.org/602022  01:14
openstackgerrit: Merged openstack/nova stable/pike: Fix unit test modifying global state  https://review.openstack.org/584592  01:14
openstackgerrit: Tetsuro Nakamura proposed openstack/nova stable/rocky: Fix aggregate members in nested alloc candidates  https://review.openstack.org/607454  02:36
mnaser: has anyone seen behaviour (on queens) where, doing PCI passthrough for GPUs, the qemu-kvm process comes up but seems to be spinning at 100% CPU and the process is stuck?  04:21
mnaser: strace just shows a bunch of ioctl's and poll's  04:21
mnaser: nothing weird in dmesg or journals  04:22
mnaser: nova side looks ok.. "Final resource view: name=<snip> phys_ram=524194MB used_ram=66560MB phys_disk=1863GB used_disk=225GB total_vcpus=48 used_vcpus=6 pci_stats=[PciDevicePool(count=7,numa_node=0,product_id='102d',tags={dev_type='type-PCI'},vendor_id='10de')]"  04:23
mnaser: Hmm, I have some leads. I might push some doc changes for the PCI passthrough  04:45
mnaser: Dunno if it's okay to have docs that are more hardware-specific  04:45
mnaser: Something along the lines of https://github.com/dholt/kvm-gpu/blob/master/README.md  04:45
takashin: cd /tmp  05:22
gmann: nova API office hour time  06:00
gmann: #startmeeting nova api  06:01
openstack: Meeting started Wed Oct  3 06:01:34 2018 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.  06:01
openstack: Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.  06:01
*** openstack changes topic to " (Meeting topic: nova api)"  06:01
openstack: The meeting name has been set to 'nova_api'  06:01
gmann: PING List: gmann, alex_xu  06:01
gmann: hanging around for some time if anyone has a query related to the API  06:08
openstackgerrit: OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata  https://review.openstack.org/604260  06:14
gmann: let's close the office hour.  06:32
gmann: #endmeeting  06:32
*** openstack changes topic to "Current runways: use-nested-allocation-candidates -- This channel is for Nova development. For support of Nova deployments, please use #openstack."  06:32
openstack: Meeting ended Wed Oct  3 06:32:39 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)  06:32
openstack: Minutes:        http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.html  06:32
openstack: Minutes (text): http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.txt  06:32
openstack: Log:            http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.log.html  06:32
openstackgerrit: Merged openstack/nova stable/ocata: Update RequestSpec.flavor on resize_revert  https://review.openstack.org/605880  06:37
openstackgerrit: Merged openstack/python-novaclient master: Fix up userdata argument to rebuild.  https://review.openstack.org/605341  06:47
bauzas: good morning nova  07:10
gibi: morning  07:28
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver  https://review.openstack.org/604083  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests  https://review.openstack.org/607287  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources  https://review.openstack.org/604084  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance  https://review.openstack.org/605785  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup  https://review.openstack.org/606050  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance  https://review.openstack.org/606111  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs  https://review.openstack.org/604125  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: consumer gen: support claim_resources  https://review.openstack.org/583667  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Enable nested allocation candidates in scheduler  https://review.openstack.org/585672  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver  https://review.openstack.org/604083  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests  https://review.openstack.org/607287  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources  https://review.openstack.org/604084  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance  https://review.openstack.org/605785  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup  https://review.openstack.org/606050  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance  https://review.openstack.org/606111  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs  https://review.openstack.org/604125  07:52
ralonsoh: stephenfin: https://review.openstack.org/#/c/476612/36/vif_plug_ovs/ovsdb/ovsdb_lib.py@83. I don't understand this  08:47
ralonsoh: stephenfin: do you mean I need to move this function... where?  08:47
openstackgerrit: Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add abstract OVSDB API  https://review.openstack.org/476612  08:54
* stephenfin looks  08:58
stephenfin: ralonsoh: Sorry, yeah, I mean move that above 'create_ovs_vif_port' or below 'delete_ovs_vif_port', so that 'create_', 'update_' and 'delete_' are grouped together  09:02
stephenfin: ralonsoh: It's a nit though. Don't worry about it  09:02
ralonsoh: stephenfin: done!  09:02
stephenfin: Oh, perfect :)  09:03
stephenfin: I'll review that now  09:03
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Rewrite the console doc  https://review.openstack.org/606148  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for RDP consoles  https://review.openstack.org/606992  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for MKS consoles  https://review.openstack.org/606993  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: conf: Allow 'nova-xvpvncproxy' to be called with CLI args  https://review.openstack.org/606929  09:38
bauzas: does anyone know how to unstack/stack devstack with OFFLINE=True for a single project?  09:47
bauzas: I mean, I have all my stack but I just want to reinstall nova from master  09:47
sean-k-mooney: bauzas: unstack, then manually git pull/checkout master on the nova repo, set OFFLINE=True and stack  09:49
sean-k-mooney: devstack won't update any of the projects and will just use what is available  09:50
sean-k-mooney: if you are unlucky you might need to pip install requirements from master if they have changed, but that is rarely required  09:50
bauzas: sean-k-mooney: that was my plan, but I had assumed it would redeploy *all* projects from ENABLED_SERVICES  09:51
sean-k-mooney: bauzas: it will  09:51
bauzas: sean-k-mooney: my point is that I just want to init_nova(), honestly  09:51
sean-k-mooney: is that an issue  09:51
sean-k-mooney: oh, well, don't unstack then  09:51
bauzas: sean-k-mooney: my need is just that, now that I've reshaped some inventories, I just want to reshape back  09:51
sean-k-mooney: just check out the branch you want and run sudo systemctl restart devstack@n-*  09:52
sean-k-mooney: if the reshape is done by nova-manage, change restart to stop, then run nova-manage and then start  09:53
sean-k-mooney: as long as there are no schema migrations between your current branch and the one you're going to, and no requirements changes, you don't need to restack to change commit: just git checkout and restart service X  09:55
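A minimal sketch of the flow sean-k-mooney describes, assuming a default devstack layout under /opt/stack (paths and the service glob may differ per setup):

    cd /opt/stack/nova
    git checkout master && git pull
    sudo systemctl restart 'devstack@n-*'
    # if the reshape is driven by nova-manage: stop the services,
    # run the nova-manage command, then start them again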
cdent: yeah, that's what I was going to suggest  09:58
stephenfin: bauzas: Could you look at pushing https://review.openstack.org/#/c/456572/ through?  10:08
bauzas: sean-k-mooney: the problem is that the inventories are reshaped  10:17
bauzas: sean-k-mooney: so a DB sync won't work, right?  10:17
bauzas: because all the tables are there  10:17
sean-k-mooney: bauzas: you can always drop the tables and then sync, I guess  10:18
bauzas: if I'm dropping the tables, I miss e.g. https://github.com/openstack-dev/devstack/blob/master/lib/nova#L722-L723  10:19
bauzas: sean-k-mooney: ^  10:19
sean-k-mooney: i don't think there is any magical way to reshape without deleting the reshaped RPs and restarting nova-compute  10:20
sean-k-mooney: that is what i meant by dropping the tables  10:20
bauzas: I think I'll just unstack/stack for this time, and snapshot the DB  10:21
bauzas: so, when I want to go backwards, I'll just use the dumpfile  10:22
sean-k-mooney: ya, that would work, but other than for local testing, is there a reason you are trying to downgrade?  10:22
bauzas: sean-k-mooney: no, just testing indeed  10:22
sean-k-mooney: if you have never used it, https://www.heidisql.com/ is a great little tool for working with DBs  10:23
bauzas: well  10:23
sean-k-mooney: you need to run it under wine, however  10:23
bauzas: just for what I want, a mysqldump is enough  10:24
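For reference, the snapshot/restore bauzas has in mind might look like this (the database name and credentials are assumptions; devstack defaults vary):

    mysqldump nova_api > /tmp/nova_api-pre-reshape.sql    # snapshot before reshaping
    mysql nova_api < /tmp/nova_api-pre-reshape.sql        # restore to go backwards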
sean-k-mooney: the one thing that annoys me more than the fact that we use an 80-character line length is that we configure pep8 on specs for 79 characters  10:36
openstackgerrit: sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/605116  10:43
stephenfin: That moment of panic where you submit a review and spot a group of comments on older patchsets from the corner of your eye  10:57
stephenfin: What *did* Stephen of June 2018 have to say about this...  10:57
mdbooth: stephenfin: Fancy a bash at this one: https://review.openstack.org/#/c/605436/ I bitch about python3 in it ;)  11:08
openstackgerrit: sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/605116  11:09
sean-k-mooney: mdbooth: one basic question regarding https://review.openstack.org/#/c/605436/5. the compute manager runs in the compute agent, which is single-threaded but uses eventlet, so there is no parallelism but there is concurrency. so the lock you are acquiring is the monkey-patched greenthread lock. is the reason we need the lock in the first place the fact that we are doing db io, and eventlet causes us to yield,  11:39
sean-k-mooney: allowing another invocation of the function to start concurrently, which races?  11:39
mdbooth: sean-k-mooney: Without going into the details of locking, I find it's safest to ignore eventlet entirely when considering locking.  11:40
mdbooth: When you tie yourself in knots trying to second-guess a scheduler, you make lots of mistakes.  11:41
sean-k-mooney: mdbooth: if we did not have eventlet in this case we would not need to lock  11:41
mdbooth: It has multiple threads of execution.  11:41
mdbooth: I don't care how we achieve that.  11:41
sean-k-mooney: the compute manager is executed from the compute agent, right, which does not have workers, so only one thread  11:42
mdbooth: Eventlet and python's 'threading' all have the same issues.  11:42
sean-k-mooney: mdbooth: yes, but my point is that eventlet introduced the concurrency, and that is therefore why we need a lock to be correct  11:43
mdbooth: sean-k-mooney: We have concurrency.  11:43
sean-k-mooney: if we did not have eventlet, the previous code would have been correct, because it would have been single-threaded  11:43
sean-k-mooney: mdbooth: yep, i know  11:44
mdbooth: sean-k-mooney: Sure. We do have concurrency, though. It uses eventlet.  11:44
sean-k-mooney: just making sure i understand the subtleties of the patch. this is an example of why i hate eventlet: it hides concurrency  11:44
mdbooth: sean-k-mooney: It doesn't really by the time you get into the compute manager.  11:44
mdbooth: You just ignore eventlet entirely and assume you have multiple concurrent threads. How they're implemented isn't all that important in that code.  11:45
mdbooth: If we later switched to a multi-threaded model, it would still be fine.  11:45
sean-k-mooney: mdbooth: well, if you were new to nova or did not think about it at the time, then you can write races because of eventlet more easily than if it was explicitly threaded  11:46
mdbooth: Honestly, I never consider eventlet. I assume it's explicitly threaded.  11:46
mdbooth: It's not, but that doesn't have any bearing on writing safe code, except when there are bugs in eventlet.  11:47
sean-k-mooney: mdbooth: most new people i have talked to that work on openstack don't think about threads at all, because they say "oh, it's python and that has a gil, so i don't need to care"  11:47
* mdbooth would tell anybody new to Nova to ignore eventlet and assume it's multi-threaded.  11:47
mdbooth: sean-k-mooney: That is just one of many issues with python in the real world :(  11:48
mdbooth: Also, the gil doesn't prevent overlapping threads of execution, it just stops them running at the same time. The problems are the same.  11:48
sean-k-mooney: mdbooth: yes and no. their perception is normally correct; it's eventlet that violates the paradigm  11:49
mdbooth: As an old curmudgeon, I think python has been extremely detrimental to software engineering, particularly in education  11:49
mdbooth: No, it would not be correct. If you have a multi-threaded python program, even though the gil prevents multiple threads running concurrently, you still need locks.  11:50
sean-k-mooney: mdbooth: i taught myself c++ as my first language, then java, so ya, i like to understand exactly what is going on  11:50
mdbooth: Basically: eventlet or python multithreading, I don't care. It shouldn't change how you write code.  11:51
sean-k-mooney: learning c++ first made me a better engineer than learning python would have.  11:51
mdbooth: They both use fake threading.  11:52
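A toy illustration of the point being argued, not nova code: the check-then-act below is unsafe under eventlet and real threads alike unless the lock is held, because anything that yields between the check and the update lets a second invocation interleave.

    import threading

    _lock = threading.Lock()   # becomes a green lock once eventlet monkey-patches threading
    _claimed = set()

    def claim(instance_uuid):
        with _lock:
            if instance_uuid in _claimed:    # check ...
                return False
            # a DB or API call here could yield to another green thread;
            # without the lock, two callers could both pass the check above
            _claimed.add(instance_uuid)      # ... then act
            return True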
openstackgerrit: Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/604400  11:53
mdbooth: I think ^^^ might fix that weird issue I was hitting  11:53
mdbooth: There's a shortcut in service_is_up if the service is forced down, and we forced it down  11:53
sean-k-mooney: mdbooth: yes, i know if it was explicitly multi-threaded it would be incorrect without the lock. anyway, cool, i'll take a look at that too  11:54
mdbooth: sean-k-mooney: Don't worry about ^^^ btw. Will just wait until zuul has voted.  11:55
mdbooth: sean-k-mooney: I knew I was missing something simple there.  11:55
sean-k-mooney: wait, why was it forced down?  11:55
sean-k-mooney: oh, so you could evacuate  11:56
mdbooth: The test_evacuate.sh script which ran before the second round...  11:56
mdbooth: yeah  11:56
sean-k-mooney: ha, yet another race, this time between tests :)  11:57
nicolasbock: Morning  12:00
nicolasbock: I have a runaway server, i.e. a VM that's running on a different hypervisor than the one Nova thinks it is on. So far I haven't quite been able to get the placement service to help me update Nova's view of reality...  12:01
nicolasbock: I can see the server with 'openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd'  12:02
nicolasbock: And it lists the wrong hypervisor  12:02
mdbooth: sean-k-mooney: $ git grep test\.nested | wc -l  12:02
mdbooth: 348  12:02
mdbooth: I'm not moving that ;)  12:02
mdbooth: sean-k-mooney: Feel free to write a follow-up patch  12:03
nicolasbock: I can also check that with 'openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd'  12:03
sean-k-mooney: mdbooth: ok, i would have just done nested=common.nested in test.py  12:03
nicolasbock: The VM is really running on 6cbb84b0-02f4-4ee3-9df2-151475b1effe  12:04
mdbooth: sean-k-mooney: Sure. I don't want to mess with test.nested here, though.  12:04
nicolasbock: But `openstack resource provider allocation set --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd` is not working  12:04
sean-k-mooney: mdbooth: ok, i'll submit a follow-up patch. eventually... i have added it to my whiteboard  12:05
nicolasbock: I suppose I am missing a `resource-class-name`  12:05
nicolasbock: But what do I put there?  12:05
openstackgerrit: Elod Illes proposed openstack/nova stable/ocata: Fix the help for the disk_weight_multiplier option  https://review.openstack.org/607537  12:08
sean-k-mooney: nicolasbock: was the vm migrated?  12:11
nicolasbock: Yes sean-k-mooney  12:12
sean-k-mooney: mdbooth: i was looking at an edge case recently where, if cleanup on the source fails, we don't update the vm host  12:12
sean-k-mooney: nicolasbock: e.g. when you finish migrating the instance, if the post-migrate job on the source node fails to, say, unplug a vif, we fail before we update the instance db record to reflect that the vm is running on the new node  12:14
sean-k-mooney: mdbooth: did you ever propose a patch for ^  12:14
mdbooth: sean-k-mooney: Probably.  12:15
sean-k-mooney: nicolasbock: are the placement allocations currently associated with the vm correct for the host it is actually on?  12:17
sean-k-mooney: e.g. if you ignore the db host value and actually check the vm's location  12:17
mdbooth: sean-k-mooney: Trying to parse your comment here: https://review.openstack.org/#/c/605436/5/nova/compute/manager.py@1447  12:18
mdbooth: Are you saying we can release the lock there?  12:18
mdbooth: Or yield the context manager there?  12:18
mdbooth: Or something else, because neither of ^^^ would be correct.  12:18
nicolasbock: The VM is running on `6cbb84b0-02f4-4ee3-9df2-151475b1effe`  12:18
nicolasbock: But placement says that it's on `57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b`  12:19
sean-k-mooney: mdbooth: i'm saying that if we don't have a lock when we invoke that line, the db request could cause us to yield, causing a race  12:19
mdbooth: Ah, *eventlet* yield  12:19
mdbooth: Ok.  12:19
sean-k-mooney: e.g. this is the thing that definitely needs to be in the critical section of the lock  12:19
nicolasbock: I ran `openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd`  12:20
nicolasbock: and `openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd`  12:20
sean-k-mooney: mdbooth: ya, re-reading, that was not clear  12:21
sean-k-mooney: nicolasbock: is 57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b the source or destination of the migration?  12:21
sean-k-mooney: nicolasbock: i assume the destination, correct?  12:22
sean-k-mooney: sorry, source  12:22
nicolasbock: I don't know what happened, but I would guess that it is the source  12:23
nicolasbock: Sorry, I am not sure I completely grasp the terminology of source and destination  12:23
sean-k-mooney: ya, so the resources being used on 6cbb84b0-02f4-4ee3-9df2-151475b1effe are likely still owned by the migration object in placement  12:23
nicolasbock: What's the migration object?  12:24
sean-k-mooney: when you do a migration we create a migration record that we use to claim resources on the destination host; then, when the vm is moved, we use a special atomic operation in the placement api to change the allocation consumer from the migration record's uuid to the vm's uuid  12:26
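Roughly, the atomic operation sean-k-mooney describes is placement's POST /allocations (microversion >= 1.13), which rewrites several consumers in one request; a hedged sketch of the payload shape, with placeholder UUIDs and amounts:

    POST /allocations
    {
      "<migration_uuid>": {"allocations": {}},
      "<instance_uuid>": {
        "allocations": {"<rp_uuid>": {"resources": {"VCPU": 4, "MEMORY_MB": 8192}}},
        "project_id": "...", "user_id": "..."
      }
    }

Emptying the migration consumer's allocations and writing the instance's happens atomically.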
nicolasbock: So you are saying that that atomic operation wasn't executed?  12:27
sean-k-mooney: yes  12:27
nicolasbock: Ok  12:27
nicolasbock: Can I get it to execute?  12:27
sean-k-mooney: so if you do nova server-migration-list 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd, does it have a migration object listed?  12:28
nicolasbock: No  12:28
sean-k-mooney: oh, hum, strange. perhaps the migration has already been confirmed.  12:30
sean-k-mooney: nicolasbock: efried might be able to help better than i if he is around  12:31
nicolasbock: So I thought that `openstack resource provider allocation set` would allow me to update the DB  12:31
nicolasbock: Thanks sean-k-mooney !  12:31
nicolasbock: But I am not using that command correctly, since it complains about an incorrect 'allocation string format'  12:32
openstackgerrit: Vlad Gusev proposed openstack/nova stable/pike: libvirt: Use os.stat and os.path.getsize for RAW disk inspection  https://review.openstack.org/607544  12:36
openstackgerrit: Matthew Booth proposed openstack/nova master: DNM: Run against mriedem's evacuate test  https://review.openstack.org/604423  12:37
mdbooth: This is an interesting query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20u'host3'%20%3D%3D%20u'host3'%5C%22  12:44
mdbooth: I wonder why that started spiking only a few days ago: code change, or infra change?  12:44
stephenfin: lyarwood: RE: https://review.openstack.org/588570 I'd been putting it off but will do that now. Will keep you posted  13:03
lyarwood: stephenfin: cheers  13:04
s10: How can I reopen bug https://bugs.launchpad.net/nova/+bug/1209101 ? It still exists.  13:06
openstack: Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli)  13:06
sean-k-mooney: s10: just change the status and leave a comment with details  13:07
s10: sean-k-mooney: I've left a comment, but I can't change the status; all of them are grey.  13:08
sean-k-mooney: s10: that said, i don't think it's necessarily the same bug. that was fixed in 2013  13:08
sean-k-mooney: it's more likely a regression  13:08
sean-k-mooney: are you seeing this on master?  13:08
s10: sean-k-mooney: Yes, this regression has never been fixed. I will write a comment about how to reproduce it.  13:09
sean-k-mooney: well, it was fixed and then reverted, so this likely needs to be treated as more than a bug fix, rather as a blueprint/spec  13:10
sean-k-mooney: the original bug fix predates microversions, so i think a mini spec + microversion bump would be required to alter the api behavior  13:13
sean-k-mooney: bauzas: is ^ correct  13:13
bauzas: context?  13:14
sean-k-mooney: private flavors are automatically exposed to new tenants  13:14
sean-k-mooney: because https://bugs.launchpad.net/nova/+bug/1209101 was reverted  13:15
openstack: Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli)  13:15
sean-k-mooney: sorry, are *not* automatically exposed  13:15
sean-k-mooney: bauzas: so the context is: if we want to change the behavior of the api to auto-grant access to the private flavor, that would require a spec rather than being just a bug fix, right, as it's an api change?  13:18
bauzas: sean-k-mooney: IIUC, I'd tend to say yes, as it's a behavioural change  13:20
bauzas: we don't really call the fact of not showing private flavors a "bug"  13:20
bauzas: some people would also like to keep this behaviour, I guess  13:20
bauzas: and last but not least, two OpenStack clouds could behave differently for the same request and list of flavors, which is not interop  13:21
bauzas: HTH  13:21
sean-k-mooney: s10: i'm just having lunch, but based on bauzas's confirmation, rather than reopen the bug i would suggest you file a nova spec. if you don't have time to do that, i can see if i can do it later today, but you have more context than i do as to what you want to achieve  13:34
efried: nicolasbock: Still around?  13:39
efried: nicolasbock: mriedem would be a better source of CLI syntax help, but I can tell you what the API call itself would need to look like.  13:40
mnaser: does anyone know if CERN does pci passthrough on centos or ubuntu?  13:41
dansmith: I thought they were all centos  13:42
mnaser: i'm trying to figure out why pci passthrough isn't working  13:43
mnaser: everything nova side is working ok, but the newly spawned qemu-kvm process is stuck spinning at 100% cpu, no console logs, libvirt qemu logs show nothing..  13:43
mnaser: so i'm a bit at a loss, not really sure where to go next  13:44
s10: sean-k-mooney: Basically, what I want to achieve is to be able to give tenants the ability to manage the private flavors which they created.  13:44
s10: sean-k-mooney: But this will be more complicated than an automatic flavor-access add after private flavor creation...  13:46
sean-k-mooney: s10: ok, if that is the use case then that should be relatively simple to capture in a spec. i'll write up a spec.  13:46
sean-k-mooney: s10: oh, why should we not just allow the tenant that created the flavor to have access to it automatically?  13:47
sean-k-mooney: for interop reasons we will need a microversion bump, but that is just a mechanical side effect, not germane to the feature  13:48
s10: sean-k-mooney: I want them to have access to the created flavor automatically because there is no RBAC in nova for private flavors. Flavors don't have an owner. I can't only allow tenants to run flavor-access-add for "their" flavors :(  13:53
mnaser: ou  13:58
mnaser: i wonder if this has to do with using lvm local storage  13:58
mnaser: and the qemu user not being able to access it  13:58
mnaser: hmm  13:58
mnaser: yup, it can't access it  13:59
mnaser: would it be the responsibility of nova to set up permissions on volumes under lvm so that the qemu user can read them?  14:00
mdbooth: I just ran a git bisect to try to find out why the incidence of test_parallel_evacuate_with_server_group failures went from occasional to almost 50% in the last few days, and the culprit is my patch from the other day: https://review.openstack.org/#/c/604859/  14:02
mdbooth: I still consider this a feature :)  14:03
mriedem: dansmith: did you know there was a cinder_img_volume_type metadata key which can be used to create a bootable volume with a specific volume type? https://docs.openstack.org/cinder/latest/cli/cli-manage-volumes.html#volume-types  14:14
mriedem: so technically people today could boot from volume from an image with that metadata and get what they wanted instead of passing a volume type to nova - clunky, i know  14:15
dansmith: on the image  14:15
mriedem: yeah,  14:15
dansmith: I did not know that, no  14:15
mriedem: likely also means that we need to make a decision in the compute API if the user specifies a volume type and the source image has that metadata key/value: which do we pick? or do we 400?  14:15
mriedem: probably need to know what cinder does in that same case  14:16
mriedem: smcginnis: do you know off the top of your head? ^  14:16
dansmith: seems to me like if they ask for something on the boot request, that always wins  14:17
dansmith: like, we have a default type, and there can be a default for an image,  14:17
smcginnis: If you explicitly provide a type, I believe we will give that priority over the image property.  14:17
dansmith: but if they ask for something specific at the time, I would expect they want the one they asked for  14:17
mriedem: yeah, that's what i'd expect too  14:17
smcginnis: So the fallback can be to have the volume type stuffed in the image properties, but that should not change the primary usage of someone saying specifically what they want.  14:17
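For context, setting that image property looks something like this (the image name and volume type here are made up):

    openstack image set --property cinder_img_volume_type=high-iops my-bootable-image
    # a later boot-from-volume using this image should get the 'high-iops' volume type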
* bauzas facepalms when he looks at https://stackoverflow.com/questions/11941817/how-to-avoid-runtimeerror-dictionary-changed-size-during-iteration-error  14:20
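The classic shape of that error, and the usual fix of iterating over a snapshot of the keys (a generic Python example, unrelated to any particular nova patch):

    d = {'a': 1, 'b': 2}
    # for k in d: del d[k]   ->  RuntimeError: dictionary changed size during iteration
    for k in list(d):        # iterate over a copy of the keys instead
        del d[k]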
mnaser: yup. that was the issue  14:20
mnaser: if you use lvm on centos with nova, the volumes are created under user 'root'  14:20
mnaser: so the qemu process can't touch them and it can't boot  14:21
mnaser: is this technically a nova bug?  14:21
mnaser: (aka the devices on the system /dev/vg_foo/vmuuid_disk are root:root, qemu-kvm runs as qemu:qemu)  14:21
nicolasbock: Hi efried, I am still here  14:23
efried: nicolasbock: So looking at the manual (https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-set) it appears as though you're going to want multiple --allocation params...  14:25
s10: mriedem: Can we remove the line https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L3669 ? It causes bug https://bugs.launchpad.net/nova/+bug/1746972 . I don't believe that an error in set-admin-password on a running instance should put the instance into an error state and require cloud admin intervention to reset the instance's state, or that the user should have to abandon and remove the vm.  14:25
openstack: Launchpad bug 1746972 in OpenStack Compute (nova) "After setting the password failed, the VM state is set to error" [Undecided,Confirmed]  14:25
nicolasbock: Yes, that's my reading too efried  14:25
nicolasbock: But I don't understand what the other parameters should look like  14:26
efried: nicolasbock: I *think* each should look like --allocation rp=$rp_uuid,$rc=$amount  14:26
dansmith: mnaser: what do you want nova to do? chown them? afaik, it doesn't know what user qemu will run as  14:26
openstackgerrit: Merged openstack/nova stable/pike: nova-status - don't count deleted compute_nodes  https://review.openstack.org/604788  14:26
nicolasbock: Ok. What I don't get is why I need a resource class in there as well. I don't want to update anything in terms of resource classes  14:27
efried: nicolasbock: Oh, but you do :)  14:27
nicolasbock: I do?  14:27
efried: nicolasbock: I guess it's obvious to me because I know what the REST payload looks like, but come to think of it, it makes sense how you're thinking about it.  14:27
nicolasbock: Maybe I am not looking at resource classes correctly  14:28
efried: nicolasbock: See, the allocations in the API are a hierarchical structure: resource provider => resource class => amount  14:28
nicolasbock: But the way I am thinking about them is that they specify things like memory and CPU cores  14:28
nicolasbock: Ok  14:28
efried: And also the CLI (and the API it's using) is designed to fully *replace* allocations, not edit pieces of them.  14:28
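That hierarchy shows up directly in the REST payload the CLI builds; a rough sketch (recent placement microversions, >= 1.12, use this dict form; the amounts are placeholders):

    PUT /allocations/{consumer_uuid}
    {
      "allocations": {
        "<rp_uuid>": {"resources": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 80}}
      },
      "project_id": "...", "user_id": "..."
    }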
efried: So given that...  14:29
efried: nicolasbock: You should do `openstack resource provider allocation show $instance_uuid`  14:29
efried: Which should give you allocations in three-ish resource classes  14:29
efried: nicolasbock: can you pastebin me that output?  14:29
dansmith: you know what would be awesome  14:29
dansmith: openstack resource provider allocation edit <uuid>  14:30
nicolasbock: https://pastebin.com/sFdzQuPE  14:30
dansmith: like virsh edit  14:30
openstackgerrit: Matt Riedemann proposed openstack/nova master: doc: fix and clarify --block-device usage in user docs  https://review.openstack.org/607589  14:30
sean-k-mooney: dansmith: that could be done as a client-only feature, but yes, that would be nice  14:31
dansmith: sean-k-mooney: obviously client-only  14:31
sean-k-mooney: well, i was debating whether you would want to put it in the openstack sdk or just the osc plugin  14:31
sean-k-mooney: but ya, i think just in the plugin  14:32
dansmith: oh, I meant just in the plugin  14:32
dansmith: yeah  14:32
dansmith: and you could translate to/from yaml for the actual editing maybe  14:32
dansmith: so people aren't having to hand-edit json  14:32
dansmith: since you need to validate the schema before you send it back anyway  14:32
mnaser: dansmith: yeah, that's why i don't think it's a nova problem, but maybe something that we should document.. or libvirt should  14:33
sean-k-mooney: sounds like a nice low-hanging-fruit bug  14:33
efried: nicolasbock: Okay, so I think you're going to want to build your command with:  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,MEMORY_MB=8192 \  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,VCPU=4 \  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,DISK_GB=80  14:33
mriedem: ha  14:33
mriedem: "InstancePasswordSetFailed: Failed to set admin password on  14:33
mriedem: 9f9330c2-4ab4-45f1-a9f9-2770dd34cf30 because error setting admin password"  14:33
nicolasbock: Ah ok  14:34
mriedem: "we failed because we failed"  14:34
efried: mriedem: duh  14:34
nicolasbock: Let me try that  14:34
mriedem: s10: i don't know why the instance is put into ERROR state there; i want to say i've seen a patch to remove that  14:34
efried: nicolasbock: Note that's gotta be all in one command. Otherwise you'll end up with an instance with just disk :)  14:34
s10: mriedem: yes, I see, there is https://review.openstack.org/#/c/555160/  14:35
nicolasbock: Good point efried :)  14:35
mriedem: efried: nicolasbock: might be sensible to have an "openstack resource provider allocation class set" similar to the inventory class set CLI https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-class-set  14:36
nicolasbock: So the command worked  14:36
mriedem: ^ allows you to set inventory on a provider for a specific class, rather than replace the entire set of inventory for the provider  14:36
nicolasbock: But now I have https://pastebin.com/KrcWAXbF  14:36
mriedem: that uses https://developer.openstack.org/api-ref/placement/#update-resource-provider-inventory  14:36
sean-k-mooney: efried: that is probably another reason to have an edit command, since this all needs to be done atomically  14:36
mriedem: we don't have an api like that for allocations, which is why there isn't a CLI for it  14:37
mriedem: we just have https://developer.openstack.org/api-ref/placement/#update-allocations  14:37
openstackgerrit: Merged openstack/nova stable/rocky: Ignore VirtDriverNotReady in _sync_power_states periodic task  https://review.openstack.org/605533  14:37
mriedem: but we could easily write a command that just updates one of the resource classes within the existing allocations  14:37
nicolasbock: Yes, that sounds sensible mriedem  14:37
efried: nicolasbock: Oh, interesting. That's... probably a bug.  14:37
nicolasbock: :)  14:38
mriedem: i very much doubt osc-placement handles consumer generations yet, so it could be racy for the CLI to orchestrate this  14:38
nicolasbock: I should remove the old allocation, right?  14:38
mriedem: but that's probably a low risk  14:38
efried: nicolasbock: Yeah, except the only way to do that is openstack resource provider allocation delete $instance_uuid, which (I sincerely hope) removes all of them.  14:38
efried: nicolasbock: actually, what may have happened is that the source host still thinks it has the instance, and it "healed" the allocations.  14:39
nicolasbock: All of them?  14:39
efried: That would be something to look in the logs for.  14:39
nicolasbock: Ok  14:39
mriedem: do you have ocata computes?  14:39
nicolasbock: But if it removes all of them, wouldn't that be bad?  14:39
efried: nicolasbock: Well, if you remove all of them, then you can run your 'set' command to restore the proper ones.  14:40
efried: But  14:40
mriedem: if you have ocata computes, the resource tracker is reporting the allocations it thinks exist to placement  14:40
efried: if my suspicion is correct, once you delete all the allocations and wait a minute, the original (source) allocations will magically reappear.  14:40
efried: okay, so mriedem, that would explain the source allocs magically reappearing?  14:41
nicolasbock: mriedem: This is using Newton  14:42
nicolasbock: I'll try to delete the allocation  14:43
nicolasbock: And wait to see what happens :)  14:43
mriedem: newton/ocata computes will recreate allocations, yes  14:43
mriedem: until you get everything upgraded to >= pike, the resource tracker periodic task in the compute service will try to manage allocations  14:44
nicolasbock: The new allocation was deleted while we were chatting  14:44
nicolasbock: Interesting mriedem  14:44
nicolasbock: But where is the periodic task getting its information from?  14:45
mriedem: the instances it thinks are running on that host,  14:46
mriedem: and those instances' flavors  14:46
nicolasbock: Is there a way to update that?  14:46
mriedem: so if compute host A thinks instance B is running on it with a flavor that uses x,y,z vcpu/ram/disk, it's going to report that  14:46
mriedem: update what?  14:47
nicolasbock: So I would have to convince the compute host that it's not running the instance?  14:47
efried: nicolasbock: I kind of missed how we got into this situation. What makes you think the instance was successfully removed from the source host?  14:47
mriedem: nicolasbock: is the instance.host in the db pointing at that host?  14:47
nicolasbock: I am going by what `openstack server show` is telling me :)  14:47
mriedem: server show should also tell you, yeah  14:48
openstackgerrit: Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/604400  14:48
nicolasbock: So `server show` is reporting an incorrect hypervisor  14:49
mriedem: this is where the RT gets the instances it thinks are running on it: https://github.com/openstack/nova/blob/newton-eol/nova/compute/resource_tracker.py#L556  14:49
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  14:49
sean-k-mooney: mriedem: there is a live-migration edge case that mdbooth was looking at a few weeks ago where post-migrate on the source failed and we would not update the host the vm was running on  14:50
sean-k-mooney: but the vm had actually been moved correctly  14:50
mriedem: nicolasbock: so did you live migrate this vm or something? why is nova reporting it's on the wrong host?  14:52
mdbooth: Ah, yes. I do recall a bug with that. If we get an error in cleanup on the source host, called *post* successful migration, we then roll back the migration and put the instance in an error state, but it's still running fine on the destination.  14:54
mdbooth: So, e.g. if you get an error in terminate_connection or whatever, you get in this state  14:54
mdbooth: And you can't clean it up, because instance.host is pointing to the source, but it's actually running on the dest  14:54
mriedem: terminate_connection as in cleaning up source node volume attachments and such, right?  14:55
mriedem: same with ports, i'm sure  14:55
mriedem: post live migration, cleaning up the source  14:55
mdbooth: mriedem: Right. Any cleanup on the source  14:55
mriedem: we should just catch and log cleanup failures  14:55
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  14:55
mdbooth: mriedem: Right. The error in my view was that we put the instance in an error state when the instance was fine. We should put the migration in an error state, but leave the instance alone.  14:57
mdbooth: And also do as much cleanup as possible in the presence of errors.  14:57
nicolasbock: mriedem: Yes, I think that's what happened  14:57
openstackgerrit: Jan Gutter proposed openstack/nova-specs master: Spec to implement vRouter HW offloads  https://review.openstack.org/567148  15:07
openstackgerrit: Jan Gutter proposed openstack/nova-specs master: Spec to implement generic HW offloads for os-vif  https://review.openstack.org/607610  15:07
openstackgerrit: Claudiu Belu proposed openstack/nova master: tests: autospecs all the mock.patch usages  https://review.openstack.org/470775  15:09
melwitt: .  15:10
mriedem: s10: commented in that patch  15:12
mriedem: nicolasbock: so live migration was successful, but something failed in post, like mdbooth is mentioning  15:12
nicolasbock: Ok  15:12
mriedem: nicolasbock: you'll likely need to manually update the instances.host value in the db then for that instance  15:12
mriedem: otherwise nova-compute on the source host is going to continue thinking it owns the instance  15:13
s10: mriedem: thank you  15:14
nicolasbock: Ok; other than sounding mildly scary, could you give me a pointer to where I find that value, mriedem?  15:14
mriedem: do you know where the guest is actively running now?  15:14
openstackgerrit: Merged openstack/nova stable/ocata: [Stable Only] Add amd-ssbd and amd-no-ssb CPU flags  https://review.openstack.org/607296  15:14
nicolasbock: Yes  15:15
mriedem: it should be in the last live-migration migration record for the instance  15:15
nicolasbock: Ok  15:15
mriedem: well, then you just update the table record in the nova db  15:15
mriedem: update instances set host=<host> where uuid=<instance uuid>;  15:16
nicolasbock: Ok, sounds so straightforward when you put it like that ;)  15:16
nicolasbock: I'll give it a try  15:16
nicolasbock: Thanks!  15:16
sean-k-mooney: mriedem: out of interest, what would happen if you tried to do a hard reboot or other lifecycle action on a vm in this state, with the wrong host set?  15:19
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  15:19
sean-k-mooney: would it repair the instance or try to start it on the wrong host?  15:19
mriedem: it would try to start it on the wrong host  15:21
mriedem: i don't know what would then happen - would you get the same instance running on two hosts? or a domain not found from the wrong host when trying to reboot it?  15:21
mriedem: i'd hope the latter  15:21
sean-k-mooney: right, which if it was using shared storage could lead to data corruption, correct  15:21
mriedem: well, i'd hope reboot would fail if the guest isn't actually on the hypervisor  15:22
sean-k-mooney: i would hope the latter too  15:22
openstackgerrit: Elod Illes proposed openstack/nova stable/ocata: Don't delete neutron port when attach failed  https://review.openstack.org/607614  15:22
mriedem: doesn't look like it would fail though: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L849  15:23
mriedem: we just handle the not found and assume the guest is already gone  15:23
mriedem: so maybe there is something to be said for putting the instance into ERROR state on failed post live migration  15:24
mriedem: to force the admin to fix it  15:24
sean-k-mooney: mriedem: right, we need that in cases where we are using hard reboot to "fix" things  15:25
mriedem: so the user doesn't try to reboot the thing and screw it up  15:25
sean-k-mooney: mriedem: perhaps; when i discussed this with mdbooth previously i was suggesting always updating the host to the correct location of the vm  15:26
sean-k-mooney: then you could decide whether it should stay in error or active state separately, without having more bugs if you left it in active  15:26
openstackgerrit: Claudiu Belu proposed openstack/nova master: hyper-v: autospec classes before they are instantiated  https://review.openstack.org/342211  15:27
sean-k-mooney: mriedem: or to put that another way, i think there are two issues in the other case: 1. the host is not updated when the vm is moved in some cases, and 2. what to do when cleanup fails post migration  15:28
mriedem: if post live migration set the instance.host to the correct host on which it's running, then yeah, my concern about the user rebooting it and now having the same guest on different hosts is less of an issue  15:28
sean-k-mooney: nicolasbock: if you are still around, the cliff-notes version of that conversation is that you might want to consider locking the instance until you have repaired the db, to prevent any lifecycle events  15:32
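Something like the following, reusing the instance UUID from earlier in the conversation (admin credentials assumed):

    openstack server lock 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd
    # ... fix instances.host in the nova DB ...
    openstack server unlock 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd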
mdbooth: mriedem: IIRC my thought at the time was that we should completely update the instance record for the destination immediately after we switch it, then run source cleanup.  15:33
mdbooth: So if we fail for whatever reason, we've recorded that we're running on the dest.  15:33
mriedem: s10: i think the unit test is going to fail in that patch, see comments for why  15:37
mdbooth: Incidentally, to reiterate something I said earlier: when this patch landed late last week, it made the test_parallel_evacuate_with_server_group failure about 20 times more likely to occur: https://review.openstack.org/#/c/604859/  15:37
mdbooth: That test is now failing around 50% for me.  15:38
mriedem: we can skip the test for now  15:38
mriedem: while the fix is being reviewed  15:38
mdbooth: mriedem: ack. Given ^^^ it seems that the test has never been good.  15:38
openstackgerrit: Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add native implementation OVSDB API  https://review.openstack.org/482226  15:40
openstackgerrit: Matt Riedemann proposed openstack/nova master: Skip test_parallel_evacuate_with_server_group until fixed  https://review.openstack.org/607620  15:43
mriedem: dansmith: efried: mdbooth: ^ gives time to be comfortable with the fix  15:43
mdbooth: mriedem: ack.  15:44
efried: mriedem: What will the criteria for comfort be?  15:44
efried: mriedem: https://review.openstack.org/#/c/605436/ has three +1s and a +2  15:45
mriedem: i guess that's up to whoever +Ws it  15:45
efried: mriedem: Well, I'm comfortable that it fixes the problem. But I don't feel confident enough in the actual code change to +W. I would think someone like.... mriedem would be able to have that confidence.  15:46
mriedem: i'm not very confident in anything atm  15:47
dansmith: I think that change needs a lot of inspection  15:48
dansmith: which I can't do right this moment  15:48
mriedem: who enjoys a good UnboundLocalError? https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1429  15:59
mriedem: oh, i guess it's not unbound, it's a module import...  16:02
mriedem: so uh, if we hit ^ shouldn't we fail the build?  16:09
mriedem: clearly the user isn't going to get the sriov port attached to the guest that they requested  16:10
openstackgerrit: Jay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/607626  16:11
openstackgerrit: Matt Riedemann proposed openstack/nova master: Fix logging parameter in _populate_pci_mac_address  https://review.openstack.org/607628  16:12
mriedem: sean-k-mooney: you might be interested in https://bugs.launchpad.net/nova/+bug/1795064  16:12
openstack: Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New]  16:12
mriedem: something something sriov and kernel versions  16:13
sean-k-mooney: looking  16:13
jaypipes: mriedem: https://review.openstack.org/#/c/607626/ is that stable/ocata backport for the duplicate hypervisor_hostname thingie  16:13
jaypipes: mriedem: thx for your help earlier.  16:13
mriedem: jaypipes: so you cherry-picked that from master?  16:15
mriedem: https://review.openstack.org/#/c/508555/  16:15
mriedem: or did you cherry-pick from the pike backport but forget the -x option on the cherry-pick command?  16:15
spatel_: Hi folks  16:15
jaypipes: mriedem: no, I cherry-picked the SHA1 from the stable/pike patch  16:15
jaypipes: mriedem: oh, sorry, I don't know about -x :(  16:16
spatel_: I am having an issue with SR-IOV with a shared PCI device between NUMA nodes, and am reading this blueprint: https://blueprints.launchpad.net/nova/+spec/share-pci-between-numa-nodes  16:16
sean-k-mooney: spatel_: this bug, https://bugs.launchpad.net/nova/+bug/1795064, or another?  16:17
openstack: Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New]  16:17
spatel_: I have set hw:pci_numa_affinity_policy='preferred' in the flavor, but it's still not allowing me to run an instance on NUMA-1  16:17
spatel_: sean-k-mooney: that problem got resolved by downgrading the kernel to 3.x  16:17
sean-k-mooney: spatel_: i think your issue with 4.18 was that you did not have a netdev associated with the vf  16:18
spatel_: Is that a configuration issue or a BUG?  16:19
sean-k-mooney: spatel_: i would say config issue. i would guess the default options for the kernel module changed and/or you are using a different driver by default  16:19
jaypipes: mriedem: apologies. how can I fix appropriately? do I need to re-do the git cherry-pick with -x? or can/should I just edit the commit message with something?  16:19
sean-k-mooney: spatel_: for example, if the device was bound to vfio_pci instead of the broadcom driver, then it would exist in lspci but not have a netdev  16:20
openstackgerrit: Artom Lifshitz proposed openstack/nova master: WIP: Libvirt live migration: update NUMA XML for dest  https://review.openstack.org/575179  16:20
openstackgerrit: Artom Lifshitz proposed openstack/nova master: Service version check for NUMA live migration  https://review.openstack.org/566723  16:20
spatel_hmmm!16:21
sean-k-mooneyspatel_: do you whitelist devices using the devname option?16:21
mriedemjaypipes: see my other comment in the ocata backport about documenting the conflicts?16:21
spatel_sean-k-mooney: yes i am using devname option to specify my interface16:21
spatel_pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }"16:21
sean-k-mooneyspatel_: in general i advise against that for this exact reason16:21
sean-k-mooneyspatel_: if you use the pci address instead, 4.18 would likely be fine16:21
spatel_pci address ?16:22
*** macza has quit IRC16:22
sean-k-mooneyspatel_: the whitelist supports 3 forms of whitelisting16:22
spatel_you mean vendor_id or product_id ?16:22
*** macza has joined #openstack-nova16:22
sean-k-mooneyspatel_: you can use devname, (vendor_id and product_id), or you can pass a pci address16:23
spatel_sean-k-mooney: i will give it a try and report back on the bug16:23
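For reference, a hedged sketch of those three whitelist forms as nova.conf entries (the address and IDs below are illustrative placeholders, not values from this conversation):

    [pci]
    # by netdev name (tied to kernel device naming, per the discussion above)
    passthrough_whitelist = { "physical_network": "vlan", "devname": "eno2" }
    # by vendor/product ID (matches every device with those IDs)
    passthrough_whitelist = { "physical_network": "vlan", "vendor_id": "14e4", "product_id": "16af" }
    # by PCI address (immune to netdev renames across kernel versions)
    passthrough_whitelist = { "physical_network": "vlan", "address": "0000:04:00.1" }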
spatel_sean-k-mooney: currently i am dealing with this issue :( https://bugs.launchpad.net/nova/+bug/179592016:23
openstackLaunchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,New]16:23
spatel_Do you know what i am doing wrong here?16:23
*** panda is now known as panda|off16:24
spatel_I have 2 NUMA node and running SR-IOV with shared PCI16:24
*** macza has quit IRC16:24
sean-k-mooneyhttps://docs.openstack.org/mitaka/networking-guide/config-sriov.html has the doc16:24
sean-k-mooneyum, let me look16:24
spatel_I am only able to use one NUMA node16:24
sean-k-mooneyby default, unless you set a pci numa affinity policy in the flavor or image, we require strict numa affinity16:25
spatel_It's not allowing me to launch an SR-IOV instance on NUMA-2 (because the PCI device is attached to NUMA-1)16:25
spatel_All i did was set hw:pci_numa_affinity_policy=preferred in the flavor16:25
spatel_what else i need to do?16:25
sean-k-mooneyspatel_: let me check. i thought that was enough but you might also need to set the policy in the whitelist16:26
spatel_I do have aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated'  in flavor16:26
*** helenafm has quit IRC16:26
spatel_I think the blueprint's documentation isn't clear so i am totally confused :(16:27
spatel_If i remove "aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated'" from the flavor then i am able to launch an instance on either NUMA node with SR-IOV support16:28
*** dims has quit IRC16:28
sean-k-mooneyaggregate_instance_extra_specs:pinned='true' is not a standard thing16:29
*** med_ has quit IRC16:29
spatel_hmmm! i didn't get it16:30
sean-k-mooneyspatel_: you should not need to set anything in the aggregate to use the pci policies16:30
openstackgerritJay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/60762616:30
spatel_oh! so you are saying i should remove aggregate_instance_extra_specs:pinned16:30
jaypipesmriedem: k, hopefully correct now.16:30
jaypipesthx for the help again.16:31
sean-k-mooneyspatel_: yes16:31
spatel_let's say i remove the "aggregate" spec; do my vCPUs still get pinned?16:31
spatel_removing the aggregate spec and going to launch an instance16:33
*** med_ has joined #openstack-nova16:34
*** dims_ has joined #openstack-nova16:35
spatel_sean-k-mooney: didn't work, error: 'No valid host was found. There are not enough hosts available'16:36
spatel_looks like something is missing..16:36
sean-k-mooneyspatel_: so just to confirm, you don't have any aggregate metadata set and you have hw:pci_numa_affinity_policy=preferred set16:38
sean-k-mooneyin the flavor16:38
spatel_This is what i have currently in flavor  ->  properties                 | hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred'16:39
spatel_If i remove all 3 options then i am successfully able to launch an instance16:39
stephenfinspatel_, sean-k-mooney: We didn't implement it with a flavor extra spec in the end16:39
sean-k-mooneyso this has to be set in the pci whitelist then16:40
stephenfinspatel_: Yes16:40
stephenfinOops, sean-k-mooney ^16:40
spatel_oh!! wait wait.. so what do i need to do in the pci whitelist?16:40
sean-k-mooneyspatel_: https://github.com/openstack/nova/blob/master/nova/pci/request.py#L16-L2516:40
sean-k-mooneysorry, that's not what you want, but yes you do16:41
spatel_so i need to add that snippet to the compute node's nova.conf in the [pci] section?16:41
stephenfinspatel_: Have you seen this? https://docs.openstack.org/nova/latest/admin/networking.html#numa-affinity16:41
stephenfinspatel_: Ignore that - wrong feature :)16:41
spatel_ok16:41
spatel_ I am running queens16:41
stephenfinspatel_: https://docs.openstack.org/nova/latest/configuration/config.html#pci16:42
stephenfinSee the alias configuration key16:42
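For illustration, the alias stephenfin is pointing at looks something like this in nova.conf (IDs are placeholders; numa_policy is the field the share-pci-between-numa-nodes spec proposed, and aliases apply to flavor-requested passthrough devices, not neutron ports — which is exactly the gap discussed below):

    [pci]
    alias = { "name": "my-vf", "vendor_id": "14e4", "product_id": "16af", "device_type": "type-VF", "numa_policy": "preferred" }

A device is then requested from a flavor with pci_passthrough:alias='my-vf:1'.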
stephenfinspatel_: But, to be clear, is this for a PCI device or an SR-IOV device?16:42
spatel_SR-IOV device16:42
*** artom has quit IRC16:42
spatel_We are running a high performance network application and need high network speed / a high PPS rate16:43
sean-k-mooneystephenfin: looking at the whitelist code i don't think we support it in the whitelist16:43
stephenfinsean-k-mooney: Doesn't seem like it. I'm trying to think why it was done that way16:44
sean-k-mooneywhich would mean the policies only work for devices requested via a flavor alias, which would be dumb16:44
sean-k-mooneyare you sure we did not support this in the flavor extraspecs / image metadata?16:44
spatel_I am going to add alias and get back to you..16:44
stephenfindefinitely not16:44
sean-k-mooneystephenfin: was the whole point of this feature to fix neutron sriov?16:45
spatel_are product_id and vendor_id mandatory? because i am using devname here: "pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }""16:45
*** med_ has quit IRC16:45
*** s10 has joined #openstack-nova16:46
sean-k-mooneyspatel_: no16:46
*** s10 has quit IRC16:46
sean-k-mooneya whitelist can be in any of these forms https://github.com/openstack/nova/blob/master/nova/pci/devspec.py#L182-L19216:47
sean-k-mooneyactually the alias, yes, that needs to use vendor_id and product_id16:48
stephenfinsean-k-mooney: If it was, it seems something may have slipped through the cracks here16:48
sean-k-mooneyaliases are not for networking; they are for passthrough devices16:48
stephenfinYup, I get that16:48
spatel_type-PCI, type-PF and type-VF; which should i pick?16:49
spatel_VF ?16:49
stephenfinFrom the quick glance here, it should really be configured via the whitelist. I'm not sure why I went with the alias16:49
stephenfinspatel_: yup16:49
spatel_doing it..16:49
sean-k-mooneystephenfin: ya i am going to confirm https://bugs.launchpad.net/nova/+bug/179592016:49
openstackLaunchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,Confirmed]16:49
*** med_ has joined #openstack-nova16:49
sean-k-mooneystephenfin: interested in working on this? if not i'll add it to my list, but this needs to be fixed16:50
sean-k-mooneyand backported16:50
stephenfinI won't tackle it tonight but I can do so, yeah16:50
stephenfinnot sure if we can backport though. It'll be a config file change16:50
sean-k-mooneycool, we likely need to repropose the old spec16:50
sean-k-mooneystephenfin: i was suggesting we need to add the flavor and image extraspecs, so no config file change16:51
stephenfinsean-k-mooney: Possibly, but before doing so I'd suggest going back and reading the spec reviews16:51
stephenfinThere was a reason we didn't do that, though I don't recall it now :/16:51
*** spatel has joined #openstack-nova16:52
*** spatel_ has quit IRC16:52
spatelsean-k-mooney: this is what i changed in nova.conf http://paste.openstack.org/show/731417/16:52
stephenfinIf it's image metadata changes, we can't backport those due to object changes16:52
spatelcan you verify16:52
sean-k-mooneyyes, i remember i was very against using the alias, but i don't recall dropping the extra specs16:52
spatelgoing to launch an instance now, fingers crossed16:53
sean-k-mooneyspatel: it's going to fail.16:53
spatel??16:53
spatelwhy?16:53
sean-k-mooneylooking at the code the feature was not finished16:53
spatelDamn it :(16:54
spatelso what is the deal here ?16:54
sean-k-mooneythe numa policies are only respected for flavor based pci device passthrough, e.g. for things like gpus or accelerator cards16:54
sean-k-mooneyspatel: the spec was approved but the full feature was not merged16:54
spatelin my case its VF16:55
*** med_ has quit IRC16:55
spatelcurrently i can't utilize both numa nodes :(16:55
sean-k-mooneyspatel: in your case it's a neutron port with vnic_type direct, correct?16:55
spatelyes16:55
sean-k-mooneyspatel: the workaround is to make the guest have 2 numa nodes16:56
melwittsean-k-mooney: which spec was not properly finished?16:56
openstackgerritMatt Riedemann proposed openstack/nova master: Handle IndexError in _populate_neutron_binding_profile  https://review.openstack.org/60765016:56
mriedemspatel: fyi ^16:56
sean-k-mooneymelwitt: the numa pci polices16:56
sean-k-mooneyi'll get the link, one sec16:57
melwittthanks. I didn't see a link in the backscroll16:57
spatelsean-k-mooney: "guest have 2 numa nodes" can you explain this?16:57
*** macza has joined #openstack-nova16:57
sean-k-mooneymelwitt: https://review.openstack.org/#/c/361140/16:57
openstackgerritVlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/55516016:57
sean-k-mooneyspatel: in the flavor set hw:numa_nodes=216:57
spatellet me try hold on...16:58
sean-k-mooneythis will create a guest with 2 numa nodes with half the cpus and ram on each16:58
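Concretely, the workaround sean-k-mooney describes amounts to flavor properties along these lines (matching spatel's setup):

    hw:cpu_policy='dedicated', hw:numa_nodes='2'

so a 14 vCPU / 14G flavor is split into two guest NUMA nodes of 7 vCPUs and 7G each, and the VF only needs affinity with one of them.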
spatelsean-k-mooney: FYI, i have tried this and it failed "hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred'"16:58
spatelnow i am going to remove "hw:cpu_policy='dedicated'" and "hw:pci_numa_affinity_policy='preferred'" to see if that works16:59
sean-k-mooneymelwitt: the flavor and image extraspecs are apparently not implemented, meaning that this does not work for neutron sriov ports as it should16:59
melwittsean-k-mooney: ok, according to the notes on the blueprint, people thought it was completed "The last functional patch for this was merged on Dec 30, 2017" https://blueprints.launchpad.net/openstack/nova/+spec/share-pci-between-numa-nodes16:59
spatelmelwitt: that is why i am chasing that blueprint because it says completed 201716:59
*** derekh has quit IRC17:00
sean-k-mooneymelwitt: i thought it was completed; i'm checking the code to confirm, but apparently it's not working17:00
spatelDo we have any ETA? because i have 100 compute nodes in racks and here i am stuck with this issue :(17:00
melwittI feel like someone has asked me about this bp before, asking if it applies to SRIOV too, and I thought since it never mentions SRIOV, that it doesn't17:00
sean-k-mooneyi can probably test this locally too, i have just set up an sriov host17:01
melwittor wasn't meant to. and that adding SRIOV support would be additional work outside the scope of this particular blueprint17:01
sean-k-mooneymelwitt: its primary usecase was sriov, specifically for telcos that had two numa node servers but all nics connected to one numa node due to space constraints in their racks17:01
melwitti.e. a new blueprint would be, for example, "add SRIOV support for sharing PCI devices between NUMA nodes"17:01
*** Luzi has quit IRC17:02
spatelmelwitt: it's much clearer now.. :)17:03
openstackgerritMerged openstack/nova stable/rocky: Explicitly fail if trying to attach SR-IOV port  https://review.openstack.org/60511817:03
mriedemknown issue yeah? https://bugs.launchpad.net/nova/+bug/179471717:04
openstackLaunchpad bug 1794717 in OpenStack Compute (nova) "rocky: ephemeral disk can not be resized" [Undecided,New]17:04
sean-k-mooneymriedem: not being able to resize the ephemeral disk, ya17:05
sean-k-mooneymriedem: i mean i think in some very specific edge cases it can work today, but there is no generic way to enable it17:05
sean-k-mooneye.g. how do you resize from one 500G disk to two 400G disks? so we just decided not to support it at all17:06
mriedemyup, bug 155888017:06
openstackbug 1558880 in OpenStack Compute (nova) "instance can not resize ephemeral in mitaka" [Medium,Confirmed] https://launchpad.net/bugs/155888017:06
melwittsean-k-mooney: ok, I don't know anything about that. if that spec scope is actually incomplete, then we need to decide how we deal with it. open another spec for this cycle to finish it or treat them as bugs17:07
sean-k-mooneymelwitt: probably reproposing the spec is the best way, and just add the flavor extra specs and image metadata values that were originally proposed17:08
sean-k-mooneymelwitt: unless you think we can backport them, in which case it could be a bug17:08
sean-k-mooneybackporting would be the only reason to make it a bug in my mind, but it's also adding new functionality, e.g. turning off numa affinity for pci devices17:09
*** jpena is now known as jpena|off17:11
*** ralonsoh has quit IRC17:13
*** med_ has joined #openstack-nova17:13
sean-k-mooneymelwitt: i or stephen will repropose the spec17:15
sean-k-mooneymelwitt: stephenfin  is heading home so i will likely do it later today17:15
melwittsean-k-mooney: I'd run the idea by mriedem too, in case he has another opinion on how to handle this17:17
openstackgerritSylvain Bauza proposed openstack/nova master: libvirt: implement reshaper for vgpu  https://review.openstack.org/59920817:17
sean-k-mooneymelwitt: sure, it just felt a little cheeky to sneak it in as a bug fix :)17:17
bauzasdansmith: mriedem: I tested the vgpu reshape for allocations too and good news: it works! I just fixed a few things that I discovered when testing ^17:18
melwittsean-k-mooney: yeah, you're probably right. I didn't think much about it17:19
bauzasnow, call it a day17:19
*** med_ has quit IRC17:20
sean-k-mooneyspatel: were you able to use a multi numa node guest to spawn the instance? that functionality definitely works17:23
spatelTesting it now..should i add "hw:cpu_policy='dedicated'" too for pinning ?17:24
dansmithbauzas: ack17:25
sean-k-mooneyspatel: yes if you want cpu pinning then add hw:cpu_policy='dedicated'17:25
spateldoing it.. hold on.. soon report back17:25
sean-k-mooneyno rush17:27
*** tbachman has quit IRC17:29
*** tbachman has joined #openstack-nova17:33
spatelsean-k-mooney: i am able to launch two VMs with 10 vCPU cores each (i have a 32 core compute node with 16+16 numa) but it looks like it didn't pin the CPUs17:35
spatelcheck this out http://paste.openstack.org/show/731420/17:35
spatelI can see it pinned CPUs across numa nodes17:35
sean-k-mooneyspatel: can you run virsh dumpxml <instance>17:37
*** tbachman_ has joined #openstack-nova17:37
sean-k-mooneyspatel: i think it pinned everything correctly17:37
spatelhttp://paste.openstack.org/show/731423/17:37
sean-k-mooneyspatel: it looks like each17:37
spatelI thought it should pin all vCPU cores to CPUs on the same NUMA node, right?17:38
sean-k-mooneyno17:38
spatelhmmmm?17:38
*** tbachman has quit IRC17:39
*** mvkr has quit IRC17:39
*** tbachman_ is now known as tbachman17:39
sean-k-mooneyby setting hw:numa_nodes=2 you will have half the cpus on one numa node and half on the other17:39
sean-k-mooneymemory will also be equally split17:39
sean-k-mooneyprovided there is a free pci device on at least one of the 2 numa nodes associated with the vm vcpus, we will allow the vm to boot17:40
spatelif i remove hw:numa_nodes=2 then it will pin all vCPUs on the same node, right?17:40
sean-k-mooneycorrect, adding hw:cpu_policy=dedicated implicitly adds hw:numa_nodes=117:40
spatelhmm! interesting..17:41
spatelusing hw:numa_nodes=2 will have some performance impact, right?17:41
sean-k-mooneyby explicitly setting hw:numa_nodes=2 it will allow both numa nodes on the host to be used, but it will also limit the vm to hosts with 2+ numa nodes17:41
sean-k-mooneyspatel: it can, if the application in the guest itself does not understand numa affinity17:42
sean-k-mooneyit can also improve performance, as it doubles your memory bandwidth since the vm will now use memory from 2 host numa nodes/memory controllers17:43
spatelI think time to run some test...17:43
spatelWe are media company and using VoIP base application17:43
sean-k-mooneytesting is always a good idea :)17:43
spatelFirst i built openstack without SR-IOV and found the performance was horrible (the PPS rate was only 50k; after that it started dropping packets)17:44
sean-k-mooneyas a community we have done a lot of work to improve numa affinity over the years17:44
spatelI have just started learning numa stuff so i am new but it looks interesting..17:45
sean-k-mooneyspatel: the strict pci numa affinity was added for telco usecases where they could not tolerate cross numa pci/sriov17:45
sean-k-mooneyspatel: it certainly is .... interesting. it's also a pain in the ass, but gives better performance when you get it right17:45
spatelI have some legacy hardware and i have to stick with it17:46
spatelon the other side i am planning to test DPDK to see if it's better17:46
sean-k-mooneynuma is not going away, in fact it's becoming more common17:46
sean-k-mooneydpdk is much better than kernel ovs17:46
sean-k-mooneybut its more complicated too17:46
spatelbut at least it doesn't have a hardware dependency17:47
spatelI spent thousands of $$$$ to get SR-IOV supported cards17:47
sean-k-mooneyspatel: not in the same way; it requires that the guests use hugepages and that there is a dpdk driver for your nic17:47
spatelDoes it perform like SR-IOV ?17:48
sean-k-mooneyspatel: ya, dpdk will be cheaper in that sense, but you will have to dedicate 1-2 cores to handle traffic for ovs-dpdk17:48
sean-k-mooneyspatel: in some cases yes. in general not quite17:48
sean-k-mooneywhat data rates / traffic profiles are you targeting?17:49
spatelcurrently i am deploying VoIP application on 1U server with 32 core / 32G memory. and i have 1000 servers...17:49
sean-k-mooney10G small packets 40G jumbo frames? a mix17:49
spatelmy production peak is a 200 to 230 kpps UDP packet rate17:50
sean-k-mooneyoh, well dpdk can handle that easily17:50
spatelreally???17:50
spatelif that is the case then it will be win win solution17:50
*** ivve has quit IRC17:51
sean-k-mooneyya, dpdk was designed to hit 10G line rate with 64 byte packets, which is 14mpps17:51
spatelwe have lots of servers in AWS (with sr-iov support)17:51
spatelthat is really cool!17:51
sean-k-mooneywith the right hardware it can hit 32mpps on a single core, but in general you will see more like 6mpps17:51
spatelwe are using LinuxBridge + VLAN so i need to upgrade to OVS17:52
sean-k-mooneyit's more or less like this17:52
sean-k-mooneylb<ovs<sriov+macvtap<ovs-dpdk<sriov direct17:52
spatelI have tried macvtap but that didn't work either17:53
sean-k-mooneyspatel: checkout https://dpdksummit.com/Archive/pdf/2016USA/Day02-Session04-ThomasHerbert-DPDKUSASummit2016.pdf slides 16-1917:54
spatelreading..17:55
sean-k-mooneyspatel: i'm a little biased as i'm one of the people that added ovs-dpdk support to openstack, but for your data rate i think it would work quite well17:56
spatelI need to find out how to migrate LinuxBridge to OVS17:56
sean-k-mooneyspatel: today cold migrate works. i'm working on fixing live migrate17:57
spatelcool!17:57
spatelwith SR-IOV i am not able to get that functionality either17:57
spateleven bonding isn't supported17:57
sean-k-mooneylive migrate almost works; we just don't update the bridge name correctly. i'm hoping to backport that17:57
spatelnice! if that works17:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Add scatter-gather-single-cell utility  https://review.openstack.org/59494717:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Return a minimal construct for nova list when a cell is down  https://review.openstack.org/56778517:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_qfd_by_cell_and_project()  https://review.openstack.org/60766317:58
sean-k-mooneyspatel: haha i think im working on all your missing features :) https://review.openstack.org/#/c/605116/17:58
openstackgerritMerged openstack/nova stable/pike: nova-manage - fix online_data_migrations counts  https://review.openstack.org/60584017:59
spatel:)17:59
spateli have lots of requirements :) these are just the start17:59
spatelsean-k-mooney: thanks for help!!!18:00
sean-k-mooneymy main focus this release, at least initially, is live migration hardening, e.g. fixing edge cases like lb->ovs or sriov18:00
nicolasbock<freenode_sea "nicolasbock: if you are still ar"> Thanks for the tip!18:00
spateli didn't know freenode would be this helpful.. for the last 2 days i've been chasing google18:00
spatelwhat do you use to deploy your openstack? I am using openstack-ansible18:01
sean-k-mooneyspatel: no worries. i'm usually here so feel free to ping me if you have issues18:01
spatelI am going to spend next 6 month here :) until my cloud is ready!!18:01
sean-k-mooneyspatel: for development, devstack. i used to use kolla-ansible but i recently joined redhat so i probably should suggest OSP18:01
spatelwe spent a million dollars last year in AWS so my boss wants to build our own AWS :)18:02
sean-k-mooneyspatel: that is how a lot of companies end up running openstack clouds, yes18:02
spatelindeed18:02
spatelOSP uses TripleO right?18:03
spateli tried it and found it very complicated18:03
sean-k-mooneyyes, that is the officially supported installer from redhat18:03
sean-k-mooneyspatel: and yes it can be18:03
mnaserdoes anyone know if daniel berrange hangs out on irc much?18:03
mnaseri'm looking at this old abandoned review and i'm wondering if this is still an issue -- https://review.openstack.org/#/c/241401/18:03
sean-k-mooneymnaser: not on this irc but he is usually on the libvirt one18:04
mnaseri'll try to ping him there18:04
spatelsean-k-mooney: going to eat something, will catch you again, if any issue :) thanks again18:04
sean-k-mooneymnaser: i think that is a bug that has been forgotten about but not necessarily fixed18:06
mnasersean-k-mooney: yeah, it's not fixed, but it's been a while so i'm wondering if the whole "it doesn't work with backing files" argument is no longer valid18:07
mnaserwe're setting up some really fast hardware (pci-e nvme drives) and want to squeeze the best performance out of it.. short of going to something like lvm18:07
sean-k-mooneymdbooth: and lyarwood would likely be able to comment better than i can on https://bugs.launchpad.net/nova/+bug/151032818:07
openstackLaunchpad bug 1510328 in OpenStack Compute (nova) "Nova pre-allocation of qcow2 is flawed" [Low,Confirmed]18:07
openstackgerritJack Ding proposed openstack/nova master: Add HPET timer support for x86 guests  https://review.openstack.org/60590218:08
sean-k-mooneymnaser: right, um, in that case would you be better off with a raw image instead of qcow if you're always preallocating?18:08
mnasersean-k-mooney: right, i'm thinking that might be the next path, raw files on disk18:09
sean-k-mooneymnaser: if you are also supporting ceph or boot from volume, raw can often be better too, even if you are using more space for glance / image cache18:10
mnaserbut then we lose a lot of qcow2 features18:10
sean-k-mooneymnaser: like live snapshot18:10
mnaseryeah, a lot.. unfortunately18:10
*** spatel has quit IRC18:10
sean-k-mooneymnaser: i don't think anyone would object if you had a way to fix the bug that did not cause others18:11
mnasersean-k-mooney: yep.. its just that unfortunately there was no documentation as to why that was an issue with backing images18:11
mnaserso thats what im trying to research18:11
sean-k-mooneymnaser: it's got to have something to do with the overlays that we create18:13
mnaseryeah it looks like it's not really a possibility18:13
mnaser:<18:13
sean-k-mooneymnaser: mdbooth and kashyap should be able to confirm tomorrow when they are back online18:14
mnaseri'll wait to hear18:14
mnasernow to find ways to benchmark this server18:14
mnaserserver/vm that is18:14
mnaserhttp://paste.openstack.org/show/731425/18:15
sean-k-mooneyis that a vm with  468G of ram18:16
sean-k-mooneysorry 47218:16
sean-k-mooneyi also like the insane amount of gpus18:17
sean-k-mooneymay i suggest you use it to play minecraft, totally how you should benchmark18:17
sean-k-mooneymnaser: also i hear bitcoin is a thing :)18:17
mnasersean-k-mooney: aha, we're rolling out gpus and we have instances with 472G of ram, 48 (dedicated) threads, 1.8T of PCI-e NVMe storage..18:18
sean-k-mooneymnaser: actually, on a serious note, you are an operator of a cloud with vgpus, correct?18:18
mnasersean-k-mooney: no vgpu support, only dedicated gpus (as far as we've planned)18:18
*** spatel has joined #openstack-nova18:19
mnaserpart of this is MAYBE seeing if we can get some vGPU CI.. if possible, but i hear there are some more complicated reasons why its not possible18:19
sean-k-mooneyah, well does the lack of vgpu numa affinity affect your decision to use vgpus or deploy gpus in the cloud in general?18:19
sean-k-mooneymnaser: actually it might be possible using complicated tricks18:19
sean-k-mooneye.g. nested virt + q35 chipset + viommu + pci passthrough of the physical gpu PF to the host vm18:20
mnaseri think we're starting to roll things out by having dedicated gpus to see market demand for it (we've had some).  unfortunately the other thing that's coming to mind is i'm thinking that users who need gpu levels of performance probably would want 100% of it18:20
mnaserwe can make nested virt available for gpu instances so maybe thats possible18:21
sean-k-mooneymnaser: have you talked to bauzas about possible vgpu ci?18:21
*** imacdonn has quit IRC18:21
*** imacdonn has joined #openstack-nova18:21
mnasersean-k-mooney: we briefly talked about it.. dansmith mentioned concerns about iommu and stuff that's beyond my level of comprehension :)18:21
mnaserbut we plan to provide at least 1 or 2 instances to openstack CI *if* there's a use case that makes sense18:22
dansmithmnaser: he said viommu, so if that's a thing now then maybe it's doable18:22
sean-k-mooneydansmith: yes it is but we have not enabled it in nova yet18:22
sean-k-mooneybut it's trivial so we could18:23
sean-k-mooneywell, it's a flavor extraspec + xml generation and other crap, but it's not technically very hard to do, we just have not done it yet18:23
*** pcaruana has joined #openstack-nova18:24
mnaseri'd be more than happy to provide 1 or 2 instances with a gpu18:24
*** itlinux has joined #openstack-nova18:24
sean-k-mooneydansmith: it was added in libvirt 2.1 and qemu 3.4 https://libvirt.org/formatdomain.html#elementsIommu18:25
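Per that linked libvirt page, the domain XML addition itself is small, roughly:

    <devices>
      <iommu model='intel'/>
    </devices>

what nova lacks is the plumbing to emit it, presumably gated by a flavor extra spec as sean-k-mooney says.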
* dansmith nods18:25
nicolasbock<freenode_mri "nicolasbock: you'll likely need "> I had to also update `instances.node` but then the allocation was updated correctly18:26
sean-k-mooneymnaser: that's very generous. it would certainly help if we could actually test vgpu in the upstream ci, even if it was an experimental job that did not run on all patches18:26
mnaserwhile i wrap things up here i can push up a patch to add 1 or 2. we'll probably do it with min-servers: 0 and max-servers: 2 to start with18:27
*** spatel has quit IRC18:28
mriedemefried: i've replied in https://review.openstack.org/#/c/606122/18:28
*** spatel has joined #openstack-nova18:28
efriedack18:29
spatelsean-k-mooney: currently i have "intel_iommu=on" in grub.conf, should i add "iommu=pt" too?18:29
sean-k-mooneyspatel: "iommu=pt" is not required but advised18:30
efriedmriedem: +218:30
spatelwill add that :)18:30
sean-k-mooneyspatel: this is my cmdline on my sriov systems BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=2cca5edf-cbcc-4f0d-91df-df438bbd56c5 ro crashkernel=auto rhgb quiet intel_iommu=on iommu=pt pci=assign-busses,realloc18:30
spatelare you using SR-IOV?18:31
spatelor DPDK?18:31
mriedemefried: thanks18:31
mriedemlazy-load can be a cruel mistress18:31
sean-k-mooneyspatel:  pci=assign-busses,realloc is to work around some hardware bugs where my bios does not allocate enough iommu space18:31
efriedsrsly18:31
spatelnice!18:32
sean-k-mooneyspatel: iommu=pt is needed for dpdk but not sriov18:32
spateloh! makes sense18:32
sean-k-mooneyi enable it always so i can deploy both and swap between them18:32
spatelsean-k-mooney: i have created a new flavor (15 vCPU / 14G memory) and i got this error18:32
spatelERROR (BadRequest): Instance CPUs and/or memory cannot be evenly distributed across instance NUMA nodes. Explicit assignment of CPUs and memory to nodes is required (HTTP 400) (Request-ID: req-400663e1-75d1-4bbc-a06b-07dcfd845be6)18:32
*** cdent has quit IRC18:33
spatelThis is what i have in flavor hw:cpu_policy='dedicated', hw:numa_nodes='2'18:33
sean-k-mooneyspatel: yes, the error could be improved. the vcpus need to be divisible by the number of numa nodes, otherwise you have to tell us how many cpus to put on each numa node18:33
*** mvkr has joined #openstack-nova18:34
sean-k-mooneyspatel: so i would just set it to 14 vcpus and 14G memory18:34
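(That is: 15 vCPUs / 2 NUMA nodes = 7.5, which doesn't divide evenly; 14 vCPUs gives 7 per node. Alternatively the split can be stated explicitly with extra specs like hw:numa_cpus.0/hw:numa_cpus.1 and hw:numa_mem.0/hw:numa_mem.1.)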
spatelcool!!18:34
spateldoing it18:35
*** artom has joined #openstack-nova18:35
sean-k-mooneyspatel: since you are optimising your flavors, and given your usecase, i would also recommend enabling hugepage memory for the vm18:35
sean-k-mooneyit will give you a 30-40% performance boost in many workloads, but requires you to allocate hugepages on the host first, ideally via the kernel command line18:36
spatelI have this setting in grub "hugepagesz=2M hugepages=2048 transparent_hugepage=never"18:36
sean-k-mooneyah cool, that will only allocate 4G of hugepages from the 32G you have total.18:37
spatelone more question: i have 32G memory, so what would be a good number of pages?18:37
spatelyes i have 32G memory18:37
spateli heard 1G is better for hugepages18:38
sean-k-mooneyhaha i was getting to that next. :) i would recommend between 24-28G of hugepages, leaving 6-8G for the host18:38
sean-k-mooneyspatel: it depends; for some workloads yes, for most it does not matter18:38
sean-k-mooneyhugepages cannot be subdivided, so if you use 1G hugepages the ram in your flavor must be a multiple of 1G18:39
spatelmy application doesn't need lots of memory because it's RTP traffic (voip)18:39
spatelhmm! make sense18:39
sean-k-mooneyspatel: in your case i doubt you will see a difference and 2MB hugepages will give you more granularity18:40
spatellets stick to 2M then :)18:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add post-test hook for testing evacuate  https://review.openstack.org/60217418:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add volume-backed evacuate test  https://review.openstack.org/60439718:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional regression test for bug 1794996  https://review.openstack.org/60610618:41
openstackbug 1794996 in OpenStack Compute (nova) "_destroy_evacuated_instances fails and kills n-cpu startup if lazy-loading flavor on a deleted instance" [High,In progress] https://launchpad.net/bugs/1794996 - Assigned to Matt Riedemann (mriedem)18:41
openstackgerritMatt Riedemann proposed openstack/nova master: Fix InstanceNotFound during _destroy_evacuated_instances  https://review.openstack.org/60612218:41
openstackgerritMatt Riedemann proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/60440018:41
spatelsean-k-mooney: should  i use this? hugepagesz=2M hugepages=1536018:41
spatelit will give 30G18:41
spatellet me try to make it 28G18:41
spatelkeep 4G for OS18:42
sean-k-mooneythe hugepage memory will not be available to normal os processes, so 2G is likely too tight for a compute node18:42
*** tbachman has quit IRC18:43
sean-k-mooney4G should be ok, but i used to give 6G as my safety margin; that said, i did not need that much of a margin18:43
*** artom has quit IRC18:43
spatelIn that case let me give 8G to the OS (keep 24G for VMs)18:44
sean-k-mooneyspatel: i would set it to 1228818:44
sean-k-mooneywhich is 24G18:44
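(Worked out: 12288 pages × 2 MiB = 24576 MiB = 24 GiB of hugepages, leaving 8 GiB of the host's 32 GiB for the OS and other processes.)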
spatelhugepagesz=2M hugepages=12288   - DONE! going to reboot compute node18:45
spatelDo you use isolcpus= or CPUAffinity?18:45
spatelI was reading about that; not sure if i need to worry about it or not18:45
sean-k-mooneyi would then also reduce the max VM size to 10 or 12 GB of ram for your largest flavor so you can always boot at least 2 of them18:46
sean-k-mooneyisolcpus is not the same as cpuaffinity18:46
sean-k-mooneyi generally avoid isolcpus=; it is a rather large hammer to reach for18:47
mriedembauzas: i've -2ed https://review.openstack.org/#/c/599208/ as we discussed yesterday18:47
sean-k-mooneyit should only be used for realtime instances, and even then it's tricky to use correctly18:47
*** artom has joined #openstack-nova18:47
spatelok! got it18:47
*** liuyulong has quit IRC18:47
*** artom has quit IRC18:47
sean-k-mooneyspatel: generally i would only suggest using it to isolate the cores allocated to ovs-dpdk, if you choose to deploy it18:47
*** artom has joined #openstack-nova18:48
sean-k-mooneyspatel: don't get me wrong, isolcpus= has a place, but it's only something i reach for when i have no other options left and i really really need it18:48
spatelI will soon deploy dpdk (believe me)18:49
mnasersean-k-mooney: https://review.openstack.org/#/c/607686/ .. ill push up a patch to test things out when possible (or at least something to confirm its working)18:49
spatelin the flavor i should set hw:mem_page_size='2048', right?18:50
mnaserso maybe if you want to start figuring out nova dependencies18:50
*** gyee has quit IRC18:50
*** pcaruana has quit IRC18:50
*** s10 has joined #openstack-nova18:52
dansmithmriedem: melwitt tssurya: cells meeting today? I have an appointment the hour before, but I will probably be back in time18:52
sean-k-mooneyspatel: you can, but i prefer setting hw:mem_page_size=large18:52
sean-k-mooneyspatel: that will work with both 1G and 2MB hugepages18:53
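So the flavor from earlier would end up carrying properties roughly like (illustrative, building on spatel's settings above):

    hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:mem_page_size='large'

where 'large' means any hugepage size the host offers, rather than hard-coding 2048.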
spateldone! let me do that18:53
dansmithside note, mriedem melwitt: This is easy early utility stuff we can merge in front of the down cell stuff: https://review.openstack.org/#/c/594947/18:53
tssuryadansmith: the most important question I had was the best way to get the "type" of exception from the utility ^18:54
dansmithtype?18:54
tssuryawe could also do it during the meeting if others also have topics18:54
mriedemdansmith: i was holding off on that one until i knew what was going on further in the series18:54
tssuryayea, for instance a TimeOut/DBConnectionError exception versus an InstanceNotFound exception18:54
tssuryaas of now we always return the "raised_exception_sentinel" which is not that useful18:55
tssuryabecause based on the type of exception we have to handle it differently18:55
nicolasbockFixing the migration is more difficult it seems: I successfully updated the DB with the correct hypervisor and `server show` was now showing the correct hypervisor information18:55
mriedemplease hold, i have to sell something to a craigslist weirdo real quick18:55
dansmithtssurya: by timeout you mean an rpc timeout, not the did_not_respond_sentinel I assume?18:55
nicolasbockI ran `server migrate` which failed with `[Errno 2] No such file or directory: '/var/lib/nova/instances/2aa3a324-bf22-4e0c-912a-d7c52f59f1fd/disk`18:55
nicolasbockSo the disk didn't make it in the first migration18:55
nicolasbockI verified that the disk is still on the old host18:56
* melwitt listens18:56
sean-k-mooneymnaser: that spec would allow testing quite alot of featue espcially if it supproted nested virt18:56
nicolasbockSince I am in the middle of open heart surgery anyway I figured I'd just rsync the disk to the current hypervisor18:56
tssuryadansmith: TimeOut was just an example, my main problem is to filter the "InstanceNotFound" from others for nova show18:56
nicolasbockSo that worked18:56
mnasersean-k-mooney: these vms have nested virt18:56
nicolasbockHowever, migrate is now refusing to migrate since the VM is in an ERROR state18:56
dansmithtssurya: yeah18:57
mnasernicolasbock: nova reset-state --active18:57
nicolasbockI can't `nova reset-state` either, it says `Reset state for server 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd succeeded; new state is error`18:57
tssuryaas of now, when we get the InstanceNotFound, the utility hides this and returns the sentinel, so I try to go and make a minimal construct when I shouldn't be18:57
nicolasbockwhich isn't all that helpful :(18:57
dansmithtssurya: probably have to get away from the sentinel object I guess18:57
mnaser`--active`18:57
dansmithtssurya: which is going to be a mess18:57
tssuryamelwitt and I had a brief discussion18:57
melwittdansmith, tssurya: sean-k-mooney proposed this class as a way to be able to return exception objects https://review.openstack.org/60525118:57
tssuryathe other day18:57
nicolasbockYeah mnaser !!!18:57
*** spatel has quit IRC18:58
* tssurya looking18:58
nicolasbockI hadn't considered that since `--active Request the server be reset to "active" state instead of "error" state (the default).`18:58
*** efried has quit IRC18:58
dansmithum18:58
*** spatel has joined #openstack-nova18:58
nicolasbockI guess `--active` isn't the default after all18:58
dansmithseems like a lot of overkill :)18:59
sean-k-mooneymnaser: do you provide any other custom nodes? i don't know if you care about ovs-dpdk or cpu pinning, but would you be ok if we used that or a slightly different flavor to maybe test those features in the gate?18:59
dansmithmelwitt: tssurya: it would be trivial to just use the exception as the sentinel in the response, and we just check to see if the result isinstance(thing, Exception)18:59
mriedemnicolasbock: that disk not found with cold migration sounds like a bug i've seen before that is fixed, but had to do with shared storage and volume-backed instances18:59
melwittdansmith: comment on the review :) it came about because I said something like, can we return the exception object in addition to the sentinel, in a tuple or something18:59
dansmithand then you have the exception itself18:59
mnaserwe are slowly rolling out nested virt across our entire fleet but that is something to discuss more with the infra team i think18:59
mriedemnicolasbock: but likely not fixed on newton18:59
melwittdansmith: yeah, that was my other suggestion. I had two ideas: drop the sentinel and check isinstance or keep the sentinel and have tuples19:00
tssuryadansmith: right, that would be simple; is it okay to change the utility's interface now?19:00
nicolasbockok, do you happen to remember the review this was fixed in mriedem ? Maybe I can backport?19:00
mriedemlooking19:00
dansmithmelwitt: no reason for the sentinel I don't think19:00
*** efried has joined #openstack-nova19:00
nicolasbockThanks mriedem19:01
dansmithanything that isinstance(Exception) is... an error, so...19:01
nicolasbockmnaser: it worked! The VM has migrated to a new host19:01
melwittdansmith: yeah, that's what I was thinking19:01
sean-k-mooneymnaser: for ovs-dpdk and cpu pinning/hugepages we don't need nvme or gpus, but we do need nested virt and a vm with multiple numa nodes. it is something that i agree i would love to discuss with infra.19:01
mnaseryeah, we'd have to talk it out with infra19:01
melwittdansmith: but sean-k-mooney was thinking checking isinstance was an anti-pattern of some kind19:01
sean-k-mooneymelwitt: sorry, i should read the scrollback19:02
melwittsean-k-mooney: we're just talking about the "return exceptions from scatter-gather" thing19:02
dansmithmelwitt: overengineering is an anti-pattern :)19:02
tssuryasean-k-mooney: its about this: https://review.openstack.org/#/c/605251/19:02
mriedemnicolasbock: https://review.openstack.org/#/q/Ib10081150e125961cba19cfa821bddfac4614408 is what i'm thinking of19:03
nicolasbockIs it ok that the disk is still on the old host after migration?19:03
melwittsean-k-mooney: dansmith suggested the same thing I suggested when we first talked about it, just return exception objects instead of the sentinel and check isinstance(thing, Exception) to know whether an error was returned or not19:03
nicolasbockThanks mriedem19:03
sean-k-mooneydansmith: well, i was porting a standard class from c++ to python. returning an exception has some weird side effects in python 219:03
dansmithsean-k-mooney: I have no idea what weird side effect you mean, other than that re-raising it doesn't keep the exception context properly, but we won't be doing that here19:04
dansmithsys.exc_info I mean19:04
nicolasbockmriedem: gerrit's cherry-pick doesn't seem to know Newton. Is that because Newton is EOL'ed?19:04
sean-k-mooneymelwitt: so in python 2 the exception object has a reference to the stack frame from which it was first thrown, so the garbage collector can't deallocate that frame or any locks it holds. sys.exc_info and returning that is fine19:04
mriedemnicolasbock: correct, newton is eol upstream19:05
mriedemnicolasbock: note that that change is also building on top of two other fixes19:06
mriedemcalled out in the commit message19:06
sean-k-mooneydansmith: https://www.python.org/dev/peps/pep-0344/#open-issue-garbage-collection19:06
sean-k-mooneydansmith: if we call sys.exc_info() and return the tuple as the sentinel, that is fine however19:06
nicolasbockThanks mriedem , I will apply the fix in our vendor packages only then19:06
nicolasbockThanks all for the help with the "lost" VM!19:07
dansmithsean-k-mooney: how is returning it any different than encapsulating it in your object here?19:07
dansmithfrom a GC perspective19:07
sean-k-mooneydansmith: i if you dont raise the exception and catch it it does not have the referecne to the stack frame so retrun VauleError("invalid data") is fine19:08
mriedemnicolasbock: well, you probably should verify that the reason you got into this mess in the first place was due to one of those bugs19:08
mriedembut whatever you want to do downstream is fine with me :)19:08
sean-k-mooney"except Exception as e: return e" is not19:09
nicolasbockYes of course :)19:09
sean-k-mooneyexcept Exception: return sys.exc_info() is also fine19:09
spatelsean-k-mooney: "pci=assign-busses,realloc" was causing an issue; my server hung at boot. as soon as i removed it, it boots fine.. I have an HP DL360p G819:09
dansmithsean-k-mooney: but this result object of yours is just going to swallow the exception that we get from our handler right?19:09
dansmithsean-k-mooney: so it's still pinning the reference to the stack19:10
sean-k-mooneydansmith: ya, i was planning to extend it to do the right thing on each python version.19:11
sean-k-mooneythis is only an issue on python 219:11
sean-k-mooneythey fixed it in python 3 so you can just return the exception19:11
dansmithwe could trivially re-construct the exception since we know it inherits from NovaException and has very specific characteristics19:11
dansmithsince it's only py2, and since py3 is the future, and since this is a suuuper corner case that only "may" have issues with some GC being delayed... I would tend to punt on caring about this entirely19:12
dansmithbut19:12
sean-k-mooneydansmith: yep, but if we are going to do that i thought it would be nice to hide that in a result class that does the right thing19:12
dansmithjust re-creating the exception and returning it would be fine for our purposes19:12
melwitttssurya: sorry, I'm not really getting the usefulness of the single cell scatter-gather from the commit message. how does it help for down cell?19:13
sean-k-mooneyspatel: ya, don't add that. it was specifically to work around a hardware bug in my old servers19:13
dansmithIMHO we can punt and not worry about this19:13
spatelroger!19:13
spateli took it out19:13
tssuryamelwitt: https://review.openstack.org/#/c/591658/3/nova/compute/api.py@232319:14
tssuryawe could just directly use the scatter_gather_cells(), just that it looked bad19:14
mriedemjaypipes: i have crushed your soul https://review.openstack.org/#/c/607626/19:14
melwitttssurya: I see, to get the automatic "wait for this amount of time before timing out" part. thanks19:15
tssuryamelwitt: we will be using it for Instance.get_by_uuid for the nova show part19:15
sean-k-mooneydansmith: so ya, i'll likely work on https://review.openstack.org/#/c/605251/2 a little more in my spare time later this week, as i want that class as a tool in my toolbox for the future, but we may not need it for this usecase.19:17
tssuryadansmith, sean-k-mooney: so.. which way are we agreeing on?19:17
dansmithtssurya: I vote for return the exception and boil the ocean later :)19:17
melwittso what's all this reconstruction talk? in scatter-gather, when we catch the exception from the cell, are you saying we have to do more than just save the "as exp" part?19:18
tssuryadansmith: ack,19:18
dansmithmelwitt: we do not19:18
dansmithmelwitt: we could trivially if we want19:18
tssuryamelwitt: as far as I understood just returning it should be enough19:18
melwittmmkay19:18
dansmithjust return exp.__class__(exp.args) would be enough19:19
dansmithbut it'd need a comment about why19:19
dansmithand if we don't, then.. not19:19
melwittok, that's what I meant, we can't just return exp, we have to do that step19:19
dansmithwe've just spawned a thread at this point, and most of these things are single DB calls, which means the stack being pinned by the exception is tiny19:19
jaypipesmriedem: awesome.19:20
melwittok, yeah. comment if we do that because otherwise I'm not going to remember why19:20
sean-k-mooneydansmith: oh so just return a new exception object not the one we caught19:20
jaypipesmriedem: it's hard enough already for me to give a rat's ass about a stable branch. :)19:20
dansmithmelwitt: we don't have to do that step.. we can just return exp. If we want, we can do the reconstruction step (and document it)19:21
dansmithmelwitt: I would vote for not reconstructing because I think this is super tiny19:21
tssuryadansmith: ah got it19:21
melwittack, thank you19:21
tssuryamelwitt, sean-k-mooney, dansmith: thanks19:21
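A minimal Python sketch of the approach settled on above — each per-cell worker returns the caught exception itself instead of a sentinel, and callers detect failure with isinstance(). Names here are illustrative, not Nova's actual helpers:

    def query_cell(fn, cctxt, *args):
        """Run fn against one cell-targeted context; return its result,
        or the exception it raised."""
        try:
            return fn(cctxt, *args)
        except Exception as exp:
            # Returning the live exception pins its traceback frame on
            # python 2 (the PEP 344 GC concern discussed above); if that
            # ever matters, re-construct it instead:
            #     return exp.__class__(*exp.args)
            return exp

    def handle(results):
        # results maps cell uuid -> result-or-exception
        for cell_uuid, result in results.items():
            if isinstance(result, Exception):
                # e.g. special-case InstanceNotFound vs. an RPC/DB timeout
                print('cell %s failed: %s' % (cell_uuid, result))
            else:
                print('cell %s: %s' % (cell_uuid, result))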
dansmithsoooo, back to the meeting,19:22
dansmithI will probably be back, if ya'll want to meet19:22
tssuryadansmith: yea are we having one ?19:22
melwittI'm neutral about meeting. I don't have anything special to talk about19:22
melwittmriedem might want to talk about cross-cell stuff? I dunno19:22
dansmithmriedem may want to talk about crossing the streams19:22
dansmithyeah, t hat19:22
tssuryaI don't have anything special except some silly bugs19:22
melwittsilly bugs? now I'm curious19:23
sean-k-mooneyjust one other comment: we are not holding any locks or file handles where we raise the exception in the scatter-gather case, correct?19:23
tssuryamelwitt: https://bugs.launchpad.net/nova/+bug/179499419:23
openstackLaunchpad bug 1794994 in OpenStack Compute (nova) "Update the --max-rows parameter description for nova-manage db archive_deleted_rows" [Low,In progress] - Assigned to Surya Seetharaman (tssurya)19:23
tssuryafor now I changed it to a doc fix, but I am skeptical about it19:23
dansmithsean-k-mooney: we would have just gotten a result from a threadpool of db workers, and they would almost definitely have re-raised outside of any locks19:24
tssuryait would just be good to have the API table record removal also counted in max-rows19:24
tssuryanot sure if people care though19:24
*** spartakos has joined #openstack-nova19:25
tssuryabut yea its not super urgent19:25
tssuryaokay then I will head home now and will be lurking around during the meeting time in case we decide to have one19:26
melwittok, will read through it. the issue is the command output can be confusing given the treatment of the API records19:26
sean-k-mooneydansmith: ok, the stack frame reference keeps stack locals alive, including any file handles or locks, so if we just return the exception can we add a comment referencing the pep issue, just in case we have issues in the future19:26
tssuryamelwitt: exactly19:26
dansmithsean-k-mooney: yep19:26
sean-k-mooneydansmith: i think we will be fine, but future me would regret not adding it if we ever have to debug it :)19:27
*** tbachman has joined #openstack-nova19:27
melwitttssurya: thanks. this is hard for me to imagine because I can't remember what the archive_deleted_rows output looks like :P will look in the code19:28
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: stable-only: fix typo in IVS related privsep method  https://review.openstack.org/60481719:28
*** spatel has quit IRC19:29
*** tssurya has quit IRC19:30
mriedemmelwitt: dansmith: i don't really want to talk about cross-cell resize today probably; i suggested to dansmith that i skim my poc with him over a hangout early next week (i'm out tomorrow and friday)19:30
melwittk19:31
mriedemtl;dr functional testing shows it working,19:31
mriedembut there are a shit load of todos19:31
*** spatel has joined #openstack-nova19:31
mriedemand the patch is over 2K LOC now19:31
mriedemit is definitely not enterprise ready19:31
melwittlol19:31
mriedemi have also resorted to taking sleep aids to not wake up at 1am thinking about it...19:32
*** tbachman has quit IRC19:33
mriedemi've found this helps https://www.youtube.com/watch?v=Lrle0x_DHBM19:34
melwittheh19:36
*** jding1_ has quit IRC19:37
*** spatel has quit IRC19:38
*** jackding has joined #openstack-nova19:38
sean-k-mooneymriedem: oh, youtube looks kind of weird to me when it's rendering 4:3 aspect ratio videos19:43
*** spatel has joined #openstack-nova19:47
*** s10 has quit IRC19:48
*** spatel has quit IRC19:56
*** spatel has joined #openstack-nova19:58
mriedemwho here knows what actually happens to the guest when you stop/start a vmware/hyperv/xenapi/ironic/powervm VM?19:59
mriedemspecifically, the root disk of said VMs if it's volume-backed?20:00
mriedemefried: does powervm in-tree support boot from volume yet?20:00
efriededmondsw: ^20:00
efriedlooking...20:00
*** spatel has quit IRC20:03
*** spatel has joined #openstack-nova20:04
efriedmriedem: Does compute pass destroy_disks=False to the destroy() method if booted from volume?20:05
efriedmriedem: I assume you're trying to find out whether the disk gets destroyed or not.20:06
*** artom has quit IRC20:07
efriedI can tell you this: In tree, we don't destroy volumes.20:07
efriedBut I don't know whether we support bfv20:07
mriedemefried: no not related to that20:07
mriedemrelated to https://review.openstack.org/#/c/600628/20:07
mriedemwhich i haven't -1ed yet but it's coming20:07
mriedemthe virt driver doesn't destroy volumes, the compute manager orchestrates the detach and delete if bdm.terminate_on_deletion is True20:08
mriedemi'm mostly wondering if the virt driver will disconnect and reconnect volumes on simple stop/start operatoins20:08
mriedemfor libvirt, we do - starting around queens or rocky20:08
efriedWe don't disconnect anything on power-off20:10
*** slaweq has quit IRC20:10
*** mlavalle has quit IRC20:10
efriedThat said, I'm not 100% sure the *platform* retains ownership of that resource in such a way that you couldn't attach it to something else while the instance is powered off.20:10
efriedGerald would be better equipped to answer this stuff. But he ain't here.20:11
*** slaweq has joined #openstack-nova20:11
*** mlavalle has joined #openstack-nova20:11
*** macza has quit IRC20:15
*** macza has joined #openstack-nova20:16
edmondswmriedem re: bfv for powervm in-tree... I believe the code is close enough that it might work, but it's untested and there is at least one improvement we should make20:16
*** med_ has joined #openstack-nova20:16
*** spartakos has quit IRC20:17
edmondswmriedem why would you disconnect and reconnect volumes on stop/start?20:18
sean-k-mooneyedmondsw: the libvirt driver destroys the domain and recreates it on stop/start20:20
edmondswright... why?20:21
sean-k-mooneyedmondsw: so it's probably done as a side effect of that20:21
mriedemcomments inline in that spec20:21
sean-k-mooneyedmondsw: legacy reasons; we treat stop like delete as far as libvirt is concerned, but we don't delete the disk obviously20:22
sean-k-mooneywe do detach all ports, gpus etc. when we shut down the vm, but we still retain ownership of them in placement/the resource tracker20:23
*** gyee has joined #openstack-nova20:24
edmondswok. I'll assume "legacy reasons" means there's no reason for other drivers to consider doing that20:24
*** tbachman has joined #openstack-nova20:26
sean-k-mooneywell there is one, but it's not a good one. if you are using iscsi volumes, detaching the volume on stop reduces memory usage on the iscsi server20:26
sean-k-mooneywhich, if it's hardware based, also means we can potentially free up other hardware resources, but that also means the vm can fail to start back up if something else grabs the last slot20:27
sean-k-mooneythat said, you would have maxed out your cloud storage at that point, so you have bigger issues than one vm not starting20:27
sean-k-mooneyedmondsw: i don't know if there is an actual usecase where you would want to disconnect today, but maybe there is20:28
edmondswsean-k-mooney tx for the explanation20:28
sean-k-mooneyi know some people want to be able to do things with bfv root volumes when the instance is offline too, but i kind of zoned out at the ptg for that conversation.20:30
mriedemsean-k-mooney: that is exactly the spec i'm referring to above20:30
mriedemand why i'm asking about this20:30
*** spatel has quit IRC20:31
sean-k-mooneymriedem: ah ok, that makes more sense.20:31
mriedembecause i'm pretty sure swapping the root volume while the instance is stopped was not part of the originally approved spec20:31
mriedemand s10 got Kevin_Zheng to change it20:31
mriedemb/c of how the libvirt driver works20:31
*** tbachman_ has joined #openstack-nova20:31
*** tbachman has quit IRC20:31
*** tbachman_ is now known as tbachman20:31
mriedemand i'm asserting that's not a good enough reason...20:31
*** spatel has joined #openstack-nova20:31
sean-k-mooneyright, i think having an explicit api to detach a volume from a stopped instance would be better20:32
sean-k-mooneye.g. not assuming it's implicitly detached when you stop20:32
mriedemwell, the virt driver could just refuse to detach the root volume while the instance is stopped20:32
mriedemif it doesn't support it, and raise an exception which gets recorded as a fault20:32
sean-k-mooneymriedem: it could, but did they not want to allow that?20:32
sean-k-mooneye.g. detaching the root volume when it's stopped. that would be almost a no-op for libvirt20:33
mriedemthe spec is proposing that you can swap the root volume while the instance is offloaded or stopped20:33
*** med_ has quit IRC20:33
sean-k-mooneyright, but you could do that by creating a new volume, stopping the instance, detaching the root volume, attaching the volume created in step 1, then starting20:34
sean-k-mooneydo you need an explicit api to do the swap as an atomic operation?20:34
mriedemno, and that is what the spec is proposing20:34
mriedem"creating a new volume, stopping the instance, detaching the root volume, attaching the volume created in step 1, then starting"20:35
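As a sketch with the openstack client, that sequence would be roughly the following (names are placeholders; the root-volume detach step is the proposed capability, not something that works today, which is exactly mriedem's objection below):

    openstack volume create --size 20 new-root
    openstack server stop myserver
    openstack server remove volume myserver old-root    # proposed: root detach while stopped
    openstack server add volume myserver new-root
    openstack server start myserver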
mriedemmy point is, i don't know that all virt drivers could handle that today for the root volume20:35
mriedemwhile the instance is stopped20:35
sean-k-mooneyha ok, um, perhaps20:35
sean-k-mooneyi can't think why they could not, if they support rebuild20:36
*** tssurya has joined #openstack-nova20:36
*** spatel has quit IRC20:36
sean-k-mooneyit's basically the same thing, except we are not changing hosts20:36
*** spatel has joined #openstack-nova20:37
mriedemrebuild does a driver.destroy and then a driver.spawn20:37
mriedemit assumes destruction20:37
mriedemi can't say that stop/start assume that same thing20:38
mriedemsame with shelve/unshelve20:38
mriedemshelve does a driver.destroy and unshelve does a driver.spawn20:38
mriedemsean-k-mooney: haven't you been working for like 20 straight hours at this point or something?20:39
mriedemat what point do you become drunk by exhaustion?20:39
sean-k-mooneymriedem: all of this is true. i would be surprised if this could not be supported on multiple hypervisors, but it's good to check.20:39
sean-k-mooneyhaha not quite but if i get tired enough i do find it harder to concentrate.20:40
sean-k-mooneyi have been working since 10:30 + i took an hour for lunch, but ya, i'm just finishing up for the day20:41
mriedempretty sure you said you were finishing for the day about 4 hours ago20:41
sean-k-mooneyi did finish rather late last night. ya, got distracted with a few things20:41
sean-k-mooneyi really need to file my ptg expenses tomorrow... i started twice today but then got pulled into email threads + code.20:44
sean-k-mooneythat's what i was trying to figure out for the last 30 mins, but it can wait20:44
sean-k-mooneytalk to you tomorrow20:45
melwittmriedem, tssurya: I told dansmith we're skipping the meeting based on the earlier convo20:46
tssuryamelwitt: ack, thanks for the info :)20:47
melwitthe's not going to be back in time anyway20:47
mriedemhe said he would be back in time20:47
tssuryawfm, it's too late here anyways, you guys have a good day20:47
mriedem<3 broken20:47
melwittheh20:48
tssurya:)20:48
mriedemhttps://bugs.launchpad.net/nova/+bug/179596620:49
openstackLaunchpad bug 1795966 in OpenStack Compute (nova) "<class 'oslo_db.exception.DBNonExistentTable'> (HTTP 500)" [Undecided,Invalid]20:49
melwittmeanwhile, gd consoleauth. we have an API where you can 'show' your console auth token. and that is making the deprecation nightmare worse. have to figure out if/how to adjust this for the database backend20:50
*** spatel has quit IRC20:50
melwittmriedem: that must be the shortest bug report ever20:50
mriedemhttps://developer.openstack.org/api-ref/compute/#create-remote-console ?20:50
melwitthttps://developer.openstack.org/api-ref/compute/#show-console-connection-information20:51
mriedemah heh20:51
melwittFML20:51
mriedemwell,20:51
mriedemoh heh you can't know which cell to route it to right20:52
mriedemb/c the token isn't mapped in the api20:52
mriedemyou'll have to iterate the cell dbs looking for that token id20:52
melwittno... which I'm trying to remember, what did I find last time I looked at this. arrrrgghh20:53
*** spartakos has joined #openstack-nova20:54
mriedemhmm, we only store the hashed token in the db right20:55
mriedemand that's not what the API would have in it?20:55
melwittyeah only the hashed token. and I think the API takes the unhashed token from the user20:55
mriedemha, cool20:56
mriedemfor cell in all_cells(): for console_auth_token in all_console_auth_tokens_in_cell(cell): if console_auth_token == req_id: do_that_thing()20:56
*** eharney has quit IRC20:57
mriedemi bet we log that unhashed token in the nova-api logs too...20:58
mriedemsince it's on the path20:58
mriedemhttps://github.com/openstack/nova/blob/master/nova/api/openstack/requestlog.py#L4120:59
melwittindeed, I can see it in the func test output20:59
mriedemha, cool20:59
melwitt2018-10-03 20:37:40,870 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/os-console-auth-tokens/714a26ff-d7e6-4698-bc30-9934ebf38807"20:59
mriedemwell luckily logging credentials isn't a CVE21:00
*** priteau has quit IRC21:01
melwitt...21:01
melwittI guess people aren't paying too much attention to this API, myself included21:02
mriedemit's admin-only by default21:02
melwittI see, ok21:02
melwittNote "This is only used in Xenserver VNC Proxy."21:03
melwittreally? I wonder how21:03
mriedemthat's for the other 421:03
mriedemi saw that as well21:03
mriedemos-console-auth-tokens was added specifically for rdp consoles for hyperv21:03
melwittO.o21:04
*** erlon has quit IRC21:06
*** spartakos has quit IRC21:08
melwittyeah, so we could scatter-gather a ConsoleAuthToken.validate(context, token) call and only one will return a token object, the others will raise exceptions. that method takes an unhashed token and will hash it before looking for it in the db21:09
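A minimal sketch of that scatter-gather idea (not the merged fix): nova's scatter_gather_all_cells helper actually returns sentinel values for cells that raise or time out, which is simplified here to an isinstance check.

```python
from nova import context as nova_context
from nova import objects

def find_console_auth_token(ctxt, token):
    # validate() hashes the user-supplied token before the db lookup,
    # which is why the api can't route by token; ask every cell instead.
    results = nova_context.scatter_gather_all_cells(
        ctxt, objects.ConsoleAuthToken.validate, token)
    for cell_uuid, result in results.items():
        if isinstance(result, objects.ConsoleAuthToken):
            return result  # only the owning cell validates successfully
    return None
```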
mriedemsure21:10
mriedemshitty performance but whatareyougonnado21:11
mriedemplus it's admin-only and no one knew it existed21:11
melwittyeah, exactly21:11
melwitthaha, right. we have that going for us21:11
mriedemis there a bug for this?21:11
melwittI'll add it to the pile o poopatches21:11
melwittno21:11
melwittI'll open one21:11
mriedemcool. not sure if we should report the token logging thing or just pretend i never said it.21:12
melwittI was just making the changes to only access consoleauth if [workarounds] and ran into this in the func tests21:12
melwittso I'm doing really good here21:12
melwittyeah, I'm not sure either. I expect it wouldn't cause a CVE because it's been this way for years21:14
mriedemheh, well, we've had things "this way for years" that are CVEs,21:19
mriedembut logging credentials and tokens and such isn't considered one of them21:19
mriedemit's a "hardening opportunity"21:20
melwitthaha, ok21:20
openstackgerritJay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/60762621:20
melwitthttps://bugs.launchpad.net/nova/+bug/179598221:21
openstackLaunchpad bug 1795982 in OpenStack Compute (nova) "/os-console-auth-tokens/{console_token} API doesn't handle the database backend" [High,Triaged] - Assigned to melanie witt (melwitt)21:21
*** spatel has joined #openstack-nova21:24
*** spartakos has joined #openstack-nova21:26
*** spatel has quit IRC21:29
melwittso, these other console create/delete/get APIs are connected to cell database models, with nothing at the API level to target cells for the consoles21:30
*** slaweq has quit IRC21:30
melwitt"nova-console, which is a XenAPI-specific service that most recent VNC proxy architectures do not use."21:31
melwittit sounds like that should be deprecated. we didn't do anything to handle it in a cells v2 world21:32
melwittmaybe I should send something to the ML to ask about it21:35
*** awaugama has quit IRC21:43
*** slagle has quit IRC21:44
mriedemi thought the xvp stuff was xen-only21:45
*** takashin has joined #openstack-nova21:45
melwittyeah, the nova-console service is xen-only. but if someone ran multi-cell with xen, the nova-console part wouldn't work right21:46
mriedembut yeah this is clearly busted in a cells v2 world21:46
*** tbachman has quit IRC21:47
melwittso the question will be, do we cells-v2-ify it or do we deprecate it. tbc, this is for the other APIs, not the consoleauth one I'm fixing21:47
melwitt*the other 421:47
mriedemyeah i know21:48
*** tbachman has joined #openstack-nova21:49
mriedemidk, i've asked about killing xvp in the past21:50
mriedemno one seems to know21:50
melwittah, ok21:50
mriedemi'd say if there are alternatives available for xenapi users, then we should deprecate it21:51
mriedemso probably a question for naichuans and BobBall21:51
melwittyeah, that's what I wasn't sure about, because IIUC, xenapi has to use some ancient version of stuff, so they might actually need it because they can't use newer VNC21:51
mriedemand yeah send something to the dev and ops MLs21:51
mriedemoh b/c of python 2.4?21:52
mriedemi might be thinking of something else21:52
mriedemi guess start with the ML21:52
melwittmaybe. when stephenfin worked on the encrypted console stuff, he had to exclude xenapi from the version requirement IIRC21:52
mriedemb/c 'twould be nice to drop all this crap21:52
melwittyeah21:52
melwittI was thinking of this https://github.com/openstack/nova/blob/master/nova/cmd/novncproxy.py#L4021:54
melwittso maybe unrelated since that implies xenapi users can use the regular novnc proxy21:55
*** munimeha1 has quit IRC21:56
melwittum, this is ancient. and not for exactly the same log you pointed out https://bugs.launchpad.net/nova/+bug/149214022:01
openstackLaunchpad bug 1492140 in OpenStack Compute (nova) "consoleauth token displayed in log file" [Low,In progress] - Assigned to Tristan Cacqueray (tristan-cacqueray)22:01
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Explicitly fail if trying to attach SR-IOV port  https://review.openstack.org/60772922:05
mriedemyeah that's for consoleauth22:05
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Ignore VirtDriverNotReady in _sync_power_states periodic task  https://review.openstack.org/60773022:07
*** s10 has joined #openstack-nova22:08
*** itlinux has quit IRC22:09
*** mlavalle has quit IRC22:11
mriedemgibi: efried: so i polished off this old dnm devstack test experiment patch to hammer the scheduler to create 1000 instances in a single request which used to give us a ConcurrentUpdate failure during scheduling and creating allocations, and now i can make it fail with consumer generation conflicts http://logs.openstack.org/18/507918/8/check/tempest-full/a9f3849/controller/logs/screen-n-sch.txt.gz?level=TRACE#_Oct_02_23_29_1248122:16
mriedemOct 02 23:29:12.475481 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: ERROR oslo_messaging.rpc.server [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Exception during message handling: AllocationDeleteFailed: Failed to delete allocations for consumer 6962f92b-7dca-4912-aeb2-dcae03c4b52e. Error: {"errors": [{"status": 409, "request_id": "req-13df41fe-cb55-49f1-a998-09b34e48f05b", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 3  ", "title": "Conflict"}]}22:16
efriedmriedem: Are you using any in-flight patches under that, or just master?22:17
*** spartakos has quit IRC22:17
mriedemi think that is newish right?22:17
mriedemmaster22:17
efriedYes, it's new, since the bottom few patches of gibi's consumer gen patches merged.22:17
efriedmriedem: You may want to try running it on top of https://review.openstack.org/#/c/583667/ and see if that fixes it.22:19
efriedmriedem: fyi, the ConcurrentUpdate and generation conflict are the same thing, we just switched the error code recently.22:20
efriedso it's not really "new", it's just wearing a different dress.22:20
mriedemthat change doesn't look like it would help here,22:20
efriedorly ynot?22:21
mriedemit doesn't use the latest consumer generation when deleting allocations right?22:21
mriedemwe're failing to submit allocations because the host is full22:21
mriedemOct 02 23:29:12.377430 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: WARNING nova.scheduler.client.report [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Unable to submit allocation for instance 63ae7544-7693-4749-886b-024dc93f09f9 (409 {"errors": [{"status": 409, "request_id": "req-0e85117c-871c-46c2-9e01-53c84e811b44", "code": "placement.undefined_code", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'b7709a93-f14c-42ed-addf-9736fb721728'. The requested amount would exceed the capacity.  ", "title": "Conflict"}]})22:21
mriedemand then the scheduler is trying to cleanup allocations created for previously processed instances in the same request22:21
mriedemand fails to do that cleanup b/c the consumer generation changed22:22
efriedhm, yeah, I didn't notice that the first one was a failure on deletion.22:22
efriedThat window I bitched about us not really closing22:22
efriedapparently it's big enough for us to actually hit it.22:22
mriedemnote this is also an extreme case,22:23
mriedemi'm creating 1000 instances in a single request22:23
mriedemexpecting to melt the scheduler22:23
mriedemand i do22:23
efriedI'm trying to figure out where that message is really coming from. "expected null but got 3" <== does this mean we sent null or 3 to the API?22:25
mriedemyeah so between the GET and PUT of the allocations, the consumer generation changed and we blew up22:25
melwittI wonder if a normaler number like 100 would do it? because I have heard of people doing that (oath)22:25
*** rcernin has joined #openstack-nova22:25
efriedright, I'm trying to figure out how that happens. What else is mucking with the allocations?22:25
mriedemhttps://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L208322:25
mriedem"If between the GET and the PUT the consumer                               # generation changes then we raise AllocationDeleteFailed."22:25
mriedemthe consumer is just the project/user right?22:25
melwitt(my comment was based on the use of the word "extreme")22:26
mriedemi mean, i guess we have the instance_uuid for the consumer here.22:26
efriedThat's the aforementioned window I bitched about, the result of which was the comment a couple of lines below that.22:26
mriedemis the consumer in placement the unique constraint of the uuid/project_id/user_id?22:26
efriedThere's a real consumer object now22:26
mriedemso the other thing here,22:26
efriedthe consumer object (a db table row) has a generation we have to update atomically or die.22:27
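In placement API terms (microversion 1.28 and later), the atomic-update contract efried describes looks roughly like this. A hedged sketch only: the endpoint, token, and error handling are stand-ins, not nova's report client.

```python
import requests

PLACEMENT = 'http://placement.example/placement'  # assumed endpoint
HEADERS = {'X-Auth-Token': 'admin-token',         # assumed auth
           'OpenStack-API-Version': 'placement 1.28'}

def delete_allocations(consumer_uuid, project_id, user_id):
    url = '%s/allocations/%s' % (PLACEMENT, consumer_uuid)
    current = requests.get(url, headers=HEADERS).json()
    payload = {
        'allocations': {},  # empty allocations == delete them (>= 1.28)
        'consumer_generation': current.get('consumer_generation'),
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put(url, json=payload, headers=HEADERS)
    if resp.status_code == 409:
        # another worker updated the consumer between our GET and PUT:
        # the placement.concurrent_update case in the trace above
        raise RuntimeError(resp.text)
```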
sean-k-mooneymelwitt: i have spawned 350 instances in one request before on newton and it worked fine22:27
mriedemis that because we have a retry decorator on the select_destinations rpc call, if we get MessagingTimeout from the scheduler b/c it takes too long to schedule 1000 instances in a single request, it re-sends the request to the scheduler with the same list of instances22:27
melwittsean-k-mooney: ack, that's a data point22:27
mriedemso at this point we would have 2 workers trying to create allocations for the same set of instances (consumers) against the same set of providers22:27
mriedemwhich will stomp all over themselves22:27
efriedoh, okay. That'd do it.22:27
mriedemwhat i'm wondering is if delete_allocation_for_instance should detect the consumer generation conflict and retry22:28
mriedemlike we do on claim_resources https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L174422:28
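The claim_resources pattern mriedem points at is, at its core, a small retry decorator. A generic sketch of the idea with illustrative names (not nova's actual helper, which retries on an internal Retry exception):

```python
import functools
import random
import time

def retries(attempts=3):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            for attempt in range(attempts):
                if fn(*args, **kwargs):  # fn returns False on a conflict
                    return True
                # jittered backoff before re-reading the latest consumer
                # generation and trying again
                time.sleep(random.uniform(0, 0.1 * (attempt + 1)))
            return False
        return wrapped
    return decorator

@retries()
def try_delete_allocations():
    ...  # re-GET allocations, PUT back with the fresh consumer_generation
```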
*** s10 has quit IRC22:28
efriedmriedem: We decided it should not22:31
efriedthere was a whole long ML thread about it.22:31
efriedBecause, we said, there really shouldn't be more than one thing acting on instance allocations at once, we said.22:31
efriedIMO the bug is  that ^22:32
mriedemyeah i'm writing up the bug now22:33
efriedmriedem: Do you need the ML thread?22:33
mriedemno22:33
mriedemhttps://bugs.launchpad.net/nova/+bug/179599222:33
openstackLaunchpad bug 1795992 in OpenStack Compute (nova) "retry_select_destinations decorator can make a mess with allocations in placement in a large multi-create request" [Medium,Triaged]22:33
efriedHeh. "make a mess".22:34
* efried has fond memories of "diaper failure"22:34
efried...fond because they're *memories*.22:34
mriedemtotal blowout22:34
efriedOne time in IKEA22:34
mriedemcoincidentally, lbragstad is dealing with that right now22:34
efriedOh, did he pop? Good deal.22:35
mriedemblack split pea soup coming out of everything22:35
mriedemlet me mind meld with him quick22:35
melwittcongrats lbragstad22:35
sean-k-mooneyoh before i forget i popped back to say i just found out that kernel 4.16 added a new netdevsim driver that supports, among other cool things, sriov. would people be ok with me creating an experimental gate job to test sriov using fedora28?22:36
efriedmriedem: Actually, that patch I mentioned before might possibly make the failure happen earlier in the sequence...22:36
efriedbecause surely the allocation is being overwritten22:36
efriedthough it might be subject to the same window-teeniness22:37
*** tbachman has quit IRC22:38
*** tssurya has quit IRC22:41
*** macza has quit IRC22:54
*** macza_ has joined #openstack-nova22:54
*** spartakos has joined #openstack-nova23:02
*** tbachman has joined #openstack-nova23:07
openstackgerritMatt Riedemann proposed openstack/nova master: Use long_rpc_timeout in select_destinations RPC call  https://review.openstack.org/60773523:07
mriedemdansmith: ^23:07
* dansmith nods in approval23:08
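The shape of that fix, roughly (a sketch against oslo.messaging, not the exact diff; the topic/version values are stand-ins, and long_rpc_timeout is nova's config option):

```python
import oslo_messaging as messaging
from oslo_config import cfg

CONF = cfg.CONF

def prepare_select_destinations(transport):
    target = messaging.Target(topic='scheduler', version='4.0')
    client = messaging.RPCClient(transport, target)
    # Heartbeat every rpc_response_timeout seconds while letting the call
    # run up to long_rpc_timeout overall, instead of timing out (and being
    # re-sent by the retry decorator, duplicating the scheduling work)
    # after a single short period.
    return client.prepare(
        call_monitor_timeout=CONF.rpc_response_timeout,
        timeout=CONF.long_rpc_timeout)
```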
*** macza_ has quit IRC23:12
*** tbachman has quit IRC23:12
mriedemmelwitt: ocata backport here should be ready to go https://review.openstack.org/#/c/605842/23:12
melwittok23:13
*** tbachman has joined #openstack-nova23:16
mriedemlyarwood: if you want to get these live migration ipv6 changes into the final ocata release before we put it into EM mode you'll need to get the pike and ocata backports fixed up https://review.openstack.org/#/q/I1201db996ea6ceaebd49479b298d74585a78b00623:24
*** artom has joined #openstack-nova23:30
melwittTIL unified object string fields are six.text_type i.e. unicode23:38
melwittdo we have any things where we compare strings agnostic to bytes vs unicode in unit tests?23:46
melwittthis test is asserting the api response as a dict23:47
melwittand if consoleauth served the request, it's bytes strings and if the unified object served the request, it's unicode strings23:47
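One way to make such an assertion agnostic to bytes vs unicode is to normalize everything to text before comparing. A hypothetical test helper, not something nova's test utils provide:

```python
def normalize(obj):
    """Recursively decode bytes to text so dict comparisons are
    representation-agnostic across py2 bytes and unified-object unicode."""
    if isinstance(obj, bytes):
        return obj.decode('utf-8')
    if isinstance(obj, dict):
        return {normalize(k): normalize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [normalize(item) for item in obj]
    return obj

# in the test: self.assertEqual(normalize(expected), normalize(response))
```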
*** itlinux has joined #openstack-nova23:48
*** mchlumsky has quit IRC23:49
*** slagle has joined #openstack-nova23:51
*** mriedem has quit IRC23:51
