Wednesday, 2018-10-03

openstackgerrit: Merged openstack/nova stable/rocky: Delete instance_id_mappings record in instance_destroy  https://review.openstack.org/604373  00:56
openstackgerrit: Merged openstack/nova stable/rocky: Fix stacktraces with redis caching backend  https://review.openstack.org/606895  01:09
openstackgerrit: Merged openstack/nova stable/rocky: Null out instance.availability_zone on shelve offload  https://review.openstack.org/606086  01:09
openstackgerrit: Merged openstack/nova stable/rocky: XenAPI/Stops the migration of volume backed VHDS  https://review.openstack.org/604203  01:13
openstackgerrit: Merged openstack/nova stable/pike: Follow devstack-plugin-ceph job rename  https://review.openstack.org/602022  01:14
openstackgerrit: Merged openstack/nova stable/pike: Fix unit test modifying global state  https://review.openstack.org/584592  01:14
openstackgerrit: Tetsuro Nakamura proposed openstack/nova stable/rocky: Fix aggregate members in nested alloc candidates  https://review.openstack.org/607454  02:36
mnaser: has anyone seen behaviour (on queens) where, doing PCI passthrough for GPUs, the qemu-kvm process comes up but seems to be spinning at 100% CPU and the process is stuck?  04:21
mnaser: strace just shows a bunch of ioctl's and poll's  04:21
mnaser: nothing weird in dmesg or journals  04:22
mnaser: nova side looks ok.. "Final resource view: name=<snip> phys_ram=524194MB used_ram=66560MB phys_disk=1863GB used_disk=225GB total_vcpus=48 used_vcpus=6 pci_stats=[PciDevicePool(count=7,numa_node=0,product_id='102d',tags={dev_type='type-PCI'},vendor_id='10de')]"  04:23
mnaser: Hmm, I have some leads. I might push some doc changes for the PCI passthrough  04:45
mnaser: Dunno if it's okay to have docs that are more hardware-specific  04:45
mnaser: Something along the lines of https://github.com/dholt/kvm-gpu/blob/master/README.md  04:45
takashin: cd /tmp  05:22
gmann: nova API office hour time  06:00
gmann: #startmeeting nova api  06:01
openstack: Meeting started Wed Oct  3 06:01:34 2018 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.  06:01
openstack: Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.  06:01
*** openstack changes topic to " (Meeting topic: nova api)"  06:01
openstack: The meeting name has been set to 'nova_api'  06:01
gmann: PING List: gmann, alex_xu  06:01
gmann: hanging around for some time if anyone has a query related to the API  06:08
openstackgerrit: OpenStack Proposal Bot proposed openstack/nova stable/rocky: Imported Translations from Zanata  https://review.openstack.org/604260  06:14
gmann: let's close the office hour.  06:32
gmann: #endmeeting  06:32
*** openstack changes topic to "Current runways: use-nested-allocation-candidates -- This channel is for Nova development. For support of Nova deployments, please use #openstack."  06:32
openstack: Meeting ended Wed Oct  3 06:32:39 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)  06:32
openstack: Minutes:        http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.html  06:32
openstack: Minutes (text): http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.txt  06:32
openstack: Log:            http://eavesdrop.openstack.org/meetings/nova_api/2018/nova_api.2018-10-03-06.01.log.html  06:32
openstackgerrit: Merged openstack/nova stable/ocata: Update RequestSpec.flavor on resize_revert  https://review.openstack.org/605880  06:37
openstackgerrit: Merged openstack/python-novaclient master: Fix up userdata argument to rebuild.  https://review.openstack.org/605341  06:47
bauzas: good morning nova  07:10
gibi: morning  07:28
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver  https://review.openstack.org/604083  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests  https://review.openstack.org/607287  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources  https://review.openstack.org/604084  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance  https://review.openstack.org/605785  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup  https://review.openstack.org/606050  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance  https://review.openstack.org/606111  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs  https://review.openstack.org/604125  07:41
openstackgerrit: Balazs Gibizer proposed openstack/nova master: consumer gen: support claim_resources  https://review.openstack.org/583667  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Enable nested allocation candidates in scheduler  https://review.openstack.org/585672  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Use provider tree in virt FakeDriver  https://review.openstack.org/604083  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Refactor allocation checking in functional tests  https://review.openstack.org/607287  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run ServerMovingTests with nested resources  https://review.openstack.org/604084  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of live migration for nested instance  https://review.openstack.org/605785  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Consider nested allocations during allocation cleanup  https://review.openstack.org/606050  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Ignore forcing of evacuation for nested instance  https://review.openstack.org/606111  07:52
openstackgerrit: Balazs Gibizer proposed openstack/nova master: Run negative server moving tests with nested RPs  https://review.openstack.org/604125  07:52
ralonsoh: stephenfin: https://review.openstack.org/#/c/476612/36/vif_plug_ovs/ovsdb/ovsdb_lib.py@83. I don't understand this  08:47
ralonsoh: stephenfin: do you mean I need to move this function... where?  08:47
openstackgerrit: Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add abstract OVSDB API  https://review.openstack.org/476612  08:54
* stephenfin looks  08:58
stephenfin: ralonsoh: Sorry, yeah, I mean move that above 'create_ovs_vif_port' or below 'delete_ovs_vif_port', so that 'create_', 'update_' and 'delete_' are grouped together  09:02
stephenfin: ralonsoh: It's a nit though. Don't worry about it  09:02
ralonsoh: stephenfin: done!  09:02
stephenfin: Oh, perfect :)  09:03
stephenfin: I'll review that now  09:03
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Rewrite the console doc  https://review.openstack.org/606148  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for RDP consoles  https://review.openstack.org/606992  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: doc: Add minimal documentation for MKS consoles  https://review.openstack.org/606993  09:38
openstackgerrit: Stephen Finucane proposed openstack/nova master: conf: Allow 'nova-xvpvncproxy' to be called with CLI args  https://review.openstack.org/606929  09:38
bauzas: does anyone know how to unstack/stack devstack with OFFLINE=True for a single project?  09:47
bauzas: I mean, I have all my stack but I just want to reinstall nova from master  09:47
sean-k-mooney: bauzas: unstack, then manually git pull/checkout master on the nova repo, set OFFLINE=True and stack  09:49
sean-k-mooney: devstack won't update any of the projects and will just use what is available  09:50
sean-k-mooney: if you are unlucky you might need to pip install requirements from master if they have changed, but that is rarely required  09:50
bauzas: sean-k-mooney: that was my plan, but I had assumed it would redeploy *all* projects from ENABLED_SERVICES  09:51
sean-k-mooney: bauzas: it will  09:51
bauzas: sean-k-mooney: my point is that I just want to init_nova(), honestly  09:51
sean-k-mooney: is that an issue  09:51
sean-k-mooney: oh, well, don't unstack then  09:51
bauzas: sean-k-mooney: my need is just that, now that I've reshaped some inventories, I just want to reshape back  09:51
sean-k-mooney: just check out the branch you want and run sudo systemctl restart devstack@n-*  09:52
sean-k-mooney: if the reshape is done by nova-manage, change restart to stop, then run nova-manage and then start  09:53
sean-k-mooney: as long as there are no schema migrations between your current branch and the one you're going to, and no requirements changes, you don't need to restack to change commit: just git checkout and restart service X  09:55
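A minimal sketch of the flow sean-k-mooney describes, assuming a default devstack layout under /opt/stack (paths and the service glob may differ per setup):

    cd /opt/stack/nova
    git checkout master && git pull
    sudo systemctl restart 'devstack@n-*'
    # if the reshape is driven by nova-manage: stop the services,
    # run the nova-manage command, then start them again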
cdent: yeah, that's what I was going to suggest  09:58
stephenfin: bauzas: Could you look at pushing https://review.openstack.org/#/c/456572/ through?  10:08
bauzas: sean-k-mooney: the problem is that the inventories are reshaped  10:17
bauzas: sean-k-mooney: so a DB sync won't work, right?  10:17
bauzas: because all the tables are there  10:17
sean-k-mooney: bauzas: you can always drop the tables and then sync, I guess  10:18
bauzas: if I'm dropping the tables, I miss e.g. https://github.com/openstack-dev/devstack/blob/master/lib/nova#L722-L723  10:19
bauzas: sean-k-mooney: ^  10:19
sean-k-mooney: i don't think there is any magical way to reshape without deleting the reshaped RPs and restarting nova-compute  10:20
sean-k-mooney: that is what i meant by dropping the tables  10:20
bauzas: I think I'll just unstack/stack for this time, and snapshot the DB  10:21
bauzas: so, when I want to go backwards, I'll just use the dumpfile  10:22
sean-k-mooney: ya, that would work, but other than for local testing, is there a reason you are trying to downgrade?  10:22
bauzas: sean-k-mooney: no, just testing indeed  10:22
sean-k-mooney: if you have never used it, https://www.heidisql.com/ is a great little tool for working with DBs  10:23
bauzas: well  10:23
sean-k-mooney: you need to run it under wine, however  10:23
bauzas: just for what I want, a mysqldump is enough  10:24
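For reference, the snapshot/restore bauzas has in mind might look like this (the database name and credentials are assumptions; devstack defaults vary):

    mysqldump nova_api > /tmp/nova_api-pre-reshape.sql    # snapshot before reshaping
    mysql nova_api < /tmp/nova_api-pre-reshape.sql        # restore to go backwards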
sean-k-mooney: the one thing that annoys me more than the fact that we use an 80-character line length is that we configure pep8 on specs for 79 characters  10:36
openstackgerrit: sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/605116  10:43
stephenfin: That moment of panic where you submit a review and spot a group of comments on older patchsets from the corner of your eye  10:57
stephenfin: What *did* Stephen of June 2018 have to say about this...  10:57
mdbooth: stephenfin: Fancy a bash at this one: https://review.openstack.org/#/c/605436/ I bitch about python3 in it ;)  11:08
openstackgerrit: sean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/605116  11:09
sean-k-mooney: mdbooth: one basic question regarding https://review.openstack.org/#/c/605436/5. the compute manager runs in the compute agent, which is single-threaded but uses eventlet, so there is no parallelism but there is concurrency. so the lock you are acquiring is the monkey-patched greenthread lock. is the reason we need the lock in the first place the fact that we are doing db io, and eventlet causes us to yield,  11:39
sean-k-mooney: allowing another invocation of the function to start concurrently, which races?  11:39
mdbooth: sean-k-mooney: Without going into the details of locking, I find it's safest to ignore eventlet entirely when considering locking.  11:40
mdbooth: When you tie yourself in knots trying to second-guess a scheduler, you make lots of mistakes.  11:41
sean-k-mooney: mdbooth: if we did not have eventlet in this case we would not need to lock  11:41
mdbooth: It has multiple threads of execution.  11:41
mdbooth: I don't care how we achieve that.  11:41
sean-k-mooney: the compute manager is executed from the compute agent, right, which does not have workers, so only one thread  11:42
mdbooth: Eventlet and python's 'threading' all have the same issues.  11:42
sean-k-mooney: mdbooth: yes, but my point is that eventlet introduced the concurrency, and that is therefore why we need a lock to be correct  11:43
mdbooth: sean-k-mooney: We have concurrency.  11:43
sean-k-mooney: if we did not have eventlet, the previous code would have been correct, because it would have been single-threaded  11:43
sean-k-mooney: mdbooth: yep, i know  11:44
mdbooth: sean-k-mooney: Sure. We do have concurrency, though. It uses eventlet.  11:44
sean-k-mooney: just making sure i understand the subtleties of the patch. this is an example of why i hate eventlet: it hides concurrency  11:44
mdbooth: sean-k-mooney: It doesn't really by the time you get into the compute manager.  11:44
mdbooth: You just ignore eventlet entirely and assume you have multiple concurrent threads. How they're implemented isn't all that important in that code.  11:45
mdbooth: If we later switched to a multi-threaded model, it would still be fine.  11:45
sean-k-mooney: mdbooth: well, if you were new to nova or did not think about it at the time, then you can write races because of eventlet more easily than if it was explicitly threaded  11:46
mdbooth: Honestly, I never consider eventlet. I assume it's explicitly threaded.  11:46
mdbooth: It's not, but that doesn't have any bearing on writing safe code, except when there are bugs in eventlet.  11:47
sean-k-mooney: mdbooth: most new people i have talked to that work on openstack don't think about threads at all, because they say "oh, it's python and that has a gil, so i don't need to care"  11:47
* mdbooth would tell anybody new to Nova to ignore eventlet and assume it's multi-threaded.  11:47
mdbooth: sean-k-mooney: That is just one of many issues with python in the real world :(  11:48
mdbooth: Also, the gil doesn't prevent overlapping threads of execution, it just stops them running at the same time. The problems are the same.  11:48
sean-k-mooney: mdbooth: yes and no. their perception is normally correct; it's eventlet that violates the paradigm  11:49
mdbooth: As an old curmudgeon, I think python has been extremely detrimental to software engineering, particularly in education  11:49
mdbooth: No, it would not be correct. If you have a multi-threaded python program, even though the gil prevents multiple threads running concurrently, you still need locks.  11:50
sean-k-mooney: mdbooth: i taught myself c++ as my first language, then java, so ya, i like to understand exactly what is going on  11:50
mdbooth: Basically: eventlet or python multithreading, I don't care. It shouldn't change how you write code.  11:51
sean-k-mooney: learning c++ first made me a better engineer than learning python would have.  11:51
mdbooth: They both use fake threading.  11:52
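A toy illustration of the point being argued, not nova code: the check-then-act below is unsafe under eventlet and real threads alike unless the lock is held, because anything that yields between the check and the update lets a second invocation interleave.

    import threading

    _lock = threading.Lock()   # becomes a green lock once eventlet monkey-patches threading
    _claimed = set()

    def claim(instance_uuid):
        with _lock:
            if instance_uuid in _claimed:    # check ...
                return False
            # a DB or API call here could yield to another green thread;
            # without the lock, two callers could both pass the check above
            _claimed.add(instance_uuid)      # ... then act
            return True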
openstackgerrit: Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/604400  11:53
mdbooth: I think ^^^ might fix that weird issue I was hitting  11:53
mdbooth: There's a shortcut in service_is_up if the service is forced down, and we forced it down  11:53
sean-k-mooney: mdbooth: yes, i know if it was explicitly multi-threaded it would be incorrect without the lock. anyway, cool, i'll take a look at that too  11:54
mdbooth: sean-k-mooney: Don't worry about ^^^ btw. Will just wait until zuul has voted.  11:55
mdbooth: sean-k-mooney: I knew I was missing something simple there.  11:55
sean-k-mooney: wait, why was it forced down?  11:55
sean-k-mooney: oh, so you could evacuate  11:56
mdbooth: The test_evacuate.sh script which ran before the second round...  11:56
mdbooth: yeah  11:56
sean-k-mooney: ha, yet another race, this time between tests :)  11:57
nicolasbock: Morning  12:00
nicolasbock: I have a runaway server, i.e. a VM that's running on a different hypervisor than the one Nova thinks it is on. So far I haven't quite been able to get the placement service to help me update Nova's view of reality...  12:01
nicolasbock: I can see the server with 'openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd'  12:02
nicolasbock: And it lists the wrong hypervisor  12:02
mdbooth: sean-k-mooney: $ git grep test\.nested | wc -l  12:02
mdbooth: 348  12:02
mdbooth: I'm not moving that ;)  12:02
mdbooth: sean-k-mooney: Feel free to write a follow-up patch  12:03
nicolasbock: I can also check that with 'openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd'  12:03
sean-k-mooney: mdbooth: ok, i would have just done nested=common.nested in test.py  12:03
nicolasbock: The VM is really running on 6cbb84b0-02f4-4ee3-9df2-151475b1effe  12:04
mdbooth: sean-k-mooney: Sure. I don't want to mess with test.nested here, though.  12:04
nicolasbock: But `openstack resource provider allocation set --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd` is not working  12:04
sean-k-mooney: mdbooth: ok, i'll submit a follow-up patch. eventually... i have added it to my whiteboard  12:05
nicolasbock: I suppose I am missing a `resource-class-name`  12:05
nicolasbock: But what do I put there?  12:05
openstackgerrit: Elod Illes proposed openstack/nova stable/ocata: Fix the help for the disk_weight_multiplier option  https://review.openstack.org/607537  12:08
sean-k-mooney: nicolasbock: was the vm migrated?  12:11
nicolasbock: Yes sean-k-mooney  12:12
sean-k-mooney: mdbooth: i was looking at an edge case recently where, if cleanup on the source fails, we don't update the vm host  12:12
sean-k-mooney: nicolasbock: e.g. when you finish migrating the instance, if the post-migrate job on the source node fails to, say, unplug a vif, we fail before we update the instance db record to reflect that the vm is running on the new node  12:14
sean-k-mooney: mdbooth: did you ever propose a patch for ^  12:14
mdbooth: sean-k-mooney: Probably.  12:15
sean-k-mooney: nicolasbock: are the placement allocations currently associated with the vm correct for the host it is actually on?  12:17
sean-k-mooney: e.g. if you ignore the db host value and actually check the vm's location  12:17
mdbooth: sean-k-mooney: Trying to parse your comment here: https://review.openstack.org/#/c/605436/5/nova/compute/manager.py@1447  12:18
mdbooth: Are you saying we can release the lock there?  12:18
mdbooth: Or yield the context manager there?  12:18
mdbooth: Or something else, because neither of ^^^ would be correct.  12:18
nicolasbock: The VM is running on `6cbb84b0-02f4-4ee3-9df2-151475b1effe`  12:18
nicolasbock: But placement says that it's on `57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b`  12:19
sean-k-mooney: mdbooth: i'm saying that if we don't have a lock when we invoke that line, the db request could cause us to yield, causing a race  12:19
mdbooth: Ah, *eventlet* yield  12:19
mdbooth: Ok.  12:19
sean-k-mooney: e.g. this is the thing that definitely needs to be in the critical section of the lock  12:19
nicolasbock: I ran `openstack server show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd`  12:20
nicolasbock: and `openstack resource provider allocation show 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd`  12:20
sean-k-mooney: mdbooth: ya, re-reading, that was not clear  12:21
sean-k-mooney: nicolasbock: is 57b0e4d5-3a3e-4cf3-ba8c-b88c8ce4679b the source or destination of the migration?  12:21
sean-k-mooney: nicolasbock: i assume the destination, correct?  12:22
sean-k-mooney: sorry, source  12:22
nicolasbock: I don't know what happened, but I would guess that it is the source  12:23
nicolasbock: Sorry, I am not sure I completely grasp the terminology of source and destination  12:23
sean-k-mooney: ya, so the resources being used on 6cbb84b0-02f4-4ee3-9df2-151475b1effe are likely still owned by the migration object in placement  12:23
nicolasbock: What's the migration object?  12:24
sean-k-mooney: when you do a migration we create a migration record that we use to claim resources on the destination host; then, when the vm is moved, we use a special atomic operation in the placement api to change the allocation consumer from the migration record's uuid to the vm's uuid  12:26
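Roughly, the atomic operation sean-k-mooney describes is placement's POST /allocations (microversion >= 1.13), which rewrites several consumers in one request; a hedged sketch of the payload shape, with placeholder UUIDs and amounts:

    POST /allocations
    {
      "<migration_uuid>": {"allocations": {}},
      "<instance_uuid>": {
        "allocations": {"<rp_uuid>": {"resources": {"VCPU": 4, "MEMORY_MB": 8192}}},
        "project_id": "...", "user_id": "..."
      }
    }

Emptying the migration consumer's allocations and writing the instance's happens atomically.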
nicolasbock: So you are saying that that atomic operation wasn't executed?  12:27
sean-k-mooney: yes  12:27
nicolasbock: Ok  12:27
nicolasbock: Can I get it to execute?  12:27
sean-k-mooney: so if you do nova server-migration-list 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd, does it have a migration object listed?  12:28
nicolasbock: No  12:28
sean-k-mooney: oh, hum, strange. perhaps the migration has already been confirmed.  12:30
sean-k-mooney: nicolasbock: efried might be able to help better than i if he is around  12:31
nicolasbock: So I thought that `openstack resource provider allocation set` would allow me to update the DB  12:31
nicolasbock: Thanks sean-k-mooney !  12:31
nicolasbock: But I am not using that command correctly, since it complains about an incorrect 'allocation string format'  12:32
openstackgerrit: Vlad Gusev proposed openstack/nova stable/pike: libvirt: Use os.stat and os.path.getsize for RAW disk inspection  https://review.openstack.org/607544  12:36
openstackgerrit: Matthew Booth proposed openstack/nova master: DNM: Run against mriedem's evacuate test  https://review.openstack.org/604423  12:37
mdbooth: This is an interesting query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20u'host3'%20%3D%3D%20u'host3'%5C%22  12:44
mdbooth: I wonder why that started spiking only a few days ago: code change, or infra change?  12:44
stephenfin: lyarwood: RE: https://review.openstack.org/588570 I'd been putting it off but will do that now. Will keep you posted  13:03
lyarwood: stephenfin: cheers  13:04
s10: How can I reopen bug https://bugs.launchpad.net/nova/+bug/1209101 ? It still exists.  13:06
openstack: Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli)  13:06
sean-k-mooney: s10: just change the status and leave a comment with details  13:07
s10: sean-k-mooney: I've left a comment, but I can't change the status; all of them are grey.  13:08
sean-k-mooney: s10: that said, i don't think it's necessarily the same bug. that was fixed in 2013  13:08
sean-k-mooney: it's more likely a regression  13:08
sean-k-mooney: are you seeing this on master?  13:08
s10: sean-k-mooney: Yes, this regression has never been fixed. I will write a comment about how to reproduce it.  13:09
sean-k-mooney: well, it was fixed and then reverted, so this likely needs to be treated as more than a bug fix, rather as a blueprint/spec  13:10
sean-k-mooney: the original bug fix predates microversions, so i think a mini spec + microversion bump would be required to alter the api behavior  13:13
sean-k-mooney: bauzas: is ^ correct  13:13
bauzas: context?  13:14
sean-k-mooney: private flavors are automatically exposed to new tenants  13:14
sean-k-mooney: because https://bugs.launchpad.net/nova/+bug/1209101 was reverted  13:15
openstack: Launchpad bug 1209101 in OpenStack Compute (nova) "Non-public flavor cannot be used in created tenant" [High,Fix released] - Assigned to Sumanth Nagadavalli (sumanth-nagadavalli)  13:15
sean-k-mooney: sorry, are *not* automatically exposed  13:15
sean-k-mooney: bauzas: so the context is: if we want to change the behavior of the api to auto-grant access to the private flavor, that would require a spec rather than being just a bug fix, right, as it's an api change?  13:18
bauzas: sean-k-mooney: IIUC, I'd tend to say yes, as it's a behavioural change  13:20
bauzas: we don't really call the fact of not showing private flavors a "bug"  13:20
bauzas: some people would also like to keep this behaviour, I guess  13:20
bauzas: and last but not least, two OpenStack clouds could behave differently for the same request and list of flavors, which is not interop  13:21
bauzas: HTH  13:21
sean-k-mooney: s10: i'm just having lunch, but based on bauzas's confirmation, rather than reopen the bug i would suggest you file a nova spec. if you don't have time to do that, i can see if i can do it later today, but you have more context than i do as to what you want to achieve  13:34
efried: nicolasbock: Still around?  13:39
efried: nicolasbock: mriedem would be a better source of CLI syntax help, but I can tell you what the API call itself would need to look like.  13:40
mnaser: does anyone know if CERN does pci passthrough on centos or ubuntu?  13:41
dansmith: I thought they were all centos  13:42
mnaser: i'm trying to figure out why pci passthrough isn't working  13:43
mnaser: everything nova side is working ok, but the newly spawned qemu-kvm process is stuck spinning at 100% cpu, no console logs, libvirt qemu logs show nothing..  13:43
mnaser: so i'm a bit at a loss, not really sure where to go next  13:44
s10: sean-k-mooney: Basically, what I want to achieve is to be able to give tenants the ability to manage the private flavors which they created.  13:44
s10: sean-k-mooney: But this will be more complicated than an automatic flavor-access add after private flavor creation...  13:46
sean-k-mooney: s10: ok, if that is the use case then that should be relatively simple to capture in a spec. i'll write up a spec.  13:46
sean-k-mooney: s10: oh, why should we not just allow the tenant that created the flavor to have access to it automatically?  13:47
sean-k-mooney: for interop reasons we will need a microversion bump, but that is just a mechanical side effect, not germane to the feature  13:48
s10: sean-k-mooney: I want them to have access to the created flavor automatically because there is no RBAC in nova for private flavors. Flavors don't have an owner. I can't only allow tenants to run flavor-access-add for "their" flavors :(  13:53
mnaser: ou  13:58
mnaser: i wonder if this has to do with using lvm local storage  13:58
mnaser: and the qemu user not being able to access it  13:58
mnaser: hmm  13:58
mnaser: yup, it can't access it  13:59
mnaser: would it be the responsibility of nova to set up permissions on volumes under lvm so that the qemu user can read them?  14:00
mdbooth: I just ran a git bisect to try to find out why the incidence of test_parallel_evacuate_with_server_group failures went from occasional to almost 50% in the last few days, and the culprit is my patch from the other day: https://review.openstack.org/#/c/604859/  14:02
mdbooth: I still consider this a feature :)  14:03
mriedem: dansmith: did you know there was a cinder_img_volume_type metadata key which can be used to create a bootable volume with a specific volume type? https://docs.openstack.org/cinder/latest/cli/cli-manage-volumes.html#volume-types  14:14
mriedem: so technically people today could boot from volume from an image with that metadata and get what they wanted instead of passing a volume type to nova - clunky, i know  14:15
dansmith: on the image  14:15
mriedem: yeah,  14:15
dansmith: I did not know that, no  14:15
mriedem: likely also means that we need to make a decision in the compute API if the user specifies a volume type and the source image has that metadata key/value: which do we pick? or do we 400?  14:15
mriedem: probably need to know what cinder does in that same case  14:16
mriedem: smcginnis: do you know off the top of your head? ^  14:16
dansmith: seems to me like if they ask for something on the boot request, that always wins  14:17
dansmith: like, we have a default type, and there can be a default for an image,  14:17
smcginnis: If you explicitly provide a type, I believe we will give that priority over the image property.  14:17
dansmith: but if they ask for something specific at the time, I would expect they want the one they asked for  14:17
mriedem: yeah, that's what i'd expect too  14:17
smcginnis: So the fallback can be to have the volume type stuffed in the image properties, but that should not change the primary usage of someone saying specifically what they want.  14:17
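For context, setting that image property looks something like this (the image name and volume type here are made up):

    openstack image set --property cinder_img_volume_type=high-iops my-bootable-image
    # a later boot-from-volume using this image should get the 'high-iops' volume type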
* bauzas facepalms when he looks at https://stackoverflow.com/questions/11941817/how-to-avoid-runtimeerror-dictionary-changed-size-during-iteration-error  14:20
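The classic shape of that error, and the usual fix of iterating over a snapshot of the keys (a generic Python example, unrelated to any particular nova patch):

    d = {'a': 1, 'b': 2}
    # for k in d: del d[k]   ->  RuntimeError: dictionary changed size during iteration
    for k in list(d):        # iterate over a copy of the keys instead
        del d[k]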
mnaser: yup. that was the issue  14:20
mnaser: if you use lvm on centos with nova, the volumes are created under user 'root'  14:20
mnaser: so the qemu process can't touch them and it can't boot  14:21
mnaser: is this technically a nova bug?  14:21
mnaser: (aka the devices on the system /dev/vg_foo/vmuuid_disk are root:root, qemu-kvm runs as qemu:qemu)  14:21
nicolasbock: Hi efried, I am still here  14:23
efried: nicolasbock: So looking at the manual (https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-allocation-set) it appears as though you're going to want multiple --allocation params...  14:25
s10: mriedem: Can we remove the line https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L3669 ? It causes bug https://bugs.launchpad.net/nova/+bug/1746972 . I don't believe that an error in set-admin-password on a running instance should put the instance into an error state and require cloud admin intervention to reset the instance's state, or that the user should have to abandon and remove the vm.  14:25
openstack: Launchpad bug 1746972 in OpenStack Compute (nova) "After setting the password failed, the VM state is set to error" [Undecided,Confirmed]  14:25
nicolasbock: Yes, that's my reading too efried  14:25
nicolasbock: But I don't understand what the other parameters should look like  14:26
efried: nicolasbock: I *think* each should look like --allocation rp=$rp_uuid,$rc=$amount  14:26
dansmith: mnaser: what do you want nova to do? chown them? afaik, it doesn't know what user qemu will run as  14:26
openstackgerrit: Merged openstack/nova stable/pike: nova-status - don't count deleted compute_nodes  https://review.openstack.org/604788  14:26
nicolasbock: Ok. What I don't get is why I need a resource class in there as well. I don't want to update anything in terms of resource classes  14:27
efried: nicolasbock: Oh, but you do :)  14:27
nicolasbock: I do?  14:27
efried: nicolasbock: I guess it's obvious to me because I know what the REST payload looks like, but come to think of it, it makes sense how you're thinking about it.  14:27
nicolasbock: Maybe I am not looking at resource classes correctly  14:28
efried: nicolasbock: See, the allocations in the API are a hierarchical structure: resource provider => resource class => amount  14:28
nicolasbock: But the way I am thinking about them is that they specify things like memory and CPU cores  14:28
nicolasbock: Ok  14:28
efried: And also the CLI (and the API it's using) is designed to fully *replace* allocations, not edit pieces of them.  14:28
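That hierarchy shows up directly in the REST payload the CLI builds; a rough sketch (recent placement microversions, >= 1.12, use this dict form; the amounts are placeholders):

    PUT /allocations/{consumer_uuid}
    {
      "allocations": {
        "<rp_uuid>": {"resources": {"VCPU": 4, "MEMORY_MB": 8192, "DISK_GB": 80}}
      },
      "project_id": "...", "user_id": "..."
    }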
efried: So given that...  14:29
efried: nicolasbock: You should do `openstack resource provider allocation show $instance_uuid`  14:29
efried: Which should give you allocations in three-ish resource classes  14:29
efried: nicolasbock: can you pastebin me that output?  14:29
dansmith: you know what would be awesome  14:29
dansmith: openstack resource provider allocation edit <uuid>  14:30
nicolasbock: https://pastebin.com/sFdzQuPE  14:30
dansmith: like virsh edit  14:30
openstackgerrit: Matt Riedemann proposed openstack/nova master: doc: fix and clarify --block-device usage in user docs  https://review.openstack.org/607589  14:30
sean-k-mooney: dansmith: that could be done as a client-only feature, but yes, that would be nice  14:31
dansmith: sean-k-mooney: obviously client-only  14:31
sean-k-mooney: well, i was debating whether you would want to put it in the openstack sdk or just the osc plugin  14:31
sean-k-mooney: but ya, i think just in the plugin  14:32
dansmith: oh, I meant just in the plugin  14:32
dansmith: yeah  14:32
dansmith: and you could translate to/from yaml for the actual editing maybe  14:32
dansmith: so people aren't having to hand-edit json  14:32
dansmith: since you need to validate the schema before you send it back anyway  14:32
mnaser: dansmith: yeah, that's why i don't think it's a nova problem, but maybe something that we should document.. or libvirt should  14:33
sean-k-mooney: sounds like a nice low-hanging-fruit bug  14:33
efried: nicolasbock: Okay, so I think you're going to want to build your command with:  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,MEMORY_MB=8192 \  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,VCPU=4 \  14:33
efried: --allocation rp=6cbb84b0-02f4-4ee3-9df2-151475b1effe,DISK_GB=80  14:33
mriedem: ha  14:33
mriedem: "InstancePasswordSetFailed: Failed to set admin password on  14:33
mriedem: 9f9330c2-4ab4-45f1-a9f9-2770dd34cf30 because error setting admin password"  14:33
nicolasbock: Ah ok  14:34
mriedem: "we failed because we failed"  14:34
efried: mriedem: duh  14:34
nicolasbock: Let me try that  14:34
mriedem: s10: i don't know why the instance is put into ERROR state there; i want to say i've seen a patch to remove that  14:34
efried: nicolasbock: Note that's gotta be all in one command. Otherwise you'll end up with an instance with just disk :)  14:34
s10: mriedem: yes, I see, there is https://review.openstack.org/#/c/555160/  14:35
nicolasbock: Good point efried :)  14:35
mriedem: efried: nicolasbock: might be sensible to have an "openstack resource provider allocation class set" similar to the inventory class set CLI https://docs.openstack.org/osc-placement/latest/cli/index.html#resource-provider-inventory-class-set  14:36
nicolasbock: So the command worked  14:36
mriedem: ^ allows you to set inventory on a provider for a specific class, rather than replace the entire set of inventory for the provider  14:36
nicolasbock: But now I have https://pastebin.com/KrcWAXbF  14:36
mriedem: that uses https://developer.openstack.org/api-ref/placement/#update-resource-provider-inventory  14:36
sean-k-mooney: efried: that is probably another reason to have an edit command, since this all needs to be done atomically  14:36
mriedem: we don't have an api like that for allocations, which is why there isn't a CLI for it  14:37
mriedem: we just have https://developer.openstack.org/api-ref/placement/#update-allocations  14:37
openstackgerrit: Merged openstack/nova stable/rocky: Ignore VirtDriverNotReady in _sync_power_states periodic task  https://review.openstack.org/605533  14:37
mriedem: but we could easily write a command that just updates one of the resource classes within the existing allocations  14:37
nicolasbock: Yes, that sounds sensible mriedem  14:37
efried: nicolasbock: Oh, interesting. That's... probably a bug.  14:37
nicolasbock: :)  14:38
mriedem: i very much doubt osc-placement handles consumer generations yet, so it could be racy for the CLI to orchestrate this  14:38
nicolasbock: I should remove the old allocation, right?  14:38
mriedem: but that's probably a low risk  14:38
efried: nicolasbock: Yeah, except the only way to do that is openstack resource provider allocation delete $instance_uuid, which (I sincerely hope) removes all of them.  14:38
efried: nicolasbock: actually, what may have happened is that the source host still thinks it has the instance, and it "healed" the allocations.  14:39
nicolasbock: All of them?  14:39
efried: That would be something to look in the logs for.  14:39
nicolasbock: Ok  14:39
mriedem: do you have ocata computes?  14:39
nicolasbock: But if it removes all of them, wouldn't that be bad?  14:39
efried: nicolasbock: Well, if you remove all of them, then you can run your 'set' command to restore the proper ones.  14:40
efried: But  14:40
mriedem: if you have ocata computes, the resource tracker is reporting the allocations it thinks exist to placement  14:40
efried: if my suspicion is correct, once you delete all the allocations and wait a minute, the original (source) allocations will magically reappear.  14:40
efried: okay, so mriedem, that would explain the source allocs magically reappearing?  14:41
nicolasbock: mriedem: This is using Newton  14:42
nicolasbock: I'll try to delete the allocation  14:43
nicolasbock: And wait to see what happens :)  14:43
mriedem: newton/ocata computes will recreate allocations, yes  14:43
mriedem: until you get everything upgraded to >= pike, the resource tracker periodic task in the compute service will try to manage allocations  14:44
nicolasbock: The new allocation was deleted while we were chatting  14:44
nicolasbock: Interesting mriedem  14:44
nicolasbock: But where is the periodic task getting its information from?  14:45
mriedem: the instances it thinks are running on that host,  14:46
mriedem: and those instances' flavors  14:46
nicolasbock: Is there a way to update that?  14:46
mriedem: so if compute host A thinks instance B is running on it with a flavor that uses x,y,z vcpu/ram/disk, it's going to report that  14:46
mriedem: update what?  14:47
nicolasbock: So I would have to convince the compute host that it's not running the instance?  14:47
efried: nicolasbock: I kind of missed how we got into this situation. What makes you think the instance was successfully removed from the source host?  14:47
mriedem: nicolasbock: is the instance.host in the db pointing at that host?  14:47
nicolasbock: I am going by what `openstack server show` is telling me :)  14:47
mriedem: server show should also tell you, yeah  14:48
openstackgerrit: Matthew Booth proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/604400  14:48
nicolasbock: So `server show` is reporting an incorrect hypervisor  14:49
mriedem: this is where the RT gets the instances it thinks are running on it: https://github.com/openstack/nova/blob/newton-eol/nova/compute/resource_tracker.py#L556  14:49
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  14:49
sean-k-mooney: mriedem: there is a live-migration edge case that mdbooth was looking at a few weeks ago where post-migrate on the source failed and we would not update the host the vm was running on  14:50
sean-k-mooney: but the vm had actually been moved correctly  14:50
mriedem: nicolasbock: so did you live migrate this vm or something? why is nova reporting it's on the wrong host?  14:52
mdbooth: Ah, yes. I do recall a bug with that. If we get an error in cleanup on the source host, called *post* successful migration, we then roll back the migration and put the instance in an error state, but it's still running fine on the destination.  14:54
mdbooth: So, e.g. if you get an error in terminate_connection or whatever, you get in this state  14:54
mdbooth: And you can't clean it up, because instance.host is pointing to the source, but it's actually running on the dest  14:54
mriedem: terminate_connection as in cleaning up source node volume attachments and such, right?  14:55
mriedem: same with ports, i'm sure  14:55
mriedem: post live migration, cleaning up the source  14:55
mdbooth: mriedem: Right. Any cleanup on the source  14:55
mriedem: we should just catch and log cleanup failures  14:55
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  14:55
mdbooth: mriedem: Right. The error in my view was that we put the instance in an error state when the instance was fine. We should put the migration in an error state, but leave the instance alone.  14:57
mdbooth: And also do as much cleanup as possible in the presence of errors.  14:57
nicolasbock: mriedem: Yes, I think that's what happened  14:57
openstackgerrit: Jan Gutter proposed openstack/nova-specs master: Spec to implement vRouter HW offloads  https://review.openstack.org/567148  15:07
openstackgerrit: Jan Gutter proposed openstack/nova-specs master: Spec to implement generic HW offloads for os-vif  https://review.openstack.org/607610  15:07
openstackgerrit: Claudiu Belu proposed openstack/nova master: tests: autospecs all the mock.patch usages  https://review.openstack.org/470775  15:09
melwitt: .  15:10
mriedem: s10: commented in that patch  15:12
mriedem: nicolasbock: so live migration was successful, but something failed in post, like mdbooth is mentioning  15:12
nicolasbock: Ok  15:12
mriedem: nicolasbock: you'll likely need to manually update the instances.host value in the db then for that instance  15:12
mriedem: otherwise nova-compute on the source host is going to continue thinking it owns the instance  15:13
s10: mriedem: thank you  15:14
nicolasbock: Ok; other than sounding mildly scary, could you give me a pointer to where I find that value, mriedem?  15:14
mriedem: do you know where the guest is actively running now?  15:14
openstackgerrit: Merged openstack/nova stable/ocata: [Stable Only] Add amd-ssbd and amd-no-ssb CPU flags  https://review.openstack.org/607296  15:14
nicolasbock: Yes  15:15
mriedem: it should be in the last live-migration migration record for the instance  15:15
nicolasbock: Ok  15:15
mriedem: well, then you just update the table record in the nova db  15:15
mriedem: update instances set host=<host> where uuid=<instance uuid>;  15:16
nicolasbock: Ok, sounds so straightforward when you put it like that ;)  15:16
nicolasbock: I'll give it a try  15:16
nicolasbock: Thanks!  15:16
sean-k-mooney: mriedem: out of interest, what would happen if you tried to do a hard reboot or other lifecycle action on a vm in this state, with the wrong host set?  15:19
openstackgerrit: Vlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/555160  15:19
sean-k-mooney: would it repair the instance or try to start it on the wrong host?  15:19
mriedem: it would try to start it on the wrong host  15:21
mriedem: i don't know what would then happen - would you get the same instance running on two hosts? or a domain not found from the wrong host when trying to reboot it?  15:21
mriedem: i'd hope the latter  15:21
sean-k-mooney: right, which if it was using shared storage could lead to data corruption, correct  15:21
mriedem: well, i'd hope reboot would fail if the guest isn't actually on the hypervisor  15:22
sean-k-mooney: i would hope the latter too  15:22
openstackgerrit: Elod Illes proposed openstack/nova stable/ocata: Don't delete neutron port when attach failed  https://review.openstack.org/607614  15:22
mriedem: doesn't look like it would fail though: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L849  15:23
mriedem: we just handle the not found and assume the guest is already gone  15:23
mriedem: so maybe there is something to be said for putting the instance into ERROR state on failed post live migration  15:24
mriedem: to force the admin to fix it  15:24
sean-k-mooney: mriedem: right, we need that in cases where we are using hard reboot to "fix" things  15:25
mriedem: so the user doesn't try to reboot the thing and screw it up  15:25
sean-k-mooney: mriedem: perhaps; when i discussed this with mdbooth previously i was suggesting always updating the host to the correct location of the vm  15:26
sean-k-mooney: then you could decide whether it should stay in error or active state separately, without having more bugs if you left it in active  15:26
openstackgerrit: Claudiu Belu proposed openstack/nova master: hyper-v: autospec classes before they are instantiated  https://review.openstack.org/342211  15:27
sean-k-mooney: mriedem: or to put that another way, i think there are two issues in the other case: 1. the host is not updated when the vm is moved in some cases, and 2. what to do when cleanup fails post migration  15:28
mriedem: if post live migration set the instance.host to the correct host on which it's running, then yeah, my concern about the user rebooting it and now having the same guest on different hosts is less of an issue  15:28
sean-k-mooney: nicolasbock: if you are still around, the cliff-notes version of that conversation is that you might want to consider locking the instance until you have repaired the db, to prevent any lifecycle events  15:32
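Something like the following, reusing the instance UUID from earlier in the conversation (admin credentials assumed):

    openstack server lock 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd
    # ... fix instances.host in the nova DB ...
    openstack server unlock 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd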
mdbooth: mriedem: IIRC my thought at the time was that we should completely update the instance record for the destination immediately after we switch it, then run source cleanup.  15:33
mdbooth: So if we fail for whatever reason, we've recorded that we're running on the dest.  15:33
mriedem: s10: i think the unit test is going to fail in that patch, see comments for why  15:37
mdbooth: Incidentally, to reiterate something I said earlier: when this patch landed late last week, it made the test_parallel_evacuate_with_server_group failure about 20 times more likely to occur: https://review.openstack.org/#/c/604859/  15:37
mdbooth: That test is now failing around 50% for me.  15:38
mriedem: we can skip the test for now  15:38
mriedem: while the fix is being reviewed  15:38
mdbooth: mriedem: ack. Given ^^^ it seems that the test has never been good.  15:38
openstackgerrit: Rodolfo Alonso Hernandez proposed openstack/os-vif master: Add native implementation OVSDB API  https://review.openstack.org/482226  15:40
openstackgerrit: Matt Riedemann proposed openstack/nova master: Skip test_parallel_evacuate_with_server_group until fixed  https://review.openstack.org/607620  15:43
mriedem: dansmith: efried: mdbooth: ^ gives time to be comfortable with the fix  15:43
mdbooth: mriedem: ack.  15:44
efried: mriedem: What will the criteria for comfort be?  15:44
efried: mriedem: https://review.openstack.org/#/c/605436/ has three +1s and a +2  15:45
mriedem: i guess that's up to whoever +Ws it  15:45
efried: mriedem: Well, I'm comfortable that it fixes the problem. But I don't feel confident enough in the actual code change to +W. I would think someone like.... mriedem would be able to have that confidence.  15:46
mriedem: i'm not very confident in anything atm  15:47
dansmith: I think that change needs a lot of inspection  15:48
dansmith: which I can't do right this moment  15:48
mriedem: who enjoys a good UnboundLocalError? https://github.com/openstack/nova/blob/237ced4737a28728408eb30c3d20c6b2c13b4a8d/nova/network/neutronv2/api.py#L1429  15:59
mriedem: oh, i guess it's not unbound, it's a module import...  16:02
mriedem: so uh, if we hit ^ shouldn't we fail the build?  16:09
mriedem: clearly the user isn't going to get the sriov port attached to the guest that they requested  16:10
openstackgerrit: Jay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/607626  16:11
openstackgerrit: Matt Riedemann proposed openstack/nova master: Fix logging parameter in _populate_pci_mac_address  https://review.openstack.org/607628  16:12
mriedem: sean-k-mooney: you might be interested in https://bugs.launchpad.net/nova/+bug/1795064  16:12
openstack: Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New]  16:12
mriedem: something something sriov and kernel versions  16:13
sean-k-mooney: looking  16:13
jaypipes: mriedem: https://review.openstack.org/#/c/607626/ is that stable/ocata backport for the duplicate hypervisor_hostname thingie  16:13
jaypipes: mriedem: thx for your help earlier.  16:13
mriedem: jaypipes: so you cherry-picked that from master?  16:15
mriedem: https://review.openstack.org/#/c/508555/  16:15
mriedem: or did you cherry-pick from the pike backport but forget the -x option on the cherry-pick command?  16:15
spatel_: Hi folks  16:15
jaypipes: mriedem: no, I cherry-picked the SHA1 from the stable/pike patch  16:15
jaypipes: mriedem: oh, sorry, I don't know about -x :(  16:16
spatel_: I am having an issue with SR-IOV with a shared PCI device between NUMA nodes, and am reading this blueprint: https://blueprints.launchpad.net/nova/+spec/share-pci-between-numa-nodes  16:16
sean-k-mooney: spatel_: this bug, https://bugs.launchpad.net/nova/+bug/1795064, or another?  16:17
openstack: Launchpad bug 1795064 in OpenStack Compute (nova) "SR-IOV error IndexError: pop from empty list" [Undecided,New]  16:17
spatel_: I have set hw:pci_numa_affinity_policy='preferred' in the flavor, but it's still not allowing me to run an instance on NUMA-1  16:17
spatel_: sean-k-mooney: that problem got resolved by downgrading the kernel to 3.x  16:17
sean-k-mooney: spatel_: i think your issue with 4.18 was that you did not have a netdev associated with the vf  16:18
spatel_: Is that a configuration issue or a BUG?  16:19
sean-k-mooney: spatel_: i would say config issue. i would guess the default options for the kernel module changed and/or you are using a different driver by default  16:19
jaypipes: mriedem: apologies. how can I fix appropriately? do I need to re-do the git cherry-pick with -x? or can/should I just edit the commit message with something?  16:19
sean-k-mooney: spatel_: for example, if the device was bound to vfio_pci instead of the broadcom driver, then it would exist in lspci but not have a netdev  16:20
openstackgerrit: Artom Lifshitz proposed openstack/nova master: WIP: Libvirt live migration: update NUMA XML for dest  https://review.openstack.org/575179  16:20
openstackgerrit: Artom Lifshitz proposed openstack/nova master: Service version check for NUMA live migration  https://review.openstack.org/566723  16:20
spatel_hmmm!16:21
sean-k-mooneyspatel_: do you whitelist devices using the devname option?16:21
mriedemjaypipes: see my other comment in the ocata backport about documenting the conflicts?16:21
spatel_sean-k-mooney: yes i am using devname option to specify my interface16:21
spatel_pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }"16:21
sean-k-mooneyspatel_: in general i advise against that for this exact reason16:21
sean-k-mooneyspatel_: if you use the pci address instead, 4.18 would likely be fine16:21
spatel_pci address ?16:22
*** macza has quit IRC16:22
sean-k-mooneyspatel_: the whitelist supports 3 forms of whitelisting16:22
spatel_you mean vendor_id or product_id ?16:22
*** macza has joined #openstack-nova16:22
sean-k-mooneyspatel_: you can use devname, (vendor_id and product_id), or you can pass a pci address16:23
spatel_sean-k-mooney: i will give it a try and report back on the bug16:23
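For reference, a hedged sketch of those three whitelist forms as nova.conf entries (the address and IDs below are illustrative placeholders, not values from this conversation):

    [pci]
    # by netdev name (tied to kernel device naming, per the discussion above)
    passthrough_whitelist = { "physical_network": "vlan", "devname": "eno2" }
    # by vendor/product ID (matches every device with those IDs)
    passthrough_whitelist = { "physical_network": "vlan", "vendor_id": "14e4", "product_id": "16af" }
    # by PCI address (immune to netdev renames across kernel versions)
    passthrough_whitelist = { "physical_network": "vlan", "address": "0000:04:00.1" }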
spatel_sean-k-mooney: currently i am dealing with this issue :( https://bugs.launchpad.net/nova/+bug/179592016:23
openstackLaunchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,New]16:23
spatel_Do you know what i am doing wrong here?16:23
*** panda is now known as panda|off16:24
spatel_I have 2 NUMA node and running SR-IOV with shared PCI16:24
*** macza has quit IRC16:24
sean-k-mooneyhttps://docs.openstack.org/mitaka/networking-guide/config-sriov.html has the doc16:24
sean-k-mooneyum, let me look16:24
spatel_I am only able to use one NUMA node16:24
sean-k-mooneyby default, unless you set a pci numa affinity policy in the flavor or image, we require strict numa affinity16:25
spatel_It's not allowing me to launch an SR-IOV instance on NUMA-2 (because the PCI device is attached to NUMA-1)16:25
spatel_All i did was set hw:pci_numa_affinity_policy=preferred in the flavor16:25
spatel_what else i need to do?16:25
sean-k-mooneyspatel_: let me check. i thought that was enough but you might also need to set the policy in the whitelist16:26
spatel_I do have aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated'  in flavor16:26
*** helenafm has quit IRC16:26
spatel_I think the blueprint's documentation isn't clear so i am totally confused :(16:27
spatel_If i remove "aggregate_instance_extra_specs:pinned='true', hw:cpu_policy='dedicated'" from the flavor then i am able to launch an instance on either NUMA node with SR-IOV support16:28
*** dims has quit IRC16:28
sean-k-mooneyaggregate_instance_extra_specs:pinned='true' is not a standard thing16:29
*** med_ has quit IRC16:29
spatel_hmmm! i didn't get it16:30
sean-k-mooneyspatel_: you should not need to set anything in the aggregate to use the pci policies16:30
openstackgerritJay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/60762616:30
spatel_oh! so you are saying i should remove aggregate_instance_extra_specs:pinned16:30
jaypipesmriedem: k, hopefully correct now.16:30
jaypipesthx for the help again.16:31
sean-k-mooneyspatel_: yes16:31
spatel_let's say i remove the "aggregate" spec; do my vCPUs still get pinned?16:31
spatel_removing the aggregate spec and going to launch an instance16:33
*** med_ has joined #openstack-nova16:34
*** dims_ has joined #openstack-nova16:35
spatel_sean-k-mooney: didn't work, error: 'No valid host was found. There are not enough hosts available'16:36
spatel_looks like something is missing..16:36
sean-k-mooneyspatel_: so just to confirm, you don't have any aggregate metadata set and you have hw:pci_numa_affinity_policy=preferred set16:38
sean-k-mooneyin the flavor16:38
spatel_This is what i have currently in flavor  ->  properties                 | hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred'16:39
spatel_If i remove all 3 options then i am successfully able to launch an instance16:39
stephenfinspatel_, sean-k-mooney: We didn't implement it with a flavor extra spec in the end16:39
sean-k-mooneyso this has to be set in the pci whitelist then16:40
stephenfinspatel_: Yes16:40
stephenfinOops, sean-k-mooney ^16:40
spatel_oh!! wait wait.. so what do i need to do in the pci whitelist?16:40
sean-k-mooneyspatel_: https://github.com/openstack/nova/blob/master/nova/pci/request.py#L16-L2516:40
sean-k-mooneysorry, that's not what you want, but yes you do16:41
spatel_so i need to add that snippet to the compute node's nova.conf in the [pci] section?16:41
stephenfinspatel_: Have you seen this? https://docs.openstack.org/nova/latest/admin/networking.html#numa-affinity16:41
stephenfinspatel_: Ignore that - wrong feature :)16:41
spatel_ok16:41
spatel_ I am running queens16:41
stephenfinspatel_: https://docs.openstack.org/nova/latest/configuration/config.html#pci16:42
stephenfinSee the alias configuration key16:42
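For illustration, the alias stephenfin is pointing at looks something like this in nova.conf (IDs are placeholders; numa_policy is the field the share-pci-between-numa-nodes spec proposed, and aliases apply to flavor-requested passthrough devices, not neutron ports — which is exactly the gap discussed below):

    [pci]
    alias = { "name": "my-vf", "vendor_id": "14e4", "product_id": "16af", "device_type": "type-VF", "numa_policy": "preferred" }

A device is then requested from a flavor with pci_passthrough:alias='my-vf:1'.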
stephenfinspatel_: But, to be clear, is this for a PCI device or an SR-IOV device?16:42
spatel_SR-IOV device16:42
*** artom has quit IRC16:42
spatel_We are running a high performance network application and need high network speed / a high PPS rate16:43
sean-k-mooneystephenfin: looking at the whitelist code i don't think we support it in the whitelist16:43
stephenfinsean-k-mooney: Doesn't seem like it. I'm trying to think why it was done that way16:44
sean-k-mooneywhich would mean the policies only work for devices requested via a flavor alias, which would be dumb16:44
sean-k-mooneyare you sure we did not support this in the flavor extraspecs / image metadata?16:44
spatel_I am going to add alias and get back to you..16:44
stephenfindefinitely not16:44
sean-k-mooneystephenfin: was the whole point of this feature to fix neutron sriov?16:45
spatel_are product_id and vendor_id mandatory? because i am using devname here: "pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }""16:45
*** med_ has quit IRC16:45
*** s10 has joined #openstack-nova16:46
sean-k-mooneyspatel_: no16:46
*** s10 has quit IRC16:46
sean-k-mooneya whitelist can be in any of these forms https://github.com/openstack/nova/blob/master/nova/pci/devspec.py#L182-L19216:47
sean-k-mooneyactually the alias, yes, that needs to use vendor_id and product_id16:48
stephenfinsean-k-mooney: If it was, it seems something may have slipped through the cracks here16:48
sean-k-mooneyaliases are not for networking; they are for passthrough devices16:48
stephenfinYup, I get that16:48
spatel_type-PCI, type-PF and type-VF; which should i pick?16:49
spatel_VF ?16:49
stephenfinFrom the quick glance here, it should really be configured via the whitelist. I'm not sure why I went with the alias16:49
stephenfinspatel_: yup16:49
spatel_doing it..16:49
sean-k-mooneystephenfin: ya i am going to confirm https://bugs.launchpad.net/nova/+bug/179592016:49
openstackLaunchpad bug 1795920 in OpenStack Compute (nova) "SR-IOV shared PCI numa not working " [Undecided,Confirmed]16:49
*** med_ has joined #openstack-nova16:49
sean-k-mooneystephenfin: interested in working on this? if not i'll add it to my list, but this needs to be fixed16:50
sean-k-mooneyand backported16:50
stephenfinI won't tackle it tonight but I can do so, yeah16:50
stephenfinnot sure if we can backport though. It'll be a config file change16:50
sean-k-mooneycool, we likely need to repropose the old spec16:50
sean-k-mooneystephenfin: i was suggesting we need to add the flavor and image extraspecs, so no config file change16:51
stephenfinsean-k-mooney: Possibly, but before doing so I'd suggest going back and reading the spec reviews16:51
stephenfinThere was a reason we didn't do that, though I don't recall it now :/16:51
*** spatel has joined #openstack-nova16:52
*** spatel_ has quit IRC16:52
spatelsean-k-mooney: this is what i changed in nova.conf http://paste.openstack.org/show/731417/16:52
stephenfinIf it's image metadata changes, we can't backport those due to object changes16:52
spatelcan you verify16:52
sean-k-mooneyyes, i remember i was very against using the alias, but i don't recall dropping the extra specs16:52
spatelgoing to launch an instance now, fingers crossed16:53
sean-k-mooneyspatel: it's going to fail.16:53
spatel??16:53
spatelwhy?16:53
sean-k-mooneylooking at the code the feature was not finished16:53
spatelDamn it :(16:54
spatelso what is the deal here ?16:54
sean-k-mooneythe numa policies are only respected for flavor based pci device passthrough, e.g. for things like gpus or accelerator cards16:54
sean-k-mooneyspatel: the spec was approved but the full feature was not merged16:54
spatelin my case its VF16:55
*** med_ has quit IRC16:55
spatelcurrently i can't utilize both numa nodes :(16:55
sean-k-mooneyspatel: in your case it's a neutron port with vnic_type direct, correct?16:55
spatelyes16:55
sean-k-mooneyspatel: the workaround is to make the guest have 2 numa nodes16:56
melwittsean-k-mooney: which spec was not properly finished?16:56
openstackgerritMatt Riedemann proposed openstack/nova master: Handle IndexError in _populate_neutron_binding_profile  https://review.openstack.org/60765016:56
mriedemspatel: fyi ^16:56
sean-k-mooneymelwitt: the numa pci polices16:56
sean-k-mooneyi'll get the link, one sec16:57
melwittthanks. I didn't see a link in the backscroll16:57
spatelsean-k-mooney: "guest have 2 numa nodes" can you explain this?16:57
*** macza has joined #openstack-nova16:57
sean-k-mooneymelwitt: https://review.openstack.org/#/c/361140/16:57
openstackgerritVlad Gusev proposed openstack/nova master: Not set instance to ERROR if set_admin_password failed  https://review.openstack.org/55516016:57
sean-k-mooneyspatel: in the flavor set hw:numa_nodes=216:57
spatellet me try hold on...16:58
sean-k-mooneythis will create a guest with 2 numa nodes with half the cpus and ram on each16:58
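Concretely, the workaround sean-k-mooney describes amounts to flavor properties along these lines (matching spatel's setup):

    hw:cpu_policy='dedicated', hw:numa_nodes='2'

so a 14 vCPU / 14G flavor is split into two guest NUMA nodes of 7 vCPUs and 7G each, and the VF only needs affinity with one of them.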
spatelsean-k-mooney: FYI, i have tried this and it failed "hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:pci_numa_affinity_policy='preferred'"16:58
spatelnow i am going to remove "hw:cpu_policy='dedicated'" and "hw:pci_numa_affinity_policy='preferred'" to see if that works16:59
sean-k-mooneymelwitt: the flavor and image extraspecs are apparently not implemented, meaning that this does not work for neutron sriov ports as it should16:59
melwittsean-k-mooney: ok, according to the notes on the blueprint, people thought it was completed "The last functional patch for this was merged on Dec 30, 2017" https://blueprints.launchpad.net/openstack/nova/+spec/share-pci-between-numa-nodes16:59
spatelmelwitt: that is why i am chasing that blueprint because it says completed 201716:59
*** derekh has quit IRC17:00
sean-k-mooneymelwitt: i thought it was completed; i'm checking the code to confirm, but apparently it's not working17:00
spatelDo we have any ETA? because i have 100 compute nodes in racks and here i am stuck with this issue :(17:00
melwittI feel like someone has asked me about this bp before, asking if it applies to SRIOV too, and I thought since it never mentions SRIOV, that it doesn't17:00
sean-k-mooneyi can probably test this locally too, i have just set up an sriov host17:01
melwittor wasn't meant to. and that adding SRIOV support would be additional work outside the scope of this particular blueprint17:01
sean-k-mooneymelwitt: its primary usecase was sriov, specifically for telcos that had two numa node servers but all nics connected to one numa node due to space constraints in their racks17:01
melwitti.e. a new blueprint would be, for example, "add SRIOV support for sharing PCI devices between NUMA nodes"17:01
*** Luzi has quit IRC17:02
spatelmelwitt: it's much clearer now.. :)17:03
openstackgerritMerged openstack/nova stable/rocky: Explicitly fail if trying to attach SR-IOV port  https://review.openstack.org/60511817:03
mriedemknown issue yeah? https://bugs.launchpad.net/nova/+bug/179471717:04
openstackLaunchpad bug 1794717 in OpenStack Compute (nova) "rocky: ephemeral disk can not be resized" [Undecided,New]17:04
sean-k-mooneymriedem: not being able to resize the ephemeral disk, ya17:05
sean-k-mooneymriedem: i mean i think in some very specific edge cases it can work today, but there is no generic way to enable it17:05
sean-k-mooneye.g. how do you resize from one 500G disk to two 400G disks? so we just decided not to support it at all17:06
mriedemyup, bug 155888017:06
openstackbug 1558880 in OpenStack Compute (nova) "instance can not resize ephemeral in mitaka" [Medium,Confirmed] https://launchpad.net/bugs/155888017:06
melwittsean-k-mooney: ok, I don't know anything about that. if that spec scope is actually incomplete, then we need to decide how we deal with it. open another spec for this cycle to finish it or treat them as bugs17:07
sean-k-mooneymelwitt: probably reproposing the spec is the best way, and just add the flavor extra specs and image metadata values that were originally proposed17:08
sean-k-mooneymelwitt: unless you think we can backport them, in which case it could be a bug17:08
sean-k-mooneybackporting would be the only reason to make it a bug in my mind, but it's also adding new functionality, e.g. turning off numa affinity for pci devices17:09
*** jpena is now known as jpena|off17:11
*** ralonsoh has quit IRC17:13
*** med_ has joined #openstack-nova17:13
sean-k-mooneymelwitt: i or stephen will repropose the spec17:15
sean-k-mooneymelwitt: stephenfin  is heading home so i will likely do it later today17:15
melwittsean-k-mooney: I'd run the idea by mriedem too, in case he has another opinion on how to handle this17:17
openstackgerritSylvain Bauza proposed openstack/nova master: libvirt: implement reshaper for vgpu  https://review.openstack.org/59920817:17
sean-k-mooneymelwitt: sure, it just felt a little cheeky to sneak it in as a bug fix :)17:17
bauzasdansmith: mriedem: I tested the vgpu reshape for allocations too and good news: it works! I just fixed a few things that I discovered when testing ^17:18
melwittsean-k-mooney: yeah, you're probably right. I didn't think much about it17:19
bauzasnow, call it a day17:19
*** med_ has quit IRC17:20
sean-k-mooneyspatel: were you able to use a multi numa node guest to spawn the instance? that functionality definitely works17:23
spatelTesting it now..should i add "hw:cpu_policy='dedicated'" too for pinning ?17:24
dansmithbauzas: ack17:25
sean-k-mooneyspatel: yes if you want cpu pinning then add hw:cpu_policy='dedicated'17:25
spateldoing it.. hold on.. soon report back17:25
sean-k-mooneyno rush17:27
*** tbachman has quit IRC17:29
*** tbachman has joined #openstack-nova17:33
spatelsean-k-mooney: i am able to launch two VMs with 10 vCPU cores each (i have a 32 core compute node with 16+16 numa) but it looks like it didn't pin the CPUs17:35
spatelcheck this out http://paste.openstack.org/show/731420/17:35
spatelI can see it pinned CPUs across numa nodes17:35
sean-k-mooneyspatel: can you run virsh dumpxml <instance>17:37
*** tbachman_ has joined #openstack-nova17:37
sean-k-mooneyspatel: i think it pinned everything correctly17:37
spatelhttp://paste.openstack.org/show/731423/17:37
sean-k-mooneyspatel: it looks like each17:37
spatelI thought it should pin all vCPU cores to CPUs on the same NUMA node, right?17:38
sean-k-mooneyno17:38
spatelhmmmm?17:38
*** tbachman has quit IRC17:39
*** mvkr has quit IRC17:39
*** tbachman_ is now known as tbachman17:39
sean-k-mooneyby setting hw:numa_nodes=2 you will have half the cpus on one numa node and half on the other17:39
sean-k-mooneymemory will also be equally split17:39
sean-k-mooneyprovided there is a free pci device on at least one of the 2 numa nodes associated with the vm vcpus, we will allow the vm to boot17:40
spatelif i remove hw:numa_nodes=2 then it will pin all vCPUs on the same node, right?17:40
sean-k-mooneycorrect, adding hw:cpu_policy=dedicated implicitly adds hw:numa_nodes=117:40
spatelhmm! interesting..17:41
spatelusing hw:numa_nodes=2 will have some performance impact, right?17:41
sean-k-mooneyby explicitly setting hw:numa_nodes=2 it will allow both numa nodes on the host to be used, but it will also limit the vm to hosts with 2+ numa nodes17:41
sean-k-mooneyspatel: it can, if the application in the guest itself does not understand numa affinity17:42
sean-k-mooneyit can also improve performance, as it doubles your memory bandwidth since the vm will now use memory from 2 host numa nodes/memory controllers17:43
spatelI think time to run some test...17:43
spatelWe are media company and using VoIP base application17:43
sean-k-mooneytesting is always a good idea :)17:43
spatelFirst i built openstack without SR-IOV and found the performance was horrible (the PPS rate was only 50k; after that it started dropping packets)17:44
sean-k-mooneyas a community we have done a lot of work to improve numa affinity over the years17:44
spatelI have just started learning numa stuff so i am new but it looks interesting..17:45
sean-k-mooneyspatel: the strict pci numa affinity was added for telco usecases where they could not tolerate cross numa pci/sriov17:45
sean-k-mooneyspatel: it certainly is .... interesting. it's also a pain in the ass, but gives better performance when you get it right17:45
spatelI have some legacy hardware and i have to stick with it17:46
spatelon the other side i am planning to test DPDK to see if it's better17:46
sean-k-mooneynuma is not going away, in fact it's becoming more common17:46
sean-k-mooneydpdk is much better than kernel ovs17:46
sean-k-mooneybut its more complicated too17:46
spatelbut at least it doesn't have a hardware dependency17:47
spatelI spent thousands of $$$$ to get SR-IOV supported cards17:47
sean-k-mooneyspatel: not in the same way; it requires that the guests use hugepages and that there is a dpdk driver for your nic17:47
spatelDoes it perform like SR-IOV ?17:48
sean-k-mooneyspatel: ya, dpdk will be cheaper in that sense, but you will have to dedicate 1-2 cores to handle traffic for ovs-dpdk17:48
sean-k-mooneyspatel: in some cases yes. in general not quite17:48
sean-k-mooneywhat data rates / traffic profiles are you targeting?17:49
spatelcurrently i am deploying VoIP application on 1U server with 32 core / 32G memory. and i have 1000 servers...17:49
sean-k-mooney10G small packets 40G jumbo frames? a mix17:49
spatelmy production peak is a 200 to 230 kpps UDP packet rate17:50
sean-k-mooneyoh, well dpdk can handle that easily17:50
spatelreally???17:50
spatelif that is the case then it will be win win solution17:50
*** ivve has quit IRC17:51
sean-k-mooneyya, dpdk was designed to hit 10G line rate with 64 byte packets, which is 14mpps17:51
spatelwe have lots of servers in AWS (with sr-iov support)17:51
spatelthat is really cool!17:51
sean-k-mooneywith the right hardware it can hit 32mpps on a single core, but in general you will see more like 6mpps17:51
spatelwe are using LinuxBridge + VLAN so i need to upgrade to OVS17:52
sean-k-mooneyit's more or less like this17:52
sean-k-mooneylb<ovs<sriov+macvtap<ovs-dpdk<sriov direct17:52
spatelI have tried macvtap but that didn't work either17:53
sean-k-mooneyspatel: checkout https://dpdksummit.com/Archive/pdf/2016USA/Day02-Session04-ThomasHerbert-DPDKUSASummit2016.pdf slides 16-1917:54
spatelreading..17:55
sean-k-mooneyspatel: i'm a little biased as i'm one of the people that added ovs-dpdk support to openstack, but for your data rate i think it would work quite well17:56
spatelI need to find out how to migrate LinuxBridge to OVS17:56
sean-k-mooneyspatel: today cold migrate works. i'm working on fixing live migrate17:57
spatelcool!17:57
spatelwith SR-IOV i am not able to get that functionality either17:57
spateleven bonding isn't supported17:57
sean-k-mooneylive migrate almost works; we just don't update the bridge name correctly. i'm hoping to backport that17:57
spatelnice! if that works17:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Add scatter-gather-single-cell utility  https://review.openstack.org/59494717:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Return a minimal construct for nova list when a cell is down  https://review.openstack.org/56778517:58
openstackgerritSurya Seetharaman proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_qfd_by_cell_and_project()  https://review.openstack.org/60766317:58
sean-k-mooneyspatel: haha i think im working on all your missing features :) https://review.openstack.org/#/c/605116/17:58
openstackgerritMerged openstack/nova stable/pike: nova-manage - fix online_data_migrations counts  https://review.openstack.org/60584017:59
spatel:)17:59
spateli have lots of requirements :) these are just the start17:59
spatelsean-k-mooney: thanks for help!!!18:00
sean-k-mooneymy main focus this release, at least initially, is live migration hardening, e.g. fixing edge cases like lb->ovs or sriov18:00
nicolasbock<freenode_sea "nicolasbock: if you are still ar"> Thanks for the tip!18:00
spateli didn't know freenode would be this helpful.. for the last 2 days i've been chasing google18:00
spatelwhat do you use to deploy your openstack? I am using openstack-ansible18:01
sean-k-mooneyspatel: no worries. i'm usually here so feel free to ping me if you have issues18:01
spatelI am going to spend next 6 month here :) until my cloud is ready!!18:01
sean-k-mooneyspatel: for development, devstack. i used to use kolla-ansible but i recently joined redhat so i probably should suggest OSP18:01
spatelwe spent a million dollars last year in AWS so my boss wants to build our own AWS :)18:02
sean-k-mooneyspatel: that is how a lot of companies end up running openstack clouds, yes18:02
spatelindeed18:02
spatelOSP uses TripleO right?18:03
spateli tried it and found it very complicated18:03
sean-k-mooneyyes, that is the officially supported installer from redhat18:03
sean-k-mooneyspatel: and yes it can be18:03
mnaserdoes anyone know if daniel berrange hangs out on irc much?18:03
mnaseri'm looking at this old abandoned review and i'm wondering if this is still an issue -- https://review.openstack.org/#/c/241401/18:03
sean-k-mooneymnaser: not on this irc but he is usually on the libvirt one18:04
mnaseri'll try to ping him there18:04
spatelsean-k-mooney: going to eat something, will catch you again, if any issue :) thanks again18:04
sean-k-mooneymnaser: i think that is a bug that has been forgotten about but not necessarily fixed18:06
mnasersean-k-mooney: yeah, it's not fixed, but it's been a while so i'm wondering if the whole "it doesn't work with backing files" argument is no longer valid18:07
mnaserwe're setting up some really fast hardware (pci-e nvme drives) and want to squeeze the best performance out of it.. short of going to something like lvm18:07
sean-k-mooneymdbooth: and lyarwood would likely be able to comment better than i can on https://bugs.launchpad.net/nova/+bug/151032818:07
openstackLaunchpad bug 1510328 in OpenStack Compute (nova) "Nova pre-allocation of qcow2 is flawed" [Low,Confirmed]18:07
openstackgerritJack Ding proposed openstack/nova master: Add HPET timer support for x86 guests  https://review.openstack.org/60590218:08
sean-k-mooneymnaser: right, um, in that case would you be better off with a raw image instead of qcow if you're always preallocating?18:08
mnasersean-k-mooney: right, i'm thinking that might be the next path, raw files on disk18:09
sean-k-mooneymnaser: if you are also supporting ceph or boot from volume, raw can often be better too, even if you are using more space for glance / image cache18:10
mnaserbut then we lose a lot of qcow2 features18:10
sean-k-mooneymnaser: like live snapshot18:10
mnaseryeah, a lot.. unfortunately18:10
*** spatel has quit IRC18:10
sean-k-mooneymnaser: i don't think anyone would object if you had a way to fix the bug that did not cause others18:11
mnasersean-k-mooney: yep.. its just that unfortunately there was no documentation as to why that was an issue with backing images18:11
mnaserso thats what im trying to research18:11
sean-k-mooneymnaser: it's got to have something to do with the overlays that we create18:13
mnaseryeah it looks like it's not really a possibility18:13
mnaser:<18:13
sean-k-mooneymnaser: mdbooth and kashyap should be able to confirm tomorrow when they are back online18:14
mnaseri'll wait to hear18:14
mnasernow to find ways to benchmark this server18:14
mnaserserver/vm that is18:14
mnaserhttp://paste.openstack.org/show/731425/18:15
sean-k-mooneyis that a vm with  468G of ram18:16
sean-k-mooneysorry 47218:16
sean-k-mooneyi also like the insane amount of gpus18:17
sean-k-mooneymay i suggest you use it to play minecraft, totally how you should benchmark18:17
sean-k-mooneymnaser: also i hear bitcoin is a thing :)18:17
mnasersean-k-mooney: aha, we're rolling out gpus and we have instances with 472G of ram, 48 (dedicated) threads, 1.8T of PCI-e NVMe storage..18:18
sean-k-mooneymnaser: actually, on a serious note, you are an operator of a cloud with vgpus, correct?18:18
mnasersean-k-mooney: no vgpu support, only dedicated gpus (as far as we've planned)18:18
*** spatel has joined #openstack-nova18:19
mnaserpart of this is MAYBE seeing if we can get some vGPU CI.. if possible, but i hear there are some more complicated reasons why its not possible18:19
sean-k-mooneyah, well does the lack of vgpu numa affinity affect your decision to use vgpus or deploy gpus in the cloud in general?18:19
sean-k-mooneymnaser: actually it might be possible using complicated tricks18:19
sean-k-mooneye.g. nested virt + q35 chipset + viommu + pci passthrough of the physical gpu PF to the host vm18:20
mnaseri think we're starting to roll things out by having dedicated gpus to see market demand for it (we've had some).  unfortunately the other thing that's coming to mind is i'm thinking that users who need gpu levels of performance probably would want 100% of it18:20
mnaserwe can make nested virt available for gpu instances so maybe thats possible18:21
sean-k-mooneymnaser: have you talked to bauzas about possible vgpu ci?18:21
*** imacdonn has quit IRC18:21
*** imacdonn has joined #openstack-nova18:21
mnasersean-k-mooney: we briefly talked about it.. dansmith mentioned concerns about iommu and stuff that's beyond my level of comprehension :)18:21
mnaserbut we plan to provide at least 1 or 2 instances to openstack CI *if* there's a use case that makes sense18:22
dansmithmnaser: he said viommu, so if that's a thing now then maybe it's doable18:22
sean-k-mooneydansmith: yes it is but we have not enabled it in nova yet18:22
sean-k-mooneybut it's trivial so we could18:23
sean-k-mooneywell, it's a flavor extraspec + xml generation and other crap, but it's not technically very hard to do, we just have not done it yet18:23
*** pcaruana has joined #openstack-nova18:24
mnaseri'd be more than happy to provide 1 or 2 instances with a gpu18:24
*** itlinux has joined #openstack-nova18:24
sean-k-mooneydansmith: it was added in libvirt 2.1 and qemu 3.4 https://libvirt.org/formatdomain.html#elementsIommu18:25
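Per that linked libvirt page, the domain XML addition itself is small, roughly:

    <devices>
      <iommu model='intel'/>
    </devices>

what nova lacks is the plumbing to emit it, presumably gated by a flavor extra spec as sean-k-mooney says.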
* dansmith nods18:25
nicolasbock<freenode_mri "nicolasbock: you'll likely need "> I had to also update `instances.node` but then the allocation was updated correctly18:26
sean-k-mooneymnaser: that's very generous. it would certainly help if we could actually test vgpu in the upstream ci, even if it was an experimental job that did not run on all patches18:26
mnaserwhile i wrap things up here i can push up a patch to add 1 or 2. we'll probably do it with min-servers: 0 and max-servers: 2 to start with18:27
*** spatel has quit IRC18:28
mriedemefried: i've replied in https://review.openstack.org/#/c/606122/18:28
*** spatel has joined #openstack-nova18:28
efriedack18:29
spatelsean-k-mooney: currently i have "intel_iommu=on" in grub.conf, should i add "iommu=pt" too?18:29
sean-k-mooneyspatel: "iommu=pt" is not required but advised18:30
efriedmriedem: +218:30
spatelwill add that :)18:30
sean-k-mooneyspatel: this is my cmdline on my sriov systems BOOT_IMAGE=/vmlinuz-3.10.0-862.11.6.el7.x86_64 root=UUID=2cca5edf-cbcc-4f0d-91df-df438bbd56c5 ro crashkernel=auto rhgb quiet intel_iommu=on iommu=pt pci=assign-busses,realloc18:30
spatelare you using SR-IOV?18:31
spatelor DPDK?18:31
mriedemefried: thanks18:31
mriedemlazy-load can be a cruel mistress18:31
sean-k-mooneyspatel:  pci=assign-busses,realloc is to work around some hardware bugs where my bios does not allocate enough iommu space18:31
efriedsrsly18:31
spatelnice!18:32
sean-k-mooneyspatel: iommu=pt is needed for dpdk but not sriov18:32
spateloh! makes sense18:32
sean-k-mooneyi enable it always so i can deploy both and swap between them18:32
spatelsean-k-mooney: i have created a new flavor (15 vCPU / 14G memory) and i got this error18:32
spatelERROR (BadRequest): Instance CPUs and/or memory cannot be evenly distributed across instance NUMA nodes. Explicit assignment of CPUs and memory to nodes is required (HTTP 400) (Request-ID: req-400663e1-75d1-4bbc-a06b-07dcfd845be6)18:32
*** cdent has quit IRC18:33
spatelThis is what i have in flavor hw:cpu_policy='dedicated', hw:numa_nodes='2'18:33
sean-k-mooneyspatel: yes, the error could be improved. the vcpus need to be divisible by the number of numa nodes, otherwise you have to tell us how many cpus to put on each numa node18:33
*** mvkr has joined #openstack-nova18:34
sean-k-mooneyspatel: so i would just set it to 14 vcpus and 14G memory18:34
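(That is: 15 vCPUs / 2 NUMA nodes = 7.5, which doesn't divide evenly; 14 vCPUs gives 7 per node. Alternatively the split can be stated explicitly with extra specs like hw:numa_cpus.0/hw:numa_cpus.1 and hw:numa_mem.0/hw:numa_mem.1.)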
spatelcool!!18:34
spateldoing it18:35
*** artom has joined #openstack-nova18:35
sean-k-mooneyspatel: since you are optimising your flavors, and given your usecase, i would also recommend enabling hugepage memory for the vm18:35
sean-k-mooneyit will give you a 30-40% performance boost in many workloads, but requires you to allocate hugepages on the host first, ideally via the kernel command line18:36
spatelI have this setting in grub "hugepagesz=2M hugepages=2048 transparent_hugepage=never"18:36
sean-k-mooneyah cool, that will only allocate 4G of hugepages from the 32G you have total.18:37
spatelone more question: i have 32G memory, so what would be a good number of pages?18:37
spatelyes i have 32G memory18:37
spateli heard 1G is better for hugepages18:38
sean-k-mooneyhaha i was getting to that next. :) i would recommend between 24-28G of hugepages, leaving 6-8G for the host18:38
sean-k-mooneyspatel: it depends; for some workloads yes, for most it does not matter18:38
sean-k-mooneyhugepages cannot be subdivided, so if you use 1G hugepages the ram in your flavor must be a multiple of 1G18:39
spatelmy application doesn't need lots of memory because it's RTP traffic (voip)18:39
spatelhmm! make sense18:39
sean-k-mooneyspatel: in your case i doubt you will see a difference and 2MB hugepages will give you more granularity18:40
spatellets stick to 2M then :)18:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add post-test hook for testing evacuate  https://review.openstack.org/60217418:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add volume-backed evacuate test  https://review.openstack.org/60439718:40
openstackgerritMatt Riedemann proposed openstack/nova master: Add functional regression test for bug 1794996  https://review.openstack.org/60610618:41
openstackbug 1794996 in OpenStack Compute (nova) "_destroy_evacuated_instances fails and kills n-cpu startup if lazy-loading flavor on a deleted instance" [High,In progress] https://launchpad.net/bugs/1794996 - Assigned to Matt Riedemann (mriedem)18:41
openstackgerritMatt Riedemann proposed openstack/nova master: Fix InstanceNotFound during _destroy_evacuated_instances  https://review.openstack.org/60612218:41
openstackgerritMatt Riedemann proposed openstack/nova master: Run evacuate tests with local/lvm and shared/rbd storage  https://review.openstack.org/60440018:41
spatelsean-k-mooney: should  i use this? hugepagesz=2M hugepages=1536018:41
spatelit will give 30G18:41
spatellet me try to make it 28G18:41
spatelkeep 4G for OS18:42
sean-k-mooneythe hugepage memory will not be available to normal os processes, so 2G is likely too tight for a compute node18:42
*** tbachman has quit IRC18:43
sean-k-mooney4G should be ok, but i used to give 6G as my safety margin; that said, i did not need that much of a margin18:43
*** artom has quit IRC18:43
spatelIn that case let me give 8G to the OS (keep 24G for VMs)18:44
sean-k-mooneyspatel: i would set it to 1228818:44
sean-k-mooneywhich is 24G18:44
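(Worked out: 12288 pages × 2 MiB = 24576 MiB = 24 GiB of hugepages, leaving 8 GiB of the host's 32 GiB for the OS and other processes.)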
spatelhugepagesz=2M hugepages=12288   - DONE! going to reboot compute node18:45
spatelDo you use isolcpus= or CPUAffinity?18:45
spatelI was reading about that; not sure if i need to worry about it or not18:45
sean-k-mooneyi would then also reduce the max VM size to 10 or 12 GB of ram for your largest flavor so you can always boot at least 2 of them18:46
sean-k-mooneyisolcpus is not the same as cpuaffinity18:46
sean-k-mooneyi generally avoid isolcpus=; it is a rather large hammer to reach for18:47
mriedembauzas: i've -2ed https://review.openstack.org/#/c/599208/ as we discussed yesterday18:47
sean-k-mooneyit should only be used for realtime instances, and even then it's tricky to use correctly18:47
*** artom has joined #openstack-nova18:47
spatelok! got it18:47
*** liuyulong has quit IRC18:47
*** artom has quit IRC18:47
sean-k-mooneyspatel: generally i would only suggest using it to isolate the cores allocated to ovs-dpdk, if you choose to deploy it18:47
*** artom has joined #openstack-nova18:48
sean-k-mooneyspatel: don't get me wrong, isolcpus= has a place, but it's only something i reach for when i have no other options left and i really really need it18:48
spatelI will soon deploy dpdk (believe me)18:49
mnasersean-k-mooney: https://review.openstack.org/#/c/607686/ .. ill push up a patch to test things out when possible (or at least something to confirm its working)18:49
spatelin the flavor i should set hw:mem_page_size='2048', right?18:50
mnaserso maybe if you want to start figuring out nova dependencies18:50
*** gyee has quit IRC18:50
*** pcaruana has quit IRC18:50
*** s10 has joined #openstack-nova18:52
dansmithmriedem: melwitt tssurya: cells meeting today? I have an appointment the hour before, but I will probably be back in time18:52
sean-k-mooneyspatel: you can, but i prefer setting hw:mem_page_size=large18:52
sean-k-mooneyspatel: that will work with both 1G and 2MB hugepages18:53
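So the flavor from earlier would end up carrying properties roughly like (illustrative, building on spatel's settings above):

    hw:cpu_policy='dedicated', hw:numa_nodes='2', hw:mem_page_size='large'

where 'large' means any hugepage size the host offers, rather than hard-coding 2048.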
spateldone! let me do that18:53
dansmithside note, mriedem melwitt: This is easy early utility stuff we can merge in front of the down cell stuff: https://review.openstack.org/#/c/594947/18:53
tssuryadansmith: the most important question I had was the best way to get the "type" of exception from the utility ^18:54
dansmithtype?18:54
tssuryawe could also do it during the meeting if others also have topics18:54
mriedemdansmith: i was holding off on that one until i knew what was going on further in the series18:54
tssuryayea, for instance a TimeOut/DBConnectionError exception versus an InstanceNotFound exception18:54
tssuryaas of now we always return the "raised_exception_sentinel" which is not that useful18:55
tssuryabecause based on the type of exception we have to handle it differently18:55
nicolasbockFixing the migration is more difficult it seems: I successfully updated the DB with the correct hypervisor and `server show` was now showing the correct hypervisor information18:55
mriedemplease hold, i have to sell something to a craigslist weirdo real quick18:55
dansmithtssurya: by timeout you mean an rpc timeout, not the did_not_respond_sentinel I assume?18:55
nicolasbockI ran `server migrate` which failed with `[Errno 2] No such file or directory: '/var/lib/nova/instances/2aa3a324-bf22-4e0c-912a-d7c52f59f1fd/disk`18:55
nicolasbockSo the disk didn't make it in the first migration18:55
nicolasbockI verified that the disk is still on the old host18:56
* melwitt listens18:56
sean-k-mooneymnaser: that spec would allow testing quite alot of featue espcially if it supproted nested virt18:56
nicolasbockSince I am in the middle of open heart surgery anyway I figured I'd just rsync the disk to the current hypervisor18:56
tssuryadansmith: TimeOut was just an example, my main problem is to filter the "InstanceNotFound" from others for nova show18:56
nicolasbockSo that worked18:56
mnasersean-k-mooney: these vms have nested virt18:56
nicolasbockHowever, migrate is now refusing to migrate since the VM is in an ERROR state18:56
dansmithtssurya: yeah18:57
mnasernicolasbock: nova reset-state --active18:57
nicolasbockI can't `nova reset-state` either, it says `Reset state for server 2aa3a324-bf22-4e0c-912a-d7c52f59f1fd succeeded; new state is error`18:57
tssuryaas of now, when we get the InstanceNotFound, the utility hides this and returns the sentinel, so I try to go and make a minimal construct when I shouldn't be18:57
nicolasbockwhich isn't all that helpful :(18:57
dansmithtssurya: probably have to get away from the sentinel object I guess18:57
mnaser`--active`18:57
dansmithtssurya: which is going to be a mess18:57
tssuryamelwitt and I had a brief discussion18:57
melwittdansmith, tssurya: sean-k-mooney proposed this class as a way to be able to return exception objects https://review.openstack.org/60525118:57
tssuryathe other day18:57
nicolasbockYeah mnaser !!!18:57
*** spatel has quit IRC18:58
* tssurya looking18:58
nicolasbockI hadn't considered that since `--active Request the server be reset to "active" state instead of "error" state (the default).`18:58
*** efried has quit IRC18:58
dansmithum18:58
*** spatel has joined #openstack-nova18:58
nicolasbockI guess `--active` isn't the default after all18:58
dansmithseems like a lot of overkill :)18:59
sean-k-mooneymnaser: do you provide any other custom nodes? i don't know if you care about ovs-dpdk or cpu pinning, but would you be ok if we used that or a slightly different flavor to maybe test those features in the gate?18:59
dansmithmelwitt: tssurya: it would be trivial to just use the exception as the sentinel in the response, and we just check to see if the result isinstance(thing, Exception)18:59
mriedemnicolasbock: that disk not found with cold migration sounds like a bug i've seen before that is fixed, but had to do with shared storage and volume-backed instances18:59
melwittdansmith: comment on the review :) it came about because I said something like, can we return the exception object in addition to the sentinel, in a tuple or something18:59
dansmithand then you have the exception itself18:59
mnaserwe are slowly rolling out nested virt across our entire fleet but that is something to discuss more with the infra team i think18:59
mriedemnicolasbock: but likely not fixed on newton18:59
melwittdansmith: yeah, that was my other suggestion. I had two ideas: drop the sentinel and check isinstance or keep the sentinel and have tuples19:00
tssuryadansmith: right, that would be simple; is it okay to change the utility's interface now?19:00
nicolasbockok, do you happen to remember the review this was fixed in mriedem ? Maybe I can backport?19:00
mriedemlooking19:00
dansmithmelwitt: no reason for the sentinel I don't think19:00
*** efried has joined #openstack-nova19:00
nicolasbockThanks mriedem19:01
dansmithanything that isinstance(Exception) is... an error, so...19:01
nicolasbockmnaser: it worked! The VM has migrated to a new host19:01
melwittdansmith: yeah, that's what I was thinking19:01
sean-k-mooneymnaser: for ovs-dpdk and cpu pinning/hugepages we don't need nvme or gpus, but we do need nested virt and a vm with multiple numa nodes. it is something that i agree i would love to discuss with infra.19:01
mnaseryeah, we'd have to talk it out with infra19:01
melwittdansmith: but sean-k-mooney was thinking checking isinstance was an anti-pattern of some kind19:01
sean-k-mooneymelwitt: sorry, i should read the scrollback19:02
melwittsean-k-mooney: we're just talking about the "return exceptions from scatter-gather" thing19:02
dansmithmelwitt: overengineering is an anti-pattern :)19:02
tssuryasean-k-mooney: its about this: https://review.openstack.org/#/c/605251/19:02
mriedemnicolasbock: https://review.openstack.org/#/q/Ib10081150e125961cba19cfa821bddfac4614408 is what i'm thinking of19:03
nicolasbockIs it ok that the disk is still on the old host after migration?19:03
melwittsean-k-mooney: dansmith suggested the same thing I suggested when we first talked about it, just return exception objects instead of the sentinel and check isinstance(thing, Exception) to know whether an error was returned or not19:03
nicolasbockThanks mriedem19:03
sean-k-mooneydansmith: well, i was porting a standard class from c++ to python. returning an exception has some weird side effects in python 219:03
dansmithsean-k-mooney: I have no idea what weird side effect you mean, other than that re-raising it doesn't keep the exception context properly, but we won't be doing that here19:04
dansmithsys.exc_info I mean19:04
nicolasbockmriedem: gerrit's cherry-pick doesn't seem to know Newton. Is that because Newton is EOL'ed?19:04
sean-k-mooneymelwitt: so in python 2 the exception object has a reference to the stack frame from which it was first thrown, so the garbage collector can't deallocate that frame or any locks it holds. sys.exc_info and returning that is fine19:04
mriedemnicolasbock: correct, newton is eol upstream19:05
mriedemnicolasbock: note that that change is also building on top of two other fixes19:06
mriedemcalled out in the commit message19:06
sean-k-mooneydansmith: https://www.python.org/dev/peps/pep-0344/#open-issue-garbage-collection19:06
sean-k-mooneydansmith: if we call sys.exc_info() and return the tuple as the sentinel, that is fine however19:06
nicolasbockThanks mriedem , I will apply the fix in our vendor packages only then19:06
nicolasbockThanks all for the help with the "lost" VM!19:07
dansmithsean-k-mooney: how is returning it any different than encapsulating it in your object here?19:07
dansmithfrom a GC perspective19:07
sean-k-mooneydansmith: i if you dont raise the exception and catch it it does not have the referecne to the stack frame so retrun VauleError("invalid data") is fine19:08
mriedemnicolasbock: well, you probably should verify that the reason you got into this mess in the first place was due to one of those bugs19:08
mriedembut whatever you want to do downstream is fine with me :)19:08
sean-k-mooney"except Exception as e: return e" is not19:09
nicolasbockYes of course :)19:09
sean-k-mooneyexcept Exception: return sys.exc_info() is also fine19:09
spatelsean-k-mooney: "pci=assign-busses,realloc" was causing an issue; my server hung at boot. as soon as i removed it, it boots fine.. I have an HP DL360p G819:09
dansmithsean-k-mooney: but this result object of yours is just going to swallow the exception that we get from our handler right?19:09
dansmithsean-k-mooney: so it's still pinning the reference to the stack19:10
sean-k-mooneydansmith: ya, i was planning to extend it to do the right thing on each python version.19:11
sean-k-mooneythis is only an issue on python 219:11
sean-k-mooneythey fixed it in python 3 so you can just return the exception19:11
dansmithwe could trivially re-construct the exception since we know it inherits from NovaException and has very specific characteristics19:11
dansmithsince it's only py2, and since py3 is the future, and since this is a suuuper corner case that only "may" have issues with some GC being delayed... I would tend to punt on caring about this entirely19:12
dansmithbut19:12
sean-k-mooneydansmith: yep, but if we are going to do that i thought it would be nice to hide that in a result class that does the right thing19:12
dansmithjust re-creating the exception and returning it would be fine for our purposes19:12
melwitttssurya: sorry, I'm not really getting the usefulness of the single cell scatter-gather from the commit message. how does it help for down cell?19:13
sean-k-mooneyspatel: ya, don't add that. it was specifically to work around a hardware bug in my old servers19:13
dansmithIMHO we can punt and not worry about this19:13
spatelroger!19:13
spateli took it out19:13
tssuryamelwitt: https://review.openstack.org/#/c/591658/3/nova/compute/api.py@232319:14
tssuryawe could just directly use the scatter_gather_cells(), just that it looked bad19:14
mriedemjaypipes: i have crushed your soul https://review.openstack.org/#/c/607626/19:14
melwitttssurya: I see, to get the automatic "wait for this amount of time before timing out" part. thanks19:15
tssuryamelwitt: we will be using it for Instance.get_by_uuid for the nova show part19:15
sean-k-mooneydansmith: so ya, i'll likely work on https://review.openstack.org/#/c/605251/2 a little more in my spare time later this week, as i want that class as a tool in my toolbox for the future, but we may not need it for this usecase.19:17
tssuryadansmith, sean-k-mooney: so.. which way are we agreeing on?19:17
dansmithtssurya: I vote for return the exception and boil the ocean later :)19:17
melwittso what's all this reconstruction talk? in scatter-gather, when we catch the exception from the cell, are you saying we have to do more than just save the "as exp" part?19:18
tssuryadansmith: ack,19:18
dansmithmelwitt: we do not19:18
dansmithmelwitt: we could trivially if we want19:18
tssuryamelwitt: as far as I understood just returning it should be enough19:18
melwittmmkay19:18
dansmithjust return exp.__class__(exp.args) would be enough19:19
dansmithbut it'd need a comment about why19:19
dansmithand if we don't, then.. not19:19
melwittok, that's what I meant, we can't just return exp, we have to do that step19:19
dansmithwe've just spawned a thread at this point, and most of these things are single DB calls, which means the stack being pinned by the exception is tiny19:19
jaypipesmriedem: awesome.19:20
melwittok, yeah. comment if we do that because otherwise I'm not going to remember why19:20
sean-k-mooneydansmith: oh so just return a new exception object not the one we caught19:20
jaypipesmriedem: it's hard enough already for me to give a rat's ass about a stable branch. :)19:20
dansmithmelwitt: we don't have to do that step.. we can just return exp. If we want, we can do the reconstruction step (and document it)19:21
dansmithmelwitt: I would vote for not reconstructing because I think this is super tiny19:21
tssuryadansmith: ah got it19:21
melwittack, thank you19:21
tssuryamelwitt, sean-k-mooney, dansmith: thanks19:21
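A minimal Python sketch of the approach settled on above — each per-cell worker returns the caught exception itself instead of a sentinel, and callers detect failure with isinstance(). Names here are illustrative, not Nova's actual helpers:

    def query_cell(fn, cctxt, *args):
        """Run fn against one cell-targeted context; return its result,
        or the exception it raised."""
        try:
            return fn(cctxt, *args)
        except Exception as exp:
            # Returning the live exception pins its traceback frame on
            # python 2 (the PEP 344 GC concern discussed above); if that
            # ever matters, re-construct it instead:
            #     return exp.__class__(*exp.args)
            return exp

    def handle(results):
        # results maps cell uuid -> result-or-exception
        for cell_uuid, result in results.items():
            if isinstance(result, Exception):
                # e.g. special-case InstanceNotFound vs. an RPC/DB timeout
                print('cell %s failed: %s' % (cell_uuid, result))
            else:
                print('cell %s: %s' % (cell_uuid, result))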
dansmithsoooo, back to the meeting,19:22
dansmithI will probably be back, if ya'll want to meet19:22
tssuryadansmith: yea are we having one ?19:22
melwittI'm neutral about meeting. I don't have anything special to talk about19:22
melwittmriedem might want to talk about cross-cell stuff? I dunno19:22
dansmithmriedem may want to talk about crossing the streams19:22
dansmithyeah, t hat19:22
tssuryaI don't have anything special except some silly bugs19:22
melwittsilly bugs? now I'm curious19:23
sean-k-mooneyjust one other comment: we are not holding any locks or file handles where we raise the exception in the scatter-gather case, correct?19:23
tssuryamelwitt: https://bugs.launchpad.net/nova/+bug/179499419:23
openstackLaunchpad bug 1794994 in OpenStack Compute (nova) "Update the --max-rows parameter description for nova-manage db archive_deleted_rows" [Low,In progress] - Assigned to Surya Seetharaman (tssurya)19:23
tssuryafor now I changed it to a doc fix, but I am skeptical about it19:23
dansmithsean-k-mooney: we would have just gotten a result from a threadpool of db workers, and they would almost definitely have re-raised outside of any locks19:24
tssuryait would just be good to have the API table record removal also counted in max-rows19:24
tssuryanot sure if people care though19:24
*** spartakos has joined #openstack-nova19:25
tssuryabut yea its not super urgent19:25
tssuryaokay then I will head home now and will be lurking around during the meeting time in case we decide to have one19:26
melwittok, will read through it. the issue is the command output can be confusing given the treatment of the API records19:26
sean-k-mooneydansmith: ok, the stack frame reference keeps stack locals alive, including any file handles or locks, so if we just return the exception can we add a comment referencing the pep issue, just in case we have issues in the future19:26
tssuryamelwitt: exactly19:26
dansmithsean-k-mooney: yep19:26
sean-k-mooneydansmith: i think we will be fine, but future me would regret not adding it if we ever have to debug it :)19:27
*** tbachman has joined #openstack-nova19:27
melwitttssurya: thanks. this is hard for me to imagine because I can't remember what the archive_deleted_rows output looks like :P will look in the code19:28
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: stable-only: fix typo in IVS related privsep method  https://review.openstack.org/60481719:28
*** spatel has quit IRC19:29
*** tssurya has quit IRC19:30
mriedemmelwitt: dansmith: i don't really want to talk about cross-cell resize today probably; i suggested to dansmith that i skim my poc with him over a hangout early next week (i'm out tomorrow and friday)19:30
melwittk19:31
mriedemtl;dr functional testing shows it working,19:31
mriedembut there are a shit load of todos19:31
*** spatel has joined #openstack-nova19:31
mriedemand the patch is over 2K LOC now19:31
mriedemit is definitely not enterprise ready19:31
melwittlol19:31
mriedemi have also resorted to taking sleep aids to not wake up at 1am thinking about it...19:32
*** tbachman has quit IRC19:33
mriedemi've found this helps https://www.youtube.com/watch?v=Lrle0x_DHBM19:34
melwittheh19:36
*** jding1_ has quit IRC19:37
*** spatel has quit IRC19:38
*** jackding has joined #openstack-nova19:38
sean-k-mooneymriedem: oh, youtube looks kind of weird to me when it's rendering 4:3 aspect ratio videos19:43
*** spatel has joined #openstack-nova19:47
*** s10 has quit IRC19:48
*** spatel has quit IRC19:56
*** spatel has joined #openstack-nova19:58
mriedemwho here knows what actually happens to the guest when you stop/start a vmware/hyperv/xenapi/ironic/powervm VM?19:59
mriedemspecifically, the root disk of said VMs if it's volume-backed?20:00
mriedemefried: does powervm in-tree support boot from volume yet?20:00
efriededmondsw: ^20:00
efriedlooking...20:00
*** spatel has quit IRC20:03
*** spatel has joined #openstack-nova20:04
efriedmriedem: Does compute pass destroy_disks=False to the destroy() method if booted from volume?20:05
efriedmriedem: I assume you're trying to find out whether the disk gets destroyed or not.20:06
*** artom has quit IRC20:07
efriedI can tell you this: In tree, we don't destroy volumes.20:07
efriedBut I don't know whether we support bfv20:07
mriedemefried: no not related to that20:07
mriedemrelated to https://review.openstack.org/#/c/600628/20:07
mriedemwhich i haven't -1ed yet but it's coming20:07
mriedemthe virt driver doesn't destroy volumes, the compute manager orchestrates the detach and delete if bdm.terminate_on_deletion is True20:08
mriedemi'm mostly wondering if the virt driver will disconnect and reconnect volumes on simple stop/start operatoins20:08
mriedemfor libvirt, we do - starting around queens or rocky20:08
efriedWe don't disconnect anything on power-off20:10
*** slaweq has quit IRC20:10
*** mlavalle has quit IRC20:10
efriedThat said, I'm not 100% sure the *platform* retains ownership of that resource in such a way that you couldn't attach it to something else while the instance is powered off.20:10
efriedGerald would be better equipped to answer this stuff. But he ain't here.20:11
*** slaweq has joined #openstack-nova20:11
*** mlavalle has joined #openstack-nova20:11
*** macza has quit IRC20:15
*** macza has joined #openstack-nova20:16
edmondswmriedem re: bfv for powervm in-tree... I believe the code is close enough that it might work, but it's untested and there is at least one improvement we should make20:16
*** med_ has joined #openstack-nova20:16
*** spartakos has quit IRC20:17
edmondswmriedem why would you disconnect and reconnect volumes on stop/start?20:18
sean-k-mooneyedmondsw: the libvirt driver destroys the domain and recreates it on stop/start20:20
edmondswright... why?20:21
sean-k-mooneyedmondsw: so it's probably done as a side effect of that20:21
mriedemcomments inline in that spec20:21
sean-k-mooneyedmondsw: legacy reasons; we treat stop like delete as far as libvirt is concerned, but we don't delete the disk obviously20:22
sean-k-mooneywe do detach all ports, gpus etc. when we shut down the vm, but we still retain ownership of them in placement/the resource tracker20:23
*** gyee has joined #openstack-nova20:24
edmondswok. I'll assume "legacy reasons" means there's no reason for other drivers to consider doing that20:24
*** tbachman has joined #openstack-nova20:26
sean-k-mooneywell there is one, but it's not a good one. if you are using iscsi volumes, detaching the volume on stop reduces memory usage on the iscsi server20:26
sean-k-mooneywhich, if it's hardware based, also means we can potentially free up other hardware resources, but that also means the vm can fail to start back up if something else grabs the last slot20:27
sean-k-mooneythat said, you would have maxed out your cloud storage at that point, so you have bigger issues than one vm not starting20:27
sean-k-mooneyedmondsw: i don't know if there is an actual usecase where you would want to disconnect today, but maybe there is20:28
edmondswsean-k-mooney tx for the explanation20:28
sean-k-mooneyi know some people want to be able to do things with bfv root volumes when the instance is offline too, but i kind of zoned out at the ptg for that conversation.20:30
mriedemsean-k-mooney: that is exactly the spec i'm referring to above20:30
mriedemand why i'm asking about this20:30
*** spatel has quit IRC20:31
sean-k-mooneymriedem: ah ok, that makes more sense.20:31
mriedembecause i'm pretty sure swapping the root volume while the instance is stopped was not part of the originally approved spec20:31
mriedemand s10 got Kevin_Zheng to change it20:31
mriedemb/c of how the libvirt driver works20:31
*** tbachman_ has joined #openstack-nova20:31
*** tbachman has quit IRC20:31
*** tbachman_ is now known as tbachman20:31
mriedemand i'm asserting that's not a good enough reason...20:31
*** spatel has joined #openstack-nova20:31
sean-k-mooneyright, i think having an explicit api to detach a volume from a stopped instance would be better20:32
sean-k-mooneye.g. not assuming it's implicitly detached when you stop20:32
mriedemwell, the virt driver could just refuse to detach the root volume while the instance is stopped20:32
mriedemif it doesn't support it, and raise an exception which gets recorded as a fault20:32
sean-k-mooneymriedem: it could, but did they not want to allow that?20:32
sean-k-mooneye.g. detaching the root volume when it's stopped. that would be almost a no-op for libvirt20:33
mriedemthe spec is proposing that you can swap the root volume while the instance is offloaded or stopped20:33
*** med_ has quit IRC20:33
sean-k-mooneyright, but you could do that by creating a new volume, stopping the instance, detaching the root volume, attaching the volume created in step 1, then starting20:34
sean-k-mooneydo you need an explicit api to do the swap as an atomic operation?20:34
mriedemno, and that is what the spec is proposing20:34
mriedem"creating a new volume, stopping the instance, detaching the root volume, attaching the volume created in step 1, then starting"20:35
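As a sketch with the openstack client, that sequence would be roughly the following (names are placeholders; the root-volume detach step is the proposed capability, not something that works today, which is exactly mriedem's objection below):

    openstack volume create --size 20 new-root
    openstack server stop myserver
    openstack server remove volume myserver old-root    # proposed: root detach while stopped
    openstack server add volume myserver new-root
    openstack server start myserver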
mriedemmy point is, i don't know that all virt drivers could handle that today for the root volume20:35
mriedemwhile the instance is stopped20:35
sean-k-mooneyha ok, um, perhaps20:35
sean-k-mooneyi can't think why they could not, if they support rebuild20:36
*** tssurya has joined #openstack-nova20:36
*** spatel has quit IRC20:36
sean-k-mooneyit's basically the same thing, except we are not changing hosts20:36
*** spatel has joined #openstack-nova20:37
mriedemrebuild does a driver.destroy and then a driver.spawn20:37
mriedemit assumes destruction20:37
mriedemi can't say that stop/start assume that same thing20:38
mriedemsame with shelve/unshelve20:38
mriedemshelve does a driver.destroy and unshelve does a driver.spawn20:38
mriedemsean-k-mooney: haven't you been working for like 20 straight hours at this point or something?20:39
mriedemat what point do you become drunk by exhaustion?20:39
sean-k-mooneymriedem: all of this is true. i would be surprised if this could not be supported on multiple hypervisors, but it's good to check.20:39
sean-k-mooneyhaha not quite but if i get tired enough i do find it harder to concentrate.20:40
sean-k-mooneyi have been working since 10:30 + i took an hour for lunch, but ya, i'm just finishing up for the day20:41
mriedempretty sure you said you were finishing for the day about 4 hours ago20:41
sean-k-mooneyi did finish rather late last night. ya, got distracted with a few things20:41
sean-k-mooneyi really need to file my ptg expenses tomorrow... i started twice today but then got pulled into email threads + code.20:44
sean-k-mooneythat's what i was trying to figure out for the last 30 mins, but it can wait20:44
sean-k-mooneytalk to you tomorrow20:45
melwittmriedem, tssurya: I told dansmith we're skipping the meeting based on the earlier convo20:46
tssuryamelwitt: ack, thanks for the info :)20:47
melwitthe's not going to be back in time anyway20:47
mriedemhe said he would be back in time20:47
tssuryawfm, it's too late here anyways, you guys have a good day20:47
mriedem<3 broken20:47
melwittheh20:48
tssurya:)20:48
mriedemhttps://bugs.launchpad.net/nova/+bug/179596620:49
openstackLaunchpad bug 1795966 in OpenStack Compute (nova) "<class 'oslo_db.exception.DBNonExistentTable'> (HTTP 500)" [Undecided,Invalid]20:49
melwittmeanwhile, gd consoleauth. we have an API where you can 'show' your console auth token. and that is making the deprecation nightmare worse. have to figure out if/how to adjust this for the database backend20:50
*** spatel has quit IRC20:50
melwittmriedem: that must be the shortest bug report ever20:50
mriedemhttps://developer.openstack.org/api-ref/compute/#create-remote-console ?20:50
melwitthttps://developer.openstack.org/api-ref/compute/#show-console-connection-information20:51
mriedemah heh20:51
melwittFML20:51
mriedemwell,20:51
mriedemoh heh you can't know which cell to route it to right20:52
mriedemb/c the token isn't mapped in the api20:52
mriedemyou'll have to iterate the cell dbs looking for that token id20:52
melwittno... which I'm trying to remember, what did I find last time I looked at this. arrrrgghh20:53
*** spartakos has joined #openstack-nova20:54
mriedemhmm, we only store the hashed token in the db right20:55
mriedemand that's not what the API would have in it?20:55
melwittyeah only the hashed token. and I think the API takes the unhashed token from the user20:55
mriedemha, cool20:56
mriedemfor cell in all_cells(): for console_auth_token in all_console_auth_tokens_in_cell(cell): if console_auth_token == req_id: do_that_thing()20:56
*** eharney has quit IRC20:57
mriedemi bet we log that unhashed token in the nova-api logs too...20:58
mriedemsince it's on the path20:58
mriedemhttps://github.com/openstack/nova/blob/master/nova/api/openstack/requestlog.py#L4120:59
melwittindeed, I can see it in the func test output20:59
mriedemha, cool20:59
melwitt2018-10-03 20:37:40,870 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/os-console-auth-tokens/714a26ff-d7e6-4698-bc30-9934ebf38807"20:59
mriedemwell luckily logging credentials isn't a CVE21:00
*** priteau has quit IRC21:01
melwitt...21:01
melwittI guess people aren't paying too much attention to this API, myself included21:02
mriedemit's admin-only by default21:02
melwittI see, ok21:02
melwittNote "This is only used in Xenserver VNC Proxy."21:03
melwittreally? I wonder how21:03
mriedemthat's for the other 421:03
mriedemi saw that as well21:03
mriedemos-console-auth-tokens was added specifically for rdp consoles for hyperv21:03
melwittO.o21:04
*** erlon has quit IRC21:06
*** spartakos has quit IRC21:08
melwittyeah, so we could scatter-gather a ConsoleAuthToken.validate(context, token) call and only one will return a token object, the others will raise exceptions. that method takes an unhashed token and will hash it before looking for it in the db21:09
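A minimal sketch of that scatter-gather idea (not the merged fix): nova's scatter_gather_all_cells helper actually returns sentinel values for cells that raise or time out, which is simplified here to an isinstance check.

```python
from nova import context as nova_context
from nova import objects

def find_console_auth_token(ctxt, token):
    # validate() hashes the user-supplied token before the db lookup,
    # which is why the api can't route by token; ask every cell instead.
    results = nova_context.scatter_gather_all_cells(
        ctxt, objects.ConsoleAuthToken.validate, token)
    for cell_uuid, result in results.items():
        if isinstance(result, objects.ConsoleAuthToken):
            return result  # only the owning cell validates successfully
    return None
```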
mriedemsure21:10
mriedemshitty performance but whatareyougonnado21:11
mriedemplus it's admin-only and no one knew it existed21:11
melwittyeah, exactly21:11
melwitthaha, right. we have that going for us21:11
mriedemis there a bug for this?21:11
melwittI'll add it to the pile o poopatches21:11
melwittno21:11
melwittI'll open one21:11
mriedemcool. not sure if we should report the token logging thing or just pretend i never said it.21:12
melwittI was just making the changes to only access consoleauth if [workarounds] and ran into this in the func tests21:12
melwittso I'm doing really good here21:12
melwittyeah, I'm not sure either. I expect it wouldn't cause a CVE because it's been this way for years21:14
mriedemheh, well, we've had things "this way for years" that are CVEs,21:19
mriedembut logging credentials and tokens and such isn't considered one of them21:19
mriedemit's a "hardening opportunity"21:20
melwitthaha, ok21:20
openstackgerritJay Pipes proposed openstack/nova stable/ocata: Re-use existing ComputeNode on ironic rebalance  https://review.openstack.org/60762621:20
melwitthttps://bugs.launchpad.net/nova/+bug/179598221:21
openstackLaunchpad bug 1795982 in OpenStack Compute (nova) "/os-console-auth-tokens/{console_token} API doesn't handle the database backend" [High,Triaged] - Assigned to melanie witt (melwitt)21:21
*** spatel has joined #openstack-nova21:24
*** spartakos has joined #openstack-nova21:26
*** spatel has quit IRC21:29
melwittso, these other console create/delete/get APIs are connected to cell database models, with nothing at the API level to target cells for the consoles21:30
*** slaweq has quit IRC21:30
melwitt"nova-console, which is a XenAPI-specific service that most recent VNC proxy architectures do not use."21:31
melwittit sounds like that should be deprecated. we didn't do anything to handle it in a cells v2 world21:32
melwittmaybe I should send something to the ML to ask about it21:35
*** awaugama has quit IRC21:43
*** slagle has quit IRC21:44
mriedemi thought the xvp stuff was xen-only21:45
*** takashin has joined #openstack-nova21:45
melwittyeah, the nova-console service is xen-only. but if someone ran multi-cell with xen, the nova-console part wouldn't work right21:46
mriedembut yeah this is clearly busted in a cells v2 world21:46
*** tbachman has quit IRC21:47
melwittso the question will be, do we cells-v2-ify it or do we deprecate it. tbc, this is for the other APIs, not the consoleauth one I'm fixing21:47
melwitt*the other 421:47
mriedemyeah i know21:48
*** tbachman has joined #openstack-nova21:49
mriedemidk, i've asked about killing xvp in the past21:50
mriedemno one seems to know21:50
melwittah, ok21:50
mriedemi'd say if there are alternatives available for xenapi users, then we should deprecate it21:51
mriedemso probably a question for naichuans and BobBall21:51
melwittyeah, that's what I wasn't sure about, because IIUC, xenapi has to use some ancient version of stuff, so they might actually need it because they can't use newer VNC21:51
mriedemand yeah send something to the dev and ops MLs21:51
mriedemoh b/c of python 2.4?21:52
mriedemi might be thinking of something else21:52
mriedemi guess start with the ML21:52
melwittmaybe. when stephenfin worked on the encrypted console stuff, he had to exclude xenapi from the version requirement IIRC21:52
mriedemb/c 'twould be nice to drop all this crap21:52
melwittyeah21:52
melwittI was thinking of this https://github.com/openstack/nova/blob/master/nova/cmd/novncproxy.py#L4021:54
melwittso maybe unrelated since that implies xenapi users can use the regular novnc proxy21:55
*** munimeha1 has quit IRC21:56
melwittum, this is ancient. and not for exactly the same log you pointed out https://bugs.launchpad.net/nova/+bug/149214022:01
openstackLaunchpad bug 1492140 in OpenStack Compute (nova) "consoleauth token displayed in log file" [Low,In progress] - Assigned to Tristan Cacqueray (tristan-cacqueray)22:01
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Explicitly fail if trying to attach SR-IOV port  https://review.openstack.org/60772922:05
mriedemyeah that's for consoleauth22:05
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Ignore VirtDriverNotReady in _sync_power_states periodic task  https://review.openstack.org/60773022:07
*** s10 has joined #openstack-nova22:08
*** itlinux has quit IRC22:09
*** mlavalle has quit IRC22:11
mriedemgibi: efried: so i polished off this old dnm devstack test experiment patch to hammer the scheduler to create 1000 instances in a single request which used to give us a ConcurrentUpdate failure during scheduling and creating allocations, and now i can make it fail with consumer generation conflicts http://logs.openstack.org/18/507918/8/check/tempest-full/a9f3849/controller/logs/screen-n-sch.txt.gz?level=TRACE#_Oct_02_23_29_1248122:16
mriedemOct 02 23:29:12.475481 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: ERROR oslo_messaging.rpc.server [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Exception during message handling: AllocationDeleteFailed: Failed to delete allocations for consumer 6962f92b-7dca-4912-aeb2-dcae03c4b52e. Error: {"errors": [{"status": 409, "request_id": "req-13df41fe-cb55-49f1-a998-09b34e48f05b", "code": "placement.concurrent_update", "detail": "There was a conflict when trying to complete your request.\n\n consumer generation conflict - expected null but got 3  ", "title": "Conflict"}]}22:16
efriedmriedem: Are you using any in-flight patches under that, or just master?22:17
*** spartakos has quit IRC22:17
mriedemi think that is newish right?22:17
mriedemmaster22:17
efriedYes, it's new, since the bottom few patches of gibi's consumer gen patches merged.22:17
efriedmriedem: You may want to try running it on top of https://review.openstack.org/#/c/583667/ and see if that fixes it.22:19
efriedmriedem: fyi, the ConcurrentUpdate and generation conflict are the same thing, we just switched the error code recently.22:20
efriedso it's not really "new", it's just wearing a different dress.22:20
mriedemthat change doesn't look like it would help here,22:20
efriedorly ynot?22:21
mriedemit doesn't use the latest consumer generation when deleting allocations right?22:21
mriedemwe're failing to submit allocations because the host is full22:21
mriedemOct 02 23:29:12.377430 ubuntu-xenial-limestone-regionone-0002536892 nova-scheduler[22653]: WARNING nova.scheduler.client.report [None req-f4fe43ea-d117-4b7d-a3a4-23dcb59f3058 admin admin] Unable to submit allocation for instance 63ae7544-7693-4749-886b-024dc93f09f9 (409 {"errors": [{"status": 409, "request_id": "req-0e85117c-871c-46c2-9e01-53c84e811b44", "code": "placement.undefined_code", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'b7709a93-f14c-42ed-addf-9736fb721728'. The requested amount would exceed the capacity.  ", "title": "Conflict"}]})22:21
mriedemand then the scheduler is trying to cleanup allocations created for previously processed instances in the same request22:21
mriedemand fails to do that cleanup b/c the consumer generation changed22:22
efriedhm, yeah, I didn't notice that the first one was a failure on deletion.22:22
efriedThat window I bitched about us not really closing22:22
efriedapparently it's big enough for us to actually hit it.22:22
mriedemnote this is also an extreme case,22:23
mriedemi'm creating 1000 instances in a single request22:23
mriedemexpecting to melt the scheduler22:23
mriedemand i do22:23
efriedI'm trying to figure out where that message is really coming from. "expected null but got 3" <== does this mean we sent null or 3 to the API?22:25
mriedemyeah so between the GET and PUT of the allocations, the consumer generation changed and we blew up22:25
melwittI wonder if a normaler number like 100 would do it? because I have heard of people doing that (oath)22:25
*** rcernin has joined #openstack-nova22:25
efriedright, I'm trying to figure out how that happens. What else is mucking with the allocations?22:25
mriedemhttps://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L208322:25
mriedem"If between the GET and the PUT the consumer                               # generation changes then we raise AllocationDeleteFailed."22:25
mriedemthe consumer is just the project/user right?22:25
melwitt(my comment was based on the use of the word "extreme")22:26
mriedemi mean, i guess we have the instance_uuid for the consumer here.22:26
efriedThat's the aforementioned window I bitched about, the result of which was the comment a couple of lines below that.22:26
mriedemis the consumer in placement the unique constraint of the uuid/project_id/user_id?22:26
efriedThere's a real consumer object now22:26
mriedemso the other thing here,22:26
efriedthe consumer object (a db table row) has a generation we have to update atomically or die.22:27
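In placement API terms (microversion 1.28 and later), the atomic-update contract efried describes looks roughly like this. A hedged sketch only: the endpoint, token, and error handling are stand-ins, not nova's report client.

```python
import requests

PLACEMENT = 'http://placement.example/placement'  # assumed endpoint
HEADERS = {'X-Auth-Token': 'admin-token',         # assumed auth
           'OpenStack-API-Version': 'placement 1.28'}

def delete_allocations(consumer_uuid, project_id, user_id):
    url = '%s/allocations/%s' % (PLACEMENT, consumer_uuid)
    current = requests.get(url, headers=HEADERS).json()
    payload = {
        'allocations': {},  # empty allocations == delete them (>= 1.28)
        'consumer_generation': current.get('consumer_generation'),
        'project_id': project_id,
        'user_id': user_id,
    }
    resp = requests.put(url, json=payload, headers=HEADERS)
    if resp.status_code == 409:
        # another worker updated the consumer between our GET and PUT:
        # the placement.concurrent_update case in the trace above
        raise RuntimeError(resp.text)
```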
sean-k-mooneymelwitt: i have spawned 350 instances in one request before on newton and it worked fine22:27
mriedemis that because we have a retry decorator on the select_destinations rpc call, if we get MessagingTimeout from the scheduler b/c it takes too long to schedule 1000 instances in a single request, it re-sends the request to the scheduler with the same list of instances22:27
melwittsean-k-mooney: ack, that's a data point22:27
mriedemso at this point we would have 2 workers trying to create allocations for the same set of instances (consumers) against the same set of providers22:27
mriedemwhich will stomp all over themselves22:27
efriedoh, okay. That'd do it.22:27
mriedemwhat i'm wondering is if delete_allocation_for_instance should detect the consumer generation conflict and retry22:28
mriedemlike we do on claim_resources https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L174422:28
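The claim_resources pattern mriedem points at is, at its core, a small retry decorator. A generic sketch of the idea with illustrative names (not nova's actual helper, which retries on an internal Retry exception):

```python
import functools
import random
import time

def retries(attempts=3):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapped(*args, **kwargs):
            for attempt in range(attempts):
                if fn(*args, **kwargs):  # fn returns False on a conflict
                    return True
                # jittered backoff before re-reading the latest consumer
                # generation and trying again
                time.sleep(random.uniform(0, 0.1 * (attempt + 1)))
            return False
        return wrapped
    return decorator

@retries()
def try_delete_allocations():
    ...  # re-GET allocations, PUT back with the fresh consumer_generation
```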
*** s10 has quit IRC22:28
efriedmriedem: We decided it should not22:31
efriedthere was a whole long ML thread about it.22:31
efriedBecause, we said, there really shouldn't be more than one thing acting on instance allocations at once, we said.22:31
efriedIMO the bug is  that ^22:32
mriedemyeah i'm writing up the bug now22:33
efriedmriedem: Do you need the ML thread?22:33
mriedemno22:33
mriedemhttps://bugs.launchpad.net/nova/+bug/179599222:33
openstackLaunchpad bug 1795992 in OpenStack Compute (nova) "retry_select_destinations decorator can make a mess with allocations in placement in a large multi-create request" [Medium,Triaged]22:33
efriedHeh. "make a mess".22:34
* efried has fond memories of "diaper failure"22:34
efried...fond because they're *memories*.22:34
mriedemtotal blowout22:34
efriedOne time in IKEA22:34
mriedemcoincidentally, lbragstad is dealing with that right now22:34
efriedOh, did he pop? Good deal.22:35
mriedemblack split pea soup coming out of everything22:35
mriedemlet me mind meld with him quick22:35
melwittcongrats lbragstad22:35
sean-k-mooneyoh before i forget i popped back to say i just found out that kernel 4.16 added a new netdevsim driver that supports, among other cool things, sriov. would people be ok with me creating an experimental gate job to test sriov using fedora28?22:36
efriedmriedem: Actually, that patch I mentioned before might possibly make the failure happen earlier in the sequence...22:36
efriedbecause surely the allocation is being overwritten22:36
efriedthough it might be subject to the same window-teeniness22:37
*** tbachman has quit IRC22:38
*** tssurya has quit IRC22:41
*** macza has quit IRC22:54
*** macza_ has joined #openstack-nova22:54
*** spartakos has joined #openstack-nova23:02
*** tbachman has joined #openstack-nova23:07
openstackgerritMatt Riedemann proposed openstack/nova master: Use long_rpc_timeout in select_destinations RPC call  https://review.openstack.org/60773523:07
mriedemdansmith: ^23:07
* dansmith nods in approval23:08
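The shape of that fix, roughly (a sketch against oslo.messaging, not the exact diff; the topic/version values are stand-ins, and long_rpc_timeout is nova's config option):

```python
import oslo_messaging as messaging
from oslo_config import cfg

CONF = cfg.CONF

def prepare_select_destinations(transport):
    target = messaging.Target(topic='scheduler', version='4.0')
    client = messaging.RPCClient(transport, target)
    # Heartbeat every rpc_response_timeout seconds while letting the call
    # run up to long_rpc_timeout overall, instead of timing out (and being
    # re-sent by the retry decorator, duplicating the scheduling work)
    # after a single short period.
    return client.prepare(
        call_monitor_timeout=CONF.rpc_response_timeout,
        timeout=CONF.long_rpc_timeout)
```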
*** macza_ has quit IRC23:12
*** tbachman has quit IRC23:12
mriedemmelwitt: ocata backport here should be ready to go https://review.openstack.org/#/c/605842/23:12
melwittok23:13
*** tbachman has joined #openstack-nova23:16
mriedemlyarwood: if you want to get these live migration ipv6 changes into the final ocata release before we put it into EM mode you'll need to get the pike and ocata backports fixed up https://review.openstack.org/#/q/I1201db996ea6ceaebd49479b298d74585a78b00623:24
*** artom has joined #openstack-nova23:30
melwittTIL unified object string fields are six.text_type i.e. unicode23:38
melwittdo we have any things where we compare strings agnostic to bytes vs unicode in unit tests?23:46
melwittthis test is asserting the api response as a dict23:47
melwittand if consoleauth served the request, it's bytes strings and if the unified object served the request, it's unicode strings23:47
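One way to make such an assertion agnostic to bytes vs unicode is to normalize everything to text before comparing. A hypothetical test helper, not something nova's test utils provide:

```python
def normalize(obj):
    """Recursively decode bytes to text so dict comparisons are
    representation-agnostic across py2 bytes and unified-object unicode."""
    if isinstance(obj, bytes):
        return obj.decode('utf-8')
    if isinstance(obj, dict):
        return {normalize(k): normalize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [normalize(item) for item in obj]
    return obj

# in the test: self.assertEqual(normalize(expected), normalize(response))
```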
*** itlinux has joined #openstack-nova23:48
*** mchlumsky has quit IRC23:49
*** slagle has joined #openstack-nova23:51
*** mriedem has quit IRC23:51
