*** dasm is now known as dasm|off | 01:24 | |
ierdem | Hi folks, is there any way to boot a signed image from volume? I am testing image validation; I can create a VM by using signed images on ephemeral disks, but when I try boot from volume, it throws an exception (https://paste.openstack.org/show/blZen5ID7OIbi47TN8ib/). I am currently working on OpenStack Ussuri, and the image backend is Ceph | 10:01 |
bauzas | gibi: so, there are many ways tricking stestr scheduling | 10:10 |
bauzas | gibi: but given we directly call tempest which eventually calls stestr, the quickest way to change the test scheduling is renaming the test name | 10:10 |
bauzas | https://stestr.readthedocs.io/en/latest/MANUAL.html#test-scheduling | 10:11 |
bauzas | "By default stestr schedules the tests by first checking if there is any historical timing data on any tests. It then sorts the tests by that timing data loops over the tests in order and adds one to each worker that it will launch. For tests without timing data, the same is done, except the tests are in alphabetical order instead of based on timing data. If a group regex is used the same algorithm is used with groups instead of | 10:11 |
bauzas | individual tests." | 10:11 |
bauzas | gibi: https://review.opendev.org/c/openstack/tempest/+/870913 | 10:20 |
bauzas | gibi: could I modify https://review.opendev.org/c/openstack/nova/+/869900/ to add a Depends-On on ^ ? | 10:20 |
gibi | let's keep https://review.opendev.org/c/openstack/nova/+/869900/ alone, as I want to land that regardless of our troubleshooting here. I think some of the other problems might go away after that removes the excessive logging. | 10:24 |
gibi | about https://review.opendev.org/c/openstack/tempest/+/870913 why are we trying to move the test to the front? I thought we wanted to move it later or even disable it temporarily to see if other tests trigger the same OOM behavior, and hence try to establish a pattern causing the OOM | 10:25 |
bauzas | gibi: OK, then I'll add a DNM on nova, np | 10:32 |
bauzas | gibi: good question, I wanted to see whether it was due to this test or not | 10:32 |
bauzas | if we call it first, and if this is due to this test, it would be killed earlier, right? | 10:33 |
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test https://review.opendev.org/c/openstack/nova/+/870924 | 10:39 |
kashyap | gibi: Can you have a quick look at this workaround patch when you can (for a change all CI have passed): https://review.opendev.org/c/openstack/nova/+/870794 | 10:44 |
kashyap | (When you get a minute, that is) | 10:44 |
tobias-urdin | gibi: any possibility that we can backport this https://review.opendev.org/c/openstack/nova/+/838976 and the parent reproducer patch? we are currently patching that in production as we're on a newer libvirt with an older nova release (xena right now, probably yoga later this year) | 10:48 |
ierdem | Hi everyone, is there any way to boot a signed image from volume? I am testing image validation; I can create a VM by using signed images on ephemeral disks, but when I try boot from volume, it throws an exception (https://paste.openstack.org/show/blZen5ID7OIbi47TN8ib/). I am currently working on OpenStack Ussuri, and the image backend is Ceph | 11:06 |
gokhanisi | hello folks, after rebooting my compute host, I can't attach my cinder volumes to instances. Nova throws "unable to lock /var/lib/nova/mnt/dgf/volume-xx for metadata change: No locks available". Full logs are at https://paste.openstack.org/show/beicZ71J17WeNwLjghKc/ What could be the reason for this problem? I am on victoria. On this compute node I am also using gpu passthrough. | 11:15 |
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test https://review.opendev.org/c/openstack/nova/+/870924 | 11:23 |
gibi | tobias-urdin: regarding https://review.opendev.org/c/openstack/nova/+/838976 I think this is technically backportable but bauzas should know more about it as it is vgpu related | 11:24 |
* bauzas looks | 11:24 | |
bauzas | gibi: tobias-urdin: I already proposed the backports down to wallaby | 11:24 |
sean-k-mooney | tobias-urdin: there is a backport already | 11:25 |
sean-k-mooney | tobias-urdin: https://review.opendev.org/c/openstack/nova/+/866156 is the xena cherry pick | 11:25 |
sean-k-mooney | tobias-urdin: we needed it for wallaby for our downstream product, so all the patches are up for review, but we already merged them downstream at the end of the year | 11:26 |
gibi | ahh I missed the backports as the topic was not set on them | 11:35 |
tobias-urdin | oh great, thanks! | 11:39 |
gibi | bauzas: regarding OOM I can do a parallel experiment moving the test to the latest to see if others before it trigger the OOM or not | 11:39 |
gibi | s/latest/last/ | 11:40 |
bauzas | gibi: sure, do it | 11:40 |
gibi | ack | 11:40 |
bauzas | gibi: I'm starting to look at the functest races | 11:40 |
bauzas | but I'm hungry | 11:40 |
gibi | ack. I start to get hungry too | 11:41 |
gibi | damn biology | 11:41 |
gibi | bauzas: this was the bug https://bugs.launchpad.net/nova/+bug/1946339 I referred to yesterday related to the funct test failures. It might or might not be related :/ | 11:47 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption https://review.opendev.org/c/openstack/nova/+/826755 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted https://review.opendev.org/c/openstack/nova/+/826756 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS https://review.opendev.org/c/openstack/nova/+/772273 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets https://review.opendev.org/c/openstack/nova/+/870931 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870932 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Support resize with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870933 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to convert_image https://review.opendev.org/c/openstack/nova/+/870934 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property https://review.opendev.org/c/openstack/nova/+/870935 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase https://review.opendev.org/c/openstack/nova/+/870936 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870937 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList https://review.opendev.org/c/openstack/nova/+/870938 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties https://review.opendev.org/c/openstack/nova/+/870939 | 12:38 |
opendevreview | melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw https://review.opendev.org/c/openstack/nova/+/862416 | 13:00 |
opendevreview | melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw https://review.opendev.org/c/openstack/nova/+/862416 | 13:03 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test https://review.opendev.org/c/openstack/nova/+/870950 | 13:28 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test https://review.opendev.org/c/openstack/nova/+/870950 | 13:29 |
gibi | bauzas: my trial is at https://review.opendev.org/c/openstack/tempest/+/870947 and https://review.opendev.org/c/openstack/nova/+/870950 | 13:29 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test https://review.opendev.org/c/openstack/nova/+/870950 | 13:33 |
*** dasm|off is now known as dasm | 13:34 | |
kashyap | gibi: Thanks for catching my sloppiness here (I actually was rephrasing it locally). Do my replies seem reasonable to you? - https://review.opendev.org/c/openstack/nova/+/870794 | 13:35 |
* kashyap goes to rework | 13:35 | |
bauzas | gibi: hmm, TIL about dstat and memory_tracker usage from devstack | 13:37 |
sean-k-mooney | ya they run as a service in all the devstack jobs | 13:41 |
sean-k-mooney | we used to have peak_mem_tracker too or something like that | 13:41 |
sean-k-mooney | dstat more or less has all the info you want/need | 13:41 |
kashyap | gibi: When you get a sec, I'm wondering what else is missing in this unit-test diff to check the API is called only once - https://paste.opendev.org/show/b7CXHOkMeuuXuzQD6QtW/ | 13:54 |
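For reference, checking a call count like that usually comes down to unittest.mock's assert_called_once_with. A minimal, self-contained sketch — not kashyap's actual diff, and the start_host/compareCPU names are only illustrative stand-ins:

```python
from unittest import mock


def start_host(conn):
    # hypothetical code under test: one CPU-compatibility check at start-up
    conn.compareCPU("<cpu/>", 0)


def test_compare_cpu_called_once():
    conn = mock.Mock()
    start_host(conn)
    # fails if compareCPU was never called, called more than once,
    # or called with different arguments
    conn.compareCPU.assert_called_once_with("<cpu/>", 0)
```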
gibi | bauzas: this fresh functional failure https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html is very similar to what we discussed with melwitt in the comments of https://bugs.launchpad.net/nova/+bug/1946339 | 14:15 |
gibi | so probably eventlets are escaping the end of the test case execution where they were born and interfering with later tests | 14:16 |
sean-k-mooney | parsing that statement... | 14:17 |
sean-k-mooney | that sounds kind of familiar | 14:18 |
gibi | yepp we fixed a set of those in the past but not all it seems | 14:18 |
sean-k-mooney | i thought we were explicitly shutting down the event loop between tests globally | 14:18 |
sean-k-mooney | as in via a fixture | 14:19 |
gibi | is there a way to do that? | 14:19 |
sean-k-mooney | well yes | 14:19 |
sean-k-mooney | if we modify the base test case to call into eventlet | 14:19 |
sean-k-mooney | in test cleanup | 14:19 |
sean-k-mooney | i think there is a global kill but there is also a per-greenthread kill | 14:20 |
sean-k-mooney | https://eventlet.net/doc/modules/greenthread.html#eventlet.greenthread.kill that's the per green thread one | 14:21 |
sean-k-mooney | we can also use waitall | 14:21 |
sean-k-mooney | https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.waitall | 14:21 |
gibi | we are not pooling our eventlets | 14:22 |
gibi | and as you noted the greenthread.kill assumes we have access to the greenthread to kill | 14:23 |
sean-k-mooney | well we do have a greenthread pool but it's provided by oslo | 14:23 |
sean-k-mooney | i was just looking at the docs to see if we have a way to list the greenthreads | 14:23 |
gibi | when we call spawn or spawn_n we are not using the greenlet from the pool | 14:23 |
sean-k-mooney | ya but i think there is a default pool that is used | 14:24 |
sean-k-mooney | i could be wrong | 14:24 |
gibi | at least I haven't come across it when I originally fixed part of this problem | 14:24 |
sean-k-mooney | i guess if there is one it does not say https://eventlet.net/doc/basic_usage.html#eventlet.spawn | 14:25 |
sean-k-mooney | is there any reason not to just have one global pool | 14:25 |
sean-k-mooney | gibi: i was thinking of https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.executor_thread_pool_size by the way | 14:26 |
sean-k-mooney | Size of executor thread pool when executor is threading or eventlet. | 14:26 |
gibi | as far as I see that is only used by oslo_messaging creating rpc message handler threads / eventlets. but nova uses spawn and spawn_n directly outside of oslo messaging | 14:28 |
gibi | we can try to pool them but I'm not sure both spawn and spawn_n can be pooled in the same way | 14:28 |
gibi | as they are not creating the same entity | 14:29 |
sean-k-mooney | there are spawn and spawn_n function on the pools | 14:29 |
sean-k-mooney | we might need a separate one from the rpc one | 14:29 |
sean-k-mooney | or want a separate one | 14:29 |
sean-k-mooney | but i think from an api point of view it should be fine | 14:29 |
sean-k-mooney | https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.spawn and https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.spawn_n | 14:30 |
sean-k-mooney | hopefully we could just update it here https://github.com/openstack/nova/blob/master/nova/utils.py#L635-L684 | 14:30 |
sean-k-mooney | so create a module level pool and use that, then in the tests call waitall in the base testcase cleanup | 14:31 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/test.py#L150 we don't currently do that in a cleanup function, but in setUp we can also just add | 14:32 |
sean-k-mooney | self.addCleanup(utils.greenpool.waitall) | 14:32 |
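A rough sketch of the idea sean-k-mooney outlines above — routing nova's utils.spawn()/spawn_n() through one module-level GreenPool so the base test case can wait for stray greenthreads. Names and the pool size are assumptions for illustration; this is not the merged nova code:

```python
import eventlet

# module-level pool shared by spawn() and spawn_n(); the size is an
# assumption and would likely need its own tuning/config in nova
greenpool = eventlet.GreenPool(1000)


def spawn(func, *args, **kwargs):
    # GreenPool.spawn returns a GreenThread whose result can be wait()ed on
    return greenpool.spawn(func, *args, **kwargs)


def spawn_n(func, *args, **kwargs):
    # GreenPool.spawn_n is fire-and-forget, mirroring eventlet.spawn_n
    greenpool.spawn_n(func, *args, **kwargs)
```

The base TestCase setUp would then register `self.addCleanup(utils.greenpool.waitall)` as suggested above, so every test waits for the greenthreads it spawned before the next test starts.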
gibi | I can try to set up a way to reproduce the issue more frequently locally and try to see if the pooling might solve it or not | 14:35 |
bauzas | sorry, was at the hairdresser | 14:40 |
* bauzas scrolling up | 14:40 | |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test https://review.opendev.org/c/openstack/nova/+/870950 | 14:48 |
gibi | bauzas: one result from your OOM trial is that I noticed that the test in question takes a relatively long time even when it passes: tempest.api.compute.admin.test_aaa_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive [181.818420s] | 14:54 |
bauzas | gibi: yup, I've seen it | 14:54 |
bauzas | maybe we should introspect the memory size of the cached image | 14:54 |
bauzas | gibi: fwiw, since the UT ran successfully in the DNM patch, I looked at n-api log and I found we called it | 15:20 |
bauzas | gibi: while on https://834de1be955e9175dba1-6977f7378e5264bdb9ba9d1465839752.ssl.cf1.rackcdn.com/869900/6/gate/nova-ceph-multistore/f5aa5ed/controller/logs/screen-n-api.txt we were not calling it | 15:20 |
bauzas | so, I think the test was killed during the first glance call | 15:21 |
bauzas | https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_volume.py#L84-L89 | 15:22 |
dansmith | I'm stacking a ceph devstack right now | 15:49 |
dansmith | so when it's done I could try running just that test and see if it behaves properly in isolation | 15:49 |
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test https://review.opendev.org/c/openstack/nova/+/870924 | 15:55 |
gibi | bauzas, dansmith: https://bugs.launchpad.net/nova/+bug/2002951/comments/5 based on dstat and the tempest log I'm pretty sure that loading the image data is using up the memory | 16:21 |
bauzas | gibi: I added a few lines | 16:22 |
dansmith | gibi: oh is show_image() eating the whole image? | 16:22 |
bauzas | https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py | 16:22 |
dansmith | like response.content instead of response.iter_content ? | 16:23 |
bauzas | dansmith: I asked for the image cache size in my next revision | 16:23 |
bauzas | gibi: that was my guess | 16:23 |
bauzas | hence the new rev I created $ | 16:23 |
dansmith | if so I guess that's my bad for making the image size large, although that is exactly why I did it :) | 16:23 |
dansmith | oh, I see, | 16:25 |
gibi | dansmith: I don't see any iter_content involved | 16:25 |
dansmith | it's actually downloading and re-uploading the image? | 16:25 |
gibi | yepp | 16:25 |
gibi | and doing it in memory | 16:25 |
dansmith | riight, okay | 16:25 |
dansmith | so yeah that'll have to turn into a chunked loop | 16:25 |
dansmith | I can take a look at that if you want, but we might want to just disable that test for the moment | 16:26 |
bauzas | dansmith: as you see they copy in memory the whole image | 16:26 |
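The shape of the "chunked loop" fix dansmith is describing, sketched with plain requests. The URLs, token handling and function name are assumptions for illustration, not tempest's actual images client:

```python
import requests

CHUNK = 64 * 1024  # fixed-size read buffer, so peak memory stays ~64 KiB


def copy_image_data(download_url, upload_url, token):
    headers = {'X-Auth-Token': token}
    # stream=True stops requests from buffering the whole body in memory
    with requests.get(download_url, headers=headers, stream=True) as resp:
        resp.raise_for_status()
        chunks = resp.iter_content(chunk_size=CHUNK)  # generator of chunks
        upload_headers = dict(headers)
        upload_headers['Content-Type'] = 'application/octet-stream'
        # passing a generator makes requests stream the upload as well,
        # instead of holding a ~1G image in a single bytes object
        put = requests.put(upload_url, headers=upload_headers, data=chunks)
        put.raise_for_status()
```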
gibi | bauzas: if I'm right, L90 in your modification won't return https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90 | 16:26 |
bauzas | gibi: hmm? | 16:26 |
bauzas | https://docs.openstack.org/oslo.utils/ocata/examples/timeutils.html#using-a-stopwatch-as-a-context-manager | 16:27 |
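For reference, the oslo.utils StopWatch context manager bauzas links above works roughly like this — a minimal standalone example, not the tempest change itself:

```python
import time

from oslo_utils import timeutils

with timeutils.StopWatch() as watch:
    time.sleep(0.1)  # stand-in for the glance call being timed
print("took %.2fs" % watch.elapsed())
```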
gibi | bauzas: OMM will kill the process in the middle | 16:27 |
gibi | OOM | 16:27 |
bauzas | gibi: no, the test is run before | 16:27 |
dansmith | this also means anyone using tempest with a real image (like for verification) will be eating a ton of data | 16:27 |
bauzas | that's still using aaa | 16:27 |
gibi | calling _create_image_with_custom_property will trigger the OOM | 16:27 |
bauzas | gibi: it wasn't the case in the first revision | 16:28 |
bauzas | we waited for 180secs but eventually we didn't get a kill | 16:28 |
gibi | hm, maybe at that run the image fit into memory | 16:28 |
bauzas | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-ceph-multistore/3626391/testr_results.html | 16:29 |
bauzas | gibi: yeah, because the test is called first | 16:29 |
gibi | anyhow dansmith if you look into fixing it then I will propose to disable this test in tempest with a @skip decorator as it is probably dangerous to run in any job | 16:29 |
dansmith | yeah, do it | 16:30 |
gibi | ack | 16:30 |
gibi | bauzas: ahh yeah, that helps to fit | 16:30 |
dansmith | I'm still stacking, had to start over because horizon seems broken :/ | 16:30 |
dansmith | 2023 is off to a *great* start | 16:30 |
bauzas | gibi: that said, I wonder why this test was fine for like the 10 months before | 16:31 |
gibi | my local tox -e functional-py310 env produces strange failures (>1000 failed test on master) so yeah it is *great* :D | 16:31 |
dansmith | bauzas: I recently increased the size of the image used on the ceph job from 16MB to 1G | 16:31 |
dansmith | but that was in like november or so | 16:31 |
dansmith | so I'm pretty surprised we've been holding on this long | 16:31 |
dansmith | probably because that memory never gets touched again and just gets swapped out | 16:31 |
sean-k-mooney | the gate was pretty ok after you went on pto at the start of december | 16:32 |
sean-k-mooney | so i dont think its related to using the 1G image | 16:33 |
dansmith | sean-k-mooney: I'm happy to leave if that's what helps | 16:33 |
dansmith | sean-k-mooney: this job crashing with OOM seems clearly related as the job eats the 1g image and ... swells to 1g before it goes boom | 16:33 |
dansmith | s/job/test worker/ | 16:33 |
sean-k-mooney | ok do we know why it's only happening now or did we get lucky before | 16:33 |
gibi | maybe something else grew a bit in memory usage recently and is pushing the overall worker VM over the line | 16:34 |
gibi | sean-k-mooney: as bauzas showed, if you run this test earlier in the job then it still passes | 16:34 |
dansmith | sean-k-mooney: read the scrollback :) | 16:34 |
sean-k-mooney | ya well ok we could revert back to 512 mb and see if it grows by 512mb | 16:34 |
sean-k-mooney | i was just starting to, ya | 16:34 |
dansmith | sean-k-mooney: gibi identified a test that reads the whole image into memory | 16:34 |
bauzas | gibi: dansmith: added Tempest and glance to the bug report | 16:34 |
sean-k-mooney | oh ok | 16:34 |
sean-k-mooney | so fix that test or revert? i assume fix the test to not do that | 16:35 |
sean-k-mooney | or explicitly use a small image in that test | 16:35 |
bauzas | sean-k-mooney: I have a change that will tell us how much memory it takes https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90 | 16:35 |
* sean-k-mooney start reading back | 16:35 | |
dansmith | no, the test is old there's nothing to revert | 16:35 |
dansmith | we need to fix the test | 16:35 |
bauzas | dansmith: I think we now have an issue because we call more tests | 16:36 |
sean-k-mooney | by revert i meant your image change, but the test is clearly not written as we would like | 16:36 |
sean-k-mooney | i.e. it should not break with a bix image | 16:36 |
sean-k-mooney | *big | 16:36 |
bauzas | while it was working fine before, now the memory is too large | 16:36 |
sean-k-mooney | like downstream i know we use rhel images in some cases | 16:36 |
sean-k-mooney | and those are just under a gig too like 700mb | 16:37 |
dansmith | the images client in tempest already chunks the upload, it just does it from a fixed size buffer, so it just needs to be smarter | 16:37 |
bauzas | sean-k-mooney: again, we'll know how much memory this test uses with my new CI job | 16:37 |
sean-k-mooney | bauzas: just got back to this point in scrollback | 16:38 |
sean-k-mooney | bauzas: ack | 16:38 |
dansmith | the large image was specifically to flush out things like this, so I don't think going back to a small image gets us anything useful | 16:38 |
bauzas | anyway, this is a guess | 16:38 |
bauzas | nothing was changed in this module since Feb 22 | 16:38 |
dansmith | if anything, it makes me think we can make it larger as this might have been the OOM limit I was running into with 2G | 16:38 |
sean-k-mooney | dansmith: i agree | 16:38 |
bauzas | https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_volume.py | 16:39 |
sean-k-mooney | dansmith: the trade-off is we only have 80G of disk space in ci | 16:39 |
sean-k-mooney | so back to 2G perhaps, 20G probably not | 16:39 |
bauzas | oh wait | 16:39 |
dansmith | I understand, disk space is not the issue though | 16:39 |
bauzas | maybe we changed the image ref | 16:39 |
* bauzas verifying whether we modified the option value closely | 16:40 | |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up allow skiping compareCPU() with a workaround https://review.opendev.org/c/openstack/nova/+/870794 | 16:40 |
kashyap | Duh, forgot to commit 2 files | 16:41 |
opendevreview | Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up allow skiping compareCPU() with a workaround https://review.opendev.org/c/openstack/nova/+/870794 | 16:41 |
bauzas | https://github.com/openstack/devstack/blob/master/lib/tempest#L213-L220 | 16:45 |
bauzas | hmmmm | 16:45 |
gibi | proposed the skip for this test https://review.opendev.org/c/openstack/tempest/+/870974 I checked, no other test uses the _create_image_with_custom_property util function | 16:46 |
bauzas | 2023-01-18 11:54:33.763321 | controller | ++ lib/tempest:get_active_images:155 : '[' cirros-raw = cirros-0.5.2-x86_64-disk ']' | 16:46 |
bauzas | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-ceph-multistore/3626391/job-output.txt | 16:46 |
bauzas | we only get the cirros image | 16:46 |
bauzas | shouldn't be that large | 16:46 |
dansmith | bauzas: you understand that the ceph job uses a 1G cirros image right? | 16:48 |
bauzas | Jan 18 11:57:14.947333 np0032776548 glance-api[110229]: DEBUG glance.image_cache [None req-76dfbfc9-8d31-4a47-a529-e95d8077cfc0 tempest-AttachSCSIVolumeTestJSON-1393201534 tempest-AttachSCSIVolumeTestJSON-1393201534-project-admin] Tee'ing image '0bc12eec-2802-48e8-bedf-0931be582d19' into cache {{(pid=110229) get_caching_iter /opt/stack/glance/glance/image_cache/__init__.py:343}} | 16:49 |
bauzas | dansmith: oh sorry no, I didn't know | 16:49 |
gibi | 950MB image is downloaded in my case | 16:49 |
dansmith | bauzas: [08:31:27] <dansmith> bauzas: I recently increased the size of the image used on the ceph job from 16MB to 1G | 16:50 |
bauzas | missed that line | 16:50 |
dansmith | :) | 16:50 |
bauzas | ok, so we know that we cache 1GB in memory | 16:50 |
gibi | the test is nice as it gets the image metadata first, so in the log there is a "size": 996147200 before the data is downloaded | 16:50 |
bauzas | gibi: I'm waiting for my test job to return but I guess we'll see a size of 1GB in memory for that variable | 16:51 |
bauzas | dansmith: about your question (why do we trigger now the kill and not earlier), my guess is that we were just below the line | 16:52 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Test that OOM triggering test is skipped https://review.opendev.org/c/openstack/nova/+/870950 | 16:52 |
dansmith | bauzas: yeah, like I said, we're probably just swapping it all and never touching it again, so pressure is high and we're close to the edge :) | 16:53 |
bauzas | one way to alleviate this issue would be to make sure we run that greedy test on a specific test runner worker | 16:53 |
opendevreview | Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self https://review.opendev.org/c/openstack/nova/+/867324 | 16:54 |
bauzas | https://stestr.readthedocs.io/en/latest/MANUAL.html#test-scheduling | 16:54 |
bauzas | tempest exposes the worker configs from stestr | 16:54 |
gibi | bauzas: we just shouldn't load the whole image data in memory at once | 16:54 |
gibi | as in general image size can be way bigger than memory size | 16:54 |
* bauzas tries to understand the reason behind the cache | 16:55 | |
bauzas | gibi: that's true | 16:55 |
bauzas | caching the metadata seems ok to me | 16:55 |
bauzas | caching the data itself seems unnecessary | 16:55 |
bauzas | unless you want to compare bytes per bytes | 16:55 |
gibi | yeah metadata is bounded by the glance API | 16:55 |
gibi | image size is unbounded | 16:55 |
bauzas | but agreed you could and should compare streams and not objects | 16:56 |
bauzas | for the dataz | 16:56 |
dansmith | it's just a naive test not thinking about the world outside a 16MB image | 16:56 |
bauzas | glance team is added on the bug report | 16:56 |
dansmith | anyone using this for verification of a real cloud likely has an even larger image, | 16:56 |
dansmith | so it's clearly not okay to do this | 16:56 |
bauzas | agreed | 16:56 |
bauzas | whoami-rajat: hey, happy new year :) | 16:56 |
opendevreview | Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self https://review.opendev.org/c/openstack/nova/+/867324 | 16:57 |
opendevreview | Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self https://review.opendev.org/c/openstack/nova/+/867324 | 16:58 |
opendevreview | Merged openstack/nova master: Strictly follow placement allocation during PCI claim https://review.opendev.org/c/openstack/nova/+/855650 | 16:59 |
bauzas | hmmm, stackalytics.io is fone | 17:01 |
bauzas | gone* | 17:01 |
* bauzas wonders who to ping | 17:01 | |
gibi | what do you mean by gone? | 17:01 |
gibi | it loads for me with | 17:02 |
gibi | Last updated on 18 Jan 2023 12:44:21 UTC | 17:02 |
bauzas | I got a timeout | 17:02 |
bauzas | oh this works now | 17:02 |
gibi | gmann, dansmith: was there some policy change recently that can cause that the nova functional test locally gets 403 from placement? | 17:04 |
gibi | {'errors': [{'status': 403, 'title': 'Forbidden', 'detail': 'Access was denied to this resource.\n\n placement:allocations:list ', 'request_id': 'req-c5c03029-ba79-4e4a-8ec8-03deadb24ded'}]} | 17:04 |
dansmith | gibi: reliably? | 17:04 |
gibi | yes | 17:04 |
gibi | this is pure nova master functional test | 17:04 |
dansmith | oof, then probably | 17:04 |
gmann | gibi: yeah we changed the default and the placement fixture needed a fix, which is merged i think | 17:04 |
bauzas | gibi: yeah | 17:05 |
gmann | gibi: can you rebase the placement repo? | 17:05 |
bauzas | gibi: we switched to new RBAC policies | 17:05 |
gibi | it is just my nova repo locally | 17:05 |
gmann | this one https://review.opendev.org/c/openstack/placement/+/869525 | 17:05 |
gmann | gibi: because nova functional test use placement fixture from placement repo | 17:05 |
bauzas | which we pull as a dependency, right? | 17:06 |
gmann | yes | 17:06 |
gibi | so nova's tox.ini has | 17:06 |
gibi | openstack-placement>=1.0.0 | 17:06 |
gibi | that should pull in the latest openstack-placement | 17:07 |
gibi | in the functional venv | 17:07 |
bauzas | tox -r ? | 17:07 |
gibi | I have openstack-placement==8.0.0 | 17:07 |
gibi | in the venv | 17:07 |
gibi | I guess we merged the fix in placement but we haven't released it | 17:07 |
gmann | i do not think we released placement with that | 17:08 |
gibi | so nova's tox.ini pulls placement from pypi | 17:08 |
gmann | yeah not released yet, we should do | 17:08 |
gibi | ^^ yepp | 17:08 |
gibi | or change nova's tox.ini to pull placement from github | 17:08 |
gmann | yeah for now this can be a workaround | 17:08 |
gmann | let me push it today unless bauzas you want to do? | 17:08 |
gmann | I feel the placement fixture import from placement in the functional tests should be changed like we do for the cinder/glance fixtures, otherwise we need a new placement release for any change in there | 17:10 |
bauzas | gmann: do the push and I'll +1 | 17:11 |
gmann | bauzas: ok | 17:11 |
gibi | gmann: based on the constraint in tox.ini openstack-placement>=1.0.0 this is the first time we need such a release due to the fixture | 17:12 |
bauzas | gibi: gmann: shall we consider to pull from gh ? | 17:13 |
gibi | I would keep pypi | 17:13 |
gmann | gibi: yeah because this actually changes things like the default policy. but this can occur for any change in policy or config defaults unless we change the nova functional tests to move to those new defaults, for example | 17:14 |
gibi | if this becomes a frequent problem then I would change the placement fixture in nova | 17:14 |
gmann | placement policy needs a different token default than what nova is using for access | 17:14 |
gibi | I just confirmed that switching to gh locally in the tox.ini fixes the problem. Still I vote for releasing a new placement version and bumping the constraint in nova's tox.ini | 17:19 |
gmann | ok. yeah once released we should bump the constraint if we want to use it from pypi | 17:21 |
gmann | I will push the release | 17:21 |
gibi | yepp | 17:21 |
gibi | and thanks | 17:21 |
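The "pull from gh" workaround gibi tried locally would look roughly like the following tox.ini tweak — swapping the PyPI pin for a git URL in the functional env deps. The exact deps layout in nova's tox.ini differs; this only shows the shape of the change:

```ini
[testenv:functional]
deps =
  {[testenv]deps}
  # instead of: openstack-placement>=1.0.0
  git+https://opendev.org/openstack/placement#egg=openstack-placement
```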
bauzas | gmann: I need to disappear soon | 17:27 |
gmann | bauzas: https://review.opendev.org/c/openstack/releases/+/870989 | 17:28 |
*** gmann is now known as gmann_afk | 17:29 | |
bauzas | gmann_afk: gibi: that's where I'm struggling to consider 8.1.0 as a correct number | 17:31 |
bauzas | placement is cycle-with-rc | 17:32 |
gibi | bauzas: you mean 8.0.0 was Zed, so 8.1.0 should come from stable/zed? | 17:32 |
bauzas | gibi: yup | 17:33 |
bauzas | but we have tools for creating YAMLs | 17:33 |
gibi | yeah I'm not sure either if we can release 8.1 from placement master | 17:33 |
gibi | elodilles: ^^? | 17:33 |
bauzas | https://releases.openstack.org/reference/using.html#using-new-release-command | 17:34 |
bauzas | so, I'd say we should call out this release using the tool with "new-release antelope placement milestone" | 17:36 |
bauzas | elodilles: right? | 17:37 |
elodilles | we could just release beta from cycle-with-rc projects | 17:38 |
gibi | that would be 9.0.0 beta I assume | 17:39 |
elodilles | answering in #openstack-release | 17:39 |
elodilles | gibi: yes, 9.0.0.0b1 | 17:39 |
bauzas | elodilles: using new-release, I guess this is 'milestone' arg I presume ? | 17:39 |
*** gmann_afk is now known as gmann | 17:41 | |
gmann | i see, you are right. 8.1.0 is not right | 17:41 |
elodilles | bauzas: yes, 'milestone' generates 9.0.0.0b1 (to answer it here as well :)) | 17:43 |
bauzas | ack, gtk | 17:43 |
mnaser | is there a reason why nova generates device: [] metadata for tagged bdms only? | 17:48 |
mnaser | https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12092 | 17:49 |
mnaser | and then https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12107-L12108 | 17:49 |
mnaser | which then https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12020-L12024 | 17:49 |
mnaser | and if it's supposed to be this way™, how could one figure out what's attached to the system? | 17:50 |
bauzas | mnaser: sorry, calling it a day | 17:51 |
elodilles | bauzas gibi : fyi, tox.ini might need an update to allow to install beta releases of placement. that can be done via adding >1.0.0.0b1 instead of >1.0.0 ... if i remember correctly | 17:52 |
mnaser | bauzas: lol, was that enough nova for you? :p | 17:52 |
bauzas | mnaser: that :) | 17:52 |
bauzas | :D | 17:52 |
bauzas | one day of CI issues, and I quit. | 17:52 |
bauzas | mnaser: maybe artom could help you | 17:52 |
bauzas | artom: tl;dr: mnaser is wondering why we only generate the device metadata for tagged bdms | 17:53 |
artom | mnaser, that's by design IIRC, every other bit of information there is already visible to the guest | 17:53 |
artom | mnaser, it's only the tag that comes from the user | 17:53 |
artom | Without the tag it's pointless | 17:54 |
mnaser | volume uuid? i remember there is a place where it does come through though, i think | 17:54 |
artom | There might be something else exposed there, like the 'trusted' param for NICs | 17:54 |
artom | mnaser, that should show up as the disk serial number | 17:54 |
mnaser | ah yes | 17:54 |
gibi | elodilles: I will do the tox.ini change once the package is on pypi | 18:06 |
*** gmann is now known as gmann_afk | 18:06 | |
sean-k-mooney | mnaser: artom: with that said, we could generate the metadata if we wanted to | 18:17 |
sean-k-mooney | it just won't add extra info as artom mentioned | 18:17 |
sean-k-mooney | you can use lsblk lsusb and lspci to discover it already in the guest | 18:17 |
artom | sean-k-mooney, we could... I just don't see the point? It's already all info the guest has access to | 18:17 |
artom | With lspci, `ip`, etc | 18:18 |
sean-k-mooney | the point would just be to make that part of the metadata no longer optional | 18:18 |
sean-k-mooney | so you would not have to check whether it's available in the guest, it will be | 18:18 |
sean-k-mooney | even if you could get that info elsewhere | 18:18 |
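A small sketch of what artom and sean-k-mooney describe — matching an attached Cinder volume from inside the guest by its serial, which nova sets to the volume UUID. The udev path convention and the 20-character virtio serial truncation are assumptions about typical guests, not a nova API:

```python
import os


def find_volume_device(volume_uuid):
    """Return the /dev node whose exposed serial matches the volume UUID."""
    by_id = "/dev/disk/by-id"
    for name in os.listdir(by_id):
        # udev encodes the serial in the symlink name, e.g.
        # virtio-<first 20 chars of the uuid> or scsi-...<uuid>
        if volume_uuid[:20] in name:
            return os.path.realpath(os.path.join(by_id, name))
    return None
```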
opendevreview | Balazs Gibizer proposed openstack/nova master: Clean up after ImportModulePoisonFixture https://review.opendev.org/c/openstack/nova/+/870993 | 18:24 |
gibi | bauzas: while it is part of the functional instability it is not the case and therefore this is not the fix for it, but it is related and it is a cleanup ^^ | 18:24 |
gibi | s/it is not the case/it is not the root cause/ | 18:25 |
sean-k-mooney | hum interesting | 18:27 |
sean-k-mooney | what were we leaking | 18:27 |
sean-k-mooney | ah the filter | 18:27 |
sean-k-mooney | ok so we were leaking the filters which could increase memory usage | 18:28 |
sean-k-mooney | but it is not sharing state or causing other issues | 18:28 |
gibi | yeah | 18:28 |
sean-k-mooney | it does probably contribute to the oom issues | 18:28 |
gibi | it is part of the functional instability related to https://bugs.launchpad.net/nova/+bug/1946339 | 18:28 |
gibi | as in the recent case it is that import poison that gets called by the late eventlet | 18:29 |
gibi | so I looked at the poison and found a global state | 18:33 |
sean-k-mooney | ya so each executor has what, about 1000 tests per worker | 18:34 |
sean-k-mooney | i would guess this is adding 1-2MB per run | 18:34 |
sean-k-mooney | at most | 18:34 |
gibi | yeah it is not related to the OOM case we saw in tempest | 18:35 |
sean-k-mooney | i have not checked but i would be surprised if adding a filter allocates more than a KB of memory, it should just be a few function pointers and python objects | 18:35 |
sean-k-mooney | well didn't we also have OOM issues in the functional tests | 18:36 |
sean-k-mooney | well not OOM, python interpreter crashes | 18:36 |
sean-k-mooney | gibi: one question however | 18:38 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/870993/1/nova/tests/fixtures/nova.py#1838 | 18:39 |
sean-k-mooney | should we install the filter in setUp not init | 18:39 |
sean-k-mooney | i assume the original intent of doing it in init was to do it only once | 18:40 |
sean-k-mooney | for the lifetime of the fixture and reuse the fixture | 18:40 |
sean-k-mooney | which is not how we are doing it | 18:40 |
sean-k-mooney | so if we are going to clean up in cleanup should we set it up in setUp | 18:40 |
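The pattern sean-k-mooney is asking about, as a generic fixtures sketch — install global state in _setUp() and register the inverse with addCleanup(), rather than touching globals in __init__. The finder class below is a stand-in, not nova's actual ImportModulePoisonFixture:

```python
import sys

import fixtures


class _NoopFinder:
    """Stand-in for an import-poisoning meta path finder."""

    def find_spec(self, name, path, target=None):
        # a real poison finder would raise for banned module names;
        # returning None means "not handled here"
        return None


class PoisonFixture(fixtures.Fixture):

    def __init__(self):
        # build helpers here, but do not touch global state yet
        self.finder = _NoopFinder()

    def _setUp(self):
        # install into the global import machinery for this test only...
        sys.meta_path.insert(0, self.finder)
        # ...and undo it when the test using the fixture finishes,
        # so nothing leaks into later tests
        self.addCleanup(sys.meta_path.remove, self.finder)
```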
*** gmann_afk is now known as gmann | 18:56 | |
opendevreview | Merged openstack/nova master: Use new get_rpc_client API from oslo.messaging https://review.opendev.org/c/openstack/nova/+/869900 | 21:01 |
opendevreview | Merged openstack/nova master: FUP for the scheduler part of PCI in placement https://review.opendev.org/c/openstack/nova/+/862876 | 21:01 |