Wednesday, 2023-01-18

*** dasm is now known as dasm|off  01:24
<ierdem> Hi folks, is there any way to boot a signed image from volume? I am testing image validation; I can create a VM by using signed images on ephemeral disks, but when I try to boot from volume, it throws an exception (https://paste.openstack.org/show/blZen5ID7OIbi47TN8ib/). I am currently working on OpenStack Ussuri, and the image backend is Ceph  10:01
<bauzas> gibi: so, there are many ways of tricking stestr scheduling  10:10
<bauzas> gibi: but given we directly call tempest, which eventually calls stestr, the quickest way to change the test scheduling is to rename the test  10:10
<bauzas> https://stestr.readthedocs.io/en/latest/MANUAL.html#test-scheduling  10:11
<bauzas> "By default stestr schedules the tests by first checking if there is any historical timing data on any tests. It then sorts the tests by that timing data loops over the tests in order and adds one to each worker that it will launch. For tests without timing data, the same is done, except the tests are in alphabetical order instead of based on timing data. If a group regex is used the same algorithm is used with groups instead of individual tests."  10:11
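A toy illustration of the no-timing-data case quoted above, where tests are sorted alphabetically and dealt out round-robin to workers; this is why renaming a test (for example with an "aaa" prefix) pulls it to the front of a worker's queue. The function below is illustrative, not stestr's actual implementation.

```python
# Rough sketch of the quoted scheduling rule for tests without timing data:
# sort alphabetically, then deal one test at a time to each worker in turn.
def schedule(test_ids, worker_count):
    workers = [[] for _ in range(worker_count)]
    for i, test_id in enumerate(sorted(test_ids)):
        workers[i % worker_count].append(test_id)
    return workers

# A test renamed test_aaa_volume sorts first and is dispatched earliest:
print(schedule(["test_volume", "test_aaa_volume", "test_servers"], 2))
# [['test_aaa_volume', 'test_volume'], ['test_servers']]
```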
<bauzas> gibi: https://review.opendev.org/c/openstack/tempest/+/870913  10:20
<bauzas> gibi: could I modify https://review.opendev.org/c/openstack/nova/+/869900/ to make it Depends-On ^ ?  10:20
<gibi> let's keep https://review.opendev.org/c/openstack/nova/+/869900/ alone, as I want to land that regardless of our troubleshooting here. I think some of the other problems might go away after we remove the excessive logging with that.  10:24
<gibi> about https://review.opendev.org/c/openstack/tempest/+/870913: why are we trying to move the test to the front? I thought we wanted to move it later, or even disable it temporarily, to see if other tests trigger the same OOM behavior and hence try to establish a pattern causing the OOM  10:25
<bauzas> gibi: OK, then I'll add a DNM on nova, np  10:32
<bauzas> gibi: good question, I wanted to see whether it was due to this test or not  10:32
<bauzas> if we call it first, and if this is due to this test, it would be killed earlier, right?  10:33
<opendevreview> Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test  https://review.opendev.org/c/openstack/nova/+/870924  10:39
<kashyap> gibi: Can you have a quick look at this workaround patch when you can (for a change, all CI has passed): https://review.opendev.org/c/openstack/nova/+/870794  10:44
<kashyap> (When you get a minute, that is)  10:44
<tobias-urdin> gibi: any possibility that we can backport https://review.opendev.org/c/openstack/nova/+/838976 and its parent reproducer patch? we are currently patching that in production, as we're on newer libvirt with an older nova release (Xena right now, probably Yoga later this year)  10:48
<ierdem> Hi everyone, is there any way to boot a signed image from volume? I am testing image validation; I can create a VM by using signed images on ephemeral disks, but when I try to boot from volume, it throws an exception (https://paste.openstack.org/show/blZen5ID7OIbi47TN8ib/). I am currently working on OpenStack Ussuri, and the image backend is Ceph  11:06
<gokhanisi> hello folks, after rebooting my compute host, I can't attach my cinder volumes to instances. Nova throws "unable to lock /var/lib/nova/mnt/dgf/volume-xx for metadata change: No locks available". Full logs are in https://paste.openstack.org/show/beicZ71J17WeNwLjghKc/ What can be the reason for this problem? I am on Victoria. On this compute node I am also using GPU passthrough.  11:15
<opendevreview> Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test  https://review.opendev.org/c/openstack/nova/+/870924  11:23
<gibi> tobias-urdin: regarding https://review.opendev.org/c/openstack/nova/+/838976, I think this is technically backportable, but bauzas should know more about it as it is vGPU related  11:24
* bauzas looks  11:24
<bauzas> gibi: tobias-urdin: I already proposed the backports down to Wallaby  11:24
<sean-k-mooney> tobias-urdin: there is a backport already  11:25
<sean-k-mooney> tobias-urdin: https://review.opendev.org/c/openstack/nova/+/866156 is the Xena cherry-pick  11:25
<sean-k-mooney> tobias-urdin: we needed it for Wallaby for our downstream product, so all the patches are up for review, but we already merged them downstream at the end of the year  11:26
<gibi> ahh, I missed the backports as the topic was not set on them  11:35
<tobias-urdin> oh great, thanks!  11:39
<gibi> bauzas: regarding OOM, I can do a parallel experiment moving the test to the last position to see if the others before it trigger the OOM or not  11:39
<bauzas> gibi: sure, do it  11:40
<gibi> ack  11:40
<bauzas> gibi: I'm starting to look at the functional test races  11:40
<bauzas> but I'm hungry  11:40
<gibi> ack. I'm starting to get hungry too  11:41
<gibi> damn biology  11:41
<gibi> bauzas: this was the bug https://bugs.launchpad.net/nova/+bug/1946339 I referred to yesterday related to the functional test failures. It might or might not be related :/  11:47
<opendevreview> melanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/826755  12:38
<opendevreview> melanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted  https://review.opendev.org/c/openstack/nova/+/826756  12:38
<opendevreview> melanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS  https://review.opendev.org/c/openstack/nova/+/772273  12:38
<opendevreview> melanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/870931  12:38
<opendevreview> melanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/870932  12:38
<opendevreview> melanie witt proposed openstack/nova master: Support resize with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/870933  12:38
<opendevreview> melanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/870934  12:38
<opendevreview> melanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/870935  12:38
<opendevreview> melanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/870936  12:38
<opendevreview> melanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/870937  12:38
<opendevreview> melanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList  https://review.opendev.org/c/openstack/nova/+/870938  12:38
<opendevreview> melanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties  https://review.opendev.org/c/openstack/nova/+/870939  12:38
<opendevreview> melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw  https://review.opendev.org/c/openstack/nova/+/862416  13:00
<opendevreview> melanie witt proposed openstack/nova master: DNM test ephemeral encryption + resize: qcow2, raw  https://review.opendev.org/c/openstack/nova/+/862416  13:03
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test  https://review.opendev.org/c/openstack/nova/+/870950  13:28
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test  https://review.opendev.org/c/openstack/nova/+/870950  13:29
<gibi> bauzas: my trial is at https://review.opendev.org/c/openstack/tempest/+/870947 and https://review.opendev.org/c/openstack/nova/+/870950  13:29
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test  https://review.opendev.org/c/openstack/nova/+/870950  13:33
*** dasm|off is now known as dasm  13:34
<kashyap> gibi: Thanks for catching my sloppiness here (I actually was rephrasing it locally). Do my replies seem reasonable to you? - https://review.opendev.org/c/openstack/nova/+/870794  13:35
* kashyap goes to rework  13:35
<bauzas> gibi: hmm, TIL about dstat and memory_tracker usage from devstack  13:37
<sean-k-mooney> ya, they run as a service in all the devstack jobs  13:41
<sean-k-mooney> we used to have peak_mem_tracker too, or something like that  13:41
<sean-k-mooney> dstat more or less has all the info you want/need  13:41
<kashyap> gibi: When you get a sec, I'm wondering what else is missing in this unit-test diff to check the API is called only once - https://paste.opendev.org/show/b7CXHOkMeuuXuzQD6QtW/  13:54
<gibi> bauzas: this fresh functional failure https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html is very similar to what we discussed with melwitt in the comments of https://bugs.launchpad.net/nova/+bug/1946339  14:15
<gibi> so probably eventlets are escaping the end of the test case execution where they were born and interfering with later tests  14:16
<sean-k-mooney> parsing that statement...  14:17
<sean-k-mooney> that sounds kind of familiar  14:18
<gibi> yepp, we fixed a set of those in the past, but not all of them it seems  14:18
<sean-k-mooney> I thought we were explicitly shutting down the event loop between tests globally  14:18
<sean-k-mooney> as in via a fixture  14:19
<gibi> is there a way to do that?  14:19
<sean-k-mooney> well, yes  14:19
<sean-k-mooney> if we modify the base test case to call into eventlet  14:19
<sean-k-mooney> in test cleanup  14:19
<sean-k-mooney> I think there is a global kill, but there is also a per-greenthread kill  14:20
<sean-k-mooney> https://eventlet.net/doc/modules/greenthread.html#eventlet.greenthread.kill is the per-greenthread one  14:21
<sean-k-mooney> we can also use waitall  14:21
<sean-k-mooney> https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.waitall  14:21
<gibi> we are not pooling our eventlets  14:22
<gibi> and as you noted, greenthread.kill assumes we have access to the greenthread to kill  14:23
<sean-k-mooney> well, we do have a greenthread pool, but it's provided by oslo  14:23
<sean-k-mooney> I was just looking at the docs to see if we have a way to list the greenthreads  14:23
<gibi> when we call spawn or spawn_n we are not using a greenlet from the pool  14:23
<sean-k-mooney> ya, but I think there is a default pool that is used  14:24
<sean-k-mooney> I could be wrong  14:24
<gibi> at least I haven't come across it when I originally fixed part of this problem  14:24
<sean-k-mooney> I guess if there is one, it does not say so: https://eventlet.net/doc/basic_usage.html#eventlet.spawn  14:25
<sean-k-mooney> is there any reason not to just have one global pool?  14:25
<sean-k-mooney> gibi: I was thinking of https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.executor_thread_pool_size by the way  14:26
<sean-k-mooney> "Size of executor thread pool when executor is threading or eventlet."  14:26
<gibi> as far as I see, that is only used by oslo.messaging for creating rpc message handler threads / eventlets. but nova uses spawn and spawn_n directly outside of oslo.messaging  14:28
<gibi> we can try to pool them, but I'm not sure both spawn and spawn_n can be pooled in the same way  14:28
<gibi> as they are not creating the same entity  14:29
<sean-k-mooney> there are spawn and spawn_n functions on the pools  14:29
<sean-k-mooney> we might need a separate one from the rpc one  14:29
<sean-k-mooney> or want a separate one  14:29
<sean-k-mooney> but I think from an api point of view it should be fine  14:29
<sean-k-mooney> https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.spawn and https://eventlet.net/doc/modules/greenpool.html#eventlet.greenpool.GreenPool.spawn_n  14:30
<sean-k-mooney> hopefully we could just update it here: https://github.com/openstack/nova/blob/master/nova/utils.py#L635-L684  14:30
<sean-k-mooney> so create a module-level pool and use that, then in the tests call waitall in the base testcase cleanup  14:31
<sean-k-mooney> https://github.com/openstack/nova/blob/master/nova/test.py#L150 we currently don't have that as a cleanup function, but in setUp we can just add  14:32
<sean-k-mooney> self.addCleanup(utils.greenpool.waitall)  14:32
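A rough sketch of the idea sean-k-mooney describes: route nova's spawn/spawn_n wrappers through a module-level GreenPool so the tests can wait for stray greenthreads in cleanup. The names and pool size below are illustrative assumptions, not nova's actual code.

```python
from eventlet import greenpool

# Hypothetical module-level pool in nova/utils.py; the size is arbitrary.
_greenpool = greenpool.GreenPool(size=1000)

def spawn(func, *args, **kwargs):
    # Like eventlet.spawn, GreenPool.spawn returns a GreenThread.
    return _greenpool.spawn(func, *args, **kwargs)

def spawn_n(func, *args, **kwargs):
    # Like eventlet.spawn_n, GreenPool.spawn_n returns nothing.
    _greenpool.spawn_n(func, *args, **kwargs)

# The base test case would then register, in setUp:
#     self.addCleanup(_greenpool.waitall)
# so every test waits for the greenthreads it spawned before the next test
# starts, instead of letting them escape into later tests.
```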
<gibi> I can try to set up a way to reproduce the issue more frequently locally and see if the pooling solves it or not  14:35
<bauzas> sorry, was at the hairdresser  14:40
* bauzas scrolling up  14:40
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: Test OOM killed test  https://review.opendev.org/c/openstack/nova/+/870950  14:48
<gibi> bauzas: one result from your OOM trial is that I noticed the test in question takes a relatively long time even when it passes: tempest.api.compute.admin.test_aaa_volume.AttachSCSIVolumeTestJSON.test_attach_scsi_disk_with_config_drive [181.818420s]  14:54
<bauzas> gibi: yup, I've seen it  14:54
<bauzas> maybe we should introspect the memory size of the cached image  14:54
<bauzas> gibi: fwiw, since the UT ran successfully in the DNM patch, I looked at the n-api log and found we called it  15:20
<bauzas> gibi: while on https://834de1be955e9175dba1-6977f7378e5264bdb9ba9d1465839752.ssl.cf1.rackcdn.com/869900/6/gate/nova-ceph-multistore/f5aa5ed/controller/logs/screen-n-api.txt we were not calling it  15:20
<bauzas> so, I think the test was killed during the first glance call  15:21
<bauzas> https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_volume.py#L84-L89  15:22
<dansmith> I'm stacking a ceph devstack right now  15:49
<dansmith> so when it's done I could try running just that test and see if it behaves properly in isolation  15:49
<opendevreview> Sylvain Bauza proposed openstack/nova master: DNM: Testing the killed test  https://review.opendev.org/c/openstack/nova/+/870924  15:55
<gibi> bauzas, dansmith: https://bugs.launchpad.net/nova/+bug/2002951/comments/5 based on dstat and the tempest log, I'm pretty sure that loading the image data is using up the memory  16:21
<bauzas> gibi: I added a few lines  16:22
<dansmith> gibi: oh, is show_image() eating the whole image?  16:22
<bauzas> https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py  16:22
<dansmith> like response.content instead of response.iter_content?  16:23
<bauzas> dansmith: I asked for the image cache size in my next revision  16:23
<bauzas> gibi: that was my guess  16:23
<bauzas> hence the new rev I created  16:23
<dansmith> if so, I guess that's my bad for making the image size large, although that is exactly why I did it :)  16:23
<dansmith> oh, I see,  16:25
<gibi> dansmith: I don't see any iter_content involved  16:25
<dansmith> it's actually downloading and re-uploading the image?  16:25
<gibi> yepp  16:25
<gibi> and doing it in memory  16:25
<dansmith> riight, okay  16:25
<dansmith> so yeah, that'll have to turn into a chunked loop  16:25
<dansmith> I can take a look at that if you want, but we might want to just disable that test for the moment  16:26
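A minimal sketch of the chunked loop dansmith is suggesting, using the requests library; the URLs, file name, and chunk size are placeholders, and this is not the actual tempest client code.

```python
import requests

# Hypothetical endpoints standing in for the glance image data URLs.
image_download_url = "https://glance.example/v2/images/IMAGE_ID/file"
image_upload_url = "https://glance.example/v2/images/NEW_ID/file"
CHUNK_SIZE = 64 * 1024  # arbitrary fixed-size buffer

# Download: stream the body instead of materializing it via response.content.
with requests.get(image_download_url, stream=True) as resp:
    with open("image.tmp", "wb") as f:
        for chunk in resp.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)

# Re-upload: hand requests a file object so the body is read chunk by
# chunk, keeping peak memory bounded regardless of the image size.
with open("image.tmp", "rb") as f:
    requests.put(image_upload_url, data=f)
```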
<bauzas> dansmith: as you see, they copy the whole image in memory  16:26
<gibi> bauzas: if I'm right, L90 in your modification won't return: https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90  16:26
<bauzas> gibi: hmm?  16:26
<bauzas> https://docs.openstack.org/oslo.utils/ocata/examples/timeutils.html#using-a-stopwatch-as-a-context-manager  16:27
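For context, the StopWatch context manager from that oslo.utils page is used roughly like this; the timed body here is just a stand-in for the call being measured.

```python
from oslo_utils import timeutils

# StopWatch times the body of the with block; elapsed() returns seconds.
with timeutils.StopWatch() as watch:
    sum(range(10**6))  # stand-in for the glance call being measured
print("took %.3fs" % watch.elapsed())
```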
<gibi> bauzas: OOM will kill the process in the middle  16:27
<bauzas> gibi: no, the test is run before  16:27
<dansmith> this also means anyone using tempest with a real image (like for verification) will be eating a ton of data  16:27
<bauzas> that's still using aaa  16:27
<gibi> calling _create_image_with_custom_property will trigger the OOM  16:27
<bauzas> gibi: it wasn't the case in the first revision  16:28
<bauzas> we waited for 180 secs but eventually we didn't get a kill  16:28
<gibi> hm, maybe in that run the image fit into memory  16:28
<bauzas> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-ceph-multistore/3626391/testr_results.html  16:29
<bauzas> gibi: yeah, because the test is called first  16:29
<gibi> anyhow, dansmith: if you look into fixing it, then I will propose disabling this test in tempest with a @skip decorator, as it is probably dangerous to run in any job  16:29
<dansmith> yeah, do it  16:30
<gibi> ack  16:30
<gibi> bauzas: ahh yeah, that helps it fit  16:30
<dansmith> I'm still stacking, had to start over because horizon seems broken :/  16:30
<dansmith> 2023 is off to a *great* start  16:30
<bauzas> gibi: that said, I wonder why this test has been fine for like 10 months  16:31
<gibi> my local tox -e functional-py310 env produces strange failures (>1000 failed tests on master), so yeah, it is *great* :D  16:31
<dansmith> bauzas: I recently increased the size of the image used on the ceph job from 16MB to 1G  16:31
<dansmith> but that was in like november or so  16:31
<dansmith> so I'm pretty surprised we've been holding on this long  16:31
<dansmith> probably because that memory never gets touched again and just gets swapped out  16:31
<sean-k-mooney> the gate was pretty ok after you went on pto at the start of december  16:32
<sean-k-mooney> so I don't think it's related to using the 1G image  16:33
<dansmith> sean-k-mooney: I'm happy to leave if that's what helps  16:33
<dansmith> sean-k-mooney: this job crashing with OOM seems clearly related, as the job eats the 1G image and ... swells to 1G before it goes boom  16:33
<dansmith> s/job/test worker/  16:33
<sean-k-mooney> ok, do we know why it's only happening now, or did we get lucky before?  16:33
<gibi> maybe something else grew in memory usage recently a bit and is pushing the overall worker VM over the line  16:34
<gibi> sean-k-mooney: as bauzas showed, if you run this test earlier in the job then it still passes  16:34
<dansmith> sean-k-mooney: read the scrollback :)  16:34
<sean-k-mooney> ya, well, ok, we could revert back to 512 MB and see if it grows by 512 MB  16:34
<sean-k-mooney> I was just starting to, ya  16:34
<dansmith> sean-k-mooney: gibi identified a test that reads the whole image into memory  16:34
<bauzas> gibi: dansmith: added Tempest and glance to the bug report  16:34
<sean-k-mooney> oh, ok  16:34
<sean-k-mooney> so fix that test or revert; I assume fix the test to not do that  16:35
<sean-k-mooney> or explicitly use a small image in that test  16:35
<bauzas> sean-k-mooney: I have a change that will tell us how much memory it takes: https://review.opendev.org/c/openstack/tempest/+/870913/2/tempest/api/compute/admin/test_aaa_volume.py#90  16:35
* sean-k-mooney starts reading back  16:35
<dansmith> no, the test is old, there's nothing to revert  16:35
<dansmith> we need to fix the test  16:35
<bauzas> dansmith: I think we have an issue now because we call more tests  16:36
<sean-k-mooney> by revert I meant your image change, but the test is clearly not written as we would like  16:36
<sean-k-mooney> i.e. it should not break with a big image  16:36
<bauzas> while it was working fine before, now the memory is too large  16:36
<sean-k-mooney> like, downstream I know we use rhel images in some cases  16:36
<sean-k-mooney> and those are just under a gig too, like 700MB  16:37
<dansmith> the images client in tempest already chunks the upload, it just does it from a fixed-size buffer, so it just needs to be smarter  16:37
<bauzas> sean-k-mooney: again, we'll know how much memory this test uses with my new CI job  16:37
<sean-k-mooney> bauzas: just got back to this point in the scrollback  16:38
<sean-k-mooney> bauzas: ack  16:38
<dansmith> the large image was specifically to flush out things like this, so I don't think going back to a small image gets us anything useful  16:38
<bauzas> anyway, this is a guess  16:38
<bauzas> nothing has changed in this module since Feb 2022  16:38
<dansmith> if anything, it makes me think we can make it larger, as this might have been the OOM limit I was running into with 2G  16:38
<sean-k-mooney> dansmith: I agree  16:38
<bauzas> https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_volume.py  16:39
<sean-k-mooney> dansmith: the trade-off is we only have 80G of disk space in ci  16:39
<sean-k-mooney> so back to 2G perhaps, 20G probably not  16:39
<bauzas> oh wait  16:39
<dansmith> I understand, disk space is not the issue though  16:39
<bauzas> maybe we changed the image ref  16:39
* bauzas verifying whether we modified the option value recently  16:40
<opendevreview> Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up allow skiping compareCPU() with a workaround  https://review.opendev.org/c/openstack/nova/+/870794  16:40
<kashyap> Duh, forgot to commit 2 files  16:41
<opendevreview> Kashyap Chamarthy proposed openstack/nova master: libvirt: At start-up allow skiping compareCPU() with a workaround  https://review.opendev.org/c/openstack/nova/+/870794  16:41
<bauzas> https://github.com/openstack/devstack/blob/master/lib/tempest#L213-L220  16:45
<bauzas> hmmmm  16:45
<gibi> proposed the skip for this test: https://review.opendev.org/c/openstack/tempest/+/870974 I checked, no other test uses the _create_image_with_custom_property util function  16:46
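The skip gibi proposes presumably looks something like the sketch below, using tempest's skip_because decorator tied to the launchpad bug; the exact form in review 870974 may differ, and the base class here is simplified.

```python
import unittest

from tempest.lib import decorators


class AttachSCSIVolumeTestJSON(unittest.TestCase):

    # Skips the test and records the bug motivating the skip.
    @decorators.skip_because(bug="2002951")
    def test_attach_scsi_disk_with_config_drive(self):
        pass
```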
<bauzas> 2023-01-18 11:54:33.763321 | controller | ++ lib/tempest:get_active_images:155 : '[' cirros-raw = cirros-0.5.2-x86_64-disk ']'  16:46
<bauzas> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_362/870924/2/check/nova-ceph-multistore/3626391/job-output.txt  16:46
<bauzas> we only get the cirros image  16:46
<bauzas> shouldn't be that large  16:46
<dansmith> bauzas: you understand that the ceph job uses a 1G cirros image, right?  16:48
<bauzas> Jan 18 11:57:14.947333 np0032776548 glance-api[110229]: DEBUG glance.image_cache [None req-76dfbfc9-8d31-4a47-a529-e95d8077cfc0 tempest-AttachSCSIVolumeTestJSON-1393201534 tempest-AttachSCSIVolumeTestJSON-1393201534-project-admin] Tee'ing image '0bc12eec-2802-48e8-bedf-0931be582d19' into cache {{(pid=110229) get_caching_iter /opt/stack/glance/glance/image_cache/__init__.py:343}}  16:49
<bauzas> dansmith: oh sorry, no, I didn't know  16:49
<gibi> a 950MB image is downloaded in my case  16:49
<dansmith> bauzas: [08:31:27] <dansmith> bauzas: I recently increased the size of the image used on the ceph job from 16MB to 1G  16:50
<bauzas> missed that line  16:50
<dansmith> :)  16:50
<bauzas> ok, so we know that we cache 1GB in memory  16:50
<gibi> the test is nice in that it gets the image metadata first, so in the log there is a "size": 996147200 before the data is downloaded  16:50
<bauzas> gibi: I'm waiting for my test job to return, but I guess we'll see a size of 1GB in memory for that variable  16:51
<bauzas> dansmith: about your question (why do we trigger the kill now and not earlier), my guess is that we were just below the line  16:52
<opendevreview> Balazs Gibizer proposed openstack/nova master: DNM: Test that OOM triggering test is skipped  https://review.opendev.org/c/openstack/nova/+/870950  16:52
<dansmith> bauzas: yeah, like I said, we're probably just swapping it all and never touching it again, so pressure is high and we're close to the edge :)  16:53
<bauzas> one way to alleviate this issue would be to make sure we run that greedy test in a specific test runner worker  16:53
<opendevreview> Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self  https://review.opendev.org/c/openstack/nova/+/867324  16:54
<bauzas> https://stestr.readthedocs.io/en/latest/MANUAL.html#test-scheduling  16:54
<bauzas> tempest exposes the worker configs from stestr  16:54
<gibi> bauzas: we just shouldn't load the whole image data into memory at once  16:54
<gibi> as in general the image size can be way bigger than the memory size  16:54
* bauzas tries to understand the reason behind the cache  16:55
<bauzas> gibi: that's true  16:55
<bauzas> caching the metadata seems ok to me  16:55
<bauzas> caching the data itself seems unnecessary  16:55
<bauzas> unless you want to compare byte per byte  16:55
<gibi> yeah, metadata is bounded by the glance API  16:55
<gibi> image size is unbounded  16:55
<bauzas> but agreed, you could and should compare streams and not objects  16:56
<bauzas> for the dataz  16:56
<dansmith> it's just a naive test not thinking about the world outside a 16MB image  16:56
<bauzas> the glance team is added on the bug report  16:56
<dansmith> anyone using this for verification of a real cloud likely has an even larger image,  16:56
<dansmith> so it's clearly not okay to do this  16:56
<bauzas> agreed  16:56
<bauzas> whoami-rajat: hey, happy new year :)  16:56
<opendevreview> Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self  https://review.opendev.org/c/openstack/nova/+/867324  16:57
<opendevreview> Aaron S proposed openstack/nova master: Add further workaround features for qemu_monitor_announce_self  https://review.opendev.org/c/openstack/nova/+/867324  16:58
<opendevreview> Merged openstack/nova master: Strictly follow placement allocation during PCI claim  https://review.opendev.org/c/openstack/nova/+/855650  16:59
<bauzas> hmmm, stackalytics.io is gone  17:01
* bauzas wonders who to ping  17:01
<gibi> what do you mean by gone?  17:01
<gibi> it loads for me with  17:02
<gibi> "Last updated on 18 Jan 2023 12:44:21 UTC"  17:02
<bauzas> I got a timeout  17:02
<bauzas> oh, this works now  17:02
<gibi> gmann, dansmith: was there some policy change recently that could cause the nova functional tests to get a 403 from placement locally?  17:04
<gibi> {'errors': [{'status': 403, 'title': 'Forbidden', 'detail': 'Access was denied to this resource.\n\n placement:allocations:list  ', 'request_id': 'req-c5c03029-ba79-4e4a-8ec8-03deadb24ded'}]}  17:04
<dansmith> gibi: reliably?  17:04
<gibi> yes  17:04
<gibi> this is pure nova master functional tests  17:04
<dansmith> oof, then probably  17:04
<gmann> gibi: yeah, we changed the default, and the placement fixture needed a fix, which is merged I think  17:04
<bauzas> gibi: yeah  17:05
<gmann> gibi: can you rebase the placement repo?  17:05
<bauzas> gibi: we switched to the new RBAC policies  17:05
<gibi> it is just my nova repo locally  17:05
<gmann> this one: https://review.opendev.org/c/openstack/placement/+/869525  17:05
<gmann> gibi: because the nova functional tests use the placement fixture from the placement repo  17:05
<bauzas> which we pull as a dependency, right?  17:06
<gmann> yes  17:06
<gibi> so nova's tox.ini has  17:06
<gibi> openstack-placement>=1.0.0  17:06
<gibi> that should pull in the latest openstack-placement  17:07
<gibi> in the functional venv  17:07
<bauzas> tox -r ?  17:07
<gibi> I have openstack-placement==8.0.0  17:07
<gibi> in the venv  17:07
<gibi> I guess we merged the fix in placement but we haven't released it  17:07
<gmann> I do not think we released placement with that  17:08
<gibi> so nova's tox.ini pulls placement from pypi  17:08
<gmann> yeah, not released yet, we should do that  17:08
<gibi> ^^ yepp  17:08
<gibi> or change nova's tox.ini to pull placement from github  17:08
<gmann> yeah, for now this can be a workaround  17:08
<gmann> let me push it today, unless bauzas you want to do it?  17:08
<gmann> I feel the placement fixture import from placement in the functional tests should be changed like we do for the cinder/glance fixtures, otherwise we need a new placement release for any change in there  17:10
<bauzas> gmann: do the push and I'll +1  17:11
<gmann> bauzas: ok  17:11
<gibi> gmann: based on the constraint in tox.ini, openstack-placement>=1.0.0, this is the first time we need such a release due to the fixture  17:12
<bauzas> gibi: gmann: shall we consider pulling from gh?  17:13
<gibi> I would keep pypi  17:13
<gmann> gibi: yeah, because this actually changes things like the default policy. but this can occur on any change in the policy or config defaults, unless we change the nova functional tests to move to those new defaults. for example  17:14
<gibi> if this becomes a frequent problem then I would change the placement fixture in nova  17:14
<gmann> placement policy needs a different token default than what nova is using for access  17:14
<gibi> I just confirmed that switching to gh locally in the tox.ini fixes the problem. Still, I vote for releasing a new placement version and bumping the constraint in nova's tox.ini  17:19
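The local workaround gibi confirmed would look roughly like this in nova's tox.ini; the dependency line is a sketch (the opendev URL form and env name are assumptions), not the actual change.

```ini
[testenv:functional]
deps =
    # was: openstack-placement>=1.0.0, which resolves to 8.0.0 on PyPI
    openstack-placement @ git+https://opendev.org/openstack/placement@master
```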
<gmann> ok. yeah, once released we should bump the constraint if we want to use it from pypi  17:21
<gmann> I will push the release  17:21
<gibi> yepp  17:21
<gibi> and thanks  17:21
<bauzas> gmann: I need to disappear soon  17:27
<gmann> bauzas: https://review.opendev.org/c/openstack/releases/+/870989  17:28
*** gmann is now known as gmann_afk  17:29
<bauzas> gmann_afk: gibi: that's where I'm struggling to consider 8.1.0 a correct number  17:31
<bauzas> placement is cycle-with-rc  17:32
<gibi> bauzas: you mean 8.0.0 was Zed, so 8.1.0 should come from stable/zed?  17:32
<bauzas> gibi: yup  17:33
<bauzas> but we have tools for creating the YAMLs  17:33
<gibi> yeah, I'm not sure either if we can release 8.1 from placement master  17:33
<gibi> elodilles: ^^?  17:33
<bauzas> https://releases.openstack.org/reference/using.html#using-new-release-command  17:34
<bauzas> so, I'd say we should call out this release using the tool with "new-release antelope placement milestone"  17:36
<bauzas> elodilles: right?  17:37
<elodilles> we could just release a beta from cycle-with-rc projects  17:38
<gibi> that would be a 9.0.0 beta, I assume  17:39
<elodilles> answering in #openstack-release  17:39
<elodilles> gibi: yes, 9.0.0.0b1  17:39
<bauzas> elodilles: using new-release, I guess this is the 'milestone' arg, I presume?  17:39
*** gmann_afk is now known as gmann  17:41
<gmann> I see, you are right. 8.1.0 is not right  17:41
<elodilles> bauzas: yes, 'milestone' generates 9.0.0.0b1 (to answer it here as well :))  17:43
<bauzas> ack, gtk  17:43
<mnaser> is there a reason why nova generates device: [] metadata for tagged bdms only?  17:48
<mnaser> https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12092  17:49
<mnaser> and then https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12107-L12108  17:49
<mnaser> which then https://github.com/openstack/nova/blob/702dfd33bb93b7cee8c76e117e26bfe56f637460/nova/virt/libvirt/driver.py#L12020-L12024  17:49
<mnaser> and if it's supposed to be this way™, how could one figure out what's attached to the system?  17:50
<bauzas> mnaser: sorry, calling it a day  17:51
<elodilles> bauzas, gibi: fyi, tox.ini might need an update to allow installing beta releases of placement. that can be done by adding >1.0.0.0b1 instead of >1.0.0 ... if I remember correctly  17:52
<mnaser> bauzas: lol, was that enough nova for you? :p  17:52
<bauzas> mnaser: that :)  17:52
<bauzas> :D  17:52
<bauzas> one day of CI issues, and I quit.  17:52
<bauzas> mnaser: maybe artom could help you  17:52
<bauzas> artom: tl;dr: mnaser is wondering why we only generate the device metadata for tagged bdms  17:53
<artom> mnaser, that's by design IIRC; every other bit of information there is already visible to the guest  17:53
<artom> mnaser, it's only the tag that comes from the user  17:53
<artom> Without the tag it's pointless  17:54
<mnaser> volume uuid? I remember there is a place where it does come from, though, I think  17:54
<artom> There might be something else exposed there, like the 'trusted' param for NICs  17:54
<artom> mnaser, that should show up as the disk serial number  17:54
<mnaser> ah yes  17:54
<gibi> elodilles: I will do the tox.ini change once the package is on pypi  18:06
*** gmann is now known as gmann_afk  18:06
<sean-k-mooney> mnaser: artom: with that said, we could generate the metadata if we wanted to  18:17
<sean-k-mooney> it just won't add extra info, as artom mentioned  18:17
<sean-k-mooney> you can use lsblk, lsusb and lspci to discover it already in the guest  18:17
<artom> sean-k-mooney, we could... I just don't see the point? It's already all info the guest has access to  18:17
<artom> With lspci, `ip`, etc  18:18
<sean-k-mooney> the point would just be to make that part of the metadata no longer optional  18:18
<sean-k-mooney> so you would not have to check if it's available or not in the guest; it will be  18:18
<sean-k-mooney> even if you could get that info elsewhere  18:18
<opendevreview> Balazs Gibizer proposed openstack/nova master: Clean up after ImportModulePoisonFixture  https://review.opendev.org/c/openstack/nova/+/870993  18:24
<gibi> bauzas: while it is part of the functional instability, it is not the root cause and therefore this is not the fix for it, but it is related and it is a cleanup ^^  18:24
<sean-k-mooney> hum, interesting  18:27
<sean-k-mooney> what were we leaking?  18:27
<sean-k-mooney> ah, the filter  18:27
<sean-k-mooney> ok, so we were leaking the filters, which could increase memory usage  18:28
<sean-k-mooney> but it is not sharing state or causing other issues  18:28
<gibi> yeah  18:28
<sean-k-mooney> it probably does contribute to the oom issues  18:28
<gibi> it is part of the functional instability related to https://bugs.launchpad.net/nova/+bug/1946339  18:28
<gibi> as in the recent case it is the import poison that gets called by the late eventlet  18:29
<gibi> so I looked at the poison and found a global state  18:33
<sean-k-mooney> ya, so each executor has, what, about 1000 tests per worker  18:34
<sean-k-mooney> I would guess this is adding 1-2MB per run  18:34
<sean-k-mooney> at most  18:34
<gibi> yeah, it is not related to the OOM case we saw in tempest  18:35
<sean-k-mooney> I have not checked, but I would be surprised if adding a filter allocates more than a KB of memory; it should just be a few function pointers and python objects  18:35
<sean-k-mooney> well, didn't we also have OOM issues in the functional tests?  18:36
<sean-k-mooney> well, not OOM, python interpreter crashes  18:36
<sean-k-mooney> gibi: one question however  18:38
<sean-k-mooney> https://review.opendev.org/c/openstack/nova/+/870993/1/nova/tests/fixtures/nova.py#1838  18:39
<sean-k-mooney> should we install the filter in setUp, not __init__?  18:39
<sean-k-mooney> I assume the original intent of doing it in __init__ was to do it only once  18:40
<sean-k-mooney> for the lifetime of the fixture, and to reuse the fixture  18:40
<sean-k-mooney> which is not how we are doing it  18:40
<sean-k-mooney> so if we are going to clean up in cleanup, should we set it up in setUp?  18:40
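The install-in-setUp, remove-in-cleanup pattern being discussed, sketched with the fixtures library against a generic piece of global state (warnings filters are used here as a stand-in; the actual ImportModulePoisonFixture deals with import poisoning and may differ).

```python
import warnings

import fixtures


class WarningsFilterFixture(fixtures.Fixture):
    """Illustrative fixture that installs a global warnings filter per test.

    Doing the installation in _setUp rather than __init__ means each use
    of the fixture installs its own filter, and the paired cleanup removes
    it again, so nothing leaks into later tests.
    """

    def _setUp(self):
        # Snapshot the current filters and restore them after the test.
        original = warnings.filters[:]
        self.addCleanup(setattr, warnings, 'filters', original)
        warnings.simplefilter('error', DeprecationWarning)
```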
*** gmann_afk is now known as gmann  18:56
<opendevreview> Merged openstack/nova master: Use new get_rpc_client API from oslo.messaging  https://review.opendev.org/c/openstack/nova/+/869900  21:01
<opendevreview> Merged openstack/nova master: FUP for the scheduler part of PCI in placement  https://review.opendev.org/c/openstack/nova/+/862876  21:01
