Tuesday, 2023-02-07

*** dasm|rover is now known as dasm|out00:55
opendevreviewMerged openstack/nova master: Protect against a deleted node id file  https://review.opendev.org/c/openstack/nova/+/87220402:17
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/82675407:56
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/82675507:56
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted  https://review.opendev.org/c/openstack/nova/+/82675607:56
opendevreviewmelanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/87093207:56
opendevreviewmelanie witt proposed openstack/nova master: Support resize with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093307:56
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/87093407:56
opendevreviewmelanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/87093507:56
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/87093607:56
opendevreviewmelanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093707:56
opendevreviewmelanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList  https://review.opendev.org/c/openstack/nova/+/87093807:56
opendevreviewmelanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties  https://review.opendev.org/c/openstack/nova/+/87093907:56
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS  https://review.opendev.org/c/openstack/nova/+/77227307:56
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/82675408:13
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/82675508:13
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted  https://review.opendev.org/c/openstack/nova/+/82675608:13
opendevreviewmelanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/87093208:13
opendevreviewmelanie witt proposed openstack/nova master: Support resize with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093308:13
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/87093408:13
opendevreviewmelanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/87093508:13
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/87093608:13
opendevreviewmelanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/87093708:13
opendevreviewmelanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList  https://review.opendev.org/c/openstack/nova/+/87093808:13
opendevreviewmelanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties  https://review.opendev.org/c/openstack/nova/+/87093908:13
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS  https://review.opendev.org/c/openstack/nova/+/77227308:13
bauzastobias-urdin: excellent catch on the RPC pin alias, many thanks09:04
bauzasit would have seriously broken our users if we had released it 09:04
bauzasyet another strange DB issue https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html09:27
bauzasgibi: I'm writing the etherpad 09:27
bauzasand I'm reporting a new gate failure09:28
gibibauzas: the failure you linked is probably https://bugs.launchpad.net/nova/+bug/194633909:29
gibiso no need for a new bug report09:29
bauzasgibi: seems unrelated09:30
bauzasI get a DB issue due to a missing table09:30
bauzasoh09:31
gibiboth fail with 09:31
gibisqlite3.OperationalError: no such table: instance_faults09:31
gibiafter a message timeout09:31
bauzasyeah, not the same table but quite the same behaviour09:31
bauzaswell, the bug report title is too specific09:32
bauzasgibi: agreed, seems the same09:33
bauzasgibi: https://etherpad.opendev.org/p/nova-ci-failures09:47
bauzasdamn shit, opensearch urls aren't shareable09:49
gibibauzas: feel free to update the title of https://bugs.launchpad.net/nova/+bug/194633909:52
bauzasgibi: I'm not good at naming09:52
bauzasand I'm not sure of the rootcause09:53
bauzasI guess this is due to long-running threads 09:53
bauzasso, due to the length it takes, then we delete the DB09:53
gibibauzas: changed the title then09:54
bauzasand so, when the thread stops, then we try to lookup the table which no longer exists09:54
bauzasamirite on the root cause ?09:54
gibibauzas: one of the root causes is described in the commit message here https://review.opendev.org/c/openstack/nova/+/81403609:54
gibiwhat we need to figure out is the new sequence of events that leads to a similar leak09:56
bauzasyeah ok, so we agree09:56
bauzaswe have spawned greenlets09:56
bauzasthat finish after the db is cleaned up09:56
bauzaswe should hold the test until those greenlets finish09:56
gibi1) we need to figure out which eventlet leaks 2) then we need to figure out how that leaked eventlet affects the later test case, i.e. where is the global variable that supports the leak 3) then we can figure out how to fix it09:58
tobias-urdinbauzas: no worries, funny coincidence that i was digging around in the RPC layer and saw it :)09:59
gibiin the fixed case an RPC-handling eventlet was leaked; after 60 sec it woke up due to a timeout and used nova.rpc.get_versioned_notifier() to get a fresh notifier to the actually running test case09:59
opendevreviewJorge San Emeterio proposed openstack/nova master: Dividing global privsep profile  https://review.opendev.org/c/openstack/nova/+/87172910:00
gibithe fix there was to ignore notifications if they are coming from a test case that is different from the currently running one10:00
bauzasgibi: grabbing a coffee, this is a hard issue to dig into10:03
gibibauzas: ack I will add a logsearch query for it as soon as I have one10:04
bauzasgibi: thanks, I was just doing it but I'm not an expert with the tool10:04
bauzasI'm very disappointed we can no longer share our ex-logstack urls10:04
bauzaslogstash*10:04
gibithe SQL cursor one https://bugs.launchpad.net/nova/+bug/2002782 is not that frequent, we hit it 9 times in 20 days so I would ignore it for now10:05
bauzasnot very handy for finding how much we're doomed10:05
bauzasgibi: yup, I was about to tell you10:05
bauzasgibi: that's why I wanted to add the opensearch one10:05
gibithat 1525 hits in 7 days seems way too much for https://bugs.launchpad.net/nova/+bug/1946339 I suspect some false positives there10:09
bauzasprobably10:10
bauzasgibi: fancy to share your logsearch config files ?10:11
gibihttps://github.com/gibizer/zuul-log-search-config10:12
gibiso we have sort of false positives. For example this https://zuul.opendev.org/t/openstack/build/31e9de9e4a574df6a2f45546927954fe/log/job-output.txt#23436  partially reproduced https://bugs.launchpad.net/nova/+bug/1946339 but in this case the test case did not fail due to the missing DB table, just the stack trace was logged. 10:14
gibiso we probably have a lot of hits that have the stack trace but no failed tests10:15
bauzasgibi: I can amend the opensearch query to ensure the outcome is FAILURE10:16
gibiyepp, this is an example of a passed tox run with the missing DB table test case https://zuul.opendev.org/t/openstack/build/f389ba6c4d9f4a04a5b6f09e253d864b/log/job-output.txt#2350110:16
bauzas191 hits :)10:17
gibis/missing DB table test case/missing DB table stack trace/10:17
bauzasin the last 7 days10:17
bauzasgibi: sorry for this dumb question but I wanna rush10:18
bauzasgibi: how to use the configs from the other repo into the main one ?10:18
bauzasjust all the files ?10:18
bauzasor just checking out some of them ?10:18
gibibauzas: what I do is: 1) create a venv and install the tool with pip install git+http://github.com/gibizer/zuul-log-search 2) next to the .venv clone the config repo 10:19
bauzasdid 1)10:20
bauzasdid 2)10:20
gibiIt uses .logsearch.conf.d/ in the current directory if exists. Otherwise, uses $XDG_CONFIG_HOME/logsearch/ if XDG_CONFIG_HOME is defined. Otherwise, uses ~/.config/logsearch/.10:20
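The lookup order gibi quotes above can be sketched in a few lines of Python. This is only an illustration of the described precedence, not the tool's actual code: the function name `find_config_dir` and its parameters are made up here, and treating an empty XDG_CONFIG_HOME as "not defined" is an assumption.

```python
import os
from pathlib import Path


def find_config_dir(cwd=None, env=None):
    """Pick a config directory using the precedence quoted in the chat.

    Hypothetical helper for illustration; the real tool may differ.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    env = os.environ if env is None else env

    # 1) A .logsearch.conf.d/ directory in the current directory wins.
    local = cwd / ".logsearch.conf.d"
    if local.is_dir():
        return local

    # 2) Otherwise use $XDG_CONFIG_HOME/logsearch/ when the variable is set
    #    (assumption: an empty value counts as unset).
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg) / "logsearch"

    # 3) Fall back to ~/.config/logsearch/.
    return Path.home() / ".config" / "logsearch"
```

With this precedence, symlinking the cloned config repo to `.logsearch.conf.d` in the working directory, as bauzas does a few lines later, is enough for it to be picked up first.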
bauzasyeah so mv the whole dir ?10:20
bauzasthat was my question10:20
bauzasI see a config subdir in the zuul-log-search10:21
bauzasbut it seems unused10:21
gibihttps://paste.opendev.org/show/bTV1aUuajhb3uPYNb8Mp/10:21
gibisorry, so create the .venv in the cloned config repo.10:22
bauzasI see10:22
bauzasor ln -sf this config dir 10:23
bauzaswhich is what I'll be using10:23
* bauzas usually creates build venvs in the projects repo10:23
gibiack10:24
bauzasyay, that works10:24
gibiI will go and collect other frequent gate failures based on the query logsearch build --project openstack/nova --voting --pipeline gate --result FAILURE --branch master --days 710:28
bauzasgibi: iiuc, Builds with matching logs 160/162 means that out of 162 job runs with FAILURE, 160 of them matched the query I asked for ?10:29
bauzasso, 98% of them10:29
gibiwe have some gate runs which are TIMED_OUT too logsearch build --project openstack/nova --voting --pipeline gate --result TIMED_OUT --branch master --days 7 I think this is what dansmith mentioned yesterday10:30
gibibauzas: yes10:30
bauzasgibi: maybe the query I make is too large, as you mentioned10:30
bauzasrequest was 'sqlite3.OperationalError: no such table: instance_faults'10:30
gibibauzas: that will pick up the cases when you see the stack trace without that killing the test but the job failed for another reason10:32
gibibut we need to live with it10:32
bauzasyup10:32
bauzasI think we now have enough to work with10:32
bauzasI'll try to do this digging thing10:33
gibiack10:33
opendevreviewMaxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted  https://review.opendev.org/c/openstack/nova/+/87238510:34
bauzasI tried to ask opensearch to give me the occurrences down to 30 days10:34
bauzasI'll try to see whether it started to reappear at some point in time10:34
bauzasmmm, interesting10:35
bauzasgrabbing occurrences for the last 2 months10:35
bauzasgibi: https://imgur.com/a/wOeLcUf10:36
gibibauzas: we have limited log storage10:37
bauzasit started recently10:37
bauzasless than one month of storage ?10:37
bauzasor more ?10:37
gibiwith the old logstash it was about a month10:37
gibiI don't know about the new one10:37
bauzasgibi: with old logstash, I was sure it was a month10:37
bauzasanyway, if so, let's start to find the regression by other way10:38
bauzasand I suspect this can't be reproduced locally10:38
bauzasor I would need to slow down my laptop10:39
dvo-plvHello, <sean-k-mooney>10:50
dvo-plv I would like to continue our conversation, which we had on Friday10:50
dvo-plvWe talked about packed_ring option10:50
gibibauzas: yeah one thing you can try is to slow down things and increase the frequency of the test case that failed by duplicating it many times10:51
dvo-plvI would like to discuss the scheduler. 10:51
dvo-plvThe situation is when the user did not ask for the COMPUTE_NET_VIRTIO_PACKED trait, but we need to handle migration in some way. I found that the scheduler has an ALL_REQUEST_FILTERS array with different filters. The accelerators_filter caught my eye. I suggest implementing packed_ring filtering in the same way as in this method. This also gives us the ability to avoid the situation where a user wants to start a VM on a node where this feature is unavailable10:51
sean-k-mooneydvo-plv: good thinking but that would be the legacy approach10:53
sean-k-mooneydvo-plv: my counter proposal is this. when a vm is spawned on a host, if it supports COMPUTE_NET_VIRTIO_PACKED set a flag in the instance_system_metadata to record that. then instead of adding a post placement filter add a pre placement filter here https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py10:54
sean-k-mooneydvo-plv:  unless this is the accelerators filter you meant https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L260-L27310:55
sean-k-mooneyif so then yes it would be very similar to that10:55
sean-k-mooneywe would either check for an extra_spec and add the trait in an identical way10:55
sean-k-mooneyor check the instance_system_metadata for the flag.10:56
sean-k-mooneythe former would take effect when booting a vm that explicitly requests this, the latter for any vm that was spawned on a host with this capability10:56
sean-k-mooneyone of the main problems with the instance_system_metadata approach is im not sure the request_spec has that field10:58
sean-k-mooneythe approach we take really comes down to one choice. does the packed ring format need to be opt-in or automatic10:58
sean-k-mooneyif its opt-in via a flavor/image property then the prefilter is trivial10:59
sean-k-mooneythe request spec has both the image properties and flavor extra specs so you can just check them and add the required trait as the accelerator filter does10:59
sean-k-mooneylooking at https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py the instance_system_metadata is not part of the request spec currently11:01
sean-k-mooneyso to take that approach we would have to modify the request spec which im not sure is the right thing to do here.11:01
sean-k-mooneythe request spec and instance_system_metadata live in different DBs (api vs cell db)11:02
bauzasgibi: I chose the stestr approach of --until-failure11:03
dvo-plvI checked the instance_system_metadata table and to me it looks like we would mix different OpenStack layers (instance and host), because there is no instance data of that type there; it holds some image, project, and user info. And I did not find how this table links with the request_spec that we create in the scheduler11:04
sean-k-mooneydvo-plv: so based on that i would suggest we take the opt in/out approach and use flavor/image properties11:04
gibibauzas: that is independent from increasing the chance of catching it by increasing the number of test case to run that we know can fail due to the issue. you can do both11:05
sean-k-mooneydvo-plv: instance_system_metadata is a generic key value store for storing internal information about the instance. such as the embedded image properties11:05
sean-k-mooneyand its not accessible to the scheduler generally 11:05
bauzasgibi: we know that the issue is not on a single test11:06
bauzasso while the testrunner runs, I'm looking at every single failure to see the stacktrace and find a pattern11:06
gibibauzas: yes, but we can grab a list of test cases run by a failed test worker. That list contains both the test case that leaked and the test case that failed due to the leak11:06
opendevreviewMerged openstack/nova master: Fix 6.2 compute RPC version alias  https://review.opendev.org/c/openstack/nova/+/87280411:06
gibibauzas: we know what is the latter11:06
sean-k-mooneydvo-plv: what you would actually need to do is have the conductor populate a field on the request spec when you do a live migration. i feel like that approach is more complex than required11:07
gibibauzas: so we can run the same testcase list11:07
gibibauzas: as we know it contains both11:07
bauzasgibi: I see your proposal11:07
gibibauzas: then we can increase the chance by adding more test cases that is in the latter category11:07
bauzasI have the subunits11:07
bauzasso I can generate a list11:07
bauzasof failing tests11:07
bauzasand duplicate that list11:08
sean-k-mooneydvo-plv: if we were to leverage the instance_system_metadata we would likely need to extend the Destination field to have additional trait requests or something like that https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L109311:09
gibioriginally (in 2021) this way I was able to reproduce https://bugs.launchpad.net/nova/+bug/1946339 but I tried this a couple of weeks ago again and was not able to reproduce the current occurrence after a couple of hours of --until-failure run11:09
sean-k-mooneydvo-plv: the destination object is constructed here https://github.com/openstack/nova/blob/1c46c4e9e5ba4b84816f5cadad0674f3a773e739/nova/conductor/tasks/live_migrate.py#L6411:10
sean-k-mooneydvo-plv: but as i said this more complex approach is only relevant if we wanted to automatically enable this functionality11:11
sean-k-mooneywell technically its created here https://github.com/openstack/nova/blob/1c46c4e9e5ba4b84816f5cadad0674f3a773e739/nova/conductor/manager.py#L47011:12
sean-k-mooneythis only matters for the live migration case as the feature can be renegotiated on cold migration or other move operations11:13
bauzas(functional-py38) [sbauza@sbauza nova]$ stestr load /tmp/zuul-logs.Edb6Rp/testrepository.subunit --subunit | subunit-filter -F | subunit-ls11:14
bauzasnova.tests.functional.libvirt.test_vtpm.VTPMServersTest.test_create_server11:14
bauzasgibi: I'm able to get the failing test11:14
bauzasso given I'm looking at all the fetched logsearch subunits, I could extrapolate a list of usual suspects11:14
dvo-plvSo if you think that this way is very complex and can make the code harder to follow, maybe we should rather use the existing approach (create a new filter like accelerators_filter and check if the user requested the packed option), which already exists and is easy to scale. I already checked this approach; it also forbids migrating a VM to a host without packed ring support and also starting a VM on a host without packed ring support11:15
sean-k-mooneydvo-plv: yep that is the simplest approach. we could in a future release enable it by default and add a migration mechanism too if desired.11:16
sean-k-mooneyeither by turning it on once we raise our min QEMU/Libvirt versions to one that means it will always be available11:17
sean-k-mooneyor by automatically adding the image property if not provided11:17
sean-k-mooneyso taking the explicit approach now does not prevent us making it implicit in the future11:18
sean-k-mooneymaking it automatic now front loads a bunch of complexity11:18
gibibauzas: I tried that too. I fetched multiple failed worker test case lists and intersected them; it resulted in an empty list. probably we have multiple test cases that leak11:21
gibibauzas: but you can be lucky11:22
bauzasgibi: I have the uuids from logsearch but I don't have the subunit streams11:23
bauzasgibi: any way to pull them with logsearch ? maybe using the --file param ?11:24
gibiI pull them manually11:24
gibiI needed only about 5 to see that there is no one common test case in the list11:24
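The intersection gibi describes can be sketched as below. It is a minimal illustration, assuming each failed worker's run has already been turned into a list of test ids (for example with the `subunit-ls` pipeline bauzas shows earlier); the function name and sample ids are made up.

```python
def common_tests(worker_runs):
    """Return the test ids that appear in every failed worker's list.

    An empty result suggests there is no single leaking test case shared
    by all the failed runs.
    """
    runs = [set(run) for run in worker_runs]
    if not runs:
        return set()
    return set.intersection(*runs)


# Made-up test ids, for illustration only.
example_runs = [
    ["test_a", "test_b", "test_c"],
    ["test_b", "test_c", "test_d"],
    ["test_c", "test_e"],
]
shared = common_tests(example_runs)
```

Here `shared` holds only the id present in all three lists; with real data from the gate, gibi reports the result was already empty after about five runs.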
bauzasI can hack download.sh11:25
gibibauzas: I added 11:27
gibihttps://bugs.launchpad.net/tempest/+bug/199989311:28
gibito the etherpad11:28
bauzasgibi: I'm downloading the subunit file from each of the 159 failing runs11:37
gibithat is maybe overkill as I said 5 examples were enough for me to end up with an empty intersection11:37
bauzaswe will see11:37
bauzasand then I'll ask stestr run to run 10 times each of the failing tests11:38
sahid_o/ quick question, regarding instance.props, we do copy image metadata to the instance right? I don't remember11:43
gibibauzas: added another bug to the etherpad https://bugs.launchpad.net/nova/+bug/200646712:10
bauzasack12:11
bauzasfwiw, looping over 78 different libvirt funct tests12:11
bauzasfirst pass was saying OK12:12
dvo-plvSorry, Sean, but I do not get your final thought. Do you prefer to use. Honestly, I would like to implement it as I have suggested; all the interfaces that I need already exist and I will not extend other methods and classes with one parameter that will not be used often. It will be a delicate way to extend the existing bunch of filters with one more filter. 12:18
gibibauzas: added another https://bugs.launchpad.net/glance/+bug/200647312:50
sean-k-mooneydvo-plv: i was suggesting using the extra specs/image properties for now12:57
sean-k-mooneydvo-plv: and when we raise our min libvirt/qemu version eventually we can enable it by default then12:57
sean-k-mooneydvo-plv: that would be my preference. its simple to add to nova, easy to document and understand and easy to test12:58
sean-k-mooneydvo-plv: we can eventually turn this on by default when we no longer support qemu/libvirt versions that dont have this and there is no longer an upgrade impact12:59
sean-k-mooneydvo-plv: we did the same thing with the virtio random number generator in the past13:01
sean-k-mooneydvo-plv: initially it was opt in and we enabled it by default a few releases after we raised our min libvirt/qemu version 13:01
bauzasgibi: after one hour, still none of the 79 tests were having an issue13:14
opendevreviewJorge San Emeterio proposed openstack/nova master: Moving privsep profiles to nova/__init__.py  https://review.opendev.org/c/openstack/nova/+/87201013:18
dvo-plvYes, I would like to have some general pre-approval from you here, before starting to implement and verify this functionality and present it in the blueprint to be sure that it will work okay, and does not waste your time on the blueprint spec file review process 1) User will have the ability to enable/disable this feature via flavor/image. 2) User will have the ability to set trait COMPUTE_NET_VIRTIO_PACKED to the flavor13:18
dvo-plvSorry, I have interrupt, I will resend my question13:19
dvo-plvYes, I would like to have some general pre-approval from you here, before starting to implement and verify this functionality and present it in the blueprint to be sure that it will work okay, and does not waste your time on the blueprint spec file review process13:19
dvo-plv1) User will have the ability to enable/disable this feature via flavor/image. 2) User will have the ability to set trait COMPUTE_NET_VIRTIO_PACKED to the flavor to choose some specific servers. Compute node will set this trait to the resource provider here static_trait.13:19
dvo-plv 3) Scheduler will handle migration and the OpenStack cluster update process by automatically understanding which node has this function, with the ALL_REQUEST_FILTERS array extended by a new filter, similar to how it was implemented for accelerators_filter ( get a packed request from flavor ).13:20
sean-k-mooneydvo-plv: yep so requesting the feature via flavor/image should automatically result in the trait request via a prefilter like the accelerator filter13:20
dvo-plv4) Since Qemu from v4.2 cannot be compiled without packed ring support, and Libvirt supports it from v6.3, we can detect whether the current compute node can use this functionality and whether it is available for the user.13:20
dvo-plvOR do we need to just implement options 1, 2, and 4, without the scheduler automatically handling whether this feature exists on the target compute node?13:20
sean-k-mooneyso they can ask for COMPUTE_NET_VIRTIO_PACKED explicitly but it should not be required13:20
sean-k-mooney2 you get for free, we already support arbitrary trait requests in the flavor/image13:21
sean-k-mooneyas part of implementing 1 you should add a scheduler prefilter to request COMPUTE_NET_VIRTIO_PACKED if the extra_spec/image property is set13:21
sean-k-mooneyso 1 and 3 are what you need to enable this feature properly13:22
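The opt-in prefilter sean-k-mooney describes (items 1 and 3) can be sketched roughly as follows. This is a self-contained toy, not nova code: `RequestSpec` here is a stand-in dataclass, and the property names `hw:virtio_packed_ring` and `hw_virtio_packed_ring` are assumptions picked for illustration. Only the trait name COMPUTE_NET_VIRTIO_PACKED comes from the discussion.

```python
from dataclasses import dataclass, field

PACKED_TRAIT = "COMPUTE_NET_VIRTIO_PACKED"
# Hypothetical property names, chosen only for this sketch.
FLAVOR_KEY = "hw:virtio_packed_ring"
IMAGE_KEY = "hw_virtio_packed_ring"


@dataclass
class RequestSpec:
    """Toy stand-in for the scheduler request spec, not nova's real class."""
    flavor_extra_specs: dict = field(default_factory=dict)
    image_properties: dict = field(default_factory=dict)
    required_traits: set = field(default_factory=set)


def _is_true(value):
    """Interpret common truthy spellings used in flavor/image properties."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def packed_ring_filter(spec):
    """Require the trait when the flavor or image opts in.

    Returns True if the filter changed the request, False otherwise.
    """
    requested = (
        _is_true(spec.flavor_extra_specs.get(FLAVOR_KEY, False))
        or _is_true(spec.image_properties.get(IMAGE_KEY, False))
    )
    if not requested:
        return False
    spec.required_traits.add(PACKED_TRAIT)
    return True
```

Because the decision is made purely from data already on the request, nothing has to be read from instance_system_metadata, which is the simplification argued for above.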
sean-k-mooneydvo-plv: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L212-L22213:23
sean-k-mooneyour current min libvirt is 6.0 and Qemu is 4.213:23
sean-k-mooneywe have not bumped that in a few releases  so we will likely go to libvirt 7.0 and qemu 5.2 in the B release13:24
sean-k-mooneyalthough we can technically do that in the A release13:25
dvo-plvOkay, I see, but that is in the future; for now Libvirt supports packed from 6.3. Should I just update the minimum libvirt version, or create my own define for my trait?13:25
sean-k-mooneybauzas: kashyap any reason not to do that in A13:25
sean-k-mooneydvo-plv: we have a specific procedure for updating it where we have to announce our new min version in advance for at least 1 cycle13:26
sean-k-mooneywe declared 7.0 and 5.2 as our next version in Wallaby13:27
sean-k-mooneyso we could have done that bump some time ago13:27
bauzassean-k-mooney: we can if you want13:27
sean-k-mooneyalthough we now have new upgrade requirement to test the previous LTS13:27
sean-k-mooneybauzas: i just realised we cant13:27
sean-k-mooneywe need to support focal for A for upgrade reasons13:28
sean-k-mooneybauzas: so we should do this in early B13:28
sean-k-mooneywe need to not have 20.04 in our grenade job to do this bump13:29
bauzashmmm ok13:29
sean-k-mooneyand the dedicated focal job to go away13:29
sean-k-mooneyfor B we will be using 22.0413:29
sean-k-mooneydvo-plv: so what that means for you is if your patch is after we have done the bump you will not need to do the version check13:30
gibibauzas: I'm not surprised, it seems both of us are missing some hidden ingredients to reproduce the same thing that happens on the gate13:31
sean-k-mooneyif its before we do the bump you will have to do the version check when reporting the trait13:31
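The version check mentioned here could look roughly like the sketch below. The thresholds (Libvirt 6.3, QEMU 4.2) are the ones dvo-plv cites in this discussion; the constant and function names are invented for illustration and are not nova's driver code.

```python
# Minimum versions for packed ring support as stated in the chat.
MIN_LIBVIRT_PACKED = (6, 3, 0)
MIN_QEMU_PACKED = (4, 2, 0)


def parse_version(text):
    """Turn '6.3' or '6.3.0' into a comparable (major, minor, micro) tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def supports_packed_ring(libvirt_version, qemu_version):
    """Report the trait only when both components are new enough."""
    return (
        parse_version(libvirt_version) >= MIN_LIBVIRT_PACKED
        and parse_version(qemu_version) >= MIN_QEMU_PACKED
    )
```

With the current minimums quoted above (libvirt 6.0, QEMU 4.2) the check returns False for a host at the floor, which is why it is needed until the minimum libvirt is raised to 7.0.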
bauzasRan: 4144 tests in 4758.5975 sec.13:31
bauzas - Passed: 414413:31
bauzas - Skipped: 013:31
bauzas - Expected Fail: 013:31
bauzas - Unexpected Success: 013:31
bauzas - Failed: 013:31
bauzasSum of execute time for each test: 32443.8935 sec.13:31
bauzas:)13:31
sean-k-mooneywhat kind of potato is that running on13:31
sean-k-mooneyor were you just running those in a loop13:32
bauzassean-k-mooney: see what we discussed before you arrived13:32
bauzasgibi: yah, maybe13:33
dvo-plvOkay, if the Libvirt version is still lower than 6.3 when I present the patch in the blueprint, I will create a separate define with the Libvirt version13:33
bauzasgibi: I'm now looking at the code and trying to understand what we use 13:33
opendevreviewJorge San Emeterio proposed openstack/nova master: Moving privsep profiles to nova/__init__.py  https://review.opendev.org/c/openstack/nova/+/87201013:33
kashyapsean-k-mooney: Hi, reading back.  (Was buried elsewhere in an urgent thing)13:33
sean-k-mooneykashyap: its fine it was just on our next libvirt/qemu version13:34
kashyapsean-k-mooney: Yeah, bumping the min versions in 'A' is totally fine.13:34
sean-k-mooneykashyap: actually it used to be, its not anymore13:34
sean-k-mooneykashyap: from a pure nova point of view it would be13:34
sean-k-mooneykashyap: but we have PTI/governance requirements13:34
kashyapRight, I'm talking from a Nova PoV13:34
sean-k-mooneythat require us to support 20.04 for A13:35
sean-k-mooneyright so because of the other requirement we cant bump it in nova until B13:35
sean-k-mooneykashyap: https://github.com/openstack/governance/blob/master/reference/runtimes/2023.1.rst#additional-testing-for-smooth-upgrade13:35
kashyapWhat is "support 20.04 for A", I don't get 13:36
* kashyap reads13:36
dvo-plvThank you for your time and conversation. Have a nice day13:36
kashyapAh, it is Ubuntu 20.0413:36
sean-k-mooneyyes basically every time we change a base OS in the testing requirement we need to have one release where we test the old and new version13:37
sean-k-mooneykashyap: we changed from 20.04 to 22.04 in this release13:37
sean-k-mooneyso the same would happen for debian 11->12 or centos 9->10 in the future13:37
sean-k-mooneyits to ensure you can upgrade openstack without necessarily needing to upgrade the OS. it also mimics how our grenade jobs work13:38
sean-k-mooneyits related to https://github.com/openstack/governance/blob/master/resolutions/20220210-release-cadence-adjustment.rst the skip level upgrade release and the new lifecycle for integrated release projects13:39
sean-k-mooneydvo-plv: o/13:40
kashyapsean-k-mooney: Yeah, the upgradeability makes sense13:40
sean-k-mooneykashyap: bauzas  any objection to doing the bump in a few weeks after RC 1 is out13:40
sean-k-mooneybetter to try and do that early rather than late13:40
sean-k-mooneyor at least identify what our next versions should be declared as13:40
bauzaswhen RC1 is out, then the master branch will be the Bobcat release, so ok13:41
kashyapsean-k-mooney: Definitely agree on doing it earlier13:41
opendevreviewMerged openstack/nova master: Move comment about _destroy_evacuated_instances()  https://review.opendev.org/c/openstack/nova/+/87234813:42
opendevreviewMerged openstack/nova stable/wallaby: [stable-only][cve] Check VMDK create-type against an allowed list  https://review.opendev.org/c/openstack/nova/+/87155713:42
bauzas\o/13:42
bauzasgibi: interestingly, if I restrict the logsearch call to "ImportError: This test imports the 'libvirt' module, which it should not in the test environment. Please add appropriate mocking to this test." which is the latest exception I only get 19/143 failures that match (from the last 20 days)13:45
opendevreviewMaxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted  https://review.opendev.org/c/openstack/nova/+/87238513:45
bauzasby comparing https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html to https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html that's why I think we have this13:45
gibiI don't see the difference, both have the import error line13:46
*** dasm|out is now known as dasm|rover13:53
bauzasgibi: I mean, this is just a canary line for not getting the false positives13:57
gibido you have a false positive where this line is missing?13:58
opendevreviewDavid Hill proposed openstack/nova master: Increase user_data from 64k to 128k  https://review.opendev.org/c/openstack/nova/+/87293114:00
bauzasgibi: one of the false positives is https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_aea/850501/15/check/nova-tox-functional-py38/aea02af/testr_results.html14:04
bauzasgibi: you can find the DB issue in job_output.txt but the tests aren't failing14:05
bauzashttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_aea/850501/15/check/nova-tox-functional-py38/aea02af/job-output.txt14:05
bauzasand you won't see the canary line14:06
gibiI see. So the difference between the false positive and a real positive is that the real one hits the libvirt import check and fails the actual check while the false one did not14:06
bauzasyup14:06
gibis/actual check/actual test/14:06
bauzasso now I'm trying to find the pattern14:06
gibiI see14:06
bauzasI stestr loaded all the subunits14:06
gibigood progress14:06
bauzasnow, I'm grepping this large txtfile I generated14:06
bauzasgibi: see, the fact that we get the same exceptions with or without failing makes me think that the canary isn't maybe just a canary but rather the root cause of the failure14:13
bauzasor rather some condition to a failure14:14
bauzasif we can understand why in some cases we say meh and why not, then we could fix the problem14:15
bauzasgibi: https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html is a good candidate to use as it got 14:21
bauzas*both* false positive and true positives14:21
bauzastest_resize_revert_across_azs is a true positive14:22
bauzaswhile other api failures are false ones14:22
gibiare the hits in the same worker? it would be best to find a worker with multiple hits, as that would mean that worker had more than one leak14:27
bauzas2023-01-24 18:37:50.710671 | ubuntu-jammy |   File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/libvirt/driver.py", line 10619, in _live_migration 2023-01-24 18:37:50.710677 | ubuntu-jammy |     self.live_migration_abort(instance)14:28
bauzaslooks to me all our pain comes from this ^14:28
bauzasand then I'm confused14:31
bauzaswhy the f... are we calling live_migration_abort() in some functional test that just creates an instance ?14:31
bauzashttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html14:31
gibibauzas: the leaked thread calls it :)14:32
bauzashah 14:32
gibibauzas: I think the global state the leak acts on to influence the later tests is sys.meta_path from ImportModulePoisonFixture14:32
gibiso my current theory: we leak a thread/eventlet that eventually calls live_migration_abort while a later test runs. As the later test sets the global poison on libvirt import, the later test gets the failure14:33
bauzasyeah, the global libvirt object is set to None by something else14:33
gibiif there is no ImportModulePoisonFixture set in the later test then we only see the stack trace14:33
bauzasso the thread gets this NoneError and fails, which tramples the whole test14:34
bauzassomething we merged tampered the libvirt import14:34
bauzasand we need to find it14:34
gibiI think we intentionally posion libvirt import14:35
bauzasby posion, you mean poison, right?14:35
bauzasbut yeah got it14:36
gibiyeah, sorry14:36
bauzasafaicr, our libvirt functional tests do poison indeed the import14:36
bauzasthe problem is that it seems that some test that calls live_mig_abort() doesn't use the libvirt poisoned instance, hence the issue14:37
bauzasbut which one and how to find it ? 14:38
bauzas2023-02-06 16:15:47,325 ERROR [root] Original exception being dropped: ['Traceback (most recent call last):\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 207, in _send\n    reply, failure = reply_q.get(timeout=timeout)\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet14:39
bauzas/queue.py", line 322, in get\n    return waiter.wait()\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet/queue.py", line 141, in wait\n    return get_hub().switch()\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch\n    return self.greenlet.switch()\n', '_queue.Empty\n', '14:39
bauzas\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 203, in decorated_function\n    return function(self, context, *args, **kwargs)\n', '  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 9300, in _post_live_migration\n    self._update_scheduler_instance_in14:39
bauzasfo(ctxt, instance)\n', '  File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 2219, in _update_scheduler_instance_info\n    self.query_client.update_instance_info(context, self.host,\n', '  File "/home/zuul/src/opendev.org/openstack/nova/nova/scheduler/client/query.py", line 69, in update_instance_info\n    self.scheduler_rpcapi.update_instance_info(context, host_name,\n', '  File "/home/zuul/src/opende14:39
bauzasv.org/openstack/nova/nova/scheduler/rpcapi.py", line 174, in update_instance_info\n    cctxt.cast(ctxt, \'update_instance_info\', host_name=host_name,\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/fixtures/_fixtures/monkeypatch.py", line 86, in avoid_get\n    return captured_method(*args, **kwargs)\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py3814:39
bauzas/lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 190, in call\n    result = self.transport._send(\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send\n    return self._driver.send(target, ctxt, message,\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_mess14:39
bauzasaging/_drivers/impl_fake.py", line 222, in send\n    return self._send(target, ctxt, message, wait_for_reply, timeout,\n', '  File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 213, in _send\n    raise oslo_messaging.MessagingTimeout(\n', 'oslo_messaging.exceptions.MessagingTimeout: No reply on topic scheduler\n'] 2023-02-06 16:15:47,361 WAR14:39
bauzasNING [nova.virt.libvirt.driver] Error monitoring migration: (sqlite3.OperationalError) no such table: compute_nodes14:39
gibimaybe the test has the poison but it is removed after the test finished, but the thread that will do the abort call is leaked to another test that might or might not have (or need) the poison14:39
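A minimal sketch of the theory above, using only the standard library. It is not nova's ImportModulePoisonFixture: PoisonFinder, leaked_work, run_demo and the module name fake_libvirt_module are hypothetical names made up for illustration, and a plain thread stands in for the leaked eventlet. It shows a worker started by one "test" hitting an import poison that a later "test" installed on sys.meta_path.

```python
import sys
import threading


class PoisonFinder:
    """Hypothetical meta-path finder that refuses to import one module."""

    def __init__(self, name):
        self.name = name

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.name:
            raise ImportError(f"{fullname} is poisoned in this test")
        return None  # let the normal finders handle everything else


def leaked_work(start, errors):
    """Stands in for a thread that outlives the test that started it."""
    start.wait()  # still asleep when its own test finishes
    try:
        __import__("fake_libvirt_module")
    except ImportError as exc:
        errors.append(str(exc))


def run_demo():
    start, errors = threading.Event(), []

    # "Test A" starts the worker and finishes without waiting for it.
    worker = threading.Thread(target=leaked_work, args=(start, errors))
    worker.start()

    # "Test B" installs its own poison; the leaked worker wakes up now.
    finder = PoisonFinder("fake_libvirt_module")
    sys.meta_path.insert(0, finder)
    try:
        start.set()
        worker.join()
    finally:
        sys.meta_path.remove(finder)
    return errors
```

The error is raised inside the leaked worker, but it is caused by global state that belongs to whichever test happens to be running when the worker wakes up.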
opendevreviewElod Illes proposed openstack/nova stable/ussuri: DNM: CI test  https://review.opendev.org/c/openstack/nova/+/87218414:49
elodillesbauzas: i'll update the meeting wiki (stable section) if you are OK with it15:02
bauzaselodilles: do it15:02
elodillesack15:02
bauzasgibi: any way we could have to introspect in some log what was creating the thread ?15:04
* bauzas googles as I speak15:04
bauzasChatGPT, maybe you know ?15:05
elodilles:)15:07
elodilles(meanwhile, I'm done with the wiki editing)15:08
gibibauzas: we can try printing https://github.com/openstack/nova/blob/9bc198e05733c03ba1a40f89cd6a77ab54b7e480/nova/tests/fixtures/notifications.py#L154-L160 to get the name of the testcase that started the eventlet 15:10
bauzasgibi: I can write a patch15:13
bauzasgiven the occurrences, we may have evidences coming up15:13
bauzassooner than later15:13
gibiyeah lets try that15:14
bauzasgibi: that being said, the thread is maybe not a FakeVersionedNotifier15:15
*** dasm|rover is now known as dasm|afk15:15
gibibauzas: FakeVersionedNotifier was on the receiving end in the past not on the sending side. in the current case the poison is on the receiving side, and the live_mig_abort is on the sending side afaik15:16
bauzasgibi: are you proposing me to add this directly in   File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8854, in _do_live_migration     self.driver.live_migration(context, instance, dest, ?15:16
gibiif we want to print only in true positive cases then add it in /home/zuul/src/opendev.org/openstack/nova/nova/tests/fixtures/nova.py  line 1849, 15:17
gibiif we want to print in false positive cases too then in /home/zuul/src/opendev.org/openstack/nova/nova/virt/libvirt/driver.py", line 10071, in live_migration_abort15:17
gibidon't call _get_sender_test_case_id just copy the implementation of it15:18
bauzasyup, I see15:19
bauzasgibi: but we want to know the parent, right?15:21
gibibauzas: print the first id it gets; that will be the name of the test case that leaked the thread, either directly or indirectly15:21
gibihm, directly, hence the walking on the parents15:21
gibiso print the first id; that will be the eventlet that nova's spawn or spawn_n started, which has the test case id embedded15:22
gibiwe walk the parents as eventlets later can spawn other eventlets which we don't control and therefore we cannot propagate the testcase id there15:22
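A rough sketch of the parent walk described above. It is not a copy of _get_sender_test_case_id: FakeGreenlet and find_test_case_id are hypothetical names, and a plain object with a .parent link stands in for a real greenlet so the example runs without eventlet installed.

```python
class FakeGreenlet:
    """Stand-in for a greenlet; only the .parent link matters here."""

    def __init__(self, parent=None, test_case_id=None):
        self.parent = parent
        # Only greenlets started through the instrumented spawn carry an id.
        if test_case_id is not None:
            self.test_case_id = test_case_id


def find_test_case_id(current):
    """Return the first test_case_id found while walking up .parent links."""
    while current is not None:
        test_case_id = getattr(current, "test_case_id", None)
        if test_case_id is not None:
            return test_case_id
        current = current.parent
    return None  # reached the root without finding a tagged greenlet


# A tagged greenlet spawns children that nobody tagged.
root = FakeGreenlet()
tagged = FakeGreenlet(parent=root, test_case_id="SomeTest.test_x")
child = FakeGreenlet(parent=tagged)
grandchild = FakeGreenlet(parent=child)
```

The walk is bounded by the depth of the parent chain, so it ends either at the first tagged ancestor or at the root.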
opendevreviewSylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297515:32
bauzasgibi: ^15:32
bauzasI said DNM but we could merge it 15:32
bauzasinstead of us rechecking15:33
opendevreviewDan Smith proposed openstack/nova master: Add docs for stable-compute-uuid behaviors  https://review.opendev.org/c/openstack/nova/+/87297715:46
*** dasm|afk is now known as dasm|rover15:50
bauzas#startmeeting nova16:01
opendevmeetMeeting started Tue Feb  7 16:01:03 2023 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
opendevmeetThe meeting name has been set to 'nova'16:01
bauzassorry folks, forgot to remind you of the meeting16:01
bauzaswho's around ?16:01
Ugglao/16:01
elodilleso/16:02
bauzasI guess we can make a soft start16:02
bauzas#link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting16:02
bauzas#topic Bugs (stuck/critical) 16:03
bauzas#info No Critical bug16:03
bauzas#link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+1 since the last meeting)16:03
bauzas#info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster16:03
bauzasUggla: fancy getting the bug triage baton for this week N16:03
bauzas?16:03
sean-k-mooneyo/16:04
UgglaI will be out next week so I would rather postpone it if possible16:04
bauzasack, so artom would you want to continue having the triage baton for an extra week ?16:04
UgglaIf not I'll try to do my best till the end of the week.16:04
artomAh, I completely dropped the ball, didn't I?16:05
artomYeah, I can keep it16:05
bauzas++16:05
bauzasartom: no worries16:05
bauzasand thanks16:05
gibio/16:05
dansmitho/16:06
bauzasok moving on16:06
bauzas#topic Gate status 16:06
bauzas#link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs 16:06
bauzasnew item16:06
bauzas#link https://etherpad.opendev.org/p/nova-ci-failures Etherpad for tracking CI failures16:06
gibithere is a fairly long list but please add to it if you see failures16:07
gibithat are not on the list16:07
bauzaswe have some of them are hit us in some place between the chair and the man16:07
bauzasI think the hardest one is at the bottom of the document16:08
dansmithoof yeah16:08
bauzasI had a lovely morning and a half-afternoon spent on that one16:08
bauzasin the context of a soonish feature freeze, more hands are more than welcome16:08
dansmithI wrote the replace_location test, so I can look into that one.. it's a glance test though. I'm sure it's poking some bug in glance, because until I wrote that we didn't really have any tests for that stuff16:09
bauzasbecause don't expect your patches to be reviewed if most of the cores are having their days spent on fixing CI problems16:09
dansmithbut maybe it is resolvable16:09
gibidansmith: there is https://bugs.launchpad.net/glance/+bug/1999800 and https://bugs.launchpad.net/glance/+bug/2006473 both location tests16:10
bauzasand yeah, I know, debugging a CI failure isn't exactly the best experience you may have of working on an opensource project, but let's be honest and say that's necessary to have a healthy gate16:10
gibibauzas: +116:10
dansmithokay the former is the same as bauzas' one16:10
gibiyeah probably16:10
dansmithyeah from the logs, the test is clearly doing something legit and glance is rejecting it but shouldn't 16:11
bauzasgibi: I created https://bugs.launchpad.net/nova/+bug/2004641 but it seems duplicate of https://bugs.launchpad.net/glance/+bug/199980016:11
dansmithmight be because it fails to talk to the cirros site occasionally, so maybe we can use an openstack infra url instead16:11
dansmithbauzas: indeed16:11
sean-k-mooneyi thought we tried to pull those from provider proxies in ci16:11
gibibauzas: https://bugs.launchpad.net/tempest/+bug/2004641 and https://bugs.launchpad.net/glance/+bug/2006473 are duplicates but https://bugs.launchpad.net/glance/+bug/1999800 is a separate tc16:11
bauzasI can close my one as duplicate16:11
dansmithbauzas: ++16:12
sean-k-mooneygithub is more reliable for downloading cirros images, by the way, than the cirros site16:12
bauzasI just ideally would like to track that bug in our project16:12
dansmithsean-k-mooney: the cirros page just redirects to the github one16:13
sean-k-mooneyoh they finally implemented that16:13
dansmithsean-k-mooney: and we're just using CONF.image.http_image in that test16:13
sean-k-mooneyoh this is not the image pulled by https://github.com/openstack/devstack/blob/master/stackrc#L670-L70816:14
dansmiththe github URL is crazy long with tons of tokens and other values after the redirects it does16:14
bauzasgibi: ack will mark your https://bugs.launchpad.net/nova/+bug/2004641 as duplicate of mine, then16:14
dansmithsean-k-mooney: this is a tempest test16:14
sean-k-mooneyright the one with the larger image16:14
dansmithno16:14
dansmithgibi: it's the same test case, different behavior, but I'm guessing its the sameish problem16:14
bauzasok, you know what, I'll add mine in the tracking etherpad, and we'll figure out16:15
bauzasthe three of them are set against Glance either way16:15
sean-k-mooneyi'm surprised that the tempest test is not using the one we prestage in the vm but ok16:15
sean-k-mooneyi was expecting CONF.image.http_image to be file:///opt/devstack/data/cirros...16:16
gibidansmith: yeah probably similar root cause16:17
dansmithsean-k-mooney: it can't be because that is specifically for testing fetching an image server-side from http16:17
sean-k-mooneyah thanks i was missing that context16:17
sean-k-mooneyoh, that's in https://bugs.launchpad.net/glance/+bug/2006473; i was only familiar with https://bugs.launchpad.net/glance/+bug/199980016:17
dansmiththey're the same test16:18
dansmithsorry, the same test helper16:18
bauzasand probably the rootcause16:18
sean-k-mooneyya so likely the same cause16:18
bauzassame rootcause16:18
bauzaswhich is a flakey httpservice16:19
bauzaseither way, seems we have a path forward with the github image repo then ?16:19
* bauzas trying to read between the lines16:19
bauzaslooks like people are gone16:21
dansmithbauzas: no, it's already using that via redirect16:21
bauzasthere is another CI failure I'd like to talk about16:21
sean-k-mooneyfrom the name i would not expect either to depend on downloading an image over http but i have not looked at the detail of the test. i was expecting tempest to upload the image from disk.16:21
dansmithbauzas: I'll take it and work something out16:21
bauzasdansmith: very much appreciated, trust me.16:21
bauzasdansmith: fwiw, the hits number seems low compared to other bits16:22
bauzasbites*16:22
bauzasso, about https://bugs.launchpad.net/nova/+bug/194633916:22
dansmithyeah, but if we have no other obvious ones to work on, at least I can make some progress on this :)16:22
bauzasdansmith: heh16:22
bauzasso, after a day of co-investigation with my CSI partner gibi on https://bugs.launchpad.net/nova/+bug/194633916:23
bauzaswe identified this may come from a non-poisoned libvirt 16:23
bauzasthe funny part is that we hit this in a thread, not in the main test16:23
bauzashence why we missed it before16:24
bauzasI have a question16:24
bauzasdo people agree with merging https://review.opendev.org/c/openstack/nova/+/872975 even if it says it's a dnm ?16:24
bauzas(tbc, I can make an update and remove the dnm title)16:24
dansmithwe should remove the dnm for sure16:25
sean-k-mooneybauzas: melwitt had a patch to poison importing libvirt that should catch this by the way16:25
opendevreviewSylvain Bauza proposed openstack/nova master: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297516:25
dansmithbauzas: do you know about the thing you can do to add additional test payload report sections?16:25
bauzasdansmith: acked ^16:25
dansmithdepending on what you're trying to do, that can be more useful than logging sometimes16:25
bauzasdansmith: nope, hence my sending the bottle to the sea, asking for advices16:26
sean-k-mooneybauzas: can you put a sleep in that busy loop too16:26
dansmithit's not a busy loop is it?16:26
bauzasnope16:26
gibiit is walking a tree up16:26
bauzaswe're trying to find an attribute from an eventlet object and if we can't find it, we walk the ascendance16:26
sean-k-mooneyit will loop until the test_case_id is not None16:26
gibiit walks along the eventlet.parent link16:27
sean-k-mooneyi guess its proably fine16:27
gibiso while it is busy, it is bounded16:27
sean-k-mooneyoh sorry, you're right, it is doing that16:27
sean-k-mooneyok 16:27
bauzasdansmith: so, about the payload reporting, you gained my interest16:27
dansmithbauzas: https://github.com/openstack/glance/blob/master/glance/tests/functional/__init__.py#L1129-L113016:27
dansmithbauzas: that adds another section of the test failure reporting, like "here's the stdout I captured" and "here are the log lines I captured"16:28
bauzasffff16:28
bauzasdansmith: ++16:28
dansmithhelps to separate nova-logging from something specifically to be reported by the test case16:28
dansmithespecially if debug logging isn't captured, or is being mocked out, etc16:28
dansmithin glance I found it useful because their functional workers run outside the main process, but also in some cases where I needed to debug failures16:29
dansmith(failures that happen infrequently)16:29
dansmithanyway, just FYI, might be helpful16:29
bauzasit could be16:29
gibidansmith: ohh that is good to know :)16:29
sean-k-mooney oh addDetail16:30
bauzasdansmith: the problem is that we get an exception from a test which is actually not caused by this test but rather by a leaked eventlet thread that blows up at that point in time16:30
sean-k-mooneyi have seen that before but never looked into it. ya, looks useful16:30
dansmithgibi: yeah, it's kinda nice :)16:31
bauzasideally I would like to trace the whole parenting stack that triggered the leaky thread16:31
gibibauzas: we will hopefully get the name of the leaky test case and then we can create a local reproduction16:32
bauzasgibi: a stack would have been better but yeah16:33
gibiyou have a stack 16:33
gibibut it starts when the thread starts16:33
bauzasthat's the parent stack I want :)16:33
gibiyeah16:33
gibithat is hard16:33
bauzasyup16:35
bauzasanyway, reviews appreciated on https://review.opendev.org/c/openstack/nova/+/87297516:35
bauzasmoving on ?16:35
sean-k-mooneysure16:35
bauzas#link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status16:36
gibiI'm on it16:36
bauzas#info Please look at the gate failures and file a bug report with the gate-failure tag.16:36
bauzas#info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures16:37
bauzasdone with this topic, phew.16:37
bauzas#topic Release Planning 16:37
bauzas#link https://releases.openstack.org/antelope/schedule.html16:37
bauzas#info Antelope-3 is in 1.2 weeks16:37
bauzasI said 1.2 because it will be on Thursday next week16:37
bauzas#link https://blueprints.launchpad.net/nova/antelope All accepted blueprints for 2023.116:37
bauzas#link https://etherpad.opendev.org/p/nova-antelope-blueprint-status Blueprint status for 2023.116:37
bauzasfeel free to comment it as much as you want ^16:37
bauzasI was originally planning to do a full reviews set by today, but due to the former topic, I abandoned my promise16:38
elodillesa bit related to release: 'Release final os-vif for 2023.1 Antelope' https://review.opendev.org/c/openstack/releases/+/87277916:39
bauzas(yet again saying, don't expect the reviews to magically happen, be present and interact with us)16:39
bauzaselodilles: good catch, I forgot to add it to the agenda16:39
bauzasImportant : 16:39
bauzas#info Thursday is the non-client libs feature freeze, which means we can only accept features changes for os-vif, os-traits and os-rc up until Thursday16:40
bauzaslater changes will be on hold until next release16:40
elodilles++16:40
bauzasI haven't looked at os-vif, os-traits and os-resourceclasses master branches, but I think we have open changes on them16:41
bauzasso, if anyone wants some addition to those libraries, I'd recommend them to ping me or anyone else for reviews 16:42
bauzaslast point16:43
bauzasFeatureFreeze is on next Thursday16:43
bauzaswe'll see how the gate goes by that time16:43
bauzasbut as for the older releases, the most important for having your series accepted for Antelope is to get a +W before Thursday EOB16:43
bauzaswe'll manage the rechecks if needed16:44
bauzasdon't freak out by the gate stability, but please continue to ensure your patches are ready for reviews16:44
sean-k-mooneyelodilles: i was planning to propose an os-vif release to include rodolfos patches16:44
sean-k-mooneyso i want to confirm the sha before we move forward with that16:45
sean-k-mooneyill do that after the meeting16:45
elodillessean-k-mooney: as i know he updated the release patch already16:45
bauzassean-k-mooney: thanks, appreciated16:45
elodillessean-k-mooney: but please -1 if something is still missing16:45
sean-k-mooneyack just looking now ill +1 if its correct 16:45
elodillessean-k-mooney: that is even better :)16:46
bauzasI think for os-traits I've seen one patch from Uggla16:46
bauzasbut I don't think we can reasonably merge the nova related series16:47
Ugglayep but it can wait.16:47
bauzasnext topic, if so16:47
bauzas#topic vPTG Planning 16:47
bauzasa bit early but16:47
bauzas#link https://www.eventbrite.com/e/project-teams-gathering-march-2023-tickets-483971570997 Register your free ticket16:47
bauzasalso16:48
bauzas#link https://etherpad.opendev.org/p/nova-bobcat-ptg Draft PTG etherpad16:48
bauzasevery cycle, we're asked how long we should have sessions16:48
bauzasI thought it would be better to somehow have an idea on how many topics we gonna discuss before saying how many slots we need :)16:49
bauzasbut I know16:49
bauzaslots of topics will arrive the week before the PTG :)16:49
sean-k-mooneythe more virtual ptgs we have the less energy i have for them. that said i would prefer to have more slots over more days than a few short long ones16:50
bauzasthe thing is, you have the etherpad, feel free to amend it16:50
sean-k-mooneys/short//16:50
bauzassean-k-mooney: this cycle, we will also have a "physical PTG" at the middle of bobcat-116:50
bauzasthat could alleviate some discussions16:50
sean-k-mooneythat should really be for C planning16:50
bauzasor B implementation phasing :)16:51
sean-k-mooneyeither/both its close to Spec Freeze16:51
bauzasI haven't checked whether the proposed B agenda is merged yet16:51
sean-k-mooneyso probably too late for directional changes on large specs16:51
bauzassean-k-mooney: don't disagree, I'm just saying it could help some small contributors to get audience when they need16:52
elodillesB schedule was merged16:52
bauzascool16:52
bauzasso16:52
sean-k-mooneyhttps://releases.openstack.org/bobcat/schedule.html16:52
sean-k-mooneyso yes its merged16:52
bauzashttps://releases.openstack.org/bobcat/schedule.html says that pPTG will be 3 weeks before specfreeze (if we agree on the PTG at b-2 being spec freeze)16:53
bauzasso, that's why I'm saying we could have a shorter but productive vPTG16:53
bauzaslike 2 hours per day16:53
bauzas(and ideally, I'd like to attend some TC discussions this time)16:53
sean-k-mooneyi would prefer to frontload the planning to the vPTG and have the physical one be more C focused but ok16:54
bauzasdon't disagree16:54
sean-k-mooneybecause of its time in the cycle it feels more like a forum than a ptg16:54
bauzasanyway, we're rushing out of time16:54
bauzas#topic Review priorities 16:54
bauzas#link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)16:54
bauzas#info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review16:54
bauzas#topic Stable Branches 16:54
bauzaselodilles: you have 5 mins (sorry)16:54
elodilles#info stable branches seem to be OK back till wallaby16:54
bauzashuzzah16:55
elodillesstable/wallaby gate is passing (failing openstacksdk-functional-devstack job was removed from wallaby)16:55
bauzasthanks gmann for the hard work on stable/wallaby16:55
elodillesyepp16:55
elodilles++16:55
elodilles#info stable/victoria gate is affected by the same failing openstacksdk-functional-devstack job16:55
elodilles#info ussuri and train gates are broken ('Could not build wheels for bcrypt, cryptography' errors)16:55
elodilles#info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci16:55
elodillesEOM16:55
bauzaselodilles: I'll pay attention to the ussuri branch with the CVE backport16:55
elodillesbauzas: ack16:56
bauzas... when I have time :)16:56
elodillesbauzas: i have a proposed potential workaround16:56
elodillesfor ussuri16:56
bauzaselodilles: cool, let's figure that out after the meeting, tomorrow per say16:56
elodillesbauzas: ++16:56
bauzasfwiw, I'm planning to deliver the cve fix down to ussuri16:56
bauzasbut not provide any backport to train16:57
elodilleswhy not train? :)16:57
bauzasdue to the oslo.utils versioning16:57
bauzasmost of the distros now made the backports16:57
bauzasso it's upstream support16:57
bauzasand Train is on EM16:57
elodilleswell, Wallaby is EM16:57
bauzasand ussuri too16:57
elodilles(and Xena soon, too)16:57
bauzasbut it was simple to backport the fix down to ussuri16:58
bauzasit was cheap, so we proposed it16:58
bauzasbackporting it to train is a totally different story16:58
elodillesok :) thanks for that!16:58
bauzasit requires some oslo.utils backport too (and then a Jenga puzzle with dependency management)16:58
elodilles:S16:59
bauzasso, things are said, crystal clear.16:59
elodillesthanks, i see16:59
bauzaselodilles: thanks elodilles for the stable report16:59
elodillesnp16:59
bauzaslast point for the 20 secs left16:59
bauzas#topic Open discussion 16:59
bauzasnothing on the agenda16:59
bauzasso I'll close the meeting17:00
bauzasfeel free to add your items for next week17:00
bauzasthanks all17:00
bauzas#endmeeting17:00
opendevmeetMeeting ended Tue Feb  7 17:00:21 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)17:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.html17:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.txt17:00
opendevmeetLog:            https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.log.html17:00
elodillesthanks o/17:02
sean-k-mooneyelodilles: we can chat after the meeting but is the cryptography issue being addressed in train?17:02
gibibauzas: I left feedback in https://review.opendev.org/c/openstack/nova/+/87297517:04
elodillessean-k-mooney[m]: ensure-rust workaround seems to be working for stable/ussuri (i proposed it to master last time accidentally): https://review.opendev.org/c/openstack/grenade/+/872969/17:04
bauzasgibi: appreciated. I can speak French, English and a bit of German and Spanish, but my eventlet is definitely small17:05
elodillessean-k-mooney[m]: and ykarel in neutron added it to tempest (devstack, actually) as well for train, so hopefully something like that should do the trick in train17:05
gibiit is not a big codebase (eventlet + greenlet) but it is not straightforward either. And greenlet has parts of it implemented as a C extension to python :)17:05
bauzasgibi: elodilles: thanks for the reviews17:06
elodillesnp17:07
bauzasto clarify, 1/ it is hard to tell which tests are the cause17:07
sean-k-mooneyelodilles: ack i was asking as it's currently blocking a ceilometer/devstack fix i was working on 17:07
sean-k-mooneyelodilles: changes to the telemetry tempest plugin  check for compatibility with train17:07
bauzaselodilles: you missed today's conversation but I basically grepped the occurrences of such problem trying to intersect the failing tests and basically there were no suspects17:07
bauzaselodilles: because the failing test is not responsible 17:07
bauzaselodilles: this is just an unfortunate test that runs at the same time as the thread is throwing the exception17:08
sean-k-mooneyelodilles: i thought you said originally ensure-rust would not work17:08
bauzasbite by the bullet17:08
bauzasgibi: about your comment, not sure I fully understand17:08
bauzasgibi: I can remove the DNM: prefix in the log17:09
elodillessean-k-mooney: yepp, and i was wrong :/ (pushed it to the wrong branch)17:09
gibibauzas: I'm not happy to merge this as it will log things that are misleading. I would rather keep a DNM patch that we recheck until it hits the issue17:09
sean-k-mooneyelodilles: ah ok :) then cool i can recheck https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/872350 after its merged17:09
bauzasgibi: but for case #1 you mentioned, this seems to me OK to have this log17:09
bauzasgibi: I mean17:10
elodillesbauzas: ack. i got your intention (i think): to catch the test with the 'DNM' log whenever we see the failing test17:10
bauzasI'm a developer, I'm writing a functest and I forget to poison libvirt17:10
sean-k-mooneywe have a fixture to poision it17:10
sean-k-mooneyand we dont install libvirt so the import should also fail17:11
bauzasthen I'd see my gate saying -1 if myself I'm not brave enough to run the functest locally17:11
sean-k-mooneyin the functional tests the libvirt python package should not be there17:11
sean-k-mooneyunless it has been baked into the ci image17:11
clarkbit shouldn't be17:11
bauzassean-k-mooney: context is, the poison disappears when the thread is executed17:11
sean-k-mooneywe intentionally do not list libvirt in test-requirements.txt or requirements.txt17:12
bauzasso the thread gets a None attribute for the import17:12
sean-k-mooneyok so it raises an import error as we expect17:12
bauzasto quote gibi "an existing test case that is properly poisoned and mocked libvirt. But an eventlet is leaked out from the test, the test finished and removed the mock. Then the leaked eventlet wakes up while a later test case runs and because the mock was removed in when the original test finished the leaked eventlet now imports libvirt and hits the poison set up by the current test."17:12
sean-k-mooneythat should fail the test but i'm guessing we are using spawn_n17:12
gibibauzas: the code you injected does not help catching such case where the poison was not added17:13
gibibauzas: and our goal here now is to know what test leaked the eventlet17:13
gibito be able to reproduce the leak locally and fix it17:14
sean-k-mooneyit will just log an error with the original eventlet id to help identify the test that was not poisoned17:14
bauzasgibi: yup, that's why I'm trying to find where to patch17:14
sean-k-mooneygibi: is it a libvirt import in all cases or just some17:14
bauzassean-k-mooney: no the test that runs when the greenthread wakes up then turns into a failure17:15
bauzashttps://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html17:15
bauzas(one of the many occurrences)17:15
bauzasor https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html17:15
bauzasor https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html17:15
gibisean-k-mooney: depending on when the leaked eventlet wakes up it either hits the libvirt poison and fails the test, or just logs the stack traces and lets the test pass if no poison is in place17:16
bauzasor https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html (sorry)17:16
sean-k-mooneygibi: ack17:16
bauzasgibi: yup, I found some run17:16
gibisean-k-mooney: the poison acts like the global state that lets the leaked eventlet manipulate the running test case17:16
sean-k-mooneygibi: and is it spawn_n in all cases17:16
sean-k-mooneygibi: yes but this is not a result of the poison, it's just highlighting an existing issue17:17
gibisean-k-mooney: yes the poison is good17:17
sean-k-mooneywe did have an existing thing like this related to notification i think in the past right17:17
gibisean-k-mooney: yes17:17
sean-k-mooneyand we checked the eventlet id17:17
gibisean-k-mooney: that embeds the testcase id to the eventlet17:18
sean-k-mooneyyep17:18
gibiand checks it during the notification code path17:18
sean-k-mooneywhich is what bauzas is logging now17:18
gibiand that path is fixed17:18
gibisean-k-mooney: yes, we try to log that now for this poison / live_migration_abort() codepath17:18
bauzassean-k-mooney: yes, I'm trying to see what's firing the greenthread17:18
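The "embed the testcase id" approach discussed above can be sketched as follows. This is only an illustration under stated assumptions: `current_test_id`, `spawn_tagged` and the `pending` list are hypothetical names, not nova's code, and a plain list again stands in for the eventlet scheduler.

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical global that each test's setUp would set to its own id.
current_test_id = None
# Hypothetical stand-in for greenthreads that have not run yet.
pending = []


def spawn_tagged(func):
    # Remember which test spawned the work.
    origin = current_test_id

    def wrapper():
        # On wake-up, compare the spawning test with the running test.
        if origin != current_test_id:
            LOG.error("work spawned by %s woke up during %s",
                      origin, current_test_id)
            return ("leaked", origin)
        return ("ok", func())

    pending.append(wrapper)


current_test_id = "test_a"
spawn_tagged(lambda: 42)       # test_a spawns work and finishes without waiting
current_test_id = "test_b"
outcome = pending.pop()()      # the work wakes up while test_b is running
print(outcome)
```

The logged origin id is what would name the test that leaked the work, which is the information needed to reproduce the leak locally.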
sean-k-mooneyso longterm i still wonder if we should make nova use a green pool17:18
sean-k-mooneyand then in the tests we can make each test use their own greenpool17:19
sean-k-mooneyand call wait on that in the test cleanup17:19
sean-k-mooneyi think that would be relatively simple to do17:19
sean-k-mooneyi'm just not sure we want to do it 2 weeks before FF17:20
bauzasno17:20
bauzasplease :)17:20
gibisean-k-mooney: we would still need a reproducer for the current failure to see if the pooling fixes it :)17:20
bauzasgibi: I missed your top comment17:20
sean-k-mooneygibi: yes we would :)17:20
bauzasI'll amend .zuul.yaml17:20
sean-k-mooneygibi: but it would allow us to poison direct calls to spawn/spawn_n potentially and ensure we can't leak eventlets between cases17:21
gibiso let's get a reproducer first by figuring out the leaking tests (we know that there is more than one as simply intersecting testcase lists from failed test workers did not result in a single test case but an empty list)17:21
gibisean-k-mooney: I'm not against fixing this via pooling :)17:21
opendevreviewSylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297517:21
bauzasgibi: sean-k-mooney: I'm not against fixing our concurrency mechanism for func tests, I'm just against doing it *now* :)17:22
gibiI will disappear soon. I think we can continue this tomorrow. I will look at the patch and call rechecks from time to time during my evening17:23
bauzasgibi: if only I was able to reproduce it locally, I could just call tox with -- --until-failure17:25
gibibauzas: yeah17:27
gibithat is the key. If we have it locally I can add as much runtime to it as I want17:27
bauzasanyway, have a good evening17:27
bauzasand thanks for the help17:27
bauzasI think I'll shortly stop too17:27
gibibauzas: thanks for the work, I think we made good progress today. I would not have been able to do that without you.17:28
opendevreviewMaksim Malchuk proposed openstack/nova stable/xena: Fix to implement 'pack' or 'spread' VM's NUMA cells  https://review.opendev.org/c/openstack/nova/+/82980417:31
opendevreviewMaksim Malchuk proposed openstack/nova stable/wallaby: Fix to implement 'pack' or 'spread' VM's NUMA cells  https://review.opendev.org/c/openstack/nova/+/86183217:37
*** umbSubli1 is now known as umbSublime18:11
*** efried1 is now known as efried19:24
opendevreviewMaxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted  https://review.opendev.org/c/openstack/nova/+/87238520:49
opendevreviewBalazs Gibizer proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase  https://review.opendev.org/c/openstack/nova/+/87297521:06
*** dasm|rover is now known as dasm|off22:40
