Wednesday, 2023-06-07

iurygregorynot sure if is magic or what, but https://review.opendev.org/c/openstack/ironic/+/885372 seems to help on my patch ^ 00:13
iurygregory:D00:13
iurygregoryor if is just CI that is back to normal00:13
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Add DB API for Firmware and Object  https://review.opendev.org/c/openstack/ironic/+/88306201:54
iurygregoryremoving Depends-On to see01:54
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Firmware Interface  https://review.opendev.org/c/openstack/ironic/+/88527602:00
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: [WIP] RedfishFirmware Interface  https://review.opendev.org/c/openstack/ironic/+/88542502:52
opendevreviewMerged openstack/ironic master: execute on child node support  https://review.opendev.org/c/openstack/ironic/+/88054504:04
rpittaugood morning ironic! o/06:27
rpittauso no fails there after 2 runs, maybe it was a transient hiccup? zuul drunk? tox had a heavy lunch?06:28
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537206:38
rpittaureduced the time to 3 minutes ^06:38
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537206:48
rpittauah that failed finally!07:42
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537207:53
rpittaulooks like the timeout didn't work, or tests are not actually timing out but something else is07:54
opendevreviewDmitry Tantsur proposed openstack/ironic master: Migrate the inspector's /continue API  https://review.opendev.org/c/openstack/ironic/+/87594408:57
opendevreviewDmitry Tantsur proposed openstack/ironic master: Add the initial skeleton of the agent inspect interface  https://review.opendev.org/c/openstack/ironic/+/87781409:04
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Very basic in-band inspection with the "agent" interface  https://review.opendev.org/c/openstack/ironic/+/88545009:08
dtantsurrpittau: I wonder if we have any tests that run for too long09:17
* dtantsur tries locally09:17
dtantsurWe could also consider running the bazillion of RBAC tests in a separate job only.09:19
dtantsurironic.tests.unit.drivers.modules.test_agent_client.TestAgentClient.test__command_poll                                         6.07509:22
dtantsurironic.tests.unit.drivers.modules.ibmc.test_utils.IBMCUtilsTestCase.test_handle_ibmc_exception_retry                           4.11709:22
dtantsurironic.tests.unit.drivers.modules.ilo.test_power.IloPowerInternalMethodsTestCase.test__set_power_state_soft_reboot_fail_to_on  4.09309:22
dtantsurironic.tests.unit.conductor.test_manager.UpdateNodeTestCase.test_update_node_interface_in_allowed_state                        4.03909:22
dtantsurironic.tests.unit.conductor.test_manager.DoNodeTearDownTestCase.test__do_node_tear_down_from_valid_states                      3.23109:22
dtantsurironic.tests.unit.drivers.modules.ilo.test_power.IloPowerInternalMethodsTestCase.test__set_power_state_soft_power_off_timeout  3.18909:22
dtantsurironic.tests.unit.drivers.modules.ilo.test_power.IloPowerInternalMethodsTestCase.test__set_power_state_soft_reboot_timeout     3.11809:22
dtantsurironic.tests.unit.drivers.modules.drac.test_power.DracPowerTestCase.test_set_power_state                                       2.62009:22
dtantsurironic.tests.unit.drivers.modules.drac.test_power.DracPowerTestCase.test_set_power_state_timeout                               2.60509:22
dtantsurironic.tests.unit.api.controllers.v1.test_node.TestPost.test_create_node_specify_interfaces                                    2.35009:22
dtantsurFixing these will shave half a minute off the run09:23
rpittaudtantsur: we could definitely reduce the total time, but I'm not worried about that; what worries me is that I don't actually see any test timing out even though the limit is set to 15 secs09:25
rpittauso we don't know the root cause of the random timeout09:26
dtantsurouch09:27
dtantsurunderstood09:28
rpittauI think OS_TEST_TIMEOUT is not working as it should, the limit should be 15s per test but ironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_walk_versions   took 43 secs to finish in py3909:36
dtantsur43 is quite slow, although not impossibly slow09:54
rpittauyeah, but it should fail since the timeout is 15 secs :)09:54
dtantsurwell, it's good that it does not :D09:54
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537210:08
opendevreviewMahnoor Asghar proposed openstack/ironic master: Handle duplicate node inventory entries per node  https://review.opendev.org/c/openstack/ironic/+/88460810:28
rpittaubesides the fact that OS_TEST_TIMEOUT doesn't seem to work, I've noticed that ironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_walk_versions has very weird runtimes that vary from 10-15 secs to over a minute in some cases10:59
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537211:05
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537211:07
iurygregorygood morning Ironic11:45
iurygregoryomg a lot of messages11:45
iurygregoryrpittau, I think you reached the timeout at least according to https://9dd8202499839c857d32-766877f31b96866d435c7c3ea23bc324.ssl.cf1.rackcdn.com/885372/7/check/openstack-tox-cover/a7d221d/testr_results.html11:55
rpittauiurygregory: yeah, but not using OS_TEST_TIMEOUT11:56
iurygregoryyeah11:56
iurygregoryI was reading all the conversation you had and looking at the patch11:57
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537212:00
iurygregorylol `gentle=True`12:00
opendevreviewMahnoor Asghar proposed openstack/ironic master: Handle duplicate node inventory entries per node  https://review.opendev.org/c/openstack/ironic/+/88460812:09
iurygregoryrpittau, I'm trying to understand why we could have an invalid timeout .-. (would tox do something wrong when setting OS_TEST_TIMEOUT?) 12:14
dtantsurI'd be surprised if tox was aware of OS-specific variables12:15
iurygregorywe are using setenv so OS_TEST_TIMEOUT would be set (at least would be expected lol)12:16
dtantsuriLO unit tests relying on precise timing of sleep() are :chef's kiss:12:54
rpittauI'm trying with passenv this time12:57
rpittauthat should pass the value correctly12:58
rpittau"should"12:58
iurygregorydtantsur, wow12:58
rpittauit doesn't lol12:58
dtantsur\o/13:01
dtantsurSo it's a cursed computing day today?13:02
rpittauI don't see what's wrong, the OS_TEST_TIMEOUT value is passed when running tox so should be taken into account by TestCase13:02
dtantsurrpittau: can some of our dependencies mess with it? like oslotest?13:02
rpittauoslotest is the one that should actually make use of it :/13:03
iurygregory.-.13:03
rpittauhttps://github.com/openstack/oslotest/blob/master/oslotest/base.py#L35-L4513:04
iurygregory(╯°□°)╯︵ ┻━┻13:04
dtantsurHmm, but where is it handled? I don't see it mentioned in the code.13:06
dtantsurah, https://github.com/openstack/oslotest/blob/master/oslotest/timeout.py#L3613:06
dtantsurmaybe it's too gentle then? :D13:06
rpittaulol13:07
iurygregorygentle=False :D13:07
* TheJulia wonders if sleeping in is the best choice13:07
rpittauthe gentle means that it will actually throw an exception13:07
iurygregoryLOL13:07
rpittauTheJulia: it is today :D13:07
dtantsurabsolutely13:07
iurygregoryyeah, let's go back to bed13:07
* TheJulia grabs cat and closes eyes13:07
rpittauby the tentacles of Cthulhu! locally it's working............13:08
iurygregoryrpittau, I'm not surprised :D13:08
* rpittau runs away screaming13:08
iurygregoryits the same scenario as "it works in devstack" :D13:09
dtantsurI also absolutely love silently ignoring values like in https://github.com/openstack/oslotest/blob/master/oslotest/timeout.py#L39-L4113:09
rpittauyeah I'm going to remove that from my implementation (aka copy-paste)13:13
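
A minimal sketch of the per-test timeout idea discussed here, using only the standard library. It is not the oslotest or fixtures implementation: `read_timeout`, `run_with_timeout`, `PerTestTimeout` and `slow_test` are hypothetical names. It assumes a POSIX system and the main thread (it relies on SIGALRM). An invalid OS_TEST_TIMEOUT falls back to 0 (no timeout), which is the silent-ignore behaviour complained about in the log, and the exception raised inside the running test corresponds to the "gentle" mode.

```python
import os
import signal
import time


class PerTestTimeout(Exception):
    """Raised inside the running test when the alarm fires (hypothetical name)."""


def read_timeout(env=os.environ, name="OS_TEST_TIMEOUT"):
    """Return the timeout in whole seconds, or 0 if unset or invalid."""
    try:
        return int(env.get(name, 0))
    except ValueError:
        return 0  # invalid value silently ignored, as criticised in the log


def run_with_timeout(func, seconds):
    """Run func(); raise PerTestTimeout if it takes longer than `seconds`."""
    if seconds <= 0:
        return func()  # a timeout of 0 means "no timeout"

    def _on_alarm(signum, frame):
        # "Gentle" behaviour: raise in the test instead of killing the process.
        raise PerTestTimeout(f"test exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


def slow_test():
    """Stands in for a test that hangs longer than the limit."""
    time.sleep(3)
```

Calling `run_with_timeout(slow_test, 1)` raises `PerTestTimeout` after about one second. A non-gentle variant would leave the default SIGALRM action in place, which terminates the whole process instead of failing one test.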
TheJuliaInsufficientCatGravityException :(13:13
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537213:17
* dtantsur wants to set oslo_service.loopingcall on fire13:28
* iurygregory is afraid to ask why13:28
dtantsuriurygregory: it's pretty implicit and horribly overbloated for our case13:28
dtantsuryou see, it's supposed to be used asynchronously, hence its usage of green threads13:29
dtantsurbut we do things like LoopingCall().start().wait()13:29
dtantsurwhich turns it into a complicated for loop with some eventlet inside13:29
iurygregoryjesus13:29
iurygregory.-.13:29
dtantsuryep. and I've just found that we use initial_delay=1 when waiting for power state. which in our case is just sleep(1) inserted into all power drivers.13:30
dtantsurI don't remember if we did that on purpose...13:30
TheJuliait sounds like something we might have done intentionally13:32
dtantsurIf so, I'd rather have us use time.sleep explicitly. At least the intention would be clear.13:32
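
A rough sketch of what an explicit version of that wait could look like, with the initial delay written as a plain sleep. This is not Ironic's code: `wait_for_state` and its parameters are hypothetical, and the `sleep` argument is injectable only so the loop can be exercised without real waiting.

```python
import time


def wait_for_state(get_state, target, attempts=3, interval=1.0,
                   initial_delay=1.0, sleep=time.sleep):
    """Poll get_state() until it returns target; return the last state seen."""
    sleep(initial_delay)  # explicit equivalent of initial_delay=1
    state = None
    for attempt in range(attempts):
        state = get_state()
        if state == target:
            break
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next poll, but not after the last
    return state
```

With the delay spelled out like this, a reader can see at a glance that every call pays at least `initial_delay` seconds before the first poll.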
dtantsur(And don't let me start on using eventlet itself... I wish async Python was there when OpenStack started)13:33
TheJuliadtantsur: same13:37
* TheJulia awaits dtantsur to pour gasoline on eventlet13:37
dtantsurI seriously wonder at which point the person behind eventlet decides to move on13:38
dtantsurI suspect it's becoming an increasingly thankless job13:38
iurygregoryrpittau, seems like 1 works like a charm :D13:39
iurygregoryhttps://11f91ceea8fc4a8c24f4-01466d86d1bf7d9494b28327d57fdcc3.ssl.cf5.rackcdn.com/885372/9/check/openstack-tox-py310/e99f0e5/testr_results.html13:39
iurygregoryTypeError: 'str' object cannot be interpreted as an integer XD13:40
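
One plausible way to hit this TypeError, shown as a small sketch: environment variables always arrive as strings, and `signal.alarm` only accepts an integer. Whether the patch failed at exactly this call is an assumption; the sketch only illustrates the string-versus-int mismatch. It assumes a POSIX system.

```python
import signal

raw = "1"  # what os.environ would hand back for OS_TEST_TIMEOUT=1

try:
    signal.alarm(raw)  # passing the string straight through
except TypeError as exc:
    message = str(exc)  # the alarm was never armed

seconds = int(raw)  # converting first avoids the error
```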
* TheJulia facepalms13:40
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508713:50
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508713:57
opendevreviewDmitry Tantsur proposed openstack/ironic master: Mock sleep in unit tests that rely on it  https://review.opendev.org/c/openstack/ironic/+/88549714:10
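
A small sketch of the general technique named in that change title, mocking `time.sleep` so a retry test does not depend on real timing. It is not taken from the patch: `retry_power_on` and the test case are hypothetical.

```python
import time
import unittest
from unittest import mock


def retry_power_on(set_power, retries=3, delay=5):
    """Call set_power() until it succeeds, sleeping between attempts."""
    for _ in range(retries):
        if set_power():
            return True
        time.sleep(delay)
    return False


class RetryPowerOnTest(unittest.TestCase):

    @mock.patch("time.sleep", autospec=True)
    def test_retries_without_real_sleep(self, mock_sleep):
        # Fail twice, then succeed; no real time passes.
        set_power = mock.Mock(side_effect=[False, False, True])
        self.assertTrue(retry_power_on(set_power, delay=5))
        self.assertEqual(2, mock_sleep.call_count)
        mock_sleep.assert_called_with(5)
```

The test asserts on how sleep was called rather than on elapsed wall-clock time, so it stays fast and does not flake on a loaded CI node.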
rpittauwell that's perfect14:10
rpittaunow I know what's happening14:11
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537214:12
iurygregoryrpittau, https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4d8/885372/10/check/openstack-tox-py38/4d8415e/testr_results.html 14:25
iurygregoryit works \o/14:25
rpittaushould work without the base changes14:25
rpittauanyway, let's see the results14:26
iurygregorypy38 and py310 failed 14:26
iurygregory39 still running14:26
rpittauwhich is absurd14:26
rpittauah no, just slower to fail14:27
iurygregorystill has 4min remaining14:27
iurygregoryI do see some fixtures._fixtures.timeout.TimeoutException in the console14:29
rpittauoooook so one last test and then I can remove the WIP14:38
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537214:39
iurygregoryfailed 14:48
rpittau\o/14:55
iurygregoryI'm wondering how we will choose the time 14:56
rpittau45 secs14:56
rpittauthe highest I saw was 4214:56
rpittauand it's a lot14:56
iurygregoryyeah14:56
rpittauanyway, first ice cream, then remove WIP14:56
iurygregoryice cream ++14:56
masgharI should probably wait for rpittau to merge before re-checking my patch, right? https://review.opendev.org/c/openstack/ironic/+/88460815:00
iurygregorymasghar, yeah 15:01
masgharAlthough my timeouts were 40 and 50 minutes respectively :O15:01
iurygregorythis is the timeout for the job itself15:01
iurygregorywe are trying to add timeouts to each unittest15:02
masgharI see15:02
masgharThanks!15:02
iurygregoryat least if it fails we will get an idea of which test is taking too long so we can check 15:02
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508715:03
masgharMakes sense. But just out of curiosity, mine probably failed because the CI was busy, right? Because nothing timeout related has been merged to master yet?15:04
TheJuliagah, our devstack plugin has been driving me crazy the last few days15:04
TheJuliaso I'm afraid to ask, but did a new eventlet drop or something?15:05
iurygregorymasghar, CI is unhappy in py3 jobs and is hitting timed out without much explanation, so with rpittau patch hopefully we will be able to identify tests that are taking too long15:05
masghariurygregory: ooh okay. Makes more sense!15:06
iurygregoryTheJulia, are you having trouble with devstack in CI or locally?15:08
TheJuliauhh, I've been so slammed with presentations, people messaging me with stuff, and people seeking ad-hoc help on issues, I've not done too much code wise this week15:09
TheJuliaI did run tests yesterday but had zero issues15:09
iurygregoryrunning locally the tests are working fine15:10
iurygregorybut this issue started to show since Jun 05 the timed_out one15:10
TheJulia... last eventlet was january...15:11
TheJuliaoh well15:11
TheJuliaCI makes sense if new package release because we may have older copies of $things locally15:11
TheJuliaI +2'ed dmitry's sleep change15:13
rpittaummmm wondering why we don't import the same env variables in cover tests15:26
rpittauwell not important now, I'll add  that to a different patch15:27
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537215:28
rpittau^ should be ready15:28
iurygregoryack15:28
opendevreviewRiccardo Pittau proposed openstack/ironic master: Add test timout to tox config  https://review.opendev.org/c/openstack/ironic/+/88537215:30
rpittaunow :P15:30
opendevreviewRiccardo Pittau proposed openstack/ironic master: Use tox env variables in coverage tests  https://review.opendev.org/c/openstack/ironic/+/88550715:32
dtantsurLong weekend starts here, see you on Monday and safe travels for those going to the Summit!15:35
rpittaudtantsur: enjoy! :)15:35
iurygregoryenjoy dtantsur o/15:35
opendevreviewMerged openstack/ironic master: Mock sleep in unit tests that rely on it  https://review.opendev.org/c/openstack/ironic/+/88549715:37
rpittausee ya tomorrow ironicers! o/15:38
iurygregorybye rpittau o/15:41
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508715:45
TheJuliai like nova advertised their etherpad15:47
TheJuliafor the ptg15:47
JayFwe don't really have one other than the ironic-openinfra-2023 which I already linked on the ML15:48
JayFoooh, like for the forum15:48
JayFthat's a good idea15:48
opendevreviewJulia Kreger proposed openstack/ironic stable/train: Fix Cinder Integration fallout from CVE-2023-2088  https://review.opendev.org/c/openstack/ironic/+/88506516:15
JayFTheJulia: so I found out something interesting the other day: apparently T/U/V/W fixes for Cinder and Nova were all done *downstream*16:20
JayFTheJulia: which sounded weird given we're also experiencing breakage back further than that16:20
iurygregoryoh wow16:23
JayFbecause that's some of the impetus for Cinder retiring old EM branches now16:23
JayFbecause if they couldn't patch them for this nasty vuln, why do they exist16:23
iurygregory*magic*16:24
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508716:28
iurygregoryI need to say, I love ZUUL!16:28
iurygregoryhttps://review.opendev.org/c/openstack/ironic/+/885507 timed_out16:29
JayFAre you fishing for karma?16:29
JayFnope, sarcasm16:29
iurygregory<crying>16:29
JayFit's listening to you Iury, you have to genuinely love Zuul for it to set you free ;) 16:29
iurygregoryyes!16:29
JayFTheJulia: iurygregory: FWIW I checked openstack/releases and openstack/requirements, looks like only recent things added, library wise, in CI should be oslo.messaging and tooz16:31
TheJuliawow, that is... saddening16:32
JayFI've said at least two sad things in the last ten minutes16:32
iurygregoryyeah I checked that also =( 16:32
JayFwhich one has saddened you?16:32
TheJuliayes16:32
* TheJulia nods emphatically16:33
clarkbre the timeouts I note that oslo test only does a gentle timeout. You may need the non gentle version. Also, other approaches that can be taken: grab the subunit file for timed out jobs and see which tests haven't reported back. In theory one of those is the problem. You can also run the test suite in a loop locally to see if you can reproduce the error. Though it may be16:44
clarkbtiming specific16:44
iurygregoryclarkb, looking at the job with timed_out I don't see the subunit file there .-.16:53
iurygregoryhttps://zuul.opendev.org/t/openstack/build/568021ced3a64caaa6bc0a235d4f31c3/logs16:53
iurygregoryor is this tmp_69f9zxx?16:53
clarkbyes that file16:53
iurygregoryack, ty16:54
clarkbwhen the test is running the subunit content is written to a temp file then when it completes it gets moved to its final destination. When the test suite crashes or is forcefully stopped that temp file isn't moved16:54
iurygregorygotcha16:54
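
The "which tests haven't reported back" idea can be sketched as a set difference. This assumes the two lists of test IDs have already been extracted, for example one from a full test listing and one from the partial subunit stream; the extraction itself is not shown, and `unreported_tests` is a hypothetical helper.

```python
def unreported_tests(expected_ids, reported_ids):
    """Return the sorted test IDs that were expected but never reported."""
    reported = set(reported_ids)
    return sorted(t for t in set(expected_ids) if t not in reported)


# Illustrative IDs only, not taken from a real job.
expected = ["pkg.test_a", "pkg.test_b", "pkg.test_c"]
reported = ["pkg.test_a", "pkg.test_c"]
suspects = unreported_tests(expected, reported)
```

With parallel workers, several tests may be in flight when the job is killed, so the result is a list of suspects rather than a single culprit.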
opendevreviewJulia Kreger proposed openstack/ironic master: DNM Enable OVN  https://review.opendev.org/c/openstack/ironic/+/88508717:40
opendevreviewJulia Kreger proposed openstack/ironic master: Permit Ironic to notify IPA it can support MD5  https://review.opendev.org/c/openstack/ironic/+/88216818:06
opendevreviewMerged openstack/ironic-tempest-plugin master: rbac - Fix vif_attach expected return values  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/88303218:11
iurygregoryif anyone has time for review, I'd appreciate feedback on https://review.opendev.org/c/openstack/ironic/+/883062 and https://review.opendev.org/c/openstack/ironic/+/885276/ o/18:13
JayFah, I skipped yesterday because of V-118:14
JayFbut I need to stop filtering on that when our CI is going batty18:14
iurygregoryyeah =( these timed_out are a pain18:15
iurygregoryleaving for my physiotherapy session, will continue working at night18:17
TheJuliais it just the tests with sleep or is it more tests?18:17
iurygregoryI saw it getting stuck in different tests18:19
iurygregoryhttps://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py39&project=openstack%2Fironic&result=TIMED_OUT&skip=0 18:21
iurygregoryhttps://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py38&project=openstack%2Fironic&result=TIMED_OUT&skip=018:21
iurygregoryhttps://zuul.opendev.org/t/openstack/builds?job_name=openstack-tox-py310&project=openstack%2Fironic&result=TIMED_OUT&skip=018:21
JayFdo we run unit tests in parallel by default?18:21
JayFIf so, I wonder if going single-threaded would expose anything18:21
iurygregoryyeah, I think by default we run in parallel18:23
ashinclouds[m]8 runners with parallel execution inside of them18:33
JayFwould be interesting to see if these repro serially18:36
TheJuliaI doubt they would, tbh.18:48
JayFyeah, same18:49
JayFI think I'm going to just start running tox in a loop on my computer, see if I can get one to hang lol18:49
TheJuliaI wonder if it would need to be inside a VM…18:50
TheJuliaSpecifically because addressable memory cache behaves differently in a VM18:51
TheJuliaWhich cascades out in some fun cases18:51
JayFthat would be a touch more difficult for my current setup, but I could make it happen18:53
* JayF is very close to setting up a dedicated VM host in his house18:53
JayFI wonder if we should just ask them to hold a node, and we can try to repro on that specific node19:03
JayFinstead of chasing environments19:03
JayFidk19:03
TheJulia++++++++19:05
JayFmaybe I'll get "lucky", but I just kicked off a run of 50 in a row locally since I'm going to go get some lunch lol19:09
JayF20 clean runs :|19:30
ydehi again19:54
ydei'm hitting an issue i don't know how to fix: when i power up unknown nodes, the ironic dnsmasq service ignores the mac address19:55
ydeand i can't enter "discovery" mode with auto inspection. i know from a previous deployment that this is totally feasible, but i can't remember which setup i have to make so that dnsmasq doesn't ignore unknown mac addresses19:56
TheJuliaGreetings19:57
ydedoes it ring a bell to someone? i would really appreciate it cause i'm quite lost... i even asked chatgpt;)20:01
clarkbI don't mind holding a node but I consider it a pretty big bug that ironic removed test case timeouts20:05
clarkbI'm like 99% sure that test case timeouts would identify the problem quickly but they were removed and now you timeout the entire job20:05
clarkbI added those timeouts across openstack almost a decade ago because they are pretty important in a CI environment like ours20:05
clarkbbasically lets not forego fixing systematic issues if we identify the specific cause more quickly20:06
ydethere is this file which must be setup by the conductor or inspector dhcp-hostsdir/unknown_hosts_filter 20:11
yde*:*:*:*:*:*,ignore20:11
ydei don't know how to disable it20:12
ydeoops. found my mistake... i overrode the conf in kolla-ansible in the ironic.conf file, not in inspector.conf.20:19
JayFclarkb: none of us working on Ironic now, for the most part, were here a decade ago fwiw. I don't know why those were removed, but I'm happy to go down the path to check out why they were removed+maybe readd them.20:26
* JayF is getting close, but late 2004 is still only 8.5y20:27
clarkbJayF: I found the change that did it. It replaced it with the oslo test stuff but didn't enable it.20:27
clarkbOr at least that appears to be what happened. I don't think it was entirely intentional more just that we shouldn't continue to ignore it since it is a very useful tool when you have automated testing20:27
clarkbthe oslo test class may also be flawed in that it only does a graceful timeout as well...20:28
opendevreviewJay Faulkner proposed openstack/ironic master: Add additional logging on iLO power failure  https://review.opendev.org/c/openstack/ironic/+/88554920:28
clarkbI think the graceful timeout cannot trigger if eventlet has gone out to lunch for example20:29
clarkb(because nothing ever handles the signal handler? I seem to recall this was a problem at one time)20:29
JayFI am split attention right now so I can't look in depth, but would you mind putting a bug in launchpad about this and assigning it to me?20:31
JayFif you can't now, I'll try to remember20:31
clarkba bug to reenable timeouts? rpittau already has a change for it. I was hoping it would get merged as the first step in debugging this20:32
JayFthat change did not appear to effectively enable timeouts, I don't think20:33
clarkbI think it did? there was a whole set of failed jobs when the value was set to 120:33
clarkbya patchset 1120:36
clarkbpatchset 10 will be worth revisiting if the gentle timeout is insufficient for this specific issue (because you can flip it to gentle=False)20:37
clarkbpatchset 4 had a job timeout (the only one?) and the 15 second test case timeout doesn't seem to have worked there so ya gentle=False may be required20:44
clarkbthese extra tests ran in the patchset 4 py39 job but did not run before the py310 job timed out https://paste.opendev.org/show/bPbFfzlyhZfXS1SPe4dJ/21:09
clarkbI sorted them to make diffing easier so don't read into the order there. But I do notice that db tests are included (I suspected them when rpittau first asked questions in the infra channel)21:09
clarkbanother approach may be to have stestr record the tests in order for each worker before beginning the tests (I don't know if it can do that)21:10
clarkbbut then you'd be able to see which test was supposed to run next21:10
opendevreviewJay Faulkner proposed openstack/ironic master: DO NOT MERGE: See if Unit tests fail more interestingly without wal  https://review.opendev.org/c/openstack/ironic/+/88555021:13
JayFdtantsur: iurygregory: ^ Just doing some science, seeing if disabling WAL will give us a more meaningful error21:13
TheJuliablah... I've been so busy I've barely looked at IRC today21:14
TheJuliayde: yeah, that all gets set as inspector configuration 21:14
opendevreviewClark Boylan proposed openstack/ironic master: DNM terrible unittest bisecting  https://review.opendev.org/c/openstack/ironic/+/88555221:50
clarkbJayF: ^ thats a thing that is a mega hack21:51
clarkbbut illustrates how to use the CI system and speculative testing to your advantage21:51
clarkbheh I think that shows your db tests cannot run in isolation?21:59
clarkb10 of them anyway21:59
TheJuliaWe can't disjoint them if that is what you're saying, the db layer needs to be loaded22:01
clarkbTheJulia: I did `stestr run --slowest ironic.tests.unit.db` which I would expect to be fine22:02
clarkbeven if something needs to be loaded each test should be responsible for loading its dependencies so they can run in isolation22:03
TheJuliathat is true22:03
TheJuliaJayF: I can look at unit testing with you tomorrow22:03
JayFclarkb: thanks for the examples, this is going to be my focus tomorrow unless someone fixes it while I'm not here :D 22:04
JayF(those are my favorite kind of fixes, the kind that appear overnight while the europeans trounce about in the stack :D )22:04
clarkbbasically I'm trying to go through the list of tests that didn't run in a timed out job: https://paste.opendev.org/show/bPbFfzlyhZfXS1SPe4dJ/ and see if any of those result in a timeout but its a bit of a non starter if the tests don't even run in isolation22:04
clarkbI'm extra suspicious of the db tests now because they seem to require external side effects to run at all22:05
JayFhttps://github.com/openstack/ironic/blob/master/tools/test-setup.sh is your method somehow skipping over running this?22:07
JayFHmm. That's just a helper for us, not run in CI.22:08
JayFI assume we just get that as part of CI setup then?22:08
opendevreviewClark Boylan proposed openstack/ironic master: DNM terrible unittest bisecting  https://review.opendev.org/c/openstack/ironic/+/88555222:08
clarkbJayF: that is run in CI22:09
JayFO dpm22:09
JayF**I don't see it referenced in ironic in git22:09
JayFor at least I should say; `git grep test-setup.sh` does not bring up anything in our test setup22:09
clarkbhttps://zuul.opendev.org/t/openstack/build/996a1102472d423396355a121889c839/log/job-output.txt#478 you can see it run there22:09
clarkbya its part of the parent job setup that is common to openstack (maybe all of the python jobs?)22:09
JayFthat's what I assumed, glad to have that confirmed then22:10
clarkbthe migration tests should only run if a mysql/postgres/mariadb is present so I think it is detecting the DB is there but then failing on something else22:10
JayFyeah, ack22:11
JayFI need to look at this with a full brain, not the leftovers on the dangling end of my day22:11
JayFI'm going to pick this up 7am my time tomorrow :) (about 16.75 hours from now :P)22:11
clarkbok I've gone ahead and started bisecting the list of tests that didn't run further. So newer patchsets won't have the db stuff in it22:12
JayFfwiw that job testing disabling WAL did not impact the shape of the failure; not that surprising22:19
TheJuliainteresting, this aiui started on the 5th22:26
TheJuliaand we basically only merged a warning addition on the 5th :\22:28
TheJuliai guess I'm lamenting there is no "silver bullet"22:28
JayFTheJulia: if you have an early failing example, and we have a late success example, we can compare pip freeze output22:31
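
A quick sketch of that comparison, assuming the two `pip freeze` outputs have been saved as text. `parse_freeze` and `diff_freeze` are hypothetical helpers, and the sketch only understands plain `name==version` lines.

```python
def parse_freeze(text):
    """Map package name to pinned version for 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip comments, blanks, and non-pinned entries
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def diff_freeze(old_text, new_text):
    """Return {name: (old_version, new_version)} for every difference."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    return {name: (old.get(name), new.get(name))
            for name in sorted(set(old) | set(new))
            if old.get(name) != new.get(name)}


# Illustrative input; "examplepkg" and "newpkg" are made-up names.
passing = "tox==4.5.2\nexamplepkg==1.0\n"
failing = "tox==4.6.0\nexamplepkg==1.0\nnewpkg==2.0\n"
changes = diff_freeze(passing, failing)
```

A `None` on either side of a pair means the package was added or removed between the two runs.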
clarkbthe db fixture stuff uses testresources to avoid recreating a new db schema for each test that needs one. My hunch is that some other test class is providing necessary info to make this happen and then db migrations are able to piggy back off of that but unable to bootstrap things themselves22:32
JayFoh no22:36
JayFhttps://pypi.org/project/tox/#history22:36
opendevreviewClark Boylan proposed openstack/ironic master: DNM terrible unittest bisecting  https://review.opendev.org/c/openstack/ironic/+/88555222:37
JayFclarkb: I'll note ^ tox release just happened to coincide with our failures beginning22:38
* JayF wonders if clarkb has /ignore *tox* at this point :P22:38
JayFtests are using 4.6.022:39
JayFit's impossible to even test this in our CI, isn't it? because tox will just yolo-install newer tox22:39
JayFalthough tox 4.6.0 on python 3.11 does have tests passing for me locally, this is still highly suspect22:40
JayFthe diff makes me think this is a red herring, but the timing is still perfect https://github.com/tox-dev/tox/compare/4.5.2...4.6.022:42
clarkbJayF: I mean I use nox22:46
clarkbtox yolo installing a newer tox happens if you have tox setup requires or whatever it is called22:46
clarkbI don't see that in ironic's tox.ini22:46
JayFensure-tox in zuul is installing 4.6.0 in your tests22:47
JayFper my spot check like, 10 minutes ago22:47
clarkbyes you'd have to override that somewhere22:47
JayFlooking at the diff, I feel pretty safe thinking it's not tox22:47
JayFor at least, not something unique enough that it wouldn't blow the whole stack instead of just ironic :)22:47
clarkbJayF: are you looking at something other than the job timeouts?22:47
clarkbbecause the job timeouts occur in the unittests not tox I think (though I guess it could be tox)22:48
JayFI'm looking from a different angle for right now; basically trying to answer the "What changed June 5?" question22:48
clarkbah22:48
clarkbI think it is unlikely for tox to be the problem here because fewer tests are running in the timeout case; it's not like all tests finish and then tox fails to exit appropriately. But I guess I can't rule it out completely since tox has been known to do weird things to the runtime22:48
JayFsame; but I was at the end of that thread and thought "what the hell" and it lined up22:49
clarkbMy brain is melting a bit against the oslo db test fixture stuff. the complete lack of logging for magical behaviors is less than ideal22:49
JayFafter checking things more likely to be to blame, I mean22:49
clarkbit should be checking for a mysql server and trying to connect to it and if that works adding it to the list of available resources. Similar with postgres22:49
JayFI'm going to approach it more similarly to you tomorrow, but bluntly, my brain is burned and I'm trying to squeeze usefulness out of it until my EOD :)22:50
clarkbbut somewhere along the line that is breaking and there is no logging and I'm not going to spin this up locally so that I can pdb it or hack in some logging22:50
JayFI don't think any human has repro'd the job failures locally at all22:50
JayFI certainly haven't, not for lack of trying22:50
JayFI really think we'll find it's something inherent to the CI setup, but until I find a smoking gun that's just me trying to blame someone not-me for my problems LOL 22:50
clarkbya I mean where I'm ending up is that the test suite isn't really built for humans to run it locally or in smaller subsets of tests right now unfortunately :/22:51
clarkbwhich I guess shouldn't be surprising because we only enforce that the entire thing runs with the gate22:52
JayFsometimes "if it's not tested in CI, it's broken" is as much a self-fulfilling prophecy as it is a valid adage22:53
opendevreviewClark Boylan proposed openstack/ironic master: DNM terrible unittest bisecting  https://review.opendev.org/c/openstack/ironic/+/88555222:54
clarkbI think https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/provision.py#L255 is buggy because _no_engine_reason is only set immediately before an exception is raised, so we can only hit that line of code without that value being set, which results in AttributeError: 'Backend' object has no attribute '_no_engine_reason'23:03
clarkbya that should have a catch all Exception handler not just for Backend Not available...23:04
clarkbWhatever the underlying issue is is completely hidden from us gg23:04
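
The failure pattern described here can be illustrated with a toy example. This is not the oslo.db code: `FragileBackend`, `SaferBackend` and `BackendNotAvailable` are hypothetical stand-ins that only show how an attribute set in a single except branch can turn the real error into an AttributeError, and how a catch-all handler plus a default value keeps the reason visible.

```python
class BackendNotAvailable(Exception):
    """Hypothetical stand-in for a 'backend unavailable' error."""


class FragileBackend:
    def __init__(self):
        self.engine = None  # note: _no_engine_reason is never initialised

    def verify(self, connect):
        try:
            self.engine = connect()
        except BackendNotAvailable as exc:
            self._no_engine_reason = str(exc)
        # any other exception type propagates and leaves the attribute unset

    def reason(self):
        return self._no_engine_reason  # AttributeError if it was never set


class SaferBackend(FragileBackend):
    def __init__(self):
        super().__init__()
        self._no_engine_reason = "backend was never verified"

    def verify(self, connect):
        try:
            self.engine = connect()
        except Exception as exc:  # catch-all, as suggested in the log
            self._no_engine_reason = f"{type(exc).__name__}: {exc}"
```

In the fragile version the caller sees only the AttributeError; in the safer version `reason()` always returns something that points at the underlying problem.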
opendevreviewClark Boylan proposed openstack/ironic master: DNM terrible unittest bisecting  https://review.opendev.org/c/openstack/ironic/+/88555223:13
