Wednesday, 2023-06-28

TheJuliadoing some runs with 12 which is 1.5x my cores, seeing some stuff, but I'm also running with --parallel-class 00:00
stevebaker[m]like dial concurrency down for reliability?00:12
JayFDial it up to find and fix more races!00:12
JayFIt should be safe00:12
TheJuliathe higher we go, the harder it will be to pin down00:13
TheJuliaunless it is purely cpu pressure00:13
JayFI was thinking tomorrow I might write significant debug logging into those migration tests00:14
TheJuliawe likely need to spit out logging on each test start00:14
JayFNormally with this kind of deadlock I would worry about the logger never getting a chance to print the message, but since it's waiting on the database I think it should make it to the disk.00:14
TheJuliawith test class name00:15
JayFYeah, I have a few ideas that are specific to test_migrations.py00:15
TheJuliawe also prevent buffered writes00:15
TheJuliaintentionally00:15
JayFI haven't thought of anything that would be more universal than that00:15
TheJuliaadding --parallel-class might make the db stuff just go away (or happen more often!)00:16
TheJuliato the stestr run00:16
JayFI think from reading the description of that, that I was working under the assumption that's how it worked already00:18
TheJuliaby default it spreads it out00:19
TheJuliarandomly!00:19
* TheJulia expects people to scream with many curse wrods00:19
TheJuliawords00:19
JayFI guess for purposes of the migration tests, they still work the way I expect because it's one big test running all of those little methods serially 00:20
TheJuliathe primary ones, yeah00:20
TheJuliaand the risk of stomping there is super high00:21
JayFI'm on board for mitigation as long as we're certain we're not actually reducing our ability to test the code. I'd be happy to test our tests less and test our code more.00:24
JayFRight now I don't think we've actually found any primary code path broken, it's all just been tests, yeah?00:24
JayFAll right, sandwiches are here for dinner, I'm going to get off of IRC for the evening. o/00:25
TheJuliathe tests, interestingly, do seem to execute with a decent speed on my machine with 6 runners, only seen anything crazy with 14 or more00:56
opendevreviewMerged openstack/ironic master: Disable WAL Pragma for Unit Testing  https://review.opendev.org/c/openstack/ironic/+/88686501:24
opendevreviewMerged openstack/ironic master: Handle SAWarning around allocations FK Constratins  https://review.opendev.org/c/openstack/ironic/+/88687101:26
TheJulia*gasp*01:28
TheJuliaTwo in a row!01:28
iurygregory\o/03:05
* iurygregory dances03:05
rpittaugood morning ironic! o/07:11
rpittauI'm going to increase the timeout in https://review.opendev.org/c/openstack/ironic/+/886985 for all tests to 60 as the test walkers can take longer to finish, unless we want to completely change the TestWalkVersions class07:30
opendevreviewRiccardo Pittau proposed openstack/ironic master: CI: Change migrations timeout to be >60 seoncds  https://review.opendev.org/c/openstack/ironic/+/88698507:31
Nisha_Agarwalmorning ironic! 08:54
Nisha_Agarwalrpittau, o/08:54
rpittauhey Nisha_Agarwal :)08:56
Nisha_Agarwalrpittau, we were looking for an update on bug https://bugs.launchpad.net/proliantutils/+bug/2021995.....we still need to try to reproduce the issue but if we know the iLO version it will help us to reproduce and even initaite discussion with firmware team08:58
rpittauNisha_Agarwal: I think that is entirely in JayF jands :)08:59
rpittau*hands08:59
Nisha_Agarwalhmm09:02
opendevreviewHuy Mai proposed openstack/sushy-tools master: Add fake_ipa inspection, lookup and heartbeater to fake system  https://review.opendev.org/c/openstack/sushy-tools/+/87536609:38
opendevreviewVerification of a change to openstack/ironic master failed: Revert "Fix IRONIC_IMAGE_NAME=non-existent-image"  https://review.opendev.org/c/openstack/ironic/+/88696011:17
iurygregorygood morning Ironic11:51
iurygregorygerrit is down?11:55
dtantsuriurygregory: seems to work for me11:56
iurygregory.-. I'm getting Timed Out on my browser .-.11:56
iurygregorytime to clean the cookies, close the browser and see if works11:57
iurygregoryok, sometimes it works and sometimes it doesn't lol 12:11
iurygregoryok, probably something wrong with my ISP .-.12:33
TheJuliagood morning13:17
TheJuliascottsol: you might be able to provide clarity for Nisha_Agarwal 13:18
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: Test class based test distribution  https://review.opendev.org/c/openstack/ironic/+/88716613:30
TheJuliaiurygregory: looks like tests on https://review.opendev.org/c/openstack/ironic/+/883062 fail legitimately13:44
iurygregoryTheJulia, yeah =)13:44
iurygregoryI'm updating things locally13:44
TheJuliak13:44
iurygregoryand trying to address dtantsur comments, but gerrit is not helping too much today -.-'13:45
TheJuliaheh13:45
iurygregoryI'm looking at gerrit via my smartphone using 4g, because my wifi doesn't let me access LOL13:45
TheJuliarpittau: what is your take on CI at this point? I rechecked some of the same things you've rechecked since the FK constraint change merged and still had failures, although it looks like deadlock timeout style failures where *even* it looks like we got through test_migrations, at least at a glance13:46
TheJuliahmmmm, 12 executed13:47
TheJulia16 should have executed13:48
TheJuliaso still something, maybe13:48
rpittauTheJulia: I've increased the timeout in https://review.opendev.org/c/openstack/ironic/+/886985 because I realized that the TestWalkVersions class depended on that as it's based on oslo test base class, but I see it failed again for timeout13:50
rpittauthe tests should just fail if they pass the timeout value, but I'm wondering if SIGALARM is enough to stop them completely in this case13:50
TheJulianot in a deadlock state13:51
TheJuliasince the runner is sitting in epoll waiting for data on a socket13:51
TheJuliawhich the DB cannot give it13:51
TheJuliasomething is still hanging in all of that13:52
rpittauyeah13:53
rpittauthe timeout fixtures is useful until a certain point, unfortunately13:53
* TheJulia works on augmenting our logging14:06
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: Add more debugging to tie test class, test, state  https://review.opendev.org/c/openstack/ironic/+/88716814:16
TheJuliaso in the good news category, we're not handing any raw data back past through test_walk_versions14:16
TheJuliawell, an object14:16
NobodyCamGood Morning Ironic Folks happy hump day14:43
TheJuliagood morning!14:48
zorunhi there14:50
TheJuliagreetings14:50
zorungood afternoon for CEST folks :)14:51
zorunI'm trying to wrap my head again around networking-baremetal: what is it used for?14:51
zorunaccording to https://docs.openstack.org/ironic/latest/admin/multitenancy.html#configuring-the-networking-service it's used for "flat" networking14:51
zorunwhile NGS or model-specific ML plugins would be used for "neutron" networking14:52
TheJuliait does sort of two things14:52
TheJuliaIt is required to update the physnet mapping data in neutron14:52
zorunbut I remember that there was a use-case where both plugins are used14:52
TheJuliait also has some openconfig/netconfig functionality under development14:52
TheJuliato kind of replace networking-generic-switch, but the reality is it may never do that because of how vendors operate14:53
TheJuliaand configuraiton comfort levels14:53
TheJuliahjensas: we should likely clarify some docs14:53
zorunTheJulia: so the first part is where the networking-baremetal agent pulls ports state from Ironic, and pushes it to Neutron?14:53
TheJuliabasically yes14:54
TheJuliaspecifically for the physnet information14:54
zorunso, when you create a baremetal port, you have to specify a physnet14:57
zorunwhat does the physnet mapping map to?14:57
TheJuliaa neutron physnet14:58
TheJuliabeing a physical fabric14:58
zorunah, is it a mapping from node to physnet?14:59
TheJuliayes, think of it as if you had multiple distinct physical network fabrics15:00
TheJuliaThe interactions need to know which things are plugged into15:00
TheJuliaotherwise you can end up in some awesome error states15:00
zorunthanks for the explanation, but I feel I'm missing a key piece of the puzzle15:06
zorundoes Ironic manages the network ports of nodes internally, independently from Neutron?15:06
TheJuliaI'm definitely not an expert in that area, but if you have additional questions, i might be able to point you in the right direction15:07
TheJuliaso,  think of ironic as the holder of the knowledge of the physical machine (i.e., mac address, and where things are plugged in)15:07
TheJuliawe use neutron largely for dhcp, and the actual sdn/switch integration15:07
opendevreviewVerification of a change to openstack/ironic master failed: Use tox env variables in coverage tests  https://review.opendev.org/c/openstack/ironic/+/88550715:09
zorunso typically, you would create an "ironic port" with the local-link-connection information only once, when you initially enroll the node?15:09
TheJuliacorrect15:10
zorunand whenever a user wants a baremetal node, they (or probably) would create a Neutron port, which would then be dynamically "glued" to this ironic port?15:10
zorunthanks, it's starting to make sense15:10
zorunprobably *Nova15:10
TheJuliaideally with physnet information if your needing it, since the logical how humans have assembled things are not visible at that layer15:10
TheJuliawell, that being local link connection information15:11
TheJuliait can typically be discovered through introspection, fwiw15:11
TheJuliayes, nova or a baremetal user directly creates a neutron port, we call this a vif, and then ironic requests the port to be plugged in neutron. When we do this we ship a bunch of information up to neturon15:12
TheJulianeutron either succeeds or fails based upon the data, the physnet, and everything else15:12
zorunso you copy the physnet and the local-link-connection data to the VIF port?15:13
TheJuliawhen the bind occurs, I think we send local-link-connection info15:13
TheJuliaphysnet don't remember exactly15:13
zorunok, thanks15:14
zorunnow I understand better why you possibly wanted Ironic and NGS to communicate directly instead of going through Neutron15:14
TheJuliabecause a lot of folks don't want to run neutron15:15
zorunafter looking at the code, networking-baremetal and networking-generic-switch are somewhat redundant (at least for the physical device reconfiguration part)15:53
zorunthey both implement the ML2 mechanism driver API to catch network and port creation/deletion/binding15:54
zorunand they both have some idea of internal "drivers"15:54
zorunnetworking-baremetal only has a netconf/openconfig driver15:55
zorunnetworking-generic-switch only has a SSH-based netmiko driver (with many supported vendors and models)15:55
zorunbut both seem to be easily extendable by adding additional drivers (again, this is completely internal, nothing related to Neutron)15:56
rpittaugood night! o/15:59
JayFo/16:00
zorunfunny, both projects were initially started by the same person?!16:00
dtantsurthey had initially very different goals16:01
dtantsurthe netconf/openconfig driver was a much later addition16:01
TheJuliaAlso more forward looking, but networking moves at its own speed16:02
zorunyes, it seems the original use-case was to use networking-baremetal for flat networks, and networking-generic-switch for tenant "neutron" networks16:03
TheJuliaahh, yeah, on flat ports nothing says done so neutron automatically thinks it failed16:08
TheJuliathen physnet mapping went in16:08
TheJuliaand the rest is history16:08
zorunfrom my (limited) perspective, it seems that netconf support would have been a better fit for NGS16:14
zorunwe have possible plans to extend NGS beyond netmiko devices, by using REST APIs provided by recent Aruba switches16:15
zorunthat's still a long way off though16:15
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Fix PXE Ananconda cleanup test  https://review.opendev.org/c/openstack/ironic/+/88719816:39
TheJuliazorun: yeah, I think one aspect is comfort of access rights... becuase I think netconf it is all or nothing, but with ssh there is ability to restrict based upon user role16:44
TheJuliaas indicated by the AAA service16:44
TheJuliawe see a variety of comfort or discomfort with touching network infrastructure16:44
TheJuliait is... really... weird.16:44
JayFwe broke the wall between dev and ops16:44
JayFto create the DEVOP16:45
JayFnext up: the NETDEVOP16:45
JayFthe CCNAs hear footsteps and fight the coming future ;) 16:45
TheJuliaSuddenly the door explodes open with the lawyers and accountants demanding separation of duties16:45
TheJuliaa fight ensues, who will win! News at 11!16:45
JayFit's OK, we have someone sitting in a box labelled NOC who has to push the button that says 'needful' four times a minute to actually push the operational changes16:46
zorunat some point, eBPF engineers could replace networking engineers, and almost everything will run on Linux :)16:46
zorunof course the physical layer is the difficult part here16:46
TheJuliaJayF: ahh, I met with someone who indicated that if that button has to be pushed, it is a 2 month process16:46
zorunwell, many switches actually run Linux already16:47
JayFTheJulia: you didn't just run past them and dramatically dive for the button? :D 16:49
zorunTheJulia: networking-baremetal does Netconf over SSH. I guess it *could* be possible to restrict access rights (but I don't know enough about netconf)16:49
JayFnetconf itself doesn't have the access controls you need, whereas traditional ssh connections do in a lot of switches16:49
JayFAIUI from user reports16:50
zorunI like the ideas behind Netconf, it's just the vendor implementations that currently suck16:50
zorunJayF: oh, I see. I guess in 10-20 years Netconf vendor implementations will reach feature parity16:51
TheJuliaJayF: I was in a secure floor on a conference room... bypassing process would have been very bad.16:51
TheJuliazorun: nor do I, and it could still be vendor specific, like some devices out there have no AAA concepts... really.16:52
opendevreviewVerification of a change to openstack/ironic master failed: Use jammy for base jobs  https://review.opendev.org/c/openstack/ironic/+/86905216:54
dtantsurTheJulia: hi, a quick question. I'm adding missing tests for the new inspection API, copy pasting the RBAC pattern resulted in "Policy baremetal:node:ipa_continue_inspection has not been registered". Do you know why?16:55
TheJuliasounds like your test class/file or code tests are not loading the code policies16:56
TheJuliafrom ironic/common/policy.py16:56
TheJuliaat least, that is my guess16:56
dtantsurTheJulia: it's really a copy of the already working TestHeartbeatScopedRBAC..16:57
zorunI'm off for the evening, thanks for the interesting discussion TheJulia16:59
TheJuliain the same file then...17:00
TheJuliadtantsur: I guess I'd need to look at the the current state17:01
dtantsurI can post the WIP patch, maybe you notice something?17:02
TheJuliasure17:02
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Add missing unit tests for /continue_inspection  https://review.opendev.org/c/openstack/ironic/+/88720217:02
dtantsurTheJulia: ^^17:02
TheJuliaso you need to check your policy names :(17:05
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Add DB API for Firmware and Object  https://review.opendev.org/c/openstack/ironic/+/88306217:05
TheJuliabaremetal:driver:ipa_continue_introspection is in one place, and baremetal:node:ipa_continue_introspection is in another17:05
TheJulia... in the code :(17:05
TheJuliathat first place is the policy definition17:05
dtantsurI may be blind, but grepping continue_intro doesn't give anything17:06
dtantsurIt looks to me that I have the same text..17:06
TheJuliasorry ipa_continue_inspection17:06
TheJuliaI'm typing it out based on memory since I've got a failed test job log in by paste buffer which may yield an understanding of *where* we're blowing up in test_migraitons17:07
dtantsurahhhh, I have node:continue_inspection in one place and driver:continue_inspection in the other17:08
dtantsurdamn17:08
dtantsurthank you, you led me to the right thought17:08
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Firmware Interface  https://review.opendev.org/c/openstack/ironic/+/88527617:09
TheJuliawell, we crashed it seems in _pre_upgrade_e918ff30eb4217:13
JayFthat looks broken17:17
JayFhttps://github.com/openstack/ironic/blob/master/ironic/tests/unit/db/sqlalchemy/test_migrations.py#L138717:18
JayFwe are expecting a DB error17:18
JayFthen continue in the *same txn* to insert good data17:18
JayFshouldn't the expected fail and the expected success be in separate cxtmgr, and separate txns?17:18
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Correct the policy name for /continue_inspection  https://review.opendev.org/c/openstack/ironic/+/88720217:18
TheJuliawell, you can fail in a transaction and retry17:18
TheJuliathe transaction still lives unless you have a database fatal sort of thing17:19
TheJuliaI guess we need some more data on if we fail there, or not17:29
TheJuliaseems like we have trust issues with the db, dunno17:33
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Fix PXE Ananconda cleanup test  https://review.opendev.org/c/openstack/ironic/+/88719817:57
TheJuliaThat has passed cleanly twice on my parallel class change, so I suspect it is likely good on master branch as well17:58
JayFwhere is your parallel class change? I can take a look18:00
JayFor you just mean locally, not proposed to CI yet?18:00
TheJuliaoh, it is just a stestr change18:00
TheJuliahttps://review.opendev.org/c/openstack/ironic/+/887168/1 is getting more data from the logs so we can pin down test_migrations stuffs. https://review.opendev.org/c/openstack/ironic/+/887166/1 is the stestr change18:01
TheJuliaI mentioned it yesterday18:01
JayFsorry I've not been well tuned in, looking at other things since yesterday18:02
TheJuliano worries18:02
* JayF still needs to get his CfP in for SeaGL and review john's sharding chain and revive+finish sharding ironicclient patch18:02
TheJuliathe anaconda change is a direct result of the parallel change as it happened to happen there, but I've seen it elsewhere18:04
TheJuliaI'm starting to feel like I'll be able to actually focus on non CI things today18:18
TheJuliaany thoughts on just merging something like https://review.opendev.org/c/openstack/ironic/+/887168/1/ironic/tests/unit/db/sqlalchemy/test_migrations.py so we can spot in the buffer output roughly where things went side ways in stdout, not relying upon the logging buffer18:24
JayFAre you sure if you use logging there it's less effective?18:24
JayFor are you assuming?18:24
TheJuliaassuming that it gets kicked to a buffer, where as we shouldn't be buffering stdout18:24
JayFI'm 100% OK with logs like that being in CI runs for test_migrations; I'm a little sketched out by it being print-statement-logging18:24
TheJuliabased upon run configuration18:24
JayFthat buffer shouldn't get stuck in an eventlet-patched unit test run, should it?18:25
TheJuliawe do have some print debug logging in the api for our own troubleshooting sanity18:25
TheJuliafwiw18:25
JayFwell, if the precent is already there18:25
JayFyolo18:25
TheJuliaw/r/t logging, no idea, but I don't htink we're doing anything with eventlet and the stestr run w/r/t logging18:26
* TheJulia wonders why openstack-tox-py38 gets queued18:26
JayFI asked Clark about that in #openstack-infra the other day18:29
JayFtl;dr our infra clouds sometimes just ... don't wanna build, have to get retried, etc18:29
JayFso you get weird jobs stuck queued or child jobs starting before parents18:29
JayFit'll be eventually consistent18:30
TheJuliafun!18:30
JayFbasically: zuul only guarantees correct ordering of the "build me a node" calls18:30
JayFand provides no guarantees jobs will actually /start/ in that order18:30
TheJuliaiurygregory looks like your latest change got bit by the run the migrations twice issue18:30
TheJuliaoooh ahh, looks like https://zuul.opendev.org/t/openstack/stream/10ec1f5893d848f4800c107df2f36016?logfile=console.log is frozen18:50
JayFthis is without the fun output, yeah18:51
TheJuliawell, *hopefully* we'll have some extra logging18:51
TheJuliaand hopefully that will give us an idea18:51
JayFwhen it finally goes kaput, you mean?18:51
TheJuliayes, if we remember18:51
JayFI wish I could forget :| lol18:52
opendevreviewMerged openstack/ironic master: CI: Fix PXE Ananconda cleanup test  https://review.opendev.org/c/openstack/ironic/+/88719818:53
TheJulia /me looks at https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_10e/887168/1/check/openstack-tox-py38/10ec1f5/testr_results.html and cries19:13
TheJuliapymysql.err.OperationalError: (3730, "Cannot drop table 'conductors' referenced by a foreign key constraint 'conductor_hardware_interfaces_ibfk_1' on table 'conductor_hardware_interfaces'.") seems like a solid hint20:20
JayFTheJulia: the other test was "table conductors already exists"20:28
JayFdo you think that's b/c the cleanup of the first failed20:28
JayFor is it possible that we're getting automatically created tables in the DB, with new schema, causing the migrations to freak20:29
TheJuliaabsolutely 20:29
TheJuliawe're attempting to create the tables, but we're in the same db20:29
JayFI'm surprised it failed sanely tbh20:29
JayFwe must have finally "won" the race to see an error instead of a deadlock 20:30
TheJuliai suspect so20:30
TheJuliaactually....20:30
TheJulia:q20:33
TheJuliaerr20:33
JayFnothing sus whatsoever in hardware_interface related migrations20:34
JayFand tests20:34
TheJuliaI don't think we're hanging there20:35
TheJuliaI *think* the issue is that we're failing teardown, and reusing the same database even tough with the OpportunisticDBFixture, we should be creating an entirely new db20:35
JayFthe problem I fall into with that line of thinking is that there's nothing substantially different about our migration tests than those in a couple other openstack projects I checked20:37
JayFso either we have the most complex DB w/full migrations history still checked in, and that causes problems (plausible) or it's not a bug in shared code20:37
TheJuliabut if sqlalchemy is struggling to sort out how to tear down our objects20:37
TheJuliabecause oslo.db trys to teardown objects as opposed to just drop the entire db...20:37
JayFI wonder if there's a belief that metadata.drop_all() *should* work as part of the tests20:39
JayFlike if it's an explicit decision to do that to test the teardown path20:39
JayFbut that seems ~useless? 20:39
TheJuliaif you end up dropping things in any order, you have to determine the teardown order properly20:39
TheJuliaoslo.db does build a list of foreign keys20:40
JayFand at the object layer, we enforce that20:40
TheJulialets see what it does with them20:40
JayFbut this is not using the object layer20:40
TheJuliahmmmmm if self.supports_drop_fk:20:40
JayFdid this change recently?20:42
TheJuliano20:42
JayFI feel like last time I looked at this code it was calling sqlalchemy metadata.drop_all()20:42
JayFthat has to be somewhere else20:42
TheJuliatrying to piece meal through this right now20:42
JayFokay I am sus of this now too20:43
JayFthat I realized where I misread before20:43
JayFI know why we're broken20:44
TheJulia????20:44
JayFfor the 5th time, maybe :D 20:44
TheJuliaheh20:44
JayFwe have a UniqueConstraint that's not an fk20:44
JayFguess where it is?20:44
JayFa UniqueConstraint that *contains* an FK but isn't one20:45
JayFhttps://github.com/openstack/ironic/blob/master/ironic/db/sqlalchemy/alembic/versions/2353895ecfae_add_conductor_hardware_interfaces_table.py#L4420:45
JayFso you can't drop the FK because of the constraint20:45
JayFright?20:45
JayFif conductor.id FK to conductor_id and conductor_id is in a uniqueconstraint20:46
JayFany test with conductor_hardware_interfaces populated is going to not tear down properly depending on how it's ordered20:46
TheJuliauhhhh hmmmm20:46
TheJuliathe fk and the unique constraint should apply independently20:46
TheJulia*should*20:46
JayFbecause removing the FK won't blank conductor_id20:48
JayFand removing the constraint won't touch the data20:48
JayFso the data remains sane and the FK+constraint should be able to be removed in any order20:48
TheJuliathe table can be dropped with a constraint, a fk cannot be, and fk is independent20:49
TheJuliaaiui20:49
JayFyeah, I think your reading of it is right20:50
JayFgotta look for explanation #6 ;) 20:50
TheJuliaheh20:50
JayFTheJulia: so I wonder if https://github.com/openstack/ironic/commit/76a920aed2b103d72552c38b579769dced8d2e22 is breaking drop_all_objects20:56
JayFTheJulia: the state that is repairing, not your fix20:56
JayFTheJulia: and we traverse that state when running migration tests, yeah? even still?20:57
JayFeh, I guess not20:57
JayFb/c you commented it out in old migrations20:57
TheJuliaheh20:57
JayFbut I could see a "loop" like this screwing up the FK things20:57
TheJuliayeah20:57
* JayF is rubber ducking in IRC and needs to ping you less while doing so20:57
TheJuliaheh20:57
JayFtalking to a person is better than screaming into the board lol20:58
JayF**void20:58
TheJuliaso interestingly enough, we have a different db name than what I expect based upon the oslodb test fixture code20:58
JayFare you reading current version or version that was running in that test?20:59
TheJuliacurrent, but the code hasn't changed in lke 6-9 years20:59
JayFaight20:59
JayFI almost wanna hook up migrations tests in a debugger just to see the code flow, but I suspect the effort:value ratio is wildly off on that21:01
TheJuliaoh, the non-fixture code did change 2 months ago21:03
* iurygregory is back21:03
iurygregoryTheJulia, yup just saw the failure on zuul, but the output of the failed log is weird21:03
TheJuliaeh, looks unrelated21:04
iurygregoryyeah, funny that is was only on the cover job21:05
TheJuliait has to do a full run to analyize the code coverage21:05
iurygregory(ง •̀_•́)ง StringException 21:09
iurygregorythe only thing I wish today is my browser opening https://review.opendev.org without hitting time out '-'21:15
JayFSome kind of internet routing issue over there?21:16
JayFit's worked for me fine all day 21:16
iurygregoryyeah =(21:16
iurygregorystill trying to figure out what is going on here21:17
TheJuliasounds like your isp has a caching proxy of evilness21:17
JayFflip off v6?21:18
JayFI like IPv6 but turning it off sometimes fixes ISP flakiness21:18
iurygregoryTheJulia, yeah21:18
iurygregoryJayF, my ISP has V6 off 21:18
JayFd'oh21:18
TheJuliahttps://meet.google.com/nqs-tiwa-hto?authuser=0 <-- call for spitballing/identifying the db issues21:18
JayFgive me a sec to grab headphones and I'll be there21:19
iurygregoryme too ^21:19
TheJuliak21:20
TheJuliahttps://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/test_fixtures.py21:30
opendevreviewJay Faulkner proposed openstack/ironic master: Drop databases in unit tests instead of contents  https://review.opendev.org/c/openstack/ironic/+/88723422:16
JayFI bet part of the reason ^ may not be in oslo.db already is b/c it requires the CI testing user to have create/drop privs on the DB22:21
JayF(our test-setup does, fwiw)22:22
JayFinteresting22:28
JayFhttps://zuul.opendev.org/t/openstack/stream/7833fbfcd9f94fcb96a92b5210545b14?logfile=console.log looks to be frozen up22:28
JayFI wonder if drop blocks on open connections, too?22:28
TheJuliaokay, I'm back22:29
JayFI think it improved the failure case, at least22:29
TheJuliauhhhhhhhh22:29
JayFthe fixture timeout caught the freeze and continued22:29
TheJuliayes, it *should* actually22:29
JayF2 failures, both fixture timeouts with no useful data 22:30
JayFother than we know the two it failed on22:30
JayFTheJulia: also is that patch what you expected?22:30
JayFI somehow think you had something more elegant in mind LOL22:30
TheJuliawhere your change froze?22:30
JayFno, I mean https://review.opendev.org/c/openstack/ironic/+/887234/1/ironic/db/sqlalchemy/__init__.py22:30
TheJulialooking22:30
JayFliterally monkeypatching it in22:30
JayFironic.tests.unit.db.sqlalchemy.test_migrations.ModelsMigrationsSyncMysql.test_models_sync22:31
TheJuliaYeah, I guess that works :)22:31
JayFironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_upgrade_and_version22:31
JayFare the two that failed with timeouts22:31
TheJuliaI was thinking from the class inside test_migrations22:31
TheJulia... did the timeout change merge yet22:31
JayFhttps://review.opendev.org/c/openstack/ironic/+/885507 no, v-222:32
JayFshould I stack on it?22:32
JayFI mean, it V-2 with a TIMED_OUT, two of them22:32
TheJuliamight be worth trying to stack on top of it22:33
JayFI'm really, really suspicous of the added timeouts causing some kind of extra problem22:33
TheJuliaoh, it absolutely is22:33
TheJuliathe value is so low we blow up on okay runs22:33
JayFI kinda wanna stack on a change to *remove all timeout fixtures*22:34
JayFah okay22:34
opendevreviewJay Faulkner proposed openstack/ironic master: Drop databases in unit tests instead of contents  https://review.opendev.org/c/openstack/ironic/+/88723422:34
JayFhonestly the only reason I'm sus is just because it puts eventlet in my traceback22:34
JayFand when eventlet is in my traceback I'm always sus22:34
JayFwhich isn't really fair or technical of me22:34
TheJuliaI... feel like we should go ahead on the path of the class parallel execution, so we can at least isolate blast damage down to specific class behaviors too22:40
JayFI want to see how my change, with the timeouts fix, works22:41
JayFif it fixes it, then I think the path is to add a unit test that sets up our DB on mysql then calls self.meta.drop_all() or that drop_all_objects() method22:42
JayF(the /original/ drop_all_objects method, to be clear)22:42
JayFor really, maybe we merge one that looks more like your idea so ^^^ is a saner next step22:42
JayFmetadata.drop_all I mean, like from sqla22:43
JayFso we can validate, in it's own freakin' test, that we can tear down the DB from the HEAD alembic state of it22:43
TheJuliastacked on top, so we get 2x the test runs too22:46
opendevreviewJulia Kreger proposed openstack/ironic master: Execute tests by class, not randomly  https://review.opendev.org/c/openstack/ironic/+/88716622:46
TheJuliathere is the irc message22:46
JayFon my change, https://zuul.opendev.org/t/openstack/stream/1c41663bc5e14deea4083562069fb4b2?logfile=console.log has been hanging out for 8 minutes now :( 22:52
TheJuliale sigh22:55
JayFI'll note22:55
JayFmy patch is a place22:55
JayFwhere we can put ALL KINDS OF DEBUGGING STUFF IN22:55
JayFright before we drop the things22:55
JayFand we have an engine22:55
JayFso perhaps we can flip that approach to instead dump some useful metrics before dropping DB?22:55
JayFlike, perhaps, a show full processlist;?22:56
JayFI'll also note that my change was mysql only; we haven't seen postgres freeze that we know of but it's possible22:56
JayFTheJulia: ... https://zuul.opendev.org/t/openstack/build/ff5c2e1dd077459e91ef34eadf6d2806/log/job-output.txt did arm64 tests always run mysql?22:58
JayFironic.tests.unit.db.sqlalchemy.test_migrations.TestMigrationsMySQL.test_walk_versions failure on arm6422:59
JayFfrom https://review.opendev.org/c/openstack/ironic/+/887234 fwiw, last reported openstack-tox-py310-arm6422:59
* JayF is about to turn into a pumpkin22:59
TheJuliaADD ALL THE DEBUG!22:59
* TheJulia is almost pumpkin already22:59
JayFmy eod is usually at 4pm22:59
* TheJulia has pumpkin bread, and needs to start the sauce for dinner soon()22:59
JayFmy brain hit the door about 15 minutes ago lol22:59
TheJulialol23:00
JayFtbh after this we should consider stepping back23:00
JayFdocumenting what we've tried/merged/determined was bad23:00
JayFand hit the mailing list23:00
JayFmaybe engage fresh brains with less ironic brain 23:00
JayFI don't wanna do that, it won't be as satisfying as just finding the fix, but at some point we have to call for help23:01
TheJuliawe're super close, I think23:01
JayFyour optimism is slightly bothersome ;) 23:01
TheJuliaat least, the path I've been on has been fairly well behaved, but looking at other jobs the big difference is the parallel class change23:02
TheJuliawell, I have a few meetings tomorrow so I expect I'll be busy23:02
JayFI'll look at all the results23:02
JayFand then pick a path, any path23:02
JayFand keep going23:02
JayFwhat else is there to do23:02
TheJuliaYup :(23:06

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!