Monday, 2022-10-24

shukunJayF: Could you please +2 this patch (https://review.opendev.org/c/openstack/ironic/+/850553) again so that I can continue the backport process to the remaining branches?01:15
vanougood morning ironic01:36
opendevreviewSONG SHUKUN proposed openstack/ironic bugfix/21.0: Add support auth protocols for iRMC  https://review.opendev.org/c/openstack/ironic/+/86244901:51
opendevreviewSONG SHUKUN proposed openstack/ironic bugfix/21.0: Add support auth protocols for iRMC  https://review.opendev.org/c/openstack/ironic/+/86244901:57
*** Guest0 is now known as osmanlicilegi04:11
opendevreviewJacob Anders proposed openstack/sushy master: Improve resiliency of eTag handling  https://review.opendev.org/c/openstack/sushy/+/85612304:40
rpittaugood morning ironic! o/08:03
dtantsurhmm, I see `sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked` in the latest metal3 CI run09:20
dtantsurI wonder if we broke something. TheJulia ^^09:20
dtantsurthe first traceback is https://paste.opendev.org/show/b9nTqbw0m7HF7QZrsA1i/, then repeats for every action09:22
dtantsuryep, trivial to reproduce locally :(09:25
dtantsurreverting all 3 phases of SQLA2 migration fixes the problem. uh-oh.09:29
dtantsurTheJulia, JayF, we need to find something urgently, else the whole SQLA2 work needs to be redone :(09:30
dtantsurI assume we don't close some transaction, but I don't know which09:35
opendevreviewDmitry Tantsur proposed openstack/ironic master: Do not disable autocommit until we fully migrate  https://review.opendev.org/c/openstack/ironic/+/86247609:43
dtantsurgood news, I think a complete revert is not needed. Bad news ^^^09:43
dtantsurTheJulia, rpittau, ^^09:43
rpittauah... ok09:44
dtantsurI think it boils down to some dbapi calls still opening a transaction without properly finishing it09:45
dtantsurwhich is okay for mysql, but not for sqlite09:45
dtantsurI wonder if we need a CI job with sqlite *somehow*09:46
opendevreviewVerification of a change to openstack/ironic master failed: Cross test sushy with python 3.10  https://review.opendev.org/c/openstack/ironic/+/86214009:49
rpittauI wonder if I can hook up a metal3 job with the ironic-image based on source directly here09:50
dtantsurthat would be ideal (but make sure to run it on our infra, not the metal3's one)09:50
rpittauyeah09:52
opendevreviewJacob Anders proposed openstack/sushy master: Improve resiliency of eTag handling  https://review.opendev.org/c/openstack/sushy/+/85612310:11
iurygregorygood morning Ironic12:15
opendevreviewJacob Anders proposed openstack/sushy master: [WIP] Retry BootSourceOverride request when SettingsURI is read-only  https://review.opendev.org/c/openstack/sushy/+/85659712:57
TheJuliaGood morning13:19
TheJuliadtantsur: oh lovely :(13:20
TheJuliadtantsur: or a commit that hasn't been called yet I guess13:20
TheJuliaI remember checking everything to make sure we closed things out, but maybe there is more. 13:22
TheJuliaoh wow, that halted very early on13:22
TheJuliadtantsur: is that log with debug turned on?13:30
*** rcastillo_ is now known as rcastillo13:30
TheJuliadtantsur: no nodes, just start process correct?13:37
TheJuliaI *suspect* it might be register_conductor13:39
dtantsurTheJulia: I assume it's because touch_conductor was not updated14:38
dtantsurgood morning14:38
JayFWhat kind of tests are those failing on metal3? Is it trying to setup an Ironic server only backed by sqlite?14:38
JayFOr just unit tests that cover a case that ours don't?14:38
TheJuliaso orm style of ops is still supported, and enginefacade is supposed to close things out, but I think it is a situation where we end up handing back an object that keeps the transaction open14:39
TheJuliajust some style of ops did not explicitly need to change nor did we spot in unit testing, and heartbeat ops is definitely an area where I could see it breaking14:40
JayFack; good stuff then, nice to find these corners in CI then (even if not /our/ CI)14:40
TheJuliaJayF: starting the service with just sqlite outside of unit test14:40
JayFDo we document that as supported? (if metal3 operates this way; we obviously support it -- it'd be cool if we documented it)14:41
JayFand probably should also have a CI job that tests it in our gate, too14:41
TheJuliayeah, realistically we're going to need a single process sqlite job14:42
TheJuliaThat is likely the only upfront config which would have caught this14:42
* TheJulia rebuilds tox env since it didn't want to be happy to run unit tests14:42
dtantsurJayF: no write transactions work14:47
dtantsurbasically, something (I presume the conductor keepalive loop) locks the database and never unlocks it14:48
JayFYeah I was mainly trying to tease out why this didn't show up in Ironic CI14:48
JayFbecause we shouldn't be breaking downstream integrations14:48
JayFand, apparently, we should advertise you can use Ironic with sqlite14:48
dtantsurbecause mysql is fine with transactions opened forever, apparently?14:48
JayFmysql, depending on how it's configured, implicitly closes transactions in some cases14:48
dtantsurI don't think it's a sqlite-specific problem, it can be an actual bug if we don't commit transactions14:48
* TheJulia blinks14:49
TheJuliaI think the issue is the transaction gets started, and can live onward if we return a query object14:50
TheJuliait could also be singleprocess is a contributing factor14:50
JayFsqlite also doesn't do simultaneous writes14:51
dtantsurThe issue can be reproduced even without any API accesses14:51
JayFso having a txn hang open is guaranteed fatal14:51
dtantsurjust start and wait a few seconds14:51
JayFwhereas in mysql, it's not limited that way14:51
JayFit'd be possible we'd have a dangling write cxn in a place that isn't called much14:51
JayFtherefore not breaking mysql14:51
* JayF thinks this fits with TheJulia's hypothesis of register_conductor or some other method called on start14:52
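Editor's sketch of the locking behavior discussed above: SQLite allows only one writer at a time, so a write transaction that is opened and never finished makes every other writer fail with "database is locked". This is a minimal standalone illustration using Python's stdlib sqlite3 module, not Ironic's or oslo.db's code; the table name `conductors`, the function name `demo_locked_write`, and the short timeout are hypothetical choices made for the demo.

```python
import os
import sqlite3
import tempfile


def demo_locked_write():
    """Show that an unfinished write transaction blocks a second writer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite")

        # isolation_level=None disables the module's implicit BEGIN,
        # so transactions are opened and closed explicitly below.
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("CREATE TABLE conductors (hostname TEXT)")
        other = sqlite3.connect(path, timeout=0.1, isolation_level=None)
        try:
            # Open a write transaction and leave it dangling.
            writer.execute("BEGIN IMMEDIATE")
            writer.execute("INSERT INTO conductors VALUES ('a')")

            # A second connection cannot write while the first holds the lock.
            try:
                other.execute("INSERT INTO conductors VALUES ('b')")
                blocked = None
            except sqlite3.OperationalError as exc:
                blocked = str(exc)

            # Once the first transaction is committed, the write goes through.
            writer.execute("COMMIT")
            other.execute("INSERT INTO conductors VALUES ('b')")
            rows = other.execute("SELECT COUNT(*) FROM conductors").fetchone()[0]
        finally:
            writer.close()
            other.close()
        return blocked, rows


if __name__ == "__main__":
    print(demo_locked_write())
```

A server database such as MySQL can usually keep serving other writers while one transaction stays open, which would be consistent with the problem not showing up in the MySQL-backed CI jobs.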
* TheJulia suspects a breaking version of pip dropped14:52
JayFMeeting in ~5 minutes14:54
TheJuliahttps://paste.opendev.org/show/817308/14:57
dtantsurwut15:00
TheJuliayup... I bumped my requirements locally too just to see if that was it15:00
TheJuliainstalling py39 now15:01
JayF#startmeeting ironic15:01
opendevmeetMeeting started Mon Oct 24 15:01:44 2022 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.15:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:01
opendevmeetThe meeting name has been set to 'ironic'15:01
TheJuliao/15:01
matfechnero/15:01
ebbexo/15:01
JayFGood morning everyone. Welcome to our meeting.15:01
ajyao/15:02
hjensaso/15:02
JayFGonna be honest and say I probably dropped the ball and should've cancelled this; I'm doubtful we have much to talk about so I'm going to move fast.15:02
iurygregoryo/15:02
dtantsuro/15:02
rpittauo/15:02
JayF#note We had a productive PTG last week. Thank you to everyone who participated!15:02
dtantsurlive fast, endmeeting early15:02
JayFThere are no action items from last meeting that were not handled in a PTG session.15:02
JayFNormally we'd also say at this point to check the whiteboard for workstream status; but we haven't updated any of that for the PTG.15:03
TheJuliadtantsur: ++15:03
JayFSo I am going to just keep going as any information there is more out of date than scrolling up 10 lines in IRC :)15:03
TheJuliaI would suggest updating the whiteboard15:03
rlooo/15:03
TheJuliasome stuff naturally rolls, some stuff is done and can be struck through and removed soon15:03
JayFIt's my intention to do that very soon as part of the documentation of workstreams15:04
JayF#action JayF to ensure whiteboard gets updated when he makes 2023.1 Ironic workstreams spec15:04
JayFMoving on15:04
JayFJust a reminder if anyone needs code reviewed, please add hashtag ironic-week-prio and/or mention it here 15:04
JayFI don't believe there's anything outstanding, but if there is, please do tag it. I'll be reviewing all tagged, up to date PRs this afternoon.15:05
dtantsurhttps://review.opendev.org/c/openstack/ironic/+/862476 is the most important one15:05
JayFack15:05
ebbexI've got a bunch for bifrost here. https://review.opendev.org/q/topic:deps , but there's a big question as to what to do with suse.15:06
dtantsurebbex: already deprecated => remove support for good?15:06
ebbexFine by me, all in favor of removing suse support?15:06
JayFIf it's already been deprecated, removal is the next step15:07
dtantsurhttps://docs.openstack.org/bifrost/latest/install/index.html#supported-operating-systems does not even list suse any more15:07
rpittaulet's remove it15:07
dtantsurI think I just was too lazy to remove the code later on :)15:07
rpittauheh me too :P15:07
JayFThat sounds like the clear consensus.15:07
ebbexCool. I'll have it cleaned out instead.15:08
JayF#action ebbex To remove suse-supporting code from bifrost, it's already deprecated and scheduled to be removed.15:08
JayFthanks for bringing that up and volunteering to clean it up15:08
JayFif there are no other patches to discuss; moving on15:08
ebbexnp :)15:08
JayFBaremetal SIG: it was decided at PTG the BM SIG would move to quarterly, and be booked well in advance with calendar invites sent out.15:09
JayFDoes someone want to own the action to schedule that? I'd suggest booking one for Q1 2023 as our next one15:09
dtantsurarne_wiebalck is the natural candidate, but I can do it if he doesn't have time15:09
dtantsurthere's an internal something that we need to clear up first before I know my schedule for Q115:10
JayFthat sounds good; we should follow up next week if nobody has taken action yet15:10
JayFthanks :) 15:10
JayFThere are no RFEs for review.15:10
dtantsuryou can put the action on me for now, keeping in mind that it probably won't happen before next week15:10
dtantsur* by next week15:10
JayF#action dtantsur or arne_wiebalck to book Baremetal SIG meeting for Q1 2023 sometime in the next couple of weeks.15:11
dtantsuryep15:11
JayFThere are no pre-agenda'd open discussion items15:11
JayFAre there any topics for open discussion? I'll give a couple minutes for anyone to speak up before ending the meeting.15:11
TheJuliaI did talk to a larger operator w/r/t the ironic/nova-compute stuffs15:13
JayFAnything interesting come outta the chat?15:13
TheJuliabasically, I could tell there was an undertone that they were *really* not pleased by the availability and the locked interaction behavior and the fact they will now need to self-implement HA, but that the overall tradeoff from the amount of work/effort they face *today* would offset that as the lost hypervisor records today are *far* more painful than what they view that work to be15:14
JayFI think that's generally reflective of how most of us feel about it, yeah?15:14
JayFNot ideal, but the best we can do inside the nova model and orders of magnitude better than things today15:15
TheJuliaThey believe that based on what I wrote in the spec that I tossed up for ironic, that it should generally "just work" moving forward15:15
TheJuliaJayF: yeah, I think so as well15:15
JayFThat's a good report, thank you. 15:15
JayFEveryone being slightly unhappy is the usual sign of true compromise :)15:16
JayFI'm going to close the meeting out if there's nothing else.15:16
TheJulia#link https://review.opendev.org/c/openstack/ironic-specs/+/86180315:16
TheJuliaJayF: johnthetubaguy: sorry, but I had to get the interaction out of my head15:16
JayFReading that is on my list today :). It's going to be a spec writing/reading day :).15:16
TheJuliabut nova-side is going to be important/needed as well15:16
TheJuliajohnthetubaguy: I'd also appreciate a review of that spec if you have the time, since I took a shot at how I perceive the nova-compute service upgrade to execute15:17
JayFThanks for taking the first swing at that spec; we'll review and enhance it15:20
JayF#endmeeting15:20
opendevmeetMeeting ended Mon Oct 24 15:20:08 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:20
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2022/ironic.2022-10-24-15.01.html15:20
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2022/ironic.2022-10-24-15.01.txt15:20
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2022/ironic.2022-10-24-15.01.log.html15:20
opendevreviewVerification of a change to openstack/ironic master failed: Add ironic-grenade-skip-level Job  https://review.opendev.org/c/openstack/ironic/+/83696615:21
TheJuliaso py39 works just fine, I'm suspecting a dependency dropped that breaks 3815:25
JayFWould you all be OK if I just documented "possible workstreams for 2023.1 Antelope" and "completed workstreams for 2023.1 Antelope" and refer people to the whiteboard to see what is in-progress?15:31
opendevreviewVerification of a change to openstack/ironic master failed: Do not disable autocommit until we fully migrate  https://review.opendev.org/c/openstack/ironic/+/86247615:34
JayFthat failed in unit tests(!)15:35
JayFThe conflict is caused by:15:35
JayF    The user requested SQLAlchemy>=1.4.015:35
JayF    The user requested (constraint) sqlalchemy===1.4.415:35
JayFTheJulia: ^ is this the pip breakage you were referring to?15:35
JayFif so, it's impacting gate :| 15:35
JayFhttps://zuul.opendev.org/t/openstack/build/dcf1bfcc6b40496badc4d9d37267ec0815:35
JayFthat is a brutal zuul failure line on that fix patch15:36
TheJuliaJayF: seems reasonable, although we've often stalled on items which have built up on the whiteboard15:36
TheJuliathink, "oh, this came up and it was more important, I'll try to get back to the other thing"15:37
JayFTheJulia: I foresee a "Is anyone working on:" topic with a handful of subtopics for next week15:37
JayFTheJulia: for things that were lost mid-stream 15:37
TheJuliayah15:38
JayFWhat the hell is going on with the gate?15:40
JayFbreaking pip version totally seems possible15:41
JayFbut we can't be the only folks hurting with this15:41
opendevreviewEbbex proposed openstack/bifrost master: Remove remaining traces of Suse  https://review.opendev.org/c/openstack/bifrost/+/86154115:51
TheJuliadtantsur: are you in a place to quickly test a patch?15:55
TheJuliawith sqlite?15:55
* TheJulia crosses her fingers16:09
dtantsurTheJulia: to an extent (I can test if things work at all, but not the whole metal3 flow)16:10
TheJuliathat should be fine16:11
TheJuliarunning unit tests now16:11
rpittaubye everyone o/16:13
JayFIssue with gate appears to be infrastructural; pypi is fronted by a CDN and if the CDN can't reach backend servers; it gives out of date info that blows up the gate16:14
JayF(per clark in #opendev)16:14
TheJuliasweet!16:15
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Sqllite fix maybe?  https://review.opendev.org/c/openstack/ironic/+/86250616:18
dtantsurTheJulia: I wonder if you need to update touch_conductor as well16:19
TheJuliapossibly16:19
TheJulialikely next16:19
* TheJulia will be pulling it back up after the current test run16:20
TheJuliaI don't think it returned a query object16:20
TheJuliabut I shall see!16:20
opendevreviewEbbex proposed openstack/bifrost master: Switching netstat to ss in report  https://review.opendev.org/c/openstack/bifrost/+/86154216:21
dtantsurTheJulia: with your patch still crashes in touch_conductor :(16:21
TheJuliasame backtrace?16:21
dtantsuryeah16:22
TheJuliaInteresting...16:22
TheJuliawhat gets called *before* touch_conductor16:22
opendevreviewEbbex proposed openstack/bifrost master: Fix initial python/venv dependencies  https://review.opendev.org/c/openstack/bifrost/+/86153416:23
opendevreviewEbbex proposed openstack/bifrost master: Install git-core in prep-for-install  https://review.opendev.org/c/openstack/bifrost/+/86153516:23
opendevreviewEbbex proposed openstack/bifrost master: Remove unused iniparse python system dependency  https://review.opendev.org/c/openstack/bifrost/+/86239116:23
opendevreviewEbbex proposed openstack/bifrost master: Remove pymysql from system dependencies  https://review.opendev.org/c/openstack/bifrost/+/86153716:23
opendevreviewEbbex proposed openstack/bifrost master: Install passlib to venv (htpasswd)  https://review.opendev.org/c/openstack/bifrost/+/86153616:23
opendevreviewEbbex proposed openstack/bifrost master: Install firewall to venv (redhat)  https://review.opendev.org/c/openstack/bifrost/+/86153816:23
TheJuliatouch_conductor should be fine, fwiw16:24
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Sqllite fix maybe?  https://review.opendev.org/c/openstack/ironic/+/86250616:32
TheJuliamaybe?!16:32
TheJuliadtantsur: give ^^ a quick spin if you can16:33
dtantsursame :(16:36
TheJuliaugh!16:36
dtantsuryou can reproduce by running ironic with16:38
dtantsur[DEFAULT]16:38
dtantsurauth_strategy = noauth16:38
dtantsurdebug = True16:38
dtantsurrpc_transport = none16:38
dtantsurenabled_hardware_types = fake-hardware16:38
dtantsurenabled_boot_interfaces = fake16:38
dtantsurenabled_deploy_interfaces = fake16:38
TheJuliathanks16:42
* TheJulia wonders if it is get_conductor16:45
TheJuliaerr, no, that doesn't do it16:47
TheJuliadtantsur: so... it is running, no deadlocks17:11
TheJuliayet!17:11
dtantsurhuh? with your patch?17:12
TheJuliayeah17:12
TheJuliamy local repo state and current dependencies17:13
TheJuliaperiodics are triggering17:13
TheJuliadtantsur: can I get a link to the job log where it is failing?17:14
dtantsurTheJulia: sure, fetch the tarball from https://jenkins.nordix.org/job/metal3_ironic_image_main_integration_test_ubuntu/22/17:14
TheJuliahttps://paste.opendev.org/show/byaFQonINexfnQ2FeBrR/ <-- what it is running with locally17:15
dtantsurI think I have a freshly built venv17:21
dtantsuranyway, time to go unfortunately17:21
dtantsurTheJulia: do you feel particularly bad about re-enabling autocommit for now and keeping experimenting?17:21
dtantsurmetal3 is kinda stuck17:21
JayFWe already have that change approved17:22
dtantsurooops17:22
JayFgate is in pypi-outage-hell17:22
dtantsuroooooooops17:22
JayFsee #opendev17:22
TheJuliadtantsur: just to confirm, this is single process yes?17:23
dtantsurtrue17:23
dtantsurso yeah, I'm using the `ironic` executable17:23
TheJuliaokay, weird17:25
TheJuliaoh, there we go17:25
TheJuliafinally17:25
TheJuliait failed after the second round of periodics17:25
dtantsur\o/17:26
dtantsuron this positive note I'll wish you a nice evening17:27
TheJuliagoodnight!17:27
JayFIt's not just me or firefox, right? The code review dashboards on the ironic whiteboard are completely fubar now17:34
TheJulialooks fine to me17:36
TheJuliaoh... the dashboards17:36
TheJuliaI'm not sure I've used the classic dashboards in ages17:36
TheJuliasome of the super old, like create db aeva originally did break in one of the gerrit upgrades, but I thought we nuked them17:38
JayFYeah I think it's all broken17:57
JayFI tried to remake them in the "new" way17:57
JayFand they broke again17:57
JayFI'm going to remove them ...17:57
TheJuliayeah, we mainly went to hashtag + combined queries18:08
TheJulia2.5 minutes..18:32
TheJuliaOkay, there are a couple different things going on21:30
TheJulia1) OperationalError as raised by sqlite3's adapter is not an exception it knows to auto-retry on21:30
TheJulia2) it *actually works* every once in a while. If you drop the heartbeat interval too low, it never works21:35
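Editor's sketch for point 1: what retrying on the SQLite lock error could look like in principle. This is a generic helper written for illustration, not oslo.db's retry logic and not the fix that was proposed for Ironic; the name `retry_on_locked` and its `attempts` and `delay` parameters are hypothetical.

```python
import sqlite3
import time


def retry_on_locked(func, attempts=5, delay=0.05):
    """Call func(), retrying when SQLite reports a locked database."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except sqlite3.OperationalError as exc:
            # Retry only on lock contention; re-raise any other error,
            # and give up once the final attempt has failed.
            if "database is locked" not in str(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

A retry like this only helps when the lock is eventually released. If some transaction is never committed, every attempt fails the same way, which matches the observation that a short heartbeat interval never works.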
JayFI'm trying to add21:47
JayF> Cleaning up RAID created by tenants21:47
JayFto my document21:47
JayFbut it's unclear to me what the actual specific work is here?21:47
JayFI think it's tl;dr: 1) we added support for skipping cleaning of sw raid member devices 2) that needs to clean out RAID created by tenants, even if that is disabled (?) 21:48
JayFbut that sounds wrong to me, and I think I must be missing something21:48
JayFdtantsur: ftarasenko: ^ if you could clarify for me what the work is for clean out raid created by tenants, it's not clear to me and the PTG notes are not wonderful :( 21:49
JayFI'm going to push with a known-probably-bad summary before I EOD; you can also comment on that PR (or push up a better summary) if you'd prefer21:49
opendevreviewJay Faulkner proposed openstack/ironic-specs master: Add Ironic work items for 2023.1  https://review.opendev.org/c/openstack/ironic-specs/+/86253822:00
TheJuliaw/r/t sqlite: What if we only support in-memory....22:07
TheJuliawell, that would solve metal322:08
TheJuliait wouldn't solve standalone without autocommit22:08
* TheJulia suspects metal3 creates a pile of sqlalchemy objects that eventually manage to autocommit through22:08
TheJuliawell, ironic in metal322:08
JayFiurygregory: please remove the -1 or clarify in https://review.opendev.org/c/openstack/releases/+/84793322:10
TheJuliaI *think* the only path forward to possibly making it more sane is to restructure db queries so we no longer use orm model query format22:14
TheJuliaeven then... pagination explicitly requires it22:14
TheJuliafor now.22:14
TheJuliabasically, we need a mode where we can have a singular reader/writer operation22:22
TheJuliaglobally, OR can use the db connection pooling behavior22:22
TheJuliaand... each $thing we do is a new file open22:23
TheJuliawhich is locking22:23
JayFIt seems a little bananas to me that we ever agreed that Ironic's service could run under sqlite limitations22:24
TheJuliaexample: just after the periodic trigger: https://paste.opendev.org/show/bYQLLbacafpsfba8oIsQ/22:24
TheJuliaI think it was always implicit, less explicitly stated22:24
TheJuliaand I think "because heat can"22:24
TheJuliabut I think heat does it in-memory22:24
JayFI mean, it ships that way with metal322:25
JayFwhich means we either explicitly support it or we dunk on a major external integration22:25
JayFin-memory sqlite is a whole different ball of wax to on disk22:25
TheJuliayeah22:26
TheJuliaand in metal3 it is ephemeral22:26
JayFdtantsur: we should probably talk about metal3 moving to sqlite in memory or off sqlite altogether22:26
TheJuliaso.... there really is no need to file back it, I think22:26
JayFdtantsur: this is only going to get worse22:26
* JayF hands anyone who wants it a mysql container image for docker22:27
JayF:P 22:27
TheJuliaYeah, periodics restart, 3 new connections after everything closes out22:27
TheJulia\o/22:27
* TheJulia dances22:27
TheJuliaand not in a good way :)22:27
TheJuliathe added pain with it all is sessions... which create transactions and are auto-created by enginefacade even if we don't want/need them22:32
TheJuliaand that compounds things, they are supposed to auto-close out as well, but I'm fairly sure on my theory that the orphaned returned element holds the session... i.e. model_query calls.22:34
opendevreviewJay Faulkner proposed openstack/ironic-specs master: Add Ironic work items for 2023.1  https://review.opendev.org/c/openstack/ironic-specs/+/86253822:44
opendevreviewJay Faulkner proposed openstack/ironic-specs master: Add Ironic work items for 2023.1  https://review.opendev.org/c/openstack/ironic-specs/+/86253822:51
opendevreviewVerification of a change to openstack/ironic master failed: Do not disable autocommit until we fully migrate  https://review.opendev.org/c/openstack/ironic/+/86247623:14
