Thursday, 2024-02-01

adamcarthur5Thanks JayF! I will check it all out tonight 00:03
opendevreviewMerged openstack/ironic-python-agent-builder master: [codespell] Fixing Spelling Mistakes  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/90678000:44
opendevreviewMerged openstack/ironic-python-agent-builder master: [codespell] Adding Tox Target for Codespell  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/90678100:44
opendevreviewMerged openstack/ironic-python-agent-builder master: [codespell] Adding CI target for Tox Codespell  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/90678200:44
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-inspector master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-inspector/+/90693502:15
opendevreviewMerged openstack/python-ironic-inspector-client master: add pyproject.toml to support pip 23.1  https://review.opendev.org/c/openstack/python-ironic-inspector-client/+/90607904:13
opendevreviewTakashi Kajinami proposed openstack/ironic-inspector master: Remove commenct lines for old openstackdocstheme  https://review.opendev.org/c/openstack/ironic-inspector/+/90737804:15
opendevreviewMerged openstack/ironic master: Fix service role support  https://review.opendev.org/c/openstack/ironic/+/90714804:23
opendevreviewSteve Baker proposed openstack/sushy-tools master: Handle nova instance having no image  https://review.opendev.org/c/openstack/sushy-tools/+/90676705:08
opendevreviewSteve Baker proposed openstack/sushy-tools master: [WIP] add virtual-media-boot to openstack driver  https://review.opendev.org/c/openstack/sushy-tools/+/90676805:08
rpittaugood morning ironic! o/08:20
opendevreviewMerged openstack/sushy-tools master: Handle nova instance having no image  https://review.opendev.org/c/openstack/sushy-tools/+/90676708:52
opendevreviewDmitry Tantsur proposed openstack/ironic master: Online migration for inspect_interface inspector->agent  https://review.opendev.org/c/openstack/ironic/+/90739809:41
*** ravlew is now known as Guest120110:42
*** ravlew1 is now known as ravlew10:43
opendevreviewOpenStack Release Bot proposed openstack/ironic-python-agent bugfix/9.9: Update .gitreview for bugfix/9.9  https://review.opendev.org/c/openstack/ironic-python-agent/+/90740410:58
opendevreviewMerged openstack/python-ironic-inspector-client master: [codespell] Adding Tox Target for Codespell  https://review.opendev.org/c/openstack/python-ironic-inspector-client/+/90711611:16
opendevreviewMerged openstack/ironic-inspector master: Remove commenct lines for old openstackdocstheme  https://review.opendev.org/c/openstack/ironic-inspector/+/90737811:16
opendevreviewOpenStack Release Bot proposed openstack/ironic bugfix/24.0: Update .gitreview for bugfix/24.0  https://review.opendev.org/c/openstack/ironic/+/90741011:20
opendevreviewMerged openstack/ironic-python-agent bugfix/9.9: Update .gitreview for bugfix/9.9  https://review.opendev.org/c/openstack/ironic-python-agent/+/90740411:24
iurygregorygood morning Ironic11:44
opendevreviewMerged openstack/python-ironic-inspector-client master: [codespell] Adding CI target for Tox Codespell  https://review.opendev.org/c/openstack/python-ironic-inspector-client/+/90711712:00
arne_wiebalckGood morning, Ironic!13:26
arne_wiebalckPlease note https://indico.cern.ch/event/1378171/ (and https://indico.cern.ch/event/1376907/)13:28
opendevreviewJake Hutchinson proposed openstack/bifrost master: Bifrost NTP configuration  https://review.opendev.org/c/openstack/bifrost/+/89569113:34
dtantsurthanks arne_wiebalck!13:46
rpittauthanks arne_wiebalck :)13:51
opendevreviewDmitry Tantsur proposed openstack/bifrost master: Deprecate ironic-inspector support  https://review.opendev.org/c/openstack/bifrost/+/90519213:51
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.2: Fix service role support  https://review.opendev.org/c/openstack/ironic/+/90726814:44
opendevreviewJulia Kreger proposed openstack/ironic stable/2023.1: Fix service role support  https://review.opendev.org/c/openstack/ironic/+/90726914:45
TheJuliai love it when things cleanly backport just by using the button in the gui14:45
TheJuliadtantsur: you should likely push a reno into ironic to advise of status in advance of our next release to signal to operators and further refine expectations moving forward14:47
arne_wiebalckdtantsur: rpittau: np, would be great if you can make it!15:33
dtantsurTheJulia: you mean, about inspection?16:04
dtantsurgood morning16:04
TheJuliadtantsur: yes!16:05
TheJuliaand Good morning to you as well!16:05
dtantsurTheJulia: that was my thought, I just don't know the status yet :) I believe this work will also be prelude-worthy16:05
TheJulia++16:05
TheJuliastrong preludes are always good!16:05
TheJulia"It was a dark and stormy night, and the ironic contributors plotted and schemed! They determined we needed <insert stuff here>.16:06
dtantsur!!16:06
opendevmeetdtantsur: Error: "!" is not a valid command.16:06
dtantsurc'mon16:07
dtantsurTheJulia: btw, I wanted to ask you if you think https://review.opendev.org/c/openstack/ironic/+/907398 is a good idea at all16:07
TheJuliadtantsur: I'm not entirely sure, but I think it makes sense to also in the same commit to upgrade the status check to return a warning if nodes are set with the deprecated interface16:09
TheJuliaThat warning could also include "set this field" and "here is why" and all that pertinant stuff to make it self-correcting16:10
dtantsurTheJulia: can we have warnings in the status check?16:10
TheJuliayes!16:10
dtantsurnice! I'll make a mental note (if you don't mind not squeezing it in the same change)16:11
TheJuliahttps://github.com/openstack/ironic/blob/master/ironic/cmd/status.py#L9316:11
dtantsurah, you said the same commit16:11
* dtantsur should learn to read16:11
dtantsurremind me please, if I add something today, it will issue the warning on the upgrade TO or FROM 2023.2?16:12
TheJuliawe just want to make sure we leave that migration around until the interface is removed or maybe sometime after "just" so people have an easy path16:12
dtantsurehmmm 2024.116:12
TheJuliaso, your supposed to run the new version's status check *before* completing an upgrade16:12
* dtantsur is curious if we do it in bifrost16:12
TheJulia.... I don't remember16:13
TheJuliadevstack runs it but I think only errors if the result code is "error"16:13
dtantsurwe don't seem to document the command in https://docs.openstack.org/ironic/latest/admin/upgrade-guide.html?16:13
TheJuliaugh, we should fix that indepedently16:13
TheJuliait came from the openstack wide effort to add it16:14
dtantsuryep. otherwise, I don't expect a lot of admin to run it, especially with standalone ironic16:14
dtantsurthe upgrade checks run after online_data_migrations, right?16:15
* dtantsur hasn't paid attention to this work, sorry16:15
TheJuliatake a look at our upgrade.sh16:15
dtantsurfair16:15
TheJuliaI just don't remember at this point16:15
TheJuliaupgrade status checking/preflight warning stuff was back around wallaby16:16
TheJuliabut definitely comes in handy!16:16
* dtantsur is confused by upgrade.sh16:16
dtantsurWe run 'upgrade check' before and after upgrade, ignoring failures in the former. Which makes zero sense to me, to be honest16:18
dtantsurAnyway, TheJulia, upgrade checks run before online_data_migrations, so we need to include any warnings or errors in the next release, not this one.16:18
TheJuliathere is a madness to 16:19
TheJuliayou can include a warning, fwiw16:19
TheJuliayou can then elevate it to an error when the interface is removed16:19
TheJuliain order to block any forward progress16:19
TheJulia"oh, you have a big problem, fix it before proceeding"16:19
TheJuliakeep in mind most of that is in the context of devstack and to help us navigate some of the issues with grenade, since in theory you should be able to run online_data_migrations with the deployment still16:20
TheJuliajust, if you have a error, the migration can also fix it16:20
TheJuliawe had a case where we hit that with something I don't remember at this point16:20
TheJulia(we're talking 2+ years ago)16:20
dtantsurLet's maybe generalize it: we should have a validation that no nodes have interfaces that are not enabled? I'd put it in a separate patch at this point.16:21
TheJulia(or make it a warning as a result, and we did have a legitimate case of this which is why the pattern is in the upgrade.sh16:21
TheJuliadtantsur: likely!16:21
TheJuliaactually, that is a good check to also backport16:22
TheJulia*but* if there are specific things we know are going away, we *should* likely also provide specific direction on correcting the issue16:22
TheJuliacould be as simple of a data structure of "known_interface_removals" to map to an entry with "inspector" with some extra help text16:22
dtantsurNote: inspector is not deprecated yet. I'm only putting deprecation in Bifrost. We cannot deprecate it for the general case.16:22
dtantsurSo "please stop using inspector" is not something I can say yet.16:23
TheJuliatrue, not quite yet16:23
TheJuliasoon, maybe :)16:23
TheJuliaI'm going to go get a shower and dig into a painful support case which needs my eyes16:23
dtantsurThe online migration I'm adding addresses are more narrow case: someone (e.g. bifrost) removing 'inspector' from the enabled list before all nodes are updated16:23
dtantsursure, thanks for the sanity check and ideas16:24
TheJuliaand that is a totally reasonable generic error, I'm only suggesting it could be extended later for a list of known interface removals from deprecations16:24
* dtantsur nods16:24
TheJuliaso we can provide specific case by case guidance16:24
dtantsurI'll write a short RFE to track these ideas16:24
TheJuliaeh, if you hammer it out in code faster, that would also be acceptable ;)16:25
TheJuliaRealistically, just a generic error for "config to db mismatch" is *definitely* something I would backport16:25
dtantsurI need to collect the thoughts anyway, why not in writing :)16:25
TheJulia:)16:25
* dtantsur takes off compression gloves to finish typing before tomorrow..16:26
dtantsurhttps://bugs.launchpad.net/ironic/+bug/205195416:33
opendevreviewMerged openstack/ironic stable/2023.2: [Backport] Fixes Raid creation in iLO6 and other BMC with latest schema  https://review.opendev.org/c/openstack/ironic/+/90407016:38
dtantsurFolks, what do you think, should we maybe detect time skew between ironic and IPA? I.e. send the current agent time with lookup and blow up if it diverges significantly?16:47
dtantsurI have 2 customer cases just this week.....16:48
JayFWe already sync time on IPA startup?16:48
dtantsurWe can do it if there is NTP16:49
JayFThis is infrastructural issues where someone was not syncing time to their conductors, I assume?16:49
dtantsurIt is. I'm just looking for a better way to detect that than "Dmitry looking at your logs and telling you to set your hardware clock to UTC"16:49
JayFin a TLS-secured Ironic, that connection wouldn't ever even happen16:49
dtantsurNot exactly. Lookups will work, heartbeats will fail.16:50
JayFso sending time with IPA lookup would not help unless traffic between IPA<>Ironic were insecure (?)16:50
JayFLookups are permitted to have invalid certificates?16:50
JayFThe lookup that returns the agent token (!?) ... that seems wrong?16:50
dtantsurIt's not necessarily invalid: the Ironic's certificate will probably be generated long ago.16:50
JayFAm I crazy to think that basic https used to fail if your clock was skewed?16:51
dtantsurI'm not sure that's the case, but I'm not the biggest expert16:51
dtantsurI know you cannot have Kerberos working16:51
JayFLet me ask it this way: how does the IPA<>Ironic communication fail if not due to SSL issues in the HTTPS connection? 16:52
dtantsurThe IPA certificate is generated on fly. Heartbeats start coming seconds-to-minutes after it's generated.16:53
dtantsurThe Ironic's certificate may be generated days or months ago.16:53
dtantsurBut you're right, it's not bullet-proof.16:53
JayFHmm.16:53
TheJuliaI feel like it might make sense for maybe the conductor to emit a "our time is" field and maybe the agent can go "I have no ntp, uhhh... this feels like rdate, but okay!"16:53
TheJuliato at least sort of resolve it into the same ballpark16:54
JayFSo I'm just thinking through the possibilities16:54
TheJuliaelse, I wonder how will it detect, since the lookup client time shouldn't trigger a failure to be recorded by the conductor16:54
JayFand anything we do to *fix it* proactively that is not "sync time with NTP" is likely opening a security vulnerability16:54
TheJuliapossibly, or at least a path we need to be very mindful of potentials16:55
JayFso the best case we can get likely is changing "dtantsur looking through your logs" to "Ironic knows that things are likely broken"16:55
JayFwhich I get the feeling we could *probably* derive from the failure cases, if it's failing in ways I'd expect16:55
dtantsurNote: I don't insisting on fixing it, a clear and loud warning will work for me16:55
JayFe.g. Ironic sees a cert that the time is radically off on, and emits a new, unique error16:55
dtantsurIn this case, it will be noticed by field folks without escalating the whole thing to me16:55
TheJulia... I'd really prefer people just to use ntp, but even I had a case.... last week where someone had the clock configured to UTC but in their case UTC was actually AEST16:55
JayFAt some point, we will have Ironic see a cert with a radically screwed up time16:55
JayFand we can use that as the canary to improve the error, perhaps (?)16:56
JayFThat's why I was asking about where/how the error bubbles16:56
dtantsurmmm, you mean, if we get a TLS error, ironic tries to do some investigation on the cert?16:56
JayFexactly16:56
TheJuliaI like that idea16:56
dtantsuran interesting thought, I like it too16:56
TheJuliaand could also be semi-backporting16:56
TheJuliaor, backportable16:56
TheJuliasince we're just making things more operator visible in that case16:56
JayF"IPA presented a certificate created NN minutes in the future" -> node.last_error16:57
JayFor something similar16:57
dtantsurin the 2nd case I have SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1129)'))16:57
dtantsurI wonder what made them asking questions....16:57
dtantsurMaybe we need to augment that with a short explanation16:58
JayFSo we can dig deeper there, get the expiry time, and improve the error16:58
TheJuliajay.suggestion += ". Please consult the system clock. Ensure it is set to UTC and contains the time in UTC."16:58
JayFgetting the time in the error is key: because if you are in UTC-5 and you see "it was 5 hours expired"16:58
TheJuliadtantsur: in my case "not valid yet" was the error last week16:58
dtantsuryeah, depends on the direction :)16:58
JayFjust try to give the information to the error field then ops docs/automation can operate on it16:58
TheJuliayup, and just exactly what is wrong16:58
JayFI'm thinking if this happened in most envs I worked in, we'd probably look for that error in last error and run some kinda automation to fix the time16:59
TheJulia... I honestly think the customer had an internal ntp server offering the wrong time since folks swore they synced it to an internal clock source16:59
JayF:-| 16:59
TheJuliaor it was offering local time and advertising it as utc16:59
JayFthis  is why you have 3+ NTP servers (but never 2)16:59
TheJuliawell, for continious, yes16:59
dtantsurit's not helping that IPA starts showing the not really helpful "ssl.SSLError: [SSL: SSLV3_ALERT_BAD_CERTIFICATE] sslv3 alert bad certificate"16:59
dtantsurin my first case, the customers got fixated on that17:00
JayFone of the first conference talks was listening to David Mills (RIP :C) talk about NTP, and he dedicated a good 15 minutes to "never use two ntp servers"17:00
TheJuliaahh, yes, the classical mistake17:00
JayFWe can do the "dig into what the error is" ssl code in ironic-lib and put it in IPA and Ironic17:01
JayF(and un-libify it for backporting purposes)17:01
JayFthat way if you see a log about an ssl error, we'll always give you enough breadcrumbs to know what is going on, and maybe make an extra loud scream if we see a time that is +-24H17:01
TheJuliaAt one job we ran two remote and let the time critical nodes also use each other. Although, it meant some unplanned "oh no, need to update ntpd!"17:01
TheJuliaand we had external monitoring keeping an eye on our remotes17:02
dtantsurThe problem with the IPA side is that the traceback in entirely in eventlet17:02
JayFI'm surprised external GPS-based first-party time units aren't more popular :)17:02
JayFI know how to fix THAT problem LOL17:02
TheJuliaJayF: we actually considered it... just couldn't get budget to make a stratum 2 clock17:02
JayFadamcarthur5 is going to rip eventlet outta IPA someday ;) 17:02
TheJulia(or am I thinking of stratum 1)17:03
* TheJulia doesn't remember the levels much anymore17:03
dtantsurYeah. It's actually pretty cool that we can detect TLS problems on the server side, but we need a better error message..17:03
JayFwe can detect it both sides, to be clear17:03
JayFand print a better error to the logs + node.last_error when it happens17:03
JayFthat solves the whole problem, doesn't it17:03
TheJuliawe had an private SONET circuit to two central offices, we "almost" went the route of doing the ticker clock source off the rack's DS-3 clock source plug.... That was the least expensive option that didn't involve putting an antenna on the roof17:04
dtantsurthis is how it looks https://paste.opendev.org/show/bnJ33PtnYx0wv4RMBdci/17:04
dtantsurmaybe we can override handle()17:04
dtantsuror process_request17:05
* TheJulia realizes she dated herself mentioning DS-3 hardware17:05
JayFexcept ssl.SSLError as e:\n     ironic_lib.find_ssl_error(request, e)17:05
JayFIs that not just an SSLError we catch at a higher level and do something with?17:05
dtantsurJayF: well, we'll need to override something inside eventlet. Otherwise there is no place for us to catch it.17:06
JayFI'm not even saying inspect the sslerror itself even, but we have the remote host, we can connect to it, look at the cert, and pick the top 3 likely problems and give better errs for them17:06
MarcSchoechlin[m]I am using supermicro ARM servers with the ARS-110M-NR board. That board runs OpenBMC.17:06
MarcSchoechlin[m]Is there a api endpoint or other mechanism to backup and restore the entire configuration of the bmc?17:06
MarcSchoechlin[m](very useful for development purposes....)17:06
dtantsurIn this case, the remote host is us :)17:06
dtantsurMarcSchoechlin[m]: not in Ironic. Check their docs, sometimes similar functionality is provided.17:06
JayFMarcSchoechlin[m]: AFAIK such a command is not part of the redfish spec17:06
dtantsuryeah, Dell did it via an OEM call17:07
JayFdtantsur: I'm looking at code and trying to understand. Am I wrong to think adding an 'except SSLError' here would catch the case? https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/ironic_api_client.py#L9017:08
TheJuliaFirmware settings are visible, but it is not any level of backup/restore, and at least the earlier Supermicro gear with their inhouse redfish service was an entirely oem mechanism if memory serves17:08
opendevreviewRiccardo Pittau proposed openstack/sushy master: Force constraints when installing a package during tox test  https://review.opendev.org/c/openstack/sushy/+/90746117:09
dtantsurJayF: wrong direction. It's not from IPA to Ironic, it's the other way around.17:09
JayFis the exception you pasted from ironic logs or ipa logs?17:09
dtantsurThe WSGI server in eventlet shows a TLS error when a *client* fails. I don't know how it works, to be hoenst.17:09
JayFoh what the hell17:09
JayFokay17:09
dtantsuryep17:09
JayFI'd rather focus on the other half of it17:09
dtantsurTrue17:10
JayFas I'd honestly be crushed if IPA had eventlet after next cycle17:10
JayFalso I can upstream a fix to eventlet, I guess, if we find a way to do it sanely17:10
JayFwe do have an upstream there now17:10
* TheJulia fires up massive attack and goes and digs into a customer case17:11
rpittaufun fact: we've been using unconstrained requirements in pep8 tests since forever :D17:12
dtantsurfor some definition of "fun" certainly :D17:12
JayFhmm dtantsur gonna DM you17:12
TheJulia"for a fun time, run tox -epep8!"17:12
* TheJulia expects this in a rest room at some point17:13
rpittau:D17:13
rpittaumy brain needs air, see you tomorrow o/17:13
TheJuliaBrew Dog in Berlin had some interesting things scribbled on the wall the last time I was there, one thing was actually a programming joke in english17:14
dtantsurand some ironic stickers17:20
TheJulia+++++++17:35
TheJuliaAt least one Cinder sticker as well17:35
JayFTheJulia: https://github.com/eventlet/eventlet/blob/master/eventlet/wsgi.py#L1025 ... think about what might happen if get_errno() no longer works or if the errors themselves from ACCEPT_ERRNO moved/changed18:16
JayFTheJulia: this is what dtantsur and I were talking about, going to push a fix to eventlet and likely to oslo service as well18:16
JayFdtantsur: Talked to fungi as well, think we're going to fix this one in the open. I'm going to work on an eventlet fix for moving-forward, but am also going to see if there's something backportable I can do in oslo.service18:18
JayFI'll note during this conversation, it came up that we've had discussions about Ironic being technically under VMT, but we never got added to the list. I'm happy to push the change, and I think security sig is happy to accept, but just wanted to get a 'core review' on it in IRC first :D 18:20
fungiyep, just need to push a change to add whatever repos to https://opendev.org/openstack/ossa/src/branch/master/doc/source/repos-overseen.rst and it's on you to make sure that list of requirements immediately below is met18:21
fungiand reach out to me directly or to the denizens of the #openstack-security more generally if you have any questions18:22
JayFah that was it18:24
JayFlast time we went to do this, we were halfway between LP and Storyboard18:24
JayFso "configure things for VMT access only/immedaitely" was not possible18:24
fungiwell, it is also possible on storyboard, we have teams using it overseen by the vmt18:25
JayFI think the problem was more the answer to "what bug tracker are you using right now" was "uhhh" moreso than a solid answer18:26
JayF(now it's clearly LP everywhere, and we actually look at the danged bugs LOL)18:26
fungistoryboard has the ability to auto-add access for specific associated "teams" per project when someone opens a private story and marks it as security-related18:26
fungiso we set those projects to have the vmt auto-added in those situations18:26
fungiand then a vmt member adds the project-specific security team's access once intake is done18:27
fungifairly similar to how lp works in that regard18:27
fungii'd also be open to softening requirement #4 to "The defect tracker for each overseen repository must be configured to initially provide access for the VMT..." (striking out the "only"), since some of the projects grandparented into that list never got around to removing their initial access to those reports18:29
fungionly having the vmt members see them at first helps reduce the circle of people who are aware of a private report in cases where it's initially reported against the wrong project18:30
JayFI think that is unlikely to happen in Ironic18:30
JayFor unlikely to happen in Ironic where we don't have to be the ones to peel back the onion :)18:30
fungiso it's still preferable. and we aim for having the right project's security reviewers added to any report within 72 hours regardless18:31
fungi(usually it happens within hours, but depends on when the report comes in if there are weekends/holidays involved)18:31
TheJuliaSo I'm thinking we need an "Onion Peeler 5000"20:59
TheJuliaanyone know anyone openstacky in the LA/SoCal area who does public speaking?22:21
TheJuliaor might be interested in giving it a try?22:21
JayFTheJulia: can I steal your brain for 5m?22:22
JayFTheJulia: pretty sure RBAC enablement made some of our contributor docs invalid22:22
JayFand I'm trying to get adamcarthur5 working, we have all the stuff done but clients can't see nodes with devstack admin or demo projects it seems (?) 22:23
JayFwhich I think is sorta what you were trying to communicate to me yesterday22:23
JayFbut I need a happy path22:23
JayF(we are in a zoom or answer here)22:23
TheJuliagivey the linky22:23
JayFhttps://us06web.zoom.us/j/87639022870?pwd=Jnpb1YEk7xxW7UftDiQSby27YZ3CwK.122:24
*** tosky_ is now known as tosky23:14
opendevreviewVerification of a change to openstack/ironic stable/wallaby failed: [iRMC] Fix parse_driver_info bug enforcing SNMP v3 under FIPS mode  https://review.opendev.org/c/openstack/ironic/+/88524623:33

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!