Monday, 2023-03-20

scott_Hey Ironic team! Quick question -- I'm trying to deploy OKD4.12 via IPI bare metal install in a home lab that consists of 3 PowerEdge R720s with enterprise iDrac7's running the latest firmware. I've got the OKD portion of it all set up with a provisioner FCOS VM running in VMware Fusion and the bootstrap VM running inside it (with Ironic pod running inside that). No provisioning network, just the baremetal network.03:26
scott_To cut to the point, I've run the openshift install script a dozen times now and get different results every time. It seems that 0 or 1 of the 3 R720s gets provisioned correctly with FCOS running on the bare metal but the other 2 don't. Each of the 3 have gotten it provisioned correctly at least once -- but never more than that 1 server gets it on a run of the install script03:28
scott_The error I get for the other 2 (or sometimes all 3 servers) is "inspect failed" with 'Message': 'Server is already powered ON.', 'MessageArgs': [], 'MessageArgs@odata.count': 0, 'MessageId': 'IDRAC.1.6.PSU501'...03:31
scott_I've tried resetting all the iDracs to factory defaults, only powering on one server at a time during install, looking through the logs for any other hints, setting static IPs (in comparison to normally static-mapped DHCP), etc etc and have had absolutely 0 luck finding any rhyme or reason03:33
scott_I fully understand the iDrac7s don't seem to be supported at this point and am pretty close to switching to an IPMI install or just manual OKD UPI install but thought I would see if anyone may have some thoughts before I give up on it. It's very odd to me that 1 of 3 seems to regularly work and that the 1 is different every run.03:36
scott_I feel very comfortable digging around through the containers or logs if anyone has ideas and would much appreciate the thoughts! I tried to manually kick off the introspection process to make debugging a bit quicker and easier (than running through the entire installer) but couldn't figure out how to get past the Keystone authentication bit quickly/easily.03:39
scott_Additionally, I'm using the "idrac-virtualmedia" as the BMC address and Ironic is 100% able to hit all 3 iDracs -- all 3 power cycle correctly on every install, have their Boot option set to Virtual CD, etc -- but only one will possibly make it past introspection on to success03:44
scott_Thanks in advance!! And if the answer to all of this is simply that iDrac7s are not supported, thanks in advance for that too! Would honestly be a bit of a relief to just go with a different method at this point haha03:50
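The error fragment scott_ quotes above has the shape of a Redfish extended-info entry. Below is a minimal sketch of how such a response could be checked for that message ID. It assumes the quoted fields sit inside the standard Redfish error envelope (`error` / `@Message.ExtendedInfo`), which the log does not show; the helper names are hypothetical and this is not Ironic's own handling.

```python
def message_ids(error_body):
    """Collect MessageId values from a Redfish-style error response body."""
    error = error_body.get("error", {})
    infos = error.get("@Message.ExtendedInfo", [])
    return [info.get("MessageId", "") for info in infos]


def is_already_powered_on(error_body):
    """True if any extended-info entry carries the PSU501 message ID."""
    return any(mid.endswith(".PSU501") for mid in message_ids(error_body))


# Sample body: the inner entry is copied from the log, the envelope is assumed.
sample = {
    "error": {
        "@Message.ExtendedInfo": [
            {
                "Message": "Server is already powered ON.",
                "MessageArgs": [],
                "MessageArgs@odata.count": 0,
                "MessageId": "IDRAC.1.6.PSU501",
            }
        ]
    }
}

print(is_already_powered_on(sample))
```

Whether a PSU501 response should be treated as harmless is a judgment call; later in the log a second error about virtual media turns up, so this message may not be the root cause.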
samuelkunkel[m]Hi, 06:39
samuelkunkel[m]If introspection is what fails, I would ssh into the IPA and have a look at the logs there.06:39
samuelkunkel[m]On controller side these tasks are handled by the conductor. So those logs should be of interest.06:39
kaloyanko/07:31
kaloyank2 questions here:07:35
kaloyank1. I've never attended a PTG meeting, I see that the full schedule is yet to be announced, where do I look for it once it's published?07:38
kaloyank2. I have a vague memory that there was a draft about saving local disk deployments as images in Glance but I can't find the spec. Does such a spec even exist?07:38
arne_wiebalckGood morning, Ironic!08:03
arne_wiebalckkaloyank: This is the PTG etherpad https://etherpad.opendev.org/p/ironic-bobcat-ptg, the whiteboard says schedule to come (https://etherpad.opendev.org/p/IronicWhiteBoard)08:07
arne_wiebalckkaloyank: for your 2nd question, you mean snapshotting bare metal instances?08:07
vanougood morning ironic10:45
iurygregorygood morning ironic11:17
iurygregorykaloyank, regarding PTG schedule we will add the information this week to the etherpad that arne_wiebalck pasted, the initial proposal for the schedule is in the openstack-discuss so people could give feedback https://lists.openstack.org/pipermail/openstack-discuss/2023-March/032641.html 11:19
kaloyankiurygregory: thanks :)11:56
iurygregoryyw11:56
Nisha_AgarwalJayF, dtantsur Is there any action required from my side on https://review.opendev.org/c/openstack/ironic/+/860820? I see merge fails even after recheck done by Jayf...Shall i do recheck one more time?12:13
dtantsurNisha_Agarwal: someone needs to check why the failure happened. If you don't have time to wait, you may want to do it yourself.12:14
Nisha_Agarwaldtantsur, i saw py310 gate failed but i don't see the same gate failing on https://review.opendev.org/c/openstack/ironic/+/860821/5 while this patch is dependent on the first one...12:16
Nisha_Agarwaldtantsur, don't know but looks like recheck may pass, else that gate should fail for all ironic patches as the failing code has nothing to do with the patch code12:17
* TheJulia tries to wake u13:22
TheJuliaup13:22
TheJuliakaloyank: typically we also update the original etherpad with a schedule so it is all in one place13:25
TheJuliakaloyank: snapshots have come up as a topic a few times, but the idea has never really gotten past an idea/initial phase. If you would be willing I would encourage you to start a spec if you have thoughts/interest in the topic.14:12
JayFscott_: I don't specifically know if iDRAC 7 is supported; I'd be surprised if not. I don't have specific troubleshooting tips in your case though.14:19
JayFscott_: what samuelkunkel[m] said about looking in logs is a good suggestion though14:19
dtantsurJayF, scott_, probably works, but probably not with virtual media14:19
TheJuliaJayF: scott_ was using something which was redfish based and only works on idrac8/9 :(14:20
JayFaha14:20
dtantsuryeah, I don't think wsman is supported in metal3. so IPMI it is.14:20
TheJulia... yeah14:22
scott_Thanks samuelkunkel[m], JayF, dtantsur, and TheJulia -- running through an install now to get at the logs. Happy to move on to IPMI but just anecdotally considering that it seems to work on 1 out of 3 fairly regularly, wouldn't that indicate that it's working? I was figuring it may have been a lock, timing, or some kind of multicast issue14:23
scott_Nonetheless, thanks a lot for your guys help and I'll look at these logs here soon14:23
dtantsurMy bet is on timing.14:23
TheJuliascott_: didn't see you re-appear! I saw you were not around last night and didn't respond14:23
scott_hahaha sorry using a web client and just reading through the archives if I go down!14:24
TheJulianote taken for future reference14:24
scott_@dtantsur, I was thinking timing too and thought maybe booting up one at a time would fix that issue but that didn't seem to do any better either14:25
TheJuliascott_: an idrac7? Same firmware?14:26
* TheJulia wonders if dell backported the vmedia capabilities14:26
scott_Yup! iDrac7 on 2.65.65.6514:26
TheJulia... in their firmware14:27
TheJuliaoh!14:27
TheJuliayou know what, 2.65.65.65 *is* the literal minimal version14:27
scott_hahaha yeah im just scraping by here14:27
TheJuliaso, in theory, it should be working. Redfish does need to be enabled for it to work. Worth checking if it is in the settings14:28
scott_yeah, Redfish is enabled on each and each of the 3 machines has provisioned correctly at least once -- just not all 3 of them in the same install14:29
TheJuliaoh joy :(14:29
JayFDo we have someone from Dell in the community still we could point at a bug?14:30
TheJuliasince it doesn't fail, I suspect attachment is succeeding, I wonder if the BMC is just failing internally14:30
TheJulia... this feels familiar unfortunately14:31
kaloyankTheJulia14:31
kaloyankI'd love to start a spec as I have a real use-case for this feature14:32
TheJuliascott_: Are they all in the same boot mode to start?14:32
scott_I think @dtantsur may be right on timing being the issue since I can't find any other consistency to this14:33
scott_yup -- I've tried them all with NormalBoot set and with VirtualCD set on different runs14:33
TheJuliayeah, I'm thinking the same, the boot mode reset might be causing a configuration reset which would cause the power to cycle through once before14:33
TheJuliaUEFI boot mode?14:33
scott_ahhhh interesting!14:33
scott_it appears Ironic is setting it to UEFI but I haven't manually set it at all14:34
TheJuliaYeah, pre-set them to UEFI and see what happens14:34
scott_will try that now14:34
dtantsurboth metal3 and ironic default to uefi, yeah14:35
TheJuliait could be the config is alternating somehow14:35
TheJuliaIf we can identify a firm bug and write it up, I can email my dell contacts14:37
scott_done! all 3 set to UEFI -- will let you guys know in 20-30 how it plays out and then i can parse through some logs as well14:38
scott_@TheJulia sounds great14:38
TheJuliaThere is an issue stevebaker[m] found with vmedia url in *much* later versions of firmware and we reached out to dell engineering for an answer but in that case it is definitely not our code nor something we can work around.. i.e. has to go to the firmware devs.14:39
scott_gotcha14:44
scott_@TheJulia -- hmm UEFI didn't seem to work -- straight to "inspect failed" for all 3 of them. Will start digging through the logs as samuelkunkel[m] suggested but happy to try any other suggestions as well!15:00
JayFGood morning folks, it's meeting time but hopefully we'll wrap it up quick and get back to DRAC'in :D 15:01
JayF#startmeeting Ironic 15:01
opendevmeetMeeting started Mon Mar 20 15:01:21 2023 UTC and is due to finish in 60 minutes.  The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot.15:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:01
opendevmeetThe meeting name has been set to 'ironic'15:01
TheJuliao/15:01
iurygregoryo/15:01
matfechnero/15:01
JayFWho all is here this morning?15:01
vanouo/15:01
hjensaso/15:01
janderso/15:01
JayF#topic Announcements/Reminder15:02
JayFPlease hashtag your ready-for-review stuff with #ironic-week-prio; and prioritize reviews in the priority dashboard in the Ironic Whiteboard @ http://bit.ly/ironic-whiteboard15:03
JayFWe have 2023.1 branches cut of everything; master commits now go to 2023.2. Coordinated release is Wednesday.15:03
dtantsuro/15:03
JayFCongratulations on yet-another-successful integrated release including Ironic \o/ 15:03
vanou\o/15:04
JayFThere were no action items from previous meeting; skipping the related agenda item.15:05
JayF#topic Ironic CI status15:05
JayFDo we have any observations about CI?15:05
TheJuliaI didn't see any issues last week15:05
JayFFrom me, I'm pretty sure we have a flakey test in py310 CI; I might try to find time to look in depth after PTG (literally time-wise after the PTG meetings on those days)15:05
JayFI'll also note, metal3 CI is in master now15:06
JayFand I think it's almost to the point of running out of the (shared) metal3-dev-env repo; once that change hits I will propose we backport that CI job to 2023.115:06
JayFto ensure we keep things working for our metal3 friends + sqlite users15:07
JayFIf no other comments moving on15:07
JayF#topic VirtualPDU15:08
JayFReminder: repo move scheduled for Apr 7, then it'll be under openstack/ and we'll have full management of it (not just paper-governance lol)15:08
JayF#topic Ironic Bobcat vPTG15:08
JayFPlease stick around after the meeting; we'll be doing a sync to schedule PTG items.15:08
JayFPlease join in either the zoom room I will link post-meeting, or just async by being in the etherpad ( https://etherpad.opendev.org/p/ironic-bobcat-ptg ) making comments.15:09
JayFIf there are any requirements you have for PTG scheduling: requested times for certain topics, topics not listed in the etherpad, etc15:09
JayFright now is more or less your last chance to make noise about that :) so please do 15:09
TheJuliahopefully that will start promptly, I have another meeting starting at the top of the hour15:10
JayFyep I'll hurry on then :D 15:10
JayF#topic Ironic VMT15:10
JayFGoing to give a quick update here; essentially the only piece we're missing is giving VMT group exclusive access to Ironic security bugs15:10
JayFbut because we're sorta in storyboard/LP limbo, I'm unsure where to go next15:11
JayFwe should probably just configure in storyboard and get VMT managed, and ensure LP is configured correctly when that migration happens? I just haven'15:11
JayF**haven't prioritized making time for that migration15:11
JayFI'll probably go that route unless there are objections15:12
TheJuliaJayF: They can already see them in storyboard AFAIK15:12
TheJuliaand interact with them15:12
JayFThey have to have exclusive access15:12
JayFe.g. VMT sees them but Ironic cores can't15:12
TheJuliaYeah, they have that afaik15:12
TheJuliathe reporter otherwise has to explicitly grant in storyboard15:12
JayFwell that makes this easier; I'll move on VMT this week15:12
JayFmoving on so we can get to PTG planning15:12
JayF#topic Hosting full IPA images15:12
JayFdtantsur: this is your item15:12
dtantsurThat's a past one, sorry, should have removed15:13
JayFack; no problem15:13
JayFwhat was the decision outta that?15:13
JayFwe going to add extra-hardware?15:13
JayF20M didn't seem like much in context of a huge modern image?15:13
dtantsurI want to investigate getting rid of the dependency on extra-hardware in baremetal-operator15:13
JayFnice15:13
JayF#topic Open Discussion15:13
dtantsurIt may involve adding something to the IPA inventory15:13
JayFAnything for open discussion? Speak quickly or else I'm going to close the meeting so we can shift to PTG planning15:13
vanouI have15:14
JayFdtantsur: neat; I'll be interested to see what comes out of that15:14
JayFvanou: awesome; go ahead15:14
vanouI +2 to moving VMT process regarding Ironic vul15:14
vanouHowever, regarding vulnerability which affects both Ironic and vendor library, I think we need to add vul handling note into ironic doc.15:14
vanouJust put 2 things in doc is enough I think: If Ironic community is asked by owner of unofficial library,15:14
vanou1)Ironic community is open and willing to collaborate to solve such rare vul15:14
vanou2)Ironic community is willing to collaborate in reasonable manner, which means following good manner to handle vul (e.g. craft vul patch in private till fix is published), to resolve vul.15:14
JayFI think we're willing in general to do those things; but like I suggested when this was brought up outside a meeting in IRC; I think there's value in getting that added to Openstack-wide VMT documentation15:15
JayFbecause Ironic is not the only project that has vendor drivers which may require coordinated disclosure15:15
JayFand I suspect the reality would look like what you lay out; but if you're concerned about getting that in writing, it's probably best to put that in OpenStack-level docs since Ironic is going to hook into the OpenStack-level VMT15:15
vanouI see.15:15
vanouyou mean, it is better to consult this on OpenStack ML15:16
vanoulike you, on openstack-discuss?15:16
JayFOr with the security SIG in #openstack-security; or both15:16
JayFIt's an openstack-wide problem so I prefer not solve it at a project level15:17
vanouOK. I'll contact through that channel15:17
JayFIs there anything else for Open Discussion?15:17
JayFAlright, thank you everyone. Stay tuned for PTG planning.15:19
JayF#endmeeting 15:19
opendevmeetMeeting ended Mon Mar 20 15:19:08 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:19
opendevmeetMinutes:        https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.html15:19
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.txt15:19
opendevmeetLog:            https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-03-20-15.01.log.html15:19
JayFPTG planning Zoom -> https://us06web.zoom.us/j/89245276276?pwd=QS83YUh1K1ZoSUpicFM1ZFFNbGJ4dz0915:19
JayFif you don't wanna get in the zoom, you can open etherpad and comment along15:19
JayFhttps://etherpad.opendev.org/p/ironic-bobcat-ptg15:19
kubajjo/15:21
JayF>>>>> PTG planning Zoom -> https://us06web.zoom.us/j/89245276276?pwd=QS83YUh1K1ZoSUpicFM1ZFFNbGJ4dz09 <<<<<15:22
opendevreviewMerged openstack/ironic master: Fixes Secureboot with Anaconda deploy  https://review.opendev.org/c/openstack/ironic/+/86082015:32
scott_Seems like the meeting is over in here? So I don't see any IPA logs -- the only ironic pods running in the bootstrap system are "ironic", "ironic-inspector" and "ironic-ramdisk-logs" -- they seem to all be pushing out to journald on the host.15:47
scott_Of note -- I may have been overlooking this previously but the two errors that pop out of Terraform that OKD is running are the "Error: could not inspect" with 'Server is already powered ON.' as I mentioned but also just noticed the second error is "...node is currently 'inspect failed' , last error was 'Failed to inspect hardware. Reason: unable to start inspection: No suitable virtual media device found'"15:49
scott_I had assumed the "powered ON" issue was the underlying one since that's where the ironic.drivers.modules.inspector stack trace pops in the logs but maybe that's not it?15:54
dtantsurscott_: ironic-ramdisk-logs are IPA logs15:56
scott_ah gotcha! thanks @dtantsur15:57
dtantsurscott_: "no suitable virtual media device" may mean that the hardware does not support virtual media, at least the standard way15:58
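As a rough illustration of what "no suitable virtual media device" can mean: Redfish VirtualMedia resources advertise a `MediaTypes` list, and a client looking for somewhere to attach an ISO has to find a slot that lists a CD or DVD type. The sketch below works on plain dictionaries; the function name and the sample device list are hypothetical, and this is not Ironic's actual implementation.

```python
def find_virtual_media(devices, wanted=("CD", "DVD")):
    """Return the first device whose MediaTypes includes a wanted type."""
    for dev in devices:
        media_types = dev.get("MediaTypes") or []
        if any(t in media_types for t in wanted):
            return dev
    # Nothing matched: the situation the error message describes.
    return None


# Hypothetical device list, shaped like entries of a VirtualMedia collection.
devices = [
    {"Id": "RemovableDisk", "MediaTypes": ["USBStick"]},
    {"Id": "CD", "MediaTypes": ["CD", "DVD"]},
]

match = find_virtual_media(devices)
print(match["Id"] if match else "no suitable virtual media device")
```

If a BMC exposes no VirtualMedia entries at all, or none with a CD/DVD type, a lookup like this comes back empty regardless of whether power control and boot-device changes work.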
TheJuliait might not have an attachment?15:58
scott_hmmm -- it's definitely worked oddly -- all 3 of these machines had Windows Server on them before my dozen provisioning attempts and now they all have FCOS15:59
scott_just doesn't seem to have any particular regularity to them provisioning correctly16:00
JayFEven if vmedia is your end goal; I'd be curious if a pxe-based driver would work in your case16:01
JayFjust so we can narrow it down?16:01
TheJuliascott_: is there a 2.75.75.75 available?16:03
TheJuliafor firmware16:03
scott_I think I may end up having to head that direction anyways -- was just trying to avoid adding a separate provisioning network16:03
TheJuliaI do seem to remember some issues on that first version ages ago16:03
scott_@TheJulia -- I hadn't seen a newer available version but let me look!16:03
JayFEven with vmedia it's a good practice to have a separate provisioning network, fwiw :D (although significantly less dangerous than it was pre-agent-token)16:04
scott_from Dell's website "iDRAC7 has reached both the End of Sale as of February 2017 and End of Software Maintenance as of February 2020. The last release of iDRAC7 firmware is version 2.65.65.65. " leads me to believe no unfortunately16:05
scott_@JayF hahaha yeah understandable -- just trying to keep things pretty minimal in this lab environment but may just need to pull the trigger on adding a new network -- will look into doing it via VLANs if that's an option16:07
JayFJust making it clear that it still will hit the network during provisioning :D16:07
JayFI know that "good practice" sometimes is code for "a pain in the rear which might not be worth it" lol16:07
scott_hahahaha yeah definitely16:08
TheJuliascott_: https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=krcxx16:09
* JayF adds "Dell whisperer" to his list of TheJulia's magical powers16:11
scott_TheJulia: :O -- let me double check this will work before I brick an iDrac but thanks so much if it does!!! haha16:11
TheJuliaI *believe* that magical power is hjensas's16:11
TheJuliabut some of it may have rubbed off on me16:11
scott_hmmm when I put in my service tag on that page it returns "This driver is not compatible" but im certainly not opposed to trying it if you think it may work16:14
TheJuliadunno, I know we have applied it to some of our hardware, but there is a variety of hardware out there16:21
TheJuliahonestly, if you're super worried about the risk of bricking the BMC (because, I know from experience, it is just not fun to recover from), then PXE is the path forward since you're on a version that was used when the very initial development was taking place16:22
scott_TheJulia: I love a little excitement and risk taking in my day! Going to give 2.75.75.75 a go -- will report back shortly! Assuming it does brick my device or simply doesn't work, will likely go the PXE route as you, JayF, and dtantsur have mentioned16:34
scott_Yup appears no luck with 2.75.75.75 -- "Unable to extract payloads from Update Package."16:37
* JayF wonders if it's > if serviceTagWarrantyExpire: raise SomeError16:38
JayFI'm maybe a little more cynical than needed though :P 16:39
scott_hahahahaha that would be very unfortunate but since it works 1 out of 5 times, seems unlikely16:39
JayFI meant more the inability to upgrade but yeah, unlikely anyway :P 16:40
scott_ah gotcha16:40
scott_Well thank you guys regardless! Will try a couple variations on what I've been trying repeatedly (on a separate provisioning network, via the IPMI option, etc) before moving on to PXE 16:40
JayFgood luck :) sorry we were unable to get it working the way you wanted16:41
scott_not at all! was helpful nonetheless! much appreciated16:41
scott_will let you guys know if it unexpectedly seems to work with some minor tweak for archives sake16:42
JayFfeel free to hang out in any event :) It means you get to be the first vote anytime we talk about what an operator would actually want ;) 16:45
opendevreviewDmitry Tantsur proposed openstack/ironic-specs master: [WIP] Merge Inspector into Ironic  https://review.opendev.org/c/openstack/ironic-specs/+/87800118:57
dtantsur^^^ Long overdue, I know :)18:57
dtantsurI've been delaying it until.. well, until I realized that if I don't write it down, I'll keep repeating it over and over again.18:57
dtantsurNot finished, but comments on the motivation section, as well as performance/scaling/upgrades are welcome18:58
dtantsuron this positive note, wishing y'all good nicht18:58
opendevreviewSteve Baker proposed openstack/ironic-python-agent-builder master: Add checksum generation support  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/87800921:06
JayFTheJulia: dtantsur: I think we're going to need another hour for vPTG to fit it all in, unless we want to cut some things21:32
TheJuliaI think it is reasonable21:32
TheJuliato add an hour, that is21:32
JayFTheJulia: dtantsur: I'm thinking adding a 1600-1700 UTC slot on wednesday, use it as a lightning round for all of the things we want to talk about quickly and move on (e.g. ARM CI/image publishing, kernel/ramdisk multiarch)21:32
TheJuliasounds good to me21:56
