Tuesday, 2023-10-10

opendevreviewVerification of a change to openstack/ironic master failed: Enable OVN CI  https://review.opendev.org/c/openstack/ironic/+/885087 00:41
opendevreviewVerification of a change to openstack/ironic master failed: CI: Fix our internal MTU settings  https://review.opendev.org/c/openstack/ironic/+/893112 00:41
opendevreviewVerification of a change to openstack/ironic master failed: Enable OVN CI  https://review.opendev.org/c/openstack/ironic/+/885087 01:29
opendevreviewVerification of a change to openstack/ironic master failed: Enable OVN CI  https://review.opendev.org/c/openstack/ironic/+/885087 04:12
rpittaugood morning ironic! o/06:54
opendevreviewAdam Rozman proposed openstack/ironic-python-agent master: implement basic-auth support for user-image download process  https://review.opendev.org/c/openstack/ironic-python-agent/+/890272 07:02
opendevreviewVerification of a change to openstack/ironic-python-agent master failed: Conditional creation of RAIDed ESP for UEFI Software RAID  https://review.opendev.org/c/openstack/ironic-python-agent/+/891609 07:41
opendevreviewAdam Rozman proposed openstack/ironic-python-agent master: implement basic-auth support for user-image download process  https://review.opendev.org/c/openstack/ironic-python-agent/+/890272 08:11
opendevreviewMerged openstack/ironic master: Add inspection hooks  https://review.opendev.org/c/openstack/ironic/+/893533 08:12
opendevreviewAdam Rozman proposed openstack/ironic-python-agent master: implement basic-auth support for user-image download process  https://review.opendev.org/c/openstack/ironic-python-agent/+/890272 08:21
opendevreviewMerged openstack/ironic-python-agent master: Conditional creation of RAIDed ESP for UEFI Software RAID  https://review.opendev.org/c/openstack/ironic-python-agent/+/891609 11:07
iurygregorygood morning Ironic11:36
TheJuliagood morning12:58
iurygregorygood morning TheJulia =)12:59
opendevreviewJulia Kreger proposed openstack/sushy master: Add a boot progress indicator  https://review.opendev.org/c/openstack/sushy/+/896835 13:04
opendevreviewJulia Kreger proposed openstack/ironic master: Reset parent_node values to uuids instead of names  https://review.opendev.org/c/openstack/ironic/+/889750 13:06
TheJuliarpittau: fixed the typo you spotted ^13:06
TheJuliaanother checksum failure. :\13:10
TheJuliawhiskey tango13:10
opendevreviewAdam Rozman proposed openstack/ironic-python-agent master: implement basic-auth support for user-image download process  https://review.opendev.org/c/openstack/ironic-python-agent/+/890272 13:27
opendevreviewDmitry Tantsur proposed openstack/ironic master: Extract generic image publishing code from image_utils  https://review.opendev.org/c/openstack/ironic/+/897681 14:00
dtantsurrpittau, refactoring part 2 ^^14:01
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/894918 14:01
dtantsurI hope it makes ^^^ testable14:01
rpittaudtantsur: ack, I'll have a look as soon as possible14:02
opendevreviewVerification of a change to openstack/ironic master failed: Enable OVN CI  https://review.opendev.org/c/openstack/ironic/+/885087 14:31
TheJuliahmm, a bug we have, it seems in the agent14:58
JayFTheJulia: the hashing things? I have a suspicion14:59
TheJuliait seems to be the last iteration of the transfer.... it almost seems like the buffer hangs up15:00
TheJuliaunder pressure15:00
TheJuliawe don't have enough information in the logs to know for sure, but there is definitely an issue where retry logic is needed15:00
JayFhttps://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/__init__.py#L27 15:01
JayFI basically have been waiting for the right eventlet version combo for this to bite us15:01
JayFand this behavior is kinda how I'd expect this to be shaped if it's causing us problems15:01
JayFI'm not saying it's the issue for sure by any means, I'm saying I've suspected for a while that this partial monkey-patch would cause this kind of issue15:02
* JayF looks to see if pypi version has been bumped in u-c recently15:03
JayFs/pypi/eventlet15:03
TheJuliaThat could definitely be part of it, if it silently lets us think the last packet hasn't been delivered yet15:03
TheJuliawhich seems super weird15:03
TheJuliawe need to fix the retry handling around the failure anyhow.15:03
JayFlatest eventlet, from Jan 2023, is in u-c15:03
TheJuliawhich will also make it just go away15:03
JayFso unlikely it's that15:03
TheJuliain theory15:03
TheJuliawell15:03
JayFwell lets see when it bumped in u-c15:03
TheJuliawe only relatively recently added enough logic to log what was happening there15:03
JayFbumped 9 months ago15:04
JayFso we've been running this code for 9 months at least15:04
JayFyou're suggesting it may have been failing this way and we couldn't tell, perhaps?15:04
TheJuliayeah, we added bytes transferred logic say... 3-ish months ago after seeing a failure in CI like this where checksum didn't match15:04
JayFwell, we didn't know it was failing *this way*15:04
JayFTheJulia: https://github.com/eventlet/eventlet/issues/798 does this match our usage pattern, perhaps?15:05
JayFthe bug matches up, the circumstances match up (high contention in CI)15:06
TheJuliait doesn't hang though15:08
TheJuliaThe iterator returns like there are no more chunks and thus the transfer is done15:08
JayF...is it possible for that to happen and be a code problem?15:08
JayFand not a networking problem?15:09
JayFor I guess I should say; a code problem at the level we operate15:09
TheJuliaI don't know yet, you're jumping to the conclusion you think it is while I'm still trying to wrap my head around it15:09
JayFyeah I'll leave you to your digging15:09
TheJuliaFrom what I know of requests, it should only be able to happen if it thinks the buffer is empty. HTTP downloads are notoriously problematic in that regard, because all it takes for a transfer to come up short is not completely verifying it15:10
TheJuliaso it could be any of the above and just a lost packet, really15:10
TheJuliaif it parsed the fin before the last packet, I could see that being the cause15:10
TheJuliabut that is still super weird15:10
rpittaugood night! o/15:34
ThiagoCMCHey folks,  I've set up Ironic using Bifrost in both a test environment (`testenv`) and a production lab with actual bare metal machines using IPMI & RedFish. I've set the Bifrost variables to `dhcp_provider: none` and `enable_inspector_discovery: true`. Everything works smoothly; the bare metal machines can PXE boot, enroll automatically, and have an OS deployed successfully. However, there's a challenge. If an `active` machine is 15:39
ThiagoCMCmanually set to PXE and rebooted, it mistakenly starts the IPA process again. In systems like Foreman or MaaS, DHCP management ensures that provisioned nodes don't repeat the "discovery/enlistment" process, even if PXE is the primary boot option. Instead, they boot from local storage. How can this be achieved with standalone Bifrost Ironic? I should note that I used `dhcp_provider: none` because, without it, the `dhcp-boot` entries 15:39
ThiagoCMCare missing in the `dnsmasq` configuration, preventing correct PXE booting. However, this setting feels counter intuitive.15:39
ThiagoCMC*Sorry for the multi-line message!*15:39
JayFThiagoCMC: so I think there's supposed to be something that blocks known-servers from being rediscovered based on their mac15:42
TheJuliaThiagoCMC: yeah, that is a downside of not having a fully managed dhcp service. Realistically we do manage the network boot state, and if you're using UEFI (you really really should be), we insert a next boot record anyway which overrides it all. There *is* some hardware out there which disregards new boot overrides in some weird cases (mainly supermicro gear), in particular when IPMI and not Redfish is being used.15:42
JayFbut bifrost and inspection is not my primary area of expertise 15:42
JayFor I'm wrong :D 15:42
TheJuliaThiagoCMC: I *think* stevebaker[m] has done some work about the direct management of dnsmasq, but I just don't know where that is at. He should be awake in 3-4 hours.15:43
TheJuliaJayF: there is not a thing to block known systems with bifrost, it is generally static. The model is we set up the machines, and then they stick to what they've been told to do15:44
TheJuliaIf one is using a fully integrated stack/flow, then it is much easier to do, but generally with regard to inspection it means you're moving the machine off the original network where you do those activities15:45
JayFI think I'm remembering some old-old-old model stuff where we used iptables to block off known servers from the dhcp server? or vice versa (to only allow known servers in?)15:47
JayFbut the more I think of it, the more I think my recollection is from an implementation that was removed years ago15:48
TheJuliaSo that does exist in inspector, but not configured in bifrost15:48
JayFaha15:48
TheJuliain *large* part because ironic and inspector *share* the dnsmasq15:48
TheJuliait is a single configuration which serves both purposes15:48
TheJuliabasically ThiagoCMC is ending back up in discovery, unfortunately15:49
JayFthat fits, ty for filling the gaps in15:49
JayFand it sounds like you nailed the problem15:49
TheJuliayeah, downside of trying to be lightweight and compact too15:49
TheJuliayou can't have that and have lockout of known entities15:49
TheJuliaThiagoCMC: ... we *should* have persistent config for those nodes stored locally. Are all the nics enrolled in ironic?15:50
TheJuliawheeee meeting time is about to start15:50
* TheJulia needs more coffee15:50
opendevreviewMerged openstack/ironic master: Enable OVN CI  https://review.opendev.org/c/openstack/ironic/+/885087 16:01
JayFTheJulia: no, you need a party hat16:02
* TheJulia turns le castle vania back on and begins dancing on the finance committee call16:02
opendevreviewJulia Kreger proposed openstack/ironic-python-agent master: Retry on checksum failures  https://review.opendev.org/c/openstack/ironic-python-agent/+/897853 16:15
TheJuliaJayF: ^ since we've sort of seen the same thing with unreliable NICs, I'm inclined to think some bytes got lost in a buffer and it just doesn't get detected due to the fun interactions at play16:24
* TheJulia resumes dancing16:24
JayFit's a good fix for some problems even if it may or may not address a root cause for this one (I'm not saying it doesn't; I'm saying *shrug*)16:25
TheJuliaif we're losing a fragmented packet, we may not be able to see it with the way the layering works16:26
TheJuliasince there is no size verification when streaming content16:26
TheJuliayeah, it is a complex area, we can only do so much, so better is all we can really do short of being able to go "ohh, I can reproduce it right meow()!"16:27
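The retry-on-checksum-failure idea discussed above can be sketched roughly as follows. This is a minimal, hypothetical Python sketch and not the code from the proposed ironic-python-agent change: `download_with_retry`, `fetch_chunks`, and `ChecksumMismatch` are illustrative names, and the chunk source is a plain callable so the example runs without a network. It assumes the expected SHA-256 digest (and optionally the size) is known before the transfer starts.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when every attempt produced data that failed verification."""


def download_with_retry(fetch_chunks, expected_sha256, expected_size=None,
                        max_attempts=3):
    """Stream chunks, verify size and digest, and retry on a mismatch.

    fetch_chunks is a hypothetical zero-argument callable that returns a
    fresh iterator of bytes chunks on every call (for example, a wrapper
    around an HTTP streaming response).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        hasher = hashlib.sha256()
        received = 0
        chunks = []
        for chunk in fetch_chunks():
            hasher.update(chunk)
            received += len(chunk)
            chunks.append(chunk)
        # A stream that ends early looks like a normal end of iteration,
        # so compare the byte count whenever the expected size is known.
        if expected_size is not None and received != expected_size:
            last_error = (f"attempt {attempt}: got {received} bytes, "
                          f"expected {expected_size}")
            continue
        if hasher.hexdigest() != expected_sha256:
            last_error = f"attempt {attempt}: checksum mismatch"
            continue
        return b"".join(chunks)
    raise ChecksumMismatch(last_error)
```

Restarting the whole transfer on a mismatch is the simplest choice; it does not explain why the stream came up short, which matches the point above that a retry makes things better without necessarily addressing a root cause.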
JayFheads up; I'll be AFK most of the afternoon today17:01
TheJuliaAck ack17:01
JayFTheJulia: this works out for  how we have things scheduled; but I'm basically going to miss the last half of Ironic Thursday vPTG17:27
JayFTheJulia: since that is OVN+Redfish day; would you mind owning moderation for that day?17:28
JayF(I'll be moderating the TC PTG during that time)17:28
TheJuliaJayF: not at all17:33
TheJuliawill do17:33
JayFthanks!17:33
TheJuliajust add a note to the schedule17:33
* TheJulia cackles with gleee17:33
JayFI added moderators to each day17:34
TheJuliamuchas gracias17:37
ThiagoCMCThanks for the insights! I appreciate your help. To confirm, I use EFI exclusively. I now have a clearer understanding of Bifrost/Ironic's current implementation and realize it's not something I missed. I'll explore workarounds based on your guidance. Cheers!19:03
stevebaker[m]good morning19:05
ThiagoCMCMorning!  ^_^19:05
JayFThiagoCMC: No problem, we are here to help. Feel free to hang out and ask more questions :D 19:08
JayFThiagoCMC: start customizing your agent and ask questions about that and maybe I can help, too ;) 19:08
TheJuliaThiagoCMC: since we do set hardware to boot from disk and boot from specific records, we just don't expect them to go back to network unless there is a specific cause, so if the concern is "it is happening", then that is a different issue that we would likely like to better understand19:08
stevebaker[m]ThiagoCMC: Setting dhcp_provider: dnsmasq should add a dhcp filter exclusion for known nodes, but I don't know what other effects it will have on your use case19:20
ThiagoCMCstevebaker[m], adding ` dhcp_provider: dnsmasq` disables the `dhcp-boot` entries in the `dnsmasq` conf, so the machines don't even PXE boot. Source: https://github.com/openstack/bifrost/blob/stable/2023.1/playbooks/roles/bifrost-ironic-install/templates/dnsmasq.conf.j2#L9619:24
ThiagoCMCI just realized that this line is different in Bifrost `2023.2`!19:25
ThiagoCMCThere is now a new `or enable_inspector_discovery | bool` in there. Which was something I initially thought about doing.19:25
ThiagoCMCJayF, TheJulia, thanks again for being so welcoming and helpful! I'll stick around. My challenges are more about the unpredictability of the machine life-cycle - sometimes people change settings without realizing it. I'm also comparing Ironic with MaaS and Foreman in the same lab, trying to weigh the pros and cons of each. JayF, I'll surely take you up on that offer; customizing my agent is on the horizon, and I'll need all the 19:27
ThiagoCMChelp I can get! 😄19:27
TheJuliaThiagoCMC: understand completely! Pesky humans changing hosts!19:28
ThiagoCMCYeah lol19:28
ThiagoCMCMaybe the solution is with Bifrost 2023.2!19:28
* TheJulia goes off to $nextmeeting realizing that the odds of technical work being done today is rapidly approaching no chance19:28
ThiagoCMCI want to try... But there's an error, as follows:19:29
ThiagoCMCstevebaker[m], I just tried to install Bifrost 2023.2, but there's an error: `ERROR: 404 Client Error: Not Found for url: https://releases.openstack.org/constraints/upper/stable/2023.2` - Any idea?19:29
* TheJulia raises an eyebrow19:29
ThiagoCMCThis happens when I try `bifrost-cli install ...` from `stable/2023.2` branch.19:30
JayFUh19:45
JayFI really should close IRC if I'm going AFK in the afternoon so I don't see scary things like that ;) 19:45
JayFthat looks like maybe a url change19:46
JayFI'm going to actually close this window and not dig for now19:47
JayFThiagoCMC: please file a bug at bugs.launchpad.net about this, and re-link it here, if stevebaker[m] or someone else here doesn't get you fixed up19:47
JayFThiagoCMC: under bifrost project, if it doesn't exist, under ironic project is fine19:47
* JayF actually fades into the background19:47
JayFThiagoCMC: in the meantime, just run bifrost from master19:48
JayF2023.2 bifrost and master bifrost are probably almost identical right now :)19:48
ThiagoCMCOk, I'll do it.19:52
stevebaker[m]ThiagoCMC: The URL has moved, I'll have a poke soon20:29
stevebaker[m]ThiagoCMC: In the meantime setting the upper constraints env might help, TOX_CONSTRAINTS_FILE=https://releases.openstack.org/constraints/upper/2023.2/upper-constraints.txt bifrost-cli install ...20:41
ThiagoCMCstevebaker[m], cool! It seems to be working. But I used it like this instead: `TOX_CONSTRAINTS_FILE=https://releases.openstack.org/constraints/upper/2023.2/ bifrost-cli install ...` - Because it already appends `upper-constraints.txt`.21:08
stevebaker[m]ah good21:11
JayFThiagoCMC: so just read your full thing re: machine life cycle; Ironic has a lot of features, if your hardware cooperates, around being able to  reset things back to a sensible state between deployments. It's one of the real feature differences between Ironic and some of the alternatives. We try to follow the machines' life as it goes from role to role cleaning it between21:15
JayFeach deployment21:15
ThiagoCMCYep, absolutely! Ironic's flexibility is impressive.21:22
ThiagoCMCOne of my next tasks is to see how to integrate vendor-specific tools like Lenovo's `onecli` into a custom IPA image, so the firmware gets upgraded during "enrollment/discovery". Hardware RAID on those machines is also going to be a nice challenge.21:25
ThiagoCMCBTW, how are you folks dealing with things like these (`onecli`, `storcli`, etc)?21:26
TheJuliaWe've generally been trying to drive vendors to improve out of band interfaces as opposed to internal CLIs and data passing21:26
ThiagoCMCHmm... Makes sense21:26
ThiagoCMCRedFish BTW?21:26
ThiagoCMCI mean, FTW lol21:26
TheJuliasince the Redfish APIs have been a standardization point as opposed to writing code to support each tool and random integration vendor for raid chips21:26
ThiagoCMCSounds awesome!21:28
TheJuliavendor mileage does, unfortunately, vary there as well21:29
ThiagoCMCYeah, I know the drill. I've been dealing with Lenovo, Dell, HP, and Supermicro. Each has its unique surprises.21:30
ThiagoCMCSo far we were using Foreman, but it kind of sucks lol21:31
ThiagoCMCMaaS is promising but limited in different ways.21:31
ThiagoCMCIronic seems the way to go! 21:31
TheJulia<321:31
ThiagoCMC^_^21:32
TheJuliathe dust in the air today must be bad, my allergies are bad enough I'm thinking of calling it a day21:32
* TheJulia lives in a desert area21:32
ThiagoCMCTheJulia, thank you for your help today! I really appreciate it... Take a rest!  =P21:33
JayFThiagoCMC: I'll say; such a hardware manager would be something we'd potentially take upstream, as long as the tooling it uses is publicly available (not that we'd build it in by default; but we'd need to point to where it is)22:07
JayFThiagoCMC: in which case you get the benefit of our review :D (I wrote a majority of the hardware managers we used to secure hardware between tenants for rackspace onmetal)22:07
ThiagoCMCOk, got it! I'll act as the "QA guy"  :-D22:08
JayFlol you'd have to be designer, dev, and qa guy22:08
JayFI am "wizened old architect" ;)22:08
ThiagoCMCSounds like fun! LOL22:08
JayF(That's kinda joking; but realistically if it was done upstream; we'd be consultants and you'd have to do the work and testing)22:09
JayFbut I have done it a lot, and am willing to help -- upstream or not :D22:09
TheJuliaJayF: what does that make me?!22:09
JayFTheJulia: I was about to make some kinda chair pun22:09
JayFTheJulia: then I realized you could just reflect it back upon me22:09
JayFTheJulia: Ironic is basically the living room of openstack22:09
TheJulialol22:09
ThiagoCMCI'll definitely need help  ^_^22:09
JayFMuch of the help you need already exists; we have good documentation around the interface there for IPA and several examples.22:10
JayFThose docs and examples were written by some clever wise young architect I once knew22:10
JayFbefore he got old, grizzled and grey 22:10
JayF:P 22:10
TheJuliaJayF: There are pluses to being the chair!22:11
TheJuliaJayF: get some hair dye, it fixes everything!22:11
ThiagoCMClolol22:11
JayFhttps://docs.openstack.org/ironic-python-agent/latest/contributor/hardware_managers.html + https://opendev.org/openstack/ironic-python-agent/src/branch/master/examples 22:11
JayFthose examples might need to be freshened up with service and deploy step decorators22:11
ThiagoCMCNice, thanks!22:12
TheJuliayeah, they do need to have notes about that22:12
JayFbut that's mostly the pattern, you might have to add a decorator in an extra place to expose any steps you make for usage in a deploy template or service steps22:12
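The pattern described here, a custom hardware manager that advertises its own steps, might look roughly like the sketch below. It is deliberately self-contained: `SketchHardwareManager` is a stand-in class defined in the example, not the real `ironic_python_agent.hardware.HardwareManager` base class, and `LenovoFirmwareManager`, `update_firmware`, `bundle_url`, and the numeric support level are hypothetical. The step dictionaries only loosely follow the shape shown in the IPA hardware manager docs linked above, so check those docs for the exact keys and for how steps get exposed to deploy templates or service steps.

```python
class SketchHardwareManager:
    """Stand-in for a hardware manager base class (not the real IPA one)."""

    HARDWARE_MANAGER_NAME = "SketchHardwareManager"
    HARDWARE_MANAGER_VERSION = "1.0"

    def evaluate_hardware_support(self):
        # 0 is used here to mean "this manager does not apply".
        return 0

    def get_deploy_steps(self, node, ports):
        return []


class LenovoFirmwareManager(SketchHardwareManager):
    """Hypothetical manager that would wrap a vendor firmware tool."""

    HARDWARE_MANAGER_NAME = "LenovoFirmwareManager"
    SUPPORT_LEVEL = 3  # assumed value meaning "specific to this hardware"

    def __init__(self, vendor):
        # In a real agent the vendor would be detected from the machine;
        # it is passed in here so the sketch stays runnable anywhere.
        self.vendor = vendor

    def evaluate_hardware_support(self):
        return self.SUPPORT_LEVEL if self.vendor == "Lenovo" else 0

    def get_deploy_steps(self, node, ports):
        # Priority 0 is assumed to mean "only run when explicitly requested".
        return [{
            "step": "update_firmware",
            "priority": 0,
            "interface": "deploy",
            "reboot_requested": True,
            "argsinfo": {
                "bundle_url": {
                    "description": "Where to fetch the firmware bundle.",
                    "required": True,
                },
            },
        }]

    def update_firmware(self, node, ports, bundle_url):
        # Placeholder: a real step would invoke the vendor tool here.
        return {"requested_bundle": bundle_url}
```

The step name in the dictionary matches the method name, so a caller can look the method up by name and pass the step arguments through.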
TheJuliaI'll try to do that tomorrow22:12
JayFI mean, that is prime low hanging fruit22:12
JayFif you wanna write it down and delegate22:12
TheJuliatrue true22:12
JayFI've been trying hard not to eat all the ripe low hanging apples to leave some quick wins for outreachy/mlh folks incoming22:13
TheJuliaThe chair is very bad at delegating, good at convincing others22:13
TheJuliaNext my wife will tell me it is my lot in life22:13
TheJuliabut... not to travel back in time with a space station22:13
JayF(ThiagoCMC: context on these jokes: I am chair of the TC this cycle; TheJulia has been chair of Open Infrastructure Foundation for ... what feels like a long, long time?)22:13
TheJuliaJayF: this is my 2nd year as chair, I served as vice chair for a year as well22:14
JayFthat might as well be forever22:14
JayFthose are post-covid years22:14
JayFthose are roughly equivalent to seven normal human years :P 22:14
TheJuliawait... seven!??!?!?!?!22:15
TheJuliaI can retires now?!22:15
* TheJulia feels sadness that retirement is so far away22:15
ThiagoCMCWow! It's very nice to reach you folks here so easily... S222:15
ThiagoCMCI'm feeling important lol22:15
JayFWe always try to be friendly; today I'm extra friendly because I took a mental health afternoon off. It's very successful, you can tell by the fact I'm here :D 22:16
JayF(it actually was successful, I feel much better now and got many useful things done)22:16
JayFand honestly, Ironic is used a lot by large faceless companies who don't even want to talk publicly about their infra; it's always nice to have a new person in here interested and willing to chat/ask questions/etc22:17
JayFCPU Cores is how you measure success in press releases; new happy users I've had a conversation with is how my brain measures success.22:17
ThiagoCMCCool, mental health is priority 22:17
TheJuliaI'm also generally super friendly with JayF... since we've known each other for $OMGIFEELOLDYEARS22:18
ThiagoCMCBTW, I also have some tips/suggestions for the Bifrost documentation (it's a bit hard to follow for starters, but I wrote my own internally to teach co-workers about Ironic and how to set it up easily - step-by-step).22:18
JayFTheJulia: I think we've crossed, for me at least, the "I've known you half my life" threshold :D 22:18
TheJuliaJayF: Yes, I believe so!22:18
JayFThiagoCMC: happy to hear them; also I'll note we will likely liberally merge change requests just to edit them22:19
JayFThiagoCMC: if you've not contributed to an opendev hosted repo before, I have some of the onboarding stuff collated for an MLH fellow who just started on my team, if/when you get to that point lmk and I can share it22:19
TheJuliaAs much as I want to go buy some hair dye, I need to roll towards the wifey's office and pick her up22:19
TheJuliaTonight is "meet the team" night for the local AHL team22:20
JayF(it's all documented; all I did was assemble the various links in one place to make it easier)22:20
JayFoooh, that's fun22:20
JayFwe had sad news; the best defenseman on the team "stuck" in the NHL22:20
JayFso we're probably going the whole season without him here :( 22:20
* JayF stepping away for a bit22:21
TheJuliaJayF: ouch :(22:21
TheJuliaJayF: anyway, have a wonderful remainder of the afternoon!22:21
ThiagoCMCJayF, sure! I'm down for it, never contributed though (shame on me). I've also been using OpenStack Ansible (and Ceph Ansible) for about 5~6 years. Big fan of Bash/Python/Ansible. I'll write a `.md` somewhere soon (about the docs)!22:22
ThiagoCMCGotta go... Night, night!   ^_^22:32
TheJuliag'night22:33
