Tuesday, 2023-10-17

rpittaugood morning ironic! o/07:39
dtantsurTheJulia, JayF, in-band clean steps are orthogonal to drivers, so I'm a bit puzzled by the discussion last night. If there is a step in IPA, you can use it today.09:20
dtantsurOr did you mean out-of-band really?09:20
dtantsurJayF: re automated clean template, we just need to expose https://docs.openstack.org/ironic/latest/configuration/config.html#conductor.clean_step_priority_override as a Node field09:22
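For context, a minimal sketch of that existing config-side knob as an ironic.conf snippet; the step name and priority are placeholders, and the key format is assumed to be interface.step_name:priority:

```ini
# hedged sketch: overriding an automated clean step priority via config
# today, versus the proposed per-node field
[conductor]
clean_step_priority_override = deploy.erase_devices_metadata:50
```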
TheJuliaJay was thinking of taking vendor-specific driver feature steps and making them usable across the board by a generic driver, so e.g. use generic redfish but still be able to invoke a special ilo driver management step.09:24
* TheJulia tries to go back to sleep09:24
TheJuliaI think the hope was to decouple the perceived need to maintain a whole driver from getting useful individual features09:25
dtantsurI'm not sure what is bad about it (we literally coined the notion of "hardware types"). Maybe we should just make driver composition easier.09:49
dtantsur(so, it is indeed about out-of-band steps, not in-band, right?)09:50
TheJuliaNothing bad, but practical challenges09:50
dtantsurCrossing the hardware type boundary will also cause practical challenges (like, the iLO one will expect ilo_address, which is not even the same format as redfish_address)09:51
TheJuliaYup09:51
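To make the field mismatch concrete, hedged driver_info shapes for the two drivers; values are placeholders and only the address fields matter here:

```python
# illustrative only: redfish expects a URL-ish address, ilo a bare host/IP,
# so a cross-driver step cannot blindly reuse the other driver's fields
redfish_driver_info = {
    "redfish_address": "https://10.0.0.5",  # scheme included
    "redfish_username": "admin",
    "redfish_password": "secret",
}
ilo_driver_info = {
    "ilo_address": "10.0.0.5",  # bare host/IP, no scheme
    "ilo_username": "admin",
    "ilo_password": "secret",
}
```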
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add inspection hooks  https://review.opendev.org/c/openstack/ironic/+/89266111:36
iurygregorygood morning Ironic11:41
masgharMorning!11:59
rpittaumasghar: hi! is that the last of the "hooks" patches? or are we missing something else? I lost count :D12:27
masgharrpittau: It's the last one :D12:37
rpittau\o/12:37
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/89491812:53
drannoudtantsur: crossing hardware types could be a good idea; we, for example, would like to be "platform" agnostic.13:08
drannoufor the moment we are using HPE hosts, but I don't want to be vendor locked13:09
TheJuliadrannou: but are you using specific driver features which would only ever work/exist on hpe hardware?13:16
dtantsurdrannou: I don't think what we discuss with TheJulia is related to the vendor lock problem13:23
dtantsurit's really a question of which driver you set to the node13:23
dtantsurIf we want something in Redfish, we can ask DMTF for it. Jacob and I are going through such a proposal process right now for a different thing.13:23
TheJuliaI think there are some small steps we can take to simplify some matters, which could also go to the standards bodies if needed, but vendors also traditionally try to drive differentiator features in the form of "value-add" for their customers... and to sell more hardware/license fees by locking some of those features behind upgraded use licenses13:25
dtantsurWell, licenses are something quite orthogonal here. They can easily hide parts of standard Redfish behind a paywall.13:26
TheJuliaAnd there *is* already a way for us to do cross cutting features, it is just not the fastest/easiest path if a vendor wants to drive it13:26
TheJuliaoh yes, absolutely13:26
TheJuliaexample, Supermicro13:26
TheJuliaand VMedia13:26
drannouTheJulia: in fact, nothing for the moment, but we will need license management and firmware upgrades.13:30
drannouThe problem is that I don't want to switch completely to ilo for just that13:31
dtantsurRedfish firmware upgrade has been implemented recently13:31
dtantsurLicense management is something we haven't looked at yet. It's probably highly vendor-specific.13:31
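For reference on the firmware point, a hedged sketch of what invoking the Redfish firmware update looks like as a clean step payload; the image URL and wait value are placeholders:

```python
# hedged sketch of a manual clean step payload for the Redfish firmware
# update step on the management interface
clean_steps = [{
    "interface": "management",
    "step": "update_firmware",
    "args": {
        "firmware_images": [
            {"url": "http://fileserver/bmc_fw_1.2.bin", "wait": 300},
        ],
    },
}]
```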
opendevreviewRiccardo Pittau proposed openstack/ironic master: [WIP] Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/89491813:36
TheJuliaso, there *is* a v1.1.0 LicenseService definition off the root in the standard13:51
dtantsurNice!13:53
TheJuliaLooks entirely disjointed from the system, as a payload to upload, but it also embraces the Oem field13:53
dtantsurAt some point, we may make an inventory of Redfish features we eventually want to support13:53
dtantsurCustom TLS certificates are probably at the top of this list, together with UEFI HTTP boot13:54
TheJuliaThere is also a license list resource13:55
TheJuliaLooks like this stuff got added back in 2021.113:56
TheJuliabut is not in the mockups13:56
TheJuliaat least, the couple I clicked into13:57
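A hedged sketch of poking at that LicenseService resource with plain requests; the endpoint layout follows the DMTF schema under discussion, not anything ironic supports today, and the BMC address and credentials are placeholders:

```python
# exploratory only: list license resources off the (2021.1+) LicenseService
import requests

resp = requests.get(
    "https://bmc.example.com/redfish/v1/LicenseService/Licenses",
    auth=("admin", "password"),
    verify=False,  # lab BMCs commonly use self-signed certificates
)
resp.raise_for_status()
for member in resp.json().get("Members", []):
    print(member["@odata.id"])
```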
TheJuliaSpeaking of UEFI HTTP Boot, I need to get back on that13:59
TheJulialooks like I broke something on the emulator schema :\13:59
* TheJulia tries to grok a thread and wonders if they are advocating nuking PLDM from the face of device interactions and replacing everything with redfish14:17
JayFdtantsur: the idea would be to think about a world where there would be no ilo hardware type, but instead you'd have something like a redfish hardware type and a bunch of plugins which just call custom redfish hardware APIs. but the world isn't lollipops and rainbows and all of the bonus stuff isn't in the standards-compliant places so this is a solution for a world that14:17
JayFdoesn't exist14:17
dtantsurheh, I see14:18
dtantsurMy less ambitious vision is for both ilo and idrac drivers to be built on top of the redfish one14:18
TheJuliaI'm wary of unicorns, but Legends of Tomorrow Season 4... Episode 4 streamed last night in the living room.14:18
JayFdtantsur: that kinda sucks operationally, to be honest, because then we still have more room for divergence over time14:19
dtantsurOnly if the reality diverges. Which is something we cannot prevent.14:19
JayFdtantsur: my hope would be for an operator to know+understand one driver, hopefully, and not have to figure out the incantation for each different vendor's hardware they buy14:19
JayF:( 14:19
TheJuliaDivergence is always going to occur on some level, via the value-add driver.14:19
dtantsurI mean... knowing that "ilo" is for HPE hardware is the least complex fact about Ironic you can learn14:19
JayFwe have a PTG session about ^14:20
JayFit's not simple14:20
TheJulia... not all HPE hardware though14:20
dtantsurnor is our redfish driver for all redfish hardware14:20
TheJuliayup14:20
dtantsur#sadbuttrue14:20
JayFthat is the piece that hurts me in my hurtin' place LOL14:20
dtantsuryou're not alone with your hurtin' place's hurt :D14:21
JayFthe 'redfish driver is not for all redfish hardware' ... is that a case we can detect?14:22
JayFlike are most failure modes for that "oh, it's missing support for X, Y, Z" or is it "yeah support for this is advertised but broken" (or "yes")14:23
dtantsurWe sometimes even try (see idrac-redfish-virtual-media in the recent past)14:23
dtantsurThe huawei driver is a counter-example, I think14:23
TheJuliaboth very good examples14:23
dtantsurAnd now we're going to have a Fujitsu's Redfish variant14:23
JayFhuawei driver is not called that upstream is it (?)14:23
TheJuliaone generally works, but we've found cases where firmware updates break it; the latter has four different possible field names to account for power state.14:23
dtantsurJayF: ibmc?14:23
JayFack14:24
JayFI never knew what h/w that was for lol14:24
dtantsur:D14:24
JayFnever worked a place that would've gotten their gear14:24
TheJulia... on the plus side, the ibmc contributor indicated they were aware of the noncompliance and intended to fix it14:24
dtantsurYou would need to move to Russia for that, which is something I'd not recommend14:24
dtantsur(or China, obviously, which I cannot recommend either)14:24
TheJuliaChina != Russia, but they both get cast in the same light oftentimes14:25
TheJuliaDifferent cultures and all14:25
* JayF puts on his best william wallace face and yells something about freedom /s14:25
dtantsurWell, if you don't want to get into details, and are lucky enough to not have to care about them, you can throw them in the same bucket14:25
JayF^ is pretty much where I am14:26
TheJuliadtantsur: this is true14:26
* dtantsur has started the very slow and complicated process of gaining the German citizenship, hold your fingers crossed14:26
TheJuliaI wouldn't mind at some point visiting china on a tourist visa, but the odds of that happening are slim14:26
TheJuliadtantsur: \o/14:26
masgharI heard from an ex-CEO-in-China recently that the people on the ground are super nice, but there are difficulties in running a business etc14:51
TheJuliaAlso the movement of funds in and out14:55
TheJuliaAs a business traveler, China is a difficult country to visit.14:55
TheJuliaoh wow, reading about PLDM and the internal modeling explains a LOT of how BMCs have typically viewed and provided networking details14:58
dtantsurPLDM?15:08
TheJuliaoh, wow, and the interaction explains why it is often slow15:08
TheJuliaPlatform Level Data Model15:08
TheJuliahow devices on the motherboard are supposed to talk to the BMC15:08
dtantsurAKA dark magic reference :D15:10
iurygregoryhey TheJulia I was talking with dtantsur about a bug downstream involving multipath, scenario: a hardware setup has 84 devices with 4 paths each. Dmitry had an interesting idea: maybe we could cache the output of https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L233 and avoid doing two more calls in our code (if I recall correctly they are reaching timeouts when trying to provision the 15:10
iurygregorynode), do you think this would be safe to cache?15:10
TheJuliaIn the MCU, it might be titled "The Darkhold"15:10
iurygregorydark magic reference lol :D15:11
TheJuliaCache makes sense until we encounter an IO error and feel the need to reset said cache, fwiw15:11
TheJuliaThe *odds* are an IO error due to changes comes more from something like a cable getting unplugged15:12
JayFif we're dealing with multipath, remote IO15:12
JayFand we lose a device/topology changes15:12
TheJuliayeah15:12
JayFshouldn't that be a provisioning-aborting event?15:12
JayFor does that sorta intentionally go against the HA nature of the multipathing?15:13
dtantsurIf that's not a root device, we can survive it15:13
TheJuliasort of yes, but via the cache we also know the kernel's multipath device name, so we can continue to operate/resolve15:13
JayFI am unsure if in this case we should be using the HA (?)15:13
TheJuliaoh, the kernel will make sure you keep working if you're interacting with the mpath device :)15:13
JayFoh so the idea is, I got an I/O error using this path, I'll try this one15:13
* TheJulia has pulled many a fiber cable15:13
JayFit's like you have ... multi[ple] path[s]!15:13
TheJuliaJayF: the kernel does it for us :)15:13
JayFdoes this mean I'm a storage engineer now? Where can I pick up my LUNs? /s 15:14
TheJuliaYou can pick up your LUNs once you rack/stack/cable a 500TB-1PB storage system15:14
TheJuliaoh, and configure the SAN controllers15:14
JayFI don't like SAN, it's rough and sticky and gets everywhere15:15
* dtantsur has realized he can no longer list all possible hardware interfaces by heart and is a bit upset15:16
iurygregoryin my mind the easy path was to use skip_block_devices XD "hey ironic, don't look at these devices please" lol15:16
TheJuliaNow remember https://www.etsy.com/listing/1361107953/torch-and-fire-extinguisher-holder makes things *much* better with SANs15:16
dtantsuriurygregory: that's not a terrible idea, but I'm afraid we'll anyway try to list them first15:16
dtantsurand then you need to add Metal3 API to pass this skip_block_devices...15:16
iurygregoryyep15:16
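For reference, a hedged example of that skip_block_devices escape hatch as a node property; the device hint is a placeholder and follows the root device hint syntax:

```sh
# hedged example: tell IPA to ignore specific devices on this node
baremetal node set <node-uuid> \
    --property skip_block_devices='[{"name": "/dev/sda"}]'
```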
TheJuliaenumeration is dynamic guys15:16
JayFdtantsur: you can never list the ones that I wrote that are locked up in some private repo in rackerlabs/ github repo somewhere ;) 15:17
TheJuliadepending on SAN state and FC login query15:17
TheJuliaand SAN response speed15:17
JayFdtantsur: oh, you mean interface, not manager, just kidding15:17
TheJuliaso don't expect stability *at all* unless you're matching WWNs15:17
dtantsurJayF: you got me scared for a minute: new downstream hardware interfaces, that would be.. something15:17
JayFyou're talking about agent15:17
JayFand rescue15:17
* dtantsur suspects someone may have a graphical console interface somewhere in production15:17
JayFyou know that, right? LOL15:17
dtantsurwellllllll :D15:18
iurygregoryi don't even think it's a SAN, to me it's a JBOD they are using hehe15:18
dtantsurTheJulia: I keep telling people that, but people INSIST on using device names... anyway15:18
TheJuliaiurygregory: is it showing more than one path per device?15:18
dtantsurTheJulia: 4 paths for each of the 84 devices15:19
TheJuliaiurygregory: whiskey15:19
TheJulia?$?15:19
dtantsurhundreds of block devices per machine15:19
iurygregoryyeah15:19
TheJuliawtaf15:19
dtantsurfun right?15:19
TheJuliaI've seen 2 paths to a JBOD array, never 4 before15:19
dtantsurIt takes IPA a few minutes to loop through them.. and then it starts again.. and again... and again......15:19
TheJulia... are we sure it is not a JBOD array via FC?15:19
dtantsurI'm personally not sure about anything in that setup15:19
* TheJulia laughs nervously as she logs into the case management system to see what the customer uploaded15:20
dtantsurLet's hope it's cat photos15:20
iurygregoryyes please15:20
iurygregoryand corgi photos also!15:20
*** dking is now known as Guest372615:21
TheJulia*anyway* caching the multipath command output is stable enough for IPA's execution15:21
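A minimal sketch of the caching idea just blessed above, assuming a module-level cache that is dropped when an IO error suggests the topology changed; list_all_block_devices is the real IPA helper, but the wrapper and its call sites are hypothetical:

```python
# hedged sketch: cache the expensive block device enumeration for the
# lifetime of the agent run, resetting on IO errors
from ironic_python_agent import hardware

_block_device_cache = None


def cached_block_devices():
    """Return the cached device list, scanning only on the first call."""
    global _block_device_cache
    if _block_device_cache is None:
        _block_device_cache = hardware.list_all_block_devices()
    return _block_device_cache


def invalidate_block_device_cache():
    """Drop the cache (e.g. on IOError) so the next lookup rescans."""
    global _block_device_cache
    _block_device_cache = None
```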
iurygregoryok, I will try to think on how to do this after grabbing lunch15:21
iurygregoryand coffee15:21
iurygregorybrb15:21
dtantsurDoes anyone know a usable way to alternate between two shared screens?15:23
JayFon linux?15:23
JayFand do you need audio?15:23
dtantsurLike, slides and a console. But without turning the sharing on/off or sharing the whole screen.15:23
dtantsurJayF: on linux, audio not required15:24
JayFI have a great answer for you if both of those are yes15:24
JayFperfect15:24
TheJuliayeah, the list forces a scan, so it is a steady enough state15:24
JayFdtantsur: obs has a plugin which will output to a v4l loopback device15:24
JayFdtantsur: instead of using share screen function; use webcam function to share the v4l loopback device15:24
JayFdtantsur: now you have OBS studio fully functional to swap out what you're sharing on your screen between slides/console15:24
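A hedged sketch of the kernel-side prerequisite for that setup, assuming the v4l2loopback module is installed; the module parameters shown are optional niceties:

```sh
# create a loopback video device OBS can output to; meeting tools then see
# it as a regular webcam
sudo modprobe v4l2loopback devices=1 card_label="OBS Virtual Camera" exclusive_caps=1
```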
TheJuliaAwww, customer redacted the pathing details enough that I can't tell if there is more than one storage controller port in the mix15:25
dtantsurJayF: nice! I need to rehearse that, but sounds like it could work.15:26
dtantsurI suspect I can even display my small face in the corner15:26
TheJuliaiurygregory: so, I'm inclined to think *not* a JBOD, but a proper SAN, because it looks like 3 different devices are in the mix locally based upon bus id markers, but they are also not sequential so ... dunno15:27
TheJuliadtantsur: I did that once with an f-key to toggle the source, it worked really well15:27
TheJuliaplan 1-2 hours for learning curve and all15:28
JayFdtantsur: exactly, it's a pretty good model, and even if you can't make it work via a webcam, you can open something like gvc or cheese and share that as your screen15:28
opendevreviewTakashi Kajinami proposed openstack/ironic-inspector master: Fix python shebang  https://review.opendev.org/c/openstack/ironic-inspector/+/89858715:28
dtantsurJayF++ TheJulia++15:28
JayFoh and also15:28
JayFtest the hell outta it15:28
TheJulia+++++++++15:28
JayFOBS on linux is flakey as hell15:29
JayFin my world, it was always audio so no audio should help a lot15:29
dtantsurI have some experience with OBS already, but only with regular recording.15:29
TheJuliaAlso, if you're doing anything like green screening with your background15:29
JayFI tried to do streaming to twitch for a while from Linux; I had something like a 10% incidence rate of the sound just ... dying mid-stream15:29
JayFno errors from OBS, no indicators in the logs, it just ... stopped15:29
dtantsurIf I share that in a call, the sound will go through the regular means15:30
dtantsur(which also has a small chance of just not working)15:30
JayFyeah so I think you'll be fine, I'm just making the point that it's not the most rock solid platform on linux at this point15:30
dtantsurSince when is Linux rock solid? That's one of the things we love about it :D15:30
JayFalso note that I'm doing all this on gentoo so there's a nonzero chance it was just broken because of that point release of ffmpeg, plus that point release of obs, plus a full moon and my cflags or some craziness lol15:30
dtantsurLOL15:31
JayFdtantsur: dead serious; I run linux these days on things other than my gaming machine because I got so dang tired of feeling like my own devices were trying to sell me crap15:31
TheJuliaRemember, do not taunt the darkhold of professional video tools :)15:31
dtantsurhehe15:31
dtantsurI tried making a video from fragments.. I've seen some horrors...15:31
*** Guest3726 is now known as dking15:32
TheJuliaI think I understand PLDM enough to hold down a conversation15:32
TheJuliaI may have had to push some python bits out of my brain for this15:32
dtantsurJayF: ikr.. I'm afraid our ancient phones will need replacements soon, so I'm going to choose between the android crap and the apple crap...15:32
clarkbI would probably do that using an xmonad tiling mode that emulates workspaces on a single workspace15:33
dtantsurTheJulia: is there a short(ish) overview to read?15:33
dtantsurclarkb: but will your browser be able to share only one such workspace?15:34
dkingIs anybody here familiar with troubleshooting SQLAlchemy errors "QueuePool limit of size X overflow Y reached, connection timed out..."? I'm seeing those in my Ironic logs, and I can connect directly to MariaDB, and I don't know how to troubleshoot SQLAlchemy outside of adjusting the code.15:34
TheJuliadtantsur: https://www.dmtf.org/sites/default/files/standards/documents/DSP0240_1.0.0.pdf15:34
dtantsurthx15:34
TheJuliathe rest of the related docs get into the nitty gritty of PLDM, but the base model *appears* to be rather flexible, if command/response driven15:34
dtantsurdking: the first time I hear about it. is it linked with some sort of activity?15:34
TheJuliaso "give me your state" -> "Here is my state" after you ask "what do you support" "here is what I support"15:35
clarkbdtantsur: ya I would share the single workspace then use window management to swap windows around appropriately on the workspace15:35
dkingdtantsur: I'm not certain. I don't know of any special activity going on. We're not running that many nodes. It's showing up with: ERROR futurist.periodics [-] Failed to call periodic 'ironic.conductor.manager.ConductorManager._sync_power_states'15:35
dtantsurclarkb: Right, makes sense. I'm on MATE nowadays though15:36
TheJulia... So if memory serves the connection pool is like 8 connections to start15:36
TheJuliadking: how much concurrency are you running for power state sync?15:37
dkingHere, I'm seeing "limit of size 5 overflow 50 reached". I'll have to count, but I didn't think we had 50 nodes in it.15:37
TheJuliadking: is the version of mariadb on sqlalchemy's list of supported versions?15:38
dtantsurTheJulia: ouch, even the glossary is quite a text to read15:38
TheJuliadking: that sort of sounds like mariadb is locking or has a load balancer dropping the connection15:38
dkingTheJulia: I can check. We're using Metal3, so it's likely whatever comes stock with that.15:39
TheJuliadking: as a starting point in db troubleshooting, try making a backup on the server side15:39
TheJuliaoh15:39
TheJuliahmm15:39
JayFIf you're using metal3, there is also potentially value in asking in their slack if we're not able to figure it15:40
TheJuliaso no load balancer15:40
dkingTheJulia: I thought about that, but it's not really using a load balancer. I think it might just be 1 replica, and I can connect to the DB just fine from within the mariadb container. But those are good things to check.15:40
TheJuliais the db in a separate pod?15:40
JayFof course half the people are in both channels so....15:40
dkingJayF: Yeah, the best people for Metal3 are in here, too. :)15:40
dtantsurdking, TheJulia, a useful thing to know about Metal3 is that its MariaDB image is minimized on purpose15:40
dkingTheJulia: No, it should be the same pod.15:40
TheJuliadking: there could be a transaction deadlock... but that should get detected... there is still a chance a held lock is not detected15:41
dtantsure.g. https://github.com/metal3-io/mariadb-image/blob/main/runmariadb#L22 used to be even lower than this15:41
dtantsurhttps://github.com/metal3-io/mariadb-image/blob/main/runmariadb#L46-L48 is a suspect too15:41
TheJuliawhich is stupidly weird15:41
dtantsurdking: I'd start with playing with these values ^^^15:41
TheJuliaas are max connections, since that can scale based upon what is going on15:42
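To make the error message concrete, a hedged illustration of the SQLAlchemy pool settings it refers to; the URL is a placeholder, and in ironic these knobs are normally tuned via [database]max_pool_size / max_overflow rather than in code:

```python
# illustration only: "QueuePool limit of size 5 overflow 50 reached" means
# all pool_size + max_overflow connections were checked out and none freed
# up within pool_timeout seconds
from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://ironic:secret@mariadb/ironic",  # placeholder URL
    pool_size=5,      # the "size 5" in the error
    max_overflow=50,  # the "overflow 50" in the error
    pool_timeout=30,  # seconds to wait before timing out
)
```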
TheJuliaif you can't dump the db or it hangs, then lock is likely it15:42
TheJuliabut dump *through* the server, not from the files (i think that got removed in mariadb...)15:43
* TheJulia has a rather dusty DBA hat someplace15:45
* TheJulia also had the arcane book of databases someplace.... 15:45
dtantsur(side note: "the same pod" thingy is something I'd like to fix eventually)15:45
dkingWell, I'm currently wondering if I can troubleshoot the issue live, as I suspect it won't reproduce (or at least not quickly) if Ironic gets restarted.15:48
TheJuliayou can try and get a feeling from the DB, but yeah, any restart is going to change things15:49
dkingAnd it didn't happen right away, so it may take a while to show up again, if ever. I'm going to check the logs to see how long it's been going on. At least I have some configs that I can play with.15:51
dtantsurdking: the recommendation about slack is not too bad, because the Ericsson folks do use MariaDB (unlike us in OpenShift)15:52
dkingSo, for now, the consensus is that something hung, we don't really know where, but restart, and if it comes back, attempt to increase the limits mentioned above?15:52
dtantsurthat's the only thing that comes to my mind15:53
TheJulialikewise15:53
dkingOkay, thank you very much!15:53
JayFIs metal3+mariadb a common deployment (with the built-in, slimmed mariadb container?)15:53
dtantsurJayF: yes. I don't know the proportion, but both sqlite and mariadb are common (and mariadb was there first)15:54
JayFack15:55
opendevreviewTakashi Kajinami proposed openstack/ironic-inspector master: Fix python shebang  https://review.opendev.org/c/openstack/ironic-inspector/+/89858716:02
rpittaugood night! o/16:16
TheJuliao/16:19
* iurygregory is back16:26
iurygregoryTheJulia, gotcha, I will look at caching now to try to improve this (fingers crossed)16:28
TheJuliaokay, cool!16:28
iurygregoryTheJulia, newbie question, but do you think changing ironic.conf [agent]command_timeout  ( https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/agent_client.py#L206 ) could help with their issue? since ironic logs show a lot of  https://paste.opendev.org/show/bXAGwIusnE3JdjWTWE43/ 16:50
TheJuliaI would think unlikely16:51
TheJuliaand that feels like a workaround for whatever state the agent is in when the call hits16:51
iurygregoryyeah, it's more a workaround just to see if it helps them16:52
iurygregorytill we figure out the cache, and hope that will help them16:52
iurygregorythe first problem they had was during inspection, so increasing the timeout helped them16:53
dtantsurwell, "helped"16:53
dtantsurafter you walk them through raising all possible timeouts everywhere (and you'll need to), the final question will be "please make it so that the process does not take 6 hours" :D16:54
TheJuliaI guess the only way to know for sure is to correlate agent logs16:54
TheJuliadtantsur: ++16:54
iurygregorydtantsur, yeah16:54
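For reference, a hedged sketch of the workaround knob being discussed; 180 is an arbitrary placeholder (the default is 60 seconds):

```ini
# hedged workaround: give the agent longer to answer a command before the
# conductor gives up; this treats the symptom, not the cause
[agent]
command_timeout = 180
```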
dtantsurAgent timeouts make me think about insufficient eventlet monkey patching16:55
dtantsurOtherwise why is IPA not responsive?16:55
TheJuliaor, is it transient and unrelated?16:55
TheJuliathere are a few different variables, you almost need a timeline of interaction drawn from both sides16:55
JayFthis is the same issue that started with multipath, yeah?16:56
dtantsuryep16:56
JayFI'll note that IO is one of the places that can lock up in a way that python can do nothing whatsoever about it16:56
TheJuliaat 12:01:31.0311 the agent started waving a white flag saying "please, cache the data"16:56
TheJuliaThe conductor continued to think it was dancing until 12:02:3116:56
*** dking is now known as Guest374117:20
*** Guest3741 is now known as dking17:20
dkingdtantsur: Are you still around?17:40
dtantsurdking: I'm not far from the computer17:40
dkingI asked a Metal3 question on slack, but it seems slow there. I ended up having to restart my pod, and the Ironic DB got cleared. I believe that BMO populates Ironic, and it did put my inspecting node back. In the past, I thought that it also put back the other nodes, but it hasn't done so yet. Is there a way to force it to populate Ironic with the nodes it has provisioned?17:43
dtantsurdking: it should eventually, but for nodes that are in a stable state there will be some delay until the reconciler gets to them17:43
dtantsur(Ericsson folks are also in Europe, so you'll need to get back in the morning)17:43
dkingdtantsur: That's good to know. It's been about 45 minutes, so I was starting to worry. Do you know roughly how long before starting to worry? 17:45
dtantsurhmm, 45 minutes IS quite long to my taste. Anything happening in the baremetal-operator logs?17:45
dkingdtantsur: Hmm. I see: {"level":"info","ts":"2023-10-17T17:46:35Z","logger":"controllers.HostFirmwareSettings","msg":"provisioner returns error","hostfirmwaresettings":{"name":"***","namespace":"machines"},"Error":"could not get node for BIOS settings: Host not registered","RequeueAfter":30}17:47
dtantsurwell, that's fair - the node does not exist17:47
masgharWhat is the RIBCL business with HPE servers, with Redfish being its alternative in iLO - do we enable RIBCL on the server explicitly?17:48
dkingdtantsur: Not in Ironic. I was thinking that BMO would re-populate an empty Ironic. Would it only be able to update existing Ironic nodes?17:48
dtantsurdking: no, it (should) always create nodes. I'm not yet sure why it does not - the logs may have a clue (but you may need to find it amid other logging)17:49
JayFmasghar: I'm not sure; we have HPE downstream driver devs joining us for a PTG session to talk about the future of HPE drivers in Ironic and how those kinds of changes impact it17:49
dtantsurmasghar: RIBCL is their previous protocol. They've been moving to Redfish for a long time already (they were among the founders of Redfish)17:50
masgharJayF: and dtantsur: I see. I'm trying to 'manage' an HPE server and it's failing with RIBCL, and Redfish is trying to use https when http is actually what should be used17:51
dtantsurmasghar: I can take a look tomorrow (nearly 8pm, c'mon! :)17:52
dtantsurMake sure you're using the Redfish driver, I have no experience with the iLO one17:52
masgharAh yes, of course! Thank you17:52
dtantsur(and check the firmware version against our docs - that's what we tested)17:52
masgharI will switch to redfish and see, and check the firmware version too17:53
dtantsurmasghar: you should call it a day too, it's 8pm for you just the same ;)17:53
dtantsur(I'm actually watching youtube with IRC open on another screen)17:53
masgharI started pretty late so its alright ^-^17:53
JayFdtantsur: that used to be more common for me17:53
JayFdtantsur: note my lack of commentary in here late-PST since starfield released ;) 17:53
dtantsur:D17:54
dtantsurmasghar: for us in OCP, the interesting drivers are idrac+idrac-redfish* for Dell, just redfish for everything else (FJ folks are testing their stuff themselves)17:56
dtantsur(we only have best effort support for the ilo driver and only for iLO 4)17:56
JayFilo driver does work pretty much outta the box for ilo4/5 stuff, but the things you need in driver_info are different17:56
JayFfor upstream ironic sake17:56
masghardtantsur: noted, thanks! One thing in the HPE docs section is a bit outdated, I did note17:57
dkingdtantsur: Nevermind. It must have just taken a bit. They seem to be back now.17:57
dtantsurJayF: right, we've made a decision to focus much more on Redfish in the Metal3 world17:57
masgharJayF: alright17:57
JayFdking: I prescribe you a walk and a cup of coffee next time before troubleshooting ;) 17:57
JayFdtantsur: makes sense, and fwiw I think that's the correct decision too17:58
dtantsurdking: eventual consistency, amiright?17:58
JayFeventual consistency makes operators nervous (at least it made me nervous)17:58
JayFbecause our pagers go off until "eventual" comes about :P 17:58
dkingJayF: I just got worried because it was about an hour without nodes in Ironic and I wasn't completely sure that BMO was supposed to even populate provisioning nodes into Ironic.17:58
JayFhonestly that's a metal3/bmo question more than an Ironic one17:59
dtantsuran hour is wild, but I'm not sure how often the reconciler runs17:59
dtantsuryeah17:59
JayFand probably is more related to how slimmed down things are in those containers17:59
dkingI'm just not used to "eventually" being so long. :) But it looks like it was fine in the end.17:59
JayFit's impressive how much Ironic can scale up and down these days17:59
dtantsurdking: it's something we've discussed internally already: when BMO does not know that Ironic was gone, it cannot run an out-of-order reconcile17:59
dtantsurstill, an hour.. wow17:59
dkingYeah, and only 40 nodes. I've run instances with over 200, so that seems a bit long for such a low number.18:00
dtantsurCould be something to experiment with whenever you have some time, to see if it's reproducible behavior.18:01
dtantsur(Our folks scale-test with ~ 3500 nodes, but I don't think they try to kill Ironic in the process)18:03
TheJulia3500 sounds... sadistic.19:12
JayFwhen we had 300 node clusters at $oldJob, and 3 reschedules set up19:12
JayFand the racey-as-hell clustered compute manager19:12
JayFpquerna used to load test us without warning by issuing 100 build requests19:12
JayFwe usually got about 85% eventual success rate, which was pretty great (most of the 15% failures were losing the CCM race 3 times in a row)19:13
JayFthat was spicy for the age of ironic at the time and the immaturity of the ironic/nova driver and all that19:13
TheJuliaif memory serves, it really expected the CCM to do that and one to always sort of fail19:14
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800620:45
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Add test for dhcp-less vmedia based deployment  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/89800623:00
