Friday, 2024-05-10

*** jph0 is now known as jph00:48
*** jph0 is now known as jph07:20
iurygregorygood morning Ironic11:12
adam-metal3hello Ironic11:53
TheJuliagood morning13:13
adam-metal3Would someone have some time to discuss this with me: https://bugs.launchpad.net/ironic-python-agent/+bug/2061362, I would still like to go forward with this proposal if possible?14:58
TheJuliaadam-metal3: sure, I think I just broke someone's brain with the CRA, so I just need to make coffee to celebrate first15:01
TheJuliaadam-metal3: does zoom work for higher bandwidth, or... ?15:09
TheJuliaewww, the spi flash is getting presented as a device15:11
TheJuliaadam-metal3: any chance we can get lspci and lsusb output from a host presenting this sort of device?15:12
JayF++ that's exactly what I was going to ask15:13
TheJuliapci-0000:00:14.0-usb-0:4:1.0-scsi-0:0:0:2 <-- That sure looks like a BMC to me15:13
TheJuliaan Aspeed based BMC at that15:14
TheJuliawith the virtual usb interconnect15:14
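The by-path string quoted above already encodes the transport chain (PCI function, then USB, then SCSI). A minimal sketch of splitting it apart is below; the function name `parse_usb_by_path` and the regular expression are illustrative assumptions, not IPA code, and the result only shows that the disk sits behind a USB controller. It does not by itself prove the device belongs to a BMC, which is why the lsusb device IDs are still needed.

```python
import re

# Matches udev by-path names of the form pci-<addr>-usb-<port>-scsi-<h:c:t:l>.
# The pattern is an assumption based on the single example in the discussion.
_BY_PATH_RE = re.compile(
    r"^pci-(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
    r"-usb-(?P<usb>[0-9:.]+)"
    r"-scsi-(?P<scsi>[0-9:]+)$"
)


def parse_usb_by_path(by_path):
    """Return the pci/usb/scsi parts of a USB-attached disk path, else None."""
    match = _BY_PATH_RE.match(by_path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_usb_by_path("pci-0000:00:14.0-usb-0:4:1.0-scsi-0:0:0:2"))
```

The `pci` part is the address that could then be looked up in lspci output to see which controller the emulated USB bus hangs off.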
TheJuliaI think the answer is "if we can fingerprint it" it makes sense to just go "ugh, yeah, this is a thing"15:15
TheJuliaThere is a method of using SPI flash modeling to present drivers to a host OS15:16
TheJuliawhich if this is readable with a filesystem, I bet that is exactly what it has on it15:16
TheJuliaif it can't be read, then it is active but not presenting contents, but device IDs are kind of what I'm curious about15:21
TheJuliaI've sort of seen something like this in the past and it presents *very* similarly to the way virtual media devices present15:21
adam-metal3well sure, zoom is fine, but regarding the device I can't really give more info; not that I have not checked, but as I wrote there is nothing else about these devices, just size, model and the PCI address15:22
TheJuliawell, there is an emulated usb bus and a pci bus it is coming across, so there should be device IDs15:23
TheJuliain the lsusb output15:23
adam-metal3ohh lsusb, that one I have not asked for, just lspci, lsblk, udevadm. I will ask for the lsusb then, not sure when I will get any answer though. Do you have something in mind about how to filter out such devices?15:25
TheJuliaso, we have seen similar things, but typically they have presented as a read only device like a CD, and thus get scrubbed out from the list of devices15:26
TheJuliaIn this case, it looks and smells like a disk, so we would either have to figure out a pattern to identify it upfront as a filtered entity, or a pattern after the fact15:27
TheJuliaBecause there really is no writing to this device most likely, it is just getting presented in an odd way and we somehow need to handle that if we can "fingerprint" it15:28
TheJuliaand if it matches what we expect as a "this is a known thing" I think we're cool maybe even accepting that as a default, a generalized knob is what concerns us15:28
JayFI'm now pondering if you could actually do harm to a device like that trying to wipe it15:28
TheJuliaJayF: I don't think they accept writes at all, I think io errors result15:28
TheJuliayou can seek/read most likely15:29
TheJuliaIt is *likely* a BMC firmware bug actually15:29
JayFyep, that's where I was going15:29
TheJuliabecause a device can express itself as read-only15:29
JayFdoes it show up as a read only device in lsblk15:29
JayFand if not, we can fix on ironic side, but customer should report a bug to hardware vendor15:29
TheJuliaGood question, it might not or it might, we're sort of grasping at straws without the additional info15:30
JayFyou know, it gets interesting tho15:30
JayFbecause it could still be dangerous to skip r/o devices during cleaning15:30
JayFespecially since that's a failure mode for many forms of storage15:31
TheJuliayeah, this is why it is nice when CDs are known to be CDs :)15:31
TheJuliaor "virtual CDs"15:31
TheJuliayeah, a blanket read-only, I'd still prefer a little deeper of a verification15:31
TheJuliaI want to get to "this looks, smells, and feels like it is a fake BMC device"15:32
JayFyep, exactly15:32
TheJuliaand, I'm cool with skipping fake bmc devices15:32
JayF++15:32
TheJulia(but, do definitely file a bug with a hardware vendor)15:33
JayFsomething like `lspci -M` + `lspci -mm` might give enough combined info to at least tie it back to the bmc on the pci bus, too15:33
JayFbut we really just need more info15:33
adam-metal3I have to check the logs, I remember it presented itself as "disk", I will check the access mode15:34
TheJuliaAnd the line from lsusb representing the device or device's controller15:34
TheJuliaI think the issue is it also sort of boils down to how the transport is being handled with the host15:35
TheJuliaso it will govern how that reports15:35
TheJuliaTIL my desktop has an ISA bridge15:35
JayFmost do15:35
JayFthat's how coretemp works on intels, for instance15:36
JayFit's ISA straight to the chip15:36
* TheJulia blinks15:36
* TheJulia blinks more15:36
JayF"straight to the chip" [hand waving intensifies]15:36
JayFadam-metal3: TheJulia: the other thing we could consider, if we can't ID the device positively: if a block device is smaller than "X" and can't be wiped, it's probably OK (optionally)15:37
JayFI'm assuming this phantom BMC disk isn't 72 terabytes15:37
JayFobvious hyperbole, but I would hope it'd be small enough to be clearly not a data disk15:38
JayFhmm was that in the bug15:38
JayFnope, it wasn't15:39
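The fallback JayF floats (tolerate a failed wipe only when the device is too small to be a data disk) could be sketched roughly as below. This is an illustration under assumptions, not IPA code: the input is assumed to have the shape produced by `lsblk --json --bytes -o NAME,SIZE,TYPE,TRAN`, the function names are hypothetical, and the 1 GiB cutoff is a placeholder because the discussion only says "smaller than X". The USB transport check is added because the device in the bug appears behind a USB path.

```python
import json

# Placeholder cutoff; the discussion leaves the real value of "X" open.
MAX_PHANTOM_BYTES = 1 * 1024 ** 3


def wipe_failure_tolerable(dev, max_bytes=MAX_PHANTOM_BYTES):
    """True if a disk is USB-attached and small enough to not be a data disk.

    Meant to be consulted only after a wipe attempt on the device has failed.
    """
    size = int(dev.get("size") or 0)  # older lsblk emits sizes as strings
    return (
        dev.get("type") == "disk"
        and dev.get("tran") == "usb"
        and 0 < size < max_bytes
    )


def tolerable_devices(lsblk_json, max_bytes=MAX_PHANTOM_BYTES):
    """Names of devices in lsblk JSON output that pass the check above."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    return [d["name"] for d in devices if wipe_failure_tolerable(d, max_bytes)]
```

A size cutoff alone is a weak signal, which matches the preference voiced earlier for a deeper fingerprint before skipping anything by default.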
TheJuliawe've done some similar fingerprinting in the past, it is about building some level of comfort and understanding15:44
TheJulia(and lots of comments)15:44
TheJulia(in the code)15:44
adam-metal3Thanks for the discussion, I will get back with one set of the data, but the lsusb will be next week at the earliest. Since I have provided a patch in the form of this quiet cleanup, my downstream was fine with it and ran away :D15:46
JayFadam-metal3: so, re: the other half of that bug with an implied question; in metal3 ramdisk builds is there a way for you to install an additional python package alongside IPA in that venv?15:48
JayFadam-metal3: that's basically what you have to do in order to add custom cleaning steps or override existing ones15:48
JayFe.g. https://opendev.org/openstack/ironic-python-agent/src/branch/master/examples if you copied custom-disk-erase out of here into your own repo, overwrite erase_devices_metadata with a version that ignores failure, and installed it in the venv alongside IPA, it'll be used automatically15:49
adam-metal3JayF, yeah with building custom IPA I can, but my downstream req was to use 1 IPA for all envs, so if I overwrite the step then is there a way to switch back and forth between the default and the overwritten step? or should I kind of "hack" something together like "put a condition in the overwriting package to check for e.g. a kernel cmdline argument and based on that start the default or the custom cleanup step"?16:04
JayFif you look at the example, that's what evaluate_hardware_support method is for16:04
JayFthere are pretty comprehensive docs on this interface here: https://docs.openstack.org/ironic-python-agent/latest/contributor/hardware_managers.html16:05
JayFit's an old doc but the interface hasn't changed in that time16:05
adam-metal3okay nice then this seems like a good first option for me but anyways I will try to get more info also about the root cause, thanks!16:05
JayFhttps://docs.openstack.org/ironic-python-agent/latest/contributor/hardware_managers.html#priority is the specific mechanism you're looking for16:06
JayFYeah, I suggest having this ability in your toolbox, it'll let you do things that we won't always be cool with upstream (e.g. your existing RFE)16:06
adam-metal3nice16:06
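The switch adam-metal3 asks about (one ramdisk image, custom step enabled per environment) could hinge on the value a custom hardware manager returns from `evaluate_hardware_support`. The sketch below is standalone and deliberately does not import IPA. The two integer levels are assumed to mirror `ironic_python_agent.hardware.HardwareSupport` and should be checked against the linked docs; the kernel argument `ipa-quiet-cleanup` is a hypothetical name, not an existing IPA option.

```python
# Stand-ins assumed to mirror ironic_python_agent.hardware.HardwareSupport;
# verify the numbers against the IPA docs before relying on them.
SUPPORT_NONE = 0
SUPPORT_SERVICE_PROVIDER = 3

# Hypothetical kernel argument name chosen for this example only.
FLAG = "ipa-quiet-cleanup"


def parse_cmdline(cmdline):
    """Split a kernel command line into a {key: value} dict."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def support_level(cmdline):
    """Level a custom manager could return from evaluate_hardware_support."""
    value = parse_cmdline(cmdline).get(FLAG, "")
    if value.lower() in ("1", "true", "yes"):
        return SUPPORT_SERVICE_PROVIDER  # intended to outrank the generic manager
    return SUPPORT_NONE  # manager opts out; the default steps are expected to run


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

Inside a real manager, `evaluate_hardware_support` would return `support_level(read_cmdline())`, so the same image behaves like stock IPA unless the flag is set at boot.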
-opendevstatus- NOTICE: There will be a short Gerrit downtime while we update a database and our container image17:14
JayFI discovered today servicing never got added to the state diagram; was that intentional? (I know we have some issues with tooling to generate those)20:15
opendevreviewTony Breeds proposed openstack/bifrost master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/bifrost/+/91923520:51
opendevreviewTony Breeds proposed openstack/ironic master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/ironic/+/91926420:53
opendevreviewTony Breeds proposed openstack/metalsmith master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/metalsmith/+/91927520:54
TheJuliaJayF: likely the image generation never got run20:57
TheJuliait has to be changed and uploaded20:57
JayFI'll see if I can turn that into a quick win before I start a weekend 20:57
opendevreviewTony Breeds proposed openstack/sushy master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/sushy/+/91933120:59
opendevreviewTony Breeds proposed openstack/tenks master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/tenks/+/91933620:59
opendevreviewTony Breeds proposed openstack/virtualbmc master: [DNM] Testing docs bump with new Sphinx  https://review.opendev.org/c/openstack/virtualbmc/+/91934021:00
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: reboot the node in basic ops tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/91846221:08
opendevreviewJay Faulkner proposed openstack/ironic master: Add servicing states to states doc, fix state diagram  https://review.opendev.org/c/openstack/ironic/+/91935321:32
