Tuesday, 2022-08-16

opendevreviewJulia Kreger proposed openstack/sushy master: Capture more errors  https://review.opendev.org/c/openstack/sushy/+/85320900:14
TheJuliadtantsur: I'm wondering if ironic should just dump the session on *any* exception00:26
opendevreviewVanou Ishii proposed openstack/ironic stable/yoga: Fix iRMC driver to use certification file in HTTPS  https://review.opendev.org/c/openstack/ironic/+/85279704:00
arne_wiebalckGood morning, Ironic!06:44
arne_wiebalckGreat, thanks stevebaker[m] !06:45
opendevreviewJakub Jelinek proposed openstack/ironic-python-agent master: WIP: Enable skipping RAIDS  https://review.opendev.org/c/openstack/ironic-python-agent/+/85299907:12
kubajjdtantsur: TheJulia: as discussed on Thursday I think it is the easiest to address the skip list issue for RAID devices by pointing to them by the volume name (I found this in the documentation), however, if I am correct, the name is not used. Therefore, I have created a new change which adds the name to the create raid function. Let me know what you think: https://review.opendev.org/c/openstack/ironic-python-agent/+/85318208:10
dtantsurTheJulia: probably not all HTTP exceptions, but 401 and any connection issues10:09
dtantsurkubajj: not sure I'm aware of the issue you're referring to10:10
opendevreviewMerged openstack/sushy-tools master: remove unicode from code  https://review.opendev.org/c/openstack/sushy-tools/+/85262210:27
opendevreviewMerged openstack/sushy-tools master: List network interfaces of all types  https://review.opendev.org/c/openstack/sushy-tools/+/84428410:37
opendevreviewAija Jauntēva proposed openstack/sushy-tools master: Add Chassis to ServiceRoot  https://review.opendev.org/c/openstack/sushy-tools/+/84412610:44
iurygregorygood morning Ironic, I'm alive o/11:09
dtantsuryay, congrats iurygregory 11:11
iurygregoryty!11:12
iurygregorydtantsur, quick question, do you think this would be enough to cover all vmedia cases we have https://review.opendev.org/c/openstack/ironic/+/852234 or I would need to update other drivers to support it? (talking in case they are not using redfish driver itself...)11:14
TheJuliaiurygregory: if I recall correctly, they use the image utils as well11:18
iurygregoryTheJulia, yeah, just wondering if would need to specify the new field "external_http_url" for driver_info ...11:19
iurygregoryalso, good morning  (very early for you...)11:19
TheJuliaI think it is likely covered, but relatively easy to check.11:20
TheJulia(Up early because having some pain that is keeping me from getting back to sleep)11:20
iurygregoryouch =( sorry to hear that11:20
dtantsuroh, I hope you get better asap11:21
TheJuliaI am likely going to need to go find a doctor this morning… but none of them are open for 4+ hours and our insurance in the US denied my last visit to an emergency room as “inappropriate class of service for diagnosis”11:23
kubajjdtantsur: We discussed with TheJulia how to extend the skip list safeguard to RAID devices. We agreed on that including the volume name (user-defined) in the list might be a good approach. Volume name is mentioned in the docs, but not used during the create raid device function execution. That's what the new change addresses.11:30
dtantsurkubajj: volume name won't help you11:31
dtantsurvolume name becomes /dev/md/<volume>, while you need /dev/md5 (we don't work with any symlinks)11:31
dtantsurI already hit a similar dead corner when trying to make /dev/disk/by-<something>/<something> work11:31
kubajjdtantsur: but couldn't we work with them?11:31
dtantsurpotentially - yes11:32
dtantsurspeaking of which, you need to update ironic-inspector since it also deals with root devices11:32
dtantsurironic-inspector processes disks on the server side, making symlink resolution.. interesting11:32
kubajjI was planning to get the list of logical devices and then possibly check if the symlink returns something or an error11:33
dtantsurkubajj: https://opendev.org/openstack/ironic-inspector/src/branch/master/ironic_inspector/plugins/standard.py#L40-L6711:33
dtantsurkubajj: how do you do it outside of the node?11:33
kubajjIPA runs on the node, right? What would we need it outside for?11:34
dtantsurkubajj: ironic-inspector runs outside, please see the messages above and the link11:34
dtantsurkubajj: I started https://review.opendev.org/c/openstack/ironic-lib/+/845986 but then realized the problem with inspector and gave up11:35
dtantsurwe could potentially send all aliases as part of the inventory. i.e. check /dev/disk/*, /dev/md/* for symlinks to the device.11:36
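Editor's note: the "send all aliases as part of the inventory" idea above can be sketched as follows. This is a minimal illustration, not code from ironic-python-agent or ironic-lib; the function name `find_device_aliases` and its `patterns` parameter are hypothetical, and the default glob patterns are only an assumption based on the directories named in the message.

```python
import glob
import os


def find_device_aliases(device, patterns=("/dev/disk/*/*", "/dev/md/*")):
    """Return the sorted symlink paths that resolve to ``device``.

    Hypothetical helper: it walks the given glob patterns and keeps every
    symlink whose fully resolved target equals the resolved device path.
    """
    target = os.path.realpath(device)
    aliases = []
    for pattern in patterns:
        for path in glob.glob(pattern):
            # Only symlinks count as aliases; compare resolved targets.
            if os.path.islink(path) and os.path.realpath(path) == target:
                aliases.append(path)
    return sorted(aliases)
```

Run on the node, a call such as `find_device_aliases("/dev/md5")` would list any `/dev/md/<volume>` or `/dev/disk/by-*/...` links pointing at that device. A server-side consumer such as ironic-inspector could then match hints against the reported aliases without resolving symlinks itself.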
TheJuliaWRT volume names, I was thinking hardware raid, not software raid11:36
dtantsurTheJulia: hardware RAID are just normal disks, no?11:36
TheJuliaSince you may not actually see underlying disks11:36
dtantsuryou won't even see the volume name from inside the machine?11:36
TheJuliadtantsur: depends on the controller, but generally yes11:36
kubajjTheJulia: I see, that's my bad then11:37
TheJuliadtantsur: most expose the volume name11:37
dtantsurbut do we collect it?11:37
TheJuliaTruncated down to 12 chars….11:37
dtantsuror rather: do we have a vendor-neutral way to collect them?11:37
TheJuliaAFAIK, no:11:37
dtantsurkubajj: side note: regardless of this discussion, please do update ironic-inspector, otherwise you're going to have 2 different logics to determine the root device11:38
TheJuliaUhhhh there is a field in the SATA/SCSI spec afaik11:38
kubajjdtantsur: What does the inspector do?11:38
TheJuliaI’m not sure how we would easily capture aside from going and picking the field out from the kernel.11:38
TheJuliaBut my perception was always agent side processing, nothing to do with root device hinting11:39
dtantsurkubajj: as part of processing the introspection data, it determines the root device and sets the node's local_gb accordingly11:39
TheJuliaAgent side processing in regards to what to ignore11:39
TheJuliaWhich reminds me… is local_gb even applicable to any scheduling anymore?11:40
TheJuliaI guess “the old nova scheduler” way still gets used by some11:40
kubajjdtantsur: and so I should update it to also ignore the disks on the skip list, am I following correctly?11:41
dtantsurkubajj: correct11:42
dtantsurTheJulia: we use it for free space calculation, I think11:42
dtantsurbut you're right, we should start phasing all this properties stuff out (except for cpu_arch)11:43
TheJuliaIt is good we have maintained such long compatibility!11:44
dtantsur\o/11:47
opendevreviewMerged openstack/ironic-python-agent master: Enable skipping disks for cleaning  https://review.opendev.org/c/openstack/ironic-python-agent/+/85086111:49
opendevreviewJakub Jelinek proposed openstack/ironic-python-agent master: Improve function list_block_devices_check_skip_list  https://review.opendev.org/c/openstack/ironic-python-agent/+/85328412:39
kubajjHere is the follow-up dtantsur 12:39
dtantsurThanks!12:46
opendevreviewJulia Kreger proposed openstack/sushy master: Capture requests errors  https://review.opendev.org/c/openstack/sushy/+/85320913:44
TheJuliadtantsur: ^13:44
TheJuliaJayF: I think https://review.opendev.org/c/openstack/ironic/+/852794 is ready to rock and roll13:48
opendevreviewJakub Jelinek proposed openstack/ironic-inspector master: Introduce skip list to inspector  https://review.opendev.org/c/openstack/ironic-inspector/+/85330413:55
kubajjdtantsur: and this should be the inspector13:57
frickleriurygregory: any news on jsonschema? should I nag someone else instead? ;)14:29
JayFTheJulia: +214:48
iurygregoryfrickler, still trying to figure out what is happening, was able to reproduce issues but not to directly fix, seems like the exceptions changed I'm trying to debug to see how to make it match...14:52
kubajjTheJulia: dtantsur: What should I do about the software RAIDs then if I can't use the names?14:55
kubajjAnd what is the local_gb?14:56
TheJuliaiurygregory: I've taken a look at https://review.opendev.org/c/openstack/ironic/+/852797. Mainly release note stuff, I think. I'm leaning towards -1, but I could just see fixing the reno stuff and just pushing the button. Not a fan of their own version determination logic getting added, but a fan of the overall approach since the lower minimum version was never incremented upward as time went on15:11
TheJuliakubajj: local_gb is something I think you can ignore, it was mainly used for scheduling a long long time ago representing a sub of available storage on the root disk.15:11
iurygregoryTheJulia, looking15:12
TheJuliakubajj: to dtantsur's point, you can't use a name on a traditional software raid and expect it to be exposed through. You *can* enumerate the devices in use via /proc/mdstat15:13
TheJuliaso if you have, say, a /dev/md1 you can see its parent devices might be /dev/sda3 /dev/sdb3 via /proc/mdstat15:14
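Editor's note: enumerating the component devices of a software RAID from /proc/mdstat, as described above, might look roughly like this. It is a sketch that assumes the common mdstat line layout (`md1 : active raid1 sdb3[1] sda3[0]`); `parse_mdstat` is a hypothetical name, and this is not the parsing that ironic-python-agent itself uses.

```python
import re


def parse_mdstat(text):
    """Map each md array to its member devices, given /proc/mdstat content."""
    arrays = {}
    for line in text.splitlines():
        # Array lines look like: "md1 : active raid1 sdb3[1] sda3[0]"
        match = re.match(r"^(md\S+) : (\S+)(.*)$", line)
        if not match:
            continue
        name, _state, rest = match.groups()
        # Members are written as "<device>[<slot>]", optionally with flags.
        members = re.findall(r"(\S+?)\[\d+\]", rest)
        arrays["/dev/" + name] = sorted("/dev/" + m for m in members)
    return arrays


SAMPLE = """\
Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
      1046528 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

print(parse_mdstat(SAMPLE))
```

On a real node the input would come from reading `/proc/mdstat`; the sample text here only stands in for that file.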
kubajjTheJulia: I just came across it while writing the tests for inspector and had no clue where the numbers are coming from15:14
iurygregoryTheJulia, the logic you are talking is https://review.opendev.org/c/openstack/ironic/+/852797/7/ironic/drivers/modules/irmc/common.py right?15:16
TheJuliaiurygregory: yup15:16
iurygregoryI can say is a bit interesting...15:17
TheJuliakubajj: data submitted on the likely root device, I believe15:17
opendevreviewJakub Jelinek proposed openstack/ironic-python-agent master: Improve function list_block_devices_check_skip_list  https://review.opendev.org/c/openstack/ironic-python-agent/+/85328415:17
iurygregoryI do agree with your comment re 0.8.215:17
TheJuliaiurygregory: mixed feelings is my state of mind at the moment15:18
iurygregoryno worries15:26
adam-rozmanHi all! Do I have to do some extra step to enable special operators for IPA root device hints? I have used a "<in> deviceserial" for the "serial" root device hint but my expression gets prepended with an "s==" prefix, as if IPA is unable to recognize that the expression contains an operator.15:41
adam-rozmanI got a response at the end such as Image provisioning failed: Deploy step deploy.write_image failed15:44
adam-rozman    on node 7f8ac1d4-c478-437a-81a3-9b276bbb99a5. No suitable device was found for15:44
adam-rozman    deployment using these hints  {''serial'': ''s== "<in> deviceserial"\n''}'15:44
dtantsuradam-rozman: feels like you somehow ended up with extra quotes15:45
adam-rozmanthat could be dtantsur I have put it in the serialNumber field of a BMH object 15:46
dtantsuradam-rozman: ah metal3. I'm not sure it supports operators. lemme check.15:48
dtantsur(I should have guessed of course :)15:48
adam-rozmanthanks15:48
adam-rozmanI assume it is because I have tried to "edit" the manifest live and it was always removing my ""s and it was failing as it has only passed <in> but this way it fails because it carries over my ""s to IPA15:50
dtantsuradam-rozman: yeah, BMO hardcodes operators: https://github.com/metal3-io/baremetal-operator/blob/main/pkg/provisioner/ironic/devicehints/devicehints.go#L30-L3215:50
dtantsurI don't remember why it was done, but I assume because simplicity15:51
dtantsurmaybe we need to update it to leave any existing operators alone15:51
adam-rozmanbut it is also done on IPA level15:51
dtantsursorry?15:52
adam-rozmanhttps://opendev.org/openstack/ironic-lib/src/commit/95ce746ad101b7af32684c29384237061565a4af/ironic_lib/utils.py#L28315:52
dtantsuradam-rozman: yeah, but ironic-lib takes existing operators into account. BMO does not.15:52
dtantsurthis part actually checks for a correct operator: https://opendev.org/openstack/ironic-lib/src/commit/95ce746ad101b7af32684c29384237061565a4af/ironic_lib/utils.py#L279-L28015:52
dtantsurnow, if you get redundant quotes, chances are high ironic will be confused as well15:53
adam-rozmanyeah but then I assume until BMO hardcodes the "s==" prefix I can't even pass any operator to IPA15:54
adam-rozmanwouldn't it be okay to just remove the "s==" prefix as IPA would anyhow check if there is operator or not?15:54
adam-rozmanI mean remove from BMO15:54
dtantsuradam-rozman: it will, I seem to remember the argument was "explicit better than implicit"15:55
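Editor's note: the difference discussed above, between always prepending "s==" and prepending it only when no operator is present, can be illustrated with a small sketch. This is neither ironic-lib's nor BMO's actual code; `normalize_string_hint` is a hypothetical name, and the operator tuple is an illustrative subset assumed for the example.

```python
# Illustrative subset of string operators (an assumption for this sketch).
STRING_OPERATORS = ("s==", "s!=", "<in>", "<or>")


def normalize_string_hint(value, default_op="s=="):
    """Prefix ``default_op`` only when the hint has no leading operator."""
    stripped = value.strip()
    first_token = stripped.split(" ", 1)[0] if stripped else ""
    if first_token in STRING_OPERATORS:
        # An operator is already there; leave the expression alone.
        return stripped
    return f"{default_op} {stripped}"


print(normalize_string_hint("<in> deviceserial"))
print(normalize_string_hint("deviceserial"))
# Extra quotes hide the operator, so the default prefix is added anyway.
print(normalize_string_hint('"<in> deviceserial"'))
```

The last call shows how stray quotes around the expression can yield a hint of the form `s== "<in> deviceserial"`, similar to the one in the error message reported earlier in the conversation.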
adam-rozmanI actually have a usecase where I would need to use the operators in serial device hint in metal3, it feels to me as if a very good Ironic/IPA feature would be artificially blocked. Would it be okay if I would open an issue related to this in BMO? 16:00
adam-rozmanOfc I wouldn't like to spam the Ironic community with this issue as it turns out to be a BMO thing, so I will transition the conversation there. And thanks for the help dtantsur !!!!16:03
JayFadam-rozman: you're using our stuff, and a reasonably advanced feature of it... but it's broken, but it's not our fault it's broken?16:09
JayFthis is all good news LOL16:09
adam-rozmanyep exactly as you say JayF16:11
JayF:D 16:12
cboucharIs there a limit to the number of BM hosts one should manage with a single ironic instance?  If so, do share with me the magic number.18:05
JayFI don't think we have an official number18:09
JayFand realistically, that number can go crazy-high depending on your use case, driver choice, and configuration18:09
ashinclouds[m]Like five to six figures high. We generally recommend a starting point of around 500 nodes per running conductor with default settings and the ipmi driver (as it has a much higher CPU overhead). Redfish is much more lightweight, as we have cached sessions, and with the default config we will cache up to a thousand sessions in each ironic-conductor (i.e. 1000 nodes per conductor; this can be tuned!)18:21
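Editor's note: as a rough back-of-the-envelope aid for the starting point quoted above, the number of conductors can be estimated by dividing the node count by the per-conductor figure and rounding up. The function name is hypothetical, and the defaults simply reuse the numbers from the message (about 500 nodes per conductor for ipmi, about 1000 for redfish with default session caching); real sizing depends on use case, driver, and configuration.

```python
import math


def conductors_needed(node_count, nodes_per_conductor=500):
    """Estimate how many conductors a deployment needs (rough guide only)."""
    if nodes_per_conductor <= 0:
        raise ValueError("nodes_per_conductor must be positive")
    return math.ceil(node_count / nodes_per_conductor)


# 1200 ipmi-managed nodes at roughly 500 nodes per conductor.
print(conductors_needed(1200))
# The same fleet on redfish at roughly 1000 nodes per conductor.
print(conductors_needed(1200, nodes_per_conductor=1000))
```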
cboucharWow!  I didn't expect this as a response.  I seem to recall in metal3 there was a limitation 100-255.18:40
cboucharThank you for clarifying!18:40
TheJuliaWell, depending on the access model, things may change, but we did a lot to improve overall API performance. The costly thing with API performance though is asking the API for all fields on all nodes19:02
opendevreviewJulia Kreger proposed openstack/ironic master: Allow project scoped admins to create/delete nodes  https://review.opendev.org/c/openstack/ironic/+/85279419:07
TheJuliaJayF: ^ removed the whitespace.19:07
TheJuliaajya: by chance do you watch the openstack mailing list?19:14
TheJuliacbouchar: I would be happy to have a deep dive performance discussion at some point20:25
opendevreviewJulia Kreger proposed openstack/ironic master: Add kickstart template 'url' option  https://review.opendev.org/c/openstack/ironic/+/85336821:18
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Initial tempest test idea anaconda deploy  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/83591721:20
opendevreviewJay Faulkner proposed openstack/ironic stable/yoga: Fix pxe image lookups  https://review.opendev.org/c/openstack/ironic/+/85333122:22
opendevreviewJay Faulkner proposed openstack/ironic stable/yoga: Fix pxe image lookups  https://review.opendev.org/c/openstack/ironic/+/85333122:37
JayFTheJulia: ^^ Does that have a prereq, or did I resolve the conflict badly? Tests are failing around the changed code. 22:37
* JayF is on a mission to ensure all our BC+1 patches are actually backported22:37
TheJuliaI don't *think* it did, but I can look likely tomorrow22:38
JayFboth failures are22:38
JayF>     ironic.common.exception.ImageUnacceptable: 'stage2_id' is missing from the properties of the OS image http://fake.url/path. The anaconda deploy interface requires this to be set with the OS image or in instance_info['stage2']. 22:38
JayFwhich makes me wonder if there's a prereq on some of the validation cleanup that I very vaguely remember happening 22:38
JayFand/or when resolving the conflict, I broke these tests or left them in when they shoulda been removed22:39
TheJuliaoh... hmmmm22:39
TheJuliayes, there is a prereq missing then22:39
TheJuliahmmmm22:39
JayFI'm going to put this chat in the comments of that merge req, if you have neurons fire in the direction of this change, put the info there and I'll follow-up on it22:40
TheJuliait could be https://review.opendev.org/c/openstack/ironic/+/83576922:40
JayFTheJulia: also, a blast from the past now landing in victoria https://review.opendev.org/c/openstack/nova/+/80087322:41
TheJuliaI also did https://review.opendev.org/c/openstack/ironic/+/83470922:41
JayFare those both backportage?22:41
JayF834709 is HUGE22:41
TheJuliawow....22:41
TheJuliauhhhhhh I'm not sure about 70922:41
JayFif 769 applies without 709, I can try that22:42
JayFthat at least is sensible22:42
TheJuliaactually 709 is one I intended to eventually backport22:42
JayFLike, trying to think of the right way to ask this22:43
JayFis this a good use of my time? 22:43
JayFaka do we have any KS users in Yoga-and-earlier?22:43
TheJuliaI think cbouchar is using yoga right now22:44
JayFthe answer is probably "screw it do it anyway jay" okay I'm convinced lol22:44
TheJulialol22:44
* JayF is scared of https://dpaste.com/8SX8M8CVH22:45
JayFI'll make a note and revisit this tomorrow, this is gonna be a bloodbath and I don't wanna start it this late lol22:45
TheJuliaoh... my22:46
TheJulia++++22:46
JayFso to be explicit: 834809 then 835769 then 85220122:46
TheJuliaI *think* so, yes22:46
JayFand TBH looking at that cherry-pick, might ensure that there's nothing BEFORE 809 that needs to go22:46
JayFnothing says "I had a super productive day" more loudly than your todo list being longer at the end of it than at the beginning22:47
JayFand I'm kinda serious lol22:47
TheJulialol22:48
TheJuliayeah22:48
opendevreviewJulia Kreger proposed openstack/ironic master: Add kickstart template 'url' option  https://review.opendev.org/c/openstack/ironic/+/85336822:58
