Wednesday, 2022-07-20

opendevreviewJulia Kreger proposed openstack/sushy master: Handle AccessError with Basic Auth instead of "reauth"  https://review.opendev.org/c/openstack/sushy/+/85042500:27
TheJuliaJayF: replied to your primary comment on https://review.opendev.org/c/openstack/ironic/+/850259 :)00:38
TheJuliajust fyi, looks like they were not using session auth, but basic auth00:39
TheJuliaI guess the change is still basically valid and it boils down to proactive vs reactive00:42
rpittaugood morning ironic! o/06:49
rpittauJayF, TheJulia, I approved https://review.opendev.org/c/openstack/ironic-python-agent/+/839084, see if I got the time to use lsblk json output in a follow up06:58
rpittaurethinking about that it's better it was not used as the backport would've been problematic06:58
opendevreviewUemit Seren proposed openstack/sushy master: Implement allowed values for BootSourceOverrideEnabled  https://review.opendev.org/c/openstack/sushy/+/85046108:22
opendevreviewMerged openstack/ironic-python-agent master: Guard shared device/cluster filesystems  https://review.opendev.org/c/openstack/ironic-python-agent/+/83908408:23
jandersgood morning rpittau and Ironic o/08:43
rpittauhey janders :)08:53
opendevreviewMerged openstack/ironic master: Remove support for trusted boot  https://review.opendev.org/c/openstack/ironic/+/85023909:01
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Use lsblk json output for safety_check_block_device  https://review.opendev.org/c/openstack/ironic-python-agent/+/85049709:57
opendevreviewJakub Jelinek proposed openstack/ironic-python-agent master: Remove unused lines of code  https://review.opendev.org/c/openstack/ironic-python-agent/+/85049810:07
jandersarne_wiebalck do you remember what was the hardware platform that originally required https://review.opendev.org/c/openstack/sushy/+/818110 ("Handle weak Etags")?10:29
jandersI'm trying to build a mental model of the failure modes related to eTags that we know about... clearly certain Lenovo and HP models can be a challenge...10:31
janderswe don't know what we don't know but thinking through what we DO know may be a start10:32
arne_wiebalckjanders: I *think* I amended code which was a workaround, IOW I patched to code to actually work with the expected behaviour of the endpoint.  11:12
arne_wiebalckjanders:  you may want to have a look at this one: https://review.opendev.org/c/openstack/ironic-python-agent/+/85049811:12
arne_wiebalckjanders: (different topic)11:13
timeujanders,arne_wiebalck: originally when I contributed the virtual media patch implementation for the Lenovo hardware I also added the workaround to strip the \W from the Weak Etag because back then the Lenovo BMC accepted only strong Etag although it returns a weak one11:20
timeuI guess this broke it for Redfish implementation of some vendors (SuperMicro and HPE) that might return a weak Etag and only accept one 11:20
timeuAFAIK: arne_wiebalck patch to return both fixed the issue 11:21
timeuhowever that patch now broke virtual media for Lenovo nodes. At least for Lenovo nodes with newer firmaware returning the Etag unmodifed works now. My expectation is that it would also work for SuperMicro + HPE but that needs to be verified11:26
iurygregorygood morning Ironic o/11:31
pas-ha[m]hi all, have the next question re ironic api - we're trying to enable CADF notifications for our services, and ironic api is one of the weird ones. Most of other services use paste.ini and actually have the '/' path served by a special pipeline/app that only does version document. the audit middleware is used only on other pipelines that serve /vN paths.11:44
pas-ha[m]Ironic does not use paste.ini, the order and presence of middlewares in kind of hardcoded in the code.11:44
pas-ha[m]Would someone have suggestions on how to block audit middleware and ironic to trigger on e.g. '/' root api path which is not covered by keystone auth and is usually used by monitoring/healthchecks? it just produces too much noise... and somewhat malformatted CADF messages as there's no user/project etc to insert11:44
pas-ha[m]the only ways I see it currently is either re-write ironic api to use different 'apps' on diffferent paths (with e.g. pecan-mount) or patch keystonemiddleware to allow also ignored paths (additionally to ignored HTTP verbs)12:18
pas-ha[m]would be nice if someone has alternative suggestions :-) 12:18
arne_wiebalcktimeu: this is my understanding as well, yes12:21
arne_wiebalcktimeu: it broke new nodes?12:21
arne_wiebalcktimeu: is the breakage "justified" ?12:21
timeuyour patch broke all Lenovo nodes (both old and new firmware)12:29
timeuseems like the Lenovo Redfish can't handle if you pass two Etags12:29
opendevreviewkamlesh chauvhan proposed openstack/ironic-tempest-plugin master: Add iDRAC RAID cleaning steps tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/84160112:29
timeuI tried to even reverse the order 12:29
timeubut still the same error12:29
timeunewer Lenovo firmwares however don't need my workaround (stripping \W) anymore and I guess should behave the same as HPE, SuperMicro (just return the Etag as is)12:30
jandersarne_wiebalck timeu I haven't had a chance to do hands-on testing with Lenovos but from the problem reports I'm getting are consistent with timeu's assessment ("Lenovo Redfish can't handle if you pass two Etags")12:34
jandersarne_wiebalck timeu what I haven't had a chance to figure out though is where is the second eTag coming from. In basic queries against the System I've only gotten one eTag back. Do these only show up while querying certain URLs, do you know?12:35
arne_wiebalcktimeu: passing multiple Etags is allowed by the standard.12:37
jandersIn any case I wonder if we need to consider implementing a more complex logic where we try different things depending on server responses, cause it really seems that 1) one vendor's fix is another vendor's bug trigger and 2) the scale of the problem is too widespread to wait for the vendors to improve their compliance with standards due to high12:37
jandersimpact on users12:37
arne_wiebalckjanders: the 2nd etag is coming from the patch12:37
janders"not our fault but our problem" kind of stuff12:37
arne_wiebalckjanders: it gets one etag and returns it in two formats12:37
jandersarne_wiebalck hmm interesting - so is the patch "producing" the weak eTag?12:38
janders(in case it is needed)12:38
* janders is looking at the patch again12:38
arne_wiebalckjanders: FWIU, timeu 's patch stripped the qualifier off the weak etag (and weak etags should not have been used in the first place!)12:39
arne_wiebalckjanders: my patch kept timeu 's patch and also returned the strong etag12:39
arne_wiebalck(I think :-)12:40
arne_wiebalckand I think returning multiple pre-conditions is allowed12:41
jandersarne_wiebalck if I am reading the regex and the conditional right ( https://review.opendev.org/c/openstack/sushy/+/818110/1/sushy/resources/base.py#610 ) it seems that if the server returns a strong eTag, this code only returns one eTag, however if it returns a weak eTag, the code returns it in "strong" and "weak" "format", right?12:45
janders(I think Lenovo weirdness partially comes from the fact that it seems to operate on strong eTags, yet it fails out due to getting strong+weak together somehow)12:45
janders(I haven't figured out the source of that weak eTag in that case)12:46
arne_wiebalckjanders: yes, this is my understanding re the regex12:46
arne_wiebalckjanders: if a weak etag is returned at the start, this is already wrong I think12:47
arne_wiebalckjanders: oh, it returns a strong one, fair enough12:47
jandersarne_wiebalck we've seen it return a strong eTag when just GETting the System but when Ironic errs out, there is strong+weak eTags in the  failing query in the logs (I haven't yet figured out how this happens and why the behaviour varies)12:48
jandersarne_wiebalck timeu have you guys seen BMCs that return strong eTags in some responses and weak eTags in other responses to RedFish queries?12:50
arne_wiebalckjanders: no12:50
jandersarne_wiebalck I think I may need to gather some more evidence of that (once I'm done with some urgent downstream work in a week or so) and take it from there I suppose12:52
TheJuliagood morning12:53
timeuin my original workaround I received a weak etag from Lenovo BMC 12:54
timeuand had to return a strong one by stripping the W/ 12:54
timeuthis was with older firmwares12:54
timeuthe newer firmware still send a weak etag but also accept the same etag back 12:55
timeuso my original workaround is not anymore necessary for newer Lenovo firmware. 12:55
* TheJulia has reached "get contacted by conference organizing business" level of spam12:56
timeuLenovo firmware can't handle multiple Etags (weak + strong) although it is allowed from the standard12:56
janderstimeu in your experience, does the Lenovo BMC always send the weak eTag, or does it vary (sometimes strong, sometimes weak)?12:57
timeuas far as I can tell it always sends the weak one12:57
timeubut I need to check this12:57
timeuat least for PATCH on virtual media it does12:57
janderstimeu I'm trying to find the query we ran the last time - it may be interesting to run it on yours if it's possible12:58
timeusure I can do this12:58
jandersone query where we got a strong eTag was against the vMedia URL: /redfish/v1/Managers/1/VirtualMedia/EXT1 13:00
jandersthat was just a plain GET13:00
jandersGETting the System ( /redfish/v1/Systems/1 ) also seemed to produce a strong eTag in our test13:02
TheJuliaiurygregory: any issue with https://review.opendev.org/c/openstack/sushy/+/850425 getting the ironic-week-prio hashtag?13:03
iurygregoryTheJulia, none (we don't need to wait till the upstream meeting to add the tag ^^)13:04
iurygregoryalso, good morning TheJulia =)13:04
TheJuliaI know, i figured I'd ask anyway :)13:04
timeujanders: this was with Lenovo hardware ? 13:11
janderstimeu yes, ThinkEdge SE45013:14
janderstimeu if it matters, the BMC version on the machine we tested against was 1.10 (4/20/2022)13:18
arne_wiebalckjanders: timeu: are we now in a situation where we needed to patch (due to some implementation issue), then patched the patch (to support the expected behaviour while maintaining the original patch), only to see that the fixed implementation is done in a way which breaks our patched patch? hilarious :)13:33
janderstimeu I'm wrapping up for the night (I'm in Australia) but happy to continue the conversation over the coming days13:33
jandersarne_wiebalck yeah I am not sure whether I should laugh or cry13:33
arne_wiebalckit is too hot for either13:34
jandersarne_wiebalck what are the temps in your area?13:35
arne_wiebalckaround 36 degree Celsius13:36
jandersthat's a fair bit13:37
jandersarne_wiebalck timeu IIUC from limited Lenovo SE450 testing these send eTags but don't actually require them to be sent back (it's optional - as long as only one is sent, never strong+weak) - I may be wrong but if I am right it's another interesting piece to the puzzle.. anyways to be continued...13:37
janderssee you tomorrow Ironic o/13:37
opendevreviewRiccardo Pittau proposed openstack/python-ironicclient stable/yoga: Fix logging in the baremetal CLI  https://review.opendev.org/c/openstack/python-ironicclient/+/85048613:41
opendevreviewRiccardo Pittau proposed openstack/python-ironicclient stable/wallaby: Fix logging in the baremetal CLI  https://review.opendev.org/c/openstack/python-ironicclient/+/85048713:41
opendevreviewRiccardo Pittau proposed openstack/python-ironicclient stable/xena: Fix logging in the baremetal CLI  https://review.opendev.org/c/openstack/python-ironicclient/+/85048813:41
jandersarne_wiebalck I think we're pretty close to a "don't you dare break my bug" situation :)13:41
opendevreviewJulia Kreger proposed openstack/ironic master: Do not require stage2 for anaconda with standalone  https://review.opendev.org/c/openstack/ironic/+/83576913:50
opendevreviewMerged openstack/networking-generic-switch master: Add Nokia SRL release note  https://review.opendev.org/c/openstack/networking-generic-switch/+/85024513:50
timeusorry was a bit AFK. janders I checked for redfish/v1/Systems and there the SD530 Lenovo machine returns a strong Etag. I neeed to check what they send for the virtual media one13:57
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: Swap maintenance fix to cleanup step  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/85052414:02
TheJuliajanders: I guess the question becomes, do we figure out how to navigate around, or do we press vendors. I think navigate around is key if at all posisble which means somewhere the code has to be smarter14:04
TheJuliaperhaps this is a case to add logic into operation handling to check/modify/retry?!14:04
TheJuliaGranted, touching that code is... super low level, but could have the best possible impact14:04
arne_wiebalckjanders: if the implementation does not require the etag to be sent back, the implementation is broken, no?14:05
arne_wiebalckjanders: I mean, double-broken :)14:06
TheJuliaI guess we could argue if all implementations are broken in their own ways :)14:06
TheJulias/if//14:06
TheJuliathere is no if14:06
TheJuliaonly zuul... and broken implementations14:06
arne_wiebalckheh14:06
arne_wiebalckI *think* I have another downstream only patch to deal with implementation specifics14:06
TheJuliatimeu's encounter with the Continious flag not being honored on lenovo gear again just means pain, I think that is a case where we're going to have to be smarter in the grand scheme of the universe14:07
TheJuliaAnd it is okay to need to change/evolve, be a little flexible as long as we do it with smarts and plenty of unit testing :)14:08
timeuI also need to check because the /system/1 endpoint returns a strong etag. But in order for my original patch to have an effect the virtual_media endpoint must return a weak one otherwise stripping the W/ wouldn't have had any effect on it14:09
TheJuliaoh jungleboyj!14:10
TheJuliayou around?14:10
TheJuliaI think we're also fast reaching the point where talking with lenovo engineering folks might provide some clarity14:10
opendevreviewJulia Kreger proposed openstack/metalsmith stable/victoria: Add linters job  https://review.opendev.org/c/openstack/metalsmith/+/84967414:15
TheJuliarpittau: ^^ added an .ansible-lint file to fix the issue, noted in the commit message14:15
opendevreviewJulia Kreger proposed openstack/ironic master: project scoped manager support  https://review.opendev.org/c/openstack/ironic/+/81829914:18
TheJuliaJust a spelling fix on a patch which previously had a +214:18
TheJulia^14:18
rpittauTheJulia: ack 14:23
timeuI don't understand. If I do a curl against the virtual media slot I get a strong Etag 14:23
timeubut sushy tells me that I have a weak one14:23
timeuo_014:23
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Save routing table information for troubleshooting  https://review.opendev.org/c/openstack/ironic/+/85053114:25
TheJuliathat... seems... weird14:26
timeuthis is the output from curl: < ETag: "515e3d5a4e1424e93ec"14:27
timeuand this is if done via sushy: (Pdb) data.headers['Etag']14:27
timeu'W/"515e3d5a4e1424e93ec"'14:27
timeusame URI14:27
timeudoes Lenovo treat the client differently o_014:27
TheJuliait might...14:29
TheJuliaodata header perhaps?14:29
timeuhmm curl returns following data: {"@odata.etag":"\"515e3d5a4e1424e93ec\""14:39
timeuI tried to send a curl request with headers: {'OData-Version': '4.0'}14:39
timeuhmm if i specifiy that I get: Modify OData related header(s) that can be accepted by server.14:41
timeuhmm If I do it properly the OData-Version header works but I still only get a strong etag via curl14:43
TheJuliacould it be time to try with just python-requests?14:58
rpittaummm I was also going to suggest this ^15:00
* TheJulia is worried there is user-agent aware code.15:00
rpittaucurl is ok, but better go through a closer implementation15:00
timeuI checked15:00
timeu'Accept-Encoding: gzip, deflate'15:00
timeuas soon as you send this header 15:00
timeuwhich requests does by defaulkt15:00
timeuI get a weak tag15:00
TheJuliaWTF15:00
timeuif I add this header to curl I also get a weak etag15:00
timeuotherwise it returns me a strong one15:01
timeuo_015:01
rpittaufantastic15:01
rpittauwhiskey tango foxtrot indeed15:01
timeuI gotta leave for an hour or so, but at least some hint why Lenovo sometimes returns strong and sometimes weak etags15:01
TheJuliahttps://media3.giphy.com/media/Aevr67I0q2ZRC/giphy.gif?cid=ecf05e47eptgnq2l56mo64a2ul7yyax86cypmhyp2itnur7y&rid=giphy.gif&ct=g <-- how my brain feels at the news of encoding differences change etag handling responses15:03
TheJuliatransport layer should not change application layer15:03
opendevreviewMerged openstack/ironic-tempest-plugin master: Undo maintenace state on protected node tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/84975215:05
TheJuliaoh... I think I know why they are signaling a weak etag back15:20
TheJuliayeah, so it is byte by byte strength of body response indication, and with any compression, that may not always be true because the body may be the same, but the compressed payload can be slightly different15:24
TheJuliaor vise-versa, all because of tokenization I guess15:24
TheJuliaso... those get flagged weak...15:25
TheJuliaI guess we need to just not use encoding on some things...15:25
TheJuliaproxies can also change the etag representation based upon it's own perception15:27
rpittaugood night o/16:01
jrosserI have a question about networking-generic-switch - ngs_mac_address <- this can be used to identify a switch by lldp? Is it possible to identify using the System Name TLV rather than (i think) the Chassis ID?16:11
TheJuliaI *think* it might be16:27
TheJuliaI *think* that field gets translated for the switch configuration to use16:27
TheJuliaso if memory serves i could be something like "switch0-1"16:28
TheJuliaerr, I think it could be16:28
TheJuliabut naturally, that would need to be ensured it is matches up between ngs config and port configuration information in ironic16:28
opendevreviewMerged openstack/metalsmith stable/victoria: Add linters job  https://review.opendev.org/c/openstack/metalsmith/+/84967416:41
timeuTheJulia: ah that makes sense. So one solution would be not to send any encoding for the virtual media slots ?16:50
TheJuliaI'm thinking in general16:52
TheJuliaI'm not sure there really is value in it16:53
timeuso to remove the Accept-Encoding header from the default headers in requests ? I guess we won't have compression then 16:54
TheJuliayeah, and I'm not sure compression really is valuable with the small interactions we end up having16:54
timeulet me try to disable the encoding and test it I think it would fix the issue with Lenovo 16:56
TheJulia++16:56
timeuwithout needing to revert Arne's workaround16:56
timeualthough to be honest the workaround is quite unnecessary because it patched my workaround with was to fix the weak Etag issue16:57
TheJuliayeah, except if you think about it, proxies can break overall interaction too16:57
timeuhmm 17:03
opendevreviewJulia Kreger proposed openstack/sushy master: Do not send compression as acceptable encoding  https://review.opendev.org/c/openstack/sushy/+/85054717:08
timeunice was about to submit the same PR ;-) So I can test this on Lenovo hardware. 17:10
timeuboth old as well as new firmware but I think it should fix the issue for both 17:10
TheJuliaI *think* it will17:10
TheJuliaand I think that it might end up fixing some of the things arne_wiebalck has encountered17:11
TheJuliait feels super heavy handed, but overall I think it is the right change17:11
timeuok tested it now with both old and new firmware and works for both of them (still with Arne's workaround and without my patch)17:18
jrosserTheJulia: thanks for the tip - will take a closer look at it. I have a switch which reports a per-port rather than per-chassis mac which is unfortunate17:18
TheJuliajrosser: ouch17:19
TheJuliajrosser: I think we've heard of something similar in the past where the switch was sending the switch port's mac address so each port was listing a sequentially rising mac address for every port. I think in the end they ended up having to manually fix the records. :\17:21
TheJuliaThey might have done that with introspection rules in end as well17:21
jrosserright - thats why i was pondering on the system name instead because at least that is consistent17:21
TheJuliaYay for hardware vendor inconsistencies17:23
TheJuliabut I guess... that does help ensure gainful employment for us all17:23
opendevreviewJulia Kreger proposed openstack/ironic stable/victoria: CI: disable some jobs which are not working on older branches  https://review.opendev.org/c/openstack/ironic/+/84967717:49
TheJuliawelp... I finally see what is going on with the v6 job18:01
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Save routing table information for troubleshooting  https://review.opendev.org/c/openstack/ironic/+/85053118:10
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Only setup fake v6 interface if needed  https://review.opendev.org/c/openstack/ironic/+/85057018:10
opendevreviewMerged openstack/ironic stable/yoga: CI: Removing ironic job queue  https://review.opendev.org/c/openstack/ironic/+/84351118:15
* TheJulia poke tox with a stick19:04
TheJulias/poke/pokes19:04
TheJuliahey folks, I'd appreciate reviews on https://review.opendev.org/c/openstack/ironic/+/849677 so I can fix the victoria branch up and merge some cleanup patches as well20:43
stevebaker[m]good morning21:07
opendevreviewVerification of a change to openstack/ironic stable/victoria failed: CI: disable some jobs which are not working on older branches  https://review.opendev.org/c/openstack/ironic/+/84967722:30
jandersgood morning Ironic o/23:30
janderstimeu (and TheJulia and rpittau) fantastic work investigating the whiskey-tangoes with Lenovo eTags - thank you for that and it matches/explains what I've seen while troubleshooting with our internal users23:41
jandersbig thanks23:41

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!