opendevreview | Julia Kreger proposed openstack/sushy master: Handle AccessError with Basic Auth instead of "reauth" https://review.opendev.org/c/openstack/sushy/+/850425 | 00:27 |
---|---|---|
TheJulia | JayF: replied to your primary comment on https://review.opendev.org/c/openstack/ironic/+/850259 :) | 00:38 |
TheJulia | just fyi, looks like they were not using session auth, but basic auth | 00:39 |
TheJulia | I guess the change is still basically valid and it boils down to proactive vs reactive | 00:42 |
rpittau | good morning ironic! o/ | 06:49 |
rpittau | JayF, TheJulia, I approved https://review.opendev.org/c/openstack/ironic-python-agent/+/839084, see if I got the time to use lsblk json output in a follow up | 06:58 |
rpittau | rethinking about that it's better it was not used as the backport would've been problematic | 06:58 |
opendevreview | Uemit Seren proposed openstack/sushy master: Implement allowed values for BootSourceOverrideEnabled https://review.opendev.org/c/openstack/sushy/+/850461 | 08:22 |
opendevreview | Merged openstack/ironic-python-agent master: Guard shared device/cluster filesystems https://review.opendev.org/c/openstack/ironic-python-agent/+/839084 | 08:23 |
janders | good morning rpittau and Ironic o/ | 08:43 |
rpittau | hey janders :) | 08:53 |
opendevreview | Merged openstack/ironic master: Remove support for trusted boot https://review.opendev.org/c/openstack/ironic/+/850239 | 09:01 |
opendevreview | Riccardo Pittau proposed openstack/ironic-python-agent master: Use lsblk json output for safety_check_block_device https://review.opendev.org/c/openstack/ironic-python-agent/+/850497 | 09:57 |
opendevreview | Jakub Jelinek proposed openstack/ironic-python-agent master: Remove unused lines of code https://review.opendev.org/c/openstack/ironic-python-agent/+/850498 | 10:07 |
janders | arne_wiebalck do you remember what was the hardware platform that originally required https://review.opendev.org/c/openstack/sushy/+/818110 ("Handle weak Etags")? | 10:29 |
janders | I'm trying to build a mental model of the failure modes related to eTags that we know about... clearly certain Lenovo and HP models can be a challenge... | 10:31 |
janders | we don't know what we don't know but thinking through what we DO know may be a start | 10:32 |
arne_wiebalck | janders: I *think* I amended code which was a workaround, IOW I patched to code to actually work with the expected behaviour of the endpoint. | 11:12 |
arne_wiebalck | janders: you may want to have a look at this one: https://review.opendev.org/c/openstack/ironic-python-agent/+/850498 | 11:12 |
arne_wiebalck | janders: (different topic) | 11:13 |
timeu | janders,arne_wiebalck: originally when I contributed the virtual media patch implementation for the Lenovo hardware I also added the workaround to strip the \W from the Weak Etag because back then the Lenovo BMC accepted only strong Etag although it returns a weak one | 11:20 |
timeu | I guess this broke it for Redfish implementation of some vendors (SuperMicro and HPE) that might return a weak Etag and only accept one | 11:20 |
timeu | AFAIK: arne_wiebalck patch to return both fixed the issue | 11:21 |
timeu | however that patch now broke virtual media for Lenovo nodes. At least for Lenovo nodes with newer firmaware returning the Etag unmodifed works now. My expectation is that it would also work for SuperMicro + HPE but that needs to be verified | 11:26 |
iurygregory | good morning Ironic o/ | 11:31 |
pas-ha[m] | hi all, have the next question re ironic api - we're trying to enable CADF notifications for our services, and ironic api is one of the weird ones. Most of other services use paste.ini and actually have the '/' path served by a special pipeline/app that only does version document. the audit middleware is used only on other pipelines that serve /vN paths. | 11:44 |
pas-ha[m] | Ironic does not use paste.ini, the order and presence of middlewares in kind of hardcoded in the code. | 11:44 |
pas-ha[m] | Would someone have suggestions on how to block audit middleware and ironic to trigger on e.g. '/' root api path which is not covered by keystone auth and is usually used by monitoring/healthchecks? it just produces too much noise... and somewhat malformatted CADF messages as there's no user/project etc to insert | 11:44 |
pas-ha[m] | the only ways I see it currently is either re-write ironic api to use different 'apps' on diffferent paths (with e.g. pecan-mount) or patch keystonemiddleware to allow also ignored paths (additionally to ignored HTTP verbs) | 12:18 |
pas-ha[m] | would be nice if someone has alternative suggestions :-) | 12:18 |
arne_wiebalck | timeu: this is my understanding as well, yes | 12:21 |
arne_wiebalck | timeu: it broke new nodes? | 12:21 |
arne_wiebalck | timeu: is the breakage "justified" ? | 12:21 |
timeu | your patch broke all Lenovo nodes (both old and new firmware) | 12:29 |
timeu | seems like the Lenovo Redfish can't handle if you pass two Etags | 12:29 |
opendevreview | kamlesh chauvhan proposed openstack/ironic-tempest-plugin master: Add iDRAC RAID cleaning steps tests https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/841601 | 12:29 |
timeu | I tried to even reverse the order | 12:29 |
timeu | but still the same error | 12:29 |
timeu | newer Lenovo firmwares however don't need my workaround (stripping \W) anymore and I guess should behave the same as HPE, SuperMicro (just return the Etag as is) | 12:30 |
janders | arne_wiebalck timeu I haven't had a chance to do hands-on testing with Lenovos but from the problem reports I'm getting are consistent with timeu's assessment ("Lenovo Redfish can't handle if you pass two Etags") | 12:34 |
janders | arne_wiebalck timeu what I haven't had a chance to figure out though is where is the second eTag coming from. In basic queries against the System I've only gotten one eTag back. Do these only show up while querying certain URLs, do you know? | 12:35 |
arne_wiebalck | timeu: passing multiple Etags is allowed by the standard. | 12:37 |
janders | In any case I wonder if we need to consider implementing a more complex logic where we try different things depending on server responses, cause it really seems that 1) one vendor's fix is another vendor's bug trigger and 2) the scale of the problem is too widespread to wait for the vendors to improve their compliance with standards due to high | 12:37 |
janders | impact on users | 12:37 |
arne_wiebalck | janders: the 2nd etag is coming from the patch | 12:37 |
janders | "not our fault but our problem" kind of stuff | 12:37 |
arne_wiebalck | janders: it gets one etag and returns it in two formats | 12:37 |
janders | arne_wiebalck hmm interesting - so is the patch "producing" the weak eTag? | 12:38 |
janders | (in case it is needed) | 12:38 |
* janders is looking at the patch again | 12:38 | |
arne_wiebalck | janders: FWIU, timeu 's patch stripped the qualifier off the weak etag (and weak etags should not have been used in the first place!) | 12:39 |
arne_wiebalck | janders: my patch kept timeu 's patch and also returned the strong etag | 12:39 |
arne_wiebalck | (I think :-) | 12:40 |
arne_wiebalck | and I think returning multiple pre-conditions is allowed | 12:41 |
janders | arne_wiebalck if I am reading the regex and the conditional right ( https://review.opendev.org/c/openstack/sushy/+/818110/1/sushy/resources/base.py#610 ) it seems that if the server returns a strong eTag, this code only returns one eTag, however if it returns a weak eTag, the code returns it in "strong" and "weak" "format", right? | 12:45 |
janders | (I think Lenovo weirdness partially comes from the fact that it seems to operate on strong eTags, yet it fails out due to getting strong+weak together somehow) | 12:45 |
janders | (I haven't figured out the source of that weak eTag in that case) | 12:46 |
arne_wiebalck | janders: yes, this is my understanding re the regex | 12:46 |
arne_wiebalck | janders: if a weak etag is returned at the start, this is already wrong I think | 12:47 |
arne_wiebalck | janders: oh, it returns a strong one, fair enough | 12:47 |
janders | arne_wiebalck we've seen it return a strong eTag when just GETting the System but when Ironic errs out, there is strong+weak eTags in the failing query in the logs (I haven't yet figured out how this happens and why the behaviour varies) | 12:48 |
janders | arne_wiebalck timeu have you guys seen BMCs that return strong eTags in some responses and weak eTags in other responses to RedFish queries? | 12:50 |
arne_wiebalck | janders: no | 12:50 |
janders | arne_wiebalck I think I may need to gather some more evidence of that (once I'm done with some urgent downstream work in a week or so) and take it from there I suppose | 12:52 |
TheJulia | good morning | 12:53 |
timeu | in my original workaround I received a weak etag from Lenovo BMC | 12:54 |
timeu | and had to return a strong one by stripping the W/ | 12:54 |
timeu | this was with older firmwares | 12:54 |
timeu | the newer firmware still send a weak etag but also accept the same etag back | 12:55 |
timeu | so my original workaround is not anymore necessary for newer Lenovo firmware. | 12:55 |
* TheJulia has reached "get contacted by conference organizing business" level of spam | 12:56 | |
timeu | Lenovo firmware can't handle multiple Etags (weak + strong) although it is allowed from the standard | 12:56 |
janders | timeu in your experience, does the Lenovo BMC always send the weak eTag, or does it vary (sometimes strong, sometimes weak)? | 12:57 |
timeu | as far as I can tell it always sends the weak one | 12:57 |
timeu | but I need to check this | 12:57 |
timeu | at least for PATCH on virtual media it does | 12:57 |
janders | timeu I'm trying to find the query we ran the last time - it may be interesting to run it on yours if it's possible | 12:58 |
timeu | sure I can do this | 12:58 |
janders | one query where we got a strong eTag was against the vMedia URL: /redfish/v1/Managers/1/VirtualMedia/EXT1 | 13:00 |
janders | that was just a plain GET | 13:00 |
janders | GETting the System ( /redfish/v1/Systems/1 ) also seemed to produce a strong eTag in our test | 13:02 |
TheJulia | iurygregory: any issue with https://review.opendev.org/c/openstack/sushy/+/850425 getting the ironic-week-prio hashtag? | 13:03 |
iurygregory | TheJulia, none (we don't need to wait till the upstream meeting to add the tag ^^) | 13:04 |
iurygregory | also, good morning TheJulia =) | 13:04 |
TheJulia | I know, i figured I'd ask anyway :) | 13:04 |
timeu | janders: this was with Lenovo hardware ? | 13:11 |
janders | timeu yes, ThinkEdge SE450 | 13:14 |
janders | timeu if it matters, the BMC version on the machine we tested against was 1.10 (4/20/2022) | 13:18 |
arne_wiebalck | janders: timeu: are we now in a situation where we needed to patch (due to some implementation issue), then patched the patch (to support the expected behaviour while maintaining the original patch), only to see that the fixed implementation is done in a way which breaks our patched patch? hilarious :) | 13:33 |
janders | timeu I'm wrapping up for the night (I'm in Australia) but happy to continue the conversation over the coming days | 13:33 |
janders | arne_wiebalck yeah I am not sure whether I should laugh or cry | 13:33 |
arne_wiebalck | it is too hot for either | 13:34 |
janders | arne_wiebalck what are the temps in your area? | 13:35 |
arne_wiebalck | around 36 degree Celsius | 13:36 |
janders | that's a fair bit | 13:37 |
janders | arne_wiebalck timeu IIUC from limited Lenovo SE450 testing these send eTags but don't actually require them to be sent back (it's optional - as long as only one is sent, never strong+weak) - I may be wrong but if I am right it's another interesting piece to the puzzle.. anyways to be continued... | 13:37 |
janders | see you tomorrow Ironic o/ | 13:37 |
opendevreview | Riccardo Pittau proposed openstack/python-ironicclient stable/yoga: Fix logging in the baremetal CLI https://review.opendev.org/c/openstack/python-ironicclient/+/850486 | 13:41 |
opendevreview | Riccardo Pittau proposed openstack/python-ironicclient stable/wallaby: Fix logging in the baremetal CLI https://review.opendev.org/c/openstack/python-ironicclient/+/850487 | 13:41 |
opendevreview | Riccardo Pittau proposed openstack/python-ironicclient stable/xena: Fix logging in the baremetal CLI https://review.opendev.org/c/openstack/python-ironicclient/+/850488 | 13:41 |
janders | arne_wiebalck I think we're pretty close to a "don't you dare break my bug" situation :) | 13:41 |
opendevreview | Julia Kreger proposed openstack/ironic master: Do not require stage2 for anaconda with standalone https://review.opendev.org/c/openstack/ironic/+/835769 | 13:50 |
opendevreview | Merged openstack/networking-generic-switch master: Add Nokia SRL release note https://review.opendev.org/c/openstack/networking-generic-switch/+/850245 | 13:50 |
timeu | sorry was a bit AFK. janders I checked for redfish/v1/Systems and there the SD530 Lenovo machine returns a strong Etag. I neeed to check what they send for the virtual media one | 13:57 |
opendevreview | Julia Kreger proposed openstack/ironic-tempest-plugin master: Swap maintenance fix to cleanup step https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/850524 | 14:02 |
TheJulia | janders: I guess the question becomes, do we figure out how to navigate around, or do we press vendors. I think navigate around is key if at all posisble which means somewhere the code has to be smarter | 14:04 |
TheJulia | perhaps this is a case to add logic into operation handling to check/modify/retry?! | 14:04 |
TheJulia | Granted, touching that code is... super low level, but could have the best possible impact | 14:04 |
arne_wiebalck | janders: if the implementation does not require the etag to be sent back, the implementation is broken, no? | 14:05 |
arne_wiebalck | janders: I mean, double-broken :) | 14:06 |
TheJulia | I guess we could argue if all implementations are broken in their own ways :) | 14:06 |
TheJulia | s/if// | 14:06 |
TheJulia | there is no if | 14:06 |
TheJulia | only zuul... and broken implementations | 14:06 |
arne_wiebalck | heh | 14:06 |
arne_wiebalck | I *think* I have another downstream only patch to deal with implementation specifics | 14:06 |
TheJulia | timeu's encounter with the Continious flag not being honored on lenovo gear again just means pain, I think that is a case where we're going to have to be smarter in the grand scheme of the universe | 14:07 |
TheJulia | And it is okay to need to change/evolve, be a little flexible as long as we do it with smarts and plenty of unit testing :) | 14:08 |
timeu | I also need to check because the /system/1 endpoint returns a strong etag. But in order for my original patch to have an effect the virtual_media endpoint must return a weak one otherwise stripping the W/ wouldn't have had any effect on it | 14:09 |
TheJulia | oh jungleboyj! | 14:10 |
TheJulia | you around? | 14:10 |
TheJulia | I think we're also fast reaching the point where talking with lenovo engineering folks might provide some clarity | 14:10 |
opendevreview | Julia Kreger proposed openstack/metalsmith stable/victoria: Add linters job https://review.opendev.org/c/openstack/metalsmith/+/849674 | 14:15 |
TheJulia | rpittau: ^^ added an .ansible-lint file to fix the issue, noted in the commit message | 14:15 |
opendevreview | Julia Kreger proposed openstack/ironic master: project scoped manager support https://review.opendev.org/c/openstack/ironic/+/818299 | 14:18 |
TheJulia | Just a spelling fix on a patch which previously had a +2 | 14:18 |
TheJulia | ^ | 14:18 |
rpittau | TheJulia: ack | 14:23 |
timeu | I don't understand. If I do a curl against the virtual media slot I get a strong Etag | 14:23 |
timeu | but sushy tells me that I have a weak one | 14:23 |
timeu | o_0 | 14:23 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Save routing table information for troubleshooting https://review.opendev.org/c/openstack/ironic/+/850531 | 14:25 |
TheJulia | that... seems... weird | 14:26 |
timeu | this is the output from curl: < ETag: "515e3d5a4e1424e93ec" | 14:27 |
timeu | and this is if done via sushy: (Pdb) data.headers['Etag'] | 14:27 |
timeu | 'W/"515e3d5a4e1424e93ec"' | 14:27 |
timeu | same URI | 14:27 |
timeu | does Lenovo treat the client differently o_0 | 14:27 |
TheJulia | it might... | 14:29 |
TheJulia | odata header perhaps? | 14:29 |
timeu | hmm curl returns following data: {"@odata.etag":"\"515e3d5a4e1424e93ec\"" | 14:39 |
timeu | I tried to send a curl request with headers: {'OData-Version': '4.0'} | 14:39 |
timeu | hmm if i specifiy that I get: Modify OData related header(s) that can be accepted by server. | 14:41 |
timeu | hmm If I do it properly the OData-Version header works but I still only get a strong etag via curl | 14:43 |
TheJulia | could it be time to try with just python-requests? | 14:58 |
rpittau | mmm I was also going to suggest this ^ | 15:00 |
* TheJulia is worried there is user-agent aware code. | 15:00 | |
rpittau | curl is ok, but better go through a closer implementation | 15:00 |
timeu | I checked | 15:00 |
timeu | 'Accept-Encoding: gzip, deflate' | 15:00 |
timeu | as soon as you send this header | 15:00 |
timeu | which requests does by defaulkt | 15:00 |
timeu | I get a weak tag | 15:00 |
TheJulia | WTF | 15:00 |
timeu | if I add this header to curl I also get a weak etag | 15:00 |
timeu | otherwise it returns me a strong one | 15:01 |
timeu | o_0 | 15:01 |
rpittau | fantastic | 15:01 |
rpittau | whiskey tango foxtrot indeed | 15:01 |
timeu | I gotta leave for an hour or so, but at least some hint why Lenovo sometimes returns strong and sometimes weak etags | 15:01 |
TheJulia | https://media3.giphy.com/media/Aevr67I0q2ZRC/giphy.gif?cid=ecf05e47eptgnq2l56mo64a2ul7yyax86cypmhyp2itnur7y&rid=giphy.gif&ct=g <-- how my brain feels at the news of encoding differences change etag handling responses | 15:03 |
TheJulia | transport layer should not change application layer | 15:03 |
opendevreview | Merged openstack/ironic-tempest-plugin master: Undo maintenace state on protected node tests https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/849752 | 15:05 |
TheJulia | oh... I think I know why they are signaling a weak etag back | 15:20 |
TheJulia | yeah, so it is byte by byte strength of body response indication, and with any compression, that may not always be true because the body may be the same, but the compressed payload can be slightly different | 15:24 |
TheJulia | or vise-versa, all because of tokenization I guess | 15:24 |
TheJulia | so... those get flagged weak... | 15:25 |
TheJulia | I guess we need to just not use encoding on some things... | 15:25 |
TheJulia | proxies can also change the etag representation based upon it's own perception | 15:27 |
rpittau | good night o/ | 16:01 |
jrosser | I have a question about networking-generic-switch - ngs_mac_address <- this can be used to identify a switch by lldp? Is it possible to identify using the System Name TLV rather than (i think) the Chassis ID? | 16:11 |
TheJulia | I *think* it might be | 16:27 |
TheJulia | I *think* that field gets translated for the switch configuration to use | 16:27 |
TheJulia | so if memory serves i could be something like "switch0-1" | 16:28 |
TheJulia | err, I think it could be | 16:28 |
TheJulia | but naturally, that would need to be ensured it is matches up between ngs config and port configuration information in ironic | 16:28 |
opendevreview | Merged openstack/metalsmith stable/victoria: Add linters job https://review.opendev.org/c/openstack/metalsmith/+/849674 | 16:41 |
timeu | TheJulia: ah that makes sense. So one solution would be not to send any encoding for the virtual media slots ? | 16:50 |
TheJulia | I'm thinking in general | 16:52 |
TheJulia | I'm not sure there really is value in it | 16:53 |
timeu | so to remove the Accept-Encoding header from the default headers in requests ? I guess we won't have compression then | 16:54 |
TheJulia | yeah, and I'm not sure compression really is valuable with the small interactions we end up having | 16:54 |
timeu | let me try to disable the encoding and test it I think it would fix the issue with Lenovo | 16:56 |
TheJulia | ++ | 16:56 |
timeu | without needing to revert Arne's workaround | 16:56 |
timeu | although to be honest the workaround is quite unnecessary because it patched my workaround with was to fix the weak Etag issue | 16:57 |
TheJulia | yeah, except if you think about it, proxies can break overall interaction too | 16:57 |
timeu | hmm | 17:03 |
opendevreview | Julia Kreger proposed openstack/sushy master: Do not send compression as acceptable encoding https://review.opendev.org/c/openstack/sushy/+/850547 | 17:08 |
timeu | nice was about to submit the same PR ;-) So I can test this on Lenovo hardware. | 17:10 |
timeu | both old as well as new firmware but I think it should fix the issue for both | 17:10 |
TheJulia | I *think* it will | 17:10 |
TheJulia | and I think that it might end up fixing some of the things arne_wiebalck has encountered | 17:11 |
TheJulia | it feels super heavy handed, but overall I think it is the right change | 17:11 |
timeu | ok tested it now with both old and new firmware and works for both of them (still with Arne's workaround and without my patch) | 17:18 |
jrosser | TheJulia: thanks for the tip - will take a closer look at it. I have a switch which reports a per-port rather than per-chassis mac which is unfortunate | 17:18 |
TheJulia | jrosser: ouch | 17:19 |
TheJulia | jrosser: I think we've heard of something similar in the past where the switch was sending the switch port's mac address so each port was listing a sequentially rising mac address for every port. I think in the end they ended up having to manually fix the records. :\ | 17:21 |
TheJulia | They might have done that with introspection rules in end as well | 17:21 |
jrosser | right - thats why i was pondering on the system name instead because at least that is consistent | 17:21 |
TheJulia | Yay for hardware vendor inconsistencies | 17:23 |
TheJulia | but I guess... that does help ensure gainful employment for us all | 17:23 |
opendevreview | Julia Kreger proposed openstack/ironic stable/victoria: CI: disable some jobs which are not working on older branches https://review.opendev.org/c/openstack/ironic/+/849677 | 17:49 |
TheJulia | welp... I finally see what is going on with the v6 job | 18:01 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Save routing table information for troubleshooting https://review.opendev.org/c/openstack/ironic/+/850531 | 18:10 |
opendevreview | Julia Kreger proposed openstack/ironic master: CI: Only setup fake v6 interface if needed https://review.opendev.org/c/openstack/ironic/+/850570 | 18:10 |
opendevreview | Merged openstack/ironic stable/yoga: CI: Removing ironic job queue https://review.opendev.org/c/openstack/ironic/+/843511 | 18:15 |
* TheJulia poke tox with a stick | 19:04 | |
TheJulia | s/poke/pokes | 19:04 |
TheJulia | hey folks, I'd appreciate reviews on https://review.opendev.org/c/openstack/ironic/+/849677 so I can fix the victoria branch up and merge some cleanup patches as well | 20:43 |
stevebaker[m] | good morning | 21:07 |
opendevreview | Verification of a change to openstack/ironic stable/victoria failed: CI: disable some jobs which are not working on older branches https://review.opendev.org/c/openstack/ironic/+/849677 | 22:30 |
janders | good morning Ironic o/ | 23:30 |
janders | timeu (and TheJulia and rpittau) fantastic work investigating the whiskey-tangoes with Lenovo eTags - thank you for that and it matches/explains what I've seen while troubleshooting with our internal users | 23:41 |
janders | big thanks | 23:41 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!