Tuesday, 2022-04-26

opendevreviewAmeya Raut proposed openstack/ironic-tempest-plugin master: Detaching instance_uuid for standalone TC's  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/83846202:36
arne_wiebalckGood morning, Ironic!06:05
jandersgood morning arne_wiebalck and Ironic o/06:15
arne_wiebalckhey janders o/06:16
opendevreviewHarald Jensås proposed openstack/networking-baremetal master: [DNM] TEST CI networking-baremetal-multitenant-vlans  https://review.opendev.org/c/openstack/networking-baremetal/+/83929806:29
rpittaugood morning ironic! o/07:19
dtantsurmorning folks07:40
dtantsurrpittau: hey, let's sync re https://review.opendev.org/c/openstack/sushy-tools/+/83680107:40
dtantsurthe last thing we determined about lower-constraints is that they should contain all dependencies. has anything changed?07:40
rpittaudtantsur: good morning!07:43
rpittauI actually don't think we need all the deps in lower-constraints, some of them can be left free and adjusted if needed07:44
dtantsurnot having dependencies there is what has broken us07:44
dtantsuressentially, MarkupSafe, Werkzeug and itsdangerous removed API that the old Flask (?) relied on07:44
dtantsuryeah, I think it was Flask07:45
dtantsurthe alternative is to stop testing lower-constraints at all07:46
rpittauwe just need to track in lower-constraints what we track in requirements, and possibly some other deps of deps if needed07:49
rpittaulook at other projects like sushy or ipa, even if they break at some point, it will be for just one or two deps, and always the same07:49
rpittautracking all of them doesn't make sense07:49
rpittauthat's one of the reasons why we decided to leave the l-c test only in master07:49
dtantsurit's probably easier to add everything than to chase a new package every time something breaks. but I can give it a try.07:50
rpittauI see it the opposite way :)07:54
rpittauleaving the direct requirements free to "choose" their dependency versions reduces the number of packages that we have to chase in case of breakage07:54
opendevreviewDmitry Tantsur proposed openstack/sushy-tools master: Fix the CI  https://review.opendev.org/c/openstack/sushy-tools/+/83680107:54
dtantsurrpittau: trying ^^^07:54
jandersgood morning rpittau dtantsur and Ironic o/07:59
rpittauhey janders :)08:00
dtantsurrpittau: well, them "choosing" their dependencies IS the source of breakages08:03
dtantsuranyway, the patch seems to be passing. please review.08:04
sagarHi ironic !08:47
sagarI am part of the Dell team working on improving the test coverage in its third-party CI.08:49
sagarIn ironic_tempest_plugin, we are currently adding test cases for different deployment scenarios with all the available drivers. To achieve the synchronized boot mode deployment scenario, we are planning to use the boot_mode parameter in test cases.08:49
sagarWe have tried to use boot_mode as a node attribute in test cases, by referring to https://github.com/openstack/ironic-tempest-plugin/blob/56af4756a993b264bac6f5c7788397ebfc7359bf/ironic_tempest_plugin/services/baremetal/v1/json/baremetal_client.py#L505 . Still, it's not getting reflected in the node properties.08:53
dtantsursagar: hi! boot_mode is usually set by devstack. what exactly are you trying to achieve?08:55
sagarWe are checking the boot mode on the server first and then sending the opposite of it. For example: if boot_mode on the server is uefi, we need to pass "bios" boot_mode from the tempest test case.09:03
dtantsursagar: you'll have a hard time. take a look at devstack/lib/ironic: we're setting the current boot mode on the flavor.09:05
sagardtantsur: Does that mean we can't update boot_mode through tempest as of now?09:08
sagarMeanwhile I will also check : devstack/lib/ironic as you have suggested.09:10
opendevreviewRiccardo Pittau proposed openstack/bifrost master: [DNM] Test dhcp-all-interfaces fix  https://review.opendev.org/c/openstack/bifrost/+/83932909:52
dtantsursagar: you probably won't be able to without a lot of hackery10:06
sagardtantsur: ok, Thank you.10:09
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Multipath Hardware path handling  https://review.opendev.org/c/openstack/ironic-python-agent/+/83703910:17
rpittauTheJulia: re: multipath, I pushed an update to your patch based on some... testing results, and dtantsur suggestions :)10:21
dtantsurrpittau: can we actually cache the outcome of is_multipath_enabled?10:27
dtantsurmaybe in a global variable as a temporary measure?10:27
rpittauoh yeah10:28
opendevreviewRiccardo Pittau proposed openstack/ironic-python-agent master: Multipath Hardware path handling  https://review.opendev.org/c/openstack/ironic-python-agent/+/83703910:35
opendevreviewRiccardo Pittau proposed openstack/bifrost master: Upgrade from stable/yoga  https://review.opendev.org/c/openstack/bifrost/+/83936912:06
iurygregorygood morning Ironic12:10
rpittauhey iurygregory :)12:11
iurygregoryo/12:12
janderssee you tomorrow Ironic o/12:57
TheJuliarpittau: Okay, I'll caffeinate and look in a little bit12:59
TheJuliadtantsur: I tried originally with a global var... I ran into all sorts of testing issues12:59
TheJulialike... bashing my head into a wall until it was covered in blood tried13:00
rpittaubye janders :)13:03
rpittaugood morning TheJulia :)13:03
TheJuliaThe only way to make it a global is likely make overall detection of mpio capability a method invoked as part of startup and never again. Kind of similar to how the node cache works, although that only gets updated after initial check-in upon direct invocation by ?1? method13:21
dtantsurTheJulia: in theory, we should cache it on the hardware manager13:22
TheJuliaI really tried13:22
dtantsurbut the current design makes it harder (list_all_block_devices is a global function)13:22
TheJuliayup13:22
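The caching idea discussed above (remember the outcome of is_multipath_enabled in a module-level variable) could look roughly like the sketch below. This is not the ironic-python-agent code: the probe function, the reset helper and the `detect` parameter are hypothetical, and the probe only checks whether a `multipath` binary is on the PATH. The explicit reset helper is one way to soften the testing pain mentioned above, since each test can clear the global state.

```python
import shutil

# Module-level cache; None means "not checked yet".
_MULTIPATH_ENABLED = None


def _detect_multipath():
    """Hypothetical probe: only checks that a multipath binary exists."""
    return shutil.which("multipath") is not None


def is_multipath_enabled(detect=_detect_multipath):
    """Return the cached detection result, probing only on first use."""
    global _MULTIPATH_ENABLED
    if _MULTIPATH_ENABLED is None:
        _MULTIPATH_ENABLED = bool(detect())
    return _MULTIPATH_ENABLED


def reset_multipath_cache():
    """Forget the cached value, e.g. between unit tests."""
    global _MULTIPATH_ENABLED
    _MULTIPATH_ENABLED = None
```

A unit test would call reset_multipath_cache() in its setup and pass a fake `detect` callable, so no real host state leaks between tests.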
TheJuliarpittau: your changes look good to me13:27
hjensasanyone else see the Task "Generate statistics" error seen here - https://zuul.opendev.org/t/openstack/build/391677d36eed4781bf2394d590c27146 ?13:28
rpittauthanks TheJulia :)13:28
TheJuliahjensas: no, generate statistics is something brand new that I've seen no emails on13:29
hjensasTheJulia: ok, I'll keep digging then. networking-baremetal CI is broken, and the failing generate stats task causes "POST_FAILURE" and no logs are collected.13:30
TheJuliaugh13:30
* hjensas filed https://bugs.launchpad.net/devstack/+bug/197043113:36
hjensasdansmith: if you are around, can you look at ^ - I'm not sure if we want to use 'tryint()' on L34 of get-stats.py, or put zero in value, or skip the stat?13:50
dansmithhjensas: yeah, tryint would be good.. I have no idea why systemd is reporting something like that13:51
dansmithhjensas: we should also mark the ansible as "ignore_errors" there so we don't nail you in cases like this13:52
dansmithgmann: ^13:52
dansmithhjensas: I have a patch up to fix something else, so let me just add to that13:52
hjensasdansmith: yeah, it is strange. I proposed https://review.opendev.org/c/openstack/devstack/+/839387 - using trying()13:52
dansmithoh okay13:52
hjensass/trying()/tryint()/13:53
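For readers without get-stats.py at hand, a tolerant integer conversion of the kind discussed here can be sketched as follows. This is an illustration written for this log, not a copy of the devstack helper: the fallback behaviour (returning the input unchanged) and the sample value are assumptions.

```python
def tryint(value):
    """Return value as an int when possible, otherwise return it unchanged."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Assumed fallback: keep the raw value so the caller can decide
        # whether to record it, replace it with zero, or skip the stat.
        return value
```

With this shape, a numeric string such as "1024" becomes the integer 1024, while an unexpected non-numeric report from systemd passes through as-is instead of raising and failing the whole task.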
gmanndansmith: you mean to mark the devstack role itself with ignore_error so that it does not cause job failure. I think we did the same for the stackviz role case also.13:54
opendevreviewHarald Jensås proposed openstack/networking-baremetal master: [DNM] TEST CI networking-baremetal-multitenant-vlans  https://review.opendev.org/c/openstack/networking-baremetal/+/83929813:54
dansmithgmann: yeah, I have another tweak for jobs that don't have pymysql installed, so I added it to that: https://review.opendev.org/c/openstack/devstack/+/83921713:55
gmanndansmith: nice13:56
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic-python-agent stable/wallaby: Multipath Hardware path handling  https://review.opendev.org/c/openstack/ironic-python-agent/+/83778414:03
iurygregoryrpittau, "backport" updated14:03
rpittauiurygregory: checked and lgtm14:06
rpittauthanks!14:06
iurygregorynice!14:08
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: v6/grenade multinode jobs  https://review.opendev.org/c/openstack/ironic/+/83908614:14
TheJuliaif that doesn't work, I'm likely going to need to ask opendev folks to hold the VMs from an execution so I can poke around.14:15
TheJuliavxlan tunnel sadness (which is a headache with every multinode job)14:15
dtantsurlooking for a 2nd +2 on https://review.opendev.org/c/openstack/sushy-tools/+/83680114:43
TheJuliadone14:44
dtantsurthx!14:45
TheJuliawould anyone like classix pixie boots stickers?14:45
TheJuliaclassic14:45
dtantsurbring them to the summit :)14:45
opendevreviewRiccardo Pittau proposed openstack/sushy-tools master: Use python Zed tests  https://review.opendev.org/c/openstack/sushy-tools/+/83867414:52
opendevreviewDmitry Tantsur proposed openstack/sushy-tools master: vmedia: keep the original URL in Image  https://review.opendev.org/c/openstack/sushy-tools/+/83679514:54
dtantsurthis ^^^ is annoying when debugging14:54
TheJuliaApparently my reward for ordering two packs of stickers is hot sauce14:57
opendevreviewDmitry Tantsur proposed openstack/ironic master: Decouple deploy callback timeout from deploy step timeout  https://review.opendev.org/c/openstack/ironic/+/83769015:02
ajyaftarasenko: while looking through the logs and having another try in my environment I was able to reproduce the issue. I'm starting to see what is happening, but it's not yet clear why it is happening and why only from time to time. It's around the logic in Ironic with async tasks (the ones that reboot the system).15:08
ajyaAs the same pattern is reused in all interfaces, it can affect all idrac async tasks. At least for now it does not look like it has anything to do with iDRAC.15:08
ajyaAnother thing, in newer Ironics the error is ignored here https://opendev.org/openstack/ironic/src/commit/93dc442935d5f7553c2459d46fb1d1c1d9c8a57c/ironic/conductor/rpcapi.py#L7315:09
ajyaMy guess is that something was backported to Wallaby and because Wallaby does not get this ^, the error fails all cleaning.15:10
opendevreviewDmitry Tantsur proposed openstack/bifrost master: Prevent the enroll/deploy commands from running without venv  https://review.opendev.org/c/openstack/bifrost/+/83939915:10
ajyaI'll continue looking into the logic to get it fixed. For now don't see any quick workarounds.15:10
TheJuliathere is nothing like waiting for a node to become available... when it is one of five.15:13
ftarasenkoajya: thank you for your research. workaround is not a problem for me, hope the bug will be found and fixed)15:13
opendevreviewHarald Jensås proposed openstack/networking-baremetal master: Remove deprecated ironic client opts  https://review.opendev.org/c/openstack/networking-baremetal/+/83929815:21
adarobinIs storyboard the appropriate venue for feature requests or just bug reports?15:32
TheJuliaadarobin: it is15:33
TheJuliafor both15:33
dtantsuradarobin: yes https://docs.openstack.org/ironic/latest/contributor/contributing.html#adding-new-features15:33
adarobinCool -- I have patches as well, but getting my employer to sign a contributor agreement is all sorts of fun15:34
dtantsursigh, yeah15:34
adarobinOnly took like a year and half to get the last one :-(15:35
dtantsurmmmmghh.. our love for JSON fields backfires from time to time :( I need to search database for things that have agent_url and I cannot (at least in a database-agnostic way)15:46
opendevreviewMerged openstack/sushy-tools master: Fix the CI  https://review.opendev.org/c/openstack/sushy-tools/+/83680115:48
* dtantsur is wondering how modern sqlalchemy deals with JSON queries15:53
rpittaugood night! o/16:05
dtantsurokay, nothing backend-independent. damn.16:13
TheJuliadtantsur: why do you need to query by agent_url?16:14
dtantsurTheJulia: I need to know if we're running a deploy step now or just waiting for the agent to come back16:14
TheJuliaIn other news, for some insane reason, port 8080 works with ipv6, and 443 does not :(16:14
dtantsurOo16:15
dtantsurI can, of course, filter on the Python side.. but it means fetching all nodes in DEPLOYWAIT16:16
TheJuliadtantsur: so teaching ironic to do a secondary query instead of just assuming the node will heartbeat quickly? I thought we had ipa code to immediately heartbeat upon task completion16:17
dtantsurTheJulia: the timeout does not use heartbeats (and we cannot query by the last heartbeat time since it's also in JSON!)16:18
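The "filter on the Python side" fallback mentioned above can be sketched like this. It is only an illustration: it assumes the nodes in DEPLOYWAIT have already been fetched as dict-like rows, and that agent_url lives in a JSON blob called `driver_internal_info` (that field name and the sample rows are assumptions, not taken from the log).

```python
import json


def nodes_with_agent_url(rows):
    """Yield rows whose JSON blob contains a non-empty agent_url.

    Each row is a dict; the blob may be a JSON string or an already
    decoded dict, depending on how the database layer returns it.
    """
    for row in rows:
        info = row.get("driver_internal_info") or "{}"
        if isinstance(info, str):
            info = json.loads(info)
        if info.get("agent_url"):
            yield row


# Hypothetical rows, as if returned by a query for nodes in DEPLOYWAIT.
rows = [
    {"uuid": "a", "driver_internal_info": '{"agent_url": "http://192.0.2.10:9999"}'},
    {"uuid": "b", "driver_internal_info": "{}"},
    {"uuid": "c", "driver_internal_info": {"agent_url": "http://192.0.2.11:9999"}},
    {"uuid": "d", "driver_internal_info": None},
]
matching = [row["uuid"] for row in nodes_with_agent_url(rows)]
```

The cost is the one pointed out in the discussion: every node in the waiting state has to be loaded and decoded before it can be filtered.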
TheJuliaahh16:18
TheJuliaI still thought we taught IPA to immediately heartbeat anyway, so I'm not sure I truly understand the why unless you're intending to just teach ironic to go ask "what's up?" explicitly16:19
dtantsurI need to be able to tweak the deploy step timeout16:33
dtantsurso yeah, IPA does heartbeat immediately, but it has no effect on the timeout? unless we flip the node to deploying and back.16:34
TheJuliahmmm16:35
TheJuliafeels like a bug that we don't16:35
TheJuliabut there may be reasons there16:35
dtantsura counter-argument could be: heartbeats are not an indicator that a deploy step is not stuck16:35
dtantsurwhether we should handle the case of a deploy step getting stuck.. is questionable16:36
JayFa deploy step arguably should handle the case of itself getting stuck, internally16:36
JayFif it's doing something that long running, nothing preventing it from making sure e.g. the vendor tool it kicked off is making progress16:37
dtantsuryeah, I'm thinking along the same lines16:37
dtantsurmaybe we don't even need a generic deploy step timeout, only a deploy callback timeout16:37
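The idea that a long-running step can watch its own progress, instead of relying on a generic step timeout, might be sketched as below. Everything here is hypothetical (the exception name, the callables, the default timings); it only illustrates a stall detector that fails when the reported progress stops changing for too long.

```python
import time


class StepStalled(Exception):
    """Raised when the monitored task reports no progress for too long."""


def wait_for_progress(get_progress, is_done, stall_timeout=60.0,
                      poll_interval=1.0, clock=time.monotonic,
                      sleep=time.sleep):
    """Poll until is_done() is true; fail if progress stops changing.

    get_progress returns any comparable value (bytes written, percent,
    last log line). clock and sleep are injectable to ease testing.
    """
    last = get_progress()
    last_change = clock()
    while not is_done():
        sleep(poll_interval)
        current = get_progress()
        now = clock()
        if current != last:
            # Progress moved: remember the new value and restart the timer.
            last, last_change = current, now
        elif now - last_change > stall_timeout:
            raise StepStalled(
                "no progress for more than %s seconds" % stall_timeout)
    return last
```

A step wrapping a vendor tool could pass a `get_progress` that reads the tool's status output, so that a healthy but slow run is never cut off by a fixed deadline.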
dtantsurthe hard part is to distinguish the two.. I wish the agent heartbeat timestamp was a normal field we could query against16:40
* dtantsur is curious if we even use it at all16:41
opendevreviewAmeya Raut proposed openstack/ironic-tempest-plugin master: Add iDRAC management cleaning steps tests  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/82664616:41
dtantsuranswering myself: we use it for fast-track detection and for PXE retries16:42
rloowrt dtantsur's comment about searching db for info in json ... we have a downstream hack to allow the user to filter (when doing 'list') on a property key=value pair... maybe it can be generalized to any json field, don't recall how the code works...16:42
dtantsurrloo: it can be done quite easily, but not in a backend-agnostic way16:42
rloothat's what i don't know, the code does something at the db layer and it might be specific to mysql.16:43
*** dking is now known as Guest288716:53
*** Guest2887 is now known as dking17:07
mallik_rloo: Hi17:31
rloohi mallik_17:32
mallik_I tried anaconda based provisioning with rhel 7.9 rhel8.2 and 8.5 as per the documentation, and could not succeed17:33
mallik_rloo: with rhel8.5,  I see  in /tmp/anaconda.log "dasbus.error.DBusError: [Errno 2] No such file or directory: 'efibootmgr': 'efibootmgr'"17:33
mallik_rloo: with rhel8.2, I see "ValueError: Error enabling service chronyd: 1"17:33
TheJuliamallik_: between 8.3 and 8.4, I believe efibootmgr basically became required to be in any rhel image deployed17:34
TheJuliaSo maybe some sort of mismatch?17:34
rlooso it doesn't work with rhel7. there's a bug with anaconda or something. we should prob update some doc or open a bug or ?? about that17:34
rloo(and I don't actually have much experience with anaconda or rhel8)17:35
mallik_rloo: with rhel7.9, node moved to active but image did not bootup17:35
rlooi think %onerror doesn't work with rhel7, i've already forgotten. if node moved to active, then the %onerror bug wouldn't be the issue.17:37
mallik_TheJulia: we checked manually in the anaconda shell prompt of rhel8.5, efibootmgr was present but it was giving that error17:39
TheJuliamallik_: I'd honestly consider contacting RH support in that case, since you're far outside the driver at that point, and it sounds like an issue with anaconda itself17:40
mallik_TheJulia: I also tried with url patch id 834709, for rhel7.9 it was failing with pre-install steps. with 8.5 it crossed pre-install steps and as per anaconda logs it is waiting for some input on the screen. Input is required by ScreenData(IpmiErrorDialog,None,True) screen17:44
opendevreviewJulia Kreger proposed openstack/ironic master: Grenade: Turn up interfaces for vxlan  https://review.opendev.org/c/openstack/ironic/+/83942017:46
TheJuliarloo or any around core, I'd appreciate just merging ^17:47
TheJuliaI can't fix the grenade stuff without it, it seems. :\17:47
TheJuliaand it needs to actually merge in for the subnode to pick it up.17:47
TheJuliasince it runs off master17:47
rlooTheJulia: need 2 people to review?17:52
dtantsurI can be the 2nd17:53
rloothx dtantsur! was wondering if I missed something about only needing 1 core ;)17:53
dtantsurwell.. gate fixes are one of the cases where it's legal to merge with a single core17:54
rlooah.. see, i WAS missing something!17:54
dtantsurOR we can do it this way: we allow TheJulia to approve it herself if it passes the multinode job (it's non-voting)17:55
rlooha ha17:55
TheJuliaoh well, then... 17:55
TheJuliaI guess I'll look for something else to do17:55
TheJulia:)17:55
JayFrloo: policy was changed a few years back to "2x core votes except in cases where it's trivial or a gate fix" 17:56
JayFI don't love anything getting into the code base without 2x independent eyes, but this is the real world where there aren't enough humans working on ironic to go around ;017:56
rloothx JayF! the 'trivial' part I remember, so I'm not totally senile ;)17:57
TheJuliathe v6 job still has me stumped17:57
rloo(I was just musing about changing it to just one core...)17:57
opendevreviewDmitry Tantsur proposed openstack/ironic master: [WIP] Decouple deploy callback timeout from deploy step timeout  https://review.opendev.org/c/openstack/ironic/+/83769017:58
dtantsurthis just stops timing out running steps ^^17:59
dtantsurwe need to check that we handle an abruptly restarted agent correctly.. but this has to be done anyway17:59
dtantsurbut that's for tomorrow. o/18:02
TheJuliagoodnight18:02
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: v6/grenade multinode jobs  https://review.opendev.org/c/openstack/ironic/+/83908618:17
opendevreviewJulia Kreger proposed openstack/ironic master: CI: Turn off STP for CI jobs  https://review.opendev.org/c/openstack/ironic/+/83942518:40
TheJuliahjensas: so I don't think we can do multipath in CI... Looks like it would only work if backed by a real block device19:04
hjensasTheJulia: hm, a file mounted as loop dev does not work?19:16
TheJuliaThat *might* work19:17
TheJuliaI'm not sure it makes sense to retool CI that much though19:17
hjensasyeah, maybe not.19:20
TheJuliawow, we also can't turn off stp19:26
TheJuliain bridge mode19:26
opendevreviewHarald Jensås proposed openstack/networking-baremetal master: Register neutron common config options  https://review.opendev.org/c/openstack/networking-baremetal/+/83929819:48
hjensasTheJulia: do we have loops if we turn off stp?19:56
TheJuliahjensas: it won't let us because we're using a bridge20:00
hjensasah, right I saw something similar in Infrared downstream. They had to condition on being bridge or not.20:01
TheJuliayeah, we default to bridge upstream which makes me wonder how many CI failures we've had over the years due to it20:03
TheJuliahjensas: do you remember any issues with dhcpv6/slaac in ci, specifically with tinycore linux?20:06
hjensasTheJulia: no, it's been too long. 20:07
TheJuliahjensas: would you mind glancing at an IPA console log and seeing if anything screams out at you?20:08
hjensasTheJulia: not at all, is it on your v6/grenade patch?20:09
TheJuliahttps://e850e778a47f7adac9b8-8b0899cb6c8c0582fa25b52fb6031f3e.ssl.cf1.rackcdn.com/839086/8/check/ironic-tempest-ipxe-ipv6/e2950f8/controller/logs/ironic-bm-logs/node-1_console_2022-04-26-19%3A33%3A32_log.txt20:11
TheJuliayeah20:11
TheJuliaI'm a bit stumped... and I'm thinking maybe I should just change the type over to eliminate if it is the OS of the ramdisk20:11
TheJuliaIt feels like it is working as expected though20:11
TheJuliabut nothing is really adding up20:11
TheJuliaerr, rax20:13
TheJuliaso have to use tiny20:13
TheJuliaugh20:13
TheJuliawell20:13
* TheJulia checks to see where the grenade fix is at20:13
TheJuliaNot terribly long, so I can kick it then20:14
hjensasIt looks good, tinycore boots, and there are addresses and routes learned via RA.20:18
hjensasI ran into an issue some time ago where I had flapping routes, I believe because of multiple interfaces all learning a default over RA proto. RHBZ#2046514, I never found the time to dig into that bug properly. - Would it be worth trying to force eth1 down?20:21
TheJuliaWe only have the second interfaces because of portgroup bonding, so I think we could likely just default down the second interfaces in general20:28
opendevreviewVerification of a change to openstack/ironic master failed: Grenade: Turn up interfaces for vxlan  https://review.opendev.org/c/openstack/ironic/+/83942020:55
opendevreviewClark Boylan proposed openstack/bifrost master: [DNM] Test dhcp-all-interfaces fix  https://review.opendev.org/c/openstack/bifrost/+/83932921:47
opendevreviewVerification of a change to openstack/ironic master failed: Grenade: Turn up interfaces for vxlan  https://review.opendev.org/c/openstack/ironic/+/83942021:48
