Friday, 2024-09-27

TheJuliastevebaker[m]: ^ if you have a moment to glance at the params, cool, I think that should work, but we'll find out soon :)00:02
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: Add a 4k disk CI job  https://review.opendev.org/c/openstack/ironic/+/93065500:31
opendevreviewTakashi Kajinami proposed openstack/ironic master: Drop logic for pysnmp < 5  https://review.opendev.org/c/openstack/ironic/+/93066102:37
opendevreviewOpenStack Proposal Bot proposed openstack/ironic-inspector master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-inspector/+/93066503:13
opendevreviewOpenStack Proposal Bot proposed openstack/ironic master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic/+/93066603:28
rpittaugood morning ironic! happy friday! o/06:41
opendevreviewMerged openstack/ironic-inspector master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic-inspector/+/93066507:49
opendevreviewMerged openstack/ironic master: Imported Translations from Zanata  https://review.opendev.org/c/openstack/ironic/+/93066608:06
grami[m]Hi All any chance I can get this review checked in https://review.opendev.org/c/openstack/networking-generic-switch/+/926886?tab=comments08:44
opendevreviewGraeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config  https://review.opendev.org/c/openstack/networking-generic-switch/+/92688608:45
opendevreviewMerged openstack/ironic master: CI: Enable the ability to have test VMs with different block sizes  https://review.opendev.org/c/openstack/ironic/+/92828509:02
opendevreviewMerged openstack/ironic master: [doc] Add instructions on making big fake-BM nodes  https://review.opendev.org/c/openstack/ironic/+/92764509:02
opendevreviewMerged openstack/ironic master: doc: Promote built-in introspection from experimental  https://review.opendev.org/c/openstack/ironic/+/92993609:02
opendevreviewMerged openstack/ironic master: Firmware Update via Firmware Interface Docs  https://review.opendev.org/c/openstack/ironic/+/92696109:02
opendevreviewGraeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config  https://review.opendev.org/c/openstack/networking-generic-switch/+/92688610:07
opendevreviewDmitry Tantsur proposed openstack/ironic master: Add disable_power_off field to the node model  https://review.opendev.org/c/openstack/ironic/+/93022911:47
opendevreviewGraeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config  https://review.opendev.org/c/openstack/networking-generic-switch/+/92688612:45
opendevreviewMerged openstack/ironic master: Drop logic for pysnmp < 5  https://review.opendev.org/c/openstack/ironic/+/93066113:05
dtantsurrpittau: what's your lp id?13:25
rpittaudtantsur: it's rpittau13:25
dtantsurJayF: ~ironic-coresec updated13:26
dtantsurrpittau: also adding you to ~ironic-drivers since you're somehow not there :)13:27
rpittauack13:27
opendevreviewDmitry Tantsur proposed openstack/ironic master: Reject explicit requests to power off nodes with disable_power_off  https://review.opendev.org/c/openstack/ironic/+/93071714:44
opendevreviewDerek Higgins proposed openstack/sushy master: Fix setting "HttpBootUri" attributes  https://review.opendev.org/c/openstack/sushy/+/93073215:19
opendevreviewRiccardo Pittau proposed openstack/networking-generic-switch master: Force autospec=True in tests and fix unit tests  https://review.opendev.org/c/openstack/networking-generic-switch/+/93074516:04
rpittaubye everyone, have a great weekend! o/16:06
opendevreviewMerged openstack/sushy master: Fix setting "HttpBootUri" attributes  https://review.opendev.org/c/openstack/sushy/+/93073216:43
JayFJust generally curious, do we have any known deployments on non-linux platforms? like Solaris or BSD?16:57
JayFnot **deploying** them, but just running Ironic on them at all16:57
JayFI found out today libvirt supports freebsd's hypervisor so I started pondering if ironic had any chance at all of working on a bsd16:58
clarkbJayF: if people can run your testsuite on osx thats usually a good indicator you don't have any fundamental issues16:58
clarkb(Zuul can't and we occasionally get asked about it, but its extra effort and a small team etc)16:58
clarkbbut also freebsd can emulate linux16:59
JayFI don't have OSX to answer that question; I know they did for a while 16:59
JayFthey *do* pass on wsl, which is more commonly needed16:59
clarkbya but wsl is going to use linux syscalls not bsd syscalls16:59
JayFyeah, but it still sussed out some places16:59
JayFlike we had a check for if /dev/sda existed not mocked out16:59
JayFwhich blew up on wsl17:00
clarkbbut as mentioned in theory if you're doing the wrong syscalls you can run it under their linux translation layer17:00
JayFwell my thought for ironic would be less "does the python work" and more "do the various bits in the OS we need to provision stuff there"17:00
JayFlike the various ipxe binaries and such17:00
JayF(and I don't even want to think about someone trying to run IPA in bsd)17:00
JayFnot that it matters, it's just a thought experiment17:01
clarkbthose binaries should be able to run under linux translation too worst case17:01
clarkbbut I've never really tested it with anything complicated17:02
rbuddenQuick question, can anyone point me in the direction to increasing timeouts I’m seeing in the Ironic Nova Compute during the switch to tenant network step? I’m debugging some issues w/custom Neutron NGS code where the Cumulus switches take a bit long during the commit phase and I’m triggering upstream Gateway Timeouts.17:34
rbuddenSpecifically this is what I’m seeing:17:35
rbuddenERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 415d3c5d-85bb-42f1-beac-afdc2e9ab98e: Deploy step deploy.switch_to_tenant_network failed: Error changing node 58a4f5f9-b7f8-4e48-bd27-55fea2f92469 to tenant networks after deploy. NetworkError: Could not add public network VIF 17:35
rbudden1e7f5bc6-7f5c-462b-9cc3-cb612b79cccd to node 58a4f5f9-b7f8-4e48-bd27-55fea2f92469, possible network issue. HttpException: 504: Server Error for url: http://xx.xx.xx.xx:9696/v2.0/ports/1e7f5bc6-7f5c-462b-9cc3-cb612b79cccd, The server didn't respond in time.: 504 Gateway Time-out17:35
rbuddenI’ve tried adjusting the ‘timeout’ setting in the Ironic Nova Computes nova.conf under the [neutron] section without luck and also the vif_plugging_timeout17:36
JayFI don't think upping a timeout is going to help; I suspect it's a failure in your networking configuration17:58
JayFa 504 gateway timeout from whatever that server it's hitting is the issue17:59
JayFgiven the URL; I'd wonder if that's the right URL for neutron for your nova to be talking to (most people want https cross-service)17:59
rbuddenthe endpoint is currently not SSL (this is dev setup of Kayobe/Kolla Ansible)18:00
JayFack18:00
rbuddenthe reason I suspected a timeout is if I shorten the number of switch changes everything works fine18:00
JayFah18:00
JayFwell given it's a *504 gateway timeout* that would indicate to me you might have a proxy between with a misset timeout18:00
rbuddenAlso admittedly there’s optimization on that end, but I was hoping a short term solution to press forward was a timeout increase18:00
JayFbecause that usually means a fronting gateway is timing out the service instead of it timing itself out18:01
JayFbut it's always possible I'm wrong re: neutron HTTP error codes :)18:01
rbuddendef not ruling that out, there’s your typical Kolla HAProxy setup that runs in front of the internal VIP18:01
JayFI'd look at those logs18:01
JayFI bet it's shutting you out18:02
rbuddenI’ve been tracking logs and didn’t see any disconnects, etc. but I can look closer18:02
JayFI am not 100% sure to be clear18:02
rbuddennp18:02
JayFif you have to bump a timeout though, given that's a 50418:02
JayFthe setting you need is in neutron or a fronting proxy for it18:02
JayF*not* ironic or nova18:02
rbuddenok, interesting18:03
JayFhttps://bugs.launchpad.net/nova/+bug/1978444 looks like a cinder-flavored version of this behavior18:03
JayF(nova getting 504s from an external service)18:03
rbuddenI saw this in ironic.conf template:18:03
rbudden# Timeout for request processing when interacting with18:03
rbudden# Neutron. This value should be increased if neutron port18:03
rbudden# action timeouts are observed as neutron performs pre-commit18:04
rbudden# validation prior returning to the API client which can take18:04
JayFbut yeah, focus your troubleshooting on getting neutron timeouts updated and you can potentially change the err18:04
rbudden# longer than normal client/server interactions. (integer18:04
rbudden# value)18:04
rbudden# Note: This option can be changed without restarting.18:04
rbudden#request_timeout = 4518:04
rbudden(ah paste fail, sorry)18:04
JayFI don't *think* this would ever apply a 504 response to nova18:04
JayFbut I can see18:04
ashinclouds[m]Yup,18:04
rbuddenthe exception gets thrown via a FixedIntervalLoopingCall18:05
JayF_NEUTRON_SESSION = keystone.get_session('neutron',timeout=CONF.neutron.timeout)18:05
JayFthere's no way this would end up tossing back a 504 where it is there18:05
rbuddenok18:06
JayFyeah 100% confirmed you won't see a 504 in this case if that timeout hits18:09
JayFhttps://github.com/openstack/keystoneauth/blob/master/keystoneauth1/session.py#L112918:10
JayFthat's where that timeout ends up, going to requests18:10
JayFso I would 100% suspect that haproxy is aggressively closing things18:10
JayFbut I don't know much about the kolla-ansible architecture18:10
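The distinction JayF draws above (a client-side timeout surfaces as a local exception, while a 504 is an HTTP response sent by something in the request path) can be sketched with the standard library alone. This is a hedged illustration, not the keystoneauth or requests code path: it uses urllib in their place, and the Handler class, the /slow and /proxy-gave-up paths, and the observe and run_demo helpers are all hypothetical names invented for the demo.

```python
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Ignore any proxy environment variables so the demo always talks to localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/proxy-gave-up":
            # Stand-in for a fronting proxy whose own timer expired.
            self.send_response(504, "Gateway Time-out")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Stand-in for a slow backend that does answer, eventually.
        time.sleep(1.0)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        except OSError:
            pass  # the client already hung up

    def log_message(self, *args):
        pass  # keep the demo quiet


def observe(url, timeout):
    """Report what the caller sees: an HTTP status or a local exception."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The remote side produced a real HTTP response (e.g. a 504).
        return f"HTTP {exc.code}"
    except OSError:
        # Covers socket timeouts and URLError: nothing came back over HTTP.
        return "local exception, no HTTP status"


def run_demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    try:
        return (
            observe(base + "/slow", timeout=0.2),        # client timer fires first
            observe(base + "/proxy-gave-up", timeout=5),  # remote side answers 504
        )
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() should return a local exception for the first request and "HTTP 504" for the second, which is why a 504 in the traceback points at a server or proxy timer rather than the caller's own timeout setting.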
rbuddenok18:12
rbuddenI can recheck haproxy logs for signs of connections being dropped/terminated18:12
rbuddenthere is some similar behavior to that bug above in that the ngs switch code will often finish on the second leaf switch after the ironic deploy has already been marked as failed18:13
rbudden@JayF thanks for the info18:15
JayFyeah that is 100% behavior of mismatched timeouts18:15
JayFgenerally speaking you always want timeouts of middle-proxies to be set more generously than the proxies for the services/clients themselves18:15
JayFthat way the services/clients can gracefully handle timeout cases *except* when it's completely out to lunch18:15
rbuddenmakes sense18:16
JayFs/proxies for the services/timeouts for the services/18:16
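The layering rule above (middle proxies more generous than the services and clients behind them) can be made concrete with a small model. This is only a sketch under simplifying assumptions: a single proxy, a single client timer, and a request that needs a fixed amount of time. The function name who_gives_up_first and all numbers below are hypothetical and not taken from any OpenStack or HAProxy code.

```python
def who_gives_up_first(work_seconds, client_timeout, proxy_timeout):
    """Return what the caller observes for a request needing work_seconds.

    Simplified model: one proxy sits between the client and the backend,
    and whichever timer expires first decides the outcome.
    """
    if work_seconds <= min(client_timeout, proxy_timeout):
        return "success"
    if proxy_timeout < client_timeout:
        # The proxy cuts the request off; the client sees an HTTP 504
        # even though its own timer had not expired yet.
        return "504 from proxy"
    # The client's own timer fires, so it can handle the failure itself.
    return "client timeout"


if __name__ == "__main__":
    # Hypothetical numbers: a switch commit that needs 90 seconds.
    print(who_gives_up_first(90, client_timeout=120, proxy_timeout=60))
    print(who_gives_up_first(90, client_timeout=120, proxy_timeout=180))
    print(who_gives_up_first(150, client_timeout=120, proxy_timeout=180))
```

With the proxy as the shortest timer, the first call reports a 504 although the backend may still finish its work afterwards, which matches the mismatched-timeout behavior described above. Once the proxy is the most generous timer, the same request succeeds, and a request that really is too slow ends in the client's own timeout instead.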
rbuddenATM this is a pretty stock setup of Kayobe/Kolla Ansible. The only tweaking done so far was to Netmiko timeouts in Neutron NGS (which led me to where I am) 18:18
rbuddenI’ll do some digging in Neutron and let you know how it goes18:18
rbuddenFWIW, the code I’m working on is another driver (or two) to Neutron Networking Generic Switch to support both our Cumulus and SONIC switches and to add Trunk support to those switches in Hybrid mode with VLAN pruning.18:19
rbuddenMy hope is at some point it may be useful to others :) 18:20
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077618:30
JayFthis is part of my swing to get the snmp stuff working18:33
JayFmy hope is if we get virtualbmc in its own venv, it'll save us headaches18:33
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077618:33
rbudden@JayF Thanks for the info! I’ve tweaked the HAProxy configs and now things are working. The Kolla Ansible defaults were 1m and sadly that's barely enough time to commit changes and for Cumulus to HUP switchd18:45
JayF\o/18:45
JayFYeah, I worked a place once18:45
JayFwe ssh'd into two switches to setup a bond18:45
JayFthose network swaps could take a LONG TIME because we had to limit it to one simultaneous change18:45
JayFso if you did like, 50 builds in a row, it would do the 50 network updates serially (one connection per top-of-rack) which could be very time consuming18:46
rbuddenyeah, the codepath in NGS is to loop through the leafs and update the interfaces… that time has grown because I’ve added support to loop through the subports of a trunk if the port given is a parent trunk port18:46
rbuddenthe good news is that I can now build out the bond configurations in a Neutron Trunk and this Neutron NGS driver will use the parent port as its PVID and add the others as Bridge VIDs18:47
rbuddenI’m not sure if it’s Netmiko that seems to take a great deal of time, but it’s a bit painful. I know NGS has a way to batch requests. That’s likely the ideal solution in the end, but this gets things working and I can refactor from here.18:49
JayFit’s Netmiko that seems to take a great deal of time <-- you wouldn't be the first to have this experience18:49
JayFI've not got personal experience with it, but I've heard complaints about speed and that integration before18:50
rbuddenyeah, it certainly looks to be Netmiko compared to what I can SSH by hand18:50
rbuddenUnfortunately it seems like there aren’t a ton of supported drivers anymore18:50
rbuddenThere used to be an in-tree Cumulus driver (guessing that went away due to the Nvidia buyout of Mellanox)18:51
rbuddenRedHat at one point had an Ansible based driver18:51
rbuddenbut from what I see the main option is really Networking-Generic-Switch18:51
JayFthere's much more money in locking someone into a whole-network solution than it is to provide sensible interfaces for folks to build their own with oss :(18:52
JayFyeah, NGS is the way that the majority of folks plugin to networks with ironic18:52
rbuddenIt’s been simple enough to work with, all things considered. I need to get more familiar with the code paths in general to make this committable… but it would be nice to have Trunk support in general18:55
rbuddenBut in HPC land being able to plug VLANs to Baremetal is a necessity 18:55
rbuddenI’d love to see (or better yet, help) get this type of support in upstream18:56
JayFThe reality is that like, for servers, we have DMTF Redfish18:59
JayFyou go buy an HPE, Dell, Supermicro, etc etc -- it's almost certainly got a redfish compliant BMC18:59
JayFthere's no such ubiquitous standard with switches, so the integrations remain kinda generic so that people can setup the "last mile" for their brand of switches19:00
rbuddenOh sure, NGS gets as close to that as possible by having just a list of commands to execute for a specific port plug/unplug/etc.19:01
JayFyeah, and I've worked places at pretty massive scale with NGS19:01
rbuddenI’m more or less talking about how it currently lacks the ability to understand when a port is a parent of a Neutron Trunk and act on the subports attached to it.19:02
JayFthe batching support helped a *lot* with that19:02
JayFoh yeah, the lack of modeling for proper portgroups in neutron is a painpoint in more ways than one19:02
rbuddenI can imagine… I need to get the batching piece on my TODO19:05
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077619:42
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077619:52
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077620:34
cardoeNGS is good. But there's definitely a rub with Ironic.20:35
cardoeer neutron.20:35
cardoeOne of the things I've actually been studying the code on more.20:35
TheJuliaYou might quickly start to realize why we want to offload some of that interaction20:35
TheJulia:)20:35
TheJuliaAnyway, I need to run into town20:35
cardoeI suspect, likely incorrectly, that neutron is modeling it more around tunneling and layered networks.20:36
TheJuliayou're pretty much spot on20:36
JayFmany other openstack projects don't have to deal with the real world20:36
JayFbut we're all MTV stars here20:37
* TheJulia gets out the good camera20:37
TheJuliaGot to make the video!20:38
TheJulialol20:38
cardoeWell I'm starting my science experiment but I'm out on PTO this next week so maybe I won't forget it all.20:38
TheJuliaheh20:38
TheJuliascience++good20:38
cardoeI'm gonna try and stick one of my folks to tinker some more.20:38
TheJuliaAnyway, I need to run into town before traffic goes to hell20:38
cardoe++ I'll just ramble for the buffer.20:39
TheJulia<mr. burns>excellent</mr. burns>20:39
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077620:41
cardoeSo the science project is that I don't really care about the tunneled stuff and running more on top. So we've made a "vxlan" type replacement and making NGS use it.20:41
JayFinteresting, my change is mostly working but it looks like requirements might not be getting installed for things in the new venv22:07
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077622:25
JayF> Sep 27 22:41:37.456003 np0038641008 gunicorn[83310]: [2024-09-27 22:41:37,455] ERROR in main: libvirt driver not loaded23:00
JayFI suspect we're missing a dependency somewhere in sushy-tools23:00
JayFwhen installed in its own venv it appears to be missing deps23:00
JayFbut I may also just have screwed something up, idk23:01
JayFhmm I think the pip install part may just be broken, and the fact that runs via gunicorn is masking it23:19
opendevreviewJay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs  https://review.opendev.org/c/openstack/ironic/+/93077623:20
