TheJulia | stevebaker[m]: ^ if you have a moment to glance at the params, cool, I think that should work, but we'll find out soon :) | 00:02 |
---|---|---|
opendevreview | Julia Kreger proposed openstack/ironic master: WIP: Add a 4k disk CI job https://review.opendev.org/c/openstack/ironic/+/930655 | 00:31 |
opendevreview | Takashi Kajinami proposed openstack/ironic master: Drop logic for pysnmp < 5 https://review.opendev.org/c/openstack/ironic/+/930661 | 02:37 |
opendevreview | OpenStack Proposal Bot proposed openstack/ironic-inspector master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-inspector/+/930665 | 03:13 |
opendevreview | OpenStack Proposal Bot proposed openstack/ironic master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic/+/930666 | 03:28 |
rpittau | good morning ironic! happy friday! o/ | 06:41 |
opendevreview | Merged openstack/ironic-inspector master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic-inspector/+/930665 | 07:49 |
opendevreview | Merged openstack/ironic master: Imported Translations from Zanata https://review.opendev.org/c/openstack/ironic/+/930666 | 08:06 |
grami[m] | Hi All any chance I can get this review checked in https://review.opendev.org/c/openstack/networking-generic-switch/+/926886?tab=comments | 08:44 |
opendevreview | Graeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config https://review.opendev.org/c/openstack/networking-generic-switch/+/926886 | 08:45 |
opendevreview | Merged openstack/ironic master: CI: Enable the ability to have test VMs with different block sizes https://review.opendev.org/c/openstack/ironic/+/928285 | 09:02 |
opendevreview | Merged openstack/ironic master: [doc] Add instructions on making big fake-BM nodes https://review.opendev.org/c/openstack/ironic/+/927645 | 09:02 |
opendevreview | Merged openstack/ironic master: doc: Promote built-in introspection from experimental https://review.opendev.org/c/openstack/ironic/+/929936 | 09:02 |
opendevreview | Merged openstack/ironic master: Firmware Update via Firmware Interface Docs https://review.opendev.org/c/openstack/ironic/+/926961 | 09:02 |
opendevreview | Graeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config https://review.opendev.org/c/openstack/networking-generic-switch/+/926886 | 10:07 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Add disable_power_off field to the node model https://review.opendev.org/c/openstack/ironic/+/930229 | 11:47 |
opendevreview | Graeme Moss proposed openstack/networking-generic-switch master: Add Supermicro switches to allow for supported write config https://review.opendev.org/c/openstack/networking-generic-switch/+/926886 | 12:45 |
opendevreview | Merged openstack/ironic master: Drop logic for pysnmp < 5 https://review.opendev.org/c/openstack/ironic/+/930661 | 13:05 |
dtantsur | rpittau: what's your lp id? | 13:25 |
rpittau | dtantsur: it's rpittau | 13:25 |
dtantsur | JayF: ~ironic-coresec updated | 13:26 |
dtantsur | rpittau: also adding you to ~ironic-drivers since you're somehow not there :) | 13:27 |
rpittau | ack | 13:27 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Reject explicit requests to power off nodes with disable_power_off https://review.opendev.org/c/openstack/ironic/+/930717 | 14:44 |
opendevreview | Derek Higgins proposed openstack/sushy master: Fix setting "HttpBootUri" attributes https://review.opendev.org/c/openstack/sushy/+/930732 | 15:19 |
opendevreview | Riccardo Pittau proposed openstack/networking-generic-switch master: Force autospec=True in tests and fix unit tests https://review.opendev.org/c/openstack/networking-generic-switch/+/930745 | 16:04 |
rpittau | bye everyone, have a great weekend! o/ | 16:06 |
opendevreview | Merged openstack/sushy master: Fix setting "HttpBootUri" attributes https://review.opendev.org/c/openstack/sushy/+/930732 | 16:43 |
JayF | Just generally curious, do we have any known deployments on non-linux platforms? like Solaris or BSD? | 16:57 |
JayF | not **deploying** them, but just running Ironic on them at all | 16:57 |
JayF | I found out today libvirt supports freebsd's hypervisor so I started pondering if ironic had any chance at all of working on a bsd | 16:58 |
clarkb | JayF: if people can run your testsuite on osx that's usually a good indicator you don't have any fundamental issues | 16:58 |
clarkb | (Zuul can't and we occasionally get asked about it, but it's extra effort and a small team etc) | 16:58 |
clarkb | but also freebsd can emulate linux | 16:59 |
JayF | I don't have OSX to answer that question; I know they did for a while | 16:59 |
JayF | they *do* pass on wsl, which is more commonly needed | 16:59 |
clarkb | ya but wsl is going to use linux syscalls not bsd syscalls | 16:59 |
JayF | yeah, but it still sussed out some places | 16:59 |
JayF | like we had a check for if /dev/sda existed not mocked out | 16:59 |
JayF | which blew up on wsl | 17:00 |
clarkb | but as mentioned in theory if you're doing the wrong syscalls you can run it under their linux translation layer | 17:00 |
JayF | well my thought for ironic would be less "does the python work" and more "do the various bits in the OS we need to provision stuff there" | 17:00 |
JayF | like the various ipxe binaries and such | 17:00 |
JayF | (and I don't even want to think about someone trying to run IPA in bsd) | 17:00 |
JayF | not that it matters, it's just a thought experiment | 17:01 |
clarkb | those binaries should be able to run under linux translation too worst case | 17:01 |
clarkb | but I've never really tested it with anything complicated | 17:02 |
rbudden | Quick question, can anyone point me in the direction to increasing timeouts I’m seeing in the Ironic Nova Compute during the switch to tenant network step? I’m debugging some issues w/custom Neutron NGS code where the Cumulus switches take a bit long during the commit phase and I’m triggering upstream Gateway Timeouts. | 17:34 |
rbudden | Specifically this is what I’m seeing: | 17:35 |
rbudden | ERROR oslo.service.loopingcall [-] Fixed interval looping call 'nova.virt.ironic.driver.IronicDriver._wait_for_active' failed: nova.exception.InstanceDeployFailure: Failed to provision instance 415d3c5d-85bb-42f1-beac-afdc2e9ab98e: Deploy step deploy.switch_to_tenant_network failed: Error changing node 58a4f5f9-b7f8-4e48-bd27-55fea2f92469 to tenant networks after deploy. NetworkError: Could not add public network VIF | 17:35 |
rbudden | 1e7f5bc6-7f5c-462b-9cc3-cb612b79cccd to node 58a4f5f9-b7f8-4e48-bd27-55fea2f92469, possible network issue. HttpException: 504: Server Error for url: http://xx.xx.xx.xx:9696/v2.0/ports/1e7f5bc6-7f5c-462b-9cc3-cb612b79cccd, The server didn't respond in time.: 504 Gateway Time-out | 17:35 |
rbudden | I’ve tried adjusting the ‘timeout’ setting in the Ironic Nova Computes nova.conf under the [neutron] section without luck and also the vif_plugging_timeout | 17:36 |
JayF | I don't think upping a timeout is going to help; I suspect it's a failure in your networking configuration | 17:58 |
JayF | a 504 gateway timeout from whatever that server it's hitting is the issue | 17:59 |
JayF | given the URL; I'd wonder if that's the right URL for neutron for your nova to be talking to (most people want https cross-service) | 17:59 |
rbudden | the endpoint is currently not SSL (this is dev setup of Kayobe/Kolla Ansible) | 18:00 |
JayF | ack | 18:00 |
rbudden | the reason I suspected a timeout is if I shorten the number of switch changes everything works fine | 18:00 |
JayF | ah | 18:00 |
JayF | well given it's a *504 gateway timeout* that would indicate to me you might have a proxy between with a misset timeout | 18:00 |
rbudden | Also admittedly there’s optimization on that end, but I was hoping a short term solution to press forward was a timeout increase | 18:00 |
JayF | because that usually means a fronting gateway is timing out the service instead of it timing itself out | 18:01 |
JayF | but it's always possible I'm wrong re: neutron HTTP error codes :) | 18:01 |
rbudden | def not ruling that out, there’s your typical Kolla HAProxy setup that runs in front of the internal VIP | 18:01 |
JayF | I'd look at those logs | 18:01 |
JayF | I bet it's shutting you out | 18:02 |
rbudden | I’ve been tracking logs and didn’t see any disconnects, etc. but I can look closer | 18:02 |
JayF | I am not 100% sure to be clear | 18:02 |
rbudden | np | 18:02 |
JayF | if you have to bump a timeout though, given that's a 504 | 18:02 |
JayF | the setting you need is in neutron or a fronting proxy for it | 18:02 |
JayF | *not* ironic or nova | 18:02 |
rbudden | ok, interesting | 18:03 |
JayF | https://bugs.launchpad.net/nova/+bug/1978444 looks like a cinder-flavored version of this behavior | 18:03 |
JayF | (nova getting 504s from an external service) | 18:03 |
rbudden | I saw this in ironic.conf template: | 18:03 |
rbudden | # Timeout for request processing when interacting with | 18:03 |
rbudden | # Neutron. This value should be increased if neutron port | 18:03 |
rbudden | # action timeouts are observed as neutron performs pre-commit | 18:04 |
rbudden | # validation prior returning to the API client which can take | 18:04 |
JayF | but yeah, focus your troubleshooting on getting neutron timeouts updated and you can potentially change the err | 18:04 |
rbudden | # longer than normal client/server interactions. (integer | 18:04 |
rbudden | # value) | 18:04 |
rbudden | # Note: This option can be changed without restarting. | 18:04 |
rbudden | #request_timeout = 45 | 18:04 |
rbudden | (ah paste fail, sorry) | 18:04 |
JayF | I don't *think* this would ever apply a 504 response to nova | 18:04 |
JayF | but I can see | 18:04 |
ashinclouds[m] | Yup, | 18:04 |
rbudden | the exception gets thrown via a FixedIntervalLoopingCall | 18:05 |
JayF | _NEUTRON_SESSION = keystone.get_session('neutron',timeout=CONF.neutron.timeout) | 18:05 |
JayF | there's no way this would end up tossing back a 504 where it is there | 18:05 |
rbudden | ok | 18:06 |
JayF | yeah 100% confirmed you won't see a 504 in this case if that timeout hits | 18:09 |
JayF | https://github.com/openstack/keystoneauth/blob/master/keystoneauth1/session.py#L1129 | 18:10 |
JayF | that's where that timeout ends up, going to requests | 18:10 |
JayF | so I would 100% suspect that haproxy is aggressively closing things | 18:10 |
JayF | but I don't know much about the kolla-ansible architecture | 18:10 |
rbudden | ok | 18:12 |
rbudden | I can recheck haproxy logs for signs of connections being dropped/terminated | 18:12 |
rbudden | there is some similar behavior to that bug above in that the ngs switch code will often finish on the second leaf switch after the ironic deploy has already been marked as failed | 18:13 |
rbudden | @JayF thanks for the info | 18:15 |
JayF | yeah that is 100% behavior of mismatched timeouts | 18:15 |
JayF | generally speaking you always want timeouts of middle-proxies to be set more generously than the proxies for the services/clients themselves | 18:15 |
JayF | that way the services/clients can gracefully handle timeout cases *except* when it's completely out to lunch | 18:15 |
rbudden | makes sense | 18:16 |
JayF | s/proxies for the services/timeouts for the services/ | 18:16 |
rbudden | ATM this is a pretty stock setup of Kayobe/Kolla Ansible. The only tweaking done so far was to Netmiko timeouts in Neutron NGS (which led me to where I am) | 18:18 |
rbudden | I’ll do some digging in Neutron and let you know how it goes | 18:18 |
rbudden | FWIW, the code I’m working on is another driver (or two) to Neutron Networking Generic Switch to support both our Cumulus and SONIC switches and to add Trunk support to those switches in Hybrid mode with VLAN pruning. | 18:19 |
rbudden | My hope is at some point it may be useful to others :) | 18:20 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 18:30 |
JayF | this is part of my swing to get the snmp stuff working | 18:33 |
JayF | my hope is if we get virtualbmc in its own venv, it'll save us headaches | 18:33 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 18:33 |
rbudden | @JayF Thanks for the info! I’ve tweaked the HAProxy configs and now things are working. The Kolla Ansible defaults were 1m and sadly that’s barely enough time to commit changes and for Cumulus to HUP switchd | 18:45 |
JayF | \o/ | 18:45 |
JayF | Yeah, I worked a place once | 18:45 |
JayF | we ssh'd into two switches to setup a bond | 18:45 |
JayF | those network swaps could take a LONG TIME because we had to limit it to one simultaneous change | 18:45 |
JayF | so if you did like, 50 builds in a row, it would do the 50 network updates serially (one connection per top-of-rack) which could be very time consuming | 18:46 |
rbudden | yeah, the codepath in NGS is to loop through the leafs and update the interfaces… that time has grown because I’ve added support to loop through the subports of a trunk if the port given is a parent trunk port | 18:46 |
rbudden | the good news is that I can now build out the bond configurations in a Neutron Trunk and this Neutron NGS driver will use the parent port as its PVID and add the others as Bridge VIDs | 18:47 |
rbudden | I’m not sure if it’s Netmiko that seems to take a great deal of time, but it’s a bit painful. I know NGS has a way to batch requests. That’s likely the ideal solution in the end, but this gets things working and I can refactor from here. | 18:49 |
JayF | it’s Netmiko that seems to take a great deal of time <-- you wouldn't be the first to have this experience | 18:49 |
JayF | I've not got personal experience with it, but I've heard complaints about speed and that integration before | 18:50 |
rbudden | yeah, it certainly looks to be Netmiko compared to what I can SSH by hand | 18:50 |
rbudden | Unfortunately it seems like there aren’t a ton of supported drivers anymore | 18:50 |
rbudden | There used to be an in-tree Cumulus driver (guessing that went away due to the Nvidia buyout of Mellanox) | 18:51 |
rbudden | RedHat at one point had an Ansible based driver | 18:51 |
rbudden | but from what I see the main option is really Networking-Generic-Switch | 18:51 |
JayF | there's much more money in locking someone into a whole-network solution than it is to provide sensible interfaces for folks to build their own with oss :( | 18:52 |
JayF | yeah, NGS is the way that the majority of folks plugin to networks with ironic | 18:52 |
rbudden | It’s been simple enough to work with all things considering. I need to get more familiar with the code paths in general to make this commitable… but it would be nice to have Trunk support in general | 18:55 |
rbudden | But in HPC land being able to plug VLANs to Baremetal is a necessity | 18:55 |
rbudden | I’d love to see (or better yet, help) get this type of support in upstream | 18:56 |
JayF | The reality is that like, for servers, we have DMTF Redfish | 18:59 |
JayF | you go buy an HPE, Dell, Supermicro, etc etc -- it's almost certainly got a redfish compliant BMC | 18:59 |
JayF | there's no such ubiquitous standard with switches, so the integrations remain kinda generic so that people can setup the "last mile" for their brand of switches | 19:00 |
rbudden | Oh sure, NGS gets as close to that as possible by having just a list of commands to execute for a specific port plug/unplug/etc. | 19:01 |
JayF | yeah, and I've worked places at pretty massive scale with NGS | 19:01 |
rbudden | I’m more or less talking about how it currently lacks the ability to understand when a port is a parent of a Neutron Trunk and act on the subports attached to it. | 19:02 |
JayF | the batching support helped a *lot* with that | 19:02 |
JayF | oh yeah, the lack of modeling for proper portgroups in neutron is a painpoint in more ways than one | 19:02 |
rbudden | I can imagine… I need to get the batching piece on my TODO | 19:05 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 19:42 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 19:52 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 20:34 |
cardoe | NGS is good. But there's definitely a rub with Ironic. | 20:35 |
cardoe | er neutron. | 20:35 |
cardoe | One of the things I've actually been studying the code on more. | 20:35 |
TheJulia | You might quickly start to realize why we want to offload some of that interaction | 20:35 |
TheJulia | :) | 20:35 |
TheJulia | Anyway, I need to run into town | 20:35 |
cardoe | I suspect, likely incorrectly, that neutron is modeling it more around tunneling and layered networks. | 20:36 |
TheJulia | you're pretty much spot on | 20:36 |
JayF | many other openstack projects don't have to deal with the real world | 20:36 |
JayF | but we're all MTV stars here | 20:37 |
* TheJulia | gets out the good camera | 20:37 |
TheJulia | Got to make the video! | 20:38 |
TheJulia | lol | 20:38 |
cardoe | Well I'm starting my science experiment but I'm out on PTO this next week so maybe I won't forget it all. | 20:38 |
TheJulia | heh | 20:38 |
TheJulia | science++good | 20:38 |
cardoe | I'm gonna try and stick one of my folks to tinker some more. | 20:38 |
TheJulia | Anyway, I need to run into town before traffic goes to hell | 20:38 |
cardoe | ++ I'll just ramble for the buffer. | 20:39 |
TheJulia | <mr. burns>excellent</mr. burns> | 20:39 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 20:41 |
cardoe | So the science project is that I don't really care about the tunneled stuff and running more on top. So we've made a "vxlan" type replacement and making NGS use it. | 20:41 |
JayF | interesting, my change is mostly working but it looks like requirements might not be getting installed for things in the new venv | 22:07 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 22:25 |
JayF | > Sep 27 22:41:37.456003 np0038641008 gunicorn[83310]: [2024-09-27 22:41:37,455] ERROR in main: libvirt driver not loaded | 23:00 |
JayF | I suspect we're missing a dependency somewhere in sushy-tools | 23:00 |
JayF | when installed in its own venv it appears to be missing deps | 23:00 |
JayF | but I may also just have screwed something up, idk | 23:01 |
JayF | hmm I think the pip install part may just be broken, and the fact that runs via gunicorn is masking it | 23:19 |
opendevreview | Jay Faulkner proposed openstack/ironic master: [WIP] devstack: split into VENVs https://review.opendev.org/c/openstack/ironic/+/930776 | 23:20 |
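
The timeout advice from the 18:15 exchange (middle proxies should be set more generously than the services/clients behind and in front of them) can be sketched roughly as below. This is only an illustration: the `Hop` class, both helper functions, and all the numbers are hypothetical and are not taken from Ironic, Neutron, HAProxy, or Kolla Ansible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One party in the request path (hypothetical model, not a real API)."""
    name: str
    timeout_s: float


def first_to_give_up(hops, operation_s):
    """Return the hop whose timeout fires first, or None if the
    operation finishes before any timeout expires."""
    expired = [h for h in hops if h.timeout_s < operation_s]
    if not expired:
        return None
    return min(expired, key=lambda h: h.timeout_s)


def misordered_proxies(client, proxies):
    """Return proxies whose timeout is not more generous than the client's."""
    return [p for p in proxies if p.timeout_s <= client.timeout_s]


# Illustrative values only: a client willing to wait 120s behind a 60s proxy.
client = Hop("client", 120)
tight_proxy = Hop("proxy", 60)
roomy_proxy = Hop("proxy", 180)

# A 90s switch commit: the proxy cuts the request off (the 504 case).
print(first_to_give_up([client, tight_proxy], 90))
# With a more generous proxy the same commit completes.
print(first_to_give_up([client, roomy_proxy], 90))
# If the backend is truly stuck, the client times out itself and can handle it.
print(first_to_give_up([client, roomy_proxy], 150))
```

With the proxy set above the client, the only timeout that can fire is the client's own, which is the case it knows how to handle gracefully.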