opendevreview | Takashi Kajinami proposed openstack/ironic-inspector master: Suppress logs from stevedore https://review.opendev.org/c/openstack/ironic-inspector/+/903853 | 03:28 |
adam-metal3 | Hey, since last Monday (I have been told) we are seeing this https://paste.openstack.org/raw/bxMHCSI7kO3fgJh9etbI/ error in the Metal3 CI with the Ubuntu tests; Ironic and sushy-tools are running as containers on the host (not in a K8s cluster) | 07:21 |
adam-metal3 | could you give me some pointers please on what could cause this? | 07:23 |
rpittau | good morning ironic! o/ | 07:50 |
dtantsur | adam-metal3: I wonder how we can enable this "auto selection disabled" | 07:58 |
dtantsur | adam-metal3: I assume it's a regression in https://opendev.org/openstack/sushy-tools/commit/361e0eef99671cff2c5273649a10ce4367fa7610 | 08:04 |
dtantsur | I had concerns about it, but was assured it's fine... | 08:04 |
dtantsur | I'm talking with the author of the patch | 08:13 |
dtantsur | jm1[m]: this is what I mentioned on slack ^^^ | 08:13 |
jm1[m] | adam-metal3: Hi, sorry for the mess! Which Ubuntu version are you testing with? | 08:21 |
Nisha_Agarwal | Hello Ironic!!! | 08:31 |
Nisha_Agarwal | Need a quick help.....Is there a way we can configure session auth token expiry? | 08:32 |
Nisha_Agarwal | in ironic | 08:32 |
Nisha_Agarwal | for redfish driver | 08:33 |
adam-metal3 | dtantsur: thanks for the info | 08:43 |
adam-metal3 | jm1[m]: we are using 22.04 | 08:43 |
adam-metal3 | but Ironic runs in a container for us | 08:44 |
adam-metal3 | so in our case Ubuntu should not matter much, containerized Ironic talking to libvirt vms via sushy-tools | 08:45 |
adam-metal3 | via* | 08:45 |
jm1[m] | adam-metal3: but libvirtd is running on ubuntu 22.04, right? i am trying to reproduce it | 08:49 |
adam-metal3 | jm1[m]: yes you are right that runs on ubuntu 22.04 | 08:49 |
rpittau | dtantsur: for the dhcp issue in bifrost, I collected the dnsmasq config here https://b0f9ae0491e974b2315d-45537cc5c7120f43f6e626c6c78dc0c0.ssl.cf5.rackcdn.com/903755/2/check/bifrost-integration-dibipa-debian-centos-9/3bc1373/logs/dnsmasq_config/index.html | 09:03 |
rpittau | I don't see anything wrong but we can compare it with a working one | 09:03 |
opendevreview | Riccardo Pittau proposed openstack/ironic-inspector master: [WIP] Handle LLDP parse Unicode error https://review.opendev.org/c/openstack/ironic-inspector/+/903760 | 09:13 |
jm1[m] | adam-metal3: could you please point me to a log output? something is odd, e.g. the snippet you posted above says "Setting boot mode to bios failed for". but in bios mode, the nvram xml tag should not be set at all | 09:15 |
adam-metal3 | jm1[m]: I have asked for a link to the specific build, that will have a log tar file | 09:17 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: Handle LLDP parse Unicode error https://review.opendev.org/c/openstack/ironic/+/903861 | 09:19 |
opendevreview | Riccardo Pittau proposed openstack/ironic-inspector master: Handle LLDP parse Unicode error https://review.opendev.org/c/openstack/ironic-inspector/+/903760 | 09:19 |
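(For readers following along: the patches above deal with LLDP TLV values that are not valid UTF-8. A minimal sketch of the kind of defensive decoding involved, assuming the raw TLV arrives as bytes; this is illustrative, not the actual inspector code.)

```python
def decode_lldp_value(raw: bytes):
    """Decode a raw LLDP TLV value without letting bad bytes break inspection."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Keep going with a lossy decode (or return None to drop the TLV);
        # the real patches choose the policy per field.
        return raw.decode('utf-8', errors='replace')
```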
adam-metal3 | jm1[m]: https://jenkins.nordix.org/view/Metal3/job/metal3_capm3_main_integration_test_ubuntu/879/ | 09:26 |
adam-metal3 | there is the archive https://jenkins.nordix.org/view/Metal3/job/metal3_capm3_main_integration_test_ubuntu/879/artifact/logs-jenkins-metal3_capm3_main_integration_test_ubuntu-879.tgz | 09:26 |
adam-metal3 | and in the archive there is a "docker" directory and that will have the ironic logs | 09:27 |
jm1[m] | adam-metal3: thank you! will have a look | 09:27 |
adam-metal3 | jm1[m]: Thank you ! | 09:34 |
jm1[m] | adam-metal3: where can i find the sushy-tools version you are using? | 09:40 |
jm1[m] | adam-metal3: or rather the code responsible for pulling in sushy-tools | 09:40 |
adam-metal3 | https://github.com/metal3-io/ironic-image/blob/cf3c71cd0f0e1bd5af710f5f6af45036966641d9/resources/sushy-tools/Dockerfile#L3 | 09:40 |
jm1[m] | dtantsur: we have not merged 1.1.0 yet, so i am wondering how my code could be responsible?!? | 09:43 |
jm1[m] | dtantsur adam-metal3 maybe we have to look somewhere else. ironic wants to boot in bios mode. my patch removes/changes nvram_path but it does not mess with loader_path | 09:46 |
jm1[m] | the error log complains about loader_path though | 09:46 |
adam-metal3 | jm1[m]: what is the use case of "loader_path"? I am not familiar with this variable | 09:47 |
dtantsur | jm1[m]: oh, the image does not yet use 1.1.0? that's interesting, I thought we did that already | 09:49 |
jm1[m] | adam-metal3: a verbose explanation 😅 https://libvirt.org/formatdomain.html#bios-bootloader | 09:49 |
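(For context, loader and nvram live in the <os> block of the libvirt domain XML. A tiny sketch built with ElementTree, assuming typical OVMF paths — the actual paths used by metal3-dev-env may differ:)

```python
import xml.etree.ElementTree as ET

# UEFI guests get a <loader> (firmware image) and usually an <nvram> element;
# in BIOS ("legacy") mode neither should normally be present. Paths are illustrative.
os_el = ET.Element('os')
ET.SubElement(os_el, 'type', arch='x86_64', machine='q35').text = 'hvm'
loader = ET.SubElement(os_el, 'loader', readonly='yes', type='pflash')
loader.text = '/usr/share/OVMF/OVMF_CODE.fd'
ET.SubElement(os_el, 'nvram').text = '/var/lib/libvirt/qemu/nvram/node_0_VARS.fd'

print(ET.tostring(os_el, encoding='unicode'))
```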
* dtantsur hopes it's not a regression in ubuntu | 09:49 | |
TheJulia | o/ morning folks | 09:50 |
* TheJulia is just in a hotel, very bored and trying to avoid sending records in for a car accident on Saturday | 09:51 | |
jm1[m] | adam-metal3: is this the code which creates the domain xml? https://github.com/metal3-io/metal3-dev-env/blob/main/vm-setup/roles/libvirt/templates/baremetalvm.xml.j2 | 09:51 |
* TheJulia has a load shedding idea that needs to get put into BZ | 09:52 | |
TheJulia | Err LP | 09:52 |
zigo | Hi there! Can someone take over this ? https://review.opendev.org/c/openstack/ironic-lib/+/903815 | 09:55 |
zigo | The issue is with zeroconf 0.129, probably this needs a global-requirements fix first ... | 09:55 |
zigo | FYI, I'm super busy with Python 3.12 compat, so I can't really take care of that one... | 09:55 |
adam-metal3 | jm1[m] yes | 09:57 |
adam-metal3 | jm1[m] okay, so it is about the boot EFI/BIOS firmware, got it. I just didn't catch at first what firmware it was referring to, but it is clear now, thanks | 10:02 |
dtantsur | adam-metal3: shooting in the dark really, but maybe https://github.com/metal3-io/metal3-dev-env/pull/1325 will help | 10:06 |
dtantsur | if it's not a sushy-tools regression, I'm completely puzzled what caused it | 10:06 |
dtantsur | adam-metal3: is there any real reason why we configure testing nodes in UEFI but try to use BIOS afterwards? | 10:08 |
dtantsur | I'd expect most people to use UEFI nowadays unless they have some very specific reason not to (like a broken firmware) | 10:08 |
adam-metal3 | dtantsur: I don't think mixing these 2 is intentional, for a long time Metal3 CI was only testing with BIOS | 10:10 |
adam-metal3 | but dev-env had the ability to provide a choice for the user | 10:11 |
adam-metal3 | AFAIK dev-env is defaulting to bios | 10:14 |
dtantsur | adam-metal3: the easiest way to "fix" the CI is to switch to UEFI IMO | 10:14 |
adam-metal3 | dtantsur: sure, I can do that, but I have to check something first; I don't remember if I have pushed the fix for the UEFI firmware path, because for some time in the case of UEFI it was loading the secure UEFI firmware | 10:16 |
adam-metal3 | yeah okay I have merged the UEFI firmware path fix so we can switch no problem | 10:17 |
adam-metal3 | dtantsur: I have made this https://github.com/metal3-io/metal3-dev-env/pull/1326 I think UEFI default for dev-env is reasonable in general anyways | 10:25 |
dtantsur | adam-metal3: quick grep also shows export LIBVIRT_FIRMWARE="bios" | 10:25 |
adam-metal3 | in the network config | 10:26 |
dtantsur | ah, it's in a condition. okay. | 10:26 |
dtantsur | adam-metal3: then line 49 export BOOT_MODE="${BOOT_MODE:-legacy}" | 10:26 |
dtantsur | and update config_example.sh | 10:27 |
adam-metal3 | okay | 10:27 |
adam-metal3 | this is so badly organized, boot mode defaults in the network config, what the hell.... | 10:28 |
adam-metal3 | but also in ansible | 10:28 |
dtantsur | adam-metal3: it's because you cannot boot over IPv6 network in legacy mode | 10:29 |
dtantsur | if we default to UEFI, this logic can be simplified | 10:29 |
adam-metal3 | yes | 10:29 |
adam-metal3 | dtantsur: I think the PR now has what is minimally needed to hopefully unclog the CI, but I think I will move the boot mode stuff somewhere else; it is very weird in the network config. Of course I get the IPv6 part, but the general selection logic is here also | 10:32 |
adam-metal3 | but I don't want to spam Ironic irc with metal3 madness | 10:33 |
dtantsur | :) | 10:33 |
* TheJulia tries to wake up with the worst hotel room coffee ever | 10:40 | |
TheJulia | zigo: maybe after the holidays I can, I'm sort of occupied this week unfortunately. | 10:41 |
opendevreview | Merged openstack/bifrost stable/2023.2: ironic: Perform online data migrations with localhost DB https://review.opendev.org/c/openstack/bifrost/+/901296 | 10:43 |
* dtantsur is wondering if the mdns idea was good in the end... | 10:44 | |
TheJulia | I still think it was | 10:44 |
TheJulia | but a valid question is "is anyone *really* using it". The conundrum, though, is they might not know at this point | 10:45 |
TheJulia | or easily know until it is gone | 10:45 |
TheJulia | maybe s/easily/painfully/ | 10:45 |
TheJulia | it does make some things like manual introspection data updates super easy for folks like arne's group | 10:52 |
TheJulia | In theory, of course | 10:52 |
iurygregory | good morning Ironic | 11:21 |
TheJulia | Julia's crazy idea from over the weekend: https://bugs.launchpad.net/ironic/+bug/2046803 | 11:30 |
TheJulia | and it is kind of multiple ideas | 11:30 |
dtantsur | I'll bookmark it until I have some time for a long read :) | 11:31 |
TheJulia | it is definitely a high level idea | 11:31 |
TheJulia | that can kind of spawn, but sort of seems like "operationally reasonable" | 11:31 |
TheJulia | dunno, it is out there | 11:31 |
* dtantsur keeps producing RFEs for minor improvements: https://bugs.launchpad.net/ironic/+bug/2046428 | 11:37 | |
TheJulia | not sure it's worth microversioning given the redaction; then again it can be turned off, but I'm still not sure we should support "you turned off redaction!" | 11:39 |
* TheJulia wonders if harald is rebooting, or if his irc connection just dislikes the world today | 11:42 | |
dtantsur | heh | 11:42 |
iurygregory | connection issues probably =D | 11:42 |
dtantsur | My IRC bouncer is on the Synology NAS here, so it's mostly unaffected by laptop reboots | 11:42 |
hjensas | rebooting :) | 11:56 |
Nisha_Agarwal | dtantsur, hi | 11:58 |
Nisha_Agarwal | TheJulia, hi | 11:58 |
Nisha_Agarwal | One quick question on session timeout | 11:59 |
Nisha_Agarwal | Is there a way we can configure the session timeout for redfish calls to the baremetal? | 11:59 |
Nisha_Agarwal | in sushy | 11:59 |
dtantsur | session timeout? | 12:01 |
Nisha_Agarwal | Yes auth token timeout | 12:01 |
Nisha_Agarwal | We were trying to certify one of the HPE servers on RHOSP 17.1 | 12:02 |
Nisha_Agarwal | and since it's kolla-based, we see the session auth token expires very soon, leading to a "missing attribute" error | 12:02 |
dtantsur | Nisha_Agarwal: it's something the server controls though? | 12:02 |
Nisha_Agarwal | for resolving this we have changed the ironic.conf to use auth_type as basic | 12:03 |
Nisha_Agarwal | dtantsur, nope | 12:03 |
dtantsur | I guess the problem is not in the expiration itself, but rather in the wrong error that sushy does not retry? | 12:03 |
Nisha_Agarwal | Yes | 12:03 |
Nisha_Agarwal | So actual flow is that sushy gets session response as 401 | 12:03 |
Nisha_Agarwal | and instead of retrying here it sends that session error response to the called | 12:04 |
Nisha_Agarwal | caller* | 12:04 |
Nisha_Agarwal | then caller tries to parse the attributes | 12:04 |
Nisha_Agarwal | and fails with missing attribute error | 12:04 |
Nisha_Agarwal | The issue is very prominent when sushy is used from inside the container | 12:05 |
dtantsur | I'm not sure I understand how a container can be related.. | 12:05 |
Nisha_Agarwal | if you try the same thing outside the container, the issue is seen far less... you can still hit the issue, but only when you add pdb | 12:05 |
Nisha_Agarwal | may be some timing issue | 12:06 |
Nisha_Agarwal | that's my observation | 12:06 |
dtantsur | different versions? | 12:06 |
Nisha_Agarwal | we tried stable wallaby (3.7.6) inside and outside the container, and latest sushy 4.7.0 outside the container | 12:06 |
Nisha_Agarwal | and could hit the issue when using pdb outside the container | 12:07 |
Nisha_Agarwal | inside the container the issue is seen very frequently without pdb | 12:07 |
Nisha_Agarwal | and to resolve this i could only change the authentication mechanism to "basic" | 12:08 |
Nisha_Agarwal | I was thinking that if we could increase (configure) the session expiry and then make it work, that would probably have been better | 12:08 |
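(For reference, the "basic" workaround above corresponds to which auth helper sushy is given; a minimal standalone sketch with placeholder credentials and BMC URL — treat the class names as an assumption if your sushy version differs:)

```python
import sushy
from sushy import auth as sushy_auth

# "auto" in ironic roughly maps to session auth with a basic-auth fallback,
# while the workaround pins every request to plain basic auth.
session_or_basic = sushy_auth.SessionOrBasicAuth(username='admin', password='secret')
basic_only = sushy_auth.BasicAuth(username='admin', password='secret')

conn = sushy.Sushy('https://bmc.example.com/redfish/v1',
                   auth=basic_only, verify=False)
system = conn.get_system('/redfish/v1/Systems/Partition0')
print(system.identity)
```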
Nisha_Agarwal | dtantsur, https://paste.openstack.org/show/bKBx58BeU1dYEyXPnJdJ/ | 12:11 |
Nisha_Agarwal | check the logs here | 12:11 |
TheJulia | Nisha_Agarwal: please open a bugzilla and add all the details possible, I’m basically out until the new year. | 12:11 |
Nisha_Agarwal | TheJulia, Yes my colleague is already posting all this in the query to Red Hat | 12:12 |
TheJulia | Bz, or a support query. | 12:12 |
TheJulia | Err, not support request. | 12:12 |
Nisha_Agarwal | So we are now basically just questioning if we could certify the server with "basic" as auth_type? | 12:12 |
Nisha_Agarwal | or it has to be "auto" | 12:13 |
TheJulia | It is far from ideal, and truthfully it could only be done as a workaround really. | 12:13 |
Nisha_Agarwal | yes so the issue is seen in latest sushy as well if we just add pdb | 12:13 |
TheJulia | Can you certify with a workaround is. Or a question I can answer | 12:14 |
Nisha_Agarwal | Yes, that's the query we had... | 12:14 |
Nisha_Agarwal | Anyway all the queries are being added in the support request | 12:14 |
TheJulia | Not a question I mean. I’m on my phone right now. Sorry for typos. | 12:14 |
Nisha_Agarwal | :) | 12:14 |
Nisha_Agarwal | np | 12:15 |
TheJulia | Support cannot answer that question, either. | 12:15 |
Nisha_Agarwal | hmm then? | 12:15 |
Nisha_Agarwal | because the fix has to be done in master | 12:15 |
Nisha_Agarwal | and then backported | 12:15 |
Nisha_Agarwal | to wallaby in RHOSP | 12:15 |
TheJulia | It comes down to the certification requirements. Basically has to work without workarounds as I understand it. | 12:15 |
Nisha_Agarwal | hmmm | 12:15 |
TheJulia | Because the certification is determined through automated tooling. | 12:16 |
Nisha_Agarwal | yes i understand... | 12:16 |
Nisha_Agarwal | we couldn't find a way to configure this parameter when doing the undercloud installation | 12:16 |
TheJulia | Yeah, it is not possible AFAIK. | 12:17 |
Nisha_Agarwal | hmmm | 12:17 |
TheJulia | Ideally, session auth should also be used, so we will need to take a look once the root cause is fully understood on master branch. | 12:17 |
Nisha_Agarwal | it does session auth, but only once | 12:20 |
Nisha_Agarwal | after that, when it gets the session auth and it hits the GET call on "redfish/v1/Systems/Partition0", the session has already expired | 12:20 |
Nisha_Agarwal | so actually there it gets the "invalid session error" with status 401 | 12:21 |
Nisha_Agarwal | and here, instead of retrying with a new session token, it just passes the response to the caller, i.e. get_system() of sushy | 12:21 |
Nisha_Agarwal | and that fails with missing attribute error | 12:22 |
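(The behaviour being asked for, roughly: refresh the Redfish session on a 401 and retry once before parsing. A hypothetical helper, not sushy's actual connector code; `refresh_session` is an assumed callback that installs a fresh X-Auth-Token on the session:)

```python
import requests

def get_with_session_refresh(session: requests.Session, refresh_session, url: str):
    """Retry a Redfish GET once after re-authenticating on a 401."""
    resp = session.get(url)
    if resp.status_code == 401:
        refresh_session()  # assumed to obtain a new session token for `session`
        resp = session.get(url)
    resp.raise_for_status()
    return resp.json()
```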
TheJulia | Please file a BZ if you can, I can look after the first of the year. If you develop a patch in the mean time, please add me to it. | 12:25 |
TheJulia | It seems rather odd the session is invalidated so quickly as well. Anyhow, I need to get going and shouldn't be working when I'm off. | 12:27 |
Nisha_Agarwal | :) | 12:28 |
Nisha_Agarwal | it is happening more frequently when running inside the container... when we do it normally, we don't hit this issue unless we add a pdb | 12:29 |
TheJulia | a container shouldn't impact its behavior at all, but a very detailed BZ will help us tremendously, along with information about what hardware and firmware this has been reproduced against | 12:33 |
jm1[m] | adam-metal3: dtantsur i was reading up on your discussion. did you find the root cause for this loader path issue? | 13:35 |
adam-metal3 | jm1[m]: I think the workaround I have made is more of an "ignore it" type of solution; we don't have a hard requirement for testing with legacy BIOS, so I will change the CI and dev-env to use UEFI, and I will go now and check the logs to see whether that has helped or not | 13:39 |
jm1[m] | adam-metal3: ack. it must be some kind of edge case. if sushy-tools encounters that the loader xml tag has been defined but no text (loader_path) has been set, then it would log a warning. but it does not, hence it is not sushy-tools that removes that loader_path. | 14:32 |
jm1[m] | something sets the loader xml tag but without the path. libvirtd is unhappy about that | 14:33 |
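(A throwaway check along the lines of that reasoning, assuming the domain XML is available as a string; the element path follows the libvirt schema linked earlier:)

```python
import xml.etree.ElementTree as ET

def loader_path_missing(domain_xml: str) -> bool:
    """Return True if <os><loader> is present but carries no firmware path."""
    loader = ET.fromstring(domain_xml).find('./os/loader')
    return loader is not None and not (loader.text or '').strip()
```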
jm1[m] | anyway, glad you found a workaround. then maybe my patch for sushy-tools 1.1.0 in ironic-image can finally be merged. and THEN we might see some fallout from my nvram patch ;) | 14:35 |
adam-metal3 | jm1[m]: I am still testing , who knows I might haunt you with this issue in the future also :D | 14:38 |
adam-metal3 | but thanks for all the help so far | 14:39 |
jm1[m] | adam-metal3: np :D | 14:45 |
JayF | I might be a couple minutes late in starting the meeting today. Anyone with rights can feel free to start it on time, or I should have it started within 5 minutes after the hour. | 14:58 |
JayF | #startmeeting ironic | 15:01 |
opendevmeet | Meeting started Mon Dec 18 15:01:28 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'ironic' | 15:01 |
JayF | #topic Announcements/Reminders | 15:01 |
JayF | #info Standing reminder to review patches tagged ironic-week-prio and to hashtag your patches; https://tinyurl.com/ironic-weekly-prio-dash | 15:01 |
JayF | #info The next two Ironic meetings (Dec 25, Jan 1 2024) are cancelled. | 15:02 |
JayF | #topic Review Action Items | 15:02 |
JayF | #info JayF emailed list about cancelled meetings | 15:02 |
iurygregory | o/ | 15:02 |
rpittau | o/ | 15:02 |
JayF | #topic Caracal Release Schedule | 15:02 |
JayF | #info Next Milestone R-17, Caracal-2 on Jan 11 | 15:02 |
JayF | Any other comments on the release schedule? Anything we need to consider? | 15:03 |
dtantsur | o/ | 15:03 |
dtantsur | where do we stand with intermediate releases? | 15:03 |
JayF | I've cut none. | 15:03 |
dtantsur | rpittau: ^^ | 15:03 |
rpittau | we cut bugfix releases in December | 15:04 |
rpittau | next ones will be in February, end of it | 15:04 |
dtantsur | \o/ | 15:04 |
rpittau | and thanks to that we also released ironic-image in metal3 :) | 15:04 |
JayF | Ack, sounds like we're on track then. | 15:04 |
rpittau | yup | 15:04 |
JayF | On this general topic; how is bugfix support in release automation / retiring the old ones? | 15:04 |
JayF | I know that was in process but sorta lost the thread on it during my vacation | 15:05 |
rpittau | JayF: I've opened a patch for that but I didn't get the talk going with the release team after an initial discussion | 15:05 |
rpittau | this is the patch btw https://review.opendev.org/c/openstack/releases/+/900810 | 15:05 |
JayF | ack; so in progress just low priority and not moving quickly it seems | 15:06 |
JayF | basically what I expected | 15:06 |
rpittau | yeah :/ | 15:06 |
JayF | #topic OpenInfra Meetup at CERN June 6 2024 | 15:06 |
JayF | Looks like someone added an item suggesting a meetup for Ironic be done during this. | 15:06 |
rpittau | yes! | 15:06 |
JayF | Sounds like a good idea. I would try to go but will note that "I will try to go" still means very low likelihood | 15:06 |
JayF | so please someone else own this :D | 15:07 |
rpittau | :D | 15:07 |
rpittau | I proposed it, I will own it :) | 15:07 |
JayF | awesome \o/ | 15:07 |
JayF | Gotta go see some protons go boom | 15:07 |
rpittau | arne_wiebalck: this ^ is probably of interest to you | 15:07 |
rpittau | I guess a good date would be June 5 as people will probably travel on Friday (June 7) | 15:08 |
iurygregory | A meetup is probably complicated I would say .-., even for the Summit it's complicated to get budget | 15:08 |
JayF | I would say picking a date is probably getting ahead of ourselves | 15:08 |
JayF | maybe just send out an email and get feelers? | 15:09 |
rpittau | yeah, that's the intention, I was just thinking out loud | 15:09 |
JayF | I know if I went, I'd probably have to combine a UK trip with it, so I might actually be more able to go on the 7th | 15:09 |
JayF | Anything else on this topic? | 15:10 |
JayF | #topic Review Ironic CI Status | 15:11 |
dtantsur | Bifrost DHCP jobs are broken, presumably since updating ansible-collection-openstack. We don't know why. | 15:11 |
JayF | I'll note the gate broke for a couple of days last week because an Ironic<>Nova driver chain was being tested at its *tip* and an intermediate patch broke it. | 15:11 |
JayF | Now that whole chain of the openstacksdk migration has landed and those jobs are happy | 15:12 |
rpittau | dtantsur: can we rebase the revert on top of https://review.opendev.org/c/openstack/bifrost/+/903755 to collect the dnsmasq config ? | 15:12 |
dtantsur | doing | 15:12 |
rpittau | tnx | 15:12 |
JayF | #info Bifrost DHCP jobs broke by ansible-collection-openstack upgrade; revert and investigation in progress. | 15:12 |
JayF | Anything else on the gate? | 15:13 |
opendevreview | Dmitry Tantsur proposed openstack/bifrost master: DNM Revert "Support ansible-collections-openstack 2 and later" https://review.opendev.org/c/openstack/bifrost/+/903694 | 15:13 |
JayF | #topic Bug Deputy | 15:14 |
JayF | rpittau was bug deputy this week; anything interesting to report? | 15:14 |
rpittau | nothing new, it was really calm, I triaged a couple of old things | 15:15 |
JayF | Any volunteers to take the baton this week? | 15:15 |
JayF | If not, I think it is reasonable to say the "community" can do it through the holidays? | 15:15 |
dtantsur | yeah | 15:16 |
rpittau | yep | 15:16 |
JayF | #info No specific bug deputy assigned through holiday weeks; Ironic community members encouraged to triage as they are working and have time. | 15:16 |
JayF | #topic RFE Review | 15:16 |
JayF | One for dtantsur | 15:16 |
JayF | #link https://bugs.launchpad.net/ironic/+bug/2046428 Move configdrive to an auxiliary table | 15:16 |
JayF | dtantsur: my big concern about this is how nasty is the migration going to be | 15:16 |
dtantsur | It's a small one, but it has API visibility | 15:16 |
dtantsur | well | 15:16 |
dtantsur | We won't migrate existing configdrives; the code will need to handle both locations for a good while | 15:17 |
JayF | I don't think it's going to be small for scaled up deployments with lots of active configdrive instances :) | 15:17 |
JayF | oooh, so we're not going to migrate the field outta node? | 15:17 |
dtantsur | Well, there is no "field" | 15:17 |
dtantsur | It's just something in instance_info currently | 15:17 |
JayF | *pulls up an api ref* | 15:18 |
dtantsur | So, new code will stop inserting configdrive into instance_info, but will keep reading it from both places | 15:18 |
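(A rough sketch of the dual-location read being described; `get_node_configdrive` stands in for a hypothetical accessor to the proposed auxiliary table and is not an existing ironic API:)

```python
def get_configdrive(node, dbapi):
    """Fetch a node's configdrive from the new table, falling back to the
    legacy copy inside instance_info for nodes deployed before the change."""
    record = dbapi.get_node_configdrive(node.uuid)  # hypothetical accessor
    if record is not None:
        return record
    return node.instance_info.get('configdrive')
```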
JayF | This is ~trivial | 15:18 |
JayF | instance_info is not microversioned, we really can't microversion it | 15:18 |
dtantsur | *nod* | 15:18 |
JayF | unless we want to make changes in our nova/ironic driver harder than they already are | 15:18 |
dtantsur | :D | 15:19 |
JayF | Would we still support storing configdrives in swift? | 15:19 |
dtantsur | Absolutely | 15:19 |
JayF | Would we ever use this table in that case? | 15:19 |
JayF | e.g. I can't reach swift; is this table now a fallback? | 15:19 |
dtantsur | I don't know how many people store that in swift, to be honest. It's opt-in. | 15:19 |
JayF | That's fair. I was thinking from that perspective because my two largest environments did | 15:20 |
JayF | but I'm sure my downstream now doesn't | 15:20 |
JayF | and swift usage is much lower | 15:20 |
dtantsur | my downstream definitely does not either :) | 15:20 |
JayF | I am +2 on the feature, and like, +.999999 to it without a spec | 15:20 |
JayF | let me put it this way: there's no way I'd be able to implement this safely without a spec | 15:20 |
JayF | but you may be able to | 15:20 |
dtantsur | The patch is likely going to be shorter than even a short spec. | 15:21 |
rpittau | not sure about the spec either, but probably not needed | 15:21 |
JayF | I think my big concern is more around code we might need to write but don't know about than the code we know we need to write :) | 15:22 |
JayF | but you can't reduce my concern around unknown unknowns lol | 15:22 |
dtantsur | I'm afraid I cannot :D | 15:22 |
JayF | any objection to an approval as it sits, then? | 15:22 |
iurygregory | none from me | 15:22 |
JayF | #info RFE 2046428 approved | 15:22 |
JayF | #topic Open Discussion | 15:23 |
JayF | Anything for open discussion? | 15:23 |
dtantsur | Wanna chat about https://review.opendev.org/c/openstack/ironic/+/902801 ? | 15:23 |
dtantsur | I may be missing the core of your objections to it | 15:24 |
JayF | I don't like the shape of that change and I don't know how to express it | 15:24 |
JayF | I think you are | 15:24 |
JayF | and I think I am, to an extent | 15:24 |
dtantsur | (and would happily hear other opinions; no need to read the code, the summary should be enough) | 15:24 |
JayF | So basically we have a pie of threads | 15:24 |
JayF | right now, we have AFAICT, two config options to control how that pie is setup | 15:25 |
dtantsur | one? | 15:25 |
JayF | "how big is the pie" (how many threads) and "how much of the pie do periodic workers get to use" | 15:25 |
dtantsur | the latter is not a thing | 15:25 |
JayF | that is untrue, I looked it up, gimme a sec and I'll link | 15:25 |
dtantsur | https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conductor/base_manager.py#335 | 15:26 |
dtantsur | that's the same executor... | 15:26 |
JayF | https://opendev.org/openstack/ironic/src/branch/master/ironic/conf/conductor.py#L89 | 15:26 |
JayF | it's the same executor, but we allow you to limit how much of that executor the periodics will use | 15:26 |
dtantsur | *each periodic* | 15:26 |
JayF | OH | 15:27 |
dtantsur | 1 periodic can use 8 threads. 100 periodics can use 800 threads. | 15:27 |
dtantsur | This was done for power sync IIRC | 15:27 |
JayF | This conversation helps me get to the core of my point though, actually, which is nice | 15:27 |
dtantsur | it's used like this https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/manager.py#L1415-L1424 | 15:28 |
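(To unpack the "8 threads per periodic" point: each periodic fans its work out to the shared conductor executor, capped by [conductor]periodic_max_workers. A schematic sketch, not the actual manager.py code; names and numbers are illustrative:)

```python
import queue
import futurist

executor = futurist.ThreadPoolExecutor(max_workers=300)  # shared conductor pool
periodic_max_workers = 8                                 # cap applies per periodic

def run_periodic(nodes, do_one):
    work = queue.Queue()
    for node in nodes:
        work.put(node)

    def worker():
        while True:
            try:
                node = work.get_nowait()
            except queue.Empty:
                return
            do_one(node)

    # At most periodic_max_workers threads of the shared pool drain this queue,
    # so 100 such periodics could tie up 800 threads in total.
    return [executor.submit(worker)
            for _ in range(min(periodic_max_workers, work.qsize()))]
```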
JayF | I worry that we are going to make it extremely difficult to figure out sane values for this in scaled-up environments | 15:28 |
JayF | hmm but you didn't want it to be configurable | 15:28 |
dtantsur | I do have a percentage | 15:29 |
JayF | you just wanted to reserve 5% of the pie at all times for user-interactive-apis | 15:29 |
dtantsur | it's a config https://review.opendev.org/c/openstack/ironic/+/902801/2/ironic/conf/conductor.py#31 | 15:29 |
JayF | I'm going to reorient my question | 15:29 |
dtantsur | 5% of the default 300 is 15, which matches my personal definition of "several" :) | 15:29 |
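(The arithmetic spelled out as a tiny sketch; the option name and default for the reservation are provisional since the patch is still under review:)

```python
workers_pool_size = 300      # [conductor]workers_pool_size default
reserved_percentage = 5      # proposed reservation for API requests (provisional)

reserved_for_api = max(1, workers_pool_size * reserved_percentage // 100)
background_workers = workers_pool_size - reserved_for_api
print(reserved_for_api, background_workers)  # 15 285
```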
JayF | Do these configs exist in a post-eventlet world? | 15:29 |
dtantsur | Possibly? | 15:30 |
JayF | As laid out in the current draft in governance (if you've read it) | 15:30 |
dtantsur | We may want to limit the concurrency for any asynchronous approach we take | 15:30 |
dtantsur | Otherwise, we may land in the situation where Ironic is doing so much in parallel that it never gets to the bottom of its backlog | 15:30 |
JayF | I think I'm just trying to close the barn door when the horse has already escaped w/r/t operational complexity :( | 15:31 |
JayF | and every time we add something like this, it gets a little harder for a new user to understand how Ironic performs, and we'll never get rid of it | 15:31 |
dtantsur | I cannot fully agree with both statements | 15:31 |
dtantsur | We *can* get rid of configuration options for sure. Removing eventlet will have a huge impact already. | 15:32 |
JayF | well agree or not, it's basically an exasperated "I give up" because I don't have a better answer and I don't want to stand in your way | 15:32 |
dtantsur | Well, it's not super critical for me. If nobody thinks it's a good idea, I'll happily walk away from it. | 15:32 |
dtantsur | (Until the next time someone tries to deploy 3500 nodes within a few hours, lol) | 15:32 |
JayF | I think it's a situation where we're maybe putting a bandaid on a wound that needs stitches, right? | 15:33 |
JayF | but the last thing we need is another "lets take a look at this from another angle" sorta thing | 15:33 |
JayF | and with eventlet's retirement from openstack on the horizon, there's no point | 15:33 |
dtantsur | "on the horizon" ;) | 15:33 |
dtantsur | I keep admiring your optimism :) | 15:34 |
JayF | so kicking the can down the road is probably the right call; whether that means I stop fighting and drop my -1, or we're just OK with the concurrency chokeout bug until it's gone | 15:34 |
JayF | dtantsur: we don't have a choice | 15:34 |
JayF | dtantsur: have you looked at how bad eventlet is on 3.12? | 15:34 |
dtantsur | Not beyond what you shared with us | 15:34 |
JayF | dtantsur: I have optimism only because staying on eventlet is harder than migrating off in the medium term | 15:34 |
dtantsur | I know we must do it; I just don't know if we can practically do it | 15:34 |
JayF | which isn't exactly "optimism" so much as "out of the fire and into the pan" | 15:34 |
dtantsur | :D | 15:34 |
JayF | dtantsur: smart people have already answered the question "yes we can, and here's how" | 15:35 |
dtantsur | \o/ | 15:35 |
JayF | I think code is already written which makes asyncio and eventlet code work together | 15:35 |
JayF | using eventlet/aiohub (iirc) | 15:35 |
JayF | https://github.com/eventlet/aiohub | 15:35 |
* dtantsur doesn't want to imagine potential issues that may arise from it... | 15:35 | |
JayF | hberaud is working on it, along with some others (including itamarst from GR-OSS) | 15:35 |
JayF | dtantsur: I'm thinking the opposite. I'm looking at the other side of this, and seeing any number of "recheck random BS failure" things disappearing | 15:36 |
dtantsur | But.. if eventlet stays in some form, so do these options? | 15:36 |
JayF | dtantsur: I'm telling you, eventlet's status today is miserable | 15:36 |
JayF | dtantsur: probably, yeah :/ | 15:36 |
JayF | dtantsur: so I am like, going to pull my -1 off that. I'm not +1/+2 to the change but don't have a better idea | 15:36 |
dtantsur | Okay, let's see what the quiet people here say :) if someone actually decided it's a good idea, we'll do it. otherwise, I'll silently abandon it the next time I clean up my backlog. | 15:37 |
JayF | As another note for open discussion | 15:38 |
JayF | I believe I'm meeting downstream with a potential doc contractor | 15:38 |
JayF | that we sorta put in motion with my downstream a few weeks ago | 15:38 |
dtantsur | \o/ | 15:38 |
JayF | maybe I'll ask them how to make a decoder ring for 902801 :P | 15:38 |
JayF | Anything else for open discussion? | 15:38 |
JayF | Thanks everyone, have a good holiday o/ | 15:40 |
JayF | #endmeeting | 15:40 |
opendevmeet | Meeting ended Mon Dec 18 15:40:22 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:40 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-12-18-15.01.html | 15:40 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-12-18-15.01.txt | 15:40 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-12-18-15.01.log.html | 15:40 |
JayF | FWIW; I should be around this week, likely taking PTO Friday unless I have something come up. Similar next week; other than Monday I'll be around. First week of Jan I'll be gone most of the week. | 15:40 |
dtantsur | I'll be out from Friday, through the next week and on the 1st | 15:42 |
rpittau | I'll also be out from Friday and be in only 2 days the first week of January (3-4) | 15:42 |
iurygregory | I'm out from 02-12 January =) | 15:46 |
* JayF has a friend flying in 1 Jan to come with him to https://www.nhl.com/kraken/fans/winter-classic | 15:47 | |
rpittau | release the kraken! | 15:52 |
rpittau | good night! o/ | 16:56 |
JayF | o/ | 17:13 |
opendevreview | Vasyl Saienko proposed openstack/networking-baremetal master: Do not try to bind port when we can't https://review.opendev.org/c/openstack/networking-baremetal/+/903252 | 17:59 |
JayF | dtantsur: I'll note, github.com/eventlet/eventlet is now alive again, you can potentially evaluate other solutions for https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/NMCPYYHUPG766V5MGUUEKNIDEV6RCELC/#H5QDJULOMS73WX34SZUD5AOCV3GDIAQA | 21:02 |
JayF | dtantsur: from my chat with itamarst, it sounds like python 3.7 is as far back as they are testing new eventlet changes, so I'm not sure I'd be confident a new release would be keen on 3.6, and I'm not sure we could just remove the one 2.x compat piece without a whole PR to address it | 21:03 |
JayF | dtantsur: just thinking out loud and letting you know I haven't forgotten this (yet) :D | 21:03 |
JayF | dtantsur: I was looking into this, got to the point where I really wanted to see the centos stream 8 patch for the eventlet RPM ... and I don't have access to the source rpms because they are paywall'd (well, login-wall'd), so I punted :/ | 21:09 |
JayF | dtantsur: essentially itamar said, more or less "PRs welcome" in my downstream slack; so if there's a fix I suspect new-upstream might be amenable to it. Whether or not it's safe to use on python 3.6 is a question you'd have to test for :D | 21:12 |
JayF | Hmm. I got ahold of that source; it doesn't look like the eventlet RPM from https://buildlogs.centos.org/centos/8-stream/cloud/x86_64/openstack-xena/Packages/p/ was patched at all | 21:24 |
* JayF maybe didn't understand something, or is the CVE patched version elsewhere | 21:24 | |
JayF | I realized that searching for SOURCE rpms for python packages was a little silly. | 21:24 |
* JayF inspects all the .py files for ones and zeros /s | 21:24 | |
JayF | FYI Ironic folks: https://blueprints.launchpad.net/nova/+spec/ironic-guest-metadata has some Ironic work items in it now too; thought it'd be wise to share it around here some too | 22:23 |
JayF | nothing we haven't talked about; but wanted to ensure it's been spread around | 22:23 |
JayF | I'll put the Ironic half of this in RFE bugs once we have agreement on the nova half | 22:23 |
JayF | Sharding re-proposed in nova; on top of https://review.opendev.org/c/openstack/nova/+/900831/ -- now that I have a stack to test it on, I'll point my attention in the direction of tempest | 23:12 |