kubajj | Good morning Ironic! o/ | 07:32 |
---|---|---|
masghar | Good morning! | 08:38 |
mmalchuk | morning Ironic o/ | 09:04 |
mmalchuk | rpittau any updates on CI ? | 09:04 |
iurygregory | good morning ironic | 11:21 |
dtantsur | TheJulia: FRESH BRAINZZ! (mine is ready once you are, but I'll need to leave earlier today) | 12:21 |
TheJulia | braiinnz! | 13:02 |
TheJulia | good morning! | 13:10 |
TheJulia | dtantsur: so do we see the same level in OpenStack CI at all? Or was that just a particularly bad example? | 13:16 |
TheJulia | I sort of have a theory as to it, but still trying to wrap my head around it | 13:17 |
dtantsur | TheJulia: you mean, the metal3 failure? I haven't collected statistics yet. It's not permanent for sure. | 13:26 |
TheJulia | yeah, so I noticed some socket errors and huge interaction latencies appear, that being said a good chunk of it seems to be caused by locking, all sort of originating after the ipmi power status check | 13:28 |
TheJulia | I guess, I am curious what Nordix's CI is backed by, but I'm just lacking context there | 13:33 |
dtantsur | I think we use a mix of redfish and ipmi | 13:34 |
TheJulia | yeah | 13:38 |
TheJulia | interestingly, found an instance in zuul's logs, but nowhere near as severe as the one you linked | 13:38 |
TheJulia | more "i'm slightly grumpy..." than anything else | 13:39 |
TheJulia | I feel like we might be leaving the db locked/orphaning something on getting a node, but I might just be hyper focusing... I feel like I need to go look at the sqlalchemy sqlite driver | 13:44 |
iurygregory | dtantsur, we also found the issue downstream I think it was in one of the bugs that rpittau was working on | 13:52 |
dtantsur | iurygregory: mm, which one? | 13:59 |
iurygregory | pm =) | 14:02 |
*** JasonF is now known as JayF | 14:07 | |
dtantsur | ah, well.. this was related to outdated code downstream | 14:09 |
iurygregory | yeah, but since we are still seeing issues upstream I think it can probably happen downstream again ... | 14:10 |
dtantsur | Possibly, I'm literally talking with someone with similar symptoms (although I'm not sure they have Riccardo's fix). | 14:11 |
* dtantsur is on the verge of desperation with sqlite, sqlalchemy and our usage of them.. | 14:11 | |
TheJulia | dtantsur: up for a chat in say 10 minutes? | 14:20 |
dtantsur | TheJulia: possibly in 10 more, finishing something here right now | 14:32 |
TheJulia | no worries | 14:33 |
dtantsur | TheJulia: free now | 14:40 |
TheJulia | https://meet.google.com/mec-txxy-aqi | 14:40 |
dtantsur | TheJulia: https://review.opendev.org/c/openstack/ironic/+/887835 | 14:49 |
JayF | prepare to nest your meetings ;) | 14:59 |
JayF | #startmeeting ironic | 15:00 |
opendevmeet | Meeting started Mon Aug 14 15:00:57 2023 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'ironic' | 15:00 |
iurygregory | o/ | 15:01 |
masghar | o/ | 15:01 |
dtantsur | o/ | 15:01 |
TheJulia | o/ | 15:01 |
JayF | Good morning, welcome to the Ironic meeting. A reminder we operate under the OpenInfra Foundation CoC https://openinfra.dev/legal/code-of-conduct | 15:01 |
kubajj | o/ | 15:01 |
JayF | #topic Announcements/Reminder | 15:01 |
JayF | Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash | 15:01 |
JayF | thank you for cleaning up that dashboard, btw, it's nice to see it actually shrink as things land haha | 15:02 |
JayF | Note that the next Bobcat milestone is in 10 days; the non-client library freeze | 15:02 |
JayF | #note Reminder PTG will take place virtually 2023-10-23 through 2023-10-27. Please document any items for discussion here | 15:03 |
JayF | #link | 15:03 |
JayF | #undo | 15:03 |
opendevmeet | Removing item from minutes: #link | 15:03 |
JayF | #link https://etherpad.opendev.org/p/ironic-ptg-october-2023 | 15:03 |
JayF | That's a little over two months away but it will sneak up | 15:03 |
JayF | That's all I've got for announcements. | 15:04 |
JayF | No action items from last meeting, skipping that item. | 15:04 |
JayF | #topic Review Ironic CI status | 15:04 |
JayF | rpittau: do you have an update on how bifrost CentOS job is doing? | 15:04 |
iurygregory | I think he is out today | 15:04 |
JayF | I also think TheJulia and dtantsur are working through some metal3 locking issues; I'm unsure if that impacts our gate. | 15:04 |
JayF | ack ty iurygregory I'm sure he'll update the channel tomorrow | 15:04 |
dtantsur | Not much so far, seems performance-dependent | 15:05 |
TheJulia | very performance dependent, unfortunately | 15:05 |
JayF | Our CI is very good at finding those kinds of issues :-| | 15:06 |
TheJulia | I went through our logs for recent metal3-integration jobs, and found nothing as horrible as the nordix job dmitry linked to me last week | 15:06 |
JayF | Oh, so performance the *other way* lol | 15:06 |
TheJulia | unfortunately, it seems | 15:07 |
JayF | Thank you all for looking at that, if there's anything I can help review or fix let me know. | 15:07 |
JayF | Is there anything else notable about CI before we move on? | 15:07 |
JayF | Moving on. | 15:08 |
JayF | #topic Ongoing 2023.2 Workstreams | 15:08 |
JayF | #link https://etherpad.opendev.org/p/IronicWorkstreams2023.2 | 15:08 |
JayF | We're reaching the last weeks of the cycle. If we want something in Bobcat we have to move soon :D | 15:09 |
JayF | I know service steps is close and needs review attention. | 15:09 |
dtantsur | we could use more eyes on masghar's changes, first and foremost https://review.opendev.org/c/openstack/ironic/+/887554 | 15:09 |
masghar | Thank you ^^ | 15:09 |
JayF | good stuff, ty masghar it's on my list | 15:10 |
masghar | Should I tag it ironic-weekly-priority? | 15:10 |
JayF | hashtag: ironic-week-prio | 15:10 |
JayF | and it will show up in review dashboards | 15:10 |
masghar | Alright, thanks | 15:10 |
JayF | you have to "Show all" for the hashtag field to show up | 15:10 |
masghar | I was trying to find it thanks | 15:11 |
* dtantsur needs to leave now, will respond to any pings tomorrow | 15:11 | |
JayF | I'm going to move on | 15:11 |
JayF | #topic OpenStack User Survey updates | 15:11 |
JayF | The OpenStack User Survey is asking for projects to review project-specific questions | 15:11 |
JayF | which people are shown if they choose "yep, I use [project]" | 15:11 |
JayF | AFAICT, right now Ironic has no questions. | 15:12 |
JayF | I've started drafting some to add here: | 15:12 |
JayF | #link https://etherpad.opendev.org/p/ironic-user-survey-questions-2023 | 15:12 |
JayF | please give feedback/brainstorm/etc in that Etherpad | 15:12 |
JayF | in the next day or two I'll be consolidating this down and submitting it; we only have until Friday; so please prioritize this if you care to have input | 15:12 |
TheJulia | Suggested a question to help size the usage | 15:14 |
JayF | Perfect, we'll have discussion there in the Etherpad | 15:15 |
JayF | There are no RFEs currently submitted for review so we're on to | 15:15 |
JayF | #topic Open Discussion | 15:15 |
JayF | #link https://bugs.launchpad.net/ironic/+bug/2030976 | 15:16 |
JayF | scottsol discovered a security bug in Ironic, which appears to impact most openstack projects | 15:16 |
JayF | where sensitive data is being placed in notifications | 15:16 |
JayF | there is a draft PR up to fix it: | 15:16 |
JayF | #link https://review.opendev.org/c/openstack/oslo.messaging/+/891096 | 15:16 |
JayF | I say "draft" but only because it's not been reviewed or merged yet. It's confirmed to fix the issue and pass tests. | 15:17 |
JayF | Please take due notice of the bug, and keep an eye out for the OSS{A,N} coming down the pipe chen completed | 15:17 |
JayF | *when | 15:17 |
JayF | That's all I have for open discussion; anyone wanna talk about this or anything else? | 15:17 |
TheJulia | Also, as additional context, it is when notifications are logged to the message queue as opposed to an actual log file. | 15:17 |
TheJulia | s/queue/bus/ | 15:18 |
JayF | Last call before I close up the meeting. | 15:20 |
TheJulia | I got nothing | 15:20 |
JayF | #endmeeting | 15:20 |
opendevmeet | Meeting ended Mon Aug 14 15:20:59 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:20 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-14-15.00.html | 15:20 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-14-15.00.txt | 15:20 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2023/ironic.2023-08-14-15.00.log.html | 15:20 |
opendevreview | Jakub Jelinek proposed openstack/ironic master: WIP: Introduce default kernel/ramdisks by arch https://review.opendev.org/c/openstack/ironic/+/890819 | 15:24 |
opendevreview | Merged openstack/ironic master: Support sha256/sha512 with the ilo firmware upgrade logic https://review.opendev.org/c/openstack/ironic/+/882164 | 16:36 |
JayF | TheJulia: You had mentioned something about syncing up on service steps; I am out this afternoon with medical stuff so if you want to do it, it's gotta be nowish or over the next two hours | 16:41 |
TheJulia | I can in about 15 if that works? | 16:42 |
JayF | sure | 16:43 |
opendevreview | Julia Kreger proposed openstack/ironic master: Slow down the sqlite retry https://review.opendev.org/c/openstack/ironic/+/891333 | 16:59 |
opendevreview | Julia Kreger proposed openstack/ironic master: Log upon completion of power sync https://review.opendev.org/c/openstack/ironic/+/891334 | 16:59 |
opendevreview | Julia Kreger proposed openstack/ironic master: Don't yield on power sync at the end of the work https://review.opendev.org/c/openstack/ironic/+/891335 | 16:59 |
TheJulia | JayF: https://meet.google.com/ewi-rybd-mub | 16:59 |
kubajj | dtantsur, TheJulia or anybody who has any opinion about ironic config, I drafted a functional implementation of the deploy_kernel_by_arch here: https://review.opendev.org/c/openstack/ironic/+/890819 | 18:31 |
JayF | thanks kubajj; what great timing I was just asking Julia to look at that for ya literally 90 seconds ago :D | 18:31 |
kubajj | I will add similar behaviour for rescue_..._by_arch and add the original parameters back in a hierarchical manner as JayF suggested | 18:32 |
kubajj | Thanks JayF:) | 18:33 |
JayF | you will also, potentially, depending on timing of it landing, need to do the same for service (Ironic node service is what I just reviewed for Julia) | 18:33 |
JayF | I'd say that's likely | 18:33 |
JayF | but in a perfect world, you just respect mode in that method and *_kernel/*_ramdisk will work | 18:34 |
TheJulia | kubajj: feedback posted from my point of view on the configuration change | 18:35 |
kubajj | TheJulia: thanks | 18:37 |
TheJulia | CI is not happy today :( | 18:44 |
iurygregory | probably a newbie question, but the code of conduct for openstack-discuss is the one from openinfra right? | 21:11 |
JayF | Absolutely | 21:12 |
iurygregory | tks JayF | 21:12 |
opendevreview | Julia Kreger proposed openstack/ironic-python-agent master: Handle the node being locked https://review.opendev.org/c/openstack/ironic-python-agent/+/891357 | 21:32 |
TheJulia | dtantsur: so it looks like a couple different things happened which cascaded. One of them being the agent is kind of aggressive about re-querying ironic when ironic says "node is locked right now". But "why" the node was locked looks to be rooted, in this case, back with introspection stuck in its internal method for ~1200 seconds in one case. Agent did actually log sort of the right thing, except what happened is it kept | 21:32 |
TheJulia | requesting over and over, some of which started to stack up waiting for locks to interact on the file and db. | 21:32 |
TheJulia | dtantsur: so I think tomorrow, the key question is "why did that task hang out for so long!?" and then maybe just come to the conclusion if it was environmental... or not. | 21:33 |
JayF | Should we make IPA retry less aggressively, too? | 21:34 |
TheJulia | look at the change I just posted for ipa :) | 21:36 |
TheJulia | it is more a cascading result as opposed to the root cause though | 21:36 |
TheJulia | but 287 attempts to hit the /v1/lookup endpoint | 21:36 |
TheJulia | in the nordix logs dmitry linked from last week | 21:37 |
TheJulia | *part* of that is that introspection, *did* fail it seems | 21:37 |
TheJulia | just don't see why, at least yet | 21:37 |
TheJulia | from some of the logging I found, it basically retried every 15 seconds, which in that system it was taking 7-14 seconds just to do the needful in some cases when it *could* proceed | 21:40 |
TheJulia | going to head out, dtantsur let's try and sync tomorrow when I get online, I've got a few other small patches up, but tl;dr ci is not happy today due to connectivity/unrelated issues | 22:11 |
opendevreview | Merged openstack/ironic master: Fix several issues in the lock/release database code https://review.opendev.org/c/openstack/ironic/+/887835 | 23:09 |
*** dmellado81918 is now known as dmellado8191 | 23:17 |
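
Editor's sketch for the 21:32-21:40 exchange about the agent re-querying Ironic while the node is locked. This is not the code from change 891357, and it uses no real ironic-python-agent API. `NodeLocked`, `backoff_delays`, `lookup_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration. It only shows the general idea of replacing a fixed retry interval with one that grows and carries a little jitter.

```python
import random
import time


class NodeLocked(Exception):
    """Hypothetical error: the server reports the node is locked right now."""


def backoff_delays(base=15.0, factor=2.0, cap=120.0, attempts=6):
    """Return the un-jittered wait before each retry, growing up to a cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def lookup_with_backoff(do_lookup, delays, sleep=time.sleep, jitter=0.1):
    """Call do_lookup() until it succeeds or the delays are exhausted.

    do_lookup is any callable that raises NodeLocked while the node is busy.
    """
    for delay in delays:
        try:
            return do_lookup()
        except NodeLocked:
            # Add a small random spread so callers do not retry in lockstep.
            sleep(delay * (1 + random.uniform(0, jitter)))
    # Final attempt: let NodeLocked propagate to the caller if it still fails.
    return do_lookup()
```

With these assumed defaults the waits double from 15 seconds up to a 120 second cap, so a long-held lock draws fewer repeat requests than the fixed 15 second interval described in the log.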