*** gabys has joined #openstack-ironic | 00:46 | |
*** gabys has quit IRC | 00:51 | |
*** hoangcx has joined #openstack-ironic | 00:53 | |
openstackgerrit | Merged openstack/ironic-inspector-specs master: fix tox python3 overrides https://review.openstack.org/607449 | 00:59 |
---|---|---|
*** jiapei has joined #openstack-ironic | 01:39 | |
*** MattMan_1 has quit IRC | 01:43 | |
*** MattMan_1 has joined #openstack-ironic | 01:43 | |
*** tiendc has joined #openstack-ironic | 02:22 | |
*** trungnv has joined #openstack-ironic | 03:06 | |
*** jaganathan has joined #openstack-ironic | 03:17 | |
*** hoangcx has quit IRC | 04:27 | |
openstackgerrit | Dhanuka proposed openstack/sushy master: WIP: Add `CompositionService` top-level resource https://review.openstack.org/608563 | 04:53 |
*** jtomasek has joined #openstack-ironic | 05:17 | |
openstackgerrit | Kaifeng Wang proposed openstack/ironic-inspector-specs master: Configurable introspection data store https://review.openstack.org/587698 | 06:11 |
*** e0ne has joined #openstack-ironic | 06:21 | |
*** iurygregory has joined #openstack-ironic | 06:23 | |
*** gabys has joined #openstack-ironic | 06:58 | |
openstackgerrit | Kaifeng Wang proposed openstack/ironic-inspector master: DNM/TEST grenade job https://review.openstack.org/608591 | 07:14 |
*** jtomasek has quit IRC | 07:16 | |
*** jtomasek has joined #openstack-ironic | 07:17 | |
*** rcernin has quit IRC | 07:21 | |
*** moshele has joined #openstack-ironic | 07:26 | |
*** pvc has joined #openstack-ironic | 07:35 | |
pvc | hi anyone not busy here | 07:35 |
openstackgerrit | Kaifeng Wang proposed openstack/ironic-inspector master: DNM/TEST grenade job https://review.openstack.org/608591 | 07:52 |
*** bandini has joined #openstack-ironic | 08:03 | |
*** olivierb has joined #openstack-ironic | 08:06 | |
*** dciabrin has joined #openstack-ironic | 08:11 | |
*** serlex has joined #openstack-ironic | 08:11 | |
*** tssurya has joined #openstack-ironic | 08:15 | |
*** S4ren has joined #openstack-ironic | 08:25 | |
*** lenka has joined #openstack-ironic | 08:47 | |
openstackgerrit | Moshe Levi proposed openstack/ironic-specs master: Add Support for Smart NIC https://review.openstack.org/582767 | 08:48 |
*** hoangcx has joined #openstack-ironic | 08:49 | |
*** stendulker has joined #openstack-ironic | 08:54 | |
*** gabys has quit IRC | 08:57 | |
*** jrist has joined #openstack-ironic | 08:57 | |
*** gabys has joined #openstack-ironic | 08:57 | |
*** jrist has quit IRC | 09:02 | |
*** jrist has joined #openstack-ironic | 09:13 | |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: Implement basic interfaces for GraphicalConsole Interface https://review.openstack.org/547356 | 09:17 |
*** pvc has quit IRC | 09:30 | |
*** lenka has quit IRC | 09:36 | |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: Implement basic interfaces for GraphicalConsole Interface https://review.openstack.org/547356 | 09:40 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-inspector master: Replace subprocess with processutils https://review.openstack.org/606349 | 09:45 |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: Implement basic interfaces for GraphicalConsole Interface https://review.openstack.org/547356 | 09:46 |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: VNC Console: Update RPC object and related change https://review.openstack.org/599560 | 09:47 |
*** dtantsur|afk is now known as dtantsur | 09:49 | |
dtantsur | TheJulia: \o/ at a grub job | 09:49 |
dtantsur | morning ironic | 09:49 |
iurygregory | good morning | 09:51 |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: Implement basic interfaces for GraphicalConsole Interface https://review.openstack.org/547356 | 09:55 |
*** iurygregory is now known as iurygregory|lunc | 09:59 | |
etingof | o/ everyone ironic | 10:00 |
*** jtomasek has quit IRC | 10:09 | |
openstackgerrit | paresh sao proposed openstack/sushy master: Requests session keyword arguments for sushy connector https://review.openstack.org/607809 | 10:12 |
*** moshele has quit IRC | 10:26 | |
*** lenka has joined #openstack-ironic | 10:26 | |
*** jtomasek has joined #openstack-ironic | 10:28 | |
*** rcernin has joined #openstack-ironic | 10:35 | |
*** moshele has joined #openstack-ironic | 10:49 | |
*** moshele has quit IRC | 10:59 | |
*** moshele has joined #openstack-ironic | 11:00 | |
*** stendulker has quit IRC | 11:07 | |
*** lenka has quit IRC | 11:09 | |
*** lenka has joined #openstack-ironic | 11:15 | |
*** rcernin has quit IRC | 11:17 | |
*** iurygregory|lunc is now known as iurygregory | 11:32 | |
*** adrianc has joined #openstack-ironic | 11:33 | |
*** rpittau has joined #openstack-ironic | 11:35 | |
*** adrianc has quit IRC | 11:38 | |
*** lenka has quit IRC | 11:40 | |
*** jrist has quit IRC | 11:49 | |
*** jrist has joined #openstack-ironic | 11:50 | |
*** jrist has quit IRC | 11:54 | |
*** lenka has joined #openstack-ironic | 11:57 | |
*** jrist has joined #openstack-ironic | 11:57 | |
*** dnuka has joined #openstack-ironic | 11:59 | |
*** skazi has quit IRC | 12:02 | |
*** e0ne has quit IRC | 12:04 | |
*** rh-jelabarre has joined #openstack-ironic | 12:08 | |
*** sw3 has joined #openstack-ironic | 12:12 | |
*** e0ne has joined #openstack-ironic | 12:13 | |
*** trown|outtypewww is now known as trown | 12:18 | |
*** weshay_pto is now known as weshay | 12:19 | |
hjensas | o/ etingof :) | 12:22 |
etingof | hjensas, o/ | 12:22 |
*** bfournie has joined #openstack-ironic | 12:23 | |
*** arne_wiebalck_ has joined #openstack-ironic | 12:25 | |
jroll | \o morning all | 12:26 |
openstackgerrit | Ilya Etingof proposed openstack/ironic master: Fix unit test run on OS X https://review.openstack.org/608655 | 12:29 |
TheJulia | Good morning | 12:37 |
dtantsur | morning jroll, TheJulia | 12:38 |
jroll | hey :) | 12:38 |
* TheJulia awaits the coffee to be brewed | 12:39 | |
* dtantsur is not sure if his good morning to TheJulia and jroll actually got through | 12:40 | |
jroll | dtantsur: indeed, my hey was for you | 12:40 |
dtantsur | this Monday is too Monday: my wifi router disconnects all my devices (and only mine) every some minutes | 12:40 |
dtantsur | okay, so I did not see your hey :) | 12:40 |
jroll | oof | 12:40 |
jroll | well then morning dtantsur :) | 12:41 |
jroll | I don't miss znc dropping messages on abrupt disconnects | 12:41 |
dtantsur | yeaaaahh.. | 12:41 |
dtantsur | to better news: I might have figured what is wrong with inspector grenade: https://review.openstack.org/608620 | 12:41 |
patchbot | patch 608620 - openstack-dev/grenade - nova: do not verify standard resource classes when... - 1 patch set | 12:41 |
TheJulia | I figured out why rescue jobs sometimes failed | 12:49 |
TheJulia | https://review.openstack.org/#/c/608404 | 12:49 |
patchbot | patch 608404 - ironic - Avoid race with nova on power sync and rescue - 2 patch sets | 12:49 |
jroll | TheJulia: hrm, now we just have a race with our power sync loop :/ | 12:50 |
* TheJulia needs a coffee iv this morning | 12:52 | |
etingof | good morning to jroll & TheJulia o/ | 12:53 |
jroll | hey etingof | 12:53 |
etingof | whenever I see power sync, it immediately reminds me of my power sync patch... does anybody feel the same way? ;) | 12:54 |
TheJulia | etingof: seeking reviews? :) | 12:56 |
*** ijw has joined #openstack-ironic | 12:56 | |
*** ijw has quit IRC | 12:56 | |
*** ijw has joined #openstack-ironic | 12:57 | |
etingof | TheJulia, yep, this will play well with your morning coffee -- https://review.openstack.org/#/c/607949/ | 12:58 |
patchbot | patch 607949 - ironic - WIP: Avoid long-pending ipmitool processes - 1 patch set | 12:58 |
*** ijw has quit IRC | 12:59 | |
*** ijw_ has joined #openstack-ironic | 13:00 | |
jroll | TheJulia: oh, we have an exclusive lock, that patch should be okay, I'll add one comment | 13:00 |
*** lenka has quit IRC | 13:01 | |
TheJulia | jroll: yeah, we do, which I figured is partially why that was the less painful path | 13:01 |
jroll | TheJulia: yeah, I'm +2 if you can add the TODO I mentioned :) | 13:02 |
TheJulia | I can definitely add a todo | 13:02 |
jroll | thanks | 13:04 |
*** e0ne has quit IRC | 13:05 | |
*** arne_wiebalck_ has quit IRC | 13:08 | |
sambetts|afk | TheJulia, jroll: with that patch anything watching the notifications will still see a power off event https://github.com/openstack/ironic/blob/master/ironic/conductor/utils.py#L326 | 13:08 |
sambetts|afk | in case we want to hide that too | 13:09 |
*** lenka has joined #openstack-ironic | 13:10 | |
sambetts|afk | could add a flag on the node_power_action function to prevent it updating the db and making the notification something like "silent=True" | 13:11 |
*** mjturek has joined #openstack-ironic | 13:12 | |
jroll | I think we'd still want to push that to notifications | 13:12 |
jroll | if they're being used for auditing or something, it might be needed | 13:12 |
sambetts|afk | ah that's a good point, although it feels a little weird to have two different outputs from ironic telling you different things | 13:14 |
jroll | agree | 13:14 |
TheJulia | tonyb: is https://review.openstack.org/#/c/594591/ just blocked on our ci? | 13:14 |
patchbot | patch 594591 - ironic-python-agent - Add a UUID to the extra-hardware data on ppc64le - 5 patch sets | 13:14 |
*** tiendc has quit IRC | 13:15 | |
jroll | sambetts|afk: I really really don't like the premise of the patch, but it can also happen in production, and I think it's probably for the best to land now until we can do a callback to nova to notify it | 13:15 |
TheJulia | I agree, we still want to push notifications, it is just that we still have an overall operation in-flight where nova sees it and thinks "oh! I know this!" | 13:15 |
sambetts|afk | jroll: I can envision something being like "my auditing system is telling me the server got powered off, but I can see a log in nova telling me it's turned on" | 13:15 |
jroll | sambetts|afk: totally | 13:16 |
jroll | sambetts|afk: but worse is "I did a rescue on this instance and now it keeps shutting down randomly" | 13:16 |
TheJulia | sambetts|afk: Sure, but it is still overall in an in-flight operation that Ironic is managing | 13:16 |
sambetts|afk | yeah 100% | 13:16 |
sambetts|afk | I guess even with this patch there is a brief blip where the db changes | 13:17 |
* jroll hopes to get sambetts|afk angry enough about this that he goes and does the work to callback to nova :D | 13:17 | |
rpittau | git diff | 13:17 |
rpittau | lol wrong window :D | 13:17 |
sambetts|afk | jroll: ;) /me is already angry at heat not doing what he wants it to do after 2hrs of waiting for something to deploy | 13:18 |
*** skazi has joined #openstack-ironic | 13:18 | |
jroll | sounds like heat :P | 13:18 |
dtantsur | lol | 13:21 |
TheJulia | sambetts|afk: there is, the right thing would be to add recognition in nova... except they've recently complained about tight coupling, and it would be even more tight coupling. :( | 13:21 |
sambetts|afk | TheJulia: yeah tight coupling sucks, but isn't that what drivers in nova are for (you can't tell me the xen driver isn't tightly coupled to xen)? alternatively a flag to node_power_action could prevent the initial db change which is then overridden, or even a "power_state_mask=state.POWER_ON" which can be used to set which power state to mask the server's power state with | 13:25 |
TheJulia | sambetts|afk: I completely agree with you but they apparently don't see it that way. At the same time, aiui, the power sync loop is mostly outside of our influence other than the status of each instance being checked higher up in the nova-compute process | 13:27 |
jroll | TheJulia: nova was very supportive of doing a callback when we changed power state | 13:28 |
TheJulia | +++ | 13:28 |
TheJulia | That would effectively prevent this from ever being an issue | 13:28 |
sambetts|afk | although we'd have to have control of the callback from each different action in ironic, so rescue could decide not to send it | 13:29 |
sambetts|afk | but other actions will | 13:29 |
*** arne_wiebalck_ has joined #openstack-ironic | 13:29 | |
jroll | eh? we want to send it always | 13:30 |
jroll | that way the nova instance is always up to date, and doesn't think something powered it on out of band | 13:30 |
jroll | (thus nova wanting to shut it back off) | 13:30 |
sambetts|afk | except we don't for this case, because then nova will think the server has gone off like it does currently | 13:30 |
sambetts|afk | right? | 13:31 |
jroll | but it will also know it's meant to turn back on, so it won't shut it back down | 13:31 |
jroll | basically, | 13:31 |
jroll | if nova thinks something should be off, and sees that it's on, it shuts it down | 13:31 |
sambetts|afk | right, but nova thinks it should be off, because ironic is telling it it's off | 13:32 |
sambetts|afk | I thought that's what TheJulia's patch is circumventing | 13:32 |
jroll | right | 13:32 |
jroll | but with the callback, ironic would be telling it "hey, I'm turning this on now, update your db" | 13:32 |
jroll | rather than nova saying "oh, somehow this came on | 13:33 |
TheJulia | and then mashing the "turn it off!" button | 13:33 |
jroll | basically, we'd be hooking in here: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1066 | 13:33 |
sambetts|afk | so is the current nova behaviour: if a node goes off out of band it learns a node is off and saves that state, but if a node powers on out of band it powers it back off? | 13:34 |
jroll | correct | 13:34 |
jroll | it doesn't correct the former because things like "systemctl shutdown" and such | 13:35 |
sambetts|afk | wow... makes sense but isn't wholly intuitive :-P | 13:36 |
sambetts|afk | and so with the callback nova wouldn't treat an out of band power on as an out of band power on any more? or would we still update our db without a callback to nova if the node was actually out of band powered on? | 13:40 |
sambetts|afk | so it would still get powered back off | 13:40 |
sambetts|afk | (although I guess that depends on your setting on the ironic power sync force option) | 13:40 |
jroll | right, I think we would do it depending on that config | 13:41 |
jroll | I need to write up some sort of spec | 13:41 |
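
The power sync behaviour described above can be modelled in a few lines. This is a minimal sketch of the rules as stated in the conversation, not nova's or ironic's actual code: the `Instance` class, the `sync_power_state` and `power_callback` functions, and the returned action strings are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Local stand-ins for power state values (assumed for this sketch).
POWER_ON = "power on"
POWER_OFF = "power off"


@dataclass
class Instance:
    """The compute service's record of the power state it expects."""
    expected_power_state: str


def sync_power_state(instance, observed):
    """Return the action a periodic sync would take for one instance."""
    if observed == instance.expected_power_state:
        return "nothing"
    if observed == POWER_OFF:
        # The node went off out of band (for example a shutdown issued
        # from inside the OS): record the new state instead of fighting it.
        instance.expected_power_state = POWER_OFF
        return "record off"
    # The node came on while the record says it should be off: shut it down.
    return "power off"


def power_callback(instance, new_state):
    """Hypothetical callback: the bare metal service reports a power
    change it made itself, so the record is updated before the next sync."""
    instance.expected_power_state = new_state
```

In this model the rescue race looks like a sync that observes the brief power-off, records it, and then answers the following power-on with "power off". Calling `power_callback(instance, POWER_ON)` before the next sync makes that sync return "nothing" instead.
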
TheJulia | speaking of specs, any progress on conductor_group support for nova-compute? | 13:42 |
* TheJulia is working on the whiteboard atm | 13:43 | |
jroll | no, super busy october downstream | 13:44 |
jroll | hopefully later this month I can get to it | 13:44 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic stable/rocky: Fixes a race condition in the hash ring code https://review.openstack.org/608675 | 13:45 |
TheJulia | jroll: k | 13:46 |
sambetts|afk | jroll, TheJulia: based on what you've said I'm +2 on that patch with the same comment as jroll | 13:49 |
*** jaypipes has joined #openstack-ironic | 13:53 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic stable/queens: Fixes a race condition in the hash ring code https://review.openstack.org/608678 | 13:58 |
*** dougsz has joined #openstack-ironic | 14:00 | |
*** skazi has quit IRC | 14:02 | |
*** lenka has quit IRC | 14:07 | |
*** munimeha1 has joined #openstack-ironic | 14:07 | |
*** baha has joined #openstack-ironic | 14:08 | |
*** SteelyDan is now known as dansmith | 14:13 | |
*** lenka has joined #openstack-ironic | 14:15 | |
*** e0ne has joined #openstack-ironic | 14:17 | |
*** dnuka has quit IRC | 14:19 | |
*** olivierb has quit IRC | 14:21 | |
*** beekneemech is now known as bnemec | 14:22 | |
TheJulia | dtantsur: at a glance, it looks like the merge conflict wasn't too bad | 14:24 |
dtantsur | TheJulia: with the hash ring patch? yeah, more or less straightforward | 14:24 |
TheJulia | Hey jroll, could you take a look at https://review.openstack.org/#/c/608678 when you get a minute | 14:24 |
patchbot | patch 608678 - ironic (stable/queens) - Fixes a race condition in the hash ring code - 1 patch set | 14:24 |
*** bfournie has quit IRC | 14:27 | |
jroll | TheJulia: yep will do | 14:27 |
openstackgerrit | Madhuri Kumari proposed openstack/ironic master: Implement basic interfaces for GraphicalConsole Interface https://review.openstack.org/547356 | 14:28 |
*** moshele has quit IRC | 14:31 | |
openstackgerrit | Aija Jaunteva proposed openstack/sushy master: Add support for loading resources from archive file https://review.openstack.org/589147 | 14:31 |
*** skazi has joined #openstack-ironic | 14:32 | |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Avoid race with nova on power sync and rescue https://review.openstack.org/608404 | 14:35 |
TheJulia | jroll: thanks! | 14:37 |
jroll | TheJulia: the email or? | 14:37 |
*** jaganathan has quit IRC | 14:37 | |
TheJulia | the patch | 14:37 |
jroll | ah right | 14:37 |
* TheJulia has not looked at email this morning | 14:37 | |
jroll | you said thanks about 5 seconds after I +1'd the tenks thing via email :P | 14:38 |
TheJulia | jroll: I just noticed that :) | 14:39 |
jroll | TheJulia: did you also want to review the queens version? it's good with me | 14:39 |
TheJulia | I thought I did... | 14:40 |
jroll | heh, I'll just leave a +2 then | 14:40 |
TheJulia | done | 14:40 |
jroll | thanks | 14:41 |
TheJulia | I've put status updates on a number of items https://etherpad.openstack.org/p/IronicWhiteBoard | 14:41 |
*** cdearborn has joined #openstack-ironic | 14:46 | |
*** stendulker has joined #openstack-ironic | 14:47 | |
*** etingof is now known as etingof|brb | 14:48 | |
*** edleafe has joined #openstack-ironic | 14:51 | |
*** arne_wiebalck_ has quit IRC | 14:53 | |
*** gcb_ has joined #openstack-ironic | 14:53 | |
*** etingof|brb has quit IRC | 14:54 | |
*** e0ne has quit IRC | 14:54 | |
* devananda starts the coffee | 14:56 | |
TheJulia | ++ | 14:56 |
*** ijw_ has quit IRC | 14:58 | |
*** ijw has joined #openstack-ironic | 14:58 | |
*** kaifeng has joined #openstack-ironic | 14:59 | |
jroll | what is with you people and computer before coffee :P | 14:59 |
jroll | morning deva | 14:59 |
devananda | morning, jroll | 15:00 |
TheJulia | #startmeeting ironic | 15:00 |
openstack | Meeting started Mon Oct 8 15:00:22 2018 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
*** openstack changes topic to " (Meeting topic: ironic)" | 15:00 | |
TheJulia | Good morning everyone! | 15:00 |
openstack | The meeting name has been set to 'ironic' | 15:00 |
TheJulia | o/ | 15:00 |
rpioso | \o | 15:00 |
kaifeng | o/ | 15:00 |
jroll | \o | 15:00 |
iurygregory | o/ | 15:00 |
stendulker | o/ | 15:00 |
cdearborn | o/ | 15:00 |
*** etingof has joined #openstack-ironic | 15:00 | |
TheJulia | Our agenda this week is fairly straightforward, and can be found on the wiki. | 15:01 |
TheJulia | #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting | 15:01 |
etingof | o/ | 15:01 |
TheJulia | #topic Announcements / Reminders | 15:01 |
*** openstack changes topic to "Announcements / Reminders (Meeting topic: ironic)" | 15:01 | |
jiapei | o、 | 15:01 |
jiapei | o/ | 15:01 |
TheJulia | #info We have published the priorities document for the cycle! \o/ | 15:02 |
mgoddard | o/ | 15:02 |
jroll | woo | 15:02 |
TheJulia | #link http://specs.openstack.org/openstack/ironic-specs/priorities/stein-priorities.html | 15:02 |
TheJulia | There should be a story for everything in storyboard at this time. There is also a high level worklist page on storyboard now | 15:02 |
TheJulia | #link https://storyboard.openstack.org/#!/worklist/494 | 15:03 |
TheJulia | Does anyone have anything else to announce or remind us of this week? | 15:04 |
* etingof would like to introduce the newborn ironicer - iurygregory \o/ | 15:04 | |
rpioso | Welcome, iurygregory :-) | 15:04 |
jroll | welcome! | 15:04 |
TheJulia | Welcome iurygregory! | 15:04 |
iurygregory | thanks everyone o/ | 15:05 |
mgoddard | welcome | 15:05 |
TheJulia | One reminder, I will be effectively completely AFK all day tomorrow. | 15:05 |
stendulker | welcome | 15:05 |
jiapei | welcome iurygregory | 15:05 |
TheJulia | #topic Review action items from previous meeting | 15:06 |
*** openstack changes topic to "Review action items from previous meeting (Meeting topic: ironic)" | 15:06 | |
TheJulia | #info No action items last week, so moving on! | 15:06 |
TheJulia | #topic Review subteam status reports | 15:06 |
*** openstack changes topic to "Review subteam status reports (Meeting topic: ironic)" | 15:06 | |
TheJulia | #link https://etherpad.openstack.org/p/IronicWhiteBoard | 15:07 |
TheJulia | Starting around line 173 | 15:07 |
TheJulia | I've put initial statuses on some of the items. If you own one of the items according to the priorities, please indicate a status, even if work has not started on that item yet. | 15:08 |
TheJulia | dtantsur: awesome about a prototype | 15:09 |
dtantsur | :) | 15:09 |
*** e0ne has joined #openstack-ironic | 15:11 | |
mjturek | o/ | 15:11 |
TheJulia | Greetings mjturek | 15:12 |
TheJulia | Regarding python3 first, have any 3rd party CI operators had a chance to look at setting up a job or two to run python3? | 15:12 |
mjturek | TheJulia not much of an update, but our CI guru doesn't see any issues with it | 15:13 |
TheJulia | Okay, it should be fairly easy, duplicate and pass USE_PYTHON3=True into the CI job :) | 15:14 |
* TheJulia wonders if we need a "Just for fun" category | 15:14 | |
TheJulia | Anyway, has everyone had a chance to review and update statuses? | 15:15 |
TheJulia | And with that, are we ready to proceed? | 15:15 |
rpioso | TheJulia: We're looking into it. | 15:15 |
rpioso | Just a question of which to cut over. | 15:16 |
TheJulia | rpioso: in your guys case, you have tons of jobs, I would just cut over the ones you feel most exercise your driver to use Python3 | 15:16 |
rpioso | TheJulia: +1 | 15:16 |
TheJulia | Everyone good to proceed? | 15:17 |
* TheJulia gets out the crickets | 15:18 | |
TheJulia | Okay, I guess we're good to proceed then | 15:19 |
TheJulia | #topic Deciding on priorities for the coming week | 15:19 |
*** openstack changes topic to "Deciding on priorities for the coming week (Meeting topic: ironic)" | 15:19 | |
TheJulia | #link https://etherpad.openstack.org/p/IronicWhiteBoard | 15:19 |
TheJulia | Starting at line 105, I pre-populated a list based on looking through review this morning | 15:20 |
TheJulia | Is there anything anyone feels is missing? That they feel needs to be added or removed? | 15:20 |
jroll | seems reasonable | 15:21 |
TheJulia | Any objections, are we good to proceed | 15:22 |
* TheJulia brews more coffee and hands it out | 15:23 | |
* etingof would love to have the long-running ipmitool processes idea criticized | 15:23 | |
*** serlex has quit IRC | 15:23 | |
etingof | meaning https://review.openstack.org/#/c/607949/ | 15:23 |
patchbot | patch 607949 - ironic - WIP: Avoid long-pending ipmitool processes - 1 patch set | 15:23 |
mgoddard | looks good to me | 15:23 |
* jroll already gave his feedback :) | 15:24 | |
TheJulia | etingof: I've not had a chance to look yet, could we take that to discussion? | 15:24 |
etingof | sure | 15:24 |
etingof | thank you! | 15:24 |
TheJulia | Okay, Moving on then | 15:24 |
TheJulia | #topic Discussion | 15:24 |
*** openstack changes topic to "Discussion (Meeting topic: ironic)" | 15:24 | |
TheJulia | First topic of the day is: Do we have a forward path on deploy templates? | 15:24 |
jroll | I'm still waiting to see jay's proposal for the full picture | 15:25 |
jroll | but I think we have enough to start building it out? | 15:25 |
TheJulia | jroll: I'm not sure he is going to given the way the discussion went :( | 15:25 |
jaypipes | jroll: I wasn't planning on continuing any formal proposal considering the discussions. | 15:25 |
jroll | erm | 15:26 |
jroll | I thought we landed on agreement with that proposal | 15:26 |
jroll | other than folks saying it would take too long, I guess | 15:26 |
TheJulia | I think we can do the internals without many issues or disagreements, the what information we act upon and how we get that in or populated seems to be lacking agreement | 15:26 |
jaypipes | jroll: I essentially capitulated. | 15:26 |
jroll | jaypipes: don't blame you, I guess it's my optimism hoping you were continuing writing instead of arguing :) | 15:27 |
jroll | TheJulia: again, I think everybody came to agreement in that thread, with a small chunk of "we don't have time!!!!!!" | 15:27 |
jroll | maybe I read it wrong | 15:27 |
jaypipes | jroll: the official line from nova is that virt drivers should feel free to use the required traits list as signals to the virt driver to configure an instance. We advise the virt driver not to put non-schedulable/non-placement-influencing things as required traits. | 15:28 |
mgoddard | the recent mail from Chris Friesen suggested traits should only be used for configuration of booleans | 15:28 |
TheJulia | I feel like, with the side discussion, the mechanism would be capabilities, and we would just ignore traits | 15:28 |
TheJulia | and capabilities would essentially now live forever | 15:28 |
jroll | capabilities? | 15:28 |
jaypipes | jroll: that said, we're not keen to add any deploy_template_X stuff to os-traits standard traits library, so the deploy template traits should be prefixed with CUSTOM_. | 15:28 |
TheJulia | And then we would have an external override mechanism | 15:28 |
sambetts|afk | I thought the purpose of the deploy templates was to turn deploy steps into booleans? | 15:28 |
TheJulia | jaypipes: We were never ever suggesting that | 15:29 |
jroll | jaypipes: yep, hear everything you're saying loud and clear | 15:29 |
jaypipes | TheJulia: johnthetubaguy was. | 15:29 |
TheJulia | jaypipes: oh... well... Ummm.. Hmm. :( | 15:29 |
mgoddard | jaypipes: he wanted to use standard traits, but not like that | 15:29 |
TheJulia | jaypipes: I'll put it this way, my impression and understanding is that we would simply rely upon existing traits, but not do anything like a template definition in os-traits, since it is completely freeform with CUSTOM_ | 15:30 |
mgoddard | if there is a sensible standard trait, then it could be used. He wasn't suggesting putting garbage into os-traits | 15:30 |
jaypipes | mgoddard: we'd still support standard traits being added to os-traits like BOOT_MODE_UEFI/BOOT_MODE_BIOS or STORAGE_RAID5 etc | 15:30 |
jroll | so where I believe we're at (or did before this meeting) - we have a path for boolean config things like UEFI, we still need to determine the path for more complex configuration data, and there's a good proposal from jaypipes in that thread. I still think this is the path we should take | 15:30 |
jroll | ^ curious folks' take on that | 15:30 |
* jroll gets links | 15:31 | |
TheJulia | jroll: I concur | 15:31 |
jaypipes | TheJulia: ++ | 15:31 |
*** TheJulia sets mode: -o TheJulia | 15:31 | |
* TheJulia doesn't know why she didn't do that sooner | 15:31 | |
jroll | this is the proposal I think we should take: | 15:31 |
jroll | #link http://lists.openstack.org/pipermail/openstack-dev/2018-October/135300.html | 15:31 |
jroll | and I can't find the simple boolean proposal now :| | 15:32 |
jroll | ah, the simple part: | 15:33 |
jroll | #link http://lists.openstack.org/pipermail/openstack-dev/2018-October/135446.html | 15:33 |
jaypipes | jroll: so, yeah, I'd love to see that type of solution long term, but it ain't a reality any time soon given current state of thinking in nova. | 15:33 |
jaypipes | jroll: I'm referring to the first link above. | 15:34 |
jroll | jaypipes: yeah, social problem, not technical. can be overcome. | 15:34 |
jaypipes | jroll: and yes, cfriesen's email represents the agreed, simple approach. | 15:34 |
TheJulia | jroll: It was on another thread if I remember correctly | 15:34 |
jroll | TheJulia: yes, I linked it :) | 15:35 |
jaypipes | jroll: to which I responded with the Ironic-ness here: http://lists.openstack.org/pipermail/openstack-dev/2018-October/135474.html | 15:35 |
* devananda looks for the jar of unicorn dust, and adds some to their coffee | 15:35 | |
jroll | jaypipes: indeed | 15:35 |
jroll | anyway, to the original question, | 15:35 |
mgoddard | so if we were to build a solution for the boolean proposal using traits, and until --config-data exists, abuse it for non-booleans, how would that sit with everyone? | 15:35 |
jroll | let's proceed... yes what mgoddard said :) | 15:36 |
jroll | er wait | 15:36 |
jroll | no, I don't want to abuse traits like that | 15:36 |
jroll | just wait for complex things until someone cares enough to push for the right solution | 15:36 |
mgoddard | so no support for non-booleans? | 15:36 |
TheJulia | I totally didn't see jaypipes's further comments below | 15:36 |
jroll | mgoddard: that's my opinion, yes, we're going to dig ourselves another compatibility hole like capabilities | 15:37 |
mgoddard | if we build a generic mechanism, can we stop people? | 15:37 |
TheJulia | mgoddard: I'm good with that | 15:37 |
TheJulia | mgoddard: maybe if we also sprinkle some of this unicorn dust devananda has been hiding | 15:37 |
TheJulia | s/sprinkle/sprinkle in/ | 15:38 |
jroll | we can't stop people | 15:38 |
jroll | but we can yell that it isn't supported | 15:38 |
jroll | and then not care about breaking it | 15:38 |
mgoddard | people will just ignore it | 15:38 |
jroll | that's fine | 15:39 |
mgoddard | which might be ok, but we should understand that | 15:39 |
TheJulia | mgoddard: well, if things don't match up or are not viable and don't pass validate, drivers should fail and prohibit deployment | 15:39 |
jroll | and they may be broken later | 15:39 |
dtantsur | are we leaning towards delaying RAID for eternity more? | 15:39 |
TheJulia | (we might need to augment the list in the ironic virt driver for interfaces cared about at some point) | 15:39 |
TheJulia | dtantsur: I think we can consider a boolean of raid, but not information about a raid | 15:39 |
dtantsur | right, so a template still? | 15:39 |
mgoddard | boolean is quite subtle here - you could argue that RAID5 is a boolean - you either have it or you don't | 15:39 |
dtantsur | yeah, this ^^ is my question | 15:40 |
*** e0ne has quit IRC | 15:40 | |
TheJulia | dtantsur: I think it would still be a template, default configurations would need to be populated in the boolean scenario | 15:40 |
TheJulia | once we have something with metadata references, then we can allow more dynamic pass-in of raid configuration | 15:40 |
* TheJulia wonders if we're all on the same page and in a relatively happy place | 15:41 | |
mgoddard | so RAID is a boolean? | 15:41 |
mgoddard | (that's not what I understand boolean to mean...) | 15:41 |
TheJulia | so, the gap is the fact that our model requires the configuration for raid to be set, the template stored in ironic... I guess in theory could replace the raid template | 15:42 |
devananda | very few boot-time-configurable traits are true booleans, because many of them interact with other settings. how about secure boot mode <-> legacy BIOS setting? | 15:42 |
TheJulia | err | 15:42 |
TheJulia | raid config | 15:42 |
*** e0ne has joined #openstack-ironic | 15:42 | |
TheJulia | So in theory, we could have RAID5, RAID10, and a deploy template could swap default configurations around :\ | 15:43 |
* TheJulia doesn't want raid to derail the boolean nature of secure boot | 15:43 | |
devananda | TheJulia: I'm pointing out that secure boot actually isn't a simple boolean, unfortunately | 15:44 |
devananda | what if I request secureboot=true, and biosmode=true? | 15:44 |
TheJulia | devananda: I think settings would need to fall into the entire bios setting side of the universe where an operator could advertise a specific trait on nodes based upon bios settings they have applied, and as time goes on we could iterate that | 15:44 |
jroll | now I'm thinking - drivers can already read the traits passed to ironic, for the simple things. the simple proposal just adds some mechanisms to nova to pass additional traits to the virt driver; our side is done. I imagine the more complex --config-data proposal would pass this data in a different way, and I think that's where deploy templates need to come in. | 15:45 |
TheJulia | devananda: validate() code would need to be sufficient to recognize such a condition and prohibit deployment. | 15:45 |
devananda | I don't mean to derail, but I don't think this problem is limited to RAID settings | 15:45 |
jroll | so maybe the deploy templates work needs to propose the ironic side of the --config-data bits, and then we can complete the work in the nova api | 15:46 |
TheJulia | devananda: I absolutely agree with you there, but we can only announce 80-ish possible traits to be scheduled upon anyway. | 15:46 |
TheJulia | devananda: and all of those things are booleans, they exist or not | 15:46 |
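The point about validate() rejecting contradictory boolean traits can be illustrated with a minimal sketch. It is not ironic's actual validate() code: the trait names and the helper functions below are hypothetical and exist only to show the idea of refusing a request such as secure boot together with legacy BIOS mode.

```python
# Hypothetical trait names, used only for illustration; they are not
# taken from ironic or from the os-traits library.
CONFLICTING_TRAITS = [
    frozenset({"CUSTOM_SECURE_BOOT", "CUSTOM_LEGACY_BIOS"}),
]


def find_trait_conflicts(requested_traits):
    """Return each known conflicting set fully present in the request."""
    requested = set(requested_traits)
    return [sorted(group) for group in CONFLICTING_TRAITS
            if group <= requested]


def validate_traits(requested_traits):
    """Raise ValueError if the request combines mutually exclusive traits."""
    conflicts = find_trait_conflicts(requested_traits)
    if conflicts:
        raise ValueError("conflicting traits requested: %s" % conflicts)
```

Each trait stays a plain boolean (present or absent); the interaction between traits lives in a separate table that a validation step consults before deployment.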
devananda | I see | 15:47 |
TheJulia | jroll: I think that is reasonable as well | 15:47 |
*** tssurya has quit IRC | 15:47 | |
* TheJulia thinks we just need to go off and hack on code at this point | 15:47 | |
mgoddard | jroll: I'd be wary of implementing something without buy in from nova on a high level design | 15:47 |
jroll | mgoddard: I didn't see anyone from nova opposed to this proposal for reasons other than "it'll take too long and we need to solve this asap" | 15:48 |
jroll | jaypipes: ^ would you agree with that? | 15:48 |
*** e0ne has quit IRC | 15:49 | |
TheJulia | We also kind of reached a similar point in prior in-person discussions and it felt like we were kind of at that point where an ID value was blessed, and even ironic could receive that in the post to move to active state, and then go look up the data if needed | 15:49 |
jaypipes | jroll: reading back, one sec | 15:50 |
TheJulia | Well, it should likely be set first, that way validate can do the needful to determine if the deployment is actually possible or not | 15:50 |
* TheJulia thinks we still call validate right before deployment anyway, so she is just rambling off into the wind | 15:50 | |
jroll | jaypipes: more pointedly, would you agree that the only resistance to your proposal from nova folks was about the time it will take? | 15:51 |
*** lenka has quit IRC | 15:51 | |
jaypipes | jroll: that was the primary resistance, yes. | 15:53 |
jroll | nod | 15:53 |
jaypipes | jroll: and that resistance was from ironic as well :) | 15:53 |
jroll | sure | 15:53 |
jroll | mgoddard: so I'm not too worried about technical resistance from nova, but we can get a nova spec up sooner than later, so we don't implement it without some sort of buyoff | 15:54 |
mgoddard | jroll: +1 for submitting a nova spec | 15:55 |
TheJulia | jaypipes: I think a good chunk of that was also because we didn't want to go to an extreme to get started, but it feels like (with the current discussion) that a happy place has been obtained | 15:55 |
TheJulia | So 5 more minutes for our scheduled time block, I'd like to jump to etingof's ad-hoc topic if we feel that we're at a happy place on the current topic | 15:56 |
jroll | one more question: who's doing the nova spec? :) | 15:56 |
TheJulia | Additional to that, do we want to get it done before berlin? | 15:57 |
jaypipes | TheJulia: ++ | 15:57 |
TheJulia | If needed, I can take the nova spec on my todo list. | 15:58 |
TheJulia | But I'm totally adding a just for fun review category then :) | 15:58 |
TheJulia | (It's the only way I'll retain sanity) | 15:58 |
* TheJulia takes silence as consensus | 15:59 | |
jroll | TheJulia: take "delegate the nova spec" on your todo list instead :) | 15:59 |
TheJulia | jroll: heh | 15:59 |
TheJulia | etingof: So, https://review.openstack.org/#/c/607949/ :) | 15:59 |
patchbot | patch 607949 - ironic - WIP: Avoid long-pending ipmitool processes - 1 patch set | 15:59 |
etingof | so I think I've discovered that we can't trust ipmitool's timeout/retries values we pass it. if we do, ironic gets blocked for up to 250 secs on a dead node. in the patch I've proposed I am trying to play it safe with ipmitool not to get blocked for long. | 15:59 |
TheJulia | I remember we discussed this a long long long time ago (many years) and couldn't reach consensus because it required really cracking open ipmitool's code to understand what it was doing. | 16:00 |
TheJulia | I'm +1 to fixing the behavior | 16:00 |
etingof | well, I've debugged it a bit | 16:00 |
etingof | it has adaptive delays here and there | 16:01 |
devananda | etingof: just curious, which implementation and version of ipmitool are you having that problem with? | 16:01 |
etingof | but that does not matter, the bad thing is that once we call ipmitool on a dead node, we are blocked out for some time | 16:01 |
etingof | devananda, 1.8 | 16:02 |
devananda | etingof: I mean, which implementation? there are different codebases out there, which different distros package under similar package names, which all create a binary called "ipmitool" | 16:02 |
devananda | again, don't mean to derail, but that was one of the fixes that I found way-back-when ... | 16:03 |
devananda | one of the implementations was a lot less "locky" than others | 16:03 |
etingof | devananda, oh, I used one packaged for centos and fedora | 16:03 |
*** gcb_ has quit IRC | 16:03 | |
devananda | k | 16:03 |
etingof | devananda, but does it matter? do we want to depend on some specific ipmitool? | 16:03 |
etingof | rather than on the one being shipped with a distro | 16:04 |
devananda | if there's a bug in an external package, should we fix it in ironic? | 16:04 |
etingof | devananda, it does not sound like a bug | 16:04 |
devananda | ah | 16:04 |
TheJulia | We kind of already do, I seem to remember we've got comments stating 1.8.15 in our docs. | 16:04 |
etingof | devananda, I take it as a way to deal with broken BMCs | 16:04 |
jroll | etingof: we already depend on a given implementation: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ipmitool.py#L27 | 16:04 |
devananda | we also already implemented backoff timing in ironic, to work around issues like this in external commands. I don't know if that code is still around (/me goes and looks for it) | 16:05 |
*** moshele has joined #openstack-ironic | 16:05 | |
* dtantsur suspects we should wrap up the meeting.. | 16:06 | |
etingof | devananda, that backoff thing does not prevent ipmitool from taking as much time as it wants | 16:06 |
jroll | indeed, I'm hungry | 16:06 |
TheJulia | Yeah... | 16:06 |
TheJulia | Anyone have anything else or we'll call this meeting a wrap? | 16:06 |
* etingof is sorry for being boring tonight | 16:06 | |
TheJulia | etingof: you're not being boring :( | 16:06 |
TheJulia | Okay, calling this meeting over, Thanks everyone! | 16:07 |
TheJulia | #endmeeting | 16:07 |
*** openstack changes topic to "Bare Metal Provisioning | Status: http://bit.ly/ironic-whiteboard | Docs: http://docs.openstack.org/ironic/ | Bugs: https://storyboard.openstack.org/#!/project_group/75 | Contributors are generally present between 6 AM and 12 AM UTC, If we do not answer, please feel free to pose questions to openstack-dev mailing list." | 16:07 | |
openstack | Meeting ended Mon Oct 8 16:07:40 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:07 |
*** jrist has quit IRC | 16:07 | |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/ironic/2018/ironic.2018-10-08-15.00.html | 16:07 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/ironic/2018/ironic.2018-10-08-15.00.txt | 16:07 |
openstack | Log: http://eavesdrop.openstack.org/meetings/ironic/2018/ironic.2018-10-08-15.00.log.html | 16:07 |
*** iurygregory is now known as iurygregory|away | 16:09 | |
TheJulia | etingof: by chance have you generated any numbers behind what reducing the timeouts and taking over some of the logic in the conductor would save across 10, 100, 1000 nodes? | 16:09 |
* TheJulia hopes nobody has 1000 nodes down, but construction equipment does seem to like datacenter power feeds | 16:09 | |
etingof | TheJulia, no, but I can probably factor in the limited size of the job queue for that | 16:10 |
*** kaifeng has quit IRC | 16:10 | |
etingof | to see at what node set conductor would throttle at power sync | 16:11 |
TheJulia | etingof: I think numbers are the best way for us to determine a forward path :) | 16:11 |
TheJulia | That way we can at least all relate it and kind of go from there. | 16:11 |
etingof | TheJulia, some numbers for a single, isolated ipmitool invocation are in the patch | 16:11 |
TheJulia | I saw :) | 16:12 |
*** jrist has joined #openstack-ironic | 16:12 | |
TheJulia | Kind of where I got the idea of maybe we need a little more so we can put some of this into perspective | 16:12 |
etingof | TheJulia, the other thing is that the ipmitool situation confuses operators - they set ipmitool to time out in 60 sec, but that may never happen | 16:13 |
TheJulia | Yeah, it does :( | 16:13 |
TheJulia | We've had a few operators come in and complain about that over the years | 16:14 |
etingof | the third thing is that we might have certain timers in conductor (like node power recovery timing), if power sync timeout does not keep up with the settings, those other features might not work as expected | 16:15 |
TheJulia | or take far longer than expected in an ideal universe | 16:17 |
*** jrist has quit IRC | 16:17 | |
TheJulia | devananda: by chance did you see if the backoff code was still present? | 16:17 |
devananda | etingof: after reading your WIP patch, I like the approach, defaulting to much lower timeouts, but in the stated case of a BMC misbehaving and ipmitool getting kinda stuck ... | 16:17 |
devananda | TheJulia: haven't found it yet | 16:17 |
TheJulia | k, I remember that whole path is kind of confusing too :( | 16:17 |
etingof | there is the code that makes sure that we won't invoke ipmitool again if we are by the configured timeout | 16:18 |
TheJulia | Well, ipmitool is going to get stuck if the bmc is offline | 16:18 |
TheJulia | at least until its internal timeout is reached | 16:18 |
devananda | it looks like we now check if ipmitool supports the -N -R options, but don't handle timing ourselves in the situation where it doesn't | 16:18 |
etingof | but that only works if ipmitool is done | 16:18 |
etingof | devananda, let me show you what I mean | 16:19 |
devananda | to handle dead BMC // stuck ipmitool, I think we would need a reaper daemon, or a periodic task in conductor that invokes 'kill' | 16:19 |
devananda | or something along those lines | 16:19 |
devananda | I've avoided using fedora's package of ipmitool for years because it used to get badly stuck ... sounds like you're still running into that issue | 16:19 |
TheJulia | I think several operators would greatly appreciate a periodic task to hunt/kill stalled/frozen ipmi processes | 16:20 |
*** dtantsur is now known as dtantsur|afk | 16:20 | |
dtantsur|afk | o/ | 16:20 |
TheJulia | goodnight dtantsur|afk | 16:21 |
etingof | devananda, https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ipmitool.py#L451 | 16:21 |
devananda | as an aside, there are 3 separate implementations of ipmitool that I found in a minute of googling | 16:21 |
devananda | freeipmi, openipmi, and ipmitool/ipmitool | 16:22 |
devananda | different operators' differing experiences stem from this, to some degree | 16:22 |
devananda | it may be worth adding notes about that in our docs, not just inline in the code | 16:22 |
etingof | devananda, are those ipmi protocol implementations or `ipmitool` binary implementations? | 16:22 |
devananda | t | 16:22 |
devananda | the latter | 16:23 |
* etingof have seen ipmiutil | 16:23 | |
etingof | devananda, http://ipmiutil.sourceforge.net/docs/ipmisw-compare.htm | 16:23 |
TheJulia | looks like we explicitly reference versioning for ipmitool/ipmitool, but don't explicitly note the caveat | 16:24 |
devananda | TheJulia: that's what i remember, and looking at that project on github, it looks familiar | 16:24 |
etingof | TheJulia, we can probably go the forced ipmitool extermination way | 16:25 |
*** gabys has quit IRC | 16:25 | |
TheJulia | we even link to the correct utility | 16:25 |
devananda | https://github.com/ipmitool/ipmitool isn't on that list, etingof, but is the one we depend on | 16:25 |
TheJulia | note: there is a public note someplace about ipmitool moving from sourceforge to github | 16:25 |
*** gabys has joined #openstack-ironic | 16:25 | |
etingof | devananda, it is under the name https://sourceforge.net/projects/ipmitool/ | 16:25 |
devananda | ooh. I see now | 16:26 |
devananda | yeah, nice note once I clicked through to SF | 16:26 |
* etingof goes offline for a brief while | 16:27 | |
devananda | https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipmitool.py#n445 | 16:28 |
*** baha has quit IRC | 16:28 | |
devananda | I think the issue is this execute is a blocking call, no? | 16:28 |
TheJulia | devananda: basically, yes | 16:29 |
devananda | if ipmitool takes longer than expected to return, that thread is stuck, regardless of how narrow the timing parameters were ... so we would need a periodic task to "reap" stuck processes (assuming that stuck processes really is the concern) | 16:29 |
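The blocking-call concern can be sketched with the standard library alone. This is a minimal illustration, not ironic's code path (ironic runs under greenthreads and uses its own execute helper, so behaviour there may differ); `run_with_deadline` is a hypothetical name. It relies on `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` once the deadline passes, so the caller is never blocked longer than the deadline regardless of the retry and timeout options handed to the child.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_secs):
    """Run cmd; return (returncode, stdout), or (None, None) on timeout.

    subprocess.run() kills the child process when the timeout expires,
    so a hung command cannot hold the caller past deadline_secs.
    """
    try:
        completed = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=deadline_secs,
        )
    except subprocess.TimeoutExpired:
        return None, None
    return completed.returncode, completed.stdout


# A stand-in for a command that hangs, using the Python interpreter
# itself rather than a real ipmitool invocation.
HANGING_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]
```

Calling `run_with_deadline(HANGING_CMD, 0.5)` returns after roughly half a second instead of five.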
* TheJulia wonders if this is where we need a real dlm | 16:30 | |
* TheJulia ducks | 16:30 | |
*** gabys has quit IRC | 16:30 | |
devananda | :P | 16:30 |
* TheJulia has already started on that code and even got a +1 from dmitry :) | 16:30 | |
TheJulia | devananda: I think we actually kind of need both, try and expediently loop through (which would be bad)... although there is the alternative threading module that someone pointed us to at the PTG that might help.... ?simplify? (well, unlikely, solve is likely the right word) the threading performance issues and lack of additional cpu core use (since greenthreads) | 16:32 |
devananda | I have a knee-jerk reaction when someone says "use another python threading model, it'll solve everything" | 16:33 |
*** gabys has joined #openstack-ironic | 16:33 | |
TheJulia | It won't, that is for sure | 16:34 |
*** etingof has quit IRC | 16:34 | |
TheJulia | but it can definitely help solve some of the issues and operator complaints | 16:34 |
devananda | I don't see any problems with reducing the default timing and doing incremental backoff timing, like in etingof's proposal. | 16:34 |
devananda | but I also don't think it will solve the problem it claims to address | 16:34 |
TheJulia | I think the issue is larger than just initial timing for execution | 16:35 |
devananda | yes | 16:35 |
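The idea of lower default timeouts with incremental backoff can be sketched as a schedule of per-attempt timeouts that never exceeds an overall budget. This is not taken from etingof's patch; the function name, the starting timeout, and the doubling factor are assumptions chosen for illustration.

```python
def backoff_timeouts(total_budget, first_timeout=1.0, factor=2.0):
    """Return per-attempt timeouts that grow by `factor` per attempt.

    The timeouts sum to exactly total_budget; the last attempt is
    shortened so the overall budget is never exceeded.
    """
    if total_budget <= 0 or first_timeout <= 0 or factor < 1:
        raise ValueError("budget and first timeout must be positive, "
                         "factor must be at least 1")
    timeouts = []
    remaining = float(total_budget)
    current = float(first_timeout)
    while remaining > 0:
        step = min(current, remaining)
        timeouts.append(step)
        remaining -= step
        current *= factor
    return timeouts
```

With a 10 second budget this gives attempts of 1, 2, 4 and 3 seconds: a responsive BMC answers on the first short attempt, while a dead one costs at most the configured total.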
*** stendulker has quit IRC | 16:37 | |
TheJulia | If we make a venn diagram of what the issues are, I think etingof's context is one circle that overlaps with your circle, which ultimately too overlaps with my circle of context | 16:37 |
devananda | I like diagrams | 16:37 |
TheJulia | Everyone likes digrams! | 16:38 |
TheJulia | diagrams | 16:38 |
*** gabys has quit IRC | 16:38 | |
* TheJulia just can't type | 16:38 | |
*** etingof has joined #openstack-ironic | 16:42 | |
TheJulia | welcome back etingof | 16:43 |
etingof | ;) | 16:43 |
etingof | did I miss anything crucial? | 16:43 |
TheJulia | etingof: so I think we're at a point where we believe the contextual venn diagram to be three separate circles with some overlap | 16:43 |
etingof | sounds promising! | 16:44 |
etingof | what are those cycles? | 16:45 |
TheJulia | one is the pure blocking length of time for ipmitool, the other is ipmitool actually becoming stuck hard and never returning, the third is "why does ironic only use one cpu, make it go faster!" | 16:46 |
TheJulia | s/cpu/cpu core/ | 16:47 |
*** gyee has joined #openstack-ironic | 16:48 | |
etingof | that is: 1) tackle ipmitool timeouts to limit its blocking time 2) kill the hopelessly blocked ipmitool and 3) fork conductor process ? | 16:48 |
TheJulia | 3) I think is going to be a lot of small things and maybe a few efforts as time goes on, but there was that library that someone mentioned which could help things to fork for periodics... | 16:50 |
TheJulia | but otherwise yes | 16:50 |
etingof | I probably missed that library being mentioned | 16:50 |
TheJulia | etingof: If I'm remembering correctly, it is mentioned on the etherpad from when we were having that discussion | 16:51 |
etingof | do you think I should propose a patch that would kill long-stuck ipmitool processes? | 16:51 |
etingof | which would qualify as circle #2 while circle #1 is already proposed | 16:52 |
TheJulia | etingof: as a separate worker.... I guess that should work, we would likely need to check lock status in real time though (hey, that pluggable locking patch!) | 16:52 |
etingof | does it need to be a separate worker given that we seem to wait on the forked ipmitool child to exit? | 16:53 |
etingof | I mean we (as process parent) could kill stuck child, no? | 16:53 |
TheJulia | etingof: aiui, a child process being launched halts the thread, so that worker should never be able to become involved | 16:55 |
TheJulia | hence why I was thinking a new worker | 16:55 |
TheJulia | running inside of the same parent (I think that should work....) | 16:55 |
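The "separate worker inside the same parent" idea can be sketched with a watchdog timer thread that kills the child while the launching thread is still blocked in wait(). This is a plain-threads illustration under the assumption that a second thread can run while the first one waits; it does not model the conductor's greenthread workers or its locking, and `run_with_watchdog` is a hypothetical name.

```python
import subprocess
import sys
import threading


def run_with_watchdog(cmd, max_secs):
    """Run cmd; a timer thread kills it if it outlives max_secs.

    Returns (returncode, was_killed).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    killed = threading.Event()

    def _kill_if_running():
        # Runs in the watchdog thread, not in the blocked caller.
        if proc.poll() is None:
            killed.set()
            proc.kill()

    watchdog = threading.Timer(max_secs, _kill_if_running)
    watchdog.start()
    try:
        returncode = proc.wait()  # the calling thread blocks here
    finally:
        watchdog.cancel()
    return returncode, killed.is_set()
```

The thread that launched the child never has to wake up to enforce the limit; the watchdog does it on its behalf.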
etingof | alright, I will take a look | 16:55 |
devananda | TheJulia: did work ever get completed to move periodictasks into a separate process? | 16:57 |
etingof | the other thing that worries me is this - currently each ipmi session needs a dedicated ipmitool process. it would probably be more scalable if a single ipmitool-like process would be able to handle more than one BMC at a time | 16:57 |
TheJulia | devananda: no, I started picking that concept back up recently though which is where some of the ptg discussion that I was referencing came from | 16:58 |
openstackgerrit | Merged openstack/ironic master: Add functionality for individual cleaning on nodes https://review.openstack.org/586277 | 16:58 |
*** dougsz has quit IRC | 16:58 | |
TheJulia | devananda: which is also why I started hacking on improving locking support | 16:58 |
devananda | etingof: ah, so there was the pyghmi project, which had the goal of becoming our defacto pure-python parallel execution of ipmi commands | 16:58 |
devananda | etingof: but it never gained traction.... | 16:58 |
etingof | would it make sense to research if we could maybe wrap libipmi (the lib behind ipmitool afaik) in some async or MT way? | 16:59 |
devananda | TheJulia: gotcha. that might be something I'd be interested in, and/or how that relates to splitting the conductor up a bit | 16:59 |
TheJulia | etingof: that exec of ipmitool is a HUGE computational and disk cost | 16:59 |
TheJulia | (well, cache hit cost, but if cache is low, then disk is still hit | 16:59 |
TheJulia | ) | 16:59 |
devananda | TheJulia: any thoughts on reviving pyghmi? | 16:59 |
etingof | ideally, we would have libipmi being called from python as a green thread | 17:00 |
TheJulia | Jared has been receptive to patches and changes, I think the issue that ultimately resulted in the intree ipminative power interface getting pulled was a lack of real use and then once tried to use it became clear it was not in a good state. | 17:00 |
etingof | I think the problem with anything other than ipmitool is that other ipmi implementations are less compatible with existing bmcs | 17:00 |
devananda | I completely agree on that exec() call being a huge cost, it would be great to remove it | 17:00 |
devananda | etingof: and the general pervasive availability of "ipmitool" as both a system call, and something every operator is familiar with, give it momentum // make anything else have trouble gaining adoption. | 17:01 |
etingof | if we are to believe Tanenbaum, unix process creation is generally expensive | 17:02 |
*** trown is now known as trown|lunch | 17:02 | |
devananda | yes | 17:03 |
etingof | devananda, right, that's why if libipmi is as tenacious as ipmitool is, maybe python bindings to libipmi would let us do async ipmi calls from conductor? | 17:03 |
devananda | also I would like to see more containerization of the control plane in general | 17:03 |
*** baha has joined #openstack-ironic | 17:04 | |
devananda | ie, avoiding packaging ipmitool with ironic-conductor would be good | 17:05 |
etingof | I think we are likely to stay dependent on ipmitool/libipmi because it is supposedly more compatible with bmcs than pyghmi might be... | 17:06 |
etingof | I can imagine bmc vendors testing bmc firmware against ipmitool | 17:07 |
* TheJulia can confirm this happens | 17:07 | |
* etingof context switches to homework, but could be preempted | 17:09 | |
TheJulia | enjoy | 17:12 |
* TheJulia thinks she is finally going to take a shower | 17:12 | |
moshele | TheJulia: hi | 17:15 |
TheJulia | Hi moshele, sorry I've not replied to the email you sent me, I've been extremely busy so far this morning | 17:16 |
moshele | TheJulia: I updated the smart nic spec https://review.openstack.org/#/c/582767/ as I understood from isaku. I just wanted to explain that we are working with intel to provide a solution that would fit all use cases | 17:17 |
patchbot | patch 582767 - ironic-specs - Add Support for Smart NIC - 10 patch sets | 17:17 |
moshele | TheJulia: no worries | 17:18 |
TheJulia | moshele: Understood, I think the disconnect is on the actual use cases supported by behaviors | 17:19 |
TheJulia | behaviors of the existing components that is | 17:19 |
TheJulia | we should still sync up though, but I'll try to review the spec after I get some lunch | 17:19 |
moshele | TheJulia: ok cool, let me know when it is a good time to sync (I guess it will be tomorrow, it is too late in my timezone) | 17:23 |
moshele | TheJulia: bon appetit! | 17:24 |
*** e0ne has joined #openstack-ironic | 17:24 | |
*** moshele has quit IRC | 17:25 | |
*** S4ren has quit IRC | 17:26 | |
*** e0ne has quit IRC | 17:29 | |
*** moshele has joined #openstack-ironic | 17:50 | |
*** moshele has quit IRC | 17:51 | |
*** e0ne has joined #openstack-ironic | 17:54 | |
openstackgerrit | Merged openstack/ironic stable/rocky: Fixes a race condition in the hash ring code https://review.openstack.org/608675 | 18:03 |
*** trown|lunch is now known as trown | 18:11 | |
*** slagle has joined #openstack-ironic | 18:52 | |
*** tssurya has joined #openstack-ironic | 18:53 | |
*** sambetts|afk has quit IRC | 19:07 | |
*** sambetts_ has joined #openstack-ironic | 19:10 | |
openstackgerrit | Merged openstack/ironic stable/queens: Fixes a race condition in the hash ring code https://review.openstack.org/608678 | 19:10 |
TheJulia | jroll: can you pull your -2 from https://review.openstack.org/#/c/579583 ? | 19:33 |
patchbot | patch 579583 - ironic-specs - Add virtual Bare Metal Clusters spec - 10 patch sets | 19:33 |
jroll | TheJulia: done | 19:33 |
TheJulia | Thanks! | 19:33 |
jroll | are we calling it agreed, or? | 19:34 |
TheJulia | I think leaving open for a couple days then declaring agreed | 19:35 |
jroll | okay | 19:35 |
TheJulia | but with a -2, nobody is going to review it :) | 19:35 |
jroll | hopefully rloo has a chance to look tomorrow | 19:35 |
jroll | hopefully they'd read the comment, or the email thread | 19:35 |
jroll | ¯\_(ツ)_/¯ | 19:35 |
*** sthussey has joined #openstack-ironic | 19:36 | |
TheJulia | Yeah, hopefully | 19:40 |
*** slagle has quit IRC | 19:40 | |
*** dciabrin has quit IRC | 19:50 | |
*** jtomasek has quit IRC | 20:13 | |
*** e0ne has quit IRC | 20:19 | |
*** slagle has joined #openstack-ironic | 20:20 | |
*** e0ne has joined #openstack-ironic | 20:20 | |
*** tssurya has quit IRC | 20:28 | |
*** slagle has quit IRC | 20:37 | |
*** e0ne has quit IRC | 20:48 | |
*** pcaruana has quit IRC | 20:49 | |
*** trown is now known as trown|outtypewww | 20:59 | |
*** lenka has joined #openstack-ironic | 21:02 | |
*** sai_p has joined #openstack-ironic | 21:04 | |
*** baha has quit IRC | 21:08 | |
*** cdearborn has quit IRC | 21:39 | |
*** rh-jelabarre has quit IRC | 21:48 | |
*** ijw has quit IRC | 22:19 | |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Create base pxe class https://review.openstack.org/608786 | 22:28 |
*** munimeha1 has quit IRC | 22:50 | |
*** rcernin has joined #openstack-ironic | 22:50 | |
*** gyee has quit IRC | 23:57 |