*** zaneb has quit IRC | 02:03 | |
*** Qianbiao has joined #openstack-ironic | 02:04 | |
*** zaneb has joined #openstack-ironic | 02:04 | |
openstackgerrit | Ankit Kumar proposed openstack/ironic master: Adding changes for iso less vmedia support https://review.opendev.org/752001 | 03:47 |
*** zzzeek has quit IRC | 04:15 | |
*** zzzeek has joined #openstack-ironic | 04:17 | |
*** uzumaki has joined #openstack-ironic | 04:20 | |
*** rcernin has quit IRC | 04:31 | |
openstackgerrit | Pete Zaitcev proposed openstack/virtualbmc master: Drop redundant milliseconds from logging https://review.opendev.org/752850 | 04:35 |
*** rcernin has joined #openstack-ironic | 04:40 | |
*** jawad_axd has joined #openstack-ironic | 05:11 | |
*** jawad_axd has quit IRC | 05:15 | |
*** abdysn has joined #openstack-ironic | 05:20 | |
*** ociuhandu has joined #openstack-ironic | 05:35 | |
*** rcernin has quit IRC | 05:39 | |
*** ociuhandu has quit IRC | 05:40 | |
*** rcernin has joined #openstack-ironic | 06:01 | |
iurygregory | good morning Ironic | 06:04 |
*** Lucas_Gray has joined #openstack-ironic | 06:08 | |
*** rcernin has quit IRC | 06:20 | |
*** Lucas_Gray has quit IRC | 06:22 | |
*** jtomasek has joined #openstack-ironic | 06:44 | |
janders | good morning iurygregory | 06:45 |
iurygregory | hey janders o/ | 06:45 |
janders | hey o/ | 06:45 |
*** bfournie has quit IRC | 06:49 | |
Qianbiao | Morning ironic. | 06:53 |
Qianbiao | Morning iurygregory, janders | 06:53 |
iurygregory | hey Qianbiao o/ | 06:54 |
janders | hey Qianbiao | 06:54 |
Qianbiao | o/ | 06:54 |
arne_wiebalck | Good morning janders Qianbiao iurygregory and ironic! | 06:55 |
iurygregory | hey arne_wiebalck o/ | 06:55 |
Qianbiao | arne_wiebalck morning o/ | 06:56 |
*** tosky has joined #openstack-ironic | 07:03 | |
*** bfournie has joined #openstack-ironic | 07:07 | |
*** rcernin has joined #openstack-ironic | 07:39 | |
*** jtomasek has quit IRC | 07:41 | |
*** jtomasek has joined #openstack-ironic | 07:50 | |
*** rcernin has quit IRC | 07:52 | |
*** lucasagomes has joined #openstack-ironic | 08:00 | |
*** jawad_axd has joined #openstack-ironic | 08:00 | |
*** Lucas_Gray has joined #openstack-ironic | 08:13 | |
*** alexmcleod has joined #openstack-ironic | 08:24 | |
*** derekh has joined #openstack-ironic | 08:42 | |
Qianbiao | Hello, if i raise an error from ml2 mechanism driver's update_port_postcommit, will it break the ironic provision process. | 08:42 |
Qianbiao | will it result in instance ERROR | 08:43 |
iurygregory | well if an error occur it may put the instance in ERROR | 08:44 |
Qianbiao | nice, this is what i want. | 08:45 |
*** ociuhandu has joined #openstack-ironic | 08:53 | |
*** k_mouza has joined #openstack-ironic | 08:54 | |
*** Abdallahyas has joined #openstack-ironic | 09:34 | |
*** jtomasek has quit IRC | 09:35 | |
*** abdysn has quit IRC | 09:38 | |
*** jtomasek has joined #openstack-ironic | 09:48 | |
*** sshnaidm|afk is now known as sshnaidm | 09:52 | |
uzumaki | hola senor iurygregory ! how u doing? | 10:08 |
iurygregory | uzumaki, hey o/ doing good | 10:08 |
uzumaki | anybody has an idea what happens if both hardware and software RAID are provided for a node? | 10:09 |
uzumaki | iurygregory, how's the weather today? winter started rolling in yet or not? | 10:09 |
iurygregory | nah weather is crazy max 24 min 13... | 10:10 |
*** Qianbiao has quit IRC | 10:14 | |
uzumaki | iurygregory, oh boy! | 10:16 |
uzumaki | iurygregory, what's the difference between deploy and clean steps? Is one better than the other? Just wondering | 10:16 |
*** hjensas|afk is now known as hjensas | 10:17 | |
iurygregory | well afaik one is done during deployment and other during cleaning | 10:17 |
iurygregory | I wouldn't say one is better than the other.. | 10:18 |
uzumaki | That's what I thought.. it could be a choice for the operator, to use one or the other, I suppose.. Depending on how it fits their use cases iurygregory | 10:21 |
dtantsur | morning ironic | 10:23 |
dtantsur | so, this time it's Monday, yeah? | 10:23 |
janders | good morning dtantsur | 10:23 |
janders | yeah... that seems to be the general consensus | 10:24 |
iurygregory | morning dtantsur | 10:26 |
*** Wryhder has joined #openstack-ironic | 10:28 | |
*** Lucas_Gray has quit IRC | 10:29 | |
*** Wryhder is now known as Lucas_Gray | 10:29 | |
janders | what is the difference between start_managed_inspection(task) and _start_inspection(node_uuid, context)? | 10:48 |
janders | (context: https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/inspector.py ) | 10:48 |
janders | first guess: former has to do with OOB and the latter is inspector, but a quick test doesn't seem to confirm that guess, hence the question. I might be seriously confused though! :) | 10:50 |
*** Abdallahyas has quit IRC | 10:50 | |
*** abdysn has joined #openstack-ironic | 10:50 | |
iurygregory | I may be wrong, my mind would say that one ironic manages the boot and the other nope... | 10:59 |
janders | that makes sense - thank you iurygregory! | 10:59 |
*** Lucas_Gray has quit IRC | 11:13 | |
dtantsur | yep. managed inspection goes through the ironic's boot management interface, non-managed - via static DHCP/PXE configuration. | 11:21 |
uzumaki | morning dtantsur, janders ! o/ | 11:23 |
dtantsur | \o | 11:24 |
uzumaki | how you doing? | 11:24 |
*** Qianbiao has joined #openstack-ironic | 11:25 | |
*** jawad_axd has quit IRC | 11:25 | |
dtantsur | pretty okay, you? | 11:26 |
*** jawad_axd has joined #openstack-ironic | 11:26 | |
uzumaki | pretty okay? strange.. :D I'm fine, the usual boring monday morning, post-weekend trance | 11:26 |
uzumaki | what makes you "pretty okay"? xD dtantsur | 11:27 |
dtantsur | a difficult bouldering session this morning, I need to grow new skin on my palms :) | 11:27 |
janders | dtantsur thank you | 11:27 |
uzumaki | bouldering session? since when do they need programmers in building pyramids :D | 11:29 |
dtantsur | we build everything, as long as it can be built from duct tape and chewing gum | 11:30 |
dtantsur | :) | 11:30 |
dtantsur | https://en.wikipedia.org/wiki/Bouldering | 11:30 |
openstackgerrit | Verification of a change to openstack/ironic-python-agent failed: Generate a TLS certificate and send it to ironic https://review.opendev.org/749930 | 11:35 |
dtantsur | SIGH | 11:36 |
dtantsur | 10 rechecks and counting | 11:36 |
uzumaki | dtantsur, so you just felt like Ethan Hunt from Mission Impossible 2, when conquering those rocks? | 11:38 |
dtantsur | I felt like a sack of potatoes :D but I'm a very beginner | 11:39 |
uzumaki | dtantsur, with all the rag-doll physics xD (the ability to realistically fall to the ground) | 11:39 |
janders | see you tomorrow Ironic | 11:43 |
janders | o/ | 11:43 |
uzumaki | au revoir janders ! | 11:43 |
iurygregory | dtantsur, duct tape and chewing gum OMG | 11:50 |
dtantsur | aka IT industry | 11:50 |
uzumaki | iurygregory, ikr? | 11:50 |
iurygregory | hahahaha | 11:50 |
iurygregory | OMG | 11:50 |
*** Abdallahyas has joined #openstack-ironic | 11:53 | |
*** rh-jelabarre has joined #openstack-ironic | 11:55 | |
*** abdysn has quit IRC | 11:56 | |
iurygregory | dtantsur, I've created the RFE https://storyboard.openstack.org/#!/story/2008171 not much details since I need to look at some introspection data to see if it would be useful =) | 11:58 |
openstackgerrit | QianBiao Ng proposed openstack/ironic stable/ussuri: opt: Enhance old stable branches to use latest python-ibmcclient https://review.opendev.org/752006 | 11:58 |
iurygregory | something else I was thinking is provide some sort of cli that could generate the templates for alarms in prometheus so the user could just update ... | 11:59 |
dtantsur | iurygregory: cool, yeah. I guess it would be useful to specify if you expect this to work with the active node introspection | 11:59 |
iurygregory | ack | 12:01 |
uzumaki | term | 12:02 |
uzumaki | xD | 12:03 |
Qianbiao | dtantsur stevebaker JayF it seems there are different opinions on this patch: https://review.opendev.org/#/c/752006/ | 12:06 |
patchbot | patch 752006 - ironic (stable/ussuri) - opt: Enhance old stable branches to use latest pyt... - 7 patch sets | 12:06 |
Qianbiao | basically, I agree with both opinions. I agree with dtantsur more in the long term. The mechanism can support complex hardware env. | 12:08 |
*** uzumaki has quit IRC | 12:10 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic master: Deprecate the iscsi deploy interface https://review.opendev.org/750204 | 12:11 |
*** Abdallahyas has quit IRC | 12:13 | |
*** abdysn has joined #openstack-ironic | 12:13 | |
*** dougsz has joined #openstack-ironic | 12:16 | |
*** priteau has joined #openstack-ironic | 12:22 | |
arne_wiebalck | dtantsur: I tried to track down the missing ESPs when doing UEFI s/w RAID: https://storyboard.openstack.org/#!/story/2008164 and here's what I found: | 12:32 |
arne_wiebalck | It seems that when we reboot after cleaning, i.e. during deploy, the RAID cannot be fully assembled (as one of the disks is "non-fresh"). Consequently, Ironic cannot identify all holder disks and leaves some of them without ESP. The unfresh device may come from an unclean shutdown of the RAID, and from what I see at the end of cleaning we trigger a power off (i.e. no gentle RAID shutdown) ... | 12:32 |
arne_wiebalck | In order to address this I was thinking to shut down the RAID devices right after creation, so that they are stopped in a clean way (even if not sync'ed) ... does this all sound sensible? | 12:32 |
*** jawad_ax_ has joined #openstack-ironic | 12:32 | |
maelk | Hi! Does it make sense to try to offload the json_rpc TLS of a conductor to a local reverse proxy ? Let's say I have two conductors. If I configure them to listen on a localhost port only, and use the reverse proxy to expose it. My guess is that this would not work if I have multiple conductors and it would be much better to move to some AMQP based approach ? Is this correct ? | 12:35 |
*** jawad_axd has quit IRC | 12:36 | |
*** jawad_axd has joined #openstack-ironic | 12:42 | |
*** jawad_ax_ has quit IRC | 12:46 | |
*** Goneri has joined #openstack-ironic | 12:50 | |
*** martalais has joined #openstack-ironic | 12:51 | |
martalais | Good morning, Ironic :) | 12:52 |
iurygregory | morning martalais o/ | 12:54 |
dtantsur | maelk: hi, what are you trying to achieve by that? | 13:01 |
dtantsur | arne_wiebalck: won't it be more useful to change cleaning to graceful shutdown? | 13:02 |
dtantsur | there may be clean steps that need to access the root device, if you stop RAID, it won't be possible | 13:02 |
*** rloo has joined #openstack-ironic | 13:03 | |
dtantsur | TheJulia: I'm staring at the managed-non-standalone failures on your patch (and not only there), and it's weird. 1.5G of RAM disappear in unclear direction, and qemu starts OOMing. see the whiteboard for more findings, help welcome. | 13:04 |
TheJulia | dtantsur: ugh | 13:05 |
dtantsur | wow, I didn't expect you to be already awake :) good morning! | 13:06 |
TheJulia | I just got up | 13:06 |
dtantsur | now it's happy Monday, as folks have told me already | 13:06 |
maelk | dtantsur you made a comment on a PR in Metal3 that it would be better to offload the TLS to httpd for example. It's quite straight forward for the API, but I'm a bit unclear for the Json RPC part. Just trying to understand the ins and outs to see if it is worth it, or if we were in such a case (multiple conductors) we'd anyways need to change the RPC solution | 13:07 |
dtantsur | maelk: I think it's fine to use built-in TLS for now for non-user-facing stuff. | 13:07 |
dtantsur | or even: it's fine to use built-in TLS for now. let's just keep in mind that it's not 100% rock solid, and we may want to replace it as/if we scale. | 13:08 |
dtantsur | JSON RPC should work fine with multiple conductors. AMQP will bring its own bunch of problems. | 13:08 |
dtantsur | the only condition with JSON RPC is that conductor host names must be accessible (i.e. not fake) | 13:09 |
maelk | ok, but then it uses the port configured for the local instance to connect to others, no ? so in practice that forbids TLS offloading to a reverse proxy | 13:09 |
iurygregory | good morning TheJulia =) | 13:10 |
TheJulia | coffee brewing | 13:10 |
*** cdearborn has joined #openstack-ironic | 13:11 | |
dtantsur | maelk: correct. we could provide an override for that. | 13:11 |
dtantsur | the problems are possible to overcome, but it's probably not worth it right here and now, unless you already hit issues with the built-in TLS implementation | 13:12 |
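The port issue discussed above can be sketched as follows. This is only an illustration of the idea of an override, not ironic's actual JSON RPC client: the function name `conductor_rpc_url` and the `port_override` parameter are hypothetical, as is the example host name.

```python
def conductor_rpc_url(host, local_port, use_tls=True, port_override=None):
    """Build the URL used to reach a peer conductor over JSON RPC.

    Hypothetical sketch: by default the client reuses the port the local
    instance listens on, which is why a reverse proxy on another port
    cannot be reached. An explicit override would lift that restriction.
    """
    port = local_port if port_override is None else port_override
    scheme = "https" if use_tls else "http"
    return f"{scheme}://{host}:{port}"


# Without an override, the local listen port is assumed for the peer too.
direct = conductor_rpc_url("conductor-2.example.com", 8089)

# With an override, the client can target the port exposed by the proxy.
proxied = conductor_rpc_url("conductor-2.example.com", 8089, port_override=443)
```

Note that the host name still has to be a real, reachable one, as pointed out earlier in the conversation.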
dtantsur | a much bigger blocker for multi-node ironic is ironic-inspector.. | 13:12 |
maelk | I'm not hitting a problem, just trying to understand the details. Thank you for the explanations! | 13:14 |
*** jawad_axd has quit IRC | 13:21 | |
*** jawad_axd has joined #openstack-ironic | 13:22 | |
TheJulia | out of curiosity, has anyone gone through the whiteboard this morning and updated patch statuses for the review priorities? | 13:34 |
iurygregory | I've updated a few things on friday.. | 13:35 |
openstackgerrit | Zane Bitter proposed openstack/sushy-tools master: Fix race condition initialising persistent dict https://review.opendev.org/752953 | 13:37 |
arne_wiebalck | dtantsur: you mean SOFT_POWER_OFF rather than POWER_OFF ? | 13:38 |
arne_wiebalck | dtantsur: I was thinking this as well, but then thought it might be better to do the changes close to the RAID creation ... but your root device point is certainly valid | 13:39 |
Qianbiao | <arne_wiebalck> in IBMC, the raid configuration is always slower than it shows. | 13:41 |
Qianbiao | like it says raid configuration done, but indeed, the backend is still running. | 13:41 |
*** tzumainn has joined #openstack-ironic | 13:42 | |
arne_wiebalck | Qianbiao: the fact that the RAID is not synced is not so much of an issue, I think | 13:42 |
dtantsur | arne_wiebalck: technically.. it's interesting | 13:44 |
dtantsur | I don't think we have a notion of "shut it down gently first" | 13:44 |
arne_wiebalck | dtantsur: I can try both | 13:44 |
dtantsur | we could do it with SOFT_POWER_OFF, maybe worth giving a try | 13:44 |
arne_wiebalck | that seems the easiest to test | 13:44 |
arne_wiebalck | https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/deploy_utils.py#L684 | 13:45 |
arne_wiebalck | dtantsur: here^^ ? | 13:45 |
dtantsur | arne_wiebalck: not only there, there is also reboot between cleanings | 13:45 |
dtantsur | but this is the key place | 13:45 |
dtantsur | note that we'll have to do fall back to hard power off if we cannot soft power off | 13:46 |
arne_wiebalck | dtantsur: right ... ok, let me test this to see if the failure rate goes down | 13:46 |
arne_wiebalck | dtantsur: can you think of any reason why we do power_off here in the first place? Seems pretty harsh. | 13:47 |
dtantsur | arne_wiebalck: you mean, why not soft_power_off? | 13:47 |
arne_wiebalck | dtantsur: exactly | 13:47 |
dtantsur | arne_wiebalck: this code was written long before soft power actions appeared | 13:47 |
dtantsur | and not all drivers support them, so you'll have to handle it too | 13:48 |
* arne_wiebalck is caught in his IPMI world, it seems | 13:48 | |
arne_wiebalck | dtantsur: ok, thanks, I'll do some testing and come back after | 13:48 |
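The "soft power off first, hard power off as a fallback" idea from the exchange above could look roughly like this. It is a minimal sketch under stated assumptions: the driver classes, the exception names and `power_off_gracefully` are all hypothetical stand-ins, not ironic's power interface.

```python
class SoftPowerUnsupported(Exception):
    """Raised by a (hypothetical) driver that has no soft power action."""


class SoftPowerTimeout(Exception):
    """Raised when the node did not shut down within the timeout."""


def power_off_gracefully(driver, timeout=60):
    """Try a soft power off first and fall back to a hard power off.

    Returns the name of the action that finally succeeded.
    """
    try:
        driver.soft_power_off(timeout=timeout)
        return "soft power off"
    except (SoftPowerUnsupported, SoftPowerTimeout):
        # Not all drivers support soft power actions, and the OS may not
        # react in time, so keep the old hard power off as a fallback.
        driver.power_off()
        return "power off"


class FakeSoftCapableDriver:
    """Stand-in driver whose soft power off always works."""

    def __init__(self):
        self.calls = []

    def soft_power_off(self, timeout):
        self.calls.append("soft")

    def power_off(self):
        self.calls.append("hard")


class FakeLegacyDriver(FakeSoftCapableDriver):
    """Stand-in driver that predates soft power actions."""

    def soft_power_off(self, timeout):
        self.calls.append("soft")
        raise SoftPowerUnsupported()
```

A graceful shutdown gives the OS a chance to stop the software RAID cleanly, which is the hypothesis arne_wiebalck wants to test; the fallback keeps the behaviour unchanged for drivers without soft power support.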
*** sdanni has joined #openstack-ironic | 13:49 | |
*** martalais has left #openstack-ironic | 13:50 | |
*** martalais has joined #openstack-ironic | 13:52 | |
openstackgerrit | Richard G. Pioso proposed openstack/ironic master: Fix redfish BIOS to use @Redfish.SettingsApplyTime https://review.opendev.org/752614 | 13:53 |
*** sdanni has quit IRC | 13:54 | |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Minor agent version code cleanup https://review.opendev.org/749552 | 13:54 |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Add Redfish BIOS interface to idrac HW type https://review.opendev.org/749240 | 13:56 |
TheJulia | dtantsur: the qemu oom stuff you mentioned, is that resulting in truncated console logs? | 13:57 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic master: Deprecate the iscsi deploy interface https://review.opendev.org/750204 | 14:00 |
dtantsur | TheJulia: it's resulting in virtualbmc freaking out | 14:00 |
TheJulia | hmm | 14:00 |
*** uzumaki has joined #openstack-ironic | 14:01 | |
TheJulia | dtantsur: is the gate status correct on the etherpad? | 14:24 |
TheJulia | I guess it is ipa/inspector only | 14:24 |
*** martalais has left #openstack-ironic | 14:27 | |
iurygregory | yay we can release IPE =) after the bugfix is merged =D | 14:28 |
*** ajya|afk is now known as ajya | 14:29 | |
openstackgerrit | Christopher Dearborn proposed openstack/ironic master: Redfish driver firmware update https://review.opendev.org/749619 | 14:31 |
Qianbiao | Hello TheJulia dtantsur I have updated the python-ibmcclient version to be compatible with 0.1.0, need a +2 now: https://review.opendev.org/#/c/752006/ | 14:34 |
patchbot | patch 752006 - ironic (stable/ussuri) - opt: Enhance old stable branches to use latest pyt... - 7 patch sets | 14:34 |
TheJulia | dtantsur: I think I have figured out what is at least kind of happening with the ipa jobs | 14:36 |
*** antotala has joined #openstack-ironic | 14:37 | |
TheJulia | dtantsur: I think I have figured out what is at least kind of happening with the ipa jobs | 14:37 |
dtantsur | TheJulia: I updated it today | 14:38 |
TheJulia | the setting overrides are not working | 14:38 |
dtantsur | mmm, which ones? | 14:38 |
TheJulia | vm memory size | 14:38 |
TheJulia | ramdisk type | 14:38 |
TheJulia | the jobs define tinyipa and 512mb of ram | 14:39 |
TheJulia | the jobs are running and zuul inventory with centos and 3GB of ram | 14:39 |
dtantsur | huh | 14:39 |
TheJulia | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/752719/1/check/ipa-tempest-bios-ipmi-direct-src/b184792/zuul-info/inventory.yaml <-- from the job you linked in the etherpad | 14:39 |
iurygregory | wut??? | 14:39 |
dtantsur | but it still should work, shouldn't it? or do timeouts get mixed up as well? | 14:39 |
TheJulia | it is unpredictable because OOMKiller | 14:40 |
dtantsur | same OOM problem as in inspector? or what do you mean? | 14:40 |
dtantsur | centos+3G at least used to work | 14:41 |
TheJulia | possibly | 14:41 |
TheJulia | but not when we're trying to fire two VMs | 14:41 |
TheJulia | IRONIC_VM_COUNT: 2 | 14:41 |
dtantsur | I wonder if we should declare the experiment with running DIB in the CI failed :( | 14:41 |
TheJulia | Not sure, it _seems_ like the override inheritance is broken | 14:42 |
TheJulia | oh wait | 14:42 |
TheJulia | doh! | 14:42 |
TheJulia | I'm stupid | 14:42 |
TheJulia | I see it | 14:42 |
* dtantsur is intrigued | 14:42 | |
TheJulia | I was looking at the wrong line in the job definition | 14:43 |
*** jtomasek has quit IRC | 14:45 | |
*** uzumaki has quit IRC | 14:47 | |
TheJulia | hmm | 14:50 |
*** lmcgann_ has joined #openstack-ironic | 14:50 | |
*** martalais has joined #openstack-ironic | 14:52 | |
dtantsur | iurygregory: do you have any bits related to vmedia in bifrost ready or in-progress? | 14:53 |
iurygregory | dtantsur, not many bits as I wished... | 14:54 |
TheJulia | so these VM's have 1GB of swap | 14:55 |
dtantsur | iurygregory: feel free to post anything WIP, I can take a look | 14:56 |
iurygregory | dtantsur, ack o/ | 14:57 |
iurygregory | will work on that | 14:57 |
*** martalais has quit IRC | 14:57 | |
iurygregory | my idea atm is just add the support for vmedia and make it work, after done add a config that will work for vmedia + dhcp-less | 14:58 |
*** kaifeng has joined #openstack-ironic | 14:59 | |
*** martalais has joined #openstack-ironic | 15:00 | |
TheJulia | #startmeeting ironic | 15:00 |
openstack | Meeting started Mon Sep 21 15:00:10 2020 UTC and is due to finish in 60 minutes. The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
TheJulia | o/ | 15:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
*** openstack changes topic to " (Meeting topic: ironic)" | 15:00 | |
openstack | The meeting name has been set to 'ironic' | 15:00 |
iurygregory | o/ | 15:00 |
martalais | o/ | 15:00 |
ajya | o/ | 15:00 |
cdearborn | o/ | 15:00 |
bdodd | o/ | 15:00 |
erbarr | o/ | 15:00 |
TheJulia | Our agenda this week can be found on the wiki. | 15:00 |
rloo | o/ | 15:00 |
TheJulia | #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting | 15:00 |
arne_wiebalck | o/ | 15:00 |
rpioso | \o | 15:01 |
* iurygregory forgot to add 2 rfe's for discussion... | 15:01 | |
TheJulia | #topic Announcements / Reminders | 15:01 |
*** openstack changes topic to "Announcements / Reminders (Meeting topic: ironic)" | 15:01 | |
TheJulia | iurygregory: quick! add them :) | 15:01 |
kaifeng | o/ | 15:01 |
TheJulia | First off! | 15:01 |
TheJulia | #info CI is very unhappy - Details are on the whiteboard. | 15:01 |
iurygregory | CI yay... | 15:01 |
TheJulia | This appears to be memory related :\ | 15:01 |
TheJulia | #info We're also in the home stretch for victoria. This week is R-3 for OpenStack. | 15:02 |
TheJulia | #info Priority obviously is CI and reviews this week. | 15:02 |
TheJulia | #info TC/PTL nominations are this week, if you're interested message TheJulia | 15:02 |
*** stendulker has joined #openstack-ironic | 15:02 | |
TheJulia | I guess I'll run again if you folks want me to. | 15:02 |
stendulker | o/ | 15:02 |
TheJulia | #info Redfish interop status meeting has been scheduled | 15:03 |
TheJulia | It will be on Thursday, September 23rd at 12 PM UTC. | 15:03 |
TheJulia | #link https://cern.zoom.us/j/94808950339 | 15:03 |
arne_wiebalck | Everyone is welcome of course. | 15:03 |
iurygregory | we will give you cookies if you run again TheJulia | 15:03 |
arne_wiebalck | iurygregory: ++ | 15:03 |
rajinir | o/ | 15:03 |
TheJulia | iurygregory: cranberry oatmeal and you'll have me sold. | 15:04 |
TheJulia | One final item in my semi-out of order list of announcements/reminders | 15:04 |
rpioso | mraineri from Redfish Forum will attend the first half. | 15:04 |
* iurygregory would ship food from Annapurna to TheJulia | 15:04 | |
TheJulia | It looks like the kexec effort should end up with some devoted PTG time to discuss and determine the next path. I got an email from Boston University and the group of students did not choose ironic :( | 15:05 |
iurygregory | #sad | 15:05 |
TheJulia | c'est la vie | 15:05 |
*** k_mouza has quit IRC | 15:05 | |
TheJulia | Does anyone have anything to announce or remind us of? | 15:05 |
TheJulia | No action items so we can proceed to subteam statuses | 15:06 |
openstackgerrit | Merged openstack/ironic-prometheus-exporter master: Fallback to `node_uuid` if`node_name` is not present https://review.opendev.org/723176 | 15:06 |
TheJulia | iurygregory: I guess you can release IPE :) | 15:07 |
iurygregory | I will =) | 15:07 |
* TheJulia guesses there are no other announcements and reminders | 15:07 | |
TheJulia | onward? | 15:07 |
iurygregory | ++ | 15:07 |
TheJulia | #topic Review subteam status reports | 15:07 |
*** openstack changes topic to "Review subteam status reports (Meeting topic: ironic)" | 15:07 | |
TheJulia | #link https://etherpad.openstack.org/p/IronicWhiteBoard | 15:07 |
*** uzumaki has joined #openstack-ironic | 15:08 | |
TheJulia | Starting at line 279 | 15:08 |
iurygregory | I think we can remove the Zuulv3 migration | 15:08 |
iurygregory | and have a topic for grenade efforts in the future | 15:09 |
TheJulia | arne_wiebalck: w/r/t the scale issues item you noted, I've got a patch up to preserve the efi boot artifacts, we should likely make sure we don't collide in our efforts | 15:10 |
*** priteau has quit IRC | 15:10 | |
*** k_mouza has joined #openstack-ironic | 15:11 | |
arne_wiebalck | TheJulia: ok | 15:11 |
arne_wiebalck | TheJulia: you have a link? | 15:12 |
TheJulia | arne_wiebalck: it is on ipa, I don't at the moment but I'll get it to you | 15:13 |
arne_wiebalck | TheJulia: I should be able to find it ... | 15:13 |
TheJulia | Otherwise I think most things look okay and in a good state. I realize we're also basically blocked on ipa at the moment due to CI | 15:13 |
TheJulia | Anyhow, are we good to proceed? | 15:13 |
dtantsur | yep | 15:14 |
TheJulia | one moment, having to relaunch windows, my browser crashed | 15:14 |
TheJulia | #topic Deciding on priorities for the coming week | 15:14 |
*** openstack changes topic to "Deciding on priorities for the coming week (Meeting topic: ironic)" | 15:14 | |
TheJulia | Is there anything we need to add to the list of the priorities? | 15:14 |
TheJulia | #link https://etherpad.openstack.org/p/IronicWhiteBoard | 15:15 |
TheJulia | Starting at line 164 | 15:15 |
*** k_mouza has quit IRC | 15:15 | |
*** k_mouza has joined #openstack-ironic | 15:15 | |
dtantsur | iscsi deprecation? https://review.opendev.org/750204 | 15:16 |
patchbot | patch 750204 - ironic - Deprecate the iscsi deploy interface - 8 patch sets | 15:16 |
TheJulia | I think it is already on the list | 15:16 |
stendulker | This can be added for vendor priority (iLO) https://review.opendev.org/#/c/752001/ | 15:16 |
patchbot | patch 752001 - ironic - Adding changes for iso less vmedia support - 8 patch sets | 15:16 |
TheJulia | it is, just marked as a wip | 15:16 |
dtantsur | ah, right, removed WIP | 15:17 |
TheJulia | stendulker: sure, if you could make that update on the etherpad that would be much appreciated | 15:17 |
stendulker | updated | 15:17 |
stendulker | thanks | 15:17 |
TheJulia | Any objection if I remove the networking-generic-switch item? | 15:17 |
TheJulia | last updated September 9th | 15:18 |
arne_wiebalck | https://review.opendev.org/#/c/748049 is the one you were referring to earlier? | 15:18 |
patchbot | patch 748049 - ironic-python-agent - Support partition image efi contents - 4 patch sets | 15:18 |
TheJulia | arne_wiebalck: yes | 15:18 |
iurygregory | I will add later some backports of the IPE (after I push them) | 15:19 |
TheJulia | iurygregory: sounds good | 15:19 |
Qianbiao | I add a line under IPA segment for https://review.opendev.org/#/c/752024/ | 15:19 |
patchbot | patch 752024 - ironic-python-agent - Fix: make Intel CNA hardware manager none generic - 5 patch sets | 15:19 |
TheJulia | I see a couple people have updated a few different areas | 15:19 |
TheJulia | Any objections to what is present at this time? | 15:19 |
*** jawad_axd has quit IRC | 15:20 | |
iurygregory | none from me | 15:20 |
TheJulia | okay, seems like we can proceed then! | 15:21 |
TheJulia | So we have nothing listed for discussion today | 15:21 |
TheJulia | So we can proceed to the Baremetal SIG! | 15:21 |
TheJulia | #topic Baremetal SIG | 15:21 |
*** openstack changes topic to "Baremetal SIG (Meeting topic: ironic)" | 15:21 | |
arne_wiebalck | No more input on the doodle, so I guess we just schedule a first meeting and see how that goes. | 15:22 |
TheJulia | arne_wiebalck: that seems reasonable | 15:22 |
arne_wiebalck | That's it :) | 15:22 |
TheJulia | Okay then, RFE Review it is then | 15:23 |
TheJulia | #topic RFE Review | 15:23 |
*** openstack changes topic to "RFE Review (Meeting topic: ironic)" | 15:23 | |
TheJulia | iurygregory: I believe these are yours? | 15:23 |
iurygregory | yup | 15:23 |
iurygregory | \o/ | 15:23 |
TheJulia | iurygregory: would you like to talk through them? | 15:24 |
iurygregory | So, 1st RFE is https://storyboard.openstack.org/#!/story/2008171 | 15:24 |
iurygregory | to add some support for IPE for introspection data | 15:24 |
TheJulia | so what problem do you see this solving? | 15:25 |
iurygregory | this would probably be something interesting when we have active node introspection | 15:25 |
TheJulia | hardware discrepancy detection? | 15:25 |
iurygregory | for example the operator wants the firmware versions for the X vendor machines the same version... | 15:25 |
*** sdanni has joined #openstack-ironic | 15:26 | |
iurygregory | so he can setup an alarm based on the "metric" for firmware version to get notification if something is different for the machines | 15:26 |
dtantsur | assuming extra-hardware present? I don't think we collect firmware versions by default. | 15:26 |
iurygregory | ofc it would depend on what the introspection data will have =) | 15:27 |
JayF | What would be a downside for optionally supporting putting node inspection data in prometheus? As long as it's not enabled by default, it seems like a potentially good thing. | 15:27 |
arne_wiebalck | can introspection rules be used during active introspection? | 15:27 |
TheJulia | iurygregory: would this only apply to the inspection data for the nodes that the IPE is responsible for based being paired with the conductor and the data supplied from the sensor data collection? | 15:27 |
dtantsur | arne_wiebalck: yes, I think | 15:27 |
JayF | Especially with some of the data you could "inspect" about node lifetime from utilizing plugins for more data, e.g. SMART cycles, firmware versions (as mentioned), etc | 15:28 |
arne_wiebalck | dtantsur: what would happen if they detect an issue? | 15:28 |
iurygregory | TheJulia, only for the ones that conductor can report I would say | 15:28 |
arne_wiebalck | dtantsur: for normal inspection, the node would fail inspection | 15:28 |
dtantsur | arne_wiebalck: probably nothing particular.. I cannot say for sure. | 15:28 |
arne_wiebalck | dtantsur: since that would be a similar functionality | 15:28 |
arne_wiebalck | dtantsur: for the alarming part at least | 15:29 |
dtantsur | overall, I'm with JayF on the "why not" bit. but I'd rather see more technical details in the RFE before committing to it. | 15:29 |
TheJulia | iurygregory: I suspect you're going to need to write something a bit more verbose along the lines of a spec. I like the idea, I'm only worried about size/scale/scope issues and mechanics. | 15:29 |
JayF | ++ that is a pretty anemic | 15:29 |
JayF | RFE ** | 15:29 |
iurygregory | yeah sorry for that =) | 15:29 |
dtantsur | I personally don't insist on a spec, I'd just read more text on the story | 15:29 |
TheJulia | That being said, I suspect many of us agree it would be a good thing | 15:29 |
iurygregory | I will try gather more details =) | 15:29 |
TheJulia | same really, just more details would be excellent | 15:30 |
iurygregory | ack | 15:30 |
iurygregory | so moving to the second RFE | 15:30 |
TheJulia | and maybe think through the questions posed in a spec while you're adding detail | 15:30 |
iurygregory | sure =) | 15:30 |
TheJulia | Because those questions are asked to provoke thought in many cases :) | 15:30 |
TheJulia | "what if?" | 15:30 |
rloo | well... ideally, it'd be some sort of 'plugin'? or class, so that other non-prometheus systems could also get that introspection data in the future? | 15:31 |
dtantsur | we have plugins for ironic-inspector | 15:31 |
TheJulia | Awesome | 15:31 |
dtantsur | the problem is, prometheus only supports "pull" model | 15:31 |
dtantsur | (there is something for the "push" model, but it's not recommended) | 15:31 |
TheJulia | iurygregory: so your second RFE? | 15:31 |
rloo | ^^ which means the general idea seems ok, but i personally would like a bit more details. if not a spec, please put in the story. | 15:31 |
iurygregory | Templates for alarms https://storyboard.openstack.org/#!/story/2008176 | 15:32 |
iurygregory | when using prometheus normally you will have some set of alarm rules you will use that will trigger notifications | 15:32 |
TheJulia | iurygregory: I think that makes a lot of sense, just maybe a little more detail on how we're going to make it easy | 15:33 |
dtantsur | I guess I have the same concern with this RFE: it's very short | 15:33 |
rloo | iurygregory: sorry, for the first rfe, would you mind updating the title or whatever to something more specific, eg 'push introspection data to prometheus' ? | 15:33 |
iurygregory | for example you want to get a notification if the temperature of the nodes is higher than a threshold ... | 15:33 |
iurygregory | rloo, sure I will do | 15:33 |
dtantsur | I'd like to see two distinct parts: user story ("As an operator I want to") and solution ("we will change that, add this") | 15:33 |
dtantsur | right now I don't quite understand why operators cannot configure it.. the way they usually configure it | 15:34 |
iurygregory | well they can configure | 15:34 |
TheJulia | ++ | 15:34 |
TheJulia | I guess I'm also missing a hint at the solution to the problem of it being hard | 15:34 |
iurygregory | in my mind would be something like | 15:34 |
iurygregory | you want an alarm for higher temperature | 15:35 |
TheJulia | Generally no objection to the rfe otherwise, just need some more detail :) | 15:35 |
iurygregory | so you can say the metric name and the expression it should use | 15:35 |
iurygregory | and we would output the yml format you need to update in the configuration of prometheus... | 15:35 |
iurygregory | instead of going and writing all rules you want etc.. | 15:36 |
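A minimal sketch of the idea described above: take a metric name and an expression, and emit a rule file in the YAML layout Prometheus uses for alerting rules. The function name `render_alert_rule`, the group name, and the metric `baremetal_temperature_celsius` are hypothetical and not taken from ironic-prometheus-exporter; the threshold is made up for illustration.

```python
import json


def render_alert_rule(alert_name, metric, operator, threshold,
                      duration="5m", severity="warning"):
    """Return a Prometheus alerting-rule file as YAML text.

    All names passed in are supplied by the operator; nothing here
    queries Ironic or Prometheus.
    """
    expr = f"{metric} {operator} {threshold}"
    lines = [
        "groups:",
        "  - name: ironic-generated-alerts",  # hypothetical group name
        "    rules:",
        f"      - alert: {alert_name}",
        # json.dumps yields a double-quoted string, which is valid YAML
        f"        expr: {json.dumps(expr)}",
        f"        for: {duration}",
        "        labels:",
        f"          severity: {severity}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical metric name and threshold, for illustration only.
print(render_alert_rule("NodeTemperatureHigh",
                        "baremetal_temperature_celsius", ">", 80))
```

The output would still need to be referenced from the Prometheus configuration by the operator; whether the available metric names are stable across drivers and hardware is the open question raised in the discussion.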
TheJulia | I guess the conundrum in a way is they don't really know what to populate until after the fact | 15:36 |
TheJulia | so they have no examples | 15:36 |
dtantsur | I have no idea about prometheus, so please pardon my question if it's silly: is it really easier? | 15:36 |
dtantsur | ah, hmm. do we have all the data we need? I thought it was also driver and hardware specific? | 15:37 |
iurygregory | well I would prefer to get a file that I just need to add to the config instead of writing everything.. | 15:37 |
TheJulia | Yeah, that is a conundrum since there are some data transformations based on names if memory serves | 15:37 |
* TheJulia wonders if this could almost just be sample alarms documentation | 15:37 | |
dtantsur | again, no hard feelings against, but I'd like to understand more before ack'ing | 15:39 |
dtantsur | and ideally have it written :) | 15:39 |
TheJulia | iurygregory: so you're going to make things more verbose on both and I guess we can revisit again next week? | 15:39 |
dtantsur | it = details in the RFE | 15:39 |
dtantsur | TheJulia++ | 15:39 |
iurygregory | yup | 15:39 |
TheJulia | Awesome then | 15:39 |
TheJulia | Anyone have any other RFE's while we're at it? | 15:40 |
iurygregory | I will try to show up on next week (monday is holiday in CZ) | 15:40 |
TheJulia | iurygregory: ack | 15:40 |
*** abdysn has quit IRC | 15:40 | |
iurygregory | and I'm moving things during the weekend... | 15:40 |
TheJulia | iurygregory: ugh, well if you're not around we can always hold to the following week | 15:40 |
dtantsur | I think you'll deserve some proper rest afterwards | 15:40 |
TheJulia | Anyway! | 15:40 |
TheJulia | #topic Open Discussion | 15:40 |
*** openstack changes topic to "Open Discussion (Meeting topic: ironic)" | 15:40 | |
*** jawad_axd has joined #openstack-ironic | 15:40 | |
dtantsur | iurygregory: just update it, and we can discuss without you | 15:40 |
iurygregory | ack | 15:41 |
dtantsur | worst case, we delay one more week | 15:41 |
TheJulia | Now, we can plot to take over the world! | 15:41 |
dtantsur | \o/ | 15:41 |
* TheJulia wonders where she left the coffee at | 15:41 | |
*** uzumaki has quit IRC | 15:41 | |
*** k_mouza has quit IRC | 15:41 | |
TheJulia | so regarding CI | 15:42 |
*** k_mouza has joined #openstack-ironic | 15:42 | |
iurygregory | \o/ | 15:42 |
cdearborn | hey folks, I ran across an issue when testing firmware update. After some investigation, I believe this issue exists in all cleaning steps that call task.process_event('fail'), which i modeled firmware update after. i fixed the issue in firmware update, but i believe it is present in all the other cleaning steps. wondering how this should be handled. should we discuss now, or outside the meeting? | 15:43 |
TheJulia | it seems like both VMs are running and we're simply running the machines out of ram. 1GB of swap, 8 GB of ram... on RAX :( | 15:43 |
dtantsur | cdearborn: I think we should get rid of any calls to task.process_event in drivers | 15:44 |
TheJulia | I guess we need to force those jobs to use tinyipa and reduce memory count in rax as well? | 15:44 |
dtantsur | it may be some functionality gap that we need to cover | 15:44 |
TheJulia | it may be this stuff just works in other clouds | 15:44 |
iurygregory | TheJulia, it's happening only in rax? | 15:44 |
TheJulia | because of more swap on the instances | 15:44 |
dtantsur | TheJulia: I've seen very different RAM consumption between different jobs - see the whiteboard | 15:44 |
TheJulia | iurygregory: I'm not sure, but we're also trying centos builds on rax right now | 15:44 |
iurygregory | gotcha | 15:44 |
dtantsur | another option is to use concurrency==1 and 1 VM | 15:44 |
dtantsur | will make the jobs take longer, of course | 15:45 |
*** jawad_axd has quit IRC | 15:45 | |
*** JamesBenson has joined #openstack-ironic | 15:45 | |
iurygregory | we default to 2 VMs on most ironic jobs (netboot/local) for tempest testing... | 15:45 |
iurygregory | since we need to set capabilities before running tempest or nova will *BOOM* | 15:46 |
dtantsur | ah, I remember now. we need two VMs because of cleaning.. | 15:46 |
iurygregory | and I think non-uefi jobs requires 3GB and uefi 4GB that can cause more swap etc since we have bigger instances.. | 15:47 |
cdearborn | the issue is that when a cleaning step detects failure and calls task.process_event('fail'), the node moves into the clean failed state, but does not go into maintenance mode and the next cleaning step that is run against the node after is never actually kicked off. The node goes into clean wait and stays there forever. | 15:47 |
cdearborn | I fixed the issue in firmware update by calling conductor.utils.cleaning_error_handler() instead. | 15:48 |
cdearborn | I believe this has something to do with the state that is left in driver_internal_info. If the cleaning steps just calls into task.process_event('fail'), then that state is never cleaned up. | 15:48 |
dtantsur | I think clean steps should only raise exceptions, not mess with states | 15:49 |
dtantsur | but yes, you're right, just calling process_event is wrong and will leave the node in an unclear state | 15:49 |
TheJulia | So this is an awful idea, what if we created more swap? | 15:49 |
cdearborn | dtantsur, a raised exception is handled correctly, as is a timeout | 15:49 |
iurygregory | TheJulia, if the awful idea helps I'm ok with it.. | 15:50 |
iurygregory | using concurrency 1 would be worth trying (if we are not doing so already) just to see how it goes | 15:50 |
TheJulia | I think we can likely tune down the memory footprint we enable the centos job to have since we did some cleaning in the image | 15:50 |
TheJulia | We were up to 500 megs at one point and now we're down to like 360 | 15:51 |
iurygregory | sounds like a plan | 15:51 |
cdearborn | dtantsur, the issue is mainly with async cleaning steps where throwing an exception is not an option | 15:51 |
TheJulia | that puts the footprint at worst around 2.25 GB if my back of the napkin math is correct | 15:51 |
TheJulia | I'll try tuning down the ipa jobs first with an override and we can see how that goes | 15:52 |
TheJulia | statistically if it passes and survives a recheck, we're likely good to reduce the overall memory consumption size across the board | 15:52 |
iurygregory | ack | 15:52 |
TheJulia | So anything else for us to randomly discuss this morning? | 15:53 |
lmcgann_ | Hi, I'm an engineer at Red Hat Research and I just wanted to throw out there that we have begun working on a way to integrate Keylime into Ironic to provide a means of node attestation under different states. To begin, I'll be reviving a patch for a generic security interface on nodes: https://review.opendev.org/#/c/576718/3/specs/approved/security-interface.rst | 15:53 |
patchbot | patch 576718 - ironic-specs - Add security interface spec - 3 patch sets | 15:53 |
dtantsur | cdearborn: then probably cleaning_error_handler is the right tool to use | 15:53 |
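The difference cdearborn describes can be illustrated with a toy model. This is not Ironic code: `FakeNode`, `fail_state_only`, and `fail_with_handler` are hypothetical stand-ins, and the `driver_internal_info` keys are used only for illustration. The sketch assumes, as reported above, that a bare state transition leaves maintenance unset and step bookkeeping behind, while an error-handler style helper cleans both up.

```python
class FakeNode:
    """Toy stand-in for a node that is in the middle of cleaning."""

    def __init__(self):
        self.provision_state = "cleaning"
        self.maintenance = False
        self.last_error = None
        # Illustrative bookkeeping left by an in-progress clean step.
        self.driver_internal_info = {
            "clean_steps": ["update_firmware"],
            "clean_step_index": 0,
        }


def fail_state_only(node):
    """Mimic a step that only flips the state machine to 'clean failed'."""
    node.provision_state = "clean failed"


def fail_with_handler(node, message):
    """Mimic an error-handler helper: fail, flag, and tidy up."""
    node.provision_state = "clean failed"
    node.maintenance = True
    node.last_error = message
    # Drop stale step bookkeeping so a later clean starts fresh.
    node.driver_internal_info.pop("clean_steps", None)
    node.driver_internal_info.pop("clean_step_index", None)


stale = FakeNode()
fail_state_only(stale)
print(stale.maintenance, stale.driver_internal_info)

clean = FakeNode()
fail_with_handler(clean, "firmware update failed")
print(clean.maintenance, clean.driver_internal_info)
```

In the first case the leftover bookkeeping is the kind of state that, per the report above, keeps the next cleaning attempt stuck in clean wait.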
dtantsur | hi and welcome lmcgann_ | 15:53 |
iurygregory | welcome lmcgann_ | 15:53 |
dtantsur | great news! | 15:54 |
iurygregory | ++ | 15:54 |
TheJulia | lmcgann_: We _may_ need to split it into two specs mechanics wise, one on the interface and one on just keylime, but maybe wrapped together could work :) | 15:54 |
kaifeng | welcome lmcgann_ | 15:54 |
rpioso | o/ lmcgann_ | 15:54 |
lmcgann_ | Hi everybody :) | 15:54 |
TheJulia | lmcgann_: and yes, feel free to revise that change set however you feel is appropriate | 15:54 |
lmcgann_ | I would also like to shamelessly promote our DevConf talk this Thursday wherein we will demonstrate our work on Ironic multitenancy and other contributions as a means of sharing hardware in the Mass Open Cloud | 15:54 |
cdearborn | \0 lmcgann_ | 15:54 |
lmcgann_ | https://devconfus2020.sched.com/event/b2738f74c3a7e3021ba9fd53a035e5ed | 15:55 |
TheJulia | lmcgann_: Awesome, good luck! | 15:55 |
TheJulia | Well everyone, thank you and have a wonderful week! | 15:55 |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-inspector master: Limit inspector jobs to 1 testing VM https://review.opendev.org/753051 | 15:55 |
dtantsur | TheJulia: ^^^ | 15:55 |
TheJulia | Yeah, that should work just fine for inspector jobs | 15:56 |
TheJulia | IPA otoh :( | 15:56 |
TheJulia | I'll put up a patch in a few minutes | 15:56 |
TheJulia | Anyway, have a wonderful week! | 15:56 |
TheJulia | Thanks! | 15:56 |
TheJulia | #endmeeting | 15:57 |
*** openstack changes topic to "Bare Metal Provisioning | Status: http://bit.ly/ironic-whiteboard | Docs: http://docs.openstack.org/ironic/ | Bugs: https://storyboard.openstack.org/#!/project_group/75 | Contributors are generally present between 6 AM and 12 AM UTC, If we do not answer, please feel free to pose questions to openstack-discuss mailing list." | 15:57 | |
openstack | Meeting ended Mon Sep 21 15:57:12 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:57 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.html | 15:57 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.txt | 15:57 |
openstack | Log: http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.log.html | 15:57 |
* TheJulia makes everyone coffee | 15:57 | |
dtantsur | lmcgann_: 16:20 EDT is quite late for me, I hope it will be recorded though | 15:57 |
lmcgann_ | TheJulia: Yeah I envisioned two specs. I already started drafting the second one from the document we looked at in our meeting on Thursday but I'll be holding off on submitting anything until more of the security interface work is done | 15:57 |
iurygregory | should be recorded | 15:57 |
TheJulia | lmcgann_: awesome | 15:57 |
iurygregory | afaik devconfus is also recorded =) | 15:57 |
dtantsur | yeah, I'd expect that | 15:57 |
iurygregory | if it's not I will ping devconfcz folks to ask devconfus to record hehe | 15:58 |
openstackgerrit | Julia Kreger proposed openstack/ironic-python-agent master: Lower memory usage of VMs https://review.opendev.org/753057 | 16:02 |
*** lucasagomes has quit IRC | 16:03 | |
openstackgerrit | Iury Gregory Melo Ferreira proposed openstack/ironic-prometheus-exporter stable/ussuri: Fallback to `node_uuid` if`node_name` is not present https://review.opendev.org/753065 | 16:08 |
openstackgerrit | Iury Gregory Melo Ferreira proposed openstack/ironic-prometheus-exporter stable/train: Fallback to `node_uuid` if`node_name` is not present https://review.opendev.org/753067 | 16:08 |
openstackgerrit | Merged openstack/virtualbmc master: Drop redundant milliseconds from logging https://review.opendev.org/752850 | 16:13 |
*** stendulker has quit IRC | 16:16 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic-python-agent master: Documentation: fix incorrect step names https://review.opendev.org/753080 | 16:16 |
openstackgerrit | Julia Kreger proposed openstack/ironic master: CI: Remove the build check for pre-build ramdisks only https://review.opendev.org/753081 | 16:17 |
TheJulia | I guess ~1 hour we should know if we can make IPA happier | 16:23 |
*** ociuhandu has quit IRC | 16:25 | |
iurygregory | I will be back to check =) | 16:26 |
*** k_mouza has quit IRC | 16:33 | |
*** antotala has quit IRC | 16:33 | |
*** Qianbiao has quit IRC | 16:35 | |
*** gyee has joined #openstack-ironic | 16:40 | |
*** derekh has quit IRC | 17:00 | |
*** k_mouza has joined #openstack-ironic | 17:00 | |
*** dougsz has quit IRC | 17:03 | |
*** k_mouza has quit IRC | 17:05 | |
*** k_mouza has joined #openstack-ironic | 17:14 | |
*** dsneddon has joined #openstack-ironic | 17:18 | |
*** k_mouza has quit IRC | 17:18 | |
*** dtantsur is now known as dtantsur|afk | 17:31 | |
openstackgerrit | Dmitry Tantsur proposed openstack/ironic master: Limit inspector jobs to 1 testing VM https://review.opendev.org/753094 | 17:33 |
dtantsur|afk | TheJulia: another part of it ^^^ | 17:34 |
*** k_mouza has joined #openstack-ironic | 17:40 | |
*** dking has joined #openstack-ironic | 17:41 | |
*** k_mouza has quit IRC | 17:44 | |
trandles | TheJulia: I'm not sure I can make today's meeting. I've got other deadlines I'm up against. I also don't see anything on the slides from anyone, myself included. :P Ok if we delay until tomorrow? | 17:48 |
TheJulia | trandles: I think that is likely best, I've been slammed | 17:48 |
iurygregory | lol pep8 failed in my backport to stable/train | 17:57 |
*** rloo has quit IRC | 18:01 | |
*** kaifeng has quit IRC | 18:02 | |
iurygregory | now I'm puzzled: pep8 in stable/train complains about line break after binary operator (W504); if I move the operator to the other line I get line break before binary operator (W503) ... WHAT?! .-. | 18:03 |
* iurygregory is thinking of adding W504 to the ignore list in tox... | 18:04 | |
*** jtomasek has joined #openstack-ironic | 18:28 | |
*** k_mouza has joined #openstack-ironic | 18:34 | |
*** ociuhandu has joined #openstack-ironic | 18:34 | |
*** k_mouza has quit IRC | 18:38 | |
*** ociuhandu has quit IRC | 18:39 | |
*** jawad_axd has joined #openstack-ironic | 18:41 | |
*** ociuhandu has joined #openstack-ironic | 18:42 | |
*** jawad_axd has quit IRC | 18:46 | |
rpioso | iurygregory: Is the pep8 similar to the results on https://review.opendev.org/#/c/748927/? | 18:53 |
patchbot | patch 748927 - sushy - Make message parsing more resilient - 3 patch sets | 18:53 |
iurygregory | rpioso, nope in my case just complains about W504 but if I fix it will complain about W503 hehe | 18:55 |
iurygregory | probably due to the flake8 version in stable/train | 18:56 |
*** jawad_axd has joined #openstack-ironic | 19:02 | |
*** jawad_axd has quit IRC | 19:06 | |
*** martalais has quit IRC | 19:15 | |
*** jawad_axd has joined #openstack-ironic | 19:23 | |
*** jtomasek has quit IRC | 19:25 | |
*** jawad_axd has quit IRC | 19:27 | |
rpioso | iurygregory: Thank you! | 19:30 |
*** dougsz has joined #openstack-ironic | 19:30 | |
tzumainn | TheJulia, hi! I was talking to hugh about looking into the cinder ceph/iscsi driver for the moc, and he mentioned that he thought there was an iscsi limitation - something about a maximum of two targets? - that he thought he discussed with you a while ago | 19:32 |
tzumainn | I've been trying to look for some documentation regarding this, and was wondering if you knew of any? | 19:32 |
*** ociuhandu has quit IRC | 19:32 | |
*** jawad_axd has joined #openstack-ironic | 19:44 | |
*** jawad_axd has quit IRC | 19:48 | |
*** dougsz has quit IRC | 19:50 | |
*** jawad_axd has joined #openstack-ironic | 20:04 | |
*** jawad_axd has quit IRC | 20:09 | |
TheJulia | tzumainn: ahh yeah | 20:09 |
TheJulia | so that is a linux kernel limitation from loading from the ibft table | 20:09 |
TheJulia | since it is a linux kernel limitation, we never documented it | 20:11 |
tzumainn | TheJulia, ah, okay! and just so I understand correctly - the consequence is that a ceph volume running on the kernel can only create two iscsi targets? | 20:15 |
TheJulia | well, any iscsi targets | 20:16 |
tzumainn | got it - thanks! | 20:16 |
janders | TheJulia trandles I am on the same boat | 20:21 |
janders | same time, tomorrow? | 20:22 |
trandles | same time tomorrow works for me janders | 20:23 |
*** k_mouza has joined #openstack-ironic | 20:25 | |
*** ociuhandu has joined #openstack-ironic | 20:25 | |
*** k_mouza has quit IRC | 20:30 | |
dking | Does Ironic have any option similar to ATA secure erase for NVMe discs? | 20:33 |
dking | From what I'm seeing, it looks like the GenericHardwareManager would simply try to shred an NVMe drive. Is that correct? | 20:34 |
janders | dking I believe you are correct. I've been looking into trim/discard as a potential enhancement at a high level however the concern there is trim/discard varies greatly from device to device | 20:35 |
janders | dking what's your use case? | 20:36 |
janders | trandles updated invite sent | 20:37 |
dking | janders: We have many servers which use NVMe drives where we'll want to be able to clean them fully between each use (as they will be used by different customers). So, we need a secure and fast clean that doesn't put more strain than necessary on the hardware. | 20:37 |
janders | dking what is the brand of NVMes if I may ask? | 20:38 |
dking | Intel from Supermicro | 20:38 |
janders | dking are those direct-attached or going via controller? | 20:38 |
janders | (I suppose the former but I've seen certain configs based on the latter) | 20:39 |
dking | I'm not aware of any controller between them. Some are directly attached, and some are attached on microblades, but I don't think there's any controller involved on any. | 20:41 |
janders | right! | 20:41 |
janders | are you going to attend the PTG? | 20:41 |
dking | When is that? I'm going to be attending the Open Infra summit, and I was at the last PTG, but I suppose that I haven't signed up for the next one yet. | 20:42 |
dking | I was wondering if it might be practical to add in some rule to check for NVMe inside of the GenericHardwareManager, and run something like "nvme format -s1 block_device"? | 20:43 |
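A minimal sketch of the rule dking proposes: pick an NVMe-specific command for NVMe block devices and fall back to shredding otherwise. This is not GenericHardwareManager code. The function name `choose_erase_command` is hypothetical, detecting NVMe by the device name prefix is an assumption, and the shred options shown are illustrative. The function only builds the argument list and never runs anything.

```python
import os


def choose_erase_command(device_path):
    """Return the argv list a cleaning step might run for this device.

    Assumption: NVMe namespaces appear as /dev/nvme*; a real
    implementation would need a more reliable check.
    """
    name = os.path.basename(device_path)
    if name.startswith("nvme"):
        # Command form quoted in the discussion: nvme format -s1 <device>
        return ["nvme", "format", "-s1", device_path]
    # Illustrative fallback for other devices: one overwrite pass plus zeros.
    return ["shred", "--force", "--zero", "--verbose",
            "--iterations=1", device_path]


print(choose_erase_command("/dev/nvme0n1"))
print(choose_erase_command("/dev/sda"))
```

Returning the command instead of executing it keeps the selection logic testable on a machine with no disks to lose.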
janders | https://www.openstack.org/ptg/ | 20:43 |
janders | October 26-30, 2020 | 20:43 |
janders | I've been looking into a potential enhancement like this just last week - but we weren't sure if there is sufficient user demand (and vendor support) to justify it | 20:44 |
janders | I think it can be done - I think it would be good to propose a discussion on this topic for the PTG | 20:44 |
janders | given it's not far away | 20:44 |
janders | would this timeframe work for your dking? | 20:45 |
janders | s/your/you | 20:45 |
*** jawad_axd has joined #openstack-ironic | 20:46 | |
dking | It should be fine. We might start working on something before then, but I'd be interested in the official solution regardless. | 20:46 |
janders | from my testing, trim/discard based deletion of a 1.5TB Intel NVMe takes about ~4 seconds and leaves no lasting performance impact (as in you can read/write at full bandwidth immediately after the command is run) | 20:47 |
janders | the biggest two catches are 1) device support for this is inconsistent and in the worst case it means the device may return success on erase and in fact keep the data readable and 2) some devices need vendor specific tooling to do this right or at all | 20:47 |
janders | is the storage setup of your servers consistent or do you have a mix of HDD and NVMe? | 20:48 |
dking | We have a mix. | 20:49 |
dking | We'll be moving mostly to NVMe, but I believe that some of our storage servers will still have HDD. Right now, we have a few servers that have both. | 20:49 |
janders | right! | 20:50 |
janders | so yeah some kind of adaptable behaviour would be best | 20:50 |
*** jawad_axd has quit IRC | 20:51 | |
janders | allright! I think it would be great to discuss this at the PTG and have operator's perspective (and first hand experience) on the matter as well | 20:51 |
janders | meanwhile I will chat to my team about this further | 20:51 |
janders | I think it would be nice to have this feature, we just weren't sure how easy will it be to get this right for most users | 20:51 |
janders | given variability in vendor support | 20:51 |
janders | dking thank you for bringing this up | 20:52 |
dking | janders: Thank you very much for your insight and help. I'll see what I find out on this end, and unless we see a solution before then, I'll be looking forward to talks at the PTG. | 20:53 |
janders | dking +1 | 20:54 |
janders | dking check out https://man7.org/linux/man-pages/man8/blkdiscard.8.html | 20:54 |
janders | 4s to discard 1.5TB NVMe in my lab | 20:55 |
janders | I would be very keen to hear your thoughts on the security implications of it though | 20:55 |
*** ociuhandu has quit IRC | 20:59 | |
TheJulia | So the interesting thing, I think, in regards to the discard/trim capability is we should be able to determine if possible. and head down that path automatically for nvme devices. | 21:25 |
JayF | One other question that's related, but almost-impossible to answer universally | 21:26 |
JayF | is how /secure/ is that behavior across NVMe controllers? | 21:26 |
TheJulia | Well, there is a device behavior contract there... | 21:27 |
TheJulia | but yeah | 21:28 |
JayF | I believe you may have personally experienced the pain of helping recover an ATA driver | 21:28 |
JayF | *drive | 21:28 |
JayF | that did not obey the device behavior contract there, either | 21:29 |
TheJulia | Yes, several | 21:29 |
TheJulia | We had that guy come in here that nuked like 8 machines worth of SSDs | 21:29 |
* TheJulia felt bad about that | 21:29 | |
* TheJulia still feels bad about that | 21:29 | |
JayF | I'm not by any means saying we shouldn't do it -- on the contrary we should -- but for folks who might not have written "generic" HardwareManager code before, making it truly generic is difficult to borderline impossible | 21:29 |
JayF | TheJulia: I'd feel bad for one machine of it. The other seven are just not doing a good job of validating external code :-| | 21:30 |
TheJulia | JayF: oh, in that case it was something like an intermediate raid controller decided the disks had failed | 21:30 |
TheJulia | so we had to put in some extra code to guard rail in that case | 21:30 |
TheJulia | because they went into security locked state | 21:30 |
TheJulia | \o/ | 21:30 |
TheJulia | They were able to get the disks recovered though | 21:31 |
JayF | what's that? code running in hardware space is bad/bad assumptions? Never! | 21:31 |
JayF | *making bad assumptions | 21:31 |
JayF | I'm sure they put in 6 pt font on some PDF that was deep-linked on the vendor webpage that the raid controller is incompatible with security locking | 21:32 |
TheJulia | of course | 21:32 |
*** martalais has joined #openstack-ironic | 21:34 | |
*** lmcgann_ has quit IRC | 21:36 | |
*** kaiokmo has joined #openstack-ironic | 21:36 | |
janders | the two main concerns so far with trim discard are 1) does it work at all for a specific device 2) if it does, how secure it is | 22:09 |
janders | there are some options around that, from having a list of known-good devices (and it could be a fully supported feature only for these) to having this an optional feature which would have to be explicitly enabled by the operator | 22:10 |
janders | and in case of the latter it would be on the operator to ensure it is secure enough for their circumstances | 22:10 |
janders | we could try find out at the introspection stage whether the devices support trim/discard/secure_erase too | 22:11 |
*** rh-jelabarre has quit IRC | 22:12 | |
janders | IMO it would be worthwhile to bring this up at the PTG to feel out 1) the interest among the operators and 2) the willingness of the vendors to make improvements in their kit to better support features like this | 22:12 |
janders | what are your thoughts? | 22:12 |
*** rh-jelabarre has joined #openstack-ironic | 22:12 | |
openstackgerrit | Merged openstack/python-ironicclient master: Add Python3 wallaby unit tests https://review.opendev.org/750722 | 22:13 |
*** k_mouza has joined #openstack-ironic | 22:17 | |
TheJulia | janders: we can't base it on introspection data and many deployments don't care about introspection data | 22:21 |
TheJulia | It has to be inside that code path of erase devices. | 22:21 |
janders | TheJulia right! | 22:22 |
*** k_mouza has quit IRC | 22:22 | |
TheJulia | And re: trim specifically, if we can validate that the trim command is actually sent via a code audit, then it is totally up to the device | 22:22 |
TheJulia | so if the device does not _really_ honor trim but doesn't error on the operation, then there is nothing we can do but fallback. If the trim disappears into the ether, then that is problematic but since it is a distinct command it should ack/respond to it | 22:22 |
TheJulia | janders: I'd add it, and maybe someone can dig through the code before the ptg so we have an understanding of it | 22:23 |
*** jawad_axd has joined #openstack-ironic | 22:29 | |
janders | what worries me a little is the fact that vendors often have their own CLI with "magic" features while this happens with generic tools: | 22:30 |
janders | http://paste.openstack.org/show/798176/ | 22:31 |
janders | (-s should be secure discard - and I am pretty sure those devices do support it) | 22:31 |
janders | I will try with the Intel tool after progressing a bit more with more time sensitive things | 22:32 |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Reduce grenade node count https://review.opendev.org/753183 | 22:32 |
openstackgerrit | Julia Kreger proposed openstack/ironic master: Reduce VMs for multinode and standalone jobs https://review.opendev.org/753184 | 22:32 |
*** jawad_axd has quit IRC | 22:34 | |
openstackgerrit | Julia Kreger proposed openstack/ironic-python-agent master: CI: Lower memory usage of VMs/Increase swap https://review.opendev.org/753057 | 22:43 |
*** k_mouza has joined #openstack-ironic | 22:47 | |
*** tosky has quit IRC | 22:49 | |
TheJulia | janders: is there a boss card in that chassis. Basically a raid controller in the middle shoots the security/discard sort of commands in the foot :( | 22:50 |
*** jawad_axd has joined #openstack-ironic | 22:50 | |
*** k_mouza has quit IRC | 22:52 | |
janders | TheJulia "plain" (no --secure) discard works: http://paste.openstack.org/show/798180/ | 22:53 |
janders | unfortunately the really interesting part (--secure) does not :/ | 22:53 |
janders | I will see if I can quickly install the Intel tool | 22:53 |
*** jawad_axd has quit IRC | 22:55 | |
*** kaiokmo has quit IRC | 22:59 | |
TheJulia | hmm | 22:59 |
*** rcernin has joined #openstack-ironic | 23:01 | |
TheJulia | so news on CI job issues | 23:04 |
TheJulia | good news or bad news first? | 23:05 |
stevebaker | bad please | 23:05 |
TheJulia | by default, test vms now only have 1 GB of swap... and ~60GB for /opt at the worst | 23:06 |
TheJulia | and these changes were made globally | 23:06 |
TheJulia | well | 23:07 |
TheJulia | swap was | 23:07 |
TheJulia | good news is we can actually make things better... i hope. | 23:07 |
stevebaker | better by adding swap? | 23:08 |
TheJulia | well.. not exactly | 23:09 |
TheJulia | we can increase swap, but it digs away at /opt | 23:09 |
TheJulia | so returning us to 8GB of swap gives us ~50GB on /opt | 23:09 |
stevebaker | I see | 23:09 |
*** zzzeek has quit IRC | 23:10 | |
*** rcernin has quit IRC | 23:11 | |
TheJulia | realistically all we can really do is reduce our VM footprint | 23:11 |
*** jawad_axd has joined #openstack-ironic | 23:11 | |
*** rcernin has joined #openstack-ironic | 23:11 | |
*** zzzeek has joined #openstack-ironic | 23:13 | |
janders | TheJulia here are the results of a quick secure erase test with Intel-SSD/NVMe CLI: http://paste.openstack.org/show/798182/ | 23:13 |
janders | it looks like it works (as in controller isn't causing any hassles) but having to use a vendor-specific CLI is not ideal... | 23:14 |
*** jawad_axd has quit IRC | 23:15 | |
janders | TheJulia more details: http://paste.openstack.org/show/798183/ (extracts from tool's doco) | 23:16 |
*** zzzeek has quit IRC | 23:17 | |
TheJulia | hmm, i wonder if hdparm... | 23:17 |
*** zzzeek has joined #openstack-ironic | 23:20 | |
janders | (extract from hdparm manual) http://paste.openstack.org/ | 23:20 |
janders | oops | 23:20 |
janders | http://paste.openstack.org/show/798184/ | 23:20 |
janders | copy paste fail | 23:20 |
janders | EXCEPTIONALLY DANGEROUS. DO NOT USE THIS OPTION!! | 23:20 |
janders | like that comment :) | 23:20 |
janders | TheJulia what hdparm functionality are you thinking of? | 23:21 |
TheJulia | trim mechanically won't entirely work because of the io interactions | 23:22 |
TheJulia | at least in SATA, you can only trim a 4k block at a time or something absurd like that | 23:23 |
TheJulia | and hdparm looks like it does the translation under the hood | 23:23 |
TheJulia | I'm kind of curious if hdparm might support the nvme call to trigger secure erase but that is too hopeful | 23:23 |
janders | blkdiscard seems to support that | 23:23 |
janders | as in you can pass the entire block device as a parameter | 23:24 |
TheJulia | is it sending option2 | 23:24 |
janders | blkdiscard? | 23:24 |
TheJulia | yeah | 23:24 |
janders | my guess is no because it has --secure flag that didnt work in my lab | 23:24 |
janders | so I suppose without --secure it's doing a "plain" discard as opposed to "secure" one | 23:24 |
TheJulia | so why does intel's tool work... I wonder | 23:24 |
janders | magic smoke | 23:24 |
janders | some proprietary tricks I suppose | 23:25 |
janders | but there is one more option | 23:25 |
janders | http://paste.openstack.org/show/798185/ | 23:25 |
janders | nvme-cli | 23:25 |
janders | trying to find out what that does | 23:26 |
janders | this is looking heaps better... | 23:27 |
janders | looks like nvme CLI secure erase *does* work | 23:29 |
*** jawad_axd has joined #openstack-ironic | 23:32 | |
janders | which is quite awesome | 23:33 |
janders | I will test more thoroughly if it does what it says though... hexdump time | 23:36 |
*** martalais has quit IRC | 23:36 | |
*** jawad_axd has quit IRC | 23:37 | |
*** jawad_axd has joined #openstack-ironic | 23:52 | |
*** jawad_axd has quit IRC | 23:57 |