Monday, 2020-09-21

*** zaneb has quit IRC02:03
*** Qianbiao has joined #openstack-ironic02:04
*** zaneb has joined #openstack-ironic02:04
openstackgerritAnkit Kumar proposed openstack/ironic master: Adding changes for iso less vmedia support  https://review.opendev.org/75200103:47
*** zzzeek has quit IRC04:15
*** zzzeek has joined #openstack-ironic04:17
*** uzumaki has joined #openstack-ironic04:20
*** rcernin has quit IRC04:31
openstackgerritPete Zaitcev proposed openstack/virtualbmc master: Drop redundant milliseconds from logging  https://review.opendev.org/75285004:35
*** rcernin has joined #openstack-ironic04:40
*** jawad_axd has joined #openstack-ironic05:11
*** jawad_axd has quit IRC05:15
*** abdysn has joined #openstack-ironic05:20
*** ociuhandu has joined #openstack-ironic05:35
*** rcernin has quit IRC05:39
*** ociuhandu has quit IRC05:40
*** rcernin has joined #openstack-ironic06:01
iurygregorygood morning Ironic06:04
*** Lucas_Gray has joined #openstack-ironic06:08
*** rcernin has quit IRC06:20
*** Lucas_Gray has quit IRC06:22
*** jtomasek has joined #openstack-ironic06:44
jandersgood morning iurygregory06:45
iurygregoryhey janders o/06:45
jandershey o/06:45
*** bfournie has quit IRC06:49
QianbiaoMorning ironic.06:53
QianbiaoMorning iurygregory, janders06:53
iurygregoryhey Qianbiao o/06:54
jandershey Qianbiao06:54
Qianbiaoo/06:54
arne_wiebalckGood morning janders Qianbiao iurygregory and ironic!06:55
iurygregoryhey arne_wiebalck o/06:55
Qianbiaoarne_wiebalck morning o/06:56
*** tosky has joined #openstack-ironic07:03
*** bfournie has joined #openstack-ironic07:07
*** rcernin has joined #openstack-ironic07:39
*** jtomasek has quit IRC07:41
*** jtomasek has joined #openstack-ironic07:50
*** rcernin has quit IRC07:52
*** lucasagomes has joined #openstack-ironic08:00
*** jawad_axd has joined #openstack-ironic08:00
*** Lucas_Gray has joined #openstack-ironic08:13
*** alexmcleod has joined #openstack-ironic08:24
*** derekh has joined #openstack-ironic08:42
QianbiaoHello, if I raise an error from an ML2 mechanism driver's update_port_postcommit, will it break the ironic provisioning process?08:42
Qianbiaowill it result in instance ERROR?08:43
iurygregorywell if an error occurs it may put the instance in ERROR08:44
Qianbiaonice, this is what i want.08:45
*** ociuhandu has joined #openstack-ironic08:53
*** k_mouza has joined #openstack-ironic08:54
*** Abdallahyas has joined #openstack-ironic09:34
*** jtomasek has quit IRC09:35
*** abdysn has quit IRC09:38
*** jtomasek has joined #openstack-ironic09:48
*** sshnaidm|afk is now known as sshnaidm09:52
uzumakihola senor iurygregory ! how u doing?10:08
iurygregoryuzumaki, hey o/ doing good10:08
uzumakianybody has an idea what happens if both hardware and software RAID are provided for a node?10:09
uzumakiiurygregory, how's the weather today? winter started rolling in yet or not?10:09
iurygregorynah weather is crazy max 24 min 13...10:10
*** Qianbiao has quit IRC10:14
uzumakiiurygregory, oh boy!10:16
uzumakiiurygregory, what's the difference between deploy and clean steps? Is one better than the other? Just wondering10:16
*** hjensas|afk is now known as hjensas10:17
iurygregorywell afaik one is done during deployment and other during cleaning10:17
iurygregoryI wouldn't say one is better than the other..10:18
uzumakiThat's what I thought.. it could be a choice for the operator, to use one or the other, I suppose.. Depending on how it fits their use cases iurygregory10:21
dtantsurmorning ironic10:23
dtantsurso, this time it's Monday, yeah?10:23
jandersgood morning dtantsur10:23
jandersyeah... that seems to be the general consensus10:24
iurygregorymorning dtantsur10:26
*** Wryhder has joined #openstack-ironic10:28
*** Lucas_Gray has quit IRC10:29
*** Wryhder is now known as Lucas_Gray10:29
janderswhat is the difference between start_managed_inspection(task) and _start_inspection(node_uuid, context)?10:48
janders(context: https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/inspector.py )10:48
jandersfirst guess: former has to do with OOB and the latter is inspector, but a quick test doesn't seem to confirm that guess, hence the question. I might be seriously confused though! :)10:50
*** Abdallahyas has quit IRC10:50
*** abdysn has joined #openstack-ironic10:50
iurygregoryI may be wrong, my mind would say that one ironic manages the boot and the other nope...10:59
jandersthat makes sense - thank you iurygregory!10:59
*** Lucas_Gray has quit IRC11:13
dtantsuryep. managed inspection goes through the ironic's boot management interface, non-managed - via static DHCP/PXE configuration.11:21
uzumakimorning dtantsur, janders ! o/11:23
dtantsur\o11:24
uzumakihow you doing?11:24
*** Qianbiao has joined #openstack-ironic11:25
*** jawad_axd has quit IRC11:25
dtantsurpretty okay, you?11:26
*** jawad_axd has joined #openstack-ironic11:26
uzumakipretty okay? strange.. :D I'm fine, the usual boring monday morning, post-weekend trance11:26
uzumakiwhat makes you "pretty okay"? xD dtantsur11:27
dtantsura difficult bouldering session this morning, I need to grow new skin on my palms :)11:27
jandersdtantsur thank you11:27
uzumakibouldering session? since when do they need programmers in building pyramids :D11:29
dtantsurwe build everything, as long as it can be built from duct tape and chewing gum11:30
dtantsur:)11:30
dtantsurhttps://en.wikipedia.org/wiki/Bouldering11:30
openstackgerritVerification of a change to openstack/ironic-python-agent failed: Generate a TLS certificate and send it to ironic  https://review.opendev.org/74993011:35
dtantsurSIGH11:36
dtantsur10 rechecks and counting11:36
uzumakidtantsur, so you just felt like Ethan Hunt from Mission Impossible 2, when conquering those rocks?11:38
dtantsurI felt like a sack of potatoes :D but I'm a very beginner11:39
uzumakidtantsur, with all the rag-doll physics xD (the ability to realistically fall to the ground)11:39
janderssee you tomorrow Ironic11:43
janderso/11:43
uzumakiau revoir janders !11:43
iurygregorydtantsur, duct tape and  chewing gum OMG11:50
dtantsuraka IT industry11:50
uzumakiiurygregory, ikr?11:50
iurygregoryhahahaha11:50
iurygregoryOMG11:50
*** Abdallahyas has joined #openstack-ironic11:53
*** rh-jelabarre has joined #openstack-ironic11:55
*** abdysn has quit IRC11:56
iurygregorydtantsur, I've created the RFE https://storyboard.openstack.org/#!/story/2008171 not much details since I need to look at some introspection data to see if it would be useful =)11:58
openstackgerritQianBiao Ng proposed openstack/ironic stable/ussuri: opt: Enhance old stable branches to use latest python-ibmcclient  https://review.opendev.org/75200611:58
iurygregorysomething else I was thinking is provide some sort of cli that could generate the templates for alarms in prometheus so the user could just update ...11:59
dtantsuriurygregory: cool, yeah. I guess it would be useful to specify if you expect this to work with the active node introspection11:59
iurygregoryack12:01
uzumakiterm12:02
uzumakixD12:03
Qianbiaodtantsur stevebaker JayF it seems there are different opinions on this patch: https://review.opendev.org/#/c/752006/12:06
patchbotpatch 752006 - ironic (stable/ussuri) - opt: Enhance old stable branches to use latest pyt... - 7 patch sets12:06
Qianbiaobasically, I agree with both opinions. I agree with dtantsur more in the long term. The mechanism can support complex hardware environments.12:08
*** uzumaki has quit IRC12:10
openstackgerritDmitry Tantsur proposed openstack/ironic master: Deprecate the iscsi deploy interface  https://review.opendev.org/75020412:11
*** Abdallahyas has quit IRC12:13
*** abdysn has joined #openstack-ironic12:13
*** dougsz has joined #openstack-ironic12:16
*** priteau has joined #openstack-ironic12:22
arne_wiebalckdtantsur: I tried to track down the missing ESPs when doing UEFI s/w RAID: https://storyboard.openstack.org/#!/story/2008164 and here's what I found:12:32
arne_wiebalckIt seems that when we reboot after cleaning, i.e. during deploy, the RAID cannot be fully assembled (as one of the disks is "non-fresh"). Consequently, Ironic cannot identify all holder disks and leaves some of them without ESP. The unfresh device may come from an unclean shutdown of the RAID, and from what I see at the end of cleaning we trigger a power off (i.e. no gentle RAID shutdown) ...12:32
arne_wiebalckIn order to address this I was thinking to shut down the RAID devices right after creation, so that they are stopped in a clean way (even if not sync'ed) ... does this all sound sensible?12:32
*** jawad_ax_ has joined #openstack-ironic12:32
maelkHi! Does it make sense to try to offload the json_rpc TLS of a conductor to a local reverse proxy ? Let's say I have two conductors. If I configure them to listen on a localhost port only, and use the reverse proxy to expose it. My guess is that this would not work if I have multiple conductors and it would be much better to move to some AMQP based12:35
maelk approach ? Is this correct ?12:35
*** jawad_axd has quit IRC12:36
*** jawad_axd has joined #openstack-ironic12:42
*** jawad_ax_ has quit IRC12:46
*** Goneri has joined #openstack-ironic12:50
*** martalais has joined #openstack-ironic12:51
martalaisGood morning, Ironic :)12:52
iurygregorymorning martalais o/12:54
dtantsurmaelk: hi, what are you trying to achieve by that?13:01
dtantsurarne_wiebalck: won't it be more useful to change cleaning to graceful shutdown?13:02
dtantsurthere may be clean steps that need to access the root device, if you stop RAID, it won't be possible13:02
*** rloo has joined #openstack-ironic13:03
dtantsurTheJulia: I'm staring at the managed-non-standalone failures on your patch (and not only there), and it's weird. 1.5G of RAM disappears in an unclear direction, and qemu starts OOMing. see the whiteboard for more findings, help welcome.13:04
TheJuliadtantsur: ugh13:05
dtantsurwow, I didn't expect you to be already awake :) good morning!13:06
TheJuliaI just got up13:06
dtantsurnow it's happy Monday, as folks have told me already13:06
maelkdtantsur you made a comment on a PR in Metal3 that it would be better to offload the TLS to httpd for example. It's quite straightforward for the API, but I'm a bit unclear for the JSON RPC part. Just trying to understand the ins and outs to see if it is worth it, or if we were in such a case (multiple conductors) we'd anyways need to change the13:07
maelkRPC solution13:07
dtantsurmaelk: I think it's fine to use built-in TLS for now for non-user-facing stuff.13:07
dtantsuror even: it's fine to use built-in TLS for now. let's just keep in mind that it's not 100% rock solid, and we may want to replace it as/if we scale.13:08
dtantsurJSON RPC should work fine with multiple conductors. AMQP will bring its own bunch of problems.13:08
dtantsurthe only condition with JSON RPC is that conductor host names must be accessible (i.e. not fake)13:09
maelkok, but then it uses the port configured for the local instance to connect to others, no ? so in practice that forbids TLS offloading to a reverse proxy13:09
iurygregorygood morning TheJulia =)13:10
TheJuliacoffee brewing13:10
*** cdearborn has joined #openstack-ironic13:11
dtantsurmaelk: correct. we could provide an override for that.13:11
dtantsurthe problems are possible to overcome, but it's probably not worth it right here and now, unless you already hit issues with the built-in TLS implementation13:12
dtantsura much bigger blocker for multi-node ironic is ironic-inspector..13:12
maelkI'm not hitting a problem, just trying to understand the details. Thank you for the explanations!13:14
*** jawad_axd has quit IRC13:21
*** jawad_axd has joined #openstack-ironic13:22
TheJuliaout of curiosity, has anyone gone through the whiteboard this morning and updated patch statuses for the review priorities?13:34
iurygregoryI've updated a few things on friday..13:35
openstackgerritZane Bitter proposed openstack/sushy-tools master: Fix race condition initialising persistent dict  https://review.opendev.org/75295313:37
arne_wiebalckdtantsur: you mean SOFT_POWER_OFF rather than POWER_OFF ?13:38
arne_wiebalckdtantsur: I was thinking this as well, but then thought it might be better to do the changes close to the RAID creation ... but your root device point is certainly valid13:39
Qianbiao<arne_wiebalck> in IBMC, the raid configuration is always slower than it shows.13:41
Qianbiaolike it says raid configuration done, but indeed, the backend is still running.13:41
*** tzumainn has joined #openstack-ironic13:42
arne_wiebalckQianbiao: the fact that the RAID is not synced is not so much of an issue, I think13:42
dtantsurarne_wiebalck: technically.. it's interesting13:44
dtantsurI don't think we have a notion of "shut it down gently first"13:44
arne_wiebalckdtantsur: I can try both13:44
dtantsurwe could do it with SOFT_POWER_OFF, maybe worth giving a try13:44
arne_wiebalckthat seems the easiest to test13:44
arne_wiebalckhttps://github.com/openstack/ironic/blob/master/ironic/drivers/modules/deploy_utils.py#L68413:45
arne_wiebalckdtantsur: here^^ ?13:45
dtantsurarne_wiebalck: not only there, there is also reboot between cleanings13:45
dtantsurbut this is the key place13:45
dtantsurnote that we'll have to fall back to hard power off if we cannot soft power off13:46
arne_wiebalckdtantsur: right ... ok, let me test this to see if the failure rate goes down13:46
arne_wiebalckdtantsur: can you think of any reason why we do power_off here in the first place? Seems pretty harsh.13:47
dtantsurarne_wiebalck: you mean, why not soft_power_off?13:47
arne_wiebalckdtantsur: exactly13:47
dtantsurarne_wiebalck: this code was written long before soft power actions appeared13:47
dtantsurand not all drivers support them, so you'll have to handle it too13:48
* arne_wiebalck is caught in his IPMI world, it seems13:48
arne_wiebalckdtantsur: ok, thanks, I'll do some testing and come back after13:48
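The idea discussed above (try a soft power off first, and fall back to a hard power off when the driver does not support it or it does not complete) can be sketched roughly as follows. This is a self-contained toy model, not ironic code: FakePowerDriver, graceful_power_off and both exception classes are hypothetical names invented for illustration, and the real change would go through ironic's power interface and conductor utilities instead.

```python
"""Toy model of 'soft power off with hard power off fallback'.

All names here are hypothetical; nothing below is ironic's real API.
"""


class SoftPowerOffUnsupported(Exception):
    """Raised by a driver that has no soft power actions."""


class SoftPowerOffTimeout(Exception):
    """Raised when the OS did not shut down within the timeout."""


class FakePowerDriver:
    """Stand-in for a power driver; records which calls were made."""

    def __init__(self, supports_soft=True, soft_succeeds=True):
        self.supports_soft = supports_soft
        self.soft_succeeds = soft_succeeds
        self.calls = []

    def soft_power_off(self, timeout):
        self.calls.append("soft")
        if not self.supports_soft:
            raise SoftPowerOffUnsupported()
        if not self.soft_succeeds:
            raise SoftPowerOffTimeout()

    def power_off(self):
        self.calls.append("hard")


def graceful_power_off(driver, timeout=30):
    """Return 'soft' if the graceful shutdown worked, otherwise 'hard'."""
    try:
        driver.soft_power_off(timeout)
        return "soft"
    except (SoftPowerOffUnsupported, SoftPowerOffTimeout):
        # Not every driver supports soft power actions, so the hard
        # power off remains the fallback path.
        driver.power_off()
        return "hard"
```

With a driver that supports soft power off, only the soft call is made; with one that does not, the sketch attempts the soft call once and then powers off hard, which matches the fallback requirement raised in the discussion.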
*** sdanni has joined #openstack-ironic13:49
*** martalais has left #openstack-ironic13:50
*** martalais has joined #openstack-ironic13:52
openstackgerritRichard G. Pioso proposed openstack/ironic master: Fix redfish BIOS to use @Redfish.SettingsApplyTime  https://review.opendev.org/75261413:53
*** sdanni has quit IRC13:54
openstackgerritJulia Kreger proposed openstack/ironic master: Minor agent version code cleanup  https://review.opendev.org/74955213:54
openstackgerritJulia Kreger proposed openstack/ironic master: Add Redfish BIOS interface to idrac HW type  https://review.opendev.org/74924013:56
TheJuliadtantsur: the qemu oom stuff you mentioned, is that resulting in truncated console logs?13:57
openstackgerritDmitry Tantsur proposed openstack/ironic master: Deprecate the iscsi deploy interface  https://review.opendev.org/75020414:00
dtantsurTheJulia: it's resulting in virtualbmc freaking out14:00
TheJuliahmm14:00
*** uzumaki has joined #openstack-ironic14:01
TheJuliadtantsur: is the gate status correct on the etherpad?14:24
TheJuliaI guess it is ipa/inspector only14:24
*** martalais has left #openstack-ironic14:27
iurygregoryyay we can release IPE =) after the bugfix is merged =D14:28
*** ajya|afk is now known as ajya14:29
openstackgerritChristopher Dearborn proposed openstack/ironic master: Redfish driver firmware update  https://review.opendev.org/74961914:31
QianbiaoHello TheJulia dtantsur I have updated the python-ibmcclient version to be compatible with 0.1.0, need a +2 now: https://review.opendev.org/#/c/752006/14:34
patchbotpatch 752006 - ironic (stable/ussuri) - opt: Enhance old stable branches to use latest pyt... - 7 patch sets14:34
TheJuliadtantsur: I think I have figured out what is at least kind of happening with the ipa jobs14:36
*** antotala has joined #openstack-ironic14:37
TheJuliadtantsur: I think I have figured out what is at least kind of happening with the ipa jobs14:37
dtantsurTheJulia: I updated it today14:38
TheJuliathe setting overrides are not working14:38
dtantsurmmm, which ones?14:38
TheJuliavm memory size14:38
TheJuliaramdisk type14:38
TheJuliathe jobs define tinyipa and 512mb of ram14:39
TheJuliathe jobs are running and zuul inventory with centos and 3GB of ram14:39
dtantsurhuh14:39
TheJuliahttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b18/752719/1/check/ipa-tempest-bios-ipmi-direct-src/b184792/zuul-info/inventory.yaml <-- from the job you linked in the etherpad14:39
iurygregorywut???14:39
dtantsurbut it still should work, shouldn't it? or do timeouts get mixed up as well?14:39
TheJuliait is unpredictable because OOMKiller14:40
dtantsursame OOM problem as in inspector? or what do you mean?14:40
dtantsurcentos+3G at least used to work14:41
TheJuliapossibly14:41
TheJuliabut not when we're trying to fire two VMs14:41
TheJuliaIRONIC_VM_COUNT: 214:41
dtantsurI wonder if we should declare the experiment with running DIB in the CI failed :(14:41
TheJuliaNot sure, it _seems_ like the override inheritance is broken14:42
TheJuliaoh wait14:42
TheJuliadoh!14:42
TheJuliaI'm stupid14:42
TheJuliaI see it14:42
* dtantsur is intrigued14:42
TheJuliaI was looking at the wrong line in the job definition14:43
*** jtomasek has quit IRC14:45
*** uzumaki has quit IRC14:47
TheJuliahmm14:50
*** lmcgann_ has joined #openstack-ironic14:50
*** martalais has joined #openstack-ironic14:52
dtantsuriurygregory: do you have any bits related to vmedia in bifrost ready or in-progress?14:53
iurygregorydtantsur, not many bits as I wished...14:54
TheJuliaso these VM's have 1GB of swap14:55
dtantsuriurygregory: feel free to post anything WIP, I can take a look14:56
iurygregorydtantsur, ack o/14:57
iurygregorywill work on that14:57
*** martalais has quit IRC14:57
iurygregorymy idea atm is just add the support for vmedia and make it work, after done add a config that will work for vmedia + dhcp-less14:58
*** kaifeng has joined #openstack-ironic14:59
*** martalais has joined #openstack-ironic15:00
TheJulia#startmeeting ironic15:00
openstackMeeting started Mon Sep 21 15:00:10 2020 UTC and is due to finish in 60 minutes.  The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
TheJuliao/15:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
*** openstack changes topic to " (Meeting topic: ironic)"15:00
openstackThe meeting name has been set to 'ironic'15:00
iurygregoryo/15:00
martalaiso/15:00
ajyao/15:00
cdearborno/15:00
bdoddo/15:00
erbarro/15:00
TheJuliaOur agenda this week can be found on the wiki.15:00
rlooo/15:00
TheJulia#link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting15:00
arne_wiebalcko/15:00
rpioso\o15:01
* iurygregory forgot to add 2 rfe's for discussion...15:01
TheJulia#topic Announcements / Reminders15:01
*** openstack changes topic to "Announcements / Reminders (Meeting topic: ironic)"15:01
TheJuliaiurygregory: quick! add them :)15:01
kaifengo/15:01
TheJuliaFirst off!15:01
TheJulia#info CI is very unhappy - Details are on the whiteboard.15:01
iurygregoryCI yay...15:01
TheJuliaThis appears to be memory related :\15:01
TheJulia#info We're also in the home stretch for victoria. This week is R-3 for OpenStack.15:02
TheJulia#info Priority obviously is CI and reviews this week.15:02
TheJulia#info TC/PTL nominations are this week, if you're interested, message TheJulia15:02
*** stendulker has joined #openstack-ironic15:02
TheJuliaI guess I'll run again if you folks want me to.15:02
stendulkero/15:02
TheJulia#info Redfish interop status meeting has been scheduled15:03
TheJuliaIt will be on Thursday, September 23rd at 12 PM UTC.15:03
TheJulia#link https://cern.zoom.us/j/9480895033915:03
arne_wiebalckEveryone is welcome of course.15:03
iurygregorywe will give you cookies if you run again TheJulia15:03
arne_wiebalckiurygregory: ++15:03
rajiniro/15:03
TheJuliaiurygregory: cranberry oatmeal and you'll have me sold.15:04
TheJuliaOne final item in my semi-out of order list of announcements/reminders15:04
rpiosomraineri from Redfish Forum will attend the first half.15:04
* iurygregory would ship food from Annapurna to TheJulia 15:04
TheJuliaIt looks like the kexec effort should end up with some devoted PTG time to discuss and determine the next path. I got an email from Boston University and the group of students did not choose ironic :(15:05
iurygregory#sad15:05
TheJuliac'est la vie15:05
*** k_mouza has quit IRC15:05
TheJuliaDoes anyone have anything to announce or remind us of?15:05
TheJuliaNo action items so we can proceed to subteam statuses15:06
openstackgerritMerged openstack/ironic-prometheus-exporter master: Fallback to `node_uuid` if`node_name` is not present  https://review.opendev.org/72317615:06
TheJuliaiurygregory: I guess you can release IPE :)15:07
iurygregoryI will =)15:07
* TheJulia guesses there are no other announcements and reminders15:07
TheJuliaonward?15:07
iurygregory++15:07
TheJulia#topic Review subteam status reports15:07
*** openstack changes topic to "Review subteam status reports (Meeting topic: ironic)"15:07
TheJulia#link https://etherpad.openstack.org/p/IronicWhiteBoard15:07
*** uzumaki has joined #openstack-ironic15:08
TheJuliaStarting at line 27915:08
iurygregoryI think we can remove the Zuulv3 migration15:08
iurygregoryand have a topic for grenade efforts in the future15:09
TheJuliaarne_wiebalck: w/r/t the scale issues item you noted, I've got a patch up to preserve the efi boot artifacts, we should likely make sure we don't collide in our efforts15:10
*** priteau has quit IRC15:10
*** k_mouza has joined #openstack-ironic15:11
arne_wiebalckTheJulia: ok15:11
arne_wiebalckTheJulia: you have a link?15:12
TheJuliaarne_wiebalck: it is on ipa, I don't at the moment but I'll get it to you15:13
arne_wiebalckTheJulia: I should be able to find it ...15:13
TheJuliaOtherwise I think most things look okay and in a good state. I realize we're also basically blocked on ipa at the moment due to CI15:13
TheJuliaAnyhow, are we good to proceed?15:13
dtantsuryep15:14
TheJuliaone moment, having to relaunch windows, my browser crashed15:14
TheJulia#topic Deciding on priorities for the coming week15:14
*** openstack changes topic to "Deciding on priorities for the coming week (Meeting topic: ironic)"15:14
TheJuliaIs there anything we need to add to the list of the priorities?15:14
TheJulia#link https://etherpad.openstack.org/p/IronicWhiteBoard15:15
TheJuliaStarting at line 16415:15
*** k_mouza has quit IRC15:15
*** k_mouza has joined #openstack-ironic15:15
dtantsuriscsi deprecation? https://review.opendev.org/75020415:16
patchbotpatch 750204 - ironic - Deprecate the iscsi deploy interface - 8 patch sets15:16
TheJuliaI think it is already on the list15:16
stendulkerThis can be added for vendor priority (iLO) https://review.opendev.org/#/c/752001/15:16
patchbotpatch 752001 - ironic - Adding changes for iso less vmedia support - 8 patch sets15:16
TheJuliait is, just marked as a wip15:16
dtantsurah, right, removed WIP15:17
TheJuliastendulker: sure, if you could make that update on the etherpad that would be much appreciated15:17
stendulkerupdated15:17
stendulkerthanks15:17
TheJuliaAny objection if I remove the networking-generic-switch item?15:17
TheJulialast updated September 9th15:18
arne_wiebalckhttps://review.opendev.org/#/c/748049 is the one you were referring to earlier?15:18
patchbotpatch 748049 - ironic-python-agent - Support partition image efi contents - 4 patch sets15:18
TheJuliaarne_wiebalck: yes15:18
iurygregoryI will add later some backports of the IPE (after I push them)15:19
TheJuliaiurygregory: sounds good15:19
QianbiaoI add a line under IPA segment for https://review.opendev.org/#/c/752024/15:19
patchbotpatch 752024 - ironic-python-agent - Fix: make Intel CNA hardware manager none generic - 5 patch sets15:19
TheJuliaI see a couple people have updated a few different areas15:19
TheJuliaAny objections to what is present at this time?15:19
*** jawad_axd has quit IRC15:20
iurygregorynone from me15:20
TheJuliaokay, seems like we can proceed then!15:21
TheJuliaSo we have nothing listed for discussion today15:21
TheJuliaSo we can proceed to the Baremetal SIG!15:21
TheJulia#topic Baremetal SIG15:21
*** openstack changes topic to "Baremetal SIG (Meeting topic: ironic)"15:21
arne_wiebalckNo more input on the doodle, so I guess we just schedule a first meeting and see how that goes.15:22
TheJuliaarne_wiebalck: that seems reasonable15:22
arne_wiebalckThat's it :)15:22
TheJuliaOkay then, RFE Review it is then15:23
TheJulia#topic RFE Review15:23
*** openstack changes topic to "RFE Review (Meeting topic: ironic)"15:23
TheJuliaiurygregory: I believe these are yours?15:23
iurygregoryyup15:23
iurygregory\o/15:23
TheJuliaiurygregory: would you like to talk through them?15:24
iurygregorySo, 1st RFE is https://storyboard.openstack.org/#!/story/200817115:24
iurygregoryto add some support for IPE for introspection data15:24
TheJuliaso what problem do you see this solving?15:25
iurygregorythis would probably be something interesting when we have active node introspection15:25
TheJuliahardware discrepancy detection?15:25
iurygregoryfor example the operator wants the firmware versions for the X vendor machines the same version...15:25
*** sdanni has joined #openstack-ironic15:26
iurygregoryso he can setup an alarm based on the "metric" for firmware version to get notification if something is different for the machines15:26
dtantsurassuming extra-hardware present? I don't think we collect firmware versions by default.15:26
iurygregoryofc it would depend on what the introspection data will have =)15:27
JayFWhat would be a downside for optionally supporting putting node inspection data in prometheus? As long as it's not enabled by default, it seems like a potentially good thing.15:27
arne_wiebalckcan introspection rules be used during active introspection?15:27
TheJuliaiurygregory: would this only apply to the inspection data for the nodes that the IPE is responsible for, based on being paired with the conductor and the data supplied from the sensor data collection?15:27
dtantsurarne_wiebalck: yes, I think15:27
JayFEspecially with some of the data you could "inspect" about node lifetime from utilizing plugins for more data, e.g. SMART cycles, firmware versions (as mentioned), etc15:28
arne_wiebalckdtantsur: what would happen if they detect an issue?15:28
iurygregoryTheJulia, only for the ones that conductor can report I would say15:28
arne_wiebalckdtantsur: for normal inspection, the node would fail inspection15:28
dtantsurarne_wiebalck: probably nothing particular.. I cannot say for sure.15:28
arne_wiebalckdtantsur: since that would be a similar functionality15:28
arne_wiebalckdtantsur: for the alarming part at least15:29
dtantsuroverall, I'm with JayF on the "why not" bit. but I'd rather see more technical details in the RFE before committing to it.15:29
TheJuliaiurygregory: I suspect you're going to need to write something a bit more verbose along the lines of a spec. I like the idea, I'm only worried about size/scale/scope issues and mechanics.15:29
JayF++ that is a pretty anemic15:29
JayFRFE **15:29
iurygregoryyeah sorry for that =)15:29
dtantsurI personally don't insist on a spec, I'd just read more text on the story15:29
TheJuliaThat being said, I suspect many of us agree it would be a good thing15:29
iurygregoryI will try gather more details =)15:29
TheJuliasame really, just more details would be excellent15:30
iurygregoryack15:30
iurygregoryso moving to the second RFE15:30
TheJuliaand maybe think through the questions posed in a spec while you're adding detail15:30
iurygregorysure =)15:30
TheJuliaBecause those questions are asked to provoke thought in many cases :)15:30
TheJulia"what if?"15:30
rloowell... ideally, it'd be some sort of 'plugin'? or class, so that other non-prometheus systems could also get that introspection data in the future?15:31
dtantsurwe have plugins for ironic-inspector15:31
TheJuliaAwesome15:31
dtantsurthe problem is, prometheus only supports "pull" model15:31
dtantsur(there is something for the "push" model, but it's not recommended)15:31
TheJuliaiurygregory: so your second RFE?15:31
rloo^^ which means the general idea seems ok, but i personally would like a bit more details. if not a spec, please put in the story.15:31
iurygregoryTemplates for alarms  https://storyboard.openstack.org/#!/story/200817615:32
iurygregorywhen using prometheus normally you will have some set of alarm rules you will use that will trigger notifications15:32
TheJuliaiurygregory: I think that makes a lot of sense, just maybe a little more detail on how we're going to make it easy15:33
dtantsurI guess I have the same concern with this RFE: it's very short15:33
rlooiurygregory: sorry, for the first rfe, would you mind updating the title or whatever to something more specific, eg 'push introspection data to prometheus' ?15:33
iurygregoryfor example you want to get a notification if the temperature of the nodes is higher than a threshold ...15:33
iurygregoryrloo, sure I will do15:33
dtantsurI'd like to see two distinct parts: user story ("As an operator I want to") and solution ("we will change that, add this")15:33
dtantsurright now I don't quite understand why operators cannot configure it.. the way they usually configure it15:34
iurygregorywell they can configure15:34
TheJulia++15:34
TheJuliaI guess I'm also missing a hint at the solution to the problem of it is hard15:34
iurygregoryin my mind would be something like15:34
iurygregoryyou want an alarm for higher temperature15:35
TheJuliaGenerally no objection to the rfe otherwise, just need some more detail :)15:35
iurygregoryso you can say the metric name and the expression it should use15:35
iurygregoryand we would output the yml format you need to update in the configuration of prometheus...15:35
iurygregoryinstead of going and writing all rules you want etc..15:36
TheJuliaI guess the conundrum in a way is they don't really know what to populate until after the fact15:36
TheJuliaso they have no examples15:36
dtantsurI have no idea about prometheus, so please pardon my question if it's silly: is it really easier?15:36
dtantsurah, hmm. do we have all the data we need? I thought it was also driver and hardware specific?15:37
iurygregorywell I would prefer to get a file that I just need to add to the config instead of writing everything..15:37
TheJuliaYeah, that is a conundrum since there are some data transformations based on names if memory serves15:37
* TheJulia wonders if this could almost just be sample alarms documentation15:37
dtantsuragain, no hard feelings against, but I'd like to understand more before ack'ing15:39
dtantsurand ideally have it written :)15:39
TheJuliaiurygregory: so you're going to make things more verbose on both and I guess we can revisit again next week?15:39
dtantsurit = details in the RFE15:39
dtantsurTheJulia++15:39
iurygregoryyup15:39
TheJuliaAwesome then15:39
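The "generate the alarm template from a metric name and an expression" idea from the second RFE could look something like the sketch below. It is only an illustration under stated assumptions: render_alert_rule is a hypothetical helper, the group name and the metric example_node_temperature_celsius are made up, and nothing here reflects what ironic-prometheus-exporter actually exports or ships. The output follows the Prometheus alerting rule file layout (groups, rules, alert, expr, for, labels, annotations).

```python
import json


def render_alert_rule(alert, expr, duration="5m", severity="warning",
                      summary=""):
    """Render one Prometheus alerting rule as a YAML rule-file snippet.

    json.dumps is used to quote the expression and the summary, since a
    JSON string is also a valid YAML double-quoted string.
    """
    lines = [
        "groups:",
        "- name: ironic-hardware-alerts",  # hypothetical group name
        "  rules:",
        f"  - alert: {alert}",
        f"    expr: {json.dumps(expr)}",
        f"    for: {duration}",
        "    labels:",
        f"      severity: {severity}",
        "    annotations:",
        f"      summary: {json.dumps(summary)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The metric name is a placeholder, not a real exporter metric.
    print(render_alert_rule(
        "NodeTemperatureHigh",
        "example_node_temperature_celsius > 80",
        summary="Node temperature is above the threshold",
    ))
```

As noted in the discussion, the hard part is knowing which metric names exist for a given driver and hardware, so a generator like this only helps once the operator already knows what to put in the expression.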
TheJuliaAnyone have any other RFE's while we're at it?15:40
iurygregoryI will try to show up on next week (monday is holiday in CZ)15:40
TheJuliaiurygregory: ack15:40
*** abdysn has quit IRC15:40
iurygregoryand I'm moving things during the weekend...15:40
TheJuliaiurygregory: ugh, well if you're not around we can always hold to the following week15:40
dtantsurI think you'll deserve some proper rest afterwards15:40
TheJuliaAnyway!15:40
TheJulia#topic Open Discussion15:40
*** openstack changes topic to "Open Discussion (Meeting topic: ironic)"15:40
*** jawad_axd has joined #openstack-ironic15:40
dtantsuriurygregory: just update it, and we can discuss without you15:40
iurygregoryack15:41
dtantsurworst case, we delay one more week15:41
TheJuliaNow, we can plot to take over the world!15:41
dtantsur\o/15:41
* TheJulia wonders where she left the coffee at15:41
*** uzumaki has quit IRC15:41
*** k_mouza has quit IRC15:41
TheJuliaso regarding CI15:42
*** k_mouza has joined #openstack-ironic15:42
iurygregory\o/15:42
cdearbornhey folks, I ran across an issue when testing firmware update. After some investigation, I believe this issue exists in all cleaning steps that call task.process_event('fail'), which i modeled firmware update after. i fixed the issue in firmware update, but i believe it is present in all the other cleaning steps. wondering how this should be handled. should we discuss now, or outside the meeting?15:43
TheJuliait seems like both VMs are running and we're simply running the machines out of ram. 1GB of swap, 8 GB of ram... on RAX :(15:43
dtantsurcdearborn: I think we should get rid of any calls to task.process_event in drivers15:44
TheJuliaI guess we need to force those jobs to use tinyipa and reduce memory count in rax as well?15:44
dtantsurit may be some functionality gap that we need to cover15:44
TheJuliait may be this stuff just works in other clouds15:44
iurygregoryTheJulia, it's happening only in rax?15:44
TheJuliabecause of more swap on the instances15:44
dtantsurTheJulia: I've seen very different RAM consumption between different jobs - see the whiteboard15:44
TheJuliaiurygregory: I'm not sure, but we're also trying centos builds on rax right now15:44
iurygregorygotcha15:44
dtantsuranother option is to use concurrency==1 and 1 VM15:44
dtantsurwill make the jobs take longer, of course15:45
*** jawad_axd has quit IRC15:45
*** JamesBenson has joined #openstack-ironic15:45
iurygregorywe default to 2 VMs on most ironic jobs (netboot/local) for tempest testing...15:45
iurygregorysince we need to set capabilities before running tempest or nova will *BOOM*15:46
dtantsurah, I remember now. we need two VMs because of cleaning..15:46
iurygregoryand I think non-uefi jobs require 3GB and uefi jobs 4GB, which can cause more swapping etc. since we have bigger instances..15:47
cdearbornthe issue is that when a cleaning step detects failure and calls task.process_event('fail'), the node moves into the clean failed state, but does not go into maintenance mode and the next cleaning step that is run against the node after is never actually kicked off. The node goes into clean wait and stays there forever.15:47
cdearbornI fixed the issue in firmware update by calling conductor.utils.cleaning_error_handler() instead.15:48
cdearbornI believe this has something to do with the state that is left in driver_internal_info. If the cleaning steps just calls into task.process_event('fail'), then that state is never cleaned up.15:48
dtantsurI think clean steps should only raise exceptions, not mess with states15:49
dtantsurbut yes, you're right, just calling process_event is wrong and will leave the node in an unclear state15:49
TheJuliaSo this is an awful idea, what if we created more swap?15:49
cdearborndtantsur, a raised exception is handled correctly, as is a timeout15:49
iurygregoryTheJulia, if the awful idea helps I'm ok with it..15:50
iurygregoryusing concurrency 1 would be worth trying (if we are not doing that already) just to see how it goes15:50
TheJuliaI think we can likely tune down the memory footprint we enable the centos job to have since we did some cleaning in the image15:50
TheJuliaWe were up to 500 megs at one point and now we're down to like 36015:51
iurygregorysounds like a plan15:51
cdearborndtantsur, the issue is mainly with async cleaning steps where throwing an exception is not an option15:51
TheJuliathat puts the footprint at worst around 2.25 GB if my back of the napkin math is correct15:51
TheJuliaI'll try tuning down the ipa jobs first with an override and we can see how that goes15:52
TheJuliastatistically if it passes and survives a recheck, we're likely good to reduce the overall memory consumption size across the board15:52
iurygregoryack15:52
TheJuliaSo anything else for us to randomly discuss this morning?15:53
lmcgann_Hi I'm an engineer at Red Hat Research and I just wanted to throw out there that we have begun working on a way to integrate Keylime into Ironic to provide a means of node attestation under different states. To begin I'll be reviving a patch for a generic security interface on nodes: https://review.opendev.org/#/c/576718/3/specs/approved/security-interface.rst15:53
patchbotpatch 576718 - ironic-specs - Add security interface spec - 3 patch sets15:53
dtantsurcdearborn: then probably cleaning_error_handler is the right tool to use15:53
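[Editor's sketch] The difference cdearborn describes can be illustrated with a toy model. This is not Ironic code: `ToyNode`, `fail_with_bare_event` and `fail_with_error_handler` are hypothetical names, and the handler's behaviour is assumed only from what is said in the discussion above (maintenance is set and leftover cleaning state in driver_internal_info is cleared).

```python
# Toy model only: these classes are NOT Ironic code. They imitate the
# behaviour described in the discussion so the two failure paths can be compared.


class ToyNode:
    def __init__(self):
        self.provision_state = "cleaning"
        self.maintenance = False
        self.last_error = None
        # Stand-in for the per-node cleaning bookkeeping.
        self.driver_internal_info = {
            "clean_steps": ["update_firmware", "erase_devices"],
            "clean_step_index": 0,
        }


def fail_with_bare_event(node):
    """Hypothetical step that only flips the state, like a bare 'fail' event."""
    node.provision_state = "clean failed"


def fail_with_error_handler(node, msg):
    """Hypothetical stand-in for a central cleaning error handler."""
    node.last_error = msg
    node.maintenance = True
    # Drop stale step bookkeeping so a later cleaning run starts fresh.
    node.driver_internal_info.pop("clean_steps", None)
    node.driver_internal_info.pop("clean_step_index", None)
    node.provision_state = "clean failed"


bare = ToyNode()
fail_with_bare_event(bare)

handled = ToyNode()
fail_with_error_handler(handled, "firmware update failed")

# The bare event leaves stale bookkeeping behind and no maintenance flag.
print(bare.maintenance, "clean_steps" in bare.driver_internal_info)
print(handled.maintenance, "clean_steps" in handled.driver_internal_info)
```

Both toy nodes end in the same provision state; only the second has the side effects that, per the discussion, let the next cleaning run start properly.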
dtantsurhi and welcome lmcgann_15:53
iurygregorywelcome lmcgann_15:53
dtantsurgreat news!15:54
iurygregory++15:54
TheJulialmcgann_: We _may_ need to split it into two specs mechanics wise, one on the interface and one on just keylime, but maybe wrapped together could work :)15:54
kaifengwelcome lmcgann_15:54
rpiosoo/ lmcgann_15:54
lmcgann_Hi everybody :)15:54
TheJulialmcgann_: and yes, feel free to revise that change set however you feel is appropriate15:54
lmcgann_I would also like to shamelessly promote our DevConf talk this Thursday wherein we will demonstrate our work on Ironic multitenancy and other contributions as a means of sharing hardware in the Mass Open Cloud15:54
cdearborn\o lmcgann_15:54
lmcgann_https://devconfus2020.sched.com/event/b2738f74c3a7e3021ba9fd53a035e5ed15:55
TheJulialmcgann_: Awesome, good luck!15:55
TheJuliaWell everyone, thank you and have a wonderful week!15:55
openstackgerritDmitry Tantsur proposed openstack/ironic-inspector master: Limit inspector jobs to 1 testing VM  https://review.opendev.org/75305115:55
dtantsurTheJulia: ^^^15:55
TheJuliaYeah, that should work just fine for inspector jobs15:56
TheJuliaIPA otoh :(15:56
TheJuliaI'll put up a patch in a few minutes15:56
TheJuliaAnyway, have a wonderful week!15:56
TheJuliaThanks!15:56
TheJulia#endmeeting15:57
*** openstack changes topic to "Bare Metal Provisioning | Status: http://bit.ly/ironic-whiteboard | Docs: http://docs.openstack.org/ironic/ | Bugs: https://storyboard.openstack.org/#!/project_group/75 | Contributors are generally present between 6 AM and 12 AM UTC, If we do not answer, please feel free to pose questions to openstack-discuss mailing list."15:57
openstackMeeting ended Mon Sep 21 15:57:12 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:57
openstackMinutes:        http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.html15:57
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.txt15:57
openstackLog:            http://eavesdrop.openstack.org/meetings/ironic/2020/ironic.2020-09-21-15.00.log.html15:57
* TheJulia makes everyone coffee15:57
dtantsurlmcgann_: 16:20 EDT is quite late for me, I hope it will be recorded though15:57
lmcgann_TheJulia: Yeah I envisioned two specs. I already started drafting the second one from the document we looked at in our meeting on Thursday but I'll be holding off on submitting anything until more of the security interface work is done15:57
iurygregoryshould be recorded15:57
TheJulialmcgann_: awesome15:57
iurygregoryafaik devconfus is also recorded =)15:57
dtantsuryeah, I'd expect that15:57
iurygregoryif it's not I will ping devconfcz folks to ask devconfus to record hehe15:58
openstackgerritJulia Kreger proposed openstack/ironic-python-agent master: Lower memory usage of VMs  https://review.opendev.org/75305716:02
*** lucasagomes has quit IRC16:03
openstackgerritIury Gregory Melo Ferreira proposed openstack/ironic-prometheus-exporter stable/ussuri: Fallback to `node_uuid` if`node_name` is not present  https://review.opendev.org/75306516:08
openstackgerritIury Gregory Melo Ferreira proposed openstack/ironic-prometheus-exporter stable/train: Fallback to `node_uuid` if`node_name` is not present  https://review.opendev.org/75306716:08
openstackgerritMerged openstack/virtualbmc master: Drop redundant milliseconds from logging  https://review.opendev.org/75285016:13
*** stendulker has quit IRC16:16
openstackgerritDmitry Tantsur proposed openstack/ironic-python-agent master: Documentation: fix incorrect step names  https://review.opendev.org/75308016:16
openstackgerritJulia Kreger proposed openstack/ironic master: CI: Remove the build check for pre-build ramdisks only  https://review.opendev.org/75308116:17
TheJuliaI guess ~1 hour we should know if we can make IPA happier16:23
*** ociuhandu has quit IRC16:25
iurygregoryI will be back to check =)16:26
*** k_mouza has quit IRC16:33
*** antotala has quit IRC16:33
*** Qianbiao has quit IRC16:35
*** gyee has joined #openstack-ironic16:40
*** derekh has quit IRC17:00
*** k_mouza has joined #openstack-ironic17:00
*** dougsz has quit IRC17:03
*** k_mouza has quit IRC17:05
*** k_mouza has joined #openstack-ironic17:14
*** dsneddon has joined #openstack-ironic17:18
*** k_mouza has quit IRC17:18
*** dtantsur is now known as dtantsur|afk17:31
openstackgerritDmitry Tantsur proposed openstack/ironic master: Limit inspector jobs to 1 testing VM  https://review.opendev.org/75309417:33
dtantsur|afkTheJulia: another part of it ^^^17:34
*** k_mouza has joined #openstack-ironic17:40
*** dking has joined #openstack-ironic17:41
*** k_mouza has quit IRC17:44
trandlesTheJulia: I'm not sure I can make today's meeting. I've got other deadlines I'm up against. I also don't see anything on the slides from anyone, myself included. :P Ok if we delay until tomorrow?17:48
TheJuliatrandles: I think that is likely best, I've been slammed17:48
iurygregorylol pep8 failed in my backport to stable/train17:57
*** rloo has quit IRC18:01
*** kaifeng has quit IRC18:02
iurygregorynow I'm puzzled: pep8 in stable/train complains about line break after binary operator (W504); if I move the operator to the other line I get line break before binary operator (W503) ... WHAT?! .-.18:03
* iurygregory is thinking of adding W504 to ignore in tox...18:04
*** jtomasek has joined #openstack-ironic18:28
*** k_mouza has joined #openstack-ironic18:34
*** ociuhandu has joined #openstack-ironic18:34
*** k_mouza has quit IRC18:38
*** ociuhandu has quit IRC18:39
*** jawad_axd has joined #openstack-ironic18:41
*** ociuhandu has joined #openstack-ironic18:42
*** jawad_axd has quit IRC18:46
rpiosoiurygregory: Is the pep8 similar to the results on https://review.opendev.org/#/c/748927/?18:53
patchbotpatch 748927 - sushy - Make message parsing more resilient - 3 patch sets18:53
iurygregoryrpioso, nope in my case just complains about W504 but if I fix it will complain about W503 hehe18:55
iurygregoryprobably due to the flake8 version in stable/train18:56
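[Editor's sketch] The two warnings named above are opposite style checks, so a wrapped binary expression will trip one of them whenever both are enabled. The snippet below simply shows the two layouts side by side; the variable names are made up for illustration, and which check actually fires depends on the flake8/pycodestyle configuration in use.

```python
income, bonus, tax = 100, 20, 30

# Operator at the end of the line: reported as W504 when that check is enabled.
total_a = (income +
           bonus -
           tax)

# Operator at the start of the line: reported as W503 when that check is enabled.
total_b = (income
           + bonus
           - tax)

# Both layouts compute the same value; only the line-break style differs.
assert total_a == total_b == 90
```

Picking one layout and ignoring the opposite check, as suggested above for W504, is the usual way out.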
*** jawad_axd has joined #openstack-ironic19:02
*** jawad_axd has quit IRC19:06
*** martalais has quit IRC19:15
*** jawad_axd has joined #openstack-ironic19:23
*** jtomasek has quit IRC19:25
*** jawad_axd has quit IRC19:27
rpiosoiurygregory: Thank you!19:30
*** dougsz has joined #openstack-ironic19:30
tzumainnTheJulia, hi! I was talking to hugh about looking into the cinder ceph/iscsi driver for the moc, and he mentioned that he thought there was an iscsi limitation - something about a maximum of two targets? - that he thought he discussed with you a while ago19:32
tzumainnI've been trying to look for some documentation regarding this, and was wondering if you knew of any?19:32
*** ociuhandu has quit IRC19:32
*** jawad_axd has joined #openstack-ironic19:44
*** jawad_axd has quit IRC19:48
*** dougsz has quit IRC19:50
*** jawad_axd has joined #openstack-ironic20:04
*** jawad_axd has quit IRC20:09
TheJuliatzumainn: ahh yeah20:09
TheJuliaso that is a linux kernel limitation from loading from the ibft table20:09
TheJuliasince it is a linux kernel limitation, we never documented it20:11
tzumainnTheJulia, ah, okay! and just so I understand correctly - the consequence is that a ceph volume running on the kernel can only create two iscsi targets?20:15
TheJuliawell, any iscsi targets20:16
tzumainngot it - thanks!20:16
jandersTheJulia trandles I am in the same boat20:21
janderssame time, tomorrow?20:22
trandlessame time tomorrow works for me janders20:23
*** k_mouza has joined #openstack-ironic20:25
*** ociuhandu has joined #openstack-ironic20:25
*** k_mouza has quit IRC20:30
dkingDoes Ironic have any option similar to ATA secure erase for NVMe discs?20:33
dkingFrom what I'm seeing, it looks like the GenericHardwareManager would simply try to shred an NVMe drive. Is that correct?20:34
jandersdking I believe you are correct. I've been looking into trim/discard as a potential enhancement at a high level however the concern there is trim/discard varies greatly from device to device20:35
jandersdking what's your use case?20:36
janderstrandles updated invite sent20:37
dkingjanders: We have many servers which use NVMe drives where we'll want to be able to clean them fully between each use (as they will be used by different customers). So, we need a secure and fast clean that doesn't put more strain than necessary on the hardware.20:37
jandersdking what is the brand of NVMes if I may ask?20:38
dkingIntel from Supermicro20:38
jandersdking are those direct-attached or going via controller?20:38
janders(I suppose the former but I've seen certain configs based on the latter)20:39
dkingI'm not aware of any controller between them. Some are directly attached, and some are attached on microblades, but I don't think there's any controller involved on any.20:41
jandersright!20:41
jandersare you going to attend the PTG?20:41
dkingWhen is that? I'm going to be attending the Open Infra summit, and I was at the last PTG, but I suppose that I haven't signed up for the next one yet.20:42
dkingI was wondering if it might be practical to add in some rule to check for NVMe inside of the GenericHardwareManager, and run something like "nvme format -s1 block_device"?20:43
jandershttps://www.openstack.org/ptg/20:43
jandersOctober 26-30, 202020:43
jandersI've been looking into a potential enhancement like this just last week - but we weren't sure if there is sufficient user demand (and vendor support) to justify it20:44
jandersI think it can be done - I think it would be good to propose a discussion on this topic for the PTG20:44
jandersgiven it's not far away20:44
janderswould this timeframe work for your dking?20:45
janderss/your/you20:45
*** jawad_axd has joined #openstack-ironic20:46
dkingIt should be fine. We might start working on something before then, but I'd be interested in the official solution regardless.20:46
jandersfrom my testing, trim/discard based deletion of a 1.5TB Intel NVMe takes about ~4 seconds and leaves no lasting performance impact (as in you can read/write at full bandwidth immediately after the command is run)20:47
jandersthe biggest two catches are 1) device support for this is inconsistent and in the worst case it means the device may return success on erase and in fact keep the data readable and 2) some devices need vendor specific tooling to do this right or at all20:47
jandersis the storage setup of your servers consistent or do you have a mix of HDD and NVMe?20:48
dkingWe have a mix.20:49
dkingWe'll be moving mostly to NVMe, but I believe that some of our storage servers will still have HDD. Right now, we have a few servers that have both.20:49
jandersright!20:50
jandersso yeah some kind of adaptable behaviour would be best20:50
*** jawad_axd has quit IRC20:51
jandersallright! I think it would be great to discuss this at the PTG and have operator's perspective (and first hand experience) on the matter as well20:51
jandersmeanwhile I will chat to my team about this further20:51
jandersI think it would be nice to have this feature, we just weren't sure how easy will it be to get this right for most users20:51
jandersgiven variability in vendor support20:51
jandersdking thank you for bringing this up20:52
dkingjanders: Thank you very much for your insight and help. I'll see what I find out on this end, and unless we see a solution before then, I'll be looking forward to talks at the PTG.20:53
jandersdking +120:54
jandersdking check out https://man7.org/linux/man-pages/man8/blkdiscard.8.html20:54
janders4s to discard 1.5TB NVMe in my lab20:55
jandersI would be very keen to hear your thoughts on the security implications of it though20:55
*** ociuhandu has quit IRC20:59
TheJuliaSo the interesting thing, I think, in regards to the discard/trim capability is we should be able to determine if possible. and head down that path automatically for nvme devices.21:25
JayFOne other question that's related, but almost-impossible to answer universally21:26
JayFis how /secure/ is that behavior across NVMe controllers?21:26
TheJuliaWell, there is a device behavior contract there...21:27
TheJuliabut yeah21:28
JayFI believe you may have personally experienced the pain of helping recover an ATA driver21:28
JayF*drive21:28
JayFthat did not obey the device behavior contract there, either21:29
TheJuliaYes, several21:29
TheJuliaWe had that guy come in here that nuked like 8 machines worth of SSDs21:29
* TheJulia felt bad about that21:29
* TheJulia still feels bad about that21:29
JayFI'm not by any means saying we shouldn't do it -- on the contrary we should -- but for folks who might not have written "generic" HardwareManager code before, making it truly generic is difficult to borderline impossible21:29
JayFTheJulia: I'd feel bad for one machine of it. The other seven are just not doing a good job of validating external code :-|21:30
TheJuliaJayF: oh, in that case it was something like an intermediate raid controller decided the disks had failed21:30
TheJuliaso we had to put in some extra code as a guard rail in that case21:30
TheJuliabecause they went into security locked state21:30
TheJulia\o/21:30
TheJuliaThey were able to get the disks recovered though21:31
JayFwhat's that? code running in hardware space is bad/bad assumptions? Never!21:31
JayF*making bad assumptions21:31
JayFI'm sure they put in 6 pt font on some PDF that was deep-linked on the vendor webpage that the raid controller is incompatible with security locking21:32
TheJuliaof course21:32
*** martalais has joined #openstack-ironic21:34
*** lmcgann_ has quit IRC21:36
*** kaiokmo has joined #openstack-ironic21:36
jandersthe two main concerns so far with trim discard are 1) does it work at all for a specific device 2) if it does, how secure it is22:09
jandersthere are some options around that, from having a list of known-good devices (and it could be a fully supported feature only for these) to having this an optional feature which would have to be explicitly enabled by the operator22:10
jandersand in case of the latter it would be on the operator to ensure it is secure enough for their circumstances22:10
janderswe could try find out at the introspection stage whether the devices support trim/discard/secure_erase too22:11
*** rh-jelabarre has quit IRC22:12
jandersIMO it would be worthwhile to bring this up at the PTG to feel out 1) the interest among the operators and 2) the willingness of the vendors to make improvements in their kit to better support features like this22:12
janderswhat are your thoughts?22:12
*** rh-jelabarre has joined #openstack-ironic22:12
openstackgerritMerged openstack/python-ironicclient master: Add Python3 wallaby unit tests  https://review.opendev.org/75072222:13
*** k_mouza has joined #openstack-ironic22:17
TheJuliajanders: we can't base it on introspection data and many deployments don't care about introspection data22:21
TheJuliaIt has to be inside that code path of erase devices.22:21
jandersTheJulia right!22:22
*** k_mouza has quit IRC22:22
TheJuliaAnd re: trim specifically, if we can validate that the trim command is actually sent via a code audit, then it is totally up to the device22:22
TheJuliaso if the device does not _really_ honor trim but doesn't error on the operation, then there is nothing we can do but fallback. If the trim disappears into the ether, then that is problematic but since it is a distinct command it should ack/respond to it22:22
TheJuliajanders: I'd add it, and maybe someone can dig through the code before the ptg so we have an understanding of it22:23
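[Editor's sketch] The "try the fast path for NVMe, otherwise fall back" idea discussed above could look roughly like this. It is a minimal sketch under stated assumptions, not the GenericHardwareManager implementation: `erase_plan`, `erase` and the `run` callback are hypothetical names, the NVMe check is a naive device-name test, and the commands are only built and handed to `run`, never executed here. The `nvme format -s 1` form follows dking's suggestion earlier in the log; `shred` stands in for the overwrite-based fallback.

```python
import os


def erase_plan(device):
    """Return candidate commands, most preferred first (illustrative policy)."""
    name = os.path.basename(device)
    plan = []
    if name.startswith("nvme"):
        # nvme-cli format with Secure Erase Setting 1 (user data erase).
        plan.append(["nvme", "format", "-s", "1", device])
    # Overwrite-based fallback that works for any block device.
    plan.append(["shred", "-n", "1", "-z", device])
    return plan


def erase(device, run):
    """Try each command in order; `run(cmd)` returns True on success."""
    for cmd in erase_plan(device):
        if run(cmd):
            return cmd
    raise RuntimeError("all erase methods failed for %s" % device)


def fake_run(cmd):
    # Pretend the nvme tool fails so the fallback path is exercised.
    return cmd[0] != "nvme"


print(erase("/dev/nvme0n1", fake_run)[0])
```

As noted in the discussion, a fallback like this only helps when the device reports an error; a device that acknowledges the command without erasing anything cannot be caught this way.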
*** jawad_axd has joined #openstack-ironic22:29
janderswhat worries me a little is the fact that vendors often have their own CLI with "magic" features while this happens with generic tools:22:30
jandershttp://paste.openstack.org/show/798176/22:31
janders(-s should be secure discard - and I am pretty sure those devices do support it)22:31
jandersI will try with the Intel tool after progressing a bit more with more time sensitive things22:32
openstackgerritJulia Kreger proposed openstack/ironic master: Reduce grenade node count  https://review.opendev.org/75318322:32
openstackgerritJulia Kreger proposed openstack/ironic master: Reduce VMs for multinode and standalone jobs  https://review.opendev.org/75318422:32
*** jawad_axd has quit IRC22:34
openstackgerritJulia Kreger proposed openstack/ironic-python-agent master: CI: Lower memory usage of VMs/Increase swap  https://review.opendev.org/75305722:43
*** k_mouza has joined #openstack-ironic22:47
*** tosky has quit IRC22:49
TheJuliajanders: is there a boss card in that chassis. Basically a raid controller in the middle shoots the security/discard sort of commands in the foot :(22:50
*** jawad_axd has joined #openstack-ironic22:50
*** k_mouza has quit IRC22:52
jandersTheJulia "plain" (no --secure) discard works: http://paste.openstack.org/show/798180/22:53
jandersunfortunately the really interesting part (--secure) does not :/22:53
jandersI will see if I can quickly install the Intel tool22:53
*** jawad_axd has quit IRC22:55
*** kaiokmo has quit IRC22:59
TheJuliahmm22:59
*** rcernin has joined #openstack-ironic23:01
TheJuliaso news on CI job issues23:04
TheJuliagood news or bad news first?23:05
stevebakerbad please23:05
TheJuliaby default, test vms now only have 1 GB of swap... and ~60GB for /opt at the worst23:06
TheJuliaand these changes were made globally23:06
TheJuliawell23:07
TheJuliaswap was23:07
TheJuliagood news is we can actually make things better... i hope.23:07
stevebakerbetter by adding swap?23:08
TheJuliawell.. not exactly23:09
TheJuliawe can increase swap, but it digs away at /opt23:09
TheJuliaso returning us to 8GB of swap gives us ~50GB on /opt23:09
stevebakerI see23:09
*** zzzeek has quit IRC23:10
*** rcernin has quit IRC23:11
TheJuliarealistically all we can really do is reduce our VM footprint23:11
*** jawad_axd has joined #openstack-ironic23:11
*** rcernin has joined #openstack-ironic23:11
*** zzzeek has joined #openstack-ironic23:13
jandersTheJulia here are the results of a quick secure erase test with Intel-SSD/NVMe CLI: http://paste.openstack.org/show/798182/23:13
jandersit looks like it works (as in the controller isn't causing any hassles) but having to use a vendor-specific CLI is not ideal...23:14
*** jawad_axd has quit IRC23:15
jandersTheJulia more details: http://paste.openstack.org/show/798183/ (extracts from tool's doco)23:16
*** zzzeek has quit IRC23:17
TheJuliahmm, i wonder if hdparm...23:17
*** zzzeek has joined #openstack-ironic23:20
janders(extract from hdparm manual) http://paste.openstack.org/23:20
jandersoops23:20
jandershttp://paste.openstack.org/show/798184/23:20
janderscopy paste fail23:20
jandersEXCEPTIONALLY DANGEROUS. DO NOT USE THIS OPTION!!23:20
janderslike that comment :)23:20
jandersTheJulia what hdparm functionality are you thinking of?23:21
TheJuliatrim mechanically won't entirely work because of the io interactions23:22
TheJuliaat least in SATA, you can only trim a 4k block at a time or something absurd like that23:23
TheJuliaand hdparm looks like it does the translation under the hood23:23
TheJuliaI'm kind of curious if hdparm might support the nvme call to trigger secure erase but that is too hopeful23:23
jandersblkdiscard seems to support that23:23
jandersas in you can pass the entire block device as a parameter23:24
TheJuliais it sending option223:24
jandersblkdiscard?23:24
TheJuliayeah23:24
jandersmy guess is no because it has --secure flag that didnt work in my lab23:24
jandersso I suppose without --secure it's doing a "plain" discard as opposed to "secure" one23:24
TheJuliaso why does intel's tool work... I wonder23:24
jandersmagic smoke23:24
janderssome proprietary tricks I suppose23:25
jandersbut there is one more option23:25
jandershttp://paste.openstack.org/show/798185/23:25
jandersnvme-cli23:25
janderstrying to find out what that does23:26
jandersthis is looking heaps better...23:27
janderslooks like nvme CLI secure erase *does* work23:29
*** jawad_axd has joined #openstack-ironic23:32
janderswhich is quite awesome23:33
jandersI will test more thoroughly if it does what it says though... hexdump time23:36
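[Editor's sketch] A scripted version of the "hexdump" check mentioned above might scan the device for the first non-zero byte. This is a sketch under the assumption that an erased device reads back as zeros, which is not guaranteed for every device; `first_nonzero_offset` is a hypothetical helper, and the demo runs against a temporary file rather than a real block device.

```python
import os
import tempfile


def first_nonzero_offset(path, chunk_size=1024 * 1024):
    """Return the offset of the first non-zero byte, or None if all zeros."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None
            # Fast check first; only walk the chunk when it contains data.
            if chunk.count(0) != len(chunk):
                for i, byte in enumerate(chunk):
                    if byte:
                        return offset + i
            offset += len(chunk)


# Demo on a temporary file with a single leftover byte at offset 4096.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"\x01" + b"\x00" * 10)
print(first_nonzero_offset(tmp.name))  # prints 4096
os.unlink(tmp.name)
```

Reading a whole multi-terabyte device this way takes far longer than the erase itself, so sampling a subset of regions may be a more practical variant.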
*** martalais has quit IRC23:36
*** jawad_axd has quit IRC23:37
*** jawad_axd has joined #openstack-ironic23:52
*** jawad_axd has quit IRC23:57
