Tuesday, 2014-12-02

*** Haomeng|2 has joined #openstack-ironic00:01
dlaubejroll: I last cloned devstack right around the juno release00:02
jrolldlaube: ah, that console thing is new00:02
*** Haomeng has quit IRC00:02
*** smoriya has joined #openstack-ironic00:03
*** russellb has joined #openstack-ironic00:04
*** davideagnello has quit IRC00:08
*** davideagnello has joined #openstack-ironic00:10
dlaubeI do see console logs though00:11
dlaubegoing to try another nova boot and will share a paste of the output00:11
*** ChuckC_ has joined #openstack-ironic00:12
jrolldlaube: right, there are console logs for the VM, but IPA logs won't be in there00:13
NobodyCamalso dlaube have you seen : https://ask.openstack.org/en/question/50080/ironicdriversmodulesagent-node-command-status-errored-error-downloading-image/00:14
*** Masahiro has joined #openstack-ironic00:15
*** pensu has joined #openstack-ironic00:18
*** Masahiro has quit IRC00:19
dlaubecrap, yeah.. baremetal console log just shows it sitting at a coreos login00:20
*** ChuckC_ is now known as ChuckC00:20
dlaubepresumably the deploy image00:20
dlaubebefore it goes to install my ubuntu00:20
jrollyeah, with newer devstack it will keep logging00:24
jrollhere, lemme grab the patch00:24
jrollyou can just patch locally00:24
jrolldlaube: https://review.openstack.org/#/c/136867/2/lib/ironic00:25
*** ryanpetrello has quit IRC00:26
jrollyou can actually just edit ironic.conf and restart conductor00:26
*** ChuckC has quit IRC00:29
dlaubethanks jroll!00:33
dlaubeI know how to restart ironic deploy via apt on our lab… but in devstack I normally ./unstack.sh   and ./stack.sh00:33
dlaubeis there an easier way to restart just ironic conductor in devstack?00:34
*** anderbubble has joined #openstack-ironic00:34
JayFyeah, connect up to the screen for it (ir-cond should be the name)00:35
JayF^c and restart the process00:35
dlaubethanks JayF00:35
JayFI think I usually hit ctrl+c then hit up to get the command used to spawn it00:36
NobodyCamJayF: ++00:36
openstackgerritMichael Davies proposed openstack/ironic-specs: Proposal to add logical names to Ironic nodes  https://review.openstack.org/13443900:39
*** pensu has quit IRC00:40
*** penick has quit IRC00:40
jrolldlaube: :) np00:40
*** lucas-dinner has quit IRC00:41
*** penick has joined #openstack-ironic00:42
*** penick has quit IRC00:42
*** Marga_ has quit IRC00:45
*** Marga_ has joined #openstack-ironic00:46
dlaubehmm00:46
dlaubecan only find the sample conf00:46
dlaubeheh00:46
*** ryanpetrello has joined #openstack-ironic00:46
dlauberoot@lab7:~/devstack# find /opt/stack/ -name ironic.conf*00:46
dlaube/opt/stack/ironic/etc/ironic/ironic.conf.sample00:46
*** igordcard has quit IRC00:47
*** Marga_ has quit IRC00:52
*** Masahiro has joined #openstack-ironic00:52
*** Marga_ has joined #openstack-ironic00:52
*** ryanpetrello has quit IRC00:53
*** ryanpetrello has joined #openstack-ironic00:56
*** ryanpetrello has quit IRC01:00
*** spandhe has quit IRC01:10
*** anderbubble has quit IRC01:28
*** Masahiro has quit IRC01:30
*** Masahiro has joined #openstack-ironic01:33
dlaubehmm01:33
*** r-daneel has quit IRC01:50
*** Marga_ has quit IRC01:58
zer0c00lWhat are the drivers used by the devstack setup mentioned here? http://docs.openstack.org/developer/ironic/dev/dev-quickstart.html#deploying-ironic-with-devstack01:58
zer0c00li see the power driver is ssh01:58
zer0c00lDoes it use the pxe driver?01:58
zer0c00lIf i have to make ironic use my custom driver, where should i add it?01:58
zer0c00lIRONIC_ENABLED_DRIVERS ?01:58
zer0c00lAlso how do i see the existing enabled drivers?01:59
zer0c00lMy openrc looks like this http://paste.fedoraproject.org/155684/8557314101:59
zer0c00li can't see anything other than the ssh driver mentioned in the ironic conductor log02:00
jrolldlaube: should be /etc/ironic.conf02:00
jrollzer0c00l: IRONIC_ENABLED_DRIVERS, yeah, and restack02:01
jroll(I think)02:01
*** nosnos has joined #openstack-ironic02:02
zer0c00ljroll: enabled_drivers = fake,pxe_ssh,pxe_ipmitool02:02
openstackgerritTan Lin proposed openstack/ironic: Fixed typo in Drac management driver test  https://review.openstack.org/13802802:02
jrollzer0c00l: yeah, it needs to be there02:03
zer0c00lIt is mentioned in /etc/ironic/ironic.conf02:03
jrollIRONIC_ENABLED_DRIVERS in devstack makes it go there02:03
zer0c00lso if i add a new one there and restart the conductor the new driver should be loaded02:03
zer0c00l?02:03
jrollyes02:03
jrolloh02:03
zer0c00lsure. Thanks!02:03
jrollyou need it in setup.cfg02:03
zer0c00lsetup.cfg?02:03
zer0c00lof ironic?02:03
jrolland that requires setup.py install02:03
jrollyes02:03
*** marcoemorais has quit IRC02:03
Haomeng|2zer0c00l: ironic.conf02:03
zer0c00lok02:04
Haomeng|2zer0c00l: you can change it in ironic.conf, for example - enabled_drivers = fake,pxe_ssh,pxe_ipmitool02:04
Haomeng|2zer0c00l: and restart the ironic-conductor process02:05
zer0c00lGot it!02:05
zer0c00lThanks02:05
Haomeng|2zer0c00l: welcome02:05
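The enabled_drivers exchange above can be sketched in a few lines of Python. This is only an illustration, not Ironic's own config loader: it assumes the option sits in the [DEFAULT] section of an ironic.conf-style file, and `my_custom_driver` is a made-up name. As jroll notes, a new driver would also need an entry in Ironic's setup.cfg (plus a reinstall) before the conductor could load it; the sketch covers only the config side.

```python
import configparser

# Sample text standing in for /etc/ironic/ironic.conf (section name is an assumption).
SAMPLE_CONF = """
[DEFAULT]
enabled_drivers = fake,pxe_ssh,pxe_ipmitool
"""


def enabled_drivers(conf_text, section="DEFAULT"):
    """Return the comma-separated enabled_drivers option as a list of names."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get(section, "enabled_drivers", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


def with_driver(drivers, new_driver):
    """Return a new list that includes new_driver exactly once."""
    return list(drivers) if new_driver in drivers else list(drivers) + [new_driver]


if __name__ == "__main__":
    current = enabled_drivers(SAMPLE_CONF)
    print("currently enabled:", current)
    # 'my_custom_driver' is hypothetical; this is the value to write back
    # before restarting the conductor.
    print("enabled_drivers =", ",".join(with_driver(current, "my_custom_driver")))
```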
*** Dafna has quit IRC02:16
*** takadayuiko has joined #openstack-ironic02:16
openstackgerritHaomeng,Wang proposed openstack/ironic: boot_devices.PXE value should match with pyghmi define  https://review.openstack.org/13774502:20
*** Nisha has joined #openstack-ironic02:25
zer0c00lThe "Boot" and the "Deploy" Stuff are not decoupled in the ironic pxe drivers?02:30
zer0c00lPXE is a boot method, iscsi is a deploy method02:30
zer0c00lThe "PXEDeploy" class always deploys using iscsi method02:30
zer0c00lif i have to write my own deploy method, i need to either make a copy of PXEDeploy or decouple this thing02:31
zer0c00l?02:31
*** ChuckC has joined #openstack-ironic02:34
Haomeng|2zer0c00l: yes, good catch02:35
Haomeng|2zer0c00l: we have a bp which tries to decouple deploy and boot02:35
Haomeng|2zer0c00l: let me find it02:35
jrollzer0c00l: yeah, that's something we want to do asap02:36
jroll:(02:36
zer0c00li can take a look at it02:37
zer0c00lIs there a bug#?02:37
jrollhttps://review.openstack.org/#/q/status:open+branch:master+topic:bp/new-boot-interface,n,z02:37
jrollis where lucas has been working on it02:38
jrollI gotta run, later02:38
zer0c00lsure02:38
zer0c00llet me check02:38
*** killer_prince is now known as lazy_prince02:40
Haomeng|2yes, it is02:41
harlowja_zer0c00l have u tried using libvirt vms to be the pxeboot targets, i'm not sure if thats what ironic does (vs use the fake stuff)02:48
harlowja_that might be a way to get everything all on your laptop02:48
*** Masahiro has quit IRC02:48
harlowja_including vms that act as 'machines' that u can pxeboot02:48
harlowja_https://bugzilla.redhat.com/show_bug.cgi?id=815136 might be sorta neat too02:49
harlowja_libvirt + ipmi02:49
*** ryanpetrello has joined #openstack-ironic02:51
*** Masahiro has joined #openstack-ironic02:53
*** dlaube has quit IRC02:55
*** ramineni has joined #openstack-ironic03:02
*** achanda has joined #openstack-ironic03:03
*** Masahiro has quit IRC03:11
*** nosnos has quit IRC03:29
*** Masahiro has joined #openstack-ironic03:35
*** Haomeng|2 has quit IRC03:38
*** Masahiro has quit IRC03:40
*** Masahiro has joined #openstack-ironic03:43
*** Masahiro has quit IRC03:45
*** pensu has joined #openstack-ironic03:47
*** rloo_ has quit IRC03:47
*** lazy_prince has quit IRC03:52
*** Masahiro has joined #openstack-ironic03:55
*** rushiagr_away is now known as rushiagr03:58
*** Masahiro has quit IRC04:07
*** naohirot has joined #openstack-ironic04:07
naohirotgood afternoon ironic!04:07
*** nosnos has joined #openstack-ironic04:17
openstackgerritMerged openstack/ironic: Fixed typo in Drac management driver test  https://review.openstack.org/13802804:19
*** achanda has quit IRC04:21
*** Masahiro has joined #openstack-ironic04:28
*** ryanpetrello has quit IRC04:28
*** Marga_ has joined #openstack-ironic04:47
*** killer_prince has joined #openstack-ironic04:48
*** killer_prince is now known as lazy_prince04:48
*** Haomeng has joined #openstack-ironic04:53
*** pensu has quit IRC05:11
*** lazy_prince is now known as killer_prince05:18
*** rameshg87 has joined #openstack-ironic05:21
*** pcrews has quit IRC05:30
*** achanda has joined #openstack-ironic05:36
*** achanda has quit IRC05:38
*** achanda has joined #openstack-ironic05:39
*** pensu has joined #openstack-ironic05:40
*** achanda has quit IRC05:43
*** achanda has joined #openstack-ironic05:44
*** Masahiro has quit IRC05:46
openstackgerritHarshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver  https://review.openstack.org/13632405:53
*** lintan has joined #openstack-ironic06:02
*** Masahiro has joined #openstack-ironic06:04
*** Marga_ has quit IRC06:06
*** achanda has quit IRC06:14
*** achanda has joined #openstack-ironic06:14
*** achanda has quit IRC06:19
*** mrda is now known as mrda-away06:28
*** pradipta_away is now known as pradipta06:33
openstackgerritsandhya proposed openstack/ironic-specs: Chassis Level Node Discovery  https://review.openstack.org/13486606:34
*** Marga_ has joined #openstack-ironic06:41
*** harlowja_ is now known as harlowja_away06:45
*** killer_prince has quit IRC06:47
*** lazy_prince has joined #openstack-ironic06:47
*** chenglch has joined #openstack-ironic06:54
*** Masahiro has quit IRC07:11
*** subscope has quit IRC07:12
*** Masahiro has joined #openstack-ironic07:16
*** k4n0 has joined #openstack-ironic07:21
*** subscope has joined #openstack-ironic07:27
lintanlintan:07:40
Haomenglintan: hi07:45
Haomenglintan: understand you are trying to ping yourself for testing :)07:46
lintanHaomeng: haha, you get me :)07:47
Haomenglintan: :)07:47
Haomenglintan: but if it is working, that should be irc client bug, because, that does not make sense:)07:48
Haomenglintan: :)07:48
lintanHaomeng: :( yes, it doesn't work, as you said. I just gave it a try.07:49
Haomenglintan: :)07:49
*** LuisArizmendi has joined #openstack-ironic08:04
*** romcheg has joined #openstack-ironic08:13
*** ndipanov_gone is now known as ndipanov08:19
*** Masahiro has quit IRC08:33
*** dlpartain has joined #openstack-ironic08:36
takadayuikostackuser common-venv use-ephemeral deploy-ironic08:38
takadayuikomistook :O08:39
*** Masahiro has joined #openstack-ironic08:41
*** jcoufal has joined #openstack-ironic08:42
*** andreykurilin has joined #openstack-ironic08:45
*** vinbs has joined #openstack-ironic08:50
Nishahi dtantsur|afk08:51
*** nosnos has quit IRC09:00
openstackgerritNisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-discover-properties  https://review.openstack.org/10095109:14
*** pradipta is now known as pradipta_away09:15
*** pradipta_away is now known as pradipta09:17
*** pradipta is now known as pradipta_away09:17
*** jistr has joined #openstack-ironic09:18
*** chenglch|2 has joined #openstack-ironic09:24
*** chenglch has quit IRC09:27
*** dlpartain1 has joined #openstack-ironic09:31
*** dlpartain has quit IRC09:31
*** igordcard has joined #openstack-ironic09:31
*** andreykurilin has quit IRC09:32
*** viktors|afk has quit IRC09:34
*** viktors has joined #openstack-ironic09:34
*** athomas has joined #openstack-ironic09:35
*** dlpartain1 has quit IRC09:35
*** lucasagomes has joined #openstack-ironic09:39
*** yuriyz has quit IRC09:40
*** viktors has quit IRC09:40
*** derekh has joined #openstack-ironic09:40
*** dtantsur|afk is now known as dtantsur09:41
dtantsurMorning!09:42
naohirotdtantsur: good morning :)09:42
*** foexle has joined #openstack-ironic09:42
*** lsmola has quit IRC09:43
openstackgerritMerged openstack/ironic: boot_devices.PXE value should match with pyghmi define  https://review.openstack.org/13774509:43
*** jcoufal_ has joined #openstack-ironic09:44
*** jcoufal has quit IRC09:47
*** chenglch|2 has quit IRC09:52
Nishadtantsur, good morning09:53
Nishai have updated the spec09:54
Nishadtantsur, i have now used the term introspect in the spec instead of discovery09:55
dtantsurNisha, good. The parent spec is not updated with it, but I believe we all agreed.09:55
dtantsurand good morning :)09:55
Nishadtantsur, let me know your comments/suggestions on it09:55
dtantsuryeah sure, gimme a moment09:55
Nishadtantsur, :)09:55
Nishadtantsur, :)09:56
dtantsurNisha, shouldn't CLI command be also called node-introspect-properties? (or even just node-introspect)09:56
Nishai actually wanted to ask that before posting the spec... :)09:57
Nishabut then i thought i will do that once discussed09:57
dtantsurNisha, also there are a few places where you still use DISCOVERING and DISCOVERYFAIL09:57
NishaOk. Let me see.09:57
NishaI will repost the spec09:57
*** lsmola has joined #openstack-ironic09:58
dtantsurNisha, also introspection_timeout option should be mentioned in "deployer impact" section (with the default value)09:58
Nishaok09:59
dtantsurlemme post the comments on the spec, IRC is a bad reference source :)10:00
*** sambetts has joined #openstack-ironic10:01
*** viktors has joined #openstack-ironic10:01
*** Masahiro has quit IRC10:03
dtantsurdone10:04
*** luisjariz has joined #openstack-ironic10:04
*** luisjariz has quit IRC10:05
*** LuisArizmendi has quit IRC10:07
dtantsurlucasagomes, o/ may I use you for discoverd reviews today as well? :)10:07
lucasagomesdtantsur, heh sure10:08
lucasagomesmorning all10:08
dtantsurlucasagomes, I have 4 more, the most important right now being https://review.openstack.org/#/c/137418/ and https://review.openstack.org/#/c/137361/10:08
dtantsurthanks in advance :)10:08
takadayuikoHi, lucasagomes10:08
* dtantsur had a hard rebase yesterday...10:09
lucasagomestakadayuiko, hello there :)10:09
dtantsurtakadayuiko, o/10:09
sambettsdtantsur, lucasagomes: I had to head out yesterday evening, was a final decision made about the state machine?10:09
takadayuikodtantsur, o/10:09
dtantsursambetts, I guess it was "carry on with what we have now"10:09
*** Masahiro has joined #openstack-ironic10:09
lucasagomessambetts, yeah improve the one we have now10:10
sambettsso continue to implement the ideas proposed at the summit?10:10
dtantsuryep10:10
lucasagomessome of the stuff proposed has made it into the new model, like fewer multipaths, states are now classified as passive/active (kinda like the state action), some name changes10:11
sambettscool cool :-)10:11
openstackgerritNisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect  https://review.openstack.org/10095110:11
sambettslucasagomes: ah ok, just refined a bit from the whiteboard scribble10:11
lucasagomesyup10:11
Nishadtantsur, lucasagomes could i request your reviews on https://review.openstack.org/134022 and https://review.openstack.org/13702410:12
*** yuriyz has joined #openstack-ironic10:12
Nishaposted long back and no reviews till now10:13
dtantsuryeah, review queue is huge for us, sorry :) will try to find time today10:13
Nishadtantsur, thanks10:13
dtantsurlucasagomes, thanks, updated10:33
*** lsmola has quit IRC10:34
lucasagomesNisha, #137024 reviewed10:35
*** naohirot has quit IRC10:37
*** vdrok has joined #openstack-ironic10:38
lucasagomesdtantsur, what does /v1/continue do?10:39
* lucasagomes brb 1 sec10:39
dtantsurlucasagomes, it's an endpoint receiving callback from the ramdisk10:39
dtantsurand I know that I suck at naming :)10:39
lucasagomesdtantsur, this is where all the pos_* plugins will run?10:42
lucasagomesI'm wondering if the http request won't timeout there making it sync10:42
*** pelix has joined #openstack-ironic10:42
dtantsurlucasagomes, yep. it corresponds to process() function10:42
*** Masahiro has quit IRC10:42
dtantsurwell, it depends on the timeout :)10:42
dtantsureven on the timeout of CURL, right?10:43
dtantsur(talking about the bash ramdisk)10:43
lucasagomesyeah, usually tools have their own timeout10:44
lucasagomesyou can modify it with some -- options10:44
lucasagomesdtantsur, but does the ramdisk need to wait for the /continue to finish?10:44
dtantsurlucasagomes, actually it took 1-2 seconds last time I checked :)10:44
lucasagomesI thought it would post the data and poweroff and the service would then process the data and do what it needs to do10:44
dtantsurlucasagomes, it does, if we want to implement IPMI credentials setting10:44
lucasagomesoh10:45
dtantsurlucasagomes, also: it's nice to leave the ramdisk in the troubleshoot mode if we sent some crap and discoverd returned an error10:45
lucasagomesdtantsur, yeah I'm just wondering when more plugins come in10:45
lucasagomeswe can kinda lose the control of the time10:45
lucasagomesand being async sounds more flexible10:45
lucasagomesunless we have a notification to tell the ramdisk to continue that's hard10:46
lucasagomesbut I think that for v1/ it may be fine to leave it sync10:46
dtantsurwell... if plugins don't need to return the result to the ramdisk, they could use greenthread.spawn() and become async10:46
mrda-awayhey jroll, are you happy with my response to your query on the logical name spec?10:46
dtantsurif they do need to return the result, it can't be helped10:46
lucasagomesdtantsur, right10:47
lucasagomesok grand then :)10:47
dtantsurcool :)10:47
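The sync/async trade-off discussed above can be illustrated roughly as follows. This is a sketch under stated assumptions, not discoverd's code: dtantsur mentions greenthread.spawn(), while this example swaps in the standard library's threading module so it runs without eventlet, and the hook names and the data they return are invented for illustration. Hooks whose result must go back to the ramdisk run inline; the rest are started in the background so the HTTP request can return quickly.

```python
import threading


def run_hooks(data, sync_hooks, async_hooks):
    """Run sync hooks inline and collect their results; start async hooks in threads."""
    response = {}
    for hook in sync_hooks:
        # Results of sync hooks are returned to the caller (the ramdisk).
        response.update(hook(data))
    workers = []
    for hook in async_hooks:
        # Each async hook gets its own copy of the data and runs in the background.
        worker = threading.Thread(target=hook, args=(dict(data),), daemon=True)
        worker.start()
        workers.append(worker)
    return response, workers


# Hypothetical hooks, for illustration only.
def set_ipmi_credentials(data):
    return {"ipmi_username": "admin-for-" + data["mac"]}


processed = []


def record_inventory(data):
    processed.append(data["mac"])


if __name__ == "__main__":
    reply, background = run_hooks(
        {"mac": "52:54:00:12:34:56"}, [set_ipmi_credentials], [record_inventory]
    )
    print("returned to ramdisk:", reply)
    for worker in background:
        worker.join()
    print("processed in background:", processed)
```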
* dtantsur yoga time, brb10:47
*** lsmola has joined #openstack-ironic10:50
*** ramineni has quit IRC11:06
openstackgerritLucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields  https://review.openstack.org/13776211:07
openstackgerritLucas Alvares Gomes proposed openstack/ironic: Extend API multivalue fields  https://review.openstack.org/13776211:08
*** Nisha has quit IRC11:14
*** bradjones has quit IRC11:14
*** Nisha has joined #openstack-ironic11:15
*** rameshg87 has quit IRC11:15
*** bradjones has joined #openstack-ironic11:19
*** bradjones has quit IRC11:19
*** bradjones has joined #openstack-ironic11:19
*** alexpilotti_ has joined #openstack-ironic11:19
*** alexpilotti has quit IRC11:20
*** alexpilotti_ has quit IRC11:23
*** vinbs has quit IRC11:38
*** smoriya has quit IRC11:41
*** jistr is now known as jistr|training11:42
*** Masahiro has joined #openstack-ironic11:43
*** Masahiro has quit IRC11:48
*** takadayuiko has quit IRC11:52
*** romcheg has quit IRC11:54
*** romcheg has joined #openstack-ironic11:54
*** naohirot has joined #openstack-ironic11:54
openstackgerritNisha Agarwal proposed openstack/ironic-specs: Discover node properties using new CLI node-introspect  https://review.openstack.org/10095111:56
*** Haomeng|2 has joined #openstack-ironic11:58
*** Haomeng has quit IRC11:59
openstackgerritNisha Agarwal proposed openstack/ironic-specs: Discover node properties for iLO drivers  https://review.openstack.org/10300712:13
*** pensu has quit IRC12:14
openstackgerritNisha Agarwal proposed openstack/ironic-specs: uefi support for agent-ilo driver  https://review.openstack.org/13702412:25
*** k4n0 has quit IRC12:28
*** lucasagomes is now known as lucas-hungry12:32
*** lazy_prince is now known as killer_prince12:43
*** ryanpetrello has joined #openstack-ironic12:43
*** Masahiro has joined #openstack-ironic12:52
*** Masahiro has quit IRC12:57
*** erwan_taf has joined #openstack-ironic13:11
*** dprince has joined #openstack-ironic13:15
*** lucas-hungry is now known as lucasagomes13:23
*** igordcard has quit IRC13:28
*** killer_prince is now known as lazy_prince13:29
*** igordcard has joined #openstack-ironic13:31
openstackgerritOleksii Chuprykov proposed openstack/ironic-python-agent: Use oslo.utils and oslo.concurrency  https://review.openstack.org/13811613:33
*** ryanpetrello has quit IRC13:38
*** ryanpetrello has joined #openstack-ironic13:39
*** Marga_ has quit IRC13:47
*** rloo has joined #openstack-ironic13:54
*** rushiagr is now known as rushiagr_away13:55
*** igordcard has quit IRC13:56
*** jjohnson2 has joined #openstack-ironic13:57
*** Nisha has quit IRC13:59
*** ryanpetrello_ has joined #openstack-ironic14:06
*** ryanpetrello has quit IRC14:08
*** ryanpetrello_ is now known as ryanpetrello14:08
*** linggao has joined #openstack-ironic14:13
*** ndipanov has quit IRC14:16
*** Marga_ has joined #openstack-ironic14:18
*** ryanpetrello has quit IRC14:18
*** ryanpetrello has joined #openstack-ironic14:19
*** Marga_ has quit IRC14:23
ChuckCmorning ironic14:24
openstackgerritMerged openstack/ironic-specs: Proposal to add logical names to Ironic nodes  https://review.openstack.org/13443914:35
jrollmorning everybody14:35
Shrewsmorning jroll14:35
Shrewsand ChuckC14:36
jrollmrda-away: landed your spec, I think you can provide a node uuid in the body and I forgot about that14:36
jrollheya Shrews, ChuckC :)14:36
rloomorning ChuckC, jroll, Shrews14:36
*** dlaube has joined #openstack-ironic14:36
Shrewso/ rloo14:37
jroll\o rloo14:37
jroll(jinx)14:37
Shrewsget outta my head jroll14:37
jroll:D14:38
openstackgerritHarshada Mangesh Kakad proposed openstack/ironic: Add documentation for SeaMicro driver  https://review.openstack.org/13632414:38
rloo'great minds think alike' ?14:38
jrollgreat is an interesting word for my mind :P14:40
Shrewsgreat is an incorrect word for my mind :P14:41
*** Masahiro has joined #openstack-ironic14:41
rloofools seldom differ? ;)14:43
jrolllolol14:44
dtantsurMorning ChuckC, jroll, Shrews, rloo!14:44
jrollhey dtantsur :)14:44
rlooafternoon dtantsur14:44
Shrewsrloo: that's more appropriate  :)14:44
Shrewshey dtantsur14:44
lucasagomesjrist, Shrews rloo ChuckC morning14:46
*** Masahiro has quit IRC14:46
lucasagomesjroll, :)14:46
rlooafternoon lucasagomes14:46
lucasagomesjr<tab> is dangerous14:46
jrolllol14:46
*** rushiagr_away is now known as rushiagr14:49
*** lazy_prince is now known as killer_prince14:50
*** naohirot has quit IRC14:53
NobodyCamgood morning Ironic-ers14:55
dtantsurNobodyCam, o/14:55
jrollhiya NobodyCam :)14:56
lucasagomesNobodyCam, morning14:56
NobodyCammorning dtantsur jroll lucasagomes :)14:56
* jroll tosses a pot of coffee to NobodyCam14:56
jrollhmm, need to find a nova core14:56
NobodyCamoh thank you jroll :) neeeded :)14:56
NobodyCamnova core?14:57
jrollyeah for https://review.openstack.org/#/c/98930/14:58
jrollconfigdrive14:58
dlaubeg'morning14:59
*** r-daneel has joined #openstack-ironic15:00
jrollhiya dlaube :)15:00
jrollNobodyCam: found one \o/15:01
jrolldarn, now I have to actually write code15:01
NobodyCamoh Nice spec15:01
NobodyCamlol15:01
NobodyCammorning dlaube15:01
*** rushiagr is now known as rushiagr_away15:04
lucasagomesjroll, o/15:11
rloojroll: I opened a bug about setting maintenance mode off via node-update, needing to clear maint reason15:12
rloojroll: https://bugs.launchpad.net/ironic/+bug/139819115:12
lucasagomesjroll, any news on the rebuild vs configdrive thing?15:12
rloojroll: so we don't forget ;)15:12
NobodyCammorning rloo :)15:13
rloomorning to the man who hopefully has had coffee15:13
*** ndipanov has joined #openstack-ironic15:13
NobodyCam:) yep :) working on first cup now15:14
jrolllucasagomes: we haven't talked about it more, yesterday was pretty busy15:16
lucasagomes:) yeah I hear ya15:17
jrollrloo: cool, thanks15:18
*** Marga_ has joined #openstack-ironic15:19
*** Marga_ has quit IRC15:23
*** lynxman has quit IRC15:25
*** lynxman has joined #openstack-ironic15:26
NobodyCamlucasagomes: your comment on https://review.openstack.org/#/c/132137 is that because you foresee anyone wanting to set the uuid? or is there another reason I'm not thinking about?15:27
NobodyCams/foresee/don't foresee15:27
lucasagomesNobodyCam, yeah, I mean I can't think about any use case where someone wants to create a node in Ironic and input a UUID by hand15:28
lucasagomesI understand it's supported in the API so makes sense to be able to do in the client15:28
jrollI would probably use it if my cmdb used UUIDs15:28
lucasagomesbut a UUID seems like something that should always be generate (to guarantee uniqueness)15:28
dtantsurlucasagomes, discoverd may be such a thing, if folks do force me to handle creation of Ironic nodes (which I try to avoid)15:28
jrolljust for easy linkage15:28
lucasagomesright15:29
NobodyCamyea, I was thinking about folks who have a existing cmdb and just wanted to keep the same id's15:29
dtantsuryeah and CMDB use case too15:29
lucasagomesyeah, ok :)15:29
lucasagomesI was fine with the change15:29
jrolllucasagomes: the uuid library doesn't ensure uniqueness by remembering uuids or anything, there can actually be collisions I would guess15:29
lucasagomesmy -1 is because of the lack of tests15:29
NobodyCamstill needs tests15:29
NobodyCamyea just wanted to make sure15:29
jrollwe have the unique constraint for ensuring uniqueness :P15:29
lucasagomesyeah, I haven't thought about the CMDB thing15:30
lucasagomesI can see it, but still, I find it odd to input a UUID by hand15:30
NobodyCamanyone seen arun on line?15:30
lucasagomesin the conference we talked about having alias and all for nodes15:30
lucasagomesI like that15:31
jrollyeah15:31
jrollI also find it odd, but could be useful15:31
jrollalso, I just landed mrda's spec for the name thing15:31
lucasagomesw00t15:31
NobodyCamI hope folks doing that are adding nodes via script and not hand :-p15:31
NobodyCamnice15:31
jroll:P15:32
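jroll's point, that a generated UUID is not checked for uniqueness and that the database's unique constraint is what enforces it, can be sketched like this. The schema and the `create_node` helper are hypothetical, and an in-memory SQLite table stands in for Ironic's real database layer. The same constraint covers both generated UUIDs and ones supplied by a caller, for example from a CMDB.

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory table whose uuid column carries a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL UNIQUE)"
    )
    return conn


def create_node(conn, node_uuid=None):
    """Insert a node, using the caller's UUID if given, else a random one."""
    # uuid.UUID() validates and normalizes a caller-supplied value.
    node_uuid = str(uuid.UUID(node_uuid)) if node_uuid else str(uuid.uuid4())
    try:
        conn.execute("INSERT INTO nodes (uuid) VALUES (?)", (node_uuid,))
    except sqlite3.IntegrityError:
        # The database, not the uuid library, is what rejects duplicates.
        raise ValueError("a node with uuid %s already exists" % node_uuid)
    conn.commit()
    return node_uuid


if __name__ == "__main__":
    db = make_db()
    print("generated:", create_node(db))
    cmdb_id = "1be26c0b-03f2-4d2e-ae87-c02d7f33c123"
    print("supplied:", create_node(db, cmdb_id))
    try:
        create_node(db, cmdb_id)
    except ValueError as exc:
        print("rejected:", exc)
```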
lucasagomesjroll, how IPA picks the disk to use for the deployment?15:33
lucasagomesu guys have some mechanism there? or pick the first one?15:33
jrollit's pluggable15:34
jrollbut it chooses the smallest disk above 4GB15:34
jrollhttps://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/hardware.py#L262-27015:34
* lucasagomes looks15:34
jrollyou could override that with a little hardware manager plugin15:35
jrollsuper easy15:35
lucasagomesnice yeah!15:35
jrollfor example this is our manager for our hardware https://github.com/rackerlabs/onmetal-ironic-hardware-manager/blob/master/onmetal_ironic_hardware_manager/__init__.py#L3915:36
jrolljust replace the methods in that class with whatever you want to override15:36
lucasagomesjroll, having some hints in the node.properties about which disk to pick would make sense for u guys too?15:36
lucasagomes(like UUID, WWN, etc...)15:36
jrolllucasagomes: it would make sense, we personally probably wouldn't use it but yeah15:36
lucasagomesack15:37
* lucasagomes clicks on the example15:37
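The default disk choice jroll describes ("the smallest disk above 4GB") could look roughly like the sketch below. It is not the ironic-python-agent code linked above: the `BlockDevice` tuple and the function name are hypothetical, and treating the threshold as "at least 4 GiB" is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for whatever the agent uses to describe a disk.
BlockDevice = namedtuple("BlockDevice", ["name", "size"])

MIN_SIZE_BYTES = 4 * 1024 ** 3  # assumed reading of "4GB"


def pick_install_device(devices, min_size=MIN_SIZE_BYTES):
    """Return the smallest device that is at least min_size bytes."""
    candidates = [dev for dev in devices if dev.size >= min_size]
    if not candidates:
        raise ValueError("no disk of at least %d bytes found" % min_size)
    return min(candidates, key=lambda dev: dev.size)


if __name__ == "__main__":
    disks = [
        BlockDevice("/dev/sda", 2 * 1024 ** 3),     # too small, skipped
        BlockDevice("/dev/sdb", 500 * 1024 ** 3),
        BlockDevice("/dev/sdc", 32 * 1024 ** 3),    # smallest eligible disk
    ]
    print(pick_install_device(disks).name)
```

A hardware manager plugin of the kind jroll mentions would replace this selection with its own logic, for instance by matching a hint such as a WWN from node.properties, as lucasagomes suggests.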
rlooNobodyCam, lucasagomes. wrt 132137, i could be wrong but that discussion about PUT in the meeting was partially due to a bug, where they wanted to specify the uuid when creating a node15:48
rlooNobodyCam: and I mentioned yesterday to arun that it would be nice if he could add a test to that patch ;)15:49
*** dlaube has quit IRC15:51
NobodyCamrloo: awesome Thank you :)15:52
*** dlaube has joined #openstack-ironic15:55
*** zz_jgrimm is now known as jgrimm15:57
*** alexpilotti has joined #openstack-ironic15:58
lucasagomesrloo, ahh right15:58
lucasagomesso it seems people do have many use cases for it :)15:59
lucasagomeswhich is good15:59
rloolucasagomes: we aim to please. ha ha.15:59
*** alexpilotti has quit IRC15:59
*** alexpilotti has joined #openstack-ironic15:59
lucasagomesrloo, lol! aye16:00
*** yjiang5 is now known as yjiang5_away16:01
*** anderbubble has joined #openstack-ironic16:01
*** pcrews has joined #openstack-ironic16:04
NobodyCamhummm16:06
lucasagomesdtantsur, rloo added a comment about DISCOVERING->PREBOOTING/AVAILABLE on the state machine thing16:12
NobodyCamjroll: the IPA `iso-image-create` does not create the agent image itself... correct?16:13
jrollmmm16:13
* jroll looks16:13
*** rushiagr_away is now known as rushiagr16:13
jrollNobodyCam: no, but 'make iso' will16:14
jrollthrough the dependencies16:14
rloothx lucasagomes. it isn't clear to me whether one can opt out of zapping/discovery at any time 'around' the state machine?16:14
NobodyCam:) commenting on the agent-ilo uefi support spec16:14
lucasagomesrloo, it doesn't seem to be16:16
lucasagomesbut I think that those states could be no-ops16:16
rloolucasagomes: do people really want to eg do discovering every time a node is going to be made available again?16:17
lucasagomesif the driver doesn't contain any zapping steps, or introspection interface16:17
lucasagomeszapping and discovering just moves to the next state16:17
lucasagomessame for prebooting16:17
rloolucasagomes: yeah, but what if the driver wants to do discovering once, the very first time the node is enrolled?16:17
lucasagomesrloo, if zapping is updating firmware, for example, it can introduce new capabilities that need to be discovered16:17
rloolucasagomes: but what if zapping doesn't do anything that requires discovering.16:18
lucasagomesrloo, I hope that we are going to introduce a state before INIT to do that16:18
lucasagomesso that it can be configurable, once the node enters the main loop (ZAPPING->...)16:18
lucasagomesthe second discover can be configured16:18
lucasagomesrloo, so it can be skipped16:19
openstackgerritDmitry Tantsur proposed openstack/ironic-specs: In-band hardware properites discovery via ironic-discoverd  https://review.openstack.org/13560516:19
jrollwhoa, I need to look at the state machine again, we should never automatically go to discovered :|16:19
rloolucasagomes: as long as something can be skipped ...16:19
lucasagomesrloo, but that doesn't totally invalidate discovering, I believe that with discovering you could catch things like "hey this disk doesn't exist anymore"16:19
lucasagomesdue to some failure16:19
*** Marga_ has joined #openstack-ironic16:20
rloojroll: yeah, please look. i think we should try to give this spec high priority.16:20
lucasagomesjroll, I was arguing about, making some states optional16:20
jrolllucasagomes: discovery in that case would touch node.properties to update the disk size or whatever16:20
jrollnot alert that a disk is gone16:20
jroll(AIUI)16:20
lucasagomesyeah in that diagram it seems it's going to do that16:21
dtantsurreference ramdisk for discoverd is merged https://review.openstack.org/#/c/122151/   \o/16:21
jrollyeah, do not want16:21
jrolldtantsur: nice!16:21
lucasagomesthe way I thought about it before was to have a consistency check that could be implemented16:21
lucasagomesafter discovering16:21
lucasagomesand after AVAILABLE16:21
lucasagomesbecause the machine could be hanging there in AVAILABLE for days before a nova boot comes in16:22
lucasagomesand something may fail, or someone wrongly pulled a cable etc16:22
lucasagomesthat way we would reject the machine if it was picked for deployment but the consistency check has failed16:22
lucasagomes(and nova with retry filter would then pick another machine to deploy that instance)16:22
jrollyeah, agree, we need to check for consistency, not sure if that should be in zapping or a different thing, but I really don't think it should be in discovery16:23
dtantsurlucasagomes, on the review you didn't answer the question, how (and why) discoverd will figure out whether to move node to prebooting or available :-/16:23
rloowe already have periodic task that checks the power; would a periodic check for consistency on nodes that are avail do what you want?16:23
lucasagomesdtantsur, my suggestion was to always move to prebooting16:23
jrolldtantsur: I would think ironic would handle that?16:23
jrollrloo: just a check at some point before making the node available16:24
jrollrloo: check that all the ram/disks/networks/etc are there16:24
lucasagomesjroll, yeah I don't think it should be discovering either16:24
dtantsurjroll, Ironic can't :) Ironic should be told that discoverd is done. and it's told by moving a node to the next state16:24
rloojroll: i thought lucas mentioned that the node is already in avail and something happens to it in that state16:24
lucasagomesbut I was trying to fit in the current diagram16:24
*** Marga_ has quit IRC16:24
jrolldtantsur: not sure how I feel about an api call with a target state of available16:25
lucasagomesrloo, jroll yeah that too. so we added two consistency checks (same code/action) in diff stages16:25
lucasagomesone after discovering and one after available16:25
jrolldtantsur: perhaps /nodes/uuid/states/provision {'target': 'discoverydone'} and ironic decides what to do16:25
*** yjiang5_away is now known as yjiang516:25
dtantsurjroll, that works for me16:25
jrolllucasagomes: discovering changes node.properties, why does it need to verify node.properties16:25
lucasagomesjroll, I was trying to fit in the current diagram16:26
lucasagomesbut I think we need some other thing there to actually perform the checking16:26
dtantsurjroll, lucasagomes, will I be able to move it to INTROSPECTIONFAIL too?16:26
lucasagomesdtantsur, somehow you have to tell ironic that it has failed to introspect16:26
lucasagomesso I believe yes16:26
lucasagomesidk about an API call with target:introspectionfail, that makes little sense to me16:27
lucasagomesbut some way you gotta notify it16:27
dtantsurlucasagomes, for now I'm planning on Ironic timing out the introspection :D but it doesn't look too friendly16:27
jrollhmm16:27
jrollthis is hard :(16:28
lucasagomesyeah, it's not "fast" enough :)16:28
lucasagomesjroll, yup16:28
lucasagomesjroll, the way we are architect'ing things, we could use a state to do few other steps16:29
dtantsurI guess it's the first time we try to fit a 3rdparty to do some long-running node job for Ironic...16:29
lucasagomeslike DEPLOYING could check consistency16:29
rloodtantsur: your discoverd needs that wait flag thing that we haven't described yet, right?16:29
dtantsurrloo, yes16:29
lucasagomes(I don't necessarily think it's great, but I believe that it could be part of that state)16:29
dtantsurI mean, it can live without it, but it will be a strange state of node :)16:29
*** Masahiro has joined #openstack-ironic16:30
rloodtantsur: it seems like part of that wait flag mechanism should allow for whatever it is waiting for, to indicate that it is done and whether it succeeded or not.16:30
rloodtantsur: or to time out waiting ;)16:30
dtantsurrloo, good idea16:30
lucasagomessounds good rloo16:31
rloodtantsur: and then it seems like if it fails, it does into whatever *FAIL state associated with the state where the wait was?16:31
rloos/does/goes/16:31
dtantsuryeah16:31
rloowith some meaningful msg of course :-)16:31
dtantsurheh, do we need 'transition_reason' now?16:32
dtantsurlike we had 'maintenance_reason'?16:32
jroll... or use last_error?16:33
jrollthat's exactly what last_error is for16:33
rloodtantsur: dunno. I need to wait for the states to settle down first, etc. then we can work on adding the bits we need to deal with it all ;)16:33
dtantsurjroll, make last_error also available for writing from outside?16:34
dtantsurrloo, right16:34
jrollsigh16:34
*** Masahiro has quit IRC16:34
rloowhy the sigh, jroll?16:35
jrolldiscoverd or whatever sends {"target": "DISCOVERYFAIL", "reason": "everything is broken"}, ironic handles the state change and updating last_error16:35
lucasagomesmaybe as part of the body request to unset the wait flag16:35
lucasagomesyou can give the message16:35
jrollyeah16:35
lucasagomesnot even need the target16:35
lucasagomescause if it's DISCOVERING the DISCOVERYFAIL is the error state associated with it16:35
lucasagomesso ironic can figure that out16:35
jroll/nodes/uuid/discovery {"result": "success", "reason": ""}16:36
jrollfor example16:36
jroll/nodes/uuid/discovery {"result": "error", "reason": "busted"}16:36
jrollsomething like that16:36
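
The proposal above (the external service reports only a result and a reason, and Ironic works out the resulting state itself) could be sketched roughly as follows. This is a minimal illustration, not Ironic code: `report_result`, the `FAIL_STATE` and `NEXT_STATE` tables, and the `DISCOVERYDONE` state name are all assumptions made for the example.

```python
# Hypothetical tables: which state a waiting state moves to on failure/success.
FAIL_STATE = {"DISCOVERING": "DISCOVERYFAIL"}
NEXT_STATE = {"DISCOVERING": "DISCOVERYDONE"}  # assumed name, not from Ironic


def report_result(node, body):
    """Apply an external result report to a node dict and return the node.

    `node` is a plain dict standing in for an Ironic node; `body` is the
    JSON-like request body, e.g. {"result": "error", "reason": "busted"}.
    """
    current = node["provision_state"]
    if current not in FAIL_STATE:
        raise ValueError("node is not waiting on an external service")

    if body["result"] == "success":
        node["provision_state"] = NEXT_STATE[current]
        node["last_error"] = None
    elif body["result"] == "error":
        # The caller never names the target state; it is derived here.
        node["provision_state"] = FAIL_STATE[current]
        node["last_error"] = body.get("reason") or "unknown error"
    else:
        raise ValueError("result must be 'success' or 'error'")
    return node
```

With this shape the caller cannot request an arbitrary target state, which is the concern raised earlier about an API call with a target of available.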
rlooI have to read dtantsur's spec still. I don't like discoverd hooked into the introspecting state.16:36
jrollI don't think we should be talking about intimate details of how discovery works before we figure out the states16:37
lucasagomesI think I will need a small spec for the disk hints :/16:40
lucasagomesto determine what could be used to figure out which device to pick, such as UUID, NAME, MODEL, SERIAL, WWN etc...16:41
jrolland any combination16:42
lucasagomesyes16:42
* lucasagomes starts a spec16:42
* jroll bbiab16:43
*** dprince has quit IRC16:45
*** romcheg has quit IRC16:48
openstackgerritVictor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine.  https://review.openstack.org/13382816:48
devanandamorning, all16:51
*** Marga_ has joined #openstack-ironic16:52
NobodyCamgood morning devananda16:53
* devananda reads scrollback16:53
*** achanda has joined #openstack-ironic16:53
dtantsurdevananda, morning16:54
lucasagomesdevananda, morning16:54
devanandalucasagomes: consistency check after AVAILABLE, in the deploy pipeline? a) I would rather not do that for several reasons, b) we're really adding a lot of features to this state machine now ...16:55
*** achanda has quit IRC16:56
lucasagomesdevananda, right, idk how long the machine could be sitting there waiting for the nova boot command to come16:56
*** jcoufal_ has quit IRC16:57
lucasagomesso it would be nice if we could test things like is the cable still connected to the right port (like JoshNang does when zapping)16:57
lucasagomesdoesn't need to be a full check, but allowing some check would be nice (optional, even if part of the deploy)16:57
*** pensu has joined #openstack-ironic16:58
*** achanda has joined #openstack-ironic16:58
NobodyCamdevananda: didn't you add rebuild to the new state machine?16:59
devanandalucasagomes: sure, I don't know that either. it could be just a few seconds, too.16:59
*** igordcard has joined #openstack-ironic17:00
lucasagomesdevananda, sure, it could be there for 10s or 1 month17:00
lucasagomeswe can't predict that17:00
lucasagomes(that's why I thought that some checks would be nice to have)17:01
*** Nisha has joined #openstack-ironic17:01
devanandahaving hints in node.properties about which disk to pick -- also not something I want to encourage. how is that not getting into the department of snowflake-management?17:01
lucasagomesdevananda, if we are supporting RAID17:02
lucasagomeshow I can tell ironic to use the raid device I just created?17:02
dtantsuralso cases like big RAID for data and small SSD for an OS image... Someone told me about it.17:03
lucasagomesdevananda, it's not about describing all disks we have in the server17:03
lucasagomesbut giving hints about which one we should deploy the image onto17:03
NobodyCamI have seen use cases where the customer wanted OS to be sdb not sda17:03
lucasagomesthat seems aligned with the project scope17:04
lucasagomesit's useful data for the deployment17:04
victor_lowthernot to mention that what /dev/sda is in your discovery image may not be the same as it is to the installed OS.17:04
* victor_lowther has encountered that17:04
lucasagomesvictor_lowther, exactly, device names can change on each boot17:05
lucasagomesbut things like serial, wwn, UUID17:05
lucasagomescan't17:05
victor_lowtherright17:05
lucasagomesthose are the hints I'm thinking of17:05
lucasagomesor even, if u want to make it more generic you could say17:05
victor_lowtherdevananda: they will definitely be needed.17:05
lucasagomeshey pick the disk which is >=1TB17:05
victor_lowtherit is snowflake management for vms because their idea of disk order is much more predictable.17:06
dtantsurlucasagomes, this logic is better left to discoverd...17:06
lucasagomesdtantsur, it's at deploy time17:06
dtantsurbut being able to precisely point to a disk seems very much needed to me17:06
lucasagomesdtantsur, idk how discoverd can do that17:06
lucasagomesvictor_lowther, +117:06
dtantsurlucasagomes, well, in RAID case it probably can't. w/o RAID it can pre-populate this property based on some logic17:07
victor_lowtherscan /dev/disks/by-whatever and pick the awesomest ones17:07
dtantsur(in theory, it's not implemented)17:07
lucasagomesvictor_lowther, yup, or lsblk17:07
victor_lowtherafter your raid arrays are created.17:07
lucasagomesbut in Ironic POV, hints are generic, in the ramdisk we can create the logic using sysfs or lsblk or whatever to figure it out17:07
victor_lowtherya17:08
lucasagomesdevananda, does that sound fair?17:08
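
The hints idea discussed above (generic hints stored on the node, with the ramdisk deciding how to match them against real devices) could look something like the sketch below. The device dictionaries, the hint keys (`serial`, `wwn`, `model`, `min_size_gb`) and the function name `find_root_device` are assumptions chosen for illustration; the real spec had not been written at this point in the log.

```python
def find_root_device(devices, hints):
    """Return the single device that satisfies every hint.

    `devices` is a list of dicts describing block devices, as a ramdisk
    might build from lsblk or sysfs. `hints` maps a property name to the
    wanted value; the special key "min_size_gb" is a lower bound on size.
    """
    def matches(dev):
        for key, wanted in hints.items():
            if key == "min_size_gb":
                if dev.get("size_gb", 0) < wanted:
                    return False
            elif dev.get(key) != wanted:
                return False
        return True

    candidates = [dev for dev in devices if matches(dev)]
    # Refuse to guess: zero or several matches means the hints are
    # not specific enough to pick a disk repeatably.
    if len(candidates) != 1:
        raise LookupError(f"expected exactly one match, got {len(candidates)}")
    return candidates[0]
```

Matching on persistent properties rather than on names like /dev/sda is the point: the kernel-assigned name can differ between boots and between ramdisks, while serial and WWN do not.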
*** Marga_ has quit IRC17:09
* devananda returns to reading scrollback, had to pop into a meeting ...17:09
*** Marga_ has joined #openstack-ironic17:09
victor_lowtherCrowbar used an unholy combination of /dev/disk/by-id and the actual full device paths in sysfs to figure things out.17:10
lucasagomeshardware device path?17:10
victor_lowtherdue to fun on one of our targets where some random disk in an external drive array was /dev/sda17:11
lucasagomesheh yeah without udev rules you can't predict it17:11
lucasagomes(unless it's LVM so u can :))17:11
victor_lowtherdue to PCI bus ordering and module insertion ordering fun.17:11
victor_lowtherLVM and UUIDs are basically useless in the scenario I am thinking of17:12
victor_lowtherbecause they are not applicable to raw physical disks.17:12
lucasagomessure17:12
victor_lowtherso don't help you figure out where the boot partition should be.17:12
victor_lowtherUEFI is sorta nice there in that your boot partition can be anywhere as long as UEFI can see it.17:13
victor_lowtherlucasagomes: ya, hardware device path.17:14
lucasagomesright in this case we want to find the root device, and lay the image onto it17:14
*** viktors is now known as viktors|afk17:14
lucasagomesthe root partition (in case of a full disk image) will be part of the image itself17:14
lucasagomesvictor_lowther, right17:14
victor_lowtherso, dtantsur17:16
victor_lowtherhow do you want discoverd to work in the state machine?17:16
rlooNobodyCam: I just looked at this spec https://review.openstack.org/#/c/101122/ which is still targeted for juno. Shouldn't it have been abandoned?17:16
NobodyCamyea let me look17:17
*** marcoemorais has joined #openstack-ironic17:17
NobodyCamah ha.. seems I added a comment instead of abandoning it17:18
dtantsurvictor_lowther, in case of using discoverd for introspection, Ironic sets node to DISCOVERING, WAIT -> true, calls to discoverd and relaxes :)17:18
dtantsurvictor_lowther, after that discoverd comes back and advances node state to the next one17:18
devanandahttps://drive.google.com/file/d/0Bz_nyJF_YYGZZ05zaU9kb2Z4SE0/view?usp=sharing17:19
*** hemna__ has joined #openstack-ironic17:19
NobodyCamrloo: updated17:19
rloothx NobodyCam17:19
*** Marga_ has quit IRC17:19
devanandavictor_lowther: my draft from last night, forgot to post anywhere - also working a bit more on it now17:19
devanandavictor_lowther: i'm looking at your latest spec now ...17:20
NobodyCamlooking very good devananda :)17:20
*** dtantsur is now known as dtantsur|brb17:22
lucasagomesdevananda, sorry for insisting, but just to get the initial feedback cause I'm currently writing the spec for it. R you ok with the hints?17:23
*** dprince has joined #openstack-ironic17:28
devanandaoslo change / hacking rule has broken our gate (and just about everyone else's)17:29
devanandahttps://bugs.launchpad.net/hacking/+bug/139847217:29
lucasagomes:/17:30
devanandalucasagomes: do we guarantee which disk device the node will boot from?17:31
lucasagomesdevananda, yes, after writing the image we get the UUID and set it as the ROOT=UUID=17:31
lucasagomesso at boot time, after deployment it's going to boot from the right device17:32
devanandalucasagomes: that's a kernel param, right?17:32
lucasagomesdevananda, yes17:32
devanandawhich applies only to net boot17:32
lucasagomesyes, for the fulldisk images I believe they already have a bootloader in the image17:33
JayFyes17:33
devanandayup17:33
lucasagomeswith the right config to boot the right partition17:33
*** achanda has quit IRC17:33
devanandaand if we're providing an API to change which disk the image is written to, are we sure that we wrote that to the disk the system expects the bootloader to be on?17:33
*** achanda has joined #openstack-ironic17:34
devanandaalso, after an image is written to disk, and the instance is deployed, what does it mean if the operator changes this value in the API without redeploying a new instance?17:34
lucasagomesdevananda, is a) the fulldisk image example?17:34
lucasagomesdevananda, because right now if u have 2 disks and ironic pick the first17:35
lucasagomesit's random17:35
lucasagomesironic could boot once and get disk A and boot second time and get disk B17:35
lucasagomesit's way more unpredictable the way it works now17:35
lucasagomesfor B) this is what zapping should do, clean the disks17:36
lucasagomesI mean, both examples could happen as-is17:36
lucasagomeswith the current code17:36
devanandaI'm not sure where zapping got introduced to this question17:36
devanandaas for the current behavior, if it's not repeatable -- that's a bug17:37
devanandaassuming no hardware failures, any number of (re)deployments, given the same inputs, to the same node, should result in the same disk being used17:37
lucasagomesI'm not sure I'm in sync. Right yeah it's bug17:37
devanandaif that's not the case, that should definitely be fixed17:37
lucasagomesand the hints help fix it17:37
devanandathat's where I disagree17:38
lucasagomesdevananda, so that assumption is wrong17:38
devanandawe should have predictable behavior. if I understand where you're going with hints (and maybe I dont)17:38
lucasagomescause you don't have udev rules enforcing device names or ordering17:38
devanandathey will lead to less repeatability, not more17:38
lucasagomesyou can't assume that17:38
*** achanda has quit IRC17:39
JayFlucasagomes: devananda: device names / identifiers are *not used at all* by IPA today17:39
victor_lowtherthe issue is that (for example) discoverd's idea of what /dev/sda is may not be the same as what any random other distro's idea is.17:39
devanandaright17:39
lucasagomesthe hints is giving the operator a way to tell based on persistent block device naming which device to use to deploy the image17:39
victor_lowtherbecause SCSI device ordering in Linux is not stable, and it never will be.17:39
JayFAlthough I think we take the "first", smallest disk >4GB ... so it's very possible that it could be inconsistent17:39
devanandaeach OS may order the disks differently17:39
JayFbut in the case of IPA we could easily make the pick-a-device-to-deploy-to decision have more knowledge17:39
devanandathe OS-specific ordering shouldn't matter to Ironic. That it matters within the context of discoverd is a different problem17:40
JayFI think the case that lucasagomes suggests is possible (and I've seen it happen), especially since deploy ramdisks use coreos+very recent kernel which means ordering could be radically different than an older (think: RHEL5/6) image would have17:40
lucasagomesJayF, yeah, that's what I'm trying to abstract17:40
JayFdevananda: I'm saying I think it can today with IPA, if you have disks of equal size17:41
victor_lowtherthe stablest solution I have found is to figure out what disk to use via its sysfs device path, then ensure that the OS install and bootloader use that disk based on its disk ID/WWN/serial number.17:41
JayFvictor_lowther: that's *exactly* how we do discovery for updating firmwares on our raid cards during decom17:41
devanandacan Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from || has the UUID which we pass as a kernel param via PXE17:41
lucasagomesvictor_lowther, JayF yup. That's what I'm proposing17:41
victor_lowtherdevananda: UUID is a filesystem/partition property, not a disk property.17:42
lucasagomesdevananda, the UUID is after the deploy17:42
victor_lowtherBest to not rely on it for OS install purposes17:42
lucasagomescause it's a property of the fs17:42
lucasagomesyeah that ^17:42
devanandavictor_lowther: yes. argh....17:42
JayFyou can rely on labels17:42
JayFand label the partition how you expect17:42
JayFwhen you deploy it17:42
JayFalthough that doesn't help in the deploy ramdisk case17:43
devanandacan Ironic repeatably a) deploy an image b) onto the same disk c) which is the disk that that machine attempts to boot from17:43
victor_lowtherthat is still after you pick the disk to deploy to17:43
lucasagomesdevananda, if ur booting via the network idk about c)17:43
JayFdevananda: my answer would be that IPA today, in some cases (two disks of same size), could consistently fail c, but I wouldn't expect unstable behavior across provisions17:43
victor_lowtherOnly if something tells it which physical disk to use by ID17:43
lucasagomes+117:44
victor_lowtherotherwise it will just usually work, not always work.17:44
devanandaso, IIUC, the problem you're all referring to is that the ramdisk agents (whether iSCSI or IPA or discoverd) may use non-repeatable ordering, or may use different ordering from each other (eg, if using discoverd and then using IPA on the same node)17:44
victor_lowtheryes17:44
lucasagomesdevananda, correct17:44
devanandaand just to check, the problem you're referring to is NOT related to the physical boot device or knowing which physical disk the server will boot from17:45
*** jistr|training has quit IRC17:45
victor_lowtherit is related.17:45
lucasagomesdevananda, the problem is about finding which device we should write the image onto. And that means that we want to boot from that disk too17:47
lucasagomes(that's why we enforce ROOT=UUID= in the kernel cmdline)... for the fulldisk image it already has a bootloader configured (but there's still problem if both disks contains a bootloader)17:48
lucasagomeswhich is not solved by the hints (and is not intended to solve that too)17:48
victor_lowtherand if you are managing the RAID controller, you can directly control that.  If you are using UEFI, you don't have to care about that.  Otherwise, you have to rely on having enough BIOS control, heuristics and/or quality of firmware.17:48
*** sambetts has quit IRC17:48
JayFvictor_lowther: was not my intention to -1 you as soon as you pushed a new patchset. The comment I just posted is from yesterday and it just never made its way up :(17:49
lucasagomesyeah, for RAID it's good too, to make sure you are using the device you've just created17:49
victor_lowtherit helps that most RAID controllers let you specify which disk they will try to boot from.17:49
lucasagomesvictor_lowther, [off-topic] http://doodle.com/9h4ncgx4etkyfgdw2wpdircv help us voting on the mascot name :)17:49
*** athomas has quit IRC17:50
victor_lowtherJayF: no worries17:50
*** Marga_ has joined #openstack-ironic17:51
JayFjroll: nova-spec for configdrive just went in17:52
JayFjroll: woo17:52
NobodyCamnice :)17:52
victor_lowtherlucasagomes: Doodle 404!17:55
lucasagomesew lemme check17:55
lucasagomesvictor_lowther, http://doodle.com/9h4ncgx4etkyfgdw here it go17:56
victor_lowtherThat worked17:57
lucasagomesthanks :D17:57
NobodyCambrb17:58
*** rushiagr is now known as rushiagr_away17:59
lucasagomesaight I will go now18:02
lucasagomeshave a good night everyone18:02
lucasagomes(I will check the channel from time to time)18:02
*** lucasagomes is now known as lucas-dinner18:02
*** derekh has quit IRC18:06
*** rwsu has joined #openstack-ironic18:07
*** Nisha has quit IRC18:10
NobodyCamhave a good night lucas-dinner18:11
*** anderbubble has quit IRC18:18
*** Masahiro has joined #openstack-ironic18:19
*** anderbubble has joined #openstack-ironic18:21
*** pensu has quit IRC18:21
*** pensu has joined #openstack-ironic18:23
PaulCzargetting a fun new error from ironic-conductor.    it happens during what looks to be laying down the image files in the tftproot directory:18:23
PaulCzarhttps://gist.github.com/paulczar/ca2c3c21612b5cef5cc318:23
*** Masahiro has quit IRC18:23
PaulCzarunfortunately the error isn't very clear where to look to work out what is going wrong18:24
devanandaPaulCzar: is this repeatable? if so, under what conditions?18:26
PaulCzarany time I try to provision a node ?18:27
PaulCzarstill using the ssh_agent18:27
PaulCzarwith virtualbox18:27
NobodyCamPaulCzar: swift is setup for temp urls?18:30
*** harlowja_away is now known as harlowja_18:31
NobodyCamand you have enough free disk space to hold the image?18:31
devanandathis is odd: 2014-12-02 18:21:19.361 4171 DEBUG ironic.drivers.modules.image_cache [-] Destination /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369/deploy_kernel already exists for image 88786efb-8985-4017-8839-7cd98ff9c87a fetch_image /usr/local/lib/python2.7/dist-packages/ironic/drivers/modules/image_cache.py:11018:31
jroll2014-12-02 18:21:19.361 4171 WARNING ironic.conductor.manager [-] Error in deploy of node 96b061ed-eb91-4b12-b2d6-9d1c6ce34369: 'image_source'18:31
devanandaPaulCzar: can you manually delete the /tftpboot/96b061ed-eb91-4b12-b2d6-9d1c6ce34369 directory18:31
jrollthe node doesn't have instance_info.image_source18:31
jrollerr, instance_info['image_source']18:32
jrollwhich nova is supposed to put down afaik18:32
devanandajroll: huh. that's not a great error message, then.18:32
devanandayea, Nova should be passing that in, but it should fail that check sooner, right?18:32
jrolldevananda: really? I think it's a great error :P18:32
devanandalook at where it comes from18:32
jrollI'm joking18:32
devanandaheh18:33
devanandathere should be an exception traceback just after that18:33
PaulCzarhmm where does it get image_source from?   glance metadata ?18:33
jrollPaulCzar: nova puts the image uuid there18:33
devanandaPaulCzar: that should be the image UUID that you requested via "nova boot"18:33
PaulCzarcrazy question ... if I used the image name will nova still pass the uuid ?18:34
devanandathe nova.virt.ironic driver should populate the node.instance_info['image_source'] field during driver.deploy18:34
devanandaPaulCzar: yup18:34
devanandawell. it should. if it's not ....18:34
jrollso that probably comes through from https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L25918:35
jrollthat's the only direct image_source access we do18:35
jrollmore specifically https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/agent.py#L17718:35
PaulCzarlet me try again with the uuid just to make sure18:35
jrollbah, we don't validate image_source18:36
devanandahttps://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L71518:36
*** dtantsur|brb is now known as dtantsur18:36
devanandaPaulCzar: is there an exception traceback in the conductor log after that, which you didn't include in the paste?18:36
PaulCzardevananda: nope, there's no exception traceback18:37
devanandaso that bothers me a bit18:37
jrollbtw that comes from https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L71518:37
devanandawe're catching and reraising, but losing the traceback somehow?18:37
jrollit's a catch-all18:37
devanandajroll: yea, I just linked that :)18:37
* jroll is blind right now18:37
devanandaalso, that exception, when converted to a string, is just the word "image_source"18:38
jrollso, it should reraise that exception18:38
jrollwhere does that end up? :|18:38
devanandawell, I bet it is, but there's no logger set up for that thread18:38
jrollmmm, right.18:38
*** rwsu has quit IRC18:38
dlaubehey guys, I'm seeing "ERROR ironic.drivers.modules.agent [-] node 1d0203c6-151c-4113-bb31-738b336a07e4 command status errored: {u'message': u'Error downloading image.', u'code': 500, u'type': u'ImageDownloadError', u'details': u'Could not download image with id 1edccf41-f244-4304-ab65-66d28d5a86a7.'}" when I try to nova boot18:39
devanandas/logger/exception handler18:39
devanandasince, clearly, it's capable of logging from that thread18:39
dlaubeI've added agent_pxe_append_params = nofb nomodeset vga=normal console=ttyS0 systemd.journald.forward_to_console=yes       but I'm not sure what I should be looking for in the bm console logs18:39
jrollso maybe outside of that with reraise block we should log.exception(e)18:39
jrolldlaube: look for anything interesting... or feel free to just paste the whole thing18:39
devanandajroll: I'm guessing a bit here, but it seems like either we can drop the reraise and just log and clean up, or we should sort out why worker threads aren't logging exceptions.18:41
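
The suggestion above (log the exception before re-raising, so a worker thread never loses the traceback) can be sketched with the standard `logging` module. The sketch assumes the failure is a `KeyError` from a dictionary lookup, which the logged message `'image_source'` suggests but the log does not confirm; `do_deploy` and `run_logged` are hypothetical names, not functions from Ironic.

```python
import logging

LOG = logging.getLogger(__name__)


def do_deploy(node):
    """Stand-in for a deploy step that needs instance_info['image_source']."""
    instance_info = node.get("instance_info", {})
    # Raises KeyError('image_source') when nothing populated the field.
    return instance_info["image_source"]


def run_logged(func, *args):
    """Call func, logging any unexpected exception with its traceback."""
    try:
        return func(*args)
    except Exception:
        # LOG.exception records the message at ERROR level together with
        # the current traceback, then the exception propagates unchanged.
        LOG.exception("Unexpected error in %s", func.__name__)
        raise
```

Converting a `KeyError` to a string yields only the quoted key, which would explain a log line that ends in nothing but `'image_source'`; logging with the traceback makes the origin visible.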
* devananda returns to thinking about the state machine18:42
dlaubehttp://paste.openstack.org/show/p3Hvh7DAfDzspebWFxbl/18:42
dlaubelooks like coreos deploy image went well… nothing from IPA stands out to me18:42
dlaubeam I able to ssh into the coreos deploy image while it is doing the rest of the deployment?18:43
jrolldevananda: both, imo? unexpected exceptions that we don't catch should be logged, we've written a bunch of code to work around it18:43
jrolldlaube: if you build your own ramdisk and embed ssh keys, you can :)18:43
jrollugh, ipa y u no lo18:43
jrolllog, even18:43
jrollJayF: ^ that's what IPA console logs look like in devstack fyi18:44
rloodevananda, jroll: isn't it because there is a hook there to call ._provisioning_error_handler()18:44
dlaubejroll: can I still use disk-image-builder and then add some things in before I glance image-create?  or is a custom deploy image/ramdisk more involved in that18:46
dlaubegoogling now18:46
jrolldlaube: blah, I think we're missing some log things18:47
jrolldlaube: for IPA, we don't use DIB, there's a builder in the repo18:47
jrolldlaube: https://github.com/openstack/ironic-python-agent/tree/master/imagebuild/coreos18:47
jrolldrop things in oem/ as needed18:47
jroll(e.g. oem/authorized_keys)18:48
NobodyCambrb...18:48
jrollrloo: where is there a hook for ._provisioning_error_handler()?18:48
devanandajroll: https://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L69818:48
PaulCzarI think I'm going backwards now ...  - Error in deploy of node 6bbed42b-68eb-4946-90a5-68c797762f94: HTTPNotFound (HTTP 404)18:48
devanandahttps://github.com/openstack/ironic/blob/master/ironic/conductor/manager.py#L59318:49
dlaubegood to know. thanks jroll18:49
devanandaPaulCzar: does "ironic node-show <UUID>" indicate anything on the last_error field?18:49
jrollah, I see18:49
jrolldlaube: np18:49
PaulCzaralso these errors seem to be surfacing as WARN rather than ERROR18:50
jrollrloo: devananda: yeah, so _provisioning_error_handler doesn't actually do anything in this case18:50
PaulCzardevananda: last_error: none18:50
rloojroll: right. The comment for _provisioning_error_handler() is perhaps why. It thinks it is only being called when there was an exception spawning a worker.18:51
devanandarloo: that does not appear to be the case, based on reading task_manager18:51
jrollit knows more, it just only handles that sort of exception18:51
devanandarloo: but IMBW ...18:51
rloodevananda: right, based on the comments in task_manager (I'm too lazy to read the code now)18:52
devanandaPaulCzar: something's off. if the deploy fails, it should save the failed state and reason why18:52
PaulCzarpaste of conductor log is in the first comment here - https://gist.github.com/paulczar/ca2c3c21612b5cef5cc318:53
devanandaPaulCzar: paste of "ironic node-show <UUID>" for the failed node?18:54
*** dtantsur is now known as dtantsur|afk18:54
PaulCzaradded as comment in above gist18:55
NobodyCaminstance uuid is none?18:56
devanandaPaulCzar: this has been cleaned up. there's no failure here.18:56
devanandaPaulCzar: can you capture that after the boot fails?18:56
PaulCzarthat is after the boot fails18:56
PaulCzarblerg, nova scheduler fails the nova boot with no valid host18:57
dlaubethe thing I find interesting about the failure to download the image error reported in the ironic cond log is that it looks like the image is being retrieved just fine according to glance reg  "29831 DEBUG glance.registry.api.v1.images [7c53c4cb-4271-478c-ae2c-aa87300f7471 08df81fb719d413eacb36c2a249f1514 dcd2a172eb934e39a99ddb216e94b69f - - -] Successfully retrieved image ef0270d4-9e13-4c58-a8e9-bf8aad31d09d show /opt/stack/glance/glance/registry/api18:57
PaulCzardlaube: I had something similar to that the other day and I think the metadata for the image had bad values for kernel and ramdisk18:59
devanandaPaulCzar: gotcha. it looks like maybe this image is not accessible by the ironic service user? b195e0dc-fb06-4032-aae9-720b63abb92319:00
PaulCzarson of a!19:00
PaulCzarthe newer glance cli doesn't allow --public19:01
dlaubePaulCzar: thanks, I'll double check the ids for the kernel and initrd specified in glance for the image I'm using with my nova boot call19:02
jrolldlaube: we do a glance.show() on the image, the agent actually downloads it from swift19:03
devanandavictor_lowther: i think i missed the discussion somewhere -- is there a reason that your new draft does not have a path from DELETED to AVAILABLE, without going through both ZAP and INTROSPECT ?19:03
dlaubejroll: ahh19:03
jrolldlaube: I think you're hitting this one https://github.com/openstack/ironic-python-agent/blob/master/ironic_python_agent/extensions/standby.py#L14619:03
jrollwould love a patch to add logging for that19:04
dlaubeso I should check my ~/logs/screen-s-object.log   in devstack then19:04
victor_lowtheryes, and it has to do with drawing that diagram while listening to an all-hands. :/19:04
PaulCzaris swift needed for the non-devstack ?19:04
PaulCzarit's not mentioned in the developer quick start19:04
devanandavictor_lowther: hah19:04
jrollPaulCzar: for the agent driver (docs suck really bad for the agent driver) :(19:05
jrollvictor_lowther: devananda: I really really think that zapping should not imply introspecting and vice versa19:05
devanandavictor_lowther: ok. I'm taking another stab at it. I like the new wording around multiple states. What do you think of denoting that with a symbol to be even clearer, eg, [DEPLOY*]19:06
devanandajroll: ditto19:06
victor_lowtherMake a comment, and I will fix it in the next rev -- there are a few other reviews I missed while hacking up version 8.19:06
* jroll comments on review19:06
PaulCzarare you kidding me right now ?   something as critical as mentioning swift is required isn't mentioned anywhere ?19:06
victor_lowtherjroll: why not?19:06
*** mikedillion has joined #openstack-ironic19:07
victor_lowtherdevananda: sure, I can throw a symbol there.19:07
jrollPaulCzar: I'm a horrible person :(19:08
jrollPaulCzar: to be clear, this is only for agent driver, not pxe driver19:08
*** spandhe has joined #openstack-ironic19:08
jrollvictor_lowther: 1) because they are two separate things; 2) introspecting should never be automatically triggered19:08
PaulCzarso my options are to run swift or spend $1000 on building a pxe lab ?19:09
jrollPaulCzar: I don't understand what infra you need for a pxe lab that you don't need for the agent19:09
victor_lowtherjroll: I think ZAPPING is a special case because out of all the states we have, ZAPPING is where we are going to make changes that can change what the node hardware looks like to everyone else.19:10
PaulCzarjroll: doesn't pxe require ipmi gear to power on/off ?19:10
jrollPaulCzar: how are you doing power control with the agent driver?19:10
jrollbecause both require *some* form of power control19:10
victor_lowtherso it should update node.properties for cpus, arch. memory, disk sizes, etc.19:10
PaulCzarjroll: using the virtualbox power controls in ssh_agent19:11
jrollPaulCzar: you can do that with the pxe driver as well, see pxe_ssh19:11
jrollvictor_lowther: if the operator does things in zapping that will change properties, they should run discovery afterward or something. maybe we can make that an optional thing.19:11
*** pensu has quit IRC19:11
jrollvictor_lowther: imagine you flash new firmware in zapping and a disk disappears, do you want to update node.properties to reflect that disk is gone?19:12
victor_lowtherhell yes19:12
PaulCzarjroll: ah, good to know19:12
jrollwhat19:12
jrollwhat19:12
PaulCzarjroll : docs for pxe?  or do I need to dig through the source to figure it out? :)19:12
jrollif a disk disappears for an unknown reason, I want ironic to fail that node hard19:12
*** sreekanth has joined #openstack-ironic19:13
jrollvictor_lowther: ^19:13
jrollPaulCzar: docs for pxe are pretty alright, just follow the deploy docs that are up19:13
* jroll finds a link19:13
victor_lowtherthe one does not preclude the other,19:13
devanandaPaulCzar: PXE_* drivers are better documented at this point. largely because they've been around the project the longest19:13
victor_lowtheror do you want the hardware properties in node.properties to be prescriptive instead of descriptive?19:13
jrollvictor_lowther: I don't understand19:14
devanandaPaulCzar: that said, if you're looking at doing anything meaningful with physical hardware, why are you using *_ssh / virtualbox instead of IPMI ?19:14
jrollPaulCzar: http://docs.openstack.org/developer/ironic/deploy/install-guide.html19:14
PaulCzardevananda: because I want to prove this out locally and be able to do CI etc without tying up physical hardware19:15
victor_lowtherdo you want the info in node.properties to declare what you expect a node to have19:15
jrollvictor_lowther: very much yes19:15
victor_lowtheror do you want it to reflect what the node has?19:15
jrollvictor_lowther: perhaps discovery should fill that in the first time, to reflect the actual properties19:15
jrollvictor_lowther: but I never want those to change without me knowing19:15
devanandavictor_lowther, jroll: we have an important distinction here. "properties declare what the node should have; if it doesn't, fail fast" ||19:15
devananda|| "I don't know what the node has, go discover it and update properties"19:16
NobodyCamdevananda: ++19:16
devanandathe latter is exceptionally rare19:16
devanandaand, IMO, should require external initiation19:16
jrollcompletely agree19:16
victor_lowtherthe latter is the model I have always operated in.19:16
victor_lowtherit does not inhibit error detection and resolution due to missing hardware19:16
PaulCzarbtw all the docs say to use glance ... --public ...   new clients should be  glance ... --is-public=true19:16
jrollvictor_lowther: so if half of your ram fails, you want to just keep using that machine?19:16
victor_lowtherbecause I have always tracked those changes over time.19:16
NobodyCamPaulCzar: can you file a bug on that so we don't forget19:17
victor_lowtherif half the ram fails without redundancy, the machine usually dies a horrible death19:17
victor_lowthernot continue to work silently19:17
devanandavictor_lowther: if the characteristics of the node change in any meaningful way, unintentionally, chances are good that it won't match a Nova Flavor any more19:17
victor_lowtherunless it lost half the ram due to someone stealing it.19:18
jrollvictor_lowther: right. then by the existing spec: the user ends up calling nova delete, ironic introspects the hardware and updates node.properties, it suddenly has half the ram19:18
devanandavictor_lowther: so then the node sits idle, but not in an error state, indefinitely, since it no longer can be used by Nova19:18
victor_lowtherthen your inventory control system notices and goes WTF19:18
victor_lowther(which should be something besides Ironic)19:19
devanandavictor_lowther: you assume someone has a CMDB19:19
devanandawhich, I think they should, but I also think that encoding that behavior in ironic is not helpful.19:19
victor_lowtherand if they do not, how do they track everything?19:19
devanandadoing the other thing (take the node out of rotation) does not prevent an external system from doing what you expect (namely, noticing the error)19:19
devanandavictor_lowther: napkins19:20
victor_lowtherExcel19:20
NobodyCammost testing env's don't have a cmdb19:20
devanandaPaulCzar: if you don't want to run swift, and you are building a virtualized CI system for Ironic, then yes, you probably want to stick to pxe_ssh driver19:20
PaulCzarfinding out all of this the hard way :)19:21
devanandaPaulCzar: that said, the Agent is pretty spiffy, and I didn't think running swift would be enough of a burden to prevent someone from choosing the Agent19:22
jrollvictor_lowther: even if you do have a CMDB, you'd have to set it up to poll ironic, notice changes, alert, etc etc19:22
victor_lowtherIf y'all intend node.properties to be prescriptive, I don't have an issue with that.19:22
victor_lowtherthat is just a mode that Crowbar has never operated in.19:22
devanandavictor_lowther: :)19:22
jrollvictor_lowther: I may be biased, I don't even care to run discovery because I want to tell ironic what should be there19:22
victor_lowtherjroll: that inevitably fails19:23
jrollwhy?19:23
jrollit's working great for me19:23
victor_lowtherso someone else sorts out inventory from ordering then?19:23
jrollwe get a spreadsheet from our vendor of exactly what was shipped19:24
devanandafor my uses, discovery is good once and only twice -- when new hardware arrives and I want to ensure the factory manifest is accurate (which it usually isn't) and after replacing (parts of) hardware, for the same reason19:24
jrollwe run a shitty python script to take that data and put it in ironic19:24
devanandajroll: next you'll tell me that spreadsheet is never wrong19:24
jrollplug everything in and pxe boot19:24
jrolldevananda: for the purposes of node.properties, it's never been wrong :)19:24
devanandajroll: fair 'nuf19:24
* jroll will not say the same about other info there19:25
devanandaNIC MAC's and IPMI info are the largest problem I've seen so far, actually19:25
devanandanot # of CPU cores19:25
jrollyep, same here19:25
victor_lowtherMust not order systems in Q4, then. :)19:25
victor_lowtherbut anyways19:26
victor_lowtherI am fine with logic that has node.properties for hardware config be prescriptive if set19:26
victor_lowtherwe would just need to have logic at certain points in the state machine to check it.19:27
jrollagree19:27
jrollwhat we do today as part of zapping is verify that what's in the node is what's in ironic19:28
jrolland fail out if there's a mismatch19:28
jrollI don't know if that's the best route, but it's been working for us, we've caught real hardware failures with this19:28
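A minimal sketch of the check jroll describes here (compare what the hardware reports against what is recorded in node.properties, and fail on any mismatch). The function, exception, and property keys are hypothetical illustrations, not Ironic's actual API:

```python
# Hypothetical sketch: these names are illustrative, not Ironic's API.
class PropertyMismatch(Exception):
    """Raised when discovered hardware differs from the declared properties."""


def verify_properties(declared, discovered):
    """Compare declared node.properties against what the hardware reports.

    Returns None on a match; raises PropertyMismatch carrying a dict of
    {key: (expected, found)} for every key that differs.
    """
    mismatches = {
        key: (expected, discovered.get(key))
        for key, expected in declared.items()
        if discovered.get(key) != expected
    }
    if mismatches:
        raise PropertyMismatch(mismatches)


# Assumed example values; the keys mirror the cpus/arch/memory/disk list above.
declared = {"cpus": 8, "memory_mb": 32768, "local_gb": 500, "cpu_arch": "x86_64"}
healthy = dict(declared)
degraded = dict(declared, memory_mb=16384)  # half the RAM has gone missing
```

Under this model the declared values are never overwritten by the check; a degraded node raises instead of silently updating its record.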
devanandajroll: iiuc, lucas-dinner was proposing to put in an assertion-check at the beginning of deployment19:29
jrolldevananda: right, I disagree, too slow19:29
devanandaI would strongly prefer to put that before making the node available19:29
jrollyep19:29
victor_lowtherso, in INTROSPECTING, then? :)19:29
jrollIMO no19:30
jrollintrospecting should be manually triggered, look at what's there and update the node object, no questions ask19:30
*** ndipanov is now known as ndipanov_gone19:30
victor_lowtherok19:30
jrollasked*19:30
victor_lowtherthen it should be outside the state machine19:30
jrollbut... folks can and will disagree with me :)19:30
jrollI mean, it's still a valid state19:31
victor_lowthersince it can happen any time19:31
victor_lowtherby operator request19:31
victor_lowthersort of like maintenance mode.19:31
jrollno, it can only happen in certain states (available/init/maybe more)19:31
jrolllike, you couldn't send a DEPLOYING node into introspection19:31
*** pelix has quit IRC19:31
jrollbecause it's busy19:31
devanandanot in available19:32
jrollso you would go available -> init -> introspecting/19:32
jroll?19:32
jrollthat would be fine with me19:32
victor_lowthermeh19:32
devanandajroll: s/init/managed/19:32
jrollsure19:32
devanandait's just a word, but i think that substitution has value in our discussion19:32
jrollso many new words19:32
victor_lowtherI would just set maintenance mode, run introspection, verify that I got what I expect, and unset maintenance mode.19:33
devanandamaintenance mode doesn't affect current state, and can be applied to any state19:33
victor_lowtherExactly.19:33
devanandawhat about requiring the node be transitioned back to managed (aka init) in order to initiate introspection?19:34
victor_lowtherback to the zapping can change things point19:34
devanandaa node that is managed but not available for use yet can be (re)introspected19:34
devanandazapping is orthogonal19:34
devanandaI can zap at that point. I can also zap between delete and available19:34
devanandait's automatic between delete and available19:35
devanandait's manual on a managed node19:35
devanandajroll: I think this is closer to what ya'll are doing today?19:35
jrolldevananda: without the intermediate managed state, yes19:35
jroll(and we don't have any concept of introspection, ofc)19:36
devanandasure19:36
devanandado you zap nodes which are available to nova?19:36
devanandaor do you do something to take them out of scheduling first?19:36
*** mrda-away is now known as mrda19:36
mrdaMorning Ironic19:36
NobodyCammorning mrda19:37
jrolldevananda: we zap available nodes, they go to the "ZAPPING" state (DECOMMISSIONING downstream) where they can no longer be scheduled19:37
devanandak19:37
devanandaso there's a small race there19:37
jrollyet another :)19:38
devanandajroll, victor_lowther: http://paste.openstack.org/show/XsFt3wtJevf8LCcCB5fG/19:38
devanandalemme know what you think19:38
jrollwe don't actually do that very often, only if e.g. we get new firmware or whatever19:38
jrollis R:thing a request for thing?19:39
devanandajroll: yes. see victor's draft for an explanation19:39
devanandajroll: tldr; the PUT to the REST API19:39
jrollthought so, thanks19:39
jrollright19:39
devananda[FOO*] indicates FOOING, FOOED, and FOOFAIL19:39
victor_lowtherhm19:39
jrollright19:40
victor_lowtherI think it is weird to have [ZAP*] in two places in the graph19:40
devanandavictor_lowther: i agree19:40
devanandabut I wanted it to be clear that managed -> zap -> managed is a manually-invoked process19:40
devanandaand active -> delete -> zap -> available is automatic19:41
devanandawasn't sure how else to do that19:41
victor_lowtherhm...19:41
jrolldevananda: yeah, I think this looks fine19:41
devanandai also left out preboot :(19:41
victor_lowthermanaged -> zap+flash -> managed19:41
victor_lowtherand delete -> zap -> available19:42
victor_lowtherpreboot is easy to forget19:42
victor_lowtherI would just make it a rule that all fooing states should be booted into something19:43
victor_lowthermaybe.19:43
victor_lowtherHave to think about what that entails.19:43
victor_lowtherhm...19:43
victor_lowthermanaged -> [mangle] -> [introspect] -> managed?19:44
devanandathat's not necessarily linear19:44
victor_lowthermake it clear that zap will not change what the hardware looks like, whereas mangle can?19:44
* victor_lowther nods19:44
devanandawhich is why i made zap and inspect separate loops19:44
victor_lowtherbut hard to freehand nonlinear things in irc. :)19:45
devanandaindeed19:45
devanandajroll: could you guys implement the long-running-ramdisk stuff within the zap* state?19:46
jrolldevananda: as in, boot the ramdisk at the end of zap*19:46
jrollseems weird19:46
devanandayes19:46
jrollI don't think we need the preboot state, tbh19:46
devanandait's part of "get it ready for provisioning again"19:46
devanandaright?19:46
jrollI guess?19:46
jrollI mean, it's an optimization19:47
devanandathat's what I'm getting at19:47
NobodyCamI agree with jroll sounds strange at first read19:47
jrollyou can schedule to nodes that are prebooted or not prebooted19:47
PaulCzarany benefits to using ipxe over pxe ?19:47
PaulCzarreading through the pxe docs right now19:47
devanandapreboot as a separate requestable state, which is essentially just a permutation of AVAILABLE, doesn't fit in the general case19:47
jrollone is faster, should prefer one, just check power on to decide19:47
victor_lowtherPaulCzar: less tftp traffic19:47
jrollPaulCzar: http > pxe19:47
PaulCzaripxe seems to be simpler ?19:47
jrollerr19:47
devanandaPaulCzar: http > tftp19:47
jrollhttp > tftp19:47
*** lucas-dinner has quit IRC19:48
devanandaPaulCzar: PXE is simpler in a sense. iPXE is both more extensible and more robust.19:48
victor_lowtherpotential downside is that booting to local disk can be problematic depending on firmware19:48
openstackgerritJarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP)  https://review.openstack.org/13810919:48
devanandaPaulCzar: you'll need a tftp service (just run tftpd) for PXE. you'll *also* need an HTTP service (eg, apache) for iPXE19:48
PaulCzardevananda: right ... I'll start with pxe then ... although I do like the idea of http ... tftp is super slow19:49
jjohnson2tftp means a really ugly workaround for > 65k blocks, and tftp is easy to implement in software, meaning it isn't fancy enough to do things like send more than one packet without acknowledgement19:49
devanandaPaulCzar: start simple ++19:49
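The "> 65k blocks" limit jjohnson2 mentions follows from TFTP's 16-bit block counter. A rough worked example, assuming the default 512-byte block size from RFC 1350 and the negotiable blksize option from RFC 2348 (the 1468-byte figure is just an illustrative value that fits a typical Ethernet MTU):

```python
BLOCK_NUMBER_BITS = 16    # TFTP block numbers are 16-bit (RFC 1350)
DEFAULT_BLOCK_SIZE = 512  # bytes per block unless blksize is negotiated (RFC 2348)


def max_transfer_bytes(block_size=DEFAULT_BLOCK_SIZE):
    """Upper bound on file size before the block counter has to wrap.

    A real transfer must stay strictly below this, because the final
    block has to be shorter than block_size to signal end of file.
    """
    max_blocks = 2 ** BLOCK_NUMBER_BITS - 1  # blocks are numbered 1..65535
    return max_blocks * block_size


print(max_transfer_bytes())      # default 512-byte blocks
print(max_transfer_bytes(1468))  # larger negotiated block size
```

With default blocks the bound is just under 32 MiB, which a deploy ramdisk can easily exceed; anything larger depends on the client and server agreeing on block-number rollover or a bigger block size.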
devanandajroll: if you just want a scheduling hint, why not use something in node.properties ?19:50
devanandajroll: we don't need a discrete state for that19:50
devanandalike a nova filter which prefers nodes with "is_prewarmed" in node.properties['capabilities'] over ones that do not have that key19:51
victor_lowtherthat is what I suggested to hint to Ironic whether and what to preboot.19:51
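A minimal sketch of the scheduling hint devananda describes: prefer nodes whose capabilities include "is_prewarmed", without excluding the others. It assumes capabilities are stored as a "key:value,key:value" string and that nodes are plain dicts; both the helper names and the data shape are hypothetical, and this is not a real Nova filter or weigher class:

```python
def parse_capabilities(raw):
    """Parse a 'key:value,key:value' capabilities string into a dict."""
    caps = {}
    for item in (raw or "").split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip empty or malformed entries that have no colon
            caps[key.strip()] = value.strip()
    return caps


def prefer_prewarmed(nodes):
    """Order nodes so that prewarmed ones come first.

    sorted() is stable, so nodes keep their relative order within each group.
    """
    def is_cold(node):
        caps = parse_capabilities(node["properties"].get("capabilities"))
        return "is_prewarmed" not in caps

    return sorted(nodes, key=is_cold)


# Assumed example data.
nodes = [
    {"uuid": "a", "properties": {"capabilities": "boot_mode:bios"}},
    {"uuid": "b", "properties": {"capabilities": "boot_mode:bios,is_prewarmed:true"}},
    {"uuid": "c", "properties": {}},
]
print([n["uuid"] for n in prefer_prewarmed(nodes)])
```

Because this only reorders candidates, a node that is not prebooted can still be scheduled, which matches the point that preboot is an optimization rather than a separate state.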
devanandavictor_lowther: the question in my mind right now is, how would one indicate that to ironic19:52
devanandaeither ...19:52
devananda- all nodes preboot all the time19:52
devananda- none preboot automatically, but an operator (or external service) can manually request a specific node to preboot19:53
devananda- some magical orchestration logic gets added to ironic with knobs that allow it to decide when and how many to preboot19:53
devanandai clearly don't like option #319:53
victor_lowtherdefinitely not 319:53
victor_lowtherI would lean to 219:53
devanandaif it's 1, I would say it belongs in ZAP*19:54
devanandaand I don't actually see a benefit to 219:54
jrolldevananda: yeah, that's fine, I've never thought we needed preboot states19:54
devanandaand I don't actually see a benefit to 2 -- as a separate state in the state machine19:54
jrolldevananda: yeah, actually, in ZAP* might work19:55
* jroll looks at code19:55
devanandajroll: cool. i'm curious to know how you'd implement 2 in ZAP*19:55
victor_lowtherso, how to tell what gets prebooted?19:55
victor_lowther(the image, not the nodes)19:55
victor_lowtherin that case?19:55
jrolldevananda: 2 would not work in ZAP*19:55
NobodyCambrb19:55
devanandajroll: hm. ok. then ya'll would just preboot all the time?19:56
jrollyes19:57
jrollI see why others might want to only preboot some19:57
jjohnson2fyi, the ipmi target implementation can sanely respond to a get channel auth request19:58
jrolldevananda: what we do now is at the end of ZAP*, we reboot instead of power off19:58
jjohnson2NobodyCam,so not too much work and the hard part is done19:58
jjohnson2too much more work until the hard part is over that is19:58
*** ParsectiX has joined #openstack-ironic19:58
devanandajroll: nice. that's what I thought. so this works for you19:59
devanandajroll: and I'm not aware of any other contributors working on / asking for a prewarmed-some-of-the-time optimization. so I'm OK with not making this more complicated to accommodate that19:59
*** spandhe has quit IRC19:59
victor_lowtherok20:00
*** spandhe has joined #openstack-ironic20:00
jrolldevananda: sure, makes sense20:00
devanandavictor_lowther: new version of mine coming in 10m20:00
victor_lowtherso to me that sounds like PREBOOT* should vanish in favor of a "don't shut down when you hit AVAILABLE" flag.20:01
victor_lowthercoupled with "ZAP* always boots into something like discoverd or IPA"20:01
*** andreykurilin_ has joined #openstack-ironic20:02
devanandagah. TC meeting ... make that 2 hours20:02
* jroll also has a meeting now20:02
victor_lowther:) No worries.  I am not going to so anything with the spec until more folks have commented on it anyways.20:03
victor_lowtherer, do anything.20:03
devanandahttp://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/20:04
jrolldevananda: transition from AVAILABLE to MANAGED?20:04
devanandayes20:04
devanandamissing an ^ arrow20:05
jrolldevananda: other than that, lgtm, would love for victor_lowther to push a new patchset with this20:07
*** Masahiro has joined #openstack-ironic20:07
NobodyCamjjohnson2: I'll have a look in a bit20:08
NobodyCambut awesome20:08
victor_lowtherI will, but not until this evening.  Have to give interested folks on the far side of the planet their chance to point out other things I got wrong.20:08
rloodevananda: I'm not sure how you get from delete->zap (unless I am misinterpreting [DELET*/AVAILABLE] -- doesn't this mean go to AVAILABLE after deleted?20:09
devanandarloo: it means the target state is AVAILABLE20:09
devanandabut it can go through anything the state machine wants to along the way20:09
jrollvictor_lowther: if we've already agreed something is wrong, we should fix it asap so that people aren't commenting on outdated things20:09
devanandacurrent / target20:09
devanandathat tracks the difference between ZAP/MANAGE and ZAP/AVAILABLE20:10
jrollvictor_lowther: give them a chance to comment on the new state machine overnight, rather than change it tomorrow and wait another night20:10
devanandavictor_lowther: if you dont mind and dont have time, i can just push a new rev over yours, with that picture in it20:10
rloodevananda: hmm. ok, it is the only one that is different (target state of others, go to that state next)20:11
victor_lowtherit is not just the picture, there is feedback that I inadvertently ignored from the last rev that also needs to be incorporated.20:11
rloodevananda: you lost me with delet* -> zap/manage vs delet* -> zap/available. I don't see that in the diagram.20:11
jrollrloo: right, the ACTIVE -> AVAILABLE transition goes through deleting and zapping20:11
devanandarloo: ZAP is not a target state, ever, actually20:11
victor_lowtherbut a comment on the current rev with a link to your new state machine would be appreciated. :)20:12
devanandavictor_lowther: ack, will do20:12
jrolldevananda: hmm, so what's the API call for MANAGED -> ZAP -> MANAGED20:12
*** Masahiro has quit IRC20:12
jrolltarget: ZAPPED?20:12
devanandajroll: PUT /... {target: zap}20:12
JayFAs a note, since you're talking about this; I do have a proposal up on the state machine spec that suggests "decom" work be done in another step other than ZAPPING20:12
jrollok20:12
jrollJayF: I don't love that :(20:13
devanandawhich goes to current=ZAPPING,target=MANAGED20:13
JayFbecause ZAPPING being arbitrary stuff or security cleanups is incredibly confusing20:13
devanandaJayF: look at http://paste.openstack.org/show/JZ0tjouzcFYWZx1P18z6/20:13
devanandaJayF: and tell me if you still think that20:13
devanandacurrent=ZAPPING,target=MANAGED is different from current=ZAPPING,target=AVAILABLE20:14
devanandaJayF: which I think captures enough information for your needs20:14
JayFso what kicks things from MANAGED to AVAILABLE20:14
devanandaa human20:14
victor_lowtherI would rename ZAP/AVAILABLE to CLEAN/AVAILABLE20:14
jrolldevananda: yep, got it20:14
devanandawe could just rename ZAP to CLEAN20:15
victor_lowthernah20:15
devananda:)20:15
JayFIf we rename zap to clean for the case of cleanup after deleted20:15
JayFthat's exactly what I wanted20:15
victor_lowtherbecause then we can rule that only cleaning stuff happens in the ->AVAILABLE transition, and if you want to do arbitrary things you have to MANAGE the node first.20:15
victor_lowtherand ZAP/MANAGE would be clean + other interesting things.20:16
*** r-daneel has quit IRC20:17
*** r-daneel has joined #openstack-ironic20:17
rloo+1 for not having both zap/managed and zap/available. clean/available works for me.20:22
JayFvictor_lowther: exactly what I was thinking. Although ZAP/MANAGE should not do the CLEAN steps20:23
victor_lowtherJayF: why not?20:23
jrollJayF: what are the CLEAN steps? how do you update your fleet's firmware?20:23
devanandaJayF: sure it should20:23
JayFjroll: by managing the node, then using zap20:23
jrollAlthough ZAP/MANAGE should not do the CLEAN steps20:24
jroll""20:24
JayFvictor_lowther: My thought is more that any state that goes to ZAP/MANAGE would've already been CLEAN20:24
jrollidgi20:24
victor_lowtherif they don't then you potentially have not zeroed the disks on a node you are transitioning to AVAILABLE for the first time.20:24
jrollunless steps can be in both ZAP and CLEAN20:24
JayF12:16:27 <victor_lowther> and ZAP/MANAGE would be clean + other interesting things. <-- that's the part I disagree with20:24
jrollyeah, that too20:24
JayFjroll: ++ that's more that I was thinking20:24
JayFZAP steps can be identical to CLEAN steps if that's what you want20:24
jrollJayF: I think victor is adding things like "build a raid" into that20:24
victor_lowtheryy20:24
JayFfor purposes of ZAP, sure20:25
jrollI think ZAP/MANAGE should likely be CLEAN + other things20:25
JayFbut I'm just saying we shouldn't do all the 'decom' / 'clean' steps in ZAP/MANAGE20:25
jrollI can't think of anything you wouldn't do20:25
JayFHow about secure erasing a JBOD20:25
jrollespecially because ZAP/MANAGE is how you bring in a new node20:25
JayFZAP/MANAGE would also be used for people who wanted to change a node config after it's been created, right?20:26
victor_lowtherJayF: what specifically would you not want to do in ZAP/MANAGE that you would want to do in CLEAN/AVAILABLE?20:26
devanandaJayF: zap/manage is a superset of zap/available20:26
JayFI disagree with ^ the idea that zap/manage should be a superset of zap/available20:26
JayFthis is honestly part of why I don't want clean/zap conflated20:26
devanandathough as I read this -- I am more and more leaning towards zap/manage and clean/available20:26
JayFif we want CLEAN, we should do a CLEAN then do a ZAP20:26
victor_lowtherother way around.20:27
JayFer, okay20:27
NobodyCam++ other way around20:27
JayFmaybe going from MANAGE/ZAP could have an optional trip through CLEAN/AVAILABLE20:27
JayFwhich seems like it'd fulfill your desires20:27
JayFwithout mixing the things together which I don't like20:27
victor_lowtherI don't think it should be optional20:27
devanandaJayF: if we have CLEAN/AVAILABLE, do you think we can put all long-running reconfiguration tasks outside of the AVAILABLE loop entirely?20:27
JayFdefine:long-running20:28
devanandaeg, require operators to transition a node back to MANAGED in order to do things which are not part of CLEAN20:28
victor_lowtherI basically always want to CLEAN things before they go into production20:28
devanandafor what ever definition of CLEAN you use20:28
*** lucasagomes has joined #openstack-ironic20:28
JayFvictor_lowther: +1 I agree20:29
JayFI just think that CLEAN before going into prod should actually just go into CLEAN/AVAILABLE20:29
JayFafter ZAP/MANAGE is done20:30
JayFpretty much anytime something "leaves" the AVAILABLE loop, its reentry point (if cleaning enabled) should be CLEAN/AVAILABLE20:30
victor_lowtherwell, then we will have to redraw the graph.20:30
devanandaJayF: what would you do in one state that you wouldn't do in the other?20:30
victor_lowtherbut I am fine with that.20:30
victor_lowtheri.e: MANAGE -> CLEAN -> AVAILABLE20:31
JayFdevananda: I don't think that's even relevant here20:31
victor_lowtherinstead of MANAGE -> AVAILABLE20:31
JayFif we're saying a node should be CLEAN before being AVAILABLE20:31
JayFwe should just have it transit that state20:31
JayFrather than overloading ZAP to do n+CLEAN without going through the actual CLEAN state20:31
devanandaJayF: I'd like to know, though. what would you do in one state that you wouldn't do in the other?20:32
devanandaalso, maybe someone already said that and I missed it -- multitasking meetings is great20:32
JayFdevananda: we have steps in our CLEAN that would change BIOS settings to enable access to some hardware then flips them back later20:32
JayFif there's a case where I'm doing something like rebuilding a RAID, I may not want those settings changed20:32
JayFor generally doing anything unnecessary that can be limited by write cycles (like reflashing a firmware)20:33
jrollmmm.20:33
*** ParsectiX has quit IRC20:33
JayFplus I might enqueue a ZAP task to fix a node that was in CLEAN FAILED20:34
devanandaJayF: so that would leave the managed -> zap -> managed loop intact20:34
devanandaJayF: change zap/available to clean/available20:34
JayFdevananda: Yah; I'm very OK with that20:34
devanandaJayF: and insert that step in the connection from managed -> available as well20:35
JayFthen MANAGED -> [CLEAN] -> AVAILABLE20:35
devanandaright20:35
dlaubeZAP?  sounds pretty ….shocking.20:35
dlaube:P20:35
victor_lowtherdlaube: it is my greatest contribution to Ironic ever.20:37
*** anderbubble has quit IRC20:38
*** jjohnson2 has quit IRC20:39
devanandaJayF: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/20:40
dlaubevictor_lowther: is this like IPA but with extra awesome sauce added or something?20:40
dlaube:D20:40
JayFdevananda: dumb question; what's the "R:" in that diagram?20:41
victor_lowtherIt has _all_ the awesome sauce.  And sprinkles.20:41
victor_lowtherJayF: API call20:41
jrollJayF: the PUT request requesting a state20:41
JayFthat's what I thought20:41
jrolldlaube: it's things like wiping disks, flashing firmware, etc20:41
JayFdevananda: +120:41
victor_lowtherdlaube: really, just the name of a state.20:41
*** ParsectiX has joined #openstack-ironic20:41
jrollvictor_lowther: ... for now :)20:42
victor_lowtherA state that does awesome things20:42
jrolldef do_zap()20:42
victor_lowtherZOMG!20:42
dlaubeahh ok20:42
jrollit's going to end up everywhere20:42
dlaubegotcha20:42
victor_lowtherI imagine IPA will be a player in that state.20:42
JayFzap == Ironic is doing a thing to the hardware that the operator requested that isn't CLEANING or DEPLOYING20:42
victor_lowtherwhere such a thing can be "blow away my RAID array" or "flash all the things"20:43
JayFor do a burn-in test20:44
victor_lowtherthat too20:46
devanandaanyone care to poke more holes in my latest dia?20:47
lucasagomesdevananda, yup yeah I was suggesting that (checking before deploying)20:47
NobodyCamsame link?20:47
* lucasagomes too much scrollback20:47
devanandaNobodyCam: http://paste.openstack.org/show/b0XISoOBBfD48UrSB0CI/20:47
jrollwhoa, lucas is here late20:48
lucasagomesjroll, devananda the thing is that putting before doesn't help cause you don't know how much time the machine is sitting on available (could be 1 month)20:48
NobodyCamdevananda: I asked earlier but may have missed the answer. you dropped the rebuild state?20:48
lucasagomesjroll, and it could be optional :) so it can be fast20:48
lucasagomesfor the baremetal-to-tenant use case20:48
lucasagomesjroll, yeah20:48
lucasagomesI should go sleep :D20:49
jrolllucasagomes: I still disagree20:49
devanandalucasagomes: we already have a power status check loop for nodes in that state20:49
devanandalucasagomes: no reason we couldn't add a similar check loop for other things there20:49
lucasagomespower status check?20:49
lucasagomesdevananda, +120:49
lucasagomesyeah offering some flexibility for checks there is good20:49
lucasagomesif operators wants, cause u know someone may have pulled the cable20:49
devanandalucasagomes: that runs in the background on available nodes, but not after a user has started booting one20:49
lucasagomeslike a periodic task?20:50
devanandalucasagomes: we'll already notice if someone pulled the IPMI cable out20:50
devanandalucasagomes: yes. periodic task20:50
devanandawhich only runs on nodes in AVAILABLE state20:50
lucasagomesright, yeah it's fine. Doesn't need to be a state20:50
lucasagomesbut it's good that we have in mind that this is a valid use case and we need to tackle it somehow20:50
lucasagomesdevananda, fair enff20:50
lucasagomesjroll, seems to disagree, I still think its valid :)20:51
lucasagomesbut I'm ok with the periodic task20:51
devanandalucasagomes: i object to putting in a mandatory inband-status-assertion-check during deploy20:51
devanandathat'll slow down deploys by way, way too much20:51
lucasagomesdevananda, it could be optional20:51
lucasagomesthat's what I'm trying to point, it's not because it's represented as a state that it has to be mandatory20:52
lucasagomeszapping is optional afaiui20:52
devanandabut doing something in a periodic task to assert that nodes which we think are AVAILABLE actually are, and still have the same properties as the last time we checked?20:52
devanandasure, that's fine20:52
*** igordcard has quit IRC20:52
lucasagomesdevananda, yeah the periodic task works too :020:52
devanandalucasagomes: in a state machine like A -> B -> C, state "B" is not optional20:52
lucasagomes:)*20:52
* jroll just keeps his datacenter locked and doesn't worry about idle servers being changed20:53
devanandait may be no-op'd, but it's not optional20:53
lucasagomesdevananda, that goes back to the FSM20:53
lucasagomesstate -> action -> state20:53
NobodyCamjroll: power supplies do fail20:53
devanandajroll: garden gnomes. they're sneaky ...20:53
lucasagomesactions can be optional20:53
lucasagomeswhich is where the task runs20:53
jrollNobodyCam: I feel like we might notice that through IPMI, dunno20:53
lucasagomesin a FSM the engine drives the code from one state to another and just call some hooks (aka actions)20:53
lucasagomesand it could be non-op20:54
jrolllook, without this, worst case scenario is the deploy fails and is rescheduled20:54
devanandajroll: ++20:54
jrollwhich sucks as far as time that 'nova boot' takes20:54
jrollbut like, isn't that bad20:54
devanandaand the node gets kicked into maintenance mode20:54
jrollerror status, but yeah20:54
jrollsame idea20:54
devanandaright20:54
lucasagomesalright I think we agreed with the periodic task ting20:55
lucasagomesthing20:55
lucasagomesI don't wanna go back to discuss FSM vs non-FSM20:55
lucasagomes(did had a pleasant time doing that)20:55
lucasagomesdidn't*20:55
*** anderbubble has joined #openstack-ironic20:55
*** igordcard has joined #openstack-ironic20:56
* lucasagomes brb20:58
*** jjohnson2 has joined #openstack-ironic21:00
devanandaanyone interested in the cross-project meeting?21:00
devanandait's starting now21:00
NobodyCamdid we have a volenter from our team for that?21:01
* jroll will be lurking21:01
devanandawe have folks doing multiple cross project things21:01
NobodyCamvolunteer21:01
devanandaapi, oslo, stable maint, vuln, etc ...21:01
devanandathis is a general all kinds of cross project thing21:01
devanandathing21:01
NobodyCamlucasagomes: when you get back.. take a look at https://review.openstack.org/#/c/138109 if you have the time21:03
*** marcoemorais has quit IRC21:03
jjohnson2well, I officially have coded in VR21:04
NobodyCamn VR?21:04
jjohnson2NobodyCam, yeah, had my editor up in my oculus21:04
NobodyCamoh cool21:04
jjohnson2now I'm done doing that21:04
jjohnson2it needs a few more pixels before it'll be comfortable developing on a 12 meter screen 10 meters away21:05
rloodevananda: wrt your latest diagram, nit: need arrow from MANAGED to AVAILABLE (and should the request be 'manage'? or eg 'available'?)21:09
devanandarloo: there is such an arrow. it goes through CLEAN though21:10
devanandarloo: and the verb is "provide"21:10
rloodevananda: oh, I thought it was going to be optional to clean from MANAGED -> AVAILABLE. guess not.21:11
rloodevananda: that was my (hopefully) last question. why is the verb 'provide' instead of 'clean'?21:11
devanandarloo: i thought so too, but folks this morning were fairly adamant about it21:11
JayFIf you have cleaning disabled, obviously that step's a noop, right?21:11
devanandaclean could, presumably, decide if it wants to no-op when coming from managed (or something)21:11
devanandaJayF: sure21:11
JayFyeah exactly21:12
rlooJayF: I was wondering if you might want to skip the clean when you go from MANAGED-> AVAIL, but always clean after a deploy.21:12
mrdarloo: thank you for your wise and thorough review of the logical-name spec21:12
JayFthis is a plank in my+joshnang's platform21:12
rlooI guess if the 'clean' is smart enough to know when it might want to clean. like a lazy janitor :D21:12
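A minimal sketch of the transitions agreed in this discussion, tracking (current, target) pairs so that ZAPPING with target MANAGED stays distinct from CLEANING with target AVAILABLE. The table and function are hypothetical illustrations; the request verbs "zap", "provide", and "delete" and the FOOING-style state names are taken from the conversation, and the pasted diagrams may differ in detail:

```python
# Hypothetical sketch, not Ironic code.
# (starting state, request verb) -> states passed through; the last is the target.
PATHS = {
    ("MANAGED", "zap"): ["ZAPPING", "MANAGED"],           # manual, operator-invoked
    ("MANAGED", "provide"): ["CLEANING", "AVAILABLE"],    # clean before production
    ("ACTIVE", "delete"): ["DELETING", "CLEANING", "AVAILABLE"],  # automatic
}


def walk(current, verb):
    """Return the (current, target) pairs a node reports for a request.

    CLEANING is always traversed; with cleaning disabled it would simply do
    nothing, so the state is a no-op rather than optional.
    """
    steps = PATHS.get((current, verb))
    if steps is None:
        raise ValueError(f"{verb!r} is not a valid request from {current}")
    target = steps[-1]
    return [(state, target) for state in steps]


print(walk("MANAGED", "zap"))
print(walk("ACTIVE", "delete"))
```

Keeping the target alongside the current state is what lets an observer tell an operator-requested zap apart from the automatic cleanup after a delete.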
mrdarloo: but now I have to patch the merged spec as the point you raised is very valid :)21:14
rloomrda: yw. sorry, i meant to look at it sooner. but I feel overwhelmed when I look at the list of specs and my 'method' of starting with the older specs was probably not a good idea.21:14
NobodyCamrloo: ++ to starting with older specs ... Thank you21:15
mrdarloo: +121:15
mrdarloo: appreciate your comments - had a small brain fade, which I now have to fix :)21:16
rlooNobodyCam, mrda: good idea in theory, but I'm learning not to be too strict about it ;)21:16
rloomrda: no worries. I'm sure we would have picked them up at coding time. but I like sooner better than later ;)21:16
mrdayup21:17
rloomrda. i think it is difficult to get a spec 'correct'. so good enuf is good enuf!21:17
jrollrloo: great comments, sorry I landed that early21:18
rloojroll: that issue with no stack trace for the exception in the conductor/deploy, are you going to handle that? (eg open a ticket or whatever)?21:18
jrollgah, did a bug not get filed?21:19
rloojroll: no worries. I don't think you landed that early, we have to get those specs approved.21:19
rloojroll: I don't think so. looking...21:19
jrollI'd rather not own that21:19
rloojroll: ok, that was my other question. ok, i'll open a bug for it so we don't forget. I'm only opening a bug ;)21:20
jrollok, thanks21:20
jrollwe may have a use for lots of low-hanging fruit :)21:21
rlooit'll take me longer to write up the bug than to fix it I think ;)21:21
mrdaSo, regarding the logical-name spec, now that I have to patch it - do I just raise a new review with the (small) patch to remove the reference to tenant?21:21
mrdaand should I worry about a bug?21:21
* mrda thinks not21:21
rloomrda: yup. i've got a small patch up for some other spec.21:21
JayFI wouldn't bug it at all21:21
rloomrda: no bug21:21
jrollmrda: jfdi :)21:22
rlooha ha21:22
mrdaok cool, I'll raise a new review and fix my brain fade.  Thanks for the direction.21:22
lucasagomesNobodyCam, sure :) I will add it to the todo list here and review tomorrow morning21:32
lucasagomesNobodyCam, it's a bit late now :) /me wants to relax :D21:32
*** mikedillion has quit IRC21:32
NobodyCamlucasagomes: it's the start of jjohnson2's ipmi to system command listener21:33
lucasagomesyeah I did a quick skimming21:33
NobodyCam:)21:33
jjohnson2huh?21:33
NobodyCamyour wip patch21:33
lucasagomesawesome! seems we are going to have the ipmi listener (BiMiC) :)21:33
jjohnson2yeah, ipmi 2.0 only21:34
jjohnson2I'm doing the rmcp+ open session request parsing now21:34
*** openstackgerrit has quit IRC21:34
lucasagomescool! 1.5 can come later no hurry (if needed as well)21:34
*** openstackgerrit has joined #openstack-ironic21:35
NobodyCambrb ... /me looks for some food stuffs21:35
openstackgerritMerged openstack/ironic-specs: iRMC Power Driver for Ironic  https://review.openstack.org/13448721:35
jjohnson2I might further walk the line of ipmitool and pyghmi compatibility testing first21:35
jjohnson2cipher suite 3 specifically21:36
*** alexpilotti has quit IRC21:38
devanandanow, with REBUILD: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/21:40
*** romcheg has joined #openstack-ironic21:40
JayFdevananda: [ot] does nova rebuild in Ironic guarantee the same backend node?21:40
devanandaJayF: yes21:41
JayFclearly it does with preserve ephemeral, but what about the other cases?21:41
openstackgerritMerged openstack/ironic-specs: Don't deprecate maint mode updates via node-update  https://review.openstack.org/13817821:46
*** ParsectiX has quit IRC21:50
NobodyCamw00 h00 :) rebuild21:52
*** mikedillion has joined #openstack-ironic21:53
NobodyCamdevananda: do you think rescue would ever have a need to redeploy anything?21:54
*** sreekanth has quit IRC21:54
*** anderbubble has quit IRC21:55
*** Masahiro has joined #openstack-ironic21:56
*** ParsectiX has joined #openstack-ironic21:57
*** ParsectiX has quit IRC21:59
*** ParsectiX has joined #openstack-ironic21:59
devanandaNobodyCam: then it's rebuild22:00
devanandaNobodyCam: rescue should be "net boot this machine into a recovery ramdisk so I can troubleshoot it"22:00
*** Masahiro has quit IRC22:01
devanandawhich, actually, as an operator, i might want to do from the MANAGEMENT side22:01
devanandaer, MANAGED22:01
devanandawithout ever going to AVAILABLE or ACTIVE22:01
devanandabah22:01
devanandawhy did i have to think of that22:01
*** linggao has quit IRC22:02
devanandaright, that's thinking of it the wrong way. time for more coffee.22:02
*** anderbubble has joined #openstack-ironic22:04
*** dprince has quit IRC22:06
NobodyCam:)22:06
*** alexpilotti has joined #openstack-ironic22:08
victor_lowther devananda: I will get started on the next rev of the state machine spec using your latest graph.22:10
*** igordcard has quit IRC22:12
devanandavictor_lowther: cheers22:13
*** ryanpetrello has quit IRC22:18
openstackgerritJarrod Johnson proposed stackforge/pyghmi: Implement server side IPMI protocol (WIP)  https://review.openstack.org/13810922:19
*** mjturek has quit IRC22:21
*** Hefeweizen has quit IRC22:22
*** mikedillion has quit IRC22:23
*** ryanpetrello has joined #openstack-ironic22:23
*** lucasagomes has quit IRC22:25
JayFdevananda: no22:29
*** jjohnson2 has quit IRC22:29
JayFdevananda: rescue is a nova concept; how can you rescue an instance if there is no instance to rescue22:30
devanandaJayF: right. and the thing I was thinking of is ZAP22:32
*** foexle has quit IRC22:38
*** ryanpetrello has quit IRC22:40
*** anderbubble has quit IRC22:42
*** ryanpetrello has joined #openstack-ironic22:44
*** erwan_taf has quit IRC22:45
openstackgerritMichael Davies proposed openstack/ironic-specs: Updates to logical name spec from review 134439  https://review.openstack.org/13856522:45
JayFmrda: ^ +222:47
*** ryanpetrello has quit IRC22:48
victor_lowtherdevananda: why UNRESCUE?22:48
devanandait's not DEPLOYING and it's not RESCUING22:49
*** anderbubble has joined #openstack-ironic22:49
victor_lowtherwhat I mean is22:49
devanandareturning the instance to the ACTIVE state22:50
victor_lowtherwhat is Ironic doing during that state transition?22:50
JayFturning machine off22:50
jrollrebooting to the instance22:50
JayFchanging boot device22:50
JayFflipping networks22:50
devanandaprobably changing the PXE configs22:50
JayFturning machine on22:50
devanandapossibly those things too22:50
JayFaweeks: ^ you should likely read this22:50
victor_lowtherok.22:50
victor_lowtherquestion answered.22:50
devanandaso yah. there is an implicit UNRESCUEFAIL in my diagram too22:50
aweeksyo22:51
mrdathanks JayF22:51
devanandaas a path forward for the code itself22:51
aweeksI'm in the process of implementing rescue mode22:51
aweeksinternally so far22:51
devanandado y'all think it might be worth implementing the current states in a state machine22:52
devanandalanding that22:52
devanandathen moving to the new states?22:52
JayFThat just sounds really confusing tbh22:52
JayFversus a clean "break" and migration22:52
NobodyCamthat seems like more work22:52
devanandamore work yes. easier to reason about the steps involved, migration / upgrade path, etc22:52
devanandaJayF: clean break doesn't sound like a good thing to me. why does it to you?22:53
aweeksdevananda: is there currently a proposal for states related to rescue mode?22:53
devanandaaweeks: http://paste.openstack.org/show/ojbuBbsQGNlDMyz2mnPj/22:53
victor_lowtherFor (UN)?RESCUING, is ironic responsible for the pxe/bootimg/whatever swizzling, or Something Nova-Ish?22:53
aweeksah, thanks22:53
devanandavictor_lowther: ironic is22:53
jrollvictor_lowther: ironic is22:53
JayFvictor_lowther: Ironic does it; Nova just tells us to rescue/unrescue22:53
victor_lowtherok22:53
jrollnova has the virt driver calls22:53
aweeksyeah, there are two calls: rescue(), unrescue(), and a "RESCUED" state in Nova22:54
JayFdevananda: At first thought it seems simpler ... but honestly I'd defer to knowledge of others :) Upgrading a state machine without breaking backwards compat is hard :)22:54
*** jgrimm is now known as zz_jgrimm22:55
*** marcoemorais has joined #openstack-ironic22:55
*** anderbubble has quit IRC22:56
devanandaJayF: let's assume it is hard but not impossible within this cycle. is that worth it?22:56
aweeksdevananda: in that diagram, what do the *s represent? "[RESCU*/RESCUE]"22:56
*** anderbubble has joined #openstack-ironic22:57
devanandaaweeks: see 133828 and comments thereon22:57
JayFdevananda: I don't know :)22:57
*** romcheg has quit IRC22:57
aweeksdevananda: got it, thanks22:58
devanandaJayF: massively breaking backwards compat the first cycle after integration isn't exactly on my priority list, btw :)22:58
*** marcoemorais1 has joined #openstack-ironic22:58
JayFdevananda: are you sure? I think it'd be great, and there's lots of precedent for it to ;)22:59
JayFs/to/too/22:59
devanandalol22:59
*** bradjones has quit IRC22:59
*** marcoemorais has quit IRC23:00
aweeksdevananda: not sure if relevant, but my implementation so far only has two states: RESCUEWAIT, and RESCUED in ironic23:02
aweeksand implements the rescue() and unrescue() functions in the virt driver23:02
devanandaaweeks: that's fine for now23:02
JayFaweeks: likely you'll want to either convince devananda to adopt the states you see or change the states you're using, lol23:02
devanandaaweeks: we'll likely rename all the states soon anyway23:03
aweeksI don't really care about the names23:03
* devananda gets out the alphabet soup23:03
jrollhehe23:03
*** marcoemorais1 has quit IRC23:03
aweeksdevananda: JayF: also, to be clear, the proposal includes removing *WAIT states, and instead has two separate states (the actual state, and a "wait" state)?23:05
*** marcoemorais has joined #openstack-ironic23:06
devanandaaweeks: nope. it introduces a not-yet-well-defined wait flag23:06
*** ryanpetrello has joined #openstack-ironic23:06
aweekshurm23:06
devanandait does remove the *WAIT states, though -- that's correct23:06
devanandaDEPLOYING+wait23:06
devanandaRESCUING+wait23:06
devanandaetc23:07
NobodyCamrloo: if you have a free minute, can you give https://review.openstack.org/#/c/138565/ a quick look over23:07
*** harlowja_ is now known as harlowja_away23:07
*** spandhe has quit IRC23:08
openstackgerritVictor Lowther proposed openstack/ironic-specs: New Ironic provisioner state machine.  https://review.openstack.org/13382823:09
victor_lowtherdevananda: I think we should drop the wait flag stuff for now23:09
*** alexpilotti has quit IRC23:09
victor_lowtherin the interests of finalizing the spec by the end of the week.23:09
devanandavictor_lowther: then we need to add *WAIT to the STATE* description23:10
aweeksso, my possibly uninformed perspective is that it seems like the ironic state machine should be a super set of the nova state machine.  in that there are a set of states in ironic that are 1-1 with the nova states, and edges in the nova state machine can be replaced by 1 or more states/edges in ironic?23:10
devanandavictor_lowther: because we must have a DEPLOYWAIT state, or equivalent23:10
devanandaor the state machine can't handle the current drivers23:10
victor_lowtherah23:10
devanandaaweeks: superset, yes. there are also states within ironic where the node is not even visible to nova23:11
victor_lowtherMind throwing which states need that treatment at the newly-updated spec?23:11
aweeksthe idea being that the ACTIVE (nova) -> rescue() -> RESCUED (nova) -> unrescue() -> ACTIVE (nova) in nova could be transformed into:  ACTIVE (ironic) -> rescue() -> INTERNALSTATE (ironic) -> ... RESCUED (ironic) -> INTERNALSTATE (ironic) -> unrescue() -> ACTIVE  (ironic)23:12
*** harlowja_away is now known as harlowja_23:12
aweekswith ACTIVE and RESCUED being 1-1 between ironic/nova23:12
aweeksbut with intermediate states potentially in ironic23:12
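The superset idea above can be sketched as a small transition table. This is only an illustration: RESCUING and UNRESCUING stand in for the "INTERNALSTATE" placeholders and are assumptions, not states confirmed in this conversation, and the helper names are hypothetical.

```python
# States that exist 1-1 in both Nova and Ironic, per the discussion.
NOVA_VISIBLE = {"ACTIVE", "RESCUED"}

# One Nova edge expands into one or more Ironic states.
# RESCUING / UNRESCUING are placeholder names for the internal states.
IRONIC_TRANSITIONS = {
    ("ACTIVE", "rescue"): ["RESCUING", "RESCUED"],
    ("RESCUED", "unrescue"): ["UNRESCUING", "ACTIVE"],
}


def ironic_path(state, verb):
    """Return every Ironic state visited, starting state included."""
    return [state] + IRONIC_TRANSITIONS[(state, verb)]


def nova_view(path):
    """Collapse an Ironic path to the states Nova can observe."""
    return [s for s in path if s in NOVA_VISIBLE]
```

For example, `nova_view(ironic_path("ACTIVE", "rescue"))` yields only the ACTIVE and RESCUED endpoints, while the Ironic path keeps the intermediate state.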
jrollunrelated: I really wish I could add arbitrary fields to node-list in the client23:12
devanandavictor_lowther: at a minimum, deploy, clean, zap.. possibly also validate, inspect23:13
victor_lowtherjroll: I was surprised that you could not.23:13
aweeksor similar for other state transitions23:13
devanandajroll: you mean, because the client has to change, or the API service doesn't return the fields you want?23:13
victor_lowtherdevananda: so basically, instead of -ING states they should be -WAIT23:14
victor_lowther?23:14
NobodyCamvictor_lowther: "In the active state, Ironic is doing something to the node." just checking that's not referring to the ACTIVE state23:14
victor_lowtherline?23:14
NobodyCam119-12023:14
jrolldevananda: I mean, as an operator, I want to do a node-list and get last_error as well23:15
jrolljust throwing it out there23:15
victor_lowtherno, otherwise it would be in CAPS.  That usage refers to the -ING state.23:15
NobodyCam:)23:16
*** alexpilotti has joined #openstack-ironic23:16
devanandavictor_lowther: more granularly, it may go like this, for some drivers23:16
devanandaDEPLOYING (ironic-conductor is doing things)23:16
devanandaDEPLOYWAIT (conductor is idle, lock is released, and an agent is doing something on the node locally)23:17
devanandaDEPLOYING (ironic conductor is working on it again, since the agent is done)23:17
devanandaDEPLOYDONE (hand off...)23:17
devanandaACTIVE23:17
devanandajroll: ^ fair statements?23:17
jrollyes23:18
devanandaI think that is better modelled by DEPLOYING +/- WAIT_FLAG23:18
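A rough sketch of the "DEPLOYING +/- WAIT_FLAG" model, assuming the wait flag means the conductor is mid-task, has released the node lock, and is waiting for an external callback. The flag is described in this conversation as not yet well defined, so the class, its fields, and the legacy-name table below are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from (state + wait flag) back to today's *WAIT names.
LEGACY_WAIT_NAMES = {"DEPLOYING": "DEPLOYWAIT", "RESCUING": "RESCUEWAIT"}


@dataclass
class Node:
    state: str = "DEPLOYING"
    waiting: bool = False  # hypothetical wait flag
    locked: bool = True    # conductor currently holds the node lock

    def hand_off_to_agent(self):
        """Conductor goes idle: release the lock and wait for a callback."""
        self.waiting = True
        self.locked = False

    def agent_called_back(self):
        """Agent is done: conductor takes the lock and resumes work."""
        self.waiting = False
        self.locked = True

    def legacy_name(self):
        """Name the node would have under separate *WAIT states."""
        if self.waiting:
            return LEGACY_WAIT_NAMES.get(self.state, self.state)
        return self.state
```

With this shape the DEPLOYING, DEPLOYWAIT, DEPLOYING sequence becomes one state whose flag toggles, rather than three state changes.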
victor_lowtherwell, if lucasgomes does not like our state machine now...23:19
victor_lowtherya23:19
victor_lowtherwhat I have been missing is a clear articulation of precisely when and how the wait flag would work.23:19
devanandait signals that ironic is mid-task, but has released the lock and is waiting for an external call-back23:19
*** ryanpetrello has quit IRC23:20
devanandathe same process happens within introspection23:20
victor_lowtherspecifically around how the node handoff to and from whatever external agent works.23:20
victor_lowtherargh, after 1700 here.23:20
NobodyCamthat's just what we do now23:21
devanandafor the PXE driver, there's a waiting period after the machine is first powered on23:21
victor_lowtherGotta scram.23:21
devanandavictor_lowther: ack, ttyl23:21
*** alexpilotti has quit IRC23:21
NobodyCamhave a good night victor_lowther23:21
NobodyCamthank you for the awesome effort23:21
NobodyCamand others too23:21
dlaubeg'night victor_lowther23:22
*** bradjones has joined #openstack-ironic23:28
*** Haomeng has joined #openstack-ironic23:33
*** spandhe has joined #openstack-ironic23:34
*** Haomeng|2 has quit IRC23:34
*** anderbubble has quit IRC23:35
*** andreykurilin_ has quit IRC23:39
*** Masahiro has joined #openstack-ironic23:45
rloomrda: I was just looking at your patch 13856523:47
rloomrda: does it say anywhere that the logical names must be unique?23:47
*** yuanying has joined #openstack-ironic23:47
NobodyCamrloo: I inferred that from the 1:1 uuid statement... mabe incorrectly23:47
NobodyCammaybe*23:47
rloomrda: or is that what '1:1 mapping between a <logical name> and a <node uuid>' means.23:48
rlooNobodyCam: ok. I must be tired, I don't remember what 1:1 mapping means!23:48
* NobodyCam just found https://wiki.openstack.org/wiki/OpenStackClient/HumanInterfaceGuidelines23:48
JayFrloo: 1:1 mapping means no duped names or uuids23:49
JayFrloo: each name maps to exactly one uuid and vice-versa23:49
NobodyCamdoes this mean we need to support --format in our cli23:49
rlooJayF: thx for clarifying!23:49
*** Masahiro has quit IRC23:49
JayFNobodyCam: openstackclient != python-*client iirc23:49
JayFNobodyCam: I think openstackclient is the "openstack" command/sdk people are working on23:49
NobodyCamok I took it as openstack clientS23:50
NobodyCamyou are correct23:50
*** yuanying_ has quit IRC23:50
rlooNobodyCam: Thx for asking; I'm good with 138565. Would you do the honours and approve it?23:51
NobodyCam:)23:51
NobodyCamrloo will do23:51
NobodyCamrloo: done23:51
mrdarloo: yes, a 1:1 mapping between logical_name and uuid implies that logical_name needs to be unique23:52
mrda(or at least I intended it to be so)23:52
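A small sketch of what a 1:1 mapping between logical name and node uuid implies: no name maps to two uuids, and no uuid carries two names. The `NodeNames` class and its methods are hypothetical and not taken from the spec or from Ironic.

```python
class NodeNames:
    """Hypothetical registry enforcing a 1:1 logical name <-> uuid mapping."""

    def __init__(self):
        self._uuid_by_name = {}
        self._name_by_uuid = {}

    def assign(self, name, uuid):
        # Reject a name that already belongs to a different node.
        if self._uuid_by_name.get(name, uuid) != uuid:
            raise ValueError("name %r is already in use" % name)
        # Reject a second name for a node that already has one.
        if self._name_by_uuid.get(uuid, name) != name:
            raise ValueError("node %s already has a name" % uuid)
        self._uuid_by_name[name] = uuid
        self._name_by_uuid[uuid] = name

    def uuid_for(self, name):
        return self._uuid_by_name[name]

    def name_for(self, uuid):
        return self._name_by_uuid[uuid]
```

Keeping both directions in separate dictionaries makes each uniqueness check a single lookup.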
openstackgerritJosh Gachnang proposed openstack/ironic-python-agent: Use LLDP to get switch port mapping  https://review.openstack.org/9262723:52
NobodyCamthats how I took it23:52
*** Hefeweizen has joined #openstack-ironic23:52
NobodyCamJoshNang: ooooo neat-oh23:52
JoshNangNobodyCam: :D23:52
JoshNangit's basically going to require a custom hardware manager per switch manufacturer though. lldp is a not great format23:53
JayFNobodyCam: we're running that in our prod hw manager to verify our ports are accurate today23:53
* mrda just upgraded his internet from 6/0.3 to 30/1. It's a nice change :)23:54
openstackgerritMerged openstack/ironic-specs: Updates to logical name spec from review 134439  https://review.openstack.org/13856523:54
JayFmrda: congratulations, that's what, like 20% of all the internet down there :P23:54
NobodyCamlol23:54
mrdalol, not exactly.  Some people are getting 1000 down.23:55
NobodyCamJayF: would that look like this: https://scholarworks.iu.edu/dspace/bitstream/handle/2022/171/image9CP.JPG23:56
mrdaADSL -> Cable23:56
* NobodyCam has cable modem installed in his RV :) 100/30 atm I think23:56
JayFNobodyCam: which one of the cans is for the nsa?23:56
NobodyCamlol23:56
mrdaNobodyCam: I can go there too, but it's an extra 20/month.  I'll see how this goes for now.23:57
JayFthat's nice.23:57
JayFI have 50 down now but I can get 300 down for like $20/month23:58
mrdaIt's really the change from 0.3 to 1 up that's important.  Video calls are hard at 0.3 up.23:58
NobodyCam:) /me pays a bit more as he never gets the "contract" price23:58
NobodyCammrda: audio only is ruff at .323:59
*** ryanpetrello has joined #openstack-ironic23:59
