Wednesday, 2014-04-16

*** ChanServ changes topic to "OpenStack Bare Metal Provisioning | Docs: http://docs.openstack.org/developer/ironic/ | Bugs: https://bugs.launchpad.net/ironic | Status: https://etherpad.openstack.org/p/IronicWhiteBoard"00:02
openstackgerritJay Faulkner proposed a change to openstack/ironic-python-agent: Use docker import/export to make image smaller  https://review.openstack.org/8781900:19
openstackgerritJay Faulkner proposed a change to openstack/ironic-python-agent: Use docker import/export to make image smaller  https://review.openstack.org/8781900:20
dwalleck_devananda: I almost mentioned this earlier when we were talking about Devstack instructions, but is getting Ironic docs up to api.openstack.org and other parts of OpenStack something that needs some help? I have nothing but time at this point :-)00:23
devanandadwalleck_: yep! that'd be great00:24
dwalleck_Good deal. I'll poke around in my spare time and see if I can get something bootstrapped. I think it shouldn't be too bad00:26
*** matsuhashi has joined #openstack-ironic00:30
*** zdiN0bot has joined #openstack-ironic00:32
*** yongli has joined #openstack-ironic00:36
*** zdiN0bot has quit IRC00:36
dwalleck_oh my....they generate everything from wadls00:39
*** blamar has quit IRC00:41
*** zdiN0bot has joined #openstack-ironic00:43
*** eguz has quit IRC00:45
*** zdiN0bot has quit IRC00:47
russell_hdwalleck_: :)00:47
dwalleck_This is what I get for trying to help =P00:48
*** blamar has joined #openstack-ironic00:59
russell_hdwalleck_: we're supposed to have people who know how that stuff works00:59
*** newell_ has quit IRC01:02
*** ilives has joined #openstack-ironic01:16
*** matsuhashi has quit IRC01:40
*** matsuhas_ has joined #openstack-ironic01:43
*** zdiN0bot has joined #openstack-ironic01:44
*** zdiN0bot has quit IRC01:48
*** coolsvap|afk is now known as coolsvap02:35
*** harlowja is now known as harlowja_away02:35
*** zdiN0bot has joined #openstack-ironic02:44
*** zdiN0bot has quit IRC02:48
*** matsuhas_ has quit IRC02:49
*** matsuhashi has joined #openstack-ironic02:49
*** matsuhas_ has joined #openstack-ironic02:52
*** matsuhashi has quit IRC02:52
*** matsuhas_ has quit IRC02:58
*** nosnos has quit IRC03:35
*** zdiN0bot has joined #openstack-ironic03:45
*** zdiN0bot has quit IRC03:49
*** pradipta` is now known as pradipta04:19
*** nosnos has joined #openstack-ironic04:25
*** saju_m has joined #openstack-ironic04:27
*** saju_m has quit IRC04:27
*** lazy_prince has joined #openstack-ironic04:34
*** coolsvap is now known as coolsvap|afk04:35
*** dwalleck_ has quit IRC04:36
*** zdiN0bot has joined #openstack-ironic04:46
*** zdiN0bot has quit IRC04:50
*** coolsvap|afk is now known as coolsvap04:52
*** Mikhail_D_ltp has joined #openstack-ironic04:54
*** radsy has quit IRC04:56
*** zdiN0bot has joined #openstack-ironic05:23
*** zdiN0bot has quit IRC05:27
*** pradipta is now known as pradipta_away05:40
*** florentflament has quit IRC05:49
*** sabah has joined #openstack-ironic05:56
*** pradipta_away is now known as pradipta06:03
*** zdiN0bot has joined #openstack-ironic06:23
*** zdiN0bot has quit IRC06:28
*** pradipta is now known as pradipta_away06:38
*** Mikhail_D_ltp has quit IRC06:51
openstackgerritHaomeng,Wang proposed a change to openstack/ironic: Implements send-data-to-ceilometer  https://review.openstack.org/7253806:51
*** romcheg1 has joined #openstack-ironic06:57
*** foexle has joined #openstack-ironic07:03
*** ilives has quit IRC07:11
*** ilives has joined #openstack-ironic07:11
*** zdiN0bot has joined #openstack-ironic07:24
*** zdiN0bot has quit IRC07:29
GheRiveromorning all07:36
HaomengGheRivero: morning:)07:36
Mikhail_D_wkGood morning all! :)07:39
*** Haomeng has quit IRC07:44
*** jistr has joined #openstack-ironic07:52
*** mrda is now known as mrda_away08:00
*** romcheg1 has left #openstack-ironic08:03
*** lucasagomes has joined #openstack-ironic08:07
*** yuriyz has joined #openstack-ironic08:12
*** derekh has joined #openstack-ironic08:12
dtantsurMorning Ironic, morning GheRivero, Haomeng, Mikhail_D_wk :)08:16
Mikhail_D_wkdtantsur: morning :)08:17
yuriyzmorning All08:17
lucasagomesmorning yuriyz dtantsur Mikhail_D_wk08:18
lucasagomeslifeless, morning, ping re https://bugs.launchpad.net/ironic/+bug/1199665 and https://bugs.launchpad.net/ironic/+bug/123135108:18
Mikhail_D_wkyuriyz,  lucasagomes morning! :)08:18
lifelesslucasagomes: hi08:18
*** athomas has joined #openstack-ironic08:18
lucasagomeslifeless, they both contradict... I would like ur opinion on that, should we be caching images at all?08:19
lifelesslucasagomes: oh, so hmmm08:20
lifelesslucasagomes: I don't think we want to grow storage without bound08:20
dtantsurlucasagomes, lifeless morning :)08:21
lifelesslucasagomes: is there a size limit on the cache?08:21
dtantsurlifeless, not now, but I can add one (+ timeouts as well)08:21
lucasagomeslifeless, not in trunk, but there's a patch upstream by dtantsur that does limit the size of the cache08:21
yuriyzdtantsur, looks like the image will not be downloaded if it already exists, even if it was changed in Glance https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L33008:22
dtantsuryuriyz, yes, and my point is to change this as well, so that if checksum changes, image is redownloaded no matter what08:22
dtantsurlifeless, ^^08:22
lucasagomesyeah there's no cache invalidation in that patch, but would be good to add08:23
lucasagomesdtantsur, got the link handy there?08:23
lifelessso08:24
lifelessI think:08:24
lifeless - total size limit08:24
lifeless - if exceeded we need to serialize things carefully (e.g. what if a single image is larger than cache size)08:24
lifeless - cache size doesn't apply to kernel and initrd files08:24
lifeless - delete images if not used for $time08:25
lifelesssounds great to me08:25
*** zdiN0bot has joined #openstack-ironic08:25
lucasagomeslifeless, right that's a good list08:25
lifelessand the bugs don't conflict :)08:25
lucasagomeslifeless, and for the default option, by default should cache be disabled?08:25
lifelesslucasagomes: enabled08:25
*** Haomeng has joined #openstack-ironic08:25
lucasagomesas we have that rule of having defaults ready for production08:25
lucasagomeslifeless, ack08:25
lucasagomeslifeless, cheers for the input08:26
lifelesslucasagomes: nova BM takes nearly 2 hours to deploy to a rack08:26
lifelesslucasagomes: ironic today takes 22m08:26
dtantsurlifeless, could you leave this as a comment to my patch?08:26
lifelesslucasagomes: :)08:26
lucasagomesheh08:26
lucasagomesw00t08:26
dtantsurlifeless, wow! really wow08:26
lifelessdtantsur: I need to pop out of here and do other stuff - just copy the IRC log :>08:26
dtantsurlifeless, ack :)08:27
Haomenglucasagomes: morning08:27
lucasagomesHaomeng, morning there :)08:27
Haomenglucasagomes: one quick question: do you know which python-keystoneclient version is used by Jenkins? I found it is not the latest one, but my local python-keystoneclient is the latest, and our ironic config refers to the keystoneclient conf files08:28
dtantsurifarkas, we reached some consensus on caching above ^^^08:29
Haomenglucasagomes: so the output of "tools/config/generate_sample.sh" differs from Jenkins', because we have different base versions08:29
*** zdiN0bot has quit IRC08:29
Haomenglucasagomes: where can we get the same python-keystoneclient version as Jenkins?08:30
lucasagomesHaomeng, hmm I think Jenkins will download it from pip as we do when generating our venv08:30
lucasagomesHaomeng, one possible problem would be if the new version of python-keystoneclient was just freshly released and infra mirrors the pip repo08:30
lucasagomesit would be outdated08:30
Haomenglucasagomes: I think so, but I'm not sure if the pip source is the same one08:30
Haomengmaybe08:30
* ifarkas reads back08:31
Haomenglucasagomes: :)08:31
lucasagomesHaomeng, I dunno much about infra, I think they do have a mirror... I think it's worth asking at #openstack-infra08:31
Haomenglucasagomes: but it is difficult to get the same version as Jenkins, not sure about the pip repo08:31
Haomengused by jenkins08:31
Haomenglucasagomes: ok08:31
lucasagomes:)08:32
Haomenglucasagomes: let me check with #openstack-infra irc, thank you:)08:32
lucasagomesHaomeng, np :)08:32
Haomenglucasagomes: nice day:)08:32
Haomenglucasagomes: :)08:32
lucasagomesHaomeng, you too buddy08:32
Haomenglucasagomes: :)08:32
*** dshulyak has joined #openstack-ironic08:34
*** pradipta_away is now known as pradipta08:35
lifelesslucasagomes: actually let me go further08:44
lifelesslucasagomes: ironic is fast because it was designed to share state for images etc08:44
lifelesslucasagomes: I would say the cache cannot be disabled, only tuned.08:44
lucasagomeslifeless, wouldn't setting the total size limit to 0 be a way to "disable" it?08:44
lucasagomeswhich is grand imo08:45
lifelessno - consider the case of 'image bigger than cache' - we have to download the image; we should deploy - it would be ideal then if there are other deploys needing that image to reuse it before releasing whatever lock and having it deleted08:46
*** dtantsur is now known as dtantsur|lunch08:46
lucasagomeslifeless, ahh +1..08:48
lucasagomeslifeless, if the image is bigger than the cache AND nothing links to it any more, then we get rid of it08:48
*** eghobo has joined #openstack-ironic08:51
lifelesslucasagomes: right, though I'm not sure we ever have links for images we're going to dd...08:51
*** martyntaylor has joined #openstack-ironic08:53
lucasagomesright, some condition then... the cleanup function could check whether more nodes set to be deployed are going to use that image08:53
lifelesssomething08:54
lifelesswe can iterate to it08:54
lifelesskey thing is that the user setting doesn't disable the cache codepaths08:54
lifelessit just makes cleanup more aggressive08:54
lucasagomesack, sounds reasonable to me08:55
lucasagomesI'm going to add a comment to the patch08:55
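[Editor's note] A minimal sketch of the cleanup policy settled on above: a total size limit plus delete-after-$time, where a limit of 0 makes cleanup aggressive rather than disabling the cache code path. All names here are hypothetical, not the patch under review:

    import os
    import time

    def clean_up(cache_dir, size_limit_bytes, ttl_seconds):
        """Evict cached images, oldest first.

        A file with a link count > 1 is still hard-linked by a pending
        deploy and is kept, so even size_limit_bytes=0 never fully
        disables caching -- it only makes cleanup aggressive.
        """
        entries = []
        for name in os.listdir(cache_dir):
            path = os.path.join(cache_dir, name)
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, st.st_nlink, path))
        entries.sort()  # oldest first

        total = sum(size for _, size, _, _ in entries)
        now = time.time()
        for mtime, size, nlink, path in entries:
            if nlink > 1:  # still in use by a deploy
                continue
            if total > size_limit_bytes or (now - mtime) > ttl_seconds:
                os.unlink(path)
                total -= size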
openstackgerritLucas Alvares Gomes proposed a change to openstack/ironic: Add DiskPartitioner  https://review.openstack.org/8339609:08
openstackgerritLucas Alvares Gomes proposed a change to openstack/ironic: Use DiskPartitioner  https://review.openstack.org/8339909:08
openstackgerritLucas Alvares Gomes proposed a change to openstack/ironic: Get rid of the swap partition  https://review.openstack.org/8372609:08
openstackgerritLucas Alvares Gomes proposed a change to openstack/ironic: Use GB instead of MB for swap  https://review.openstack.org/8378809:09
*** overlayer has joined #openstack-ironic09:13
*** max_lobur has joined #openstack-ironic09:25
*** zdiN0bot has joined #openstack-ironic09:26
*** zdiN0bot has quit IRC09:30
*** mdickson has joined #openstack-ironic09:37
*** athomas has quit IRC09:37
*** athomas has joined #openstack-ironic09:38
*** mdickson2 has quit IRC09:41
*** dtantsur|lunch is now known as dtantsur09:53
*** mdickson2 has joined #openstack-ironic09:53
*** mdickson has quit IRC09:54
*** nosnos has quit IRC10:07
*** nosnos has joined #openstack-ironic10:12
*** eghobo has quit IRC10:21
*** zdiN0bot has joined #openstack-ironic10:26
*** zdiN0bot has quit IRC10:31
*** nosnos has quit IRC10:34
*** nosnos has joined #openstack-ironic10:35
openstackgerritVladimir Kozhukalov proposed a change to openstack/ironic-python-agent: Added disk partitioner  https://review.openstack.org/8616310:38
*** nosnos has quit IRC10:39
openstackgerritA change was merged to openstack/ironic: Some minor clean up of various doc pages  https://review.openstack.org/8776510:44
*** overlayer has quit IRC10:46
*** Alexei_987 has joined #openstack-ironic10:51
*** nosnos has joined #openstack-ironic11:06
*** ifarkas has quit IRC11:10
*** coolsvap is now known as coolsvap|afk11:15
*** lucasagomes is now known as lucas-hungry11:16
openstackgerritAndrey Kurilin proposed a change to openstack/python-ironicclient: Sync latest code and reuse exceptions from oslo  https://review.openstack.org/7150011:18
openstackgerritAndrey Kurilin proposed a change to openstack/python-ironicclient: Reuse module `cliutils` from common code  https://review.openstack.org/7241811:22
*** ifarkas has joined #openstack-ironic11:22
*** mdickson has joined #openstack-ironic11:22
*** mdickson2 has quit IRC11:22
*** zdiN0bot has joined #openstack-ironic11:27
*** zdiN0bot has quit IRC11:31
*** romcheg1 has joined #openstack-ironic11:36
openstackgerritVladimir Kozhukalov proposed a change to openstack/ironic-python-agent: Added disk partitioner  https://review.openstack.org/8616311:52
openstackgerritVladimir Kozhukalov proposed a change to openstack/ironic-python-agent: Added disk partitioner  https://review.openstack.org/8616311:55
*** sabah has quit IRC12:11
Shrewsmorning ironic. fyi, change your IRC passwords: http://blog.freenode.net/2014/04/heartbleed/12:11
Shrewsit seems Nickserv was temporarily compromised12:12
dtantsurouch...12:13
*** pradipta is now known as pradipta_away12:14
*** nosnos has quit IRC12:17
*** dtantsur is now known as dtantsur|bbl12:17
*** dtantsur|bbl has quit IRC12:19
*** lucas-hungry is now known as lucasagomes12:24
agordeevmorning Ironic12:27
*** zdiN0bot has joined #openstack-ironic12:28
romcheg1Morning agordeev and everyone else12:28
agordeevShrews: should i change my pass if i haven't used ssl connection for IRC?12:31
agordeevromcheg1: morning :)12:31
Shrewsagordeev:  because Nickserv was compromised, if you happened to authenticate with Nickserv during that period, your password would have been exposed... ssl or not12:32
*** zdiN0bot has quit IRC12:33
agordeevShrews: thanks, i got it. Glad to recall that i don't use IRC on weekends (mostly because of not having PC/'internet access' at home) :D12:37
*** jdob has joined #openstack-ironic12:38
*** overlayer has joined #openstack-ironic12:39
romcheg1lucasagomes: I'm looking at the IPMI console patch12:39
romcheg1It's still bound to using HTTP all the time12:39
romcheg1I don't remember, what was our decision last time we noticed that?12:40
lucasagomesromcheg1, hey afternoon12:44
lucasagomeshmmmm trying to remember12:44
lucasagomesI think that it was about first getting a console access implemented, even if it's only http for now12:45
lucasagomesthat's enough to have pair functionality with nova bm12:45
lucasagomesand after that we can start adding more stuff on it12:45
romcheg1Makes sense12:45
lucasagomeslinggao is also changing the api a bit to make it more flexible for getting the console information12:46
*** dtantsur has joined #openstack-ironic12:52
*** jbjohnso has joined #openstack-ironic12:56
NobodyCamGood morning IRonic12:56
openstackgerritA change was merged to openstack/ironic: Fix message preventing overwrite the instance_uuid  https://review.openstack.org/8773113:05
Mikhail_D_wk NobodyCam: morning :)13:06
NobodyCammorning Mikhail_D_wk :)13:07
openstackgerritVladimir Kozhukalov proposed a change to openstack/ironic-python-agent: Added disk utils  https://review.openstack.org/8616313:08
openstackgerritVladimir Kozhukalov proposed a change to openstack/ironic-python-agent: Added disk utils  https://review.openstack.org/8616313:11
*** jgrimm has joined #openstack-ironic13:12
lucasagomesmorning NobodyCam :)13:14
NobodyCambrb... morning walkies13:14
NobodyCammorning lucasagomes :)13:14
lucasagomesNobodyCam, enjoy the walkies :)13:14
*** linggao has joined #openstack-ironic13:16
linggaoping lucasagomes13:18
lucasagomeslinggao, pong13:18
lucasagomeslinggao, morning :)13:18
linggaogood morning lucasagomes13:18
linggaoI am reviewing your patch https://review.openstack.org/#/c/86588/13:19
linggaoI saw you have removed the VendorPassthru interface and replaced it with ManagementInterface13:20
lucasagomeslinggao, yeah, for ipmitool/native because the only method in the vendor_passthru was the set_boot_device13:20
lucasagomesfor the seamicro I kept, since they do have other stuff, attach_volume etc13:21
linggaoI see. that's okay.13:21
linggaoI saw you also have a patch https://review.openstack.org/#/c/85742/13:21
linggaoto set the boot device persistent13:21
lucasagomesahn /me have forgot that hah13:21
lucasagomesI probably could rebase it on top of the management interface one13:22
linggaois that patch really needed? should the persistent flag be moved to the new patch?13:22
linggaoyes, that's what I meant.13:22
lucasagomeshmm I could add it to the same one porting to the management interface13:23
lucasagomesbut I could argue that they are 2 diff things as well, 1 is a port, the other one is adding new functionality13:24
lucasagomeslike 2 diff changes13:24
lucasagomesI'm fine doing it in the same patch or a separate one as it is now13:24
linggaoIt's okay with me either way.13:25
linggaothanks.13:25
lucasagomeslinggao, ack, thank YOU for pointing that out to me... I had forgotten about the persistent one13:26
lucasagomesmarked as WIP for rebase or inclusion in the other patch13:26
linggaolucasagomes, np. thanks for all the help you've given me.13:26
lucasagomesnp at all :)13:26
NobodyCamgood morning linggao13:27
linggaoHey NobodyCam, good morning. Got coffee?13:28
NobodyCamsi13:28
NobodyCam:)13:28
*** matty_dubs|gone is now known as matty_dubs13:29
*** zdiN0bot has joined #openstack-ironic13:29
* NobodyCam has decent intertubes too :)13:30
*** zdiN0bot has quit IRC13:33
NobodyCamall: quick question on the info logging patch. There seem to be many questions about adding info logging to ssh.py. Several revisions ago it looked like this: https://review.openstack.org/#/c/85124/3/ironic/drivers/modules/ssh.py13:55
openstackgerritA change was merged to stackforge/pyghmi: Add optical and bios aliases for boot devices  https://review.openstack.org/8768213:57
NobodyCamlooking for any input on whether I should continue to add info logging to ssh or just remove it13:57
lucasagomesNobodyCam, :( I think we should have logs... on that link ^... "Attempting..." it sounds more like a debug indeed13:59
lucasagomesI would log the INFO after the state change happened13:59
lucasagomesINFO("Node <uuid> is powered on")13:59
lucasagomesif it fails to change the power state13:59
lucasagomesERROR("Failed to power on node <uuid>")13:59
lucasagomesI see info as hmmm a storyline of the events that happened to that node14:00
lucasagomesnode is now active, node is powered on, node is powered off etc...14:00
lucasagomesDEBUG for the attempts or logging commands "Attempting to power on node <uuid>: <cmd>"14:00
lucasagomesthings like that14:01
NobodyCamahh ok will rework14:01
lucasagomesINFO for success, ERROR for failures, WARNING for failures that will be automatic retried or fallback somehow, CRITICAL for errors that compromise the whole service and not an operation14:01
lucasagomesand DEBUG for the rest14:02
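[Editor's note] lucasagomes's convention, condensed into a hedged sketch of a hypothetical power-on helper (not the actual ssh.py patch under review):

    import logging
    import subprocess

    LOG = logging.getLogger(__name__)

    def power_on(node_uuid, cmd):
        # DEBUG: the attempt, including the command being run
        LOG.debug("Attempting to power on node %s: %s", node_uuid, cmd)
        try:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
        except subprocess.CalledProcessError:
            LOG.error("Failed to power on node %s", node_uuid)
            raise
        # INFO: the storyline of events that happened to the node
        LOG.info("Node %s is powered on", node_uuid)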
NobodyCambrb14:07
*** lazy_prince has quit IRC14:10
*** dwalleck has joined #openstack-ironic14:27
*** overlayer has quit IRC14:29
*** zdiN0bot has joined #openstack-ironic14:29
*** zdiN0bot has quit IRC14:34
*** martyntaylor has quit IRC14:34
*** dwalleck has quit IRC14:36
*** dwalleck has joined #openstack-ironic14:36
*** dwalleck_ has joined #openstack-ironic14:39
*** dwalleck has quit IRC14:42
*** martyntaylor has joined #openstack-ironic14:48
*** coolsvap|afk is now known as coolsvap14:51
*** hemna_ has quit IRC14:52
*** ilives has quit IRC14:54
*** ilives has joined #openstack-ironic14:58
Shrewsso, nova provides a default "rebuild" if the driver doesn't implement one. has anyone tried to see if this actually works with real bm?14:58
Shrewssince ironic doesn't provide one, i'm just curious if/how it works14:59
Shrewsdevananda or NobodyCam? ^^^15:00
*** dwalleck_ has quit IRC15:03
NobodyCamShrews: I have not tested, but would assume that it does not work15:05
Shrewsit works with the vm's from devstack... but that's not actual h/w  :)15:06
NobodyCamoh then it may work... I'm not sure15:11
NobodyCambbiafm15:12
openstackgerritChris Behrens proposed a change to openstack/ironic: Add create() and destroy() to Node  https://review.openstack.org/8482315:19
openstackgerritChris Behrens proposed a change to openstack/ironic: Clean up calls to get_node()  https://review.openstack.org/8457315:19
openstackgerritChris Behrens proposed a change to openstack/ironic: Make sync_power_states yield  https://review.openstack.org/8486215:19
openstackgerritChris Behrens proposed a change to openstack/ironic: Refactor sync_power_states tests to not use DB  https://review.openstack.org/8707615:19
openstackgerritChris Krelle proposed a change to openstack/ironic: Add Logging  https://review.openstack.org/8512415:22
NobodyCamlucasagomes: Mikhail_D_wk: let me know how that version looks15:23
*** ilives has quit IRC15:24
*** ilives has joined #openstack-ironic15:25
*** zdiN0bot has joined #openstack-ironic15:30
*** zdiN0bot has quit IRC15:34
openstackgerritChris Krelle proposed a change to openstack/ironic: Add Logging  https://review.openstack.org/8512415:42
*** vkozhukalov has joined #openstack-ironic15:46
*** martyntaylor has quit IRC15:57
jrollgood morning ironic15:59
NobodyCamgood morning jroll15:59
*** eghobo has joined #openstack-ironic16:03
*** matty_dubs is now known as matty_dubs|lunch16:05
NobodyCamdevananda: once your up and going looks like https://review.openstack.org/#/c/87396 needs a quick rebase16:09
*** max_lobur has quit IRC16:09
linggaolucasagomes, ping16:09
lucasagomeslinggao, pong16:11
lucasagomesmorning jroll16:11
linggaoThis is regarding the console API.16:11
linggaothe GET is v1/nodes/<uuid>/console to return the console information.16:11
openstackgerritRussell Haering proposed a change to openstack/ironic: Fix an issue that left nodes permanently locked  https://review.openstack.org/8801716:12
linggaoThe PUT v1/nodes/<uuid>/states/console to enable/disable the console.16:12
linggaoI agree with you that it is a little odd to have get/put in different places.16:13
linggaobut get gets more info than just the console enablement state.16:13
linggaomaybe change put to v1/nodes/<uuid>/states/console_enabled  ?16:14
linggaoanother thing is that v1/nodes/<uuid>/states returns all node states, including the console state.16:15
linggaoand  v1/nodes/<uuid>/states/power turns node power on/off.16:15
lucasagomeslinggao, yeah hmmm /me thinking16:17
*** martyntaylor has joined #openstack-ironic16:17
lucasagomesthe /states/... is meant to change all the states (power, provision, console)16:18
lucasagomesidk maybe we shouldn't have console as a state?16:18
lucasagomesPUT /nodes/<uuid>/console, to enable/disable it16:18
lucasagomesGET /nodes/<uuid>/console to get the info about the console (if enabled or disable, url to access it)16:19
linggaoand remove the console state from the ALL state.16:19
lucasagomesyeah16:19
linggaoI like it.16:20
linggaothis is more consistent.16:20
lucasagomesso console will be a sub-element of a node and have its own URI16:20
* NobodyCam brb making minor network change so he can access his test laptop16:20
lucasagomesyeah16:20
lucasagomesI think it's better than having it divided16:21
lucasagomesand leave states for power and provision states16:21
linggaoyes.16:21
lucasagomesit's already a bit confusing that we do have 4 states for a node16:21
lucasagomes[target_]power_state, [target_]provision_state16:21
lucasagomesdevananda, NobodyCam any objections ^16:21
lucasagomess/objections/objections?/g16:22
linggaoagree. now with my patch, the GET console returns {'console_enabled': True, 'console_info': {'type': 'shellinaboxd', 'url': 'http://<hostname>:<port>'}} for the node with console enabled. {'console_enabled': False, 'console_info': None} for the node with console disabled.16:23
lucasagomesyeah that's good, much easier to parse16:24
lucasagomescheck if enable, then get console_info16:24
*** rloo has joined #openstack-ironic16:25
linggaoyes16:25
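[Editor's note] The proposed endpoints, exercised with python-requests. The response shapes come from this discussion; the host, token, node uuid, and the PUT body are placeholders/guesses, not a confirmed API:

    import requests

    IRONIC = 'http://ironic-api:6385'  # placeholder endpoint
    NODE = '1be26c0b-03f2-4d2e-ae87-c02d7f33c123'  # placeholder uuid
    headers = {'X-Auth-Token': 'TOKEN'}  # placeholder token

    url = '%s/v1/nodes/%s/console' % (IRONIC, NODE)
    requests.put(url, json={'enabled': True}, headers=headers)  # guessed body
    info = requests.get(url, headers=headers).json()
    if info['console_enabled']:
        print(info['console_info']['url'])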
*** zdiN0bot has joined #openstack-ironic16:31
*** zdiN0bot has quit IRC16:35
*** ilives has quit IRC16:36
*** Mikhail_D_ltp has joined #openstack-ironic16:44
*** newell_ has joined #openstack-ironic16:48
*** mdickson has quit IRC16:48
kevinbentondevananda: ping16:48
*** mdickson has joined #openstack-ironic16:49
*** harlowja_away is now known as harlowja16:51
*** matty_dubs|lunch is now known as matty_dubs16:53
devanandamorning, all16:57
devanandaShrews: nova.virt.baremetal driver implements rebuild method, and afaik tripleo uses / tests it16:57
*** foexle has quit IRC16:58
*** rloo has quit IRC16:58
*** rloo has joined #openstack-ironic16:58
*** wendar has quit IRC16:59
devanandalucasagomes: "INFO for success, ..." yes. I'm going to add that to the bug report (or open one if we dont already have one)16:59
*** rloo has quit IRC17:00
*** vkozhukalov has quit IRC17:00
lucasagomesdevananda, :) ack17:00
*** rloo has joined #openstack-ironic17:00
NobodyCamGood morning devananda :)17:00
devanandaand good morning :)17:00
lucasagomesdevananda, morning17:00
*** wendar has joined #openstack-ironic17:01
Shrewsdevananda: but that driver isn't being used. there is a default implementation: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n238217:01
*** derekh has quit IRC17:02
*** rloo has quit IRC17:04
devanandaShrews: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n255117:04
devanandahttp://git.openstack.org/cgit/openstack/nova/tree/nova/virt/baremetal/driver.py#n30817:04
*** rloo has joined #openstack-ironic17:04
devanandaShrews: that is being called. I tried calling it for nova.virt.ironic and got http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/driver.py#n27117:05
Shrewsdevananda: i call it on my devstack test machine and I get a rebuild17:05
devanandawith ironic??17:05
Shrewsusing nova.virt.ironic17:06
Shrewsyup17:06
devanandahuh. i got NotImplemented when i tried it yesterday17:06
*** rloo has quit IRC17:06
Shrewsdevananda: you ran 'nova rebuild' directly i guess?17:06
devananda"nova rebuild --preserve-ephemeral" to be precise17:06
*** rloo has joined #openstack-ironic17:06
Shrewsah, with that option, you'll get an error17:06
*** max_lobur has joined #openstack-ironic17:07
Shrewsdevananda: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n239117:07
Shrewsdevananda: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n239117:07
Shrewsoops, sorry17:07
Shrewsirc lag17:07
devanandaah17:08
devanandaok, so default impl is going to actually destroy() + spawn()17:08
*** Alexei_987 has quit IRC17:08
devanandawhich for ironic means releasing the node, then trying to reclaim it -- nice race condition17:08
devanandasomeone else could claim it during that window17:08
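[Editor's note] A paraphrase (not nova's actual code) of the fallback rebuild path being discussed, marking where the bare metal race opens:

    def default_rebuild(driver, context, instance, image_meta):
        driver.destroy(instance)  # for ironic: the node is released here...
        # ...race window: another request may claim the freed node...
        driver.spawn(context, instance, image_meta)  # ...so reclaim can fail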
openstackgerritJosh Gachnang proposed a change to openstack/ironic: Adding swift temp url support  https://review.openstack.org/8139117:10
*** dwalleck_ has joined #openstack-ironic17:11
*** eghobo has quit IRC17:11
*** eghobo has joined #openstack-ironic17:11
*** rloo has quit IRC17:12
*** rloo has joined #openstack-ironic17:13
*** dwalleck has joined #openstack-ironic17:13
openstackgerritDevananda van der Veen proposed a change to openstack/ironic: nova.virt.ironic passes ephemeral_gb to ironic  https://review.openstack.org/8739617:14
*** dwalleck_ has quit IRC17:16
*** rloo has quit IRC17:18
*** rloo has joined #openstack-ironic17:18
linggaomorning devananda17:20
linggaodevananda, lucasagomes and I discussed the console API. we have a new proposal.17:21
linggaodevananda, could you scroll back and see if you have any objections to it?17:21
devanandalinggao: I skimmed the discussion - can you summarize?17:22
linggaoyes, devananda. GET v1/nodes/<uuid>/console will return the console information. {'console_enabled': True, 'console_info': {'type': 'shellinaboxd', 'url': 'http://<hostname>:<port>'}} for the node with console enabled. {'console_enabled': False, 'console_info': None} for the node with console disabled.17:24
linggaoPUT v1/nodes/<uuid>/console will enable/disable the console17:24
linggaoand remove console_enabled from GET v1/nodes/<uuid>/states because it already has a lot of states.17:25
linggaothis way makes console a sub-element itself under node.17:26
devanandalinggao: how will this expose an error?17:27
devanandalinggao: eg, if console can not be started, what will the API show?17:28
linggao??17:28
linggaothe last_error in the details will show the error.17:29
linggaodevananda, I see what you mean. the last_error is in the v1/nodes/<uuid>/states17:31
*** zdiN0bot has joined #openstack-ironic17:32
* linggao thinking...17:32
*** rloo has quit IRC17:33
*** rloo has joined #openstack-ironic17:33
devanandaright17:34
linggaodevananda, lucasagomes, now the console, power, and provision are sharing the last error. one will overwrite the other.17:36
devanandalinggao: yep, which is not very helpful17:36
*** zdiN0bot has quit IRC17:36
devanandaeg, conductor.utils.node_power_action will clear node.last_error17:37
linggaoright.17:37
devanandaso even sync_power_states periodic task may clear the node.last_error in some cases17:37
devanandaand through no action of their own, the user might "lose" the console error message17:37
devanandathe more independent states we add (power, provision, console...) the more each one really needs a separate target_* and error_* state17:38
lucasagomes+117:38
NobodyCamdevananda: linggao lucasagomes if we are making the console an entire subclass, could/should it not return its own (last) error? ie.. {'console_enabled': False, 'console_info': None, 'console_error': 'blah'}17:39
devanandaNobodyCam: exactly17:39
devanandaNobodyCam: except, right now, power and provision share a single error state17:40
NobodyCamya.17:40
linggaoNobodyCam, but we do not have a place in the db to store the error so far.17:40
devanandaif we all agree that's the right way to go (independent error states) then let's work on that17:40
openstackgerritJosh Gachnang proposed a change to openstack/ironic: Adding a reference driver for the agent  https://review.openstack.org/8479517:40
NobodyCamdevananda: ++17:41
NobodyCamya17:41
linggao+117:41
NobodyCamI think that is the way we should shoot for17:41
devanandalucasagomes: rloo: NobodyCam: side track. what do y'all think about having a required blueprint format? take a look at https://wiki.openstack.org/wiki/TroveBlueprint17:41
* NobodyCam clicks17:41
* lucasagomes lemme read the scrollback17:41
devanandalucasagomes: it's a side track, lol. not directly related to scrollback :)17:42
lifelessmorning17:42
devanandalifeless: g'morning!17:42
lucasagomesdevananda, yeah reading NobodyCam ping as well17:42
NobodyCammorning lifeless :)17:42
NobodyCamdevananda: I like that! (bp template)17:43
lucasagomesdevananda, https://github.com/openstack/nova-specs/blob/master/specs/template.rst17:43
lucasagomesso nova now has this nova-specs repo17:44
NobodyCammakes people think about the overall impact of a BP17:44
devanandalucasagomes: right. that's another option17:44
lucasagomesdevananda, yeah I see that neutron wants to adopt the nova way of doing it17:45
lucasagomesI didn't dig much into it to see how it's really done17:45
matty_dubsTripleO was just talking about the Neutron way yesterday as well17:45
lucasagomesdevananda, but I would +1 to have some standardization of bps17:45
matty_dubsI haven't looked too deeply and don't have a strong opinion17:45
lucasagomesidk if it's trove-way or nova-way17:45
lucasagomesmatty_dubs, nice, tripleo wants to have a -specs repo as well?17:46
linggaodevananda, but you do agree to move console GET/PUT from ../states/console to ../console, right?17:46
devanandalifeless: opinions? we're discussing how to standardize BPs -- nova-specs way, or just a launchpad template. I think it'd be helpful if both tripleo and ironic use same approach / similar format17:46
lucasagomesdevananda, http://lists.openstack.org/pipermail/openstack-dev/2014-March/029232.html17:46
lucasagomesthat's the nova discussion17:46
* lucasagomes didn't read yet, will do17:47
devanandalucasagomes: thanks17:47
devanandalinggao: i haven't thought through the ramifications of that yet17:47
matty_dubslucasagomes: That was my understanding, though I wasn't familiar with the Nova background at the time so I didn't fully grasp it then.17:47
lucasagomesmatty_dubs, ack... at a first glance, being able to +2 blueprints seems good17:47
devanandalinggao: my initial reaction is to keep it in ./states/console for now, because moving console to ../console -- without also moving the other states -- is inconsistent17:48
devanandalinggao: for an API change like that, I'd like more time and some documentation (like a blueprint for it)17:48
rloodevananda: yes to standardizing on something. If people think the nova-way is the way to go, that would be fine. I haven't been following it to have any opinion yet ;)17:49
devanandaheh17:49
devanandathis sentence: "The results of this have been that design review today is typically not17:49
devanandahappening on Blueprint approval, but is instead happening once the code17:50
devanandashows up in the code review."17:50
rloodevananda: is that before or after the new 'system'? Hopefully before!17:50
devanandarloo: that's a comment from the nova discussion as to why they made the switch17:51
lifelessdevananda: http://lists.openstack.org/pipermail/openstack-dev/2014-April/032768.html17:51
rloodevananda: ah. so yes, I also don't like the design being 'discussed' as one is code-reviewing. So +2 for anything that prevents that!17:51
devanandarloo: exactly17:52
NobodyCamlucasagomes: reading that link I also like the idea of iterating BPs in gerrit. but not sure about steps #1 (create bad blueprint) and #4 (once approved copy back the approved text into the blueprint)17:53
lifelessI'm not sure the lp tracking item should be created until the spec has two x +2's17:54
linggaodevananda, lucasagomes and NobodyCam, we can add a console_error in the nodes table. It is just odd to have to go to ...states/console to get console information. I can write a blueprint for it.17:54
devanandayea, i dont see the benefit to the LP artefact prior to the spec approval17:55
lucasagomesNobodyCam, yeah, I'll read it more soon... I'm testing something so didn't want to stop to read17:55
devanandaseems like just more noise17:55
JayFThe thing I don't always get about the way blueprints are done in general in Openstack is it assumes you know the right way to do something from the beginning, when some of the best designs I've seen were ones that were built one small piece at a time, where you know the first step, and you know, in a general sense, where you're going.17:56
devanandaJayF: what I immediately like about https://github.com/openstack/nova-specs/blob/master/specs/template.rst is that it requires the proposer to think about the impact of their changes17:57
lifelessJayF: that aspect is terrible ;)17:58
lifelessJayF: and why I generally don't go for blueprints at all - but its a scaling problem for new team members to get good feedback on ideas17:58
lifelessJayF: and *that* aspect is valuable17:58
JayFI mean, it's the one aspect I care a lot about? How can you decide how software should run until you /make/ it run.17:58
devanandasure17:58
*** eguz has joined #openstack-ironic18:00
*** eghobo has quit IRC18:04
*** epim has joined #openstack-ironic18:05
jrollthis looks like failure to RPC, yes? https://gist.github.com/jimrollenhagen/cc1970e2b28e875a735c18:09
jrollmy conductor node seems to die after this happens18:10
jroll(I'm powering off a bunch of nodes in a loop)18:10
*** todd_dsm has joined #openstack-ironic18:14
lucasagomesjroll, maybe you hit this; https://review.openstack.org/#/c/88017/18:15
jrolllucasagomes: yes, but I think there's a bigger issue in the first traceback18:17
jrollit makes it through a couple of other stuck nodes just fine18:17
jrollunless... I think maybe I see what you mean18:17
jrollwe end up with no free workers?18:17
lucasagomesjroll, no, I just saw that patch and thought it was related... lemme take a look at the traceback u have18:19
lucasagomeshmmm no idea... scary that the conductor just died after it18:21
jrollright18:21
lucasagomesbrb18:21
jrollI'll dig in18:21
devanandajroll: that doesn't look like it should cause a failure18:21
jrollI wonder if our rabbit is somehow getting overloaded18:21
lucasagomesack, yeah I can't see why the conductor would die because of that error18:22
*** lucasagomes is now known as lucas-dinner18:22
jrolldevananda: I agree, but right after that I get spammed with "No valid host was found. Reason: No conductor service registered which supports driver agent_ipmitool."18:22
devanandainteresting18:22
jrollno errors in conductor log, and conductor process is still running18:22
devanandahttp://git.openstack.org/cgit/openstack/ironic/tree/ironic/conductor/manager.py#n45718:23
openstackgerritChris Krelle proposed a change to openstack/ironic: Fix for tripleO undercloud gate tests DO NOT MERGE  https://review.openstack.org/8552918:23
jrollrestarting the conductor seems to make it happy18:23
devanandajroll: add logging there and see if it's getting run18:23
devanandajroll: it may be that something has starved the main thread and the keepalive didn't get sent18:24
devanandajroll: also can you check the timestamp on the conductors table -- that could be another way to confirm this hypothesis18:24
jrollyeah, that was my next thought18:24
devanandajroll: if so, https://review.openstack.org/#/c/84862/10 is related, but not a complete fix18:25
devanandajroll: since it sounds like this starvation is not coming from a periodic_task -- it's the main thread starving out a periodic_task18:25
devanandas/main thread/RPC receivers/18:25
* devananda needs more coffee18:26
jrollright18:26
*** coolsvap is now known as coolsvap|afk18:26
jrollheh18:27
jrollyou are right, sir18:27
devanandaheh. i'm not sure if that's a good thing18:28
jrolldevananda: we have touch_conductor running at 10s intervals, because we suspected it18:28
jrollit goes to 3-5 *minutes* while we're pounding the conductor18:28
devanandawow18:28
jrollindeed.18:28
jrollI'll file a bug right now18:28
devanandaok, so we can easily DOS the conductor18:28
devanandathanks18:28
jrolland take a look after lunch18:28
jrollheh, np18:28
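[Editor's note] One way to test the starved-heartbeat hypothesis from outside the service, assuming the conductors table keeps its heartbeat in an updated_at column (the DSN is a placeholder):

    import datetime

    import sqlalchemy

    engine = sqlalchemy.create_engine('mysql://ironic:secret@db/ironic')
    with engine.connect() as conn:
        rows = conn.execute(
            sqlalchemy.text('SELECT hostname, updated_at FROM conductors'))
        for hostname, updated_at in rows:
            age = datetime.datetime.utcnow() - updated_at
            print('%s heartbeat is %ds old' % (hostname, age.total_seconds()))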
*** vkozhukalov has joined #openstack-ironic18:30
*** zdiN0bot has joined #openstack-ironic18:32
openstackgerritChris Krelle proposed a change to openstack/ironic: Fix for tripleO undercloud gate tests DO NOT MERGE  https://review.openstack.org/8552918:33
*** todd_dsm has quit IRC18:33
jrolldevananda: fyi https://bugs.launchpad.net/ironic/+bug/130868018:33
* jroll -> lunch18:34
*** athomas has quit IRC18:34
*** datajerk has quit IRC18:35
NobodyCamDOS our conductor.. ieek18:35
devanandajroll: just curious - how many nodes are enrolled? how many conductor services are running? and is it hw or virt?18:36
*** zdiN0bot has quit IRC18:36
*** epim_ has joined #openstack-ironic18:39
*** epim has quit IRC18:42
*** epim_ is now known as epim18:42
*** datajerk has joined #openstack-ironic18:42
comstuddevananda: would you mind reviewing the dependent patch for that power state sync yield?18:48
comstudhttps://review.openstack.org/#/c/87076/18:48
comstudI'd like to get that to land.. it has potential for conflicting a lot due to moving a lot of tests in conductor test_manager18:49
devanandacomstud: started looking at it already. decided i needed food first :)18:49
comstudhaha np18:49
devanandayea18:49
NobodyCam:)18:49
*** russellb has quit IRC18:53
*** russellb has joined #openstack-ironic18:55
linggaodevananda, lucasagomes and NobodyCam,  here is the blueprint https://blueprints.launchpad.net/ironic/+spec/update-console-api.18:56
linggaoI hope you guys can at least agree on the #1 item on the blueprint, I already have a patch on it for review.18:56
jrolldevananda: 72 nodes on bare metal; one conductor and one api, running on separate VMs. conductor craps out after 30-40 "power off" commands.19:05
devanandajroll: awesome19:05
jrollusing ipmitool for power, if it matters19:05
devanandajroll: i'd love to see the timing data from that19:05
devanandayea, it does19:05
jrolltiming data on the power commands?19:06
rloocan the Rally stuff be hooked in jroll?19:06
devanandajroll: avg/min/max time for each power operation19:06
devanandahow long each worker thread is occupied19:06
jrollrloo: I know nothing about rally. maybe?19:07
jrolldevananda: sure, I'll grab logs and see what I can dig up19:07
rloojroll, romcheg has been looking into it. maybe he'd have an idea as to how easy, blah blah.19:07
jrollyeah, I'd like to hear about it19:07
romcheg1jroll, rloo: Morning, yeah, working on it19:08
romcheg1There are a few problems I experienced. Hopefully everything will get merged tonight :)19:08
jrollcool :)19:08
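[Editor's note] A rough harness for the avg/min/max timing devananda asked for above, wrapping raw ipmitool calls; the BMC addresses and credentials are placeholders:

    import subprocess
    import time

    bmcs = ['10.0.0.%d' % i for i in range(1, 73)]  # hypothetical 72 BMCs
    samples = []
    for host in bmcs:
        start = time.time()
        subprocess.call(['ipmitool', '-I', 'lanplus', '-H', host,
                         '-U', 'admin', '-P', 'password', 'power', 'status'])
        samples.append(time.time() - start)

    print('min %.2fs avg %.2fs max %.2fs' % (
        min(samples), sum(samples) / len(samples), max(samples)))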
openstackgerritJosh Gachnang proposed a change to openstack/ironic: Adding a reference driver for the agent  https://review.openstack.org/8479519:11
devanandajroll: as for why 30 - 40 is the magic number, 30 is the default rpc worker pool19:13
jrollaha19:13
devanandajroll: you're filling up the threadpool faster than requests complete. IPMI is slow19:13
JayFdevananda: could we bump that worker pool to workaround our issue for now?19:13
devanandayes19:13
JayFIs that a reasonable default?19:13
devanandai dont know :)19:14
jrollJayF: I told you we needed more conductors :P19:14
openstackgerritDevananda van der Veen proposed a change to openstack/ironic: Better handling of missing drivers  https://review.openstack.org/8357219:14
JayFI told you more conductors would hide the bug, that seems to remain valid ;)19:14
devanandaheh19:15
devanandaso19:15
devanandapower cycling 100 machines from a single conductor should be achievable -- it's mostly waiting on IPMI19:15
devanandabut19:15
*** jdob_ has joined #openstack-ironic19:15
devanandadeploying to 100 machines in parallel? that is definitely not achievable today from a single conductor19:16
devanandaIO will bottleneck19:16
devanandathere's some amount of parallelization that's reasonable today, depending on hardware19:16
devanandawell19:16
devanandathat is, with the PXE driver.19:17
jrollI'd say it's achievable in the agent model, at least to the point where you saturate glance :)19:17
devanandawith the Agent driver, ... right19:17
jrollyeah19:17
JayFI want to find those limits (in the agent_ipmitool driver), stretch it to its limit, and file bugs and fix the bottlenecks19:17
devanandayou probably wont saturate tftp, even with 100 booting in parallel19:17
devanandaJayF: ++19:17
JayFbecause we'll want to boot a lot more than a hundred boxes on a single conductor19:17
devanandaindeed19:17
*** rloo has quit IRC19:18
devanandai bet i can duplicate that bug locally19:18
*** rloo has joined #openstack-ironic19:18
devanandajroll: any data you have on how long it is taking to power-off a single node via ipmitool would be helpful19:20
JayFdevananda: some # of these nodes, ipmi is timing out19:20
JayFso that's almost certainly adding to the contention19:20
devanandagreat19:20
devanandathat's unfortunately normal19:20
devanandaso we need to handle that w/o the conductor falling over19:20
JayFI agree entirely19:21
*** rloo has quit IRC19:22
*** rloo has joined #openstack-ironic19:22
JayFwhen log_file is set, what things would still be writing to stdout?19:25
JayFwe're seeing different output from stdout on an api server than what is going to a log, and I need the info from stdout to hit a log somewhere19:25
*** zdiN0bot has joined #openstack-ironic19:27
devanandawhat's a trivial way to block a greenthread?19:33
devanandaJayF: there are a set of logging options that should control what the api logs where19:33
devanandaJayF: part of oslo.logging19:33
JayFit appears to be wsgiref19:33
JayFthat's doing the logging I'm referring to19:33
devanandaJayF: ah. fwiw, you can also use apache mod_wsgi. probably will perform better19:34
NobodyCambrb19:34
*** dwalleck has quit IRC19:39
*** jistr has quit IRC19:40
*** dkehn_ has joined #openstack-ironic19:43
*** dkehn_ has quit IRC19:46
*** dkehnx has quit IRC19:46
*** dkehn_ has joined #openstack-ironic19:46
*** zdiN0bot1 has joined #openstack-ironic19:46
*** zdiN0bot has quit IRC19:46
comstuddevananda, jroll: so, eventlet will essentially never switch greenthreads if there's always stuff to process from the queue19:47
comstudi mean19:47
comstudif the read() from the rabbit socket never returns EAGAIN..19:47
comstudand you don't have any other I/O to cause greenthread switches19:47
*** zdiN0bot1 has quit IRC19:48
comstud(DB calls do not right now -- we put some explicit yield in all DB calls in nova, but it's ugly)19:48
*** zdiN0bot has joined #openstack-ironic19:48
comstud(and causes some other side effects)19:48
jrollhrm.19:48
devanandajroll: great. i can repro locally with a fake driver19:49
comstudeventlet will only switch if it gets a EAGAIN on socket I/O19:49
jrolldevananda: cool19:50
jrollcomstud: so, is there anything we can do at the ironic level?19:50
comstudi have something for you to try19:51
comstudsec19:51
comstudthis is ugly19:54
comstudbut try it19:54
comstudhttp://paste.openstack.org/show/75966/19:54
comstudoops19:54
comstudexcept use the correct decorator name19:54
* jroll tries19:55
devanandacomstud: I think the solution is just to put an explicit yield in the ipmitool driver19:55
comstudhow is ipmitool implemented?19:56
comstudis it something not wrapped by eventlet?19:56
comstud(C code underneath)19:56
devanandahttp://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipmitool.py#n13519:56
comstudok, that should be fine19:57
comstudthat should cause a yield19:57
devanandak19:57
comstudassuming you're monkey patching os19:57
*** zdiN0bot has quit IRC19:57
comstudor subprocess i guess19:57
devanandawhich we're not19:57
devanandahttp://git.openstack.org/cgit/openstack/ironic/tree/ironic/__init__.py#n2219:57
jrolloh. welp.19:58
comstudhm19:58
comstudlet me check if that wraps subprocess or not19:58
devanandai dont recall if we had a reason for that19:58
jrolldevananda: is there an explicit reason for os=False?19:58
jrolloh19:58
comstudi think i've seen os=False in nova for some reason too19:58
comstudmaybe not19:58
*** jdob_ has quit IRC19:59
jrollyeah, comstud, yielding for the db didn't do anything20:01
comstudok20:01
*** romcheg1 has quit IRC20:02
comstudinteresting20:02
devanandacomstud: yea, this is from the initial refactoring -- http://git.openstack.org/cgit/openstack/ironic/diff/ironic/cmd/__init__.py?id=0480834614476997e297187ec43d7ca500c8dcdb20:03
jrollcomstud: I don't think this is blocking on the db20:03
jrollI think it's blocking on ipmi like deva said20:03
comstudI don't see where eventlet patches subprocess at all20:04
comstudunless you explicitly import eventlet.green.subprocess20:04
devanandajroll: i think it's not actually blocking20:04
devanandaor it is?20:04
jrollI don't really know20:04
comstudKeyError: 'eventlet.green.subprocess'20:04
devanandai mean, there's loopingcall20:04
devanandawe're explicitly tying up a greenthread while waiting for power on/off to succeed20:04
comstudthat module is not loaded at all20:04
comstudeven after calling monkey_patch20:05
devanandaso even without the greenthread blocking on db/io/foo20:05
devanandawe can still run out of threads from the pool, no?20:05
jrollsure20:05
jrollbut I tested this with pool size of 256 and got the same thing20:05
jrollso I don't think it's that it's running out of worker threads20:06
devanandajroll: rpc_conn_pool_size? rpc_thread_pool_size?20:06
jrollrpc_thread_pool_size=256; rpc_conn_pool_size=25620:06
jrollboth20:06
devanandahrm20:06
devanandaand you clearly have <256 nodes20:06
comstudjroll: try adding: from eventlet.green import subprocess20:06
comstudinstead of import subprocess20:06
comstudin utils.py or wherever it is20:06
jrollcomstud: yeah, I was just going to do that20:06
jrolldirectly in the ipmitool code, yes?20:07
comstudin utils.py20:07
comstudwherever execute is20:07
jrolloh right20:07
comstudthat calls subprocess.. i'm assuming it does20:07
comstudSo20:07
jrollno, processutils20:07
* jroll dives into oslo20:07
comstudre: rpc pool size.. it's not really useful beyond a few threads, really.20:08
jrollwhich uses eventlet.green.subprocess20:08
comstudah so it does20:08
comstudlol ok20:08
comstudwell.. interesting.20:08
jrollindeed.20:09
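[Editor's note] A standalone toy (not Ironic code) showing the behavior being chased here: waiting on a child through eventlet.green.subprocess yields to the hub, so other greenthreads keep running during the wait:

    import time

    import eventlet
    from eventlet.green import subprocess

    def ticker():
        for i in range(4):
            print('tick %d' % i)  # only prints while the hub is unblocked
            eventlet.sleep(0.5)

    eventlet.spawn(ticker)
    start = time.time()
    subprocess.Popen(['sleep', '2']).wait()  # green wait() cooperates
    print('child finished after %.1fs' % (time.time() - start))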
comstudtry adding explicit eventlet.sleep() somewhere in your call path that's being slammed20:09
comstudbut it almost sounds like something is blocking the whole process somewhere in the path20:10
*** zdiN0bot has joined #openstack-ironic20:10
comstud(besides DB calls)20:10
comstudmaybe you need the multiple-process-workers support for conductor.20:10
comstudthat would be useful no matter what, I think20:10
devanandaquite possibly, ya20:11
comstudor many more conductors.20:11
devanandacomstud: how complete is support for that in the various oslo utils? i haven't looked in at least a year20:11
comstudunless it's a single node slamming20:11
jrollcomstud: you're talking about adding the sleep() call in the code path of the worker, yes? or the code that spawns the worker?20:11
comstuddevananda: I don't know, I haven't looked at oslo for it20:11
comstudwe certainly use it in nova.. but i dunno if it's coming from oslo or not!20:12
comstudjroll: the side that's starved.. i assume conductor20:12
jrollcomstud: and that can be a sleep(0) yeah?20:12
comstudyeah20:12
jrollwell, it's in the conductor for sure20:12
jrollit's an RPC call to change_node_power_state20:12
jrollwhich does self._spawn_worker(utils.node_power_action, ...)20:13
comstudhm20:13
comstudsec, looking20:13
jrollironic/conductor/manager.py20:13
comstudright20:13
comstudhm20:14
jrollalthough20:14
comstudi don't know there's much point to the extra worker pool20:14
jrolloh20:14
comstudover having the rpc pool20:14
comstudbut20:14
comstudstick it in the worker20:14
*** zdiN0bot has quit IRC20:14
jrollah ha20:14
jrollwell20:14
jrollit does that power.validate() call first20:14
jrollwhich is not in a worker20:15
jrolland hits ipmi20:15
jrollso it might be choking there20:15
JayFand if it, say, hit a half dozen bad IPMI addresses20:15
comstudit should still switch greenthreads20:15
jrollJayF: nah, that part is synchronous20:15
comstudoh, i wonder.20:15
openstackgerritDevananda van der Veen proposed a change to openstack/ironic: DO NOT MERGE - demonstration of reproducing 1308680  https://review.openstack.org/8807620:16
*** jistr has joined #openstack-ironic20:17
NobodyCamrloo: you happen to be about?20:17
rlooNobodyCam: I happen to be about.20:17
NobodyCam:)20:17
NobodyCamhappen to have a second to rebase 8510720:17
rloosure.20:17
devanandajroll, comstud: simple steps to reproduce ^20:18
jrolldevananda: yeah, I see that20:18
openstackgerritlinggao proposed a change to openstack/ironic: Modify the get console API  https://review.openstack.org/8776020:18
*** max_lobur1 has joined #openstack-ironic20:18
*** max_lobur has quit IRC20:18
comstuddevananda: well yeah, all DB calls are going to block.. and there's nothing we can do about that20:19
comstudif you sleep in all of your available threads, then boom.20:19
devanandacomstud: right -- that's fine. the point of that patch is to demonstrate the failure. not solve the db blocking issue20:19
comstudI should say... 'nothing we can do about that right now'20:19
comstudgotcha20:19
comstudI'm just wondering if that's the same problem as jroll is seeing.20:19
comstudor not.20:20
devanandacomstud: same result. different cause.20:20
jrollwell20:20
devanandaprobably different cause20:20
jrollI have 72 nodes, and 256 workers.20:20
jrollso it's not all of the workers being used.20:20
devanandajroll: what's the uptime on that machine?20:20
jrollor being blocked20:20
comstudwell20:20
devanandajroll: or the VM where ir-cond is running20:20
comstudworkers in this case would be...20:20
comstudthe process...20:20
comstudi mis-spoke20:20
devanandajroll: like -- what's the CPU utilization, context swaps, etc20:20
jrolldevananda: 34 days20:20
comstudjust 1 DB call doing a SLEEP 1020:20
jrollbut CPU is never pegged20:21
comstudis going to hang the process for 10 seconds20:21
devanandajroll: vmstat 1 1020:21
comstudjroll: I have another thing for you to try20:21
comstudeven thought it will probably cause other issues at some point20:22
comstudbut just to verify something..20:22
comstudsec20:22
jrolldevananda: https://gist.github.com/jimrollenhagen/001cefc6422e5f8f8f3020:22
devanandacomstud: ahh. yea. damn.20:22
devanandacomstud: my patch is actually demonstrating something entirely different20:23
devanandaif a single DB query takes longer than heartbeat_timeout time, the conductor appears to be dead20:23
devanandasince the db query is blocking the whole process, not just the thread20:23
comstuddevananda: nod20:23
comstudjroll: http://paste.openstack.org/show/75972/20:24
comstudtry that just for shits20:24
devanandagreenthreads hurt my brain sometimes. I want real threads ...20:24
comstudeventlet is not completely Thread safe here, but..20:24
comstuddevananda: nod!20:24
comstudi've let my eventlet replacement sit over a month now20:24
comstudI need to get back to it.20:24
jrollcomstud: running...20:25
comstudgood luck20:26
jrollheh20:26
comstudif it even works20:26
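[Editor's note] The pastes above are no longer available; given the thread-safety caveat, a plausible shape for comstud's experiment is eventlet.tpool, which runs a blocking call in a native OS thread so the hub keeps servicing greenthreads:

    import time

    from eventlet import tpool

    def blocking_query():
        time.sleep(10)  # stands in for a slow, hub-blocking DB call
        return 'rows'

    # Executes in tpool's native thread pool; greenthreads keep running.
    result = tpool.execute(blocking_query)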
NobodyCamhumm.... http://paste.openstack.org/show/tBVlJZHhhxjUK7deqVgQ/20:26
devanandajroll: i'd also like to see the output of 'vmstat 1 10' while this is mid-run20:27
comstudyeah, that'd be interesting also20:27
openstackgerritRuby Loo proposed a change to openstack/python-ironicclient: Adds documentation for ironicclient API  https://review.openstack.org/8510720:27
*** zdiN0bot has joined #openstack-ironic20:28
jrolldevananda: so when this occurs, things go smoothly for a bit, then the api hangs, then everything bursts through after api decides conductor doesn't exist20:29
jrollbut I did get one in20:29
jrollduring the hang20:29
*** zdiN0bot has quit IRC20:29
jrolldevananda: https://gist.github.com/jimrollenhagen/82f7b6bf942f8215b35c20:29
*** zdiN0bot has joined #openstack-ironic20:29
comstudthe api hangs too?20:29
*** pradipta_away has quit IRC20:29
jrollyeah20:30
comstudfor how long?20:30
jrollas I understand it, the RPC call is synchronous20:30
comstudputting into the queue and waiting for response will be asynch20:31
jrollcomstud: 34 seconds this round20:31
jrollor at least, this call to the API hangs20:31
jrollidk about other calls20:31
comstudk20:31
jrollbut yeah, that tpool fix didn't help20:32
devanandayea, i'm seeing the API hang too20:32
comstudgoing to hop on the api node and look while you slam20:32
comstuddid you put it on the conductor side only?20:32
jrollyes20:32
devanandajroll: that vmstat run is from during the its-all-hung period?20:32
jrolldevananda: yes20:32
comstudjroll: try also putting it on the api side20:33
jrolldevananda: on the conductor side20:33
devanandahuh20:33
jrollcomstud: ok20:33
comstuddevananda: yeah, something's fishy20:33
comstudi have 1 idea20:33
* comstud checks something20:33
*** zdiN0bot has quit IRC20:34
*** vkozhukalov has quit IRC20:34
comstudcould DB be waiting on a lock for something for a long period of time?20:34
jrollI don't think so, I think it just kicks out here if the node is already locked20:34
comstudand maybe a lock wait timeout we're hiding/not seeing?20:34
jrollbut I see where you're going with this20:34
*** pradipta_away has joined #openstack-ironic20:34
comstudwell, i'm talking about an actual DB lock20:34
comstudvs the reservation column20:34
jrollohhhh20:34
jrollI doubt it?20:35
comstudjust an idea20:35
comstudchecking db for slow query log20:35
jrollok20:35
jrollI just restarted services, going to start hitting it20:35
jrollif you want to watch20:35
devanandaok, something's fishy with just changing the power state20:36
devanandamy patch to add db(sleep) into the fake driver is causing the *client* to wait20:36
devanandathose are supposed to be async20:36
comstuddon't you wait to acquire reservation before returning rpc response?20:37
devanandacomstud: yes20:37
comstudwouldn't that cause the client to wait?20:37
devanandaif that blocked20:38
comstudif you're sleeping20:38
devanandabut i'm not runing anything else on this node20:38
comstud(i dunno where you inserted the sleep now)20:38
devanandai added the sleep into drivers.modules.fake.FakePowerDriver.set_power_state20:38
comstudah20:38
devanandahttps://review.openstack.org/#/c/88076/1/ironic/drivers/modules/fake.py20:38
comstudok, that seems bad20:38
devanandahacky way to simulate ipmitool blocking20:39
comstudnod20:39
devanandawhile using the fake driver20:39
devanandayea. wtf20:39
*** Mikhail_D_ltp has left #openstack-ironic20:40
devanandahttp://git.openstack.org/cgit/openstack/ironic/tree/ironic/conductor/manager.py#n24520:40
devanandayea, wtf20:41
devanandathat method exits but the RPC call isn't returning20:41
russell_hdevananda: I'm seeing ours returning _much_ later20:43
jrollhuh.20:43
russell_hthe RPC call times out in the API20:44
russell_hthen a long time later the API gets a response that it doesn't know what to do with20:44
russell_hjroll: one of those should hit the API log soon20:44
devanandarussell_h: right. so the problem *there* is that the RPC call isn't returning right away20:44
devanandawhich it should be20:44
comstudIs the API blocking on something ?20:44
jrollrussell_h: one hit about 4 minutes ago20:45
comstudand not getting the reply?20:45
devanandamanager.change_node_power_state should be spawning another thread, then returning right away20:45
russell_hjroll: the timeout hit20:45
devanandait's not20:45
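The behavior devananda expects, sketched out (acquire_reservation and node_power_action stand in for the real manager/utils code and are hypothetical names):

    import eventlet

    worker_pool = eventlet.GreenPool(size=64)

    def change_node_power_state(context, node_id, new_state):
        # Take the reservation synchronously (may raise NodeLocked), hand
        # the slow power action to a background greenthread, and return --
        # the RPC reply should go out here, long before the worker is done.
        task = acquire_reservation(context, node_id)  # hypothetical helper
        worker_pool.spawn(node_power_action, task, new_state)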
russell_hjroll: but the reply from the conductor should hit soon I think20:45
russell_hunless you restarted it20:45
jrollrussell_h: I mean the "no calling thread" thing20:45
devanandaoi, i need to jump on a call20:45
russell_hoh20:45
russell_htotally right, I missed it20:45
jroll2014-04-16 20:40:32.232 24876 WARNING ironic.openstack.common.rpc.amqp [-] No calling threads waiting for msg_id : d1a3ba68fecd4587b4af7bbc9a2cfef320:45
jrollya20:45
jrolldevananda: have fun20:46
* jroll needs to take a walk20:46
devanandathis http://git.openstack.org/cgit/openstack/ironic/tree/ironic/conductor/manager.py#n72120:46
devanandais allowing control flow to continue20:47
devanandabut preventing the RPC response from being sent back to the API20:47
devanandauntil the worker_pool.spawn()'d thread finishes20:47
devanandathat's causing all kinds of problems20:47
jrollwat20:48
devanandalike a simple "ironic node-set-power-state" is hanging for 10 seconds20:48
jrollspawn() doesn't return immediately?20:48
devanandano -- it does!20:48
devanandaexecution continues20:48
jrollwhat's preventing RPC response then?20:48
devanandathe RPC response that should happen when that thread exits IS NOT20:48
devanandaI dunno20:48
jrollheh20:48
jrollok20:48
jrollinteresting20:48
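A self-contained way to reproduce what devananda is seeing, assuming the spawned work blocks without yielding: spawn() itself returns instantly, but the first cooperative yield hands the whole process to the worker until its blocking call finishes -- which is exactly when the RPC reply would be waiting to be written:

    import time
    import eventlet

    pool = eventlet.GreenPool()

    def worker():
        # time is NOT monkey-patched in this toy, so this blocks the whole
        # OS thread instead of yielding to the eventlet hub.
        time.sleep(3)

    start = time.time()
    pool.spawn(worker)
    print("spawn returned after %.3fs" % (time.time() - start))  # ~0s
    eventlet.sleep(0)  # first yield: the hub runs worker() to completion
    print("got control back after %.3fs" % (time.time() - start))  # ~3s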
*** eguz has quit IRC20:49
devanandajroll: http://paste.openstack.org/show/75977/20:50
*** eghobo has joined #openstack-ironic20:50
jrollwtf indeed20:51
* jroll bbiab20:51
*** romcheg1 has joined #openstack-ironic20:52
*** jdob has quit IRC20:55
russell_hdevananda: here's what I'm thinking20:58
openstackgerritlinggao proposed a change to openstack/python-ironicclient: node-get-console command use the new API  https://review.openstack.org/8776920:59
russell_hdevananda: something probably yields to the event loop before the RPC response is fully written20:59
* devananda will bbiah20:59
russell_hdevananda: at which point your long-running call blocks the event loop20:59
* russell_h source dives into oslo21:00
russell_hjroll: I knew we should have used twisted ;)21:00
openstackgerritlinggao proposed a change to openstack/python-ironicclient: node-get-console incorporate the changes in API  https://review.openstack.org/8776921:01
*** romcheg1 has quit IRC21:03
jrollrussell_h: I have a commit that says otherwise21:10
dividehexis it possible to separate pools of baremetal nodes? for example if I wanted to allocate a baremetal instance from a pool of identical machines21:13
*** linggao has quit IRC21:14
jrollrussell_h: so if you're right, eventlet.sleep(0) in utils.node_power_action should solve this, but... doesn't21:16
russell_hjroll: not necessarily21:16
*** mrda_away is now known as mrda21:16
jrollwell, sure21:16
*** max_lobur1 has quit IRC21:17
jrollsigh21:17
russell_hjroll: try eventlet.sleep(1)21:17
jrollheh, that was my next guess21:17
*** jbjohnso has quit IRC21:20
mrdarussell_h: that would be sad if we needed eventlet.sleep(1) there in power action. It does take a float, so perhaps eventlet.sleep(0.1) might be sufficient?21:21
russell_hmrda: if we have to sleep at all I'm going to rage21:22
mrdaright21:22
russell_hmrda: I'm just thinking of how we could diagnose this issue21:22
russell_hthe goal would be to give rpc time to sort its shit out21:22
russell_hbefore embarking on any blocking call21:22
russell_hif that fixes it, I'd be more confident in my theory on this21:23
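russell_h's proposed diagnostic, roughly: yield to the hub before starting the blocking work, so a half-written RPC reply gets a chance to flush first. Placement and helper name below are hypothetical:

    import eventlet

    def node_power_action(task, new_state):
        # One turn of the event loop before blocking; if the theory holds,
        # the pending RPC reply is written during this yield.
        eventlet.sleep(1)
        do_blocking_power_work(task, new_state)  # hypothetical helper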
jrollrussell_h: fwiw sleep(1) didn't help anything21:23
russell_hmeh21:23
russell_hthats in seconds right?21:23
jrollI think the heartbeat is just not running21:23
jrollhell if I know :)21:23
russell_hjroll: I'm about to try something21:23
jrollwait21:23
russell_hk21:23
jrollI'm doing things21:23
russell_hwhen you're ready, restart that conductor21:23
russell_hand brace yourself21:23
jrolloh god21:23
* jroll looks at git diff21:24
comstudsorry, i got pulled into something in -nova21:24
russell_hI don't even know if I did that right21:24
jrolloh, this is gonna be fun21:25
comstudso21:25
comstudi missed a lot of stuff here21:25
comstudbut21:25
comstudis that node_power_action call in greenthread worker in conductor?21:26
jrollok so increasing heartbeat timeout at least let me get my nodes powered off21:26
russell_hcomstud: yeah21:26
comstudi don't think the worker should become active until after the RPC call returns21:26
comstudeventlet doesn't tend to schedule greenthreads immediately21:26
comstudbut i'd have to check how it works with pools21:26
comstudbut, for example, spawn() doesn't fire the greenthread immediately21:27
comstudor switch to it, i mean21:27
russell_hcomstud: what I'm wondering though, is if you managed to yield to the event loop before your response is fully written21:27
comstudyeah21:27
russell_hlike, an easy example would be if your TCP buffer filled up I'm guessing21:27
comstudnod21:27
russell_hbut it could be something more likely in the AMQP library or something21:27
russell_ha read-before-write or something21:27
comstudi don't think anything else should cause a switch21:27
comstudbesides a write() and getting EAGAIN21:27
comstudso, socket buffer full.21:28
comstudsetsockopt sndbuf to 128K21:28
comstud:)21:28
comstudor sysctl it21:28
comstudhehe21:28
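comstud's suggestion, concretely: enlarge the send buffer so write() is less likely to hit EAGAIN -- the point at which eventlet switches greenthreads mid-reply. The per-socket version would have to happen where the AMQP connection is created (inside the messaging library), so this is illustrative only:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 128 * 1024)
    # Linux reports back double the requested value:
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

    # The "sysctl it" alternative, system-wide:
    #   sysctl -w net.core.wmem_default=131072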
jrollrussell_h: let me finish powering on these nodes and then I'll run your debug thing21:29
comstudright, i suppose there could be a read before write21:29
jrollpower cycling everything was my goal to begin with21:29
comstudbut the other thing is...21:29
comstudI had roll try wrapping the DB calls in Thread pools21:29
comstudthat would have eliminated the problem here21:29
*** eguz has joined #openstack-ironic21:29
comstudjroll even21:29
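The tpool experiment comstud refers to: eventlet.tpool.execute() runs a blocking call (e.g. a C database driver that monkey-patching can't reach) in a real OS thread and parks only the calling greenthread, so the hub keeps servicing heartbeats and RPC I/O. A sketch with a hypothetical query function:

    from eventlet import tpool

    def get_node(node_id):
        # Blocking C-level DB call; invisible to eventlet's monkey patching.
        return _db_query(node_id)  # hypothetical helper

    def get_node_green(node_id):
        # Same call, pushed into a native thread; the greenthread yields
        # until the result is ready instead of stalling the whole process.
        return tpool.execute(get_node, node_id)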
comstudoh21:30
*** zdiN0bot has joined #openstack-ironic21:30
comstudactually,21:30
comstudi wonder if you are using our patched eventlet or not21:30
comstudthere's another very bad thing in stock eventlet21:31
comstudin that it likes to use set()s for things.21:31
jrollit's probably stock eventlet21:31
comstudso you don't get very fair scheduling21:31
jrollffs21:31
jrollsigh21:31
comstudbut it mostly applies to Queue21:31
comstudthe event loop is a list (heapq)21:32
jrollrussell_h: I'm not seeing spam from eventlet21:32
russell_hme either21:33
*** jistr has quit IRC21:33
*** eghobo has quit IRC21:34
*** zdiN0bot has quit IRC21:34
*** zdiN0bot has joined #openstack-ironic21:43
*** matty_dubs is now known as matty_dubs|gone22:00
JayFNobodyCam: is there any documentation or do you have a few moments to chat about your dev environment22:06
NobodyCamhumm22:23
*** dwalleck has joined #openstack-ironic22:23
NobodyCamotp right now22:23
JayFNobodyCam: otp?22:24
NobodyCamon the phone22:24
*** dwalleck_ has joined #openstack-ironic22:24
JayFooh. I thought it was some new testing tech I hadn't heard of :P22:25
*** dwalleck has quit IRC22:28
*** jgrimm has quit IRC22:32
*** ilives has joined #openstack-ironic22:32
*** ilives has quit IRC22:38
NobodyCamJayF: were they quick questions that you had? I ask only because I need to step away from the keyboard for a bit.22:38
JayFI just wanted to gauge what you were doing, because it seems to work well for you, before thinking about what we should do22:39
JayFit's not urgent, go on your walkabout and have some bagels ;)22:40
* devananda is back22:44
devanandajroll: make any progress diagnosing? I skimmed scrollback but things tapered off ~1hr ago22:44
jrolldevananda: didn't really find too much. if we set our heartbeat timeout to 6000 everything goes through before it dies22:47
*** dwalleck_ has quit IRC22:47
NobodyCamJayF: it's easy.. I run TripleO with USE_IRONIC=122:48
jrolldevananda: so I think it is the heartbeat getting starved22:48
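A toy reproduction of that starvation theory: a heartbeat greenthread that should tick once a second goes silent while another greenthread holds the hub with an un-patched blocking call:

    import time
    import eventlet

    def heartbeat():
        while True:
            print("heartbeat at %.1f" % time.time())
            eventlet.sleep(1)

    def power_worker():
        # Not monkey-patched: blocks the OS thread, so the hub can't run
        # the heartbeat's timer until this returns.
        time.sleep(5)

    eventlet.spawn(heartbeat)
    eventlet.sleep(0)   # let the heartbeat start ticking
    eventlet.spawn(power_worker)
    eventlet.sleep(10)  # observe a ~5s gap in the heartbeat output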
NobodyCam:( /me is actually out of bagels22:48
devananda"before it dies"22:49
devanandahah22:49
*** lnxnut_ has joined #openstack-ironic22:51
*** lnxnut has quit IRC22:51
JayFNobodyCam: wonder how difficult it would be to have that work with the agent. I know very little about ooo22:51
*** lnxnut_ has quit IRC22:51
*** lnxnut has joined #openstack-ironic22:52
devanandashouldn't be hard. change a few settings and maybe install different package / prereq22:52
devanandaJayF: i suspect the biggest obstacle will be that the agent isn't buildable with DIB22:52
JayFFor now, I'd probably just set it up so that you'd provide the image to pxe boot22:53
JayFif that's possible at all22:53
*** todd_dsm has joined #openstack-ironic22:57
NobodyCamJayF: if it all works like it should.. the only change should be with how you register the node and ofc having the deploy-ironic element actually use the agent22:58
*** todd_dsm has quit IRC23:09
*** todd_dsm has joined #openstack-ironic23:10
*** killer_prince has quit IRC23:14
*** ifarkas has quit IRC23:14
*** lucas-dinner has quit IRC23:20
*** killer_prince has joined #openstack-ironic23:21
devanandajroll: so the last paste of mine, where it looked like the RPC response wasn't being sent, is because the "background" greenthread was executing after change_node_power_state returned but before the RPC message was sent23:28
devanandathus blocking the whole process23:28
devanandai switched the test to use utils.execute(*['sleep', '10']) and it works as expected -- RPC message is returned right away and background thread waits for 10 seconds23:29
devanandaso that also confirms that utils.execute is getting patched (which i think you guys also confirmed by digging in the code)23:29
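devananda's utils.execute() result, reduced to a few lines: with monkey-patching in effect, waiting on a child process yields cooperatively, so the caller (and the RPC reply) aren't held up. A minimal sketch, assuming eventlet's default patching covers the process-wait path as it did in their setup:

    import eventlet
    eventlet.monkey_patch()

    import subprocess
    import time

    def background():
        subprocess.call(['sleep', '10'])  # waits cooperatively when patched

    start = time.time()
    eventlet.spawn(background)
    eventlet.sleep(0.1)
    # Control comes back almost immediately; the RPC reply could be sent
    # here while the 10-second child process runs in the background.
    print("main resumed after %.2fs" % (time.time() - start))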
jrolldevananda: cool23:29
jrolldevananda: although, russell_h had that theory, we added eventlet.sleep(1) to the utils.node_power_action function23:31
jrolland that didn't help at all23:31
devanandaright23:31
devanandathe way that i recreated the error is not the way you guys are encountering it23:32
devananda*not the same cause23:32
devanandaso using utils.execute(...) i'm getting "lack of free conductor workers"23:33
devanandabut not seeing the conductor fall over23:33
devanandajroll: do you know where (in the python code) the threads are mostly sitting?23:34
jrolldevananda: we are not running out of worker threads23:35
jrollthe problem is that heartbeat does not run fast enough while we're spamming power commands23:35
devanandajroll: that's the symptom. what's causing the periodic task not to run?23:35
jrolldevananda: I'm not sure23:36
jrollI'm off on other tangents atm23:36
devanandai was able to reproduce that symptom with calls that block the whole process23:36
devanandak23:36
*** todd_dsm has quit IRC23:38
jrolldevananda: I may take a crack at it again tomorrow, but I wasn't getting very far :/23:39
