Wednesday, 2017-03-29

[00:01] *** markvoelker has joined #openstack-ha
[00:05] *** markvoelker has quit IRC
[00:17] *** yan-gao has quit IRC
[00:32] *** hoonetorg has quit IRC
[00:34] *** catintheroof has quit IRC
[00:42] *** yan-gao has joined #openstack-ha
[00:45] *** hoonetorg has joined #openstack-ha
[01:28] *** furlongm has quit IRC
[01:33] *** kgaillot has quit IRC
[01:42] *** hoonetorg has quit IRC
[01:47] *** masahito has joined #openstack-ha
[01:54] *** hoonetorg has joined #openstack-ha
[03:14] *** masahito has quit IRC
[03:14] *** raginbajin has quit IRC
[03:15] *** zerick has quit IRC
[03:18] *** zerick has joined #openstack-ha
[03:21] *** raginbajin has joined #openstack-ha
[04:00] *** Dinesh_Bhor has joined #openstack-ha
[04:47] *** dgurtner has joined #openstack-ha
[04:53] *** dgurtner has quit IRC
[05:06] *** obre_ has joined #openstack-ha
[05:07] *** obre has quit IRC
[05:17] *** masahito has joined #openstack-ha
[06:20] *** nkrinner_afk is now known as nkrinner
[06:22] *** dgurtner has joined #openstack-ha
[06:24] *** pcaruana has joined #openstack-ha
[06:39] *** dgurtner has quit IRC
[07:35] *** dgurtner has joined #openstack-ha
[07:39] *** jpena|off is now known as jpena
[07:53] *** jpena is now known as jpena|off
[07:57] <aspiers> beekhof, samP: you around?
[07:59] *** jpena|off is now known as jpena
[08:03] *** rossella_s has joined #openstack-ha
[08:04] <beekhof> aspiers: for now :)
[08:09] *** ducnc has joined #openstack-ha
[08:14] *** dgurtner has quit IRC
[08:31] *** dgurtner has joined #openstack-ha
[08:38] <beekhof> aspiers: ok, i'm back
[08:38] <beekhof> it was story time
[08:38] <beekhof> so fundamentally, my question is, if they can reliably evacuate a VM after a failure, why do whole nodes still need attrd?
[08:39] <aspiers> by "they" you mean masakari?
[08:40] <aspiers> I don't think masakari *needs* attrd but like I said, this architecture needs it in order to avoid a hard-coupling with masakari
[08:40] <beekhof> yes
[08:40] <aspiers> the monitoring/notification side needs somewhere to queue failures it spotted
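The decoupling aspiers describes can be pictured as a queue sitting between the monitors and whatever performs recovery. This is a minimal sketch, not the proposed design: an in-memory deque stands in for attrd, and the `Failure`/`FailureQueue` names and the failure-kind labels are hypothetical. Because the recovery side is just a callable, masakari could in principle be swapped for something else without touching the monitors.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class Failure:
    # Hypothetical record of one failure spotted by the monitoring side.
    kind: str    # e.g. "COMPUTE_HOST", "PROCESS" or "VM" (assumed labels)
    host: str
    detail: str


class FailureQueue:
    """Buffer between monitors and the recovery backend.

    attrd plays this role in the architecture under discussion; a plain
    in-memory deque is used here purely for illustration.
    """

    def __init__(self) -> None:
        self._items: Deque[Failure] = deque()

    def push(self, failure: Failure) -> None:
        self._items.append(failure)

    def drain(self, handler: Callable[[Failure], bool]) -> List[Failure]:
        """Pass each queued failure to `handler`; keep any it does not accept."""
        remaining: Deque[Failure] = deque()
        handled: List[Failure] = []
        while self._items:
            failure = self._items.popleft()
            if handler(failure):
                handled.append(failure)
            else:
                remaining.append(failure)  # left queued for a later retry
        self._items = remaining
        return handled

    def __len__(self) -> int:
        return len(self._items)


# Usage: a hypothetical backend that only accepts host-level failures.
q = FailureQueue()
q.push(Failure("COMPUTE_HOST", "compute-1", "node fenced"))
q.push(Failure("VM", "compute-2", "instance crashed"))
handled = q.drain(lambda f: f.kind == "COMPUTE_HOST")
print([f.host for f in handled], len(q))  # ['compute-1'] 1
```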
[08:41] <beekhof> tbh, i am most of the way towards saying "to hell with it, let's just convert to masakari"
[08:41] <aspiers> well that's kind of what this is proposing
[08:41] <aspiers> but retaining some leeway in case we need to swap it for something else
[08:42] <beekhof> i mean exactly as-is. no attrd, no fake fencing agents
[08:42] <aspiers> and also crucially not doing the crazy host monitoring which masakari currently does
[08:42] <beekhof> crazy?
[08:42] <aspiers> masakari has its own host and process monitoring
[08:42] <aspiers> I want to use pacemaker for that
[08:42] <aspiers> let me show you some code
[08:42] <beekhof> well, we have host monitoring
[08:43] <beekhof> i assume process == vm?
[08:43] <aspiers> no, process == process :)
[08:44] <aspiers> masakari monitors at 3 levels: host, process, VM
[08:44] <aspiers> but I don't like it doing host/process
[08:44] <beekhof> agreed
[08:44] <aspiers> https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/hostmonitor.sh
[08:44] <aspiers> https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/processmonitor/processmonitor.sh
[08:45] <aspiers> those attempt to reimplement aspects of Pacemaker as bash scripts
[08:45] <aspiers> which I do not like
[08:46] <aspiers> OTOH, I *do* like this VM monitor https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/instancemonitor/instance.py
[08:46] <aspiers> I discussed this with samP and IIRC he was OK with the idea of changing it so that Pacemaker does host / process monitoring
[08:46] <beekhof> log_info "WARNING : $0 is deprecated as of the Ocata release and will be removed in the Queens release. Use masakari-hostmonitor implemented in python instead of $0."
[08:46] <aspiers> oh, interesting
[08:47] <aspiers> but still
[08:47] <aspiers> that's what Pacemaker already does, well
[08:47] <aspiers> and Pacemaker can guarantee the node has been fenced *before* anything else happens
[08:48] <aspiers> I guess that message was referring to https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/cmd/hostmonitor.py
[08:49] <aspiers> Python is better than shell but that's not enough to convince me that the architecture is right
[08:50] <aspiers> beekhof: AFAIK the architecture has not changed much from https://github.com/ntt-sic/masakari
[08:51] <beekhof> we do it less well for remote nodes though
[08:51] <beekhof> but yes, fencing
[08:51] <aspiers> beekhof: look at https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/handle_host.py
[08:51] <beekhof> yeah, i was looking at that one :)
[08:52] <beekhof> since it's just wrapping pacemaker's view of the world, might as well just let pacemaker do it
[08:53] <aspiers> https://github.com/openstack/masakari-monitors/blob/master/masakarimonitors/hostmonitor/host_handler/parse_cib_xml.py#L139
[08:53] <aspiers> hardcoding use of IPMI?
[08:53] <beekhof> in any case, i agree with your premise. masakari for handling the evacuations, something else for triggering them
[08:53] <aspiers> great!
[08:54] <aspiers> so in terms of how to "sell" this internally
[08:54] <aspiers> I guess the points are
[08:54] <aspiers> masakari does a nice job of VM monitoring (and I assume recovery, but I can't remember)
[08:55] <aspiers> masakari is (I think) in Big Tent already, and adheres to a lot of OpenStack conventions
[08:55] <beekhof> how this is impacted by containers will be a significant factor
[08:55] <aspiers> we need something which can execute more flexible / sophisticated policies
[08:55] <beekhof> more reliable too
[08:55] <aspiers> I suspect it would be easy to run masakari server in containers
[08:55] <beekhof> ie. if the evac fails
[08:56] <aspiers> right
[08:57] <aspiers> also masakari already uses oslo e.g. oslo.db
[08:58] <aspiers> the only question mark for me is whether we could achieve something similar with Congress + Mistral
[08:58] <aspiers> but did you see my comment at the end of the meeting earlier?
[08:58] <aspiers> https://blueprints.launchpad.net/mistral/+spec/mistral-ha was closed as obsolete in January
[08:59] <aspiers> IIRC I asked ddeja if he knew why. I can't remember the answer, but I remember being disappointed by it
[09:00] <ddeja> aspiers: let me think about it....
[09:01] <aspiers> ddeja: thanks :)
[09:02] <ddeja> aspiers: I see that there is some movement around that blueprint
[09:02] <ddeja> I guess there were some decisions to move forward at the PTG
[09:03] <aspiers> ddeja: where do you see the movement?
[09:03] <ddeja> I see new dependencies
[09:03] <ddeja> that weren't there the last time I looked at it
[09:03] * ddeja may just be wrong
[09:03] *** ushkalim_ has joined #openstack-ha
[09:05] <ddeja> but AFAIK it was closed because it was opened so long ago that circumstances changed and therefore there is a need to write a new one
[09:07] * ddeja doesn't have enough time to keep up with Mistral...
[09:09] <aspiers> that sounds like a strange reason to close
[09:12] <ddeja> aspiers: I may have forgotten something else...
[09:13] <samP> hi..
[09:14] <samP> I thought the meeting starts at 6:00 JST ...my mistake..
[09:16] <aspiers> oh crap, no my mistake :-(
[09:16] <aspiers> stupid daylight savings
[09:16] <samP> aspiers: np...
[09:16] <aspiers> samP: did your clocks change too?
[09:17] <samP> aspiers: no...we don't have that luxury
[09:17] <aspiers> I wouldn't call it a luxury ;-)
[09:17] <aspiers> makes it much harder to wake up ...
[09:17] <aspiers> samP: mostly I was explaining the diagram to beekhof. I think he likes it
[09:18] <samP> aspiers: I'm just following the discussion.. is the diagram the same as the one you mailed me?
[09:18] <aspiers> samP: yes
[09:18] <samP> aspiers: great..
[09:19] <aspiers> it would make it easier for us to adopt masakari if the host and process monitors could be dropped in favour of Pacemaker, since they don't seem to offer anything extra over what Pacemaker can already do
[09:19] <aspiers> or am I wrong?
[09:20] <aspiers> the masakari host monitor seems to be just a wrapper around Pacemaker
[09:20] <samP> aspiers: you are right
[09:20] <aspiers> would it be possible for masakari developers to join future meetings?
[09:20] <samP> aspiers: the only reason we maintain it is that we have some users who are using it.
[09:20] <aspiers> OK
[09:21] <aspiers> so if we could integrate Pacemaker host/process monitoring with masakari then those users would be able to switch
[09:21] *** rmart04 has joined #openstack-ha
[09:22] <aspiers> back in a few mins
[09:23] <samP> aspiers: I think it is a nice solution and I can't see any problems with that. The only problem is, they have huge clusters and it will take time to adopt
[09:23] <samP> aspiers: sure
[09:23] <aspiers> samP: that's fine, we have similar problems ;-/
[09:23] <aspiers> samP: BTW http://lists.openstack.org/pipermail/user-committee/2017-March/001890.html
[09:26] <samP> aspiers: thank you for bringing this up
[09:29] <samP> aspiers: In masakari meetings, we have already discussed replacing masakari-monitors** with resource-agents.
[09:30] <samP> it's one of the Pike work items.
[09:30] <samP> #link https://etherpad.openstack.org/p/masakari-pike-workitems
[09:30] <samP> please see #L51-53
[09:31] <aspiers> samP: thanks!
[09:32] *** dgurtner has quit IRC
[09:35] <samP> aspiers: I was planning to do this before the summit.. I can ask masakari developers to join the HA meeting.
[09:35] <aspiers> samP: thanks!
[09:38] <samP> But the problem is, most masakari developers have very little knowledge about pacemaker-resource-agents.
[09:38] <aspiers> that should be easy to fix
[09:38] <aspiers> I volunteer beekhof to help explain them ;-)
[09:38] <aspiers> he's even in the right time zone
[09:39] <samP> aspiers: that would be great..or..
[09:39] <aspiers> of course I am happy to answer questions about OCF RAs
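For readers new to OCF resource agents, here is a rough idea of the piece that would replace masakari's process monitor. Pacemaker invokes an agent with an action name (`start`, `stop`, `monitor`, `meta-data`, ...) and interprets its exit code; OCF defines 0 as success, 3 as "unimplemented" and 7 as "not running". This Python sketch implements only `monitor`, by checking a pidfile. The `pidfile` parameter and its default path are hypothetical; a real agent must also implement start, stop and meta-data, and most existing agents are shell scripts built on `ocf-shellfuncs`.

```python
import os
from typing import List, Mapping

# OCF exit codes used below.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


def monitor(pidfile: str) -> int:
    """Report whether the process recorded in `pidfile` is alive."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return OCF_NOT_RUNNING
    if pid <= 0:
        return OCF_NOT_RUNNING  # 0 or negative would address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return OCF_NOT_RUNNING
    except PermissionError:
        return OCF_SUCCESS  # process exists but belongs to another user
    return OCF_SUCCESS


def dispatch(action: str, pidfile: str) -> int:
    if action == "monitor":
        return monitor(pidfile)
    # start, stop, meta-data and validate-all are omitted in this sketch.
    return OCF_ERR_UNIMPLEMENTED


def main(argv: List[str], environ: Mapping[str, str]) -> int:
    # Pacemaker passes resource parameters as OCF_RESKEY_<name> variables;
    # "pidfile" and its default path are hypothetical.
    action = argv[1] if len(argv) > 1 else ""
    pidfile = environ.get("OCF_RESKEY_pidfile", "/var/run/example.pid")
    return dispatch(action, pidfile)

# An installed agent would finish with: sys.exit(main(sys.argv, os.environ))
```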
[09:40] <aspiers> samP: about https://etherpad.openstack.org/p/masakari-pike-workitems, it would be nice if masakari didn't hardcode any assumptions about stonith
[09:40] <aspiers> if it just delegates stonith to pacemaker then there is nothing to do
[09:40] <aspiers> and then it is not limited to IPMI
[09:40] <aspiers> and this would happen automatically if masakari uses pacemaker for host monitoring
[09:40] <samP> aspiers: which item?
[09:41] <aspiers> "Force Stonith" #L33-36
[09:41] <aspiers> same for split brain detection #L19
[09:41] <samP> ah....it has a different use case
[09:43] <samP> aspiers: Force Stonith is used to isolate a node by force.. kind of optional...
[09:43] <aspiers> samP: when would you need to do that?
[09:44] <aspiers> samP: but again it makes sense to do it through Pacemaker
[09:46] <samP> aspiers: if pacemaker is there, then we can do it through pacemaker. Force Stonith will be the masakari-side function to call it. In the etherpad, 'IPMI' is just an example.
[09:47] <aspiers> samP: OK. what is the use case?
[09:50] <samP> aspiers: In process or VM failures, if masakari cannot rescue them and the operator decides that the compute node can no longer be rescued, then the operator might need to kill the compute node.
[09:50] <aspiers> samP: OK
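Delegating "Force Stonith" to Pacemaker, as suggested above, means the masakari side only asks Pacemaker to fence a node and lets Pacemaker pick whichever fence device is configured (IPMI or otherwise). A minimal sketch, assuming Pacemaker's `stonith_admin` CLI is installed on the host; the `fence_command`/`force_stonith` helpers are hypothetical and `force_stonith` defaults to a dry run that only returns the command line.

```python
import shlex
import subprocess
from typing import List

# The two fencing actions used here, mapped to stonith_admin long options.
_FLAGS = {"reboot": "--reboot", "off": "--fence"}


def fence_command(node: str, action: str = "reboot") -> List[str]:
    """Build the argv asking Pacemaker's fencer to fence `node`."""
    if action not in _FLAGS:
        raise ValueError(f"unsupported fencing action: {action!r}")
    return ["stonith_admin", _FLAGS[action], node]


def force_stonith(node: str, action: str = "reboot", dry_run: bool = True) -> str:
    """Hypothetical masakari-side helper; returns the command line it used."""
    cmd = fence_command(node, action)
    if not dry_run:
        # check=True raises CalledProcessError if stonith_admin exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


print(force_stonith("compute-2", "off"))  # stonith_admin --fence compute-2
```

The caller never names a fence device, so the same helper works whether the cluster fences through IPMI, a PDU or anything else Pacemaker supports.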
[09:50] <samP> aspiers: I will explain about split brain after this..
[09:51] <samP> aspiers: We have got another request for this... let me try to explain..
[09:52] <samP> One of our masakari users uses pacemaker+masakari
[09:53] <samP> when a compute node goes down, pacemaker fences it and calls masakari for evacuation.
[09:53] <samP> the problem is, pacemaker kills the compute node and the node does not have enough time to do the core dump
[09:54] <samP> They have no way to know why that compute node went down..
[09:54] <samP> <-- that's what they said...
[09:57] <aspiers> pacemaker could do a core dump before fencing
[10:00] <samP> aspiers: correct..but they need to isolate that node immediately, so masakari can do the evacuation. since they have to dump 256GB of mem, it takes some time
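The tension samP describes is mostly arithmetic: if the dump must finish before the node is fenced, evacuation is delayed by roughly memory size divided by dump write throughput. The sketch treats "256GB" as 256 GiB, and the throughputs are illustrative assumptions, not measurements from that user's environment.

```python
def dump_seconds(mem_gib: float, write_mib_per_s: float) -> float:
    """Rough time to write a full memory dump, ignoring compression and filtering."""
    if write_mib_per_s <= 0:
        raise ValueError("throughput must be positive")
    return mem_gib * 1024 / write_mib_per_s


# Illustrative throughputs only; real numbers depend on disk, network and dump settings.
for mib_s in (200, 500, 1000):
    minutes = dump_seconds(256, mib_s) / 60
    print(f"256 GiB at {mib_s} MiB/s: ~{minutes:.1f} min")
# 256 GiB at 200 MiB/s: ~21.8 min
# 256 GiB at 500 MiB/s: ~8.7 min
# 256 GiB at 1000 MiB/s: ~4.4 min
```

Even at optimistic speeds the VMs would stay down for minutes, which is why the user wants the node isolated first.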
[10:02] <aspiers> I see
[10:02] <aspiers> gotta go now, back later
[10:02] <aspiers> thanks for all the info!
[10:02] <samP> aspiers: sure, thanks I will catch you later
[10:15] *** dgurtner has joined #openstack-ha
[10:15] *** dgurtner has quit IRC
[10:15] *** dgurtner has joined #openstack-ha
[10:24] *** sticker has quit IRC
[10:31] *** dgurtner has quit IRC
[10:35] *** masahito has quit IRC
[10:40] *** masahito has joined #openstack-ha
[10:45] *** masahito has quit IRC
[10:45] *** samP has quit IRC
[10:48] *** ushkalim_ has quit IRC
[11:03] *** ushkalim_ has joined #openstack-ha
[11:24] *** ushkalim_ has quit IRC
[11:36] *** ushkalim_ has joined #openstack-ha
[12:03] *** ushkalim_ has quit IRC
[12:18] *** ushkalim_ has joined #openstack-ha
[12:40] *** jpena is now known as jpena|lunch
[12:42] *** rossella_s has quit IRC
[12:43] *** rossella_s has joined #openstack-ha
[12:48] *** jmlowe has quit IRC
[13:00] *** jmlowe has joined #openstack-ha
[13:02] *** jmlowe has quit IRC
[13:13] *** ushkalim_ has quit IRC
[13:25] *** ushkalim_ has joined #openstack-ha
[13:30] *** catintheroof has joined #openstack-ha
[13:36] *** jmlowe has joined #openstack-ha
[13:41] *** catintheroof has quit IRC
[13:45] *** sticker has joined #openstack-ha
[13:45] *** jpena|lunch is now known as jpena
[13:55] *** kgaillot has joined #openstack-ha
[13:58] *** masahito has joined #openstack-ha
[13:59] *** aasmith has quit IRC
[14:02] *** jmlowe_ has joined #openstack-ha
[14:04] *** jmlowe has quit IRC
[14:16] *** masahito has quit IRC
[14:18] *** masahito has joined #openstack-ha
[14:31] *** dgurtner has joined #openstack-ha
[14:48] *** aasmith has joined #openstack-ha
[15:07] *** cleong has joined #openstack-ha
[15:51] *** nkrinner is now known as nkrinner_afk
[15:55] *** rmart04 has quit IRC
[16:17] *** ushkalim_ has quit IRC
[16:45] *** masahito has quit IRC
[17:03] *** masahito has joined #openstack-ha
[17:10] *** mrhillsman has quit IRC
[17:19] *** codebauss has joined #openstack-ha
[17:20] *** jpena is now known as jpena|off
[17:20] *** codebauss is now known as mrhillsman
[17:21] *** mrhillsman has quit IRC
[17:23] *** codebauss has joined #openstack-ha
[17:23] *** masahito has quit IRC
[17:23] *** codebauss is now known as mrhillsman
[17:25] *** jmlowe_ has quit IRC
[17:52] *** pcaruana has quit IRC
[17:55] *** dgurtner has quit IRC
[17:59] *** masahito has joined #openstack-ha
[18:06] *** hannibal has joined #openstack-ha
[18:08] *** aasmith has quit IRC
[18:13] *** jmlowe has joined #openstack-ha
[18:22] *** masahito has quit IRC
[18:36] *** hannibal has quit IRC
[18:42] *** jmlowe has quit IRC
[18:44] *** openstackstatus has joined #openstack-ha
[18:44] *** ChanServ sets mode: +v openstackstatus
[18:49] *** hannibal has joined #openstack-ha
[19:00] *** hannibal has quit IRC
[19:06] *** jmlowe has joined #openstack-ha
[19:16] *** hannibal has joined #openstack-ha
[19:53] *** dgurtner has joined #openstack-ha
[20:00] *** jmlowe has quit IRC
[20:03] *** jmlowe has joined #openstack-ha
[20:15] *** jmlowe has quit IRC
[20:25] *** hannibal has quit IRC
[20:37] *** hannibal has joined #openstack-ha
[20:40] *** dgurtner has quit IRC
[20:54] *** hannibal has quit IRC
[21:19] *** cleong has quit IRC
[21:25] *** yee379 has joined #openstack-ha
[21:26] *** yee37915 has quit IRC
[23:01] *** kgaillot has quit IRC
