Tuesday, 2020-02-11

[04:49] *** goldyfruit has quit IRC
[13:18] *** goldyfruit has joined #openstack-masakari
[14:38] *** goldyfruit has quit IRC
[14:38] *** goldyfruit has joined #openstack-masakari
[14:51] *** vishalmanchanda has joined #openstack-masakari
[17:20] *** gmann is now known as gmann_afk
[18:49] *** gmann_afk is now known as gmann
[19:34] <lile> Hi goldyfruit
[19:35] <lile> I've tried out the patch from Change 675734 which allows me to configure a segment by either fqdn or hostname
[19:35] <lile> However it doesn't appear to help matters for me, masakari still doesn't attempt to migrate VMs from a failed hypervisor
[19:36] <lile> It looks like everything regarding hostnames should match, although I got a bit cute with the segments
[19:36] <lile> This cluster was deployed with kolla-ansible (train)
[19:38] <lile> The domain name has been redacted with "example.com"
[19:38] <lile> https://www.irccloud.com/pastebin/ZOZOsiov/
[19:43] *** vishalmanchanda has quit IRC
[20:08] <goldyfruit> What about the masakari logs?
[20:08] <goldyfruit> on controller nodes and computes
[20:08] <goldyfruit> lile,
[20:09] <goldyfruit> I see that you are using kolla
[20:09] <goldyfruit> Apply this change: https://review.opendev.org/#/c/697712/
[20:09] <goldyfruit> to your configuration
[20:09] <lile> Is there a specific masakari log you're interested in? masakari-engine?
[20:10] <goldyfruit> engine and the one on the compute from the hostmonitor
[20:10] <lile> ls
[20:10] <goldyfruit> Did you set up pacemaker and corosync?
[20:11] <goldyfruit> Because kolla-ansible doesn't do that
[20:11] <lile> ok, that's probably where I've missed
[20:11] <goldyfruit> The only support right now in Kolla is for instancemonitor
[20:11] <lile> Can you point me at the correct documentation for that?
[20:11] <goldyfruit> That is a tricky question :p
[20:11] <lile> lol, yeah, I've looked all over for docs and ended up very confused...
[20:12] <goldyfruit> But basically, you will need to set up pacemaker and corosync on the controller nodes
[20:13] <goldyfruit> And only pacemaker-remote on the compute
[20:13] <lile> The hv node masakari logs are basically empty
[20:13] <lile> cat /var/log/kolla/masakari/masakari-instancemonitor.log
[20:13] <lile> 2020-02-11 11:58:38.497 6 INFO masakarimonitors.service [-] Starting masakarimonitors-instancemonitor
[20:14] <lile> Likewise for engine on the controllers
[20:14] <lile> [root@odt-rsd-ost-ctr-b1 masakari]# cat masakari-engine.log
[20:14] <lile> 2020-02-11 11:57:02.448 6 INFO masakari.engine.driver [-] Loading masakari notification driver 'taskflow_driver'
[20:14] <lile> 2020-02-11 11:57:06.525 6 INFO masakari.service [-] Starting ha_engine (version 8.0.0)
[20:15] <lile> The wsgi server log just complains when it sees an HV node go down, regarding rabbitmq
[20:15] <goldyfruit> So at least to test masakari-engine and masakari-instancemonitor, you can kill an instance on the compute and see if the instance is coming back
[20:15] <lile> 2020-02-11 13:06:12.596 20 INFO masakari.compute.nova [req-26c23ee2-b6b9-4cfe-8af4-16f5a4567c22 nova - - - -] Call hypervisor search command to get list of matching hypervisor name 'hv-10-011.example.com'
[20:15] <lile> 2020-02-11 13:06:13.588 20 ERROR oslo.messaging._drivers.impl_rabbit [req-26c23ee2-b6b9-4cfe-8af4-16f5a4567c22 nova - - - -] [1c1fecd1-495e-41a9-acc4-3c955f14fba4] AMQP server on 10.232.194.237:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: IOError: Server unexpectedly closed connection
[20:15] <lile> 2020-02-11 13:06:14.616 20 INFO oslo.messaging._drivers.impl_rabbit [req-26c23ee2-b6b9-4cfe-8af4-16f5a4567c22 nova - - - -] [1c1fecd1-495e-41a9-acc4-3c955f14fba4] Reconnected to AMQP server on 10.232.194.237:5672 via [amqp] client with port 41096.
[20:15] <lile> 2020-02-11 13:06:48.848 18 WARNING oslo.messaging._drivers.impl_rabbit [req-0bf6856f-8bbd-4a4a-8354-8d4c9fd70812 2bc8474c1633412fa53fab73b98714cc 21500607bf2747068ee65d3fe1b51874 - - -] Unexpected error during heartbeat thread processing, retrying...: IOError: Server unexpectedly closed connection
[20:15] <lile> 2020-02-11 14:04:29.846 20 WARNING oslo.messaging._drivers.impl_rabbit [req-26c23ee2-b6b9-4cfe-8af4-16f5a4567c22 nova - - - -] Unexpected error during heartbeat thread processing, retrying...: IOError: Server unexpectedly closed connection
[20:15] <lile> 2020-02-11 14:04:30.941 20 ERROR oslo.messaging._drivers.impl_rabbit [req-0311e219-1083-4ad4-961d-3d629449f4c5 2bc8474c1633412fa53fab73b98714cc 21500607bf2747068ee65d3fe1b51874 - - -] [1c1fecd1-495e-41a9-acc4-3c955f14fba4] AMQP server on 10.232.194.237:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: IOError: Server unexpectedly closed connection
[20:15] <lile> 2020-02-11 14:04:31.974 20 INFO oslo.messaging._drivers.impl_rabbit [req-0311e219-1083-4ad4-961d-3d629449f4c5 2bc8474c1633412fa53fab73b98714cc 21500607bf2747068ee65d3fe1b51874 - - -] [1c1fecd1-495e-41a9-acc4-3c955f14fba4] Reconnected to AMQP server on 10.232.194.237:5672 via [amqp] client with port 35934.
[20:16] <lile> 2020-02-11 14:05:56.914 18 INFO masakari.compute.nova [req-9e44e63e-f1de-4faa-86fa-3357f1a58ec0 nova - - - -] Call hypervisor search command to get list of matching hypervisor name 'hv-10-012'
[20:16] <lile> 2020-02-11 14:05:57.835 18 ERROR oslo.messaging._drivers.impl_rabbit [req-9e44e63e-f1de-4faa-86fa-3357f1a58ec0 nova - - - -] [60c60034-bd54-4af0-81ce-7e53d152a714] AMQP server on 10.232.194.236:5672 is unreachable: Server unexpectedly closed connection. Trying again in 1 seconds.: IOError: Server unexpectedly closed connection
[20:16] <lile> 2020-02-11 14:05:58.861 18 INFO oslo.messaging._drivers.impl_rabbit [req-9e44e63e-f1de-4faa-86fa-3357f1a58ec0 nova - - - -] [60c60034-bd54-4af0-81ce-7e53d152a714] Reconnected to AMQP server on 10.232.194.236:5672 via [amqp] client with port 59904.
[20:16] <lile> 2020-02-11 14:42:37.087 18 WARNING oslo.messaging._drivers.impl_rabbit [req-9e44e63e-f1de-4faa-86fa-3357f1a58ec0 nova - - - -] Unexpected error during heartbeat thread processing, retrying...: IOError: Server unexpectedly closed connection
[20:16] <goldyfruit> lile, don't paste the log here please
[20:16] <lile> apologies
[20:16] <lile> pastebin?
[20:17] <goldyfruit> yes
[20:18] <lile> For 697712, I could just push the "[api] api_interface = internal" into masakari-monitors.conf with a kolla-ansible config merge, correct? (as a quick test)
[20:19] <goldyfruit> yep
[20:23] <lile> For instance HA, set --property HA_Enabled=True on the instance, correct?
[20:23] <goldyfruit> yes
[20:55] <lile> So masakari noticed the instance going down but didn't restart the instance
[20:56] <lile> https://www.irccloud.com/pastebin/X3IqLeG9/
[20:57] <goldyfruit> did you kill the instance?
[20:57] <lile> Yes, from the hv using an OS-level kill
[20:58] <goldyfruit> kill -9 pid?
[21:04] <goldyfruit> Try to enable debug on masakari api, engine and instancemonitor
[21:06] <lile> just a kill, but I can hit it harder next time ;-)
[21:09] <lile> I did find an interesting message in engine on the controller
[21:09] <lile> https://www.irccloud.com/pastebin/rFdFahA8/
[21:11] <goldyfruit> try the -9
[21:11] <goldyfruit> And turn on debug
[21:11] <lile> Saw that while I was enabling debug
[21:15] <lile> ok, with kill -9 it did restart
[21:15] <goldyfruit> great
[21:16] <lile> would that recover instances if the compute node failed though?
[21:17] <goldyfruit> nope
[21:17] <goldyfruit> You need to set up the hostmonitor
[21:17] <lile> ok, any pointer to how to get pacemaker/corosync or hostmonitor running alongside a kolla-ansible deployment?
[21:18] <goldyfruit> I didn't have the time to continue the implementation
[21:18] <goldyfruit> But the images have been merged into kolla
[21:19] <goldyfruit> they are under hacluster
[21:19] <lile> ok, I'll dig into that.
[21:19] <goldyfruit> About the Ansible role
[21:19] <goldyfruit> https://review.opendev.org/#/c/670104/
[21:20] <goldyfruit> By reading the role you should be able to see how to install pacemaker/corosync
[21:20] <lile> excellent
[21:23] <lile> Is there anything specific missing from the role or did it just not get merged back to Train?
[21:24] <goldyfruit> The role was working for train/master
[21:24] <goldyfruit> Just needed to clean up some stuff but never had the time since I left the company
[21:24] <lile> Ok, I can probably just figure out how to invoke it in my existing train setup
[21:25] <lile> we're building our own kolla images, when necessary, and merging them in at kolla-ansible through docker tags
[21:26] <lile> Thank you so much for your help today! I'll dig into https://review.opendev.org/#/c/670104/ and see what I come up with.
[21:27] <lile> If the clean-ups aren't beyond my capability, I may be able to submit them back to you or possibly back to the change
[21:28] <goldyfruit> Cool

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!