Thursday, 2018-11-29

00:05 *** slaweq has quit IRC
00:08 <eandersson> sorrison, what version of rabbitmq are you running?
00:09 <sorrison> we're on 3.6.5 erlang 19.3
00:10 <sorrison> We tried 3.7.9 and erlang 20.3 but it crashed after about 5 mins of running and so we rolled back. (we tried this twice)
00:10 <eandersson> interesting
00:10 <eandersson> How many nodes is this with? does it happen with 1? or 100?
00:10 <eandersson> *compute nodes
00:11 <eandersson> basically does the load have an impact?
00:11 <sorrison> Yes I think load is a factor
00:12 <eandersson> So one obvious factor, especially with older versions of RabbitMQ, could be that it just can't accept connections fast enough.
00:12 <sorrison> prod is around 1000 hosts and can't run the latest version, test is 8 hosts and is fine (with new or old version of rabbit)
00:12 <eandersson> Would cause > {handshake_timeout,frame_header}
00:12 <sorrison> yes that is what we see!
00:13 <eandersson> > num_acceptors.ssl = 1
00:13 <eandersson> So for SSL in these older versions they can only support one SSL acceptor
00:13 <eandersson> If you upgrade to something like 3.6.14, or maybe even your version
00:13 <eandersson> you can bump that to something like 10
00:13 *** mriedem has quit IRC
00:14 <eandersson> https://www.rabbitmq.com/configure.html
00:14 <sorrison> the bulk of the load is neutron agents on this rabbit, so we offloaded ssl to our F5 but it didn't help
00:14 <sorrison> Yeah ok, I'll have a look into that. We tried 3.6.14 but it crashed and burned when we put the load on it too
00:14 <sorrison> we think it's to do with the distributed mgmt interface
00:16 <eandersson> https://github.com/rabbitmq/rabbitmq-server/issues/1729
00:16 <eandersson> Trying to find the exact bug
00:18 <eandersson> I think it might work on 3.6.5 as well, but haven't tested it.
00:25 <eandersson> Also consider raising the ssl_handshake_timeout
00:26 *** simon-AS5591 has quit IRC
00:27 <sorrison> yeah ok, I'll try {num_ssl_acceptors, 10} and maybe doubling ssl_handshake_timeout
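A minimal sketch of what that change could look like, assuming the classic Erlang-term config format used by the 3.6.x series and the new-style rabbitmq.conf format used by 3.7.x (num_ssl_acceptors and num_acceptors.ssl are the keys mentioned above; the timeout value is illustrative, shown doubled from the usual 5000 ms default):

    %% /etc/rabbitmq/rabbitmq.config (classic Erlang-term format, 3.6.x)
    [
      {rabbit, [
        {num_ssl_acceptors, 10},         %% old default is 1
        {ssl_handshake_timeout, 10000}   %% in milliseconds
      ]}
    ].

    ## /etc/rabbitmq/rabbitmq.conf (new-style format, 3.7.x)
    num_acceptors.ssl = 10
    ssl_handshake_timeout = 10000

Whether num_ssl_acceptors is actually honoured on 3.6.5 itself is exactly the open question raised above, so the setting only reliably takes effect on versions that carry the fix.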
00:27 <sorrison> So in my reading of this bug, 10 acceptors is the default in 3.7.9?
00:27 <openstack> bug 10 in Launchpad itself "It says "displaying matching bugs 1 to 8 of 8", but there is 9" [Medium,Invalid] https://launchpad.net/bugs/10
00:28 <eandersson> Yea - unfortunately only 3.7.9
00:28 <sorrison> looks like it might be referenced as num_acceptors.ssl too
00:28 <eandersson> I can't remember which version it was actually fixed in though
00:29 <eandersson> https://github.com/rabbitmq/rabbitmq-server/commit/d687bf0be3a23fdb63c1c0b36db967285a112c74
00:29 <eandersson> Found it ^
00:30 <sorrison> ah so looks like it's in 3.8.0-beta.1
00:30 <eandersson> 3.7.13
00:30 <eandersson> *3.6.13
00:31 <eandersson> But depends on Ranch, not RabbitMQ
00:31 <eandersson> > due to a bug in Ranch 1.0
00:32 <sorrison> we just use the debs provided by rabbitmq so I assume this is packaged in there somewhere? (no idea what Ranch is)
00:33 <eandersson> Yea, a library that RabbitMQ uses
00:33 <eandersson> Unfortunately no clue how to check
00:33 <eandersson> According to the bug, worst case it generates harmless error messages on shutdown if you have 1.0
00:34 <sorrison> rabbitmqctl shows it
00:34 <eandersson> Ranch 1.0 is over 4 years old, so you are probably using a newer version.
00:34 <sorrison> we have {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"}, on our 3.6.5 erlang 19.3 cluster
00:36 <sorrison> our dev cluster which is 3.7.9 erlang 20.3 has {ranch,"Socket acceptor pool for TCP protocols.","1.6.2"},
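For anyone else wanting to check, a quick sketch of confirming the bundled Ranch version on a node: rabbitmqctl status lists the running Erlang applications, which is where the {ranch,...} tuples pasted above come from, and rabbitmqctl eval can query the VM directly.

    # grep the running_applications list
    rabbitmqctl status | grep ranch
    #   {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},

    # or ask the Erlang application controller directly
    rabbitmqctl eval 'application:get_key(ranch, vsn).'
    #   {ok,"1.2.1"}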
00:36 <sorrison> So now I just need to figure out why anything newer than 3.6.5 falls over in our case, once I get all the neutron agents connecting to it again
00:37 <eandersson> btw how many RabbitMQ nodes are you running? 2 or 3?
00:37 <sorrison> 3
00:37 <sorrison> physical hosts with 24 cores and 96G ram
00:39 <sorrison> Connections: 11134
00:39 <sorrison> Channels: 10884
00:39 <sorrison> Exchanges: 1262
00:39 <sorrison> Queues: 13055
00:39 <sorrison> Consumers: 18965
00:39 <eandersson> Are you spreading the connections out between the nodes?
00:40 <sorrison> yeah, they're evenly spread
00:44 <eandersson> Is handshake_timeout the only obvious error you see in the RabbitMQ logs?
00:51 <sorrison> {handshake_error,starting,1,
00:51 <sorrison> {handshake_timeout,frame_header}
00:51 <sorrison> {handshake_timeout,handshake}
00:51 <sorrison> {inet_error,{tls_alert,"bad record mac"}}
00:51 <sorrison> {inet_error,{tls_alert,"unexpected message"}}
00:51 <sorrison> those are the things I see in the logs
00:51 <sorrison> {handshake_timeout,frame_header} is by far the most common one, the others only seen a couple of times
00:52 <sorrison> Also get missed heartbeat from client timeouts too
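The missed-heartbeat timeouts are driven from the client side by the oslo.messaging rabbit driver rather than by the broker; a sketch of the relevant [oslo_messaging_rabbit] options (option names from oslo.messaging, the values shown are only illustrative, not a recommendation):

    # neutron.conf / nova.conf on the agents
    [oslo_messaging_rabbit]
    # seconds without a heartbeat before the connection is treated as dead
    heartbeat_timeout_threshold = 60
    # how many times per heartbeat_timeout_threshold interval the heartbeat is checked
    heartbeat_rate = 2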
01:03 <eandersson> Are you using a LB in front of RabbitMQ, or hitting them directly?
01:05 <eandersson> unexpected message / bad record mac is a bit concerning
01:05 <sorrison> about 90% of our clients go through an LB which terminates the ssl, the rest hit directly using ssl from rabbit
01:06 <eandersson> bad record usually means that there is a client side race condition
01:07 <eandersson> maybe a threading issue
01:07 <sorrison> the unexpected message / bad record mac only happened a couple of times and that could've been when we were stopping and starting with the new version etc.
01:07 <eandersson> I see
01:09 <eandersson> It's worth keeping an eye out for those types of errors, as they would indicate a potential issue with kombu/oslo.messaging
01:09 <eandersson> *could
01:14 *** jakeyip has joined #openstack-operators
01:16 <sorrison> Well, downgrading from pike oslo.messaging to ocata oslo.messaging fixes the timeouts for us
01:16 <sorrison> See the last comment on https://bugs.launchpad.net/oslo.messaging/+bug/1800957/ about what works/doesn't work for us
01:16 <openstack> Launchpad bug 1800957 in oslo.messaging "Upgrading to pike version causes rabbit timeouts with ssl" [Undecided,Incomplete] - Assigned to Ken Giusti (kgiusti)
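A sketch of the downgrade described above, for anyone wanting to reproduce it: pin oslo.messaging back to the Ocata series on an otherwise Pike node and restart the agents. The exact version should come from the Ocata upper-constraints; the 5.17.x range below is only an assumed illustration of an Ocata-era series.

    # pin oslo.messaging to an Ocata-era release (version range is illustrative,
    # check the stable/ocata upper-constraints.txt for the real pin)
    pip install 'oslo.messaging>=5.17.0,<5.18.0'

    # then restart the affected agents, e.g.
    systemctl restart neutron-openvswitch-agent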
01:23 *** markvoelker has joined #openstack-operators
01:24 *** VW_ has quit IRC
01:28 *** markvoelker has quit IRC
01:28 *** VW_ has joined #openstack-operators
01:31 *** jakeyip has quit IRC
01:33 *** blake has quit IRC
01:35 *** jakeyip has joined #openstack-operators
01:46 *** blake has joined #openstack-operators
01:50 *** trident has quit IRC
01:51 *** blake has quit IRC
01:54 *** gyee has quit IRC
01:55 *** trident has joined #openstack-operators
01:57 *** blake has joined #openstack-operators
02:02 *** blake has quit IRC
02:30 *** markvoelker has joined #openstack-operators
02:58 *** rcernin has quit IRC
03:00 *** jamesmcarthur has joined #openstack-operators
03:04 *** jamesmcarthur has quit IRC
03:14 *** jamesmcarthur has joined #openstack-operators
03:30 *** jamesmcarthur has quit IRC
03:37 *** VW_ has quit IRC
03:37 *** VW_ has joined #openstack-operators
03:38 *** rcernin has joined #openstack-operators
04:20 *** jamesmcarthur has joined #openstack-operators
04:25 *** jamesmcarthur has quit IRC
05:49 *** jackivanov has joined #openstack-operators
05:58 *** blake has joined #openstack-operators
06:02 *** blake has quit IRC
06:51 *** simon-AS559 has joined #openstack-operators
06:53 *** simon-AS5591 has joined #openstack-operators
06:55 *** simon-AS559 has quit IRC
06:56 *** VW_ has quit IRC
07:02 *** ahosam has joined #openstack-operators
07:03 *** slaweq has joined #openstack-operators
07:26 *** ahosam has quit IRC
07:26 *** ahosam has joined #openstack-operators
07:32 *** aojea has joined #openstack-operators
07:35 *** rcernin has quit IRC
08:07 *** gkadam has joined #openstack-operators
08:31 *** takamatsu has quit IRC
08:32 *** dims has quit IRC
08:33 *** dims has joined #openstack-operators
08:48 *** pcaruana has joined #openstack-operators
08:50 *** markvoelker has quit IRC
09:10 *** ahosam has quit IRC
09:10 *** ahosam has joined #openstack-operators
09:13 *** takamatsu has joined #openstack-operators
09:22 *** derekh has joined #openstack-operators
09:51 *** markvoelker has joined #openstack-operators
09:58 *** simon-AS5591 has quit IRC
09:58 *** blake has joined #openstack-operators
10:03 *** blake has quit IRC
10:06 *** ahosam has quit IRC
10:08 *** simon-AS559 has joined #openstack-operators
10:24 *** markvoelker has quit IRC
10:31 *** ahosam has joined #openstack-operators
11:04 *** electrofelix has joined #openstack-operators
11:21 *** markvoelker has joined #openstack-operators
11:25 *** takamatsu has quit IRC
11:31 *** takamatsu has joined #openstack-operators
11:45 *** ahosam has quit IRC
11:55 *** markvoelker has quit IRC
12:00 *** jamesmcarthur has joined #openstack-operators
12:04 *** jamesmcarthur has quit IRC
12:06 *** takamatsu has quit IRC
12:23 *** gkadam_ has joined #openstack-operators
12:25 *** gkadam has quit IRC
12:28 *** gkadam_ has quit IRC
12:28 *** gkadam has joined #openstack-operators
12:52 *** markvoelker has joined #openstack-operators
13:24 *** markvoelker has quit IRC
13:25 *** gkadam_ has joined #openstack-operators
13:28 *** gkadam has quit IRC
13:35 *** blake has joined #openstack-operators
13:40 *** blake has quit IRC
13:48 *** jamesmcarthur has joined #openstack-operators
13:56 *** takamatsu has joined #openstack-operators
14:04 *** jamesmcarthur has quit IRC
14:29 *** mriedem has joined #openstack-operators
15:15 *** VW_ has joined #openstack-operators
15:29 *** mriedem is now known as mriedem_afk
15:39 *** ahosam has joined #openstack-operators
15:59 *** aojea has quit IRC
16:05 *** jamesmcarthur has joined #openstack-operators
16:10 *** jamesmcarthur has quit IRC
16:20 *** gkadam_ has quit IRC
16:22 *** mriedem_afk is now known as mriedem
16:30 *** blake has joined #openstack-operators
16:30 *** blake has quit IRC
16:46 *** ahosam has quit IRC
17:09 *** gyee has joined #openstack-operators
17:24 *** jamesmcarthur has joined #openstack-operators
18:06 *** derekh has quit IRC
18:07 *** VW_ has quit IRC
18:24 *** jamesmcarthur has quit IRC
19:12 *** electrofelix has quit IRC
19:36 *** markvoelker has joined #openstack-operators
19:37 *** jamesmcarthur has joined #openstack-operators
19:40 *** markvoelker has quit IRC
19:41 *** dtrainor has quit IRC
19:48 *** dtrainor has joined #openstack-operators
19:50 *** jamesmcarthur has quit IRC
19:52 *** jamesmcarthur has joined #openstack-operators
19:57 *** klindgren has joined #openstack-operators
20:00 *** markvoelker has joined #openstack-operators
20:00 *** jamesmcarthur has quit IRC
20:11 *** slaweq has quit IRC
20:25 *** jamesmcarthur has joined #openstack-operators
20:28 *** slaweq has joined #openstack-operators
20:29 *** jamesmcarthur has quit IRC
20:53 *** jamesmcarthur has joined #openstack-operators
21:38 *** markvoelker has quit IRC
21:38 *** markvoelker has joined #openstack-operators
21:43 *** markvoelker has quit IRC
21:48 *** jamesmcarthur has quit IRC
21:48 *** jamesmcarthur has joined #openstack-operators
21:57 *** rcernin has joined #openstack-operators
22:01 *** markvoelker has joined #openstack-operators
22:03 *** jamesmcarthur has quit IRC
22:33 *** mriedem is now known as mriedem_afk
22:41 *** slaweq has quit IRC
22:43 *** VW_ has joined #openstack-operators
22:52 *** slaweq has joined #openstack-operators
22:57 *** slaweq has quit IRC
