Saturday, 2022-02-12

opendevreviewMerged openstack/openstack-ansible-rabbitmq_server master: Remove old repos for Debian  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/82722102:03
opendevreviewMerged openstack/openstack-ansible-rabbitmq_server master: Use journald logging for RabbitMQ  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/82634502:10
jrossernoonedeadpunk: re. 828932 i am unsure, first it would listen on the external vip with the default hosts layout? and also https://opendev.org/openstack/openstack-manuals/src/branch/master/doc/install-guide/source/shared/edit_hosts_file.txt#L22-L2509:29
jrosser127.0.1.1 resolving to the hostname feels just wrong anyway?09:29
mirek186hi, could someone help troubleshooting lost connection in mysql12:47
mirek186random connection to DB via haproxy will fail and then deployment will fail as well12:48
mirek186mysql --defaults-file=/root/.my.cnf12:48
mirek186ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 1112:48
mirek186Feb 12 12:14:48 infra1-utility-container-c5a86fc4 ansible-community.mysql.mysql_db[5955]: Invoked with name=['magnum_service'] login_host=172.16.71.20 login_port=3306 config_file=/root/.my.cnf connect_timeout=30 encoding= collation= state=present single_transaction=False quick=True ignore_tables=[] hex_blob=False force=False master_data=0 skip_lock_tables=False use_shell=False unsafe_login_password=NOT_LOGGING_PARAMETER restrict_config_file=False check_implicit_admin=False config_overrides_defaults=False login_user=None login_password=NOT_LOGGING_PARAMETER login_unix_socket=None client_cert=None client_key=None ca_cert=None check_hostname=None target=None dump_extra_args=None12:49
mirek186I know the issue is around haproxy or galera, I just don't know where to start identifying it12:49
jrossermirek186: first thing I would check is that you are not exceeding the galera max connections limit14:01
jrosseralso please use paste.opendev.org for debug output14:02
jrosserif you’re enabling more services you might need to increase the connection limit14:03
mirek186thanks jrosser, I think that's the issue, looking at haproxy log: Feb 12 14:08:50 srv4-infra-1 haproxy[138066]: Connect() failed for backend galera-back: no free ports.14:09
jrosseroh well that feels different maybe14:26
jrosserthere is a hard limit in the galera config for connections14:27
jrosserno free ports suggests running out of ports on the haproxy node to connect to the galera backend14:27
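One way to check the port-exhaustion theory on the haproxy node is to count sockets by TCP state toward the galera port. The following is a minimal sketch, not something posted in the channel: it assumes a Linux host exposing `/proc/net/tcp`, the helper name `count_states_to_port` is hypothetical, and only a few of the kernel's state codes are mapped to names.

```python
from collections import Counter

# Subset of the TCP state codes that appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
}


def count_states_to_port(proc_net_tcp: str, remote_port: int) -> Counter:
    """Count sockets per TCP state whose remote port equals remote_port."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[2] is the remote address as HEXIP:HEXPORT,
        # fields[3] is the two-digit hex state code.
        port = int(fields[2].rsplit(":", 1)[1], 16)
        if port == remote_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

Calling it as `count_states_to_port(open("/proc/net/tcp").read(), 3306)` on the haproxy node would show whether a large number of TIME_WAIT sockets toward port 3306 is piling up, which is what source-port exhaustion usually looks like.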
mirek186I found following recommendation from haproxy blog: https://www.haproxy.com/blog/haproxy-high-mysql-request-rate-and-tcp-source-port-exhaustion/14:28
mirek186to allow quicker reuse of time_wait ports, plus expand the source port range: the default is anything above 32k, but in a busy env they recommend anything above the reserved 102414:29
jrosseroh well…..14:29
mirek186i'll re-run the deployment and see what happens14:29
jrosserthe openstack services should use a connection pool for the db14:29
jrosserso you should not see a heavy churn of db connections at haproxy14:29
mirek186It's my first time deploying using openstack-ansible so just trying to fix one error at a time14:30
jrosserhowever, hitting the galera backend connection limit may make haproxy think the backend is down, and that can cause an instant 2x requirement on ports as they fail over to another backend14:31
jrosserI would still double check the # connections that galera thinks it has14:31
mirek186I had it set to 409614:32
jrosserdid you get things working ok with the core services before moving on to extras like magnum?14:35
mirek186yes, it seems ok, however as I said those are my first deployments using ansible14:37
jrosserok cool14:37
mirek186In the past all my builds were done using Juju, so I haven't checked all services. Trying to get a clean install of all the components I need14:37
jrosserdoes haproxy think your galera backend goes down?14:39
jrosseranyway - I need to weekend :) make a bug on launchpad if you’re really stuck14:40
jrosseror generally things are quite active here EU time on weekdays14:41
admin1mirek186, try this in user_variables: galera_max_connections: 400014:46
admin1or 1000 .. or 6000 . depends on how big the total cluster is14:47
mirek186thanks guys, I already had it on galera_max_connections: 409614:50
mirek186just wiped out all hosts for redeployment. I've added the following two as recommended by the haproxy blog as well.14:51
mirek186openstack_user_kernel_options:14:51
mirek186  - { key: 'net.ipv4.ip_local_port_range', value: '1025 65000' }14:51
mirek186  - { key: 'net.ipv4.tcp_tw_reuse', value: 1 }14:51
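To put the `net.ipv4.ip_local_port_range` change in perspective, the arithmetic can be sketched as below. This is an illustration rather than anything run in the channel: the helper name `usable_source_ports` is hypothetical, and the default range of `32768 60999` is an assumption about a stock modern Linux kernel, so check `sysctl net.ipv4.ip_local_port_range` on the actual host.

```python
def usable_source_ports(port_range: str) -> int:
    """Return how many ports an inclusive 'low high' sysctl range contains."""
    low, high = (int(part) for part in port_range.split())
    return high - low + 1


# Assumption: default range on a stock modern Linux kernel.
DEFAULT_RANGE = "32768 60999"
# Value set in openstack_user_kernel_options above.
WIDENED_RANGE = "1025 65000"

default_ports = usable_source_ports(DEFAULT_RANGE)  # 28232
widened_ports = usable_source_ports(WIDENED_RANGE)  # 63976
print(default_ports, widened_ports)
```

The limit applies per source IP, destination IP and destination port, so the wider range roughly doubles how many concurrent or lingering connections haproxy can hold toward a single galera backend. It does not address the reason the connections were churning in the first place.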
mirek186I also ran the deployment with --forks 10; maybe if I do the final setup-openstack on defaults I won't hit any limits. Looking at hatop, all galera backends were fine14:52
admin1upgrading from 23 -> 24.0.1 i am getting: virtualenv --no-download --python=python3 --always-copy /openstack/venvs/keystone-24.0.1", "msg": "[Errno 2] No such file or directory: b'virtualenv'", "rc": 2, "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}15:13
admin1doh !15:13
