Monday, 2021-04-12

*** tosky has quit IRC  00:00
*** dwilde has joined #openstack-ansible  00:03
*** dwilde has quit IRC  00:19
*** dwilde has joined #openstack-ansible  01:33
*** dwilde has quit IRC  01:48
*** evrardjp has quit IRC  02:33
*** evrardjp has joined #openstack-ansible  02:33
*** cloudnull8 has quit IRC  03:02
*** cloudnull8 has joined #openstack-ansible  03:02
*** fridtjof[m] has quit IRC  04:20
*** jrosser has quit IRC  04:20
*** jrosser has joined #openstack-ansible  04:34
*** fridtjof[m] has joined #openstack-ansible  04:38
*** SiavashSardari has joined #openstack-ansible  04:52
*** miloa has joined #openstack-ansible  05:34
*** SiavashSardari18 has joined #openstack-ansible  05:34
*** miloa has quit IRC  05:35
*** SiavashSardari has quit IRC  05:37
*** dmsimard has quit IRC  06:48
*** dmsimard has joined #openstack-ansible  06:51
*** luksky has joined #openstack-ansible  07:01
*** andrewbonney has joined #openstack-ansible  07:08
<jrosser> morning  07:31
*** dmsimard has quit IRC  07:32
*** dmsimard has joined #openstack-ansible  07:33
*** tosky has joined #openstack-ansible  07:35
*** jbadiapa has joined #openstack-ansible  07:37
<jonher> morning, https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/784885 originated from me seeing the deprecation notice in another job, but it looks like we don't have the same version of tempest in these tests: "tempest run: error: unrecognized arguments: --include-list"  07:47
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Test dashboard only for horizon scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/782376  07:53
<jrosser> jonher: there's a couple of ways you can look at the tempest version, it is in "/openstack/venvs/tempest-22.0.0.0rc2.dev138/bin" -> 138 commits after the 22.0.0.0rc2(?) tag  07:55
<jrosser> but ultimately the thing that defines the version installed should be this https://github.com/openstack/openstack-ansible/blob/master/playbooks/defaults/repo_packages/openstack_testing.yml#L20  07:55
<jrosser> (for master branch at least, stable branches are a little different)  07:56
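As background, entries in that repo_packages file generally take the shape sketched below; the variable names follow the tempest pattern used there, but the SHA is a placeholder rather than the actual pin.

    # playbooks/defaults/repo_packages/openstack_testing.yml (rough sketch; placeholder SHA)
    tempest_git_repo: https://opendev.org/openstack/tempest
    tempest_git_install_branch: 0000000000000000000000000000000000000000  # pinned commit, bumped by "Bump SHAs" patches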
* jrosser puzzled at os_keystone breakage  07:59
<jonher> the release where it was announced as deprecated is from 2020-12-24 so the pinned SHA from 2021-01-17 should have the change, hmm  07:59
<jonher> the commit* https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/releasenotes/notes/Inclusive-jargon-17621346744f0cf4.yaml  08:00
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Test dashboard only for horizon scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/782376  08:01
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:01
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:02
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:03
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs for master  https://review.opendev.org/c/openstack/openstack-ansible/+/785800  08:03
*** rpittau|afk is now known as rpittau  08:04
*** manti has quit IRC  09:00
*** SiavashSardari has joined #openstack-ansible  10:19
<admin0> morning  10:33
<noonedeadpunk> o/  10:34
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Map dbaas and lbaas with role defaults  https://review.opendev.org/c/openstack/openstack-ansible/+/784113  10:45
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Map dbaas and lbaas with role defaults  https://review.opendev.org/c/openstack/openstack-ansible/+/784113  10:48
*** macz_ has joined #openstack-ansible  10:53
*** macz_ has quit IRC  10:57
*** sri_ has quit IRC  11:04
*** csmart has joined #openstack-ansible  11:35
*** rh-jelabarre has joined #openstack-ansible  12:17
*** SiavashSardari has quit IRC  12:30
*** manti has joined #openstack-ansible  12:48
*** cloudnull8 is now known as cloudnull  12:48
*** spatel_ has joined #openstack-ansible  13:01
*** spatel_ is now known as spatel  13:01
*** luksky has quit IRC  13:11
*** luksky has joined #openstack-ansible  13:11
*** luksky has quit IRC  13:12
*** luksky has joined #openstack-ansible  13:12
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:31
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:31
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: [doc] Document how to use separate RabbitMQ cluster  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784781  13:34
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:45
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:46
<openstackgerrit> Merged openstack/openstack-ansible-os_trove master: Add image upload option  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784372  13:55
<openstackgerrit> Merged openstack/openstack-ansible-openstack_hosts master: Decrease TCP retries in case of VIP failover  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/778028  14:17
*** macz_ has joined #openstack-ansible  14:35
*** macz_ has quit IRC  14:40
<manti> I have a problem with lxc containers not having any services. I've run the setup-hosts and setup-infrastructure playbooks successfully, but when comparing with an AIO setup, for example the cinder-api container does not have anything listening on tcp port 8776, while the AIO setup has uwsgi on tcp port 8776  14:57
*** dpawlik has quit IRC  14:58
<manti> Same with the horizon container, there is no apache2 etc  14:58
<manti> It's like the containers have been created, but the actual services have not been installed  14:59
<jrosser> manti: setup-hosts creates the containers etc; in this context "infrastructure" is rabbitmq/galera/haproxy  14:59
<jrosser> you need to run setup-openstack.yml to lay down the actual openstack services  14:59
<manti> ah, I assumed the services should already be there, because haproxy is unhappy: the healthchecks are failing since there is nothing that could respond to them  15:01
*** macz_ has joined #openstack-ansible  15:01
<jrosser> manti: that's expected, and those healthchecks should come good one by one as the services are deployed  15:03
<manti> But I can't run setup-openstack because the task "os_keystone : Create database for service" fails with Exception message: (2013, 'Lost connection to MySQL server during query'), which I kind of assumed would be caused by haproxy  15:06
<manti> If I go to the galera container I can run mysql and access the database, so the problem is clearly in the connections  15:07
<jrosser> if you can paste the error message at paste.openstack.org (with the whole ansible task headers) then we can try to find out what's going on  15:10
<manti> http://paste.openstack.org/show/804403/  15:12
<manti> Running the playbook with -vv or more doesn't give any extra info  15:15
<jrosser> what is 10.254.17.84 ?  15:16
<manti> infra1_utility_container-d225f98e  15:17
<jrosser> in the utility container you should have the mysql client, can you check that it can connect to the database?  15:19
<manti> "ERROR 2013 (HY000): Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 11" - that doesn't work  15:20
<manti> correction, it works when I give the host  15:21
<manti> mysql -h infra1_galera_container-8862d63d and mysql -h 10.254.17.117 both work  15:21
<manti> mysql -h 10.0.3.112 does not work  15:22
<jrosser> can i check which branch you have deployed?  15:23
<manti> stable/victoria  15:23
<jrosser> it should not be necessary to specify the node http://paste.openstack.org/show/804404/  15:23
<jrosser> there should be a /root/.my.cnf file on the utility container which has the IP address of the haproxy internal endpoint  15:25
<jrosser> the connection should be client > loadbalancer > active galera node  15:25
<jrosser> so i think that the next thing to check is if haproxy is seeing the galera nodes as up or down  15:26
<admin0> how do you troubleshoot/fix it when 2 routers (3x in HA) are active at the same time and ping-pong the IP every x seconds?  15:27
<manti> there is a /root/.my.cnf file on the utility container and it has the IP address that is used in the haproxy.conf bind clauses (like frontend galera-front-1 bind 10.254.17.11:3306), so I guess that is the haproxy internal endpoint?  15:28
<manti> but I think haproxy does not see the galera node as up, since there is "Server galera-back/infra1_galera_container-8862d63d is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." and "backend galera-back has no server available!"  15:30
<manti> I assumed those errors in haproxy come because the healthcheck fails  15:31
<jrosser> for a real-world deployment i would expect the ip in .my.cnf to be the one specified in internal_lb_vip_address, here https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.aio#L24  15:31
<jrosser> for galera the mysql protocol is on port 3306, but the healthcheck is done with an http service on port 9200  15:32
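For reference, the VIP being discussed is set under global_overrides in openstack_user_config.yml; a minimal sketch (the addresses are illustrative, not taken from this deployment):

    # /etc/openstack_deploy/openstack_user_config.yml (sketch; addresses illustrative)
    global_overrides:
      internal_lb_vip_address: 10.254.17.11    # haproxy internal VIP, ends up in /root/.my.cnf
      external_lb_vip_address: 203.0.113.10    # public-facing VIP (example address)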
<manti> I had to check, but yeah, the IP in .my.cnf is the same as internal_lb_vip_address  15:34
<jrosser> root@haproxy1-utility-container-ec8755d0:~# curl 10.11.128.95:9200  15:34
<jrosser> Percona XtraDB Cluster Node is synced.  15:34
<jrosser> ^ that is what determines if haproxy is happy with each galera node  15:34
<manti> well, it's not then: root@infra1:~# curl 10.254.17.117:9200  15:36
<manti> curl: (56) Recv failure: Connection reset by peer  15:36
<jrosser> hmm so this could be some network connectivity issue  15:38
<jrosser> time to do basic ping / tcpdump checks between the haproxy node and galera containers  15:38
<manti> ping is working, both with IP and name  15:39
<jrosser> the galera status check service on port 9200 is run with xinetd on all the galera hosts  15:40
<manti> I can see that xinetd runs mysqlchk, which has an only_from definition. Is that list of IPs used to determine from where connections are allowed?  15:44
<jrosser> yes, that's correct  15:45
<jrosser> it should be populated with this https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/galera_all.yml#L33-L39  15:45
<manti> There's my problem! It doesn't contain the right IP  15:49
<jrosser> ahha, interesting  15:50
<jrosser> it should get the IP of all your galera hosts and all your haproxy hosts from this -> groups['galera_all'] | union(groups['haproxy'])  15:51
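Roughly, the idea is the sketch below (not the verbatim upstream default, and the hostvar used to look up each address is an assumption): the group_vars default walks the union of the galera_all and haproxy groups and joins their addresses into the space-separated string that lands in the xinetd only_from line.

    # Illustrative sketch of how galera_monitoring_allowed_source is composed; the real
    # definition lives in inventory/group_vars/galera_all.yml and 'management_address'
    # is an assumed hostvar name
    galera_monitoring_allowed_source: >-
      {{
        (groups['galera_all'] | union(groups['haproxy']))
        | map('extract', hostvars, 'management_address')
        | list
        | union(['127.0.0.1'])
        | join(' ')
      }}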
<admin0> manti, check the MTU also  15:55
<admin0> i also had similar issues when my MTU was not in sync  15:55
<admin0> which you cannot determine by ping  15:55
<admin0> if you do ping, try to ping with say 65000 bytes from each instance  15:55
<manti> I have a feeling this is happening because the internal_lb_vip_address is 10.254.17.11, but the rest of /etc/openstack_deploy/openstack_user_config.yml has the infra and compute nodes defined with other IPs  15:56
<admin0> is .11 even up ?  15:56
<manti> yes it is  15:56
<admin0> a vip is different from the other nodes .. but ideally in some network that each controller can reach  15:56
<admin0> so you cannot have 10.x in the controllers and 192.x in the vip  15:57
<admin0> the controllers (default routing table) must be able to reach this vip ip  15:57
<admin0> so ideally it's in some common network, or in a different vlan routed via a router  15:57
<manti> the deployment host is in 172.x and can reach the controller and compute nodes via 172.x, and now it is this 172.x that is shown in the mysqlchk only_from definition  15:58
<manti> but it clearly should have the 10.x address  15:58
<manti> my openstack_user_config.yml has the 172.x defined for infra_hosts and also for the haproxy host, so that is how it has ended up in the mysqlchk config  16:01
<jrosser> manti: why is the internal_vip on a different CIDR than the internal mgmt network? usually it would be on the same network  16:03
*** jamesdenton has quit IRC  16:03
<manti> the internal mgmt network is 10.254.17.0/24 and the internal vip is 10.254.17.11, so they are on the same network  16:04
* manti < https://matrix.org/_matrix/media/r0/download/matrix.org/bCtVfntJUpdMwsmRxGxYQpGb/message.txt >  16:06
<jrosser> and you've got those 10.254.17 IPs on the hosts and eth1 of the containers where needed  16:08
<jrosser> but 172.x is what you want to use to ssh for out-of-band management and ansible?  16:08
<manti> yes  16:10
<jrosser> ok, so i think in this case you are going to need to put an override for galera_monitoring_allowed_source in /etc/openstack_deploy/user_variables.yml  16:11
<jrosser> it's completely fine to have a separate network for your host management/ansible that's not the openstack mgmt network  16:11
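A minimal sketch of that override, assuming a 10.254.17.0/24 mgmt network and a 172.16.0.0/16 deploy/out-of-band network (both ranges illustrative; adjust to your own environment):

    # /etc/openstack_deploy/user_variables.yml (sketch; networks illustrative)
    galera_monitoring_allowed_source: "10.254.17.0/24 172.16.0.0/16 127.0.0.1"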
<manti> ok, that galera_monitoring_allowed_source sounds like it'll solve my problem  16:13
<jrosser> but it will be necessary to make overrides in some cases because of issues like this - you can make whatever network setup you want, but going away from the stock config will require some overrides  16:13
<jrosser> fwiw i have 3 deployments set up just like this with a separate management network, it's completely fine  16:14
<openstackgerrit> Merged openstack/openstack-ansible stable/victoria: Bump SHAs for stable/victoria  https://review.opendev.org/c/openstack/openstack-ansible/+/785803  16:14
<jrosser> just not necessarily working out of the box without some extra settings  16:14
<manti> that is good to know also, I did notice that the documentation does not recommend this kind of setup, as it says the deployment host should be within the mgmt network (if I recall correctly)  16:15
<jrosser> i think it's more like there is a reference deployment assumed throughout the documentation  16:18
*** macz_ has quit IRC  16:19
*** jamesdenton has joined #openstack-ansible  16:30
<openstackgerrit> Merged openstack/openstack-ansible stable/train: Bump SHAs for stable/train  https://review.opendev.org/c/openstack/openstack-ansible/+/785801  16:33
<openstackgerrit> Merged openstack/openstack-ansible stable/ussuri: Bump SHAs for stable/ussuri  https://review.opendev.org/c/openstack/openstack-ansible/+/785802  16:33
*** rpittau is now known as rpittau|afk  16:59
*** macz_ has joined #openstack-ansible  17:15
*** macz_ has quit IRC  17:17
*** macz_ has joined #openstack-ansible  17:17
*** MrClayPole has quit IRC  17:35
*** MrClayPole has joined #openstack-ansible  17:35
*** andrewbonney has quit IRC  18:47
*** juanoterocas has joined #openstack-ansible  19:15
*** snapdeal has joined #openstack-ansible  19:43
*** juanoterocas has quit IRC  19:59
*** juanoterocas has joined #openstack-ansible  20:00
*** devtolu1__ has joined #openstack-ansible  20:04
*** juanoterocas has quit IRC  20:04
*** snapdeal has quit IRC  20:46
*** macz_ has quit IRC  20:47
*** devtolu1__ has quit IRC  20:56
*** juanoterocas has joined #openstack-ansible  20:56
*** bjoernt has joined #openstack-ansible  21:11
*** jralbert has joined #openstack-ansible  21:25
*** spatel has quit IRC  21:26
<jralbert> Having just upgraded to Train, I'm aware from the release notes that placement is removed from the nova role and gets its own role and container; however I see that the nova-placement-api service is still defined and running from the old Stein nova venv (although evidently receiving no requests). Should I just shut down and mask the service? Should there be a play in the upgrade to clean up this service?  21:28
*** spatel_ has joined #openstack-ansible  21:29
*** spatel_ is now known as spatel  21:29
*** bjoernt has quit IRC  21:37
*** bjoernt has joined #openstack-ansible  21:38
<jrosser> jralbert: i think that the clean up is left up to you, i'd expect the haproxy backend for nova-placement-api to be removed by the upgrade playbooks as here https://github.com/openstack/openstack-ansible/blob/stable/train/scripts/run-upgrade.sh#L185-L186  21:39
<jrosser> shutting down and masking the service seems like a reasonable next step  21:40
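A hedged sketch of that cleanup as a one-off play, assuming the leftover unit is still called nova-placement-api and lives on the nova api containers (the group name below is an assumption, not an upstream playbook):

    # One-off cleanup sketch (not part of openstack-ansible itself)
    - hosts: nova_api_os_compute   # assumed group; point it at wherever the old service runs
      tasks:
        - name: Stop, disable and mask the leftover nova-placement-api service
          systemd:
            name: nova-placement-api
            state: stopped
            enabled: false
            masked: true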
<jralbert> Yes, the haproxy backend went away, I was just surprised in our post-install review to still see 19-release binaries running in the nova container and then realized they're from the deprecated placement service  21:41
*** bjoernt has quit IRC  21:41
*** bjoernt has joined #openstack-ansible  21:41
<jrosser> jralbert: were you previously having trouble during your upgrade with the venvs being built repeatedly for each node?  21:44
*** macz_ has joined #openstack-ansible  21:44
*** sshnaidm|pto has quit IRC  21:44
<jralbert> that was me, yes  21:44
<jrosser> did you manage to find what was happening there?  21:45
<jralbert> Nope, we tried several variations on limits, including having the repo containers in the limit, and saw no change  21:46
<jralbert> at that point we just directed osa to pull nova and neutron source from github and pushed through  21:46
<jrosser> i was thinking that was probably a factor in the upgrade taking a very long time  21:47
<jralbert> our goal was to run a set of limited plays so we could upgrade infra, control, and compute planes independently and thereby limit the outage to our clients  21:47
<jrosser> yes, understood  21:47
<jralbert> and that was largely successful, except that compute was heavy to get through  21:47
*** spatel has quit IRC  21:47
<jralbert> it was very late, and I was very tired, but I couldn't see clearly in the venv build role where it decides what host to delegate to  21:48
<jralbert> there's a task that identifies the build host, and it does name a repo container, but then the build still happens on the compute node  21:48
<jrosser> yeah, it's kind of tortuous to find that  21:48
<jrosser> the other thing is that previously gathered facts can time out after 24 hours  21:49
<jralbert> but what fact would change the delegation behaviour?  21:50
<jrosser> oh, well it uses the repo server cpu architecture and operating system version when selecting the one to use to build wheels  21:51
<jrosser> and silently falls back to building on the target host instead if a match isn't found  21:51
<jrosser> this should all be handled fine, but i just wonder if there's something underlying which made it always fail to find a suitable repo server for you  21:53
<jralbert> Is there a way to add some debug plays to illustrate what it's picking?  21:54
<jrosser> this is the interesting bit https://github.com/openstack/ansible-role-python_venv_build/blob/stable/train/tasks/python_venv_wheel_build.yml#L16-L24  21:57
<jrosser> it should be gathering facts about the repo servers  21:57
<jrosser> then evaluating {{ venv_build_host }} to decide which to use  21:57
<jrosser> if it's possible to reproduce some sort of "run nova/neutron playbook limited to one compute node" you might temporarily stick this in the python_venv_build role just after the "gather build target facts" task http://paste.openstack.org/show/804418/  22:02
<jrosser> which will lead you to here https://github.com/openstack/ansible-role-python_venv_build/blob/dac99008d5c2473ce82f83559024ce0c62e78a9f/defaults/main.yml#L121  22:03
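The paste itself isn't reproduced here, but a temporary debug task along these lines (a sketch of the idea, not the actual paste content) would show what the role resolved:

    # Temporary debug sketch for the python_venv_build role, placed just after the
    # "gather build target facts" task; remove once the investigation is done
    - name: Show which host the wheel build would be delegated to
      debug:
        msg:
          - "venv_build_host: {{ venv_build_host }}"
          - "venv_build_targets: {{ venv_build_targets }}"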
<jrosser> anyway, it's kinda late here now, but if the opportunity to understand what was happening comes up i'd be interested to know if we have something to fix  22:05
<jralbert> Ah, okay. Perhaps of interest, here's a little chunk of the log from a limited build of horizon on one of our control nodes, showing a repo container selected to gather facts from, and then delegation to the horizon container itself to do the build anyway: http://paste.openstack.org/show/804419/  22:05
<jralbert> I will see about trying to get some debug info as you describe  22:06
<jrosser> i think printing venv_build_targets might also be enlightening if you can make it misbehave as before  22:09
<jrosser> jralbert: reading your paste carefully, it looks like what you show with horizon is the same lack of delegation, so whatever is easiest for you to add some debug to will be fine; it does not need to be a compute node  22:23
<jralbert> yes, I think I can trigger this anywhere in our environment. I'll do some testing when I have a bit of time and let you know what I find out  22:24
<jrosser> great, thank you  22:24
*** luksky has quit IRC  22:49
<openstackgerrit> Michael Soto proposed openstack/ansible-hardening master: Added pam_auth_password to nullok check  https://review.opendev.org/c/openstack/ansible-hardening/+/785984  22:52
*** juanoterocas has quit IRC  22:59
*** tosky has quit IRC  23:24
*** macz_ has quit IRC  23:58
