Monday, 2021-04-12

*** tosky has quit IRC  00:00
*** dwilde has joined #openstack-ansible  00:03
*** dwilde has quit IRC  00:19
*** dwilde has joined #openstack-ansible  01:33
*** dwilde has quit IRC  01:48
*** evrardjp has quit IRC  02:33
*** evrardjp has joined #openstack-ansible  02:33
*** cloudnull8 has quit IRC  03:02
*** cloudnull8 has joined #openstack-ansible  03:02
*** fridtjof[m] has quit IRC  04:20
*** jrosser has quit IRC  04:20
*** jrosser has joined #openstack-ansible  04:34
*** fridtjof[m] has joined #openstack-ansible  04:38
*** SiavashSardari has joined #openstack-ansible  04:52
*** miloa has joined #openstack-ansible  05:34
*** SiavashSardari18 has joined #openstack-ansible  05:34
*** miloa has quit IRC  05:35
*** SiavashSardari has quit IRC  05:37
*** dmsimard has quit IRC  06:48
*** dmsimard has joined #openstack-ansible  06:51
*** luksky has joined #openstack-ansible  07:01
*** andrewbonney has joined #openstack-ansible  07:08
<jrosser> morning  07:31
*** dmsimard has quit IRC  07:32
*** dmsimard has joined #openstack-ansible  07:33
*** tosky has joined #openstack-ansible  07:35
*** jbadiapa has joined #openstack-ansible  07:37
<jonher> morning, https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/784885 originated from me seeing the deprecation notice in another job, but it looks like we don't have the same version of tempest in these tests: "tempest run: error: unrecognized arguments: --include-list"  07:47
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Test dashboard only for horizon scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/782376  07:53
<jrosser> jonher: there's a couple of ways you can look at the tempest version, it is in "/openstack/venvs/tempest-22.0.0.0rc2.dev138/bin" -> 138 commits after the 22.0.0.0rc2(?) tag  07:55
<jrosser> but ultimately the thing that defines the version installed should be this https://github.com/openstack/openstack-ansible/blob/master/playbooks/defaults/repo_packages/openstack_testing.yml#L20  07:55
<jrosser> (for master branch at least, stable branches are a little different)  07:56
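As background, entries in that repo_packages file generally take the shape sketched below; the variable names follow the tempest pattern used there, but the SHA is a placeholder rather than the actual pin.

    # playbooks/defaults/repo_packages/openstack_testing.yml (rough sketch; placeholder SHA)
    tempest_git_repo: https://opendev.org/openstack/tempest
    tempest_git_install_branch: 0000000000000000000000000000000000000000  # pinned commit, bumped by "Bump SHAs" patches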
* jrosser puzzled at os_keystone breakage  07:59
<jonher> the release where it was announced as deprecated is from 2020-12-24 so the pinned SHA from 2021-01-17 should have the change, hmm  07:59
<jonher> the commit* https://github.com/openstack/tempest/blob/7e96c8e854386f43604ad098a6ec7606ee676145/releasenotes/notes/Inclusive-jargon-17621346744f0cf4.yaml  08:00
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Test dashboard only for horizon scenario  https://review.opendev.org/c/openstack/openstack-ansible/+/782376  08:01
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:01
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:02
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Set buster jobs to NV  https://review.opendev.org/c/openstack/openstack-ansible/+/782681  08:03
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs for master  https://review.opendev.org/c/openstack/openstack-ansible/+/785800  08:03
*** rpittau|afk is now known as rpittau  08:04
*** manti has quit IRC  09:00
*** SiavashSardari has joined #openstack-ansible  10:19
<admin0> morning  10:33
<noonedeadpunk> o/  10:34
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Map dbaas and lbaas with role defaults  https://review.opendev.org/c/openstack/openstack-ansible/+/784113  10:45
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Map dbaas and lbaas with role defaults  https://review.opendev.org/c/openstack/openstack-ansible/+/784113  10:48
*** macz_ has joined #openstack-ansible  10:53
*** macz_ has quit IRC  10:57
*** sri_ has quit IRC  11:04
*** csmart has joined #openstack-ansible  11:35
*** rh-jelabarre has joined #openstack-ansible  12:17
*** SiavashSardari has quit IRC  12:30
*** manti has joined #openstack-ansible  12:48
*** cloudnull8 is now known as cloudnull  12:48
*** spatel_ has joined #openstack-ansible  13:01
*** spatel_ is now known as spatel  13:01
*** luksky has quit IRC  13:11
*** luksky has joined #openstack-ansible  13:11
*** luksky has quit IRC  13:12
*** luksky has joined #openstack-ansible  13:12
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:31
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:31
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: [doc] Document how to use separate RabbitMQ cluster  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784781  13:34
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:45
<openstackgerrit> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Update trove configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784571  13:46
<openstackgerrit> Merged openstack/openstack-ansible-os_trove master: Add image upload option  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/784372  13:55
<openstackgerrit> Merged openstack/openstack-ansible-openstack_hosts master: Decrease TCP retries in case of VIP failover  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/778028  14:17
*** macz_ has joined #openstack-ansible  14:35
*** macz_ has quit IRC  14:40
<manti> I have a problem with lxc containers not having any services. I've run the setup-hosts and setup-infrastructure playbooks successfully, but when comparing with an AIO setup, for example the cinder-api container does not have anything listening on tcp port 8776, while the AIO setup has uwsgi on tcp port 8776  14:57
*** dpawlik has quit IRC  14:58
<manti> Same with the horizon container, there is no apache2 etc  14:58
<manti> It's like the containers have been created, but the actual services have not been installed  14:59
<jrosser> manti: setup-hosts creates the containers etc; in this context "infrastructure" is rabbitmq/galera/haproxy  14:59
<jrosser> you need to run setup-openstack.yml to lay down the actual openstack services  14:59
<manti> ah, I assumed the services should already be there, because haproxy is unhappy: the healthchecks are failing since there is nothing that could respond to them  15:01
*** macz_ has joined #openstack-ansible  15:01
<jrosser> manti: that's expected, and those healthchecks should come good one by one as the services are deployed  15:03
<manti> But I can't run setup-openstack because the task "os_keystone : Create database for service" fails with Exception message: (2013, 'Lost connection to MySQL server during query'), which I kind of assumed would be caused by haproxy  15:06
<manti> If I go to the galera container I can run mysql and access the database, so the problem is clearly in the connections  15:07
<jrosser> if you can paste the error message at paste.openstack.org (with the whole ansible task headers) then we can try to find out what's going on  15:10
<manti> http://paste.openstack.org/show/804403/  15:12
<manti> Running the playbook with -vv or more doesn't give any extra info  15:15
<jrosser> what is 10.254.17.84 ?  15:16
<manti> infra1_utility_container-d225f98e  15:17
<jrosser> in the utility container you should have the mysql client, can you check that it can connect to the database?  15:19
<manti> "ERROR 2013 (HY000): Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 11" - that doesn't work  15:20
<manti> correction, it works when I give the host  15:21
<manti> mysql -h infra1_galera_container-8862d63d and mysql -h 10.254.17.117 both work  15:21
<manti> mysql -h 10.0.3.112 does not work  15:22
<jrosser> can i check which branch you have deployed?  15:23
<manti> stable/victoria  15:23
<jrosser> it should not be necessary to specify the node http://paste.openstack.org/show/804404/  15:23
<jrosser> there should be a /root/.my.cnf file on the utility container which has the IP address of the haproxy internal endpoint  15:25
<jrosser> the connection should be client > loadbalancer > active galera node  15:25
<jrosser> so i think that the next thing to check is if haproxy is seeing the galera nodes as up or down  15:26
<admin0> how do you troubleshoot/fix it when 2 routers (3x in HA) are active at the same time and ping-pong the IP every x seconds?  15:27
<manti> there is a /root/.my.cnf file on the utility container and it has the IP address that is used in the haproxy.conf bind clauses (like frontend galera-front-1 bind 10.254.17.11:3306), so I guess that is the haproxy internal endpoint?  15:28
<manti> but I think haproxy does not see the galera node as up, since there is "Server galera-back/infra1_galera_container-8862d63d is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue." and "backend galera-back has no server available!"  15:30
<manti> I assumed those errors in haproxy come because the healthcheck fails  15:31
<jrosser> for a real-world deployment i would expect the ip in .my.cnf to be the one specified in internal_lb_vip_address, here https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.aio#L24  15:31
<jrosser> for galera the mysql protocol is on port 3306, but the healthcheck is done with an http service on port 9200  15:32
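For reference, the VIP being discussed is set under global_overrides in openstack_user_config.yml; a minimal sketch (the addresses are illustrative, not taken from this deployment):

    # /etc/openstack_deploy/openstack_user_config.yml (sketch; addresses illustrative)
    global_overrides:
      internal_lb_vip_address: 10.254.17.11    # haproxy internal VIP, ends up in /root/.my.cnf
      external_lb_vip_address: 203.0.113.10    # public-facing VIP (example address)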
<manti> I had to check, but yeah, the IP in .my.cnf is the same as internal_lb_vip_address  15:34
<jrosser> root@haproxy1-utility-container-ec8755d0:~# curl 10.11.128.95:9200  15:34
<jrosser> Percona XtraDB Cluster Node is synced.  15:34
<jrosser> ^ that is what determines if haproxy is happy with each galera node  15:34
<manti> well, it's not then: root@infra1:~# curl 10.254.17.117:9200  15:36
<manti> curl: (56) Recv failure: Connection reset by peer  15:36
<jrosser> hmm so this could be some network connectivity issue  15:38
<jrosser> time to do basic ping / tcpdump checks between the haproxy node and galera containers  15:38
<manti> ping is working, both with IP and name  15:39
<jrosser> the galera status check service on port 9200 is run with xinetd on all the galera hosts  15:40
<manti> I can see that xinetd runs mysqlchk, which has an only_from definition. Is that list of IPs used to determine from where connections are allowed?  15:44
<jrosser> yes, that's correct  15:45
<jrosser> it should be populated with this https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/galera_all.yml#L33-L39  15:45
<manti> There's my problem! It doesn't contain the right IP  15:49
<jrosser> ahha, interesting  15:50
<jrosser> it should get the IP of all your galera hosts and all your haproxy hosts from this -> groups['galera_all'] | union(groups['haproxy'])  15:51
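Roughly, the idea is the sketch below (not the verbatim upstream default, and the hostvar used to look up each address is an assumption): the group_vars default walks the union of the galera_all and haproxy groups and joins their addresses into the space-separated string that lands in the xinetd only_from line.

    # Illustrative sketch of how galera_monitoring_allowed_source is composed; the real
    # definition lives in inventory/group_vars/galera_all.yml and 'management_address'
    # is an assumed hostvar name
    galera_monitoring_allowed_source: >-
      {{
        (groups['galera_all'] | union(groups['haproxy']))
        | map('extract', hostvars, 'management_address')
        | list
        | union(['127.0.0.1'])
        | join(' ')
      }}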
<admin0> manti, check the MTU also  15:55
<admin0> i also had similar issues when my MTU was not in sync  15:55
<admin0> which you cannot determine by ping  15:55
<admin0> if you do ping, try to ping with say 65000 bytes from each instance  15:55
<manti> I have a feeling this is happening because the internal_lb_vip_address is 10.254.17.11, but the rest of /etc/openstack_deploy/openstack_user_config.yml has the infra and compute nodes defined with other IPs  15:56
<admin0> is .11 even up ?  15:56
<manti> yes it is  15:56
<admin0> a vip is different from the other nodes .. but ideally in some network that each controller can reach  15:56
<admin0> so you cannot have 10.x in the controllers and 192.x in the vip  15:57
<admin0> the controllers (default routing table) must be able to reach this vip ip  15:57
<admin0> so ideally it's in some common network, or in a different vlan routed via a router  15:57
<manti> the deployment host is in 172.x and can reach the controller and compute nodes via 172.x, and now it is this 172.x that is shown in the mysqlchk only_from definition  15:58
<manti> but it clearly should have the 10.x address  15:58
<manti> my openstack_user_config.yml has the 172.x defined for infra_hosts and also for the haproxy host, so that is how it has ended up in the mysqlchk config  16:01
<jrosser> manti: why is the internal_vip on a different CIDR than the internal mgmt network? usually it would be on the same network  16:03
*** jamesdenton has quit IRC  16:03
<manti> the internal mgmt network is 10.254.17.0/24 and the internal vip is 10.254.17.11, so they are on the same network  16:04
* manti < https://matrix.org/_matrix/media/r0/download/matrix.org/bCtVfntJUpdMwsmRxGxYQpGb/message.txt >  16:06
<jrosser> and you've got those 10.254.17 IPs on the hosts and eth1 of the containers where needed  16:08
<jrosser> but 172.x is what you want to use to ssh for out-of-band management and ansible?  16:08
<manti> yes  16:10
<jrosser> ok, so i think in this case you are going to need to put an override for galera_monitoring_allowed_source in /etc/openstack_deploy/user_variables.yml  16:11
<jrosser> it's completely fine to have a separate network for your host management/ansible that's not the openstack mgmt network  16:11
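A minimal sketch of that override, assuming a 10.254.17.0/24 mgmt network and a 172.16.0.0/16 deploy/out-of-band network (both ranges illustrative; adjust to your own environment):

    # /etc/openstack_deploy/user_variables.yml (sketch; networks illustrative)
    galera_monitoring_allowed_source: "10.254.17.0/24 172.16.0.0/16 127.0.0.1"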
<manti> ok, that galera_monitoring_allowed_source sounds like it'll solve my problem  16:13
<jrosser> but it will be necessary to make overrides in some cases because of issues like this - you can make whatever network setup you want, but going away from the stock config will require some overrides  16:13
<jrosser> fwiw i have 3 deployments set up just like this with a separate management network, it's completely fine  16:14
<openstackgerrit> Merged openstack/openstack-ansible stable/victoria: Bump SHAs for stable/victoria  https://review.opendev.org/c/openstack/openstack-ansible/+/785803  16:14
<jrosser> just not necessarily working out of the box without some extra settings  16:14
<manti> that is good to know also, I did notice that the documentation does not recommend this kind of setup, as it says the deployment host should be within the mgmt network (if I recall correctly)  16:15
<jrosser> i think it's more like there is a reference deployment assumed throughout the documentation  16:18
*** macz_ has quit IRC  16:19
*** jamesdenton has joined #openstack-ansible  16:30
<openstackgerrit> Merged openstack/openstack-ansible stable/train: Bump SHAs for stable/train  https://review.opendev.org/c/openstack/openstack-ansible/+/785801  16:33
<openstackgerrit> Merged openstack/openstack-ansible stable/ussuri: Bump SHAs for stable/ussuri  https://review.opendev.org/c/openstack/openstack-ansible/+/785802  16:33
*** rpittau is now known as rpittau|afk  16:59
*** macz_ has joined #openstack-ansible  17:15
*** macz_ has quit IRC  17:17
*** macz_ has joined #openstack-ansible  17:17
*** MrClayPole has quit IRC  17:35
*** MrClayPole has joined #openstack-ansible  17:35
*** andrewbonney has quit IRC  18:47
*** juanoterocas has joined #openstack-ansible  19:15
*** snapdeal has joined #openstack-ansible  19:43
*** juanoterocas has quit IRC  19:59
*** juanoterocas has joined #openstack-ansible  20:00
*** devtolu1__ has joined #openstack-ansible  20:04
*** juanoterocas has quit IRC  20:04
*** snapdeal has quit IRC  20:46
*** macz_ has quit IRC  20:47
*** devtolu1__ has quit IRC  20:56
*** juanoterocas has joined #openstack-ansible  20:56
*** bjoernt has joined #openstack-ansible  21:11
*** jralbert has joined #openstack-ansible  21:25
*** spatel has quit IRC  21:26
<jralbert> Having just upgraded to Train, I'm aware from the release notes that placement is removed from the nova role and gets its own role and container; however I see that the nova-placement-api service is still defined and running from the old Stein nova venv (although evidently receiving no requests). Should I just shut down and mask the service? Should there be a play in the upgrade to clean up this service?  21:28
*** spatel_ has joined #openstack-ansible  21:29
*** spatel_ is now known as spatel  21:29
*** bjoernt has quit IRC  21:37
*** bjoernt has joined #openstack-ansible  21:38
<jrosser> jralbert: i think that the clean up is left up to you, i'd expect the haproxy backend for nova-placement-api to be removed by the upgrade playbooks as here https://github.com/openstack/openstack-ansible/blob/stable/train/scripts/run-upgrade.sh#L185-L186  21:39
<jrosser> shutting down and masking the service seems like a reasonable next step  21:40
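A hedged sketch of that cleanup as a one-off play, assuming the leftover unit is still called nova-placement-api and lives on the nova api containers (the group name below is an assumption, not an upstream playbook):

    # One-off cleanup sketch (not part of openstack-ansible itself)
    - hosts: nova_api_os_compute   # assumed group; point it at wherever the old service runs
      tasks:
        - name: Stop, disable and mask the leftover nova-placement-api service
          systemd:
            name: nova-placement-api
            state: stopped
            enabled: false
            masked: true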
<jralbert> Yes, the haproxy backend went away, I was just surprised in our post-install review to still see 19-release binaries running in the nova container and then realized they're from the deprecated placement service  21:41
*** bjoernt has quit IRC  21:41
*** bjoernt has joined #openstack-ansible  21:41
<jrosser> jralbert: were you previously having trouble during your upgrade with the venvs being built repeatedly for each node?  21:44
*** macz_ has joined #openstack-ansible  21:44
*** sshnaidm|pto has quit IRC  21:44
<jralbert> that was me, yes  21:44
<jrosser> did you manage to find what was happening there?  21:45
<jralbert> Nope, we tried several variations on limits, including having the repo containers in the limit, and saw no change  21:46
<jralbert> at that point we just directed osa to pull nova and neutron source from github and pushed through  21:46
<jrosser> i was thinking that was probably a factor in the upgrade taking a very long time  21:47
<jralbert> our goal was to run a set of limited plays so we could upgrade infra, control, and compute planes independently and thereby limit the outage to our clients  21:47
<jrosser> yes, understood  21:47
<jralbert> and that was largely successful, except that compute was heavy to get through  21:47
*** spatel has quit IRC  21:47
<jralbert> it was very late, and I was very tired, but I couldn't see clearly in the venv build role where it decides what host to delegate to  21:48
<jralbert> there's a task that identifies the build host, and it does name a repo container, but then the build still happens on the compute node  21:48
<jrosser> yeah, it's kind of tortuous to find that  21:48
<jrosser> the other thing is that previously gathered facts can time out after 24 hours  21:49
<jralbert> but what fact would change the delegation behaviour?  21:50
<jrosser> oh, well it uses the repo server cpu architecture and operating system version when selecting the one to use to build wheels  21:51
<jrosser> and silently falls back to building on the target host instead if a match isn't found  21:51
<jrosser> this should all be handled fine, but i just wonder if there's something underlying which made it always fail to find a suitable repo server for you  21:53
<jralbert> Is there a way to add some debug plays to illustrate what it's picking?  21:54
<jrosser> this is the interesting bit https://github.com/openstack/ansible-role-python_venv_build/blob/stable/train/tasks/python_venv_wheel_build.yml#L16-L24  21:57
<jrosser> it should be gathering facts about the repo servers  21:57
<jrosser> then evaluating {{ venv_build_host }} to decide which to use  21:57
<jrosser> if it's possible to reproduce some sort of "run nova/neutron playbook limited to one compute node" you might temporarily stick this in the python_venv_build role just after the "gather build target facts" task http://paste.openstack.org/show/804418/  22:02
<jrosser> which will lead you to here https://github.com/openstack/ansible-role-python_venv_build/blob/dac99008d5c2473ce82f83559024ce0c62e78a9f/defaults/main.yml#L121  22:03
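The paste itself isn't reproduced here, but a temporary debug task along these lines (a sketch of the idea, not the actual paste content) would show what the role resolved:

    # Temporary debug sketch for the python_venv_build role, placed just after the
    # "gather build target facts" task; remove once the investigation is done
    - name: Show which host the wheel build would be delegated to
      debug:
        msg:
          - "venv_build_host: {{ venv_build_host }}"
          - "venv_build_targets: {{ venv_build_targets }}"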
<jrosser> anyway, it's kinda late here now, but if the opportunity to understand what was happening comes up i'd be interested to know if we have something to fix  22:05
<jralbert> Ah, okay. Perhaps of interest, here's a little chunk of the log from a limited build of horizon on one of our control nodes, showing a repo container selected to gather facts from, and then delegation to the horizon container itself to do the build anyway: http://paste.openstack.org/show/804419/  22:05
<jralbert> I will see about trying to get some debug info as you describe  22:06
<jrosser> i think printing venv_build_targets might also be enlightening if you can make it misbehave as before  22:09
<jrosser> jralbert: reading your paste carefully, it looks like what you show with horizon is the same lack of delegation, so whatever is easiest for you to add some debug to will be fine; it does not need to be a compute node  22:23
<jralbert> yes, I think I can trigger this anywhere in our environment. I'll do some testing when I have a bit of time and let you know what I find out  22:24
<jrosser> great, thank you  22:24
*** luksky has quit IRC  22:49
<openstackgerrit> Michael Soto proposed openstack/ansible-hardening master: Added pam_auth_password to nullok check  https://review.opendev.org/c/openstack/ansible-hardening/+/785984  22:52
*** juanoterocas has quit IRC  22:59
*** tosky has quit IRC  23:24
*** macz_ has quit IRC  23:58
