Thursday, 2021-07-29

*** Guest2352 is now known as prometheanfire00:34
*** prometheanfire is now known as Guest264000:35
*** Guest2640 is now known as prometheanfire00:36
dmsimard> 08:14:33 <evrardjp> jrosser: yes I am not surprised about the "do not install from git sources". But I am more puzzled nowadays by how we managed to make ansible more complex than it should be ...00:48
dmsimardDo you happen to have a link for that? There are a lot of users that install from git (instead of galaxy) and it's well documented here: https://docs.ansible.com/ansible/latest/user_guide/collections_using.html#install-multiple-collections-with-a-requirements-file00:49
dmsimardI tried to find scary warnings but I don't see them00:50
opendevreviewIan Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/80281600:51
dmsimardIf you have pain points/papercuts from stuff like that, feel free to reach out to me, happy to be a liaison via my role in the ansible community team00:51
opendevreviewDavid Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc3 with --diff  https://review.opendev.org/c/openstack/openstack-ansible/+/69663401:57
dmsimardhopefully rc3 is good enough now haha02:01
dmsimardthe good news is that testing ara with OSA has helped uncover various bugs in rc1 and rc202:02
dmsimardthanks <302:02
opendevreviewSatish Patel proposed openstack/openstack-ansible-os_neutron master: Adding https option for calico metadata service  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/80281903:13
evrardjpdmsimard: I have very little experience with collections. I am old school : ) From what I have seen of collections, their downloads and installs are, by default, built from archives of git repositories without the .git directory, i.e. not git repos.07:09
*** rpittau|afk is now known as rpittau07:10
evrardjpMy comment about "how we managed to make ansible more complex than it should be" is a reference to OSA. OSA could be simpler. A few examples: we decided to have a dynamic inventory for one single reason: lxc containers needing a random mac. Another example: we have ansible-role-requirements and ansible-collections-requirements. Ansible has evolved now, and we could leverage the latest ansible features to simplify things. But sadly, 07:13
evrardjpansible itself is becoming more complex too...07:13
evrardjpI am just looking for a way to _simplify_ where I can, to make operations simpler. However, OSA might not be the best first target for making operations simpler :D 07:15
jrosseryou can't install roles and collections to paths-of-your-choice with a single requirements file07:23
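The limitation jrosser describes is why roles and collections end up in separate requirements files. A minimal sketch, assuming OSA's two requirements file names and two hypothetical target directories: it only builds the two `ansible-galaxy` command lines (one per artifact type, each with its own `-p` install path) and does not run them.

```python
from typing import List


def galaxy_install_commands(role_file: str, role_path: str,
                            collection_file: str,
                            collection_path: str) -> List[List[str]]:
    """Build one ansible-galaxy invocation per artifact type.

    Roles and collections each get their own requirements file and their
    own install path (-p), so each type can land in a directory of your
    choice.
    """
    return [
        ["ansible-galaxy", "role", "install",
         "-r", role_file, "-p", role_path],
        ["ansible-galaxy", "collection", "install",
         "-r", collection_file, "-p", collection_path],
    ]


if __name__ == "__main__":
    # Hypothetical install paths, for illustration only.
    for cmd in galaxy_install_commands(
            "ansible-role-requirements.yml", "/etc/ansible/roles",
            "ansible-collection-requirements.yml", "/etc/ansible/collections"):
        print(" ".join(cmd))
```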
jrosserdmsimard: in the last week or so "sivel> fwiw, installing a collection from git, basically is a shortcut for developers, in that ansible-galaxy clones, builds the artifact, and then installs the artifact, throwing away the git clone"07:24
jrosserand "sivel> installing a collection from git is not supposed to be used for production installs fwiw, iirc we document that it should only be used in development, and you should create actual artifacts instead"07:24
jrosserevrardjp: i think that the publishing of collections is much more like pip/pypi than cloning git, if you follow the official way to push things to galaxy07:26
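To make sivel's description concrete, here is a sketch of what "install from git" boils down to, written out as three explicit commands. In a production flow you would typically run the build step once (e.g. in CI), publish the tarball, and have deployments run only the install step against that artifact. The repository URL, paths and collection coordinates are hypothetical, and galaxy.yml is assumed to sit at the repository root; the tarball name follows the `<namespace>-<name>-<version>.tar.gz` convention of `ansible-galaxy collection build`.

```python
from pathlib import PurePosixPath
from typing import List


def git_collection_steps(repo_url: str, ref: str, workdir: str,
                         namespace: str, name: str,
                         version: str) -> List[List[str]]:
    """Spell out clone -> build -> install as separate commands.

    namespace/name/version must match the collection's galaxy.yml, since
    the build step names its tarball <namespace>-<name>-<version>.tar.gz.
    """
    src = PurePosixPath(workdir) / "src"
    dist = PurePosixPath(workdir) / "dist"
    tarball = dist / f"{namespace}-{name}-{version}.tar.gz"
    return [
        # 1. shallow clone of the requested branch or tag
        ["git", "clone", "--depth", "1", "--branch", ref, repo_url, str(src)],
        # 2. build a distributable artifact from the checkout
        ["ansible-galaxy", "collection", "build", str(src),
         "--output-path", str(dist)],
        # 3. install the artifact; the clone is no longer needed
        ["ansible-galaxy", "collection", "install", str(tarball)],
    ]


if __name__ == "__main__":
    for step in git_collection_steps("https://example.org/demo.git", "1.2.0",
                                     "/tmp/work", "acme", "demo", "1.2.0"):
        print(" ".join(step))
```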
admin1o/09:09
depasqualeciao everybody10:53
depasqualeI need help with an issue with the openstack-ansible galera-install playbook10:53
depasqualeI have reported the following bug -> https://bugs.launchpad.net/openstack-ansible/+bug/193832710:53
depasqualecan someone help me with this topic?10:54
jrosserdepasquale: you could paste the output when you run the galera playbook to paste.opendev.org and put the link here?11:20
depasqualejrosser I will re-run right now and give you the output 11:48
depasquale<jrosser> can you please check the following https://paste.opendev.org/show/807787/11:56
jrosserdepasquale: so this time it has run through and completed?11:57
depasqualeyes11:58
depasqualeit is stuck at the stage of creating users... 11:58
depasqualeit will remain in this status for hours... without any further error11:58
jrosseris it ubuntu focal?11:59
depasqualeUbuntu 20.04.1 lts11:59
jrosserfeels like this https://jira.mariadb.org/browse/MDEV-2482912:04
depasqualeuhm... with OSA 22.1.3 I was able to complete the setup-infrastructure playbook12:11
depasqualeit is very strange12:11
depasqualeis the fact that the other containers in infra2 and infra3 have no MariaDB installed consistent with your theory?12:12
jrosserMDEV-24829 is not deterministic12:13
depasqualeok. is there a way to downgrade mariadb to a 10.3.x version in openstack-ansible?12:14
depasqualejust for my understanding12:15
jrosserthe galera hosts are installed sequentially, not in parallel https://github.com/openstack/openstack-ansible/blob/master/playbooks/galera-install.yml#L4412:15
jrosserno it's not possible to downgrade12:15
depasqualeok so do I have any workarounds?12:18
jrossergive me a moment :)12:18
jrossercan you check which version of mariadb is installed?12:19
depasqualeGreat!! :)12:19
depasqualeok let me check12:19
jrosseri would expect 10.5.812:20
depasqualehttps://paste.opendev.org/show/807788/12:20
depasqualethe output of service mysqld status12:20
jrosserthen can you take a look at the output of journalctl -u mariadb12:22
jrosseris the end of the log "normal" or filled with loads of errors about mutex?12:23
depasqualesorry it took some time, I hit a proxy error on paste.opendev.org... 12:28
depasqualeI took just the last lines of my 3k line file12:28
depasqualehttps://paste.opendev.org/show/807790/12:28
depasqualeit seems there are several errors on mutex as you anticipated12:28
jrosserright, so i think if you systemctl restart mariadb12:29
jrosserthen re-try the playbook it is possible it will succeed12:29
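The check jrosser describes (is the tail of `journalctl -u mariadb` "normal", or full of mutex errors?) can be scripted. A rough sketch, assuming only that the stuck state shows up as many log lines containing the word "mutex"; the exact MariaDB message text is not reproduced here, and the `tail`/`threshold` defaults are hypothetical values to tune against your own logs. It only reports; the restart and playbook re-run are left to the operator.

```python
import subprocess
from typing import List


def mutex_lines(log_text: str) -> List[str]:
    """Return the log lines that mention a mutex (case-insensitive)."""
    return [line for line in log_text.splitlines() if "mutex" in line.lower()]


def looks_stuck(log_text: str, tail: int = 50, threshold: int = 10) -> bool:
    """Heuristic: True if the last `tail` lines hold >= `threshold` mutex lines."""
    recent = "\n".join(log_text.splitlines()[-tail:])
    return len(mutex_lines(recent)) >= threshold


def read_mariadb_journal(lines: int = 200) -> str:
    """Fetch the last `lines` journal entries for the mariadb unit."""
    result = subprocess.run(
        ["journalctl", "-u", "mariadb", "--no-pager", "-n", str(lines)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a galera container, `looks_stuck(read_mariadb_journal())` returning True would be the cue to try `systemctl restart mariadb` and then re-run the galera playbook, as suggested above.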
depasqualeok let me try12:29
* jrosser curses recent mariadb releases :(12:29
depasqualeI will restart mariadb and re-execute galera install12:29
depasqualemariadb restart is stuck ahahahah :D 12:31
jrossersometimes it can take a while12:31
depasqualeunbelievable... and depressing! :)12:31
jrosseryeah12:31
jrosser10.5.9 is broken in different ways unfortunately12:32
jrosserthis has been horrible to deal with for us12:32
jrosseroh yes and 10.5.10 doesn't work with cinder properly12:34
depasqualewow! it looks promising for my installation :D12:35
jrosserawesome12:35
depasqualestill waiting for the stop12:35
depasquale....12:35
jrosserif you were using the stable/wallaby branch of OSA it would install mariadb 10.5.9, and we have a built in workaround in the playbooks for https://jira.mariadb.org/browse/MDEV-2503012:36
jrosserso that release is not going to suffer from sometimes mariadb deadlocking on startup, on focal12:37
depasqualeok jrosser12:39
depasqualeso I will do the following: format everything on my servers and move to wallaby release12:40
depasqualemy goal is to find a reasonable and stable release to adopt for a new production cloud region... I'm starting to fear everything now :D12:41
jrosserwell i read your launchpad bugs12:41
depasqualeI really thank you for the help jrosser12:42
jrosseralso we've not made a point release of wallaby since 23.0.0 so i would recommend using stable/wallaby head of branch instead of 23.0.012:42
depasqualeah ok12:43
jrosserthere is a point release every ~two weeks12:43
jrosserthat brings in all the upstream fixes to nova/cinder/..... and also any bugfixes on the stable branch in openstack ansible / ansible roles12:43
jrosserjust so the release model is clear12:43
depasqualewhat if I go for a victoria release but not on ubuntu 20.04?12:44
jrosseryou could install victoria on bionic, as that's a supported OS for V12:44
spatelI am running victoria with ubuntu 20.04 in production and it's rock solid 12:44
jrosserdepasquale: ^ there you go :)12:45
depasquale:)12:45
spatelI have 200 compute nodes in that cloud and haven't seen any issues related to mysql / cinder or anything else, name it :)12:45
jrosseryou're looking for a "reasonable and stable release", what we have is a "reasonable way to keep on a recent release"12:46
jrosserspatel: no i was just explaining all the difficulties with having to pick a specific version of mariadb12:46
depasqualespatel which version of ansible did you use?12:46
jrosserthe version of ansible is defined entirely by which version of OSA you use12:46
depasqualebecause jrosser and I were discussing the latest documented release, OSA 22.1.4, and it is not working for me on a small setup12:47
spateldepasquale ansible 2.10.512:47
jrosserdepasquale: i think if you did several deployments it would work sometimes, not others, because it's a non-deterministic bug in mariadb12:47
depasqualeyes yes jrosser I was wrong :) I meant OSA12:47
depasqualenot ansible12:47
jrosserah!12:47
depasquale;)12:48
jrosseranyway, to answer your launchpad question - there are lots of people using OSA to deploy production clouds12:48
jrosseri'm one, so is spatel12:48
spateldepasquale i am running mariadb 10.5.8 12:48
depasqualejrosser you were very clear thanks12:49
depasqualespatel I envy you12:49
depasquale:D12:49
spatelI am running 4 large production clouds with OSA. In the last 4 years I've had zero downtime and no issues; again, it all depends on how you run everything. 12:50
spatelI have 1000 compute nodes in total and am soon going to open a new datacenter :) 12:50
jrosserdepasquale: OSA is made by deployers for deployers12:50
spatelI am running my cloud using OSA + SRIOV for high performance network throughput 12:50
jrosserspatel started as a user and is now fixing stuff / writing new support which is awesome :)12:50
jrosserand also making really cool blog posts for us all to learn from12:51
spateljrosser :) yes 4 years ago i was asking the same questions, like is this stable.. is this going to work.. ? 12:51
spatelbut now i am so happy and keep going with OSA 12:51
jrosseri remember :) it is so nice to see you contributing now too +++112:51
depasqualeok ok so you motivated me! I will do it! Let me format everything and start again from the beginning!12:52
spateldepasquale you can see lots of my OSA related stuff here - https://satishdotpatel.github.io/blog/12:52
jrosser^ don't be afraid to do that a few times12:52
jrosseroften it is quicker to wipe / run again than try to fix a mess, particularly for lab setups12:52
depasqualethanks spatel you have another follower12:52
depasquale:)12:52
spatelyou're welcome.. don't worry, i was in the same boat a few years ago.. chasing people to get the right answer 12:53
jrosseralso OSA is a toolkit, not a shrink-wrap installer, there is massive flexibility to do whatever you like12:54
jrosserbut that does come at a price of having to dig in and understand the internals a bit12:54
depasqualeok I hope to become an active member in some way. my openstack-queens is still working nicely... but I would love an automated tool like osa to get other colleagues involved too12:54
depasqualethanks guys12:55
jrosserno problem, there's usually someone around here in the EU timezone so just ask if you get stuck12:55
depasqualeI will try and try again and let you know about the successes or defeats I face12:55
depasqualeok thanks12:55
spatel+1 you need to understand the underlying structure of OSA, without that it will be a bit of a struggle. once you know how the OSA pieces are laid out then you will rule 12:57
spateljrosser I kicked off an export SCENARIO='aio_metal_calico' build in my lab to see where it's failing, i know it's metadata but not sure how to tell it to use the https protocol, but let's see.. 13:00
jrosseri saw your patch13:00
jrosserlooks like felix doesn't understand https://......13:00
jrosserso kind of two options13:00
jrosserdrop the calico job13:00
jrosseror override the thing that sets internal endpoint to https, just for the calico job13:01
spatelhmm 13:03
spatellet me finish my lab and see if i can find a workaround, otherwise i will drop calico 13:03
spatelwhen you say drop, do you mean remove it or set it to non-voting? 13:05
jrosserperhaps something to discuss at the weekly meeting next week is whether we keep the calico job or not13:06
jrosserbut i think we can make it work by switching the internal endpoint back to http13:06
jrosserthere are overrides here which are only used for the calico test jobs https://github.com/openstack/openstack-ansible/blob/master/tests/roles/bootstrap-host/templates/user_variables_calico.yml.j213:07
spateljrosser that is what i want to test in my lab: point it to internal and see if tempest passes, if not then we can just drop calico 13:09
spatelI don't know how many people want to deploy openstack with calico ?13:09
jrosserinternal is https in master though13:09
spatelYes i think we moved everything to SSL vips recently 13:10
jrosserso my suggestion for the calico job is to switch the internal VIP back to http13:12
spatelswitch all internal VIPs back to http OR just the nova-metadata VIP? 13:14
jrosserinteresting question13:16
jrosserthe easiest thing is to just switch them all back13:17
spatelagreed.. let me see what we can do otherwise set it to non-voting to unblock others 13:18
jrosserit looks like the way to do it is to do it here https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L26913:18
spatelyes, openstack_service_internaluri_proto: http 13:19
jrosseri am now thinking you can't do that in the calico specific user_variables file13:19
jrosserbecause it's the same variable precedence as what comes from the user_variables.yml.j2 template 13:20
spatelhmm13:20
spatelI am very curious why calico felix configuration doesn't support the https protocol.. thinking of opening a bug for that13:21
spatelI have opened a bug against networking-calico so let's see if someone answers or fixes it 13:29
spatelhttps://bugs.launchpad.net/networking-calico/+bug/193844713:42
spateljrosser looks like someone knows how to fix it :) https://bugs.launchpad.net/networking-calico/+bug/193844714:25
jrosserspatel: maybe look at some of your non-calico stuff15:48
spateljrosser i think felix is not going to work with SSL 15:49
spatelWe have to change our haproxy endpoint to non-SSL 15:49
jrossernova-metadata service(http) <- OSA haproxy (https) <- neutron haproxy on network node(http?) <- instance asks for metadata15:50
spatelThis is what calico felix doing, inserting iptables rules on compute node  - -A cali-PREROUTING -d 169.254.169.254/32 -p tcp -m comment --comment "cali:J9-8BAIsw7Yc9tBK" -m multiport --dports 80 -j DNAT --to-destination 172.29.236.101:877515:50
jrosserright, so from the VM perspective i think the metadata service is still expected to be http, even when the internal VIP is https?15:50
spatelif felix using iptables then we can't tell it to use SSL 15:50
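Reading spatel's rule field by field shows why: a DNAT rule only rewrites the destination address and port of matching TCP packets. It has no notion of a URL scheme, so whatever the instance sends to 169.254.169.254:80 (plain HTTP) arrives unchanged at 172.29.236.101:8775, and whatever listens there has to accept plain HTTP. The rule text below is copied from the log; the parser is a hypothetical illustration, not part of Calico or OSA.

```python
import shlex
from typing import Dict

# iptables flags of interest -> readable field names
FIELDS = {
    "-d": "match_dst",
    "-p": "protocol",
    "--dports": "match_ports",
    "--to-destination": "dnat_to",
}


def parse_dnat_rule(rule: str) -> Dict[str, str]:
    """Extract match and DNAT target fields from an iptables-save style rule."""
    tokens = shlex.split(rule)  # keeps the quoted --comment value as one token
    info = {}
    for flag, field in FIELDS.items():
        if flag in tokens:
            i = tokens.index(flag)
            if i + 1 < len(tokens):
                info[field] = tokens[i + 1]
    return info


RULE = ('-A cali-PREROUTING -d 169.254.169.254/32 -p tcp '
        '-m comment --comment "cali:J9-8BAIsw7Yc9tBK" '
        '-m multiport --dports 80 -j DNAT --to-destination 172.29.236.101:8775')

if __name__ == "__main__":
    print(parse_dnat_rule(RULE))
```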
spatelcheck this thread - https://github.com/projectcalico/felix/issues/293315:51
jrosserbecause normally there is haproxy on the network node, doing something more complex than just an iptables forward15:51
jrosseryes i'm reading it15:51
spatel:)15:51
jrosserso it's the case that we have an http -> https translation in the neutron haproxy right?15:51
spatelyes that should work 15:51
jrosseri think thats what we have today in a normal deployment without calico15:52
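The two metadata request paths discussed here can be compared with a toy model (a simplification for illustration, not OSA code): each hop accepts one scheme, and only a proxy can change the scheme on the way through, while a plain packet forward passes it along unchanged. Hop names and schemes follow jrosser's description of the path above; everything except the scheme is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Hop:
    name: str
    listens: str         # scheme this hop accepts: "http" or "https"
    can_translate: bool  # True for a proxy that re-originates the request


def first_mismatch(client_scheme: str,
                   hops: List[Hop]) -> Optional[Tuple[str, str, str, str]]:
    """Walk the request path; return (sender, receiver, sent, expected) at
    the first hop that receives a scheme it does not accept, else None."""
    sent, prev = client_scheme, "instance"
    for i, hop in enumerate(hops):
        if sent != hop.listens:
            return (prev, hop.name, sent, hop.listens)
        if hop.can_translate and i + 1 < len(hops):
            # a proxy can speak whatever scheme the next hop expects
            sent = hops[i + 1].listens
        # otherwise (plain forward) the scheme passes through unchanged
        prev = hop.name
    return None


DEFAULT = [Hop("neutron haproxy", "http", True),
           Hop("OSA haproxy (internal VIP)", "https", True),
           Hop("nova-metadata", "http", False)]
CALICO = [Hop("felix iptables DNAT", "http", False),
          Hop("OSA haproxy (internal VIP)", "https", True),
          Hop("nova-metadata", "http", False)]

if __name__ == "__main__":
    print("default:", first_mismatch("http", DEFAULT))
    print("calico: ", first_mismatch("http", CALICO))
```

In this model, switching the internal VIP hop back to "http" (the override proposed for the calico job) makes the calico path consistent, while the default path works either way because the neutron haproxy translates.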
spatelwhy don't we create one extra non-SSL VIP endpoint for nova-api15:52
spatelkeep everything SSL and serve just nova-api-metadata over both http and https 15:52
jrosseryeah, would have to look how to do that15:53
spatelcurious why we decided to go all SSL ?15:54
spatelwhy don't we set it to non-voting, and later when we have a good solution we can make it voting again 15:59
spateli don't know how many people are deploying openstack with calico or how dependent they are on the CI job15:59
*** rpittau is now known as rpittau|afk16:03
jrosserspatel: well, the rework of all the SSL stuff i did was primarily aimed at the public endpoint16:07
jrosserbut noonedeadpunk did a load of followup work on that to also apply it to the internal endpoint16:07
jrosseri expect there are some good reasons they have at city network to want to do that, perhaps regulatory / compliance issues depending on who the customers are?16:08
spatelyes, the public endpoint was already SSL earlier, looks like we just made it SSL for all 16:08
jrosserevrardjp: ^ do you have insight into this?16:08
jrosserwell it was loads of extra work16:08
jrosserin a way, doing the internal endpoint was harder than the external16:08
spateli am worried that if i upgrade my openstack this change may break some stuff 16:08
jrosserthe upgrade jobs are passing :)16:09
jrosserbut you should read/understand how the new PKI role is used16:09
jrosserparticularly if you want your own, or a trusted certificate on the internal endpoint16:09
jrosserby default it will create a custom CA and certificates for internal16:10
spatelassuming we are using self-signed certificates, right?16:10
jrosserfor internal or external :) ?16:11
spatelinternal16:11
jrosserit's a bit more complicated now16:12
spatelwe may also need to think about renewing them at some point, I would prefer if we had a knob to turn it on and off :)16:12
spatelSSL is always difficult + hard to troubleshoot, especially with tcpdump etc..16:13
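On the renewal point: with the internal endpoint on TLS and a custom CA created by default, it is worth watching certificate expiry. A minimal stdlib sketch, not part of the OSA PKI role, assuming you know the internal VIP, a service port, and where the generated CA certificate lives (all hypothetical); it also assumes the certificate's subjectAltName covers the address you connect to, otherwise verification fails.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left until a cert's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int, ca_file: str) -> str:
    """Connect with TLS, verify against `ca_file`, return the peer's notAfter."""
    ctx = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("172.29.236.101", 5000, "/path/to/internal-ca.pem"))` would report the days left on the certificate served at that (hypothetical) internal endpoint.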
spatelassuming haproxy_ssl_all_vips: false will turn SSL stuff off and make the deployment like before, right? but what will happen to external vips? 16:14
jrosseri linked you three variables before16:16
jrosserin master theres also support for different certs on the internal and external endpoints16:21
jrosseri think this needs some new documentation writing16:21
spatelhttps://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L267-L26916:25
spatelas an experiment i set haproxy_ssl_all_vips: false and re-ran the haproxy playbook but nothing happened 16:25
jrosserwhat about openstack_service_internaluri_proto ?16:26
spateli didn't set that but let me set all 3 and re-run playbook16:30
jrosseryou should see it make changes to the haproxy config and reload it16:31
spatelno luck, i did set this https://paste.opendev.org/show/807800/16:31
jrosserdo you mean really "nothing happened" ?16:32
spatelre-ran haproxy-server.yml and still nothing changed in haproxy.cfg16:32
spatelaio1                       : ok=38   changed=0    unreachable=0    failed=0    skipped=30   rescued=0    ignored=016:32
jrosserand you're sure that var isnt also set somewhere else in /etc/openstack_deploy ?16:32
spateldamn it you are right.. it was in the same user_variables file but in a different location so i didn't scan all the lines.. looks like it works16:35
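The trap spatel hit, the same variable set twice in one user_variables file, is easy to check for. A rough sketch that lists top-level keys defined more than once, within one file and across the user_*.yml files in /etc/openstack_deploy (that file pattern is an assumption). It matches unindented `name:` lines with a regex instead of parsing YAML, so it is a heuristic: nested keys, block scalars and templated content are not handled. Note that PyYAML, for example, silently keeps the last value of a duplicated key, so a later duplicate can quietly override an earlier setting.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# an unindented "name:" at the start of a line, i.e. a top-level YAML key
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def top_level_keys(text: str) -> Dict[str, List[int]]:
    """Map each top-level key to the line numbers where it is defined."""
    keys: Dict[str, List[int]] = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys[match.group(1)].append(lineno)
    return dict(keys)


def duplicate_keys(text: str) -> Dict[str, List[int]]:
    """Keys defined more than once in a single file."""
    return {k: v for k, v in top_level_keys(text).items() if len(v) > 1}


def scan_dir(path: str = "/etc/openstack_deploy") -> Dict[str, List[str]]:
    """Variables defined more than once across user_*.yml files, as 'file:line'."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for f in sorted(Path(path).glob("user_*.yml")):
        for key, lines in top_level_keys(f.read_text()).items():
            seen[key].extend(f"{f}:{n}" for n in lines)
    return {k: v for k, v in seen.items() if len(v) > 1}
```

Running `scan_dir()` on a deploy host and printing each variable with its `file:line` locations would have flagged the duplicated `openstack_service_internaluri_proto` setting directly.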
spatelso those are the knobs we need to turn it on and off 16:37
spatelnow all internal endpoints are non-SSL16:37
spatelwhy don't we educate end users to use these 3 knobs to make their deployment super secure 16:38
spatelwe have a few options here to fix calico 16:43
spatel1. add a special stanza to serve nova_api_metadata without SSL 16:43
spatel2. disable SSL for the deployment and let the user decide whether to enable it (but it will still break calico) so not a good option 16:44
spatel3. we can deploy a small haproxy for calico on the compute node to handle the metadata service, that is what neutron_ovn is doing :)16:45
spatel@jrosser ^16:45
*** sshnaidm is now known as sshnaidm|afk18:30
evrardjphey. I am not aware of this previous work. I am not surprised, however, given our compliance requirements.21:19
evrardjp(it was in reference to SSL everywhere)21:20
opendevreviewDavid Moreau Simard proposed openstack/openstack-ansible master: DNM: Test ara 1.5.7rc4 with --diff  https://review.opendev.org/c/openstack/openstack-ansible/+/69663421:41
opendevreviewIan Wienand proposed openstack/openstack-ansible-tests stable/stein: Update Debian stable job  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/80281622:01
