Friday, 2023-01-06

prometheanfiredoes `openstack network agent list` show XXX for alive ovn controller and ovn-metadata-agent for others?00:06
jamesdentonthey should all show alive, prometheanfire 01:43
jamesdentonhttps://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html01:45
prometheanfirejamesdenton: br-int sock was not accessible due to protocol version issues, think it's fixed now02:38
jamesdentonahh, good deal03:12
jamesdentonprotocol version.. meaning ssl vs non ssl?03:12
*** akahat|rover is now known as akahat05:12
prometheanfiremaybe? not sure05:13
prometheanfiresomeone set up the systems with linuxbridge and ovs but I'm redoing it with zed and ovn (instructions unclear for them when I went on leave I guess), so ovs created bridges without ssl I'm guessing and ovn connects with it, maybe05:15
prometheanfiredeleted the bridge and had stuff recreate since this is greenfield05:15
*** dviroel|afk is now known as dviroel11:24
cloudnullOHAI - happy Friday all 14:37
prometheanfirecloudnull: ohai15:11
*** dviroel is now known as dviroel|lunch16:39
admin1is there an easy way to enable 2fa/mfa in keystone via osa ? 17:34
darmanHey17:41
darmanIs this channel an active channel, anybody here online? (Unfortunately all openstack channels on Libera.Chat are silent!)17:43
jrosserthere are people here :)17:44
jrossermost activity is working-days / working-hours EU time 17:45
jrosseradmin1: 2fa enablement is not really an OSA thing, you'd use a config override to enable the auth method then the rest is via the keystone API https://docs.openstack.org/keystone/latest/admin/resource-options.html#multi-factor-auth-enabled17:46
jrosserit's per user, and as OSA does not deploy end-users then there is not really anywhere to do that17:47
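A rough illustration of the per-user step jrosser describes, using only the option names from the linked keystone resource-options page (`multi_factor_auth_enabled`, `multi_factor_auth_rules`). This is a sketch, not an OSA feature: the helper name is hypothetical, the rule shown (password plus TOTP) is just an example, and it assumes the `totp` auth method has already been enabled through a keystone config override. The resulting body would be sent as a `PATCH` to `/v3/users/{user_id}`.

```python
import json


def mfa_user_patch(rules):
    """Build the JSON body that sets the keystone MFA resource options on a user.

    `rules` is a list of rules; each rule is a list of auth method names
    that must all be satisfied together.
    """
    return {
        "user": {
            "options": {
                "multi_factor_auth_enabled": True,
                "multi_factor_auth_rules": rules,
            }
        }
    }


# Example rule (an assumption for illustration): require password AND totp.
body = mfa_user_patch([["password", "totp"]])
print(json.dumps(body, indent=2))
```

Because the options live on each user, this has to be repeated per user through the API, which is why there is no single deployment-wide switch for it.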
darmanAh, finally I found you (:17:49
darmanI have some general question, and also some issues17:49
*** dviroel|lunch is now known as dviroel17:50
darman1. If you were to deploy a production environment, would you choose OVN, given that it's not as common as OVS? Personally I prefer OVN since it's the next step in openstack networking development, a redesign of the network backend; but some technical arguments would help me defend the choice to my managers.17:53
darman2. This the error I get: https://pastebin.ubuntu.com/p/prnxqmSCb4/ when the setup-everything.yml reaches to the keystone service installation.17:54
darmanthis is*17:54
darmanError link*: https://pastebin.ubuntu.com/p/dCsjv6bz9p/17:56
jrosseri have no direct experience of OVN myself but we have other people here who are using it for real17:58
darman3. Do you know a general active channel for openstack itself (here on IRC or other online platforms)?17:59
jrosserregarding your deploy error it is not possible to know what is wrong from that output18:00
jrosserthat ansible task has no_log: True on it as otherwise it would display the database password in the log output18:00
jrosserfirst thing you need to do is check haproxy that it thinks the database backends are up18:01
jrosseryou can either use the haproxy log, hatop or the haproxy management web interface for that18:01
darmanWhere should I set `false` for this option: 'no_log: true'? in the user_variables.yml?18:01
darman"Overview — HATop: Interactive ncurses client for HAProxy"; I didn't know it!18:03
darmanjrosser: Here are my variables: https://pastebin.ubuntu.com/p/wP2vCzGJwf/ Do you see something strange there for haproxy? Is there anything that has been forgotten in the config file? I would appreciate it if you take a look18:11
darmanLink*: user_variables.yml: https://pastebin.ubuntu.com/p/CpvC3Ym36Y/18:13
darmanYes, there's an issue with haproxy: https://pastebin.ubuntu.com/p/pK2kmMXgFw/18:24
jrosserwell first i would really advise against install_method: distro unless you have a super clear understanding of why you choose that18:28
jrosserthen from your error message we see failed: [infra01_keystone_container-51bb0d04 -> infra01_utility_container-e956a5a6(172.17.236.15)18:29
jrosser^ an address in 172.17....18:29
jrosserbut you define internal and external vip in 10.x ranges18:29
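The kind of mismatch jrosser is pointing at (an address outside the range the VIPs live in) can be checked mechanically. A minimal sketch with Python's standard `ipaddress` module; the address is the one from the error message, while the `10.0.0.0/24` network is only a placeholder and not taken from the actual config.

```python
import ipaddress


def in_network(address: str, cidr: str) -> bool:
    """Return True if `address` falls inside the network given as `cidr`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


delegate_ip = "172.17.236.15"   # address seen in the failed task output
vip_network = "10.0.0.0/24"     # placeholder for the configured VIP range

print(in_network(delegate_ip, vip_network))  # False: the address is outside that range
```

The same check applies to container addresses against the `br-mgmt` range defined in `openstack_user_config.yml`.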
darmanThe installation process from the source was very long, almost 6 hours. I thought maybe it would be faster from the distro, which was no different. I will change it to the source in the next installation.18:45
jrosserit should not be 6 hours at all, that suggests some sort of problem18:46
darmanin my experience: setup-hosts --> 45 minutes18:46
darmansetup-infra: 1 h18:47
jrosseris this on real hardware or some virtualised environment?18:47
darmanOn VMS on proxmox18:47
jrosseroh right, well18:47
jrosseri think that the deploy time is pretty sensitive to disk speed18:48
jrosserhaving said that our CI jobs run a complete deployment on a single node in < 2hours18:49
jrosserand those are virtualised18:50
jrossera bare metal node with an nvme disk might complete in < 1 hour18:50
jrosseranyway, it feels like your haproxy problem is networking related18:51
jrosseri don't understand what is happening with your addressing18:51
darmanon a single node in < 2hours; What about 3 controllers and 2 computes? 19:00
jrosserthe equivalent of 3 controllers in one of our H/A CI jobs takes 20 mins for setup-infrastructure19:08
spateldarman i would go with OVN if this is a new cloud, because after a few years converting a production cloud would be a mess. 19:20
spatelI am deploying all new cloud using OVN 19:20
darmanspatel: +1, the 'converting in the future' is good point19:22
jrosserdarman: do you find anything yet with your galera trouble?19:24
spateleventually linuxbridge will die if no maintainer left. new version of OS will stop delivering it. 19:24
darmanjrosser: not yet. 19:26
jrosseryou need to find out why from the perspective of haproxy the backend is down19:27
jrosserthere is a healthcheck19:27
jrosserand there is basic network connectivity to check19:27
darmanIt seems that the examples in the repository (/etc/openstack_deploy) are not suitable for deploying with OVN.19:27
darmanIs there a place where users have shared their configs? Or is it possible to share them here after removing sensitive data?19:27
jrosserhopefully everything is in the documentation19:28
darman`an address in 172.17.... but you define internal and external vip in 10.x ranges` I manually changed it to 10.0.0 when posting the error here to make it clearer!19:28
spatelI did blog out some OVN stuff - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/19:28
jrosserhttps://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html19:28
jrosserspatel: you may need to update your blog for the changes in zed/master?19:29
spatelrelated SSL?19:29
spatelbut method would be same.. running playbook etc.. correct?19:29
jrosserwell i don't know :)19:29
spateli don't think we did any major changes in OVN deployment 19:30
spatelI will sure deploy zed with multinode and give it a try19:30
darmanspatel: nice, I'll try your configs in that blog post19:31
spatelTry in lab first and let me know if any change required.. 19:31
spateljrosser we should put some of my blog links in the OSA/OVN deployment example. It's not perfect but can help someone to give it a try :) 19:33
jrosserwell i think it may just lead to confusion19:33
spatelI will add more stuff as required 19:34
jrosseras the AIO now defaults to OVN......19:34
jrosserso that is the 'reference' deployment19:34
darmanspatel: Ah, you're using `/etc/openstack_deploy/env.d/neutron.yml` there, but I don't have it! Let me try it.19:34
jrosserspatel: ^ see19:34
jrossernow we have total confusion19:34
jrosserdarman: have you yet used the "all-in-one" deployment?19:35
darmanNo, I wanted an environment as close as possible to production.19:36
spateljrosser you are right, Zed has a built-in environment for OVN so that step can be skipped. 19:36
jrosserso why follow that blog?19:37
jrosseryou already have the default neutron env.d from here which is wildly different https://github.com/openstack/openstack-ansible/blob/master/inventory/env.d/neutron.yml19:37
jrosserdarman: i am pretty unclear what you want to achieve19:37
jrosserthe all-in-one will get you going automatically in a single VM and is more likely to work than anything else, as it is the *exact* code that we run in CI19:38
darmanjrosser: Installation test for a multi-node environment19:38
admin1issue with going right now with ovn is that it does not support all LB functions,  .. so tools like CAPI  do not work 19:38
jrosserthen when your multinode is having difficulty you can use the AIO as a reference to see what is different / broken19:38
spateladmin1 LB is totally different service, you can use amphora if you want advance LB feature with OVN. What is CAPI?19:40
jrosserdarman: if you want help with your deployment error - do you have a specific question?19:41
darmanI am doing a T-shoot. If I can't solve it, I will ask here19:43
darmanjrosser: For the all-in-one, I'm going to follow this doc: https://docs.openstack.org/openstack-ansible/latest/user/aio/quickstart.html, it's ok, right?19:44
jrosserwell, 'latest' in the URL means that is the documentation for master branch, which is the next release19:45
jrosserthe current release is here https://docs.openstack.org/openstack-ansible/zed/user/aio/quickstart.html19:45
darmanthanks19:46
jrosserand personally i would check out stable/zed instead of the tag19:46
admin1capi is Kubernetes Cluster API  ..   it's getting popular nowadays as the way to deploy k8s clusters on clouds 19:46
admin1including os 19:46
admin1os => openstack19:46
admin1i will test a multinode install with ovn and see how far i can go 19:46
admin1darman, if you have a big server where you can create vms, you can make it as close to prod as possible .. 19:47
darmanI have an HP G8 server running ProxMox, old but still powerful19:48
admin1you can create vms, replicate the network and vlans and even router 19:49
admin1mimic ip address and everything to the exact detail 19:49
admin1i rented a AMD EPYC  from hetzner :) 19:49
admin1works good 19:49
admin1put 2 nvmes in raid0 19:49
admin1so that  the build goes faster 19:49
admin1and use vyos for  the router 19:50
admin1to mimic vlans and DC side of stuff 19:50
spatelI am running all my openstack labs on single VMware HOST (gen8 with 128GB ram 1TB SSD) 19:50
darmanvyos --> interesting +119:53
darman"the output has been hidden due to the fact that 'no_log: true' was specified for this result"20:00
darmanHow can I override `no_log` to be false20:00
darman?20:00
jrosserthere is no way to override that without editing the code20:01
jrosserfrom the top of my head its something like /etc/ansible/ansible_collections/openstack/osa/roles/db_setup/tasks/main.yml20:05
jrosser^ adjust to match reality20:05
darmanno_log is only used in: `/opt/openstack-ansible/playbooks/healthcheck-infrastructure.yml`20:12
darman`/opt/openstack-ansible/playbooks/ceph-rgw-keystone-setup.yml`20:12
darman`/opt/openstack-ansible/playbooks/rabbitmq-install.yml`20:12
darmanby `grep -r no_log /opt/openstack-ansible/`20:12
jrosserdid you see the path i gave?20:13
darmanchanging it to false in all the above files didn't have any effect! For keystone installation, it still says: FAILED! => {"censored": "the output has been hidden due to the fact that `no_log: true` was specified for this result", "changed": false}20:13
darmanOops, I saw that message now. w820:14
darmanWorked, and now it says what the issue is: "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (2013, 'Lost connection to MySQL server during query')"20:20
jrosserdid you get haproxy to think that the galera back end was up?20:20
darmanFrom the haproxy aspect, all containers are down! https://i.imgur.com/SPceco6.png20:27
darman^ `hatop -s /var/run/haproxy.stat`20:27
jamesdentonspatel Your blog is great, but some of what you outline is no longer necessary with OSA Zed, and there's an extra group or two that need defined.20:28
jrosserdarman: they will all be down until the services are deployed, and as you have a failure on keystone that is the first openstack service, so it is not a surprise that they are down20:30
jrosserhowever, the database service should be up after you have run setup-infrastructure20:30
jrosserlook at really basic things, is the database service in the db container actually running? does the journal suggest anything is wrong20:31
jrossercan you ping the db backend IP from where haproxy is running20:31
jrosserwhat happens if you curl/wget the db backend healthcheck service from haproxy?20:31
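The basic reachability part of these checks can be scripted from the haproxy host. A minimal sketch using only the Python standard library; the function name is made up for illustration, and which host and ports to probe is left to the caller (3306 is the MySQL default, and the port of the galera healthcheck service should be taken from the haproxy backend config rather than assumed).

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, unreachable networks, DNS errors
        return False
```

For example, `tcp_reachable("<db container IP>", 3306)` returning `False` while the database service is running inside the container would point at a networking problem between haproxy and the backend rather than at galera itself.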
admin1darman, single controller ? 20:50
admin1i had the same issue a day back .. i had to manually fix the  database check to whitelist the ip20:50
darmanWoooops! not possible to ping containers as I was using the wrong range in the `openstack_user_config.yml` for br-mgmt interface. I'm going to destroy containers, then deploy everything from the step setup-hosts.yml to assign new IPs to the containers.20:55
jrosserdarman: also make sure you disable any IP/mc security stuff if there is any in proxmox20:56
jrosser*ip/mac address....20:56
jrosseradmin1: it is not one controller20:56
prometheanfireping from vm on node 1 to vm on node2 fails with ovn for me, vm on node 1 to second vm on node 2 works.  I see the icmp packets hit node2's geneve interface though, but nothing beyond that21:00
prometheanfiretrying to figure out why packets are not being forwarded is 'fun'21:01
prometheanfirethat I can't run ovn-nbctl (or sbctl) doesn't help, tried passing the right socket and ssl terms21:02
spatelprometheanfire do you have OVN in cluster?21:04
spatelyou can run ovn-nbctl only on leader node. 21:04
prometheanfireoh, didn't know that part, guess I'll run it on the leader lol21:04
spatelif you want to run from member node then you need to pass some switch call --not-leader or something...21:05
spatel--no-leader-only21:06
spatelhttps://man7.org/linux/man-pages/man8/ovn-nbctl.8.html 21:06
spatelyou can use that switch on non-leader node to get data of OVN21:06
prometheanfireya, got the command working at least21:06
spatelovn has nice tool called ovn-trace which can simulate packet flow and tell you where is the blockage or drop 21:08
spateljamesdenton i will redefine my blog with latest Zed or make some comments. 21:09
spateljrosser is correct because when i deploy openstack on VMware then i disabled mac spoofing and some security shit in VMware. 21:12
*** dviroel is now known as dviroel|pto21:12
prometheanfirenot getting anything useful from ovn-trace, shows that the packet should reach the instance :|21:50
spatelyou can ping vm running on same compute node but not across the compute nodes correct?21:53
prometheanfireyep21:53
prometheanfireI see the packet reach the geneve interface on compute-node-221:54
spatelGeneve tunnel is up.. assuming yes21:54
prometheanfirebut that's the end21:54
prometheanfireis there a way to regenerate the openflow table on node-2?21:54
spatelsecurity group etc.. blocking it21:54
prometheanfireI don't think so, at least the ovn-trace seemed to work21:55
spatelwhat is the output of ovs-vsctl show?21:55
prometheanfirefor br-int?21:56
spatelovs-vsctl show command output21:57
prometheanfirehttps://pastebin.com/raw/JBFrSy4v21:58
spatellooks good so far i can see tunnel and tap interface on br-int bridge22:00
prometheanfireyep, I can only think that it's some flow that's not working, harder to troubleshoot that lxb lol22:01
prometheanfireis there a good way to rule out port security?22:04
prometheanfireofctl dump-ports shows the vm port receiving packets at the rate of the ping, so ovs seems to be routing it that far22:06
spatelThis is what i have and everything works for me - https://paste.opendev.org/show/bjDY3HTMJV4fNtzIGSxK/22:09
spateli wonder why we have br-tun 22:10
spatelin my case i have tunnel directly connected to br-int 22:10
spatelmake sure you configure security-group with allow all.. 22:11
prometheanfireI just disabled security groups entirely on the port to test, no good22:11
spatelmany times i end up in that situation where i assumed the security-group is ok but end up finding the issue there22:12
spatelwhat do you mean by disable security-group entries?22:12
prometheanfireopenstack port set --no-security-group --no-port-security22:12
prometheanfiresomething like that22:12
spateli don't think that is the issue here.. i am talking about security-group rules 22:13
prometheanfireah, with things disabled that's not it22:13
spatelopenstack security group list22:13
prometheanfireI have a secgroup allowing all outbound and icmp+22 inbound22:13
spateljust make sure.. its :)22:14
prometheanfirealso, having just removed the secgroup from the port should remove that variable, ovn-trace says all packets should reach (tested port 123)22:14
spatelI have to leave now.. but please keep us posted on progress 22:19
spatelrun ovs-tcpdump command which will help you to find painpoints 22:19
prometheanfireyep, used that too :D22:24
prometheanfirecya22:24
