Monday, 2022-12-26

*** chkumar|rover is now known as chandankumar03:31
noonedeadpunkFor those who use prometheus and the libvirt exporter - it might be useful to know that the project has changed ownership to a quite controversial one (just my own opinion) - some details are in the kolla patch - https://review.opendev.org/c/openstack/kolla/+/86816110:20
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Allow to manage extra services, mounts and networks  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/86853410:23
*** dviroel_ is now known as dviroel11:15
anskiyquestion! I have an openstack installation with one region and a Ceph cluster. I'm trying to move Cinder to the control-plane nodes and use it in active-active mode. So, if I understood correctly: there would be only one cinder-volume service, which would be attached to one AZ. Suppose I want to add another AZ (which should represent another DC), should I create another Ceph cluster with a separate cinder-volume service on the exact sa13:25
noonedeadpunkanskiy: it kind of depends on your AZ implementation13:26
noonedeadpunkI'm doing an AZ deployment at the moment and was planning to publish some better docs about it (and gave a talk on how to configure AZs with OSA in October)13:26
noonedeadpunkBut long story short - it does depend on your requirements. For AZs you can either share or separate storage. So if your DCs are less than 10km from each other and you're confident in the link between them - you might want to stretch a ceph cluster between AZs13:28
noonedeadpunkBut if you want separate ceph clusters - you can do that as well. But then I think you will need to spawn independent cinder-volumes, if you want to prevent az1 from going to storage in az213:28
anskiynoonedeadpunk: that's gonna be a one-to-one relation between AZ and DC, but the control-plane nodes would be only in one DC. Is this reasonable (given your note on trusting the cross-DC network link)? Or do people just span the control plane across DCs for multi-AZ setups?13:31
noonedeadpunkWell, I personally spawn the control plane across DCs, but we're going to have 3 AZs13:35
noonedeadpunkAs then you can survive a complete AZ failure, even API-wise13:35
noonedeadpunkI did a pair of keepalived instances per AZ, so 3 public instances and 3 private, and then DNS RR13:36
noonedeadpunkAlso, haproxy is targeting only AZ-local backends to reduce cross-AZ traffic13:36
noonedeadpunkand the same can be done with the internal VIP, either through /etc/hosts or DNS - so services in containers will talk to the local haproxy and be pointed towards local backends (i.e. nova-cinder communication)13:39
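A minimal haproxy sketch of the AZ-local backend idea described above (backend/server names and addresses are hypothetical, not actual OSA-rendered config):

    # the az1 haproxy lists only az1 backends, so API traffic stays in-AZ
    backend nova_api_os_compute-back
        balance leastconn
        server az1-infra1 172.29.236.11:8774 check
        server az1-infra2 172.29.236.12:8774 check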
noonedeadpunkthe only nasty thing is images in glance13:39
noonedeadpunkas I wasn't able to find a proper way to satisfy everyone without using the swift backend instead of rbd. If you're fine with using interoperable import only - there's a way around it, I guess. 13:40
noonedeadpunk(so it depends on how much you can mandate users' behaviour)13:41
anskiynoonedeadpunk: thank you for the insights, gonna have to think more about this.13:50
noonedeadpunkanskiy: that's actually the talk I was mentioning - it's far from being a good one, but it still might give some insights https://www.youtube.com/watch?v=wvTvfAR_4eM&list=PLuLHMFPfD_--LAMu7bBkCNAXfTy04iLPj There are also presentations for the event lying in public access somewhere13:56
moha7To whomever is not on vacation (:14:11
moha7OSA was deployed successfully (I mean, without getting any errors during the deployment process), but now when I run the command `openstack network list`, I get this error:14:11
moha7HttpException: 503: Server Error for url: http://172.17.246.1:9696/v2.0/networks, 503 Service Unavailable: No server is available to handle this request.14:11
moha7while `telnet 172.17.246.1 9696` from the infra1-utility-container does connect to port 969614:12
moha7There's the same error here: https://bugzilla.redhat.com/show_bug.cgi?id=2045082#c914:12
noonedeadpunkmoha7: you should be telnetting not to haproxy (which listens on 172.17.246.1, I guess), but to the haproxy backends, or to put it a better way - the mgmt address of the neutron-server container14:24
noonedeadpunkthe error you see most likely means that haproxy can't reach neutron-server for some reason, either because of some networking issue, or because neutron-server died14:25
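A quick way to ask haproxy itself which backends it considers down, via the admin stats socket (the socket path here is a common default and varies per deployment):

    # 'status' is the 18th field of the 'show stat' CSV; print anything not UP
    echo 'show stat' | socat stdio /run/haproxy.sock | \
      awk -F, 'NR>1 && $18 != "UP" {print $1, $2, $18}'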
moha7172.17.246.1 --> internal vip14:27
moha7172.17.246.174 --> infra1-neutron-server-container-21189fcd14:28
moha7cannot telnet to infra1-neutron-server-container-21189fcd from infra1-utility-container-5cf19aed on port 969614:29
noonedeadpunkbut does anything listen inside the infra1-neutron-server-container-21189fcd container on that port?14:31
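One quick check from inside the container (plain iproute2, nothing OSA-specific):

    ss -tlnp | grep 9696    # a LISTEN line here means neutron-server is bound to the port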
moha7There's a service there named "neutron.slice" with some errors. I've never seen this name before! Service status: http://ix.io/4jAM14:31
noonedeadpunkso you're trying to use OVN as the networking driver?14:33
noonedeadpunkOr you don't care and are just spawning the default option?14:33
moha7nothing listens on 9696 in the neutron lxc container: http://ix.io/4jAN14:33
noonedeadpunkmhm, yeah, I guess it's related to an ovn init issue - `ValueError: :6642: bad peer name format` (there's no host in front of :6642 in that connection string)14:35
jamesdentonthat's a missing northd group14:35
moha7I didn't know OVN is the default; previously I was configuring it thinking it was on linuxbridge, trying to port it to OVS; but this time, I deployed with OVN as it is the default option.14:35
noonedeadpunkmoha7: yes, we switched default to OVN in Zed14:36
noonedeadpunkBut you can still use lxb if you want to14:36
moha7I followed this post: https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ to configure the user_variables and openstack_user_config files14:36
noonedeadpunkyeah, I think the northd group was introduced relatively recently14:37
noonedeadpunkSo you'd need to add a network-northd_hosts definition to your openstack_user_config.yml14:38
jamesdentonthat blog is likely a little outdated. That is the way ^^^14:38
jamesdentonSomething like --> network-northd_hosts: *controller_hosts, if you have an alias setup14:39
noonedeadpunkso basically we made the `env.d/neutron.yml` override part of the default behaviour14:39
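A sketch of the openstack_user_config.yml fragment being suggested here (host names and IPs are placeholders):

    infra_hosts: &controller_hosts
      infra1:
        ip: 172.29.236.11
      infra2:
        ip: 172.29.236.12
      infra3:
        ip: 172.29.236.13

    # run ovn-northd alongside the rest of the control plane
    network-northd_hosts: *controller_hosts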
moha7Yeah, I have not set network-northd_hosts; does it need an OVN gateway too?14:42
moha7noonedeadpunk: So, the env.d/neutron.yml settings introduced in that blog are wrong?14:44
jamesdentonmoha7 those aren't really necessary anymore14:44
moha7Then, network-northd_hosts would be enough, right?14:44
jamesdentonyou will likely want: network-gateway_hosts: *compute_hosts14:44
jamesdentonSo, all computes are ovn controllers. You can decide if you want computes to be gateway nodes with that ^^14:45
jamesdentonor, you can make the controllers or dedicated network nodes the gateway nodes using the appropriate alias14:45
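Continuing the same hypothetical fragment, with the computes acting as OVN gateway nodes:

    compute_hosts: &compute_hosts
      compute1:
        ip: 172.29.236.21
      compute2:
        ip: 172.29.236.22

    # computes host the gateway chassis, mirroring an OVS DVR-style layout
    network-gateway_hosts: *compute_hosts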
jamesdentonmoha7 the blog was correct as of early December. This is a very recent change, and docs are forthcoming14:46
moha7jamesdenton: I'm not familiar enough with OVN to decide where I should put the gateway! Based on the picture in the post below, it seems compute hosts are a good option:14:48
moha7https://blog.russellbryant.net/2016/09/29/ovs-2-6-and-the-first-release-of-ovn/14:48
jamesdentonyes, i agree, the gateway on computes mirrors the OVS DVR arch14:49
jamesdentonand i think that was the intention14:49
moha7Do you know of any recent documentation on OVN? I'm searching but couldn't find any!14:49
jamesdentonhmm, i don't really. sorry14:49
jamesdentonhttps://docs.openstack.org/networking-ovn/latest/admin/refarch/refarch.html14:50
jamesdentonthat might help?14:50
moha7Thanks; with these changes, should I deploy from scratch? Or would just os-neutron-install.yml be enough?14:51
moha7jamesdenton: Sure, thanks for the link; it seems OVN is an interesting backend with new concepts14:52
jamesdentonjust os-neutron-install should be enough14:52
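The rerun being suggested, from the deployment host (assuming the usual /opt/openstack-ansible checkout):

    cd /opt/openstack-ansible
    openstack-ansible playbooks/os-neutron-install.yml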
moha7+114:53
*** dviroel is now known as dviroel|lunch15:08
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Allow to create OVS bridge for lxcbr0  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/86860315:32
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_container_create master: Add bridge_type to lxc_container_networks  https://review.opendev.org/c/openstack/openstack-ansible-lxc_container_create/+/86860415:40
moha7now, after adding network-northd_hosts and network-gateway_hosts (here: http://ix.io/4jB4), there's no more of this error: "ValueError: :6642: bad peer name format", but this warning is in the status output for neutron-server and neutron.slice: http://ix.io/4jB3 Is there any other option missing? | the command `openstack network list` on the utility container returns "Gateway Timeout (HTTP 504)" after a long wait. | port 9696 15:54
moha7is not up on any of the neutron containers15:54
noonedeadpunkAnd can you telnet to 172.17.246.1 3306 from neutron-server?15:56
moha7It connects, but closes really fast!15:58
noonedeadpunkthat can also be result of the bug that should be fixed with https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/86841515:58
noonedeadpunkbut I think you have a neutron version installed that is not affected by it yet15:58
moha7"Connection closed by foreign host."15:58
noonedeadpunkso that sounds more like a mariadb thingy15:58
noonedeadpunkOk, and can you run `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` from utility or galera container?15:59
noonedeadpunknote that the results from galera and utility may differ15:59
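The wsrep variables worth eyeballing in that output (the healthy values in the comments describe a 3-node cluster in general, not this one):

    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'" | \
      egrep 'wsrep_cluster_size|wsrep_cluster_status|wsrep_local_state_comment'
    # wsrep_cluster_size         3        <- all three nodes joined
    # wsrep_cluster_status       Primary  <- quorum held
    # wsrep_local_state_comment  Synced   <- this node is fully synced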
jamesdentonit does seem like haproxy and/or galera are being a problem16:00
moha7from utility, the output of that mysql command is: ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 116:00
jamesdentonyou might check the status of haproxy; it might be worth re-running the haproxy playbook or simply restarting the service16:00
noonedeadpunkthat sounds like ssl16:01
noonedeadpunkand from galera?16:01
moha7from galera container: ERROR 2002 (HY000): Can't connect to local server through socket '/var/run/mysqld/mysqld.sock' (111)16:01
noonedeadpunkhuh16:01
noonedeadpunkand systemctl status mariadb?16:01
moha7failed: http://ix.io/4jBa16:03
moha7`galera_new_cluster` couldn't start it.16:04
noonedeadpunkwell, here you go... Do you have some strict firewall rules between controllers?16:04
moha7not at all16:05
moha7Seems I should re-deploy it, right?16:05
jamesdentonwhat is the status of the other 2 galera containers?16:05
moha7w816:05
noonedeadpunkYou can try re-running `openstack-ansible playbooks/galera-server.yml -e galera_ignore_cluster_state=true -e galera_force_bootstrap=true` if they're also down16:06
noonedeadpunkit will either fail or succeed16:06
moha7jamesdenton: "Failed to start MariaDB" on all 3 galera containers.16:07
jamesdentonok, try what noonedeadpunk mentioned16:07
moha7+116:07
noonedeadpunkI wonder why they all would fail though16:07
noonedeadpunkdoesn't sound too healthy that they did16:08
jamesdentonalso, if you can post the output of this from each container, that would be helpful: cat /var/lib/mysql/grastate.dat16:09
noonedeadpunkbet it's all -116:11
noonedeadpunkI have the impression that grastate has been kinda broken for a while, as I haven't seen anything except -1 there for years now16:11
noonedeadpunkOr maybe we were only failing in ways that aren't covered16:12
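For reference, grastate.dat normally looks like this (the uuid is a placeholder); seqno sits at -1 while mysqld is running or after an unclean shutdown, and safe_to_bootstrap marks the one node allowed to bootstrap a new cluster:

    # GALERA saved state
    version: 2.1
    uuid:    00000000-0000-0000-0000-000000000000
    seqno:   -1
    safe_to_bootstrap: 0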
moha7jamesdenton: http://ix.io/4jBc16:13
jamesdentoninteresting, i feel like i've seen this before16:14
jamesdentonfrom within the ct3 container, can you use the mysql client?16:16
moha7I re-ran the galera cluster from container3; there were some errors, but now `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` returns the tables on utility16:16
jamesdentonok, so on ct2 and ct1, it should just be a matter of "systemctl start mariadb"16:16
noonedeadpunkhm16:19
noonedeadpunkthat's weird16:19
noonedeadpunkthese errors in log should have been covered with https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L112-L11416:20
moha7now started and running on all galera nodes, but returning `[Warning] Aborted connection 67 to db: 'neutron' user: 'neutron' host: 'ct1-neutron-server-container-21189fcd.openstack.local' (Got an error reading communication packets)` in `systemctl status mariadb`16:20
jamesdentonok - try rerunning neutron playbooks now that the DB is up16:21
moha7still no port 9696 on ct1-neutron-container16:21
moha7Ah, ok16:22
noonedeadpunkI'd say that galera is unlikely to be in the desired state tbh16:22
jamesdentonand that could be it, too16:22
jamesdentonmaybe rerun setup-infra and setup-openstack?16:22
noonedeadpunkas `FATAL ERROR: Upgrade failed` is not good tbh16:22
noonedeadpunkand all these tmp tables shouldn't be there16:22
moha7I have snapshots. I'll roll back to the step where setup-hosts.yml was done16:23
moha7and start from setup-infra 16:23
jamesdentonok, don't forget to add the groups, then16:24
moha7The deployment server is standalone, not on the nodes16:25
jamesdentongotcha16:26
*** dviroel|lunch is now known as dviroel16:32
*** dviroel is now known as dviroel}out19:37
*** dviroel}out is now known as dviroel|out19:37
