Monday, 2023-02-06

moha7A: https://i.ibb.co/4NsHT13/a.png07:12
moha7B: https://i.ibb.co/KWgYtmy/b.png07:12
moha7Is B equivalent to A?07:12
damiandabrowskihi folks09:11
damiandabrowskimoha7: imo yes09:12
damiandabrowskijrosser: thanks for your reviews, most of them are definitely valid. I'll work on this today09:13
jrosserdamiandabrowski: i think you can make it massively more simple09:20
jrosseralso i just found another problem in the haproxy changes, added another comment09:20
jrosserdamiandabrowski: the reason that i think we should not really adjust the way the vars are handled is that if we leave them alone then the service playbooks could be changed very simply to be like this (example for glance) https://paste.opendev.org/show/bh5S74g7x8lhoMDy0rWj/09:30
jrosser^ pseudo-code, not tested09:30
damiandabrowskibut in this case, we won't be able to fix variable-scope issue(which is the main reason why I started working on this). 09:52
damiandabrowskifor me it really sucks if users would need to be aware what service-specific variables(like glance_ssl) need to be defined globally rather than in group_vars. For 2 reasons: clarity and performance09:52
damiandabrowskithis problem was raised here before: https://review.opendev.org/c/openstack/openstack-ansible/+/821090/comments/4e4d8147_e0b4d08709:53
jrosseri think we just swap one for another really09:54
jrosserit is equally confusing that the haproxy "role" now runs targeting all the different service groups, not the haproxy_all group09:54
jrosserthat took me a loooong time to finally realise when looking at the new patches09:55
jrosserof course except when it's run from the haproxy playbook itself.....09:55
damiandabrowskihmm, idk... for me having for ex. os_keystone playbook, triggering haproxy_service_config.yml tasks from haproxy_server role to configure its haproxy service isn't confusing at all10:01
damiandabrowski(especially when we don't have any other way to fix issue with variables scope)10:02
moha7openstack-ansible /opt/openstack-ansible/playbooks/containers-lxc-destroy.yml10:41
moha7How can I remove only a specific container?10:41
moha7`containers-lxc-destroy.yml --limit infra01_repo_container-55a9a634`10:48
jrossermoha7: you can use --limit with a specific container, or an ansible group name for all containers for a particular service10:56
jrossermoha7: it's also useful to double-check what you are about to do by adding `--list-hosts` which will just print the things that the playbook will target and exit10:57
moha7jrosser: Thanks11:46
moha71. Does OSA support Prometheus as a monitoring service, integrated? I couldn't find anything in the docs.12:19
moha72. Does OSA deploy the Skyline dashboard? (https://docs.openstack.org/skyline-console/zed/)12:20
moha7For the question 1, is it here: https://opendev.org/openstack/openstack-ansible-ops  ?12:22
jrosserdamiandabrowski: i was just digging into nova / nvidia / mdev stuff through the mailing list and i saw you linked to this in a nova patch https://paste.openstack.org/show/bU4qYaIySl5y7qEsEWjR/12:22
jrosserdamiandabrowski: we have some similar script (from Tobias on the ML) to recreate the mdev but it seems to be a race condition with nova-compute starting 12:23
jrosserdamiandabrowski: did you manage to make `After=` in those units behave how you need? Like there is some time after the nvidia services have started before they are properly initialised, and you need some way to make the mdev creation script wait for that (sleep(10) /o\)12:25
jrosserdamiandabrowski: then also nova-compute has to wait for the mdev to be actually created rather than the service just started12:26
jrosserwe get a lot of race condition here using `After=` and the systemd docs are not clear, it seems to behave like "After the oneshot service has successfully started" rather than "After the oneshot service has successfully finished"12:28
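One alternative to a fixed sleep is to poll until the mdev device actually exists. A minimal Python sketch, assuming mdev devices appear under /sys/bus/mdev/devices/<uuid>; the function name, timeout values and the placeholder UUID are hypothetical, and this is not the script from the paste or the mailing list:

```python
import os
import time


def wait_for_path(path, timeout=60.0, interval=0.5):
    """Poll until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage: the UUID below is a placeholder, not a real device.
# ok = wait_for_path("/sys/bus/mdev/devices/<uuid>", timeout=120)
```

A check like this could in principle be run from an `ExecStartPre=` in a drop-in for nova-compute, so the service only starts once the device is present, though that arrangement is untested here.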
jrossermoha7: OSA does not deploy any monitoring natively, but there are settings to enable prometheus exporters on quite a few of the components such as mariadb, zookeeper and haproxy12:29
jrossermoha7: there is an OSA role for Skyline here https://opendev.org/openstack/openstack-ansible-os_skyline12:30
moha7In user_variables.yml, what happens for `haproxy_keepalived_external_vip_cidr: "{{external_lb_vip_address}}/32"` if I set `external_lb_vip_address` to a domain name in the openstack_user_config.yml file? Would it be "sub.example.com/32"?12:32
jrossermoha7: and an incomplete patch to add skyline support to the main repo https://review.opendev.org/c/openstack/openstack-ansible/+/859446 - please do contribute some developer time if you are interested in this12:32
moha7+112:33
jrosseryou would make something like `external_lb_vip_address: openstack.example.com`12:33
moha7I did it the same as you, but `haproxy_keepalived_external_vip_cidr` reads from `external_lb_vip_address`, doesn't it?12:38
jrosseri set `haproxy_keepalived_external_vip_cidr: "a.b.c.d/32"` in my user_variables12:40
jrossersee https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml#L176-L17912:41
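To illustrate why the CIDR variable is set explicitly when the VIP address is a hostname, here is a small Python sketch. It assumes the default is a plain string template as quoted in the question; `render_vip_cidr` and `is_valid_cidr` are hypothetical helper names, not part of OSA:

```python
import ipaddress


def is_valid_cidr(value):
    """Return True if `value` parses as an IP address with an optional prefix length."""
    try:
        ipaddress.ip_interface(value)
        return True
    except ValueError:
        return False


def render_vip_cidr(external_lb_vip_address):
    # Mirrors the template "{{ external_lb_vip_address }}/32" from the question.
    return f"{external_lb_vip_address}/32"


print(is_valid_cidr(render_vip_cidr("203.0.113.10")))     # True
print(is_valid_cidr(render_vip_cidr("sub.example.com")))  # False
```

With a hostname the rendered value is "sub.example.com/32", which is not a usable CIDR, hence overriding `haproxy_keepalived_external_vip_cidr` with a literal address in user_variables.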
moha7What other services do you usually use alongside with your openstack service? I was thinking of monitoring with Prometheus and log management with ELK/EFK. What else?15:39
jrossermoha7: everyone has a different answer to this :) but here we use prometheus exporters for every component in the OSA deploy that supports it, plus ELK for log/journal collection, and linux/network snmp collection with Observium16:06
jrosseron top of that there are some other things you can do with prometheus blackbox_exporter, libvirt_exporter and so on if you need them16:07
damiandabrowskijrosser: looks like certbot-auto is deprecated and no longer available under https://dl.eff.org/certbot-auto16:16
damiandabrowskihttps://eff-certbot.readthedocs.io/en/stable/install.html#certbot-auto-deprecated16:16
damiandabrowskishould we remove it from haproxy_server role and leave 'distro' as a single valid option for haproxy_ssl_letsencrypt_install_method?16:17
jrosseryes i think we should16:17
jrosseras far as i remember certbot-auto was what was in the haproxy role before lots of work got done to make it H/A16:18
jrosseri added support for `haproxy_ssl_letsencrypt_install_method == 'distro'` to take the package from ubuntu repo16:18
jrosserbut i'm not sure what happens on RH though, if thats possible at all16:18
jrosserah looks like it is in EPEL16:20
jrosserwhat we need is someone to maintain the RH-alikes support16:20
damiandabrowskidon't you use centos-based openstack? :D 16:23
damiandabrowskilol either I don't understand it or I just found something really weird16:34
moha7jrosser: Good points. thanks. (I think Zabbix can do what Observium does as I've previously worked with it.)16:34
damiandabrowskiaccording to LetsEncrypt docs, HTTP-01 challenge works only with port 80: https://letsencrypt.org/docs/challenge-types/#http-01-challenge16:34
damiandabrowski"The HTTP-01 challenge can only be done on port 80. Allowing clients to specify arbitrary ports would make the challenge less secure, and so it is not allowed by the ACME standard."16:34
damiandabrowskibut in our docs, we suggest to spawn haproxy certbot service listening on port 443 and it works fine: https://docs.openstack.org/openstack-ansible/latest/user/security/ssl-certificates.html#certbot-certificates16:36
damiandabrowskiso i think HTTP-01 challenge does not really work as they describe it16:36
jrosserwe have a redirect from 80 -> 44316:36
jrosserand there is an ACL also on the port 80 part for the LE backend16:36
jrosser`path_beg` or something i think?16:37
damiandabrowskiahhh16:37
damiandabrowskihaproxy_redirect_http_port: 8016:37
damiandabrowskii missed it sorry16:37
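The port 80 behaviour described above can be sketched as a routing decision. This is an illustration in Python rather than haproxy config; it assumes the ACL matches the standard ACME HTTP-01 path prefix, and the backend name "letsencrypt" is hypothetical:

```python
ACME_PREFIX = "/.well-known/acme-challenge/"


def route_port_80(host, path):
    """Decide what a port-80 frontend does with a request.

    Returns ("backend", name) or ("redirect", url).
    """
    if path.startswith(ACME_PREFIX):
        # Rough equivalent of an haproxy `path_beg` ACL for the LE backend.
        return ("backend", "letsencrypt")
    # Everything else is redirected from 80 to 443.
    return ("redirect", f"https://{host}{path}")
```

So the challenge request still arrives on port 80, which is consistent with the Let's Encrypt documentation; only the rest of the traffic is redirected.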
jrosserNeilHanlon: i'm sure we've seen this before https://zuul.opendev.org/t/openstack/build/54443b4af4a24a2e875fd55cd6d06d4a19:45
jrosserdo you remember what that is about?19:45
jrosseri've rechecked it to see if it's just a mirrors / rocky image out-of-step thing19:46
*** noonedeadpunk_ is now known as noonedeadpunk21:51
moha7https://docs.openstack.org/openstack-ansible-rsyslog_client/latest/ops-logging.html --> this page last updated: 201622:10
moha7Setting `log_hosts` in the openstack_user_config file leads to error: https://paste.opendev.org/show/818616/22:13
jrossermoha7: that is long deprecated i think, which is why so long since an update22:16
jrosserhere is the release note https://github.com/openstack/openstack-ansible/blob/master/releasenotes/notes/remove_rsyslog_roles-05893ed9f8534a39.yaml22:17
moha7Ah22:19
jrossermoha7: i would recommend collecting somehow from the systemd journals22:22
jrosseryou'll find that they are also bind mounted from the LXC containers down to the host, so it should be possible to run some journal collection tool/agent just on the host and collect logs from pretty much everything22:23
jrosserELK + journalbeat, or the journal input to filebeat etc would be one way to do that22:24
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Define some temporary vars for haproxy  https://review.opendev.org/c/openstack/openstack-ansible/+/87232822:34
MrR_Hi people, managed to get some time to work on my openstack setup today, playbooks completed successfully but i'm having trouble with neutron, in the container i kept getting: ERROR neutron.plugins.ml2.managers [-] No type driver for tenant network_type: vxlan. Service terminated so i put neutron_ml2_drivers_type: "flat,vlan,vxlan" in my user_variables, this now throws a massive error that i have pasted here: 22:39
MrR_https://paste.openstack.org/show/818617/ and i have no idea what the actual issue is there. I also get errors in horizon when trying to view the network which say Unable to check if network availability zone extension is supported/Unable to check if DHCP agent scheduler extension is supported/Network list can not be retrieved - all 503 errors.22:39
MrR_Some background, i have a typical bonded setup with the relevant vlans as per the docs and generally followed the docs to set it up, i then have a (currently untagged) bridge on bond0 called br-outside that connects to a switch/router that talks to you guessed it, the outside world, i assume this should just work but i've also tried some variables i may have needed to set to no avail, can anyone help me figure 22:39
MrR_it out? It would be very much appreciated22:39
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Prepare service roles for separated haproxy config  https://review.opendev.org/c/openstack/openstack-ansible/+/87118922:47
jrosserMrR_: i think that your errors in horizon are likely down to the neutron API service not running? perhaps due to the error in your paste preventing it starting properly23:07
MrR_yeah I figured that, just mentioned it as well as. Not sure how to trace the error as i've not messed with any core files only edited user config/variables23:09
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-haproxy_server master: Prepare haproxy role for separated haproxy config  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/87118823:10
MrR_have absolutely no idea what ValueError: :6642: bad peer name format means in all to be honest23:11
jrosserMrR_: unfortunately OVS is not something i am too familiar with, but if you are lucky jamesdenton might be around 23:11
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Prepare service roles for separated haproxy config  https://review.opendev.org/c/openstack/openstack-ansible/+/87118923:12
MrR_i haven't configured ovs so this is just linuxbridge? I'm not sure why both ovs and ovn have been loaded into the container, i did try setting up the relevant ovs config but the above error persists so i havent used it23:14
jrosseroh!23:16
MrR_this is a fresh setup so its not a remaining config either23:17
jrosserwhich branch/release are you using?23:17
jrosserthe reason i assumed it was OVN was thats all over the stack trace in your paste23:18
MrR_one thing is i have become proficient in quickly setting up systems to test openstack haha, i'm using zed (26.0.0)23:18
jrosserok so in Zed OVN is the default, and you need to switch it specifically to linuxbridge if thats what you want23:19
jrosserbut linuxbridge is really not supported any more by the neutron developers, they have marked it as "experimental"23:20
MrR_i'm not fussed in all honesty, just want it working23:20
MrR_thought linuxbridge was default thats all23:20
jrosseryes, it used to be23:21
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Prepare service roles for separated haproxy config  https://review.opendev.org/c/openstack/openstack-ansible/+/87118923:21
jrosserthough that has changed for Zed23:21
MrR_So, knowing that, is it automagically set up or do I now need to follow this: https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html23:23
jrosserfor a multinode deployment that would be what you'd do23:26
jrosserif you were to build an all-in-one (https://docs.openstack.org/openstack-ansible/zed/user/aio/quickstart.html) then the default config will automagically set that up for OVN23:27
jrosseri would always recommend setting up an all-in-one to use as a reference as this is the thing we test over and over in continuous integration tests23:27
MrR_It's a multi node setup, I'll get that done and see how I go, now that i know i was missing a step hopefully i'll be ok23:30
jrosseroh also there is now a later release than 26.0.023:31
jrosserthat was the point we cut the Zed branch and theres been some pretty big fixes since then23:31
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-specs master: Blueprint for separated haproxy service config  https://review.opendev.org/c/openstack/openstack-ansible-specs/+/87118723:31
jrosser26.0.1 is tagged as a release23:32
MrR_so just a git checkout 26.0.1 and follow the minor upgrade path23:35
MrR_i'll do that now, run through the ovn setup and pop back on here tomorrow at some point, i have some time over the next few days to work on this23:37
