Wednesday, 2023-08-16

arthi all05:50
artwe have installed OSA (Zed) + OVN on 3 controllers and 2 compute nodes (called compute01 and compute03)05:50
artovn-northd and neutron-server are installed in containers (on the infras), and the OVN gateways have been set up on the compute nodes. 05:50
arthttps://www.irccloud.com/pastebin/lSSoji45/user%20variable05:51
artAlso, this is our config of the VLAN (br-provider) and tenant (br-vxlan) networks: 05:52
arthttps://www.irccloud.com/pastebin/qyTylsZm/openstack_user_config05:52
artOn our compute nodes, before deploying OSA, we had created br-provider (as an OVS bridge in netplan), then let OSA install the compute nodes. 05:52
artIn this scenario, some instances sometimes had no access to the external network 05:52
artThen I added another compute node (called compute02). The only difference between compute02 and the other computes is that I did not create br-provider manually, and let OSA create it. 05:53
artCurrently, routers of the projects whose first-priority chassis is compute02 have access to the external network, but the others do not.  05:53
artFor example: in the following screenshot, chassis_name “75c7ce80-…” is our compute02, and it has access to the external net.05:53
artSince the default OVN configuration seems to follow the “Centralized Floating IP” architecture, I tried stopping ovn-controller on compute01 and compute03, expecting the router on compute02 to route the traffic of projects whose first-priority chassis is not compute02, such as this one (the first priority of this router is chassis ‘0d3c2a0a-…’, which is our compute01): 05:54
artIt seems I can not send screenshots here! :P05:55
artanyway, stopping all the ovn-controllers on all three compute nodes had no effect on the instances’ access to the external network. It seems the OVN gateway cannot fail traffic over to its second and third gateway priorities. 05:55
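[Editor's note: the gateway-chassis failover state described above can be inspected directly in OVN's databases. A minimal diagnostic sketch, assuming ovn-nbctl/ovn-sbctl have access to the NB/SB databases; the port names with `<uuid>` are placeholders for the real logical router port:]

```
# List the chassis known to the southbound DB; the names should match
# the chassis UUIDs neutron reports
ovn-sbctl show

# Show the gateway-chassis priority list for a router's external port
# ("lrp-<uuid>" is a placeholder for the actual logical router port name)
ovn-nbctl lrp-get-gateway-chassis lrp-<uuid>

# See which chassis currently claims the chassisredirect port binding
ovn-sbctl find Port_Binding logical_port=cr-lrp-<uuid>
```

If the Port_Binding still points at a stopped chassis, failover to the lower-priority gateways has not happened.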
artNot sure if I am making a mistake, or something is wrong with our config.  05:58
artDo you have any idea what’s wrong over here?05:58
jrosserdoes your use of *controller_hosts and *compute_hosts in user_variables actually do what you expect?05:59
jrosserwhat I mean is, where are those defined, and how do you think ansible loads the values?06:00
artwe have defined them in openstack_user_config: 06:14
arthttps://www.irccloud.com/pastebin/RDfVaDm7/06:15
jrosserok, so openstack_user_config is the input to the OSA dynamic inventory which creates any required ansible groups and so on06:33
artexactly06:34
jrosseruser_variables sets ansible vars that you use to override things in group_vars / host_vars or ansible role defaults06:34
jrosserso defining network_northd-hosts in user_variables won’t define an ansible host group, nor is *compute_hosts an ansible variable that you can refer to outside openstack_user_config06:36
jrosserI think it looks like you are trying to define host groups in user_variables, which won’t work06:38
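[Editor's note: a sketch of the split jrosser is describing; treat the keys and IP below as illustrative examples, not the exact layout of this deployment:]

```yaml
# /etc/openstack_deploy/openstack_user_config.yml
# consumed by the OSA dynamic inventory -- this is where host groups come from
compute_hosts:
  compute01:
    ip: 172.29.236.11   # example address

# /etc/openstack_deploy/user_variables.yml
# plain ansible variables that override group_vars / host_vars / role
# defaults -- defining a "something_hosts" key here does NOT create an
# inventory group
neutron_plugin_type: ml2.ovn
```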
artoh, I'm sorry, I wrote that incorrectly here! The OVN groups have been defined in openstack_user_config, not user_variables. I just made a mistake writing them here06:39
artthe config related to OVN in user_variables is just these lines: 06:39
arthttps://www.irccloud.com/pastebin/aQmtD267/06:40
jrosseranyway I think you need ovn-controller running on all chassis regardless of whether they are gateways or not06:41
artnothing else related to OVN is defined in user_variables. All other configs have been defined in openstack_user_config, exactly as you mentioned06:41
artit means I need the ovn-controller daemon on all of my compute nodes, which I currently have06:42
artIs there anything else that should be considered?06:43
jrossersadly I have very limited experience with OVN06:43
jrosserif your compute02 works reliably then this maybe points to some difference in how br-provider was created between netplan and letting OSA do it06:44
artthat's ok. Thank you very much for taking the time to help me with this issue. Really appreciate it 06:45
artyeah, it seems so. I'll try to change all the other computes to this config for br-provider and let you know the result :) 06:45
artThanks again06:45
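[Editor's note: for comparison, a netplan-managed OVS provider bridge might look like the sketch below; the member NIC name is a placeholder. Subtle differences between this and the bridge OSA creates, e.g. fail-mode or port options, could account for the behaviour gap jrosser suspects:]

```yaml
# /etc/netplan/br-provider.yaml -- illustrative only
network:
  version: 2
  ethernets:
    eno2: {}                 # physical NIC carrying the provider VLANs
  bridges:
    br-provider:
      interfaces: [eno2]
      openvswitch: {}        # make this an OVS bridge, not a linux bridge
```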
opendevreviewAndrew Bonney proposed openstack/openstack-ansible stable/2023.1: haproxy: fix health checks for serialconsole in http mode  https://review.opendev.org/c/openstack/openstack-ansible/+/89145207:05
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Remove deprecated variables  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/89156708:03
jrosserarxcruz: ^ looking at codesearch this might be a breaking change, but I am assuming that the design error mentioned in the commit message here has been addressed https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/88417208:05
noonedeadpunkbtw https://review.opendev.org/q/topic:osa%252Fopenstack_resources is another breaking thing08:21
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: Add HTTP/2 support for frontends/backends  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/89157208:37
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: Add HTTP/2 support for frontends/backends  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/89157209:21
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Enable HTTP/2 for TLS-covered frontends  https://review.opendev.org/c/openstack/openstack-ansible/+/89157509:31
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Update haproxy healthcheck options  https://review.opendev.org/c/openstack/openstack-ansible/+/88728509:36
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Rename includelist/excludelist file path vars  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/89157810:00
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Allow include/exclude lists to be defined in many variables  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/89157910:00
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant stable/zed: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89144711:05
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant stable/2023.1: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89127911:06
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/88898511:06
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant stable/2023.1: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89127911:06
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant stable/zed: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89144711:07
jrosseri don't have much luck at all with tempest basic server ops on a baremetal AIO :/11:38
jrosserlots of neutron failed port binding11:38
noonedeadpunkI've just spawned one.11:48
noonedeadpunkEven with Octavia11:48
jrosserthe tempest vars patches seem to be working11:50
jrosserbut i backed out to only basic server ops as it was all very breaking11:50
jrosseri see this https://paste.opendev.org/show/bWJPQOl7ddixcg6TXrKp/11:50
jrosserbut then, because OVN, i'm a bit lost with what to look at next11:51
noonedeadpunkI actually do recall seeing that in CI, but it was solved somehow...11:52
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_tempest master: Ensure test exclusion file is removed when there are no exclusions  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/89158611:52
jrosserlike there's no neutron agent any more to look in the logs of11:52
noonedeadpunkUsually neutron-server and then ovn-northd are good starting points11:53
noonedeadpunkAh. I think it was a hostname mismatch when I saw it in CI11:53
noonedeadpunkthe one defined for ovsdb did not match what's expected by neutron11:53
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-os_neutron/commit/74b0884fc232aa96f601b4c24c3e36f3fba4f1bb11:54
jrosserweirdly, i can manually boot a cirros and apparently attach it to one of the leftover tempest networks11:54
noonedeadpunkActually, we likely can revert this now...11:54
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Revert "Workaround ovs bug that resets hostname with add command"  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/89145311:57
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Do not override tempest_plugins for AIO scenarios  https://review.opendev.org/c/openstack/openstack-ansible/+/89158711:57
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Do not override tempest_plugins for AIO scenarios  https://review.opendev.org/c/openstack/openstack-ansible/+/89158711:58
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Revert "Workaround ovs bug that resets hostname with add command"  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/89145411:59
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Revert "Workaround ovs bug that resets hostname with add command"  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/89145312:00
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Revert "Workaround ovs bug that resets hostname with add command"  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/89145312:00
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Adopt for usage openstack_resources role  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/88987913:26
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_gnocchi master: Use proper galera port in configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_gnocchi/+/89010013:33
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_aodh master: Use proper galera port in configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/89009313:34
noonedeadpunkto unblock adjutant role, we'd need to start merging from Zed: https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89144713:36
noonedeadpunkotherwise upgrade jobs are not happy13:36
spatelFolks! Do you have any suggestions for a good JBOD card for a Dell server? I am looking to use it for Ceph. 13:46
jrosserlsi-logic/broadcom HBA have been ok for us13:51
jrosserspatel: ^ you mean like SAS adaptor to connect a JBOD chassis?13:51
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Stop reffering _member_ role  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/89146214:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Add openstack_resources role skeleton  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/87879414:17
spateljrosser, what do you mean by JBOD chassis? 14:22
jrosserspatel: you've not been very specific - i assumed you meant you had some disk chassis to add to a dell server14:23
spatelI have a Dell server with a RAID controller that doesn't have JBOD mode. I have two options: configure each drive as a single-disk RAID0 and expose it to the OS, or buy a new card which has JBOD mode.14:23
spatelI have 4x 6TB disks on each Dell server for OSDs14:24
jrosserthe firmware on some dell controllers can be reflashed to IT mode14:25
jrosseron my r730 i can configure the drives as "non raid" and they just get passed through to the OS14:26
spatelOh!! I see 14:32
spatelDo you have dedicated mon nodes or are they running on the openstack controllers?14:33
spatelI have 5 ceph nodes and am thinking of running the controller nodes together with the OSD nodes :(14:33
spatelmon is very lightweight and only requires processing when an OSD or any node is down 14:34
noonedeadpunkusually having mon/mgr on osd is not great14:34
noonedeadpunkas then upgrade process is quite cumbersome.14:35
noonedeadpunkor well, maybe with ceph-adm it's not a concern anymore14:35
jrosseri have dedicated mon nodes in that they are in LXD on some other hosts sharing with other stuff14:36
noonedeadpunkyeah, you totally should have some kind of isolation between mon and osd services if you decide to co-locate them on the same hardware14:36
jrosserbut they are in a totally separate failure domain from the OSD14:36
spatelIn cephadm they are kind of isolated anyway, in containers 14:40
noonedeadpunkalso rgw/mds are not _that_ lightweight in case you've ever considered using them14:41
spatelI also have a question about the OS: which one is better for ceph, CentOS vs Ubuntu? 14:41
* jrosser has waaay more hardware for rgw than mons14:41
jrosserlike 10x14:41
spatelI heard podman is the best container runtime for Ceph, so I chose the CentOS distro 14:42
noonedeadpunkWell, I guess it would highly depend on the use case and the traffic served 14:42
noonedeadpunkbut cephadm will deploy centos containers anyway?14:42
noonedeadpunkthere's no selection from what I heard?14:43
noonedeadpunkand dockerhub contains pretty outdated images, so it's also pretty much only quay.io that should be used14:44
spatelwhat do you mean by depends on the use case and traffic? 14:51
spatelcontainers are just for the control plane, right?14:51
noonedeadpunkI think pretty much everything runs in containers? Including OSDs?14:52
jrosserthat was probably to do with my very large number of rgw14:52
noonedeadpunknot 100% sure though14:52
spatelI love CentOS, but am only worried that IBM will mess with it again in future :)14:54
noonedeadpunkbut there're no other options with cephadm?14:54
noonedeadpunkyes, you can run the containers pretty much anywhere, but the containers themselves will have centos anyway?14:55
noonedeadpunkjust check their manifests: https://quay.io/repository/ceph/ceph/manifest/sha256:11e9bcdd06cfd4f6b0fef3da5c2a3f0f87d7a48c043080d6f5d02f7010ad72eb14:58
spatelYes! Ubuntu requires a dedicated repository to run podman 14:58
noonedeadpunk17.2.6 running on CentOS 8 Stream14:59
spatelcentOS does it natively 14:59
jrosserno wonder people are nervous tbh14:59
spatelI don't trust IBM tbh :) 15:00
noonedeadpunkor well: https://github.com/ceph/ceph-container/tree/main/ceph-releases/quincy15:00
spatelThey are business people and they can change policy with a stroke of the pen 15:00
noonedeadpunkthis is the repo from which these containers are built - rhel/ibm/centos15:01
noonedeadpunkand they not only can - they have already done that lately15:02
spatelLet's stick with Ubuntu then.. to hell with centos 15:02
noonedeadpunkbut as I said - you still have CentOS inside images :)15:03
noonedeadpunkor better say - inside containers15:03
noonedeadpunkso as long as you're running cephadm it doesn't really matter what will be on host15:05
spatelnoonedeadpunk No... I have deployed cephadm and it's running ubuntu-based containers 15:06
spatelhttps://paste.opendev.org/show/bOZDh0p3sTj62acovqxh/15:06
noonedeadpunkbut uname will show host kernel15:06
noonedeadpunkas containers are not virtualizing that15:07
jrossertry some dnf command :)15:07
noonedeadpunkwhat's `cat /etc/*release*`?15:07
spatelCentOS Stream release 815:08
opendevreviewMerged openstack/openstack-ansible master: Update haproxy healthcheck options  https://review.opendev.org/c/openstack/openstack-ansible/+/88728515:08
spatelwtf... 15:08
jrosserspatel: this is how containers work, right?15:09
jrosserthere's some $you-shouldnt-have-to-care rootfs / runtime and it uses the host kernel15:10
spatel+1 15:10
jrossersame is true for LXC/LXD15:10
jrosserit's completely trivial to use another OS with LXC/D on ubuntu or whatever15:11
jrosserit's just an OSA thing where we take the choice to tightly couple the OS in the container to the host15:11
noonedeadpunkWell, we technically could de-couple that as well... But then we'd need to pull in container images from linuxcontainers.org rather than building them15:12
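[Editor's note: the point about the host kernel versus the image's userland can be checked in a couple of commands; the image tag is an example, any recent ceph tag from quay.io behaves the same:]

```
# The image's userland reports its own distro (CentOS Stream for ceph)
podman run --rm quay.io/ceph/ceph:v17 cat /etc/os-release

# ...while uname inside the container shows the host's kernel, because
# containers share the host kernel rather than virtualising one
podman run --rm quay.io/ceph/ceph:v17 uname -r
uname -r    # same value when run on the host itself
```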
spatelcurrently we have ceph + cinder but in future we want to set up NFS storage 16:09
spatelcinder with NFS requires mounting the NFS volume on all compute nodes, correct?16:10
noonedeadpunkyup16:16
noonedeadpunkwhich ends up in a need to reboot all computes once NFS becomes unreachable16:17
spatelholy crap!16:18
spatelwhy not lazy unmount ?16:18
spatelDoes cinder create an LVM volume on top of the NFS share and give it to VMs? 16:19
spatelTrying to understand how an NFS share is plugged into VMs :)16:20
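[Editor's note: the cinder NFS driver does not use LVM. It creates plain volume files on the share, and qemu on the compute node opens the file over that node's NFS mount when the volume is attached. A minimal backend sketch; the section name, backend name, and paths are illustrative:]

```ini
# cinder.conf -- illustrative NFS backend definition
[nfs1]
volume_driver = cinder.volume.drivers.nfs.NfsDriver
volume_backend_name = nfs1
nfs_shares_config = /etc/cinder/nfs_shares   # one "host:/export" per line
nfs_mount_point_base = /var/lib/cinder/mnt
```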
jonherhi, i was running a deployment yesterday and finished it up today after hitting an issue with the ceph-rgw-keystone-setup playbook. I'm pretty sure this worked in previous deployments on the same git branch/version when running the 'setup-openstack' playbook: https://github.com/openstack/openstack-ansible/blob/xena-em/playbooks/ceph-rgw-keystone-setup.yml#L1716:31
jonherhere hosts would evaluate to "localhost" and execute there; then, as the vars_file loads, "openstack_service_setup_host" becomes the first utility container as defined in defaults/source_install.yml16:31
jonherwhat i ended up doing was setting "openstack_service_setup_host" in user_variables.yml to the first utility container, just like source_install.yml does. Otherwise it would attempt to use /openstack/utility-<version>/bin/python on the osa-deployer host to run the task.16:31
jonherBut this seems like a bug? And why has it previously worked? Is the variable somehow available from a previous task when running setup-openstack.yml in one pass, or similar?16:31
noonedeadpunkjonher: huh, interesting. But the playbook is expected to load defaults/source_install.yml according to the code? Though now I'm not sure whether ansible evaluates the vars differently, i.e. includes vars only after resolving hosts20:07
noonedeadpunkas that would result in the behaviour you've mentioned20:07
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Bump keystone version to unblock upgrade jobs  https://review.opendev.org/c/openstack/openstack-ansible/+/89163320:11
noonedeadpunkthough I wonder how that worked in CI, as I believe we should run this code in the ceph job, and it's LXC-based, so localhost should not be a valid target...20:12
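[Editor's note: a likely explanation, worth verifying against the ansible docs: a play's `hosts:` keyword is templated before its `vars_files` are loaded, so only inventory vars and extra-vars can influence host selection, and a default that lives only in a vars_file falls through to the play's `default('localhost')`. Pinning the variable in user_variables.yml, as jonher did, sidesteps the ordering:]

```yaml
# /etc/openstack_deploy/user_variables.yml -- make the target explicit so
# the play's hosts pattern no longer depends on vars_files load order
openstack_service_setup_host: "{{ groups['utility_all'][0] }}"
```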
*** bjoernt_ is now known as bjoernt21:41

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!