Wednesday, 2022-10-19

*** ysandeep|out is now known as ysandeep|PTO00:05
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/train: Use cloudsmith repo for rabbit and erlang  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/86179407:06
jrosser_^ thats running an upgrade job on train too, which we probably want to get rid of?07:10
noonedeadpunkYeah, true07:11
noonedeadpunkI also see no way of saving centos 7 lxc job07:11
noonedeadpunkAs centos has dropped their image for lxc07:11
noonedeadpunkAnd for the legacy method we need infra proxy that was likely dropped or super outdated07:11
noonedeadpunkDon't want to mess/fix it07:12
noonedeadpunkI wonder what out of that we actually need https://opendev.org/opendev/base-jobs/src/branch/master/roles/mirror-info/templates/mirror_info.sh.j2#L83-L8507:13
noonedeadpunkor well, we'd need to fix that https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror/templates/mirror.vhost.j2#L259-L26207:16
noonedeadpunkand replace with https://us.lxd.images.canonical.com/07:17
noonedeadpunk(but I'd rather drop)07:17
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Drop usage of lxc containers proxy  https://review.opendev.org/c/openstack/openstack-ansible/+/86182507:20
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/ussuri: Bump SHA for galera, rabbitmq and rally roles  https://review.opendev.org/c/openstack/openstack-ansible/+/85302907:41
noonedeadpunkNah, we can't drop as I guess like Rocky use only this way of images retrieval07:42
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-tests stable/train: Restrict pyOpenSSL to less then 20.0.0  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86183107:45
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-tests stable/train: Restrict pyOpenSSL to less then 20.0.0  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86183107:47
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/train: Use cloudsmith repo for rabbit and erlang  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/86179407:48
gokhanisihi folks, I am trying to set up keystone ldap integration but it didn't work. For testing I created an openldap server on ubuntu focal and created this ldif file > https://paste.openstack.org/show/bEY20h8Nvj5XtaE2zDJC/ this is my keystone domain config > https://paste.openstack.org/show/boQuvnoOHaLZ2lmkFMYd/, I created b3lab domain manually but in keystone logs it says b3lab domain not found.  Maybe I have missed some things to do. 07:52
* noonedeadpunk has no experience in ldap integration08:01
kleinigokhanisi: this is my configuration for OSA to configure keystone with LDAP auth08:24
gokhanisikleini, and it worked for you 08:27
kleiniyes, it works08:34
kleinigokhanisi: is your keystone configuration file located in /etc/keystone/domains/keystone.b3lab.conf?08:37
gokhanisikleini, yes it is like that https://paste.openstack.org/show/bjO6UVFPtOA71srmlcMq/08:41
gokhanisiand this is keystone.conf https://paste.openstack.org/show/bBggmfcI7WMFPZYMGo31/08:43
kleinimaybe turn on verbose and debug logging in keystone. did you try to use ldapsearch to test connection to your LDAP?08:47
gokhanisikleini, it is working with ldap search > https://paste.openstack.org/show/bbOjfu6ZohKIh5PVZaZn/ 08:54
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/ussuri: Switch to tracking stable/ussuri for EM release  https://review.opendev.org/c/openstack/openstack-ansible/+/85302908:54
gokhanisimaybe I am typing the url wrong08:54
gokhanisikleini, thanks it is working now :) and now How can we map openstack projects with ldap objects? 09:02
gokhanisiI can list groups and users on ldap09:02
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/train: Use cloudsmith repo for rabbit and erlang  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/86179409:13
kleinijust do normal role assignments. I create projects then in the same domain - b3lab in your case - and then assign role member to some group or user for that project09:16
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-tests stable/train: Restrict pyOpenSSL to less then 20.0.0  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86183109:17
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-tests stable/train: Return jobs to voting  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86185509:20
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/train: Disable upgrade jobs on EM branch  https://review.opendev.org/c/openstack/openstack-ansible/+/86185809:23
dokeeffe85Hi again, I lost my controller and computes to a power failure and when I rebooted them today after getting them back online I get the following errors https://paste.openstack.org/show/beSCxaSdw95snHjbRdhy/ when trying to attach & start containers. I obviously didn't get a chance to stop the containers before I lost the servers. Is there anyway to fix this or is it a lxc-containers-destroy.yml + lxc-containers-create.yml again? Thanks in 09:56
dokeeffe85advance09:56
noonedeadpunkdokeeffe85: and what does /var/log/lxc/lxc-infra1_utility_container-f80f87fa.log say?09:58
admin1tag 25.1.1 fails on python_venv_build : Install python packages into the venv  =>  ERROR: Error [Errno 2] No such file or directory: 'git' while  executing command git version\nERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH  .. doh !! 10:24
admin1and i have not done any changes or overrides 10:24
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/queens: EOL OpenStack-Ansible Queens  https://review.opendev.org/c/openstack/openstack-ansible/+/86186810:25
admin1ssh to container, apt install git ; re-run playbook .. and its solved .. 10:25
noonedeadpunkadmin1: is it for placement?10:25
admin1no .. c1_heat_api_container10:25
admin1ignore the c1_ 10:26
noonedeadpunknah, then needs patching10:26
admin1where would i see our CI passing logs .. 10:26
noonedeadpunkJust for placement was fixed with https://opendev.org/openstack/openstack-ansible-os_placement/commit/6084c248fcae02c413133329b705678cd75c1bfe10:26
admin1i got no issues on placement 10:26
noonedeadpunkCI can be quite different. As we forcefully enable wheels building there10:27
admin1oh 10:27
noonedeadpunkAnd you will see error if wheels build is disabled for some reason10:27
noonedeadpunklike running with limit is one option10:27
noonedeadpunkand it's result of one "fix" that now more properly evaluates things10:28
noonedeadpunkbut git issues in other places started arising when wheels are not built10:29
noonedeadpunkso would be great if you could push some patch for that10:29
admin1this was in acceptance .. i will deploy tonight in a prod env . .. will have a 100% confirmation then .. 10:31
dokeeffe85noonedeadpunk this is the entire file https://paste.openstack.org/show/bNys7633pfauez8qdhQV/10:43
noonedeadpunkand what if you add `-F` to lxc-start?10:45
noonedeadpunkI just assume that smth is off with either /var/lib/machines mount or some net interface 10:47
noonedeadpunkBut not sure what's exactly10:47
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_tacker master: Add deployment of tacker-scheduler  https://review.opendev.org/c/openstack/openstack-ansible-os_tacker/+/86187010:52
admin1when "external" ceph is enabled, ceph is also not installed in glance 10:52
admin1that blocks the playbook on octavia when it wants to upload the amphora image 10:52
admin1external ceph using cephadm, 10:53
noonedeadpunkI think it depends on what backends you've enabled for glance?10:54
admin1user_variables => glance_ceph_client: glance   ;     glance_default_store: rbd   ; glance_rbd_store_pool: images10:55
noonedeadpunk`glance_default_store: rbd` is the thing that should do the trick actually10:55
noonedeadpunkas that's the condition on when ceph part does run https://opendev.org/openstack/openstack-ansible-os_glance/src/branch/master/tasks/main.yml#L157-L15810:56
noonedeadpunkand `_glance_available_stores: "{{ [ glance_default_store ] + glance_additional_stores }}"` 10:57
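The two things noonedeadpunk points at can be sketched roughly as follows. This is a Python illustration, not the role's code: the list concatenation mirrors the quoted `_glance_available_stores` expression, while the membership test on `rbd` is only an assumption about what the linked condition checks.

```python
def available_stores(default_store, additional_stores):
    """Mirror of `[ glance_default_store ] + glance_additional_stores`."""
    return [default_store] + list(additional_stores)


def ceph_tasks_should_run(default_store, additional_stores):
    # Assumption: the ceph part of the role runs when "rbd" is among the stores.
    return "rbd" in available_stores(default_store, additional_stores)


# admin1's override: glance_default_store: rbd
print(ceph_tasks_should_run("rbd", []))
# A deployment without rbd (illustrative values)
print(ceph_tasks_should_run("file", ["http"]))
```

Under that assumption, setting `glance_default_store: rbd` alone should be enough to trigger the ceph tasks, which matches what noonedeadpunk expects here.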
admin1i am going to return glance playbook with -vvv  and grep rbd/ceph 10:59
admin1return -> rerun 10:59
noonedeadpunkso if you go to interactive python (ie /openstack/venvs/glance-<version>/bin/python) and execute `import rbd` it will fail with import?10:59
admin1yeah .. no /etc/ceph and no packages 10:59
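The interactive check suggested above could also be scripted. A minimal sketch, meant to be run with the glance venv's interpreter; `can_import` is a hypothetical helper name, and `rbd` is the Python module the chat refers to.

```python
import importlib


def can_import(module_name):
    """Return (True, "") if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, error = can_import("rbd")
    print("rbd importable" if ok else f"rbd import failed: {error}")
```

A failure here only says the bindings are not visible to that interpreter; it does not say why the packages are missing.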
noonedeadpunksuper weird11:00
admin1ceph_pkg_source: distro  .. 11:00
admin1without this, build does not work on 22.0.4 11:00
noonedeadpunkit's 22.04?11:00
admin1yeah11:00
noonedeadpunkrly weird11:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-tests stable/queens: Switch linters to EOL  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86187311:08
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/queens: EOL OpenStack-Ansible Queens  https://review.opendev.org/c/openstack/openstack-ansible/+/86186811:08
admin1noonedeadpunk https://gist.github.com/a1git/5f5a129b62e57cd52c8791f5ecd0d98611:11
admin1not sure if it helps 11:11
admin1noonedeadpunk, is this a good way to force  ? openstack-ansible os-glance-install.yml  -e 'glance_default_store: rbd' 11:12
noonedeadpunkbut it shows that ceph.conf is being copied11:12
noonedeadpunkand ceph packages are symlinked properly11:13
admin1now i see ceph.conf :D 11:13
admin1hmm.. 11:13
noonedeadpunkhttps://gist.github.com/a1git/5f5a129b62e57cd52c8791f5ecd0d986#file-gistfile1-txt-L41811:14
admin1i will destroy this container and retry .. could be it fails the first time and then works the 2nd time 11:14
noonedeadpunkand all tasks are OK actually. Nothing was changed11:14
noonedeadpunkum, then tasks would be in changed state11:14
noonedeadpunkaccording to paste I can say nothing was done during this run11:15
noonedeadpunk`c1_glance_container-bf88ca5b : ok=108  changed=1 ` and this changed is forceful user creation11:15
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/queens: EOL OpenStack-Ansible Queens  https://review.opendev.org/c/openstack/openstack-ansible/+/86186811:28
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/queens: EOL OpenStack-Ansible Queens  https://review.opendev.org/c/openstack/openstack-ansible/+/86186811:28
admin1tracked down the issue of glance not working to " installed ceph-common package post-installation script subprocess returned error exit status 6"11:39
admin1cannot even purge it .. 11:40
admin1that is the only diff in relation to ceph i see in cinder vs glance container 11:41
jrosser_try installing that by hand and see what the error is11:44
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_barbican stable/rocky: Stop old uwsgi service if exist  https://review.opendev.org/c/openstack/openstack-ansible-os_barbican/+/86187711:46
dokeeffe85noonedeadpunk it seems that lxcbr0 doesn't exist https://paste.openstack.org/show/bI9tQbBaDAeXkBgVqT4Z/11:53
admin1no logs, nothing except  /usr/bin/dpkg returned an error code   .. 11:53
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible stable/ussuri: Switch to tracking stable/ussuri for EM release  https://review.opendev.org/c/openstack/openstack-ansible/+/85302911:53
admin1installing strace to go deep 11:53
admin1no logs and nothing 11:53
noonedeadpunkdokeeffe85:try restarting systemd-networkd11:54
noonedeadpunklxcbr0 is managed with it11:54
admin1write(6, "{\"jsonrpc\":\"2.0\",\"method\":\"org.d"..., 100) = 100 -- read(6, 0x562324022690, 4096)           = -1 ECONNRESET (Connection reset by peer) -- EBADF (Bad file descriptor)11:55
admin1i am going to deploy it in prod and see if i face the same issue or not 11:56
noonedeadpunkso, train seems to be fixed way easier than ussuri11:56
admin1one quick question .. is there any override to tell glance and cinder to use the same container for instance :D 11:56
dokeeffe85noonedeadpunk, no joy with that. "brctl show" only lists mgmt, storage & vxlan bridges11:56
admin1same repos, same packages .. ceph-common is good in one, bad in another 11:56
admin1i have already lxc-destroyed the containers to recreate 11:57
admin1so will try in a prod env now  to make sure its not my env . 11:57
noonedeadpunkdokeeffe85: as I said check systemd-networkd11:58
dokeeffe85noonedeadpunk yep I restarted it and then tried restarting the container with -F and same result11:59
jrosser_if lxcbr0 is missing you need to look at the service that creates it and see why it is missing12:01
jrosser_there is no point moving on to restarting the container until the bridge is there12:01
admin1ip link set dev lxcbr0 up ? 12:03
dokeeffe85lxcbr0 doesn't exist. Not sure which service creates it jrosser_ it was all working fine until a reboot of the server so it was created initially12:13
jrosser_you have /etc/network/interfaces.d/lxc-net-bridge.cfg ?12:16
admin1dokeeffe85 is it an aio ? 12:25
admin1single node all in 1 install 12:25
admin1dokeeffe85, reboot -- was it after some update/upgrade of packages ? 12:26
noonedeadpunkadmin1: regarding override for glance/cinder - it's definitely smth env.d related12:37
noonedeadpunkshould be doable12:37
noonedeadpunklike create /etc/openstack_deploy/env.d/glance.yml with https://paste.openstack.org/show/bVCyhed2fVR46gT2FQPu/12:39
noonedeadpunknot 100% sure about that so worth to backup openstack_inventory just in case :D12:39
noonedeadpunkactually... likely it won't work as just glance playbook won't run as no hosts will be in glance_all12:40
dokeeffe85jrosser_ yep I have that file. admin1 nope it's not aio. We had a power cut and I lost the three servers 12:40
jrosser_admin1: i dont think that combining glance and cinder is useful to address whatever ceph trouble you had12:41
jrosser_as usual root cause needs to be found12:41
noonedeadpunkthen likely worth to mess up with other parts of env.d12:41
admin1yeah .. working to deploy in prod with full logs .. 12:41
jrosser_dokeeffe85: then you can try to 'ifup' the interface12:41
admin1so that i can share 12:41
noonedeadpunkbut should be overall doable12:41
jamesdentonFWIW: I have resigned to adding a lxcbr0 bridge to my netplan config to avoid the issue mentioned (not being recreated on reboot). I could not easily replicate the issue, and since a) we do baremetal in prod and b) we don't often reboot controllers, i don't see it in the wild ,either12:44
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Mark Zaqar as deprecated in role matrix  https://review.opendev.org/c/openstack/openstack-ansible/+/86188412:45
noonedeadpunkwell there're 2 patches around for this topic - first to move lxcbr fully to networkd and second - allow to avoid osa trying to create it12:46
noonedeadpunk(and manage)12:46
jamesdenton /thumbsup12:47
*** frenzy_friday is now known as frenzyfriday|rover12:52
dokeeffe85jrosser_ nope I can't as the interface doesn't exist, I can't see it anywhere. jamesdenton can you give me a paste of that netplan bridge you created please?13:01
jrosser_ifup is the command to bring up the interface with /etc/network/... type definitions afaik13:02
jrosser_it doesnt need to exist, you need to make it exist with some command13:02
jamesdentonhttps://paste.opendev.org/show/bLtNoxDfwJGIHg9t2KQp/ -- netplan apply would bring it up in this case13:02
admin1noonedeadpunk, i am trying the env.d override to have cinder/glance in the same container .. and changed the line in setup-openstack to install cinder first and then glance .. lets see what that does 13:05
admin1if it works, then i can destroy this env and again start fresh 13:05
jrosser_admin1: we already warned you about the ansible groups being empty doing that13:06
admin1:) 13:07
jrosser_i cannot understand doing this rather than just debug an apt problem13:07
jrosser_tbh there are expected to be issues as you're attempting something thats not tested13:08
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove usage of rsyslog roles  https://review.opendev.org/c/openstack/openstack-ansible/+/86188613:11
jamesdentonjrosser_ Re: OVN ref arch: I am thinking of creating three -- 1) 3 controller nodes + X compute nodes, with DVR and computes as gateway chassis. 2) 3 controller node + 3 network node (gateway chassis) + X compute (non-gateway). 3) 3 controller/network node (gateway chassis) + X compute node. I think that mirrors most of today's deployments. Thoughts?13:18
jrosser_yes that would cover it - though i'm not sure they actually call it DVR these days as it's a little different13:19
jrosser_i13:20
jamesdentoni'll double check13:20
admin1jamesdenton, these days  deployment demand  are now more towards HCI with ceph ..       so 3x nodes ( controllers ) + 3x nodes ( hypervisor )  .. no network nodes ..    the 3x controllers + 3x hypervisors all have  ceph running .. 13:22
admin1compute also act as network nodes 13:22
jrosser_really?13:22
jamesdentonYes, I have seen that pushed more lately.13:22
admin1jrosser_ yes :) 13:23
dokeeffe85Thanks jamesdenton that worked as far as starting the containers but there's other issues now after the reboot that I'll have to dig a bit deeper on before I ask any questions13:23
jamesdentondokeeffe85 sure, just let us know.13:23
dokeeffe85Will do thanks13:23
jrosser_must be fun constraining the memory on a combined ceph/hypervisor when things go $wrong in ceph13:24
admin1hypervisor is only doing the role of the osd 13:24
admin1and not monitors and others .. its the controllers that do them 13:24
jrosser_steady state yeah whatever, but it can get out of hand real quick on the OSD when things are "broken"13:24
jamesdentonadmin1 since we decouple OSA from Ceph deployment, I'll let the operator tack Ceph onto any one of those scenarios13:25
jrosser_lose a portion of your cluster due to a switch or some other problem and the memory usage can get very high13:25
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Add release note about used ansible and ceph versions  https://review.opendev.org/c/openstack/openstack-ansible/+/86188913:26
jamesdentonAnyone here using OSA+OpenDaylight? Or OSA+NSX-T?13:27
noonedeadpunkhyperconverged infra is always "fun". Most fun for osd+hypervisor is RAM consumption. As you always need RAM for domains, but OSDs also require ram. So I think I would reserve a lot of ram for hypervisor in placement if I had to do that13:28
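As a rough worked example of that reservation, here is a back-of-the-envelope sketch in Python. The numbers are assumptions: 4096 MiB per OSD corresponds to Ceph's default `osd_memory_target` of 4 GiB, while the 1.5x recovery headroom and the 8 GiB host overhead are purely illustrative. The result is the kind of value one might feed into nova's `reserved_host_memory_mb`.

```python
def reserved_host_memory_mb(osd_count, osd_memory_mb=4096,
                            recovery_headroom=1.5, host_overhead_mb=8192):
    """Estimate RAM (MiB) to keep away from guests on an OSD + hypervisor node."""
    # Headroom covers OSD memory growth while the cluster is degraded/recovering.
    osd_total = int(osd_count * osd_memory_mb * recovery_headroom)
    return osd_total + host_overhead_mb


# Six OSDs on a compute node: 6 * 4096 * 1.5 + 8192
print(reserved_host_memory_mb(6))  # 45056
```

The headroom factor is the part jrosser_ warns about: steady-state usage is predictable, recovery usage is not, so the real figure should come from observing the cluster.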
noonedeadpunkthere's a bunch of fixes passing for stable branches - would be good if we could merge them sooner than later before they break again :D https://review.opendev.org/q/parentproject:openstack/openstack-ansible+branch:%255Estable/.*+status:open+label:Verified13:32
jrosser_huh lots of depends-on there13:35
jrosser_need to get the order right13:35
noonedeadpunkyeah, quite some... But I expected to be worse tbh13:36
noonedeadpunkit's mostly rabbitmq/erlang thing13:36
noonedeadpunkthat's broken even back to Rocky13:37
noonedeadpunkBut I'm not sure I have enough motivation now to fix Rocky....13:37
noonedeadpunkand rocky not in terms of distro but in terms of openstack release13:40
noonedeadpunkwe should EOL it to get rid of this confusion lol13:40
jamesdentonmgariepy I seem to recall you had some patches to os_neutron for placing OVN gateway chassis? versus all ovn-controllers being gateways? Or maybe you just mentioned wanting to do it, can't recall13:43
jamesdentonif not, i can do it as part of this doc exercise13:43
mgariepylet me look13:44
jamesdentonMy plan is to split out the gateway logic from the neutron_ovn_controller group, and create a second group named neutron_ovn_gateway_chassis, and just manipulate inventory accordingly.13:46
mgariepyhttps://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/76064713:47
jamesdentonyeah ok, same concept13:47
mgariepyyep.13:47
mgariepyadd me to review i can help if you want :D13:47
jamesdentongreat. I'll resurrect that, thank you!13:47
jamesdentondefinitely13:47
jamesdentoni'll put together some steps to duplicate this in 6-9 VMs, depending on the scenario13:48
noonedeadpunkSo regarding zookeeper. I was using fork of this repo https://opendev.org/windmill/ansible-role-zookeeper/src/branch/master13:49
noonedeadpunkAnd I do see how we will struggle with it13:50
noonedeadpunkMaybe worth trying to reach pabelanger but I kind of doubt he will be eager to add config_template...13:52
admin1destroyed everything .. retrying again .. 13:53
admin1this time i will log everything from start13:53
noonedeadpunkas he was not willing to listen about configuring cluster out of the box (https://github.com/openstack-archive/ansible-role-zookeeper/commit/b223d56660ea21f0feb8c6c7bf27dd4bac07a7fe)13:54
opendevreviewMerged openstack/openstack-ansible stable/queens: EOL OpenStack-Ansible Queens  https://review.opendev.org/c/openstack/openstack-ansible/+/86186813:55
admin1kolla+ovn works out of the box ( to get inspired from ) 13:55
jamesdentonit's not inspiration that's lacking :D13:56
jamesdentononly time13:56
admin1getting inspired is subtly saying to copy to save time :) 13:56
jamesdentonit does work for OSA, too. Just looking to make it more complete so we can make it the default vs LXB13:56
admin1i also know ovn is now in starlingx .. but have not tested the latest one 13:56
jamesdentonyes, things have been borrowed here and there :)13:57
admin1no need to reinvent the wheel if it works13:57
jamesdentonbut I think OSA does a better job of spelling out different deployment scenarios versus some of the others, which tend to be a little more... prescriptive/opinionated13:57
admin1yes .. which is why i stick to osa for all prod and (paid) work .. and rest of the time, test others 13:57
admin1we all are operators and so also OSA is better ..    2 weeks back i tried kolla + ovn and could not get octavia to work ..    tried for 2 weeks .. zero replies :) 13:59
admin1kolla/docker works . and so when it works, there are no interaction because it just works and people have no questions .. and when it breaks or does not work for some reasons, no one knows how to answer14:01
noonedeadpunkfwiw we're having operator hours now in https://www.openinfra.dev/ptg/rooms/folsom14:01
jamesdentoni am delayed by 25 more min14:04
opendevreviewMerged openstack/openstack-ansible-tests stable/ussuri: Restrict pyOpenSSL to less then 20.0.0  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86174214:10
opendevreviewMerged openstack/openstack-ansible-tests stable/train: Restrict pyOpenSSL to less then 20.0.0  https://review.opendev.org/c/openstack/openstack-ansible-tests/+/86183114:10
dokeeffe85jamesdenton I left my desk for a bit and when I came back I know have a horizon dashboard and my VM's are up and can ping out so I don't know what happened but it's back. Thanks everyone 14:47
dokeeffe85*now14:47
*** dviroel is now known as dviroel|lunch15:46
nixbuilderI am confused (as is normal) about vxlan for private tenant networks.  The documentation here (https://docs.openstack.org/project-deploy-guide/openstack-ansible/latest/targethosts-networkconfig.html) says that "Note that br-vxlan is not required to be a bridge at all, a physical interface or a bond VLAN subinterface can be used directly and will be more efficient."  So what parameters do I use in openstack_user_config to use a16:17
nixbuilder physical interface?16:17
noonedeadpunkbasically what you need - consistent interface name across net nodes and compute nodes16:24
noonedeadpunkIt can be bridge, but you can also name interface as vxlan in netplan or systemd-networkd16:24
jrosser_is it eventually what is specified here https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L27316:25
noonedeadpunkto make it less confusing you might use `host_bind_override: $name` there as well16:26
noonedeadpunk(iirc)16:26
jamesdentoni would expand that to say, the container_* bits are probably not important these days since neutron agents are on metal and not in LXC anymore (right?), but there is logic that uses the range based on type to populate ml2_conf.ini and the agent configs. we could probably stand to test/update this16:26
jamesdentonhost_bind_override would only be applicable to vlan type, i think16:26
jrosser_ultimately whatever the value of "{{ tunnel_address }}" is what matters in os_neutron16:27
noonedeadpunkiirc it's applied everywhere in bare metal hosts16:27
jamesdentonright, so i might try a deployment and it would be nice if we could eliminate container_* since it's irrelevant16:27
jrosser_it is *hugely* confusing16:27
*** dviroel|lunch is now known as dviroel16:28
noonedeadpunkor you can  just define neutron_provider_networks like here https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/defaults/main.yml#L392-L399 in user_variables and forget about openstack_user_config :D16:28
jamesdentonor that :)16:28
noonedeadpunk(in terms of vxlan/vlan/flat nets)16:28
jamesdenton1,000 ways... to shoot yourself in the foot16:29
noonedeadpunkyup, seems we tried our best to make it confusing...16:29
noonedeadpunkand really huge part is just historical16:29
noonedeadpunkbut we indeed need to review our docs16:30
jamesdentonnixbuilder i think what we're saying is, that you need a VLAN dedicated to overlay traffic and that vlan interface can have an IP on it (the TEP) or the vlan interface could be in a bridge (i.e. br-vxlan) that has an IP on it (the TEP). Based on some logic, that IP will be automatically discovered and used for local_ip in neutron config files16:30
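The "some logic" mentioned here can be pictured as a CIDR match. The sketch below is a Python illustration of that idea only, not the actual OSA implementation; `find_tunnel_address` and the sample addresses are hypothetical.

```python
import ipaddress


def find_tunnel_address(host_addresses, tunnel_cidr):
    """Return the first host address inside tunnel_cidr, or None."""
    network = ipaddress.ip_network(tunnel_cidr)
    for address in host_addresses:
        if ipaddress.ip_address(address) in network:
            return address
    return None


# Hypothetical host with management, tunnel and storage addresses
addresses = ["172.29.236.11", "172.29.240.11", "172.29.244.11"]
print(find_tunnel_address(addresses, "172.29.240.0/22"))
```

In this picture it does not matter whether the address sits on a bridge such as br-vxlan or directly on a VLAN subinterface: only the address falling inside the tunnel network counts.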
jamesdentonnoonedeadpunk agreed. it all comes back to docs <crying emoji>16:31
jamesdentonnixbuilder it may be easier to not define at all in openstack_user_config, and instead use what jrosser_ mentioned. You're then defining everything manually16:32
noonedeadpunktbh what I like about metal deployments is how clean your openstack_user_config is....16:33
nixbuilderThanks everyone... I'll put on another pot of coffee and digest all this.  Thanks again.16:34
jrosser_https://github.com/openstack/openstack-ansible-os_neutron/blob/36a2f02561b9281ee7e46287601f2d21a7fbc142/defaults/main.yml#L385-L38616:34
jamesdentonsure, thanks for asking. it prods us to update the docs16:34
jamesdentonthats a bad default16:34
jamesdentonlol16:34
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts stable/ussuri: Use legacy image retrieval for CentOS 7  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/86174416:36
jrosser_i wonder if theres actually much point at all keeping the vxlan part of `provider_networks`16:36
jrosser_maybe it's needed for feeding into OVS or something, i don't know16:36
jamesdentonwell, i think the only benefit is the range logic.16:36
jamesdentonwe'd have to push that into a separate var or doc or whatever16:37
jrosser_but it could instead be just two obviously named vars16:37
jrosser_the address and the range16:37
jamesdentonsure. we're gonna find out here shortly16:37
jamesdentoni never liked all of those being "provider" networks, anyway. confusion.16:38
jrosser_imho it sort of turns into "container networks" which really is what you might want this for, wiring up the control plane16:38
jamesdentontrue16:38
jrosser_the neutron-ness of it is somehow making it conceptually really hard16:39
jrosser_but then also we do use it to create OVS bridges with the right settings?16:39
jamesdentonnot in this case, that only applies to vlan16:39
jrosser_:)16:39
jamesdentonfor vxlan, since the ip logic is separate, i think it's only the container wiring (not relevant) and range stuffs. So maybe new vars is the way to go, but then also need to keep the provider_networks override mechanism there, too.16:41
* noonedeadpunk is also confused by now16:42
jamesdentonfor vlan, yeah, i think we use container_bridge for building the ovs bridge16:42
noonedeadpunkI think I need to get mnaio admin1 promoted to really have a good play with all options to come up with what we can drop and how simplify16:43
jamesdentonagreed16:43
jrosser_i dont specify any of this at all btw16:43
noonedeadpunkWe do have neutron_provider_networks16:44
jamesdentonyou're using neutron_provider_networks?16:44
jrosser_let me look16:44
noonedeadpunkbut I see lot's of crap in openstack_user_config as well for $reason16:44
jamesdentonyeah, i guess that was intended as the one-stop-shop for abstracting this stuff out? but these days specifying things in both places gets confusing16:45
jrosser_https://paste.opendev.org/show/bQFDXRzwYupWNh14989p/16:46
jamesdentoni see16:46
jrosser_its the same interface name everywhere - except where it's not16:46
noonedeadpunkyeah, it's yet another option I'd say16:46
jamesdentonwell, we never defined the interface, anyway16:46
jamesdentoni think we're relying on matching the CIDR?16:46
jrosser_so its easier to just not bother trying to do it in o_u_c and just do that in group_vars instead16:46
noonedeadpunkyeah, I tend to agree here16:48
jrosser_there might have been a neater way to do that, but if you have non uniformity then the provider_networks thing is very hard to use16:48
jrosser_imho variables are better16:48
jrosser_because you can have them in user_variables globally -> uniform deployment16:48
noonedeadpunkbut again, why would you have non uniformity given you can name interface in netplan/systemd-networkd16:48
jrosser_or you can put them wherever you need in group_vars16:48
noonedeadpunkBut yes, provider_networks is quite complex/unobvious for neutron usecase16:49
jrosser_naming interfaces is hard16:50
noonedeadpunkso I'd rather place them wherever else, and leave provider_networks only for containers usecase16:50
jrosser_as you need to have the mac of everything recorded somewhere to deploy those names16:50
noonedeadpunkwell... In maas and ironic you have I believe?16:50
jamesdentonso, yank all of the neutron-specific "provider_networks'" from o_u_c and direct folks (via docs) to using neutron_provider_networks override? either as group or host vars? or global?16:51
jrosser_and fun times as the behaviour is surprising between focal/jammy and focal/focal+HWE kernel16:51
noonedeadpunkjamesdenton: I'd say yes?16:51
jamesdentoni just went thru this exercise yesterday... real PITA. (interface naming).16:51
jrosser_like PXEboot into focal, mess with the interfaces, install HWE kernel, reboot -> WTF16:51
jamesdentonnoonedeadpunk i think that's fair. keep the logic for upgraded deployments but remove the doc examples16:51
noonedeadpunkyeah, fair16:51
jrosser_jamesdenton: if you have good ideas about interface naming would be interesting16:52
jrosser_we talked about it here this week off the back of nvidia/mlx changing theirs again16:52
noonedeadpunkI totally want to get rid of br-vlan/br-vxlan naming....16:52
jrosser_but did not have a great plan16:52
noonedeadpunkbut indeed seems that bridge is still really consistent in terms of naming...16:53
jamesdentonpfft. seems we had been relying on biosdevname but our recent deploys don't have it. Just had to implement *.link files based on driver and using PATH as the name. Names are obnoxious, but consistent. But even between drivers you can have mild variance in the same slot (ens1f0s0 vs ens1f0s0np0) or something like that16:53
jamesdentonyes, i think eliminating those bridges is a Good Thing™16:54
jrosser_yes that np0 thing is what caught us out16:54
jrosser_theres now np<N> and nv<N> or smth for PF vs. VF16:54
jamesdentonbut it does add consistency. i don't like br-ex -> br-vlan -> bond1 though16:54
noonedeadpunktrue... that's why they're still there. 16:55
noonedeadpunkI feel physical pain for ppl who have br-vlan.100 added to neutron-lxb bridge though...16:55
jamesdentonjrosser_ i have no good ideas, just complaints.16:56
jrosser_understood - we decided to do nothing and just fix everything up for the new names across focal->jammy16:56
jrosser_there was no good answer16:56
jamesdentonnoonedeadpunk Bridgeception16:57
jamesdentonnon-persistent persistent naming. gotta love it.16:57
mgariepyi rename all the interfaces, this way i can upgrade the OS without getting a surprise name ;p17:07
jamesdentonso, i was under the impression that in order to rename an interface in netplan it first had to be identified as something (ie. rename ens1f0 -> management) but that if the interface came up as ens1f0np0 first, then it wouldn't work? Maybe you then also have to specify something more specific (like MAC?)17:08
mgariepyi use MAC17:09
mgariepybut i guess it would be too much work to do it via ansible for physical hw.17:10
jamesdentonthen i guess you just have to be conscious of a chassis swap or nic swap or something? maybe not too common17:10
jamesdentoni think some of the tech debt we have in OSA is trying to be too clever17:10
mgariepyi don't swap nic often.17:10
mgariepya couple of years back netplan was not too great about it either.. when matching by mac and adding vlans, it was trying to rename the vlan interface as well ...17:12
mgariepyfun times.17:12
jamesdentonseems to have matured a bit17:13
mgariepyyeah it works ok now.17:13
opendevreviewMerged openstack/openstack-ansible-os_rally stable/ussuri: Move rally details to constraints  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/86173017:15
opendevreviewMerged openstack/openstack-ansible-lxc_hosts stable/ussuri: Use legacy image retrieval for CentOS 7  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/86174417:15
mgariepyhttps://netplan.io/reference#common-properties-for-physical-device-types 17:19
mgariepyyou can match by driver also.17:20
mgariepybut i do have my full inventory with macs most of the time so it's quite easy for me to set it up with renaming :D17:21
mgariepywith interface renaming i get some consistency also.17:23
mgariepyall my servers have the first 25G interface named 25G-1, whatever pci slot/brand it is.17:25
mgariepyyou can probably get the info from setup module, and filter by speed/type and so on.17:26
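The renaming approach described above (pick interfaces by speed from gathered facts, then pin a stable name to each MAC in netplan) can be sketched roughly as follows. This is only an illustration under stated assumptions: the facts dictionary is assumed to follow the shape the Ansible setup module usually reports (an `interfaces` list plus one entry per interface carrying `type`, `speed`, `macaddress` and `pciid`), and the helper names `pick_by_speed` and `render_netplan` are hypothetical. The netplan keys `match`, `macaddress` and `set-name` come from the netplan reference linked above.

```python
from typing import Dict, List


def pick_by_speed(facts: Dict[str, dict], speed_mbps: int) -> List[dict]:
    """Return physical NICs that report the given speed, ordered by PCI id.

    Assumption: `facts` looks like Ansible's gathered facts, where
    facts["interfaces"] lists names and each interface has its own entry
    (dashes in the name replaced by underscores).
    """
    nics = []
    for name in facts.get("interfaces", []):
        info = facts.get(name.replace("-", "_"), {})
        is_physical = info.get("type") == "ether" and "pciid" in info
        if is_physical and info.get("speed") == speed_mbps and "macaddress" in info:
            nics.append(info)
    return sorted(nics, key=lambda nic: nic["pciid"])


def render_netplan(nics: List[dict], prefix: str) -> str:
    """Render a netplan snippet that renames each NIC to <prefix>-<n> by MAC."""
    lines = ["network:", "  version: 2", "  ethernets:"]
    for index, nic in enumerate(nics, start=1):
        new_name = f"{prefix}-{index}"
        lines += [
            f"    {new_name}:",
            "      match:",
            f"        macaddress: \"{nic['macaddress']}\"",
            f"      set-name: {new_name}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical facts for one host; real values come from the setup module.
    sample_facts = {
        "interfaces": ["lo", "eno1", "ens1f0np0"],
        "lo": {"type": "loopback"},
        "eno1": {"type": "ether", "speed": 1000,
                 "macaddress": "aa:bb:cc:00:00:01", "pciid": "0000:01:00.0"},
        "ens1f0np0": {"type": "ether", "speed": 25000,
                      "macaddress": "aa:bb:cc:00:00:02", "pciid": "0000:3b:00.0"},
    }
    print(render_netplan(pick_by_speed(sample_facts, 25000), "25G"))
```

Matching on MAC keeps the name stable across kernel or naming-scheme changes, at the cost of having to update the inventory after a NIC or chassis swap, as noted in the discussion.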
spatelI have a question folks: by mistake i deleted the monitoring user in galera and now haproxy is saying mysql is down17:54
spatelHow do i quickly add the monitoring user back?17:54
spatelwhat does that monitoring user have to do with haproxy?17:58
jamesdentonmight be as easy as: CREATE USER 'monitoring' IDENTIFIED BY '{{ galera_monitoring_user_password }}';17:59
admin1you can do grant connect,select  on *.* to monitoring@'%'  identified by 'password from secrets'17:59
jamesdentonhttps://github.com/openstack/openstack-ansible-galera_server/blob/5200b50cf650fb5ad5e0733b9e0ead207dbf6c6a/vars/main.yml#L31-L5117:59
admin1well, you can just add quickly and later fix the specific permissions17:59
spateljamesdenton i didn't find any password in /etc/openstack_deploy/user_secrets.yml18:00
admin1spatel the pass could also be in the haproxy cfg 18:01
spatelNothing here - cat /etc/openstack_deploy/user_secrets.yml | grep galera_monitoring_user_password18:01
jamesdentonok hmm18:02
spatelnothing in haproxy.cfg file 18:02
mgariepyin the clustercheck script inside the galera container18:04
spatelThis is bizarre.. :(  18:04
jamesdentonlooks like a fairly recent addition i guess18:04
spatelThis is the script - https://paste.opendev.org/show/bYkLV1m5l2ZjrlVjObsa/ 18:06
spatelNo password there, maybe it uses the mysql root password?18:07
spatelMYSQL_PASSWORD="${2-}" ?18:07
jamesdentonmaybe theres no password?18:07
jamesdentoni think that's a hash18:08
anskiythere is a password, I can see it in user_secrets18:08
opendevreviewMerged openstack/openstack-ansible-rabbitmq_server stable/train: Use cloudsmith repo for rabbit and erlang  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/86179418:08
spatelanskiy why is my user_secrets not showing it18:08
jamesdentonhttps://github.com/openstack/openstack-ansible/commit/302c8226e6ea51d9e0c76050b470d525dfb33d6018:08
spatelanskiy what is the name in user_secrets?18:09
jamesdentonsee the release note? i think it implies there was no password18:09
anskiyspatel: I think, during the upgrade to Y, when you should have run the script to check for missing secrets, you'd have had to add one18:09
jamesdentonYou can also override variable to18:09
jamesdenton     ``galera_monitoring_user_password: ""`` to not use password for auth and18:09
jamesdenton     preserve previous behaviour18:09
jamesdentonmaybe just start with creating the monitoring user w/ no pass and see if that works18:10
spatelNow what should i do to bring it back quickly and fix my production :(18:10
spatelI didn't know it's so important18:10
anskiyspatel: `galera_monitoring_user_password` is the variable name in my user_secrets18:11
spatelanskiy i don't have that in my user_sec file18:11
jamesdentonspatel what OSA version? anskiy what OSA version?18:11
spatelWallaby 23.3.018:12
anskiyjamesdenton: I've added it when I've upgraded to Y (that was stable/yoga at the time)18:12
anskiyoh, you're on W, nvm then, I guess...18:12
jamesdentonright, so Wallaby wouldn't have that logic18:12
jamesdentonspatel create a monitoring user with no password in galera18:13
jamesdentonthat ought to do it18:13
spatellet me do ...18:13
spateldone! - CREATE USER 'monitoring' IDENTIFIED BY '';18:14
spatelLooks like that fixed my issue jamesdenton18:15
jamesdentongood deal18:15
spatelhaproxy is happy now18:15
spatelThank you so much jamesdenton 18:16
spatelThis is total mess :) 18:16
jamesdentonas anskiy mentioned, once you upgrade you may need to update secrets to add that var, as a password will be required18:16
jamesdenton^^ release notes ought to cover this18:16
admin1^^ yes .. release note was there 18:17
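For reference, the fix discussed above (recreate the `monitoring` user, with an empty password on Wallaby and a real one after upgrading to a release that expects `galera_monitoring_user_password`) can be sketched as a small statement builder. This is a hedged illustration only: the function name `monitoring_user_sql` and the default host `%` are assumptions, and any grants the role normally applies are not covered here.

```python
def monitoring_user_sql(user: str = "monitoring", host: str = "%", password: str = "") -> str:
    """Build a CREATE USER statement for the galera monitoring user.

    An empty password mirrors the pre-Yoga behaviour described in the chat;
    pass the value of galera_monitoring_user_password once it is required.
    """
    def quote(value: str) -> str:
        # Minimal SQL string quoting for illustration; prefer a real client
        # library with parameter binding outside of a quick recovery.
        return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

    return f"CREATE USER {quote(user)}@{quote(host)} IDENTIFIED BY {quote(password)};"


if __name__ == "__main__":
    # Empty password, as used to recover the Wallaby deployment above.
    print(monitoring_user_sql())
```

The printed statement would then be run from a mysql client on one galera node.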
admin1spatel, still have ovn on prod ? 18:18
admin1any issues so far ? 18:18
spatelYes i am running ovn in production without any issue. 18:18
Adri2000hello jrosser_, I filed this bug about an issue with ansible-role-pki, would appreciate your input when possible :) thanks https://bugs.launchpad.net/openstack-ansible/+bug/199357519:00
Adri2000I took the opportunity to close this old (2016) related wishlist bug that's actually fixed since the introduction of openstack_host_ca_certificates: https://bugs.launchpad.net/openstack-ansible/+bug/1649844 (apparently I'm also the reporter of this bug report!)19:05
jrosser_Adri2000: the run_once certainly looks like it could be an issue19:08
jrosser_which playbook would you be expecting to install this CA for you?19:08
jrosser_either playbooks/containers-lxc-create.yml or playbooks/openstack-hosts-setup.yml i expect19:10
jrosser_if you are able to test it out removing that run_once it would be helpful - even better a patch :)19:10
Adri2000jrosser_: playbooks/containers-lxc-create.yml, as I'm targeting the Keystone containers. for now the workaround I used is to run the playbook limited to Keystone containers only, so the task will run_once on a Keystone container and therefore will have the variable correctly set. I guess removing run_once should work, but I'll test to be sure, and can push it as a patch then19:15
jrosser_seems i used rather too much run_once in the PKI role19:15
jrosser_Adri2000: also take a look in vars/main.yml of the PKI role - there are other ways in there to provide your own certs in as many variables as you like19:18
Adri2000interesting, didn't know that19:28
*** dviroel is now known as dviroel|biab20:47
*** dviroel|biab is now known as dviroel21:59
*** dviroel is now known as dviroel|out23:03
