Tuesday, 2023-04-18

noonedeadpunkmornings08:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable floating IP usage in magnum_cluster_templates  https://review.opendev.org/c/openstack/openstack-ansible/+/88004708:27
jrossermorning08:30
jrosserthis is looking reasonable https://review.opendev.org/c/openstack/openstack-ansible/+/871189/3409:16
jrosserdamiandabrowski: if we add TLS + no TLS jobs now to the opentack-ansible repo then we would be able to test both situations as we merge the TLS backend things09:18
damiandabrowskiokok, do you have any suggestions how to handle it? should we just enable tls backend for one already existing job or create new one? 11:01
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_blazar master: Add uWSGI support to blazar  https://review.opendev.org/c/openstack/openstack-ansible-os_blazar/+/88065111:06
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-os_blazar master: Add TLS support to blazar backends  https://review.opendev.org/c/openstack/openstack-ansible-os_blazar/+/88065211:06
noonedeadpunkpffffff11:17
noonedeadpunkSeems gates are broken again11:17
noonedeadpunkNeilHanlon: `Error: Failed to download metadata for repo 'appstream': repomd.xml parser error: Parse error at line: 1 (Extra content at the end of the document`  for the https://mirrors.rockylinux.org/mirrorlist?arch=$basearch&repo=AppStream-$releasever$rltype11:19
jrosserfrom our org wide slack where everything also has blown up `The Rocky Linux people believe they've fixed the problem and things should be recovering now`11:34
noonedeadpunk++11:37
admin1i did an upgrade from 26.0.1 -> 26.1.0 and now horizon does not load .. https:// on horizon internal IP gives This site can’t provide a secure connection 172.29.239.156 sent an invalid response.13:36
admin1ERR_SSL_PROTOCOL_ERROR13:36
admin1https://cloud.domain.com returns 503 13:37
*** cloudnull6 is now known as cloudnull13:37
admin1is there some internal tls or https:// or cert thing ? 13:37
noonedeadpunkthere should not be any TLS from haproxy to horizon (yet)14:07
NeilHanlonnoonedeadpunk, jrosser: should be all fixed now. sorry :( 14:11
NeilHanlonnever try to 'simply' replace your domain controllers14:11
noonedeadpunkhehe14:13
noonedeadpunkI was just gonna do that :D14:13
noonedeadpunk#startmeeting openstack_ansible_meeting15:03
opendevmeetMeeting started Tue Apr 18 15:03:52 2023 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.15:03
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:03
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:03
noonedeadpunk#topic rollcall15:03
NeilHanlono/15:03
noonedeadpunko/15:04
noonedeadpunkhey everyone15:04
damiandabrowskihi!15:04
jrossero/ hello15:04
noonedeadpunk#topic office hours15:06
noonedeadpunkI've just found out that we somehow missed trove for https://review.opendev.org/q/topic:osa/pki15:06
noonedeadpunkI'm going to prepare a fix for that and was kinda wondering if we wanna backport it15:06
jrosserwhat does it use it for?15:07
noonedeadpunkrabbitmq?15:08
jrosseroh you mean kind of like this one https://review.opendev.org/c/openstack/openstack-ansible-os_murano/+/79172615:09
noonedeadpunkyup, exactly15:09
noonedeadpunkanother thing that is broken at the moment is zun. 15:09
noonedeadpunkIt's been a while since we've bumped version of kata, and now kata is gone from suse repos (obviously)15:10
damiandabrowskii was gonna talk about it15:10
noonedeadpunkI've attempted to install kata from github sources but did not spend much time on it, to be frank15:10
noonedeadpunkalso I'm not quite sure if it should still be integrated with docker or if just podman would be good enough with modern zun15:11
noonedeadpunkgo on damiandabrowski :)15:11
damiandabrowskifirstly i thought that disabling katacontainers as the default for debian/ubuntu is the best option (it's an optional component anyway and IMO it should be somehow "fixed" on the zun side because they mention an invalid repo in their docs: https://docs.openstack.org/zun/latest/install/compute-install.html#enable-kata-containers-optional)15:12
damiandabrowskibut disabling kata on master is not enough because obviously upgrade jobs do not pass CI15:12
damiandabrowskiso it can be solved by cherry-picking this change to stable branches which doesn't sound good :|15:13
damiandabrowskihttps://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/88068315:13
noonedeadpunkI think the issue is also that the main job times out15:14
noonedeadpunkso it actually does not work either15:15
noonedeadpunkI kinda have same "result" with https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/880288?tab=change-view-tab-header-zuul-results-summary15:15
noonedeadpunkBut the thing is that container is never ready15:15
noonedeadpunkso regardless of kata - it needs a closer look15:15
noonedeadpunkgood thing is that octavia seems to be sorted out now15:16
damiandabrowskiah, i thought the timeout issue was just a matter of recheck but probably you're right :|15:16
jrosseri will look through our notes here15:17
jrosserwe did some stuff with zun but didn't ever deploy it for real, but the kata thing is a mess15:17
noonedeadpunkI think it was a while back as well... 15:20
jrosseryeah15:20
noonedeadpunkI'd imagine that docker/podman can be a mess as well15:20
noonedeadpunkas kata now suggests to just go with podman 15:21
noonedeadpunkand no idea if zun has support for that, as it hasn't really been developed lately15:21
noonedeadpunkwe're also super close to merging https://review.opendev.org/q/topic:osa/systemd_restart_on_unit_change+status:open15:22
noonedeadpunkSo Zun, Adjutant and Magnum15:23
noonedeadpunkFor Adjutant and Magnum we need to fix upgrade jobs by backporting stuff.15:23
noonedeadpunkOnce we land this I will go through the patches and backport them as we agreed at the PTG15:24
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_trove master: Add variables for rabbitmq ssl configuration  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/88076015:25
noonedeadpunkAnother thing. There was a ML thread over the weekend calling for volunteers to maintain the OVS->OVN migration in neutron code. The migration is done for TripleO but I decided to pick up this challenge and adapt/refactor it for OSA as well15:27
damiandabrowskigreat!15:28
mgariepyi will have an ovs deployment to migrate to ovn at some point.15:28
mgariepybut i don't have any cycle right now15:28
opendevreviewMerged openstack/openstack-ansible-os_keystone master: Use chain cert file for apache  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/87991415:29
noonedeadpunkYeah, me neither, but it's smth I'd love to have :)15:30
noonedeadpunkjamesdenton: btw, I can recall you saying smth about LXB->OVN? Do you have any draft?15:30
noonedeadpunkAs maybe it's smth I could take a look at during this work as well15:30
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Update release name to Antelope  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/88076115:31
noonedeadpunkAlso. RDO folks were looking into adding an OSA aio deployment to their CI to spot issues early. They were barely aware that we have this feature, so with the death of tripleo this path can get some attention. At the very least awareness has been raised a bit15:33
noonedeadpunkI guess that's why we've seen these great doc patches for aio :)15:37
jrosserthey were nice patches15:37
noonedeadpunkBtw, I've also changed naming of the project on this page https://docs.openstack.org/zed/deploy/index.html15:40
noonedeadpunkSo it's clearer what the project does compared to the others 15:40
noonedeadpunkDamn. Just spotted that "Guide" is used twice....15:40
damiandabrowskianother thing I wanted to raise is blazar haproxy service15:44
noonedeadpunkmhm15:45
damiandabrowskihttps://review.opendev.org/c/openstack/openstack-ansible/+/88056415:45
damiandabrowskiseems like blazar doesn't have '/healthcheck' implemented and it requires authentication for all API requests15:45
damiandabrowskiit makes it hard for haproxy to monitor backends (haproxy always receives a 401 http code):15:45
damiandabrowski{"error": {"code": 401, "title": "Unauthorized", "message": "The request you have made requires authentication."}}15:45
damiandabrowskiDo you think it's ok to fix it by applying the below change? (at least it works on my aio)15:45
damiandabrowski  haproxy_backend_httpcheck_options:15:46
damiandabrowski    - 'expect rstatus (200|401)'15:46
noonedeadpunkSo it requires auth also for `/`?15:46
damiandabrowskiyeah15:46
noonedeadpunkthat sucks15:46
noonedeadpunkyeah, it doesn't seem to have api-paste...15:47
damiandabrowskiwe have something similar for murano(but without regex)15:47
damiandabrowskihttps://opendev.org/openstack/openstack-ansible/src/commit/3f9c8300d8d09832607d2670cb3425a59bb26ac1/inventory/group_vars/haproxy/haproxy.yml#L39215:47
noonedeadpunkBtw. I kinda wonder if for murano we could just drop /v1 instead15:48
noonedeadpunkdamiandabrowski: we have smth similar for rgw btw https://opendev.org/openstack/openstack-ansible/src/commit/3f9c8300d8d09832607d2670cb3425a59bb26ac1/inventory/group_vars/haproxy/haproxy.yml#L16815:48
noonedeadpunkbut yes, I think that fix would be fine15:49
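A minimal sketch of the fix discussed above, as it could look in user_variables.yml; the surrounding override variable name is an assumption for illustration, only the httpcheck option itself comes from the discussion:

haproxy_blazar_service_overrides:            # assumed override hook, not a verified variable name
  haproxy_backend_httpcheck_options:
    - 'expect rstatus (200|401)'             # accept 401 since blazar has no unauthenticated /healthcheck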
jamesdentonhi noonedeadpunk 15:50
jamesdentonhttps://www.jimmdenton.com/migrating-lxb-to-ovn/15:50
noonedeadpunkaha, great, thanks!15:51
noonedeadpunkhave you tried btw running vxlans with ovn?15:51
jamesdentonI think it was written before we went to OVN in Zed, so the skel manipulation may not be required anymore15:52
jamesdentonI don't think i have tried vxlan, as i recall this: Also, according to the OVN manpage, VXLAN networks are only supported for gateway nodes and not traffic between hypervisors:15:52
jamesdentonhttps://www.ovn.org/support/dist-docs/ovn-controller.8.html15:52
jamesdenton" Supported  tunnel  types  for  connecting hypervisors are15:53
jamesdenton                     geneve and stt. Gateways may use geneve, vxlan, or stt."15:53
jamesdenton /shrug15:53
noonedeadpunkaha15:53
noonedeadpunkok, yes, I see. As I heard rumors that it's doable...15:54
jamesdentonit might be, let me see if i get any sort of error trying. If all nodes are gateway nodes, then maybe?15:55
noonedeadpunkhuh, might be.. But then all communication between VMs will be possible only through public networks I assume?15:56
noonedeadpunkor well, through gateways15:56
damiandabrowskiah, there's one more thing. As jrosser pointed out, we should somehow test tls backend in CI.15:58
damiandabrowskiDo you have any ideas how should we do this?15:58
damiandabrowskienable tls backend on some of the already existing jobs or create new ones?15:59
damiandabrowski(I'd appreciate some help here as I'm not really experienced with zuul :|)15:59
noonedeadpunkI think we need to add a new job for at least 1 distro (like jammy) that would differ from the default16:00
noonedeadpunkbut we should then discuss what job we want16:01
jamesdentonnot public, just that every node can be an egress point16:01
jamesdentonprobably better to get vxlan->geneve16:01
noonedeadpunkyeah, I think it's better indeed...16:02
noonedeadpunkdamiandabrowski: meaning - no tls at all, or no tls between haproxy and api, or no tls for internal at all16:02
noonedeadpunkmaybe no tls for the internal endpoint and none between haproxy->uwsgi would make the most sense to me16:03
noonedeadpunkI think this is good example https://opendev.org/openstack/openstack-ansible/commit/b59b392813c060139860afb74682ce664d89556216:03
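A hypothetical sketch of what such an extra job could look like; the job and parent names and the TLS toggle variable are assumptions rather than the actual OSA zuul layout (the linked commit shows the real pattern for adding jobs):

- job:
    name: openstack-ansible-deploy-aio_lxc-ubuntu-jammy-tls-backend   # assumed job name
    parent: openstack-ansible-deploy-aio_lxc-ubuntu-jammy             # assumed parent job
    vars:
      # assumed variable consumed by the AIO bootstrap as a user_variables override;
      # true or false depending on which behaviour the default jobs keep
      osa_enable_tls_backend: false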
noonedeadpunk#endmeeting16:03
opendevmeetMeeting ended Tue Apr 18 16:03:58 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:03
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-04-18-15.03.html16:03
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-04-18-15.03.txt16:03
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2023/openstack_ansible_meeting.2023-04-18-15.03.log.html16:03
damiandabrowskiso if we're going to create one new job with no tls enabled at all, I assume we aim to enable tls backend by default?16:07
jrosserdid we decide at PTG what the defaults would be?16:08
damiandabrowskiI don't recall it(but maybe I forgot about something)16:09
noonedeadpunkno I don't think we did16:10
noonedeadpunkOr well, maybe during the previous PTG. But then the arguments were based on the excessive complexity we'd need to maintain16:11
noonedeadpunkNow, when it's no longer mandatory to have it...16:11
noonedeadpunkMaybe we want to keep TLS for haproxy<->uwsgi disabled by default16:12
noonedeadpunk(i don't really know)16:12
damiandabrowskimaybe disabling it by default is ok for now but we still should enable it in CI?16:13
damiandabrowskithen we can still do as you say and keep tls disabled only for one job16:14
noonedeadpunk(or enabled by 1 job)16:15
noonedeadpunkI'd say that CI should mostly follow the default behaviour16:16
noonedeadpunkas then by default we also don't have TLS for internal endpoints IIRC16:16
noonedeadpunkIt's only AIO thing16:16
damiandabrowskiyeah, but on the other hand it's very unlikely to break something only for non-TLS, while it's quite easy to break something for TLS :D 16:17
noonedeadpunkTo be frank I'd rather discuss that next week and vote16:17
damiandabrowski+116:17
opendevreviewMerged openstack/openstack-ansible master: Add missing blazar haproxy service  https://review.opendev.org/c/openstack/openstack-ansible/+/88056416:52
damiandabrowskihas anyone seen strange behavior during facts gathering recently? Now I'm struggling with issues with masakari, but I had a similar issue with nova a few days ago17:57
damiandabrowskihttps://paste.openstack.org/raw/bdYgzmf4Aem1iVoVRKsH/17:57
damiandabrowskii tried to comment out the pacemaker_corosync role but then I just got a similar error later:17:57
damiandabrowskihttps://paste.openstack.org/raw/b2KGLxNhn174Pb6uItwf/17:57
damiandabrowskiremoving /etc/openstack_deploy/ansible_facts content doesn't help17:58
damiandabrowskibut manually running setup module ansible -m setup masakari_all does help17:58
damiandabrowskii tried to run os-placement-install.yml but I did s/placement/masakari/g beforehand and it worked fine17:59
damiandabrowskii really can't explain it :|17:59
damiandabrowskirunning setup-hosts.yml also fixes the issue so that's probably why we don't see it in CI18:04
noonedeadpunkdamiandabrowski: we have a patch merged lately for masakari18:09
noonedeadpunkor maybe it's not even merged yet - not sure18:09
noonedeadpunkhttps://review.opendev.org/c/openstack/openstack-ansible-os_masakari/+/88036018:10
noonedeadpunkhttps://review.opendev.org/c/openstack/openstack-ansible/+/88045918:10
noonedeadpunkdamiandabrowski: you should do some reviews to stop fighting with bugs that are already fixed :D18:10
damiandabrowskiyou may be right... :D 18:15
damiandabrowskiregarding https://review.opendev.org/c/openstack/openstack-ansible-os_masakari/+/88036018:18
damiandabrowskican you explain how/where tests/ansible-role-requirements.yml file is used?18:18
damiandabrowskii thought that /tests directory inside service roles is not used these days18:19
noonedeadpunkit's not18:19
noonedeadpunkbut meta is 18:20
noonedeadpunkso whenever you include the role it will trigger a run of apt_cache_pinning that will fail on missing facts18:20
noonedeadpunkSo 2 things - we need to ensure we gather facts and we don't need this role to run :)18:21
noonedeadpunkI'm actually not 100% sure whether smth is completely wrong with facts gathering or whether meta is being processed even before pre_tasks (it actually can be)18:21
noonedeadpunkBut regardless it won't hurt18:21
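A minimal illustrative sketch of the first point (gather facts before the role and its meta dependencies run); this is not the merged "Gather generic masakari facts" change, just the general pattern:

- name: Gather facts for masakari hosts
  hosts: masakari_all
  gather_facts: false
  tasks:
    - name: Gather facts explicitly so meta role dependencies have them
      setup: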
damiandabrowskiokok18:43
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Implement separated haproxy service config  https://review.opendev.org/c/openstack/openstack-ansible/+/87118919:10
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Fix blazar haproxy service  https://review.opendev.org/c/openstack/openstack-ansible/+/88077519:10
psyminI'm hoping to set up a basic/simple 3 machine instance of openstack ansible on rocky.  Going with Zed and Rocky 9 at the moment, but I'm open to other versions.  Bumbling through the process now.  I expect to have to reinstall the OS a number of times before I get it right.19:25
psyminI'm not sure what I'm missing in the deploy guide.   My hope is to have, for now, the most simple setup possible with three baremetal servers.19:43
noonedeadpunkpsymin: hey. So, where are you getting stuck then?:)19:45
damiandabrowskimaybe we will be able to help if you can provide more details ;)19:45
noonedeadpunkthough, I think we will be away now - it's quite late already in EU :(19:48
psyminshould I manually create the lvm volumes individually on the target machines?19:51
psyminEach of the three machines has four nics.  I have them set up with four /24 networks.  I have them named Ext, Stor, Virt, and Mgmt.  Does this sound acceptable?19:55
noonedeadpunkpsymin: so lvm volumes for cinder storage?19:57
noonedeadpunkor how do you want to utilize lvm?19:57
noonedeadpunkregarding networks - sure, that does work19:58
psyminAt the moment I don't care if I use lvm or not.  I just want to succeed with an ansible deploy so I can feel more confident about the process :)  Then I'll play with it for a while, and probably reinstall the OS, change some parameters, and deploy again.19:58
noonedeadpunkpsymin: question - have you tried out the AIO setup? That sets up everything on a single VM19:58
noonedeadpunkAs that's smth I'd suggest starting with then19:58
psyminI have done the AIO in a single server and it functions.  19:58
noonedeadpunkOk, gotcha. 19:59
noonedeadpunkYou can also replace dummy interfaces there with real ones, and expand setup for new controllers19:59
noonedeadpunkto get mnaio :D19:59
psyminmulti node all in one?20:00
noonedeadpunkyup. We actually have some code/doc here https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/multi-node-aio but I'm not sure how relevant it is to be frank20:01
noonedeadpunkHaven't used it in quite a while20:01
noonedeadpunkThough, I think with just a manual install you should be on the right track20:01
noonedeadpunkSo LVM is needed mostly for cinder, as a volume backend20:01
noonedeadpunkand yes, it needs to be configured manually, at least the PV/VG part20:02
psyminFor the finished system, we'll definitely need block devices.20:02
noonedeadpunkBut cinder is not _really_ required20:02
noonedeadpunkWell, nova does provide block devices as well20:02
noonedeadpunkand it can use just qcow files on compute node filesystem and even live migrate with that20:03
noonedeadpunkIt's not very handy, as it manages disk size with flavors, so you'll need to have way more flavors. On top of that you can have only 1 block drive (or well, 2-3 if you count swap and config drive)20:04
psyminfrom the mindset of wanting the least configuration to start with, and building from there / redeploying, what would I need to configure?20:04
noonedeadpunkso cinder allows you to attach/detach extra ones whenever needed20:04
psyminHere is my openstack_user_config.yml, which I assume is missing essential info and probably has some incorrect info :) https://paste.centos.org/view/raw/353bd6d620:05
noonedeadpunkaside from infra (like repo/galera/rabbit/utility/haproxy/keepalived) you will absolutely need keystone, nova, neutron, glance and placement20:05
noonedeadpunklog_hosts is not valid anymore20:06
psyminremoved, thank you20:06
noonedeadpunkAlso I'd assume having same names of the server with same IP20:07
psyminwill I need to install rabbit manually on the targets, or does the deploy host do that with ansible for me?20:07
noonedeadpunkrabbit should be part of os-infra_hosts20:07
noonedeadpunkSo OSA does manage that as well as mariadb galera cluster20:08
noonedeadpunksorry, shared-infra_hosts not os-infra_hosts :)20:08
noonedeadpunkAlso, do you wanna play with ironic?20:08
psymingood catch, nope, we won't be needing that, removing20:09
noonedeadpunkAnd do you want bare metal deployment or using LXC?20:09
psyminI doubt we'll need bare metal deployment and will only have these three bare metal servers for openstack.20:10
noonedeadpunkLet me rephrase myself a bit:)20:10
psymineach has 8 ssds, 6 cores, and 64gb of ram, currently only one SSD is partitioned and used for the host OS, the rest are unpartitioned20:11
noonedeadpunkSo there're a couple of options available. 1 - deploy services like api, scheduler, rabbit, etc. to LXC containers. 2 - deploy all that directly to these machines20:11
noonedeadpunkas in 1st case you will likely need to define provider_networks as well20:12
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/stable/xena/etc/openstack_deploy/openstack_user_config.yml.aio#L3920:12
psyminwould that be global_overrides: ?20:12
noonedeadpunkyep20:12
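For reference, a hedged sketch of what the global_overrides/provider_networks section can look like for an LXC-based deployment; the VIP addresses, bridge-to-interface mapping and group_binds below are assumptions based on the four networks mentioned earlier and the linked example, not a verified config for this environment:

global_overrides:
  internal_lb_vip_address: 192.168.103.100    # assumed free IP on the mgmt network
  external_lb_vip_address: 192.168.104.100    # assumed free IP on the routed network
  management_bridge: br-mgmt
  provider_networks:
    - network:
        container_bridge: br-mgmt
        container_type: veth
        container_interface: eth1
        ip_from_q: container
        type: raw
        group_binds:
          - all_containers
          - hosts
        is_container_address: true
    - network:
        container_bridge: br-vxlan
        container_type: veth
        container_interface: eth10
        ip_from_q: tunnel
        type: vxlan
        range: "1:1000"
        net_name: vxlan
        group_binds:
          - neutron_linuxbridge_agent   # adjust to the neutron backend actually in use
    - network:
        container_bridge: br-storage
        container_type: veth
        container_interface: eth2
        ip_from_q: storage
        type: raw
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute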
psyminwe don't have a hardware load balancer, does that mean we shouldn't configure external_lb_vip_address ?20:14
noonedeadpunkum, no? So we're deploying haproxy with keepalived by default, which fails the IP over in case of any trouble20:15
noonedeadpunkIt does not require a standalone loadbalancer20:15
psymincool20:15
noonedeadpunkmostly it's just fine to co-locate these with the rest of the control plane20:15
noonedeadpunkunless you start serving object storage through it and want good throughput20:16
noonedeadpunkbut I mean - then you'll need hardware LB anyway20:16
noonedeadpunksry, I need to head out now, it's getting quite late... folks are mostly around during UTC business hours20:16
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible-haproxy_server master: Define blank _haproxy_service_configs_simplified  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/88078120:17
psyminI don't intend to need object storage20:18
psyminthank you20:18
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Revert "Skip haproxy with setup-infrastructure for upgrades"  https://review.opendev.org/c/openstack/openstack-ansible/+/88009120:20
noonedeadpunkdamiandabrowski: maybe we should add meta/clear_facts to the end of https://review.opendev.org/c/openstack/openstack-ansible/+/871189/35/playbooks/common-playbooks/haproxy-service-config.yml instead?20:20
noonedeadpunkas like 880781 saying that we rather should?20:21
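A sketch of the idea being floated here (not a merged change): a final task in playbooks/common-playbooks/haproxy-service-config.yml that drops cached facts so later plays re-gather them:

- name: Clear cached facts after haproxy service config   # illustrative only
  meta: clear_facts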
* noonedeadpunk signing off20:22
damiandabrowskiso...i tried to solve it with meta: refresh_inventory but it didn't help20:23
damiandabrowskii can try with clear_facts tomorrow if you think it's safe20:23
damiandabrowskibut i'm not sure what's wrong with 880781 :D 20:24
noonedeadpunkmeh, I don't know... just a bit meh... I wonder if defining it in task vars would do the trick as well20:32
noonedeadpunkbut also - we don't import that part multiple times?20:33
noonedeadpunkwe do tasks_from always?20:33
opendevreviewDamian Dąbrowski proposed openstack/openstack-ansible master: Add support for TLS backends  https://review.opendev.org/c/openstack/openstack-ansible/+/87908520:33
noonedeadpunkso maybe it's "broken" due to import_role vs include_role?20:34
damiandabrowskii'll work on this tomorrow, my brain is not working anymore... :D 20:38
opendevreviewMerged openstack/openstack-ansible-os_masakari master: Drop apt_package_pinning from role requirements  https://review.opendev.org/c/openstack/openstack-ansible-os_masakari/+/88036020:39
opendevreviewMerged openstack/openstack-ansible master: Gather generic masakari facts  https://review.opendev.org/c/openstack/openstack-ansible/+/88045920:41
noonedeadpunksure thing, I'm not saying that it should be done now hehe, just throwing ideas aloud20:47
NeilHanlonhey psymin, welcome :) 21:01
psyminhowdy21:01
psyminI fear that I might need more handholding than I'd like to admit.21:02
psyminwhen you deploy with rocky 9, do you get the warnings "Failed to parse /opt/openstack-ansible/inventory/dynamic_inventory.py" ?21:07
psyminperhaps I should try with different distros and compare the output21:08
jrosserpsymin: I think ansible tries to determine if it is an ini file, something like that21:08
jrosserthe warning is ok21:08
NeilHanlonjust getting my lab back up. what branch did you decide to deploy psymin? Zed or Antelope?21:09
psyminNeilHanlon, Zed, since it seemed to me that it was "done"21:09
jrosserpsymin: you can look at the output for all the different distros in our CI jobs21:09
jrosserNeilHanlon: psymin openstack-ansible is a “cycle trailing” project which means we get 3 months after the openstack projects make a release to finalise ours21:10
jrosserso for OSA the most recent release is Zed, and we still work on Antelope21:11
NeilHanlonope, yeah. bad question on my part21:12
NeilHanlonpsymin: it might be best to try and start "fresh" on your server and try the deployment from the beginning. an AIO on a single server is a good starting point to understand how it all works together, before starting on a multi-node (distributed) setup21:14
psyminI've done AIO on a single server a few times21:15
NeilHanlongotcha - how are your servers connected?21:15
NeilHanlonyou'll want to do a bit of planning on the networks you need, and how you'll configure the interfaces21:15
NeilHanlonhttps://docs.openstack.org/openstack-ansible/zed/user/network-arch/example.html21:16
psyminfour nics, four /24 networks, mgmt, stor, virt, ext21:16
jrosserdon’t be afraid to use vlans if your switch supports that21:18
psyminit does support vlans21:19
jrosserthings are vastly simpler particularly with external networks if you use trunk ports/vlans21:19
psyminWhere is the CI for OSA and distros online?21:20
jrosserspecifically network type “flat” in openstack looks appealing because it is conceptually simple, but “vlan” type eventually needs less config and is more flexible21:20
jrosser^ for external networks21:20
NeilHanloni've burned many hours on that very thing heh21:20
psyminWhichever is easiest for me to grasp temporarily to get a functional test environment so that I have renewed vigor to continue :)21:21
jrosserwell, like NeilHanlon said you can add extra compute nodes to an existing AIO relatively easily21:21
jrossermight start from a slightly different host network setup but it will be very similar21:22
jrossertake away the NAT and stuff that bootstrap-aio does and you're pretty much there21:22
jrossertrouble is, OSA is really like a toolkit21:23
jrosserso you can make really anything you like, and there’s not really a right answer for anything21:23
* NeilHanlon needs to contribute some example RHEL network configs21:24
jrosserso perhaps what I mean is that it’s important to understand/plan what you want, rather than expect the tool to do magic and decide for you21:24
jrosserthen express what you want in the config21:25
jrosserthis is most true for networking21:25
NeilHanlonThere are a _lot_ of knobs to tune/touch/play with, if you want to, but most of them are not required for many deployments21:26
NeilHanlonthis is a really good "just starting out" guide - https://docs.openstack.org/openstack-ansible/zed/user/test/example.html21:26
NeilHanloncan even remove heat from there, I think, to make it more simple21:27
psyminis it acceptable to have storage1, compute1, and infra1 all bound to the same server?21:27
jrosserNeilHanlon: I think we lack a “homelab” type doc, everything we have is oriented more at larger scale and lots of H/A21:27
psyminwe have three baremetal machines, all with ample cpu, disk, nics and ram to utilize21:28
NeilHanlonpsymin: a more "hyperconverged" setup is definitely possible21:29
NeilHanlonas long as your networking and such is configured and defined in the user_config, OSA doesn't "care" where you put things21:29
jrosserpsymin: you can certainly do that, just be mindful of how much ram you need to reserve for the services vs vm - otherwise the OOM killer will cause havoc21:29
NeilHanlonjrosser: one of the things i'm working on is a project that will deploy to different cloud providers (currently digital ocean and vultr), and then install a cluster on top of those nodes21:32
admin1when I do a curl http://horizon-internal-ip, it redirects me to https://horizon-internal-ip 21:32
admin1but it's not listening on https://21:32
NeilHanloncloudception 21:32
admin1so where did that https redirect come from 21:32
admin1curl 172.29.239.156:80 -I  =>  Location: https://172.29.239.156/auth/login/?next=/   ;  .. i see apache2 listening on both 80 and 443, but the one on 443 does not respond .. could be looking for a cert that haproxy does not have21:33
jrosseradmin1: doesn’t the haproxy config have an 80->443 redirect as part of it?21:36
jrosseradmin1: actually what do you mean horizon-internal-ip?21:37
admin1backend horizon-back  has something like server r2c1_horizon_container-19ab5602 172.29.239.156:80 check port 80 inter 12000 rise 3 fall 3  .. but when I curl to that .156:80, it redirects to 443 21:37
jrosserthe ip of the backend? or haproxy?21:37
admin1and that 443 does not open 21:37
admin1that is the ip of the backend 21:37
jrosserso the answer would be in the horizon container I think21:38
jrosserlikely the Apache config21:39
admin1i destroyed everything and recreated it, same result .. so most probably some config turns ON this setting 21:39
psyminNeilHanlon, It sounds like you're suggesting I reinstall rocky 9 on all these servers, then rewrite my configs, then try deploying again?21:43
jrosseradmin1: look in the horizon role at the openstack_dashboard.conf.j2 template - it’s fairly clear what’s going on21:43
* jrosser enough for today21:43
NeilHanlonpsymin: that's "cleanest", at least for the node you started the deployment on. but you can also probably get away with deleting a few directories which contain the major outputs of the bootstrap-ansible.sh script21:48
admin1jrosser, thanks .. i will try . but i do not get it and also why it is breaking all of a sudden 21:49
jrosseradmin1: well we really don’t change stuff much on stable branches21:49
psyminNeilHanlon, I can probably have the reinstall and prep done tomorrow.  Perhaps I can coerce you to help with the openstack_user_config.yml for deploying?21:50
NeilHanlonof course, happy to give advice21:50
psyminAwesome!  I also have some questions about the network.21:50
admin1i think horizon_enable_ssl gets enabled somehow .. or it was not activated properly in mine, so it did not appear before 21:51
admin1i will set it to false and rerun the playbooks .. thanks jrosser21:52
psyminI assume Container / Management / br-mgmt needs to be routable to the lan and not completely isolated.  Storage / br-storage can be isolated.  Overlay / br-vxlan .. does that need to be routable to the internet or can it be isolated?21:53
NeilHanlonoverlay should be isolated, as it's your guest's project traffic21:53
admin1everything can be isolated as well ..  in my case br-mgmt, br-storage, br-vxlan are all unrouted networks on their own private vlans 21:54
NeilHanlonyep, that's also true. depends how you define 'isolation'21:54
NeilHanlonas long as the relevant hosts can speak to one another21:54
psyminPerhaps I should have the management interface on the same LAN that all of our desktops are on?21:55
psyminrather than be its own?21:56
NeilHanlonfrom a security perspective that's not as good. if it's in their own network you can firewall them off. plus there is a fair amount of traffic on that network21:57
jrosser^ don’t do that:)21:57
psyminokay :)21:57
admin1you can have one ip that can be reached over the office lan on an interface of its own, and then just use it as VIP/NAT 21:57
psyminso we'll just have to route between the lan and management/container ?21:57
admin1or have 1 haproxy, router or L3 switch map your internal api endpoint -> external ip 21:57
psyminI hope the management network isn't accessible from the external world, just Lan 21:58
psyminWe'll wireguard in if we need access21:58
admin1on controllers, say you have eth0 -- this is where you ssh to log in to the server .. it has an ip from your network/lan .. on the same controller, you can have one or multiple network cards, or just this eth0 with different vlans; on top of those, you have br-mgmt, br-vxlan, br-storage etc 21:58
admin1then your cloud IP ( VIP ) will be something on the eth0 which internally connects it to br-mgmt 21:59
psyminwe currently have four physical nics on each server, so we might as well use them IMO.22:00
psyminit sounds like Overlay / br-vxlan doesn't need any routing and can be isolated, same with Storage / br-storage.  Container / br-mgmt will need routing to the local LAN so we can access the horizon interface.  Then we have another nic for external network access?22:01
admin1br-mgmt is not routed . you use the same IP you SSH to on the server, for example, as a proxy endpoint which gives you access to the services running on br-mgmt22:02
admin1you need to ssh to the server right ?  so think of 1 separate IP on the ssh range that will NAT ( in our case haproxy )   and give you access to all the services running on br-mgmt22:02
psyminso if eno1 has IP 192.168.104.101 (ext) and eno3 has IP 192.168.103.101 (management) .. are you suggesting I set up an ssh tunnel to allow me access to the horizon web interface that is bound to management?22:03
psymineno1 is what I'm currently sshing to22:03
admin1no, the ssh IP itself 192.168.104.101 will have haproxy running, so port 80 of .101 will proxy and provide you the services running on the 192.168.103.x ( management ) range 22:05
admin1have you ever used lxc or docker in a system ? 22:05
admin1how do you expose it ? 22:05
admin1you use something like nginx or haproxy to map  network reachable ip to the internal ips 22:05
psyminI'm most familiar with qemu / kvm22:05
psyminI have used docker22:05
admin1lets use docker 22:06
admin1docker creates 172.x ip 22:06
admin1your system may have 192.168.x.1 22:06
admin1so you run haproxy or nginx on 192.168.x.1 and map internal docker IP/port for it to be reachable from outside 22:06
admin1think of the same in the openstack case 22:06
psyminWhen you say "outside" here you're meaning the local lan?22:07
admin1br-mgmt , br-storage, br-vxlan are like docker .. no one from outside sees them directly 22:07
admin1yes22:07
admin1your ssh ip is what will be used to expose these services via haproxy to the outside lan 22:07
psyminit sounded like the IP I ssh to is supposed to be on the management network.  Did I misread the documentation?22:08
admin1you have private  br-mgmt, br-storage, br-vxlan range ..   and ssh ip 22:08
admin1you have 4 network cards ? 22:09
admin1what are their speeds ? 22:09
psymingigabit22:09
admin11gb each ? 22:09
psyminyes, technically I think some support more but our switch doesn't22:09
admin1how many controllers ? 22:10
psyminlooks like they're all 10 gig nics but connected to 1 gig ports on the switch22:11
NeilHanlonplenty of buffer space, then ! 22:11
admin1how many controllers are you starting with ? 22:11
NeilHanlonadmin1: that's the question, basically22:11
NeilHanlonit's a three node "lab" sort of cluster, it sounds like22:12
psyminI'm not sure what you're meaning by controller.  There are three baremetal servers, each one I'd like to have offer cpu, ram and disk.  If they can all be "controllers" that'd be handy.22:14
admin1you can have 1 controller , and 2 computes 22:14
admin1what is your storage system ? 22:14
admin1where do you plan to save your images and volumes 22:15
psyminwe have 8 sata ssds on each server22:15
psyminI hope to use 7 ssds on each server to offer storage22:15
psyminone is for os22:16
admin1do you plan to use ceph ? 22:16
psymina fantasy was to use ceph, but that adds more complexity than I can handle at the moment.22:16
psyminif you think ceph will simplify things, great22:16
psyminfor our needs ceph is overkill, but it would be good to know22:17
admin1you can have 3 servers, all on ceph .. create your ceph cluster with 3 nodes .. so i would give 2x ssd for the OS and 6x for ceph 22:17
admin1it depends on how this cluster is going to be used 22:17
admin1will it grow, will there be paying customers, what are growth prospects, if things go awesome, how do you see growth in 6 months, what kind of workload profile etc 22:18
psyminno customer data, mostly just our own deployments, email server, web server, probably nextcloud22:18
psyminno paying customers, only our virtual machines22:18
psyminmigrating away from qemu / kvm 22:18
psyminhowever, our product does get deployed to openstack environments in our customer networks, so having one locally will be of great use to us22:19
admin1what is the server spec ? 22:19
psyminin summary we won't even come close to any bottlenecks on these servers22:19
NeilHanlonpsymin: a controller in this instance is basically "a host which runs the components of OpenStack which are required to run OpenStack". the biggest thing to be concerned with when having less than, say 2, controllers, is you don't have redundancy of those components. For example, if you have one controller and it goes down, your cluster no worky22:19
psyminI agree, redundancy would be great.  Multiple controllers would be great.  22:20
psyminif things "go awesome" there will be no growth and everything will run stable22:21
admin1you can have 1 server as controller,  use 2x ssd disk for OS .. and then raid10 the other 6 .. this will be used for glance and cinder ..22:22
NeilHanlonhow much CPU/RAM do the nodes have? the services which run openstack use their own resources, so it could make sense to use one controller with two compute nodes for now, since then your compute nodes are only doing compute stuff (well, and storage)22:22
psymingrowth for our situation shouldn't put any additional load on these servers22:22
admin1the rest of the 2 servers can be used for your workload22:22
NeilHanlonthen if in a year you need another compute node, buy another controller too22:22
psyminNeilHanlon, 64 gigs of ram, 6 1.9ghz cores each22:23
NeilHanlonas long as you have backups of your database and the important stuff, it's not the end of the world to have one controller22:23
NeilHanlondepends on your SLAs :D 22:23
psyminthe hope and plan is to never need any more hardware than we have, since it is so excessive and growth of the company won't put any more load on these servers22:23
psyminwe'll be the only ones using the vms, no customer machines on them22:24
NeilHanlonright, but if you host a mail server for example, your boss might be mad if email is down for a few days while you restore the controller22:24
psyminyep, definitely that, so we should have two (or three) controllers if possible.22:25
psyminIt isn't possible to have one provide compute and be a controller?22:25
admin1for internal ones, if no SLA is needed and no one will dance on your head .. then 1 controller is fine 22:25
admin13 controllers will be a waste of resources as they are just replicating stuff22:25
psyminwasting resources is absolutely fine22:26
psyminbut 2 makes sense22:26
admin12 is split brain 22:26
admin11 or 3 22:26
psyminokay, three sounds good22:26
NeilHanlonin that case you'd just have your three hosts as all the targets in your openstack_user_config22:27
psyminThat sounds perfect!22:27
NeilHanloninfra hosts, compute host, storage hosts.. etc22:27
admin1no but if you have 3 controllers, where is your compute ? 22:27
admin1unless you buy more 22:27
admin1hardware 22:27
psymincan compute not run on a controller?22:27
NeilHanlonadmin1: also on the 'controller' hardware22:27
NeilHanlonhyperconverged22:27
admin1you should not 22:28
psyminadmin1, would running AIO be better?22:28
admin1as compute will eat the cpu and processing of the api and your cluster will die 22:28
admin13 node is fine .. 22:28
admin11x = controller, 2x = compute 22:29
admin11x = controller + storage  ( also maybe network )  , rest 2 = compute for your workload 22:29
NeilHanlonbut there's no redundancy for controller components then, admin122:30
psyminsounds like you're saying that mixing compute and controller will somehow create a feedback loop that eats itself?22:30
psyminif all it does is max one cpu 24/7 that is fine22:30
NeilHanlonAIO is a controller with integrated compute/storage, so it can definitely work22:31
admin1how many cpus do you have ? 22:31
psyminadmin1, six cores per server, so 18 total22:31
admin1psymin, your controller may struggle even on its own, without also being a compute .. based on your workload 22:31
NeilHanlon_may_ require some tuning of the CPUs to dedicate cores for project compute22:32
admin1mysql, rabbitmq, network, apis - they are in constant chatter 22:32
psyminI'm okay with testing, deploying and realizing it is awful.  That would be a great start.22:32
admin1then you start with 1 controller and 2 computes 22:32
psyminokay22:33
admin1so server 1 = raid1 for 2x ssd where you install the OS .. ( debian/ubuntu ) and  6x raid10 for your glance and cinder 22:33
psyminif that is okay, I'm going to skip the raid for now22:34
admin1it's more hassle if you skip the raid 22:34
admin1because how else are you going to expose the cinder and glance 22:34
psyminraid for cinder and glance makes sense22:34
admin1raid10 on the 6x ssd  and expose them via nfs is the cheapest ( in terms of simplicity and resources utilization) path 22:35
psyminlet's back up a moment22:36
* NeilHanlon has to step away for dinner. biab22:37
psyminokay, for the final deployment I'll set up mdraid.  But for a test deployment, like what I'm trying to do first, I think it can be safely skipped.22:40
psymindoes openstack want the block devices partitioned and formatted?22:41
admin1you want storage for images and volumes .. how do you plan to pass that to openstack ? 22:41
psyminwhichever way it prefers22:42
admin1so the preferred way for your resources will be exposing it via nfs 22:42
psyminokay, so before deploying openstack w/ ansible, I should partition and format the disks, and set up nfs?22:43
admin1for controller, yes 22:43
admin1for computes, you can also have all disks in one big raid10 with a single /boot, swap, and / for everything else, for simplicity 22:44
psyminI'm not sure why that isn't in the openstack ansible guide.22:44
admin1because there are more than 4 dozen ways to do storage 22:44
admin1it all depends on what you have and what you want 22:44
admin1so it's not possible to put this in a guide saying this is what you must have or should do 22:44
psyminat this point I want whatever is easiest to get working with ansible.22:44
admin1which i already told you . controller: 2x ssd raid1 for the os (ubuntu), 6x ssd raid10, mount the raid10 to say /srv, create glance and cinder folders and expose them via nfs to your br-storage range 22:45
admin1compute: for simplicity, put everything on raid10 so that you have redundancy + speed, and create /boot, swap and the rest all as / 22:46
psyminokay, before using ansible, set up raid 1 on two disks, raid 10 on six, install ubuntu to the raid1 disk, mount raid 10 (ext4?) to /srv and serve it up with nfs to the storage network.22:49
admin1i would use xfs, but ext4 is up to you 22:49
psyminin the ansible config set this for cinder to use nfs? https://docs.openstack.org/openstack-ansible/12.2.6/install-guide/configure-cinder-nfs.html22:50
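A hedged sketch based on that guide: an NFS backend for cinder defined under storage_hosts in openstack_user_config.yml. The host name, IPs and share path below are assumptions matching the /srv layout discussed above:

storage_hosts:
  infra1:
    ip: 192.168.103.101                 # assumed mgmt-network IP of the controller
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        nfs_volume:
          volume_backend_name: NFS_VOLUME1
          volume_driver: cinder.volume.drivers.nfs.NfsDriver
          nfs_mount_options: "rsize=65535,wsize=65535,timeo=1200,actimeo=120"
          nfs_shares_config: /etc/cinder/nfs_shares
          shares:
            - ip: "192.168.102.101"     # assumed storage-network IP of the NFS server
              share: "/srv/cinder"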
admin1do you have a vlan with IPs to use ? 22:50
admin1what is the ip range you plan to use for the vms ? is it a new ip range, some old ip range,  is it on a specific vlan , can you ask it to be tagged on a vlan ? 22:51
psyminI currently have four nics on four private /24 networks.  Three of them are isolated, including management.22:51
admin1is management = ssh ? 22:52
psyminmanagement is not currently ssh, because it is isolated :(22:52
admin1you need one non isolated network for the vms to be reachable from the office22:52
psyminthe nic and network that is not isolated I call "ext" for external access.22:53
psymindo I need more than one nic to not be isolated?  I can rename the ext network to management if that is wise.  Or talk to my coworker to figure out how to get another network routable.22:53
admin1yes 22:54
admin1do you have a ext network that is not isolated ? 22:54
psyminyes22:54
admin1is it via a specific network interface ? 22:54
psyminthat is what I'm sshing with at the moment22:54
psymineno122:54
admin1you need 1 more 22:54
admin1where the .1 or .254 is a router, without dhcp, and possibly in a tagged vlan 22:55
admin1as openstack controls the dhcp and assigns the ip 22:55
psymin192.168.104.1 is a router that isn't serving DHCP to that network, it is the default route for these machines and goes over eno122:56
psyminthe machines have 192.168.104.101, 192.168.104.102 and 192.168.104.103 on eno1 .. eno2, eno3 and eno4 follow a similar format and have IPs assigned and are isolated.22:57
psyminyou're saying I need one more interface that is routable?  I assume it shouldn't be the interface w/ storage, or virtual, so that leaves management22:59
admin1you can have your ssh on eth0 .. so it's 192.168.104.101, 102 and 103 .. you can put your haproxy etc on 192.168.104.100 or .99 and point cloud.domain.com to that .. eth1 can have br-mgmt and br-storage, eth2 can be br-vxlan .. and then finally eth3 can be your routed network like 192.168.105.1 on a VLAN, and this eth3 on the switch will not be an access port but a trunk to this vlan23:06
admin1so you can add vlans, say 11-19 or 21-29 etc, to ports eth1, eth2 and eth3 .. this way, you can run multiple isolated networks from the same port 23:07
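A hypothetical netplan sketch of the trunked-VLAN layout admin1 describes (one NIC carrying several isolated networks behind bridges); the interface name, VLAN IDs and subnets are placeholders, not recommended values:

network:
  version: 2
  ethernets:
    eno2: {}
  vlans:
    eno2.11:
      id: 11
      link: eno2
    eno2.12:
      id: 12
      link: eno2
  bridges:
    br-mgmt:
      interfaces: [eno2.11]
      addresses: [192.168.103.101/24]
    br-storage:
      interfaces: [eno2.12]
      addresses: [192.168.102.101/24]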
psyminWhat would you call that network on eth0?23:07
admin1ssh/oob/office network 23:08
admin1management in openstack terms is the internal openstack api network 23:08
admin1its 1 am .. i have to go :D 23:08
psyminokay, eno1 (eth0) is already set up with routing and I ssh there :)23:08
admin1can continue tomorrow 23:08
psyminsleep well, thank you23:08
NeilHanlonnoonedeadpunk generic question when you're around. the openvswitch3.1 change in openstack_hosts; I just went to deploy a rocky AIO and it was missing the exclude on rdo-deps.repo. is it correct we need to bump ansible-role-requirements.yml in stable/zed to pull updated commit?23:58
