Thursday, 2023-09-07

jrosserso it looks like the proxy job was broken with https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/889934 06:04
jrosserand unfortunately it does not run on that change06:04
jrosseri suspect it's that we don't set `environment:` here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/os-keystone-install.yml#L43-L52 06:05
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Apply deployment env vars during keystone main_pre  https://review.opendev.org/c/openstack/openstack-ansible/+/894004 06:09
jrossernoonedeadpunk: did you follow the octavia discussion yesterday..... looks like my br-octavia/br-lbaas change is pretty incorrect07:33
jrosserjust different brokenness /o\07:34
noonedeadpunkah, no, I was already out07:37
farbodhello Farbod again07:40
jrosserbr-lbaas cannot be both from systemd-networkd and ovs at the same time07:40
jrosserbut would not exist from ovs until neutron role runs07:41
jrosserand controller may never have ovs at all with separate network nodes07:41
farbodFirst thing what IRC client do you suggest for linux ubuntu?07:42
jrosserfarbod: in the past i used irssi+screen, but eventually now i use irccloud for more convenience07:42
farbodThanks07:43
noonedeadpunkjrosser: but systemd can create it as ovs bridge?07:43
jrosserwell it can07:43
jrosserbut like i say on my controllers there would not be an ovs at all07:43
noonedeadpunkmhm07:44
jrosserso we need some guidance/examples for different situations07:44
noonedeadpunkWell, technically LXC can connect to OVS instead of LXB nicely, but it's completely another topic07:45
jrosserand not just OVS tbh - you need OVN running there too07:45
jrosserotherwise the neutron network would not be actually wired to br-lbaas07:45
farbodSo yesterday i looked at the neutron documentation and OSA network architecture. I realized that there are two bridge networks for instances: br-vxlan and br-ex. as i understand the br-vxlan is the tunnel (overlay) network. but what exactly does this overlay network do?07:45
noonedeadpunkugh07:46
jrosserfarbod: in openstack there are two kinds of networks, ones which the openstack operator owns and are usually associated with something physical, like your internet connection or some company internal network07:47
jrosserthese are called provider networks07:47
jrosserthen there are neutron networks that users can create themselves as part of their project in openstack, these (generally but not always) are internal to the openstack deployment07:48
farbodso you mean neutron uses br-vxlan for this kind of networking. I mean additional user defined networks, subnets etc?07:49
jrosserthe self-service networks that users create can be implemented using some range of underlying vlan id on an interface, or they can be done with vxlan or geneve overlays07:49
jrosserthis is why we have a br-vxlan, it's a kind of placeholder interface for some IP address on the compute/network hosts to create endpoints for virtual networks built with vxlan/geneve07:50
jrosserin a real deployment you can have an actual interface or bond for this07:50
farbodas i understand br-vxlan is neutron network for additional defined networks in cluster and br-ex is for external access to physical net?07:51
jrosserthats right (basically)07:52
farbodOK07:52
farbodand another question07:52
jrosserhave you looked at things like vxlan before?07:52
farbodnot really07:53
jrosserok so br-vxlan is not actually a neutron network07:53
farbodi think neutron uses br-vxlan for its networks07:53
jrosserit is some interface (you can use a different one if you wish) where neutron creates the "tunnel endpoint"07:53
jrosserthe neutron network that a user creates is actually one of those tunnels, rather than br-vxlan itself07:54
jrosserbr-vxlan is the "transport" for all the different tunnels07:54
farbodoh nice07:54
farbodso what is the connection between neutron and br-ex?07:54
jrossergenerally that is where you as the operator of openstack connect your external physical networks07:55
jrosserthere can be as many as you need07:55
farbodso how to create a network and assign IPs from br-ex to instances?07:56
jrosserso this is one of the design choices you have to make07:56
jrosserin a multitenant openstack a user would be in a project, and in that project they could create their own neutron network with some vlan or overlay approach07:57
farbodand another question. yesterday one of you guys provided me a config for br-ex with vlan and flat options: https://paste.opendev.org/show/b93fNbuLFMMR7jPkCDrH/ 07:57
jrosserthen they would create a neutron router, and connect it to their network and the external network. the router would do NAT07:58
farbodmy question is that in vlan mode does openstack creates a VLAN Interface for each network and in flat it doesn't?07:58
farbodjrosser: i get it07:59
jrosserif you want traffic for an external IP to go directly to a VM you can then create a neutron floating IP taken from the range on br-ex 07:59
jrosserso thats a multitenant approach07:59
jrosserhowever you can also allow VM to connect straight to external networks, if you want that07:59
jrosserpros and cons everywhere07:59
jrosserin the vlan mode you need to give neutron some physical interface and tell it a range of vlan-id that it is allowed to allocate to users08:00
jrosserin the OSA reference design thats usually br-vlan08:00
jrosserregarding flat networks, that is how you describe to neutron a physical interface with untagged traffic on it, so like a 1:1 mapping08:01
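[Editor's sketch, not part of the conversation] The vlan and flat types described above could look roughly like this in the `provider_networks` section of an OSA `openstack_user_config.yml`. This is only an illustration under assumptions: the interface names, the `101:200` tag range, the `net_name` values and the `group_binds` entry are placeholders that depend on the hardware and the neutron driver in use, and they are not taken from the paste linked in the discussion.

```yaml
# Hypothetical fragment of openstack_user_config.yml (values are assumptions)
global_overrides:
  provider_networks:
    # vlan type: neutron may allocate tags from "range" on this interface
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        container_interface: "eth11"
        type: "vlan"
        range: "101:200"          # assumed range of allowed vlan-ids
        net_name: "vlan"
        group_binds:
          - neutron_linuxbridge_agent   # assumption: depends on the driver
    # flat type: a 1:1 mapping to one interface carrying untagged traffic
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        container_interface: "eth12"
        host_bind_override: "eth12"     # assumed physical interface name
        type: "flat"
        net_name: "flat"
        group_binds:
          - neutron_linuxbridge_agent   # assumption: depends on the driver
```

With the vlan entry the physical mapping is defined once, and individual tagged networks are then created through the openstack CLI; the flat entry has to be written into the config for every untagged interface.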
farbodyou mean br-vlan is for floating IPs?08:02
jrosserfloating ips are on an external network08:02
jrosserusually OSA has br-vlan being dynamically allocated tenant/project networks08:03
farbodOK08:03
farbodThanks08:03
jrosserit's worth understanding the concepts in neutron a bit08:04
farbodi tried to learn from documentation but there is not much info08:04
jrossermy experience is that flat networks are a lot of trouble08:04
jrosserbecause you have to define them at deploy time and it all gets written into config files and services restarted etc etc08:05
jrosserif you use a vlan type then you define the physical interface mapping once in the config08:06
farbodbut what if i have only one interface and public ips are on that interface 08:06
jrosserthen tell neutron through the openstack cli which tags to use08:06
jrosserthat is what everyone says to start with08:07
jrosseri think that when you say you have only one interface, you mean the whole server only has one interface08:08
farbodyes08:09
jrosserso you're having to use vlans anyway?08:11
admin1my lxcbr0 disappeared after reboot 08:11
admin1never seen :) 08:11
farbodyes i have vlans but limited to only 5 vlans08:12
admin1you can use 1 for api, 2 for storage 3 for east-west and 4 for   neutron provider network and still have 1 more left :D 08:13
farbodnice08:14
farbodand yesterday i had a problem with image uploading08:14
farbodhow to configure OSA to enable image uploading?08:15
noonedeadpunkfarbod: you mean from web uri?08:20
farbodyes08:20
farbodalso when i try to create instances from dashboard i get this error: https://paste.opendev.org/show/bzmQ5XhNeCQLVVdETvb7/ 08:21
noonedeadpunkTbh I would suggest just uploading the image from a local file to begin with, and getting the cluster working with a "simple" setup. As I said, that needs interoperable image import configured. Also not all clients/tools/services support the import api, and you need to understand how to configure glance for import to work.08:23
farbod👍️08:23
noonedeadpunkregarding error - you need to check details of volume then, to see why specifically it failed08:23
noonedeadpunklike `openstack volume show 44795349-1d1a-49c9-85e9-a8a81cf31330`08:26
jrosserfarbod: what choices did you make for storage?08:26
farbodi think the problem is that i even didn't deploy a storage :D08:26
farbodis this configuration OK? : user config: https://paste.opendev.org/show/baowUwNLc0ZdcWPrfmPM/  first infra network node network interfaces: https://paste.opendev.org/show/bcQZPFykWq0Xge3OUwgr/ second node network interfaces which is storage and compute:  https://paste.opendev.org/show/byY4qFhZEWlgV29yRtpm/ 08:36
noonedeadpunkcorrect me if I'm wrong, but I don't think that LVM can act as remote storage? I guess you'd need then to have cinder-volume running on each compute08:46
noonedeadpunkor maybe you can....08:47
* noonedeadpunk never used that08:48
jrosseri think thats possible but depends what you want to do08:50
jrosseri saw some blog that ran cinder volume on each compute and made it so that it was "local" iscsi per compute08:51
jrosserbut it was complex to do08:51
admin1my lxcbr0 disappears on every reboot 09:22
admin1running lxc-setup-hosts brings it back up 09:22
admin1what could it be ? 09:22
admin1something to do with netplan/ubuntu not playing nice with files inside /etc/network/interfaces.d/   09:26
noonedeadpunklxcbr0 is created with systemd-networkd. 09:35
noonedeadpunkin latest releases09:36
jrosserdid we change that?09:38
* jrosser just wondering if there is an upgrade gotcha there with some left over stuff09:38
noonedeadpunkI think we did some time ago09:41
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/3d8e3690ba620d1724129f8ed1a6a040c5ccdac9 09:43
noonedeadpunkso it was for Zed09:43
noonedeadpunkbut we should have handled the upgrade09:44
noonedeadpunkadmin1: so I would check status of systemd-networkd after the reboot and if it's enabled09:45
admin1its enabled after reboot 09:51
jrosserbut not working?09:53
admin1https://gist.githubusercontent.com/a1git/1fc8bd3cb5104c643c8d24cffdbb74fa/raw/13d41dc6f5dab1681d435ddb6f460d841083b972/gistfile1.txt  -- 09:55
admin1hmm.. trying to get to it 09:55
admin1only log i could find -> Sep 07 09:20:57 c1 lxc-system-manage[4054]: /usr/local/bin/lxc-system-manage: line 81: /proc/sys/net/ipv6/conf/lxcbr0/accept_dad: No such file or directory09:57
admin1this cluster only has a single controller .. so cannot reboot it again ( in prod ) 09:59
noonedeadpunkso basically you'd need to `ip link set lxcbr0 up`09:59
noonedeadpunkas somehow it's not brought up on its own10:00
frickleripv6 disabled globally?10:00
admin1it sees all except lxcbr0 after reboot .. https://gist.githubusercontent.com/a1git/4c18a5ee1b23e65ea9ba0226fa831e90/raw/b28281cb2b406a971bfca45f73bc77531f887646/gistfile1.txt 10:06
admin1this one is on 25.2.0 10:06
jrosserit's not to do with the OSA version really10:06
jrosserthere will be something in the host/environment config that prevents it coming up10:07
jrosseror ordering/dependency of some kind10:07
jrosseri don't think we can fix anything without knowing exactly what is stopping it coming up10:07
noonedeadpunkI actually never restarted prod controllers after upgrade to >=Zed10:09
noonedeadpunkas we did it really recently10:09
admin1the only change from working - non working is  some ubuntu update and it was rebooted 10:09
admin1i am trying to find out what packages 10:09
noonedeadpunkI wonder if that could be absent IP on interface or smth like that10:10
noonedeadpunkon lxcbr0 I mean10:11
admin1i will go through the playbooks of that tag to check what files it creates10:11
admin1and then match it10:11
noonedeadpunkyou should look in /etc/systemd/network10:12
noonedeadpunkit's all there10:12
admin1that folder is blank :) 10:14
noonedeadpunkAh, it's Yoga I guess...10:16
noonedeadpunkthen it's not systemd-networkd that manage the bridge yet10:17
noonedeadpunkthen it should be /etc/network/interfaces.d/lxc-net-bridge.cfg10:18
admin1right .. and i think some update causes it to ignore it now 10:19
admin1as it was fine with reboots 2 weeks ago .. 10:19
admin1i think i will move it up a tag to zed and give it a try :D 10:21
noonedeadpunkyou just said it's production :D10:25
noonedeadpunkyou can also move a tag to antelope then - direct upgrades from Y to AA are supported just in case10:26
noonedeadpunkThough, move it to HEAD of stable/2023.1 instead of 27.0.110:26
noonedeadpunkWe're about to release quite big bugfix release10:26
admin1i did one upgrade from 26 -> 27 ,  still trying to figure out why nova-console is broken 10:27
admin1first it broke my custom domain mapping .. like id.domain.com instead of cloud.domain.com:5000 , and then i reverted that  , still  nova-console does not work10:27
admin1and terraform is also broken 10:27
admin126 -> 27   => https://gist.githubusercontent.com/a1git/2efcfd956f342333070f04b7bc048f6f/raw/0bc97dbc4888fae7bd97505557f73a3eb2186480/gistfile1.txt 10:28
noonedeadpunkWe indeed made quite big changes to haproxy setup in antelope10:28
admin1openstack cli does not fail on any command , horizon fails on nova-console ,, haproxy fails on custom domain .. terraform failure --still no clear idea why 10:29
noonedeadpunkso likely that for custom domain overrides should be adjusted. Also depends on where you've defined them, as now individual services are scoped not with haproxy group, but with specific service 10:30
noonedeadpunkso to have override for glance haproxy service it should be done not in group_vars/haproxy but in group_vars/glance_all10:31
admin1it was on haproxy_horizon_service_overrides:    in user_variables10:31
noonedeadpunk(if it's not user_variables)10:31
noonedeadpunkhave no idea about terraform though - it's even not opensource anymore :D10:32
admin1acl cloud_keystone hdr(host) -i id.domain.com  ;   use_backend keystone_service-back if cloud_keystone  ; keystone_service_publicuri: https://id.domain.com 10:32
admin1those were my overrides for keystone 10:32
admin1and similar for the rest 10:32
noonedeadpunkbut how/why that's horizon overrides...10:32
admin1using a wildcard for the domain ssl 10:32
admin1haproxy_horizon_service_overrides:10:33
admin1  haproxy_frontend_raw:10:33
noonedeadpunkwe also have haproxy maps support, that should make such changes way more trivial10:33
noonedeadpunkah10:33
admin1if you can point me to the right way  for this to be done in 27, i can put it back 10:33
noonedeadpunkI think you might need to use base service instead then10:33
noonedeadpunkas we use a "special" service that listens on 80 and 44310:34
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/haproxy/haproxy.yml#L71-L91 10:34
noonedeadpunkand you can use `haproxy_base_service_overrides` to chime in there10:35
noonedeadpunkso try replacing `haproxy_horizon_service_overrides` with `haproxy_base_service_overrides` and same content might even work10:35
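[Editor's sketch, not part of the conversation] Following that suggestion, the keystone overrides admin1 quoted earlier might be moved into user_variables roughly as below. It is an untested illustration: it assumes `haproxy_frontend_raw` takes a list of raw frontend lines, and `id.domain.com` is the placeholder domain used in the discussion.

```yaml
# user_variables.yml (sketch; same content as the old
# haproxy_horizon_service_overrides, moved under the base service)
haproxy_base_service_overrides:
  haproxy_frontend_raw:
    # route requests for the custom hostname to the keystone backend
    - acl cloud_keystone hdr(host) -i id.domain.com
    - use_backend keystone_service-back if cloud_keystone

# public endpoint advertised for keystone
keystone_service_publicuri: https://id.domain.com
```

As noted in the discussion, the same routing can probably be expressed with haproxy maps instead of raw frontend lines.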
admin1i will give it a try 10:35
admin1do i still need to do horizon_service_overrides ? 10:36
jrosseradmin1: there is a bunch of release notes for this :)10:36
jrosser*hopefully10:36
noonedeadpunkalso, you can indeed convert all that to a map file. Here's some haproxy doc explaining this: https://www.haproxy.com/blog/introduction-to-haproxy-maps 10:37
noonedeadpunkso you should be able to populate `haproxy_map_entries` 10:37
admin1perfect exercise to test how advanced our LLM AI gods are ( chatgpt ) :D10:38
admin1i will give it my output in haproxy.cfg and ask it to map it 10:39
admin1though this, i can live without10:39
admin1why terraform broke is more interesting .. 10:39
noonedeadpunkThat I have no idea about frankly speaking10:39
admin1i have to MITM my own ssl in the middle to see its raw api calls ..   coz even with trace, i was not able to see it 10:40
noonedeadpunkAnd since they've changed license to BSL - I even don't care about that10:40
admin1bsl will affect service providers more ..  users not so much 10:42
admin1this was on hackernews today https://threadreaderapp.com/thread/1696521808143683812.html 10:42
noonedeadpunkwell.. it's kind of - bugs won't be fixed until there will be a support ticket from valuable customer10:43
noonedeadpunkSo I guess - feel free to submit one?:)10:45
noonedeadpunkthat thread reminds me ones that protected centos stream, hiding sources for it, etc.10:47
noonedeadpunkThere're 31 occurrences of the word "support" in the thread. 10:47
admin1i am not for or against it .. but just saying that if you have a customer that uses tf ( majority ) its best to upgrade in  a test env and test it against tf as well 10:47
noonedeadpunkWhich makes me think that person who wrote it does not fully realize what OpenSource is and why it's valuable10:47
noonedeadpunkAnd it's also not OpenStack that is broken in this case, IMO.10:48
noonedeadpunkSo now it's kinda situation, that in order to keep really broken thing working, you need to keep upgrades and break what is fine10:49
noonedeadpunks/keep/stop/10:49
noonedeadpunkSo imo, bsl is quite a deal for end users as well10:50
noonedeadpunkas at the end users will require to have some feature that is not in tf and won't be in tf until hashi got a support contract to implement it for them10:51
noonedeadpunkbut it's just my opinion....10:52
opendevreviewMerged openstack/openstack-ansible-os_placement master: Add online_data_migrations for placement  https://review.opendev.org/c/openstack/openstack-ansible-os_placement/+/892159 11:16
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Check length of network_mappings  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/893924 11:22
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron stable/2023.1: Check length of network_mappings  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/893951 11:25
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron stable/zed: Check length of network_mappings  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/893952 11:25
opendevreviewMerged openstack/openstack-ansible master: Apply deployment env vars during keystone main_pre  https://review.opendev.org/c/openstack/openstack-ansible/+/894004 11:39
farbodhttps://paste.opendev.org/show/bGCVrDXcZPEJK8YiaBKD/ 12:08
farbodhey guys12:08
farbodwhats the problem with this one?12:08
noonedeadpunkhas never seen that. is it run as root?12:12
noonedeadpunkalso what playbook are you running?12:12
farbod2023.112:12
farbodyes its root12:12
noonedeadpunk2023.1 is not a playbook(12:12
noonedeadpunkit's version :p12:13
farbodoh sorry12:13
farbodsetup_hosts12:13
noonedeadpunkis everything fine with... diskspace on infra1?12:14
farbodyes12:15
noonedeadpunkAs I'm not sure what could be the reason of `[Errno 13] Permission denied: b'/var/lib/lxc/infra1_glance_container-c1ec6e05/`12:15
noonedeadpunkLike it's some system error rather than playbook or logic12:15
noonedeadpunkwhat's the task name that fails?12:16
farbodgive me some mins12:16
admin1farbod, paste your netplan, ip link, ip -4 a and  brctl show output 12:41
jamesdentonmornin'12:59
noonedeadpunko/13:13
noonedeadpunkNeilHanlon: hey! has smth happened to your infra again?13:13
noonedeadpunkjamesdenton: there's another version of vpnaas templates fix available:) https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/893856 13:14
noonedeadpunkjrosser: sooo... what should we do with octavia bridge now? Is it breaking CI for octavia?13:16
noonedeadpunkjust trying to understand if backport should be included in 27.1.0 or not13:16
NeilHanlonnoonedeadpunk: not that I know of!13:18
noonedeadpunkI've just seen rocky failures like last weekend13:18
noonedeadpunkhttps://zuul.opendev.org/t/openstack/build/015e754691b74e6fa7ef1c381f6f0923 13:18
noonedeadpunkagain with systemd13:19
NeilHanlonhm13:19
noonedeadpunkand here https://zuul.opendev.org/t/openstack/build/a7c9e54791cd4c9f8637b959062e3fe0 13:19
NeilHanlonIt's definitely possible that something happened, but I am not seeing any related events on our monitoring13:22
NeilHanlonlet me correlate this time. i did make a CDN change, but that should have been transparent.13:22
NeilHanlonyea no those don't line up. it was 0300 UTC I made that change13:23
NeilHanloni'm checking w/ our releng team to see if they've seen anything. but i wonder if we caught a bad mirror in zuul13:23
NeilHanloni forget if we hardcode to dl.rockylinux.org13:24
jamesdentonnoonedeadpunk thanks, i had already abandoned the other one. Didn't see the typo :(13:26
jrossernoonedeadpunk: i think perhaps we should ask jamesdenton what he thinks is a good idea for octavia13:27
jrosserbecause there was brokenness for LXC before13:27
jamesdentonnoonedeadpunk the issue for Octavia is that the octavia lxc container needs to connect to the OVS bridge, which doesn't yet exist at the time of setup-hosts13:27
jrosserand my change maybe just makes different problems,13:27
jrosserso there is not really good value in including it in 27.1.013:27
jamesdentonand what makes it even more of a challenge is that if octavia is not deployed on the same host as network bits, there's a chance ovs would never exist 13:28
NeilHanlonI want to also revisit asking infra if we have space to mirror rocky 13:28
jamesdentonI think this may be a situation where there's something different for CI and something else for production; IMO routed is the way. lbaas_mgmt is just a provider network that needs to be accessible by the control plane, but doesn't need L2 adjacency13:29
jamesdentonand what further complicates the OVS side of it, is it's not enough to just connect LXC -> OVS, you need to make sure the right OVS flow(s) get implemented to allow that connection to talk to VMs13:30
jrosserbecause we don't make (guessing) neutron ports for the control plane hosts13:31
jamesdentonmore or less13:31
jrosserwow what a mess13:31
jamesdentonif you check out the devstack bits, they're creating a neutron port after the fact, and then using that resulting mac address for a veth or tap or dummy, can't recall exactly13:32
jamesdentonwhich gets the job done for tempest13:32
jamesdentonWe've been doing routed for a while now in our deployments. There's one override needed IIRC to make it work, and the provider network gets set up early in the deployment process13:32
jrosserdo you need to do stuff to make the ovs flows be right?13:34
jamesdentonnot that i'm aware; the creation of the neutron 'port' allows the agents to setup the flows accordingly (i believe) - but i need to look closer at the devstack13:38
jrosseri expect there is a similar situation for ironic13:40
jrosserwell not sure actually on that13:41
jamesdentonin my experience, the conductor needs to hit ipmi and that is routable13:44
jamesdentonbut PXE is a different story and "depends"13:44
jrosserand cloud-init needs to do its thing as well somehow13:46
jrosserironic people were a little surprised we allow cleaning/provisioning/inspection networks to be the same13:48
jamesdentonWell, without neutron integration i don't see how you don't do it that way13:49
johnsomI agree, in production deployments, using a routing approach for the lb-mgmt-net is a good strategy14:11
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump ansible-core to 2.15.3 and ansible-lint  https://review.opendev.org/c/openstack/openstack-ansible/+/892371 15:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Fix linters and metadata  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888469 15:01
