Friday, 2024-02-09

06:27 <gokhan> good morning noonedeadpunk, I am trying distribution upgrades, but when reinstalling infra nodes, lxc containers are not created with "openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*". it seems we also need to add lxc_hosts when using a limit
07:48 <gokhan> noonedeadpunk, I found it, we also need to add reinstalled_host,reinstalled_host-host_containers. we need to update the distribution upgrade document
08:32 <noonedeadpunk> gokhan: oh, yes, sure
08:32 <noonedeadpunk> you definitely need it
08:32 <noonedeadpunk> however, I somehow thought that reinstalled_host* includes both it and the containers?
08:33 <noonedeadpunk> gokhan: like, when I do `ansible -m ping os-control01*` I get both the host and the containers
08:34 <noonedeadpunk> so `openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*` should do the trick?
08:36 <gokhan> noonedeadpunk, sorry, it was my fault, I didn't use the asterisk * after the node name :( it is working
08:38 <noonedeadpunk> yeah, the asterisk is important there :)
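A minimal sketch of the difference, assuming a reinstalled controller named dev-infra1 (a hypothetical name); the trailing wildcard is what pulls the host's LXC containers into the play alongside the host itself:

    # Matches only the bare host -- its containers are skipped, so none get created
    openstack-ansible setup-hosts.yml --limit localhost,dev-infra1

    # Matches the host AND its containers (dev-infra1-repo-container-..., etc.)
    openstack-ansible setup-hosts.yml --limit 'localhost,dev-infra1*'

    # Quick way to see what a limit pattern actually matches:
    ansible 'dev-infra1*' --list-hosts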
08:58 <gokhan> noonedeadpunk, in this command "openstack-ansible set-haproxy-backends-state.yml -e hostname=<infrahost> -e backend_state=disabled --limit reinstalled_host", is infrahost the reinstalled host or another infra host?
09:04 <noonedeadpunk> good question
09:33 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI  https://review.opendev.org/c/openstack/openstack-ansible/+/908521
09:34 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI  https://review.opendev.org/c/openstack/openstack-ansible/+/908521
09:41 <noonedeadpunk> so, it seems that centos9 is broken on nova-compute
09:42 <noonedeadpunk> and also we have broken CI overall...
09:43 <noonedeadpunk> jobs are not scheduled due to a zuul config error
09:43 <noonedeadpunk> https://review.opendev.org/c/openstack/openstack-ansible/+/908322 to solve it
09:43 <noonedeadpunk> gokhan: sorry, got distracted
09:44 <noonedeadpunk> gokhan: yes, infrahost is the reinstalled_host in this context
09:45 <noonedeadpunk> would be good to align these 2 in the doc
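A hedged sketch of that invocation with both values pointed at the same node, per the answer above (dev-infra1 is a hypothetical hostname):

    # Drain the reinstalled host's haproxy backends before working on it
    openstack-ansible set-haproxy-backends-state.yml \
        -e hostname=dev-infra1 -e backend_state=disabled --limit dev-infra1

    # Re-enable them once the host is healthy again
    openstack-ansible set-haproxy-backends-state.yml \
        -e hostname=dev-infra1 -e backend_state=enabled --limit dev-infra1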
09:53 <andrewbonney> I can fix that in my patch. Should have it ready to go within a day or two
10:48 <noonedeadpunk> NeilHanlon: just in case you might be interested, that libvirt 9.10 has a nasty regression https://fedoraproject.org/wiki/Changes/LibvirtModularDaemons
10:48 <noonedeadpunk> ugh
10:48 <noonedeadpunk> https://issues.redhat.com/browse/RHEL-20609
11:00 <noonedeadpunk> andrewbonney: we're having another OS upgrade next week, so we can practice it a bit :)
11:04 <halali> folks, it would be good to land/merge this change soon https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708 :)
11:07 <gokhan> noonedeadpunk, when running "openstack-ansible setup-infrastructure.yml --limit localhost,repo_all,rabbitmq_all,reinstalled_host*", it throws an error when creating the wheel directory on the build host
11:07 <gokhan> https://paste.openstack.org/show/bTsHar3QBasMsJk1UyEP/
11:09 <gokhan> the error is "msg": "chown failed: failed to look up user nginx", the nginx user is not created on the repo container
11:10 <noonedeadpunk> gokhan: maybe it failed somewhere before that?
11:10 <noonedeadpunk> what if you run repo-install.yml?
11:11 <gokhan> noonedeadpunk, I am checking now
11:11 <noonedeadpunk> as it feels like the repo installation also potentially failed
11:11 <noonedeadpunk> As the task is delegated to the repo container
11:12 <noonedeadpunk> and it expects nginx to be present there
11:14 <gokhan> noonedeadpunk, when I run "openstack-ansible repo-install.yml --limit localhost,dev-compute1*", I am getting: failed: dev-infra1-repo-container-c3e5f3be is either already part of another cluster or having volumes configured
11:15 <gokhan> I have previously removed this infra node from the peers
11:17 <noonedeadpunk> ok, frankly speaking I'm not that much of an expert in gluster... And we don't use it locally either...
11:18 <noonedeadpunk> So I can hardly help with this part
11:18 <noonedeadpunk> But I know andrewbonney dealt with it
11:18 <noonedeadpunk> there's a doc update covering removing the brick in advance: https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst
11:19 <noonedeadpunk> L195
11:20 <andrewbonney> Assuming the brick/peer was removed in advance, the issue may be that the repo install needs to run against all hosts (no limit)
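A rough sketch of that advice under the same assumptions; `gluster peer status` and `gluster volume info` are standard Gluster CLI checks you can run from any surviving repo container:

    # From a surviving repo container: confirm the reinstalled peer/brick is gone
    gluster peer status
    gluster volume info

    # Then rebuild the repo cluster with no limit, so every repo host takes part
    openstack-ansible repo-install.yml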
11:20 <gokhan> noonedeadpunk, sorry, it was my fault again :( repo-install.yml is commented out in setup-infrastructure.yml :(
11:21 <noonedeadpunk> heh, ok :)
11:21 <gokhan> I previously followed https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst and removed the brick/peers
11:22 <gokhan> noonedeadpunk, you are right, the repo install needs to run against all hosts
11:22 <gokhan> thanks noonedeadpunk andrewbonney it is now working
11:22 <noonedeadpunk> was not me, but ok
11:22 <andrewbonney> :)
11:23 <noonedeadpunk> also your original paste does limit repo_all
11:23 <noonedeadpunk> so I guess it should be fine
11:23 <gokhan> yes, my original paste does limit repo_all, but when I ran it repo-install.yml was commented out :(
11:24 <noonedeadpunk> yeah, ok, gotcha
11:34 <gokhan> noonedeadpunk, mariadb is installed on the new node but it doesn't create the /root/.my.cnf file, I manually created this file
11:36 <gokhan> also rabbitmq failed with "To install a new major/minor version of RabbitMQ set '-e rabbitmq_upgrade=true'."
11:36 <gokhan> do we need to add "-e rabbitmq_upgrade=true"?
11:37 <noonedeadpunk> gokhan: yes, so that is a kinda known thing....
11:38 <noonedeadpunk> And I'm not sure about it at all
11:38 <noonedeadpunk> Known - the missing /root/.my.cnf
11:38 <noonedeadpunk> Eventually, with any modern MariaDB you are not supposed to have a .my.cnf
11:38 <noonedeadpunk> As you're expected to log in as root through socket auth
11:38 <noonedeadpunk> which is the default
11:39 <noonedeadpunk> And old envs that have root auth messed up would struggle with not being able to auth as root without .my.cnf
11:39 <noonedeadpunk> I guess we might want to add a note about that....
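A short sketch of what socket auth looks like in practice, run as root on the new galera node (the .my.cnf contents below only illustrate the legacy pattern that is no longer created):

    # Modern MariaDB: root authenticates over the unix socket, no password file
    mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size';"

    # Verify that root really is on socket auth
    mysql -e "SELECT user, host, plugin FROM mysql.user WHERE user='root';"

    # Legacy /root/.my.cnf, only needed where root auth was switched to password:
    #   [client]
    #   user = root
    #   password = <root password>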
11:40 <noonedeadpunk> not sure about rabbit, but potentially yes
11:40 <noonedeadpunk> actually, another thing about `rabbitmq_upgrade=true` is that it feels like a bad/wrong approach when quorum queues are enabled
11:41 <noonedeadpunk> What I see in my sandbox now is that rabbitmq behaves more or less like mysql - being in an `activating` state until it can get clustered properly
11:41 <noonedeadpunk> And our rabbitmq_upgrade currently stops everything except 1 node "by design"
11:41 <noonedeadpunk> but it's a future problem....
11:41 <noonedeadpunk> or well
11:41 <noonedeadpunk> not for you gokhan at least :)
11:41 <gokhan> noonedeadpunk, yes, when I tried to check galera status from the deployment host, I realized that .my.cnf was missing on the new host. I need it when checking status.
11:42 <gokhan> yes, in my env mirrored queues are enabled :)
11:42 <noonedeadpunk> mirroring of queues != quorum queues
11:43 <noonedeadpunk> these are 2 very distinct things, and switching is not very trivial, available only since 2023.2
11:44 <noonedeadpunk> and mirrored queues are considered deprecated at this point
11:47 <gokhan> are quorum queues enabled by default on bobcat? is there any migration path from mirrored queues to quorum queues?
11:52 <noonedeadpunk> no, not by default; yes, an upgrade is possible
11:52 <noonedeadpunk> but it involves some downtime/disturbance
11:53 <noonedeadpunk> eventually, the upgrade path is already there. The problem is that to upgrade to quorum, you actually need to drop the existing vhost and create a new one, which will be replicated
11:54 <noonedeadpunk> So after the vhost for the service is removed (which happens around the beginning) and until the playbook ends - the service might misbehave
11:54 <noonedeadpunk> But so far the experience in the sandbox is waaay better
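A hedged sketch of how that opt-in is typically expressed in 2023.2-era OSA; the variable name below is an assumption from memory, so verify it against your release's documentation before relying on it:

    # Opt services into quorum queues (assumed variable name; check your release)
    cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
    oslomsg_rabbit_quorum_queues: true
    EOF

    # Re-running a service playbook drops and recreates its vhost as described
    # above, so expect a brief disturbance for that service
    openstack-ansible os-cinder-install.yml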
11:59 <opendevreview> Merged openstack/openstack-ansible-rabbitmq_server master: Add the abillity to configure the logging options  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/902908
12:09 <gokhan> thanks for the information noonedeadpunk :)
13:46 <NeilHanlon> noonedeadpunk ah.. yeah. i had heard about that in the Integration SIG.. :\
13:47 <noonedeadpunk> if you're around... can you check this backport pls? :) https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708
14:35 <spatel> mgariepy morning!
14:35 <spatel> any luck with CAPI?
15:02 <mgariepy> didn't have time to try it.
15:03 <mgariepy> it's for my future self ;) haha
15:10 <nixbuilder> I know this may not be the proper place for this question... however, I need to know if anyone has a procedure for deleting images and volumes using only mysql? There are a few images/volumes that are in error. Somehow the image/volume was already deleted on our SAN but not within the openstack databases. I am attempting to clean this up.
15:12 <noonedeadpunk> update volumes set deleted = 1, deleted_at = "2024-02-09 15:11:23" where id = UUID ?
15:12 <noonedeadpunk> but eventually, for volumes specifically - it should not get to error if the backing device is gone
15:12 <noonedeadpunk> it should be marked as deleted properly
15:13 <noonedeadpunk> So you should be able to issue a delete request through the api
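A sketch of the API-first route being suggested, worth exhausting before touching the database (the UUID is a placeholder); resetting a volume's state is a standard cinder admin operation:

    # Clear the error/error_deleting state, then retry a normal delete
    openstack volume set --state available <volume-uuid>
    openstack volume delete <volume-uuid>

    # Last resort, mirroring the SQL above: soft-delete the row directly
    mysql cinder -e "UPDATE volumes SET deleted=1, deleted_at=NOW(),
        status='deleted' WHERE id='<volume-uuid>';"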
15:16 <nixbuilder> noonedeadpunk: from what I can tell, cinder makes a call through the SAN driver to delete the volume, that call fails because the volume is not there, and then I get an "error deleting" status on the volume. But I will try your suggestion.
15:16 <noonedeadpunk> huh
15:17 <noonedeadpunk> Ok, that's different with ceph. Or well, it still tries to issue the request to ceph, ceph says there's no image, and cinder happily marks it as "deleted" afterwards
15:17 <noonedeadpunk> So potentially a bug in the driver, as I would expect such an exception to be caught
15:18 <nixbuilder> noonedeadpunk: Perhaps a bug in the driver... as always, thanks for your help!
16:41 <drarvese> Greetings! I'm running into an issue during the Keystone playbook where I get a "504 Gateway timeout" when adding the service project -- https://paste.openstack.org/show/bwzv3tuyCp8mLQNaTf5w/. Does anyone have any ideas? This is an AIO deployment, though I'm not using the bootstrap-aio.sh script or scenarios. This is the second time I've run into this. The previous time (also an AIO deployment) I was able to get around it by deploying everything on baremetal, but that seems like a really heavy-handed solution.
16:51 <noonedeadpunk> o/
16:52 <noonedeadpunk> drarvese: I guess the first question should be whether you can access keystone with curl from the VM?
16:52 <noonedeadpunk> meaning - through the container IP
16:52 <noonedeadpunk> probably you can, as that's a container timeout....
16:52 <noonedeadpunk> *API
16:53 <noonedeadpunk> and then whether you can reach MySQL, and what you see in the logs inside the keystone container
16:54 <noonedeadpunk> as that sounds like some kind of connectivity issue to me...
16:54 <noonedeadpunk> between which parts is a good question...
16:54 <noonedeadpunk> so it can be haproxy -> keystone, or keystone -> mysql, or keystone -> memcached
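A minimal triage sketch for those three paths, with placeholder names and IPs to substitute from your inventory and /etc/keystone/keystone.conf:

    # haproxy -> keystone: hit the backend directly, bypassing the VIP
    curl -s http://<keystone-container-ip>:5000/v3/

    # keystone -> mysql / keystone -> memcached: test from inside the container
    lxc-attach -n <keystone-container> -- bash -c \
        'timeout 3 bash -c "</dev/tcp/<galera-ip>/3306" && echo mysql reachable'
    lxc-attach -n <keystone-container> -- bash -c \
        'timeout 3 bash -c "</dev/tcp/<memcached-ip>/11211" && echo memcached reachable'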
17:00 <drarvese> Yeah, I can curl the keystone endpoint through its container IP. I can reach MySQL through the utility container. Lemme grab the logs from the keystone container
17:08 <noonedeadpunk> Huh, ok, interesting
17:08 <noonedeadpunk> and with curl it returns the api version and some json?
17:09 <drarvese> Yeah
17:10 <noonedeadpunk> I guess I would install telnet or smth like that in the keystone container and try to reach the mariadb and memcached ips from it
17:10 <noonedeadpunk> via the ips defined in /etc/keystone/keystone.conf
17:11 <noonedeadpunk> oh, btw, can you run smth like `openstack endpoint list` from the utility container?
17:11 <noonedeadpunk> As I assume you should get the same 504?
17:19 <drarvese> Logs from the keystone container: https://paste.openstack.org/show/byw3oUGWo0NzkzXIBHtW/
17:19 <drarvese> And, yes, that returns a 504
17:21 <noonedeadpunk> huh
17:22 <noonedeadpunk> according to the log - keystone answers eventually
17:24 <noonedeadpunk> the log looks quite short though....
17:25 <noonedeadpunk> another thing - have you applied the same overrides as for the aio?
17:25 <noonedeadpunk> ie: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L74-L81
17:26 <noonedeadpunk> but frankly speaking I'm not sure what's really wrong, given that keystone can connect to memcache and mariadb
17:26 <noonedeadpunk> and the system is not under some weird load
17:27 <drarvese> No, I haven't applied any overrides like that
17:29 <noonedeadpunk> ofc you can try to increase the timeouts and see if the request eventually passes....
17:29 <noonedeadpunk> there are a couple of variables for that: https://opendev.org/openstack/openstack-ansible-haproxy_server/src/branch/master/defaults/main.yml#L244-L251
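A hedged sketch of raising those timeouts through user_variables.yml; the variable names below are my reading of the linked defaults file, so double-check them against your branch:

    cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
    # Extra headroom while debugging the 504s (assumed variable names)
    haproxy_connect_timeout: 20s
    haproxy_client_timeout: 120s
    haproxy_server_timeout: 120s
    EOF

    openstack-ansible haproxy-install.yml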
17:30 <noonedeadpunk> But in fact I've experienced this sort of issue only when keystone was not able to reach memcached due to some firewalling
17:30 <drarvese> I'm able to telnet to the MySQL IP (the internal_lb_vip_ip), but not to memcached or the IP of the MySQL container
17:31 <noonedeadpunk> when the connection was not reset, but dropped
17:32 <noonedeadpunk> yeah, so then, when keystone can not reach memcached, it will wait for the connection timeout and only then proceed with the request
17:32 <noonedeadpunk> Which has a high probability of timing out on haproxy
17:32 <noonedeadpunk> I dunno how the aio is done (and if it's an aio), but the memcached and keystone containers are ideally on the same bridge inside the controller
17:33 <noonedeadpunk> so unless it's some multi-node aio - the issue is strange
17:34 <drarvese> Yeah, they are on the same bridge
17:35 <noonedeadpunk> and the memcached container does have an IP on eth1?
17:35 <noonedeadpunk> and is running?
17:36 <drarvese> Yep
17:37 <noonedeadpunk> Then I can only guess the reason might be disabled net.ipv4.ip_forward or smth like that...
17:37 <noonedeadpunk> but that should be set by the openstack_hosts role even....
17:38 <noonedeadpunk> drarvese: ok, easy test. comment out memcached in /etc/keystone/keystone.conf, restart the service. After that you should be able to issue the request from the utility container
17:40 <drarvese> That works
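The test above, spelled out as a sketch (stock OSA paths; the uWSGI unit name is an assumption and may differ on your branch). This is only a diagnostic, so revert it afterwards:

    # Inside the keystone container: temporarily comment out the memcache options
    sed -i 's/^\(memcache\)/#\1/' /etc/keystone/keystone.conf
    systemctl restart uwsgi@keystone-wsgi-public   # assumed unit name

    # From the utility container: this should now return promptly
    openstack endpoint list

    # Revert once done
    sed -i 's/^#\(memcache\)/\1/' /etc/keystone/keystone.conf
    systemctl restart uwsgi@keystone-wsgi-public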
17:44 <noonedeadpunk> mhm... well... you need to find out why a direct connection within the same bridge does not work... While you can reach the host - you somehow can't reach another container...
17:44 <noonedeadpunk> maybe proxy_arp is needed, but I'd doubt it...
17:44 <noonedeadpunk> that really feels like some firewall, frankly speaking
17:47 <noonedeadpunk> drarvese: you would totally need that working for rabbitmq in the future
17:49 <drarvese> Yeah. It does seem like a firewall issue. I'll look closer at that
17:53 <noonedeadpunk> from the osa perspective - nothing touches the firewall
18:03 <drarvese> Sigh, it was a firewall issue. The FORWARD iptables chain was configured to deny stuff.
18:07 <noonedeadpunk> that would explain it :D
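For reference, a minimal sketch of spotting and loosening that (br-mgmt is the usual OSA container bridge, an assumption here); bridged container traffic traverses the FORWARD chain when br_netfilter is loaded, so a DROP/DENY policy there silently eats it:

    # Inspect what was eating the inter-container traffic
    iptables -L FORWARD -n -v

    # Lab-grade fix: allow forwarding between containers on the bridge
    # (a production rule set should be scoped more tightly)
    iptables -I FORWARD -i br-mgmt -o br-mgmt -j ACCEPT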
18:41 <opendevreview> Merged openstack/openstack-ansible master: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908322
18:54 <noonedeadpunk> folks, does anybody know how VLAN in OVN works? :D
18:55 <noonedeadpunk> Like - I do see there's a virtual switch, I also see a patch-provnet that in the nbdb maps to the vlan
18:55 <noonedeadpunk> as well as all the ports in the network
18:56 <noonedeadpunk> but the question is more - where does traffic go out from this vlan?
18:56 <noonedeadpunk> I guess meaning, if gateway != compute, should the compute have access to the vlan?
18:57 <noonedeadpunk> As it feels like there's geneve in between anyway
18:57 <noonedeadpunk> jamesdenton: sorry, not sure if you're around, but I guess you might know best :D
19:14 <jamesdenton> hi
19:16 <jamesdenton> IIRC your gateway nodes will always handle non-floatingip traffic, and compute nodes would handle floatingip traffic when distributed routing is enabled. Otherwise, the gateway nodes handle that too
19:16 <jamesdenton> If it's just a provider network (w/o a neutron router) then the computes would need to have access to that vlan
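A few standard OVN/OVS inspection commands that make this concrete, run on a compute or gateway node:

    # Which physical bridge carries which physnet on this chassis
    ovs-vsctl get Open_vSwitch . external-ids:ovn-bridge-mappings

    # Is this chassis allowed to host gateway (router) ports?
    ovs-vsctl get Open_vSwitch . external-ids:ovn-cms-options

    # The localnet port that stitches a logical switch to the VLAN, with its tag
    ovn-nbctl find Logical_Switch_Port type=localnet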
19:17 <noonedeadpunk> aha
19:17 <noonedeadpunk> and what is non-floating ip traffic then?
19:18 <jamesdenton> SNAT
19:18 <noonedeadpunk> ok, so routers
19:18 <jamesdenton> So, a tenant network behind a neutron router, likely geneve
19:18 <jamesdenton> yes
19:18 <noonedeadpunk> and fip in routers if distributed is disabled
19:18 <noonedeadpunk> mhm, ok yes
19:18 <noonedeadpunk> I somehow started assuming that vlan somehow goes through the gateways as well
19:19 <jamesdenton> yep
19:19 <noonedeadpunk> but didn't find how to prove or dismiss that
19:19 <noonedeadpunk> and I was thinking about the octavia lbaas vlan per se
19:20 <jamesdenton> yeah, the gateway node is only used when routers are in play; if it's just a VM on a vlan network straight up to the fabric then that's all through the compute
19:20 <noonedeadpunk> so it does not have a router or anything in ovn
19:20 <noonedeadpunk> just it being in the nbdb confused me I guess :D
19:20 <noonedeadpunk> ok, thanks!
19:20 <jamesdenton> octavia w/ the ovn provider does not require the lbaas mgmt network
19:20 <noonedeadpunk> yeah, I know that
19:20 <jamesdenton> cool
19:20 <noonedeadpunk> But it does not have l7 either
19:20 <noonedeadpunk> so meh :(
19:20 <jamesdenton> it's a little more basic :)
19:21 <noonedeadpunk> yeah, I mean, it can replace some usecases, but not all I guess
19:21 <noonedeadpunk> btw, I did some cleanup of your octavia ovn patch
19:21 <jamesdenton> but cheap!
19:21 <jamesdenton> how's the ovn vpnaas stuff coming along?
19:21 <noonedeadpunk> and tested it - works nicely
19:21 <jamesdenton> oh thank you
19:22 <noonedeadpunk> though it's somehow failing CI on quite unrelated failures....
19:22 <noonedeadpunk> jamesdenton: well, it looks very nice
19:22 <noonedeadpunk> and about it working
19:22 <jamesdenton> does it use a namespace?
19:22 <noonedeadpunk> I guess I just constantly mess up bringing the tunnel up
19:22 <noonedeadpunk> it does
19:22 <noonedeadpunk> And mixing up which side is left and which is right....
19:23 <noonedeadpunk> So it creates a namespace with ipsec; it does use one more IP from the external network, as it can't share one with the router
19:23 <jamesdenton> oh ok, not terrible i guess
19:23 <noonedeadpunk> Then it also creates an internal /30 network and wires it up with the router
19:24 <jamesdenton> and adds some routes to the router?
19:24 <noonedeadpunk> yeah, I believe it does
19:24 <noonedeadpunk> I didn't manage to get a pair fully working yet :)
19:24 <noonedeadpunk> but all the pieces are in place, so it must work
19:25 <noonedeadpunk> Ah! And the VPN runs as an extra service, similar to metadata, and is registered in the neutron agents
19:25 <noonedeadpunk> And uses RPC....
19:25 <jamesdenton> oh nice
19:25 <jamesdenton> i'll give the patch a go locally this weekend
19:25 <jamesdenton> i could never get a tunnel up in an OVS environment for some reason
19:25 <noonedeadpunk> But I really think I'm making some very basic and stupid mistake when bringing 2 VPNs up
19:25 <jamesdenton> been a few months since i tried though
19:25 <noonedeadpunk> (in the same env)
19:26 <noonedeadpunk> I'm also about to look into ovn-bgp-agent really shortly
19:26 <noonedeadpunk> but dunno where to get frr from...
19:26 <jamesdenton> been keeping eyes on that too
19:27 <noonedeadpunk> Yeah, according to internal planning I should have done that 2 weeks ago...
19:27 <jamesdenton> don't be so hard on yourself, i'm still working on backlog from 3 years ago
19:28 <noonedeadpunk> haha
19:28 <noonedeadpunk> yeah, true
19:28 <noonedeadpunk> the backlog from 3y ago hasn't gone anywhere
19:32 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove galera_client from required projects  https://review.opendev.org/c/openstack/openstack-ansible/+/908324
19:33 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.2: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908280
19:34 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908681
19:34 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908682
19:35 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908682
19:35 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition  https://review.opendev.org/c/openstack/openstack-ansible/+/908682
19:42 <spatel> jamesdenton hey! after a long time
19:43 <jamesdenton> hey spatel !
19:43 <spatel> how is your EVPN issue?
19:43 <jamesdenton> what's new?
19:44 <jamesdenton> we got that worked out... i think there were a few issues, but mainly a mismatch between switches on the reserved vlan ranges, in addition to the lack of an infra vlan configuration
19:44 <jamesdenton> but we're cookin' now
19:44 <spatel> oh, so it was a mis-config issue, right?
19:45 <jamesdenton> yeah, at the end of the day it was
19:45 <jamesdenton> our setup is ingress replication, no multicast
19:45 <jamesdenton> all is well, for now
19:45 <spatel> I am busy building a new DC and a new openstack. I am looking at k8s with sriov support
19:45 <jamesdenton> the fun stuff
19:45 <spatel> Did you ever run k8s with sriov?
19:46 <jamesdenton> i have not
19:46 <spatel> developers want to run a voice application on k8s with sriov support
19:47 <spatel> Yes, OVN-BGP-AGENT is on my list
19:48 <spatel> jamesdenton why are you using ingress replication?
19:48 <spatel> Multicast is easy and scalable..
19:49 <jamesdenton> this is the way our network guys wanna run it
19:50 <spatel> Ingress is easy, so I can understand that, but using multicast gives you better control over BUM engineering..
19:50 <spatel> if you don't want to send BUM traffic to the ABC rack then you can do that without any issue :)
