Monday, 2022-07-18

[08:38] *** ysandeep is now known as ysandeep|lunch
[09:20] <opendevreview> Merged openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible for Wallaby  https://review.opendev.org/c/openstack/openstack-ansible/+/849799
[09:50] <jrosser> ^ hopefully this means we can now merge things on xena
[10:16] <noonedeadpunk> oh, yes, I believe we should be able to now
[10:23] *** ysandeep|lunch is now known as ysandeep
[11:35] *** dviroel_ is now known as dviroel
[13:14] <jrosser> centos-8 on xena looks pretty broken
[13:16] <jrosser> https://paste.opendev.org/show/bbxpY7ZJU1fIKdA9w4HO/
[13:18] <noonedeadpunk> ok, so they're dropping old versions from that repo over time. damn
[13:18] <noonedeadpunk> that really does suck
[13:23] <jrosser> yeah, it's just not there any more: https://cloudsmith.io/~rabbitmq/repos/rabbitmq-erlang/packages/?q=version%3A24.%2A-1.el8&page=3
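
For anyone hitting the same failure, a quick way to confirm which erlang builds the configured repos still carry on a centos-8 host (a sketch; the repo id below is an assumption, check /etc/yum.repos.d/ for the name the rabbitmq_server role actually wrote):

    # list every erlang version visible in all configured repos
    dnf --showduplicates list erlang
    # or query just the cloudsmith rabbitmq-erlang repo
    dnf --disablerepo='*' --enablerepo='rabbitmq_rabbitmq-erlang' \
        --showduplicates list erlang
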
[13:42] <spatel> jrosser: isn't centos-8 end of life?
[14:13] <noonedeadpunk> I bet we were talking about Stream, which is not
[14:30] <spatel> makes sense
[14:34] <spatel> jrosser noonedeadpunk: i have written a blog post on OVN deployment using OSA - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/
[14:34] <spatel> I will add more troubleshooting scenarios in the coming days..
[14:40] <jrosser> spatel: so do all gateways go to the highest priority chassis, or can they be spread?
[14:40] <jrosser> like not DVR, but if you have N "network nodes" for example
[14:50] <spatel> They always go to the highest priority gateway in an active-standby config
[14:50] <spatel> Let's say i set priorities manually; then the last one automatically becomes the active one.
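
The priority scheduling spatel describes lives on the OVN northbound side and can be inspected or overridden with ovn-nbctl; a minimal sketch, assuming a hypothetical router port named lrp-router1 and a chassis named gw-chassis-2:

    # show which chassis can host this router port, with priorities
    ovn-nbctl lrp-get-gateway-chassis lrp-router1
    # pin/re-prioritise manually: the highest priority wins while alive
    ovn-nbctl lrp-set-gateway-chassis lrp-router1 gw-chassis-2 30
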
[14:51] <jrosser> that's a bit sad, as i think the current L3 agent spreads the active ones around
[14:52] <spatel> How?
[14:52] <jrosser> well, it's keepalived ultimately
[14:52] <spatel> We are talking about tenant virtual routers here - how can you set up an active-active router?
[14:52] <jrosser> yes
[14:54] <spatel> If you set up DVR with OVN then yes.. each compute node will be your router and VM traffic will go out directly from that gateway
[14:55] <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/xena: Sync RedHat erlang version  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/850233
[14:56] <jrosser> spatel: well, that's kind of not what i mean
[14:56] <noonedeadpunk> But then you need to pass the public vlan to each compute, I guess?
[14:56] <jrosser> DVR can be wasteful of external IPs, and you need the public network everywhere
[14:56] <spatel> noonedeadpunk: yes, that is correct..
[14:56] <spatel> jrosser: not in OVN-based DVR
[14:57] <spatel> OVN-based DVR doesn't waste public IPs :)
[14:57] <mgariepy> i don't think you need an IP on the public net for it to work, only the L2 needs to be there for the network.
[14:57] <spatel> all the magic happens inside openflow
[14:58] <spatel> mgariepy: yes, you just need public VLAN connectivity
[14:59] <spatel> legacy DVR wastes public IPs on each compute node, but OVN doesn't.
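
For reference, distributed floating IPs in the ML2/OVN driver are a single toggle in the plugin config; a sketch (the option name is from neutron's ovn config section, though the exact file path depends on the deployment):

    # /etc/neutron/plugins/ml2/ml2_conf.ini (path may differ under OSA)
    [ovn]
    # NAT for floating IPs happens on the compute node itself
    # instead of a centralized gateway chassis
    enable_distributed_floating_ip = True
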
[15:00] <noonedeadpunk> I'm really eager to test the vpnaas patch, as well as the bgp implementation with OVN....
[15:01] <spatel> I am on it.. trying to deploy BGP-based OVN (i am stuck in devstack, it's causing issues deploying the stack)
[15:01] <spatel> Thinking of deploying OSA instead of devstack
[15:02] <spatel> the beauty of OVN is that you can buy a good smartnic for a dedicated network node and offload OVS onto the NIC to boost performance for the network node
[15:03] <jrosser> ^ have you actually made this work?
[15:03] <spatel> the smartnic?
[15:03] <jrosser> yes
[15:03] <spatel> looking for a sponsor :(
[15:03] <jrosser> anyway - regarding L3 HA, this suggests that the active routers are not always on the same chassis: https://docs.openstack.org/neutron/latest/admin/ovn/routing.html#l3ha-support
[15:04] <jrosser> though it's a surprising choice to have each compute node hit all the gateways constantly with BFD
[15:04] <jrosser> that's going to scale interestingly
[15:05] * jrosser is old enough to remember the cisco 6500 with not enough CPU power to do BFD on all the ports concurrently. that got interesting if you tried to.....
[15:06] <mgariepy> lol
[15:06] <mgariepy> didn't your friendly cisco support expert help you with that?
[15:08] <jrosser> oh, well, we had people who knew better than to try it
[15:08] <spatel> are you concerned about BFD running on all compute nodes? :)
[15:08] <jrosser> and, unfortunately, people who didn't
[15:08] *** dviroel is now known as dviroel|lunch
[15:08] <jrosser> spatel: well, it's maybe just surprising from an architecture POV - you have hundreds of compute nodes, don't you?
[15:09] <noonedeadpunk> I'm personally concerned about passing the public net to each compute node...
[15:09] <jrosser> ^ this
[15:09] <jrosser> I don't / won't do that
[15:09] <jrosser> though i would love to see an offloaded L3 agent actually working
[15:10] <noonedeadpunk> Oh yes
[15:10] <spatel> noonedeadpunk: it's a trade-off - performance / high availability versus security :) being a public cloud company, i can understand
[15:11] <spatel> in our case we are running a private cloud and need as much performance as possible with zero downtime.
[15:14] <jrosser> i think my concern with BFD is how little packet loss you'd need to fail out a gateway node
[15:14] <jrosser> because that's the point, to give extremely fast failover
[15:15] <jrosser> and the CPU in the gateway node is handling both control plane and data plane, so some data plane overload would break the control plane
[15:15] <jrosser> which is totally different to how a hardware router would deal with it
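
The BFD sessions under discussion can be inspected directly from OVS on any chassis; a quick sketch:

    # dump the state of every BFD session this chassis maintains
    # (forwarding state, detect multiplier, tx/rx intervals)
    ovs-appctl bfd/show
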
[15:17] *** ysandeep is now known as ysandeep|out
[15:57] <admin1> is no one else getting => "galera_server : Fail if galera_cluster_name doesn't match provided value" when doing upgrades (minor as well as major)?
[15:57] <admin1> i seem to always get it
[16:05] <spatel> jrosser: I am sure you can control the BFD packet rate per second/minute etc. (dead timer/hold timer), and you can isolate the host CPU or pin OVS threads to specific CPUs for better control and to avoid overload
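
spatel is right that the timers are tunable; recent OVN exposes them as NB_Global options (a sketch - these option names are from the ovn-nb schema, so verify them against your OVN version):

    # relax the BFD timers cluster-wide: min tx/rx interval in ms,
    # plus the detect multiplier (missed packets tolerated before
    # a gateway chassis is failed out)
    ovn-nbctl set NB_Global . options:bfd-min-tx=1000
    ovn-nbctl set NB_Global . options:bfd-min-rx=1000
    ovn-nbctl set NB_Global . options:bfd-mult=5
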
[16:06] <spatel> admin1: post the full error.. i believe i have seen it
[16:11] <admin1> running again now .. will post once i hit the error
[16:12] <admin1> https://gist.github.com/a1git/a2368b36dd8465f13c829c2354515cfc
[16:15] *** dviroel_ is now known as dviroel
[16:17] <spatel> admin1: mostly that means the cluster is not happy
[16:22] <admin1> but the cluster is happy, all is in sync, the name is good
[16:28] <spatel> did you query the cluster name in the DB?
[16:29] <spatel> that playbook tries to match the name stored in the DB with the name stored in the file.. i may need to check that task to understand
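
A couple of direct checks that can confirm or rule out "cluster is not happy", assuming mysql client access on a galera node:

    # the name the role compares against galera_cluster_name
    mysql -h localhost -e "SHOW VARIABLES LIKE 'wsrep_cluster_name';"
    # overall health: status should be Primary,
    # size should equal the node count
    mysql -h localhost -e "SHOW STATUS LIKE 'wsrep_cluster_status';"
    mysql -h localhost -e "SHOW STATUS LIKE 'wsrep_cluster_size';"
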
[16:45] <admin1> also, during the upgrade some process creates folders in /var/lib/mysql like #tmp and tmp.xxxxx, which are not valid database names (but which appear as database names)
[16:49] <spatel> hmm
[16:54] <admin1> ansible galera_container -m shell -a "mysql -h localhost -e 'show variables like \"%wsrep_cluster_name%\";'" - all 3 return openstack_galera_cluster
[17:05] <jrosser> admin1: there are fixes for that #tmp stuff
[17:05] <jrosser> you need to look at the patches we merged for that and check whether you are using them
[17:06] <spatel> admin1: i always set this in my user_variables.yml :) i know it's the default, but i still do galera_cluster_name: openstack_galera_cluster
[17:12] <admin1> i am upgrading from 24.x latest to 25.0.0 --
[17:13] <jrosser> early adopter :)
[17:13] <admin1> someone has to :)
[17:14] <jrosser> https://github.com/openstack/openstack-ansible-galera_server/commit/ebc0417919fcedd924fa5a21107055a433eca6f6
[17:16] <jamesdenton> also upgrading... running into an issue in lxc_hosts; it seems ca-certificates needs to be installed in ubuntu-20-amd64... https://paste.opendev.org/show/bsvKILJ5V3woJvVHVkma/
[17:16] <jamesdenton> verifying that theory now
[17:18] <jrosser> interesting
[17:20] <spatel> jamesdenton: i noticed that on 20.04.1, but if you have ubuntu 20.04.4 you should be ok.. and i believe OSA does that by default when it runs lxc_hosts
[17:20] <jrosser> ca-certificates is certainly installed in the lxc image https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L42
[17:20] <jrosser> but if somehow that decides to use https (which it kind of shouldn't), you would be in a chicken/egg situation
[17:44] <jamesdenton> i think it is chicken/egg, but for a different reason. i think ca-certificates is needed before the pkg.osquery.io repo can be added
[17:44] <jamesdenton> https://paste.opendev.org/show/bOl1SeK5Q6wykAutjLwH/
[17:48] <jrosser> you might need some Acquire::https::repo.domain.tld::Verify-Peer "false"; / Acquire::https::repo.domain.tld::Verify-Host "false"; in the host's apt.conf to make that work
[17:49] <jrosser> that will be copied into the lxc cache before the prep script is run https://github.com/openstack/openstack-ansible-lxc_hosts/blob/c679877abaaf4b8449c05def5e4f3969ebf2dd65/vars/debian.yml#L24
[17:49] <jrosser> though it's ugly
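
Spelled out, the apt.conf fragment jrosser describes would look roughly like this (repo.domain.tld in his message is a placeholder; pkg.osquery.io is substituted here since that is the repo jamesdenton named):

    # /etc/apt/apt.conf.d/99-osquery-no-verify
    # disables TLS verification for this one host only,
    # until ca-certificates is installed in the image
    Acquire::https::pkg.osquery.io::Verify-Peer "false";
    Acquire::https::pkg.osquery.io::Verify-Host "false";
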
[17:50] <jrosser> an alternative is to locally mirror (or reverse proxy) the osquery repo at an http endpoint
[17:52] <jrosser> it's a bit tricky - as we can't make any assumptions about what the host prep has done with /etc/apt/.... so we just copy the whole lot to the container base image
[18:10] <jamesdenton> or.. https://paste.opendev.org/show/btmSPKASGeF7ZPKJ2kNH/... Line 16 :D
[18:40] <jamesdenton> aka i just installed ca-certificates higher up in debian_prep.sh, before the apt update
[18:41] <jamesdenton> i guess lxc_cache_prep_pre_commands could be used
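
A sketch of that idea in user_variables.yml, assuming lxc_cache_prep_pre_commands accepts arbitrary shell run inside the cache before the main prep (check the lxc_hosts role defaults for the exact semantics):

    # user_variables.yml
    # install ca-certificates before the prep script's apt update,
    # so the https osquery repo can be verified
    lxc_cache_prep_pre_commands: |
      apt-get update || true  # tolerate repos failing TLS verification
      apt-get install -y --no-install-recommends ca-certificates
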
[18:51] <jamesdenton> spatel: this is 20.04.4, so not sure what's different
[18:55] <spatel> very odd.. i had the same issue last week with 20.04.1, but later when i deployed osa on 04.4 there was no issue
[19:39] <jrosser> jamesdenton: does that work even when creating the container cache from nothing? I guess there is sufficient repo configuration from debootstrap
[19:40] <jrosser> though I think one of the reasons the apt config is copied in early is to account for any mirrors or proxies defined on the host
[19:45] *** tosky_ is now known as tosky
[20:05] <admin1> quick check .. on one of my controllers i have like 15k threads .. if you run a busy controller, how many threads do you see without being bothered about it?
[20:17] <spatel> admin1: what are those threads?
[20:17] <spatel> nova/neutron blah..
[20:20] <admin1> spatel: https://gist.github.com/a1git/319e4b591ab18b26fa5892f0ab7e4c72
[20:24] <spatel> looks ok to me.. mostly, when i deploy multiple roles on a single server, i set the workers individually so they don't overload the box
[20:25] <spatel> by default, OSA does math with the number of cpu cores times foo to set the workers
[20:25] <spatel> i mostly start with 2 workers and then add more if i need more..
[20:26] <spatel> neutron_rpc_workers: 4
[20:26] <spatel> for example
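
For context, the kind of overrides spatel means in user_variables.yml; neutron_rpc_workers is his example, the other names below are hypothetical placeholders - each role's defaults/main.yml lists its real worker/process variables:

    # user_variables.yml - cap workers instead of the per-core default
    neutron_rpc_workers: 4
    # hypothetical names for illustration only; confirm against
    # each role's defaults before using
    nova_api_workers: 4
    glance_api_workers: 2
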
[20:37] <admin1> ok
[21:39] *** dviroel is now known as dviroel|out
