Tuesday, 2020-09-29

*** maharg101 has quit IRC00:03
*** yolanda has quit IRC00:38
*** yolanda has joined #openstack-ansible00:39
*** cshen has joined #openstack-ansible00:48
*** cshen has quit IRC00:52
*** gyee has quit IRC01:04
*** cshen has joined #openstack-ansible01:55
*** maharg101 has joined #openstack-ansible01:59
*** cshen has quit IRC02:00
*** d34dh0r53 has quit IRC02:03
*** maharg101 has quit IRC02:06
*** rh-jelabarre has quit IRC02:09
*** nurdie has joined #openstack-ansible02:26
*** nurdie has quit IRC02:30
*** d34dh0r53 has joined #openstack-ansible02:32
*** MickyMan77 has joined #openstack-ansible03:32
*** MickyMan77 has quit IRC03:40
*** cshen has joined #openstack-ansible03:55
*** cshen has quit IRC04:00
*** shyamb has joined #openstack-ansible04:03
*** maharg101 has joined #openstack-ansible04:03
*** maharg101 has quit IRC04:08
*** MickyMan77 has joined #openstack-ansible04:10
*** idlemind has quit IRC04:16
*** idlemind_ has joined #openstack-ansible04:16
*** shyamb has quit IRC04:18
*** MickyMan77 has quit IRC04:18
*** nurdie has joined #openstack-ansible04:27
*** nurdie has quit IRC04:31
*** evrardjp has quit IRC04:33
*** evrardjp has joined #openstack-ansible04:33
*** zigo has quit IRC04:38
*** shyamb has joined #openstack-ansible04:46
*** cshen has joined #openstack-ansible04:48
*** MickyMan77 has joined #openstack-ansible04:51
*** cshen has quit IRC04:53
*** MickyMan77 has quit IRC04:59
*** shyam89 has joined #openstack-ansible05:05
*** shyamb has quit IRC05:08
*** cshen has joined #openstack-ansible05:15
*** cshen has quit IRC05:19
*** suryasingh has joined #openstack-ansible05:30
*** MickyMan77 has joined #openstack-ansible05:38
*** MickyMan77 has quit IRC05:41
*** itandops has joined #openstack-ansible05:44
*** miloa has joined #openstack-ansible05:57
*** itandops has quit IRC05:57
*** maharg101 has joined #openstack-ansible06:04
*** pcaruana has joined #openstack-ansible06:05
*** maharg101 has quit IRC06:09
BlackFXis it normal for rabbit to sit constantly at 80%+ CPU?06:24
*** shyam89 has quit IRC06:26
janno_BlackFX: yes06:26
*** nurdie has joined #openstack-ansible06:28
BlackFXOkay06:28
BlackFXI seem to have a really slow horizon06:28
janno_BlackFX: This is due to BEAM06:31
janno_BlackFX: https://stressgrid.com/blog/beam_cpu_usage/06:31
*** nurdie has quit IRC06:32
BlackFXmemcached had too many open files06:36
*** noonedeadpunk has quit IRC06:37
*** shyamb has joined #openstack-ansible06:42
*** noonedeadpunk has joined #openstack-ansible06:46
*** djhankb has quit IRC07:09
*** djhankb has joined #openstack-ansible07:10
*** maharg101 has joined #openstack-ansible07:10
*** dirk has quit IRC07:13
*** shyamb has quit IRC07:14
*** dirk has joined #openstack-ansible07:15
*** cshen has joined #openstack-ansible07:16
jrossermorning07:19
jrossernoonedeadpunk: if you have an idea on this - i don't really see why it still breaks https://review.opendev.org/#/c/75472207:19
noonedeadpunkmorning jrosser:)07:19
*** shyamb has joined #openstack-ansible07:23
noonedeadpunkok, so aodh and panko fails due to gnocchi patch07:23
noonedeadpunkso in order to revert we need 754722 merged07:23
noonedeadpunkand 754722 doesn't seem to have DB creation delegated07:23
*** shyam89 has joined #openstack-ansible07:26
jrosseri was expecting to see 754722 pass to give confidence it was OK to merge the aodh and panko patches without CI07:27
*** shyamb has quit IRC07:29
*** shyam89 has quit IRC07:31
noonedeadpunklet's probably try re-checking it, but dunno if that gonna work07:32
noonedeadpunkas seems like it didn't pull changes for me07:32
jrosseryeah, or there is some other underlying problem in aodh role that i don't spot07:34
*** andrewbonney has joined #openstack-ansible07:44
*** tosky has joined #openstack-ansible07:45
*** cshen has quit IRC07:45
*** jbadiapa has joined #openstack-ansible07:49
*** cshen has joined #openstack-ansible08:00
*** shyamb has joined #openstack-ansible08:02
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Use nodepool epel mirror in CI for systemd-networkd package  https://review.opendev.org/75470608:03
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-rabbitmq_server master: Require the use of community.rabbitmq ansible collection  https://review.opendev.org/75465708:06
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Add user defined collections  https://review.opendev.org/75341108:06
*** cshen has quit IRC08:26
*** mensis has joined #openstack-ansible08:27
*** djhankb has quit IRC08:28
*** djhankb has joined #openstack-ansible08:28
*** nurdie has joined #openstack-ansible08:29
*** cshen has joined #openstack-ansible08:33
*** sshnaidm|off is now known as sshnaidm08:33
*** nurdie has quit IRC08:34
jrosseri updated the linter version, and it doesnt like this https://github.com/openstack/openstack-ansible-tests/blob/master/test-prepare-host.yml#L22408:45
jrosserand i agree :)08:46
noonedeadpunkin terms of replace?:)08:48
jrosser'Don't compare to literal True/False'08:48
noonedeadpunkah08:48
jrosserit's taken me a while just to work out what on earth it is doing08:48
noonedeadpunklol, yes08:48
noonedeadpunkwait, really...08:51
noonedeadpunkwhat are we appending here xD08:51
jrosseryes exactly08:51
jrosserit's quite special08:51
jrosseri think it makes a list of 'true true true true'08:51
jrossermy head hurts now!08:52
noonedeadpunkand we're asserting list of trues?:)08:52
jrosseryup08:52
noonedeadpunkI want to unsee it08:54
jrosseri need to go sit in a quiet dark place for a while now08:56
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server master: DNM Try to understand what's wrong in CI  https://review.opendev.org/75461008:57
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Use nodepool epel mirror in CI for systemd-networkd package  https://review.opendev.org/75470609:01
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177309:03
openstackgerritMerged openstack/openstack-ansible-openstack_hosts stable/ussuri: Use xt_MASQUERADE instead of ipt_MASQUERADE for kernels > 5.2  https://review.opendev.org/75483309:09
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server master: DNM Try to understand what's wrong in CI  https://review.opendev.org/75461009:27
*** arxcruz has quit IRC09:35
*** shyamb has quit IRC09:45
*** arxcruz has joined #openstack-ansible09:46
*** mensis has quit IRC10:05
*** nurdie has joined #openstack-ansible10:13
*** nurdie has quit IRC10:18
masterpeCinder-volume had too many open files, I think it was at the default (1024 4096). We changed it by creating a file /etc/systemd/system/cinder-volume.service.d/limits.conf with content LimitNOFILE=16384.10:23
masterpeis LimitNOFILE managed by openstack-ansible?10:24
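
The drop-in masterpe describes can be sketched as follows. This is a minimal illustration, not how openstack-ansible templates the file: the helper names `render_limits_dropin` and `dropin_path` are hypothetical, and the `[Service]` section header is added on the assumption that the file is a normal systemd drop-in, where `LimitNOFILE=` must sit under `[Service]`.

```python
from pathlib import PurePosixPath


def render_limits_dropin(nofile: int) -> str:
    """Return the text of a systemd drop-in that raises the open-file limit.

    LimitNOFILE= is a [Service] setting, so the section header is included.
    """
    return f"[Service]\nLimitNOFILE={nofile}\n"


def dropin_path(unit: str, name: str = "limits.conf") -> PurePosixPath:
    """Build the drop-in path for a unit, e.g. cinder-volume.service.

    Hypothetical helper: it only composes the path, it does not write anything.
    """
    return PurePosixPath("/etc/systemd/system") / f"{unit}.d" / name


if __name__ == "__main__":
    print(dropin_path("cinder-volume.service"))
    print(render_limits_dropin(16384), end="")
```

After writing such a file by hand, systemd needs `systemctl daemon-reload` and a restart of the unit before the new limit applies.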
*** shyamb has joined #openstack-ansible10:43
*** nurdie has joined #openstack-ansible10:44
*** nurdie has quit IRC10:48
*** shyam89 has joined #openstack-ansible10:51
*** miloa has quit IRC10:52
*** shyamb has quit IRC10:53
noonedeadpunkjrosser: galera is super weird... like it fails always 2nd container, and service restart just got stuck locally. but in case of container restart it just spawns and joins cluster without issues...10:55
noonedeadpunkI'm really not sure what's wrong with it...10:55
*** cshen has quit IRC10:59
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-galera_server master: DNM Try to understand what's wrong in CI  https://review.opendev.org/75461010:59
*** cshen has joined #openstack-ansible11:03
*** djhankb has quit IRC11:07
*** djhankb has joined #openstack-ansible11:08
*** cshen has quit IRC11:10
*** shyam89 has quit IRC11:12
*** cshen has joined #openstack-ansible11:13
*** shyamb has joined #openstack-ansible11:22
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/75497811:27
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177311:29
jrossermasterpe: we already make limits.conf for galera and memcached, plus also for the swift service like this https://opendev.org/openstack/openstack-ansible-os_swift/src/branch/master/defaults/main.yml#L34611:30
jrosserif you think that we need to increase the default for cinder then we can do something similar11:30
jrosserfor the time being you can make a config override of this variable https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/defaults/main.yml#L33011:31
jrosserand add whatever you need to the cinder volume systemd unit11:31
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/75497811:41
*** shyamb has quit IRC11:49
*** rh-jelabarre has joined #openstack-ansible11:53
*** rh-jelabarre has quit IRC11:53
*** rh-jelabarre has joined #openstack-ansible11:54
*** shyamb has joined #openstack-ansible11:58
jrossernoonedeadpunk: in the galera functional test, do we run it in serial container1/2/3 or all at the same time?12:02
*** shyam89 has joined #openstack-ansible12:03
*** cshen has quit IRC12:04
*** shyamb has quit IRC12:06
noonedeadpunkall at the same time12:07
noonedeadpunkI was thinking about serial tbh12:08
noonedeadpunkas we run serial in prod12:08
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-tests master: Update ansible-linti==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498212:11
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498212:11
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Use nodepool epel mirror in CI for systemd-networkd package  https://review.opendev.org/75470612:17
*** rfolco|ruck has joined #openstack-ansible12:22
*** shyam89 has quit IRC12:29
*** shyam89 has joined #openstack-ansible12:29
*** shyam89 has quit IRC12:30
*** shyam89 has joined #openstack-ansible12:31
snadgenoonedeadpunk: how did you know about jumbo frames and no route to host issue i was having?12:31
noonedeadpunkso it was them?:)12:31
snadgei've done some testing since.. and local to container itself, i dont seem to have any connectivity issues12:31
snadgebut from the controller to the container.. i get no route to host, which doesn't make sense to me12:31
noonedeadpunkfrom controller to container within same controller or container on other controller?12:32
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Remove python3 packages from bindep.txt  https://review.opendev.org/75498712:32
snadgei get no route to host from other hosts on the same br-mgmt network12:32
snadgewhich could be a firewalling issue or anything.. i honestly hate networks and its not my thing12:33
snadgewhat i do know is.. i shouldn't get an intermittent fault no route to network which then disappears from the controller to the neutron container12:33
snadgei'd love to know why that happens, and i suspect it has something to do with using centos 712:34
noonedeadpunkeventually you shouldn't have it at all:)12:34
*** shyam89 has quit IRC12:34
noonedeadpunkbut in case of jumbo frames I had to set lxc container interfaces MTU specifically to 145012:34
*** shyam89 has joined #openstack-ansible12:34
snadgeinteresting12:35
noonedeadpunkI guess I did that even in lxc config or smth like that12:35
mgariepyjumbo frame is a mess;) hahaah12:35
snadgeif that fixes it i owe you a carton of beers12:35
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177312:38
*** yolanda has quit IRC12:38
noonedeadpunksnadge: I probably set `lxc_container_default_mtu` for that12:39
*** yolanda has joined #openstack-ansible12:39
noonedeadpunkor maybe set it in container_networks....12:40
noonedeadpunkcan't really recall nowadays...12:40
jrossermtu and no route to host is a really odd combination12:41
jrosseri wonder what 'ip r' has to say12:41
snadgeit shows the routes obviously.. and the issue is with the br-mgmt network (apparently)12:42
snadgeat the time that it has the problem maybe that route is missing.. is what i should check12:43
*** nurdie has joined #openstack-ansible12:45
*** shyam89 has quit IRC12:46
snadgeok i have confirmed the route is still there according to ip r12:47
snadgebut i got "no route to host" from telnet to the neutron backend server port (9696) immediately prior to and after that12:47
snadgeand then however many seconds later.. it connects and starts working again. frustrating12:47
noonedeadpunkI think what may happen here is that vlan interface with mtu 1450 is part of the br-mgmt, but lxc tries to use mtu 1500 by default12:49
*** nurdie has quit IRC12:49
snadgeim curious to try changing that to 1450.. where can i do that?12:50
snadgei mean.. thats a simple thing to try right12:50
noonedeadpunkyep, you can either set in lxc directly, or if doing it normally, you should put into /var/lib/lxc/container_name/eth1.ini or smth like this12:51
noonedeadpunkand restart container12:51
jrosserthere's no encapsulation going on with the lxc bridges so really the whole thing should be 1500-mtu transparent all the way12:52
noonedeadpunkor, set `lxc_container_default_mtu` in user_variables, and run some playbook... like containers-lxc-create.yml12:52
jrosseryou can do 'ping -M do -s <number-28> <destination-ip>' and fiddle around with the value of 'number' to find the actual mtu that will pass12:54
jrosserthe 28 accounts for the ICMP and IP headers12:55
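
The probing approach jrosser describes can be sketched like this. The 28 bytes are the IPv4 header (20 bytes, no options) plus the ICMP echo header (8 bytes). The function names, the search bounds and the `passes` callback are assumptions made for illustration; the sketch only builds the command line and searches, it does not run ping itself.

```python
from typing import Callable, Optional

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu: int) -> int:
    """Payload size to pass to 'ping -s' so the whole IP packet equals mtu."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(mtu: int, destination: str) -> list[str]:
    """Build a Linux ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-c", "1",
            "-s", str(ping_payload_for_mtu(mtu)), destination]


def largest_passing_mtu(passes: Callable[[int], bool],
                        low: int = 576, high: int = 9000) -> Optional[int]:
    """Binary search for the largest MTU in [low, high] that passes.

    Assumes that if an MTU passes, every smaller MTU passes too.
    Returns None when even the lower bound fails.
    """
    if not passes(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if passes(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used as a placeholder.
    print(" ".join(ping_command(1500, "192.0.2.10")))
```

In practice `passes` would wrap a call that runs the command from `ping_command` and checks the exit status; here it is left as a callback so the search logic can be exercised without a network.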
snadgei just changed neutron to 1450 from 150012:55
snadgeand rebooted the container12:55
*** shyamb has joined #openstack-ansible12:56
jrosserso 1450 would generally be the setting in neutron when your project network type is vxlan12:56
jrosserthat means that the packets created by your VM will be small enough to fit inside a vxlan packet and still be smaller than 150012:57
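
The 1450 figure follows from the VXLAN encapsulation overhead on an IPv4 underlay. A small sketch of the arithmetic, with the function name `tenant_mtu` chosen here for illustration and an IPv4 underlay without VLAN tags assumed:

```python
# Bytes added around the tenant's IP packet when it is carried in VXLAN
# over an IPv4 underlay (assumption: no outer VLAN tag, no IP options).
VXLAN_OVERHEAD_IPV4 = {
    "inner_ethernet": 14,  # the tenant frame's own Ethernet header
    "vxlan_header": 8,
    "outer_udp": 8,
    "outer_ipv4": 20,
}


def tenant_mtu(underlay_mtu: int, overhead: dict = VXLAN_OVERHEAD_IPV4) -> int:
    """Largest tenant MTU whose encapsulated packet still fits the underlay."""
    return underlay_mtu - sum(overhead.values())


if __name__ == "__main__":
    print(tenant_mtu(1500))
    print(tenant_mtu(9000))
```

With a 1500-byte underlay this gives 1450, and with a 9000-byte jumbo-frame underlay it gives 8950.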
*** shyamb has quit IRC12:57
*** shyamb has joined #openstack-ansible12:57
snadgethe mtu didn't seem to apply so im just rebooting the entire controller ;)12:59
*** shyamb has quit IRC12:59
*** shyamb has joined #openstack-ansible12:59
jrosserip -d link show <- that'll show you what you've got12:59
snadgethats what i should've done yeah.. this will take about 5-10 minutes to come back up (blade server)12:59
jrosserso "right answer" here depends what you want to happen13:01
jrosserif you need to pass vxlan traffic over a 1500mtu underlying network then the neutron networks need to be 145013:01
snadgebut that doesn't really make sense why it would work intermittently13:02
jrosserand that can propagate across interfaces and affect other stuff if you share the bridges with your containers and so on13:02
*** shyamb has quit IRC13:04
*** shyamb has joined #openstack-ansible13:05
*** cshen has joined #openstack-ansible13:06
*** shyamb has quit IRC13:07
*** shyamb has joined #openstack-ansible13:08
snadgeok well the neutron container has a 1450 mtu now and its still doing the layer 4 no route thing13:15
*** shyamb has quit IRC13:15
snadgeim starting to wonder if i should just give someone else a login to this server somehow via a port forward or whatever13:16
*** nurdie has joined #openstack-ansible13:31
snadgei need to find a resolution on this problem ideally within the next week or so.. i've at least narrowed it down to some kind of lxc networking issue13:33
snadgesince the problem is easily reproducible between the lxc host, and the container which is running on the same blade13:34
snadgeif i telnet to localhost within the container.. it always connects and i never get the no route to host issue13:34
jrossernoonedeadpunk: are we missing a release for the most recent set of SHA bumps?13:42
noonedeadpunkwe do as we had a bug there, which we closed right afterwards, but I couldn't recall what exactly it is13:43
*** sshnaidm has quit IRC13:48
openstackgerritMerged openstack/openstack-ansible-openstack_hosts master: Updated from OpenStack Ansible Tests  https://review.opendev.org/75415913:55
*** sshnaidm has joined #openstack-ansible13:58
*** nurdie_ has joined #openstack-ansible13:59
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498214:02
*** nurdie has quit IRC14:02
*** pcaruana has quit IRC14:15
*** pcaruana has joined #openstack-ansible14:24
*** spatel has joined #openstack-ansible14:24
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498214:24
spateljrosser: spotz does this make sense to you guys? I have created my blog for octavia networking - https://satishdotpatel.github.io//openstack-ansible-octavia/14:25
jrosserspatel: that is cool - nice diagram14:31
jrosseri am guessing you don't use neutron l3 agent?14:31
spateljrosser: no, my Cisco ASA is my gateway, so pure vlan-based provider14:32
jrosserand the DHCP, does neutron do that for you?14:33
*** theintern has joined #openstack-ansible14:33
jrosserlike amphora IP14:33
spatelYes everything neutron does that for me.14:33
spateli get IP address on amphora (dual IP, 1. mgmt and 2. vm traffic)14:34
jrosserright - so there will be something like qdhcp namespace on the controller also talking to eth1414:34
jrosserwell actually i'm not sure how it'll be wired actually, but it would be great to add that too14:35
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177314:38
spateljrosser: let me find that out how does my neutron DHCP agent get wire-up with br-lbaas (currently my lab is broken and trying to bring it up)14:39
jrosserspatel: cool - it's good to add because there's three things in play, the octavia container, neutron dhcp and the wiring to the amphora14:40
spateljrosser: i do have namespace for VLAN 27 subnet on controller node running qdhcp14:41
spotzspatel: I can't help myself... no did here and configured - how did i configure:)14:41
spotzspatel: same changes here - how did i wire - how I wired14:42
spatellet me show you how14:42
spotzspatel:  i didn’t created - I didn't create14:43
spotzHelp I'm doing reviews not in Gerrit:)14:43
spatelhere you go - http://paste.openstack.org/show/798522/14:44
spateltap interface tapbbe749e9-8f is connected to vlan.27 and namespace for qdhcp14:45
spatelspotz: I am also kinda new for octavia so lets clear all doubt here and then we will write up nice official doc with good example for new folks..14:47
spateljrosser: does that make sense to you - http://paste.openstack.org/show/798522/14:47
CeeMacsnadge: its definitely worth performing the ping exercise jrosser mentioned to try and work out the maximum MTU14:48
spotzspatel: I'll put what you have in a local doc and clean it up when I get home in a bit, if you want PM me your email to send it back to you14:48
jrosserspatel: i think so - neutron has created vlan.27, and you can see that the tap name matches up with the ns name at the top of your paste14:48
CeeMacI've had all kinds of crazy issues in various network environments where the MTU has been out of alignment.14:49
spateljrosser: spotz i will add DHCP namespace in diagram also so it will be little clear to understand how dhcp handing over lbaas-mgmt ip14:49
jrosserCeeMac: i was wondering if the issue there was using br-vlan for the mgmt traffic as well as the neutron vlans14:50
jrosserneutron will fiddle with the MTU on the interfaces and that could easily mess up other things that you use the bridge for if they don't account for the changed MTU14:51
CeeMacjrosser: yeah. makes sense.  at first I thought it might have been similar to the issues i'd seen trying to run controller on vmware, but then its intermittent whereas I had constant no route to host14:52
CeeMacmtu issues are haunting me at the moment it seems14:52
jrosserwell, if it's not dns it'll be mtu :/14:52
CeeMac+114:55
*** theintern has quit IRC14:58
jrosserspatel: also in /etc/openstack_deploy/openstack_user_config.yml do you have a used_ips section keeping the containers out of 172.27.40.200-172.27.40.250 ?14:58
spatelyes i do but i missed that in my doc14:59
spateli will add that14:59
jrosserquite a small range btw - only 50 amphora ip there out of a whole /2414:59
spatelits my lab :)15:00
spatelin production i have /21 range15:00
jrosserah cool15:01
spatelnow i am building a new datacenter using VxLAN+EVPN (spine-leaf) and going to run octavia and senlin in production there so doing all preliminary exercise in lab.15:02
spatel200 node private cloud.15:03
CeeMacspatel: nice :D15:03
CeeMacwhat hardware are you running that on switch/router wise?15:03
spatelI am planning to make 6 node controller ( 3 node for all API and other 3 nodes for shared services like mysql, rabbitmq etc..)15:04
spatelWe are using Cisco nexus 9336-FX2 for spine and Cisco nexus 9396PX for leaf15:04
jrosser^ snap15:04
jrosseri have evpn on 9336-FX2 here15:04
jrosseralso same split of 3 x infra / 3x shared nodes15:05
CeeMachaven't looked at Nexus switches for a while15:05
spatelthat switches are beast, it can support 10G to 100G :)15:05
CeeMacguess they're pretty pricey :)15:05
jrosserdo you have the evpn running?15:06
spateljrosser: are you running 3x shared nodes in LXC or metal way?15:06
jrosserlxc15:06
spatelcool that is what i am thinking15:06
spatelI am going to run OSPF+BGP style evpn for datacenter15:06
spatelcurrently practicing them on Cisco VIRL simulator :) + in my network lab15:07
jrosseri have octavia lbaas network on an evpn, works nicely15:07
spatelnice!15:08
spateljrosser: do you guys run multicast for BUM traffic?15:08
jrosseryes15:08
spatelsame here :)15:08
jrosseroh - you mean neutron VXLAN or the nxos stuff, because both15:09
spatelnxos VXLAN15:09
jrosserright yes using multicast for that15:09
spatelEVPN multicast :)15:09
jrosserthen we also made TRM work inside the evpn tunnels for multicast applications15:09
spatelAre you guys using anycast gateway on leaf?15:10
jrosseryes15:10
jrosseralso leaf are VPC pair15:10
spatelsame here15:10
spatelvPC for all TOR15:10
spateljrosser: it would be great if you share your config if that is not very confidential :)  i would like to see if i am following all best practices.15:11
jrosserwe did eBGP for the underlay15:11
spateli can share mine next week when i will start rolling out all config15:11
spatelI was thinking to use eBGP but its little complicated so i decided to use OSPF for underlay15:12
jrosserthe config will be much smaller or OSPF15:12
jrosser*for15:12
spateleBGP required lots of typing and peers while OSPF is very simple and copy paste config:)15:12
spateleBGP is good for massive datacenter design but we have only 10 racks :)15:13
noonedeadpunknot _so_ small, considering potential growth15:13
spatelnow we are planning to build multiple datacenter instead putting all eggs on single bucket.15:14
spatelsoon planning to open another datacenter in EU and then Singapore15:15
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498215:17
spateljrosser: this is what i am trying to build in new DC - https://ibb.co/5vc4bn215:17
jrosserreally 40g? :)15:18
spatelyes15:18
spatelwhy?15:18
jrosser100G optics are cheap15:19
jrosserif you don't buy cisco.....15:20
mgariepylol.15:20
spatelWe already have lots of optics in stock so thought lets use them.. i don't think we will ever max out any link15:20
CeeMaclaser2000 FTW15:20
jrosseroh well thats ok if you have them :)15:20
spatelWe used fs.com mostly15:20
CeeMaci'd love to get 40GB DCIs15:21
CeeMacstuck at 10GB without large bag of cash15:21
spatelWe also have 10G DCI15:22
spatelwe don't have L2 stretch between DC15:22
CeeMacthats what EVPN is there for :p15:24
jrosserspatel: these have been good in the 9336 https://www.fs.com/uk/products/65210.html15:24
CeeMacwe use MPLS-EVPN for DCI15:25
spateljrosser: someday we will upgrade from 40G to 100G :)15:26
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177315:31
spateljrosser: what is the configuration of your 3x shared infra nodes? cpu + memory etc..15:31
*** gyee has joined #openstack-ansible15:31
jrosserthey are fairly small, xeon-d 8C/16T with 64G15:32
fridtjof[m]I just upgraded from Stein to Train, and somehow the placement service broke15:36
fridtjof[m]When creating an instance, nova-scheduler complains: "Failed to retrieve allocation candidates from placement API for filters [...]" and gives me a 50315:36
fridtjof[m](with an HTML body)15:37
fridtjof[m]I checked both my (new, apparently) placement containers, and they're running fine15:37
fridtjof[m]both are UP in haproxy15:37
*** mensis has joined #openstack-ansible15:38
fridtjof[m]oh, seems like nova-scheduler is still trying to access nova_api_placement_front?15:38
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75506515:39
jrosserfridtjof[m]: there are i think some specific options on the upgrade scripts for placement S->T, did you see those?15:40
fridtjof[m]I just ran the run-upgrade script, assuming it would do all that's described in the major upgrade documentation15:40
fridtjof[m]I see it sets up the new placement containers, which exist now and seem to be fine15:41
fridtjof[m]what it definitely missed was removing the legacy backends from haproxy15:41
fridtjof[m]also, nova config seems to be untouched in that matter15:41
jrossersee https://github.com/openstack/openstack-ansible/blob/stable/train/scripts/run-upgrade.sh#L17915:42
jrosserplacement_migrate_flag=true is intended to make the changes you need15:43
fridtjof[m]yeah, I think i'll just rerun them and take a close look15:44
jrossertheres step-by-step at the bottom of here https://docs.openstack.org/openstack-ansible/train/admin/upgrades/major-upgrades.html15:48
fridtjof[m]ah, looking at the haproxy config it seems like there's both the old and new placement frontends defined with the same ports15:49
fridtjof[m]and because of order, the old one takes precedence15:49
fridtjof[m]just going to redeploy haproxy then15:49
fridtjof[m]yup, got that page open :)15:50
spateljrosser: i do have 64GB + 2.5GHz cpu with 48 cores, i have 200 compute nodes so hope it should be enough15:50
jrosserfridtjof[m]: the haproxy role works by dropping lots of config fragments then using the ansible 'assemble' module to glue them together into one config file15:52
jrosseri'm not quite seeing at the moment where in the upgrade process the old placement frontend is removed15:52
fridtjof[m]yeah, it's just ignoring /etc/haproxy/conf.d/nova_api_placement15:53
fridtjof[m]I see the two steps generating config files and dropping files for non present services, but the list for that does not seem to contain nova_api_placement15:54
jrosserto remove it, the entry should be in the list of endpoints but state: absent15:57
jrosserthen it gets deleted15:57
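
The behaviour discussed above (fragments glued together in name order, and a stale fragment surviving until its service is marked absent) can be illustrated with a small sketch. This is not the haproxy role's code: the function names, the `name`/`state` keys and the fragment contents are hypothetical, and the only property borrowed from Ansible's `assemble` module is that fragments are concatenated in string sort order.

```python
def assemble(fragments: dict[str, str]) -> str:
    """Concatenate config fragments in string sort order of their names."""
    return "".join(fragments[name] for name in sorted(fragments))


def apply_service_states(fragments: dict[str, str],
                         services: list[dict]) -> dict[str, str]:
    """Drop fragments whose service entry is marked state: absent.

    Fragments with no matching service entry are left alone, which is how
    a stale file can linger after an upgrade.
    """
    result = dict(fragments)
    for service in services:
        if service.get("state", "present") == "absent":
            result.pop(service["name"], None)
    return result


if __name__ == "__main__":
    # Hypothetical fragment contents standing in for /etc/haproxy/conf.d files.
    fragments = {
        "placement": "frontend placement-front\n",
        "nova_api_placement": "frontend nova_api_placement-front\n",
    }
    print(assemble(fragments), end="")
    cleaned = apply_service_states(
        fragments, [{"name": "nova_api_placement", "state": "absent"}])
    print(assemble(cleaned), end="")
```

Because "nova_api_placement" sorts before "placement", the old frontend lands first in the assembled file until its entry is listed as absent.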
dmsimardpleasantly surprised to see upgrading to ansible 2.10 didn't seem to break much ?15:58
jrosserdmsimard: seems to be working out ok15:58
dmsimardthat's neat considering the amount of changes under the hood15:58
jrosserwould be interested to see if the whitespace/padding can be tightened up with 1000's of tasks15:59
dmsimardin the ara reports you mean ?15:59
jrosseryeah15:59
dmsimardyeah, definitely15:59
jrosseri futzed around a bit in the browser developer tools - just confirmed that i actually don't know what i'm doing :)15:59
dmsimardit's probably some <tr> css somewhere ¯\_(ツ)_/¯16:00
noonedeadpunk#startmeeting openstack_ansible_meeting16:01
openstackMeeting started Tue Sep 29 16:01:03 2020 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.16:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.16:01
*** openstack changes topic to " (Meeting topic: openstack_ansible_meeting)"16:01
openstackThe meeting name has been set to 'openstack_ansible_meeting'16:01
fridtjof[m]jrosser: found the issue i think16:01
noonedeadpunk#topic office hours16:01
*** openstack changes topic to "office hours (Meeting topic: openstack_ansible_meeting)"16:01
noonedeadpunk\o/16:01
jrossero/ hello16:01
noonedeadpunkOk, so telemetry failure?16:02
noonedeadpunkI'd say let's maybe try to merge aodh at least?16:04
jrosserworst case we have to revert it16:05
jrosseri've not had opportunity to test that locally yet16:05
noonedeadpunkI think worst case we will have just another patch to fix it16:05
jrosserright, thats fine16:05
jrosserso thats this https://review.opendev.org/#/c/75479116:06
jrosserfollowed by https://review.opendev.org/#/c/754720/16:06
noonedeadpunkyep16:06
noonedeadpunkok, then next thing is galera....16:07
noonedeadpunkI tried to look into it and it fails in so many different ways....16:07
noonedeadpunkWhen I deployed it locally it was passing 3 or 4 times in a row when I decided that it's ok16:08
noonedeadpunkin some cases there was smth weird with container, as service start was just hanging...16:09
noonedeadpunkso at the moment, we have 2 scenarios16:09
*** nurdie has joined #openstack-ansible16:10
noonedeadpunk1st is old one, when one of the containers doesn't see address of another partner. and it's the issue of this specific member, and it goes back to ok state in case of restart16:10
noonedeadpunkwhile cluster is synced in this state16:11
noonedeadpunk2nd case when one of the containers is really down and didn't get up. In this case we should restart not containers which don't see neighbor but down member...16:12
noonedeadpunkAnd I dunno how to make ogic to make it work16:12
noonedeadpunk*logic16:12
noonedeadpunkFrom other side, we can add serial and probably forget about the issue at once16:12
*** nurdie_ has quit IRC16:13
jrosserfor the first case do you think that the container networking is completely broken16:13
noonedeadpunkno, for the first 3 members are up and synced, but one of them show only 2 addresses in wsrep_incoming_addresses16:15
noonedeadpunkwhich doesn't affect anything functionally, except it's weird and our tests fail16:15
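
The first failure mode (all members synced, but one node listing fewer peers in `wsrep_incoming_addresses`) could be detected with a check along these lines. This is only a sketch of the idea, not the galera role's test: the function names and the example addresses are hypothetical, and it assumes the status value is the usual comma-separated list of `host:port` entries.

```python
def parse_incoming_addresses(value: str) -> list[str]:
    """Split a wsrep_incoming_addresses value into individual addresses."""
    return [addr.strip() for addr in value.split(",") if addr.strip()]


def missing_members(value: str, expected: list[str]) -> list[str]:
    """Return the expected members that this node does not list as peers."""
    seen = set(parse_incoming_addresses(value))
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    # Hypothetical addresses for a three-node cluster.
    expected = ["10.1.0.1:3306", "10.1.0.2:3306", "10.1.0.3:3306"]
    reported = "10.1.0.1:3306,10.1.0.3:3306"
    print(missing_members(reported, expected))
```

A check like this only tells which node has the stale view. Deciding whether to restart that node or a member that is actually down is the second case described above and would need the cluster size or node state as well.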
jrosserthe functional test is kind of tech debt somehow16:17
jrosserwe could have an integrated test with affinity=3 on the container16:17
jrosserthen expand the galera role to have cluster status checks16:17
noonedeadpunkhave no idea how to do the last part16:17
noonedeadpunkor just do cluster checks by default?16:19
noonedeadpunkor with some var passed?16:19
noonedeadpunkhm, yeah, might be16:19
jrossersee affinity on here https://github.com/openstack/openstack-ansible/blob/master/doc/source/admin/maintenance-tasks/containers.rst16:22
jrosseri never used this though.... maybe works!?16:22
noonedeadpunkyeah. I was not about affinity, but about how to extend role with tests)16:22
noonedeadpunkme too lol16:22
jrosseryes so it would be optional sanity checks i guess, you don't want that interfering when trying to rescue a broken galera cluster16:23
jrosserand some flag to make everything stop after setup-openstack16:23
jrosser*setup-infrastructure16:23
openstackgerritJames Gibson proposed openstack/openstack-ansible-ops master: Change ansible tests to prefer Python3 over Python2 in vitualenv  https://review.opendev.org/75177316:24
noonedeadpunkhm, yeah, makes sense16:24
jrossernoonedeadpunk: i have to head out for a bit but there is still a lot to go over for V release16:25
jrosseri sort of took over the PTG etherpad to track all these patches16:25
jrosserwe need the linters fixed for at least openstack-ansible-tests to land the 2.10.1 patch there16:25
noonedeadpunkAnd I think it's about time to freeze master bumps?16:26
noonedeadpunkor at least switch master to victoria...16:26
noonedeadpunkso we don't start fighting with W issues16:27
*** djhankb has quit IRC16:33
openstackgerritMerged openstack/openstack-ansible-os_aodh master: Remove CI jobs to allow db setup patch to merge  https://review.opendev.org/75479116:33
openstackgerritMerged openstack/openstack-ansible-os_aodh master: Use the utility host for db setup tasks  https://review.opendev.org/75472016:33
*** djhankb has joined #openstack-ansible16:33
jrosseryes though we also start fighting requirements changes too as they’re based off the branch name16:36
*** d34dh0r53 has quit IRC16:36
jrossernoonedeadpunk: maybe an extra keyword on the scenario “infra” we could just run the first part of the deploy16:37
*** d34dh0r53 has joined #openstack-ansible16:38
noonedeadpunkjrosser: we can actually just break here in case of some scenarios https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/gate-check-commit.sh#L18816:41
jrosserright - perhaps a small step to getting rid of the functional tests16:42
noonedeadpunkyeah, I think I will try doing that tomorrow instead of trying to revive functional tests as is16:44
jrosserbetter value time I think16:44
noonedeadpunkyeah16:45
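The scenario keyword idea could be sketched roughly as follows. This is only an illustration in Python, not the actual gate-check-commit.sh logic: the function name `playbooks_for_scenario` is hypothetical, the "infra" keyword is the one proposed above, and the assumption is that scenario names are underscore-separated keywords such as "aio_lxc".

```python
# The three top-level deployment stages, in the order they are run.
PLAYBOOKS = ["setup-hosts.yml", "setup-infrastructure.yml", "setup-openstack.yml"]


def playbooks_for_scenario(scenario):
    """Return the playbooks to run for a scenario string.

    Assumes the scenario is a set of underscore-separated keywords.
    If the hypothetical "infra" keyword is present, the deploy stops
    after setup-infrastructure and setup-openstack is skipped.
    """
    keywords = scenario.split("_")
    if "infra" in keywords:
        return PLAYBOOKS[:2]
    return list(PLAYBOOKS)
```

For example, `playbooks_for_scenario("aio_lxc_infra")` would stop after setup-infrastructure.yml, while `playbooks_for_scenario("aio_lxc")` would run all three stages. An infra-only run like this is what would let galera cluster checks replace the role's functional test.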
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-tests master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75498216:47
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/75497816:48
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/ansible-role-systemd_mount master: Install required packages for NFS/CephFS mounts  https://review.opendev.org/75497816:49
*** olivierbourdon38 has quit IRC16:51
*** olivierbourdon38 has joined #openstack-ansible16:52
* jrosser back16:57
noonedeadpunkbtw, mensis has completed fixing the monasca role at least for train16:58
jrosserthats good - so long as we can keep on top of it16:59
jrosserlike senlin CI already seems completely broken :(16:59
noonedeadpunkoh damn17:00
* jrosser wish we had a better dashboard for periodic jobs17:00
jrosserit's kind of easy if you're a one-repo project to look in zuul state17:00
jrosserbut with so many it's just really hard17:01
noonedeadpunkyeah it is...17:01
noonedeadpunkbut what I was going to say about monasca - we have retired roles17:01
noonedeadpunkand I was thinking about reviving it17:01
noonedeadpunkthe thing was, that monasca had 2 repos - for service and agent17:01
noonedeadpunkand I was thinking if it's worth merging them now17:02
noonedeadpunklike we did for galera17:02
jrosserthat makes sense, it's not unlike neutron or nova really17:02
noonedeadpunkpoint in separation might be, that agent installation can be provided to customers who know nothing about osa17:02
jrosserwhat does it create?17:03
noonedeadpunkI think it grabs data from vms?17:03
noonedeadpunklike prometheus exporter or smth...17:03
mensisit's for grabbing metrics, and it has several plugins, including ones that gather metrics from vms17:04
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75506517:05
noonedeadpunkbut the thing is, that monasca can be left without PTL and not sure about project future because of that...17:06
noonedeadpunk#endmeeting17:12
*** openstack changes topic to "Launchpad: https://launchpad.net/openstack-ansible || Weekly Meetings: https://wiki.openstack.org/wiki/Meetings/openstack-ansible || Review Dashboard: https://bit.ly/2SAcGAn"17:12
openstackMeeting ended Tue Sep 29 17:12:48 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)17:12
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_ansible_meeting/2020/openstack_ansible_meeting.2020-09-29-16.01.html17:12
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_ansible_meeting/2020/openstack_ansible_meeting.2020-09-29-16.01.txt17:12
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_ansible_meeting/2020/openstack_ansible_meeting.2020-09-29-16.01.log.html17:12
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_keystone master: Fix keystone nginx behaviour  https://review.opendev.org/75438217:16
*** ianychoi_ has joined #openstack-ansible17:17
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_keystone master: Fix keystone nginx behaviour  https://review.opendev.org/75438217:18
*** cshen has quit IRC17:19
*** ianychoi has quit IRC17:20
*** andrewbonney has quit IRC17:32
*** maharg101 has quit IRC17:34
*** cyberpear has quit IRC17:36
*** PrinzElvis has quit IRC17:37
*** sri_ has quit IRC17:37
*** suryasingh has quit IRC17:37
*** mnaser has quit IRC17:37
*** fyx has quit IRC17:37
*** viks____ has quit IRC17:37
*** gixx has quit IRC17:37
*** gundalow has quit IRC17:37
*** jrosser has quit IRC17:38
*** mwhahaha has quit IRC17:38
*** mubix has quit IRC17:38
*** nicolasbock has quit IRC17:38
*** johnsom has quit IRC17:38
*** alanmeadows has quit IRC17:38
*** jungleboyj has quit IRC17:38
*** guilhermesp has quit IRC17:38
*** CeeMac has quit IRC17:38
*** Open10K8S has quit IRC17:38
*** gouthamr has quit IRC17:39
*** johnsom has joined #openstack-ansible17:40
*** mubix has joined #openstack-ansible17:47
*** jungleboyj has joined #openstack-ansible17:48
*** guilhermesp has joined #openstack-ansible17:48
*** fyx has joined #openstack-ansible17:48
*** sri_ has joined #openstack-ansible17:49
*** mwhahaha has joined #openstack-ansible17:49
*** nicolasbock has joined #openstack-ansible17:49
*** cyberpear has joined #openstack-ansible17:49
*** CeeMac has joined #openstack-ansible17:50
*** Open10K8S has joined #openstack-ansible17:50
*** mnaser has joined #openstack-ansible17:55
*** gundalow has joined #openstack-ansible17:58
*** mensis has quit IRC17:58
*** suryasingh has joined #openstack-ansible17:58
*** alanmeadows has joined #openstack-ansible17:59
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Use nodepool epel mirror in CI for systemd-networkd package  https://review.opendev.org/75470618:00
*** gixx has joined #openstack-ansible18:00
*** PrinzElvis has joined #openstack-ansible18:00
*** jrosser has joined #openstack-ansible18:05
*** nurdie has quit IRC18:05
fridtjof[m]jrosser: i found the cause, but didnt want to interrupt the meeting18:08
fridtjof[m]Now i'm no longer on my desktop so i dont have the draft message i typed out18:08
fridtjof[m]But it's a commit between 20.1.5 and 20.1.6 affecting inventory/group_vars/haproxy/<something>.yml18:09
jrosserno worries - theres always tomorrow :)18:09
*** spatel has quit IRC18:14
*** maharg101 has joined #openstack-ansible18:14
fridtjof[m]Found it: https://opendev.org/openstack/openstack-ansible/commit/095bc436b7237ff3aa03d38d552a1e8a6e4859a718:15
*** nurdie has joined #openstack-ansible18:19
*** maharg101 has quit IRC18:24
*** gouthamr__ has joined #openstack-ansible18:26
*** olivierbourdon38 has quit IRC18:36
*** olivierbourdon38 has joined #openstack-ansible18:38
masterpejrosser: about the systemd init_config_overrides and LimitNOFILE: we currently have about 80 compute nodes and somehow we are hitting the limits. Cinder-volume is giving the "too many open files" error. We use Ceph as backend. I'm not sure why 4096, which is the default, is not enough.19:00
*** spatel has joined #openstack-ansible19:03
jrossermasterpe: ‘lsof’ might help see what it is19:14
jrosserwe can certainly increase the default though if it’s too small19:14
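Alongside `lsof`, a rough way to see how close a service is to its LimitNOFILE is to compare the number of entries in /proc/<pid>/fd with the "Max open files" line in /proc/<pid>/limits. The sketch below assumes a Linux host and enough privileges to read another process's /proc entries (typically root); the helper names `parse_max_open_files` and `fd_usage` are made up for illustration.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits.

    A value of None means the limit is reported as "unlimited".
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a running process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_max_open_files(handle.read())
    return open_fds, soft
```

Calling `fd_usage(pid)` with the PID of the cinder-volume process would show whether the count is really approaching 4096. If it is, `lsof -p <pid>` can then show what kind of descriptors they are, for example whether they are mostly sockets.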
*** cshen has joined #openstack-ansible19:15
*** cshen has quit IRC19:20
*** gouthamr__ is now known as gouthamr19:39
*** tosky has quit IRC19:51
openstackgerritJonathan Rosser proposed openstack/openstack-ansible stable/train: Revert "Remove nova_api_placement from inventory"  https://review.opendev.org/75511720:17
*** maharg101 has joined #openstack-ansible20:21
openstackgerritJonathan Rosser proposed openstack/openstack-ansible master: Update ansible-lint==4.3.5, flake8==3.8.3, bashate==2.0.0  https://review.opendev.org/75506520:22
*** maharg101 has quit IRC20:26
*** BlackFX has quit IRC20:42
*** spatel has quit IRC20:52
spotzjrosser: I've got booth duty tonight, poke me with any reviews you need to get through20:53
*** theintern has joined #openstack-ansible21:14
*** theintern has quit IRC21:14
*** cshen has joined #openstack-ansible21:15
*** cshen has quit IRC21:20
*** jbadiapa has quit IRC21:31
*** gundalow has quit IRC21:52
*** johnsom has quit IRC21:52
*** jungleboyj has quit IRC21:52
*** alanmeadows has quit IRC21:52
*** Open10K8S has quit IRC21:52
*** CeeMac has quit IRC21:52
*** fyx has quit IRC21:52
*** sri_ has quit IRC21:52
*** guilhermesp has quit IRC21:53
*** rpittau|afk has quit IRC21:53
*** johnsom has joined #openstack-ansible21:54
*** cyberpear has quit IRC21:54
*** PrinzElvis has quit IRC21:54
*** suryasingh has quit IRC21:54
*** alanmeadows has joined #openstack-ansible21:54
*** gixx has quit IRC21:54
*** gundalow has joined #openstack-ansible21:55
*** suryasingh has joined #openstack-ansible21:56
*** sri_ has joined #openstack-ansible21:56
*** jungleboyj has joined #openstack-ansible21:56
*** Open10K8S has joined #openstack-ansible21:56
*** fyx has joined #openstack-ansible21:56
*** rpittau|afk has joined #openstack-ansible21:57
*** PrinzElvis has joined #openstack-ansible21:57
*** cyberpear has joined #openstack-ansible21:57
*** guilhermesp has joined #openstack-ansible21:57
*** gixx has joined #openstack-ansible21:57
*** CeeMac has joined #openstack-ansible21:57
fridtjof[m]jrosser: thanks for the proposal!22:16
fridtjof[m]I found another issue!22:16
fridtjof[m]when creating an instance, the linuxbridge agent on the corresponding compute host is stuck with this error:22:16
fridtjof[m]2020-09-29 22:13:44.231 1781 ERROR neutron.plugins.ml2.drivers.agent._common_agent [req-... - - - - -] Error in agent loop. Devices info: {'current': {'tap8e53ab18-3d', 'tape3d6613a-44', 'tapb1d81907-c2', 'tapd63475d5-04'}, 'timestamps': {'tap8e53ab18-3d': 43, 'tape3d6613a-44': 42, 'tapb1d81907-c2': 46, 'tapd63475d5-04': 44}, 'added': {'tap8e53ab18-3d', 'tape3d6613a-44', 'tapb1d81907-c2', 'tapd63475d5-04'},22:16
fridtjof[m]'removed': set(), 'updated': set()}: pyroute2.netlink.exceptions.NetlinkError: (13, 'Permission denied')22:16
fridtjof[m]from what i can see, the agent is running as user 'neutron', but it does all that through rootwrap?22:20
*** maharg101 has joined #openstack-ansible22:22
*** MickyMan77 has joined #openstack-ansible22:23
*** maharg101 has quit IRC22:29
*** MickyMan77 has quit IRC22:31
*** nurdie has quit IRC22:34
fridtjof[m]seems like my placement service's database is kind of broken :/22:42
fridtjof[m]http://paste.openstack.org/show/798548/22:45
fridtjof[m]shouldn't these two tables match?22:45
fridtjof[m]because nova-compute (on compute2) is now regularly giving me this: http://paste.openstack.org/show/798549/22:47
*** nurdie has joined #openstack-ansible22:50
*** nurdie has quit IRC22:55
*** ianychoi_ is now known as ianychoi23:00
*** klamath_atx has joined #openstack-ansible23:14
*** cshen has joined #openstack-ansible23:16
*** cshen has quit IRC23:20
*** nurdie has joined #openstack-ansible23:26
*** nurdie has quit IRC23:31
