Friday, 2020-11-20

*** cshen has joined #openstack-ansible00:00
*** cshen has quit IRC00:04
*** luksky has quit IRC00:07
*** macz_ has quit IRC00:12
*** tosky has quit IRC00:18
*** gyee has quit IRC01:01
*** cshen has joined #openstack-ansible02:00
*** spatel has joined #openstack-ansible02:04
*** spatel has quit IRC02:04
*** cshen has quit IRC02:05
*** jhesketh has quit IRC02:27
*** jhesketh has joined #openstack-ansible02:33
*** spatel has joined #openstack-ansible03:08
*** macz_ has joined #openstack-ansible03:26
*** macz_ has quit IRC03:31
*** d34dh0r53 has quit IRC03:48
*** cshen has joined #openstack-ansible04:00
*** cshen has quit IRC04:05
*** spatel has quit IRC05:31
*** evrardjp has joined #openstack-ansible05:33
*** rh-jlabarre has quit IRC05:40
*** alvinstarr has quit IRC05:42
*** cshen has joined #openstack-ansible06:01
*** cshen has quit IRC06:05
*** yasemind34 has joined #openstack-ansible06:23
*** cshen has joined #openstack-ansible06:25
*** cshen has quit IRC06:30
*** rpittau|afk is now known as rpittau06:52
*** pto has joined #openstack-ansible06:53
*** miloa has joined #openstack-ansible07:06
openstackgerritSiavash Sardari proposed openstack/openstack-ansible-openstack_openrc stable/ussuri: Adding support of system scoped openrc and clouds.yaml  https://review.opendev.org/76350807:08
*** pto has quit IRC07:24
*** pto_ has joined #openstack-ansible07:24
*** pto has joined #openstack-ansible07:26
*** pto_ has quit IRC07:29
*** luksky has joined #openstack-ansible07:56
*** pcaruana has joined #openstack-ansible08:03
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_ceilometer master: Unify deployment of ceilometer files  https://review.opendev.org/76218308:06
*** cshen has joined #openstack-ansible08:16
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Add centos-8 support for ovs-dpdk  https://review.opendev.org/76272908:26
*** andrewbonney has joined #openstack-ansible08:28
*** cshen has quit IRC08:31
*** fanfi has quit IRC08:56
*** pto has quit IRC09:00
*** pto_ has joined #openstack-ansible09:01
*** pto has joined #openstack-ansible09:09
*** pto_ has quit IRC09:11
*** cshen has joined #openstack-ansible09:12
noonedeadpunkgixx: oh, also I dunno if you're using lbaas, but you need to do migration to octavia if you do. Also octavia upgrade from Q to U might not work well, at least they claim to support upgrade only for sequential releases https://docs.openstack.org/octavia/latest/admin/guides/upgrade.html09:36
admin0morning noonedeadpunk .. do you have a good howto to get octavia going with osa?09:49
noonedeadpunkadmin0: only rackspace one and spatel blogpost09:50
noonedeadpunkhttps://developer.rackspace.com/docs/private-cloud/rpc/master/rpc-octavia-internal/octavia-install-guide/  https://satishdotpatel.github.io//openstack-ansible-octavia/09:50
admin0just curious, do they also have a kubernetes/magnum guide :) ?09:52
admin0this guide is a good start09:52
noonedeadpunkhaha09:52
noonedeadpunkeventually the main problem with magnum is magnum itself09:52
noonedeadpunkas to get it working you need to choose a magnum version that works properly first09:53
noonedeadpunkBut iirc U magnum was pretty good in general09:53
admin0we have 21.1.0 and 21.2.0 -- is magnum in good shape in these tags?09:57
noonedeadpunkshould be I guess09:57
*** tosky has joined #openstack-ansible09:57
noonedeadpunkbut you need coreos image here for sure09:58
admin0right now, i am doing  16->18 migration for one cluster10:05
admin0slow process :)10:05
admin0thanks everyone for these notes: https://etherpad.opendev.org/p/osa-rocky-bionic-upgrade10:06
noonedeadpunkthere's also https://docs.openstack.org/openstack-ansible/rocky/admin/upgrades/distribution-upgrades.html in case you haven't come around it10:07
noonedeadpunkthanks to ebbex for it:)10:07
*** openstackgerrit has quit IRC10:25
*** spatel has joined #openstack-ansible10:34
*** spatel has quit IRC10:39
*** SecOpsNinja has joined #openstack-ansible11:05
*** jbadiapa has joined #openstack-ansible11:29
admin0in one osa cluster where cinder, glance and nova use ceph, i added a 2nd ceph just for cinder .. so i can create volumes in both .. but when i try to mount them, the volumes from the 2nd ceph cannot be mounted .. i get libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': error connecting: Permission denied11:39
admin0the hypervisor has no idea/keys of the 2nd ceph11:39
admin0if anyone knows or can point me to the right direction, it would help11:40
SecOpsNinjahi everyone. in this project, shouldn't the rsyslog containers receive all the logs of the lxc containers and physical services? the strange part is, following this https://docs.openstack.org/openstack-ansible-rsyslog_server/latest/ops-logging.html#finding-logs, the directory /var/log/log-storage doesn't exist. i'm trying to troubleshoot a strange error where newly created VMs stay11:51
SecOpsNinja stuck forever in "Scheduling", but the only error i could find was from nova-api-wsgi, which returns ERROR oslo.messaging._drivers.impl_rabbit [req-68a168f1-4e5a-49c1-95c6-6ba6d8c69377 - - - - -] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH. is there any way to track this req-68a1... and see where it is failing?11:51
SecOpsNinjaeven http://paste.openstack.org/show/800250/ doesnt say much...11:54
*** mgariepy has quit IRC11:59
jrosseradmin0: why two cephs? you can put the cinder pool on specific/different devices in a single ceph if you want to, it may not be necessary to have two clusters12:08
admin0some biz requirement :)12:08
admin0i just deliver :)12:09
jrosseralso when you boot from volume theres a no-op snapshot taken from the glance pool to cinder12:09
jrosseryou will miss out on that12:09
admin0SecOpsNinja, the rsyslog container is not used anymore if i recall .. it's all journald .. you can use something like graylog to centralize logging12:10
jrosseradmin0: https://medium.com/walmartglobaltech/deploying-cinder-with-multiple-ceph-cluster-backends-2cd90d64b1012:11
SecOpsNinjaadmin0, that was what i was thinking regarding systemd journaling and rsyslog... so the rsyslog container is useless12:11
admin0jrosser, thanks .. i think i missed the Create a new nova-secret.xml file  part .. and now thinking how to do this with osa12:12
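For reference, the missing step is defining a libvirt secret for the second cluster's cinder user on every hypervisor. A minimal sketch, assuming a hypothetical second cluster named ceph2 with a client.cinder2 keyring; the UUID is a placeholder and must match the rbd_secret_uuid configured for that cinder backend:

    # describe the secret (UUID and user name are placeholders)
    cat > /tmp/ceph2-secret.xml <<'EOF'
    <secret ephemeral='no' private='no'>
      <uuid>b5a52a29-0000-0000-0000-000000000000</uuid>
      <usage type='ceph'>
        <name>client.cinder2 secret</name>
      </usage>
    </secret>
    EOF
    # register it with libvirt, then load the key from the second cluster
    virsh secret-define --file /tmp/ceph2-secret.xml
    virsh secret-set-value --secret b5a52a29-0000-0000-0000-000000000000 \
      --base64 "$(ceph --cluster ceph2 auth get-key client.cinder2)"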
*** openstackgerrit has joined #openstack-ansible12:35
openstackgerritMerged openstack/openstack-ansible-os_neutron master: Test OVS/OVN deployments on CentOS 8  https://review.opendev.org/76266112:35
*** macz_ has joined #openstack-ansible12:39
admin0moving from 16 -> 18, i nuke a controller and rerun the playbooks with limit .. all containers are recreated and work OK, except utility .. i get this error in utility -- https://gist.githubusercontent.com/a1git/be5353eb91260945d8b00bcd21df7b68/raw/cc330d8f96267ea79a5057e5a50b7984bc72bf46/gistfile1.txt .. looks like it still tries to setup a 16.04 version one12:41
*** macz_ has quit IRC12:44
admin0SecOpsNinja, something like this in variables will do https://gist.github.com/a1git/ae3934799479e18f3e9553f7bbe7c25a12:53
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Return calico to voting  https://review.opendev.org/70265712:57
openstackgerritDmitriy Rabotyagov (noonedeadpunk) proposed openstack/openstack-ansible-os_neutron master: Return calico to voting  https://review.opendev.org/70265712:58
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly two hours from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html13:01
*** spatel has joined #openstack-ansible13:11
openstackgerritMerged openstack/openstack-ansible master: Pin SHA of murano-dashboard so it is controlled by OSA releases  https://review.opendev.org/76300213:15
*** spatel has quit IRC13:16
*** rh-jlabarre has joined #openstack-ansible13:21
SecOpsNinjathanks admin0 for the graylog config13:23
SecOpsNinjaatm im trying to find the cause of the errors in iscsi in my lvm backends13:24
*** mgariepy has joined #openstack-ansible13:33
openstackgerritMerged openstack/openstack-ansible master: Bump calico version  https://review.opendev.org/76298513:35
openstackgerritAndrew Bonney proposed openstack/openstack-ansible master: Bump API microversion required for Zun AIO  https://review.opendev.org/76356213:41
admin0SecOpsNinja, for iscsi you can use iscsi commands to check if the server is responding .. or if you can manually mount13:45
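For reference, typical checks from the compute node with iscsiadm; the portal IP is a placeholder:

    # is the cinder-volume host still exporting targets?
    iscsiadm -m discovery -t sendtargets -p 172.29.244.20:3260
    # which iSCSI sessions does this node currently have open?
    iscsiadm -m session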
openstackgerritAndrew Bonney proposed openstack/openstack-ansible-os_zun master: DNM: Update zun role to match current requirements  https://review.opendev.org/76314113:48
*** d34dh0r53 has joined #openstack-ansible13:54
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly one hour from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html14:00
*** macz_ has joined #openstack-ansible14:00
*** pto has quit IRC14:01
*** pto_ has joined #openstack-ansible14:01
openstackgerritJames Denton proposed openstack/openstack-ansible-os_neutron master: Add centos-8 support for ovs-dpdk  https://review.opendev.org/76272914:02
*** spatel has joined #openstack-ansible14:03
*** macz_ has quit IRC14:05
*** pto_ has quit IRC14:10
spatelquestion folks, I have deployed my 3 node controllers and realized that my VxLAN subnet range is wrong, so i want to change that subnet range. How difficult is it going to be to change? This is what i am thinking:14:11
spatel1. change range in openstack_user_config.yml14:11
spatel2. change IP on each controller node for VxLAN bridge14:11
spatel3. run neutron playbook14:12
spatelwhat do you think?14:12
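For reference, the block in question is the vxlan entry under provider_networks in openstack_user_config.yml. A sketch with placeholder values, assuming a linuxbridge deployment; the matching tunnel subnet lives under cidr_networks in the same file, so both spots carry the new range:

    cidr_networks:
      tunnel: 172.29.240.0/22

    global_overrides:
      provider_networks:
        - network:
            container_bridge: "br-vxlan"
            container_type: "veth"
            container_interface: "eth10"
            ip_from_q: "tunnel"
            type: "vxlan"
            range: "1:1000"
            net_name: "vxlan"
            group_binds:
              - neutron_linuxbridge_agent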
admin0and delete containers that might use the br-vxlan range14:15
spatelI don't think any container use br-vxlan14:18
spatelonly neutron-server runs inside a container and it doesn't care about br-vxlan14:19
spatelcorrect me if i am missing something14:19
admin0right .. in some very old deployments, i recall seeing neutron-agents in containers14:20
admin0was just mentioning as a point to check14:21
spateladmin0: sure good to know14:21
SecOpsNinjawhat is the best way to manually remove a vm when nova force-delete f61e7054-2943-4302-9e1f-4883c183f090 doesn't work? i have already detached the nonexistent previously attached volume14:23
admin0set its state to available and try to delete again14:23
admin0if it fails, marked it as deleted from the database14:23
admin0so that it does not appear in the UI /list14:24
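For an instance rather than a volume, the reset-then-delete sequence looks roughly like this; the direct DB update is a last resort, skips related records (block device mappings and so on), and should be treated as a sketch only:

    openstack server set --state active f61e7054-2943-4302-9e1f-4883c183f090
    openstack server delete f61e7054-2943-4302-9e1f-4883c183f090
    # last resort: mark the instance deleted directly in the nova database
    mysql nova -e "UPDATE instances SET deleted = id, vm_state = 'deleted',
      deleted_at = NOW() WHERE uuid = 'f61e7054-2943-4302-9e1f-4883c183f090';"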
SecOpsNinjai don't know how, but 3 vms got into an unstable state and despite being able to delete their volumes through cinder, nova didn't update the state. i'm trying to remove them from the kvm compute node and trying to remove them all from openstack, but they keep ending up in an error state14:26
spatelSecOpsNinja: sometimes when rabbitMQ is not happy i have noticed that kind of behavior, vms get stuck in a bad state and never recover.14:30
SecOpsNinjaspatel, yep i think that is what's causing the instability on my cluster, because i can't create new vms (they stay stuck in creating forever and i can only start a few). so i'm trying to remove the ghost vms and disks and see if that resolves this instability14:32
SecOpsNinjabecause through rabbitmq i wasn't able to trace the request id to see what was causing the OSError: [Errno 113] EHOSTUNREACH14:32
openstackgerritMerged openstack/openstack-ansible-os_tempest master: Add tempest cleanup support  https://review.opendev.org/76240514:51
*** miloa has quit IRC14:53
*** rpittau is now known as rpittau|afk14:53
ThiagoCMCspatel, my Ceph cluster has low IOPS, I believe it's due to the gigabit home network... lol14:55
ThiagoCMCAre you aware of potential fine tuning here and there, let's say, PG numbers?14:55
spatelTotally possible. Ideal requirement is to use 10G14:56
spatelI mostly use this method - https://ceph.io/pgcalc/14:56
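The rule of thumb behind that calculator: PGs per pool is roughly (OSD count * 100) / replica count, rounded up to the nearest power of two. With example numbers, 12 OSDs and size 3:

    echo $(( 12 * 100 / 3 ))   # 400, so round up to 512 placement groups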
ThiagoCMCYeah... True. I do have 2.5Gbps NIC cards but I can't find a cheap 2.5Gbps switch on the market, with 16 ports, for example14:56
spatelThiagoCMC: there are plenty of cheap switches on the market14:57
ThiagoCMCNot with 16 * 2.5Gbps ports14:57
spatelhow many server you have?14:58
spatelupgrade them to 10G nic and get 10G switch14:58
ThiagoCMC814:58
ThiagoCMCI found a close one: https://www.anandtech.com/show/15916/at-last-a-25gbps-consumer-network-switch-qnap-releases-qsw11055t-5port-switch14:58
ThiagoCMCBut, only 5 ports lol14:58
admin0ebay :)14:58
ThiagoCMCSo sad   :-D14:58
spatelwhy are you only looking for 2.5G?14:59
ThiagoCMCBecause my NIC cars are all 2.5Gbps14:59
ThiagoCMCcards*14:59
spatelwhat is the model of server? HP/IBM/Dell?14:59
admin0for my demo lab, i have a single server with 256gb of ram and 4x nvme in raid0 .. and i use the same method as openstack .. cloud-init and a backing image to quickly spawn instances and delete them again .. with a script ..15:00
admin0so a total up and down takes less than a minute to get any OS  up and running15:00
ThiagoCMCIt isn't a server... haha - It's an Asus ROG motherboard top gaming PC with AMD 3950X and water cooling15:01
spateloh boy!15:01
spateli would like to see picture of it :)15:02
ThiagoCMCI can definitely share it! hahaha15:02
ThiagoCMCAt amazon: https://www.amazon.com/gp/product/B07SYW3RT2/ref=ppx_yo_dt_b_asin_title_o09_s00?ie=UTF8&psc=115:02
ThiagoCMCASUS ROG Crosshair VIII Hero X57015:02
ThiagoCMCIt's an awesome compute node!15:02
ThiagoCMCLOL15:03
-openstackstatus- NOTICE: The Gerrit service at review.opendev.org is offline for a weekend upgrade maintenance, updates will be provided once it's available again: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html15:03
admin0i have this same mb . no water cooling though15:03
ThiagoCMCNice  ^_^15:03
spatelone day i will get that machine.15:03
ThiagoCMC:-D15:03
spatelI have gaming PC but its ok15:03
admin0good setup to play cities skylines on the side and run ubuntu desktop inside vmware15:04
ThiagoCMCVMWare?! Ewww...  lol15:04
admin0vmware workstation15:04
ThiagoCMCI have Windows on QEMU accessing the NVIDIA via GPU Passthrough lol15:04
admin0for (gaming) reasons, cannot switch to linux completely, but cannot work on windows desktop either15:04
ThiagoCMCWindows bare-metal? Never.15:05
ThiagoCMC:-P15:05
admin0so material desktop + ubuntu inside  vmware workstation15:05
admin0allows me to be on linux and play games15:05
admin0laptops ( even my old mac) is linux15:05
ThiagoCMCCool!15:05
spatelI hate window (only reason i have that to just play PUBG/RainbowSix/valorant/ :)15:05
admin0i am addicted to cities skylines .. just a week back , reached 90% cpu on all 16 cores .. time to upgrade to 2415:06
ThiagoCMCDamn LOL15:06
admin0my laptop is now a dedicated aio box :)15:06
ThiagoCMCI hate Windows too... I tried Windows 10 Pro on those machines, man, it turns off randomly. I tried everything: no power saving, no screensaver, performance tuning. Nothing worked. Got Ubuntu on those babies, weeks of uptime.15:07
spatelNext year planning to buy new MacBook Pro with M1 chip :)15:08
kleiniThiagoCMC: do you have an extra SSD on Ceph OSD nodes for Ceph journal partition?15:08
*** macz_ has joined #openstack-ansible15:09
spatelIf you have SSD then i don't think you need dedicated journal15:09
kleiniwrites into Ceph return after they are written into the journal15:09
noonedeadpunkwell, if journal is on nvme in some raid 5 :p15:09
spatelIf you need more performance then you can put journal on NVMe and SSD for data15:09
kleiniokay, so the whole disc is an SSD and journal is colocated?15:09
admin0https://pasteboard.co/JBfx9LY.png  -- 1186 hours that could have been spent on openstack :D15:10
ThiagoCMCkleini, yes, SSDs for Ceph dbs15:10
kleiniokay, we have normal SATA SSDs for non-colocated journal and performance is really good15:10
ThiagoCMCadmin0, LOLOL15:11
ThiagoCMCkleini, what is your network speed?15:12
*** macz_ has quit IRC15:13
spatelThis is my ceph network diagram - https://ibb.co/fxstpx115:15
spateli had performance issue with EVO which i have replaced with Enterprise SSD (spent $8000)15:15
ThiagoCMCNice!15:17
kleini1G copper between compute nodes and Ceph OSDs15:19
ThiagoCMC:-O15:19
ThiagoCMCbonded?15:19
kleiniI have full 100MB/s write speed in VMs in OpenStack15:19
kleininothing bonded at all15:19
ThiagoCMCHmm15:20
kleiniplain RJ45 connection15:20
spatelThiagoCMC: here is some performance comparison EVO vs PM  - http://paste.openstack.org/show/800257/15:20
ThiagoCMCHave you tested it with `fio`, how much IOPS?15:20
ThiagoCMCspatel, thanks for sharing this!15:20
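A typical fio run for the random-write IOPS question above; the target path is a placeholder, and writing to a raw device is destructive:

    fio --name=randwrite-test --filename=/dev/vdb --direct=1 \
        --ioengine=libaio --rw=randwrite --bs=4k --iodepth=32 \
        --numjobs=4 --runtime=60 --time_based --group_reporting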
spatelnoonedeadpunk: jrosser curious why we still need this - http://paste.openstack.org/show/800258/ ?15:26
spatelbr-vxlan not attached to any container in current deployment15:26
noonedeadpunkit doesn't need to be a bridge, but you need to have an interface for vxlan operation15:26
noonedeadpunkat least for lxb15:26
spatelYes we need the br-vxlan bridge interface on controller/compute, but we don't need that block of code in openstack_user_config.yml right?15:27
ThiagoCMCTrue, but it doesn't have to be declared under "provider_networks:" at user config15:27
spatelit has no use anywhere15:27
noonedeadpunkit has15:27
ThiagoCMCI don't have the br-vxlan declared under my "provider_networks:"15:28
spatelwhere?15:28
noonedeadpunksince you get neutron ml2 config out of this15:28
ThiagoCMCThere is another way15:28
spatelnow the neutron agent runs outside the container, so that block of code does nothing.15:29
ThiagoCMCFor example: http://paste.openstack.org/show/800259/ - user vars15:29
noonedeadpunkwell yes, it's true15:29
noonedeadpunkspatel: it defines the defaults for the vars ThiagoCMC pasted15:30
ThiagoCMCI also have: http://paste.openstack.org/show/800260/15:30
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/library/provider_networks15:30
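For reference, the kind of user_variables.yml override being discussed, which is roughly what that library would otherwise derive from openstack_user_config.yml; a sketch with placeholder ranges and bridge names:

    neutron_provider_networks:
      network_types: "flat,vlan,vxlan"
      network_vxlan_ranges: "1:1000"
      network_flat_networks: "flat"
      network_vlan_ranges: "vlan:100:200"
      network_mappings: "flat:br-flat,vlan:br-vlan"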
spatelthanks noonedeadpunk15:33
spatelThiagoCMC: why are you using 224.0.0.0 range ?  i think you should use 239.0.0.0 to 239.255.255.25515:34
spatelas a best practice (it doesn't matter but sometime it can create issue)15:34
spatelhttps://en.wikipedia.org/wiki/Multicast_address15:35
noonedeadpunklike for metal deployments you don't need provider_networks at all, but again - all depends.15:35
noonedeadpunkI prefer having it defined and not bothering myself with specific overrides when the deployment allows it15:36
spatelnoonedeadpunk: totally understand. just trying to understand how these lego pieces are connected, so if i pull one out.. nothing should break15:36
noonedeadpunkwell yes, if you're covered with specific definitions it can be dropped15:36
ThiagoCMCspatel, it was jamesdenton a long time ago that configured this for me, I was facing network problems and that fixed it!15:37
jamesdentonO_O15:37
ThiagoCMClol15:37
noonedeadpunkxD15:37
spatelrelated to the multicast address or the vxlan bridge stuff?15:37
ThiagoCMCI don't remember exactly what was happening with the vxlan connectivity...15:38
spateljamesdenton: what i need to do if i want to deploy dedicated l3-agent metal node in OSA?15:40
spatelLets say i deploy 5 dedicated node for traffic load-balancing, how does that work with l3-agent?15:41
jamesdenton5 network nodes?15:41
spatelyes15:42
spatelnot neutron-server15:42
jamesdentonright, ok15:42
spateljust for SNAT and all high traffic volume router15:42
jamesdentonyou would get l3 agent on each of them, but any router would only be scheduled/active to one at any given time.15:42
jamesdentonso make the router HA and let vrrp do its thing15:42
jamesdentonbut over time, the distribution may not be even15:43
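For reference, making a router HA from the CLI (admin only; an existing router has to be administratively down before the flag can be flipped); the router name is a placeholder:

    openstack router create --ha my-router
    # or convert an existing router:
    openstack router set --disable my-router
    openstack router set --ha my-router
    openstack router set --enable my-router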
spatelso its not like all 5 node will be active and working together15:43
jamesdentonno15:43
spatelhmm15:43
ThiagoCMCWhat about with OVN?15:44
*** watersj has joined #openstack-ansible15:44
spatelI have a vlan provider in my datacenter but i am thinking of running a k8s cluster and it doesn't work with the vlan provider (it needs tenant networks, so i have to deploy l3-agent for them)15:45
watersjfor those with limited nics doing bond, what mode are you using?15:45
watersjand or suggest?15:46
spatelwatersj: if your switch supports MLAG then active+active mode=415:46
spatellayer2+315:46
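A netplan sketch of that bond for an 18.04 host; interface names are placeholders and the switch pair must speak LACP/MLAG:

    bonds:
      bond0:
        interfaces: [eno1, eno2]
        parameters:
          mode: 802.3ad
          lacp-rate: fast
          transmit-hash-policy: layer2+3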
jamesdentonspatel OVS/DVR is better suited to distributing North-South traffic15:47
spateltruly speaking i never convinced myself to try DVR :(15:48
spateli don't know anyone out there use DVR on large cloud15:48
spatelI tried and end up doing troubleshooting day night, i might be new that time but it has lots of pieces to look after15:49
jamesdentonit works, but there are some incompatibilties (allowed address pairs being the main one, currently)15:49
spatelyou know what it eat up my lost of public IPs :(15:50
jamesdentonyes, it will do that15:50
spatelthen finally i decided to get out from that mess and use vlan provider15:50
spateli think i talked to you first time when i join this IRC 2 years ago :)15:51
spatelcloudnull & you :)15:51
admin0xenial -> bionic -- how to force a new container to be in ubuntu 18 and not 16 ?15:51
cloudnullwat!?15:51
spatelyou are here :)15:51
admin0he is always here :D15:51
admin0\o/15:51
spatelsilent majority.. haha15:52
admin0you just need to ping them :)15:52
jamesdentonGerrit is down, so cloudnull has time to play15:52
cloudnullunpossible15:52
cloudnullthere's no playing in cloud15:52
spatelyou helped me a lot to build my first cloud on pike and it's rock solid, it's been 2.5 years with a 99.999 SLA :)15:52
cloudnullhow has everyone been ?15:52
jamesdentonno complaints15:55
admin0spatel, what i do is also make compute nodes as network nodes15:55
spatelcloudnull: waiting for the COVID-19 patch release :)15:55
admin0that way, all traffic is distributed15:55
admin0and one node going down does not take away the whole15:56
spateladmin0: what is the advantage of doing that?15:56
admin0no centralized point of failures15:56
spatelthat is called DVR deployment right?15:56
admin0its not using DVR per se15:56
spatelI have 300 compute nodes in my cloud.. that would be very odd don't you think15:57
admin0my biggest is 1/2 of that, but its not that odd15:57
ThiagoCMCWith OVN, all compute nodes can easily also be network nodes15:57
ThiagoCMCIf I'm not wrong lol - it's an efficient topology15:58
admin0only when ovn is approved and battle tested .. right now, this works for both lb and ovs15:58
ThiagoCMCYep, I have the same topology that ovn provides but, with linux bridges15:58
spateladmin0: let me ask a stupid question: if i turn every compute node into a network node and create tenant-1, it will pick 1 HA pair to handle that tenant's traffic, right?16:00
admin0if HA is enabled, yes16:01
spatelif HA not enable then?16:01
admin0then the router will be in one of the compute node in the cluster16:01
admin0or you can disable l3 agents on some nodes and force some compute nodes to be more dedicated to network functions16:01
spatelso if i have many many tenants then my workload will be distributed, right? but if i have only 1 or 2 tenants then it doesn't make sense16:02
admin0right16:02
*** macz_ has joined #openstack-ansible16:02
spatelThanks for confirming that.16:02
admin0the one i managed was a big public cloud provider with over 15000 tenants16:02
admin0so a dedicated network node in such scale did not make sense16:02
admin0the more distributed, the better16:02
spatelhow many total compute node you have to handle 15000 tenants?16:02
admin0it grew up to 350 eventually16:03
admin0but i left and moved on, so no idea of current16:03
spatelwith 3 controller node?16:03
spatel15000 is pretty big size :)16:03
admin03 (main) controller nodes, but additional 3 to spawn up more network server instances16:03
spatelI have 3 controller node with 326 compute nodes.16:04
spatelnow i am building a new cloud with 6 controller nodes to make it more resilient16:05
admin0for a public cloud provider, where you go to events and give out 15-day test accounts where people can come in and play, that was good planning for handling customers who try to create a lot of networks16:11
admin0there were 3 main controllers ( for haproxy ) and other services, but we had to add 3 extra nodes for neutron server16:11
admin0neutron server was a bottleneck16:11
SecOpsNinjaone quick question: i have my cinder storage using lvm, but from what i'm seeing some volumes aren't available in cinder storage, so the compute node is complaining about the target not existing. how can i remove this ip-*.*.*.*:3260-iscsi-iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2-lun-1 in /dev/disk/by-path?16:15
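Those /dev/disk/by-path entries go away once the stale session and node record are removed; roughly, with a placeholder portal IP:

    iscsiadm -m node -T iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2 \
      -p 172.29.244.20:3260 --logout
    iscsiadm -m node -T iqn.2010-10.org.openstack:volume-aca70400-6f16-4534-a5b0-596d0b1fa5a2 \
      -p 172.29.244.20:3260 -o delete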
*** gyee has joined #openstack-ansible16:32
ThiagoCMCHow to make use of the local storage of the compute nodes, when deploying OSA with Ceph? Right now, I see no option to launch an Instance outside of the RBD "vms" pool, even when selecting "Create Image: no". any idea?16:36
*** macz_ has quit IRC16:39
*** macz_ has joined #openstack-ansible16:40
*** pcaruana has quit IRC16:40
*** mgariepy has quit IRC16:41
*** watersj has quit IRC16:48
spatelDo you have following in nova.conf ?16:50
-spatel- images_rbd_pool = vms16:50
-spatel- images_type = rbd16:50
spatelif yes then nova default put your VM on Ceph16:51
spatelThiagoCMC: ^^16:51
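The full [libvirt] block usually looks something like this; rbd_user and the UUID vary per deployment and are placeholders here:

    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_secret_uuid = b5a52a29-0000-0000-0000-000000000000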
kukaczadmin0: in that public cloud setup you've mentioned, with compute node = network node, which backend was used? did I understand correctly that there was HA but no DVR?16:54
admin0linuxbridge16:55
admin0HA in routers was not used ..16:55
kukaczadmin0: (I'm looking for inspiration what's possible and proven in production. for long years we've been on Contrail/Tungsten and lost track of ML2/OVS/...)16:56
*** tosky has quit IRC16:56
admin0simple = easy to maintain, manage, grow16:58
kukaczadmin0: thanks. using linuxbridge, is it possible to flexibly implement multiple (tens of) tenant-bound provider networks?16:59
admin0based on how big a range you give for vxlan, it will work16:59
admin0but one thing i found was how vxlan is implemented in neutron17:00
admin0for example, in one setup i put 100000:10000000 .. something like that .. super long range17:00
admin0but i found that during upgrades, neutron took like 30 mins to come up17:00
admin0found that the vxlan table is pre-populated by neutron without any indexing17:00
admin0so if you give say 1 million vxlan range, you will have a neutron table with 1 million entries17:01
admin0even if you are only using 10 vxlan networks17:01
admin0and it makes neutron come up very very slow17:01
admin0had to manually readjust the range in the settings and truncate the records17:01
jamesdentonyikes! sounds bug worthy17:02
kukaczadmin0: cool :-) that's always a pain, to discover such hidden internals17:02
jamesdentonkukacz if i were deploying today, i would settle on ML2/OVS w/ HA routers (at a minimum) and wait for migration to OVN down the road17:02
kukaczadmin0: that was an OSA environment?17:03
jamesdentonbut depending on your use case, straight provider networks w/o tenant networks/routers may be better option17:03
admin0kukacz, for a public cloud, you can offer 2 choices .. floating ip and direct IP17:04
admin0direct IP means people get direct public IP from dhcp17:04
admin0no need to create routers or networks17:04
admin0and people are happy ( cpanel, directadmin, windows) for licensing purposes17:04
kukaczjamesdenton: thanks! we're a multitenant service provider cloud. need to deliver each customer their own routed network and bind it to a set of their projects, enable their subnet pools to be distributed via FIPs etc.17:04
admin0cons - if they delete the server, the IP is also gone17:04
admin0so in real usage, direct ip is used a lot more than floating ip .. because of the simplicity .. so no more 3 dhcp agents and 2 routers per network17:05
*** macz_ has quit IRC17:06
kukaczadmin0: direct IP is what we're using currently, though it's due to a Tungsten Fabric limitation. customers ask for FIPs, as those are usually part of shared Ansible playbooks they use for orchestration etc.17:07
admin0i meant to offer both .. like net-floating net-direct  etc names17:08
admin0so customers have a choice to have both17:09
ThiagoCMCspatel, that's exactly what I have at nova.conf!17:13
kukaczadmin0, jamesdenton: thanks for your inputs!17:13
jamesdentonany time17:13
spatelkukacz: my reason to use direct IP is performance. (NAT is a big bottleneck for network throughput)17:14
ThiagoCMCIPv6 rocks!  :-P17:14
ThiagoCMCspatel, so, how to put the VM's disks at the local storage at the compute node (instead of the default, ceph)?17:15
kukaczspatel: also using direct IPs from provider(external) networks?17:16
spatelThiagoCMC: i don't think there is a way to do that. you have only two choices: remove rbd from nova.conf and then nova defaults to local disk, or use a cinder boot volume where you can pick local or cinder to boot from17:17
admin0kukacz, https://www.openstackfaq.com/openstack-add-direct-attached-dhcp-ip/17:17
admin0this range can be a vlan range provided by the datacenter ..  all you need is a router .117:17
spatelkukacz: this is what i have - https://satishdotpatel.github.io//build-openstack-cloud-using-openstack-ansible/17:18
ThiagoCMCspatel, oh no! Thanks for the info, I'll research more into this.17:18
spatelRouter is my physical router for all my VLAN17:18
spatelkukacz: no NAT anywhere. high speed networking17:19
admin0ThiagoCMC, if you remove nova_libvirt_images_rbd_pool from user_variables and use host overrides to set it only on selected servers, you can have some servers using ceph for vms and the rest using local disk17:19
spatelwe have 100gbps traffic coming in/out which i don't think any server based network node can handle :)17:20
ThiagoCMCadmin0, interesting, thanks!17:22
ThiagoCMCI found people talking about this here: https://bugzilla.redhat.com/show_bug.cgi?id=130381417:22
openstackbugzilla.redhat.com bug 1303814 in rhosp-director "[RFE] ephemeral storage for multiple back end" [Medium,Closed: wontfix] - Assigned to rhos-maint17:22
admin0that too .. if you have big nodes and need multi-gb connectivity, ovs it is .. but if you have older, smaller nodes where the number of instances will not be a lot ( based on the flavors), lb17:22
admin0lb vs ovs has some calculations17:22
*** MickyMan77 has quit IRC17:23
kukaczspatel: thanks for sharing that nice guide17:24
*** klamath_atx has quit IRC17:24
kukaczadmin0: our computes are typically running 50-70 instances17:24
spatelkukacz: 70 vms, wow! that is a lot of VMs17:25
admin0:) just this morning i was trying osa + multiple ceph clusters .. i managed to get cinder to support the 2nd ceph .. so i can create volumes .. but nova does not yet support multi ceph ( it cannot even mount a volume from the 2nd ceph)17:25
admin0so i have a setup where users can create volumes on the 2nd ceph, but no way to mount them17:25
kukaczthe biggest challenge is handling (routing) the large number of customer networks as provider networks. at best (currently) we need them BGP routed from the datacenter network17:26
admin0ebgp with the dc ?17:26
kukaczadmin0: bgpvpnaas is what we're considering now17:27
admin0how do you add networks ? is it one network with multiple ranges, or do you add a new provider with every subnet ?17:27
*** mgariepy has joined #openstack-ansible17:30
spatelI am also interested to learn how people run BGP with openstack networking17:31
kukaczadmin0: with the current contrail setup, it's just not marked as a provider network. it's a common neutron network which is routed by contrail via mpls tunnels towards physical edge routers17:31
kukaczon edge router, there's a VRF per tenant. multiple subnets usually, pairing 1:1 with openstack networks or multiple subnets per 1 openstack network, both is possible17:33
kleiniThiagoCMC: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml.prod-ceph.example#L28 just omit this line in your user_variables.yml and your compute nodes will create ephemeral as local qcow2 images based on downloaded images from Glance/Ceph17:34
kleiniwe use that to have the fastest possible ephemeral storage. And if you want storage on Ceph, just create it as a volume. So you have choice, when spawning VM17:35
ThiagoCMCkleini, that's exactly what I want!!!17:35
ThiagoCMCI'll try it now.17:35
ThiagoCMCThank you!17:35
kleinimultiple local NVMe PCIe 3.0 x4 SSDs with ZFS on top and sync disabled, so every write returns once written into the ZFS ARC. sync to disks runs async17:37
ThiagoCMCSounds awesome! I also have NVMe SSDs in my compute nodes, doing nothing. I really want to use them as fast ephemeral storage, and then Ceph volumes for my lovely data.17:38
kleinispatel: I am running about 200 VMs on each compute node, with 128 cores and 1TB memory17:38
ThiagoCMCCool!17:38
spatelkleini: 128 cores?  (overcommit vCPU?)17:38
kleiniAMD Epycs 7702P with DDR4-3200 rock everything17:39
ThiagoCMCSo, after commenting out the line: "nova_libvirt_images_rbd_pool", which playbooks should I run, just "os-nova-install.yml" ?17:39
kleini1:4 vcpu overcommit17:39
ThiagoCMCOr better go with "setup-everything.yml"?17:39
kleinios-nova-install.yml should be enough17:40
admin0if everythign is already setup, just nova setup17:40
ThiagoCMCok17:40
spatelThiagoCMC: nova playbook17:40
kleiniqcow2 images are then stored in /var/lib/nova/instances17:40
ThiagoCMCJust like old days17:40
admin0i am stuck on 16.04 -> 18.04 .. my util container wants to be created with 16.04 and fails .. does anyone recall how they were able to fix it?17:40
*** jbadiapa has quit IRC17:41
kleinithe container itself?17:41
admin0yep17:47
admin0i nuked c1 (controller1), got it up on ubuntu 18.04 ( c2 and c3 still on ubuntu 16.04) .. removed the facts, reran setup-hosts -l c1 and then lxc-containers-create -l c1_*17:48
admin0kleini, this is the error i get: https://gist.github.com/a1git/be5353eb91260945d8b00bcd21df7b6817:48
admin0https://gist.githubusercontent.com/a1git/be5353eb91260945d8b00bcd21df7b68/raw/cc330d8f96267ea79a5057e5a50b7984bc72bf46/gistfile1.txt17:49
admin0nuked c2 sorry .. not c117:49
admin0is the util container used to do anything ?17:52
*** yasemind34 has quit IRC17:54
*** yann-kaelig has joined #openstack-ansible17:56
admin0maybe when all the controllers are on 18, this will fix itself18:01
ThiagoCMCBTW, just curious about another thing... How mature is the systemd-nspawn deployment?18:01
kleiniadmin0: I read a lot about magic that needs to be done on repo containers. The error message looks like something is missing on the repo containers18:05
kleiniadmin0: magic that needs to be done on repo containers when you upgrade xenial->bionic18:06
*** SecOpsNinja has left #openstack-ansible18:08
kleiniThiagoCMC: does not work any more. I think it was removed in the U release. I use systemd-nspawn a lot as it is very nicely integrated into the host's systemd, but for OSA it would need to be re-implemented. Especially the network connection of containers is just done by a list of network commands in a systemd unit. Instead it should be done today with systemd-networkd and systemd-nspawn18:09
jrosserThiagoCMC: the nspawn deployment is basically deprecated because there is no one to maintain it18:09
ThiagoCMCOh, I see... Ok then  =P18:10
*** tosky has joined #openstack-ansible18:10
ThiagoCMCLooking forward to try LXD instead!18:10
kleiniadmin0: do you follow this https://docs.openstack.org/openstack-ansible/rocky/admin/upgrades/distribution-upgrades.html guide? a lot of things in there regarding disabling repo containers and haproxy redirection18:10
kleiniThiagoCMC: I use Linux Bridge and LXC in control plane while using OVS and ML2 OVS on compute and network nodes18:11
ThiagoCMCCool, I did that in the past as well. But today I'm 100% on Linux Bridges18:12
jrosseradmin0: at 16->18 upgrade time you need to make sure that the repo server that the venvs get built on is the one which is running 18.0418:12
ThiagoCMCThing is, my Controllers are QEMU VMs, with OSA's Containers within them. I would like to make my Controllers LXD containers (instead of QEMU VMs), and make OSA's Containers nested.18:13
jrosseradmin0: if it turns out one of the old 16.04 repo servers is being used for venv building you can override the automatic selection with this variable https://github.com/openstack/ansible-role-python_venv_build/blob/master/defaults/main.yml#L12118:13
jrosserThiagoCMC: the 'surface area' which we can support is pretty much related to the number of contributors we have18:14
jrossercurrently whilst i have POC patches for LXD i have no requirement for that in $dayjob so other things will get my attention ahead of those18:15
ThiagoCMCSure, all OSA contributors are really awesome! I wanna try the LXD patches, can you send me the link again?18:16
jrosserand thats pretty much the story with nspawn, no-one who is actively contributing is using it18:16
ThiagoCMCI see, makes sense18:16
jrossergerrit is offline for upgrade so you can't see them right now18:17
ThiagoCMCOk18:17
jrosserbut i did start the new ansible roles https://github.com/jrosser/openstack-ansible-lxd_hosts18:17
jrosserhttps://github.com/jrosser/openstack-ansible-lxd_container_create18:18
jrosserthey are very rough, but i did get as far as setup_hosts completing18:18
jrosserthe way the containers are initialised is totally different, and would use cloud-init18:19
ThiagoCMCCool! Yep, I really love LXD18:20
ThiagoCMCI can see it improving OSA deployments18:20
jrosserand the addition of networks and mounts should be doable either via the lxd api or cli, so there is really quite some work to do to get equivalence with all the options possible right now with the lxc role18:20
ThiagoCMCSure18:20
jrosserelsewhere i use lxd profiles for this sort of thing18:21
ThiagoCMCMe too18:21
jrosserand that would be neat, managing profiles for particular containers18:21
ThiagoCMCMy Compute Nodes and Ceph OSDs are actually LXD containers (bare-metal though)18:21
jrosserbut thats a bit of a contradiction with "add mount X to container Y" which would maybe imply a 1:1 profiles:containers setup18:22
jrosserso i'm really not at all sure what the best way to duplicate what the lxc stuff does using the features in LXD18:22
ThiagoCMCHmm... Kinda weird18:22
ThiagoCMCMight be better to research more and do things the "LXD-way" only... Maybe even using LXD Cluster features somehow18:24
jrosseroh and also lxd<>snap makes me very nervous18:24
ThiagoCMCLOL18:24
kleinihave an nice weekend18:25
ThiagoCMCYou too!18:25
jrosserwe download specific snaps and use the --dangerous flag to manually install them18:25
*** andrewbonney has quit IRC18:25
jrosserthen there is no danger of asynchronous auto-upgrades18:25
ThiagoCMCjrosser, maybe one day, LXD without snap: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=76807318:26
openstackDebian bug 768073 in wnpp "ITP: lxd -- The Linux Container Daemon" [Wishlist,Open]18:26
ThiagoCMCSoon as Debian releases LXD without Snap, I'm moving our from Ubuntu.  heheeh18:27
ThiagoCMC*moving out18:28
jrosseryeah well that will be an interesting discussion at canonical about if they follow the upstream debian approach for LXD18:29
jrosseri expect it will be a quick "no"18:29
ThiagoCMCYep18:29
ThiagoCMCBut Snap is... Creepy.18:29
*** gouthamr_ has quit IRC18:30
jrosserright - we had an auto-update install the same LXD bug pretty much simultaneously on a bunch of stuff running H/A18:30
jrosserwhich broke it all18:30
ThiagoCMCLOL18:30
ThiagoCMCdamn18:30
jrosserindeed, thats when we switched to manual snap downloads and --dangerous to install them offline18:31
jrossersnapd doesnt know where they are from so can't update them18:31
ThiagoCMCRight!? You can turn off automatic downloads even on an iPhone...18:31
*** gouthamr_ has joined #openstack-ansible18:46
*** jamesdenton has quit IRC18:57
*** jamesden_ has joined #openstack-ansible18:57
*** tosky has quit IRC19:13
*** luksky has quit IRC19:50
*** luksky has joined #openstack-ansible19:50
*** yann-kaelig has quit IRC19:55
spatelfolks, what nova filters are you guys using in your cloud? do you pick selectively, or put every available filter in nova_scheduler_default_filters:?20:12
spatelany performance issue if i fill that list with all of the filters?20:12
*** klamath_atx has joined #openstack-ansible20:22
admin0its evaluated when a new instance is created20:27
admin0and is usually very fast20:27
admin0unless you spawn like 100+ instances simultaneously, should not be that much of an issue20:28
admin0but use only what you need20:28
spateladmin0: thanks20:30
admin0for example, unless you know exactly what you need, you don't need NUMATopologyFilter20:31
admin0ok all .. have a great weekend20:33
spatelthanks admin020:39
spateli do use NUMA so i need that filter, but i get what you're saying20:39
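For reference, a user_variables.yml sketch close to the os_nova default filter list of this era; exact defaults vary by release, so treat it as a starting point and trim to what you need:

    nova_scheduler_default_filters: "RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter"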
spateljrosser: do we have any documentation about ansible tags? I mean a list of tags we can use?20:42
spatelor just dig into roles and find out ?20:42
ThiagoCMCadmin0, enjoy your weekend too buddy!20:46
*** klamath_atx has quit IRC20:48
spatelwhere you folks located ?20:51
ThiagoCMCI'm in Canada, about 100km from Toronto20:54
spatelnice!20:56
spatelI am getting a very strange issue. i am trying to add a huge page compute node and define that in a flavor, but when i launch an instance i get the error No valid host found20:58
spateldo we need to do anything else?20:58
*** klamath_atx has joined #openstack-ansible21:08
*** klamath_atx has quit IRC21:14
*** alvinstarr has joined #openstack-ansible21:15
*** klamath_atx has joined #openstack-ansible21:16
*** klamath_atx has quit IRC21:20
*** klamath_atx has joined #openstack-ansible21:21
jamesden_spatel anything interesting in nova conductor log?21:25
spateljrosser: nova conductor just saying No available host21:26
spateljamesden_: ^^21:26
spatelI found issue21:26
spatelmy flavor had memory size not power of 221:26
spatel8193  instead of 819221:26
spatellooks like huge pages need a properly aligned size, but it's the first time i have seen this kind of issue21:27
*** klamath_atx has quit IRC21:27
*** jamesden_ is now known as jamesdenton21:29
jamesdentonnice find21:29
spateljamesdenton: one of the Red Hat KB articles says - Nova requires that the memory indicated on a HugePages-enabled flavor be a direct multiple of the actual HugePage size.21:32
jamesdentonwell, i think that makes sense. if your page size is 1G then it has to be some multiple of 1024. Same for 4k or 2M. never an odd number, i guess21:34
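So a working flavor looks roughly like this; the name is a placeholder, and 8192 MiB divides evenly by both 2M and 1G pages:

    openstack flavor create --vcpus 4 --ram 8192 --disk 40 \
      --property hw:mem_page_size=large hugepage.medium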
spatelthis is not documented anywhere... hmm21:41
*** klamath_atx has joined #openstack-ansible21:42
ThiagoCMCkleini, thank you so much for helping me to create Instances at the local storage while having the option to attach Ceph Volumes into them!! THANK YOU!!! THANK YOU!!!21:44
ThiagoCMCI'm getting there... lol21:47
ThiagoCMCI can't wait to update my cloud to Victoria!  :-D21:51
spatelThiagoCMC: what was the solution ?21:51
*** klamath_atx has quit IRC21:51
ThiagoCMCspatel, comment the following line: https://opendev.org/openstack/openstack-ansible/src/branch/master/etc/openstack_deploy/user_variables.yml.prod-ceph.example#L2821:52
spateloh, that is what i told you: remove it from nova.conf. but yes, you can do it via user_variables also21:52
ThiagoCMCNow my Instances run on local NVMe PCI storage (no RAID1 though), while I can attach Ceph volumes to them, pretty much like Amazon EC2.21:52
ThiagoCMC:-P21:53
ThiagoCMCI'm so happy!  LOL21:53
spatelIn my case i have created a per-compute-node file, so i can turn that on/off per node21:53
ThiagoCMCHmm... I see, like host aggregates?21:53
*** klamath_atx has joined #openstack-ansible21:54
spatel in /etc/openstack_deploy/host_vars/compute-1.yml21:54
spatel in /etc/openstack_deploy/host_vars/compute-2.yml21:54
ThiagoCMCNice21:54
ThiagoCMCGot it21:54
spatelin compute-1.yml i have added nova_libvirt_images_rbd_pool: vms21:54
ThiagoCMCthat's neat21:54
spateland created an aggregate filter, so if someone says they need shared storage for live migration their VM will end up on a ceph disk21:55
spatelbasic grouping...21:55
spatel user_variables.yml is a global file, so i try to keep most stuff in node-specific files.21:56
spatelalso you can create groups in the inventory and create a single yml file like all_ceph_compute.yml and put the nova_libvirt_images_rbd_pool: vms option in it..21:57
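A sketch of both variants, with hypothetical file names; the single variable is the same either way:

    # /etc/openstack_deploy/host_vars/compute-1.yml  (per host)
    nova_libvirt_images_rbd_pool: vms

    # /etc/openstack_deploy/group_vars/all_ceph_compute.yml  (per inventory group)
    nova_libvirt_images_rbd_pool: vms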
ThiagoCMCThat's really cool, I'll do the same here  =)21:57
*** rh-jlabarre has quit IRC22:12
*** spatel has quit IRC22:18
*** nurdie has quit IRC22:37
*** nurdie has joined #openstack-ansible22:37
*** nurdie has quit IRC22:42
*** luksky has quit IRC22:50
*** luksky has joined #openstack-ansible23:03
*** klamath_atx has quit IRC23:34
*** klamath_atx has joined #openstack-ansible23:35
*** nurdie has joined #openstack-ansible23:43
*** nurdie has quit IRC23:48
