Wednesday, 2019-01-30

*** hamzaachi has quit IRC00:01
*** tosky has quit IRC00:02
cmartHi OSA. I just upgraded my production cluster from Pike to Queens. I ran the playbook to remove the old Neutron agents containers, and when I did this (a few hours ago), networking for the entire cloud broke00:07
cmartThe Neutron agents are running on bare metal of the control plane, the linuxbridge agent claims to be creating bridges etc, but I can't pass any traffic.00:08
cmartHas anyone encountered a similar issue, or have any ideas/suggestions?00:08
cmartfollowing this now https://docs.openstack.org/openstack-ansible/queens/admin/troubleshooting.html#troubleshooting-instance-connectivity-issues00:09
*** ansmith has joined #openstack-ansible00:09
*** sdake has quit IRC00:19
*** ansmith has quit IRC00:23
*** sdake has joined #openstack-ansible00:27
*** ansmith has joined #openstack-ansible00:46
*** macza has quit IRC00:53
*** nurdie has joined #openstack-ansible00:59
*** cmart has quit IRC01:03
*** nurdie has quit IRC01:03
*** hamzy has joined #openstack-ansible01:04
*** ansmith has quit IRC01:11
*** sdake has quit IRC01:19
*** sdake has joined #openstack-ansible01:22
*** markvoelker has joined #openstack-ansible01:36
jamesdentonmay need to run thru that scenario again and see if something is wonky01:40
*** markvoelker has quit IRC01:41
*** cmart has joined #openstack-ansible01:56
*** tinwood has quit IRC02:09
*** tinwood has joined #openstack-ansible02:11
jamesdentoncmart did you manage to get it resolved?02:23
cmarthi jamesdenton! not yet. :(02:23
cmarti'm hand-editing inventory to re-deploy Neutron agents in containers02:24
cmartkind of backpedaling at this point. 12 hours in02:24
jamesdentonare you seeing anything of interest in the linuxbridge agent log?02:24
jamesdentonon the infras?02:24
cmartnope, i've pored over those. not a hint02:25
cmartinstances can talk to each other on private networks02:26
cmartso some kind of networking is working02:26
jamesdentonAlright - so can you describe in a little more detail whats happening?02:26
jamesdentonwhat kind of networks are you using? vxlan? vlan? are you using routers and floating ips?02:26
cmartwe're using vxlan I believe. and yes, external routers and floating IPs02:27
cmartinstances can talk to each other on private networks, but can't ping their default gateways on same private networks02:27
cmartso it's almost like something with the routers is busted02:27
jamesdentonok gotcha. If you spin up a new instance does DHCP work?02:27
cmartchecking!02:27
jamesdentonassuming DHCP runs on the infras (now baremetal)02:27
cmart(yes, I would assume that too)02:28
*** TxGirlGe_ has quit IRC02:28
jamesdentonwhen you get a chance, please show me the 'brctl show' output from just one of the compute nodes and one of the infras. You can use paste.openstack.org and then share the link02:28
cmartsure! give me just a minute02:28
jamesdentonno worries. I'm gonna ask questions but take your time. Did you already delete the old neutron agent containers? And have you rerun any playbooks with any modifications to your inventory?02:29
jamesdentonwhen you get a chance, also share the output of 'openstack network agent list'02:30
cmart`brctl show` on an infra node: http://paste.openstack.org/show/744220/02:35
cmart`brctl show` on a compute node: http://paste.openstack.org/show/744221/02:35
cmartoutput of `openstack network agent list`: http://paste.openstack.org/show/744222/02:36
jamesdentonlooking02:36
cmartanswering other questions: I have already deleted the old neutron agent containers :(02:36
jamesdentonno worries02:36
jamesdentonsafe to assume t01, t05, and t09 are infras?02:36
cmartI have NOT yet re-run playbooks with the most recent modifications to inventory (hand-adding the neutron agent containers back in)02:36
jamesdentonbaby duty - brb02:37
cmartyessir02:37
*** markvoelker has joined #openstack-ansible02:37
jamesdentonon your infras are you seeing qdhcp and qrouter namespaces?02:38
jamesdentoni would assume so based on the # of interfaces i'm seeing02:39
cmartOne slightly weird thing I did: mid-upgrade I switched from the 17.1.6 tag of OSA to the stable/queens branch. The diff set is fairly small, but I did it to get odyssey4me's commit that supports the automated Neutron agent migration from containers to bare metal02:39
cmartsorry, I'm unsure how to look for these Q namespaces. googling02:40
jamesdenton'ip netns'02:40
jamesdentonwhen you switched... how far along?02:40
cmart'ip netns' returns about 20-some qdhcp namespaces, and a similar number of qrouter namespaces, roughly equivalent to the number of projects / private networks we have02:41
jamesdentonright, ok02:41
jamesdentonon the infra - does br-vxlan have an ip?02:42
jamesdentonand can you ping the ip of that compute?02:42
cmartI switched branches after running everything (including setup-openstack.yml completing without error). Since switching branches, I re-ran "${UPGRADE_PLAYBOOKS}/neutron-tmp-inventory.yml", "repo-install.yml", "bootstrap-ansible.sh" (to pull in some updated roles), then I re-ran both Neutron and Nova playbooks.02:43
jamesdentonif you can, show me 'ip addr show br-vxlan' and 'ip -d link show vxlan-62' on both of those machines02:43
cmart"on both of those machines" meaning the three infra hosts, or an infra and a compute host?02:44
jamesdentonjust the two you sent me02:44
cmartoutput on infra node (tombstone01): http://paste.openstack.org/show/744223/02:46
cmartoutput on compute node (tombstone22): http://paste.openstack.org/show/744224/02:46
jamesdentonok yeah, i see it02:47
jamesdentonone sec02:47
jamesdentonlet me put the baby back to bed and we'll see what we can do02:47
cmartOK. thank you very much! a bunch of researchers in Arizona may wish to buy you a beverage if we can get them access to their servers again :)02:48
jamesdentonhah well let's get you fixed up first02:49
jamesdentonso, the short of it appears to be that the infra nodes are using a different vtep interface than the computes.02:49
jamesdentoncompute: vxlan id 62 group 239.1.1.1 dev br-vxlan02:50
jamesdentoninfra: vxlan id 62 group 239.1.1.1 dev br-mgmt02:50
jamesdentoni've seen this, just need to find out how to remedy. brb02:50
cmartaha. different bridge, meaning different layer 2 fabric02:50
jamesdentonyeah the bridge itself is just a placeholder for the interface where the VTEP addr is applied02:51
jamesdentonso one is using the IP configured on br-mgmt while the other is using br-vxlan (correct)02:51
jamesdentonthis likely means that the infras are missing a block in the inventory relating to br-vxlan or the tunnel_address02:51
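The mismatch spotted here can be checked mechanically. The following is a minimal Python sketch, assuming the `ip -d link show vxlan-62` output contains a detail line of the form quoted in the chat (`vxlan id 62 group 239.1.1.1 dev br-vxlan`). The function name `vtep_device` is hypothetical and not part of any OpenStack tool.

```python
import re

# Matches the detail line quoted in the chat, e.g.
# "vxlan id 62 group 239.1.1.1 dev br-vxlan ..."
_VXLAN_RE = re.compile(r"\bvxlan id (\d+)\b.*?\bdev (\S+)")


def vtep_device(ip_d_link_output):
    """Return (vni, device) parsed from `ip -d link show <vxlan-if>` text, or None."""
    match = _VXLAN_RE.search(ip_d_link_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


compute = vtep_device("vxlan id 62 group 239.1.1.1 dev br-vxlan")
infra = vtep_device("vxlan id 62 group 239.1.1.1 dev br-mgmt")
# Same VNI but a different underlying device on each host: this is the
# symptom identified in the conversation.
mismatch = compute[0] == infra[0] and compute[1] != infra[1]
```

Running the same comparison for every `vxlan-*` interface on an infra and a compute host would show whether the two agree on the VTEP device.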
jamesdentonjust need to find my notes or bug on this..02:52
cmartI didn't make any changes to my `provider_networks` in openstack_user_config.yml before deploying Queens02:53
jamesdentonyeah i don't think it's anything you did or didn't do, necessarily02:54
jamesdentondid br-vxlan always have an IP on it, or was it something you added recently?02:54
jamesdentonon the infra nodes that is02:54
cmartit's not something that I knowingly added02:54
jamesdentonkk02:54
cmartlooking at our other production cloud (which is still on Pike), br-vxlan does have an IP on all infra nodes.02:56
jamesdentonok, can you do me a favor and post the contents of /opt/openstack-ansible/playbooks/common-tasks/dynamic-address-fact.yml02:58
cmarthere 'tis http://paste.openstack.org/show/744225/02:58
jamesdentonOh right, you're on Queens now. n/m on that03:00
cmartwould a diff of openstack_inventory.json between now and the backup from Pike be helpful?03:00
jamesdentonum, actually, can you post the inventory sections for the two hosts you're working with?03:00
*** sdake has quit IRC03:00
cmartsure. tombstone01 (infra): http://paste.openstack.org/show/744226/03:02
cmarttombstone22 (compute): http://paste.openstack.org/show/744227/03:02
jamesdentonAlso... in either the linuxbridge_agent.ini or ml2_conf.ini you'll find a var called 'local_ip'. Can you share that for each host?03:02
cmartthat's 172.29.224.201 on the infra host and 172.29.228.222 on the compute host03:04
*** jpward1981 has quit IRC03:04
jamesdentonk03:06
*** markvoelker has quit IRC03:08
jamesdentonOn your deploy host, there is a directory at /etc/openstack_deploy/ansible_facts. Can you open up the file for tombstone22 and share the entire block for 'ansible_br_vxlan'?03:10
jamesdentonand the same thing for tombstone01.03:11
jamesdentonThe IP address that populates 'local_ip' in those config files is used as the VTEP. And that address is determined dynamically during the neutron playbook run based on the ansible facts for a given host. If the IP doesn't exist (in facts) then it defaults to ansible_host, which is the br-mgmt IP we saw03:12
cmartansible_br_vxlan for tombstone22: http://paste.openstack.org/show/744228/03:12
jamesdentoni'm trying to figure out a) if the info doesn't exist in facts, and b) how to get it recognized so we can fix it properly03:12
cmartansible_br_vxlan for tombstone01: http://paste.openstack.org/show/744229/03:13
jamesdentonOK, so it looks like the facts contain the correct IP for tombstone01: 172.29.228.20103:14
jamesdentonso what you might consider is to run the os-neutron-install.yml playbook only against tombstone01. "openstack-ansible os-neutron-install.yml --limit localhost,tombstone01"03:15
cmartyep. but we saw 172.29.224.201 for the `local_ip` value, which is on the management subnet rather than the tunnel subnet03:15
jamesdentonright03:15
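The fallback jamesdenton describes (use the tunnel address from facts if present, otherwise `ansible_host`) can be illustrated with a small Python sketch. It is not the actual OpenStack-Ansible logic. It assumes the cached facts follow Ansible's usual per-interface layout (`ansible_br_vxlan` containing `ipv4.address`), and `pick_local_ip` is a hypothetical helper.

```python
def pick_local_ip(facts, ansible_host, tunnel_fact="ansible_br_vxlan"):
    """Return the tunnel interface's IPv4 address from facts, else ansible_host."""
    address = facts.get(tunnel_fact, {}).get("ipv4", {}).get("address")
    return address if address else ansible_host


# Facts present: the tunnel address wins.
good = pick_local_ip(
    {"ansible_br_vxlan": {"ipv4": {"address": "172.29.228.201"}}},
    ansible_host="172.29.224.201",
)
# Facts missing: the result silently falls back to the management address.
bad = pick_local_ip({}, ansible_host="172.29.224.201")
```

The second call shows why the failure is easy to miss: nothing errors out, the agent simply ends up with a `local_ip` on the wrong subnet.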
jamesdentontwo things need to happen - we need to make sure the config file gets the right IP and then we will need to blow away the vxlan interfaces and restart linuxbridge/dhcp/l3 agents on the node03:16
jamesdentonso those interfaces can be rebuilt03:16
jamesdentonbut the key is making sure that the config file gets the right IP.03:16
jamesdentonnot sure why that didn't happen here03:17
cmartsure. happy to do that. can't hurt anything. i'm reverting my most recent changes to openstack_inventory.json to bring it back to what OSA set up during the queens upgrade.03:17
jamesdentonok yes03:17
cmarthere goes, just ran playbook03:19
jamesdentonthat was fast03:19
cmarti mean, it's running :) just started it03:20
jamesdentonahh ok03:20
jamesdenton:)03:20
jamesdentonwhen it's done, please take a peek at those neutron config files, whichever had local_ip, and see if it changed to 172.29.228.20103:20
cmartyeah linuxbridge_agent.ini now shows `local_ip = 172.29.228.201`03:21
jamesdentonfantastic03:21
jamesdentonlet me know when it's done03:22
cmartplaybook is finished03:22
jamesdentonok cool03:22
cmarti'm taking a peek at the other infra hosts to see if they have the same issue...03:22
jamesdentonthey likely do03:22
cmartyeah they do03:23
cmartso running playbook against those as well...03:23
jamesdentonyou can run it against them, too. cool.03:23
jamesdentonmake sure to include localhost03:23
*** jpward1981 has joined #openstack-ansible03:24
cmartsure enough, it changed local_ip for the other infra hosts too.03:27
cmartplaybook finished. now, question is do we still need to kick those services...03:27
jamesdentonok cool. so starting with tombstone01, we need to isolate the vxlan-* interfaces so they can be deleted and then rebuilt. One way to do that is with something like this:03:28
jamesdentonbrctl show | grep vxlan | grep -v br-vxlan | awk {'print $4'} | wc -l03:29
jamesdentonand then maybe: "for i in $(brctl show | grep vxlan | grep -v br-vxlan | awk {'print $4'}); do ip link delete $i; done"03:29
cmartthe first of your commands returns 4703:29
jamesdentonand then "systemctl restart neutron-linuxbridge-agent neutron-dhcp-agent neutron-l3-agent"03:29
jamesdentonk03:29
cmartrunning these in pieces so I understand what they do03:30
jamesdentoncan you run it without the wc -l and make sure it only contains vxlan-*03:30
jamesdentonso the idea is to delete all of those vxlan interfaces, since the vtep address is wrong. when you restart those agents (likely only linuxbridge-agent), they will be rebuilt with the proper vtep addr03:30
jamesdentonhow you do the delete is up to you - no warranty with my commands03:31
cmartunderstood :) your first command works up until the awk03:31
cmarti'll just munge it into `ip link delete` followed by the vxlan-##03:32
jamesdentonright on03:32
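One possible reason the `awk` step misbehaves is that `brctl show` prints only the first interface of each bridge in the fourth column, while further interfaces appear alone on continuation lines. Below is a hedged Python sketch that collects the `vxlan-*` names from such output and only prints the delete commands for review. The sample text and the helper `vxlan_interfaces` are illustrative assumptions, not output from this cloud.

```python
def vxlan_interfaces(brctl_show_output):
    """Collect vxlan-* interface names from `brctl show` text."""
    names = []
    for line in brctl_show_output.splitlines():
        fields = line.split()
        # The interface is always the last field, whether the line starts a
        # bridge entry (4 fields) or is a continuation line (1 field).
        if fields and fields[-1].startswith("vxlan-"):
            names.append(fields[-1])
    return names


SAMPLE = """\
bridge name\tbridge id\t\tSTP enabled\tinterfaces
br-vxlan\t\t8000.0cc47a000001\tno\t\tbond1.100
brq0a1b2c3d-4e\t\t8000.0cc47a000002\tno\t\ttap11111111-22
\t\t\t\t\t\t\tvxlan-62
brq5f6a7b8c-9d\t\t8000.0cc47a000003\tno\t\tvxlan-17
"""

for name in vxlan_interfaces(SAMPLE):
    # Printed for manual review rather than executed.
    print("ip link delete " + name)
```

Printing instead of executing keeps the destructive step manual, which matches how cmart chose to run the deletes by hand.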
cmartok everything is now gone except for br-vxlan which we apparently want to keep03:32
cmartnow restarting the agent(s)03:33
jamesdentonok great03:33
jamesdentonif you -- watch 'brctl show | grep vxlan' -- you should see them come back03:34
jamesdentonideally we end up with 4703:34
cmartthey're repopulating!03:34
jamesdentonand then 'ip -d link show vxlan-62' should show 'br-vxlan' instead of 'br-mgmt'03:34
cmartit does indeed03:35
jamesdentongood deal03:35
cmartholy crap some of my servers are now back online03:35
jamesdentonyou should be good to repeat that on the other hosts, just be sure to rebuild those ip link delete commands for each host, as they could be a little different03:36
cmartright, because different vxlan numbers03:36
jamesdentonwell, same vxlan numbers, but sometimes they don't exist everywhere03:36
cmartok, rinse & repeating03:36
jamesdentonvxlan-62 represents the same 'network' across this cloud no matter what03:37
jamesdentonthe 62 is arbitrary, really.03:37
cmartright, and there's a different one for each network03:37
jamesdentonyep03:37
jamesdentonit is connected to a brq bridge that is named consistently across all hosts that have an object in that network, be it a dhcp server, router, vm, whatever03:38
jamesdentonthat brq bridge name, the 9 chars, should be the first 9 chars of the UUID of the respective network03:38
jamesdentonjust FYI03:38
jamesdentonand the connected tap interfaces, the chars line up with the port UUID for whatever its connected to03:38
jamesdentonhow are things looking?03:43
cmartreally good! just finished rebuilding the vxlan links on the last infra host. everything appears to be back online03:44
cmartI feel humbled and grateful, kind stranger03:44
jamesdentonvery glad to hear that03:45
jamesdentoni know how those 12 hr upgrades go, believe me03:45
cmartyeah. my pencil is now very dull03:46
jamesdentonwhen you're feeling up for it, can you fill out a bug describing your upgrade process, including the change in the middle, and if you have them, the playbook commands you ran?03:46
jamesdentonseems to me like it was using stale facts and applying the wrong IP for tunnel_address, either because it didn't have one in facts or it did and ignored it or something03:47
cmartyes, I should have all that. do you suspect the change in the middle may have caused the misconfigured "local_ip"?03:47
jamesdentoni don't think so03:47
jamesdentoni have heard of folks (maybe even today) having similar issues because br-vxlan did not have an address on it, and when they upgraded, things didn't work. In the old container, the VTEP addr was configured on eth10.03:49
jamesdentonAnd the playbooks will happily default to what is effectively br-mgmt IP, and you won't be able to communicate between the infras and the computes03:49
jamesdentonand even if you fix local_ip after the fact, things still won't work until you delete the vxlan-* interfaces and rebuild them. Or restart the box03:50
jamesdentonSo, if there's anything productive to come out of this exercise maybe it's some additional troubleshooting steps to at least identify or rule out that particular issue03:50
jamesdentonsorry you had to go thru that03:50
cmartright. I did clear all my Ansible facts at the start of the upgrade, but not after switching branches03:51
jamesdentonk03:51
jamesdentonwho knows :/03:52
cmartand no apology needed :) if this is the price we occasionally pay for what is otherwise very solid infrastructure for us, then I will take that deal03:54
cmartbut clearly I need to learn more about vxlan and friends03:54
cmartif you tell me the second edition of your book is still reasonably current and applicable, then you just sold at least one book, my friend03:55
jamesdentonWell, it's just one of many little observations I've made over time in these kinds of cases.03:55
jamesdentonwell, it isn't really, to be honest. There is a third edition that's pretty relevant (Pike/Queens, IIRC) but isn't OSA-based. Packt runs $5 eBook specials all the time, i would wait :D03:57
cmartheh okay03:57
cmartbut yes, ruling out the afore-troubleshot issue seems like one for this doc https://docs.openstack.org/openstack-ansible/queens/admin/troubleshooting.html#troubleshooting-instance-connectivity-issues03:58
jamesdentonnice ascii art. lol03:59
jamesdentonbut yes, would be a good spot indeed03:59
jamesdentonlooking forward to your bug report. thanks for hanging in there. and don't be a stranger here03:59
jamesdentonget some rest. i'm sure you need it03:59
cmartyes. I'll do the writeup after a big food and a long sleep04:00
cmartThank you very much!04:00
jamesdentonperfect04:00
jamesdentonyou're very welcome. glad we figured it out04:00
* jamesdenton wipes brow04:00
*** nurdie has joined #openstack-ansible04:36
*** chkumar|out is now known as chandankumar04:38
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Update cirros from 3.5 to 3.6  https://review.openstack.org/63320804:41
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Added dependencies of os_tempest role  https://review.openstack.org/63272604:42
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Disable nova-lxd tempest plugin  https://review.openstack.org/63371104:43
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Use the correct heat tests  https://review.openstack.org/63069504:50
*** aedc has quit IRC05:01
*** ArchiFleKs has quit IRC05:18
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Always generate stackviz irrespective of tests pass or fail  https://review.openstack.org/63196705:25
*** ArchiFleKs has joined #openstack-ansible05:28
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Use tempest_cloud_name in tempestconf  https://review.openstack.org/63170805:32
*** ArchiFleKs has quit IRC05:34
*** udesale has joined #openstack-ansible05:34
*** ArchiFleKs has joined #openstack-ansible05:35
*** devx has quit IRC05:44
*** udesale has quit IRC05:46
*** udesale has joined #openstack-ansible05:48
chandankumarjrosser: morning05:52
chandankumarjrosser: https://review.openstack.org/#/c/632726/ is failing http://logs.openstack.org/26/632726/6/check/openstack-ansible-functional-centos-7/1fdf846/job-output.txt.gz#_2019-01-30_05_28_44_98101805:52
*** udesale has quit IRC05:53
*** udesale has joined #openstack-ansible05:54
chandankumarjrosser: {"changed": false, "msg": "The variable venv_install_destination_path is required and\nhas not been set.\n"05:55
*** cmart has quit IRC06:01
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_nova master: Use venv_packages_to_symlink to symlink to import libvirt-python  https://review.openstack.org/63347406:01
*** cmart has joined #openstack-ansible06:02
*** cmart has quit IRC06:08
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Update cirros from 3.5 to 3.6  https://review.openstack.org/63320806:28
*** hamzaachi_ has quit IRC06:29
*** nurdie has quit IRC06:35
*** nurdie has joined #openstack-ansible06:35
*** nurdie has quit IRC06:40
*** PTO has quit IRC06:53
*** udesale has quit IRC07:02
*** udesale has joined #openstack-ansible07:05
*** udesale has quit IRC07:06
*** udesale has joined #openstack-ansible07:08
*** udesale has quit IRC07:09
*** udesale has joined #openstack-ansible07:09
chandankumarjrosser: odyssey4me https://review.openstack.org/#/c/631967/ and https://review.openstack.org/#/c/631708/ are good to go, needed for final tripleo ci job07:10
*** radeks_ has joined #openstack-ansible07:11
*** jawad_axd has joined #openstack-ansible07:13
*** radeks_ has quit IRC07:17
*** radeks_ has joined #openstack-ansible07:23
*** kopecmartin|off is now known as kopecmartin07:26
openstackgerritChandan Kumar proposed openstack/openstack-ansible-tests master: [WIP] Add delorean-deps.repo in OSA  https://review.openstack.org/63388207:35
chandankumarjrosser: I think we need to fix vxlan issue on os_nova side also07:37
chandankumarjrosser: Same scenario tests are failing there07:37
jrosserchandankumar: yes you are right07:38
chandankumarjrosser: let me propose a patch based on yours07:38
jrosserThis appears to be happening after the host network was changed to use systemd_networkd. Most likely due to systemd version on centos being different/older than the other distro07:39
chandankumarjrosser: maybe using delorean-deps.repo will fix the issue07:40
*** fnpanic has joined #openstack-ansible07:40
jrosserIdeally we understand what it is there that is broken, and then there is a fix possible in one place, rather than in all the roles individually07:40
chandankumarjrosser: do we collect list of installed rpms in each hosts?07:40
jrosserI think so yes07:40
chandankumarjrosser: yes we collect07:40
chandankumarhttp://logs.openstack.org/74/633474/4/check/openstack-ansible-functional-centos-7/4335862/logs/redhat-rpm-list-installed-host.txt.gz07:41
chandankumarsystemd-networkd.x86_64            219-62.el7_6.2                      @updates07:41
chandankumarjrosser: may be cloudnull can help here to fix it at one place07:43
*** jawad_axd has quit IRC07:43
prometheanfirejrosser: you don't sleep either?07:43
jrosserIt may appear that way, I guess :)07:43
jrosserchandankumar: yes it would be good for cloudnull to take a peek now we have identified exactly what is broken07:44
jrosserchandankumar: and this is all about to turn into a big bonfire on centos anyway sadly https://bugzilla.redhat.com/show_bug.cgi?id=165034207:44
openstackbugzilla.redhat.com bug 1650342 in systemd "systemd-networkd support in RHEL 8" [Unspecified,New] - Assigned to systemd-maint07:44
prometheanfirelink to failure?07:45
prometheanfireand ya, redhat pissed me off because of that07:46
prometheanfirepush it on everyone and then re-invent07:46
jrosserprometheanfire: this is a workaround, hopefully the commit msg explains https://review.openstack.org/#/c/633732/07:47
chandankumarprometheanfire: it is related to tempest scenario tests failure on os_nova side http://logs.openstack.org/74/633474/4/check/openstack-ansible-functional-centos-7/4335862/job-output.txt.gz#_2019-01-30_07_29_46_08608307:47
prometheanfireah07:48
*** gkadam has joined #openstack-ansible07:51
*** slaweq has joined #openstack-ansible07:58
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Ping router once it is created  https://review.openstack.org/63388308:02
chandankumarjrosser: ^^ regarding pinging the router, not sure it is quite perfect08:03
*** mkuf has quit IRC08:10
*** pcaruana has joined #openstack-ansible08:10
*** markvoelker has joined #openstack-ansible08:16
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Added tempest.conf for heat_plugin  https://review.openstack.org/63202108:24
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Use the correct heat tests  https://review.openstack.org/63069508:25
*** aedc has joined #openstack-ansible08:32
*** rgogunskiy has joined #openstack-ansible08:32
*** mkuf has joined #openstack-ansible08:33
*** gkadam has quit IRC08:42
*** markvoelker has quit IRC08:49
chandankumarjrosser: how to kick this environment locally?08:51
chandankumarjrosser: http://git.openstack.org/cgit/openstack/openstack-ansible-os_heat/tree/zuul.d/jobs.yaml#n2108:51
*** electrofelix has joined #openstack-ansible08:51
chandankumarjrosser:       scenario: aio_metal_heat08:51
*** tosky has joined #openstack-ansible08:55
jrosserchandankumar: here are the AIO instructions https://docs.openstack.org/openstack-ansible/rocky/user/aio/quickstart.html09:03
jrosserlook at the bit where the SCENARIO environment variable is set09:03
fnpanichi09:04
fnpanici am currently trying to find where lxc_net_gateway is used in rocky?09:06
fnpanicthe docs say how to set it but i cannot find where this is used...09:06
fnpanichttps://docs.openstack.org/openstack-ansible-lxc_hosts/rocky/09:06
jrosserfnpanic: two ways to find where it is used, first in that role https://github.com/openstack/openstack-ansible-lxc_hosts/search?q=lxc_net_gateway&unscoped_q=lxc_net_gateway09:10
jrosserand second in all the repos http://codesearch.openstack.org/?q=lxc_net_gateway&i=nope&files=&repos=09:10
jrosserwhich in this case yield pretty similar results09:10
*** DanyC has joined #openstack-ansible09:16
*** DanyC has quit IRC09:24
*** DanyC has joined #openstack-ansible09:25
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Always generate stackviz irrespective of tests pass or fail  https://review.openstack.org/63196709:33
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Always generate stackviz irrespective of tests pass or fail  https://review.openstack.org/63196709:33
chandankumarodyssey4me: jrosser https://review.openstack.org/#/c/631708/ cloudname fix to merge tripleo ci patch thanks !09:35
*** markvoelker has joined #openstack-ansible09:47
*** shyamb has joined #openstack-ansible09:58
*** DanyC has quit IRC09:58
*** DanyC has joined #openstack-ansible10:05
odyssey4mejamesdenton you are magic, sir - thanks for helping cmart out so quickly and precisely... I, too, am interested to figure out what went wrong there10:06
chandankumarjrosser: https://review.openstack.org/#/c/633711/ this one also10:08
jrosserodyssey4me: ++ on that, we should make sure that the steps are captured because its a great example of systematic debugging and fixing up10:09
*** DanyC has quit IRC10:11
*** DanyC has joined #openstack-ansible10:12
*** shyamb has quit IRC10:13
*** shyamb has joined #openstack-ansible10:13
openstackgerritChandan Kumar proposed openstack/openstack-ansible-tests master: Add delorean-deps.repo in OSA  https://review.openstack.org/63388210:18
*** markvoelker has quit IRC10:19
odyssey4mechandankumar are you aware of where that var gets used, because just that alone will do nothing at all10:20
chandankumarodyssey4me: one more change coming10:20
chandankumarin openstack_hosts10:20
odyssey4meoh, what do you know - there's already a var for deps: https://github.com/openstack/openstack-ansible-openstack_hosts/blob/master/defaults/main.yml#L14210:21
chandankumarodyssey4me: https://trunk.rdoproject.org/centos7-master/delorean-deps.repo10:21
*** DanyC has quit IRC10:22
odyssey4mehmm, it looks like we've been using the deps repo the whole time10:22
*** DanyC has joined #openstack-ansible10:23
chandankumarodyssey4me: http://logs.openstack.org/83/633883/1/check/openstack-ansible-functional-centos-7/b052465/logs/ara-report/result/12e6100a-c09a-42ac-bbc7-67a1f679a436/10:25
*** DanyC has quit IRC10:25
chandankumarodyssey4me: deps is there     "baseurl": "https://trunk.rdoproject.org/centos7-master/deps/latest/",10:25
*** DanyC has joined #openstack-ansible10:26
odyssey4meyep, so what exactly are we changing then?10:26
chandankumarodyssey4me: from code search, I found only delorean.repo10:26
chandankumarso that we can enable delorean-deps.repo (which is missibg)10:26
chandankumar*missing10:27
odyssey4meoh ok10:30
chandankumarodyssey4me: doesn't downloading the repo file directly into /etc/yum.repos.d make use of the repo? why do we have to enable/disable each repo using ansible?10:30
chandankumarI mean in CI10:31
odyssey4mechandankumar I honestly have no idea.10:31
odyssey4meAs far as I recall, we had issues with repositories built into the images at some point, and wanted to change them to be the right ones.10:32
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Ping router once it is created  https://review.openstack.org/63388310:34
*** priteau has joined #openstack-ansible10:46
chandankumarodyssey4me: enabling and disabling stackviz is a nice idea for a normal user10:54
odyssey4mechandankumar yeah, it should likely be disabled by default so it's opt-in - then enabled for CI?10:56
openstackgerritMerged openstack/openstack-ansible-os_tempest master: Use tempest_cloud_name in tempestconf  https://review.openstack.org/63170811:09
*** mkuf has quit IRC11:14
*** shyamb has quit IRC11:15
*** markvoelker has joined #openstack-ansible11:16
fnpanicjrosser: thanks! works great!11:18
fnpanicbtw will the proxy patches make it into 18.1.2?11:19
*** udesale has quit IRC11:26
chandankumarodyssey4me: yes, I will propose a patch for the stackviz enable11:38
chandankumarodyssey4me: http://logs.openstack.org/26/632726/6/check/openstack-ansible-functional-centos-7/1fdf846/logs/ara-report/result/10aaa101-aab4-40c4-9fcc-6a554fcb1790/11:38
chandankumarodyssey4me: I am not sure what is wrong with this patch https://review.openstack.org/#/c/632726/11:38
odyssey4mefnpanic I think they're all in, yes.11:40
fnpanicso as soon as 18.1.2 is released i should be fine right?11:41
jamesdentonodyssey4me i'm hoping there's something in the bug report that can shed light on what happened. i'll be out today, as the kids are all home from school due to the cold, cold temps11:41
odyssey4mechandankumar adding it as a dependency, makes python_venv_build run prior to the os_tempest role - but the metal dep does not have the included vars, so it fails11:41
odyssey4mechandankumar the role dep will work for config_template, I think (is that right evrardjp?) - but not for python_venv_build11:42
odyssey4mejamesdenton oh dear - hope they mend well, enjoy making the chicken soup :)11:43
chandankumarodyssey4me: let me try the reverse process11:43
jamesdenton:)11:43
evrardjpadding dependencies on projects that have a tasks/main.yml is generally a bad idea in the long run (if conditionals are added for example)11:44
evrardjpadding dependencies to roles with module_utils , modules or various plugins make sense to me11:44
evrardjpif that was the question11:45
fnpanicbtw when i want to use the dynamic inventory to do a matrix ping check for example, how can i use it? the json file is not accepted by ansible. The docs are not clear to me.. :-(11:45
odyssey4meevrardjp yes, I was asking if adding config_template as a meta-dep will work as-is, or whether anything else is needed11:45
odyssey4mefnpanic just do: cd /opt/openstack-ansible; ansible -m ping all11:46
odyssey4mefnpanic as long as you're in the subdirectory /opt/openstack-ansible, the wrapper will include the inventory11:46
fnpanici wanted to use fping for a matrix like ping11:46
odyssey4mefnpanic otherwise you can also do: ansible -i /opt/openstack-ansible/inventory -m ping all11:47
fnpanicnot  the ansible ping ;-)11:47
fnpanicah ok...11:47
odyssey4mefnpanic well, I was illustrating how to use the inventory, not how to write the whole task :p11:47
* jrosser has a jenkins job doing matrix ping11:48
*** mkuf has joined #openstack-ansible11:48
jrosserfor bonus points you should also do the inverse, check that the things that shouldnt be able to talk to each other cant11:48
fnpanic - /opt/openstack-ansible/inventory does not exist :-)11:48
evrardjpodyssey4me: nothing is required as far as I know, because I am using it outside OSA right now11:48
evrardjp:)11:49
odyssey4mefnpanic which series are you using?11:49
fnpanicjrosser: yeah, that is why i am disabling nat now in the lxcbridge11:49
fnpanicthe one i tried is pike11:49
odyssey4mefnpanic assuming you cloned openstack-ansible to /opt, then https://github.com/openstack/openstack-ansible/tree/stable/rocky/inventory should be there11:49
fnpanici will switch to rocky system11:50
*** markvoelker has quit IRC11:50
odyssey4mefnpanic the proxy fixes only went back to rocky IIRC11:50
fnpanicodyssey4me: the inventory is in rocky11:50
fnpanicodyssey4me: the pike one was just a system running here which i was logged into11:51
fnpanicwill reinstall it with rocky11:51
fnpanicso the fixes will be in 18.1.2 when it is released, r8?11:51
odyssey4mefnpanic for pike, the path is /opt/openstack-ansible/playbooks/inventory11:52
fnpanicyeah, works for both now. Thanks!11:53
odyssey4mefnpanic fnpanic 18.1.3 is the next scheduled release - and yes, the fixes will be in it11:53
fnpanicok11:53
odyssey4me18.1.2 is out already, but it didn't have one of them IIRC11:53
fnpanicok, so i will cherrypick the fixes for 18.1.2.11:54
fnpanicany eta for 1.3?11:54
odyssey4mefnpanic no need - just use stable/rocky11:54
fnpanicok11:54
odyssey4meor if you want a fixed point for docs or something, use the SHA at the current head of stable/rocky11:54
odyssey4meie, git checkout a603874cc54e62ccb3ac290443b59f242c5df84c11:55
fnpanicthanks11:55
*** sdake has joined #openstack-ansible12:02
*** shyamb has joined #openstack-ansible12:07
*** gkadam has joined #openstack-ansible12:08
jrosserfnpanic: if you are disabling nat on the container eth0 interface you may as well just remove that interface entirely?12:11
jrosserfnpanic: see what is done here https://review.openstack.org/#/c/625523/9/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j212:11
*** ansmith has joined #openstack-ansible12:17
fnpanicjrosser: thanks! where is this set in the docs?12:18
*** gkadam is now known as gkadam-bmgr12:19
jrosserfnpanic: it's not in the docs per se; this is really an example of how you can override any of the role defaults in your user_variables to customise the deployment as you like12:20
fnpanicok, got it.12:21
fnpanicso then there is no default GW in the containers i guess. What about the dnsmasq part?12:21
fnpanicbecause what if the containers need to talk to ldap for example12:22
jrosserit depends where the ldap server is and if the containers have a route to it via the mgmt network12:22
jrosserthats still there on eth112:22
jrosserbut really again you get to choose how you architect that12:23
fnpanicthen i need to put in a route on the mgmt network. i will use the syntax from the routed deployment example12:24
*** shyamb has quit IRC12:24
*** shyamb has joined #openstack-ansible12:25
jrosserfnpanic: it's worth studying the defaults/main.yml for the roles, they really are the documentation12:26
jrosseryou can do this for example https://github.com/openstack/openstack-ansible-lxc_container_create/blob/33268989de7d271a28386c9f57fb6c295baad0b0/releasenotes/notes/container-extra-networks-c74119ba6a559a59.yaml12:26
*** Pbing has joined #openstack-ansible12:28
*** apevec has joined #openstack-ansible12:28
PbingWhen I try to create an instance on a compute node, I get the error below12:29
PbingResourceProviderCreationFailed: Failed to create resource provider12:30
Pbingi am using devstack12:30
*** shyamb has quit IRC12:30
fnpanici was thinking of doing this with   provider_networks: in openstack_user_config and adding this to the keystone containers for example12:31
CeeMacafternoon all12:33
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Add tripleo-ci-centos-7-standalone-os-tempest job  https://review.openstack.org/63393112:35
CeeMacis anyone able to help with a PortBindingFailed issue when spinning up an instance?12:37
jrosserfnpanic: there are lots of different ways depending on what you want - provider_networks on o_u_c is fine if you are happy with the dynamic inventory assigning the addresses12:37
jrossercontainer_extra_networks is a bit more low level than that, you can specify exact IP you want per container, if that is what you need. Pick the one that suits I guess12:38
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Added dependency of os_tempest role  https://review.openstack.org/63272612:40
openstackgerritChandan Kumar proposed openstack/openstack-ansible-os_tempest master: Add tripleo-ci-centos-7-standalone-os-tempest job  https://review.openstack.org/63393112:42
chandankumarodyssey4me: hello12:42
*** sdake has quit IRC12:43
chandankumarodyssey4me: I need to reuse this var file https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/validate-tempest/vars/tempest_skip_master.yml in the existing tripleo os_tempest job so that if we start moving other scenario job it does not explode12:43
chandankumarodyssey4me: will I directly call the var file there?12:43
chandankumarjrosser: need some help here on router ping stuff http://logs.openstack.org/83/633883/2/check/openstack-ansible-functional-centos-7/75ddf44/logs/ara-report/result/e79dbc75-2c5d-46c7-8263-77c1b9934256/12:45
*** sdake has joined #openstack-ansible12:45
chandankumarjrosser: https://review.openstack.org/63388312:45
jrosseroh wow12:45
odyssey4mechandankumar where's the test playbook you want to apply that vars file to the os_tempest role?12:45
jrosseryou need to pick the right fields out of the registered variable12:46
chandankumarodyssey4me: https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/multinode-standalone.yml#L4812:46
chandankumarodyssey4me: I might need to copy that var file in the playbook directory12:47
*** markvoelker has joined #openstack-ansible12:47
odyssey4mechandankumar nope, not necessary - I'll work up a gist for you12:47
odyssey4melemme grab some coffee first12:47
chandankumarodyssey4me: sure thanks :-)12:47
jrosserchandankumar: i think you can make that much simpler, the 10.1.3.x address should be accessible directly with ping, no need to use the netns12:48
jrosserthe external IP of the router is public12:48
chandankumarjrosser: ok just ping module will do the job12:48
jrosserno, thats ansible ping not ip ping12:49
jrosser^ confusing :/12:49
chandankumarok got it12:49
chandankumarjrosser: let me update the patch12:49
jrosseryou see the field with the ip there in the error? just extract the right part of that dict12:49
jrosserchandankumar: to be even more excellent you could assert that 'admin_state_up': True12:51
chandankumarok, nice12:51
jrosserrouter up, ping fails is even more useful debug12:54
jrosserchandankumar: maybe -c1 is a bit fail prone, particularly if arp tables are empty, the first one might go missing12:55
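A rough sketch of the checks suggested above: assert `admin_state_up` first, then retry the ping so that a first packet lost to an empty ARP table does not fail the test. The field names `external_gateway_info`, `external_fixed_ips` and `ip_address` are assumed to follow the Neutron router representation, and the layout of the registered variable in the actual patch may differ. `check_router`, the injected `probe` callable and the address 10.1.3.5 are hypothetical.

```python
def check_router(router, probe, attempts=3):
    """Classify a router as down, unreachable or ok.

    `probe` is any callable taking an IP string and returning True on a reply.
    """
    if not router.get("admin_state_up"):
        return "router is administratively down"
    gateway = router.get("external_gateway_info") or {}
    ips = [f["ip_address"] for f in gateway.get("external_fixed_ips", [])]
    if not ips:
        return "router has no external IP"
    for ip in ips:
        # any() stops at the first successful reply, so retries only happen on loss.
        if not any(probe(ip) for _ in range(attempts)):
            return f"router up, ping to {ip} fails"
    return "ok"


calls = []


def flaky_probe(ip):
    """Simulate the first packet being lost while ARP resolves."""
    calls.append(ip)
    return len(calls) > 1


router = {
    "admin_state_up": True,
    "external_gateway_info": {"external_fixed_ips": [{"ip_address": "10.1.3.5"}]},
}
result = check_router(router, flaky_probe)
print(result)
```

Separating "router down" from "router up, ping fails" gives the more useful debug output jrosser describes.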
odyssey4mechandankumar https://gist.github.com/odyssey4me/e1bd3b3bea5851a988ac015c94b4b52e should do the trick12:57
odyssey4me(for the tripleo changes)12:58
openstackgerritMartin Kopec proposed openstack/openstack-ansible-os_tempest master: Improve overview subpage  https://review.openstack.org/63393412:58
openstackgerritMartin Kopec proposed openstack/openstack-ansible-os_tempest master: Improve overview subpage  https://review.openstack.org/63393413:00
*** Pbing has quit IRC13:10
chandankumarodyssey4me: thanks , will try that13:12
*** markvoelker has quit IRC13:13
mnasermorning all :)13:38
openstackgerritJonathan Rosser proposed openstack/openstack-ansible-os_heat master: [DNM] heat tempest tests finding  https://review.openstack.org/63069413:39
jrossero/ mnaser13:40
mnaserhow are ya jrosser13:42
jrossergood, busy but good13:42
jrosserdid you have any more thoughts on that heat internal vs external thing?13:42
mnaserjrosser: i haven't dug as much, i am concerned because i think that the url presented for os-collect-config might be one of those13:45
mnaserwhich means that a vm gets deployed would get an internal url13:45
jrosseri was worried about the original patch really too13:46
jrosserbut i'm not using heat for anything so don't have a reference right now13:46
*** rgogunskiy has quit IRC13:47
mnaserjrosser: i mean i know without that patch, magnum doesn't work if your internal net isnt accessible for sure13:48
*** pcaruana has quit IRC13:50
jrosserour tests are unhelpful, because the service containers all get to see the inside and outside trivially13:51
mnaserjrosser: yeah we don't do a good job on that part13:51
jrosseri want to revisit the patch i did for tempest yesterday because in hindsight thats just made it worse13:52
jrosseras in, more short-circuiting of inside/outside13:52
mnaseri think so13:54
mnaseri mean i think its a big major effort we need to look into13:54
jrosserwe should check this with cloudnull - he did a bunch of stuff on the host networking recently13:54
*** strattao has joined #openstack-ansible13:55
*** pcaruana has joined #openstack-ansible13:57
mnaseryeah, i also need to get around pulling containers out of c713:57
mnaserjrosser: lol, guilhermesp literally just got blocked because an internal path was just pulled in a magnum deployment14:01
guilhermesphhahaha mnaser14:01
mnaserguilhermesp: do you have this patch in your checkout? https://review.openstack.org/#/c/619355/14:01
guilhermespmnaser: checking14:02
guilhermespall right, I don't14:03
guilhermespit is using the internal endpoint14:03
guilhermespI can check it out and re-execute the heat playbook to see how it goes14:04
jrossermnaser: hrrm - in your deploys do you allow your internal things to nat out and hit the external endpoint?14:04
mnaserguilhermesp: yes, try again with the patch14:06
mnaserjrosser: yes, they can technically do that14:06
mnaserits not firewalled off14:07
jrosserthat might be what allows it to work for you, when it failed for ThiagoCMC yesterday14:07
jrosserheat, that is14:07
mnaserwell heat works fine in this scenario14:14
mnasereven after that patch14:14
*** sdake has quit IRC14:20
guilhermespowo mnaser jrosser the cluster was created successfully14:20
guilhermesp:D14:21
openstackgerritChandan Kumar proposed openstack/openstack-ansible-tests master: Add delorean-deps.repo in OSA  https://review.openstack.org/63388214:24
mnaserso yeah.. it's the fix14:25
mnaserand heat works fine with it applied too, right guilhermesp ?14:26
guilhermespseems to be mnaser I didn't test heat specifically but I assume is working because the creation was completed14:27
guilhermespcluster creation*14:27
*** jpward1981 has quit IRC14:28
*** jpward1981 has joined #openstack-ansible14:29
*** sdake has joined #openstack-ansible14:34
mgariepycan I have some review on https://review.openstack.org/#/c/632907/ and https://review.openstack.org/#/c/632908/ please :)14:43
jrosserguilhermesp: whats the state of the OSA magnum bits? "just works"?14:43
*** cmart has joined #openstack-ansible14:45
*** jpward1981 has quit IRC14:47
openstackgerritMerged openstack/openstack-ansible-os_tempest master: Always generate stackviz irrespective of tests pass or fail  https://review.openstack.org/63196714:47
*** sdake has quit IRC14:49
guilhermespjrosser: in my 18.1.1 deployment, now it is working. I needed a few workarounds14:49
guilhermespone of them is a task that I'm going to do asap: the heat user in the heat domain didn't get an admin role14:49
guilhermespsecond: that heat patch mnaser provided14:50
openstackgerritMerged openstack/openstack-ansible-os_tempest master: Update cirros from 3.5 to 3.6  https://review.openstack.org/63320814:51
*** sdake has joined #openstack-ansible14:55
gshippeyhttps://review.openstack.org/#/c/608031/ has anyone seen this issue on a fresh rocky AIO?15:05
gshippeyspecifically with octavia deployed15:07
jrosserodyssey4me: do we need to backport that ^ ?15:08
odyssey4mejrosser it would appear so, given it's master only15:08
*** udesale has joined #openstack-ansible15:08
jrosserok, we'll do it15:09
odyssey4methanks jrosser15:09
openstackgerritGeorgina Shippey proposed openstack/openstack-ansible-os_neutron stable/rocky: Fix whitespace in neutron.conf template  https://review.openstack.org/63397315:09
*** sdake has quit IRC15:11
odyssey4mejrosser also interesting is https://github.com/openstack/openstack-ansible-os_neutron/commit/30e20d6cd7a6a248aa2336f501e3de9b47f263be#diff-574b76c26842c4c0c2607fc38bf7e90d15:11
*** sdake has joined #openstack-ansible15:13
mnaserisnt that whole layer deprecated in rocky btw?15:15
ThiagoCMCGood morning guys!   =P15:15
*** TxGirlGeek has joined #openstack-ansible15:16
ThiagoCMCI'm curious about something... Yesterday I finally managed to deploy OSA/Rocky/Ubuntu with Ceph, Glance, Cinder, Nova, Neutron and Heat working! Then, I tried the Object Storage RadosGW, it was deployed successfully but, when I access Horizon, I'm seeing the following errors:15:17
ThiagoCMCThis error is seen when 'Object Storage' is selected: Error: Unable to get the Swift container listing.15:17
ThiagoCMCThis error is seen when I attempt to create containers: Unable to create containers.15:17
ThiagoCMCAny idea?15:18
ThiagoCMCI don't need Swift to use RadosGW, right>15:18
ThiagoCMC?15:18
cloudnullmornings15:19
ThiagoCMCMorning!  =)15:19
*** ostackz has joined #openstack-ansible15:22
odyssey4meThiagoCMC Nope, you don't need swift - what's happening, though, is that radosgw is pretending to be swift - which is why the horizon plugin is there, I think.15:23
odyssey4meThiagoCMC There may be something broken in the rados config - I dunno.15:23
jrosserwe have that working here - horizon shows us the radosgw contents15:24
jrosserstuartgr might be able to help15:24
stuartgrThiagoCMC: the object storage pages in horizon work for us. We don't have swift deployed, just a rados gateway providing the swift API listed in the service catalog15:27
*** TxGirlGeek has quit IRC15:28
stuartgrThiagoCMC: we disable the S3 API and put the following setting in ceph.conf on the rados gateway:  rgw swift url prefix = /15:32
odyssey4mestuartgr ThiagoCMC I know that fghaas also did a bunch of work recently to make it work with swift enough to pass all the appropriate swift API standard tests.15:33
ThiagoCMCOh, wow! Thank you guys so much for this help!   :-D15:38
ThiagoCMCI'll try that prefix thing...15:38
ostackzcloudnull hi, could you help to understand what are correct git commands to issue regarding systemd?15:46
cloudnullwhats going on ?15:46
ostackzWe recently had a chat on topic that syslog-remote did not receive data from other hosts, you had fix for it http://eavesdrop.openstack.org/irclogs/%23openstack-ansible/%23openstack-ansible.2019-01-22.log.html#t2019-01-22T16:32:1215:46
cloudnullyes15:47
ostackzI have fresh Rocky 18.1.2 install after that, but Im afraid to run following - not to switch my git to Stein:15:47
ostackzcd /opt/openstack-ansible; git fetch https://git.openstack.org/openstack/openstack-ansible refs/changes/05/632505/1 && git checkout FETCH_HEAD;15:47
odyssey4meostackz that fix is in rocky already - just checkout stable/rocky15:47
ostackzopenstack-ansible infra-journal-remote.yml15:47
ThiagoCMCstuartgr, I'm reading that if S3 is disabled, then, it's impossible to use multi-site configuration! This sounds undesirable but, I'll try it anyway! Also, do you add that line under [global] in ceph.conf?15:47
odyssey4meostackz also, that would have you checking out master - not rocky!15:48
cloudnullostackz what odyssey4me said, but you could also run the git command with "cherry-pick" instead of "checkout"15:48
ostackzodyssey4me, cloudnull what I would be happy to do is have this systemd patched on top of 18.1.2 but not switch away from it so I can for sure upgrade to 18.1.3 later15:50
ostackznot master of git yet so not sure if I can get patches in rocky/stable if Im in 18.1.215:51
odyssey4meostackz if you checkout stable/rocky now then you will have the currently proposed 18.1.3... and if there are further changes, you will be able to upgrade to whatever is actually released as 18.1.315:54
openstackgerritKevin Carter (cloudnull) proposed openstack/openstack-ansible-ops master: Add the ability to set the JVM heap size  https://review.openstack.org/63398415:54
odyssey4meas far as I know, the request to release went in yesterday15:54
mnaserjrosser: will you be at denver?15:55
mnaseralso asking same for cloudnull odyssey4me evrardjp and others :)15:55
mnaserunless thats too early to ask now15:55
odyssey4meostackz yep, the release request went in yesterday: https://review.openstack.org/63371915:56
odyssey4meostackz if you wish to checkout that particular SHA to match it before the tag goes out, then do: git checkout 93bcc1fd24ae1d46c8773cd218f0b63531f273b015:57
ThiagoCMCstuartgr, I added the " rgw swift url prefix = /" to my 3 radosgw containers and restarted them, however, Horizon still can't list containers, nor create them.  :-(15:57
*** gkadam-bmgr has quit IRC15:57
odyssey4meostackz that gets you the journal fix: https://github.com/openstack/openstack-ansible/commits/93bcc1fd24ae1d46c8773cd218f0b63531f273b015:57
ThiagoCMCSo I need to explicitly disable S3 somewhere as well?15:58
*** arxcruz|ruck is now known as arxcruz15:58
ostackzodyssey4me ok, starting to get it! Only thing remaining is actual commands to run, my "git tag" only shows 18.1.2 so I would do something like "cd /opt/openstack-ansible/ ; git checkout ...?" to get current stage of 18.1.3 ?15:58
odyssey4meostackz - cd /opt/openstack-ansible; git checkout 93bcc1fd24ae1d46c8773cd218f0b63531f273b016:00
*** pcaruana has quit IRC16:01
odyssey4mewell, you might need to do a little more, so let's be sure16:01
odyssey4mecd /opt/openstack-ansible; git fetch --all; git checkout 93bcc1fd24ae1d46c8773cd218f0b63531f273b016:01
*** udesale has quit IRC16:02
ostackzodyssey4me thanks! git part worked fine, now will rerun playbooks. After these commands I understand Im somewhere between 18.1.2 - 18.1.3 and when 18.1.3 comes out it should be ok to apply that?16:09
stuartgrThiagoCMC:  disable S3 API by omitting it from this list in ceph.conf:   rgw enable apis = swift, swift_auth, admin16:09
evrardjpostackz: https://review.openstack.org/#/c/633719/16:11
odyssey4meostackz unless https://review.openstack.org/633719 changes, that SHA is 18.1.316:11
jrosserThiagoCMC: regarding having to turn S3 off - we have had to deploy different radosgw instances to serve S3 and swift, otherwise you can't pass the validation tests. Both APIs really need to be served from /16:13
*** sdake has quit IRC16:14
ostackzevrardjp odyssey4me well, then if I do cd /opt/openstack-ansible; git fetch --all; git checkout I4933d91cb989b3fa3010db8528c56b616f99ce7c I actually would be in 18.1.3?16:15
ostackzWhen will that change be marked as tag 18.1.3?16:15
*** Pbing has joined #openstack-ansible16:16
odyssey4meostackz when the above-mentioned review is merged16:16
openstackgerritJuri Hudolejev proposed openstack/openstack-ansible-os_glance stable/rocky: Fix Glance NFS mount point ownership  https://review.openstack.org/63399316:17
odyssey4meusually that takes no more than 2-3 days from the release team16:17
*** pcaruana has joined #openstack-ansible16:17
*** kopecmartin is now known as kopecmartin|off16:20
*** sdake has joined #openstack-ansible16:21
ostackzodyssey4me, cloudnull, evrardjp thanks, this git checkout stuff is clearer now! My syslogs are starting to flow and I will wait for 18.1.3 coming soon.16:22
odyssey4meostackz a tag is just a human-friendly way to refer to a SHA - it's a pointer from the tag to the SHA16:27
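A toy model of the point above, not how git stores refs internally (annotated tags, for instance, go through a tag object first). It assumes, as stated in the discussion, that the release review merges unchanged so the 18.1.3 tag lands on the SHA already quoted. The `resolve` helper and the `tags` dict are hypothetical.

```python
def resolve(ref, tags):
    """Return the commit SHA for a tag name, or the ref itself if it is not a known tag."""
    return tags.get(ref, ref)


proposed = "93bcc1fd24ae1d46c8773cd218f0b63531f273b0"

# Before the release: no 18.1.3 tag exists, so the checkout uses the SHA directly.
tags = {}
before = resolve(proposed, tags)

# After the release review merges (assuming it is unchanged), the tag points at that SHA.
tags["18.1.3"] = proposed
after = resolve("18.1.3", tags)

print(before == after)
```

Either way the working tree ends up at the same commit, which is why checking out the SHA now does not block moving to the tag later.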
*** nurdie has joined #openstack-ansible16:28
*** TxGirlGeek has joined #openstack-ansible16:31
*** macza has joined #openstack-ansible16:36
*** macza has quit IRC16:37
ThiagoCMCjrosser, is this going to be like this forever?  I mean, RadosGW = no S3 and no multi-site / you need two kinds of RadosGW?16:41
jrosserthat is what stuartgr and I have built, yes16:41
ThiagoCMCOh, I see... Well, ok then!   =)16:41
odyssey4meThiagoCMC I think he's suggesting to go with a known good config, then once that's working you can experiment to get to where you want to go.16:42
ThiagoCMCSure!16:42
*** gyee has joined #openstack-ansible16:42
jrosserstart with radosgw doing swift and get that to pass tempest/refstack/whatever and the horizon integration going16:42
jrosserif you want S3 as well thats not really an OSA thing, you need to extend your ceph deployment somehow16:43
ThiagoCMCOkdok!16:43
ThiagoCMCThanks for clarifying that!16:43
ThiagoCMCMaybe for S3, go with Swift instead of Ceph? To be an OSA thing then?16:43
jrosseryou can probably construct enough extra config for some radosgw-s3 containers and haproxy endpoints to do that16:43
jrosserit would all just be extra config data16:43
ThiagoCMCok16:44
ThiagoCMCIs it possible to use Swift for S3 while hosting data on Ceph somehow?16:44
ThiagoCMCOr it doesn't make any sense?  lol16:44
jrosser^ that, i think :)16:44
odyssey4meThiagoCMC ceph-rgw's advantage is using a common back-end for storage with your block storage (cinder) too... so same hardware = more flexibility16:44
odyssey4meswift does things that ceph doesn't, and ceph does things swift doesn't16:45
odyssey4meyou really have to figure out what's important to you for your use-case16:45
odyssey4meyou can also have both as far as I know, although then you'd have to double up storage servers which probably doesn't make much sense16:45
jrosserwe also use a dedicated haproxy for this because we have massive object traffic, so odyssey4me is right you need to tune up your deployment to match your use case16:45
*** pcaruana has quit IRC16:45
ThiagoCMCNice! What's important for me is S3 and multi-site...16:48
ThiagoCMCBut we're using Ceph for everything16:48
jrosserthe same here, minus the multi-site16:49
ThiagoCMCcool  =)16:49
*** electrofelix has quit IRC16:49
jrosserit would be nice to come up with an example to document this in the ops repo, if you choose to do it all with the OSA/ceph integration16:49
*** sdake has quit IRC16:51
*** TxGirlGeek has quit IRC16:56
*** TxGirlGeek has joined #openstack-ansible16:57
jrossercloudnull: this you just rechecked https://review.openstack.org/#/c/633104/17:00
cloudnullyes?17:01
jrossertake a read of this https://review.openstack.org/#/c/633732/ which i think is the same reason it is failing, but in a different role17:01
*** sdake has joined #openstack-ansible17:01
cloudnulllooks like that merged yesterday?17:02
cloudnullso this recheck should work, I assume?17:02
*** radeks_ has quit IRC17:02
*** radeks_ has joined #openstack-ansible17:02
jrosserso, i have a theory that changing the host networking to systemd_network has broken things on centos17:03
jrosserwhich shows itself as container traffic in role tests heading out eth0 / default route can no longer get to 10.1.3.x on br-vlan17:03
jrosserwe held a node the other day and poked it after one broke, and that was certainly the case17:04
cloudnulldid the ip exist on the interface?17:04
jrosserso either we go round all the role tests and fix them up, or there is something subtle happening with systemd_networkd on centos which breaks the previous behaviour17:05
jrosserno it didnt17:05
cloudnullwas the ip in config though ?17:05
jrosserbefore the tests relied on the default route and host networking for tempest to ssh to the instances17:05
cloudnullor was it all together missing17:05
jrosseri added one, and it started working https://review.openstack.org/#/c/633732/2/tests/host_vars/tempest1.yml17:06
cloudnullah17:06
jrosserbut it's a bit ewwwwwww because that's now on all the containers, not just tempest17:06
jrosserand thats very much crossed the streams of internal/external networking, which i don't really like17:06
*** priteau has quit IRC17:08
cloudnullwe could add a local only route instead of an IP address?17:08
jrosserwell, ideally there is something we can do in the systemd_networkd role to mend whatever is wrong on centos17:09
jrosserotherwise we have to visit all the role tests and fix them up17:09
cloudnullsomething like `ip route add 10.1.3.103/24 dev br-vlan table local`17:09
cloudnullwell. `ip route add 10.1.3.0/24 dev br-vlan table local`17:09
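A small sketch of the correction between the two commands above: the first names a host address with a /24 mask, the second names the network prefix that contains it. Python's standard `ipaddress` module can derive one from the other. The helper name `route_prefix` is hypothetical, and the printed command only echoes the one quoted in the discussion.

```python
import ipaddress


def route_prefix(address_with_mask):
    """Return the network prefix that contains the given host address."""
    return str(ipaddress.ip_interface(address_with_mask).network)


prefix = route_prefix("10.1.3.103/24")
print(f"ip route add {prefix} dev br-vlan table local")

# A strict network parse rejects an address that has host bits set.
try:
    ipaddress.ip_network("10.1.3.103/24")
    host_bits_rejected = False
except ValueError:
    host_bits_rejected = True
```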
jrosseron the host?17:10
cloudnullyes17:10
cloudnullisn't br-vlan supposed to have an IP on it ?17:10
jrosseryes it has one, the .1 is on there17:10
cloudnullok17:10
cloudnullbut in the container theres no address17:11
jrosserright - eth12 was there and wired, and hooked to br-vlan17:12
cloudnullok17:12
jrosserso the "fix" for tempest was to stick an IP on it, and to keep the test host_vars simple, on all the other containers too17:12
jrosserbut this is only centos brokenness17:12
cloudnull"\17:13
* cloudnull <begins rant about centos> 17:13
cloudnulldo we need to implement that fix in the tempest role generally ?17:13
jrosserwe can do, but it's a lot of patches, so if there is a route we can add somewhere globally that would be preferable17:14
cloudnullso in the event we have br-vlan and eth12 we need a route added to the cidr used by br-vlan ?17:17
mnasercloudnull: i'll be removing container support in centos-7 -- which i believe largely affects/caused by this17:18
cloudnullmnaser it sounds like an issue regardless of containers.17:18
mnasersorry i see bridges and assume containers :)17:18
mnaseri didn't read context17:19
*** nurdie has quit IRC17:19
* cloudnull just getting up to speed myself 17:19
cloudnullso maybe it is17:19
jrossercloudnull: adding the ip to eth12 automatically gives the route in the container and everything works17:19
*** nurdie has joined #openstack-ansible17:19
jrosserbut it does leave open what is actually broken on the host compared to bionic, for example17:20
jrossersmells like some systemd version nonsense17:20
mnaserok y'all17:20
cloudnullcould be17:20
mnaserwhen are we just going to start rolling out our own distro17:21
mnaserim down for it at this point17:21
cloudnullgentoo ?17:21
mnaserlatest kernels to support latest tech17:21
cloudnull-cc prometheanfire :)17:21
mnaserone common deployment target17:21
cloudnulljrosser itd be awesome to compare the routing table of cent vs bionic17:22
cloudnullif its just something missing then we can simply make sure the missing bits are added17:22
cloudnullwhich test did this crop up on ?17:22
cloudnullI can spin a cent and bionic test instance of the same thing to try and recreate it17:23
mnaseri know the systemd_networkd bits arent super reliable (the ones that create routes and all)17:23
jrosseros_tempest, but that got attention because there is quite a lot of activity there and it was blocking stuff merging17:23
mnaserlike it has a lot of "|| true" or things like that17:23
mnaserso lots of things can break and it just noops17:23
bgmccollumis the current discussion why my flat external network wasn't working properly?17:23
prometheanfirewat17:23
prometheanfirecloudnull: cent7 non-networkd issues?17:24
jrossercloudnull: if you could compare centos vs. bionic routing now the pressure is off unblocking tempest that would be ace17:24
cloudnullbgmccollum kinda.17:24
mnaserprometheanfire: we're just deciding we're dropping support for all operating systems and just doing gentoo from now on17:24
mnaserwith our own hand rolled kernels and os17:24
prometheanfiremnaser: sgtm17:24
prometheanfireI've been hand-rolling my kernels forever :P17:25
odyssey4memnaser cloudnull so... we're just gonna turn ourselves into Piston?17:25
prometheanfirelol17:25
cloudnullhahahaha17:25
bgmccollumpxe -> initrd -> dd -> done?17:25
mnaserminus the part where we disappear :(17:25
cloudnullodyssey4me exactly :)17:25
mnaserrip piston17:25
cloudnullthey had the best parties17:25
mnasermaybe that contributed to their disappearance17:25
mnaserahaha17:25
cloudnullhahaha17:25
cloudnullcould be17:25
odyssey4melol17:26
prometheanfireyou can always judge a project by their parties17:26
mnaseri always say hp died off17:26
mnaserbecause of those crazy parties17:26
mnaserlol17:26
prometheanfirepossible17:26
cloudnullim sure that was part of it17:26
cloudnulljrosser so just run tests from os_tempest?17:26
jrosserdo whatever the role test in the gate does17:26
cloudnullok.17:27
odyssey4mecloudnull you'd need to revert the fix jrosser made - then do run_tests.sh functional17:30
mnaserbtw, cores -- just a heads up17:31
mnaserhttp://lists.openstack.org/pipermail/openstack-discuss/2019-January/002176.html and http://lists.openstack.org/pipermail/openstack-discuss/2019-January/002237.html17:31
*** DanyC has quit IRC17:31
odyssey4methat reminds me, I am just not getting those emails - I need to figure out why17:31
odyssey4meoh yes, very welcome jamesdenton17:31
*** DanyC has joined #openstack-ansible17:31
jrosseroh thats cool - i'm already plotting an adventure with networking-vpp17:33
cloudnullYES! welcome jamesdenton!17:35
* cloudnull goes to update my list subscription ... 17:35
mnaserodyssey4me: are you subscribed to openstack-discuss?17:36
*** DanyC_ has joined #openstack-ansible17:37
*** DanyC has quit IRC17:37
mnaseropenstack-dev and openstack are no longer a thing, moved to openstack-discuss17:37
odyssey4memnaser I have requested it several times, but I see that nothing's ever come back to me. I'll work out what's going on tomorrow.17:37
odyssey4meTime for me to exit for the day. Cheers folks!17:38
mnaserlater odyssey4me17:40
*** cmart has quit IRC17:43
jamesdentonweeeeee17:43
jamesdentonhttps://www.youtube.com/watch?v=wSqWc88Qj4U17:44
jamesdentonthanks mnaser odyssey4me cloudnull. happy to help!17:45
mnaserlols17:46
*** DanyC_ has quit IRC17:52
ThiagoCMCstuartgr, I only have "rgw_enable_apis = swift", so I believe that S3 isn't there but, I don't have swift_auth, nor admin. Under which config [group] do you have your "rgw swift url prefix = /"?18:02
ThiagoCMCHere is my ceph.conf from rgw container (from OSA): http://paste.openstack.org/show/744271/18:03
ThiagoCMCHorizon still can't list object containers, nor create them.18:03
ThiagoCMC:-18:03
stuartgrThiagoCMC:  "rgw swift url prefix = /"  should be under  [client.rgw.vucmon-1-ceph-rgw-container-875ddb12]18:08
ThiagoCMCOh, thanks!18:08
ThiagoCMCJust on first container?18:09
ThiagoCMCbetter replicate 3 * 3, right?18:09
stuartgryes, on all containers18:09
ThiagoCMCok18:09
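A sketch of the ceph.conf fragment being described, rendered with Python's `configparser`. The two settings and the section name come from the discussion above; the helper name `rgw_section` is hypothetical, and the section name would differ for every rgw container in a given deployment. This only prints text for review and does not touch a real ceph.conf.

```python
import configparser
import io


def rgw_section(instance_name):
    """Render a per-instance [client.rgw.*] section serving only the swift API at /."""
    parser = configparser.ConfigParser()
    parser[f"client.rgw.{instance_name}"] = {
        "rgw swift url prefix": "/",
        # S3 is disabled by leaving it out of this list.
        "rgw enable apis": "swift, swift_auth, admin",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Section name quoted in the discussion; repeat per rgw container.
text = rgw_section("vucmon-1-ceph-rgw-container-875ddb12")
print(text)
```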
nsmedshey guys - if we're going to upgrade from Queens to Rocky, do we need to be on latest version of Queens before upgrading?18:12
nsmedsusing 17.1.2, looks like newest is 17.1.718:12
nsmedserr, 17.1.6*18:13
*** shardy has quit IRC18:15
ThiagoCMCstuartgr, still error: "Unable to get the Swift container listing"   :-(18:16
*** pcaruana has joined #openstack-ansible18:17
ThiagoCMCcould you share a working ceph.conf from a ceph-rgw container?18:19
stuartgrswift API expected to be on port 7980 I think, so where you have  "rgw frontends = civetweb port=172.29.236.172:8080 num_threads=100"   that should be  "rgw frontends = civetweb port=172.29.236.172:7980 num_threads=100"18:22
stuartgrand similar on other containers18:22
jrosserThat needs to match up with the haproxy config18:23
ThiagoCMCYeah, I'm just using the OSA/Rocky defaults...18:23
jrosserAnd the object storage entry in the device catalogue18:23
ThiagoCMCsounds hard to deploy it via openstack-ansible and then, change a bunch of things...18:23
jrosser*service18:23
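A hedged sketch of the consistency point above: the port in `rgw frontends` has to line up with what haproxy forwards to, and the haproxy listener has to line up with the object storage endpoint in the service catalogue. The function names, the haproxy port arguments and the endpoint URL are assumptions for illustration; only the `rgw frontends` string is taken from the discussion.

```python
import re
from urllib.parse import urlsplit


def frontend_port(frontends_value):
    """Extract the listen port from an `rgw frontends` value, or None if absent."""
    match = re.search(r"port=(?:[\d.]+:)?(\d+)", frontends_value)
    return int(match.group(1)) if match else None


def port_problems(frontends_value, haproxy_backend_port, haproxy_listen_port, endpoint_url):
    """Return a list of human-readable mismatches between the three layers."""
    problems = []
    rgw_port = frontend_port(frontends_value)
    if rgw_port != haproxy_backend_port:
        problems.append(f"rgw listens on {rgw_port}, haproxy forwards to {haproxy_backend_port}")
    catalog_port = urlsplit(endpoint_url).port
    if catalog_port != haproxy_listen_port:
        problems.append(f"catalogue points at {catalog_port}, haproxy listens on {haproxy_listen_port}")
    return problems


# Hypothetical endpoint URL; the frontends strings mirror the ones quoted above.
endpoint = "http://172.29.236.10:8080/swift/v1"
consistent = port_problems(
    "civetweb port=172.29.236.172:8080 num_threads=100", 8080, 8080, endpoint
)
after_manual_edit = port_problems(
    "civetweb port=172.29.236.172:7980 num_threads=100", 8080, 8080, endpoint
)
print(consistent, after_manual_edit)
```

Changing only the rgw port by hand shows up as a single mismatch, which is the situation jrosser warns about.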
ThiagoCMCI'm deploying both OpenStack and Ceph (osd/mon/mgr/rgw) via OSA.18:24
ThiagoCMCI was expecting that it would just work out of the box  :-P18:24
jrosserosa just wraps ceph-ansible18:24
ThiagoCMCsure18:24
ThiagoCMCI can tell that my ceph cluster is healthy, glance works, nova boot and cinder create/attach, all using rbd18:25
jrosserAs far as out of the box goes there is a ceph test in the osa gate which should be your reference18:25
ThiagoCMCHmmm18:25
jrosserThat deploys all the bits enough to pass tempest with object storage18:25
ThiagoCMCI see... Well, it isn't working for me...  :-(18:28
ThiagoCMCBut, okay18:28
stuartgrThiagoCMC:  ceph.conf from one of our rgw containers:   http://paste.openstack.org/show/744272/18:30
ThiagoCMCThank you!!!18:30
ThiagoCMCYou had to change the haproxy settings for that port, right?18:32
ThiagoCMCIs there an OSA way of doing this? Or did you just manually change it after ansible?18:32
openstackgerritMerged openstack/openstack-ansible-os_tempest master: Disable nova-lxd tempest plugin  https://review.openstack.org/63371118:33
ThiagoCMCOk, I'm also manually updating the port to 7980 at /etc/haproxy/conf.d/ceph-rgw18:43
ThiagoCMCSame problem...   =/18:44
ThiagoCMCNever mind...18:44
ThiagoCMCI gave up on OSA / radosgw / horizon for now18:44
jrosserThe haproxy config is generated from here https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/haproxy/haproxy.yml#L34318:47
jrosserThiagoCMC: the thing here is that the OSA role defaults/main.yml and the host/group vars set “sensible defaults”18:48
jrosserYou are free then to use the user_variables.yml or your own host/group bars to override any of that as you need18:49
jrosser*vars18:49
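The layering jrosser describes can be pictured, very roughly, as a later mapping overriding an earlier one. This is a simplified Python sketch only: real Ansible variable precedence has more levels than two, and the variable names below are hypothetical placeholders, not actual OSA variables.

```python
# Hypothetical stand-ins for values a role's defaults/main.yml might ship.
role_defaults = {"example_rgw_port": 8080, "example_threads": 100}

# Hypothetical stand-ins for what a deployer puts in user_variables.yml.
user_overrides = {"example_rgw_port": 7980}

# Later mappings win, so any key the deployer sets replaces the default,
# while untouched keys keep their default value.
effective = {**role_defaults, **user_overrides}
print(effective)
```

The point is that a deployer changes the override file and re-runs the playbooks, rather than hand-editing the generated ceph.conf or haproxy files afterwards.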
*** TxGirlGeek has quit IRC18:50
*** TxGirlGeek has joined #openstack-ansible18:51
*** Darcidride_ has joined #openstack-ansible18:52
*** Darcidride_ has quit IRC18:54
*** priteau has joined #openstack-ansible18:54
*** Darcidride_ has joined #openstack-ansible18:55
*** Darcidride_ has quit IRC18:55
ThiagoCMCOk! I manually changed the rgw's ceph.conf and haproxy but, Horizon still can't list/create new containers on it.19:00
*** TxGirlGeek has quit IRC19:00
ThiagoCMCAnd I'm just shooting in the dark now, which isn't good. I'll try again later on, maybe on next OSA/Stein release and hope that it will just work19:00
*** TxGirlGeek has joined #openstack-ansible19:02
openstackgerritGuilherme  Steinmuller Pimentel proposed openstack/openstack-ansible-os_heat master: Add heat user to heat domain admin role  https://review.openstack.org/63403219:06
ThiagoCMCguilhermesp, is this missing on OSA/Rocky stable branch?19:09
ThiagoCMCQuick question... 1 TASK from os-neutron-install.yml is failing only in 1 of 3 machines, error:19:26
ThiagoCMCfatal: [c3bb-os-cmpt]: FAILED! => {"attempts": 5, "changed": false, "cmd": "/openstack/venvs/neutron-18.1.3/bin/pip2 install -U  --constraint http://172.29.239.250:8181/os-releases/18.1.3/ubuntu-18.04-x86_64/requirements_absolute_requirements.txt  neutron_dynamic_routing19:26
ThiagoCMCneutron_fwaas neutron_lbaas", "msg": "\n:stderr: Traceback (most recent call last):\n  File \"/openstack/venvs/neutron-18.1.3/bin/pip2\", line 7, in <module>\n    from pip._internal import main\nImportError: No module named pip._internal\n"}19:26
ThiagoCMCTASK: os_neutron : Install optional pip packages19:27
*** cmart has joined #openstack-ansible19:51
guilhermespThiagoCMC: yes. We need to wait for it to be merged in master to backport it19:59
*** priteau has quit IRC20:01
*** sdake has quit IRC20:06
*** sdake has joined #openstack-ansible20:06
ThiagoCMCOk!20:08
cmartwill a lot of you be in Denver at the end of April? strongly considering actually going to one of these20:08
mnasercmart: i will be personally :)20:08
cmartforgive the dumb question, will there be an OSA gathering / track?20:09
mnasercmart: we have the ptg which is the project team gathering where the OSA team has a few days where we sit and scope up the next cycle20:09
mnaserand we also do some hacking20:09
mnaserand some hanging out :)20:10
cmartright on. is that at the summit or before?20:10
openstackgerritGuilherme  Steinmuller Pimentel proposed openstack/openstack-ansible-os_tempest master: Only init a workspace if doesn't exists  https://review.openstack.org/63354920:11
*** sdake has quit IRC20:39
*** sdake has joined #openstack-ansible20:43
*** sdake has quit IRC20:44
*** mgariepy has quit IRC20:45
*** sdake has joined #openstack-ansible20:46
*** hwoarang has quit IRC20:50
*** hwoarang has joined #openstack-ansible20:51
*** sdake has quit IRC21:01
*** sdake has joined #openstack-ansible21:02
cloudnulljrosser mnaser odyssey4me - I spun up a test instance and ran the tempest tests (cloned then executed run-tests.sh) for the os_tempest repo21:04
cloudnullubuntu worked, cent failed.21:04
cloudnulland it failed on the python venv build21:04
cloudnullis that something we're aware of, or maybe it's just a transient thing21:04
cloudnullthat said, the routing table looks the same across distros21:04
jrosserthings have passed recently https://review.openstack.org/#/q/project:openstack/openstack-ansible-os_tempest21:05
jrosserif you enter the tempest1 container can you ping 10.1.3.x things?21:05
cloudnullyes https://pasted.tech/pastes/c77ef1cb7f59953c7b7ef45d5f55c2fbc9a71ee921:07
jrosserthis was the symptom of it http://logs.openstack.org/08/633208/9/check/openstack-ansible-functional-centos-7/95a334e/logs/openstack/tempest1/stestr_results.html21:07
* cloudnull looking21:07
jrosserand this is with my patch reverted?21:09
cloudnulloh no.21:12
cloudnullthat was clone of master21:12
* cloudnull can do another one 21:12
*** radeks_ has quit IRC21:19
cloudnulljrosser my test didnt get that far21:22
cloudnullfor cent21:22
*** zul has quit IRC21:27
jrosserRight, but it made the containers, and added ip to eth12?21:28
cloudnullneither deployment has an IP on eth1221:34
cloudnullthough it does exist21:34
cloudnullhttps://pasted.tech/pastes/5003f296d04e28653d705f1a0e9170e7fd50b5d521:35
cloudnullthats what i see in br-vlan21:35
cloudnullthese are the interface files I see /etc/systemd/network/6-general-eth12-veth.link  /etc/systemd/network/6-general-eth12-veth.network21:36
*** sdake has quit IRC21:38
*** nurdie has quit IRC21:47
*** nurdie has joined #openstack-ansible21:48
*** sdake has joined #openstack-ansible21:49
*** openstackgerrit has quit IRC21:50
*** nurdie has quit IRC21:52
*** Pbing has quit IRC22:10
cloudnullwelp. i think i found the difference22:12
cloudnullxenial and bionic use `/sbin/ip` while centos uses `/usr/sbin/ip` and suse uses `/bin/ip`.22:13
cloudnulland the tests role just uses /usr/sbin/ip22:14
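A small Python sketch of the general idea of not hard-coding one location for `ip`: probe the paths mentioned above and fall back to a PATH lookup. The function name and the search order are assumptions made for illustration, and this is not how any patch to the tests role does it.

```python
import os
import shutil

# Locations mentioned in the discussion; the search order is an assumption.
CANDIDATE_IP_PATHS = ("/usr/sbin/ip", "/sbin/ip", "/bin/ip")


def find_ip_tool(candidates=CANDIDATE_IP_PATHS):
    """Return the first executable `ip` binary found, or None if there is none."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    # None of the known locations exist, so fall back to searching PATH.
    return shutil.which("ip")


print(find_ip_tool())
```

On a host where `/sbin` is a symlink to `usr/sbin`, the first candidate already resolves, so the remaining entries are never consulted.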
* cloudnull shakes fist at ALL THE LINUXES! 22:14
cloudnullmnaser now im ready to build a distro22:14
cloudnull:D22:14
*** ostackz has quit IRC22:15
apevecFWIW /usr/sbin/ is correct and for backward compat centos has symlink /sbin -> usr/sbin22:18
*** apevec has quit IRC22:18
ThiagoCMCcloudnull, Linux from Scratch?  lol22:19
cloudnullapevec +1 agree /usr/sbin is correct22:19
cloudnullso in this case I agree with centos :)22:19
cloudnull^ a first I'm sure22:20
*** DanyC has joined #openstack-ansible22:40
*** strattao has quit IRC22:41
*** openstackgerrit has joined #openstack-ansible22:45
openstackgerritKevin Carter (cloudnull) proposed openstack/openstack-ansible-tests master: Set the ip tool path for each OS  https://review.openstack.org/63405722:45
*** sdake has quit IRC23:01
*** hwoarang has quit IRC23:04
*** hwoarang has joined #openstack-ansible23:05
*** hwoarang has quit IRC23:10
*** hwoarang has joined #openstack-ansible23:11
*** nurdie has joined #openstack-ansible23:24
*** sdake has joined #openstack-ansible23:29
*** sdake has quit IRC23:33
*** slaweq has quit IRC23:38
*** nurdie_ has joined #openstack-ansible23:49
*** nurdie has quit IRC23:50
*** nurdie_ has quit IRC23:56
*** nurdie has joined #openstack-ansible23:57
