Wednesday, 2022-03-30

OutBackDingois there a secret ansible recipe to make this all work00:20
OutBackDingoTASK [Authenticate to the cloud and retrieve the service catalog] *************************************************************************************************************************************************************************************************************************************************************00:21
OutBackDingofatal: [localhost]: FAILED! => {"ansible_facts": {"discovered_interpreter_python": "/usr/bin/python3"}, "changed": false, "msg": "Cloud default was not found."}00:21
OutBackDingorunning openstack-ansible -i inventory playbooks/healthcheck-openstack.yml fails with the above message00:22
jrosserOutBackDingo: that is running against localhost, and would be looking for clouds.yaml or an openrc file i think07:09
jrosserOutBackDingo: this is relevant https://github.com/openstack/openstack-ansible/blob/4d6c3a2ec743e149505e5b9c936dacee6d6d4379/releasenotes/notes/openstack-service-setup-host-f38d655eed285f57.yaml07:11
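The failure above ("Cloud default was not found") means the openstack.cloud modules could not find a cloud entry named "default" in a clouds.yaml (looked up in the working directory, ~/.config/openstack/ or /etc/openstack/) or equivalent openrc credentials. A minimal sketch of such a file, with every value a placeholder to be replaced by the deployment's real endpoint and credentials:

    # clouds.yaml (placeholder values)
    clouds:
      default:
        auth:
          auth_url: https://keystone.example.com:5000/v3
          username: admin
          password: CHANGEME
          project_name: admin
          user_domain_name: Default
          project_domain_name: Default
        region_name: RegionOne
        interface: internal
        identity_api_version: 3
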
*** arxcruz is now known as arxcruz|out07:15
jrosseron the other hand we really only use this script in our CI tests and don't document its use outside of that, so.......07:17
opendevreviewMerged openstack/openstack-ansible-ops master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/83569608:35
opendevreviewMerged openstack/openstack-ansible-plugins master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83572708:46
noonedeadpunkshould we merge https://review.opendev.org/c/openstack/ansible-role-pki/+/830794 and https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/830179/9 ?09:53
jrosseri would certainly like the first one as it fixed some broken behaviour09:54
jrossersecond one i am maybe not confident about the IDP parts09:56
opendevreviewMerged openstack/openstack-ansible-rabbitmq_server master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/83572809:56
opendevreviewMerged openstack/openstack-ansible-memcached_server master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-memcached_server/+/83569309:58
jrossernoonedeadpunk: the keystone patch is probably OK - except we do not test k2k and neither do i have a deployment like that09:59
jrosserso testing if the IDP changes are working is not something i have had an opportunity to do09:59
opendevreviewMerged openstack/openstack-ansible-lxc_hosts master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/83569210:03
noonedeadpunkok, gotcha10:10
opendevreviewMerged openstack/openstack-ansible-rsyslog_client master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-rsyslog_client/+/83573010:15
OutBackDingoanyone ? openstack firewall groups inactive... shouldnt it be active 10:31
OutBackDingocant ping instance10:31
opendevreviewMerged openstack/openstack-ansible-repo_server master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/83572910:38
jrosserOutBackDingo: I think we'd need a bit more context than that to understand what you mean10:41
OutBackDingoso ive deployed openstack-ansible, and launched an instance, we cant ping that instance from the compute host10:42
OutBackDingo@jrosser im not deeply familiar enough with openstack networking to debug this10:42
OutBackDingoonly thing i see in the ui is where it says the openstack firewall group is inactive.. not sure it even means anything related10:44
OutBackDingobut i would expect it should be active10:45
jrosserOutBackDingo: do you think that the compute node should be able to ping the instance? that will only be possible if you've set up the networking to enable that to happen10:48
jrosseryou would need the vm to be on a flat network which is the same as, or can route to the compute node network10:49
jrosseror a vlan network that can do the same10:49
jrosserand with / without a neutron router as your use-case required10:49
jrosseror a vxlan network with a neutron router and provider network that can contact the compute node10:50
jrossermany many possibilities, all enabled by openstack-ansible but you need to decide which is appropriate and configure it that way10:51
OutBackDingoeven trying to console the vm i get "Something went wrong, connection is closed"11:05
*** dviroel|out is now known as dviroel11:17
admin18OutBackDingo, you are not supposed to ping the instance from the compute host12:00
admin18they should not see each other .. unless they are via a router12:00
OutBackDingo@admin1 ok so if not, how can i 1) debug the instance console 2) validate it has internet via ping12:07
admin1first thing is to fix the console ..   and once console is there,  launch a cirros instance that allows you to  login with the default user/pass and then check if you can ping the gateway from inside the instance12:08
noonedeadpunkOutBackDingo:  2) if it's part of internal network, then you should have router to be created for that network. routers are served with neutron-l3-agents and in fact these are simple network namespaces. So you should be able to ping from there12:09
admin1noonedeadpunk beat me to that 12:09
admin1you should be able to ping the internal ip of the instance via the router namespace 12:10
OutBackDingook, how do i login to the router instance12:23
admin1first check where the router is 12:26
admin1and then ssh to that node 12:26
admin1and then  ip netns exec $namespace   bash 12:26
admin1and then you can do things inside like ip -4 a ; iptables -L -n -t nat  etc 12:27
OutBackDingoso it says 10.16.64.110router_gateway12:28
OutBackDingo"check where the router is ? meaning on what node is sitting12:29
OutBackDingosays its active on controller-312:30
OutBackDingo@admin1  ip netns exec $namespace   bash ?? what defines $namespace12:31
OutBackDingoqrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d612:32
admin1yep12:32
admin1ssh to controll3, and then run    ip netns exec qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 bash 12:32
admin1then you are inside the namespace12:32
admin1ip -4 a ; ifconfig ; ip route show ;  iptables -L -nvx -t nat    .. will show you details 12:32
OutBackDingonope nothing12:32
admin1it wont say "Welcome to namespace message" .. typical unix  .. but echo $? is 0 :) 12:33
OutBackDingoroot@controller-3:~# ip netns exec qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 bash12:33
OutBackDingoroot@controller-3:~#12:33
OutBackDingonada12:33
admin1yes 12:33
admin1that is how it is 12:33
admin1the bash is now inside the namespace12:34
OutBackDingooh12:34
admin1there is no docker/python type helper to let you know where you are 12:34
admin1ip -4 a ; ifconfig ; ip route show ;  iptables -L -nvx -t nat   12:34
admin1those will show you the details you need to know .. 12:34
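Pulled together, the namespace inspection admin1 describes looks roughly like this (the router UUID and node are the ones from this conversation; substitute your own):

    # on the node hosting the router (controller-3 here)
    ip netns list                                          # confirm the qrouter- namespace exists
    ip netns exec qrouter-24bd726d-b3b3-4b9d-9447-4a5e322518d6 bash
    # the shell is now inside the namespace (no prompt change, but echo $? returns 0)
    ip -4 a                                                # router interfaces and their IPs
    ip route show                                          # default route toward the provider network
    iptables -L -nvx -t nat                                # floating IP / SNAT rules
    ping <instance internal IP>                            # test reachability toward the VM
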
OutBackDingoanyone want to take a look i can pastebin it12:35
OutBackDingo@admin1 and should i be able to ping the instance ips from inside the router12:38
jrosserfrom the network namespace you should be able to ping the internal IP of the instance12:41
jrosserand you should also be able to ping "outward" to whatever the next hop is on your external/provider network12:42
jrosseryou could paste that stuff if you like12:43
admin1based on your security groups, you can or cannot ping ..  12:45
OutBackDingook seems i can ping the instance ips, the floating ips and the primary router interface ip 12:45
OutBackDingobut cannnot ping the listed gateway12:45
admin1but if you ping and then run tcpdumps, you will be able to trace the packets 12:45
admin1so router -> instance can ping ? 12:45
admin1then you to need trace your external network .. as in why packets are not reaching the gateway 12:46
admin1how is the external network ? is it flat or vlan based ? 12:46
OutBackDingovlan12:47
admin1linuxbridge or openvswitch ? 12:49
admin1you can tcpdump in the  br-vlan interface and see if you see tagged packets leaving the physical interface12:50
admin1if you do, then you have to check if switch is allowing tagged and if the router can see the mac/packets in that vlan 12:50
admin1you need to do a bit of arp/mac address hunting in the switch and router to see in which interface they appear 12:50
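A quick way to look for those tagged frames on the physical side is a vlan filter in tcpdump (interface and vlan id taken from later in this conversation; adjust to your own):

    # -e prints the link-level header, including the 802.1Q tag
    tcpdump -e -n -i bond0 vlan 2464
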
OutBackDingolinuxbridge12:51
admin1cat /proc/net/vlan/config .. check if you can see the vlan tag added to the right interface 12:51
OutBackDingofrom the controller i can ping 10.16.64.112:51
admin1that is a different route 12:51
admin1how you can ping from the controller is different from how you can ping from the namespace12:52
OutBackDingoso seems something not tying the router interface 10.16.64.100  to 10.16.64.112:52
admin1i have 17 mins before a meeting .. i can help you over zoom if you can share the screen .. 12:53
admin1can you pastebin the output of cat /proc/net/vlan/config12:54
admin1you should be able to see br-vlan.$TAG    on br-vlan 12:54
admin1if you see that, it means 99% of the time your side is OK .. and then you need to check the switch/router side 12:54
OutBackDingoyeah br-vlan seems wonky12:56
OutBackDingobond0.1696     | 1696  | bond012:57
OutBackDingobr-vlan.2464   | 2464  | br-vlan12:57
admin1that is good 12:57
admin1the tag is proper 12:57
OutBackDingowhere in netplan br-vlan is bond0.169612:57
admin1and how it should be 12:57
OutBackDingoso whats br-vlan.246412:57
admin1you cannot add br-vlan on top of a vlan 12:57
admin1that is your problem right there12:57
admin1do this 12:57
admin1give bond0 to br-vlan 12:57
admin1that way, it will add the right tag to the bond 12:58
OutBackDingogive bond0 to br-vlan ?12:58
admin1your br-vlan  CANNOT be on top of a tagged interface  unless you are using it as flat or vlan Q-in-Q12:58
lowercaseCorrect, its Bond -> then Bridge then Vlan12:58
admin1in your netplan, br-vlan is what ? its  bond0.1696 ? 12:59
OutBackDingoyes12:59
admin1make it br-vlan ; interface => bond0 12:59
admin1without any vlan tags12:59
admin1br-vlan needs to own the bond0 without any tags 12:59
admin1nothing will break ..12:59
OutBackDingohuh... its the same on the compute hosts also13:00
OutBackDingoughhh13:00
admin1its a simple netplan generate;apply and restarting of the l3 agents 13:01
admin1you don't need to re-run any playbooks 13:01
OutBackDingoso it should be actually13:01
OutBackDingo       br-vlan:13:01
OutBackDingo            interfaces:13:01
OutBackDingo            - bond013:01
admin1yes13:01
admin1because neutron adds the tags when you create an external network .. so it will create and pass the right vlan on the bond 13:02
admin1right now its trying to send 2464 on top of 1696 13:02
admin1so br-vlan is always on bond0 untagged 13:02
admin1at most you may need to delete the  router and readd the external network 13:03
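As a netplan fragment, the change admin1 is describing is just this (only the relevant bridge shown; neutron creates the tagged sub-interfaces on bond0 itself when a vlan network is defined):

    bridges:
      br-vlan:
        interfaces: [ bond0 ]    # untagged member
        dhcp4: false
        dhcp6: false
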
OutBackDingocrap now netplan throwing an error /etc/netplan/50-cloud-init.yaml:59:15: Error in network definition: br-vxlan: interface 'bond0.1680' is not defined13:06
OutBackDingoalso note br-mgmt: is on bond0 with an ip, can i add br-vlan on bond0 also without ip 13:07
opendevreviewMerged openstack/ansible-role-pki master: Refactor conditional generation of CA and certificates  https://review.opendev.org/c/openstack/ansible-role-pki/+/83079413:19
spatelOutBackDingo i am using netplan in my cloud, here is the sample of one of infra node - https://paste.opendev.org/show/bT7L4Gw4SBtdccIJIe4g/13:29
admin1OutBackDingo, you can 13:32
admin1the way neutron uses it will be with a vlan tag13:32
admin1so your normal traffic and tagged traffic are different 13:32
lowercaseOutBackDingo: here is a sample one from one of my hypervisors13:39
lowercasehttps://paste.opendev.org/show/bqoF0VQU8vty3FxaO3gB/13:39
spatellowercase no MTU 9000 ? 13:46
lowercaseeverything should be mtu 900013:46
lowercaselet me check13:47
lowercasehuh.... dev doesn't have mtu set to 9k, but prod does13:48
spatelI am using MTU 9000 on only br-vxlan and br-storage. We have lots of legacy servers in DC which is configured for 1500 default so trying to avoid two kind of MTU :)13:48
spatel+113:48
lowercaseOur dev environment runs on significantly older hardware so its possible there is a hardware issue preventing that. I'll look into it..... later13:49
OutBackDingook something really funky, i cannot add br-vlan on bond0, netplan then says  /etc/netplan/50-cloud-init.yaml:59:15: Error in network definition: br-vxlan: interface 'bond0.1680' is not defined14:03
OutBackDingoill pastebin this config... 14:04
NeilHanlonlowercase: possibly has to do with network config, too. jumbo frames need to be supported across the net14:12
OutBackDingodo i need tagged vlans at all ?14:15
OutBackDingoits a small cloud... 14:15
NeilHanlonyou should separate your management, data, storage, etc planes, yes14:15
OutBackDingo@NeilHanlon why ?14:16
OutBackDingologic ?14:16
OutBackDingowhat is the logic for it?14:16
lowercaseSecurity, isolation, and preventing from cross talk.14:17
lowercaseI also want to say that it helps with the logical layout as well. It's easier to to know where things go and how they should be "organized" on the network14:18
lowercaseSecurity is the number 1, however.14:18
NeilHanlonbandwidth considerations as well. for small clouds you don't necessarily need more than a single interface. I suppose it's also possible to do everything on a single network without segregation, but it'd likely require significant modification to the config14:18
lowercaseI almost say it would be harder to do without vlans than with lol14:19
NeilHanlonheh, I agree14:19
NeilHanlonOutBackDingo: this is a good read w.r.t. OSA https://docs.openstack.org/openstack-ansible/latest/user/network-arch/example.html14:20
OutBackDingoyeah im just being asked by higher powers14:20
lowercaseAre you point lead on this project?14:20
OutBackDingono, im the guy trying to figure out why qrouter networking is broken14:21
jrosserOutBackDingo: tbh it sounds like you are nearly there14:21
OutBackDingoi get the vlan configuration, just not so much how they did it14:21
jrosserhost networking is always hard, just needs wrangling till it works14:22
OutBackDingo@jrosser i agree14:22
jrosserand every-single-time, reconfiguring a host from complex-setup-A to complex-setup-B is fraught with trouble14:22
NeilHanloni'll be honest setting up the initial external network/routing stuff in openstack is something I perpetually struggle with14:22
jrosserit's quite often better to reboot into the new config than try to do complex changes in-place14:23
OutBackDingo@jrosser yeah when netplan isn't complaining, br-vlan is wanting to go on untagged14:24
NeilHanlonjrosser++ on that. host network stacks are not the most... stable APIs :) 14:24
OutBackDingoand then throws br-vxlan under the bus14:24
admin1paste the config you have for netplan now OutBackDingo14:24
admin1pastebin*14:24
OutBackDingohttps://pastebin.com/DGLmeu4k14:26
OutBackDingopersonally id move br-mgmt to bond0.1680 and br-vlan to bond014:27
OutBackDingojust reversing them14:27
OutBackDingoearmm bond0.1696 rather14:29
admin1OutBackDingo, an equivalent one from one of my clusters  .. https://pastebin.com/raw/uMDHD7en14:29
NeilHanlonYeah, that looks weird to me. IMO br-vlan cannot be a bridge on top of a vlan interface14:29
admin1and OutBackDingo, you don't need to give nameservers for br-vxlan, etc 14:29
admin1maybe the one i pasted helps you simplify yours 14:30
OutBackDingoyupp welp that broke it14:44
OutBackDingocant even ping it now, maas rescue mode to save the day14:45
OutBackDingo@admin1 is this same netplan config on ALL your hosts? controller/compute and storage14:47
noonedeadpunkSo when I had a single interface, I used a vlan interface for the public network, and used bond0 for br-vlan14:55
*** dviroel is now known as dviroel_15:04
*** dviroel_ is now known as dviroel15:04
*** dviroel is now known as dviroel|lunch15:44
admin1yes ..  storage has 1 more bond for replication 15:49
noonedeadpunk(unless mgmt net is not used for that for whatever reason :))15:52
noonedeadpunkbut yes, absolutely!15:53
OutBackDingo@admin1 meaning yes, same config on all hosts, except your replication bond15:55
OutBackDingoseems im having an issue adding br-vlan on bond0 when br-mgmt is also there15:56
OutBackDingobr-mgmt has the ip15:56
OutBackDingobut it never comes up / cant ping it after a reboot15:56
OutBackDingowondering if i should move the ip to br-vlan15:57
OutBackDingosee if that shakes it loose15:57
noonedeadpunkwhy not create another vlan for br-mgmt?15:59
noonedeadpunkso bond0 - br-vlan, bond0.100 - public, bond0.200 - br-mgmt, bond0.300 - br-storage15:59
noonedeadpunkor smth like that16:00
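A rough netplan sketch of that layout (vlan IDs 200/300 and the NIC names are illustrative placeholders; the br-mgmt address is the one from the pastebins in this conversation, the br-storage one is made up, and a public vlan / br-vxlan would follow the same pattern):

    network:
      version: 2
      bonds:
        bond0:
          interfaces: [ enp1s0f0, enp1s0f1 ]   # placeholder NIC names
          parameters:
            mode: 802.3ad
      vlans:
        bond0.200:
          id: 200
          link: bond0
        bond0.300:
          id: 300
          link: bond0
      bridges:
        br-vlan:
          interfaces: [ bond0 ]                # untagged; neutron adds its own tags on top
          dhcp4: false
          dhcp6: false
        br-mgmt:
          interfaces: [ bond0.200 ]
          addresses: [ 10.16.48.23/24 ]
        br-storage:
          interfaces: [ bond0.300 ]
          addresses: [ 10.16.49.23/24 ]        # placeholder
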
OutBackDingo@noonedeadpunk meaning bond0.100 is just an alias on bond0 - same network interfaces16:02
OutBackDingotheres only 1 bonded pair of interfaces in this box16:02
OutBackDingo2 100Gb16:03
noonedeadpunkI think alias would be bond0:100?:)16:04
noonedeadpunkI meant vlans16:05
noonedeadpunkoh, wait, you can't have vlans16:05
noonedeadpunkdamn16:05
noonedeadpunkclean forgot16:05
noonedeadpunkthen disregard16:05
noonedeadpunkOutBackDingo: basically, you don't need br-vlan then at all16:05
OutBackDingoyupp my issue seems to be making br-vlan happy on bond0 along with br-mgmt16:05
OutBackDingouhmm whys that ?16:06
noonedeadpunkif you can't pass vlans through your switch this is smth you don't need for sure16:06
noonedeadpunkand if you can pass vlans - then just make mgmt/stor/vlan as separate vlans16:07
noonedeadpunk(as I suggested)16:07
noonedeadpunkas for tenant networks you need to have only vxlans to be fair16:07
NeilHanlonit would also probably help to see the network config on the switch side. it's hard to guess at how your switches are configured16:07
noonedeadpunkand for vxlan you don't need even a bridge - it can be anything having IP address through which vxlan would be built16:08
jrosser2 x 100Gb without vlans? surely not.....16:10
OutBackDingo@NeilHanlon not sure where switch configuration matters here, when im just trying to configure br-vlan untagged on bond0 and it doesnt work16:11
jrosseri am pretty confused by all of this tbh16:14
jrosserOutBackDingo: if you can ask a simple question, showing the config and the error output and it's surrounding context, things might be easier to understand16:14
admin1OutBackDingo, is your ports on hybrid mode ? 16:14
admin1that they allow both tagged and untagged traffic at the same time 16:14
admin1its configured in the switches 16:14
admin1based on switch, you have a default pv ( vlan) and tagged vlans16:15
OutBackDingoyes16:15
admin1so its not trunk or switchport, but something in middle 16:15
OutBackDingocorrect16:15
admin1to test, you can also directly add an ip on the bond0 and test it 16:15
jrosseradmin1: but we struggle with netplan here, not switches?16:15
OutBackDingoadmin1: there is an ip on the bond16:15
OutBackDingo    br-mgmt:16:16
OutBackDingo            addresses:16:16
OutBackDingo            - 10.16.48.23/2416:16
OutBackDingo            gateway4: 10.16.48.116:16
OutBackDingo            interfaces:16:16
OutBackDingo            - bond016:16
admin1the netplan i use for a similar use case is the one working for me .. bond0 untagged that does ssh for mgmt, ( i added ips on br-vlan) and then tags on bond0 for the api 16:16
admin1this is on br-mgmt 16:16
admin1so now br-mgmt is directly on bond0 ? 16:16
admin1then your br-vlan will not work 16:16
OutBackDingoit always was directly on bond016:17
OutBackDingowhich is why br-vlan isnt working16:17
admin1ip on bond0 does not interfere with br-vlan sitting on top of bond0 16:17
admin1ip on bond0 or br-vlan on bond0 is untagged ..      neutron adds a new tagged interface on bond0 and send traffic 16:18
admin1so they do not interfere with one another 16:18
admin1first thing for you to do .. remove all tags etc .. just add bond0 in the netplan and then ping  your gateway .. 16:18
OutBackDingowelp something does because if i put br-vlan on bond0 ... and reboot i cant get back onto it16:18
admin1if it works, then slowly add the rest 16:18
admin1what is the netplan file that you have, before you reboot ? 16:18
OutBackDingohttps://pastebin.com/9mP9hXfP16:20
NeilHanlonOutBackDingo: the switch config will determine how you can configure your server's networking, so it does matter16:20
OutBackDingo@admin1 ^16:22
OutBackDingopastebinned it16:22
jrossermaybe i ask a silly question, but why try to do this all through one bond when there appear to be 6 interfaces?16:24
OutBackDingoNeilHanlon: yes i get that but none of it will  matter unless i can get br-vlan on bond0 with br-mgmt also16:24
noonedeadpunkOutBackDingo: I truly do not understand why you just don't create another vlan for br-mgmt???16:24
OutBackDingo@jrosser i didnt design it...  and im told 2 x 100gb is plenty for this small setup16:25
OutBackDingonoonedeadpunk: i tried, didnt work16:25
noonedeadpunkwhat didn't work?:)16:25
OutBackDingoif you look at the pastebin i tried to move br-mgmt to bond0.169616:25
OutBackDingowhich is what br-vlan was on16:25
OutBackDingobasically reversing them16:25
jrosseryou know br-vlan kind of represents a trunk port (tagged) ?16:26
noonedeadpunkWell, I don't see that on pastebin you provided :) And not sure what didn't work16:26
OutBackDingoright now it seems like br-mgmt with ip is only happy on bond0 and refuses to share bond0 with br-vlan16:26
noonedeadpunkyes, you can not have br-vlan and br-mgmt on same interface16:27
noonedeadpunkas neutron will takeover br-vlan16:27
noonedeadpunkBut I'm pretty sure that br-mgmt on another vlan is good idea16:27
jrosserwe have an example file too https://github.com/openstack/openstack-ansible/blob/master/etc/netplan/01-static.yml16:28
noonedeadpunkOR, you can just skip having br-vlan - do you need vlans for your tenants?16:28
noonedeadpunkas this is why it's even existing16:28
OutBackDingo@noonedeadpunk maybe try the previous pastebin https://pastebin.com/DGLmeu4k16:28
noonedeadpunkin most of deployments ppl use _only_ vxlans16:28
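For reference, a vxlan-only tenant network in openstack_user_config.yml follows the pattern used in the OSA example configs, something like the sketch below (bridge, interface and range values are illustrative and need to match the actual deployment):

    # provider_networks entry for vxlan tenant networks (illustrative values)
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "eth10"
        ip_from_q: "tunnel"
        type: "vxlan"
        range: "1:1000"
        net_name: "vxlan"
        group_binds:
          - neutron_linuxbridge_agent
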
noonedeadpunkBut what didn't work when you set br-mgmt on bond0.1696 ?:)16:29
OutBackDingo@noonedeadpunk correct16:29
noonedeadpunkthat was a question :)16:30
noonedeadpunkyou have changed that on all hosts?16:30
OutBackDingoi basically tried to reverse br-vlan on bond0.1696 and br-mgmt with ip on bond016:30
OutBackDingomaking br-vlan bond0 and br-mgmt bond0.1696 and rebooted16:31
OutBackDingoand after the reboot could not ping / access the node16:31
OutBackDingoas br-mgmt has the primary ip16:31
noonedeadpunkok, but you should have done that on other nodes then as well16:31
noonedeadpunkor well, then you don't have the vlan reachable / routable 16:32
OutBackDingowelp, shit! you're right16:32
jrosserimho a simple 1G port that you use for ssh aside from all this other stuff is worth a very large amount16:32
noonedeadpunkso likely this vlan is only available inside the switch and not passed further on16:32
OutBackDingo bridges:16:33
OutBackDingo    br-mgmt:16:33
OutBackDingo      interfaces:16:33
OutBackDingo      - bond016:33
OutBackDingo      mtu: 900016:33
OutBackDingoyupp seems the infra host i was on is on bond0 also16:33
OutBackDingoso it wouldnt be able to talk to same network on bond0.169616:34
noonedeadpunkbut before changing everything, it's worth thinking if you _really_ need to provide vlans to your tenants in addition to vxlan. Likely you might want if you decide to deploy trove or octavia...16:34
OutBackDingotherefore, can i move the ip from br-mgmt to br-vlan16:34
noonedeadpunkI won't do that16:34
OutBackDingoand reverse the bond0 / bond0.169616:34
noonedeadpunkas I said, br-vlan would be taken over by neutron16:34
noonedeadpunkwhich means it will be part of other bridge, so IP won't work basically16:35
noonedeadpunkso in fact br-vlan should be jsut regular interface, like bond016:35
jrosserultimately you have to pass an interface to neutron, not a bridge, right?16:36
noonedeadpunkyup16:36
noonedeadpunkit can be bridge though16:36
noonedeadpunkbut it's quite obscure to see bridge inside bridge16:37
noonedeadpunkor well, not bridge inside bridge, but vlan, created on top of bridge inside other bridge16:37
noonedeadpunkso like br-vlan.1000 inside generated by neutron bridge16:38
noonedeadpunktons of unnecessary overhead16:38
noonedeadpunkfrom other side easy to switch underlying interface16:39
OutBackDingoso basically what im hearing is br-mgmt w/ ip and br-vlan cannot both reside on bond016:40
OutBackDingoi cant move br-mgmt w/ip off bond0 to a vlan on say bond0.1696 unless i move literally all hosts to same vlan bond0.1696 w/ ip16:41
OutBackDingoand then add br-vlan to bond0 on all hosts16:41
OutBackDingoright!@16:41
noonedeadpunkI have no idea why you can't move br-mgmt there16:42
OutBackDingonoonedeadpunk: i can if i do it to every single host16:43
OutBackDingobecause every single host has br-mgmt w/ ip on bond016:44
noonedeadpunkIt's only guess, but as I said, likely your bond0.1696 is not passed somewhere16:44
noonedeadpunklike to the router16:44
noonedeadpunkor not configured on the router properly16:44
noonedeadpunkso networks that are defined there are not routable16:44
noonedeadpunkwhich might be valid though if it wasn't the IP you're using to reach the environment16:44
OutBackDingoexactly...16:45
OutBackDingoits not an out of band mgmt ip, its the primary16:45
OutBackDingothere isnt a out of band ip16:45
noonedeadpunkbut well, it's doable :) I mean I literally faced same thing when was deploying my first cloud years back :)16:45
jrosser^ mistake :)16:45
noonedeadpunkand yeah...16:45
OutBackDingo@jrosser i didnt design it 16:46
jrosseri have two completely independent means to get into each server beside the openstack mgmt interface16:46
noonedeadpunkyup, me too16:46
jrosserreality just is that you need some options for when $real-life does something unexpected16:46
OutBackDingowell we have 3 networks on vlans per machine16:46
OutBackDingobut if that bond0 never comes up for X reasons, you're done16:47
jrosserbut they're all on the same interface to the same switches.....16:47
OutBackDingoyupp16:47
jrosserfor a production environment thats kind of not good when things go wrong16:47
OutBackDingowhich is a point ill raise to the higher powers16:48
jrosserhow do you upgrade the firmware on the nic, or some other disruptive thing.....16:48
OutBackDingoas a "design" modification16:48
OutBackDingo@jrosser disruptive as in renaming every primary bond0 to bond0.xxxx which has your primary IP16:49
OutBackDingoi get it16:49
OutBackDingoand can see it clearly16:49
jrosserwell yeah, if you need to reconfigure the interface in a way that tears down / rebuilds the config, you can't be ssh'd in over the same thing16:49
OutBackDingoand maas deployed won't let you login to the ipmi console either16:50
jrosseran option is some sort of KVM / remote screen on IPMI, but thats really for emergency as its very inflexible16:50
jrosserlike no ssh keys, no copy/paste and so on16:50
OutBackDingo@jrosser cannot console ipmi with a maas deployment to my knowledge and login16:51
OutBackDingoall hosts only accessible via ssh16:51
OutBackDingocan only reboot into rescue undo the breakage and reboot16:52
jrossersomething else to feed back is the 100G NIC are really overkill16:52
jrosseryou won't be able to utilise even a fraction of that with linuxbridge16:52
OutBackDingopersonally id break the bonded pair into two separate interfaces16:52
jrossermaybe - upgrading switch firmware has things to say about that16:53
jrosseryou can figure a lot of this out by working through what you'd do for managment / operational tasks16:54
jrosserlike replace a server / upgrade a nic firmware / deal with a broken switch16:54
jrosserhow do you keep things working enough in all those situations and still have sufficient access to things16:54
OutBackDingoeither way i can see where the deployment  as far as network config goes needs a "back door" for ssh access16:54
jrosserwhat must always carry on, and what is only an inconvenience if it's down16:55
jrosseropenstack-ansible has no problem with that alternative ssh access be the one that the playbooks run over16:55
*** dviroel|lunch is now known as dviroel17:00
admin1OutBackDingo, does it ping if you give the ip directly on bond0 and nothing else ? remove everything else .. 17:01
OutBackDingoyes as br-mgmt is on bond017:01
OutBackDingoand its primary ip17:01
OutBackDingothen 10.16.48 network works on the interface plain dhcp even on single 100gb nic, or bridge / bond017:07
OutBackDingoi guess a good logic test is to move br-mgmt w/ ip to a bond0.1696 vlan on 2 hosts and test the connectivity between the two hosts 17:09
jrossersounds good - start as simple as you can and test connectivity at each step17:13
OutBackDingoLOL i need a sed via ssh to every host to rewrite this file and reboot all nodes17:21
admin1first test it in 2 only ...  17:50
admin1your config is wrong though .. you are adding bond0 on top of br-mgmt and  br-vlan is on top of bond0.1672 and 1680  .. so your vlan will not work 17:51
OutBackDingo@admin1 uhmmm17:52
OutBackDingoill check that17:53
OutBackDingo@admin1 meanwhile here is the proposed fix17:53
OutBackDingohttps://pastebin.com/6tHAJSzJ17:53
OutBackDingobased on your netplan, with our interfaces/bonds/vlans17:54
admin1yes, but here you don't have your ssh ip :) 17:56
OutBackDingo@admin1 sure i do 17:57
OutBackDingo10.16.48.23/2417:57
OutBackDingolast line17:57
admin1ok .. as long as you can reach it , that is fine17:57
OutBackDingo    br-mgmt:17:57
OutBackDingo      mtu: 900017:57
OutBackDingo      interfaces: [ bond0.1696 ]17:57
OutBackDingo      addresses:17:57
OutBackDingo        - 10.16.48.23/2417:57
admin1now add br-vlan and give it bond0 17:57
admin1don't add any ip or anyhing .. just br-vlan .. interfaces bond0 .. dhcp4 false dhcp6 false 17:58
admin1that should not bring anything down 17:58
OutBackDingo@admin1 final https://pastebin.com/LnHAZAvq18:01
OutBackDingohah ill add the dhcp4 / dhcp6 false18:02
admin1yep .. do the same in another node and test ping between the br-storage, vlan and mgmt18:02
admin1if they all ping fine, replicate to all nodes 18:02
OutBackDingoand no nameservers / no macs correct18:03
admin1macs are not needed since all ips are static 18:03
admin1and nameservers are needed 18:03
admin1so that apt-get update etc will work 18:03
OutBackDingowell nameservers go in /etc/resolv.conf also18:04
OutBackDingoahhh netplan populates it18:04
admin1netplan does it for you 18:04
OutBackDingoso just on br-mgmt18:04
admin1right 18:04
admin1after that, run    curl gw.am  .. if dns and routing is good, it will return back your current outgoing public ip 18:05
admin1when working with a lot of clouds, and to check if all is good, i wrote my own service on gw.am which shows back your ip in curl 18:05
admin1that way, i know a vm is working fine 18:05
OutBackDingo@admin ok last one https://pastebin.com/ksg2kSCD18:07
admin1looks good .. i never use search domains though ..  search domain maas .. 18:08
admin1i would also add dhcp4/6 false under storage and others to prevent it looking for dhcp during bootup 18:09
admin1so storage and vxlan also add dhcpX false18:10
jrosserwhen thinking about the mtu two of these matter18:11
jrosserthe vxlan packets ideally are roughly 1500+vxlan header18:11
jrosserand if you want best performance from some shared storage then you’d want that as 900018:12
jrosserbut otherwise things that need to connect outside of your deployment should stay as 150018:12
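Expressed in netplan, those MTU suggestions could look like the fragment below (bridge names and vlan IDs are illustrative, loosely following the ones used earlier in this conversation; the bond and vlan interfaces underneath, and the switch ports, also need to allow the larger frames):

    bridges:
      br-mgmt:                        # API/ssh traffic that may leave the deployment: keep 1500
        interfaces: [ bond0.1696 ]
        mtu: 1500
      br-vxlan:                       # room for the ~50 byte vxlan header on top of a 1500 byte payload
        interfaces: [ bond0.1680 ]
        mtu: 1600
        dhcp4: false
        dhcp6: false
      br-storage:                     # jumbo frames for storage, if the switches allow it
        interfaces: [ bond0.1672 ]
        mtu: 9000
        dhcp4: false
        dhcp6: false
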
OutBackDingoso add the dhcp false, and add mtu 900018:16
OutBackDingo@jrosser all storage is on the nodes18:17
OutBackDingonothing external18:17
OutBackDingoits all ceph18:17
OutBackDingoand i really appreciate all the input, help, critiques and insight18:18
OutBackDingofrom you all18:18
jrosserwhen you get this working pay attention to where the storage traffic actually goes18:18
OutBackDingojrosser: meaning ?18:19
jrosserif you follow the layout in the openstack-ansible all-in-one as a reference then you will have ceph traffic on the mgmt network18:19
OutBackDingoall compute nodes have their own "ceph osds" and we used ceph-ansible on the nodes prior, we didnt deploy openstack-ansible's ceph18:20
jrosserensure that the ceph cluster and replication network cidr end up as you expect in the various ceph.conf18:20
jrosseroh ok sure that’s ok18:20
OutBackDingo@jrosser thumbs up!18:20
jrossermake sure you reserve sufficient hypervisor memory for what ceph needs when it’s converged like this - admin1 did you ever run with osd on compute nodes?18:22
admin1i tested HCI .. but i don't have one in production 18:24
admin1can't properly plan resources around it .. 18:24
jrosserOutBackDingo: more experiences here ^18:25
admin1if you have ceph with ec2, and an osd goes down and at the same time you end up with fairly busy instances on a 1:6 share or even 1:4 share, you see cpu spikes 18:25
jrosserI’ve got a few osd hosts with more ram than some compute nodes18:26
admin1plus it makes the resource planning and ratios usage calculation not  exact .. like how much are you going to oversell/overuse cpu and ram 18:26
jrosserwhen ceph has to deal with dead disks or a big rebalance then the memory usage can be really significant18:27
jrosserOutBackDingo: “in general” most serious deployments have kept the ceph hardware separate from compute nodes18:28
jrosseras admin1 says, dealing with resource allocation can become very tricky18:28
OutBackDingosmall 6 node cluster each with 512 memory18:30
jrosseramusingly the HCI vendors are now pushing this fancy new model with separate storage nodes :)18:30
OutBackDingoall nodes have like 30+ TB storage in osd, + an intel nvme card18:33
OutBackDingoso yeah small clusters18:33
OutBackDingoand with that it's 1:30AM here, so time to sleep18:36
OutBackDingoill catch up tomorrow let all know how it goes18:36
admin1good luck 18:40
admin1anyone confirmed yet for openstack summit berlin ? 18:41
mgariepyhmm auto patch from tests where are the rules defined ? as in https://review.opendev.org/c/openstack/openstack-ansible-tests/+/835468 this one should be pushed to all other repos.19:40
mgariepynoonedeadpunk, jrosser ^^19:41
jrosserI don’t think we sync tox.ini19:41
jrosseroh it’s setup.py….19:42
mgariepyalso where is that job defined ? 19:42
mgariepylol19:42
mgariepythe sync one.19:43
jrosserI think it’s partly in the tests repo and partly in (?)system-config19:43
jrosserI’m still not sure that setup.py is synced19:45
jrosserbut there are a ton of our repos broken for the same thing19:45
mgariepynope, not synced19:46
mgariepyhttps://github.com/openstack/openstack-ansible-tests/blob/master/sync-test-repos.sh#L11819:46
mgariepyi guess that even if we do add it it won't sync back the old commits.. 19:48
mgariepyi will patch the repos in like 20 minutes, let's see what we do for that one after.19:50
opendevreviewMerged openstack/openstack-ansible-plugins master: Update ssh_keypairs role to fix module for Rocky Linux 8  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/83515220:05
opendevreviewMarc Gariépy proposed openstack/ansible-role-python_venv_build master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/83589220:15
opendevreviewMarc Gariépy proposed openstack/ansible-role-systemd_mount master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/ansible-role-systemd_mount/+/83589320:15
opendevreviewMarc Gariépy proposed openstack/ansible-role-systemd_service master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/83589420:16
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-ceph_client master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/83589520:17
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_aodh master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/83589620:17
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_ceilometer master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/83589720:18
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_glance master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/83589820:18
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_gnocchi master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_gnocchi/+/83589920:19
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_magnum master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/83590020:19
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_manila master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_manila/+/83590120:19
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_neutron master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/83590220:20
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_octavia master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/83590320:20
* NeilHanlon mutes his email notifications for a bit :) 20:20
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_sahara master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_sahara/+/83590420:20
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_tacker master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_tacker/+/83590520:20
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_trove master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_trove/+/83590620:21
mgariepylol  .. sorry .. 20:21
NeilHanlonhehe no worries20:21
NeilHanloni needed to setup a mail filter anyways, just sorta forces the issue :D 20:21
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_zun master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/83590720:22
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_horizon master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_horizon/+/83590820:28
mgariepyjrosser, not all repos had the setup.py .. ..20:28
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_tempest master: Disable setuptools auto discovery  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/83590920:30
mgariepyshould be the last one.. lol20:30
jrossermgariepy: I expect we have a few different issues to look at20:30
mgariepyprobably lol20:31
mgariepyat least the doc seems to be passing :D20:36
mgariepyi should have added a topic :/20:39
mgariepywell these are the only patch i had in review so..20:39
NeilHanlonhmm, is it worth me code reviewing your changes mgariepy if I can only give them a +1 ?21:02
jrosserNeilHanlon: that is the pathway to +2, if you are interested in that….21:04
NeilHanlonsounds dangerous ;) 21:14
*** dviroel is now known as dviroel|out21:16
*** ianw_pto is now known as ianw22:24
opendevreviewMarc Gariépy proposed openstack/openstack-ansible-os_tempest master: Updated from OpenStack Ansible Tests  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/83572423:04
