Friday, 2020-05-08

*** billzvonar has quit IRC [01:02]
*** stampeder has quit IRC [03:58]
*** sgw has quit IRC [05:39]
*** ijolliffe has joined #starlingx [12:27]
*** sgw has joined #starlingx [14:20]
<sgw> Morning all [14:20]
*** stampeder has joined #starlingx [15:08]
<stampeder> Good morning. The adventure continues. Overnight I tested the controller-0 PXE server issue. I connected controller-0 and my new PXE client server to a simple hub. When I tried to run the PXE client on controller-1, it displayed PXE-E51 (no DHCP). So I took my trusty Lenovo Yoga and tried it. Again, no success. I found two different PXE checkers, so I [15:18]
<stampeder> downloaded them. One is called PXEChecker and the other 2Link. I fired both up individually and found that they displayed the same no-DHCP errors. Not wanting to be dissuaded, I fired up good old Wireshark. I put it on the hub as well and started monitoring traffic. I powered up controller-1 and saw the DHCP discover message from source 0.0.0.0 to destination [15:18]
<stampeder> 255.255.255.255 with a length of 342 bytes. There should have been a DHCP offer sent back by the server in response, but no such response ever showed up. So while my netstat query on controller-0 shows the PXE server listening on port 86, it is most definitely not responding to any discover requests. Thoughts? [15:18]
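The 342-byte length in the capture is what a minimal DHCPDISCOVER looks like on the wire: a 14-byte Ethernet header, a 20-byte IPv4 header, an 8-byte UDP header, and a 300-byte BOOTP payload (the BOOTP minimum, padded with zeros). The client sends from UDP port 68 to server port 67, and a working DHCP/PXE server answers with a DHCPOFFER. The sketch below builds such a payload with the standard library only, to show the layout Wireshark decodes. It is illustrative: the MAC address and transaction ID are made-up placeholders, and real PXE clients add more options (so their frames can be larger).

```python
import struct

# Header sizes in bytes for the frame length Wireshark reports.
ETH_HDR, IPV4_HDR, UDP_HDR = 14, 20, 8
BOOTP_MIN = 300  # minimum BOOTP message size; shorter messages are zero-padded

DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
DHCPDISCOVER = 1


def build_dhcp_discover(mac: bytes, xid: int) -> bytes:
    """Build a minimal DHCPDISCOVER BOOTP payload (no IP/UDP headers)."""
    if len(mac) != 6:
        raise ValueError("expected a 6-byte Ethernet MAC address")
    fixed = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC length
        0,            # hops
        xid,          # transaction ID chosen by the client
        0,            # secs
        0x8000,       # flags: broadcast bit, asks the server to broadcast its reply
        b"\0" * 4,    # ciaddr: client has no address yet (0.0.0.0)
        b"\0" * 4,    # yiaddr: filled in by the server in its offer
        b"\0" * 4,    # siaddr
        b"\0" * 4,    # giaddr
        mac.ljust(16, b"\0"),  # chaddr: client hardware address
        b"\0" * 64,   # sname
        b"\0" * 128,  # file: a PXE server names the boot file here in its reply
    )
    # Options: magic cookie, option 53 (message type) = DISCOVER, option 255 (end).
    options = DHCP_MAGIC_COOKIE + bytes([53, 1, DHCPDISCOVER, 255])
    return (fixed + options).ljust(BOOTP_MIN, b"\0")


payload = build_dhcp_discover(bytes.fromhex("001122334455"), 0x12345678)
frame_len = ETH_HDR + IPV4_HDR + UDP_HDR + len(payload)
print(len(payload), frame_len)  # 300 342
```

The fixed BOOTP header is 236 bytes, the cookie and three options add 8 more, and padding brings the payload to 300, which is why a bare discover shows up as 342 bytes.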
<stampeder> Ah, correction: the second test tool is called 2Pint, not 2Link. D: [15:21]
<stampeder> Arghh, bad morning. The PXE server listens on port 67, not 86. Sorry. [15:29]
<sgw> stampeder: dumb question: which kind of switch do you have? If it's a "smart switch", any chance you have just a dumb, simple switch? As bwensley pointed out, we do this all the time. Are you hooked up on what you configured as the "mgmt port" on C-0, rather than the OAM port? So if your eno1 is your OAM and eno2 is your MGMT network, make sure C-0 and C-1 are both on their eno2 ports, because eno1 on C-1 will also be configured as OAM. [15:54]
<stampeder> sgw: I have a couple of switches, including one that's dumb as a post. The dumb one is the one I used. At this point I have mgmt, OAM and the data LAN all on port eno1. So here's the thing: I installed Simplex AIO according to the docs. What the docs don't show is a typical server port setup (eth0, eth1, etc.). There are two kinds of management networks [16:15]
<stampeder> in the data center: in-band and out-of-band. In-band is typically a layer-3 network riding on the data port, i.e. eth0. Out-of-band is typically another dedicated Ethernet port, sometimes called BMC or IPMI. This is confusing, I know, but as a network and data center guy it makes total sense. It's when I get to where the documentation says MGMT port that [16:15]
<stampeder> confusion sets in. Hence my request to see a drawing of how the network for a duplex environment looks. In Figure 1 of the Bare Metal All-in-one Duplex Installation R3.0 doc I count no fewer than 7 ports connected to controller-1. Very confusing when one tries to put this together with real server hardware. Hope this helps. What a journey... [16:15]
<stampeder> One other thing: in my realm, OAM is considered an out-of-band management network. It's typically a DRAC, BMC or IPMI connection. It stays powered even when the server is powered off, so we can boot the server remotely from the off state. [16:19]
<sgw> Wait, you initially set up C-0 as AIO-Simplex? If you're trying to connect to that, it won't work; it's Simplex. You need to configure C-0 as Duplex for it to work correctly. Also, you can't have OAM and mgmt on the same port/NIC. I will admit that I am more of a systems / Linux distro guy than a data center networking guy; what you say about in-band/out-of-band makes sense. You could make the out-of-band/IPMI port the mgmt port. [16:40]
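sgw's rule that OAM and mgmt cannot share a port/NIC can be checked on paper before anything is cabled. This is a hypothetical sketch, not a StarlingX tool: it models each host as a plain dict mapping network names to interface names, checks only the two conditions stated in this conversation (a mgmt interface exists, and it is not the OAM interface), and does not model VLAN-tagged setups.

```python
from typing import Dict, List

# Hypothetical cabling plan for one host: network name -> interface name.
HostPorts = Dict[str, str]


def check_host(host: str, ports: HostPorts) -> List[str]:
    """Return human-readable problems with one host's network-to-port plan."""
    problems = []
    mgmt = ports.get("mgmt")
    if mgmt is None:
        # Controller-1 network boots over mgmt, so it must exist on every host.
        problems.append(f"{host}: no mgmt interface assigned")
    elif ports.get("oam") == mgmt:
        problems.append(f"{host}: OAM and mgmt share {mgmt}")
    return problems


# The layout described above: everything on eno1.
print(check_host("controller-0", {"oam": "eno1", "mgmt": "eno1", "data": "eno1"}))
# sgw's suggested split: OAM on eno1, mgmt on eno2.
print(check_host("controller-0", {"oam": "eno1", "mgmt": "eno2"}))
```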
<sgw> I know the image you speak of, and yes, it's confusing. I think some better labeling would help, along with an explanation that some of those items can share the same network port. [16:41]
<stampeder> sgw: Do you know if anyone else is using actual servers to run this software, or are they mostly using desktop devices like the NUC? Don't get me wrong, I like the NUC and would love to have a couple, but my real world is servers and hardware switches, routers and now virtualization. [16:43]
<sgw> We are using servers in the test labs for sure. Since I work from home, I have been dealing with NUCs. I have the Hades Canyon, which has just 2 NICs, but I was able to set up a "standard" 2+2+2 with it. So I know it can be done. [16:45]
<bwensley> stampeder: if you installed your controller-0 as AIO-simplex, it cannot be used to boot other servers. That is the cause of your problem. [16:46]
<sgw> I know there are some folks that have Data Center deployments and Proofs of Concept [16:46]
<bwensley> stampeder: there are hundreds (if not thousands) of "actual servers" running StarlingX (or the previous commercial product from Wind River it is based on). [16:48]
<sgw> bwensley: right, I think that might be part of stampeder's issue [16:48]
*** slittle1 has quit IRC [16:48]
<sgw> But I also agree with him that we need to simplify the Standard network diagram in the documentation, or give it a better explanation; not sure I am the right person to provide that. [16:49]
<stampeder> bwensley: I don't doubt you for a minute. What I was referring to was people running this in their home labs on servers. [16:49]
<bwensley> I agree about the docs. [16:49]
<sgw> stampeder: it should not be any different. If you configured it as Simplex, it won't do duplex! [16:50]
<bwensley> stampeder: if there are specific issues in the docs that led you astray, please raise an LP and we will improve them. [16:50]
<bwensley> stampeder: I don't understand the difference between a "home lab" and a "work lab" or "company lab". A server is a server. [16:51]
<stampeder> Thanks to both of you. I believe you have pointed out where this is going off the rails. So my question is: are there three different ISOs, one for simplex, one for duplex and one for cloud? The docs don't mention this; they only refer to one ISO. The Bare Metal AIO-DX install doc sets out exactly the same steps at the beginning [16:54]
<stampeder> as the Simplex one. It only starts changing after one unlocks controller-0. Then you have to boot controller-1 by network booting it. Tell me where I have gone wrong? Thanks. [16:54]
<bwensley> One ISO that supports all configs. [16:54]
<bwensley> The key difference for simplex vs. duplex starts in the localhost.yml file you use when doing the Ansible bootstrap. [16:56]
<bwensley> system_mode: simplex [16:56]
<bwensley> vs. [16:56]
<bwensley> system_mode: duplex [16:56]
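bwensley's point that simplex vs. duplex is decided by a single key in localhost.yml, and that a simplex controller-0 cannot network-boot other hosts, can be made concrete with a small pre-flight check. This is a minimal sketch with hypothetical helper names, not part of StarlingX: the parser is a naive line scan (stdlib only, no YAML library), and only the two modes discussed here are accepted, so anything else (including "cloud", which bwensley says does not exist) is rejected.

```python
from typing import Optional

# Only the modes mentioned in this discussion; StarlingX may accept others.
KNOWN_MODES = {"simplex", "duplex"}


def read_system_mode(text: str) -> Optional[str]:
    """Return the top-level system_mode value from localhost.yml-style text.

    Deliberately naive: handles only an unindented 'system_mode: value' line.
    Use a real YAML parser in practice.
    """
    for line in text.splitlines():
        if line.startswith("system_mode:"):
            value = line.split(":", 1)[1].split("#", 1)[0].strip()
            return value.strip("'\"") or None
    return None


def can_boot_other_hosts(mode: str) -> bool:
    """A simplex controller-0 will not PXE-boot controller-1; duplex will."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unsupported system_mode: {mode!r}")
    return mode != "simplex"


mode = read_system_mode("system_mode: duplex\ndns_servers:\n  - 8.8.8.8\n")
print(mode, can_boot_other_hosts(mode))  # duplex True
```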
<stampeder> bwensley: It's all in the configuration. We never do duplex, HA or geo-diversity in home labs. Also, security is a very low priority. A work lab is similar but a little more stringent. A company lab is typically pre-production, so it is a copy of what goes into the real world. Same box, different use cases. [16:57]
*** slittle1 has joined #starlingx [16:58]
<stampeder> bwensley: THANK YOU!!!!!!! First time through this and I totally missed that little gem. YOU ARE MY NEW HERO!!! I'm off to Oz now to build a duplex install. Quick question: does the same system_mode change apply for cloud? Is there a setting system_mode: cloud? [16:59]
<stampeder> sgw: YOU are my hero too. [17:01]
<stampeder> (y) [17:01]
<sgw> stampeder: I am sorry you had so many false starts. It would be great if you would be willing to file some Launchpad bugs with comments about the documentation (https://bugs.launchpad.net/starlingx) [17:04]
<stampeder> False starts come with the territory. Getting open source code to run THE FIRST TIME is ALWAYS a challenge, and I have been doing that my whole career in open source. Perseverance and help from the communities have always been the key for me. Now, as to your question: I am about to start a new job. I can't disclose my new employer just yet, but once I'm [17:08]
<stampeder> on board I will certainly be more than willing to document my adventures so far. [17:08]
*** slittle1 has quit IRC [17:08]
<bwensley> stampeder: no cloud system-mode [17:08]
<sgw> stampeder: Understood! Good luck on your new job! Looking forward to working with you in the future; I will do what I can to help. [17:10]
<stampeder> bwensley: Thanks. Good to know. I'm sure I'll have a couple of questions when I get past the duplex and on to the cloud. [17:10]
<stampeder> sgw: Thanks, you have been a great help. [17:10]
*** slittle1 has joined #starlingx [17:14]
*** slittle1 has quit IRC [17:32]
*** slittle1 has joined #starlingx [17:38]
<stampeder> sgw: Found that Ansible needs to be run a couple of times. It throws an error the first time and then runs fine the next time. Not highly unusual for Ansible, but worth knowing. It is often caused by time delays and the variance in processing times between different hardware. [18:35]
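If a bootstrap step really does fail only on slower hardware, one workaround is to re-run it a bounded number of times with a growing pause. A minimal sketch under stated assumptions: the function names are hypothetical, the command you wrap is whatever the install guide tells you to run (not reproduced here), and it assumes the playbook is safe to re-run. Retrying hides a timing problem rather than fixing it, so the first failure is still worth reading in ansible.log.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(step: Callable[[], int], attempts: int = 3, delay_s: float = 30.0) -> int:
    """Call step() until it returns 0 or attempts run out; return the last exit code."""
    code = 1
    for attempt in range(1, attempts + 1):
        code = step()
        if code == 0:
            break
        if attempt < attempts:
            # Linear backoff: wait longer after each failure to let slow services settle.
            time.sleep(delay_s * attempt)
    return code


def command_step(cmd: Sequence[str]) -> Callable[[], int]:
    """Wrap a command line (e.g. an ansible-playbook invocation) as a retryable step."""
    return lambda: subprocess.run(list(cmd), check=False).returncode
```

Usage would look like run_with_retries(command_step(["ansible-playbook", "<bootstrap playbook from the install guide>"])), with the placeholder replaced by the real path.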
<sgw> stampeder: do you know where it failed the first time through? [18:36]
<stampeder> sgw: In other words, "Your mileage may vary". [18:36]
<stampeder> sgw: [18:36]
<stampeder> "initializing configuration". I'd have to look through the playbook to tell more. [18:37]
<sgw> did you save the ansible.log from the first run? [18:37]
<stampeder> It's still running the second time, so let me see if it clears up the failure. [18:38]
<stampeder> Here is the failure: fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 450, "msg": "Timeout waiting for service endpoints reconfiguration to complete"} [18:42]
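To answer "where did it fail the first time?" from a saved ansible.log, the failure lines can be pulled out mechanically. A minimal sketch, assuming the log contains lines in the same fatal: [host]: FAILED! => {...} form as the one quoted above, with a JSON body; the function name is hypothetical.

```python
import json
import re
from typing import Dict, Iterable, List

# Matches failure lines such as:
# fatal: [localhost]: FAILED! => {"changed": false, "msg": "..."}
FATAL_RE = re.compile(r"fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<body>\{.*\})\s*$")


def fatal_errors(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect host, msg and elapsed from every fatal line in an Ansible log."""
    found = []
    for line in lines:
        m = FATAL_RE.search(line)
        if not m:
            continue
        try:
            body = json.loads(m.group("body"))
        except json.JSONDecodeError:
            body = {"msg": m.group("body")}  # keep the raw text if it is not valid JSON
        found.append({"host": m.group("host"), "msg": body.get("msg"), "elapsed": body.get("elapsed")})
    return found
```

Usage, assuming the log was saved as ansible.log in the current directory: with open("ansible.log") as log: print(fatal_errors(log)).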
<stampeder> Here is my localhost.yml: [18:43]
<stampeder> localhost:~$ cat localhost.yml [18:43]
    system_mode: duplex
    dns_servers:
      - 8.8.8.8
      - 8.8.4.4
    external_oam_subnet: 10.10.20.0/24
    external_oam_gateway_address: 10.10.20.1
    external_oam_floating_address: 10.10.20.181
    external_oam_node_0_address: 10.10.20.181
    external_oam_node_1_address: 10.10.20.182
    admin_username: admin
    admin_password: G833ranch%
    ansible_become_pass: G833ranch%
    # Add these lines to configure Docker to use a proxy server
    # docker_http_proxy: http://my.proxy.com:1080
    # docker_https_proxy: https://my.proxy.com:1443
    # docker_no_proxy:
    #   - 1.2.3.4
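Before a duplex bootstrap it can be worth sanity-checking the OAM addresses in localhost.yml. This sketch assumes (it is not stated in this conversation) that in a duplex setup the floating address and the two per-controller addresses must all be distinct and inside the OAM subnet, since the floating address moves to whichever controller is active. The function name is hypothetical and the config is passed as a plain dict. Note that the file quoted above uses 10.10.20.181 for both the floating and the node-0 address, which this check flags; whether that is related to the timeout is not established here.

```python
import ipaddress
from typing import Dict, List

OAM_ADDRESS_KEYS = [
    "external_oam_gateway_address",
    "external_oam_floating_address",
    "external_oam_node_0_address",
    "external_oam_node_1_address",
]


def check_oam(cfg: Dict[str, str]) -> List[str]:
    """Flag OAM addresses that fall outside the subnet or are reused."""
    problems = []
    subnet = ipaddress.ip_network(cfg["external_oam_subnet"])
    addrs = {k: ipaddress.ip_address(cfg[k]) for k in OAM_ADDRESS_KEYS if k in cfg}
    for key, addr in addrs.items():
        if addr not in subnet:
            problems.append(f"{key} {addr} is outside {subnet}")
    # Group keys by address to find any address used more than once.
    owners: Dict[str, List[str]] = {}
    for key, addr in addrs.items():
        owners.setdefault(str(addr), []).append(key)
    for addr, users in owners.items():
        if len(users) > 1:
            problems.append(f"{addr} is used by {', '.join(users)}")
    return problems


cfg = {
    "external_oam_subnet": "10.10.20.0/24",
    "external_oam_gateway_address": "10.10.20.1",
    "external_oam_floating_address": "10.10.20.181",
    "external_oam_node_0_address": "10.10.20.181",
    "external_oam_node_1_address": "10.10.20.182",
}
print(check_oam(cfg))
```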
<stampeder> I know, the password. This is an isolated system with no internet connection. [18:49]
<stampeder> sgw: Got through the install up to the point of unlocking controller-0. However, controller-0 continues to be offline. I wonder if the Ansible error is causing that. With simplex, the Ansible code would run fine and then controller-0 would come online and could be unlocked. [19:11]
<sgw> stampeder: Did you do the additional setup for configuring the network and storage? Are you planning on running OpenStack or just k8s? [19:27]
*** ijolliffe has quit IRC [21:21]
*** riuzen has joined #starlingx [21:57]
*** hyunsikyang__ has joined #starlingx [22:14]
*** hyunsikyang has quit IRC [22:17]
*** riuzen has quit IRC [22:20]

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!