Thursday, 2021-06-17

*** zbr is now known as Guest248805:03
*** raukadah is now known as chandankumar05:46
snadgeTASK [os_nova : Install kvm pip packages .. is now failing "unable to execute 'gcc': No such file or directory"06:52
snadgeso close .. i seem to be almost through the setup_openstack playbook06:58
noonedeadpunkwell issue you pasted is not about python307:00
noonedeadpunkbut yeah, it could be if ansible is not bootstrapped, as it would still use old roles07:03
noonedeadpunkwhile using new playbooks07:03
jrosserkvm pip should never need gcc either07:04
jrosserthis indicates it is trying to build the wheel locally on the compute host rather than in the repo server07:04
noonedeadpunkbut actually then compute would be treated as repo? Or wheel won't be built at all?07:05
jrosserthere wont be a C toolchain potentially07:05
jrosserbut imho this looks pretty suspect from the log above `changed: [infra1_repo_container-cf45d187] => (item={u'path': u'/var/www/repo/os-releases/20.0.1'})`07:06
jrosser20.0.1? srsly?07:06
jrossersnadge: which release / branch / tag are you wanting?07:10
jrosserUssuri i think?07:11
noonedeadpunkI think paste was before bootstrapping.... but dunno...07:12
jrosseryeah even so for T thats a very very early tag, should ideally be starting from the most recent07:16
*** rpittau|afk is now known as rpittau07:22
snadgeit is 21.2.6. latest ussuri yes07:42
snadgethe bootstrap fixed the above problem yes.. this latest one, im looking into07:44
snadgeim not sure why os_nova is trying to install kvm pip packages onto the compute nodes07:44
jrosseris this an AIO or something more complicated?07:45
snadgesomething more complicated but not by much, its based on a stripped down version of that pretty much07:46
jrosserand you have an AIO build alongside as reference?07:46
snadgei had installed it in a test setup that was similar just using vsphere vms, that was the reference07:47
snadgethe production setup is basically the same thing07:47
snadgebut of course it isn't exactly07:47
jrosserwell, if i was having this much trouble i'd maybe take a step back07:47
jrosserfirst off, there are regular CI jobs which test this stuff for all the stable branches07:48
jrosserso we could take a look at the latest one of those for Ussuri, which is here https://review.opendev.org/c/openstack/openstack-ansible/+/79499907:48
jrosserif you click "Zuul Summary" you can see the Centos-7 jobs were passing (on June 6th anyway)07:49
jrosseryou can also go look in the logs for those jobs07:49
jrosserthen it would be to try to reproduce exactly those same tests locally with the AIO config, which should be just a couple hours waiting for it to deploy in a VM07:50
snadgethe other thing i did different in testing was use 21.2.307:51
snadgeok so if it built on jun 7.. when was 21.2.6 released07:52
jrosserremember that these releases are pretty mechanically generated07:53
snadgeassuming i can git checkout stable/ussuri instead of eg. 21.2.607:53
jrosserimho the level of change on these branches is really low07:54
jrosserfor something like Ussuri we really are not changing anything unless something really stupid happens (like Ubuntu change a load of repos or Centos have a minor release with breaking changes) - the point releases on the whole just pull in any bugfixes made to the actual services like nova/keystone etc07:56
jrosserthe ansible roles and openstack-ansible itself are staying pretty much the same07:56
*** raukadah is now known as chandankumar08:28
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_neutron master: Do not set Open vSwitch hostname  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/79300909:10
noonedeadpunkhm I actually wonder why https://review.opendev.org/c/openstack/openstack-ansible-os_senlin/+/754045 still fails with cert verification issue09:42
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure  https://review.opendev.org/c/openstack/openstack-ansible/+/79680909:48
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure  https://review.opendev.org/c/openstack/openstack-ansible/+/79680910:08
noonedeadpunkI'd expect it to fail atm...10:08
noonedeadpunkcurl https://89.42.141.147:5000 results in `SSL certificate problem: unable to get local issuer certificate` :(10:26
noonedeadpunkoh, well.. probably we should either put url or add IP to the certificate?10:28
noonedeadpunk*put url as keystone endpoint10:28
noonedeadpunkbut asking https://aio1.openstack.local:5000 has exact same result :(10:32
opendevreviewArx Cruz proposed openstack/openstack-ansible-os_tempest master: Create alternative tempest run command  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/79681810:37
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Use openstack_repo_url for requirements_git_url  https://review.opendev.org/c/openstack/openstack-ansible/+/79682010:43
noonedeadpunkjrosser: is it enough to copy ExampleCorpRoot.crt into /usr/local/share/ca-certificates/ or also intermediate should be copied there as well?11:09
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Don't set keystone URI as unsecure  https://review.opendev.org/c/openstack/openstack-ansible/+/79680911:13
snadgeim still unable to figure out why Install kvm pip packages is failing11:31
noonedeadpunkoh, and you're running centos 7?11:37
noonedeadpunkI think this really might be osa bug actually11:39
snadgeyes unfortunately11:44
snadgewe are in the process of sorting out a driver issue.. rhel 8.2/8.3 with some out of tree drivers appear to work11:44
snadgebut its too late to test and deploy with that now11:44
snadgealso speccing new hardware but this is also future plans11:45
snadgei see centos 8 and centos 8 stream are listed as supported.. any plans to support rocky or alma linux, or basically a rhel clone?11:48
snadgeif we still haven't replaced the hardware and need to replace ussuri, i can look into centos 8 stream11:52
jrossernoonedeadpunk: i think you can set up the CA however you need - the thing that makes the intermediate necessary is the signed_by https://opendev.org/openstack/ansible-role-pki/src/branch/master/defaults/main.yml#L9212:11
jrosserso in theory (not tried it though!) you could set it up with just a root and no intermediate at all12:12
jrosserthe most general case is your company has a root which you will never be able to access the key for, but they give you the CA cert12:13
jrosserthen they create an intermediate CA cert/key which you do have access to12:13
jrosserin terms of copying the CA into /usr/local/share/ca-certificates you should only need to put the root CA there12:14
jrosserif you also have to put the intermediate there, then something is broken12:15
jrosserthe services should be set up to present a cert chain of the service cert + the intermediate CA, which can then be validated by a client which only needs the root CA cert12:15
noonedeadpunkwell, smth really is broken with basic aio...12:18
noonedeadpunkoh, well, intermediate is not part of haproxy cert12:18
jrosserit should be, just a moment12:19
jrosserthis should define what gets copied over to the haproxy certs dir https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L152-L16812:19
noonedeadpunk /etc/openstack_deploy/pki/certs/certs/haproxy_aio1-chain.crt is present, but it's not part of /etc/ssl/private/haproxy.pem12:20
jrosserand then this should concatenate the pieces https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/handlers/main.yml#L16-L2112:20
noonedeadpunkoh well, and here's a mistake12:21
noonedeadpunk`{{ haproxy_user_ssl_ca_cert is defined | ternary(haproxy_ssl_ca_cert,'') }}`12:21
noonedeadpunkand we have `haproxy_user_ssl_ca_cert | default(haproxy_pki_intermediate_cert_path)`12:21
jrosseroh hmm12:23
jrosserlike previously there was maybe not a CA (previous haproxy default self signed case?) but now there always is12:24
noonedeadpunkyeah...12:24
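A minimal sketch of the mismatch quoted above, modelled in plain Python rather than Jinja. The variable names come from the two expressions pasted in the log; the dictionary of role vars, the function names and the example path are assumptions made for illustration, and the second variant is only one possible way to address it, not necessarily the fix that was applied to the role.

```python
def ternary(condition, true_val, false_val):
    """Mimic the Ansible `ternary` filter for a boolean condition."""
    return true_val if condition else false_val


def resolve_ca_cert(role_vars):
    """Model `haproxy_user_ssl_ca_cert | default(haproxy_pki_intermediate_cert_path)`."""
    return role_vars.get(
        "haproxy_user_ssl_ca_cert",
        role_vars["haproxy_pki_intermediate_cert_path"],
    )


def ca_part_as_quoted(role_vars):
    """Model the handler expression: the CA is only used if the *user* var is defined."""
    user_ca_defined = "haproxy_user_ssl_ca_cert" in role_vars
    return ternary(user_ca_defined, resolve_ca_cert(role_vars), "")


def ca_part_always_resolved(role_vars):
    """Hypothetical alternative: always use the resolved CA, since one now always exists."""
    return resolve_ca_cert(role_vars)


# Hypothetical path, for illustration only.
pki_only = {"haproxy_pki_intermediate_cert_path": "/example/pki/intermediate.crt"}

print(repr(ca_part_as_quoted(pki_only)))        # empty string: intermediate is dropped
print(repr(ca_part_always_resolved(pki_only)))  # the PKI intermediate path
```

With only the PKI default in place, the first function returns an empty string, which matches the observation that the chain file exists on disk but never ends up in haproxy.pem.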
noonedeadpunkand another thing, that we probably need to add ip address to certificate subject12:24
noonedeadpunkor at least add what we have defined as vip address12:25
noonedeadpunkas once I added chain to haproxy, I got another issue  - `SSL: no alternative certificate subject name matches target host name '172.29.236.101'`12:25
jrosserright - theres an example of that in the rabbitmq role12:26
noonedeadpunkbut thanks for helping out here12:26
noonedeadpunkI would spend so much time without these pointers...12:26
jrosserhttps://github.com/openstack/openstack-ansible-rabbitmq_server/blob/master/defaults/main.yml#L15712:26
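The "no alternative certificate subject name matches" error is consistent with the usual matching rule: when the target is an IP address, clients compare it against IP-type subject alternative name entries only, never against DNS names. A simplified sketch of that rule (no wildcard handling; the function names are assumptions, and whether the AIO certificate actually carried `aio1.openstack.local` as a DNS name is assumed here for illustration):

```python
import ipaddress


def is_ip_literal(host):
    """Return True if `host` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def san_matches(host, dns_names, ip_addresses):
    """Simplified SAN check: IP targets match IP entries, names match DNS entries."""
    if is_ip_literal(host):
        target = ipaddress.ip_address(host)
        return any(ipaddress.ip_address(ip) == target for ip in ip_addresses)
    return host.lower() in (name.lower() for name in dns_names)


# Certificate with a DNS name only: connecting by VIP address fails.
print(san_matches("172.29.236.101", ["aio1.openstack.local"], []))
# Same certificate with the VIP added as an IP SAN: the address now matches.
print(san_matches("172.29.236.101", ["aio1.openstack.local"], ["172.29.236.101"]))
```

This is why adding the VIP address to the certificate, as suggested above, would be needed for clients that connect by IP.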
noonedeadpunkhttps://github.com/openstack/openstack-ansible-haproxy_server/blob/master/defaults/main.yml#L149 I think we just need to add internal here as well just in case? but not sure...12:27
noonedeadpunkI think it's worth being another cert....12:27
jrosseryeah12:28
jrosserwe should look at the internal+external both being SSL as a separate thing12:28
jrosserbecause you might want company cert on the outside12:28
jrosserbut still need to use internal CA on the inside12:28
noonedeadpunkyep, or do let's encrypt outside... Which I guess we can't do right now12:29
noonedeadpunk(not sure)12:29
jrosserhmm well that is a good question actually12:31
jrosseri am hoping that everything still works as before, but now not so sure if re-running the haproxy role after a deployment with LE will replace the LE certs with those from the PKI role12:31
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_nova master: Drop CentOS 7 specific task  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/79683012:32
jrosseri think it should be OK, as this should only happen once https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/handlers/main.yml#L16-L2112:33
noonedeadpunkbut I mean we get let's encrypt cert for both internal and external at the same time?12:33
noonedeadpunkdisregard - let me read code carefully )12:33
jrosseroh yes i imagine if you set ssl_all_endpoints or whatever it is then yes, it would be LE both sides12:34
mgariepythe gate check for ovn patch is painful.12:34
mgariepy3 rechecks, 3 different failures12:34
noonedeadpunk :(12:35
mgariepy1 calico, 1 buster, 1 centos8 iirc12:35
noonedeadpunkand only in gates...12:35
mgariepythat's annoying lol12:35
noonedeadpunkbtw calico fails pretty frequently imo12:36
mgariepyis there a lot of users using it that you know of ?12:36
noonedeadpunkI know logan- used to...12:36
noonedeadpunkAnd who knows - maybe we will consider it as well for some reason one day (at least there were suggestions to try it out)12:37
noonedeadpunksnadge: nope, not planning to try alma/rocky/etc at the moment12:39
jrosseri guess we maybe have not bumped the calico version for a long time12:39
noonedeadpunksnadge: but I think you can try installing gcc on computes as a workaround manually12:39
noonedeadpunkjrosser: well, I kind of did recently. we're following 3.18 branch and current one is 3.1912:40
noonedeadpunk(and it has been released pretty recently)12:40
mgariepyjamesdenton, are you around ?12:42
mgariepyspatel, do you know how to reset the ovn sb db ?13:03
spatelhmm what do you mean reset13:04
jamesdentonyes? on a call at the moment13:04
jamesdentoni do not know how to reset that13:04
mgariepyi got a buggy aio install with ovn 13:05
mgariepythe issue is that the sb db is somewhat empty.13:05
spateli don't think there is any reset thing, you need to delete db file and re-create it i believe 13:05
mgariepy        gateway chassis: [neutron-ovn-invalid-chassis]13:05
spatelyour compute node should push data to sb 13:05
mgariepyand it's not listening on all the ports the working one is.. 13:06
mgariepyi think the issue was that i did zap the neutron containers but didn't drop the DB.13:11
mgariepyfor redeployment.13:12
opendevreviewArx Cruz proposed openstack/openstack-ansible-os_tempest master: Create alternative tempest run command  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/79681813:52
opendevreviewAdrien Cunin proposed openstack/openstack-ansible-openstack_hosts master: Make sure tzdata is installed in containers  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/79685014:33
opendevreviewGaudenz Steinlin proposed openstack/openstack-ansible-os_nova master: Use version from repo_packages for SPICE HTML5  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/79685214:54
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Fix serialized playbook runs  https://review.opendev.org/c/openstack/openstack-ansible/+/75204015:05
CeeMacanyone have any tops for troubleshooting rabbitmq message timeouts?15:36
CeeMacwe're having some problems with live migrations failing, at first look we're getting 504 gateway timeouts from cinder api, however when looking deeper in the logs we also see some message timeout events in the cinder api logs. so i think nova is requesting a volume operation, cinder api is dropping a message on the queue, then its not getting a response back within a particular time so its dropping the request, which i 15:38
CeeMacthink is then generating the 50415:38
CeeMactops==tips15:39
*** rpittau is now known as rpittau|afk16:09
noonedeadpunkI only have one thought, that some rabbitmq member is non functional and doesn't operate normally16:14
noonedeadpunkor they're under really high load that they can't handle all messages in time16:15
noonedeadpunksecond point can be checked with statistics pretty easily16:15
*** sshnaidm is now known as sshnaidm|afk16:16
noonedeadpunkin case of first one I usually just run playbook with -e rabbitmq_upgrade=true, as downtime costs more than just running role...16:16
CeeMacthanks noonedeadpunk in this case it looks like a couple of volumes were associated with cinder-volume pools which no longer exist, which would explain why the message wasn't getting processed in the queue16:41
noonedeadpunkah:)16:41
noonedeadpunkI wouldn't guess that16:42
CeeMacwe've re-managed the volumes to a valid host/pool and the migration went straight through16:42
CeeMac(after tidying up the inactive vif bindings - another thing we took ages to find)16:42
noonedeadpunkyeah it's quite straightforward with cinder-manage16:42
CeeMacyeah, it had us stumped for a fair while too,16:42
opendevreviewDmitriy Rabotyagov proposed openstack/ansible-role-pki master: Allow to generate/install certificates conditionally  https://review.opendev.org/c/openstack/ansible-role-pki/+/79689516:54
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: WIP Generate self-signed SSL per listen IP  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/79694018:32
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-haproxy_server master: WIP Generate self-signed SSL per listen IP  https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/79694018:37
mgariepy (˚Õ˚)ر ~~~~╚╩╩╝ centos-8-stream18:42
mgariepythe ovn patches seem to be failing randomly more reliably.. 18:43
mgariepyhoo. it's a non-voting one .. :S18:45
opendevreviewMerged openstack/openstack-ansible-os_magnum stable/victoria: Define region for Magnum trust  https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/79504320:32
snadgethese timezone differences are rough, i feel like i need to work during the evenings but it's probably better this way21:31
snadgeinstalling gcc on the compute nodes is my latest act of desperation ;)21:32
snadgeis the python virtual environment a new dependency for the compute nodes?21:33
jrossersnadge: it's been like that forever21:36
jrosserbut look here, for the AIO case the deploy host == compute host, which gets gcc https://github.com/openstack/openstack-ansible/blob/master/scripts/bootstrap-ansible.sh#L73-L8121:36
snadge-bash: dnf: command not found21:39
snadgethat could be a problem ;)21:39
jrosseri gave you a link to the master branch21:39
snadgethats the latest version of that script which is master branch yes21:39
jrosserwhich is centos-8 only21:39
jrosserussuri would be https://github.com/openstack/openstack-ansible/blob/stable/ussuri/scripts/bootstrap-ansible.sh#L73-L7721:40
snadgeok that has $RHT_PKG_MGR -y install which should work.. i guess the question is why its not21:40
snadgebootstrap also runs on the compute nodes?21:40
jrosserno, so it feels like a bug maybe21:41
snadgeyou only run it on the deployment host is my understanding21:41
snadgeyeah im not bothered and the easiest thing was to yum install gcc on the computes to see what happens i guess21:41
jrossersure21:41
snadgethats running now21:41
jrosserand actually here is the root cause https://opendev.org/openstack/openstack-ansible-os_nova/commit/e72835e5ac97f51665add4ade2a737eab12b3a9e21:41
jrossercurse of centos again21:41
jrosserwhilst there is a python 3 interpreter for centos-7 nowadays, they don't package any useful python libraries21:42
jrosserso the libvirt python lib is just completely unavailable and it looks like we have to build it from source21:43
snadgeok that makes complete sense now.. nobody should be using ussuri or centos 7 anymore, my apologies21:43
snadgethis will be the last time i promise :P21:44
jrosserno its fine - it's as much a bug in OSA that we don't special case installing gcc on computes for ussuri/centos-721:44
snadgeim probably going to suggest moving onto centos 8 stream21:44
snadgeeven though we can use RHEL for free21:44
jrosserjust unlucky that in the AIO/CI we get that as a side effect of all the functions being collapsed onto the same node21:44
jrosseri'll see if i can make a patch21:45
snadgehow is the centos 8 stream support? are any other centos forks supported, like almalinux or rocky?21:48
snadgeeveryone else laughs in ubuntu.. i know ;)21:52
jrosserstream support will be in the upcoming Wallaby release21:58
jrosserthough i guess it needs some multinode testing properly in a lab, for reasons just like you found with ussuri/centos-721:59
jrosserthere will likely be a few small gremins21:59
jrosser*gremlins21:59
snadgeas soon as we have deployed this ussuri release I can look into wallaby/stream21:59
snadgewho knows how long it will take them to buy new hardware, its a new bladecenter and san22:00
jrosserand it remains to be seen what the stability is, though hopefully as OSA gets all the openstack stuff from source code you might be relatively decoupled from that22:00
snadgeits possible that rhel clone support could be added alongside stream.. for now, they are almost identical22:02
snadgebut how they diverge who knows, and that obviously doubles the amount of testing etc22:02
snadgecentos 8 stream just seems like a weird choice for a bladecenter.. but im sure i can get approval to at least try it :P22:05
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-os_nova stable/ussuri: Install gcc on any nova-compute hosts which need libvirt-devel  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/79695722:13
jrossersnadge: ^ i've not got anywhere to test that locally, the CI should verify that syntactically it's ok at least22:14
snadgeyeah it looks good, i could simply pull in that patch and run the playbook again.. maybe remove gcc first22:15
jrosseryeah, if you can test it and leave a comment on the patch that would be super helpful22:15
snadgewhich playbook will run that?22:15
jrosseryou can run the playbooks/os-nova-install.yml22:16
snadgeahh okay even easier22:16
jrosserhave you applied a patch from gerrit before?22:16
snadgenot for this project, but cyanogenmod which also uses gerrit, but a while ago22:16
jrosserhit the 3-dots thing top right on https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/79695722:17
jrosserselect "download patch" and copy the "cherry pick" line to your clipboard22:18
jrosserthen cd to /etc/ansible/roles/os_nova, paste the command and it should apply it there for you22:18
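For reference, the "cherry pick" line Gerrit offers has a predictable shape: it fetches a ref of the form `refs/changes/<last two digits of change>/<change>/<patchset>` and cherry-picks `FETCH_HEAD`. A small sketch that builds such a command; the repository URL form and the patchset number 1 are assumptions here, so the line copied from the Gerrit UI should be preferred over anything generated this way.

```python
def gerrit_change_ref(change_number, patchset):
    """Build the Gerrit ref for a change: refs/changes/<NN>/<change>/<patchset>."""
    shard = f"{change_number % 100:02d}"  # last two digits, zero padded
    return f"refs/changes/{shard}/{change_number}/{patchset}"


def cherry_pick_command(repo_url, change_number, patchset):
    """Compose the fetch + cherry-pick command line for a Gerrit change."""
    ref = gerrit_change_ref(change_number, patchset)
    return f"git fetch {repo_url} {ref} && git cherry-pick FETCH_HEAD"


# Change number from the log; URL form and patchset 1 are assumed.
print(cherry_pick_command(
    "https://review.opendev.org/openstack/openstack-ansible-os_nova",
    796957,
    1,
))
```

The resulting command would be run from inside the role checkout, here /etc/ansible/roles/os_nova, as described above.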
jrosserin terms of rhel support - as you say it is extremely similar to stream22:20
jrosserand i would expect you could get that working reasonably OK - there are almost certainly some places that we match on the string 'centos' that would need to be made more generic22:21
jrosserand the usual suspects would be getting systemd-networkd and lxc working22:22
snadgeive installed the patch but waiting for setup_openstack to finish22:23
snadgei just want to see the horizon web interface and don't actually care if it works ;)22:23
jrosserlet me just check what we had to do for horizon too.....22:23
jrosseryes so theres a similar missing-python3-library situation for mod-wsgi which needed this workaround https://github.com/openstack/openstack-ansible-os_horizon/commit/075dcf9c7e7b13776848b08b092d901cc185b66922:30
snadgethats in master.. is it in ussuri/stable?22:41
snadgeif horizon installs correctly i will assume yes22:42
snadgeits my day off.. maybe i'll get paid an extra day this week hehe22:45
snadgei owe you a carton of beers at this rate, by the time travel between australia and other countries is allowed again23:02
snadgehorizon installed 👍23:07
