Thursday, 2022-01-27

opendevreview Merged openstack/openstack-ansible-os_cinder stable/xena: Enable recursion in combine() filter  https://review.opendev.org/c/openstack/openstack-ansible-os_cinder/+/824411 01:45
opendevreview Merged openstack/openstack-ansible-os_aodh stable/xena: Ensure libxml2 is installed on debian systems  https://review.opendev.org/c/openstack/openstack-ansible-os_aodh/+/826378 02:07
opendevreview Merged openstack/openstack-ansible stable/wallaby: Bump OpenStack-Ansible Wallaby  https://review.opendev.org/c/openstack/openstack-ansible/+/825395 02:29
opendevreview Merged openstack/openstack-ansible stable/xena: Bump OpenStack-Ansible Xena  https://review.opendev.org/c/openstack/openstack-ansible/+/825391 02:40
opendevreview Merged openstack/openstack-ansible master: Bump OpenStack-Ansible master  https://review.opendev.org/c/openstack/openstack-ansible/+/825390 03:09
*** dmsimard6 is now known as dmsimard 06:38
noonedeadpunk prometheanfire: are you sure it's max_connections? 08:13
noonedeadpunk as galera has several bugs regarding their threading, that makes cluster fall apart 08:14
jrosser noonedeadpunk: related to that - https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/786381 08:41
jrosser if the connection limit is ever reached it's totally bad for the loadbalancer 08:41
jrosser which then makes it worse again with failover 08:42
noonedeadpunk makes sense to me 08:52
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113 09:58
opendevreview Jonathan Rosser proposed openstack/ansible-role-systemd_service master: Allow StandardOutput to be set for a systemd service  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/826602 10:16
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-galera_server master: Convert xinetd clustercheck to systemd socket service  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/824042 10:17
*** dviroel|afk is now known as dviroel 11:20
*** sshnaidm|afk is now known as sshnaidm 11:35
opendevreview Jonathan Rosser proposed openstack/ansible-role-systemd_service master: Allow StandardOutput to be set for a systemd service  https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/826602 12:13
opendevreview Merged openstack/openstack-ansible stable/xena: Remove CI jobs for centos-8  https://review.opendev.org/c/openstack/openstack-ansible/+/824567 13:02
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-os_tempest stable/victoria: Remove tempestconf centos-8 job  https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/826697 15:10
prometheanfire noonedeadpunk: yep, I'm sure, 200 was too little 15:15
prometheanfire 16R 8C and 128R 24C infra nodes both seemed to get that 15:16
prometheanfire jrosser: ya, that's kinda what I'm seeing 15:16
jrosser well, for now you can up the connection limit 15:17
jrosser but it is basically broken if you ever reach that 15:18
prometheanfire 800 seems to be working, but good to know it's known behavior 15:19
prometheanfire sub'd to the review 15:19
jrosser it's kind of a two-sided problem - config on the clients needs to do something sensible with the connections that are made 15:20
jrosser and galera server should not shoot its own foot when max connections is reached 15:21
prometheanfire heh, bossman was complaining about how many connections to the DB are being made (grumbles in greybeard hypertuning) 15:22
jrosser you might want to look at these https://review.opendev.org/q/topic:db-pooling 15:24
jrosser and the releasenote https://review.opendev.org/c/openstack/openstack-ansible/+/819424 15:26
prometheanfire I think those hit xena 15:26
prometheanfire ya, the release note is what I'm reading 15:26
jrosser they were cherry picked but you'd need to look if there's a point release that pulls them in 15:27
prometheanfire nova still has max_overflow = 50 15:27
prometheanfire so, looks like it, I'll look in a sec 15:27
prometheanfire coffee calling 15:27
prometheanfire confirmed, made the xena tag 15:28
jrosser i think the thing now is that there is one place you can globally set all that stuff 15:28
jrosser but there is an important relation to what happens at a keepalived failover 15:29
prometheanfire ya, able to set openstack wide defaults, then override with role level stuff if needed 15:29
prometheanfire yep, doubling 15:29
jrosser indeed 15:29
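To put rough numbers on the pooling discussion above, here is a minimal Python sketch of a worst-case connection estimate. It assumes each service worker keeps its own SQLAlchemy-style pool, which can open at most pool_size + max_overflow connections. The worker counts and the pool_size value are made up for illustration; only max_overflow = 50 and the 200/800 limits come from the conversation. The factor of 2 is only a guess at the "doubling" mentioned for a keepalived failover.

```python
def peak_db_connections(workers_per_service, pool_size, max_overflow, failover_factor=1):
    """Worst-case connection count if every worker exhausts its pool.

    Each worker is assumed to own one pool that can open at most
    pool_size + max_overflow connections.
    """
    per_worker = pool_size + max_overflow
    return sum(workers_per_service.values()) * per_worker * failover_factor


# Hypothetical worker counts; not taken from any real deployment.
workers = {"nova-api": 8, "nova-conductor": 8, "neutron-server": 8}

steady = peak_db_connections(workers, pool_size=5, max_overflow=50)
# Assumption: connections may briefly double while old and new ones coexist.
failover = peak_db_connections(workers, pool_size=5, max_overflow=50, failover_factor=2)

for max_connections in (200, 800):
    print(max_connections, "steady fits:", steady <= max_connections,
          "failover fits:", failover <= max_connections)
```

With numbers like these the theoretical peak is far above either limit, which is one way to see why lowering the per-worker pool settings matters as much as raising max_connections.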
opendevreview Merged openstack/openstack-ansible stable/wallaby: Fix definition of ssl_protocol  https://review.opendev.org/c/openstack/openstack-ansible/+/826382 15:31
prometheanfire hopefully that first galera-server review fixes the bouncing around (doubling or not) 15:31
opendevreview Andrew Bonney proposed openstack/openstack-ansible-galera_server master: Listen on an additional port for monitoring/diagnostic purposes  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/786381 15:36
prometheanfire :D 15:37
andrewbonney I'll bump it up my to do list :) 15:37
opendevreview James Denton proposed openstack/openstack-ansible-ops master: Update MNAIO for Focal  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/824486 15:39
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113 15:44
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add ssh_keypairs role  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/825113 15:45
opendevreview Merged openstack/openstack-ansible stable/xena: Fix definition of ssl_protocol  https://review.opendev.org/c/openstack/openstack-ansible/+/826381 15:51
opendevreview Merged openstack/openstack-ansible stable/victoria: Fix definition of ssl_protocol  https://review.opendev.org/c/openstack/openstack-ansible/+/826383 15:51
prometheanfire investigating, but uefi used to work... on victoria, nova.exception.UEFINotSupported: UEFI is not supported 15:54
spatel prometheanfire did you set machine type in flavor? i think only q35 is supported 15:58
spatel i was messing with that last week 15:58
prometheanfire it was via the image iirc 16:00
prometheanfire --property hw_firmware_type=uefi 16:00
prometheanfire I wonder if it's because the osbpo repo for buster doesn't go to xena 16:07
opendevreview Merged openstack/openstack-ansible master: Bootstrap lxc_net mtu for gate  https://review.opendev.org/c/openstack/openstack-ansible/+/557484 16:09
opendevreview Merged openstack/openstack-ansible stable/xena: Gather additional facts for haproxy playbook  https://review.opendev.org/c/openstack/openstack-ansible/+/826561 16:09
opendevreview Merged openstack/openstack-ansible stable/wallaby: Gather additional facts for haproxy playbook  https://review.opendev.org/c/openstack/openstack-ansible/+/826562 16:09
*** dviroel is now known as dviroel|lunch 16:13
spatel prometheanfire maybe an issue with the OVMF_CODE.fd path 16:25
spatel noonedeadpunk I am halfway through my upgrade; as soon as i upgraded rabbitMQ from W->X i found that neutron-agent on all infra nodes died because it couldn't talk to rabbitmq. I manually restarted them to bring them back. Not saying this is a major issue, just noticed it 16:28
noonedeadpunk well, I haven't spotted anything like that. But I guess I can recall someone having same issue... Oh, well, there was also issue in oslo that andrewbonney spotted, that got fixed with requirements bump 16:33
noonedeadpunk spatel: https://bugs.launchpad.net/oslo.messaging/+bug/1949964 16:34
prometheanfire spatel: I didn't hit that in my w->x upgrade 16:34
spatel hmm 16:35
spatel noonedeadpunk assuming that bug patch is yet to merge in 24.0.0, correct? 16:36
spatel https://review.opendev.org/c/openstack/requirements/+/823104 16:37
spatel folks are talking about wallaby issue but in my case i am already running wallaby and moving to Xena 16:39
noonedeadpunk it's both in w and v and 24.0.0 16:52
noonedeadpunk just rabbit restart could trigger issue potentially 16:52
noonedeadpunk as services are still on w 16:52
noonedeadpunk but not sure 16:53
noonedeadpunk oh, not on v, sorry ) 16:53
jrosser that amqp thing that needed the requirements bump leaks fd on the compute nodes 16:59
spatel 24.0.0 is still using amqp 5.0.6 so yes, we need a bump to use 5.0.8 in the next xena tag 17:21
spatel jrosser one more thing i noticed is that setup-hosts.yml updates the /etc/openstack-release file during upgrade, but technically i didn't upgrade anything yet (except infra nodes) 17:22
spatel shouldn't /etc/openstack-release get updated based on the roles you are running? for example when i run the nova role, then at the end of that role it should update the /etc/openstack-release file. 17:23
jrosser the release is the release of openstack-ansible 17:24
jrosser so it contains something like DISTRIB_RELEASE="24.0.0.0rc1" 17:24
spatel DISTRIB_CODENAME="Xena" 17:24
jrosser i think it would be hard to do anything different 17:25
jrosser as nothing stops you mixing releases, like it is common to use master magnum on an otherwise stable branch deploy 17:25
spatel You are correct, it would be difficult with BM deployment :) 17:26
spatel anyway ignore :) 17:26
jrosser ok :) 17:26
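For reference, the release file discussed above is a set of KEY="value" lines, so it can be read with a few lines of Python. This is a minimal sketch: parse_release_file is a hypothetical helper, and the sample text contains only the two keys quoted in the conversation (a real file may hold more).

```python
def parse_release_file(text):
    """Parse KEY="value" lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


# Sample content built from the two values quoted in the conversation.
sample = 'DISTRIB_RELEASE="24.0.0.0rc1"\nDISTRIB_CODENAME="Xena"\n'
info = parse_release_file(sample)
print(info["DISTRIB_CODENAME"], info["DISTRIB_RELEASE"])
```

As jrosser notes, the values describe the openstack-ansible release that wrote the file, not the version of any individual service on the host.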
*** dviroel|lunch is now known as dviroel 17:26
spatel looks like i had lots of coffee today.. hehe.. thinking too much 17:26
jrosser so do we run out of memory here? https://zuul.opendev.org/t/openstack/build/40f34af251424c82bf1e83d05d8b1620/log/logs/host/nova-conductor.service.journal-09-22-10.log.txt#2482 17:31
jrosser all of the centos-8-stream jobs fail on that patch 17:31
prometheanfire so, I think my uefi boot issues are because ovmf is not installed from backports on buster 17:52
prometheanfire installing from backports worked 17:52
prometheanfire patches incoming I suppose :P 17:52
jrosser prometheanfire: https://opendev.org/openstack/openstack-ansible-os_nova/src/branch/master/tasks/nova_install.yml#L28-L43 17:55
prometheanfire jrosser: I was looking at https://github.com/openstack/openstack-ansible-os_nova/blob/f7cb4f60e7d81da5f6886683c5a92712ac24365e/vars/debian.yml#L90 17:55
jrosser yeah, so it would be good to know that the backports repo is available without any more config 17:56
prometheanfire out of scope (that list is already there) :P 17:56
jrosser and that updating the pin (which is what that list does) works out as expected 17:56
prometheanfire but I agree, it is annoying when it doesn't exist 17:56
jrosser well that's why i ask 17:57
prometheanfire well, for supported versions (buster and bullseye) they both have backports now 17:57
jrosser i'm not sure if by "it doesn't exist" you mean that the backports repo was not set up automatically 17:57
jrosser or if you mean that the package you want does not exist in that list 17:58
prometheanfire both I suppose, I know it exists (package and repo both) for buster at least 17:58
opendevreview Merged openstack/openstack-ansible stable/victoria: Bump OpenStack-Ansible Victoria  https://review.opendev.org/c/openstack/openstack-ansible/+/825397 18:02
jrosser the setup there is a little strange 18:02
jrosser in that os_nova manages the pins for the backports repo, but takes no steps to ensure that it is present 18:02
jrosser but if we enabled the backports repo in openstack_hosts, it would be present everywhere, with no pins 18:02
jrosser and that might not be what we want 18:03
jrosser prometheanfire: this kind of works out for CI because i think the backports repo is enabled on the CI node https://7c50b6b0ae2183e4c536-aac042a0844b1c7bd58db620c0fb1e04.ssl.cf1.rackcdn.com/826379/1/check/openstack-ansible-deploy-aio_metal-debian-buster/74db0ab/logs/etc/host/apt/sources.list.d/ 18:09
prometheanfire lolol 18:12
jrosser looking at the code though this is a special case for buster 18:13
jrosser OSA doesn't generally manage the repos for your underlying distro 18:13
*** sshnaidm is now known as sshnaidm|afk 18:13
opendevreview Jonathan Rosser proposed openstack/openstack-ansible-os_nova master: Remove apt pinning task for debian buster  https://review.opendev.org/c/openstack/openstack-ansible-os_nova/+/826759 18:17
*** dviroel is now known as dviroel|out 20:20
spatel jrosser i got this error related to PKI and i noticed i don't have that file. why does nova need an SSL cert? - https://paste.opendev.org/show/812413/ 20:51
jrosser it should all be in the release notes 20:53
jrosser there is now tls for live migration 20:53
jrosser and tls between the compute node and novncproxy 20:53
jrosser the nova role should can the PKI role and generate the certificates 20:54
jrosser *call the 20:54
spatel hmm 20:54
spatel do you think this is because i ignored - openstack-ansible certificate-authority.yml 20:55
jrosser I don’t think so 20:55
spatel Nova should create certificate for node so i don't think it's related to certificate-authority.yml playbook 20:55
jrosser correct, that just creates the overall CA 20:56
spatel something went wrong somewhere.. 20:57
jrosser so 20:58
jrosser there are two steps 20:59
jrosser first, it makes some certs on the deploy host from this list https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/main.yml#L161 20:59
jrosser second, it installs them to the required places with this list https://github.com/openstack/openstack-ansible-os_nova/blob/master/tasks/main.yml#L162 21:00
jrosser so if what it tries to install is not there, then we have an issue between those two steps 21:00
spatel here are all the steps, looks like it skipped creating the certificate - https://paste.opendev.org/show/812414/ 21:00
jrosser probably because it thinks it is already done 21:01
jrosser Create the CSR for nova_os-infra-1-nova-api-container-2482b57a-client -> OK 21:02
jrosser so the CSR exists from some previous run 21:02
jrosser then next task is only done one time when the CSR task is changed https://github.com/openstack/ansible-role-pki/blob/master/tasks/standalone/create_cert.yml#L61 21:03
spatel here are all the steps nova did - https://paste.opendev.org/show/812415/ 21:05
spatel TASK [Create and install SSL certificates for compute hosts]  - didn't run 21:06
admin16 i installed 24.0.0 in a new cluster today .. it went in OK and all working without issues .. 21:06
admin16 have not done an upgrade from 23.2.0 -> 24.0 yet 21:06
admin16 maybe this is where the issues are coming from? 21:06
spatel i am doing 23.1.0 -> 24.0.0 upgrade 21:07
jrosser well hold on 21:07
jrosser it's this that's missing /etc/openstack_deploy/pki/roots/VivoxIntermediate/certs/VivoxIntermediate-chain.crt 21:07
jrosser see that it's in the "roots" directory - this is a CA cert 21:07
spatel here is the tree - https://paste.opendev.org/show/812416/ 21:08
spatel hmm 21:09
spatel why didn't rabbitmq complain? 21:10
jrosser you maybe fail because of not running the certificate authority playbook 21:11
jrosser these tasks should generate the CA chain https://github.com/openstack/ansible-role-pki/blob/master/tasks/standalone/create_ca.yml#L128-L144 21:11
jrosser rabbitmq has its way of setting up SSL 21:11
spatel hmm 21:11
jrosser libvirt is different and messy 21:11
jrosser and requires the root and intermediate to be combined in the same file 21:12
jrosser most things want the server cert and the intermediate 21:12
jrosser but libvirt is just weird for some reason 21:12
spatel is there a way to say don't use SSL for libvirt ? (anyway i am not using live vm migration feature because of SRIOV) 21:13
jrosser so it is probably the case that in 24.x.y you need the -chain CA cert which would not have been created in a 23.x.y install, because we did no libvirt ssl there 21:13
jrosser please don't 21:13
spatel ok 21:14
jrosser you need it for consoles and stuff 21:14
spatel let me run CA auth playbook 21:14
jrosser that would be great 21:15
jrosser it should hopefully create that missing file 21:15
spatel jrosser but one more thing i do have - /etc/openstack_deploy/pki/roots/VivoxIntermediate/certs/VivoxIntermediate.crt 21:17
spatel but playbook looking for VivoxIntermediate-chain.crt file 21:17
spatel why is it adding -chain to the file? 21:18
jrosser /o\ 21:18
jrosser read this carefully :) https://github.com/openstack/ansible-role-pki/blob/master/tasks/standalone/create_ca.yml#L128-L144 21:18
opendevreview James Denton proposed openstack/openstack-ansible-ops master: Update MNAIO for Focal  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/824486 21:18
jrosser it concatenates the root and the intermediate into one file 21:19
jrosser `cat {{ cert_path }} {{ ownca_path }} > {{ cert_chain_path }}` 21:19
spatel ohhh 21:19
jrosser that is necessary, because that is what libvirt wants 21:20
jrosser it needs the root and the intermediate in the same file, for $unknown-reason 21:20
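The concatenation quoted above can be sketched in Python as follows. This is an illustration only, not how the PKI role does it (the role runs the `cat` command shown). write_chain is a hypothetical helper whose arguments mirror the template variables in that command. Unlike plain `cat`, it adds a trailing newline to any input that lacks one so the PEM blocks cannot run together.

```python
from pathlib import Path


def write_chain(cert_path, ownca_path, chain_path):
    """Write cert_path followed by ownca_path into chain_path.

    Same order as `cat cert ownca > chain`: the certificate first,
    then the CA that signed it.
    """
    parts = [Path(cert_path).read_bytes(), Path(ownca_path).read_bytes()]
    # Unlike cat, make sure each PEM block ends with a newline.
    chain = b"".join(p if p.endswith(b"\n") else p + b"\n" for p in parts)
    Path(chain_path).write_bytes(chain)
    return chain_path


# Example call with placeholder file names:
# write_chain("Intermediate.crt", "Root.crt", "Intermediate-chain.crt")
```

A quick check that the result holds two certificates is to count the "BEGIN CERTIFICATE" markers in the output file.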
spatel :) - this step is required - openstack-ansible certificate-authority.yml 21:21
spatel i can see chain now :) 21:21
jrosser awesome! 21:21
jrosser i kind of explained why yesterday 21:22
jrosser the major upgrade instructions tell you to do this first `openstack-ansible setup-hosts.yml --limit '!galera_all:!rabbitmq_all' -e package_state=latest` 21:23
spatel i thought it would create a brand new CA certificate again so i ignored that step 21:23
jrosser ooohhhh no 21:23
jrosser that would be a big disaster 21:23
spatel yes that is why i was confused and asked why we are regenerating the CA during the upgrade process 21:24
jrosser there is a variable to do that specifically https://github.com/openstack/ansible-role-pki/blob/master/defaults/main.yml#L67-L68 21:24
jrosser well you make a good point, that is really bad wording 21:25
spatel Ok, in short it's saying to run openstack-ansible certificate-authority.yml 21:25
jrosser let's fix that 21:25
spatel +1 wording is wrong here - To generate new CA, you will need to run the following command: 21:25
jrosser regenerating the CA will destroy the deployment 21:26
jrosser until you re-run all the playbooks completely 21:26
spatel or better we add - If you already have a CA then it will ignore it 21:27
jrosser it's there for a good reason, the state of the CA needs to be up to date 21:27
jrosser like that missing chain file 21:27
spatel yes.. 21:27
spatel because of the wording i got scared and ignored it, but looks like it's safe to run (so better to change the wording to something like "just run this command for the cert related things", or whatever is best) 21:29
spatel what will happen after 10 years when this CA expires? 21:30
jrosser well, when you override the default CA for the initial deployment you can set the duration 21:31
spatel default is 10 years, correct? 21:32
jrosser there is no process yet for rolling the CA cert to a new one 21:32
jrosser so ideally you make the root CA last a very long time 21:32
jrosser and the intermediate is much easier to rotate 21:33
spatel I forgot to set an end date so assuming it's the 10 year default for self-signed, but if i want to change this date today, is it safe to change the date/time and re-create the CA? 21:34
jrosser possibly, you'd have to try 21:34
spatel hmm 21:35
jrosser but you will certainly break things if you use pki_regen_ca 21:35
spatel hmm 21:35
jrosser i would *guess* that re-signing the same private key with a new expiry date would work, but that is total guesswork 21:36
spatel i can totally try that and see in lab 21:36
jrosser take copies of the existing stuff 21:36
spatel yep 21:37
spatel rabbitMQ has its own CA etc., correct? so it won't come under that requirement 21:37
jrosser rabbitmq has had its own for a very long time now in OSA 21:38
jrosser but this was the first thing ported to the PKI role https://github.com/openstack/openstack-ansible-rabbitmq_server/blob/master/tasks/main.yml#L47-L61 21:40
spatel it's 10 years, i just checked 21:40
jrosser i am looking at what pki_regen_ca actually does 21:40
jrosser it seems to not touch the private key 21:41
jrosser it will regenerate the CSR with whatever settings you have and then re-sign 21:41
spatel what else will break in the current scenario? mysql and nova are the only two components tied to pki, correct? 21:48
jrosser rabbitmq 21:48
jrosser keep the old files, you can undo anything bad if you keep them 21:49
spatel yes 21:49
spatel I will keep backup :) 21:49
jrosser openstack_hosts role needs to be run if you update the CA cert 21:49
jrosser that's where it gets put into the trust store of all-the-things 21:49
spatel Ok 21:52
spatel How will third-party provided certs work with this process? 21:52
spatel I don't think we can use that CA/cert :) 21:53
opendevreview Jonathan Rosser proposed openstack/openstack-ansible master: Clarify major upgrade documentation for updating internal CA  https://review.opendev.org/c/openstack/openstack-ansible/+/826782 21:53
jrosser the CA run by OSA is intended to be internal to the deployment 21:53
jrosser it's so that the internal components can talk over TLS 21:53
spatel yes 21:54
jrosser but it also happens that with no overrides, it is also used for haproxy 21:54
jrosser unless you make some extra settings 21:54
jrosser haproxy actually is a complicated example 21:55
spatel In my case i have F5 so do i need to install certificate in F5 in advance? 21:55
spatel Is this pki required by the LB or not? 21:56
jrosser let's concentrate on haproxy for a moment 21:57
jrosser there was always a variable like this https://github.com/openstack/openstack-ansible-haproxy_server/blob/master/doc/source/configure-haproxy.rst#securing-haproxy-communication-with-ssl-certificates 21:57
jrosser where you could supply the path to your own cert and key 21:57
jrosser and those vars are still valid, and if they are set then the PKI role will not be used for making the external VIP certificate 21:58
jrosser for the case of using an F5, you would do the same thing as you did before for the external IP 21:58
jrosser however, we are starting to also support https now on the internal VIP 21:59
jrosser and probably the best thing to do there is to have the PKI role generate an internal cert/key for you 21:59
spatel in that case i have to install self-signed certificate on F5 + CA too 21:59
jrosser and then manually install that to the F5 21:59
jrosser there is already a hook for you to do this https://github.com/openstack/openstack-ansible/blob/master/playbooks/certificate-generate.yml 22:00
spatel Do i need to do anything currently in F5 when upgrading from W -> X ? 22:01
jrosser if you leave the internal VIP as http, nothing changes 22:01
jrosser and if you are doing an upgrade of an existing deployment, you should leave it as http 22:01
spatel cool! i have no plan to turn that on anytime soon 22:01
jrosser anyway, playbooks/certificate-generate.yml is worth knowing about 22:02
spatel I believe default is http, correct? 22:02
jrosser if you have some other service that needs a cert generated from the OSA CA then that playbook can make whatever you need 22:03
spatel in my case F5, correct? 22:03
jrosser if you wanted to do the internal VIP 22:04
spatel Yes i can test that out in lab sure.. 22:04
jrosser it creates them but doesn't install them anywhere 22:04
spatel yes i have to do that manually, correct 22:04
jrosser make any variables you like with the prefix user_pki_certificates_ 22:04
jrosser it's run as part of setup-hosts 22:05
jrosser but you can do it anytime, it doesn't do anything except create the certs on the deploy host 22:05
spatel good to know and worth trying 22:06
spatel my br-mgmt is not routable so kind of secure 22:06
spatel hope we soon have a good doc about the PKI bits and bytes so it's easy to understand and use some of the cool stuff 22:08
jrosser https://docs.openstack.org/openstack-ansible/latest/user/security/ssl-certificates.html ? 22:11
jrosser if you think there is something missing here please say 22:12
spatel :) oh i didn't see that 22:13
spatel Finally my upgrade about to finish and all looks good.. 22:16
spatel Thank you so much for your help on time :) 22:18
opendevreview Jonathan Rosser proposed openstack/openstack-ansible master: Clarify the difference between generating and regenerating certificates  https://review.opendev.org/c/openstack/openstack-ansible/+/826786 22:20
jrosser no problem - it is good to get new things like the TLS stuff properly tested 22:20
jrosser all deployments are different in some ways and the problems only appear when we kick the tyres a but 22:21
jrosser *bit 22:21
jrosser if you have some notes about how to extend the life of the root CA that would be useful 22:21
spatel I will let you know after i try that out.. will share with you for sure 22:29
spatel Gotta go!! good night 22:31
