Monday, 2021-12-27

*** akahat|ruck is now known as akahat05:38
admin1\o08:11
opendevreviewDmitriy Rabotyagov proposed openstack/ansible-role-python_venv_build master: Replace virtualenv with exacutable for pip  https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/82299808:17
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Update ansible-core to 2.12.1  https://review.opendev.org/c/openstack/openstack-ansible/+/82206308:18
*** akahat is now known as akahat|PTO08:28
kleinihttps://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/healthcheck-hosts.yml#L122 <- how does this make sense? I configured very different IP addresses on the internal networks of my deployment.08:41
noonedeadpunkit does in CI :D08:43
noonedeadpunkbut yeah, you're right, we need to adjust that08:44
noonedeadpunkdo you want to patch?08:44
kleiniSo, do the healthcheck playbooks only make sense in CI not in prod?08:44
noonedeadpunkI think it should be both ideally08:45
kleiniOkay, let me check, how I can add the actual management network IP there08:45
noonedeadpunkI'd say it should be smth like {{ management_address }} there08:45
kleinidigging in my memories how ansible debugging works08:46
noonedeadpunkthat said, the next line is not valid either, due to `openstack.local`08:47
admin1hi noonedeadpunk .. i tested upgrade from rocky  xenial -> bionic twice in the lab .. there the repo was built and all worked fine .. yesterday i dropped one server in production and the repo is not built .. is there a way to force repo build ? 08:48
noonedeadpunkthat should be {{ openstack_domain }} (or {{ container_domain }} actually)08:48
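
(For reference, a minimal sketch of the change being discussed for healthcheck-hosts.yml — the real task layout around line 122 may differ, and the ping/getent commands here are only illustrative; management_address and openstack_domain are the variables noonedeadpunk suggests above:)

    - name: Check connectivity over the management network
      hosts: all_containers
      gather_facts: false
      tasks:
        - name: Ping the per-host management address instead of a hardcoded IP
          command: "ping -c 2 {{ management_address }}"
          changed_when: false

        - name: Resolve the host FQDN instead of the hardcoded openstack.local entry
          command: "getent hosts {{ inventory_hostname }}.{{ openstack_domain }}"
          changed_when: false
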
noonedeadpunkadmin1: I have super vague memories about the old repo_build stuff and it was always super painful tbh...08:49
noonedeadpunkI would need to read through the code the same way you'd do that...08:49
noonedeadpunkadmin1: how does it fail, at least?08:50
admin1it does not fail .. it skips the build 08:52
admin1let me gist one run 08:52
noonedeadpunkoh, there was some var for sure to trigger that....08:52
noonedeadpunk`repo_build_wheel_rebuild`08:53
noonedeadpunkand `repo_build_venv_rebuild`08:53
noonedeadpunkdepending on what exactly you want08:53
noonedeadpunkBut I'd backup repo_servers before doing that08:53
admin1even if i limit to just the new container ? 08:54
admin1on bionic ? 08:54
noonedeadpunkyou can't do this08:54
noonedeadpunkoh, well, repo container?08:54
admin1c3 and c2 are the old repo containers ....   c1 is bionic ..  maybe i can do openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container-xxx 08:55
noonedeadpunkwell, the question is also how lsyncd is configured, as at some point it had the --delete flag, so whatever you build could be dropped by lsyncd08:56
admin1c3 has the lsyncd ( master) ... i had stopped lsyncd there 08:56
noonedeadpunkbut other than that it might work, yes08:56
admin1is a repo built on c1 under bionic overwritten by the lsyncd that runs on c3 ? 08:56
noonedeadpunkdo you know how rsync with --delete works ?:)08:57
admin1what is the repo build location in repo containers .. just that i can check if the data is there and back it up .. 08:57
admin1i do 08:57
noonedeadpunklsyncd runs on the source, all others are destinations08:57
admin1i hope my data is still there 08:57
noonedeadpunkso it just triggers rsync from c3 with --delete08:57
admin1got it 08:58
noonedeadpunkyou can check nginx conf for that/ but it's /var/www/repo/08:59
noonedeadpunkvenv iirc08:59
noonedeadpunk*./venvs08:59
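
(To illustrate the --delete behaviour being described — this is not the actual OSA lsyncd setup, which is driven by lsyncd.lua on the master repo container, just an equivalent one-shot push expressed as an Ansible task; the container names and pattern are illustrative:)

    - name: Mirror repo contents from the lsyncd master to the other repo containers
      hosts: "repo_all:!c3_repo_container-xxx"       # every repo container except the source (illustrative pattern)
      tasks:
        - name: rsync /var/www/repo/ from the master with --delete semantics
          synchronize:
            src: /var/www/repo/
            dest: /var/www/repo/
            delete: true      # anything on the destination that is not on the source gets removed,
                              # which is why wheels built locally on c1 can disappear after a sync from c3
          delegate_to: "c3_repo_container-xxx"       # illustrative name for the lsyncd master
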
admin1i see data in c2 .. 08:59
admin1checking in c3 09:00
admin1its there 09:00
admin1so if i set c2 and c3 (lsyncd.lua) on MAINT, disable lsyncd on c3, enable c1 (bionic) as READY in haproxy, and run openstack-ansible repo-install.yml -v -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true -l c1_repo_container_xxx , it should theoretically build the stuff in c1 ? 09:03
noonedeadpunkhm, so from what I see, the playbook itself decides which repo containers will be used as build targets... https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L33-L4409:08
noonedeadpunkSo I'd really expect that stuff to be built for c1 just by default...09:08
noonedeadpunkwait...09:09
noonedeadpunkok, gotcha, that is ridiculous...09:11
noonedeadpunkor not)09:12
noonedeadpunkso you sure that you don't have anything related to bionic in c1 in /var/www/repo/pools ?09:12
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation  https://review.opendev.org/c/openstack/openstack-ansible/+/82291209:16
noonedeadpunkthis seems to solve our issue with the failing lxc jobs. Ultimately I believe only adding setuptools would help, but the virtualenv part is somewhat messy imo in ansible. it might be fine if we used it for creation, but we have a command for that anyway. https://review.opendev.org/c/openstack/ansible-role-python_venv_build/+/822998 09:19
kleinihealthcheck-hosts.yml is green now. Will provide my fixes.09:19
noonedeadpunknice!09:19
admin1noonedeadpunk, i rm -rf the /var/www , rebooted the container and retrying .. 09:21
admin1noonedeadpunk, its searching for some repo_master role .. .. https://gist.githubusercontent.com/a1git/8f4df96f5933d0db944267ac70f584ea/raw/f04f70263e66fe74337ff27931e40a863900eff7/repo-build2.log09:28
admin1maybe i should not disable c3 ( lsyncd.lua )   master 09:31
admin1will enable that and retry ..  without the limit 09:31
admin1i guess there is no way to say rebuild only for 18.04 but skip the 16.04 stuff that is already there. 09:38
noonedeadpunkwell, that's what I suspected kind of....09:40
noonedeadpunkbut eventually I thought that limit might affect the way this dynamic group will be generated09:40
admin1well, i started on -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true  without any limits .. i suspect 16.04 might fail ..if checksums are missing or something as its too old .. but 18 might be built ..  .. if it fails , then i have backup of /var/www/ which i can restore and retry again09:41
noonedeadpunkhm [WARNING]: Could not match supplied host pattern, ignoring: repo_masters09:41
noonedeadpunkbut you kind of have `{"add_group": "repo_servers_18.04_x86_64", "changed": false, "parent_groups": ["all"]}`09:43
noonedeadpunkI'm kind of afraid about https://opendev.org/openstack/openstack-ansible/src/branch/stable/rocky/playbooks/repo-build.yml#L3809:44
noonedeadpunkbut eventually this should add a host per each OS version09:44
noonedeadpunknot sure why this does not happen09:44
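
(Roughly what that part of the rocky repo-build.yml is doing, going by the linked lines and the "repo_servers_18.04_x86_64" add_group output quoted above — simplified here; the real play carries more conditions, and a later play then builds the repo_masters group out of these per-distro groups, which is where the unmatched pattern warning comes from:)

    - name: Group repo servers by OS version and architecture
      hosts: repo_all
      gather_facts: true
      tasks:
        - name: Create a dynamic group per distro/arch, e.g. repo_servers_18.04_x86_64
          group_by:
            key: "repo_servers_{{ ansible_distribution_version }}_{{ ansible_architecture }}"
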
admin1strange thing is this worked twice on lab .. i used the same config and variable .. even kept the domain name and ips the same  and there it upgraded fine  just like the documentation .. 09:45
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: [doc] Update infra node scaling documentation  https://review.opendev.org/c/openstack/openstack-ansible/+/82291209:46
admin1even if 16.04 is gone, that is ok .. as we are not growing now in 16.04 .. so as long as it builds 18.04 i think its good enough09:46
noonedeadpunkwell, repo-build used to fail weirdly for me as well, even with exactly the same deployments that had been passing.. I'm actually glad we got rid of it....09:46
noonedeadpunkwell, I can suggest nasty thing then - edit inventory (and openstack_user_config) to have repo container only on c109:47
admin1 "openstack-ansible repo-install.yml -vv -e repo_build_wheel_rebuild=true -e repo_build_venv_rebuild=true" is running now .. if this fails, then will try that one09:48
noonedeadpunkbut I think the question is not only about growing, but also about maintaining the existing xenial hosts09:48
noonedeadpunkas I believe you will fail even when just trying to adjust some config09:48
noonedeadpunknot repo-install.yml, repo-build.yml09:48
admin1 i don't want to Ctrl-C it .. but it does call repo-build also 09:49
noonedeadpunkyeah, just wasting time :)09:50
admin1its on the "repo_build : Create OpenStack-Ansible requirement wheels" task, so i think its working .. 09:50
admin1i see it building in c1 .. finally \o/ 09:52
admin1wheel_build log 09:52
admin1i know its not relevant anymore, but out of curiosity .. if lsync is on c3, but the new bionic is on c1, does it copy from c1 -> c3 and then lsync it again from c3 ? 09:57
noonedeadpunknope, it's not copied from c110:01
noonedeadpunkwe never managed to get this flow working really properly. 10:01
admin1its done .. i see both 16 and 18 packages in c3 and only 18 in c1 10:07
noonedeadpunkoh?10:07
admin1checking with keystone playbook if all is good .. 10:07
admin1its complaining about /etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys. .. .. is it safe to login inside the venv and issue the create command ? 10:16
admin1glance went in ok .. 10:23
admin1i will disable keystone and do the rest .. will check into keystone individually later 10:23
admin1quick question .. when all of this is upgraded (hopefully) , do i have to upgrade 1 version at a time ? or can i jump a few versions at once ? 10:25
noonedeadpunkwell, I was jumping R->T and T->V10:41
admin1ok 10:41
admin1except keystone complaining about fernet keys, all other services are almost installed .. no errors.. and i used the newly built repo server to ensure it has all the packages 10:42
noonedeadpunkand it went pretty well. But you might go your own way:) in the end, no upgrades except version+1 are tested by any project10:42
noonedeadpunkand nova now explicitly blocks such upgrades from W10:42
admin1this cluster is with integrated ceph .. .. i think i need to bump ceph version at some point as well 10:42
admin1i will do it slow .. 1 version at a time .. 10:43
*** chkumar|rover is now known as chandankumar10:50
admin1can osa handle letsencrypt ssl automatically if the domain is pointed to the external vip ? 10:54
admin1sorry .. ignore that question 10:55
admin1except keystone all things look good :) 10:56
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone master: Drop keystone_default_role_name  https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/82300311:06
admin1noonedeadpunk, seen this error before ? /etc/keystone/fernet-keys does not contain keys, use keystone-manage fernet_setup to create Fernet keys 11:38
admin1i did the setup command .. but it did not work 11:38
admin1i did the setup command ..  keystone-manage fernet_setup --keystone-user keystone --keystone-group service .. but it did not help 11:39
noonedeadpunkmight be smth related to symlinking?11:39
noonedeadpunk /etc/keystone is likely a symlink in R11:39
admin1its a directory 11:41
*** sshnaidm|afk is now known as sshnaidm11:42
noonedeadpunkhm... I think error must be logged anyway in /var/log/keystone?11:46
admin1this is all it has ...100s of lines .. 11:47
admin1https://gist.githubusercontent.com/a1git/24a333b2976a798a502eb5201f651a60/raw/fcd005b718e8be8aebb6422c40c5083f99d31d61/gistfile1.txt11:47
admin1i will try to nuke this container and retry 11:48
admin1i think it was because i was using it with a limit 12:01
admin1i did it without limit and it just worked 12:01
admin1doh !12:01
noonedeadpunkah, just so you know, keystone with a limit never works12:05
noonedeadpunkI was just updating https://review.opendev.org/c/openstack/openstack-ansible/+/822912/5/doc/source/admin/scale-environment.rst to mention that)12:06
admin1how do i run only mds and mon role but not the osd role 12:09
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable service_token requirement by default  https://review.opendev.org/c/openstack/openstack-ansible/+/82300512:25
noonedeadpunkadmin1: at the least you can leverage --limit to target only ceph_mons, for example13:01
noonedeadpunkbut there may also be tags that would allow doing that13:01
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300913:06
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300913:07
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300913:09
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300913:52
noonedeadpunkso, zun now fails a single tempest test that is easily reproducible in an aio - test_run_container_with_cinder_volume_dynamic_created14:39
noonedeadpunkI wonder if it should be run at all considering that test_run_container_with_cinder_volume is disabled because of bug https://bugs.launchpad.net/zun/+bug/189749714:40
noonedeadpunkhttps://github.com/openstack/zun-tempest-plugin/blob/master/zun_tempest_plugin/tests/tempest/api/test_containers.py#L38014:41
noonedeadpunkah, no: `No iscsi_target is presently exported for volume`. So it's just our CI that is broken, I guess14:58
admin1is ubuntu-esm-infra.list part of osa ? 15:48
admin1what happened is  xenial has version 13.0 of ceph (mimic) ..  bionic got version 12.0 of ceph -- both point to the same mimic repo .. but i found this extra deb entry `https://esm.ubuntu.com/infra/ubuntu xenial-infra-security main` on the xenial host, in a file with the name  15:49
admin1sources.list.d/ubuntu-esm-infra.list15:49
noonedeadpunkno, I don't think it is16:02
noonedeadpunkcan't recall having that16:02
admin1i found out .. in bionic ceph pinning, its like this Pin: release o=Ubuntu .. while in xenial its Pin: release o=ceph.com 16:09
admin1so one got pinned via ceph.com, other via Ubuntu16:09
noonedeadpunkyeah, but pinning not in sources16:13
admin1changing it manually to ceph.com and then apt upgrade reboot fixed it 16:29
admin1one server is done .. 2 more controllers to go 16:29
jrosser_on old releases like this there are variables to set that choose whether it takes the Ubuntu ceph packages, the UCA ones or the ones at ceph.com16:36
jrosser_it’s not automatic to choose the one you need/want16:36
admin1i changed to ceph.com, rebooted .. they are good to go .. then i ran playbooks again, it set it back to Ubuntu , but since package already upgraded, it did not downgrade it .. so i am good 16:38
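
(For reference, the manual pin admin1 describes could be written as a task like the one below — the file name, target group and priority are illustrative; as jrosser_ points out, the durable fix is the ceph package-source variable in the relevant role rather than a hand-edited pin, since rerunning the playbooks will otherwise put the Ubuntu pin back:)

    - name: Prefer the ceph.com origin over the Ubuntu archive for ceph packages
      hosts: ceph_all          # illustrative target; apply wherever the wrong pin landed
      tasks:
        - name: Write an apt preferences entry pinning ceph packages to o=ceph.com
          copy:
            dest: /etc/apt/preferences.d/ceph_pin.pref    # illustrative file name
            content: |
              Package: *
              Pin: release o=ceph.com
              Pin-Priority: 1001
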
admin1one long-time wishlist item is to run actual swift using osa .. 16:48
admin1because its not strong consistency but eventual consistency, i can place the servers in two datacenters ( even with high latency) and be sure that backups are protected 16:48
noonedeadpunkwell, with ceph you can have rgw (with swift and s3 compatibility) relatively easily16:49
noonedeadpunkand do cross-region backups as well16:49
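
(A rough idea of what the rgw-to-keystone wiring can look like when ceph is managed by ceph-ansible — the option names are standard radosgw settings, but the section name, user and secret variable here are made up for illustration, and the exact override mechanism depends on the ceph-ansible version in use:)

    # group_vars for the rgw hosts (illustrative)
    ceph_conf_overrides:
      "client.rgw.rgw0":                                   # illustrative rgw instance name
        rgw_keystone_url: "http://<internal-vip>:5000"     # the OSA internal keystone endpoint
        rgw_keystone_api_version: 3
        rgw_keystone_admin_user: swift                     # illustrative service user
        rgw_keystone_admin_password: "{{ rgw_keystone_password }}"   # illustrative secret variable
        rgw_keystone_admin_project: service
        rgw_keystone_admin_domain: default
        rgw_keystone_accepted_roles: "member,admin"
        rgw_swift_account_in_url: true
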
opendevreviewJames Denton proposed openstack/openstack-ansible-os_glance master: Define _glance_available_stores in variables  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82289916:53
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300917:21
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Add boto3 module for s3 backend  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82287017:21
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_glance master: Support service tokens  https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/82300917:21
opendevreviewMarcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration  https://review.opendev.org/c/openstack/openstack-ansible/+/82302317:31
admin1noonedeadpunk, jrosser_ .. thanks for all the help and support .. 17:31
kleinihttps://review.opendev.org/c/openstack/openstack-ansible/+/774472 <- this commit removed the openstacksdk which is used by healthcheck-openstack.yml. How does it work in CI, if it fails for me in prod?17:57
noonedeadpunkkleini: we don't run healthcheck-openstack.yml in CI18:13
noonedeadpunkultimately, I think it either needs to spawn its own venv with the clients or be delegated to the utility container (the second is easier)18:13
kleiniso only healthcheck-hosts.yml is run in CI?18:16
kleiniand healthcheck-openstack.yml works again when openstacksdk is added back to requirements.txt18:17
kleiniwill try to delegate healthcheck-openstack.yml to utility container. need to find some example, how to do that18:17
noonedeadpunkand healthcheck-infrastructure.yml is also run18:23
noonedeadpunkhealthcheck-openstack is not, as tempest or rally is the far better way to test openstack18:24
noonedeadpunkas in the end this playbook needs to be maintained by us, while tempest is maintained by the service developers18:25
kleiniokay, will skip it then and stick to tempest18:25
noonedeadpunkkleini: essentially, I think you just need to replace `hosts: localhost` with `hosts: utility_all[0]`18:25
noonedeadpunk* `hosts: groups['utility_all'][0]`18:26
kleinidid that and it says again that openstacksdk is missing18:27
kleiniso maybe additionally the venv needs to be set18:27
noonedeadpunkand set ansible_python_interpreter: "{{ utility_venv_bin/python }}"18:27
noonedeadpunk* "{{ utility_venv_bin }}/python"18:28
noonedeadpunkbut tbh I'd rather drop that playbook in favor of tempest, unless somebody really wants to maintain it and finds it useful18:32
kleiniworks18:34
kleinitempest is hard for me to configure in regard to which tests should be run. there is no list of tests, no list of suites18:35
kleiniand the "smoke" suite is nearly useless. it tests only the keystone API18:36
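
(One hedged example of narrowing what tempest runs when it is deployed through the os_tempest role — the variable name below is from memory and has changed between releases, so check the role defaults before relying on it; the regex list itself is illustrative:)

    # user_variables.yml
    tempest_test_whitelist:
      - tempest.api.identity.v3
      - tempest.api.compute.servers.test_create_server
      - tempest.scenario.test_server_basic_ops
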
opendevreviewMarcus Klein proposed openstack/openstack-ansible master: fix healthcheck-hosts.yml for different configuration  https://review.opendev.org/c/openstack/openstack-ansible/+/82302318:47
jrosser_kleini: if you want to validate your install you should look at refstack https://refstack.openstack.org19:42
jrosser_that bundles tempest and a defined set of tests for validating interoperability19:42
admin1with the PKI certs in place, is the keystone url for ceph object storage still  http://<internal-vip>:5000 or something else ? 21:20
admin1to integrate osa managed openstack and ceph-ansible managed ceph 21:20
admin1to add swift/s3 of ceph to openstack 21:21
jrosser_admin1: although we have the PKI role in place now, the internal endpoint still defaults to http rather than https22:02
jrosser_there are instructions here for if you want to switch that to https, which you could do on fresh deployments https://github.com/openstack/openstack-ansible/blob/master/doc/source/user/security/ssl-certificates.rst#tls-for-haproxy-internal-vip22:03
admin1how recommended is it to use https:// for internal traffic ..  i have tested and internal is robust ( i mean does not leak to guests ) 22:04
jrosser_for the internal keystone endpoint, when you enable https, the certificate should be valid for whatever fqdn or ip you have defined the internal vip as22:04
jrosser_we will switch to defaulting to https at some future release22:04
jrosser_currently there is no upgrade path for that so the default remains as http22:04
admin1question is .. for those doing ceph-ansible + osa ..  if we switch to pki, since its self signed cert, do we need to copy the ca certs etc to ceph mons as well ? 22:05
jrosser_well, there is an upgrade path if you're happy that the control plane is broken for the period of doing an upgrade22:05
admin1its all in lab .. 22:05
jrosser_it is not self signed22:05
jrosser_it creates a CA root, which is self signed22:05
jrosser_so there is a CA cert you can copy off the deploy node and install onto whatever else you want22:06
admin1so for the ceph-mons to connect to https:// keystone,  that ca cert is the only thing that needs to be copied over  .. 22:06
jrosser_probably22:06
jrosser_different things tend to behave differently22:06
jrosser_libvirt has different needs to python code, for example22:07
jrosser_for where you put the CA, if it needs a copy of the intermediate CA, if it wants a cert chain blah blah blah22:07
jrosser_first thing to do is add the OSA PKI root to the system CA store of your ceph nodes and see if thats good enough22:08
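
(A sketch of that first step — the source path is an assumption about where the PKI role leaves the root certificate on the deploy host, so locate the actual file before using anything like this, and the target group name depends on how ceph-ansible names your mons:)

    - name: Trust the OSA PKI root CA on the ceph nodes
      hosts: mons                    # illustrative ceph-ansible group
      tasks:
        - name: Copy the root CA from the deploy host into the system CA directory
          copy:
            src: /etc/openstack_deploy/pki/roots/ExampleRootCA/certs/ExampleRootCA.crt   # assumed path
            dest: /usr/local/share/ca-certificates/osa-root-ca.crt
            mode: "0644"

        - name: Refresh the system trust store (Debian/Ubuntu)
          command: update-ca-certificates
          changed_when: true
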
admin1yeah . no hurry now .. just doing some future roadmap plannings . 22:08
jrosser_if not, then dig into the ceph docs to find what it wants22:08
jrosser_there are some things we don't yet test really22:09
jrosser_like providing your own external cert, and also using the PKI role for internal22:09
admin1if we provide own external cert, is that cert used instead of pki ? 22:10
jrosser_or providing your own intermediate CA / key from an existing company CA for osa+PKI to use22:10
jrosser_for what? there are now lots of certs22:10
admin1:D 22:10
jrosser_for external, you would use the vars in the haproxy role, which should be very similar/same as before22:11
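
(From memory, pointing haproxy at a user-supplied external certificate is a handful of user_variables.yml overrides along these lines — double-check the exact variable names against the haproxy_server role defaults for your release, and the paths are of course illustrative:)

    # user_variables.yml
    haproxy_user_ssl_cert: /etc/openstack_deploy/ssl/cloud.domain.com.crt
    haproxy_user_ssl_key: /etc/openstack_deploy/ssl/cloud.domain.com.key
    haproxy_user_ssl_ca_cert: /etc/openstack_deploy/ssl/cloud.domain.com-chain.crt
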
admin1for example .. if we change the internal vip for example to   cloud-int.domain.com and external vip to  cloud.domain.com  and provide a san/wildcard that satisfies both  cloud-int and cloud.domain.com, can that cert be used for internal and external instead of the self-signed pki ? 22:11
jrosser_that sort of misses the point22:13
jrosser_you need ssl on rabbitmq today regardless, and that is coming from the PKI role22:13
jrosser_so the internal VIP is really just one of very many places that certificates are required22:14
admin1that is true .. 22:14
jrosser_and imho it is more important to have well designed trust on the internal SSL, more important than it being a certificate from a "real issuer"22:15
admin1some "customers" really insist on having everything "certified" :) 22:16
jrosser_the trouble is you can't have certificates issued for rfc1918 ip addresses, or things which are not public22:16
jrosser_so thats basically broken thinking22:16
admin1yeah 22:18
jrosser_having a private internal CA is more secure than a publicly trusted one22:18
jrosser_because the internal things will only authenticate with each other, not with an external hacker22:18
