Thursday, 2023-07-20

07:21 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Install mysqlclient devel package  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888985
07:23 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_adjutant master: Fix linters and metadata  https://review.opendev.org/c/openstack/openstack-ansible-os_adjutant/+/888469
07:25 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Refactor LXC image expiration  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888278
07:26 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Fix linters issue and metadata  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180
07:27 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Fix linters issue and metadata  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888180
07:27 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Add retries to LXC base build command  https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/888750
07:44 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-galera_server stable/zed: Add optional compression to mariabackup  https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/887143
07:45 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Include proper vars_file for rally  https://review.opendev.org/c/openstack/openstack-ansible/+/888656
07:45 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_rally stable/yoga: Include proper commit in rally_upper_constraints_url  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/887681
07:46 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_rally stable/yoga: Include proper commit in rally_upper_constraints_url  https://review.opendev.org/c/openstack/openstack-ansible-os_rally/+/887681
07:48 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Include proper vars_file for rally  https://review.opendev.org/c/openstack/openstack-ansible/+/888656
08:26 <kleini> https://paste.opendev.org/show/bZj7Yq3mmW8wWi1e9pqj/ <- I have this issue during the upgrade to 26.1.2. The SSH keyfiles are there and I can properly read them with ssh-keygen. The generated public key matches the public key file. Do you have any hints about what is wrong? ssh-keygen does not ask me for a passphrase for the private key when showing the public one with ssh-keygen -y -e -f private
09:03 <noonedeadpunk> kleini: what mode is the file in?
09:03 <noonedeadpunk> as IIRC it does fail if it is not 0600
09:04 <noonedeadpunk> and if it is stored in git, it won't be 0600
09:06 <noonedeadpunk> but if you say ssh-keygen can read them... huh
09:06 <noonedeadpunk> as I was thinking about this thingy https://github.com/ansible-collections/community.crypto/issues/564
09:08 <noonedeadpunk> kleini: do you have `backend: cryptography` in /etc/ansible/ansible_collections/openstack/osa/roles/ssh_keypairs/tasks/standalone/create_keypair.yml ?
09:09 <noonedeadpunk> as this could be fixed with https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/870997 but yeah, it's available only for Antelope and not Zed
09:10 <kleini> it was the file permissions. many thanks!
09:10 <noonedeadpunk> we probably can backport this patch to Zed
09:11 <noonedeadpunk> or you can propose it as well ;)
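The root cause kleini confirms here is file permissions: private keys restored from a git checkout come back as 0644, while the `openssh_keypair` module can refuse anything other than 0600. A minimal sketch of the check (the helper name and the use of a throwaway temp file are illustrative, not part of OSA):

```python
import os
import stat
import tempfile

def ensure_private_key_mode(path):
    """Return the key's permission bits, clamping them to 0600 if needed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o600:
        os.chmod(path, 0o600)  # what a fresh git checkout typically breaks
        mode = 0o600
    return mode

# demo on a throwaway file standing in for a deployed private key
with tempfile.NamedTemporaryFile(delete=False) as f:
    key_path = f.name
os.chmod(key_path, 0o644)  # git-restored keys often end up like this
print(oct(ensure_private_key_mode(key_path)))  # → 0o600
```

In a real deployment this is just `chmod 0600` on the keypair files after cloning the deploy configuration from git.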
09:12 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657
09:13 <kleini> on the staging system I don't have those files in git, just the configuration without upper constraints, pki, keypairs and so on, because I deploy staging freshly every time; IPs change, container UUIDs change and so on. but for production I have all files in git, resulting in wrong file permissions for the SSH private key files. *facepalm*
09:13 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657
09:15 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657
09:16 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657
09:17 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server stable/2023.1: Use wildcards to specify rabbit/erlang versions  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/888657
09:18 <opendevreview> Marcus Klein proposed openstack/openstack-ansible-plugins stable/zed: Use cryptography backend for openssh_keypair  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/888658
09:19 <kleini> too easy ;-)
09:20 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Use include_role in task to avoid lack of access to vars  https://review.opendev.org/c/openstack/openstack-ansible/+/888659
09:20 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Use include_role in task to avoid lack of access to vars  https://review.opendev.org/c/openstack/openstack-ansible/+/888660
09:20 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Use include_role in task to avoid lack of access to vars  https://review.opendev.org/c/openstack/openstack-ansible/+/889021
09:23 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Re-enable CI jobs after rally is fixed  https://review.opendev.org/c/openstack/openstack-ansible/+/889016
09:24 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Re-enable CI jobs after rally is fixed  https://review.opendev.org/c/openstack/openstack-ansible/+/889018
09:25 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Pin version of setuptools  https://review.opendev.org/c/openstack/openstack-ansible/+/889022
09:27 <noonedeadpunk> kleini: nobody said it would be hard :) but this way it will get reviewed faster
09:28 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/yoga: Restore an ability for HAProxy to bind on interal IP  https://review.opendev.org/c/openstack/openstack-ansible/+/887577
09:29 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Restore an ability for HAProxy to bind on interal IP  https://review.opendev.org/c/openstack/openstack-ansible/+/887574
09:30 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Gather facts before including common-playbooks  https://review.opendev.org/c/openstack/openstack-ansible/+/889023
09:30 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins stable/zed: Skip updating service password by default  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/888153
10:43 <kleini> I now have an issue with zookeeper in the production deployment. incoming connections from the other zookeeper instances (in containers) seem to come in from their hosts, not the containers. what can be a possible cause for that? IPs and routing look the same as with all other LXC containers.
10:44 <kleini> the certificate check then fails because the source IP is wrong, and zookeeper drops the connection from the other zookeeper instance.
10:51 <noonedeadpunk> kleini: so it's the cluster connection that fails? Or a client connection?
10:51 <noonedeadpunk> as for client connections, there's a bug in the tooz library (fixed quite recently) that does not allow enabling encryption for clients
10:51 <noonedeadpunk> but clustering encryption should work
10:51 <kleini> it is the cluster connection
10:51 <anskiy> noonedeadpunk: https://zuul.opendev.org/t/openstack/build/42f7de398a5a42a498ffd264914301b1/log/logs/host/glance-api.service.journal-10-08-02.log.txt now it's glance that is broken. I think there is some problem with keystone, not nova
10:53 <anskiy> hmm: https://zuul.opendev.org/t/openstack/build/42f7de398a5a42a498ffd264914301b1/log/logs/host/keystone-wsgi-public.service.journal-10-08-02.log.txt#17579
10:53 <noonedeadpunk> anskiy: to be frank, I have only a vague understanding of why this can happen, and only in upgrade jobs
10:53 <noonedeadpunk> ah
10:53 <kleini> https://paste.opendev.org/show/bKFZxWBwt20oLxi8memH/ 10.20.150.2-4 are the infra hosts, while 127, 132, 184 are the zookeeper containers
10:53 <noonedeadpunk> but that's kinda "expected"
10:54 <noonedeadpunk> kleini: the only guess how this might happen is that zookeeper attempts to use eth0 instead of eth1
10:54 <noonedeadpunk> and eth0 has src NAT
10:54 <kleini> okay, so the bind seems to be wrong
10:54 <noonedeadpunk> but eth0 should not be routable, as lxcbr0 is isolated
10:54 <noonedeadpunk> I _think_ it binds to 0.0.0.0
10:55 <noonedeadpunk> but not sure
10:55 <noonedeadpunk> and the default route is through eth0 actually
10:55 <anskiy> I wonder why it says 10.20.150.132 in ListenHandler, does it bind on it?
10:55 <noonedeadpunk> there are different sets of settings for client and clustering
10:56 <anskiy> I mean, what's that address anyway? as it seems, that's not an infra host
10:56 <noonedeadpunk> I would need to read the zookeeper docs to recall what is what
10:56 <noonedeadpunk> but I'd check `ss` output to see where it is bound
11:02 <kleini> zookeeper is bound to the eth1 address in the containers. strangely, incoming connections seem to come in from their own host, not the other container's host...
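noonedeadpunk's guess above — a wildcard bind letting traffic leave via the SNAT'd eth0 — can be illustrated with two plain sockets. This is just a sketch of the bind-address difference; the loopback address stands in for a container's eth1 IP, and port 0 means "any free port":

```python
import socket

# A wildcard bind accepts traffic on every interface (eth0 and eth1 alike);
# combined with a default route and SNAT on eth0, that is how cluster
# traffic can appear to come from the host instead of the container.
wild = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wild.bind(("0.0.0.0", 0))        # port 0: let the kernel pick a free port
wild_addr = wild.getsockname()[0]

# An explicit bind pins the listener to a single address (e.g. eth1's IP).
pinned = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
pinned.bind(("127.0.0.1", 0))    # loopback stands in for the eth1 address
pinned_addr = pinned.getsockname()[0]

print(wild_addr, pinned_addr)    # → 0.0.0.0 127.0.0.1
```

On the affected container, `ss -tlnp` shows the same distinction: a `0.0.0.0:<port>` listener versus one pinned to the eth1 address.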
11:04 <noonedeadpunk> just to make sure - zookeeper is not running on the hosts as well?
11:05 <noonedeadpunk> as maybe you ended up with 6 zookeepers or something?
11:05 <kleini> damn, it is
11:05 <noonedeadpunk> ok, that's interesting
11:05 <kleini> I have 6 zookeepers
11:05 <anskiy> noonedeadpunk: well, for `openstack-ansible-upgrade_yoga-aio_metal-ubuntu-focal`, which succeeded, glance_service_password is 47 chars; for `openstack-ansible-upgrade-aio_metal-rockylinux-9` it's 61
11:05 <noonedeadpunk> is it our env.d that is failing, or is your own inventory weird?
11:06 <noonedeadpunk> anskiy: well, there's a configuration option for keystone to change the hashing method to remove this issue
11:06 <kleini> https://paste.opendev.org/show/biAjsSQbx4XG3GsA0CVU/ <- issue in inventory
11:06 <noonedeadpunk> it's due to bcrypt or something like that
11:07 <noonedeadpunk> kleini: yeah, but I wonder what caused it....
11:07 <noonedeadpunk> you used the default env.d file?
11:07 <noonedeadpunk> anskiy: password_hash_algorithm https://docs.openstack.org/keystone/latest/configuration/config-options.html#identity.password_hash_algorithm
11:08 <noonedeadpunk> bcrypt has a limit of 54, scrypt does not
11:08 <kleini> env.d is only modified to have cinder-volume in containers for Ceph-only backing storage
11:09 <noonedeadpunk> anskiy: https://opendev.org/openstack/keystone/src/commit/8ad765e0230ceeb5ca7c36ec3ed6d25c57b22c9d/releasenotes/notes/bug_1543048_and_1668503-7ead4e15faaab778.yaml
11:10 <anskiy> noonedeadpunk: so, is it better to change the user_secrets generation?
11:10 <noonedeadpunk> another way would be to ensure that our tooling does not make passwords longer than 54 by default
11:10 <noonedeadpunk> to be frank, I'd say that for new deployments it makes sense to start using scrypt...
11:10 <noonedeadpunk> anskiy: but I really wonder why it's an issue for this specific patch only
11:11 <noonedeadpunk> this should then be adjusted to be 54 max https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/pw-token-gen.py#L89
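The clamp noonedeadpunk proposes for pw-token-gen.py could be sketched like this. The function name and the 54-character constant are illustrative; the figure comes from the discussion above about keystone's bcrypt backend ignoring input past that length:

```python
import secrets
import string

# 54 comes from the discussion above: keystone's bcrypt hashing is said
# to silently ignore anything past that point, so longer service
# passwords stop matching after a re-hash on upgrade.
BCRYPT_SAFE_LEN = 54

def gen_service_password(requested_len=64):
    """Generate a random password, clamped so bcrypt hashes all of it."""
    alphabet = string.ascii_letters + string.digits
    length = min(requested_len, BCRYPT_SAFE_LEN)
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

A 64-char request then yields a 54-char password, while anything shorter is generated at its requested length.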
11:12 <kleini> sorry, I was wrong. there is no zookeeper instance on the hosts. the inventory looks the same as in staging
11:13 <noonedeadpunk> ugh... having zookeeper on the hosts would have been a way easier explanation
11:14 <noonedeadpunk> but yeah, coordination_all should contain containers and hosts - that's "as designed"
11:14 <noonedeadpunk> as the playbook runs against `zookeeper_all`
11:14 <noonedeadpunk> which should contain only containers
11:16 <kleini> I found some iptables rules that may be causing this
11:26 <kleini> https://paste.opendev.org/show/bCO5z7LFXcb6zI383WBB/ <- a masquerading rule that lets computes and network nodes reach the outside world through the infra hosts was not strict enough...
11:29 <anskiy> noonedeadpunk: there is this thing https://review.opendev.org/c/openstack/openstack-ansible/+/887866 and it's 2023.1 too, which fails in `nova-status upgrade check` (jammy) and glance (rocky)
11:31 <noonedeadpunk> kleini: I think at least one of the rules is created by the lxc_hosts role
11:31 <noonedeadpunk> the one for 10.0.3.0/24
11:31 <noonedeadpunk> not sure about the second one though
11:32 <kleini> yes. and the other one was from systemd-networkd, to allow computes and network nodes to access the outside world through the infra hosts. in my setup only the infra hosts have outside-world IPs, and floating IPs are reachable from outside
11:33 <kleini> computes and network nodes need to access the outside world through the management network, using the infra hosts as SNAT gateways
12:03 <noonedeadpunk> ah, ok, I see then.
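The fix kleini describes — tightening the masquerade rule so intra-cluster traffic keeps its real source address — might look roughly like this in iptables-save terms. The 10.20.150.0/24 management network is inferred from the paste above, and the eth0 uplink name is an assumption; the actual rule lives wherever systemd-networkd or the firewall config generates it:

```
# Too broad: masquerades everything from the management network,
# including zookeeper's container-to-container cluster traffic.
-A POSTROUTING -s 10.20.150.0/24 -j MASQUERADE

# Stricter: only masquerade what actually leaves via the uplink,
# so cluster connections keep their real source address and pass
# the certificate's IP check.
-A POSTROUTING -s 10.20.150.0/24 -o eth0 -j MASQUERADE
```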
14:02 <lsudre> Hi, I'm trying to deploy openstack+ceph with OSA. When I run "openstack-ansible setup-infrastructure.yml" I have an issue with the task [ceph-osd : wait for all osd to be up]: it skips osd1 and osd2, and fails on osd3 after retrying 60 times. Do you have any idea what is causing this? Thx
14:06 <noonedeadpunk> lsudre: worth checking `ceph -s` or `ceph health`
14:07 <lsudre> on an osd?
14:07 <noonedeadpunk> on the monitor host
14:07 <noonedeadpunk> and maybe something like `ceph osd tree`
14:07 <lsudre> auth: unable to find a keyring
14:10 <anskiy> lsudre: are you running this on one of the controller nodes, as opposed to the deploy host?
14:10 <noonedeadpunk> is the monitor service even running?
14:11 <lsudre> noonedeadpunk: sudo systemctl status ceph-mon.service returns inactive
14:11 <lsudre> anskiy: on my ceph-mon host
14:12 <anskiy> lsudre: the service should be called something like `ceph-mon@<HOST>`
14:13 <lsudre> noonedeadpunk: and sudo systemctl status ceph-mon@os-deploy-ceph-host.service returns active and running
14:13 <lsudre> anskiy: like this? ceph-mon@os-deploy-ceph-host.service
14:14 <noonedeadpunk> and what's in /etc/ceph then? There should be ceph.conf and keyrings
14:14 <lsudre> my infra is: one mon (mon1) and 3 OSDs (osd1, osd2, osd3)
14:14 <lsudre> ok, now I can run a ceph -s command
14:15 <lsudre> the command returns HEALTH_WARN
14:15 <lsudre> mon is allowing insecure global_id reclaim / 1 MDSs report slow metadata IOs / Reduced data availability: 2 pgs inactive / OSD count 0 < osd_pool_default_size 3
14:16 <anskiy> lsudre: so what is the status of the OSDs: `systemctl status ceph-osd@<OSD ID>`?
14:18 <noonedeadpunk> lsudre: `mon is allowing insecure global_id reclaim` is relatively minor
14:18 <lsudre> anskiy: on the mon? I don't have this service, only a ceph-osd.target
14:19 <noonedeadpunk> nah, on an osd node
14:19 <noonedeadpunk> say, osd3
14:20 <lsudre> same thing, only ceph-osd.target, and it is running
14:20 <lsudre> the ceph -s command returns: services: mon: 1 daemons, quorum os-deploy-ceph-host (age 26m) / mgr: os-deploy-ceph-host (active, since 26m) / mds: 1/1 daemons up / osd: 0 osds: 0 up, 0 in
14:21 <anskiy> lsudre: I would still try to run `systemctl status ceph-osd@3` on osd3 -- there could be some logs from the previous attempt to start it
14:22 <anskiy> or `journalctl -u ceph-osd@3`
14:22 <anskiy> or whichever ID osd3 has
14:24 <lsudre> ○ ceph-osd@3.service - Ceph object storage daemon osd.3 / Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: enabled) / Active: inactive (dead)
14:28 <noonedeadpunk> and what if you try to start it?
14:28 <noonedeadpunk> or indeed - check journalctl
14:28 <lsudre> it asks me for a password
14:29 <lsudre> OSD data directory /var/lib/ceph/osd/ceph-3 does not exist; bailing out.
14:30 <lsudre> I have no files in the /var/lib/ceph/osd/ folder
14:32 <lsudre> is something missing in my OSA conf?
14:32 <anskiy> lsudre: could you please show us your `openstack_user_config.yml` and user_variables.yml via paste.opendev.org?
14:32 <lsudre> sure
14:34 <lsudre> https://paste.opendev.org/show/bGDj5FVzWWKEjkaxhKJX/ https://paste.opendev.org/show/bJBfMzednhy5Y5IOgwiL/
14:48 <anskiy> I wonder how the example configurations are supposed to work without setting `lvm_volumes`...
14:52 <anskiy> lsudre: so, I suppose you need to set the `lvm_volumes` variable to some disk devices (like `/dev/sdX`) that you're willing to use as OSDs.
14:54 <anskiy> noonedeadpunk: so, I've forcefully set this thing https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/pw-token-gen.py#L89 to 64 and successfully bootstrapped antelope, so it's something else -_-
14:54 <lsudre> ok, thank you for your time
14:55 *** dviroel__ is now known as dviroel
14:56 <noonedeadpunk> anskiy: from what I read in the keystone code, it should just "strip" anything longer than 54
14:57 <noonedeadpunk> and it happens only on upgrade, so it might be something related to re-hashing... and back when we did "update_password" it was resetting the password, so it was not a concern, I guess
14:59 <anskiy> noonedeadpunk: wasn't the patch only applicable to nova? Glance gets 401 too
15:00 <noonedeadpunk> nope, we just disabled resetting the password by default
15:00 <noonedeadpunk> or updating it
15:00 <noonedeadpunk> so if you need to update a password, you'd need to define a variable for that
15:04 <noonedeadpunk> so it could be a result of the keystone upgrade. But then it's good we've caught that
15:23 <lsudre> anskiy: why do I need `volume_group: cinder-volumes` in user_variables, like in openstack_user_config.yml.test.example, when I should use the rbd volumes specifically made for the ceph configuration?
15:41 <anskiy> lsudre: Ceph needs some disks to use as OSDs so it can provide block storage for your cluster. Defining which devices should be used by Ceph is done like this: https://github.com/ceph/ceph-ansible/blob/main/group_vars/osds.yml.sample#L21-L122. I, for example, set this via the `lvm_volumes` list for each OSD node.
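For reference, the two common shapes from ceph-ansible's osds.yml.sample look roughly like this. All device, LV, and VG names below are made up; pick whichever style matches the disk layout on each OSD node:

```yaml
# user_variables.yml -- hypothetical devices, adjust per OSD node.
# Style 1: hand whole disks to ceph-volume and let it create the LVs:
devices:
  - /dev/sdb
  - /dev/sdc

# Style 2: point lvm_volumes at pre-created logical volumes:
lvm_volumes:
  - data: data-lv1
    data_vg: ceph-vg
```

Without one of these, the ceph-osd role has nothing to create OSDs from, which matches the "0 osds: 0 up, 0 in" state seen earlier.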
21:59 <opendevreview> Merged openstack/openstack-ansible-openstack_hosts master: Fix linters issue and metadata  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/888455
22:01 <opendevreview> Merged openstack/openstack-ansible-ceph_client master: Fix linters and metadata  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/888216
22:27 <opendevreview> Merged openstack/openstack-ansible-ceph_client master: Apply tags to included tasks  https://review.opendev.org/c/openstack/openstack-ansible-ceph_client/+/888461
22:43 <opendevreview> Merged openstack/openstack-ansible-repo_server master: Fix linters and metadata  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/888280

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!