Monday, 2022-05-09

04:58 *** ysandeep|out is now known as ysandeep|rover
06:46 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Add upgrade path from lsyncd to shared filesystem.  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411
06:46 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Remove all code for lsync, rsync and ssh  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837588
06:46 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible-repo_server master: Clean up legacy lsycnd, rsync and ssh key config  https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/837859
07:20 *** ysandeep|rover is now known as ysandeep|rover|lunch
08:23 *** ysandeep|rover|lunch is now known as ysandeep|rover
09:35 <jrosser> noonedeadpunk: i am not sure how we merge this https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/839411
09:35 <jrosser> seems it is very dependent on this https://review.opendev.org/c/openstack/openstack-ansible/+/837589
09:41 <jrosser> the filesystem is created in the integrated repo atm, because the serial: keyword in the repo_server playbook really breaks forming the gluster cluster, which can't be done while the tasks run serially
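
The constraint described here is that forming a gluster cluster is a one-shot operation run from a single node that needs every peer up and reachable at the same time, which a play limited by serial: cannot guarantee. A rough shell sketch of the manual equivalent, with hypothetical container, brick, and volume names:

    # run once, from the first repo container only, after all peers are up
    gluster peer probe repo-container-2
    gluster peer probe repo-container-3
    gluster volume create repo-volume replica 3 \
        repo-container-1:/gluster/bricks/repo \
        repo-container-2:/gluster/bricks/repo \
        repo-container-3:/gluster/bricks/repo
    gluster volume start repo-volume
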
09:47 <jrosser> oh also i sent an email directly to the Derek guy from the ML in case all the replies are ending up in his spam filter
09:50 <noonedeadpunk> ^ I did the same actually...
09:51 <jrosser> :)
09:59 <jrosser> noonedeadpunk: https://github.com/jrosser/openstack-ansible-os_skyline
10:00 <noonedeadpunk> I can create a repo in opendev for that :)
10:00 <noonedeadpunk> Regarding the repo - sorry, I'm away today, can't check the details for that
10:01 <noonedeadpunk> But https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/db_setup.yml and https://github.com/jrosser/openstack-ansible-os_skyline/blob/master/tasks/service_setup.yml are obsolete :D
10:01 <jrosser> oh yeah it's a big hack
10:01 <jrosser> that's how i found we still didn't tidy up os_placement, which is what this is created from
10:02 <jrosser> i need to fix up the copyright lines as well
11:14 *** dviroel_ is now known as dviroel\
11:14 *** dviroel\ is now known as dviroel
11:57 *** ysandeep|rover is now known as ysandeep|rover|break
12:07 <opendevreview> Merged openstack/openstack-ansible stable/xena: Fix extra facts gathering with tags  https://review.opendev.org/c/openstack/openstack-ansible/+/840479
12:57 *** ysandeep|rover|break is now known as ysandeep|rover
14:28 <johnd_> Hello there, I would like to ask you some questions about cinder deployment. We have 3 controllers with 3 cinder-volume and 3 cinder-api each. Those services are installed in LXC and connected to the same Ceph backend.
14:30 <johnd_> When we create VMs, volumes are managed by different cinder-volume services. We can see this in Horizon (os-controller-2-cinder-volumes-container-0533e258@rbd#rbd / os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd / ...). Now, if we want to migrate a volume from one service to another in Horizon, we can only select ceph@rbd#rbd and this fails. If we migrate with the CLI, the volume always ends up on os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd
14:31 <johnd_> Is this normal? How do we fix this?
14:39 <mgariepy> you can set the `backend_host` config in the ceph section of cinder. this will allow any of the 3 cinder-volume services to manage the volumes.
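
A minimal sketch of that change, assuming the RBD backend section in cinder.conf is named [rbd] and using an illustrative shared host name; in openstack-ansible this would normally be set through the cinder_backends overrides rather than edited by hand:

    # inside each cinder-volume container; section and host names are illustrative
    crudini --set /etc/cinder/cinder.conf rbd backend_host rbd:volumes
    systemctl restart cinder-volume
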
14:41 <jrosser> johnd_: is this an existing/old deployment, because I think this is something that was adjusted a while ago
14:41 <johnd_> this is an old deployment
14:42 <johnd_> the cluster was created on pike
14:45 <jrosser> mgariepy: i remember this being not quite so simple https://opendev.org/openstack/openstack-ansible/src/branch/master/releasenotes/notes/enable-active-active-9af1551759468dc8.yaml
14:48 <jrosser> https://github.com/openstack/openstack-ansible-os_cinder/commit/c148d77e29af6faebc1c9b012ae08aed447cd179
14:50 <jrosser> https://github.com/openstack/openstack-ansible-os_cinder/commit/c6b9f011b777aa0513e99a35fba7d976a4c9d4c1
14:50 <johnd_> We have "cluster = ceph" in our config
14:50 <jrosser> it seems a bit opaque to me how this is actually supposed to work
14:51 <jrosser> i see we make that config setting, but from a cinder POV i'm not sure i understand the active/active model here if the volumes are bound to a specific backend
14:52 <johnd_> i have this in horizon, is this normal for you? https://ibb.co/617q8r4
14:53 <mgariepy> what is your backend config?
14:53 <jrosser> i have this https://paste.opendev.org/show/bYpnYSYModo1HzRP4qv6/
14:55 <johnd_> here is my config: https://paste.opendev.org/show/bfWltAz5gtdx0jAVueo0/
14:55 <mgariepy> i have both configs.
14:56 <jrosser> there is some background info about active-active here https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.0/html/storage_guide/ch-cinder#active-active-deployment-for-high-availability
14:57 <johnd_> Here are my pool and cluster https://paste.opendev.org/show/bDbZ8GPJf6r2ihM6o2JQ/
14:58 <jrosser> but it is still confusing that the volumes have an os-vol-host-attr:host which points to an actual host rather than the cluster
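
A way to see the attribute jrosser mentions, plus a hedged sketch of re-homing existing volumes once a shared backend_host is in place; the volume ID and the new host string below are placeholders, not taken from the discussion:

    # show which backend host a volume is currently pinned to (placeholder volume ID)
    openstack volume show <volume-uuid> -c "os-vol-host-attr:host"
    # list the registered cinder-volume services and the hosts they report
    openstack volume service list --service cinder-volume
    # after switching to a shared backend_host, existing volumes can be re-pointed
    # (placeholder host strings; take a backup and test on a single volume first)
    cinder-manage volume update_host \
        --currenthost os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd \
        --newhost rbd:volumes@rbd#rbd
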
15:02 <johnd_> Here are also the cinder services https://paste.opendev.org/show/b40TC8fNXkiktUJoEU7w/
15:08 <johnd_> If you have Horizon installed, how are your volume hosts showing?
15:10 *** dviroel is now known as dviroel|lunch
15:14 <foutattoro> hi all, i have an issue with my OSA cloud after a restart. all services run correctly but I'm getting an error with glance
15:14 <foutattoro> Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. Failed to contact the endpoint at http://172.29.236.15:9292 for discovery. Fallback to using that endpoint as the base url. The image service for :RegionOne exists but does not have any supported versions.
15:15 <foutattoro> I had already created images and instances. I haven't done any update since my deployment
15:15 <foutattoro> does someone know how to solve this please?
15:28 <jrosser> foutattoro: is 172.29.236.15 your internal VIP?
15:33 <foutattoro> yes
15:36 <jrosser> have you checked what happens with something like `wget http://172.29.236.15:9292/`
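
For example, run from a utility container or an infra host on the management network; the second command is just an alternative that shows the HTTP status and headers, which helps tell a haproxy failure apart from a glance one:

    # fetch the glance version document through the internal VIP
    wget -qO- http://172.29.236.15:9292/ || echo "request to the VIP failed"
    # same check with the response status and headers visible
    curl -si http://172.29.236.15:9292/ | head -n 20
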
16:09 *** ysandeep|rover is now known as ysandeep|out
16:24 *** dviroel|lunch is now known as dviroel
17:00 <foutattoro> jrosser: I will check what's going wrong after restarting servers
17:01 <foutattoro> Is there any procedure to follow after restarting infra servers?
17:14 <jrosser> restarting them all is not so great, as you have rabbitmq / galera clusters to consider
17:14 <jrosser> there are some pointers here for checking those services after some kind of disruption https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html
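
A quick galera health check along the lines of that document, run inside one of the galera containers (the rabbitmq side is covered further down):

    # the cluster should report Primary with the expected number of members
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'"
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
    # and each node should report itself as Synced
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"
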
17:22 <foutattoro> jrosser: you're right, I think the issue comes from the galera cluster which is not synchronized
17:22 <jrosser> did you restart all the infra nodes together?
17:23 <foutattoro> yes
17:23 <foutattoro> but I get this for galera https://paste.opendev.org/show/b42nMfxPH5Utzjt0lhVA/
17:24 <foutattoro> how can I synchronize the cluster?
17:25 <jrosser> well, i think you have to follow the instructions here https://docs.openstack.org/openstack-ansible/latest/admin/maintenance-tasks.html#recover-a-multi-node-failure
17:26 <jrosser> and tbh this is not something we ever expect openstack-ansible to be able to deal with
17:27 <jrosser> this needs some mariadb skills to bring the DB back online after a total outage
17:28 <jrosser> when rebooting/restarting control plane nodes it needs to be done very very carefully, checking the state of rabbitmq/galera at each step
17:43 <foutattoro> jrosser: thanks for this information
17:44 <foutattoro> I'm still dealing with this issue
17:54 <mgariepy> the good thing is that you have a node with: safe_to_bootstrap: 1
17:55 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: Add CSP headers for img-src and worker-src  https://review.opendev.org/c/openstack/openstack-ansible/+/841154
17:56 <foutattoro> mgariepy: could you explain a bit more please?
17:58 <mgariepy> you should restart the cluster with that node.
17:58 <mgariepy> https://galeracluster.com/2016/11/introducing-the-safe-to-bootstrap-feature-in-galera-cluster/
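
A sketch of that recovery, assuming MariaDB under systemd inside the galera containers (the usual openstack-ansible layout); paths and unit names can differ between releases:

    # on every galera container, find the node that last held the most recent state
    cat /var/lib/mysql/grastate.dat        # look for "safe_to_bootstrap: 1"
    # on that node only, bootstrap a new cluster
    galera_new_cluster
    # then start mariadb normally on the remaining nodes so they rejoin via IST/SST
    systemctl start mariadb
    # finally confirm the cluster is Primary again with the expected size
    mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%'"
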
18:49 <foutattoro> jrosser mgariepy: what is the risk of deploying the openstack infrastructure services?
18:50 <foutattoro> and backing up the VMs from ceph storage
18:55 <mgariepy> foutattoro, not sure i understand what you mean
18:56 <foutattoro> since my galera cluster is not synchronized, I wonder if it is possible to deploy galera in new lxc containers
18:57 <mgariepy> restarting galera is not that hard.
19:51 <foutatoro> thanks mgariepy & jrosser: I have restarted the galera cluster
19:52 <mgariepy> is your rabbitmq cluster happy?
19:52 <foutatoro> I think yes.
19:52 <foutatoro> I'm facing a ceph issue
19:53 <foutatoro> because openstack can't retrieve volumes from ceph https://paste.opendev.org/show/bnIUJ6sVLnsH7mnJZsLX/
19:53 <mgariepy> can you paste: rabbitmq cluster_status
19:54 <mgariepy> galera is needed by keystone.
19:54 <mgariepy> which is needed for all the other services, along with rabbitmq for most of them.
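
What that check looks like, run inside one of the rabbitmq containers:

    # every cluster member should be listed as a running node, with no partitions
    rabbitmqctl cluster_status
    # a quick sanity check that queues are actually being consumed
    rabbitmqctl list_queues name messages consumers | head -n 20
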
19:55 <foutatoro> it seems that keystone works since I can list servers with "openstack server list"
19:55 <foutatoro> but all the VMs are powered off
19:56 <mgariepy> restart all the cinder containers
19:56 <mgariepy> and services.
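
One way to do that from the deployment host, assuming the standard openstack-ansible inventory group and unit names, which can vary between releases:

    cd /opt/openstack-ansible/playbooks
    # confirm which containers are in the cinder groups first
    ansible cinder_all --list-hosts
    # restart the cinder services inside their containers
    ansible cinder_volume -m service -a "name=cinder-volume state=restarted"
    ansible cinder_scheduler -m service -a "name=cinder-scheduler state=restarted"
    # or simply re-run the cinder playbook, which also restarts the services
    openstack-ansible os-cinder-install.yml
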
19:57 <mgariepy> how come you have 33% degraded if everything is up :/
19:57 <jrosser> how many infra hosts are there? only two?
19:57 <jrosser> oddly there are 6 ceph mons
19:58 <mgariepy> an even number for quorum is not good.
19:58 <mgariepy> since you need a majority to be happy (4 out of 6)
19:59 <mgariepy> if you only have 5, your cluster can work with 3
19:59 <foutatoro> yes I've 2 infra hosts
20:01 <jrosser> that's not so great - two nodes is going to fail just as badly as a single one for galera and ceph
20:05 <foutatoro> jrosser: I run 3 ceph mon containers on each infra host since I only have 2 hosts
20:06 <foutatoro> what do you suggest for the ceph cluster?
20:06 <jrosser> if you lose one of those hosts then ceph is down
20:06 <jrosser> as mgariepy says you need 4 of 6 mons to be up as a minimum, that's just how ceph works
20:06 <jrosser> it must always be >50% of them available
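
The quorum state can be inspected directly from any node with a ceph admin keyring (a mon or utility container in this layout):

    # overall health, including degraded placement groups and mons out of quorum
    ceph -s
    # which monitors are currently in quorum
    ceph quorum_status -f json-pretty | grep -A 8 '"quorum_names"'
    # the monitor map: with 6 mons, at least 4 must be up to keep quorum
    ceph mon dump
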
20:09 <foutatoro> OK, I have to add a new infra host if I understand correctly
20:09 <jrosser> yeah, that would certainly make things more robust
20:10 <jrosser> but you should be able to recover what's happening now
20:10 <foutatoro> but this doesn't explain why cinder can't get volumes from ceph
20:10 <jrosser> it's also worth reading up on galera cluster sizing too https://galeracluster.com/library/documentation/weighted-quorum.html
20:12 <jrosser> no it doesn't explain cinder, the place to start is the cinder api / cinder volumes, check the logs, restart the services if necessary
20:23 <foutatoro> jrosser: cinder services seem to be up but with errors https://paste.opendev.org/show/bjrEenjqdbkwR1FbaDEt/
20:24 <foutatoro> with this deployment I can't find logs in /var/log/cinder/*
20:25 <foutatoro> mgariepy: is there a way to reduce the 33% degraded in the ceph cluster?
20:27 <jrosser> the logs are all in systemd journals
20:27 <jrosser> `journalctl -u <unit-name>`
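
For example, inside a cinder-volume container (unit names are the usual openstack-ansible ones):

    # show the most recent cinder-volume log entries, then follow the journal live
    journalctl -u cinder-volume -n 200 --no-pager
    journalctl -u cinder-volume -f
    # the scheduler and api units in their containers work the same way
    journalctl -u cinder-scheduler --since "1 hour ago"
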
20:28 <foutatoro> maybe the error comes from this message
20:28 <foutatoro> 2022-05-09 20:26:04.072 61 DEBUG cinder.scheduler.host_manager [req-ae393cc0-efc2-4e2d-add6-120ae3bd3d05 - - - - -] Received volume service update from Cluster: ceph@RBD - Host: rbd:volumes@RBD: {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_s
20:28 <foutatoro> 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0,
20:43 <foutatoro> jrosser, mgariepy: any suggestions?
20:43 *** dviroel is now known as dviroel|afk
20:52 <foutatoro> jrosser, mgariepy: I just found the root issue https://paste.opendev.org/show/buSSglduS58Tt2s7yzuG/
20:53 <foutatoro> how do I start cinder-volume please?
21:05 <jrosser> i'm not so sure that it's stopped
21:06 <jrosser> restarting the cinder-volume service and looking at the journal perhaps
21:11 <foutatoro> I wonder also if my PVs are full https://paste.opendev.org/show/bMbxrFpzlZ8WWFKrVfhY/
21:33 <foutatoro> jrosser: do you think it is possible to lose volumes after a reboot?
21:33 <jrosser> i suspect that this is a result of the cinder active/active stuff
21:34 <jrosser> but it's late now here so i have to go
21:37 <foutatoro> jrosser: ok thanks for your help
