Monday, 2022-05-09

*** ysandeep|out is now known as ysandeep|rover04:58
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server master: Add upgrade path from lsyncd to shared filesystem.
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server master: Remove all code for lsync, rsync and ssh
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-repo_server master: Clean up legacy lsycnd, rsync and ssh key config
*** ysandeep|rover is now known as ysandeep|rover|lunch07:20
*** ysandeep|rover|lunch is now known as ysandeep|rover08:23
jrossernoonedeadpunk: i am not sure how we merge this
jrosserseems it is very dependent on this
jrosserthe filesystem is created in the integrated repo atm, because the serial: keyword in the repo_server playbook really breaks forming the gluster cluster which can't be done with the tasks serial09:41
jrosseroh also i sent an email direct to the Derek guy from the ML in case all the replies are ending up in his spam filter09:47
noonedeadpunk^ I did same actually...09:50
noonedeadpunkI can create a repo in opendev for that:)10:00
noonedeadpunkRegarding repo - sorry, I'm away today, can't check details for that10:00
noonedeadpunkBut and are obsolete :D10:01
jrosseroh yeah it's a big hack10:01
jrosserthats how i found we still didnt tidy up os_placement which is what this is created from10:01
jrosseri need to fix up the copyright lines as well10:02
*** dviroel_ is now known as dviroel\11:14
*** dviroel\ is now known as dviroel11:14
*** ysandeep|rover is now known as ysandeep|rover|break11:57
opendevreviewMerged openstack/openstack-ansible stable/xena: Fix extra facts gathering with tags
*** ysandeep|rover|break is now known as ysandeep|rover12:57
johnd_Hello there, I would like to ask you some question about cinder deployment. We have 3 controllers with 3 cinder-volume and 3 cinder-api each. Those services are installed in LXC and connected to a same Ceph Backend.14:28
johnd_When we create VMs, volumes are managed by different cinder volumes. We can see this in Horizon (os-controller-2-cinder-volumes-container-0533e258@rbd#rbd / os-controller-3-cinder-volumes-container-871bcbc8@rbd#rbd/...). Now, if we want to migrate the volume from one service to another in horizon, we can only select ceph@rbd#rbd and this fails. If we migrate with CLI, the volume always ends on os-controller-3-cinder-volumes-container-871bcbc8@r14:30
johnd_Is this normal ? How to fix this ?14:31
mgariepyyou can set the `backend_host` config in the ceph section on cinder. this will allow any 3 cinder-volume to manage the volumes.14:39
jrosserjohnd_: is this an existing/old deployment, because I think this is something that was adjusted a while ago14:41
johnd_this is an old deployment14:41
johnd_the cluster was created on pike14:42
jrossermgariepy: i remember this being not quite so simple
johnd_We have "cluster = ceph" in our config14:50
jrosserit seems a bit opaque to me how this is actually supposed to work14:50
jrosseri see we make that config setting, but from a cinder POV i'm not sure i understand the active/active model here if the volumes are bound to a specific back end14:51
johnd_i have this in horizon, is this normal for you ?
mgariepywhat is your backend config?14:53
jrosseri have this
johnd_here is my config:
mgariepyi have both config.14:55
jrosserthere is some background info about active-active here
johnd_Here are my pool and cluster
jrosserbut it is still confusing that the volumes have a os-vol-host-attr:host which points to an actual host rather than the cluster14:58
johnd_Here are also the cinder services
johnd_If you have Horizon installed, how are your hosts volumes showing ?15:08
*** dviroel is now known as dviroel|lunch15:10
foutattorohi all, i have an issue with my OSA cloud after a restart. all services run correctly but I'm getting an error with glance15:14
foutattoroFailed to contact the endpoint at for discovery. Fallback to using that endpoint as the base url. Failed to contact the endpoint at for discovery. Fallback to using that endpoint as the base url. The image service for :RegionOne exists but does not have any supported versions.15:14
foutattoroI had already created images and instances. I haven't done any update since my deployment15:15
foutattorosomeone knows how to solve this please ?15:15
jrosserfoutattoro: is your internal VIP?15:28
jrosserhave you checked what happens with something like `wget`15:36
*** ysandeep|rover is now known as ysandeep|out16:09
*** dviroel|lunch is now known as dviroel16:24
foutattorojrosser: I will check what's going wrong after restarting servers17:00
foutattoroIs there any procedure to follow after restarting infra servers ?17:01
jrosserrestarting them all is not so great, as you have rabbitmq / galera clusters to consider17:14
jrosserthere are some pointers here for checking those services after some kind of disruption
foutattorojrosser: You're right, I think the issue comes from the galera cluster, which is not synchronized17:22
jrosserdid you restart all the infra nodes together?17:22
foutattorobut I get this for galera
foutattorohow can I synchronize the cluster17:24
jrosserwell, i think you have to follow the instructions here
jrosserand tbh this is not something we ever expect openstack-ansible to be able to deal with17:26
jrosserthis needs some mariadb skills to bring the DB back online after a total outage17:27
jrosserwhen rebooting/restarting control plane nodes it needs to be done very very carefully, checking the state of rabbitmq/galera at each step17:28
foutattorojrosser: thanks for this information17:43
foutattoroI'm still dealing  with this issue 17:44
mgariepythe good thing is that you have a node with : safe_to_bootstrap: 117:54
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Add CSP headers for img-src and worker-src
foutattoromgariepy: could you explain a bit more please17:56
mgariepyyou should restart the cluster with that node.17:58
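The `safe_to_bootstrap: 1` flag mentioned above is a field in Galera's saved-state file, typically `/var/lib/mysql/grastate.dat` on each node. A minimal Python sketch of checking it follows; the function names and the sample file contents are hypothetical, and the exact path may differ inside an LXC-based deployment. The node whose file reports `1` is the one to bootstrap the cluster from (on systemd hosts MariaDB usually ships a `galera_new_cluster` helper for this, but check the MariaDB documentation for your version first).

```python
def parse_grastate(text: str) -> dict:
    """Parse the 'key: value' lines of a grastate.dat file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# GALERA saved state' comment header.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_safe_to_bootstrap(text: str) -> bool:
    """True only if the node recorded itself as safe to bootstrap from."""
    return parse_grastate(text).get("safe_to_bootstrap") == "1"


# Hypothetical file contents, for illustration only.
SAMPLE_GRASTATE = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   -1
safe_to_bootstrap: 1
"""

if __name__ == "__main__":
    print(is_safe_to_bootstrap(SAMPLE_GRASTATE))
```

Running the same check against every galera container shows which single node, if any, is marked as the bootstrap candidate.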
foutattorojrosser  mgariepy: what is the risk of deploying openstack-infrastructure services ?18:49
foutattoroand backup vms from ceph storage 18:50
mgariepyfoutattoro, not sure i understand what you mean18:55
foutattorosince my galera cluster is not synchronized, I wonder if it is possible to deploy galera in new lxc containers18:56
mgariepyrestarting galera is not that hard.18:57
foutatorothanks mgariepy & jrosser: I have restarted the galera cluster 19:51
mgariepyis your rabbitmq cluster happy ?19:52
foutatoroI think yes.19:52
foutatoroI'm facing a ceph issue19:52
foutatorobecause openstack can't retrieve volumes from ceph
mgariepycan you paste: rabbitmq cluster_status 19:53
mgariepygalera is needed by keystone.19:54
mgariepywhich is needed for all the other services along with rabbitmq for most of them.19:54
foutatoroIt seems that keystone works since I can list servers with "openstack server list"19:55
foutatorobut all vms are powered off19:55
mgariepyrestart all the cinder containers19:56
mgariepyand services.19:56
mgariepyhow come you have 33% degraded if everything is up :/19:57
jrosserhow many infra hosts are there? only two?19:57
jrosseroddly there are 6 ceph mons19:57
mgariepyan even number for quorum is not good.19:58
mgariepysince you need majority to be happy (4 out of 6) 19:58
mgariepyif you only have 5, with 3 your cluster can work19:59
foutatoroyes I've 2 infra hosts19:59
jrosserthats not so great - two nodes is going to fail just as badly as a single one for galera and ceph20:01
foutatorojrosser: I run 3 ceph mon containers on each infra since I only have 2 hosts20:05
foutatorowhat do you suggest for the ceph cluster ?20:06
jrosserif you lose one of those hosts then ceph is down20:06
jrosseras mgariepy says you need 4 of 6 mons to be up as a minimum, thats just how ceph works20:06
jrosserit must always be >50% of them available20:06
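The majority rule described above (more than 50% of the monitors must be available, so 4 of 6 or 3 of 5) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Ceph code; the function names are hypothetical, and the per-host layout `[3, 3]` reflects the two-host setup described in this conversation.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest number of monitors that forms a strict majority."""
    if total_mons < 1:
        raise ValueError("need at least one monitor")
    return total_mons // 2 + 1


def survives_any_single_host_loss(mons_per_host: list) -> bool:
    """True if quorum still holds after losing any one host.

    mons_per_host lists how many monitors run on each host.
    """
    total = sum(mons_per_host)
    needed = quorum_size(total)
    return all(total - lost >= needed for lost in mons_per_host)


if __name__ == "__main__":
    print(quorum_size(6))                              # 4 of 6
    print(quorum_size(5))                              # 3 of 5
    print(survives_any_single_host_loss([3, 3]))       # two hosts, 3 mons each
    print(survives_any_single_host_loss([1, 1, 1]))    # three hosts, 1 mon each
```

With three monitors on each of two hosts, losing either host leaves 3 of 6, which is below the required 4, so running more monitors per host does not help; spreading monitors across a third host does.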
foutatoroOK, I have to add a new infra host if I understand 20:09
jrosseryeah, that would certainly make things more robust20:09
jrosserbut you should be able to recover whats happening now20:10
foutatorobut this doesn't explain why cinder can't get volumes from ceph20:10
jrosserit's also worth reading up on galera cluster sizing too
jrosserno it doesnt explain cinder, the place to start is the cinder api / cinder volumes, check the logs, restart the services if necessary20:12
foutatorojrosser: cinder services seem to be up with errors
foutatorowith this deployment I can't find logs in /var/log/cinder/*20:24
foutatoromgariepy: is there a way to reduce  33% degraded in ceph cluster ?20:25
jrosserthe logs are all in systemd journals20:27
jrosser`journalctl -u <unit-name>`20:27
foutatoromaybe the error comes from this message 20:28
foutatoro2022-05-09 20:26:04.072 61 DEBUG cinder.scheduler.host_manager [req-ae393cc0-efc2-4e2d-add6-120ae3bd3d05 - - - - -] Received volume service update from Cluster: ceph@RBD - Host:  rbd:volumes@RBD: {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol':  'ceph', 'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_s20:28
foutatoro'total_capacity_gb': 6977.09, 'free_capacity_gb': 6977.09, 'reserved_percentage': 0,20:28
foutatorojrosser, mgariepy: any suggestion ?20:43
*** dviroel is now known as dviroel|afk20:43
foutatorojrosser, mgariepy:: I just found the root issue
foutatorohow to start cinder-volume please ?20:53
jrosseri'm not so sure that it's stopped21:05
jrosserrestarting the cinder-volume service and looking at the journal perhaps21:06
foutatoroI wonder also if my PV are full
foutatorojrosser: do you think it is possible to lose volumes after a reboot21:33
jrosseri suspect that this is a result of the cinder active/active stuff21:33
jrosserbut it's late now here so i have to go21:34
foutatorojrosser: ok thanks for your help21:37
