opendevreview | Merged openstack/manila-specs master: Add spec for share/share-snapshot deferred deletion https://review.opendev.org/c/openstack/manila-specs/+/901700 | 01:24 |
---|---|---|
opendevreview | Merged openstack/manila stable/xena: Fix error message from share server API https://review.opendev.org/c/openstack/manila/+/905355 | 12:01 |
opendevreview | Merged openstack/manila stable/xena: skip periodic update on replicas in 'error_deleting' https://review.opendev.org/c/openstack/manila/+/892345 | 12:03 |
opendevreview | Merged openstack/python-manilaclient stable/wallaby: Support --os-key option https://review.opendev.org/c/openstack/python-manilaclient/+/877023 | 12:07 |
opendevreview | Merged openstack/python-manilaclient stable/wallaby: Use suitable api version for OSC. https://review.opendev.org/c/openstack/python-manilaclient/+/878037 | 12:15 |
opendevreview | Merged openstack/manila stable/wallaby: Validate provider_location while managing snapshot https://review.opendev.org/c/openstack/manila/+/897035 | 12:42 |
opendevreview | Merged openstack/manila stable/2023.1: Change status and error handling for /shares API https://review.opendev.org/c/openstack/manila/+/905459 | 12:42 |
opendevreview | kiran pawar proposed openstack/manila master: Retry on connection error to neutron https://review.opendev.org/c/openstack/manila/+/905695 | 14:11 |
klindgren | hello, I am having some issues with Manila and am looking for some guidance. I am trying to use Manila with the CephFS native driver. I have a single Ceph cluster with multiple CephFS filesystems. I am trying to add each filesystem to Manila as a backend and have share requests split between them. When I do so, all share provisioning requests go to the first backend. The scheduler sees both backends and filters down to 2 hosts. | 16:40 |
klindgren | It then weighs them, scores the goodness of both to 0, and always chooses the first backend, no matter how many shares are provisioned against it. If I try setting the goodness setting on the second backend, the scheduler reports that the goodness filter is not found and defaults to 0. Looking at the code, it seems the goodness filter defaults to None and it's up to the share driver to implement it? | 16:40 |
klindgren | How can I get share creates to split between two backends? I tried the simple scheduler and it stack-traces on `results = db.service_get_all_share_sorted(elevated)` in the simple driver with a `sqlalchemy.exc.InvalidRequestError: Entity namespace for "coalesce(anon_1.share_gigabytes, :coalesce_1)" has no property "topic"`. | 16:40 |
klindgren | So I'm just wondering: what am I missing, and how is this supposed to work? | 16:40 |
gouthamr | hey klindgren: what are you trying to set the goodness function to? When creating multiple backends off of the same ceph cluster, the capacity information would be the same (cephfs filesystems can span the whole ceph cluster) - so all things will appear the same unless the request has some characteristics (share type extra specs) that need to match the backend capabilities | 16:49 |
klindgren | for testing I was simply setting goodness_function to "100" to try to make a request go to the second backend, but the debug logs say that the goodness function is not defined. | 16:51 |
klindgren | My hope was that I could write a filter that would simply split the number of shares between the backends with the same share_backend_name. The reason for multiple CephFS filesystems on the same cluster is that each filesystem is tied to a different MDS daemon pair, because our workloads are extremely metadata intensive. | 16:52 |
klindgren | I had already seen that the capacity was reported the same, though I would have expected the scheduler to also look at resources allocated against a backend (versus actually consumed) and eventually weigh one backend higher than the other, simply because it had no shares/resources allocated to it. | 16:54 |
gouthamr | yes; that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not | 16:55 |
gouthamr | are you aware of a way that the mds load can be detected? Maybe that’s a good goodness_function that we can implement | 16:56 |
klindgren | re: detecting MDS load, I can ask around - I deal much less on that side of the house. | 16:58 |
gouthamr | thanks; I’ll look at some docs as well or consult some ceph folks.. in the meanwhile, the simple filter failure sure looks like a db query bug… | 16:59 |
gouthamr | could you please report it on bugs.launchpad.net/manila with the stack trace? | 17:00 |
klindgren | Sure. | 17:05 |
klindgren | When you say "that's how it works with drivers that allow controlling over-provisioning with Manila… but the CephFS driver does not": can you help me understand how scheduling works then? If I have 2 backends both providing the same features and I just want share creates to round-robin between the two, is that not possible? Or only possible with specific drivers? | 17:07 |
klindgren | Is there some way I can have each filesystem on the Ceph cluster exposed as a pool and use pool_weight_multiplier = -1.0, so that it spreads requests across the pools instead of packing them all into the same pool? | 17:08 |
klindgren | is it possible to just set a simple limit of no more than X shares per backend? | 17:09 |
klindgren | Per my Ceph teammates: `The number of requests per second could be used, but it's pretty spiky. The command "ceph fs status" can be used to see those metrics` | 17:20 |
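A hedged sketch of the idea floated here: read the per-MDS request rate from `ceph fs status --format json` and pick the least-loaded active MDS. The JSON field names used below (`mdsmap`, `name`, `state`, `rate`) are assumptions based on typical output and should be verified against your Ceph release; the example runs against a hard-coded sample rather than a live cluster.

```python
import json

def least_loaded_mds(status_json):
    """Return the name of the active MDS reporting the lowest request rate.

    Assumes `status_json` is the output of `ceph fs status --format json`
    and that active MDS entries carry a "rate" (requests/sec) field --
    both are assumptions to verify, not guaranteed schema.
    """
    status = json.loads(status_json)
    loads = {}
    for mds in status.get("mdsmap", []):
        if mds.get("state") == "active":
            loads[mds["name"]] = mds.get("rate", 0)
    return min(loads, key=loads.get) if loads else None

# Hypothetical sample mimicking two active MDS daemons with spiky load:
sample = json.dumps({"mdsmap": [
    {"name": "cephfs01", "state": "active", "rate": 240},
    {"name": "cephfs02", "state": "active", "rate": 15},
]})
print(least_loaded_mds(sample))
```

As noted in the chat, the rate is spiky, so a real goodness input would likely need smoothing (e.g. an average over several sampling intervals) rather than a single reading.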
gouthamr | klindgren: thanks for that feedback; round robin's not possible because we don't preserve context of scheduling decisions.. i think your issue may be resolved with implementing "allocated_capacity_gb" in the cephfs driver, so that you could use it in a goodness_function configuration | 17:29 |
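For illustration, once the driver reports `allocated_capacity_gb`, a spread-style goodness_function could be configured per backend. This is a hypothetical config: the CephFS driver does not report this stat today, which is exactly the gap being discussed.

```ini
# Hypothetical: assumes the CephFS driver reports allocated_capacity_gb,
# which it currently does not. Backends with less allocated capacity
# score closer to 100 and attract new shares.
[cephfs-fstest-2]
goodness_function = "100 - (capabilities.allocated_capacity_gb / capabilities.total_capacity_gb * 100)"
```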
klindgren | https://bugs.launchpad.net/manila/+bug/2049528 - re simple scheduler stack trace | 17:31 |
gouthamr | thanks klindgren | 17:31 |
klindgren | That also appears to require implementing the goodness function in the share driver as well? From what I could see, only the NetApp drivers appear to have this functionality; everything else inherits the base class, which returns None. | 17:38 |
klindgren | I did have an additional question about our user cluster. We have multiple control plane servers that we were planning to run manila-share on, with the same backends configured (like we do for pretty much everything else we run OpenStack-wise). However, it appears this would cause 3 host entries to show up exposing the same backends. E.g., the same 20 backends configured on 3 control plane nodes would result in 60 backends showing up? What is the recommended deployment for this? To set the hostname in the config file the same on all the share servers? | 17:42 |
gouthamr | klindgren: goodness functions can be layered on top of whatever the driver supports, and evaluated in the scheduler.. so no, the driver doesn't _need_ to implement a custom one; it can be oblivious to it... but, drivers can implement a default goodness function (this one will kick in if there's no configured goodness_function) | 17:45 |
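The point above is that the goodness expression is evaluated in the scheduler against the capabilities each backend reports, so the driver can be oblivious to it. A minimal illustrative sketch of that idea (this is not Manila's actual evaluator; a real implementation parses the expression safely instead of using `eval`):

```python
def evaluate_goodness(expression, capabilities):
    """Return a 0-100 goodness score for one backend.

    Illustration only: substitutes "capabilities.<name>" tokens with the
    stats the driver reported, then evaluates the arithmetic expression.
    """
    for key, value in capabilities.items():
        expression = expression.replace("capabilities.%s" % key, repr(value))
    # eval() used for brevity; Manila uses a dedicated expression parser.
    score = eval(expression, {"__builtins__": {}}, {})
    return max(0, min(100, score))

# Stats taken from the backend update quoted later in this log:
stats = {"free_capacity_gb": 429161.58, "total_capacity_gb": 429237.19}
print(evaluate_goodness("100", stats))
print(evaluate_goodness(
    "capabilities.free_capacity_gb / capabilities.total_capacity_gb * 100",
    stats))
```

A constant expression like `"100"` scores every backend identically, which is why it cannot by itself force requests onto a particular backend.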
klindgren | Ok, so in the example where I set goodness_function in the config to 100 and the scheduler doesn't see a function defined, is that a bug then? When I added debug statements around `host_state.capabilities`, the goodness_function is not contained in the data about the backend. It just contains: | 17:49 |
klindgren | {'pool_name': 'cephfs', 'total_capacity_gb': 429237.19, 'free_capacity_gb': 429161.58, 'qos': 'False', 'reserved_percentage': 0, 'reserved_snapshot_percentage': 0, 'reserved_share_extend_percentage': 0, 'dedupe': [False], 'compression': [False], 'thin_provisioning': [False], 'share_backend_name': 'cephfs-fstest-2', 'storage_protocol': 'CEPHFS', 'vendor_name': 'Ceph', 'driver_version': '1.0', 'timestamp': datetime.datetime(2024, 1, 16, 15, 22, 10, 250194), 'driver_handles_share_servers': False, 'snapshot_support': True, 'create_share_from_snapshot_support': True, 'revert_to_snapshot_support': False, 'mount_snapshot_support': False, 'replication_type': None, 'replication_domain': None, 'sg_consistent_snapshot_support': None, 'security_service_update_support': False, 'network_allocation_update_support': False, 'share_server_multiple_subnet_support': False, 'ipv4_support': True, 'ipv6_support': False} | 17:49 |
klindgren | even when the backend is defined as: | 17:51 |
klindgren | [cephfs-fstest-2] | 17:51 |
klindgren | driver_handles_share_servers = False | 17:51 |
klindgren | share_backend_name = cephfs-fstest-2 | 17:51 |
klindgren | share_driver = manila.share.drivers.cephfs.driver.CephFSDriver | 17:51 |
klindgren | cephfs_conf_path = /etc/ceph/fstest-2.conf | 17:51 |
klindgren | cephfs_auth_id = manila | 17:51 |
klindgren | cephfs_filesystem_name = cephfs02 | 17:51 |
klindgren | cephfs_cluster_name = cephtest | 17:51 |
klindgren | goodness_function = "100" | 17:51 |
gouthamr | klindgren: regarding your other question: we'd recommend you only run one instance of manila-share per backend... what most deployments do is run Manila's share-manager service under Pacemaker or similar services that handle HA... the service is effectively deployed active/passive | 17:51 |
gouthamr | klindgren: in that case, yes, the "host" attribute in the config file on each controller node is set to a common string | 17:52 |
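A sketch of what that common "host" setting looks like in each controller's manila.conf (the value `manila-share-cluster` is a hypothetical name; any stable string shared by all controllers works):

```ini
# Set identically on every controller that can run manila-share, so all
# active/passive instances register as the same service entry:
[DEFAULT]
host = manila-share-cluster
```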
klindgren | pacemaker :puke: - I've personally never had good experience with pacemaker clusters, but I guess we can come up with something for ensuring only one copy is running at a time. | 17:53 |
gouthamr | klindgren: hmm, i don't see that behavior on the CI - i.e., the scheduler does see a "goodness_function" (defaults to None) - https://zuul.opendev.org/t/openstack/build/cf3bfa7cdca4489caad829bcd11c4bab/log/controller/logs/screen-m-sch.txt#892 .. | 17:56 |
gouthamr | klindgren: what version of openstack are you using? | 17:56 |
klindgren | 2023.1 | 17:56 |
klindgren | I am using kolla-ansible under the hood here, which by default only configures the share-specific settings in manila.conf for the share service. However, I modified the scheduler's manila.conf as well and added it there, and the logs still say it's not found. | 17:58 |
gouthamr | klindgren: the share-manager service node is where you need to put this.. and it should work the way you've configured it; i'm confused and reading code to see why that doesn't work | 17:59 |
gouthamr | klindgren: is it possible for you to enable debug=True on the node containing manila-share? the service will spit out the config opts right on top.. can you see if this is getting picked up? | 18:03 |
opendevreview | Takashi Kajinami proposed openstack/manila master: Drop upgrade scripts for old releases https://review.opendev.org/c/openstack/manila/+/905754 | 18:04 |
klindgren | its already running in debug - checking. | 18:04 |
gouthamr | klindgren: ack; you can use this pastebin to share long pastes: paste.openstack.org | 18:05 |
klindgren | `2024-01-16 18:04:57.979 7 DEBUG oslo_service.service [None req-5e7d5451-80c5-451d-9791-b9971f10b132 - - - - - -] cephfs-fstest-2.goodness_function = 100 log_opt_values /var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_config/cfg.py:2609` | 18:05 |
klindgren | looks like it's seeing it | 18:05 |
klindgren | but the scheduler still logs that it's not set on share creates: | 18:08 |
klindgren | https://paste.openstack.org/show/bbsu5ESqhqogNkTzVgnc/ | 18:08 |
gouthamr | klindgren: i just tried doing this on my machine, and it showed up in the scheduler.. | 18:13 |
gouthamr | klindgren: by this, i mean i set "goodness_function = 100".. and i bounced the manila-share service.. the scheduler reflected it in the backend's pool | 18:14 |
gouthamr | if you turn on debug=True on the scheduler's manila.conf, you'll see a message in the host_manager: "Received share service update from <host> ..." -- this message contains host's stats | 18:15 |
klindgren | does the `share_backend_name` need to exactly match the config stanza name? I saw something about that as a bug like 7-8 years ago that got fixed. | 18:18 |
klindgren | https://paste.openstack.org/show/buBeMSGeiX9KSfUF6pnv/ | 18:18 |
klindgren | I believe is what you are talking about - it doesn't have the goodness_filter stuff in the updates | 18:19 |
gouthamr | klindgren: i see "'goodness_function': '55'" here | 18:20 |
klindgren | hrm | 18:20 |
klindgren | I guess I have it set to 55 right now; I had it at 100, but I see 55 now. | 18:21 |
klindgren | So at `2024-01-16 18:04:58.274` it says it's set to 55, but at `2024-01-16 18:06:02.110` it says it's not defined. No other updates happened between those: | 18:24 |
klindgren | https://paste.openstack.org/show/b1WWLlBSSg9a8M0q2ntx/ | 18:24 |
gouthamr | Couple of things to try to isolate this: bounce the scheduler service - allow it to get fresh updates and not rely on any data that’s possibly stale.. it is also possible that updates from multiple controllers are messing with this? has the config opt been set everywhere where this backend has been defined? | 18:56 |
klindgren | I can work on debugging this some more. Everything is on a single host for now - to avoid initial roll out complications. Might also look at just moving to the latest release for this component. | 19:10 |
gouthamr | klindgren: thanks; i reported https://bugs.launchpad.net/manila/+bug/2049538 .. please take a look/subscribe to it, and feel free to add any comments there | 19:30 |
gouthamr | i'll use it when discussing with my ceph engineering colleagues | 19:30 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!