Tuesday, 2024-01-23

-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 906245: Fix bug in legacy code for getting parent jobs https://review.opendev.org/c/zuul/zuul/+/906245 07:39
@mfeder:matrix.org oh sorry, I missed this thread, so in the end I wrote a simple Zuul MQTT reporter -> Matrix chat bridge 09:37
- https://github.com/SovereignCloudStack/zuul-mqtt-matrix-bridge
It works so far, see some reports here https://matrix.to/#/#scs-zuul-reports:matrix.org
-@gerrit:opendev.org- Fabien Boucher proposed: [zuul/zuul] 906355: gitlab doc - update the reference pipelines https://review.opendev.org/c/zuul/zuul/+/906355 09:55
-@gerrit:opendev.org- Fabien Boucher proposed on behalf of Artem Goncharov: [zuul/zuul] 813136: Check blocking_discussions_resolved in gitlab driver https://review.opendev.org/c/zuul/zuul/+/813136 10:45
@mfeder:matrix.org Hi, I would like to see Zuul's nodepool-related metrics in Grafana. I tried to reuse this nice dashboard (https://grafana.opendev.org/d/6c807ed8fd/nodepool), but my Zuul instance does not send metrics like stats.gauges.nodepool.provider.*.nodes.building or stats.gauges.nodepool.provider.*.nodes.deleting. I can only see the in_use and total metrics of Nodepool, as mentioned in the docs (https://zuul-ci.org/docs/zuul/latest/monitoring.html#stat-zuul.nodepool.resources). 13:35
The docs do not mention the building or deleting nodes-related metrics. Could you please point me in the right direction to find these building or deleting nodes metrics?
@morucci:matrix.org Hi, I'd like to know whether a way to include or exclude configuration items by type + name, and not only by type, would make sense to you? In some of our use cases, where we reuse upstream Zuul config in other Zuul instances/tenants, this feature would be really practical. 14:26
@fungicide:matrix.org > <@mfeder:matrix.org> Hi, I would like to see Zuul's nodepool-related metrics in Grafana. I tried to reuse this nice dashboard (https://grafana.opendev.org/d/6c807ed8fd/nodepool), but my Zuul instance does not send metrics like stats.gauges.nodepool.provider.*.nodes.building or stats.gauges.nodepool.provider.*.nodes.deleting. I can only see the in_use and total metrics of Nodepool, as mentioned in the docs (https://zuul-ci.org/docs/zuul/latest/monitoring.html#stat-zuul.nodepool.resources). 14:35
>
> The docs do not mention the building or deleting nodes-related metrics. Could you please point me in the right direction to find these building or deleting nodes metrics?
what nodepool driver(s) are you using? some of them may not emit all of those events
@fungicide:matrix.org mfeder: in the case of the openstack driver, some of the stats are emitted by openstackclient on the launchers, and so it needs configuring as well. see https://zuul-ci.org/docs/nodepool/latest/installation.html#statsd-and-graphite for more information 14:40
@mhuin:matrix.org @mfeder - not sure if Grafana supports statsd as a data source, but if you have a prometheus running you could use statsd-exporter to convert statsd metrics into prometheus metrics, then ingest them in Grafana via the prometheus datasource 14:42
@fungicide:matrix.org mfeder: here's an example of how opendev configures openstackclient to use a per-provider statsd prefix for nodepool launchers when using nodepool's openstack driver: https://opendev.org/opendev/system-config/src/branch/master/playbooks/templates/clouds/nodepool_clouds.yaml.j2#L26-L28 14:44
@mhuin:matrix.org here's an example of a statsd-exporter config to fetch nodepool metrics: https://github.com/softwarefactory-project/sf-operator/blob/master/controllers/static/nodepool/statsd_mapping.yaml.tmpl 14:44
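[editor's sketch] For readers following the statsd-exporter suggestion, the snippet below illustrates the kind of glob-style mapping such a config describes: a dotted statsd name is matched against a pattern, and each wildcard segment becomes a Prometheus label. This is plain Python, not statsd-exporter itself. The pattern, the output metric name `nodepool_provider_nodes`, and the label names are assumptions for illustration and are not taken from the linked mapping file.

```python
from typing import Dict, List, Optional, Tuple


def map_statsd_name(
    name: str, pattern: str, metric: str, labels: List[str]
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Match a dotted statsd name against a glob pattern.

    Each '*' in the pattern matches exactly one dot-separated segment.
    Returns (metric, labels_dict) on a match, otherwise None.
    """
    name_parts = name.split(".")
    pattern_parts = pattern.split(".")
    if len(name_parts) != len(pattern_parts):
        return None

    captured = []
    for name_part, pattern_part in zip(name_parts, pattern_parts):
        if pattern_part == "*":
            captured.append(name_part)  # wildcard: keep the segment as a label value
        elif pattern_part != name_part:
            return None  # literal segment does not match

    if len(captured) != len(labels):
        return None  # pattern and label list disagree
    return metric, dict(zip(labels, captured))


if __name__ == "__main__":
    # Hypothetical provider name "cloud1"; metric and label names are assumptions.
    print(map_statsd_name(
        "nodepool.provider.cloud1.nodes.building",
        "nodepool.provider.*.nodes.*",
        "nodepool_provider_nodes",
        ["provider", "state"],
    ))
```

Turning the provider and node state into labels, rather than baking them into the metric name, is what lets a single Grafana query cover every provider.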
@fungicide:matrix.org fbo: do you have a proposal for what that might look like in the tenant configuration? 14:45
@mfeder:matrix.org > <@fungicide:matrix.org> mfeder: here's an example of how opendev configures openstackclient to use a per-provider statsd prefix for nodepool launchers when using nodepool's openstack driver: https://opendev.org/opendev/system-config/src/branch/master/playbooks/templates/clouds/nodepool_clouds.yaml.j2#L26-L28 14:46
fungi: yes it is `driver: openstack`, thank you, this is my missing piece
@jim:acmegating.com fbo: i think i would consider that out of scope. most of zuul's config is designed to be in-repo, and that would require too much coordination between the out-of-repo config and in-repo config. 14:46
@jim:acmegating.com fbo: note that filenames/paths could be used to some similar effect here 14:47
@jim:acmegating.com mfeder: fungi note that the provider state gauges should be included for all providers automatically 14:48
@jim:acmegating.com mfeder: so while you may need additional config to get some of the openstack api performance metrics, the node state gauges should already be there if you're getting anything at all from nodepool, so there may be another problem. 14:49
@morucci:matrix.org fungi: no, not yet, I'd like to get your thoughts on it first 14:51
@fungicide:matrix.org mfeder: to corvus's point, make sure your stats target is allowing statsd traffic from your nodepool launchers if they're on different servers than your other components 14:52
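[editor's sketch] One way to check that statsd traffic from a launcher host actually reaches the stats target is to send a single test gauge by hand and then look for it in Graphite. The snippet below is a minimal sketch using the plain statsd line format (`name:value|g`) over UDP. The host `statsd.example.org` and the metric name `nodepool.test.probe` are hypothetical, and port 8125 is only the conventional statsd default, so adjust all three to your deployment.

```python
import socket


def format_gauge(name: str, value: int) -> bytes:
    """Build a statsd gauge datagram: '<name>:<value>|g'."""
    return f"{name}:{value}|g".encode("ascii")


def send_gauge(host: str, port: int, name: str, value: int) -> bytes:
    """Send one gauge datagram over UDP and return the payload that was sent."""
    payload = format_gauge(name, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical target and metric name; run this from the launcher host.
    send_gauge("statsd.example.org", 8125, "nodepool.test.probe", 1)
```

UDP gives no delivery confirmation, so a successful send only proves the datagram left the host. The real check is whether the probe metric then shows up on the Graphite side.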
@morucci:matrix.org corvus: Do you mean that splitting the config into various files in zuul.d, and having tenant config parameters to explicitly include or exclude files under zuul.d, would make more sense than doing it by config element name? 14:52
@mfeder:matrix.org > <@mhuin:matrix.org> @mfeder - not sure if Grafana supports statsd as a data source, but if you have a prometheus running you could use statsd-exporter to convert statsd metrics into prometheus metrics, then ingest them in Grafana via the prometheus datasource 14:53
The Grafana data source is Graphite, so my Zuul instance pushes metrics to Graphite.
Yes, I also played with the option of using statsd-exporter to convert statsd metrics into Prometheus metrics, but I see two issues here:
1. Mapping statsd metrics to the Prometheus format. This appears to be addressed (thank you:) https://softwarefactory-project.io/r/c/software-factory/sf-operator/+/29482
2. Creating Grafana dashboards for Zuul with a Prometheus datasource could be time-consuming, as I did not find any Zuul Grafana dashboards for the Prometheus datasource similar to those for Nodepool and Zuul Status at https://grafana.opendev.org/ (designed for the Graphite datasource).
Because of the above, I'm investigating Graphite as the storage for Zuul's statsd metrics.
@mhuin:matrix.org mfeder: right, apologies, I overlooked the part about openstack metrics 14:54
@jim:acmegating.com fbo: i mean that today there is an ability to include extra config directories, so with some careful judgement about what's in the default config directory, and when and which extra directories to include, some amount of partial or no-reuse can be achieved. i don't know how easy it is to adapt that to your use case, but that is something that i have seen done to address similar issues. 14:55
@jim:acmegating.com i don't think prometheus is the issue here, but there is a proposed mapping for the upstream repo here: https://review.opendev.org/660472 14:56
@jim:acmegating.com i agree that mfeder's best course is to fix this in graphite, since statsd is the ultimate source of these particular metrics and needs to work regardless, and leave consideration of prometheus for some other time 14:57
@jim:acmegating.com mfeder: the method in nodepool that emits the total "in-use" metrics also emits the same metrics for each provider, and also the "building" and "deleting" metrics, both total and per-provider. so if any of that is working, then all of it should be. you might want to make sure that the graphite server hasn't run out of disk; or check the actual database files on disk on the graphite server that correspond to those metrics. errors like that can appear as missing metrics in graphite, while graphite can continue to update existing metrics since it doesn't need any new space for those. 15:03
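[editor's sketch] Since the node state gauges are emitted together, both as totals and per provider, a quick way to see what is missing is to build the list of names you expect and compare it with what Graphite actually has. The helper below is only a sketch of that comparison. The per-provider path follows the dashboard paths quoted earlier in this log; the total path (`stats.gauges.nodepool.nodes.<state>`), the default state list, and the provider name `cloud1` are assumptions. How you obtain the list of present metrics (for example by listing Graphite's database files) is left to you.

```python
from typing import Iterable, List


def expected_node_gauges(
    providers: Iterable[str],
    states: Iterable[str] = ("building", "deleting"),
    prefix: str = "stats.gauges.nodepool",
) -> List[str]:
    """List the total and per-provider node state gauge names we expect to exist."""
    states = list(states)
    names = [f"{prefix}.nodes.{state}" for state in states]  # totals (assumed path)
    for provider in providers:
        for state in states:
            names.append(f"{prefix}.provider.{provider}.nodes.{state}")
    return names


def missing_gauges(expected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Return expected names absent from the present set, keeping their order."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


if __name__ == "__main__":
    expected = expected_node_gauges(["cloud1"])  # hypothetical provider name
    present = ["stats.gauges.nodepool.nodes.building"]  # e.g. collected from Graphite
    for name in missing_gauges(expected, present):
        print("missing:", name)
```

If only the newly created metric names are missing while older ones keep updating, that matches the out-of-disk symptom described above.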
@mfeder:matrix.org > <@fungicide:matrix.org> mfeder: to corvus's point, make sure your stats target is allowing statsd traffic from your nodepool launchers if they're on different servers than your other components 15:09
hmm, all Zuul components are deployed on the same server, and I can observe StatsD metrics in the target store (Graphite).
Interestingly, I can access all metrics except those related to nodes and ZooKeeper. I need to review the configurations for the Nodepool launcher and ZooKeeper to ensure everything is set up correctly.
@mfeder:matrix.org > <@jim:acmegating.com> mfeder: the method in nodepool that emits the total "in-use" metrics also emits the same metrics for each provider, and also the "building" and "deleting" metrics, both total and per-provider. so if any of that is working, then all of it should be. you might want to make sure that the graphite server hasn't run out of disk; or check the actual database files on disk on the graphite server that correspond to those metrics. errors like that can appear as missing metrics in graphite, while graphite can continue to update existing metrics since it doesn't need any new space for those. 15:11
I will do that. Thanks for all these hints!
@morucci:matrix.org corvus: thanks, perhaps that could help, yes. If I understood correctly, it would be based on this feature: https://zuul-ci.org/docs/zuul/latest/tenants.html#attr-tenant.untrusted-projects.%3Cproject%3E.extra-config-paths I'll see if that fits their need. 16:32
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 906433: Add support for persisting add_host across playbooks https://review.opendev.org/c/zuul/zuul/+/906433 23:16
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 906433: Add support for persisting add_host across playbooks https://review.opendev.org/c/zuul/zuul/+/906433 23:20
