*** nicovs_be has joined #ara | 00:08 | |
*** nicovs_be has quit IRC | 00:12 | |
ara-slack | <pilotmattk> @dmsimard We need the DB, running Maria. Using this in a *very* large, international shop. 200,000+ endpoints :grinning:. We had to build our own ansible control/cluster to slice inventory across nodes. Currently doing some load testing before opening the gates in Q1 next year | 02:07 |
---|---|---|
ara-slack | <pilotmattk> Without ARA, I can hit ~5K nodes in 3-6 minutes. With ARA it's more like 20mins, can only push about 32 commits per minute. DB is fairly close (same LAN). Troubleshooting both sides. DB and the app, currently. | 02:08 |
ara-slack | <dmsimard> @pilotmattk I'm sure you could be surprised by the performance of sqlite, even at a large scale. If there's even 5ms of latency each way (10ms roundtrip) for a MySQL database, if you're recording 20k task results (4 tasks on 5k hosts?) that's already 3 minutes worth of latency over the course of a playbook run | 02:10 |
ara-slack | <dmsimard> There's certainly an overhead in running a callback that records data, especially that amount of data. I'm not particularly surprised by your numbers. I'd love to improve them, though. | 02:11 |
ara-slack | <dmsimard> Out of curiosity, we could try benchmarking the current state of ARA 1.0 with your setup -- I'd love to test with that use case and find improvement opportunities. Not tonight, though :) | 02:12 |
ara-slack | <pilotmattk> Yea, I'm not at all surprised about callbacks adding time. 2-3X just seemed high. We're running playbooks all over the place inside docker containers (multiple clouds). Having SQLite files all over would be hard to track down. I wonder if there is a way to batch up results and send bulk commits? Maybe send things through a redis buffer as forks spawn/die. I'd be happy to give 1.0 a go, just replying as there is time | 02:16 |
ara-slack | :slightly_smiling_face: Will check back on tomorrow. | 02:16 |
ara-slack | <dmsimard> @pilotmattk Another thing is that ara 1.0 will ship the notion of input drivers. Right now you have this callback <1.0 that does pure SQL queries, in >1.0, this callback is refactored to use an API instead (either "internal (offline)" or HTTP REST). However, this callback will be folded back as a "driver". The driver implementation will make it easier to add other means of inserting data into ARA. I see you mention redis but we already have | 02:20 |
ara-slack | other message queues in mind like mqtt, rabbitmq, etc. This way, the data could be written to a low-latency bus and asynchronously processed to make it available in the interface. | 02:20 |
ara-slack | <dmsimard> Definitely happy to spend some time narrowing down how we can improve this :slightly_smiling_face: | 02:40 |
*** bcoca has quit IRC | 03:48 | |
*** jparrill has joined #ara | 06:59 | |
*** nicovs_be has joined #ara | 07:45 | |
*** jclaret has joined #ara | 07:57 | |
*** jcl has joined #ara | 07:57 | |
*** twouters_ is now known as twouters | 08:23 | |
*** twouters has joined #ara | 08:23 | |
*** jcl has quit IRC | 09:59 | |
*** jclaret has quit IRC | 10:00 | |
*** jcl has joined #ara | 10:00 | |
*** sshnaidm is now known as sshnaidm|afk | 10:06 | |
*** jclaret has joined #ara | 10:07 | |
*** sshnaidm|afk is now known as sshnaidm | 11:34 | |
*** nicovs_b_ has joined #ara | 12:02 | |
*** nicovs_be has quit IRC | 12:04 | |
*** bcoca has joined #ara | 13:35 | |
*** jclaret has quit IRC | 14:01 | |
*** jcl has quit IRC | 14:01 | |
*** dmsimard|off is now known as dmsimard | 14:05 | |
*** jclaret has joined #ara | 14:11 | |
*** jcl has joined #ara | 14:11 | |
*** bcoca has quit IRC | 14:18 | |
*** bcoca has joined #ara | 14:19 | |
*** bcoca has quit IRC | 14:19 | |
*** bcoca has joined #ara | 14:19 | |
ara-slack | <pilotmattk> I have a few ideas to try. Just to double-check that latency estimate vs 3 minutes. If I understand callbacks correctly, each fork runs the callback with the parent running a final callback at the end (summary). Is this correct? There *Should* be a high degree of concurrency at 500 forks (depends on db threads). | 14:34 |
ara-slack | <pilotmattk> I see the 1.0 branch out on github (your repo and openstack), how safe are the callbacks? Does the data model match 0.14.5 | 14:36 |
*** tbielawa has joined #ara | 14:41 | |
ara-slack | <dmsimard> @pilotmattk the database model is very different and there's no upgrade path, it breaks backwards compatibility. | 15:04 |
ara-slack | <dmsimard> The callback is safe (it is integration tested), I haven't yet fully tested the API with MySQL however. | 15:05 |
*** jcl has quit IRC | 15:06 | |
ara-slack | <dmsimard> I'm not sure about the impact of forks, not familiar with the low level implementation in Ansible | 15:06 |
*** jcl has joined #ara | 15:06 | |
ara-slack | <dmsimard> For yesterday's example, it was just napkin math, though, there's a bit more queries involved than that.. recording hosts, files, plays. Bulk of your time would be spent recording task results in your use case however | 15:08 |
ara-slack | <pilotmattk> OK, thank you. First order of business is to locate the DB in the same rack (or floor) as the worker nodes. Then thinking about ways to batch up / federate mysql (without the overhead of FederatedX). Might try writing to SQLite then dump and load to backhaul the data. Need some sort of local write cache / tmpfs. RabbitMQ is perfect going forward, seems 50/50 which in-mem store a python project will pick | 15:21 |
ara-slack | <dmsimard> @pilotmattk I don't know what's your use case but we had scalability issues in OpenStack because we were generating static reports for every CI job. The static reports aren't large but it's a lot of smaller files. Anyway, we came up with a WSGI middleware to load arbitrary sqlite databases which suits well the use case for "ephemeral" CI reports http://ara.readthedocs.io/en/latest/advanced.html | 15:25 |
ara-slack | <pilotmattk> I think I remember that blog post... something about static reports in jenkins. To date we do not generate static reports... We could likely store the sqlite file as a blob inside our Deployment Orchestrator db (postgres).. That has some potential. Call postgres HTTP api to retrieve the report. | 15:50 |
ara-slack | <pilotmattk> I'd have to think about how to collate the reports... eventually. there is a desire to have some visibility exactly what is being deployed where and how often. | 15:51 |
*** jparrill has quit IRC | 15:54 | |
*** nicovs_b_ has quit IRC | 16:08 | |
ara-slack | <dmsimard> @pilotmattk yeah, the feature is not so much to create static reports, but rather to dynamically load an arbitrary sqlite database | 16:31 |
ara-slack | <dmsimard> So instead of generating a static report and storing the report files, you store the sqlite database(s) instead | 16:31 |
ara-slack | <dmsimard> I recognize it's a niche use case though :slightly_smiling_face: | 16:32 |
*** nicovs_be has joined #ara | 16:52 | |
*** nicovs_be has quit IRC | 16:56 | |
*** dougbtv__ has joined #ara | 17:45 | |
*** dougbtv_ has quit IRC | 17:48 | |
*** cliles has joined #ara | 17:53 | |
*** tbielawa has quit IRC | 17:56 | |
*** dougbtv__ has quit IRC | 18:12 | |
*** tbielawa has joined #ara | 18:33 | |
*** dougbtv__ has joined #ara | 18:36 | |
*** tbielawa is now known as tbielawa|caff | 19:00 | |
*** jclaret has quit IRC | 19:21 | |
*** jcl has quit IRC | 19:21 | |
*** tbielawa|caff is now known as tbielawa | 19:28 | |
*** resmo has joined #ara | 20:22 | |
*** resmo has quit IRC | 20:24 | |
*** tbielawa has quit IRC | 21:07 | |
*** nicovs_be has joined #ara | 23:09 | |
*** nicovs_be has quit IRC | 23:13 |
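
The napkin math from the 02:10 message can be written out as a short calculation. This is only a sketch: it assumes exactly one blocking database round trip per task result and that results are recorded one after another, which the discussion notes is a simplification (hosts, files and plays add more queries, and the effect of forks was left open). The function name and parameters are made up for illustration.

```python
def added_latency_seconds(hosts, tasks_per_host, roundtrip_ms):
    """Estimate time spent waiting on the database for task results.

    Assumes one blocking round trip per task result, recorded serially.
    """
    results = hosts * tasks_per_host
    return results * roundtrip_ms / 1000.0


# 4 tasks on 5k hosts at 10 ms per round trip, as in the log.
total = added_latency_seconds(hosts=5000, tasks_per_host=4, roundtrip_ms=10)
print(f"{total:.0f} s (~{total / 60:.1f} min)")  # 200 s (~3.3 min)
```

Dividing the round-trip time by ten, for example by moving the database closer to the workers, would shrink the estimate by the same factor under these assumptions.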
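
The "batch up results and send bulk commits" idea raised at 02:16 could look roughly like the sketch below. It is not how the ARA callback works; the `BatchedRecorder` class, the `results` table and its columns are hypothetical, and an in-memory SQLite database stands in for the real backend. It only illustrates buffering rows and committing them in groups rather than one commit per task result.

```python
import sqlite3


class BatchedRecorder:
    """Hypothetical helper: buffer result rows and commit them in batches."""

    def __init__(self, conn, batch_size=500):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = []
        self.commits = 0

    def record(self, host, task, status):
        # Buffer the row; only touch the database once the batch is full.
        self.pending.append((host, task, status))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        # The connection context manager commits on success
        # and rolls back if the insert raises.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO results (host, task, status) VALUES (?, ?, ?)",
                self.pending,
            )
        self.pending.clear()
        self.commits += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (host TEXT, task TEXT, status TEXT)")

recorder = BatchedRecorder(conn, batch_size=500)
for h in range(5000):
    for t in range(4):
        recorder.record(f"host{h}", f"task{t}", "ok")
recorder.flush()  # write out any partial final batch

print(recorder.commits)  # 40 commits instead of 20000
```

The trade-off is that rows still in the buffer are lost if the process dies before a flush, which is one reason a durable queue was floated later in the conversation.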