*** tosky has quit IRC | 00:04 | |
*** macz_ has quit IRC | 00:58 | |
*** mlavalle has quit IRC | 01:30 | |
*** _mlavalle_1 has joined #openstack-meeting-3 | 01:30 | |
*** artom has quit IRC | 02:12 | |
*** hemanth_n has joined #openstack-meeting-3 | 02:26 | |
*** benj_- has joined #openstack-meeting-3 | 02:35 | |
*** benj_ has quit IRC | 02:35 | |
*** benj_- is now known as benj_ | 02:35 | |
*** macz_ has joined #openstack-meeting-3 | 02:55 | |
*** macz_ has quit IRC | 03:00 | |
*** macz_ has joined #openstack-meeting-3 | 03:46 | |
*** macz_ has quit IRC | 03:50 | |
*** ricolin has joined #openstack-meeting-3 | 05:59 | |
*** yamamoto has quit IRC | 06:48 | |
*** lkoranda has joined #openstack-meeting-3 | 07:25 | |
*** yamamoto has joined #openstack-meeting-3 | 07:29 | |
*** lkoranda has quit IRC | 07:35 | |
*** eolivare has joined #openstack-meeting-3 | 07:35 | |
*** yamamoto has quit IRC | 07:39 | |
*** slaweq has joined #openstack-meeting-3 | 08:00 | |
*** tosky has joined #openstack-meeting-3 | 08:33 | |
*** e0ne has joined #openstack-meeting-3 | 08:54 | |
*** aarents has quit IRC | 09:24 | |
*** tosky_ has joined #openstack-meeting-3 | 09:47 | |
*** tosky is now known as Guest24372 | 09:49 | |
*** tosky_ is now known as tosky | 09:49 | |
*** Guest24372 has quit IRC | 09:50 | |
*** lpetrut has joined #openstack-meeting-3 | 09:57 | |
*** yamamoto has joined #openstack-meeting-3 | 10:12 | |
*** baojg has quit IRC | 10:18 | |
*** baojg has joined #openstack-meeting-3 | 10:20 | |
*** artom has joined #openstack-meeting-3 | 11:13 | |
*** macz_ has joined #openstack-meeting-3 | 11:18 | |
*** macz_ has quit IRC | 11:23 | |
*** yamamoto has quit IRC | 11:32 | |
*** raildo has joined #openstack-meeting-3 | 11:53 | |
*** yamamoto has joined #openstack-meeting-3 | 11:55 | |
*** yamamoto has quit IRC | 12:00 | |
*** eolivare_ has joined #openstack-meeting-3 | 12:03 | |
*** eolivare has quit IRC | 12:05 | |
*** baojg has quit IRC | 12:09 | |
*** baojg has joined #openstack-meeting-3 | 12:09 | |
*** yamamoto has joined #openstack-meeting-3 | 12:20 | |
*** eolivare_ has quit IRC | 12:25 | |
*** baojg has quit IRC | 12:42 | |
*** eolivare_ has joined #openstack-meeting-3 | 12:43 | |
*** baojg has joined #openstack-meeting-3 | 12:44 | |
*** hemanth_n has quit IRC | 13:03 | |
*** liuyulong has joined #openstack-meeting-3 | 13:59 | |
*** mdelavergne has joined #openstack-meeting-3 | 14:51 | |
ttx | #startmeeting large_scale_sig | 15:00 |
---|---|---|
openstack | Meeting started Wed Dec 16 15:00:06 2020 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
*** genekuo has joined #openstack-meeting-3 | 15:00 | |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
*** openstack changes topic to " (Meeting topic: large_scale_sig)" | 15:00 | |
openstack | The meeting name has been set to 'large_scale_sig' | 15:00 |
ttx | #topic Rollcall | 15:00 |
*** openstack changes topic to "Rollcall (Meeting topic: large_scale_sig)" | 15:00 | |
ttx | Who is here for the Large Scale SIG meeting ? | 15:00 |
mdelavergne | Hi! | 15:00 |
genekuo | o/ | 15:00 |
ttx | mdelavergne: hi! | 15:00 |
jpward | o/ | 15:00 |
liuyulong | HI | 15:00 |
ttx | pinging amorin | 15:01 |
ttx | I don't see belmiro in channel | 15:01 |
*** belmoreira has joined #openstack-meeting-3 | 15:01 | |
ttx | pinging imtiazc too | 15:01 |
imtiazc | I'm here | 15:02 |
belmoreira | o/ | 15:02 |
ttx | if we say belmoreira 3 times he appears | 15:02 |
ttx | Our agenda for today is at: | 15:02 |
ttx | #link https://etherpad.openstack.org/p/large-scale-sig-meeting | 15:02 |
ttx | #topic Review previous meetings action items | 15:02 |
*** openstack changes topic to "Review previous meetings action items (Meeting topic: large_scale_sig)" | 15:02 | |
ttx | "ttx to add 5th stage around upgrade and maintain scaled out systems in operation" | 15:03 |
ttx | that's done at: | 15:03 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain | 15:03 |
ttx | we can have a look when we review those pages later | 15:03 |
ttx | "ttx to make sure oslo.metrics 0.1 is released" | 15:03 |
ttx | That was done through https://review.opendev.org/c/openstack/releases/+/764631 and now oslo.metrics is available at: | 15:03 |
ttx | #link https://pypi.org/project/oslo.metrics/ | 15:03 |
ttx | It was also added to OpenStack global requirements by genekuo: | 15:03 |
ttx | #link https://review.opendev.org/c/openstack/requirements/+/766662 | 15:04 |
ttx | So it's now ready to consume and will be included in OpenStack Wallaby. | 15:04 |
genekuo | ttx, can you ping me when CI is fix? | 15:04 |
genekuo | *fixed | 15:04 |
ttx | It's up to us to now better explain how to enable and use it | 15:04 |
ttx | I'll ask around on what is still blocked, yes | 15:04 |
ttx | "all to help in filling out https://etherpad.opendev.org/p/large-scale-sig-scaling-videos" | 15:04 |
ttx | Thanks everyone for the help there! | 15:05 |
imtiazc | Here | 15:05 |
ttx | As a reminder we did look up those videos for two reasons: | 15:05 |
ttx | - we can link to them on wiki pages as a good resource to watch (if relevant to any stage) | 15:05 |
ttx | - we could reach out to specific presenters so that they share a bit more about their scaling story | 15:05 |
ttx | So if you watch them and find them very relevant for any of our stages, please add them to the wiki pages | 15:05 |
ttx | And if a specific use case looks very interesting but lacks details, we could reach out to the presenters with more questions | 15:05 |
ttx | Especially from presenters who are not already on the SIG, like China Mobile, Los Alamos, ATT, Reliance Jio... | 15:06 |
ttx | Questions on that? | 15:06 |
mdelavergne | seems straightforward | 15:07 |
ttx | "ttx to check out Ops meetups future plans" | 15:07 |
ttx | I did ask and there is no event planned yet, so we can't piggyback on that for now for our "scaling story collection" work | 15:07 |
ttx | we'll see what event(s) are being organized in 2021. should see more clearly in January | 15:08 |
ttx | "all to review pages under https://wiki.openstack.org/wiki/Large_Scale_SIG in preparation for next meeting" | 15:08 |
ttx | We'll discuss that now in more detail in the next topic | 15:08 |
genekuo | I've put some short answers to some of the questions listed there | 15:08 |
ttx | Any question or comment on those action items? Anything to add? | 15:08 |
genekuo | at least the things I know | 15:08 |
ttx | oh yes I saw it | 15:08 |
ttx | that's the idea, feel free to add things to those pages. We'll review them now and see if there is anything we should prioritize adding | 15:09 |
ttx | #topic Reviewing all scaling stages, and identifying simple tasks to do a first pass at improving those pages | 15:09 |
*** openstack changes topic to "Reviewing all scaling stages, and identifying simple tasks to do a first pass at improving those pages (Meeting topic: large_scale_sig)" | 15:09 | |
ttx | So.. the first one is... | 15:09 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Configure | 15:10 |
ttx | At this stage I don't think there are any easy tasks... | 15:10 |
ttx | amorin did lead the curation of at-scale configuration defaults | 15:10 |
ttx | But it's still work in progress, | 15:10 |
ttx | so I don't think we have a final answer for the "Which parameters should I adjust before tackling scale ?" question | 15:11 |
ttx | Are there other common questions that we should list for that stage? | 15:11 |
ttx | maybe something around choosing the right drivers/backends at install time | 15:12 |
mdelavergne | maybe "how not" ? | 15:12 |
ttx | like which are the ones that actually CAN scale? | 15:12 |
imtiazc | Rabbit configuration. We had to tweak a few things there. | 15:12 |
ttx | maybe we can split the question into openstack parameters and rabbit parameters | 15:13 |
ttx | I'll do that now | 15:13 |
genekuo | I agree with listing out the drivers and backends people are using at large scale | 15:14 |
*** ralonsoh has quit IRC | 15:15 | |
ttx | ok I added those two as questions | 15:15 |
*** ralonsoh has joined #openstack-meeting-3 | 15:15 | |
*** ricolin_ has joined #openstack-meeting-3 | 15:15 | |
ttx | Any other easy things to add at that stage? | 15:16 |
imtiazc | Apart from DB and RMQ tuning, we had to add memcached. Memcached used to be an optional deployment option but makes a big difference in performance | 15:16 |
ttx | imtiazc: how about we add "should I use memcached?" question | 15:17 |
ttx | then you can answer it | 15:17 |
ttx | :) | 15:17 |
imtiazc | Sure | 15:17 |
ttx | I like it, that's a good one | 15:18 |
ttx | OK, moving on to next stage... | 15:18 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/Monitor | 15:18 |
jpward | I don't know exactly how to ask the question, but what about determining the number of controller nodes and associated services? | 15:18 |
ttx | jpward: that would be for step 3 | 15:18 |
ttx | we'll be back to it | 15:18 |
ttx | For the "Monitor" stage I feel like we should redirect people more aggressively to oslo.metrics | 15:19 |
ttx | oh I see that genekuo already added those | 15:19 |
genekuo | yeah | 15:20 |
ttx | genekuo: the next step will be to write good doc for oslo.metrics | 15:20 |
mdelavergne | yep, seems that oslo.metrics is currently everywhere :D | 15:20 |
ttx | so that we can redirect people to it and there they will find all answers | 15:20 |
genekuo | I think there is some other stuff that is worth monitoring, like queued messages in rabbitmq | 15:20 |
genekuo | I will try to add some docs once the oslo.messaging code is done | 15:21 |
ttx | Anything else to add? I was tempted to add questions around "how do I track latency issues", "how do I track traffic issues", "how do I track error rates", "how do I track saturation issues" | 15:22 |
ttx | but I'm not sure we would have good answers for those anytime soon | 15:22 |
imtiazc | Is oslo.metrics supposed to help with distributed tracing? | 15:22 |
*** lpetrut has quit IRC | 15:23 | |
genekuo | I'm not sure how the question should be phrased, but we do monitor queued messages in rabbitmq | 15:23 |
genekuo | if they keep piling up, it may indicate that there aren't enough workers | 15:23 |
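A minimal sketch of the check genekuo describes above: queued messages that keep piling up may mean there are not enough workers. The function name, the window size and the growth threshold are all hypothetical, and the depth readings are assumed to come from whatever RabbitMQ monitoring is already in place. This is not part of oslo.metrics.

```python
from typing import Sequence


def queue_is_piling_up(depths: Sequence[int],
                       min_samples: int = 5,
                       min_growth: int = 100) -> bool:
    """Return True when the most recent readings show sustained growth.

    depths: queued-message counts for one queue, oldest first
            (assumed to be sampled at a regular interval).
    min_samples: hypothetical window size to inspect.
    min_growth: hypothetical minimum increase across the window.
    """
    if len(depths) < min_samples:
        # Not enough history to say anything yet.
        return False
    window = list(depths[-min_samples:])
    # "Piling up" here means: never draining within the window...
    never_drains = all(later >= earlier
                       for earlier, later in zip(window, window[1:]))
    # ...and growing by at least min_growth overall.
    return never_drains and (window[-1] - window[0]) >= min_growth
```

A queue that spikes and then drains is not flagged, which matches the idea that only a backlog that keeps growing points at too few workers. The right thresholds depend on the deployment.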
ttx | imtiazc: I'd say that oslo.metrics is more around targeted monitoring of external data sources (database, queue) from openstack perspective | 15:24 |
genekuo | imtiazc, what do you mean by distributed tracing? can you give an example on that? | 15:24 |
ttx | like tracing a user call through all components? | 15:24 |
genekuo | thanks ttx for the explanation | 15:24 |
imtiazc | ttx: Those are good questions. I think the answers will vary from one operator to another. | 15:24 |
genekuo | I agree it will be good to add those questions | 15:25 |
ttx | ok, I'll add them now | 15:25 |
imtiazc | genekuo: An example would be how much time each component of OpenStack takes to create a VM. It can be traced using a common request ID | 15:26 |
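A minimal sketch of the idea imtiazc describes above: group timing records by a common request ID and add up the time spent in each component. The record layout, the request ID value and the function name are hypothetical, chosen only for illustration. This is not the OSProfiler or oslo.metrics API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (request_id, component, start_seconds, end_seconds)
Span = Tuple[str, str, float, float]


def time_per_component(spans: Iterable[Span], request_id: str) -> Dict[str, float]:
    """Sum the time each component spent on one request."""
    totals: Dict[str, float] = defaultdict(float)
    for rid, component, start, end in spans:
        if rid == request_id:
            totals[component] += end - start
    return dict(totals)


# Illustrative data only: one VM creation request plus an unrelated request.
example_spans = [
    ("req-1", "nova-api", 0.0, 0.5),
    ("req-1", "nova-scheduler", 0.5, 2.0),
    ("req-2", "nova-api", 1.0, 1.25),
    ("req-1", "nova-compute", 2.0, 6.0),
    ("req-1", "nova-api", 6.0, 6.25),
]
breakdown = time_per_component(example_spans, "req-1")
```

This gives a per-request breakdown, which is the part ttx and genekuo note below is a different goal from the aggregate numbers oslo.metrics provides.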
ttx | imtiazc: OSProfiler is supposed to help there | 15:27 |
ttx | example https://docs.openstack.org/ironic/pike/_images/sample_trace.svg | 15:28 |
imtiazc | Thanks, haven't tried that out yet. We were considering hooking up with OpenTracing or something like Jaeger | 15:28 |
ttx | I haven't looked at it in a while, so not sure how usable it is | 15:29 |
genekuo | for oslo.metrics, I think what you can get is how much time it takes for scheduling rpc calls in a certain period. | 15:29 |
genekuo | but not for a specific request | 15:29 |
ttx | right, it's a different goal | 15:29 |
mdelavergne | Osprofiler worked fine when we used it | 15:30 |
ttx | imtiazc: if you have a fresh look at it, I'm sure the group will be interested in learning what you thought of it | 15:30 |
ttx | ok, anything else to add to the Monitoring stage at this point? | 15:30 |
genekuo | LGTM | 15:31 |
imtiazc | Is there a plan for the community to develop all the monitoring checks (e.g. prometheus checks)? | 15:31 |
ttx | imtiazc: there has been a Technical Committee discussion on how to develop something for monitoring that's more sustainable than Ceilometer | 15:32 |
ttx | including building something around prometheus | 15:32 |
ttx | discussion died down as people did not take immediate interest in working on it | 15:33 |
ttx | that does not mean it's not important | 15:33 |
ttx | we might need to revive that discussion after the holidays in one way or another | 15:33 |
ttx | moving on to stage 3 | 15:34 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleUp | 15:34 |
ttx | so this is where we should give guidance on number of nodes | 15:35 |
ttx | jpward: ^ | 15:35 |
ttx | some of the resources listed here might be a better fit in the Configure stage | 15:35 |
genekuo | For third question "RabbitMQ is clearly my bottleneck, is there a way to reconfigure that part to handle more scale?" | 15:36 |
ttx | Like it's a bit late in the journey to select a Neutron backend | 15:36 |
genekuo | should we put this in step 1? | 15:36 |
imtiazc | That's a good topic :) The answer, however, depends a lot on the network provider selection. | 15:36 |
imtiazc | We often wondered what tools other operators use. For example: what network provider are they using, and what do they use for monitoring and logging? How do others provision their hosts (before even deploying OpenStack), and which deployment tool do they use (Puppet, Ansible, etc.)? Do you think we can come up with a matrix/table for this? | 15:36 |
ttx | genekuo: yeah, I think we should delete that question from this stage. I already added a question on RabbitMQ configuration | 15:37 |
ttx | done | 15:37 |
jpward | imtiazc, I have wondered the same thing, I would like to see that as well | 15:37 |
ttx | I'll move the Neutron backends comparison to stage 1 too | 15:38 |
ttx | ok done | 15:39 |
genekuo | imtiazc, I think there's a lot of feedback about what tools ops use in the ops forum during the summit | 15:40 |
jpward | should there also be a planning stage? Like determining the type of hardware, networking configurations, etc? | 15:40 |
ttx | yeah, the trick is to reduce all that feedback into common best practices | 15:40 |
ttx | jpward: currently we use stage 1 (Configure) for that | 15:40 |
ttx | It's like initial decisions (stage 1) and later decisions (stage 3) | 15:41 |
jpward | ok | 15:41 |
ttx | Picking a neutron backend would be an initial decision | 15:41 |
ttx | deciding on a control plane / data plane number of nodes mix is more stage 3 | 15:42 |
*** liuyulong has quit IRC | 15:42 | |
ttx | (bad example, it's the one where the answer is the most "it depends") | 15:42 |
ttx | maybe we should rename to the "It Depends SIG" | 15:42 |
jpward | lol | 15:43 |
genekuo | lol | 15:43 |
ttx | Seriously though, there is a reason why there is no "Scaling guide" yet... It's just hard to extract common guidance | 15:43 |
genekuo | we determine the number of control plane processes by looking at the rabbitmq queues | 15:43 |
ttx | yet we need to, because this journey is superscary | 15:44 |
genekuo | if the number of messages keeps queueing up it probably means that you need to add more workers | 15:44 |
ttx | So any answer or reassurance we can give, we should. | 15:44 |
ttx | genekuo: would you mind adding a question around that? Like "how do you decide to add a new node for control plane" maybe | 15:45 |
imtiazc | Yes, the guidance is somewhat dependent on monitoring your queues and other services. But I think we can vouch for the max number of computes given our architecture. | 15:45 |
genekuo | ttx, let me add it | 15:45 |
ttx | Frankly, we should set the bar pretty low. Any information is better than the current void | 15:46 |
ttx | which is why I see this as a no pressure exercise | 15:46 |
*** _mlavalle_1 has quit IRC | 15:46 | |
ttx | It is a complex system and every use case is different | 15:46 |
ttx | If optimizing was easy we'd encode it in the software | 15:47 |
ttx | So even if the answer is always "it depends", at least we can say "it depends on..." | 15:47 |
genekuo | done | 15:47 |
ttx | and provide tools to help determine the best path | 15:47 |
ttx | genekuo: thx | 15:47 |
imtiazc | We had some rough ideas on how much we could scale based on feedback from other operators like CERN, Salesforce, PayPal etc.. | 15:48 |
ttx | Anything else to add to Scaleup? | 15:48 |
ttx | imtiazc: the best way is, indeed, to listen and discuss with others and apply what they say mentally to your use case | 15:48 |
ttx | Maybe one pro tip we should give is to attend events, watch presentations, engage with fellow operators | 15:49 |
ttx | once it is possible to socialize again :) | 15:50 |
genekuo | sounds good | 15:50 |
ttx | ok, moving on to next stage | 15:50 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/ScaleOut | 15:50 |
ttx | So here I think it would be great to have a few models | 15:50 |
ttx | I can't lead that as I don't have practical experience doing it | 15:51 |
genekuo | me neither, we currently only split regions for DR purposes | 15:51 |
ttx | If someone is interested in listing the various ways you can scale out to multiple clusters/zones/regions/cells... | 15:51 |
ttx | genekuo: independent clusters is still one model | 15:52 |
ttx | So we won't solve that one today, but if you're interested in helping there, let me know | 15:52 |
ttx | Last stage is the one I just added | 15:53 |
ttx | #link https://wiki.openstack.org/wiki/Large_Scale_SIG/UpgradeAndMaintain | 15:53 |
ttx | (based on input from last meeting) | 15:53 |
imtiazc | We are also following a cookie cutter model. Once we have determined a max size we are comfortable with, we just replicate. I do like what CERN has done there | 15:53 |
ttx | imtiazc: that's good input. If you can formalize it as a question/answer, I think it would be a great addition | 15:54 |
ttx | So again, I don't think there is easy low-hanging fruit in this stage we could pick up | 15:54 |
ttx | Also wondering how much that stage depends on the distribution you picked at stage 1 | 15:55 |
ttx | could be an interesting question to add -- which OpenStack distribution model is well-suited for large scale | 15:56 |
ttx | (stage 1 probably) | 15:56 |
ttx | I'll add it | 15:56 |
ttx | Any last comment before we switch to discussing next meeting date? | 15:57 |
genekuo | nope :) | 15:57 |
imtiazc | By distribution, do you mean Ubuntu, RedHat, SuSe etc? | 15:57 |
ttx | or openstackansible etc | 15:58 |
ttx | Like how do you install openstack | 15:58 |
imtiazc | ok. thanks. I don't have anything else for today | 15:58 |
ttx | So not really Ubuntu, but Ubuntu debs vs. Juju vs... | 15:59 |
ttx | #topic Next meeting | 15:59 |
*** openstack changes topic to "Next meeting (Meeting topic: large_scale_sig)" | 15:59 | |
ttx | As discussed last meeting, we'll skip the meeting over the end-of-year holidays | 15:59 |
ttx | So our next meeting will be January 13. | 15:59 |
ttx | I don't think we'll have a specific item to discuss in-depth, we'll just focus on restarting the Large Scale SIG engine in the new year | 15:59 |
imtiazc | Happy holidays everyone! | 15:59 |
ttx | Super, we made it to the end of the meeting without logging any TODOs! We'll be able to take a clean break over the holidays | 15:59 |
ttx | Thanks everyone | 15:59 |
ttx | #endmeeting | 16:00 |
*** openstack changes topic to "OpenStack Meetings || https://wiki.openstack.org/wiki/Meetings/" | 16:00 | |
openstack | Meeting ended Wed Dec 16 16:00:03 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
mdelavergne | Happy holidays, see you next year, and thanks :) | 16:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.html | 16:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.txt | 16:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/large_scale_sig/2020/large_scale_sig.2020-12-16-15.00.log.html | 16:00 |
ttx | right on time | 16:00 |
ttx | that was clsoe | 16:00 |
genekuo | thanks all, see you next year | 16:00 |
ttx | close even | 16:00 |
ttx | genekuo: thanks! | 16:00 |
*** imtiazc has left #openstack-meeting-3 | 16:00 | |
*** mdelavergne has quit IRC | 16:00 | |
*** macz_ has joined #openstack-meeting-3 | 16:04 | |
*** eolivare_ has quit IRC | 16:16 | |
*** mlavalle has joined #openstack-meeting-3 | 16:16 | |
*** ricolin_ has quit IRC | 16:31 | |
*** ralonsoh is now known as ralonsoh|afk | 17:00 | |
*** belmoreira has quit IRC | 17:34 | |
*** e0ne has quit IRC | 19:52 | |
*** baojg has quit IRC | 21:15 | |
*** baojg has joined #openstack-meeting-3 | 21:17 | |
*** baojg has quit IRC | 21:19 | |
*** baojg has joined #openstack-meeting-3 | 21:21 | |
*** ralonsoh|afk has quit IRC | 22:09 | |
*** raildo has quit IRC | 22:30 | |
*** slaweq has quit IRC | 22:42 | |
*** haleyb is now known as haleyb|away | 23:18 | |
*** baojg has quit IRC | 23:25 | |
*** baojg has joined #openstack-meeting-3 | 23:26 |