*** d0ugal has quit IRC | 01:48 | |
*** d0ugal has joined #openstack-mistral | 01:51 | |
*** bobh has joined #openstack-mistral | 02:14 | |
*** apetrich has quit IRC | 02:15 | |
*** nguyenhai_ has quit IRC | 02:18 | |
openstackgerrit | Nguyen Van Trung proposed openstack/mistral master: Don't quote {posargs} in tox.ini https://review.openstack.org/608827 | 03:22 |
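Why quoting `{posargs}` matters: tox substitutes the positional arguments into the command line. If the placeholder is wrapped in quotes, several arguments collapse into one, and no arguments become a single empty-string argument. The sketch below models only that effect with `shlex`. It is not tox's real substitution code, and `render_command` and the `stestr run` command are hypothetical examples.

```python
import shlex
from typing import List


def render_command(template: str, posargs: List[str]) -> List[str]:
    """Hypothetical model of {posargs} substitution followed by shell-style splitting.

    Assumption: posargs are joined with spaces and the result is split like a
    POSIX shell would split it. Real tox logic is more involved.
    """
    joined = " ".join(posargs)
    return shlex.split(template.replace("{posargs}", joined))


# Quoted: two arguments are glued into one ("--slowest test_foo").
print(render_command('stestr run "{posargs}"', ["--slowest", "test_foo"]))
# Unquoted: each argument stays separate.
print(render_command("stestr run {posargs}", ["--slowest", "test_foo"]))
# Quoted with no posargs: an empty-string argument is passed through.
print(render_command('stestr run "{posargs}"', []))
```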
*** bobh has quit IRC | 03:23 | |
*** hardikjasani has joined #openstack-mistral | 04:06 | |
*** akovi has joined #openstack-mistral | 04:20 | |
*** jaosorior has joined #openstack-mistral | 04:28 | |
*** apetrich has joined #openstack-mistral | 05:32 | |
*** jtomasek has joined #openstack-mistral | 07:08 | |
*** pgaxatte has joined #openstack-mistral | 07:20 | |
*** mcdoker18 has joined #openstack-mistral | 07:27 | |
*** gkadam has joined #openstack-mistral | 07:57 | |
d0ugal | #endmeeting | 08:10 |
*** openstack changes topic to "Mistral the Workflow Service for OpenStack. https://docs.openstack.org/mistral/latest/" | 08:10 | |
openstack | Meeting ended Tue Oct 9 08:10:42 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 08:10 |
d0ugal | oops :) | 08:10 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.html | 08:10 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.txt | 08:10 |
openstack | Log: http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.log.html | 08:10 |
d0ugal | I'm getting more forgetful. Maybe logging these office hours isn't very useful anyway | 08:11 |
*** shardy has joined #openstack-mistral | 08:40 | |
mcdoker18 | Hi, maybe we should disable the mistral-rally-task job? It fails too often for no obvious reason | 08:47 |
openstackgerrit | Vitalii Solodilov proposed openstack/mistral master: Fix state change propagation in workflows https://review.openstack.org/607960 | 09:59 |
d0ugal | mcdoker18: Yeah, let's make it non-voting until somebody can make it more stable | 10:16 |
d0ugal | I'll do that. | 10:16 |
openstackgerrit | Dougal Matthews proposed openstack/mistral master: Make the rally job non-voting https://review.openstack.org/608906 | 10:18 |
d0ugal | mcdoker18, rakhmerov, apetrich: ^ | 10:18 |
rakhmerov | d0ugal: I investigated that | 10:19 |
rakhmerov | ok to make it non-voting for now | 10:19 |
rakhmerov | mcdoker18: there's a reason | 10:19 |
d0ugal | :) | 10:19 |
d0ugal | What is the reason? | 10:20 |
rakhmerov | Mysql | 10:20 |
d0ugal | lol | 10:20 |
rakhmerov | that gives tons of deadlocks | 10:20 |
d0ugal | right | 10:20 |
rakhmerov | they've always been there but somehow they started occurring more often | 10:21 |
rakhmerov | that is something that I don't understand yet | 10:21 |
d0ugal | Is that also why the mysql unit tests have been failing more recently? | 10:21 |
rakhmerov | we can also switch this test to a lighter workflow | 10:21 |
rakhmerov | d0ugal: yes | 10:21 |
rakhmerov | so that scenario now uses join_500 workflow | 10:21 |
rakhmerov | we can switch it to join_100 | 10:22 |
rakhmerov | that also exists in the rally scenarios | 10:22 |
d0ugal | Right | 10:22 |
rakhmerov | I think that's better than making it non-voting, maybe | 10:22 |
d0ugal | rakhmerov: did you also see that the rally jobs caused OOM on CI infra? | 10:22 |
rakhmerov | the chances of failure will be much lower | 10:22 |
d0ugal | https://bugs.launchpad.net/mistral/+bug/1795068 | 10:22 |
openstack | Launchpad bug 1795068 in Mistral "screen-mistral-engine.txt size is causing logstash index OOM" [High,Confirmed] | 10:22 |
rakhmerov | yes, I have | 10:22 |
rakhmerov | same reason | 10:23 |
rakhmerov | I downloaded the log (~1 GB) and it's full of these deadlock exceptions | 10:23 |
rakhmerov | so I'd suggest we switch to join_100 for now | 10:23 |
rakhmerov | what do you think? | 10:23 |
d0ugal | Sure, if you think that will reduce the deadlocks? | 10:24 |
rakhmerov | yes | 10:24 |
rakhmerov | definitely it will | 10:24 |
rakhmerov | the thing is that we started using pymysql driver everywhere | 10:24 |
rakhmerov | + eventlet RPC executor | 10:24 |
d0ugal | so maybe that is the issue | 10:24 |
rakhmerov | which means that within one process we can now have many parallel transactions | 10:25 |
d0ugal | Right | 10:25 |
rakhmerov | well, eventually yes | 10:25 |
d0ugal | Do you use that configuration in production? | 10:25 |
rakhmerov | so more parallel transactions => more deadlocks | 10:25 |
rakhmerov | yep | 10:25 |
d0ugal | Makes sense | 10:25 |
rakhmerov | and we also have issues with that ) | 10:25 |
d0ugal | That was my next question :) | 10:25 |
rakhmerov | I'm actually currently working on improving join | 10:25 |
rakhmerov | so I might fix that as well soon | 10:25 |
mcdoker18 | I didn't see such problems using postgresql (: | 10:26 |
d0ugal | I guess we don't often have enough parallelisation to see that issue. I'm not aware of any reports. | 10:26 |
d0ugal | <3 Postgres | 10:26 |
rakhmerov | mcdoker18: ooh, yes. Postgres behaves much better ) | 10:26 |
rakhmerov | d0ugal: yes, this problem becomes serious exactly in case of join | 10:27 |
d0ugal | Right | 10:27 |
rakhmerov | 500 potential parallel transactions | 10:27 |
d0ugal | lol | 10:27 |
rakhmerov | fewer in practice, of course, because not all of them turn out to be parallel | 10:27 |
d0ugal | Sure | 10:27 |
rakhmerov | but still a big number under 500 | 10:27 |
rakhmerov | so let's use a different WF for now | 10:28 |
rakhmerov | I'm also thinking about limiting the number of parallel transactions/threads in the configuration somehow | 10:28 |
mcdoker18 | Can we restrict the number of eventlet threads that handle messages? | 10:28 |
rakhmerov | that may be even possible already, e.g. via sqlalchemy | 10:28 |
rakhmerov | mcdoker18: yep, thinking about it | 10:28 |
rakhmerov | I'll look what we can do | 10:29 |
mcdoker18 | [database] max_overflow | 10:29 |
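The idea behind capping `[database] max_overflow` (together with the pool size) is that a connection pool bounds how many transactions one process can have open at once; extra greenthreads wait for a free connection instead of piling on more concurrent transactions. The sketch below is a toy, stdlib-only model of that cap, assuming QueuePool-like semantics (at most `pool_size + max_overflow` checkouts). `BoundedConnectionPool` is hypothetical and uses OS threads, not eventlet or SQLAlchemy.

```python
import threading
import time
from contextlib import contextmanager


class BoundedConnectionPool:
    """Hypothetical toy pool: at most pool_size + max_overflow concurrent checkouts."""

    def __init__(self, pool_size: int, max_overflow: int, timeout: float = 30.0):
        self._sem = threading.BoundedSemaphore(pool_size + max_overflow)
        self._timeout = timeout
        self._lock = threading.Lock()
        self.in_use = 0  # connections currently checked out
        self.peak = 0    # highest concurrency observed

    @contextmanager
    def connection(self):
        # Callers beyond the limit block here instead of opening a new transaction.
        if not self._sem.acquire(timeout=self._timeout):
            raise TimeoutError("pool exhausted")
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        try:
            yield object()  # stand-in for a real DB connection
        finally:
            with self._lock:
                self.in_use -= 1
            self._sem.release()


def demo(workers: int = 50) -> BoundedConnectionPool:
    pool = BoundedConnectionPool(pool_size=5, max_overflow=2)

    def work():
        with pool.connection():
            time.sleep(0.01)  # pretend to run a short transaction

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool
```

With 50 workers the observed peak never exceeds 7, which is the kind of limit that would, in principle, bound how many transactions can contend for the same rows.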
mcdoker18 | If you have a couple of minutes, please leave comments https://review.openstack.org/#/c/607703/ | 10:30 |
rakhmerov | yes, ok | 10:30 |
rakhmerov | mcdoker18: Vitalii, can you please look at https://bugs.launchpad.net/mistral/+bug/1793651 ? | 10:31 |
openstack | Launchpad bug 1793651 in Mistral "Backwards compatibility issue: when starting a workflow "params" can't be null now" [High,Confirmed] | 10:31 |
rakhmerov | I have a feeling that it's related to something that you did not so long ago | 10:31 |
rakhmerov | maybe I'm wrong but still asking you to check if you can find time this week | 10:31 |
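Bug 1793651 is about a request with `"params": null` no longer being accepted. One backward-compatible approach is to normalize a missing or null value to an empty dict before validation. This is a hypothetical sketch (`normalize_params` and `params_from_request` are illustrative names, not Mistral's code), and the request body shape is assumed.

```python
import json
from typing import Any, Dict, Optional


def normalize_params(params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: treat a missing or null "params" as an empty dict."""
    if params is None:
        return {}
    if not isinstance(params, dict):
        raise TypeError("params must be a JSON object or null")
    return dict(params)


def params_from_request(body: str) -> Dict[str, Any]:
    """Parse an (assumed) execution-creation body and return normalized params."""
    data = json.loads(body)
    return normalize_params(data.get("params"))


print(params_from_request('{"workflow_name": "wf", "params": null}'))
```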
rakhmerov | d0ugal: do you know if anybody is checking https://bugs.launchpad.net/mistral/+bug/1794414 ? | 10:32 |
openstack | Launchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] | 10:32 |
d0ugal | rakhmerov: Not that I am aware of | 10:33 |
d0ugal | I hadn't seen that bug actually | 10:33 |
rakhmerov | ok | 10:33 |
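Bug 1794414 reports that task names containing "-" or "." are now rejected. The sketch below contrasts a strict name pattern with a permissive one that accepts those characters again. Both patterns are hypothetical; the actual validation rule in Mistral's spec code is not shown in this log.

```python
import re

# Hypothetical strict rule: letters, digits and underscore only.
STRICT_TASK_NAME = re.compile(r"[A-Za-z0-9_]+")
# Hypothetical permissive rule: additionally allows "-" and "." after the first character.
PERMISSIVE_TASK_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.\-]*")


def is_valid_strict(name: str) -> bool:
    return STRICT_TASK_NAME.fullmatch(name) is not None


def is_valid_permissive(name: str) -> bool:
    return PERMISSIVE_TASK_NAME.fullmatch(name) is not None


for name in ["task_3", "task-1", "step.2"]:
    print(name, is_valid_strict(name), is_valid_permissive(name))
```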
rakhmerov | d0ugal: so will you change your patch then? | 10:33 |
rakhmerov | https://review.openstack.org/#/c/608906/ | 10:34 |
d0ugal | rakhmerov: If we are sure it will fix the issue, otherwise I would prefer to make it non-voting and re-enable it once we know it is stable | 10:35 |
rakhmerov | let's check :) | 10:35 |
rakhmerov | we can even create a different patch | 10:35 |
d0ugal | rakhmerov: doing it | 10:35 |
rakhmerov | thanks | 10:35 |
mcdoker18 | rakhmerov: I never changed the api/executions.py :) | 10:36 |
rakhmerov | aah.. ok | 10:36 |
rakhmerov | can you still check that? | 10:36 |
openstackgerrit | Dougal Matthews proposed openstack/mistral master: Change the rally job to use a lighter workflow https://review.openstack.org/608910 | 10:37 |
mcdoker18 | https://bugs.launchpad.net/mistral/+bug/1794414 But this one is likely my fault | 10:38 |
openstack | Launchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich) | 10:38 |
apetrich | rakhmerov, is it all right if I take that one? | 10:38 |
rakhmerov | apetrich: absolutely | 10:43 |
rakhmerov | I'm just looking for help | 10:43 |
rakhmerov | please take it | 10:43 |
apetrich | cheers | 10:43 |
rakhmerov | mcdoker18: yeah, then maybe you can take https://bugs.launchpad.net/mistral/+bug/1794414 | 10:44 |
openstack | Launchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich) | 10:44 |
rakhmerov | apetrich, mcdoker18: just please assign yourself to the bugs | 10:44 |
d0ugal | rakhmerov: I think apetrich wants https://bugs.launchpad.net/mistral/+bug/1794414 | 10:46 |
openstack | Launchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich) | 10:46 |
d0ugal | rakhmerov: or maybe you meant to share a different bug to mcdoker18? | 10:47 |
rakhmerov | ooh, ok | 10:48 |
rakhmerov | yes | 10:48 |
mcdoker18 | rakhmerov: https://review.openstack.org/#/c/607807/ what if the engine dies between the two transactions? | 10:57 |
mcdoker18 | the task would freeze in RUNNING state | 10:58 |
rakhmerov | mcdoker18: we have such issues now anyway. Plus I'm going to add some protection against it a little later | 10:58 |
rakhmerov | mcdoker18: and the chance is practically tiny | 10:59 |
rakhmerov | e.g. same problem exists with action_queue.process but we've never seen it happen in production | 10:59 |
rakhmerov | simply because of the probability | 10:59 |
rakhmerov | but like I said, I'm planning to add protection | 10:59 |
rakhmerov | I just want to provide a more general solution rather than just covering this particular case | 11:00 |
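The failure window mcdoker18 describes can be reproduced in miniature: if two related state changes are committed in separate transactions, a crash between them leaves the first committed and the second missing, so the task stays RUNNING. This sqlite3 sketch shows that window and how a single transaction closes it. It is illustrative only; the `actions`/`tasks` tables are hypothetical, a raised `Crash` stands in for the process dying, and it does not reflect what review 607807 actually changes.

```python
import sqlite3


class Crash(Exception):
    """Simulates the engine process dying mid-operation."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, state TEXT)")
    conn.execute("CREATE TABLE actions (id INTEGER PRIMARY KEY, task_id INTEGER, state TEXT)")
    conn.execute("INSERT INTO tasks VALUES (1, 'RUNNING')")
    conn.execute("INSERT INTO actions VALUES (1, 1, 'RUNNING')")
    conn.commit()
    return conn


def complete_in_two_transactions(conn: sqlite3.Connection, crash_between: bool) -> None:
    with conn:  # transaction 1: committed on exit
        conn.execute("UPDATE actions SET state = 'SUCCESS' WHERE id = 1")
    if crash_between:
        raise Crash()
    with conn:  # transaction 2: never reached after a crash
        conn.execute("UPDATE tasks SET state = 'SUCCESS' WHERE id = 1")


def complete_in_one_transaction(conn: sqlite3.Connection, crash_between: bool) -> None:
    with conn:  # both updates commit together or roll back together
        conn.execute("UPDATE actions SET state = 'SUCCESS' WHERE id = 1")
        if crash_between:
            raise Crash()
        conn.execute("UPDATE tasks SET state = 'SUCCESS' WHERE id = 1")


def states(conn: sqlite3.Connection):
    return conn.execute(
        "SELECT a.state, t.state FROM actions a JOIN tasks t ON a.task_id = t.id"
    ).fetchone()
```

After a simulated crash, the two-transaction version leaves `("SUCCESS", "RUNNING")` (the stuck task), while the one-transaction version rolls back to `("RUNNING", "RUNNING")`, which can be retried safely.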
rakhmerov | btw, this patch isn't fully ready, I'd like to improve it. And I also postponed it for now because I need to fix join which is a much more serious problem | 11:01 |
mcdoker18 | Do you plan to get rid of the polling delayed calls in the case of join tasks? | 11:04 |
rakhmerov | mcdoker18: yes | 11:07 |
rakhmerov | completely | 11:07 |
vgvoleg | dougal, rakhmerov, could you please give your opinion on this review? https://review.openstack.org/607703/ | 11:08 |
rakhmerov | mcdoker18: we found a case at one of our customers where the current approach barely works at all | 11:08 |
mcdoker18 | async action?) | 11:08 |
rakhmerov | no, just a very large join | 11:09 |
rakhmerov | with lots of dependencies | 11:10 |
rakhmerov | and that leads to a situation where the scheduler is 99% busy with checking joins | 11:10 |
rakhmerov | and there are a lot of joins in the workflow running simultaneously | 11:10 |
rakhmerov | and since the scheduler is 99% busy with that, the workflow is barely progressing | 11:11 |
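The contrast between polling joins and reacting to completions can be sketched as follows: instead of a scheduler job repeatedly re-checking every waiting join, each upstream completion notifies the join directly and the join fires once its pending set is empty. `JoinTracker` is a hypothetical, single-process illustration of that idea, not the design rakhmerov is implementing; it ignores persistence and concurrent updates.

```python
from typing import Callable, Iterable


class JoinTracker:
    """Hypothetical event-driven join: fires once, when all upstream tasks have completed."""

    def __init__(self, join_name: str, upstream: Iterable[str],
                 on_ready: Callable[[str], None]):
        self.join_name = join_name
        self._pending = set(upstream)
        self._on_ready = on_ready
        self._fired = False
        self._maybe_fire()  # a join with no upstream tasks is ready immediately

    @property
    def pending(self):
        return frozenset(self._pending)

    def upstream_completed(self, task_name: str) -> None:
        # O(1) work per completion, instead of re-scanning every join on each poll.
        self._pending.discard(task_name)
        self._maybe_fire()

    def _maybe_fire(self) -> None:
        if not self._pending and not self._fired:
            self._fired = True
            self._on_ready(self.join_name)


fired = []
join = JoinTracker("join_task", ["a", "b", "c"], fired.append)
for name in ["a", "b", "c", "c"]:  # the duplicate "c" is ignored
    join.upstream_completed(name)
print(fired)
```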
rakhmerov | vgvoleg: yes, sorry that I still haven't done it. I'll look at it first thing tomorrow. About to leave now | 11:12 |
vgvoleg | ty :) | 11:13 |
rakhmerov | vgvoleg: btw, I left a comment in https://review.openstack.org/#/c/603208/ | 11:16 |
rakhmerov | I think we can improve the patch a little bit | 11:17 |
*** thrash|g0ne is now known as thrash | 11:18 | |
*** jrist has joined #openstack-mistral | 11:29 | |
*** jrist has quit IRC | 11:36 | |
openstackgerrit | Hardik Jasani proposed openstack/mistral master: Fix next link in get resource list rest API https://review.openstack.org/608677 | 11:44 |
*** toure|biab is now known as toure | 12:33 | |
*** apetrich has quit IRC | 12:35 | |
*** apetrich has joined #openstack-mistral | 12:42 | |
*** bobh has joined #openstack-mistral | 13:13 | |
*** hardikjasani has quit IRC | 13:16 | |
*** akovi has quit IRC | 13:28 | |
*** bobh has quit IRC | 14:49 | |
*** bobh has joined #openstack-mistral | 14:56 | |
*** bobh has quit IRC | 15:01 | |
*** bobh has joined #openstack-mistral | 15:11 | |
*** gkadam has quit IRC | 16:25 | |
openstackgerrit | Bob Haddleton proposed openstack/mistral master: Update version.version_string to actually be a string https://review.openstack.org/609050 | 16:27 |
openstackgerrit | Bob Haddleton proposed openstack/mistral master: Update version.version_string to actually be a string https://review.openstack.org/609050 | 16:36 |
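A common way for a `version_string` attribute to end up not being a string is binding the method object instead of calling it. pbr's `VersionInfo.version_string()` is a method, so `version_info.version_string` without parentheses is a bound method. The sketch below assumes that is the pattern the patch fixes and uses a stand-in class (so pbr is not required) with a placeholder version number.

```python
class VersionInfo:
    """Stand-in for pbr.version.VersionInfo; version_string() is a method there too."""

    def __init__(self, package: str, version: str):
        self.package = package
        self._version = version

    def version_string(self) -> str:
        return self._version


version_info = VersionInfo("mistral", "1.2.3")  # placeholder version

# Bug pattern: this is a bound method, not a string.
version_string_buggy = version_info.version_string
# Fix pattern: call the method to get the actual string.
version_string = version_info.version_string()

print(type(version_string_buggy).__name__, repr(version_string))
```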
*** toure is now known as toure|biab | 16:58 | |
*** apetrich has quit IRC | 17:22 | |
*** toure|biab is now known as toure | 17:41 | |
*** shardy has quit IRC | 17:41 | |
*** gkadam has joined #openstack-mistral | 18:01 | |
*** gkadam has quit IRC | 19:19 | |
openstackgerrit | 98k proposed openstack/mistral-extra master: Don't quote {posargs} in tox.ini https://review.openstack.org/609136 | 21:06 |
openstackgerrit | 98k proposed openstack/mistral-lib master: Don't quote {posargs} in tox.ini https://review.openstack.org/609137 | 21:07 |
*** bobh has quit IRC | 21:08 | |
*** jrist has joined #openstack-mistral | 21:14 | |
*** jtomasek has quit IRC | 21:29 | |
*** bobh has joined #openstack-mistral | 21:41 | |
*** bobh has quit IRC | 21:45 | |
*** bobh has joined #openstack-mistral | 21:54 | |
*** bobh has quit IRC | 22:09 |