Tuesday, 2018-10-09

*** d0ugal has quit IRC01:48
*** d0ugal has joined #openstack-mistral01:51
*** bobh has joined #openstack-mistral02:14
*** apetrich has quit IRC02:15
*** nguyenhai_ has quit IRC02:18
openstackgerritNguyen Van Trung proposed openstack/mistral master: Don't quote {posargs} in tox.ini  https://review.openstack.org/60882703:22
*** bobh has quit IRC03:23
*** hardikjasani has joined #openstack-mistral04:06
*** akovi has joined #openstack-mistral04:20
*** jaosorior has joined #openstack-mistral04:28
*** apetrich has joined #openstack-mistral05:32
*** jtomasek has joined #openstack-mistral07:08
*** pgaxatte has joined #openstack-mistral07:20
*** mcdoker18 has joined #openstack-mistral07:27
*** gkadam has joined #openstack-mistral07:57
d0ugal#endmeeting08:10
*** openstack changes topic to "Mistral the Workflow Service for OpenStack. https://docs.openstack.org/mistral/latest/"08:10
openstackMeeting ended Tue Oct  9 08:10:42 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)08:10
d0ugaloops :)08:10
openstackMinutes:        http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.html08:10
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.txt08:10
openstackLog:            http://eavesdrop.openstack.org/meetings/mistral/2018/mistral.2018-10-08-15.00.log.html08:10
d0ugalI'm getting more forgetful. Maybe logging office hours these days isn't very useful anyway08:11
*** shardy has joined #openstack-mistral08:40
mcdoker18Hi, maybe we should disable the mistral-rally-task job? It fails too often without a reason08:47
openstackgerritVitalii Solodilov proposed openstack/mistral master: Fix state change propagation in workflows  https://review.openstack.org/60796009:59
d0ugalmcdoker18: Yeah, let's make it non-voting until somebody can make it more stable10:16
d0ugalI'll do that.10:16
openstackgerritDougal Matthews proposed openstack/mistral master: Make the rally job non-voting  https://review.openstack.org/60890610:18
d0ugalmcdoker18, rakhmerov, apetrich: ^10:18
rakhmerovd0ugal: I investigated that10:19
rakhmerovok to make it non-voting for now10:19
rakhmerovmcdoker18: there's a reason10:19
d0ugal:)10:19
d0ugalWhat is the reason?10:20
rakhmerovMysql10:20
d0ugallol10:20
rakhmerovthat gives tons of deadlocks10:20
d0ugalright10:20
rakhmerovthey've always been there but somehow they started occurring more often10:21
rakhmerovthat is something that I don't understand yet10:21
d0ugalIs that also why the mysql unit tests have been failing more recently?10:21
rakhmerovwe can also switch this test to a lighter workflow10:21
rakhmerovd0ugal: yes10:21
rakhmerovso that scenario now uses join_500 workflow10:21
rakhmerovwe can switch it to join_10010:22
rakhmerovthat also exists in the rally scenarios10:22
d0ugalRight10:22
rakhmerovI think that's better than making it non-voting, maybe10:22
d0ugalrakhmerov: did you also see that the rally jobs caused OOM on CI infra?10:22
rakhmerovthe chances of failure will be much lower10:22
d0ugalhttps://bugs.launchpad.net/mistral/+bug/179506810:22
openstackLaunchpad bug 1795068 in Mistral "screen-mistral-engine.txt size is causing logstash index OOM" [High,Confirmed]10:22
rakhmerovyes, I have10:22
rakhmerovsame reason10:23
rakhmerovI downloaded the log (~1 GB) and it's full of these deadlock exceptions10:23
rakhmerovso I'd suggest we switch to join_100 for now10:23
rakhmerovwhat do you think?10:23
d0ugalSure, if you think that will reduce the deadlocks?10:24
rakhmerovyes10:24
rakhmerovdefinitely it will10:24
rakhmerovthe thing is that we started using pymysql driver everywhere10:24
rakhmerov+ eventlet RPC executor10:24
d0ugalso maybe that is the issue10:24
rakhmerovwhich means that within one process we can now have many parallel transactions10:25
d0ugalRight10:25
rakhmerovwell, eventually yes10:25
d0ugalDo you use that configuration in production?10:25
rakhmerovso more parallel transactions => more deadlocks10:25
rakhmerovyep10:25
d0ugalMakes sense10:25
rakhmerovand also have issues with that )10:25
d0ugalThat was my next question :)10:25
rakhmerovI'm actually currently working on improving join10:25
rakhmerovso I might fix that as well soon10:25
mcdoker18I didn't see such problems using postgresql (:10:26
d0ugalI guess we don't often have enough parallelisation to see that issue. I'm not aware of any reports.10:26
d0ugal<3 Postgres10:26
rakhmerovmcdoker18: ooh, yes. Postgres behaves much better )10:26
rakhmerovd0ugal: yes, this problem becomes serious exactly in case of join10:27
d0ugalRight10:27
rakhmerov500 potential parallel transactions10:27
d0ugallol10:27
rakhmerovpractically less, of course, because not all of them turn out to be parallel10:27
d0ugalSure10:27
rakhmerovbut still a big number under 50010:27
rakhmerovso let's use a different WF for now10:28
rakhmerovI'm also thinking about limiting the number of parallel transactions/threads in the configuration somehow10:28
mcdoker18Can we restrict the number of eventlet threads that handle messages?10:28
rakhmerovthat may be even possible already, e.g. via sqlalchemy10:28
rakhmerovmcdoker18: yep, thinking about it10:28
rakhmerovI'll look at what we can do10:29
mcdoker18[database] max_overflow10:29
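The idea discussed above, capping how many transactions run in parallel inside one process, can be sketched with a bounded semaphore. This is only an illustration under stated assumptions: `TransactionLimiter`, `fake_transaction`, and `run_join_tasks` are hypothetical names, the sketch uses plain OS threads rather than eventlet green threads, and it is not how Mistral or oslo.db implement any limit. The `[database] max_overflow` option mentioned above is a connection pool setting; whether tuning it reduces the deadlocks seen here is not established in this log.

```python
import threading
import time


class TransactionLimiter:
    """Caps how many callers may run a 'transaction' at the same time.

    Hypothetical helper for illustration; not part of Mistral.
    """

    def __init__(self, max_parallel):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed, kept for demonstration

    def run(self, fn, *args):
        # Blocks while max_parallel callers are already inside.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_transaction(task_id):
    # Stand-in for a real database transaction (assumption).
    time.sleep(0.01)
    return task_id


def run_join_tasks(limiter, n_tasks):
    """Start n_tasks workers at once; the limiter throttles them."""
    results = []
    results_lock = threading.Lock()

    def worker(i):
        value = limiter.run(fake_transaction, i)
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

With a limit of 5, fifty workers still all complete, but no more than five of the stand-in transactions overlap at any moment, which is the property the conversation is after for large joins.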
mcdoker18If you have a couple of minutes, please leave comments https://review.openstack.org/#/c/607703/10:30
rakhmerovyes, ok10:30
rakhmerovmcdoker18: Vitalii, can you please look at https://bugs.launchpad.net/mistral/+bug/1793651 ?10:31
openstackLaunchpad bug 1793651 in Mistral "Backwards compatibility issue: when starting a workflow "params" can't be null now" [High,Confirmed]10:31
rakhmerovI have a feeling that it's related to something that you did not so long ago10:31
rakhmerovmaybe I'm wrong but still asking you to check if you can find time this week10:31
rakhmerovd0ugal: do you know if anybody is checking https://bugs.launchpad.net/mistral/+bug/1794414 ?10:32
openstackLaunchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed]10:32
d0ugalrakhmerov: Not that I am aware of10:33
d0ugalI hadn't seen that bug actually10:33
rakhmerovok10:33
rakhmerovd0ugal: so will you change your patch then?10:33
rakhmerovhttps://review.openstack.org/#/c/608906/10:34
d0ugalrakhmerov: If we are sure it will fix the issue, otherwise I would prefer to make it non-voting and reenable it once we know it is stable10:35
rakhmerovlet's check :)10:35
rakhmerovwe can even create a different patch10:35
d0ugalrakhmerov: doing it10:35
rakhmerovthanks10:35
mcdoker18rakhmerov: I never changed the api/executions.py :)10:36
rakhmerovaah.. ok10:36
rakhmerovcan you still check that?10:36
openstackgerritDougal Matthews proposed openstack/mistral master: Change the rally job to use a lighter workflow  https://review.openstack.org/60891010:37
mcdoker18https://bugs.launchpad.net/mistral/+bug/1794414 But it's my fault likely10:38
openstackLaunchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich)10:38
apetrichrakhmerov, if it's all right for me to take that one10:38
rakhmerovapetrich: absolutely10:43
rakhmerovI'm just looking for help10:43
rakhmerovplease take it10:43
apetrichcheers10:43
rakhmerovmcdoker18: yeah, then maybe you can take https://bugs.launchpad.net/mistral/+bug/179441410:44
openstackLaunchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich)10:44
rakhmerovapetrich, mcdoker18: just please assign yourself to the bugs10:44
d0ugalrakhmerov: I think apetrich wants https://bugs.launchpad.net/mistral/+bug/179441410:46
openstackLaunchpad bug 1794414 in Mistral "Backward compatibility issue: "-" and "." are not allowed now in task names" [High,Confirmed] - Assigned to Adriano Petrich (apetrich)10:46
d0ugalrakhmerov: or maybe you meant to share a different bug to mcdoker18?10:47
rakhmerovooh, ok10:48
rakhmerovyes10:48
mcdoker18rakhmerov: https://review.openstack.org/#/c/607807/ what if the engine dies between the two transactions?10:57
mcdoker18the task would freeze in the RUNNING state10:58
rakhmerovmcdoker18: we have such issues now anyway. Plus I'm going to make some protection against it a little later10:58
rakhmerovmcdoker18: and the chance is practically tiny10:59
rakhmerove.g. same problem exists with action_queue.process but we've never seen it happen in production10:59
rakhmerovsimply because of the probability10:59
rakhmerovbut like I said, I'm planning to make a protection10:59
rakhmerovI just want to provide a more comprehensive solution rather than just covering this particular case11:00
rakhmerovbtw, this patch isn't fully ready, I'd like to improve it. And I also postponed it for now because I need to fix join which is a much more serious problem11:01
mcdoker18Do you plan to get rid of the polling delayed calls in the case of join tasks?11:04
rakhmerovmcdoker18: yes11:07
rakhmerovcompletely11:07
vgvolegdougal, rakhmerov, could you please share your opinion about this review? https://review.openstack.org/607703/11:08
rakhmerovmcdoker18: we found a case at one of our customers where the current approach pretty much doesn't work at all11:08
mcdoker18async action?)11:08
rakhmerovno, just a very large join11:09
rakhmerovwith lots of dependencies11:10
rakhmerovand that leads to a situation where the scheduler is 99% busy with checking joins11:10
rakhmerovand there are a lot of joins in the workflow running simultaneously11:10
rakhmerovand since the scheduler is 99% busy with that, the workflow is barely progressing11:11
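One generic alternative to polling joins from a scheduler is to update each join only when one of its upstream tasks completes. The sketch below illustrates that general technique; it is not Mistral's implementation, nor the fix rakhmerov is planning, which this log does not describe. `JoinTracker` and the task names are hypothetical.

```python
class JoinTracker:
    """Tracks which upstream tasks each join is still waiting for.

    Hypothetical helper for illustration; not part of Mistral.
    """

    def __init__(self, dependencies):
        # dependencies: {join_name: iterable of upstream task names}
        self._pending = {join: set(deps) for join, deps in dependencies.items()}
        # Reverse index: upstream task name -> joins waiting on it.
        self._waiters = {}
        for join, deps in self._pending.items():
            for dep in deps:
                self._waiters.setdefault(dep, []).append(join)

    def task_completed(self, task_name):
        """Record a completion and return the joins that just became ready."""
        ready = []
        for join in self._waiters.pop(task_name, []):
            pending = self._pending[join]
            pending.discard(task_name)
            if not pending:
                ready.append(join)
        return ready


tracker = JoinTracker({
    "join_all": ["t1", "t2", "t3"],
    "join_pair": ["t1", "t2"],
})
first = tracker.task_completed("t1")   # nothing is ready yet
second = tracker.task_completed("t2")  # join_pair has all its inputs
third = tracker.task_completed("t3")   # join_all has all its inputs
```

Each completion touches only the joins that depend on that task, so the work is proportional to the number of dependency edges rather than to how often a scheduler re-checks every waiting join.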
rakhmerovvgvoleg: yes, sorry that I still haven't done it. I'll look at it first thing tomorrow. About to leave now11:12
vgvolegty :)11:13
rakhmerovvgvoleg: btw, I left a comment in https://review.openstack.org/#/c/603208/11:16
rakhmerovI think we can improve the patch a little bit11:17
*** thrash|g0ne is now known as thrash11:18
*** jrist has joined #openstack-mistral11:29
*** jrist has quit IRC11:36
openstackgerritHardik Jasani proposed openstack/mistral master: Fix next link in get resource list rest API  https://review.openstack.org/60867711:44
*** toure|biab is now known as toure12:33
*** apetrich has quit IRC12:35
*** apetrich has joined #openstack-mistral12:42
*** bobh has joined #openstack-mistral13:13
*** hardikjasani has quit IRC13:16
*** akovi has quit IRC13:28
*** bobh has quit IRC14:49
*** bobh has joined #openstack-mistral14:56
*** bobh has quit IRC15:01
*** bobh has joined #openstack-mistral15:11
*** gkadam has quit IRC16:25
openstackgerritBob Haddleton proposed openstack/mistral master: Update version.version_string to actually be a string  https://review.openstack.org/60905016:27
openstackgerritBob Haddleton proposed openstack/mistral master: Update version.version_string to actually be a string  https://review.openstack.org/60905016:36
*** toure is now known as toure|biab16:58
*** apetrich has quit IRC17:22
*** toure|biab is now known as toure17:41
*** shardy has quit IRC17:41
*** gkadam has joined #openstack-mistral18:01
*** gkadam has quit IRC19:19
openstackgerrit98k proposed openstack/mistral-extra master: Don't quote {posargs} in tox.ini  https://review.openstack.org/60913621:06
openstackgerrit98k proposed openstack/mistral-lib master: Don't quote {posargs} in tox.ini  https://review.openstack.org/60913721:07
*** bobh has quit IRC21:08
*** jrist has joined #openstack-mistral21:14
*** jtomasek has quit IRC21:29
*** bobh has joined #openstack-mistral21:41
*** bobh has quit IRC21:45
*** bobh has joined #openstack-mistral21:54
*** bobh has quit IRC22:09
