*** bobh has joined #openstack-mistral | 00:23 | |
*** bobh has quit IRC | 00:23 | |
*** bobh has joined #openstack-mistral | 00:27 | |
*** gyee has quit IRC | 00:56 | |
*** bobh has quit IRC | 00:56 | |
*** bobh has joined #openstack-mistral | 01:13 | |
*** bobh has quit IRC | 01:44 | |
*** bobh has joined #openstack-mistral | 01:46 | |
openstackgerrit | Winson Chan proposed openstack/mistral: Abstract authentication function https://review.openstack.org/371233 | 01:47 |
openstackgerrit | Winson Chan proposed openstack/python-mistralclient: Abstract authentication function https://review.openstack.org/371234 | 01:48 |
*** bobh has quit IRC | 01:50 | |
*** bobh has joined #openstack-mistral | 01:53 | |
*** rrecio_ has quit IRC | 02:05 | |
*** bobh has quit IRC | 02:17 | |
*** catintheroof has joined #openstack-mistral | 02:27 | |
*** vishwanathj has quit IRC | 02:39 | |
*** janki has joined #openstack-mistral | 02:50 | |
*** gmann has quit IRC | 03:35 | |
*** gmann has joined #openstack-mistral | 03:37 | |
*** janki has quit IRC | 03:42 | |
*** sharatss has joined #openstack-mistral | 03:46 | |
*** hparekh has joined #openstack-mistral | 04:24 | |
*** harlowja has quit IRC | 04:45 | |
*** jaosorior has joined #openstack-mistral | 04:57 | |
*** vishwanathj has joined #openstack-mistral | 04:59 | |
*** vishwanathj has quit IRC | 05:00 | |
openstackgerrit | Lucky samadhiya proposed openstack/mistral: delete python bytecode including pyo before every test run https://review.openstack.org/371286 | 06:16 |
*** Ravikiran_K has joined #openstack-mistral | 06:34 | |
*** stevebaker has quit IRC | 07:20 | |
*** stevebaker has joined #openstack-mistral | 07:23 | |
openstackgerrit | Lucky samadhiya proposed openstack/python-mistralclient: delete python bytecode including pyo before every test run https://review.openstack.org/371305 | 07:29 |
*** kong has quit IRC | 07:41 | |
openstackgerrit | Merged openstack/python-mistralclient: Updated from global requirements https://review.openstack.org/371123 | 07:54 |
*** shardy has joined #openstack-mistral | 07:58 | |
d0ugal | ddeja, rakhmerov - Hey, so just a small update, we are not totally sure what happened but we seemed to be causing a race condition. We managed to resolve it in CI for now, I am going to try and re-create it now we know more next week and hopefully come up with a mistral bug that is actionable :) | 07:59 |
*** jpich has joined #openstack-mistral | 08:01 | |
*** openstackgerrit has quit IRC | 08:03 | |
*** openstackgerrit has joined #openstack-mistral | 08:04 | |
therve | d0ugal, Still no good idea what happened? | 08:17 |
d0ugal | therve: nah, we know - sort of | 08:18 |
d0ugal | therve: to the point that we were able to stop it happening :) | 08:19 |
d0ugal | therve: but we have not fixed it yet, just removed the commit that highlighted the issue | 08:19 |
therve | d0ugal, Didn't you just remove a bunch of tasks? | 08:19 |
d0ugal | therve: Yeah | 08:19 |
therve | So yeah no good idea :D | 08:20 |
d0ugal | therve: so, openstack undercloud install at the end starts off a workflow, but didn't wait for it to finish - it is fairly quick, but we think some action calls were started by following commands that caused a race condition | 08:20 |
d0ugal | therve: we have a theory, but I need to write some nasty shell scripts to try and reproduce it at some point | 08:20 |
therve | Ah, ok | 08:20 |
d0ugal | My personal theory is that we start a Mistral workflow, then we start some direct synchronous action calls while the workflow is still running | 08:21 |
d0ugal | and because there is only one vCPU on the multinode jobs, what happens then? | 08:21 |
d0ugal | (and one CPU means one worker in Mistral by default I think) | 08:22 |
therve | There are 4 vcpu AFAIK | 08:22 |
d0ugal | If that is correct, it shouldn't be that hard to try. I just need to re-create my undercloud with one vCPU | 08:22 |
therve | But still only one executor and one engine | 08:22 |
d0ugal | therve: oh right, I am sure slagle said the multinodes only have 1, but I may have misunderstood that bit | 08:23 |
d0ugal | (I fail to understand quite a few things about CI) | 08:23 |
therve | d0ugal, Still, that only matters for the API AFAICT | 08:23 |
d0ugal | therve: Yeah, that is the one thing that makes me doubt it :) | 08:23 |
d0ugal | I'll try and re-create it with various combinations of this anyway | 08:24 |
d0ugal | Actually, I am going to try now. This seems more fun than rebasing patches. | 08:25 |
*** brunograz_ has quit IRC | 08:26 | |
*** sharatss has quit IRC | 08:27 | |
*** jchhatbar has joined #openstack-mistral | 08:30 | |
*** nmakhotkin has joined #openstack-mistral | 08:30 | |
*** jchhatbar is now known as janki | 08:32 | |
*** shardy has quit IRC | 08:40 | |
therve | Ahah | 08:51 |
therve | d0ugal, Reproduced | 08:51 |
d0ugal | therve: oh, how? | 08:51 |
therve | d0ugal, Simply using run-action | 08:51 |
d0ugal | therve: when? | 08:51 |
therve | mistral run-action mistral.executions_create inputdata.json | 08:51 |
d0ugal | therve: Sure, I know the command. | 08:52 |
therve | The executor will talk back to the engine via the API, and that's when you have the cycle, so the timeout | 08:52 |
d0ugal | therve: I don't fully follow - can you share the full example? | 08:53 |
therve | Hum sure let me try to have the simplest example | 08:53 |
d0ugal | therve: oh, is this to do with 1 worker? | 08:53 |
d0ugal | I think maybe I do understand | 08:53 |
therve | Well I have 4 API workers, but it doesn't matter | 08:54 |
d0ugal | the API is waiting for the response from the engine on the API? | 08:54 |
d0ugal | ah | 08:54 |
d0ugal | Then I am missing the point :) | 08:54 |
therve | The engine is waiting for the executor, which is waiting for the API, which is waiting for the engine | 08:56 |
d0ugal | right | 08:56 |
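The cycle therve describes (engine waits for executor, executor waits for API, API waits for engine) can be illustrated with a small standard-library sketch. This is not Mistral or oslo.messaging code: the function names are hypothetical, and a thread pool with one worker stands in for an engine that handles one RPC message at a time (the assumed effect of the "blocking" executor). With one worker the nested request can never be served and the call times out; with two workers it completes.

```python
import concurrent.futures as cf


def run_demo(engine_workers, timeout=0.5):
    """Return 'ok' if the nested call completes, 'timeout' otherwise.

    All names here are hypothetical stand-ins, not Mistral APIs.
    """
    # The pool models the engine's RPC server; one worker means it
    # processes a single message at a time.
    engine = cf.ThreadPoolExecutor(max_workers=engine_workers)

    def engine_start_execution():
        # Inner RPC: work the engine does on behalf of the API.
        return "execution created"

    def api_create_execution():
        # The API forwards the request to the engine and waits for a reply.
        return engine.submit(engine_start_execution).result(timeout=timeout)

    def executor_run_action():
        # The action itself calls back into the API.
        return api_create_execution()

    def engine_run_action():
        # Outer RPC: the engine waits synchronously for the executor.
        return executor_run_action()

    try:
        engine.submit(engine_run_action).result()
        return "ok"
    except cf.TimeoutError:
        return "timeout"
    finally:
        engine.shutdown(wait=True)


if __name__ == "__main__":
    print("1 worker: ", run_demo(1))
    print("2 workers:", run_demo(2))
```

The timeout in the sketch plays the role of the MessagingTimeout seen in the real system: nothing is broken in any single component, the only free handler is simply the one doing the waiting.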
therve | d0ugal, http://paste.openstack.org/show/579281/ | 08:57 |
therve | Simplest reproducer | 08:57 |
d0ugal | therve: $ mistral run-action mistral.action_executions_create '{"name": "std.echo"}' | 09:02 |
d0ugal | :) | 09:02 |
therve | That should do the trick too :) | 09:03 |
therve | I should have added "Simplest that I found" :) | 09:03 |
d0ugal | hah, indeed. | 09:03 |
d0ugal | therve: Thanks for finding it. | 09:03 |
therve | Actually that one doesn't work :) | 09:03 |
d0ugal | It doesn't? | 09:04 |
d0ugal | http://paste.openstack.org/show/579295/ | 09:04 |
therve | Weird | 09:05 |
therve | d0ugal, Oh because I worked around the problem | 09:05 |
therve | It does trigger the timeout | 09:06 |
d0ugal | therve: How did you work around it? | 09:06 |
therve | d0ugal, I changed the oslo messaging executor to threading | 09:06 |
d0ugal | ah | 09:06 |
openstackgerrit | Nikolay Mahotkin proposed openstack/mistral: Updating mistralclient docs https://review.openstack.org/370821 | 09:08 |
d0ugal | therve: thanks for finding it, are you going to open a bug? | 09:10 |
therve | d0ugal, Sure, doing it now | 09:11 |
d0ugal | I've never liked our use of direct action calls, I'd like us to move away from them | 09:12 |
d0ugal | Maybe I can use this as another excuse to do so :) | 09:12 |
therve | Sure, doesn't mean it should blow up :) | 09:13 |
therve | d0ugal, https://bugs.launchpad.net/mistral/+bug/1624284 | 09:14 |
openstack | Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Undecided,New] | 09:14 |
therve | rakhmerov ^^^ | 09:14 |
d0ugal | therve: so reverting that mistral patch would have worked, if our CI wasn't broken :-D | 09:17 |
therve | d0ugal, Yep | 09:17 |
d0ugal | I guess the solution is to make it configurable? | 09:17 |
therve | I *doubt* it | 09:18 |
d0ugal | I'm not sure how hard it is to support multiple messaging executors | 09:18 |
d0ugal | ah, ok | 09:18 |
therve | The solution is to use oslo_messaging properly. Which to be fair is somewhat hard. | 09:18 |
d0ugal | :) | 09:19 |
*** shardy has joined #openstack-mistral | 09:26 | |
d0ugal | therve: so, here is the bit I still don't really understand... | 09:39 |
d0ugal | therve: How do we re-create it with the tripleo actions? | 09:39 |
d0ugal | It sort of makes sense to me that it would fail with a mistral one like that | 09:40 |
therve | d0ugal, That part I'm not sure. Possibly the environment interactions? | 09:41 |
d0ugal | therve: oh, good point. That in effect creates nested actions I guess. | 09:41 |
therve | d0ugal, I tried to track down usage of mistral itself in the actions, and it seems to only be about using environments | 09:42 |
d0ugal | Right, that makes sense | 09:43 |
d0ugal | I didn't think I'd understand it by the end of this week, so I'm glad that I sort-of do | 09:43 |
d0ugal | Thanks :) | 09:44 |
*** sharatss has joined #openstack-mistral | 09:45 | |
therve | You're welcome | 09:45 |
therve | That was an itch I had to scratch :) | 09:45 |
openstackgerrit | Dougal Matthews proposed openstack/mistral: Correct documentation about task attributes 'action' and 'workflow' https://review.openstack.org/371404 | 10:03 |
*** janki has quit IRC | 10:22 | |
*** kong has joined #openstack-mistral | 10:38 | |
d0ugal | therve: To change to threading, did you just change the executor string? | 10:49 |
therve | d0ugal, Yep | 10:49 |
d0ugal | therve: to "threading"? | 10:49 |
therve | Yep | 10:49 |
d0ugal | therve: cool, I am going to try it in CI with a depends on, just to link it all together and check that works. | 10:49 |
openstackgerrit | Dougal Matthews proposed openstack/mistral: [TESTING] Change the oslo messaging executor to threading https://review.openstack.org/371435 | 10:53 |
openstackgerrit | Hardik Parekh proposed openstack/python-mistralclient: Added pagination options for workflow and actions https://review.openstack.org/371439 | 10:57 |
openstackgerrit | Hardik Parekh proposed openstack/python-mistralclient: Added pagination options for workflow and actions https://review.openstack.org/371439 | 10:59 |
openstackgerrit | Dougal Matthews proposed openstack/mistral: Correct documentation about task attributes 'action' and 'workflow' https://review.openstack.org/371404 | 11:20 |
*** catintheroof has quit IRC | 11:39 | |
*** dprince has joined #openstack-mistral | 12:00 | |
*** bobh has joined #openstack-mistral | 12:03 | |
*** hparekh has quit IRC | 12:08 | |
*** catintheroof has joined #openstack-mistral | 12:27 | |
openstackgerrit | Merged openstack/mistral: Correct documentation about task attributes 'action' and 'workflow' https://review.openstack.org/371404 | 12:28 |
*** bobh has quit IRC | 12:29 | |
*** vishwanathj has joined #openstack-mistral | 12:55 | |
ddeja | therve, d0ugal Hi guys, I'm glad that you have found the issue | 13:00 |
ddeja | talking about changing blocking into threading - It was threading before, but rakhmerov changed it to blocking to make mistral work faster, if I remember correctly | 13:01 |
ddeja | I know how it sounds - how does making something blocking instead of threading make things faster? But this enables us to use a non-locking model in the DB | 13:02 |
ddeja | I would like to ask you, why do you call mistral from mistral? Because I'm not sure if the bug is really a bug, mistral calling itself doesn't sound like a good idea to me | 13:03 |
*** jaosorior has quit IRC | 13:03 | |
*** jaosorior has joined #openstack-mistral | 13:04 | |
*** bobh has joined #openstack-mistral | 13:06 | |
therve | ddeja, Because actions can do several things | 13:07 |
therve | That's a serious limitation. It's also one that I reproduced quickly, but I'm pretty sure I can find other problems with using blocking | 13:07 |
ddeja | therve: OK. Could you show a real example when mistral calling itself is used? Because I can't think of scenario when it is needed | 13:09 |
therve | ddeja, https://github.com/openstack/tripleo-common/blob/master/tripleo_common/actions/plan.py#L91 | 13:10 |
ddeja | why not put this into a workflow? and on success call another action? | 13:12 |
*** bobh has quit IRC | 13:12 | |
therve | Sure there are workarounds | 13:12 |
therve | But the behavior is still broken | 13:12 |
ddeja | Agree | 13:13 |
ddeja | Nevertheless I'm wondering if such usage of mistral is OK, I would like to see Renat's opinion | 13:13 |
therve | Using the blocking executor is trying to hide concurrency issues under the rug | 13:15 |
ddeja | therve: let me find the patch where it was changed | 13:16 |
therve | ddeja, https://review.openstack.org/#/c/356343/ | 13:17 |
ddeja | therve: oh, and there is an explanation in this patch | 13:20 |
therve | Some, yeah | 13:21 |
*** tonytan4ever has joined #openstack-mistral | 13:27 | |
*** Ravikiran_K has quit IRC | 13:36 | |
*** tonytan_brb has joined #openstack-mistral | 13:44 | |
*** brian_price has quit IRC | 13:46 | |
*** tonytan4ever has quit IRC | 13:46 | |
ddeja | therve: Oh, now I got it. You are calling the mistral environment-create from a mistral action, but it hangs on the API. It is a serious issue | 13:47 |
openstackgerrit | Merged openstack/mistral: Take os_actions_endpoint_type into use https://review.openstack.org/369253 | 13:48 |
ddeja | therve: but I've tested it now - it works fine (creating an environment while action is running) | 13:57 |
therve | ddeja, While the action is running, or inside the action itself? | 13:58 |
ddeja | therve: oh... | 13:58 |
ddeja | that matters, right? | 13:59 |
therve | Well yeah :) | 13:59 |
ddeja | but I can't think why? It just creates a mistralclient object and calls the API | 13:59 |
ddeja | it shouldn't matter | 14:00 |
ddeja | because I see obvious deadlock with calling mistral run-action from mistral action, but this should work... | 14:01 |
ddeja | let me try it on my devstack | 14:01 |
*** bobh has joined #openstack-mistral | 14:09 | |
d0ugal | ddeja: it is a bug in that we have been doing it for months and it just stopped working :( | 14:10 |
*** jaosorior has quit IRC | 14:10 | |
d0ugal | ddeja: we first started using this approach late last year actually | 14:10 |
*** nmakhotkin has quit IRC | 14:10 | |
d0ugal | ddeja: From what I remember, dprince and/or rbrady did chat with rakhmerov about our usage place (with environments) | 14:11 |
openstackgerrit | Sharat Sharma proposed openstack/mistral-lib: Small changes like deletion of extra underline in the docs https://review.openstack.org/371571 | 14:13 |
*** bobh has quit IRC | 14:13 | |
ddeja | d0ugal: well, but it seems it works fine | 14:17 |
ddeja | d0ugal, therve: http://paste.openstack.org/show/580024/ | 14:17 |
therve | ddeja, Not sure what I'm looking at | 14:18 |
therve | Ah you made a env create call | 14:18 |
ddeja | therve: yes. Inside the std.noop. But I've used a debugger | 14:19 |
ddeja | so I'm just pasting this line into the action body to be sure | 14:19 |
therve | ddeja, It should be in the executor, I think? | 14:19 |
therve | You seem to be in the engine here | 14:19 |
ddeja | it is executor | 14:19 |
*** tonytan_brb is now known as tonytan4ever | 14:20 | |
d0ugal | ddeja: FWIW, it works in one of our CI systems but not the other | 14:20 |
d0ugal | We don't know why that is. | 14:20 |
ddeja | well, calling env.create from action execution works fine... | 14:21 |
ddeja | at least on devstack with ubuntu | 14:21 |
d0ugal | ddeja: Yeah, and this is why it's been so hard to track down because it seems to work in some places fine | 14:22 |
ddeja | it is failing on CentOS, right? | 14:22 |
therve | FWIW env create was just a hypothesis | 14:22 |
therve | In theory env create just hits the database in the API, so it shouldn't raise an issue | 14:22 |
ddeja | therve: env create should be OK. I'm thinking if there may be a place, where mistral.run-action is called from action | 14:23 |
ddeja | because that will block things | 14:23 |
ddeja | d0ugal, therve is it failing on CentOS only? | 14:25 |
therve | It's unlikely to be the issue | 14:25 |
d0ugal | I agree with therve, but as far as I know that is the only place it has been seen. | 14:25 |
ddeja | I'm not sure - we have recently changed how the API service is being run. Maybe the default config on CentOS differs from the one on Ubuntu | 14:26 |
therve | Feel free to dissect tripleo logs to find the right cause | 14:30 |
therve | Or just use the simple example I gave, which trivially reproduces a similar issue | 14:30 |
d0ugal | ddeja: Can you point me to that change? | 14:31 |
ddeja | d0ugal: well, it doesn't matter. It uses python, so it is the same regardless of the OS | 14:32 |
d0ugal | True, in theory :) | 14:32 |
ddeja | I've setup number of workers to 1, but still can't reproduce it :/ | 14:34 |
ddeja | d0ugal: do you have a Ceno | 14:39 |
ddeja | CentOS with mistral? | 14:39 |
d0ugal | ddeja: Yup! | 14:39 |
d0ugal | ddeja: I won't have much time to look at it again today, but I will be focusing on this on Monday | 14:39 |
ddeja | can you run 'sudo ps aux | grep mistral' and tell how many api services are being run? | 14:40 |
ddeja | just this one cmd, I'm curious if it differs from ubuntu ;) | 14:40 |
*** kong has quit IRC | 14:51 | |
*** rrecio_ has joined #openstack-mistral | 14:57 | |
*** bobh has joined #openstack-mistral | 15:10 | |
*** bobh has quit IRC | 15:14 | |
ddeja | d0ugal, therve, guys, I found the root cause | 15:47 |
ddeja | or maybe not... | 15:48 |
*** tonytan4ever has quit IRC | 16:04 | |
*** bobh has joined #openstack-mistral | 16:10 | |
*** bobh has quit IRC | 16:12 | |
*** bobh has joined #openstack-mistral | 16:12 | |
*** sharatss has quit IRC | 16:15 | |
*** dprince has quit IRC | 16:24 | |
*** jpich has quit IRC | 16:33 | |
*** bobh_ has joined #openstack-mistral | 16:34 | |
*** bobh has quit IRC | 16:37 | |
*** bobh_ has quit IRC | 16:42 | |
*** tonytan4ever has joined #openstack-mistral | 17:04 | |
*** tonytan4ever has quit IRC | 17:10 | |
*** tonytan4ever has joined #openstack-mistral | 17:29 | |
ddeja | d0ugal, therve, rakhmerov: Ok guys, now I really found the root cause (after really carefully reading what's in the failing tripleo gate logs). I left a note under the bug report https://bugs.launchpad.net/mistral/+bug/1624284 | 17:37 |
openstack | Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Undecided,Confirmed] - Assigned to Dawid Deja (dawid-deja-0) | 17:37 |
*** bobh has joined #openstack-mistral | 17:43 | |
*** bobh has quit IRC | 17:47 | |
*** bobh has joined #openstack-mistral | 17:50 | |
*** harlowja has joined #openstack-mistral | 18:13 | |
*** bobh has quit IRC | 18:27 | |
*** chlong_ has quit IRC | 18:27 | |
rbrady | ddeja: thanks for the update! | 18:32 |
*** brian_price has joined #openstack-mistral | 18:39 | |
*** bobh has joined #openstack-mistral | 19:28 | |
*** bobh has quit IRC | 19:37 | |
*** dprince has joined #openstack-mistral | 19:37 | |
*** dprince has quit IRC | 19:39 | |
therve | ddeja, Sounds about right. That's mostly the same that I was doing though :) | 19:57 |
therve | The interaction between several calls is more worrying | 20:00 |
*** bobh has joined #openstack-mistral | 20:18 | |
*** bobh has quit IRC | 20:39 | |
*** vishwanathj has quit IRC | 21:22 | |
*** bobh has joined #openstack-mistral | 22:03 | |
*** bobh has quit IRC | 22:15 | |
*** catintheroof has quit IRC | 22:31 | |
*** brian_price has quit IRC | 23:47 | |
*** bobh has joined #openstack-mistral | 23:52 |