*** rcernin_ has joined #openstack-sahara | 03:00 | |
*** rcernin has quit IRC | 03:03 | |
*** rcernin_ has quit IRC | 03:26 | |
*** rcernin has joined #openstack-sahara | 03:26 | |
*** rcernin has quit IRC | 07:22 | |
*** tosky has joined #openstack-sahara | 07:52 | |
*** tesseract has joined #openstack-sahara | 08:11 | |
*** tosky has quit IRC | 08:32 | |
*** tosky has joined #openstack-sahara | 08:37 | |
*** irclogbot_2 has quit IRC | 09:39 | |
*** irclogbot_0 has joined #openstack-sahara | 09:41 | |
*** tosky has quit IRC | 09:41 | |
*** tosky has joined #openstack-sahara | 09:42 | |
*** rcernin has joined #openstack-sahara | 09:45 | |
*** tosky_ has joined #openstack-sahara | 09:52 | |
*** tosky has quit IRC | 09:54 | |
*** tosky has joined #openstack-sahara | 09:58 | |
*** tosky_ has quit IRC | 10:01 | |
*** tosky_ has joined #openstack-sahara | 10:05 | |
*** tosky has quit IRC | 10:08 | |
*** rcernin has quit IRC | 10:11 | |
*** tosky has joined #openstack-sahara | 10:14 | |
*** tosky_ has quit IRC | 10:17 | |
*** tosky_ has joined #openstack-sahara | 10:21 | |
*** tosky has quit IRC | 10:24 | |
*** tosky has joined #openstack-sahara | 10:34 | |
*** tosky_ has quit IRC | 10:36 | |
*** tosky_ has joined #openstack-sahara | 10:43 | |
*** tosky has quit IRC | 10:45 | |
*** tosky_ has quit IRC | 10:46 | |
*** tosky has joined #openstack-sahara | 10:46 | |
*** rcernin has joined #openstack-sahara | 10:50 | |
*** tosky has quit IRC | 11:01 | |
*** tosky has joined #openstack-sahara | 11:01 | |
*** tosky_ has joined #openstack-sahara | 11:05 | |
*** tosky has quit IRC | 11:06 | |
*** tosky_ has quit IRC | 11:09 | |
*** tosky has joined #openstack-sahara | 11:09 | |
*** tosky has quit IRC | 11:17 | |
*** tosky has joined #openstack-sahara | 11:23 | |
*** rcernin has quit IRC | 11:26 | |
*** tosky_ has joined #openstack-sahara | 11:47 | |
*** tosky has quit IRC | 11:49 | |
*** tosky has joined #openstack-sahara | 11:50 | |
*** tosky_ has quit IRC | 11:53 | |
*** dave-mccowan has joined #openstack-sahara | 12:20 | |
*** openstackgerrit has quit IRC | 12:41 | |
*** sapd1 has joined #openstack-sahara | 14:32 | |
*** tosky_ has joined #openstack-sahara | 15:00 | |
sapd1 | Hi everyone, I'm trying to create a job with a data source from S3. I'm using the Spark plugin (2.3) and MinIO as an S3-compatible store. | 15:00 |
*** tosky has quit IRC | 15:00 | |
sapd1 | I put the job binaries and the data source on S3. I checked the sahara-engine log and it could load the job binary from S3, but the job failed. I think the problem is related to the data source. | 15:01 |
sapd1 | I'm using this example too: https://opendev.org/openstack/sahara-tests/src/branch/master/sahara_tests/scenario/defaults/edp-examples/edp-spark | 15:01 |
*** tosky_ is now known as tosky | 15:01 | |
tosky | I tested the S3 example some time ago (but the code hasn't changed) | 15:02 |
tosky | but I tested mostly with the real S3 | 15:02 |
sapd1 | I got stdout log from spark job: http://paste.openstack.org/show/785976/ | 15:13 |
jeremy__bouncer | i don't see much in that log - do you have stderr? | 15:15 |
tosky | unfortunately I don't know that S3 provider; maybe there are some compatibility quirks | 15:15 |
sapd1 | tosky, how can I set the 'edp.java.main_class' variable on the job execute command line? | 15:18 |
sapd1 | jeremy__bouncer, There is nothing in stderr | 15:19 |
jeremy__bouncer | i believe it's openstack dataprocessing job execute ... --configs key:value | 15:23 |
tosky | uh, I usually used some wrapper scripts which call the API directly, or the UI; but the main class may be openstack dataprocessing job --mains... | 15:24 |
sapd1 | tosky, jeremy__bouncer Thanks. The correct option is `configs` | 15:25 |
sapd1 | I tried this command: openstack dataprocessing job execute --input input-example --output output --job-template wordcount2 --cluster aaaaa --configs edp.java.main_class:sahara.edp.spark.SparkWordCount | 15:25 |
sapd1 | Is it correct? | 15:26 |
jeremy__bouncer | tosky: --mains is how you refer to a job binary in a job template (at least for certain plugins) | 15:26 |
tosky | oh, right | 15:26 |
jeremy__bouncer | sapd1: it looks okay to me | 15:27 |
jeremy__bouncer | i haven't used the cli for a while though | 15:27 |
*** pcaruana has joined #openstack-sahara | 15:28 | |
sapd1 | jeremy__bouncer, the stderr log: http://paste.openstack.org/show/785982/ | 15:28 |
sapd1 | Which plugin are you using, vanilla or spark? | 15:28 |
jeremy__bouncer | sapd1: currently i don't have any deployed sahara. in the past i've used both vanilla and spark for spark jobs | 15:30 |
jeremy__bouncer | anyway, you are getting that error because you need to specify what file to count words for https://github.com/openstack/sahara-tests/blob/master/sahara_tests/scenario/defaults/edp-examples/edp-spark/wordcountapp/src/main/scala/sahara/edp/spark/SparkWordCount.scala#L28 | 15:30 |
*** tosky has quit IRC | 15:31 | |
*** tosky has joined #openstack-sahara | 15:32 | |
sapd1 | jeremy__bouncer, so maybe the problem is that the job execute command is not correct. | 15:33 |
jeremy__bouncer | yeah, i guess you need --args | 15:33 |
jeremy__bouncer | ah, i know what it is | 15:34 |
jeremy__bouncer | spark edp jobs do not take an input and output (whereas mapreduce edp and some other types take it) | 15:34 |
jeremy__bouncer | everything for spark is done through args | 15:34 |
jeremy__bouncer | one sec, gotta find the doc that explains this and explains how to reference a datasource in args | 15:35 |
jeremy__bouncer | (this is all much clearer in ui, btw) | 15:35 |
jeremy__bouncer | https://docs.openstack.org/sahara/queens/user/edp.html#using-data-source-references-as-arguments | 15:36 |
jeremy__bouncer | edp.substitute_data_source_for_name or edp.substitute_data_source_for_uuid should be true in configs | 15:37 |
jeremy__bouncer | and then args can contain stuff like datasource://name | 15:37 |
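To illustrate the idea behind `edp.substitute_data_source_for_name`, here is a minimal Python sketch, not Sahara's actual code: each `datasource://<name>` argument is replaced by the URL of the data source registered under that name, and every other argument passes through untouched. The `substitute_data_sources` function, the `data_sources` mapping and the example URLs are hypothetical; in Sahara the lookup happens server side against the defined data sources, which also carry their credentials.

```python
from typing import Dict, List

DATASOURCE_PREFIX = "datasource://"


def substitute_data_sources(args: List[str], data_sources: Dict[str, str]) -> List[str]:
    """Replace datasource://<name> arguments with the URL of the named data source.

    Hypothetical illustration only: arguments without the prefix are returned
    unchanged, and an unknown name raises KeyError.
    """
    result = []
    for arg in args:
        if arg.startswith(DATASOURCE_PREFIX):
            name = arg[len(DATASOURCE_PREFIX):]
            if name not in data_sources:
                raise KeyError(f"unknown data source: {name}")
            result.append(data_sources[name])
        else:
            result.append(arg)
    return result


if __name__ == "__main__":
    # Made-up data source definitions, for illustration only.
    sources = {
        "input-example": "s3a://bigdata/input-example.txt",
        "output": "s3a://bigdata/output",
    }
    print(substitute_data_sources(["datasource://input-example", "datasource://output"], sources))
```

Because the job only ever sees the resolved URL, the Spark application (here SparkWordCount) can stay unaware of how the data source was defined.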
sapd1 | I see | 15:38 |
sapd1 | The args are like: s3://bigdata/input-example.txt | 15:38 |
jeremy__bouncer | if you put s3a:// stuff directly into args you will have to specify fs.s3a.* creds/configs manually | 15:41 |
jeremy__bouncer | whereas if you put datasource:// into args all that stuff will be taken care of | 15:41 |
jeremy__bouncer | taken care of, in the definition of the data source, i mean | 15:41 |
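For comparison, here is roughly what "specifying fs.s3a.* manually" involves. This Python sketch only assembles the standard Hadoop S3A property names into a dict; the endpoint and credentials are hypothetical placeholders, and how these properties reach the Spark job (job configs or cluster-level Hadoop configuration) is an assumption that depends on the deployment. With a `datasource://` reference, the data source definition carries this information instead.

```python
from typing import Dict


def s3a_configs(endpoint: str, access_key: str, secret_key: str,
                path_style: bool = True) -> Dict[str, str]:
    """Return standard Hadoop S3A properties for an S3-compatible endpoint.

    The values are supplied by the caller; nothing here talks to Sahara
    or to the object store.
    """
    return {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # MinIO-style deployments often need path-style URLs
        # (http://host/bucket/key) rather than virtual-host style.
        "fs.s3a.path.style.access": "true" if path_style else "false",
    }


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    for key, value in s3a_configs("http://minio.example:9000", "ACCESS", "SECRET").items():
        print(f"{key}={value}")
```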
sapd1 | Ah, thank you. It succeeded. | 15:43 |
jeremy__bouncer | cool | 15:44 |
sapd1 | How can I set edp.substitute_data_source_for_name option in the command line? | 15:48 |
sapd1 | It's not params, not configs and not args. | 15:49 |
sapd1 | I don't know how to add this option. | 15:49 |
jeremy__bouncer | it should be configs | 15:53 |
sapd1 | 'Exception in thread "main" java.io.IOException: No FileSystem for scheme: datasource' It does not work. | 15:54 |
jeremy__bouncer | hmm... | 15:59 |
jeremy__bouncer | can you try adding edp.spark.adapt_for_swift to configs also? | 15:59 |
jeremy__bouncer | as true | 15:59 |
jeremy__bouncer | (i know s3 is not swift) | 16:00 |
sapd1 | jeremy__bouncer, I have tried it in Horizon, and it worked. | 16:00 |
sapd1 | I want to try with the command line. | 16:00 |
sapd1 | My command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True --configs edp.java.main_class:sahara.edp.spark.SparkWordCount | 16:00 |
sapd1 | But it does not work. | 16:00 |
tosky | "datasource" is a placeholder name; it should be replaced by the type of the datasource (s3a in your case) | 16:00 |
jeremy__bouncer | tosky, nope | 16:01 |
tosky | jeremy__bouncer: or did I misread everything? :) | 16:01 |
jeremy__bouncer | it should actually work with datasource:// | 16:01 |
tosky | I probably forgot | 16:01 |
jeremy__bouncer | elise came up with that | 16:01 |
sapd1 | tosky, It worked with datasource:// | 16:01 |
sapd1 | It worked on the UI :D | 16:01 |
tosky | oook | 16:01 |
jeremy__bouncer | sapd1: if it worked in horizon, then you should be able to view the details of the succeeded job execution and see what configs are present | 16:01 |
jeremy__bouncer | and then replicate those configs in cli | 16:02 |
jeremy__bouncer | anyway i think in cli it should not be --configs multiple times, it should be like --configs k1:v1 k2:v2 | 16:02 |
jeremy__bouncer | that's how openstackclient usually likes things, i think | 16:02 |
sapd1 | jeremy__bouncer, Thanks | 16:09 |
sapd1 | The correct command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True edp.java.main_class:sahara.edp.spark.SparkWordCount edp.spark.adapt_for_swift:True | 16:09 |
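The command that finally worked can be rebuilt programmatically. In this Python sketch, the hypothetical `build_job_execute_cmd` helper only assembles the argument list: all configs go after a single `--configs` flag as `key:value` tokens, which is the form jeremy__bouncer suggested and the one used in the working command. It assumes Python 3.8+ for `shlex.join` and does not run anything.

```python
import shlex
from typing import Dict, List


def build_job_execute_cmd(template: str, cluster: str, args: List[str],
                          configs: Dict[str, str]) -> List[str]:
    """Build argv for `openstack dataprocessing job execute` (hypothetical helper).

    Spark EDP jobs take their inputs and outputs through --args rather than
    --input/--output; every config is listed once after a single --configs flag.
    """
    cmd = ["openstack", "dataprocessing", "job", "execute"]
    cmd += ["--args", *args]
    cmd += ["--job-template", template, "--cluster", cluster]
    cmd += ["--configs", *(f"{k}:{v}" for k, v in configs.items())]
    return cmd


if __name__ == "__main__":
    cmd = build_job_execute_cmd(
        template="wordcount2",
        cluster="aaaaa",
        args=["datasource://input-example", "datasource://output"],
        configs={
            "edp.substitute_data_source_for_name": "True",
            "edp.java.main_class": "sahara.edp.spark.SparkWordCount",
            "edp.spark.adapt_for_swift": "True",
        },
    )
    # Prints the same command line that worked in this log.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would submit the job; that step is left out so the sketch has no side effects.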
jeremy__bouncer | sapd1: so that command worked? | 16:10 |
sapd1 | Yes. | 16:10 |
sapd1 | I need to define the option adapt_for_swift too. | 16:10 |
jeremy__bouncer | sapd1: awesome, good to know | 16:10 |
sapd1 | Thanks for your help, you guys. | 16:11 |
jeremy__bouncer | no problem, sorry it was not so straightforward | 16:11 |
*** tesseract has quit IRC | 16:32 | |
*** pcaruana has quit IRC | 17:03 | |
*** rcernin has joined #openstack-sahara | 21:17 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!