Tuesday, 2019-11-12

*** rcernin_ has joined #openstack-sahara		03:00
*** rcernin has quit IRC		03:03
*** rcernin_ has quit IRC		03:26
*** rcernin has joined #openstack-sahara		03:26
*** rcernin has quit IRC		07:22
*** tosky has joined #openstack-sahara		07:52
*** tesseract has joined #openstack-sahara		08:11
*** tosky has quit IRC		08:32
*** tosky has joined #openstack-sahara		08:37
*** irclogbot_2 has quit IRC		09:39
*** irclogbot_0 has joined #openstack-sahara		09:41
*** tosky has quit IRC		09:41
*** tosky has joined #openstack-sahara		09:42
*** rcernin has joined #openstack-sahara		09:45
*** tosky_ has joined #openstack-sahara		09:52
*** tosky has quit IRC		09:54
*** tosky has joined #openstack-sahara		09:58
*** tosky_ has quit IRC		10:01
*** tosky_ has joined #openstack-sahara		10:05
*** tosky has quit IRC		10:08
*** rcernin has quit IRC		10:11
*** tosky has joined #openstack-sahara		10:14
*** tosky_ has quit IRC		10:17
*** tosky_ has joined #openstack-sahara		10:21
*** tosky has quit IRC		10:24
*** tosky has joined #openstack-sahara		10:34
*** tosky_ has quit IRC		10:36
*** tosky_ has joined #openstack-sahara		10:43
*** tosky has quit IRC		10:45
*** tosky_ has quit IRC		10:46
*** tosky has joined #openstack-sahara		10:46
*** rcernin has joined #openstack-sahara		10:50
*** tosky has quit IRC		11:01
*** tosky has joined #openstack-sahara		11:01
*** tosky_ has joined #openstack-sahara		11:05
*** tosky has quit IRC		11:06
*** tosky_ has quit IRC		11:09
*** tosky has joined #openstack-sahara		11:09
*** tosky has quit IRC		11:17
*** tosky has joined #openstack-sahara		11:23
*** rcernin has quit IRC		11:26
*** tosky_ has joined #openstack-sahara		11:47
*** tosky has quit IRC		11:49
*** tosky has joined #openstack-sahara		11:50
*** tosky_ has quit IRC		11:53
*** dave-mccowan has joined #openstack-sahara		12:20
*** openstackgerrit has quit IRC		12:41
*** sapd1 has joined #openstack-sahara		14:32
*** tosky_ has joined #openstack-sahara		15:00
sapd1	Hi everyone, I'm trying to create a job with a data source from S3. I'm using the Spark plugin (2.3) and MinIO as an S3-compatible store.	15:00
*** tosky has quit IRC		15:00
sapd1	I put the job binaries and the data source in S3. I checked the sahara-engine log and it could load the job binary from S3, but the job failed. I think the problem is related to the data source.	15:01
sapd1	I'm using this example too: https://opendev.org/openstack/sahara-tests/src/branch/master/sahara_tests/scenario/defaults/edp-examples/edp-spark	15:01
*** tosky_ is now known as tosky		15:01
tosky	I tested the S3 example some time ago (but the code hasn't changed)	15:02
tosky	but I tested mostly with the real S3	15:02
sapd1	I got stdout log from spark job: http://paste.openstack.org/show/785976/	15:13
jeremy__bouncer	i don't see much in that log - do you have stderr?	15:15
tosky	unfortunately I don't know that S3 provider; maybe there are some compatibility quirks	15:15
sapd1	tosky, how can I set 'edp.java.main_class' variable in job execute command line?	15:18
sapd1	jeremy__bouncer, There is nothing in stderr	15:19
jeremy__bouncer	i believe it's openstack dataprocessing job execute ... --configs key:value	15:23
tosky	uh, I usually used some wrapper scripts which call the API directly, or the UI; but the main class may be openstack dataprocessing job --mains...	15:24
sapd1	tosky, jeremy__bouncer Thanks. The correct option is `configs`	15:25
sapd1	I tried this command: openstack dataprocessing job execute --input input-example --output output --job-template wordcount2 --cluster aaaaa --configs edp.java.main_class:sahara.edp.spark.SparkWordCount	15:25
sapd1	Is it correct?	15:26
jeremy__bouncer	tosky: --mains is how you refer to a job binary in a job template (at least for certain plugins)	15:26
tosky	oh, right	15:26
jeremy__bouncer	sapd1: it looks okay to me	15:27
jeremy__bouncer	i haven't used the cli for a while though	15:27
*** pcaruana has joined #openstack-sahara		15:28
sapd1	jeremy__bouncer, the stderr log: http://paste.openstack.org/show/785982/	15:28
sapd1	Which plugin are you using? vanilla or spark.	15:28
jeremy__bouncer	sapd1: currently i don't have any deployed sahara. in the past i've used both vanilla and spark for spark jobs	15:30
jeremy__bouncer	anyway, you are getting that error because you need to specify what file to count words for https://github.com/openstack/sahara-tests/blob/master/sahara_tests/scenario/defaults/edp-examples/edp-spark/wordcountapp/src/main/scala/sahara/edp/spark/SparkWordCount.scala#L28	15:30
*** tosky has quit IRC		15:31
*** tosky has joined #openstack-sahara		15:32
sapd1	jeremy__bouncer, So maybe the problem is that the job execute command is not correct.	15:33
jeremy__bouncer	yeah, i guess you need --args	15:33
jeremy__bouncer	ah, i know what it is	15:34
jeremy__bouncer	spark edp jobs do not take an input and output (whereas mapreduce edp and some other types take it)	15:34
jeremy__bouncer	everything for spark is done through args	15:34
jeremy__bouncer	one sec, gotta find the doc that explains this and explains how to reference a datasource in args	15:35
jeremy__bouncer	(this is all much clearer in ui, btw)	15:35
jeremy__bouncer	https://docs.openstack.org/sahara/queens/user/edp.html#using-data-source-references-as-arguments	15:36
jeremy__bouncer	edp.substitute_data_source_for_name or edp.substitute_data_source_for_uuid should be true in configs	15:37
jeremy__bouncer	and then args can contain stuff like datasource://name	15:37
sapd1	I see	15:38
sapd1	The args like: s3://bigdata/input-example.txt	15:38
jeremy__bouncer	if you put s3a:// stuff directly into args you will have to specify fs.s3a.* creds/configs manually	15:41
jeremy__bouncer	whereas if you put datasource:// into args all that stuff will be taken care of	15:41
jeremy__bouncer	taken care of, in the definition of the data source, i mean	15:41
sapd1	Ah, thank you. It succeeded.	15:43
jeremy__bouncer	cool	15:44
sapd1	How can I set edp.substitute_data_source_for_name option in the command line?	15:48
sapd1	It's not params, not configs and not args.	15:49
sapd1	I don't know how to add this option.	15:49
jeremy__bouncer	it should be configs	15:53
sapd1	'Exception in thread "main" java.io.IOException: No FileSystem for scheme: datasource' It does not work.	15:54
jeremy__bouncer	hmm...	15:59
jeremy__bouncer	can you try adding edp.spark.adapt_for_swift to configs also?	15:59
jeremy__bouncer	as true	15:59
jeremy__bouncer	(i know s3 is not swift)	16:00
sapd1	jeremy__bouncer, I tried it in Horizon, and it worked.	16:00
sapd1	I want to try with command line.	16:00
sapd1	My command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True --configs edp.java.main_class:sahara.edp.spark.SparkWordCount	16:00
sapd1	But it does not work.	16:00
tosky	"datasource" is a placeholder name; it should be replaced by the type of the datasource (s3a in your case)	16:00
jeremy__bouncer	tosky, nope	16:01
tosky	jeremy__bouncer: or did I misread everything? :)	16:01
jeremy__bouncer	it should actually work with datasource://	16:01
tosky	I probably forgot	16:01
jeremy__bouncer	elise came up with that	16:01
sapd1	tosky, It worked with datasource://	16:01
sapd1	It worked on the UI :D	16:01
tosky	oook	16:01
jeremy__bouncer	sapd1: if it worked in horizon, then you should be able to view the details of the succeeded job execution and see what configs are present	16:01
jeremy__bouncer	and then replicate those configs in cli	16:02
jeremy__bouncer	anyway i think in cli it should not be --configs multiple times, it should be like --configs k1:v1 k2:v2	16:02
jeremy__bouncer	that's how openstackclient usually likes things, i think	16:02
sapd1	jeremy__bouncer, Thanks	16:09
sapd1	The correct command is: openstack dataprocessing job execute --args datasource://input-example datasource://output --job-template wordcount2 --cluster aaaaa --configs edp.substitute_data_source_for_name:True edp.java.main_class:sahara.edp.spark.SparkWordCount edp.spark.adapt_for_swift:True	16:09
jeremy__bouncer	sapd1: so that command worked?	16:10
sapd1	Yes.	16:10
sapd1	I need to define the option adapt_for_swift too.	16:10
jeremy__bouncer	sapd1: awesome, good to know	16:10
sapd1	Thanks for your help, you guys.	16:11
jeremy__bouncer	no problem, sorry it was not so straightforward	16:11
*** tesseract has quit IRC		16:32
*** pcaruana has quit IRC		17:03
*** rcernin has joined #openstack-sahara		21:17
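
The command that finally worked at 16:09 differs from the failing one at 16:00 in two ways: all key:value pairs follow a single --configs flag, and edp.spark.adapt_for_swift:True is included. A minimal Python sketch of assembling that command line is shown here. The helper name build_job_execute_argv is hypothetical, the flags are only those quoted in the log, and nothing here has been checked against a particular client release. The code only builds and prints the command; it does not run it.

```python
import shlex


def build_job_execute_argv(job_template, cluster, args, configs):
    """Assemble the argv for the job execute command quoted in the log.

    Hypothetical helper for illustration only; flag names are taken from
    the conversation, not from the client's documentation.
    """
    argv = ["openstack", "dataprocessing", "job", "execute"]
    if args:
        # Spark EDP jobs take their input and output through --args.
        argv += ["--args", *args]
    argv += ["--job-template", job_template, "--cluster", cluster]
    if configs:
        # One --configs flag followed by every key:value pair, as
        # suggested at 16:02, instead of repeating the flag per pair.
        argv.append("--configs")
        argv += [f"{key}:{value}" for key, value in configs.items()]
    return argv


argv = build_job_execute_argv(
    job_template="wordcount2",
    cluster="aaaaa",
    args=["datasource://input-example", "datasource://output"],
    configs={
        "edp.substitute_data_source_for_name": "True",
        "edp.java.main_class": "sahara.edp.spark.SparkWordCount",
        "edp.spark.adapt_for_swift": "True",
    },
)
command = shlex.join(argv)
print(command)
```

Keeping the configs in a dict makes it easy to copy the set shown in the details of a job execution that succeeded in Horizon, which is the approach suggested at 16:01.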
