hadoop - Spark saveAsNewAPIHadoopFile java.io.IOException: Could not find a serializer for the Value class


I'm trying to store a Java pair RDD as a Hadoop sequence file as follows:

JavaPairRDD<ImmutableBytesWritable, Put> putRdd = ...
config.set("io.serializations",
    "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization");
putRdd.saveAsNewAPIHadoopFile(outputPath, ImmutableBytesWritable.class, Put.class,
    SequenceFileOutputFormat.class, config);

But I get the following exception even though I'm setting io.serializations:

2017-04-06 14:39:32,623 ERROR [Executor task launch worker-0] executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Put'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1192)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
    at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2017-04-06 14:39:32,669 ERROR [task-result-getter-0] scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

Any idea how I can fix this?

I found a fix: apparently Put (and HBase mutations in general) has its own serializer, MutationSerialization.

The following line fixes the issue:

config.setStrings("io.serializations",
    config.get("io.serializations"),
    MutationSerialization.class.getName(),
    ResultSerialization.class.getName());
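
To illustrate why the original setting failed, here is a minimal, self-contained sketch of the lookup as I understand it: Hadoop walks the classes listed in io.serializations and uses the first one whose accept() check matches the value class. This is a simplified model, not Hadoop's code. The types Writable, Mutation, Put and Text below are hypothetical stand-ins for the real Hadoop/HBase classes, and the sketch assumes an HBase version where Put is neither a Writable nor java.io.Serializable.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Simplified model of the "first serialization that accepts the class wins" lookup.
// NOT Hadoop's implementation; all nested types are illustrative stand-ins.
final class SerializationLookupSketch {

    interface Writable {}                          // stand-in for org.apache.hadoop.io.Writable
    static class Mutation {}                       // stand-in: neither Writable nor Serializable
    static final class Put extends Mutation {}     // stand-in for the HBase Put
    static final class Text implements Writable {} // stand-in for a Writable value type

    // Each known serialization name maps to its accept() rule (assumed behaviour).
    static final Map<String, Predicate<Class<?>>> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("WritableSerialization", Writable.class::isAssignableFrom);
        KNOWN.put("JavaSerialization", Serializable.class::isAssignableFrom);
        KNOWN.put("MutationSerialization", Mutation.class::isAssignableFrom);
    }

    // Returns the first configured serialization that accepts valueClass, if any.
    static Optional<String> findSerialization(String ioSerializations, Class<?> valueClass) {
        for (String raw : ioSerializations.split(",")) {
            String name = raw.trim();
            Predicate<Class<?>> accepts = KNOWN.get(name);
            if (accepts != null && accepts.test(valueClass)) {
                return Optional.of(name);
            }
        }
        return Optional.empty(); // corresponds to "Could not find a serializer"
    }

    public static void main(String[] args) {
        String original = "JavaSerialization,WritableSerialization";
        String fixed = original + ",MutationSerialization";
        System.out.println(findSerialization(original, Put.class)); // nothing accepts Put
        System.out.println(findSerialization(fixed, Put.class));    // MutationSerialization accepts it
    }
}
```

With only JavaSerialization and WritableSerialization listed, nothing accepts Put, which matches the exception above. Appending MutationSerialization to the existing value (rather than replacing it) keeps the default serializations available for the key class.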
