hadoop - Spark saveAsNewAPIHadoopFile java.io.IOException: Could not find a serializer for the Value class
I'm trying to store a Java pair RDD as a Hadoop sequence file as follows:
    JavaPairRDD<ImmutableBytesWritable, Put> putRdd = ...
    config.set("io.serializations", "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.hadoop.io.serializer.WritableSerialization");
    putRdd.saveAsNewAPIHadoopFile(outputPath, ImmutableBytesWritable.class, Put.class, SequenceFileOutputFormat.class, config);

But I get the following exception even though I'm setting io.serializations:
    2017-04-06 14:39:32,623 ERROR [Executor task launch worker-0] executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Put'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1192)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1094)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    2017-04-06 14:39:32,669 ERROR [task-result-getter-0] scheduler.TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job

Any idea how I can fix this?
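I found a fix: apparently Put (and the other HBase mutations) has its own serializer, MutationSerialization, which is not among the serializations configured above.
The following line fixes the issue by appending the HBase serializers to the existing io.serializations value: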
    config.setStrings("io.serializations", config.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName());
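The point of the fix is that the HBase serializers are appended to the existing comma-separated list rather than replacing it. Here is a minimal, stdlib-only sketch of that merge. The class `SerializationList` and its `merge` method are hypothetical names invented for illustration, not part of Hadoop or HBase, and the HBase class names assume the `org.apache.hadoop.hbase.mapreduce` package.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not a Hadoop/HBase API): builds an io.serializations
// value by appending extra serializer class names to the current value.
class SerializationList {

    // Keeps the existing entries in order, appends the extras, drops duplicates.
    static String merge(String existing, String... extra) {
        Set<String> names = new LinkedHashSet<>();
        if (existing != null) {
            for (String name : existing.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        for (String name : extra) {
            names.add(name);
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // The value set in the question.
        String current = "org.apache.hadoop.io.serializer.JavaSerialization,"
                + "org.apache.hadoop.io.serializer.WritableSerialization";
        // Assumed fully qualified names of the HBase serializers.
        String merged = merge(current,
                "org.apache.hadoop.hbase.mapreduce.MutationSerialization",
                "org.apache.hadoop.hbase.mapreduce.ResultSerialization");
        System.out.println(merged);
    }
}
```

In a real job the merged string would presumably be handed to `config.set("io.serializations", merged)` before calling `saveAsNewAPIHadoopFile`; the one-liner above achieves the same result directly through `setStrings`.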