scala - loading a CSV file to HBase through Spark
This is a simple "how-to" question: I can bring data into the Spark environment through com.databricks.spark.csv. I also know how to create an HBase table through Spark, and how to write data to HBase tables manually. Is it possible to load text/CSV/JSON files directly into HBase through Spark? I cannot see anyone talking about it, so I am checking. If it is possible, please point me to a website that explains the Scala code for doing this in detail.
Thank you,
There are multiple ways you can do that.
- Spark HBase Connector (SHC):
https://github.com/hortonworks-spark/shc
You can see a lot of examples at that link.
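As a rough sketch of the SHC route (the table name, column mapping, and CSV path below are made up for illustration, and the shc-core artifact must be on the classpath):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical catalog mapping CSV columns to an HBase table named "payments"
val catalog = s"""{
                 |"table":{"namespace":"default", "name":"payments"},
                 |"rowkey":"key",
                 |"columns":{
                   |"paymentnumber":{"cf":"rowkey", "col":"key", "type":"string"},
                   |"vendorname":{"cf":"cf", "col":"vendorname", "type":"string"},
                   |"amount":{"cf":"cf", "col":"amount", "type":"string"}
                 |}
               |}""".stripMargin

val spark = SparkSession.builder().appName("CsvToHBase").getOrCreate()

// Read the CSV into a DataFrame ("csv" is a built-in source in Spark 2.x;
// on Spark 1.x use format("com.databricks.spark.csv") instead)
val df = spark.read.option("header", "true").csv("/data/payments.csv")

// Write the DataFrame to HBase via the SHC data source;
// newTable -> "5" asks SHC to create the table with 5 regions if it is missing
df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
               HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```

This requires a reachable HBase cluster (hbase-site.xml on the classpath), so it is a sketch of the pattern rather than something you can run standalone.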
- You can also use Spark core to load data into HBase using HBaseConfiguration.
Code example:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes

val fileRDD = sc.textFile(args(0), 2)
val transformedRDD = fileRDD.map(line => convertToKeyValuePairs(line))

val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "tableName")
conf.set("hbase.zookeeper.quorum", "localhost:2181")
conf.set("hbase.master", "localhost:60000")
conf.set("fs.default.name", "hdfs://localhost:8020")
conf.set("hbase.rootdir", "/hbase")

val jobConf = new Configuration(conf)
// The output key/value classes must match the RDD's (ImmutableBytesWritable, Put) pairs
jobConf.set("mapreduce.job.output.key.class", classOf[ImmutableBytesWritable].getName)
jobConf.set("mapreduce.job.output.value.class", classOf[Put].getName)
jobConf.set("mapreduce.outputformat.class", classOf[TableOutputFormat[ImmutableBytesWritable]].getName)

transformedRDD.saveAsNewAPIHadoopDataset(jobConf)

def convertToKeyValuePairs(line: String): (ImmutableBytesWritable, Put) = {
  val cfDataBytes = Bytes.toBytes("cf")
  // The pipe must be escaped: String.split takes a regex, and a bare "|"
  // is the alternation operator, which splits the line on every character
  val fields = line.split("\\|")
  val rowKey = Bytes.toBytes(fields(1))
  val put = new Put(rowKey)
  // addColumn replaces the deprecated Put.add
  put.addColumn(cfDataBytes, Bytes.toBytes("PaymentDate"), Bytes.toBytes(fields(0)))
  put.addColumn(cfDataBytes, Bytes.toBytes("PaymentNumber"), Bytes.toBytes(fields(1)))
  put.addColumn(cfDataBytes, Bytes.toBytes("VendorName"), Bytes.toBytes(fields(2)))
  put.addColumn(cfDataBytes, Bytes.toBytes("Category"), Bytes.toBytes(fields(3)))
  put.addColumn(cfDataBytes, Bytes.toBytes("Amount"), Bytes.toBytes(fields(4)))
  (new ImmutableBytesWritable(rowKey), put)
}
- You can also use this one.