scala - loading a CSV file to HBase through Spark
This is a simple "how-to" question: I can bring data into the Spark environment through com.databricks.spark.csv. I also know how to create an HBase table through Spark, and how to write data to HBase tables manually. Is it possible to load text/CSV/JSON files directly into HBase through Spark? I cannot see anyone talking about it, so I am checking. If it is possible, please point me to a website that explains the Scala code for doing this in detail.
Thank you,
There are multiple ways you can do that.
- Spark HBase Connector (SHC):
https://github.com/hortonworks-spark/shc
You can see a lot of examples at that link.
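As a rough sketch of the SHC route (the table name, column mapping, and CSV path below are made up for illustration, and the shc-core artifact must be on the classpath):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Hypothetical catalog mapping CSV columns to an HBase table named "payments"
val catalog = s"""{
                 |"table":{"namespace":"default", "name":"payments"},
                 |"rowkey":"key",
                 |"columns":{
                   |"paymentnumber":{"cf":"rowkey", "col":"key", "type":"string"},
                   |"vendorname":{"cf":"cf", "col":"vendorname", "type":"string"},
                   |"amount":{"cf":"cf", "col":"amount", "type":"string"}
                 |}
               |}""".stripMargin

val spark = SparkSession.builder().appName("CsvToHBase").getOrCreate()

// Read the CSV into a DataFrame ("csv" is a built-in source in Spark 2.x;
// on Spark 1.x use format("com.databricks.spark.csv") instead)
val df = spark.read.option("header", "true").csv("/data/payments.csv")

// Write the DataFrame to HBase via the SHC data source;
// newTable -> "5" asks SHC to create the table with 5 regions if it is missing
df.write
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
               HBaseTableCatalog.newTable -> "5"))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .save()
```

This requires a reachable HBase cluster (hbase-site.xml on the classpath), so it is a sketch of the pattern rather than something you can run standalone.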
- You can also use Spark core to load data into HBase using HBaseConfiguration.
Code example:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes

val fileRDD = sc.textFile(args(0), 2)
val transformedRDD = fileRDD.map(line => convertToKeyValuePairs(line))

val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "tableName")
conf.set("hbase.zookeeper.quorum", "localhost:2181")
conf.set("hbase.master", "localhost:60000")
conf.set("fs.default.name", "hdfs://localhost:8020")
conf.set("hbase.rootdir", "/hbase")

val jobConf = new Configuration(conf)
// The output key/value classes must match the RDD's (ImmutableBytesWritable, Put) pairs
jobConf.set("mapreduce.job.output.key.class", classOf[ImmutableBytesWritable].getName)
jobConf.set("mapreduce.job.output.value.class", classOf[Put].getName)
jobConf.set("mapreduce.outputformat.class", classOf[TableOutputFormat[ImmutableBytesWritable]].getName)

transformedRDD.saveAsNewAPIHadoopDataset(jobConf)

def convertToKeyValuePairs(line: String): (ImmutableBytesWritable, Put) = {
  val cfDataBytes = Bytes.toBytes("cf")
  // The pipe must be escaped: String.split takes a regex, and a bare "|"
  // is the alternation operator, which splits the line on every character
  val fields = line.split("\\|")
  val rowKey = Bytes.toBytes(fields(1))
  val put = new Put(rowKey)
  // addColumn replaces the deprecated Put.add
  put.addColumn(cfDataBytes, Bytes.toBytes("PaymentDate"), Bytes.toBytes(fields(0)))
  put.addColumn(cfDataBytes, Bytes.toBytes("PaymentNumber"), Bytes.toBytes(fields(1)))
  put.addColumn(cfDataBytes, Bytes.toBytes("VendorName"), Bytes.toBytes(fields(2)))
  put.addColumn(cfDataBytes, Bytes.toBytes("Category"), Bytes.toBytes(fields(3)))
  put.addColumn(cfDataBytes, Bytes.toBytes("Amount"), Bytes.toBytes(fields(4)))
  (new ImmutableBytesWritable(rowKey), put)
}
- You can also use this one.