scala - Loading a CSV file into HBase through Spark -


This is a simple "how-to" question: I can bring data into the Spark environment through com.databricks.spark.csv. I know how to create an HBase table through Spark, and how to write data to HBase tables manually. Is it possible to load text/CSV/JSON files directly into HBase through Spark? I cannot find anything on it, so I am checking here. If it is possible, please point me to a website that explains the Scala code for doing this in detail.

Thank you.

There are multiple ways you can do that.

  1. Spark HBase Connector (SHC):

https://github.com/hortonworks-spark/shc

You can see a lot of examples at that link.
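The basic pattern with SHC is to define a catalog that maps DataFrame columns to an HBase row key and column family, then write the DataFrame through the connector. Here is a minimal sketch; the table name, column names, and file path are made up for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    val spark = SparkSession.builder().appName("CsvToHBase").getOrCreate()

    // Catalog mapping: "paymentnumber" becomes the row key,
    // the remaining columns land in column family "cf"
    val catalog =
      """{
        |  "table": {"namespace": "default", "name": "payments"},
        |  "rowkey": "key",
        |  "columns": {
        |    "paymentnumber": {"cf": "rowkey", "col": "key",         "type": "string"},
        |    "paymentdate":   {"cf": "cf",     "col": "paymentdate", "type": "string"},
        |    "vendorname":    {"cf": "cf",     "col": "vendorname",  "type": "string"},
        |    "amount":        {"cf": "cf",     "col": "amount",      "type": "string"}
        |  }
        |}""".stripMargin

    // Read the CSV with spark-csv, then write it straight to HBase
    val df = spark.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("/path/to/payments.csv")

    df.write
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog,
                   HBaseTableCatalog.newTable -> "5"))   // create the table with 5 regions if it does not exist
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .save()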

  2. You can also use Spark Core to load data into HBase using HBaseConfiguration.

Code example:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes

    val fileRDD = sc.textFile(args(0), 2)
    val transformedRDD = fileRDD.map { line => convertToKeyValuePairs(line) }

    // Point the Hadoop configuration at the target HBase table and cluster
    val conf = HBaseConfiguration.create()
    conf.set(TableOutputFormat.OUTPUT_TABLE, "tableName")
    conf.set("hbase.zookeeper.quorum", "localhost:2181")
    conf.set("hbase.master", "localhost:60000")
    conf.set("fs.default.name", "hdfs://localhost:8020")
    conf.set("hbase.rootdir", "/hbase")

    // The output key/value classes must match the RDD's (ImmutableBytesWritable, Put) pairs
    val jobConf = new Configuration(conf)
    jobConf.set("mapreduce.job.output.key.class", classOf[ImmutableBytesWritable].getName)
    jobConf.set("mapreduce.job.output.value.class", classOf[Put].getName)
    jobConf.set("mapreduce.job.outputformat.class", classOf[TableOutputFormat[ImmutableBytesWritable]].getName)

    transformedRDD.saveAsNewAPIHadoopDataset(jobConf)

    def convertToKeyValuePairs(line: String): (ImmutableBytesWritable, Put) = {
      val cfDataBytes = Bytes.toBytes("cf")
      val fields = line.split("\\|")   // "|" must be escaped: String.split takes a regex
      val rowKey = Bytes.toBytes(fields(1))
      val put = new Put(rowKey)

      put.add(cfDataBytes, Bytes.toBytes("paymentdate"), Bytes.toBytes(fields(0)))
      put.add(cfDataBytes, Bytes.toBytes("paymentnumber"), Bytes.toBytes(fields(1)))
      put.add(cfDataBytes, Bytes.toBytes("vendorname"), Bytes.toBytes(fields(2)))
      put.add(cfDataBytes, Bytes.toBytes("category"), Bytes.toBytes(fields(3)))
      put.add(cfDataBytes, Bytes.toBytes("amount"), Bytes.toBytes(fields(4)))
      (new ImmutableBytesWritable(rowKey), put)
    }
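Two details worth knowing here: the delimiter is written as "\\|" because String.split interprets its argument as a regular expression, and an unescaped "|" would split the line into individual characters; and TableOutputFormat writes only the Put from each pair (the key is ignored), so the row key stored in HBase is the one you build into the Put itself.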
  3. You can also use this one:

https://github.com/nerdammer/spark-hbase-connector
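This connector lets you save RDDs of tuples to HBase through a fluent API, where the first tuple element becomes the row key. A minimal sketch, assuming a pipe-delimited input file and an existing table; the host, table name, column names, and path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}
    import it.nerdammer.spark.hbase._

    val sparkConf = new SparkConf().setAppName("CsvToHBase")
    sparkConf.set("spark.hbase.host", "localhost")   // the connector reads the HBase host from the Spark conf
    val sc = new SparkContext(sparkConf)

    // Parse each pipe-delimited line into (rowKey, paymentdate, amount)
    val records = sc.textFile("/path/to/payments.csv")
      .map(_.split("\\|"))
      .map(f => (f(1), f(0), f(4)))

    records.toHBaseTable("payments")
      .toColumns("paymentdate", "amount")
      .inColumnFamily("cf")
      .save()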

