scala - Best practices for using HBase in a Spark process


I'm trying to tune a Spark process I've written, and there is a problem when the process needs data from HBase: it starts making a lot of calls and doesn't seem optimized.

The code below invokes a repository method to get the information needed. Looking at the repository code, a lot of HBase queries are being made inside it: 4 or 5 queries per call in order to aggregate data and return the final information, so you can imagine how many queries that is for millions of orders. The repository is a Java class.

val ordersWithConsumptions = orders.map { order =>
  val consumptions = order.getConsPeriods.map(id => consumptionRepository.findByPeriodId(id, order.getPod))
  (order, consumptions)
}

Is there a more distributed/bulk-oriented way to do this work?

I want to avoid opening one connection per record, and I'm considering using mapPartitions for that.
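A minimal, self-contained sketch of that mapPartitions pattern follows. `FakeConnection` is a stand-in for a real HBase `Connection` (which you would normally obtain via `ConnectionFactory.createConnection` inside the partition function); with Spark, the body of `processPartition` would go inside `rdd.mapPartitions`:

```scala
// Simulated connection standing in for org.apache.hadoop.hbase.client.Connection.
// The point of the pattern: pay the connection cost once per partition,
// not once per record.
class FakeConnection {
  def lookup(id: String): String = s"row-$id"
  def close(): Unit = ()
}

def processPartition(records: Iterator[String]): Iterator[(String, String)] = {
  val conn = new FakeConnection          // one connection for the whole partition
  // Materialize before closing: Spark partition iterators are lazy, and a
  // lazy map here would run after conn.close().
  val out = records.map(id => (id, conn.lookup(id))).toList
  conn.close()
  out.iterator
}

val result = processPartition(Iterator("a", "b")).toList
// result: List((a,row-a), (b,row-b))
```

With a real connection, the same shape also lets you batch lookups (e.g., building a list of `Get`s for the partition and issuing them in one `table.get(gets)` call) instead of one round trip per key.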

Another idea is to remove the repository and do the data aggregation directly in the Scala process code, but I can't see a big improvement from that alone; maybe a bit more control and less written code.
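The restructuring described above can be sketched with plain Scala collections standing in for RDDs (the `Order`/`Consumption` case classes and the sample data are hypothetical). The idea is to fetch or load the consumption rows once, key them by (period id, pod), and join, instead of issuing 4-5 repository lookups per order:

```scala
// Hypothetical shapes for the data in the question.
case class Order(id: String, pod: String, consPeriods: Seq[String])
case class Consumption(periodId: String, pod: String, value: Double)

val orders = Seq(
  Order("o1", "podA", Seq("p1", "p2")),
  Order("o2", "podB", Seq("p2"))
)
val consumptions = Seq(
  Consumption("p1", "podA", 10.0),
  Consumption("p2", "podA", 5.0),
  Consumption("p2", "podB", 7.5)
)

// Key the consumption side once; with Spark this would be a keyed
// RDD/Dataset join (or a broadcast of the smaller side) rather than a Map.
val byKey: Map[(String, String), Seq[Consumption]] =
  consumptions.groupBy(c => (c.periodId, c.pod))

val ordersWithConsumptions = orders.map { order =>
  val cons = order.consPeriods.flatMap(id => byKey.getOrElse((id, order.pod), Nil))
  (order, cons)
}
```

On a cluster, the consumption side could be loaded in bulk from HBase (for example via `TableInputFormat` and `newAPIHadoopRDD`) so the join is distributed and no per-row repository calls remain.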

Beyond that I don't have other ideas, so any hint, idea, or best practice is appreciated.

