scala - Best practices for using HBase in a Spark process -
I'm trying to tune an already-written Spark process, and there is a problem when the process needs data from HBase: it starts making a lot of calls and doesn't seem optimized.
The code below invokes a repository method in order to get the information needed. Looking inside the repository code, a lot of HBase queries are being done: 4 or 5 queries per order to aggregate the data and return the final information, so you can imagine how many queries that means for millions of orders. The repository is a Java class.
val ordersWithConsumptions = orders.map { order =>
  val consumptions = order.getConsPeriods.map(id => consumptionRepository.findByPeriodId(id, order.getPod))
  (order, consumptions)
}
Is there a more distributed/scalable way to do this work?
I want to avoid creating a connection per call, and I'm considering using mapPartitions for that.
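To illustrate the mapPartitions idea: open one connection per partition, serve every order in that partition through it, then close it, instead of paying the connection cost per order. The sketch below is a minimal simulation with plain Scala collections, not the real job: `Order`, `FakeRepository`, and the `grouped(2)` partitioning are all hypothetical stand-ins. In the real process the outer loop would be `rdd.mapPartitions { ... }` and `open()` would wrap `ConnectionFactory.createConnection(conf)` from the HBase client.

```scala
// Hypothetical stand-ins for the real Order type and the Java repository.
case class Order(pod: String, consPeriods: Seq[Int])

class FakeRepository {
  var opened = 0                                  // counts connections, to show the amortization
  def open(): Unit = opened += 1                  // real code: ConnectionFactory.createConnection(conf)
  def close(): Unit = ()                          // real code: connection.close()
  def findByPeriodId(id: Int, pod: String): String = s"$pod-$id"
}

object MapPartitionsSketch {
  // Returns (connections opened, enriched orders) so the effect is visible.
  def run(): (Int, List[(Order, Seq[String])]) = {
    val orders = Seq(
      Order("podA", Seq(1, 2)),
      Order("podB", Seq(3)),
      Order("podC", Seq(4, 5))
    )
    val repo = new FakeRepository

    // grouped(2) simulates the RDD's partitions; in Spark this whole
    // flatMap body would be the function passed to rdd.mapPartitions.
    val result = orders.grouped(2).flatMap { partition =>
      repo.open()                                 // one connection per partition, not per order
      val enriched = partition.map { order =>
        val consumptions = order.consPeriods.map(id => repo.findByPeriodId(id, order.pod))
        (order, consumptions)
      }
      repo.close()
      enriched
    }.toList

    (repo.opened, result)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With 3 orders in 2 simulated partitions, only 2 connections are opened; the per-order lookups themselves are unchanged, which is the point of the pattern.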
Another idea is maybe to remove the repository and do the data aggregation work directly in the Scala process code, but I can't see big improvements from doing that; maybe a bit more control and less written code.
But I don't have any other ideas, so any hint, idea, or best practice is appreciated.