scala - Best practices for using HBase in a Spark process -
I'm trying to tune an already-written Spark process, and there is a problem when the process needs data from HBase: it starts making a lot of calls and doesn't seem optimized.
The code below invokes a repository method in order to get the information needed. Looking inside the repository code, a lot of HBase queries are being done: 4 or 5 queries per order to aggregate the data and return the final information, so you can imagine how many queries that means for millions of orders. The repository is a Java class.
val ordersWithConsumptions = orders.map { order =>
  val consumptions = order.getConsPeriods.map(id => consumptionRepository.findByPeriodId(id, order.getPod))
  (order, consumptions)
}
Is there a more distributed/scalable way to do this work?
I want to avoid creating a connection per call, and I'm considering using mapPartitions for that.
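To illustrate the mapPartitions idea: open one connection per partition, serve every order in that partition through it, then close it, instead of paying the connection cost per order. The sketch below is a minimal simulation with plain Scala collections, not the real job: `Order`, `FakeRepository`, and the `grouped(2)` partitioning are all hypothetical stand-ins. In the real process the outer loop would be `rdd.mapPartitions { ... }` and `open()` would wrap `ConnectionFactory.createConnection(conf)` from the HBase client.

```scala
// Hypothetical stand-ins for the real Order type and the Java repository.
case class Order(pod: String, consPeriods: Seq[Int])

class FakeRepository {
  var opened = 0                                  // counts connections, to show the amortization
  def open(): Unit = opened += 1                  // real code: ConnectionFactory.createConnection(conf)
  def close(): Unit = ()                          // real code: connection.close()
  def findByPeriodId(id: Int, pod: String): String = s"$pod-$id"
}

object MapPartitionsSketch {
  // Returns (connections opened, enriched orders) so the effect is visible.
  def run(): (Int, List[(Order, Seq[String])]) = {
    val orders = Seq(
      Order("podA", Seq(1, 2)),
      Order("podB", Seq(3)),
      Order("podC", Seq(4, 5))
    )
    val repo = new FakeRepository

    // grouped(2) simulates the RDD's partitions; in Spark this whole
    // flatMap body would be the function passed to rdd.mapPartitions.
    val result = orders.grouped(2).flatMap { partition =>
      repo.open()                                 // one connection per partition, not per order
      val enriched = partition.map { order =>
        val consumptions = order.consPeriods.map(id => repo.findByPeriodId(id, order.pod))
        (order, consumptions)
      }
      repo.close()
      enriched
    }.toList

    (repo.opened, result)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With 3 orders in 2 simulated partitions, only 2 connections are opened; the per-order lookups themselves are unchanged, which is the point of the pattern.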
Another idea is maybe to remove the repository and do the data aggregation work directly in the Scala process code, but I can't see big improvements from doing that; maybe a bit more control and less written code.
But I don't have any other ideas, so any hint, idea, or best practice is appreciated.