Apache Pig - Sending a relation to UDF functions
Can we send a relation to a Pig UDF function as input? A relation can have multiple tuples in it. How do we read each tuple one by one in a Pig UDF function?
OK. Below is the sample input file.
surender,hdfc,60000,cts
raja,axis,80000,tcs
raj,hdfc,70000,tcs
kumar,axis,70000,cts
remya,axis,40000,cts
arun,sbi,30000,tcs
vimal,sbi,10000,tcs
ankur,hdfc,80000,cts
karthic,hdfc,95000,cts
sandhya,axis,60000,cts
amit,sbi,70000,cts

myinput = LOAD '/home/cloudera/surender/laurela/balance.txt' USING PigStorage(',') AS (name:chararray, bank:chararray, amt:long, company:chararray);
grouped = GROUP myinput BY company;
All I need is the details of the highest-paid employee in each company. How can I use a UDF for that?
I need this output:
cts karthic,hdfc,95000,cts
tcs raja,axis,80000,tcs
Can anyone help me with this?
This script will give you the results you want:
a = LOAD '/home/cloudera/surender/laurela/balance.txt' USING PigStorage(',') AS (name:chararray, bank:chararray, amt:long, company:chararray);
b = GROUP a BY (company);
topresults = FOREACH b {
    result = TOP(1, 2, a);
    GENERATE FLATTEN(result);
}
DUMP topresults;
Explanation:
First we group on the basis of company, so b is:
(cts,{(surender,hdfc,60000,cts),(kumar,axis,70000,cts),(remya,axis,40000,cts),(ankur,hdfc,80000,cts),(karthic,hdfc,95000,cts),(sandhya,axis,60000,cts),(amit,sbi,70000,cts)})
(tcs,{(raja,axis,80000,tcs),(raj,hdfc,70000,tcs),(arun,sbi,30000,tcs),(vimal,sbi,10000,tcs)})
Then, for each tuple in b, we generate a tuple result equal to the top 1 record of the relation a found in b, on the basis of the value of column number 2, i.e. amt. Columns are numbered from 0.
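To make the grouping and top-1 step concrete, here is a minimal sketch in plain Python (not Pig code) that walks each group's tuples and keeps the one with the largest amt. The helper names group_by_company, top_by_column and top_results are hypothetical and exist only for this illustration; they only mimic what GROUP and TOP(1, 2, a) do on the sample data.

```python
from collections import defaultdict

# Sample rows from the question: (name, bank, amt, company)
ROWS = [
    ("surender", "hdfc", 60000, "cts"),
    ("raja", "axis", 80000, "tcs"),
    ("raj", "hdfc", 70000, "tcs"),
    ("kumar", "axis", 70000, "cts"),
    ("remya", "axis", 40000, "cts"),
    ("arun", "sbi", 30000, "tcs"),
    ("vimal", "sbi", 10000, "tcs"),
    ("ankur", "hdfc", 80000, "cts"),
    ("karthic", "hdfc", 95000, "cts"),
    ("sandhya", "axis", 60000, "cts"),
    ("amit", "sbi", 70000, "cts"),
]


def group_by_company(rows):
    """Mimic the GROUP step: map each company to its bag of tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[3]].append(row)  # column 3 is company
    return groups


def top_by_column(n, column, bag):
    """Mimic TOP(n, column, bag): the n tuples with the largest value in column."""
    return sorted(bag, key=lambda t: t[column], reverse=True)[:n]


def top_results(rows):
    """Mimic the nested FOREACH: one top tuple per company, flattened."""
    groups = group_by_company(rows)
    out = []
    for company in sorted(groups):
        out.extend(top_by_column(1, 2, groups[company]))  # column 2 is amt
    return out


if __name__ == "__main__":
    for row in top_results(ROWS):
        print(row)
```

On the sample data this yields the karthic row for cts and the raja row for tcs, which matches the output asked for in the question.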
Note that your first data has spaces after the company name. Please remove the spaces or use the following data:
surender,hdfc,60000,cts
raja,axis,80000,tcs
raj,hdfc,70000,tcs
kumar,axis,70000,cts
remya,axis,40000,cts
arun,sbi,30000,tcs
vimal,sbi,10000,tcs
ankur,hdfc,80000,cts
karthic,hdfc,95000,cts
sandhya,axis,60000,cts
amit,sbi,70000,cts