bash - Copying files from one HDFS directory to another with Oozie distcp-action


My actions are below.

start_fair_usage ends with status OK, but test_copy returns:

Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, null

In /user/comverse/data/${1}_b I have a lot of different files, some of which I want to copy to ${name_node}/user/evkuzmin/output. To do that, I try to pass the paths from copy_file.sh, which holds an array of paths to the files I need.

  <action name="start_fair_usage">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${job_tracker}</job-tracker>
      <name-node>${name_node}</name-node>
      <exec>${copy_file}</exec>
      <argument>${today_without_dash}</argument>
      <argument>${mta}</argument>
      <!-- <file>${path}#${start_fair_usage}</file> -->
      <file>${path}${copy_file}#${copy_file}</file>
      <capture-output/>
    </shell>
    <ok to="test_copy"/>
    <error to="kill"/>
  </action>

  <action name="test_copy">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <job-tracker>${job_tracker}</job-tracker>
      <name-node>${name_node}</name-node>
      <arg>${wf:actionData('start_fair_usage')['paths']}</arg>
      <!-- <arg>${name_node}/user/evkuzmin/input/*</arg> -->
      <arg>${name_node}/user/evkuzmin/output</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
  </action>

start_fair_usage runs copy_file.sh:

echo ${1}
echo ${2}

dirs=(
    /user/comverse/data/${1}_b
    )
args=()

# Collect the paths (column 8 of the listing) of the files matching ${2}.gz
for i in $(hadoop fs -ls "${dirs[@]}" | egrep ${2}.gz | awk -F " " '{print $8}')
do
    args+=("$i")
    echo "copy file - "${i}
done

paths=${args}
echo ${paths}
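A side note on the script above, offered as a hedged sketch rather than a confirmed fix. Oozie's <capture-output/> reads the script's stdout as Java properties, so wf:actionData('start_fair_usage')['paths'] can only resolve if a line of the form paths=... is printed. In bash, ${args} also expands to the first array element only. The helper below, emit_paths (a hypothetical name), joins all paths and prints them as one key=value line. The comma separator is an assumption, and since each <arg> of a distcp action carries a single argument, this alone may not be enough to make test_copy work.

```shell
# Hypothetical helper: print all given paths as a single key=value line,
# which is the form <capture-output/> expects on stdout.
emit_paths() {
    # "$*" joins the arguments with the first character of IFS;
    # the comma separator is an assumption, not something Oozie requires.
    joined=$(IFS=,; echo "$*")
    echo "paths=${joined}"
}

# Example call with made-up file names (not real data).
# Inside copy_file.sh this would be: emit_paths "${args[@]}"
emit_paths /user/comverse/data/20170404_b/a_mta.gz /user/comverse/data/20170404_b/b_mta.gz
```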

Here is what I did in the end.

  <start to="start_copy"/>

  <fork name="start_copy">
    <path start="copy_mta"/>
    <path start="copy_rcr"/>
    <path start="copy_sub"/>
  </fork>

  <action name="copy_mta">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <prepare>
        <delete path="${name_node}${dstfolder}mta/*"/>
      </prepare>
      <arg>${name_node}${srcfolder}/*mta.gz</arg>
      <arg>${name_node}${dstfolder}mta/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
  </action>

  <action name="copy_rcr">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <prepare>
        <delete path="${name_node}${dstfolder}rcr/*"/>
      </prepare>
      <arg>${name_node}${srcfolder}/*rcr.gz</arg>
      <arg>${name_node}${dstfolder}rcr/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
  </action>

  <action name="copy_sub">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <prepare>
        <delete path="${name_node}${dstfolder}sub/*"/>
      </prepare>
      <arg>${name_node}${srcfolder}/*sub.gz</arg>
      <arg>${name_node}${dstfolder}sub/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
  </action>

  <join name="end_copy" to="end"/>

  <kill name="kill">
    <message>action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <end name="end"/>

It turned out that it is possible to use wildcards in distcp, so I didn't need bash at all.
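For trying the same wildcard copies by hand before putting them into a workflow, the sketch below prints one hadoop distcp command per file type. It is a dry run only: nothing is copied. The helper name distcp_cmd and the folder values are hypothetical stand-ins for ${srcfolder} and ${dstfolder}, whose real values are not shown here. The source pattern is printed inside single quotes so that the local shell does not try to expand the glob before distcp sees it.

```shell
# Hypothetical helper: print (not run) the distcp command for one file type.
# $1 = source folder, $2 = destination folder (with trailing slash), $3 = file type
distcp_cmd() {
    src_folder=$1
    dst_folder=$2
    file_type=$3
    # The glob is kept literal and wrapped in single quotes in the output.
    printf "hadoop distcp '%s' %s\n" "${src_folder}/*${file_type}.gz" "${dst_folder}${file_type}/"
}

# Folder values below are placeholders, not the real ones.
for t in mta rcr sub; do
    distcp_cmd /user/comverse/data/20170404_b /tmp/example_dst/ "$t"
done
```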

Also, people advised me to write it in Scala.

import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path, FileUtil}

val conf = new Configuration()
val fs = FileSystem.get(conf)

val listOfFileTypes = List("mta", "rcr", "sub")
val listOfPlatforms = List("b", "c", "h", "m", "y")

for (fileType <- listOfFileTypes) {
  // Empty the destination folder for this file type before copying.
  // Note: fullyDeleteContents takes a java.io.File, i.e. a local file system path.
  FileUtil.fullyDeleteContents(new File("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType))
  for (platform <- listOfPlatforms) {
    // Expand the wildcard into the matching source files for this platform.
    val srcPaths = fs.globStatus(new Path("/user/comverse/data/" + "20170404" + "_" + platform + "/*" + fileType + ".gz"))
    val dstPath = new Path("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType)

    for (srcPath <- srcPaths) {
      println("copying " + srcPath.getPath.toString)
      // deleteSource = false: the source files are kept.
      FileUtil.copy(fs, srcPath.getPath, fs, dstPath, false, conf)
    }
  }
}

Both approaches work, though I haven't tried to run the Scala script in Oozie.

