bash - Copying files from an HDFS directory to another with the Oozie distcp action
In /user/comverse/data/${1}_b I have a lot of different files, and I want to copy some of them to ${name_node}/user/evkuzmin/output. To do that, I try to pass the paths from copy_files.sh, which holds an array of the paths to the files I need.

start_fair_usage ends with status OK, but test_copy returns:

Main class [org.apache.oozie.action.hadoop.DistcpMain], main() threw exception, null

My actions:
```xml
<action name="start_fair_usage">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <exec>${copy_file}</exec>
        <argument>${today_without_dash}</argument>
        <argument>${mta}</argument>
        <!-- <file>${path}#${start_fair_usage}</file> -->
        <file>${path}${copy_file}#${copy_file}</file>
        <capture-output/>
    </shell>
    <ok to="test_copy"/>
    <error to="kill"/>
</action>

<action name="test_copy">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${job_tracker}</job-tracker>
        <name-node>${name_node}</name-node>
        <arg>${wf:actionData('start_fair_usage')['paths']}</arg>
        <!-- <arg>${name_node}/user/evkuzmin/input/*</arg> -->
        <arg>${name_node}/user/evkuzmin/output</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

start_fair_usage starts copy_file.sh:
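```bash
#!/bin/bash
# $1 = date without dashes, $2 = file type (e.g. mta)
echo "${1}" >&2
echo "${2}" >&2
dirs=( "/user/comverse/data/${1}_b" )
args=()
for i in $(hadoop fs -ls "${dirs[@]}" | grep -E "${2}\.gz" | awk -F " " '{print $8}'); do
    args+=("$i")
    echo "copy file - ${i}" >&2
done
# With <capture-output/>, Oozie reads stdout as key=value lines,
# so print the key together with all array elements
echo "paths=${args[*]}"
```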
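The `null` in the error is consistent with `wf:actionData('start_fair_usage')['paths']` finding no `paths` key. With `<capture-output/>`, Oozie reads the shell action's stdout in Java properties format (`key=value` lines). A script that only assigns `paths=${args}` and then runs `echo ${paths}` prints a bare value with no key, and `${args}` without an index expands to the first array element only. The local sketch below shows both effects; the file names are hypothetical stand-ins for `hadoop fs -ls` output, and `lookup_key` is a hypothetical helper that only roughly approximates a properties lookup (it ignores escapes, line continuations and `:` separators).

```bash
#!/usr/bin/env bash
# Hypothetical paths standing in for the `hadoop fs -ls` results.
args=("/user/comverse/data/20170404_b/x_mta.gz" "/user/comverse/data/20170404_b/y_mta.gz")

# Hypothetical helper: print the value of the first "key=value" line for key $1
# read from stdin (a rough stand-in for a Java properties lookup).
lookup_key() {
  local line
  while IFS= read -r line; do
    if [[ $line == "$1="* ]]; then
      printf '%s\n' "${line#"$1="}"
      return 0
    fi
  done
  return 1
}

# Assigning without an index keeps only the first element,
# and echoing the bare value produces no "paths=" key.
paths=${args}
original_output=$(echo "copy file - ${args[0]}"; echo ${paths})

# Debug line goes to stderr; stdout carries one key=value line
# with every element joined by spaces.
fixed_output=$(echo "copy file - ${args[0]}" >&2; echo "paths=${args[*]}")

echo "first element only: ${paths}"
lookup_key paths <<< "$original_output" || echo "original: no 'paths' key"
echo "fixed: $(lookup_key paths <<< "$fixed_output")"
```

Even with a proper `paths=` line, the EL result is handed to distcp as a single `<arg>`, so a space-separated list may still not be split into separate source paths.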
Here is what I did in the end:
```xml
<start to="start_copy"/>

<fork name="start_copy">
    <path start="copy_mta"/>
    <path start="copy_rcr"/>
    <path start="copy_sub"/>
</fork>

<action name="copy_mta">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <prepare>
            <delete path="${name_node}${dstfolder}mta/*"/>
        </prepare>
        <arg>${name_node}${srcfolder}/*mta.gz</arg>
        <arg>${name_node}${dstfolder}mta/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
</action>

<action name="copy_rcr">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <prepare>
            <delete path="${name_node}${dstfolder}rcr/*"/>
        </prepare>
        <arg>${name_node}${srcfolder}/*rcr.gz</arg>
        <arg>${name_node}${dstfolder}rcr/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
</action>

<action name="copy_sub">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <prepare>
            <delete path="${name_node}${dstfolder}sub/*"/>
        </prepare>
        <arg>${name_node}${srcfolder}/*sub.gz</arg>
        <arg>${name_node}${dstfolder}sub/</arg>
    </distcp>
    <ok to="end_copy"/>
    <error to="kill"/>
</action>

<join name="end_copy" to="end"/>

<kill name="kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>

<end name="end"/>
```

It turned out that distcp accepts wildcards in the source path, so I didn't need bash at all.
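In these actions the source argument is a pattern such as `${name_node}${srcfolder}/*mta.gz`, and the wildcard is expanded on the Hadoop side, not by a shell. To preview which file names a pattern like `*mta.gz` selects, bash pattern matching can serve as a rough local stand-in. This is only an approximation of Hadoop's glob syntax, it matches bare file names rather than full paths, and the file names below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical file names from one source directory.
files=(
  "20170404_b_001_mta.gz"
  "20170404_b_001_rcr.gz"
  "20170404_b_002_mta.gz"
  "20170404_b_001_sub.gz"
  "20170404_b_001_mta.tmp"
)

# Print the names matching a shell-style pattern such as '*mta.gz'.
# Callers quote the pattern so the local shell does not expand it
# against the current directory.
matching() {
  local pattern=$1 name
  shift
  for name in "$@"; do
    # An unquoted $pattern on the right side of == is treated as a glob.
    if [[ $name == $pattern ]]; then
      printf '%s\n' "$name"
    fi
  done
}

for type in mta rcr sub; do
  echo "*${type}.gz:"
  matching "*${type}.gz" "${files[@]}"
done
```

On the cluster itself, `hadoop fs -ls '/user/comverse/data/20170404_b/*mta.gz'` (pattern in single quotes so the local shell leaves it alone) lists what HDFS actually matches.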
Also, people advised me to write it in Scala:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, FileUtil, Path}

val conf = new Configuration()
val fs = FileSystem.get(conf)
val listOfFileTypes = List("mta", "rcr", "sub")
val listOfPlatforms = List("b", "c", "h", "m", "y")

for (fileType <- listOfFileTypes) {
  val dstPath = new Path("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType)
  // Empty the destination on HDFS. FileUtil.fullyDeleteContents(new File(...))
  // works on the local filesystem only, so delete through fs instead.
  if (fs.exists(dstPath)) fs.listStatus(dstPath).foreach(s => fs.delete(s.getPath, true))
  else fs.mkdirs(dstPath)

  for (platform <- listOfPlatforms) {
    // e.g. /user/comverse/data/20170404_b/*mta.gz
    val srcPaths = fs.globStatus(new Path("/user/comverse/data/" + "20170404" + "_" + platform + "/*" + fileType + ".gz"))
    // globStatus can return null instead of an empty array
    for (srcPath <- Option(srcPaths).getOrElse(Array.empty[FileStatus])) {
      println("copying " + srcPath.getPath.toString)
      FileUtil.copy(fs, srcPath.getPath, fs, dstPath, false, conf)
    }
  }
}
```

Both approaches work, though I haven't tried running the Scala script in Oozie.
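The Scala loop copies, for each file type, the matching `.gz` files from every platform directory into one `file_<type>` folder. Since `hadoop distcp` accepts several source paths (globs included) followed by a single target, the same copy could plausibly be expressed as one distcp call per file type. The dry-run sketch below only prints those commands; the date `20170404` and the directory layout come from the Scala script, and `build_distcp_cmd` is a hypothetical helper. Unlike the Scala version it does not empty the destination first (in the Oozie workflow, `<prepare><delete>` covers that), and a pattern that matches nothing may make distcp fail, whereas the Scala loop simply skips it.

```bash
#!/usr/bin/env bash
# Values taken from the Scala script; adjust as needed.
day="20170404"
platforms=(b c h m y)
file_types=(mta rcr sub)
dst_root="/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin"

# Hypothetical helper: print one distcp command for a file type,
# with one quoted glob per platform and file_<type> as the target.
build_distcp_cmd() {
  local type=$1 platform
  local cmd="hadoop distcp"
  for platform in "${platforms[@]}"; do
    cmd+=" '/user/comverse/data/${day}_${platform}/*${type}.gz'"
  done
  cmd+=" ${dst_root}/file_${type}"
  printf '%s\n' "$cmd"
}

# Dry run: print the commands instead of executing them.
for type in "${file_types[@]}"; do
  build_distcp_cmd "$type"
done
```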