Problem : How do I build my own version of Hadoop with my custom patch?
Solution : Apply the patch and build Hadoop.
You will need : the Hadoop source code, your custom patch, Java 6, Apache Ant, Java 5 (for generating documentation), and Apache Forrest (for generating documentation).
Check out the Hadoop source code (the -m message flag belongs to svn commit, not checkout):
> svn co https://svn.apache.org/repos/asf/hadoop/common/tags/release-X.Y.Z-rcR
Apply your patch to verify its functionality using the following command:
> patch -p0 -E < ~/Path/To/Patch.patch
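Before touching the real source tree, it can help to rehearse the same workflow on a throwaway file; patch's --dry-run flag reports whether a patch would apply cleanly without modifying anything. A minimal sketch (the file names here are made up for the demo, not part of the Hadoop tree):

```shell
# Work in a scratch directory so nothing real is touched.
mkdir -p /tmp/patch-demo && cd /tmp/patch-demo
printf 'hello\n' > hello.txt
printf 'hello world\n' > hello.new

# diff exits non-zero when files differ, so tolerate that when generating the patch.
diff -u hello.txt hello.new > demo.patch || true

# --dry-run: check that the patch applies cleanly, without changing the file.
patch -p0 --dry-run hello.txt < demo.patch

# Apply it for real.
patch -p0 hello.txt < demo.patch
cat hello.txt   # now reads "hello world"
```

The same --dry-run check works against the Hadoop tree before you run the real `patch -p0 -E` command above.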
Test and compile the source code with the patch applied, using Ant:
> ant -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home/ -Dforrest.home=/Path/to/forrest/apache-forrest-0.8 -Dfindbugs.home=/Path/to/findbugs/latest compile-core tar
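To actually exercise the patch, you can also run the unit tests from the same build file. A sketch assuming the Ant-based Hadoop build of this era, which exposes a test-core target and a testcase property ("TestMyPatch" is a placeholder class name, not a real test):

```shell
# Run the full core test suite (this can take a long time):
ant test-core

# Or run only the test class your patch touches:
ant -Dtestcase=TestMyPatch test-core
```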
To build the documentation:
> ant -Dforrest.home=$FORREST_HOME -Djava5.home=$JAVA5 docs
Problem : You have multiple Hadoop clusters running and you want to transfer several terabytes of data from one cluster to another.
Solution : DistCp (distributed copy).
Hadoop clusters are commonly loaded with terabytes of data (not every cluster reaches petabyte scale 🙂 ), and transferring that much data from one cluster to another serially would take forever. Copying the data in a distributed, parallel fashion is the practical solution, and that is exactly what DistCp does: it runs a MapReduce job to transfer your data from one cluster to another.
To transfer data using DistCp, specify the HDFS paths of the source and destination, as shown below:
bash$ hadoop distcp hdfs://nn1:8020/foo/bar \
          hdfs://nn2:8020/bar/foo
You can also specify multiple source directories on the command line:
bash$ hadoop distcp hdfs://nn1:8020/foo/a \
          hdfs://nn1:8020/foo/b \
          hdfs://nn2:8020/bar/foo
Or, equivalently, from a file using the -f option:
bash$ hadoop distcp -f hdfs://nn1:8020/srclist \
          hdfs://nn2:8020/bar/foo
Where srclist contains:
hdfs://nn1:8020/foo/a
hdfs://nn1:8020/foo/b
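Because DistCp runs as a MapReduce job, the degree of copy parallelism can be tuned. A sketch using DistCp's -m (number of maps) and -update flags (the namenode addresses nn1/nn2 are illustrative, matching the examples above):

```shell
# Copy with at most 20 map tasks; -update skips files that already
# exist at the destination with the same size.
hadoop distcp -m 20 -update \
    hdfs://nn1:8020/foo/bar \
    hdfs://nn2:8020/bar/foo
```

More maps means more parallel transfers, but each map consumes a task slot on the cluster, so tune -m against your cluster's capacity.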
See the DistCp guide in the Hadoop documentation to learn more about DistCp.