Set up Hadoop on macOS
Hadoop installation
Hadoop will run in single-node mode on one machine. The installation provides full functionality for coding practice, although it does not offer the performance of a cluster.
Download Hadoop on macOS
- Install brew with the following command:
    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
- Install java with the following command:
    brew install Caskroom/cask/java
- Install hadoop with the following command:
    brew install hadoop
  Hadoop is then located in the directory /usr/local/Cellar/hadoop/2.7.1/, where 2.7.1 is the version number of my installation.
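Before moving on, it can help to confirm that the three tools are reachable from the shell. The following is a minimal sketch; `check_tool` is a hypothetical helper name, not part of Homebrew or Hadoop.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Report on each tool installed in the steps above.
for tool in brew java hadoop; do
  check_tool "$tool"
done
```

If hadoop is reported as found, `hadoop version` prints the installed version, which should match the directory name under /usr/local/Cellar/hadoop/.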
Hadoop configuration
For Hadoop configuration, we need to modify the following files on the macOS system:
- /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hadoop-env.sh
- /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml
- /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/mapred-site.xml
- /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hdfs-site.xml
- ~/.profile
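Because the version number appears in every path, one option is to hold it in a variable and check that the files are where this guide expects them. This is a minimal sketch, assuming the Homebrew layout shown above; `HADOOP_VERSION` and `HADOOP_ETC` are illustrative names chosen here.

```shell
# Illustrative variable names; adjust the version to match your installation.
HADOOP_VERSION="2.7.1"
HADOOP_ETC="/usr/local/Cellar/hadoop/${HADOOP_VERSION}/libexec/etc/hadoop"

# List the Hadoop files this section edits and flag any that are not present.
for f in hadoop-env.sh core-site.xml mapred-site.xml hdfs-site.xml; do
  if [ -f "${HADOOP_ETC}/${f}" ]; then
    echo "ok       ${HADOOP_ETC}/${f}"
  else
    echo "missing  ${HADOOP_ETC}/${f}"
  fi
done
```

If mapred-site.xml is reported as missing, the installation may ship only mapred-site.xml.template; copying the template to mapred-site.xml is the usual first step in that case.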
- Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hadoop-env.sh
  - Find the following line in the file:
      export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
  - Change it to:
      export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
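- Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/core-site.xml
  - Find the <configuration></configuration> block in the file.
  - Replace it with the following content:
      <configuration>
        <property>
          <name>hadoop.tmp.dir</name>
          <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
          <description>A base for other temporary directories.</description>
        </property>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
        </property>
      </configuration>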
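- Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/mapred-site.xml
  - Find the <configuration></configuration> block in the file.
  - Replace it with the following block:
      <configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>localhost:9010</value>
        </property>
      </configuration>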
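- Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hdfs-site.xml
  - Find the <configuration></configuration> block in the file.
  - Replace it with the following block:
      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
      </configuration>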
- Modify ~/.profile
  - Add the following aliases to ~/.profile:
      alias hstart="/usr/local/Cellar/hadoop/2.7.1/sbin/start-dfs.sh;/usr/local/Cellar/hadoop/2.7.1/sbin/start-yarn.sh"
      alias hstop="/usr/local/Cellar/hadoop/2.7.1/sbin/stop-yarn.sh;/usr/local/Cellar/hadoop/2.7.1/sbin/stop-dfs.sh"
  - Reload the file with the following command:
      source ~/.profile
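The file edits in this section can be rehearsed on scratch copies before touching the real installation. The sketch below applies the hadoop-env.sh change with sed and then checks that a property is present in an XML file. The scratch directory and the `has_property` helper are illustrative and not part of Hadoop.

```shell
# Work in a scratch directory so the real configuration is untouched.
scratch="$(mktemp -d)"

# 1. hadoop-env.sh: start from the original line and append the two empty
#    Kerberos settings inside the closing quote.
printf '%s\n' 'export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"' \
  > "$scratch/hadoop-env.sh"
sed 's|preferIPv4Stack=true"|preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="|' \
  "$scratch/hadoop-env.sh" > "$scratch/hadoop-env.sh.new"
cat "$scratch/hadoop-env.sh.new"

# 2. hdfs-site.xml: write the replacement block from this section.
cat > "$scratch/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# has_property is a hypothetical helper: succeeds if FILE contains <name>NAME</name>.
has_property() {
  grep -qF "<name>$2</name>" "$1"
}

if has_property "$scratch/hdfs-site.xml" "dfs.replication"; then
  echo "dfs.replication is set"
fi
```

On a real installation, `hdfs getconf -confKey dfs.replication` reads the effective value back from the edited files.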
System configuration
- Format the Hadoop file system with hdfs namenode -format.
- Enable remote login from System Preferences > Sharing > Remote Login.
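Formatting initializes the NameNode's metadata directory, so repeating it on an existing installation discards the file system's contents. The guarded sketch below assumes the NameNode directory sits at its default location, `dfs/name` inside the `hadoop.tmp.dir` path configured earlier; `NAME_DIR` and `needs_format` are hypothetical names.

```shell
# Assumed default: the NameNode directory resolves to ${hadoop.tmp.dir}/dfs/name.
NAME_DIR="/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name"

# needs_format is a hypothetical helper: succeeds when DIR has no "current"
# subdirectory, which is where a formatted NameNode keeps its metadata.
needs_format() {
  [ ! -d "$1/current" ]
}

if needs_format "$NAME_DIR"; then
  echo "not formatted yet: run 'hdfs namenode -format'"
else
  echo "already formatted: ${NAME_DIR}"
fi
```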
Start Hadoop
- Start Hadoop with the command hstart.
- Stop Hadoop with the command hstop.
- Generate an SSH key to allow password-free local access; otherwise the start and stop scripts ask for the login password.
- The running status of Hadoop can be checked by browsing to the following addresses:
  - NameNode: http://localhost:50070
  - ResourceManager: http://localhost:8088
  - Specific node information (NodeManager): http://localhost:8042
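The SSH key step above can be sketched as follows. The commands follow the usual passphrase-less key setup; `setup_local_ssh` and its directory argument are hypothetical, introduced here so the sketch can be tried on a scratch directory before touching ~/.ssh.

```shell
# setup_local_ssh is a hypothetical helper. DIR defaults to ~/.ssh.
setup_local_ssh() {
  dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir" && chmod 700 "$dir"

  # Create a key with an empty passphrase unless one already exists.
  if [ ! -f "$dir/id_rsa" ]; then
    ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"
  fi

  # Authorize the key for logins to this machine (skip if already listed).
  touch "$dir/authorized_keys"
  if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  fi
  chmod 600 "$dir/authorized_keys"
}

# Example: setup_local_ssh            # uses ~/.ssh
```

With Remote Login enabled, `ssh localhost` should then log in without asking for a password.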