Setup Hadoop on Macos

Table of content

Hadoop installation

In particular, Hadoop will run in a standalone mode. The installation will allow full functionalities for coding practice although it does not provide cluster performance.

Download Hadoop on Macos

  1. Install brew with the following command

    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

  2. Install java with the following command

    brew install Caskroom/cask/java

  3. Install hadoop witht the following command

    brew install hadoop

    Hadoop is the located in the directory /usr/local/Cellar/hadoop/2.7.1/ in which my current verion number is 2.7.1

Hadoop configuration

For Hadoop configuration, we need to modify the following files in the Macos system

  • /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hadoop-env.sh
  • /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/Core-site.xml
  • /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/mapred-site.xml
  • /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hdfs-site.xml
  • ~/.profile
  1. Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hadoop-env.sh

    1. Find the following line from the file

      export HADOOP_OPTS=“$HADOOP_OPTS -Djava.net.preferIPv4Stack=true”

    2. Change it into

      export HADOOP_OPTS=“$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc=”

  2. Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/Core-site.xml

    1. Find the block for configuration

    2. Replace with the following content

      hadoop.tmp.dir /usr/local/Cellar/hadoop/hdfs/tmp A base for other temporary directories. fs.default.name hdfs://localhost:9000
  3. Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/mapred-site.xml

    1. Find the block for configuration in the file

    2. Replace it with following block

      mapred.job.tracker localhost:9010
  4. Modify /usr/local/Cellar/hadoop/2.7.1/libexec/etc/hadoop/hdfs-site.xml

    1. Find the block for configuration in the file

    2. Replace it with following block

      dfs.replication 1
  5. Modify ~/.profile

    1. Add alias in ~/.profile

      alias hstart="/usr/local/Cellar/hadoop/2.7.1/sbin/start-dfs.sh; /usr/local/Cellar/hadoop/2.7.1/sbin/start-yarn.sh" alias hstop="/usr/local/Cellar/hadoop/2.7.1/sbin/stop-yarn.sh; /usr/local/Cellar/hadoop/2.7.1/sbin/stop-dfs.sh"

    2. Update the file with the following command

      source ~/.profile

System configuration

  1. Format the Hadoop file system with hdfs namenode -format.
  2. Enable the functionality of remote login from system preference->sharing->remove login.

Start Hadoop

  1. Start Hadoop with the command hstart.
  2. Stop Hadoop with the command hstop.
  3. Generate SSH key to allow pass-free local access.
  4. The running status of Hadoop can be checked by browsing the following address
    1. Resource Manager: http://localhost:50070
    2. JobTracker: http://localhost:8088
    3. Specific Node Information: http://localhost:8042