Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 2

In this part of the blog series, we show how Ansible helps us configure our cluster and install all the prerequisites for Cloudera Manager. Ansible is an agentless automation tool that configures hosts over SSH.

Ansible will help us to configure all nodes for a manual installation using Cloudera Manager. Our playbook will contain the following roles:

  • cm_repo: adds the same Cloudera Manager (C.M) repository to all nodes.
  • os_config: adjusts the OS parameters required to install a Cloudera cluster.
  • java: installs Java JDK 1.7.0_80.
  • cm_agents: installs the C.M agent packages.
  • mariadb: installs MariaDB. C.M needs an Oracle, MySQL (MariaDB) or PostgreSQL database to store the Cloudera Manager metadata and the Hive metastore.
  • mysql_connector: installs the MySQL connector used to connect to MariaDB.
  • scm: installs and starts the Cloudera Manager Server.

In a Big Data cluster, we split the nodes into roles:

  • Manager: dedicated node for all Cloudera Manager daemons
  • Master: NameNode daemon + Secondary NameNode daemon
  • Workers: DataNode daemons

The first step is to define the Ansible hosts inventory file. Below is my inventory file.

manager ansible_host=<manager_ip> id=6

manager  ansible_host=<manager_ip> id=6

master ansible_host=<master_ip>  id=5

worker1 ansible_host=<worker1>  id=2
worker2 ansible_host=<worker2>  id=3
worker3 ansible_host=<worker3>  id=4
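
The variable files and the playbook later in this post refer to the host groups db_server, cdh_manager and cdh_servers, so the inventory has to place the hosts into groups. As a minimal sketch, the same hosts could be written in Ansible's YAML inventory format as shown here. The file name, the group names cdh_master and cdh_worker, the choice to build cdh_servers from child groups, and the placement of the database on the manager host (inferred from the manager line appearing twice) are assumptions made for illustration.

```yaml
# inventory.yml (hypothetical file name): grouped version of the hosts above
all:
  children:
    db_server:                # host running MariaDB (assumed to be the manager)
      hosts:
        manager:
    cdh_manager:              # host running Cloudera Manager Server
      hosts:
        manager:
          ansible_host: "<manager_ip>"
          id: 6
    cdh_master:               # assumed group name
      hosts:
        master:
          ansible_host: "<master_ip>"
          id: 5
    cdh_worker:               # assumed group name
      hosts:
        worker1:
          ansible_host: "<worker1>"
          id: 2
        worker2:
          ansible_host: "<worker2>"
          id: 3
        worker3:
          ansible_host: "<worker3>"
          id: 4
    cdh_servers:              # every node of the cluster
      children:
        cdh_manager:
        cdh_master:
        cdh_worker:
```

With groups defined this way, expressions such as groups['db_server'][0] and groups['cdh_manager'][0] both resolve to manager.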



We will now define all the variables needed by our roles. The variables are split into one file per group of hosts.

Below is an example of the variable definitions for the CDH server instances, cdh_servers.yml:


db_hostname: "{{ hostvars[groups['db_server'][0]]['inventory_hostname'] }}"
scm_hostname: "{{ hostvars[groups['cdh_manager'][0]]['inventory_hostname'] }}"

cdh_version: 5.14.2
cluster_display_name: cluster_1

# Users and Groups (lists consumed by the os_config role)
group:
  - dbi
user:
  - dbi

# Java variables
java_download_folder: /usr/java
java_name: "{{java_download_folder}}/jdk1.7.0_80"
java_archive: "{{java_download_folder}}/jdk-7u80-linux-x64.tar.gz"

# Mysql Java connector
mysql_java: mysql-connector-java-5.1.46
mysql_java_download_folder: /usr/share/mysql-java/
mysql_java_archive: "{{ mysql_java_download_folder }}/{{ mysql_java }}.tar.gz"

mysql_java_jar: /usr/share/java/mysql-connector-java.jar

Similar files are created for the database server variables (db_server.yml) and the Cloudera Manager server variables (scm_server.yml).
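
The content of db_server.yml is not shown in this post. The scm role further down refers to databases.scm.user and databases.scm.pass, so a minimal sketch of that file, assuming a nested databases dictionary, might look like this. The name key and all values are placeholders and not taken from the project.

```yaml
# group_vars/db_server.yml (sketch; every value below is a placeholder)
databases:
  scm:
    name: scm                 # assumed key, not referenced in this post
    user: scm                 # read as databases.scm.user
    pass: "<scm_password>"    # read as databases.scm.pass
```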

After the variables definition, we can start creating the different roles and their associated tasks.

Cloudera Manager repo

The goal of this role is to add the same C.M repo to all cluster hosts. We will use a template of the repository file.


# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name=Cloudera Manager
gpgcheck = 1


- name: Add Cloudera repo
  template:
    src: ../templates/cloudera-manager.repo.j2
    dest: "/etc/yum.repos.d/cloudera-manager{{cdh_version}}.repo"

The Cloudera Manager version was defined earlier in the cdh_servers.yml variable file.
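
As an alternative to a template file, the same repository could be declared with Ansible's yum_repository module. This is only a sketch and not the approach used in this project. The repository id and both URLs are assumptions; check them against the repository you actually have access to.

```yaml
- name: Add Cloudera repo (alternative using yum_repository)
  yum_repository:
    name: cloudera-manager            # assumed repository id
    description: Cloudera Manager
    # Assumed URL layout for Cloudera Manager 5 on RedHat/CentOS 7
    baseurl: "https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/{{ cdh_version }}/"
    gpgkey: "https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/RPM-GPG-KEY-cloudera"
    gpgcheck: yes
    # Written to /etc/yum.repos.d/cloudera-manager<version>.repo
    file: "cloudera-manager{{ cdh_version }}"
```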

OS Configuration

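Some requirements must be met before installing a Cloudera cluster. This role configures all hosts accordingly: it creates the user and group, builds the /etc/hosts file, disables transparent huge pages, sets vm.swappiness to 1, prepares the /data volume, installs ntp, nscd and wget, and disables SELinux.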

- name: Create groups
  group:
    name: "{{item}}"
    state: present
  with_items: "{{group}}"

- name: Create user
  user:
    name: "{{item}}"
    shell: /bin/bash
    uid: 1050
    groups: "{{group}}"
  with_items: "{{user}}"

# One /etc/hosts entry per inventory host whose IPv4 address is known
- name: "Build hosts file"
  lineinfile:
    dest: /etc/hosts
    regexp: '.*{{ item }}$'
    line: "{{ hostvars[item]['ansible_default_ipv4']['address'] }} {{item}}"
    state: present
  when: hostvars[item]['ansible_default_ipv4']['address'] is defined
  with_items: '{{groups.all}}'

- name: Disable transparent huge page - defrag
  shell: echo "never" > /sys/kernel/mm/transparent_hugepage/defrag

- name: Disable transparent huge page - enabled
  shell: echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

# Applies the value to the running kernel
- name: VM swappiness - 1
  shell: echo "1" > /proc/sys/vm/swappiness

# Persists the value through sysctl
- name: Set VM swappiness - 2
  sysctl:
    name: vm.swappiness
    value: 1
    state: present

- name: Create /data dir
  file:
    path: /data
    state: directory
    mode: "0775"
    owner: dbi
    group: dbi

- name: Create file system on volume
  filesystem:
    fstype: ext4
    dev: /dev/xvdb

- name: Mount volume as /data
  mount:
    name: /data
    src: /dev/xvdb
    fstype: ext4
    opts: defaults,noatime
    state: mounted

- name: install the latest version of ntp
  yum:
    name: ntp
    state: latest

- name: install the latest version of nscd
  yum:
    name: nscd
    state: latest

- name: install wget
  yum:
    name: wget
    state: latest

- name: Disable SELinux
  selinux:
    state: disabled

- name: Reboot for SELinux if needed
  command: /sbin/shutdown -r +1
  async: 0
  poll: 0
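
Values written with echo into /sys are lost at the next boot, and this role schedules a reboot at the end. One possible way to keep the transparent huge page settings, sketched here under the assumption that the hosts run CentOS 7 and still execute /etc/rc.d/rc.local at boot, is to append the same commands to that file. This task is not part of the original role.

```yaml
- name: Persist transparent huge page settings across reboots
  lineinfile:
    path: /etc/rc.d/rc.local
    line: "{{ item }}"
    create: yes
    mode: "0755"        # rc.local only runs at boot if it is executable
  with_items:
    - 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'
    - 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
```

The swappiness value does not need the same treatment, because the sysctl task already stores it persistently.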

Java installation

The Java installation is one of the most complex parts of the installation. First, we need to choose a supported version of the JDK. Then we need to be sure that Java has been installed properly on all hosts. The installation tasks are split into the following steps:

  • Create the installation directories: /usr/share/java and /usr/java
  • Download Java JDK 1.7.0_80, which is a supported version for Cloudera Manager
  • Unarchive the Java JDK
  • Fix ownership
  • Make Java available to the system with alternatives
  • Clean up the installation download folder
  • Add the Java home path by exporting the $JAVA_HOME variable

Below are the Java installation tasks.

- name: Create directories
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - "{{ java_download_folder }}"
    - "/usr/share/java"

- name: Creates directory
  file:
    path: "{{ java_download_folder }}"
    state: directory

- name: Download Java
  get_url:
    url: "{{ java_download_url }}"
    dest: "{{ java_archive }}"
    # Recent Ansible releases expect the headers as a dictionary
    headers:
      Cookie: "oraclelicense=accept-securebackup-cookie"
    validate_certs: no

- name: Unarchive Java archive
  unarchive:
    src: "{{ java_archive }}"
    dest: "{{ java_download_folder }}"
    # The archive is already on the remote host (replaces the deprecated "copy: no")
    remote_src: yes

- name: Fix ownership
  file:
    state: directory
    path: "{{ java_name }}"
    owner: root
    group: root
    recurse: yes

- name: Make Java available for system with alternatives
  command: 'alternatives --install "/usr/bin/java" "java" "{{java_name}}/bin/java" 2'

- name: Clean up Java download
  file:
    state: absent
    path: "{{java_archive}}"

# blockinfile has no regexp option, so that line is dropped; left inside the
# block it would have been written literally into /etc/profile
- name: Add java home path
  blockinfile:
    dest: /etc/profile
    block: |
      export JAVA_HOME=/usr/java/jdk1.7.0_80
      export PATH=$JAVA_HOME/bin:$PATH
    state: present
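
Since we need to be sure that Java has been installed properly on all hosts, a short verification step can be appended to the role. The two tasks here are a sketch and not part of the original project. They assume that /usr/bin/java is the link created by the alternatives task, and the registered variable name is hypothetical.

```yaml
- name: Check the installed Java version
  command: /usr/bin/java -version
  register: java_version_result      # hypothetical variable name
  changed_when: false                # read-only check, never reports a change

- name: Fail early if the JDK is not the expected one
  assert:
    that:
      # "java -version" prints its output on stderr
      - "'1.7.0_80' in java_version_result.stderr"
    msg: "Unexpected Java version on {{ inventory_hostname }}"
```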

MariaDB installation

After installing Java, we can start the installation and configuration of the MariaDB database. You can find the entire role for the MariaDB installation here.

MySQL connector

The MySQL connector installation follows approximately the same steps as the Java installation. All details are available here.

Cloudera Manager Server installation

The last role of this playbook installs the Cloudera Manager server. It installs the two following packages on the cdh_manager host, then starts the Cloudera Manager server and agent services:

  • cloudera-manager-daemons
  • cloudera-manager-server

- include_vars: ../../../group_vars/db_server.yml

- name: Install the Cloudera Manager Server Packages
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - cloudera-manager-daemons
    - cloudera-manager-server

# - name: Prepare Cloudera Manager Server External Database
#   command: /usr/share/cmf/schema/
#              -f
#              --host {{ hostvars[db_hostname]['inventory_hostname'] }}
#              mysql {{ }} {{ databases.scm.user }} {{ databases.scm.pass }}
#   changed_when: False

- name: Start the Cloudera Manager Server
  service:
    name: "{{ item }}"
    state: restarted
    enabled: yes
  notify:
    - wait cloudera-scm-server
  with_items:
    - cloudera-scm-server
    - cloudera-scm-agent

# Trigger handler to wait for SCM to startup
- meta: flush_handlers
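
The comment above mentions a handler that waits for SCM to start, and the name wait cloudera-scm-server in the task list appears to be that handler. Its definition is not shown in this post. A minimal sketch, assuming it simply waits for the web interface port 7180 to open on the manager host, could look like this. The file location, delay and timeout are assumptions.

```yaml
# roles/scm/handlers/main.yml (hypothetical location)
- name: wait cloudera-scm-server
  wait_for:
    port: 7180          # default Cloudera Manager web UI port
    delay: 5            # seconds to wait before the first check
    timeout: 300        # give up after five minutes
    state: started
```

Because wait_for checks 127.0.0.1 by default and runs on the target host, the play only continues once the server is listening locally.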



After creating all the roles, we need to define our site.yml so that all tasks run in the desired order.

# Cloudera playbook
- name: Configure Cloudera Manager Repository
  become: ansible_become
  hosts: cdh_servers
  roles:
    - cm_repo
  tags: cm_repo

- name: Configure Epel repository
  become: ansible_become
  hosts: cdh_servers
  roles:
    - epel
  tags: epel_repo

- name: OS Configuration
  become: ansible_become
  hosts: cdh_servers
  roles:
    - os_config
  tags: os_config

- name: Install Java JDK 7
  become: ansible_become
  hosts: cdh_servers
  roles:
    - java
  tags: java

- name: Install MySQL Java Connector
  become: ansible_become
  hosts: cdh_servers
  roles:
    - mysql_connector
  tags: mysql_java_connector

- name: Install MariaDB and create databases
  hosts: db_server
  roles:
    - mariadb
  tags: mysql

# ##############
- name: Install Cloudera Manager Agents
  hosts: cdh_servers
  roles:
    - cm_agents
  tags: cm_agents

- name: Install Cloudera Manager Server
  hosts: cdh_manager
  roles:
    - scm
  tags: cluster_template


Once all steps have finished, you can access the Cloudera Manager web interface on the manager node at http://<manager_ip>:7180.

Make sure your network configuration allows access to the Cloudera Manager web UI through the default port 7180.

The entire project with all files is available here.