Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 2

In this part of the blog post series, we show how Ansible helps us configure our cluster and install all the prerequisites needed for Cloudera Manager. Ansible is currently one of the most widely used automation tools.

Ansible will configure all nodes for a manual installation using Cloudera Manager. Our playbook contains the following roles:

  • cm_repo: adds the same Cloudera Manager (C.M) repository to all nodes.
  • os_config: adjusts all OS parameters required for installing a Cloudera cluster.
  • java: installs Java JDK 1.7.0_80.
  • cm_agents: installs the C.M agent packages.
  • mariadb: installs MariaDB. C.M needs an Oracle, MySQL (MariaDB) or PostgreSQL database to store the Cloudera Manager metadata and the Hive metastore.
  • mysql_connector: installs the MySQL connector used to connect to MariaDB.
  • scm: installs and starts the Cloudera Manager Server.

In a Big Data cluster, we split the nodes into roles:

  • Manager: dedicated node for all Cloudera Manager daemons
  • Master: NameNode daemon + Secondary NameNode daemon
  • Workers: DataNode daemons

The first step is to define the Ansible hosts inventory file. Below is my inventory file.

manager ansible_host=<manager_ip> id=6

manager  ansible_host=<manager_ip> id=6

master ansible_host=<master_ip>  id=5

worker1 ansible_host=<worker1>  id=2
worker2 ansible_host=<worker2>  id=3
worker3 ansible_host=<worker3>  id=4
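
The same hosts can also be written as a YAML inventory, which makes the group membership explicit. The sketch below is only an illustration. The group names db_server, cdh_manager and cdh_servers are the ones referenced in the variable file and in site.yml; cdh_master and cdh_workers are hypothetical names chosen for this example, and placing the manager host in db_server is an assumption based on the manager line appearing twice above. The angle-bracket values are placeholders to fill in.

```yaml
# inventory.yml - hypothetical YAML form of the inventory above
all:
  children:
    db_server:
      hosts:
        manager:              # assumed: the database runs on the manager node
    cdh_manager:
      hosts:
        manager:
          ansible_host: "<manager_ip>"
          id: 6
    cdh_master:               # hypothetical group name
      hosts:
        master:
          ansible_host: "<master_ip>"
          id: 5
    cdh_workers:              # hypothetical group name
      hosts:
        worker1:
          ansible_host: "<worker1>"
          id: 2
        worker2:
          ansible_host: "<worker2>"
          id: 3
        worker3:
          ansible_host: "<worker3>"
          id: 4
    cdh_servers:              # every node of the cluster
      children:
        cdh_manager:
        cdh_master:
        cdh_workers:
```

With a layout like this, a play that targets cdh_servers runs on all five nodes, while a play that targets cdh_manager or db_server runs on the manager only.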



We will now define all the variables needed for our roles. The variables are split into several files, one per group of hosts.

Below is an example of the variable definitions for the CDH server instances, cdh_servers.yml:


db_hostname: "{{ hostvars[groups['db_server'][0]]['inventory_hostname'] }}"
scm_hostname: "{{ hostvars[groups['cdh_manager'][0]]['inventory_hostname'] }}"

cdh_version: 5.14.2
cluster_display_name: cluster_1

# Users and Groups
group:
  - dbi
user:
  - dbi

# Java variables
java_download_folder: /usr/java
java_name: "{{ java_download_folder }}/jdk1.7.0_80"
java_archive: "{{java_download_folder}}/jdk-7u80-linux-x64.tar.gz"

# Mysql Java connector
mysql_java: mysql-connector-java-5.1.46
mysql_java_download_folder: /usr/share/mysql-java/
mysql_java_archive: "{{ mysql_java_download_folder }}/{{ mysql_java }}.tar.gz"

mysql_java_jar: /usr/share/java/mysql-connector-java.jar

Similar files are created for the database server variables (db_server.yml) and the Cloudera Manager server variables (scm_server.yml).
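
As an illustration, db_server.yml could look like the sketch below. Only the keys databases.scm.user and databases.scm.pass are referenced in this article (in the commented-out task of the scm role); the name key, the values and the vault_scm_db_password variable are assumptions made for this example.

```yaml
# group_vars/db_server.yml - hypothetical contents
databases:
  scm:
    name: scm                                 # assumed database name
    user: scm                                 # assumed database user
    pass: "{{ vault_scm_db_password }}"       # hypothetical variable kept in Ansible Vault
```

Keeping the password in a vaulted variable avoids committing it in clear text together with the rest of the project.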

After the variables definition, we can start creating the different roles and their associated tasks.

Cloudera Manager repo

The goal of this role is to add the same C.M repo to all cluster hosts. We use a template of the repository file.


# Packages for Cloudera Manager, Version 5, on RedHat or CentOS 7 x86_64
name=Cloudera Manager
gpgcheck = 1


- name: Add Cloudera repo
  template:
    src: ../templates/cloudera-manager.repo.j2
    dest: "/etc/yum.repos.d/cloudera-manager{{ cdh_version }}.repo"

The Cloudera Manager version was defined earlier in the cdh_servers.yml variable file.
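
An alternative to the template is the yum_repository module, which writes the .repo file directly. This is only a sketch, not the approach used in this project: cm_repo_baseurl and cm_repo_gpgkey are hypothetical variables that would have to hold the repository and GPG key URLs.

```yaml
- name: Add Cloudera repo (alternative sketch using yum_repository)
  yum_repository:
    name: cloudera-manager
    description: Cloudera Manager
    baseurl: "{{ cm_repo_baseurl }}"            # hypothetical variable
    gpgkey: "{{ cm_repo_gpgkey }}"              # hypothetical variable
    gpgcheck: yes
    file: "cloudera-manager{{ cdh_version }}"   # written as /etc/yum.repos.d/<file>.repo
```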

OS Configuration

Some requirements must be met before installing a Cloudera cluster. This role configures all hosts according to the Cloudera requirements:

- name: Create groups
  group:
    name: "{{ item }}"
    state: present
  with_items: "{{ group }}"

- name: Create user
  user:
    name: "{{ item }}"
    shell: /bin/bash
    uid: 1050
    groups: "{{ group }}"
  with_items: "{{ user }}"

# Add one /etc/hosts entry per inventory host, using the facts gathered for that host
- name: "Build hosts file"
  lineinfile:
    dest: /etc/hosts
    regexp: '.*{{ item }}$'
    line: "{{ hostvars[item]['ansible_default_ipv4']['address'] }} {{ item }}"
    state: present
  when: hostvars[item]['ansible_default_ipv4']['address'] is defined
  with_items: '{{ groups.all }}'

- name: Disable transparent huge page - defrag
  shell: echo "never" > /sys/kernel/mm/transparent_hugepage/defrag

- name: Disable transparent huge page - enabled
  shell: echo "never" > /sys/kernel/mm/transparent_hugepage/enabled

- name: VM swappiness - 1
  shell: echo "1" > /proc/sys/vm/swappiness

# Persist the setting in /etc/sysctl.conf
- name: Set VM swappiness - 2
  sysctl:
    name: vm.swappiness
    value: 1
    state: present

- name: Create /data dir
  file:
    path: /data
    state: directory
    mode: "0775"
    owner: dbi
    group: dbi

- name: Create file system on volume
  filesystem:
    fstype: ext4
    dev: /dev/xvdb

- name: Mount volume as /data
  mount:
    path: /data
    src: /dev/xvdb
    fstype: ext4
    opts: defaults,noatime
    state: mounted

- name: install the latest version of ntp
  yum:
    name: ntp
    state: latest

- name: install the latest version of nscd
  yum:
    name: nscd
    state: latest

- name: install wget
  yum:
    name: wget
    state: latest

- name: Disable SELinux
  selinux:
    state: disabled

- name: Reboot for SELinux if needed
  command: /sbin/shutdown -r +1
  async: 0
  poll: 0
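
The reboot can be made conditional so that it only happens when the SELinux change actually requires it. The following is a minimal sketch, assuming Ansible 2.7 or later (for the reboot module); the variable name selinux_result and the timeout value are choices made for this example.

```yaml
- name: Disable SELinux
  selinux:
    state: disabled
  register: selinux_result          # hypothetical variable name

- name: Reboot only when the SELinux change requires it
  reboot:
    reboot_timeout: 600             # seconds to wait for the host to come back
  when: selinux_result.reboot_required
```

Unlike a shutdown command, the reboot module waits until the host is reachable again before the playbook continues with the next role.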

Java installation

The Java installation is one of the most complex parts of the installation. First, we need to choose a supported version of the JDK. Then we need to make sure that Java is installed properly on all hosts. The installation tasks are split into the following parts:

  • Create installation directories: /usr/share/java and /usr/java
  • Download Java JDK 1.7.0_80, which is a supported version for Cloudera Manager
  • Unarchive Java JDK
  • Fix ownership
  • Make Java available for the system with alternatives
  • Clean up installation download folder
  • Add Java home path by exporting $JAVA_HOME variable

Below are the Java installation tasks.

- name: Create directories
  file:
    path: "{{ item }}"
    state: directory
  with_items:
    - "{{ java_download_folder }}"
    - "/usr/share/java"

- name: Creates directory
  file:
    path: "{{ java_download_folder }}"
    state: directory

- name: Download Java
  get_url:
    url: "{{ java_download_url }}"
    dest: "{{ java_archive }}"
    headers: "Cookie:'; oraclelicense=accept-securebackup-cookie'"
    validate_certs: no

# The archive is already on the managed host, so it is unpacked in place
- name: Unarchive Java archive
  unarchive:
    src: "{{ java_archive }}"
    dest: "{{ java_download_folder }}"
    remote_src: yes

- name: Fix ownership
  file:
    state: directory
    path: "{{ java_name }}"
    owner: root
    group: root
    recurse: yes

- name: Make Java available for system with alternatives
  command: 'alternatives --install "/usr/bin/java" "java" "{{java_name}}/bin/java" 2'

- name: Clean up Java download
  file:
    state: absent
    path: "{{ java_archive }}"

# blockinfile wraps the block in marker comments, which keeps the task idempotent
- name: Add java home path
  blockinfile:
    dest: /etc/profile
    block: |
      export JAVA_HOME=/usr/java/jdk1.7.0_80
      export PATH=$JAVA_HOME/bin:$PATH
    state: present
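
To make sure that Java has been installed properly on all hosts, a short check can be appended to the role. This is a sketch rather than part of the original role; the variable name java_version_check is an assumption, and the path is the JAVA_HOME exported above.

```yaml
- name: Check the installed Java version
  command: /usr/java/jdk1.7.0_80/bin/java -version
  register: java_version_check      # hypothetical variable name
  changed_when: false               # read-only check, never reports a change

- name: Fail early if the expected JDK is not in place
  assert:
    that:
      # java -version prints its output on stderr
      - "'1.7.0_80' in java_version_check.stderr"
```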

MariaDB installation

After installing Java, we can start the installation and configuration of the MariaDB database. You can find the entire role for the MariaDB installation here.
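
For orientation only, a MariaDB role on CentOS 7 might contain tasks like the ones below. This is a simplified sketch and not the linked role: it assumes a fresh MariaDB installation where root can connect locally without a password, the package names are the CentOS 7 ones, and databases.scm.name is a hypothetical variable (only databases.scm.user and databases.scm.pass appear in this article).

```yaml
- name: Install MariaDB and the Python MySQL bindings
  yum:
    name:
      - mariadb-server
      - MySQL-python                # needed by the mysql_* modules
    state: present

- name: Start and enable MariaDB
  service:
    name: mariadb
    state: started
    enabled: yes

- name: Create the Cloudera Manager database
  mysql_db:
    name: "{{ databases.scm.name }}"     # hypothetical variable
    state: present

- name: Create the Cloudera Manager database user
  mysql_user:
    name: "{{ databases.scm.user }}"
    password: "{{ databases.scm.pass }}"
    priv: "{{ databases.scm.name }}.*:ALL"
    host: "%"
    state: present
```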

MySQL connector

The MySQL connector installation follows approximately the same steps as the Java installation. All details here.
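
As a rough sketch of those steps, using the variables defined in cdh_servers.yml: mysql_java_download_url is a hypothetical variable holding the download location, and the jar file name inside the archive is assumed to follow the usual layout of the 5.1.x connector tarballs.

```yaml
- name: Create the connector download folder
  file:
    path: "{{ mysql_java_download_folder }}"
    state: directory

- name: Download the MySQL connector
  get_url:
    url: "{{ mysql_java_download_url }}"       # hypothetical variable
    dest: "{{ mysql_java_archive }}"

- name: Unarchive the MySQL connector
  unarchive:
    src: "{{ mysql_java_archive }}"
    dest: "{{ mysql_java_download_folder }}"
    remote_src: yes

- name: Copy the connector jar to the expected location
  copy:
    # assumed jar name inside the extracted folder
    src: "{{ mysql_java_download_folder }}/{{ mysql_java }}/{{ mysql_java }}-bin.jar"
    dest: "{{ mysql_java_jar }}"
    remote_src: yes
```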

Cloudera Manager Server installation

The last role of this playbook installs the Cloudera Manager server. It simply installs the following two packages on the cdh_manager host and then starts the Cloudera Manager services:

  • cloudera-manager-daemons
  • cloudera-manager-server

- include_vars: ../../../group_vars/db_server.yml

- name: Install the Cloudera Manager Server Packages
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - cloudera-manager-daemons
    - cloudera-manager-server

# - name: Prepare Cloudera Manager Server External Database
#   command: /usr/share/cmf/schema/
#              -f
#              --host {{ hostvars[db_hostname]['inventory_hostname'] }}
#              mysql {{ }} {{ databases.scm.user }} {{ databases.scm.pass }}
#   changed_when: False

- name: Start the Cloudera Manager Server
  service:
    name: "{{ item }}"
    state: restarted
    enabled: yes
  notify:
    - wait cloudera-scm-server
  with_items:
    - cloudera-scm-server
    - cloudera-scm-agent

# Trigger handler to wait for SCM to startup
- meta: flush_handlers
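
The handler named wait cloudera-scm-server is not shown in this article. A minimal sketch of what such a handler could look like is given below; the file location, the delay and the timeout are assumptions, and the port is the default Cloudera Manager port mentioned at the end of this post.

```yaml
# roles/scm/handlers/main.yml - hypothetical contents
- name: wait cloudera-scm-server
  wait_for:
    port: 7180          # default Cloudera Manager web UI port
    delay: 5            # seconds to wait before the first check
    timeout: 300        # give the server up to five minutes to start
```

Because no host is given, wait_for checks the port on the managed host itself, so the play only continues once the server is listening.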



After creating all roles, we need to define our site.yml in order to execute all tasks in the desired order.

# Cloudera playbook

- name: Configure Cloudera Manager Repository
  become: ansible_become
  hosts: cdh_servers
  roles:
    - cm_repo
  tags: cm_repo

- name: Configure Epel repository
  become: ansible_become
  hosts: cdh_servers
  roles:
    - epel
  tags: epel_repo

- name: OS Configuration
  become: ansible_become
  hosts: cdh_servers
  roles:
    - os_config
  tags: os_config

- name: Install Java JDK 7
  become: ansible_become
  hosts: cdh_servers
  roles:
    - java
  tags: java

- name: Install MySQL Java Connector
  become: ansible_become
  hosts: cdh_servers
  roles:
    - mysql_connector
  tags: mysql_java_connector

- name: Install MariaDB and create databases
  hosts: db_server
  roles:
    - mariadb
  tags: mysql

# ##############
- name: Install Cloudera Manager Agents
  hosts: cdh_servers
  roles:
    - cm_agents
  tags: cm_agents

- name: Install Cloudera Manager Server
  hosts: cdh_manager
  roles:
    - scm
  tags: cluster_template
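
Since the first five plays all target cdh_servers, they could also be grouped into a single play with one tag per role. This is only an alternative layout, not the one used in this project, and become: true is an assumption about how privilege escalation is enabled.

```yaml
- name: Prepare all cluster hosts
  hosts: cdh_servers
  become: true                      # assumed: tasks need root privileges
  roles:
    - { role: cm_repo, tags: cm_repo }
    - { role: epel, tags: epel_repo }
    - { role: os_config, tags: os_config }
    - { role: java, tags: java }
    - { role: mysql_connector, tags: mysql_java_connector }
```

Keeping one tag per role preserves the ability to run a single step on its own with the --tags option.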


When all steps have finished, you can access the Cloudera Manager web interface at the following address:


Make sure your network configuration allows access to the Cloudera Manager web UI through the default port 7180.
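
One way to verify this from the machine running Ansible is a small play that checks the port from the control node. The sketch below assumes that the manager's inventory entry defines ansible_host with an address reachable from that machine; the 60-second timeout is an arbitrary choice.

```yaml
- name: Check that the Cloudera Manager web UI is reachable
  hosts: cdh_manager
  gather_facts: false
  tasks:
    - name: Wait for port 7180 from the control machine
      wait_for:
        host: "{{ hostvars[inventory_hostname].ansible_host }}"
        port: 7180
        timeout: 60
      delegate_to: localhost        # run the check locally, not on the manager
      become: false
```

If this task times out while the server is running, the cause is most likely a firewall or network security rule between the control machine and the manager.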


The entire project with all files is available here.
