
Julia Gugel

Deploy DC/OS using Ansible (Part 2) – Playbooks

Now that all the configuration is done, we can finally run the playbooks.

Create SSH Access

First, SSH access has to be set up on all nodes. This is done with the access-onprem.yml playbook.
Be careful: I used CentOS on my systems, so I commented out the apt-get and Debian-based tasks.
If you want to run the playbook on another operating system, adjust it accordingly:

---
# This playbook enables access to all ansible targets via ssh

- name: setup the ansible requirements on all nodes
  hosts: all:!localhost
  #hosts: all
  serial: 20
  remote_user: "{{ initial_remote_user | default('root') }}"
  become: true
  tasks:

#    - name: attempt to update apt's cache
#      raw: test -e /usr/bin/apt-get && apt-get update
#      ignore_errors: yes

#    - name: attempt to install Python on Debian-based systems
#      raw: test -e /usr/bin/apt-get && apt-get -y install python-simplejson python
#      ignore_errors: yes

    - name: attempt to install Python on CentOS-based systems
      raw: test -e /usr/bin/yum && yum -y install python-simplejson python
      ignore_errors: yes

    - name: Create admin user group
      group:
        name: admin
        system: yes
        state: present

    - name: Ensure sudo is installed
      package:
        name: sudo
        state: present

    - name: Remove user centos
      user:
        name: centos
        state: absent
        remove: yes

    - name: Create Ansible user
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        shell: /bin/bash
        comment: "Ansible management user"
        home: "/home/{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        createhome: yes
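        # the user module expects a crypted hash here, not the plain-text password
        password: "{{ 'admin123' | password_hash('sha512') }}"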

    - name: Add Ansible user to admin group
      user:
        name: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        groups: admin
        append: yes

    - name: Add authorized key
      authorized_key:
        user: "{{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}"
        state: present
        key: "{{ lookup('file', lookup('env','HOME') + '/.ssh/ansible-dcos.pub') }}"

    - name: Copy sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.tmp

    - name: Backup sudoers file
      command: cp -f /etc/sudoers /etc/sudoers.bak

    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'Debian'

    - name: Ensure admin group can sudo
      lineinfile:
        dest: /etc/sudoers.tmp
        state: present
        regexp: '^%admin'
        insertafter: '^root'
        line: '%admin ALL=(ALL) NOPASSWD: ALL'
      when: ansible_os_family == 'RedHat'

    - name: Replace sudoers file
      shell: visudo -q -c -f /etc/sudoers.tmp && cp -f /etc/sudoers.tmp /etc/sudoers

    - name: Test Ansible user's access
      local_action: "shell ssh {{ lookup('ini', 'remote_user section=defaults file=../ansible.cfg') }}@{{ ansible_host }} 'sudo echo success'"
      become: False
      register: ansible_success

    - name: Remove Ansible SSH key from bootstrap user's authorized keys
      lineinfile:
        path: "{{ ansible_env.HOME }}/.ssh/authorized_keys"
        state: absent
        regexp: '^ssh-rsa AAAAB3N'
      when: ansible_success.stdout == "success"

Start the Playbook for the SSH access

# pwd
/root/ansible-dcos

# ansible-playbook plays/access-onprem.yml
PLAY [setup the ansible requirements on all nodes] 
****************************************************************************************
TASK [Gathering Facts]
****************************************************************************************
ok: [192.168.22.103]
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]

[....]

PLAY RECAP 
**************************************************************************************
192.168.22.100             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.101             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.102             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.103             : ok=14   changed=6    unreachable=0    failed=0
192.168.22.104             : ok=14   changed=6    unreachable=0    failed=0

This is not the complete output of the playbook. Important to know: during “TASK [Test Ansible user’s access]” I had to enter the Ansible user's password five times, presumably once per node. After that, the playbook finished successfully.
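
With only five nodes the recap is easy to check by eye. For a longer recap, a small shell helper can list just the hosts whose failed or unreachable counter is non-zero. This is a minimal sketch that assumes the recap format shown above; failed_hosts and the run.log file name are hypothetical and not part of Ansible or ansible-dcos.

```shell
# failed_hosts (hypothetical helper): reads Ansible output on stdin and prints
# every host from the PLAY RECAP whose "unreachable" or "failed" counter is > 0.
failed_hosts() {
  awk '
    / : ok=/ {                       # recap lines look like "host : ok=14 changed=6 ..."
      bad = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(unreachable|failed)=/) {
          split($i, kv, "=")         # kv[1] = counter name, kv[2] = counter value
          if (kv[2] + 0 > 0) bad = 1
        }
      }
      if (bad) print $1              # first field is the host name or IP
    }
  '
}

# Possible usage (run.log is an assumed file name):
#   ansible-playbook plays/access-onprem.yml | tee run.log
#   failed_hosts < run.log
```

For the run shown above the helper would print nothing, because every node reports unreachable=0 and failed=0.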

Ping the servers using Ansible

After the playbook has finished successfully, do a test ping:

# ansible all -m ping
192.168.22.102 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.100 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.104 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.101 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
192.168.22.103 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

If you run into trouble, the “-vvv” option (verbose output) is really helpful.
It is also possible to ping only one server:

ansible 192.168.22.100 -m ping

Roll out the DC/OS installation

# pwd
/root/ansible-dcos
# cat plays/install.yml
---
- name: setup the system requirements on all nodes
  hosts: all
  serial: 20
  become: true
  roles:
    - common
    - docker

- name: generate the DC/OS configuration
  hosts: bootstraps
  serial: 1
  become: true
  roles:
    - bootstrap

- name: deploy nodes
  hosts: [ masters, agents, agent_publics]
  serial: 20
  become: true
  roles:
    - node-install

# pwd
/root/ansible-dcos
# ansible-playbook plays/install.yml

PLAY [setup the system requirements on all nodes]
*********************************************************************

TASK [Gathering Facts]
*********************************************************************
ok: [192.168.22.102]
ok: [192.168.22.104]
ok: [192.168.22.101]
ok: [192.168.22.100]
[....]

If some installation steps fail on a server, Ansible skips the remaining steps for that server and lets you rerun the playbook on just the failed servers:

ansible-playbook plays/install.yml --limit @/root/ansible-dcos/plays/install.retry
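
The .retry file is a plain list of the failed hosts, one per line, and in the command above it sits next to the playbook. Whether Ansible writes it at all depends on the retry_files_enabled setting in ansible.cfg, so it is worth checking that the file exists before using it. A minimal sketch, assuming the .retry file shares the playbook's base name; retry_command is a hypothetical helper that only prints the command and does not run it.

```shell
# retry_command (hypothetical helper): prints the rerun command for a playbook
# if a non-empty .retry file exists next to it, otherwise returns non-zero.
retry_command() {
  playbook="$1"
  retry_file="${playbook%.yml}.retry"   # plays/install.yml -> plays/install.retry
  if [ -s "$retry_file" ]; then
    printf 'ansible-playbook %s --limit @%s\n' "$playbook" "$retry_file"
  else
    printf 'no retry file found for %s\n' "$playbook" >&2
    return 1
  fi
}

# Possible usage:
#   retry_command /root/ansible-dcos/plays/install.yml
```

Printing the command instead of executing it leaves room to look at the host list in the .retry file first.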

If you cannot connect to your master via browser, check /var/log/messages for error messages. In my case the master was looking for the eth0 interface, which does not exist on my VM.
Just change the detect_ip script as follows, using the name of your own network interface. The same change is needed on all agent nodes as well.

# cat /opt/mesosphere/bin/detect_ip
#!/usr/bin/env bash
set -o nounset -o errexit
export PATH=/usr/sbin:/usr/bin:$PATH
echo $(ip addr show enp0s8 | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -1)
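
Only the interface name changes from machine to machine. The sketch below moves the address extraction into a function, so it can be tried out without touching /opt/mesosphere/bin/detect_ip. It uses the same regular expression as the script above; first_ipv4 and detect_ip_for are hypothetical names, and enp0s8 is simply the interface from this example.

```shell
# first_ipv4 (hypothetical helper): prints the first dotted-quad found on stdin.
first_ipv4() {
  grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | head -n 1
}

# detect_ip_for (hypothetical helper): prints the first IPv4 address of the
# interface given as the first argument.
detect_ip_for() {
  ip addr show "$1" | first_ipv4
}

# Possible usage:
#   ip -o -4 addr show        # list interfaces that carry an IPv4 address
#   detect_ip_for enp0s8      # check the result before editing detect_ip
```

Taking the first match matters because the same line of `ip addr show` output usually also contains the broadcast address.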

Install the CLI

If you prefer a CLI, just install it on your master.

# [ -d /usr/local/bin ] || sudo mkdir -p /usr/local/bin
# curl https://downloads.dcos.io/binaries/cli/linux/x86-64/dcos-1.11/dcos -o dcos
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.9M  100 13.9M    0     0  1313k      0  0:00:10  0:00:10 --:--:-- 3920k
# sudo mv dcos /usr/local/bin
# chmod +x /usr/local/bin/dcos
# dcos cluster setup http://192.168.22.101
If your browser didn't open, please go to the following link:

    http://192.168.22.101/login?redirect_uri=urn:ietf:wg:oauth:2.0:oob

Enter OpenID Connect ID Token: eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik9UQkVOakZFTWtWQ09VRTRPRVpGTlRNMFJrWXlRa015Tnprd1JrSkVRemRCTWpBM1FqYzVOZyJ9.eyJlbWFpbCI6Imp1bGlhLmd1Z2VsQGdtYWlsLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJpc3MiOiJodHRwczovL2Rjb3MuYXV0aDAuY29tLyIsInN1YiI6Imdvb2dsZS1vYXV0aDJ8MTA2NTU2OTI5OTM1NTc2MzQ1OTEyIiwiYXVkIjoiM3lGNVRPU3pkbEk0NVExeHNweHplb0dCZTlmTnhtOW0iLCJpYXQiOjE1NDA0NTA4MTcsImV4cCI6MTU0MDg4MjgxN30.M8d6dT4QNsBmUXbAH8B58K6Q2XvnCKnEd_yziiijBXHdW18P2OnJEYrKa9ewvOfFhyisvLa7XMU3xeBUhoqX5T6mGkQo_XUlxXM82Ohv3zNCdqyNCwPwoniX4vU7R736blcLRx1aB8TJnydNb0H0IzEAVzaYBQ1CRV-4a9KsiMXKBBPlskOSvek4b_FRghA6hsjMA2eO-G5r3B6UgHo6CCwdwVrhsOygvJ5NwDC0xiFrnkW-SjZRZztCN8cRj7b40VH43uY6R2ibxJfE7SaGpbWzLyp7juUJ766WXar3O7ww42bYIqLnAx6YmWG5kFeJnmJGT-Rdmhl2JuvdABoozA

That’s it, now you can configure and use your DC/OS. Always keep in mind: the ntpd service is essential for a working DC/OS node. Also check /var/log/messages, it really helps!
One last thing I have to mention: don’t rely too much on the official documentation and the troubleshooting guide, they do not help as much as expected…

