Infrastructure at your Service

Category Archives: Big Data

Mehdi Bada

Creating and Using a Parcel Repository for Cloudera Manager

By Mehdi Bada | Big Data | No Comments

This blog post describes how to create a hosted Cloudera repository and use it in your Cloudera Manager deployment. The first step is to install a web server, which will host the RPM packages and repodata. The common approach is to use an Apache web server.
Installing the Apache HTTPD service:
[cdhtest@edge ]$ sudo yum install httpd -y
Starting the Apache HTTPD service:
[cdhtest@edge ]$ sudo systemctl start httpd
Verify that the service has started properly…

 
Mehdi Bada

Create an HDFS user’s home directory

By Mehdi Bada | Big Data | No Comments

Let’s assume we need to create an HDFS home directory for a user named “dbitest”. We first need to verify that the user exists on the local filesystem. It’s important to understand that HDFS maps its users from the local system.
[cdhtest@master ~]$ cat /etc/passwd | grep dbitest
Create a user on the local file system
If the user does not exist yet, we can easily create it along with its associated group.
[cdhtest@master ~]$ sudo groupadd…
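As a minimal sketch of the check described in this excerpt, the shell functions below test whether an account exists on the local system and, if it does, print the HDFS commands one would typically run to create its home directory. The helper names `user_exists` and `print_hdfs_home_cmds`, the `/user/<name>` location, and the use of an `hdfs` superuser account are assumptions for illustration and are not taken from the post.

```shell
#!/bin/sh
# Return 0 if the account is known to the local system, non-zero otherwise.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Print (do not run) the HDFS commands that create a home directory.
# Assumptions: home directories live under /user and the HDFS superuser is "hdfs".
print_hdfs_home_cmds() {
    user="$1"
    echo "sudo -u hdfs hdfs dfs -mkdir -p /user/${user}"
    echo "sudo -u hdfs hdfs dfs -chown ${user}:${user} /user/${user}"
}

if user_exists dbitest; then
    print_hdfs_home_cmds dbitest
else
    echo "user dbitest does not exist locally; create it first" >&2
fi
```

The commands are only printed, so the sketch can be reviewed on a machine without a Hadoop client before anything is run against the cluster.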

 
Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 3

By Mehdi Bada | Big Data, Cloud | No Comments

After the deployment step with Terraform and the configuration/installation with Ansible, we continue the installation of our Cloudera cluster with Cloudera Manager. The steps below show how to install CDH on our hosts using Cloudera Manager.
Connection
First, log in to the Cloudera Manager URL. When you connect to Cloudera Manager (C.M) for the first time, you need to accept the Cloudera Terms and Conditions. Then choose your desired edition of Cloudera. For this…

 
Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 2

By Mehdi Bada | Big Data, Cloud, Development & Performance | No Comments

In this part of the blog post series, we show how Ansible helps us configure our cluster and install all the prerequisites needed for Cloudera Manager. Ansible is one of the most widely used automation tools today. It will help us configure all nodes for a manual installation using Cloudera Manager. Our playbook will contain the following roles:
cm_repo: adds the same C.M repo to all nodes.
os_config: adjusts all OS parameters for installing…

 
Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 1

By Mehdi Bada | Big Data, Cloud | No Comments

Deploying a Cloudera distribution of Hadoop automatically saves a great deal of time. Infrastructure as Code tools such as Ansible, Puppet, Chef, and Terraform now make it possible to provision, manage, and deploy configuration for large clusters. In this blog post series, we will see how to deploy and install a CDH cluster with Terraform and Ansible in the Azure cloud. The first part consists of provisioning the environment with Terraform in Azure. Terraform features an extension…

 
Mehdi Bada

Managing Oracle Big Data Cloud – CE with REST API

By Mehdi Bada | Big Data, Cloud, Oracle | 3 Comments

In this blog post, we will see how to manage the Oracle Public Cloud Big Data service Compute Edition with its REST API. Scheduling the start/stop/restart of a metered PaaS in the Oracle Cloud can help you manage the consumption of your cloud credits efficiently. We should first look at the official documentation to understand how the API is structured:
https://docs.oracle.com/en/cloud/paas/big-data-compute-cloud/csbdp/QuickStart.html
Use the following URL composition to access the REST endpoint:
https://region-prefix.oraclecloud.com/resource-path
According…
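The URL composition in this excerpt can be sketched as a small shell helper. The function name `rest_url` is hypothetical, and the two arguments are the literal placeholders from the excerpt rather than real region or resource values.

```shell
#!/bin/sh
# Compose a REST endpoint URL from a region prefix and a resource path.
rest_url() {
    region_prefix="$1"
    resource_path="${2#/}"   # drop one leading slash, if any, to avoid "//"
    echo "https://${region_prefix}.oraclecloud.com/${resource_path}"
}

# Placeholder values copied from the excerpt, for illustration only.
rest_url "region-prefix" "resource-path"
```

Keeping the composition in one function means a scheduled start/stop script only has to change the two arguments when the region or the resource changes.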

 
Mehdi Bada

Introduction to Oracle Big Data Services

By Mehdi Bada | Big Data, Oracle | No Comments

For a few years now, Oracle has been moving forward in the Big Data area, like its main competitors. The goal of this blog post is to explain how the Oracle Big Data offering is composed. As the Oracle Big Data offering is continuously improving, I’m always open to your feedback. The Oracle Big Data offering is split into 2 parts:
On-Premise
Public Cloud
Note: It’s important to know that the 2 main Big Data distributions on…

 