Infrastructure at your Service

Category: Big Data

Christophe Cosme

PASS Summit – day 4

Categories: Big Data, Business Intelligence, Cloud, Database Administration & Monitoring, Development & Performance

Optimizing Multi-Billion Row Tables in Tabular in 2018
I wanted to attend this session, led by Marco Russo, to see his approach to optimizing performance in a Tabular model. The first thing to understand is how the data is stored and organized in a Tabular model. It of course uses the xVelocity in-memory capabilities with the VertiPaq columnar storage engine, which organizes the data by compressing it column by column in combination with a dictionary…
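
The idea of compressing a column "in combination with a dictionary" can be illustrated with a toy sketch. This is not how VertiPaq is actually implemented (the engine is far more elaborate); the hypothetical `dict_encode` function only shows the principle: each distinct value of a column is stored once in a dictionary, and the column itself becomes a list of small integer IDs.

```shell
#!/bin/sh
# Toy illustration of dictionary encoding (NOT the VertiPaq implementation).
# dict_encode reads one column value per line on stdin and prints:
#   - the dictionary: each distinct value stored once with an integer ID
#   - the encoded column: one ID per row, in row order
dict_encode() {
  awk '
    # First time a value is seen: give it the next free ID (0, 1, 2, ...)
    !($0 in id) { id[$0] = n++; dict[n - 1] = $0 }
    # Every row is stored as the ID of its value
    { col[NR] = id[$0] }
    END {
      printf "dictionary:"
      for (i = 0; i < n; i++) printf " %d=%s", i, dict[i]
      printf "\ncolumn:"
      for (r = 1; r <= NR; r++) printf " %d", col[r]
      printf "\n"
    }'
}
```

For example, `printf 'red\nblue\nred\nred\nblue\n' | dict_encode` prints `dictionary: 0=red 1=blue` followed by `column: 0 1 0 0 1`. The fewer distinct values a column has, the smaller the dictionary and the IDs, which is why cardinality matters so much for compression.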

Christophe Cosme

PASS Summit – dbi visit day 3

Categories: Big Data, Business Intelligence, Cloud, Database Administration & Monitoring, Database management, SQL Server

The Microsoft data platform is evolving
The third day began with the keynote held by Rohan Kumar, Corporate Vice President of Azure Data at Microsoft. The main message was “Hybrid data platform is the way for the future”, and Microsoft is working in this direction. Using AI and analytics to transform customers' businesses is also a key driver for the Microsoft data platform, which is being built to make this easy. Rohan…

Christophe Cosme

PASS Summit – dbi visit day 1

Categories: Big Data, Business Intelligence, Cloud

Designing Modern Data and Analytic Solution in Azure
After the pros and cons of Azure and the decision drivers for moving to an Azure architecture had been explained, some interesting messages were delivered, such as the decoupling of storage and compute on Azure, even if some services still combine both. Another message that we all know, but that is worth repeating regularly, is that cost control is an important aspect. Developers…

Mehdi Bada

Hitachi Content Intelligence deployment

Category: Big Data

Hitachi Content Intelligence (HCI) is a search and data processing solution. It allows the extraction, classification, enrichment, and categorization of data, regardless of where the data lives or what format it’s in. Content Intelligence provides tools that work at large scale across multiple repositories. These tools are useful for identifying, blending, normalizing, querying, and indexing data for search, discovery, and reporting purposes.
Architecture
HCI has components called data connections that it uses to access the places where…

Mehdi Bada

Creating and Using a Parcel Repository for Cloudera Manager

Category: Big Data

This blog post describes how to create a hosted Cloudera repository and use it in your Cloudera Manager deployment. The first step is to install a web server, which will host the RPM packages and repodata. The common way is to use an Apache web server.

Installing the Apache HTTPD service:
$ sudo yum install httpd -y

Starting the Apache HTTPD service:
$ sudo systemctl start httpd

Verify that the service has been started properly…
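
Once Apache is running, the repository content has to be placed under its document root. The following is a minimal sketch, assuming the parcel files, their checksum files (`.sha1` or `.sha`, depending on the release) and `manifest.json` have already been downloaded into a staging directory. The `publish_parcels` function and all paths are hypothetical, not the procedure from the post.

```shell
#!/bin/sh
# Hypothetical helper: publish_parcels SRC_DIR DEST_DIR
# Copies parcels, their checksum files and manifest.json into a directory
# served by Apache, then makes everything readable by the web server.
publish_parcels() {
  local src="$1" dest="$2"
  # A remote parcel repository is expected to expose manifest.json
  if [ ! -f "$src/manifest.json" ]; then
    echo "missing $src/manifest.json" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # Copy only the files a parcel repository needs; unmatched globs are skipped
  for f in "$src"/*.parcel "$src"/*.parcel.sha1 "$src"/*.parcel.sha "$src/manifest.json"; do
    [ -e "$f" ] && cp "$f" "$dest/"
  done
  # httpd must be able to read the files and traverse the directories
  chmod -R ugo+rX "$dest"
}
```

On the repository host this would typically run as root with a destination under Apache's default document root, for example `publish_parcels /tmp/parcels /var/www/html/cloudera-parcels`. The matching URL (here `http://<host>/cloudera-parcels/`) is then added to the Remote Parcel Repository URLs setting in Cloudera Manager.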

Mehdi Bada

Create an HDFS user’s home directory

Category: Big Data

Let’s assume we need to create an HDFS home directory for a user named “dbitest”. First, we need to verify whether the user exists on the local filesystem: it’s important to understand that HDFS maps users from the local filesystem.

$ cat /etc/passwd | grep dbitest

Create a user on the local file system
If the user does not exist yet, we can easily create it along with its associated group.

$ sudo groupadd…
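
The whole procedure can be sketched as a small shell function. This is a minimal sketch, not the author's exact commands: it assumes the HDFS superuser account is `hdfs` and that home directories live under `/user`, and it adds a hypothetical `DRY_RUN=1` switch that only prints the commands instead of running them.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): create a local user/group, then its HDFS home.
# Assumptions: the HDFS superuser is "hdfs" and home directories live under /user.

# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_hdfs_home() {
  local name="$1"
  # HDFS relies on the local OS accounts, so the user must exist locally first.
  if ! id "$name" >/dev/null 2>&1; then
    run sudo groupadd "$name"
    run sudo useradd -g "$name" "$name"
  fi
  # Create the home directory as the HDFS superuser and hand it over to the user.
  run sudo -u hdfs hdfs dfs -mkdir -p "/user/$name"
  run sudo -u hdfs hdfs dfs -chown "$name:$name" "/user/$name"
}
```

Calling `DRY_RUN=1 create_hdfs_home dbitest` in bash prints the commands that would be executed; without `DRY_RUN` they run for real and require sudo rights on a node with the HDFS client installed.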

Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 3

Categories: Big Data, Cloud

After the deployment step with Terraform and the configuration/installation with Ansible, we will continue the installation of our Cloudera cluster with Cloudera Manager. By following the steps below, you will see how to install CDH on our hosts using Cloudera Manager.
Connection
First, log in to the Cloudera Manager URL. When you connect to Cloudera Manager for the first time, you need to accept the Cloudera Terms and Conditions. Then choose your desired edition of Cloudera. For this…

Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 2

Categories: Big Data, Cloud, Development & Performance

In this part of the blog post series, we will show how Ansible helps us configure our cluster and install all the prerequisites needed for Cloudera Manager. Ansible is currently one of the most important automation tools. It will help us configure all nodes for a manual installation using Cloudera Manager. Our playbook will contain the following roles:
- cm_repo: add the same Cloudera Manager repository to all nodes.
- os_config: adjust all OS parameters for installing…

Mehdi Bada

Deploy a Cloudera cluster with Terraform and Ansible in Azure – part 1

Categories: Big Data, Cloud

Deploying a Cloudera distribution of Hadoop automatically saves a lot of time. Infrastructure as Code tools such as Ansible, Puppet, Chef, and Terraform now make it possible to provision, manage, and deploy configuration for large clusters. In this blog post series, we will see how to deploy and install a CDH cluster with Terraform and Ansible in the Azure cloud. The first part covers provisioning the environment in Azure with Terraform. Terraform features an extension…

Mehdi Bada

Managing Oracle Big Data Cloud – CE with REST API

Categories: Big Data, Cloud, Oracle

In this blog post, we will see how to manage the Oracle Public Cloud Big Data Compute Edition service with the REST API. Scheduling the start/stop/restart of a metered PaaS service in the Oracle Cloud can help you manage the consumption of your cloud credits efficiently. We should first look at the official documentation to understand how the API is structured:
https://docs.oracle.com/en/cloud/paas/big-data-compute-cloud/csbdp/QuickStart.html
Use the following URL composition to access a REST endpoint:
https://region-prefix.oraclecloud.com/resource-path
According…
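
The URL composition above can be wrapped in a small helper so that scheduled start/stop scripts build endpoints consistently. This is only a sketch: `bdce_url` is a hypothetical name, the actual region prefix and resource paths must be taken from the official documentation, and authentication options are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: build a REST endpoint URL following the pattern
#   https://region-prefix.oraclecloud.com/resource-path
# Both arguments are placeholders to be filled from the official documentation.
bdce_url() {
  local prefix="$1" path="$2"
  # Strip leading slashes so exactly one "/" separates host and resource path
  while [ "${path#/}" != "$path" ]; do
    path="${path#/}"
  done
  printf 'https://%s.oraclecloud.com/%s\n' "$prefix" "$path"
}
```

The resulting URL can then be passed to `curl` together with the authentication options and HTTP method described in the documentation for the operation you want to schedule.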
