Big Data Analytics Lab

Description 

This project was carried out with a multinational company based in Barcelona. Its aim was to launch a pilot Big Data platform, built on Open Source tools, that facilitates and streamlines the analysis and processing of the data supporting the company's business processes, in order to improve them.

The project focuses on improving a very specific use case, which will serve as a starting point to define and implement a reference architecture.

The reference architecture is where the different technological components are specified. These components are Big Data tools, which must be chosen carefully according to the structure of the data and the queries we expect to be run against it.

The project starts with a study of the source data, which allows us to make an initial selection of the Big Data tools best suited to it. Once these tools are selected, we evaluate them, including some benchmarks, to determine the speed of each chosen solution.

Next, we create an ontology of the data so that the company's data analysts, who need data arrays to work with, can choose variables, inspect their features and distributions, and so on, and then apply cleaning, discretization and/or processing rules (MapReduce-style rules).
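To make the idea of MapReduce-style cleaning and discretization rules more concrete, here is a minimal pure-Python sketch. It is not the project's implementation, which is not described here: the record fields (`customer_id`, `age`, `region`), the age bins and the cleaning rules are all hypothetical. The map step cleans each record and emits a key made of a categorical variable and a discretized numeric variable; the reduce step sums counts per key, giving the kind of distribution an analyst would inspect.

```python
from collections import Counter
from functools import reduce

# Hypothetical input records; field names are illustrative only.
RECORDS = [
    {"customer_id": " 001 ", "age": "34", "region": "North"},
    {"customer_id": "002", "age": "", "region": "South"},
    {"customer_id": "003", "age": "67", "region": " north"},
    {"customer_id": "004", "age": "19", "region": "South"},
]

# Hypothetical discretization rule: (inclusive lower bound, bin label).
AGE_BINS = [(0, "0-29"), (30, "30-59"), (60, "60+")]


def clean(record):
    """Cleaning rule: trim whitespace, normalise region case, drop records without an age."""
    rec = {k: v.strip() for k, v in record.items()}
    if not rec["age"]:
        return None
    rec["region"] = rec["region"].capitalize()
    return rec


def discretize_age(age):
    """Return the label of the highest bin whose lower bound the age reaches."""
    label = AGE_BINS[0][1]
    for lower, name in AGE_BINS:
        if age >= lower:
            label = name
    return label


def map_phase(record):
    """Map step: clean one record and emit ((region, age_bin), 1) pairs."""
    rec = clean(record)
    if rec is None:
        return []
    return [((rec["region"], discretize_age(int(rec["age"]))), 1)]


def reduce_phase(acc, pair):
    """Reduce step: add each emitted count to the running total for its key."""
    key, count = pair
    acc[key] += count
    return acc


def distribution(records):
    """Run map over all records, then fold the emitted pairs into per-key counts."""
    pairs = (pair for r in records for pair in map_phase(r))
    return reduce(reduce_phase, pairs, Counter())


if __name__ == "__main__":
    for key, n in sorted(distribution(RECORDS).items()):
        print(key, n)
```

On a distributed engine such as Pig or Spark, the same split would apply: the map function runs independently on each record, so it parallelizes freely, and the reduce combines the counts per key.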

inLab FIB has set up a technical team of Big Data and data mining experts to support these analytical and technological processes. The team will train the company's technical staff to ensure continuity of service beyond the duration of the project.

 

Duration of the project 
December 2014 to December 2015
Benefits for the client 

Working together with the client makes it possible to train its technical team, with considerable time savings in preparing and processing the data.

The client has access to top experts in Open Source Big Data technologies.

Technology 
Ambari (Monitoring),
Cassandra (Data Lake),
Flume (Data Ingestion),
Gem (Data Analysis),
HBase (Data Lake),
HDFS (Data Lake),
Hive (Data Query),
Knox (Security),
Mahout (Data Analysis),
Oozie (Scheduling),
Pig (Data Query),
SAS (Data Analysis),
Spark (Data Processing),
Sqoop (Data Ingestion),
Yarn (Data Processing),
R