What is a Data Scientist?

Detail of the skills that a good Data Scientist should have
Authors:


A data scientist is an expert in data science whose job is to extract knowledge from data in order to answer the questions being asked.

What is data science?

Data science, born of the scientific method, is the evolution of what has until now been known as data analysis. Unlike a data analyst, however, a data scientist must explore and analyze data from multiple sources, often huge ones (known as Big Data), which may come in very different formats. A data scientist also has a strong business vision, in order to extract recommendations and convey them to the business leaders of the company.

These data sets can come from all kinds of electronic devices (such as phones, sensors of every type or genome sequencers), social networks, medical records, web pages and so on. They have a very significant effect on current research in many fields, such as the biological sciences, medical informatics and the social sciences.

What process does a Data Scientist follow?

The process a data scientist follows to answer these questions can be summarized in five steps:

  • Extract the data, regardless of its source (websites, CSV files, logs, APIs, etc.) and volume (Small Data or Big Data).
  • Clean the data.
  • Process the data using different statistical methods (statistical inference, regression, hypothesis testing, etc.).
  • Design new tests or experiments.
  • Visualize and present the data graphically.
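
The five steps can be sketched on a toy example. This is only a minimal illustration using the Python standard library, not a recommended toolchain: the data set (hours studied versus exam score), the column names and every function name are hypothetical, the "new experiment" step is reduced to listing inputs that have not been measured yet, and the "graph" is a plain-text bar chart.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice it could come from a website, a log or an API.
RAW_CSV = """hours_studied,exam_score
1,52
2,58
3,
4,71
five,75
6,83
"""

def extract(text):
    """Step 1: read every row from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    """Step 2: keep only the rows where both fields parse as numbers."""
    points = []
    for row in rows:
        try:
            points.append((float(row["hours_studied"]), float(row["exam_score"])))
        except (TypeError, ValueError):
            continue  # drop rows with missing or malformed values
    return points

def fit_line(points):
    """Step 3: ordinary least-squares regression, y = slope * x + intercept."""
    mean_x = statistics.fmean(x for x, _ in points)
    mean_y = statistics.fmean(y for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def next_experiments(points, candidates):
    """Step 4 (toy stand-in): list the inputs that still have no clean measurement."""
    observed = {x for x, _ in points}
    return [x for x in candidates if x not in observed]

def visualize(points):
    """Step 5: a plain-text bar chart, one '#' per five score points."""
    return "\n".join(f"{x:>4.0f} h | {'#' * round(y / 5)} {y:.0f}" for x, y in points)

points = clean(extract(RAW_CSV))
slope, intercept = fit_line(points)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("still to measure:", next_experiments(points, range(1, 7)))
print(visualize(points))
```

On a real project each function would typically be replaced by a dedicated library and a much larger data source, but the order of the steps stays the same.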

What is expected from a Data Scientist?

What is expected of a data scientist is not only the ability to approach a data exploitation problem from the point of view of analysis, but also the skills needed to cover the data management stage. The aim of such a profile is therefore to bring together two worlds (data management and data analysis) that until now had been kept separate. Given the new requirements of volume, variety and velocity in data exploitation (the three V's of the standard definition of Big Data), it has become essential to carry out this work through a combined profile.

What profile must a Data Scientist have?

The profile of a data scientist is like a magic potion. Its main ingredients are advanced skills in computer science, mathematics and statistics, and machine learning; the ability to handle large volumes of data; the aptitude to communicate the knowledge extracted from the information; business vision; and so on.

Since data science is multidisciplinary, there are many things to learn, and it is a demanding specialization that takes a long time to master. The combination is very powerful and hard to find, however, which may be why the Harvard Business Review described this job as the sexiest of the 21st century.

The graph at the top of this article details the different skills that a data scientist should have. It is taken from Applied Data Science in Europe, published at the Zurich University of Applied Sciences and on the blog of one of its authors, Thilo Stadelmann.

What challenges can we tackle?

To give one example, a current challenge for Big Data and data science technologies is their application to the analysis of the huge amount of genomic information now available, and its use in the study of diseases such as cancer.

Consider that humans have 23 pairs of chromosomes, which together contain about 3,200 million base pairs of DNA and approximately 20,000-25,000 genes. Determining which combinations of these genes are significant for certain diseases opens the door to the idea that someday we will have personalized medicine.
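
To get a feel for why this is a Big Data problem, the number of candidate gene combinations can be counted directly. The sketch below assumes the lower bound of 20,000 genes mentioned in the text and simply counts unordered groups of one, two and three genes. The function name is hypothetical, and the count says nothing about how real studies narrow down the search.

```python
import math

GENES = 20_000  # lower bound of the gene count given in the text

def candidate_groups(n_genes, group_size):
    """Number of unordered groups of `group_size` genes chosen from `n_genes`."""
    return math.comb(n_genes, group_size)

for size in (1, 2, 3):
    print(f"groups of {size}: {candidate_groups(GENES, size):,}")
```

Even groups of only three genes already number more than a trillion, which is why exhaustive testing quickly becomes impractical.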

There are currently many open data sources that we can analyze, for example the open data published by Barcelona City Council, or the complete human cancer genome data from the Pediatric Cancer Genome Project at Washington University.

You can also take part in various data science challenges, such as identifying signs of diabetic retinopathy in images of the eye. This and other challenges are published as Kaggle competitions, where good results can earn good rewards.

How can I learn?

A good way to learn data science is through the specialization on the MOOC (massive open online course) platform Coursera, which offers nine courses for free.

At inLab FIB we have been working on data analysis for many years, in areas such as modeling, simulation, optimization and Learning Analytics. With the appearance of technologies for handling large volumes of data (Big Data), we now have powerful tools that complement this work.