Automation of multidimensional data storage design

Theses

Student:
Oscar Romero
Advisor:
Defense date:
09/02/2010
Department:
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya.

In this thesis we propose two methods to support the data warehouse modeling task: MDBE (Multidimensional Design Based on Examples) and AMDO (Automating the Multidimensional Design from Ontologies). Both consider the requirements and the data sources to carry out the modeling task and were designed to overcome the limitations of current approaches.

1. MDBE follows a classical approach in which the user requirements are known in advance. The method benefits from the knowledge captured in the data sources but drives the process from the requirements and can therefore work with semantically poor data sources. In other words, it exploits the fact that high-quality requirements can compensate for data sources that do not properly capture the domain.
2. Unlike MDBE, AMDO assumes a scenario in which semantically rich data sources are available. It therefore drives the modeling process from the data sources and uses the requirements to shape and adapt the generated results to the user's needs. In this setting, semantically rich data sources compensate for the lack of clear user requirements at the outset.

Together, our methods establish a combined framework that can be used to decide, given a particular scenario, which approach is more appropriate. The same approach cannot be followed when the requirements are well known in advance and when they are not yet clear (for instance, when users are not sure about the analysis capabilities of their own system). Having good requirements in advance reduces the need for semantically rich data sources; conversely, if the data sources adequately capture the domain, the requirements are not needed in advance. For these reasons, this thesis provides a combined framework that covers all the scenarios that can arise during the data warehouse modeling task.
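
As a rough illustration of the decision this framework supports, the following Python sketch maps the two scenario properties discussed above to candidate methods. The function name, its boolean parameters, and the reduction of each property to a yes/no flag are assumptions made for this example only; the thesis summary does not define such a function, and it does not say what to do when neither property holds.

```python
from typing import List


def candidate_methods(requirements_known: bool,
                      sources_semantically_rich: bool) -> List[str]:
    """Return the methods whose stated assumptions match the scenario.

    Hypothetical helper: both flags are simplifications introduced here.
    """
    candidates = []
    # MDBE is driven by requirements that are known in advance.
    if requirements_known:
        candidates.append("MDBE")
    # AMDO is driven by semantically rich data sources.
    if sources_semantically_rich:
        candidates.append("AMDO")
    # An empty list means this summary does not cover the scenario.
    return candidates


if __name__ == "__main__":
    print(candidate_methods(requirements_known=True,
                            sources_semantically_rich=False))
```

When both flags are true the sketch returns both methods, since the summary does not state a preference between them in that case.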