This project applies natural language processing (NLP) techniques to identify an author's writing patterns, making it possible to compare the "shape" of the writing across documents. Each person writes in a different way, and that way of writing identifies us and sets us apart from others. Moreover, when text is copied literally from another source, such as an article found on the Internet or a book, it quickly becomes apparent that the "way" of writing is not ours.
InLab is responsible for developing the prototype of this project. The prototype collects, for each document with a known author, a set of indicators such as word length, sentence length, richness of vocabulary, word frequency, etc.
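As an illustration of the kind of indicators listed above, the following is a minimal sketch and not the prototype's actual code. The function name `stylometric_indicators`, the `top_n` parameter, the regular expressions used for tokenizing, and the choice of type-token ratio as the measure of vocabulary richness are all assumptions made for this example.

```python
import re
from collections import Counter


def stylometric_indicators(text, top_n=3):
    """Compute a few simple writing-style indicators for one document.

    Hypothetical helper: the indicator names and definitions are
    assumptions, not the project's real feature set.
    """
    # Sentences: split on runs of '.', '!' or '?' and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: runs of letters, lowercased (digits and punctuation ignored).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    return {
        # Mean number of letters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Type-token ratio: distinct words divided by total words.
        "vocabulary_richness": len(counts) / len(words),
        # The most frequent words with their counts.
        "top_words": counts.most_common(top_n),
    }
```

Calling `stylometric_indicators(document_text)` on each document with a known author would give one indicator profile per document, which can then be compared across authors.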
The project will indicate the likelihood of plagiarism when a new document by an unknown author is submitted. Using a series of classification algorithms (support vector machines, k-nearest neighbors, etc.), it will report, for each of the indicators identified, the probability that the document belongs to the author.
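To make the per-indicator probability concrete, here is a rough sketch using a hand-written k-nearest-neighbors vote. It is only one possible reading of the description: the function name `knn_author_probability`, the data layout (a list of `(author, indicators)` pairs with numeric indicator values), and the use of the neighbor vote fraction as a "probability" are assumptions, and the prototype may work quite differently.

```python
def knn_author_probability(known, new_doc, author, k=3):
    """Estimate, per indicator, how likely new_doc is to be by `author`.

    known   : list of (author_name, {indicator_name: numeric value})
    new_doc : {indicator_name: numeric value} for the unknown document
    Returns : {indicator_name: fraction of the k nearest known documents,
               measured on that indicator alone, written by `author`}
    Hypothetical sketch; names and data layout are assumptions.
    """
    if not known:
        raise ValueError("at least one known document is required")

    probabilities = {}
    for name, value in new_doc.items():
        # Rank known documents by distance on this single indicator.
        nearest = sorted(known, key=lambda doc: abs(doc[1][name] - value))[:k]
        # Count how many of the nearest documents belong to the author.
        votes = sum(1 for doc_author, _ in nearest if doc_author == author)
        probabilities[name] = votes / len(nearest)
    return probabilities
```

A low fraction on several indicators would suggest that the new document does not match the claimed author's style, which is the signal the project describes as a likelihood of plagiarism. A support vector machine could be substituted for the neighbor vote without changing the overall flow.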