Authorship detection to validate identity and prevent plagiarism in UOC's practical assignments

Description 

This project applies natural language processing (NLP) techniques to identify an author's writing patterns, so that the "shape" of the writing can be compared across documents. Each person writes differently, and that style identifies us and sets us apart from others. As a result, when text is copied verbatim from another source, such as an article found on the Internet or a book, it quickly becomes apparent that the "way" of writing is not the author's own.

inLab is responsible for developing the prototype for this project. For each document with a known author, the prototype collects a set of indicators such as word length, sentence length, vocabulary richness, and word frequency.
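As an illustration of the kind of indicators listed above, the following is a minimal sketch in Python using only the standard library. It is not the prototype's actual code: the function name `stylometric_features`, the regular expressions used to split sentences and words, and the choice of type-token ratio as a measure of vocabulary richness are all assumptions made for this example.

```python
import re
from collections import Counter


def stylometric_features(text, top_n=3):
    """Return a few simple writing-style indicators for one document.

    Hypothetical helper for illustration; not taken from the project.
    """
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters/digits (optionally with one apostrophe), lower-cased.
    words = re.findall(r"[^\W_]+(?:'[^\W_]+)?", text.lower())
    if not words:
        raise ValueError("document contains no words")
    counts = Counter(words)
    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Distinct words divided by total words (assumed richness measure).
        "type_token_ratio": len(counts) / len(words),
        # Most frequent words with their counts.
        "top_words": counts.most_common(top_n),
    }
```

Calling `stylometric_features(document_text)` on each document by a known author would yield one set of indicators per document, which could then be stored and compared against those of a new document.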

When a new document by an unknown author is submitted, the project indicates the likelihood of plagiarism. Using a series of classification algorithms (support vector machines, k-nearest neighbors, and others), it reports, for each of the indicators identified, the probability that the document belongs to the author.
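The classification step can be sketched with a small k-nearest-neighbors example in plain Python. This is only one possible reading of the approach described above, not the project's implementation: the function name `knn_author_probabilities`, the use of Euclidean distance, and the idea of reporting the share of neighbors per author as a probability are assumptions for illustration. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter


def knn_author_probabilities(known, query, k=3):
    """Estimate the probability of each author for a query document.

    `known` is a list of (author, feature_vector) pairs and `query` is a
    feature vector of the same length. The probability of an author is the
    share of the k nearest known documents written by that author.
    Hypothetical helper for illustration; not taken from the project.
    """
    if not known:
        raise ValueError("need at least one known document")
    # Rank known documents by Euclidean distance to the query document.
    ranked = sorted(known, key=lambda item: math.dist(item[1], query))
    nearest = ranked[:k]
    # Count how many of the nearest documents belong to each author.
    votes = Counter(author for author, _ in nearest)
    return {author: n / len(nearest) for author, n in votes.items()}
```

In this sketch each feature vector could hold numeric indicators such as average word length and average sentence length. Because those indicators have different scales, they would normally be normalized before distances are computed. A low probability for the claimed author would then suggest that the document deserves closer review.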

Duration of the project 
June 2015 to October 2015
Collaborators 
Technology 
NLP (natural language processing),
MySQL,
Python
Areas of expertise involved in the project 
Project Manager 
