Authorship detection to validate identity and prevent plagiarism in UOC internships

Duration of the project: June 2015 – October 2015

Description

This project applies natural language processing (NLP) techniques to identify an author’s writing patterns, so that the writing style of different documents can be compared. Each person has a distinctive writing style that identifies them and sets them apart from other people. In fact, when someone copies text verbatim from another source, such as an article found on the Internet or a book, it quickly becomes apparent that the writing style is not their own.

inLab has been in charge of developing the prototype for this project. For each document with a known author, the prototype collects a set of indicators such as word length, sentence length, vocabulary richness and word frequency, among others.
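As an illustration of what such indicators might look like in practice, the following is a minimal sketch that computes them for a single text. It is not the prototype's code: the function name `stylometric_features`, the `top_n` parameter, the simple regular-expression tokenisation and the type/token ratio used for vocabulary richness are all assumptions made for this example.

```python
import re
from collections import Counter


def stylometric_features(text, top_n=5):
    """Compute a few simple writing-style indicators for one document.

    Hypothetical helper for illustration only; the tokenisation rules
    and the choice of type/token ratio are assumptions.
    """
    # Split into sentences on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters (Unicode-aware), lower-cased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        raise ValueError("text must contain at least one word")

    counts = Counter(words)
    return {
        # Mean number of letters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Type/token ratio: distinct words divided by total words.
        "vocabulary_richness": len(counts) / len(words),
        # Most frequent words with their counts.
        "top_words": counts.most_common(top_n),
    }


sample_features = stylometric_features("The cat sat. The dog ran fast!")
```

Each known document would yield one such set of values, and the numeric ones can then be compared across documents.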

Given a new document of unknown authorship, the project will indicate the probability of plagiarism. To do so, a series of classification algorithms (Support Vector Machine, k-NN, etc.) assign, for each of the indicators, the probability that the document belongs to the author.
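To make the classification step more concrete, here is a minimal sketch of one of the algorithms mentioned, k-nearest neighbours, written in plain Python. It is not the prototype's implementation: the function `knn_author_probability`, the author labels and the feature values are invented for the example, and the probability is simply assumed to be the fraction of the k closest known documents written by the candidate author. A Support Vector Machine is not shown.

```python
import math


def knn_author_probability(known_docs, new_vector, author, k=3):
    """Estimate how likely it is that `author` wrote the new document.

    known_docs is a list of (feature_vector, author_label) pairs.
    Hypothetical helper for illustration only.
    """
    if not 1 <= k <= len(known_docs):
        raise ValueError("k must be between 1 and the number of known documents")
    # Keep the k known documents closest to the new one (Euclidean distance).
    nearest = sorted(known_docs, key=lambda doc: math.dist(doc[0], new_vector))[:k]
    # Fraction of those neighbours written by the candidate author.
    votes = sum(1 for _, label in nearest if label == author)
    return votes / k


# Invented feature vectors:
# (average word length, average sentence length, vocabulary richness)
known = [
    ((4.1, 12.0, 0.62), "alice"),
    ((4.3, 13.5, 0.60), "alice"),
    ((4.0, 11.0, 0.65), "alice"),
    ((5.6, 24.0, 0.48), "bob"),
    ((5.4, 22.5, 0.50), "bob"),
]

p_own_style = knn_author_probability(known, (4.2, 12.5, 0.61), "alice")
p_other_style = knn_author_probability(known, (5.5, 23.0, 0.49), "alice")
```

A low probability for the claimed author would flag the document for review. In a real setting the features would likely need to be scaled first, since sentence length dominates the distance in this toy data.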