Methods and solutions

Document authentication by delta2T

A text consists of semantic elements (words) structured by an author's turns of phrase (syntax). At first glance, the semantic signal seems to dominate a text. However, the syntactic signal largely predominates over the length of a text because it is more stable: it measures the stylistic choices a writer follows. Although text authorship can be determined by either semantic or syntactic methods, linguistic studies show that syntactic analyses are the most effective way to determine the author of a text. This type of analysis, however, takes time and requires specialized expertise in the language of the text.

Algorithmic approaches to text authorship differ from linguistic methods, which require knowledge of the language, in their speed of execution. The algorithms implemented in delta2T, the software application of OrphAnalytics, determine the characteristic profile of a document by systematically identifying the patterns used within and between words, and within and between sentences. This algorithmic approach essentially measures syntax: among the large number of patterns measured, syntactic signals dominate, and their stability strongly outweighs the semantic signals. By measuring an author's syntax, and thus the author's style, the delta2T strategy aligns with the effective strategy developed by linguists for text authentication.
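The idea of a profile built from patterns within and between words can be illustrated with a generic stylometric technique: relative frequencies of overlapping character n-grams. This is only a minimal sketch under stated assumptions. The algorithms actually used in delta2T are not described here, and the function name `ngram_profile` and the choice of n = 3 are hypothetical.

```python
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Return relative frequencies of overlapping character n-grams.

    Hypothetical illustration of a stylometric profile; not the delta2T method.
    """
    # Normalise case and whitespace so that page layout does not affect the profile.
    normalized = " ".join(text.lower().split())
    # Overlapping n-grams cross spaces and punctuation, so they capture
    # patterns both inside words and at the boundaries between them.
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}  # text shorter than n characters
    return {gram: count / total for gram, count in counts.items()}


profile = ngram_profile("The cat sat on the mat. The dog sat on the log.")
# Show the five most frequent trigrams of the sample sentence.
for gram, freq in sorted(profile.items(), key=lambda kv: -kv[1])[:5]:
    print(repr(gram), round(freq, 3))
```

Because such a profile is counted over every position in the text, frequent function-word and punctuation patterns tend to weigh more than rare content words, which is consistent with the dominance of syntactic signals described above.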

Comparing the stylometric profiles of texts attributed to an alleged author with delta2T makes it possible to rule objectively on their authorship: similar profiles indicate that the texts were most likely written by a single author. Finally, because the approach integrated into delta2T is systematic and therefore free of a priori assumptions, it works in all languages tested so far. It is currently the subject of a patent application filed on February 22, 2016.
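As a rough illustration of what comparing profiles can mean, the sketch below computes the cosine similarity between character-trigram counts of two texts. It is an assumption chosen for illustration, not the comparison implemented in delta2T. The helper names `trigram_counts` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def trigram_counts(text: str) -> Counter:
    """Count overlapping character trigrams (a hypothetical stand-in for a profile)."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors (1.0 = same direction)."""
    # Only trigrams present in both texts contribute to the dot product.
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one text is too short to yield a trigram
    return dot / (norm_a * norm_b)


reference = trigram_counts("It was a quiet morning, and the streets were empty.")
disputed = trigram_counts("It was a cold evening, and the streets were silent.")
print(round(cosine_similarity(reference, disputed), 3))
```

A higher value indicates more similar profiles. Any threshold for deciding that two texts share an author would have to be calibrated on reference texts of known authorship; none is suggested here.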

Specifically, the delta2T application, installed on an institution’s computer platform, works independently, without IT or linguistic supervision. The application processes documents confidentially and without a database: the analyzed texts are available only to the institution’s managers and are not stored unnecessarily.

In more detail, the text sequence prepared by the signer with the text extraction module of delta2T is sent to the second module of delta2T, installed on the institution's platform. Dedicated to stylometric analyses, this module then produces a dashboard summarizing the main results of the textual analyses, which is passed on to those responsible for assessing the document.

Compared to the optimal text authentication analyses developed by linguists, the stylometric approach of delta2T is disruptive because, in summary:

  • the texts to be authenticated are prepared by the signer;
  • the analysis application works without an operator;
  • the immediate results are sent to reviewers;
  • the analysis is summarized for a jury without linguistic training;
  • the textual analyses are applicable in all tested languages.

Our algorithmic approach to syntactic analysis makes it possible, for the first time, to introduce systematic prevention of intellectual fraud, and more precisely of ghostwriting. The delta2T algorithmic approach complements the linguistic expertise required in each disputed case.