Between October 28, 2017 and November 13, 2020, 4952 anonymous messages signed "Q" were published, first on 4chan and then on 8chan, which recently became 8kun. This corpus of messages is called QAnon, after the letter Q of the signature and the anonymous character of the messages. According to the QAnon movement that has grown around this corpus, the signatory is a U.S. official who holds a Q clearance giving him access to confidential information, and who anonymously publishes information intended to prepare the population for important political changes. To our knowledge, the QAnon corpus has not yet been subjected to a stylometric analysis, that is, an analysis capable of characterizing the style of a writer. An unsupervised machine learning approach based on multivariate statistical analysis lets text sequences cluster according to each author's own style.
The whole corpus was collected in order to test the proposal that a single writer is the sole author of the Q-drops that make up QAnon. To this end, texts that do not reflect the anonymous writer's personal syntax, such as texts that are too short, quotations, and speeches of historical figures, were discarded, and the remaining texts were analyzed in chronological order. The multivariate statistical analyses compared profiles of three-letter patterns to define the styles used in QAnon.
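The preparation described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function names, the `min_len` cutoff for "too short", and the choice of overlapping character trigrams with relative frequencies are all hypothetical. The 7,500-character block size mirrors the concatenate size reported in the results.

```python
from collections import Counter


def trigram_profile(text):
    """Relative frequency of each overlapping three-character sequence."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}


def concatenate_chronologically(drops, chunk_size=7500, min_len=50):
    """Join (timestamp, text) pairs in date order into blocks of about chunk_size characters.

    min_len is a hypothetical cutoff for texts considered too short.
    """
    blocks, current = [], ""
    for _, text in sorted(drops, key=lambda d: d[0]):
        if len(text) < min_len:
            continue  # skip texts assumed too short to carry a personal style
        current += text + " "
        if len(current) >= chunk_size:
            blocks.append(current)
            current = ""
    if current:
        blocks.append(current)  # keep the final, possibly shorter, block
    return blocks
```

Each block returned by `concatenate_chronologically` could then be passed to `trigram_profile` to obtain one profile per block for the multivariate comparison.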
Stylometric analysis of Q-drops concatenated in chronological order into blocks of 7.5k characters reveals two clusters, characteristic of two different styles, that correspond to the two periods of publication of the Q-drops on the 4chan and 8chan forums. This observation sheds light on the background information reported by media investigations. The signal is mostly carried by Q-drops of fewer than 1k characters, and clustering does not seem to interfere with the analyses. The other type of concatenation tested, concatenation by size, did not produce reasonable clusters. A success rate of more than 90% was calculated for the stylometric analysis of the chronologically ordered Q-drop concatenates by non-hierarchical clustering. This rate is comparable to the rate measured in a criminal case under investigation and to the rate obtained on texts from a solved case.
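As a rough illustration of how a non-hierarchical clustering and a success rate of this kind might be computed, the sketch below uses a plain two-cluster k-means. This is an assumption: the abstract does not name the clustering algorithm or define how the success rate was calculated. Here the rate is taken, hypothetically, as the share of blocks whose cluster matches their publication period (0 for 4chan, 1 for 8chan), and the input vectors stand in for three-letter pattern frequencies measured over a shared set of patterns.

```python
def kmeans_two(points, iterations=20):
    """Plain 2-means on equal-length numeric vectors.

    The seeds are the first and last points, which assumes the input
    is in chronological order (an illustrative choice).
    """
    centers = [list(points[0]), list(points[-1])]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearest center (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def success_rate(labels, periods):
    """Share of blocks whose cluster agrees with their period (0 or 1),
    taking the better of the two possible cluster-to-period mappings."""
    agree = sum(1 for lab, per in zip(labels, periods) if lab == per)
    return max(agree, len(labels) - agree) / len(labels)
```

Because cluster labels are arbitrary, `success_rate` scores both possible mappings between clusters and periods and keeps the higher one.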
Other algorithms are available that could refine the results already obtained. If texts from candidate writers are provided, this knowledge of the QAnon corpus would make it easier to recognize the styles of people likely to have written Q-drops.