Investigating a proposed standardised method for authorship attribution in L1/L2 English texts
Abstract
Authorship attribution is a method of revealing obscure or unknown individuals who may have played a part in the creation of texts (Kotze, 2007). Its purpose is both to test claims of authorship and to detect secret or anonymous authors. Yet, to the best of our knowledge, there is no standardised method for authorship attribution. The literature reveals a need for such a method to be devised and ratified by the courts so that forensic linguists may act as expert witnesses. Such a method needs to be both quantitative and qualitative in nature. The absence of such a method led to a mixed-method investigation into existing frames and methods of authorship attribution, with a view to proposing a method of enquiry that would uncover the unseen authors of, or contributors to, texts. A detailed and systematic literature review led to the identification of writing-style features and a classification technique for the proposed method. The research therefore set out to discover whether authorship attribution could include a consideration of T-units and cohesion markers, in addition to Chaski’s (2007: 133–146) existing language indicators of authorship, which are (i) end-of-sentence punctuation, (ii) internal structure of sentences, and (iii) average sentence length. Ultimately, the investigation sought to establish whether these two additional markers could be used for authorship profiling among selected first-language (L1) and second-language (L2) English speakers. The markers constituted the specific writing-style features of the various authors and were manually tagged in the chosen texts. Classification was supported by the WordSmith Tools software. The investigation analysed the linguistic evidence of eight English texts (four L1 and four L2) to test whether the method worked. The markers were identified in the texts, and graphs and tables were compiled to present the quantitative data.
From these data, linguistic deductions were made about the authors. The study found that a method for hidden authorship attribution is possible, but that its success depends on the combination of writing-style features selected and the nature of the text analysed. Average sentence length, average T-unit length and cohesion markers proved to be good indicators of authorship, but a more reliable classification technique is still needed. Nevertheless, the current method may be used successfully to analyse online texts of various lengths, provided that more than one text by the same author is available to serve as a basis for identifying these features.
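To make the quantitative side of these markers concrete, the following is a minimal sketch of how average sentence length, end-of-sentence punctuation and cohesion-marker frequency might be counted automatically. It is not the procedure used in the study, which relied on manual tagging supported by WordSmith Tools. The function name `style_profile`, the naive sentence splitter and the short `COHESION_MARKERS` list are all assumptions made for illustration. T-units are omitted because identifying them requires syntactic analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately short marker list for illustration only;
# the study's own inventory of cohesion markers is not given in the abstract.
COHESION_MARKERS = {"however", "therefore", "moreover", "furthermore", "thus"}


def style_profile(text):
    """Return simple writing-style counts for one text (illustrative sketch)."""
    # Naive segmentation: a sentence is any run of characters ending in ., ! or ?
    # Abbreviations such as "e.g." will be split incorrectly.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]+", text)]

    # Tokenise each sentence into alphabetic words (apostrophes allowed).
    words_per_sentence = [re.findall(r"[A-Za-z']+", s) for s in sentences]
    total_words = sum(len(words) for words in words_per_sentence)

    # Average sentence length in words; 0.0 when no sentence was found.
    avg_length = total_words / len(sentences) if sentences else 0.0

    # Frequency of each sentence-final punctuation mark.
    end_punctuation = Counter(s[-1] for s in sentences)

    # Number of word tokens that appear in the illustrative marker list.
    cohesion_count = sum(
        1
        for words in words_per_sentence
        for word in words
        if word.lower() in COHESION_MARKERS
    )

    return {
        "sentences": len(sentences),
        "avg_sentence_length": avg_length,
        "end_punctuation": dict(end_punctuation),
        "cohesion_markers": cohesion_count,
    }


if __name__ == "__main__":
    sample = "I went home. However, it rained! Why? Therefore we stayed in."
    print(style_profile(sample))
```

Profiles computed in this way for two or more texts by the same author could then be compared with the profile of a disputed text, in line with the finding that more than one known text per author is needed as a basis for comparison.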