Talk on generalizability of stylometry to darkweb forums
Abstract:
Stylometry on web forums enables researchers to track changes in a user's writing style across multiple posts and threads. Our prior work has shown that combining structural and textual features can help identify darkweb users who migrate across different forums. However, darkweb users often try to conceal their identities through pseudonyms and other obfuscation techniques. In this study, we investigate whether authorship identification models trained on clear web forums can be applied to darkweb forums. To accomplish this, we use Reddit data to model the clear web, since popular darkweb forums such as Dread are modeled on Reddit. We analyze whether authorship identification models trained on Reddit data can successfully identify authors on Dread and other darkweb forums present in the CrimeBB dataset. We also investigate how the amount and specificity of the training data affect the accuracy of our models. Finally, we compare the performance of fine-tuned clear web models to that of models trained on darkweb data alone. (Full list of authors: Pranav Maneriker, Yuntian He, Scott Duxbury, Dana Haynie, and Srinivasan Parthasarathy)
Link: https://www.cambridgecybercrime.uk/conference2023.html