Talk on generalizability of stylometry to darkweb forums presented at the Cambridge Cybercrime Conference 2023

pranavmaneriker/ccc_darkweb_stylometry

ccc_darkweb_stylometry

Talk on generalizability of stylometry to darkweb forums

Abstract:

Stylometry on web forums enables researchers to track changes in a user's writing style across multiple posts and threads. Our prior work has shown that combining structural and textual features can help identify darkweb users who migrate across different forums. However, on the darkweb, users often try to conceal their identities through pseudonyms and other obfuscation techniques. In this study, we investigate whether author identification models trained on clear web forums can be applied to darkweb forums. To accomplish this, we leverage Reddit data to model the clear web, as Reddit forms the basis of popular darkweb forums like Dread. We analyze whether authorship identification models trained on Reddit data can successfully identify authors on Dread and other darkweb forums present in the CrimeBB dataset. We also investigate how the amount of training data and its specificity affect the accuracy of our models. Finally, we compare the performance of fine-tuned clear web models to those trained on darkweb data alone. (Full list of authors: Pranav Maneriker, Yuntian He, Scott Duxbury, Dana Haynie, and Srinivasan Parthasarathy)
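To make the cross-forum setting concrete, here is a minimal sketch of stylometric attribution. It assumes character trigram counts and cosine similarity as the style signal, which is a deliberately simple stand-in and not the model used in the talk or in the authors' prior work. The function names (`char_ngram_profile`, `cosine_similarity`, `attribute`) and the toy posts are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile carries no style signal
    return dot / (norm_a * norm_b)


def attribute(query: str, known_posts: dict[str, list[str]]) -> str:
    """Return the candidate whose combined posts best match the query."""
    query_profile = char_ngram_profile(query)
    scores = {
        author: cosine_similarity(
            query_profile, char_ngram_profile(" ".join(posts))
        )
        for author, posts in known_posts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: profiles built from one forum, query from another.
known_posts = {
    "alice": ["cheers mate, i reckon the vendor is legit"],
    "bob": ["zzzz qqqq xxxx"],
}
best_match = attribute("cheers mate, i reckon", known_posts)
```

In this sketch, the candidate profiles would come from one domain (for example, Reddit posts) and the query from another (for example, a Dread post). A similarity baseline like this only ranks a fixed set of candidates; it does not address the question the talk studies, namely how well trained models transfer between clear web and darkweb forums.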

Link: https://www.cambridgecybercrime.uk/conference2023.html
