ccc_darkweb_stylometry

Talk on the generalizability of stylometry to darkweb forums

Abstract:

Stylometry on web forums enables researchers to track a user's writing style across multiple posts and threads. Our prior work has shown that combining structural and textual features can help identify darkweb users who migrate across different forums. However, darkweb users often try to conceal their identities through pseudonyms and other obfuscation techniques. In this study, we investigate whether authorship identification models trained on clear web forums can be applied to darkweb forums. To do so, we use Reddit data to model the clear web, since popular darkweb forums such as Dread are modeled on Reddit. We analyze whether authorship identification models trained on Reddit data can successfully identify authors on Dread and other darkweb forums in the CrimeBB dataset. We also investigate how the amount and specificity of the training data affect the accuracy of our models. Finally, we compare the performance of fine-tuned clear web models with that of models trained on darkweb data alone. (Full list of authors: Pranav Maneriker, Yuntian He, Scott Duxbury, Dana Haynie, and Srinivasan Parthasarathy)
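To make the idea of cross-forum authorship identification concrete, here is a minimal sketch of a classic stylometric baseline: character trigram profiles compared with cosine similarity. This is NOT the model used in this work (the abstract does not describe the architecture). The author names, posts, and the helper functions `char_ngrams`, `cosine`, `build_profiles`, and `attribute` are hypothetical toy illustrations. Profiles are built from "source" (clear-web-like) posts, and an unseen "target" (darkweb-like) post is attributed to the closest profile. Only textual features are used here; structural features are omitted.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys without inserting them.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(posts_by_author: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Aggregate each author's source-domain posts into one n-gram profile."""
    profiles = {}
    for author, posts in posts_by_author.items():
        profile = Counter()
        for post in posts:
            profile.update(char_ngrams(post, n))
        profiles[author] = profile
    return profiles


def attribute(post: str, profiles: dict[str, Counter], n: int = 3) -> tuple[str, float]:
    """Return the author whose profile is most similar to the post, plus the score."""
    query = char_ngrams(post, n)
    best = max(profiles, key=lambda author: cosine(query, profiles[author]))
    return best, cosine(query, profiles[best])


# Hypothetical toy data: "source" posts stand in for a clear web forum.
source_posts = {
    "alice": ["lol idk man, gonna check it tmrw lol", "idk tbh, prolly fine lol"],
    "bob": [
        "Please verify the PGP signature before ordering.",
        "Always verify the vendor's PGP key before payment.",
    ],
}
profiles = build_profiles(source_posts)

# A "target" post stands in for an unseen darkweb forum post.
author, score = attribute("Verify the PGP key before you pay, please.", profiles)
print(author, round(score, 3))
```

A real cross-forum evaluation would hold out many target-domain posts per author and report accuracy, and would likely use learned representations rather than raw n-gram counts; this sketch only shows the shape of the train-on-one-forum, test-on-another setup.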

Link: https://www.cambridgecybercrime.uk/conference2023.html

Repository contents:

data/
evaluation/
models/
train/
.gitignore
.gitmodules
LICENSE
Pipfile
Pipfile.lock
README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ccc_darkweb_stylometry

About

Releases

Packages

Languages

License

pranavmaneriker/ccc_darkweb_stylometry

Folders and files

Latest commit

History

Repository files navigation

ccc_darkweb_stylometry

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages