SDLib: A Python library that collects shilling detection methods for academic research.
- 1.Configure the **xx.conf** file in the directory named config. (xx is the name of the method you want to run)
- 2.Run **main.py** in the project, then enter your choices at the prompt.
Entry | Example | Description |
---|---|---|
ratings | ../dataset/averageattack/ratings.txt | Set the path to the dirty recommendation dataset. Format: one record per row, with fields separated by a space, tab, or comma. |
label | ../dataset/averageattack/labels.txt | Set the path to the user labels. Format: one record per row, with fields separated by a space, tab, or comma. |
ratings.setup | -columns 0 1 2 | -columns: the (user, item, rating) columns of the rating data to use; -header: skip the first (header) line when reading data |
MethodName | DegreeSAD/PCASelect/etc. | The name of the detection method |
evaluation.setup | -testSet ../dataset/testset.txt | Main option: -testSet, -ap, or -cv. -testSet path/to/test/file: specify the test set manually. -ap ratio: automatically partition the user set (including items and ratings) into a training set and a test set, where ratio is the share assigned to the test set, e.g. -ap 0.2. -cv k: cross validation, where k is the number of folds, e.g. -cv 5. |
output.setup | on -dir ./Results/ | Main option: whether to output recommendation results. -dir path: the directory to which the output results are written. |
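
The `*.setup` entries above all follow the same shape: an optional main option followed by `-flag` arguments. The sketch below shows one way such a string could be split, and how a `-ap` ratio might translate into a user partition. `parse_setup` and `partition_users` are hypothetical helpers written for illustration, not SDLib functions, and the shuffling and rounding rules are assumptions.

```python
import random


def parse_setup(value):
    """Split a setup string into (main option, {flag: [arguments]})."""
    main, options, current = [], {}, None
    for token in value.split():
        # Treat "-name" as a flag; "-" followed by a digit is kept as a value.
        if token.startswith("-") and token[1:2].isalpha():
            current = token
            options[current] = []
        elif current is None:
            main.append(token)      # tokens before the first flag
        else:
            options[current].append(token)
    return " ".join(main), options


def partition_users(users, test_ratio, seed=0):
    """Shuffle users and hold out a test_ratio share (assumed -ap behaviour)."""
    shuffled = sorted(users)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


main, opts = parse_setup("on -dir ./Results/")
print(main, opts)

_, eval_opts = parse_setup("-ap 0.2")
train, test = partition_users(range(10), float(eval_opts["-ap"][0]))
print(len(train), len(test))
```

A flag with no arguments, such as `-header`, simply maps to an empty list, so its presence can be checked with `"-header" in options`.
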
- 1.Make your new algorithm inherit from the appropriate base class.
- 2.Override the following functions as needed.
- printAlgorConfig()
- initModel()
- buildModel()
- saveModel()
- loadModel()
- predict()
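
The two steps above can be sketched as follows. `BaseDetector` is a hypothetical stand-in defined here so the example runs on its own; the real base class, its constructor, and the order in which it calls these hooks may differ. `RatingCountDetector` is a toy heuristic used only to show which methods get overridden, not a detection method from the library.

```python
from collections import Counter


class BaseDetector:
    """Hypothetical stand-in for the base class; the real one may differ."""

    def __init__(self, ratings):
        self.ratings = ratings  # list of (user, item, rating) tuples

    def printAlgorConfig(self):
        pass

    def initModel(self):
        pass

    def buildModel(self):
        pass

    def saveModel(self):
        pass

    def loadModel(self):
        pass

    def predict(self):
        return {}

    def execute(self):
        # Assumed call order, for this sketch only.
        self.printAlgorConfig()
        self.initModel()
        self.buildModel()
        return self.predict()


class RatingCountDetector(BaseDetector):
    """Toy example: flag users who rated more than max_ratings items."""

    def __init__(self, ratings, max_ratings):
        super().__init__(ratings)
        self.max_ratings = max_ratings

    def printAlgorConfig(self):
        print("RatingCountDetector: max_ratings =", self.max_ratings)

    def initModel(self):
        self.counts = Counter()

    def buildModel(self):
        for user, _item, _rating in self.ratings:
            self.counts[user] += 1

    def predict(self):
        # 1 marks a suspected spammer, 0 a user treated as genuine.
        return {u: int(c > self.max_ratings) for u, c in self.counts.items()}


ratings = [("u1", "i1", 4.0), ("u1", "i2", 5.0), ("u1", "i3", 5.0), ("u2", "i1", 3.0)]
predictions = RatingCountDetector(ratings, max_ratings=2).execute()
print(predictions)
```

Only the hooks the new method actually needs are overridden; `saveModel()` and `loadModel()` keep the inherited no-op behaviour here.
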
- 1.Configure the **xx.conf** file in shillingmodels/config/.
- 2.Modify /shillingmodels/generateData.py as needed and run it.
Entry | Example | Description |
---|---|---|
ratings | ../dataset/averageattack/ratings.txt | Set the path to the recommendation dataset. Format: one record per row, with fields separated by a space, tab, or comma. |
ratings.setup | -columns 0 1 2 | -columns: the (user, item, rating) columns of the rating data to use; -header: skip the first (header) line when reading data |
attackSize | 0.01 | The ratio of the injected spammers to genuine users |
fillerSize | 0.01 | The ratio of the filler items to all items |
selectedSize | 0.001 | The ratio of the selected items to all items |
targetCount | 20 | The count of the targeted items |
targetScore | 5.0 | The score given to the target items |
threshold | 3.0 | An item with an average score lower than threshold may be chosen as one of the target items |
minCount | 3 | An item with a rating count larger than minCount may be chosen as one of the target items |
maxCount | 50 | An item with a rating count smaller than maxCount may be chosen as one of the target items |
outputDir | ./data/ | User profiles and labels will be output here |
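
To make the ratio and filter entries above concrete, the sketch below turns them into counts and a list of candidate target items. `attack_plan` is a hypothetical helper, not part of shillingmodels, and rounding to the nearest integer is an assumption about how the ratios become counts.

```python
from collections import defaultdict


def attack_plan(ratings, attack_size, filler_size, selected_size,
                threshold, min_count, max_count):
    """Derive profile counts and candidate target items from the settings."""
    users = {user for user, _item, _score in ratings}
    scores_by_item = defaultdict(list)
    for _user, item, score in ratings:
        scores_by_item[item].append(score)

    n_items = len(scores_by_item)
    # An item qualifies when its average score is below threshold and its
    # rating count lies strictly between min_count and max_count.
    candidates = sorted(
        item for item, scores in scores_by_item.items()
        if sum(scores) / len(scores) < threshold
        and min_count < len(scores) < max_count
    )
    return {
        "spammers": round(len(users) * attack_size),   # ratio to genuine users
        "filler_items": round(n_items * filler_size),  # ratio to all items
        "selected_items": round(n_items * selected_size),
        "target_candidates": candidates,
    }


# Toy data: 200 users, 100 items, two ratings of 4.0 per item ...
ratings = [(f"u{u}", f"i{u % 100}", 4.0) for u in range(200)]
# ... plus five low ratings that make item i0 unpopular.
ratings += [(f"u{u}", "i0", 1.0) for u in range(10, 15)]

plan = attack_plan(ratings, attack_size=0.05, filler_size=0.1,
                   selected_size=0.02, threshold=3.0, min_count=3, max_count=50)
print(plan)
```

With `targetCount` set, a generator would then pick that many items from the candidates and give each the `targetScore` rating in every injected profile.
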
Algorithm | Paper |
---|---|
DegreeSAD | Li Wentao et al., A shilling attack detection algorithm based on popularity classification features, Acta Automatica Sinica |