SSFGM

This repository contains the code for the Semi-Supervised Factor Graph Model (SSFGM).

If you have any problems with the code or the data format, please contact the author at yujieq@csail.mit.edu.

Codes

In the ./py folder, we present our Python implementation of the Two-Chain Sampling (TCS) algorithm, based on the TensorFlow framework (newest version).

In the ./cpp folder, we present our C++ implementation of Loopy Belief Propagation (LBP), SampleRank, and TCS with Metropolis-Hastings sampling.

The ./misc folder contains additional scripts for baseline methods and evaluation.

Please see each folder for details.

Datasets

Please download the preprocessed feature files using the following links:

We cannot release the raw data of the Twitter datasets due to some limitations. The original data for Weibo and Facebook can be found at:

Weibo: https://aminer.org/influencelocality

Facebook: http://snap.stanford.edu/data/egonets-Facebook.html

Data Format

A training file consists of two parts: nodes and edges.

The first part lists the nodes. Each line represents a node (instance), and the format is defined as follows:

[+/*/?]label featname_1:val featname_2:val ... [#id]

where +, *, and ? stand for training, validation, and testing data, respectively. Labels and feature names can be strings (length<32). Feature values can be real-valued or 0/1. We suggest normalizing the input features to [0,1].

The second part lists the edges. Each line represents an edge (a correlation between two instances). The format is:
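As an illustration of the node format and the normalization suggestion above, here is a minimal Python sketch that min-max scales features to [0,1] and writes one node line. It is not part of the repository: the helper names `min_max_normalize` and `format_node_line`, the split keys `train`/`valid`/`test`, and the treatment of the optional id as a trailing `#`-prefixed token are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Assumed mapping from split name to the leading marker character.
SPLIT_MARKERS = {"train": "+", "valid": "*", "test": "?"}


def min_max_normalize(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Rescale each feature to [0, 1] over the rows where it appears.

    A feature with a single distinct value is mapped to 0.0.
    """
    names = {name for row in rows for name in row}
    lo = {n: min(r[n] for r in rows if n in r) for n in names}
    hi = {n: max(r[n] for r in rows if n in r) for n in names}
    scaled_rows = []
    for row in rows:
        scaled = {}
        for name, value in row.items():
            span = hi[name] - lo[name]
            scaled[name] = (value - lo[name]) / span if span > 0 else 0.0
        scaled_rows.append(scaled)
    return scaled_rows


def format_node_line(split: str, label: str, features: Dict[str, float],
                     node_id: Optional[str] = None) -> str:
    """Build one node line: marker+label, name:value pairs, optional #id."""
    parts = [SPLIT_MARKERS[split] + label]
    parts += [f"{name}:{value:g}" for name, value in features.items()]
    if node_id is not None:
        parts.append(f"#{node_id}")  # assumed form of the optional id
    return " ".join(parts)
```

For example, `format_node_line("train", "1", {"age": 0.5}, "u1")` produces `+1 age:0.5 #u1`. Whether the training code expects every node to list every feature is not stated here, so check the per-folder documentation before relying on sparse rows.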

#edge line_a line_b edgetype

where line_a and line_b refer to two nodes in the first part, with lines counted starting from 0. edgetype is a string indicating the type of this edge. Currently the code only supports one edge type.
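The two-part layout described above can be read with a short parser. The following Python sketch is an assumption-laden illustration, not the loader used in ./py or ./cpp: the `Node` class and `parse_ssfgm` function are hypothetical names, blank lines are skipped, the optional id is taken to be a trailing `#`-prefixed token, and edge indices are interpreted as 0-based positions in the node list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed mapping from the leading marker to a split name.
SPLITS = {"+": "train", "*": "valid", "?": "test"}

Edge = Tuple[int, int, str]  # (line_a, line_b, edgetype)


@dataclass
class Node:
    split: str
    label: str
    features: Dict[str, float] = field(default_factory=dict)
    node_id: Optional[str] = None


def parse_ssfgm(text: str) -> Tuple[List[Node], List[Edge]]:
    """Parse node lines and '#edge' lines from the text of a training file."""
    nodes: List[Node] = []
    edges: List[Edge] = []
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # assumption: blank lines carry no meaning
        if tokens[0] == "#edge":
            if len(tokens) != 4:
                raise ValueError(f"malformed edge line: {raw!r}")
            edges.append((int(tokens[1]), int(tokens[2]), tokens[3]))
            continue
        head = tokens[0]
        if head[0] not in SPLITS:
            raise ValueError(f"unknown split marker in line: {raw!r}")
        node = Node(split=SPLITS[head[0]], label=head[1:])
        for token in tokens[1:]:
            if token.startswith("#"):
                node.node_id = token[1:]  # assumed form of the optional id
            else:
                # Split on the last ':' so the value is always the final part.
                name, _, value = token.rpartition(":")
                node.features[name] = float(value)
        nodes.append(node)
    # Edge endpoints must point at existing node lines (0-based).
    for a, b, _ in edges:
        if not (0 <= a < len(nodes) and 0 <= b < len(nodes)):
            raise ValueError(f"edge ({a}, {b}) refers to a missing node")
    return nodes, edges
```

Calling `parse_ssfgm(open(path).read())` on a training file returns the node list and the edge list, which makes it easy to sanity-check a file (split sizes, dangling edges) before passing it to the actual training code.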