SSFGM

This repository contains the code for the Semi-Supervised Factor Graph Model (SSFGM).

If you have any problems with the code or the data format, please contact the author at yujieq@csail.mit.edu.

Codes

The ./py folder contains our Python implementation of the Two-Chain Sampling (TCS) algorithm, built on the TensorFlow framework. This is the newest version.

The ./cpp folder contains our C++ implementations of Loopy Belief Propagation (LBP), SampleRank, and TCS with Metropolis-Hastings sampling.

The ./misc folder contains additional scripts for baseline methods and evaluation.

Please see each folder for details.

Datasets

Please download the preprocessed feature files using the following links:

Twitter (World) (2.2G)
Twitter (USA) (139M)
Weibo (616M)

We cannot release the raw data of the Twitter datasets due to some limitations. The original data for Weibo and Facebook can be found at:

Weibo: https://aminer.org/influencelocality

Facebook: http://snap.stanford.edu/data/egonets-Facebook.html

Data Format

A training file consists of two parts: nodes and edges.

The first part lists the nodes. Each line represents a node (instance), in the following format:

[+/*/?]label featname_1:val featname_2:val ... [#id]

where +, *, and ? mark training, validation, and testing data, respectively. Labels and feature names can be strings (length < 32). Values can be real numbers or 0/1. We suggest normalizing the input features to [0,1].
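As an illustration of the node format, here is a minimal Python sketch of a parser for a single node line. It is not part of the repository: the names `Node`, `SPLITS`, and `parse_node_line` are hypothetical. It assumes that the split marker is attached directly to the label (for example `+1`), that tokens are separated by whitespace, and that the trailing `#id` is optional.

```python
from typing import Dict, NamedTuple, Optional

# Mapping from the leading marker to the data split (per the format description).
SPLITS = {"+": "train", "*": "valid", "?": "test"}


class Node(NamedTuple):
    """Hypothetical container for one parsed node line."""
    split: str
    label: str
    features: Dict[str, float]
    node_id: Optional[str]


def parse_node_line(line: str) -> Node:
    """Parse a line such as '+1 age:0.5 degree:0.25 #42'.

    Assumption: the marker is glued to the label and '#id' is optional.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty node line")

    head = tokens[0]
    if len(head) < 2 or head[0] not in SPLITS:
        raise ValueError(f"expected +, * or ? followed by a label, got {head!r}")
    split, label = SPLITS[head[0]], head[1:]

    features: Dict[str, float] = {}
    node_id: Optional[str] = None
    for tok in tokens[1:]:
        if tok.startswith("#"):
            # Optional trailing identifier.
            node_id = tok[1:]
            continue
        # Split on the last ':' so that the value is always the final field.
        name, _, value = tok.rpartition(":")
        if not name:
            raise ValueError(f"malformed feature token {tok!r}")
        features[name] = float(value)

    return Node(split, label, features, node_id)
```

For example, `parse_node_line("?NY degree:0.3")` would yield a testing node with label `NY`, one feature, and no id.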

The second part lists the edges. Each line represents an edge (a correlation between two instances), in the following format:

#edge line_a line_b edgetype

where line_a and line_b refer to two nodes in the first part, identified by line number counted from 0. edgetype is a string indicating the type of the edge. Currently the code supports only one edge type.
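The edge format can be checked in a similar way. The sketch below is an illustration only, not code from the repository: `Edge` and `parse_edge_line` are hypothetical names, and it assumes whitespace-separated tokens and that `num_nodes` is the number of node lines in the first part of the file.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical container for one parsed edge line."""
    a: int
    b: int
    edge_type: str


def parse_edge_line(line: str, num_nodes: int) -> Edge:
    """Parse a line such as '#edge 0 2 friend'.

    line_a and line_b are 0-based indices into the node part of the file,
    so both must lie in the range [0, num_nodes).
    """
    tokens = line.split()
    if len(tokens) != 4 or tokens[0] != "#edge":
        raise ValueError(f"malformed edge line {line!r}")

    a, b = int(tokens[1]), int(tokens[2])
    for idx in (a, b):
        if not 0 <= idx < num_nodes:
            raise ValueError(f"node index {idx} out of range [0, {num_nodes})")

    return Edge(a, b, tokens[3])
```

Validating the indices while reading catches the common off-by-one mistake of counting node lines from 1 instead of 0.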
