This repository contains a PyTorch implementation of a network that combines code2vec: Learning Distributed Representations of Code and code2seq: Generating Sequences from Structured Representations of Code.
The implementation is based on bentrevett's PyTorch code2vec implementation (https://github.com/bentrevett/code2vec).
It adds the LSTM path encoding from code2seq and softmax label classification.
- Python 3+
- PyTorch
- A CUDA-compatible GPU
- CometML
Run
```
./download_preprocess.sh
```
to get the datasets from the code2seq paper, then
```
./preprocess.sh
```
to create the necessary dictionary and format the data (this also saves the files with the suffix '.c2c'). Finally, run
```
python run.py
```
We have a training, testing and validation file, where:
- Each row is an example.
- Each example is a space-delimited list of fields, where:
- The first field is the target label, internally delimited by the "|" character
- Each of the following fields is a context, where each context has three components separated by commas (","). None of these components can contain spaces or commas.
We refer to these three components as a token, a path, and another token, but in general other types of ternary contexts can be considered.
Each token is a token in the code.
Each path is a path between two tokens, split into path nodes (or other kinds of building blocks) using the "|" character.
One example would look like:
<label-1>|...|<label-n> <context-1> ... <context-m>
Where each context is:
<left-token>,<path-node-1>|...|<path-node-p>,<right-token>
Here, <left-token> and <right-token> are tokens, and <path-node-1>|...|<path-node-p> is the syntactic path that connects them.
One row/example in a file could look like:
target1|target2 token1,path|that|leads|to,token2 token3,another|path,token2
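As a minimal sketch of the format above, a single row can be parsed like this (the function and variable names are illustrative, not part of the repository's code):

```python
def parse_example(line):
    """Split one space-delimited .c2c example into its labels and contexts."""
    fields = line.strip().split(" ")
    labels = fields[0].split("|")  # target label, internally "|"-delimited
    contexts = []
    for field in fields[1:]:
        # each context is <left-token>,<path>,<right-token>
        left_token, path, right_token = field.split(",")
        contexts.append((left_token, path.split("|"), right_token))
    return labels, contexts

labels, contexts = parse_example(
    "target1|target2 token1,path|that|leads|to,token2 token3,another|path,token2"
)
print(labels)       # ['target1', 'target2']
print(contexts[0])  # ('token1', ['path', 'that', 'leads', 'to'], 'token2')
```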
The examples are split up into 3 files:
<data_dir>/<data>/<data>.train.c2c
<data_dir>/<data>/<data>.test.c2c
<data_dir>/<data>/<data>.val.c2c
A dictionary (<data_dir>/<data>/<data>.dict.c2c) is also required. It is created by running ./preprocess.sh
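The exact layout of the dictionary file written by ./preprocess.sh is not specified here; as an illustration of the kind of vocabulary such a dictionary provides, a minimal token-to-index builder over the .c2c format might look like this (build_vocab and min_count are our own names):

```python
from collections import Counter

def build_vocab(lines, min_count=1):
    """Count every sub-label, token, and path node, then assign indices."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(" ")
        counts.update(fields[0].split("|"))      # target sub-labels
        for context in fields[1:]:
            left, path, right = context.split(",")
            counts.update([left, right])         # the two context tokens
            counts.update(path.split("|"))       # individual path nodes
    kept = [tok for tok, c in counts.most_common() if c >= min_count]
    return {tok: idx + 1 for idx, tok in enumerate(kept)}  # index 0 reserved for padding

vocab = build_vocab(["target1|target2 token1,path|that|leads|to,token2"])
```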