
This is the official PyTorch implementation of the approaches proposed in:

Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in Transformer for Code Representation”

which appeared at NeurIPS 2021 [Paper Link].

In this paper, we investigate the interaction between absolute and relative path encodings, and propose a novel code representation model, TPTrans, together with its variants. These models introduce a path-encoding inductive bias into the attention module of the Transformer, enabling the Transformer to capture the structure of source code.

Please cite our paper if you use the model, experimental results, or our code in your own work.

1.1 Raw data

To run experiments with TPTrans and its variants, first create the datasets from the raw code snippets of the CodeSearchNet (CSN) dataset. Download and unzip the raw jsonl data of CSN into the raw_data dir as follows:

├── raw_data        
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript
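
Before parsing, it can help to confirm that the layout above is in place. The following is a minimal sketch and not part of this repository: the helper name `count_records` is hypothetical, and it assumes only the `raw_data/<language>/<split>/*.jsonl` layout shown above, with one JSON object per line.

```python
import json
from pathlib import Path

LANGUAGES = ("python", "ruby", "go", "javascript")
SPLITS = ("train", "valid", "test")


def count_records(raw_root):
    """Count JSON lines per (language, split) under raw_root.

    Hypothetical helper: assumes raw_root/<language>/<split>/*.jsonl.
    A missing split directory is reported with a count of 0.
    """
    counts = {}
    for lang in LANGUAGES:
        for split in SPLITS:
            split_dir = Path(raw_root) / lang / split
            total = 0
            if split_dir.is_dir():
                for path in sorted(split_dir.glob("*.jsonl")):
                    with path.open(encoding="utf-8") as handle:
                        for line in handle:
                            if line.strip():
                                json.loads(line)  # raises on a malformed record
                                total += 1
            counts[(lang, split)] = total
    return counts


if __name__ == "__main__":
    for (lang, split), n in count_records("raw_data").items():
        print(f"{lang:<12}{split:<8}{n}")
```

A count of 0 for any language or split suggests that the archive was unzipped into a different directory than the one expected here.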

1.2 Tree-Sitter

Tree-sitter is an open-source parser that supports multiple programming languages. Please install it, then download the grammar files for the four programming languages into the vendor dir as follows:

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)

After that, run multi_language_parse.py in the parser dir to parse the raw code snippets into the data dir.
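
As a rough illustration of what the grammar setup involves, the sketch below checks the vendor layout and compiles the four grammars into one shared library. It is not taken from this repository, and multi_language_parse.py may already perform this step itself. The helper names and the output path are hypothetical, and `Language.build_library` is only available in py-tree-sitter releases older than 0.22, so this assumes such a version is installed.

```python
from pathlib import Path

GRAMMARS = ("python", "javascript", "go", "ruby")


def missing_grammars(vendor_root="vendor"):
    """Return the grammar directories that are absent under vendor_root."""
    root = Path(vendor_root)
    return [
        f"tree-sitter-{name}"
        for name in GRAMMARS
        if not (root / f"tree-sitter-{name}").is_dir()
    ]


def build_languages(vendor_root="vendor", output="build/my-languages.so"):
    """Compile the four grammars into one shared library.

    Assumption: py-tree-sitter older than 0.22, where
    Language.build_library still exists. The output path is
    illustrative, not taken from the repository.
    """
    from tree_sitter import Language  # imported lazily: optional dependency

    missing = missing_grammars(vendor_root)
    if missing:
        raise FileNotFoundError(f"missing grammar directories: {missing}")
    # Make sure the target directory exists before compiling.
    Path(output).parent.mkdir(parents=True, exist_ok=True)
    repo_paths = [str(Path(vendor_root) / f"tree-sitter-{n}") for n in GRAMMARS]
    Language.build_library(output, repo_paths)
    return output
```

Calling `missing_grammars()` from the repository root before parsing gives a quick list of any grammar that still needs to be downloaded.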

1.3 Training

After preprocessing, run __main__.py to train the model.

To run TPTrans, specify relation_path=True and absolute_path=False.

To run TPTrans-α, specify relation_path=True and absolute_path=True.
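
The two documented flag combinations can be summarized in a small lookup. This is only an illustrative sketch: the function name `variant_name` is hypothetical and does not reflect how the training script reads its options. Combinations not described in this README are deliberately left unlabeled rather than guessed.

```python
def variant_name(relation_path, absolute_path):
    """Map the two flags to the variant named in this README.

    Only the two documented combinations are named; any other
    combination is reported as "not documented".
    """
    documented = {
        (True, False): "TPTrans",
        (True, True): "TPTrans-α",
    }
    key = (bool(relation_path), bool(absolute_path))
    return documented.get(key, "not documented")


if __name__ == "__main__":
    for flags in [(True, False), (True, True), (False, False)]:
        print(flags, "->", variant_name(*flags))
```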

For other command-line options, please refer to the inline comments for details.

Contact

If you have any questions, please contact me via email at phan@pku.edu.cn or open an issue on GitHub.
