kurpicz/tcc

Text Corpus Collection (tcc)

This is a work in progress!

What is it?

This project provides simple tools to obtain (popular) text corpora that are used for benchmarks and tests.

What is it not?

We do not host any of the corpora. We just provide an easy way to get and/or compute them. Please visit the websites of the corpora for further information.

What is contained?

How to use it?

Run make download to download all files listed in the download configs, make random to generate random strings as defined in the config, and make processing to build all preprocessing tools.
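As a minimal sketch, the three targets can be chained from the repository root. The helper name tcc_setup and the MAKE override are hypothetical and not part of this project. Running the targets in this order is also an assumption, since they may well be independent of each other.

```shell
# Hypothetical helper, not part of tcc: runs the three documented targets
# in sequence and stops at the first one that fails.
# MAKE can be overridden (e.g. MAKE=gmake); it defaults to plain make.
MAKE="${MAKE:-make}"

tcc_setup() {
    "$MAKE" download &&    # fetch all files listed in the download configs
    "$MAKE" random &&      # generate random strings as defined in the config
    "$MAKE" processing     # build the preprocessing tools
}
```

After sourcing the snippet, call tcc_setup from the directory that contains the Makefile. Each target can still be invoked on its own when only one step is needed.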