This program collects 80 variables about a ticker by scraping Finviz and Stocktwits. It splits the input across multiple threads, making it possible to scrape thousands of tickers per run. The number of threads can be changed via the THREADCOUNT constant.
It uses htmlunit and minimal-json to fetch the pages and parse the JSON. I recommend htmlunit version 2.46, because I had problems with the newer versions. Jar links: htmlunit version 2.46 and minimal-json
Check the demo method in Main for a working example.
For scraping Finviz data you have to use the Finviz constructor, which takes a list of tickers (strings, not case sensitive). Then run it directly or start it on a thread; once it finishes, you can access Finviz.data, which contains a list of Ticker objects. Each Ticker has a public HashMap holding all the data scraped from Finviz: the 72 variables in the main table, plus the industry and sector of the company.
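For example, a minimal sketch of the Finviz flow might look like the following. It assumes Finviz implements Runnable and that the public HashMap field is named `data` with Finviz's column names as keys; adjust the names to match the actual classes.

```java
import java.util.Arrays;
import java.util.List;

public class FinvizDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> tickers = Arrays.asList("AAPL", "msft", "TSLA"); // not case sensitive

        Finviz finviz = new Finviz(tickers);

        // Either run it on the current thread...
        // finviz.run();
        // ...or start it on its own thread and wait for it to finish:
        Thread thread = new Thread(finviz);
        thread.start();
        thread.join();

        for (Ticker t : Finviz.data) {
            // Each Ticker exposes a public HashMap with all scraped values;
            // the field name "data" and the key names here are assumptions.
            System.out.println(t.data.get("Sector") + " / " + t.data.get("Industry"));
        }
    }
}
```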
Similarly to Finviz, you have to create a StocktwitsScraper instance, and after running it you can access StocktwitsScraper.result, which contains a list of Data objects. Each Data object holds the information scraped from Stocktwits about that ticker (see the sketch after this list). It consists of the following:
- String ticker - the ticker of the company
- boolean found - whether stocktwits keeps track of the company or not
- int trending - 0 if the ticker is trending, 1 if it is not, 2 if this data could not be found
- float trendingScore - set to Variables.undefinedFloat if the value could not be found
- float msgVolume - set to Variables.undefinedFloat if the value could not be found
- int followers - number of followers
- float sentiment
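A minimal sketch of the Stocktwits flow, assuming the constructor mirrors Finviz's (a list of tickers) and that StocktwitsScraper implements Runnable:

```java
import java.util.Arrays;

public class StocktwitsDemo {
    public static void main(String[] args) throws InterruptedException {
        // Constructor shape is assumed to mirror Finviz (a list of tickers).
        StocktwitsScraper scraper = new StocktwitsScraper(Arrays.asList("AAPL", "TSLA"));

        Thread thread = new Thread(scraper);
        thread.start();
        thread.join(); // wait for scraping to finish

        for (Data d : StocktwitsScraper.result) {
            if (!d.found) {
                continue; // Stocktwits does not track this ticker
            }
            // Values the scraper could not find are set to Variables.undefinedFloat
            if (d.msgVolume != Variables.undefinedFloat) {
                System.out.println(d.ticker + " message volume: " + d.msgVolume);
            }
            System.out.println(d.ticker + " followers: " + d.followers
                    + ", sentiment: " + d.sentiment);
        }
    }
}
```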
Originally I wrote this program to run periodically on a Raspberry Pi before market open (which is why it was important to keep it lightweight). For that purpose there is an appendGoodFile function, which takes the list of Data objects and appends them to an existing CSV file. Each run adds a new column, marked with the date in the first row. If you want to start a new file, create it and copy in the list of tickers you will scrape; this becomes the first column. After that, the data is recorded into that file on each run.