[OUTDATED] scrapy spiders to crawl the financial text data 📚 📜 pertinent to train word vectors 🚀

hardikp/scrapy-finance

scrapy-finance

Scrapy spiders that crawl financial text data for training word vectors.

List of sources

  • Bloomberg
  • Investopedia
  • qplum
  • Wikipedia

How to use this

  1. Install scrapy.
     pip3 install scrapy
  2. Run the scrapy crawl command from the repository root.
     (py3) hardik@shire:~/scrapy-finance$ scrapy crawl bloomberg
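
To crawl every source in one go, the crawl command above can be wrapped in a small script. This is only a sketch: it assumes each spider's name matches its file name under text/spiders (only `bloomberg` is confirmed by the command above), and `crawl_command` and `run_all` are hypothetical helper names, not part of this repository.

```python
import subprocess

# Assumption: spider names mirror the file names in text/spiders/.
SPIDERS = ["bloomberg", "investopedia", "qplum", "wikipedia"]


def crawl_command(spider_name):
    """Build the argument list for `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_all(spiders=SPIDERS, cwd="."):
    """Run each spider in turn and return the names of those that failed.

    `cwd` should point at the repository root (the folder with scrapy.cfg).
    """
    failed = []
    for name in spiders:
        result = subprocess.run(crawl_command(name), cwd=cwd)
        if result.returncode != 0:
            failed.append(name)
    return failed
```

Calling `run_all()` from the repository root runs the spiders one after another and reports any that exited with a non-zero status.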

How to modify spiders for your use

To adapt a spider, start from one of the spider files under text/spiders, such as wikipedia.py. They are relatively easy to follow and modify.

.
├── LICENSE
├── README.md
├── scrapy.cfg
└── text
    ├── __init__.py
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders
        ├── bloomberg.py
        ├── __init__.py
        ├── investopedia.py
        ├── qplum.py
        └── wikipedia.py

Notes

  • All spiders currently write the text data in lower case.
  • This has not been tested with Python 2.
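
If you would rather control the lower-casing in one place than in each spider, one option is a Scrapy item pipeline. The sketch below is an assumption about how that could look, not the repository's current code: `LowercaseTextPipeline` is a hypothetical class, and it assumes items are dict-like with a `text` field.

```python
import re


class LowercaseTextPipeline:
    """Hypothetical item pipeline: tidies whitespace and lower-cases text."""

    def process_item(self, item, spider):
        # Assumption: scraped items expose their content under "text".
        text = item.get("text", "")
        # Collapse runs of whitespace, trim the ends, then lower-case.
        item["text"] = re.sub(r"\s+", " ", text).strip().lower()
        return item
```

A pipeline like this would live in text/pipelines.py and be enabled through the `ITEM_PIPELINES` setting in text/settings.py. Dropping the `.lower()` call would keep the original casing.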

Contributing

Please feel free to submit a pull request to add relevant spiders.

LICENSE

MIT
