Using web scraping and natural language processing (NLP) techniques, we have developed a tool to help investment advisors combat notions of positive investment returns on so-called meme trades, which are made primarily and nearly exclusively by users of the subreddit community r/wallstreetbets (WSB).
We consider the majority of advice within WSB to be of poor quality, that is, more likely to lead to financial ruin than to prosperity. We apply web scraping and NLP classification techniques to build a product that alerts our target audience to whether a given web clipping came from WSB or from some other source.
Using the Multinomial Naive Bayes classification algorithm, we have developed the minimum viable product (MVP) for such a system.
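A minimal sketch of that approach, assuming scikit-learn and a hypothetical `posts.csv` holding post text and a subreddit label (the file and column names are illustrative, not our actual schema):

```python
# Sketch: bag-of-words features feeding a Multinomial Naive Bayes model.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

posts = pd.read_csv("posts.csv")  # hypothetical file: one row per scraped post
X = posts["title"]                # raw post text
y = (posts["subreddit"] == "wallstreetbets").astype(int)  # 1 = WSB, 0 = other

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

pipe = Pipeline([
    ("cvec", CountVectorizer(stop_words="english")),  # token counts
    ("mnb", MultinomialNB()),                         # Naive Bayes classifier
])
pipe.fit(X_train, y_train)
print(f"Training accuracy: {pipe.score(X_train, y_train):.3f}")
print(f"Testing accuracy:  {pipe.score(X_test, y_test):.3f}")
```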
| Metric | Null Model | Multinomial Naive Bayes (MNB) | Random Forest Classifier (RFC) | RFC Δ vs MNB | Extra Trees Classifier (ETC) | ETC Δ vs MNB |
|---|---|---|---|---|---|---|
| Accuracy | 0.50 | Training: 0.808<br>Testing: 0.783 | Training: 0.997<br>Testing: 0.792 | Training: 0.189<br>Testing: 0.009 | Training: 0.997<br>Testing: 0.799 | Training: 0.189<br>Testing: 0.016 |
| Type I error | --- | 1,413 | 1,925 | 492 | 2,321 | 908 |
| Type II error | --- | 1,301 | 1,175 | -120 | 847 | -454 |
| Sensitivity | --- | 0.792 | 0.811 | 0.019 | 0.864 | 0.072 |
| Precision | --- | 0.778 | 0.727 | 0.051 | 0.699 | -0.079 |
| F1 | --- | 0.785 | 0.766 | -0.019 | 0.773 | -0.012 |
| ROC AUC | --- | 0.865 | 0.84 | -0.025 | 0.848 | 0.017 |
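For reference, the error counts and scores in the table can be read off a fitted model's confusion matrix; a sketch using scikit-learn and the hypothetical `pipe`, `X_test`, and `y_test` from the earlier example:

```python
# Sketch: deriving the table's metrics from test-set predictions.
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

preds = pipe.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()

print(f"Type I errors (false positives):  {fp}")
print(f"Type II errors (false negatives): {fn}")
print(f"Sensitivity (recall): {recall_score(y_test, preds):.3f}")
print(f"Precision:            {precision_score(y_test, preds):.3f}")
print(f"F1:                   {f1_score(y_test, preds):.3f}")
probs = pipe.predict_proba(X_test)[:, 1]  # P(WSB), needed for ROC AUC
print(f"ROC AUC:              {roc_auc_score(y_test, probs):.3f}")
```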
The MVP prototype is promising. Additional model tuning and testing should be completed to improve accuracy and reduce Type I errors on both the training and testing data.
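That tuning might look like the following sketch; the parameter grid here is illustrative, not the grid used in our notebooks:

```python
# Hypothetical tuning pass over the MNB pipeline defined above.
from sklearn.model_selection import GridSearchCV

params = {
    "cvec__max_features": [2_000, 5_000, None],  # vocabulary size cap
    "cvec__ngram_range": [(1, 1), (1, 2)],       # unigrams vs. uni+bigrams
    "mnb__alpha": [0.1, 0.5, 1.0],               # smoothing strength
}
gs = GridSearchCV(pipe, params, cv=5, scoring="accuracy")
gs.fit(X_train, y_train)
print(gs.best_params_)
print(f"Best cross-validated accuracy: {gs.best_score_:.3f}")
```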
This product is highly extensible. Our vision is an end-to-end advice-confidence tool that gives investors (not just investment advisors) an actionable ranking of the advice fed into it. In other words, we envision a system that takes natural language inputs, scans investment message boards, reviews the track records of the users/bots offering the advice, and then determines whether the proposed investment is viable.
Our MVP is the first step toward such a useful, helpful (and potentially lucrative) tool.
Our research was conducted across two notebooks in JupyterLab.
- Ingestion: In this notebook we design a way to collect, clean, and vectorize parsed text data (an illustrative cleaning step is sketched after this list).
- Modeling: In this notebook we pass our cleaned data through several models.
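A minimal sketch of the kind of cleaning step involved; the exact rules in our Ingestion notebook may differ:

```python
# Illustrative text cleaning: lowercase, strip URLs and non-letters,
# collapse whitespace. Not the exact rules used in the Ingestion notebook.
import re

def clean_text(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+", " ", text)      # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)     # keep letters and spaces only
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

print(clean_text("YOLO'd $10k into GME!! https://example.com"))
# -> "yolo d k into gme"
```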
The Jupyter project is organized into three folders: Code, Data, and Presentation.
- Code: This folder is where our notebooks are saved.
- Data: In this folder we store our data as it is cleaned and saved for later use by our modeling process.
- Presentation: This folder contains a copy of our product presentation.
Pushshift's API is fairly straightforward. For example, if I want the posts from /r/boardgames, all I have to do is use the following URL: https://api.pushshift.io/reddit/search/submission?subreddit=boardgames
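A small sketch of querying that endpoint from Python with the `requests` library (`size`, which caps the number of returned posts, is a standard Pushshift parameter):

```python
# Fetch recent submissions from a subreddit via Pushshift.
import requests

url = "https://api.pushshift.io/reddit/search/submission"
res = requests.get(url, params={"subreddit": "boardgames", "size": 100})
res.raise_for_status()

posts = res.json()["data"]  # Pushshift wraps results in a "data" list
print(len(posts), posts[0]["title"])
```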
To help you get started, we have a primer video on how to use the API: https://youtu.be/AcrjEWsMi_E