
StockMarketAnalysisAndMachineLearningPrediction-

Abstract:

This project is about analysis and prediction of the stock market. Our focus was to see how the Covid-19 outbreak affected different types of industries such as Technology, Airlines, and Healthcare. I personally looked at the airline industry and how the coronavirus pandemic affected it, since flights were being cancelled all over the world. By performing various kinds of analysis and applying machine learning, we set out to understand the stock market better and to predict the value of a particular company's stock a few months or years ahead. Once Covid-19 started and businesses around the world were losing many millions of dollars, we were interested in whether the large companies were losing share value and profit, whether particular industries such as healthcare were actually benefiting, or whether every company was losing business. By retrieving three different stocks in each sector, we can judge how well each stock is doing and build a clearer picture of its price action.

We chose the Python programming language for the analysis and the machine learning. We used powerful Python libraries such as pandas, NumPy, and matplotlib, producing various graphs and plots to show how the stocks moved over time. We also used machine learning to predict how the market might behave in the future: will these stocks fall further, or will they recover? Such questions came up throughout this project. We collected the data as CSV files and also used DataReader to connect directly to the data provider and fetch the data for analysis. Thinking like a data analyst or data scientist, my role is to find the value in the data: fetching it from various sources and analysing it to understand how a business or company performs over time. Our primary goal was to apply data analysis techniques, perform statistical analysis, and build high-quality prediction systems around the companies we studied. It takes a lot of time studying data analysis and machine learning algorithms to truly understand how the market works.
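As an illustration of the data-collection step mentioned above, the sketch below shows one way to pull daily price history with pandas-datareader. The tickers, date range, and the "yahoo" data source are assumptions for illustration, not necessarily the exact ones used in the project.

```python
# Minimal sketch of fetching daily stock prices with pandas-datareader.
# Tickers, date range, and the "yahoo" source are illustrative assumptions.
import datetime as dt
import pandas_datareader.data as web

start = dt.datetime(2019, 1, 1)
end = dt.datetime(2020, 12, 31)

tickers = ["AAPL", "DAL", "JNJ"]  # one example each: tech, airline, healthcare
frames = {t: web.DataReader(t, "yahoo", start, end) for t in tickers}

# Each frame has Open/High/Low/Close/Adj Close/Volume columns indexed by date.
print(frames["AAPL"].head())
```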

Goals:

The main goal of this project is to investigate investment opportunities, weighing the risks and benefits in order to make educated investment decisions. We want a detailed understanding of the risks and the high-return opportunities in each industry. This information will then be used to help make smarter investment decisions in the future.

We want to be able to analyze specific market sectors such as Technology, Airlines, and Healthcare based on the global effects of the coronavirus outbreak. By retrieving three different stocks in each sector, we will be able to determine how well each stock ends up doing. We will analyze the stock market data with Python (pandas, NumPy, matplotlib, Seaborn, etc.). We will also store the data in MySQL, which lets us access it from Jupyter Notebook and the PyCharm Python IDE and generate graphs to analyze the behavior of the data; a sketch of this kind of analysis follows.
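The snippet below is a hedged sketch of the exploratory analysis described here, assuming the price history has already been saved to a CSV file. The file name and column names ("Date", "Adj Close") are assumptions for illustration.

```python
# Sketch of basic exploratory analysis on a saved price history CSV.
# File name and column names ("Date", "Adj Close") are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

# Rolling averages smooth out day-to-day noise and show the trend.
df["MA20"] = df["Adj Close"].rolling(window=20).mean()
df["MA50"] = df["Adj Close"].rolling(window=50).mean()

# Daily percentage returns highlight volatility around the Covid-19 crash.
df["Daily Return"] = df["Adj Close"].pct_change()

df[["Adj Close", "MA20", "MA50"]].plot(figsize=(10, 5), title="Price and moving averages")
df["Daily Return"].plot(figsize=(10, 3), title="Daily returns")
plt.show()
```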

We will also try to implement some form of machine learning for time series in our program; further research is needed to complete this step. We also want to practice effective team strategies in order to better understand processes similar to Scrum.

Project Sources:

Language/IDE used:

Python (pandas, NumPy, matplotlib, Seaborn, etc.) / Jupyter Notebook / PyCharm

Machine Learning:

Machine learning is part of data science. The word "learning" in machine learning means that the algorithms depend on some data, used as a training set, to fine-tune the parameters of a model or algorithm. There are many techniques, such as regression, naive Bayes, or supervised clustering. Machine learning also draws on statistics to build a model that predicts the future behavior of data. It finds patterns in the data by using machine learning algorithms, and the data can be numbers, words, images, clicks, or anything that can be stored digitally. Machine learning is seen as a subset of artificial intelligence that essentially builds a mathematical model to predict data from a training set. It is used everywhere from retail to the financial industry to predict what is going to happen in the future, or how to improve things based on a prediction. We tried four different types of machine learning algorithms to understand the stock market data we collected. We basically wanted to see how each model trains on the data set we have and to use our data for prediction; a minimal sketch of the train-then-evaluate idea follows.
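As a minimal, hedged illustration of the training-set idea described above, using one of the techniques named (naive Bayes): label each day "up" or "down", fit the model on an earlier slice of the data, and evaluate it on a held-out slice. The feature choice (the previous day's return) and file/column names are assumptions for illustration.

```python
# Hedged sketch of the training-set idea with naive Bayes:
# fit on one slice of the data, evaluate on a held-out slice.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")
returns = df["Adj Close"].pct_change().dropna().values

X = returns[:-1].reshape(-1, 1)          # yesterday's return as the feature
y = (returns[1:] > 0).astype(int)        # 1 if today closed up, 0 otherwise

X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False, test_size=0.2)

clf = GaussianNB().fit(X_train, y_train) # parameters tuned on the training set
print("Held-out accuracy:", clf.score(X_test, y_test))
```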

Linear Regression: There are two types of supervised machine learning models: regression and classification. Regression predicts a continuous output value, while classification predicts a discrete output. For this model we used scikit-learn, one of the most popular machine learning libraries. Scikit-learn covers most common statistical modelling, including regression, classification, clustering, etc. It has various components such as supervised learning algorithms, unsupervised algorithms, and cross-validation; in this particular model we used a supervised algorithm. The breadth of its machine learning tooling is one of the big reasons to use scikit-learn. We also used Keras with the TensorFlow API, one of the leading high-level neural network APIs. Keras is an open-source neural-network library written in Python that supports multiple back-end neural network computation engines. It is capable of running on top of TensorFlow and is designed to enable fast experimentation with deep neural networks, which is why it is widely used in machine learning work. A sketch of the linear regression setup is shown below.
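The sketch below is one possible way to frame the problem for scikit-learn's LinearRegression: predict the adjusted close a fixed number of days ahead from today's close. The file name, column names, and the 30-day horizon are illustrative assumptions, not the project's exact configuration.

```python
# Hedged sketch: linear regression predicting the price `horizon` days ahead.
# File/column names and the 30-day horizon are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

horizon = 30                                    # predict 30 trading days ahead
df["Target"] = df["Adj Close"].shift(-horizon)  # the future price becomes the label
df = df.dropna(subset=["Target"])

X = df[["Adj Close"]].values
y = df["Target"].values

# Keep the time order intact: train on the earlier 80%, test on the latest 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
print("Test R^2:", model.score(X_test, y_test))
print("Sample predictions:", model.predict(X_test[:5]))
```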

Long Short-Term Memory (LSTM) Model: LSTM is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward networks, this model has feedback connections, so it can process whole sequences of data rather than single points, which makes it one of the most useful models for this kind of machine learning. LSTM networks have memory blocks that are connected through layers. A block has components that make it smarter than a classical neuron, along with a memory for recent sequences. A block contains gates that manage the block's state and output. A block operates on an input sequence, and each gate within a block uses sigmoid activation units to control whether it is triggered, changing the state and adding to the information flowing through the block. Below is a picture of how the LSTM model works in practice.
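To complement that description, here is a hedged sketch of training an LSTM on sliding windows of past closing prices with Keras. The window length, layer sizes, and training settings are illustrative assumptions, not the project's exact configuration.

```python
# Hedged sketch: LSTM trained on 60-day windows of scaled closing prices.
# Window length, layer sizes, and training settings are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

scaler = MinMaxScaler()
prices = scaler.fit_transform(df[["Adj Close"]].values)  # scale prices to [0, 1]

window = 60                                   # use the past 60 days per sample
X, y = [], []
for i in range(window, len(prices)):
    X.append(prices[i - window:i, 0])
    y.append(prices[i, 0])
X = np.array(X).reshape(-1, window, 1)        # shape: (samples, timesteps, features)
y = np.array(y)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(window, 1)),
    LSTM(50),
    Dense(1),                                 # next-day scaled price
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X, y, epochs=5, batch_size=32)

# Predictions come back in scaled units; invert the scaling to recover prices.
predicted = scaler.inverse_transform(model.predict(X[-5:]))
print(predicted.ravel())
```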

Support Vector Regression Model: Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences. In SVR we try to fit the error within a certain threshold. Our objective with SVR is essentially to consider the points that lie within the boundary lines. Using the SVR model we aim to minimize error by finding the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated. An SVR model is geared towards finding a cutting plane through the data that separates it into two regimes in a way that maximizes the distance from the cutting plane to both sets of points (the margin). For this model we used scikit-learn's regression tools, which predict continuous-valued attributes associated with an object. Scikit-learn provides simple and efficient tools for predictive data analysis that are accessible to everybody and reusable in various contexts; it is built on NumPy, SciPy, and matplotlib. A sketch of the SVR setup is shown below.
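The sketch below shows one way to apply scikit-learn's SVR to the price series, using the day index as the single feature. The feature choice, the RBF kernel, and the C and epsilon values are illustrative assumptions.

```python
# Hedged sketch: SVR with an RBF kernel fit to the price series.
# Feature choice (day index) and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")

X = np.arange(len(df)).reshape(-1, 1)    # day index as the single feature
y = df["Adj Close"].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# C controls how much error outside the epsilon tube is tolerated.
svr = SVR(kernel="rbf", C=1000.0, epsilon=0.1).fit(X_train, y_train)
print("Test R^2:", svr.score(X_test, y_test))
```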

KNN (K-Nearest Neighbors): KNN is a non-parametric, lazy learning algorithm. Non-parametric means that it does not make any assumptions about the underlying data; in other words, it makes its predictions based on proximity to other data points, regardless of what feature the numerical values represent. The k-nearest neighbors algorithm is a simple, supervised machine learning algorithm that can be used to solve both classification and regression problems. It is easy to implement and understand, but it has the major drawback of becoming significantly slower as the size of the data grows. The KNN algorithm essentially reads the data set you provide, runs training and an accuracy test, and gives you feedback on how well the data set supports prediction. A sketch of a KNN regressor on the price data is shown below.
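The sketch below frames KNN as a regressor: each prediction is the average target of the k most similar past windows of prices. The number of neighbors, the lag count, and the file/column names are illustrative assumptions.

```python
# Hedged sketch: KNN regressor predicting the next close from the previous 5 closes.
# Neighbor count, lag count, and file/column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("AAPL.csv", parse_dates=["Date"], index_col="Date")
prices = df["Adj Close"].values

lags = 5                                       # use the previous 5 closes as features
X = np.array([prices[i - lags:i] for i in range(lags, len(prices))])
y = prices[lags:]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Each prediction averages the targets of the k most similar past 5-day windows.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print("Test R^2:", knn.score(X_test, y_test))
```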