This IPython notebook builds a model that predicts a house’s price from a set of features, identifies the features that affect the price, and evaluates the model with standard metrics.
The dataset was taken from Kaggle: housing_datafile
You can follow the analysis on Kaggle.
Real estate management has become a significant industry in recent years, with a large volume of income and transactions. In buy-and-sell businesses such as real estate, one of the most important tasks is accurately predicting the price of a house. This can be done with machine learning algorithms such as RandomForestRegressor or LinearRegression, which take into account features of the property such as its location, size, number of rooms, and other relevant factors. Real estate managers can use these predictions to make more informed decisions about pricing and sales, which can streamline the sales process and may lead to increased profits.
RandomForestRegressor is scikit-learn's implementation of the random forest ensemble method for regression problems, where the goal is to predict a continuous numerical value. The algorithm trains multiple decision trees (hence the name "forest"), by default each on a bootstrap sample of the training data, and averages their predictions to make a final prediction. Combining many trees helps to reduce overfitting and usually improves on the performance of a single tree. It can be used by importing the RandomForestRegressor class from the sklearn.ensemble module.
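To make the averaging behaviour concrete, here is a minimal sketch. It does not use the Kaggle dataset: the data is synthetic and the three columns are hypothetical stand-ins for features such as size or number of rooms. It fits a small forest, checks that the forest's prediction equals the mean of its trees' predictions, and reads `feature_importances_` as one possible way to see which features drive the price.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption, not the real dataset):
# 200 "houses" with 3 numeric features; the third feature is pure noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 100 * X[:, 0] + 50 * X[:, 1] + rng.normal(0, 1, size=200)

forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Each fitted tree is available in forest.estimators_.
# The forest's prediction is the average of the individual tree predictions.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
forest_pred = forest.predict(X[:5])

# Impurity-based importances: one value per feature, summing to 1.
importances = forest.feature_importances_
print(importances)
```

On this synthetic data the first feature should receive the largest importance, since it has the largest coefficient in the formula that generates `y`.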
The objectives are to identify the features that affect the house’s price, build a model that predicts the house price from those features, test the model, and evaluate it with suitable metrics.
- EDA : understand the data
- Feature Creation
- Data Cleaning
- Data Scaling
- Create Pipeline
- Train Model
- Test Model (Evaluation):
  1. MSE
  2. RMSE
  3. Cross-Validation
- Dump the model
- Use the model to do predictions
- Deployment
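The steps above, from scaling through dumping the model, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the data is synthetic, the median imputer stands in for whatever cleaning the notebook performs, and the file name `house_price_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption): 300 rows, 4 features.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 4))
y = 200 * X[:, 0] + 80 * X[:, 1] + rng.normal(0, 5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Pipeline: cleaning (median imputation), scaling, then the model.
model = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("forest", RandomForestRegressor(n_estimators=50, random_state=42)),
])
model.fit(X_train, y_train)

# Evaluation on the held-out test set: MSE and RMSE.
mse = mean_squared_error(y_test, model.predict(X_test))
rmse = np.sqrt(mse)

# 5-fold cross-validation on the training set.
# scikit-learn returns negated MSE, so flip the sign before the square root.
cv_scores = cross_val_score(
    model, X_train, y_train, cv=5, scoring="neg_mean_squared_error"
)
cv_rmse = np.sqrt(-cv_scores)

# Dump the fitted pipeline and load it back to make predictions.
model_path = os.path.join(tempfile.mkdtemp(), "house_price_model.joblib")
joblib.dump(model, model_path)
loaded = joblib.load(model_path)
predictions = loaded.predict(X_test)

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, CV RMSE mean: {cv_rmse.mean():.2f}")
```

Dumping the whole pipeline, rather than only the forest, keeps the imputation and scaling steps attached to the model, so a deployed copy can accept raw feature values directly.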