house_prices_RF

Random Forest practice using house price data from Kaggle.

Uses fastai as a guide and for helper functions, and scikit-learn's RandomForestRegressor for the model.

Things considered and implemented:
Preprocessing data:

converting categorical string variables to pandas "categories" (which map each string to the integer code the model needs)
performing feature extraction where applicable (for example, on date columns)
reordering the categories of any ordinal variable so that the codes follow a meaningful order (e.g. "high", "medium", "low")
handling missing data, which cannot be passed directly to a Random Forest
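The preprocessing ideas listed above can be sketched with plain pandas. This is only an illustration on toy data: the column names (`quality`, `sold`, `area`) are hypothetical and do not come from the Kaggle dataset, and the notebook itself relies on fastai helpers rather than this code.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "quality": ["low", "high", "medium", "low"],
    "sold": pd.to_datetime(["2008-02-01", "2009-07-15", "2010-11-30", "2008-05-20"]),
    "area": [850.0, None, 1200.0, 990.0],
})

# Strings -> ordered pandas category, so the integer codes follow a meaningful order.
df["quality"] = pd.Categorical(
    df["quality"], categories=["high", "medium", "low"], ordered=True
)
quality_codes = df["quality"].cat.codes  # high=0, medium=1, low=2

# Simple feature extraction from a date column.
df["sold_year"] = df["sold"].dt.year
df["sold_month"] = df["sold"].dt.month

# Count missing values per column before deciding how to handle them.
missing_counts = df.isnull().sum()
```

Without an explicit `categories` list, pandas orders string categories alphabetically, which is rarely the order an ordinal variable should have.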

Use the fastai function train_cats to convert strings to pandas categories.
Check for missing values.
Use the fastai function proc_df to handle missing continuous data (it replaces missing values with the column median).
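For readers without the older fastai library installed, median imputation can be approximated in pandas roughly as follows. The helper `fill_missing_with_median` is hypothetical and is not fastai's implementation; the extra `<col>_na` flag column is an assumption of this sketch, added so the model can still see which rows were originally missing.

```python
import pandas as pd

def fill_missing_with_median(df):
    """Return a copy with numeric NaNs replaced by the column median.

    Hypothetical helper, not fastai's code. For every numeric column that
    has missing values, a boolean `<col>_na` column records which rows
    were filled.
    """
    out = df.copy()
    for col in list(df.columns):
        if pd.api.types.is_numeric_dtype(out[col]) and out[col].isnull().any():
            out[col + "_na"] = out[col].isnull()
            out[col] = out[col].fillna(out[col].median())
    return out

# Toy example with one missing continuous value.
toy = pd.DataFrame({
    "lot_frontage": [60.0, None, 80.0, 70.0],
    "street": ["a", "b", "a", "b"],
})
filled = fill_missing_with_median(toy)
```

The input frame is left unchanged; only the returned copy is filled.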

Split the dataset into training and validation sets. The validation set is 25% of the total dataset.
Consider the out-of-bag (OOB) score as a further check on generalization.
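A minimal sketch of the 25% validation split and the OOB score with scikit-learn is shown here. The data is synthetic and the hyperparameters (`n_estimators=40`, the random seeds) are placeholder assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed house price features and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows as a validation set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# oob_score=True scores each training row using only the trees
# whose bootstrap sample did not include that row.
model = RandomForestRegressor(n_estimators=40, oob_score=True, random_state=0)
model.fit(X_train, y_train)

valid_r2 = model.score(X_valid, y_valid)  # R^2 on the validation set
oob_r2 = model.oob_score_                 # R^2 estimated from out-of-bag rows
```

If the OOB score is clearly higher than the validation score, the validation set may differ systematically from the training data rather than the model simply overfitting.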

Attempt to reduce overfitting:
Subsampling: the fastai function set_rf_samples gives each tree a random sample of n rows (the default is to draw as many rows as the training set has, with replacement).
Grow trees less deeply: increase the min_samples_leaf parameter of RandomForestRegressor.
Increase variation among trees: randomly sample columns at each split by adjusting the max_features parameter of RandomForestRegressor.
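The three adjustments above might look like this in scikit-learn. Note the substitution: instead of fastai's set_rf_samples, the sketch uses scikit-learn's own `max_samples` parameter (available from version 0.22), which likewise limits how many rows each tree draws. All values are illustrative assumptions on synthetic data, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 5))
y_train = 3.0 * X_train[:, 0] + rng.normal(scale=0.1, size=150)

model = RandomForestRegressor(
    n_estimators=40,
    max_samples=100,      # subsampling: each tree draws 100 rows (with replacement)
    min_samples_leaf=3,   # shallower trees: every leaf keeps at least 3 rows
    max_features=0.5,     # more varied trees: consider half the columns per split
    oob_score=True,
    random_state=0,
)
model.fit(X_train, y_train)

oob_r2 = model.oob_score_
deepest_tree = max(tree.get_depth() for tree in model.estimators_)
```

Comparing `oob_r2` and `deepest_tree` against a model fitted with default settings is one way to see the effect of each parameter.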

ba-davis/house_prices_RF
