The data contains 28,242 rows and 7 columns:
1- Area
2- Crop
3- Year
4- Average rainfall (mm per year)
5- Pesticides (tonnes)
6- Average temperature
7- hg/ha_yield (Output)
1- Dropped the "Year" column because it was judged not relevant to predicting yield
2- One-hot encoded the categorical data (Area, Crop) using pandas get_dummies
3- Split the data into X (features) and Y (output)
4- Normalized the feature columns to be between 0 and 1
5- Split the data into 80% for training and 20% for testing
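
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the DataFrame is a small synthetic stand-in with made-up values, the column names are assumed spellings of the columns listed earlier, and MinMaxScaler with random_state=42 are assumptions about how the normalization and split might be done.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real dataset (values are made up for illustration;
# column names are assumed spellings, not confirmed by the source).
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "Area": rng.choice(["Albania", "Brazil", "India"], size=n),
    "Crop": rng.choice(["Maize", "Wheat"], size=n),
    "Year": rng.integers(1990, 2014, size=n),
    "average_rain_fall_mm_per_year": rng.uniform(300, 2000, size=n),
    "pesticides_tonnes": rng.uniform(1, 500, size=n),
    "avg_temp": rng.uniform(5, 30, size=n),
    "hg/ha_yield": rng.uniform(10000, 90000, size=n),
})

# 1- Drop the "Year" column.
df = df.drop(columns=["Year"])

# 2- One-hot encode the categorical columns (dtype=float keeps them numeric).
df = pd.get_dummies(df, columns=["Area", "Crop"], dtype=float)

# 3- Separate the features (X) from the output (y).
X = df.drop(columns=["hg/ha_yield"])
y = df["hg/ha_yield"]

# 4- Scale every feature column to the range [0, 1].
scaler = MinMaxScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)

# 5- Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)
```

The sketch follows the order of the steps as listed (normalize, then split). Fitting the scaler on the training rows only and then applying it to the test rows is a common alternative that keeps test-set statistics out of the preprocessing.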
1- Used the LazyPredict library to compare the results of multiple regression algorithms
2- Used Random Forest Regressor for the final model, as it scored best in that comparison
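
The model-selection idea can be illustrated with a small hand-rolled comparison. Note that this sketch does not call LazyPredict; it swaps in a plain loop over three scikit-learn regressors to show the same kind of side-by-side scoring. The data is synthetic, and the candidate list, the R² metric, and random_state values are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the preprocessed features (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 5 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical shortlist of candidates; LazyPredict would try many more.
candidates = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each candidate on the training split and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Pick the candidate with the highest test-set R^2.
best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
print("Best model:", best_name)
```

Which model comes out on top depends on the data; on the real dataset the notes above report Random Forest Regressor as the best performer.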