A web app which predicts the diameters of asteroids using machine learning and astronomical data

Oyebamiji-Micheal/Asteriod-Diameter-Prediction

Asteroid Diameter Prediction

Predicting the diameters of asteroids using machine learning and astronomical data

You can view the live demo here.

Project Overview

One of the intriguing challenges in astronomy is predicting asteroid diameters. Asteroids, celestial bodies orbiting the Sun, come in many shapes and sizes, which makes estimating their diameters a complex task. Over the years, numerous methods have been developed to tackle this challenge, each striving to outperform its predecessors. I have decided to step outside my own domain into the realm of asteroids, aiming to use data-driven algorithms to predict their diameters with reasonable accuracy. Beyond making a small contribution to space science (lol), the project demonstrates how machine learning can help solve complex problems in unfamiliar domains.

Objective

The primary objective of this project is to predict asteroid diameters, with the aim of benchmarking against a prior work titled Prediction of Asteroid Diameter with the Help of Multi-Layer Perceptron Regressor by Victor Basu. Basu's work applies several machine learning algorithms, including XGBoost, Random Forest, AdaBoost and a Multi-Layer Perceptron Regressor. Mean absolute error, mean squared error, and R-squared score are used to assess how well these models perform.

Data

While I have familiarized myself with the data description, the column names may not immediately convey their meanings to anyone reading the notebook for the first time. To make the dataset easier to understand, I have added a short summary for each column alongside the original Kaggle description.

| Column Name | Kaggle Description | Additional Description |
| --- | --- | --- |
| full_name | Object's full name/designation | Contains the complete designation or name of celestial objects in the dataset, serving as a unique identifier. |
| a | Semi-major axis (au) | Represents the size of the object's orbit around the Sun in astronomical units (au). |
| albedo | Geometric albedo | Reflectivity of the object's surface, indicating how much sunlight it reflects. |
| e | Eccentricity | Indicates how elongated the object's orbit is, with values near 0 indicating a nearly circular orbit and values close to 1 indicating high eccentricity. |
| i | Inclination (deg) | Angle describing the tilt of the object's orbit relative to the solar system's plane. |
| q | Perihelion distance (au) | Closest distance between the object and the Sun during its orbit, measured in astronomical units. |
| ad | Aphelion distance (au) | Farthest distance between the object and the Sun during its orbit, measured in astronomical units. |
| per_y | Orbital period (years) | Time taken for the object to complete one orbit around the Sun, measured in years. |
| data_arc | Data arc-span (d) | Duration over which observational data has been collected for the object, measured in days. |
| condition_code | Orbit condition code | Code indicating the quality and reliability of the object's orbital data. |
| n_obs_used | Number of observations used | Number of observational data points used to calculate the object's orbital parameters. |
| H | Absolute Magnitude parameter | Measure of the object's intrinsic brightness (lower values mean brighter objects). Combined with albedo, it gives an indication of size. |
| diameter | Diameter of asteroid (km) | Physical size of the asteroid, measured in kilometers. |
| rot_per | Rotation Period (h) | Time taken for the object to complete one full rotation around its axis, measured in hours. |
| neo | Near Earth Object | Indicates whether the object is classified as a Near Earth Object (NEO), with an orbit in close proximity to Earth. |
| pha | Physically Hazardous Asteroid | Flags whether the object is classified as a Potentially Hazardous Asteroid (the standard meaning of PHA), i.e. one that could pose a threat to Earth. |
| moid | Earth Minimum orbit Intersection Distance (au) | Quantifies the closest approach of the object's orbit to Earth's orbit, providing information about potential close encounters with our planet. |
| ... | ... | ... |

The full column description can be found in the notebook.

Insights From EDA

While visualizing the data, I tried to answer which feature(s) correlate with asteroid diameter, along with other related questions.
  • Most asteroids seem to have a diameter of 2 km to 5 km, while the largest reaches 939 km. To avoid a funny-looking plot caused by such extreme outliers, the data was capped using Tukey's method before plotting.

    Capping the data before plotting reveals the underlying distribution, which is right-skewed (positively skewed). Right-skewed here means that most data points are clustered on the left side of the distribution, while a few larger values on the right pull the mean to the right.

  • The semi-major axis of an asteroid is one-half of the major axis of its elliptical orbit. It is measured in astronomical units and roughly describes the object's average distance from the Sun. In the scatter plot of semi-major axis against diameter, the data points are concentrated in certain areas and all I see is a weak correlation.

  • Another feature that may help estimate diameter is the minimum orbit intersection distance (MOID). MOID is a measure used in astronomy to assess potential close approaches and collision risks between astronomical objects. Here it quantifies the closest approach of the object's orbit to Earth's orbit, providing information about potential close encounters with our planet.

    The plot shows three clear clusters with a somewhat linear trend between them. Asteroids in the first cluster seem to have relatively small diameters. Although we can roughly conclude that asteroids in the third cluster have larger diameters, it is worth keeping in mind that many asteroids in the second cluster have a significantly large diameter despite a lower MOID. My takeaway is that a clustering algorithm such as K-Means could probably capture this relationship.

    These insights demonstrate the advantage of doing EDA rather than looking only at correlation coefficients or a scatter matrix.
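The Tukey capping mentioned in the first insight can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the function name `tukey_cap`, the multiplier `k=1.5`, and the sample diameters are all assumptions made for the example.

```python
import statistics


def tukey_cap(values, k=1.5):
    """Clip values to Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional multiplier; the notebook may use another.
    """
    # Quartiles of the data (inclusive method treats the data as a full population).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values outside the fences are pulled back to the nearest fence.
    return [min(max(v, lower), upper) for v in values]


# Hypothetical diameters in km: mostly small, with one huge outlier.
diameters_km = [2.1, 2.8, 3.0, 3.5, 4.2, 4.9, 5.0, 939.0]
capped = tukey_cap(diameters_km)
print(max(diameters_km), "->", max(capped))
```

Capping keeps every row in the plot while stopping a single extreme value from stretching the axis, which is why the skew of the bulk of the data becomes visible.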

Model and Evaluation Metrics

The performance of all models used in the notebook is given below.

| Model | R2 Score | Adjusted R2 Score | RMSE | MAE |
| --- | --- | --- | --- | --- |
| Random Forest Regressor | 0.961945 | 0.961905 | 0.473486 | 0.305354 |
| LightGBM Regressor | 0.961883 | 0.961843 | 0.473868 | 0.310384 |
| XGBoost Regressor | 0.960489 | 0.960447 | 0.482460 | 0.318531 |
| K-Nearest Neighbors | 0.861554 | 0.861408 | 0.903108 | 0.623587 |
| Linear Regression | 0.830079 | 0.829900 | 1.000515 | 0.713975 |
| Ridge Regression | 0.829130 | 0.828949 | 1.003305 | 0.716889 |
| Elastic Net | -0.000093 | -0.001148 | 2.427280 | 1.824157 |
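For readers unfamiliar with the four metrics in the table, here is a small hand-rolled sketch of how they are defined. It is not the notebook's evaluation code (which most likely relies on a library), and the function name `regression_report` and the toy data are assumptions for illustration only.

```python
import math


def regression_report(y_true, y_pred, n_features):
    """Return R2, adjusted R2, RMSE and MAE for one set of predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalises the score for the number of features used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "mae": mae}


# Toy example: a "model" that always predicts the mean of the targets.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [3.0] * 5
report = regression_report(y_true, y_pred, n_features=1)
print(report)
```

A model that always predicts the mean scores an R2 of zero, so an R2 at or slightly below zero, as in the Elastic Net row, suggests the model is doing no better than that constant baseline.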

In practice, my results cannot be compared directly with those in the research paper because we do not use the same test set. However, compared with results from other people's notebooks on Kaggle, I was able to achieve better scores with only a little data cleaning and feature engineering. I do not trust these values 100% though. Perhaps some columns in the training data leak information about the target. Since I do not see anything about this in the research paper, I have simply gone with these values.
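One way to probe the leakage suspicion, offered here as an assumption rather than something the notebook does: the standard relation between absolute magnitude H, geometric albedo and diameter is D = 1329 / sqrt(albedo) * 10^(-0.2 * H) km. If the albedo values in the dataset were themselves derived from measured diameters, then H and albedo together would nearly determine the target. The function name below is hypothetical.

```python
import math


def diameter_from_h_albedo(h_mag, albedo):
    """Estimate diameter in km from absolute magnitude H and geometric albedo.

    Uses the standard relation D = 1329 / sqrt(albedo) * 10 ** (-0.2 * H).
    """
    if albedo <= 0:
        raise ValueError("albedo must be positive")
    return 1329.0 / math.sqrt(albedo) * 10 ** (-0.2 * h_mag)


# Illustrative values only, not rows taken from the dataset.
estimate_km = diameter_from_h_albedo(h_mag=15.0, albedo=0.25)
print(round(estimate_km, 3))
```

If this formula alone reproduces the `diameter` column almost exactly on rows where both H and albedo are present, that would support the data leakage concern. If it does not, the high scores are more likely to be genuine.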

Kindly note that the feature importances above are from the Random Forest model and not LightGBM.
