As a data engineer at an aeronautics consulting firm renowned for its efficient airfoil design for planes and sports cars, my role involves facilitating data preparation and constructing machine learning pipelines. In this project, we'll be working with a modified version of the NASA Airfoil Self Noise dataset. Our objective is to clean the dataset by removing duplicate rows and eliminating any entries with null values. Subsequently, we'll develop a machine learning pipeline to create a predictive model that estimates sound levels based on all other available columns. Following model evaluation, we'll finalize the process by persisting the model for future use.
- Load a CSV dataset.
- Eliminate any duplicate entries.
- Remove rows containing null values.
- Apply necessary transformations.
- Save the cleansed data in the Parquet format.
- Develop a machine learning pipeline for prediction.
- Assess the model's performance using appropriate metrics.
- Save the model for future deployment purposes.
- Load and validate the stored model.
- The original dataset can be found here NASA airfoil self noise dataset. https://archive.ics.uci.edu/dataset/291/airfoil+self+noise
Diagram of an airfoil. - For informational purpose
Diagram showing the Angle of attack. - For informational purpose