To view the associated publication for this project, see here: link (upcoming!). For a link to all data used alongside the publication, see here: link (upcoming!).
MAJOR UPDATES IN PROGRESS!
The purpose of this project is to explore the predictability of atmospheric trace substance concentrations from the underlying meteorological variables prior to their collection.
This project and model are very much in the development stage. Results are preliminary.
The PASTEL model takes meteorological variables calculated along backward trajectories generated by the NOAA HYSPLIT model and uses them as inputs to scikit-learn's Random Forest Regression algorithm to predict atmospheric trace substance concentrations.
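As a rough illustration of the approach described above, here is a minimal sketch of Random Forest regression with scikit-learn. It is not the PASTEL code: the data is synthetic, and the four feature columns, the target formula, and the hyperparameters are all assumptions made up for the example. In the real workflow the inputs would be the trajectory-derived meteorological variables and the target would be a measured concentration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for trajectory-derived meteorology (hypothetical features,
# e.g. temperature, pressure, humidity, mixing depth).
n_samples = 300
X = rng.normal(size=(n_samples, 4))

# Made-up "concentration" that depends on two of the features plus noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# Hold out 25% of the samples to evaluate the fitted model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed hyperparameters; the notebook may use different settings.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data
print(f"R^2 on held-out synthetic data: {r2:.2f}")
print("Feature importances:", model.feature_importances_)
```

The feature importances give a first look at which inputs the forest relies on, which is one reason a tree ensemble is convenient for this kind of exploratory question.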
Disclaimer! - I am not a software engineer, so I apologize for messy code!
Clone the repository and look at the Zenodo link to get the final cleaned csv used for the main "code/v4.5.5_PASTEL.ipynb" file. This is a preliminary file, so it will be necessary to change directory paths within the file itself.
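One way to make the path changes less painful is to define the locations once near the top of the notebook. The sketch below is only a suggestion: the "data" folder name and the "cleaned_data.csv" file name are placeholders, not the actual names used in the repository or on Zenodo, so substitute your own.

```python
from pathlib import Path

# Hypothetical layout: adjust to wherever you cloned the repo and saved the data.
REPO_ROOT = Path.cwd()
DATA_DIR = REPO_ROOT / "data"              # assumed folder name
DATA_FILE = DATA_DIR / "cleaned_data.csv"  # placeholder file name


def resolve_data_file(path: Path) -> Path:
    """Return the absolute path, raising a clear error if the file is missing."""
    path = path.expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected data file at {path}; update DATA_FILE to match your setup."
        )
    return path
```

Calling `resolve_data_file(DATA_FILE)` before loading the csv fails early with a readable message instead of a long traceback from deeper in the notebook.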
Be sure to look at 'Prerequisites' section for necessary packages.
The provided 'pvocal_env.yml' file defines the environment with the packages currently required to run the main notebook.
To create the environment from this file, run this line in an Anaconda Prompt window:
conda env create -f pvocal_env.yml
Make sure to use the provided environment file and find the data in the Zenodo link.
This project is in its early stages and still has some significant challenges, but please feel free to explore what you can with the model. There is most likely a way to speed up the code I've written, but this is what I have so far!
It can predict 6 target compounds: DMS, Ethane, Ozone, Carbon Monoxide, Methane, and Methyl Bromide with the code currently there. Within the input dataset there is room for many more to be incorporated.
The most useful contributions would be making the model compatible with all VOCs in the spreadsheet and adapting the code so that this process is more streamlined.
A way to speed up or modularize this project is probably the main thing it needs right now!
Additionally, there is an R package for Utility Based Regression here: https://github.com/paobranco/UBL. I believe that implementing this in Python as part of this project would be beneficial for more accurate predictions when using the SMOGN algorithm.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
All contributors for PySPLIT!
The NASA SARP program for giving me the courage to chase the idea this project is based on.
Copyright (C) 2024 Victor Geiser. Licensed under the General Public License v3.0 (GPLv3).