There are three ways to download and manage the MWDC package:
1 - Use GitHub Desktop (recommended).
2 - Use the command line.*
3 - Download the .zip file and use it.

On Google Colab, use the commands shown below.

*Because the repository is private, the command-line method is not recommended.
## Installation
#### 1. On PC
To install the package you need to create an environment using [pip](https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/) or [conda](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html).
##### Conda environment setup
```bash
conda create -n mwdc pandas numpy xarray netCDF4 matplotlib scikit-learn scipy dask
conda activate mwdc
```
After that, clone this repository and run the `setup.py` file inside it:
```bash
cd multivariate-weather-data-clustering
python setup.py install
```
Note: If you are using macOS, use `python3 setup.py install` instead.
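If you prefer pip over conda (both are linked above), here is a minimal virtual-environment sketch; the environment name is a placeholder and the package list simply mirrors the conda command:
```bash
# Create and activate a virtual environment (name "mwdc-env" is illustrative)
python -m venv mwdc-env
source mwdc-env/bin/activate   # on Windows: mwdc-env\Scripts\activate

# Install the same dependencies as in the conda command above
pip install pandas numpy xarray netCDF4 matplotlib scikit-learn scipy dask
```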
#### 2. On Google Colab
After cloning the repository, run the commands below to install it:
```bash
%cd multivariate-weather-data-clustering
!python setup.py install
```
To use the functions, import them from `mwdc`. Modules can be imported either separately or all together:
```python
from mwdc import *
# or
from mwdc.preprocessing import preprocessing
from mwdc.evaluation import st_evaluation
from mwdc.visualization import visualization
```
Example:
```python
trans_data = preprocessing.datatransformation(data)
```
Functions | Description |
---|---|
transformddaily() | Transformation function for daily data |
transformdmock() | Transformation function for mock data |
transformqm() | Variable for the quarter map |
datatransformation() | Description in the note below* |
datanormalization() | Input in this case is the transformed pandas DataFrame |
null_fill() | Function to fill NaN values across variables |
pca1() | data is the data to be input, n is the number of components |
pcacomponents() | Shows the proper number of components for PCA by computing the cumulative variance |
data_preprocessing() | Transforms the xarray input data into a 2D NumPy array |

*Note: This function is used to transform the xarray dataset into a pandas DataFrame, where the dimension "time" becomes the index of the DataFrame and pairs of the dimensions "latitude" and "longitude" become the columns for each variable.
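A minimal preprocessing sketch based on the table above; the file name `data.nc` is a placeholder, and the exact call signatures are assumptions inferred from the descriptions:
```python
import xarray as xr
from mwdc.preprocessing import preprocessing

# Placeholder path; load your own NetCDF dataset here.
data = xr.open_dataset("data.nc")

# "time" becomes the DataFrame index; (latitude, longitude) pairs become
# the columns for each variable (see the note above).
trans_data = preprocessing.datatransformation(data)

# Normalize the transformed pandas DataFrame (signature assumed).
norm_data = preprocessing.datanormalization(trans_data)

# Or convert the xarray dataset straight into a 2D NumPy array (signature assumed).
array_2d = preprocessing.data_preprocessing(data)
```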
Functions | Description |
---|---|
dbscanreal(x, eps1=0.5, min=5) | x is the data input, eps1 is epsilon, min is the minimum number of samples |
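A short usage sketch; how dbscanreal is exposed after `from mwdc import *`, the form of the input data, and the parameter values are assumptions for illustration:
```python
from mwdc import *

# eps1 is DBSCAN's epsilon radius, min is the minimum number of samples per
# neighbourhood; the values here are illustrative only.
labels = dbscanreal(trans_data, eps1=0.5, min=5)
```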
Functions | Description |
---|---|
st_agglomerative(data, n, K, p=7, affinity, linkage) | n = PCA components, K = number of clusters, p = truncate_mode |
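A hedged sketch of a call to st_agglomerative; the chosen values and the assumption that affinity and linkage accept scikit-learn-style strings are illustrative only:
```python
from mwdc import *

# 5 PCA components, 4 clusters, truncate_mode p=7 as in the signature above;
# scikit-learn-style affinity/linkage strings are an assumption.
labels = st_agglomerative(data, n=5, K=4, p=7,
                          affinity="euclidean", linkage="ward")
```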
Functions | Description |
---|---|
Kmeans(n_cluster).fit(xarray_data, PCA=(boolean), pass_trans_data=(boolean)) | * |
Kmeans(n_cluster).evaluate(z, PCA=(boolean), pass_trans_data=(boolean)) | ** |
* This function fits the K-means model to the data that is passed to it.
Parameters that this function will accept are as follows:
- xarray_data = string of the name of the original xarray file
- PCA (bool) = whether or not PCA has to be applied. Default value is True.
- pass_trans_data (bool) = whether previously saved (transformed) data should be passed. If False, the data will be transformed on the fly. Default value is True.
** This function evaluates and assigns data points to clusters. Parameters that this function will accept are as follows:
- z = string of the name of the original xarray file.
- PCA (bool) = whether or not PCA has to be applied. Default value is True.
- pass_trans_data (bool) = whether previously saved (transformed) data should be passed. If False, the data will be transformed on the fly. Default value is True.
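Putting the two calls together, a minimal sketch of the workflow described above; "data.nc" and the cluster count are placeholders, and exposing Kmeans via `from mwdc import *` is an assumption:
```python
from mwdc import *

# Fit K-means on the original xarray file (passed by file name), applying PCA
# and reusing previously saved transformed data (the documented defaults).
model = Kmeans(4)
model.fit("data.nc", PCA=True, pass_trans_data=True)

# Assign each time step to a cluster.
clusters = model.evaluate("data.nc", PCA=True, pass_trans_data=True)
```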
Functions | Params |
---|---|
st_rmse() | input, formed_clusters |
st_corr() | input, formed_clusters |
st_calinski() | input, formed_clusters |
davies_bouldin() | input, formed_clusters |
compute_silhouette_score() | X, labels, transformation=False, *, metric="euclidean", sample_size=None, random_state=None, **kwds |
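An illustrative sketch of the evaluation functions; whether they live in the st_evaluation module imported earlier, and whether `input` is the raw dataset or the transformed DataFrame, are assumptions:
```python
from mwdc.evaluation import st_evaluation

# formed_clusters / labels come from a clustering step (e.g. the K-means example above).
rmse = st_evaluation.st_rmse(data, clusters)
corr = st_evaluation.st_corr(data, clusters)
ch   = st_evaluation.st_calinski(data, clusters)
db   = st_evaluation.davies_bouldin(data, clusters)
sil  = st_evaluation.compute_silhouette_score(trans_data, clusters, transformation=False)
```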
Functions | Params |
---|---|
visualization() | data_file, cluster_filename, coast_file |
make_Csv_cluster() | label, name |
* Parameters that visualization() will accept are as follows:
- data_file is the .nc file containing the raw, unprocessed data, e.g. data_file = 'path/data.nc'.
- cluster_filename is the CSV file that contains clusterid and time_step, i.e. which cluster each date belongs to, e.g. cluster_filename = 'path/clusters.csv'.
- coast_file contains the data describing how the coastline should look in the result, e.g. 'path/coast.txt'.

* Parameters that make_Csv_cluster() will accept are as follows:
- label contains the cluster IDs.
- name is the name of the file that will be generated, e.g. 'test.csv'.
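A final usage sketch; the paths are placeholders, and accessing both functions through the visualization module imported earlier is an assumption:
```python
from mwdc.visualization import visualization as viz

# Write the cluster labels from a clustering step to a CSV file.
viz.make_Csv_cluster(clusters, 'test.csv')

# Plot the clusters, overlaying the coastline on the result.
viz.visualization('path/data.nc', 'path/clusters.csv', 'path/coast.txt')
```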