This Python project detects anomalies in time-series data from a well system, specifically "Well PS002." The primary goal is to identify downtime events and other abnormal behavior in the well system. The project combines data processing, statistical, and machine learning techniques.
- Python: The entire project is coded in Python.
- Pandas: Pandas is used for data manipulation and handling time-series data.
- PyArrow (including its pyarrow.parquet module): PyArrow is used for reading and processing data stored in the Parquet file format.
- Matplotlib and Seaborn: Matplotlib and Seaborn are used for data visualization, including creating various plots to visualize the well system's behavior.
- SciPy: SciPy is used for signal processing and filtering.
- SEEQ: Real-time data is obtained from the deep-water well database.
- Scikit-learn (sklearn): Scikit-learn is used for machine learning tasks, such as Principal Component Analysis (PCA) and train-test splitting.
- Statsmodels: Statsmodels is used for nonparametric smoothing.
- Data Preprocessing: The project involves extensive data preprocessing, including resampling, smoothing, and interpolation of time-series data.
- Statistical Analysis: Statistical methods, including LOWESS smoothing, are applied to analyze the data and identify anomalies.
- Machine Learning: Principal Component Analysis (PCA) is used for feature reduction and anomaly detection.
- Event Detection: The project includes the identification of downtime events within the well system data.
- Data Visualization: Various data visualizations are created to aid in understanding and presenting the results.
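
The techniques listed above could be chained roughly as in the sketch below. This is an illustration under stated assumptions, not the project's actual code: the synthetic data, the column names (pressure, flow), all function names, the 5-minute resampling rule, the LOWESS fraction, the chronological training window, and the error threshold are hypothetical choices made only for this example. It resamples and interpolates the series, smooths each column with LOWESS, scores every sample by its PCA reconstruction error, and groups consecutive flagged samples into candidate downtime events.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from statsmodels.nonparametric.smoothers_lowess import lowess


def make_synthetic_well_data(seed=0):
    """Hypothetical two-sensor series with a data gap and a simulated shutdown."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2023-01-01", periods=600, freq="1min")
    pressure = 100 + rng.normal(0, 0.5, len(index))
    flow = 50 + 0.4 * (pressure - 100) + rng.normal(0, 0.3, len(index))
    flow[300:360] = rng.normal(0, 0.3, 60)  # flow collapses for one hour
    df = pd.DataFrame({"pressure": pressure, "flow": flow}, index=index)
    df.iloc[100:105] = np.nan  # short gap in the recording
    return df


def preprocess(df, rule="5min", frac=0.05):
    """Resample to a regular grid, fill gaps, then LOWESS-smooth each column."""
    resampled = df.resample(rule).mean().interpolate(method="time")
    x = np.arange(len(resampled), dtype=float)
    smoothed = {
        # it=0 skips the robustness iterations, giving plain local regression
        col: lowess(resampled[col].to_numpy(), x, frac=frac, it=0,
                    return_sorted=False)
        for col in resampled.columns
    }
    return pd.DataFrame(smoothed, index=resampled.index)


def pca_anomaly_flags(features, n_train=40, n_components=1, k=3.0):
    """Flag samples whose PCA reconstruction error exceeds a training-based threshold."""
    train = features.iloc[:n_train]  # assumed to contain normal operation only
    scaler = StandardScaler().fit(train)
    pca = PCA(n_components=n_components).fit(scaler.transform(train))
    scaled = scaler.transform(features)
    reconstructed = pca.inverse_transform(pca.transform(scaled))
    errors = pd.Series(((scaled - reconstructed) ** 2).sum(axis=1),
                       index=features.index)
    train_err = errors.iloc[:n_train]
    threshold = train_err.mean() + k * train_err.std()
    return errors > threshold, errors


def downtime_events(flags, min_samples=3):
    """Group runs of consecutive flagged samples into (start, end) pairs."""
    run_id = flags.astype(int).diff().ne(0).cumsum()
    events = []
    for _, run in flags.groupby(run_id):
        if run.iloc[0] and len(run) >= min_samples:
            events.append((run.index[0], run.index[-1]))
    return events


if __name__ == "__main__":
    smooth = preprocess(make_synthetic_well_data())
    flags, errors = pca_anomaly_flags(smooth)
    for start, end in downtime_events(flags):
        print(f"candidate downtime: {start} to {end}")
```

Fitting the scaler and PCA on an early window assumed to be normal keeps the shutdown out of the model, so samples that break the usual pressure/flow relationship produce a large reconstruction error. With real well data, the training window, threshold multiplier, and minimum event length would all need tuning, and occasional short false alarms are to be expected with a threshold this simple.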