Julia code for high-dimensional machine learning models. In this repository, you'll find bioinformatics tools for analyzing biological and genetic data, with a focus on machine learning and statistics in high dimensions. You'll also find data engineering pipelines for handling big data. Here is a quick, non-exhaustive list of topics:
- AWS: read data from AWS S3 buckets using DuckDB, Arrow, Polars, and other tools
- SQL: query data stored in S3 and write simple SQL code to extract it
- Machine Learning: mainly based on the MLJ package, which allows us to write ML pipelines
- Statistics: investigate hypothesis tests and new correlation coefficients
- Linear Algebra: methods for handling big data in genetics
- Bash: Bash scripts to automatically download data from URLs