Merge pull request #355 from bsipocz/irsa_fornax_projects
GSOC: adding IRSA projects
bsipocz authored Feb 17, 2024
2 parents 8c948c5 + 3a80924 commit 2f96e8a
Showing 2 changed files with 137 additions and 0 deletions.
74 changes: 74 additions & 0 deletions _projects/2024/irsa-fornax/astrodata_DL.md
@@ -0,0 +1,74 @@
---
name: Astronomical data enhancement with DL
desc:
requirements:
- Experience with data processing, AI, and machine learning
- Experience with Python
- Experience with AI/ML libraries such as TensorFlow and PyTorch
difficulty: very high
mentors:
- xoubish
- jkrick
initiatives:
- GSOC
project_size:
- 350 h / large
tags:
- irsa
- fornax
- Python
collaborating_projects:
- irsa-fornax
---

# Description

In partnership with the NASA Science Platform Initiative, which promotes scientific
research by providing enhanced access to archival data on the cloud, this project
aims to leverage artificial intelligence (AI) for the generation, augmentation, and
enhancement of archival astronomical data. Addressing the challenge of data diversity,
including variations in temporal span, wavelength coverage, resolution, and depth,
we plan to integrate advanced data processing techniques with deep learning models
such as flow-based architectures and transformers. Our objective is to rectify data
gaps and inconsistencies within astronomical datasets, facilitating uniform analysis
and minimizing information loss.
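
To make the gap-filling idea concrete, below is a minimal PyTorch sketch of a
transformer encoder trained to impute masked flux values in a multi-band sequence.
It is a toy under assumed shapes and hyperparameters, not the project's actual
architecture; the class name and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class GapFillingTransformer(nn.Module):
    """Toy transformer encoder that imputes masked (missing) flux values
    in a multi-band light-curve sequence. All sizes are illustrative."""

    def __init__(self, n_bands=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(n_bands, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_bands)

    def forward(self, x, missing):
        # x: (batch, time, n_bands); missing: boolean mask, True where data is absent
        x = torch.where(missing, torch.zeros_like(x), x)  # zero out the gaps
        return self.head(self.encoder(self.embed(x)))

# Self-supervised training step: hide random observed points, reconstruct them.
model = GapFillingTransformer()
x = torch.randn(8, 100, 4)            # fake batch: 8 objects, 100 epochs, 4 bands
missing = torch.rand_like(x) < 0.3    # pretend 30% of measurements are gaps
recon = model(x, missing)
loss = ((recon - x)[missing] ** 2).mean()  # penalize error only on hidden entries
loss.backward()
```

A flow-based variant would replace the pointwise MSE with a learned likelihood
over the missing values, at the cost of a more involved training loop.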

This initiative seeks to significantly enhance the
utility of existing astronomical datasets, ensuring that even incomplete data can
contribute meaningfully to scientific advancements. A crucial part of our methodology
involves the development and testing of a data unification schema on large samples
of Active Galactic Nuclei (AGNs), serving as a robust testbed for our algorithms.

This effort will not only validate the algorithms' ability to address complex data
challenges but also highlight their applicability across astronomical research.


## Goals

* Design and optimize a deep learning architecture tailored for the enhancement and unification of astronomical archival data.
* Conduct comprehensive testing of the data unification schema on large samples of AGNs.


## Project requirements

* Strong foundation in computer science, with a specialization in data processing, AI, and machine learning (ML).
* Proficiency in programming, particularly Python, and familiarity with AI/ML libraries and frameworks such as TensorFlow and PyTorch.
* Expertise in data analysis, capable of assessing the impact of different augmentation techniques on the informational content and practical utility of datasets.
* Collaborative spirit, prepared to work within the NASA science platform group and engage with astronomers, ensuring that the project's technical solutions are aligned with scientific objectives and effectively contribute to the field.


### Community Bonding Period

* Familiarize yourself with the current code and the challenges.
* Set up a development environment.

### Coding starts

#### 1st evaluation

* Have developed an initial DL architecture for gap filling in archival AGN data.

#### Final evaluation

* Have optimized the DL architecture for data unification from multiple archives, with quantified improvement metrics.
63 changes: 63 additions & 0 deletions _projects/2024/irsa-fornax/light_curve_dask.md
@@ -0,0 +1,63 @@
---
name: Enable Dask execution of NASA time-domain analysis
desc:
requirements:
- Experience with Python
- Experience with Dask
- Background in astronomy is desired but not required.
difficulty: high
mentors:
- jkrick
- troyraen
initiatives:
- GSOC
project_size:
- 350 h / large
tags:
- irsa
- fornax
- python
collaborating_projects:
- irsa-fornax
---

# Description

NASA is building a science console that runs on cloud compute and enables astrophysicists to access the literally
astronomically large datasets produced by space telescopes, past and future. Our team is directing the science
development of that console by prototyping novel, big-data science projects and turning them into code and tutorials
for the astrophysical community. We seek a contributor who can make our code execute efficiently at
scale on a Dask cluster provided by the science console.


This project focuses on a science use case: collecting data from all of NASA's archival
time-domain datasets for a user-defined set of targets. This produces "light curves" -- roughly,
brightness as a function of time -- in multiple wavelengths for each target. The science enabled by
these multi-wavelength light curves includes classification of AGN (black holes), young stellar objects, and many
other astronomically variable targets. Collecting these light curves is difficult, especially at the scale
of millions of targets. Our team has written code that does this work, and an accompanying tutorial that
demonstrates how to run the code in parallel using Python's `multiprocessing` library; a simplified
sketch of that pattern appears below.
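
The sketch below is a minimal, hypothetical stand-in for that pattern, not the
team's actual code: `collect_light_curve` fabricates placeholder data instead of
querying real archives, and the target names are invented.

```python
from multiprocessing import Pool

def collect_light_curve(target_id):
    """Hypothetical stand-in for the per-target archive queries; here it
    just fabricates (time, flux) pairs instead of hitting real archives."""
    times = range(10)
    fluxes = [len(target_id) + t for t in times]  # placeholder data
    return target_id, list(zip(times, fluxes))

if __name__ == "__main__":
    targets = [f"AGN-{i}" for i in range(1000)]
    with Pool(processes=8) as pool:
        # imap_unordered yields each result as soon as its worker finishes.
        light_curves = dict(pool.imap_unordered(collect_light_curve, targets))
    print(f"collected {len(light_curves)} light curves")
```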

The main task of this project is to determine and implement a solution that efficiently executes the
light-curve collection code on a Dask cluster. This will involve writing new Python code to manage Dask, and
may or may not require altering the existing code to work more efficiently with Dask. Time permitting, the
contributor may work with additional codes that we are developing for related use cases, each of which is
likely to present different challenges to running at scale on a Dask cluster. One possible starting point
is sketched below.
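
One plausible translation of the multiprocessing pattern to Dask, assuming
`dask.distributed` is available on the science console; the worker function and
scheduler address are again hypothetical placeholders:

```python
from dask.distributed import Client, as_completed

def collect_light_curve(target_id):
    """Same hypothetical per-target work as in the multiprocessing sketch."""
    times = range(10)
    fluxes = [len(target_id) + t for t in times]  # placeholder data
    return target_id, list(zip(times, fluxes))

if __name__ == "__main__":
    # On the science console this would connect to the provided cluster,
    # e.g. Client("tcp://scheduler-address:8786"); with no argument Dask
    # starts a local cluster, which is handy for development.
    client = Client()
    targets = [f"AGN-{i}" for i in range(1000)]
    futures = client.map(collect_light_curve, targets)
    # Consume results as they complete rather than gathering all at once,
    # which keeps memory bounded when scaling to millions of targets.
    light_curves = dict(future.result() for future in as_completed(futures))
    print(f"collected {len(light_curves)} light curves")
    client.close()
```

Whether `client.map` over targets, `dask.bag`, or restructuring the per-target
work into finer-grained tasks performs best is exactly the question this project
would investigate.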

## Goals

### Community Bonding Period

* Familiarize yourself with the current code and the challenges to running at scale.
* Set up a development environment.

### Coding starts

#### 1st evaluation

* Have written new code that executes the light-curve collection code on a Dask cluster.

#### Final evaluation

* Have implemented a solution that runs smoothly on a Dask cluster and finishes in less time than the current
code takes.
