Merge pull request #355 from bsipocz/irsa_fornax_projects
GSOC: adding IRSA projects
Showing 2 changed files with 137 additions and 0 deletions.
---
name: Astronomical data enhancement with DL
desc:
requirements:
- Experience with data processing, AI, and machine learning
- Experience with Python
- Experience with AI/ML libraries such as TensorFlow and PyTorch
difficulty: very high
mentors:
- xoubish
- jkrick
initiatives:
- GSOC
project_size:
- 350 h / large
tags:
- irsa
- fornax
- Python
collaborating_projects:
- irsa-fornax
---

# Description

In partnership with the NASA Science Platform Initiative, which promotes scientific research by providing enhanced access to archival data on the cloud, this project aims to leverage artificial intelligence (AI) for the generation, augmentation, and enhancement of archival astronomical data. To address the challenge of data diversity, including variations in temporal span, wavelength coverage, resolution, and depth, we plan to integrate advanced data processing techniques with deep learning models such as flow-based architectures and transformers. Our objective is to rectify data gaps and inconsistencies within astronomical datasets, facilitating uniform analysis and minimizing information loss.
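The gap-filling objective above can be made concrete with a small sketch. Everything below is a synthetic stand-in, not IRSA data or project code: a noisy light curve with a masked "seasonal" gap, reconstructed with a simple interpolation baseline. A flow-based or transformer model would be trained to reconstruct the masked region better than this baseline, and the same RMSE comparison would quantify the improvement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical light curve: flux sampled at irregular times (synthetic,
# standing in for archival data that would really come from IRSA).
times = np.sort(rng.uniform(0.0, 100.0, size=200))
flux = np.sin(2 * np.pi * times / 25.0) + 0.1 * rng.normal(size=times.size)

# Simulate an archival gap by masking a contiguous chunk of observations,
# mimicking a seasonal visibility window.
mask = (times > 40.0) & (times < 60.0)
observed_t, observed_f = times[~mask], flux[~mask]

# Baseline reconstruction: linear interpolation across the gap. A deep
# learning model would be trained to beat this baseline on held-out gaps.
reconstructed = np.interp(times, observed_t, observed_f)

gap_rmse = np.sqrt(np.mean((reconstructed[mask] - flux[mask]) ** 2))
print(f"gap RMSE of interpolation baseline: {gap_rmse:.3f}")
```

The masked-reconstruction setup shown here is also how training data for such a model can be manufactured: hide real observations, ask the model to predict them, and score against the truth.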

This initiative seeks to significantly enhance the utility of existing astronomical datasets, ensuring that even incomplete data can contribute meaningfully to scientific advancements. A crucial part of our methodology involves developing and testing a data unification schema on large samples of Active Galactic Nuclei (AGNs), which serve as a robust testbed for our algorithms.

This effort will not only validate the algorithms' ability to address complex data challenges but also highlight their applicability across astronomical research.

## Goals

* Design and optimize a deep learning architecture tailored for the enhancement and unification of astronomical archival data.
* Conduct comprehensive testing of the data unification schema on large samples of AGNs.

## Project requirements

* Strong foundation in computer science, with a specialization in data processing, AI, and machine learning (ML).
* Proficiency in programming, particularly Python, and familiarity with AI/ML libraries and frameworks such as TensorFlow and PyTorch.
* Expertise in data analysis, capable of assessing the impact of different augmentation techniques on the informational content and practical utility of datasets.
* Collaborative spirit, prepared to work within the NASA science platform group and engage with astronomers, ensuring that the project's technical solutions are aligned with scientific objectives and effectively contribute to the field.

### Community Bonding Period

* Familiarize yourself with the current code and the challenges.
* Set up a development environment.

### Coding starts

#### 1st evaluation

* Have developed an initial DL architecture for gap filling in archival AGN data.

#### Final evaluation

* Have optimized the DL architecture for data unification from multiple archives, with quantified improvement metrics.
---
name: Enable Dask execution of NASA time-domain analysis
desc:
requirements:
- Experience with Python
- Experience with Dask
- Background in astronomy is desired but not required.
difficulty: high
mentors:
- jkrick
- troyraen
initiatives:
- GSOC
project_size:
- 350 h / large
tags:
- irsa
- fornax
- python
collaborating_projects:
- irsa-fornax
---

# Description

NASA is building a science console that runs on cloud compute and enables astrophysicists to access the literally astronomically large datasets produced by space telescopes past and future. Our team is directing the science development of that console by trying out novel, big-data science projects and turning them into code and tutorials for use by the astrophysical community. We seek a contributor who can enable our code to be executed efficiently at scale on a Dask cluster provided by the science console.

This project will focus on a science use case based on the idea of collecting data from all of NASA's archival time-domain datasets for a user-defined set of targets. This produces "light curves" -- roughly, brightness as a function of time -- in multiple wavelengths for each target. The science that can result from these multi-wavelength light curves includes classification of AGN (black holes), young stellar objects, and many other astronomically variable targets. Collecting these light curves is difficult, especially at the scale of millions of targets. Our team has written code that does this work, and an accompanying tutorial that demonstrates how to run the code in parallel using Python's `multiprocessing` library.
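To illustrate the current parallelization pattern, here is a minimal `multiprocessing` sketch. The `collect_light_curve` function is a hypothetical stand-in for the real per-target archive queries, not the team's actual code; it fabricates a periodic signal so the example is self-contained.

```python
import multiprocessing as mp

def collect_light_curve(target):
    """Stand-in for the per-target archive queries the real code performs.

    `target` is just an (id, period) pair here; the actual code queries
    NASA archives for brightness measurements at many epochs and
    wavelengths.
    """
    target_id, period = target
    # Fake "brightness as a function of time" so the sketch runs anywhere.
    return target_id, [round((t % period) / period, 3) for t in range(10)]

if __name__ == "__main__":
    targets = [(f"agn-{i}", 2 + i) for i in range(8)]
    # The existing tutorial parallelizes per-target work along these lines:
    with mp.Pool(processes=4) as pool:
        light_curves = dict(pool.map(collect_light_curve, targets))
    print(len(light_curves), "light curves collected")
```

`pool.map` splits the target list across worker processes on a single machine, which is exactly the limitation the Dask port is meant to lift.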
|
||
The main task of this project is to determine and implement a solution that efficiently executes the light | ||
curve collection code on a Dask cluster. This will involve writing new python code to manage Dask, and may or | ||
may not require altering the existing code to work more efficiently with Dask. Time permitting, the contributor | ||
may work with additional codes that we are developing for related use cases, each of which is likely to present | ||
different challenges to running at scale on a Dask cluster. | ||
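One plausible shape for the Dask port, sketched under the assumption that `dask.distributed` is installed; the function body and target names are illustrative, and on the science console the cluster would be provided rather than created locally.

```python
from dask.distributed import Client, LocalCluster

def collect_light_curve(target_id):
    """Stand-in for the per-target archive query in the existing code."""
    return target_id, [t * 10 for t in range(5)]

if __name__ == "__main__":
    # Locally we spin up a LocalCluster so the sketch runs on a laptop;
    # on the science console, Client would connect to the provided cluster.
    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            # client.map mirrors multiprocessing's pool.map, but the
            # futures can be scattered across many machines.
            futures = client.map(collect_light_curve,
                                 [f"agn-{i}" for i in range(8)])
            light_curves = dict(client.gather(futures))
    print(len(light_curves), "light curves collected via Dask")
```

Because `Client.map`/`gather` follow the `concurrent.futures` interface, much of the existing `multiprocessing`-based tutorial can keep its per-target function unchanged; the project's real work is in partitioning data, managing memory, and measuring speedup at the scale of millions of targets.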

## Goals

### Community Bonding Period

* Familiarize yourself with the current code and the challenges to running at scale.
* Set up a development environment.

### Coding starts

#### 1st evaluation

* Have written new code that executes the light curve collection code on a Dask cluster.

#### Final evaluation

* Have implemented a solution that runs smoothly on a Dask cluster and finishes in less time than the current code takes.