Notebooks and Data for Scientific Computing II - Fall 24

Most of the materials in this class are derived from notebooks and activities developed by Ben Farr for his scientific computing class at the University of Oregon and by Stephen Taylor for his Astrostatistics class at Vanderbilt University.

This is a collection of notebooks and data, which will be added to throughout the term.

Data Analysis Notebooks

Data Provenance

Exploring births in the US

US Birth data from the Social Security Administration, prepared by FiveThirtyEight.

This data can be downloaded with a wget command:

mkdir -p ../data
wget -qO ../data/US_births_2000-2014_SSA.csv https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_2000-2014_SSA.csv
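A minimal first look at this file in Python, using only the standard library. It assumes the header columns are `year, month, date_of_month, day_of_week, births` (check them against the downloaded file). It averages the daily birth count for each `day_of_week` code. The helper name `mean_births_by_weekday` is hypothetical, not part of the course notebooks.

```python
import csv
from collections import defaultdict


def mean_births_by_weekday(f):
    """Average daily births for each day_of_week code in a births CSV.

    f: an open text file (or file-like object) with a header row.
    Assumed columns: year, month, date_of_month, day_of_week, births.
    """
    totals = defaultdict(int)  # summed births per day_of_week code
    counts = defaultdict(int)  # number of days seen per code
    for row in csv.DictReader(f):
        day = int(row["day_of_week"])
        totals[day] += int(row["births"])
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}
```

Usage, assuming the download path above: `with open("../data/US_births_2000-2014_SSA.csv", newline="") as f: print(mean_births_by_weekday(f))`.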

Solar Neighborhood w/ Gaia

We will use the Gaia DR3 data release to explore the solar neighborhood. The data is available from the Gaia Archive. We will use the following query to get the data:

SELECT TOP 300000 phot_g_mean_mag+5*log10(parallax)-10 AS mg, bp_rp, parallax FROM gaiadr3.gaia_source
WHERE parallax_over_error > 10
AND parallax > 10
AND phot_g_mean_flux_over_error>50
AND phot_rp_mean_flux_over_error>20
AND phot_bp_mean_flux_over_error>20
AND phot_bp_rp_excess_factor < 1.3+0.06*power(phot_bp_mean_mag-phot_rp_mean_mag,2)
AND phot_bp_rp_excess_factor > 1.0+0.015*power(phot_bp_mean_mag-phot_rp_mean_mag,2)
AND visibility_periods_used>8
AND astrometric_chi2_al/(astrometric_n_good_obs_al-5)<1.44*greatest(1,exp(-0.4*(phot_g_mean_mag-19.5)))
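The `mg` column in the query is the absolute G magnitude obtained by inverting the parallax. With the parallax $\varpi$ in mas, $d = 1000/\varpi$ pc, so $M_G = G - 5\log_{10}(d/10\,\mathrm{pc}) = G + 5\log_{10}\varpi - 10$. The cut `parallax > 10` therefore keeps stars within 100 pc. The sketch below reproduces that arithmetic in Python. The function names are illustrative.

```python
import math


def distance_pc(parallax_mas):
    """Naive distance in parsecs from a parallax in milliarcseconds."""
    return 1000.0 / parallax_mas


def absolute_g_mag(g_mag, parallax_mas):
    """Absolute G magnitude, matching the query's `mg` column.

    M = m - 5*log10(d_pc / 10) = m + 5*log10(parallax_mas) - 10.
    Simple 1/parallax inversion is only reasonable for precise parallaxes,
    which the query enforces with parallax_over_error > 10.
    """
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive to invert to a distance")
    return g_mag + 5 * math.log10(parallax_mas) - 10
```

A star at 10 pc (parallax 100 mas) has an absolute magnitude equal to its apparent magnitude, which is a quick sanity check on the sign of each term.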

Synthetic data for linear regression

This data accompanies Hogg, Bovy, and Lang (2010). It can be downloaded directly with:

!wget -O ../data/data_yerr.dat https://raw.githubusercontent.com/davidwhogg/DataAnalysisRecipes/master/straightline/src/data_yerr.dat
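The simplest fit in Hogg, Bovy, and Lang (2010) is weighted linear least squares with uncertainties only in $y$: $X = (A^T C^{-1} A)^{-1} A^T C^{-1} Y$. Here $A$ has rows $[1, x_i]$, $C = \mathrm{diag}(\sigma_{y,i}^2)$ and $X = [b, m]$. The following is a minimal NumPy sketch of that formula. The column layout of `data_yerr.dat` is not described here, so the function takes arrays rather than a file. Inspect the file before loading it, for example with `np.loadtxt`.

```python
import numpy as np


def fit_line(x, y, sigma_y):
    """Weighted least-squares straight line y = b + m*x.

    Returns (b, m) and their 2x2 covariance matrix, computed as
    X = (A^T C^-1 A)^-1 A^T C^-1 Y with C = diag(sigma_y**2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma_y = np.asarray(sigma_y, dtype=float)
    A = np.vstack([np.ones_like(x), x]).T  # design matrix with columns [1, x]
    Cinv = np.diag(1.0 / sigma_y**2)       # inverse of the diagonal data covariance
    cov = np.linalg.inv(A.T @ Cinv @ A)    # covariance of the fitted (b, m)
    b, m = cov @ (A.T @ Cinv @ y)
    return (b, m), cov
```

Forming the explicit inverse is fine for a 2x2 system and also hands back the parameter covariance. For larger design matrices, `np.linalg.solve` or `np.linalg.lstsq` is numerically safer.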

CO2 Concentrations in Mauna Loa, Hawaii

Monthly-averaged CO2 concentrations measured at Mauna Loa, Hawaii, hosted by NOAA:

!wget -q ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt -O ../data/co2_mm_mlo.txt
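The following is a minimal parser for this text file. It rests on several assumptions that should be checked against the file's own header comments: comment lines start with `#`, columns are whitespace-separated, the third column is the decimal date and the fourth is the monthly mean CO2 in ppm, and negative values (such as `-99.99`) flag missing months. `read_co2_monthly` is a hypothetical helper name.

```python
def read_co2_monthly(lines):
    """Parse NOAA-style monthly CO2 text into (decimal_date, co2_ppm) pairs.

    Assumed format: '#' comment lines; whitespace-separated columns with
    the decimal date in column 3 and monthly mean CO2 (ppm) in column 4.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        decimal_date = float(fields[2])
        co2 = float(fields[3])
        if co2 < 0:
            continue  # assumed missing-value sentinel, e.g. -99.99
        records.append((decimal_date, co2))
    return records
```

Usage, assuming the download path above: `with open("../data/co2_mm_mlo.txt") as f: data = read_co2_monthly(f)`.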

Logistic Regression Synthetic Data

To introduce logistic regression, we use two datasets from Jordi Warmenhoven's Coursera-Machine-Learning repository:

!wget https://raw.githubusercontent.com/JWarmenhoven/Coursera-Machine-Learning/master/notebooks/data/ex2data1.txt -O ../data/ex2data1.txt
!wget https://raw.githubusercontent.com/JWarmenhoven/Coursera-Machine-Learning/master/notebooks/data/ex2data2.txt -O ../data/ex2data2.txt
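As a reference point for the notebooks, here is a minimal NumPy sketch of logistic regression fit by batch gradient descent on the mean cross-entropy loss. The learning rate, iteration count and function names are illustrative choices, not the course's. A library optimizer (e.g. from SciPy or scikit-learn) is the usual alternative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so w[0] acts as the intercept."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss.

    X: (n_samples, n_features) array; y: 0/1 labels.
    Returns the weight vector, intercept first.
    """
    Xb = add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)               # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)    # gradient of the mean cross-entropy
        w -= lr * grad
    return w


def predict(w, X):
    """Class labels from a 0.5 probability threshold."""
    return (sigmoid(add_intercept(X) @ w) >= 0.5).astype(int)
```

For example, assuming `ex2data1.txt` is comma-separated with the 0/1 label in its last column, you could load it with `data = np.loadtxt("../data/ex2data1.txt", delimiter=",")` and split it with `X, y = data[:, :-1], data[:, -1]`. Standardizing the features first (`X = (X - X.mean(axis=0)) / X.std(axis=0)`) keeps a fixed learning rate stable when raw features span large ranges.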

SDSS Quasars

This data was collected by the Sloan Digital Sky Survey (SDSS) and is used here to distinguish quasars from other point sources. The catalogs we'll be using are part of PSU's astrostatistics datasets. We need three separate files, one for each spectroscopically confirmed class.

Spectroscopically confirmed stars:

!wget -q --no-check-certificate -O ../data/SDSS_stars.csv https://astrostatistics.psu.edu/MSMA/datasets/SDSS_stars.csv

white dwarfs:

!wget -q --no-check-certificate -O ../data/SDSS_wd.csv https://astrostatistics.psu.edu/MSMA/datasets/SDSS_wd.csv

and quasars:

!wget -q --no-check-certificate -O ../data/SDSS_quasar.dat https://astrostatistics.psu.edu/datasets/SDSS_quasar.dat

More info on the dataset can be found here.
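Since the three classes arrive as separate files, a classifier needs them combined into one feature matrix with a label per row. The following is a minimal sketch. It assumes each file has already been loaded into a 2-D array holding the same feature columns in the same order. The files' column layouts are not described here, and the quasar file is a `.dat` rather than a `.csv`, so check each header and select matching columns first. `stack_classes` is a hypothetical helper name.

```python
import numpy as np


def stack_classes(class_arrays):
    """Combine per-class feature tables into one labeled dataset.

    class_arrays: dict mapping a class name to an (n_i, n_features) array.
    Returns X (all rows stacked), y (integer labels) and the class names,
    where y == k means class_names[k].
    """
    class_names = list(class_arrays)
    X = np.vstack([np.asarray(class_arrays[name], dtype=float) for name in class_names])
    y = np.concatenate(
        [np.full(len(class_arrays[name]), k) for k, name in enumerate(class_names)]
    )
    return X, y, class_names
```

Keeping the label as an integer index with a separate list of names makes it easy to pass `X, y` straight to most classifiers and map predictions back to "star", "white dwarf" or "quasar".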

M4

We make use of two separate data products from the Gaia collaboration. The first is a cluster catalog associated with the Gaia DR2 paper on the kinematics of globular clusters and dwarf galaxies around the Milky Way (A&A 616, A12). The full data release for that paper includes tables of members identified for each cluster studied. The member table for M4 (NGC 6121) can be downloaded directly with:

wget http://cdsarc.u-strasbg.fr/ftp/J/A+A/616/A12/files/NGC6121-1.dat -O ../data/NGC6121-1.dat

Second, we use m4_gaia_source.csv, which was pulled from the Gaia data archive with the following query:

SELECT TOP 1000000
	gaia_source.designation, gaia_source.source_id,
	gaia_source.ra, gaia_source.dec,
	gaia_source.parallax, gaia_source.parallax_error, gaia_source.parallax_over_error,
	gaia_source.pm, gaia_source.pmra, gaia_source.pmra_error, gaia_source.pmdec, gaia_source.pmdec_error,
	gaia_source.astrometric_n_good_obs_al, gaia_source.astrometric_chi2_al, gaia_source.visibility_periods_used,
	gaia_source.phot_g_mean_flux_over_error, gaia_source.phot_g_mean_mag,
	gaia_source.phot_bp_mean_flux_over_error, gaia_source.phot_bp_mean_mag,
	gaia_source.phot_rp_mean_flux_over_error, gaia_source.phot_rp_mean_mag,
	gaia_source.phot_bp_rp_excess_factor, gaia_source.bp_rp,
	gaia_source.radial_velocity, gaia_source.radial_velocity_error
FROM gaiadr3.gaia_source
WHERE
CONTAINS(
	POINT('ICRS',gaiadr3.gaia_source.ra,gaiadr3.gaia_source.dec),
	BOX('ICRS',246,-26.5,3,3)
)=1
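The M4 query returns the columns needed for the photometric and astrometric quality cuts used in the solar-neighborhood query. The following is a minimal sketch that applies those cuts in Python to rows of `m4_gaia_source.csv`. The two parallax cuts are deliberately left out: they select stars within 100 pc and would reject M4 members. Rows with a missing value in any needed column are rejected. The function names and the file path in the usage note are assumptions.

```python
import csv
import math


def passes_quality_cuts(row):
    """Non-parallax quality cuts from the solar-neighborhood query.

    row: dict of strings, e.g. one row from csv.DictReader.
    """
    try:
        g_snr = float(row["phot_g_mean_flux_over_error"])
        bp_snr = float(row["phot_bp_mean_flux_over_error"])
        rp_snr = float(row["phot_rp_mean_flux_over_error"])
        excess = float(row["phot_bp_rp_excess_factor"])
        bp_rp = float(row["phot_bp_mean_mag"]) - float(row["phot_rp_mean_mag"])
        n_vis = float(row["visibility_periods_used"])
        g_mag = float(row["phot_g_mean_mag"])
        # reduced chi^2 of the 5-parameter astrometric solution
        chi2_red = float(row["astrometric_chi2_al"]) / (float(row["astrometric_n_good_obs_al"]) - 5)
    except (KeyError, ValueError, ZeroDivisionError):
        return False  # missing column, empty value, or degenerate solution
    return (
        g_snr > 50 and rp_snr > 20 and bp_snr > 20
        and 1.0 + 0.015 * bp_rp**2 < excess < 1.3 + 0.06 * bp_rp**2
        and n_vis > 8
        and chi2_red < 1.44 * max(1.0, math.exp(-0.4 * (g_mag - 19.5)))
    )


def filter_rows(f):
    """Yield the rows of an open Gaia CSV file that pass the quality cuts."""
    for row in csv.DictReader(f):
        if passes_quality_cuts(row):
            yield row
```

Usage, assuming the file sits alongside the other data: `with open("../data/m4_gaia_source.csv", newline="") as f: good = list(filter_rows(f))`.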