This document provides the attribution required for data used in the InstructLab project, including pre-training data, knowledge data, and skills data.
For new contributions to InstructLab that require data along with the submission, we have thorough guidelines on how to provide attribution for these data sources.
For data used to create the artifacts of the InstructLab project before it was open sourced, we are aware that this list is not comprehensive at the time of writing. The project maintainers will augment and update it over time to the best of their abilities.
The following openly licensed textbook works were used as knowledge source seeds in the knowledge synthetic data generation pipeline.
Dataset Name | Knowledge Taxonomy Location | License (where possible, use SPDX License Identifier) | Creator Names | Copyright |
---|---|---|---|---|
Anatomy and Physiology 2e | anatomy | CC-BY-4.0 | Senior Contributing Authors: Kelly A. Young, California State University, Long Beach; James A. Wise, Hampton University; Eddie Johnson, Central Oregon Community College; Brandon Poe, Springfield Technical Community College; Dean H. Kruse, Portland Community College; Oksana Korol, Aims Community College; Jody E. Johnson, Arapahoe Community College; Mark Womble, Youngstown State University; Peter DeSaix, University of North Carolina at Chapel Hill | Copyright 2022 Rice University |
Astronomy 2e | astronomy | CC-BY-4.0 | Senior Contributing Authors: Andrew Fraknoi, Fromm Institute, University of San Francisco; David Morrison, NASA (Emeritus) and SETI Institute; Sidney Wolff, NOIRLab (Emerita) | Copyright 2022 Rice University |
Business Ethics | business_ethics | CC-BY-4.0 | Senior Contributing Authors: Stephen M. Byars, USC Marshall School of Business; Kurt Stanberry, University of Houston–Downtown | Copyright 2018 Rice University |
Biology 2e | college_biology | CC-BY-4.0 | Senior Contributing Authors: Mary Ann Clark, Texas Wesleyan University; Matthew Douglas, Grand Rapids Community College; Jung Choi, Georgia Institute of Technology | Copyright 2020 Rice University |
Biology for AP Courses | college_biology | CC-BY-4.0 | Senior Contributing Authors: Julianne Zedalis, The Bishop's School in La Jolla, CA; John Eggebrecht, Cornell University | Copyright 2018 Rice University |
Concepts of Biology | college_biology | CC-BY-4.0 | Senior Contributing Authors: Samantha Fowler, Clayton State University; Rebecca Roush, Sandhills Community College; James Wise, Hampton University | Copyright 2023 Rice University |
Microbiology | college_biology | CC-BY-4.0 | Senior Contributing Authors: Nina Parker, Shenandoah University; Mark Schneegurt, Wichita State University; Anh-Hue Thi Tu, Georgia Southwestern State University; Philip Lister, Central New Mexico Community College; Brian M. Forster, Saint Joseph’s University | Copyright 2021 Rice University |
Chemistry 2e | college_chemistry | CC-BY-4.0 | Senior Contributing Authors: Paul Flowers, University of North Carolina at Pembroke; Klaus Theopold, University of Delaware; Richard Langley, Stephen F. Austin State University; William R. Robinson, PhD, Purdue University | Copyright 2019 Rice University |
Chemistry Atoms First 2e | college_chemistry | CC-BY-4.0 | Senior Contributing Authors: Paul Flowers, University of North Carolina at Pembroke; Edward J. Neth, University of Connecticut; Klaus Theopold, University of Delaware; Richard Langley, Stephen F. Austin State University; William R. Robinson, PhD, Purdue University | Copyright 2019 Rice University |
University Physics Volume 1 | college_physics | CC-BY-4.0 | Senior Contributing Authors: William Moebs, Formerly of Loyola Marymount University; Samuel J. Ling, Truman State University; Jeff Sanny, Loyola Marymount University | Copyright 2021 Rice University |
University Physics Volume 2 | college_physics | CC-BY-4.0 | Senior Contributing Authors: William Moebs, Formerly of Loyola Marymount University; Samuel J. Ling, Truman State University; Jeff Sanny, Loyola Marymount University | Copyright 2021 Rice University |
University Physics Volume 3 | college_physics | CC-BY-4.0 | Senior Contributing Authors: William Moebs, Formerly of Loyola Marymount University; Samuel J. Ling, Truman State University; Jeff Sanny, Loyola Marymount University | Copyright 2021 Rice University |
Introductory Business Statistics with Interactive Spreadsheets - 1st Canadian Edition | business_statistics | CC-BY-4.0 | Mohammad Mahbobi and Thomas K. Tiemann | Copyright 2015 by Mohammad Mahbobi and Thomas K. Tiemann |
Principles of Macroeconomics 3e | econometrics | CC-BY-4.0 | Senior Contributing Authors: David Shapiro, Pennsylvania State University; Daniel MacDonald, California State University, San Bernardino; Steven A. Greenlaw, University of Mary Washington | Copyright 2022 Rice University |
Algebra and Trigonometry 2e | high_school_mathematics | CC-BY-4.0 | Senior Contributing Author: Jay Abramson, Arizona State University | Copyright 2021 Rice University |
Precalculus 2e | high_school_mathematics | CC-BY-4.0 | Senior Contributing Author: Jay Abramson, Arizona State University | Copyright 2021 Rice University |
Introductory Business Statistics | high_school_mathematics | CC-BY-4.0 | Senior Contributing Authors: Alexander Holmes, The University of Oklahoma; Barbara Illowsky, De Anza College; Susan Dean, De Anza College | Copyright 2018 Rice University |
Contemporary Mathematics | high_school_mathematics | CC-BY-4.0 | Senior Contributing Author: Donna Kirk, University of Wisconsin at Superior | Copyright 2023 Rice University |
Introductory Statistics | high_school_mathematics | CC-BY-4.0 | Senior Contributing Authors: Barbara Illowsky, De Anza College; Susan Dean, De Anza College | Copyright 2018 Rice University |
Statistics | high_school_mathematics | CC-BY-4.0 | Senior Contributing Authors: Barbara Illowsky, De Anza College; Susan Dean, De Anza College | Copyright 2020 Texas Education Agency (TEA) |
College Algebra 2e | high_school_mathematics | CC-BY-4.0 | Senior Contributing Author: Jay Abramson, Arizona State University | Copyright 2021 Rice University |
Applied Calculus | high_school_mathematics | CC-BY-SA-4.0 | Kevin Gonzales, Eric Hopkins, Catherine Zimmitti, Cheryl Kane; Modified to fit Applied Calculus from Coordinated Calculus by Nathan Wakefield et al.; Based upon Active Calculus by Matthew Boelkins | Copyright 2018 - 2021 University of Nebraska - Lincoln, Department of Mathematics |
Coordinated Calculus | high_school_mathematics | CC-BY-SA-4.0 | Nathan Wakefield, Christine Kelley, Marla Williams, Michelle Haver, Lawrence Seminario-Romero, Robert Huben, Aurora Marks, Stephanie Prahl; Based upon Active Calculus by Matthew Boelkins | Copyright 2019 University of Nebraska - Lincoln, Department of Mathematics |
Coordinated Multivariable Calculus | high_school_mathematics | CC-BY-NC-SA-4.0 | Steve Schlicker, Mitchel T. Keller, Nicholas Long, Zach Norwood, Audrey Goodnight; Based on Active Calculus | Copyright 2013 - 2022 Steven Schlicker, Mitchel T. Keller, and Nicholas Long |
Principles of Economics 3e | high_school_microeconomics | CC-BY-4.0 | Senior Contributing Authors: Steven A. Greenlaw, University of Mary Washington; David Shapiro, Pennsylvania State University; Daniel MacDonald, California State University, San Bernardino | Copyright 2022 Rice University |
Principles of Microeconomics 3e | high_school_microeconomics | CC-BY-4.0 | Senior Contributing Authors: Steven A. Greenlaw, University of Mary Washington; David Shapiro, Pennsylvania State University; Daniel MacDonald, California State University, San Bernardino | Copyright 2022 Rice University |
Physics | high_school_physics | CC-BY-4.0 | Senior Contributing Authors: Paul Peter Urone, California State University, Sacramento; Roger Hinrichs, State University of New York, College at Oswego | Copyright 2020 Texas Education Agency (TEA) |
Psychology 2e | high_school_psychology | CC-BY-4.0 | Senior Contributing Authors: Rose M. Spielman, Formerly of Quinnipiac University; William J. Jenkins, Mercer University; Marilyn D. Lovett, Spelman College | Copyright 2020 Rice University |
U.S. History | high_school_us_history | CC-BY-4.0 | Senior Contributing Authors: P. Scott Corbett, Ventura College; Volker Janssen, California State University, Fullerton; John M. Lund, Keene State College; Todd Pfannestiel, Clarion University; Sylvie Waskiewicz; Paul Vickery, Oral Roberts University | Copyright 2021 Rice University |
World History, Volume 1: to 1500 | high_school_world_history | CC-BY-4.0 | Senior Contributing Authors: Ann Kordas, Johnson & Wales University; Ryan J. Lynch, Columbus State University; Brooke Nelson, formerly California State University; Julie Tatlock, Mount Mary University | Copyright 2023 Rice University |
World History, Volume 2: from 1400 | high_school_world_history | CC-BY-4.0 | Senior Contributing Authors: Ann Kordas, Johnson & Wales University; Ryan J. Lynch, Columbus State University; Brooke Nelson, formerly California State University; Julie Tatlock, Mount Mary University | Copyright 2022 Rice University |
Introduction to Philosophy | philosophy | CC-BY-4.0 | Senior Contributing Author: Nathan Smith, Houston Community College | Copyright 2022 Rice University |
Principles of Financial Accounting | financial_accounting | CC-BY-SA-4.0 | Christine Jonick | Copyright 2017 University of North Georgia Press |
Intermediate Financial Accounting Volume 1 | financial_accounting | CC-BY-4.0 | Glenn Arnold, Athabasca University and Suzanne Kyle | Copyright 2016 Vretta-Lyryx Inc. |
Intermediate Financial Accounting Volume 2 | financial_accounting | CC-BY-4.0 | Glenn Arnold, Athabasca University and Suzanne Kyle | Copyright 2017-2021 Vretta-Lyryx Inc. |
Introduction to Political Science | political_science | CC-BY-4.0 | Senior Contributing Authors: Mark Carl Rom, Georgetown University; Masaki Hidaka, American University; Rachel Bzostek Walker, Collin College | Copyright 2022 Rice University |
Introduction to Anthropology | anthropology | CC-BY-4.0 | Senior Contributing Authors: Jennifer Hasty, University of Pennsylvania; David G. Lewis, Oregon State University; Marjorie M. Snipes, University of West Georgia | Copyright 2022 Rice University |
Dataset Name | Knowledge Taxonomy Location | License and/or Copyright | Other Citation Information |
---|---|---|---|
IBM Redbooks | ibm_redbooks | Copyright IBM with some rights available | |
The following openly licensed datasets were used as foundational and safety seeds in the skills synthetic data generation pipeline.
Dataset Name | License |
---|---|
GSM-8K | MIT |
AQuA-RAT | Apache-2.0 |
NumGLUE | ODC |
MathQA | Apache-2.0 |
CommitPack | MIT |
PRM800K | MIT |
SciBench | MIT |
TheoremQA | MIT |
openbookQA | Apache-2.0 |
ARB | CC-BY-4.0 |
TigerResearch | Apache-2.0 |
Musique | Apache-2.0 |
MMLU | MIT |
Conala | MIT |
2wikimultihop | Apache-2.0 |
hotpot_qa | CC-BY-SA-4.0 |
Helpsteer | CC-BY-4.0 |
Squad v2 | CC-BY-SA-4.0 |
HH-RLHF | MIT |
Flan Collection | Apache-2.0 |
Chatbot arena (Prompts Only) | CC-BY-4.0 |
Dataset Name | License |
---|---|
OASST2 | Apache-2.0 |
Prosocial-dialog | CC-BY-4.0 |