Deploying to gh-pages from @ f7d8fb2 🚀
tombaker committed Sep 26, 2023
1 parent 8b537f0 commit 382d919
Showing 7 changed files with 23 additions and 23 deletions.
8 changes: 4 additions & 4 deletions _sources/introduction/index.md
@@ -7,12 +7,12 @@ Research typically starts with a problem, such as the relationship of growing co

To explore that problem, a researcher may need to pull data together from a variety of sources, such as sensors, smart phones, drones, or published statistics. That data may be available in a diversity of formats, such as databases, spreadsheets, instrument feeds, or even printed reports. The available data may use different units of measure, such as centimeters versus inches, or reflect different levels of granularity, such as state versus county.

-Researchers must decide which elements of the available sources will be required to support their planned queries or regression analyses. Through discussions and whiteboard diagrams (which we like to call "boxologies"), researchers conceptualize the entities of interest for a given research problem, relationships among those entities, and the required properties for each entity. Each entity, along with its properties, is defined as a "data shape".
+Researchers must decide which elements of the available sources will be required to support their planned queries or regression analyses. Through discussions and whiteboard diagrams (which we like to call ["boxologies"](https://www.youtube.com/watch?v=w1HTzAbCMDs&t=1491s)), researchers conceptualize the entities of interest for a given research problem, relationships among those entities, and the required properties for each entity. Each entity, along with its properties, is defined as a "data shape".

-In this project, data shapes are formalized as schemas using the Shape Expressions Language, or [ShEx](https://shex.io/shex-primer/). ShEx schemas are based, in turn, on Resource Description Framework, or [RDF]() -- a generalized and implementation-agnostic standard for expressing data in terms of sentence-like statements organized in flexible graph structures. RDF, in turn, is founded on the Uniform Resource Identifier, or URIs, most recognizable as the ubiquitous URL, or "Web address".
+In this project, data shapes are formalized as schemas using the Shape Expressions Language, or [ShEx](https://shex.io/shex-primer/). ShEx schemas are based, in turn, on Resource Description Framework, or [RDF](https://www.w3.org/TR/skos-primer/) -- a generalized and implementation-agnostic standard for expressing data in terms of sentence-like statements organized in flexible graph structures. RDF, in turn, is founded on the Uniform Resource Identifier, or URIs, most recognizable as the ubiquitous URL, or "Web address".

-The elements of data shapes are ideally identified with URIs from global "authorities" -- stable, persistent, institutionally backed repositories of well-defined URIs. It is the use of authoritative URIs that allow data shapes to serve as the basis for data interoperability not just in the here and now, but also in the future. For authoritative URIs related to concepts and properties in agriculture, the USDA research world looks to the NALT Concept Space of the National Agricultural Library. NALT, formerly known as the NAL Thesaurus, has extended its mission to support the creation of data shapes in support of research data interoperability.
+The elements of data shapes are ideally identified with URIs from global "authorities" -- stable, persistent, institutionally backed repositories of well-defined URIs. It is the use of authoritative URIs that allow data shapes to serve as the basis for data interoperability not just in the here and now, but also in the future. For authoritative URIs related to concepts and properties in agriculture, the USDA research world looks to the [NALT Concept Space](https://agclass.nal.usda.gov/) of the National Agricultural Library. NALT, formerly known as the NAL Thesaurus, has extended its mission to support the creation of data shapes in support of research data interoperability.

-To illustrate with a simple example: In addressing a given research problem, a researcher may need two data shapes: a "crop sample shape" with NALT URIs for harvest weight and color, and a "weather shape" with NALT URIs for precipitation and cloud cover. These shapes, or "target shapes", can serve as targets for converting, normalizing, and integrating selected data elements from a variety of sources and in a variety of formats. Precipitation might be pulled from a weather database by SQL query. Harvest weight might be extracted from a spreadsheet and converted into metric units using a Python script. (The implementation specifics of extraction and normalization are orthogonal to data shapes themselves.)
+A simple example: To address a given research problem, a researcher may need two data shapes: a "crop sample shape" with NALT URIs for harvest weight and color, and a "weather shape" with NALT URIs for precipitation and cloud cover. These shapes, or "target shapes", can serve as targets for converting, normalizing, and integrating selected data elements from a variety of sources and in a variety of formats. Precipitation might be pulled from a weather database by SQL query. Harvest weight might be extracted from a spreadsheet and converted into metric units using a Python script. (The implementation specifics of extraction and normalization are orthogonal to data shapes themselves.)

Data shapes can be published on the Web for use by other researchers or for adaptation to related research domains. To the extent that data shapes are based on authoritative URIs from NALT, data shapes serve to focus the efforts of agricultural researchers in the interest of improving data interoperability over the long term.
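The crop-sample example in this hunk mentions extracting a harvest weight from a spreadsheet, converting it to metric units with a Python script, and expressing the result against a target shape. As a minimal sketch only, assuming the source column is in pounds and using hypothetical `example.org` URIs in place of real NALT identifiers (this commit does not list any), the conversion and the resulting RDF statements in N-Triples syntax might look like this:

```python
# Minimal sketch: normalize a harvest weight to kilograms and express one
# crop sample as RDF statements (subject, predicate, object) in N-Triples.
# ASSUMPTION: all URIs under example.org are hypothetical placeholders,
# NOT real NALT identifiers; the function names are illustrative only.

LB_TO_KG = 0.45359237  # exact definition of the international pound
EX = "http://example.org/"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def pounds_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms, rounded to 3 decimal places."""
    return round(pounds * LB_TO_KG, 3)


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def crop_sample_triples(sample_id: str, weight_lb: float, color: str) -> list[str]:
    """Return N-Triples lines for one crop sample."""
    subject = f"<{EX}sample/{sample_id}>"
    weight_kg = pounds_to_kg(weight_lb)
    return [
        f'{subject} <{EX}harvestWeightKg> "{weight_kg}"^^<{XSD_DECIMAL}> .',
        f"{subject} <{EX}color> {literal(color)} .",
    ]


for line in crop_sample_triples("S001", 10.0, "white"):
    print(line)
```

Each output line is one subject-predicate-object statement. Replacing the placeholder predicates with the appropriate NALT URIs is what would tie such data to a published shape; how the values are extracted stays orthogonal, as the text says.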
12 changes: 6 additions & 6 deletions creating-rdf/cotton/convert2rdf.html
@@ -584,7 +584,7 @@ <h2>Schema of meta data on synthetic dataset 1<a class="headerlink" href="#schem
<span class="ne">ExecutableNotFound</span>: failed to execute PosixPath(&#39;dot&#39;), make sure the Graphviz executables are on your systems&#39; PATH
</pre></div>
</div>
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f54141c47f0&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f66b8e85910&gt;
</pre></div>
</div>
</div>
@@ -727,7 +727,7 @@ <h2>Schema of meta data on synthetic dataset 1<a class="headerlink" href="#schem
<span class="ne">ExecutableNotFound</span>: failed to execute PosixPath(&#39;dot&#39;), make sure the Graphviz executables are on your systems&#39; PATH
</pre></div>
</div>
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f53e46e52e0&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f668a3f8a90&gt;
</pre></div>
</div>
</div>
@@ -743,7 +743,7 @@ <h3>Load (synthetic) data<a class="headerlink" href="#load-synthetic-data" title
</div>
</div>
<div class="cell_output docutils container">
-<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/tmp/ipykernel_1901/1407265399.py:2: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
+<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/tmp/ipykernel_1902/1407265399.py:2: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
subjects = pd.read_csv(filename,index_col=False)
</pre></div>
</div>
@@ -1535,7 +1535,7 @@ <h2>Production Shape<a class="headerlink" href="#id2" title="Permalink to this h
<span class="ne">ExecutableNotFound</span>: failed to execute PosixPath(&#39;dot&#39;), make sure the Graphviz executables are on your systems&#39; PATH
</pre></div>
</div>
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f53e4482e20&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f668a190e20&gt;
</pre></div>
</div>
</div>
@@ -1721,7 +1721,7 @@ <h2>ProductionCrop<a class="headerlink" href="#id3" title="Permalink to this hea
<span class="ne">ExecutableNotFound</span>: failed to execute PosixPath(&#39;dot&#39;), make sure the Graphviz executables are on your systems&#39; PATH
</pre></div>
</div>
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f53e4582280&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f668a399190&gt;
</pre></div>
</div>
</div>
@@ -1893,7 +1893,7 @@ <h2>Agricultural experiment<a class="headerlink" href="#agricultural-experiment"
<span class="ne">ExecutableNotFound</span>: failed to execute PosixPath(&#39;dot&#39;), make sure the Graphviz executables are on your systems&#39; PATH
</pre></div>
</div>
-<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f53e4616940&gt;
+<div class="output text_plain highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>&lt;graphviz.graphs.Digraph at 0x7f668a2e1130&gt;
</pre></div>
</div>
</div>
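Apart from the rendered Graphviz errors, the changes in this file are only the memory addresses in the `<graphviz.graphs.Digraph at 0x...>` reprs and the kernel process IDs in the `/tmp/ipykernel_*` paths, both of which differ on every notebook execution. The `ExecutableNotFound` error itself indicates that the build environment had no `dot` binary on its PATH, so the cells appear to fall back to showing the object's plain-text repr instead of a diagram. A pre-build check, sketched here with the standard library's `shutil.which` (the helper name `graphviz_available` is hypothetical, not part of this project), could catch this before the book is built:

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable is found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return shutil.which(executable) is not None


if graphviz_available():
    print("Graphviz 'dot' found; diagrams can be rendered.")
else:
    print("Graphviz 'dot' not found on PATH; install the Graphviz "
          "system package before building.")
```

The `graphviz` Python package only wraps the Graphviz command-line tools; on Debian or Ubuntu build runners, installing the `graphviz` system package is what provides `dot`.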
2 changes: 1 addition & 1 deletion creating-rdf/cotton/primary_data.html
@@ -425,7 +425,7 @@ <h2>Dataset on Cotton<a class="headerlink" href="#dataset-on-cotton" title="Perm
It contains legacy data on cotton harvests.</p>
<div class="cell tag_remove-input docutils container">
<div class="cell_output docutils container">
-<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/tmp/ipykernel_1926/2413312554.py:2: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
+<div class="output stderr highlight-myst-ansi notranslate"><div class="highlight"><pre><span></span>/tmp/ipykernel_1930/2413312554.py:2: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
cotton = pd.read_csv(&quot;data/Legacy_Cotton_VT_data.csv&quot;,index_col=False, on_bad_lines = &#39;skip&#39;)
</pre></div>
</div>
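The pandas `ParserWarning` in this cell (and in the synthetic-data cell of `convert2rdf.html`) reports that the header and the data rows have different numbers of fields, which with `index_col=False` can mean values are lost on load. A minimal sketch for locating the affected rows before loading, using only the standard `csv` module and assuming a comma-delimited file with a single header row, is below; `mismatched_rows` is a hypothetical helper, not code from the notebooks.

```python
import csv
import io
from typing import Iterable


def mismatched_rows(lines: Iterable[str]) -> list[tuple[int, int]]:
    """Return (line_number, field_count) for each data row whose number of
    fields differs from the header's. Blank rows are ignored."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    expected = len(header)
    bad = []
    for row in reader:
        if row and len(row) != expected:
            # reader.line_num counts source lines read so far (1-based)
            bad.append((reader.line_num, len(row)))
    return bad


# Demo on an in-memory sample: line 3 has one field too many.
sample = io.StringIO("plot,yield\n1,850\n2,910,extra\n3,\n")
print(mismatched_rows(sample))  # [(3, 3)]
```

Passed a file opened with `open("data/Legacy_Cotton_VT_data.csv", newline="")`, the returned line numbers would point to the records worth inspecting before deciding how pandas should handle them.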
8 changes: 4 additions & 4 deletions introduction/index.html
@@ -403,10 +403,10 @@ <h1>Overview<a class="headerlink" href="#overview" title="Permalink to this head
<p>Researchers in agriculture seek to understand how factors such as weather, soil composition, and planting technique affect the quantity and quality of agricultural production.</p>
<p>Research typically starts with a problem, such as the relationship of growing conditions to the quality of a harvested product.</p>
<p>To explore that problem, a researcher may need to pull data together from a variety of sources, such as sensors, smart phones, drones, or published statistics. That data may be available in a diversity of formats, such as databases, spreadsheets, instrument feeds, or even printed reports. The available data may use different units of measure, such as centimeters versus inches, or reflect different levels of granularity, such as state versus county.</p>
-<p>Researchers must decide which elements of the available sources will be required to support their planned queries or regression analyses. Through discussions and whiteboard diagrams (which we like to call “boxologies”), researchers conceptualize the entities of interest for a given research problem, relationships among those entities, and the required properties for each entity. Each entity, along with its properties, is defined as a “data shape”.</p>
-<p>In this project, data shapes are formalized as schemas using the Shape Expressions Language, or <a class="reference external" href="https://shex.io/shex-primer/">ShEx</a>. ShEx schemas are based, in turn, on Resource Description Framework, or <span class="xref myst">RDF</span> – a generalized and implementation-agnostic standard for expressing data in terms of sentence-like statements organized in flexible graph structures. RDF, in turn, is founded on the Uniform Resource Identifier, or URIs, most recognizable as the ubiquitous URL, or “Web address”.</p>
-<p>The elements of data shapes are ideally identified with URIs from global “authorities” – stable, persistent, institutionally backed repositories of well-defined URIs. It is the use of authoritative URIs that allow data shapes to serve as the basis for data interoperability not just in the here and now, but also in the future. For authoritative URIs related to concepts and properties in agriculture, the USDA research world looks to the NALT Concept Space of the National Agricultural Library. NALT, formerly known as the NAL Thesaurus, has extended its mission to support the creation of data shapes in support of research data interoperability.</p>
-<p>To illustrate with a simple example: In addressing a given research problem, a researcher may need two data shapes: a “crop sample shape” with NALT URIs for harvest weight and color, and a “weather shape” with NALT URIs for precipitation and cloud cover. These shapes, or “target shapes”, can serve as targets for converting, normalizing, and integrating selected data elements from a variety of sources and in a variety of formats. Precipitation might be pulled from a weather database by SQL query. Harvest weight might be extracted from a spreadsheet and converted into metric units using a Python script. (The implementation specifics of extraction and normalization are orthogonal to data shapes themselves.)</p>
+<p>Researchers must decide which elements of the available sources will be required to support their planned queries or regression analyses. Through discussions and whiteboard diagrams (which we like to call <a class="reference external" href="https://www.youtube.com/watch?v=w1HTzAbCMDs&amp;t=1491s">“boxologies”</a>), researchers conceptualize the entities of interest for a given research problem, relationships among those entities, and the required properties for each entity. Each entity, along with its properties, is defined as a “data shape”.</p>
+<p>In this project, data shapes are formalized as schemas using the Shape Expressions Language, or <a class="reference external" href="https://shex.io/shex-primer/">ShEx</a>. ShEx schemas are based, in turn, on Resource Description Framework, or <a class="reference external" href="https://www.w3.org/TR/skos-primer/">RDF</a> – a generalized and implementation-agnostic standard for expressing data in terms of sentence-like statements organized in flexible graph structures. RDF, in turn, is founded on the Uniform Resource Identifier, or URIs, most recognizable as the ubiquitous URL, or “Web address”.</p>
+<p>The elements of data shapes are ideally identified with URIs from global “authorities” – stable, persistent, institutionally backed repositories of well-defined URIs. It is the use of authoritative URIs that allow data shapes to serve as the basis for data interoperability not just in the here and now, but also in the future. For authoritative URIs related to concepts and properties in agriculture, the USDA research world looks to the <a class="reference external" href="https://agclass.nal.usda.gov/">NALT Concept Space</a> of the National Agricultural Library. NALT, formerly known as the NAL Thesaurus, has extended its mission to support the creation of data shapes in support of research data interoperability.</p>
+<p>A simple example: To address a given research problem, a researcher may need two data shapes: a “crop sample shape” with NALT URIs for harvest weight and color, and a “weather shape” with NALT URIs for precipitation and cloud cover. These shapes, or “target shapes”, can serve as targets for converting, normalizing, and integrating selected data elements from a variety of sources and in a variety of formats. Precipitation might be pulled from a weather database by SQL query. Harvest weight might be extracted from a spreadsheet and converted into metric units using a Python script. (The implementation specifics of extraction and normalization are orthogonal to data shapes themselves.)</p>
<p>Data shapes can be published on the Web for use by other researchers or for adaptation to related research domains. To the extent that data shapes are based on authoritative URIs from NALT, data shapes serve to focus the efforts of agricultural researchers in the interest of improving data interoperability over the long term.</p>
<div class="toctree-wrapper compound">
</div>