ds4j
jsoma committed Dec 2, 2019
1 parent e61dac7 commit 03337ee
Showing 9 changed files with 33 additions and 18 deletions.
Original file line number Diff line number Diff line change
@@ -178,7 +178,7 @@ <h1>
<section class="normal" id="section-">
<div id="conclusion" class="section level1">
<h1><span class="header-section-number">6</span> Conclusion</h1>
<p>While the regression was straightforward in terms of code and there wasn’t too much cleaning to do, there are still alternative ways of performing regressions and plenty of data-related questions we might ask. To follow up with these, be sure to <a href="/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ap-regression-unemployment/">check the other notebooks and the discussion topics on the project page</a>.</p>
<p>While the regression was straightforward in terms of code and there wasn’t too much cleaning to do, there are still alternative ways of performing regressions and plenty of data-related questions we might ask. To follow up with these, be sure to <a href="/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ap-regression-unemployment/">check the other notebooks and the discussion topics on the project page</a>.</p>

</div>
</section>
@@ -3792,10 +3792,10 @@ <h1 id="Download-and-extract-all-of-the-datasets-from-LegiScan">Download and ext



<div id="80c268c5-44f4-4176-bd8f-8ab009da2001"></div>
<div id="8ce94ec5-ea81-440e-bd8b-8556cec3528c"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#80c268c5-44f4-4176-bd8f-8ab009da2001');
var element = $('#8ce94ec5-ea81-440e-bd8b-8556cec3528c');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "d7ce247c10be477e8c20200fcb77e280", "version_major": 2, "version_minor": 0}
@@ -3971,10 +3971,10 @@ <h1 id="Converting-the-many-JSON-files-to-single-CSV-file">Converting the many J



<div id="0fe0ce2d-738e-400f-bfb5-035cd65bf060"></div>
<div id="66abdf36-a685-4d8e-a307-be041e94a59c"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#0fe0ce2d-738e-400f-bfb5-035cd65bf060');
var element = $('#66abdf36-a685-4d8e-a307-be041e94a59c');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "523ac203bd664e6f95afe6fd739d209a", "version_major": 2, "version_minor": 0}
@@ -561,10 +561,10 @@ <h1 id="Run-many">Run many<a class="anchor-link" href="#Run-many">#</a></h1><p>N



<div id="008bf129-18c4-4794-8ab9-9020cf7b6fbe"></div>
<div id="c5da48b9-bc90-4062-8a3c-ea124027a528"></div>
<div class="output_subarea output_javascript ">
<script type="text/javascript">
var element = $('#008bf129-18c4-4794-8ab9-9020cf7b6fbe');
var element = $('#c5da48b9-bc90-4062-8a3c-ea124027a528');
setInterval(() => [...document.querySelectorAll('.output_stderr')].forEach(e => e.remove()), 5000)

</script>
@@ -619,10 +619,10 @@ <h1 id="Run-many">Run many<a class="anchor-link" href="#Run-many">#</a></h1><p>N



<div id="ef1114f8-48be-47a0-8c12-c956ea54aa00"></div>
<div id="8459761c-27b7-410c-adc1-866f34861b17"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#ef1114f8-48be-47a0-8c12-c956ea54aa00');
var element = $('#8459761c-27b7-410c-adc1-866f34861b17');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "7cffc947070c4350b7c8a2840a912a97", "version_major": 2, "version_minor": 0}
@@ -273,10 +273,10 @@ <h1 id="Find-matches">Find matches<a class="anchor-link" href="#Find-matches">#<



<div id="48519f98-c001-4457-8874-cf5972a7d077"></div>
<div id="61b4dba7-8761-4c67-97e5-ad133bd1b127"></div>
<div class="output_subarea output_widget_view ">
<script type="text/javascript">
var element = $('#48519f98-c001-4457-8874-cf5972a7d077');
var element = $('#61b4dba7-8761-4c67-97e5-ad133bd1b127');
</script>
<script type="application/vnd.jupyter.widget-view+json">
{"model_id": "ce130262842146558da28b3655a3f6aa", "version_major": 2, "version_minor": 0}
11 changes: 9 additions & 2 deletions ds4j/foia-predictor/index.html
@@ -64,11 +64,18 @@ <h1>Predicting FOIA requests success rates</h1>
<div class="columns">
<div class="column col-5 col-md-12 readings">
<h3>Readings and links</h3>
None
<ul>
<li><a href="https://datadotworld.shinyapps.io/foia_shiny_app/">FOIA Predictor</a></li>
<li><a href="https://www.poynter.org/tech-tools/2017/will-your-foia-request-succeed-this-new-machine-will-tell-you/">Will your FOIA request succeed? This new machine will tell you</a>, from Poynter</li>
<li><a href="https://journalistsresource.org/tip-sheets/predict-foia-request-will-succeed/">Predict if your FOIA request will succeed</a>, from Journalist's Resource</li>
<li><a href="https://www.reddit.com/r/foia/comments/6evaw2/open_source_foia_predictor/">A short discussion on Reddit</a></li>
</ul>
</div>
<div class="column col-7 col-md-12">
<h3>Summary</h3>
None
<p>Filing FOIA requests can be an unforgiving process, with arcane rules and layers of bureaucracy (and paperwork) to fight through. The FOIA Predictor was a chance to improve the process for weary journalists.</p>
<p>Bannered with "Predict Your FOIA Request Success: This model is trained on 9,000+ FOIA requests tracked by MuckRock" and "a test classification accuracy rate of 80%," the FOIA Predictor appealed strongly to a data journalist's desire for data-driven results, and was reported on accordingly.</p>
<p>But what's going on under the hood? Let's dig into this open-source project to see how it actually works.</p>
</div>
</div>

2 changes: 1 addition & 1 deletion ds4j/nyt-takata-airbags/walkthrough/conclusion.html
@@ -175,7 +175,7 @@ <h1>
<section class="normal" id="section-">
<div id="conclusion" class="section level2">
<h2><span class="header-section-number">5.5</span> Conclusion</h2>
<p>If you found this short walkthrough interesting, you’ll want to check out the <a href="/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/nyt-takata-airbags/">full notebooks</a> on the page. They dive deep into using different classifiers, and different ways of counting words.</p>
<p>If you found this short walkthrough interesting, you’ll want to check out the <a href="/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/ds4j/nyt-takata-airbags/">full notebooks</a> on the page. They dive deep into using different classifiers, and different ways of counting words.</p>
<p>In the end, though, this airbags research is just a toy example to prove that throwing data science at a problem doesn’t automatically solve it. If we labeled more cases our classifier would probably be better, yes, but in all honesty <strong>we should just search for “airbag” and manually read the cases.</strong> It’ll take a little more time, but we’ll be able to have much more faith in our end result.</p>

</div>
3 changes: 2 additions & 1 deletion ds4j/propublica-opportunity-gap/index.html
@@ -69,6 +69,7 @@ <h1>Tracking equal access to school programs</h1>
<div class="column col-5 col-md-12 readings">
<h3>Readings and links</h3>
<ul>
<li><a href="https://www.propublica.org/article/new-data-analysis-at-some-schools-achievement-lags-behind-opportunity">At Some Schools, Achievement Lags Behind Opportunity</a></li>
<li><a href="https://projects.propublica.org/schools/">The Opportunity Gap: Is Your State Providing Equal Access to Education?</a></li>
<li><a href="https://www.propublica.org/article/opportunity-gap-methodology">The Opportunity Gap methodology</a></li>
<li><a href="https://www2.ed.gov/about/offices/list/ocr/index.html">Office for Civil Rights</a>, a division of the US Department of Education</li>
@@ -77,7 +78,7 @@ <h3>Readings and links</h3>
</div>
<div class="column col-7 col-md-12">
<h3>Summary</h3>

<p>An analysis from ProPublica on the relationship between poverty and performance in Advanced Placement classes. The original data is no longer available, but in comparison to the other, smaller-scale educational analyses (Tampa Bay Times and Dallas Morning News, for example) this shows the wider range possible for national-level data.</p>
</div>
</div>

9 changes: 7 additions & 2 deletions ds4j/reuters-asylum/index.html
@@ -66,11 +66,16 @@ <h1>Analyzing the impact of particular judges on the US asylum process</h1>
<div class="columns">
<div class="column col-5 col-md-12 readings">
<h3>Readings and links</h3>
None
<ul>
<li><a href="https://www.reuters.com/investigates/special-report/usa-immigration-asylum/">They fled danger at home to make a high-stakes bet on U.S. immigration courts</a>, from Reuters</li>
<li><a href="https://www.justice.gov/eoir/frequently-requested-agency-records">EOIR data</a></li>
<li><a href="https://trac.syr.edu/immigration/reports/580/">Incomplete and Garbled Immigration Court Data Suggest Lack of Commitment to Accuracy</a>, from the Transactional Records Access Clearinghouse (TRAC)</li>
<li><a href="https://datajournalismawards.org/projects/they-fled-danger-at-home-to-make-a-high-stakes-bet-on-u-s-immigration-courts/">Short methodology description</a></li>
</ul>
</div>
<div class="column col-7 col-md-12">
<h3>Summary</h3>
<p>In U.S. immigration courts, are certain judges and locations more likely to approve or deny claims of asylum?</p>
<p>In U.S. immigration courts, are certain judges and locations more likely to approve or deny claims of asylum? While logistic regression is an excellent choice in this situation, the dataset requires a hundred and one editorial decisions be made along the way. And in the end: even though it's an official government data dump, is the dataset even reliable enough for analysis?</p>
</div>
</div>

4 changes: 3 additions & 1 deletion ds4j/reveal-mortgages/index.html
@@ -77,11 +77,13 @@ <h3>Readings and links</h3>
<li><a href="https://s3-us-west-2.amazonaws.com/revealnews.org/uploads/lending_disparities_whitepaper_180214.pdf">Whitepaper</a> for the data analysis</li>
<li><a href="https://www.revealnews.org/article/how-we-identified-lending-disparities-in-federal-mortgage-data/">Less formal writeup</a></li>
<li><a href="https://www.ffiec.gov/hmda/hmdaproducts.htm">Home Mortgage Disclosure Act data products</a></li>
<li><a href="https://github.com/cfpb/HMDA_Data_Science_Kit">HMDA Data Science Kit</a></li>
</ul>
</div>
<div class="column col-7 col-md-12">
<h3>Summary</h3>

<p>Analyzing a massive trove of public records, Reveal examined lending disparities within racial and ethnic groups. Home Mortgage Disclosure Act data from individual borrowers is too unwieldy to sit as a CSV or even open in pandas, so this project jumps directly into managing a SQL database populated through scripts provided by the Consumer Financial Protection Bureau.</p>
<p>With one of the <a href="https://s3-us-west-2.amazonaws.com/revealnews.org/uploads/lending_disparities_whitepaper_180214.pdf">most easily reproducible whitepapers I've ever seen</a>, it's simple to follow in Reveal's footsteps and use logistic regression to pull back the mask on the mortgage industry.</p>
</div>
</div>
