Applications and next steps
At the end of the day, our process did a pretty decent job coming up with results that matched CRP's -- in some cases even finding things that the CRP data got wrong. But the question still remains: So what? What good does it do to work with campaign data at the donor level as opposed to the contribution level? Is it really worth all the trouble?
Of course it is! Here are a few examples of applications that might derive from this process at the local, state and federal levels.
Recall that part of the motivation behind this project was to generalize the standardization of donor names across campaign finance datasets. The main fields in most campaign finance datasets -- local, state or federal -- look pretty much the same: donor name, recipient name, some location information, and often some info about occupation and employer. CRP does a great job cleaning up this data on the national level, and the National Institute for Money in State Politics does something similar for the states, but neither of them is going to be able to standardize your local city council's campaign finance records on demand. So there's one application right there.
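To make the idea concrete, here's a toy sketch of what donor-name standardization might look like at its simplest: greedy grouping by fuzzy string similarity. The threshold, the greedy strategy, and the sample names are all my own assumptions for illustration, not this project's actual pipeline.

```python
# A minimal sketch (not the project's actual method) of grouping donor
# records by fuzzy name similarity, using only the standard library.
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized name strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def group_donors(names, threshold=0.85):
    """Greedily assign each name to the first group whose representative
    is at least `threshold` similar; otherwise start a new group."""
    groups = []  # list of (representative, [members])
    for name in names:
        for rep, members in groups:
            if similarity(name, rep) >= threshold:
                members.append(name)
                break
        else:
            groups.append((name, [name]))
    return groups

records = ["Smith, John A.", "SMITH, JOHN A", "Smith, Jon A.", "Doe, Jane"]
for rep, members in group_donors(records):
    print(rep, "->", members)
```

A real pipeline would add blocking (so you don't compare every pair), plus features like ZIP code, occupation and employer, but the core task -- deciding which records refer to the same donor -- looks like this.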
But that still leaves a bigger question: What's the point of standardizing this stuff at all? Some of the most instructive and inspirational ideas I've heard along these lines came with a data mining contest we ran late last year. I was with the Center for Investigative Reporting at the time, and we teamed up with IRE to co-host the contest with Kaggle. The point was for data scientists and other non-journalist experts to look at a set of federal campaign finance data and see what kinds of cool analyses might be performed that reporters might not think of. You can see most of the entries here, but here are a few that stood out:
- The winning entry, by Australia's own Nathaniel Ramm, proposed a tool called a behavior stability index for tracking whether and when a particular donor's giving patterns change over time.
- Another entry proposed linking donors with Wikipedia pages to enrich donor information with useful metadata. The entry also proposed looking at networks and communities of donors, which could reveal interesting patterns.
- Another proposed using statistical techniques to detect donor coordination. Although the method was proposed to find illegal coordination between candidates and Super PACs, it could also be adapted to reveal donors who tend to work together, which could lead to new and interesting stories.
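As a rough illustration of the behavior-stability idea (my own back-of-the-envelope sketch, not Ramm's actual method), you could score how steady a donor's giving is by taking the coefficient of variation of their yearly totals -- lower means more stable:

```python
# Hypothetical sketch: a crude "stability" score for each donor, computed
# as the coefficient of variation of yearly giving totals (lower = steadier).
from collections import defaultdict
from statistics import mean, pstdev

def yearly_totals(contributions):
    """contributions: iterable of (donor, year, amount) tuples."""
    totals = defaultdict(lambda: defaultdict(float))
    for donor, year, amount in contributions:
        totals[donor][year] += amount
    return totals

def stability_score(years_to_amounts):
    """Coefficient of variation of a donor's yearly totals."""
    amounts = list(years_to_amounts.values())
    m = mean(amounts)
    return pstdev(amounts) / m if m else float("inf")

data = [
    ("ACME PAC", 2008, 5000), ("ACME PAC", 2010, 5200), ("ACME PAC", 2012, 4800),
    ("J. DOE",   2008, 250),  ("J. DOE",   2010, 10000), ("J. DOE",   2012, 300),
]
for donor, totals in yearly_totals(data).items():
    print(donor, round(stability_score(totals), 2))
```

A sudden spike or dropoff in a donor's pattern -- like the hypothetical J. DOE above -- is exactly the kind of thing that might be worth a reporter's phone call. Note that none of this works unless contributions have first been rolled up to the donor level.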
Common among all of them is the need to look at data from a donor-centric perspective. But even beyond that, a few ideas suggest themselves:
- Rainmaker
- Multi-year donor checks
- Donors that work in concert
Done right, donor-level data enables new analyses, visualizations and stories that were never before possible.
Still on the to-do list:
- Check bias vs. variance and tune the model accordingly
- Better features, particularly a ZIP code distance measure
- Review the training data; some of the CRP matches are wrong
- Name parser improvements: handle nicknames; Jrs. and Srs. are still tripping it up in places
- Zero-pad ZIP codes
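Two of those items are simple enough to sketch here (these are assumptions about the fixes, not code from the repo): restoring the leading zeros that spreadsheet tools strip from Northeastern ZIP codes, and a crude ZIP proximity feature based on shared prefixes. A real distance feature would need a ZIP-to-latitude/longitude lookup table.

```python
# Sketch of two to-do items: zero-padding ZIPs and a crude proximity
# feature from shared leading digits (not true geographic distance).
def pad_zip(raw):
    """'2138' -> '02138'; also trims ZIP+4 codes down to five digits."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return digits[:5].zfill(5) if digits else ""

def zip_proximity(zip_a, zip_b):
    """Score in [0, 1] from shared leading digits; ZIPs that share more
    of their prefix tend to be geographically closer."""
    a, b = pad_zip(zip_a), pad_zip(zip_b)
    shared = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        shared += 1
    return shared / 5

print(pad_zip(2138))                    # '02138'
print(zip_proximity("02138", "02139"))  # 0.8
```

The prefix trick is rough -- ZIP boundaries don't nest perfectly -- but it's cheap and beats treating ZIPs as exact-match-only strings.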
That's all for now! Further work I do to generalize this method will be made available here on GitHub. In the meantime, if you have any thoughts or questions, I'm at chase.davis@gmail.com.
Thanks for reading!