Applications and next steps
At the end of the day, our process did a pretty decent job coming up with results that matched CRP's -- even in some cases finding things that the CRP data got wrong. But the question still remains: So what? What good does it do to work with campaign data at the donor level as opposed to the contribution level? Is it really worth all the trouble?
Of course it is! Here are a few examples of applications that might derive from this process at the local, state and federal levels.
Recall that part of the motivation behind this project was to generalize the standardization of donor names across campaign finance datasets. The main fields in most campaign finance datasets -- local, state or federal -- look pretty much the same: donor name, recipient name, some location information, and often some info about occupation and employer. CRP does a great job cleaning up this data on the national level, and the National Institute for Money in State Politics does something similar for the states, but neither of them is going to be able to standardize your local city council's campaign finance records on demand. So there's one application right there.
But that still leaves a bigger question: What's the point of standardizing this stuff at all? Some of the most instructive and inspirational ideas I've heard along these lines came with a data mining contest we ran late last year. I was with the Center for Investigative Reporting at the time, and we teamed up with IRE to co-host the contest with Kaggle. The point was for data scientists and other non-journalist experts to look at a set of federal campaign finance data and see what kinds of cool analyses might be performed that reporters might not think of. You can see most of the entries here, but here are a few that stood out:
- The winning entry, by Australia's own Nathaniel Ramm, proposed a tool called a behavior stability index for tracking whether and when a particular donor's giving patterns change over time -- a decidedly donor-centric application.
- Another entry proposed linking donors with things like Wikipedia pages to add useful metadata.
- Another proposed using statistical techniques to detect phenomena like Astroturfing.

Almost all of them require being able to look at a donor's history within, and even beyond, a single election cycle to be effective.
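To make the donor-level idea concrete, here is a minimal Python sketch of what becomes possible once contributions carry a standardized donor ID. Everything in it is an assumption for illustration: the records are made up, the `donor_id` values and function names are hypothetical, and the "switched parties" check is only a crude stand-in, not Ramm's behavior stability index.

```python
from collections import defaultdict

# Made-up records for illustration only:
# (donor_id, election_cycle, recipient_party, amount)
contributions = [
    ("donor-1", 2008, "D", 500),
    ("donor-1", 2008, "D", 250),
    ("donor-1", 2010, "R", 1000),
    ("donor-2", 2008, "R", 200),
    ("donor-2", 2010, "R", 300),
]


def giving_history(rows):
    """Map each donor to {cycle: {party: total amount}}."""
    history = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for donor_id, cycle, party, amount in rows:
        history[donor_id][cycle][party] += amount
    return history


def donors_who_switched(history):
    """Return donors whose top recipient party changes between consecutive cycles."""
    switched = []
    for donor_id, cycles in history.items():
        # Top party per cycle, in chronological order.
        top = [max(cycles[c], key=cycles[c].get) for c in sorted(cycles)]
        if any(a != b for a, b in zip(top, top[1:])):
            switched.append(donor_id)
    return sorted(switched)


switched = donors_who_switched(giving_history(contributions))
```

None of this works at the contribution level alone: without a reliable donor ID, the two 2008 gifts from `donor-1` and the 2010 gift would look like three unrelated rows.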
As for next steps, here's what's still on the to-do list:
- Check bias vs. variance and recommend accordingly
- Better features, particularly ZIP code distance
- Review the training data. Some of the CRP data is wrong.
- Name parser improvements: handle nicknames, and fix suffixes like Jr. and Sr., which are parsed incorrectly in some places
- Zero-pad ZIP codes
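
A couple of the cleanup items above are easy to sketch in Python. This is only a rough illustration under stated assumptions, not the project's actual parser: the helper names, the suffix list and the tiny nickname table are all hypothetical, and ZIPs are assumed to arrive as integers or strings.

```python
import re

# Hypothetical lookup tables; a real parser would need far more entries.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT", "LIZ": "ELIZABETH"}


def pad_zip(raw):
    """Return a 5-digit ZIP string, restoring leading zeros lost
    when ZIPs were stored as numbers. Assumes int or str input."""
    digits = re.sub(r"\D", "", str(raw))
    if not digits:
        return ""
    if len(digits) > 5:
        # Treat longer values as ZIP+4 and keep the first five digits.
        digits = digits.zfill(9)[:5]
    return digits.zfill(5)


def split_suffix(name):
    """Split a trailing generational suffix (JR, SR, ...) off a name.
    Only handles a suffix in the final position."""
    tokens = name.replace(",", " ").replace(".", " ").upper().split()
    if tokens and tokens[-1] in SUFFIXES:
        return " ".join(tokens[:-1]), tokens[-1]
    return " ".join(tokens), ""


def canonical_first_name(first):
    """Map a known nickname to its formal name; pass others through."""
    first = first.strip().upper()
    return NICKNAMES.get(first, first)
```

For example, `pad_zip(2138)` gives `"02138"`, and `split_suffix("Smith, John Jr.")` gives `("SMITH JOHN", "JR")`, so the suffix can be compared as its own field instead of confusing the name match.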