A flexible approach to privacy and location #61
Comments
Nitpick: I think that option 1 would require a small change to the spec in that you’d need to remove the line:
I vote option 1. It’s much more flexible and allows for accurate routes for providers and agencies who are comfortable with that. I suspect that many agencies will want to know traffic patterns, and that is only discernible via option 1. Note too that a provider and agency could agree to implement option 2 with the spec of option 1 by just saying that the provider will move each point to the center of its census block (or neighborhood or whatever). The converse (implementing option 1 with the spec of option 2) is not possible. Also, I’d argue that option 2 is worrisome given that there is no international block/neighborhood standard.
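To illustrate the "move each point to the center of its area" idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `areas` mapping, the area IDs, and the use of a simple vertex average as the "center" are hypothetical, and real census blocks would normally be handled with a GIS library rather than hand-rolled geometry.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test. `ring` is a list of (lon, lat) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def vertex_mean(ring):
    """Average of the vertices; a rough stand-in for a true area centroid."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def snap_to_area_center(lon, lat, areas):
    """Return (area_id, (lon, lat) of the area's center), or None if no area matches.

    `areas` is a hypothetical dict mapping an area ID to its boundary ring.
    """
    for area_id, ring in areas.items():
        if point_in_polygon(lon, lat, ring):
            return area_id, vertex_mean(ring)
    return None
```

Returning `None` for points outside every known area avoids silently passing a precise coordinate through; how to handle those points would be up to the provider and agency.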
This is a great discussion. We've heard reports of certain vendors truncating their values to 2 decimal places already! At the least, LADOT should be clear that they are requiring 6-7 decimal places in latitude and longitude values, without snapping. I don't believe that truncation is an appropriate anonymization technique, but a snapping technique that is related to population density (like census tract) might be. However, such low-precision locations are not useful for street-level insights, as @aickin points out.
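For a sense of scale, the following Python sketch estimates how much ground distance one step at a given decimal place covers. It assumes a spherical Earth with a mean radius of 6,371 km, so the figures are approximate; the function name is made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: mean radius, spherical approximation


def decimal_place_resolution_m(decimals: int, latitude_deg: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) size in meters of one step
    at the given number of decimal places of a degree."""
    step_deg = 10 ** -decimals
    meters_per_deg_lat = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * meters_per_deg_lat
    # Longitude degrees shrink with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west
```

Under these assumptions, two decimal places correspond to roughly 1.1 km north-south, while six decimal places correspond to roughly 11 cm.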
Agreed with @ezheidtmann, great discussion on this topic! I think the proposal over at #51 would actually cover @jfh01's Option 1 above, correct?
I think it would. I don't think we need a way to reflect GPS accuracy separately from intentional "deprecisioning." As @ezheidtmann points out though, there's a larger question about whether this approach can achieve a good balance between utility and privacy. The answer may end up being dependent on the needs/attitudes of each city. There's a related question on time-specificity. One additional privacy measure would be to provide trip data with the start and end times rounded to the nearest 15-minute or 1-hour increment.
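The time-rounding idea could look something like the sketch below. It is only an illustration: the function name is hypothetical, it assumes timezone-aware `datetime` inputs, and it aligns increments to the Unix epoch in UTC (so 1-hour rounding would not line up with local clock hours in zones with 30- or 45-minute offsets).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def round_to_increment(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timezone-aware timestamp to the nearest `minutes` increment.

    Ties round up. The result is returned in UTC.
    """
    step = timedelta(minutes=minutes)
    elapsed = ts - EPOCH
    # Adding half a step before floor division gives round-half-up.
    steps = (elapsed + step / 2) // step
    return EPOCH + steps * step
```

For example, a trip start of 12:07:29 UTC would be reported as 12:00, and 12:07:30 UTC as 12:15.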
A good article on the specific risks with geo-precise, anonymized data: Even without persistent user IDs, there is some risk of reidentification or misuse.
A little late to this thread but option 1 seems like a good option. It's worth distinguishing between accuracy and precision. What we are really recording with Good StackExchange thread on the subject. |
I agree that we don't want to conflate imprecision due to the fundamentals of GPS sensors (called Imagine a tool that processes MDS entities -- that tool should be able to know whether it's working with a stream of GPS locations, a list of census tract centroids, or a list of locations snapped to a 1km grid. (just 2 examples) |
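As a rough picture of what "snapped to a 1km grid" might mean in practice, here is a minimal Python sketch. It is an assumption-heavy illustration: the function name is hypothetical, the grid is defined directly in degrees using a spherical-Earth approximation (about 111,195 m per degree of latitude), and a production implementation would more likely use a projected coordinate system.

```python
import math

METERS_PER_DEG_LAT = 111_195  # assumption: spherical Earth approximation


def snap_to_grid(lat: float, lon: float, cell_m: float = 1000.0) -> tuple[float, float]:
    """Snap a point to the center of an approximately cell_m x cell_m grid cell."""
    lat_step = cell_m / METERS_PER_DEG_LAT
    snapped_lat = (math.floor(lat / lat_step) + 0.5) * lat_step
    # Use the row's center latitude so every point in the same row
    # shares one longitude step, which keeps the grid consistent.
    lon_step = cell_m / (METERS_PER_DEG_LAT * math.cos(math.radians(snapped_lat)))
    snapped_lon = (math.floor(lon / lon_step) + 0.5) * lon_step
    return snapped_lat, snapped_lon
```

Two nearby points that fall in the same cell come back as the same coordinate pair, which is exactly why a consuming tool would need to be told that it is looking at grid centers rather than raw GPS fixes.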
Could one version of the specification just take block-level aggregations to a SharedStreets ID or OSM ID? I realize this is an old thread, but I think this provides the degree of operational data many cities would need for curbside management and related applications. Route choice applications might be lost, but that may be a problem that calls for tiers of deployment. I agree with @aickin that this could be made compatible by snapping all points to the nearest "centroid" of an SS ID or OSM ID. The benefit of these is that, in theory, they should be internationally compatible if based around OSM. My only concern here is that it might not be "anonymized" enough.
Per the conversation in the Provider Services Working Group, we are closing this issue and will open new issue(s) with more considered approaches to this topic.
The MDS technical workshop made clear that LADOT wants precise location data as part of their interactions with providers. LADOT feels that it can manage any real or perceived privacy risks that stem from possessing anonymized, location-precise trip data, and that the benefits of this data are significant.
From comments made at the workshop, it seems that there is not universal agreement on this point, especially given the variability in state-level FOIA laws and the different attitudes individual cities have about the broad topic.
I'd like to propose that MDS be extended to allow for flexibility in how location data is reported for trips and routes. LA can choose to require precise data, but other jurisdictions would have the flexibility to request less detailed information. It is ultimately a policy decision, and a universal spec needs to support some level of policy variance across its users.
Two approaches to (intentionally) reducing the accuracy of GPS data:
1. Blurring data to report less precise coordinates. These would correspond to a radius of uncertainty (e.g. accurate to within 500m).
2. Aggregating data into defined boundary areas (e.g. census block, ZCTA, neighborhood, etc.).
Option 1 seems like the simplest approach. It could be done without change to the MDS specification. Providers and agencies would simply agree on the level of specificity in the route's GeoJSON FeatureCollection. With this approach, the specification can simply be modified to acknowledge the possibility of deliberately imprecise data and leave it up to the stakeholders to agree on the details.
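A minimal Python sketch of what Option 1 could look like on the provider side is shown below. It assumes the route is a GeoJSON FeatureCollection of Point features with `[longitude, latitude]` coordinates; the function name and the choice of rounding to a fixed number of decimal places are illustrative assumptions, and the agreed level of specificity would be whatever the provider and agency settle on.

```python
import copy


def reduce_precision(feature_collection: dict, decimals: int = 3) -> dict:
    """Return a copy of a GeoJSON FeatureCollection with Point coordinates
    rounded to `decimals` decimal places. Other geometry types are left as-is."""
    result = copy.deepcopy(feature_collection)
    for feature in result["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Point":
            lon, lat = geometry["coordinates"][:2]
            geometry["coordinates"] = [round(lon, decimals), round(lat, decimals)]
    return result
```

Because the output is still an ordinary FeatureCollection, a consumer cannot tell from the data alone that it was deliberately coarsened, which is why the spec text would need to acknowledge the possibility.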
Option 2 may be easier for cities to consume (since they're already used to aggregating data into boundaries like census blocks), though it would necessitate more pre-processing by the providers. It would also require modification to the specification, since GeoJSON does not have an obvious way to describe location within specific externally defined boundary areas.
Thoughts on: