Potential Data Reuses
See, e.g., this paper
Time of implementation: ⭐
Flesch reading ease and Flesch–Kincaid grade level metrics.
💡 : Track how these two measures change each time a document is updated.
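As a rough illustration of the idea above, here is a minimal sketch that computes both scores with the standard published formulas and reports how they shift between two versions of a document. The syllable counter is a crude vowel-group heuristic and the function names (`count_syllables`, `readability`, `readability_delta`) are hypothetical, not part of the repo; a dedicated readability library would likely be more accurate.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, English only)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent (e.g. "make"), except for "-le" endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch reading ease and Flesch-Kincaid grade level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


def readability_delta(before: str, after: str) -> dict:
    """Change in each metric between two versions (after minus before)."""
    old, new = readability(before), readability(after)
    return {name: new[name] - old[name] for name in old}
```

A negative `flesch_reading_ease` delta (or a positive `flesch_kincaid_grade` delta) would suggest that an update made the document harder to read.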
See this article for a primer on the topic.
Time of implementation: ⭐ ⭐
Embedding methods such as doc2vec have Python implementations that are easy to integrate into our repo. Plotting or otherwise representing these embeddings is a little more involved but still relatively simple.
💡 : Projecting documents into a 2-dimensional embedding space could allow us to produce "maps" of documents, grouping service providers together by proximity and enabling interesting comparisons. For example, are Facebook's terms semantically closer to Instagram's terms than to Twitter's?
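To make the "which terms are closer" question concrete, here is a dependency-free sketch. Note that it swaps doc2vec for plain term-frequency vectors compared by cosine similarity, which captures shared vocabulary rather than learned semantics; it only illustrates the comparison step. The function names and any input texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Bag-of-words vector: lowercase alphabetic tokens mapped to their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_service(target_text: str, other_texts: dict) -> str:
    """Name of the service in other_texts whose terms are most similar to target_text."""
    target = term_frequencies(target_text)
    return max(
        other_texts,
        key=lambda name: cosine_similarity(target, term_frequencies(other_texts[name])),
    )
```

With real embeddings, the same `closest_service` comparison would apply to the learned vectors instead of raw counts, and a dimensionality-reduction step would produce the 2-dimensional coordinates for plotting.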
Time of implementation: ❓
💡 : Do certain service providers tend to perform the same kind of updates at the same time? Can we correlate some changes with well-identified exogenous events (e.g. the Covid crisis, new legislation, etc.) and measure the average time it takes each service provider to update its terms in response to these events?
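One simple way to approach the response-time question is sketched below. It assumes we already have, for each service, the list of dates on which its terms changed, and it treats the first update on or after an event as the "response". That is a strong assumption, since an update following an event is not necessarily caused by it. The function name and data layout are hypothetical.

```python
from datetime import date
from statistics import mean


def response_delays(event_date: date, update_dates_by_service: dict) -> dict:
    """Days from the event to each service's first update on or after it.

    Services with no update on or after the event are left out of the result.
    """
    delays = {}
    for service, update_dates in update_dates_by_service.items():
        later = [d for d in update_dates if d >= event_date]
        if later:
            delays[service] = (min(later) - event_date).days
    return delays


def average_delay(delays: dict) -> float:
    """Mean response delay in days across the services that did respond."""
    if not delays:
        raise ValueError("no service updated its terms after the event")
    return mean(delays.values())
```

Filtering updates by content (for instance, only those mentioning the event) before computing delays would make the attribution less noisy.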
💡 : Some changes are applied “officially” and modify the “last updated” date. Others are applied without changing that date, so end users cannot detect them unless they use Open Terms Archive. How often are terms updated without the users' knowledge? In what proportion? Do “official” changes always correlate with “significant” changes?
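A minimal sketch of how the proportion of "silent" updates might be measured follows. It assumes each recorded version exposes its content (excluding the date line itself) and the "last updated" date declared in the document; the `Version` class and its field names are hypothetical and do not reflect the actual data model of the archive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    content: str  # document text, assumed to exclude the "last updated" line
    declared_last_updated: Optional[str]  # date shown in the document, if any


def silent_update_ratio(versions: List[Version]) -> float:
    """Share of content changes that left the declared "last updated" date untouched."""
    changes = 0
    silent = 0
    for previous, current in zip(versions, versions[1:]):
        if current.content != previous.content:
            changes += 1
            if current.declared_last_updated == previous.declared_last_updated:
                silent += 1
    return silent / changes if changes else 0.0
```

Combining this with a measure of change size (for example, the number of modified lines) would help test whether "official" changes are also the "significant" ones.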