The San Mateo source data (which is an actual RSS feed!) currently has some double-encoded HTML entities that we should see if we can clean up. That is, the source data might have code like:
Which is a common symptom of an HTML entity like:
Getting re-encoded a second time and therefore getting ruined. See an example of this happening in practice on sfbrigade/stop-covid19-sfbayarea#309 (comment)
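To illustrate the mechanism, here is a minimal Python sketch of how encoding a string twice produces this kind of symptom. The sample string is hypothetical and is not taken from the San Mateo feed; `html.escape` is Python's standard-library encoder.

```python
import html

# Hypothetical sample text, not taken from the actual feed.
original = "Food & Shelter"

# Encoding once gives the correct HTML representation.
once = html.escape(original)

# Encoding the already-encoded text escapes the "&" of the entity itself,
# so a browser would display a literal "&amp;" instead of "&".
twice = html.escape(once)

print(once)   # Food &amp; Shelter
print(twice)  # Food &amp;amp; Shelter
```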
One common way to fix this is to just repeatedly decode the HTML into plain text until there’s nothing left to decode, and then re-encode it once, e.g.:
The downside here is that this can ruin code that was intentionally double-encoded, like source code examples. That’s probably not likely in the kind of data we’re dealing with, though.
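The decode-until-stable approach could be sketched roughly like this in Python. This is only a sketch: the function name `fix_double_encoding` and the `max_rounds` safety cap are assumptions for illustration, and it treats the input as text content rather than markup, so any real tags in the input would end up escaped too.

```python
import html


def fix_double_encoding(text: str, max_rounds: int = 10) -> str:
    """Decode HTML entities until the text stops changing, then encode once.

    `max_rounds` is a hypothetical safety cap on the number of decode passes.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            # Nothing left to decode.
            break
        text = decoded
    # Re-encode exactly once; quote=False leaves quotation marks alone,
    # which is fine for text content (not attribute values).
    return html.escape(text, quote=False)


print(fix_double_encoding("Food &amp;amp; Shelter"))  # Food &amp; Shelter
```

Text that is already singly encoded passes through unchanged, so it should be safe to run this over every field. Text that was intentionally double-encoded gets collapsed, which is the downside noted above.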