Detect and remove bad image URLS for person avatars #218

Open
davidmooreppf opened this issue Dec 18, 2013 · 9 comments

@davidmooreppf
Member

Original issue name: default avatars when no photo (or something like that)

On many pages ::

http://preview.askthem.io/locator?q=cincinnati%2C+oh#

... images are showing as broken when we don't have the avatar from our various data sources. Can we have them uniformly use the blank-person-outline default image when a photo isn't definitely available? I realize that's easier typed than done, haha.

@ghost ghost assigned walter Dec 18, 2013
@davidmooreppf
Member Author

(on the above link, click through to the state electeds to see a broken image)

@walter
Contributor

walter commented Dec 18, 2013

We actually have an image URL for that person, but it's bad:

http://www.house.state.oh.us/houseImages/129/headshots/h32.jpg

It redirects and then gives a 404.
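A URL that redirects and then 404s, like the Ohio House one above, only looks bad once the redirects are followed. A minimal Ruby sketch of that check, assuming plain `Net::HTTP` HEAD requests; `final_status` and `HTTP_HEAD` are hypothetical names, and the fetcher is injectable so the redirect logic can be tested without the network:

```ruby
require "net/http"
require "uri"

# Default fetcher: one HEAD request, returning [status_code, location_header].
# Network errors propagate to the caller. Some servers reject HEAD, so a real
# sweep might need to fall back to GET (assumption).
HTTP_HEAD = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: 5, read_timeout: 5) do |http|
    response = http.head(uri.request_uri)
    [response.code.to_i, response["location"]]
  end
end

# Follows up to `limit` redirects and returns the final status code, so a
# redirect that ends in a 404 is reported as 404 rather than 301/302.
# Returns nil when the redirect limit is exceeded.
def final_status(url, fetch: HTTP_HEAD, limit: 5)
  uri = URI.parse(url)
  limit.times do
    code, location = fetch.call(uri)
    return code unless (300..399).cover?(code) && location
    uri = URI.join(uri.to_s, location) # handles relative Location headers
  end
  nil
end
```

Treating a redirect loop as `nil` (unknown) rather than "bad" is a design choice; a sweep could reasonably count it as bad too.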

When we don't have any image URL, it does give the default blank avatar.

I'll have a think about a strategy for testing and pruning out image URLs so this is avoided.

@davidmooreppf
Member Author

This is still important, b/c electeds will want to see their faces where possible, and it seems a fair number of photos are missing from the API. E.g. an ally, Brad Lander ::

https://www.dropbox.com/s/y1lpqcnk5pxg9zi/Screenshot%202014-01-22%2014.00.38.png

@walter
Contributor

walter commented Jan 25, 2014

Just because it is important doesn't mean it is possible.

Looking at the example of Brad Lander…

At this point most councilmember data (i.e. outside of Philly, San José, and Chicago) is acquired from Google Civic Info API. If the API has an image for the person, we grab it.

Google has recently updated the data for NYC and now has a photo url for Brad Lander. Because Brad Lander hasn't been loaded into the production site's data yet, when someone does go to ask him a question, we should get his photo url. I.e. it wasn't available when we loaded his data on preview, but is now.

However, when we look at another councilmember, Antonio Reynoso, Google Civic Info API does not have an image url for him and we are shit out of luck.

If Google Civic Info API later adds a photo_url for Antonio Reynoso, the next time someone has him as a potential recipient for a new question with an address lookup, we'll grab his photo_url and save it.


Having said all that, the problem of not having images available for some elected officials is not actually what this issue, #218, is about! It's about when we DO have a photo url for a person, but it is a BAD URL.

That means what we want to do is clear out photo urls anywhere in our elected officials data where there IS a photo url for a person, but it no longer returns an actual image from the web.

What this is going to require is running something through almost all our data that requests almost EVERY person's photo url and checks to see if it is still any good. If not, it should remove the photo url value from the person.
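The sweep described above could look roughly like this. It is a sketch, not the project's code: `prune_bad_photo_urls` is a hypothetical name, and it only assumes each person responds to `photo_url`, `photo_url=` and `save!`, as in the console command later in this thread. The `good` callable (also an assumption) decides whether a URL still serves an image.

```ruby
# Hypothetical sweeper: for each person with a photo_url, ask `good` whether
# the URL still returns a usable image; if not, clear it and save, so a later
# Google Civic Info API lookup can fill in a fresh one.
# Returns the list of URLs that were removed (useful for reporting).
def prune_bad_photo_urls(people, good:)
  removed = []
  people.each do |person|
    url = person.photo_url
    next if url.nil? || url.strip.empty? # nothing to check
    next if good.call(url)               # still a working image
    person.photo_url = nil
    person.save!
    removed << url
  end
  removed
end
```

In a Rails console this might be run as `prune_bad_photo_urls(Person.all, good: ->(url) { ... })`, with whatever URL check the project settles on plugged in as `good`.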

Having no photo_url value will then allow our subsequent calls to Google Civic Info API to populate the person's photo_url with one that is hopefully more current and works.
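The fill-only-when-blank behavior described here boils down to a simple merge rule. A sketch with a hypothetical `merge_photo_url` helper, assuming the import step only writes a photo URL from Google Civic Info API when none is stored:

```ruby
# Hypothetical merge rule: keep a stored photo_url if one is present,
# otherwise take whatever the API returned (which may itself be nil).
def merge_photo_url(stored, from_api)
  return stored unless stored.nil? || stored.strip.empty?
  from_api
end
```

Under this rule a stale but present URL is never replaced on its own, which is why clearing bad URLs first is what lets fresher API values through.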

@walter
Contributor

walter commented Jan 25, 2014

As an interim measure, I have gone ahead and cleared out the 148 photo urls from state legislators that pointed at bad Ballotpedia urls with this command in the rails console for both preview (aka staging) and production:

Person.where(photo_url: /ballotpedia/i).each { |person| person.photo_url = nil; person.save! }

If Google Civic Info API has photo urls for these state legislators and anyone enters an address that brings them up through Google Civic Info API, we should then get the good photo url for them.

walter added a commit that referenced this issue Jan 25, 2014
walter added a commit that referenced this issue Jan 25, 2014
@konklone
Contributor

konklone commented Feb 8, 2014

Could you cache photos when you first find that they're valid, and have Cloudfront sit in front of the cached copies? Then your problem at least becomes stale photos rather than missing photos - which you could mitigate by re-checking your photos on a regular basis and only updating photos when you have a heuristic that tells you it's actually a 200 and an image.
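The "actually a 200 and an image" heuristic can be expressed as a small predicate. A sketch, assuming the Content-Type header is trustworthy enough; `image_response?` and `valid_image_url?` are hypothetical names. Checking the type matters because many sites serve their "not found" page with a 200 and `text/html`.

```ruby
require "net/http"
require "uri"

# Heuristic: a photo counts as valid only if the response is a 200 AND the
# Content-Type declares an image (e.g. "image/jpeg", "image/png").
def image_response?(code, content_type)
  code.to_i == 200 && content_type.to_s.downcase.start_with?("image/")
end

# Network wrapper: a single GET with no redirect following, so a URL that
# redirects is treated as invalid here (assumption; pair with a redirect-aware
# check if redirects to real images should pass). Any error counts as invalid.
def valid_image_url?(url)
  response = Net::HTTP.get_response(URI.parse(url))
  image_response?(response.code, response["content-type"])
rescue StandardError
  false
end
```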

@walter
Contributor

walter commented Feb 8, 2014

We use Cloudfront now in front of images in most cases for elected officials, but I have to admit I haven't seen if it is used as best it can be.

I've only tweaked what was there before I came on the scene as needed rather than figuring out if we are doing the best thing possible.

When I get a breather I'll take a look at the assets as a whole and see if they can be handled in a less fragile way. Images going missing is common for opengov data, so we have to plan for it.

@konklone
Contributor

konklone commented Feb 8, 2014

All understood. I haven't used Cloudfront myself either. I just looked at the URLs, and they look like this:

http://d2xfsikitl0nz3.cloudfront.net/http%3A%2F%2Fwww.in.gov%2Flegislative%2Fsenate_republicans%2FhiResImages%2Fthumbs%2FBecker_hi_res_tn.jpg/60/60

So CloudFront is sitting right in front of the original URL, which can change underneath it. My suggestion is to cache them at an askthem.io URL, and then continue to use CloudFront but have it sit in front of that URL.
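The difference between the two schemes is easiest to see in the URLs themselves. A sketch: `original_from_cloudfront` decodes the current URL format shown above (percent-encoded original URL, then `/60/60`), and `cached_photo_path` shows one hypothetical way to key a cached copy on askthem.io by a digest of the original URL. `CACHE_HOST` and the path layout are made up for illustration.

```ruby
require "digest"
require "uri"

# Current scheme: the first path segment of the CloudFront URL is the
# percent-encoded original image URL, followed by /width/height.
def original_from_cloudfront(cloudfront_url)
  encoded = URI.parse(cloudfront_url).path.split("/").reject(&:empty?).first
  URI.decode_www_form_component(encoded)
end

# Hypothetical proposed scheme: store a copy under a stable askthem.io path
# derived from a SHA-256 digest of the original URL, keeping the file
# extension, and point CloudFront at that path instead of the source site.
CACHE_HOST = "https://askthem.io/photo-cache" # hypothetical location

def cached_photo_path(original_url)
  ext = File.extname(URI.parse(original_url).path)
  "#{CACHE_HOST}/#{Digest::SHA256.hexdigest(original_url)}#{ext}"
end
```

Because the digest is stable, re-checking a source URL later maps to the same cached path, so a refresh can overwrite the copy only after the new fetch passes an image check.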

Just so the problem is obvious, here's a cropped screenshot of the state of Indiana's state people:

[Screenshot: askthem-indiana-state]

@walter
Contributor

walter commented Feb 8, 2014

Either way we are going to have to work through our data and clear out the bad URLs as we don't have caches for them.

Once that is done, the placeholder image will be used instead of the broken icon.
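The fallback itself is small: a blank or missing photo_url means the placeholder is shown. A sketch with a hypothetical `avatar_url` helper and a made-up placeholder path (`DEFAULT_AVATAR`); the real view code and asset name may differ:

```ruby
# Hypothetical placeholder asset path.
DEFAULT_AVATAR = "/assets/blank-person.png"

# Returns the stored photo URL, or the placeholder when it is nil or blank.
def avatar_url(photo_url)
  url = photo_url.to_s.strip
  url.empty? ? DEFAULT_AVATAR : url
end
```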

I would actually like to build in some reporting of bad images into the periodically run sweeper or whatever so we can alert those in charge of the source data.
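Reporting could start as simply as grouping the URLs a sweep removed by source host, so each data provider gets its own list. A sketch with a hypothetical `bad_image_report` helper; it could take the list returned by a pruning pass like the one described earlier in the thread:

```ruby
require "uri"

# Groups bad photo URLs by host (one bucket per data source), with URLs
# sorted inside each bucket. Unparseable or host-less URLs go under "unknown".
def bad_image_report(bad_urls)
  bad_urls.group_by { |url| (URI.parse(url).host rescue nil) || "unknown" }
          .transform_values(&:sort)
end
```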
