scraper - possible non-identical guest/host pages from various shows? #72
Comments
So there's no merging of any kind. I tried to implement it as simply as possible (and now, examining it, I understand it wasn't ideal and is actually a little messy). Basically each host/guest is uniquely identified by their The intent was to grab everything from the first occurrence, but I realized that I'm overwriting the data all the time (not the img tho) by mistake, so it ends up being the last.
I'm going to do something to maybe save all the different versions of the data and images, and then we will have to merge them manually.
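To illustrate the behaviour described above (keep the first occurrence, stop overwriting it, and set differing later versions aside for a manual merge), here is a minimal sketch. It is not the scraper's actual code: the `collect_people` function, the `username` key, and the shape of the `shows` input are all assumptions made for the example.

```python
def collect_people(shows):
    """Keep the first record seen per person; set aside later records that differ.

    `shows` is a hypothetical list of (show_name, [person_dict, ...]) pairs.
    `username` is an assumed unique key, not necessarily the one the scraper uses.
    """
    canonical = {}  # username -> first record seen
    variants = {}   # username -> [(show_name, differing record), ...]
    for show_name, people in shows:
        for person in people:
            key = person["username"]
            if key not in canonical:
                # First occurrence wins; never overwrite it afterwards.
                canonical[key] = person
                continue
            if person == canonical[key]:
                continue  # identical to what we already have
            seen = variants.setdefault(key, [])
            # Record each distinct variant only once.
            if all(person != earlier for _, earlier in seen):
                seen.append((show_name, person))
    return canonical, variants
```

Anything that ends up in `variants` is a candidate for a manual merge; people whose records are identical on every show never appear there.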
See PR draft #75. The scraper grabbed the host/guest from the first occurrence it saw and saved it as In terms of the avatar, I just left it to save the image from the first show where it found that person. Here are all the additional files that got saved:
@gerbrent I'll leave it up to you to merge them, because you gotta think about the content of the bio. I suggest using meld to see all the diffs and merge it together. Or at least just the
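If a graphical tool isn't handy, a quick text diff of two variant files gives a similar overview. This is only a sketch using Python's standard `difflib`; the `diff_variants` helper and its default labels are made up for the example.

```python
import difflib


def diff_variants(base_text, variant_text, base_name="base", variant_name="variant"):
    """Return a unified diff between two versions of a profile file's text."""
    return "".join(
        difflib.unified_diff(
            base_text.splitlines(keepends=True),
            variant_text.splitlines(keepends=True),
            fromfile=base_name,
            tofile=variant_name,
        )
    )
```

An empty result means the two versions are identical, so nothing needs merging.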
nice work, added to my list.
Hmm.... also looks like my bio is outdated ; )
Important: Any file that will be manually edited (basically merged/consolidated from multiple variant files) should be added to the
After #110 was completed, the files aren't JSON anymore. I added some commits with the new Hugo md files for the variants in #75. This comment still applies, but now they are all
Curious how we're dealing with merging potentially inconsistent guest or host data (description, links, photos, etc) when scraping across shows, since we are attempting to merge guest/host profiles into a single source-of-truth/entry here in hugo.
I ask since I'm in the process of creating an inconsistency on Fireside ; )