Indexing Dataverses in Google Scholar #2717

Closed
eugene-barsky opened this issue Nov 2, 2015 · 23 comments

@eugene-barsky

Our data is only good if people can find/discover it. And in academia, many people are using Google Scholar to search for research. Also, Google Scholar is a place many of our faculty go for tenure and promotion metrics.

I was reading the Google Scholar (GS) inclusion guidelines (https://scholar.google.ca/intl/en/scholar/inclusion.html), and I could see that institutional repositories (DSpace, ePrints, etc.) are often indexed in GS automatically.

However, data repositories, even those issuing DOIs, seem not to be indexed. Of course, the vague and unclear scope of GS indexing does not help either. Well, I wrote about it back in 2005 - https://ejournals.library.ualberta.ca/index.php/jchla/article/viewFile/22437/16666

Therefore, I was wondering whether you have had any conversations with the Google Scholar team about including your Dataverses and/or others in the GS database?

Also, for instance, the National Snow and Ice Data Center implemented the schema.org dataset extension last year to enable crawlers to index their datasets. It is a small, machine-friendly chunk of code that basically tells crawlers that data live here. The nice thing about this is that, rather than requiring action on the search engine side, the schema.org implementation works for all crawlers, i.e., as independent data crawlers come up to speed, they will be able to see your data in addition to Google.
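To make the "small, machine-friendly chunk of code" concrete, here is a minimal sketch in Python of how a landing page could emit schema.org Dataset markup as JSON-LD. This is an illustration only, not NSIDC's or Dataverse's actual markup: the function names, the choice of properties, and every value (title, URL, DOI) are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, url, identifier):
    """Return a schema.org Dataset description serialized as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "identifier": identifier,
    }
    return json.dumps(doc, indent=2)


def script_tag(jsonld):
    """Wrap a JSON-LD string in the script element that crawlers look for."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# All values below are placeholders invented for this sketch.
example = dataset_jsonld(
    name="Example survey data",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://dataverse.example.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/EXAMPLE",
    identifier="https://doi.org/10.5072/FK2/EXAMPLE",
)
print(script_tag(example))
```

The resulting script element would sit in the HTML of each dataset landing page, so any crawler that understands schema.org can read it without special arrangements on the search engine side.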

I would be delighted to assist your team in this discoverability work with Google Scholar, as I have collaborated with them before.

With thanks,

Eugene

@mercecrosas
Member

👍 I think that the idea of implementing the schema.org dataset extension is a great one.

@posixeleni
Contributor

Would be happy to help in any way I can with this.

@pdurbin
Member

pdurbin commented Nov 3, 2015

@eugene-barsky thanks for opening this issue! I like the focus on Google Scholar as a use case but I'd like to point out some related ideas:

> I could see that Institutional repositories are often indexed in GS automatically (DSpace, ePrints, etc)

Does anyone know how this works? Are DSpace and ePrints using sitemaps?

@posixeleni
Contributor

Good questions, @pdurbin. I wonder if this relates to #1393 as well, which is about reference management applications grabbing data citations from our site.

@eugene-barsky
Author

Here is a presentation by Google Scholar's Anurag Acharya on how they index in GS: https://media.dlib.indiana.edu/media_objects/avalon:16122. It is fresh from summer 2015. I'm not sure if Google Scholar actually indexes schema.org tags. However, Anurag goes into great detail in that presentation on what they want to see in GS...

@mercecrosas mercecrosas modified the milestone: In Review Nov 30, 2015
@scolapasta scolapasta modified the milestone: Not Assigned to a Release Jan 28, 2016
@pdurbin
Member

pdurbin commented Apr 12, 2016

I just came across an interesting comment from @bnosek at https://groups.google.com/d/msg/openscienceframework/-5sOS4bH-M0/lG4NwxnUAAAJ who says, "At present, the Google Scholar team has expressed interest in only indexing manuscripts/articles, not other research products (data, materials, etc.). That may change over time as these other research products are recognized as unique intellectual contributions."

@eugene-barsky
Author

Yeah, this is the sense that I have been getting from the GS team, specifically from Anurag Acharya. Too bad... I think that we need to keep talking to them as much as we can...

E


@borsna

borsna commented Apr 12, 2016

It would help if Dataverse used the Dataset markup from schema.org.
ICPSR and some national archives have already done it, but it would be a big push to get it on dataverse.harvard.edu and all other portals running Dataverse.

If you need, I could try to make a first draft of it in your templates.

@mercecrosas
Member

That would be great, Olof. I've been pushing the idea that data repositories should use schema.org, and help schema.org define better metadata for data.

Please share your first draft when you have it!

Merce


@adam3smith
Contributor

(Please re-direct if this isn't the right place):
Google has now released an experimental metadata schema for data markup and discoverability: https://developers.google.com/search/docs/data-types/datasets
While I'm not necessarily in love with Google just defining a new standard, it's Google, so it'd be great to support that in addition to schema.org (on which I agree with Merce & others).

@djbrooke
Contributor

Thanks @adam3smith. We just had another request (via ticketing system) for the experimental metadata schema that you linked. I'll ask the requestor to drop in this issue to add any additional information that's valuable. I'll bring this up in my next meeting with @mcrosas in order to get an idea of where it could fit into our next few releases.

> While I'm not necessarily in love with google just defining a new standard, it's google

Well said :)

@adam3smith
Contributor

Thanks, @djbrooke. To the extent it's relevant (i.e., as a signal for the degree of uptake we'll see), I just heard from Figshare that they're implementing this schema before the end of the year.

@mercecrosas
Member

Dataverse should support schema.org so datasets are easily searchable by Google, as recommended by the Data Citation Implementation expert group. In a few days, I'll share a paper that we are finishing up with the details of what to support from this schema.

Merce

Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer, IQSS
Harvard University
http://scholar.harvard.edu/mercecrosas


@pdurbin
Member

pdurbin commented Nov 5, 2016

Should this issue and #2243 be combined? They seem highly related to me.

@pdurbin
Member

pdurbin commented Jan 26, 2017

This was posted two days ago: https://research.googleblog.com/2017/01/facilitating-discovery-of-public.html . Thanks for pointing it out, @eugene-barsky

@pdurbin
Member

pdurbin commented Jun 23, 2017

@eugene-barsky does the recent work on #1393 help?

@eugene-barsky
Author

eugene-barsky commented Jun 23, 2017 via email

@jggautier
Contributor

The scope of the issue changed during the conversation from asking if Dataverse had been in touch with the Google Scholar team about having our repositories indexed in Google Scholar, to improving dataset discoverability in search engines in general by using schema.org metadata to describe datasets.

I'm in favor of closing this issue since Google Scholar still has no plans to index data repos. More discussion and resources about using schema.org metadata are in #2243.

Having metadata tags in dataset landing page html (#1393), especially the dataset PID, will help with the first-step approach in #3793, where we would add schema.org metadata to datasets using a DataCite script that needs "the DOI from the page via a "DC.identifier" meta tag."
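As a rough illustration of why the meta tag matters, the sketch below shows how a script could read the DOI from a landing page through a "DC.identifier" meta tag. It is a Python stand-in using only the standard library, not the DataCite script itself; the class name, the helper function, and the sample HTML (including the DOI) are hypothetical.

```python
from html.parser import HTMLParser


class MetaIdentifierParser(HTMLParser):
    """Collect the content of the first <meta name="DC.identifier"> tag."""

    def __init__(self):
        super().__init__()
        self.identifier = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except meta tags, and stop after the first match.
        if tag != "meta" or self.identifier is not None:
            return
        attributes = dict(attrs)
        # Compare the name case-insensitively; the content value is kept as-is.
        if (attributes.get("name") or "").lower() == "dc.identifier":
            self.identifier = attributes.get("content")


def find_dc_identifier(html):
    """Return the DC.identifier value found in the page, or None."""
    parser = MetaIdentifierParser()
    parser.feed(html)
    return parser.identifier


# Hypothetical landing page fragment with a placeholder DOI.
page = (
    "<html><head>"
    '<meta name="DC.identifier" content="doi:10.5072/FK2/EXAMPLE">'
    "</head><body></body></html>"
)
print(find_dc_identifier(page))
```

If the tag is missing, the lookup returns None, which is the case a schema.org generator relying on the page's PID would have to handle.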

@pdurbin
Member

pdurbin commented Jun 26, 2017

@jggautier I'm in favor of closing the issue since both you and @eugene-barsky seem to agree that Google Scholar still has no plans to index data repositories.

@eugene-barsky what do you think? If you want, you could open a new issue if this one is getting a bit too sprawling.

@eugene-barsky
Author

eugene-barsky commented Jun 26, 2017 via email

@pdurbin
Member

pdurbin commented Jun 26, 2017

Thanks. Closing.

@pdurbin pdurbin closed this as completed Jun 26, 2017
@pdurbin
Member

pdurbin commented Nov 6, 2017

@eugene-barsky it struck me that Google really emphasized sitemaps in the video at https://www.rd-alliance.org/making-data-discoverable-web-search-engines . Thanks for putting that session on my radar! Can you please open a new issue about sitemaps? I don't have the slides but here's a screenshot from the video from about 24 minutes in:

[Image: screen shot 2017-11-05 at 10 18 12 pm]
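For anyone unfamiliar with sitemaps, here is a minimal sketch in Python of generating one for dataset landing pages, following the sitemaps.org urlset format. It is not how Dataverse does or will do this; the function name, the URLs, and the dates are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (landing page URL, last-modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical dataset landing pages; real entries would come from the repository.
sitemap_xml = build_sitemap(
    [
        ("https://dataverse.example.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/AAA", "2017-11-05"),
        ("https://dataverse.example.edu/dataset.xhtml?persistentId=doi:10.5072/FK2/BBB", "2017-11-06"),
    ]
)
print(sitemap_xml)
```

The file would typically be served from the site root and referenced from robots.txt so crawlers can find every published dataset page.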

@pdurbin
Member

pdurbin commented Nov 7, 2017

@eugene-barsky thanks for opening #4261!
