[Feature Request] Detect duplicate articles #205

Closed
Tracked by #250
Yikai-Wang opened this issue Feb 4, 2023 · 4 comments
Labels: feature-request (New feature request), mid-priority (mid priority)

Comments

@Yikai-Wang

Thanks for the great work!
Automatically detecting duplicate articles (mostly those with the same titles and authors) could be an important feature for users.
It would be even better if Paperlib could then automatically keep one of the duplicates and remove the others (perhaps letting the user choose which one to keep).
Thanks again!

@GeoffreyChen777
Member

Paperlib does check for duplicate papers when you import a new one. If you still have two duplicate papers, the reason might be:

  1. Their titles/authors/publications are slightly different.
  2. You manually edited one of them before.

@Yikai-Wang
Author

When using Paperlib, I encountered a scenario where I found duplicated papers:
[Screenshot: Snipaste_2023-02-04_18-45-43]
One was obtained from the authors' website while the other was sourced from arXiv.
I cannot recall the exact details, but I suspect that a difference in capitalization between the original titles caused the duplicate check to fail. I then standardized the formatting manually. To avoid such incidents, it would be advisable to double-check manual edits made to the titles/authors/publications, if possible.
Again, thanks for your great work!

@GeoffreyChen777
Member

I guess capitalization caused this issue.
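
To illustrate how a capitalization difference can defeat an exact comparison, here is a minimal sketch of a case-insensitive duplicate key. It is not Paperlib's actual duplicate check: the `PaperLike` shape, the function names, and the sample titles are all made up for illustration, and the normalization assumes ASCII titles.

```typescript
// Hypothetical minimal shape; real Paperlib paper entities have more fields.
interface PaperLike {
  title: string;
  authors: string;
}

// Lowercase the text, turn punctuation into spaces, and collapse whitespace.
// Assumption: ASCII-only text; non-Latin characters would be dropped here.
function normalizeText(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Two papers are treated as duplicates when their keys are equal.
function duplicateKey(paper: PaperLike): string {
  return normalizeText(paper.title) + "|" + normalizeText(paper.authors);
}

// Made-up example: same paper, different capitalization in the title.
const fromWebsite: PaperLike = {
  title: "A Study of Graph Neural Networks",
  authors: "Jane Doe, John Smith",
};
const fromArxiv: PaperLike = {
  title: "A study of graph neural networks",
  authors: "Jane Doe, John Smith",
};

const exactMatch = fromWebsite.title === fromArxiv.title;
const keyMatch = duplicateKey(fromWebsite) === duplicateKey(fromArxiv);
console.log(exactMatch, keyMatch);
```

In this sketch the exact title comparison fails while the normalized keys agree, which matches the behaviour suspected above.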

I will try to solve this after March 8, as I've been busy recently with my paper submission.

@GeoffreyChen777
Member

Hi, 3.0.0-beta.1 is released now. It introduces an extension system.

Your feature request can be achieved by a command extension. For more information, please refer to:

https://paperlib.app/en/extension-doc/ext-types/command-ext.html

The overall pipeline of this extension should be:

  1. Register a command to trigger your duplicate-finding function.
  2. In this function, use the provided API to get all papers.
  3. Then find the duplicates.
  4. Add a new folder for them, such as 'duplicated'.
  5. Finally, update these papers using the corresponding database API.

After that, all duplicate papers will be in the 'duplicated' folder.
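
The pipeline above could be sketched roughly as follows. This is only an outline under stated assumptions: `PaperRecord`, `PaperStore`, `findDuplicateGroups`, and `markDuplicates` are hypothetical names, not the Paperlib extension API, and the store is synchronous for brevity. In a real command extension, the registered command's callback would call something like `markDuplicates`, with the store backed by the API described in the linked documentation.

```typescript
// Hypothetical shapes for illustration; these are NOT the Paperlib extension API.
interface PaperRecord {
  id: string;
  title: string;
  authors: string;
  folders: string[];
}

interface PaperStore {
  loadAll(): PaperRecord[];             // step 2: get all papers
  update(papers: PaperRecord[]): void;  // step 5: write changes back
}

// Step 3: group papers by a case-insensitive title/author key
// and keep only the groups that contain more than one paper.
function findDuplicateGroups(papers: PaperRecord[]): PaperRecord[][] {
  const groups: { [key: string]: PaperRecord[] } = {};
  for (const paper of papers) {
    const key = (paper.title + "|" + paper.authors)
      .toLowerCase()
      .replace(/\s+/g, " ")
      .trim();
    if (!groups[key]) groups[key] = [];
    groups[key].push(paper);
  }
  return Object.keys(groups)
    .map((key) => groups[key])
    .filter((group) => group.length > 1);
}

// Steps 2, 4 and 5: load everything, add the folder to each duplicate,
// and send only the changed papers back to the store.
function markDuplicates(store: PaperStore, folderName: string = "duplicated"): number {
  const changed: PaperRecord[] = [];
  for (const group of findDuplicateGroups(store.loadAll())) {
    for (const paper of group) {
      if (paper.folders.indexOf(folderName) === -1) {
        paper.folders.push(folderName);
        changed.push(paper);
      }
    }
  }
  if (changed.length > 0) store.update(changed);
  return changed.length;
}

// Usage with an in-memory stand-in for the database (made-up data).
const library: PaperRecord[] = [
  { id: "1", title: "A Study of Graph Neural Networks", authors: "Jane Doe", folders: [] },
  { id: "2", title: "A study of graph neural networks", authors: "Jane Doe", folders: [] },
  { id: "3", title: "An Unrelated Paper", authors: "John Smith", folders: [] },
];
let lastUpdate: PaperRecord[] = [];
const memoryStore: PaperStore = {
  loadAll: () => library,
  update: (papers) => { lastUpdate = papers; },
};

const taggedCount = markDuplicates(memoryStore);
console.log(taggedCount, lastUpdate.map((paper) => paper.id));
```

The sketch only tags duplicates and never deletes anything, which leaves the choice of which copy to keep to the user, as suggested in the original request.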

3.0.0-beta.1 Release Note:

  1. The entire code has been refactored to support the extensible architecture. For details on extension development, please refer to our official website. Let's make Paperlib better together!
  2. All metadata scrapers and downloaders have been moved into corresponding extensions.
  3. A new command panel interface has been introduced to replace the basic search bar.
  4. Support for creating new tags and folders in the sidebar.
  5. Fixed some bugs.
