PaperSumGPT is a tool for abbreviating long scientific paper contents using ChatGPT, designed to help researchers and students to quickly understand the key points of academic papers.
Korean readers, please read the document here!
- NOTE 1: For ChatGPT free users!
- NOTE 2: PDF converting functionality deprecated
- NOTE 3: ANSI escape sequences updated!
- How to Install
- Usage
- Dependencies
- License
- Extra: The easy way (using ChatGPT splitter)
::2023-04-03 updated::
After testing ChatGPT with several accounts, I found that its performance differs significantly depending on whether the account is a free user or a paid user (ChatGPT Plus).
If you are a free user of ChatGPT and you have a long paper to summarize, I recommend upgrading your account to ChatGPT Plus for better performance.
Unfortunately, the free version of ChatGPT cannot understand and retain the long context of the input text, which leads to poor performance: it will produce a summary that is NOT related to the input text at all, or one that covers only a certain part of the input text.
::2023-06-21 updated::
The PDF converting functionality is now deprecated. Instead, I recommend using the following online PDF-to-text converter:
::2023-04-12 updated::
ANSI escape sequences are now updated to support rich text formatting of the messages in the terminal. Important notices and warnings are now highlighted in a bold red font.
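As a rough illustration of what this kind of highlighting involves, the sketch below wraps a message in the standard ANSI SGR codes for bold (`1`) and red foreground (`31`), then resets with `0`. The helper name `bold_red` is hypothetical and is not taken from the papersumgpt source.

```python
# Minimal sketch of bold red terminal output with ANSI escape sequences.
# `bold_red` is a hypothetical helper, not part of papersumgpt.
BOLD_RED = "\033[1;31m"  # SGR 1 = bold, SGR 31 = red foreground
RESET = "\033[0m"        # SGR 0 = reset all attributes


def bold_red(message: str) -> str:
    """Wrap `message` so ANSI-capable terminals show it in bold red."""
    return f"{BOLD_RED}{message}{RESET}"


print(bold_red("WARNING: this line is highlighted"))
```

Terminals that do not understand ANSI sequences will print the raw escape characters instead.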
If you are using Mac, you can skip the "(0) For Windows users" step.
Since there are no pre-built binaries for Windows, follow the instructions below to install PaperSumGPT on Windows.
In the search tab, type `Turn Windows features On` (Windows 기능 켜기/끄기 in Korean). Then, check the box for `Windows Subsystem for Linux`. Next, reboot your computer.
Now, you need to install Ubuntu on your local computer.
Open Ubuntu and create your UNIX account and password.
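If you are unsure whether a terminal is running inside the Windows Subsystem for Linux (and therefore needs the X11 setup below), a common heuristic is to look for the word "microsoft" in `/proc/version`. The sketch below is only that heuristic. The function name `looks_like_wsl` is hypothetical, and the check is an assumption about how WSL kernels identify themselves, not something papersumgpt does.

```python
# Heuristic sketch: guess whether we are running under WSL.
# Assumption: WSL kernels mention "microsoft" in /proc/version.


def looks_like_wsl(proc_version: str) -> bool:
    """Return True if the kernel version string mentions Microsoft."""
    return "microsoft" in proc_version.lower()


try:
    with open("/proc/version", encoding="utf-8") as handle:
        print("Looks like WSL:", looks_like_wsl(handle.read()))
except OSError:
    # macOS and native Windows have no /proc/version.
    print("No /proc/version found, so this is not a Linux/WSL shell.")
```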
For ease of use, you should install Anaconda by following the instructions below.
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh
Press Enter to read through all the instructions, and type `yes` to agree with the license.
source ~/.bashrc
Now, type `conda activate` in your terminal. If you see `(base)` in your terminal, you have successfully installed Anaconda.
Install VcXsrv on your local computer. Download the VcXsrv installer and run it. Then, click `Finish`.
Next, open `XLaunch` and click `Next`.
After you open `XLaunch`, you should check the following options:
- Multiple windows
- Start no client
- Disable access control
Done! Now let's move on to the terminal.
Type the commands below in your terminal.
sudo systemd-machine-id-setup
sudo dbus-uuidgen --ensure
cat /etc/machine-id
If the terminal shows a long string of numbers and letters, you have successfully set up the machine ID with `systemd-machine-id-setup` and `dbus-uuidgen`.
Finally, you can install `x11-apps` by typing the following command:
sudo apt-get install x11-apps xfonts-base xfonts-100dpi xfonts-75dpi xfonts-cyrillic
Add the environment variable `DISPLAY` to your `.bashrc` file by typing the following commands:
echo "export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2; exit;}'):0.0" >> ~/.bashrc
echo "sudo /etc/init.d/dbus start &> /dev/null" >> ~/.bashrc
source ~/.bashrc
Test your X11 GUI by typing the following command:
xeyes
If you see a pair of eyes, you have successfully installed X11 GUI.
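The `DISPLAY` value written to `.bashrc` in the step above is the first `nameserver` address from `/etc/resolv.conf` followed by `:0.0`. The sketch below shows the same extraction in Python, which may help if `xeyes` fails and you want to check which address is being picked up. The function name `display_from_resolv_conf` and the sample addresses are made up for illustration.

```python
# Sketch: derive an X11 DISPLAY value from resolv.conf-style text.
# `display_from_resolv_conf` is a hypothetical helper for illustration.
from typing import Optional


def display_from_resolv_conf(text: str) -> Optional[str]:
    """Return '<first nameserver>:0.0', or None if no nameserver is listed."""
    for line in text.splitlines():
        fields = line.split()
        # Slightly stricter than `grep nameserver`: comment lines are ignored.
        if len(fields) >= 2 and fields[0] == "nameserver":
            return f"{fields[1]}:0.0"
    return None


# Sample content with made-up addresses.
sample = "# generated file\nnameserver 172.22.0.1\nnameserver 8.8.8.8\n"
print(display_from_resolv_conf(sample))
```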
These steps are essential (in Windows) for successfully executing `playwright` in the Windows terminal (which is critical when you configure your `ChatGPT` account).
You can install PaperSumGPT by cloning this repository and installing it from source:
git clone https://github.com/wjgoarxiv/papersumgpt.git
cd papersumgpt/
You must also use `install_old-repo.sh` to install the legacy version of `chatgpt_wrapper`. The new version of `chatgpt_wrapper` is not compatible with the current version of `papersumgpt` (since the new version of `chatgpt_wrapper` uses the ChatGPT API, not the stream-based one).
chmod +x *
./install_old-repo.sh
Then, you can install PaperSumGPT by running the following command:
pip install .
Before using `papersumgpt`, you must run `chatgpt_wrapper` to start the ChatGPT server.
If this is your first time running `chatgpt_wrapper` on your computer, you may need to run the following command to install `playwright`:
playwright install
The Nightly browser will be downloaded and installed on your local machine.
Next, you can use the following command to start the server:
chatgpt install
Log in to your ChatGPT account in the Nightly browser. Once you see the chat window, close the browser and type `/exit` to close `chatgpt_wrapper`. After that, you can restart `chatgpt_wrapper` by running the following command:
chatgpt
This is the original functionality of `chatgpt_wrapper`. For more information, please visit the chatgpt_wrapper GitHub repository.
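Before moving on, it can be useful to confirm that the `chatgpt` and `playwright` commands used above are actually reachable from your shell. The sketch below checks this with `shutil.which` from the Python standard library. The helper name `missing_commands` is hypothetical, and this check is not part of papersumgpt.

```python
# Sketch: report which required commands are not on PATH.
# `missing_commands` is a hypothetical helper for illustration.
import shutil
from typing import Callable, Iterable, List, Optional


def missing_commands(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the commands from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


# `chatgpt` and `playwright` are the commands used in the steps above.
missing = missing_commands(["chatgpt", "playwright"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required commands were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests. In normal use you can ignore it.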
After running `chatgpt_wrapper`, you can use `papersumgpt` to summarize the content of a paper with the following command:
papersumgpt
The following error might occur:
------------------------------------------------
ERROR: There is no file in the current directory. Please check the current directory.
------------------------------------------------
Note that you must put the paper you want to summarize in the current working directory. For a demonstration, we will use `chatgpt-a+meta+analysis+after+2.5+months.txt` as an example. Refer to the `ExampleRun/` folder. The `chatgpt-a+meta+analysis+after+2.5+months.txt` file was prepared by simply copying the text contents of `chatgpt-a+meta+analysis+after+2.5+months.pdf` and pasting them into a text file.
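To see in advance whether the error above would be triggered, you can check which `.txt` or `.md` files exist in the working directory. The sketch below does that with the standard library. The function name `filter_by_extension` is hypothetical, and matching on the file name suffix is an assumption about how papersumgpt finds its candidates.

```python
# Sketch: list candidate input files in the current directory.
# Assumption: candidates are simply files whose names end in the extension.
import os
from typing import Iterable, List


def filter_by_extension(names: Iterable[str], extension: str) -> List[str]:
    """Return the names ending with `extension`, sorted alphabetically."""
    return sorted(name for name in names if name.endswith(extension))


candidates = filter_by_extension(os.listdir("."), ".txt")
if not candidates:
    print("No .txt file here. Copy the paper text into this directory first.")
else:
    for number, name in enumerate(candidates, start=1):
        print(number, name)
```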
Copy that file to the current working directory and run `papersumgpt` again:
papersumgpt
And then, papersumgpt will ask you to choose the file type that you want to use:
INFO: Please type the number the file type that you want to use:
1. Markdown (`.md`) file
2. Plain text (`.txt`) file
:
Since we have `chatgpt-a+meta+analysis+after+2.5+months.txt` in the `ExampleRun/` folder, we will choose option 2. `papersumgpt` will then show the list of text files in the current directory and ask you to choose the file you want to summarize.
------------------------------------------------
+---------------+------------------------------------------------+
|   File number | File name                                      |
|---------------+------------------------------------------------|
|             1 | ./chatgpt-a+meta+analysis+after+2.5+months.txt |
+---------------+------------------------------------------------+
------------------------------------------------
INFO: Please select the file number or press "0" to exit:
Then, we will choose option 1.
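The selection step maps the number you type to an entry in the table, with `0` reserved for exiting. A minimal sketch of that mapping is shown below. The function name `resolve_selection` and the way invalid input is handled are assumptions for illustration, not the actual papersumgpt code.

```python
# Sketch: turn a typed file number into a file name ("0" means exit).
# `resolve_selection` is hypothetical; error handling here is an assumption.
from typing import List, Optional


def resolve_selection(answer: str, files: List[str]) -> Optional[str]:
    """Return the chosen file name, or None when the user typed "0".

    Raises ValueError for non-numeric or out-of-range input.
    """
    choice = int(answer.strip())  # int() raises ValueError for non-numbers
    if choice == 0:
        return None
    if not 1 <= choice <= len(files):
        raise ValueError(f"file number must be between 1 and {len(files)}")
    return files[choice - 1]


files = ["./chatgpt-a+meta+analysis+after+2.5+months.txt"]
print(resolve_selection("1", files))
```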
INFO: The file name that would be utilized is ./chatgpt-a+meta+analysis+after+2.5+months.txt
------------------------------------------------
INFO: Do you want to turn on `verbose` mode? If you turn on `verbose` mode, the program will print the intermediate results. (y/n):
If you want to see the intermediate results, type `y`. Otherwise, type `n`. In this case, we will type `y` to see the intermediate results.
INFO: Tossing initial prompt...
INFO: ChatGPT started abbreviating the input contents...
INFO: Progressing... (3/11)
...
The tool will summarize the content of the paper and write an output file to the same directory as the input file. Let's wait for a while! ☕️
While we are waiting, I should mention that all these steps are synchronized with your current ChatGPT session on the ChatGPT website. You can visit the website later to see the full progress of the content summary.
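The `Progressing... (3/11)` counter above reflects that a long paper is sent to ChatGPT in several parts rather than all at once. How papersumgpt actually splits the text is not shown here. The sketch below is one simple way to do it, assuming a fixed character budget per part and breaking at whitespace. The name `split_into_chunks` and the budget value are hypothetical.

```python
# Sketch: split long text into parts that each fit a character budget.
# Assumption: a fixed character limit per part; the real tool may differ.
from typing import List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split `text` into pieces of at most `max_chars` characters,
    breaking at whitespace where possible."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the budget is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


parts = split_into_chunks("aa bb cc dd", max_chars=5)
for index, part in enumerate(parts, start=1):
    print(f"part {index}/{len(parts)}: {part}")
```

Note that this version normalizes whitespace, so line breaks in the original paper text are not preserved.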
After the abbreviation process is finished, the program will show the following message:
INFO Choose output format (stream / txt / md):
You can choose the output format by typing `stream`, `txt`, or `md`. In this case, we will choose `md` to output the result as a markdown file.
INFO: Output saved to ./chatgpt-a+meta+analysis+after+2.5+months.txt.md
You can find `chatgpt-a+meta+analysis+after+2.5+months.txt.md` in the `ExampleRun/` folder.
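As the message above shows, the output name is the input name with the chosen format appended as an extra extension. The sketch below reproduces that naming rule. The function name `output_path` is hypothetical, and applying the same rule to `txt` is an assumption, since only the `md` case is shown in this walkthrough.

```python
# Sketch: build the output file name from the input name and chosen format.
# Assumption: "txt" follows the same naming rule as the "md" case shown above.


def output_path(input_path: str, output_format: str) -> str:
    """Append the output format to the input path as an extra extension."""
    if output_format not in ("txt", "md"):
        raise ValueError("expected 'txt' or 'md'")
    return f"{input_path}.{output_format}"


print(output_path("./chatgpt-a+meta+analysis+after+2.5+months.txt", "md"))
```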
Open the markdown file with markdown-compatible editors. You can see the awesome result! 🎉 (Click here to see the output markdown file)
::2023-04-16 updated::
The abbreviation result has now improved! Because the prompt has been enhanced, the output is of higher quality. The output contents are now presented in table format, which is cleaner and easier to read. I've also added the updated version of the output markdown file to the `ExampleRun/` folder. You can check it [here](ExampleRun/[NEW] chatgpt-a+meta+analysis+after+2.5+months_output.md)! 🍾
| Sections | Abbreviated contents |
| :----: | :----: |
| Title | Perception of ChatGPT: An Analysis of Social Media and Scientific Publications |
| Introduction | ChatGPT is a chatbot released by OpenAI that has gained over 100 million subscribers in two months. This paper presents a comprehensive analysis of how ChatGPT is perceived based on over 300k tweets and 150+ scientific papers. |
| Methodology | The authors used NLP technology to analyze sentiment and emotion in tweets and machine translation systems to analyze tweets in languages other than English. For scientific papers, four co-authors annotated 48 Arxiv and 104 SemanticScholar papers on three dimensions: topic, impact, and quality. |
| Experimental procedure | The authors annotated scientific papers using guidelines developed after low agreement in initial annotation rounds. Only paper abstracts were used for classification, and guidelines were developed for prioritizing labels in ambiguous cases. |
| Data analysis | ChatGPT is generally perceived positively, with high quality and associated emotions of joy dominating. Its perception has slightly decreased since its debut, and non-English tweets tend to have more negative sentiment. ChatGPT is viewed as a great opportunity across various scientific fields, including the medical domain, but it is also seen as a threat in the education domain and from an ethical perspective. |
| Results & discussion | The authors found that ChatGPT is viewed as an opportunity in most scientific fields, but also as a threat from an ethical perspective and in education. ChatGPT is mostly perceived positively on social media, with some decrease in positivity since its debut. |
| Conclusions | This analysis contributes to shaping the public debate and informing the future development of ChatGPT. Future work should investigate trends over longer periods, consider popularity of tweets and papers, and investigate additional dimensions beyond sentiment and emotion. |
| Significance of this study | This study provides insights into the perception of a highly popular chatbot, which can inform future development and public debate surrounding AI language models. |
| Things to look out for in follow-up research | Future research should investigate the real impact of language models like ChatGPT on society, including their potential to exacerbate or mitigate existing inequalities and biases. |
| Useful references to consider | Haque et al. (2022), Borji (2023), Bowman (2022), Beese et al. (2022) |
Note that ChatGPT sometimes produces undesired output. In that case, try a few more times to get the best result. Good luck with your research! 🚀
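If ChatGPT returns the section contents in some other shape and you want the two-column layout anyway, the table is easy to rebuild yourself. The sketch below follows the `| Sections | Abbreviated contents |` layout from the tabulated prompt template in this README. The function name `to_markdown_table` and the sample data are hypothetical.

```python
# Sketch: render section summaries as the two-column markdown table.
# `to_markdown_table` is a hypothetical helper, not part of papersumgpt.
from typing import Dict


def to_markdown_table(sections: Dict[str, str]) -> str:
    """Build a markdown table with one row per section."""
    lines = ["| Sections | Abbreviated contents |", "| :----: | :----: |"]
    for name, content in sections.items():
        # Escape pipes and flatten newlines so each entry stays in one cell.
        cell = content.replace("|", "\\|").replace("\n", " ")
        lines.append(f"| __{name}__ | {cell} |")
    return "\n".join(lines)


example = {"Title": "An example title", "Introduction": "Line one\nline two"}
print(to_markdown_table(example))
```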
- pyfiglet - For generating ASCII art of the project name.
- tabulate - For creating clean and readable tables for the output.
- chatgpt_wrapper - A useful open-source unofficial Power CLI, Python API and Flask API that lets us interact programmatically with ChatGPT/GPT4.
This project is licensed under the MIT License.
::2023-09-15 updated::
You can achieve the same results even without installing `papersumgpt`!
Thanks to the ChatGPT splitter website, you can easily summarize the contents of a paper (but it requires you to click through the split contents manually :) ). Here is how you can do it:
- Convert the paper texts & contents by using a PDF-to-text converter. You can visit any of the following websites:
- Save the converted text file to your local computer with the file type `.txt` (`.md` is also possible).
- Next, visit the ChatGPT splitter website and click the `Upload file(s)` button (or paste the text contents into the `Or paste your text` section).
- In the `Prompt` section, paste the following prompt:
Please, act as 'High-quality content abbreviator'. Since you have the input limits (OpenAI limited your input limit), you have to firstly take the all the inputs iteratively. To do this, I've already truncated the long inputs into each subpart. You'll now have to take the inputs iteratively. The important thing is that you should NOT answer directly or respond to the previous message. Make sure that you have to accomplish the task when all the inputs are given. I'll let you know if all the inputs are given.
- Click the `Process` button!
- The truncated texts will be split into several parts. You can click the `Copy` button to copy the split contents, and iteratively paste them into ChatGPT (this takes time and effort).
- Once you have pasted the final chunk, you can copy either of the following final prompts that I've prepared:
(1) Tabulated version
Now, all the inputs are given to you. You should combine and abbreviate all the inputs by fitting them into the following markdown format. The markdown format is as follows:
------ TEMPLATE STARTS ------
# **[TITLE]** (Bring the title from the foremost heading in the document. The powerful hint is that the title comes before the people who wrote the document.)
## **Introduction**
## **Methodology**
### **Apparatus**
### **Experimental procedure**
### **Computational procedure (if exists)**
### **Data analysis**
## **Results & discussion**
## **Conclusions**
## **Significance of this study**
## **Things to look out for in follow-up research**
### **Useful references to consider**
...
------ TEMPLATE ENDS ------
You have to write the outputs in a way that the readers can understand the contents easily. Don't forget to miss any important information from inputs. Detailed things that should be noticed would be included in the output (if possible, please bold them with `__BOLD__` or `**BOLD**` markdown marking for clear visibility). Consecutively, if possible, please find some useful references (including title and authors) from the Text or Markdown input file, and re-write them into `### Useful references to consider` subheader. Sort all these things into TABLE format; which will be efficient to understand what is what.
Something like this:
```markdown
| Sections | Abbreviated contents |
| :----: | :----: |
| __Title__ | [TITLE] |
| __Introduction__ | [INTRODUCTION] |
| __Methodology__ | [METHODOLOGY] |
| __Experimental procedure__ | [EXPERIMENTAL PROCEDURE] |
| __Computational procedure__ | [COMPUTATIONAL PROCEDURE] |
| __Data analysis__ | [DATA ANALYSIS] |
| __Results & discussion__ | [RESULTS & DISCUSSION] |
| __Conclusions__ | [CONCLUSIONS] |
| __Significance of this study__ | [SIGNIFICANCE OF THIS STUDY] |
| __Things to look out for in follow-up research__ | [THINGS TO LOOK OUT FOR IN FOLLOW-UP RESEARCH] |
| __Useful references to consider__ | [USEFUL REFERENCES TO CONSIDER] |
```
(2) Abbreviated markdown version
Now, all the inputs are given to you. You should combine and abbreviate all the inputs by fitting them into the following format. Note that you have to write the outputs __assuming you are making a paper sharing powerpoint presentation (ppt) for the audience__. You have to make audiences understand the content and methodology of this paper very well. Therefore, clearly abbreviate and express the important information only. Thank you for your consideration.
```markdown
# **[TITLE]** (Bring the title from the foremost heading in the document. The powerful hint is that the title comes before the people who wrote the document.)
## **Introduction**
## **Methodology**
### **Apparatus**
### **Experimental procedure**
### **Computational procedure (if exists)**
### **Data analysis**
## **Results & discussion**
## **Conclusions**
## **Significance of this study**
## **Things to look out for in follow-up research**
### **Useful references to consider**
...
```
- That's it! You can see the awesome results! 🎉
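The manual workflow above boils down to a fixed message order: the initial prompt, then each split part, then one of the final prompts. The sketch below assembles that sequence so you can see what gets pasted and in which order. The function name `build_messages` and the `[Part i/n]` labels are assumptions added for illustration. They are not the exact format produced by ChatGPT splitter.

```python
# Sketch: order of messages pasted into ChatGPT in the manual workflow.
# The "[Part i/n]" label is an assumption, not ChatGPT splitter's format.
from typing import List


def build_messages(
    initial_prompt: str, chunks: List[str], final_prompt: str
) -> List[str]:
    """Return the messages in the order they would be pasted into ChatGPT."""
    total = len(chunks)
    messages = [initial_prompt]
    for index, chunk in enumerate(chunks, start=1):
        messages.append(f"[Part {index}/{total}]\n{chunk}")
    messages.append(final_prompt)
    return messages


demo = build_messages(
    "Please, act as 'High-quality content abbreviator'. ...",
    ["first part of the paper", "second part of the paper"],
    "Now, all the inputs are given to you. ...",
)
for message in demo:
    print(message)
    print("-" * 20)
```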
For more information, bug reports, or feature requests, please visit the GitHub repository.