This dataset is built to support deeper analysis of both scholars and papers in the AI community.
We provide public access to the two collections below:
- Download the 78K Google Scholars data through the Google Drive shared link: gs_scholars.npy.
- Download 100K random samples from the 2.6M papers through the Google Drive shared link: ai_paper_features_100k. If you need the full data, please open a GitHub issue.
The data contains 78,536 AI scholars, with all features obtained directly from their Google Scholar profile pages. We crawled the list of AI scholars through four domain tags shown on the Google Scholar profile page: AI, MLP, ML, CV. To control the scale of the dataset, we include only scholars with more than 100 total citations as of Jan 1, 2022.
AIScholars78k_samp1000.csv contains 1,000 random samples from the dataset.
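
As a quick way to inspect the sample file and re-check the citation threshold described above, here is a minimal sketch using only the Python standard library. The column name `citations` is an assumption, not something this README documents; print the header first and substitute the real column name.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def filter_by_citations(rows, column="citations", threshold=100):
    """Keep rows whose citation count is strictly greater than `threshold`.

    `column` is a hypothetical column name (assumption). Rows whose value is
    missing or not numeric are skipped rather than raising an error.
    """
    kept = []
    for row in rows:
        try:
            count = int(float(row.get(column, "")))
        except (TypeError, ValueError):
            continue
        if count > threshold:
            kept.append(row)
    return kept


if __name__ == "__main__":
    path = Path("AIScholars78k_samp1000.csv")
    if path.exists():
        rows = load_rows(path)
        # Inspect the real header before relying on any column name.
        print(len(rows), "rows; columns:", list(rows[0].keys()) if rows else [])
```

Once the actual header is known, `filter_by_citations(rows, column=...)` can be used to confirm that every sampled scholar satisfies the inclusion rule.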
The data contains 2,890,908 AI papers. We collected all paper titles by iterating through the Google Scholar profile of each AI researcher as of Jan 1, 2022.
Papers100k_samp1000.csv contains 1,000 random samples from the data.