Microposts original data not matching with what gerbil expects #206
Comments
Thanks for creating a new issue! :) Hmm, it looks all right. The datasets you linked are correct and should work with the implemented wrapper.
I assume that you simply have to rename the files and move them into the directory where they are expected. Can you please try this? The "check" that GERBIL does when starting the server is very simple and only looks for the files it expects. If there were a problem with the data inside the files, you would encounter it when you try to benchmark something with these datasets.
I used the Microposts 2014 set you provided, added the files to gerbil_data and changed the file name. It worked. So I guess what MichaelRoeder just said is correct.
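To illustrate what an existence-only check amounts to, here is a minimal Python sketch. It is not GERBIL's actual code (GERBIL is written in Java); the function name, the directory, and the file names are all hypothetical. It only reports which expected files are absent and never opens them, which is why malformed content would go unnoticed until a benchmark runs.

```python
from pathlib import Path


def missing_dataset_files(data_dir, expected_names):
    """Return the expected file names that are not present in data_dir.

    Only existence is checked; file contents are never read.
    """
    root = Path(data_dir)
    return [name for name in expected_names if not (root / name).is_file()]
```

Called with hypothetical names, for example `missing_dataset_files("gerbil_data/datasets", ["train.tsv", "test.tsv"])`, an empty list would mean every expected file was found under that directory.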
Ok, I'll test all of them and keep you posted.
Sagnik
On Aug 3, 2017, at 12:43 PM, F.C. wrote:
i used the Microposts 2014 set you provided, added them to the gerbil_data and changed the file name. It worked.
So i guess what MichaelRoeder just said seems correct.
This is what I have done:
Is the mapping correct? Because this is the output I get
and the filenames are not hardcoded but in the |
I followed your instructions exactly. This is what I get from the log while trying to run an experiment with
Hmm, the dataset has empty lines (only IDs). I will handle it in the code, but I am not sure right now what the best way is. This seems to be the problem for all the datasets.
Yup. It fails with empty id lines. Can confirm. But more importantly, for 2014, the filetype is |
This python script should do the line removal trick (tested only with MP2014)
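The script itself is not preserved in this copy of the thread. A minimal sketch of such a filter might look like the following, assuming the file is tab-separated with the tweet ID in the first column, and that an "empty" line is one with nothing but whitespace after the ID. The function names are hypothetical, and this is not necessarily how the original script worked.

```python
def drop_id_only_lines(lines, sep="\t"):
    """Yield only the lines that carry content after the first (ID) column."""
    for line in lines:
        fields = line.rstrip("\r\n").split(sep)
        # Keep the line only if some field after the ID is non-blank.
        if any(field.strip() for field in fields[1:]):
            yield line


def filter_file(src_path, dst_path, sep="\t"):
    """Copy src_path to dst_path, skipping lines that contain only an ID."""
    with open(src_path, encoding="utf-8") as fin, \
            open(dst_path, "w", encoding="utf-8") as fout:
        fout.writelines(drop_id_only_lines(fin, sep))
```

Writing to a separate output file keeps the original download intact, so the filtered copy can be renamed to whatever name GERBIL expects.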
You have been very helpful, @TortugaAttack! I assume you are converting the Microposts data to NIF format? If you could point me to the code, I could give it a try. Thanks anyway.
Ah, thanks! 2014 should work now.
OK, I'll test and keep you updated.
Extremely sorry for the late reply, but this is the result I get, which I don't think is correct:
Nope, that is not correct :D
Sorry it took so long! Yours is missing the tweet itself (this is why the dataset has empty lines) and thus cannot be used with the MP2014 wrapper.
@sagnik If the last post answered your question, please close this issue. Thanks.
Please refer to #41. The wiki mentions that it expects the data in certain formats, specifically:
Microposts2013
Microposts2014
Microposts2015
Microposts2016
I downloaded microposts data from the following sources:
For 2013, the contents of the zip file do match. For the others, the contents are as follows:
2014
2015
2016
This is clearly different from what GERBIL expects. If you have any suggestions, please let me know. Also, as @TortugaAttack suggested in #41, I went through the logs on my local machine, and it does seem that the Microposts data is not loaded: