-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process lines fewer than 1000 #2
Conversation
@yiyangh-ps I just noticed this. Can you merge it? It will make testing a lot easier. |
I am waiting on anyone to review it. You can checkout |
char delim = absl::GetFlag(FLAGS_new_line_delim); | ||
for (const auto &filename : rest_args) { | ||
if (filename.empty()) { | ||
LOG(FATAL) << "Pipe input is not supported. Please use --input to specify the names of the input files"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is not pipe input supported? I am relying on this for in-memory processing of large amounts of bytes.
Also, please update the README accordingly. Because I think this basically means that all spm_encode
examples in the README will stop to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requiring getting data from file will have a huge impact on performance, as it adds the overhead of writing to disk and reading from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is not pipe input supported?
I don't know. It bugged me for a while too.
Requiring getting data from file will have a huge impact on performance
Only if the encoding is faster than the disk read+write speed. Otherwise the bottleneck is the CPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. It bugged me for a while too.
I tried to pipe in some input. There are not any errors. But the output is empty. Is this also what you had observed?
The logs actually say that 100 sentences were encoded:
spm_encode_main.cc(200) LOG(INFO) Encoded 50 sentences
spm_encode_main.cc(228) LOG(INFO) Encoded 50 sentences
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't remember what I saw.
This is superseded by #11 |
We'll add stdin support, if it does not already exist. |
No description provided.