lzo.index.tmp files not deleted #87
Comments
We have not seen this in our self-hosted environment. It might be due to something EC2-specific. Do you have any theories about the root cause?
Meanwhile we have noticed that these index.tmp files disappeared.
I see. Well, perhaps it would make sense to add a filter to the LZO input formats so they ignore these temp files and you don't get an error. Feel free to send a pull request with such a change; we will be happy to take a look.
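To illustrate the idea, here is a minimal Python sketch of such a filter. It is not the hadoop-lzo API and not the change that was eventually made; the function names, the `.tmp` suffix check, and the example bucket paths are assumptions for illustration only.

```python
# Hypothetical suffix, based on the "lzo.index.tmp" file names reported in this issue.
TMP_SUFFIX = ".tmp"


def is_temp_file(path: str) -> bool:
    """Return True if the path looks like a leftover temporary file."""
    return path.endswith(TMP_SUFFIX)


def drop_temp_files(paths):
    """Keep every path except temporary files, preserving input order."""
    return [p for p in paths if not is_temp_file(p)]


if __name__ == "__main__":
    # Example paths are made up; the bucket name is hypothetical.
    listing = [
        "s3://example-bucket/logs/part-00000.lzo",
        "s3://example-bucket/logs/part-00000.lzo.index",
        "s3://example-bucket/logs/part-00001.lzo.index.tmp",
    ]
    for path in drop_temp_files(listing):
        print(path)
```

In a real input format the same check would presumably live in whatever path filter the format applies when listing its input directory, so that a stale temp index is never handed to a reader.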
Excluding .tmp files is a good fix. There are other subtle issues with S3 because of these delays, e.g. https://github.com/kevinweil/elephant-bird/issues/309
We use the distributed LZO indexer on EMR (Hadoop version: 1.0.3), with files stored on Amazon S3.
Sometimes (observed twice so far) we had the following issue:
all lzo.index files are generated, but some of the lzo.index.tmp files are not deleted and cause problems when processing them with Pig. No exception or error is thrown during indexing, and the job is reported as having run successfully.