Perform analysis on stdin? #96

Closed
pm64 opened this issue Feb 2, 2017 · 5 comments

pm64 commented Feb 2, 2017

From what I can see, Siegfried is only able to analyze files on disk. Is there any feature planned that would allow analysis of bytes piped in via stdin?

@richardlehane richardlehane self-assigned this Feb 3, 2017
@richardlehane richardlehane added this to the 1.7.0 milestone Feb 3, 2017
@richardlehane
Owner

Hi @pm64 thanks for this message.

I hadn't planned to add this, but it would be a fairly straightforward addition that I'm happy to consider (the underlying API can accept streams or files: https://godoc.org/github.com/richardlehane/siegfried#Siegfried.Identify).

The reason I've never added this before is that, if you use standard PRONOM signatures, it is normally much more efficient to let sf do the file handling. Lots of PRONOM signatures have end-of-file as well as beginning-of-file sequences, plus wildcards that can appear anywhere in the file. This potentially means lots of seeking, and if you are supplying bytes rather than a file, those bytes will all be copied and held in memory by sf until a match is made. So if you did want to go this route, I'd suggest you'd probably also want to use the roy tool to build a custom signature file that has no end-of-file sequences and a fixed scan size, e.g. roy -bof 128000 -noeof. Does that make sense and fit with your use case?

The only other hurdle is that I've stupidly already used the - flag (which traditionally means "read from STDIN") for reading lists of files to scan from stdin. So adding this feature would also necessitate a change to sf's command-line interface (and perhaps copying the file command's use of the -f flag for reading lists of files).

@pm64
Author

pm64 commented Feb 3, 2017

Hi @richardlehane, thank you for your thoughtful reply.

Depending on how I end up streaming the bytes to stdin, your suggestion of excluding the EOF sequences from the signature file might help immensely in my use case, even though the file is already in RAM.

Either way, I'm pleased to learn this functionality is already supported on the API level. I know the typical use case is to read files from disk, but I think many Siegfried fans will appreciate the ability to read from stdin and the increased flexibility such a feature would provide.

@richardlehane
Owner

Hi @pm64, this is now implemented in sf 1.7.0. Use sf - to scan stdin. Let me know if you hit any issues.

@pm64
Author

pm64 commented Feb 22, 2017

@richardlehane, I'm testing 1.7.0 for my use case and so far it is working flawlessly. Can't thank you enough for this awesome update!! Will keep you posted.

@richardlehane
Owner

Thanks @pm64, that's great to hear.
