RFM: mark/reset for IOStream, IOBuffer, & AsyncStream (addresses #2638) #3656
Very cool!
return true
end
function compact(io::IOBuffer)
    if !io.writable error("compact failed") end
    if io.seekable error("compact failed") end
this test was to ensure data integrity and enforce the distinction between a file-like object and a stream-like object. i don't think it should be removed.
Yeah, I'll put this back. I thought I needed this before I added mark() and company to AsyncStreams, but I'm pretty sure it isn't necessary now.
+1 |
This is very cool functionality. Let's let this get reviewed as an API before we merge it though – I have a nagging feeling about this for some reason. |
I agree with Stefan, it is very cool, but something (possibly minor) is bothering me too. It could be that
|
@JeffBezanson , I've been updating this, and didn't see your comments until after the updates (which I finished earlier and just pushed). I've modified the description at the top with what the current patch provides.
mark(io) do f
a = chomp(readline(f))
if a == "JL"
seekmark(f) #jump back to mark
result = io_handler(f)
end
...
end
# exiting the do block resets io (calling seekmark(io))
b = chomp(readline(io))
# b == "JL" (if it did above)
It should actually be trivial to implement a multibyte
Didn't have a chance to finish responding earlier. I generally agree with most of @JeffBezanson's comments above. My original motivation was simply to have a multibyte peek, which requires some sort of buffering, and there was some interest in this mechanism when I brought it up in #2638. The main point of this interface, I think, is when parsing an input stream, and you need to read some of the stream in order to determine which parser to use. An example where this would be useful: I use files that are compressed in bgzip format, which is a gzip variant with an extra section in the header giving the compressed block length, allowing indexing and random access into a bgzip file. It would be useful to read the gzip header to see if this extra header section was there, and then reset the stream and send it off to the correct decompressor. A similar argument could be made for auto-detecting other file types. Removing ( |
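For illustration, the header-sniffing use case might look roughly like the sketch below. It uses `mark` and `reset` as they exist in current Julia; `sniff_format` is a hypothetical helper name, and the check only looks at the two gzip magic bytes (`0x1f 0x8b`), not at the bgzip extra header field.

```julia
# Hypothetical helper: inspect the first bytes of a stream without consuming them.
function sniff_format(io::IO)
    mark(io)                      # remember the current position
    try
        magic = read(io, 2)       # read up to two bytes
        return magic == UInt8[0x1f, 0x8b] ? :gzip : :unknown
    finally
        reset(io)                 # jump back to the mark and clear it
    end
end

io = IOBuffer(UInt8[0x1f, 0x8b, 0x08, 0x00])
fmt = sniff_format(io)            # :gzip
# The stream is back at its starting position, ready for the real decompressor.
```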
I think I can agree that mark/seekmark is much better than |
|
One step ahead of you (barely). (I did leave it WIP for a reason! :-) The locks are necessary (or at least a good idea) if you want to use marking with the But, really, that syntax is just sugar, and it complicates things unnecessarily, so I just removed it. The current request is pretty minimal. I'm going to squash, and actually change it to RFC--let me know if you have additional feedback. |
I hesitated to pollute the Base namespace with useful names, but given the typical use cases, I decided that I liked |
Should |
Updates:
|
What about this following scenario... You have code that sets a mark and then calls some other routines to do processing on the stream. The code that it calls also wants to set a mark and possibly rewind in order to do its thing. Now this will fail. I don't see why the callee or caller should care if the other sets a mark or not. It seems to me that the only way to make this work is to have a mark stack. Ideally, however, the API should not make it possible (or at least easy) to reset when you haven't marked or reset someone else's mark. |
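A small sketch of that failure mode, assuming a single mark per stream (which is how `mark`/`reset` behave in current Julia). `peek_byte` is a hypothetical callee, not something from this patch.

```julia
# Hypothetical callee that sets its own mark in order to peek at one byte.
function peek_byte(io::IO)
    mark(io)              # overwrites any mark the caller had set
    b = read(io, UInt8)
    reset(io)             # returns to the callee's mark and clears it
    return b
end

io = IOBuffer("abcdef")
mark(io)                  # caller marks position 0
read(io, UInt8)           # caller consumes 'a'
peek_byte(io)             # callee marks at position 1, then resets

caller_mark_survived = ismarked(io)   # false: the caller's mark is gone

# The caller's own reset now fails instead of rewinding to position 0.
reset_failed = try
    reset(io)
    false
catch err
    err isa ArgumentError
end
```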
The "do" version of However, I think that's getting a little complicated, and should probably be part of a separate pull request. |
The API could be
where |
Another possibility is to have |
Okay, that would work. It would have to be a token and not a location, at least for |
I finally got around to finishing this up. Marks are implemented as a stack, per @StefanKarpinski and @JeffBezanson's suggestion, and users are required to keep track of mark tokens, as in @JeffBezanson's example above. The patch includes docs and tests. The tests pass on my system. Travis seems to have failed because of a missing ncurses dependency. If there are no additional comments or suggestions, I think this is good to merge. |
Whoops, no, it was my fault. Forgot to add a file when I was cleaning things up. Fixed in a sec... |
Travis gcc build passed, clang failed... not clear to me why. |
bump. lgtm. can this be merged now? (travis build error was unrelated) |
Fine with me. |
I'll rebase in the morning. |
Rebased. It would be good to hear from @StefanKarpinski and/or @JeffBezanson before merging, as they have expressed the strongest opinions about previous versions (and the current implementation is based on their suggestions). Otherwise, this should be good to go. |
As long as this allows you to stack marks, I defer to @JeffBezanson's opinion. |
Marks are stacked, although unlike your original proposal, resetting a mark removes all later marks. I did this under the assumption that later marks would most likely have been added by called functions (or inner code blocks), which would have exited by the time
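To make the stacked behavior concrete, here is a rough sketch of a token-based mark stack. Everything in it (`MarkStack`, `pushmark!`, `resetmark!`) is a hypothetical name chosen for illustration and is not the code in this patch. It also assumes a seekable stream and that resetting drops the mark itself along with all later ones.

```julia
# Hypothetical wrapper, not part of Base: a stack of marked positions on a seekable IO.
struct MarkStack{T<:IO}
    io::T
    marks::Vector{Int}
end
MarkStack(io::IO) = MarkStack(io, Int[])

# Push the current position; the returned token is the mark's index in the stack.
function pushmark!(m::MarkStack)
    push!(m.marks, position(m.io))
    return length(m.marks)
end

# Seek back to the mark identified by `token`, dropping it and every later mark.
function resetmark!(m::MarkStack, token::Int)
    1 <= token <= length(m.marks) || throw(ArgumentError("invalid mark token"))
    seek(m.io, m.marks[token])
    resize!(m.marks, token - 1)
    return m
end

m = MarkStack(IOBuffer("header:body"))
outer = pushmark!(m)      # marks position 0
read(m.io, 7)             # consume "header:"
inner = pushmark!(m)      # marks position 7
read(m.io, 2)
resetmark!(m, outer)      # back to position 0; the inner mark is discarded too
```

Because the inner token is invalidated by the outer reset, a later `resetmark!(m, inner)` throws rather than silently seeking to a stale position.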
Please stop adding changes to 0.3! |
Rebased. Although I would love to see this merged (if only so I can stop wasting my time rebasing), I can also sympathize with @StefanKarpinski's desire not to muck with the v0.3 requirements. So if this should go in, please merge it (when travis passes, which it should). If it should not go in, please remove the v0.3 tag, and we can merge as soon as v0.4 opens. |
There's also a complementary functionality to consider: cork and uncork – corking an output stream would prevent things printed to it from being flushed, buffering output until an uncork. This can be useful for making sure that output is contiguous. Not sure if it makes sense to think about these together. There seems to be some relation. |
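As a rough sketch of what cork/uncork could mean, assuming a simple wrapper type: `CorkedIO`, `cork!`, `uncork!`, and `emit` are all hypothetical names, and nothing like this is part of the patch or of Base.

```julia
# Hypothetical wrapper, not part of Base: holds writes in memory while corked.
mutable struct CorkedIO{T<:IO}
    io::T
    pending::IOBuffer
    corked::Bool
end
CorkedIO(io::IO) = CorkedIO(io, IOBuffer(), false)

cork!(c::CorkedIO) = (c.corked = true; c)

function uncork!(c::CorkedIO)
    write(c.io, take!(c.pending))   # emit everything buffered in a single write
    c.corked = false
    return c
end

# Write to the holding buffer while corked, otherwise straight to the wrapped stream.
emit(c::CorkedIO, s::AbstractString) = write(c.corked ? c.pending : c.io, s)

sink = IOBuffer()
c = CorkedIO(sink)
cork!(c)
emit(c, "hello, ")
emit(c, "world")
held_back = position(sink) == 0     # nothing has reached the sink yet
uncork!(c)
out = String(take!(sink))           # "hello, world"
```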
I want to merge this but I find it very hard to stomach allocating an extra array for every |
Isn't the ability to remove arbitrary marks (with |
good point. would it be worth leaving this field uninitialized until you use it for the first time? or using tuples? |
it removes all marks to the end, thereby invalidating all later marks (although it doesn't have a way to truly validate them later) |
Ah, of course. I find it strange to remove arbitrary marks; it's just hard to see when I would use that. |
That's fair. There are also a lot of other things in |
a few extra Bools and an Int allocated inline is much less than an Array object |
It's been a while since I implemented this, but if you read back through the comments above, the current implementation was loosely based on #3656 (comment). Originally, there was only one mark--there's no reason it couldn't be changed back to that. In theory, the current mechanism could be used for complicated or multilevel header parsing, as in multimedia files, or perhaps as input to backtracking parsers. But these are more obscure, and no one is clamoring for these features. |
That's true. But at least if string building was a separate entity, the extra array allocation in this PR wouldn't affect string building. Anyway, I just took out the array and replaced it with another inline |
Okay, I reverted back to 1 mark per I started to rename the functions to |
I think I see now; it's a stack of marks but you can remove any number from the end for something like exception unwinding. To me it's nice for I/O code to also be string building code. In fact it's hard to distinguish them, since we often write text-producing code using calls to Maybe this is a case where a linked list is appropriate. You wouldn't expect the mark stack to get very deep, and we could incrementally add one tiny object per mark. |
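A linked-list mark stack could be sketched roughly as follows. The names (`MarkNode`, `pushmark`, `popmark`, `depth`) are hypothetical; the point is only that an unmarked stream carries a single `nothing` field and each mark adds one small immutable node.

```julia
# Hypothetical, not part of Base: one small immutable node per mark.
struct MarkNode
    pos::Int
    next::Union{MarkNode, Nothing}
end

# Pushing returns a new head; an empty stack is just `nothing`.
pushmark(head::Union{MarkNode, Nothing}, pos::Integer) = MarkNode(pos, head)

# Popping returns the most recent position and the remaining stack.
popmark(head::MarkNode) = (head.pos, head.next)

function depth(head::Union{MarkNode, Nothing})
    n = 0
    while head !== nothing
        n += 1
        head = head.next
    end
    return n
end

marks = nothing
marks = pushmark(marks, 0)
marks = pushmark(marks, 7)
pos, marks = popmark(marks)   # pos is 7, the most recent mark
```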
It could certainly be done with a linked-list. The behavior and logic would be different (and I think more complicated) than the array implementation. At any rate, I won't have time to work on this soon. The current version allows just one mark, with just one extra field (an I think this is still useful, and generic enough that it can be extended easily in the future. @JeffBezanson, if you agree (and travis is green), take a quick look and merge if all is good. If not, please remove the 0.3 tag, and we can return to this in 0.4. (If anyone else wants to update this to use a linked list, feel free.) |
.. function:: unmark(s)

   Remove a mark from stream ``s``. Throws an error if ``s`` is not marked.
The second sentence is wrong; looks like it can just be removed.
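For reference, a short check of the behavior in question, using `unmark` and `ismarked` as they exist in current Julia: `unmark` on an unmarked stream returns `false` rather than throwing.

```julia
io = IOBuffer("data")

was_marked_before = unmark(io)   # false: nothing to remove, and no error
mark(io)
marked = ismarked(io)            # true
was_marked_after = unmark(io)    # true: a mark was removed
still_marked = ismarked(io)      # false
```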
Also added show() for IOBuffer. * Define mark, reset, unmark, ismarked for IO only. This is slightly sketchy, as it assumes that IO.mark exists, but is perhaps better than duplicating the code for all subtypes.
Okay, I've redefined I also fixed the I considered creating a simple interface (ala graphics) which required |
Thanks! On Monday, June 30, 2014, Jeff Bezanson notifications@github.com wrote:
|
Changes Unknown when pulling 990f8f1 on kmsquire:buffered_reader into * on JuliaLang:master*. |
Edit:
The current patch provides:
mark(), reset(), unmark(), and ismarked() for IOStream and IOBuffer (and AsyncStream through IOBuffer)
AsyncStreams and IOStream
These are loosely based on ideas from Java's BufferedReader, which allows a programmer to mark a location in an IO stream, and then seek back to that location later, even if the IO stream itself is not seekable:
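A minimal sketch of that pattern, written against `mark` and `reset` as they exist in current Julia. It uses an `IOBuffer` for convenience, which happens to be seekable, so it only illustrates the call sequence and not the buffering needed for non-seekable streams.

```julia
io = IOBuffer("JL\nrest of the stream\n")

mark(io)                 # remember the current position
line1 = readline(io)     # consumes the first line, "JL"
reset(io)                # seek back to the mark and clear it
line2 = readline(io)     # the same line is read a second time
```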
See #2638 for further discussion.