Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jq/datastreams2 #80

Merged
merged 8 commits into from
Oct 27, 2015
Merged

Jq/datastreams2 #80

merged 8 commits into from
Oct 27, 2015

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Oct 20, 2015

Ok, a much more cleaned up PR (i.e. no merge conflicts). Not sure what happened with the git history, but I suspect it got borked because I've been doing so much locally lately and not staying in sync with master very well.

@quinnj
Copy link
Member Author

quinnj commented Oct 25, 2015

Alright, to summarize what's going on here:

  • There's been a substantial overhaul of the core SQLite interactions, mainly name changes to follow common Julia patterns, such as SQLite.bind! instead of just exported bind; this avoids polluting namespaces since bind is a fairly common term in other domains; it also adds the ! operator since this is a "mutating" type function; it's actually binding a value to an SQL statement, mutating it.
  • Related to above, there's almost nothing exported from the package now; this again is inline with current Julia package best practices of not polluting the main namespace and forcing more qualification of usage.
  • All the "old UI" methods/types still exist, with appropriate deprecations. In particular, test/old_runtests.jl is the exact same file test/runtests.jl was before, and still passes tests (with lots of deprecation warnings). So no code should break at all with this change
  • A more minor point is the removal of the close methods for SQLite.DB and SQLite.Stmt; ultimately, the "closing" of these resources is now handled by finalizers, which I think ends up being a bit cleaner and helps users avoid shooting themselves in the foot (feet?).
  • Implementation of the DataStreams framework through the SQLite.Source and SQLite.Sink types
    • Before, we had create, append, and query, but they were fairly independent, duplicated code, and it was hard to extend (say, for adding CSV file support: either dumping an SQLite table out to a csv file, or loading one directly into a table).
    • The DataStreams framework now provides a clean interface for creating Source and Sink types and the appropriate streaming methods between Data.Table (the default Julia structure), CSV, and SQLite. What's more, it's fairly trivial to continue to extend these types to stream with other sources and sinks in the future
    • Along with this implementation, the overall "fetching" process has been overhauled, mainly focusing on (hopefully) stronger typed results, cleaner code, and easier extensibility.

I've updated the docs, so as I see it, this is ready to merge. I'd obviously appreciate feedback from anyone/everyone who sees this, but I'd like to get this merged tomorrow (Sunday Oct 25) as a more formal announcement of DataStreams gets ready to be published.

@felipenoris
Copy link
Contributor

Great work!
I'm testing out simple stuff.

julia> db = SQLite.DB("Chinook_Sqlite.sqlite")
SQLite.DB("Chinook_Sqlite.sqlite")

julia> SQLite.query(db, "SELECT * FROM Genre")
Data.Table:
25x2 Data.Schema:
 GenreId,       Name
   Int64, UTF8String
Error showing value of type DataStreams.Data.Table{Array{ERROR: MethodError: `type2string` has no method matching type2string(
 in show at /Users/felipenoris/.julia/v0.4/NullableArrays/src/show.jl:7
 in show_type_parameter at show.jl:100
 in show at show.jl:110
 in show_type_parameter at show.jl:100
 in show at show.jl:110
 in print at strings/io.jl:8
 in print at strings/io.jl:18
 in print_response at REPL.jl:139
 in print_response at REPL.jl:121
 in anonymous at REPL.jl:624
 in run_interface at ./LineEdit.jl:1610
 in run_frontend at ./REPL.jl:863
 in run_repl at ./REPL.jl:167
 in _start at ./client.jl:453SYSTEM: show(lasterr) caused an error

@felipenoris
Copy link
Contributor

Oh, > Pkg.checkout("NullableArrays") did the job for me... That package needs to be tagged.

julia> db = SQLite.DB("Chinook_Sqlite.sqlite")
SQLite.DB("Chinook_Sqlite.sqlite")

julia> SQLite.query(db, "SELECT * FROM Genre")
Data.Table:
25x2 Data.Schema:
 GenreId,       Name
   Int64, UTF8String
NullableArrays.NullableArray{T,1}[NullableArrays.NullableArray{Int64,1}[1,2,3,4,5,6,7,8,9,10    16,17,18,19,20,21,22,23,24,25],NullableArrays.NullableArray{UTF8String,1}["Rock","Jazz","Metal","Alternative & Punk","Rock And Roll","Blues","Latin","Reggae","Pop","Soundtrack"    "World","Hip Hop/Rap","Science Fiction","TV Shows","Sci Fi & Fantasy","Drama","Comedy","Alternative","Classical","Opera"]]

@quinnj
Copy link
Member Author

quinnj commented Oct 25, 2015

Yeah, we need to actually get NullableArrays, Libz, and BufferedStreams to all get new tags in METADATA

@ScottPJones
Copy link

This looks like some very nice improvements!
I was a bit concerned about the comment of depending on finalizers now, since there is no guarantee that those will get called, and in general, don't get called in any predictable fashion.
I've been adding finalizers, just in case they forget to do so, but keeping close/disconnect methods.

@quinnj
Copy link
Member Author

quinnj commented Oct 26, 2015

It's a good point to bring up Scott. I've mulled this issue a bit more than most, I presume, and I've concluded that there's no great answer right now in Julia; for a greater discussion on the subject, see JuliaLang/julia#11207. What I've settled on is that we kind of have to rely on finalizers, since we need to destroy the resources here eventually to avoid database corruption and memory leaks. The way the finalizers are setup here also ensures that things can be killed in any order and they'll all be shut down correctly (i.e. if a statement is finalized before a db, then that's obviously fine, but even if the db is finalized before the statement, we're now using sqlite3_close_v2, which will hold the db in a zombie state of sorts until the last statement is finalized at which point it will finalize the db). I have actually defined _close methods for SQLite.DB and SQLite.Stmt, but I'm still not comfortable making them part of the official UI. I just think it's too easy to get yourself into trouble when you start manually closing things.

So in short, I realize that this isn't a "perfect world" kind of solution, but it seems to be about the best we can do currently in Julia. The recommended best practice is to just create SQLite.DB and SQLite.Stmt as needed and let them be automatically finalized as they go out of scope. If you have the application bandwidth, you can also call finalize(db) or finalize(stmt) to ensure things get taken care of at a specific point in time.

@ScottPJones
Copy link

Yes, and I was one of the biggest commenters on JuliaLang/julia#11207, that issue was in fact why I brought this up here now.
I definitely realize it's not a perfect world (yet, but I'm always optimistic!) in Julia, I've been struggling with this issue since May myself, for Aerospike, MariaDB, etc.
I wonder if adding connect/open etc. functions with the first argument as a function, as Base.open does, and recommending that style for your SQLite package, would be cleaner and safer (using the open(args) do db ; ...code... ; end syntax), so that any code could be wrapped with try/catch/finally to make sure that even in the face of exceptions, the db is closed properly and resources returned.
Anyway, keep up the good work, on this and with CSV!

@quinnj
Copy link
Member Author

quinnj commented Oct 27, 2015

That's not a bad idea to have do .. end function block versions for DB and Stmt. For DB, it doesn't seem like it would really fit my usual use-case, but I could see that being a natural way some people would like to use it. @ScottPJones, would you mind opening a feature request issue about it?

quinnj added a commit that referenced this pull request Oct 27, 2015
@quinnj quinnj merged commit 2aa4c3e into master Oct 27, 2015
@quinnj quinnj deleted the jq/datastreams2 branch October 27, 2015 05:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants