Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: DataTables.jl Backport #1214

Closed
wants to merge 264 commits into from
Closed

WIP: DataTables.jl Backport #1214

wants to merge 264 commits into from

Conversation

rofinn
Copy link
Member

@rofinn rofinn commented Aug 17, 2017

The nl/datatables2 branch rebased with master.

TODO:

  • Collect all the DataTable commits that should be backported (completed by @nalimilan).
  • Rebase with master branch of DataFrames.jl
  • Add fixes for issues identified during rebasing (e.g., reverted features, Compat wasn't removed in DataTables, still supports 0.5)
  • Remove DataFrames usage from DataStreams.jl?

NOTES:

  • This backport changes the default storage from DataArray to NullableArray.
  • We'll be porting another PR shortly after this which changes the default column storage to Vector{?T} now that Nulls.jl is registered.

nalimilan and others added 30 commits September 22, 2016 15:27
Shorter written that way for now. Filed as JuliaStats/NullableArrays.jl#144.
This depends on a CategoricalArrays change by which levels are sorted when
creating the array.
There's no inconsistency here: when the input is a Matrix, there's no
point in returning a NullableArray. Anyway, these are test methods.
We don't have to handle this right now.
Keep this in DataFrames for now, renaming it to the more explicit
sharepools(). Also relax signatures to accept non-Nullable categorical arrays.
These were not exercized by the tests, and the use case for them isn't obvious.
(They were formerly methods of DataArrays.PooledDataArray().)
For NullableArrays, even current git master is not enough at this time.
Tests pass, but the Nullable{Any} results could be annoying for users.
New type merging NominalArray and OrdinalArray in 0.0.5.
These shouldn't live in DataFrames.
The full code isn't that longer, and we can go back to the original code
when RDataset will have been ported.
Since we give them the same semantics for now, there's no point in keeping
separate functions.
More logical name for the constructor provided by CategoricalArray.
categorical!() is the in-place equivalent provided only by DataFrames. Also
move the compacting logic into CategoricalArrays, as it's both a cleaner
design and a more efficient one.
cjprybol and others added 14 commits September 2, 2017 05:44
io.md does not exist since the readtable() has been removed. Pooling
should be called categorical, so rename the file and sections.
CategoricalValue entries should always be printed via showcompact() in order
to get a short representation. This uses ourshowcompact() to do that when
printing DataTables to REPL via show, as well as when printing them to HTML,
LaTeX and CSV via printtable().

Also fix a failure due to duplicated keyword arguments on Julia 0.7.
- Changed julia requirement to 0.6 minimum
- Stopped testing on 0.5
- Removed Compat dependency
- Fixed a few 0.6 warnings.
@quinnj
Copy link
Member

quinnj commented Sep 2, 2017

Rebased. Porting over #66 now.

@nalimilan
Copy link
Member

@quinnj Also push the commits to nl/datatables3 when they are ready, that will be the branch to use in the end.

@quinnj
Copy link
Member

quinnj commented Sep 2, 2017

Ok, got a good start here. Merged #66 with this branch and resolved all the conflicts/renaming issues (there are still a few dt-like variable names hanging around). Now it's just getting all tests passing.

@nalimilan
Copy link
Member

I've included the new changes into nl/datatables4, and merged it with a merge commit into a new nl/datatables-backport branch which was equivalent to master: #1220. If that's OK, I can do the same directly on master (or rename master to release-0.9 and nl/datatables-backport to master).

@rofinn
Copy link
Member Author

rofinn commented Sep 2, 2017

I'll note that I was planning on rebasing (vs merging) the changes from DataTables#66 into this branch. The rf/datatables and nl/datatables-backport branches are very different now, so I'm just going to close this.

@rofinn rofinn closed this Sep 2, 2017
@ararslan ararslan deleted the rf/datatables branch September 2, 2017 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.