Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Sparse ModelMatrix support #1040

Merged
merged 11 commits into from
Aug 26, 2016
21 changes: 13 additions & 8 deletions src/statsmodels/formula.jl
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,11 @@ type ModelFrame
contrasts::Dict{Symbol, ContrastsMatrix}
end

type ModelMatrix{T <: @compat(Union{Matrix{Float32}, Matrix{Float64}, SparseMatrixCSC{Float32,Int}, SparseMatrixCSC{Float64,Int}})}
modelmatrixcontainertypes = [Matrix{Float32}, Matrix{Float64},
Copy link
Member

@nalimilan nalimilan Aug 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should accept any subtype of Union{AbstractMatrix{Float32}, AbstractMatrix{Float32}}? Or is there anything specific that only Matrix and SparseMatrixCSC support?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I'm aware of?

Copy link
Member

@ararslan ararslan Aug 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we even skip the Union and do AbstractMatrix{AbstractFloat}, or does it have to be 32- or 64-bit floats?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea. @andreasnoack?

SparseMatrixCSC{Float32,Int},
SparseMatrixCSC{Float64,Int}]

type ModelMatrix{T <: Union{modelmatrixcontainertypes...}}
m::T
assign::Vector{Int}
end
Expand Down Expand Up @@ -437,21 +441,21 @@ If there is an intercept in the model, that column occurs first and its
Mixed-effects models include "random-effects" terms which are ignored when
creating the model matrix.
"""
function ModelMatrix(mf::ModelFrame)
function ModelMatrix(T::Union{map(t->Type{t}, modelmatrixcontainertypes)...}, mf::ModelFrame)
dfrm = mf.df
terms = droprandomeffects(dropresponse!(mf.terms))

blocks = Matrix{Float64}[]
blocks = T[]
assign = Int[]
if terms.intercept
push!(blocks, ones(size(dfrm, 1), 1)) # columns of 1's is first block
push!(assign, 0) # this block corresponds to term zero
push!(blocks, convert(T, ones(size(dfrm, 1), 1))) # columns of 1's is first block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since blocks is already typed, conversion should happen automatically without changing anything AFAIK.

push!(assign, 0) # this block corresponds to term zero
end

factors = terms.factors

## Map eval. term name + redundancy bool to cached model matrix columns
eterm_cols = @compat Dict{Tuple{Symbol,Bool}, Array{Float64}}()
eterm_cols = @compat Dict{Tuple{Symbol,Bool}, T}()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find any reason not to restrict the Array dimension here. Did I miss something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue I can think of is the case where a single-column term would give a column vector instead of a one-column matrix. But conversion will probably happen automatically, and tests should catch this. Have you run the tests of GLM.jl on the modified package?

Copy link
Author

@GordStephen GordStephen Aug 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I could tell modelmat_cols handles the vector->matrix conversion. The GLM.jl tests pass as well, so hopefully this is OK.

## Accumulator for each term's vector of eval. term columns.

## TODO: this method makes multiple copies of the data in the ModelFrame:
Expand All @@ -462,7 +466,7 @@ function ModelMatrix(mf::ModelFrame)
## "promoted" full-rank versions of categorical columns for non-redundant
## eval. terms:
for (i_term, term) in enumerate(terms.terms)
term_cols = Matrix{Float64}[]
term_cols = T[]
## Pull out the eval terms, and the non-redundancy flags for this term
ff = Compat.view(factors, :, i_term)
eterms = Compat.view(terms.eterms, ff)
Expand All @@ -479,8 +483,9 @@ function ModelMatrix(mf::ModelFrame)
append!(assign, fill(i_term, size(blocks[end], 2)))
end

ModelMatrix{Matrix{Float64}}(reduce(hcat, blocks), assign)
ModelMatrix{T}(reduce(hcat, blocks), assign)
end
ModelMatrix(mf::ModelFrame) = ModelMatrix(Matrix{Float64}, mf)


"""
Expand Down