- Allow passing multiple values to add in `push!`, `pushfirst!`, `append!`, and `prepend!` (#3372)
- `rename` and `rename!` now allow applying a function transforming column names only to a subset of the columns specified by the `cols` keyword argument (#3380)
- `mapcols` and `mapcols!` now allow applying a function transforming columns only to a subset of the columns specified by the `cols` keyword argument (#3386)
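
The three entries above can be illustrated with a minimal sketch. It assumes a DataFrames.jl version that already includes these changes; the data frame `df` and the variable names are made up for illustration.

```julia
using DataFrames

# Illustrative data (not from the release notes).
df = DataFrame(a = [1, 2], b = [3, 4])

# Add two rows with a single push! call by passing several values.
push!(df, (a = 5, b = 6), (a = 7, b = 8))

# Upper-case only the name of column :a; column :b keeps its name.
renamed = rename(uppercase, df; cols = :a)

# Double only column :b; column :a is copied unchanged.
doubled = mapcols(x -> 2 .* x, df; cols = :b)
```

Columns not selected by `cols` are passed through untouched, which is the point of the new keyword argument.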
- Correctly throw an error if a negative number of rows is passed to `first` or `last` (#3402)
- Always use the default thread pool for multithreaded operations, instead of using the interactive thread pool when Julia was started with `-tM,N` with N > 0 (#3385)
- Correctly return `Bool[]` in the `nonunique` function applied to a data frame with a pooled column that has zero levels in the pool (#3393)
- Correctly index `eachrow` and `eachcol` with `CartesianIndex` (#3413)
- Correctly handle non-standard integers when converting them to `BigInt` (#3419)
- The `by` and `aggregate` functions that were deprecated before 1.0 release are now removed. (#3422)
- Ensure that `allunique(::AbstractDataFrame, ::Any)` always gets interpreted as test for uniqueness of rows in the first positional argument (#3434)
- Make sure that an empty vector of `Any` or of `AbstractVector` is treated as having no columns when a data frame is being processed with `combine`/`select`/`transform`. (#3435)
- Fix error in specification of dependency on DataStructures.jl (#3359)
- Objects inheriting from `Tables.AbstractRow` are now treated in the same way as `DataFrameRow` by `select`/`transform`/`combine` functions. In previous versions they were treated as a scalar, but this was inconsistent with the intention of `Tables.AbstractRow` definition (#3348)
- Add `Iterators.partition` support for `DataFrameRows` (#3299)
- Add support for `renamecols` keyword argument in `crossjoin` (#3314)
- `DataFrameRows` and `DataFrameColumns` now support `nrow`, `ncol`, and `Tables.subset` (#3311)
- `Not` allows passing multiple positional arguments that are treated as if they were wrapped in `Cols` and does not throw an error when a vector of duplicate indices is passed when doing column selection (#3302)
- Added the kwarg `checkunique` to sorting related functions (`issorted`, `sort`, `sort!` and `sortperm`) that throws an error when duplicate elements make multiple sort orders valid (#3312)
- `reduce` performing `vcat` on a collection of data frames now accepts `init` keyword argument (#3310)
- Allow passing column names in `DataFrame` constructor that replace the names generated by default (#3320)
- `describe` now has `:sum` available as a descriptive statistic. (#3303)
- `deleteat!` correctly handles the situation when the vector of rows to be dropped from a data frame is its column or might alias with some of its columns (#3304)
- Add `Iterators.partition` support (#3212)
- Add `allunique` and allow transformations in `cols` argument of `describe` and `nonunique` when working with `SubDataFrame` (#3232)
- Add support for `Tables.AbstractRow` for `push!`, `pushfirst!`, and `insert!` (#3245)
- Add support for `operator` keyword argument in `Cols` to take a set operation to apply to passed selectors (`union` by default) (#3224)
- Allow passing multiple predicates in `Cols` and mixing them with other selectors (#3279)
- Improve support for setting group order in `groupby` (#3253)
- Joining functions now support `order` keyword argument allowing the user to specify the order of the rows in the produced table (#3233)
- Add `keep` keyword argument to `nonunique`, `unique`, and `unique!` allowing the user to specify which duplicate rows should be kept (#3260)
- Add `haskey` and `get` methods to `DataFrameColumns` to make it support dictionary interface more completely (#3282)
- Allow passing `scalar` keyword argument in `flatten` (#3283)
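
As a minimal sketch of the `keep` and `operator` keyword arguments listed above, assuming a DataFrames.jl (and DataAPI.jl) version that includes them; the data and variable names are illustrative only.

```julia
using DataFrames

# Illustrative data with duplicated keys in :id.
df = DataFrame(id = [1, 1, 2, 3, 3], x = 1:5)

# Default behavior: the first row of every group of duplicates is kept.
first_kept = unique(df, :id)

# keep = :last keeps the last row of every group of duplicates.
last_kept = unique(df, :id; keep = :last)

# keep = :noduplicates drops every row whose key occurs more than once.
no_dups = unique(df, :id; keep = :noduplicates)

# Cols combines selectors with `union` by default; here `intersect` is used,
# so only columns matching the regex AND not equal to :x1 are selected.
wide = DataFrame(x1 = 1, x2 = 2, y = 3)
selected = names(wide, Cols(r"^x", Not(:x1); operator = intersect))
```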
- passing very many data frames to `innerjoin` and `outerjoin` does not lead to stack overflow (#3233)
- fixed incorrect handling of passing no conditions in `subset` and `subset!` (#3264)
- fixed error in fast aggregation in `sum` and `mean` of columns only having `missing` values (#3268)
- fixed error in indexing of `SubDataFrame` that has no columns selected from its parent (#3273)
- `dropmissing` creates new columns in a single pass if `disallowmissing=true` (#3256)
- Fix bug in `select` and `transform` with `copycols=false` on `SubDataFrame` that incorrectly allowed passing transformations (#3231)
- Fix incorrect handling of column metadata in `insertcols!` and `insertcols` (#3220)
- Correctly handle `GroupedDataFrame` with no groups in multi-column operation specification syntax (#3122)
- Improve printing of grouping keys when displaying `GroupedDataFrame` (#3213)
- Support updates of metadata API introduced in DataAPI.jl 1.13.0 (#3216)
- Make sure `flatten` works correctly on a data frame with zero rows (#3198)
- Make sure we always copy the indexing value when calling `getindex` on `DataFrameRows` object (#3192)
- DataFrames.jl 1.4 requires Julia 1.6 (#3145)
- `subset` and `subset!` now allow passing zero column selectors (#3025)
- `subset` and `subset!` processing `GroupedDataFrame` allow using a scalar as a subsetting condition (this will result in including/excluding a whole group); for `AbstractDataFrame` processing only `AbstractVector` subsetting condition is allowed as accepting scalars can lead to hard to catch bugs in users' code (#3032)
- `permutedims` now supports a `strict` keyword argument that allows for a more flexible handling of values stored in a column that will become a new header (#3004)
- `unstack` now allows passing a function in `combine` keyword argument; this allows for a convenient creation of two dimensional pivot tables (#2998, #3185)
- `filter` for `GroupedDataFrame` now accepts `ungroup` keyword argument (#3021)
- Add special syntax for `eachindex`, `groupindices`, and `proprow` to transformation mini-language (#3001).
- Add support for `reverse!`, `permute!`, `invpermute!`, `shuffle`, and `shuffle!` functions. Improve functionality of `reverse`. (#3010).
- `first` and `last` for `GroupedDataFrame` now support passing number of elements to get (#3006)
- Add `insertcols`, which is a version of `insertcols!` that creates a new data frame (#3020)
- Add `fillcombinations` function that generates all combinations of levels of selected columns of a data frame (#3012)
- Guarantee that `permute!` and `invpermute!` throw on invalid input (#3035)
- Add `allcombinations` function that returns a data frame created from all combinations of the passed vectors (#3031)
- Add `resize!`, `keepat!`, `pop!`, `popfirst!`, and `popat!`, make `deleteat!` signature more precise (#3047)
- Add `pushfirst!` and `insert!` (#3072)
- New `threads` argument allows disabling multithreading in `combine`, `select`, `select!`, `transform`, `transform!`, `subset` and `subset!` (#3030)
- Add support for table-level and column-level metadata using DataAPI.jl interface (#3055)
- `completecases` and `nonunique` no longer throw an error when data frame with no columns is passed (#3055)
- `describe` now accepts two predefined arguments: `:nnonmissing` and `:nuniqueall` (#3146)
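
The pivot-table use of `unstack` with `combine`, and the `allcombinations` function, can be sketched as follows. This is a small illustration assuming DataFrames.jl 1.4 or newer; the `long` table and its column names are invented for the example.

```julia
using DataFrames

# Illustrative long-format data; the (S, 2020) combination occurs twice.
long = DataFrame(region = ["N", "N", "S", "S", "S"],
                 year = [2020, 2021, 2020, 2020, 2021],
                 sales = [1, 2, 3, 4, 5])

# Because of the duplicated combination a `combine` function is passed;
# here the duplicated values are summed, giving a two dimensional pivot table.
wide = unstack(long, :region, :year, :sales; combine = sum)

# A data frame holding every combination of the passed vectors.
grid = allcombinations(DataFrame, a = 1:2, b = ["x", "y", "z"])
```

The new columns of `wide` are named after the values of `year`, so they are accessed with string names such as `wide[!, "2020"]`.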
- On Julia 1.7 or newer broadcasting assignment into an existing column of a data frame replaces it. Under Julia 1.6 or older it is an in place operation. (#3022)
- `allowduplicates` keyword argument in `unstack` is deprecated, `combine` keyword argument should be used instead (#3185)
- `DataFrame` is now a `mutable struct` and has three new fields `metadata`, `colmetadata`, and `allnotemetadata`; this change makes `DataFrame` objects serialized under earlier versions of DataFrames.jl incompatible with version 1.4 (#3055)
- fix dispatch ambiguity in `rename` and `rename!` when only source data frame is passed (#3055)
- Make sure that `AsTable` accepts only valid argument (#3064)
- Make sure we avoid aliasing when repeating the same column in `select[!]` and `transform[!]` on `GroupedDataFrame` (#3070)
- Make `vcat` correctly handle `cols` keyword argument if only data frames having no columns are passed (#3081)
- Make `subset` preserve group ordering when `ungroup=false` like `subset!` already does (#3094)
- Fix incorrect behavior of `GroupedDataFrame` indexing in corner cases (#3179)
- Fix errors in `insertcols!` when no columns to add are passed (#3179)
- Fix errors in `minimum` and `maximum` aggregates when processing `GroupedDataFrame` with `combine` in corner cases (#3179)
- Speed up `permute!` and `invpermute!` (and therefore sorting) 2x-8x for large tables by using cycle notation (#3035)
- Make one-dimensional multi-element indexing of `DataFrameRows` return `DataFrameRows` (#3037)
- Make `transform!` on `SubDataFrame` faster (#3070)
- Support `Tables.subset` and move `ByRow` definition to Tables.jl (#3158)
- Fix overly restrictive type assertion in `filter` and `filter!` (#3155)
- Allow version 4 of Compat.jl
- Fix handling of `variable_eltype` in `stack` (#3043)
- Fix handling of `matchmissing` keyword argument in joins (#3040)
- Make sure that `select!`/`transform!` and `select`/`transform` (with `copycols=false`) do not produce aliases of the same source column consistently (currently only `transform[!]` ensured it for an unwrapped column renaming operation) (#2983)
- Fix aliasing detection in `sort!` (now only identical columns passing `===` test are considered aliases) (#2981)
- Make sure `ByRow` calls wrapped function exactly once for each element in all cases (#2982)
- Fix `getindex` that incorrectly allowed vectors of `Pair`s (#2970)
- Improve `sort` keyword argument in `groupby` (#2812).

  In the `groupby` function the `sort` keyword argument now allows three values:
  - `nothing` (the default) leaves the order of groups undefined and allows `groupby` to pick the fastest available grouping algorithm;
  - `true` sorts groups by key columns;
  - `false` creates groups in the order of their appearance in the parent data frame;

  In previous versions, the `sort` keyword argument allowed only `Bool` values and `false` (which was the default) corresponded to the new behavior when `nothing` is passed. Therefore the only user visible change affecting existing code is when `sort=false` is passed explicitly. The order of groups was undefined in that case, but in practice groups were already created in their order of appearance, except when grouping columns implemented the `DataAPI.refpool` API (notably `PooledArray` and `CategoricalArray`) or when they contained only integers in a small range. (#2812)
- the `unstack` function receives new keyword argument `fill` (with `missing` default) that is used to fill combinations of not encountered rows and columns. This feature allows distinguishing between missings in value column and just missing row/column combinations and easily filling non existing combinations with zeros in case of counting. (#2828)
- Allow adding new columns to a `SubDataFrame` created with `:` as column selector (#2794).

  If `sdf` is a `SubDataFrame` created with `:` as a column selector then `insertcols!`, `setindex!`, and broadcasted assignment allow for creation of new columns, automatically filling filtered-out rows with `missing` values;
- Allow replacing existing columns in a `SubDataFrame` with `!` as row selector in assignment and broadcasted assignment (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied.
- Allow `SubDataFrame` to be passed as an argument to `select!` and `transform!` (also on `GroupedDataFrame` created from a `SubDataFrame`) (#2794).

  Assignment to existing columns allocates a new column. Values already stored in filtered-out rows are copied. In case of creation of new columns, filtered-out rows are automatically filled with `missing` values. If `SubDataFrame` was not created with `:` as column selector the resulting operation must produce the same column names as stored in the source `SubDataFrame` or an error is thrown.
- `Tables.materializer` when passed the following types or their subtypes: `AbstractDataFrame`, `DataFrameRows`, `DataFrameColumns` returns `DataFrame`. (#2839)
- the `insertcols!` function receives new keyword argument `after` (with `false` default) that specifies if columns should be inserted after or before `col`. (#2829)
- Added support for `deleteat!` (#2854)
- `leftjoin!` performing a left join of two data frame objects by updating the left data frame with the joined columns from right data frame. (#2843)
- the `DataFrame` constructor when column names are passed to it as a second argument now determines if a passed vector of column names is valid based on its contents and not element type (#2859)
- the `DataFrame` constructor when matrix is passed to it as a first argument now allows `copycols` keyword argument (#2859)
- `Cols` now accepts a predicate accepting column names as strings. (#2881)
- In `source => transformation => destination` transformation specification minilanguage now `destination` can be also a `Function` generating target column names and taking column names specified by `source` as an argument. (#2897)
- `subset` and `subset!` now allow passing multiple column selectors and vectors or matrices of `Pair`s as specifications of selection conditions (#2926)
- When using broadcasting in `source .=> transformation .=> destination` transformation specification minilanguage now `All`, `Cols`, `Between`, and `Not` selectors when used as `source` or `destination` are properly expanded to selected column names within the call data frame scope. (#2918)
- `describe` now accepts `:detailed` as the `stats` argument to compute standard deviation and quartiles in addition to statistics that are reported by default. (#2459)
- `sort!` now supports general `AbstractDataFrame` (#2946)
- `filter` now supports `view` keyword argument (#2951)
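
A minimal sketch of the `sort` keyword argument of `groupby` and the `fill` keyword argument of `unstack` described in this list, assuming DataFrames.jl 1.3 or newer; the data and variable names are illustrative.

```julia
using DataFrames

# Illustrative data: group keys appear in the order b, a, c.
df = DataFrame(g = ["b", "a", "b", "c"], v = 1:4)

# sort = true orders groups by key; sort = false keeps order of appearance.
# Leaving `sort` unset (nothing) makes the group order undefined.
sorted_groups = groupby(df, :g; sort = true)
appearance_groups = groupby(df, :g; sort = false)

keys_sorted = [k.g for k in keys(sorted_groups)]
keys_appearance = [k.g for k in keys(appearance_groups)]

# fill = 0 replaces row/column combinations that never occurred with zero
# instead of `missing`, which is convenient for tables of counts.
counts = DataFrame(id = [1, 1, 2], kind = ["x", "y", "x"], n = [10, 20, 30])
wide = unstack(counts, :id, :kind, :n; fill = 0)
```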
- fix a problem with `unstack` on empty data frame (#2842)
- fix a problem with not specialized `Pair` arguments passed as transformations (#2889)
- sorting related functions now more carefully check passed arguments for correctness. Now all keyword arguments are correctly checked to be either scalars or vectors of scalars. (#2946)
- for selected common transformation specifications like e.g. `AsTable(...) => ByRow(sum)` use custom implementations that lead to lower compilation latency and faster computation (#2869), (#2919)
- `delete!` is deprecated in favor of `deleteat!` (#2854)
- In `sort`, `sort!`, `issorted` and `sortperm` it is now documented that the result of passing an empty column selector uses lexicographic ordering of all columns, but this behavior is deprecated. (#2941)
- In DataFrames.jl 1.4 release on Julia 1.7 or newer broadcasting assignment into an existing column of a data frame will replace it. Under Julia 1.6 or older it will be an in place operation. (#2937)
- fix a bug in `crossjoin` if the first argument is `SubDataFrame` and `makeunique=true` (#2826)
- Add workaround for `deleteat!` bug in Julia Base in `delete!` function (#2820)
- add option `matchmissing=:notequal` in joins; in `leftjoin`, `semijoin` and `antijoin` missings are dropped in right data frame, but preserved in left; in `rightjoin` missings are dropped in left data frame, but preserved in right; in `innerjoin` missings are dropped in both data frames; in `outerjoin` this value of keyword argument is not supported (#2724)
- correctly handle selectors of the form `:col => AsTable` and `:col => cols` by expanding a single column into multiple columns (#2780)
- if `subset!` is passed a `GroupedDataFrame` the grouping in the passed object gets updated to reflect rows removed from the parent data frame (#2809)
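
The `matchmissing=:notequal` rules above can be sketched on two tiny tables. This is an illustration assuming DataFrames.jl 1.2 or newer; `left`, `right`, and their columns are invented for the example.

```julia
using DataFrames

# Both tables contain a `missing` key in the :id column.
left = DataFrame(id = [1, 2, missing], l = ["a", "b", "c"])
right = DataFrame(id = [1, missing], r = ["x", "y"])

# leftjoin: the missing key is dropped from `right` but preserved in `left`,
# so all three left rows stay and only id == 1 finds a match.
lj = leftjoin(left, right; on = :id, matchmissing = :notequal)

# innerjoin: missing keys are dropped from both tables.
ij = innerjoin(left, right; on = :id, matchmissing = :notequal)
```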
- fix bug in how `groupby` handles grouping of float columns; now `-0.0` is treated as not integer when deciding on which grouping algorithm should be used (#2791)
- fix bug in how `issorted` handles custom orderings and improve performance of sorting when complex custom orderings are passed (#2746)
- fix bug in `combine`, `select`, `select!`, `transform`, and `transform!` that incorrectly disallowed matrices of `Pair`s in `GroupedDataFrame` processing (#2782)
- fix location of summary in `text/html` output (#2801)
- `SubDataFrame`, `filter!`, `unique!`, `getindex`, `delete!`, `leftjoin`, `rightjoin`, and `outerjoin` are now more efficient if rows selected in internal operations form a continuous block (#2727, #2769)
- `hcat` of a data frame with a vector is now deprecated to allow consistent handling of horizontal concatenation of data frame with Tables.jl tables in the future (#2777)
- `text/plain` rendering of columns containing complex numbers is now improved (#2756)
- in `text/html` display of a data frame show full type information when hovering over the shortened type with a mouse (#2774)
- fix performance issue when aggregation function produces multiple rows in split-apply-combine (#2749)
- `completecases` is now optimized and only processes columns that can contain missing values; additionally it is now type stable and always returns a `BitVector` (#2726)
- fix performance bottleneck when displaying wide tables (#2750)
- make sure `subset` checks if the passed condition function returns a vector of values (in the 1.0 release also returning scalar `true`, `false`, or `missing` was allowed which was unintended and error prone) (#2744)
- fix of performance issue of `groupby` when using multi-threading (#2736)
- fix of performance issue of `groupby` when using `PooledVector` (#2733)
- No breaking changes are planned for v1.0 release
- DataFrames.jl now checks that passed columns are 1-based as this is a current design assumption (#2594)
- `mapcols!` makes sure not to create columns being `AbstractRange` consistently with other methods that add columns to a `DataFrame` (#2594)
- `transform` and `transform!` always copy columns when column renaming transformation is passed. If similar issues are identified after 1.0 release (i.e. that a copy of data is not made in scenarios where it normally should be made) these will be considered bugs and fixed as non-breaking changes (#2721)
- `firstindex`, `lastindex`, `size`, `ndims`, and `axes` are now consistently defined and documented in the manual for `AbstractDataFrame`, `DataFrameRow`, `DataFrameRows`, `DataFrameColumns`, `GroupedDataFrame`, `GroupKeys`, and `GroupKey` (#2573)
- add `subset` and `subset!` functions that allow subsetting rows (#2496)
- `names` now allows passing a predicate as a column selector (#2417)
- `vcat` now allows a `source` keyword argument that specifies the additional column to be added in the last position in the resulting data frame that will identify the source data frame. (#2649)
- `GroupKey` and `DataFrameRow` are consistently behaving like `NamedTuple` in comparisons and they now implement: `hash`, `==`, `isequal`, `<`, `isless` (#2669)
- since Julia 1.7 using broadcasting assignment on a `DataFrame` column selected as a property (e.g. `df.col .= 1`) is allowed when column does not exist and it allocates a fresh column (#2655)
- `delete!` now correctly handles the case when columns of a data frame are aliased (#2690)
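
A short sketch of `subset`, predicate-based `names`, and the `source` keyword argument of `vcat` from the list above, assuming DataFrames.jl 1.0 or newer; the data and the `:part` column name are illustrative.

```julia
using DataFrames

# Illustrative data.
df = DataFrame(a = 1:4, note = ["p", "q", "r", "s"])

# Keep rows where every condition holds: a > 1 and note is not "r".
kept = subset(df, :a => ByRow(>(1)), :note => ByRow(!=("r")))

# The predicate receives column names as strings.
n_columns = names(df, startswith("n"))

# `source` adds a last column identifying which data frame a row came from.
stacked = vcat(df[1:2, :], df[3:4, :]; source = :part)
```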
- in `leftjoin`, `rightjoin`, and `outerjoin` the `indicator` keyword argument is deprecated in favor of `source` keyword argument; `indicator` will be removed in 2.0 release (#2649)
- Using broadcasting assignment on a `SubDataFrame` column selected as a property (e.g. `sdf.col .= 1`) is deprecated; it will be disallowed in the future. (#2655)
- Broadcasting assignment to an existing column of a `DataFrame` selected as a property (e.g. `df.col .= 1`) being an in-place operation is deprecated. It will allocate a fresh column in the future (#2655)
- all deprecations present in 0.22 release now throw an error (#2554); in particular `convert` methods, `map` on `GroupedDataFrame` that were deprecated in 0.22.6 release now throw an error (#2679)
- `innerjoin`, `leftjoin`, `rightjoin`, `outerjoin`, `semijoin`, and `antijoin` are now much faster and check if passed data frames are sorted by the `on` columns and take into account if shorter data frame that is joined has unique values in `on` columns. These aspects of input data frames might affect the order of rows produced in the output (#2612, #2622)
- `DataFrame` constructor, `copy`, `getindex`, `select`, `select!`, `transform`, `transform!`, `combine`, `sort`, and join functions now use multiple threads in selected operations (#2647, #2588, #2574, #2664)
- `convert` methods from `AbstractDataFrame`, `DataFrameRow` and `GroupKey` to `Array`, `Matrix`, `Vector` and `Tuple`, as well as from `AbstractDict` to `DataFrame`, are now deprecated: use corresponding constructors instead. The only conversions that are retained are `convert(::Type{NamedTuple}, dfr::DataFrameRow)`, `convert(::Type{NamedTuple}, key::GroupKey)`, and `convert(::Type{DataFrame}, sdf::SubDataFrame)`; the deprecated methods will be removed in 1.0 release
- as a bug fix `eltype` of vector returned by `eachrow` is now `DataFrameRow` (#2662)
- applying `map` to `GroupedDataFrame` is now deprecated. It will be an error in 1.0 release. (#2662)
- `copycols` keyword argument is now respected when building a `DataFrame` from `Tables.CopiedColumns` (#2656)
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made more flexible; in particular now it is allowed to return multiple columns from a transformation function (#2461 and #2481)
- CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays` to use it (#2404). In the same vein, the `categorical` and `categorical!` functions have been deprecated in favor of `transform(df, cols .=> categorical .=> cols)` and similar syntaxes (#2394). `stack` now creates a `PooledVector{String}` variable column rather than a `CategoricalVector{String}` column by default; pass `variable_eltype=CategoricalValue{String}` to get the previous behavior (#2391)
- `isless` for `DataFrameRow`s now checks column names (#2292)
- `DataFrameColumns` is now not a subtype of `AbstractVector` (#2291)
- `nunique` is not reported now by `describe` by default (#2339)
- stop reordering columns of the parent in `transform` and `transform!`; always generate columns that were specified to be computed even for `GroupedDataFrame` with zero rows (#2324)
- improve the rule for automatically generated column names in `combine`/`select(!)`/`transform(!)` with composed functions (#2274)
- `:nmissing` in `describe` now produces `0` if the column does not allow missing values; earlier `nothing` was produced in this case (#2360)
- fast aggregation functions for `GroupedDataFrame` now correctly choose the fast path only when it is safe; this resolves inconsistencies with what the same functions not using fast path produce (#2357)
- joins now return `PooledVector` not `CategoricalVector` in indicator column (#2505)
- `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries (#2392)
- in `describe` the specification of custom aggregation is now `function => name`; old `name => function` order is now deprecated (#2401)
- in joins passing `NaN` or real or imaginary `-0.0` in `on` column now throws an error; passing `missing` throws an error unless `matchmissing=:equal` keyword argument is passed (#2504)
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments `allowmissing` and `allowduplicates` (#2494)
- PrettyTables.jl is now the default back-end to print DataFrames to text/plain; the print option `splitcols` was removed and the output format was changed (#2429)
- add `filter` to `GroupedDataFrame` (#2279)
- add `empty` and `empty!` function for `DataFrame` that remove all rows from it, but keep columns (#2262)
- make `indicator` keyword argument in joins allow passing a string (#2284, #2296)
- add new functions to `GroupKey` API to make it more consistent with `DataFrameRow` (#2308)
- allow column renaming in joins (#2313 and #2398)
- add `rownumber` to `DataFrameRow` (#2356)
- allow passing column name to specify the position where new columns should be inserted in `insertcols!` (#2365)
- allow `GroupedDataFrame`s to be indexed using a dictionary, which can use `Symbol` or string keys and are not dependent on the order of keys. (#2281)
- add `isapprox` method to check for approximate equality between two dataframes (#2373)
- add `columnindex` for `DataFrameRow` (#2380)
- `names` now accepts `Type` as a column selector (#2400)
- `select`, `select!`, `transform`, `transform!` and `combine` now allow `renamecols` keyword argument that makes it possible to avoid adding transformation function name as a suffix in automatically generated column names (#2397)
- `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument which if set to `true` makes them return a `SubDataFrame` view into the passed data frame.
- add `only` method for `AbstractDataFrame` (#2449)
- passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine` with `ByRow` is now accepted (#2476)
- add `permutedims` method for `AbstractDataFrame` (#2447)
- add support for `Cols` from DataAPI.jl (#2495)
- add `reverse` function for `AbstractDataFrame` that reverses the rows (#2944)
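
The `renamecols` and `view` keyword arguments mentioned in this list can be sketched as follows, assuming a DataFrames.jl version that includes them; the data and variable names are illustrative.

```julia
using DataFrames

# Illustrative data.
df = DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 3.0])

# By default the function name is appended to the column name: x_sum.
with_suffix = combine(groupby(df, :g), :x => sum)

# renamecols = false keeps the source column name: x.
no_suffix = combine(groupby(df, :g), :x => sum; renamecols = false)

# view = true returns a SubDataFrame view instead of a copy.
v = filter(:x => >(1.5), df; view = true)
```

Because `v` is a view, it shares memory with `df`, so it avoids copying the selected rows.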
- `DataFrame!` is now deprecated (#2338)
- several non-standard `DataFrame` constructors are now deprecated (#2464)
- all old deprecations now throw an error (#2350)
- Tables.jl version 1.2 is now required.
- DataAPI.jl version 1.4 is now required. It implies that `All(args...)` is deprecated and `Cols(args...)` is recommended instead. `All()` is still supported.
- Documentation is now available also in Dark mode (#2315)
- add rich display support for Markdown cell entries in HTML and LaTeX (#2346)
- limit the maximal display width the output can use in `text/plain` before being truncated (in the `textwidth` sense, excluding `…`) to `32` per column by default and fix a corner case when no columns are printed in situations when they are too wide (#2403)
- Common methods are now precompiled to improve responsiveness the first time a method is called in a Julia session. Precompilation takes up to 30 seconds after installing the package (#2456).