Small formatting tweaks to #3360 after reviewing online #3483

Merged
84 changes: 47 additions & 37 deletions docs/src/man/basics.md
@@ -1635,32 +1635,36 @@ The `operation` argument defines the
operation to be applied to the source `dataframe`,
and it can take any of the following common forms explained below:

`source_column_selector`
: selects source column(s) without manipulating or renaming them
* `source_column_selector`

Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
selects source column(s) without manipulating or renaming them

`source_column_selector => operation_function`
: passes source column(s) as arguments to a function
and automatically names the resulting column(s)
Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`

Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
* `source_column_selector => operation_function`

`source_column_selector => operation_function => new_column_names`
: passes source column(s) as arguments to a function
and names the resulting column(s) `new_column_names`
passes source column(s) as arguments to a function
and automatically names the resulting column(s)

Examples: `:a => sum => :sum_of_a`, `[:a, :b] => (+) => :a_plus_b`
Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`

*(Not available for `subset` or `subset!`)*
* `source_column_selector => operation_function => new_column_names`

`source_column_selector => new_column_names`
: renames a source column,
or splits a column containing collection elements into multiple new columns
passes source column(s) as arguments to a function
and names the resulting column(s) `new_column_names`

Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
Examples: `:a => sum => :sum_of_a`, `[:a, :b] => (+) => :a_plus_b`

(*Not available for `subset` or `subset!`*)
*(Not available for `subset` or `subset!`)*

* `source_column_selector => new_column_names`

renames a source column,
or splits a column containing collection elements into multiple new columns

Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`

(*Not available for `subset` or `subset!`*)
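
A minimal sketch of the four forms listed above, assuming a small hypothetical data frame `df` with integer columns `:a` and `:b`. The data and variable names are illustrative only and do not come from the changed documentation:

```julia
using DataFrames

# Hypothetical example data, chosen only for illustration.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# `source_column_selector`: keep column :a without changing it
only_a = select(df, :a)

# `source_column_selector => operation_function`: the result column is named automatically
summed = combine(df, :a => sum)

# `source_column_selector => operation_function => new_column_names`
added = select(df, [:a, :b] => (+) => :a_plus_b)

# `source_column_selector => new_column_names`: rename :a to :new_a
renamed = select(df, :a => :new_a)
```

Here `combine` is used for the `sum` example so that the single aggregated value yields a one-row result; the other three calls keep one output row per source row.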

The `=>` operator constructs a
[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
@@ -1747,7 +1751,7 @@ julia> subset(df, :minor)
```

`source_column_selector` may instead be a collection of columns such as a vector,
a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#man-regex-literals),
a `Not`, `Between`, `All`, or `Cols` expression,
or a `:`.
See the [Indexing](@ref) API for the full list of possible values with references.
@@ -2279,7 +2283,8 @@ julia> transform(df, :b => (x -> x .+ 10) => :a) # replace column :a
4 │ 18 8
```

Actually, `renamecols=false` just prevents the function name from being appended to the final column name such that the operation is *usually* returned to the same column.
Actually, `renamecols=false` just prevents the function name from being appended
to the final column name such that the operation is *usually* returned to the same column.

```julia
julia> transform(df, [:a, :b] => +) # new column name is all source columns and function name
@@ -2939,13 +2944,18 @@ julia> select(

!!! note "Notes"

* `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
* Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
Without `ByRow`, the manipulations above would have thrown
`ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
* `Not("Time")` or `2:4` would have been equally good choices
for `source_column_selector` in the above operations.

* Don't forget `ByRow` if your function is to be applied to elements
rather than entire column vectors.
Without `ByRow`, the manipulations above would have thrown
`ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.

* Regular expression (`r""`) and `:` `source_column_selectors`
must be wrapped in `Cols` to be properly broadcasted
because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
must be wrapped in `Cols` to be properly broadcasted
because otherwise the broadcasting occurs before the expression
is expanded into a vector of matches.

You could also broadcast different columns to different functions
by supplying a vector of functions.
@@ -3095,7 +3105,8 @@ julia> df # see that the previous expression updated the data frame `df`
3 │ 3 6 9
```

Recall that the return type from a data frame manipulation function call is always a data frame.
Recall that the return type from a data frame manipulation function call
is always a data frame.
The return type of a data frame column accessed with dot syntax is a `Vector`.
Thus the expression `df.x + df.y` gets the column data as vectors
and returns the result of the vector addition.
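
As a rough illustration of the difference in return types described above, assuming a hypothetical data frame with integer columns `x` and `y` (the values are made up for this sketch):

```julia
using DataFrames

# Hypothetical example data, chosen only for illustration.
df = DataFrame(x = [1, 2, 3], y = [4, 5, 6])

# Dot syntax returns the column vectors, so this is ordinary vector addition.
v = df.x + df.y

# A manipulation function returns a data frame, here with a single column :z.
out = select(df, [:x, :y] => (+) => :z)
```

The values are the same in both cases; only the container differs (`v` is a `Vector`, `out` is a `DataFrame`).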
@@ -3210,7 +3221,6 @@ julia> my_very_long_data_frame_name = DataFrame(

julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column"; # define column names
```

**Manipulation:**

```julia
@@ -3267,16 +3277,10 @@ julia> df.Not(:x) # will not work; requires a literal column name
ERROR: ArgumentError: column name :Not not found in the data frame
```

**Indexing:**
**Manipulation:**

```julia
julia> df[:, :y_z_max] = maximum.(eachrow(df[:, Not(:x)])) # find maximum value across all rows except for column `x`
3-element Vector{Int64}:
7
8
9

julia> df # see that the previous expression updated the data frame `df`
julia> transform!(df, Not(:x) => ByRow(max)) # find maximum value across all rows except for column `x`
3×4 DataFrame
Row │ x y z y_z_max
│ Int64 Int64 Int64 Int64
@@ -3286,10 +3290,16 @@ julia> df # see that the previous expression updated the data frame `df`
3 │ 3 6 9 9
```

**Manipulation:**
**Indexing:**

```julia
julia> transform!(df, Not(:x) => ByRow(max)) # find maximum value across all rows except for column `x`
julia> df[:, :y_z_max] = maximum.(eachrow(df[:, Not(:x)])) # find maximum value across all rows except for column `x`
3-element Vector{Int64}:
7
8
9

julia> df # see that the previous expression updated the data frame `df`
3×4 DataFrame
Row │ x y z y_z_max
│ Int64 Int64 Int64 Int64