[SPARK-20544][SPARKR] R wrapper for input_file_name #17818

Closed · wants to merge 5 commits
1 change: 1 addition & 0 deletions R/pkg/NAMESPACE
@@ -257,6 +257,7 @@ exportMethods("%<=>%",
"hypot",
"ifelse",
"initcap",
"input_file_name",
"instr",
"isNaN",
"isNotNull",
20 changes: 20 additions & 0 deletions R/pkg/R/functions.R
@@ -3974,3 +3974,23 @@ setMethod("grouping_id",
            jc <- callJStatic("org.apache.spark.sql.functions", "grouping_id", jcols)
            column(jc)
          })

#' input_file_name
#'
#' Creates a string column for the file name of the current Spark task.
Member: I actually find this description in the Scala API quite confusing - what is a "Spark task", and how does it have a "file name"?

Member Author: How about the new one?

#'
#' @rdname input_file_name
#' @name input_file_name
#' @aliases input_file_name,missing-method
Member: actually, could you add @family normal_funcs here? I missed this earlier and in the other PR.

Member Author: Done.

#' @export
#' @examples \dontrun{
#' df <- read.text("README.md")
#'
#' head(select(df, input_file_name()))
#' }
#' @note input_file_name since 2.3.0
setMethod("input_file_name", signature("missing"),
          function() {
            jc <- callJStatic("org.apache.spark.sql.functions", "input_file_name")
            column(jc)
          })
6 changes: 6 additions & 0 deletions R/pkg/R/generics.R
@@ -1076,6 +1076,12 @@ setGeneric("hypot", function(y, x) { standardGeneric("hypot") })
#' @export
setGeneric("initcap", function(x) { standardGeneric("initcap") })

#' @param x empty. Should be used with no argument.
#' @rdname input_file_name
#' @export
setGeneric("input_file_name",
           function(x = "missing") { standardGeneric("input_file_name") })

#' @rdname instr
#' @export
setGeneric("instr", function(y, x) { standardGeneric("instr") })
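
The defaulted x = "missing" argument follows the SparkR convention for column functions that take no arguments: S4 dispatch treats an unsupplied argument as class "missing", so the method registered on signature("missing") in functions.R is selected when input_file_name() is called with no argument. A minimal, self-contained sketch of the same dispatch pattern, using a hypothetical generic name (plain R, no Spark required):

library(methods)

# Hypothetical generic following the same pattern as input_file_name above.
setGeneric("demo_no_arg",
           function(x = "missing") { standardGeneric("demo_no_arg") })

# Method registered for the case where the argument is not supplied.
setMethod("demo_no_arg", signature("missing"),
          function() {
            "dispatched via the 'missing' signature"
          })

demo_no_arg()   # returns "dispatched via the 'missing' signature"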
5 changes: 5 additions & 0 deletions R/pkg/inst/tests/testthat/test_sparkSQL.R
@@ -1366,6 +1366,11 @@ test_that("column functions", {
  expect_equal(collect(df2)[[3, 1]], FALSE)
  expect_equal(collect(df2)[[3, 2]], TRUE)

  # Test input_file_name()
  actual_names <- sort(collect(distinct(select(df, input_file_name()))))
  expect_equal(length(actual_names), 1)
  expect_equal(basename(actual_names[1, 1]), basename(jsonPath))

  df3 <- select(df, between(df$name, c("Apache", "Spark")))
  expect_equal(collect(df3)[[1, 1]], TRUE)
  expect_equal(collect(df3)[[2, 1]], FALSE)