Multipart Body parsing #663

Merged Aug 28, 2020 (24 commits)
90870b4
Remove type is `multi` in `combine_keys()`
schloerke Aug 27, 2020
aa05ec6
`parser_multi()` should return and upgraded form of `webutils::parse_…
schloerke Aug 27, 2020
0867892
`parser_octet()` should only return the raw value
schloerke Aug 27, 2020
7d24ca9
parser_multi should be applied to any multipart content type
schloerke Aug 27, 2020
e084316
if the body is returned with class `plumber_multipart`, pluck the par…
schloerke Aug 27, 2020
22fb06f
Allow for make_req to pass in extra arguments
schloerke Aug 27, 2020
7bd66ba
Update tests to test for new multipart structures
schloerke Aug 27, 2020
f5d0aeb
Add json and yaml file extension content types. Update tsv and csv
schloerke Aug 27, 2020
6edbbee
`postBody` -> `body` in function names and docs
schloerke Aug 27, 2020
32410db
If the parsed body is not a list and does not have names, wrap it in …
schloerke Aug 27, 2020
65a94c8
`body_parser()` -> `req_body_parser()`
schloerke Aug 27, 2020
b97a8b2
"post body" -> "request body" (or "body")
schloerke Aug 27, 2020
1446a1a
Fix missing comma
schloerke Aug 27, 2020
0682d8f
Add news items
schloerke Aug 27, 2020
4f212c5
merge master
schloerke Aug 27, 2020
e93f1ee
Revert "Remove type is `multi` in `combine_keys()`"
schloerke Aug 28, 2020
122360d
Use `args_body` var name over `body`
schloerke Aug 28, 2020
cc271ac
Set up `ret$argsBody` to be user friendly. Coalesce same names together
schloerke Aug 28, 2020
f98d535
Update NEWS.md
schloerke Aug 28, 2020
3178f10
Apply suggestions from code review
schloerke Aug 28, 2020
d4a5ea8
Revert wording change for now
schloerke Aug 28, 2020
405d70a
Add comment
schloerke Aug 28, 2020
65eca06
Add test for same file name when combining multipart values
schloerke Aug 28, 2020
3499fc2
Add more ex info in `args`. Move `postBody` down for later PR
schloerke Aug 28, 2020
44 changes: 24 additions & 20 deletions NEWS.md
@@ -12,7 +12,7 @@ plumber 1.0.0

* An error will be thrown if multiple arguments are matched to an Plumber Endpoint route definition.
While it is not required, it is safer to define routes to only use `req` and `res` when there is a possiblity to have multiple arguments match a single parameter name.
Use `req$argsPath`, `req$argsQuery`, and `req$argsPostBody` to access path, query, and postBody parameters respectively.
Use `req$argsPath`, `req$argsQuery`, and `req$argsBody` to access path, query, and postBody parameters respectively.
See `system.file("plumber/17-arguments/plumber.R", package = "plumber")` to view an example with expected output and `plumb_api("plumber", "17-arguments")` to retrieve the api.
(#637)

@@ -50,24 +50,28 @@ plumber 1.0.0
* `serializer_headers(header_list)`: Method which sets a list of static headers for each serialized value. Heavily inspired from @ycphs (#455). (#585)
* `serializer_write_file()`: Method which wraps `serializer_content_type()`, but orchestrates creating, writing serialized content to, reading from, and removing a temp file. (#660)

#### POST body parsing

* Added support for POST body parsing (@meztez, #532)

* New POST body parsers
* `parser_csv()`: Parse POST body as a commas separated value (#584)
* `parser_json()`: Parse POST body as JSON (@meztez, #532)
* `parser_multi()`: Parse multi part POST bodies (@meztez, #532)
* `parser_octet()`: Parse POST body octet stream (@meztez, #532)
* `parser_form()`: Parse POST body as form input (@meztez, #532)
* `parser_rds()`: Parse POST body as RDS file input (@meztez, #532)
* `parser_text()`: Parse POST body plain text (@meztez, #532)
* `parser_tsv()`: Parse POST body a tab separated value (#584)
* `parser_yaml()`: Parse POST body as `yaml` (#584)
* `parser_none()`: Do not parse the post body (#584)
* `parser_yaml()`: Parse POST body (@meztez, #556)
* `parser_feather()`: Parse POST body using `feather` (#626)
* pseudo parser named `"all"` to allow for using all parsers. (Not recommended in production!) (#584)
#### Body parsing

* Added support for request body parsing (@meztez, #532)

* New request body parsers
* `parser_csv()`: Parse request body as a commas separated value (#584)
* `parser_json()`: Parse request body as JSON (@meztez, #532)
* `parser_multi()`: Parse multi part request bodies (@meztez, #532) and (#663)
* `parser_octet()`: Parse request body octet stream (@meztez, #532)
* `parser_form()`: Parse request body as form input (@meztez, #532)
* `parser_rds()`: Parse request body as RDS file input (@meztez, #532)
* `parser_text()`: Parse request body plain text (@meztez, #532)
* `parser_tsv()`: Parse request body a tab separated value (#584)
* `parser_yaml()`: Parse request body as `yaml` (#584)
* `parser_none()`: Do not parse the request body (#584)
* `parser_yaml()`: Parse request body (@meztez, #556)
* `parser_feather()`: Parse request body using `feather` (#626)
* Pseudo parser named `"all"` to allow for using all parsers. (Not recommended in production!) (#584)

* The parsed request body values is stored at `req$body`. (#663)

* If `multipart/*` content is parsed, `req$body` will contain named output from `webutils::parse_multipart()` and add the parsed value to each part. Look here for access to all provided information (e.g., `name`, `filename`, `content_type`, etc). In addition, `req$argsBody` (which is used for route argument matching) will contain a named reduced form of this information where `parsed` values (and `filename`s) are combined on the same `name`. (#663)

#### Visual Documentation

@@ -180,7 +184,7 @@ plumber 1.0.0

* Fixed bug where functions defined earlier in the file could not be found when `plumb()`ing a file. (#416)

* A multiline POST body is now collapsed to a single line (@robertdj, #270 #297).
* A multiline request body is now collapsed to a single line (@robertdj, #270 #297).

* Bumped version of httpuv to >= 1.4.5.9000 to address an unexpected segfault (@shapenaji, #289)
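
The multipart NEWS entry above distinguishes the detailed per-part structure in `req$body` from the reduced form in `req$argsBody`. The sketch below illustrates that difference with plain lists only. The part values are made up, and `reduce_parts()` is a hypothetical helper written for this example; it is not plumber's `combine_keys()` and does not reproduce how `filename`s are handled.

```r
# Hypothetical stand-in for a parsed multipart body. Field names follow the
# NEWS entry (`name`, `filename`, `content_type`, `value`, `parsed`);
# the values are invented for illustration.
parts <- list(
  list(name = "id", value = charToRaw("42"), parsed = "42"),
  list(name = "file", filename = "a.csv", content_type = "text/csv",
       value = charToRaw("x\n1"), parsed = data.frame(x = 1L)),
  list(name = "file", filename = "b.csv", content_type = "text/csv",
       value = charToRaw("x\n2"), parsed = data.frame(x = 2L))
)
names(parts) <- vapply(parts, function(p) p$name, character(1))

# reduce_parts() is NOT a plumber function (assumption for this sketch).
# It collects the `parsed` values of all parts that share a `name`.
reduce_parts <- function(parts) {
  keys <- unique(names(parts))
  out <- lapply(keys, function(k) {
    lapply(unname(parts[names(parts) == k]), function(p) p$parsed)
  })
  names(out) <- keys
  out
}

args_body <- reduce_parts(parts)
names(args_body)        # one entry per distinct part name
length(args_body$file)  # both "file" parts end up under the same key
```

In this sketch the detailed list keeps one entry per part (three here), while the reduced list keeps one entry per distinct `name` (two here).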

93 changes: 48 additions & 45 deletions R/content-types.R
@@ -1,51 +1,54 @@
# FROM Shiny
# @author Shiny package authors
knownContentTypes <- c(
html='text/html; charset=UTF-8',
htm='text/html; charset=UTF-8',
js='text/javascript',
css='text/css',
png='image/png',
jpg='image/jpeg',
jpeg='image/jpeg',
gif='image/gif',
svg='image/svg+xml',
txt='text/plain',
pdf='application/pdf',
ps='application/postscript',
xml='application/xml',
m3u='audio/x-mpegurl',
m4a='audio/mp4a-latm',
m4b='audio/mp4a-latm',
m4p='audio/mp4a-latm',
mp3='audio/mpeg',
wav='audio/x-wav',
m4u='video/vnd.mpegurl',
m4v='video/x-m4v',
mp4='video/mp4',
mpeg='video/mpeg',
mpg='video/mpeg',
avi='video/x-msvideo',
mov='video/quicktime',
ogg='application/ogg',
swf='application/x-shockwave-flash',
doc='application/msword',
xls='application/vnd.ms-excel',
ppt='application/vnd.ms-powerpoint',
xlsx='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
xltx='application/vnd.openxmlformats-officedocument.spreadsheetml.template',
potx='application/vnd.openxmlformats-officedocument.presentationml.template',
ppsx='application/vnd.openxmlformats-officedocument.presentationml.slideshow',
pptx='application/vnd.openxmlformats-officedocument.presentationml.presentation',
sldx='application/vnd.openxmlformats-officedocument.presentationml.slide',
docx='application/vnd.openxmlformats-officedocument.wordprocessingml.document',
dotx='application/vnd.openxmlformats-officedocument.wordprocessingml.template',
xlam='application/vnd.ms-excel.addin.macroEnabled.12',
xlsb='application/vnd.ms-excel.sheet.binary.macroEnabled.12',
feather='application/feather',
rds='application/rds',
tsv="text/tab-separated-values",
csv="text/csv"
html = "text/html; charset=UTF-8",
htm = "text/html; charset=UTF-8",
js = "text/javascript",
css = "text/css",
png = "image/png",
jpg = "image/jpeg",
jpeg = "image/jpeg",
gif = "image/gif",
svg = "image/svg+xml",
txt = "text/plain",
pdf = "application/pdf",
ps = "application/postscript",
xml = "application/xml",
m3u = "audio/x-mpegurl",
m4a = "audio/mp4a-latm",
m4b = "audio/mp4a-latm",
m4p = "audio/mp4a-latm",
mp3 = "audio/mpeg",
wav = "audio/x-wav",
m4u = "video/vnd.mpegurl",
m4v = "video/x-m4v",
mp4 = "video/mp4",
mpeg = "video/mpeg",
mpg = "video/mpeg",
avi = "video/x-msvideo",
mov = "video/quicktime",
ogg = "application/ogg",
swf = "application/x-shockwave-flash",
doc = "application/msword",
xls = "application/vnd.ms-excel",
ppt = "application/vnd.ms-powerpoint",
xlsx = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
xltx = "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
potx = "application/vnd.openxmlformats-officedocument.presentationml.template",
ppsx = "application/vnd.openxmlformats-officedocument.presentationml.slideshow",
pptx = "application/vnd.openxmlformats-officedocument.presentationml.presentation",
sldx = "application/vnd.openxmlformats-officedocument.presentationml.slide",
docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
dotx = "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
xlam = "application/vnd.ms-excel.addin.macroEnabled.12",
xlsb = "application/vnd.ms-excel.sheet.binary.macroEnabled.12",
feather = "application/feather",
rds = "application/rds",
tsv = "application/tab-separated-values",
csv = "application/csv",
json = "application/json",
yml = "application/yaml",
yaml = "application/yaml"
)

getContentType <- function(ext, defaultType = 'application/octet-stream') {
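
The table above maps file extensions to content types, and this PR adds `json`, `yml`, and `yaml` entries. As a rough illustration of how such a named vector can be used, here is a minimal lookup with a fallback. The names `known_types` and `lookup_content_type()` are hypothetical, and the lower-casing step is an assumption; the body of the real `getContentType()` is not shown in this diff.

```r
# Shortened, hypothetical copy of the lookup table for illustration only.
known_types <- c(
  csv  = "application/csv",
  json = "application/json",
  yml  = "application/yaml",
  yaml = "application/yaml"
)

# Illustrative helper (an assumption, not plumber's getContentType()).
# Returns the mapped content type, or `default_type` for unknown extensions.
lookup_content_type <- function(ext, default_type = "application/octet-stream") {
  ext <- tolower(ext)
  if (nzchar(ext) && ext %in% names(known_types)) {
    return(unname(known_types[ext]))
  }
  default_type
}

yaml_type <- lookup_content_type(tools::file_ext("data.yaml"))
zip_type  <- lookup_content_type(tools::file_ext("archive.zip"))
```

Here `yaml_type` resolves through the table, while `zip_type` falls back to the default because `zip` has no entry in the shortened table.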
85 changes: 63 additions & 22 deletions R/parse-body.R
@@ -1,23 +1,47 @@
postBodyFilter <- function(req){
handled <- req$.internal$postBodyHandled
bodyFilter <- function(req){
handled <- req$.internal$bodyHandled
if (is.null(handled) || handled != TRUE) {
# This will return raw bytes
req$postBodyRaw <- req$rook.input$read()
# store raw body into req$bodyRaw
req$bodyRaw <- req$rook.input$read()
if (isTRUE(getOption("plumber.postBody", TRUE))) {
req$rook.input$rewind()
req$postBody <- paste0(req$rook.input$read_lines(), collapse = "\n")
}
req$.internal$postBodyHandled <- TRUE
req$.internal$bodyHandled <- TRUE
}
forward()
}

postbody_parser <- function(req, parsers = NULL) {
req_body_parser <- function(req, parsers = NULL) {
if (length(parsers) == 0) {return(list())}
type <- req$HTTP_CONTENT_TYPE
body <- req$postBodyRaw
if (is.null(body)) {return(list())}
parse_body(body, type, parsers)
bodyRaw <- req$bodyRaw
if (is.null(bodyRaw)) {return(list())}
body <- parse_body(bodyRaw, type, parsers)
# store parsed body into req$body
req$body <- body

# Copy name over so that it is clearer as to the goal of the code below
# The value returned from this function is set to `ret$argsBody`
args_body <- body

if (inherits(args_body, "plumber_multipart")) {
args_body <- combine_keys(args_body, type = "multi")

} else if (!is.null(args_body)) {
# if it's a vector, then we should maybe bundle it as a list
# this will allow for req$args to have the single piece of information
# but it will deter from trying to formal name match against MANY false positive values
if (!is.list(args_body)) {
args_body_names <- names(args_body)
# if there are no names at all, wrap it in a unnamed list to pass it through
if (is.null(args_body_names) || all(args_body_names == "")) {
args_body <- list(args_body)
}
}
}
args_body
}
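
The non-multipart branch above wraps an unnamed, non-list body in a one-element list so that it is passed through as a single value instead of being matched against argument names. A standalone sketch of just that rule follows; `wrap_unnamed()` is a hypothetical name used for this example and is not part of the PR.

```r
# Hypothetical helper mirroring the wrapping rule in the branch above.
wrap_unnamed <- function(args_body) {
  # NULL and lists are passed through unchanged
  if (is.null(args_body) || is.list(args_body)) {
    return(args_body)
  }
  nms <- names(args_body)
  # no names at all: bundle the whole vector as one unnamed list element
  if (is.null(nms) || all(nms == "")) {
    return(list(args_body))
  }
  # named vectors are left as they are
  args_body
}

wrapped   <- wrap_unnamed(c(1, 2, 3))
named_vec <- wrap_unnamed(c(a = 1, b = 2))
as_list   <- wrap_unnamed(list(a = 1))
```

Only the first call changes its input: the unnamed vector becomes a single list element, while the named vector and the list come back unchanged.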

parse_body <- function(body, content_type = NULL, parsers = NULL) {
@@ -54,7 +78,7 @@ parser_picker <- function(content_type, first_byte, filename = NULL, parsers = N

# parse as json or a form
if (length(content_type) == 0) {
# fast default to json when first byte is 7b (ascii {)
# fast default to json when first byte is 7b (ascii {) or 5b (ascii [)
if (looks_like_json(first_byte)) {
return(parsers$alias$json)
}
@@ -400,7 +424,7 @@ parser_tsv <- function(...) {
}


#' @describeIn parsers Helper parser that writes the binary post body to a file and reads it back again using `read_fn`.
#' @describeIn parsers Helper parser that writes the binary body to a file and reads it back again using `read_fn`.
#' This parser should be used when reading from a file is required.
#' @param read_fn function used to read a the content of a file. Ex: [readRDS()]
#' @export
@@ -441,19 +465,16 @@ parser_feather <- function(...) {



#' @describeIn parsers Octet stream parser. Will add a filename attribute if the filename exists.
#' Returns a single item list where the value is the raw content and the key is the filename (if applicable).
#' @describeIn parsers Octet stream parser. Returns the raw content.
#' @export
parser_octet <- function() {
function(value, filename = NULL, ...) {
arg <- list(value)
names(arg) <- filename
return(arg)
function(value, ...) {
return(value)
}
}


#' @describeIn parsers Multi part parser. This parser will then parse each individual body with its respective parser
#' @describeIn parsers Multi part parser. This parser will then parse each individual body with its respective parser. When this parser is used, `req$body` will contain the updated output from [webutils::parse_multipart()] by adding the `parsed` output to each part. Each part may contain detailed information, such as `name` (required), `content_type`, `content_disposition`, `filename`, (raw, original) `value`, and `parsed` (parsed `value`). When performing Plumber route argument matching, each multipart part will match its `name` to the `parsed` content.
#' @export
#' @importFrom webutils parse_multipart
parser_multi <- function() {
@@ -462,8 +483,20 @@ parser_multi <- function() {
stop("No boundary found in multipart content-type header: ", content_type)
boundary <- stri_match_first_regex(content_type, "boundary=([^; ]{2,})", case_insensitive = TRUE)[,2]
toparse <- parse_multipart(value, boundary)

# set the names of the items as the `name` of each item
toparse_names <- vapply(toparse, function(x) {
name <- x$name
# null or character(0)
if (length(name) == 0) {
return("")
}
name
}, character(1))
names(toparse) <- toparse_names

# content-type detection
parsed_items <- lapply(toparse, function(x) {
ret <- lapply(toparse, function(x) {
if (
is.null(x$content_type) ||
# allows for files to be shipped as octect, but parsed using the matching value in `knownContentTypes`
@@ -475,11 +508,19 @@ parser_multi <- function() {
x$content_type <- getContentType(tools::file_ext(x$filename))
}
}
x$parsers <- parsers
parse_raw(x)
# copy over to allow to return the updated `x` without `parsers`
item <- x
# add `parsers` to allow `parse_raw` to work
item$parsers <- parsers
# store the parsed information into `x`
x$parsed <- parse_raw(item)
# return the updated `webutils::parse_multipart()` output
x
})

combine_keys(parsed_items, type = "multi")
# set a class so `req$argsBody` can be reduced to a named list of values
class(ret) <- "plumber_multipart"
ret
}
}
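
Two small steps in `parser_multi()` above can be tried in isolation: pulling the boundary out of the `Content-Type` header, and naming each part by its `name` with an empty-string fallback. The sketch below uses base R (`regexec()` and `regmatches()`) in place of the `stri_match_first_regex()` call in the diff, and the header value and part lists are made-up inputs.

```r
# Made-up header value for illustration.
content_type <- "multipart/form-data; boundary=----abc123"

# Base R substitute for the stringi call in the diff (same pattern).
m <- regmatches(
  content_type,
  regexec("boundary=([^; ]{2,})", content_type, ignore.case = TRUE)
)
boundary <- m[[1]][2]  # second element is the first capture group

# Made-up parts: one with a name, one with character(0), one with no name field.
parts <- list(list(name = "a"), list(name = character(0)), list())

# Same fallback rule as in the diff: NULL or zero-length names become "".
part_names <- vapply(parts, function(x) {
  name <- x$name
  if (length(name) == 0) {
    return("")
  }
  name
}, character(1))
names(parts) <- part_names
```

The fallback matters because `vapply()` with `character(1)` would raise an error on a `NULL` or zero-length `name`.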

@@ -495,7 +536,7 @@ register_parsers_onLoad <- function() {
# parser alias names for plumbing
register_parser("csv", parser_csv, fixed = c("application/csv", "application/x-csv", "text/csv", "text/x-csv"))
register_parser("json", parser_json, fixed = c("application/json", "text/json"))
register_parser("multi", parser_multi, fixed = "multipart/form-data")
register_parser("multi", parser_multi, fixed = "multipart/form-data", regex = "^multipart/")
register_parser("octet", parser_octet, fixed = "application/octet-stream")
register_parser("form", parser_form, fixed = "application/x-www-form-urlencoded")
register_parser("rds", parser_rds, fixed = "application/rds")
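
The `multi` registration above adds `regex = "^multipart/"` next to the fixed `multipart/form-data` value, so that any multipart content type can reach `parser_multi()`. A quick way to see which header values such a pattern would cover is a plain `grepl()` check. This is only a sketch of the pattern itself: `is_multipart()` is a hypothetical helper, and how plumber applies the regex internally (for example, case handling) is not shown in this diff.

```r
# Hypothetical helper: tests header values against the pattern from the diff.
is_multipart <- function(content_type) {
  grepl("^multipart/", content_type)
}

matches <- is_multipart(c(
  "multipart/form-data; boundary=x",
  "multipart/mixed; boundary=y",
  "application/json"
))
```

In this check the two `multipart/*` values match and `application/json` does not.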