Added support for feather files in POST #77

Closed · wants to merge 10 commits

Conversation


@FvD FvD commented Mar 1, 2017

Hi Jeff,

This was not an open request in the issues, but was something that came up in a plumber use case: the need to be able to send feather (binary) files. I would much appreciate your review, as I think this may not be the most elegant solution. But it works, provided you upload the file as multipart/form-data:

 curl -X POST http://localhost:9080/ -F 'myfile=@mydata.feather'   

It then uses webutils from Jeroen Ooms to parse the multipart content into its different components and writes out a temporary file to read it back in using feather.

What I have not been able to do (yet) is write a test for this. I've tried to copy an example multipart upload out of a debug session (transforming it with rawToChar()), but for some reason I cannot get it right. Any pointers are welcome.


FvD commented Mar 1, 2017

For some reason the installation of feather failed in Travis CI. I'll have a look why.

But before making any changes, I'd like to know your opinion on generalizing this to more (all?) binary formats. One option is to move all the feather-specific code out, so that the raw binary body is returned and handled by the function written in the endpoint. We can return the name of the handle and the name of the file, as both are available in the multipart object, along with the tmpfile location of the binary file (as set by tempfile()).

The other option is to handle files here, either based on Content-Type (which the user then has to include in the POST request) or on the file extension (which has to be correct, otherwise it will fail, though this is seldom an issue). The trouble is that there are many different formats, and why should plumber dictate how to handle XLSX or ODS files, for instance? It may be much better to leave all that to the code in the endpoint.


@trestletech trestletech left a comment


This looks really promising! Thanks for taking the time to put it together.

As pointed out in #75, I think there's potential here to have much broader applicability. My preference would be to attach a files list off of the incoming request that could provide pointers to the relevant files that were uploaded and are now stored on the server's disk. Then the caller could just decide themselves that they want to read in the file as a feather file, for instance. And plumber doesn't have to be in the business of providing special parsers for every file type someone might want to upload.

So you could do something like

@post /upload
function(req){
  cat("Files uploaded: ", names(req$files))
  feather::read_feather(req$files[[1]])
  ...
}

We'd also need to figure out how to limit the filetypes and file sizes that we want people uploading. I'd hope the server would start off with an empty list of supported file extensions which could be added to in $run(), perhaps.
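One way the size limit mentioned above could work is a check against the request's Content-Length header before the body is parsed. This is only a sketch under assumptions: the helper name `check_upload_size` and the 10 MB default are illustrative, not plumber API; the Rook-style request environment does expose a CONTENT_LENGTH variable that could feed it.

```r
# Hypothetical sketch (not part of this PR): reject oversized uploads
# before parsing the multipart body. The helper name and the 10 MB
# default limit are illustrative assumptions, not plumber API.
check_upload_size <- function(content_length, max_bytes = 10 * 1024^2) {
  size <- suppressWarnings(as.numeric(content_length))
  if (length(size) != 1 || is.na(size) || size <= 0) {
    stop("Missing or invalid Content-Length header")
  }
  if (size > max_bytes) {
    stop(sprintf("Upload of %.0f bytes exceeds the %.0f byte limit",
                 size, max_bytes))
  }
  invisible(TRUE)
}
```

An endpoint filter could call this with `req$CONTENT_LENGTH` and return a 413 response instead of parsing the body when it errors.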

R/post-body.R Outdated
boundary <- "-----------------"
parsed_binary <- webutils::parse_multipart(body, boundary)
file_name <- parsed_binary[[1]][[1]]
file_data <- parsed_binary[[1]][[2]]
Contributor

Am I understanding correctly that we could iterate here to support more than one file?

Contributor Author

I'm not sure yet. The numeric list indexes used here are not the best way of accessing the parsed content. This looks better (myfile is the form-field name from the upload, which webutils::parse_multipart() uses to name the list element):

parsed_binary$myfile$filename
parsed_binary$myfile$value

For readability I've now implemented it with named lists, and I have yet to figure out how to parse multiple files.

R/post-body.R Outdated
file_data <- parsed_binary[[1]][[2]]

tmpfile <- tempfile()
writeBin(file_data, tmpfile)
Contributor

Would be neat if we could keep this off disk using textConnections, but I know some packages struggle with connections and prefer files, so maybe landing this stuff on disk is the more generic solution.

Contributor Author

For my purposes I would prefer to land the files on disk. If we used textConnections, the data would not be persistent at all, and would not be available after a restart of the service.

I can imagine use cases where persistence would not be an issue. Maybe this is something to make a separate ticket of, to add as an option later.

DESCRIPTION Outdated
@@ -18,7 +18,9 @@ Imports:
     R6 (>= 2.0.0),
     stringi (>= 0.3.0),
     jsonlite (>= 0.9.16),
-    httpuv (>= 1.2.3)
+    httpuv (>= 1.2.3),
+    feather,
Contributor

I'd hope to keep feather out of Imports and in Suggests. See how I handle the htmlwidgets package in this PR: https://github.com/trestletech/plumber/pull/82/files

We could just suggest and then, if a user goes to use a feature that requires a particular package and doesn't have it, we'd abort at runtime.

Contributor Author

Absolutely! I had not looked carefully enough at how you are handling this. Your solution is elegant. Also, perhaps I should focus on a more generic solution to POST binary files in plumber and leave the handling of feather or other formats to the API code.

R/post-body.R Outdated
# does not work when the boundary is set to the content type as per the
# webutils docs.
boundary <- "-----------------"
parsed_binary <- webutils::parse_multipart(body, boundary)
Contributor

I'm unfamiliar with the specs here but this feels pretty hacky. I wonder if this is something that we should just fix in webutils directly?

Contributor Author

There is nothing wrong with webutils; it was just poorly implemented on my part. The full boundary is part of req$CONTENT_TYPE, so we can fetch the current boundary like so:

boundary <- strsplit(req$CONTENT_TYPE, split = "=")[[1]][2]
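For instance, run against a typical header value this extracts the boundary token (the header below is a made-up illustration, not output from the PR):

```r
# Illustration of the strsplit approach on a made-up Content-Type header.
ct <- "multipart/form-data; boundary=------------------------1234abcd"
boundary <- strsplit(ct, split = "=")[[1]][2]
boundary  # "------------------------1234abcd"
```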


FvD commented Aug 1, 2017

Hi Jeff,

I've pushed the update based on your review. I could not implement everything yet; perhaps I can set up tickets to tackle the rest in the future.

Things changed:

  • Made a generic solution for binary files
  • Made a reference to the boundary in CONTENT_TYPE for use in webutils::parse_multipart()
  • Moved additional packages from Imports to Suggests, so as not to clutter Imports, and warn the user when a suggested package is required
  • Added a specific use case of feather files to an example

Things not changed (yet):

  • [ ] Accept and parse multiple files
  • [ ] Leave data in memory until the user decides to write to disk
  • [ ] Specify supported (accepted?) extensions in the call to $run()

I think this is a workable answer to #75 and #126. But by all means send this back if you see a need for further changes before merging.


FvD commented Aug 1, 2017

Travis fails with the message:

checking package dependencies ... ERROR
Namespace dependency not required: ‘crayon’

Crayon is outside of the scope of the changes I have made. Should I merge with upstream and commit again?

@korterling

Any example of how to upload a file?

@post /upload
function(req){
cat("Files uploaded: ", names(req$files))
}
curl -X POST http://localhost:1371/upload -F 'req=./test.txt'


FvD commented Aug 11, 2017

Hi @yangbenfa, I included an example of binary uploads in the pull request. You can find it in the folder plumber/inst/examples/13-binary-upload/. Please note the following two things:

  1. For it to work you have to install from the feather branch in my repository - this pull request is still under review and has not been merged yet.
  2. I have only tested this with binary files. I'm not sure if a text file would work, but I would love to know whether it does. If it does not work, please let me know.

The examples look as follows:

# To upload and use the content of the file in a function:
# curl -X POST http://localhost:9080/upload -F 'myfile=@test.feather'

#* @post /upload
function(name, data) {
  # work with binary files after upload (in this example a feather file)
  content <- feather::read_feather(data)
  return(content)
}

# To upload and use the properties of the file in a function:
# curl -X POST http://localhost:9080/inspect -F 'myfile=@test.feather'

#* @post /inspect
function(name, data) {
  file_info <- data.frame(
    filename = name,
    mtime = file.info(data)$mtime,
    ctime = file.info(data)$ctime
    )
  return(file_info)
}


@trestletech trestletech left a comment


(Sorry this took so long for me to get back to, Frans. I had somehow missed that the ball was in my court!)

This is getting close. There are a few missing features (like limits on how big of a file we'd try to bring in -- since it does get loaded into memory -- and testing), but I like the approach.

How much time do you have to devote to this? With the Thanksgiving week next week I'm hoping to have some time to devote to Plumber and to finishing up the release. I might be able to build a few more commits on top of this to get it to a point where I'm happy with it, but if you were willing/able to do it yourself I'd be happy to go that route.

One thing we could consider is using some sort of "feature flag" in which we could merge this so we can keep iterating on it but require that users opt-in (perhaps via some R option) in order to get this behavior. That way we can get it in on master and keep working on it there and it's available for users who want it but without putting anyone else at risk while we iron out the kinks.

if (!is.null(req$CONTENT_TYPE) && grepl("multipart/form-data; boundary=",
req$CONTENT_TYPE, fixed = TRUE)) {

boundary <- strsplit(req$CONTENT_TYPE, split = "=")[[1]][2]
Contributor

Is this vulnerable to having an = sign in the body of the message?
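The concern is real in one case: RFC 2046 permits "=" inside the boundary token itself, and splitting on every "=" truncates such a boundary. A hedged sketch of the difference, with a made-up header (stripping the known prefix instead of splitting keeps the boundary intact):

```r
# Made-up Content-Type header whose boundary token contains "=".
ct <- "multipart/form-data; boundary=----abc==xyz"

# Splitting on every "=" keeps only the piece before the first
# embedded "=", truncating the boundary:
strsplit(ct, split = "=")[[1]][2]               # "----abc"

# Stripping the known prefix preserves the full boundary token:
sub("^multipart/form-data; boundary=", "", ct)  # "----abc==xyz"
```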

return(list())
}
parsed_binary <- webutils::parse_multipart(body, boundary)
file_name <- parsed_binary$myfile$filename
Contributor

I'm having a hard time finding where myfile comes from. Is that baked into parse_multipart's return value?

file_name <- parsed_binary$myfile$filename
file_data <- parsed_binary$myfile$value

tmpfile <- tempfile()
Contributor

Hm. It's unfortunate that we're not able to clean up these temp files when we're done. But if we added an on.exit here the file would be gone before we could access the file downstream. Maybe I need to add a hook to the response object so that we can clean up any associated resources when it's done.
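The hook idea could look something like the registry below. This is a sketch only: plumber had no response-cleanup hook at the time of this review, and every name here (`.upload_registry`, `register_tempfile`, `cleanup_tempfiles`) is hypothetical, not plumber API.

```r
# Hypothetical sketch of deferred temp-file cleanup: record each temp
# file as it is created, then delete the batch once a hook fires after
# the response has been sent. None of these names are plumber API.
.upload_registry <- new.env(parent = emptyenv())
.upload_registry$files <- character(0)

register_tempfile <- function(path) {
  .upload_registry$files <- c(.upload_registry$files, path)
  path
}

cleanup_tempfiles <- function() {
  existing <- .upload_registry$files[file.exists(.upload_registry$files)]
  unlink(existing)
  .upload_registry$files <- character(0)
  invisible(length(existing))
}
```

The body parser would call `writeBin(file_data, register_tempfile(tmpfile))`, and a response-done hook could call `cleanup_tempfiles()` once the endpoint has finished with the files.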

if (is.null(body) || length(body) == 0 || body == "") {
return(list())
}
parsed_binary <- webutils::parse_multipart(body, boundary)
Contributor

Are there limits built in to any of these on how big of a file we'd attempt to parse? Anything to prevent a client from sending a 10GB file and running the server out of RAM?

tmpfile <- tempfile()
writeBin(file_data, tmpfile)

ret <- NULL
Contributor

Is this any different from ret <- list(name = file_name, data = tmpfile)?

It surprised me that you could assign to a field on a NULL object.

@trestletech
Contributor

Oh and the one other high-level concern I had was around the hard-coding (I think) of the file name. I'd love to make that more generic before we bring it in.
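One way the hard-coded field name could be avoided is to iterate over every part that webutils::parse_multipart() returns, rather than reaching for `$myfile`. A hedged sketch; the helper name `parts_to_files` and the returned list shape are illustrative assumptions, not code from this PR:

```r
# Hypothetical generalization: write each uploaded part to its own temp
# file, keyed by the form-field name, instead of hard-coding "myfile".
# Assumes each part is a list with $filename and $value (raw vector),
# as webutils::parse_multipart() produces.
parts_to_files <- function(parsed) {
  lapply(parsed, function(part) {
    tmp <- tempfile()
    writeBin(part$value, tmp)
    list(filename = part$filename, data = tmp)
  })
}
```

An endpoint could then loop over the result and decide per field how to read each file back in.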

@eduardszoecs

Any updates on this feature?


CLAassistant commented Sep 4, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Collaborator

meztez commented Jun 27, 2020

#550 adds support for any type of file, which should cover this use case, so I think this one can be closed. Documentation to follow.

@schloerke schloerke added the Fixed in PR A PR has implemented a fix for this Issue label Jul 31, 2020
@schloerke schloerke added this to the v0.5.0 - Next CRAN release milestone Jul 31, 2020
7 participants