This walks through how to reason about serving large files to clients.
-
Ensure Go is installed!
brew install go
-
Clone the repository
-
Enter the directory
-
Create a large file named `lotsofdata.csv`
head -c 10000000000 /dev/urandom > lotsofdata.csv
-
Run the app!
go run main.go
-
Read the comments in the `main.go` file!
-
With the app running, observe the memory usage.
-
Make a request to the webserver.
wget localhost:8080
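If you'd rather make the same request from Go instead of wget, here's a minimal sketch that streams the body straight to disk. The output file name is an assumption for this sketch, not something from the repo.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Request the file from the running server.
	resp, err := http.Get("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// "downloaded.csv" is just a placeholder name for this sketch.
	out, err := os.Create("downloaded.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Stream the body to disk in small chunks so the client's memory stays flat too.
	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```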
-
Monitor the memory usage alongside the request.
Note the length in the response is unspecified! It's an octet-stream coming from the server :) !
-
Once the request has finished downloading the file, verify the memory returns to the original usage!
There are a few key problems we need to solve in order to make a model like this work. By understanding these concepts, we can make this work in any language!
-
Reading the file in chunks.
It's critical we don't read the entire CSV file into memory. It's nearly 10GB! Instead, we need to read a chunk of the file, write it to the response, "flush", and repeat.
If you look in the code, the magic is mostly in `io.Copy`. By default it does everything in 32KB chunks (but we can customize this!). Looking at the code, we have two key variables: `f` and `w`. `f` is the opened file. It IS NOT READ! but opened for us to read. `f` implements what we'd call a "reader", meaning it can be read :). `w` is the response we'll send to the client. It implements a "writer", which means it can be written to.
f, err := os.Open(fileName)
io.Copy(w, f)
By doing this, `io.Copy` runs the following loop for us (see the sketch after this list).
- Grab the buffer.
- Read contents of the file until the buffer is filled or end of file (EOF)
- Deplete the buffer by writing the data into the response
- Repeat!!!
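Putting that loop together, here's a minimal sketch of what a handler built around `os.Open` and `io.Copy` can look like. The file name, route, and port here are assumptions for illustration; the real values live in `main.go`.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Open the file. Nothing is read yet; f is just something we can read from.
		f, err := os.Open("lotsofdata.csv")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer f.Close()

		// io.Copy runs the read-a-chunk / write-a-chunk loop for us,
		// using a 32KB buffer by default.
		if _, err := io.Copy(w, f); err != nil {
			log.Println("copy failed:", err)
		}

		// To pick your own chunk size, swap in io.CopyBuffer:
		//   buf := make([]byte, 64*1024)
		//   io.CopyBuffer(w, f, buf)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```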
-
Responding with a stream or chunk of data.
This can depend on the HTTP version your client is using. Most clients / browsers today will be using HTTP/2, but we need to be certain that we can pass data back in chunks or some form of stream to the client. See the screenshot above that tells wget it's NOT getting a length specified and it should expect an octet stream!
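To make the chunking explicit, here's a sketch of the same loop written by hand, flushing each chunk to the client as soon as it's written via `http.Flusher`. The package and function names are made up for illustration, and `io.Copy` alone is usually all you need.

```go
// Package name is hypothetical; this is not code from the repo.
package stream

import (
	"io"
	"net/http"
)

// streamWithFlush reads a chunk, writes it to the response, flushes it over
// the wire, and repeats until EOF.
func streamWithFlush(w http.ResponseWriter, r io.Reader) error {
	flusher, canFlush := w.(http.Flusher) // not every ResponseWriter supports flushing
	buf := make([]byte, 32*1024)          // the same 32KB chunk size io.Copy defaults to

	for {
		n, err := r.Read(buf)
		if n > 0 {
			if _, werr := w.Write(buf[:n]); werr != nil {
				return werr
			}
			if canFlush {
				flusher.Flush() // push this chunk to the client now
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```

Because we never set a Content-Length, Go's net/http server falls back to chunked transfer encoding on HTTP/1.1 (and framed data on HTTP/2), which is why wget reports an unspecified length.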
Currently, we see memory usage per request in the realm of 40KB. So if we had a VERY conservative cap of 20MB, that should serve us in the realm of 500 parallel requests for this big file.
There's a ton of cool things we could do, like providing waiting mechanisms or handling back pressure (see the sketch below). We could also scale this thing horizontally using a ...... !! load balancer !!
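As one example of back pressure, here's a sketch that caps concurrent downloads with a buffered channel used as a semaphore. The limit of 500 comes from the rough math above; everything else (route, file name, error message) is assumed for illustration.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

// At roughly 40KB per in-flight copy, 500 slots keeps us near the ~20MB budget.
var slots = make(chan struct{}, 500)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}: // take a slot
			defer func() { <-slots }() // give it back when the copy finishes
		default:
			// All slots busy: tell the client to back off rather than queueing forever.
			http.Error(w, "try again later", http.StatusTooManyRequests)
			return
		}

		f, err := os.Open("lotsofdata.csv")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer f.Close()

		if _, err := io.Copy(w, f); err != nil {
			log.Println("copy failed:", err)
		}
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Blocking on the channel send instead of taking the default case would make clients wait for a free slot rather than get an error, which is one way to provide the "waiting mechanism" mentioned above.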