cors/auth handling, README updates
ryan-williams committed Dec 26, 2021
1 parent 74be570 commit 584d005
Showing 6 changed files with 360 additions and 125 deletions.
144 changes: 137 additions & 7 deletions README.md
@@ -1,18 +1,31 @@
# s3idx ![](assets/favicon.ico)
S3 bucket browsing via single `index.html` file
Amazon S3 bucket browser in a single `index.html` file

## Usage
- [Usage](#usage)
- [Examples](#examples)
- [Implementation Notes](#implementation)
- [Caching](#caching)
- [Configuration](#configuration)
- [Upload args](#upload-args)
- [Local development](#local-development)
- [S3 websites](#s3-websites)
- [Security](#security)
- [Roadmap](#roadmap)

## Usage <a id="usage"></a>
Copy `s3://s3idx/index.html` to any public S3 bucket:
```bash
aws s3 cp s3://s3idx/index.html s3://$bucket/ --content-type="text/html; charset=utf-8" --acl public-read
aws s3 cp s3://s3idx/index.html s3://$bucket/ \
--content-type="text/html; charset=utf-8" \
--acl public-read
```

Browse `$bucket` interactively:
```bash
open https://$bucket.s3.amazonaws.com/index.html
```

## Example
## Examples <a id="examples"></a>
Here's `index.html` in action in the `ctbk` bucket, [ctbk.s3.amazonaws.com/index.html](https://ctbk.s3.amazonaws.com/index.html):

![](ctbk.gif)
@@ -23,12 +36,125 @@ Note:
- opt-in recursive fetching
- total sizes / last modified times for directories (only when fully fetched)

## Implementation Notes
## Implementation Notes <a id="implementation"></a>

### Caching <a id="caching"></a>
- Requests to S3 (`ListObjectsV2`) are cached for a configurable length of time (default: 10hrs)
- "Recurse" checkbox (default: off) toggles fetching bucket/directory contents recursively (vs. just the immediate children of the current directory)
- Changes to above settings (as well as others, e.g. size and datetime formats) are persisted in `localStorage`
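
The TTL-based caching described above could look roughly like the following. This is a minimal sketch, not the code in this repo: `KVStore`, `parseTtlMs`, `readCached`, and `writeCached` are hypothetical names, and the `{ storedAt, value }` entry layout is an assumption.

```typescript
// Minimal storage interface so the sketch also runs outside a browser;
// in a browser, `window.localStorage` satisfies it.
interface KVStore {
    getItem(key: string): string | null
    setItem(key: string, value: string): void
    removeItem(key: string): void
}

// Hypothetical helper: parse TTL strings such as "45s", "30m", "10h", "2d" into milliseconds.
function parseTtlMs(ttl: string): number {
    const m = /^(\d+)([smhd])$/.exec(ttl)
    if (!m) throw new Error(`Unrecognized TTL: ${ttl}`)
    const unitMs: { [unit: string]: number } = { s: 1000, m: 60000, h: 3600000, d: 86400000 }
    return Number(m[1]) * unitMs[m[2]]
}

// Store a value together with the time it was written.
function writeCached<T>(store: KVStore, key: string, value: T, nowMs: number = Date.now()): void {
    store.setItem(key, JSON.stringify({ storedAt: nowMs, value }))
}

// Return the cached value if it is younger than `ttl`; otherwise evict it and return undefined,
// in which case the caller would re-fetch from S3 and call `writeCached`.
function readCached<T>(store: KVStore, key: string, ttl: string, nowMs: number = Date.now()): T | undefined {
    const raw = store.getItem(key)
    if (raw === null) return undefined
    const entry = JSON.parse(raw) as { storedAt: number, value: T }
    if (nowMs - entry.storedAt < parseTtlMs(ttl)) return entry.value
    store.removeItem(key)  // expired
    return undefined
}
```

Passing `nowMs` explicitly keeps the expiry logic easy to exercise without waiting for real time to pass.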

## Roadmap / Feature wishlist
### Configuration <a id="configuration"></a>
Various global defaults can be initialized on a per-bucket basis, in the deployed `index.html`, by modifying these values at the top of the file:
```html
<!doctype html><head><title>s3idx</title><script>// ****** s3idx config ******
// Global default config; uncomment/change lines as desired.
// Used to seed localStorage values (that then take precedence)
var S3IDX_CONFIG = {
// datetimeFmt: "YYYY-MM-DD HH:mm:ss",
// sizeFmt: "iso",
// eagerMetadata: false,
// ttl: "10h",
// pageSize: 20,
// s3PageSize: 1000,
// paginationInfoInURL: true,
// region: undefined,
}</script>
```
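
The precedence implied by "used to seed `localStorage` values (that then take precedence)" can be sketched as below. `resolveSetting` is a hypothetical helper, and the assumption that persisted values are JSON-encoded is mine; the actual app may store them differently.

```typescript
// Resolve one setting: a value already persisted in storage wins, then the
// per-bucket S3IDX_CONFIG value, then the built-in default.
function resolveSetting<T>(
    stored: string | null,       // e.g. the result of localStorage.getItem("ttl")
    configValue: T | undefined,  // e.g. S3IDX_CONFIG.ttl (undefined while commented out)
    builtinDefault: T,
): T {
    if (stored !== null) return JSON.parse(stored) as T  // assumption: JSON-encoded values
    return configValue !== undefined ? configValue : builtinDefault
}
```

For example, with nothing persisted and `ttl: "1h"` uncommented in the config, this returns `"1h"`; once a user changes the setting in the UI, the persisted value is what gets returned.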

There's probably a better way to do this, but for now, you can just edit `index.html` in a text-editor to uncomment/change the lines you want to change, or do something hacky like:
```bash
aws s3 cp s3://s3idx/index.html ./
# Uncomment the `ttl` line and change its value, editing index.html in place
perl -pi -e 's|// ttl: "10h"|ttl: "1h"|' index.html
aws s3 cp ./index.html s3://$bucket/ \
--content-type="text/html; charset=utf-8" \
--acl public-read \
--cache-control max-age=3600,public
```

### Upload args <a id="upload-args"></a>
The trailing arguments above are necessary to make sure `index.html` is:
- public
- served as UTF-8 (it contains non-ASCII characters and will error otherwise, usually stuck with a page that says "Loadingæ")
- cached at a reasonable frequency
- 1 hour, in this example
- this caching is at the HTTP level, orthogonal to the app's `localStorage`-caching of data it receives from S3 (cf. [above](#caching))

### Local development <a id="local-development"></a>
Build in one terminal (and watch for changes):
```bash
npm run dev
```
Serve from `dist/` directory:
```bash
cd dist
http-server
```
Open in browser:
```bash
open http://127.0.0.1:8080/#/$bucket
```

Only buckets that set appropriate CORS headers will be usable; others will show an error page with info/suggestions. See also [discussion of CORS nuances in the Security section](#security).

### S3 websites <a id="s3-websites"></a>
`s3idx` should work from [S3 buckets configured to serve as static sites](https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteEndpoints.html), e.g. [`http://s3idx.s3-website-us-east-1.amazonaws.com/`](http://s3idx.s3-website-us-east-1.amazonaws.com/). However, I'm not sure there are any advantages to using it that way, as opposed to on the `s3.amazonaws.com` REST API subdomain (where your browser is happy to render it as `text/html`).

### `S3Fetcher`
See [`src/s3fetcher.tsx`](./src/s3fetcher.tsx); each `S3Fetcher` handles interfacing with a specific "directory" (bucket or "prefix") on S3, fetching pages and maintaining a cache in `localStorage`, computing various summary statistics (number of children, total size, last modified time), and firing callbacks when they change.

It's pretty messy and imperative; lots of room for improvement. In particular, a [SQL.js](https://sql.js.org/#/) backend is appealing, especially for desired table-sorting/searching functionality.
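
For illustration, the summary statistics mentioned above (number of children, total size, last modified time) can be computed in a single pass. This is only a sketch: `FileEntry`, `DirSummary`, and `summarize` are hypothetical names, and the real types in `src/s3fetcher.tsx` may differ.

```typescript
// Hypothetical shapes for a listed object and a directory summary.
interface FileEntry { key: string, size: number, lastModified: Date }
interface DirSummary { numChildren: number, totalSize: number, lastModified: Date | undefined }

// Aggregate a directory's entries: count, summed size, and most recent modification time.
function summarize(files: FileEntry[]): DirSummary {
    let totalSize = 0
    let latestMs: number | undefined = undefined
    for (const f of files) {
        totalSize += f.size
        const t = f.lastModified.getTime()
        if (latestMs === undefined || t > latestMs) latestMs = t
    }
    return {
        numChildren: files.length,
        totalSize,
        // An empty directory has no meaningful last-modified time
        lastModified: latestMs === undefined ? undefined : new Date(latestMs),
    }
}
```

These totals are only accurate once every page of a directory has been fetched, which matches the note above that sizes and times are shown only for fully fetched directories.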

## Security <a id="security"></a>
Below is an informal analysis of s3idx's security assumptions and properties.

### tl;dr
- Use on **public buckets** is believed to be secure / low-risk
- Use on **private buckets** is believed to be secure, by me, but I'm not 100% positive, and I am not a security engineer. **USE ON PRIVATE DATA AT YOUR OWN RISK!**
- Access/Secret keys are submitted by the user and persisted in `localStorage`.
- `index.html` bundles everything it needs, and makes no requests to any external domains. No data leaves the browser, so it's in principle safe for use even on private buckets.
- CORS configurations on private buckets can inadvertently expose

### Public "bucket-subdomain" endpoint
In the simple case, `index.html` is deployed to a public bucket and accessed at `<bucket>.s3.amazonaws.com/index.html`. It only makes HEAD and GET requests to that domain (when it doesn't have a cached version to fall back on).

### Private buckets
s3idx's `index.html` can be used in private buckets by:
- deploying it as a publicly readable object (cf. `--acl public-read` in the installation commands)
- when a person visits it, it will call `listObjectsV2` to read the bucket's contents, receive an HTTP 403 error code (`AccessDenied`), and present the user with a form soliciting a "region" for the bucket as well as an access/secret key pair
- credentials are persisted in `localStorage`, so the user will be able to browse that bucket thereafter.

This increases the vulnerability surface in two important ways:
1. Credentials (access/secret key pair) are stored in `localStorage`.
2. A public `index.html` exists under `<bucket>.s3.amazonaws.com`

To mitigate 1., keys should always be scoped to "read" actions (`Get*`, `List*`) on the current bucket. 2. doesn't directly make any private data public, but it makes it easier for an attacker to access the user's keys (e.g. in the presence of overly permissive CORS headers).
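
As an illustration of the read-only scoping suggested above, here is a sketch that builds an IAM policy document limited to `Get*` / `List*` actions on one bucket. `readOnlyBucketPolicy` is a hypothetical helper, not part of s3idx; the document would be attached to the IAM user whose keys are entered into the form.

```typescript
// Build an IAM policy document allowing only read actions on a single bucket.
function readOnlyBucketPolicy(bucket: string) {
    return {
        Version: "2012-10-17",
        Statement: [{
            Effect: "Allow",
            Action: ["s3:Get*", "s3:List*"],
            Resource: [
                `arn:aws:s3:::${bucket}`,    // the bucket itself (e.g. for listing)
                `arn:aws:s3:::${bucket}/*`,  // objects inside the bucket
            ],
        }],
    }
}

// Example: JSON suitable for pasting into the IAM console
const policyJson = JSON.stringify(readOnlyBucketPolicy("my-private-bucket"), null, 2)
```

The bucket name `my-private-bucket` is a placeholder. Narrower action lists are possible if the wildcards grant more than you want.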

### CORS <a id="cors"></a>
The most likely security issue I can see results from overly permissive CORS headers on a private bucket with a public s3idx `index.html`.

The degree of over-permissioning required still seems quite high:
- wildcard origin (❗️)
- include credentials (‼️)

Such a CORS configuration on private or sensitive data seems to represent a serious security risk on its own, independent of s3idx, so I'm not too concerned about it.

The more concerning possibility is that I've misunderstood some detail of how CORS works, or that I've missed some attack vector, which is very possible. Again, USE ON PRIVATE DATA AT YOUR OWN RISK! And feel free to [file an issue](https://github.com/runsascoded/s3idx/issues/new) to discuss any of this further.

### "Bucket-path" endpoints
Another security consideration relates to S3 "bucket-path" REST API endpoints of the form `s3.amazonaws.com/<bucket>` (as opposed to the "bucket-subdomain" endpoints s3idx typically uses; example: [`s3.amazonaws.com/s3idx/index.html`](https://s3.amazonaws.com/s3idx/index.html)).

Bucket-path endpoints are generally less secure than bucket-subdomain endpoints. For example, if a user can be tricked into visiting `s3.amazonaws.com/malicious-bucket/index.html`, scripts there can read (and write!) s3idx state about other buckets. For public buckets, this isn't a big problem (afaict!), but with private buckets it's a huge issue.

For this reason, s3idx redirects users to bucket-subdomain endpoints, *except in one case*: when a bucket name contains a `.`, `<bucket>.s3.amazonaws.com` seems to exhibit HTTPS errors, so s3idx allows use of the `s3.amazonaws.com/<bucket>` form (and uses that endpoint for S3 API requests). Here's an example: [`s3.amazonaws.com/ctbk.dev/index.html`](https://s3.amazonaws.com/ctbk.dev/index.html).

So, a private bucket with a dot (`.`) in the name is a bit stuck:
- bucket-subdomain endpoint suffers HTTPS errors
- bucket-path endpoint risks a credential leak

To mitigate this, the access/secret key input fields are disabled on bucket-path endpoints.
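
The endpoint rules described in this section can be summarized in a small decision function. This is a simplified sketch of the behavior, not the routing code in `src/index.tsx`: `resolveEndpoint`, the `Endpoint` type, and the exact hostname regex are assumptions made for illustration.

```typescript
type Endpoint =
    | { kind: "subdomain", bucket: string }                          // <bucket>.s3.amazonaws.com
    | { kind: "path", bucket: string, credentialsAllowed: false }    // s3.amazonaws.com/<bucket>, dotted names only
    | { kind: "redirect", url: string }                              // bucket-path form that should be redirected
    | { kind: "unknown" }                                            // not an S3 hostname, or no bucket found

function resolveEndpoint(hostname: string, pathname: string): Endpoint {
    // Group 1 is the optional bucket subdomain; `s3`, `s3-website`, and an
    // optional region segment are all accepted.
    const m = /^(?:(.+)\.)?s3(?:-website)?(?:[.-][^.]+)?\.amazonaws\.com$/.exec(hostname)
    if (!m) return { kind: "unknown" }
    if (m[1]) return { kind: "subdomain", bucket: m[1] }

    // Bucket-path form: the bucket is the first path segment
    const bucket = pathname.replace(/^\//, "").split("/")[0]
    if (!bucket) return { kind: "unknown" }

    // Dotted bucket names hit HTTPS certificate errors on the subdomain form,
    // so they stay on the path form with credential entry disabled.
    if (bucket.includes(".")) return { kind: "path", bucket, credentialsAllowed: false }

    // Everything else is sent to the bucket-subdomain form
    return { kind: "redirect", url: `https://${bucket}.s3.amazonaws.com/index.html` }
}
```

A caller would act on `redirect` with `window.location.assign(url)` and use `credentialsAllowed` to decide whether to enable the access/secret key inputs.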

## Roadmap / Feature wishlist <a id="roadmap"></a>
TODO: make these GitHub issues

### Caching
@@ -56,9 +182,13 @@ TODO: make these GitHub issues
- [ ] region/credentials per bucket

### Misc
- [ ] check both `<bucket>.s3.amazonaws.com/…` and `s3.amazonaws.com/<bucket>/…` URL forms
- [ ] audit/reduce bundle size
- [ ] treemap view
- [ ] DEP0005 deprecation warning during `npm run build`
- [ ] better/structured logging
- [ ] support blob download
- [ ] support click-to-copy paths to clipboard
- [ ] source documentation
- [ ] create Lambda that compiles `index.html` with various default configs set (or e.g. `sql.js` mode)
- [ ] support deploying to a subdirectory within a bucket
- [ ] configurable S3 endpoint
30 changes: 15 additions & 15 deletions index.html
@@ -1,20 +1,20 @@
<!DOCTYPE html>
<head>
<title>S3 Tree</title>
<script>
// Global default config; uncomment/change lines as desired.
// Used to seed localStorage values (that then take precedence)
var S3IDX_CONFIG = {
// datetimeFmt: "YYYY-MM-DD HH:mm:ss",
// sizeFmt: "iso",
// eagerMetadata: false,
// ttl: "10h",
// pageSize: 20,
// s3PageSize: 1000,
// paginationInfoInURL: true,
// region: undefined,
}
</script>
<title>s3idx</title>
<script>// ****** s3idx config ******
// Global default config; uncomment/change lines as desired.
// Used to seed localStorage values (that then take precedence)
var S3IDX_CONFIG = {
// datetimeFmt: "YYYY-MM-DD HH:mm:ss",
// sizeFmt: "iso",
// eagerMetadata: false,
// ttl: "10h",
// pageSize: 20,
// s3PageSize: 1000,
// paginationInfoInURL: true,
// region: undefined,
}
</script>
</head>
<body>
<div id="root">Loading…</div>
8 changes: 5 additions & 3 deletions src/github-link.tsx
@@ -1,16 +1,18 @@
import styled from "styled-components";
import React from "react";

const repoUrl = "https://github.com/runsascoded/s3idx/issues"
export const repoUrl = "https://github.com/runsascoded/s3idx"
export const issuesUrl = `${repoUrl}/issues`

const githubLogo = "iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAAyRpVFh0WE1MOmNvbS5hZG9iZS54bXAAAAAAADw/eHBhY2tldCBiZWdpbj0i77u/IiBpZD0iVzVNME1wQ2VoaUh6cmVTek5UY3prYzlkIj8+IDx4OnhtcG1ldGEgeG1sbnM6eD0iYWRvYmU6bnM6bWV0YS8iIHg6eG1wdGs9IkFkb2JlIFhNUCBDb3JlIDUuMy1jMDExIDY2LjE0NTY2MSwgMjAxMi8wMi8wNi0xNDo1NjoyNyAgICAgICAgIj4gPHJkZjpSREYgeG1sbnM6cmRmPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5LzAyLzIyLXJkZi1zeW50YXgtbnMjIj4gPHJkZjpEZXNjcmlwdGlvbiByZGY6YWJvdXQ9IiIgeG1sbnM6eG1wPSJodHRwOi8vbnMuYWRvYmUuY29tL3hhcC8xLjAvIiB4bWxuczp4bXBNTT0iaHR0cDovL25zLmFkb2JlLmNvbS94YXAvMS4wL21tLyIgeG1sbnM6c3RSZWY9Imh0dHA6Ly9ucy5hZG9iZS5jb20veGFwLzEuMC9zVHlwZS9SZXNvdXJjZVJlZiMiIHhtcDpDcmVhdG9yVG9vbD0iQWRvYmUgUGhvdG9zaG9wIENTNiAoTWFjaW50b3NoKSIgeG1wTU06SW5zdGFuY2VJRD0ieG1wLmlpZDpFNTE3OEEyQTk5QTAxMUUyOUExNUJDMTA0NkE4OTA0RCIgeG1wTU06RG9jdW1lbnRJRD0ieG1wLmRpZDpFNTE3OEEyQjk5QTAxMUUyOUExNUJDMTA0NkE4OTA0RCI+IDx4bXBNTTpEZXJpdmVkRnJvbSBzdFJlZjppbnN0YW5jZUlEPSJ4bXAuaWlkOkU1MTc4QTI4OTlBMDExRTI5QTE1QkMxMDQ2QTg5MDREIiBzdFJlZjpkb2N1bWVudElEPSJ4bXAuZGlkOkU1MTc4QTI5OTlBMDExRTI5QTE1QkMxMDQ2QTg5MDREIi8+IDwvcmRmOkRlc2NyaXB0aW9uPiA8L3JkZjpSREY+IDwveDp4bXBtZXRhPiA8P3hwYWNrZXQgZW5kPSJyIj8+m4QGuQAAAyRJREFUeNrEl21ojWEYx895TDPbMNlBK46IUiNmPvHBSUjaqc0H8pF5+aDUKPEBqU2NhRQpX5Rv5jWlDIWlMCv7MMSWsWwmb3tpXub4XXWdPHvc9/Gc41nu+nedc7/8r/99PffLdYdDPsvkwsgkTBwsA/PADJCnzX2gHTwBt8Hl7p537/3whn04XoDZDcpBlk+9P8AFcAghzRkJwPF4zGGw0Y9QS0mAM2AnQj77FqCzrtcwB1Hk81SYojHK4DyGuQ6mhIIrBWB9Xm7ug/6B/nZrBHBegrkFxoVGpnwBMSLR9EcEcC4qb8pP14BWcBcUgewMnF3T34VqhWMFkThLJAalwnENOAKiHpJq1FZgI2AT6HZtuxZwR9GidSHtI30jOrbawxlVX78/AbNfhHlomEUJJI89O2MqeE79T8/nk8nMBm/dK576hZgmA3cp/R4l9/UeSxiHLVIlNm4nFfT0bxyuIj7LHRTKai+zdJobwMKzcZSJb0ePV5PKN+BqAAKE47UlMnERELMM3EdYP/yrd+XYb2mOiYBiQ8OQnoRBlXrl9JZix7D1pHTazu4MoyBcnYamqAjIMTR8G4FT8LuhLsexXYYjICBiqhQBvYb6fLZIJCjPypVvaOoVAW2WcasCnL2Nq82xHJNSqlCeFcDshaPK0twkAhosjZL31QYw+1rlMpWGMArl23SBsZZO58F2tlJXmjOXS+s4WGvpMiBJT/I2PInZ6lIs9/hBsNS1hS6BG0DSqmYEDRlCXQrmy50P1oDRKTSegmNbUsA0zDMwRhPJXeCE
3vWLPQMvan6X8AgIa1vcR4AkGZkDR4ejJ1UHpsaVI0g2LInpOsNFUud1rhxSV+fzC9Woz2EZkWQuja7/B+jUrgtIMpy9YCW4n4K41YfzRneW5E1KJTe4B2Zq1Q5EHEtj4U3AfEzR5SVY4l7QYQPJdN2as7RKBF0BPZqqH4VgMAMBL8Byxr7y8zCZiDlnOcEKIPmUpgB5Z2ww5RdOiiRiNajUmWda5IG6WbhsyY2fx6m8gLcoJDJFkH219M3We1+cnda93pfycZpIJEL/s/wSYADmOAwAQgdpBAAAAABJRU5ErkJggg=="

const GithubLink = styled.a`
export const GithubLink = styled.a`
margin-left: 1rem;
margin-top: 0.4rem;
`

export function GithubIssuesLink() {
return <GithubLink href={repoUrl}>
return <GithubLink href={issuesUrl}>
<img src={`data:image/png;base64,${githubLogo}`}/>
</GithubLink>
}
43 changes: 27 additions & 16 deletions src/index.tsx
@@ -11,30 +11,41 @@ import {RouteAdapter} from "./route-adapter";
function Router() {
console.log("Router rendering")
const { hostname, pathname, } = window.location
const rgx1 = /(?<bucket>.*)\.s3(-(?<region>[^.]+))?\.amazonaws\.com$/
const rgx2 = /(?<bucket>.*)\.s3-website(-(?<region>[^.]+))?\.amazonaws\.com$/
const match = hostname.match(rgx1) || hostname.match(rgx2)
const rgx = /((?<bucket>.*)\.)?s3(-website)?(\.(?<region>[^.]+))?\.amazonaws\.com$/
let endpoint = ''
const match = hostname.match(rgx)
let { bucket } = match?.groups || {}
let prefix
if (bucket) {
prefix = pathname.replace(/\/.*?$/, '')
console.log(`Parsed bucket ${bucket}, prefix ${prefix} from URL hostname ${hostname} / pathname ${pathname}`)
} else {
const rgx = /s3(\.(?<region>[^.]+))?\.amazonaws\.com$/
if (hostname.match(rgx)) {
const pieces = pathname.replace(/^\//, '').split('/')
bucket = pieces[0]
if (bucket) {
prefix = pieces.slice(1, pieces.length - 1).join('/')
console.log(`Parsed bucket ${bucket}, prefix ${prefix} from URL pathname ${pathname}`)
let pathPrefix
if (match) {
if (bucket) {
pathPrefix = pathname.replace(/\/.*?$/, '')
console.log(`Parsed bucket ${bucket}, prefix ${pathPrefix} from URL hostname ${hostname}, pathname ${pathname}`)
} else {
const rgx = /s3(\.(?<region>[^.]+))?\.amazonaws\.com$/ // TODO: factor with s3tree
if (hostname.match(rgx)) {
const pieces = pathname.replace(/^\//, '').split('/')
bucket = pieces[0]
if (bucket) {
// Redirect URLs of the form `s3.amazonaws.com/<bucket>` to `<bucket>.s3.amazonaws.com`, for
// security reasons
if (!bucket.includes('.')) {
// One exception is buckets that have a dot (`.`) in their name, for which S3's default HTTPS
// certificate setup doesn't work correctly at `<bucket>.s3.amazonaws.com`, and so are better
// viewed at `s3.amazonaws.com/<bucket>`.
const newUrl = `https://${bucket}.s3.amazonaws.com/index.html`
console.log(`Redirecting to ${newUrl}`)
window.location.assign(newUrl)
return null
}
}
}
}
}
return (
<HashRouter>
<QueryParamProvider ReactRouterRoute={RouteAdapter}>
<Routes>
<Route path="/*" element={<S3Tree bucket={bucket} prefix={prefix} />} />
<Route path="/*" element={<S3Tree bucket={bucket} pathPrefix={pathPrefix} endpoint={endpoint} />} />
</Routes>
</QueryParamProvider>
</HashRouter>