Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support S3-compatible blob storage #1071

Merged
merged 760 commits into from
Sep 13, 2024
Merged
Show file tree
Hide file tree
Changes from 250 commits
Commits
Show all changes
760 commits
Select commit Hold shift + click to select a range
7f10c97
Add TODOs
Jun 21, 2024
f4368ee
Add TODO
Jun 21, 2024
a236c1f
Add TODOs
Jun 21, 2024
ca4a05a
Add TODO
Jun 21, 2024
58c536f
Add TODO
Jun 21, 2024
96c8e2e
add more TODO
Jun 21, 2024
3346494
stream content into blobs
Jun 24, 2024
d69f948
s3-mock: use Map
Jun 24, 2024
c4d16aa
fix streamBriefcaseCsvs()
Jun 24, 2024
792eb18
try reverting more changes
Jun 24, 2024
8929347
reduce changeset
Jun 24, 2024
cc34846
remove debug
Jun 24, 2024
4542666
streamClientAudits() - minimise changes
Jun 24, 2024
90559c0
Drop unnecessary props
Jun 24, 2024
63a9709
reduce changes
Jun 24, 2024
96c75f7
fix copyright year
Jun 24, 2024
84afe86
fix?
Jun 24, 2024
c2325e9
Move Blobs._ensureWithStatus() into test file
Jun 26, 2024
28bf1ab
Makefile: split fake-s3-server into 2 versions
Jun 27, 2024
a967f4f
config: move to default.external.s3blobStore
Jun 27, 2024
f156bb6
Remove resolved FIXME
Jun 27, 2024
b8d334c
Test encrypted submission fetching from s3
Jun 27, 2024
2a8ff47
remove resolved TODO
Jun 27, 2024
92f7d16
Update TODOs
Jun 27, 2024
0c5e7e9
Resolve FIXME
Jun 27, 2024
d7a40e4
blobs
Jun 27, 2024
11c9353
Remove call to non-existent fn
Jun 27, 2024
40d34f3
tone down TODO
Jun 27, 2024
18a5335
re-org and cull TODOs
Jun 27, 2024
025bb61
Add test case; resolve TODO
Jun 27, 2024
24fbd21
Add form test
Jun 27, 2024
7526560
Update TODOs
Jun 27, 2024
65dd842
lint
Jun 27, 2024
080b9a2
.only
Jun 27, 2024
26f28c9
fix location check
Jun 27, 2024
89fcec4
delete done TODO
Jun 27, 2024
df29a06
Add test for changed attachemnt
Jun 27, 2024
910b3f8
Add TODOs
Jun 27, 2024
57cbc5a
More TODOs from github comments
Jun 27, 2024
bc89109
more TODO structure
Jun 27, 2024
8b0f955
re-order TODOs
Jun 27, 2024
9cc42c1
async
Jun 27, 2024
da81754
remove TODO
Jun 27, 2024
c1ba695
e2e: make cli() async
Jun 27, 2024
a8bfa45
e2e: move TODOs to failing tests
Jun 27, 2024
9e7db67
s3/run-tests: clean up better
Jul 1, 2024
4e60a92
Check if server already running & kill safer
Jul 1, 2024
31a8bc2
Clean up fully
Jul 1, 2024
920db9b
Add new test case
Jul 1, 2024
462b606
Another test
Jul 1, 2024
4e03bb1
make final test work
Jul 1, 2024
ca8fd8d
Prepare for multiple tests
Jul 3, 2024
b2f2710
Add 4 to every 4-attachment
Jul 3, 2024
22fb07b
twp tests
Jul 3, 2024
7351e9d
cleanup after each test
Jul 3, 2024
279ec5a
TODO: add more info to crontab config
Jul 25, 2024
238a5f1
lint
Aug 5, 2024
b352c70
lint
Aug 5, 2024
73db3fe
Make failure easier to read
Aug 5, 2024
5321a9b
re-add debug
Aug 5, 2024
5190c2d
prevent simultaneous update of same blob
Aug 5, 2024
2e3e3f7
expectFailure()
Aug 5, 2024
4dd7f81
await & log sleeping
Aug 5, 2024
e081a45
Log pending uploads before beginning uploads
Aug 5, 2024
8d10fc1
laxer
Aug 5, 2024
cebcfb9
Initial uploaded
Aug 5, 2024
1acbf8b
Correct count
Aug 5, 2024
5c32a76
e2e: expect upload to be stuck in-progress
Aug 5, 2024
a10534f
pipefail
Aug 5, 2024
cc8d062
+e
Aug 5, 2024
42fe3b4
Make things more flexible
Aug 5, 2024
ec45da5
form-4: remove unnecessary attachments
Aug 5, 2024
08ce51b
Fix first test?
Aug 5, 2024
e1d3be6
Remove row-locks
Aug 5, 2024
e778cca
Revert "Remove row-locks"
Aug 5, 2024
1d16d75
Revert "Fix first test?"
Aug 5, 2024
e7986a2
fix uploaded expectation
Aug 5, 2024
452bec4
Revert "Revert "Fix first test?""
Aug 5, 2024
8723f7f
Revert "Revert "Remove row-locks""
Aug 5, 2024
9b7456f
Revert "Revert "Revert "Fix first test?"""
Aug 5, 2024
891b9fc
Revert "Revert "Revert "Revert "Fix first test?""""
Aug 5, 2024
2e6b4aa
remove pointless attachments from form 2
Aug 5, 2024
9899b9a
Revert "Revert "Revert "Remove row-locks"""
Aug 5, 2024
07cc1ad
simplify wait loop
Aug 5, 2024
930805d
remove debug
Aug 5, 2024
ebf0626
Merge branch 'master' into s3-blob-storage-wip
Aug 5, 2024
cc56f29
TODO: more notes
Aug 5, 2024
1c7f25e
manual transaction management
Aug 5, 2024
e93ef3f
Revert "manual transaction management"
Aug 6, 2024
d8cb870
resolve TODO in lib/task/s3
Aug 6, 2024
ea35218
remove isTesting param
Aug 6, 2024
9433543
Add TODOs
Aug 15, 2024
8b96366
Fix test cases; add new ideas
Aug 15, 2024
fb6256d
possible approach to transaction-wrap for upload
Aug 15, 2024
9059267
lint
Aug 15, 2024
7e6d06b
Revert uploadBlobIfAvailable()
Aug 16, 2024
9c574c9
Extract countByStatus();
Aug 16, 2024
c3f264a
Fix the test?
Aug 16, 2024
0916f01
Extract untilUploadInProgress() fn
Aug 16, 2024
a40bdcb
Add test #5 (pending)Add test #5 (pending)Add test #5 (pending)Add te…
Aug 16, 2024
707809a
Rename expectRejectionFrom()
Aug 16, 2024
695be70
countAllByStatus(): convert to numbers
Aug 16, 2024
242852f
numbers not strings
Aug 16, 2024
ec55e87
Add final test killing minio
Aug 16, 2024
c4a2c91
add sleep
Aug 16, 2024
063c99c
assertBlobStatuses()
Aug 16, 2024
7f5a1c9
only
Aug 16, 2024
24f71ea
add setInProgressToPending
Aug 16, 2024
87f3bff
fix s3SetInProgressToPending()
Aug 16, 2024
67672fe
Add TODO
Aug 16, 2024
536b3b7
track when minio terminated
Aug 16, 2024
c37a895
not just only
Aug 16, 2024
7a9ff63
resolve TODOs
Aug 16, 2024
e0cf527
try more docker-centric approach
Aug 16, 2024
2ef4118
new approach?
Aug 16, 2024
cb8c3f4
fake in_progress count
Aug 17, 2024
df011cc
enable SIGTERM test
Aug 17, 2024
be3c7c6
update expectations
Aug 17, 2024
264750f
Expect success
Aug 17, 2024
e931572
docker debug
Aug 17, 2024
ab47ab4
update SIGTERM expectations
Aug 17, 2024
2a005b7
remove debug
Aug 17, 2024
5b3b24b
nicer debug
Aug 17, 2024
7313891
docker debug 1 & 2
Aug 17, 2024
accd6eb
awk
Aug 17, 2024
974fc3c
add TODO
Aug 17, 2024
c9f7473
match
Aug 17, 2024
23d8b7e
Add TODO
Aug 17, 2024
3c173d1
better match
Aug 17, 2024
a6e12b2
fix eregex
Aug 17, 2024
8c1db35
Safer regex
Aug 17, 2024
a0134b3
Add FIXME
Aug 17, 2024
46237ba
lint
Aug 17, 2024
f46c9cf
remove in_progress explicit status
Aug 17, 2024
73dc0f0
remove more refs to setInProgressToPending()
Aug 17, 2024
20fe7a4
Fix error bubbling?
Aug 17, 2024
e3b2eb4
Fix integration test
Aug 17, 2024
26dcb3f
lint
Aug 17, 2024
0da8fea
remove bad command
Aug 17, 2024
9e0a0e3
no big
Aug 17, 2024
1c90d26
fix test
Aug 17, 2024
3d2e453
more simpler assertions
Aug 17, 2024
ac4d9f1
restreict args
Aug 17, 2024
a6a3041
remove no-op
Aug 17, 2024
e3aa773
Avoid timeout
Aug 17, 2024
59ccda0
smaller timeout
Aug 17, 2024
0a6a354
make regex moire reliable
Aug 20, 2024
5ac294c
DEBUG: add timings to uploadFormWithAttachments()
Aug 20, 2024
b88444a
Add comment and another error type
Aug 20, 2024
9d1004f
Resolve FIXME
Aug 20, 2024
cb655ca
Revert "DEBUG: add timings to uploadFormWithAttachments()"
Aug 20, 2024
bf30e4e
Try decrease TIMEOUT
Aug 20, 2024
2dea757
decrease TIMEOUT even more
Aug 20, 2024
836cb31
increase timeout
Aug 20, 2024
acb0565
big number notation
Aug 20, 2024
4398676
TIMEOUT: 60s
Aug 20, 2024
d6caffe
timeout: 120s
Aug 20, 2024
a55832c
Add TODOs, remove brackets
Aug 20, 2024
9cd3fbd
cleanup
Aug 20, 2024
f893030
lint
Aug 20, 2024
aefd213
lint2
Aug 20, 2024
bdbfd36
e2e: add SIGINT test
Aug 20, 2024
4c2d202
Remove resolved TODO
Aug 20, 2024
b4f87b5
increase timeout; sigint=2
Aug 21, 2024
bac3d4a
add TODO
Aug 21, 2024
ac214cc
Make 3.xml unique
Aug 22, 2024
44e9f67
wip
Aug 22, 2024
9dcd901
remove .only
Aug 22, 2024
5c7de0a
remove logging
Aug 22, 2024
b5bd8cf
remove logging
Aug 22, 2024
d70b86c
lint
Aug 22, 2024
956ce2f
Re-intro corect output
Aug 22, 2024
18a3083
remove logging
Aug 22, 2024
62b0709
tone down logging
Aug 22, 2024
888f75a
tidy up
Aug 22, 2024
c1b7fe0
remove logging; resolve TODO
Aug 22, 2024
8f687e6
restructure TODOs
Aug 22, 2024
b0f7d0e
TODO -> FIXME
Aug 22, 2024
d4d57bf
resolve TODO
Aug 22, 2024
9b230d4
lower timeout
Aug 22, 2024
590a456
multiple bigfiles for test7
Aug 22, 2024
2490138
double tiny
Aug 22, 2024
85cf814
fix filenames
Aug 22, 2024
eeb4114
remove TODO
Aug 22, 2024
e7209a5
resolve TODO
Aug 22, 2024
4fc3e41
await before sleep()
Aug 23, 2024
c2e932d
3: fix big bin name
Aug 23, 2024
5fc65eb
7: fix wait code
Aug 23, 2024
63e3130
lint
Aug 23, 2024
a7f4442
recursive
Aug 23, 2024
4e4ef4a
no sleep
Aug 23, 2024
69c1411
2: fix form ID
Aug 23, 2024
b143d19
Add FIXMEs
Aug 23, 2024
acadfc1
Enable encryption
Aug 23, 2024
6a15168
temporary fix - not handling differing etags yet
Aug 23, 2024
773ad92
new assertion
Aug 23, 2024
6109e43
catch unexpected success
Aug 23, 2024
4139b1d
special pending count
Aug 23, 2024
e467898
Add missing awaits
Aug 24, 2024
6161eb6
postgres: trim log line lengths
Aug 24, 2024
37b82f4
Count unlocked rows without locking them
Aug 24, 2024
b0820a9
fix pending count
Aug 24, 2024
7d0f703
Add another expected error
Aug 24, 2024
8c38514
lint
Aug 24, 2024
d3cde93
lint - don't await in promise
Aug 24, 2024
5d8d95f
Update etag handling
Aug 26, 2024
de6adce
Update RFC quote
Aug 26, 2024
77b00a5
Add new ETag test
Aug 26, 2024
1eca836
Fix tests for new etag handling
Aug 26, 2024
76cdcb3
FIXME -> TODO
Aug 26, 2024
87d482e
remove etag handling from blob upload
Aug 26, 2024
2c4b9f0
add TODO item
Aug 26, 2024
1324ada
add link to blocking issue
Aug 26, 2024
ffe101f
remove TODO
Aug 26, 2024
08b4f6e
Add test TODO
Aug 26, 2024
ed5695a
simplify purge code
Aug 26, 2024
2639f5a
change object name to include blob id
Aug 27, 2024
3e3d856
lint
Aug 27, 2024
5ad1944
Add check for null/undefined id
Aug 27, 2024
3d97ddf
atomic purge?
Aug 27, 2024
452ca71
notes
Aug 27, 2024
ef52d1c
purge optimisation
Aug 27, 2024
18261a4
Fix s3 mock
Aug 27, 2024
fef1d25
remove console.log
Aug 27, 2024
e5f8c8f
Include sha in object URL
Aug 27, 2024
b1b0cdd
return sha!
Aug 27, 2024
077b509
blobIds
Aug 27, 2024
627928c
lint
Aug 27, 2024
0114300
Fix blob streaming
Aug 27, 2024
45da0a6
blobId
Aug 27, 2024
3cd892b
Merge branch 'master' into s3-blob-storage-wip
Aug 27, 2024
cf80876
reintro worker queue
Aug 29, 2024
cc3805f
external/s3: move require()s to top
Aug 29, 2024
c068356
migration: drop extension on unmigrate
Aug 29, 2024
cd44b5e
Update comment
Aug 29, 2024
6a7a9e0
re-order vars to make comparison of fns easier
Aug 29, 2024
aeaaf14
revert whitespace change
Aug 29, 2024
221f081
clarify test explanation
Aug 29, 2024
d03a98f
comment
Aug 30, 2024
116d48b
Add option: objectPrefix
Aug 30, 2024
e35646a
Update TODOs
Sep 3, 2024
d569f2c
Update TODOs
Sep 3, 2024
b8aae1c
try include stack in Error
Sep 11, 2024
f9e3b69
remove TODO file
Sep 11, 2024
f2b732d
Makefile: revert changes
Sep 11, 2024
7c91aba
Merge branch 'master' into s3-blob-storage-wip
Sep 11, 2024
e391c4d
Remove unnecessary linter exception
Sep 13, 2024
d9bb2db
Reswet package-lock
Sep 13, 2024
94bc690
update migration name
Sep 13, 2024
ce22192
remove down migration
Sep 13, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
54 changes: 54 additions & 0 deletions .github/workflows/s3-e2e.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
name: S3 E2E Tests

on: push

env:
LOG_LEVEL: DEBUG
alxndrsn marked this conversation as resolved.
Show resolved Hide resolved

jobs:
s3-e2e:
timeout-minutes: 15
# TODO should we use the same container as circle & central?
runs-on: ubuntu-latest
services:
# see: https://docs.github.com/en/enterprise-server@3.5/actions/using-containerized-services/creating-postgresql-service-containers
postgres:
image: postgres:14.10
env:
POSTGRES_PASSWORD: odktest
ports:
- 5432:5432
# Set health checks to wait until postgres has started
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
minio:
# see: https://github.com/minio/minio/discussions/16099
image: minio/minio:edge-cicd
env:
MINIO_ROOT_USER: odk-central-dev
MINIO_ROOT_PASSWORD: topSecret123
ports:
- 9000:9000
options: >-
--health-cmd "curl -s http://localhost:9000/minio/health/live"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- name: Use Node.js 20
uses: actions/setup-node@v4
with:
node-version: 20.10.0
cache: 'npm'
- run: npm ci --legacy-peer-deps
- run: node lib/bin/create-docker-databases.js
- name: E2E Test
timeout-minutes: 10
run: ./test/e2e/s3/ci
- name: Backend Logs
if: always()
run: "! [[ -f ./backend.log ]] || cat ./backend.log"
alxndrsn marked this conversation as resolved.
Show resolved Hide resolved
46 changes: 46 additions & 0 deletions .github/workflows/s3-integration.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
name: S3 integration tests

on: push

jobs:
s3-integration-test:
timeout-minutes: 4
# TODO should we use the same container as circle & central?
runs-on: ubuntu-latest
services:
# see: https://docs.github.com/en/enterprise-server@3.5/actions/using-containerized-services/creating-postgresql-service-containers
postgres:
image: postgres:14.10
env:
POSTGRES_PASSWORD: odktest
ports:
- 5432:5432
# Set health checks to wait until postgres has started
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
minio:
# see: https://github.com/minio/minio/discussions/16099
image: minio/minio:edge-cicd
env:
MINIO_ROOT_USER: odk-central-dev
MINIO_ROOT_PASSWORD: topSecret123
ports:
- 9000:9000
options: >-
--health-cmd "curl -s http://localhost:9000/minio/health/live"
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- name: Use Node.js 20
uses: actions/setup-node@v4
with:
node-version: 20.10.0
cache: 'npm'
- run: npm ci --legacy-peer-deps
- run: node lib/bin/create-docker-databases.js
- run: make test-s3-integration
23 changes: 23 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ node_modules: package.json
test-oidc-integration: node_version
TEST_AUTH=oidc NODE_CONFIG_ENV=oidc-integration-test make test-integration

.PHONY: test-s3-integration
test-s3-integration: fake-s3-accounts
TEST_S3=true NODE_CONFIG_ENV=s3-dev make test-integration

.PHONY: test-oidc-e2e
test-oidc-e2e: node_version
cd test/e2e/oidc && \
Expand All @@ -19,6 +23,14 @@ test-oidc-e2e: node_version
dev-oidc: base
NODE_CONFIG_ENV=oidc-development npx nodemon --watch lib --watch config lib/bin/run-server.js

.PHONY: fake-s3-accounts
fake-s3-accounts: node_version
NODE_CONFIG_ENV=s3-dev node lib/bin/s3-create-bucket.js

.PHONY: dev-s3
dev-s3: fake-s3-accounts base
NODE_CONFIG_ENV=s3-dev npx nodemon --watch lib --watch config lib/bin/run-server.js

alxndrsn marked this conversation as resolved.
Show resolved Hide resolved
.PHONY: fake-oidc-server
fake-oidc-server:
cd test/e2e/oidc/fake-oidc-server && \
Expand All @@ -31,6 +43,17 @@ fake-oidc-server-ci:
npm clean-install && \
FAKE_OIDC_ROOT_URL=http://localhost:9898 node index.js

.PHONY: fake-s3-server
fake-s3-server:
alxndrsn marked this conversation as resolved.
Show resolved Hide resolved
# run an ephemeral, s3-compatible local store
# default admin credentials: minioadmin:minioadmin
# see: https://hub.docker.com/r/minio/minio/
docker run --rm \
-p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=odk-central-dev \
-e MINIO_ROOT_PASSWORD=topSecret123 \
minio/minio server /data --console-address ":9001"

.PHONY: node_version
node_version: node_modules
node lib/bin/enforce-node-version.js
Expand Down
11 changes: 11 additions & 0 deletions config/s3-dev.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
{
"default": {
"s3blobStore": {
"server": "http://localhost:9000",
"accessKey": "odk-central-dev",
"secretKey": "topSecret123",
"bucketName": "odk-central-bucket",
"requestTimeout": 60000
}
}
}
3 changes: 3 additions & 0 deletions lib/bin/run-server.js
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,9 @@ const server = service.listen(config.get('default.server.port'), () => {
const { workerQueue } = require('../worker/worker');
workerQueue(container).loops(4);

const { s3worker } = require('../util/s3');
s3worker.start(container);


////////////////////////////////////////////////////////////////////////////////
// CLEANUP
Expand Down
39 changes: 39 additions & 0 deletions lib/bin/s3-create-bucket.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
// Copyright 2024 ODK Central Developers
// See the NOTICE file at the top-level directory of this distribution and at
// https://github.com/getodk/central-backend/blob/master/NOTICE.
// This file is part of ODK Central. It is subject to the license terms in
// the LICENSE file found in the top-level directory of this distribution and at
// https://www.apache.org/licenses/LICENSE-2.0. No part of ODK Central,
// including this file, may be copied, modified, propagated, or distributed
// except according to the terms contained in the LICENSE file.

ktuite marked this conversation as resolved.
Show resolved Hide resolved
const Minio = require('minio');

const { server, bucketName, accessKey, secretKey } = require('config').get('default').s3blobStore;

alxndrsn marked this conversation as resolved.
Show resolved Hide resolved
const minioClient = (() => {
const url = new URL(server);
const useSSL = url.protocol === 'https:';
const endPoint = (url.hostname + url.pathname).replace(/\/$/, '');
const port = parseInt(url.port, 10);

return new Minio.Client({ endPoint, port, useSSL, accessKey, secretKey });
})();

const log = (...args) => console.log(__filename, ...args);

minioClient.bucketExists(bucketName)
.then(exists => {
if (exists) {
log('Bucket already exists.');
return;
}

log('Creating bucket:', bucketName);
return minioClient.makeBucket(bucketName)
.then(() => log('Bucket created OK.'));
})
.catch(err => {
log('ERROR CREATING MINIO BUCKET:', err);
process.exit(1);
});
59 changes: 59 additions & 0 deletions lib/bin/s3.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
// Copyright 2024 ODK Central Developers
// See the NOTICE file at the top-level directory of this distribution and at
// https://github.com/getodk/central-backend/blob/master/NOTICE.
// This file is part of ODK Central. It is subject to the license terms in
// the LICENSE file found in the top-level directory of this distribution and at
// https://www.apache.org/licenses/LICENSE-2.0. No part of ODK Central,
// including this file, may be copied, modified, propagated, or distributed
// except according to the terms contained in the LICENSE file.

/* eslint-disable no-use-before-define */

const config = require('config');
const { program } = require('commander');
const { sql } = require('slonik');

const { slonikPool } = require('../external/slonik');
const { exhaustBlobs } = require('../util/s3');

program.command('count-failed-uploads').action(countFailed);
program.command('reset-failed-as-pending').action(setFailedToPending);
program.command('upload-pending').action(uploadPending);
program.parse();

function countFailed() {
withDb(async (db) => console.log('Failed uploads:', await getFailedCount(db)));
}

function setFailedToPending() {
withDb(async (db) => {
const count = await db.oneFirst(sql`
WITH updated AS (
UPDATE blobs SET s3_status='pending' WHERE s3_status='failed' RETURNING 1
)
SELECT COUNT(*) FROM updated
`);
console.log(count, 'blobs marked for re-uploading.');
});
}

function uploadPending() {
withDb(async (db) => {
console.log('Uploading', await getFailedCount(db), 'blobs...');
await exhaustBlobs({ db });
console.log('Upload completed.');
});
}

function getFailedCount(db) {
return db.oneFirst(sql`SELECT COUNT(*) FROM blobs WHERE s3_status='failed'`);
}

async function withDb(fn) {
const db = slonikPool(config.get('default.database'));
try {
await fn(db);
} finally {
await db.end();
}
}
25 changes: 18 additions & 7 deletions lib/data/attachments.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ const { compose, identity } = require('ramda');
const { Writable, pipeline } = require('stream');
const { rejectIfError } = require('../util/promise');
const { zipPart } = require('../util/zip');
const s3 = require('../util/s3');
const sanitize = require('sanitize-filename');

// encrypted files have a .enc extension that needs to be stripped. we will only
Expand All @@ -33,14 +34,24 @@ const streamAttachments = (inStream, decryptor) => {
write(x, _, done) {
const att = x.row;

// this sanitization means that two filenames could end up identical.
// luckily, this is not actually illegal in the zip spec; two files can live at precisely
// the same location, and the conflict is dealt with interactively by the unzipping client.
const content = (att.localKey == null)
? att.content
: decryptor(att.content, att.keyId, att.localKey, att.instanceId, att.index);
const appendContent = (rawContent) => {
// this sanitization means that two filenames could end up identical.
// luckily, this is not actually illegal in the zip spec; two files can live at precisely
// the same location, and the conflict is dealt with interactively by the unzipping client.
const content = (att.localKey == null)
? rawContent
: decryptor(rawContent, att.keyId, att.localKey, att.instanceId, att.index);

archive.append(content, { name: join('media', processName(att.name)) }, done);
archive.append(content, { name: join('media', processName(att.name)) }, done);
};

if (att.s3_status === 'uploaded') {
s3.getContentFor(att)
.then(appendContent)
.catch(err => { this.destroy(err); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean about this query:

would it be preferable to make blob.content async everywhere, and moving the s3 check/fetch inside the getter?

  • pro: neater code, less likely to have bugs where blob.content used when data is not available
  • con: seems intrusive of frames

(Just highlighting this one because it seemed like the smallest example.)

Maybe it's okay to leave it like this since there are only about 4 places blobs are used (unless you have an idea for some clever alternative):

  1. exporting attachments (here)
  2. exporting encrypting submissions that were stored as blobs
  3. processing client audit attachments
  4. (different mechanism, urlForBlob instead of getContentFor line the ones above) the http blob response

As long as each of these 4 paths is tested, since each might have a different behavior if there is an error fetching from S3.

Some errors i'm thinking of:

  • auth problem
  • data doesn't exist where it should
  • response from s3 is taking too long
  • data is broken in some other way?

} else {
appendContent(att.content);
}
},
final(done) {
archive.finalize();
Expand Down
52 changes: 33 additions & 19 deletions lib/data/briefcase.js
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ const { SchemaStack } = require('./schema');
const { rejectIfError } = require('../util/promise');
const { PartialPipe } = require('../util/stream');
const { zipPart } = require('../util/zip');
const s3 = require('../util/s3');

/* eslint-disable no-multi-spaces */

Expand Down Expand Up @@ -293,27 +294,40 @@ const streamBriefcaseCsvs = (inStream, inFields, xmlFormId, selectValues, decryp
// the decryptor will return null if it does not have a decryption key for the
// record. this is okay; we pass the null through and processRow deals with it.
const { encryption } = submission.aux;
const xml =
(submission.def.localKey == null) ? submission.xml :
(encryption.encHasData === false) ? 'missing' : // eslint-disable-line indent
decryptor(encryption.encData, encryption.encKeyId, submission.def.localKey, submission.instanceId, encryption.encIndex); // eslint-disable-line indent

// if something about the xml didn't work so well, we can figure out what
// to say and bail out early.
const status =
(xml === 'missing') ? 'missing encrypted form data' :
(xml === null) ? 'not decrypted' : null; // eslint-disable-line indent
if (status != null) {
const result = new Array(rootHeader.length);
writeMetadata(result, rootMeta, submission, submission.aux.submitter, submission.aux.exports.formVersion, submission.aux.attachment, status);
return done(null, result);

const decryptXml = (data) => decryptor(data, encryption.encKeyId, submission.def.localKey, submission.instanceId, encryption.encIndex);
const processXml = (xml) => {
try {
// if something about the xml didn't work so well, we can figure out what
// to say and bail out early.
const status =
(xml === 'missing') ? 'missing encrypted form data' :
(xml === null) ? 'not decrypted' : null; // eslint-disable-line indent
if (status != null) {
const result = new Array(rootHeader.length);
writeMetadata(result, rootMeta, submission, submission.aux.submitter, submission.aux.exports.formVersion, submission.aux.attachment, status);
return done(null, result);
}

// write the root row we get back from parsing the xml.
processRow(xml, submission.instanceId, fields, rootHeader, selectValues).then((result) => {
writeMetadata(result, rootMeta, submission, submission.aux.submitter, submission.aux.exports.formVersion, submission.aux.attachment);
done(null, result);
}, done); // pass through errors.
} catch (ex) { done(ex); }
};

if (submission.def.localKey == null) processXml(submission.xml);
else if (encryption.encHasData === false) processXml('missing');
else if (encryption.encS3Status === 'uploaded') {
s3.getContentFor({ sha: encryption.encSha, md5: encryption.encMd5 })
.then(decryptXml)
.then(processXml)
.catch(done);
} else {
processXml(decryptXml(encryption.encData));
}

// write the root row we get back from parsing the xml.
processRow(xml, submission.instanceId, fields, rootHeader, selectValues).then((result) => {
writeMetadata(result, rootMeta, submission, submission.aux.submitter, submission.aux.exports.formVersion, submission.aux.attachment);
done(null, result);
}, done); // pass through errors.
} catch (ex) { done(ex); }
}, flush(done) {
if (rootHeaderSent === false) this.push(rootHeader);
Expand Down
2 changes: 1 addition & 1 deletion lib/model/frames/blob.js
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ const { digestWith, md5sum, shasum } = require('../../util/crypto');
const { pipethroughAndBuffer } = require('../../util/stream');


class Blob extends Frame.define(table('blobs'), 'id', 'sha', 'content', 'contentType', 'md5') {
class Blob extends Frame.define(table('blobs'), 'id', 'sha', 'content', 'contentType', 'md5', 's3_status') {
// Given a path to a file on disk (typically written to a temporary location for the
// duration of the request), will do the work to generate a Blob instance with the
// appropriate SHA and binary content information. Does _not_ save it to the database;
Expand Down
2 changes: 1 addition & 1 deletion lib/model/frames/submission.js
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ Submission.Extended = Frame.define('formVersion', readable);

Submission.Xml = Frame.define(table('submission_defs', 'xml'), 'xml');

Submission.Encryption = Frame.define(into('encryption'), 'encHasData', 'encData', 'encIndex', 'encKeyId');
Submission.Encryption = Frame.define(into('encryption'), 'encHasData', 'encData', 'encSha', 'encMd5', 'encIndex', 'encKeyId', 'encS3Status');

Submission.Exports = Frame.define(into('exports'), 'formVersion');

Expand Down
Loading