Add journalist interface API #3619

Merged
merged 71 commits into from
Jul 24, 2018

Contributor

@redshiftzero commented Jun 29, 2018

Status

Ready for review

Description of Changes

Fixes #1761

Changes proposed in this pull request:

  • Add an initial journalist interface API. Note that the API has changed somewhat from what was initially proposed in Journalist API #1761; the canonical reference point is the docs in bdcd4ba.
  • Endpoints that are not represented here are imho “nice to haves” and I propose we slowly add them in followup issues as needed. My rationale here is: 1. they are not needed for an initial client program and 2. the functionality currently available via this API already unblocks developers at news orgs to implement custom functionality. Adding endpoints without modifying existing functionality also means we do not need to increment the API version.

Testing

  • Follow the docs in this branch and try to use the API. From the perspective of a user consuming this API, is anything not intuitive, or could anything be clearer (in the docs or in the API responses themselves)?

  • Are there endpoints that are important for an initial API that are not here?

  • Are there error cases that are not gracefully handled? Apologies for the lack of detailed test plan here, but exploratory testing / attempts to break this is really what this needs.

  • Are there security improvements that should be made?

Test database migrations

  1. Provision staging VMs on develop:
     git checkout develop
     make build-debs
     vagrant up /staging/
  2. Now add a source by submitting a document via the source interface.
  3. Now upgrade:
     git checkout journalist-api-0.9.0
     make build-debs
     vagrant provision /staging/

The database migrations should occur without issue, and you should be able to use the API (or direct database access) to verify that the UUIDs on the source and submission now exist.
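
For the direct-database check, here is a minimal sketch using Python's sqlite3 module. It assumes the new column is named uuid (as Source.uuid in this PR suggests) and that the table is named sources as in the schema dumps in this PR; the helper name has_column and the in-memory database are illustrative only.

```python
import sqlite3


def has_column(conn, table, column):
    """Return True if `table` has a column named `column`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated into the statement, so pass trusted names only.
    rows = conn.execute("PRAGMA table_info({})".format(table)).fetchall()
    return any(row[1] == column for row in rows)


# Illustrative usage against a stand-in database (not the real SecureDrop one):
#   conn = sqlite3.connect(":memory:")
#   conn.execute("CREATE TABLE sources (id INTEGER PRIMARY KEY, uuid VARCHAR(36))")
#   has_column(conn, "sources", "uuid")
```

Running the same check before and after `vagrant provision` should show the column appearing only after the migration.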

Deployment

Will be deployed in securedrop-app-code package

Checklist

If you made non-trivial code changes:

  • I have written a test plan and validated it for this PR

If you made changes to documentation:

  • Doc linting (make docs-lint) passed locally

And an initial (pytest-based) unit test
We have logic in __init__.py here to ensure that a developer
does not accidentally forget to protect a route with a
@login_required decorator. Instead of the decorator, we have
a list of insecure views. We rework this to allow us to use a
decorator for the API routes.
This is primarily for the API, but it turns out that we actually
make use of abort(403) to prevent an admin from deleting
themselves, so instead of seeing a Flask error page, they will see
a nice error page in the style of the rest of the journalist
interface.
@redshiftzero requested review from kushaldas, a user, heartsucker and emkll on June 29, 2018 20:45
@redshiftzero force-pushed the journalist-api-0.9.0 branch 2 times, most recently from 51cd067 to 6a2b8aa on June 29, 2018 21:03
Again, mostly for the journalist API, but users who are logged
into the webapp will now see a custom error page with the regular
SecureDrop styling instead of the default Flask page
Covers cases where:
  * Source does not exist (404 response)
  * Star is successfully added (201 response)
This was a bare except, but I think instead we want to handle
only itsdangerous.BadData exceptions, which is a general exception
that includes a bad signature and an expired one.
We need to also have a nice error handler for method not allowed,
that should return JSON for the API and an HTML page for
users of the regular web application.
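
A rough sketch of what such a handler could look like in Flask. The /api/ path prefix used to tell API requests apart, the route names, and the HTML body are all assumptions for illustration, not the code in this PR.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.errorhandler(405)
def method_not_allowed(error):
    # Assumption: API routes live under /api/; everything else is the web app.
    if request.path.startswith("/api/"):
        body = {"error": "Method Not Allowed", "message": error.description}
        return jsonify(body), 405
    # Stand-in for rendering a styled error template.
    return "<h1>Method not allowed</h1>", 405


@app.route("/api/v1/example", methods=["GET"])
def api_example():
    # Hypothetical API route that only accepts GET.
    return jsonify({"ok": True})


@app.route("/example", methods=["GET"])
def web_example():
    # Hypothetical web route that only accepts GET.
    return "ok"
```

A POST to either route then produces a 405, as JSON for the API path and as HTML for the web path.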
redshiftzero and others added 11 commits July 17, 2018 15:52
During testing, I ran into an issue where there was a
failure in the following case:

reverted_schema was:

CREATE TABLE "sources" (
 	id INTEGER NOT NULL,
 	filesystem_id VARCHAR(96),
 	journalist_designation VARCHAR(255) NOT NULL,
 	flagged BOOLEAN,
 	last_updated DATETIME,
 	pending BOOLEAN,
 	interaction_count INTEGER NOT NULL,
 	PRIMARY KEY (id),
 	CHECK (flagged IN (0, 1)),
 	CHECK (pending IN (0, 1)),
 	UNIQUE (filesystem_id)
)

and original_schema was:

CREATE TABLE sources (
 	id INTEGER NOT NULL,
 	filesystem_id VARCHAR(96),
 	journalist_designation VARCHAR(255) NOT NULL,
 	flagged BOOLEAN,
 	last_updated DATETIME,
 	pending BOOLEAN,
 	interaction_count INTEGER NOT NULL,
 	PRIMARY KEY (id),
 	UNIQUE (filesystem_id),
 	CHECK (flagged IN (0, 1)),
 	CHECK (pending IN (0, 1))
)

which fails for two reasons:

* The unique constraint on filesystem_id is not at the same line in
  the CREATE TABLE statement.
* The table name is quoted in one CREATE TABLE statement, but not
  in the other.

In order to make our tests a little more lenient in this case (and
not produce spurious test failures), we should:

* Compare sorted lists consisting of the lines in each CREATE
  TABLE statement
* Strip commas and double quotes for each element in the aforementioned
  lists
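
The two steps above can be sketched roughly as follows. The function names are hypothetical, and removing every comma and double quote in a line (rather than only leading or trailing ones) is a simplifying assumption about how the stripping is done.

```python
def normalize_schema(create_table_sql):
    """Return the statement's lines, cleaned and sorted, for order-insensitive comparison."""
    lines = []
    for line in create_table_sql.splitlines():
        # Drop double quotes and commas, then surrounding whitespace.
        cleaned = line.replace('"', "").replace(",", "").strip()
        if cleaned:
            lines.append(cleaned)
    return sorted(lines)


def schemas_equivalent(schema_a, schema_b):
    """True if both CREATE TABLE statements contain the same cleaned lines."""
    return normalize_schema(schema_a) == normalize_schema(schema_b)
```

With this comparison, the reverted_schema and original_schema shown above compare equal, while a schema that is missing a constraint still fails.
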
To prevent confusion, we also rename `uuid` to `source_uuid`.
Note that Flask's send_file does include ETags by default,
but they are not hashes, so less useful for verifying downloads
were not corrupted after fetching over Tor. The ETag in Flask is:

```
rv.set_etag('%s-%s-%s' % (
                os.path.getmtime(filename),
                os.path.getsize(filename),
                adler32(
                    filename.encode('utf-8') if isinstance(filename, text_type)
                    else filename
                ) & 0xffffffff
            ))
```

https://github.com/pallets/flask/blob/161c43649d8c362c8359e0b79aeca40c754c5b51/flask/helpers.py#L616
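
For comparison, a hash-based ETag could be computed along these lines. This is only a sketch: the "sha256:" prefix and both helper names are assumptions for illustration, not necessarily the format this PR uses.

```python
import hashlib


def sha256_etag(path, chunk_size=65536):
    """Return an ETag-style string holding the SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large submissions are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:{}".format(digest.hexdigest())


def verify_download(data, etag):
    """Client side: check downloaded bytes against an ETag from sha256_etag."""
    return etag == "sha256:{}".format(hashlib.sha256(data).hexdigest())
```

Unlike the mtime/size/adler32 value above, this lets a client detect a corrupted download by recomputing the digest locally.
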
I'm intentionally not using test_source in the test_submissions
fixture as it is a bit messy/spaghetti for saving 1 LOC
@redshiftzero
Contributor Author

fixed, squashed (some of them), re-pushed 🚂

seconds=TOKEN_EXPIRATION_MINS * 60)
response = jsonify({'token': journalist.generate_api_token(
expiration=TOKEN_EXPIRATION_MINS * 60),
'expiration': token_expiry.isoformat() + 'Z'})
Contributor

As @redshiftzero mentioned, the timezone is always UTC, so the expiration value will look something like '2018-07-18T13:39:34.072044Z'.
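
A small sketch of how a client might produce and parse that value. The function names are hypothetical; the formatting mirrors the `isoformat() + 'Z'` expression in the diff above.

```python
from datetime import datetime, timedelta, timezone


def format_expiration(now_utc, expiration_seconds):
    """Mirror the diff: a naive UTC datetime rendered with a literal 'Z' suffix."""
    token_expiry = now_utc + timedelta(seconds=expiration_seconds)
    return token_expiry.isoformat() + "Z"


def parse_expiration(value):
    """Parse the 'expiration' string into a timezone-aware UTC datetime."""
    # isoformat() omits the fractional part when microseconds are zero,
    # so both shapes are accepted.
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized expiration format: {!r}".format(value))
```

A client can compare the parsed value with `datetime.now(timezone.utc)` to decide when to request a fresh token.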

@wraps(f)
def decorated_function(*args, **kwargs):
    try:
        auth_header = request.headers['Authorization']
Contributor

It would be nice to have a one line comment on what that header looks like after this line.

return abort(403, 'API token not found in Authorization header.')

if auth_header:
auth_token = auth_header.split(" ")[1]
Contributor

What are the valid values for the first part, before the space?
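
To make the question concrete: `auth_header.split(" ")[1]` in the diff ignores whatever comes before the space and raises IndexError when the header contains no space at all. A stricter parse might look like the sketch below, which assumes a header of the form `Token <token>`; the scheme name and the helper are assumptions for illustration.

```python
def extract_token(auth_header, scheme="Token"):
    """Return the token from '<scheme> <token>', or None if the header is malformed."""
    if not auth_header:
        return None
    parts = auth_header.split(" ")
    # Require exactly two parts, the expected scheme, and a non-empty token.
    if len(parts) != 2 or parts[0] != scheme or not parts[1]:
        return None
    return parts[1]
```

The caller can then respond with a 403 whenever the result is None instead of relying on an exception.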

source = get_or_404(Source, source_uuid, column=Source.uuid)
utils.make_star_false(source.filesystem_id)
db.session.commit()
return jsonify({'message': 'Star removed'}), 200
Contributor

In my test, I removed a star from a source and got back this reply: {'message': 'Star removed'}. After this, when I fetch all the sources or that particular source again, I can still see 'is_starred': True for that source.

Contributor Author

very good catch! filing followup ticket to address

heartsucker previously approved these changes Jul 18, 2018
Contributor

@heartsucker left a comment

I didn't test manually on this one, but the code and associated tests look good. Very happy to give this the 💯

with journalist_app.app_context():
    source = Source.query.first()
    with pytest.raises(NotImplementedError):
        source.public_key = 'a curious developer tries to set a pubkey!'
Contributor

🤭

Contributor

@emkll left a comment

Overall this looks fantastic @redshiftzero, I did not encounter any major issues. I separated my testing into two parts: the app/API testing in a development VM, and upgrade testing in staging VMs.

For the application portion, I tested the authentication, rate limiting, and went through all the various methods. Everything works as expected. I've observed the following behavior:

  • POST with parameters to an API endpoint generates an error: 500: no JSON object could be decoded
  • Request body format: When making API calls with raw/text, I get the following error only with the message reply endpoint: please send texts in valid json . Switching to the preferred application/json works fine, but it is curious that raw/text works for other methods.
  • There is no logout functionality. How complex would it be to implement some sort of logout functionality? For example, attaching the token to a user's session.
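
On the first point, one possible way to avoid the 500 is to validate the body before using it, so the endpoint can answer with a 400 instead. This is a sketch only; the helper name and the error strings are hypothetical.

```python
import json


def parse_json_body(raw_body):
    """Return (data, error); exactly one of the two is None."""
    try:
        data = json.loads(raw_body)
    except (TypeError, ValueError):
        # json.JSONDecodeError subclasses ValueError; TypeError covers a missing body.
        return None, "request body must be valid JSON"
    if not isinstance(data, dict):
        return None, "request body must be a JSON object"
    return data, None
```

A view would then return the error string with a 400 status whenever the second element is not None.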

For the upgrade part of the review, I provisioned staging on develop, then built debs on this branch and ran vagrant provision on this API branch. The code was updated in /var/www/securedrop, the migrations were successful, I observed the UUID column in the database, and the API is accessible over the authenticated Tor hidden service.

return user


def token_required(f):
Contributor

Not sure if this is a good idea, but instead of calling the token_required decorator, we could use the before_request decorator (http://flask.pocoo.org/docs/1.0/api/#flask.Flask.before_request) on the login function and explicitly mark the public endpoints.

Contributor

+1-ing that
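
A rough sketch of that suggestion, for discussion. The endpoint names, the PUBLIC_ENDPOINTS set, the `Token` scheme, and the placeholder is_valid_token check are all assumptions, not code from this PR.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical allowlist: view function names that need no token.
PUBLIC_ENDPOINTS = {"get_token"}


def is_valid_token(token):
    # Placeholder check; a real implementation would verify a signed token.
    return token == "valid-token"


@app.before_request
def require_token():
    # Runs before every view, so a new route is protected by default.
    if request.endpoint in PUBLIC_ENDPOINTS:
        return None
    parts = request.headers.get("Authorization", "").split(" ")
    if len(parts) != 2 or parts[0] != "Token" or not is_valid_token(parts[1]):
        return jsonify({"error": "forbidden"}), 403
    return None


@app.route("/token", methods=["POST"])
def get_token():
    return jsonify({"token": "valid-token"})


@app.route("/sources")
def get_sources():
    return jsonify({"sources": []})
```

The trade-off is the same as with the insecure-views list: forgetting to list an endpoint fails closed rather than open.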

s = TimedJSONWebSignatureSerializer(current_app.config['SECRET_KEY'])
try:
    data = s.loads(token)
except BadData:
Contributor

👍 , BadData catches all errors, including BadSignature
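
To illustrate the exception hierarchy, here is a small sketch. Note that it swaps in itsdangerous's URLSafeTimedSerializer rather than the TimedJSONWebSignatureSerializer from the diff, since the latter was removed in later itsdangerous releases; the load_token helper is hypothetical.

```python
from itsdangerous import BadData, BadSignature, SignatureExpired, URLSafeTimedSerializer


def load_token(secret_key, token, max_age):
    """Return the token payload, or None for any bad, tampered, or expired token."""
    s = URLSafeTimedSerializer(secret_key)
    try:
        return s.loads(token, max_age=max_age)
    except BadData:
        # BadSignature and SignatureExpired are both subclasses of BadData,
        # so this single clause covers them.
        return None
```

Catching BadData rather than using a bare except keeps unrelated errors (for example a misconfigured secret key lookup) visible.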

@heartsucker
Contributor

RE: @emkll

attaching the token to a user's session

A Flask session is a cookie, so we'd have to have API consumers use both the Authorization header and the correct Cookie. The latter alone would be sufficient, which would make the former redundant.

Sources should only be exposed to journalists when they
have submitted something
Contributor

@emkll left a comment

I did another round of testing and everything looks good to me 👍. I will address the comment from my previous review (invalid/empty JSON in POST returns 500 from #3619 (review))

Thanks @redshiftzero and thanks @heartsucker for the thorough review.

5 participants