Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: unpublished files can be downloaded by users with the File Downloader role (DownloadFile permission) on that file or inherited from dataset or dataverse #2648

Closed
pdurbin opened this issue Oct 16, 2015 · 39 comments

Comments

@pdurbin
Copy link
Member

pdurbin commented Oct 16, 2015

As clarified in c8e49f2 the File Downloader role should only apply to published files. Currently, if a user (or group, presumably) is assigned the DownloadFile permission to an unpublished file (either directly or through inheritance, it is believed), that user can download the file if he or she knows the file id. That is to say, if a user clicks a link to http://localhost:8080/api/access/datafile/12 (for example) and is logged in to Dataverse, the file can be downloaded even if hasn't been published. This is not the intended behavior.

The intention is that a dataset is being prepared and files are being set to restricted, the File Downloader role can be assigned ahead of time, before the dataset is published. The role should have no affect, however, until the dataset is published, as discussed at #2645. (This should probably be documented.)

Testing of this issue should include assigning the DownloadFile permission at various levels such as at the file itself, at the file's parent dataset, and at the dataset's parent dataverse. Testing should include assignment of the role to both users and groups.

@bencomp
Copy link
Contributor

bencomp commented Oct 20, 2015

it is believed

Please write unit tests for testing the permissions, instead of going by beliefs.

(This should probably be documented.)

Yes! In fact, please start writing down the way things should work, i.e. the requirements, let the accountable people sign them off, then write the tests, see that the tests fail in the current situation and then start fixing the code.

(I've been saying this for a while and it's not meant to critique individuals, but testing is essential in these parts of the code.)

@pdurbin
Copy link
Member Author

pdurbin commented Oct 20, 2015

@bencomp I'm not aware of a requirements doc for this part of the code. When you ask for documentation are you asking for developer-related documentation in the form of requirements? Or are you asking for end user documentation in the form additions to the User Guide. Do know that #2653 was opened yesterday for the User Guide that says, "Permissions: write up more information about each specific permission." The permission in question for this issue is the "DownloadFile" permission.

When you say unit tests do you really mean integration tests? I'm pretty sure one would need a running system (Glassfish, Postgres, etc.) to test this issue. With enough time I could write failing integration tests that demonstrate the issue (there are many edge cases though) and then fix the code to make the tests pass. I'd be starting from scratch though, so it would be time consuming. @bencomp I'm glad you're championing a culture of automated testing because it's an interest of mine as well. I've written some integration tests at https://github.com/IQSS/dataverse/tree/v4.2/src/test/java/edu/harvard/iq/dataverse/api but they are not executed automatically on every build like the unit tests are. I run them when I think of them. And actually, I can't even too many at once because of database deadlocks: #2460

@bencomp
Copy link
Contributor

bencomp commented Oct 20, 2015

@pdurbin I don't think you need a running system to test enforcement of permissions, because that happens in the Dataverse code, doesn't it? You might need to create some Dataverses, Datasets and DataFiles and roles. You don't need to start from scratch: you may reuse https://github.com/bencomp/dataverse/blob/4.3-dev/src/test/java/edu/harvard/iq/dataverse/authorization/DataverseRolePermissionHelperTest.java

My suggestion about writing documentation first was aimed at preventing being 6 months into production and then find out different people understand core functionality differently. From the comments on #2645 I understand that this is the case.
Are you sure there are many edge cases for the permission system? Can you list them?
If there are so many, how will you make sure end users understand the application?

Also: how can you mark this issue solved if you don't test for every (edge) case? Can you at least start with the case that fails right now?

@pdurbin
Copy link
Member Author

pdurbin commented Oct 20, 2015

@bencomp the main edge cases to test for this issue are when a permission is inherited, when it is implicit vs. explicit. That's why I wrote "either directly or through inheritance, it is believed." Both implicit and explicit permissions should be tested for this issue. Oh, and groups vs. users are edge cases. There are a variety of types of groups (explicit, IP, Shib).

https://github.com/bencomp/dataverse/blob/4.3-dev/src/test/java/edu/harvard/iq/dataverse/authorization/DataverseRolePermissionHelperTest.java looks interesting but it doesn't test the "Data Access" API ( i.e. http://localhost:8080/api/access/datafile/12 ). I'm all for refactoring API code to make it testable by unit tests but I'm honestly not sure how to do this. @bencomp if you can show the way, please do!

As I mentioned at http://irclog.iq.harvard.edu/dataverse/2015-10-20 perhaps http://guides.dataverse.org/en/latest/developers/testing.html should be updated to communicate to developers what's expected with regard to automated testing (unit tests and integration tests). Also, it would be great to have integration tests run on every commit at https://travis-ci.org/IQSS/dataverse but I haven't figured out how to do this. Right now Travis only runs unit tests, not integration tests.

@bencomp
Copy link
Contributor

bencomp commented Oct 20, 2015

So it looks like every download is a call to /api/access/datafile/... and the code called at that endpoint checks whether it should return the requested file or an error. The outcome depends on Dataverse's business rules. You should break up the business rules to get to the lower level rules that can be unit-tested.
At play here are the rules for determining publication status of (the container(s) of) the file, access restriction, role assignment and permissions for a role.

For unit tests, you could mock parts of the code that go into the entity management, or provide a mock entity manager that has every possible situation that you want to support in Dataverse.
In the end you should test for every combination of

  • role (= set of permissions)
  • role assignment source type (= <? extends DvObject>)
  • publication status of the role assignment source and the container(s) it is in
  • (explicit) access restrictions on the role assignment source and the container(s) it is in
  • role assignment target (= user or group)

whether the one definitive authorisation check

  1. returns true when it is expected to return true
  2. returns false when it is expected to return false
  3. throws an exception for combinations that make no sense but that are not strictly impossible to make, i.e. the compiler doesn't throw an error building the test.

I think the edge cases are the combinations that should throw the exceptions. All valid combinations are just cases that need to be handled correctly.

The case at hand is of course the combination

  • role: FileDownloader = {DownloadFile}
  • RAS: DataFile
  • publication status: unpublished because a enclosing container is unpublished
  • access restriction: file is restricted (either because all files in dataset are restricted or because this file is restricted)
  • RAT: some user or group

The rule "should the download be allowed?" should be answered "no" under these conditions.

As I said, you want to break up this rule. You should have a unit test for the publication status check and another one for the file restriction check. Then you can treat many cases as a single one:

assertTrue("public unrestricted file should be allowed",
    isDownloadAllowed(
        fileThatIsUnrestrictedAndPublished
    ));
assertFalse("unpublished file should not be allowed",
    isDownloadAllowed(
        fileThatIsUnpublished
    ));

@pdurbin
Copy link
Member Author

pdurbin commented Oct 20, 2015

You should break up the business rules to get to the lower level rules that can be unit-tested.

I couldn't agree more. Pinging @michbarsinai for feedback on this great writeup since he implemented the permission system. Thanks @bencomp !

pdurbin added a commit that referenced this issue Oct 21, 2015
Note that manual setup is required in the GUI due to #2497
@pdurbin
Copy link
Member Author

pdurbin commented Oct 21, 2015

In 852742e I added a quick and dirty test in bash to reproduce the bug but manual setup is required because there is no API endpoint for restricting files per #2497. :(

@pdurbin
Copy link
Member Author

pdurbin commented Oct 21, 2015

@landreev as we discussed I'm passing this issue to you to look at the commit I just pushed (852742e) which is a no-op (no change in behavior) but adds more logging and javadoc in the area of the bug.

The commit also includes an updated quick-and-dirty script with three test scenarios (but more should be written): https://github.com/IQSS/dataverse/blob/5a235110a482b41ea81c1a51fb5a3146121ba5c0/scripts/issues/2648/reproduce

@pdurbin pdurbin assigned landreev and unassigned pdurbin Oct 21, 2015
landreev added a commit that referenced this issue Oct 22, 2015
@landreev landreev assigned pdurbin and unassigned landreev Oct 22, 2015
@landreev
Copy link
Contributor

OK, I believe it should be working properly now, but please test using your scripts.
No, we don't want to throw 403 on all unpublished files; simply because that way the owners/authors would not be able to download such files.
But I believe the logic below should achieve what we need: (this is how the logic on the datasetpage works pretty much)

if (permissionService.on(df).has(Permission.DownloadFile)) {
if (published)
return true; // access granted
} else {
// if the file is NOT published, we will let them download the
// file ONLY if they also have the permission to view
// the unpublished verion (this will catch the owner/contributors):
if (permissionService.on(df.getOwner()).has(Permission.ViewUnpublishedDataset)) {
return true; // access granted
}
}
}
return false; // 403

@pdurbin
Copy link
Member Author

pdurbin commented Oct 22, 2015

I'm on 3ae2013 and I'm getting an exception when I try to restrict a file. Here's a stack trace about "org.eclipse.persistence.exceptions.OptimisticLockException Exception Description: The object [edu.harvard.iq.dvn.core.study.FileMetadata[id=1]] cannot be merged because it has changed or been deleted since it was last read": stack.txt

@pdurbin
Copy link
Member Author

pdurbin commented Oct 22, 2015

I just dropped my database and tried again to restrict a file but I can't via the GUI (and it's not possible via the API per #2497).

As with the previous stack trace I uploaded with my last comment, this one also ended with "Caused by: java.lang.NullPointerException at edu.harvard.iq.dataverse.DatasetPage.populateDatasetUpdateFailureMessage(DatasetPage.java:2247)"

This worked before recent changes by @landreev and @sekmiller . Do either of you know what's going on? If you can reproduce this it probably deserves a separate issue. "Can't restrict files" or whatever.

@landreev
Copy link
Contributor

Kevin, please re-test this. Both your ("exotic") case, of a non-owner user with the VIewUnpublished permission (they should be able to download non-restricted unpublished files). And see if everything download permissions-related is still working properly.

@kcondon
Copy link
Contributor

kcondon commented Oct 28, 2015

@landreev Hold the presses, found another issue: can download restricted file through API when have only view unpublished ds perms. Also no request access button appears in UI in this case.

Additional set up info:
Published dataverse:
/dataverse/tap
Unpublished (draft0 dataset:
persistentId=doi:10.5072/FK2/93II1J
Creator=dataverseAdmin
Grantee: kcadmin1
Perms Granted: ViewUnpublishedDataset, role name: Special1

restricted file: 50by1000.tab, file id=2700
unrestricted file: ShortFile.txt, file id=2701

Given the above, this API call should return 403:
https:///api/access/datafile/2700

Otherwise initial problem and all likely cases works as expected.

@kcondon kcondon assigned landreev and unassigned kcondon Oct 28, 2015
@landreev
Copy link
Contributor

OK, yeah, just as I suspected - only the filemetadata is marked as restricted; but not the file:

SELECT restricted FROM filemetadata WHERE datafile_id=2700
1
SELECT restricted FROM filemetadata WHERE datafile_id=2701
0
SELECT restricted FROM datafile WHERE id=2700
0
SELECT restricted FROM datafile WHERE id=2701
0
(Same reason you're not seeing the Request Access button)

This is how DatasetPage works (only marks the filemetadata, when you restrict the file; the file object becomes restricted only when the dataset is published).

This is done on purpose, actually. Because when you have a published version with a non-restricted file, and you try to restrict it - we only want it to be restricted in the next DRAFT version. And we don't want it to affect the unrestricted status of that published file immediately. But only when it's finalized, by publishing the DRAFT.

Of course whoever implemented it didn't think of this exotic case - an unpublished file that only exists in a DRAFT, which another user has been given permission to view. So it looks like for such files we DO want to mark the DataFile object as restricted right away.

@landreev landreev assigned kcondon and unassigned landreev Oct 28, 2015
@landreev
Copy link
Contributor

@kcondon:
I checked in a fix.
As I said, this was not a bug in the access/auth code. But in the behavior of the dataset page.
However, I also checked in an update for the Access code, to check the restriction status of both the datafile and filemetadata for unpublished files. So you should be seeing the correct download behavior (403) on previously created files as well.

@bencomp
Copy link
Contributor

bencomp commented Oct 29, 2015

Could you say that the restriction status of any file (i.e. DataFile) MUST always be the same as the restriction status of the latest published related FileMetadata? If so, could you make the Most Recently Published FileMetadata the definitive source for restriction status checking and remove the status field from DataFile?

@kcondon kcondon assigned landreev and unassigned kcondon Oct 29, 2015
@kcondon
Copy link
Contributor

kcondon commented Oct 29, 2015

@landreev
OK, now can't ever download a restricted file.

@landreev landreev assigned kcondon and unassigned landreev Oct 29, 2015
@landreev
Copy link
Contributor

OK, should now be working, for reals!

@landreev
Copy link
Contributor

@bencomp:
Let me think... I do believe this is already true... IF there is such a thing as the most recently published metadata... So the logic you're describing will be "if there is published metadata, check the latest one for the restriction status; if not, check the restriction status of the draft; if there's no draft - check the most recent archived version..."
Yeah, it sounds like this should work...
Just in case we are missing something here, I'll discuss this with the group (with @gdurand and @michbarsinai primarily ... anyone else wants to chime in?). If we're not missing anything, then we could get rid of the restriction flag on the DataFile object in 4.3 maybe(?)

@bencomp
Copy link
Contributor

bencomp commented Oct 29, 2015

Please write an automated test and use a function table listing all inputs and corresponding outputs to help prevent overlooking anything. :)

@kcondon
Copy link
Contributor

kcondon commented Oct 29, 2015

  1. Verify assigning file downloader role alone isn't sufficient to download unpublished files.
    using an unpublished dataset with two files, one restricted, one unrestricted, grant file downloader permission to a user at file, dataset, and dataverse one at a time, verify no notifications are sent, no data appears in My Data, and using the access api, no files are downloaded, all 403. Remember to enable request access for the restricted file.
  2. Verify combining file downloader role with view unpublished dataset is sufficient to download unpublished files
    same test case as above except verify granting existing roles other than file downloader, those that combine file downloader and view unpublished dataset can download both files: member, contributor, curator, admin. Verify notifications are sent, my data appears and download works in UI and API.
  3. Edge case: verify view unpublished dataset grants downloading unrestricted file but not restricted unpublished file, until file downloader perm is also granted.
    using the above test dataset and files, create a special role at the datateverse level that has only view unpublished dataset. Grant to user. Verify they can view dataset and all file access buttons appear:
    request access for restricted file (make sure allow request access is enabled), and download for the unrestricted file. Make sure you can download unrestricted file. Verify you can download unrestricted file and get 403 with restricted file via API.
  4. Verify granting file downloader role to published restricted files works
    publish above dataset, grant file downloader role and roles that contain it to user, verify notification is sent, my data appears, files can be downloaded in UI and API.

Note: for API data access testin, use both session and API key authentication.

@landreev
Copy link
Contributor

@bencomp
Yes, I have started working on an automated test. But it will not in any way eliminate the possibility to overlook something. In how we use and maintain file restrictions across the entire app that is. But yes, it will ensure that the access API keeps working correctly.

@kcondon
Copy link
Contributor

kcondon commented Oct 29, 2015

Closing

@landreev
Copy link
Contributor

landreev commented Nov 4, 2015

OK, this issue was a total disaster. I feel bad about it. Of course it didn't even have any business being in 4.2.1. I picked it up as a result of misunderstanding; I thought I was being asked to just add 3 lines to the Access class. But once I realized it was a bigger issue, I should've undone any changes and tagged it 4.3 (this had nothing to do with performance; and the original bug was narrow in scope and there was a workaround...). Because in the end I was working on it in a hurry and did more damage than good... In my defense, I hadn't done much work on permissions before, except of hooking up this API method to the PermissionService originally.

The only positive side here is that in the process we've realized/been reminded of how messy that code was, that the business logic for these permission lookups is spread all over the app, that there are no automated tests for this, etc. I'll be opening more tickets for all these issues, hopefully we'll address them soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants