3353 batch job import #3497

Merged · 39 commits · Feb 9, 2017
Commits (39)
74f58c4
adds support for batch file system import via JSR 352
bmckinney Nov 10, 2016
f2fde2e
adds more messages when retries occur
bmckinney Nov 10, 2016
42f3664
removes dataverse data dir and log dir properties from batch job xml …
bmckinney Nov 14, 2016
3237a62
adds notifications for checksum and file system import; adds notifica…
bmckinney Nov 15, 2016
02a3c30
fixes bug in prior merge conflict resolution
bmckinney Nov 15, 2016
aa8f83c
adds import type enumeration
bmckinney Nov 16, 2016
5fcc9c0
minor refactor
bmckinney Nov 16, 2016
a239f57
comments out debugging println statement
bmckinney Nov 16, 2016
403aa2c
modifications to api based on code review feedback
bmckinney Nov 18, 2016
774ce3c
adds single version constraint; integration test teardown destroys pu…
bmckinney Nov 18, 2016
17cfdcf
action log stores path to json log instead of the full, or potentiall…
bmckinney Nov 18, 2016
e2046e9
failed attempt to use command engine to save dataset (help!)
bmckinney Nov 21, 2016
afa7cb9
backs out the failed command engine attempt
bmckinney Nov 21, 2016
f676399
uses command engine to update dataset version; consolidates update in…
bmckinney Nov 23, 2016
d51185f
bootstraps constraint validation for the api; adds unauthorized user …
bmckinney Nov 23, 2016
a99f929
Merge branch 'develop' into 3353-batch-job-import
bmckinney Nov 23, 2016
4838dbd
Merge branch 'develop' into 3353-batch-job-import
bmckinney Nov 30, 2016
45a90ac
creates DataverseRequest with null HttpServletRequest instead of null…
bmckinney Nov 30, 2016
9de9b24
addresses QA feedback: fixes action log bug; allows system property t…
bmckinney Dec 7, 2016
16caeb2
fixes bug where directoryLabel is set with a filename (e.g., there is…
bmckinney Dec 7, 2016
8cbbbb6
fixes target of permission to dataset, not parent dataverse
bmckinney Dec 8, 2016
de8a649
fixes database merge errors thrown by invoking the command engine too…
bmckinney Dec 8, 2016
993ec9e
Merge branch 'develop' into 3353-batch-job-import
bmckinney Dec 8, 2016
84fff7b
allows any whitespace to separate filename and checksum value in mani…
bmckinney Dec 8, 2016
347807c
looks like datafile has to be set to mergeable=true now
bmckinney Dec 8, 2016
616c12b
adds required boilerplate code to support email notifications
bmckinney Dec 8, 2016
702bbd9
replaces explicit EditDataset permission check with: canIssue(UpdateD…
bmckinney Dec 8, 2016
c9402b9
Merge branch 'develop' into 3353-batch-job-import
bmckinney Dec 8, 2016
0254cf8
removes command engine since it fails when a dataset has 5,000+ files
bmckinney Dec 10, 2016
e7e65d5
removes checksum step by adding the property in the import step; adds…
bmckinney Jan 3, 2017
b28f520
FileRecordWriter modified to handle batch ingest of file batches as "…
landreev Jan 30, 2017
c7335d7
reorganized the parameters of the batch import job, to be able to han…
landreev Jan 31, 2017
b182f98
reorganized the structure of the final package file folder
landreev Feb 2, 2017
e69211d
The changes (5 total - will post the list in the file request and/or …
landreev Feb 2, 2017
fc3fade
the last few fixes for the batch upload.
landreev Feb 7, 2017
2a1f915
fix for the notification page.
landreev Feb 7, 2017
7d8b467
Merge branch 'develop' into 3353-batch-job-import
landreev Feb 7, 2017
b1be2dc
update mime type for file package #3353
pdurbin Feb 8, 2017
a1c46ef
some last minute fixes/changes:
landreev Feb 8, 2017
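Commit 84fff7b above relaxes the manifest parser so that any run of whitespace may separate the filename from its checksum. A minimal stand-alone sketch of that rule — the `<filename> <checksum>` column order is an assumption, not confirmed by the diff:

```java
// Hypothetical manifest line parser illustrating the "any whitespace" rule
// from commit 84fff7b; the column order (filename first) is an assumption.
public class ManifestLineParser {

    // Split on the first run of whitespace (spaces, tabs, or a mix),
    // returning {filename, checksum}.
    static String[] parse(String line) {
        return line.trim().split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] tabbed = parse("data.csv\t0a4d55a8d778e5022fab701977c5d840bbc486d0");
        String[] spaced = parse("data.csv   0a4d55a8d778e5022fab701977c5d840bbc486d0");
        // Both separators yield the same filename/checksum pair.
        System.out.println(tabbed[0] + " " + tabbed[1].equals(spaced[1]));
    }
}
```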
6 changes: 6 additions & 0 deletions src/main/java/Bundle.properties
@@ -173,6 +173,9 @@ notification.access.granted.fileDownloader.additionalDataset={0} You now have ac
notification.access.revoked.dataverse=You have been removed from a role in {0}.
notification.access.revoked.dataset=You have been removed from a role in {0}.
notification.access.revoked.datafile=You have been removed from a role in {0}.
notification.checksumfail=Your upload to dataset "{0}" failed checksum validation.
notification.import.filesystem=<a href="{0}/dataset.xhtml?persistentId={1}" title="{2}">{2}</a>, dataset had files imported from the file system via a batch job.
notification.import.checksum=<a href="/dataset.xhtml?persistentId={0}" title="{1}">{1}</a>, dataset had file checksums added via a batch job.
removeNotification=Remove Notification
groupAndRoles.manageTips=Here is where you can access and manage all the groups you belong to, and the roles you have been assigned.
user.signup.tip=Why have a Dataverse account? To create your own dataverse and customize it, add datasets, or request access to restricted files.
@@ -528,6 +531,9 @@ hours=hours
hour=hour
minutes=minutes
minute=minute
notification.email.checksumfail.subject=Dataverse: Your upload failed checksum validation.
notification.email.import.filesystem.subject=Dataverse: Your file import job has completed
notification.email.import.checksum.subject=Dataverse: Your file checksum job has completed

# passwordreset.xhtml

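The new bundle keys use `{0}`/`{1}` placeholders. A sketch of how such a key might be resolved, using `java.text.MessageFormat` as a stand-in for `BundleUtil` (the pattern string is copied from the diff; the persistent ID is an invented example):

```java
import java.text.MessageFormat;

// Stand-in for BundleUtil.getStringFromBundle(): fill the {0} placeholder
// of the notification.checksumfail pattern taken from the diff above.
public class NotificationMessageDemo {
    public static void main(String[] args) {
        String pattern = "Your upload to dataset \"{0}\" failed checksum validation.";
        String msg = MessageFormat.format(pattern, "doi:10.5072/FK2/EXAMPLE");
        System.out.println(msg);
        // prints: Your upload to dataset "doi:10.5072/FK2/EXAMPLE" failed checksum validation.
    }
}
```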
7 changes: 7 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -633,6 +633,13 @@ public boolean isImage() {
// generate thumbnails and previews for them)
return (contentType != null && (contentType.startsWith("image/") || contentType.equalsIgnoreCase("application/pdf")));
}

public boolean isFilePackage() {
    return DataFileServiceBean.MIME_TYPE_PACKAGE_FILE.equalsIgnoreCase(contentType);
}

public void setIngestStatus(char ingestStatus) {
this.ingestStatus = ingestStatus;
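The behavior of the new `isFilePackage()` check can be mirrored in a stand-alone sketch; the constant value is copied from `MIME_TYPE_PACKAGE_FILE` in `DataFileServiceBean` below:

```java
// Self-contained mirror of DataFile.isFilePackage(); the constant matches
// MIME_TYPE_PACKAGE_FILE introduced in this PR.
public class FilePackageCheck {
    static final String MIME_TYPE_PACKAGE_FILE = "application/vnd.dataverse.file-package";

    // True when the content type marks a batch-imported file package,
    // compared case-insensitively as in the DataFile method.
    static boolean isFilePackage(String contentType) {
        return MIME_TYPE_PACKAGE_FILE.equalsIgnoreCase(contentType);
    }

    public static void main(String[] args) {
        System.out.println(isFilePackage("application/vnd.dataverse.file-package")); // true
        System.out.println(isFilePackage("application/octet-stream"));               // false
    }
}
```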
30 changes: 29 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
@@ -134,6 +134,17 @@ public class DataFileServiceBean implements java.io.Serializable {

private static final String MIME_TYPE_UNDETERMINED_DEFAULT = "application/octet-stream";
private static final String MIME_TYPE_UNDETERMINED_BINARY = "application/binary";

/**
* Per https://en.wikipedia.org/wiki/Media_type#Vendor_tree just "dataverse"
* should be fine.
*
* @todo Consider registering this at http://www.iana.org/form/media-types
* or switch to "prs" which "includes media types created experimentally or
* as part of products that are not distributed commercially" according to
* the page URL above.
*/
public static final String MIME_TYPE_PACKAGE_FILE = "application/vnd.dataverse.file-package";

public DataFile find(Object pk) {
return (DataFile) em.find(DataFile.class, pk);
@@ -168,7 +179,24 @@ public List<DataFile> findByDatasetId(Long studyId) {
Query query = em.createQuery("select o from DataFile o where o.owner.id = :studyId order by o.id");
query.setParameter("studyId", studyId);
return query.getResultList();
}
}

public DataFile findByStorageIdandDatasetVersion(String storageId, DatasetVersion dv) {
    try {
        // Bind values as positional parameters instead of concatenating them
        // into the SQL string, which would be open to SQL injection.
        Query query = em.createNativeQuery("select o.id from datafile o, filemetadata m " +
                "where o.filesystemname = ?1 and o.id = m.datafile_id and m.datasetversion_id = ?2");
        query.setParameter(1, storageId);
        query.setParameter(2, dv.getId());
        query.setMaxResults(1);
        if (query.getResultList().size() < 1) {
            return null;
        } else {
            return findCheapAndEasy((Long) query.getSingleResult());
        }
    } catch (Exception e) {
        logger.log(Level.SEVERE, "Error finding datafile by storageID and DatasetVersion: " + e.getMessage());
        return null;
    }
}

public List<FileMetadata> findFileMetadataByDatasetVersionId(Long datasetVersionId, int maxResults, String userSuppliedSortField, String userSuppliedSortOrder) {
FileSortFieldAndOrder sortFieldAndOrder = new FileSortFieldAndOrder(userSuppliedSortField, userSuppliedSortOrder);
40 changes: 40 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/MailServiceBean.java
@@ -226,6 +226,12 @@ private String getSubjectTextBasedOnNotification(UserNotification userNotificati
return ResourceBundle.getBundle("Bundle").getString("notification.email.returned.dataset.subject");
case CREATEACC:
return ResourceBundle.getBundle("Bundle").getString("notification.email.create.account.subject");
case CHECKSUMFAIL:
return ResourceBundle.getBundle("Bundle").getString("notification.email.checksumfail.subject");
case FILESYSTEMIMPORT:
return ResourceBundle.getBundle("Bundle").getString("notification.email.import.filesystem.subject");
case CHECKSUMIMPORT:
return ResourceBundle.getBundle("Bundle").getString("notification.email.import.checksum.subject");
}
return "";
}
@@ -435,6 +441,34 @@ private String getMessageTextBasedOnNotification(UserNotification userNotificati
accountCreatedMessage += optionalConfirmEmailAddon;
logger.fine("accountCreatedMessage: " + accountCreatedMessage);
return messageText += accountCreatedMessage;

case CHECKSUMFAIL:
version = (DatasetVersion) targetObject;
String checksumFailMsg = BundleUtil.getStringFromBundle("notification.checksumfail", Arrays.asList(
version.getDataset().getGlobalId()
));
logger.info("checksumFailMsg: " + checksumFailMsg);
return messageText += checksumFailMsg;

case FILESYSTEMIMPORT:
version = (DatasetVersion) targetObject;
String fileImportMsg = BundleUtil.getStringFromBundle("notification.import.filesystem", Arrays.asList(
systemConfig.getDataverseSiteUrl(),
version.getDataset().getGlobalId(),
version.getDataset().getDisplayName()
));
logger.info("fileImportMsg: " + fileImportMsg);
return messageText += fileImportMsg;

case CHECKSUMIMPORT:
version = (DatasetVersion) targetObject;
String checksumImportMsg = BundleUtil.getStringFromBundle("notification.import.checksum", Arrays.asList(
version.getDataset().getGlobalId(),
version.getDataset().getDisplayName()
));
logger.info("checksumImportMsg: " + checksumImportMsg);
return messageText += checksumImportMsg;

}

return "";
@@ -465,6 +499,12 @@ private Object getObjectOfNotification (UserNotification userNotification){
return versionService.find(userNotification.getObjectId());
case CREATEACC:
return userNotification.getUser();
case CHECKSUMFAIL:
return datasetService.find(userNotification.getObjectId());
case FILESYSTEMIMPORT:
return versionService.find(userNotification.getObjectId());
case CHECKSUMIMPORT:
return versionService.find(userNotification.getObjectId());
}
return null;
}
src/main/java/edu/harvard/iq/dataverse/UserNotification.java
@@ -25,7 +25,7 @@

public class UserNotification implements Serializable {
public enum Type {
ASSIGNROLE, REVOKEROLE, CREATEDV, CREATEDS, CREATEACC, MAPLAYERUPDATED, SUBMITTEDDS, RETURNEDDS, PUBLISHEDDS, REQUESTFILEACCESS, GRANTFILEACCESS, REJECTFILEACCESS
ASSIGNROLE, REVOKEROLE, CREATEDV, CREATEDS, CREATEACC, MAPLAYERUPDATED, SUBMITTEDDS, RETURNEDDS, PUBLISHEDDS, REQUESTFILEACCESS, GRANTFILEACCESS, REJECTFILEACCESS, FILESYSTEMIMPORT, CHECKSUMIMPORT, CHECKSUMFAIL
};

private static final long serialVersionUID = 1L;
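The three enum values added here are matched by new `case` branches in `MailServiceBean.getSubjectTextBasedOnNotification`. A minimal mirror of that mapping — the enum is reduced to the new members, and the key names are copied from the Bundle.properties diff:

```java
// Reduced mirror of the UserNotification.Type additions and the subject-key
// switch in MailServiceBean; only the three new notification types are shown.
public class NotificationTypeDemo {
    enum Type { CHECKSUMFAIL, FILESYSTEMIMPORT, CHECKSUMIMPORT }

    // Map a notification type to its resource-bundle subject key,
    // as the new switch cases in MailServiceBean do.
    static String subjectKey(Type t) {
        switch (t) {
            case CHECKSUMFAIL:     return "notification.email.checksumfail.subject";
            case FILESYSTEMIMPORT: return "notification.email.import.filesystem.subject";
            case CHECKSUMIMPORT:   return "notification.email.import.checksum.subject";
            default:               return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(subjectKey(Type.FILESYSTEMIMPORT));
        // prints: notification.email.import.filesystem.subject
    }
}
```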
src/main/java/edu/harvard/iq/dataverse/api/batchjob/BatchJobResource.java
@@ -0,0 +1,89 @@
package edu.harvard.iq.dataverse.api.batchjob;

import com.fasterxml.jackson.databind.ObjectMapper;
import edu.harvard.iq.dataverse.api.AbstractApiBean;
import edu.harvard.iq.dataverse.batch.entities.JobExecutionEntity;

import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.JobExecution;
import javax.batch.runtime.JobInstance;
import javax.ejb.Stateless;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;


@Stateless
@Path("admin/batch")
public class BatchJobResource extends AbstractApiBean {

private static final String EMPTY_JSON_LIST = "[]";
private static final String EMPTY_JSON_OBJ = "{}";
private static final ObjectMapper mapper = new ObjectMapper();

@GET
@Path("/jobs")
@Produces(MediaType.APPLICATION_JSON)
public Response listBatchJobs() {
try {
final List<JobExecutionEntity> executionEntities = new ArrayList<>();
final JobOperator jobOperator = BatchRuntime.getJobOperator();
final Set<String> names = jobOperator.getJobNames();
for (String name : names) {
final int end = jobOperator.getJobInstanceCount(name);
final List<JobInstance> jobInstances = jobOperator.getJobInstances(name, 0, end);
for (JobInstance jobInstance : jobInstances) {
final List<JobExecution> executions = jobOperator.getJobExecutions(jobInstance);
for (JobExecution execution : executions) {
executionEntities.add(JobExecutionEntity.create(execution));
}
}
}
return Response.ok("{ \"jobs\": \n" + mapper.writeValueAsString(executionEntities) + "\n}").build();
} catch (Exception e) {
return Response.ok(EMPTY_JSON_LIST).build();
}
}

@GET
@Path("/jobs/name/{jobName}")
@Produces(MediaType.APPLICATION_JSON)
public Response listBatchJobsByName( @PathParam("jobName") String jobName) {
try {
final List<JobExecutionEntity> executionEntities = new ArrayList<>();
final JobOperator jobOperator = BatchRuntime.getJobOperator();
final int end = jobOperator.getJobInstanceCount(jobName);
final List<JobInstance> jobInstances = jobOperator.getJobInstances(jobName, 0, end);
for (JobInstance jobInstance : jobInstances) {
final List<JobExecution> executions = jobOperator.getJobExecutions(jobInstance);
for (JobExecution execution : executions) {
executionEntities.add(JobExecutionEntity.create(execution));
}
}
return Response.ok("{ \"jobs\": \n" + mapper.writeValueAsString(executionEntities) + "\n}").build();
} catch (Exception e) {
return Response.ok(EMPTY_JSON_LIST).build();
}
}


@GET
@Path("/jobs/{jobId}")
@Produces(MediaType.APPLICATION_JSON)
public Response listBatchJobById(@PathParam("jobId") String jobId) {
try {
JobExecution execution = BatchRuntime.getJobOperator().getJobExecution(Long.valueOf(jobId));
return Response.ok(mapper.writeValueAsString(JobExecutionEntity.create(execution))).build();
} catch (Exception e) {
return Response.ok(EMPTY_JSON_OBJ).build();
}
}

}
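The aggregation in `listBatchJobs()` above is a three-level walk: job names, then instances per name, then executions per instance, flattened into one list. The same shape can be sketched with plain collections, with a `Map` standing in for the JSR 352 `JobOperator` so no batch runtime is needed:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-collections sketch of the nested loops in listBatchJobs(): every
// execution of every instance of every named job is flattened into one list.
// The map below is an invented stand-in for JobOperator.
public class JobListingDemo {
    public static void main(String[] args) {
        // job name -> instances -> execution ids (hypothetical sample data)
        Map<String, List<List<String>>> jobs = new LinkedHashMap<>();
        jobs.put("FileSystemImportJob", Arrays.asList(
                Arrays.asList("exec-1", "exec-2"),  // instance 1: two executions
                Arrays.asList("exec-3")));          // instance 2: one execution

        List<String> executions = new ArrayList<>();
        for (String name : jobs.keySet()) {                 // jobOperator.getJobNames()
            for (List<String> instance : jobs.get(name)) {  // getJobInstances(name, 0, end)
                executions.addAll(instance);                // getJobExecutions(jobInstance)
            }
        }
        System.out.println(executions); // [exec-1, exec-2, exec-3]
    }
}
```

The endpoint then wraps the serialized list in a `{ "jobs": ... }` envelope by string concatenation, which is why an exception falls back to the bare `[]` constant.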