New REST Endpoints for Content Import Job Management #30550

Open
4 tasks done
Tracked by #29474
jgambarios opened this issue Nov 1, 2024 · 2 comments

Comments

@jgambarios
Contributor

jgambarios commented Nov 1, 2024

Parent Issue

#29575

Task

We need to create REST endpoints to interact with the new com.dotcms.jobs.business.processor.impl.ImportContentletsProcessor to manage content import operations through a job queue system.

Objectives

  • Implement REST endpoints for content import job management
  • Provide real-time monitoring capabilities
  • Enable comprehensive validation and error handling

REST Endpoints Specification

  1. Create Import Job
  • Endpoint: POST /content/import
  • Purpose: Creates and enqueues a new content import job
  • Returns: Job identifier
  2. List Import Jobs
  • Endpoint: GET /content/import
  • Parameters:
    • page: Page number
    • size: Items per page
    • status: Filter by job status
  • Purpose: Lists all enqueued jobs with pagination
  3. Validate Import
  • Endpoint: POST /content/import/validate
  • Purpose: Performs import validation without actual import
  • Parameters: Same as content import endpoint
  4. Get Job Status
  • Endpoint: GET /content/import/{jobId}
  • Parameters: jobId
  • Returns: Job state, progress percentage, executing node, etc.
  5. Cancel Job
  • Endpoint: POST /content/import/{jobId}/cancel
  • Parameters: jobId
  • Purpose: Cancels a running import job
  6. Monitor Job
  • Endpoint: GET /content/import/{jobId}/monitor
  • Parameters: jobId
  • Type: Server-Sent Events (SSE)
  • Purpose: Real-time job status monitoring
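
As a rough illustration of how a client might address the endpoints listed above, here is a minimal Python sketch that only builds the URLs. The base URL, the default page size, and the example status value are assumptions made for illustration; the issue specifies only the relative paths and parameter names.

```python
from urllib.parse import quote, urlencode

# Hypothetical API root; the issue only gives paths relative to it.
BASE_URL = "https://dotcms.example.com/api"


def list_jobs_url(page=1, size=20, status=None):
    """Build the GET /content/import URL with its pagination parameters."""
    params = {"page": page, "size": size}
    if status is not None:
        # The set of valid status values is not defined in the issue.
        params["status"] = status
    return f"{BASE_URL}/content/import?{urlencode(params)}"


def job_url(job_id, action=None):
    """Build a per-job URL: status (no action), 'cancel', or 'monitor'."""
    url = f"{BASE_URL}/content/import/{quote(job_id, safe='')}"
    return f"{url}/{action}" if action else url


# (HTTP method, URL) pairs for a hypothetical job id.
example_job = "abc-123"
endpoints = {
    "create": ("POST", f"{BASE_URL}/content/import"),
    "list": ("GET", list_jobs_url(page=2, size=50, status="RUNNING")),
    "validate": ("POST", f"{BASE_URL}/content/import/validate"),
    "status": ("GET", job_url(example_job)),
    "cancel": ("POST", job_url(example_job, "cancel")),
    "monitor": ("GET", job_url(example_job, "monitor")),
}
```

Keeping the job id out of string concatenation and passing it through `quote` avoids malformed paths if an identifier ever contains reserved characters.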

Technical Requirements

Import Features:

  • CSV file upload support
  • Content type specification
  • Content relationships handling
  • Multilingual content support
  • Error handling
  • Comprehensive data validation

Error Handling:

  • Detailed error reporting
  • Validation-only mode

Proposed Objective

Core Features

Proposed Priority

Priority 2 - Important

Acceptance Criteria

  • All endpoints return appropriate HTTP status codes
  • Job queue properly manages concurrent imports
  • SSE endpoint provides real-time updates
  • Error messages are clear and actionable
  • Import validation provides comprehensive checks
  • Job cancellation effectively stops processing
  • Multilingual content is properly handled
  • Content relationships are maintained
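
To check the SSE criterion from a client, something like the following simplified parser for the standard text/event-stream format could be used. This is a sketch only: the event name `job-update` and the JSON payload fields are hypothetical, since the issue does not define what the monitor endpoint emits.

```python
import json


def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of text/event-stream lines.

    Simplified: handles only the 'event' and 'data' fields and comment lines.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any data arrived.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; the real payload shape is not specified in the issue.
SAMPLE_STREAM = [
    ": keep-alive\n",
    "event: job-update\n",
    'data: {"state": "RUNNING", "progress": 0.5}\n',
    "\n",
]

events = list(parse_sse(SAMPLE_STREAM))
payload = json.loads(events[0][1])
```

In practice the lines would come from a streaming HTTP response to GET /content/import/{jobId}/monitor rather than a list.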

Tasks

  1. Doc : Needs Doc, Merged, QA : Passed Internal, Release : 24.12.05, Team : Scout, Type : Task
  2. Doc : Needs Doc, Merged, OKR : Core Features, Priority : 2 High, QA : Passed Internal, Release : 24.12.05, Team : Scout, Type : Task
  3. Doc : Needs Doc, Merged, QA : Passed Internal, Release : 24.12.10, Team : Scout, Type : Task
  4. Doc : Needs Doc, Merged, QA : Passed Internal, Release : 24.12.10, Team : Scout, Type : Task
@fmontes
Member

fmontes commented Nov 4, 2024

Thanks for the detailed user story! I have a few comments to ensure we align with the comprehensive goals outlined in our Import/Export API documentation:

  1. Multilingual and Relationship Handling: Ensure that we have clear implementation details on how to handle multilingual content and relationships.

  2. Dry-Run Feature: It looks like the POST /content/import/validate endpoint, which acts as a dry-run mode, is not explicitly mentioned.

  3. Binary Content Consideration: Double-check that binary content handling is covered. Users need to reference existing binary files via paths in the CSV, so make sure this is clear in both implementation and documentation.

  4. Performance Expectations: Can we confirm that the system can handle importing at least 100,000 items within one hour? Is that a realistic goal?

@jgambarios
Contributor Author

jgambarios commented Nov 4, 2024

@fmontes

1. Multilingual and Relationship Handling: The current import process already handles multilingual content and relationships, so this is covered.

2. Dry-Run Feature: Correct, the dry-run (or "Preview" in our current import) feature will run under the POST /content/import/validate endpoint. We can always rename the endpoint for clarity if we want.

3. Binary Content Consideration: Same as number 1. Because we are "sharing" the current import logic in our new import processor, the binary case is handled. As you said, the binary must already exist in dotCMS and is referenced by path.

4. Performance Expectations: This is something we need to investigate. During my local testing I imported 30,000 contentlets in about 6 minutes, but that was local and used a very simple content type, so I would need to run tests to provide more realistic numbers.
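
For reference, the local figure can be compared against the 100,000-items-per-hour target with a quick calculation. This sketch assumes throughput scales linearly with time, which the caveats above (local run, very simple content type) suggest may not hold for real workloads.

```python
def projected_items_per_hour(items, minutes):
    """Linearly extrapolate an observed import run to a one-hour rate."""
    return items / minutes * 60


# Observed locally: 30,000 contentlets in roughly 6 minutes.
local_rate = projected_items_per_hour(30_000, 6)

# Target raised in the review comment: 100,000 items within one hour.
TARGET_PER_HOUR = 100_000
headroom = local_rate / TARGET_PER_HOUR

print(f"projected: {local_rate:,.0f} items/hour, {headroom:.1f}x the target")
```

Under that linear assumption the local run projects to about 300,000 items per hour, three times the target, so the goal looks plausible but still needs confirmation on realistic content types and environments.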

Projects
Status: Current Sprint Backlog