Phase 1: Introduction of Asynchronous Bulk API Endpoints

Phase 1 of the Bulk API project introduced a new set of endpoints to Magento that allow triggering asynchronous operations via the Web API.

For every Web API endpoint existing in Magento, two new endpoints were added: async, which processes a single operation asynchronously, and async/bulk, which accepts an array of entities as the request.

For example, POST rest/V1/product gained two counterparts:

  • rest/async/V1/product, which stores the request in the message queue
  • rest/async/bulk/V1/product/save, which accepts multiple entities at once and stores them as messages in the message queue
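
A client call to the bulk endpoint could look like the following minimal sketch; the host name, the admin token, and the exact payload shape are assumptions for illustration and may differ per Magento version.

```php
<?php
// Minimal sketch of a bulk client call; host and token are placeholders.
$payload = json_encode([
    ['product' => ['sku' => 'sku-1', 'name' => 'Product 1', 'price' => 10.0,
                   'attribute_set_id' => 4, 'type_id' => 'simple']],
    ['product' => ['sku' => 'sku-2', 'name' => 'Product 2', 'price' => 12.5,
                   'attribute_set_id' => 4, 'type_id' => 'simple']],
]);

$ch = curl_init('https://magento.example.com/rest/async/bulk/V1/product/save');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'Authorization: Bearer <admin-token>', // placeholder
    ],
]);

// The endpoint only enqueues the operations, so the response returns quickly
// with a bulk identifier that can be used to poll operation statuses later.
echo curl_exec($ch), PHP_EOL;
curl_close($ch);
```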

The Asynchronous/Bulk API solves the following problems:

  • provide an API that accepts multiple products to be persisted at a time
  • make it possible for the client to send multiple Bulk API requests in parallel
  • make it possible for Magento to run multiple Bulk API workers in parallel
  • ensure HTTP requests complete quickly; processing of bulk data is performed asynchronously

Phase 2: Improving the Performance of Operations

1. Service Contracts Accepting Multiple Entities

Service contract interfaces should accept multiple entities so that low-level optimizations can be performed while they are persisted. For example, part of the entity persistence can be moved to the indexers, or events can be triggered for a group of entities at once instead of being invoked one by one. It also allows third-party extensions to write plugins on the operation that deals with a group of entities and to optimize those plugins for that case. A sketch of such an interface follows.
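
The interface below is a hypothetical sketch: a saveMultiple() method is not part of the existing repository interfaces, and the namespace and names are assumptions used for illustration.

```php
<?php
// Hypothetical bulk service contract; names are illustrative only.
namespace Vendor\BulkApi\Api;

interface ProductBulkRepositoryInterface
{
    /**
     * Persist a group of products in one operation, enabling low-level
     * optimizations such as batched index updates, group events, and
     * plugins that intercept the whole group at once.
     *
     * @param \Magento\Catalog\Api\Data\ProductInterface[] $products
     * @return \Magento\Catalog\Api\Data\ProductInterface[]
     */
    public function saveMultiple(array $products): array;
}
```

A third-party plugin declared on saveMultiple() then runs once per group rather than once per entity, which is exactly the optimization opportunity described above.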

2. Incremental Import

Support for incremental import should address the problem of third-party systems that persist multiple entities without checking whether they actually changed. Every persisted entity should have a hash generated for its data. When a persistence operation starts, it loads the entity from the database and calculates the hash over the key data being imported. If the hash has not changed, the entity is skipped from persistence, as in the sketch below.
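
A minimal sketch of the hash comparison, assuming the imported key data is available as an array; the function names are illustrative.

```php
<?php
// Illustrative hash-based skip for incremental import.
function keyDataHash(array $keyData): string
{
    ksort($keyData);                  // make the hash independent of key order
    return sha1(json_encode($keyData));
}

function shouldPersist(array $incomingKeyData, ?array $storedKeyData): bool
{
    if ($storedKeyData === null) {    // entity does not exist yet
        return true;
    }
    // Skip persistence when the hash over the imported key data is unchanged.
    return keyDataHash($incomingKeyData) !== keyDataHash($storedKeyData);
}
```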

3. Partial Updates

Additionally, instead of sending the whole entity when only one field has changed, the Bulk API can support partial updates to an entity, as sketched below.
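
A minimal sketch of applying a partial payload, assuming entities are represented as arrays; only the fields the client actually sent are overwritten.

```php
<?php
// Illustrative partial update: merge the partial payload over the stored data.
function applyPartialUpdate(array $storedEntity, array $partialPayload): array
{
    return array_merge($storedEntity, $partialPayload);
}

// The client sends just the changed field instead of the whole entity.
$stored  = ['sku' => 'sku-1', 'name' => 'Product 1', 'price' => 10.0];
$updated = applyPartialUpdate($stored, ['price' => 9.5]);
// $updated: ['sku' => 'sku-1', 'name' => 'Product 1', 'price' => 9.5]
```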

4. Import/Export on Top of Bulk API

Additionally, the Bulk API can be used to serve the Import/Export functionality.

Pros:

  • solves a whole set of inconsistencies between Import/Export and the current APIs

Cons:

  • Bulk API performance might be worse than that of the existing Import/Export

5. Group Events

The bulk API should ideally support existing model events to the extent that performance is not adversely impacted.

Third-party extensions commonly perform HTTP requests or other long-running synchronous operations on dispatch of events such as product save. Many hundreds or thousands of these events executing during the creation or update of individual entities have the potential to reduce the performance of the bulk API. In some cases, observers have been implemented in a manner that requires execution within a certain context, or that runs some logic (such as validation) before an entity is persisted.

Currently, the Import/Export module and mass update actions within the Magento Admin do not trigger model events, so the addition of a specific set of bulk-API-related events may be considered a suitable resolution to support high throughput while still allowing third-party modules to be notified of changes to model data.

Should new bulk events be required to achieve the desired performance, those considered 'blocking' may be designed to allow for parallel execution in one or more workers. Parallel processing of bulk API events will:

  • Reduce the overall processing time of the bulk API worker(s)
  • Allow an accurate operation status to be reported at the time that the bulk API worker has completed processing a message (when compared to non-blocking event execution)
  • Reduce the amount of time that the worker thread needs to sustain a database transaction (where events dispatch during existing save implementations), thereby reducing the opportunity for database deadlocks

Additionally, should new bulk API events be added, these should include (and documentation should express a preference for) events that do not block the processing and persistence of the initial request, to reduce the amount of time that a database transaction must be maintained.

Examples of "blocking" events:

  • bulk_update_products_before
  • bulk_update_products_prepare_before

The results of these may affect the eventual status of the bulk or bulk-item operation reported by the API.

Examples of "non-blocking" events:

  • bulk_update_products_after

The results of these are outside the bulk API client's concern, so the operation status can be reported to the client before they execute. A sketch of dispatching both kinds of group events from a worker follows.
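
The worker class below is a hypothetical sketch; the event names come from the examples above, and dispatch() matches the signature of \Magento\Framework\Event\ManagerInterface.

```php
<?php
// Hypothetical bulk worker dispatching group events; class name is illustrative.
use Magento\Framework\Event\ManagerInterface;

class BulkProductUpdateWorker
{
    public function __construct(private ManagerInterface $eventManager)
    {
    }

    /**
     * @param \Magento\Catalog\Api\Data\ProductInterface[] $products
     */
    public function process(array $products): void
    {
        // Blocking events: observers run before persistence and may affect
        // the operation status reported to the API client.
        $this->eventManager->dispatch('bulk_update_products_prepare_before', ['products' => $products]);
        $this->eventManager->dispatch('bulk_update_products_before', ['products' => $products]);

        // ... persist the whole group here ...

        // Non-blocking event: the operation status can already be reported,
        // so these observers run outside the client's critical path.
        $this->eventManager->dispatch('bulk_update_products_after', ['products' => $products]);
    }
}
```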

6. Persistence Layer

The bulk API should use service contracts and thereby maintain support for existing customisations (events, plugins etc) made to entity resource models wherever possible.

Currently, deadlocks created during simultaneous insert (and to a lesser degree, update) operations occur as a result of many concurrent database connections each saving a different (single) product per HTTP request.

Persisting multiple entities via a single database transaction (and therefore a single connection) in a Bulk API persistence worker may provide sufficient throughput, while maintaining compatibility with existing customisations made to resource models, when compared to a separate performance-oriented implementation of the persistence layer. This would not necessarily prevent multiple workers from pre-processing or preparing data/deltas for eventual insert via a single worker where data preparation proves to be an intensive operation. A sketch of such a single-transaction worker follows.
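
A minimal sketch of the single-transaction approach; getConnection(), beginTransaction(), commit(), and rollBack() exist on Magento's ResourceConnection and its adapter, while the save callback is an illustrative stand-in for the existing resource model save.

```php
<?php
// Illustrative single-transaction persistence for a group of entities.
use Magento\Framework\App\ResourceConnection;

function persistGroup(ResourceConnection $resource, array $entities, callable $save): void
{
    $connection = $resource->getConnection();
    $connection->beginTransaction();
    try {
        foreach ($entities as $entity) {
            $save($entity); // e.g. resource model save, keeping plugins/events
        }
        // One commit for the whole group: fewer concurrent write transactions
        // and therefore fewer opportunities for deadlocks.
        $connection->commit();
    } catch (\Throwable $e) {
        $connection->rollBack();
        throw $e;
    }
}
```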
