This repository has been archived by the owner on Jun 16, 2023. It is now read-only.

Description & Scenarios

heipacker edited this page Feb 12, 2014 · 9 revisions

JStorm is a distributed real-time computation engine.

JStorm is similar to Hadoop MapReduce: the user only needs to implement the specified interfaces and submit the task to the JStorm system. The system then runs the task around the clock (7×24), and if a worker fails unexpectedly, the scheduler immediately assigns a new worker to replace it. From the application's point of view, a JStorm job is therefore a distributed application that follows a programming specification. From the system's point of view, JStorm is a scheduling system similar to MapReduce. From the data's point of view, JStorm is a mechanism built on a message processing pipeline.

Real-time computing is one of the most active areas in big data. On the one hand, demand for data keeps growing; on the other hand, latency requirements keep tightening, and traditional Hadoop MapReduce increasingly cannot meet them. As a result, demand for real-time computing continues to grow.

Advantages

There were many real-time computation engines before Storm and JStorm, but Storm and JStorm came to dominate the market soon after they appeared. Their main advantages are:

  • Quick development with a simple, easy-to-use interface. To build a scalable application, the user does not need to think about low-level RPC, redundancy between workers, or data distribution; it is enough to follow the programming specifications for Topology, Spout, and Bolt.
  • Excellent scalability. Performance scales linearly simply by adjusting the concurrency settings.
  • Robustness. When a worker fails or a machine breaks down, the scheduler automatically assigns a new worker to replace the failed one.
  • Data accuracy. The Acker mechanism can be used to prevent data loss. If stricter accuracy is required, the transaction mechanism can be used to guarantee it.
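
To make the Topology, Spout, and Bolt roles and the ack-based guarantee above more concrete, here is a minimal, framework-free sketch in Python. It is not the JStorm API: `ListSpout`, `UppercaseBolt`, and `run_topology` are hypothetical names, everything runs in a single process, and the replay-on-failure logic only approximates the idea behind the Acker mechanism (a message stays pending until it has been processed successfully, and is emitted again otherwise).

```python
from collections import deque


class ListSpout:
    """Hypothetical source component: emits messages and replays failed ones."""

    def __init__(self, messages):
        self._queue = deque(enumerate(messages))  # (message id, payload)
        self._pending = {}                        # emitted but not yet acked

    def next_tuple(self):
        """Return the next (id, payload) pair, or None when nothing is left."""
        if not self._queue:
            return None
        msg_id, payload = self._queue.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        # Fully processed: forget the message.
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed: put the message back so it is emitted again.
        self._queue.append((msg_id, self._pending.pop(msg_id)))


class UppercaseBolt:
    """Hypothetical processing component that fails once on a chosen payload."""

    def __init__(self, fail_once_on=None):
        self._fail_once_on = fail_once_on
        self.results = []

    def execute(self, payload):
        if payload == self._fail_once_on:
            self._fail_once_on = None  # simulate a transient worker failure
            raise RuntimeError("simulated failure")
        self.results.append(payload.upper())


def run_topology(spout, bolt):
    """Drive the spout -> bolt pipeline until every message has been acked."""
    while (item := spout.next_tuple()) is not None:
        msg_id, payload = item
        try:
            bolt.execute(payload)
        except RuntimeError:
            spout.fail(msg_id)
        else:
            spout.ack(msg_id)
    return bolt.results


results = run_topology(ListSpout(["a", "b", "c"]), UppercaseBolt(fail_once_on="b"))
```

In this run the message "b" fails once and is emitted again after "c", so every message is eventually processed but the original order is not preserved. That is the at-least-once behavior an acknowledgement scheme of this kind provides; stricter guarantees are what the transaction mechanism mentioned above is for.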

Scenarios

JStorm processes data through a message processing pipeline, which makes it particularly suitable for stateless computation. That is, the data a computation unit depends on can be found in the message it receives, and preferably one data stream does not depend on other data streams. Typical uses therefore include:

  • Log analysis: extract specific data from logs and store the results in external storage such as a database. Currently, mainstream log analysis technology is based on JStorm or Storm.
  • Pipeline systems: transfer data from one system to another, for example synchronizing data to Hadoop.
  • Message conversion: convert received messages to a given format, then store them in another system, such as messaging middleware.
  • Statistical analysis: extract a field from a log or message, compute a count or sum, and store the statistics in external storage. The intermediate processing may be more complex.
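
As an illustration of the statistical analysis scenario, the sketch below extracts a field from each log line and keeps a running count and sum per key. The log format (`<timestamp> <url> <bytes>`), the sample lines, the function names, and the in-memory dictionary standing in for external storage are all assumptions made for this example; in a real topology the parsing and aggregation steps might live in separate bolts.

```python
from collections import defaultdict


def parse_line(line):
    """Extract (url, bytes) from a line shaped like '<timestamp> <url> <bytes>'."""
    _timestamp, url, size = line.split()
    return url, int(size)


def aggregate(lines):
    """Return {url: {"count": n, "sum": total_bytes}} for the given log lines."""
    stats = defaultdict(lambda: {"count": 0, "sum": 0})
    for line in lines:
        url, size = parse_line(line)  # parsing needs only the message itself
        stats[url]["count"] += 1
        stats[url]["sum"] += size
    return dict(stats)


# Made-up sample input, for illustration only.
log_lines = [
    "2014-02-12T10:00:00 /index.html 512",
    "2014-02-12T10:00:01 /about.html 128",
    "2014-02-12T10:00:02 /index.html 256",
]
stats = aggregate(log_lines)
```

Note that parsing is stateless, while the running totals are state held by the aggregation step; this is one reason the results are written to external storage rather than kept only in the worker.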