ironmanMA/architect-awesome

 
 


Backend Architect Technical Map

Knowledge Sharing Agreement (CC license)

Last updated on 20180502

(TOC generated by simple-php-github-toc)

data structure

queue

set

Lists, arrays

Dictionary, associative array

Stack

tree

Binary tree

Each node has at most two child nodes.

Complete Binary Tree

  • Complete Binary Tree
    • Leaf nodes appear only on the bottom two levels, and the nodes on the bottom level are packed into the leftmost positions of that level.

Balanced binary tree

The heights of the left and right subtrees differ by at most one, and both subtrees are themselves balanced binary trees.
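The definition above can be checked recursively. The following is a minimal sketch, assuming a hypothetical `TreeNode` class; the names are illustrative and not taken from any library.

```java
// Hypothetical node type used only for this illustration.
final class TreeNode {
    final int value;
    TreeNode left, right;
    TreeNode(int value) { this.value = value; }
}

final class BalanceCheck {
    // Returns the height of the subtree, or -1 as soon as any subtree is unbalanced.
    static int heightOrUnbalanced(TreeNode node) {
        if (node == null) return 0;
        int left = heightOrUnbalanced(node.left);
        if (left < 0) return -1;
        int right = heightOrUnbalanced(node.right);
        if (right < 0) return -1;
        if (Math.abs(left - right) > 1) return -1; // height difference exceeds one
        return 1 + Math.max(left, right);
    }

    static boolean isBalanced(TreeNode root) {
        return heightOrUnbalanced(root) >= 0;
    }
}
```

Computing the height and the balance flag in one pass keeps the check linear in the number of nodes.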

A binary search tree (BST) is also called an ordered binary tree or a sorted binary tree.

Red black tree

B-, B+, B* trees

MySQL organizes its tables as clustered indexes built on B+ trees.

LSM tree

Compared with B+ trees, LSM (Log-Structured Merge) trees sacrifice some read performance in exchange for write performance (through batched writes), balancing reads and writes. HBase, LevelDB, Tair (LDB), and nessDB use the LSM tree structure. An LSM tree can build an index quickly.

  • LSM Tree vs B+ Tree

    • A B+ tree reads well, but because it must keep an ordered structure, the disk seeks frequently when keys are scattered, which hurts write performance.
    • LSM splits one large tree into N small trees. Writes go to memory first (no seek problem, so performance is high), where an ordered small tree is built. As the in-memory tree grows, it is flushed to disk. On a read, the engine does not know which tree holds the data, so it must search all the small trees (with binary search), although the data inside each small tree is ordered.
  • LSM Tree (Log-Structured Merge Tree Storage Engine)

    • Roughly speaking, the write performance of LSM-based HBase is an order of magnitude higher than MySQL's, and its read performance is an order of magnitude lower.
    • Optimizations: use a Bloom filter instead of binary search; compact small trees into a larger tree to improve query performance.
    • In HBase, when memory usage reaches a threshold, the whole in-memory tree is flushed to disk as one file (a B+ tree). HDFS does not support update operations, so HBase flushes the whole tree rather than merging updates in place. The small trees flushed to disk are periodically merged into one big tree.
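The write path, read path, and compaction described above can be sketched in a few lines. This is a toy model under stated assumptions: `TinyLsm` is a hypothetical class, in-memory `TreeMap` instances stand in for on-disk files, and deletes (tombstones), Bloom filters, and durability are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy LSM tree: sorted in-memory maps stand in for on-disk files.
final class TinyLsm {
    private final int memtableLimit; // flush threshold (hypothetical parameter)
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> runs = new ArrayList<>(); // oldest first

    TinyLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    void put(String key, String value) {
        memtable.put(key, value);            // writes only touch memory
        if (memtable.size() >= memtableLimit) flush();
    }

    void flush() {
        if (memtable.isEmpty()) return;
        runs.add(memtable);                  // the whole memtable becomes one sorted run
        memtable = new TreeMap<>();
    }

    String get(String key) {
        String value = memtable.get(key);
        if (value != null) return value;
        for (int i = runs.size() - 1; i >= 0; i--) { // newest run first
            value = runs.get(i).get(key);
            if (value != null) return value;
        }
        return null;
    }

    // Merges all small runs into one large run; newer values overwrite older ones.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> run : runs) merged.putAll(run);
        runs.clear();
        if (!merged.isEmpty()) runs.add(merged);
    }

    int runCount() { return runs.size(); }
}
```

A read may have to consult every run, which is why compaction, and in real engines a Bloom filter per run, matters for read performance.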

BitSet

It is often used for deduplication checks on large-scale data.
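As an illustration of the deduplication use case, the sketch below counts repeated IDs with `java.util.BitSet`. It assumes the IDs are non-negative integers; `BitSetDedup` is a hypothetical helper name.

```java
import java.util.BitSet;

final class BitSetDedup {
    // Counts how many entries repeat an ID that was already seen.
    // Assumes every ID is non-negative (BitSet indexes cannot be negative).
    static int countDuplicates(int[] ids) {
        BitSet seen = new BitSet();
        int duplicates = 0;
        for (int id : ids) {
            if (seen.get(id)) {
                duplicates++;
            } else {
                seen.set(id); // one bit per distinct ID
            }
        }
        return duplicates;
    }
}
```

Each distinct ID costs one bit, so memory use is governed by the largest ID rather than by the number of entries.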

Common algorithms

Sorting and searching algorithms

Selection sort

Bubble sort

Insertion sort

Quick sort

Merge sort

Shell sort

TODO

Heap sort

Counting sort

Bucket sort

  • "[Aha! Algorithms] The Fastest and Simplest Sort: Bucket Sort"
  • "Sorting Algorithms (3): Counting Sort and Bucket Sort"
    • Bucket sort divides the interval [0, 1) into n equal-sized sub-intervals, called buckets, and distributes the elements into them.
    • Each bucket is sorted individually, and the buckets are then traversed in order to produce the result.
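The two steps above can be sketched as follows, assuming every input value lies in [0, 1) and using one bucket per element. `BucketSort` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BucketSort {
    // Sorts values that all lie in [0, 1), using n equal-width buckets.
    static double[] sort(double[] values) {
        int n = values.length;
        if (n == 0) return new double[0];
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (double v : values) {
            if (v < 0.0 || v >= 1.0) {
                throw new IllegalArgumentException("value outside [0, 1): " + v);
            }
            int index = Math.min(n - 1, (int) (v * n)); // bucket i covers [i/n, (i+1)/n)
            buckets.get(index).add(v);
        }
        double[] out = new double[n];
        int k = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket);           // sort each bucket individually
            for (double v : bucket) out[k++] = v; // then read the buckets in order
        }
        return out;
    }
}
```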

Radix sort

Sort by the ones digit, then the tens digit, then the hundreds digit, and so on.
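A minimal sketch of the digit-by-digit idea, assuming non-negative `int` inputs and base 10. Each pass is a stable counting sort on one digit, which is one common way to implement the technique.

```java
final class RadixSort {
    // Least-significant-digit radix sort for non-negative ints.
    static int[] sort(int[] input) {
        int[] a = input.clone();
        int max = 0;
        for (int v : a) {
            if (v < 0) throw new IllegalArgumentException("negative value: " + v);
            max = Math.max(max, v);
        }
        int[] buffer = new int[a.length];
        // exp is long so that it cannot overflow for values near Integer.MAX_VALUE.
        for (long exp = 1; max / exp > 0; exp *= 10) {
            int[] count = new int[10];
            for (int v : a) count[(int) (v / exp % 10)]++;
            for (int d = 1; d < 10; d++) count[d] += count[d - 1]; // prefix sums
            for (int i = a.length - 1; i >= 0; i--) {              // backwards keeps it stable
                int digit = (int) (a[i] / exp % 10);
                buffer[--count[digit]] = a[i];
            }
            int[] tmp = a; a = buffer; buffer = tmp;
        }
        return a;
    }
}
```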

Binary search

Sorting tools in Java

Bloom filter

Commonly used for membership checks on large data sets, such as email addresses and URLs. Core principle: each item is run through several hash functions to produce a fingerprint (one or more bytes, far smaller than the original data), and the fingerprint is mapped onto positions in a large bit array. Note: there is a certain false-positive rate. Advantages: high space and time efficiency. Disadvantages: the false-positive rate rises as more elements are stored.
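The principle can be sketched with a `BitSet` and a few derived hash positions. This is a simplified illustration, not a production filter: the class name, the double-hashing scheme based on `String.hashCode()`, and the sizing are all assumptions.

```java
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits; must be positive
    private final int hashCount; // number of positions set per item

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th position from two base hashes (double hashing).
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) bits.set(position(item, i));
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) return false;
        }
        return true;
    }
}
```

The filter never reports a stored item as absent; the only possible error is a false positive, which becomes more likely as the bit array fills up.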

String comparison

KMP algorithm

KMP: the core principle of the Knuth-Morris-Pratt algorithm (abbreviated as KMP) is to use a "partial match table" to skip comparisons that are already known to match, so the search never moves backward in the text.
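A compact sketch of the partial match table and the search that uses it. The class and method names are illustrative.

```java
final class Kmp {
    // table[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] partialMatchTable(String pattern) {
        int[] table = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) k = table[k - 1];
            if (pattern.charAt(i) == pattern.charAt(k)) k++;
            table[i] = k;
        }
        return table;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        if (pattern.isEmpty()) return 0;
        int[] table = partialMatchTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) k = table[k - 1];
            if (text.charAt(i) == pattern.charAt(k)) k++;
            if (k == pattern.length()) return i - k + 1;
        }
        return -1;
    }
}
```

On a mismatch the pattern index falls back through the table while the text index `i` only moves forward, which gives the linear running time.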

Depth first, breadth first

Greedy algorithm

Backtracking algorithm

Pruning algorithm

Dynamic programming

Naive Bayes

Recommendation algorithms

Minimum spanning tree algorithm

Shortest path algorithm

Concurrency

  1. Basic knowledge

    1.1 Advantages and disadvantages of concurrent programming

    Knowledge points: (1) Why use concurrency? (Advantages); (2) Disadvantages of concurrent programming; (3) Easily confused concepts

    1.2 Thread Status and Basic Operations

    Knowledge points: (1) how to create new threads; (2) thread state transitions; (3) basic thread operations; (4) daemon threads;

  2. Concurrency Theory (JMM)

    Java memory model and happens-before rules

    Knowledge points: (1) JMM memory structure; (2) reordering; (3) happens-before rules

  3. Concurrent Keyword

    3.1 Let you thoroughly understand Synchronized

    Knowledge points: (1) how to use synchronized; (2) monitor mechanism; (3) synchronized happens-before relationship; (4) synchronized memory semantics; (5) lock optimization; (6) lock escalation strategy

    3.2 Let you thoroughly understand volatile

    Knowledge points: (1) implementation principle; (2) derivation of happens-before relations; (3) memory semantics; (4) realization of memory semantics

    3.3 Do you think you really understand final?

    Knowledge points: (1) how to use; (2) final reordering rules; (3) final implementation principles; (4) a reference to the object under construction must not escape from the constructor ("this" escape)

    3.4 Summary of the three major properties: atomicity, orderliness, visibility

    Knowledge points: (1) atomicity: synchronized; (2) visibility: synchronized, volatile; (3) orderliness: synchronized, volatile

  4. Lock system

    4.1 Meet Lock and AbstractQueuedSynchronizer (AQS)

    Knowledge points: (1) Lock and synchronized comparisons; (2) AQS design intent; (3) How to implement custom synchronization components using AQS; (4) Rewritable methods; (5) Template methods provided by AQS;

    4.2 In-depth understanding of AbstractQueuedSynchronizer (AQS)

    Knowledge points: (1) AQS synchronization queue data structure; (2) exclusive lock; (3) shared lock;

    4.3 Understanding ReentrantLock Again

    Knowledge points: (1) implementation principle of reentrant locks; (2) implementation principle of fair locks; (3) implementation principle of non-fair locks; (4) comparison of fair locks and non-fair locks

    4.4 In-depth understanding of the read-write lock ReentrantReadWriteLock

    Knowledge points: (1) How to represent read/write status; (2) WriteLock acquisition and release; (3) ReadLock acquisition and release; (4) lock downgrading strategy; (5) Generation of Condition wait queues; (6) Application scenarios

    4.5 The await and signal wait/notification mechanism of Condition in detail

    Knowledge points: (1) Features that are comparable to the wait/notify mechanism of Object; (2) Methods corresponding to wait/notify of Object; (3) Underlying data structures; (4) Await implementation principles; (5) Signal/signalAll implementation principle; (6) combination of await and signal/signalAll;

    4.6 LockSupport Tool

    Knowledge points: (1) main functions; (2) characteristics compared to synchronized blocking wake-up;

  5. Concurrent container

    5.1 concurrent containers ConcurrentHashMap (JDK 1.8 version)

    Knowledge points: (1) key attributes; (2) important internal classes; (3) CAS operations involved; (4) construction methods; (5) put execution flow; (6) get execution flow; (7) resizing mechanism; (8) execution flow of the size-counting method; (9) comparison of ConcurrentHashMap in JDK 1.8 with previous versions

    5.2 CopyOnWriteArrayList of Concurrent Containers

    Knowledge points: (1) realization principle; (2) difference between COW and ReentrantReadWriteLock; (3) application scenario; (4) why there is weak consistency; (5) shortcomings of COW;

    5.3 ConcurrentLinkedQueue for Concurrent Containers

    Knowledge points: (1) implementation principle; (2) data structure; (3) core method; (4) design intent of delayed update of HOPS

    5.4 concurrent container ThreadLocal

    Knowledge points: (1) implementation principle; (2) set method principle; (3) get method principle; (4) remove method principle; (5) ThreadLocalMap

    An in-depth, source-level look at the ThreadLocal memory leak problem

    Knowledge points: (1) ThreadLocal memory leak principle; (2) ThreadLocal best practices; (3) application scenarios

    5.5 Concurrent Container BlockingQueue

    Knowledge points: (1) Basic operation of BlockingQueue; (2) Commonly used BlockingQueue;

    Concurrent container ArrayBlockingQueue and LinkedBlockingQueue implementation principle explained

  6. Thread pool (Executor system)

    6.1 Thread Pool Implementation Principle

    Knowledge points: (1) Why use a thread pool? (2) execution flow; (3) the meaning of each parameter of the constructor; (4) how to close the thread pool; (5) how to configure the thread pool;

    6.2 Thread Pool ScheduledThreadPoolExecutor

    Knowledge points: (1) class structure; (2) common methods; (3) ScheduledFutureTask; (4) DelayedWorkQueue;

    6.3 FutureTask Basic Operation Summary

    Knowledge points: (1) Several states of FutureTask; (2) get method; (3) cancel method; (4) application scenario; (5) implementation of Runnable interface

  7. Atomic operations

    7.1 Atomic Operation Classes in Atomic Packages in Java

    Knowledge points: (1) implementation principle; (2) atomic update basic types; (3) atomic update array types; (4) atomic update reference types; (5) atomic update field types

  8. Concurrency tools

    8.1 Java concurrency tools in plain language: CountDownLatch, CyclicBarrier

    Knowledge Points: (1) CountDownLatch, (2) CyclicBarrier, and (3) Comparison between CountDownLatch and CyclicBarrier

    8.2 Java concurrency tools in plain language: Semaphore, Exchanger

    Knowledge Points: (1) Resource Access Control Semaphore; (2) Data Exchange Exchanger

  9. Concurrent practice

    9.1 An article that lets you thoroughly understand the producer-consumer problem

Java concurrency knowledge map

[Figure: Java concurrency knowledge map (JAVA concurrency knowledge map.png)]

Multithreading

Thread safety

Consistency, transaction

Transaction ACID features

Transaction isolation level

  • Read uncommitted: a transaction can read data that another transaction has not yet committed, so dirty reads are likely.

  • Read committed (RC): a transaction can read data only after the other transaction has committed it. Non-repeatable reads can still occur: if another transaction commits an UPDATE between two reads, the same query returns different data. RC is the default level in most databases, such as SQL Server and Oracle.

  • Repeatable read: within one transaction, every read of the same data returns the same result, but rows inserted by other transactions may still appear (phantom reads). MySQL InnoDB uses this level.

  • Serializable: all transactions are processed serially (at the cost of efficiency).

  • Understanding the Four Isolation Levels of a Transaction

  • Four characteristics of database transactions and transaction isolation levels

  • "InnoDB's Phantom Read Problem"

    • Gives a very clear example of a phantom read.
    • Solved with SELECT ... FOR UPDATE.
  • "Understanding MySQL and InnoDB in One Article"

    • Illustrates dirty reads, non-repeatable reads, and phantom reads with diagrams.

MVCC

lock

Locks and synchronization classes in Java

Fair lock & non-fair lock

A fair lock grants the lock strictly in the order in which threads requested it, so no thread may jump the queue; a non-fair lock allows queue jumping.

  • Fair and Unfair Locks
    • ReentrantLock and synchronized are both non-fair locks by default. ReentrantLock can be configured as a fair lock.
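Fairness in `ReentrantLock` is selected through its constructor. The short sketch below shows both variants; `FairnessDemo` and `incrementUnder` are illustrative names.

```java
import java.util.concurrent.locks.ReentrantLock;

final class FairnessDemo {
    // The no-argument constructor creates a non-fair lock.
    static final ReentrantLock DEFAULT_LOCK = new ReentrantLock();
    // Passing true requests a fair lock: waiting threads acquire it in arrival order.
    static final ReentrantLock FAIR_LOCK = new ReentrantLock(true);

    static int incrementUnder(ReentrantLock lock, int value) {
        lock.lock();
        try {
            return value + 1;
        } finally {
            lock.unlock(); // always release in finally
        }
    }
}
```

Fair locks generally trade some throughput for predictable ordering, so the non-fair default is usually kept unless starvation is a concern.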

Pessimistic lock

Pessimistic locking, if used improperly (too many locks held), can leave large parts of a service waiting. Optimistic locking with retries is the recommended first choice.

Optimistic Lock & CAS

ABA issues

Under high concurrency, a value that CAS reads as A may have been changed to B and back to A in the meantime, so the A it sees is no longer the original A. This can be solved with a version number, similar to version-based optimistic locking in MySQL.
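The version-number fix corresponds to `AtomicStampedReference` in the JDK. The sketch below simulates the A to B to A sequence on a single thread purely for illustration; in practice the intermediate updates would come from other threads.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

final class AbaDemo {
    // Plain CAS: the A -> B -> A change goes unnoticed, so the final CAS succeeds.
    static boolean plainCasAfterAba() {
        AtomicReference<String> ref = new AtomicReference<>("A");
        String observed = ref.get();
        ref.compareAndSet("A", "B");
        ref.compareAndSet("B", "A");
        return ref.compareAndSet(observed, "C");
    }

    // Stamped CAS: every update bumps the stamp, so the stale stamp is rejected.
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int[] stampHolder = new int[1];
        String observed = ref.get(stampHolder);
        int observedStamp = stampHolder[0];
        ref.compareAndSet("A", "B", 0, 1);
        ref.compareAndSet("B", "A", 1, 2);
        return ref.compareAndSet(observed, "C", observedStamp, observedStamp + 1);
    }
}
```

Both classes compare references with `==`, which is why the example uses string literals rather than freshly created objects.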

CopyOnWrite container

A CopyOnWrite container can be read concurrently without locking. It suits read-heavy, write-light concurrent scenarios, such as whitelists, blacklists, and the browsing and updating of product categories. It is not suitable for scenarios that require strong data consistency.
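The weak consistency mentioned above is visible in how `CopyOnWriteArrayList` iterators behave: they read a snapshot taken when the iterator was created. A small sketch, with illustrative names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class CowDemo {
    // Returns what an iterator sees when the list is modified after the iterator was created.
    static List<String> snapshotIteration() {
        CopyOnWriteArrayList<String> whitelist =
                new CopyOnWriteArrayList<>(Arrays.asList("alice", "bob"));
        Iterator<String> it = whitelist.iterator(); // snapshot of the current array
        whitelist.add("carol");                     // the write copies the array
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) seen.add(it.next());
        return seen;
    }
}
```

Readers never block and never throw `ConcurrentModificationException`, but they may not observe the latest write, and every write pays for a full array copy.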

RingBuffer

Reentrant Locks & Non-Reentrant Locks

  • Reentrant and Non-Reentrant Locks

    • Illustrates reentrant and non-reentrant locks with simple code examples.
    • A reentrant lock can be acquired again by the thread that already holds it.
    • Reentrant locks help avoid self-deadlock when a thread re-enters code guarded by a lock it already holds.
    • Reentrant locks in Java: synchronized and java.util.concurrent.locks.ReentrantLock
  • "ReentrantLock (and How It Differs from synchronized): A Summary"

    • synchronized is simple to use; the compiler inserts the lock and unlock operations; it is a non-fair lock.
    • ReentrantLock is more flexible, and its fairness can be configured.
    • When both would work, prefer synchronized.
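Reentrancy can be observed directly through `ReentrantLock.getHoldCount()`. In this minimal sketch, `ReentrancyDemo` is an illustrative class in which one locked method calls another that takes the same lock.

```java
import java.util.concurrent.locks.ReentrantLock;

final class ReentrancyDemo {
    private final ReentrantLock lock = new ReentrantLock();

    // Acquires the lock, then calls a method that acquires it again.
    int outer() {
        lock.lock();
        try {
            return inner();
        } finally {
            lock.unlock();
        }
    }

    private int inner() {
        lock.lock(); // the same thread re-acquires without blocking
        try {
            return lock.getHoldCount(); // number of holds by the current thread
        } finally {
            lock.unlock();
        }
    }

    boolean isFree() {
        return !lock.isLocked();
    }
}
```

With a non-reentrant lock, the second `lock()` call would wait forever for a lock the thread itself is holding.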

Mutexes & shared locks

Mutex (exclusive lock): only one thread can hold the lock at a time. For example, ReentrantLock is a mutex, and the write lock in ReadWriteLock is a mutex. Shared lock: multiple threads can hold the lock at the same time. For example, Semaphore and CountDownLatch are shared locks, and the read lock in ReadWriteLock is a shared lock.

Deadlock

operating system

Principle of computer

CPU

Multi-level cache

A typical CPU has three levels of cache. The closer a level is to the core, the faster and smaller it is. L1 is typically 32 KB, L2 typically 256 KB, and L3 typically 12 MB. A main-memory access takes roughly 200 CPU cycles, while a CPU cache hit takes roughly 1 cycle.

process

TODO

Threads

Coroutines

  • "Python Coroutines, Once and for All: From yield to an Actor Model Implementation"
    • Thread scheduling is the responsibility of the operating system; coroutine scheduling is the responsibility of the program.
    • Compared with threads, coroutines avoid unnecessary operating-system context switches.
    • Switching is most worthwhile when an I/O operation is encountered (because I/O does not occupy the CPU); if no I/O operation is encountered, switching follows the time slice.

Linux

Design Patterns

Six principles of design patterns

  • The Six Principles of Design Patterns
    • Open-closed principle: open for extension, closed for modification; favor abstract classes and interfaces.
    • Liskov substitution principle: a subclass must be usable wherever its base class is used; inherit from abstract classes rather than concrete classes.
    • Dependency inversion principle: depend on abstractions, not on concrete implementations; program to an interface, not to an implementation.
    • Interface segregation principle: several small, isolated interfaces are better than a single large one; keep each interface minimal.
    • Law of Demeter: a software entity should interact with as few other entities as possible, communicating through intermediary classes.
    • Composite reuse principle: prefer composition/aggregation over inheritance.

23 common design patterns

Application scenario

  • "Detailed Design Patterns in the JDK"

    • Structural patterns:

      • Adapter: converts one interface into another, such as java.util.Arrays#asList().
      • Bridge: decouples an abstraction from its implementation so that the two can vary independently, such as JDBC.
      • Composite: lets clients treat individual objects and compositions of objects uniformly. In other words, a method of a type accepts arguments of that same type, such as Map.putAll, List.addAll, and Set.addAll.
      • Decorator: dynamically adds behavior to an object as an alternative to subclassing, such as java.util.Collections#checkedList|Map|Set|SortedSet|SortedMap.
      • Flyweight: uses caching to speed up access to large numbers of small objects, such as java.lang.Integer#valueOf(int).
      • Proxy: substitutes a simple object for a complex or expensive-to-create one, such as java.lang.reflect.Proxy.
    • Creational patterns:

      • Abstract factory: provides a contract for creating a family of related or independent objects without specifying their concrete types, such as java.util.Calendar#getInstance().
      • Builder: defines a separate class that builds instances of another class, simplifying the creation of complex objects, such as java.lang.StringBuilder#append().
      • Factory method: simply a method that returns one concrete object, rather than several, such as java.lang.Object#toString() and java.lang.Class#newInstance().
      • Prototype: lets an instance of a class produce a copy of itself, such as java.lang.Object#clone().
      • Singleton: only one instance exists globally, such as java.lang.Runtime#getRuntime().
    • Behavioral patterns:

      • Chain of responsibility: decouples objects by passing a request from one object to the next in a chain until it is handled, such as javax.servlet.Filter#doFilter().
      • Command: encapsulates an operation in an object so that it can be stored, passed around, and returned, such as java.lang.Runnable.
      • Interpreter: defines the grammar of a language and then parses statements in that grammar, such as java.text.Format and java.text.Normalizer.
      • Iterator: provides a consistent way to access the objects in a collection sequentially, such as java.util.Iterator.
      • Mediator: reduces direct dependencies between classes by routing messages through an intermediary object, such as java.lang.reflect.Method#invoke().
      • Null object: such as java.util.Collections#emptyList().
      • Observer: lets an object flexibly send messages to interested objects, such as java.util.EventListener.
      • Template method: lets subclasses override part of an algorithm rather than rewriting all of it, such as java.util.Collections#sort().
  • "Spring-related design patterns summary"

  • "Design Patterns Used by Mybatis"

Singleton mode

Chain of responsibility model

TODO

MVC

IOC

  • Understanding IoC
  • "Understanding and Interpreting IoC"
    • Direct control: the traditional approach of creating dependencies with new. Inversion of control: a container injects the objects.
    • Purpose: decoupling modules.
    • DI (dependency injection): a component only cares about using a resource, not about where the resource comes from.
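The difference between direct control and injection can be shown without any framework. In this sketch all types (`MessageSender`, `EmailSender`, `DirectNotifier`, `InjectedNotifier`) are hypothetical, and the caller plays the role that an IoC container such as Spring would normally play.

```java
// Hypothetical dependency used only for this illustration.
interface MessageSender {
    String send(String text);
}

final class EmailSender implements MessageSender {
    public String send(String text) { return "email:" + text; }
}

// Direct control: the class creates its own dependency with new.
final class DirectNotifier {
    private final MessageSender sender = new EmailSender();
    String notifyUser(String text) { return sender.send(text); }
}

// Inversion of control: the dependency is handed in from outside (constructor injection).
final class InjectedNotifier {
    private final MessageSender sender;
    InjectedNotifier(MessageSender sender) { this.sender = sender; }
    String notifyUser(String text) { return sender.send(text); }
}
```

`InjectedNotifier` depends only on the interface, so a test or a container can supply any implementation, for example `new InjectedNotifier(text -> "stub:" + text)`.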

AOP

UML

Microservice idea

Conway's Law

  • "The Theoretical Basis of Microservice Architecture - Conway's Law"

    • First law: the way an organization communicates is expressed through its system design; that is, the layout of the architecture resembles the organizational structure.
    • Second law: there is never enough time to do something perfectly, but there is always enough time to get it done. You cannot do everything in one go, so deliver what can be delivered first.
    • Third law: there is a potential homomorphism between linear systems and linear organizational structures. You reap what you sow: build independent, autonomous subsystems to reduce communication costs.
    • Fourth law: large system organizations are more prone to decomposition than small ones. What has long been united will divide: divide and conquer.
  • "Microservice Architecture Core 20"

Operations & Statistics & Technical Support

General monitoring

Command line monitoring tool

APM

APM — Application Performance Management

Statistical Analysis

Jenkins

Environment separation

Separate the development, testing, and production environments.

Automated operations and maintenance

Ansible

Puppet

Chef

test

TDD theory

  • An In-Depth Look at TDD (Test-Driven Development)
    • Write the functional code against test cases written first; a core practice of XP (Extreme Programming).
    • Benefits: focus on one point at a time, reducing mental load; accommodate changing requirements and improve code design; clarify requirements up front.

unit test

Stress testing

Full-link stress testing

A/B testing, grayscale (canary) release, blue-green deployment

Virtualization

KVM

Xen

OpenVZ

Container technology

Docker

Cloud technology

OpenStack

DevOps

Document management

Middleware

Web Server

Nginx

OpenResty

Apache Httpd

Tomcat

Architecture principle

Tuning plan

  • Tomcat Tuning Solution

    • Enable NIO mode (or APR); tune the thread pool; disable the AJP connector (it is not needed in an Nginx + Tomcat architecture).
  • "Tomcat's HTTP and AJP Protocols"

  • AJP vs. HTTP Comparison and Analysis

    • The AJP protocol (port 8009) reduces the number of connections to the front-end server (such as Apache, which must also support AJP) and improves performance through long-lived connections.
    • Under high concurrency, AJP performs better than HTTP.

Jetty

  • "How Jetty Works and Comparison with Tomcat"
  • "Compared advantages of jetty and tomcat"
    • Architecture: Jetty's architecture is much simpler than Tomcat's.
    • Performance: Jetty and Tomcat differ little. Jetty uses NIO by default, which gives it an edge in handling I/O requests. Tomcat uses BIO by default; it suits a small number of very busy connections and performs poorly when serving static resources.
    • Other aspects: Jetty is quicker to adopt, simpler to modify, and supports new Servlet specifications better; Tomcat's support for JEE and Servlet is more comprehensive.

Caching

Local cache

Client Cache

Server cache

Web Cache

Memcached

Redis

  • Redis Tutorial

  • "How Redis Works Under the Hood"

    • Lists are stored as ziplists, a compressed list that saves memory because everything it stores sits in one contiguous block of memory.
    • Sorted sets are stored as skiplists. A lookup starts from the highest level; the time complexity is comparable to that of a red-black tree; the structure is easy to implement, lock-free, and concurrency-friendly.
  • "Redis Persistence Methods"

    • RDB mode: periodically saves snapshots; commonly used for disaster recovery. Advantages: the snapshot is written by a forked child process, so the main process is not slowed down, and restoring a large data set from RDB is faster than from AOF. Disadvantage: data written since the last snapshot can be lost.
    • AOF mode: saves a log of operations. Advantage: less data is lost on recovery. Disadvantages: larger files and slower responses.
    • The two modes can also be combined.
  • "Distributed Cache - Sequence 3 - Atomic Operations and CAS Optimistic Locking"

Structure

Recovery strategy

Tair

  • Official website
  • "The Comparison of Tair and Redis"
  • Features: the number of backup nodes is configurable, and data is replicated to backup nodes asynchronously.
  • Uses a consistent hashing algorithm.
  • Architecture: similar in design to Hadoop, with a Configserver and DataServers. The Configserver monitors the DataServers through heartbeats and itself runs as a master/standby pair.
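To make the consistent hashing idea concrete, here is a generic ring sketch. It is not Tair's implementation: the class name, the use of MD5 as the hash, and the virtual-node scheme are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes; // points placed on the ring per physical node

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.remove(hash(node + "#" + i));
    }

    // A key belongs to the first node point at or after its hash, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no nodes in the ring");
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Uses the first 8 bytes of an MD5 digest as the ring position.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (digest[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

When a node is removed, only the keys that mapped to that node move to another node; all other keys keep their owner, which is the property that makes the scheme attractive for caches.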

Several storage engines:

  • MDB: fully in-memory; can be used to store data such as sessions.
  • RDB (similar to Redis): lightweight, with operations such as AOF removed; supports RESTful operations.
  • LDB (LevelDB storage engine): persistent storage that serves as the persistence layer for RDB. LevelDB is Google's implementation and is relatively efficient; its theoretical basis is the LSM (Log-Structured Merge Tree) algorithm. Data is first modified in memory and, once a certain amount accumulates, written to disk together with the older data aggregated in memory, which makes storage more efficient than a hash-based approach.
  • Tair uses shared memory to store data. If the service process goes down (but not the server itself), the data is still available after the service restarts.

message queue

Message bus

The message bus is a layer of encapsulation over the message queue that provides a unified entry point and unified management and control, lowering the cost of integration.

Order of messages

RabbitMQ

Supports transactions and both push and pull modes; suitable for scenarios that require reliable message delivery.

RocketMQ

Implemented in Java; supports both push and pull modes; throughput is lower than Kafka's. Message ordering can be guaranteed.

ActiveMQ

Pure Java implementation, compatible with JMS, can be embedded in Java applications.

Kafka

High throughput; uses pull mode. Suitable for I/O-heavy scenarios such as log synchronization.

Redis message push

The producer and consumer roles are implemented entirely on the client side. A list provides a pull-mode queue, and the BLPOP command blocks while waiting for messages.

ZeroMQ

TODO

Scheduled tasks

Single-machine scheduling

Distributed scheduling

RPC

Dubbo

** SPI ** TODO

Thrift

gRPC

The server can be authenticated and encrypted. In an external network environment, data security can be guaranteed.

Database middleware

Sharding Jdbc

Log system

Log collection

Configuration Center

Servlet 3.0 Asynchronous Feature Available to Configuration Center Clients

API Gateway

Main responsibilities: request forwarding, security authentication, protocol conversion, disaster recovery.

Network

Protocols

OSI seven-layer model

TCP/IP

HTTP

HTTP2.0

HTTPS

Network model

  • "Web Optimization Essentials: The Five I/O Models and the Three Web Server Working Modes"

    • Five I/O models: blocking I/O, non-blocking I/O, I/O multiplexing, signal-driven (event-driven) I/O, and asynchronous I/O. The first four are synchronous: they differ in the first stage of an I/O operation and are the same in the second stage. Only the last one is asynchronous.
    • Three web server working modes: Prefork (multi-process), Worker (multi-threaded), and Event.
  • Summary of differences between select, poll, and epoll

    • select, poll, and epoll are all essentially synchronous I/O, because once a read/write event is ready the caller must still perform the read or write itself, and that step blocks.
    • select limits the number of file descriptors it can watch; the default is 1024 (2048 on x64). Handling 1 million concurrent connections would therefore take about 1,000 processes, with high switching overhead. poll uses a linked-list structure and has no such limit.
    • When woken up, select and poll must traverse the entire fd set, whereas epoll only needs to check whether its ready list is empty; its callback mechanism saves a lot of CPU time. select and poll copy the fd set from user space to kernel space on every call, whereas epoll copies it only once.
    • As concurrency grows, the performance of poll gradually declines. epoll uses a red-black tree and its performance stays stable as the number of connections increases.
  • "select, poll, epoll comparison"

    • When the number of connections is small and the connections are very active, select and poll may perform better than epoll, because epoll's notification mechanism requires many function callbacks.
  • "In-depth understanding of Java NIO"

    • NIO is a synchronous, non-blocking I/O model. Synchronous means the thread keeps polling for I/O events; non-blocking means the thread can do other work while waiting for I/O.
  • "The difference between BIO and NIO, AIO"

  • "Two Efficient Server Design Models: Reactor and Proactor Models"

Epoll

Java NIO

Kqueue

Long connections and short connections

Frameworks

Zero-copy

Hessian

Protobuf

database

Basic theory

Three normal forms of database design

  • "The Three Normal Forms and Five Constraints of Databases"
    • First normal form (1NF): every column (field) in a table must be the smallest indivisible unit, which ensures the atomicity of each column.
    • Second normal form (2NF): in addition to satisfying 1NF, every column must depend on the primary key; no column may be unrelated to the primary key. In other words, a table describes only one thing.
    • Third normal form (3NF): in addition to satisfying 2NF, every column must depend on the primary key directly, not indirectly (that is, not through another non-key column).

MySQL

principle

InnoDB

optimization

index

Clustered index, non-clustered index

MyISAM uses non-clustered indexes; InnoDB uses clustered indexes.

Composite index

Explain

NoSQL

MongoDB

  • MongoDB Tutorial
  • "The Advantages and Disadvantages of Mongodb vs. Relational Databases"
    • Advantages: weak consistency (eventual consistency), which favors user access speed; built-in GridFS for large-capacity storage; schema-less, so no structure needs to be defined in advance; built-in sharding; richer third-party support than other NoSQL databases; strong performance.
    • Disadvantages (as of the article's writing): MongoDB does not support transactions; it uses a lot of disk space; it lacks maintenance tools as mature as MySQL's, which matters to both development and IT operations.

Hbase

search engine

Search Engine Principle

Lucene

Elasticsearch

Solr

Sphinx

performance

Performance Optimization Methodology

Capacity assessment

CDN network

connection pool

Performance tuning

Big Data

Stream processing

Storm

Flink

Kafka Stream

Application scenario

E.g:

  • Real-time advertising statistics;
  • Real-time updates of user-profile tags for recommendation systems;
  • Real-time health monitoring of online services;
  • Real-time rankings;
  • Real-time statistics.

Hadoop

HDFS

MapReduce

Yarn

Spark

Security

Web security

XSS

CSRF

SQL injection

Hash DoS

Script injection

Vulnerability Scan Tool

Verification code (CAPTCHA)

DDoS protection

User privacy protection

  1. Never store user passwords in plain text; hash them with a dynamic (per-user) salt.
  2. When ID numbers or mobile numbers must be displayed, replace some of the characters with "*".
  3. Let users control whether their contact information is displayed.
  4. TODO
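Items 1 and 2 can be sketched with the standard JDK APIs. The choice of PBKDF2 with HMAC-SHA256, the iteration count, and the helper names below are assumptions for illustration; the source only asks for a salted, non-plaintext password store.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHasher {
    private static final int ITERATIONS = 100_000; // assumed work factor
    private static final int KEY_BITS = 256;

    // A fresh random salt per user ("dynamic salt").
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static byte[] hash(char[] password, byte[] salt) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean verify(char[] password, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(password, salt), expected); // constant-time compare
    }

    // Replaces the middle of a value with '*', keeping keepStart and keepEnd characters visible.
    static String mask(String value, int keepStart, int keepEnd) {
        if (value.length() <= keepStart + keepEnd) return value;
        StringBuilder sb = new StringBuilder(value);
        for (int i = keepStart; i < value.length() - keepEnd; i++) sb.setCharAt(i, '*');
        return sb.toString();
    }
}
```

Only the salt and the derived hash are stored; at login the submitted password is hashed with the stored salt and compared.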

Serialization vulnerability

  • "The Library's Fault? An Analysis of the Universal Java Deserialization Vulnerability"

Encryption and decryption

Symmetric encryption

  • Common symmetric encryption algorithms
    • DES, 3DES, Blowfish, AES
    • DES uses a 56-bit key, Blowfish uses a variable-length key of 32 to 448 bits, and AES uses 128-, 192-, or 256-bit keys.
    • Because the DES key is too short (only 56 bits), DES has been replaced by AES, which also benefits from hardware acceleration and performs well.
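For reference, a minimal AES round trip with the JDK's `javax.crypto` API might look like the sketch below. The GCM mode, the 12-byte nonce, and the 128-bit tag are choices made for this example and are not prescribed by the text above.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class AesGcmDemo {
    // bits may be 128, 192, or 256.
    static SecretKey newKey(int bits) {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(bits);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // A nonce must never be reused with the same key.
    static byte[] newNonce() {
        byte[] nonce = new byte[12];
        new SecureRandom().nextBytes(nonce);
        return nonce;
    }

    static byte[] encrypt(SecretKey key, byte[] nonce, byte[] plaintext) {
        return run(Cipher.ENCRYPT_MODE, key, nonce, plaintext);
    }

    static byte[] decrypt(SecretKey key, byte[] nonce, byte[] ciphertext) {
        return run(Cipher.DECRYPT_MODE, key, nonce, ciphertext);
    }

    private static byte[] run(int mode, SecretKey key, byte[] nonce, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, nonce)); // 128-bit auth tag
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same key is used for both directions, which is what makes the scheme symmetric; the ciphertext is 16 bytes longer than the plaintext because of the authentication tag.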

Hash algorithm

Asymmetric encryption

  • Common asymmetric algorithms
    • RSA, DSA, ECDSA (Elliptic Curve Digital Signature Algorithm)

    • Unlike RSA, DSA can only be used for digital signatures and cannot encrypt or decrypt data. Its security is comparable to RSA's, and it is faster than RSA.

    • A 256-bit ECC key offers security comparable to a 3072-bit RSA key.
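A signature round trip with a 256-bit elliptic-curve key can be sketched with the JDK's `java.security` API. The helper names are illustrative, and `SHA256withECDSA` is one of several standard signature algorithm names.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

final class EcdsaDemo {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(256); // 256-bit curve
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Signs with the private key.
    static byte[] sign(KeyPair keyPair, byte[] message) {
        try {
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(keyPair.getPrivate());
            signer.update(message);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Verifies with the public key.
    static boolean verify(KeyPair keyPair, byte[] message, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(keyPair.getPublic());
            verifier.update(message);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only the holder of the private key can produce a valid signature, while anyone with the public key can check it.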

Blockchain encryption technology

Server security

Data Security

data backup

TODO

Network isolation

Internal and external network separation

TODO

Jump server

In an environment where the internal and external networks are separated, log in to production hosts through a jump server (bastion host).

Authorization, authentication

RBAC

OAuth2.0

2FA - Two-factor authentication for enhanced login authentication

A common practice is login password + phone verification code (or a hardware token, similar to the USB key used for online banking).
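
As one possible form of the second factor, a time-based one-time password (TOTP, RFC 6238) can be computed with the standard library alone. This is a minimal sketch assuming HMAC-SHA1 and a 30-second step; the function name `totp` is made up for this example, and a real deployment also needs secret provisioning and tolerance for clock drift.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code with HMAC-SHA1."""
    counter = unix_time // step                       # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                           # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The server and the user's device share `secret`; both compute the code for the current time, and the server compares the submitted value with its own.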

Common open source framework

Open source licenses

Logging framework

Log4j, Log4j2

Logback

ORM

MyBatis:

Network framework

TODO

Web framework

Spring family

Spring

Spring Boot

Spring Cloud

Utility frameworks

Distributed design

Scalable design

Stability & High Availability

  • "System Design: Some Technical Solutions for High Availability Systems"

    • Scalability: horizontal and vertical scaling. Redundant deployment avoids single points of failure.
    • Isolation: 1. Keep a single business from occupying all resources and keep businesses from affecting one another. 2. Isolate data centers to avoid single points of failure.
    • Decoupling: reduce maintenance costs and coupling risk. Fewer dependencies mean less mutual impact.
    • Rate limiting: sliding window counting, the leaky bucket algorithm, the token bucket algorithm, and others. Keeps the system stable under burst traffic.
    • Degradation: release the resources of non-core functions in an emergency. Sacrifice non-core business to keep core business highly available.
    • Circuit breaking: when abnormal conditions exceed a threshold, the breaker opens and calls fail fast. This reduces the impact of unstable external dependencies on core services.
    • Automated testing: comprehensive testing reduces faults caused by releases.
    • Grayscale (canary) release: a compromise between speed and safety that effectively reduces release failures.
  • "About Highly Available Systems"

    • Design principles: data is not lost (persistence); services are highly available (service replicas); absolute 100% availability is very hard, so the goal is as many nines as possible, such as 99.999% (only about 5 minutes of accumulated downtime per year).
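
The "5 minutes per year" figure follows directly from the availability percentage. A small worked calculation, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum accumulated downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)
```

For 99.999% this gives roughly 5.26 minutes per year, and for 99.9% roughly 525.6 minutes (about 8.8 hours).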

Hardware load balancing

Software load balancing

Rate limiting

  • "On Rate Limiting in High-Concurrency Systems"
    • Counter: a sliding window counter controls the number of requests per unit of time; simple and crude.
    • Leaky bucket algorithm: a fixed-capacity bucket that drains at a constant rate; requests are dropped when the bucket is full. Commonly used.
    • Token bucket algorithm: a fixed-capacity bucket to which tokens are added at a fixed rate. A request must obtain a token before it is processed; if none is available, the request is discarded or queued. Controlling the rate at which tokens are added controls the overall request rate. RateLimiter in Guava is a token bucket implementation.
    • Nginx rate limiting: through modules such as limit_req (request rate) and limit_conn (number of concurrent connections).
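
The token bucket described above can be sketched in a few lines. This is an illustrative, single-threaded version and not Guava's RateLimiter; the class name, the `try_acquire` method, and the injectable `clock` parameter are assumptions made so the behavior can be demonstrated without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Fixed-capacity bucket refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.last = clock()

    def try_acquire(self, n: float = 1.0) -> bool:
        """Take n tokens if available; return False (reject) otherwise."""
        now = self.clock()
        # Lazily add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A caller that gets `False` can either drop the request or queue it, matching the two options listed above. A multi-threaded service would need a lock around `try_acquire`.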

Application layer disaster recovery

  • "Avalanche Fighters: Principles and Uses of Fused Hystrix"

    • Causes of the avalanche effect: hardware failures, program bugs, retries that amplify traffic, and surges of user requests.
    • Avalanche countermeasures: rate limiting, improved caching (cache preloading, changing synchronous calls to asynchronous ones), automatic scaling, and degradation.
    • Hystrix design principles:
      • Resource isolation: Hystrix avoids service avalanches by assigning each dependent service its own thread pool.
      • Circuit breaker switch: service health = failed requests / total requests; a configured threshold and a sliding window control the switch.
      • Command pattern: wrap the service invocation logic by extending HystrixCommand.
  • Cache Penetration, Cache Breakdown, Cache Avalanche Solution Analysis

  • "Cache Breakdown, Failure, and Hot Key Problems"

    • Main strategies for the moment a key expires: use a lock on a single machine; use a distributed lock; never expire the key.
    • Hot keys: store hot data separately; use a local cache; split the key into multiple subkeys.
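
The circuit breaker switch listed under the Hystrix design principles can be illustrated with a small count-based sliding window. This sketch is not Hystrix's implementation: the class, its parameters, and the simplified half-open handling (the window is simply cleared after the cooldown) are assumptions for illustration.

```python
import time
from collections import deque


class CircuitBreaker:
    """Opens when the failure ratio over the last `window` calls reaches `threshold`."""

    def __init__(self, threshold=0.5, window=10, cooldown=5.0, clock=time.monotonic):
        self.threshold = threshold
        self.results = deque(maxlen=window)   # True means the call failed
        self.cooldown = cooldown
        self.clock = clock
        self.opened_at = None                 # None means the breaker is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()             # open: fail fast, do not call func
            self.opened_at = None             # cooldown over: allow trial calls
            self.results.clear()
        try:
            result = func()
        except Exception:
            self.results.append(True)
            window_full = len(self.results) == self.results.maxlen
            if window_full and sum(self.results) / len(self.results) >= self.threshold:
                self.opened_at = self.clock() # trip the breaker
            return fallback()
        self.results.append(False)
        return result
```

While the breaker is open the protected function is never invoked, which is what relieves the unstable dependency and lets callers degrade quickly.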

Cross-data-center disaster recovery

  • "Multi-Data-Center Active-Active Deployment Experience"

    • Data is synchronized through self-developed middleware.
  • "Practical Experience with Multi-Site Active-Active (Dual-Site Active-Active)"

    • Pay attention to latency. Multiple calls across data centers multiply the delay.
    • Dedicated lines between data centers are quite likely to fail, so build fault tolerance at both the operations level and the program level.
    • Do not rely on dual writes in the program; there must be an automatic synchronization mechanism.
    • With high latency or poor network quality, the quality of data synchronization must be considered.
    • Separate core business from secondary business and handle them individually; it may even be enough to cover only the core business.
    • Monitoring, deployment, and testing for multi-site active-active must keep pace.
    • Where the business allows, consider partitioning users, especially for games and email services.
    • Control the size of messages sent across data centers; the smaller the better.
    • Consider container virtualization such as Docker to improve dynamic scheduling.
  • Disaster recovery technology and construction experience

Disaster recovery drill process

Smooth start

Database expansion

Read-write separation mode

Sharding mode

  • "Issues and Solutions to Consider When Splitting Databases and Tables"

    • Middleware: lightweight: sharding-jdbc, TSharding; heavyweight: Atlas, MyCAT, Vitess, etc.
    • Problems: transactions, joins, migration, scaling out, ID generation, paging, etc.
    • Transaction compensation: reconcile data; compare against logs; synchronize periodically with an authoritative data source.
    • Database-splitting strategies: by numeric range; by modulus; by date, etc.
    • When to split: as a rule of thumb, at around 50 million records in a single MySQL database or 100 million in a single Oracle database.
  • "MySQL Table Splitting and Table Partitioning in Detail"

    • Partitioning: an internal MySQL mechanism that is transparent to the client. Data is stored in different files, but it still appears to be a single table.
    • Table splitting: physically create separate tables; the client must manage routing to the sub-tables.
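
Client-side routing for the modulus strategy mentioned above might look like the following. The database and table naming scheme (`order_db_N`, `t_order_N`) and the shard counts are hypothetical values chosen for this sketch.

```python
def route(user_id: int, db_count: int = 4, tables_per_db: int = 8) -> tuple[str, str]:
    """Map a numeric sharding key to a (database, table) pair by modulus."""
    slot = user_id % (db_count * tables_per_db)   # 32 logical slots by default
    db_index = slot // tables_per_db              # which database holds the slot
    table_index = slot % tables_per_db            # which table inside that database
    return f"order_db_{db_index}", f"t_order_{table_index}"
```

Because the mapping depends on `db_count` and `tables_per_db`, changing either one moves existing rows, which is why migration and scaling out appear in the list of problems above.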

Service governance

Service Registration and Discovery

  • "Never Lose Contact! How to Achieve Service Discovery in a Microservice Architecture?"

    • Client-side service discovery: the client queries the registry directly and is itself responsible for load balancing. Eureka uses this approach.
    • Server-side service discovery: the client reaches service instances through a load balancer.
  • "SpringCloud Service Registry Comparison: Consul vs Zookeeper vs Etcd vs Eureka"

    • CAP support: Consul (CA), ZooKeeper (CP), etcd (CP), Eureka (AP)
    • The author thinks Consul's support for Spring cloud is better.
  • "Zookeeper based service registration and discovery"

    • Advantages: simple API; used by Pinterest and Airbnb; multi-language support; the watcher mechanism pushes configuration so that changes take effect quickly.

Service Routing Control

  • Distributed Services Framework Study Notes 4 Service Routing
    • Principle: Transparent routing
    • Load balancing policies: random, round robin, service call latency, consistent hashing, sticky connections
    • Local-first routing strategies: injvm (prefer services inside the same JVM) and initial (prefer services on the same physical machine); in principle, find the nearest service.
    • Configuration method: unified registry, local configuration, and dynamic delivery.
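
Two of the load balancing policies above, round robin and consistent hashing, can be sketched as follows. The class and function names, the use of MD5 as the ring hash, and the choice of 100 virtual nodes per server are assumptions for this illustration.

```python
import bisect
import hashlib
import itertools


def _hash(key: str) -> int:
    """Stable 64-bit hash; MD5 is used here only for its even distribution."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hash ring with virtual nodes; a key goes to the next node clockwise."""

    def __init__(self, nodes, replicas: int = 100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def pick(self, key: str) -> str:
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def round_robin(nodes):
    """Endless iterator that hands out nodes in turn."""
    return itertools.cycle(nodes)
```

With consistent hashing, removing a server remaps only the keys that server owned, so requests for all other keys keep landing on the same instance.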

Distributed consensus

CAP and BASE theory

  • "Discussing CAP Theory, BASE Theory from Distributed Consistency"
    • Consistency classification: strong consistency (consistent immediately); weak consistency (consistent within some unit of time, such as seconds); eventual consistency (a form of weak consistency in which the data becomes consistent within a certain period)
    • CAP: Consistency, Availability, Partition tolerance (partitions caused by network failures)
    • BASE: Basically Available, Soft state, and Eventually consistent
    • The core idea of BASE theory: even when strong consistency cannot be achieved, each application can use an approach suited to its own business characteristics to make the system eventually consistent.

Distributed lock

  • "Several Implementations of Distributed Locks"

    • Database-based distributed locks: Advantages: simple and easy to understand. Disadvantages: a single point of failure, potentially high database overhead, not reentrant.
    • Cache-based distributed locks: Advantages: non-blocking, good performance. Disadvantages: careless operation can easily leave a lock that is never released.
    • ZooKeeper distributed locks: implemented with ephemeral sequential nodes; the client whose node has the smallest sequence number holds the lock. Advantages: the cluster transparently removes the single point of failure, avoids locks that are never released, and locks can be reentrant. Disadvantages: performance is lower than with a cache, and throughput decreases as the ZooKeeper cluster grows.
  • "Zookeeper Based Distributed Lock"

    • Clear principle description + Java code example.
  • "jedisLock-redis distributed lock implementation"

    • Based on SETNX (set if not exists): if the key already exists it returns false; otherwise it sets the key and returns true. Expiration times are supported.
  • Memcached and Redis Distributed Locking Scheme

    • Use memcached's add operation (as opposed to set), which returns false when the key already exists.
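
The "set if not exists, with expiration" semantics behind the cache-based locks above can be demonstrated without a real server. In this sketch `InMemoryStore` is a hypothetical stand-in for the cache, not a Redis or memcached client, and `try_lock` and `unlock` are illustrative helper names. The random token ensures that only the owner can release the lock.

```python
import time
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a cache server with set-if-absent and TTL."""

    def __init__(self, clock=time.monotonic):
        self._data = {}                      # key -> (value, expires_at)
        self._clock = clock

    def set_if_absent(self, key, value, ttl):
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False                     # key exists and has not expired
        self._data[key] = (value, now + ttl)
        return True

    def delete_if_equal(self, key, value):
        current = self._data.get(key)
        if current is not None and current[0] == value and current[1] > self._clock():
            del self._data[key]
            return True
        return False


def try_lock(store, name, ttl=10.0):
    """Return an owner token on success, or None if someone else holds the lock."""
    token = uuid.uuid4().hex
    return token if store.set_if_absent(name, token, ttl) else None


def unlock(store, name, token):
    """Release the lock only if this token still owns it."""
    return store.delete_if_equal(name, token)
```

The expiration time is what prevents a crashed client from holding the lock forever, at the cost that a slow client may lose the lock before it finishes.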

Distributed consensus algorithm

PAXOS

Zab

Raft

Gossip

Two-phase commit, multi-phase commit

Idempotency

  • "Distributed Systems - Idempotency Design"
    • The role of idempotency: when a resource is idempotent, the requester does not need to worry that repeated calls will cause errors.
    • Common means to guarantee idempotency: MVCC (similar to optimistic locking), deduplication table (unique index), pessimistic locking, one-time token, serial number mode.
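
The deduplication table approach can be sketched with SQLite from the standard library. The table names, the single-account schema, and the `deposit` function are hypothetical; the point is that the unique key on `request_id` and the balance update share one transaction.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a dedup table and one account row."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE dedup (request_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    db.execute("INSERT INTO account (id, balance) VALUES (1, 0)")
    db.commit()
    return db


def deposit(db: sqlite3.Connection, request_id: str, amount: int) -> bool:
    """Apply the deposit once per request_id; return False for a duplicate."""
    try:
        with db:  # one transaction: commit on success, roll back on error
            db.execute("INSERT INTO dedup (request_id) VALUES (?)", (request_id,))
            db.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False  # the unique key rejected a repeated request
```

A client that times out can safely retry with the same `request_id`; the second attempt changes nothing.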

Distributed consensus solution

Distributed Leader node election

Flexible Transactions

  • "Traditional Transactions and Flexible Transactions"
    • Based on BASE theory: basically available, soft state, and eventually consistent.
    • Solutions: log records + compensation (forward retry or rollback); message retry (requires the program to be idempotent); "lock-free design" using an optimistic locking mechanism.

Distributed File System

Unique ID generation

Globally unique ID

  • Generating Globally Unique Id Summary in Highly Concurrent Distributed Systems

    • Twitter scheme (Snowflake algorithm): 41-bit timestamp + 10-bit machine identifier (derived from, for example, the IP or server name) + 12-bit sequence number (local counter)
    • Flickr scheme: MySQL auto-increment ID + "REPLACE INTO XXX:SELECT LAST_INSERT_ID();"
    • UUID: Disadvantages: unordered, long strings that take up space and hurt retrieval performance.
    • MongoDB scheme: use ObjectId. Disadvantage: not auto-incrementing.
  • How TDDL SEQUENCE Works in a Distributed Environment

    • Create a sequence table in the database to record the maximum id currently allocated.
    • Each client host takes an id range (such as 1000~2000), caches it locally, and updates the maximum id recorded in the sequence table.
    • Different client hosts use different id ranges. When a range is used up, the host fetches a new one; optimistic locking controls concurrency.
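
The Snowflake bit layout listed above (41-bit timestamp, 10-bit machine identifier, 12-bit sequence) can be sketched as follows. This is an illustration rather than Twitter's code: the `EPOCH_MS` value is an assumed custom epoch, and the class and parameter names are invented for the example.

```python
import threading
import time

EPOCH_MS = 1288834974657  # assumed custom epoch; any fixed instant in the past works


class Snowflake:
    """Builds 64-bit ids: timestamp << 22 | machine_id << 12 | sequence."""

    def __init__(self, machine_id: int, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:          # 4096 ids already used this millisecond
                    while now <= self.last_ms:  # wait for the next millisecond
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence
```

Ids from one machine are strictly increasing, and ids from different machines differ in the machine bits, so no coordination is needed at generation time.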

Consistent hashing algorithm

Design Ideas & Development Models

  • "My Understanding of Domain-Driven Design (DDD)"

    • Concept: DDD was proposed mainly to address the separation between the stages of the traditional software development process (analysis, design, coding), which can make the delivered software inconsistent with the requirements, either because the initial analysis was unclear or because information was lost as it flowed through development. DDD emphasizes that everything centers on the domain and stresses the role of the domain expert. The domain model is defined first and then guides development (hence "driven").
    • Process: understand the domain, split the domain, and refine the domain; the accuracy of the model depends on how deeply the domain is understood.
    • Design: DDD offers modeling tools such as aggregates, entities, value objects, factories, repositories, domain services, and domain events to help with domain modeling.
  • "Summary of Domain Driven Design Basics"

    • Domain: essentially a problem domain, such as an e-commerce system or a forum system.
    • Bounded Context: describes the relationship between subdomains; it can be understood simply as a subsystem or component module.
    • Domain Model: the core of DDD is to establish the correct domain model (using a common description language, tools, and the domain's ubiquitous language). The model reflects the essence of the business requirements, including entities and processes, and runs through software analysis, design, and development. Common ways of expressing a domain model: diagrams, code, or text.
    • Ubiquitous language: a language or tool that domain experts, developers, and designers can all understand directly.
    • The classic layered architecture is a four-layer model: user interface/presentation layer, application layer, domain layer, and infrastructure layer.
    • Patterns used:
      • Association: keep associations as few as possible and unidirectional where possible, to reduce overall complexity.
      • Entity: has a unique identity within the domain; an entity's attributes should be as few and as clear as possible.
      • Value Object: has no unique identity and its attribute values are immutable; small, simple objects such as Date.
      • Domain Service: coordinates multiple domain objects; it has only methods and no state (no data). Services can be divided into application layer services, domain layer services, and infrastructure layer services.
      • Aggregate, Aggregate Root: an aggregate defines a set of related objects with a cohesive relationship; the aggregate root is the only element of the aggregate that outside objects may reference; an aggregate must be modified within a single transaction. In most domain models, 70% of aggregates have only one entity and 30% have only 2 to 3 entities. If an aggregate has only one entity, that entity is the aggregate root; if it has several, consider which object in the aggregate has independent meaning and can interact directly with the outside.
      • Factory: similar to the Factory pattern among design patterns.
      • Repository: persists objects to the database and manages them; repositories are designed only for aggregates.
  • "The Road to Implementing Domain-Driven Design (DDD)"

    • Aggregation: For example, a Car contains components such as Engine, Wheel, and Tank.
  • "Domain-Driven Design Series (2): Analysis of the Concepts, Differences, and Uses of VO, DTO, DO, and PO"
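
The difference between a value object, an entity, and an aggregate root described above can be made concrete with a small model. The `Money`, `OrderLine`, and `Order` classes are hypothetical examples, not taken from the cited articles.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Money:
    """Value object: no identity, immutable, compared by value."""
    amount: int
    currency: str


@dataclass(frozen=True)
class OrderLine:
    """Part of the Order aggregate; only reachable through the root."""
    sku: str
    unit_price: Money
    quantity: int


@dataclass(eq=False)
class Order:
    """Entity and aggregate root: identified by order_id, guards its own lines."""
    order_id: str
    lines: List[OrderLine] = field(default_factory=list)

    def add_line(self, sku: str, unit_price: Money, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")  # invariant enforced by the root
        self.lines.append(OrderLine(sku, unit_price, quantity))

    def total_amount(self) -> int:
        return sum(line.unit_price.amount * line.quantity for line in self.lines)

    def __eq__(self, other) -> bool:
        # Entities are equal when their identities match, whatever their attributes.
        return isinstance(other, Order) and self.order_id == other.order_id

    def __hash__(self) -> int:
        return hash(self.order_id)
```

Two `Money` instances with the same fields are interchangeable, while two `Order` instances are the same order only if they share an `order_id`.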

CQRS - Command Query Responsibility Segregation

Anemic and rich domain models

  • "Anemic and Rich Models Explained, with Some Experience"
    • Bloodless model: parent and child are defined separately and know nothing of each other. Neither entity definition contains business logic; they are related through external services.
    • Anemic model: the parent knows the child and the child knows the parent; part of the business logic is placed in the entity. Advantages: one-way dependencies between layers, clear structure, easy to maintain. Disadvantages: does not follow OO thinking; compared with the rich model, the service layer is heavier.
    • Rich model: similar to the anemic model; the difference lies in how the business logic is divided. Advantages: the service layer is relatively thin, acts only as a facade, and does not deal with the DAO; follows OO thinking. Disadvantages: dependencies are no longer one-way (DO and DAO depend on each other), and dividing logic with the service layer easily causes confusion.
    • Bloated model: an extreme case that removes the service layer and puts all business logic in the DO. Advantages: follows OO thinking and simplifies layering. Disadvantages: exposes too much information, and much non-DO logic is forced into the DO. This pattern should be avoided.
    • The author advocates using the anemic model.
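
The two ends of this spectrum can be contrasted in a few lines. The account classes below are hypothetical: the first keeps entities as plain data with all rules in a service (closest to the bloodless style), and the second lets the entity enforce its own rule (the rich style).

```python
from dataclasses import dataclass


@dataclass
class PlainAccount:
    """Data only: no business logic in the entity."""
    balance: int = 0


class AccountService:
    """The withdrawal rule lives in the service layer."""

    def withdraw(self, account: PlainAccount, amount: int) -> None:
        if amount <= 0 or amount > account.balance:
            raise ValueError("invalid amount")
        account.balance -= amount


@dataclass
class RichAccount:
    """The entity carries its own withdrawal rule."""
    balance: int = 0

    def withdraw(self, amount: int) -> None:
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        self.balance -= amount
```

Both versions enforce the same rule; the difference is where a reader has to look to find it and which layer must change when it changes.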

Actor model

TODO

Reactive programming

Reactor

TODO

RxJava

TODO

Vert.x

TODO

DODAF2.0

Serverless

TODO

Service Mesh

TODO

Project management

Architecture review

Refactoring

Code specification

TODO

Code Review

Process, process, process! In addition, each company needs to develop its own checklist based on its own needs and goals.

RUP

Kanban Management

SCRUM

SCRUM - Scrimmage

Agile development

TODO

XP - eXtreme Programming

  • "Mainstream Agile Development Method: Extreme Programming XP"
    • It is a methodology to guide developers.

    • Four values:

      • Communication: encourage verbal communication to improve efficiency.
      • Simplicity: just enough is enough.
      • Feedback: give timely feedback and notify the relevant people.
      • Courage: embrace change and dare to refactor.
    • Five principles: rapid feedback, assumed simplicity, incremental change, embracing change (running in small steps), and quality work (small steps are taken only on the premise that quality is assured).

    • 5 activities: phased sprints; sprint planning meetings; daily stand-up meetings; sprint reviews;

Pair programming

Write code and review it at the same time. This improves code quality and reduces bugs.

FMEA management model

TODO

General Business Terms

TODO

Technical trends

TODO

Policies and regulations

TODO

legal

Strictly abide by Article 253 of the Criminal Law

Article 253 of the Chinese Criminal Law states:

  • Staff of state organs, or of entities in finance, telecommunications, transportation, education, medical care, and similar fields, who in violation of state regulations sell or illegally provide to others citizens' personal information obtained by the entity in the course of performing its duties or providing services shall, if the circumstances are serious, be sentenced to fixed-term imprisonment of not more than 3 years or criminal detention, and shall also be fined or be fined only.
  • Whoever steals or otherwise illegally obtains the above information shall, if the circumstances are serious, be punished in accordance with the provisions of the preceding paragraph.
  • Where an entity commits either of the crimes in the preceding two paragraphs, the entity shall be fined, and the persons directly in charge and other directly responsible persons shall be punished in accordance with the provisions of the respective paragraph.

The Supplementary Provisions (IV) of the Supreme People's Court and the Supreme People's Procuratorate on the enforcement of the "Criminal Law of the People's Republic of China" confirm that violating paragraph 1 of Article 253 of the Criminal Law constitutes the crime of "selling or illegally providing citizens' personal information", and that violating paragraph 2 of Article 253 constitutes the crime of "illegally obtaining citizens' personal information".

Architect quality

  • Architect Portrait

    • Business understanding and abstraction capabilities
    • Outstanding coding ability
    • Comprehensiveness: 1. Facing a business problem, does the architect have several technical solutions in mind? 2. Are enough aspects considered when designing the system?
    • Global: Whether to consider the impact on the upstream and downstream systems.
    • Tradeoffs: trade-offs between input-output ratios; priority and rhythm control;
  • What Things Architects Must Know About Architecture Optimization and Design

    • Details to consider: modular, loosely coupled, shared-nothing architecture; reduce dependencies between components, and pay attention to dependencies between services and the cascading failures and effects they cause.
    • Comprehensive consideration of infrastructure, configuration, testing, development, operation and maintenance.
    • Consider the influence of people, teams, and organizations.
  • "How Can I Really Improve Myself and Become an Outstanding Architect?"

  • "Architect's Essential Quality and Growth Path"

    • Quality: business understanding, breadth of technology, depth of technology, rich experience, communication skills, hands-on capabilities, and aesthetic qualities.
    • Growth path: 2 years to accumulate knowledge, 4 years to accumulate skills and influence within the group, 7 years to accumulate influence within the department, and 7 years to accumulate cross-department influence.
  • "Architects - Which Level Are You On?"

    • First-level architects see only the product itself
    • Second-level architects see not only their own product but also the overall solution
    • Third-level architects see business value

Team management

TODO

Recruitment

Information

Industry information

WeChat official account list

TODO

Blog

Team blog

personal blog

Integrated portal, community

domestic:

foreign:

Q&A, discussion community

Industry data analysis

Special website

other

Recommended reference book

Online eBooks

Paper book

Development aspects

Architecture aspects

Technical management

Basic theory

Tools

TODO

Big data

Technical resources

Open source resources

Manuals, Documents, Tutorials

domestic:

  • W3Cschool

  • Runoob.com

    • HTML, CSS, XML, Java, Python, PHP, design patterns, and other introductory manuals.
  • Love2.io

    • A large collection of Chinese online e-books; a new open source platform for sharing technical documents.
  • Gitbook.cn

    • Paid e-books.
  • ApacheCN

    • AI, big data series Chinese documents.

foreign:

  • Quick Code
    • Free online technical tutorials.
  • Gitbook.com
    • There are some Chinese e-books.
  • Cheatography
    • Cheat Sheets, a one-page document website.
  • Tutorialspoint
    • Well-known tutorial website, providing high-quality introductory tutorials such as Java, Python, JS, SQL, big data.

Online class

conference

Event publishing platform:

Common APP

find a job

tool

Code hosting

File service

  • Qiniu
  • Upyun

Comprehensive cloud service provider

VPS
