Merge pull request #166 from JceLee/main

Vancouver-KDD · Feb 21, 2024 · bf293da · bf293da
2 parents a52d506 + 6fea50f
commit bf293da
Showing 4 changed files with 227 additions and 0 deletions.
diff --git a/...back-of-the-envelope-estimation/kangmin-ch02-back-of-the-envelope-estimation.md b/...back-of-the-envelope-estimation/kangmin-ch02-back-of-the-envelope-estimation.md
@@ -0,0 +1,54 @@
+# Chapter 2 : Back of the envelope estimation
+
+## Power of two
+
+- Data volumes are usually expressed in powers of two, so it is worth knowing what each unit represents.
+
+| Power | Approximate Value | Full name  | Short name |
+|-------|-------------------|------------|------------|
+| 10    | 1 Thousand        | 1 Kilobyte | 1 KB       |
+| 20    | 1 Million         | 1 Megabyte | 1 MB       |
+| 30    | 1 Billion         | 1 Gigabyte | 1 GB       |
+| 40    | 1 Trillion        | 1 Terabyte | 1 TB       |
+| 50    | 1 Quadrillion     | 1 Petabyte | 1 PB       |
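
The "Approximate Value" column rounds each power of two to the nearest power of a thousand. A minimal sketch that makes the rounding visible; the helper names (`bytes_for`, `approx_ratio`) are illustrative and not part of the notes:

```python
# Illustrative helpers (hypothetical names) for the power-of-two table.
UNITS = {10: "KB", 20: "MB", 30: "GB", 40: "TB", 50: "PB"}


def bytes_for(power: int) -> int:
    """Exact number of bytes in 2**power."""
    return 2 ** power


def approx_ratio(power: int) -> float:
    """Ratio between 2**power and its decimal approximation (10**3, 10**6, ...)."""
    decimal = 10 ** (3 * power // 10)
    return bytes_for(power) / decimal


for power, unit in UNITS.items():
    print(f"2^{power} = {bytes_for(power):,} bytes (1 {unit}), "
          f"{approx_ratio(power):.3f}x the decimal approximation")
```

The gap grows with the exponent (about 2.4% at 1 KB, about 12.6% at 1 PB), which is acceptable for back-of-the-envelope work.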
+
+## Latency numbers every programmer should know
+
+- Units used to express latency:
+- ns = nanosecond, μs = microsecond, ms = millisecond
+
+| Operation name                                   | Time                    |
+|--------------------------------------------------|-------------------------|
+| L1 cache reference                               | 0.5 ns                  |
+| Branch mispredict                                | 5 ns                    |
+| L2 cache reference                               | 7 ns                    |
+| Mutex lock/unlock                                | 100 ns                  |
+| Main memory reference                            | 100 ns                  |
+| Compress 1K bytes with Zippy                     | 10,000 ns = 10 μs       |
+| Send 2K bytes over 1 Gbps network                | 20,000 ns = 20 μs       |
+| Read 1 MB sequentially from memory               | 250,000 ns = 250 μs     |
+| Round trip within the same datacenter            | 500,000 ns = 500 μs     |
+| Disk seek                                        | 10,000,000 ns = 10 ms   |
+| Read 1 MB sequentially from the network          | 10,000,000 ns = 10 ms   |
+| Read 1 MB sequentially from the disk             | 30,000,000 ns = 30 ms   |
+| Send packet CA (California) -> Netherlands -> CA | 150,000,000 ns = 150 ms |
+
+
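+- Memory is fast and disk is slow -> avoid disk seeks whenever possible.
+- Simple compression is cheap as well.
+- As the table shows, compressing data before sending it over the network can reduce latency.
+- Data center location cannot be ignored: as the last row of the table shows, a round trip between regions takes 150 ms.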
+
+## Availability numbers
+
+- High availability is the ability of a system to stay continuously operational.
+- 100% means zero downtime.
+- Most services have availability between 99% and 100%.
+
+| Availability (%) | Downtime per day    | Downtime per year |
+|------------------|---------------------|-------------------|
+| 99%              | 14.40 minutes       | 3.65 days         |
+| 99.9%            | 1.44 minutes        | 8.77 hours        |
+| 99.99%           | 8.64 seconds        | 52.60 minutes     |
+| 99.999%          | 864.00 milliseconds | 5.26 minutes      |
+| 99.9999%         | 86.40 milliseconds  | 31.56 seconds     |
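
The downtime figures follow from `(1 - availability) * period`. A small sketch for recomputing them; the 365.25-day year is an assumption chosen because it reproduces the yearly column, and the function name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length


def downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Allowed downtime in seconds for a given availability over a period."""
    return (1 - availability_percent / 100) * period_seconds


for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    per_day = downtime_seconds(pct, SECONDS_PER_DAY)
    per_year = downtime_seconds(pct, SECONDS_PER_DAY * DAYS_PER_YEAR)
    print(f"{pct}%: {per_day:.2f} s/day, {per_year / 3600:.2f} h/year")
```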
diff --git a/...stem-design-interviews/kangmin-ch03-a-framework-for-system-design-interviews.md b/...stem-design-interviews/kangmin-ch03-a-framework-for-system-design-interviews.md
@@ -0,0 +1,68 @@
+# Chapter 3 : A FRAMEWORK FOR SYSTEM DESIGN INTERVIEWS
+
+## What is system design interview?
+
+- A system design interview simulates real-life problem solving.
+- Designing an entire product within the limited interview time is impossible.
+- What the interview aims to assess is the 'ability to collaborate, to work under pressure, and to resolve ambiguity
+  constructively'.
+- There is no single fixed answer, and the ability to ask good questions also matters.
+
+## A 4-step process for effective system design interview
+
+### Step 1 - Understand the problem and establish design scope
+
+- Clarifying the requirements matters more than answering quickly.
+- To do that, ask good questions.
+- Good questions target vague requirements or details that have not been decided yet.
+
+### Step 2 - Propose high-level design and get buy-in
+
+- Aim for a high-level design first, and exchange feedback with the interviewer.
+- Do not discuss details yet; draw the big picture of how clients, APIs, web servers, data stores, cache, CDN, message queue, and so on fit together.
+- Drawing a diagram on the whiteboard is also a good approach.
+
+
+### Step 3 - Design deep dive
+
+#### By this step, you and the interviewer should already have achieved the following:
+
+- Agreed on the overall goals and feature scope
+- Sketched out a high-level blueprint for the overall design 
+- Obtained feedback from your interviewer on the high-level design 
+- Had some initial ideas about areas to focus on in deep dive based on her feedback
+
+In this step, try to work out which direction the interviewer prefers. For example, some interviewers may still prioritize the high-level design, while others may want to dig into other areas.
+
+### Step 4 - Wrap up
+
+- Discuss potential improvements and ask for feedback.
+- Never say 'your design is perfect and nothing can be improved'
+- If there are points you want to highlight, emphasize them to the interviewer once more (a recap also works as a refresher).
+- Error cases, operational issues, and the next scale curve are also good topics to raise at the end.
+
+### Do's & Don'ts
+
+#### Do's
+
+- Always ask for clarification. Do not assume your assumption is correct. 
+- Understand the requirements of the problem. 
+- There is neither the right answer nor the best answer. A solution designed to solve the
+problems of a young startup is different from that of an established company with millions
+of users. Make sure you understand the requirements. 
+- Let the interviewer know what you are thinking. Communicate with your interviewer.
+- Suggest multiple approaches if possible. 
+- Once you agree with your interviewer on the blueprint, go into details on each
+component. Design the most critical components first. 
+- Bounce ideas off the interviewer. A good interviewer works with you as a teammate. 
+- Never give up.
+
+#### Don'ts
+
+- Don't be unprepared for typical interview questions.
+- Don’t jump into a solution without clarifying the requirements and assumptions. 
+- Don’t go into too much detail on a single component in the beginning. Give the high-level
+design first, then drill down.
+- If you get stuck, don't hesitate to ask for hints. 
+- Again, communicate. Don't think in silence. 
+- Don’t think your interview is done once you give the design. You are not
diff --git a/...design-interview/04-design-a-rate-limiter/kangmin-ch04-design-a-rate-limiter.md b/...design-interview/04-design-a-rate-limiter/kangmin-ch04-design-a-rate-limiter.md
@@ -0,0 +1,62 @@
+# Chapter 4 : DESIGN A RATE LIMITER
+
+## Step 1 - Understand the problem and establish design scope
+
+- Rate limiting controls the rate of traffic sent by a client or service; in HTTP, it limits the number of client requests allowed within a specified period.
+
+## Step 2 - Propose high-level design and get buy-in
+
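+- Decide whether to implement rate limiting on the client side or as server-side API rate limiting.
+- A common choice is to implement it in the API gateway as rate limiter middleware.
+- The choice of rate limiting algorithm depends on the company's technology stack, engineering resources, priorities, and so on.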
+
+### Algorithms for rate limiting
+
+#### Token Bucket Algorithm
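+- Requests are handled using tokens that are generated at a fixed rate.
+- The bucket holds up to a fixed number of tokens, and each client request consumes one token.
+- If the bucket has no tokens, the request is rejected or made to wait.
+- Pros: a simple and efficient way to control requests.
+- Cons: the bucket size and refill rate are hard to tune well when traffic is irregular or changes abruptly.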
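
The token bucket behaviour described above can be sketched as follows. This is a minimal single-process illustration, not a production limiter; `capacity`, `refill_rate`, and the injectable `clock` are assumed names introduced here so the example is deterministic to test:

```python
import time


class TokenBucket:
    """Token bucket sketch; parameter names are illustrative assumptions."""

    def __init__(self, capacity: int, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity        # maximum number of tokens in the bucket
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock              # injectable time source
        self.tokens = float(capacity)   # the bucket starts full
        self.last_refill = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding the capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request consumes one token
            return True
        return False                    # empty bucket: reject (a caller could also wait)
```

A bucket created with `TokenBucket(capacity=2, refill_rate=1.0)` admits a burst of two requests, then roughly one request per second.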
+
+#### Leaky Bucket Algorithm
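+- Requests are processed at a fixed rate, and requests that exceed the bucket's capacity overflow and are dropped.
+- The bucket is a fixed-size FIFO queue: each incoming request is added to the queue if there is room.
+- At regular intervals, requests are taken from the queue and processed, so the bucket drains ("leaks") at a constant rate.
+- Pros: keeps a steady processing rate while bounding the number of excess requests.
+- Cons: when the request rate changes abruptly, a burst fills the queue and newer requests are dropped.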
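
A minimal sketch of the queue-based form of the leaky bucket, assuming that is the variant intended here. The class and method names (`LeakyBucket`, `offer`, `leak`) are illustrative; in a real system `leak` would be driven by a timer or worker at a fixed interval:

```python
from collections import deque


class LeakyBucket:
    """Queue-based leaky bucket sketch; names are illustrative assumptions."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # maximum number of queued requests
        self.queue = deque()

    def offer(self, request) -> bool:
        """Enqueue a request, or report False when the bucket is full."""
        if len(self.queue) >= self.capacity:
            return False          # overflow: the request is dropped
        self.queue.append(request)
        return True

    def leak(self, n: int = 1) -> list:
        """Remove and return up to n requests in arrival order.

        Intended to be called at a fixed interval, which gives the
        constant outflow rate.
        """
        processed = []
        while self.queue and len(processed) < n:
            processed.append(self.queue.popleft())
        return processed
```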
+
+#### Fixed Window Counter Algorithm
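+- Counts the number of requests within a given period.
+- The timeline is divided into fixed windows, and each window has its own request counter.
+- When a window ends, the counter is reset.
+- Pros: simple to implement.
+- Cons: because the counter resets at every window boundary, a burst that straddles a boundary can exceed the intended limit.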
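
A minimal sketch of a fixed window counter for a single client. The names `limit`, `window_seconds`, and the injectable `clock` are assumptions made for illustration:

```python
import time


class FixedWindowCounter:
    """Fixed window counter sketch; parameter names are illustrative."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self) -> bool:
        window = int(self.clock() // self.window_seconds)
        if window != self.current_window:
            # A new window has started: reset the counter.
            self.current_window = window
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

With `limit=2` and `window_seconds=60`, two requests at second 59 and two more at second 60 are all accepted, which illustrates the boundary weakness: four requests pass within about one second.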
+
+#### Sliding Window Log Algorithm
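+- Keeps a log of request timestamps and computes throughput from the entries in a given time range.
+- A log entry is written for every request, and the entries within the current time range are checked to compute throughput.
+- Pros: flexible, and computes accurate throughput for any time range.
+- Cons: the log grows with traffic, so memory use and performance suffer at high request volumes.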
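
A minimal sketch of a sliding window log for a single client. As a simplifying assumption, this version records timestamps only for accepted requests; other variants also log rejected ones. All names are illustrative:

```python
import time
from collections import deque


class SlidingWindowLog:
    """Sliding window log sketch; parameter names are illustrative."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Evict timestamps that are no longer inside (now - window, now].
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

Memory use is proportional to `limit` per client here, which is the cost the "Cons" bullet refers to.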
+
+#### Sliding Window Counter Algorithm
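+- Similar to the fixed window counter, but the start of the window is not fixed; it keeps moving.
+- The request count of the previous window is taken into account when handling a new request.
+- Pros: the window moves smoothly, which suits handling requests in real time.
+- Cons: can be more complex to implement, and the performance cost of moving the window must be considered.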
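
One common way to take the previous window into account is to weight its count by how much of it still overlaps the rolling window. The sketch below assumes that weighting and a conservative acceptance rule (the incoming request is counted before comparing against the limit); names are illustrative:

```python
import time


class SlidingWindowCounter:
    """Sliding window counter sketch using a weighted previous-window count."""

    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = int(clock() // window_seconds)
        self.current_count = 0
        self.previous_count = 0

    def allow(self) -> bool:
        now = self.clock()
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # The old count only matters if the windows are adjacent.
            adjacent = window == self.current_window + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_window = window
            self.current_count = 0
        # Fraction of the current fixed window that has already elapsed.
        elapsed = (now % self.window_seconds) / self.window_seconds
        # Estimated requests in the rolling window ending now.
        estimated = self.current_count + self.previous_count * (1 - elapsed)
        if estimated + 1 > self.limit:
            return False
        self.current_count += 1
        return True
```

The estimate assumes requests in the previous window were evenly spread, so it is an approximation rather than an exact count.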
+
+### High-level architecture
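+- Where to store the rate limiter's counters is an important consideration. In most cases an in-memory cache such as Redis is used.
+- Before a client request is processed, the rate limiter middleware filters it and applies the rate limit where necessary.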
+
+## Step 3 - Design deep dive
+
+### Rate limiter in a distributed environment
+
+- Scaling the rate limiter to support multiple servers and concurrent threads is a complex problem.
+- Strategies such as Lua scripts and Redis sorted sets can be used to address race conditions and synchronization issues.
+
+### Performance optimization & Monitoring
+
+- Performance can be optimized with a multi-data center setup together with an eventual consistency model.
+- Monitoring is important for checking whether the rate limiter is effective.
diff --git a/...nterview/05-design-consistent-hashing/kangmin-ch05-design-consistent-hashing.md b/...nterview/05-design-consistent-hashing/kangmin-ch05-design-consistent-hashing.md
@@ -0,0 +1,43 @@
+# Chapter 5 : DESIGN CONSISTENT HASHING
+
+
+## The rehashing problem
+
+- With n cache servers, a common way to balance the load is the following hashing method:
+- serverIndex = hash(key) % N, where N is the size of the server pool.
+
+- This approach works well when the size of the server pool is fixed and the data distribution is even.
+- However, problems arise when new servers are added or existing servers are removed.
+- For example, if server 1 goes offline, the size of the server pool becomes 3.
+- The same hash function still yields the same hash value for a key, but because the number of servers has dropped by 1, the modular
+  operation yields a different server index. (This causes cache misses.)
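
The effect of shrinking the pool can be checked with a quick experiment. This sketch assumes MD5 as a stand-in for a stable hash function and uses made-up key names; it compares the index for a pool of 4 with the index for a pool of 3:

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def server_index(key: str, pool_size: int) -> int:
    return stable_hash(key) % pool_size


keys = [f"key{i}" for i in range(1000)]  # hypothetical cache keys
moved = sum(server_index(k, 4) != server_index(k, 3) for k in keys)
print(f"{moved} of {len(keys)} keys map to a different server index")
```

For a uniform hash, a key keeps its index only when `hash % 12` is 0, 1, or 2, so roughly three quarters of the keys are expected to move.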
+
+## Consistent hashing
+
+- An effective technique for solving the problem above.
+- With consistent hashing, when the hash table is resized, only k/n keys need to be remapped on average.
+- Here k is the number of keys and n is the number of slots.
+- In contrast, in most traditional hash tables a change in the number of array slots causes nearly all keys to be remapped.
+
+## Hash space, hash ring and hash servers
+
+- The two ends of the hash space are joined to form a hash ring.
+- Servers are mapped onto the hash ring using the same hash function f.
+
+## Hash keys, Server lookup, Add and remove a server
+
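+- To determine which server stores a key, go clockwise from the key's position until the first server on the ring is found.
+- Adding a new server requires redistributing only a fraction of the keys (not all of them).
+- Likewise, removing a server requires redistributing only a fraction of the keys.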
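
The clockwise lookup can be sketched with a sorted list of ring positions and a binary search. SHA-1 as the ring hash, the class name `HashRing`, and the server names are assumptions for illustration; hash collisions between servers are ignored:

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring (SHA-1 is an arbitrary choice for this sketch)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=()):
        self._positions = []  # sorted ring positions of servers
        self._servers = {}    # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        position = ring_hash(server)
        bisect.insort(self._positions, position)
        self._servers[position] = server

    def remove(self, server: str) -> None:
        position = ring_hash(server)
        self._positions.remove(position)
        del self._servers[position]

    def lookup(self, key: str) -> str:
        if not self._positions:
            raise LookupError("the ring has no servers")
        # First server at or after the key's position, wrapping around the ring.
        index = bisect.bisect_left(self._positions, ring_hash(key))
        return self._servers[self._positions[index % len(self._positions)]]


ring = HashRing(["s0", "s1", "s2", "s3"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("s4")
changed = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(changed)} of {len(keys)} keys moved, all of them to the new server")
```

Only keys that fall between the new server and its counter-clockwise neighbour change owner; every other key keeps its server.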
+
+## Two issues in the basic approach
+
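+### Summary of the basic approach
+- Servers and keys are mapped onto the ring using the consistent hashing algorithm.
+- To find the server a key is mapped to, go clockwise from the key's position until the first server on the ring is found.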
+
+### issues
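+- Because servers can be added or removed, it is impossible to keep the partitions on the ring the same size for all servers.
+- A partition is the hash space between adjacent servers; the partition assigned to a server can be very small or fairly large.
+- The key distribution on the ring can also be non-uniform.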
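
The uneven partition sizes can be observed by measuring the arc each server owns. This sketch assumes SHA-1 as the ring hash (so the ring has 2^160 positions) and made-up server names; `partition_shares` is an illustrative helper:

```python
import hashlib

RING_SIZE = 2 ** 160  # number of positions produced by SHA-1


def ring_hash(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)


def partition_shares(servers):
    """Fraction of the ring owned by each server (the arc ending at its position)."""
    positions = sorted((ring_hash(s), s) for s in servers)
    shares = {}
    for i, (position, server) in enumerate(positions):
        previous = positions[i - 1][0]  # index -1 wraps around to the last server
        # A single server owns the whole ring, hence the fallback to RING_SIZE.
        arc = (position - previous) % RING_SIZE or RING_SIZE
        shares[server] = arc / RING_SIZE
    return shares


for server, share in partition_shares(["s0", "s1", "s2", "s3"]).items():
    print(f"{server}: {share:.1%} of the ring")
```

With only a handful of servers the shares typically differ noticeably from the ideal equal split, which is the first issue listed above.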
+