OCM (OriginClusterMaster): Use external application for Origin Cluster to discover services #1607
The design goal of the OCM scheme is to support 100,000 streams. Since each origin server provides service independently, OCM acts only as a service-discovery function. This means each origin must know, or be able to learn, the address or name of its external service, which may need to come from configuration and then be passed from each origin to OCM. For example, if the K8s service for OriginA is "origina-service", then "origina-service" must be configured as OriginA's service name, and OriginA passes it to OCM. Each origin can support around 1,000 streams, so 10 origins are needed for 10,000 streams and 100 origins for 100,000. SRS is positioned at the 100,000-stream level, so this solution is acceptable. Within 10,000 streams, an origin cluster of fewer than 10 origins is generally enough.
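A minimal sketch of what that per-origin configuration might look like. The `origin_cluster` and `coworkers` directives exist in SRS today, but pointing `coworkers` at an OCM service, and a `service` directive for advertising the external name, are assumptions of this proposal, not released features:

```
# Hypothetical sketch: OriginA's config under the OCM proposal.
vhost __defaultVhost__ {
    cluster {
        origin_cluster  on;
        # Report to OCM instead of listing peer origins (assumption).
        coworkers       ocm-service:1985;
        # External service name this origin advertises to OCM (hypothetical directive).
        service         origina-service:1935;
    }
}
```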
|
For less than 5k streams of traffic, you can use the Origin Cluster directly, including in K8s; refer to #1501 (comment)
|
The Origin Cluster Master (OCM) also needs another important capability: a unified API. Each origin server in the cluster has its own independent API and console, so there is no single API for the cluster as a whole. Currently, OriginCluster and OriginClusterMaster only support stream discovery and redirection. For example, on a single origin the console and the HTTP API on port 1985 work fine, but cluster-wide (edge) information is lost. In an OriginCluster, only one origin can be selected for the API, because each origin is a separate service and multiple services cannot listen on the same port of the same SLB. Edge clusters have the same problem. The root cause may be that the fundamental goal of origin and edge clusters is to be distributed, providing multiple nodes to share the load, while an API is centralized, aiming at an overview of the whole cluster; the two goals conflict. The system API may therefore need to be considered separately and left out of OCM.
|
For a cluster with more than 30 origin servers, all 30 origins must also be listed in every edge's configuration. The way edges locate origins needs improvement: today an edge cluster is configured with a list of origin domain names. A better approach would be to configure OCM's API and query it for available origins, so that only OCM's domain name needs to be configured, as sketched below.
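A sketch of the difference, using the edge `cluster`/`origin` syntax SRS supports today; the OCM variant is hypothetical:

```
# Today: every edge lists all origins explicitly (30 entries for 30 origins).
vhost __defaultVhost__ {
    cluster {
        mode    remote;
        origin  origin0:1935 origin1:1935 origin2:1935; # ... up to origin29
    }
}

# Proposed (hypothetical): configure only OCM, and the edge queries its API
# for the currently available origins.
vhost __defaultVhost__ {
    cluster {
        mode    remote;
        origin  ocm-service:1985;  # resolved through OCM (assumption)
    }
}
```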
|
When the origin server cluster is being upgraded, the ideal behavior is to wait until old streams end before stopping a Pod: old Pods accept no new streams but keep serving established connections, while new Pods take the new streams, so both generations serve at once. The current practice is a direct restart, whether upgrading or rolling back. Since the origin cluster usually sits behind edges, the edges re-fetch the stream when an origin restarts, so users are not disconnected, though there may be some impact.
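SRS's graceful-quit options point in this direction; a sketch assuming the directive names from SRS 4.0's full.conf (availability and defaults vary by version):

```
# Let an old Pod drain instead of dropping streams on restart (SRS 4.0 grace options).
force_grace_quit  on;    # quit gracefully on SIGTERM instead of immediately
grace_start_wait  2300;  # ms to wait before starting to close connections
grace_final_wait  3200;  # ms to keep serving established connections before exit
```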
|
Streaming media must be stateful: a stream lives on a particular origin server, and its state cannot be transferred to a database (the stream's media state lives in a process; a database cannot determine which origin holds it). Whether the topology is a tree of Origin-Edge or a loose mesh (where any node can be both origin and edge), the essence is the same: the stream has state, and fetching it from the origin means reaching the exact process where the stream is located.
Edge is stateless, because the client really makes only one request, to play the stream (even though RTMP sends multiple commands). The stream itself does not live on the edge, so it can be played from any edge server, and all edges fetch it from the origin. This solves downstream scalability: when many viewers watch one stream, edges scale it out. For scenarios with a large number of streams, such as surveillance cameras and conferences, the publishers and players of a stream may be comparable in number, or publishers may even outnumber players. This touches the fundamental issue of streaming media, that streams have state, which is why conference services are harder to handle.
In reality, publishing is also stateless: TCP publishing makes only one request, to publish the stream. UDP publishing, however, may produce a new request when the IP address changes, which is a problem in mobile scenarios (for live streaming the usual answer after a network switch is to reconnect, but the cost of reconnecting in conferences is too high). Consider first publishing from a fixed address, so publishing can be treated as a single request: publishing itself is then stateless, and publishing to any server is feasible. Playing is also stateless, as it does not matter which edge serves it. The stream itself, however, has state: it matters which origin an edge fetches from, and this is the only place state exists. In other words, the state of an SRS cluster is where each stream is located, while publishing and playing are themselves stateless.
In the current Origin Cluster solution, stream state is resolved by origins querying each other, and the origin addresses are kept in a configuration file that must be updated when scaling. In the OCM (Origin Cluster Manager) solution, origins report their addresses to OCM, which stores them in a backend service such as KV (Key-Value) storage, solving the state problem for origin addresses. Beyond addresses, OriginCluster also assumes that origins can reach each other directly, which requires them to be on the same internal network, for example via the StatefulSet+HeadlessService approach, where each origin gets its own service domain name and address. Deploying with Deployment+ClusterIP instead is equivalent to putting each origin behind an SLB (Server Load Balancer), which has limited scalability, is unsuitable for frequent scaling, and makes deployment more cumbersome.
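For reference, this is where that state lives today: a minimal Origin Cluster config in the current style, with peers hard-coded as `coworkers` (their HTTP API addresses), which must be edited on every origin when the cluster scales:

```
# Current Origin Cluster: peer addresses are static state in each origin's config.
listen              19350;
http_api {
    enabled         on;
    listen          9091;
}
vhost __defaultVhost__ {
    cluster {
        origin_cluster  on;
        # HTTP API addresses of the other origins; scaling out means
        # updating this list on every origin and reloading.
        coworkers       127.0.0.1:9091 127.0.0.1:9092;
    }
}
```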
The most thorough approach for the origin cluster is to solve the problem of exposing origin addresses directly, that is, to make the publishing addresses stateless. This involves not only storing the publishing addresses statelessly, but also making the servers behind those addresses stateless. One optional solution is:
The final solution is shown in the following diagram. Key technical points:
|
For general applications, the business volume will not reach the level of millions of streams and millions of concurrent users. In that case, a simpler OCM (OriginClusterMaster) approach can be chosen, mainly addressing:
|
For the time being, we are not considering the OCM solution; the current Origin Cluster should be sufficient for open source use, shouldn't it?
|
The design of the OCM version of the origin cluster looks more ideal and elegant.
|
Similar to a microservice registry, Consul could be used for stream registration and discovery, moving stream state into the registry and removing the need to deploy a separate master service. The register-and-discover capability could be abstracted into an API layer, making it easier to swap the underlying implementation. It is also recommended not to rely too heavily on the capabilities provided by Kubernetes (k8s).
|
In #1501, it is described that when deploying an origin cluster with Docker, SRS does not actually know its accessible IP: after SRS starts in Docker it is assigned a NAT address, and only Docker knows the external address.
The solution is to add a `coworker` parameter when making internal requests within the origin cluster, as described in #1501. This solution has a prerequisite: the addresses of Docker A and B must be known in advance, and updating the other origins when addresses change is quite troublesome. This is manual service discovery and updating.
Even in a framework like K8s with automatic service discovery, this is necessary when deploying multiple origin clusters. Manual discovery is not suitable there; the problem should be solved by a dedicated service-discovery service for the origin cluster, OCM (OriginClusterMaster). For example: `coworkers ocm-service` (multiple entries can be configured). Origins would report to OCM on `on_publish` and `on_unpublish`, and possibly other events to be added later.
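A sketch of the contrast, keeping the current `coworkers` syntax; the addresses for Docker A and B are made-up examples, and the OCM entry is hypothetical:

```
# Manual discovery today: each origin lists the others' addresses explicitly,
# e.g. the known addresses of Docker A and B, updated by hand when they change.
vhost __defaultVhost__ {
    cluster {
        origin_cluster  on;
        coworkers       192.168.1.5:9091 192.168.1.6:9091;  # Docker A and B (example)
    }
}

# With OCM (hypothetical): only the discovery service is configured; origins
# report themselves on on_publish/on_unpublish, and peers are resolved via OCM.
vhost __defaultVhost__ {
    cluster {
        origin_cluster  on;
        coworkers       ocm-service;  # multiple entries can be configured
    }
}
```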