Proxy: Use connection pools for images #4326

Merged
merged 6 commits into from
Oct 30, 2024

src/invidious.cr: 4 changes (4 additions, 0 deletions)
@@ -92,6 +92,10 @@ SOFTWARE = {
 
 YT_POOL = YoutubeConnectionPool.new(YT_URL, capacity: CONFIG.pool_size)
 
+# Image request pool
+
+GGPHT_POOL = YoutubeConnectionPool.new(URI.parse("https://yt3.ggpht.com"), capacity: CONFIG.pool_size)
+
 # CLI
 Kemal.config.extra_options do |parser|
 parser.banner = "Usage: invidious [arguments]"

src/invidious/routes/images.cr: 106 changes (25 additions, 81 deletions)
@@ -11,29 +11,9 @@ module Invidious::Routes::Images
 end
 end
 
-# We're encapsulating this into a proc in order to easily reuse this
-# portion of the code for each request block below.
-request_proc = ->(response : HTTP::Client::Response) {
-env.response.status_code = response.status_code
-response.headers.each do |key, value|
-if !RESPONSE_HEADERS_BLACKLIST.includes?(key.downcase)
-env.response.headers[key] = value
-end
-end
-
-env.response.headers["Access-Control-Allow-Origin"] = "*"
-
-if response.status_code >= 300
-env.response.headers.delete("Transfer-Encoding")
-return
-end
-
-proxy_file(response, env)
-}
-
 begin
-HTTP::Client.get("https://yt3.ggpht.com#{url}") do |resp|
-return request_proc.call(resp)
+GGPHT_POOL.client &.get(url, headers) do |resp|
+return self.proxy_image(env, resp)
 end
 rescue ex
 end
@@ -61,27 +41,10 @@ module Invidious::Routes::Images
 end
 end
 
-request_proc = ->(response : HTTP::Client::Response) {
-env.response.status_code = response.status_code
-response.headers.each do |key, value|
-if !RESPONSE_HEADERS_BLACKLIST.includes?(key.downcase)
-env.response.headers[key] = value
-end
-end
-
-env.response.headers["Connection"] = "close"
-env.response.headers["Access-Control-Allow-Origin"] = "*"
-
-if response.status_code >= 300
-return env.response.headers.delete("Transfer-Encoding")
-end
-
-proxy_file(response, env)
-}
-
 begin
-HTTP::Client.get("https://#{authority}.ytimg.com#{url}") do |resp|
-return request_proc.call(resp)
+get_ytimg_pool(authority).client &.get(url, headers) do |resp|
+env.response.headers["Connection"] = "close"
+return self.proxy_image(env, resp)
 end
 rescue ex
 end
@@ -101,26 +64,9 @@ module Invidious::Routes::Images
 end
 end
 
-request_proc = ->(response : HTTP::Client::Response) {
-env.response.status_code = response.status_code
-response.headers.each do |key, value|
-if !RESPONSE_HEADERS_BLACKLIST.includes?(key.downcase)
-env.response.headers[key] = value
-end
-end
-
-env.response.headers["Access-Control-Allow-Origin"] = "*"
-
-if response.status_code >= 300 && response.status_code != 404
-return env.response.headers.delete("Transfer-Encoding")
-end
-
-proxy_file(response, env)
-}
-
 begin
-HTTP::Client.get("https://i9.ytimg.com#{url}") do |resp|
-return request_proc.call(resp)
+get_ytimg_pool("i9").client &.get(url, headers) do |resp|
+return self.proxy_image(env, resp)
 end
 rescue ex
 end
@@ -165,8 +111,7 @@ module Invidious::Routes::Images
 if name == "maxres.jpg"
 build_thumbnails(id).each do |thumb|
 thumbnail_resource_path = "/vi/#{id}/#{thumb[:url]}.jpg"
-# This can likely be optimized into a (small) pool sometime in the future.
-if HTTP::Client.head("https://i.ytimg.com#{thumbnail_resource_path}").status_code == 200
+if get_ytimg_pool("i9").client &.head(thumbnail_resource_path, headers).status_code == 200
 name = thumb[:url] + ".jpg"
 break
 end
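
The maxres fallback now sends its HEAD probes through a pooled client instead of creating a new HTTP::Client per probe. A minimal sketch of the probing loop itself, assuming a hypothetical `first_available` helper whose block stands in for the pooled HEAD request; the candidate names are made up, since `build_thumbnails` is not shown in this diff.

```crystal
# Hypothetical helper mirroring the maxres fallback: probe each candidate
# thumbnail name in order and return the first one whose status is 200.
# The block stands in for a HEAD request sent through a pooled client.
def first_available(candidates : Array(String), &) : String?
  candidates.each do |name|
    # `return` inside the block exits first_available as soon as a probe succeeds.
    return name if yield(name) == 200
  end
  nil
end

# Usage with made-up candidate names and canned statuses (no network access).
statuses = {"maxresdefault" => 404, "sddefault" => 200, "hqdefault" => 200}
probes = 0
picked = first_available(["maxresdefault", "sddefault", "hqdefault"]) do |name|
  probes += 1
  statuses[name]
end
puts picked
```

Because the probes stop at the first hit and all target the same host, routing them through one pool lets consecutive probes reuse a kept-alive connection rather than opening one per candidate.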
@@ -181,29 +126,28 @@ module Invidious::Routes::Images
 end
 end
 
-request_proc = ->(response : HTTP::Client::Response) {
-env.response.status_code = response.status_code
-response.headers.each do |key, value|
-if !RESPONSE_HEADERS_BLACKLIST.includes?(key.downcase)
-env.response.headers[key] = value
-end
+begin
+get_ytimg_pool("i").client &.get(url, headers) do |resp|
+return self.proxy_image(env, resp)
+end
+rescue ex
+end
 end
 
-env.response.headers["Access-Control-Allow-Origin"] = "*"
-
-if response.status_code >= 300 && response.status_code != 404
-return env.response.headers.delete("Transfer-Encoding")
+private def self.proxy_image(env, response)
+env.response.status_code = response.status_code
+response.headers.each do |key, value|
+if !RESPONSE_HEADERS_BLACKLIST.includes?(key.downcase)
+env.response.headers[key] = value
+end
 end
 
-proxy_file(response, env)
-}
+env.response.headers["Access-Control-Allow-Origin"] = "*"
 
-begin
-# This can likely be optimized into a (small) pool sometime in the future.
-HTTP::Client.get("https://i.ytimg.com#{url}") do |resp|
-return request_proc.call(resp)
-end
-rescue ex
+if response.status_code >= 300
+return env.response.headers.delete("Transfer-Encoding")
 end
+
+return proxy_file(response, env)
 end
 end
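
`proxy_image` folds the four copies of `request_proc` into one private helper. A minimal sketch of its two decisions as pure functions, assuming a hypothetical `EXAMPLE_BLACKLIST` in place of `RESPONSE_HEADERS_BLACKLIST` (whose contents are not shown in this diff). Note that the helper uses a plain `>= 300` check, whereas the removed procs for the `i9` and `i` subdomains skipped the body only for `>= 300 && != 404`.

```crystal
require "http/headers"

# Hypothetical stand-in for RESPONSE_HEADERS_BLACKLIST; the real list is
# defined elsewhere in Invidious.
EXAMPLE_BLACKLIST = ["set-cookie", "alt-svc"]

# Header half of proxy_image: copy every upstream header whose lowercased
# name is not blacklisted, then allow cross-origin use of the image.
def proxied_headers(upstream : HTTP::Headers, blacklist : Array(String)) : HTTP::Headers
  result = HTTP::Headers.new
  upstream.each do |key, values|
    result[key] = values unless blacklist.includes?(key.downcase)
  end
  result["Access-Control-Allow-Origin"] = "*"
  result
end

# Body half of proxy_image: only responses below 300 are streamed via
# proxy_file; anything >= 300 (including 404) is returned without a body.
def stream_body?(status_code : Int32) : Bool
  status_code < 300
end
```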

src/invidious/yt_backend/connection_pool.cr: 47 changes (33 additions, 14 deletions)
@@ -1,17 +1,6 @@
-def add_yt_headers(request)
-request.headers.delete("User-Agent") if request.headers["User-Agent"] == "Crystal"
-request.headers["User-Agent"] ||= "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
-
-request.headers["Accept-Charset"] ||= "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
-request.headers["Accept"] ||= "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
-request.headers["Accept-Language"] ||= "en-us,en;q=0.5"
-
-# Preserve original cookies and add new YT consent cookie for EU servers
-request.headers["Cookie"] = "#{request.headers["cookie"]?}; CONSENT=PENDING+#{Random.rand(100..999)}"
-if !CONFIG.cookies.empty?
-request.headers["Cookie"] = "#{(CONFIG.cookies.map { |c| "#{c.name}=#{c.value}" }).join("; ")}; #{request.headers["cookie"]?}"
-end
-end
+# Mapping of subdomain => YoutubeConnectionPool
+# This is needed as we may need to access arbitrary subdomains of ytimg
+private YTIMG_POOLS = {} of String => YoutubeConnectionPool
 
 struct YoutubeConnectionPool
 property! url : URI
@@ -54,6 +43,21 @@ struct YoutubeConnectionPool
 end
 end
 
+def add_yt_headers(request)
+request.headers.delete("User-Agent") if request.headers["User-Agent"] == "Crystal"
+request.headers["User-Agent"] ||= "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36"
+
+request.headers["Accept-Charset"] ||= "ISO-8859-1,utf-8;q=0.7,*;q=0.7"
+request.headers["Accept"] ||= "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
+request.headers["Accept-Language"] ||= "en-us,en;q=0.5"
+
+# Preserve original cookies and add new YT consent cookie for EU servers
+request.headers["Cookie"] = "#{request.headers["cookie"]?}; CONSENT=PENDING+#{Random.rand(100..999)}"
+if !CONFIG.cookies.empty?
+request.headers["Cookie"] = "#{(CONFIG.cookies.map { |c| "#{c.name}=#{c.value}" }).join("; ")}; #{request.headers["cookie"]?}"
+end
+end
+
 def make_client(url : URI, region = nil, force_resolve : Bool = false)
 client = HTTP::Client.new(url)
 
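
`add_yt_headers` is moved unchanged. Its Cookie handling combines three sources into one header; because `HTTP::Headers` keys are case-insensitive, the second assignment's `request.headers["cookie"]?` reads the value the first assignment just wrote. A minimal sketch of the same string building as a pure function, assuming hypothetical parameters: `existing` for the incoming cookie header, `config_cookies` as name/value pairs standing in for `CONFIG.cookies`, and a fixed `consent` number in place of `Random.rand(100..999)`.

```crystal
# Hypothetical pure-function version of the Cookie handling in add_yt_headers.
def build_cookie_header(existing : String?, config_cookies : Array({String, String}), consent : Int32) : String
  # Keep whatever cookie value came in (nil interpolates as "") and append
  # the consent cookie.
  cookie = "#{existing}; CONSENT=PENDING+#{consent}"
  unless config_cookies.empty?
    # Configured cookies are placed in front of everything built so far.
    configured = config_cookies.map { |pair| "#{pair[0]}=#{pair[1]}" }.join("; ")
    cookie = "#{configured}; #{cookie}"
  end
  cookie
end
```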
@@ -77,3 +81,18 @@ def make_client(url : URI, region = nil, force_resolve : Bool = false, &)
 client.close
 end
 end
+
+# Fetches a HTTP pool for the specified subdomain of ytimg.com
+#
+# Creates a new one when the specified pool for the subdomain does not exist
+def get_ytimg_pool(subdomain)
+if pool = YTIMG_POOLS[subdomain]?
+return pool
+else
+LOGGER.info("ytimg_pool: Creating a new HTTP pool for \"https://#{subdomain}.ytimg.com\"")
+pool = YoutubeConnectionPool.new(URI.parse("https://#{subdomain}.ytimg.com"), capacity: CONFIG.pool_size)
+YTIMG_POOLS[subdomain] = pool

Member:
I'm concerned about keeping that many open connections on small instances. Maybe we could bypass the pooling system if CONFIG.pool_size == 0?
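
The bypass suggested above could look roughly like this. A minimal sketch, assuming a hypothetical `MaybePooled(T)` wrapper and a `FakeConn` stand-in for `HTTP::Client`; the non-zero branch simply reuses one open resource instead of modelling a real pool.

```crystal
# Hypothetical stand-in for a connection such as HTTP::Client.
class FakeConn
  getter? closed = false

  def close : Nil
    @closed = true
  end
end

# Hypothetical wrapper: capacity 0 means "open a fresh resource per call and
# close it afterwards"; any other capacity keeps one resource open and reuses it.
class MaybePooled(T)
  getter opened = 0
  @cached : T? = nil

  def initialize(@capacity : Int32, &factory : -> T)
    @factory = factory
  end

  def with_resource(&)
    if @capacity == 0
      resource = open_new
      begin
        yield resource
      ensure
        resource.close # nothing is kept open between requests
      end
    else
      cached = @cached
      if cached.nil?
        cached = open_new
        @cached = cached
      end
      yield cached
    end
  end

  private def open_new : T
    @opened += 1
    @factory.call
  end
end
```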

Member Author:
Does the pool not automatically clean up connections?

Member:
I don't think so? It seems that released connections are pushed to the idle list, but I'm not sure what causes them to be closed:
https://github.com/crystal-lang/crystal-db/blob/3eaac85a5d4b7bee565b55dcb584e84e29fc5567/src/db/pool.cr#L151-L185

syeopite (Member Author), Apr 25, 2024:
Well that's a problem...

Regarding the bypass, though, shouldn't that already be what DB::Pool does when pool_size equals 0? Invidious seems to handle requests fine when CONFIG.pool_size is set to zero, at least.

Reply:
YoutubeConnectionPool sets max_pool_size and max_idle_pool_size to the value of capacity. This means the pool starts out with 0 clients and creates more up to capacity, but then keeps all of them around forever.

There is also currently no way to set a timeout on idle connections, i.e. one that would expire idle connections after some period of time. There is some discussion related to this in crystal-lang/crystal-db#47.
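
Idle expiry of this kind can be sketched by timestamping entries on the idle list. This is a hypothetical `ExpiringIdleList(T)` with an `IdleConn` stand-in, not crystal-db's API; times are passed in explicitly (for example readings of a monotonic clock) so the behaviour is deterministic.

```crystal
# Hypothetical stand-in for a connection such as HTTP::Client.
class IdleConn
  getter? closed = false

  def close : Nil
    @closed = true
  end
end

# Hypothetical idle list with expiry: each released resource is stamped with
# the time it went idle, and `reap` closes those idle for longer than `ttl`.
class ExpiringIdleList(T)
  def initialize(@ttl : Time::Span)
    @entries = [] of {T, Time::Span}
  end

  def release(resource : T, now : Time::Span) : Nil
    @entries << {resource, now}
  end

  # Closes and drops expired entries; returns how many were closed.
  def reap(now : Time::Span) : Int32
    expired, fresh = @entries.partition { |entry| now - entry[1] > @ttl }
    expired.each { |entry| entry[0].close }
    @entries = fresh
    expired.size
  end

  def size : Int32
    @entries.size
  end
end
```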

In the meantime, it might be worth allowing max_idle_pool_size to be customized rather than always setting it to CONFIG.pool_size. For example, use a higher max_pool_size to handle bursts, but keep max_idle_pool_size smaller for normal traffic. 1 might be a good starting point; we could run some benchmarks to find a good value.
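
The suggested split between a larger burst limit and a smaller idle limit can be sketched as follows. The names `CappedIdlePool`, `max_size` and `max_idle` are hypothetical and only mirror the intent of crystal-db's `max_pool_size` and `max_idle_pool_size`; this is not crystal-db's implementation.

```crystal
# Hypothetical stand-in for a connection such as HTTP::Client.
class CappedConn
  getter? closed = false

  def close : Nil
    @closed = true
  end
end

# Hypothetical pool separating the burst limit (max_size) from the number
# of connections kept open while idle (max_idle).
class CappedIdlePool(T)
  getter open_count = 0

  def initialize(@max_size : Int32, @max_idle : Int32, &factory : -> T)
    @factory = factory
    @idle = [] of T
  end

  # Reuse an idle resource if there is one, otherwise open a new one as
  # long as the burst limit allows it.
  def checkout : T
    if resource = @idle.pop?
      return resource
    end
    raise "pool exhausted" if @open_count >= @max_size
    @open_count += 1
    @factory.call
  end

  # Keep the resource only while the idle list is below max_idle; close
  # anything beyond that so a burst does not leave sockets open.
  def release(resource : T) : Nil
    if @idle.size < @max_idle
      @idle.push(resource)
    else
      resource.close
      @open_count -= 1
    end
  end

  def idle_count : Int32
    @idle.size
  end
end
```

With `max_idle: 1`, a burst of four concurrent checkouts opens four connections, but once they are released only one stays open.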

+
+return pool
+end
+end
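
`get_ytimg_pool` memoizes one YoutubeConnectionPool per ytimg.com subdomain in `YTIMG_POOLS`, creating it on first use. A minimal sketch of that lazy-registry pattern, assuming a hypothetical generic `LazyRegistry(V)` that stores plain values instead of real pools. The `Mutex` is an extra assumption not present in the PR: it keeps lookup and insert atomic in case they could interleave (for example with multi-threading enabled), which would otherwise allow two pools to be built for one subdomain.

```crystal
# Hypothetical generic version of get_ytimg_pool: look a value up by key and
# build (and remember) it on first use.
class LazyRegistry(V)
  getter built = [] of String # keys in the order they were first built

  def initialize(&factory : String -> V)
    @factory = factory
    @entries = {} of String => V
    @mutex = Mutex.new
  end

  def fetch(key : String) : V
    @mutex.synchronize do
      # Hash#fetch only runs the block when the key is missing.
      @entries.fetch(key) do
        @built << key
        @entries[key] = @factory.call(key)
      end
    end
  end
end

registry = LazyRegistry(String).new { |sub| "https://#{sub}.ytimg.com" }
registry.fetch("i")
registry.fetch("i")
registry.fetch("i9")
p registry.built
```

Like `YTIMG_POOLS`, the registry never evicts entries, so every subdomain that is requested once keeps its pool for the lifetime of the process.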