Hosting: manual integrations via build contract #10127

Merged
merged 43 commits into from
Mar 20, 2023
Changes from 29 commits
5b28071
Hosting: manual integrations via build contract
humitos Mar 8, 2023
e18b40f
Use a single script to load everything
humitos Mar 8, 2023
7754d5f
Include Read the Docs analytics to integrations
humitos Mar 8, 2023
2925ed9
Initial work for hosting features
humitos Mar 8, 2023
3385649
External version banner and doc-diff integration
humitos Mar 8, 2023
1e391b8
Old version warning
humitos Mar 8, 2023
6afca0b
Do not inject doc-diff on search page
humitos Mar 8, 2023
7ce98a4
Inject old version warning only for non-external versions
humitos Mar 8, 2023
3ca96ec
Comments!
humitos Mar 8, 2023
f4f1268
More comments
humitos Mar 8, 2023
a596512
Build: pass `PATH` environment variable to Docker container
humitos Mar 9, 2023
33fdb2b
Lint: for some reason fails at CircleCI otherwise
humitos Mar 9, 2023
4ced5c3
Merge branch 'humitos/build-cmd-environment' of github.com:readthedoc…
humitos Mar 9, 2023
ea2af4c
Feature flag for new hosting integrations
humitos Mar 10, 2023
79b7393
Load `readthedocs-build.yaml` and generate `readthedocs-data.html`
humitos Mar 10, 2023
17effaf
Load READTHEDOCS_DATA async
humitos Mar 10, 2023
2b9cdbf
Absolute proxied API path
humitos Mar 10, 2023
0116f41
Remove duplicated code
humitos Mar 10, 2023
d5130cc
New approach using `readthedocs-client.js` and `/_/readthedocs-config…
humitos Mar 11, 2023
761e3b6
Do not require `readthedocs-build.YAML` for now
humitos Mar 11, 2023
bd9f70e
Expand the JSON response with more data
humitos Mar 13, 2023
842a228
Remove non-required files and rely on `readthedocs-client.js` only
humitos Mar 13, 2023
2ad30cc
Improve helper text
humitos Mar 13, 2023
89662fa
Builds: save `readthedocs-build.yaml` into database
humitos Mar 13, 2023
f83eee6
Use `Version.build_data` from the endpoint
humitos Mar 13, 2023
d14115a
Flyout: return data required to generate flyout dynamically
humitos Mar 13, 2023
067bb4c
Updates to the API
humitos Mar 14, 2023
17c1af3
Minor updates
humitos Mar 15, 2023
53366aa
Update the javascript client compiled version
humitos Mar 15, 2023
0f89186
doc-diff object returned
humitos Mar 16, 2023
48de597
Build: check if the YAML file exists before trying to open it
humitos Mar 16, 2023
4b05e77
Proxito: don't inject the header if the feature is turned off
humitos Mar 16, 2023
e90af75
Merge branch 'main' of github.com:readthedocs/readthedocs.org into hu…
humitos Mar 16, 2023
364de9c
Test: add hosting integrations tests
humitos Mar 16, 2023
c1cf8cb
Remove JS
humitos Mar 20, 2023
3930915
Load the javascript from a local server for now
humitos Mar 20, 2023
2d7b6fe
Update URL to remove .json from it
humitos Mar 20, 2023
ab43635
Remove non-required f-string
humitos Mar 20, 2023
85ab2e7
Allow saving `build_data` via API endpoint
humitos Mar 20, 2023
3439e24
Merge branch 'main' of github.com:readthedocs/readthedocs.org into hu…
humitos Mar 20, 2023
6154c7f
Lint
humitos Mar 20, 2023
0e8f245
Migrate nodejs installation to asdf
humitos Mar 20, 2023
a6c5977
Change port to match `npm run dev` from readthedocs-client
humitos Mar 20, 2023
12 changes: 12 additions & 0 deletions dockerfiles/nginx/proxito.conf.template
@@ -88,6 +88,18 @@ server {
proxy_hide_header Content-Security-Policy;
set $content_security_policy $upstream_http_content_security_policy;
add_header Content-Security-Policy $content_security_policy always;

# This header allows us to decide whether or not to inject the script at the CloudFlare level.
# For now, it is added to all the NGINX responses because `sub_filter` is not allowed inside an `if` statement.
set $rtd_hosting_integrations $upstream_http_x_rtd_hosting_integrations;
add_header X-RTD-Hosting-Integrations $rtd_hosting_integrations always;

# Inject our own script dynamically
# TODO: decide where is the best place to do this
sub_filter '</head>' '<script language="javascript" src="http://devthedocs.org/static/core/js/readthedocs-client.js"></script>\n</head>';
# sub_filter_types text/html;
sub_filter_last_modified on;
sub_filter_once on;
}

# Serve 404 pages here
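
For readers less familiar with NGINX, the effect of the `sub_filter` / `sub_filter_once on` pair above can be sketched in plain Python: replace only the first `</head>` with the script tag followed by `</head>`. This is an illustrative stand-in, not code from the PR; the function name `inject_client_script` is hypothetical, and unlike NGINX (which matches the string ignoring case) this sketch is case-sensitive.

```python
# Script tag copied from the NGINX template in this diff.
SCRIPT_TAG = (
    '<script language="javascript" '
    'src="http://devthedocs.org/static/core/js/readthedocs-client.js"></script>\n'
)


def inject_client_script(html: str) -> str:
    """Insert the script tag before the first ``</head>`` only (hypothetical helper)."""
    # count=1 mirrors `sub_filter_once on`: later matches are left untouched.
    return html.replace("</head>", SCRIPT_TAG + "</head>", 1)


page = "<html><head><title>Docs</title></head><body></body></html>"
print(inject_client_script(page))
```

A page without a `</head>` tag passes through unchanged, which matches how a substitution filter behaves when its search string never appears.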
22 changes: 22 additions & 0 deletions readthedocs/builds/migrations/0048_add_build_data.py
@@ -0,0 +1,22 @@
# Generated by Django 3.2.18 on 2023-03-13 15:15

from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("builds", "0047_build_default_triggered"),
]

operations = [
migrations.AddField(
model_name="version",
name="build_data",
field=models.JSONField(
default=None,
null=True,
verbose_name="Data generated at build time by the doctool (`readthedocs-build.yaml`).",
),
),
]
6 changes: 6 additions & 0 deletions readthedocs/builds/models.py
@@ -186,6 +186,12 @@ class Version(TimeStampedModel):
),
)

build_data = models.JSONField(
_("Data generated at build time by the doctool (`readthedocs-build.yaml`)."),
default=None,
null=True,
)

objects = VersionManager.from_queryset(VersionQuerySet)()
# Only include BRANCH, TAG, UNKNOWN type Versions.
internal = InternalVersionManager.from_queryset(partial(VersionQuerySet, internal_only=True))()
2 changes: 2 additions & 0 deletions readthedocs/core/static/core/js/readthedocs-client.js
Member

Do we want to ship this with the app, or version it directly on S3, like we do with the ad client? I think keeping it deployable outside of the application seems right to me.

https://github.com/readthedocs/ethical-ad-client/tags

https://media.ethicalads.io/media/client/v1.4.0/ethicalads.min.js

Member Author

I haven't thought too much about this and I don't have experience deploying the script with a different process than the normal one. I'm not sure about the pros/cons here.

I put this file here because we need it for development as well. We could just put this file in the local MinIO S3, tho, as well.

Note this file is generated by running `npm run build` from another repository.

Member Author

We also need to think about what version we are going to serve by default. The latest one? Would people be able to pin to a particular version? Are we going to support multiple versions at the same time? How do we deploy new features to those that are pinned to an older version? Do we care? Too many questions 🤷🏼

Member (@ericholscher, Mar 16, 2023)

I think we always deploy the latest, but if the goal is for other people to integrate the library, then it can be versioned based on their needs. The versioning is primarily valuable for strict SRI protection, like PyPI does, to validate that the hash of the library hasn't changed, for security reasons.

I think we should definitely deploy the client outside of our application. We don't need to decide on a proper deployment pattern yet though, but I think we should keep it out of the application from the start.

For now, we can just manually upload it to S3, and use that everywhere?
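
To make the SRI point concrete: Subresource Integrity lets a browser compare a fetched script against a hash given in the `integrity` attribute, so a pinned, versioned file can be verified byte for byte. A minimal sketch of computing that value with the standard library follows; the function name `sri_hash`, the sample script bytes, and the `example.com` URL are assumptions for illustration, not part of this PR.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI value such as ``sha384-<base64 digest>`` (hypothetical helper)."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Stand-in for the bytes of a released client build.
script = b"console.log('readthedocs-client');\n"
integrity = sri_hash(script)
print(
    '<script src="https://example.com/readthedocs-client.js" '
    f'integrity="{integrity}" crossorigin="anonymous"></script>'
)
```

Any change to the file changes the hash, which is why this only works well with versioned URLs whose contents never change after release.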

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions readthedocs/doc_builder/director.py
@@ -2,6 +2,7 @@
import tarfile

import structlog
import yaml
from django.conf import settings
from django.utils.translation import gettext_lazy as _

@@ -187,6 +188,7 @@ def build(self):
self.build_epub()

self.run_build_job("post_build")
self.store_readthedocs_build_yaml()

after_build.send(
sender=self.data.version,
@@ -392,6 +394,7 @@ def run_build_commands(self):
# Update the `Version.documentation_type` to match the doctype defined
# by the config file. When using `build.commands` it will be `GENERIC`
self.data.version.documentation_type = self.data.config.doctype
self.store_readthedocs_build_yaml()

def install_build_tools(self):
"""
@@ -625,3 +628,33 @@ def get_build_env_vars(self):
def is_type_sphinx(self):
"""Is documentation type Sphinx."""
return "sphinx" in self.data.config.doctype

def store_readthedocs_build_yaml(self):
# load YAML from user
yaml_path = os.path.join(
self.data.project.artifact_path(
version=self.data.version.slug, type_="html"
),
"readthedocs-build.yaml",
)

try:
with open(yaml_path, "r") as f:
Member

We should use `readthedocs.core.utils.filesystem.safe_open` here.

data = yaml.safe_load(f)
except Exception:
# NOTE: skip this work for now until we decide whether or not this
# YAML file is required.
#
# NOTE: decide whether or not we want this
# file to be mandatory and raise an exception here.
return

log.info("readthedocs-build.yaml loaded.", path=yaml_path)

# TODO: validate the YAML generated by the user
# self._validate_readthedocs_build_yaml(data)

# Copy the YAML data into `Version.build_data`.
# It will be saved when the API is hit.
# This data will be used by the `/_/readthedocs-config.json` API endpoint.
self.data.version.build_data = data
Comment on lines +661 to +664
Member (@ericholscher, Mar 15, 2023)

Do we want to actually parse the data into memory, or just store the file contents directly as a string? I think we just want to store a string to start? I'd like to avoid as much YAML parsing as possible...

I also wonder if we should make this file JSON, instead of YAML? If the goal is for it to be aligned with the JSON data returned via the API, I think that makes more sense. But if it's closer to our .readthedocs.yaml config, then YAML makes sense 🤔

Member Author

> Do we want to actually parse the data into memory, or just store the file contents directly as a string? I think we just want to store a string to start? I'd like to avoid as much YAML parsing as possible...

I think we will need to parse it so we can validate it at some point, anyways.

This also allows us to use a JSON field in the database that we can query in the future, looking for answers.

> I also wonder if we should make this file JSON, instead of YAML? If the goal is for it to be aligned with the JSON data returned via the API, I think that makes more sense. But if it's closer to our .readthedocs.yaml config, then YAML makes sense 🤔

I decided to use YAML here on purpose. It's a lot easier to write than JSON, with a lot less nitpicking (e.g. JSON allows no trailing comma after the last element of a list); it supports comments, works better with more data types, and more. The structure is going to be just a dictionary; YAML is only the representation/serialization of it.

In particular, being able to put comments in .readthedocs.yaml is important for ourselves and our users as well. That was one of the reasons why I picked YAML for this file as well. Otherwise, you end up with things like this in JSON:

"comment": (
"THIS RESPONSE IS IN ALPHA FOR TEST PURPOSES ONLY"
" AND IT'S GOING TO CHANGE COMPLETELY -- DO NOT USE IT!"
),

Member

Yea.. let's use YAML for anything human-writable, and JSON for machine-writable 👍
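
The `# TODO: validate the YAML generated by the user` note in `store_readthedocs_build_yaml` is left open in this PR. Since the parsed data ends up in a `JSONField` and in an API payload, one plausible first check is that it is a mapping with string keys and that it survives JSON serialization (YAML parsers can produce values such as dates that the default JSON encoder rejects). The sketch below is only an illustration under those assumptions; `validate_build_data` and its rules are hypothetical and do not come from the PR.

```python
import datetime
import json


def validate_build_data(data):
    """Return a list of problems found in parsed build data (hypothetical rules)."""
    if not isinstance(data, dict):
        return ["top-level value must be a mapping"]

    errors = []
    for key in data:
        if not isinstance(key, str):
            errors.append(f"key {key!r} must be a string")

    # The data is stored as JSON, so it must be serializable with the default encoder.
    try:
        json.dumps(data)
    except (TypeError, ValueError) as exc:
        errors.append(f"not JSON-serializable: {exc}")
    return errors


print(validate_build_data({"name": "docs"}))
print(validate_build_data({"released": datetime.date(2023, 3, 20)}))
```

Returning a list of messages rather than raising keeps the decision about whether the file is mandatory, which the thread leaves open, with the caller.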

67 changes: 39 additions & 28 deletions readthedocs/doc_builder/environments.py
@@ -14,7 +14,7 @@
from docker.errors import APIError as DockerAPIError
from docker.errors import DockerException
from docker.errors import NotFound as DockerNotFoundError
-from requests.exceptions import ConnectionError, ReadTimeout
+from requests.exceptions import ConnectionError, ReadTimeout  # noqa
from requests_toolbelt.multipart.encoder import MultipartEncoder

from readthedocs.api.v2.client import api as api_v2
@@ -73,7 +73,7 @@ def __init__(
bin_path=None,
record_as_success=False,
demux=False,
-**kwargs,
+**kwargs,  # pylint: disable=unused-argument
):
self.command = command
self.shell = shell
@@ -252,8 +252,8 @@ def save(self):
{key: str(value) for key, value in data.items()}
)
resource = api_v2.command
-resp = resource._store['session'].post(
-resource._store['base_url'] + '/',
+resp = resource._store["session"].post(  # pylint: disable=protected-access
+resource._store["base_url"] + "/",  # pylint: disable=protected-access
data=encoder,
headers={
'Content-Type': encoder.content_type,
@@ -301,11 +301,35 @@ def run(self):

self.start_time = datetime.utcnow()
client = self.build_env.get_client()

# Create a copy of the environment to update PATH variable
environment = self._environment.copy()
# Default PATH variable
environment[
"PATH"
] = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Add asdf extra paths
environment["PATH"] += (
f":/home/{settings.RTD_DOCKER_USER}/.asdf/shims"
f":/home/{settings.RTD_DOCKER_USER}/.asdf/bin"
)

if settings.RTD_DOCKER_COMPOSE:
environment["PATH"] += ":/root/.asdf/shims:/root/.asdf/bin"

# Prepend the BIN_PATH if it's defined
if self.bin_path:
path = environment.get("PATH")
bin_path = self._escape_command(self.bin_path)
environment["PATH"] = bin_path
if path:
environment["PATH"] += f":{path}"

try:
exec_cmd = client.exec_create(
container=self.build_env.container_id,
cmd=self.get_wrapped_command(),
-environment=self._environment,
+environment=environment,
user=self.user,
workdir=self.cwd,
stdout=True,
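
The `PATH` assembly in this hunk can be sketched as a standalone function: start from the default system path, append the asdf directories for the Docker user, append the root asdf directories when running under Docker Compose, and finally prepend `bin_path` if one is set. The sketch uses f-strings so the user name is actually interpolated into the asdf paths. The name `build_path` and its parameters are hypothetical stand-ins for `settings.RTD_DOCKER_USER`, `settings.RTD_DOCKER_COMPOSE` and `self.bin_path`, and the escaping done by `_escape_command` is left out.

```python
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def build_path(docker_user, bin_path=None, docker_compose=False):
    """Assemble a PATH value in the same order as the diff (hypothetical helper)."""
    parts = [
        DEFAULT_PATH,
        # f-strings: the user name must be interpolated, not left as a literal.
        f"/home/{docker_user}/.asdf/shims",
        f"/home/{docker_user}/.asdf/bin",
    ]
    if docker_compose:
        parts += ["/root/.asdf/shims", "/root/.asdf/bin"]
    if bin_path:
        # The command's own bin directory takes precedence over everything else.
        parts.insert(0, bin_path)
    return ":".join(parts)


print(build_path("docs", bin_path="/venv/bin"))
```

Passing the result through the `environment` argument, as the diff does, avoids having to prefix the shell command with a `PATH=...` assignment.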
@@ -357,31 +381,18 @@ def get_wrapped_command(self):
"""
Wrap command in a shell and optionally escape special bash characters.

-In order to set the current working path inside a docker container, we
-need to wrap the command in a shell call manually.
-
Some characters will be interpreted as shell characters without
escaping, such as: ``pip install requests<0.8``. When passing
``escape_command=True`` in the init method this escapes a good majority
of those characters.
"""
-prefix = ''
-if self.bin_path:
-bin_path = self._escape_command(self.bin_path)
-prefix += f'PATH={bin_path}:$PATH '
-
command = (
' '.join(
self._escape_command(part) if self.escape_command else part
for part in self.command
)
)
-return (
-"/bin/sh -c '{prefix}{cmd}'".format(
-prefix=prefix,
-cmd=command,
-)
-)
+return f"/bin/bash -c '{command}'"

def _escape_command(self, cmd):
r"""Escape the command by prefixing suspicious chars with `\`."""
@@ -524,14 +535,14 @@ class BuildEnvironment(BaseEnvironment):
"""

def __init__(
-self,
-project=None,
-version=None,
-build=None,
-config=None,
-environment=None,
-record=True,
-**kwargs,
+self,
+project=None,
+version=None,
+build=None,
+config=None,
+environment=None,
+record=True,
+**kwargs,  # pylint: disable=unused-argument
):
super().__init__(project, environment)
self.version = version
@@ -557,7 +568,7 @@ def run(self, *cmd, **kwargs):
})
return super().run(*cmd, **kwargs)

-def run_command_class(self, *cmd, **kwargs):  # pylint: disable=arguments-differ
+def run_command_class(self, *cmd, **kwargs):  # pylint: disable=signature-differs
kwargs.update({
'build_env': self,
})
5 changes: 5 additions & 0 deletions readthedocs/projects/models.py
@@ -1884,6 +1884,7 @@ def add_features(sender, **kwargs):
CANCEL_OLD_BUILDS = "cancel_old_builds"
DONT_CREATE_INDEX = "dont_create_index"
USE_RCLONE = "use_rclone"
HOSTING_INTEGRATIONS = "hosting_integrations"

FEATURES = (
(ALLOW_DEPRECATED_WEBHOOKS, _('Allow deprecated webhook views')),
@@ -2058,6 +2059,10 @@ def add_features(sender, **kwargs):
USE_RCLONE,
_("Use rclone for syncing files to the media storage."),
),
(
HOSTING_INTEGRATIONS,
_("Inject 'readthedocs-client.js' as <script> HTML tag in responses."),
),
)

projects = models.ManyToManyField(
1 change: 1 addition & 0 deletions readthedocs/projects/tasks/builds.py
@@ -598,6 +598,7 @@ def on_success(self, retval, task_id, args, kwargs):
"has_pdf": "pdf" in valid_artifacts,
"has_epub": "epub" in valid_artifacts,
"has_htmlzip": "htmlzip" in valid_artifacts,
"build_data": self.data.version.build_data,
}
)
except HttpClientError:
11 changes: 11 additions & 0 deletions readthedocs/proxito/middleware.py
@@ -26,6 +26,7 @@
unresolver,
)
from readthedocs.core.utils import get_cache_tag
from readthedocs.projects.models import Feature, Project

log = structlog.get_logger(__name__)

@@ -278,9 +279,19 @@ def process_request(self, request):  # noqa

return None

def add_hosting_integrations_headers(self, request, response):
project_slug = getattr(request, "path_project_slug", "")
if project_slug:
project = Project.objects.get(slug=project_slug)
if project.has_feature(Feature.HOSTING_INTEGRATIONS):
response["X-RTD-Hosting-Integrations"] = "true"
else:
response["X-RTD-Hosting-Integrations"] = "false"

def process_response(self, request, response): # noqa
self.add_proxito_headers(request, response)
self.add_cache_headers(request, response)
self.add_hsts_headers(request, response)
self.add_user_headers(request, response)
self.add_hosting_integrations_headers(request, response)
return response
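
The header logic added to the middleware can be read in isolation as a small decision function: no project slug means no header at all, otherwise the header is `"true"` or `"false"` depending on the feature flag. The framework-free sketch below illustrates that behaviour; `hosting_integrations_header` is a hypothetical name, and the `enabled_projects` set is a stand-in for `project.has_feature(Feature.HOSTING_INTEGRATIONS)`, not something from the PR.

```python
def hosting_integrations_header(project_slug, enabled_projects):
    """Return the header value, or None when the request has no project slug."""
    if not project_slug:
        return None
    return "true" if project_slug in enabled_projects else "false"


# A plain dict stands in for the Django response object.
headers = {}
value = hosting_integrations_header("docs", enabled_projects={"docs"})
if value is not None:
    headers["X-RTD-Hosting-Integrations"] = value
print(headers)
```

Emitting an explicit `"false"` rather than omitting the header lets a downstream layer distinguish "feature off" from "not a project response".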
7 changes: 7 additions & 0 deletions readthedocs/proxito/urls.py
@@ -40,6 +40,7 @@
from readthedocs.constants import pattern_opts
from readthedocs.core.views import HealthCheckView
from readthedocs.projects.views.public import ProjectDownloadMedia
from readthedocs.proxito.views.hosting import ReadTheDocsConfigJson
from readthedocs.proxito.views.serve import (
ServeDocs,
ServeError404,
@@ -114,6 +115,12 @@
ServeStaticFiles.as_view(),
name="proxito_static_files",
),
# readthedocs-config.json
path(
f"{DOC_PATH_PREFIX}readthedocs-config.json",
ReadTheDocsConfigJson.as_view(),
name="proxito_readthedocs_config_json",
),
]

core_urls = [