Update documentation
wvengen committed Nov 19, 2024
1 parent f36de79 commit b65050e
Showing 2 changed files with 56 additions and 22 deletions.
73 changes: 53 additions & 20 deletions CONFIG.md
@@ -1,8 +1,9 @@
## About
This file provides a detailed description of the parameters listed in the config file, explaining why they are used
and when you are expected to provide or change them.
# scrapyd-k8s configuration

## Configuration file
scrapyd-k8s is configured with the file `scrapyd_k8s.conf`. The file format is meant to
stick to [scrapyd's configuration](https://scrapyd.readthedocs.io/en/latest/config.html) where possible.

## `[scrapyd]` section

* `http_port` - defaults to `6800` ([scrapyd docs](https://scrapyd.readthedocs.io/en/latest/config.html#http-port))
* `bind_address` - defaults to `127.0.0.1` ([scrapyd docs](https://scrapyd.readthedocs.io/en/latest/config.html#bind-address))
@@ -14,25 +15,57 @@ and when you are expected to provide or change them.

The Docker and Kubernetes launchers have their own additional options.

## [scrapyd] section, reconnection_attempts, backoff_time, backoff_coefficient
## project sections

Each project you want to be able to run gets its own section, prefixed with `project.`. For example,
an `example` project would be defined in a `[project.example]` section.

* `repository` - container repository for the project, e.g. `ghcr.io/q-m/scrapyd-k8s-spider-example`
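
As a rough illustration of the naming convention above, the sketch below parses a configuration fragment with Python's standard `configparser` and collects every section prefixed with `project.`. The helper `list_projects` and the sample text are hypothetical, and whether scrapyd-k8s itself reads the file this way is an assumption.

```python
from configparser import ConfigParser

# Sample configuration text, built only from options named in this guide.
SAMPLE = """
[scrapyd]
http_port = 6800
bind_address = 127.0.0.1

[project.example]
repository = ghcr.io/q-m/scrapyd-k8s-spider-example
"""


def list_projects(config_text):
    """Map each project name to its container repository (hypothetical helper)."""
    parser = ConfigParser()
    parser.read_string(config_text)
    prefix = "project."
    return {
        # Strip the "project." prefix to recover the project name.
        section[len(prefix):]: parser.get(section, "repository")
        for section in parser.sections()
        if section.startswith(prefix)
    }


projects = list_projects(SAMPLE)
print(projects)
```

Adding a second project would then be a matter of adding another `[project.<name>]` section with its own `repository`.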

## Docker

This section describes Docker-specific options.
See [`scrapyd_k8s.sample-docker.conf`](scrapyd_k8s.sample-docker.conf) for an example.

* `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.Docker`
* `[scrapyd]` `repository` - choose between `scrapyd_k8s.repository.Local` and `scrapyd_k8s.repository.Remote`

TODO: explain `Local` and `Remote` repository, and how to use them

## Kubernetes

This section describes Kubernetes-specific options.
See [`scrapyd_k8s.sample-k8s.conf`](scrapyd_k8s.sample-k8s.conf) for an example.

### Context
* `[scrapyd]` `launcher` - set this to `scrapyd_k8s.launcher.K8s`
* `[scrapyd]` `repository` - set this to `scrapyd_k8s.repository.Remote`

For Kubernetes, it is important to set resource limits.

TODO: explain how to set limits, with default, project and spider specificity.


### Kubernetes API interaction

The Kubernetes event watcher is used as part of the joblogs feature and also for limiting the
number of jobs running in parallel on the cluster. Neither feature is enabled by default; each can be activated if you
choose to use it.

The event watcher establishes a connection to the Kubernetes API and receives a stream of events from it. However,
this long-lived connection is inherently unstable: it can be interrupted by network issues, by proxies configured to
terminate long-lived connections, and by other factors. For this reason, a mechanism was implemented to re-establish
the connection to the Kubernetes API when it drops. Three parameters control this mechanism: `reconnection_attempts`,
`backoff_time` and `backoff_coefficient`.

#### What are these parameters about?

* `reconnection_attempts` - how many consecutive reconnection attempts are made when the connection fails;
* `backoff_time`, `backoff_coefficient` - gradually slow down each subsequent attempt to establish a
connection with the Kubernetes API, so that the API is not overloaded with requests.
`backoff_time` grows exponentially: after each attempt it is updated as `backoff_time *= self.backoff_coefficient`.
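
To get a feel for how the three parameters interact, the following sketch lists the waits that the formula above would produce. The function `backoff_schedule` and the values passed to it are made up for illustration (they are not the defaults), and the sketch assumes the wait is applied before `backoff_time` is multiplied.

```python
def backoff_schedule(reconnection_attempts, backoff_time, backoff_coefficient):
    """Return the wait before each reconnection attempt (hypothetical helper)."""
    delays = []
    for _ in range(reconnection_attempts):
        delays.append(backoff_time)
        # Same update rule as quoted in the text above.
        backoff_time *= backoff_coefficient
    return delays


# Illustrative values only, not the defaults shipped with scrapyd-k8s.
delays = backoff_schedule(reconnection_attempts=5, backoff_time=5, backoff_coefficient=2)
print(delays)       # [5, 10, 20, 40, 80]
print(sum(delays))  # 155
```

With these example values the watcher would spend 155 time units waiting in total before giving up, which shows how quickly a larger `backoff_coefficient` or more `reconnection_attempts` stretches the overall retry window.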

#### When do I need to change them in the config file?

Default values for these parameters are provided in the code and are tuned to an "average" cluster setting. If your network
requirements or other conditions are unusual, you may need to adjust these values to better suit your specific setup.

5 changes: 3 additions & 2 deletions README.md
@@ -240,8 +240,9 @@ If you want to delete a version, remove the corresponding Docker image from the
Not supported, by design.
If you want to delete a project, remove it from the configuration file.
## Configuration file
For details about the config file, see the [Configuration Guide](CONFIG.md).
## Configuration
Configuration is done in the file `scrapyd_k8s.conf`; the options are explained in the [Configuration Guide](CONFIG.md).
## License