diff --git a/docs/architecture.md b/docs/architecture.md
index 23cd8dd64249..611a3bea54e2 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -29,7 +29,7 @@ mode. Monolithic mode is the default deployment of Loki when Loki is installed
 using Helm.
 
 When `target` is _not_ set to `all` (i.e., it is set to `querier`, `ingester`,
-or `distributor`), then Loki is said to be running in "horizontally scalable",
+`query-frontend`, or `distributor`), then Loki is said to be running in "horizontally scalable",
 or microservices, mode.
 
 Each component of Loki, such as the ingesters and distributors, communicate with
@@ -170,6 +170,36 @@ set of tokens.
 This process is used to avoid flushing all chunks when shutting down, which is a
 slow process.
 
+### Query frontend
+
+The **query frontend** is an **optional service** providing the querier's API endpoints and can be used to accelerate the read path. When the query frontend is in place, incoming query requests should be directed to the query frontend instead of the queriers. The querier service will still be required within the cluster to execute the actual queries.
+
+The query frontend internally performs some query adjustments and holds queries in an internal queue. In this setup, queriers act as workers which pull jobs from the queue, execute them, and return the results to the query frontend for aggregation. Queriers need to be configured with the query frontend address (via the `-querier.frontend-address` CLI flag) so that they can connect to the query frontends.
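+
+The pull model described above can be pictured with a small Python sketch. It is not Loki's implementation (Loki is written in Go, and real queriers connect to the frontend over the network). Here the frontend and the queriers are threads in one process, a shared `queue.Queue` stands in for the frontend's internal queue, and `execute` is a hypothetical placeholder for running a query.
+
+```python
+from __future__ import annotations
+
+import queue
+import threading
+
+
+def execute(job: str) -> str:
+    # Hypothetical placeholder for a querier actually running a query.
+    return f"result({job})"
+
+
+def querier_worker(jobs: queue.Queue, results: queue.Queue) -> None:
+    """Pull jobs until a None sentinel arrives; send each result back to the frontend."""
+    while True:
+        job = jobs.get()
+        if job is None:  # sentinel: the frontend has no more work for this worker
+            break
+        results.put((job, execute(job)))
+
+
+def frontend(requests: list[str], num_queriers: int = 3) -> dict[str, str]:
+    """Enqueue requests, let querier workers drain the queue, then aggregate the results."""
+    jobs: queue.Queue = queue.Queue()
+    results: queue.Queue = queue.Queue()
+    workers = [
+        threading.Thread(target=querier_worker, args=(jobs, results))
+        for _ in range(num_queriers)
+    ]
+    for worker in workers:
+        worker.start()
+    for request in requests:
+        jobs.put(request)
+    for _ in workers:
+        jobs.put(None)  # one sentinel per worker so every thread exits
+    for worker in workers:
+        worker.join()
+    # All workers have exited, so every result is already in the results queue.
+    aggregated: dict[str, str] = {}
+    while not results.empty():
+        job, result = results.get()
+        aggregated[job] = result
+    return aggregated
+
+
+# Sorted so the output does not depend on which worker finished first.
+print(sorted(frontend(["q1", "q2", "q3"]).items()))
+# [('q1', 'result(q1)'), ('q2', 'result(q2)'), ('q3', 'result(q3)')]
+```
+
+Because a worker only takes a job when it is free, a busy querier naturally picks up fewer jobs than an idle one.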
+
+Query frontends are **stateless**. However, due to how the internal queue works, it's recommended to run a few query frontend replicas to reap the benefit of fair scheduling. Two replicas should suffice in most cases.
+
+#### Queueing
+
+The query frontend queueing mechanism is used to:
+
+* Ensure that large queries that could cause an out-of-memory (OOM) error in the querier are retried on failure. This allows administrators to under-provision memory for queries, or optimistically run more small queries in parallel, which helps to reduce the total cost of ownership (TCO).
+* Prevent multiple large requests from being convoyed on a single querier by distributing them across all queriers using a first-in, first-out (FIFO) queue.
+* Prevent a single tenant from causing a denial of service (DoS) for other tenants by fairly scheduling queries between tenants.
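+
+The per-tenant fairness in the list above can be sketched as one FIFO queue per tenant, served round-robin across tenants. This is a minimal Python illustration under that assumption, not Loki's scheduler. The `FairQueue` class and its `max_outstanding_per_tenant` limit are hypothetical names chosen for the example, and retries are not modeled.
+
+```python
+from __future__ import annotations
+
+from collections import OrderedDict, deque
+
+
+class FairQueue:
+    """Hypothetical sketch: per-tenant FIFO queues served round-robin across tenants."""
+
+    def __init__(self, max_outstanding_per_tenant: int = 100) -> None:
+        self.max_outstanding = max_outstanding_per_tenant
+        # Insertion order of tenants decides whose turn comes next.
+        self.queues: OrderedDict[str, deque[str]] = OrderedDict()
+
+    def enqueue(self, tenant: str, request: str) -> None:
+        q = self.queues.setdefault(tenant, deque())
+        if len(q) >= self.max_outstanding:
+            # Reject instead of letting one tenant grow an unbounded backlog.
+            raise RuntimeError(f"too many outstanding requests for tenant {tenant}")
+        q.append(request)
+
+    def dequeue(self) -> tuple[str, str] | None:
+        """Return the oldest request of the tenant whose turn it is, or None if idle."""
+        if not self.queues:
+            return None
+        tenant, q = next(iter(self.queues.items()))
+        request = q.popleft()
+        # Send this tenant to the back of the line so other tenants go next.
+        self.queues.move_to_end(tenant)
+        if not q:
+            del self.queues[tenant]
+        return tenant, request
+
+
+fq = FairQueue()
+for r in ["a1", "a2", "a3"]:
+    fq.enqueue("A", r)
+fq.enqueue("B", "b1")
+print([fq.dequeue()[1] for _ in range(4)])  # ['a1', 'b1', 'a2', 'a3']
+```
+
+Even though tenant `A` queued three requests before tenant `B` queued one, `B` is served second instead of waiting behind all of `A`'s work.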
+
+#### Splitting
+
+The query frontend splits larger queries into multiple smaller queries, executes these queries in parallel on downstream queriers, and stitches the results back together. This prevents large (e.g., multi-day) queries from causing out-of-memory issues in a single querier and helps them execute faster.
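+
+A minimal Python sketch of split, fan out, and stitch follows. It assumes sub-ranges are cut every `interval` starting from the query's own start time (Loki's actual splitter may place cut points differently), and `run_subquery` is a hypothetical placeholder for sending one sub-query to a downstream querier.
+
+```python
+from __future__ import annotations
+
+from concurrent.futures import ThreadPoolExecutor
+from datetime import datetime, timedelta, timezone
+
+
+def split_by_interval(start: datetime, end: datetime, interval: timedelta) -> list[tuple[datetime, datetime]]:
+    """Cut [start, end) into consecutive sub-ranges no longer than `interval`."""
+    if interval <= timedelta(0):
+        raise ValueError("interval must be positive")
+    ranges = []
+    cursor = start
+    while cursor < end:
+        upper = min(cursor + interval, end)  # the last piece may be shorter
+        ranges.append((cursor, upper))
+        cursor = upper
+    return ranges
+
+
+def run_subquery(sub_range: tuple[datetime, datetime]) -> list[str]:
+    # Hypothetical placeholder: a real frontend would send this range to a querier.
+    lo, hi = sub_range
+    return [f"{lo:%H:%M}-{hi:%H:%M}"]
+
+
+def split_and_stitch(start: datetime, end: datetime, interval: timedelta, max_parallel: int = 4) -> list[str]:
+    sub_ranges = split_by_interval(start, end, interval)
+    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
+        # Executor.map runs sub-queries concurrently but yields results in input order.
+        partials = list(pool.map(run_subquery, sub_ranges))
+    # Stitch: concatenate the partial results in time order.
+    return [entry for partial in partials for entry in partial]
+
+
+day_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
+results = split_and_stitch(day_start, day_start + timedelta(days=1), timedelta(hours=1))
+print(len(results), results[0], results[-1])  # 24 00:00-01:00 23:00-00:00
+```
+
+Stitching here is plain concatenation; merging real metric or log results needs format-specific logic that this sketch leaves out.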
+
+#### Caching
+
+##### Metric Queries
+
+The query frontend supports caching metric query results and reuses them on subsequent queries. If the cached results are incomplete, the query frontend calculates the required subqueries and executes them in parallel on downstream queriers. The query frontend can optionally align queries with their step parameter to improve the cacheability of the query results. The results cache is compatible with any Loki caching backend (currently Memcached, Redis, and an in-memory cache).
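+
+The two ideas in this paragraph, step alignment and filling the gaps in partially cached results, can be illustrated with the Python sketch below. It assumes timestamps are integer milliseconds and that the cache reports which `[start, end)` extents it already holds; `align_to_step` and `missing_ranges` are hypothetical helpers, not Loki APIs. Alignment here rounds both ends down to a multiple of the step, so two requests issued a few seconds apart usually map to the same cacheable range.
+
+```python
+from __future__ import annotations
+
+
+def align_to_step(start_ms: int, end_ms: int, step_ms: int) -> tuple[int, int]:
+    """Round start and end down to a multiple of the step (illustrative)."""
+    return start_ms - start_ms % step_ms, end_ms - end_ms % step_ms
+
+
+def missing_ranges(start_ms: int, end_ms: int, cached: list[tuple[int, int]]) -> list[tuple[int, int]]:
+    """Return the parts of [start_ms, end_ms) not covered by any cached extent."""
+    gaps = []
+    cursor = start_ms
+    for c_start, c_end in sorted(cached):
+        if c_end <= cursor or c_start >= end_ms:
+            continue  # extent lies entirely outside the part still to be covered
+        if c_start > cursor:
+            gaps.append((cursor, c_start))  # uncovered hole before this extent
+        cursor = max(cursor, c_end)
+        if cursor >= end_ms:
+            break
+    if cursor < end_ms:
+        gaps.append((cursor, end_ms))  # uncovered tail after the last extent
+    return gaps
+
+
+start, end = align_to_step(1_000_500, 1_600_200, step_ms=60_000)
+print(start, end)  # 960000 1560000
+print(missing_ranges(start, end, cached=[(960_000, 1_200_000)]))  # [(1200000, 1560000)]
+```
+
+Each returned gap would become its own subquery, while the cached portion is reused as-is.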
+
+##### Log Queries - Coming soon!
+
+Caching for log (filter, regexp) queries is under active development.
+
 ### Querier
 
 The **querier** service handles queries using the [LogQL](./logql.md) query
diff --git a/docs/configuration/examples.md b/docs/configuration/examples.md
index d48e6ef0c78e..7da0e851d034 100644
--- a/docs/configuration/examples.md
+++ b/docs/configuration/examples.md
@@ -4,6 +4,7 @@
 2. [Google Cloud Storage](#google-cloud-storage)
 3. [Cassandra Index](#cassandra-index)
 4. [AWS](#aws)
+5. [Using the query-frontend](#query-frontend)
 
 ## Complete Local config
 
@@ -161,3 +162,7 @@ storage_config:
     s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
     s3forcepathstyle: true
 ```
+
+## Query Frontend
+
+[example configuration](./query-frontend.md)
diff --git a/docs/configuration/query-frontend.md b/docs/configuration/query-frontend.md
new file mode 100644
index 000000000000..91de063084a8
--- /dev/null
+++ b/docs/configuration/query-frontend.md
@@ -0,0 +1,137 @@
+## Kubernetes Query Frontend Example
+
+### Disclaimer
+
+This aims to be a general-purpose example; there are a number of substitutions to make for it to work correctly. These placeholders take the form `<variable_name>`; replace them with values specific to your environment.
+
+### Use case
+
+It's common to start running Loki as a single binary while trying it out, in order to simplify deployments and defer learning the (initially unnecessary) nitty-gritty details. As we become more comfortable with its paradigms and begin migrating towards a more production-ready deployment, there are a number of things to be aware of. A common bottleneck is on the read path: queries that executed effortlessly on small data sets may grind to a halt on larger ones. Sometimes we can solve this with more queriers. However, that doesn't help when our queries are too large for a single querier to execute. That's when we need the query frontend.
+
+#### Parallelization
+
+One of the most important functions of the query frontend is the ability to split larger queries into smaller ones, execute them in parallel, and stitch the results back together. How often it splits them is determined by the `-querier.split-queries-by-interval` flag or the YAML config option `query_range.split_queries_by_interval`. With this set to `1h`, the frontend will split a day-long query into 24 one-hour queries, distribute them to the queriers, and collect the results. This is immensely helpful in production environments: it not only allows us to perform larger queries via aggregation, but also evens out the work distribution across queriers so that one or two are not stuck with impossibly large queries while others are left idle.
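+
+The arithmetic in this paragraph can be checked with a few lines of Python. The `num_subqueries` helper is hypothetical and assumes cuts every `split_queries_by_interval` from the query's start, with a shorter final piece when the range does not divide evenly. It also shows how the `15m` value used in the ConfigMap below changes the fan-out for the same day-long query.
+
+```python
+from datetime import timedelta
+
+
+def num_subqueries(query_range: timedelta, split_interval: timedelta) -> int:
+    """Number of sub-queries when cutting `query_range` every `split_interval` (ceiling division)."""
+    whole, remainder = divmod(query_range, split_interval)
+    # A non-zero remainder means one extra, shorter sub-query at the end.
+    return whole + (1 if remainder else 0)
+
+
+day = timedelta(days=1)
+print(num_subqueries(day, timedelta(hours=1)))                     # 24
+print(num_subqueries(day, timedelta(minutes=15)))                  # 96
+print(num_subqueries(timedelta(minutes=90), timedelta(hours=1)))   # 2
+```
+
+A smaller interval produces more, smaller sub-queries to spread across the queriers, at the cost of more requests per query.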
+
+## Kubernetes Deployment
+
+### ConfigMap
+
+Use this ConfigMap to get the benefits of query parallelization and caching with the query frontend component.
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: loki-frontend
+  namespace: <namespace>
+data:
+  config.yaml: |
+    # Disable the requirement that every request to Loki has an
+    # X-Scope-OrgID header. `fake` will be substituted in instead.
+    auth_enabled: false
+
+    # We don't want the usual /api/prom prefix.
+    http_prefix:
+
+    server:
+      http_listen_port: 3100
+
+    query_range:
+      # make queries more cache-able by aligning them with their step intervals
+      align_queries_with_step: true
+      max_retries: 5
+      # parallelize queries in 15min intervals
+      split_queries_by_interval: 15m
+      cache_results: true
+
+      results_cache:
+        max_freshness: 10m
+        cache:
+          # We're going to use the in-process "FIFO" cache
+          enable_fifocache: true
+          fifocache:
+            size: 1024
+            validity: 24h
+
+    frontend:
+      log_queries_longer_than: 5s
+      downstream: http://querier.<namespace>.svc.cluster.local:3100
+      compress_responses: true
+```
+
+### Frontend Service
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  annotations:
+  labels:
+    name: query-frontend
+  name: query-frontend
+  namespace: <namespace>
+spec:
+  ports:
+  - name: query-frontend-http
+    port: 3100
+    protocol: TCP
+    targetPort: 3100
+  selector:
+    name: query-frontend
+  sessionAffinity: None
+  type: ClusterIP
+```
+
+### Frontend Deployment
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  labels:
+    name: query-frontend
+  name: query-frontend
+  namespace: <namespace>
+spec:
+  minReadySeconds: 10
+  replicas: 2
+  selector:
+    matchLabels:
+      name: query-frontend
+  template:
+    metadata:
+      labels:
+        name: query-frontend
+    spec:
+      containers:
+      - args:
+        - -config.file=/etc/loki/config.yaml
+        - -log.level=debug
+        - -target=query-frontend
+        image: grafana/loki:latest
+        imagePullPolicy: Always
+        name: query-frontend
+        ports:
+        - containerPort: 3100
+          name: http
+          protocol: TCP
+        resources:
+          limits:
+            memory: 1200Mi
+          requests:
+            cpu: "2"
+            memory: 600Mi
+        volumeMounts:
+        - mountPath: /etc/loki
+          name: loki-frontend
+      restartPolicy: Always
+      terminationGracePeriodSeconds: 30
+      volumes:
+      - configMap:
+          defaultMode: 420
+          name: loki-frontend
+        name: loki-frontend
+```
+
+### Grafana
+
+Once you've deployed these, point your Grafana data source to the new frontend service, now available within the cluster at `http://query-frontend.<namespace>.svc.cluster.local:3100`.
diff --git a/docs/overview/README.md b/docs/overview/README.md
index 5bffaacc192c..129e10c8b8b0 100644
--- a/docs/overview/README.md
+++ b/docs/overview/README.md
@@ -125,6 +125,10 @@ logs stored in long-term storage.
 It first tries to query all ingesters for in-memory data before falling back to
 loading data from the backend store.
 
+### Query frontend
+
+The **query frontend** service is an optional component in front of a pool of queriers. It's responsible for fairly scheduling requests between tenants, distributing them across the queriers, parallelizing them when possible, and caching results.
+
 ## Chunk Store
 
 The **chunk store** is Loki's long-term data store, designed to support