This is the source for https://docs.snowplow.io/docs.
All contributions are welcome, from reporting issues to correcting typos and formatting to full-blown how-tos and guides.
If you are new to Github, the easiest way to propose changes is via the UI.
For more substantial contributions, below you can find tips on how to preview your changes, as well as how to organize and format your content.
Step 1. Clone this repository.
Step 2 (one-time setup). Install some tools:
brew install node
brew install yarn
# On Linux, use your favorite package manager
Step 3. Install dependencies and start the preview:
yarn
yarn start
(Hitting an error with yarn start
? brew upgrade yarn
and try again. Yarn might also suggest further commands to update docusaurus
.)
Step 4. Go to localhost:3000
in your browser and enjoy!
In this section you’ll find some general tips on how the docs are structured.
The sidebar on the left follows file structure (all docs are in the /docs
folder). So when you add new pages, create folders as you see fit.
To control the position of a section in the sidebar, go to the index.md
file for that section and adjust the sidebar_position
attribute at the top (see this example). Sidebar positions are just numbers, and you can use any number as long as the order is correct.
The text shown in the sidebar doesn't have to be the same as the page title. Use the sidebar_label
attribute to specify a different label for the section.
To update the sidebar_position
, sidebar_label
, and/or page title for multiple sections in one go, you can use the extract_index_attributes.rb
script followed by the update_index_attributes.rb
script. They're both in the root of this repo.
Use the extract_index_attributes.rb
script like this, providing a path to the parent folder you're interested in:
ruby extract_index_attributes.rb docs/collecting-data
It will output the relevant attributes of that folder's index.md
file, and the index.md
files of subfolders in it (just 1 level deep), into a new file called update_attributes_here.txt
(in the repo root). The file will look something like this:
docs/data-product-studio/data-quality/failed-events
- title: "Managing data quality"
- sidebar_label: "Failed events"
- sidebar_position: 3
docs/data-product-studio/data-quality/failed-events/exploring-failed-events
- title: "Exploring failed events"
- sidebar_label: "Explore"
- sidebar_position: 3
Update the attributes to the new values you want. You can delete the sections for any folders you don't want to edit.
To also output the attributes of the next level of subfolders inside each subfolder (so 2 levels deep in total), use the -r
flag:
ruby extract_index_attributes.rb -r docs/collecting-data
Once you finish editing update_attributes_here.txt
, save the file and run the update_index_attributes.rb
script:
ruby update_index_attributes.rb
It'll update the index.md
files as appropriate.
You can now delete the update_attributes_here.txt
file.
Some documentation is only relevant to a particular offering. You can indicate it like this:
---
title: ...
...
sidebar_custom_props:
offerings:
- bdp
...
---
This will result in an icon appearing in the sidebar, as well as an automatic banner on the page, specifying that the docs only apply to a given offering.
The available values are: bdp
and community
. Do not specify both values at once — if a piece of documentation is relevant to all offerings, there should be no offerings
property as that’s the default.
Whenever the same functionality can be achieved in multiple offerings but in a different way (e.g. managing schemas), create a parent folder (“Managing schemas”) that’s offering-neutral, and then add offering-specific pages inside it. This way, other pages can link to the generic page without having to specify different methods for different offerings.
For links within this documentation, please end the link with /index.md
. This way all links will be checked, and you’ll get an error if a link is broken at any point.
If you forgot to do this, you can quickly fix a bunch of links by running ./make-links-validatable.sh
. (You might need to install the moreutils
package via brew install moreutils
to get the sponge
command.)
Avoid using relative links (e.g. ../../setup/index.md
) unless within a versioned module section.
There are several key concepts in Snowplow: events (self-describing, structured), entities, schemas. We must ensure that we use and explain them consistently.
Please, use up-to-date terms:
- Self-describing event, not unstructured event
- Entities, not contexts (it’s ok-ish to refer to a set of entities as “context”, but only in a casual sense, as in “these provide some context to the event”)
- Failed events and not bad rows, unless specifically referring to the legacy bad row JSON format and associated tooling
- If you are writing about schemas, pick “schema” or “data structure” and stick with it
Please, do not over-explain these in any of your writing. Instead, just link to one of the existing concept pages:
- Events in general:
/docs/understanding-your-pipeline/events/index.md
- Custom events:
/docs/understanding-your-pipeline/events/index.md#self-describing-events
- Self-describing events:
/docs/understanding-your-pipeline/events/index.md#self-describing-events
- Structured events:
/docs/understanding-your-pipeline/events/index.md#structured-events
- Entities in general:
/docs/understanding-your-pipeline/entities/index.md
- Custom entities:
/docs/understanding-your-pipeline/events/index.md#custom-entities
- Schemas:
/docs/understanding-your-pipeline/schemas/index.md
- Iglu resolvers/registries:
/docs/understanding-your-pipeline/schemas/index.md#iglu
- Failed events:
/docs/understanding-your-pipeline/failed-events/index.md
You can create reusable fragments and include them in multiple files (see this example).
Some of our modules are versioned (e.g. trackers, loaders). Here are a few simple rules to follow.
- Within pages for versioned modules ONLY, use relative links (e.g.
../setup/index.md
) when pointing to pages for the same version. This helps moving directories around without breaking the links. - For the latest docs, don’t include the version number in the URL. Otherwise we’d need to update internal links to it with every version change (also, it would get indexed and we’ll need to add a redirect later to avoid breaking external links). For example, see the Scala tracker docs — the path ends with
scala-tracker
rather thanscala-tracker-2-0
. - Put older versions in a single folder, e.g.
previous-versions/
. In theindex.md
for that folder, add the following:This automatically enables the “you are looking at an old version” warning. See the Scala tracker docs for an example of how to add thesidebar_custom_props: outdated: true
previous-versions
directory and what to put there. - When a new version is released, you can either update the latest version pages, or move them to
previous-versions
and replace with the new content. If there are not too many breaking changes, you might want to do the former to avoid having too many previous version directories. - Put the latest bugfix version for each component into componentVersions.js. This way you only need to update it in one place when a new bugfix release comes out. See the Scala tracker docs for how to then use this on the page. If you need to use them in a markdown table you will have to render it in a particular way for it to work, see the dbt package docs for an example.
When you move pages around, make sure to add a redirect in static/_redirects
.
This ensures that any external links pointing to the old URL still work.
To easily accomplish this, you can use ./move.sh
. (You might need to install the moreutils
package via brew install moreutils
to get the sponge
command.)
./move.sh docs/old/page/location docs/new/page/location
This command will automatically move the pages, create redirect rules, and add all changes to git.
Note: the script is somewhat brittle, and you need to follow these rules:
- Only run it from the root of the repo
- Use relative paths, starting with
docs/
(like above) - End the path on a directory, rather than an
index.md
- Do not include trailing slashes
To move multiple sections at once, use the move-multiple.sh
script. It's a wrapper around the move.sh
script, so it has the same limitations as that one.
Add the old and new paths to the moves
array at the top of the move-multiple.sh
script. For example:
moves=(
"docs/collecting-data docs/sources",
"docs/contributing docs/resources/contributing"
)
Run the script with
./move-multiple.sh
It will run the move.sh
script for each move command, in order. If a folder in the new path doesn't exist, it'll create it.
Additionally, you can add a new sidebar label for each move command. It'll be added to (or update existing) the index.md
file of the moved folder. This can make it easier to find your rearranged pages.
For example:
moves=(
"docs/collecting-data docs/sources Sources NEW Update this",
)
In this section you’ll find some general tips on how to write the pages.
The documentation is written in Markdown. In addition, since we are using Docusaurus, more features are available. Here are a few of our favorites:
- Use “admonitions” (e.g.
:::note
) to draw attention to a certain paragraph. - Use collapsible blocks for information that most readers will want to skip.
- Use code blocks for code, and don’t forget to specify the languange.
- Use tabs for content where multiple alternatives are possible (e.g. iOS code vs Android code). Inside the tabs, try to only put the content that differs.
To ensure consistency of our codebase we also utilize prettier to format our source files and enforce correctness in a CI step.
For the best experience set up your IDE to automatically format files on save. Here's a guide for VSCode.
You can also run the formatter command before committing changes manually:
yarn format
We need to ensure that all images are visible in both light and dark themes, if your image does not have a background this can be achieved in a few ways:
- Where possible for diagrams prefer the use of mermaid diagrams as these adjust automatically
- If you are using
drawio
diagrams, ensure that either the diagram has a non-transparent background, or provide a dark and light mode version using Themed images. If possible also include a copy of your diagram in the image as part of the export to allow easy editing in the future. - If you are editing a page with an existing diagram you cannot recreate, you can replace the existing image link with the below, which will add the light background even in dark mode. If you do this please also comment on the issue here to let us know so we can try and re-make the diagram in the future.
<div style={{"background-color": '#F2F4F7'}}> <img src={require("./images/IMAGE_NAME.png").default}/> </div>
We have created a selection of VSCode Snippets that cover some of the common, but fiddly, blocks of code you may require when writing docs. You can trigger these by starting to type their name and then pressing tab or use Insert Snippet
in the Command Palette. All snippets we have created can be found here and we currently have ones for:
- Admonitions
- Collapsible blocks
- Tabs
- Themed images
- DocCards
- Mdx blocks for enabling tabs and themed images
- Front matter