diff --git a/src/docs/core-hashing.md b/src/docs/core-hashing.md index 5f420fe3..ed027699 100644 --- a/src/docs/core-hashing.md +++ b/src/docs/core-hashing.md @@ -82,7 +82,7 @@ basically anywhere in the binary, but much like the Unix `strip` command, for the sake of simplicity and correctness, we move the section table to the back of the binary. Moreover, it would take significant additional work and require some storage to make this -operation invertable. +operation invertible. As a result, the Chalk Hash (the `HASH` metadata key), is not defined based on the file system hash. Instead, it is a _normalized_ hash, @@ -104,8 +104,8 @@ artifacts must be semantically identical. ### More on The Chalk ID -Once an artifact has been normalized, and the normalizated data stream -has been hashed using SHA-256, we programiatically take 100 bits of +Once an artifact has been normalized, and the normalized data stream +has been hashed using SHA-256, we programmatically take 100 bits of the raw hash output, base-32 encode those bits, and then add some hyphens for clarity, to get the `CHALK_ID`. diff --git a/src/docs/core-secret-manager-api.md b/src/docs/core-secret-manager-api.md index 27db4459..6222a664 100644 --- a/src/docs/core-secret-manager-api.md +++ b/src/docs/core-secret-manager-api.md @@ -6,7 +6,7 @@ signing and attestation operations. All secrets and keying material are locally generated on the system running chalk, with the secret itself being encrypted -locally priot to being sent to the API. +locally prior to being sent to the API. This document provides an overview of the Secret Manager API, how data is stored securely, and how chalk interacts with the API as a @@ -125,7 +125,7 @@ open source. The encryption scheme makes use of a PRP using the Luby-Rackoff construction. The easiest thing for us to do is to break the input into two 'halves',one being 128 bits (the width of AES, which we -will call the 'lefthalf'), and the other the rest of the remaining +will call the 'left half'), and the other the rest of the remaining width of the input (the 'right half'). The nonce is random. @@ -145,13 +145,13 @@ generate a key stream, that we XOR into the right half. The other PRF is HMAC-3. We take the round key, HMAC the right side, truncate the result to 128 bits, then XOR into the left half. -The PRFs are used in a feistel cipher, so we alternate PRFs through -our four feistel rounds. +The PRFs are used in a Feistel cipher, so we alternate PRFs through +our four Feistel rounds. While three-round Luby-Rackoff is secure against some use cases, we go through the full four rounds. -PRPs are reversable, and with feistel contstructions, it's by +PRPs are reversible, and with Feistel construction, it's by running the rounds backward. Once constructed it is this encrypted value that is sent to the diff --git a/src/docs/guide-config-overview.md b/src/docs/guide-config-overview.md index f5c4c31a..7fac4d71 100644 --- a/src/docs/guide-config-overview.md +++ b/src/docs/guide-config-overview.md @@ -32,7 +32,7 @@ The exact metadata that will be getting included in a report are defined in _templates_ which are simply collections of metadata keys (with optional conditions on when said metadata should be getting emitted). The same template can be re-used across many reports, however each of the different reports -making use of the template could have different trigger/generation condidtions +making use of the template could have different trigger/generation conditions and different destinations. Here is an excerpt from the template used by default for any metadata extracted @@ -235,7 +235,7 @@ custom_report chalk_s3_logger { ``` -Notice that we have also suppreassed local terminal output for the above report. +Notice that we have also suppressed local terminal output for the above report. ### Updating the used templates diff --git a/src/docs/guide-getting-started.md b/src/docs/guide-getting-started.md index 70c06942..32b8d03b 100644 --- a/src/docs/guide-getting-started.md +++ b/src/docs/guide-getting-started.md @@ -12,9 +12,9 @@ CI/CD pipeline. In many cases, it can be completely transparent to the user. Any configuration should be done up-front by whoever needs the data -from chalk. While chalk is designed to be deeply customisable, we also +from chalk. While chalk is designed to be deeply customizable, we also worked hard to make out-of-the-box configurations useful, and to make -it very easy to configure common usecases. +it very easy to configure common use-cases. First, let's do some basics to get up and running, both with chalking artifacts, and reporting on them in production. @@ -239,7 +239,7 @@ to note for now: 1. We've captured basic information about the build environment, including our repo, branch and commit ID. If you pull a repo remotely - from Github or Gitlab, the "ORIGIN_URI" key will give the URL where + from GitHub or GitLab, the "ORIGIN_URI" key will give the URL where the repository is hosted, instead of `local`. 2. In addition to the report, we inserted a JSON blob into our @@ -583,7 +583,7 @@ without specifying a file name (which will just print to stdout): chalk dump ``` -Youu should see: +You should see: ```bash # The default config is empty. Please see chalk documentation for examples. @@ -751,7 +751,7 @@ code you're running. Chalk really only monitors a subset of docker commands, but when wrapping docker, it will pass through all docker commands even if it -doesn't do any of its own processing on them. If chalk encoounters an +doesn't do any of its own processing on them. If chalk encounters an error while attempting to wrap docker, it will then execute the underlying docker command without chalk so that this doesn't break any pre-existing pipelines. @@ -820,7 +820,7 @@ curl http://127.0.0.1:8585/execs # for pretty json output if you have jq installed, run `curl http://127.0.0.1:8585/execs | jq` ``` -![serverout](./img/execout.png){ loading=lazy } +![exec output](./img/execout.png){ loading=lazy } You can see that, in addition to artifact information, there is also information about the operating environment, including the container diff --git a/src/docs/guide-heartbeat.md b/src/docs/guide-heartbeat.md index 913f5031..4e8dd248 100644 --- a/src/docs/guide-heartbeat.md +++ b/src/docs/guide-heartbeat.md @@ -8,7 +8,7 @@ This document is a guide on how to configure chalk so that a chalked binary or docker container emits a snapshot of network connections at set intervals. -### Prerequisities +### Prerequisites - chalk binary - (optional) dockerfile for a docker image with compatible architecture diff --git a/src/docs/guide-user-guide.md b/src/docs/guide-user-guide.md index 725bfa3f..4ccbba9b 100644 --- a/src/docs/guide-user-guide.md +++ b/src/docs/guide-user-guide.md @@ -46,7 +46,7 @@ wizard, which we expect will meet most configuration needs. We will be making source code available at the time of our public launch. Instructions on how to build directly and building via docker -file are availabe in the [Chalk Getting Started +file are available in the [Chalk Getting Started Guide](./guide-getting-started.md), as well as instructions on how to download pre-built chalk binaries. @@ -249,7 +249,7 @@ will, by default: 3. Generate a chalk report with metadata on the build operation. Chalk also reports a bit of metadata when pushing images to help -provide full tracability. +provide full traceability. Chalk can also be configured to add build-time attestation when possible. @@ -367,7 +367,7 @@ Metadata is at the core of Chalk, which categorizes data into four types: 1. **Chalk-time artifact metadata**, which is data specific to a software artifact, collected when inserting chalk marks. This data can - be put into a chalk mark, and it can also be seprately reported + be put into a chalk mark, and it can also be separately reported without putting it in the chalk mark. 2. **Chalk-time host metadata**, which is data about the environment @@ -441,7 +441,7 @@ interoperability across implementations. For instance, it is easy to write a compliant chalk library that allows programs to store their implementations inside their -executable, and retrieve them, while still interoperating with other +executable, and retrieve them, while still inter-operating with other programs that collect a wider range of metadata. We certainly intend to allow other people to implement compatible @@ -531,7 +531,7 @@ modify chalk marks. Starting with Chalk 0.1.1, Chalk mark injectors that find an existing chalk mark in an artifact will, if replacing the chalk mark, keep `$` -keys they do not recognize, unless specificly configured to remove +keys they do not recognize, unless specifically configured to remove them, while also considering them part of the previous chalk mark. With Chalk 0.1.0, the `$CHALK_CONFIG` key is the only allowable key, @@ -593,7 +593,7 @@ keys will always be directly taken from the chalk mark. No keys without the leading underscore can be reported for non-insertion operations unless they are found in a chalk mark. -We do recommend, at chalk insertion time, to to be thoughtful about +We do recommend, at chalk insertion time, to be thoughtful about what metadata will be added to the chalk mark itself. There are two key reasons for this: @@ -609,7 +609,7 @@ There are two key reasons for this: in practice, some metadata objects may be quite large, such as generated SBOMs or static analysis reports. -The first concern is, by far, the most sigificant. Even in cases where +The first concern is, by far, the most significant. Even in cases where software never intentionally leaves an organization, there can be risks. For instance, if the chalk mark contains code ownership or other contact information, while it does make life easier for @@ -669,7 +669,7 @@ In the first case, the mark does NOT need to be at the end of the file, due to the support for placeholders. A valid placeholder consists of the JSON object `{ "MAGIC" : -"dadfedabbadabbed" }`. The presense of spaces and the number of spaces +"dadfedabbadabbed" }`. The presence of spaces and the number of spaces is all flexible, but no newlines are allowed. The intent here is to allow developers to specify where they want @@ -694,8 +694,8 @@ solution. Currently, we're considering two approaches: 1. File-based artifacts will need to be scanned in their entirety before marking, and if a mark is found, the spot is reused. This would - make things easier on implementators, but could impact performance for - some larger artifiacts. + make things easier on implementors, but could impact performance for + some larger artifacts. 2. We may require marking the locations that older versions would have selected with a mark that invalidates the location, and points to the @@ -763,7 +763,7 @@ well-defined image format is not allowed. ### Replacing existing marks When a Chalk mark already exists in a document, it's up to the context -of the insertion whether the the existing chalk mark should be +of the insertion whether the existing chalk mark should be removed. In most cases, an existing chalk mark should be preserved. For instance, when chalking during deployment, any previous chalk mark from the build process should be preserved. @@ -788,7 +788,7 @@ strongly discourage using those keys without reporting. Extractors generally do not need to care about file structure for non-image formats. It should be sufficient for them to scan the bytes -of such artifacts, looking for the existance of Chalk `MAGIC` key. +of such artifacts, looking for the existence of Chalk `MAGIC` key. However, for image-based formats, the extractor needs to be aware enough of the marking requirements for that format to be able to @@ -882,7 +882,7 @@ For more information, see the following: fields. Documentation for keys will also include the conditions where the reference implementation can find them. - [The Config Overview Guide](./guide-config-overview.md) covers how - to to configure WHERE reports get sent. + to configure WHERE reports get sent. Note that compliant insertion implementations do not require compliant reporting implementations. But compliant chalk tools for other @@ -890,7 +890,7 @@ operations MUST produce fully conformant JSON. However, there are no requirements on how that JSON gets distributed or managed, other than that compliant implementations must provide a -straightforward way to make the JSON avilable to users if desired. +straightforward way to make the JSON available to users if desired. A report not in the proper format, or with key/values pairs that are not compliant, is not a Chalk report. @@ -926,7 +926,7 @@ The normalization algorithm is as follows: `TZ_OFFSET`, `DATETIME`. 3. The following key/value pair is encoded LAST, (whenever present): `ERR_INFO`. -4. The remaining keys are encoded in lexigraphical order. +4. The remaining keys are encoded in lexicographical order. 5. The encoding starts with the number of keys in the normalization, as a 32-bit little endian integer. 6. Each key/value pair is encoded in order by encoding the key, and @@ -974,7 +974,7 @@ validation discussed below built on top of the `METADATA_ID`. We currently omit `EMBEDDED_CHALK`, instead allowing them to be independently validated, if desired. While this does mean the `EMBEDDED_CHALK` key can be excised without detection at validation -time, we expect that either the relevent sub-artifacts will have +time, we expect that either the relevant sub-artifacts will have embedded chalk marks themselves, or the server will have record of the insertion. @@ -1000,12 +1000,12 @@ well, as long as there is a `HASH` field). In containers, where we do not have an easy, reliable hash, metadata normalization and validation works the same way. But we strongly -recommend automatic digitial signatures to ensure that you can detect +recommend automatic digital signatures to ensure that you can detect changes to the container. Digital signing can be used both with containers and with other artifacts. With containers, we use Sigstore with their In-Toto -attestations that we appply on `docker push`. The mark is replicated +attestations that we apply on `docker push`. The mark is replicated in full inside the attestation. For other artifacts, the signature is stored in the Chalk mark, but is @@ -1044,7 +1044,7 @@ docker.label_prefix: "com.example." ``` In the configuration file, we can also set up environment variables -for reporting, such as by defining new environment variablaes and +for reporting, such as by defining new environment variables and using simple if / else logic to set a default if the environment variable is not set on the host. For example: @@ -1135,7 +1135,7 @@ reporting on any of those keys. | Artifact | Any software artifact handled by Chalk, which can recursively include other artifacts. For instance, a Zip file is an artifact type that can currently be chalked, which can contain ELF executables that can also be chalked. | | Chalk Mark | JSON containing metadata about a software artifact, generally inserted directly into the artifact in a way that doesn’t affect execution. Often, a chalk mark will be minimal, containing only small bits of identifying information that can be used to correlate the artifact with other metadata collected. | | Unchalked | A software artifact that does not have a chalk mark embedded in it. | -| Metadata Key | Each piece of metadata Chalk is able to collect (metadata being data about an artifact or a host on which an artifact has been found) is associated with a metadata key. Chalk reports all metadata in JSon key/value pairs, and you specify what gets added to a chalk mark and what gets reported on by listing the metadata keys you’re interested in via the report template and mark emplate. | +| Metadata Key | Each piece of metadata Chalk is able to collect (metadata being data about an artifact or a host on which an artifact has been found) is associated with a metadata key. Chalk reports all metadata in JSon key/value pairs, and you specify what gets added to a chalk mark and what gets reported on by listing the metadata keys you’re interested in via the report template and mark template. | | Chalking | The act of adding metadata to a software artifact. Aka, “insertion”. | | Extraction | The act of reading metadata from artifacts and reporting on them. | | Report | Every time Chalk runs, it will want to report on its activity. That can include information about artifacts, and also about the host. Reports are “published” to output “sinks”. By default, you’ll get reports output to the console, and written to a local log file, but can easily set up HTTPS post or writing to object storage either by supplying environment variables, or by editing the Chalk configuration. | @@ -1143,6 +1143,6 @@ reporting on any of those keys. | Mark Template | Like report templates, you have complete flexibility over what goes into chalk marks. A mark template is a specification of metadata keys that you want to go into the chalk mark. | | Sinks | Output types handled by Chalk. Currently, chalk supports JSON log files, rotating (self-truncating) JSON log files, s3 objects, http/https post, and stdin/stdout. | | Chalk ID | A value unique to an unchalked artifact. Usually, it is derived from the SHA-256 hash of the unchalked artifact, except when that hash is not available at chalking time, in which case, it’s random. Chalk IDs are 100 bits, and human readable (Base32). | -| Metadata ID | A value unique to a chalked artifact. It is always derived from a normalized hash of all other metadata (except for any metadata keys involved in signing the Metadata ID). Metdata IDs are also 100 bits, and Base32 encoded. | +| Metadata ID | A value unique to a chalked artifact. It is always derived from a normalized hash of all other metadata (except for any metadata keys involved in signing the Metadata ID). Metadata IDs are also 100 bits, and Base32 encoded. | | Chalkable keys | Metadata keys that can be added to chalk marks. When reported for an artifact (e.g., during extraction in production), they will always indicated chalk-time metadata. | | Non-chalkable keys | Metadata keys that will NOT be added to chalk marks. They will always be reported for the current operation, and start with a `_`. There are plenty of metadata keys that have chalkable and non-chalkable versions. | diff --git a/src/docs/howto-app-inventory.md b/src/docs/howto-app-inventory.md index 00fff409..a4a89fee 100644 --- a/src/docs/howto-app-inventory.md +++ b/src/docs/howto-app-inventory.md @@ -248,7 +248,7 @@ at [http://localhost:8585/docs](http://localhost:8585/docs) ## Warning -This how-to was written for local demonstration purposes only.There is no security for this how-to. You should always have authn, authz and uses SSL as an absolute minimum. +This how-to was written for local demonstration purposes only. There is no security for this how-to. You should always have authentication, authorization and use TLS/SSL as an absolute minimum. ## Our cloud platform diff --git a/src/docs/howto-compliance.md b/src/docs/howto-compliance.md index 2822a5c4..9177ddeb 100644 --- a/src/docs/howto-compliance.md +++ b/src/docs/howto-compliance.md @@ -243,7 +243,7 @@ in those environments often take advantage of access to build environments to subtly trojan software. Therefore, very mature security programs in places with an acute -awareness of wisk, very much want the ability to monitor the integrity +awareness of risk, very much want the ability to monitor the integrity of builds throughout their supply chain. And, baring that, they'd are looking to get as much information as possible, or at least some assurances of build practices. @@ -264,7 +264,7 @@ was deployed and what else was going on in the environment. Chalk was built originally for those internal use cases, to get people the data they need to be able to automate the work they do to graph out the relationships in their software. However, the exact same -approach turns out to give other companies extactly what they're +approach turns out to give other companies exactly what they're looking for. #### Digital signatures @@ -391,11 +391,11 @@ by a hosted build platform. Lodash is a popular NPM library that was, in March 2021 found to have a prototype pollution vulnerability. It was the most depended on package in NPM meaning almost all applications built in Node.js were affected. -[Prototype Pollution in lodash](https://github.com/advisories/GHSA-p6mc-m468-83gw) - Github +[Prototype Pollution in lodash](https://github.com/advisories/GHSA-p6mc-m468-83gw) - GitHub ##### Netbeans -Ih 2020 it was reported that a tool called the Octopus scanner was searching Github and injecting malware into projects that were using the popular Java development framework Netbeans and then serving up malware to all applications built from those code repos. +In 2020 it was reported that a tool called the Octopus scanner was searching GitHub and injecting malware into projects that were using the popular Java development framework Netbeans and then serving up malware to all applications built from those code repos. [https://duo.com/decipher/malware-infects-netbeans-projects-in-software-supply-chain-attack](https://duo.com/decipher/malware-infects-netbeans-projects-in-software-supply-chain-attack) - Duo Research Labs @@ -407,6 +407,6 @@ Log4J is a popular logging library for the Java programming language. In late 20 ##### SolarWinds -The SolarWinds attack used an IT monitoring system, Orion, which which had over 30,000 organizations including Cisco, Deloitte, Intel, Microsoft, FireEye, and US government departments, including the Department of Homeland Security. The attackers created a backdoor that was delivered via a software update. +The SolarWinds attack used an IT monitoring system, Orion, which had over 30,000 organizations including Cisco, Deloitte, Intel, Microsoft, FireEye, and US government departments, including the Department of Homeland Security. The attackers created a backdoor that was delivered via a software update. [The Untold Story of the Boldest Supply-Chain Hack Ever](https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/) - Wired Magazine diff --git a/src/docs/howto-net-services.md b/src/docs/howto-net-services.md index 90b705d4..d7681dbd 100644 --- a/src/docs/howto-net-services.md +++ b/src/docs/howto-net-services.md @@ -139,7 +139,7 @@ configured reporting. You should see some additional JSON output from `chalk` after the build finishes, identifying the metadata information for the newly -chalked contianer: +chalked container: ```json [ @@ -187,7 +187,7 @@ chalked contianer: If you built your container with the commands above, you should now be able to now run it with: `docker run --rm -it mychalkedcontainer` -Also, if you kept the the `output_to_screen` sink to be `enabled: +Also, if you kept the `output_to_screen` sink to be `enabled: true`, and set the heartbeat window to 10 seconds, then after 10 seconds you should see output similar to the following: