From 961ad404f2a468d008a17ad8572844b343e00bab Mon Sep 17 00:00:00 2001
From: blublinsky
Date: Thu, 25 Apr 2024 21:12:28 +0100
Subject: [PATCH] readme fixes

---
 data-processing-lib/doc/using_s3_transformers.md    | 2 ++
 data-processing-lib/src/data_processing/__init__.py | 0
 transforms/code/code_quality/README.md              | 3 +++
 transforms/code/malware/README.md                   | 5 +++++
 transforms/code/proglang_select/README.md           | 7 +++++++
 transforms/universal/doc_id/Readme.md               | 5 +++++
 transforms/universal/ededup/Readme.md               | 4 ++++
 transforms/universal/fdedup/Readme.md               | 5 +++++
 transforms/universal/filter/README.md               | 6 ++++++
 transforms/universal/noop/README.md                 | 4 ++++
 transforms/universal/tokenization/README.md         | 5 +++++
 11 files changed, 46 insertions(+)
 delete mode 100644 data-processing-lib/src/data_processing/__init__.py

diff --git a/data-processing-lib/doc/using_s3_transformers.md b/data-processing-lib/doc/using_s3_transformers.md
index 3c671c3c99..5ea30049b0 100644
--- a/data-processing-lib/doc/using_s3_transformers.md
+++ b/data-processing-lib/doc/using_s3_transformers.md
@@ -80,6 +80,8 @@ mc cp --recursive universal/tokenization/test-data/ds02/input/ local/test/tokeni
 *Note*, that once the data is copied, Minio is storing it on the local file system, so you do not need to copy it again after cluster restart
+## Creating access and secret key for Minio access
+
 The last thing is to add Minio access and secret keys for accessing it.
 The following command:
 ```shell
diff --git a/data-processing-lib/src/data_processing/__init__.py b/data-processing-lib/src/data_processing/__init__.py
deleted file mode 100644
index e69de29bb2..0000000000
diff --git a/transforms/code/code_quality/README.md b/transforms/code/code_quality/README.md
index 9b66d84bd0..c049cd53ee 100644
--- a/transforms/code/code_quality/README.md
+++ b/transforms/code/code_quality/README.md
@@ -76,4 +76,7 @@ the following command line arguments are available in addition to
 * "--tokenizer" - input a tokenizer to convert the data into tokens. The default tokenizer is `codeparrot/codeparrot`
 * "--hf_token" - input the Hugging Face auth token to download the tokenizer. This option is only required for the tokenizer's whose access is restricted in Hugging Face.
+## Executing S3 examples
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/code/malware/README.md b/transforms/code/malware/README.md
index a152f4f9c8..c2380e8797 100644
--- a/transforms/code/malware/README.md
+++ b/transforms/code/malware/README.md
@@ -148,3 +148,8 @@ the following command line arguments are available in addition to
 --malware_output_column MALWARE_OUTPUT_COLUMN output column name
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/code/proglang_select/README.md b/transforms/code/proglang_select/README.md
index ac1517299d..e5b5cfe69a 100644
--- a/transforms/code/proglang_select/README.md
+++ b/transforms/code/proglang_select/README.md
@@ -61,3 +61,10 @@ the following command line arguments are available in addition to
 secret_key: secret key help text
 url: optional s3 url
 region: optional s3 region```
+```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
+
diff --git a/transforms/universal/doc_id/Readme.md b/transforms/universal/doc_id/Readme.md
index bdffeb58e9..6a8f075b8b 100644
--- a/transforms/universal/doc_id/Readme.md
+++ b/transforms/universal/doc_id/Readme.md
@@ -91,3 +91,8 @@ the following command line arguments are available in addition to
 Compute unique integer id and place in the given named column
 ```
 These correspond to the configuration keys described above.
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/ededup/Readme.md b/transforms/universal/ededup/Readme.md
index c65fdab686..1a65c702d5 100644
--- a/transforms/universal/ededup/Readme.md
+++ b/transforms/universal/ededup/Readme.md
@@ -102,4 +102,8 @@ the following command line arguments are available in addition to
 These correspond to the configuration keys described above.
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/fdedup/Readme.md b/transforms/universal/fdedup/Readme.md
index 32615004ed..304cec8089 100644
--- a/transforms/universal/fdedup/Readme.md
+++ b/transforms/universal/fdedup/Readme.md
@@ -210,3 +210,8 @@ the following command line arguments are available in addition to
 ```
 These correspond to the configuration keys described above.
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/filter/README.md b/transforms/universal/filter/README.md
index 6fdbc6ea2f..5203c13dd6 100644
--- a/transforms/universal/filter/README.md
+++ b/transforms/universal/filter/README.md
@@ -258,3 +258,9 @@ the following command line arguments are available in addition to
 logical operator (AND or OR) that joins filter criteria
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
+
diff --git a/transforms/universal/noop/README.md b/transforms/universal/noop/README.md
index ed383046b9..6b98856fb4 100644
--- a/transforms/universal/noop/README.md
+++ b/transforms/universal/noop/README.md
@@ -63,6 +63,10 @@ In addition, there are some useful `make` targets (see conventions above):
 * `make help` - displays the available `make` targets and help text.
+## Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/tokenization/README.md b/transforms/universal/tokenization/README.md
index 02f0dff905..5460d5250e 100644
--- a/transforms/universal/tokenization/README.md
+++ b/transforms/universal/tokenization/README.md
@@ -97,3 +97,8 @@ the following command line arguments are available in addition to
 --tkn_chunk_size TKN_CHUNK_SIZE Specify >0 value to tokenize each row/doc in chunks of characters (rounded in words)
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example