diff --git a/data-processing-lib/doc/using_s3_transformers.md b/data-processing-lib/doc/using_s3_transformers.md
index 3c671c3c9..5ea30049b 100644
--- a/data-processing-lib/doc/using_s3_transformers.md
+++ b/data-processing-lib/doc/using_s3_transformers.md
@@ -80,6 +80,8 @@ mc cp --recursive universal/tokenization/test-data/ds02/input/ local/test/tokeni
 *Note*, that once the data is copied, Minio is storing it on the local file system, so you do not need to copy it again after cluster restart
+## Creating access and secret key for Minio access
+
 The last thing is to add Minio access and secret keys for accessing it. The following command:
 ```shell
diff --git a/data-processing-lib/src/data_processing/__init__.py b/data-processing-lib/src/data_processing/__init__.py
deleted file mode 100644
index e69de29bb..000000000
diff --git a/transforms/code/code_quality/README.md b/transforms/code/code_quality/README.md
index 9b66d84bd..c049cd53e 100644
--- a/transforms/code/code_quality/README.md
+++ b/transforms/code/code_quality/README.md
@@ -76,4 +76,7 @@ the following command line arguments are available in addition to
 * "--tokenizer" - input a tokenizer to convert the data into tokens. The default tokenizer is `codeparrot/codeparrot`
 * "--hf_token" - input the Hugging Face auth token to download the tokenizer. This option is only required for the tokenizer's whose access is restricted in Hugging Face.
+## Executing S3 examples
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/code/malware/README.md b/transforms/code/malware/README.md
index a152f4f9c..c2380e879 100644
--- a/transforms/code/malware/README.md
+++ b/transforms/code/malware/README.md
@@ -148,3 +148,8 @@ the following command line arguments are available in addition to
   --malware_output_column MALWARE_OUTPUT_COLUMN
                         output column name
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/code/proglang_select/README.md b/transforms/code/proglang_select/README.md
index ac1517299..e5b5cfe69 100644
--- a/transforms/code/proglang_select/README.md
+++ b/transforms/code/proglang_select/README.md
@@ -61,3 +61,10 @@ the following command line arguments are available in addition to
 secret_key: secret key help text
 url: optional s3 url
 region: optional s3 region```
+```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
+
diff --git a/transforms/universal/doc_id/Readme.md b/transforms/universal/doc_id/Readme.md
index bdffeb58e..6a8f075b8 100644
--- a/transforms/universal/doc_id/Readme.md
+++ b/transforms/universal/doc_id/Readme.md
@@ -91,3 +91,8 @@ the following command line arguments are available in addition to
 Compute unique integer id and place in the given named column
 ```
 These correspond to the configuration keys described above.
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/ededup/Readme.md b/transforms/universal/ededup/Readme.md
index c65fdab68..1a65c702d 100644
--- a/transforms/universal/ededup/Readme.md
+++ b/transforms/universal/ededup/Readme.md
@@ -102,4 +102,8 @@ the following command line arguments are available in addition to
 These correspond to the configuration keys described above.
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/fdedup/Readme.md b/transforms/universal/fdedup/Readme.md
index 32615004e..304cec808 100644
--- a/transforms/universal/fdedup/Readme.md
+++ b/transforms/universal/fdedup/Readme.md
@@ -210,3 +210,8 @@ the following command line arguments are available in addition to
 ```
 These correspond to the configuration keys described above.
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/filter/README.md b/transforms/universal/filter/README.md
index 6fdbc6ea2..5203c13dd 100644
--- a/transforms/universal/filter/README.md
+++ b/transforms/universal/filter/README.md
@@ -258,3 +258,9 @@ the following command line arguments are available in addition to
 logical operator (AND or OR) that joins filter criteria
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
+
diff --git a/transforms/universal/noop/README.md b/transforms/universal/noop/README.md
index ed383046b..6b98856fb 100644
--- a/transforms/universal/noop/README.md
+++ b/transforms/universal/noop/README.md
@@ -63,6 +63,10 @@ In addition, there are some useful `make` targets (see conventions above):
 * `make help` - displays the available `make` targets and help text.
+## Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example
diff --git a/transforms/universal/tokenization/README.md b/transforms/universal/tokenization/README.md
index 02f0dff90..5460d5250 100644
--- a/transforms/universal/tokenization/README.md
+++ b/transforms/universal/tokenization/README.md
@@ -97,3 +97,8 @@ the following command line arguments are available in addition to
   --tkn_chunk_size TKN_CHUNK_SIZE
                         Specify >0 value to tokenize each row/doc in chunks of characters (rounded in words)
 ```
+
+### Executing S3 examples
+
+To execute S3 examples, please refer to this [document](../../../data-processing-lib/doc/using_s3_transformers.md)
+for setting up MinIO and mc prior to running the example