Add hyperlinks and paths validation. (#699)
* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix format issue.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change runs-on

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix format issue.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change runs-on

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change link head.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix issue.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Add output.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change search rules.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change output and fix error

Signed-off-by: ZePan110 <ze.pan@intel.com>

* For test

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix error

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix error.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix error.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* test.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix issue and add output

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Fix issue and test

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Add PR's own detection.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* reduce output

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Remove debug code.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* test

Signed-off-by: ZePan110 <ze.pan@intel.com>

* test.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Compatible with the origin of PR.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Ignore links that require verification by a real person.
Restore test files.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change the judgment method.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Add need ignore link.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Change runs-on.

Signed-off-by: ZePan110 <ze.pan@intel.com>

* Redefine output.

Signed-off-by: ZePan110 <ze.pan@intel.com>

---------

Signed-off-by: ZePan110 <ze.pan@intel.com>
ZePan110 authored Sep 19, 2024
1 parent e29865e commit ccdd2d0
Showing 5 changed files with 125 additions and 4 deletions.
121 changes: 121 additions & 0 deletions .github/workflows/pr-dockerfile-path-scan.yaml
@@ -156,3 +156,124 @@ jobs:
echo "Please modify the corresponding README in GenAIExamples repo and ask suyue.chen@intel.com for final confirmation."
exit 1
fi
  check-the-validity-of-hyperlinks-in-README:
    runs-on: ubuntu-latest
    steps:
      - name: Clean Up Working Directory
        run: sudo rm -rf ${{github.workspace}}/*

      - name: Checkout Repo GenAIComps
        uses: actions/checkout@v4

      - name: Check the Validity of Hyperlinks
        # ignore_links=("https://platform.openai.com/docs/api-reference/fine-tuning"
        #               "https://platform.openai.com/docs/api-reference/"
        #               "https://openai.com/index/whisper/"
        #               "https://platform.openai.com/docs/api-reference/chat/create")
        run: |
          cd ${{github.workspace}}
          fail="FALSE"
          url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
          if [ -n "$url_lines" ]; then
            for url_line in $url_lines; do
              url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
              path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
              if [[ "https://platform.openai.com/docs/api-reference/fine-tuning" == "$url" || "https://platform.openai.com/docs/api-reference/" == "$url" || "https://openai.com/index/whisper/" == "$url" || "https://platform.openai.com/docs/api-reference/chat/create" == "$url" ]]; then
                echo "Link "$url" from ${{github.workspace}}/$path need to be verified by a real person."
              else
                response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
                if [ "$response" -ne 200 ]; then
                  echo "**********Validation failed, try again**********"
                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
                  if [ "$response_retry" -eq 200 ]; then
                    echo "*****Retry successfully*****"
                  else
                    echo "Invalid link from ${{github.workspace}}/$path: $url"
                    fail="TRUE"
                  fi
                fi
              fi
            done
          fi
          if [[ "$fail" == "TRUE" ]]; then
            exit 1
          else
            echo "All hyperlinks are valid."
          fi
        shell: bash
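
The hyperlink job can be approximated outside CI with a small script. The following is a minimal sketch under stated assumptions, not the workflow's code: the helper names `list_urls`, `is_ignored`, `url_ok`, and `check_tree` are hypothetical, the 20-second `--max-time` is an assumed value, and both attempts follow redirects with `-L`, whereas the workflow's retry omits it. Reading matches line by line keeps URLs or file paths that contain spaces intact, which a `for ... in $url_lines` loop would split.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative and do not appear in the workflow.

# Print "file<TAB>url" for each absolute Markdown link under directory $1.
list_urls() {
  grep -rEo --include='*.md' '\]\(https?://[^)]+\)' "$1" |
    while IFS= read -r line; do
      file="${line%%:*}"   # grep prefixes each match with "file:"
      url="$(printf '%s\n' "$line" | cut -d '(' -f2 | cut -d ')' -f1 | sed 's/\.git$//')"
      printf '%s\t%s\n' "$file" "$url"
    done
}

# Succeed for the four links the workflow reports instead of fetching.
is_ignored() {
  case "$1" in
    https://platform.openai.com/docs/api-reference/fine-tuning) return 0 ;;
    https://platform.openai.com/docs/api-reference/) return 0 ;;
    https://openai.com/index/whisper/) return 0 ;;
    https://platform.openai.com/docs/api-reference/chat/create) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed when the URL answers 200, trying a second time before giving up.
# --max-time 20 is an assumed timeout, not a value taken from the workflow.
url_ok() {
  for attempt in 1 2; do
    code="$(curl -L -s -o /dev/null -w '%{http_code}' --max-time 20 "$1")"
    [ "$code" = "200" ] && return 0
  done
  return 1
}

# Check every link under directory $1; return 1 if any link is invalid.
check_tree() {
  fail=0
  tab="$(printf '\t')"
  while IFS="$tab" read -r file url; do
    [ -n "$url" ] || continue   # an empty tree yields one blank line
    if is_ignored "$url"; then
      echo "Link $url from $file needs to be verified by a real person."
    elif ! url_ok "$url"; then
      echo "Invalid link from $file: $url"
      fail=1
    fi
  done <<EOF
$(list_urls "$1")
EOF
  return "$fail"
}
```

From a checkout of the repository, `check_tree . || exit 1` would mirror the job's pass/fail behavior.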

  check-the-validity-of-relative-path:
    runs-on: ubuntu-latest
    steps:
      - name: Clean up Working Directory
        run: sudo rm -rf ${{github.workspace}}/*

      - name: Checkout Repo GenAIComps
        uses: actions/checkout@v4

      - name: Checking Relative Path Validity
        run: |
          cd ${{github.workspace}}
          fail="FALSE"
          repo_name=${{ github.event.pull_request.head.repo.full_name }}
          if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
            owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
            branch="https://github.com/$owner/GenAIComps/tree/${{ github.event.pull_request.head.ref }}"
          else
            branch="https://github.com/opea-project/GenAIComps/blob/${{ github.event.pull_request.head.ref }}"
          fi
          link_head="https://github.com/opea-project/GenAIComps/blob/main"
          png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
          if [ -n "$png_lines" ]; then
            for png_line in $png_lines; do
              refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
              png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
              if [[ "${png_path:0:1}" == "/" ]]; then
                check_path=${{github.workspace}}$png_path
              elif [[ "${png_path:0:1}" == "#" ]]; then
                check_path=${{github.workspace}}/$refer_path$png_path
              else
                check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
              fi
              real_path=$(realpath $check_path)
              if [ $? -ne 0 ]; then
                echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
                fail="TRUE"
              else
                url=$link_head$(echo "$real_path" | sed 's|.*/GenAIComps||')
                response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
                if [ "$response" -ne 200 ]; then
                  echo "**********Validation failed, try again**********"
                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
                  if [ "$response_retry" -eq 200 ]; then
                    echo "*****Retry successfully*****"
                  else
                    echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
                    url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIComps||')
                    response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
                    if [ "$response" -ne 200 ]; then
                      echo "**********Validation failed, try again**********"
                      response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
                      if [ "$response_retry" -eq 200 ]; then
                        echo "*****Retry successfully*****"
                      else
                        echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
                        fail="TRUE"
                      fi
                    else
                      echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
                    fi
                  fi
                fi
              fi
            done
          fi
          if [[ "$fail" == "TRUE" ]]; then
            exit 1
          else
            echo "All hyperlinks are valid."
          fi
        shell: bash
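
The path-resolution step of the relative-path job handles three cases: targets starting with `/`, targets starting with `#`, and everything else. The following is a minimal sketch of that step, run locally without the GitHub URL checks. It is an illustration under stated assumptions, not the workflow's code: `resolve_target` and `target_exists` are hypothetical names, and unlike the workflow it strips any `#fragment` before testing, so only the file's existence is checked.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_target and target_exists are illustrative names.

# Map a link target ($3) found in Markdown file $2 to a path on disk,
# where $1 is the repository root.
resolve_target() {
  local root="$1" file="$2" target="$3"
  local path="${target%%#*}"   # drop any "#fragment" (differs from the workflow)
  case "$target" in
    /*)   printf '%s%s\n' "$root" "$path" ;;                 # repo-absolute link
    '#'*) printf '%s\n' "$file" ;;                           # anchor in the same file
    *)    printf '%s/%s\n' "$(dirname "$file")" "$path" ;;   # relative to the file
  esac
}

# Succeed when the resolved target exists on disk.
target_exists() {
  [ -e "$(resolve_target "$1" "$2" "$3")" ]
}
```

For `comps/dataprep/README.md`, the corrected target `../lvms/llava/README.md` resolves to `comps/dataprep/../lvms/llava/README.md`, so `target_exists` succeeds only if that file is present in the checkout.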
2 changes: 1 addition & 1 deletion comps/dataprep/README.md
@@ -11,7 +11,7 @@ apt-get install libreoffice

## Use LVM (Large Vision Model) for Summarizing Image Data

-Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
+Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/llava/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.

```bash
export SUMMARIZE_IMAGE_VIA_LVM=1
2 changes: 1 addition & 1 deletion comps/finetuning/README.md
@@ -219,7 +219,7 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ

### 3.4 Leverage fine-tuned model

-After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
+After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/fastrag/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.

## 🚀4. Descriptions for Finetuning parameters

2 changes: 1 addition & 1 deletion comps/guardrails/llama_guard/langchain/README.md
@@ -51,7 +51,7 @@ curl 127.0.0.1:8088/generate \

### 1.4 Start Guardrails Service

-Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](## 1.2 Start TGI Gaudi Service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
+Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](#12-start-tgi-gaudi-service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:

```bash
export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
2 changes: 1 addition & 1 deletion comps/vectorstores/pathway/README.md
@@ -3,7 +3,7 @@
Set the environment variables for Pathway, and the embedding model.

> Note: If you are using `TEI_EMBEDDING_ENDPOINT`, make sure embedding service is already running.
-> See the instructions under [here](../../../retrievers/langchain/pathway/README.md)
+> See the instructions under [here](../../retrievers/pathway/langchain/README.md)
```bash
export PATHWAY_HOST=0.0.0.0
