Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt latest changes for data-prep #712

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

lianhao
Copy link
Collaborator

@lianhao lianhao commented Jan 16, 2025

Description

Adapt latest changes for data-prep.

Issues

n/a.

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

List the newly introduced 3rd party dependency if exists.

Tests

Describe the tests that you ran to verify your changes.

@lianhao
Copy link
Collaborator Author

lianhao commented Jan 16, 2025

We need dataprep related changes get merged into GenAIExamples first. Otherwise, there will be no image in CI

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few text suggestions, otherwise looks fine.

helm-charts/common/data-prep/README.md Outdated Show resolved Hide resolved
helm-charts/common/data-prep/milvus-values.yaml Outdated Show resolved Hide resolved
Comment on lines +31 to +32
initContainers:
- name: wait-for-db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't helm wait for the Chart be enough instead?

Or is this intended for cases where Helm is used just to generate manifest file?

Copy link
Collaborator Author

@lianhao lianhao Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helm wait is not enough. The dataprep code itself will create a redis/milvus client to connect to DB when it starts, if that fails, it will never retry to create the db client and hence fails all the incoming user requests later. This is actually a cloud native non-friendly bug in the dataprep/retriever source code itself. But we don't have time to fix that in this release cycle. Already created an issue in GenAIComps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. And I guess helm template manifest generation is valid use-case too...

@lianhao lianhao added this to the v1.2 milestone Jan 17, 2025
@lianhao lianhao force-pushed the data-prep branch 2 times, most recently from cee6420 to 7ed1a69 Compare January 17, 2025 08:20
@@ -38,21 +44,23 @@ Then run the command `kubectl port-forward svc/data-prep 6007:6007` to expose th
Open another terminal and run the following command to verify the service if working:

```console
curl http://localhost:6007/v1/dataprep \
curl http://localhost:6007/v1/dataprep/ingest \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataprep api has been changed for file uploading?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yes

Copy link
Contributor

@eero-t eero-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. I have suggestion about the wait variable name though.

TESTHOST=$(MILVUS_HOST);
TESTPORT=$(MILVUS_PORT);
{{- end }}
retry_count={{ .Values.retryCount | default 60 }};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO better name would be e.g. retryTimeoutSeconds, because that way there's no dependency on whether checks are done at 1s, 2s or 5s interval...

Copy link
Collaborator Author

@lianhao lianhao Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting we do a single nc -z -w retryTimeoutSeconds <hostname> <port> test, without the while loop? Sometime nc -z -w retryTimeoutSeconds will fail early, even before retryTimeoutSeconds time

Copy link
Contributor

@eero-t eero-t Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if that is not reliable, loop can still be made independent of sleep interval:

timeout={{ .Values.retryTimeoutSeconds | default 60 }};
j=1
while ! nc -z ${TESTHOST} ${TESTPORT}; do
    if [[ $j -ge ${timeout} ]] && echo "ERROR: ${TESTHOST}:${TESTPORT} is NOT reachable within $j seconds!" && exit 1;
    j=$((j+2)); sleep 2;
done;

EDIT: I think it's fine if few extra secs are waited when the last interval increase does not match the timeout exactly, and service is still unreacable.

Copy link
Collaborator Author

@lianhao lianhao Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjust the wait time code a little bit to be more time accurate as much as I could. In a very nasty network environment where a normal successful tcp 3-way handshakes takes more than 1 seconds(which is very rarely I assume), users can set waitTimeout to a larger value.

If this is ok, I'll apply this to all other helm chart with the same wait-fot-xxx logic.

Signed-off-by: Lianhao Lu <lianhao.lu@intel.com>
@lianhao
Copy link
Collaborator Author

lianhao commented Jan 18, 2025

Need to wait for opea-project/GenAIExamples#1408 to get merged in first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants