Merge branch 'resolve-issue-patchflow' into generatereadme-GenerateCodeRepositoryEmbeddingsresolve-issue-patchflow
codelion authored Apr 29, 2024
2 parents 8f9b6d1 + 21cac9d commit 42435f2
Showing 5 changed files with 61 additions and 4 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/test.yml
@@ -90,7 +90,5 @@ jobs:
- name: Propose relevant file to issues
run: |
poetry run patchwork ResolveIssue --log debug \
--openai_api_key=${{ secrets.OPENAI_KEY }} \
--github_api_key=${{ secrets.SCM_GITHUB_KEY }} \
--issue_url=https://github.com/patched-codes/patchwork/issues/20
--openai_embedding_model=text-embedding-3-small
--issue_url=https://github.com/patched-codes/patchwork/issues/20
12 changes: 11 additions & 1 deletion patchwork/steps/GenerateCodeRepositoryEmbeddings/README.md
@@ -1,6 +1,7 @@
# Code Documentation

## Inputs

- The code takes inputs in the form of a dictionary passed to the `__init__` method of the `GenerateCodeRepositoryEmbeddings` class.

## Outputs
@@ -13,4 +14,13 @@
- The `hash_text` function generates a SHA-1 hash for the text content of code files.
- The `GenerateCodeRepositoryEmbeddings` class manages the process of generating embeddings for the code repository.
- It fetches code files, reads their content, generates hashes, and interacts with the ChromaDB database to store embeddings and related metadata.
- The results are then passed to the `GenerateEmbeddings` class for further processing.
- The results are then passed to the `GenerateEmbeddings` class for further processing.
=======
- The code provides a function `filter_files` that takes an iterable of file paths and filters out files based on directory blacklists.
- The code includes a function `batch` that slices an iterable into batches of a specific size.
- It contains a function `hash_text` that hashes a text string using SHA-1.
- The `GenerateCodeRepositoryEmbeddings` class is a step class that requires certain keys in the input dictionary, initializes a client, and defines a `run` method that generates code repository embeddings.
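The three helpers listed above could look roughly like the following. This is a minimal sketch rather than the actual patchwork code: the signatures, the `blacklisted_dirs` parameter, and the choice to compare only directory components are assumptions made for illustration.

```python
import hashlib
from itertools import islice
from pathlib import PurePath
from typing import Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T")


def filter_files(paths: Iterable[str], blacklisted_dirs: Set[str]) -> Iterator[str]:
    """Yield paths that do not sit under any blacklisted directory (assumed behavior)."""
    for path in paths:
        # Compare only the directory components, not the file name itself.
        if blacklisted_dirs.isdisjoint(PurePath(path).parts[:-1]):
            yield path


def batch(iterable: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def hash_text(text: str) -> str:
    """Return the hexadecimal SHA-1 digest of `text`, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

Hashing file content this way gives each file a stable identifier, which makes it possible to detect unchanged files before recomputing embeddings.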

## Outputs
- The `GenerateCodeRepositoryEmbeddings` class generates embeddings for code repositories, processes files, handles ignored files, interacts with a database, and eventually runs a separate `GenerateEmbeddings` step with updated inputs.

19 changes: 19 additions & 0 deletions patchwork/steps/GenerateEmbeddings/README.md
@@ -0,0 +1,19 @@
## `patchwork/steps/GenerateEmbeddings/GenerateEmbeddings.py`

### Inputs:
- `inputs` dictionary with keys `"embedding_name"` and `"documents"`.

### Code:
- Defines `filter_by_extension` function to filter files by extension.
- Defines `split_text` function to chunk text based on given parameters.
- Defines the class `GenerateEmbeddings`, which inherits from `Step`.
- Checks for required keys in the input dictionary.
- Initializes the step with input data and sets up a client connection to a vector database.
- Runs the step by processing documents and embeddings, splitting document texts if needed, and upserting data into the vector database.
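As an illustration of the chunking that `split_text` is described as doing, here is a minimal sketch. The parameter names `chunk_size` and `overlap`, their defaults, and the character-based splitting are assumptions; the real function may split on different units.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split `text` into character chunks of at most `chunk_size`.

    Consecutive chunks share `overlap` characters (hypothetical parameters).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    # Each chunk starts `step` characters after the previous one.
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```

Overlapping chunks are a common way to avoid losing context at chunk boundaries when the pieces are embedded separately.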

### Outputs:
- Returns an empty dictionary.

## `patchwork/steps/GenerateEmbeddings/__init__.py`

- Empty file.
16 changes: 16 additions & 0 deletions patchwork/steps/QueryEmbeddings/README.md
@@ -0,0 +1,16 @@
## QueryEmbeddings.py

### Inputs:
- `inputs`: A dictionary containing keys `"embedding_name"` and `"texts"`, and optional keys `"top_k"` and `"token_limit"`.

### Outputs:
- `embedding_results`: A list of dictionaries containing document details and distances, sorted by distance.

### Code:
- Imports necessary modules from the project.
- Defines a class `QueryEmbeddings` inheriting from `Step`.
- Initializes the class with input data, identifies required keys, and sets up connection to a database.
- Executes a query on input texts, filters results based on token count and distance.
- Returns a sorted list of document details and distances based on the query results.
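The filtering and sorting described above can be sketched as follows. This is not the actual `QueryEmbeddings` implementation: the `select_results` helper, the result dictionary keys, the default values, and the whitespace-based token count are all assumptions for illustration.

```python
from typing import Any, Dict, List


def select_results(
    results: List[Dict[str, Any]], top_k: int = 10, token_limit: int = 4096
) -> List[Dict[str, Any]]:
    """Keep the closest results until the token budget is used up (hypothetical helper)."""
    # Smaller distance means a closer match, so sort ascending and cap at top_k.
    ranked = sorted(results, key=lambda r: r["distance"])[:top_k]
    selected: List[Dict[str, Any]] = []
    used_tokens = 0
    for result in ranked:
        # Crude token estimate: count whitespace-separated words (an assumption).
        tokens = len(result["document"].split())
        if used_tokens + tokens > token_limit:
            break
        selected.append(result)
        used_tokens += tokens
    return selected
```

A real implementation would likely count tokens with the tokenizer of the target model rather than by splitting on whitespace.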

This step is part of a larger project: it queries stored text embeddings and returns the most relevant documents for the given query texts.
14 changes: 14 additions & 0 deletions patchwork/steps/ReadIssues/README.md
@@ -0,0 +1,14 @@
## Documentation: ReadIssues Step

### Inputs
- **Required Keys:** `issue_url`
- **Optional Keys:** `github_api_key`, `gitlab_api_key`, `scm_url`
- **Parameters:**
  - `inputs`: A dictionary containing required and optional keys for configuring the step.

### Outputs
- **Returns:**
  - A dictionary with the issue text extracted from the provided issue URL.

### Description
The `ReadIssues` step is a class that extends `Step` and reads the issue text from a specified issue URL on GitHub or GitLab. It selects an SCM client based on the API key provided and uses it to access the issue. The step filters certain file extensions, such as those of images and documents, out of the issue. Its main responsibilities are initialization, input validation, retrieving the issue text, and returning the extracted text as output.
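The input check and client selection described above might be sketched like this. The `choose_scm` helper and its return values are hypothetical; only the key names `issue_url`, `github_api_key`, and `gitlab_api_key` come from this README.

```python
from typing import Any, Dict

REQUIRED_KEYS = {"issue_url"}


def choose_scm(inputs: Dict[str, Any]) -> str:
    """Validate `inputs` and report which SCM the API key points to (hypothetical helper)."""
    missing = REQUIRED_KEYS - inputs.keys()
    if missing:
        raise ValueError(f"Missing required keys: {sorted(missing)}")
    # Assumption: a GitHub key takes precedence when both keys are supplied.
    if "github_api_key" in inputs:
        return "github"
    if "gitlab_api_key" in inputs:
        return "gitlab"
    raise ValueError("Either github_api_key or gitlab_api_key must be provided")
```

For example, `choose_scm({"issue_url": "https://github.com/patched-codes/patchwork/issues/20", "github_api_key": "..."})` returns `"github"`.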
