Merge branch 'resolve-issue-patchflow' into generatereadme-GenerateEm…
…beddingsresolve-issue-patchflow
codelion authored Apr 29, 2024
2 parents db896c9 + 94a4697 commit 8420c76
Showing 5 changed files with 66 additions and 4 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -90,7 +90,5 @@ jobs:
- name: Propose relevant file to issues
run: |
poetry run patchwork ResolveIssue --log debug \
--openai_api_key=${{ secrets.OPENAI_KEY }} \
--github_api_key=${{ secrets.SCM_GITHUB_KEY }} \
--issue_url=https://github.com/patched-codes/patchwork/issues/20
--openai_embedding_model=text-embedding-3-small
--issue_url=https://github.com/patched-codes/patchwork/issues/20
26 changes: 26 additions & 0 deletions patchwork/steps/GenerateCodeRepositoryEmbeddings/README.md
@@ -0,0 +1,26 @@
# Code Documentation

## Inputs

- The code takes inputs in the form of a dictionary passed to the `__init__` method of the `GenerateCodeRepositoryEmbeddings` class.

## Outputs
- The `run` method of the `GenerateCodeRepositoryEmbeddings` class returns a dictionary containing the results of generating embeddings for a code repository.

### Description
- The code is responsible for generating embeddings for code files in a code repository.
- It uses the `git` Python package to interact with the Git repository where the code files are stored.
- The code filters out specific file types based on a whitelist and ignores certain directories based on a blacklist.
- The `hash_text` function generates a SHA-1 hash for the text content of code files.
- The `GenerateCodeRepositoryEmbeddings` class manages the process of generating embeddings for the code repository.
- It fetches code files, reads their content, generates hashes, and interacts with the ChromaDB database to store embeddings and related metadata.
- The results are then passed to the `GenerateEmbeddings` class for further processing.
- The code provides a function `filter_files` that takes an iterable of file paths and filters out files based on directory blacklists.
- The code includes a function `batch` that slices an iterable into batches of a specific size.
- It contains a function `hash_text` that hashes a text string using SHA1.
- The `GenerateCodeRepositoryEmbeddings` class is a step class that requires certain keys in the input dictionary, initializes a client, and defines a `run` method that generates code repository embeddings.
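
The helper functions described above could look roughly like the following. This is a minimal sketch, not the repository's actual code: only the names `filter_files`, `batch`, and `hash_text` and their described behaviour come from the README, while the signatures, the example blacklist, and the use of `pathlib` are assumptions.

```python
import hashlib
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List

# Hypothetical blacklist; the real step defines its own directories.
DIRECTORY_BLACKLIST = {".git", "node_modules", "__pycache__"}


def filter_files(paths: Iterable[str], blacklist=DIRECTORY_BLACKLIST) -> Iterator[str]:
    """Yield only the paths that do not pass through a blacklisted directory."""
    for path in paths:
        # parts[:-1] are the directory components; the last part is the file name.
        if not any(part in blacklist for part in Path(path).parts[:-1]):
            yield path


def batch(items: Iterable, size: int) -> Iterator[List]:
    """Slice an iterable into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def hash_text(text: str) -> str:
    """Return the hexadecimal SHA-1 digest of a text string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

In a pipeline like the one described, the hash would typically serve as a stable identifier for a file's content, and batching would keep each write to the vector database bounded in size. Both uses are assumptions here.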

## Outputs
- The `GenerateCodeRepositoryEmbeddings` class generates embeddings for code repositories, processes files, handles ignored files, interacts with a database, and eventually runs a separate `GenerateEmbeddings` step with updated inputs.

1 change: 0 additions & 1 deletion patchwork/steps/GenerateEmbeddings/README.md
@@ -1,4 +1,3 @@

# Summary of `GenerateEmbeddings.py`

## Inputs:
23 changes: 23 additions & 0 deletions patchwork/steps/QueryEmbeddings/README.md
@@ -0,0 +1,23 @@
# Documentation for `QueryEmbeddings.py`

## Inputs
- Module imports:
  - `chromadb`
  - `get_embedding_function` from `patchwork.common.utils`
  - `get_vector_db_path` from `patchwork.common.utils`
- Classes:
  - `QueryEmbeddings` (extending `Step` class)
- Attributes:
  - `required_keys` set to `{"embedding_name", "texts"}`
- Methods:
  - `__init__` method taking `inputs` dict as a parameter to initialize the class instance.
  - `run` method to execute the functionality of querying embeddings.
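
The `required_keys` check described above can be sketched as follows. The `Step` base class here is a simplified stand-in written for illustration, not patchwork's actual `Step`. The error type, the `SketchQueryEmbeddings` name, and the placeholder `run` body are all assumptions.

```python
class Step:
    """Simplified stand-in for a patchwork step (assumption, not the real base class)."""

    required_keys: set = set()

    def __init__(self, inputs: dict):
        # Fail early if the caller forgot a required input.
        missing = self.required_keys - set(inputs)
        if missing:
            raise ValueError(f"Missing required keys: {sorted(missing)}")
        self.inputs = inputs

    def run(self) -> dict:
        raise NotImplementedError


class SketchQueryEmbeddings(Step):
    # These two keys are the ones listed in the README.
    required_keys = {"embedding_name", "texts"}

    def run(self) -> dict:
        # Placeholder: the real step queries the vector database here.
        return {"embedding_results": []}
```

Declaring the keys as a class attribute lets the validation live once in the base class instead of being repeated in every step.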

## Outputs
- A dictionary containing the results for the queried texts. The output includes:
  - `embedding_results`: a list of embedding results, sorted by distance.

## Usage
The `QueryEmbeddings` class is designed to query embeddings for a list of texts using a specified embedding function and a given embedding name.
- Users can pass the required inputs to the `__init__` method to create an instance of `QueryEmbeddings`.
- The `run` method executes the query process and returns the dictionary with the embedding results sorted by distance, respecting the specified token limit.
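
The sorting and truncation described above can be illustrated with plain Python. This is a minimal sketch under stated assumptions. The input mimics the nested-list layout that a ChromaDB `query` call returns (`ids`, `documents`, `distances`, with one inner list per query text). The helper name `sort_and_limit` and the `token_limit` parameter are hypothetical. Tokens are approximated by whitespace-separated words rather than by a real tokenizer.

```python
from typing import Dict, List


def sort_and_limit(query_result: Dict[str, List[list]], token_limit: int) -> Dict[str, list]:
    """Flatten per-query results, sort by distance, keep results within a token budget."""
    flat = []
    for ids, docs, dists in zip(
        query_result["ids"], query_result["documents"], query_result["distances"]
    ):
        for id_, doc, dist in zip(ids, docs, dists):
            flat.append({"id": id_, "document": doc, "distance": dist})

    flat.sort(key=lambda item: item["distance"])  # closest matches first

    kept, used = [], 0
    for item in flat:
        cost = len(item["document"].split())  # crude token estimate (assumption)
        if used + cost > token_limit:
            break
        kept.append(item)
        used += cost
    return {"embedding_results": kept}
```

Stopping at the first result that exceeds the budget keeps the output a strict prefix of the distance-sorted list. Skipping oversized results and continuing would be an equally valid policy.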
16 changes: 16 additions & 0 deletions patchwork/steps/ReadIssues/README.md
@@ -0,0 +1,16 @@
## Patchwork ReadIssues Module

### Inputs
- The `ReadIssues` class in `ReadIssues.py` expects a dictionary `inputs` containing the following keys:
  - `issue_url`: URL of the issue
  - `github_api_key`: API key for GitHub (optional, can be provided if working with GitHub issues)
  - `gitlab_api_key`: API key for GitLab (optional, can be provided if working with GitLab issues)
  - `scm_url`: URL of the source code management platform

### Outputs
- The `ReadIssues` class provides a `run` method that returns a dictionary containing the issue text associated with the provided `issue_url`.

### Usage
- The `ReadIssues` class reads issues from a source code management platform (GitHub or GitLab) using the provided API keys and URL.
- It ensures the required input keys are present, selects the appropriate SCM client based on the provided API key, sets the SCM URL, and retrieves the issue text based on the provided issue URL.
- Users can instantiate the `ReadIssues` class with the necessary inputs and then call the `run` method to obtain the issue text.
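
The input validation and client selection described above might be sketched like this. Everything here is illustrative: `select_scm`, the returned tuple, and the default URLs are assumptions, and the real step constructs SCM client objects rather than returning strings.

```python
from typing import Tuple

# Assumed defaults for illustration only.
DEFAULT_URLS = {"github": "https://github.com", "gitlab": "https://gitlab.com"}


def select_scm(inputs: dict) -> Tuple[str, str, str]:
    """Return (platform, api_key, scm_url) chosen from the step inputs."""
    if "issue_url" not in inputs:
        raise ValueError("Missing required key: issue_url")

    # Pick the platform from whichever API key was supplied.
    if "github_api_key" in inputs:
        platform, api_key = "github", inputs["github_api_key"]
    elif "gitlab_api_key" in inputs:
        platform, api_key = "gitlab", inputs["gitlab_api_key"]
    else:
        raise ValueError("Provide either github_api_key or gitlab_api_key")

    # An explicit scm_url overrides the platform default (e.g. self-hosted instances).
    scm_url = inputs.get("scm_url", DEFAULT_URLS[platform])
    return platform, api_key, scm_url
```

For example, `select_scm({"issue_url": "...", "github_api_key": "..."})` would pick the GitHub branch in this sketch.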
