Merge branch 'resolve-issue-patchflow' into generatereadme-QueryEmbeddingsresolve-issue-patchflow
codelion authored Apr 29, 2024
2 parents 77e0c32 + 7b2702b commit daff306
Showing 4 changed files with 62 additions and 3 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/test.yml
@@ -90,7 +90,5 @@ jobs:
- name: Propose relevant file to issues
run: |
poetry run patchwork ResolveIssue --log debug \
--openai_api_key=${{ secrets.OPENAI_KEY }} \
--github_api_key=${{ secrets.SCM_GITHUB_KEY }} \
--issue_url=https://github.com/patched-codes/patchwork/issues/20
--openai_embedding_model=text-embedding-3-small
--issue_url=https://github.com/patched-codes/patchwork/issues/20
26 changes: 26 additions & 0 deletions patchwork/steps/GenerateCodeRepositoryEmbeddings/README.md
@@ -0,0 +1,26 @@
# GenerateCodeRepositoryEmbeddings

## Inputs

- Inputs are passed as a dictionary to the `__init__` method of the `GenerateCodeRepositoryEmbeddings` class, which checks that the required keys are present.

## Outputs
- The `run` method of the `GenerateCodeRepositoryEmbeddings` class returns a dictionary containing the results of generating embeddings for a code repository.

## Description
- The step generates embeddings for the code files in a repository.
- It uses the `git` Python package to interact with the Git repository that holds the code files.
- It keeps only file types on a whitelist and skips directories on a blacklist.
- The `hash_text` function computes a SHA-1 hash of a file's text content.
- The `GenerateCodeRepositoryEmbeddings` class drives the process: it fetches the code files, reads their content, hashes it, and stores the embeddings and related metadata in a ChromaDB database.
- The results are then passed to the `GenerateEmbeddings` step for further processing.
- The `filter_files` function takes an iterable of file paths and drops files that sit under blacklisted directories.
- The `batch` function slices an iterable into batches of a fixed size.
- The `GenerateCodeRepositoryEmbeddings` class is a step: its constructor checks the input dictionary for required keys and initializes a client, and its `run` method processes the repository files, skips ignored ones, and finally runs a separate `GenerateEmbeddings` step with the updated inputs.
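The helper functions described above can be sketched in Python as follows. This is a minimal illustration, not the step's actual code: the exact signatures, the example extension whitelist, and the example directory blacklist are all hypothetical.

```python
import hashlib
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List

# Hypothetical example lists; the real step defines its own whitelist and blacklist.
EXTENSION_WHITELIST = {".py", ".js", ".ts", ".java"}
DIRECTORY_BLACKLIST = {".git", "node_modules", "__pycache__"}


def filter_files(paths: Iterable[str]) -> Iterator[str]:
    """Yield paths with a whitelisted extension that are not under a blacklisted directory."""
    for path in paths:
        p = Path(path)
        if p.suffix not in EXTENSION_WHITELIST:
            continue
        # Any blacklisted directory component anywhere in the path excludes the file.
        if any(part in DIRECTORY_BLACKLIST for part in p.parts[:-1]):
            continue
        yield path


def batch(iterable: Iterable, n: int) -> Iterator[List]:
    """Slice an iterable into lists of at most n items (the last one may be shorter)."""
    it = iter(iterable)
    while chunk := list(islice(it, n)):
        yield chunk


def hash_text(text: str) -> str:
    """Return the hex SHA-1 digest of a UTF-8 encoded string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


files = ["src/app.py", "node_modules/lib/index.js", "README.md", "pkg/util.ts"]
print(list(filter_files(files)))  # ['src/app.py', 'pkg/util.ts']
print(list(batch(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Storing the SHA-1 hash next to each embedding would let a later run recognize files whose content has not changed; whether the step actually uses the hash that way is an assumption of this sketch.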

19 changes: 19 additions & 0 deletions patchwork/steps/GenerateEmbeddings/README.md
@@ -0,0 +1,19 @@
## `patchwork/steps/GenerateEmbeddings/GenerateEmbeddings.py`

### Inputs:
- An `inputs` dictionary with the keys `"embedding_name"` and `"documents"`.

### Code:
- Defines a `filter_by_extension` function that filters files by extension.
- Defines a `split_text` function that splits text into chunks according to the given parameters.
- Defines the class `GenerateEmbeddings`, which inherits from `Step`:
  - Checks that the required keys are present in the input dictionary.
  - Initializes the step with the input data and sets up a client connection to a vector database.
  - Runs the step: processes the documents and embeddings, splits document texts into chunks where needed, and upserts the data into the vector database.
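A minimal sketch of the two helpers, for illustration only. It assumes that `split_text` splits on a fixed character count with an optional overlap between consecutive chunks, and that `filter_by_extension` receives the allowed extensions as an argument; the real functions' parameters are not documented here, so both signatures are hypothetical.

```python
import os
from typing import Iterable, List


def filter_by_extension(paths: Iterable[str], extensions: Iterable[str]) -> List[str]:
    """Keep only paths whose extension (e.g. '.py') is in `extensions`, case-insensitively."""
    allowed = {ext.lower() for ext in extensions}
    return [p for p in paths if os.path.splitext(p)[1].lower() in allowed]


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so text cut at a chunk
    boundary keeps some of its surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Overlap trades a little duplicated storage for context at chunk boundaries; `overlap=0` yields disjoint chunks.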

### Outputs:
- Returns an empty dictionary.

## `patchwork/steps/GenerateEmbeddings/__init__.py`

- Empty file.
16 changes: 16 additions & 0 deletions patchwork/steps/ReadIssues/README.md
@@ -0,0 +1,16 @@
## Patchwork ReadIssues Module

### Inputs
- The `ReadIssues` class in `ReadIssues.py` expects an `inputs` dictionary with the following keys:
  - `issue_url`: URL of the issue to read.
  - `github_api_key`: GitHub API key (optional; provide it when working with GitHub issues).
  - `gitlab_api_key`: GitLab API key (optional; provide it when working with GitLab issues).
  - `scm_url`: URL of the source code management platform.

### Outputs
- The `ReadIssues` class provides a `run` method that returns a dictionary containing the issue text associated with the provided `issue_url`.

### Usage
- The `ReadIssues` class reads issues from a source code management platform (GitHub or GitLab) using the provided API keys and URL.
- It checks that the required input keys are present, selects the SCM client that matches the supplied API key, sets the SCM URL, and retrieves the text of the issue at `issue_url`.
- To use it, instantiate `ReadIssues` with these inputs and call its `run` method to obtain the issue text.
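The client-selection logic described above could look roughly like the sketch below. The `GithubClient` and `GitlabClient` classes and the `select_scm_client` function are hypothetical stand-ins, not the module's actual API; treating `scm_url` as an optional override and giving GitHub precedence when both keys are supplied are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class GithubClient:  # hypothetical stand-in for the real GitHub client
    api_key: str
    url: Optional[str] = None


@dataclass
class GitlabClient:  # hypothetical stand-in for the real GitLab client
    api_key: str
    url: Optional[str] = None


def select_scm_client(inputs: dict) -> Union[GithubClient, GitlabClient]:
    """Pick an SCM client based on which API key is present in `inputs`."""
    if "issue_url" not in inputs:
        raise ValueError("Missing required input: issue_url")
    # Assumption: GitHub takes precedence when both keys are supplied.
    if inputs.get("github_api_key"):
        client = GithubClient(inputs["github_api_key"])
    elif inputs.get("gitlab_api_key"):
        client = GitlabClient(inputs["gitlab_api_key"])
    else:
        raise ValueError("Provide either github_api_key or gitlab_api_key")
    # Assumption: scm_url, when given, points the client at a specific SCM host.
    if inputs.get("scm_url"):
        client.url = inputs["scm_url"]
    return client
```

Failing early when neither key is present mirrors the required-key check and surfaces a configuration error before any network call is made.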
