Merge branch 'resolve-issue-patchflow' into generatereadme-QueryEmbed…
…dingsresolve-issue-patchflow
Showing 4 changed files with 62 additions and 3 deletions.
26 changes: 26 additions & 0 deletions
patchwork/steps/GenerateCodeRepositoryEmbeddings/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Code Documentation | ||
|
||
## Inputs | ||
|
||
- The code takes inputs in the form of a dictionary passed to the `__init__` method of the `GenerateCodeRepositoryEmbeddings` class. | ||
|
||
## Outputs | ||
- The `run` method of the `GenerateCodeRepositoryEmbeddings` class returns a dictionary containing the results of generating embeddings for a code repository. | ||
|
||
## Description | ||
- The code is responsible for generating embeddings for code files in a code repository. | ||
- It uses the `git` Python package to interact with the Git repository where the code files are stored. | ||
- The code keeps only files whose types are on a whitelist and skips directories that are on a blacklist. | ||
- The `hash_text` function generates a SHA-1 hash for the text content of code files. | ||
- The `GenerateCodeRepositoryEmbeddings` class manages the process of generating embeddings for the code repository. | ||
- It fetches code files, reads their content, generates hashes, and interacts with the ChromaDB database to store embeddings and related metadata. | ||
- The results are then passed to the `GenerateEmbeddings` class for further processing. | ||
- The code provides a function `filter_files` that takes an iterable of file paths and filters out files based on directory blacklists. | ||
- The code includes a function `batch` that slices an iterable into batches of a specific size. | ||
- It contains a function `hash_text` that hashes a text string using SHA-1. | ||
- The `GenerateCodeRepositoryEmbeddings` class is a step class that requires certain keys in the input dictionary, initializes a client, and defines a `run` method that generates code repository embeddings. | ||
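The three helpers described above can be sketched as follows. This is a minimal illustration, not the step's actual code: the signatures, the contents of `DIRECTORY_BLACKLIST`, and the UTF-8 encoding choice are all assumptions made for the example.

```python
import hashlib
from itertools import islice
from pathlib import PurePosixPath
from typing import Iterable, Iterator, List, Set, TypeVar

T = TypeVar("T")

# Hypothetical blacklist; the directories the real step ignores are not shown here.
DIRECTORY_BLACKLIST: Set[str] = {".git", "node_modules", "__pycache__"}


def filter_files(
    paths: Iterable[str], blacklist: Set[str] = DIRECTORY_BLACKLIST
) -> Iterator[str]:
    """Yield only the paths with no blacklisted directory component."""
    for path in paths:
        # parts[:-1] are the directories; the last part is the file name.
        directories = PurePosixPath(path).parts[:-1]
        if not any(part in blacklist for part in directories):
            yield path


def batch(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Slice an iterable into lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice consumes up to `size` items per pass; an empty list ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk


def hash_text(text: str) -> str:
    """Return the SHA-1 hex digest of the UTF-8 encoded text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

A content hash like this is typically used to detect files whose text has not changed since the last run, so their embeddings do not need to be regenerated; whether the step uses it exactly that way is not confirmed by this README.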
|
||
## Outputs | ||
- The `GenerateCodeRepositoryEmbeddings` class generates embeddings for code repositories, processes files, handles ignored files, interacts with a database, and eventually runs a separate `GenerateEmbeddings` step with updated inputs. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
## `patchwork/steps/GenerateEmbeddings/GenerateEmbeddings.py` | ||
|
||
### Inputs: | ||
- `inputs` dictionary with keys `"embedding_name"` and `"documents"`. | ||
|
||
### Code: | ||
- Defines `filter_by_extension` function to filter files by extension. | ||
- Defines `split_text` function to chunk text based on given parameters. | ||
- Defines the `GenerateEmbeddings` class, which inherits from `Step`. | ||
- Checks for required keys in the input dictionary. | ||
- Initializes the step with input data and sets up a client connection to a vector database. | ||
- Runs the step by processing documents and embeddings, splitting document texts if needed, and upserting data into the vector database. | ||
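The chunking performed by `split_text` might look like the sketch below. The README only says the function chunks text "based on given parameters", so the parameter names `chunk_size` and `overlap`, their defaults, and the character-based splitting are assumptions for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that content near a
    boundary appears in both neighbours. (Hypothetical parameters.)
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a trailing fragment
        # that is fully contained in the previous chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Splitting long documents this way keeps each piece within the size an embedding model can handle, at the cost of storing several vectors per document.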
|
||
### Outputs: | ||
- Returns an empty dictionary. | ||
|
||
## `patchwork/steps/GenerateEmbeddings/__init__.py` | ||
|
||
- Empty file. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
## Patchwork ReadIssues Module | ||
|
||
### Inputs | ||
- The `ReadIssues` class in `ReadIssues.py` expects a dictionary `inputs` containing the following keys: | ||
  - `issue_url`: URL of the issue
  - `github_api_key`: API key for GitHub (optional, can be provided if working with GitHub issues)
  - `gitlab_api_key`: API key for GitLab (optional, can be provided if working with GitLab issues)
  - `scm_url`: URL of the source code management platform
|
||
### Outputs | ||
- The `ReadIssues` class provides a `run` method that returns a dictionary containing the issue text associated with the provided `issue_url`. | ||
|
||
### Usage | ||
- The `ReadIssues` class reads issues from a source code management platform (GitHub or GitLab) using the provided API keys and URL. | ||
- It ensures the required input keys are present, selects the appropriate SCM client based on the provided API key, sets the SCM URL, and retrieves the issue text based on the provided issue URL. | ||
- Users can instantiate the `ReadIssues` class with the necessary inputs and then call its `run` method to obtain the issue text.
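
The key validation and platform selection described above can be sketched in isolation. This is a stand-in that does not import Patchwork: the function name `select_scm_platform`, the `REQUIRED_KEYS` set, and the rule that `github_api_key` takes precedence when both keys are present are assumptions, not the class's confirmed behavior.

```python
from typing import Dict

# Assumption: only `issue_url` is strictly required; the README does not say
# whether `scm_url` is mandatory.
REQUIRED_KEYS = {"issue_url"}


def select_scm_platform(inputs: Dict[str, str]) -> str:
    """Validate the inputs and return which SCM platform they target."""
    missing = REQUIRED_KEYS - inputs.keys()
    if missing:
        raise ValueError(f"Missing required input keys: {sorted(missing)}")

    # Pick the platform from whichever API key was supplied.
    if "github_api_key" in inputs:
        return "github"
    if "gitlab_api_key" in inputs:
        return "gitlab"
    raise ValueError("Provide either `github_api_key` or `gitlab_api_key`")


# Example inputs using the keys listed in the Inputs section (values are placeholders).
example_inputs = {
    "issue_url": "https://example.com/owner/repo/issues/1",
    "gitlab_api_key": "placeholder-token",
}
platform = select_scm_platform(example_inputs)
```

In the real step, the chosen client would then be pointed at `scm_url` and asked for the issue text at `issue_url`.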