Merge pull request #10 from zzstoatzz/sync
sync first and update docs
Showing 21 changed files with 709 additions and 351 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
# Contributing to Raggy | ||
|
||
We love your input! We want to make contributing to Raggy as easy and transparent as possible. | ||
|
||
## Development Setup | ||
|
||
We recommend using [uv](https://github.com/astral-sh/uv) for Python environment management and package installation: | ||
|
||
```bash | ||
# Install uv | ||
curl -LsSf https://astral.sh/uv/install.sh | sh | ||
|
||
# Clone the repo | ||
git clone https://github.com/zzstoatzz/raggy.git | ||
cd raggy | ||
|
||
# Create and activate a virtual environment | ||
uv venv | ||
|
||
# Install in editable mode with dev dependencies | ||
uv pip install -e ".[dev]" | ||
``` | ||
|
||
## Running Tests | ||
|
||
```bash | ||
# Install test dependencies | ||
uv pip install -e ".[test]" | ||
|
||
# Run tests | ||
pytest | ||
``` | ||
|
||
## Building Documentation | ||
|
||
```bash | ||
# Install docs dependencies | ||
uv pip install -e ".[docs]" | ||
|
||
# Serve docs locally | ||
mkdocs serve | ||
``` | ||
|
||
## Code Style | ||
|
||
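```bash | ||
# Install the git hooks so checks run automatically on each commit | ||
pre-commit install | ||
|
||
# Run every hook against the whole repo on demand | ||
pre-commit run --all-files | ||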
``` | ||
|
||
## Running Examples | ||
|
||
All examples can be run using uv: | ||
|
||
!!! question "where are the dependencies?" | ||
    `uv` will run the example in an isolated environment using [inline script dependencies](https://docs.astral.sh/uv/guides/scripts/#declaring-script-dependencies). | ||
|
||
```bash | ||
# Run example | ||
uv run examples/chat_with_X/website.py | ||
``` | ||
|
||
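For reference, inline script dependencies are declared in a comment block at the top of the script itself. The sketch below is a hypothetical script, not one of the files under `examples/`; the function name is an assumption, and the `dependencies` list is left empty so the sketch runs on the standard library alone. A real example would list its packages there.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical script showing inline script metadata; not part of the repo."""

import sys


def build_greeting(topic: str) -> str:
    """Return the line this script prints for a given topic."""
    return f"let's chat about {topic}"


if __name__ == "__main__":
    # Fall back to a placeholder topic when no argument is given.
    topic = sys.argv[1] if len(sys.argv) > 1 else "raggy"
    print(build_greeting(topic))
```

Saved under a name of your choice, a script like this should run with `uv run <script>.py "some topic"`, with `uv` reading the metadata block to build the environment.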
See our [example gallery](examples/index.md) for more details. | ||
|
||
## Versioning | ||
|
||
We use [Semantic Versioning](http://semver.org/). For the versions available, see the [tags on this repository](https://github.com/zzstoatzz/raggy/tags). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Example Gallery | ||
|
||
Here are some practical examples of using `raggy` in real-world scenarios. | ||
|
||
## Chat with Content | ||
|
||
Ye olde "chat with your data" examples. | ||
|
||
#### Chat with a Website | ||
|
||
```bash | ||
uv run examples/chat_with_X/website.py "let's chat about docs.astral.sh/uv" | ||
``` | ||
|
||
#### Chat with a GitHub Repo | ||
|
||
```bash | ||
uv run examples/chat_with_X/repo.py "let's chat about astral-sh/uv" | ||
``` | ||
|
||
## Refresh Vectorstores | ||
|
||
A `prefect` flow that gathers documents from knowledge sources, embeds them, and stores them in a vectorstore. | ||
|
||
#### Refresh TurboPuffer | ||
|
||
```bash | ||
uv run examples/refresh_vectorstore/tpuf_namespace.py | ||
``` | ||
|
||
#### Refresh Chroma | ||
|
||
```bash | ||
uv run examples/refresh_vectorstore/chroma_collection.py | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,8 @@ | ||
import logging | ||
-import subprocess | ||
|
||
log = logging.getLogger("mkdocs") | ||
|
||
|
||
def on_pre_build(config, **kwargs): | ||
-    """Add a custom route to the server.""" | ||
-    try: | ||
-        subprocess.run( | ||
-            [ | ||
-                "npx", | ||
-                "tailwindcss", | ||
-                "-i", | ||
-                "./docs/overrides/tailwind.css", | ||
-                "-o", | ||
-                "./docs/static/css/tailwind.css", | ||
-            ] | ||
-        ) | ||
-    except Exception: | ||
-        log.error("You need to install tailwindcss using npx install tailwindcss") | ||
+    """Add any pre-build hooks here.""" | ||
+    pass |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,86 @@ | ||
-# Coming soon! | ||
+# Ingest Strategy | ||
|
||
When building RAG applications, you often need to load and refresh content from multiple sources. This can involve: | ||
|
||
- Expensive API calls | ||
- Large document processing | ||
- Concurrent embedding operations | ||
|
||
We use [Prefect](https://docs.prefect.io) to handle these challenges, giving us: | ||
|
||
- Automatic caching of expensive operations | ||
- Concurrent processing with backpressure | ||
- Observability and retries | ||
|
||
Let's look at a real example that demonstrates these concepts. | ||
|
||
## Building a Knowledge Base | ||
|
||
```python | ||
from datetime import timedelta | ||
import httpx | ||
from prefect import flow, task | ||
from prefect.tasks import task_input_hash | ||
|
||
from raggy.loaders.github import GitHubRepoLoader | ||
from raggy.loaders.web import SitemapLoader | ||
from raggy.vectorstores.tpuf import TurboPuffer | ||
|
||
# Cache based on content changes | ||
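def get_last_modified(context, parameters): | ||
    """Only reload if the content has changed.""" | ||
    try: | ||
        url = parameters["urls"][0] | ||
        last_modified = httpx.head(url).headers.get("Last-Modified") | ||
        # Include the URL so different sources never share a key | ||
        return f"{url}:{last_modified}" if last_modified else None | ||
    except Exception: | ||
        # No key means the task runs without caching | ||
        return None | ||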
|
||
@task( | ||
cache_key_fn=get_last_modified, | ||
cache_expiration=timedelta(hours=24), | ||
retries=2, | ||
) | ||
async def gather_documents(urls: list[str]): | ||
    return await SitemapLoader(urls=urls).load() | ||
|
||
@flow | ||
async def refresh_knowledge(): | ||
    # Load from multiple sources | ||
    # The cached task takes URLs, so the sitemap goes through it | ||
    documents = await gather_documents(urls=["https://docs.prefect.io/sitemap.xml"]) | ||
    # The GitHub loader takes no URLs, so it is loaded directly | ||
    documents.extend( | ||
        await GitHubRepoLoader(repo="PrefectHQ/prefect", include_globs=["README.md"]).load() | ||
    ) | ||
|
||
    # Store efficiently with concurrent embedding | ||
    with TurboPuffer(namespace="knowledge") as tpuf: | ||
        await tpuf.upsert_batched( | ||
            documents, | ||
            batch_size=100,  # tune based on document size | ||
            max_concurrent=8,  # tune based on rate limits | ||
        ) | ||
``` | ||
|
||
This example shows key patterns: | ||
|
||
1. Content-aware caching (`Last-Modified` headers, commit SHAs, etc.) | ||
2. Automatic retries for resilience | ||
3. Concurrent processing with backpressure | ||
4. Efficient batching of embedding operations | ||
|
||
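As a small illustration of the first pattern, a cache key can be derived from a source identifier plus whatever value changes when the content changes. This is only a sketch: `content_cache_key` is a hypothetical helper, not part of raggy or Prefect, and the header values are placeholders. Returning `None` when no marker is available mirrors the fallback in `get_last_modified`.

```python
import hashlib
from typing import Optional


def content_cache_key(source: str, version_marker: Optional[str]) -> Optional[str]:
    """Build a cache key from a source identifier and a version marker.

    `version_marker` can be a `Last-Modified` header value, a commit SHA,
    or any other value that changes when the content changes.
    """
    if not version_marker:
        # Nothing distinguishes one version from another, so skip caching.
        return None
    return hashlib.sha256(f"{source}\n{version_marker}".encode()).hexdigest()


# Placeholder header values, for illustration only.
sitemap = "https://docs.prefect.io/sitemap.xml"
key_a = content_cache_key(sitemap, "Wed, 01 Jan 2025 00:00:00 GMT")
key_b = content_cache_key(sitemap, "Thu, 02 Jan 2025 00:00:00 GMT")
print(key_a != key_b)  # a changed marker produces a different key
```

Hashing keeps the key a fixed length regardless of how long the source URL or marker is.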
See the [refresh examples](https://github.com/zzstoatzz/raggy/tree/main/examples/refresh_vectorstore) for complete implementations using both Chroma and TurboPuffer. | ||
|
||
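The batching and concurrency patterns can be sketched with the standard library alone. The code below is an assumption about the general technique (fixed-size batches plus an `asyncio.Semaphore` that caps in-flight work), not a description of how `upsert_batched` is implemented; `batched`, `process_batched`, and `fake_embed` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterator, List, Sequence


def batched(items: Sequence[Any], batch_size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


async def process_batched(
    items: Sequence[Any],
    handler: Callable[[List[Any]], Awaitable[None]],
    batch_size: int = 100,
    max_concurrent: int = 8,
) -> int:
    """Run `handler` on every batch with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run(batch: List[Any]) -> int:
        async with semaphore:  # waits here once the limit is reached
            await handler(batch)
        return len(batch)

    counts = await asyncio.gather(*(run(b) for b in batched(items, batch_size)))
    return sum(counts)


async def fake_embed(batch: List[Any]) -> None:
    """Stand-in for a real embedding call."""
    await asyncio.sleep(0)


total = asyncio.run(
    process_batched([f"doc-{i}" for i in range(250)], fake_embed, batch_size=100, max_concurrent=2)
)
print(total)
```

Larger batches mean fewer requests but bigger payloads, and a lower `max_concurrent` trades throughput for staying under rate limits, which is why both are worth tuning.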
## Performance Tips | ||
|
||
For production workloads: | ||
|
||
```python | ||
from datetime import timedelta | ||
|
||
from prefect import task | ||
|
||
|
||
@task( | ||
    retries=2, | ||
    retry_delay_seconds=[3, 60],  # wait 3s before the first retry, 60s before the second | ||
    cache_expiration=timedelta(days=1), | ||
    persist_result=True,  # save results to storage | ||
) | ||
async def gather_documents(loader): | ||
    return await loader.load() | ||
``` | ||
|
||
See [Prefect's documentation](https://docs.prefect.io/latest/concepts/tasks/) for more on task configuration and caching strategies. |
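
Since `retry_delay_seconds` takes a list of delays, a growing schedule can be generated rather than written by hand. A minimal sketch, assuming a hypothetical `backoff_delays` helper that is not part of Prefect or raggy; Prefect's documentation also describes a built-in backoff helper, which may be the better choice in practice.

```python
from typing import List


def backoff_delays(retries: int, base_seconds: float = 3.0, factor: float = 2.0) -> List[float]:
    """Return one delay per retry, multiplying by `factor` each time."""
    return [base_seconds * factor**attempt for attempt in range(retries)]


delays = backoff_delays(retries=3)
print(delays)  # [3.0, 6.0, 12.0]
```

The resulting list has one entry per retry, so `retries` on the task should match the length of the list.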
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
{% extends "base.html" %} | ||
|
||
{% block announce %} | ||
<style> | ||
  .md-announce { | ||
    font-family: 'Roboto Mono', monospace; | ||
    background-color: var(--md-primary-fg-color); | ||
  } | ||
  .md-announce__inner { | ||
    margin: 0 auto; | ||
    padding: 0.2rem; | ||
    text-align: center; | ||
    font-weight: 300; | ||
    letter-spacing: 0.05em; | ||
  } | ||
</style> | ||
<a href="{{ config.extra.announcement.link }}" style="color: currentColor"> | ||
  {{ config.extra.announcement.text }} | ||
</a> | ||
{% endblock %} |