add mp4
florian committed Jun 15, 2024
1 parent f6ff615 commit c64dbba
Showing 2 changed files with 66 additions and 16 deletions.
82 changes: 66 additions & 16 deletions README.md
@@ -1,23 +1,73 @@
# ✨PyPi Scout

https://drive.google.com/file/d/1huR7-VD3AieBRCcQyRX9MWbPLMb_czjq/view?usp=sharing
<video width="820">
<source src="./pypi-scout.mp4" type="video/mp4">
</video>

# setup
![](./pypi-scout.mp4)

```sh
cp .env.template .env
```
PyPi Scout helps you find PyPI packages using natural language prompts powered by Large Language Models (LLMs).

add API token
The project works by collecting project summaries and descriptions for all packages on PyPI with more than 50 weekly downloads. These are then converted into vector representations using [Sentence Transformers](https://www.sbert.net/). When the user enters a query, it is converted into a vector representation, and the most similar package descriptions are fetched from the vector database. Additional weight is given to weekly downloads before presenting the results to the user in a dashboard.
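
The retrieval flow described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the bag-of-words `embed` function is a toy stand-in for Sentence Transformers, the in-memory list stands in for the vector database, and the package names, the `download_weight` parameter, and the log-scaled blending formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, packages: list[dict], download_weight: float = 0.2) -> list[tuple[str, float]]:
    """Score packages by similarity to the query, nudged by popularity.

    The blend (1 - w) * similarity + w * normalized log-downloads is a
    hypothetical formula chosen for illustration only.
    """
    if not packages:
        return []
    query_vector = embed(query)
    max_log = max(math.log1p(p["weekly_downloads"]) for p in packages) or 1.0
    scored = []
    for package in packages:
        similarity = cosine_similarity(query_vector, embed(package["summary"]))
        popularity = math.log1p(package["weekly_downloads"]) / max_log
        score = (1 - download_weight) * similarity + download_weight * popularity
        scored.append((package["name"], score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up packages for demonstration; these are not real dataset rows.
PACKAGES = [
    {"name": "demo-charts", "summary": "Create charts and plots from data", "weekly_downloads": 90000},
    {"name": "demo-http", "summary": "Simple HTTP client for REST APIs", "weekly_downloads": 500000},
    {"name": "demo-termplots", "summary": "Plots and charts for the terminal", "weekly_downloads": 120},
]

if __name__ == "__main__":
    for name, score in rank("draw charts and plots", PACKAGES):
        print(f"{name}: {score:.3f}")
```

Taking the logarithm of the download counts keeps very popular packages from drowning out the similarity score; a real deployment would tune or replace this weighting.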

```
docker build -t pypi-scout .
```
## Getting Started

```
docker run --rm \
--env-file .env \
-v $(pwd)/data:/code/data \
pypi-scout \
python /code/pypi_scout/scripts/setup.py
```
### Prerequisites

1. **Create a `.env` File**

Copy the `.env.template` to create a new `.env` file:

```sh
cp .env.template .env
```

2. **Set Up Pinecone**

PyPi Scout uses [Pinecone](https://www.pinecone.io/) as its vector database, so register for a free account on the Pinecone website. Obtain your API key by following the [Pinecone quickstart guide](https://docs.pinecone.io/guides/get-started/quickstart) and add it to your `.env` file.

### Build and Setup

1. **Build the Docker Image**

From the root of the project, build the Docker image:

```sh
docker build -t pypi-scout .
```

2. **Run the Setup Script**

Run the setup script in a container, mounting the local `data` directory so that the results persist on the host:

```sh
docker run --rm \
--env-file .env \
-v $(pwd)/data:/code/data \
pypi-scout \
python /code/pypi_scout/scripts/setup.py
```

This script will:

- Download and process the PyPI dataset and store the results in the `data` directory
- Set up your Pinecone index
- Create vector embeddings for the PyPI dataset and upsert them to the Pinecone index

3. **Run the Application**

Start the application using Docker Compose:

```sh
docker-compose up
```

After a short while, your application will be live at [http://localhost:3000](http://localhost:3000).

## Data

The dataset for this project is created using the [PyPI dataset on Google BigQuery](https://console.cloud.google.com/marketplace/product/gcp-public-data-pypi/pypi?project=regal-net-412415). The SQL query used can be found in [pypi_bigquery.sql](./pypi_bigquery.sql). The resulting dataset is available as a CSV file on [Google Drive](https://drive.google.com/file/d/1huR7-VD3AieBRCcQyRX9MWbPLMb_czjq/view?usp=sharing).
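
As a rough sketch of working with such a CSV export, the snippet below keeps only rows above a weekly-download threshold. The column names (`name`, `summary`, `weekly_downloads`) and the sample rows are assumptions for illustration; check the real file and [pypi_bigquery.sql](./pypi_bigquery.sql) for the actual schema, and note that the project may apply this filter elsewhere.

```python
import csv
import io

# Hypothetical sample mimicking the CSV layout; not real dataset rows.
SAMPLE_CSV = """name,summary,weekly_downloads
demo-charts,Create charts and plots from data,90000
demo-unused,An abandoned experiment,12
"""


def load_packages(csv_file, min_weekly_downloads: int = 50) -> list[dict]:
    """Read package rows and keep those with more than the given weekly downloads."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if int(row["weekly_downloads"]) > min_weekly_downloads]


if __name__ == "__main__":
    for row in load_packages(io.StringIO(SAMPLE_CSV)):
        print(row["name"], row["weekly_downloads"])
```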

---

By following these instructions, you'll have PyPi Scout up and running, enabling you to find the best PyPI packages with ease using natural language queries. Enjoy exploring!
Binary file added pypi-scout.mp4
Binary file not shown.
