Add screenshots data explorer #555

Merged
merged 5 commits into from
Oct 27, 2023
Binary file added docs/art/data_explorer/data_explorer.png
Binary file added docs/art/data_explorer/image_explorer.png
24 changes: 12 additions & 12 deletions docs/components/hub.md
@@ -10,6 +10,10 @@ Below you can find the reusable components offered by Fondant.

--8<-- "components/caption_images/README.md:1"

??? "chunk_text"

--8<-- "components/chunk_text/README.md:1"

??? "download_images"

--8<-- "components/download_images/README.md:1"
@@ -18,22 +22,18 @@ Below you can find the reusable components offered by Fondant.

--8<-- "components/embed_images/README.md:1"

??? "embedding_based_laion_retrieval"
??? "embed_text"

--8<-- "components/embedding_based_laion_retrieval/README.md:1"
--8<-- "components/embed_text/README.md:1"

??? "filter_comments"
??? "embedding_based_laion_retrieval"

--8<-- "components/filter_comments/README.md:1"
--8<-- "components/embedding_based_laion_retrieval/README.md:1"

??? "filter_image_resolution"

--8<-- "components/filter_image_resolution/README.md:1"

??? "filter_line_length"

--8<-- "components/filter_line_length/README.md:1"

??? "image_cropping"

--8<-- "components/image_cropping/README.md:1"
@@ -42,6 +42,10 @@ Below you can find the reusable components offered by Fondant.

--8<-- "components/image_resolution_extraction/README.md:1"

??? "index_weaviate"

--8<-- "components/index_weaviate/README.md:1"

??? "language_filter"

--8<-- "components/language_filter/README.md:1"
@@ -62,10 +66,6 @@ Below you can find the reusable components offered by Fondant.

--8<-- "components/minhash_generator/README.md:1"

??? "pii_redaction"

--8<-- "components/pii_redaction/README.md:1"

??? "prompt_based_laion_retrieval"

--8<-- "components/prompt_based_laion_retrieval/README.md:1"
24 changes: 16 additions & 8 deletions docs/data_explorer.md
@@ -1,5 +1,15 @@
# Data explorer

## Data explorer UI

The data explorer UI enables Fondant users to explore the inputs and outputs of their Fondant pipeline.

The user can specify a pipeline and a specific pipeline run and component to explore. The user will then be able to explore the different subsets produced by Fondant components.

The chosen subset (and the columns within the subset) can be explored in 3 tabs.

![data explorer](../art/data_explorer/data_explorer.png)

## How to use?
You can set up the data explorer container with the `fondant explore` CLI command, which is installed together with the Fondant Python package.

@@ -16,21 +26,19 @@ Example:
```bash
fondant explore --base_path gs://foo/bar --auth-gcp
```
## Data explorer UI

The data explorer UI enables Fondant users to explore the inputs and outputs of their Fondant pipeline.

The user can specify a pipeline and a specific pipeline run and component to explore. The user will then be able to explore the different subsets produced by Fondant components.

The chosen subset (and the columns within the subset) can be explored in 3 tabs.

### Sidebar
In the sidebar, the user can specify the path to a manifest file. This will load the available subsets into a dropdown, from which the user can select one of the subsets. Finally, the columns within the subset are shown in a multiselect box, and can be used to remove / select the columns that are loaded into the exploration tabs.

### Data explorer Tab
The data explorer shows an interactive table of the loaded subset DataFrame, with one sample per row. The table can be used to browse through a partition of the data, visualize images inside image columns, and more.

### Numeric analysis Tab
The numeric analysis tab shows statistics of the numerical columns of the loaded subset (mean, std, percentiles, ...) in a table. In the second part of the tab, the user can choose one of the numerical columns for in-depth exploration by visualizing it in a variety of interactive plots.

![data explorer](../art/data_explorer/data_explorer_numeric_analysis.png)
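
As a rough illustration of the kind of per-column statistics listed above (mean, std, percentiles), the sketch below computes them for one numeric column using only the Python standard library. It is not the data explorer's actual implementation; the `summarize` helper and the `widths` sample values are hypothetical and exist only for this example.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers.

    Hypothetical helper for illustration only; not part of Fondant.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Made-up numeric column, e.g. image widths in pixels.
widths = [256, 512, 512, 640, 1024]
summary = summarize(widths)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A table of such values per numerical column gives a quick check for outliers or unexpected ranges before looking at the plots.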

### Image explorer Tab
The image explorer tab enables the user to choose one of the image columns and analyse these images.
The image explorer tab enables the user to choose one of the image columns and analyse these images.

![data explorer](../art/data_explorer/image_explorer.png)