Reading data
For the following code examples, it is assumed that Kangas has been imported as follows:
import kangas as kg
Kangas can read a Pandas DataFrame object directly.
import pandas as pd
df = pd.DataFrame(...)
dg = kg.read_dataframe(df)
Likewise, you can easily create a DataFrame from a DataGrid:
df = dg.to_dataframe()
HuggingFace's datasets can be loaded into DataGrid directly because they use rows of dictionaries, and images are represented by PIL images. DataGrid will automatically convert PIL images into a Kangas Image.
from datasets import load_dataset
dataset = load_dataset("beans", split="train")
dg = kg.DataGrid(dataset)
Grouping on labels in the Kangas UI (screenshot omitted).
In addition, Kangas can also read in annotation data (such as bounding boxes) from HuggingFace datasets. For more information on HuggingFace's datasets, see: https://huggingface.co/datasets
Kangas can read directly from CSV files. This is a more nuanced process than Pandas CSV reading, as it preserves floats, integers, and dates automatically. Kangas also supports a dictionary of converters.
dg = kg.read_csv("samples.csv")
dg = kg.read_csv("https://company.com/samples.csv")
dg = kg.read_csv("https://company.com/samples.csv.zip")
You can also read from a URL, and if the file is in an archived format ("zip", "tgz", etc.) then it will download, unarchive, and load it.
For more options on reading CSV files, see DataGrid.read_csv()
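To illustrate what preserving floats, integers, and dates can mean for a CSV reader, here is a minimal sketch using only the Python standard library. It is not Kangas's implementation; the helper name `infer_value`, the order in which parsers are tried, and the use of ISO-format dates are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime


def infer_value(text):
    """Hypothetical helper: return an int, float, or datetime if the text parses as one."""
    for parse in (int, float, datetime.fromisoformat):
        try:
            return parse(text)
        except ValueError:
            continue
    # Nothing matched, so keep the original string.
    return text


# A small in-memory CSV file standing in for "samples.csv".
sample = io.StringIO("id,score,created\n1,0.75,2023-01-15\n2,n/a,2023-02-01\n")

rows = [
    {key: infer_value(value) for key, value in row.items()}
    for row in csv.DictReader(sample)
]
```

In this sketch, values that do not parse cleanly (such as "n/a") are left as strings rather than raising an error.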
See also:
This example uses the dataset from: https://www.kaggle.com/c/dog-breed-identification
Here, we use one DataGrid to read the CSV file, and then construct another that contains the breed and image.
dg = kg.read_csv("labels.csv")
dogs = kg.DataGrid(
    name="Dog Breeds",
    columns=["Breed", "Image"],
)
for row in dg.to_dicts():
    dogs.append([row["breed"], kg.Image("train/" + row["id"] + ".jpg")])
Grouping on "breed" in the Kangas UI (screenshot omitted).
Kangas can read JSON line files as described here: https://jsonlines.org/
A "JSON line file" is a sequence of JSON objects, one per line. These are useful because you can process one line at a time, rather than needing to read the entire file into memory before deserializing it.
dg = kg.read_json("json_line_file.json")
You can also read a JSON line file from a URL, and if the file is in an archived format ("zip", "tgz", etc.) then it will download, unarchive, and load it.
For more options on reading JSON line files, see DataGrid.read_json()
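To make the one-object-per-line idea concrete, here is a minimal standard-library sketch that streams a JSON line file one record at a time. It is only an illustration of the format, not how Kangas reads these files; the helper name `iter_json_lines` and the sample records are made up for this example.

```python
import json
import os
import tempfile


def iter_json_lines(path):
    """Hypothetical helper: yield one decoded JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny JSON line file to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "json_line_file.json")
with open(path, "w", encoding="utf-8") as handle:
    handle.write('{"breed": "beagle", "score": 0.9}\n')
    handle.write('{"breed": "pug", "score": 0.4}\n')

# Each record is decoded on its own; the file is never parsed as one document.
records = list(iter_json_lines(path))
```

Because the generator yields records lazily, a caller can process a large file without holding all of it in memory.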
Each of the read_* methods, and the DataGrid constructor itself, also takes a parameter named converters. This is a dictionary where the key is a column name and the value is a function of one argument. The function should take the column's raw value and return the converted value.
In addition, you can also use a dictionary with the key "row". This special form takes the entire row as a dictionary. You can alter one column based on the values of another. For example:
def huggingface_annotations(row):
    # Category names, indexed by the integer category id
    cppe_labels = ["Coverall", "FaceShield", "Gloves", "Goggles", "Mask"]
    if "image" in row and "objects" in row:
        # cppe
        if isinstance(row["image"], kg.Image) and isinstance(row["objects"], dict):
            if ("bbox" in row["objects"]) and ("category" in row["objects"]):
                boxes = row["objects"]["bbox"]
                labels = row["objects"]["category"]
                for box, label in zip(boxes, labels):
                    # Each box is given as x, y, width, height
                    x, y, w, h = box
                    row["image"].add_bounding_boxes(
                        cppe_labels[label], [[x, y], [x + w, y + h]]
                    )

dg = kg.DataGrid(data, converters={"row": huggingface_annotations})
This example will read the contents of the "objects" JSON column, and add the data as bounding boxes to an "image" column.
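The difference between per-column converters and the special "row" converter can be sketched in plain Python. This is a conceptual illustration only, not Kangas's internal code: the helper `apply_converters`, the choice to run column converters before the "row" converter, and the in-place mutation of the row are all assumptions made for this example.

```python
def apply_converters(row, converters):
    """Hypothetical helper: apply column converters, then the optional "row" converter."""
    converted = dict(row)  # copy so the caller's row is untouched
    for key, function in converters.items():
        if key == "row":
            continue  # handled separately below
        if key in converted:
            # A column converter receives only that column's raw value.
            converted[key] = function(converted[key])
    if "row" in converters:
        # The "row" converter receives the whole row as a dictionary
        # and (by assumption here) modifies it in place.
        converters["row"](converted)
    return converted


def flag_low_scores(row):
    # Alter one column ("label") based on the value of another ("score").
    if row["score"] < 0.5:
        row["label"] = "low: " + row["label"]


converters = {"score": float, "row": flag_low_scores}
result = apply_converters({"label": "pug", "score": "0.4"}, converters)
```

Here the "score" converter turns the raw string into a float, and the "row" converter then uses that value to rewrite the "label" column.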
Kangas DataGrid is completely open source; sponsored by Comet ML