docs(interactive): Add document about column_mapping (#3912)
Clarifying the behavior of interactive when importing data from diverse
sources.
zhanglei1949 committed Jun 13, 2024
1 parent aff323a commit a46f1fe
Showing 5 changed files with 68 additions and 15 deletions.
28 changes: 26 additions & 2 deletions docs/flex/interactive/data_import.md
@@ -6,6 +6,28 @@ In our guide on [using custom graph data](./custom_graph_data.md), we introduced

Currently, we only support importing data into the graph from local `csv` files or `odps` tables. See the `loading_config.data_source.scheme` configuration.

## Column mapping

When importing vertex and edge data into a graph, users must define how the raw data maps to the graph's schema.
This can be done using a YAML configuration, as shown:

```yaml
- column:
    index: 0  # Column index in the data source
    name: col_name  # If a column name is present
  property: property_name  # The mapped property name
```

The column mapping requirements differ based on the data source:

#### Import from CSV

You can provide `index`, `name`, or both. If both `index` and `name` are specified, we check whether they match.
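
The CSV rule above can be sketched in C++ as follows. This is a minimal illustration, not the loader's actual code: `ResolveCsvColumn` is a hypothetical helper, the use of `std::optional` for a missing `index` is an assumption (the loader in this commit uses `-1` as a sentinel), and resolving a name-only mapping by searching the header is assumed here for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the code base: resolves one `column`
// entry against the CSV header and returns its position in the header.
// `index` is empty when the YAML omits it; `name` is empty when omitted.
std::size_t ResolveCsvColumn(const std::vector<std::string>& header,
                             std::optional<std::size_t> index,
                             const std::string& name) {
  // At least one of `index` and `name` must be given.
  if (!index && name.empty()) {
    throw std::invalid_argument("column mapping needs an index or a name");
  }
  if (index) {
    if (*index >= header.size()) {
      throw std::out_of_range("column index is out of range");
    }
    // When both are given, they must refer to the same column.
    if (!name.empty() && header[*index] != name) {
      throw std::invalid_argument("column index and name do not match");
    }
    return *index;
  }
  // Name only: look the name up in the header (assumed behavior).
  for (std::size_t i = 0; i < header.size(); ++i) {
    if (header[i] == name) {
      return i;
    }
  }
  throw std::invalid_argument("column name not found in header");
}
```

For a header of `src_id`, `dst_id`, `weight`, a mapping with only `index: 1` resolves to `dst_id`, a mapping with only `name: weight` resolves to position 2, and a mapping with `index: 0` and `name: dst_id` is rejected because the two disagree.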

#### Import from ODPS Table

You only need to specify the `name` of the column, since column names are guaranteed to be unique in an ODPS table. The `index` is disregarded.

## Sample Configuration for loading "Modern" Graph from csv files

To illustrate, let's examine the `examples/modern_import_full.yaml` file. This configuration is designed for importing the "modern" graph and showcases the full range of configuration possibilities. We'll dissect each configuration item in the sections that follow.
@@ -67,11 +89,13 @@ edge_mappings:
  source_vertex_mappings:
    - column:
        index: 0
-       name: id
+       name: src_id
+     property: id
  destination_vertex_mappings:
    - column:
        index: 1
-       name: id
+       name: dst_id
+     property: id
  column_mappings:
    - column:
        index: 2
2 changes: 1 addition & 1 deletion flex/interactive/examples/movies/import.yaml
@@ -57,7 +57,7 @@ edge_mappings:
      destination_vertex: Movie
    column_mappings:
      - column:
-         index: 3
+         index: 2
          name: rating
        property: rating
    inputs:
33 changes: 26 additions & 7 deletions flex/storages/rt_mutable_graph/loader/csv_fragment_loader.cc
@@ -356,7 +356,6 @@ void CSVFragmentLoader::fillVertexReaderMeta(
  // parse all column_names

  std::vector<std::string> included_col_names;
- std::vector<size_t> included_col_indices;
  std::vector<std::string> mapped_property_names;

  auto cur_label_col_mapping = loading_config_.GetVertexColumnMappings(v_label);
@@ -383,20 +382,29 @@ void CSVFragmentLoader::fillVertexReaderMeta(

    for (size_t i = 0; i < read_options.column_names.size(); ++i) {
      included_col_names.emplace_back(read_options.column_names[i]);
-     included_col_indices.emplace_back(i);
      // We assume the order of the columns in the file is the same as the
      // order of the properties in the schema, except for primary key.
      mapped_property_names.emplace_back(property_names[i]);
    }
  } else {
    for (size_t i = 0; i < cur_label_col_mapping.size(); ++i) {
      auto& [col_id, col_name, property_name] = cur_label_col_mapping[i];
-     if (col_name.empty()) {
-       // use default mapping
+     if (col_name.empty()){
+       if (col_id >= read_options.column_names.size() || col_id < 0) {
+         LOG(FATAL) << "The specified column index: " << col_id
+                    << " is out of range, please check your configuration";
+       }
        col_name = read_options.column_names[col_id];
      }
+     // check whether index match to the name if col_id is valid
+     if (col_id >= 0 && col_id < read_options.column_names.size()) {
+       if (col_name != read_options.column_names[col_id]) {
+         LOG(FATAL) << "The specified column name: " << col_name
+                    << " does not match the column name in the file: "
+                    << read_options.column_names[col_id];
+       }
+     }
      included_col_names.emplace_back(col_name);
-     included_col_indices.emplace_back(col_id);
      mapped_property_names.emplace_back(property_name);
    }
  }
@@ -521,10 +529,21 @@ void CSVFragmentLoader::fillEdgeReaderMeta(
  for (size_t i = 0; i < cur_label_col_mapping.size(); ++i) {
    // TODO: make the property column's names are in same order with schema.
    auto& [col_id, col_name, property_name] = cur_label_col_mapping[i];
-   if (col_name.empty()) {
-     // use default mapping
+   if (col_name.empty()){
+     if (col_id >= read_options.column_names.size() || col_id < 0) {
+       LOG(FATAL) << "The specified column index: " << col_id
+                  << " is out of range, please check your configuration";
+     }
      col_name = read_options.column_names[col_id];
    }
+   // check whether index match to the name if col_id is valid
+   if (col_id >= 0 && col_id < read_options.column_names.size()) {
+     if (col_name != read_options.column_names[col_id]) {
+       LOG(FATAL) << "The specified column name: " << col_name
+                  << " does not match the column name in the file: "
+                  << read_options.column_names[col_id];
+     }
+   }
    included_col_names.emplace_back(col_name);
    mapped_property_names.emplace_back(property_name);
  }
18 changes: 14 additions & 4 deletions flex/storages/rt_mutable_graph/loading_config.cc
@@ -148,16 +148,26 @@ static bool parse_column_mappings(
      LOG(ERROR) << "column_mappings should have field [column]";
      return false;
    }
-   int32_t column_id;
+   int32_t column_id = -1;
    if (!get_scalar(column_mapping, "index", column_id)) {
-     LOG(ERROR) << "Expect column index for column mapping";
-     return false;
+     VLOG(10) << "Column index for column mapping is not set, skip";
    }
+   else {
+     if (column_id < 0) {
+       LOG(ERROR) << "Column index for column mapping should be non-negative";
+       return false;
+     }
+   }
-   std::string column_name;
+   std::string column_name = "";
    if (!get_scalar(column_mapping, "name", column_name)) {
      VLOG(10) << "Column name for col_id: " << column_id
               << " is not set, make it empty";
    }
+   // At least one need to be specified.
+   if (column_id == -1 && column_name.empty()) {
+     LOG(ERROR) << "Expect column index or name for column mapping";
+     return false;
+   }

    std::string property_name; // property name is optional.
    if (!get_scalar(node[i], "property", property_name)) {
2 changes: 1 addition & 1 deletion flex/storages/rt_mutable_graph/loading_config.h
@@ -130,7 +130,7 @@ class LoadingConfig {
  GetEdgeLoadingMeta() const;

  // Get vertex column mappings. Each element in the vector is a pair of
- // <column_index, property_name>.
+ // <column_index, column_name, property_name>.
  const std::vector<std::tuple<size_t, std::string, std::string>>&
  GetVertexColumnMappings(label_t label_id) const;
