[Issue #405] In memory index - obscure basetype to end user (#576)
* Adds RTree group in doxygen

* Adds structures and function for RTree in meos

There are two important structures: RTreeNode and RTree. There are four
basic functions to interact with these structures: create an rtree,
insert something into it, query an rtree, and free the memory of an
rtree.

* Adds RTree in memory index

We include a basic structure for the RTree that works for STBoxes. It
supports inserting boxes one at a time. Queries are done with an STBox.
A function that returns each value of a box as a double must be passed
down to the RTree itself.
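
As a rough illustration of the accessor idea described above, here is a
minimal sketch. ToyBox, toy_get_axis and toy_axis_length are hypothetical
names invented for this example, not MEOS types or functions; only the
(box, axis, upper) shape of the callback mirrors the get_axis pointer
that appears in the RTree struct.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an STBox: 2D space plus time. */
typedef struct {
  double xmin, xmax;
  double ymin, ymax;
  double tmin, tmax;
} ToyBox;

/* Shape of the accessor handed to the index: the box, the axis number,
 * and whether the upper (true) or lower (false) bound is wanted. */
typedef double (*toy_get_axis_fn)(const ToyBox *, int, bool);

/* Accessor for ToyBox. Assumed axis order: 0 is x, 1 is y, 2 is time. */
double toy_get_axis(const ToyBox *box, int axis, bool upper) {
  switch (axis) {
    case 0: return upper ? box->xmax : box->xmin;
    case 1: return upper ? box->ymax : box->ymin;
    case 2: return upper ? box->tmax : box->tmin;
    default: return 0.0; /* unknown axis: treated as zero extent */
  }
}

/* Length of a box along one axis, computed only through the accessor,
 * so the caller never needs to know the box layout. */
double toy_axis_length(toy_get_axis_fn get_axis, const ToyBox *box, int axis) {
  return get_axis(box, axis, true) - get_axis(box, axis, false);
}
```

Because the tree code only goes through the function pointer, a box type
with fewer or different dimensions could in principle be supported by
swapping the accessor.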

* Fix lint to C files

* Adds Example of RTree

Since there are no tests for C code, only for SQL code, this example
also serves, in a way, as a test.

It shows how you would normally use the Rtree, inserting and querying.

* Fix imports in rtree

* Fix small error on import

* Fixes small bug

Previously, the split always picked the first element as the largest
axis. Now the choice varies. Some thought should be given to the
temporal axis since it is normally on a different scale. Maybe a
Mahalanobis distance would be better.
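
The largest-axis selection discussed above can be sketched as follows.
This is a simplified illustration, not the MEOS code: ToyBox, TOY_DIMS
and toy_largest_axis are hypothetical names, and the box is stored as
plain lower/upper arrays rather than as an STBox.

```c
#define TOY_DIMS 3  /* assumed layout: x, y, time */

/* Hypothetical box with one lower and one upper bound per axis. */
typedef struct {
  double lo[TOY_DIMS];
  double hi[TOY_DIMS];
} ToyBox;

/* Return the index of the axis along which the box is longest.
 * Every axis is compared; ties keep the lowest axis index. */
int toy_largest_axis(const ToyBox *box) {
  int largest = 0;
  double best = box->hi[0] - box->lo[0];
  for (int i = 1; i < TOY_DIMS; ++i) {
    double len = box->hi[i] - box->lo[i];
    if (len > best) {
      best = len;
      largest = i;
    }
  }
  return largest;
}
```

With raw lengths, a box spanning 10 by 20 units in space and 3600
seconds in time always selects the time axis, which is the scale concern
raised above.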

* Moves definition of rtree node away from meos

meos.h should only have information that the user would use. RTreeNode
is not a structure that a user would use. Thus, it is better to have it
in tpoint_rtree.h and have a forward declaration of RTreeNode in meos.h.
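
The forward-declaration pattern described above might look like the
following single-file sketch. ToyNode, ToyTree, toy_tree_create and
toy_tree_free are hypothetical names; the comments mark which part would
sit in a public header and which in a private one, and plain calloc/free
stand in for the allocator.

```c
#include <stdlib.h>

/* ---- Public header part (plays the role of meos.h) ---- */
typedef struct ToyNode ToyNode;  /* forward declaration, no layout */

typedef struct {
  ToyNode *root;                 /* users only ever hold a pointer */
} ToyTree;

ToyTree *toy_tree_create(void);
void toy_tree_free(ToyTree *tree);

/* ---- Private header part (plays the role of tpoint_rtree.h) ---- */
struct ToyNode {
  int count;
  int ids[8];
};

/* ---- Implementation ---- */
ToyTree *toy_tree_create(void) {
  ToyTree *tree = calloc(1, sizeof(ToyTree));
  if (!tree) return NULL;
  tree->root = calloc(1, sizeof(ToyNode));
  if (!tree->root) {
    free(tree);
    return NULL;
  }
  return tree;
}

void toy_tree_free(ToyTree *tree) {
  if (!tree) return;
  free(tree->root);
  free(tree);
}
```

Code that includes only the public part can pass ToyNode pointers around
but cannot read or depend on the node layout.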

* Replaces malloc and realloc with palloc and repalloc

Keeping it consistent with the rest of the codebase

* Reduces the malloc calls in Rtree

RTreeNode will allocate only once, and there is no need to allocate
when inserting since the box is copied into an already allocated block
of memory.
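
A minimal sketch of the allocate-once idea, under the assumption that
the boxes are stored in a fixed-size array inside the node (as the
`STBox boxes[MAXITEMS]` member suggests). ToyBox, ToyNode, toy_node_new
and toy_node_add are hypothetical names, and calloc stands in for palloc.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAXITEMS 4

typedef struct { double lo[2], hi[2]; } ToyBox;

/* The box storage lives inside the node itself. */
typedef struct {
  int count;
  ToyBox boxes[TOY_MAXITEMS];
} ToyNode;

/* The only allocation: one zeroed block per node. */
ToyNode *toy_node_new(void) {
  return calloc(1, sizeof(ToyNode));
}

/* Copy the box into the next free slot. No allocation happens here.
 * Returns false when the node is full and would need a split. */
bool toy_node_add(ToyNode *node, const ToyBox *box) {
  if (node->count == TOY_MAXITEMS) return false;
  memcpy(&node->boxes[node->count], box, sizeof(ToyBox));
  node->count++;
  return true;
}
```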

* Creation of RTree done with meostype

I have assumed meostype to be an int since meostype is defined in
meos_catalog and I can't import it from meos.h; thus int was the best
option I had.

* Refactor BRANCH and LEAF into RTREE_INNER_NODE

It was not obvious that both #defines were related, so they were
refactored as RTREE_INNER_NODE and RTREE_INNER_NODE_NO

* Improvements to documentation

Adds missing documentation to newly added functions

* Improves readability in vector-ish implementation

To answer an rtree_search query it is necessary to realloc memory, so
the code can become a little obscure. Added one extra function and some
comments to make maintenance easier.
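
The vector-like growth mentioned above can be sketched roughly as below.
This is an assumption about the general technique (double the capacity
when full), not a copy of the MEOS helper; ToyHits, toy_hits_init and
toy_hits_push are hypothetical names, and malloc/realloc stand in for
palloc/repalloc.

```c
#include <stdbool.h>
#include <stdlib.h>

#define TOY_STARTING_SIZE 4

/* Growable array of matching ids. */
typedef struct {
  int *ids;
  int count;
  int capacity;
} ToyHits;

/* Allocate the initial block. Returns false if allocation fails. */
bool toy_hits_init(ToyHits *hits) {
  hits->ids = malloc(sizeof(int) * TOY_STARTING_SIZE);
  hits->count = 0;
  hits->capacity = hits->ids ? TOY_STARTING_SIZE : 0;
  return hits->ids != NULL;
}

/* Append one id, doubling the capacity when the array is full. */
bool toy_hits_push(ToyHits *hits, int id) {
  if (hits->count == hits->capacity) {
    int new_capacity =
      hits->capacity ? hits->capacity * 2 : TOY_STARTING_SIZE;
    int *grown = realloc(hits->ids, sizeof(int) * new_capacity);
    if (!grown) return false;  /* the old block is still valid */
    hits->ids = grown;
    hits->capacity = new_capacity;
  }
  hits->ids[hits->count++] = id;
  return true;
}
```

Keeping the growth logic in one small function means the recursive
search only has to call the push helper for each hit.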

* Fix lint

* Adds timing in tests

Included some timing since it is important to know the difference
between brute-force and RTree search.
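
A rough sketch of how a brute-force baseline can be timed with clock()
from time.h, which the example in this commit also uses. ToyInterval,
toy_count_overlaps and toy_time_count are hypothetical names, and
one-dimensional intervals stand in for STBoxes.

```c
#include <time.h>

typedef struct { double lo, hi; } ToyInterval;

/* Brute-force baseline: test the query against every item. */
int toy_count_overlaps(const ToyInterval *items, int n,
                       const ToyInterval *query) {
  int hits = 0;
  for (int i = 0; i < n; ++i) {
    if (items[i].lo <= query->hi && query->lo <= items[i].hi)
      hits++;
  }
  return hits;
}

/* Run the baseline once and return the CPU time it took in seconds.
 * The number of hits is written to *hits. */
double toy_time_count(const ToyInterval *items, int n,
                      const ToyInterval *query, int *hits) {
  clock_t start = clock();
  *hits = toy_count_overlaps(items, n, query);
  clock_t end = clock();
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Timing the index search the same way and comparing both the hit counts
and the elapsed times gives a correctness check and a speed comparison
at once.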

* Adds endline

* Deletes some comments

* Node box calculate does not use the heap

Change it so that when finding the minimum enclosing STBox there is no
need to allocate new memory.

* Unioned area doesn't use palloc

Use a stack-allocated STBox instead of a palloc.
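
The two heap-free changes above can be sketched as follows: the
enclosing box is written into memory supplied by the caller, and the
unioned area uses a local box on the stack. ToyBox and the toy_*
functions are hypothetical names for a 2D simplification, not the MEOS
functions.

```c
#include <string.h>

typedef struct { double lo[2], hi[2]; } ToyBox;

/* Grow *target in place so that it also covers *box. */
void toy_box_expand(const ToyBox *box, ToyBox *target) {
  for (int i = 0; i < 2; ++i) {
    if (box->lo[i] < target->lo[i]) target->lo[i] = box->lo[i];
    if (box->hi[i] > target->hi[i]) target->hi[i] = box->hi[i];
  }
}

/* Write the minimum enclosing box of n boxes (n >= 1) into *out.
 * The caller owns *out, so nothing is allocated here. */
void toy_boxes_mbr(const ToyBox *boxes, int n, ToyBox *out) {
  memcpy(out, &boxes[0], sizeof(ToyBox));
  for (int i = 1; i < n; ++i)
    toy_box_expand(&boxes[i], out);
}

double toy_box_area(const ToyBox *box) {
  return (box->hi[0] - box->lo[0]) * (box->hi[1] - box->lo[1]);
}

/* Area of the union box of a and b, using a local box on the stack. */
double toy_unioned_area(const ToyBox *a, const ToyBox *b) {
  ToyBox union_box;
  memcpy(&union_box, a, sizeof(ToyBox));
  toy_box_expand(b, &union_box);
  return toy_box_area(&union_box);
}
```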

* Deletes unnecessary import

* Adds Metadata for Rtree to include basetype

This method hides the basetype and other information from the end
user, who calls rtree_create_stbox(), which internally calls
rtree_create(T_STBOX). This makes the process cleaner for meos.
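
The wrapper idea described above can be sketched like this. ToyType,
ToyIndex, toy_index_create and toy_index_create_box are hypothetical
names; the enum is only a stand-in for meosType, and fprintf stands in
for the library's error reporting.

```c
#include <stdio.h>
#include <stdlib.h>

/* Internal type codes (hypothetical stand-in for meosType). */
typedef enum { TOY_T_UNKNOWN = 0, TOY_T_BOX = 1 } ToyType;

typedef struct ToyIndex {
  ToyType basetype;
  int dims;
} ToyIndex;

/* Internal, generic constructor: takes the type code. */
ToyIndex *toy_index_create(ToyType basetype) {
  ToyIndex *index = calloc(1, sizeof(ToyIndex));
  if (!index) return NULL;
  index->basetype = basetype;
  switch (basetype) {
    case TOY_T_BOX:
      index->dims = 2;
      return index;
    default:
      free(index);
      fprintf(stderr, "Unsupported base type %d\n", (int) basetype);
      return NULL;
  }
}

/* Public constructor: the user never sees the type code. */
ToyIndex *toy_index_create_box(void) {
  return toy_index_create(TOY_T_BOX);
}
```

A caller writes `ToyIndex *idx = toy_index_create_box();` and needs no
knowledge of the catalog of type codes behind it.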

* Fix lint

* Deletes the count and stbox from the RTree struct

The count is being deprecated and the box is being inserted into the
metadata.

* Avoid using intermediate RTreeMetadata struct

* Remove #if MEOS from tpoint_rtree.c

---------

Co-authored-by: Maxime Schoemans <maxime.schoemans@ulb.ac.be>
Matematikoi and mschoema authored Aug 28, 2024
1 parent 2eb3a7a commit e8001e3
Showing 4 changed files with 83 additions and 79 deletions.
13 changes: 5 additions & 8 deletions meos/examples/rtree_example.c
@@ -36,16 +36,13 @@
* @endcode
*/

#include<meos.h>

#include <meos_internal.h>

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

#include <meos.h>
#include <meos_internal.h>

#define NO_STBOX 10000

bool index_result[NO_STBOX];
@@ -67,7 +64,7 @@ int main() {
clock_t t;
double time_taken;
RTree * rtree;
rtree = rtree_create(T_STBOX);
rtree = rtree_create_stbox();

for (int i = 0; i < NO_STBOX; ++i) {
int xmin = get_random_number(1, 1000);
@@ -127,4 +124,4 @@ int main() {
printf("\nEXPECTED HITS = %d \n", real_count);
printf("\nINDEX HITS = %d\n", count);
rtree_free(rtree);
}
}
20 changes: 3 additions & 17 deletions meos/include/meos.h
@@ -296,25 +296,11 @@ typedef struct
SkipListElem *elems;
} SkipList;

typedef struct RTreeNode RTreeNode;

/**
* Rtree in memory index basic structure.
*
* It works based on STBox. The spliting criteria is based on the largest axis.
* The inserting criteria is based on least enlarging square.
*
* The get axis function makes it ease to implement with X,Y,Z and time or any
* combination that you may want.
* Structure for the in-memory Rtree index
*/
typedef struct {
int basetype;
RTreeNode *root;
int count;
int dims;
STBox box; /* In the future this should be able to be TBox or Span */
double (*get_axis)(const STBox*, int, bool);
} RTree;
typedef struct RTree RTree;

/*****************************************************************************
* Error codes
@@ -1253,7 +1239,7 @@ extern STBox *intersection_stbox_stbox(const STBox *box1, const STBox *box2);
* RTree functions
*****************************************************************************/

extern RTree * rtree_create (int basetype);
extern RTree * rtree_create_stbox();
extern void rtree_insert (RTree *rtree ,STBox *box, int64 id);
extern int * rtree_search ( const RTree* rtree,const STBox * query, int * count);
extern void rtree_free(RTree* rtree);
18 changes: 18 additions & 0 deletions meos/include/point/tpoint_rtree.h
@@ -37,6 +37,8 @@
/* MEOS */
#include <meos.h>

#include "general/meos_catalog.h"

/*****************************************************************************
* Definitions
*****************************************************************************/
@@ -72,6 +74,22 @@ typedef struct RTreeNode{
STBox boxes[MAXITEMS];
} RTreeNode;

/**
* Rtree in memory index basic structure.
*
* It works based on STBox. The spliting criteria is based on the largest axis.
* The inserting criteria is based on least enlarging square.
*
* The get axis function makes it ease to implement with X,Y,Z and time or any
* combination that you may want.
*/
struct RTree {
meosType basetype;
int dims;
RTreeNode *root;
STBox box;
double (*get_axis)(const STBox*, int, bool);
};

/*****************************************************************************/

111 changes: 57 additions & 54 deletions meos/src/point/tpoint_rtree.c
@@ -34,14 +34,10 @@

/* C */
#include <stdlib.h>

#include <math.h>

/* MEOS */
#include <meos.h>

#include <meos_internal.h>

#include "point/tpoint_rtree.h"

/**
@@ -95,11 +91,11 @@ get_axis_stbox(const STBox * box, int axis, bool upper) {
* @return Pointer to the newly created RTreeNode structure.
*/
static RTreeNode *
node_new(bool kind) {
RTreeNode * node = palloc(sizeof(RTreeNode));
node -> kind = kind;
return node;
}
node_new(bool kind) {
RTreeNode * node = palloc(sizeof(RTreeNode));
node -> kind = kind;
return node;
}

/**
* @brief Calculates the length of an STBox along a specified axis.
@@ -132,7 +128,8 @@ get_axis_length(const RTree * rtree,
* @return The computed area or volume of the STBox.
*/
static double
box_area(const STBox * box,const RTree * rtree) {
box_area(const STBox * box,
const RTree * rtree) {
double result = 1.0;
for (int i = 0; i < rtree -> dims; ++i) {
result *= get_axis_length(rtree, box, i);
@@ -155,12 +152,13 @@ box_area(const STBox * box,const RTree * rtree) {
*/
static double
box_unioned_area(const STBox * box,
const STBox * other_box,const RTree * rtree) {
const STBox * other_box,
const RTree * rtree) {
STBox union_box;
memcpy(&union_box, box, sizeof(STBox));
stbox_expand(other_box, &union_box);
memcpy( & union_box, box, sizeof(STBox));
stbox_expand(other_box, & union_box);

return box_area(&union_box, rtree);
return box_area( & union_box, rtree);
}

/**
@@ -176,7 +174,7 @@ box_unioned_area(const STBox * box,
*/
static int
node_choose_least_enlargement(const RTreeNode * node,
const STBox * box, RTree * rtree) {
const STBox * box,const RTree * rtree) {
int result = 0;
double previous_enlargement = INFINITY;
for (int i = 0; i < node -> count; ++i) {
@@ -206,7 +204,7 @@ node_choose_least_enlargement(const RTreeNode * node,
* @return The index of the chosen child node for insertion.
*/
static int
node_choose(RTree * rtree,
node_choose(const RTree * rtree,
const STBox * box,
const RTreeNode * node) {
// Check if you can add without expanding any rectangle.
@@ -228,7 +226,7 @@ node_choose(RTree * rtree,
* @param[out] box STBox that will be expanded
*/
static void
node_box_calculate(const RTreeNode * node, STBox* box) {
node_box_calculate(const RTreeNode * node, STBox * box) {
memcpy(box, & node -> boxes[0], sizeof(STBox));
for (int i = 1; i < node -> count; ++i) {
stbox_expand( & node -> boxes[i], box);
@@ -246,7 +244,8 @@ node_box_calculate(const RTreeNode * node, STBox* box) {
* @return The index of the axis with the largest length.
*/
static int
stbox_largest_axis(STBox * box, RTree * rtree) {
stbox_largest_axis(const STBox * box,
const RTree * rtree) {
int largest_axis = 0;
double previous_largest = get_axis_length(rtree, box, 0);
for (int i = 1; i < rtree -> dims; ++i) {
@@ -285,7 +284,7 @@ node_move_box_at_index_into(RTreeNode * from, int index, RTreeNode * into) {
* @details This function exchanges the positions of two STBoxes within a single RTree node.
* If the node is a leaf, it also swaps the associated IDs. For internal nodes, it swaps the
* pointers to child nodes. This function is useful for reordering elements within a node.
* @param[in] node Pointer to the RTreeNode structure containing the STBoxes and associated data.
* @param[in,out] node Pointer to the RTreeNode structure containing the STBoxes and associated data.
* @param[in] i The index of the first STBox to be swapped.
* @param[in] j The index of the second STBox to be swapped.
*/
@@ -311,7 +310,7 @@ node_swap(RTreeNode * node, int i, int j) {
* along a particular axis. It uses the QuickSort algorithm to order the STBoxes based on their
* axis values, either upper or lower, as provided by the `get_axis` function in the RTree structure.
* @param[in] rtree Pointer to the RTree structure which provides the function for retrieving axis values.
* @param[in] node Pointer to the RTreeNode structure containing the STBoxes to be sorted.
* @param[in,out] node Pointer to the RTreeNode structure containing the STBoxes to be sorted.
* @param[in] index The axis index along which to sort the STBoxes.
* @param[in] upper Boolean flag indicating whether to sort by upper or lower axis value.
* @param[in] s The starting index of the range to be sorted in the `node->boxes` array.
@@ -385,7 +384,7 @@ node_split(RTree * rtree, RTreeNode * node, STBox * box, RTreeNode ** right_out)
// reverse sort by min axis
node_sort_axis(rtree, right, largest_axis, false);
do {
node_move_box_at_index_into(right, right -> count, node);
node_move_box_at_index_into(right, right -> count - 1, node);
} while (node -> count < MINITEMS);
} else if (right -> count < MINITEMS) {
// reverse sort by max axis
@@ -409,14 +408,14 @@ node_split(RTree * rtree, RTreeNode * node, STBox * box, RTreeNode ** right_out)
* the appropriate child node for insertion and recursively inserts the STBox. If splitting occurs,
* the function handles the split and updates the parent node's bounding boxes.
* @param[in] rtree Pointer to the RTree structure that provides axis value retrieval and node splitting functions.
* @param[in] old_box Pointer to the STBox that is being replaced or added.
* @param[in] node_bounding_box Pointer to the bounding STBox of all the STBoxes in `node`
* @param[in] node Pointer to the RTreeNode structure where the STBox is being inserted.
* @param[in] new_box Pointer to the STBox to be inserted.
* @param[in] id Identifier associated with the new STBox (used only for leaf nodes).
* @param[out] split Pointer to a boolean flag that indicates if the node was split during insertion.
*/
static void
node_insert(RTree * rtree, STBox * old_box, RTreeNode * node,
node_insert(RTree * rtree, STBox * node_bounding_box, RTreeNode * node,
STBox * new_box, int id, bool * split) {
if (node -> kind == RTREE_INNER_NODE_NO) {
if (node -> count == MAXITEMS) {
@@ -447,7 +446,7 @@ node_insert(RTree * rtree, STBox * old_box, RTreeNode * node,
node_box_calculate(right, & node -> boxes[node -> count]);
node -> nodes[node -> count] = right;;
node -> count++;
return node_insert(rtree, old_box, node, new_box, id, split);
return node_insert(rtree, node_bounding_box, node, new_box, id, split);

}

@@ -522,7 +521,7 @@ void node_search(const RTreeNode * node,
/**
* @brief Sets the dimensions of an R-tree based on the meostype of the RTree
*
* @param[in] rtree The R-tree structure whose dimensions are to be set.
* @param[in,out] rtree The R-tree structure whose dimensions are to be set.
* @param[in] box The spatial bounding box (STBox) from which to derive the dimensions.
* @return `true` if the dimensions were successfully set; `false` otherwise.
*/
@@ -542,11 +541,11 @@ rtree_set_dims(RTree * rtree,
/**
* @brief Sets the appropriate get axis function for the R-tree based on its meostype.
*
* @param[in] rtree The R-tree structure for which the function pointer is to be set.
* @param[in,out] rtree The R-tree structure for which the function pointer is to be set.
* @return `true` if the function pointer was successfully set; `false` otherwise.
*/
static bool
set_rtree_functions(RTree * rtree) {
rtree_set_functions(RTree * rtree) {
switch (rtree -> basetype) {
case T_STBOX:
/** TODO: This should be deprecated when we start using picksplit since this function
@@ -560,6 +559,24 @@ set_rtree_functions(RTree * rtree) {
return false;
}

/**
* @brief Creates an RTree index.
* @param[in] basetype The meosType of the elements to index.
* Currently the only basetype supported is T_STBOX.
* @return RTree initialized.
*/
RTree *
rtree_create(meosType basetype) {
RTree * rtree = palloc0(sizeof(RTree));
rtree -> basetype = basetype;
if (!rtree_set_functions(rtree)) {
pfree(rtree);
meos_error(ERROR, MEOS_ERR_INVALID_ARG_VALUE, "Unsupported base type for RTree %d", basetype);
return NULL;
}
return rtree;
}

/**
* @ingroup meos_stbox_rtree_index
* @brief Insert an STBox into the RTree index.
@@ -582,7 +599,6 @@ rtree_insert(RTree * rtree, STBox * box, int64 id) {
node_insert(rtree, & rtree -> box, rtree -> root, box, id, & split);
if (!split) {
stbox_expand(box, & rtree -> box);
rtree -> count++;
return;
}
RTreeNode * new_root = node_new(RTREE_INNER_NODE);
@@ -601,26 +617,13 @@ rtree_insert(RTree * rtree, STBox * box, int64 id) {

/**
* @ingroup meos_stbox_rtree_index
* @brief Creates an RTree index.
* @note the get axis function is to facilitate STBox having only
* some data available i.e. z is optional.
* @param[in] get_axis function that given an stbox, a dimension and whether
* upper or lower, returns the STBox value at that dimension in the lower or
* uppper value.
* @param[in] dims The number of axis.@
* @return RTree initialized.
* @brief Creates an RTree index for STBoxes.
* @return RTree initialized for STBoxes.
*/
RTree *
rtree_create(int basetype) {
RTree * rtree = palloc0(sizeof(RTree));
rtree -> basetype = basetype;
if (!set_rtree_functions(rtree)) {
pfree(rtree);
meos_error(ERROR, MEOS_ERR_INVALID_ARG_VALUE, "Unsupported base type for RTree %d", basetype);
return NULL;
}
return rtree;
}
rtree_create_stbox() {
return rtree_create(T_STBOX);
}

/**
* @ingroup meos_stbox_rtree_index
@@ -632,15 +635,15 @@ RTree *
* @return array of ids that have a hit.
*/
int *
rtree_search(const RTree * rtree,
const STBox * query, int * count) {
int * ids = palloc(sizeof(int) * SEARCH_ARRAY_STARTING_SIZE);
* count = 0;
if (rtree -> root) {
node_search(rtree -> root, query, & ids, count);
}
return ids;
rtree_search(const RTree * rtree,
const STBox * query, int * count) {
int * ids = palloc(sizeof(int) * SEARCH_ARRAY_STARTING_SIZE);
* count = 0;
if (rtree -> root) {
node_search(rtree -> root, query, & ids, count);
}
return ids;
}

/**
* @ingroup meos_stbox_rtree_index
