Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close tdbMatrix and tdbMatrixWithIds Array's when we have nothing left to to read #466

Merged

Conversation

jparismorgan
Copy link
Collaborator

@jparismorgan jparismorgan commented Jul 25, 2024

What

When I tested with the new TileDB-Py 0.31.0 wheels (downloaded manually from this run: https://github.com/TileDB-Inc/TileDB-Py/actions/runs/10096606074), I saw a new error when running Flat and IVF Flat:

>           Index.clear_history(uri=index_uri, timestamp=19)

apis/python/test/test_ingestion.py:720: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/homebrew/anaconda3/envs/TileDB-Vector-Search-2/lib/python3.9/site-packages/tiledb/vector_search/index.py:641: in clear_history
    tiledb.Array.delete_fragments(db_uri, 0, timestamp)
libtiledb.pyx:1337: in tiledb.libtiledb.Array.delete_fragments
    ???
libtiledb.pyx:337: in tiledb.libtiledb._raise_ctx_err
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   tiledb.cc.TileDBError: TileDB internal: [Array::set_array_closed] May not perform simultaneous open or close operations.

This happens from code like this:

def test_ingestion_timetravel(tmp_path):
    for index_type, index_class in zip(INDEXES, INDEX_CLASSES):
        index_uri = os.path.join(tmp_path, f"array_{index_type}")

        data = np.array([[1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3]], dtype=np.float32)
        default_result_d = [[np.finfo(np.float32).max], [np.finfo(np.float32).max]]
        default_result_i = [[np.iinfo(np.uint64).max], [np.iinfo(np.uint64).max]]

        # We ingest at timestamp 10.
        index = ingest(
            index_type=index_type,
            index_uri=index_uri,
            input_vectors=data,
            index_timestamp=10,
            num_subspaces=2,
        )
        
        Index.clear_history(uri=index_uri, timestamp=19)

and is coming from load_as_matrix():

flat_index.py
           self._db = load_as_matrix(
                self.db_uri,
                ctx=self.ctx,
                config=config,
                size=self.size,
                timestamp=self.base_array_timestamp,
            )

which does:

module.py
def load_as_matrix(
    ...    
    if dtype == np.float32:
        m = tdbColMajorMatrix_f32(ctx, path, 0, None, 0, size, 0, timestamp)
    ...
    m.load()
    return m

I fixed a similar bug for type-erased matrixes by having tdbPartitionedMatrix automatically close Array's when done reading (#448). So here we do the same thing for tdbMatrix and tdbMatrixWithIds.

Testing

  • Existing unit tests pass.
  • Added new unit tests that we can call load() many times and nothing bad happens.
  • With these changes test_ingestion.py now passes with TileDB-Py 0.31.0 wheels.

@jparismorgan jparismorgan marked this pull request as ready for review July 26, 2024 02:05
Copy link
Collaborator

@NikolaosPapailiou NikolaosPapailiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for debugging this!

@NikolaosPapailiou NikolaosPapailiou merged commit 7adba24 into main Jul 26, 2024
6 checks passed
@NikolaosPapailiou NikolaosPapailiou deleted the jparismorgan/close-tdb-array-if-finished-loading branch July 26, 2024 11:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants