-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor/2479/simplify databackend contract #2700
Refactor/2479/simplify databackend contract #2700
Conversation
Still problematic:
|
superduper/base/datalayer.py
Outdated
for r in insert.documents: | ||
r.setdefault( | ||
'_fold', | ||
'train' if random.random() >= s.CFG.fold_probability else 'valid', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic that duplicates with the _add_fold_to_insert
superduper/components/model.py
Outdated
# TODO why all this complex logic just to get ids | ||
if not self.db.databackend.check_output_dest(predict_id): | ||
overwrite = True | ||
try: | ||
if not overwrite: | ||
if ids: | ||
select = select.select_using_ids(ids) | ||
select = select.select(select.primary_id) | ||
# TODO - this is broken | ||
query = select.select_ids_of_missing_outputs(predict_id=predict_id) | ||
|
||
# query = select.select_ids_of_missing_outputs(predict_id=predict_id) | ||
predict_ids = select.missing_ids(predict_id).execute() | ||
else: | ||
if ids: | ||
return ids | ||
query = select.select_ids | ||
predict_ids = select.ids() | ||
except FileNotFoundError: | ||
# This is case for sql where Table is not created yet | ||
# and we try to access `db.load('table', name)`. | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to sort out this logic, something like this
if overwrite:
predict_ids = ids or select.ids()
else:
ids = ids or []
missing_ids = select.missing_ids(predict_id)
predict_ids = [id_ for id in ids if id_ in missing_ids]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning a full overhaul of this logic in an upcoming PR.
@@ -11,7 +11,7 @@ | |||
from superduper.components.model import ObjectModel | |||
from superduper.components.vector_index import VectorIndex | |||
|
|||
from superduper_mongodb.query import MongoQuery | |||
from superduper_mongodb.query import MongoDBQuery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only have Query and do not need MongoDBQuery
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update our query testing utils for this generic interface.
def test_insert(db): | ||
db['documents'].insert([{'x': i} for i in range(10)]).execute() | ||
r = db.databackend._db.documents.find_one() | ||
assert 'x' in r | ||
|
||
|
||
def test_select_table(db): | ||
db['documents'].insert([{'x': i} for i in range(10)]).execute() | ||
|
||
results = db['documents'].execute() | ||
assert len(results) == 10 | ||
|
||
|
||
def test_ids(db): | ||
|
||
db.cfg.auto_schema = True | ||
|
||
db['documents'].insert([{'x': i} for i in range(10)]).execute() | ||
|
||
results = db['documents'].ids() | ||
|
||
assert len(results) == 10 | ||
|
||
assert all(isinstance(x, str) for x in results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move them to test/utils
, then the ibis
and mongodb
can reuse it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that the query testing utility makes sense if all plugins test the same queries. TBD.
afab96f
to
fb86814
Compare
24e206c
to
20cab12
Compare
fdc66ad
to
c23916a
Compare
c23916a
to
1afa81f
Compare
Description
superduper.backends.base.query.Query
superduper_<plugin>.Executor
t.insert(...)
instead oft.insert().execute()
since we don't need to serialize insertionst.update
,t.delete
deprecatedt.filter(...).select(...).outputs(...)
, ort.like().(...)
ort.(...).like()
t.get()
to get one data point (eager)t.ids()
to get the ids (eager)t.subset(ids)
to subset a queryt.limit(n, offset=m)
to get a chunk of data.execute()
no longer returns a cursor, instead a simple listt.column == x
, replace witht['column'] == x
q.dict()['documents']