Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dump functions with operator<< overloads #5026

Merged
merged 30 commits into from
Jul 9, 2024

Conversation

kounelisagis
Copy link
Member

@kounelisagis kounelisagis commented May 30, 2024

[sc-32959]

In TileDB-Py schema.dump() is called without any argument, leading writes to C stdout which Jupyter does not capture:

if (out == nullptr)
out = stdout;

Implementing a function that captures the output of the dump() to a string and then prints it to sys.stdout will fix the problem. Of course, ArraySchema::dump(std::string*) calls the dump() functions of other classes, so those need to be implemented as well.

But we don't even want to write functions called dump internally; that's a C idiom; we will reserve that term for the C API only. Since we're dealing with text output, everything here can be done better by overloading operator<< for std::ostream.


TYPE: IMPROVEMENT
DESC: Output of schema.dump() in TileDB-Py is not captured by Jupyter. Replacing dump functions with operator<< overloads will give the ability to print the resulted string from Python.

@kounelisagis kounelisagis force-pushed the agis/add-dump-functions-with-str-argument branch from 8f974d8 to 2c08b88 Compare May 30, 2024 11:36
@kounelisagis kounelisagis marked this pull request as ready for review May 30, 2024 15:32
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these functions that implement a "dump" should be implemented as nonmember functions. This is in part to assure correctness. There's nothing in such a function for a schema that should ever need to rely on private member variables or private member functions. Using nonmember functions assures that they can't be called.

For historical reasons, our serialization and deserialization functions, both for storage and network, are implemented as member functions, but it would be better if they were moved out themselves. That's out of scope of this PR, but all the new code going in can be done this way.

We don't even want to write functions called dump internally; that's a C idiom; we will reserve that term for the C API only. Since we're dealing with text output, everything here can be done better by overloading operator<< for std::ostream. We don't want functions that return strings.

Changes required:

  • No new dump functions.
  • New overloads of operator<<.
  • Implement C API functions with stringstream and fstream internally.

For an quick reference for the write function signatures, see the section "Stream Insertion and Extraction" in https://stackoverflow.com/questions/4421706/what-are-the-basic-rules-and-idioms-for-operator-overloading.

tiledb/sm/array_schema/attribute.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.h Outdated Show resolved Hide resolved
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The various T::dump(FILE *) functions should all be inlined into their call sites in the C API implementation functions. The class-specific parts are not in operator<< functions and all that's remaining in these is a bit of interface code. Also in all of these function fwrite is better than fprintf.

tiledb/sm/array_schema/array_schema.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.cc Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even without dealing with the issues with Filter, there are a number of mostly-small changes to be made.

For class Filter, it looks like the following would work:

  1. A single definition of operator<< for filters.
  2. A pure virtual function Filter::output that (1) calls. This should be protected.
  3. An override of Filter::output in each class derived from filter. Also protected.
  4. A friend declaration for operator<< so that Filter::output is available.

tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/dimension.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/dimension.cc Show resolved Hide resolved
tiledb/sm/array_schema/dimension.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/dimension_label.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/domain.h Outdated Show resolved Hide resolved
tiledb/sm/array_schema/enumeration.cc Outdated Show resolved Hide resolved
tiledb/sm/c_api/tiledb.h Show resolved Hide resolved
tiledb/sm/cpp_api/schema_base.h Outdated Show resolved Hide resolved
@TileDB-Inc TileDB-Inc deleted a comment from eric-hughes-tiledb Jun 10, 2024
@kounelisagis
Copy link
Member Author

kounelisagis commented Jun 10, 2024

Even without dealing with the issues with Filter, there are a number of mostly-small changes to be made.

For class Filter, it looks like the following would work:

  1. A single definition of operator<< for filters.
  2. A pure virtual function Filter::output that (1) calls. This should be protected.
  3. An override of Filter::output in each class derived from filter. Also protected.
  4. A friend declaration for operator<< so that Filter::output is available.

@eric-hughes-tiledb, making it a friend causes the need to be in tiledb::sm, causing all other operator<< overloads to also be in the namespace.

@eric-hughes-tiledb
Copy link
Contributor

making it a friend causes the need to be in tiledb::sm

Friend declarations can be for namespace-qualified identifiers. friend ::something for the global namespace, for instance.

@kounelisagis kounelisagis force-pushed the agis/add-dump-functions-with-str-argument branch from f5451fe to 67d7ed4 Compare June 11, 2024 00:01
@kounelisagis kounelisagis force-pushed the agis/add-dump-functions-with-str-argument branch from 91bbef2 to 7baa8fd Compare June 11, 2024 01:05
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't approve this until ArraySchema::dump is gone, along with all of its ilk. Now that we have proper stream output, there's nothing left of value in these functions. All of the

All the assert statements here have to go. If the need for [[maybe_unused]] wasn't clear enough, assert does not always check the return value of an I/O function. (It does so only in some compiles, but not all.) We always need to check the return value of I/O functions.

The signature of Filter::output has to change. And we don't need Filter::dump any longer after this change; it and all its overrides can just go.

There are a number of places that use \n explicitly. These should be replaced with std::endl.

tiledb/sm/array_schema/array_schema.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/array_schema.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/current_domain.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/dimension.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/ndrectangle.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/enumeration.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/attribute.cc Outdated Show resolved Hide resolved
tiledb/sm/array_schema/dimension_label.cc Outdated Show resolved Hide resolved
tiledb/sm/c_api/tiledb.h Outdated Show resolved Hide resolved
tiledb/sm/filter/filter.h Outdated Show resolved Hide resolved
Copy link
Contributor

@eric-hughes-tiledb eric-hughes-tiledb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't resolve all your suggestions

I have. The problem with the regression test seems to have been a preexisting defect; there was missing inclusion of the external dimension header.

I went ahead and fixed up all my last comments as well, since I already was building the branch locally.

LGTM.

The result is a solid improvement over the prior.

@kounelisagis kounelisagis changed the title Add dump functions with std::string* argument Replace dump functions with operator<< overloads Jul 4, 2024
Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These APIs do not exist.

tiledb/doxygen/source/c-api.rst Outdated Show resolved Hide resolved
tiledb/doxygen/source/c-api.rst Outdated Show resolved Hide resolved
@kounelisagis kounelisagis force-pushed the agis/add-dump-functions-with-str-argument branch 2 times, most recently from 700a974 to e72a39c Compare July 4, 2024 09:37
@kounelisagis kounelisagis force-pushed the agis/add-dump-functions-with-str-argument branch from 0813e4b to 7d71079 Compare July 4, 2024 11:37
Copy link
Member

@teo-tsirpanis teo-tsirpanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@KiterLuc KiterLuc merged commit f40530f into dev Jul 9, 2024
62 checks passed
@KiterLuc KiterLuc deleted the agis/add-dump-functions-with-str-argument branch July 9, 2024 18:07
github-actions bot pushed a commit that referenced this pull request Jul 9, 2024
[sc-32959]

In TileDB-Py `schema.dump()` is called without any argument, leading
writes to C `stdout` which Jupyter does not capture:
https://github.com/TileDB-Inc/TileDB/blob/76dbda43d98bff7b71527fbffd0dfd657437b00f/tiledb/sm/array_schema/array_schema.cc#L693-L694

Implementing a function that captures the output of the `dump()` to a
string and then prints it to `sys.stdout` will fix the problem. Of
course, `ArraySchema::dump(std::string*)` calls the `dump()` functions
of other classes, so those need to be implemented as well.

**But** we don't even want to write functions called dump internally;
that's a C idiom; we will reserve that term for the C API only. Since
we're dealing with text output, everything here can be done better by
overloading `operator<<` for `std::ostream`.

---
TYPE: IMPROVEMENT
DESC: Output of `schema.dump()` in TileDB-Py is not captured by Jupyter.
Replacing `dump` functions with `operator<<` overloads will give the
ability to print the resulted string from Python.

---------

Co-authored-by: Eric Hughes <eric.hughes@tiledb.com>
(cherry picked from commit f40530f)
KiterLuc added a commit that referenced this pull request Jul 18, 2024
[sc-32959]

Following #5026, `dump` C++ APIs were mistakenly deleted. This PR
reimplements them as deprecated. `operator<<` remains the canonical
interface.

---
TYPE: NO_HISTORY
DESC: Reimplement `dump` C++ APIs as deprecated.

---------

Co-authored-by: Luc Rancourt <lucrancourt@gmail.com>
Co-authored-by: KiterLuc <67824247+KiterLuc@users.noreply.github.com>
Co-authored-by: Theodore Tsirpanis <theodore.tsirpanis@tiledb.com>
KiterLuc pushed a commit that referenced this pull request Sep 2, 2024
…dump(FILE*)` with `operator<<` overload. (#5266)

After #5026, the only `dump`
API that remains without a string counterpart is on `FragmentInfo`. This
PR adds the `tiledb_fragment_info_dump_str` C API and replaces
`FragmentInfo::dump(FILE*)` with an `operator<<` overload.

[sc-50605]

---
TYPE: C_API | CPP_API
DESC: Add `tiledb_fragment_info_dump_str` C API and replace
`FragmentInfo::dump(FILE*)` with `operator<<` overload.

---------

Co-authored-by: Theodore Tsirpanis <theodore.tsirpanis@tiledb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants