Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and improve abstract reader/writer interface #208

Open
wants to merge 26 commits into
base: dev
Choose a base branch
from

Conversation

sjlongland
Copy link

tinycbor fork branch thiagomacieira/dev adds support for abstract readers and writers, but the implementation had some limitations in cases where multiple CborValue cursors were iterating over the same CBOR document, causing state contamination. The interface also was not documented.

This documents the new interface and further builds upon it by slightly re-arranging CborParser members and opening the token in CborValue to use by the reader interface for any required purpose.

This pull request is now re-based on intel/dev and supercedes thiagomacieira#2.

@sjlongland
Copy link
Author

Forced commits to:

  1. remove stale commit from the old thiagomacieira/dev
  2. fix authorship of commits so that Github's PGP signature verification is happy.

Copy link
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Stuart

Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

I will accept the change without the getter-setter functions working on tokens and context. We can discuss which ones to have after this is merged. I'd rather have wrapper functions so the users don't need to know about the internals of the object, so we can later change them if necessary, like you're doing now.

Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

src/cbor.h Outdated Show resolved Hide resolved
src/cbor.h Show resolved Hide resolved
tests/parser/tst_parser.cpp Outdated Show resolved Hide resolved
tests/parser/tst_parser.cpp Outdated Show resolved Hide resolved
tests/parser/tst_parser.cpp Outdated Show resolved Hide resolved
@sjlongland
Copy link
Author

sjlongland commented Sep 7, 2021

Thank you for your patience. This is looking really good. I will not insist on coding style changes in the examples (they're your code) nor on the auto usage. The reinterpret_cast is a different case, please update that and please move the Doxygen docs to the .c sources (See https://www.doxygen.nl/manual/docblocks.html#structuralcommands).

Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

Finally, what's WSHUB-458? Sounds like a JIRA task number. Can you add a link to the commit message body to what this is and remove from the first line?

Good point, force of habit due to the fact I was doing this at work. I'll strip that token from the commit messages.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from d2c2fd1 to 5b74b5d Compare September 7, 2021 22:05
@thiagomacieira
Copy link
Member

(oops, accidentally edited instead of quoting)

Ahh, the auto and reinterpret_cast are copied and pasted from the existing test cases. I normally avoid such things but was trying to match the style of the existing test cases. I will fix that. :-)

I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

@sjlongland
Copy link
Author

I'm using the Qt rules for auto: it has to be clear what it is, for someone reviewing on a dumb tool like GitHub. That usually means obvious (like iterators) or the type is visible on the same line.

Agreed, code clarity must be paramount. I do more C than C++ so not used to using C++ features like auto and tend to avoid using it for that reason.

Copy link
Member

@thiagomacieira thiagomacieira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and let me see about using this in Qt too.

@thiagomacieira
Copy link
Member

Gah, Travis hasn't run yet... Need to find someone to write GitHub Actions workflow, though likely that'll be me.

@sjlongland
Copy link
Author

Ahh travis.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, Github Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

@thiagomacieira
Copy link
Member

Ahh travis.org is no more… I hit this on the weekend with one of my own projects (aioax25) and it's a landmine waiting to go blam for some of the @vrtsystems and @widesky projects (pyat, cachefs, hszinc) that are still set up to do CI on Travis-CI. (And sadly, Github Actions doesn't hold a candle to Travis-CI, but Travis-CI is expensive now.)

Too bad. Thanks for the update. Let me do a manual check locally and then force the merging.

@sjlongland
Copy link
Author

So, last time we looked at this, we got caught out by Travis CI's shutdown. I've just done a rebase on the current dev branch to poke CI again.

@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from c356a31 to ec1ebfe Compare June 21, 2024 00:31
@sjlongland
Copy link
Author

Second rebase to pick up the changes on main that weren't merged to dev.

pjonsson and others added 10 commits June 21, 2024 10:33
Dependabot can provide automatic pull requests
for things in the repository that should
be updated.

Example PR from dependabot against a repo owned
by the github organization:

github/opensource.guide#2248

Documentation:

https://docs.github.com/en/code-security/dependabot/dependabot-security-updates/about-dependabot-security-updates
facilitate replacement of malloc/free functions

open_memstream functions are left out of this change as they require
other functions from stdlib.
We don't support this any more.
```
../../src/cbor.h:255:69: error: '_Bool' is a C99 extension [-Werror,-Wc99-extensions]
CBOR_INLINE_API CborError cbor_encode_boolean(CborEncoder *encoder, bool value)
                                                                    ^
```

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Homebrew no longer carries precompiled versions of Qt for macos-11, so
switch to macos-13, the last on x86 CPUs (so we can still run
Valgrind). It's also going away after the end of June.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
```
==21147== Valgrind: debuginfo reader: ensure_valid failed:
==21147== Valgrind:   during call to ML_(img_get)
==21147== Valgrind:   request for range [18446744069408125024, +16) exceeds
==21147== Valgrind:   valid image size of 140733057859584 for image:
==21147== Valgrind:   "/usr/local/Cellar/icu4c/74.2/lib/libicudata.74.2.dylib"
```

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
I don't think we need to worry about Qt 4 any more.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
rdower and others added 16 commits June 21, 2024 10:33
Intel says this is needed

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
`cbor_parser_init` and `cbor_parser_init_reader` are substantially
similar, however the latter misses clearing out `it->flags`, leaving it
uninitialised so possibly unsafe.

Rather than copying & pasting that from `cbor_parser_init`, lets just
use one routine that does the "common" part, then each routine can focus
on the specifics needed.
Describe the input parameters for the function and how they are used as
best we understand from on-paper analysis of the C code.
The `token` parameter is not sufficient since it is effectively shared
by all `CborValue` instances.  Since `tinycbor` often uses a temporary
`CborValue` context to perform some operation, we need to store our
context inside that `CborValue` so that we don't pollute the global
state of the reader.
In its place, put an arbitrary `void *` pointer for reader context.  The
reader needs to store some context information which is specific to the
`CborParser` instance it is serving.  Right now, `CborValue::source::token`
serves this purpose, but the problem is that we also need a
per-`CborValue` context and have nowhere to put it.

Better to spend an extra pointer (4 bytes on 32-bit platforms) in the
`CborParser` (which there'll be just one of), then to do it in the
`CborValue` (which there may be several of) or to use a `CborReader`
object that itself carries two pointers (`ops` and the context, thus
we'd need an extra 3 pointers).
We simplify this reader in two ways:
1. we remove the `consumed` member of `struct Input`, and instead use
   the `CborValue`'s `source.token` member, which we treat as an
   unsigned integer offset into our `QByteArray`.
2. we replace the reader-specific `struct Input` with the `QByteArray`
   it was wrapping, since that's the only thing now contained in our
   `struct Input`.

If a `CborValue` gets cloned, the pointer referred to by `source.token`
similarly gets cloned, thus when we advance the pointer on the clone, it
leaves the original alone, so computing the length of unknown-length
entities in the CBOR document can be done safely.
What is not known, is what the significance is of
`CborEncoderAppendType`.  It basically tells the writer the nature of
the data being written, but the default implementation ignores this and
just blindly appends it no matter what.

That raises the question of why it's important enough that the writer
function needs to know about it.
This reads a CBOR file piece-wise, seeking backward and forward through
the file if needed.  Some seeking can be avoided by tuning the block
size used in reads so that the read window shifts by smaller amounts.
Not 100% sure of the syntax for documenting struct-members outside of
the struct as I'm too used to doing it inline, hopefully this works as
expected. :-)
@sjlongland sjlongland force-pushed the feature/WSHUB-455-chunked-codec branch from ec1ebfe to 427e009 Compare June 21, 2024 00:37
@sjlongland
Copy link
Author

Okay, after picking up the changes in main, we have a merge conflict, so final rebase onto dev again to resolve the merge conflict. This should unify the currently diverging main and dev branches once more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants