Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix StreamString read O(n^2); improve range read performance #4584

Closed
wants to merge 4 commits into from

Conversation

Adam5Wu
Copy link
Contributor

@Adam5Wu Adam5Wu commented Mar 28, 2018

I ran into a performance anomaly where decoding a base64 certificate is faster than reading the decoded DER blob from a StreamString.

When I inspect the operation, I found StreamString is reading the string from head to tail, byte by byte, and with each byte read, it removes it from the head, which triggers memmove of the rest of the string content 1 byte forward... such algorithm is O(n^2)

char c = charAt(0);
remove(0, 1);

This PR changed the internals of StreamString to overcome the slow read operation:

  1. Instead of removing bytes from the head of string, an offset is used to keep track of read progress;
  2. Range read is further optimized with a single memcpy.

There is, however, a bit of caveat with this change:

  1. Any data consumed by read is not lost. It could be re-consumed by resetting the offset. I consider it a benefit, and have added a reset() method to do just that;
  2. String assignment needs to be accompanied by a reset() call. Because the offset is in derived class while String is the base class, assignment of new string content will not automatically trigger offset reset (unless a significant amount of refactoring is done in the String class).

@@ -0,0 +1,2 @@
compiler.c.extra_flags=-DASYNC_TCP_SSL_ENABLED=1 -DASYNC_TCP_SSL_BEARSSL=1 -DESPZW_DEBUG_LEVEL=1 -DESPZW_LOG_TIME=1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you really mean to include this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, accident... maybe I should add this file to .gitignore?

@devyte
Copy link
Collaborator

devyte commented Mar 28, 2018

This is a very interesting find, and could explain some wdt issues I've seen.
Can you give an example of the second caveat?

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 28, 2018

Sure, let's say I have a service function which converts any input string to a new string only contains chars from the odd indices.

Input = "123456789"
Output = "13579"
String gimmeOddChars(String &&instr) {
  static StreamString ss;
  String out;

  ss = std::move(instr);
  ss.reset();
  int c = ss.read();
  while (c >= 0) {
    out += (char)c;
    ss.read();
    c = ss.read();
  }
  return std::move(out);
}

Before this PR, without ss.reset(); the code should function as expected (abeit slowly);
After this PR, ss.reset(); is required for the code to function as expected.

@devyte
Copy link
Collaborator

devyte commented Mar 28, 2018

@Adam5Wu what about implementing StreamString::operator=()? It could forward to String::operator=() and then call its own reset().

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 28, 2018

Well, I think I did not phrase the 2nd caveat well. Not only assignments are affected.
Any string manipulation routine need to be overriden to consider the offset, such as remove, replace.
This will make them all subject to at least one virtual function call.

Then there are discussion #4551 and PR #4582

Having virtual function call in String would significantly reduce the likelihood vtable could be placed on flash. I think this may have too much performance implications and design constraints.

I'd rather settle with a little bit of manual efforts. :D

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 28, 2018

Another possible approach is, instead of StreamString, let's have StringStream, which is just an Stream adapter of String, instead of descendant.

class StringStream : public Stream {
  protected:
    size_t _offset;
    String const &backend;
  public:
    StringStream(String const &str);
    ....
}

In this way we can safely override assignment operator without worry about inflicting unwanted virtual method in String.

Also since it is an adapter, by convention user of the adapter is responsible for maintaining the content of the backend string, so any String manipulation is outside of the contract.

It is not going to solve the problem (string manipulation can cause weird Stream behavior), but just pushing it aside, but the argument would be, these problems shouldn't be our (String-Stream adapter) responsibility in the first place -- the StreamString simply promised too much it cannot deliver.

@devyte
Copy link
Collaborator

devyte commented Mar 29, 2018

@Adam5Wu I'm not an expert on how things are handled under the hood by the compilers (ABI in this case), but my understanding is as follows:
Consider a case with base class VirtualBase1, base class VirtualBase2, and DerivedClass that inherits virtual methods from both VirtualBase1 and VirtualBase2. Then, DerivedClass has two vtables, one for each of the base methods.
In this case, String doesn't have virtual methods, and Stream is a pure abstract class. Therefore, StreamString should have a single vtable for the Stream virtual methods, and no vtable for String. That means that implementing forwarder methods in StreamString in terms of the String methods to allow correct handling of the offset should not increase mem overhead at all.

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

Yes, if we implemented a StreamString::operator =, without marking the String::operator = virtual, then the child class operator hides the parent class operator.

But the hidding is superficial, it only works if the reference is the child class; if the reference becomes the parent class, then only the parent operator will be invoked, not the child. For example:

StreamString Test;
Test = "123"; // StreamString::operator = is invoked

function FillString(String &out) {
  out = "456";
}
FillString(Test); // String::operator = is invoked, offset not reset

To truly override the behavior, no matter what reference is used to refer to the instance, we need to declare the method virtual in parent class, and use override keyword in child class methods.

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

If the argument is that, we are only interested in the superficial hidding of parent class's operator, then a more suitable approach is encapulation (i.e. the adapter) instead of inheritance.
Because encapsulation makes user explicitly aware there are two different classes, so there is no chance they can get confused and do the wrong thing (i.e. invoke hidden parent's operator while thinking they are using the child's operator)

String backstore;
StringStream adapted(backstore);

function FillString(String &out) {
  out = "456";
}
FillString(adapted); // Does not compile
FillString(backstore); // Works, and user explicitly aware what is going on

@devyte
Copy link
Collaborator

devyte commented Mar 29, 2018

Now I understand what you had in mind, and you're right. The example you just showed above works with the current O(n2) code.
Sigh, the more use cases I see of inheritance, the deeper into templates I go...
Personally, I agree about encapsulation and forwarders However, my concern here is a change in the behavior, which will cause code out there to break. So the question is: how do we avoid all of:

  • the current performance penalty
  • force the user to have to manually call reset() when the String content changes
  • make String methods virtual
    ?

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

Template also have its weirdness.
Because a template is not a concrete type, so it becomes very difficult to generalize, which in turn, makes type-safe reference become very difficult.

A good example is ArduinoJson's StaticJsonBuffer. It is written in template for very good reason.
However, when I want to write a utility function, which parses a json fragment and insert it into an existing json structure, I find it next to impossible:

void parseAndInsertJson(StaticJsonBuffer<?> &buffer, JsonObject &existing, String &fragment) {
   existing["newData"] = buffer.parse(fragment);
}

The above function cannot be realized with any simple transformation that I know of...

There may be some very ugly hack that can achieve similar effects with templated class, but I'd rather not getting into it...

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

So the question is: how do we avoid all of

There is another way, but you may not like it either:
Inject the offset field and maintenance logic into String, and make StreamString its friend.

Effectively String has some extra logic and field nobody can use except StreamString.
It avoids all the pitfalls, but the prices is a little extra memory and CPU cycle wasted in every instance that is not StreamString.

Aesthetically, I think nobody will like this one. :(

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

Maybe we should start a poll, ask everyone to vote what they think are evil:

  1. O(n^2) time for read
  2. O(n) time for read, need to remember call reset() if they reassign or manipulate string content
  3. O(n) time for read, make String methods virtual (slight slow down of all common uses, some vtable space, and if vtable in flash, cannot use String in ISR)
  4. O(n) time for read, inject offset field and maintenance logic in String base class (slight slow down of all common uses, slight increase of memory footprint of all common uses)
  5. O(n) time for read, replace with StringStream adapter, so user must be aware they are manipulating the backing store of stream (so they won't forget to call reset())
  6. O(n) time for read, mark inheritance of String as protected or private, so user simply cannot manipulate it as String (and forget to call reset())

And we choose the least evil solution.

@devyte
Copy link
Collaborator

devyte commented Mar 29, 2018

template<size_t sizeT>
void parseAndInsertJson(StaticJsonBuffer<sizeT> &buffer, JsonObject &existing, String &fragment) {
   ...
}

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Mar 29, 2018

@devyte Ah yes, the template problem can be solved by ... more templates! :D
But once you board the template ship, you cannot get off easily. :)

What if I need to use this method as a callback? (Actually it is a callback in my design, forgot to mention it earlier)

@devyte
Copy link
Collaborator

devyte commented Mar 29, 2018

1 is too slow.
2, 5, and 6 are breaking changes, so they could cause existing apps and libs to break. I guess we could implement one of these, but e.g.: for release 3.0.0.
3 increases mem overhead in every sketch out there by introducing a vtable for String (iram and heap cases), or make use of String from ISRs forbidden due to flash cache (vtables in flash).

...that leaves 4, which makes my brain cry.

@igrr @earlephilhower @d-a-v do any of you have insight for this?

@devyte
Copy link
Collaborator

devyte commented Mar 29, 2018

I suppose another option is possible:
7. make cache access from ISRs not an issue (ref: mask user facing interrupts, #3579 , wrap SDK functions Cache_Read_Enable*, Cache_Read_Disable*), do option 3, and put vtables in flash

@earlephilhower
Copy link
Collaborator

@Adam5Wu, @devyte - Maybe there is some middle ground here that's not as algorithmically pure but can work just fine without introducing the reset method which I think was the main sticking point here.

The thread's long, so maybe I'm missing something, but looks like nothing is blocking if we can open things up in String, too.

How about lazy reset on a write to the string while updating WString.cpp to handle the reset() (we own String class, so no need to make it virtual so you could percolate the offset bits to String itself, right?)?

Even just including a real readBytes (which right now is always O(n^2) since it calls the read() over and over from Stream.h) would be a big win without any real breaking changes, no? Could be a simple separate PR from this.

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Oct 8, 2018

How about lazy reset on a write to the string while updating WString.cpp to handle the reset() (we own String class, so no need to make it virtual so you could percolate the offset bits to String itself, right?)?

@earlephilhower, that matches No. 4 of my previously suggested solution.

@Adam5Wu
Copy link
Contributor Author

Adam5Wu commented Oct 8, 2018

I personally slightly prefer solution 5 or 6, which are breaking changes.

However, I think the conceptual modeling of StreamString is somewhat problematic, which is the root of all these problems.

  • Having an adapter to operate a String backed storage as stream is desirable;
  • Having a hybrid class which can be operated both as stream and as string, independently and interchangeably, is functionally an overkill. How many scenarios demand this kind of functionality? Especially in Arduino context?

Fixing the root yields much cleaner code and easier maintenance, compared to adding more and more complications to try to make a bad idea approximately right.

@earlephilhower
Copy link
Collaborator

Ah, sorry, I did see it but didn't recognize it. It's getting a little long!

I'm not seeing (4) as that bad, honestly, and not breaking == goodness.

A nice dumb implementation has a tiny if{} that's inlined in all String functions. if ( _offset) move_string_such_that_offset_eq_zero(). Load of a class variable, compare to zero, jump. 5-7 CPU instructions except when you've been using the StringStream on it. Extra 2 bytes per string to hold _offset (maybe not even that if we can pack it somewhere already used for padding).

Compared to what happens if you end up doing a heap operation, feels like nothing in the common case.

@devyte
Copy link
Collaborator

devyte commented Oct 8, 2018

Yes, as I said:

...that leaves 4, which makes my brain cry.

Then, for release 3.0.0 we could pull 4 out, and implement the adapter to move forward cleanly.

@mcbp223
Copy link
Contributor

mcbp223 commented Jan 29, 2019

The stl stringstream implementation from <sstream> maintains reading/writing state separated from the store as in (5), but owns the string in basic_stringbuf, and generates (as of 4.8.2) memory allocations for constructor from string, and for string retrieval, so maybe the adapter is better in a memory constricted environment, without cow strings. c++11 move semantics could help with this and favor an implementation more in line with the standard.
Apart from these considerations, would it be feasible to have two implementations, one non-breaking, StreamString, eventually marked for deprecation, and one StringStream as proposed?

@earlephilhower earlephilhower added this to the 3.0.0 milestone Oct 1, 2019
@earlephilhower earlephilhower added the merge-conflict PR has a merge conflict that needs manual correction label Feb 27, 2020
@earlephilhower earlephilhower mentioned this pull request Aug 18, 2020
@d-a-v
Copy link
Collaborator

d-a-v commented Mar 31, 2021

Closing per #6979 (comment)

@d-a-v d-a-v closed this Mar 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-conflict PR has a merge conflict that needs manual correction
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants