json_tokener_parse_ex: handle out of memory errors #759

Merged on Jul 4, 2023 (1 commit)

stoeckmann
Contributor

@stoeckmann stoeckmann commented Mar 20, 2022

Do not silently truncate values or skip entries if out-of-memory errors
occur.

Proof of Concept:

- Create poc.c, a program that builds an 8 MB JSON object with key "A"
  whose value is a long run of "B"s, the last of which is written as
  the Unicode escape \u0042:

```c
#include <err.h>
#include <stdio.h>
#include <stdlib.h> /* malloc, free */
#include <string.h>

#include "json.h"

#define STR_LEN (8 * 1024 * 1024)
#define STR_PREFIX "{ \"A\": \""
#define STR_SUFFIX "\\u0042\" }"

int main(void) {
  char *str;
  struct json_tokener *tok;
  struct json_object *obj;

  /* Not used for parsing: json_tokener_parse() allocates its own tokener. */
  if ((tok = json_tokener_new()) == NULL)
    errx(1, "json_tokener_new");

  if ((str = malloc(STR_LEN)) == NULL)
    err(1, "malloc");
  /* Fill with 'B', then overwrite both ends with the JSON framing. */
  memset(str, 'B', STR_LEN);
  memcpy(str, STR_PREFIX, sizeof(STR_PREFIX) - 1);
  /* sizeof(STR_SUFFIX) includes the NUL, so str ends up terminated. */
  memcpy(str + STR_LEN - sizeof(STR_SUFFIX), STR_SUFFIX, sizeof(STR_SUFFIX));

  obj = json_tokener_parse(str);
  free(str);

  printf("%p\n", (void *)obj);
  if (obj != NULL) {
    /* Print only the first 50 characters of the serialized object. */
    printf("%.*s\n", 50, json_object_to_json_string(obj));
    json_object_put(obj);
  }

  json_tokener_free(tok);
  return 0;
}
```
- Compile and run poc, assuming you have enough free heap space:
```
gcc -o poc poc.c $(pkg-config --cflags --libs json-c)
./poc
0x559421e15de0
{ "A": "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
```
- Reduce available heap and run again, which leads to truncation:
```
ulimit -d 10000
./poc
0x555a5b453de0
{ "A": "B" }
```
- Compile json-c with this change and run with reduced heap again:
```
ulimit -d 10000
./poc
(nil)
```

The output is limited to 50 characters, i.e. in the first run json-c
parses the 8 MB string correctly but the poc prints only its beginning.

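The truncation occurs because the parser tries to append all characters
up to the \u0042-escaped 'B' in a single operation. With the data
segment limited to roughly 10 MB, there is not enough memory for this
operation. Without this change the parser does not report the failure
but continues normally, which is why only the escaped 'B' remains in
the value.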
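
The failure mode described above can be illustrated with a small,
self-contained sketch. It does not use json-c: `struct sketch_buf`,
`sketch_append()`, `build_unchecked()` and `build_checked()` are
hypothetical names, and the `limit` field only simulates the heap
ceiling that `ulimit -d` imposes in the PoC. The point is the
difference between ignoring and checking the return value of an append.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; this is NOT json-c's printbuf.
 * `limit` simulates the heap ceiling set by `ulimit -d` in the PoC. */
struct sketch_buf {
    char *data;
    size_t len;   /* bytes stored, excluding the terminator */
    size_t limit; /* maximum number of bytes allowed */
};

/* Append n bytes. Returns 0 on success, -1 if the buffer cannot grow.
 * On failure the buffer is left unchanged. */
int sketch_append(struct sketch_buf *b, const char *src, size_t n)
{
    char *grown;

    if (n > b->limit - b->len)
        return -1; /* simulated out-of-memory */
    grown = realloc(b->data, b->len + n + 1);
    if (grown == NULL)
        return -1; /* real out-of-memory */
    memcpy(grown + b->len, src, n);
    b->data = grown;
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Unchecked pattern: return values are ignored, so a failed append of
 * the large chunk is silently skipped and only the small one survives. */
int build_unchecked(struct sketch_buf *b, const char *big, size_t big_len)
{
    (void)sketch_append(b, big, big_len);
    (void)sketch_append(b, "B", 1);
    return 0;
}

/* Checked pattern: the first failure aborts the whole operation. */
int build_checked(struct sketch_buf *b, const char *big, size_t big_len)
{
    if (sketch_append(b, big, big_len) < 0)
        return -1;
    if (sketch_append(b, "B", 1) < 0)
        return -1;
    return 0;
}
```

With a `limit` smaller than the large chunk, `build_unchecked()`
reports success while the buffer holds just "B", which mirrors the
`{ "A": "B" }` output of the PoC; `build_checked()` returns -1 instead.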

Another possibility is to create a JSON file close to 2 GB and run a
program on a system with a limited amount of RAM, e.g. around 3 GB. But
ulimit restrictions are much easier for proofs of concept.
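
If changing the shell's limits is inconvenient, the same restriction
can presumably be applied from inside a test program with POSIX
`setrlimit()`. The helper below is a sketch only: the name
`limit_data_segment()` is made up for illustration, and it assumes that
`ulimit -d 10000` corresponds to a soft `RLIMIT_DATA` of 10000 * 1024
bytes, as in bash. Whether `RLIMIT_DATA` actually constrains `malloc()`
depends on the platform and allocator.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower the soft data-segment limit to `bytes`,
 * clamped to the hard limit. Returns 0 on success, -1 on error. */
int limit_data_segment(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0)
        return -1;
    /* The soft limit may never exceed the hard limit. */
    if (bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_DATA, &rl);
}
```

Calling `limit_data_segment((rlim_t)10000 * 1024)` at the start of
`main()` in poc.c, before the 8 MB buffer is allocated, would be the
rough equivalent of running `ulimit -d 10000` first.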

Treat memory errors correctly and abort operations.
@hawicz
Member

hawicz commented Jul 4, 2023

These changes look fine. My one reservation is that this eliminates the "fast" codepath where the length is checked before the printbuf_memappend() function call.
Before merging I'd like to get a performance measurement to see whether this actually has a visible impact.
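
For readers unfamiliar with the pattern, a "fast" codepath of this kind
can be sketched as follows. This is an illustration only, not json-c's
code: `struct fast_buf`, `fast_buf_append()` and
`fast_buf_append_slow()` are hypothetical names. The inline wrapper
checks the spare capacity first and only pays for the out-of-line call
(and a possible reallocation) when the data does not fit; unlike a
fire-and-forget macro, it also passes the error status back to the
caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with spare capacity; field names are illustrative. */
struct fast_buf {
    char *data;
    size_t len; /* bytes used, excluding the terminator */
    size_t cap; /* bytes allocated */
};

/* Slow path: out-of-line call that may reallocate.
 * Returns 0 on success, -1 on allocation failure or size overflow. */
int fast_buf_append_slow(struct fast_buf *b, const char *src, size_t n)
{
    if (b->cap - b->len <= n) {
        size_t new_cap = b->cap ? b->cap : 16;
        char *grown;

        /* Double until n bytes plus the terminator fit. */
        while (new_cap - b->len <= n) {
            if (new_cap > (size_t)-1 / 2)
                return -1;
            new_cap *= 2;
        }
        grown = realloc(b->data, new_cap);
        if (grown == NULL)
            return -1;
        b->data = grown;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Fast path: if the bytes and the terminator fit into the spare
 * capacity, copy inline; otherwise fall back to the slow path. */
static inline int fast_buf_append(struct fast_buf *b, const char *src, size_t n)
{
    if (b->cap - b->len > n) {
        memcpy(b->data + b->len, src, n);
        b->len += n;
        b->data[b->len] = '\0';
        return 0;
    }
    return fast_buf_append_slow(b, src, n);
}
```

The design question raised here is whether skipping the inline check,
and always making the function call, costs measurable time on large
inputs.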

@hawicz
Member

hawicz commented Jul 4, 2023

I ran a few ad-hoc performance tests with input files up to ~100 MB, but the variation between runs was greater than any difference between builds with and without these changes. If anything, a build with these changes seemed marginally faster.
So, I'll merge this, and we can worry about optimizations separately.

@hawicz hawicz merged commit e9d3ab2 into json-c:master Jul 4, 2023