
Introduce converters to consume Postgres Json and convert these to String/byte[] #453

Closed
jdmwood opened this issue Sep 9, 2020 · 14 comments
Labels
type: enhancement A general enhancement

Comments

jdmwood commented Sep 9, 2020

I think I have found a potential memory leak using Spring Data + R2DBC Postgres.

I have a model and repository like this (Kotlin code):

@Table
data class MyTable(
    val data: Json
)

interface MyTableRepository : CoroutineCrudRepository<MyTable, Int> {
}

The SQL to create this table is:

create table my_table (
     data json not null
);

If I do a query like this:

for (i in 0..1000) {
    testRepo.findAll().toList()
}

... then I get a bunch of Netty memory leak errors (https://netty.io/wiki/reference-counted-objects.html). The culprit seems to be this one:

2020-09-09 13:26:51.761 ERROR 78212 --- [actor-tcp-nio-2] io.netty.util.ResourceLeakDetector       : LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
Created at:
	io.netty.buffer.SimpleLeakAwareByteBuf.unwrappedDerived(SimpleLeakAwareByteBuf.java:143)
	io.netty.buffer.SimpleLeakAwareByteBuf.retainedSlice(SimpleLeakAwareByteBuf.java:52)
	io.netty.buffer.AdvancedLeakAwareByteBuf.retainedSlice(AdvancedLeakAwareByteBuf.java:89)
	io.r2dbc.postgresql.codec.JsonCodec.doDecode(JsonCodec.java:49)
	io.r2dbc.postgresql.codec.JsonCodec.doDecode(JsonCodec.java:34)
	io.r2dbc.postgresql.codec.AbstractCodec.decode(AbstractCodec.java:82)
	io.r2dbc.postgresql.codec.DefaultCodecs.decode(DefaultCodecs.java:129)
	io.r2dbc.postgresql.PostgresqlRow.decode(PostgresqlRow.java:90)
	io.r2dbc.postgresql.PostgresqlRow.get(PostgresqlRow.java:77)
	io.r2dbc.spi.Row.get(Row.java:78)
	org.springframework.data.r2dbc.convert.MappingR2dbcConverter$RowParameterValueProvider.getParameterValue(MappingR2dbcConverter.java:656)
	org.springframework.data.r2dbc.convert.MappingR2dbcConverter$$Lambda$1333/0000000000000000.apply(Unknown Source)
	org.springframework.data.relational.core.conversion.BasicRelationalConverter$ConvertingParameterValueProvider.getParameterValue(BasicRelationalConverter.java:264)
	org.springframework.data.mapping.model.ClassGeneratingEntityInstantiator$EntityInstantiatorAdapter.extractInvocationArguments(ClassGeneratingEntityInstantiator.java:262)
	org.springframework.data.mapping.model.ClassGeneratingEntityInstantiator$EntityInstantiatorAdapter.createInstance(ClassGeneratingEntityInstantiator.java:235)
	org.springframework.data.mapping.model.ClassGeneratingEntityInstantiator.createInstance(ClassGeneratingEntityInstantiator.java:87)
	org.springframework.data.relational.core.conversion.BasicRelationalConverter.createInstance(BasicRelationalConverter.java:147)
	org.springframework.data.r2dbc.convert.MappingR2dbcConverter.createInstance(MappingR2dbcConverter.java:300)
	org.springframework.data.r2dbc.convert.MappingR2dbcConverter.read(MappingR2dbcConverter.java:121)
	org.springframework.data.r2dbc.convert.MappingR2dbcConverter.read(MappingR2dbcConverter.java:116)
	org.springframework.data.r2dbc.convert.EntityRowMapper.apply(EntityRowMapper.java:46)
...

Apologies if this is actually an issue in the core R2DBC library. I'm happy to re-raise it there if that is the case.

Thanks!

jdmwood commented Sep 9, 2020

OK, digging into this a bit more: the issue appears to be that the Json value is a Json.JsonOutput, which holds a ByteBuf internally and only releases it when you actually read the value.

Slightly weird behaviour, I guess, because if you never read the value it will leak. Maybe this is intentional?

jdmwood commented Sep 9, 2020

One thing I should point out is that even if I change my model to:

@Table
data class MyTable(
    val data: String
)

... the problem still occurs, presumably because the R2DBC driver still decodes the column to Json first.

mp911de commented Sep 9, 2020

Thanks for bringing up that issue.

As per Javadoc of Json:

JSON values should be generally considered for single-consumption only. Output values retain a reference to a potentially pooled memory buffer and must be consumed to avoid memory leaks.

Using Json in a mapped entity is error-prone: the entity keeps a reference to a ByteBuf, which can quickly lead to the memory issues you've reported. The JSON object you get hold of uses a memory-efficient, buffer-backed implementation. I assume this arrangement is pretty common, and it's not obvious where the leak comes from.

Naively mapping Json to a String will likely render something like JsonByteBufInput{ {your JSON here} } into the field.
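
To make the single-consumption contract concrete, here is a minimal sketch against the raw R2DBC SPI (table and column names are hypothetical) that drains the Json value exactly once, at the point where the row is mapped:

import io.r2dbc.postgresql.codec.Json
import io.r2dbc.spi.Connection
import reactor.core.publisher.Flux

// Json.asString() consumes the value and releases the underlying ByteBuf,
// so no pooled buffer stays attached to the mapped result.
fun readData(connection: Connection): Flux<String> =
    Flux.from(connection.createStatement("SELECT data FROM my_table").execute())
        .flatMap { result ->
            result.map { row, _ -> row.get("data", Json::class.java)!!.asString() }
        }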

I see two things to do, of which we can address one directly:

  1. Introduce a Json to byte[]/String converter to consume Json objects that are backed by a ByteBuf. That's something we can do in Spring Data.
  2. Introduce an option in R2DBC Postgres to de-optimize Json consumption so that the driver copies the data from the ByteBuf into a byte[], making memory leaks impossible. That should likely be the default, so that attached buffers become opt-in.

mp911de changed the title from "Memory leak with jsonb data type" to "Introduce converters to consume Postgres Json and convert these to String/byte[]" Sep 9, 2020
mp911de added the type: enhancement label Sep 9, 2020

jdmwood commented Sep 9, 2020

As a temporary workaround (in case this helps others), I made a new type with custom converters which always read the data exactly once into a garbage-collected String:

data class JsonString(
    val json: String
)

@ReadingConverter
class JsonToJsonStringConverter : Converter<Json, JsonString> {
    override fun convert(from: Json): JsonString {
        // asString() consumes the Json value and releases the pooled buffer.
        return JsonString(from.asString())
    }
}

@WritingConverter
class JsonStringToJsonConverter : Converter<JsonString, Json> {
    override fun convert(from: JsonString): Json {
        return Json.of(from.json)
    }
}

// Inside a configuration class extending AbstractR2dbcConfiguration:
@Bean
override fun r2dbcCustomConversions(): R2dbcCustomConversions {
    val converterList = mutableListOf<Converter<*, *>>(
        JsonToJsonStringConverter(),
        JsonStringToJsonConverter()
    )
    return R2dbcCustomConversions(storeConversions, converterList)
}

@Table
data class MyTable(
    val data: JsonString
)

jdmwood commented Sep 9, 2020

Thanks for your reply @mp911de.

For the record, the super confusing thing is that even if I use a String in my mapped entity, I still get the leak. So simply not using Json isn't enough: if my native table uses json or jsonb data types, this problem occurs, which is much less obvious for users.

mp911de commented Sep 9, 2020

Can you file a ticket in https://github.com/pgjdbc/r2dbc-postgresql asking for an optimization switch for JSON? Probably the same would make sense for Clob and Blob, so that the driver returns either detached objects (which should be the default) or attached/memory-optimized ones (as it is now).

mp911de added this to the 1.2 RC1 (2020.0.0) milestone Sep 9, 2020

mp911de added a commit that referenced this issue Sep 10, 2020: "…e to String/byte[]."

We now ship converters for Postgres' Json data type to map JSON to String and to byte[] for easier and safe consumption.

mp911de commented Sep 10, 2020

Spring Data converters are now in place.
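
With those converters registered, the entity from the original report can use a plain String (or byte[]) for the json column on the read side. A minimal sketch, reusing the my_table schema from above:

@Table
data class MyTable(
    // Populated via the new Json -> String converter, which consumes the
    // driver's Json value during conversion, so no pooled buffer leaks.
    val data: String
)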

mp911de closed this as completed Sep 10, 2020

jdmwood commented Sep 10, 2020

Can you file a ticket in https://github.com/pgjdbc/r2dbc-postgresql asking for an optimization switch for JSON? Probably the same would make sense for Clob and Blob, so that the driver returns either detached objects (which should be the default) or attached/memory-optimized ones (as it is now).

@mp911de Sorry, was this targeted at me? Do you still need this?

mp911de commented Sep 10, 2020

I just created pgjdbc/r2dbc-postgresql#330. So everything is done here.

ajeans commented Jan 5, 2021

@mp911de Sorry for commenting on a closed issue, but I am trying to get Postgres json/jsonb columns converted to String rather than io.r2dbc.postgresql.codec.Json.

This fails on insertion with:

2021-01-05 09:23:34.777 DEBUG 77612 --- [tor-tcp-epoll-2] o.s.r2dbc.core.DefaultDatabaseClient     : Executing SQL statement [INSERT INTO onboarding_case (organization, workflow, details) VALUES ($1, $2, $3)]
2021-01-05 09:23:34.795  WARN 77612 --- [tor-tcp-epoll-2] i.r.p.client.ReactorNettyClient          : Error: SEVERITY_LOCALIZED=ERROR, SEVERITY_NON_LOCALIZED=ERROR, CODE=42804, MESSAGE=column "details" is of type jsonb but expression is of type character varying, HINT=You will need to rewrite or cast the expression., POSITION=79, FILE=parse_target.c, LINE=590, ROUTINE=transformAssignedExpr

Looking at commit 4506fb6, it seems that the converters are only implemented on the reading side, not on the writing side, so this is kind of expected.

Is my analysis correct? Should I open another issue on this topic?

mp911de commented Jan 5, 2021

Your analysis is correct.

The main concern is that writing JSON requires JSON-awareness: Postgres must know that the value being written is a JSON value. While JSON is represented as a UTF-8 string under the hood, the value OID (the wire value type) is set to JSON. Alternatively, a Postgres cast ($1::json) can be used to cast a value into JSON. That being said, if you are in full control over the actual SQL statement, then you can apply the cast accordingly.

Converting every String to JSON isn't an option, so the only thing you can do for now is use the Json wrapper type and extract the JSON value as a String via Json.asString().
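
For illustration, both routes might look like this with Spring's DatabaseClient (a sketch only: the onboarding_case table comes from the log above, everything else is assumed):

import io.r2dbc.postgresql.codec.Json
import org.springframework.r2dbc.core.DatabaseClient
import reactor.core.publisher.Mono

// Route 1: keep full control of the SQL and let Postgres cast the bound text.
fun insertViaCast(client: DatabaseClient, detailsJson: String): Mono<Void> =
    client.sql("INSERT INTO onboarding_case (details) VALUES (\$1::jsonb)")
        .bind(0, detailsJson)
        .then()

// Route 2: bind the driver's Json wrapper so the value is sent with the JSON OID.
fun insertViaWrapper(client: DatabaseClient, detailsJson: String): Mono<Void> =
    client.sql("INSERT INTO onboarding_case (details) VALUES (\$1)")
        .bind(0, Json.of(detailsJson))
        .then()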

ajeans commented Jan 5, 2021

@mp911de thanks for the quick reply

Indeed, I tried to add a String -> Json converter on the writing side and promptly broke all String fields mapped to character varying columns ;)
So for now I introduced JsonString as detailed by @jdmwood in #453 (comment) and this works fine.
I'd rather use a standard String, though, but cannot see a clean way to do this. I do have full control of the app, but I am not sure how to add the cast cleanly within spring-data-r2dbc and our repositories based on ReactiveCrudRepository.

mp911de commented Jan 5, 2021

Another alternative would be to introduce a Converter<YourDomainType, OutboundRow> so you take full control over the writing side.
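
Such an entity-level writing converter might look like the sketch below (the entity and column names are taken from the insert statement in the log above for illustration; it assumes the Spring Data R2DBC 1.2 OutboundRow/Parameter API):

import io.r2dbc.postgresql.codec.Json
import org.springframework.core.convert.converter.Converter
import org.springframework.data.convert.WritingConverter
import org.springframework.data.r2dbc.mapping.OutboundRow
import org.springframework.data.relational.core.mapping.Table
import org.springframework.r2dbc.core.Parameter

@Table
data class OnboardingCase(val organization: String, val workflow: String, val details: String)

// Maps the whole entity, so the converter decides per column how each value is written.
@WritingConverter
class OnboardingCaseWriteConverter : Converter<OnboardingCase, OutboundRow> {
    override fun convert(source: OnboardingCase): OutboundRow {
        val row = OutboundRow()
        row.put("organization", Parameter.from(source.organization))
        row.put("workflow", Parameter.from(source.workflow))
        // Wrap the JSON text in the driver's Json type so it is written with the JSON OID.
        row.put("details", Parameter.from(Json.of(source.details)))
        return row
    }
}

The drawback is that such a converter takes over the whole row, so every column has to be mapped by hand.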

There's a Spring Data ticket to introduce property-based converters – right now, we have only type-based converters. That could be a future hook for these customizations.

Taking a step back: for now, forcing a String to be written as a JSON value is highly dialect-specific, and I'm not even sure other databases have similar behavior. So introducing an annotation or the like leaves us with the question of what problem we want to solve and how we want to go about it.

ajeans commented Jan 5, 2021

@mp911de I see what you're getting at...

Converter<YourDomainType, OutboundRow> would indeed be an option, but I use complex model objects with one json column for custom functional properties, so this would definitely be more involved than growing a JsonString type.

I couldn't find the issue for property-based converters, did you mean a generic spring-data ticket or a spring-data-r2dbc one?

Anyway, fair point about forcing all Strings to be written to a one-size-fits-all, DB-specific column. I guess I was imagining some kind of hint alongside @Column to mark a String field as having a "special" type.
I could also imagine mapping dialect-specific converters, but that doesn't seem to be supported yet.

Anyway, in short I now have:

  • Fields that should be saved as JSON are of type JsonString
  • For PG, I have fairly simple converters to do JsonString <-> io.r2dbc.postgresql.codec.Json
  • For H2 (which I use for unit testing), I have even simpler converters to do JsonString <-> String
  • I have approximated custom converters per dialect by looking up the r2dbc properties in getCustomConverters(), as shown in the configuration below:

@Configuration
@EnableR2dbcRepositories
public class GatewayR2dbcConfiguration extends AbstractR2dbcConfiguration {

  @Autowired private R2dbcProperties r2dbcProperties;

  @Bean
  @Override
  public ConnectionFactory connectionFactory() {
    final ConnectionFactoryBuilder factoryBuilder =
        ConnectionFactoryBuilder.of(r2dbcProperties, null);
    return factoryBuilder.build();
  }

  @Override
  protected List<Object> getCustomConverters() {
    // URGH
    if (r2dbcProperties.getUrl().contains("postgres")) {
      return List.of(
          PgConverters.StringToJsonStringConverter.INSTANCE,
          PgConverters.JsonStringToStringConverter.INSTANCE);
    } else {
      return List.of(
          H2Converters.StringToJsonStringConverter.INSTANCE,
          H2Converters.JsonStringToStringConverter.INSTANCE);
    }
  }
}
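
For completeness, the dialect-specific pairs described in the list above might look like this (a sketch only: it reuses the JsonString type from the workaround earlier in the thread, the enum-style INSTANCE singletons from the config are approximated with Kotlin objects, and the names and bodies are assumptions):

import io.r2dbc.postgresql.codec.Json
import org.springframework.core.convert.converter.Converter
import org.springframework.data.convert.ReadingConverter
import org.springframework.data.convert.WritingConverter

// Postgres side: bridge JsonString to the driver's Json type.
@ReadingConverter
object PgJsonToJsonStringConverter : Converter<Json, JsonString> {
    override fun convert(source: Json): JsonString = JsonString(source.asString())
}

@WritingConverter
object PgJsonStringToJsonConverter : Converter<JsonString, Json> {
    override fun convert(source: JsonString): Json = Json.of(source.json)
}

// H2 side: JSON columns are plain strings, so the bridge is trivial.
@ReadingConverter
object H2StringToJsonStringConverter : Converter<String, JsonString> {
    override fun convert(source: String): JsonString = JsonString(source)
}

@WritingConverter
object H2JsonStringToStringConverter : Converter<JsonString, String> {
    override fun convert(source: JsonString): String = source.json
}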

Thanks again for your thoughts
