[Java] Document how to convert JDBC Adapter result into a Parquet file #316
Conversation
To close #315.
Thanks for adding more examples!
java/source/dataset.rst (Outdated)
===================

Go to :doc:`JDBC Adapter - Write ResultSet to Parquet File <jdbc>` for an example.
Ideally we would have a parquet example here that doesn't include things like JDBC. Do you think it would be best to add it as part of this PR?
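For reference, a JDBC-free read could look roughly like the following. This is a minimal sketch assuming the Dataset API; the file path is illustrative, and ScanOptions/scanBatches usage assumes a recent Arrow Java release.

import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;

public class ReadParquet {
  public static void main(String[] args) throws Exception {
    // Illustrative path; dataset URIs need an explicit scheme such as file://.
    String uri = "file:///tmp/example.parquet";
    ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
    try (BufferAllocator allocator = new RootAllocator();
         DatasetFactory factory = new FileSystemDatasetFactory(
             allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
         Dataset dataset = factory.finish();
         Scanner scanner = dataset.newScan(options);
         ArrowReader reader = scanner.scanBatches()) {
      while (reader.loadNextBatch()) {
        // The reader owns the root, so we do not close it per batch.
        VectorSchemaRoot root = reader.getVectorSchemaRoot();
        System.out.println(root.contentToTSVString());
      }
    }
  }
}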
protected Schema readSchema() {
    return schema;
  }
}
Could we add whitespace to the code below so it's organized into sections? I think it will be easier to read.
java/source/jdbc.rst (Outdated)
Write ResultSet to Parquet File
===============================

In this example, we take the JDBC adapter result and write it into a Parquet file.
Hmm, was this specific example requested? I think it would be better to include a minimal read/write parquet example in dataset.rst and remove this one. The jdbc.rst already has an example for converting ResultSet to VectorSchemaRoot. What do you think?
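For readers of this thread, the existing ResultSet-to-VectorSchemaRoot conversion mentioned here is roughly along these lines. A minimal sketch, not the cookbook's exact code; the H2 connection string matches the one used in this PR, while the query is illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
import org.apache.arrow.adapter.jdbc.JdbcToArrow;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;

public class ResultSetToVectorSchemaRoot {
  public static void main(String[] args) throws Exception {
    try (BufferAllocator allocator = new RootAllocator();
         Connection connection =
             DriverManager.getConnection("jdbc:h2:mem:h2-jdbc-adapter")) {
      ResultSet resultSet = connection.createStatement()
          .executeQuery("SELECT 1 AS ID, 'one' AS NAME");
      // Each iteration yields one Arrow batch converted from the ResultSet.
      try (ArrowVectorIterator it =
               JdbcToArrow.sqlToArrowVectorIterator(resultSet, allocator)) {
        while (it.hasNext()) {
          // With default config each batch is a fresh root the caller closes.
          try (VectorSchemaRoot root = it.next()) {
            System.out.println(root.contentToTSVString());
          }
        }
      }
    }
  }
}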
Ah I see the related issue now. I think it would be best if we had "read/write parquet" examples in dataset.rst and then added a very minimal example of why/how to extend the ArrowReader class for JDBC. What do you think?
That makes sense, let me also divide it.
Hi @lidavidm, do you have any recommendations on where I could start searching/reviewing for this issue? Did you see this error when you were working on it? Current error messages:

- Closed with outstanding buffers allocated
- RefCnt has gone negative
Have you isolated the problem? Looked at it in a debugger? Enabled allocation tracing?
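For reference, one way to enable allocation tracing in Arrow Java is the debug-allocator system property. The property name below is an assumption from memory, not something stated in this thread, and it must be set before any Arrow allocator class is loaded.

// Assumed switch: enables Arrow's debug allocator, which records allocation
// call sites so leaked buffers can be traced back to their origin.
// Equivalent to passing -Darrow.memory.debug.allocator=true on the JVM command line.
System.setProperty("arrow.memory.debug.allocator", "true");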
Hi @danepitkin, the changes were added as requested.
Nice work! I left a couple more comments. Let me know what you think.
What are those comments?
java/source/dataset.rst (Outdated)
Write Parquet Files
Can we move this to io.rst? That's where "Read parquet" is.
> Can we move this to io.rst? That's where "Read parquet" is.
Currently, io.rst redirects to dataset.rst for reading Parquet. What about adding "write Parquet" to io.rst and also redirecting to dataset.rst for it?
Hmm, I think it's actually better to put "Write Parquet" examples in io.rst. The dataset.rst examples are primarily for querying (reading) data.
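For context, writing Parquet via the Dataset API can look roughly like this. A sketch only: it reads the demo IPC file used elsewhere in this PR, and it assumes a recent Arrow Java where DatasetFileWriter.write takes the allocator as its first argument; the output URI is illustrative.

import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import org.apache.arrow.vector.ipc.SeekableReadChannel;
import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;

public class WriteParquet {
  public static void main(String[] args) throws Exception {
    try (BufferAllocator allocator = new RootAllocator()) {
      // Demo Arrow IPC file referenced elsewhere in this PR.
      byte[] ipcBytes = Files.readAllBytes(
          Paths.get("./thirdpartydeps/arrowfiles/random_access.arrow"));
      try (ArrowFileReader reader = new ArrowFileReader(
               new SeekableReadChannel(
                   new ByteArrayReadableSeekableByteChannel(ipcBytes)),
               allocator)) {
        // The destination must be a URI with a scheme, not a bare path.
        DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET,
            "file:///tmp/parquet-write-demo");
      }
    }
  }
}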
Changed
java/source/io.rst (Outdated)
@@ -579,3 +579,95 @@ Reading and writing dictionary-encoded data requires separately tracking the dic
Dictionary-encoded data recovered: [0, 3, 4, 5, 7]
Dictionary recovered: Dictionary DictionaryEncoding[id=666,ordered=false,indexType=Int(8, true)] [Andorra, Cuba, Grecia, Guinea, Islandia, Malta, Tailandia, Uganda, Yemen, Zambia]
Decoded data: [Andorra, Guinea, Islandia, Malta, Uganda]

Customize Logic to Read Dataset
Can we move this to jdbc.rst? I think it fits better there since it's directly applicable.
I kept just the steps needed to implement a data reader, and referenced the jdbc page as an example.
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Could we somehow delete the duplicate code here and reuse the other one? Or combine the two?
Only the jdbc page maintains this demo example now.
Awesome!
I forgot to hit "Submit Review" 😅 sorry!
I would appreciate your help with a new code review, @danepitkin.
Overall LGTM! Thank you @davisusanibar
import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;

// read arrow demo data
Path uriRead = Paths.get("./thirdpartydeps/arrowfiles/random_access.arrow");
Should we add a comment describing what's in this file? Looks like it's three row groups of 3 rows each based on the output.
Added
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Awesome!
  }
}

((Logger) LoggerFactory.getLogger("org.apache.arrow")).setLevel(Level.TRACE);
Why are we fiddling with loggers and adding logback to the example? I don't think we need any of that?
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

class JDBCReader extends ArrowReader {
Explain that we need this because writing a dataset takes an ArrowReader, so we have to adapt the JDBC ArrowVectorIterator to the ArrowReader interface
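To make that concrete for readers of this thread, such an adapter can look roughly like the following. This is a sketch reconstructed from the snippets quoted in this review, not the cookbook's final code; the constructor and loadNextBatch details are illustrative and assume setReuseVectorSchemaRoot(true).

import java.io.IOException;

import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.types.pojo.Schema;

// Adapts the JDBC adapter's ArrowVectorIterator to the ArrowReader
// interface that the dataset writer consumes.
class JDBCReader extends ArrowReader {
  private final ArrowVectorIterator iter;
  private VectorSchemaRoot root;
  private boolean firstLoad = true;

  JDBCReader(BufferAllocator allocator, ArrowVectorIterator iter) {
    super(allocator);
    this.iter = iter;
  }

  @Override
  public boolean loadNextBatch() throws IOException {
    if (firstLoad) {
      // getVectorSchemaRoot() below pulls the first batch lazily, and the
      // writer may call it before the first loadNextBatch().
      firstLoad = false;
      return getVectorSchemaRoot() != null;
    }
    if (iter.hasNext()) {
      root = iter.next(); // Same root instance when reuse is enabled.
      return root.getRowCount() != 0;
    }
    return false;
  }

  @Override
  public long bytesRead() {
    return 0; // Not meaningful for an in-memory iterator.
  }

  @Override
  protected void closeReadSource() throws IOException {
    // The iterator and its underlying ResultSet are closed by the caller.
  }

  @Override
  protected Schema readSchema() throws IOException {
    return null; // The schema is exposed through the root instead.
  }

  @Override
  public VectorSchemaRoot getVectorSchemaRoot() throws IOException {
    if (root == null) {
      root = iter.next();
    }
    return root;
  }
}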
JdbcToArrowUtils.getUtcCalendar())
    .setTargetBatchSize(2)
    .setReuseVectorSchemaRoot(true)
    .setArraySubTypeByColumnNameMap(
In the interest of keeping examples concise, let's use sample data that doesn't require us to deal with all of this in the first place.
Hi David,

When I try to run JDBCReader I get "URI has empty scheme":

java.lang.RuntimeException: URI has empty scheme: '/tmp
    at org.apache.arrow.dataset.file.JniWrapper.writeFromScannerToFile(Native Method)
    at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:46)
    at org.apache.arrow.dataset.file.DatasetFileWriter.write(DatasetFileWriter.java:59)

Any idea what could be causing this?

Regards,
GP
Hi @pronzato, this project also uses the JDBC reader: https://github.com/davisusanibar/java-python-by-cdata.git. Could you please try using that and confirm whether it also fails?
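For context on the error reported above: the stack trace points at the destination argument of DatasetFileWriter.write, which expects a URI with an explicit scheme rather than a bare filesystem path. A minimal sketch of the likely fix; the helper and paths are illustrative, not from this thread.

import org.apache.arrow.dataset.file.DatasetFileWriter;
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;

public class UriSchemeFix {
  // 'reader' stands in for any ArrowReader, e.g. the JDBCReader adapter above.
  static void writeParquet(BufferAllocator allocator, ArrowReader reader) {
    // A bare path such as "/tmp/out" fails with "URI has empty scheme";
    // a file:// URI carries the scheme the native writer requires.
    DatasetFileWriter.write(allocator, reader, FileFormat.PARQUET, "file:///tmp/out");
  }
}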