
Memory improvements in FastJsonNode (#5088)
Fixes #5124, DGRAPH-1170

While converting a subgraph to a JSON response, an intermediate data structure called the
fastJsonNode tree is formed. We have observed that when the response is large (especially in
recurse queries), this data structure itself can occupy a lot of memory, leading to OOM in some
cases.
This PR aims to reduce the space occupied by the fastJsonNode tree. The tree is an n-ary
tree in which each fastJsonNode maintains some metadata and a list of its children. This PR
reduces the space occupied by each node in the following ways:

- For each response, a separate data structure called encoder is created, which is responsible
for maintaining the metadata of all fastJsonNodes.
- encoder has a metaSlice and a childrenMap, where the metadata and children lists of all
fastJsonNodes are kept. The index at which a fastJsonNode's metadata lives in metaSlice becomes
the node's value, so the fastJsonNode type is just a uint32.
- The metadata for a fastJsonNode (stored at index int(fastJsonNode) in metaSlice) is a uint64
that packs all of the node's info: the most significant bit stores the List flag, bytes 7-6
store the attr id, and bytes 4 to 1 store the arena offset (explained below).
- encoder has an attrMap, which maps each predicate to a unique uint16 id.
- encoder also has an arena. The arena is a large []byte that stores the bytes for each leaf
node; their offsets are stored in the fastJsonNode metadata. The arena stores a given []byte
only once, keeping a memhash([]byte) => offset map for deduplication.
With this change, I am able to run some queries that were resulting in OOM on current master.
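The uint64 metadata layout described above can be sketched as follows. This is an illustrative sketch, not the PR's actual constants or function names: the shift/mask values follow the byte positions stated above (MSB for the List flag, bytes 7-6 for the attr id, bytes 4-1 for the arena offset), but `packMeta`/`unpackMeta` are hypothetical helpers.

```go
package main

import "fmt"

// Hypothetical constants matching the layout described above: bit 63 holds
// the List flag, bits 40-55 (bytes 6-7) hold the attr id, and bits 0-31
// (bytes 1-4) hold the arena offset.
const (
	listBit    = uint64(1) << 63
	attrShift  = 40
	attrMask   = uint64(0xFFFF) << attrShift
	offsetMask = uint64(0xFFFFFFFF)
)

// packMeta packs the three fields into a single uint64.
func packMeta(list bool, attr uint16, offset uint32) uint64 {
	m := uint64(attr)<<attrShift | uint64(offset)
	if list {
		m |= listBit
	}
	return m
}

// unpackMeta extracts the three fields back out of the uint64.
func unpackMeta(m uint64) (list bool, attr uint16, offset uint32) {
	return m&listBit != 0, uint16((m & attrMask) >> attrShift), uint32(m & offsetMask)
}

func main() {
	m := packMeta(true, 7, 1024)
	l, a, o := unpackMeta(m)
	fmt.Println(l, a, o) // true 7 1024
}
```

Packing all per-node state into one uint64 per node (plus shared slices in the encoder) is what shrinks each fastJsonNode down to a uint32 index.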

Profiles for a query whose RSS usage was around 30GB.
Master profile for the query:

File: dgraph
Build ID: 4009644c7dfb41957358d88f228b977b2fb552c7
Type: inuse_space
Time: Apr 11, 2020 at 10:31pm (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 26.74GB, 98.87% of 27.05GB total
Dropped 133 nodes (cum <= 0.14GB)
Showing top 10 nodes out of 42
      flat  flat%   sum%        cum   cum%
   16.62GB 61.43% 61.43%    16.62GB 61.43%  github.com/dgraph-io/dgraph/query.makeScalarNode
    4.23GB 15.63% 77.06%     4.23GB 15.63%  github.com/dgraph-io/dgraph/query.(*fastJsonNode).New
    2.03GB  7.51% 84.57%     2.03GB  7.51%  github.com/dgraph-io/dgraph/query.stringJsonMarshal
    1.64GB  6.08% 90.65%    15.99GB 59.10%  github.com/dgraph-io/dgraph/query.(*fastJsonNode).AddListValue
    0.88GB  3.25% 93.90%     5.24GB 19.36%  github.com/dgraph-io/dgraph/query.(*fastJsonNode).SetUID
    0.44GB  1.61% 95.52%     0.44GB  1.61%  github.com/dgraph-io/dgraph/query.(*fastJsonNode).AddListChild
    0.36GB  1.32% 96.84%     0.39GB  1.45%  github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings.func1
    0.25GB  0.92% 97.76%     0.25GB  0.92%  github.com/dgraph-io/ristretto.newCmRow
    0.16GB   0.6% 98.36%     0.16GB   0.6%  github.com/dgraph-io/badger/v2/skl.newArena
    0.14GB  0.51% 98.87%     0.14GB  0.51%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
Profile with this PR:

File: dgraph
Build ID: 8fd737a95d4edf3ffb305638081766fd0044e99d
Type: inuse_space
Time: Apr 15, 2020 at 11:20am (IST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 15557.02MB, 98.53% of 15789.61MB total
Dropped 168 nodes (cum <= 78.95MB)
Showing top 10 nodes out of 60
      flat  flat%   sum%        cum   cum%
 6341.11MB 40.16% 40.16%  6341.11MB 40.16%  github.com/dgraph-io/dgraph/query.(*encoder).appendAttrs
 4598.84MB 29.13% 69.29%  4598.84MB 29.13%  bytes.makeSlice
 3591.16MB 22.74% 92.03%  3591.16MB 22.74%  github.com/dgraph-io/dgraph/query.(*encoder).newNode
  365.52MB  2.31% 94.34%   408.54MB  2.59%  github.com/dgraph-io/dgraph/worker.(*queryState).handleValuePostings.func1
     256MB  1.62% 95.97%      256MB  1.62%  github.com/dgraph-io/ristretto.newCmRow
  166.41MB  1.05% 97.02%   166.41MB  1.05%  github.com/dgraph-io/badger/v2/skl.newArena
  140.25MB  0.89% 97.91%   140.25MB  0.89%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
   91.12MB  0.58% 98.48%    98.05MB  0.62%  github.com/dgraph-io/dgraph/posting.(*List).Uids
       6MB 0.038% 98.52%   122.23MB  0.77%  github.com/dgraph-io/dgraph/worker.(*queryState).handleUidPostings.func1
    0.64MB 0.004% 98.53%   387.17MB  2.45%  github.com/dgraph-io/ristretto.NewCache
ashish-goswami authored Apr 20, 2020
1 parent 17a9c79 commit d3a4305
Showing 4 changed files with 712 additions and 202 deletions.
111 changes: 111 additions & 0 deletions query/arena.go
@@ -0,0 +1,111 @@
/*
* Copyright 2017-2020 Dgraph Labs, Inc. and Contributors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package query

import (
"encoding/binary"
"errors"
"fmt"
"math"
"sync"

"github.com/dgraph-io/ristretto/z"
)

const maxArenaSize = int64(math.MaxUint32)

var (
errInvalidOffset = errors.New("arena get performed with invalid offset")

arenaPool = sync.Pool{
New: func() interface{} {
a := newArena(1 * 1024)
return a
},
}
)

// arena can be used to store []byte values. It has one large underlying buffer ([]byte), and
// every []byte stored in the arena is appended to this buffer. As a further optimization, the
// arena also keeps a memhash([]byte) => offset map, which ensures the same []byte is put into
// the arena only once.
// For now, the max size of the underlying buffer is limited to math.MaxUint32.
type arena struct {
buf []byte
offsetMap map[uint64]uint32
}

// newArena returns an arena with initial capacity size.
func newArena(size int) *arena {
// Start offsets from 1, to avoid reading bytes when a fastJsonNode's offset field holds the
// default value (0). Hence append a dummy byte.
buf := make([]byte, 0, size)
return &arena{
buf: append(buf, []byte("a")...),
offsetMap: make(map[uint64]uint32),
}
}

// put stores b in the arena and returns its offset. The returned offset is always > 0 if
// there is no error.
// Note: for now this function can only put buffers such that:
// len(current arena buf) + varint(len(b)) + len(b) <= math.MaxUint32.
func (a *arena) put(b []byte) (uint32, error) {
// Check if we already have b.
fp := z.MemHash(b)
if co, ok := a.offsetMap[fp]; ok {
return co, nil
}
// First put the length of the buffer (varint encoded), then the actual buffer.
var sizeBuf [binary.MaxVarintLen64]byte
w := binary.PutVarint(sizeBuf[:], int64(len(b)))
offset := len(a.buf)
if int64(len(a.buf)+w+len(b)) > maxArenaSize {
msg := fmt.Sprintf("errNotEnoughSpaceArena, curSize: %d, maxSize: %d, bufSize: %d",
len(a.buf), maxArenaSize, w+len(b))
return 0, errors.New(msg)
}

a.buf = append(a.buf, sizeBuf[:w]...)
a.buf = append(a.buf, b...)

a.offsetMap[fp] = uint32(offset) // Store offset in map.
return uint32(offset), nil
}

func (a *arena) get(offset uint32) ([]byte, error) {
// Offset 0 holds only the dummy byte.
if offset == 0 {
return nil, nil
}

if int64(offset) >= int64(len(a.buf)) {
return nil, errInvalidOffset
}

// First read length, then read actual buffer.
size, r := binary.Varint(a.buf[offset:])
offset += uint32(r)
return a.buf[offset : offset+uint32(size)], nil
}

func (a *arena) reset() {
a.buf = a.buf[:1]

for k := range a.offsetMap {
delete(a.offsetMap, k)
}
}
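The deduplication behavior of the arena above can be demonstrated with a small standalone sketch. `miniArena` is not the PR's type: it re-implements put/get with stdlib FNV hashing in place of ristretto's z.MemHash so it runs without external dependencies, but the varint-length-prefix scheme and the fingerprint => offset map mirror the code above.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// miniArena is a simplified, self-contained version of the arena above.
type miniArena struct {
	buf       []byte
	offsetMap map[uint64]uint32
}

func newMiniArena() *miniArena {
	// Dummy first byte so that offset 0 can mean "no value", as in the PR.
	return &miniArena{buf: []byte{'a'}, offsetMap: map[uint64]uint32{}}
}

// put appends a varint length prefix followed by b, deduplicating by hash.
func (a *miniArena) put(b []byte) uint32 {
	h := fnv.New64a() // stdlib stand-in for z.MemHash
	h.Write(b)
	fp := h.Sum64()
	if off, ok := a.offsetMap[fp]; ok {
		return off
	}
	var size [binary.MaxVarintLen64]byte
	w := binary.PutVarint(size[:], int64(len(b)))
	off := uint32(len(a.buf))
	a.buf = append(a.buf, size[:w]...)
	a.buf = append(a.buf, b...)
	a.offsetMap[fp] = off
	return off
}

// get reads the varint length at off, then returns that many bytes.
func (a *miniArena) get(off uint32) []byte {
	size, r := binary.Varint(a.buf[off:])
	off += uint32(r)
	return a.buf[off : off+uint32(size)]
}

func main() {
	a := newMiniArena()
	o1 := a.put([]byte("alice"))
	o2 := a.put([]byte("alice")) // deduplicated: same offset as o1
	fmt.Println(o1 == o2, string(a.get(o1)))
}
```

Because repeated leaf values (common in recurse queries) collapse to a single stored copy, the arena saves memory beyond what the compact node metadata alone provides.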
77 changes: 77 additions & 0 deletions query/fastjson_test.go
@@ -1,11 +1,14 @@
package query

import (
"bytes"
"math"
"testing"

"github.com/dgraph-io/dgraph/protos/pb"
"github.com/dgraph-io/dgraph/task"
"github.com/dgraph-io/dgraph/testutil"
"github.com/dgraph-io/dgraph/types"
"github.com/stretchr/testify/require"
)

@@ -58,3 +61,77 @@ func TestSubgraphToFastJSON(t *testing.T) {
assertJSON(t, `{"query":[]}`, subgraphWithSingleResultAndSingleValue(task.FromFloat(math.Inf(1))))
})
}

func TestEncode(t *testing.T) {
enc := newEncoder()

t.Run("with uid list predicate", func(t *testing.T) {
root := enc.newNode()
friendNode1 := enc.newNodeWithAttr(enc.idForAttr("friend"))
enc.AddValue(friendNode1, enc.idForAttr("name"),
types.Val{Tid: types.StringID, Value: "alice"})
friendNode2 := enc.newNodeWithAttr(enc.idForAttr("friend"))
enc.AddValue(friendNode2, enc.idForAttr("name"),
types.Val{Tid: types.StringID, Value: "bob"})

enc.AddListChild(root, enc.idForAttr("friend"), friendNode1)
enc.AddListChild(root, enc.idForAttr("friend"), friendNode2)

buf := new(bytes.Buffer)
require.NoError(t, enc.encode(root, buf))
testutil.CompareJSON(t, `
{
"friend":[
{
"name":"alice"
},
{
"name":"bob"
}
]
}
`, buf.String())
})

t.Run("with value list predicate", func(t *testing.T) {
root := enc.newNode()
enc.AddValue(root, enc.idForAttr("name"),
types.Val{Tid: types.StringID, Value: "alice"})
enc.AddValue(root, enc.idForAttr("name"),
types.Val{Tid: types.StringID, Value: "bob"})

buf := new(bytes.Buffer)
require.NoError(t, enc.encode(root, buf))
testutil.CompareJSON(t, `
{
"name":[
"alice",
"bob"
]
}
`, buf.String())
})

t.Run("with uid predicate", func(t *testing.T) {
root := enc.newNode()

person := enc.newNode()
enc.AddValue(person, enc.idForAttr("name"), types.Val{Tid: types.StringID, Value: "alice"})
enc.AddValue(person, enc.idForAttr("age"), types.Val{Tid: types.IntID, Value: 25})

enc.AddListChild(root, enc.idForAttr("person"), person)

buf := new(bytes.Buffer)
require.NoError(t, enc.encode(root, buf))
testutil.CompareJSON(t, `
{
"person":[
{
"name":"alice",
"age":25
}
]
}
`, buf.String())
})
}
