
Broken Decoding of nil map value in map #208

Closed
ajwerner opened this issue Oct 5, 2017 · 2 comments
Comments


ajwerner commented Oct 5, 2017

Commit 54210f4 seems to have broken decoding of nil map values: they are now omitted.

The below test demonstrates this behavior. In the prior commit it passes.

// Note: this test sits in package codec (github.com/ugorji/go/codec),
// alongside the existing tests, and needs the "bytes", "reflect" and
// "testing" imports.
func TestMsgpackDecodeNilMapValue(t *testing.T) {
        type Struct struct {
                Field map[uint16]map[uint32]struct{}
        }
        toEncode := Struct{Field: map[uint16]map[uint32]struct{}{
                1: nil,
        }}
        handle := &MsgpackHandle{}
        handle.MapType = reflect.TypeOf(map[interface{}]interface{}(nil))
        buf := bytes.NewBuffer(nil)
        enc := NewEncoder(buf, handle)
        if err := enc.Encode(toEncode); err != nil {
                t.Fatalf("Failed to encode struct: %v", err)
        }
        dec := NewDecoder(bytes.NewReader(buf.Bytes()), handle)
        var decoded Struct
        if err := dec.Decode(&decoded); err != nil {
                t.Fatalf("Failed to decode struct: %v", err)
        }
        if !reflect.DeepEqual(decoded, toEncode) {
                t.Fatalf("Decoded value %#v != %#v", decoded, toEncode)
        }
}

ajwerner commented Oct 5, 2017

https://github.com/ugorji/go/blob/master/codec/codec_test.go#L1753

seems like the relevant testing TODO

ajwerner referenced this issue Oct 5, 2017
Along with these changes, we support go 1.4+ across the board, and test across
the last 3 go releases (go 1.7+) and tip before each github push.

The changes articulated below are contained in this push:

expand support for careful contained use of unsafe for performance

    This mostly involved creating unsafe and safe versions of a few functions, and
    using them across the codebase.

    - []byte-->string or string-->[]byte when we know there is no write done.
      e.g. map lookup, looking up a struct field given a field name, etc
    - rt2id(...): getting a unique id for a type
    - rv2i(...):  getting an interface from a reflect.Value in our narrow context for encoding/decoding
    - ptrToRvMap: caching the reflect.Value obtained for the same pointer
    - using atomic pointers to reduce sync.RWMutex contention (lock-free model)
    - fast conversion from reflect.Value to a pointer when we know exactly what the type is.

    Note that we only support unsafe for the 3 most recent go releases.
    This is because use of unsafe requires intimate knowledge of implementation details.
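As an illustration, here is a minimal sketch of the safe/unsafe split for the zero-copy []byte-->string conversion described above. The function names are illustrative, not the package's actual internals; the unsafe version relies on the string and slice headers sharing their first two words.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringView returns a string sharing the backing array of b, with no copy.
// Only valid while b is not mutated and the string does not outlive b,
// e.g. for a transient map lookup. (Illustrative name.)
func stringView(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return *(*string)(unsafe.Pointer(&b))
}

// stringViewSafe is the fallback selected by the "safe"/"appengine" build
// tags: a plain allocating conversion.
func stringViewSafe(b []byte) string {
	return string(b)
}

func main() {
	fields := map[string]int{"Field": 1}
	key := []byte("Field")
	// Look up a struct field by name without allocating a string for the key.
	fmt.Println(fields[stringView(key)], fields[stringViewSafe(key)])
}
```

The build-tag split means the unsafe path is the default, while `-tags safe` swaps in the allocating version.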

Simplify codebase during decode, making things consistent and improving performance

    - Use a single conditional to remove duplicate code for decoding into maps, arrays, slices and chans
    - Make changes across: reflection, fast-path and codecgen modes

    Litany of changes
    - prepare safe and unsafe variants for *decFnInfo methods
    - clean up struct for decoding into structs and collections (maps, slices, arrays, chans)
      so they are consistent in the reflection path.
      Previously, I had separate paths for length-prefix vs separator-based collections.
    - Update fast-path and codecgen template generation scripts to make all code mirror themselves
    - Apply some optimizations along the way (specifically for decode)
    - removed some unnecessary reflection calls in kMap
    - add some CanSet() checks and reuse reflect.Value (if immutablekind) during decode

use safe and appengine tags to judiciously use "unsafe" by default for performance

    The tags are set up this way so that a user must explicitly pass
    one of the following tags to use the standard "slower" APIs.

json: use lookup table for safe characters to improve performance

    The previous check for characters inside a JSON string that needed
    to be escaped performed seven different boolean comparisons before
    determining that an ASCII character did not need to be escaped. Most
    characters do not need to be escaped, so this check can *typically* be
    done with a single, simple memory table lookup.

    We use [128]bool to represent safe and unsafe characters, and look up
    there instead of doing a computation.
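A hedged sketch of such a table (names and the exact set membership are illustrative; the real table lives inside the json handle):

```go
package main

import "fmt"

// jsonSafeSet marks ASCII bytes that may appear unescaped inside a JSON
// string: 0x20 and above, except '"' and '\\'. Illustrative version of the
// [128]bool table described above.
var jsonSafeSet [128]bool

func init() {
	for c := 0x20; c < 0x80; c++ {
		jsonSafeSet[c] = c != '"' && c != '\\'
	}
}

// needsEscape answers with one table lookup instead of several comparisons.
// Bytes >= 0x80 belong to multi-byte UTF-8 sequences and pass through.
func needsEscape(c byte) bool {
	return c < 0x80 && !jsonSafeSet[c]
}

func main() {
	fmt.Println(needsEscape('a'), needsEscape('"'), needsEscape('\n'))
	// → false true true
}
```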

use sync/atomic to lookup typeinfo without synchronization

    Also, move from using a map to doing a binary search when looking
    up typeInfo. This should arguably be faster.
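For illustration, a minimal sketch of a binary-search lookup over a slice kept sorted by type id (type and function names here are hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// typeInfo is a stand-in for the per-type metadata cached by the codec.
type typeInfo struct {
	rtid uintptr // unique id for the reflect type
	name string
}

// findTypeInfo looks up the entry for rtid in a slice sorted by rtid,
// using binary search instead of a map.
func findTypeInfo(infos []typeInfo, rtid uintptr) (*typeInfo, bool) {
	i := sort.Search(len(infos), func(i int) bool { return infos[i].rtid >= rtid })
	if i < len(infos) && infos[i].rtid == rtid {
		return &infos[i], true
	}
	return nil, false
}

func main() {
	infos := []typeInfo{{10, "map"}, {20, "slice"}, {30, "struct"}}
	if ti, ok := findTypeInfo(infos, 20); ok {
		fmt.Println(ti.name) // slice
	}
}
```

Combined with an atomic pointer to the sorted slice, reads need no lock at all; writers copy, insert, and swap the pointer.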

optimize for map with string keys to avoid allocation if possible

re-introduce decDriver.DecodeStringAsBytes

    This reduces allocation significantly and is clearer.

    DecodeString and DecodeStringAsBytes are 2 versions of the same code.

add native helper functions for performant json parsing
improved json interaction with codec package for better performance.

    One way to improve json performance is to bypass the multiple readn1
    calls in a tight loop, and just have a single interface call to do the
    combo functionality.

    json does some things a lot during decoding:
    - skip whitespace characters
    - construct numbers
    - read strings

    doing these one character at a time incurs interface indirection and indirect
    function call overhead, as it all goes through the decDriver interface to an
    implementation like *bytesDecDriver.

    We can make some major gains by exposing "combo-like" methods that expose this functionality
    as a single call (as opposed to calls that do one byte at a time).
        // skip will skip any byte that matches, and return the first non-matching byte
        skip(accept *[256]bool) (token byte)
        // readTo will read any byte that matches, stopping once no longer matching.
        readTo(in []byte, accept *[256]bool) (out []byte)
        // readUntil will read, only stopping once it matches the 'stop' byte.
        readUntil(in []byte, stop byte) (out []byte)

    Currently, only the json handle uses these methods, giving it much better performance.
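A standalone sketch of these combo calls over a []byte input (the real methods live on the decoder's reader; these free functions are only illustrative):

```go
package main

import "fmt"

// whitespace is an accept set in the [256]bool style used above.
var whitespace = func() (a [256]bool) {
	for _, c := range []byte(" \t\r\n") {
		a[c] = true
	}
	return
}()

// skip consumes bytes in accept and returns the first non-matching byte
// plus the remaining input (starting at that byte).
func skip(in []byte, accept *[256]bool) (token byte, rest []byte) {
	for i, c := range in {
		if !accept[c] {
			return c, in[i:]
		}
	}
	return 0, nil
}

// readTo consumes bytes while they match accept.
func readTo(in []byte, accept *[256]bool) (out, rest []byte) {
	i := 0
	for i < len(in) && accept[in[i]] {
		i++
	}
	return in[:i], in[i:]
}

// readUntil consumes bytes up to (but not including) the stop byte.
func readUntil(in []byte, stop byte) (out, rest []byte) {
	for i, c := range in {
		if c == stop {
			return in[:i], in[i:]
		}
	}
	return in, nil
}

func main() {
	tok, rest := skip([]byte("  \t: 42"), &whitespace)
	fmt.Printf("%q %q\n", tok, rest)
	out, _ := readUntil([]byte(`abc"def`), '"')
	fmt.Printf("%q\n", out)
}
```

Each call processes a whole run of bytes behind one interface dispatch, which is where the gain over per-byte readn1 calls comes from.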

    In addition, since we can just request a block of text and work on it
    directly, we might as well remove the logic that does strconv.ParseFloat
    at iterator time. Instead, we just grab the block of text representing a
    number, and pass that to the strconv.ParseXXX functions.

    This resulted in the removal of all the string->number logic, which is A-ok.
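A sketch of that shape, assuming a readTo-style scan over the bytes of a number followed by a single strconv call (names here are illustrative):

```go
package main

import (
	"fmt"
	"strconv"
)

// numberChars marks bytes that can appear in a JSON number literal.
var numberChars = func() (a [256]bool) {
	for _, c := range []byte("0123456789+-.eE") {
		a[c] = true
	}
	return
}()

// readFloat grabs the full run of number characters in one pass, then hands
// the text to strconv.ParseFloat, instead of building the number digit by
// digit at iterator time.
func readFloat(in []byte) (f float64, rest []byte, err error) {
	i := 0
	for i < len(in) && numberChars[in[i]] {
		i++
	}
	f, err = strconv.ParseFloat(string(in[:i]), 64)
	return f, in[i:], err
}

func main() {
	f, rest, err := readFloat([]byte("-12.5e2,true"))
	fmt.Println(f, string(rest), err)
}
```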


    We saw performance gains when using [256]bool arrays in place of multiple
    comparisons. We continue to leverage that pattern to get some performance
    improvements without the need to pass callback functions for matching
    bytes. This is good because the callbacks were costing us over 10% in
    performance.

    Performance gains are very good now over the last few code iterations.

re-packaged all language-specific code in appropriately named files

json: add PreferFloat option to default to decoding numbers as float

test suite and organization

    We now use test suites to define and run our tests, making
    most of what tests.sh and bench.sh did unnecessary.

    We now drive scripts from run.sh, and those just call
    go test ... for running tests and benchmarks.

    This required some re-factoring in the tests to make them all work.

    organize tests so they depend on helper functions, allowing for many suites.

    This affords XSuite, CodecSuite and AllSuite.

    It also allows us to separate the tests from benchmarks, and further
    separate benchmarks into codec+stdlib only, and external ones.

codecgen: remove unsafe - delegating to codec's safe/unsafe instead.

    codecgen will now use whatever tag (safe or unsafe) the codec package is built with.

    To do this, expose stringView to codecgen (as genHelperDecoder.StringView(...))
    and use that within the generated code.

conditional build: to support go1.4+

    support go1.4+ and use goversion_* to conditionally compile things based on go version

codecgen: gen.go is only used by codecgen execution, behind build tag "codecgen.exec"

    Do this by
    - adding the build tag "codecgen.exec" to gen.go
    - moving the definition of GenVersion to gen-helper.go.tmpl (which is used at runtime)
      which requires adding a "Version" field to genInternal struct
    - passing the "codecgen.exec" tag as one of the values to "go run ..."

    Also, codecgen depends on fastpath support.
    Document that, and update run.sh appropriately.

    Also, values_codecgen_generated_test.go MUST not have the 'x' build tag set.
    That 'x' build tag is only for codebases which are not codec based.

    Code coverage is currently at 71%, which is just above our goals.

----------

Testing
    - expand list of float values to test
    - add a true large string for testing
    - add testUseIOEnc support to json-iterator benchmark
    - add a test suite for running all codec tests and grabbing codecoverage metrics
    - add test for json indent
    - exercise all paths in fast-path using full-featured generated type

Misc:
    - add json-iterator and easyjson to the benchmarks

ugorji commented Oct 11, 2017

Fixed via 50530d8

@ugorji ugorji closed this as completed Oct 11, 2017