Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Julia support to flatbuffers #5088

Closed
wants to merge 26 commits into from
Closed

add Julia support to flatbuffers #5088

wants to merge 26 commits into from

Conversation

rjkat
Copy link

@rjkat rjkat commented Dec 11, 2018

Hi.

This is a rather large change, adding support for Julia to flatbuffers. I am committing on behalf of my company, which has a pre-existing corporate contribution license with Google.

The generated code has a dependency on the FlatBuffers.jl project, of which I am a maintainer and will be making a release to coincide with the integration of this PR. Tests for compatibility with other flatbuffers implementations reside within that project.

Comments and criticism are most welcome - please don't be shy. Given the size of the change I'm sure there are many things which can be improved, and I will do my best to address them.

Thanks for your time!

R

docs/source/Support.md Outdated Show resolved Hide resolved
include/flatbuffers/idl.h Outdated Show resolved Hide resolved
@rjkat
Copy link
Author

rjkat commented Dec 11, 2018

Looks like only VS2010 failed - all the other ones passed. I'll attempt to hunt down a machine with VS2010 and reproduce...

Edit: fixed!

docs/source/Tutorial.md Outdated Show resolved Hide resolved
docs/source/Tutorial.md Outdated Show resolved Hide resolved
docs/source/Tutorial.md Outdated Show resolved Hide resolved
docs/source/Tutorial.md Outdated Show resolved Hide resolved
@aardappel
Copy link
Collaborator

Very cool.. give me some time to go thru all of this :)

@aardappel
Copy link
Collaborator

So far, all language implementations are self-contained in this repo, and this repo has no dependencies. I don't think we'll want to change this for Julia. Since it is just a few files, I think they can be copied.

@rjkat
Copy link
Author

rjkat commented Dec 13, 2018

I don't have any objections to bringing the Julia code into the flatbuffers project. One thing to consider would be the license for this code, @dmbates is the original author and @quinnj has also made substantial contributions. I also don't know how we could then release it as a Julia package - Julia expects packages to be git repos in their own right with a certain structure. Would adding it as a git submodule be an acceptable option?

@quinnj
Copy link

quinnj commented Dec 13, 2018

The new Julia package manager available in 1.0 will soon support "packages" that don't necessarily follow the one-package-per-repository setup, so I think it's fine to put the code here and we can just maintain both FlatBuffers.jl and this until we feel comfortable removing the FlatBuffers.jl repository. Also, I don't think any of @dmbates code is still around; the repository has been licensed as MIT since the beginning, so I don't think there should be licensing concerns.

@rjkat
Copy link
Author

rjkat commented Dec 14, 2018

I just brought in the FlatBuffers.jl package. I haven't preserved the git commit history from that repository. My thinking was that this is something we can do later if the need arises.

@aardappel
Copy link
Collaborator

We don't use git submodules in FlatBuffers yet, so would prefer to keep it that way.

julia> Pkg.add("FlatBuffers")
```

## Documentation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to point to the main docs and tutorial in docs/ or https://google.github.io/flatbuffers/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,97 @@
# FlatBuffers.jl Documentation

#### Overview
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It be good to have most of the docs under docs/ that are user facing, so they end up as part of the website etc.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically because I've gone and added the whole Julia package in here. I already have an external link to these documents in the docs/ folder, so you can get to them from the website. Until we are actually releasing the Julia package out of the flatbuffers project, it might be best to keep the documentation for that package along with it and link to it externally for now?

@@ -0,0 +1,14 @@
# automatically generated by the FlatBuffers compiler, do not modify
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these files to be in test/ to be analogous to all other languages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, looks like they're simply duplicates, so remove them here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

import ..InParentNamespace
import FlatBuffers

FlatBuffers.@with_kw mutable struct Monster{T}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice and short, but.. as I understand it, this generates an "object API", where you construct an object which then gets serialized. The whole point of FlatBuffers design is that it can create (and in particular, read) serialized data without creating language objects first, but directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is a question for @quinnj. There is a lower-level API in Julia for reading/writing flatbuffers directly however it relies on metaprogramming to determine the size of struct fields, meaning you have to construct a language object in order to be able to serialize one. Just so I'm clear, is performance the reason you want to avoid creating language objects? If so, we could run some tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is entirely about performance. If Julia's meta programming / JIT can make this just as fast as an API that constructs serialized data directly, then that is great also.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay here is my hokey performance benchmark.

julia> include("MyGame.jl")
julia> using FlatBuffers, .MyGame
julia> mutable struct HpTable
          hp::Int
       end
julia> struct HpStruct
          hp::Int
       end
julia> io = IOBuffer()
julia> @time [FlatBuffers.serialize(io, MyGame.Example.Monster(;hp=42)) for _ = 1:1e5];
 14.450891 seconds (16.88 M allocations: 2.280 GiB, 1.39% gc time)

julia> @time [FlatBuffers.serialize(io, HpTable(42)) for _ = 1:1e5];
  1.284482 seconds (3.55 M allocations: 220.096 MiB, 1.91% gc time)

julia> @time [FlatBuffers.serialize(io, HpStruct(42)) for _ = 1:1e5];
  0.497959 seconds (1.28 M allocations: 103.010 MiB, 4.31% gc time)

julia> @time [FlatBuffers.serialize(io, 42) for _ = 1:1e5];
  0.429897 seconds (1.26 M allocations: 79.441 MiB, 1.92% gc time)

So it seems constructing a whole object (at least in the case of a Monster) consumes an order of magnitude more time and memory than constructing a one-element table, which is about 3 times slower than constructing a one-element struct. Sounds like the ability to create serialised data directly would definitely be a big performance win. However, it would be a significant API breaking change to the FlatBuffers julia package (which is already in use by a non-zero amount of people).

I'd like the addition of Julia to be true to the original intent of FlatBuffers, and for the generated code to be as performant as possible. I'm prepared to put in some effort to make this happen. It would be a significant increase in scope on the current changes, however. Could we consider integrating Julia support as-is (i.e. suboptimal), and later adding the API for direct serialization? @quinnj do you think this could be a "1.0" breaking release of the FlatBuffers package?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can certainly go forward w/ a better design here; even if it means making breaking changes. The path forward would be to implement the changes, and put package version limits on downstream packages; then to go the extra mile, do pull requests for downstream packages to switch to the new APIs and then they can do new releases requiring the new FlatBuffers version. Happy to help navigate all that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are only just adding official support for Julia to FlatBuffers, so I'd say breaking the API should not be a concern, doing the right thing is more important now. Those existing users should probably continue to use the existing code and the upgrade at their convenience.

@@ -0,0 +1,820 @@
/*
* Copyright 2018 Dolby Laboratories. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of this PR, you need to sign the Google CLA, which as I understand it, means that code that goes into this repo, won't be your copyright anymore.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Google Inc.

// Singleton class which keeps track of generated modules
// and their dependencies. A singleton class is necessary
// since our generator is run multiple times with
// multiple different .fbs files, and we want to preserve
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unfortunate, since for all languages code generation from a single .fbs is supposed to be a stateless operation. If it is so that we can't get around this, we better document super clearly that code generation for Julia has additional rules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, if FBS files depend on other ones, then yes this is true until Julia has forward declaration of types. JuliaLang/julia#269

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just looking at updating the documentation for this and it seems a bit unclear. The default seems to be that code will be generated from included files unless the --no-includes option is specified. However in Schemas.md it says

When using the `flatc` compiler to generate code for schema definitions,
only definitions in the current file will be generated, not those from the
included files (those you still generate separately).

What should I write where?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--no-includes is for not generating include statements in the generated code. Afaik all generators do not generate code for definitions coming from included files, instead it relies on those being generated already when flatc was invoked on those other files. It be great if Julia could follow that pattern as closely as possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree, this is the point of code reviews - making sure new stuff fits as closely as possible :)

Just so I’m clear on the desired behaviour, let’s take namespace_test as an example. There are two .fbs files - namespace_test1.fbs and namespace_test2.fbs. namespace_test1.fbs defines a namespace NamespaceA.NamespaceB. namespace_test2.fbs includes namespace_test1.fbs and defines some other things in NamespaceA. With the singleton class, the Julia code generated looks like this

module NamespaceA
    include(“NamespaceB/NamespaceB.jl”)
    include(“TableInFirstNS.jl”)
    include(“SecondTableInA.jl”)
end

Without the singleton class, the include(“NamespaceB/NamespaceB.jl”) is lost and the TableInFirstNS type is missing definititions for the types of its fields.

To be explicit, the definitions for NamespaceB are not generated here, but we need to keep track of which files to include, hence the need for the ModuleTable. This issue could potentially be resolved by doing away with the object API altogether as discussed elsewhere in this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent a bit of time looking at this today. It doesn't appear that Julia modules can be re-opened in the same way that C++ namespaces can. Also, Julia does not use prefixing - files must be explicitly included. Since there are no forward declarations, the generated files must also be included in the right order. It does seem like there should be a way to do this, but the solution is not obvious to me and I suspect it is quite hairy and involved.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm afraid I am not much help there. Have you asked around with Julia experts how they would solve this?

Copy link
Author

@rjkat rjkat Jan 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't, but I don't feel well acquainted enough with any of the Julia package experts to bug them. @quinnj might have better contacts than me.

The only solution I can think of is to save the module hierarchy in a structured format. When you need to add something into the hierarchy, merge it in and then eval() everything into existence. Something like this:

# NamespaceTest1.jl
module NamespaceTest1

function getmoduletree()
(
    name="NamespaceA",
    includes=[
        (name="NamespaceB",
         includes="NamespaceB/NamespaceB.jl"),
         "TableInFirstNS.jl"
    ]
)
end

end

# namespace_test1_generated.jl

function loadmoduletree(root)
    quote 
    module $(root.name)
        $([isa(String, n) ? :(include("$n")) : loadmoduletree(n) for n in root.includes]...)
    end
    end
end

include("NamespaceTest1.jl")
import .NamespaceTest1
root = NamespaceTest1.getmoduletree()

eval(loadmoduletree(root))

# namespace_test2_generated.jl

function mergemoduletrees(tree1, tree2)
    # merge two module hierarchies together
end

include("NamespaceTest1.jl")
import .NamespaceTest1
root = NamespaceTest1.getmoduletree()

include("NamespaceTest2.jl")
import .NamespaceTest2
root = mergemoduletrees(root, NamespaceTest2.getmoduletree())

eval(loadmoduletree(root))

This is not pretty, and means we are circumventing Julia's module system, which makes it tricky to work with the generated code. That said, if you think the benefits outweigh these costs I'm happy to give this a go.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjkat, I'm happy to help how I can here; I'm not quite following the details of the issue however. In particular, I'm not sure what exactly you mean when you mention "singleton class" in Julia; happy to correspond over email if that's easier, or on the JuliaLang slack for quick back and forth if that would also be easy. It would probably be best to maybe share how the "ideal" code layout would look and then point out what exactly Julia can't handle and then we can brainstorm the best approach (and as mentioned, I can definitely ask around as well, I'm close with a number of core devs).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks to a slack session with @quinnj yesterday, have come up with a solution to this. Singleton class is gone. Final things left to do are the API for incremental building and adding Docker tests for Julia.

// go up to common level (don't add dots to path if we are
// annotating the type of a field - julia doesn't like that)
unsigned int i = 0;
unsigned int n = parent.components.size();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use size_t.. unsigned int will give warnings on 64-bit compilers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

void GenEnum(const EnumDef &enum_def, std::string *code_ptr) {
if (enum_def.generated) return;
GenComment(enum_def.doc_comment, code_ptr, &JuliaCommentConfig);
std::string enum_name = NormalizedName(enum_def);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use auto where possible, here and elsewhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


static std::string GenTypeBasic(const Type &type) {
static const char *ctypename[] = {
// clang-format off
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,811 @@
# Store specific byte patterns in these variables for the fuzzer. These
# values are taken verbatim from the C++ function FuzzTest1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are handwritten tests? Should also be in tests/, though maybe under `tests/julia'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, artefact of adding the julia package in here. Might be best to keep the tests for that package along with it rather than duplicating them in here until we are releasing the Julia package from this repository?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly I don't want duplication of anything.. but moving them would be great. Don't want Julia to be the exception :)

@rw
Copy link
Collaborator

rw commented Dec 30, 2018

This is cool stuff! Thanks for taking the time to write this PR.

@rjkat We recently added a way to run language test suites in CI using Docker. Would you add a Dockerfile to to test Julia? It will go here: https://github.com/google/flatbuffers/tree/master/tests/docker/languages

You can use the official images at https://hub.docker.com/_/julia/.

@rjkat
Copy link
Author

rjkat commented Feb 6, 2019

Sorry for the delay on this. The remaining tasks to be done are:

  • Incremental building/reading
  • Docker tests for Julia

One question in my mind is whether having a patch for the generated code as part of the test is acceptable. Currently the generated code for the Monster test needs to be tweaked - since Julia does not have forward declaration of types, the types which reference each other need to be parametrised.
JuliaLang/julia#269

If this is not acceptable as a solution, then there will be significant effort required to make the generator spit out parametric struct definitions.

I'm on holiday for 2 weeks from today, but I am keen to keep working on this when I'm back at the end of the month.

@aardappel
Copy link
Collaborator

@rjkat I am not sure what you mean by a patch here. I'd say in general we'd want flatc to be able to generate code for the tests that just works. We wouldn't want programmers to have to fix generated code manually, or have restrictions on schemas if Julia is one of the output languages.

I'm sure there is a solution, it may just require a different code structure in the generator or in the generated code. If all these languages can deal with the current test schemas, so can Julia I'm sure.

@ExpandingMan
Copy link

Hello all, is there any hope of this being resurrected?

@rjkat
Copy link
Author

rjkat commented Mar 17, 2019

@ExpandingMan the only thing left to do here is make the generator support tables that reference each other.
This would mean it would have to spit out type variables for types that haven't been defined yet, based on the include order. See JuliaLang/julia#269 for some context. It's a fair amount of work which I don't have time for at the moment, but I'm hopeful I may be able to get to it eventually.

@rw
Copy link
Collaborator

rw commented Mar 18, 2019

Is there benefit in trying to get the PR into a merge-able state, even if it isn't completely feature-complete? That way others can work on improving it?

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@quinnj
Copy link

quinnj commented Jul 31, 2020

I've been looking a little into the circular dependency problem raised in this PR for Julia; can someone comment on how the go generated code works for this? I'm sure I'm missing something, but it seems that with https://github.com/google/flatbuffers/blob/master/tests/namespace_test/NamespaceA/SecondTableInA.go and https://github.com/google/flatbuffers/blob/master/tests/namespace_test/NamespaceC/TableInC.go, both files declare an import dependency on each other, and define type structs that use types from each other's package. I thought this wasn't allow in go because of cyclic import dependencies? I also noticed that go produces an "extra" TableInC definition in the NameSpaceA (https://github.com/google/flatbuffers/blob/master/tests/namespace_test/NamespaceA/TableInC.go) and I wasn't sure if that was related/how they worked around the cyclic issue?

@aardappel
Copy link
Collaborator

@quinnj I don't thnink that we have anyone in this community who is an expert on how Go importing works ;)

@github-actions
Copy link

github-actions bot commented Feb 1, 2021

This pull request is stale because it has been open 6 months with no activity. Please comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Feb 1, 2021
@github-actions github-actions bot closed this Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants