RFC: Software Engineering Language Policy at Posthog #71
# Request for comments: Supported languages at PostHog

Since day one at PostHog we've supported two and a half core languages: Python and JavaScript/TypeScript.

This set of tools has carried us a long way, and both the CPython runtime and Node deserve a huge hats-off for bringing us this far.

## Problem statement

We have an opportunity coming up to rebuild, or build from scratch, services that are critical parts of our data pipeline. These services need to be correct, fast, and efficient. With that in mind, now is a good time to ask: are we using the right tools for the job?

So why even bring this up?

Frankly:

- Python is slow.
- Node is a memory hog.
- Node tooling is slow.
- Dependencies can be huge.
- There are no guarantees that code is correct.

> Review comment (on "Python is slow"): Python can also be blazingly fast.
>
> Anyway, this is all a way of saying that you can write slow Python and you can write fast Python, and perhaps we should consider focusing our efforts on the latter before introducing new languages.

> Review comment (on the list above):
>
> - True generally, but relevant only for specific use cases (like here).
> - Has this been a problem? There are alternative implementations of JS (e.g. Bun, Deno) that do this better, but they are too fresh to rely on in production.
> - Depends on the tooling. The frontend builds quite fast (sub 3-10 sec), and the plugin server can be improved 10x if needed.
> - We're switching to
> - Can you elaborate?

Can we be successful continuing to use these languages? Of course.

- They are well understood.
- Our tooling is set up to support these languages.
- We have internal expertise.
- Generally they are _good enough_.

Why should we be open to other languages?

- Different people here have experience with different languages.
- Fully guaranteed static typing is quite beneficial.
- Compiled languages provide more confidence that shipped code is correct.
- Some compiled languages provide significant efficiency wins in CPU usage, memory, and overall performance.
- Using the right tool for the job is typically the right thing to do (if you can ship it).

Candidate languages:

- [Golang](https://go.dev/) (big surprise)
- [Rust](https://www.rust-lang.org/) 🦀
- [OCaml](https://ocaml.org/)
- [Elixir](https://elixir-lang.org/) (dynamically typed)
- [Scala](https://www.scala-lang.org/)
- [Java](<https://en.wikipedia.org/wiki/Java_(programming_language)>)

## Meet the eligible languages

### Golang

> Review comment: Tbh I would not be against us introducing Go for a new service, so long as we're sure that Python does not and cannot suit our use case.

The good:

- Extremely easy to learn. Most engineers can learn it and be productive in about a day.
- Built to remove most contentious parts of development (e.g. `gofmt` settles code formatting)
- Designed to make engineering easier
- Built for network services and concurrency
- Great standard library
- Super fast compile times
- Light on memory
- Used by plenty of organizations, big and small
- We have plenty of Gophers here.
- A single binary as the deliverable.

The bad:

- Most people would not say the language sparks joy (I disagree)
- Not terribly expressive
- Verbose, but easy to read

### Rust

> Review comment: I think I would not be wrong in saying I have the most Rust experience here. I LOVE Rust, and I'd love to write it at work. I also have several friends who are fantastic engineers and would consider applying here purely because of Rust. But tbh I'd be super against us adopting it.
>
> - The learning curve is huge (prepare to not be comparatively productive for around a quarter). Unless it turns out there are actually more people writing Rust for fun here than I thought 😄
> - While maturing, the ecosystem is nothing like Python's. We'd be spending a lot more time in the weeds.
> - There are fewer "obvious" choices than in other languages. E.g. which async runtime should we use (if any...)? Why? Do we know enough about how our application does/should work in production to even start making such a choice?
>
> Generally organizations choose Rust because

The good:

- Very hot right now
- Very good for when you need to hyper-optimize something but would like to avoid C or C++
- Extraordinarily expressive and fun to program
- Extremely performant and light on memory
- Used by plenty of organizations, big and small
- We have a few Crustaceans here!
- A single binary as the deliverable.

The bad:

- Slow compile times
- Harder to read because it is so expressive
- Longer ramp-up time for engineers to become proficient

### OCaml

> OCaml is a general-purpose, industrial-strength programming language with an emphasis on expressiveness and safety.

You may not be familiar with this one, but it is _very_ safe and performant. Companies with huge amounts of money on the line that are averse to defects and slowdowns, like the HFT firm Jane Street, use OCaml because its type checking is unforgiving.

> OCaml’s powerful type system means more bugs are caught at compile time, and large, complex codebases are easier to maintain. This makes it a good language for running critical code. At the same time, sophisticated inference makes the type system unobtrusive, creating a smooth developer experience.

The good:

- Garbage collected
- Algebraic data types
- Pattern matching
- Type inference
- Immutable by default
- Static type checking
- First-class functions
- Parametric polymorphism
- Used by serious programming shops

The bad:

- Not terribly popular relative to the others
- Can be considered relatively academic
- No one here has experience with it

### Elixir

Oh, Erlang <3

The good:

- Built for streaming data
- The cockroach of runtimes.
- Automatic function-level clustering
- Hot code reloading
- Compiled (to bytecode for the BEAM VM)
- Relatively functional

The bad:

- Dynamically typed
- Runs in a VM (the BEAM)

### Scala / Java

Grouping these together because they really are converging. Really this covers any language on the JVM.

The JVM is truly a wonderful runtime. It's very fast. HotSpot's hot-path detection plus just-in-time compilation is _super_ impressive.

> Review comment: I saw a talk by folks at a German car sales company who were shipping an ML model in Java. They had to write tooling to hit all of the code paths they cared about during boot because JIT warm-up was a problem for them. That said, there are a bunch of alternative VMs that are focused on speed because of serverless-style loads...

The good:

- Surprisingly performant
- JVM library interop means a huge number of libraries are available
- Tons of expertise out there, and companies big and small are building software on it
- Kafka and ZooKeeper are built on it

The bad:

- Considered somewhat crusty
- Slow boot times
- JVM tuning is no fun

> Review comment (on "Considered somewhat crusty"): Strong opinions weakly held time. Anyone writing Java should be writing Kotlin. The dependency management and build tooling in Java is horrible: arcane, manual, and hard to debug.

## Discussion

With this overview laid out, let's talk about the first scenario where we need to decide which language to use:

[RFC: Inserter Service Requirements](https://github.com/PostHog/meta/pull/68)

The TL;DR requirements are:

- Consume from Kafka
- Insert into ClickHouse
- Deserialize some portion of the payload (serialization TBD) to determine where to insert

It's a simple service and could be written in anything, really. It does give us an opportunity to branch off our well-traveled path of Python and TypeScript. In my opinion Golang or Rust would be a great fit here, as would the other languages listed. I'm a Gopher in particular, so I would really like to see more Go written here.

## Success criteria

_How do we know if this is successful (i.e. metrics, customer feedback), what's out of scope, what makes this ambitious?_

The goal here is to spark debate about which languages we should use at PostHog. What are we open to? Why should we not adopt new technologies?

Success would be making a decision, effectively enacting a policy on it, and having engineers aligned so we don't need to worry about this again (at least for some time).

> Review comment: As above... one decision only?

> Review comment: I'd suggest some of the benefits of switching language come from rewriting/extracting the service, and only part from rewriting/extracting it in a different language.
>
> At a previous job we spent a while rewriting from Python to C# because "C# is faster than Python". Then we realised that we were CPU bound and moved a bunch of work onto the GPU. Python and C# were then faster than we needed them to be.
>
> So: "are we using the correct tools for the job" is a great question. But being really clear on why we're asking it and how we'll know when we're done is super important. E.g. moving ingestion onto the GPU is probably more complicated and not actually faster.
>
> So, there are two interesting questions here:
>
> These are really different questions.
>
> I'm not working on ingestion so I can have opinions on 1 but "so what?"
>
> On "how many programming languages"... The problem with a new (to you) programming language is almost never the language.
>
> Aside... from https://neverworkintheory.org/2014/01/29/stefik-siebert-syntax.html Python and Ruby are consistently measured as easiest to learn. C-style languages are no easier to learn than a language made of random keywords.
>
> The problem with a new (to you) language is generally the tooling. Most noticeably, in my experience: dependency management, and building and releasing things.
>
> (Go and update our Android library from Java 8 to Java 19 if you want proof of this :))
>
> So, we need a comparison of building, releasing, and running in k8s for the languages we consider. I think we can consider a smaller list (although C# marketing should be "Java but good", and I've a friend we could hire if we started using Clojure, which seems to inspire massive love from folks who use it).
>
> The other thing is adoption within the org. Who needs to learn the language, and how and when do they do that? How do we know when to write it? Are we migrating to it or adding it alongside? Which services mustn't it be used for?
>
> And finally: hiring. "Come work here because you're excited about language X. Incidentally, the first few months you'll be working on these bugs in Python." So we can pull from a wider pool with a wider set of languages, but are we in a place to hire someone who only wants to work on language X?
>
> Bonus "after finally" point... who owns tooling for each language? Does the platform team commit to providing support for building, deploying, and running all the languages? Is that in our definition of platform? Or do we need some champions?