Using Benthos as a Conduit Processor #1614

nickchomey · 2024-05-23T17:10:47Z

nickchomey
May 23, 2024

I'm relatively new to these sorts of tools, but it seems to me that Conduit and Benthos are somewhat redundant as they are both stream processors. As such, it seems somewhat silly to use them together (as with the PoC Benthos Connector - conduitio-labs/conduit-connector-benthos#4) - better to choose one.

The main difference between them seems to be that Conduit is much more focused on CDC from data sources/stores/bases while Benthos is much more focused on the actual pipeline processing/transformation - it has many dozens of processors while Conduit has only a handful. Their processors also allow for enrichment via Sql queries, nats kv etc...

It seems to me that the universal/OpenCDC of Conduit is far more fundamental/important, since a pipeline ultimately needs to start from some data source, and should therefore be used as the main tool. But it would be a shame to not leverage the immense processing power of Benthos.

So, what I'm thinking is that rather than use Benthos as a Conduit Source/Destination, as was attempted in this repo, why not just embed it's pipeline processors into Conduit as a Standalone Processor? It could have some sort of Benthos bloblang mechanism for choosing the desired Benthos processors.

This would allow Conduit to focus on its strength of CDC, while leveraging Benthos' strength in stream processing. You could, of course, always make other custom processors in Go or JavaScript to suit your needs (or probably even use existing Benthos custom processors).

It's a topic that has been brought up various times in Benthos' Github and Discord, and they're generally responded to with the following links:

Apparently this can be used to embed Benthos into a golang app/binary
https://pkg.go.dev/github.com/benthosdev/benthos/v4/public/service#example-package-StreamBuilderConfig

One more example of that api here redpanda-data/connect#1727 (comment)

And here's a repo that apparently has relevant examples
https://github.com/benthosdev/benthos-plugin-example

Thoughts?

nickchomey · 2024-05-23T17:21:43Z

nickchomey
May 23, 2024
Author

It was briefly mentioned in the original issue in that this would face challenges as standalone processors need to be wasm.

Here is an article from the Benthos dev about how to build Benthos in wasm. Perhaps this can be done here?

https://www.benthos.dev/blog/2019/05/27/compiling-benthos-to-wasm/

0 replies

lovromazgon · 2024-05-23T18:09:45Z

lovromazgon
May 23, 2024
Maintainer

This is definitely an interesting idea. While Conduit provides a set of builtin processors including a JavaScript processor and a Go SDK for standalone processors, it's currently missing an easy way to enrich data in the pipeline. This is the part where I think Benthos could provide a lot of value. It could also be valuable to users who are already familiar with Benthos and would like to use Bloblang to process their data.

Sadly, we'll hit an issue if we try to tackle this as a standalone processor. The thing is that standalone processors currently run in a constrained WASM sandbox and don't have access to the file system or network. This makes enriching pretty much impossible, with or without Benthos. To solve this we'd have to bind system calls for the standard net package, say using something like github.com/dispatchrun/net, although last time I played around with that package it didn't seem stable enough. I found issues with reading the response body and I got stuck with configuring HTTPS, as the WASM plugin would need to have access to the certificate store. Additionally, the lib only provides bindings for UNIX systems, which is ok for a start, but it would break the Windows release of Conduit. Doing this right is not trivial, but in the long run, we definitely want to add this functionality, as it would be a very powerful addition.

In other words, implementing a Benthos standalone processor would require significant effort to enable the needed functionality. However, there's another way - a builtin processor. A builtin processor is running inside of Conduit directly (regular Go code), so it doesn't have any of the WASM sandbox limitations. This approach could work, although we don't plan to tackle this in the next few months, so I'll give you some pointers, in case you want to give it a go yourself. If you wanted to create a fully functional Benthos processor, you would have to create your own entrypoint (like we have here) and add the Benthos processor to the global map of builtin processors before calling conduit.Serve. The Benthos processor configuration would have to be provided by the user as a string, which will make it a bit awkward to write it in the pipeline configuration file, as it will basically be a yaml formatted string inside a yaml file. This doesn't mean it can't be done, to prove the value of such a processor.

Let us know if this was helpful and if we can provide any further guidance, in case you decide to give it a go.

17 replies

mihaitodor Jun 6, 2024

Yes, I think we're aware of what bloblang is and does. But what I'm trying to understand is what a bloblang-only (without the rest of benthos) standalone processor for Conduit would offer/allow?

I noticed you have a javascript processor and, in many cases, people don't need to write very complex transformations which require multiple processors. Basically, it would be a shortcut for someone who just wants to write a Bloblang transformation and use it with either Benthos or Conduit. However, they can achieve the same with a Benthos processor which contains just a mapping / mutation processor in it. My initial thought was that it would be easier to include Bloblang as WASM if you want to go down that route rather than a Benthos stream. But, if you're willing to import Benthos as a library in your Go code, then it makes sense to go for the stream version and support multiple processors.

It would be great if you or others from the Benthos/ Redpanda team could assist with this!

Feel free to raise a matching enhancement proposal on https://github.com/redpanda-data/connect so we can track it. I'm a bit caught up right now, but happy to keep in touch about this.

nickchomey Jun 6, 2024
Author

@mihaitodor yes, JavaScript is another option or processing in conduit. I agree that bloblang would be easier, though can it do much without having all of the other benthos processors available?

I think having the full range of benthos processors (but not necessarily the inputs and outputs) within a conduit pipeline would be ideal for most people, as it would combine the powerful CDC capabilities of Conduit with the processing flexibility and simplicity of benthos and bloblang.

I'll create an issue at the benthos repo in a bit. Hopefully this can happen at some point!

mihaitodor Jun 6, 2024

though can it do much without having all of the other benthos processors available?

You'd get the full list of currently available functions and methods, which I think is on par with what you can do with Javascript, although there probably are more Javascript libraries and snippets out there that you can just import and use. Would be cool to know what else people would like to have available in bloblang.

I'll create an issue at the benthos repo in a bit. Hopefully this can happen at some point!

Thank you! ❤️

nickchomey Jun 7, 2024
Author

Ah ok, very cool. Perhaps I'll try to make a bloblang wasm processor sometime soon, in lieu of a full processor pipeline plugin.

nickchomey Jul 17, 2024
Author

As mentioned in #1646, Wazero has no plans whatsoever to add support for WASI preview 2 (or any other preview until 1.0 and wasi is in w3c spec. So, implementing Benthos as a standalone processor seems to be impossible. Perhaps at some point i'll look at doing so as a built-in processor, as discussed above

gedw99 · 2024-08-25T05:40:28Z

gedw99
Aug 25, 2024

I use https://github.com/calmera/jetscript

Its NATS and Benthos together with BlobLang...

If Conduit CDC can output to NATS then Jetscript can react to that.

So Nats is the Bus between Conduit and Benthos in this case.

https://github.com/julien040/anyquery is like Conduit, but keeps a SQL DB of the changed records, allowing you to Query it. Ironically you could build a connector in Conduit to do CDC on the AnyQuery SQLite DB records.

Just like Conduit CDC, AnyQuery can fire out to NATS, and so JetScript can react to those messages in NATS using Benthos.

The Benthos Scripts are themselves stored in NATS also, so you can easily Scale out 1,00 of servers that just catchup to the Scripts and then the messages. It's just like how devs store binaries in NATS, and when a new servers provisions, it asks for them from NATS and then starts processing Message send in to it. When The Servers dies or whatever, it naturally falls on the NATS System for another day.

I really like using NATS for everything :) You can built rings of NATS like we built rings of Caches.

3 replies

nickchomey Aug 25, 2024
Author

I'm having trouble seeing what is being proposed here, or even what is relevant...

This is about embedding/connecting to Benthos from within a Conduit ETL pipeline - prior to outputting to a destination connector.

There's already a NATS destination connector for Conduit, so there's nothing stopping you from outputting to that and then benthos or jetscript or anything else can subscribe to nats and do it's thing...

As for anyquery, I don't see how it compares to conduit. It is just a multilingual query client, it isn't a real time CDC or ETL mechanism.

It also doesn't seem to have a db of what passes through it, as you suggested - it's just aims to do what it says on the tin: make any query to any datastore. The sqlite db just seems to be used for managing connections, transforming queries etc

It also doesn't seem to mention NATS anywhere in its docs...

Are you proposing that conduit build a datastore and query mechanism?

Or are you suggesting that conduit make an Anyquery standalone processor (https://conduit.io/docs/processors/standalone), such that specific ETL events can trigger an AnyQuery query? I'm not sure if that would be possible, given the same WASI constraints that we've already discussed regarding Benthos. It would be better to output to nats and then trigger anyquery from there.

Whatever the case, this isnt the place for such a discussion. A separate discussion or issue could be opened

gedw99 Aug 25, 2024

It’s simply that since benthos is not part of conduit currently, you feed the conduit data to benthos via nats .

Believe me , I prefer if it benthos was built in to conduit . This is just a pretty yucky work around.

The other thing is that you would have many other thing from many other systems often feeding into a NATS / Benthos system .. so running it inside conduit is not really a great way to do it.

nickchomey Aug 26, 2024
Author

Yes, it's definitely an option, which I listed above as well. But it's not the topic of discussion...

The reason that benthos inside conduit would be desirable is
a) for people who aren't using nats
b) to be able to do the transformations before it gets to nats, to lower the load
C) probably even filter out some messages, to lower the load further

And I'm still confused about what AnyQuery has to do with any of this...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using Benthos as a Conduit Processor #1614

{{title}}

Replies: 3 comments 20 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Using Benthos as a Conduit Processor #1614

nickchomey May 23, 2024

Replies: 3 comments · 20 replies

nickchomey May 23, 2024 Author

lovromazgon May 23, 2024 Maintainer

mihaitodor Jun 6, 2024

nickchomey Jun 6, 2024 Author

mihaitodor Jun 6, 2024

nickchomey Jun 7, 2024 Author

nickchomey Jul 17, 2024 Author

gedw99 Aug 25, 2024

nickchomey Aug 25, 2024 Author

gedw99 Aug 25, 2024

nickchomey Aug 26, 2024 Author

nickchomey
May 23, 2024

Replies: 3 comments 20 replies

nickchomey
May 23, 2024
Author

lovromazgon
May 23, 2024
Maintainer

nickchomey Jun 6, 2024
Author

nickchomey Jun 7, 2024
Author

nickchomey Jul 17, 2024
Author

gedw99
Aug 25, 2024

nickchomey Aug 25, 2024
Author

nickchomey Aug 26, 2024
Author