Name	Name	Last commit message	Last commit date
parent directory ..
README.md	README.md

date

title

subtitle

author

keywords

license

abstract

toc

hyperrefoptions

2023-08-11

Implementing Bech32 and Segwit Address in Pure Erlang

From Pair Programming to Solo Programming

Mathieu Kerjouan

erlang,otp,nostr,bech32,segwit,address,bitcoin

CC BY-NC-ND

Bech32 encoding is used by Bitcoin and Nostr to encode binary data like public key or signature. On Bitcoin side, another format is also used, called Segwit using Bech32 to encoding its payload. This article is a brief overview of Bech32 and Segwit implementation in pure Erlang. The goal is to show how to use this new implementation, but also to explain how and why it as been implemented. A quick overview of other implementation is also available at the end of this publication.

true

linktoc=all

Implementing Bech32 and Segwit Address in Pure Erlang

This implementation was created to be easily shared with other Erlang project/application without importing external modules. A simple copy/paste of bech32 and segwit modules should do the job for the moment or one can important this application in his own project by including nostr as dependency. In both case, it should work.

In fact, this application has not been deployed in production environment, and we can't really trust it for now, but it will be pretty soon integrated in nostr project. Both modules have a decent coverage (more than 90%) and should work pretty well everywhere.

Bitcoin, Bech32 and Segwit Addresses

Bech32¹ data format was designed, created and specified in BIP-0173[^bitcoin-bip-00173] by the Bitcoin core team, in particular Pieter Wuille² and Greg Maxwell³. The goal of this new format was to fix issues from the old base58⁴ format, in particular its complexity and its poor performance because of SHA256 checksum.

Bech32 format is divided in 3 parts:

The human readable part or HRP is used to store
A static separator represented by the character 1 (one)
A data part containing a payload and a checksum.

In other hand, segwit⁵ (segregated witness) address is kind of subset of bech32, fixing limits on the length of the encoded data. Only segwit address standard is implemented there.

If you are looking for more information about how bech32 magic is working, other posts⁶ can easily be found using your favorite search engine.

Teaching and Learning with Pair Programming

This implementation started as an experiment with the following question in mind: "is it possible to do pair programming during live session recorded on multi-platform?" The idea was to deploy a small collaborative text editor like cloud9⁷ in an isolated place, share a link to this editor to the developers and start working on the project, without a lot of preparation.

The good part of this experiment was to see the different point of view during the implementation of the algorithm, based on the respective experiences of each participants. It was interesting to see what kind of solution was firstly implemented and how it evolved.

The bad part was quite important though. Because of the lack of preparation, the code was not correctly implemented. During the experiment, it was quite hard to think about the different features.

It's always hard to work with someone else, even more when you are recorded and without any kind of preparation. If you want to record your pair programming session, try, at least, to prepare it few weeks before. Try also to do a small implementation by yourself first, just to find which part will be more complex than another.

From Wild Implementation to Erlang/OTP

Creating implementation only from specification is hard, sometimes, the specifications are not even available, and one will need to read code example from already existing application. It could be harder at first to understand the code created by some anonymous developers but it can also help a lot.

The main reference used to implement it in Erlang was the Python version but few others like the one in Haskell, C and Ruby helped a lot as well. One big advantage with script languages like Python or Ruby is the way you can develop by just crafting something on a REPL. Recreating the module in this environment by just copying and pasting it or by importing it works pretty well.

The previous implementation of Schnorr scheme⁸ was also using this model and it makes a lot of things easier. Firstly, having a code you can play with will help you to design the interfaces you will create on your own implementation. Everyone is designing them differently because of their experiences or their requirement. In the case of bech32 and segwit, the implementation was created with already some helpers in mind.

Secondly, having an example in another language will help you to design the state(s) you will need to deal with. One of the most complex task is to manage the different transition of the state, looking on the other implemention will give you an idea of the moving part of the application.

Usage and Example

The reference implementation in Python looks a bit weird at first glance, in particular because if something goes wrong, nothing is returned. Any developers using this version in production environment will probably have troubles when understanding a crash because of that. As sysadmin and Erlang developer, I hate being in a situation where something is crashing without any explicit reason. This bech32/segwit implementation is fixing that by explicitely returning errors and their reason or by raising an error if no matching clause has been found.

1> bech32:encode("123", "test").
{error,[{reason,"Invalid char"},
        {char,"t"},
        {position,0},
        {head,[]},
        {rest,"est"}]}

Bech32 encoding works only with indexed values by default. The first argument is the HRP part as string() and can use any kind of printable characters. The second argument though is the indexed value, and should use a list of integers, each one greater or equal to 0 and lesser than 32.

2> bech32:encode("123", [ X || X <- lists:seq(0,31) ]).
{ok,"1231qpzry9x8gf2tvdw0s3jn54khce6mua7l4lr8s5"}

As you can see, this function used with these arguments is returning a correct bech32 encoded string without any errors. Using indexed values is not always what you want and what you need. In fact, I know lot of developers would probably use also unindexed data by default for some of their hacks. bech32:encode/2 is only a wrapper around bech32:encode/3 where the last argument contains the extra options. If someone want to use unindexed value, or switch to another bech32 format like bech32m, one can simply define it in this part of the function.

3> bech32:encode("123", "test", [{format, bech32}, {indexed, false}]).
{ok,"1231w3jhxaq5p4x29"}

4> bech32:encode("123", "test", [{format, bech32m}, {indexed, false}]).
{ok,"1231w3jhxaqpa9208"}

This feature can be quite interesting to encode on the fly any kind of unindexed data and use bech32 as serializer mechanism. If you want to encode External Term Format (a format used to export terms in Erlang, also known as BERT) it is possible to do it easily.

5> bech32:encode("123", term_to_binary([{ok, value}])
                ,[{format, bech32m}, {indexed, false}]).
{ok,"1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"}

The bech32 string produced contain now the value of the serialized term [{ok, value}].

6> bech32:decode("1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp").
{ok,#{checksum => [23,21,13,27,9,1],
      data =>  [16,13,22,0,0,0,0,0,0,5,20,
                0,4,25,0,0,0,9,23,22,22,25,
                0,0,0,21,27,6,2,27,3,21,12,
                21,21,0],
      format => bech32m,hrp => "123",
      origin => "1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"}
}

When decoded, the data returned are using base32 and must be converted to another base if you want to use it correctly. Because developers are lazy, an option has been created to help them to automatically convert this data into any kind of base (or format).

7> bech32:decode("1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"
                ,[{converter, {base, 8}}]).
{ok,#{checksum => [23,21,13,27,9,1],
      data => [131,108,0,0,0,1,104,2,100,0,
               2,111,107,100,0,5,118,97,108,
               117,101,106,0],
      format => bech32m,hrp => "123",
      origin => "1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"}
}

Using {convert, {base, 8}}, the function is now returning a list of integer using bytes. If you are familiar with ETF, you can see the first integer is 131, it's the tag used to identify this kind of data. Does it mean our data was correctly decoded? Let try it.

8> Data =
8> {ok, #{ data := Data}} =
8>   bech32:decode("1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"
                  ,[{converter, {base, 8}}]).

9> erlang:binary_to_term(list_to_binary(Data)).
[{ok,value}]

The data returned is the serialized one created previously but the conversion is done outside the converter. To make things easier (and lazyer), a function can be passed to the converter using {converter, fun(D) -> {ok, D} end}.

10> Converter = fun(Data) ->
10>   % convert base5 to base8
10>   {ok, Bytes} = bech32:convertbits(Data, 5, 8),
10>   % convert list to binary
10>   Binary = list_to_binary(Bytes),
10>   % convert binary to term
10>   {ok, binary_to_term(Binary)}
10> end.
#Fun<erl_eval.42.3316493>

11> bech32:decode("1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"
11>              ,[{converter, Converter}]).
{ok,#{checksum => [23,21,13,27,9,1],
      data => [{ok,value}],
      format => bech32m,hrp => "123",
      origin => "1231sdkqqqqqq95qyeqqqfhkkeqqq4mxzmr4v44qh4dmfp"}
}

Now, the data returned is the good one, already deserialized and valid. The function convertbits/3 has been used, it's the good time to talk a bit about this one. The bech32 model implementation offered by Bitcoin project and made in Python gives access to a function called convertbits. It is used to convert from one base to another. In fact, in Erlang we already have this feature with binary and bitstring term coupled with pattern matching, but in this particular scenario, I found this convertbits function easier to use.

12> bech32:convertbits("test", 8, 6).
{ok,[29,6,21,51,29,0]}

13> bech32:convertbits("test", 8, 64).
{ok,[8387236823100817408]}

14> bech32:convertbits("test", 8, 128).
{ok,[154717211161333530444085113906767331328]}

15> bech32:convertbits("test", 8, 1)
{ok,[0,1,1,1,0,1,0,0,0,1,1,0,0,1,0,1,0,1,
      1,1,0,0,1,1,0,1,1,1,0,1,0,0]}

In the example below, an Erlang string (a list of printable integers) is converted to different base like base6, base64, base128 and in bits. convertbits/3 is probably slower than doing a simple conversion with binary pattern matching, but it offers more flexibility. Perhaps small optimization can be done there.

The next part is segwit, this format is a bit more restrictive than raw bech32 encoding. Firstly, two versions of segwit are available, version 0 and version 1. Version 0 set a limit to the length of the data section to 20 elements.

15> segwit:encode("test", 0, [ X || X <- lists:seq(1,19) ]).
{error,[{reason,"Invalid data length"}]}

16> segwit:encode("test", 0, [ X || X <- lists:seq(1,21) ]).
{error,[{reason,"Invalid data length"}]}

16> segwit:encode("test", 0, [ X || X <- lists:seq(1,20) ]).
{ok,"test1qqypqxpq9qcrsszg2pvxq6rs0zqg3yyc5uskwrt"}

In other hands, this is not really the case with version 1 of the format, where it can encode a value from 2 to 40 elements.

17> segwit:encode("test", 1, [ X || X <- [1,2] ]).
{ok,"test1pqypqe8v4yj"}

18> segwit:encode("test", 1, [ X || X <- lists:seq(1,40) ]).
{ok,"test1pqypqxpq9qcrsszg2pvxq6rs0zqg3yyc5z5tpwxqergd3c8g7"
    "ruszzg3rysjjvfegauug3k"}

To decode a segwit address, one can use segwit:decode/1 and will have all required information about it retured as map.

19> segwit:decode("test1pqypqxpq9qcrsszg2pvxq6rs0zqg3yyc5z5t"
19>               "pwxqergd3c8g7ruszzg3rysjjvfegauug3k").
{ok,#{hrp => "test",
      value => [1,2,3,4,5,6,7,8,9,10
               ,11,12,13,14,15,16,17
               ,18,19,20,21,22,23,24
               ,25,26|...],
      version => 1}
}

Let try all these marvelous new functions on real world data. The first to test is with a nostr address, perhaps the most important feature for this nostr client/server project. Edward Snowden nostr address is npub1sn0wdenkukak0d9dfczzeacvhkrgz92ak56egt7vdgzn8pv2wfqqhrjdv9.

20> bech32:decode("npub1sn0wdenkukak0d9dfczzeacvhkrgz92ak56egt7vd"
20>               "gzn8pv2wfqqhrjdv9").
{ok,#{checksum => [23,3,18,13,12,5],
      data => [16,19,15,14,13,25,19,22,28,22,29,22,15,13,5,13,9,
               24,2,2,25,29,24,12,23,22,3,8,2,5,10,29,22,20,26,
               25,8,11,30,12,13,8,2,19,7,1,12,10,14,9,0,0],
      format => bech32,
      hrp => "npub",
      origin => 
        "npub1sn0wdenkukak0d9dfczzeacvhkrgz92ak56egt7vdgzn8pv2wfqqhrjdv9"}
}

It looks good, the function deserialize the address, print the checksum, return the data and gives the HRP. What about segwit address now. To have a raw segwit address, the best way is to go to blockchain.com and take the first transaction displayed. The transaction was from bc1q52265p57jwxph407dqp9r9cucm3qvrqwg9kqpu to bc1q0nwkgcvxhufess9keeszv7vw8xq8zmuy3upa7t.

21> segwit:decode("bc1q52265p57jwxph407dqp9r9cucm3qvrqwg9kqpu")
{ok,#{hrp => "bc",
      value => [162,149,170,6,158,147,140,27,213,254,104,2,81,
                151,28,198,226,6,12,14],
      version => 0}}
      
22> segwit:decode("bc1q0nwkgcvxhufess9keeszv7vw8xq8zmuy3upa7t").
{ok,#{hrp => "bc",
      value => [124,221,100,97,134,191,19,152,64,182,206,96,
                38,121,142,57,128,113,111,132],
      version => 0}}

It seems bech32 and segwit modules can parse values from the wild.

Under the Hood

The whole code is divided in small functions doing one thing, but trying to do it well.

Charset and indexing is done using index_to_charset/1 and charset_to_index/1 functions. Both are really simple one getting an integer and converting it to another one (ASCII for example). To be sure a character is available in the charset, the function is_charset/1 has been created, same has been done for the indexed values with is_index/1 function.

Strings or indexed strings must be tested, they are by using respectively valid_indexed_string/1 and valid_string/1. Both of these functions are really important to ensure the data to be parsed and serialized or deserialized.

Bech32 format is including lot of interesting function, in particular a feature to checksum and reconstruct the message in case of problem. The algorithm used is still a bit obscure at my level of knowledge but seems to work perfectly. polymod/1 function generates the checksum needed to reconstruct and ensure the data sent have been correctly received. A macro has been used here to simulate a loop as a pipeline.

% ...
-define(POLYMOD_GENERATOR(INDEX, GENERATOR),
    polymod_generator(INDEX, Checksum, Top) ->
        case (Top bsr INDEX) band 1 of
            0 ->
                polymod_generator(INDEX+1, Checksum bxor 0, Top);
            _Value ->
                polymod_generator(INDEX+1, Checksum bxor GENERATOR, Top)
        end
    ).
% ...
?POLYMOD_GENERATOR(0, 16#3B6A_57B2);
?POLYMOD_GENERATOR(1, 16#2650_8E6D);
?POLYMOD_GENERATOR(2, 16#1EA1_19FA);
?POLYMOD_GENERATOR(3, 16#3D42_33DD);
?POLYMOD_GENERATOR(4, 16#2A14_62B3);
polymod_generator(5, Checksum, _Top) -> Checksum.

The rest is mainly functions used to convert and validate the value, this previous one is probably the most completed created in this project. Decoding and encoding functions are executed as instruction flow, one after another one until the final result is being returned by encode or decode functions.

At this time, the coverage of these two modules, segwit and bech32 are quite pretty good:

module	coverage
bech32	99%
segwit	97%

Here a quick analysis of these modules with scc.

Language	File	Lines	Blanks	Comments	Code	Complexity
Erlang	`bech32.erl`	1316	75	413	828	10
Erlang	`segwip.erl`	263	18	135	110	2

In total, the full bech32/segwit implementation with all the coverage was made in 1579 line of Erlang code (including comments and blank lines).

Language	Files	Lines	Blanks	Comments	Code	Complexity
Erlang	2	1579	93	548	938	12
Total	2	1579	93	548	938	12

The final result of estimated cost, schedule effort and people required:

Estimated Cost to Develop (organic): $25,258
Estimated Schedule Effort (organic): 3.398828 months
Estimated People Required (organic): 0.660230

In fact, this implementation in Erlang takes around 3 days with one developer dedicated full time but that's an estimation, other analysis tools could have produced more accurate results.

Conclusion, Optimizations and Future Modifications

This is a first implementation, made from scratch based on Python model implementation, and lot of features are missing, even if this implementation seems flexible, a bit of cleanup will be required.

This code is not optimized, it has been created quickly but it was correctly tested. Unfortunately, if this code is used in production environment, a clear performance problem will happen. This is why it is required to have an idea how to improve this implementation.

Proper (property based testing) and Dialyzer (static-analysis) needs to be added as well as profiling References.

Anyway, it was fun to create bech32 and segwit format in pure Erlang. I hope I will find more time to improve it in a near future, and perhaps use it outside of nostr.

Alternative Implementations

Erlang is perhaps not the best language to deal with bech32 and segwit. Firstly because Erlang is not the best language for doing mathematics, but also because bech32 and segwit have been designed for more classical language. You can, though, find small implementation, more tested and optimized in other languages. Here a list of other bech32 and segwit implementation you can easily find on Github.

Project	Language	File	Lines	Blanks	Comments	Code	Complexity
bech32	C	`segwit_addr.c`	209	11	20	178	90
bech32	C++	`bech32.cpp`	226	27	81	118	43
bech32	C++	`segwit_addr.cpp`	86	10	23	53	27
bech32	Go	`bech32.go`	217	12	25	180	78
bech32	Haskell	`Bech32.hs`	159	25	10	124	18
bech32	JavaScript	`bech32.js`	131	10	19	102	28
bech32	JavaScript	`segwit_addr.js`	91	5	19	67	26
bech32	Python	`segwit_addr.py`	137	13	26	98	11
bech32	Ruby	`bech32.rb`	99	14	30	55	12
bech32	Ruby	`segwit_addr.rb`	86	11	19	56	36
bech32	Rust	`bech32.rs`	230	23	60	147	45
bech32	Rust	`wit_prog.rs`	219	11	73	135	36
bitcoinj	Java	`Bech32.java`	188	17	35	136	38
bitcoinj	Java	`SegwitAddress.java`	464	42	196	226	58
bitcoinaddress	Python	`segwit_addr.py`	123	13	25	85	7
btclib	Python	`bech32.py`	146	28	56	62	28
ocaml-bech32	OCaml	`bech32.ml`	468	43	22	403	54
bitcoin-address-validator	PHP	`Bech32.php`	342	61	80	201	43
bech32	Lisp	`bech32.lisp`	248	25	0	223	19
bech32	Haskell	`Internal.hs`	919	78	293	548	49
bech32-elixir	Elixir	`bech32.ex`	356	47	14	299	12

Based on scc, the less complex is using Python and the most complex is the one in C (headers were not included).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0009-implementing-bech32-and-segwit-address-in-pure-erlang

0009-implementing-bech32-and-segwit-address-in-pure-erlang

README.md

Implementing Bech32 and Segwit Address in Pure Erlang

Bitcoin, Bech32 and Segwit Addresses

Teaching and Learning with Pair Programming

From Wild Implementation to Erlang/OTP

Usage and Example

Under the Hood

Conclusion, Optimizations and Future Modifications

Alternative Implementations

Resources

Files

0009-implementing-bech32-and-segwit-address-in-pure-erlang

Directory actions

More options

Directory actions

More options

Latest commit

History

0009-implementing-bech32-and-segwit-address-in-pure-erlang

Folders and files

parent directory

README.md

Implementing Bech32 and Segwit Address in Pure Erlang

Bitcoin, Bech32 and Segwit Addresses

Teaching and Learning with Pair Programming

From Wild Implementation to Erlang/OTP

Usage and Example

Under the Hood

Conclusion, Optimizations and Future Modifications

Alternative Implementations

Resources

Footnotes