diff --git a/Chapters/01-base64.qmd b/Chapters/01-base64.qmd
index 2a35272..c76958d 100644
--- a/Chapters/01-base64.qmd
+++ b/Chapters/01-base64.qmd
@@ -483,7 +483,7 @@ at the sixth position in both binary sequences we had a 1 value. So any
position where we do not have both binary sequences setted to 1, we get
a 0 bit in the resulting binary sequence.
-We loose information about the original bit values
+We lose information about the original bit values
from both sequences in this case. Because we no longer know
if this 0 bit in the resulting binary sequence was produced by
combining 0 with 0, or 1 with 0, or 0 with 1.
diff --git a/Chapters/01-memory.qmd b/Chapters/01-memory.qmd
index 06eee5f..588495c 100644
--- a/Chapters/01-memory.qmd
+++ b/Chapters/01-memory.qmd
@@ -250,7 +250,7 @@ The example below demonstrates this idea.
#| auto_main: true
#| build_type: "run"
#| eval: false
-// This does not compile succesfully!
+// This does not compile successfully!
const a = [_]u8{0, 1, 2, 3, 4};
for (0..a.len) |i| {
const index = i;
@@ -297,7 +297,7 @@ bugs in your program [@zigdocs, see "Lifetime and Ownership"[^life] and "Undefin
```{zig}
#| auto_main: true
#| build_type: "run"
-// This code compiles succesfully. But it has
+// This code compiles successfully. But it has
// undefined behaviour. Never do this!!!
// ==== Variable
// The `r` object is undefined!
diff --git a/Chapters/01-zig-weird.qmd b/Chapters/01-zig-weird.qmd
index 43976bb..1359815 100644
--- a/Chapters/01-zig-weird.qmd
+++ b/Chapters/01-zig-weird.qmd
@@ -531,7 +531,7 @@ keyword `var` in Zig is similar to using the keywords `let mut` in Rust.
In the code example below, we are creating a new constant object called `age`.
This object stores a number representing the age of someone. However, this code example
-does not compiles succesfully. Because on the next line of code, we are trying to change the value
+does not compile successfully, because on the next line of code, we are trying to change the value
of the object `age` to 25.
The `zig` compiler detects that we are trying to change
@@ -557,7 +557,7 @@ change the value of this object how many times you want over future points
in your source code.
So, using the same code example exposed above, if I change the declaration of the
-`age` object to use the `var` keyword, then, the program gets compiled succesfully.
+`age` object to use the `var` keyword, then the program compiles successfully.
Because now, the `zig` compiler detects that we are changing the value of an
object that allows this behaviour, because it is an "variable object".
@@ -641,7 +641,7 @@ When you assign an object to a underscore, like in the example below, the `zig`
discard the value of this particular object.
You can see in the example below that, this time, the compiler did not
-complain about any "unused constant", and succesfully compiled our source code.
+complain about any "unused constant", and successfully compiled our source code.
```{zig}
#| auto_main: true
diff --git a/Chapters/03-structs.qmd b/Chapters/03-structs.qmd
index 1811d9c..f2e39b1 100644
--- a/Chapters/03-structs.qmd
+++ b/Chapters/03-structs.qmd
@@ -571,7 +571,7 @@ try std.testing.expect(i == 10);
try stdout.print("Everything worked!", .{});
```
-Since this code example was executed succesfully by the `zig` compiler,
+Since this code example was executed successfully by the `zig` compiler,
without raising any errors, then, we known that, after the execution of while loop,
the `i` object is equal to 10. Because if it wasn't equal to 10, then, an error would
be raised by `expect()`.
diff --git a/Chapters/04-http-server.qmd b/Chapters/04-http-server.qmd
index 0d9c961..6cd32b5 100644
--- a/Chapters/04-http-server.qmd
+++ b/Chapters/04-http-server.qmd
@@ -644,7 +644,7 @@ content back to us. It can be any type of content. It can be a web page,
a document file, or some data in a JSON format.
When a client sends a POST HTTP Request, the HTTP Response sent by the server normally have the sole purpose of
-letting the client know if the server processed and stored the data succesfully.
+letting the client know if the server processed and stored the data successfully.
In contrast, when the server receives a GET HTTP Request, then, the server sends the content
that the client asked for in the HTTP Response itself. This demonstrates that the method associated
with the HTTP Request changes a lot on the dynamics and the roles that each party
@@ -1028,7 +1028,7 @@ see the effects of these last changes. First, I execute the program once again,
Then, I open my web browser, and try to connect to the server again, using the URL `localhost:3490`.
This time, instead of getting some sort of an error message from the browser, you will get the message
"Hello World" printed into your web browser. Because this time, the server sended the HTTP Response
-succesfully to the web browser, as demonstrated by @fig-print-zigrun3.
+successfully to the web browser, as demonstrated by @fig-print-zigrun3.
![The Hello World message sent in the HTTP Response](./../Figures/print-zigrun3.png){#fig-print-zigrun3}
diff --git a/Chapters/05-pointers.qmd b/Chapters/05-pointers.qmd
index 91c122e..640ee33 100644
--- a/Chapters/05-pointers.qmd
+++ b/Chapters/05-pointers.qmd
@@ -150,7 +150,7 @@ p.zig:6:12: error: cannot assign to constant
```
If I change the `number` object to be a variable object, by introducing the `var` keyword,
-then, I can succesfully change the value of this object through a pointer, as demonstrated below:
+then, I can successfully change the value of this object through a pointer, as demonstrated below:
```{zig}
#| auto_main: true
diff --git a/Chapters/09-data-structures.qmd b/Chapters/09-data-structures.qmd
index 36a7ab7..ae8d412 100644
--- a/Chapters/09-data-structures.qmd
+++ b/Chapters/09-data-structures.qmd
@@ -427,7 +427,7 @@ pub fn main() !void {
if (hash_table.remove(57709)) {
std.debug.print(
- "Value at key 57709 succesfully removed!\n",
+ "Value at key 57709 successfully removed!\n",
.{}
);
}
@@ -441,7 +441,7 @@ pub fn main() !void {
```
N of values stored: 3
Value at key 50050: 55
-Value at key 57709 succesfully removed!
+Value at key 57709 successfully removed!
N of values stored: 2
```
@@ -458,7 +458,7 @@ and that is why we use the `?` method at the end to get access to the actual val
Also notice that we can remove (or delete) values from a hashtables by using the `remove()` method.
You provide the key that identifies the value that you want to delete, then, the method will
delete this value and return a `true` value as output. This `true` value essentially tells us
-that the method succesfully deleted the value.
+that the method successfully deleted the value.
But this delete operation might not be always successful. For example, you might provide the wrong
key to this method. I mean, maybe you provide
diff --git a/Chapters/09-error-handling.qmd b/Chapters/09-error-handling.qmd
index c805587..c8143cf 100644
--- a/Chapters/09-error-handling.qmd
+++ b/Chapters/09-error-handling.qmd
@@ -22,7 +22,7 @@ In this chapter, I want to discuss how error handling is done in Zig.
We already briefly learned about one of the available strategies to handle errors in Zig,
which is the `try` keyword presented at @sec-main-file. But we still haven't learned about
the other methods, such as the `catch` keyword.
-I also want to discuss in this chapter how enum types are created in Zig.
+I also want to discuss in this chapter how union types are created in Zig.
## Learning more about errors in Zig
@@ -31,7 +31,7 @@ An error is actually a value in Zig [@zigoverview]. In other words, when an erro
it means that somewhere in your Zig codebase, an error value is being generated.
An error value is similar to any integer value that you create in your Zig code.
You can take an error value and pass it as input to a function,
-and you can also cast (or coerce) it into a different type of error value.
+and you can also cast (or coerce) it into a different type of error value.
This have some similarities with exceptions in C++ and Python.
Because in C++ and Python, when an exception happens inside a `try` block,
@@ -39,10 +39,10 @@ you can use a `catch` block (in C++) or an `except` block (in Python)
to capture the exception produced in the `try` block,
and pass it to functions as an input.
-
-Although they are normal values as any other, you cannot ignore error values in your Zig code. Meaning that, if an error
+However, error values in Zig are treated very differently from exceptions.
+First, you cannot ignore error values in your Zig code. This means that, if an error
value appears somewhere in your source code, this error value must be explicitly handled in some way.
-This also means that you cannot discard error values by assigning them to a underscore,
+This also means that you cannot discard error values by assigning them to an underscore,
as you could do with normal values and objects.
Take the source code below as an example. Here we are trying to open a file that does not exist
@@ -70,6 +70,7 @@ t.zig:8:17: error: error set is discarded
t.zig:8:17: note: consider using 'try', 'catch', or 'if'
```
+
### Returning errors from functions
As we described at @sec-main-file, when we have a function that might return an error
@@ -122,18 +123,19 @@ stay on the right side of the exclamation mark. So the syntax format become:
!
```
+
### Error sets
But what about when we have a single function that might return different types of errors?
When you have such a function, you can list
all of these different types of errors that can be returned from this function,
-through a structure in Zig that we call of *error set*.
+through a structure in Zig that we call an *error set*.
-An error set is a special case of an union type.
-It essentially is an union that contains error values in it.
+An error set is a special case of a union type. It is a union that contains error values.
Not all programming languages have a notion of an "union object".
-But in summary, an union is just a list of the options that
-an object can be. For example, a union of `x`, `y` and `z`, means that
+But in summary, a union is just a set of data types.
+Unions allow an object to hold a value of any one of these data types.
+For example, a union of `x`, `y` and `z`, means that
an object can be either of type `x`, or type `y` or type `z`.
We are going to talk in more depth about unions at @sec-unions.
@@ -145,7 +147,7 @@ Take the `resolvePath()` function below as an example, which comes from the
`introspect.zig` module of the Zig Standard Library. We can see in it's return type annotation, that this
function return either: 1) a valid slice of `u8` values (`[]u8`); or, 2) one of the three different
types of error values listed inside the error set (`OutOfMemory`, `Unexpected`, etc.).
-This is an example of use of an error set.
+This is an example of how an error set is used.
```{zig}
@@ -172,11 +174,11 @@ We can see that in the `ReadError` error set that we showed earlier in the `fill
which is defined in the `http.Client` module.
So yes, I presented the `ReadError` as if it was just a standard and single error value, but in fact,
it is an error set defined in the `http.Client` module, and therefore, it actually represents
-a set of different error values that might happen in the `fill()` and other functions.
+a set of different error values that might happen inside the `fill()` function.
Take a look at the `ReadError` definition reproduced below. Notice that we are grouping all of these
-different error values into a single object, and then, we use this object into the return type annotation of the functions.
+different error values into a single object, and then, we use this object in the return type annotation of the functions.
Like the `fill()` function that we showed earlier, or, the `readvDirect()` function from the same module,
which is reproduced below.
@@ -292,7 +294,7 @@ are different and completely separate strategies in the Zig language.
This is uncommon, and different than what happens in other languages. Most
programming languages that adopts the *try catch* pattern (such as C++, R, Python, Javascript, etc.), normally use
-these two keywords in conjunction to form the complete logic to
+these two keywords together to form the complete logic to
properly handle the errors.
Anyway, Zig tries a different approach in the *try catch* pattern.
@@ -307,7 +309,7 @@ but this time, I use `catch` to actually implement a logic to handle the error,
just stopping the execution right away.
More specifically, in this example, I'm using a logger object to record some logs into
-the system, before I return the error, and stops the execution of the program. For example,
+the system, before I return the error, and stop the execution of the program. For example,
this could be some part of the codebase of a complex system that I do not have full control over,
and I want to record these logs before the program crashes, so that I can debug it later
(e.g. maybe I cannot compile the full program, and properly debug it with a debugger. So, these logs might
@@ -333,8 +335,7 @@ But I could also, return a valid value from this block of code, which would
be stored in the `file` object.
Notice that, instead of writing the keyword before the expression that might return the error,
-like we do with `try`,
-we write `catch` after the expression. We can open the pair of pipes (`|`),
+like we do with `try`, we write `catch` after the expression. We can open the pair of pipes (`|`),
which captures the error value returned by the expression, and makes
this error value available in the scope of the `catch` block as the object named `err`.
In other words, because I wrote `|err|` in the code, I can access the error value
@@ -355,7 +356,7 @@ But this parsing process done by the function `parseU64()` may fail, resulting i
The `catch` keyword used in this example provides an alternative value (13) to be used in case
this `parseU64()` function raises an error. So, the expression below essentially means:
"Hey! Please, parse this string into a `u64` for me, and store the results into the
-object `number`. But, if an error occurs, then, return the value `13` instead".
+object `number`. But, if an error occurs, then, use the value `13` instead".
```{zig}
#| eval: false
@@ -363,8 +364,8 @@ const number = parseU64(str, 10) catch 13;
```
So, at the end of this process, the object `number` will contain either a `u64` integer
-that was parsed succesfully from the input string `str`, or, if an error in the
-parsing process occurs, it will contain the `u64` value `13` that was provided by the `catch`
+that was parsed successfully from the input string `str`, or, if an error occurs in the
+parsing process, it will contain the `u64` value `13` that was provided by the `catch`
keyword as the "default", or, the "alternative" value.
@@ -401,10 +402,10 @@ if (parseU64(str, 10)) |number| {
Now, if the expression that you are executing returns different types of error values,
and you want to take a different action in each of these types of error values, the
-`catch` keyword becomes limited.
+`try` and `catch` keywords, as well as the if statement strategy, become limited.
-For this type of situation, the official documentation
-of the language suggests the use of a switch statement with an if statement [@zigdocs].
+For this type of situation, the official documentation of the language suggests
+the use of a switch statement together with an if statement [@zigdocs].
The basic idea is, to use the if statement to execute the expression, and
use the "else branch" to pass the error value to a switch statement, where
you define a different action for each type of error value that might be
@@ -450,8 +451,8 @@ get's stopped because of an error value being generated.
The basic idea is to provide an expression to the `errdefer` keyword. Then,
`errdefer` executes this expression if, and only if, an error occurs
during the execution of the current scope.
-In the example below, we are using an allocator object (that we presented at @sec-allocators)
-to create a new `User` object. If we are succesfull in creating and registering this new user,
+In the example below, we are using an allocator object (which we presented at @sec-allocators)
+to create a new `User` object. If we are successful in creating and registering this new user,
this `create_user()` function will return this new `User` object as it's return value.
However, if for some reason, an error value is generated by some expression
@@ -477,16 +478,16 @@ By using `errdefer` to destroy the `user` object that we have just created,
we garantee that the memory allocated for this `user` object
get's freed, before the execution of the program stops.
Because if the expression `try db.add(user)` returns an error value,
-the execution of our program stops, and we loose all references and control over the memory
+the execution of our program stops, and we lose all references and control over the memory
that we have allocated for the `user` object.
As a result, if we do not free the memory associated with the `user` object before the program stops,
-we cannot free this memory anymore. We simply loose our chance to do the right thing.
+we cannot free this memory anymore. We simply lose our chance to do the right thing.
That is why `errdefer` is essential in this situation.
-Just to make very clear the differences between `defer` (which I described at @sec-defer)
-and `errdefer`, it might be worth to discuss the subject a bit further.
-You might still have the question "why use `errdefer` if we can use `defer` instead?"
-in your mind.
+To make the differences between `defer` and `errdefer` clear
+(which I described at @sec-defer and @sec-errdefer1), it is worth
+discussing the subject a bit further. You might still be asking yourself:
+"why use `errdefer` if we can use `defer` instead?"
Although being similar, the key difference between `errdefer` and `defer` keyword
is when the provided expression get's executed.
@@ -502,13 +503,13 @@ closely about this function, you will notice that this function returns
the `user` object as the result.
In other words, the allocated memory for the `user` object does not get
-freed inside the `create_user()`, if the function returns succesfully.
+freed inside the `create_user()` function, if it returns successfully.
So, if an error does not occur inside this function, the `user` object
is returned from the function, and probably, the code that runs after
this `create_user()` function will be responsible for freeying
the memory of the `user` object.
-But what if an error do occur inside the `create_user()`? What happens then?
+But what if an error occurs inside the `create_user()` function? What happens then?
This would mean that the execution of your code would stop in this `create_user()`
function, and, as a consequence, the code that runs after this `create_user()`
function would simply not run, and, as a result, the memory of the `user` object
@@ -518,8 +519,8 @@ This is the perfect scenario for `errdefer`. We use this keyword to garantee
that our program will free the allocated memory for the `user` object,
even if an error occurs inside the `create_user()` function.
-If you allocate and free some memory for an object in the same scope, then,
-just use `defer` and be happy, `errdefer` have no use for you in such situation.
+If you allocate and free some memory for an object inside the same scope, then,
+just use `defer` and be happy; `errdefer` has no use for you in such a situation.
But if you allocate some memory in a scope A, but you only free this memory
later, in a scope B for example, then, `errdefer` becomes useful to avoid leaking memory
in sketchy situations.
@@ -532,12 +533,12 @@ An union type defines a set of types that an object can be. It is like a list of
options. Each option is a type that an object can assume. Therefore, unions in Zig
have the same meaning, or, the same role as unions in C. They are used for the same purpose.
You could also say that unions in Zig produces a similar effect to
-[`typing.Union` in Python](https://docs.python.org/3/library/typing.html#typing.Union)[^pyunion].
+[using `typing.Union` in Python](https://docs.python.org/3/library/typing.html#typing.Union)[^pyunion].
[^pyunion]:
For example, you might be creating an API that sends data to a data lake, hosted
-in some private cloud infrastructure. Suppose you created different structs in your codebase,
+in some private cloud infrastructure. Suppose you have created different structs in your codebase,
to store the necessary information that you need, in order to connect to the services of
each mainstream data lake service (Amazon S3, Azure Blob, etc.).
@@ -551,10 +552,10 @@ to be either an object of type `AzureBlob`, or type `AmazonS3`, or type `GoogleG
This union allows the `send_event()` function to receive an object of any of these three types
as input in the `lake_target` argument.
-Remember that each of these three types
-(`AmazonS3`, `GoogleGCP` and `AzureBlob`) are separate structs that we defined in
-our source code. So, at first glance, they are separate data types in our source code.
-But is the `union` keyword that unifies them into a single data type called `LakeTarget`.
+Remember that each of these three types (`AmazonS3`, `GoogleGCP` and `AzureBlob`)
+are separate structs that we have defined in our source code. So, at first glance,
+they are separate data types in our source code. But it is the `union` keyword that
+unifies them into a single data type called `LakeTarget`.
```{zig}
#| eval: false
@@ -614,7 +615,7 @@ and you can no longer use them after you instantiate the object.
You can activate another data member by completely redefining the entire enum object.
In the example below, I initially use the `azure` data member. But then, I redefine the
-`target` object to use a new `LakeTarget` object, which uses this time the `google` data member.
+`target` object to use a new `LakeTarget` object, which uses the `google` data member.
```{zig}
#| eval: false
@@ -626,9 +627,9 @@ target = LakeTarget {
};
```
-An curious fact about union types, is that, at first, you cannot use them in switch statements (that we preseted at @sec-switch).
+A curious fact about union types is that, at first, you cannot use them in switch statements (which were presented at @sec-switch).
In other words, if you have an object of type `LakeTarget` for example, you cannot give this object
-to a switch statement as input.
+as input to a switch statement.
But what if you really need to do so? What if you actually need to
provide an "union object" to a switch statement? The answer to this question relies on another special type in Zig,
@@ -640,7 +641,7 @@ below. This type comes from the
[`grammar.zig` module](https://github.com/ziglang/zig/blob/30b4a87db711c368853b3eff8e214ab681810ef9/tools/spirv/grammar.zig)[^grammar]
from the Zig repository. This union type lists different types of registries.
But notice this time, the use of `(enum)` after the `union` keyword. This is what makes
-this union type a tagged union. Also, by being a tagged union, an object of this `Registry` type
+this union type a tagged union. By being a tagged union, an object of this `Registry` type
can be used as input in a switch statement. This is all you have to do. Just add `(enum)`
to your `union` declaration, and you can use it in switch statements.
diff --git a/Chapters/12-file-op.qmd b/Chapters/12-file-op.qmd
index 4b4e293..b751f46 100644
--- a/Chapters/12-file-op.qmd
+++ b/Chapters/12-file-op.qmd
@@ -168,7 +168,7 @@ In C, a "file descriptor" is a `FILE` pointer, but, in Zig, a file descriptor is
This data type (`File`) is described in the `std.fs` module of the Zig Standard Library.
We normally don't create a `File` object directly in our Zig code. Instead, we normally get such object as result when we
open an IO resource. In other words, we normally ask to our OS to open and use a particular IO
-resource, and, if the OS do open succesfully this IO resource, the OS normally handles back to us
+resource, and, if the OS successfully opens this IO resource, the OS normally hands back to us
a file descriptor to this particular IO resource.
So you usually get a `File` object by using functions and methods from the Zig Standard Library
diff --git a/Chapters/14-threads.qmd b/Chapters/14-threads.qmd
index 9b9c798..1188044 100644
--- a/Chapters/14-threads.qmd
+++ b/Chapters/14-threads.qmd
@@ -259,7 +259,7 @@ that this method can return an error in some circunstances. One circunstance
in particular is when you attempt to create a new thread, when you have already
created too much (i.e. you have excedeed the quota of concurrent threads in your system).
-But, if the new thread is succesfully created, the `spawn()` method returns a handler
+But, if the new thread is successfully created, the `spawn()` method returns a handler
object (which is just an object of type `Thread`) to this new thread. You can use
this handler object to effectively control all aspects of the thread.
@@ -1061,7 +1061,7 @@ becomes completely independent from the execution of the main process in your pr
This means that the main process of your program might end before the thread finish it's job,
or vice-versa. The idea is that we have no idea of who is going to finish first. It
becomes a race condition problem.
-In such case, we loose control over this thread, and it's resources are never freed
+In such a case, we lose control over this thread, and its resources are never freed
(i.e. you have leaked resources in the system).
diff --git a/Chapters/14-zig-c-interop.qmd b/Chapters/14-zig-c-interop.qmd
index 9562f8d..4182ecf 100644
--- a/Chapters/14-zig-c-interop.qmd
+++ b/Chapters/14-zig-c-interop.qmd
@@ -324,7 +324,7 @@ converting them into C strings as needed.
But what about using one of the primitive data types that were introduced at @sec-primitive-data-types?
Let's take code exposed below as an example of that. Here, we are giving some float literal values as input
-to the C function `powf()`. Notice that this code example compiles and runs succesfully.
+to the C function `powf()`. Notice that this code example compiles and runs successfully.
```{zig}
#| eval: false
@@ -348,7 +348,7 @@ Once again, because the `zig` compiler does not associate a specific data type w
`15.68` and `2.32` at first glance, the compiler can automatically convert these values
into their C `float` (or `double`) equivalents, before it passes to the `powf()` C function.
Now, even if I give an explicit Zig data type to these literal values, by storing them into a Zig object,
-and explicit annotating the type of these objects, the code still compiles and runs succesfully.
+and explicitly annotating the type of these objects, the code still compiles and runs successfully.
```{zig}
#| eval: false
@@ -426,7 +426,7 @@ the `@ptrCast()` function is involved.
In the example below, we are using this function to cast our `path` object
into a C pointer to an array of bytes. Then, we pass this C pointer as input
-to the `fopen()` function. Notice that this code example compiles succesfully
+to the `fopen()` function. Notice that this code example compiles successfully
with no errors.
```{zig}
diff --git a/_freeze/Chapters/01-base64/execute-results/html.json b/_freeze/Chapters/01-base64/execute-results/html.json
index 3a9679e..03aab53 100644
--- a/_freeze/Chapters/01-base64/execute-results/html.json
+++ b/_freeze/Chapters/01-base64/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "86f4c387e93059b02c785052d074fb7f",
+ "hash": "eb808c8573d58da0bd1b1f3d76b28a9c",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 1 - Building a base64 encoder/decoder {#sec-base64}\n\nAs our first small project, I want to implement a base64 encoder/decoder with you.\nBase64 is an encoding system which translates binary data to text.\nA big chunk of the web uses base64 to deliver binary data to systems\nthat can only read text data.\n\nThe most common example of a modern use case for base64 is essentially any email system,\nlike GMail, Outlook, etc. Because email systems normally use\nthe Simple Mail Transfer Protocol (SMTP), which is a web protocol\nthat supports only text data. So, if you need, for any reason, to\nsend a binary file (like for example, a PDF, or an Excel file) as\nan attachment in your email, these binary files are normally\nconverted to base64, before they are included in the SMTP message.\nSo, the base64 encoding is largely used in these email systems to include\nbinary data into the SMTP message.\n\n\n\n\n\n\n## How the base64 algorithm work?\n\nBut how exactly the algorithm behind the base64 encoding works? Let's discuss that. First, I will\nexplain the base64 scale, which is the 64-character scale that is the basis for\nthe base64 encoding system.\n\nAfter that, I explain the algorithm behind a base64 encoder, which is the part of the algorithm that is responsible for encoding messages\ninto the base64 encoding system. 
Then, after that, I explain the algorithm behind a base64 decoder, which is\nthe part of the algorithm that is responsible for translating base64 messages back into their original meaning.\n\nIf you are unsure about the differences between an \"encoder\" and a \"decoder\",\ntake a look at @sec-encode-vs-decode.\n\n\n### The base64 scale {#sec-base64-scale}\n\nThe base64 encoding system is based on a scale that goes from 0 to 63 (hence the name).\nEach index in this scale is represented by a character (it is a scale of 64 characters).\nSo, in order to convert some binary data, to the base64 encoding, we need to convert each binary number to the corresponding\ncharacter in this \"scale of 64 characters\".\n\nThe base64 scale starts with all ASCII uppercase letters (A to Z) which represents\nthe first 25 indexes in this scale (0 to 25). After that, we have all ASCII lowercase letters\n(a to z), which represents the range 26 to 51 in the scale. After that, we\nhave the one digit numbers (0 to 9), which represents the indexes from 52 to 61 in the scale.\nFinally, the last two indexes in the scale (62 and 63) are represented by the characters `+` and `/`,\nrespectively.\n\nThese are the 64 characters that compose the base64 scale. The equal sign character (`=`) is not part of the scale itself,\nbut it is a special character in the base64 encoding system. 
This character is used solely as a suffix, to mark the end of the character sequence,\nor, to mark the end of meaningful characters in the sequence.\n\nThe bulletpoints below summarises the base64 scale:\n\n- range 0 to 25 is represented by: ASCII uppercase letters `-> [A-Z]`;\n- range 26 to 51 is represented by: ASCII lowercase letters `-> [a-z]`;\n- range 52 to 61 is represented by: one digit numbers `-> [0-9]`;\n- index 62 and 63 are represented by the characters `+` and `/`, respectively;\n- the character `=` represents the end of meaningful characters in the sequence;\n\n\n\n\n### Creating the scale as a lookup table {#sec-base64-table}\n\nThe best way to represent this scale in code, is to represent it as a *lookup table*.\nLookup tables are a classic strategy in computer science to speed calculations. The basic idea\nis to replace a runtime calculation (which can take a long time to be done) with a basic array indexing\noperation.\n\nInstead of calculating the results everytime you need them, you calculate all possible results at once, and then, you store them in an array\n(which behaves lake a \"table\"). Then, every time you need to use one of the characters in the base64 scale, instead of\nusing many resources to calculate the exact character to be used, you simply retrieve this character\nfrom the array where you stored all the possible characters in the base64 scale.\nWe retrieve the character that we need directly from memory.\n\nWe can start building a Zig struct to store our base64 decoder/encoder logic.\nWe start with the `Base64` struct below. For now, we only have one single data member in this\nstruct, i.e. the member `_table`, which represents our lookup table. 
We also have an `init()` method,\nto create a new instance of a `Base64` object, and a `_char_at()` method, which is a\n\"get character at index $x$\" type of function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Base64 = struct {\n    _table: *const [64]u8,\n\n    pub fn init() Base64 {\n        const upper = \"ABCDEFGHIJKLMNOPQRSTUVWXYZ\";\n        const lower = \"abcdefghijklmnopqrstuvwxyz\";\n        const numbers_symb = \"0123456789+/\";\n        // `++` concatenates the three strings at compile time,\n        // producing the full 64-character scale.\n        return Base64{\n            ._table = upper ++ lower ++ numbers_symb,\n        };\n    }\n\n    pub fn _char_at(self: Base64, index: u8) u8 {\n        return self._table[index];\n    }\n};\n```\n:::\n\n\n\n\n\nIn other words, the `_char_at()` method is responsible for getting the character in the lookup\ntable (i.e. the `_table` struct data member) that corresponds to a particular index in the\n\"base64 scale\". So, in the example below, we know that the character that corresponds to the\nindex 28 in the \"base64 scale\" is the character \"c\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst base64 = Base64.init();\ntry stdout.print(\n    \"Character at index 28: {c}\\n\",\n    .{base64._char_at(28)}\n);\n```\n:::\n\n\n\n\n```\nCharacter at index 28: c\n```\n\n\n\n### A base64 encoder {#sec-base64-encoder-algo}\n\nThe algorithm behind a base64 encoder usually works on a window of 3 bytes. This is because each byte has\n8 bits, so 3 bytes form a set of $8 \\\\times 3 = 24$ bits. This is desirable for the base64 algorithm, because\n24 bits can be divided evenly into $24 / 6 = 4$ groups of 6 bits each.\n\nTherefore, the base64 algorithm works by converting 3 bytes at a time\ninto 4 characters from the base64 scale. It keeps iterating through the input string,\n3 bytes at a time, producing 4 characters\nper iteration, until it hits the end of the input string.\n\nNow, you may ask: what if you have a string whose number of bytes\nis not divisible by 3? 
What happens? For example, if you have a string\nthat contains only two characters/bytes, such as \"Hi\", how would the\nalgorithm behave in such a situation? You can find the answer at @fig-base64-algo1.\nYou can see at @fig-base64-algo1 that the string \"Hi\", when converted to base64,\nbecomes the string \"SGk=\":\n\n![The logic behind a base64 encoder](./../Figures/base64-encoder-flow.png){#fig-base64-algo1}\n\nTaking the string \"Hi\" as an example, we have 2 bytes, or 16 bits in total. So, we lack a full byte (8 bits)\nto complete the window of 24 bits that the base64 algorithm likes to work on. The first thing that\nthe algorithm does is to check how to divide the input bytes into groups of 6 bits.\n\nIf the algorithm notices that there is a group of 6 bits that has some bits in it but is not full\n(in other words, $0 < nbits < 6$, where $nbits$ is the number of bits in the group), meaning that it lacks\nsome bits to fill the 6-bit requirement, the algorithm simply adds extra zeros to this group\nto fill the space that it needs. That is why at @fig-base64-algo1, on the third group after the 6-bit transformation,\n2 extra zeros were added to fill the gap in this group.\n\nWhen we have a 6-bit group that is not completely full, like the third group, extra zeros\nare added to fill the gap. But what about when an entire 6-bit group is empty, or simply doesn't exist? This is the case of the fourth 6-bit group exposed at\n@fig-base64-algo1.\n\nThis fourth group is necessary, because the algorithm works on 4 groups of 6 bits.\nBut the input string does not have enough bytes to create a fourth 6-bit group.\nEvery time this happens, where an entire group of 6 bits is empty,\nthis group becomes a \"padding group\". Every \"padding group\" is mapped to\nthe character `=` (equal sign), which marks the end\nof meaningful characters in the sequence. 
Hence, every time the algorithm produces a\n\"padding group\", this group is automatically mapped to `=`.\n\nAs another example, if you give the string \"0\" as input to a base64 encoder, this string is\ntranslated into the base64 sequence \"MA==\".\nThe character \"0\" is, in binary, the sequence `00110000`[^zero-note]. So, with the 6-bit transformation\nexposed at @fig-base64-algo1, this single character produces these two 6-bit groups: `001100` and `000000` (the indexes 12 and 0, which map to the characters `M` and `A`).\nThe remaining two 6-bit groups become \"padding groups\". That is why the last\ntwo characters in the output sequence (MA==) are `==`.\n\n\n[^zero-note]: Notice that the character \"0\" is different from the actual number 0, which is simply `00000000` in binary. The character \"0\" is stored as its ASCII code, which is 48.\n\n### A base64 decoder {#sec-base64-decoder-algo}\n\nThe algorithm behind a base64 decoder is essentially the inverse process of a base64 encoder.\nA base64 decoder needs to translate base64 messages back into their original meaning,\ni.e. into the original sequence of binary data.\n\nA base64 decoder usually works on a window of 4 bytes, because it wants to convert these 4 bytes\nback into the original sequence of 3 bytes that was converted into 4 groups of 6 bits by the\nbase64 encoder. Remember, in a base64 decoder we are essentially reverting the process performed\nby the base64 encoder.\n\nEach byte in the input string (the base64 encoded string) normally contributes to re-create\ntwo different bytes in the output (the original binary data).\nIn other words, each byte that comes out of a base64 decoder is created by transforming and merging two different\nbytes from the input. You can visualize this relationship at @fig-base64-algo2:\n\n![The logic behind a base64 decoder](./../Figures/base64-decoder-flow.png){#fig-base64-algo2}\n\nThe exact transformations, or, the exact steps applied to each byte from the input to transform them\ninto the bytes of the output, are a bit tricky to visualize in a figure like this. 
Because of that, I have\nsummarized these transformations as \"Some bit shifting and additions ...\" in the figure. These transformations\nwill be described in depth later.\n\nBesides that, if you look again at @fig-base64-algo2, you will notice that the character `=` was completely\nignored by the algorithm. Remember, this is just a special character that marks the end of meaningful characters\nin the base64 sequence. So, every `=` character in a base64 encoded sequence should be ignored by a base64 decoder.\n\n\n## Difference between encode and decode {#sec-encode-vs-decode}\n\nIf you don't have any previous experience with base64, you might not understand the differences\nbetween \"encode\" and \"decode\". Essentially, \"encode\" means translating a message into a different representation,\nand \"decode\" means translating it back into its original form. Be aware that base64 is not a form of encryption: it uses no secret key,\nso anyone can decode a base64 message. It is also different from hashing algorithms, like MD5, whose output cannot be\nconverted back into the original message.\n\nIn our case, \"encode\" means that we want to translate some message into\nthe base64 encoding system. We want to produce the sequence of base64 characters that represent this\noriginal message in the base64 encoding system.\n\nIn contrast, \"decode\" represents the inverse process.\nWe want to decode, or, in other words, translate a base64 message back to its original content.\nSo, in this process we get a sequence of base64 characters as input, and produce as output\nthe binary data that is represented by this sequence of base64 characters.\n\nAny base64 library is normally composed of these two parts: 1) the encoder, which is a function that encodes\n(i.e. 
it converts) any sequence of binary data into a sequence of base64 characters; 2) the decoder, which is a function\nthat converts a sequence of base64 characters back into the original sequence of binary data.\n\n\n\n## Calculating the size of the output {#sec-base64-length-out}\n\nOne task that we need to do is to calculate how much space we need to reserve for the\noutput of both the encoder and the decoder. This is simple math, and can be done easily in Zig,\nbecause every array and slice has its length (its number of elements) easily accessible through\nthe `.len` field.\n\nFor the encoder, the logic is the following: for every 3 bytes that we find in the input,\n4 new bytes are created in the output. So, we take the number of bytes in the input, divide it\nby 3, round the result up (a ceiling operation), and then multiply the result by 4. That way, we get the total\nnumber of bytes that will be produced by the encoder in its output.\n\nThe `_calc_encode_length()` function below encapsulates this logic.\nInside this function, we take the length of the input array,\nwe divide it by 3, and apply a ceiling operation over the result by using the\n`divCeil()` function from the Zig Standard Library. Lastly, we multiply\nthe end result by 4 to get the answer we need.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn _calc_encode_length(input: []const u8) !usize {\n    // Inputs shorter than 3 bytes still produce one full group of 4 characters.\n    if (input.len < 3) {\n        const n_output: usize = 4;\n        return n_output;\n    }\n    const n_output: usize = try std.math.divCeil(\n        usize, input.len, 3\n    );\n    return n_output * 4;\n}\n```\n:::\n\n\n\n\n\nAlso, you might have noticed that, if the input length is less than 3 bytes, the output length of the encoder is\nalways 4 bytes. 
This is the case for every input with fewer than 3 bytes, because, as I described at @sec-base64-encoder-algo,\nthe algorithm always produces enough \"padding groups\" in the end result to complete the 4-byte window.\n\nNow, for the decoder, we just need to apply the inverse logic: for every 4 bytes in the input, 3 bytes\nwill be produced in the output of the decoder. This is only roughly true, because we also need to\ntake the `=` character into account, which is ignored by the decoder, as we described at @sec-base64-decoder-algo and\nat @fig-base64-algo2. As a result, this calculation can overestimate the output size by 1 or 2 bytes when the input ends with `=`. But we can ignore this fact for now, just to keep things simple.\n\nThe function `_calc_decode_length()` exposed below summarizes this logic. It is very similar\nto the function `_calc_encode_length()`. Only the division part is different (we divide by 4 and multiply by 3), as well as the special\ncase where we have fewer than 4 bytes in the input to work on. Also notice that this time, we apply\na floor operation over the output of the division, by using the `divFloor()`\nfunction (instead of a ceiling operation with `divCeil()`).\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn _calc_decode_length(input: []const u8) !usize {\n    // Inputs shorter than 4 characters are treated as a single window of 3 bytes.\n    if (input.len < 4) {\n        const n_output: usize = 3;\n        return n_output;\n    }\n    const n_output: usize = try std.math.divFloor(\n        usize, input.len, 4\n    );\n    return n_output * 3;\n}\n```\n:::\n\n\n\n\n\n## Building the encoder logic {#sec-encoder-logic}\n\nIn this section, we start building the logic behind the `encode()` function, which\nwill be responsible for encoding messages into the base64 encoding system.\nIf you are impatient and want to see the full source code of the implementation\nfor this base64 encoder/decoder right now, you can find it in the `ZigExamples` folder in the official repository of\nthis book[^zig-base64-algo].\n\n[^zig-base64-algo]: .\n\n\n\n### The 6-bit transformation {#sec-6bit-transf}\n\nThe 6-bit transformation presented at 
@fig-base64-algo1 is the core part of the base64 encoder algorithm.\nBy understanding how this transformation is made in code, the rest of the algorithm becomes much simpler\nto comprehend.\n\nIn essence, this 6-bit transformation is made with the help of bitwise operators.\nBitwise operators are essential to any type of low-level operation that is done at the bit level. For the specific case of the base64 algorithm,\nthe operators *bit shift to the left* (`<<`), *bit shift to the right* (`>>`), and *bitwise and* (`&`) are used. They\nare the core solution for the 6-bit transformation.\n\nThere are 3 different scenarios that we need to take into account in this transformation. First, there is the perfect scenario,\nwhere we have a full window of 3 bytes to work on. Second, we have the scenario where we have a window of only\ntwo bytes to work with. And last, we have the scenario where we have a window of one single byte.\n\nIn each of these 3 scenarios, the 6-bit transformation works a bit differently. To make the explanation\neasier, I will use the variable `output` to refer to the bytes in the output of the base64 encoder,\nand the variable `input` to refer to the bytes in the input of the encoder.\n\n\nSo, if you have the perfect window of 3 bytes, these are the steps for the 6-bit transformation:\n\n1. `output[0]` is produced by moving the bits from `input[0]` two positions to the right.\n1. `output[1]` is produced by summing two components. First, take the last two bits from `input[0]`, then move them four positions to the left. Second, move the bits from `input[1]` four positions to the right. Sum these two components.\n1. `output[2]` is produced by summing two components. First, take the last four bits from `input[1]`, then move them two positions to the left. Second, move the bits from `input[2]` six positions to the right. Sum these two components.\n1. 
`output[3]` is produced by taking the last six bits from `input[2]`.\n\n\nThis is the perfect scenario, where we have a full window of 3 bytes to work on.\nJust to make things as clear as possible, @fig-encoder-bitshift demonstrates visually how\nstep 2 mentioned above works. The 2nd byte in the `output` of the encoder is made by taking the 1st byte (dark purple)\nand the 2nd byte (orange) from the input. You can see that, at the end of the process, we get a new\nbyte that contains the last 2 bits from the 1st byte in the `input`, and the first 4 bits\nfrom the 2nd byte in the `input`.\n\n![How the 2nd byte in the output of the encoder is produced from the 1st byte (dark purple) and the 2nd byte (orange) of the input.](../Figures/base64-encoder-bit-shift.png){#fig-encoder-bitshift}\n\nOn the other hand, we must be prepared for the cases where we do not have the perfect window of 3 bytes.\nIf you have a window of 2 bytes, then steps 3 and 4, which produce the bytes `output[2]` and `output[3]`, change a little bit,\nand they become:\n\n- `output[2]` is produced by taking the last 4 bits from `input[1]` and moving them two positions to the left.\n- `output[3]` is the character `'='`.\n\n\nFinally, if you have a window of a single byte, then steps 2 to 4, which produce the bytes `output[1]`, `output[2]` and `output[3]`, change,\nbecoming:\n\n- `output[1]` is produced by taking the last two bits from `input[0]` and moving them four positions to the left.\n- `output[2]` and `output[3]` are both the character `=`.\n\n\nIf these bullet points were a bit confusing for you, you may find @tbl-transf-6bit more intuitive.\nThis table unifies all this logic in a single place. 
Notice that\nthis table also provides the exact expression in Zig that creates the corresponding\nbyte in the output.\n\n\n::: {#tbl-transf-6bit}\n\n| Number of bytes in the window | Byte index in the output | In code |\n|-------------------------------|--------------------------|----------------------------------------------|\n| 3 | 0 | `input[0] >> 2` |\n| 3 | 1 | `((input[0] & 0x03) << 4) + (input[1] >> 4)` |\n| 3 | 2 | `((input[1] & 0x0f) << 2) + (input[2] >> 6)` |\n| 3 | 3 | `input[2] & 0x3f` |\n| 2 | 0 | `input[0] >> 2` |\n| 2 | 1 | `((input[0] & 0x03) << 4) + (input[1] >> 4)` |\n| 2 | 2 | `((input[1] & 0x0f) << 2)` |\n| 2 | 3 | `'='` |\n| 1 | 0 | `input[0] >> 2` |\n| 1 | 1 | `((input[0] & 0x03) << 4)` |\n| 1 | 2 | `'='` |\n| 1 | 3 | `'='` |\n\n: How the 6-bit transformation translates into code in different window settings.\n\n:::\n\n\n\n\n\n\n### Bit-shifting in Zig\n\nBit-shifting in Zig works similarly to bit-shifting in C.\nAll bitwise operators that exist in C are available in Zig.\nOne difference is that, in Zig, the shift amount (the right operand of `<<` and `>>`) must either be known at compile time,\nor have an unsigned integer type whose bit count is the log2 of the bit count of the left operand (for example, `u3` when shifting a `u8`).\nHere, in the base64 encoder algorithm, these operators are essential\nto produce the result we want.\n\nFor those who are not familiar with these operators, they are\noperators that work at the bit level of your values.\nThis means that these operators take the bits that form the value\nyou have, and change them in some way. This ultimately also changes\nthe value itself, because the binary representation of this value\nchanges.\n\nWe have already seen at @fig-encoder-bitshift the effect produced by a bit-shift.\nBut let's use the first byte in the output of the base64 encoder as another example of what\nbit-shifting means. This is the easiest byte of the 4 bytes in the output\nto build. 
We only need to move the bits from the first byte in the input two positions to the right,\nwith the *bit shift to the right* (`>>`) operator.\n\nIf we take the string \"Hi\" that we used at @fig-base64-algo1 as an example, the first byte in\nthis string is \"H\", which is the sequence `01001000` in binary.\nIf we move the bits of this byte two places to the right, we get the sequence `00010010` as a result.\nThis binary sequence is the value `18` in decimal, and also the value `0x12` in hexadecimal.\nNotice that the first 6 bits of \"H\" were moved to the end of the byte, while its last 2 bits were discarded.\nWith this operation, we get the first byte of the output.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n    const input = \"Hi\";\n    // 'H' is 01001000; shifting it 2 positions to the right gives 00010010.\n    try stdout.print(\"{d}\\n\", .{input[0] >> 2});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n18\n```\n\n\n:::\n:::\n\n\n\n\nIf you recall @fig-base64-algo1, the first byte present in the output should\nbe equivalent to the 6-bit group `010010`. Although visually different, the\nsequences `010010` and `00010010` represent the same value.\nThey both represent the number 18 in decimal, and the value `0x12` in hexadecimal.\n\nSo, don't take the \"6-bit group\" idea too literally. We do not necessarily need to\nget a 6-bit sequence as a result. As long as the value of the 8-bit sequence we get is the same\nas the value of the 6-bit sequence, we are in the clear.\n\n\n\n### Selecting specific bits with the `&` operator\n\nIf you come back to @sec-6bit-transf, you will see that, in order to produce\nthe second and third bytes in the output, we need to select specific\nbits from the first and second bytes in the input string. But how\ncan we do that? The answer lies in the *bitwise and* (`&`) operator.\n\n@fig-encoder-bitshift already showed you what effect this `&` operator\nhas on the bits of its operands. 
But let's describe it more precisely.\n\nIn summary, the `&` operator performs a logical conjunction (AND) operation\nbetween the bits of its operands. In more detail, the operator `&`\ncompares each bit of the first operand to the corresponding bit of the second operand.\nIf both bits are 1, the corresponding result bit is set to 1.\nOtherwise, the corresponding result bit is set to 0 [@microsoftbitwiseand].\n\nSo, if we apply this operator to the binary sequences `01000100` and `00001101`,\nthe result of this operation is the binary sequence `00000100`, because the sixth position\nis the only one where both binary sequences have a 1 bit. In any\nposition where the two binary sequences are not both set to 1, we get\na 0 bit in the resulting binary sequence.\n\nWe lose information about the original bit values\nfrom both sequences in this case, because we no longer know\nif a 0 bit in the resulting binary sequence was produced by\ncombining 0 with 0, 1 with 0, or 0 with 1.\n\nAs an example, suppose you have the binary sequence `10010111`, which is the number 151 in decimal. How\ncan we get a new binary sequence which contains only the third and\nfourth bits of this sequence?\n\nWe just need to combine this sequence with `00110000` (`0x30` in hexadecimal) using the `&` operator.\nA sequence used in this way is usually called a *bit mask*.\nNotice that only the third and fourth positions in this mask are set to 1. As a consequence, only the\nthird and fourth bits of the original sequence can be preserved in the output. 
All the remaining positions\nare set to zero in the output sequence, which is `00010000` (the number 16 in decimal).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n    const bits = 0b10010111;\n    // The mask 0b00110000 keeps only the third and fourth bits.\n    try stdout.print(\"{d}\\n\", .{bits & 0b00110000});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n16\n```\n\n\n:::\n:::\n\n\n\n\nThis is exactly the role of the masks `0x03` (`00000011`), `0x0f` (`00001111`) and `0x3f` (`00111111`) at @tbl-transf-6bit:\nthey select the last 2, 4 and 6 bits of a byte, respectively.\n\n\n\n### Allocating space for the output\n\nAs I described at @sec-stack, to store an object in the stack,\nthis object needs to have a known and fixed length at compile-time. This is an important\nlimitation for our base64 encoder/decoder, because the size of\nthe output (from both the encoder and decoder) depends\ndirectly on the size of the input, which is only known at runtime.\n\nWith this in mind, we cannot know at compile time the\nsize of the output of either the encoder or the decoder.\nAnd if we can't know the size of the output at compile time,\nthis means that we cannot store the output of the encoder\nor the decoder in the stack.\n\nConsequently, we need to store this output on the heap,\nand, as I commented at @sec-heap, we can only\nstore objects in the heap by using allocator objects.\nSo, one of the arguments to both the `encode()` and `decode()`\nfunctions needs to be an allocator object, because\nwe know for sure that, at some point inside the body of these\nfunctions, we need to allocate space on the heap to\nstore their output.\n\nThat is why both the `encode()` and `decode()` functions that I\npresent in this book have an argument called `allocator`,\nwhich receives an allocator object as input, identified by\nthe type `std.mem.Allocator` from the Zig Standard Library.\n\n\n\n### Writing the `encode()` function\n\nAt this point, we have a basic understanding of how the bitwise operators work, and how\nexactly they help us achieve the result we want. 
We can now encapsulate\nall the logic that we have described at @fig-base64-algo1 and @tbl-transf-6bit into a nice\nfunction that we can add to the `Base64` struct definition that we started at @sec-base64-table.\n\nYou can find the `encode()` function below. Notice that the first argument of this function\nis the `Base64` struct itself (`self`). Therefore, this argument clearly signals\nthat this function is a method of the `Base64` struct.\n\nBecause the `encode()` function itself is fairly long,\nI intentionally omitted the `Base64` struct definition in this source code,\nfor brevity. So, just remember that this function is a public function (or a public method) of the\n`Base64` struct.\n\nFurthermore, this `encode()` function has two other arguments:\n\n1. `allocator` is an allocator object to use in the necessary memory allocations;\n2. `input` is the input sequence of bytes that you want to encode in base64.\n\nI described everything you need to know about allocator objects at @sec-allocators.\nSo, if you are not familiar with them, I highly recommend you to come back to\nthat section and read it.\nBy looking at the `encode()` function, you will see that we use this\nallocator object to allocate enough memory to store the output of the\nencoding process.\n\nThe main for loop in the function is responsible for iterating through the entire input string.\nIn every iteration, we copy the current byte into a temporary buffer (`buf`), and we use a `count` variable to keep track of how many bytes\nare currently stored in this buffer. 
When `count` reaches 3, we encode the 3 bytes that we have accumulated\nin the temporary buffer (`buf`).\n\nAfter encoding these 3 bytes and storing the result in the `out` slice, we reset\nthe `count` variable to zero, and start counting again on the next iteration of the loop.\nIf the loop hits the end of the string and the `count` variable is 1 or 2, it means that\nthe temporary buffer contains the last 1 or 2 bytes from the input, which did not form a full window.\nThat is why we have two `if` statements after the for loop, to deal with each possible case.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn encode(self: Base64,\n              allocator: std.mem.Allocator,\n              input: []const u8) ![]u8 {\n\n    if (input.len == 0) {\n        // Return an empty slice that is also owned by `allocator`.\n        return allocator.alloc(u8, 0);\n    }\n\n    const n_out = try _calc_encode_length(input);\n    const out = try allocator.alloc(u8, n_out);\n    var buf = [3]u8{ 0, 0, 0 };\n    var count: u8 = 0;\n    var iout: usize = 0;\n\n    for (input) |byte| {\n        buf[count] = byte;\n        count += 1;\n        // A full window of 3 bytes produces 4 base64 characters.\n        if (count == 3) {\n            out[iout] = self._char_at(buf[0] >> 2);\n            out[iout + 1] = self._char_at(\n                ((buf[0] & 0x03) << 4) + (buf[1] >> 4)\n            );\n            out[iout + 2] = self._char_at(\n                ((buf[1] & 0x0f) << 2) + (buf[2] >> 6)\n            );\n            out[iout + 3] = self._char_at(buf[2] & 0x3f);\n            iout += 4;\n            count = 0;\n        }\n    }\n\n    // One leftover byte: two padding characters.\n    if (count == 1) {\n        out[iout] = self._char_at(buf[0] >> 2);\n        out[iout + 1] = self._char_at(\n            (buf[0] & 0x03) << 4\n        );\n        out[iout + 2] = '=';\n        out[iout + 3] = '=';\n    }\n\n    // Two leftover bytes: one padding character.\n    if (count == 2) {\n        out[iout] = self._char_at(buf[0] >> 2);\n        out[iout + 1] = self._char_at(\n            ((buf[0] & 0x03) << 4) + (buf[1] >> 4)\n        );\n        out[iout + 2] = self._char_at(\n            (buf[1] & 0x0f) << 2\n        );\n        out[iout + 3] = '=';\n        iout += 4;\n    }\n\n    return out;\n}\n```\n:::\n\n\n\n\n\n\n## Building the decoder logic {#sec-decoder-logic}\n\nNow, we can focus on writing the base64 decoder logic. Remember from @fig-base64-algo2 that\na base64 decoder performs the inverse process of an encoder. 
So, all we need to do is to\nwrite a `decode()` function that performs the inverse of the process that I described at @sec-encoder-logic.\n\n\n### Mapping base64 characters to their indexes {#sec-map-base64-index}\n\nOne thing that we need to do, in order to decode a base64-encoded message, is to calculate\nthe index in the base64 scale of every base64 character that we encounter in the decoder input.\n\nIn other words, the decoder receives a sequence of base64 characters as input. We need\nto translate this sequence of characters into a sequence of indexes, i.e. the index of each\ncharacter in the base64 scale. This way, we get back the values (numbers from 0 to 63)\nthat were calculated in the 6-bit transformation step of the encoder.\n\nThere are probably better/faster ways to calculate this, for example, by using a second lookup table\nthat maps each character directly to its index in the scale. But for now, I am satisfied with a simple \"brute force\" strategy.\nThe `_char_index()` function below contains this strategy.\n\nWe are essentially looping through the *lookup table* with the base64 scale,\nand comparing the character we got with each character in the base64 scale.\nIf these characters match, we return the index of this character in the\nbase64 scale as the result.\n\nNotice that if the input character is `'='`, the function returns the index 64, which is\n\"out of range\" in the scale. This is intentional: as I described at @sec-base64-scale,\nthe character `'='` does not belong to the base64 scale itself.\nIt is a special padding character that does not represent any value, and the decoder uses the special index 64 to detect it.\n\nAlso notice that this `_char_index()` function is a method of our `Base64` struct,\nbecause of the `self` argument. 
Again, I have omitted the `Base64` struct definition in this example\nfor brevity.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn _char_index(self: Base64, char: u8) u8 {\n    if (char == '=')\n        return 64;\n    var index: u8 = 0;\n    // Check all 64 positions of the scale (0 to 63).\n    for (0..64) |i| {\n        // `i` is a `usize`, so it is explicitly narrowed to `u8`.\n        if (self._char_at(@intCast(i)) == char) {\n            index = @intCast(i);\n            break;\n        }\n    }\n\n    return index;\n}\n```\n:::\n\n\n\n\n\n\n### The 6-bit transformation\n\nOnce again, the core part of the algorithm is the 6-bit transformation.\nIf we understand the necessary steps to perform this transformation, the rest\nof the algorithm becomes much easier.\n\nFirst of all, before we actually go to the 6-bit transformation,\nwe need to use `_char_index()` to convert the sequence of base64 characters\ninto a sequence of indexes, as the snippet below does (assuming that `input` holds a single window of 4 characters).\nThe result of `_char_index()` is stored in a temporary buffer, and this temporary buffer\nis what we are going to use in the 6-bit transformation, instead of the actual `input` object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (0..input.len) |i| {\n    buf[i] = self._char_index(input[i]);\n}\n```\n:::\n\n\n\n\nNow, instead of producing 4 bytes (or 4 characters) as output for each window of 3 bytes in the input,\na base64 decoder produces 3 bytes as output for each window of 4 characters in the input.\nOnce again, it is the inverse process.\n\nSo, the steps to produce the 3 bytes in the output are:\n\n1. `output[0]` is produced by summing two components. First, move the bits from `buf[0]` two positions to the left. Second, move the bits from `buf[1]` four positions to the right. Then, sum these two components.\n1. `output[1]` is produced by summing two components. First, move the bits from `buf[1]` four positions to the left. Second, move the bits from `buf[2]` two positions to the right. Then, sum these two components.\n1. `output[2]` is produced by summing two components. 
First, move the bits from `buf[2]` six positions to the left. Then, sum the result with `buf[3]`.\n\nNotice that the bits moved past the left end of a byte are simply discarded. For example, when `buf[1]` is moved four positions to the left, the two highest bits of its 6-bit value are lost, which is fine, because they were already used (through `buf[1] >> 4`) to build `output[0]`.\n\nBefore we continue, let's try to visualize how these transformations recreate the original bytes that we had\nbefore the encoding process. First, think back to the 6-bit transformation performed by the encoder, described at @sec-encoder-logic.\nThe first byte in the output of the encoder is produced by moving the bits in the first byte of the input two positions to the right.\n\nIf, for example, the first byte in the input of the encoder was the sequence `ABCDEFGH`, then the first byte in the output of the encoder would be\n`00ABCDEF` (this sequence would be the first byte in the input of the decoder). Now, if the second byte in the input of the encoder was the sequence\n`IJKLMNOP`, then the second byte in the encoder output would be `00GHIJKL` (as we demonstrated at @fig-encoder-bitshift).\n\nHence, if the sequences `00ABCDEF` and `00GHIJKL` are the first and second bytes, respectively, in the input of the decoder,\n@fig-decoder-bitshift demonstrates visually how these two bytes are transformed into the first byte of the output of the decoder.\nNotice that the output byte is the sequence `ABCDEFGH`, which is the original byte from the input of the encoder.\n\n![How the 1st byte in the decoder output is produced from the 1st byte (dark purple) and the 2nd byte (orange) of the input](../Figures/base64-decoder-bit-shift.png){#fig-decoder-bitshift}\n\n@tbl-6bit-decode presents how the three steps described earlier translate into Zig code:\n\n\n\n::: {#tbl-6bit-decode}\n\n| Byte index in the output | In code |\n|--------------------------|---------------------------------|\n| 0 | `(buf[0] << 2) + (buf[1] >> 4)` |\n| 1 | `(buf[1] << 4) + (buf[2] >> 2)` |\n| 2 | `(buf[2] << 6) + buf[3]` |\n\n: The necessary steps for the 6-bit transformation in the decode process.\n\n\n:::\n\n\n\n\n\n\n\n### Writing the `decode()` function\n\nThe `decode()` function below contains the entire 
decoding process.\nWe first calculate the size of the output with\n`_calc_decode_length()`, and then we allocate enough memory for this output with\nthe allocator object.\n\nThree temporary variables are created: 1) `count`, to hold the number of indexes stored\nin the current window; 2) `iout`, to hold the current index in the output;\n3) `buf`, which is the temporary buffer that holds the base64 indexes to be\nconverted through the 6-bit transformation.\n\nThen, in each iteration of the for loop, we add the index of the current character\nto the temporary buffer. When `count` hits the number 4, we have a full window of\nindexes in `buf` to be converted, and then we apply the 6-bit transformation\nover the temporary buffer.\n\nNotice that we check if the indexes 2 and 3 in the temporary buffer are equal to 64, which, if you recall\nfrom @sec-map-base64-index, is what the `_char_index()` function returns when it receives a `'='` character\nas input. So, if these indexes are equal to 64, the `decode()` function knows\nthat it can simply skip them. 
They are not converted because, as I described before,\nthe character `'='` have no meaning, despite being the end of meaningful characters in the sequence.\nSo we can safely ignore them when they appear in the sequence.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn decode(self: Base64,\n allocator: std.mem.Allocator,\n input: []const u8) ![]u8 {\n\n if (input.len == 0) {\n return \"\";\n }\n const n_output = try _calc_decode_length(input);\n var output = try allocator.alloc(u8, n_output);\n var count: u8 = 0;\n var iout: u64 = 0;\n var buf = [4]u8{ 0, 0, 0, 0 };\n\n for (0..input.len) |i| {\n buf[count] = self._char_index(input[i]);\n count += 1;\n if (count == 4) {\n output[iout] = (buf[0] << 2) + (buf[1] >> 4);\n if (buf[2] != 64) {\n output[iout + 1] = (buf[1] << 4) + (buf[2] >> 2);\n }\n if (buf[3] != 64) {\n output[iout + 2] = (buf[2] << 6) + buf[3];\n }\n iout += 3;\n count = 0;\n }\n }\n\n return output;\n}\n```\n:::\n\n\n\n\n\n## The end result\n\nNow that we have both `decode()` and `encode()` implemented. We have a fully functioning\nbase64 encoder/decoder implemented in Zig. 
Here is an usage example of our\n`Base64` struct with the `encode()` and `decode()` methods that we have implemented.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar memory_buffer: [1000]u8 = undefined;\nvar fba = std.heap.FixedBufferAllocator.init(\n &memory_buffer\n);\nconst allocator = fba.allocator();\n\nconst text = \"Testing some more shit\";\nconst etext = \"VGVzdGluZyBzb21lIG1vcmUgc2hpdA==\";\nconst base64 = Base64.init();\nconst encoded_text = try base64.encode(\n allocator, text\n);\nconst decoded_text = try base64.decode(\n allocator, etext\n);\ntry stdout.print(\n \"Encoded text: {s}\\n\", .{encoded_text}\n);\ntry stdout.print(\n \"Decoded text: {s}\\n\", .{decoded_text}\n);\n```\n:::\n\n\n\n\n```\nEncoded text: VGVzdGluZyBzb21lIG1vcmUgc2hpdA==\nDecoded text: Testing some more shit\n```\n\nYou can also see the full source code at once, by visiting the official repository of this book[^repo].\nMore precisely inside the `ZigExamples` folder[^zig-base64-algo].\n\n[^repo]: \n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 1 - Building a base64 encoder/decoder {#sec-base64}\n\nAs our first small project, I want to implement a base64 encoder/decoder with you.\nBase64 is an encoding system which translates binary data to text.\nA big chunk of the web uses base64 to deliver binary data to systems\nthat can only read text data.\n\nThe most common example of a modern use case for base64 is essentially any email system,\nlike Gmail, Outlook, etc. This is because email systems normally use\nthe Simple Mail Transfer Protocol (SMTP), which is an internet protocol\nthat was designed to carry only text data. So, if you need, for any reason, to\nsend a binary file (for example, a PDF or an Excel file) as\nan attachment in your email, these binary files are normally\nconverted to base64 before they are included in the SMTP message.\nSo, the base64 encoding is largely used in these email systems to include\nbinary data in the SMTP message.\n\n\n\n\n\n\n## How does the base64 algorithm work?\n\nBut how exactly does the algorithm behind the base64 encoding work? Let's discuss that. First, I will\nexplain the base64 scale, which is the 64-character scale that is the basis for\nthe base64 encoding system.\n\nAfter that, I explain the algorithm behind a base64 encoder, which is the part of the algorithm that is responsible for encoding messages\ninto the base64 encoding system. 
Then, I explain the algorithm behind a base64 decoder, which is\nthe part of the algorithm that is responsible for translating base64 messages back into their original meaning.\n\nIf you are unsure about the differences between an \"encoder\" and a \"decoder\",\ntake a look at @sec-encode-vs-decode.\n\n\n### The base64 scale {#sec-base64-scale}\n\nThe base64 encoding system is based on a scale that goes from 0 to 63 (hence the name).\nEach index in this scale is represented by a character (it is a scale of 64 characters).\nSo, in order to convert some binary data to the base64 encoding, we need to convert each group of 6 bits in this data to the corresponding\ncharacter in this \"scale of 64 characters\".\n\nThe base64 scale starts with all ASCII uppercase letters (A to Z), which represent\nthe first 26 indexes in this scale (0 to 25). After that, we have all ASCII lowercase letters\n(a to z), which represent the range 26 to 51 in the scale. After that, we\nhave the one-digit numbers (0 to 9), which represent the indexes from 52 to 61 in the scale.\nFinally, the last two indexes in the scale (62 and 63) are represented by the characters `+` and `/`,\nrespectively.\n\nThese are the 64 characters that compose the base64 scale. The equal sign character (`=`) is not part of the scale itself,\nbut it is a special character in the base64 encoding system. 
This character is used solely as a suffix (padding), to mark the end of the character sequence,\nor, more precisely, the end of meaningful characters in the sequence.\n\nThe bullet points below summarize the base64 scale:\n\n- range 0 to 25 is represented by: ASCII uppercase letters `-> [A-Z]`;\n- range 26 to 51 is represented by: ASCII lowercase letters `-> [a-z]`;\n- range 52 to 61 is represented by: one-digit numbers `-> [0-9]`;\n- indexes 62 and 63 are represented by the characters `+` and `/`, respectively;\n- the character `=` represents the end of meaningful characters in the sequence;\n\n\n\n\n### Creating the scale as a lookup table {#sec-base64-table}\n\nA good way to represent this scale in code is as a *lookup table*.\nLookup tables are a classic strategy in computer science to speed up calculations. The basic idea\nis to replace a runtime calculation (which can take a long time to be done) with a basic array indexing\noperation.\n\nInstead of calculating the results every time you need them, you calculate all possible results at once, and then you store them in an array\n(which behaves like a \"table\"). Then, every time you need to use one of the characters in the base64 scale, instead of\nusing many resources to calculate the exact character to be used, you simply retrieve this character\nfrom the array where you stored all the possible characters in the base64 scale.\nWe retrieve the character that we need directly from memory.\n\nWe can start building a Zig struct to store our base64 decoder/encoder logic.\nWe start with the `Base64` struct below. For now, we only have one single data member in this\nstruct, i.e. the member `_table`, which represents our lookup table. 
We also have an `init()` method,\nto create a new instance of a `Base64` object, and a `_char_at()` method, which is a\n\"get character at index $x$\" type of function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Base64 = struct {\n _table: *const [64]u8,\n\n pub fn init() Base64 {\n const upper = \"ABCDEFGHIJKLMNOPQRSTUVWXYZ\";\n const lower = \"abcdefghijklmnopqrstuvwxyz\";\n const numbers_symb = \"0123456789+/\";\n return Base64{\n ._table = upper ++ lower ++ numbers_symb,\n };\n }\n\n pub fn _char_at(self: Base64, index: u8) u8 {\n return self._table[index];\n }\n};\n```\n:::\n\n\n\n\n\nIn other words, the `_char_at()` method is responsible for getting the character in the lookup\ntable (i.e. the `_table` struct data member) that corresponds to a particular index in the\n\"base64 scale\". So, in the example below, we know that the character that corresponds to the\nindex 28 in the \"base64 scale\" is the character \"c\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst base64 = Base64.init();\ntry stdout.print(\n \"Character at index 28: {c}\\n\",\n .{base64._char_at(28)}\n);\n```\n:::\n\n\n\n\n```\nCharacter at index 28: c\n```\n\n\n\n### A base64 encoder {#sec-base64-encoder-algo}\n\nThe algorithm behind a base64 encoder usually works on a window of 3 bytes. Because each byte has\n8 bits, 3 bytes form a set of $8 \\times 3 = 24$ bits. This is desirable for the base64 algorithm, because\n24 bits is divisible by 6, which forms $24 / 6 = 4$ groups of 6 bits each.\n\nTherefore, the base64 algorithm works by converting 3 bytes at a time\ninto 4 characters from the base64 scale. It keeps iterating through the input string,\n3 bytes at a time, converting them into the base64 scale and producing 4 characters\nper iteration, until it hits the end of the input string.\n\nNow, you may wonder: what if you have a string whose number of bytes\nis not divisible by 3? 
What happens? For example, if you have a string\nthat contains only two characters/bytes, such as \"Hi\", how would the\nalgorithm behave in such a situation? You find the answer at @fig-base64-algo1.\nYou can see at @fig-base64-algo1 that the string \"Hi\", when converted to base64,\nbecomes the string \"SGk=\":\n\n![The logic behind a base64 encoder](./../Figures/base64-encoder-flow.png){#fig-base64-algo1}\n\nTaking the string \"Hi\" as an example, we have 2 bytes, or 16 bits in total. So, we lack a full byte (8 bits)\nto complete the window of 24 bits that the base64 algorithm likes to work on. The first thing that\nthe algorithm does is to check how to divide the input bytes into groups of 6 bits.\n\nIf the algorithm notices that there is a group of 6 bits that has some bits in it but is not full\n(in other words, $0 < nbits < 6$, where $nbits$ is the number of bits in the group), meaning that it lacks\nsome bits to fill the 6-bit requirement, the algorithm simply adds extra zeros to this group\nto fill the space that it needs. That is why at @fig-base64-algo1, on the third group after the 6-bit transformation,\n2 extra zeros were added to fill the gap in this group.\n\nWhen we have a 6-bit group that is not completely full, like the third group, extra zeros\nare added to fill the gap. But what about when an entire 6-bit group is empty, or it\nsimply doesn't exist? This is the case of the fourth 6-bit group shown at\n@fig-base64-algo1.\n\nThis fourth group is necessary, because the algorithm works on 4 groups of 6 bits.\nBut the input string does not have enough bytes to create a fourth 6-bit group.\nEvery time this happens, where an entire group of 6 bits is empty,\nthis group becomes a \"padding group\". Every \"padding group\" is mapped to\nthe character `=` (equal sign), which represents \"null\", or the end\nof meaningful characters in the sequence. 
Hence, every time that the algorithm produces a\n\"padding group\", this group is automatically mapped to `=`.\n\nAs another example, if you give the string \"0\" as input to a base64 encoder, this string is\ntranslated into the base64 sequence \"MA==\".\nThe character \"0\" is, in binary, the sequence `00110000`[^zero-note]. So, with the 6-bit transformation\nshown at @fig-base64-algo1, this single character would produce these two 6-bit groups: `001100`, `000000`.\nThe remaining two 6-bit groups become \"padding groups\". That is why the last\ntwo characters in the output sequence (MA==) are `==`.\n\n\n[^zero-note]: Notice that the character \"0\" is different from the actual number 0, which is simply zero in binary.\n\n### A base64 decoder {#sec-base64-decoder-algo}\n\nThe algorithm behind a base64 decoder is essentially the inverse process of a base64 encoder.\nA base64 decoder needs to translate base64 messages back into their original meaning,\ni.e. into the original sequence of binary data.\n\nA base64 decoder usually works on a window of 4 bytes, because it wants to convert these 4 bytes\nback into the original sequence of 3 bytes that was converted into 4 groups of 6 bits by the\nbase64 encoder. Remember, in a base64 decoder we are essentially reverting the process made\nby the base64 encoder.\n\nEach byte in the input string (the base64 encoded string) normally contributes to re-creating\ntwo different bytes in the output (the original binary data).\nIn other words, each byte that comes out of a base64 decoder is created by merging two different\nbytes in the input together. You can visualize this relationship at @fig-base64-algo2:\n\n![The logic behind a base64 decoder](./../Figures/base64-decoder-flow.png){#fig-base64-algo2}\n\nThe exact transformations, or the exact steps applied to each byte from the input to transform them\ninto the bytes of the output, are a bit tricky to visualize in a figure like this. 
Because of that, I have\nsummarized these transformations as \"Some bit shifting and additions ...\" in the figure. These transformations\nwill be described in depth later.\n\nBesides that, if you look again at @fig-base64-algo2, you will notice that the character `=` was completely\nignored by the algorithm. Remember, this is just a special character that marks the end of meaningful characters\nin the base64 sequence. So, every `=` character in a base64 encoded sequence should be ignored by a base64 decoder.\n\n\n## Difference between encode and decode {#sec-encode-vs-decode}\n\nIf you don't have any previous experience with base64, you might not understand the differences\nbetween \"encode\" and \"decode\". Essentially, the terms \"encode\" and \"decode\" here\nrefer to translating data into a different representation, and back again. Note that base64 is neither\nencryption nor hashing: it uses no secret key, and, unlike hashing algorithms such as MD5, it is fully reversible.\n\nThus, \"encode\" means that we want to encode, or, in other words, we want to translate some message into\nthe base64 encoding system. We want to produce the sequence of base64 characters that represent this\noriginal message in the base64 encoding system.\n\nIn contrast, \"decode\" represents the inverse process.\nWe want to decode, or, in other words, translate a base64 message back to its original content.\nSo, in this process we get a sequence of base64 characters as input, and produce as output\nthe binary data that is represented by this sequence of base64 characters.\n\nAny base64 library is normally composed of these two parts: 1) the encoder, which is a function that encodes\n(i.e. 
it converts) any sequence of binary data into a sequence of base64 characters; 2) the decoder, which is a function\nthat converts a sequence of base64 characters back into the original sequence of binary data.\n\n\n\n## Calculating the size of the output {#sec-base64-length-out}\n\nOne task that we need to do is to calculate how much space we need to reserve for the\noutput, both of the encoder and decoder. This is simple math, and can be done easily in Zig\nbecause every array (and slice) has its length (its number of elements) easily accessible through\nthe `.len` property.\n\nFor the encoder, the logic is the following: for each 3 bytes that we find in the input,\n4 new bytes are created in the output. So, we take the number of bytes in the input, divide it\nby 3, apply a ceiling function, and then multiply the result by 4. That way, we get the total\nnumber of bytes that will be produced by the encoder in its output.\n\nThe `_calc_encode_length()` function below encapsulates this logic.\nInside this function, we take the length of the input array,\nwe divide it by 3, and apply a ceil operation over the result by using the\n`divCeil()` function from the Zig Standard Library. Lastly, we multiply\nthe end result by 4 to get the answer we need.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn _calc_encode_length(input: []const u8) !usize {\n if (input.len < 3) {\n const n_output: usize = 4;\n return n_output;\n }\n const n_output: usize = try std.math.divCeil(\n usize, input.len, 3\n );\n return n_output * 4;\n}\n```\n:::\n\n\n\n\n\nAlso, you might have noticed that, if the input length is less than 3 bytes, then the output length of the encoder is\nalways 4 bytes. 
This is the case for every input with fewer than 3 bytes, because, as I described at @sec-base64-encoder-algo,\nthe algorithm always produces enough \"padding groups\" in the end result to complete the 4-byte window.\n\nNow, for the decoder, we just need to apply the inverse logic: for each 4 bytes in the input, 3 bytes\nwill be produced in the output of the decoder. This is only roughly true, because we also need to\ntake the `=` character into account, which is always ignored by the decoder, as we described at @sec-base64-decoder-algo and\nat @fig-base64-algo2. But we can ignore this fact for now, just to keep things simple.\n\nThe function `_calc_decode_length()` shown below summarizes the logic that we described. It is very similar\nto the function `_calc_encode_length()`. Only the division part changes, along with the special\ncase where we have fewer than 4 bytes in the input to work on. Also notice that this time, we apply\na floor operation over the output of the division, by using the `divFloor()`\nfunction (instead of a ceiling operation with `divCeil()`).\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn _calc_decode_length(input: []const u8) !usize {\n if (input.len < 4) {\n const n_output: usize = 3;\n return n_output;\n }\n const n_output: usize = try std.math.divFloor(\n usize, input.len, 4\n );\n return n_output * 3;\n}\n```\n:::\n\n\n\n\n\n## Building the encoder logic {#sec-encoder-logic}\n\nIn this section, we can start building the logic behind the `encode()` function, which\nwill be responsible for encoding messages into the base64 encoding system.\nIf you are impatient, and you want to see the full source code of the implementation\nfor this base64 encoder/decoder right now, you can find it in the `ZigExamples` folder in the official repository of\nthis book[^zig-base64-algo].\n\n[^zig-base64-algo]: .\n\n\n\n### The 6-bit transformation {#sec-6bit-transf}\n\nThe 6-bit transformation presented at 
@fig-base64-algo1 is the core part of the base64 encoder algorithm.\nBy understanding how this transformation is made in code, the rest of the algorithm becomes much simpler\nto comprehend.\n\nIn essence, this 6-bit transformation is made with the help of bitwise operators.\nBitwise operators are essential to any type of low-level operation that is done at the bit level. For the specific case of the base64 algorithm,\nthe operators *bit shift to the left* (`<<`), *bit shift to the right* (`>>`), and the *bitwise and* (`&`) are used. They\nare the core of the solution for the 6-bit transformation.\n\nThere are 3 different scenarios that we need to take into account in this transformation. First is the perfect scenario,\nwhere we have the perfect window of 3 bytes to work on. Second, we have the scenario where we have a window of only\ntwo bytes to work with. And last, we have the scenario where we have a window of one single byte.\n\nIn each of these 3 scenarios, the 6-bit transformation works a bit differently. To make the explanation\neasier, I will use the variable `output` to refer to the bytes in the output of the base64 encoder,\nand the variable `input` to refer to the bytes in the input of the encoder.\n\n\nSo, if you have the perfect window of 3 bytes, these are the steps for the 6-bit transformation:\n\n1. `output[0]` is produced by moving the bits from `input[0]` two positions to the right.\n1. `output[1]` is produced by summing two components. First, take the last two bits from `input[0]`, then move them four positions to the left. Second, move the bits from `input[1]` four positions to the right. Sum these two components.\n1. `output[2]` is produced by summing two components. First, take the last four bits from `input[1]`, then move them two positions to the left. Second, move the bits from `input[2]` six positions to the right. Sum these two components.\n1. 
`output[3]` is produced by taking the last six bits from `input[2]`.\n\n\nThis is the perfect scenario, where we have a full window of 3 bytes to work on.\nJust to make things as clear as possible, @fig-encoder-bitshift demonstrates visually how\nstep 2 mentioned above works. So the 2nd byte in the `output` of the encoder is made by taking the 1st byte (dark purple)\nand the 2nd byte (orange) from the input. You can see that, at the end of the process, we get a new\nbyte that contains the last 2 bits from the 1st byte in the `input`, and the first 4 bits\nfrom the 2nd byte in the `input`.\n\n![How the 2nd byte in the output of the encoder is produced from the 1st byte (dark purple) and the 2nd byte (orange) of the input.](../Figures/base64-encoder-bit-shift.png){#fig-encoder-bitshift}\n\nOn the other hand, we must be prepared for the instances where we do not have the perfect window of 3 bytes.\nIf you have a window of 2 bytes, then steps 3 and 4, which produce the bytes `output[2]` and `output[3]`, change a little bit,\nand they become:\n\n- `output[2]` is produced by taking the last 4 bits from `input[1]`, then moving them two positions to the left.\n- `output[3]` is the character `'='`.\n\n\nFinally, if you have a window of a single byte, then steps 2 to 4, which produce the bytes `output[1]`, `output[2]` and `output[3]`, change,\nbecoming:\n\n- `output[1]` is produced by taking the last two bits from `input[0]`, then moving them four positions to the left.\n- `output[2]` and `output[3]` are the character `=`.\n\n\nIf these bullet points were a bit confusing for you, you may find @tbl-transf-6bit more intuitive.\nThis table unifies all this logic into a simple table. 
Notice that\nthis table also provides the exact expression in Zig that creates the corresponding\nbyte in the output.\n\n\n::: {#tbl-transf-6bit}\n\n| Number of bytes in the window | Byte index in the output | In code |\n|-------------------------------|--------------------------|--------------------------------------------|\n| 3 | 0 | input[0] >> 2 |\n| 3 | 1 | ((input[0] & 0x03) << 4) + (input[1] >> 4) |\n| 3 | 2 | ((input[1] & 0x0f) << 2) + (input[2] >> 6) |\n| 3 | 3 | input[2] & 0x3f |\n| 2 | 0 | input[0] >> 2 |\n| 2 | 1 | ((input[0] & 0x03) << 4) + (input[1] >> 4) |\n| 2 | 2 | ((input[1] & 0x0f) << 2) |\n| 2 | 3 | '=' |\n| 1 | 0 | input[0] >> 2 |\n| 1 | 1 | ((input[0] & 0x03) << 4) |\n| 1 | 2 | '=' |\n| 1 | 3 | '=' |\n\n: How the 6-bit transformation translates into code in different window settings.\n\n:::\n\n\n\n\n\n\n### Bit-shifting in Zig\n\nBit-shifting in Zig works similarly to bit-shifting in C.\nAll bitwise operators that exist in C are available in Zig.\nHere, in the base64 encoder algorithm, they are essential\nto produce the result we want.\n\nFor those who are not familiar with these operators, they are\noperators that operate at the bit level of your values.\nThis means that these operators take the bits that form the value\nyou have, and change them in some way. This ultimately also changes\nthe value itself, because the binary representation of this value\nchanges.\n\nWe have already seen at @fig-encoder-bitshift the effect produced by a bit-shift.\nBut let's use the first byte in the output of the base64 encoder as another example of what\nbit-shifting means. This is the easiest byte of the 4 bytes in the output\nto build. 
We only need to move the bits from the first byte in the input two positions to the right,\nwith the *bit shift to the right* (`>>`) operator.\n\nIf we take the string \"Hi\" that we used at @fig-base64-algo1 as an example, the first byte in\nthis string is \"H\", which is the sequence `01001000` in binary.\nIf we move the bits of this byte two places to the right, we get the sequence `00010010` as a result.\nThis binary sequence is the value `18` in decimal, and also the value `0x12` in hexadecimal.\nNotice that the first 6 bits of \"H\" were moved to the end of the byte.\nWith this operation, we get the first byte of the output.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const input = \"Hi\";\n try stdout.print(\"{d}\\n\", .{input[0] >> 2});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n18\n```\n\n\n:::\n:::\n\n\n\n\nIf you recall @fig-base64-algo1, the first byte present in the output should\nbe equivalent to the 6-bit group `010010`. Although visually different, the\nsequences `010010` and `00010010` are numerically equal. They mean the same thing.\nThey both represent the number 18 in decimal, and the value `0x12` in hexadecimal.\n\nSo, don't take the \"6-bit group\" aspect too literally. We do not necessarily need to\nget a 6-bit sequence as a result. As long as the value of the 8-bit sequence we get is the same\nas that of the 6-bit sequence, we are in the clear.\n\n\n\n### Selecting specific bits with the `&` operator\n\nIf you go back to @sec-6bit-transf, you will see that, in order to produce\nthe second and third bytes in the output, we need to select specific\nbits from the first and second bytes in the input string. But how\ncan we do that? The answer lies in the *bitwise and* (`&`) operator.\n\n@fig-encoder-bitshift already showed you what effect this `&` operator\nhas on the bits of its operands. 
But let's describe it more precisely.\n\nIn summary, the `&` operator performs a logical conjunction operation\nbetween the bits of its operands. In more detail, the operator `&`\ncompares each bit of the first operand to the corresponding bit of the second operand.\nIf both bits are 1, the corresponding result bit is set to 1.\nOtherwise, the corresponding result bit is set to 0 [@microsoftbitwiseand].\n\nSo, if we apply this operator to the binary sequences `10000100` and `00001101`,\nthe result of this operation is the binary sequence `00000100`, because only\nat the sixth position (counting from the left) do both binary sequences have a 1 value. Any\nposition where the two binary sequences are not both set to 1 produces\na 0 bit in the resulting binary sequence.\n\nWe lose information about the original bit values\nfrom both sequences in this case, because we no longer know\nif a 0 bit in the resulting binary sequence was produced by\ncombining 0 with 0, or 1 with 0, or 0 with 1.\n\nAs an example, suppose you have the binary sequence `10010111`, which is the number 151 in decimal. How\ncan we get a new binary sequence which contains only the third and\nfourth bits of this sequence?\n\nWe just need to combine this sequence with `00110000` (which is `0x30` in hexadecimal) using the `&` operator.\nNotice that only the third and fourth positions in this binary sequence are set to 1. As a consequence, only the\nthird and fourth bits of the original sequence can survive in the output. 
All the remaining positions\nare set to zero in the output sequence, which is `00010000` (the number 16 in decimal).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const bits = 0b10010111;\n try stdout.print(\"{d}\\n\", .{bits & 0b00110000});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n16\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Allocating space for the output\n\nAs I described at @sec-stack, to store an object on the stack,\nthis object needs to have a known and fixed length at compile time. This is an important\nlimitation for our base64 encoder/decoder, because the size of\nthe output (from both the encoder and decoder) depends\ndirectly on the size of the input.\n\nHaving this in mind, we cannot know at compile time\nthe size of the output of either the encoder or the decoder.\nSo, if we can't know the size of the output at compile time,\nthis means that we cannot store the output of either the encoder\nor the decoder on the stack.\n\nConsequently, we need to store this output on the heap,\nand, as I commented at @sec-heap, we can only\nstore objects on the heap by using allocator objects.\nSo, one of the arguments to both the `encode()` and `decode()`\nfunctions needs to be an allocator object, because\nwe know for sure that, at some point inside the body of these\nfunctions, we need to allocate space on the heap to\nstore the output of these functions.\n\nThat is why both the `encode()` and `decode()` functions that I\npresent in this book have an argument called `allocator`,\nwhich receives an allocator object as input, identified by\nthe type `std.mem.Allocator` from the Zig Standard Library.\n\n\n\n### Writing the `encode()` function\n\nBy now, we have a basic understanding of how the bitwise operators work, and of how\nexactly they help us achieve the result we want. 
We can now encapsulate\nall the logic that we have described at @fig-base64-algo1 and @tbl-transf-6bit into a nice\nfunction that we can add to the `Base64` struct definition that we started at @sec-base64-table.\n\nYou can find the `encode()` function below. Notice that the first argument of this function\nis the `Base64` struct itself. Therefore, this argument clearly signals\nthat this function is a method from the `Base64` struct.\n\nBecause the `encode()` function itself is fairly long,\nI intentionally omitted the `Base64` struct definition in this source code\nfor brevity. So, just remember that this function is a public function (or a public method) from the\n`Base64` struct.\n\nFurthermore, this `encode()` function has two other arguments:\n\n1. `allocator` is an allocator object to use in the necessary memory allocations;\n2. `input` is the input sequence of characters that you want to encode in base64.\n\nI described everything you need to know about allocator objects at @sec-allocators.\nSo, if you are not familiar with them, I highly recommend that you go back to\nthat section and read it.\nBy looking at the `encode()` function, you will see that we use this\nallocator object to allocate enough memory to store the output of the\nencoding process.\n\nThe main for loop in the function is responsible for iterating through the entire input string.\nIn every iteration, we use a `count` variable to count how many bytes we have accumulated in the current\nwindow. 
When `count` reaches 3, we encode the 3 characters (or bytes) that we have accumulated\nin the temporary buffer object (`buf`).\n\nAfter encoding these 3 characters and storing the result in the `out` variable, we reset\nthe `count` variable to zero, and start counting again on the next iteration of the loop.\nIf the loop hits the end of the string and the `count` variable is 1 or 2, it means that\nthe temporary buffer contains the last 1 or 2 bytes from the input.\nThat is why we have two `if` statements after the for loop, to deal with each possible case.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn encode(self: Base64,\n allocator: std.mem.Allocator,\n input: []const u8) ![]u8 {\n\n if (input.len == 0) {\n return \"\";\n }\n\n const n_out = try _calc_encode_length(input);\n // `out` is a slice: writing to its elements does not\n // mutate the slice itself, so it can be declared `const`.\n const out = try allocator.alloc(u8, n_out);\n var buf = [3]u8{ 0, 0, 0 };\n var count: u8 = 0;\n var iout: u64 = 0;\n\n for (input, 0..) |_, i| {\n buf[count] = input[i];\n count += 1;\n if (count == 3) {\n out[iout] = self._char_at(buf[0] >> 2);\n out[iout + 1] = self._char_at(\n ((buf[0] & 0x03) << 4) + (buf[1] >> 4)\n );\n out[iout + 2] = self._char_at(\n ((buf[1] & 0x0f) << 2) + (buf[2] >> 6)\n );\n out[iout + 3] = self._char_at(buf[2] & 0x3f);\n iout += 4;\n count = 0;\n }\n }\n\n if (count == 1) {\n out[iout] = self._char_at(buf[0] >> 2);\n out[iout + 1] = self._char_at(\n (buf[0] & 0x03) << 4\n );\n out[iout + 2] = '=';\n out[iout + 3] = '=';\n }\n\n if (count == 2) {\n out[iout] = self._char_at(buf[0] >> 2);\n out[iout + 1] = self._char_at(\n ((buf[0] & 0x03) << 4) + (buf[1] >> 4)\n );\n out[iout + 2] = self._char_at(\n (buf[1] & 0x0f) << 2\n );\n out[iout + 3] = '=';\n iout += 4;\n }\n\n return out;\n}\n```\n:::\n\n\n\n\n\n\n## Building the decoder logic {#sec-decoder-logic}\n\nNow, we can focus on writing the base64 decoder logic. 
Remember from @fig-base64-algo2 that\na base64 decoder does the inverse process of an encoder. 
So, all we need to do is\nwrite a `decode()` function that performs the inverse of the process that I described at @sec-encoder-logic.\n\n\n### Mapping base64 characters to their indexes {#sec-map-base64-index}\n\nOne thing that we need to do in order to decode a base64-encoded message is to calculate\nthe index in the base64 scale of every base64 character that we encounter in the decoder input.\n\nIn other words, the decoder receives a sequence of base64 characters as input. We need\nto translate this sequence of characters into a sequence of indexes, where each index\nis the position of the corresponding character in the base64 scale. This way, we get back the value/byte\nthat was calculated in the 6-bit transformation step of the encoder process.\n\nThere are probably better/faster ways to calculate this, especially using a \"divide and conquer\"\ntype of strategy. But for now, I am satisfied with a simple \"brute force\" strategy.\nThe `_char_index()` function below contains this strategy.\n\nWe are essentially looping through the *lookup table* with the base64 scale,\nand comparing the character we got with each character in the base64 scale.\nIf these characters match, we return the index of this character in the\nbase64 scale as the result.\n\nNotice that, if the input character is `'='`, the function returns the index 64, which is\n\"out of range\" in the scale. But, as I described at @sec-base64-scale,\nthe character `'='` does not belong to the base64 scale itself.\nIt is a special padding character that carries no data in base64.\n\nAlso notice that this `_char_index()` function is a method of our `Base64` struct,\nbecause of the `self` argument. 
Again, I have omitted the `Base64` struct definition in this example\nfor brevity.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn _char_index(self: Base64, char: u8) u8 {\n if (char == '=')\n return 64;\n var index: u8 = 0;\n for (0..64) |i| {\n if (self._char_at(i) == char) {\n index = @intCast(i);\n break;\n }\n }\n\n return index;\n}\n```\n:::\n\n\n\n\n\n\n### The 6-bit transformation\n\nOnce again, the core part of the algorithm is the 6-bit transformation.\nIf we understand the necessary steps to perform this transformation, the rest\nof the algorithm becomes much easier.\n\nFirst of all, before we actually get to the 6-bit transformation,\nwe need to use `_char_index()` to convert the sequence of base64 characters\ninto a sequence of indexes, as in the snippet below.\nThe result of `_char_index()` is stored in a temporary buffer, and this temporary buffer\nis what we are going to use in the 6-bit transformation, instead of the actual `input` object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (0..input.len) |i| {\n buf[i] = self._char_index(input[i]);\n}\n```\n:::\n\n\n\n\nNow, where an encoder produces 4 bytes (or 4 characters) of output for each window of 3 bytes of input,\na base64 decoder produces 3 bytes of output for each window of 4 characters of input.\nOnce again, it is the inverse process.\n\nSo, the steps to produce the 3 bytes in the output are:\n\n1. `output[0]` is produced by summing two components. First, move the bits from `buf[0]` two positions to the left. Second, move the bits from `buf[1]` four positions to the right. Then, sum these two components.\n1. `output[1]` is produced by summing two components. First, move the bits from `buf[1]` four positions to the left. Second, move the bits from `buf[2]` two positions to the right. Then, sum these two components.\n1. `output[2]` is produced by summing two components. 
First, move the bits from `buf[2]` six positions to the left. Then, sum the result with `buf[3]`.\n\n\nBefore we continue, let's try to visualize how these transformations recover the original bytes that we had\nbefore the encoding process. First, think back to the 6-bit transformation performed by the encoder described at @sec-encoder-logic.\nThe first byte in the output of the encoder is produced by moving the bits in the first byte of the input two positions to the right.\n\nIf, for example, the first byte in the input of the encoder was the sequence `ABCDEFGH`, then the first byte in the output of the encoder would be\n`00ABCDEF` (this sequence would be the first byte in the input of the decoder). Now, if the second byte in the input of the encoder was the sequence\n`IJKLMNOP`, then the second byte in the encoder output would be `00GHIJKL` (as we demonstrated at @fig-encoder-bitshift).\n\nHence, if the sequences `00ABCDEF` and `00GHIJKL` are the first and second bytes, respectively, in the input of the decoder,\n@fig-decoder-bitshift demonstrates visually how these two bytes are transformed into the first byte of the output of the decoder.\nNotice that the output byte is the sequence `ABCDEFGH`, which is the original byte from the input of the encoder.\n\n![How the 1st byte in the decoder output is produced from the 1st byte (dark purple) and the 2nd byte (orange) of the input](../Figures/base64-decoder-bit-shift.png){#fig-decoder-bitshift}\n\n@tbl-6bit-decode presents how the three steps described earlier translate into Zig code:\n\n\n\n::: {#tbl-6bit-decode}\n\n| Byte index in the output | In code |\n|--------------------------|-------------------------------|\n| 0 | (buf[0] << 2) + (buf[1] >> 4) |\n| 1 | (buf[1] << 4) + (buf[2] >> 2) |\n| 2 | (buf[2] << 6) + buf[3] |\n\n: The necessary steps for the 6-bit transformation in the decode process.\n\n\n:::\n\n\n\n\n\n\n\n### Writing the `decode()` function\n\nThe `decode()` function below contains the entire 
decoding process.\nWe first calculate the size of the output with\n`_calc_decode_length()`, and then we allocate enough memory for this output with\nthe allocator object.\n\nThree temporary variables are created: 1) `count`, which holds how many indexes the current window contains\nat each iteration of the for loop; 2) `iout`, which holds the current index in the output;\n3) `buf`, which is the temporary buffer that holds the base64 indexes to be\nconverted through the 6-bit transformation.\n\nThen, in each iteration of the for loop, we fill the temporary buffer with the current\nwindow of indexes. When `count` reaches 4, we have a full window of\nindexes in `buf`, and we apply the 6-bit transformation\nover the temporary buffer.\n\nNotice that we check whether the indexes 2 and 3 in the temporary buffer are equal to 64, which, if you recall\nfrom @sec-map-base64-index, is what the `_char_index()` function returns when it receives a `'='` character\nas input. So, if these indexes are equal to 64, the `decode()` function knows\nthat it can simply ignore these indexes. 
They are not converted because, as I described before,\nthe character `'='` carries no data: it is only padding that marks the end of the meaningful characters in the sequence.\nSo we can safely ignore it when it appears in the sequence.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn decode(self: Base64,\n allocator: std.mem.Allocator,\n input: []const u8) ![]u8 {\n\n if (input.len == 0) {\n return try allocator.alloc(u8, 0);\n }\n const n_output = try _calc_decode_length(input);\n const output = try allocator.alloc(u8, n_output);\n var count: u8 = 0;\n var iout: u64 = 0;\n var buf = [4]u8{ 0, 0, 0, 0 };\n\n for (0..input.len) |i| {\n buf[count] = self._char_index(input[i]);\n count += 1;\n if (count == 4) {\n output[iout] = (buf[0] << 2) + (buf[1] >> 4);\n if (buf[2] != 64) {\n output[iout + 1] = (buf[1] << 4) + (buf[2] >> 2);\n }\n if (buf[3] != 64) {\n output[iout + 2] = (buf[2] << 6) + buf[3];\n }\n iout += 3;\n count = 0;\n }\n }\n\n return output;\n}\n```\n:::\n\n\n\n\n\n## The end result\n\nNow that we have implemented both `decode()` and `encode()`, we have a fully functioning\nbase64 encoder/decoder in Zig. 
Here is a usage example of our\n`Base64` struct with the `encode()` and `decode()` methods that we have implemented.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar memory_buffer: [1000]u8 = undefined;\nvar fba = std.heap.FixedBufferAllocator.init(\n &memory_buffer\n);\nconst allocator = fba.allocator();\n\nconst text = \"Testing some more shit\";\nconst etext = \"VGVzdGluZyBzb21lIG1vcmUgc2hpdA==\";\nconst base64 = Base64.init();\nconst encoded_text = try base64.encode(\n allocator, text\n);\nconst decoded_text = try base64.decode(\n allocator, etext\n);\ntry stdout.print(\n \"Encoded text: {s}\\n\", .{encoded_text}\n);\ntry stdout.print(\n \"Decoded text: {s}\\n\", .{decoded_text}\n);\n```\n:::\n\n\n\n\n```\nEncoded text: VGVzdGluZyBzb21lIG1vcmUgc2hpdA==\nDecoded text: Testing some more shit\n```\n\nYou can also see the full source code at once by visiting the official repository of this book[^repo],\nmore precisely inside the `ZigExamples` folder[^zig-base64-algo].\n\n[^repo]: \n",
"supporting": [
"01-base64_files"
],
diff --git a/_freeze/Chapters/01-memory/execute-results/html.json b/_freeze/Chapters/01-memory/execute-results/html.json
index 4b45877..347dcb6 100644
--- a/_freeze/Chapters/01-memory/execute-results/html.json
+++ b/_freeze/Chapters/01-memory/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "9ff7da16566169375efb1e911cec98e5",
+ "hash": "6eec5695f16f7671e06b72fd53cbe1ae",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Memory and Allocators\n\n\nIn this chapter, we will talk about memory. How does Zig controls memory? What\ncommon tools are used? Are there any important aspect that makes memory\ndifferent/special in Zig? You will find the answers here.\n\nEvery computer needs memory. Is by having memory that computers can temporarily store\nthe values/results of your calculations. Without memory, programming languages would never have\nconcepts such as \"variables\", or \"objects\", to store the values that you generate.\n\n\n## Memory spaces\n\nEvery object that you create in your Zig source code needs to be stored somewhere,\nin your computer's memory. Depending on where and how you define your object, Zig\nwill use a different \"memory space\", or a different\ntype of memory to store this object.\n\nEach type of memory normally serves for different purposes.\nIn Zig, there are 3 types of memory (or 3 different memory spaces) that we care about. They are:\n\n- Global data register (or the \"global data section\");\n- Stack;\n- Heap;\n\n\n### Compile-time known versus runtime known {#sec-compile-time}\n\nOne strategy that Zig uses to decide where it will store each object that you declare, is by looking\nat the value of this particular object. More specifically, by investigating if this value is\nknown at \"compile-time\" or at \"runtime\".\n\nWhen you write a program in Zig, the values of some of the objects that you write in your program are *known\nat compile time*. Meaning that, when you compile your Zig source code, during the compilation process,\nthe `zig` compiler can figure it out what is the exact value of a particular object\nthat exists in your source code.\nKnowing the length (or the size) of each object is also important. 
So the length (or the size) of each object that you write in your program is,\nin some cases, *known at compile time*.\n\nThe `zig` compiler cares more about knowing the length (or the size) of a particular object\n, than to know it's actual value. But, if the `zig` compiler knows the value of the object, then, it\nautomatically knows the size of this object. Because it can simply calculate the\nsize of the object by looking at the size of the value.\n\nTherefore, the priority for the `zig` compiler is to discover the size of each object in your source code.\nIf the value of the object in question is known at compile-time, then, the `zig` compiler\nautomatically knows the size/length of this object. But if the value of this object is not\nknown at compile-time, then, the size of this object is only known at compile-time if,\nand only if, the type of this object have a known fixed size.\n\nIn order to a type have a known fixed size, this type must have data members whose size is fixed.\nIf this type includes, for example, a variable sized array in it, then, this type do not have a known\nfixed size. Because this array can have any size at runtime\n(i.e. it can be an array of 2 elements, or 50 elements, or 1 thousand elements, etc.).\n\nFor example, a string object, which internally is an array of constant u8 values (`[]const u8`)\nhave a variable size. It can be a string object with 100 or 500 characters in it. If we do not\nknow at compile-time, which exact string will be stored inside this string object, then, we cannot calculate\nthe size of this string object at compile-time. 
So, any type, or any struct declaration that you make, that\nincludes a string data member that do not have an explicit fixed size, makes this type, or this\nnew struct that you are declaring, a type that do not have a known fixed size at compile-time.\n\nIn contrast, if the type or this struct that you are declaring, includes a data member that is an array,\nbut this array have a known fixed size, like `[60]u8` (which declares an array of 60 `u8` values), then,\nthis type, or, this struct that you are declaring, becomes a type with a known fixed size at compile-time.\nAnd because of that, in this case, the `zig` compiler do not need to known at compile-time the exact value of\nany object of this type. Since the compiler can find the necessary size to store this object by\nlooking at the size of it's type.\n\n\nLet's look at an example. In the source code below, we have two constant objects (`name` and `array`) declared.\nBecause the values of these particular objects are written down, in the source code itself (`\"Pedro\"`\nand the number sequence from 1 to 4), the `zig` compiler can easily discover the values of these constant\nobjects (`name` and `array`) during the compilation process.\nThis is what \"known at compile time\" means. It refers to any object that you have in your Zig source code\nwhose value can be identified at compile time.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = \"Pedro\";\nconst array = [_]u8{1, 2, 3, 4};\n_ = name; _ = array;\n\nfn input_length(input: []const u8) usize {\n const n = input.len;\n return n;\n}\n```\n:::\n\n\n\n\nThe other side of the spectrum are objects whose values are not known at compile time.\nFunction arguments are a classic example of this. 
Because the value of each function\nargument depends on the value that you assign to this particular argument,\nwhen you call the function.\n\nFor example, the function `input_length()` contains an argument named `input`, which is an array of constant `u8` integers (`[]const u8`).\nIs impossible to know at compile time the value of this particular argument. And it also is impossible to know the size/length\nof this particular argument. Because it is an array that do not have a fixed size specified explicitly in the argument type annotation.\n\nSo, we know that this `input` argument will be an array of `u8` integers. But we do not know at compile-time, it's value, and neither his size.\nThis information is known only at runtime, which is the period of time when you program is executed.\nAs a consequence, the value of the expression `input.len` is also known only at runtime.\nThis is an intrinsic characteristic of any function. Just remember that the value of function arguments is usually not \"compile-time known\".\n\nHowever, as I mentioned earlier, what really matters to the compiler is to know the size of the object\nat compile-time, and not necessarily it's value. So, although we don't know the value of the object `n`, which is the result of the expression\n`input.len`, at compile-time, we do know it's size. Because the expression `input.len` always return a value of type `usize`,\nand the type `usize` have a known fixed size.\n\n\n\n### Global data register\n\nThe global data register is a specific section of the executable of your Zig program, that is responsible\nfor storing any value that is known at compile time.\n\nEvery constant object whose value is known at compile time that you declare in your source code,\nis stored in the global data register. 
Also, every literal value that you write in your source code,\nsuch as the string `\"this is a string\"`, or the integer `10`, or a boolean value such as `true`,\nis also stored in the global data register.\n\nHonestly, you don't need to care much about this memory space. Because you can't control it,\nyou can't deliberately access it or use it for your own purposes.\nAlso, this memory space does not affect the logic of your program.\nIt simply exists in your program.\n\n\n### Stack vs Heap\n\nIf you are familiar with system's programming, or just low-level programming in general, you\nprobably have heard of the \"duel\" between Stack vs Heap. These are two different types of memory,\nor different memory spaces, which are both available in Zig.\n\nThese two types of memory don't actually duel with\neach other. This is a common mistake that beginners have, when seeing \"x vs y\" styles of\ntabloid headlines. These two types of memory are actually complementary to each other.\nSo, in almost every Zig program that you ever write, you will likely use a combination of both.\nI will describe each memory space in detail over the next sections. But for now, I just want to\nstablish the main difference between these two types of memory.\n\nIn essence, the stack memory is normally used to store values whose length is fixed and known\nat compile time. In contrast, the heap memory is a *dynamic* type of memory space, meaning that, it is\nused to store values whose length might grow during the execution (runtime) of your program [@jenny2022].\n\nLengths that grow during runtime are intrinsically associated with \"runtime known\" type of values.\nIn other words, if you have an object whose length might grow during runtime, then, the length\nof this object becomes not known at compile time. 
If the length is not known at compile-time,\nthe value of this object also becomes not known at compile-time.\nThese types of objects should be stored in the heap memory space, which is\na dynamic memory space, which can grow or shrink to fit the size of your objects.\n\n\n\n### Stack {#sec-stack}\n\nThe stack is a type of memory that uses the power of the *stack data structure*, hence the name. \nA \"stack\" is a type of *data structure* that uses a \"last in, first out\" (LIFO) mechanism to store the values\nyou give it to. I imagine you are familiar with this data structure.\nBut, if you are not, the [Wikipedia page](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))[^wiki-stack]\n, or, the [Geeks For Geeks page](https://www.geeksforgeeks.org/stack-data-structure/)[^geek-stack] are both\nexcellent and easy resources to fully understand how this data structure works.\n\n[^wiki-stack]: \n[^geek-stack]: \n\nSo, the stack memory space is a type of memory that stores values using a stack data structure.\nIt adds and removes values from the memory by following a \"last in, first out\" (LIFO) principle.\n\nEvery time you make a function call in Zig, an amount of space in the stack is\nreserved for this particular function call [@jenny2022; @zigdocs].\nThe value of each function argument given to the function in this function call is stored in this\nstack space. Also, every local object that you declare inside the function scope is\nusually stored in this same stack space.\n\n\nLooking at the example below, the object `result` is a local object declared inside the scope of the `add()`\nfunction. 
Because of that, this object is stored inside the stack space reserved for the `add()` function.\nThe `r` object (which is declared outside of the `add()` function scope) is also stored in the stack.\nBut since it is declared in the \"outer\" scope, this object is stored in the\nstack space that belongs to this outer scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) u8 {\n const result = x + y;\n return result;\n}\n```\n:::\n\n\n\n\n\nSo, any object that you declare inside the scope of a function is always stored inside\nthe space that was reserved for that particular function in the stack memory. This\nalso counts for any object declared inside the scope of your `main()` function for example.\nAs you would expect, in this case, they\nare stored inside the stack space reserved for the `main()` function.\n\nOne very important detail about the stack memory is that **it frees itself automatically**.\nThis is very important, remember that. 
When objects are stored in the stack memory,\nyou don't have the work (or the responsibility) of freeing/destroying these objects.\nBecause they will be automatically destroyed once the stack space is freed at the end of the function scope.\n\nSo, once the function call returns (or ends, if you prefer to call it this way)\nthe space that was reserved in the stack is destroyed, and all of the objects that were in that space goes away with it.\nThis mechanism exists because this space, and the objects within it, are not necessary anymore,\nsince the function \"finished it's business\".\nUsing the `add()` function that we exposed above as an example, it means that the object `result` is automatically\ndestroyed once the function returns.\n\n::: {.callout-important}\nLocal objects that are stored in the stack space of a function are automatically\nfreed/destroyed at the end of the function scope.\n:::\n\n\nThis same logic applies to any other special structure in Zig that have it's own scope by surrounding\nit with curly braces (`{}`).\nFor loops, while loops, if else statements, etc. For example, if you declare any local\nobject in the scope of a for loop, this local object is accessible only within the scope\nof this particular for loop. Because once the scope of this for loop ends, the space in the stack\nreserved for this for loop is freed.\nThe example below demonstrates this idea.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This does not compile succesfully!\nconst a = [_]u8{0, 1, 2, 3, 4};\nfor (0..a.len) |i| {\n const index = i;\n _ = index;\n}\n// Trying to use an object that was\n// declared in the for loop scope,\n// and that does not exist anymore.\nstd.debug.print(\"{d}\\n\", index);\n```\n:::\n\n\n\n\n\n\nOne important consequence of this mechanism is that, once the function returns, you can no longer access any memory\naddress that was inside the space in the stack reserved for this particular function. Because this space was\ndestroyed. 
This means that, if this local object is stored in the stack,\nyou cannot make a function that **returns a pointer to this object**.\n\nThink about that for a second. If all local objects in the stack are destroyed at the end of the function scope, why\nwould you even consider returning a pointer to one of these objects? This pointer is at best,\ninvalid, or, more likely, \"undefined\".\n\nConclusion, is totally fine to write a function that returns the local object\nitself as result, because then, you return the value of that object as the result.\nBut, if this local object is stored in the stack, you should never write a function\nthat returns a pointer to this local object. Because the memory address pointed by the pointer\nno longer exists.\n\n\nSo, using again the `add()` function as an example, if you rewrite this function so that it\nreturns a pointer to the local object `result`, the `zig` compiler will actually compile\nyou program, with no warnings or erros. At first glance, it looks that this is good code\nthat works as expected. But this is a lie!\n\nIf you try to take a look at the value inside of the `r` object,\nor, if you try to use this `r` object in another expression\nor function call, then, you would have undefined behaviour, and major\nbugs in your program [@zigdocs, see \"Lifetime and Ownership\"[^life] and \"Undefined Behaviour\"[^undef] sections].\n\n[^life]: \n[^undef]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This code compiles succesfully. But it has\n// undefined behaviour. Never do this!!!\n\n// The `r` object is undefined!\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) *const u8 {\n const result = x + y;\n return &result;\n}\n```\n:::\n\n\n\n\nThis \"invalid pointer to stack variable\" problem is very known across many programming language communities.\nIf you try to do the same thing, for example, in a C or C++ program (i.e. 
returning an address to\na local object stored in the stack), you would also get undefined behaviour\nin the program.\n\n::: {.callout-important}\nIf a local object in your function is stored in the stack, you should never\nreturn a pointer to this local object from the function. Because\nthis pointer will always become undefined after the function returns, since the stack space of the function\nis destroyed at the end of it's scope.\n:::\n\nBut what if you really need to use this local object in some way after your function returns?\nHow can you do this? The answer is: \"in the same you would do if this was a C or C++ program. By returning\nan address to an object stored in the heap\". The heap memory have a much more flexible lifecycle,\nand allows you to get a valid pointer to a local object of a function that already returned\nfrom it's scope.\n\n\n### Heap {#sec-heap}\n\nOne important limitation of the stack, is that, only objects whose length/size is known at compile-time can be\nstored in it. In contrast, the heap is a much more dynamic\n(and flexible) type of memory. It is the perfect type of memory to use\non objects whose size/length might grow during the execution of your program.\n\nVirtually any application that behaves as a server is a classic use case of the heap.\nA HTTP server, a SSH server, a DNS server, a LSP server, ... any type of server.\nIn summary, a server is a type of application that runs for long periods of time,\nand that serves (or \"deals with\") any incoming request that reaches this particular server.\n\nThe heap is a good choice for this type of system, mainly because the server does not know upfront\nhow many requests it will receive from users, while it is active. 
It could be one single request,\nor, 5 thousand requests, or, it could also be zero requests.\nThe server needs to have the ability to allocate and manage it's memory according to how many requests it receives.\n\nAnother key difference between the stack and the heap, is that the heap is a type\nof memory that you, the programmer, have complete control over. This makes the heap a\nmore flexible type of memory, but it also makes it harder to work with it. Because you,\nthe programmer, is responsible for managing everything related to it. Including where the memory is allocated,\nhow much memory is allocated, and where this memory is freed.\n\n> Unlike stack memory, heap memory is allocated explicitly by programmers and it won’t be deallocated until it is explicitly freed [@jenny2022].\n\nTo store an object in the heap, you, the programmer, needs to explicitly tells Zig to do so,\nby using an allocator to allocate some space in the heap. At @sec-allocators, I will present how you can use allocators to allocate memory\nin Zig.\n\n::: {.callout-important}\nEvery memory you allocate in the heap needs to be explicitly freed by you, the programmer.\n:::\n\nThe majority of allocators in Zig do allocate memory on the heap. But some exceptions to this rule are\n`ArenaAllocator()` and `FixedBufferAllocator()`. The `ArenaAllocator()` is a special\ntype of allocator that works in conjunction with a second type of allocator.\nOn the other side, the `FixedBufferAllocator()` is an allocator that works based on\nbuffer objects created on the stack. This means that the `FixedBufferAllocator()` makes\nallocations only on the stack.\n\n\n\n\n### Summary\n\nAfter discussing all of these boring details, we can quickly recap what we learned.\nIn summary, the Zig compiler will use the following rules to decide where each\nobject you declare is stored:\n\n1. every literal value (such as `\"this is string\"`, `10`, or `true`) is stored in the global data section.\n\n1. 
every constant object (`const`) whose value **is known at compile-time** is also stored in the global data section.\n\n1. every object (constant or not) whose length/size **is known at compile time** is stored in the stack space for the current scope.\n\n1. if an object is created with the method `alloc()` or `create()` of an allocator object, this object is stored in the memory space used by this particular allocator object. Most of allocators available in Zig use the heap memory, so, this object is likely stored in the heap (`FixedBufferAllocator()` is an exception to that).\n\n1. the heap can only be accessed through allocators. If your object was not created through the `alloc()` or `create()` methods of an allocator object, then, he is most certainly not an object stored in the heap.\n\n\n## Allocators {#sec-allocators}\n\nOne key aspect about Zig, is that there are \"no hidden-memory allocations\" in Zig.\nWhat that really means, is that \"no allocations happen behind your back in the standard library\" [@zigguide].\n\nThis is a known problem, especially in C++. Because in C++, there are some operators that do allocate\nmemory behind the scene, and there is no way for you to known that, until you actually read the\nsource code of these operators, and find the memory allocation calls.\nMany programmers find this behaviour annoying and hard to keep track of.\n\nBut, in Zig, if a function, an operator, or anything from the standard library\nneeds to allocate some memory during it's execution, then, this function/operator needs to receive (as input) an allocator\nprovided by the user, to actually be able to allocate the memory it needs.\n\nThis creates a clear distinction between functions that \"do not\" from those that \"actually do\"\nallocate memory. 
Just look at the arguments of this function.\nIf a function, or operator, have an allocator object as one of it's inputs/arguments, then, you know for\nsure that this function/operator will allocate some memory during it's execution.\n\nAn example is the `allocPrint()` function from the Zig standard library. With this function, you can\nwrite a new string using format specifiers. So, this function is, for example, very similar to the function `sprintf()` in C.\nIn order to write such new string, the `allocPrint()` function needs to allocate some memory to store the\noutput string.\n\nThat is why, the first argument of this function is an allocator object that you, the user/programmer, gives\nas input to the function. In the example below, I am using the `GeneralPurposeAllocator()` as my allocator\nobject. But I could easily use any other type of allocator object from the Zig standard library.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nconst name = \"Pedro\";\nconst output = try std.fmt.allocPrint(\n allocator,\n \"Hello {s}!!!\",\n .{name}\n);\ntry stdout.print(\"{s}\\n\", .{output});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello Pedro!!!\n```\n\n\n:::\n:::\n\n\n\n\n\nYou get a lot of control\nover where and how much memory this function can allocate. 
Because it is you,\nthe user/programmer, that provides the allocator for the function to use.\nThis makes \"total control\" over memory management easier to achieve in Zig.\n\n### What are allocators?\n\nAllocators in Zig are objects that you can use to allocate memory for your program.\nThey are similar to the memory allocating functions in C, like `malloc()` and `calloc()`.\nSo, if you need to use more memory than you initially have, during the execution of your program, you can simply ask\nfor more memory using an allocator.\n\nZig offers different types of allocators, and they are usually available through the `std.heap` module of\nthe standard library. So, just import the Zig standard library into your Zig module (with `@import(\"std\")`), and you can start\nusing these allocators in your code.\n\nFurthermore, every allocator object is built on top of the `Allocator` interface in Zig. This\nmeans that, every allocator object you find in Zig must have the methods `alloc()`,\n`create()`, `free()` and `destroy()`. So, you can change the type of allocator you are using,\nbut you don't need to change the function calls to the methods that do the memory allocation\n(and the free memory operations) for your program.\n\n### Why you need an allocator?\n\nAs we described at @sec-stack, everytime you make a function call in Zig,\na space in the stack is reserved for this function call. But the stack\nhave a key limitation which is: every object stored in the stack have a\nknown fixed length.\n\nBut in reality, there are two very common instances where this \"fixed length limitation\" of the stack is a deal braker:\n\n1. the objects that you create inside your function might grow in size during the execution of the function.\n\n2. 
sometimes, it is impossible to know upfront how many inputs you will receive, or how big this input will be.\n\nAlso, there is another instance where you might want to use an allocator, which is when you want to write a function that returns a pointer\nto a local object. As I described at @sec-stack, you cannot do that if this local object is stored in the\nstack. However, if this object is stored in the heap, then, you can return a pointer to this object at the\nend of the function. Because you (the programmer) control the lyfetime of any heap memory that you allocate. You decide\nwhen this memory get's destroyed/freed.\n\nThese are common situations where the stack is not good for.\nThat is why you need a different memory management strategy to\nstore these objects inside your function. You need to use\na memory type that can grow together with your objects, or that you\ncan control the lyfetime of this memory.\nThe heap fit this description.\n\nAllocating memory on the heap is commonly known as dynamic memory management. As the objects you create grow in size\nduring the execution of your program, you grow the amount of memory\nyou have by allocating more memory in the heap to store these objects. \nAnd you that in Zig, by using an allocator object.\n\n\n### The different types of allocators\n\n\nAt the moment of the writing of this book, in Zig, we have 6 different\nallocators available in the standard library:\n\n- `GeneralPurposeAllocator()`.\n- `page_allocator()`.\n- `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()`.\n- `ArenaAllocator()`.\n- `c_allocator()` (requires you to link to libc).\n\n\nEach allocator have it's own perks and limitations. All allocators, except `FixedBufferAllocator()` and `ArenaAllocator()`,\nare allocators that use the heap memory. 
So any memory that you allocate with\nthese allocators, will be placed in the heap.\n\n### General-purpose allocators\n\nThe `GeneralPurposeAllocator()`, as the name suggests, is a \"general purpose\" allocator. You can use it for every type\nof task. In the example below, I'm allocating enough space to store a single integer in the object `some_number`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const some_number = try allocator.create(u32);\n defer allocator.destroy(some_number);\n\n some_number.* = @as(u32, 45);\n}\n```\n:::\n\n\n\n\n\nWhile useful, you might want to use the `c_allocator()`, which is a alias to the C standard allocator `malloc()`. So, yes, you can use\n`malloc()` in Zig if you want to. Just use the `c_allocator()` from the Zig standard library. However,\nif you do use `c_allocator()`, you must link to Libc when compiling your source code with the\n`zig` compiler, by including the flag `-lc` in your compilation process.\nIf you do not link your source code to Libc, Zig will not be able to find the\n`malloc()` implementation in your system.\n\n### Page allocator\n\nThe `page_allocator()` is an allocator that allocates full pages of memory in the heap. In other words,\nevery time you allocate memory with `page_allocator()`, a full page of memory in the heap is allocated,\ninstead of just a small piece of it.\n\nThe size of this page depends on the system you are using.\nMost systems use a page size of 4KB in the heap, so, that is the amount of memory that is normally\nallocated in each call by `page_allocator()`. That is why, `page_allocator()` is considered a\nfast, but also \"wasteful\" allocator in Zig. 
Because it allocates a big amount of memory\nin each call, and you most likely will not need that much memory in your program.\n\n### Buffer allocators\n\nThe `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()` are allocator objects that\nwork with a fixed sized buffer that is stored in the stack. So these two allocators only allocates\nmemory in the stack. This also means that, in order to use these allocators, you must first\ncreate a buffer object, and then, give this buffer as an input to these allocators.\n\nIn the example below, I am creating a `buffer` object that is 10 elements long.\nNotice that I give this `buffer` object to the `FixedBufferAllocator()` constructor.\nNow, because this `buffer` object is 10 elements long, this means that I am limited to this space.\nI cannot allocate more than 10 elements with this allocator object. If I try to\nallocate more than that, the `alloc()` method will return an `OutOfMemory` error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar buffer: [10]u8 = undefined;\nfor (0..buffer.len) |i| {\n buffer[i] = 0; // Initialize to zero\n}\n\nvar fba = std.heap.FixedBufferAllocator.init(&buffer);\nconst allocator = fba.allocator();\nconst input = try allocator.alloc(u8, 5);\ndefer allocator.free(input);\n```\n:::\n\n\n\n\n\n### Arena allocator {#sec-arena-allocator}\n\nThe `ArenaAllocator()` is an allocator object that takes a child allocator as input. The idea behind the `ArenaAllocator()` in Zig\nis similar to the concept of \"arenas\" in the programming language Go[^go-arena]. 
It is an allocator object that allows you\nto allocate memory as many times you want, but free all memory only once.\nIn other words, if you have, for example, called 5 times the method `alloc()` of an `ArenaAllocator()` object, you can\nfree all the memory you allocated over these 5 calls at once, by simply calling the `deinit()` method of the same `ArenaAllocator()` object.\n\n[^go-arena]: \n\nIf you give, for example, a `GeneralPurposeAllocator()` object as input to the `ArenaAllocator()` constructor, like in the example below, then, the allocations\nyou perform with `alloc()` will actually be made with the underlying object `GeneralPurposeAllocator()` that was passed.\nSo, with an arena allocator, any new memory you ask for is allocated by the child allocator. The only thing that an arena allocator\nreally do is helping you to free all the memory you allocated multiple times with just a single command. In the example\nbelow, I called `alloc()` 3 times. So, if I did not used an arena allocator, then, I would need to call\n`free()` 3 times to free all the allocated memory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nvar aa = std.heap.ArenaAllocator.init(gpa.allocator());\ndefer aa.deinit();\nconst allocator = aa.allocator();\n\nconst in1 = allocator.alloc(u8, 5);\nconst in2 = allocator.alloc(u8, 10);\nconst in3 = allocator.alloc(u8, 15);\n_ = in1; _ = in2; _ = in3;\n```\n:::\n\n\n\n\n\n\n### The `alloc()` and `free()` methods\n\nIn the code example below, we are accessing the `stdin`, which is\nthe standard input channel, to receive an input from the\nuser. We read the input given by the user with the `readUntilDelimiterOrEof()`\nmethod.\n\nNow, after reading the input of the user, we need to store this input somewhere in\nour program. That is why I use an allocator in this example. I use it to allocate some\namount of memory to store this input given by the user. 
More specifically, the method `alloc()`\nof the allocator object is used to allocate an array capable of storing 50 `u8` values.\n\nNotice that this `alloc()` method receives two inputs. The first one, is a type.\nThis defines what type of values the allocated array will store. In the example\nbelow, we are allocating an array of unsigned 8-bit integers (`u8`). But\nyou can create an array to store any type of value you want. Next, on the second argument, we\ndefine the size of the allocated array, by specifying how much elements\nthis array will contain. In the case below, we are allocating an array of 50 elements.\n\nAt @sec-zig-strings we described that strings in Zig are simply arrays of characters.\nEach character is represented by an `u8` value. So, this means that the array that\nwas allocated in the object `input` is capable of storing a string that is\n50-characters long.\n\nSo, in essence, the expression `var input: [50]u8 = undefined` would create\nan array for 50 `u8` values in the stack of the current scope. 
But, you\ncan allocate the same array in the heap by using the expression `var input = try allocator.alloc(u8, 50)`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdin = std.io.getStdIn();\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var input = try allocator.alloc(u8, 50);\n defer allocator.free(input);\n for (0..input.len) |i| {\n input[i] = 0; // initialize all fields to zero.\n }\n // read user input\n const input_reader = stdin.reader();\n _ = try input_reader.readUntilDelimiterOrEof(\n input,\n '\\n'\n );\n std.debug.print(\"{s}\\n\", .{input});\n}\n```\n:::\n\n\n\n\nAlso, notice that in this example, we use the `defer` keyword (which I described at @sec-defer) to run a small\npiece of code at the end of the current scope, which is the expression `allocator.free(input)`.\nWhen you execute this expression, the allocator will free the memory that it allocated\nfor the `input` object.\n\nWe have talked about this at @sec-heap. You **should always** explicitly free any memory that you allocate\nusing an allocator! You do that by using the `free()` method of the same allocator object you\nused to allocate this memory. The `defer` keyword is used in this example only to help us execute\nthis free operation at the end of the current scope.\n\n\n### The `create()` and `destroy()` methods\n\nWith the `alloc()` and `free()` methods, you can allocate memory to store multiple elements\nat once. In other words, with these methods, we always allocate an array to store multiple elements at once.\nBut what if you need enough space to store just a single item? Should you\nallocate an array of a single element through `alloc()`?\n\nThe answer is no! 
In this case,\nyou should use the `create()` method of the allocator object.\nEvery allocator object offers the `create()` and `destroy()` methods,\nwhich are used to allocate and free memory for a single item, respectively.\n\nSo, in essence, if you want to allocate memory to store an array of elements, you\nshould use `alloc()` and `free()`. But if you need to store just a single item,\nthen, the `create()` and `destroy()` methods are ideal for you.\n\nIn the example below, I'm defining a struct to represent an user of some sort.\nIt could be an user for a game, or a software to manage resources, it doesn't mater.\nNotice that I use the `create()` method this time, to store a single `User` object\nin the program. Also notice that I use the `destroy()` method to free the memory\nused by this object at the end of the scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst User = struct {\n id: usize,\n name: []const u8,\n\n pub fn init(id: usize, name: []const u8) User {\n return .{ .id = id, .name = name };\n }\n};\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const user = try allocator.create(User);\n defer allocator.destroy(user);\n\n user.* = User.init(0, \"Pedro\");\n}\n```\n:::\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Memory and Allocators\n\n\nIn this chapter, we will talk about memory. How does Zig control memory? What\ncommon tools are used? Are there any important aspects that make memory\ndifferent/special in Zig? You will find the answers here.\n\nEvery computer needs memory. It is by having memory that computers can temporarily store\nthe values/results of your calculations. Without memory, programming languages would never have\nconcepts such as \"variables\", or \"objects\", to store the values that you generate.\n\n\n## Memory spaces\n\nEvery object that you create in your Zig source code needs to be stored somewhere\nin your computer's memory. Depending on where and how you define your object, Zig\nwill use a different \"memory space\", or a different\ntype of memory, to store this object.\n\nEach type of memory normally serves a different purpose.\nIn Zig, there are 3 types of memory (or 3 different memory spaces) that we care about. They are:\n\n- Global data register (or the \"global data section\");\n- Stack;\n- Heap;\n\n\n### Compile-time known versus runtime known {#sec-compile-time}\n\nOne strategy that Zig uses to decide where it will store each object that you declare is by looking\nat the value of this particular object. More specifically, by investigating if this value is\nknown at \"compile-time\" or at \"runtime\".\n\nWhen you write a program in Zig, the values of some of the objects that you write in your program are *known\nat compile time*. This means that, during the compilation process,\nthe `zig` compiler can figure out the exact value of a particular object\nthat exists in your source code.\nKnowing the length (or the size) of each object is also important. 
So the length (or the size) of each object that you write in your program is,\nin some cases, *known at compile time*.\n\nThe `zig` compiler cares more about knowing the length (or the size) of a particular object\nthan about knowing its actual value. But, if the `zig` compiler knows the value of the object, then it\nautomatically knows the size of this object, because it can simply calculate the\nsize of the object by looking at the size of the value.\n\nTherefore, the priority for the `zig` compiler is to discover the size of each object in your source code.\nIf the value of the object in question is known at compile-time, then the `zig` compiler\nautomatically knows the size/length of this object. But if the value of this object is not\nknown at compile-time, then the size of this object is only known at compile-time if,\nand only if, the type of this object has a known fixed size.\n\nFor a type to have a known fixed size, this type must have data members whose size is fixed.\nIf this type includes, for example, a variable sized array in it, then this type does not have a known\nfixed size, because this array can have any size at runtime\n(i.e. it can be an array of 2 elements, or 50 elements, or 1 thousand elements, etc.).\n\nFor example, a string object, which internally is an array of constant u8 values (`[]const u8`),\nhas a variable size. It can be a string object with 100 or 500 characters in it. If we do not\nknow at compile-time which exact string will be stored inside this string object, then we cannot calculate\nthe size of this string object at compile-time. 
So, any type, or any struct declaration that you make, that\nincludes a string data member that does not have an explicit fixed size, makes this type, or this\nnew struct that you are declaring, a type that does not have a known fixed size at compile-time.\n\nIn contrast, if the type or struct that you are declaring includes a data member that is an array,\nbut this array has a known fixed size, like `[60]u8` (which declares an array of 60 `u8` values), then\nthis type, or this struct that you are declaring, becomes a type with a known fixed size at compile-time.\nAnd because of that, in this case, the `zig` compiler does not need to know at compile-time the exact value of\nany object of this type, since the compiler can find the necessary size to store this object by\nlooking at the size of its type.\n\n\nLet's look at an example. In the source code below, we have two constant objects (`name` and `array`) declared.\nBecause the values of these particular objects are written down in the source code itself (`\"Pedro\"`\nand the number sequence from 1 to 4), the `zig` compiler can easily discover the values of these constant\nobjects (`name` and `array`) during the compilation process.\nThis is what \"known at compile time\" means. It refers to any object that you have in your Zig source code\nwhose value can be identified at compile time.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = \"Pedro\";\nconst array = [_]u8{1, 2, 3, 4};\n_ = name; _ = array;\n\nfn input_length(input: []const u8) usize {\n const n = input.len;\n return n;\n}\n```\n:::\n\n\n\n\nOn the other side of the spectrum are objects whose values are not known at compile time.\nFunction arguments are a classic example of this. 
Because the value of each function\nargument depends on the value that you assign to this particular argument\nwhen you call the function.\n\nFor example, the function `input_length()` contains an argument named `input`, which is an array of constant `u8` integers (`[]const u8`).\nIt is impossible to know at compile time the value of this particular argument. It is also impossible to know the size/length\nof this particular argument, because it is an array that does not have a fixed size specified explicitly in the argument type annotation.\n\nSo, we know that this `input` argument will be an array of `u8` integers. But we do not know, at compile-time, its value or its size.\nThis information is known only at runtime, which is the period of time when your program is executed.\nAs a consequence, the value of the expression `input.len` is also known only at runtime.\nThis is an intrinsic characteristic of any function. Just remember that the value of function arguments is usually not \"compile-time known\".\n\nHowever, as I mentioned earlier, what really matters to the compiler is to know the size of the object\nat compile-time, and not necessarily its value. So, although we don't know the value of the object `n`, which is the result of the expression\n`input.len`, at compile-time, we do know its size, because the expression `input.len` always returns a value of type `usize`,\nand the type `usize` has a known fixed size.\n\n\n\n### Global data register\n\nThe global data register is a specific section of the executable of your Zig program that is responsible\nfor storing any value that is known at compile time.\n\nEvery constant object whose value is known at compile time that you declare in your source code\nis stored in the global data register. 
Also, every literal value that you write in your source code,\nsuch as the string `\"this is a string\"`, or the integer `10`, or a boolean value such as `true`,\nis also stored in the global data register.\n\nHonestly, you don't need to care much about this memory space. You can't control it:\nyou can't deliberately access it or use it for your own purposes.\nAlso, this memory space does not affect the logic of your program.\nIt simply exists in your program.\n\n\n### Stack vs Heap\n\nIf you are familiar with systems programming, or just low-level programming in general, you\nhave probably heard of the \"duel\" between Stack and Heap. These are two different types of memory,\nor different memory spaces, which are both available in Zig.\n\nThese two types of memory don't actually duel with\neach other. This is a common misconception among beginners when seeing \"x vs y\" styles of\ntabloid headlines. These two types of memory are actually complementary to each other.\nSo, in almost every Zig program that you ever write, you will likely use a combination of both.\nI will describe each memory space in detail over the next sections. But for now, I just want to\nestablish the main difference between these two types of memory.\n\nIn essence, the stack memory is normally used to store values whose length is fixed and known\nat compile time. In contrast, the heap memory is a *dynamic* type of memory space, meaning that it is\nused to store values whose length might grow during the execution (runtime) of your program [@jenny2022].\n\nLengths that grow during runtime are intrinsically associated with \"runtime known\" types of values.\nIn other words, if you have an object whose length might grow during runtime, then the length\nof this object is not known at compile time. 
If the length is not known at compile-time,\nthe value of this object is also not known at compile-time.\nThese types of objects should be stored in the heap memory space, which is\na dynamic memory space that can grow or shrink to fit the size of your objects.\n\n\n\n### Stack {#sec-stack}\n\nThe stack is a type of memory that uses the power of the *stack data structure*, hence the name. \nA \"stack\" is a type of *data structure* that uses a \"last in, first out\" (LIFO) mechanism to store the values\nyou give it. I imagine you are familiar with this data structure.\nBut, if you are not, the [Wikipedia page](https://en.wikipedia.org/wiki/Stack_(abstract_data_type))[^wiki-stack]\nand the [Geeks For Geeks page](https://www.geeksforgeeks.org/stack-data-structure/)[^geek-stack] are both\nexcellent and easy resources to fully understand how this data structure works.\n\n[^wiki-stack]: \n[^geek-stack]: \n\nSo, the stack memory space is a type of memory that stores values using a stack data structure.\nIt adds and removes values from the memory by following a \"last in, first out\" (LIFO) principle.\n\nEvery time you make a function call in Zig, an amount of space in the stack is\nreserved for this particular function call [@jenny2022; @zigdocs].\nThe value of each function argument given to the function in this function call is stored in this\nstack space. Also, every local object that you declare inside the function scope is\nusually stored in this same stack space.\n\n\nLooking at the example below, the object `result` is a local object declared inside the scope of the `add()`\nfunction. 
Because of that, this object is stored inside the stack space reserved for the `add()` function.\nThe `r` object (which is declared outside of the `add()` function scope) is also stored in the stack.\nBut since it is declared in the \"outer\" scope, this object is stored in the\nstack space that belongs to this outer scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) u8 {\n const result = x + y;\n return result;\n}\n```\n:::\n\n\n\n\n\nSo, any object that you declare inside the scope of a function is usually stored inside\nthe space that was reserved for that particular function in the stack memory. This\nalso applies to any object declared inside the scope of your `main()` function, for example.\nAs you would expect, in this case, they\nare stored inside the stack space reserved for the `main()` function.\n\nOne very important detail about the stack memory is that **it frees itself automatically**.\nThis is very important, remember that. 
When objects are stored in the stack memory,\nyou don't have the work (or the responsibility) of freeing/destroying these objects,\nbecause they will be automatically destroyed once the stack space is freed at the end of the function scope.\n\nSo, once the function call returns (or ends, if you prefer to call it this way),\nthe space that was reserved in the stack is destroyed, and all of the objects that were in that space go away with it.\nThis mechanism exists because this space, and the objects within it, are not necessary anymore,\nsince the function \"finished its business\".\nUsing the `add()` function that we presented above as an example, it means that the object `result` is automatically\ndestroyed once the function returns.\n\n::: {.callout-important}\nLocal objects that are stored in the stack space of a function are automatically\nfreed/destroyed at the end of the function scope.\n:::\n\n\nThis same logic applies to any other special structure in Zig that has its own scope delimited\nby curly braces (`{}`):\nfor loops, while loops, if else statements, etc. For example, if you declare any local\nobject in the scope of a for loop, this local object is accessible only within the scope\nof this particular for loop, because once the scope of this for loop ends, the space in the stack\nreserved for this for loop is freed.\nThe example below demonstrates this idea.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This does not compile successfully!\nconst a = [_]u8{0, 1, 2, 3, 4};\nfor (0..a.len) |i| {\n const index = i;\n _ = index;\n}\n// Trying to use an object that was\n// declared in the for loop scope,\n// and that does not exist anymore.\nstd.debug.print(\"{d}\\n\", .{index});\n```\n:::\n\n\n\n\n\n\nOne important consequence of this mechanism is that, once the function returns, you can no longer access any memory\naddress that was inside the space in the stack reserved for this particular function, because this space was\ndestroyed. 
This means that, if this local object is stored in the stack,\nyou cannot make a function that **returns a pointer to this object**.\n\nThink about that for a second. If all local objects in the stack are destroyed at the end of the function scope, why\nwould you even consider returning a pointer to one of these objects? This pointer is, at best,\ninvalid, or, more likely, \"undefined\".\n\nIn conclusion, it is totally fine to write a function that returns the local object\nitself, because then you return the value of that object as the result.\nBut, if this local object is stored in the stack, you should never write a function\nthat returns a pointer to this local object, because the memory address pointed to by the pointer\nno longer exists.\n\n\nSo, using again the `add()` function as an example, if you rewrite this function so that it\nreturns a pointer to the local object `result`, the `zig` compiler will actually compile\nyour program, with no warnings or errors. At first glance, it looks like this is good code\nthat works as expected. But this is a lie!\n\nIf you try to take a look at the value inside of the `r` object,\nor, if you try to use this `r` object in another expression\nor function call, then you would have undefined behaviour, and major\nbugs in your program [@zigdocs, see \"Lifetime and Ownership\"[^life] and \"Undefined Behaviour\"[^undef] sections].\n\n[^life]: \n[^undef]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This code compiles successfully. But it has\n// undefined behaviour. Never do this!!!\n\n// The `r` object is undefined!\nconst r = add(5, 27);\n_ = r;\n\nfn add(x: u8, y: u8) *const u8 {\n const result = x + y;\n return &result;\n}\n```\n:::\n\n\n\n\nThis \"invalid pointer to stack variable\" problem is well known across many programming language communities.\nIf you try to do the same thing, for example, in a C or C++ program (i.e. 
returning an address to\na local object stored in the stack), you would also get undefined behaviour\nin the program.\n\n::: {.callout-important}\nIf a local object in your function is stored in the stack, you should never\nreturn a pointer to this local object from the function,\nbecause this pointer will always become undefined after the function returns, since the stack space of the function\nis destroyed at the end of its scope.\n:::\n\nBut what if you really need to use this local object in some way after your function returns?\nHow can you do this? The answer is: \"in the same way you would do it if this were a C or C++ program: by returning\nan address to an object stored in the heap\". The heap memory has a much more flexible lifecycle,\nand allows you to get a valid pointer to a local object of a function that has already returned\nfrom its scope.\n\n\n### Heap {#sec-heap}\n\nOne important limitation of the stack is that only objects whose length/size is known at compile-time can be\nstored in it. In contrast, the heap is a much more dynamic\n(and flexible) type of memory. It is the perfect type of memory to use\nfor objects whose size/length might grow during the execution of your program.\n\nVirtually any application that behaves as a server is a classic use case of the heap.\nAn HTTP server, an SSH server, a DNS server, an LSP server, ... any type of server.\nIn summary, a server is a type of application that runs for long periods of time,\nand that serves (or \"deals with\") any incoming request that reaches this particular server.\n\nThe heap is a good choice for this type of system, mainly because the server does not know upfront\nhow many requests it will receive from users while it is active. 
It could be one single request,\nor 5 thousand requests, or it could also be zero requests.\nThe server needs to have the ability to allocate and manage its memory according to how many requests it receives.\n\nAnother key difference between the stack and the heap is that the heap is a type\nof memory that you, the programmer, have complete control over. This makes the heap a\nmore flexible type of memory, but it also makes it harder to work with, because you,\nthe programmer, are responsible for managing everything related to it, including where the memory is allocated,\nhow much memory is allocated, and when this memory is freed.\n\n> Unlike stack memory, heap memory is allocated explicitly by programmers and it won’t be deallocated until it is explicitly freed [@jenny2022].\n\nTo store an object in the heap, you, the programmer, need to explicitly tell Zig to do so,\nby using an allocator to allocate some space in the heap. In @sec-allocators, I will present how you can use allocators to allocate memory\nin Zig.\n\n::: {.callout-important}\nAny memory you allocate in the heap needs to be explicitly freed by you, the programmer.\n:::\n\nThe majority of allocators in Zig do allocate memory on the heap. But some exceptions to this rule are\n`ArenaAllocator()` and `FixedBufferAllocator()`. The `ArenaAllocator()` is a special\ntype of allocator that works in conjunction with a second type of allocator.\nOn the other hand, the `FixedBufferAllocator()` is an allocator that works based on\nbuffer objects created on the stack. This means that the `FixedBufferAllocator()` makes\nallocations only on the stack.\n\n\n\n\n### Summary\n\nAfter discussing all of these boring details, we can quickly recap what we learned.\nIn summary, the Zig compiler will use the following rules to decide where each\nobject you declare is stored:\n\n1. every literal value (such as `\"this is a string\"`, `10`, or `true`) is stored in the global data section.\n\n1. 
every constant object (`const`) whose value **is known at compile-time** is also stored in the global data section.\n\n1. every object (constant or not) whose length/size **is known at compile time** is stored in the stack space for the current scope.\n\n1. if an object is created with the method `alloc()` or `create()` of an allocator object, this object is stored in the memory space used by this particular allocator object. Most allocators available in Zig use the heap memory, so this object is likely stored in the heap (`FixedBufferAllocator()` is an exception to that).\n\n1. the heap can only be accessed through allocators. If your object was not created through the `alloc()` or `create()` methods of an allocator object, then it is most certainly not an object stored in the heap.\n\n\n## Allocators {#sec-allocators}\n\nOne key aspect of Zig is that there are \"no hidden-memory allocations\".\nWhat that really means is that \"no allocations happen behind your back in the standard library\" [@zigguide].\n\nThis is a known problem, especially in C++, where some operators do allocate\nmemory behind the scenes, and there is no way for you to know that until you actually read the\nsource code of these operators and find the memory allocation calls.\nMany programmers find this behaviour annoying and hard to keep track of.\n\nBut, in Zig, if a function, an operator, or anything from the standard library\nneeds to allocate some memory during its execution, then this function/operator needs to receive (as input) an allocator\nprovided by the user, to actually be able to allocate the memory it needs.\n\nThis creates a clear distinction between functions that \"do not\" and those that \"actually do\"\nallocate memory. 
Just look at the arguments of a function.\nIf a function, or operator, has an allocator object as one of its inputs/arguments, then you know\nthat this function/operator might allocate some memory during its execution.\n\nAn example is the `allocPrint()` function from the Zig standard library. With this function, you can\nwrite a new string using format specifiers. So, this function is very similar to the function `sprintf()` in C.\nIn order to write such a new string, the `allocPrint()` function needs to allocate some memory to store the\noutput string.\n\nThat is why the first argument of this function is an allocator object that you, the user/programmer, give\nas input to the function. In the example below, I am using the `GeneralPurposeAllocator()` as my allocator\nobject. But I could easily use any other type of allocator object from the Zig standard library.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nconst name = \"Pedro\";\nconst output = try std.fmt.allocPrint(\n allocator,\n \"Hello {s}!!!\",\n .{name}\n);\n// `allocPrint()` allocated `output`, so we must free it.\ndefer allocator.free(output);\ntry stdout.print(\"{s}\\n\", .{output});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello Pedro!!!\n```\n\n\n:::\n:::\n\n\n\n\n\nYou get a lot of control\nover where and how much memory this function can allocate. 
Because it is you,\nthe user/programmer, who provides the allocator for the function to use.\nThis makes \"total control\" over memory management easier to achieve in Zig.\n\n### What are allocators?\n\nAllocators in Zig are objects that you can use to allocate memory for your program.\nThey are similar to the memory allocating functions in C, like `malloc()` and `calloc()`.\nSo, if you need to use more memory than you initially have during the execution of your program, you can simply ask\nfor more memory using an allocator.\n\nZig offers different types of allocators, and they are usually available through the `std.heap` module of\nthe standard library. So, just import the Zig standard library into your Zig module (with `@import(\"std\")`), and you can start\nusing these allocators in your code.\n\nFurthermore, every allocator in Zig is used through the same `Allocator` interface (which you usually get by calling\nthe `allocator()` method of an allocator object). This means that every allocator you use in Zig offers the methods `alloc()`,\n`create()`, `free()` and `destroy()`. So, you can change the type of allocator you are using,\nbut you don't need to change the function calls to the methods that do the memory allocation\n(and the free memory operations) for your program.\n\n### Why do you need an allocator?\n\nAs we described in @sec-stack, every time you make a function call in Zig,\na space in the stack is reserved for this function call. But the stack\nhas a key limitation: every object stored in the stack has a\nknown fixed length.\n\nBut in reality, there are two very common instances where this \"fixed length limitation\" of the stack is a deal breaker:\n\n1. the objects that you create inside your function might grow in size during the execution of the function.\n\n2. 
sometimes, it is impossible to know upfront how many inputs you will receive, or how big this input will be.\n\nAlso, there is another instance where you might want to use an allocator, which is when you want to write a function that returns a pointer\nto a local object. As I described in @sec-stack, you cannot do that if this local object is stored in the\nstack. However, if this object is stored in the heap, then you can return a pointer to this object at the\nend of the function, because you (the programmer) control the lifetime of any heap memory that you allocate. You decide\nwhen this memory gets destroyed/freed.\n\nThese are common situations for which the stack is not a good fit.\nThat is why you need a different memory management strategy to\nstore these objects inside your function. You need to use\na memory type that can grow together with your objects, or whose\nlifetime you can control.\nThe heap fits this description.\n\nAllocating memory on the heap is commonly known as dynamic memory management. As the objects you create grow in size\nduring the execution of your program, you grow the amount of memory\nyou have by allocating more memory in the heap to store these objects.\nAnd you do that in Zig by using an allocator object.\n\n\n### The different types of allocators\n\n\nAt the moment of writing this book, the Zig standard library offers, among others,\nthese 6 allocators:\n\n- `GeneralPurposeAllocator()`.\n- `page_allocator`.\n- `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()`.\n- `ArenaAllocator()`.\n- `c_allocator` (requires you to link to libc).\n\n\nEach allocator has its own perks and limitations. All allocators, except `FixedBufferAllocator()` and `ArenaAllocator()`,\nare allocators that use the heap memory. 
So any memory that you allocate with\nthese allocators will be placed in the heap.\n\n### General-purpose allocators\n\nThe `GeneralPurposeAllocator()`, as the name suggests, is a \"general purpose\" allocator. You can use it for every type\nof task. In the example below, I'm allocating enough space to store a single integer in the object `some_number`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const some_number = try allocator.create(u32);\n defer allocator.destroy(some_number);\n\n some_number.* = @as(u32, 45);\n}\n```\n:::\n\n\n\n\n\nAlternatively, you might want to use `c_allocator`, which is an allocator backed by the C standard allocator `malloc()`. So, yes, you can use\n`malloc()` in Zig if you want to. Just use the `c_allocator` from the Zig standard library. However,\nif you do use `c_allocator`, you must link to libc when compiling your source code with the\n`zig` compiler, by including the flag `-lc` in your compilation process.\nIf you do not link your source code to libc, Zig will not be able to find the\n`malloc()` implementation in your system.\n\n### Page allocator\n\nThe `page_allocator` is an allocator that allocates full pages of memory in the heap. In other words,\nevery time you allocate memory with `page_allocator`, at least a full page of memory in the heap is allocated,\ninstead of just a small piece of it.\n\nThe size of this page depends on the system you are using.\nMost systems use a page size of 4 KiB, so that is the minimum amount of memory that is normally\nallocated in each call by `page_allocator`. Also, each call asks the operating system for memory through a system call. That is why `page_allocator` is considered a\nslow and \"wasteful\" allocator in Zig. 
Because it allocates a large amount of memory\nin each call, and you most likely will not need that much memory in your program.\n\n### Buffer allocators\n\nThe `FixedBufferAllocator()` and `ThreadSafeFixedBufferAllocator()` are allocator objects that\nwork with a fixed-size buffer, which is usually stored in the stack. In that case, these two allocators only allocate\nmemory in the stack. This also means that, in order to use these allocators, you must first\ncreate a buffer object, and then give this buffer as an input to these allocators.\n\nIn the example below, I am creating a `buffer` object that is 10 elements long.\nNotice that I give this `buffer` object to the `FixedBufferAllocator()` constructor.\nNow, because this `buffer` object is 10 elements long, I am limited to this space.\nI cannot allocate more than 10 bytes with this allocator object. If I try to\nallocate more than that, the `alloc()` method will return an `OutOfMemory` error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar buffer: [10]u8 = undefined;\nfor (0..buffer.len) |i| {\n buffer[i] = 0; // Initialize to zero\n}\n\nvar fba = std.heap.FixedBufferAllocator.init(&buffer);\nconst allocator = fba.allocator();\nconst input = try allocator.alloc(u8, 5);\ndefer allocator.free(input);\n```\n:::\n\n\n\n\n\n### Arena allocator {#sec-arena-allocator}\n\nThe `ArenaAllocator()` is an allocator object that takes a child allocator as input. The idea behind the `ArenaAllocator()` in Zig\nis similar to the concept of \"arenas\" in the programming language Go[^go-arena]. 
It is an allocator object that allows you\nto allocate memory as many times as you want, but free all of it only once.\nIn other words, if you have, for example, called the method `alloc()` of an `ArenaAllocator()` object 5 times, you can\nfree all the memory you allocated over these 5 calls at once, by simply calling the `deinit()` method of the same `ArenaAllocator()` object.\n\n[^go-arena]: \n\nIf you give, for example, a `GeneralPurposeAllocator()` object as input to the `ArenaAllocator()` constructor, like in the example below, then the allocations\nyou perform with `alloc()` will actually be made with the underlying `GeneralPurposeAllocator()` object that was passed.\nSo, with an arena allocator, any new memory you ask for is allocated by the child allocator. The only thing that an arena allocator\nreally does is help you free all the memory you allocated over multiple calls with just a single command. In the example\nbelow, I called `alloc()` 3 times. So, if I did not use an arena allocator, then I would need to call\n`free()` 3 times to free all the allocated memory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nvar aa = std.heap.ArenaAllocator.init(gpa.allocator());\ndefer aa.deinit();\nconst allocator = aa.allocator();\n\n// `alloc()` can fail, so each call needs `try`.\nconst in1 = try allocator.alloc(u8, 5);\nconst in2 = try allocator.alloc(u8, 10);\nconst in3 = try allocator.alloc(u8, 15);\n_ = in1; _ = in2; _ = in3;\n```\n:::\n\n\n\n\n\n\n### The `alloc()` and `free()` methods\n\nIn the code example below, we are accessing `stdin`, which is\nthe standard input channel, to receive an input from the\nuser. We read the input given by the user with the `readUntilDelimiterOrEof()`\nmethod.\n\nNow, after reading the input of the user, we need to store this input somewhere in\nour program. That is why I use an allocator in this example. I use it to allocate some\namount of memory to store this input given by the user. 
More specifically, the method `alloc()`\nof the allocator object is used to allocate an array capable of storing 50 `u8` values (more precisely, `alloc()` returns a slice, `[]u8`, that points to this newly allocated array).\n\nNotice that this `alloc()` method receives two inputs. The first one is a type.\nThis defines what type of values the allocated array will store. In the example\nbelow, we are allocating an array of unsigned 8-bit integers (`u8`). But\nyou can create an array to store any type of value you want. In the second argument, we\ndefine the size of the allocated array, by specifying how many elements\nthis array will contain. In the case below, we are allocating an array of 50 elements.\n\nIn @sec-zig-strings we described that strings in Zig are simply arrays of characters.\nEach character (or, more precisely, each byte of the string) is represented by a `u8` value. So, this means that the array that\nwas allocated in the object `input` is capable of storing a string that is\n50 bytes long.\n\nSo, in essence, the expression `var input: [50]u8 = undefined` would create\nan array for 50 `u8` values in the stack of the current scope. 
But you\ncan allocate the same array in the heap by using the expression `const input = try allocator.alloc(u8, 50)` (here `input` can be `const`, because the slice itself is never reassigned, only the bytes it points to).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdin = std.io.getStdIn();\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const input = try allocator.alloc(u8, 50);\n defer allocator.free(input);\n for (0..input.len) |i| {\n input[i] = 0; // initialize all elements to zero.\n }\n // read user input\n const input_reader = stdin.reader();\n _ = try input_reader.readUntilDelimiterOrEof(\n input,\n '\\n'\n );\n std.debug.print(\"{s}\\n\", .{input});\n}\n```\n:::\n\n\n\n\nAlso, notice that in this example, we use the `defer` keyword (which I described in @sec-defer) to run a small\npiece of code at the end of the current scope, which is the expression `allocator.free(input)`.\nWhen you execute this expression, the allocator will free the memory that it allocated\nfor the `input` object.\n\nWe have talked about this in @sec-heap. You **should always** explicitly free any memory that you allocate\nusing an allocator! You do that by using the `free()` method of the same allocator object you\nused to allocate this memory. The `defer` keyword is used in this example only to help us execute\nthis free operation at the end of the current scope.\n\n\n### The `create()` and `destroy()` methods\n\nWith the `alloc()` and `free()` methods, you can allocate memory to store multiple elements\nat once. In other words, these methods always allocate an array of elements.\nBut what if you need enough space to store just a single item? Should you\nallocate an array of a single element through `alloc()`?\n\nThe answer is no! 
In this case,\nyou should use the `create()` method of the allocator object.\nEvery allocator object offers the `create()` and `destroy()` methods,\nwhich are used to allocate and free memory for a single item, respectively.\n\nSo, in essence, if you want to allocate memory to store an array of elements, you\nshould use `alloc()` and `free()`. But if you need to store just a single item,\nthen the `create()` and `destroy()` methods are ideal for you.\n\nIn the example below, I'm defining a struct to represent a user of some sort.\nIt could be a user for a game, or for a piece of software that manages resources, it doesn't matter.\nNotice that I use the `create()` method this time to store a single `User` object\nin the program. Also notice that I use the `destroy()` method to free the memory\nused by this object at the end of the scope.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst User = struct {\n id: usize,\n name: []const u8,\n\n pub fn init(id: usize, name: []const u8) User {\n return .{ .id = id, .name = name };\n }\n};\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const user = try allocator.create(User);\n defer allocator.destroy(user);\n\n user.* = User.init(0, \"Pedro\");\n}\n```\n:::\n",
"supporting": [
"01-memory_files"
],
diff --git a/_freeze/Chapters/01-zig-weird/execute-results/html.json b/_freeze/Chapters/01-zig-weird/execute-results/html.json
index 44fc9c9..452e079 100644
--- a/_freeze/Chapters/01-zig-weird/execute-results/html.json
+++ b/_freeze/Chapters/01-zig-weird/execute-results/html.json
@@ -1,9 +1,11 @@
{
- "hash": "21f5f3029bf2d923cc9f91c2a48c7f67",
+ "hash": "36ab0ec9b0940f24d1c334bc6f5fc7eb",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n\n# Introducing Zig\n\nIn this chapter, I want to introduce you to the world of Zig.\nZig is a very young language that is being actively developed.\nAs a consequence, it's world is still very wild and to be explored.\nThis book is my attempt to help you on your personal journey for\nunderstanding and exploring the exciting world of Zig.\n\nI assume you have previous experience with some programming\nlanguage in this book, not necessarily with a low-level one.\nSo, if you have experience with Python, or Javascript, for example, it will be fine.\nBut, if you do have experience with low-level languages, such as C, C++, or\nRust, you will probably learn faster throughout this book.\n\n## What is Zig?\n\nZig is a modern, low-level, and general-purpose programming language. Some programmers think of\nZig as a modern and better version of C.\n\nIn the author's personal interpretation, Zig is tightly connected with \"less is more\".\nInstead of trying to become a modern language by adding more and more features,\nmany of the core improvements that Zig brings to the\ntable are actually about removing annoying behaviours/features from C and C++.\nIn other words, Zig tries to be better by simplifying the language, and by having more consistent and robust behaviour.\nAs a result, analyzing, writing and debugging applications become much easier and simpler in Zig, than it is in C or C++.\n\nThis philosophy becomes clear with the following phrase from the official website of Zig:\n\n> \"Focus on debugging your application rather than debugging your programming language knowledge\".\n\nThis phrase is specially true for C++ programmers. Because C++ is a gigantic language,\nwith tons of features, and also, there are lots of different \"flavors of C++\". These elements\nare what makes C++ so complex and hard to learn. 
Zig tries to go in the opposite direction.\nZig is a very simple language, more closely related to other simple languages such as C and Go.\n\nThe phrase above is still important for C programmers too. Because, even C being a simple\nlanguage, it is still hard sometimes to read and understand C code. For example, pre-processor macros in\nC are a frequent source of confusion. They really make it sometimes hard to debug\nC programs. Because macros are essentially a second language embedded in C that obscures\nyour C code. With macros, you are no longer 100% sure about which pieces\nof the code are being sent to the compiler, i.e.\nthey obscures the actual source code that you wrote.\n\nYou don't have macros in Zig. In Zig, the code you write, is the actual code that get's compiled by the compiler.\nYou also don't have a hidden control flow happening behind the scenes. And, you also\ndon't have functions or operators from the standard library that make\nhidden memory allocations behind your back.\n\nBy being a simpler language, Zig becomes much more clear and easier to read/write,\nbut at the same time, it also achieves a much more robust state, with more consistent\nbehaviour in edge situations. Once again, less is more.\n\n\n## Hello world in Zig\n\nWe begin our journey in Zig by creating a small \"Hello World\" program.\nTo start a new Zig project in your computer, you simply call the `init` command\nfrom the `zig` compiler.\nJust create a new directory in your computer, then, init a new Zig project\ninside this directory, like this:\n\n```bash\nmkdir hello_world\ncd hello_world\nzig init\n```\n\n```\ninfo: created build.zig\ninfo: created build.zig.zon\ninfo: created src/main.zig\ninfo: created src/root.zig\ninfo: see `zig build --help` for a menu of options\n```\n\n### Understanding the project files {#sec-project-files}\n\nAfter you run the `init` command from the `zig` compiler, some new files\nare created inside of your current directory. 
First, a \"source\" (`src`) directory\nis created, containing two files, `main.zig` and `root.zig`. Each `.zig` file\nis a separate Zig module, which is simply a text file that contains some Zig code.\n\nBy convention, the `main.zig` module is where your main function lives. Thus,\nif you are building an executable program in Zig, you need to declare a `main()` function,\nwhich represents the entrypoint of your program, i.e. it is where the execution of your program begins.\n\nHowever, if you are building a library (instead of an executable program), then,\nthe normal procedure is to delete this `main.zig` file and start with the `root.zig` module.\nBy convention, the `root.zig` module is the root source file of your library.\n\n```bash\ntree .\n```\n\n```\n.\n├── build.zig\n├── build.zig.zon\n└── src\n ├── main.zig\n └── root.zig\n\n1 directory, 4 files\n```\n\nThe `ìnit` command also creates two additional files in our working directory:\n`build.zig` and `build.zig.zon`. The first file (`build.zig`) represents a build script written in Zig.\nThis script is executed when you call the `build` command from the `zig` compiler.\nIn other words, this file contain Zig code that executes the necessary steps to build the entire project.\n\n\nLow-level languages normally use a compiler to build your\nsource code into binary executables or binary libraries.\nNevertheless, this process of compiling your source code and building\nbinary executables or binary libraries from it, became a real challenge\nin the programming world, once the projects became bigger and bigger.\nAs a result, programmers created \"build systems\", which are a second set of tools designed to make this process\nof compiling and building complex projects, easier.\n\nExamples of build systems are CMake, GNU Make, GNU Autoconf and Ninja,\nwhich are used to build complex C and C++ projects.\nWith these systems, you can write scripts, which are called \"build scripts\".\nThey simply are scripts that 
describes the necessary steps to compile/build\nyour project.\n\nHowever, these are separate tools, that do not\nbelong to C/C++ compilers, like `gcc` or `clang`.\nAs a result, in C/C++ projects, you have not only to install and\nmanage your C/C++ compilers, but you also have to install and manage\nthese build systems separately.\n\nIn Zig, we don't need to use a separate set of tools to build our projects,\nbecause a build system is embedded inside the language itself.\nTherefore, Zig contains a native build system in it, and\nwe can use this build system to write small scripts in Zig,\nwhich describes the necessary steps to build/compile our Zig project[^zig-build-system].\nSo, everything you need to build a complex Zig project is the\n`zig` compiler, and nothing more.\n\n[^zig-build-system]: .\n\n\nThe second generated file (`build.zig.zon`) is the Zig package manager configuration file,\nwhere you can list and manage the dependencies of your project. Yes, Zig has\na package manager (like `pip` in Python, `cargo` in Rust, or `npm` in Javascript) called Zon,\nand this `build.zig.zon` file is similar to the `package.json` file\nin Javascript projects, or, the `Pipfile` file in Python projects,\nor the `Cargo.toml` file in Rust projects.\n\n\n### The file `root.zig` {#sec-root-file}\n\nLet's take a look into the `root.zig` file.\nYou might have noticed that every line of code with an expression ends with a semicolon (`;`).\nThis follows the syntax of a C-family programming language[^c-family].\n\n[^c-family]: \n\nAlso, notice the `@import()` call at the first line. 
We use this built-in function\nto import functionality from other Zig modules into our current module.\nThis `@import()` function works similarly to the `#include` pre-processor\nin C or C++, or, to the `import` statement in Python or Javascript code.\nIn this example, we are importing the `std` module,\nwhich gives you access to the Zig Standard Library.\n\nIn this `root.zig` file, we can also see how assignments (i.e. creating new objects)\nare made in Zig. You can create a new object in Zig by using the following syntax\n`(const|var) name = value;`. In the example below, we are creating two constant\nobjects (`std` and `testing`). At @sec-assignments we talk more about objects in general.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst testing = std.testing;\n\nexport fn add(a: i32, b: i32) i32 {\n return a + b;\n}\n```\n:::\n\n\n\n\n\nFunctions in Zig are declared using the `fn` keyword.\nIn this `root.zig` module, we are declaring a function called `add()`, which has two arguments named `a` and `b`.\nThe function returns an integer of the type `i32` as result.\n\n\nZig is not exactly a strongly-typed language. Because you can (if you want to) omit\nthe type of an object in your code, if this type can be derived from the assigned value.\nBut there are other situations where you do need to be explicit.\nFor example, you do have to explicitly specify the type of each function argument, and also,\nthe return type of every function you create in Zig. So, at least in function declarations,\nZig is a strongly-typed language.\n\nWe specify the type of an object or a function argument in Zig by\nusing a colon character (`:`) followed by the type after the name of this object/function argument.\nWith the expressions `a: i32` and `b: i32`, we know that both `a` and `b` arguments have type `i32`,\nwhich is a signed 32 bit integer. 
In this part,\nthe syntax in Zig is identical to the syntax in Rust, which also specifies types by\nusing the colon character.\n\nLastly, we have the return type of the function at the end of the line, before we open\nthe curly braces to start writing the function's body. In the example above, this type is also\na signed 32 bit integer (`i32`) value.\n\nNotice that we also have an `export` keyword before the function declaration. This keyword\nis similar to the `extern` keyword in C. It exposes the function\nto make it available in the library API. Therefore, if you are writing\na library for other people to use, you have to expose the functions\nyou write in the public API of this library by using this `export` keyword.\nIf we removed the `export` keyword from the `add()` function declaration,\nthen, this function would be no longer exposed in the library object built\nby the `zig` compiler.\n\n\n### The `main.zig` file {#sec-main-file}\n\nNow that we have learned a lot about Zig's syntax from the `root.zig` file,\nlet's take a look at the `main.zig` file.\nA lot of the elements we saw in `root.zig` are also present in `main.zig`.\nBut there are some other elements that we haven't seen yet, so let's dive in.\n\nFirst, look at the return type of the `main()` function in this file.\nWe can see a small change. The return\ntype of the function (`void`) is accompanied by an exclamation mark (`!`).\nThis exclamation mark tells us that this `main()` function\nmight return an error.\n\nIn this example, the `main()` function can either return `void` or return an error.\nThis is an interesting feature of Zig. 
If you write a function and something inside of\nthe body of this function might return an error then you are forced to:\n\n- either add the exclamation mark to the return type of the function and make it clear that\nthis function might return an error\n- explicitly handle this error inside the function\n\nIn most programming languages, we normally handle (or deal with) an error through\na *try catch* pattern. Zig do have both `try` and `catch` keywords. But they work\na little differently than what you're probably used to in other languages.\n\nIf we look at the `main()` function below, you can see that we do have a `try` keyword\non the 5th line. But we do not have a `catch` keyword in this code.\nIn Zig, we use the `try` keyword to execute an expression that might return an error,\nwhich, in this example, is the `stdout.print()` expression.\n\nIn essence, the `try` keyword executes the expression `stdout.print()`. If this expression\nreturns a valid value, then, the `try` keyword do nothing. It only passes the value forward.\nBut if the expression does return an error, then, the `try` keyword just unwrap the error value,\nand return this error from the function and also prints the current stack trace to `stderr`.\n\nThis might sound weird to you if you come from a high-level language. Because in\nhigh-level languages, such as Python, if an error occurs somewhere, this error is automatically\nreturned and the execution of your program will automatically stop even if you don't want\nto stop the execution. 
You are obligated to face the error.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n:::\n\n\n\n\n\nAnother thing that you might have noticed in this code example, is that\nthe `main()` function is marked with the `pub` keyword.\nIt marks the `main()` function as a *public function* from this module.\n\nEvery function in your Zig module is by default private to this Zig module and can only be called from within the module.\nUnless, you explicitly mark this function as a public function with the `pub` keyword.\nThis means that the `pub` keyword in Zig do essentially the opposite of what the `static` keyword\ndo in C/C++.\n\nBy making a function \"public\" you allow other Zig modules to access and call it.\nA calling Zig module imports the module with the `@import()`\nbuilt-in. That makes all public functions from the imported module visible.\n\n\n### Compiling your source code {#sec-compile-code}\n\nYou can compile your Zig modules into a binary executable by running the `build-exe` command\nfrom the `zig` compiler. You simply list all the Zig modules that you want to build after\nthe `build-exe` command, separated by spaces. In the example below, we are compiling the module `main.zig`.\n\n```bash\nzig build-exe src/main.zig\n```\n\nSince we are building an executable, the `zig` compiler will look for a `main()` function\ndeclared in any of the files that you list after the `build-exe` command. If\nthe compiler does not find a `main()` function declared somewhere, a\ncompilation error will be raised, warning about this mistake.\n\nThe `zig` compiler also offers a `build-lib` and `build-obj` commands, which work\nthe exact same way as the `build-exe` command. 
The only difference is that, they compile your\nZig modules into a portale C ABI library, or, into object files, respectively.\n\nIn the case of the `build-exe` command, a binary executable file is created by the `zig`\ncompiler in the root directory of your project.\nIf we take a look now at the contents of our current directory, with a simple `ls` command, we can\nsee the binary file called `main` that was created by the compiler.\n\n```bash\nls\n```\n\n```\nbuild.zig build.zig.zon main src\n```\n\nIf I execute this binary executable, I get the \"Hello World\" message in the terminal\n, as we expected.\n\n```bash\n./main\n```\n\n```\nHello, world!\n```\n\n\n### Compile and execute at the same time {#sec-compile-run-code}\n\nOn the previous section, I presented the `zig build-exe` command, which\ncompiles Zig modules into an executable file. However, this means that,\nin order to execute the executable file, we have to run two different commands.\nFirst, the `zig build-exe` command, and then, we call the executable file\ncreated by the compiler.\n\nBut what if we wanted to perform these two steps,\nall at once, in a single command? We can do that by using the `zig run`\ncommand.\n\n```bash\nzig run src/main.zig\n```\n\n```\nHello, world!\n```\n\n### Compiling the entire project {#sec-compile-project}\n\nJust as I described at @sec-project-files, as our project grows in size and\ncomplexity, we usually prefer to organize the compilation and build process\nof the project into a build script, using some sort of \"build system\".\n\nIn other words, as our project grows in size and complexity,\nthe `build-exe`, `build-lib` and `build-obj` commands become\nharder to use directly. Because then, we start to list\nmultiple and multiple modules at the same time. We also\nstart to add built-in compilation flags to customize the\nbuild process for our needs, etc. 
It becomes a lot of work\nto write the necessary commands by hand.\n\nIn C/C++ projects, programmers normally opt to use CMake, Ninja, `Makefile` or `configure` scripts\nto organize this process. However, in Zig, we have a native build system in the language itself.\nSo, we can write build scripts in Zig to compile and build Zig projects. Then, all we\nneed to do, is to call the `zig build` command to build our project.\n\nSo, when you execute the `zig build` command, the `zig` compiler will search\nfor a Zig module named `build.zig` inside your current directory, which\nshould be your build script, containing the necessary code to compile and\nbuild your project. If the compiler do find this `build.zig` file in your directory,\nthen, the compiler will essentially execute a `zig run` command\nover this `build.zig` file, to compile and execute this build\nscript, which in turn, will compile and build your entire project.\n\n\n```bash\nzig build\n```\n\n\nAfter you execute this \"build project\" command, a `zig-out` directory\nis created in the root of your project directory, where you can find\nthe binary executables and libraries created from your Zig modules\naccordingly to the build commands that you specified at `build.zig`.\nWe will talk more about the build system in Zig latter in this book.\n\nIn the example below, I'm executing the binary executable\nnamed `hello_world` that was generated by the compiler after the\n`zig build` command.\n\n```bash\n./zig-out/bin/hello_world\n```\n\n```\nHello, world!\n```\n\n\n\n## How to learn Zig?\n\nWhat are the best strategies to learn Zig? 
\nFirst of all, of course, this book will help you a lot on your journey through Zig.\nBut you will also need some extra resources if you want to be really good at Zig.\n\nAs a first tip, you can join a community of Zig programmers to get some help\nwhen you need it:\n\n- Reddit forum: ;\n- Ziggit community: ;\n- Discord, Slack, Telegram, and others: ;\n\nNow, one of the best ways to learn Zig is to simply read Zig code. Try\nto read Zig code often, and things will become clearer.\nA C/C++ programmer would probably give you this same tip,\nbecause this strategy really works!\n\nNow, where can you find Zig code to read?\nI personally think that the best way of reading Zig code is to read the source code of the\nZig Standard Library. The Zig Standard Library is available in the [`lib/std` folder](https://github.com/ziglang/zig/tree/master/lib/std)[^zig-lib-std] of\nthe official GitHub repository of Zig. Access this folder, and start exploring the Zig modules.\n\nAlso, a great alternative is to read code from other large Zig\ncodebases, such as:\n\n1. the [JavaScript runtime Bun](https://github.com/oven-sh/bun)[^bunjs].\n1. the [game engine Mach](https://github.com/hexops/mach)[^mach].\n1. a [Llama 2 LLM implementation in Zig](https://github.com/cgbur/llama2.zig/tree/main)[^ll2].\n1. the [financial transactions database `tigerbeetle`](https://github.com/tigerbeetle/tigerbeetle)[^tiger].\n1. the [command-line arguments parser `zig-clap`](https://github.com/Hejsil/zig-clap)[^clap].\n1. the [UI framework `capy`](https://github.com/capy-ui/capy)[^capy].\n1. the [Language Server Protocol implementation for Zig, `zls`](https://github.com/zigtools/zls)[^zls].\n1. 
the [event-loop library `libxev`](https://github.com/mitchellh/libxev)[^xev].\n\n[^xev]: <https://github.com/mitchellh/libxev>\n[^zls]: <https://github.com/zigtools/zls>\n[^capy]: <https://github.com/capy-ui/capy>\n[^clap]: <https://github.com/Hejsil/zig-clap>\n[^tiger]: <https://github.com/tigerbeetle/tigerbeetle>\n[^ll2]: <https://github.com/cgbur/llama2.zig/tree/main>\n[^mach]: <https://github.com/hexops/mach>\n[^bunjs]: <https://github.com/oven-sh/bun>.\n\nAll these assets are available on GitHub,\nand this is great, because we can use the GitHub search bar to our advantage\nto find Zig code that fits our description.\nFor example, you can always include `lang:Zig` in the GitHub search bar when you\nare searching for a particular pattern. This will limit the search to only Zig modules.\n\n[^zig-lib-std]: <https://github.com/ziglang/zig/tree/master/lib/std>\n\nAlso, a great alternative is to consult online resources and documentation.\nHere is a quick list of resources that I personally use from time to time to learn\nmore about the language each day:\n\n- Zig Language Reference: ;\n- Zig Standard Library Reference: ;\n- Zig Guide: ;\n- Karl Seguin Blog: ;\n- Zig News: ;\n- Read the code written by one of the Zig core team members: ;\n- Some livecoding sessions are streamed on the Zig Showtime YouTube Channel: ;\n\n\nAnother great strategy to learn Zig, or honestly, to learn any language you want,\nis to practice it by solving exercises. For example, there is a famous repository\nin the Zig community called [Ziglings](https://codeberg.org/ziglings/exercises/)[^ziglings],\nwhich contains more than 100 small exercises that you can solve. It is a repository of\ntiny programs written in Zig that are currently broken, and your responsibility is to\nfix these programs and make them work again.\n\n[^ziglings]: <https://codeberg.org/ziglings/exercises/>.\n\nA famous tech YouTuber known as *The Primeagen* also posted some videos on YouTube\nwhere he solves these exercises from Ziglings. 
The first video is named\n[\"Trying Zig Part 1\"](https://www.youtube.com/watch?v=OPuztQfM3Fg&t=2524s&ab_channel=TheVimeagen)[^prime1].\n\n[^prime1]: <https://www.youtube.com/watch?v=OPuztQfM3Fg&t=2524s&ab_channel=TheVimeagen>.\n\nAnother great alternative is to solve the [Advent of Code exercises](https://adventofcode.com/)[^advent-code].\nThere are people who have already taken the time to learn and solve the exercises, and they posted\ntheir solutions on GitHub as well. So, in case you need some resource to compare with while solving\nthe exercises, you can look at these two repositories:\n\n- ;\n- ;\n\n[^advent-code]: <https://adventofcode.com/>\n\n\n\n\n\n\n## Creating new objects in Zig (i.e. identifiers) {#sec-assignments}\n\nLet's talk more about objects in Zig. Readers who have past experience\nwith other programming languages might know this concept by\na different name, such as \"variable\" or \"identifier\". In this book, I choose\nto use the term \"object\" to refer to this concept.\n\nTo create a new object (or a new \"identifier\") in Zig, we use\nthe keywords `const` or `var`. These keywords specify whether the object\nthat you are creating is mutable or not.\nIf you use `const`, then the object you are\ncreating is a constant (or immutable) object, which means that once you declare this object, you\ncan no longer change the value stored inside this object.\n\nOn the other hand, if you use `var`, then you are creating a variable (or mutable) object.\nYou can change the value of this object as many times as you want. Using the\nkeyword `var` in Zig is similar to using the keywords `let mut` in Rust.\n\n### Constant objects vs variable objects\n\nIn the code example below, we are creating a new constant object called `age`.\nThis object stores a number representing the age of someone. However, this code example\ndoes not compile successfully. 
Because on the next line of code, we are trying to change the value\nof the object `age` to 25.\n\nThe `zig` compiler detects that we are trying to change\nthe value of an object/identifier that is constant, and because of that,\nthe compiler will raise a compilation error, warning us about the mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 24;\n// The line below is not valid!\nage = 25;\n```\n:::\n\n\n\n\n\n```\nt.zig:10:5: error: cannot assign to constant\n age = 25;\n ~~^~~\n```\n\nIn contrast, if you use `var`, then the object created is a variable object.\nWith `var` you can declare this object in your source code, and then\nchange the value of this object as many times as you want at later points\nin your source code.\n\nSo, using the same code example shown above, if I change the declaration of the\n`age` object to use the `var` keyword, then the program compiles successfully,\nbecause now the `zig` compiler detects that we are changing the value of an\nobject that allows this behaviour, since it is a \"variable object\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = 24;\nage = 25;\n```\n:::\n\n\n\n\n\n\n### Declaring without an initial value\n\nBy default, when you declare a new object in Zig, you must give it\nan initial value. In other words, this means\nthat we have to declare and, at the same time, initialize every object we\ncreate in our source code.\n\nOn the other hand, you can, in fact, declare a new object in your source code\nwithout giving it an explicit value. 
But we need to use a special keyword for that,\nwhich is the `undefined` keyword.\n\nIt is important to emphasize that you should avoid using `undefined` as much as possible,\nbecause when you use this keyword, you leave your object uninitialized. As a consequence,\nif for some reason your code uses this object while it is still uninitialized, you will\nhave undefined behaviour, and likely major bugs, in your program.\n\nIn the example below, I'm declaring the `age` object again. But this time,\nI do not give it an initial value. The variable is only initialized at\nthe second line of code, where I store the number 25 in this object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = undefined;\nage = 25;\n```\n:::\n\n\n\n\n\nWith these points in mind, just remember that you should avoid using `undefined` in your code as much as possible.\nAlways declare and initialize your objects, because this gives you much more safety in your program.\nBut in case you really need to declare an object without initializing it... the\n`undefined` keyword is the way to do it in Zig.\n\n\n### There is no such thing as unused objects\n\nEvery object (being constant or variable) that you declare inside a function in Zig **must be used in some way**. You can give this object\nto a function call, as a function argument, or you can use it in another expression\nto calculate the value of another object, or you can call a method that belongs to this\nparticular object. \n\nIt doesn't matter in which way you use it, as long as you use it.\nIf you try to break this rule, i.e. if you try to declare an object but not use it,\nthe `zig` compiler will not compile your Zig source code, and it will issue an error\nmessage warning that you have unused objects in your code.\n\nLet's demonstrate this with an example. In the source code below, we declare a constant object\ncalled `age`. 
If you try to compile a simple Zig program with this line of code,\nthe compiler will return an error, as demonstrated below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 15;\n```\n:::\n\n\n\n\n\n```\nt.zig:4:11: error: unused local constant\n const age = 15;\n ^~~\n```\n\nEvery time you declare a new object in Zig, you have two choices:\n\n1. you either use the value of this object;\n2. or you explicitly discard the value of the object.\n\nTo explicitly discard the value of any object (constant or variable), all you need to do is assign\nthis object to a special character in Zig, which is the underscore (`_`).\nWhen you assign an object to the underscore, like in the example below, the `zig` compiler will\ndiscard the value of this particular object.\n\nYou can see in the example below that, this time, the compiler did not\ncomplain about any \"unused constant\", and successfully compiled our source code.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It compiles!\nconst age = 15;\n_ = age;\n```\n:::\n\n\n\n\n\nNow, remember: every time you assign a particular object to the underscore, you are telling the\ncompiler that you are not going to use this object anymore. This means that you can no longer\nuse this object further in your code. If you do, the discard becomes pointless, and the compiler rejects your code.\n\nSo if you try to use the constant `age` in the example below, after we discarded it, you\nwill get a loud error message from the compiler (talking about a \"pointless discard\")\nwarning you about this mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It does not compile.\nconst age = 15;\n_ = age;\n// Using a discarded value!\nstd.debug.print(\"{d}\\n\", .{age + 2});\n```\n:::\n\n\n\n\n\n```\nt.zig:7:5: error: pointless discard\n of local constant\n```\n\n\nThis same rule applies to variable objects. Every variable object must also be used in\nsome way. 
And if you assign a variable object to the underscore,\nthis object also gets discarded, and you can no longer use this object.\n\n\n\n### You must mutate every variable object\n\nEvery variable object that you create in your source code must be mutated at some point.\nIn other words, if you declare an object as a variable\nobject, with the keyword `var`, and you never change the value of this object\nafterwards, the `zig` compiler will detect this,\nand it will raise an error warning you about this mistake.\n\nThe concept behind this is that every object you create in Zig should preferably be a\nconstant object, unless you really need an object whose value will\nchange during the execution of your program.\n\nSo, if I try to declare a variable object such as `where_i_live` below,\nand I do not change the value of this object in some way,\nthe `zig` compiler raises an error message with the phrase \"variable is never mutated\".\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar where_i_live = \"Belo Horizonte\";\n_ = where_i_live;\n```\n:::\n\n\n\n\n\n```\nt.zig:7:5: error: local variable is never mutated\nt.zig:7:5: note: consider using 'const'\n```\n\n## Primitive Data Types {#sec-primitive-data-types}\n\nZig has many different primitive data types available for you to use.\nYou can see the full list of available data types on the official\n[Language Reference page](https://ziglang.org/documentation/master/#Primitive-Types)[^lang-data-types].\n\n[^lang-data-types]: <https://ziglang.org/documentation/master/#Primitive-Types>.\n\nBut here is a quick list:\n\n- Unsigned integers: `u8`, 8-bit integer; `u16`, 16-bit integer; `u32`, 32-bit integer; `u64`, 64-bit integer; `u128`, 128-bit integer.\n- Signed integers: `i8`, 8-bit integer; `i16`, 16-bit integer; `i32`, 32-bit integer; `i64`, 64-bit integer; `i128`, 128-bit integer.\n- Floating point numbers: `f16`, 16-bit floating point; `f32`, 32-bit floating point; `f64`, 64-bit floating point; `f128`, 128-bit floating point.\n- Boolean: `bool`, represents true or 
false values.\n- C ABI compatible types: `c_long`, `c_char`, `c_short`, `c_ushort`, `c_int`, `c_uint`, and many others.\n- Pointer-sized integers: `isize` and `usize`.\n\n\n\n\n\n\n\n## Arrays {#sec-arrays}\n\nYou create arrays in Zig by using a syntax that resembles the C syntax.\nFirst, inside a pair of brackets, you specify the size of the array you want to create\n(i.e. the number of elements that will be stored in the array).\n\nThen, you specify the data type of the elements that will be stored inside this array.\nAll elements present in an array in Zig must have the same data type. For example, you cannot mix elements\nof type `f32` with elements of type `i32` in the same array.\n\nAfter that, you simply list the values that you want to store in this array inside\na pair of curly braces.\nIn the example below, I am creating two constant objects that contain different arrays.\nThe first object contains an array of 4 integer values, while the second object contains\nan array of 3 floating point values.\n\nNow, you should notice that in the object `ls`, I am\nnot explicitly specifying the size of the array inside of the brackets. Instead\nof using a literal value (like the value 4 that I used in the `ns` object), I am\nusing the special character underscore (`_`). 
This syntax tells the `zig` compiler\nto fill this field with the number of elements listed inside of the curly braces.\nSo, this syntax `[_]` is for lazy (or smart) programmers who leave the job of\ncounting how many elements there are in the curly braces to the compiler.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst ls = [_]f64{432.1, 87.2, 900.05};\n_ = ns; _ = ls;\n```\n:::\n\n\n\n\n\nIt is worth noting that these are static arrays, meaning that\nthey cannot grow in size.\nOnce you declare your array, you cannot change its size.\nThis is very common in low level languages,\nbecause low level languages normally want to give you (the programmer) full control over memory,\nand the way in which arrays are expanded is tightly related to\nmemory management.\n\n\n### Selecting elements of the array {#sec-select-array-elem}\n\nOne very common activity is to select specific portions of an array\nyou have in your source code.\nIn Zig, you can select a specific element from your\narray by simply providing the index of this particular\nelement inside brackets after the object name.\nIn the example below, I am selecting the third element from the\n`ns` array. Notice that Zig is a \"zero-indexed\" language,\nlike C, C++, Rust, Python, and many other languages.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\ntry stdout.print(\"{d}\\n\", .{ ns[2] });\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n12\n```\n\n\n:::\n:::\n\n\n\n\n\nIn contrast, you can also select specific slices (or sections) of your array by using a\nrange selector. 
Some programmers also call these \"slice selectors\",\nand the same range syntax also exists in Rust.\nAnyway, a range selector is a special expression in Zig that defines\na range of indexes, and it has the syntax `start..end`.\n\nIn the example below, at the second line of code,\nthe `sl` object stores a slice (or a portion) of the\n`ns` array. More precisely, the elements at index 1 and 2\nin the `ns` array. \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\n_ = sl;\n```\n:::\n\n\n\n\n\nWhen you use the `start..end` syntax,\nthe \"end tail\" of the range selector is non-inclusive,\nmeaning that the index at the end is not included in the range that is\nselected from the array.\nTherefore, in practice, the syntax `start..end` selects the indexes from `start` up to `end - 1`.\n\nYou can, for example, create a slice that goes from the first to the\nlast element of the array by using the `ar[0..ar.len]` syntax.\nIn other words, it is a slice that\naccesses all elements in the array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [4]u8{48, 24, 12, 6};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\n\nYou can also use the syntax `start..` in your range selector,\nwhich tells the `zig` compiler to select the portion of the array\nthat begins at the `start` index and goes until the last element of the array.\nIn the example below, we are selecting the range from index 1\nuntil the end of the array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..];\n_ = sl;\n```\n:::\n\n\n\n\n\n\n### More on slices\n\nAs we discussed before, in Zig, you can select specific portions of an existing\narray. 
This is called *slicing* in Zig [@zigguide], because when you select a portion\nof an array, you are creating a slice object from that array.\n\nA slice object is essentially a pointer object accompanied by a length number.\nThe pointer object points to the first element in the slice, and the\nlength number tells the `zig` compiler how many elements there are in this slice.\n\n> Slices can be thought of as a pair of `[*]T` (the pointer to the data) and a `usize` (the element count) [@zigguide].\n\nThrough the pointer contained inside the slice, you can access the elements (or values)\nthat are inside the range (or portion) that you selected from the original array.\nBut the length number (which you can access through the `len` property of your slice object)\nis the really big improvement (over C arrays, for example) that Zig brings to the table here.\n\nWith this length number, Zig can check if you are trying to access an index that is out of the bounds of this particular slice\n(at compile time when the index is known, and at runtime otherwise, in safe build modes),\nwhich protects you against buffer overflow problems. In the example below,\nwe access the `len` property of the slice `sl`, which tells us that this slice\nhas 2 elements in it.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\ntry stdout.print(\"{d}\\n\", .{sl.len});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n2\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Array operators\n\nThere are two array operators available in Zig that are very useful:\nthe array concatenation operator (`++`), and the array multiplication operator (`**`). 
As the name suggests,\nthese are array operators.\n\nOne important detail about these two operators is that they work\nonly when both operands have a size (or \"length\") that is compile-time known.\nWe are going to talk more about\nthe differences between \"compile-time known\" and \"runtime known\" in @sec-compile-time.\nBut for now, keep in mind that you cannot use these operators in every situation.\n\nIn summary, the `++` operator creates a new array that is the concatenation\nof both arrays provided as operands. So, the expression `a ++ b` produces\na new array which contains all the elements from arrays `a` and `b`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst b = [_]u8{4,5};\nconst c = a ++ b;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 4, 5 }\n```\n\n\n:::\n:::\n\n\n\n\n\nThis `++` operator is particularly useful to concatenate strings together.\nStrings in Zig are described in depth in @sec-zig-strings. In summary, a string object in Zig\nis essentially an array of bytes. So, you can use this array concatenation operator\nto effectively concatenate strings together.\n\nIn contrast, the `**` operator is used to replicate an array multiple\ntimes. 
In other words, the expression `a ** 3` creates a new array\nwhich contains the elements of the array `a` repeated 3 times.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst c = a ** 2;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 1, 2, 3 }\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Runtime versus compile-time known length in slices\n\nWe are going to talk a lot about the differences between compile-time known\nand runtime known across this book, especially in @sec-compile-time.\nBut the basic idea is that a thing is compile-time known when we know\neverything (the value, the attributes and the characteristics) about this thing at compile-time.\nIn contrast, a thing is runtime known when its exact value is calculated only at runtime.\nTherefore, we don't know the value of this thing at compile-time, only at runtime.\n\nWe have learned in @sec-select-array-elem that slices are created by using a *range selector*,\nwhich represents a range of indexes. When this \"range of indexes\" (i.e. the start and the end of this range)\nis known at compile-time, the slice object that gets created is actually, under the hood, just\na single-item pointer to an array.\n\nYou don't need to precisely understand what that means now. We are going to talk a lot about pointers\nin @sec-pointer. For now, just understand that, when the range of indexes is known at compile-time,\nthe slice that gets created is just a pointer to an array, and the size of the slice is\nencoded in the type of this array (e.g. `*const [3]u64`).\n\nIf you have a slice object like this, i.e. a slice that has a compile-time known range,\nyou can use common pointer operations over this slice object. For example, you can 
For example, you can \ndereference the pointer of this slice, by using the `.*` method, like you would\ndo on a normal pointer object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst arr1 = [10]u64 {\n 1, 2, 3, 4, 5,\n 6, 7, 8, 9, 10\n};\n// This slice have a compile-time known range.\n// Because we know both the start and end of the range.\nconst slice = arr1[1..4];\n```\n:::\n\n\n\n\n\n\nOn the other hand, if the range of indexes is not known at compile time, then, the slice object\nthat get's created is not a pointer anymore, and, thus, it does not support pointer operations.\nFor example, maybe the start index is known at compile time, but the end index is not. In such\ncase, the range of the slice becomes runtime known only.\n\nIn the example below, the `slice` object have a runtime known range, because the end index of the range\nis not known at compile time. In other words, the size of the array at `buffer` is not known\nat compile time. When we execute this program, the size of the array might be 10, or, it might be 12\ndepending on where we execute it. Therefore, we don't know at compile time if\nthe slice object have a range of size 10, or, a range of size 12.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst builtin = @import(\"builtin\");\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var n: usize = 0;\n if (builtin.target.os.tag == .windows) {\n n = 10;\n } else {\n n = 12;\n }\n const buffer = try allocator.alloc(u64, n);\n const slice = buffer[0..];\n _ = slice;\n}\n```\n:::\n\n\n\n\n\n\n## Blocks and scopes {#sec-blocks}\n\nBlocks are created in Zig by a pair of curly braces. A block is just a group of\nexpressions (or statements) contained inside of a pair of curly braces. 
All of these expressions that\nare contained inside of this pair of curly braces belong to the same scope.\n\nIn other words, a block just delimits a scope in your code.\nThe objects that you define inside the same block belong to the same\nscope, and, therefore, are accessible from within this scope.\nAt the same time, these objects are not accessible outside of this scope.\nSo, you could also say that blocks are used to limit the scope of the objects that you create in\nyour source code. In less technical terms, blocks are used to specify where in your source code\nyou can access whatever object you have in your source code.\n\nSo, a block is just a group of expressions contained inside a pair of curly braces,\nand every block has its own scope, separate from the others.\nThe body of a function is a classic example of a block. If statements, for and while loops\n(and any other structure in the language that uses the pair of curly braces)\nare also examples of blocks.\n\nThis means that every if statement, for loop,\netc., that you create in your source code has its own separate scope.\nThat is why you can't access the objects that you defined inside\nof your for loop (or if statement) in an outer scope, i.e. a scope outside of the for loop:\nyou are trying to access an object that belongs to a scope that is different\nfrom your current scope.\n\n\nYou can create blocks within blocks, with multiple levels of nesting.\nYou can also (if you want to) give a label to a particular block, with the colon character (`:`).\nJust write `label:` before you open the pair of curly braces that delimits your block. When you label a block\nin Zig, you can use the `break` keyword to return a value from this block, as if it\nwere a function's body. 
You just write the `break` keyword, followed by the block label in the format `:label`,\nand the expression that defines the value that you want to return.\n\nTake the example below, where we are returning the value of the `y` object\nfrom the block `add_one`, and saving the result inside the `x` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar y: i32 = 123;\nconst x = add_one: {\n    y += 1;\n    break :add_one y;\n};\nif (x == 124 and y == 124) {\n    try stdout.print(\"Hey!\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHey!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n\n## How do strings work in Zig? {#sec-zig-strings}\n\nThe first project that we are going to build and discuss in this book is a base64 encoder/decoder (@sec-base64).\nBut in order for us to build such a thing, we need to get a better understanding of how strings work in Zig.\nSo let's discuss this specific aspect of Zig.\n\nIn Zig, a string literal value is just a pointer to a null-terminated array of bytes (i.e. the same thing as a C string).\nHowever, a string object in Zig is a little more than just a pointer. A string object\nin Zig is an object of type `[]const u8`, and this object always contains two things: a\npointer to an array of bytes, plus a length value. When the string object is created from a string literal,\nthis pointer points to the same null-terminated array of bytes that you would find in the string literal value.\nEach byte in this \"array of bytes\" is represented by a `u8` value, which is an unsigned 8 bit integer,\nso it is equivalent to the C data type `unsigned char`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This is a string literal value:\n_ = \"A literal value\";\n// This is a string object:\nconst object: []const u8 = \"A string object\";\n_ = object;\n```\n:::\n\n\n\n\n\nZig always assumes that this sequence of bytes is UTF-8 encoded. 
This might not be true for every\nsequence of bytes you have, but it is not really Zig's job to fix the encoding of your strings\n(you can use [`iconv`](https://www.gnu.org/software/libiconv/)[^libiconv] for that).\nToday, most of the text in our modern world, especially on the web, should be UTF-8 encoded.\nSo if your string literal is not UTF-8 encoded, then you will likely\nhave problems in Zig.\n\n[^libiconv]: <https://www.gnu.org/software/libiconv/>\n\nLet’s take for example the word \"Hello\". In UTF-8, this sequence of characters (H, e, l, l, o)\nis represented by the sequence of decimal numbers 72, 101, 108, 108, 111. In hexadecimal, this\nsequence is `0x48`, `0x65`, `0x6C`, `0x6C`, `0x6F`. So if I take this sequence of hexadecimal values\nand ask Zig to print this sequence of bytes as a sequence of characters (i.e. a string), then\nthe text \"Hello\" will be printed into the terminal:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n    const bytes = [_]u8{0x48, 0x65, 0x6C, 0x6C, 0x6F};\n    try stdout.print(\"{s}\\n\", .{bytes});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello\n```\n\n\n:::\n:::\n\n\n\n\n\n\nIf you want to see the actual bytes that represent a string in Zig, you can use\na `for` loop to iterate through each byte in the string, and ask Zig to print each byte as a hexadecimal\nvalue to the terminal. 
You do that by using a `print()` statement with the `X` formatting specifier,\nlike you would normally do with the [`printf()` function](https://cplusplus.com/reference/cstdio/printf/)[^printfs] in C.\n\n[^printfs]: <https://cplusplus.com/reference/cstdio/printf/>\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n    const string_object = \"This is an example of string literal in Zig\";\n    try stdout.print(\"Bytes that represent the string object: \", .{});\n    for (string_object) |byte| {\n        try stdout.print(\"{X} \", .{byte});\n    }\n    try stdout.print(\"\\n\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represent the string object: 54 68 69\n 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F\n 66 20 73 74 72 69 6E 67 20 6C 69 74 65 72 61 6C\n 20 69 6E 20 5A 69 67\n```\n\n\n:::\n:::\n\n\n\n\n\n### Strings in C\n\nAt first glance, this looks very similar to how C treats strings as well. In more detail, string values\nin C are treated internally as an array of arbitrary bytes, and this array is also null-terminated.\n\nBut one key difference between a Zig string and a C string is that Zig also stores the length of\nthe array inside the string object. This small detail makes your code safer, because it is much\neasier for the Zig compiler to check if you are trying to access an element that is \"out of bounds\", i.e. if\nyou are trying to access memory that does not belong to you.\n\nTo achieve this same kind of safety in C, you have to do a lot of work that kind of seems pointless.\nSo getting this kind of safety is not automatic and much harder to do in C. 
For example, if you want\nto track the length of your string throughout your program in C, then you first need to loop through\nthe array of bytes that represents this string, and find the position of the null element (`'\\0'`) to discover\nwhere exactly the array ends, or, in other words, to find how many elements the array of bytes contains.\n\nTo do that, you would need something like this in C. In this example, the C string stored in\nthe object `array` is 25 bytes long:\n\n```c\n#include <stdio.h>\nint main() {\n    const char* array = \"An example of string in C\";\n    int index = 0;\n    while (1) {\n        if (array[index] == '\\0') {\n            break;\n        }\n        index++;\n    }\n    printf(\"Number of elements in the array: %d\\n\", index);\n    return 0;\n}\n```\n\n```\nNumber of elements in the array: 25\n```\n\nBut in Zig, you do not have to do this, because the object already contains a `len`\nfield which stores the length information of the array. As an example, the `string_object` object below is 43 bytes long:\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n    const string_object = \"This is an example of string literal in Zig\";\n    try stdout.print(\"{d}\\n\", .{string_object.len});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n43\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### A better look at the object type\n\nNow, we can take a better look at the types of the objects that Zig creates. To check the type of any object in Zig, you can use the\n`@TypeOf()` function. If we look at the type of the `simple_array` object below, you will find that this object\nis an array of 4 elements. Each element is a signed integer of 32 bits, which corresponds to the data type `i32` in Zig.\nThat is what an object of type `[4]i32` is.\n\nBut if we look closely at the type of the `string_object` object below, you will find that this object is a\nconstant pointer (hence the `*const` annotation) to an array of 43 elements (or 43 bytes). 
Each element is a\nsingle byte (more precisely, an unsigned 8 bit integer - `u8`), that is why we have the `[43:0]u8` portion of the type below.\nIn other words, the string stored inside the `string_object` object is 43 bytes long.\nThat is why you have the type `*const [43:0]u8` below.\n\nIn the case of `string_object`, it is a constant pointer (`*const`) because the object `string_object` is declared\nas constant in the source code (in the line `const string_object = ...`). So, if we changed that for some reason, if\nwe declare `string_object` as a variable object (i.e. `var string_object = ...`), then, `string_object` would be\njust a normal pointer to an array of unsigned 8-bit integers (i.e. `* [43:0]u8`).\n\nNow, if we create an pointer to the `simple_array` object, then, we get a constant pointer to an array of 4 elements (`*const [4]i32`),\nwhich is very similar to the type of the `string_object` object. This demonstrates that a string object (or a string literal)\nin Zig is already a pointer to an array.\n\nJust remember that a \"pointer to an array\" is different than an \"array\". 
So a string object in Zig is a pointer to an array\nof bytes, and not simply an array of bytes.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n const simple_array = [_]i32{1, 2, 3, 4};\n try stdout.print(\"Type of array object: {}\", .{@TypeOf(simple_array)});\n try stdout.print(\n \"Type of string object: {}\",\n .{@TypeOf(string_object)}\n );\n try stdout.print(\n \"Type of a pointer that points to the array object: {}\",\n .{@TypeOf(&simple_array)}\n );\n}\n```\n:::\n\n\n\n\n\n```\nType of array object: [4]i32\nType of string object: *const [43:0]u8\nType of a pointer that points to\n the array object: *const [4]i32\n```\n\n\n### Byte vs unicode points\n\nIs important to point out that each byte in the array is not necessarily a single character.\nThis fact arises from the difference between a single byte and a single unicode point.\n\nThe encoding UTF-8 works by assigning a number (which is called a unicode point) to each character in\nthe string. For example, the character \"H\" is stored in UTF-8 as the decimal number 72. This means that\nthe number 72 is the unicode point for the character \"H\". Each possible character that can appear in a\nUTF-8 encoded string have its own unicode point.\n\nFor example, the Latin Capital Letter A With Stroke (Ⱥ) is represented by the number (or the unicode point)\n570. However, this decimal number (570) is higher than the maximum number stored inside a single byte, which\nis 255. In other words, the maximum decimal number that can be represented with a single byte is 255. 
That is why,\nthe unicode point 570 is actually stored inside the computer’s memory as the bytes `C8 BA`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"Ⱥ\";\n try stdout.print(\"Bytes that represents the string object: \", .{});\n for (string_object) |char| {\n try stdout.print(\"{X} \", .{char});\n }\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represents the string object: C8 BA \n```\n\n\n:::\n:::\n\n\n\n\n\n\nThis means that to store the character Ⱥ in an UTF-8 encoded string, we need to use two bytes together\nto represent the number 570. That is why the relationship between bytes and unicode points is not always\n1 to 1. Each unicode point is a single character in the string, but not always a single byte corresponds\nto a single unicode point.\n\nAll of this means that if you loop trough the elements of a string in Zig, you will be looping through the\nbytes that represents that string, and not through the characters of that string. In the Ⱥ example above,\nthe for loop needed two iterations (instead of a single iteration) to print the two bytes that represents this Ⱥ letter.\n\nNow, all english letters (or ASCII letters if you prefer) can be represented by a single byte in UTF-8. As a\nconsequence, if your UTF-8 string contains only english letters (or ASCII letters), then, you are lucky. Because\nthe number of bytes will be equal to the number of characters in that string. 
In other words, in this specific\nsituation, the relationship between bytes and unicode points is 1 to 1.\n\nBut on the other side, if your string contains other types of letters… for example, you might be working with\ntext data that contains, chinese, japanese or latin letters, then, the number of bytes necessary to represent\nyour UTF-8 string will likely be much higher than the number of characters in that string.\n\nIf you need to iterate through the characters of a string, instead of its bytes, then, you can use the\n`std.unicode.Utf8View` struct to create an iterator that iterates through the unicode points of your string.\n\nIn the example below, we loop through the japanese characters “アメリカ”. Each of the four characters in\nthis string is represented by three bytes. But the for loop iterates four times, one iteration for each\ncharacter/unicode point in this string:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n var utf8 = (\n (try std.unicode.Utf8View.init(\"アメリカ\"))\n .iterator()\n );\n while (utf8.nextCodepointSlice()) |codepoint| {\n try stdout.print(\n \"got codepoint {}\\n\",\n .{std.fmt.fmtSliceHexUpper(codepoint)}\n );\n }\n}\n```\n:::\n\n\n\n\n\n```\ngot codepoint E382A2\ngot codepoint E383A1\ngot codepoint E383AA\ngot codepoint E382AB\n```\n\n\n### Some useful functions for strings {#sec-strings-useful-funs}\n\nIn this section, I just want to quickly describe some functions from the Zig Standard Library\nthat are very useful to use when working with strings. 
Most notably:\n\n- `std.mem.eql()`: to compare if two strings are equal.\n- `std.mem.splitScalar()`: to split a string into an array of substrings given a delimiter value.\n- `std.mem.splitSequence()`: to split a string into an array of substrings given a substring delimiter.\n- `std.mem.startsWith()`: to check if string starts with substring.\n- `std.mem.endsWith()`: to check if string starts with substring.\n- `std.mem.trim()`: to remove specific values from both start and end of the string.\n- `std.mem.concat()`: to concatenate strings together.\n- `std.mem.count()`: to count the occurrences of substring in the string.\n- `std.mem.replace()`: to replace the occurrences of substring in the string.\n\nNotice that all of these functions come from the `mem` module of\nthe Zig Standard Library. This module contains multiple functions and methods\nthat are useful to work with memory and sequences of bytes in general.\n\nThe `eql()` function is used to check if two arrays of data are equal or not.\nSince strings are just arbitrary arrays of bytes, we can use this function to compare two strings together.\nThis function returns a boolean value indicating if the two strings are equal\nor not. The first argument of this function is the data type of the elements of the arrays\nthat are being compared.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name: []const u8 = \"Pedro\";\ntry stdout.print(\n \"{any}\\n\", .{std.mem.eql(u8, name, \"Pedro\")}\n);\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntrue\n```\n\n\n:::\n:::\n\n\n\n\n\nThe `splitScalar()` and `splitSequence()` functions are useful to split\na string into multiple fragments, like the `split()` method from Python strings. The difference between these two\nmethods is that the `splitScalar()` uses a single character as the separator to\nsplit the string, while `splitSequence()` uses a sequence of characters (a.k.a. a substring)\nas the separator. 
There is a practical example of these functions later in the book.\n\nThe `startsWith()` and `endsWith()` functions are pretty straightforward. They\nreturn a boolean value indicating if the string (or, more precisely, if the array of data)\nbegins (`startsWith`) or ends (`endsWith`) with the sequence provided.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name: []const u8 = \"Pedro\";\ntry stdout.print(\n \"{any}\\n\", .{std.mem.startsWith(u8, name, \"Pe\")}\n);\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntrue\n```\n\n\n:::\n:::\n\n\n\n\n\nThe `concat()` function, as the name suggests, concatenate two or more strings together.\nBecause the process of concatenating the strings involves allocating enough space to\naccomodate all the strings together, this `concat()` function receives an allocator\nobject as input.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst str1 = \"Hello\";\nconst str2 = \" you!\";\nconst str3 = try std.mem.concat(\n allocator, u8, &[_][]const u8{ str1, str2 }\n);\ntry stdout.print(\"{s}\\n\", .{str3});\n```\n:::\n\n\n\n\n\n```\nHello you!\n```\n\nAs you can imagine, the `replace()` function is used to replace substrings in a string by another substring.\nThis function works very similarly to the `replace()` method from Python strings. Therefore, you\nprovide a substring to search, and every time that the `replace()` function finds\nthis substring within the input string, it replaces this substring with the \"replacement substring\"\nthat you provided as input.\n\nIn the example below, we are taking the input string \"Hello\", and replacing all occurrences\nof the substring \"el\" inside this input string with \"34\", and saving the results inside the\n`buffer` object. 
As result, the `replace()` function returns an `usize` value that\nindicates how many replacements were performed.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst str1 = \"Hello\";\nvar buffer: [5]u8 = undefined;\nconst nrep = std.mem.replace(\n u8, str1, \"el\", \"34\", buffer[0..]\n);\ntry stdout.print(\"New string: {s}\\n\", .{buffer});\ntry stdout.print(\"N of replacements: {d}\\n\", .{nrep});\n```\n:::\n\n\n\n\n\n```\nNew string: H34lo\nN of replacements: 1\n```\n\n\n\n\n\n\n## Safety in Zig\n\nA general trend in modern low-level programming languages is safety. As our modern world\nbecome more interconnected with techology and computers,\nthe data produced by all of this technology becomes one of the most important\n(and also, one of the most dangerous) assets that we have.\n\nThis is probably the main reason why modern low-level programming languages\nhave been giving great attention to safety, especially memory safety, because\nmemory corruption is still the main target for hackers to exploit.\nThe reality is that we don't have an easy solution for this problem.\nFor now, we only have techniques and strategies that mitigates these\nproblems.\n\nAs Richard Feldman explains on his [most recent GOTO conference talk](https://www.youtube.com/watch?v=jIZpKpLCOiU&ab_channel=GOTOConferences)[^gotop]\n, we haven't figured it out yet a way to achieve **true safety in technology**.\nIn other words, we haven't found a way to build software that won't be exploited\nwith 100% certainty. We can greatly reduce the risks of our software being\nexploited, by ensuring memory safety for example. But this is not enough\nto achieve \"true safety\" territory.\n\nBecause even if you write your program in a \"safe language\", hackers can still\nexploit failures in the operational system where your program is running (e.g. 
maybe the\nsystem where your code is running have a \"backdoor exploit\" that can still\naffect your code in unexpected ways), or also, they can exploit the features\nfrom the architecture of your computer. A recently found exploit\nthat involves memory invalidation through a feature of \"memory tags\"\npresent in ARM chips is an example of that [@exploit1].\n\n[^gotop]: \n\nThe question is: what Zig and other languages have been doing to mitigate this problem?\nIf we take Rust as an example, Rust is, for the most part[^rust-safe], a memory safe\nlanguage by enforcing specific rules to the developer. In other words, the key feature\nof Rust, the *borrow checker*, forces you to follow a specific logic when you are writing\nyour Rust code, and the Rust compiler will always complain everytime you try to go out of this\npattern.\n\n[^rust-safe]: Actually, a lot of existing Rust code is still memory unsafe, because they communicate with external libraries through FFI (*foreign function interface*), which disables the borrow-checker features through the `unsafe` keyword.\n\n\nIn contrast, the Zig language is not a memory safe language by default.\nThere are some memory safety features that you get for free in Zig,\nespecially in arrays and pointer objects. But there are other tools\noffered by the language, that are not used by default.\nIn other words, the `zig` compiler does not obligates you to use such tools.\n\nThe tools listed below are related to memory safety. That is, they help you to achieve\nmemory safety in your Zig code:\n\n- `defer` allows you to keep free operations phisically close to allocations. This helps you to avoid memory leaks, \"use after free\", and also \"double-free\" problems. 
Furthermore, it also keeps free operations logically tied to the end of the current scope, which greatly reduces the mental overhead about object lifetime.\n- `errdefer` helps you to garantee that your program frees the allocated memory, even if a runtime error occurs.\n- pointers and objects are non-nullable by default. This helps you to avoid memory problems that might arise from de-referencing null pointers.\n- Zig offers some native types of allocators (called \"testing allocators\") that can detect memory leaks and double-frees. These types of allocators are widely used on unit tests, so they transform your unit tests into a weapon that you can use to detect memory problems in your code.\n- arrays and slices in Zig have their lengths embedded in the object itself, which makes the `zig` compiler very effective on detecting \"index out-of-range\" type of errors, and avoiding buffer overflows.\n\n\nDespite these features that Zig offers that are related to memory safety issues, the language\nalso have some rules that help you to achieve another type of safety, which is more related to\nprogram logic safety. These rules are:\n\n- pointers and objects are non-nullable by default. Which eliminates an edge case that might break the logic of your program.\n- switch statements must exaust all possible options.\n- the `zig` compiler forces you to handle every possible error in your program.\n\n\n## Other parts of Zig\n\nWe already learned a lot about Zig's syntax, and also, some pretty technical\ndetails about it. 
Just as a quick recap:\n\n- We talked about how functions are written in Zig at @sec-root-file and @sec-main-file.\n- How to create new objects/identifiers at @sec-root-file and especially at @sec-assignments.\n- How strings work in Zig at @sec-zig-strings.\n- How to use arrays and slices at @sec-arrays.\n- How to import functionality from other Zig modules at @sec-root-file.\n\n\nBut, for now, this amount of knowledge is enough for us to continue with this book.\nLater, over the next chapters we will still talk more about other parts of\nZig's syntax that are also equally important. Such as:\n\n\n- How Object-Oriented programming can be done in Zig through *struct declarations* at @sec-structs-and-oop.\n- Basic control flow syntax at @sec-zig-control-flow.\n- Enums at @sec-enum;\n- Pointers and Optionals at @sec-pointer;\n- Error handling with `try` and `catch` at @sec-error-handling;\n- Unit tests at @sec-unittests;\n- Vectors at @sec-vectors-simd;\n- Build System at @sec-build-system;\n\n\n\n\n",
- "supporting": [],
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Introducing Zig\n\nIn this chapter, I want to introduce you to the world of Zig.\nZig is a very young language that is being actively developed.\nAs a consequence, its world is still very wild and waiting to be explored.\nThis book is my attempt to help you on your personal journey of\nunderstanding and exploring the exciting world of Zig.\n\nIn this book, I assume you have previous experience with some programming\nlanguage, not necessarily with a low-level one.\nSo, if you have experience with Python, or Javascript, for example, it will be fine.\nBut, if you do have experience with low-level languages, such as C, C++, or\nRust, you will probably learn faster throughout this book.\n\n## What is Zig?\n\nZig is a modern, low-level, and general-purpose programming language. Some programmers think of\nZig as a modern and better version of C.\n\nIn the author's personal interpretation, Zig is tightly connected with \"less is more\".\nInstead of trying to become a modern language by adding more and more features,\nmany of the core improvements that Zig brings to the\ntable are actually about removing annoying behaviours/features from C and C++.\nIn other words, Zig tries to be better by simplifying the language, and by having more consistent and robust behaviour.\nAs a result, analyzing, writing and debugging applications becomes much easier and simpler in Zig than in C or C++.\n\nThis philosophy becomes clear with the following phrase from the official website of Zig:\n\n> \"Focus on debugging your application rather than debugging your programming language knowledge\".\n\nThis phrase is especially true for C++ programmers, because C++ is a gigantic language,\nwith tons of features, and also, there are lots of different \"flavors of C++\". These elements\nare what make C++ so complex and hard to learn. 
Zig tries to go in the opposite direction.\nZig is a very simple language, more closely related to other simple languages such as C and Go.\n\nThe phrase above is important for C programmers too. Even though C is a simple\nlanguage, C code can still be hard to read and understand. For example, pre-processor macros in\nC are a frequent source of confusion. They can make C programs hard to debug,\nbecause macros are essentially a second language embedded in C that obscures\nyour C code. With macros, you are no longer 100% sure about which pieces\nof the code are being sent to the compiler, i.e.\nthey obscure the actual source code that you wrote.\n\nYou don't have macros in Zig. In Zig, the code you write is the actual code that gets compiled by the compiler.\nYou also don't have hidden control flow happening behind the scenes. And, you also\ndon't have functions or operators from the standard library that make\nhidden memory allocations behind your back.\n\nBy being a simpler language, Zig becomes clearer and easier to read/write,\nbut at the same time, it also achieves a much more robust state, with more consistent\nbehaviour in edge situations. Once again, less is more.\n\n\n## Hello world in Zig\n\nWe begin our journey in Zig by creating a small \"Hello World\" program.\nTo start a new Zig project on your computer, you simply call the `init` command\nfrom the `zig` compiler.\nJust create a new directory on your computer, then initialize a new Zig project\ninside this directory, like this:\n\n```bash\nmkdir hello_world\ncd hello_world\nzig init\n```\n\n```\ninfo: created build.zig\ninfo: created build.zig.zon\ninfo: created src/main.zig\ninfo: created src/root.zig\ninfo: see `zig build --help` for a menu of options\n```\n\n### Understanding the project files {#sec-project-files}\n\nAfter you run the `init` command from the `zig` compiler, some new files\nare created inside of your current directory. 
First, a \"source\" (`src`) directory\nis created, containing two files, `main.zig` and `root.zig`. Each `.zig` file\nis a separate Zig module, which is simply a text file that contains some Zig code.\n\nBy convention, the `main.zig` module is where your main function lives. Thus,\nif you are building an executable program in Zig, you need to declare a `main()` function,\nwhich represents the entrypoint of your program, i.e. it is where the execution of your program begins.\n\nHowever, if you are building a library (instead of an executable program), then,\nthe normal procedure is to delete this `main.zig` file and start with the `root.zig` module.\nBy convention, the `root.zig` module is the root source file of your library.\n\n```bash\ntree .\n```\n\n```\n.\n├── build.zig\n├── build.zig.zon\n└── src\n ├── main.zig\n └── root.zig\n\n1 directory, 4 files\n```\n\nThe `init` command also creates two additional files in our working directory:\n`build.zig` and `build.zig.zon`. The first file (`build.zig`) represents a build script written in Zig.\nThis script is executed when you call the `build` command from the `zig` compiler.\nIn other words, this file contains Zig code that executes the necessary steps to build the entire project.\n\n\nLow-level languages normally use a compiler to build your\nsource code into binary executables or binary libraries.\nHowever, this process of compiling your source code and building\nbinary executables or binary libraries from it became a real challenge\nin the programming world, as projects became bigger and bigger.\nAs a result, programmers created \"build systems\", which are a second set of tools designed to make this process\nof compiling and building complex projects easier.\n\nExamples of build systems are CMake, GNU Make, GNU Autoconf and Ninja,\nwhich are used to build complex C and C++ projects.\nWith these systems, you can write scripts, which are called \"build scripts\".\nThey are simply scripts that 
describe the necessary steps to compile/build\nyour project.\n\nHowever, these are separate tools that do not\nbelong to C/C++ compilers, like `gcc` or `clang`.\nAs a result, in C/C++ projects, you not only have to install and\nmanage your C/C++ compilers, but you also have to install and manage\nthese build systems separately.\n\nIn Zig, we don't need to use a separate set of tools to build our projects,\nbecause a build system is embedded inside the language itself.\nIn other words, Zig contains a native build system, and\nwe can use this build system to write small scripts in Zig,\nwhich describe the necessary steps to build/compile our Zig project[^zig-build-system].\nSo, everything you need to build a complex Zig project is the\n`zig` compiler, and nothing more.\n\n[^zig-build-system]: .\n\n\nThe second generated file (`build.zig.zon`) is the Zig package manager configuration file,\nwhere you can list and manage the dependencies of your project. Yes, Zig has\na package manager (like `pip` in Python, `cargo` in Rust, or `npm` in Javascript) built into the `zig` compiler,\nand this `build.zig.zon` file, written in ZON (Zig Object Notation), is similar to the `package.json` file\nin Javascript projects, the `Pipfile` in Python projects,\nor the `Cargo.toml` file in Rust projects.\n\n\n### The file `root.zig` {#sec-root-file}\n\nLet's take a look at the `root.zig` file.\nYou might have noticed that every line of code with an expression ends with a semicolon (`;`).\nThis follows the syntax of a C-family programming language[^c-family].\n\n[^c-family]: \n\nAlso, notice the `@import()` call on the first line. 
We use this built-in function\nto import functionality from other Zig modules into our current module.\nThis `@import()` function works similarly to the `#include` pre-processor directive\nin C or C++, or to the `import` statement in Python or Javascript code.\nIn this example, we are importing the `std` module,\nwhich gives you access to the Zig Standard Library.\n\nIn this `root.zig` file, we can also see how assignments (i.e. creating new objects)\nare made in Zig. You can create a new object in Zig by using the following syntax\n`(const|var) name = value;`. In the example below, we are creating two constant\nobjects (`std` and `testing`). At @sec-assignments we talk more about objects in general.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst testing = std.testing;\n\nexport fn add(a: i32, b: i32) i32 {\n return a + b;\n}\n```\n:::\n\n\n\n\nFunctions in Zig are declared using the `fn` keyword.\nIn this `root.zig` module, we are declaring a function called `add()`, which has two arguments named `a` and `b`.\nThe function returns an integer of the type `i32` as its result.\n\n\nZig is a statically-typed language, but you do not always have to write types explicitly. You can (if you want to) omit\nthe type of an object in your code, if this type can be inferred from the assigned value.\nBut there are other situations where you do need to be explicit.\nFor example, you do have to explicitly specify the type of each function argument, and also,\nthe return type of every function you create in Zig. So, at least in function declarations,\nthe types must always be written out explicitly.\n\nWe specify the type of an object or a function argument in Zig by\nusing a colon character (`:`) followed by the type after the name of this object/function argument.\nWith the expressions `a: i32` and `b: i32`, we know that both `a` and `b` arguments have type `i32`,\nwhich is a signed 32-bit integer. 
In this respect,\nthe syntax in Zig is identical to the syntax in Rust, which also specifies types by\nusing the colon character.\n\nLastly, we have the return type of the function at the end of the line, before we open\nthe curly braces to start writing the function's body. In the example above, this type is also\na signed 32-bit integer (`i32`) value.\n\nNotice that we also have an `export` keyword before the function declaration. This keyword\nis similar to the `extern` keyword in C. It exposes the function\nto make it available in the library API. Therefore, if you are writing\na library for other people to use, you have to expose the functions\nyou write in the public API of this library by using this `export` keyword.\nIf we removed the `export` keyword from the `add()` function declaration,\nthen this function would no longer be exposed in the library object built\nby the `zig` compiler.\n\n\n### The `main.zig` file {#sec-main-file}\n\nNow that we have learned a lot about Zig's syntax from the `root.zig` file,\nlet's take a look at the `main.zig` file.\nA lot of the elements we saw in `root.zig` are also present in `main.zig`.\nBut there are some other elements that we haven't seen yet, so let's dive in.\n\nFirst, look at the return type of the `main()` function in this file.\nWe can see a small change. The return\ntype of the function (`void`) is accompanied by an exclamation mark (`!`).\nThis exclamation mark tells us that this `main()` function\nmight return an error.\n\nIn this example, the `main()` function can either return `void` or return an error.\nThis is an interesting feature of Zig. 
If you write a function and something inside of\nthe body of this function might return an error, then you are forced to:\n\n- either add the exclamation mark to the return type of the function and make it clear that\nthis function might return an error\n- explicitly handle this error inside the function\n\nIn most programming languages, we normally handle (or deal with) an error through\na *try catch* pattern. Zig does have both `try` and `catch` keywords. But they work\na little differently than what you're probably used to in other languages.\n\nIf we look at the `main()` function below, you can see that we do have a `try` keyword\non the 5th line. But we do not have a `catch` keyword in this code.\nIn Zig, we use the `try` keyword to execute an expression that might return an error,\nwhich, in this example, is the `stdout.print()` expression.\n\nIn essence, the `try` keyword executes the expression `stdout.print()`. If this expression\nreturns a valid value, then the `try` keyword does nothing. It only passes the value forward.\nBut if the expression does return an error, then the `try` keyword immediately returns this error\nfrom the current function, propagating it to the caller. If the error propagates all the way out of `main()`,\nthe program stops and prints the error (and, in debug builds, an error return trace) to `stderr`.\n\nThis might sound weird to you if you come from a high-level language, because in\nhigh-level languages, such as Python, if an error occurs somewhere, this error is automatically\npropagated and the execution of your program will automatically stop even if you don't want\nto stop the execution. 
You are obligated to face the error.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\n\npub fn main() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n:::\n\n\n\n\nAnother thing you might have noticed in this code example is that\nthe `main()` function is marked with the `pub` keyword.\nIt marks the `main()` function as a *public function* from this module.\n\nEvery function in your Zig module is private to this Zig module by default, and can only be called from within the module,\nunless you explicitly mark this function as a public function with the `pub` keyword.\nThis means that the `pub` keyword in Zig does essentially the opposite of what the `static` keyword\ndoes in C/C++.\n\nBy making a function \"public\" you allow other Zig modules to access and call it.\nA calling Zig module imports the module with the `@import()`\nbuilt-in. That makes all public functions from the imported module visible.\n\n\n### Compiling your source code {#sec-compile-code}\n\nYou can compile your Zig modules into a binary executable by running the `build-exe` command\nfrom the `zig` compiler. You simply list all the Zig modules that you want to build after\nthe `build-exe` command, separated by spaces. In the example below, we are compiling the module `main.zig`.\n\n```bash\nzig build-exe src/main.zig\n```\n\nSince we are building an executable, the `zig` compiler will look for a `main()` function\ndeclared in any of the files that you list after the `build-exe` command. If\nthe compiler does not find a `main()` function declared somewhere, a\ncompilation error will be raised, warning about this mistake.\n\nThe `zig` compiler also offers the `build-lib` and `build-obj` commands, which work\nthe exact same way as the `build-exe` command. 
The only difference is that they compile your\nZig modules into a portable C ABI library or into object files, respectively.\n\nIn the case of the `build-exe` command, a binary executable file is created by the `zig`\ncompiler in the root directory of your project.\nIf we now look at the contents of our current directory with a simple `ls` command, we can\nsee the binary file called `main` that was created by the compiler.\n\n```bash\nls\n```\n\n```\nbuild.zig build.zig.zon main src\n```\n\nIf I execute this binary executable, I get the \"Hello World\" message in the terminal,\nas expected.\n\n```bash\n./main\n```\n\n```\nHello, world!\n```\n\n\n### Compile and execute at the same time {#sec-compile-run-code}\n\nIn the previous section, I presented the `zig build-exe` command, which\ncompiles Zig modules into an executable file. However, this means that,\nin order to execute the executable file, we have to run two different commands.\nFirst, the `zig build-exe` command, and then, we call the executable file\ncreated by the compiler.\n\nBut what if we wanted to perform these two steps\nall at once, in a single command? We can do that by using the `zig run`\ncommand.\n\n```bash\nzig run src/main.zig\n```\n\n```\nHello, world!\n```\n\n### Compiling the entire project {#sec-compile-project}\n\nJust as I described at @sec-project-files, as our project grows in size and\ncomplexity, we usually prefer to organize the compilation and build process\nof the project into a build script, using some sort of \"build system\".\n\nIn other words, as our project grows in size and complexity,\nthe `build-exe`, `build-lib` and `build-obj` commands become\nharder to use directly, because we start to list\nmany modules at the same time. We also\nstart to add built-in compilation flags to customize the\nbuild process for our needs, etc. 
It becomes a lot of work\nto write the necessary commands by hand.\n\nIn C/C++ projects, programmers normally opt to use CMake, Ninja, `Makefile` or `configure` scripts\nto organize this process. However, in Zig, we have a native build system in the language itself.\nSo, we can write build scripts in Zig to compile and build Zig projects. Then, all we\nneed to do is call the `zig build` command to build our project.\n\nSo, when you execute the `zig build` command, the `zig` compiler will search\nfor a Zig module named `build.zig` inside your current directory, which\nshould be your build script, containing the necessary code to compile and\nbuild your project. If the compiler finds this `build.zig` file in your directory,\nthen the compiler will essentially execute a `zig run` command\nover this `build.zig` file, to compile and execute this build\nscript, which in turn will compile and build your entire project.\n\n\n```bash\nzig build\n```\n\n\nAfter you execute this \"build project\" command, a `zig-out` directory\nis created in the root of your project directory, where you can find\nthe binary executables and libraries created from your Zig modules,\naccording to the build commands that you specified in `build.zig`.\nWe will talk more about the build system in Zig later in this book.\n\nIn the example below, I'm executing the binary executable\nnamed `hello_world` that was generated by the compiler after the\n`zig build` command.\n\n```bash\n./zig-out/bin/hello_world\n```\n\n```\nHello, world!\n```\n\n\n\n## How to learn Zig?\n\nWhat are the best strategies to learn Zig? 
\nFirst of all, this book will, of course, help you a lot on your journey through Zig.\nBut you will also need some extra resources if you want to be really good at Zig.\n\nAs a first tip, you can join a community of Zig programmers to get some help\nwhen you need it:\n\n- Reddit forum: ;\n- Ziggit community: ;\n- Discord, Slack, Telegram, and others: ;\n\nNow, one of the best ways to learn Zig is to simply read Zig code. Try\nto read Zig code often, and things will become clearer.\nA C/C++ programmer would probably give you this same tip,\nbecause this strategy really works!\n\nNow, where can you find Zig code to read?\nI personally think that the best way of reading Zig code is to read the source code of the\nZig Standard Library. The Zig Standard Library is available in the [`lib/std` folder](https://github.com/ziglang/zig/tree/master/lib/std)[^zig-lib-std] of\nthe official GitHub repository of Zig. Access this folder, and start exploring the Zig modules.\n\nAlso, a great alternative is to read code from other large Zig\ncodebases, such as:\n\n1. the [JavaScript runtime Bun](https://github.com/oven-sh/bun)[^bunjs].\n1. the [game engine Mach](https://github.com/hexops/mach)[^mach].\n1. a [Llama 2 LLM implementation in Zig](https://github.com/cgbur/llama2.zig/tree/main)[^ll2].\n1. the [financial transactions database `tigerbeetle`](https://github.com/tigerbeetle/tigerbeetle)[^tiger].\n1. the [command-line arguments parser `zig-clap`](https://github.com/Hejsil/zig-clap)[^clap].\n1. the [UI framework `capy`](https://github.com/capy-ui/capy)[^capy].\n1. the [Language Server Protocol implementation for Zig, `zls`](https://github.com/zigtools/zls)[^zls].\n1. 
the [event-loop library `libxev`](https://github.com/mitchellh/libxev)[^xev].\n\n[^xev]: \n[^zls]: \n[^capy]: \n[^clap]: \n[^tiger]: \n[^ll2]: \n[^mach]: \n[^bunjs]: .\n\nAll these assets are available on GitHub,\nand this is great, because we can use the GitHub search bar to our advantage\nto find Zig code that fits our description.\nFor example, you can always include `lang:Zig` in the GitHub search bar when you\nare searching for a particular pattern. This will limit the search to Zig modules only.\n\n[^zig-lib-std]: \n\nAnother great alternative is to consult online resources and documentation.\nHere is a quick list of resources that I personally use from time to time to learn\nmore about the language each day:\n\n- Zig Language Reference: ;\n- Zig Standard Library Reference: ;\n- Zig Guide: ;\n- Karl Seguin Blog: ;\n- Zig News: ;\n- Read the code written by one of the Zig core team members: ;\n- Some live-coding sessions are streamed on the Zig Showtime YouTube Channel: ;\n\n\nAnother great strategy to learn Zig, or honestly, to learn any language you want,\nis to practice it by solving exercises. For example, there is a famous repository\nin the Zig community called [Ziglings](https://codeberg.org/ziglings/exercises/)[^ziglings],\nwhich contains more than 100 small exercises that you can solve. It is a repository of\ntiny programs written in Zig that are currently broken, and your task is to\nfix these programs and make them work again.\n\n[^ziglings]: .\n\nA famous tech YouTuber known as *The Primeagen* also posted some videos on YouTube\nwhere he solves these exercises from Ziglings. 
The first video is named\n[\"Trying Zig Part 1\"](https://www.youtube.com/watch?v=OPuztQfM3Fg&t=2524s&ab_channel=TheVimeagen)[^prime1].\n\n[^prime1]: .\n\nAnother great alternative is to solve the [Advent of Code exercises](https://adventofcode.com/)[^advent-code].\nSome people have already taken the time to solve these exercises, and they posted\ntheir solutions on GitHub as well. So, in case you need a reference to compare against while solving\nthe exercises, you can look at these two repositories:\n\n- ;\n- ;\n\n[^advent-code]: \n\n\n\n\n\n\n## Creating new objects in Zig (i.e. identifiers) {#sec-assignments}\n\nLet's talk more about objects in Zig. Readers that have past experience\nwith other programming languages might know this concept through\na different name, such as \"variable\" or \"identifier\". In this book, I choose\nto use the term \"object\" to refer to this concept.\n\nTo create a new object (or a new \"identifier\") in Zig, we use\nthe keywords `const` or `var`. These keywords specify whether the object\nthat you are creating is mutable or not.\nIf you use `const`, then the object you are\ncreating is a constant (or immutable) object, which means that once you declare this object, you\ncan no longer change the value stored inside this object.\n\nOn the other hand, if you use `var`, then you are creating a variable (or mutable) object.\nYou can change the value of this object as many times as you want. Using the\nkeyword `var` in Zig is similar to using the keywords `let mut` in Rust.\n\n### Constant objects vs variable objects\n\nIn the code example below, we are creating a new constant object called `age`.\nThis object stores a number representing the age of someone. However, this code example\ndoes not compile successfully. 
This is because, on the next line of code, we are trying to change the value\nof the object `age` to 25.\n\nThe `zig` compiler detects that we are trying to change\nthe value of an object/identifier that is constant, and because of that,\nthe compiler will raise a compilation error, warning us about the mistake.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 24;\n// The line below is not valid!\nage = 25;\n```\n:::\n\n\n\n\n```\nt.zig:10:5: error: cannot assign to constant\n age = 25;\n ~~^~~\n```\n\nIn contrast, if you use `var`, then the object created is a variable object.\nWith `var`, you can declare this object in your source code, and then\nchange the value of this object as many times as you want at later points\nin your source code.\n\nSo, in the same code example shown above, if I change the declaration of the\n`age` object to use the `var` keyword, then the program compiles successfully,\nbecause now the `zig` compiler sees that we are changing the value of an\nobject that allows this behaviour, since it is a \"variable object\".\nNotice also that `age` now has an explicit type (`u8`), because a `var` object cannot store a value\nof type `comptime_int`, which is the type of an untyped integer literal such as `24`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = 24;\nage = 25;\n```\n:::\n\n\n\n\n\n### Declaring without an initial value\n\nBy default, when you declare a new object in Zig, you must give it\nan initial value. In other words,\nwe have to declare, and, at the same time, initialize every object we\ncreate in our source code.\n\nHowever, you can, in fact, declare a new object in your source code\nwithout giving it an explicit value. 
But we need to use a special keyword for that,\nwhich is the `undefined` keyword.\n\nIt is important to emphasize that you should avoid using `undefined` as much as possible,\nbecause when you use this keyword, you leave your object uninitialized, and, as a consequence,\nif for some reason your code uses this object while it is uninitialized, you will\nhave undefined behaviour and major bugs in your program.\n\nIn the example below, I'm declaring the `age` object again. But this time,\nI do not give it an initial value. The variable is only initialized at\nthe second line of code, where I store the number 25 in this object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar age: u8 = undefined;\nage = 25;\n```\n:::\n\n\n\n\nWith these points in mind, just remember to avoid using `undefined` in your code as much as possible.\nAlways declare and initialize your objects, because this gives you much more safety in your program.\nBut in case you really need to declare an object without initializing it... the\n`undefined` keyword is the way to do it in Zig.\n\n\n### There is no such thing as unused objects\n\nEvery local object (constant or variable) that you declare in Zig, such as the objects declared inside a function, **must be used in some way**. You can give this object\nto a function call, as a function argument, or you can use it in another expression\nto calculate the value of another object, or you can call a method that belongs to this\nparticular object. \n\nIt doesn't matter how you use it, as long as you use it.\nIf you try to break this rule, i.e. if you declare an object but do not use it,\nthe `zig` compiler will not compile your Zig source code, and it will issue an error\nmessage warning that you have unused objects in your code.\n\nLet's demonstrate this with an example. In the source code below, we declare a constant object\ncalled `age`. 
If you try to compile a simple Zig program containing the line of code below,\nthe compiler will return an error like the one shown below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst age = 15;\n```\n:::\n\n\n\n\n```\nt.zig:4:11: error: unused local constant\n const age = 15;\n ^~~\n```\n\nEvery time you declare a new object in Zig, you have two choices:\n\n1. either you use the value of this object;\n2. or you explicitly discard the value of the object.\n\nTo explicitly discard the value of any object (constant or variable), all you need to do is to assign\nthis object to a special character in Zig, which is the underscore (`_`).\nWhen you assign an object to the underscore, like in the example below, the `zig` compiler will\ndiscard the value of this particular object.\n\nYou can see in the example below that, this time, the compiler did not\ncomplain about any \"unused constant\", and successfully compiled our source code.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It compiles!\nconst age = 15;\n_ = age;\n```\n:::\n\n\n\n\nNow, remember, a discard is only allowed for an object that is not used anywhere else.\nIf you assign an object to the underscore and then also use this object in your code, the compiler\nconsiders the discard pointless and refuses to compile your code. So, in practice, once you\ndiscard an object, you can no longer use it.\n\nSo if you try to use the constant `age` in the example below, after we discarded it, you\nwill get a loud error message from the compiler (talking about a \"pointless discard\")\nwarning you about this mistake.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// It does not compile.\nconst age = 15;\n_ = age;\n// Using a discarded value!\nstd.debug.print(\"{d}\\n\", .{age + 2});\n```\n:::\n\n\n\n\n```\nt.zig:7:5: error: pointless discard\n of local constant\n```\n\n\nThis same rule applies to variable objects. Every variable object must also be used in\nsome way. 
And if you assign a variable object to the underscore,\nthis object is also discarded, and you can no longer use this object.\n\n\n\n### You must mutate every variable object\n\nEvery variable object that you create in your source code must be mutated at some point.\nIn other words, if you declare an object as a variable\nobject, with the keyword `var`, and you do not change the value of this object\nat some point in the future, the `zig` compiler will detect this,\nand it will raise an error warning you about this mistake.\n\nThe concept behind this is that every object you create in Zig should preferably be a\nconstant object, unless you really need an object whose value will\nchange during the execution of your program.\n\nSo, if I try to declare a variable object such as `where_i_live` below,\nand I do not change the value of this object in some way,\nthe `zig` compiler raises an error message with the phrase \"variable is never mutated\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar where_i_live = \"Belo Horizonte\";\n_ = where_i_live;\n```\n:::\n\n\n\n\n```\nt.zig:7:5: error: local variable is never mutated\nt.zig:7:5: note: consider using 'const'\n```\n\n## Primitive Data Types {#sec-primitive-data-types}\n\nZig has many different primitive data types available for you to use.\nYou can see the full list of available data types on the official\n[Language Reference page](https://ziglang.org/documentation/master/#Primitive-Types)[^lang-data-types].\n\n[^lang-data-types]: .\n\nBut here is a quick list:\n\n- Unsigned integers: `u8`, 8-bit integer; `u16`, 16-bit integer; `u32`, 32-bit integer; `u64`, 64-bit integer; `u128`, 128-bit integer.\n- Signed integers: `i8`, 8-bit integer; `i16`, 16-bit integer; `i32`, 32-bit integer; `i64`, 64-bit integer; `i128`, 128-bit integer.\n- Floating-point numbers: `f16`, 16-bit floating point; `f32`, 32-bit floating point; `f64`, 64-bit floating point; `f128`, 128-bit floating point.\n- Boolean: `bool`, represents true or false 
values.\n- C ABI compatible types: `c_long`, `c_char`, `c_short`, `c_ushort`, `c_int`, `c_uint`, and many others.\n- Pointer-sized integers: `isize` and `usize`.\n\n\n\n\n\n\n\n## Arrays {#sec-arrays}\n\nYou create arrays in Zig by using a syntax that resembles the C syntax.\nFirst, inside a pair of brackets, you specify the size of the array you want to create\n(i.e., the number of elements that will be stored in the array).\n\nThen, you specify the data type of the elements that will be stored inside this array.\nAll elements present in an array in Zig must have the same data type. For example, you cannot mix elements\nof type `f32` with elements of type `i32` in the same array.\n\nAfter that, you simply list the values that you want to store in this array inside\na pair of curly braces.\nIn the example below, I am creating two constant objects that contain different arrays.\nThe first object contains an array of 4 integer values, while the second object contains\nan array of 3 floating point values.\n\nNow, you should notice that in the object `ls`, I am\nnot explicitly specifying the size of the array inside of the brackets. Instead\nof using a literal value (like the value 4 that I used in the `ns` object), I am\nusing the special character underscore (`_`). 
This syntax tells the `zig` compiler\nto fill this field with the number of elements listed inside of the curly braces.\nSo, this syntax `[_]` is for lazy (or smart) programmers who leave the job of\ncounting how many elements there are in the curly braces to the compiler.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst ls = [_]f64{432.1, 87.2, 900.05};\n_ = ns; _ = ls;\n```\n:::\n\n\n\n\nIt is worth noting that these are static arrays, meaning that\nthey cannot grow in size.\nOnce you declare your array, you cannot change its size.\nThis is very common in low-level languages,\nbecause low-level languages normally want to give you (the programmer) full control over memory,\nand the way in which arrays are expanded is tightly related to\nmemory management.\n\n\n### Selecting elements of the array {#sec-select-array-elem}\n\nOne very common activity is to select specific portions of an array\nyou have in your source code.\nIn Zig, you can select a specific element from your\narray by simply providing the index of this particular\nelement inside brackets after the object name.\nIn the example below, I am selecting the third element from the\n`ns` array. Notice that Zig uses zero-based indexing,\nlike C, C++, Rust, Python, and many other languages.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\ntry stdout.print(\"{d}\\n\", .{ ns[2] });\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n12\n```\n\n\n:::\n:::\n\n\n\n\nIn contrast, you can also select specific slices (or sections) of your array by using a\nrange selector. 
Some programmers also call these \"slice selectors\",\nand they also exist in Rust, with the exact same syntax as in Zig.\nAnyway, a range selector is a special expression in Zig that defines\na range of indexes, and it has the syntax `start..end`.\n\nIn the example below, on the second line of code,\nthe `sl` object stores a slice (or a portion) of the\n`ns` array. More precisely, the elements at indexes 1 and 2\nin the `ns` array. \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\n_ = sl;\n```\n:::\n\n\n\n\nWhen you use the `start..end` syntax,\nthe \"end tail\" of the range selector is non-inclusive,\nmeaning that the index at the end is not included in the range that is\nselected from the array.\nTherefore, the range `start..end` actually covers the indexes from `start` up to `end - 1`.\n\nYou can, for example, create a slice that goes from the first to the\nlast element of the array by using the `ar[0..ar.len]` syntax.\nIn other words, it is a slice that\naccesses all elements in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [4]u8{48, 24, 12, 6};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\nYou can also use the syntax `start..` in your range selector,\nwhich tells the `zig` compiler to select the portion of the array\nthat begins at the `start` index and goes until the last element of the array.\nIn the example below, we are selecting the range from index 1\nuntil the end of the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..];\n_ = sl;\n```\n:::\n\n\n\n\n\n### More on slices\n\nAs we discussed before, in Zig, you can select specific portions of an existing\narray. 
This is called *slicing* in Zig [@zigguide], because when you select a portion\nof an array, you are creating a slice object from that array.\n\nA slice object is essentially a pointer object accompanied by a length number.\nThe pointer object points to the first element in the slice, and the\nlength number tells the `zig` compiler how many elements there are in this slice.\n\n> Slices can be thought of as a pair of `[*]T` (the pointer to the data) and a `usize` (the element count) [@zigguide].\n\nThrough the pointer contained inside the slice, you can access the elements (or values)\nthat are inside the range (or portion) that you selected from the original array.\nBut the length number (which you can access through the `len` property of your slice object)\nis the really big improvement (over C arrays, for example) that Zig brings to the table here.\n\nWith this length number,\nthe `zig` compiler can easily check if you are trying to access an index that is out of the bounds of this particular slice,\nor if you are causing any buffer overflow problems. In the example below,\nwe access the `len` property of the slice `sl`, which tells us that this slice\nhas 2 elements in it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [4]u8{48, 24, 12, 6};\nconst sl = ns[1..3];\ntry stdout.print(\"{d}\\n\", .{sl.len});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n2\n```\n\n\n:::\n:::\n\n\n\n\n\n### Array operators\n\nThere are two very useful array operators available in Zig:\nthe array concatenation operator (`++`), and the array multiplication operator (`**`). 
As their names suggest,\nthese operators work on arrays.\n\nOne important detail about these two operators is that they work\nonly when the size (or \"length\") of the arrays involved (and, for `**`, the number of repetitions) is compile-time known.\nWe are going to talk more about\nthe differences between \"compile-time known\" and \"runtime known\" in @sec-compile-time.\nBut for now, just keep in mind that you cannot use these operators in every situation.\n\nIn summary, the `++` operator creates a new array that is the concatenation\nof both arrays provided as operands. So, the expression `a ++ b` produces\na new array which contains all the elements from arrays `a` and `b`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst b = [_]u8{4,5};\nconst c = a ++ b;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 4, 5 }\n```\n\n\n:::\n:::\n\n\n\n\nThis `++` operator is particularly useful to concatenate strings together.\nStrings in Zig are described in depth in @sec-zig-strings. In summary, a string object in Zig\nis essentially an array of bytes. So, you can use this array concatenation operator\nto effectively concatenate strings together.\n\nIn contrast, the `**` operator is used to replicate an array multiple\ntimes. 
In other words, the expression `a ** 3` creates a new array\nwhich contains the elements of the array `a` repeated 3 times.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst a = [_]u8{1,2,3};\nconst c = a ** 2;\ntry stdout.print(\"{any}\\n\", .{c});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n{ 1, 2, 3, 1, 2, 3 }\n```\n\n\n:::\n:::\n\n\n\n\n\n### Runtime versus compile-time known length in slices\n\nWe are going to talk a lot about the differences between compile-time known\nand runtime known across this book, especially in @sec-compile-time.\nBut the basic idea is that a thing is compile-time known when we know\neverything (the value, the attributes and the characteristics) about this thing at compile-time.\nIn contrast, a thing is runtime known when its exact value is calculated only at runtime.\nTherefore, we don't know the value of this thing at compile-time, only at runtime.\n\nWe have learned in @sec-select-array-elem that slices are created by using a *range selector*,\nwhich represents a range of indexes. When this \"range of indexes\" (i.e., the start and the end of this range)\nis known at compile-time, the slice object that gets created is actually, under the hood, just\na single-item pointer to an array.\n\nYou don't need to precisely understand what that means now. We are going to talk a lot about pointers\nin @sec-pointer. For now, just understand that, when the range of indexes is known at compile-time,\nthe slice that gets created is just a pointer to an array, and the length of this array is encoded\nin the type itself (e.g., `*const [3]u64`).\n\nIf you have a slice object like this, i.e., a slice that has a compile-time known range,\nyou can use common pointer operations over this slice object. For example, you can 
For example, you can \ndereference the pointer of this slice, by using the `.*` method, like you would\ndo on a normal pointer object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst arr1 = [10]u64 {\n 1, 2, 3, 4, 5,\n 6, 7, 8, 9, 10\n};\n// This slice have a compile-time known range.\n// Because we know both the start and end of the range.\nconst slice = arr1[1..4];\n```\n:::\n\n\n\n\n\nOn the other hand, if the range of indexes is not known at compile time, then, the slice object\nthat get's created is not a pointer anymore, and, thus, it does not support pointer operations.\nFor example, maybe the start index is known at compile time, but the end index is not. In such\ncase, the range of the slice becomes runtime known only.\n\nIn the example below, the `slice` object have a runtime known range, because the end index of the range\nis not known at compile time. In other words, the size of the array at `buffer` is not known\nat compile time. When we execute this program, the size of the array might be 10, or, it might be 12\ndepending on where we execute it. Therefore, we don't know at compile time if\nthe slice object have a range of size 10, or, a range of size 12.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst builtin = @import(\"builtin\");\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var n: usize = 0;\n if (builtin.target.os.tag == .windows) {\n n = 10;\n } else {\n n = 12;\n }\n const buffer = try allocator.alloc(u64, n);\n const slice = buffer[0..];\n _ = slice;\n}\n```\n:::\n\n\n\n\n\n## Blocks and scopes {#sec-blocks}\n\nBlocks are created in Zig by a pair of curly braces. A block is just a group of\nexpressions (or statements) contained inside of a pair of curly braces. 
All of these expressions that\nare contained inside this pair of curly braces belong to the same scope.\n\nIn other words, a block just delimits a scope in your code.\nThe objects that you define inside the same block belong to the same\nscope, and, therefore, are accessible from within this scope.\nAt the same time, these objects are not accessible outside of this scope.\nSo, you could also say that blocks are used to limit the scope of the objects that you create in\nyour source code. In less technical terms, blocks are used to specify where in your source code\nyou can access each object that you create.\n\nSo, a block is just a group of expressions contained inside a pair of curly braces,\nand every block has its own scope, separate from the others.\nThe body of a function is a classic example of a block. If statements, for and while loops\n(and any other structure in the language that uses a pair of curly braces)\nare also examples of blocks.\n\nThis means that every if statement, or for loop,\netc., that you create in your source code has its own separate scope.\nThat is why you can't access the objects that you defined inside\nof your for loop (or if statement) in an outer scope, i.e., a scope outside of the for loop,\nbecause you are trying to access an object that belongs to a scope that is different\nfrom your current scope.\n\n\nYou can create blocks within blocks, with multiple levels of nesting.\nYou can also (if you want to) give a label to a particular block, with the colon character (`:`).\nJust write `label:` before you open the pair of curly braces that delimits your block. When you label a block\nin Zig, you can use the `break` keyword to return a value from this block, as if it\nwere a function's body. 
You just write the `break` keyword, followed by the block label in the format `:label`,\nand the expression that defines the value that you want to return.\n\nIn the example below, we are returning the value of the `y` object\nfrom the block `add_one`, and saving the result inside the `x` object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar y: i32 = 123;\nconst x = add_one: {\n y += 1;\n break :add_one y;\n};\nif (x == 124 and y == 124) {\n try stdout.print(\"Hey!\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHey!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n\n## How do strings work in Zig? {#sec-zig-strings}\n\nThe first project that we are going to build and discuss in this book is a base64 encoder/decoder (@sec-base64).\nBut in order for us to build such a thing, we need to get a better understanding of how strings work in Zig.\nSo let's discuss this specific aspect of Zig.\n\nIn Zig, a string literal value is just a pointer to a null-terminated array of bytes (i.e., the same thing as a C string).\nHowever, a string object in Zig is a little more than just a pointer. A string object\nin Zig is an object of type `[]const u8`, and this object always contains two things: a\npointer to an array of bytes (such as the null-terminated array that you would find in a string literal value), plus a length value.\nEach byte in this \"array of bytes\" is represented by a `u8` value, which is an unsigned 8-bit integer,\nso it is equivalent to the C data type `unsigned char`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// This is a string literal value (discarded with `_`):\n_ = \"A literal value\";\n// This is a string object:\nconst object: []const u8 = \"A string object\";\n_ = object;\n```\n:::\n\n\n\n\nZig always assumes that this sequence of bytes is UTF-8 encoded. 
This might not be true for every\nsequence of bytes you have, but it is not really Zig's job to fix the encoding of your strings\n(you can use [`iconv`](https://www.gnu.org/software/libiconv/)[^libiconv] for that).\nToday, most of the text in our modern world, especially on the web, is UTF-8 encoded.\nSo if your string literal is not UTF-8 encoded, then you will likely\nhave problems in Zig.\n\n[^libiconv]: \n\nLet’s take, for example, the word \"Hello\". In UTF-8, this sequence of characters (H, e, l, l, o)\nis represented by the sequence of decimal numbers 72, 101, 108, 108, 111. In hexadecimal, this\nsequence is `0x48`, `0x65`, `0x6C`, `0x6C`, `0x6F`. So if I take this sequence of hexadecimal values,\nand ask Zig to print this sequence of bytes as a sequence of characters (i.e., a string), then\nthe text \"Hello\" will be printed to the terminal:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const bytes = [_]u8{0x48, 0x65, 0x6C, 0x6C, 0x6F};\n try stdout.print(\"{s}\\n\", .{bytes});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello\n```\n\n\n:::\n:::\n\n\n\n\n\nIf you want to see the actual bytes that represent a string in Zig, you can use\na `for` loop to iterate through each byte in the string, and ask Zig to print each byte as a hexadecimal\nvalue to the terminal. 
You do that by using a `print()` statement with the `X` format specifier,\nlike you would normally do with the [`printf()` function](https://cplusplus.com/reference/cstdio/printf/)[^printfs] in C.\n\n[^printfs]: \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n try stdout.print(\"Bytes that represent the string object: \", .{});\n for (string_object) |byte| {\n try stdout.print(\"{X} \", .{byte});\n }\n try stdout.print(\"\\n\", .{});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represent the string object: 54 68 69 \n 73 20 69 73 20 61 6E 20 65 78 61 6D 70 6C 65 20 6F\n 66 20 73 74 72 69 6E 67 20 6C 69 74 65 72 61 6C 20\n 69 6E 20 5A 69 67 \n```\n\n\n:::\n:::\n\n\n\n\n### Strings in C\n\nAt first glance, this looks very similar to how C treats strings as well. In more detail, string values\nin C are treated internally as an array of arbitrary bytes, and this array is also null-terminated.\n\nBut one key difference between a Zig string and a C string is that Zig also stores the length of\nthe array inside the string object. This small detail makes your code safer, because it is much\neasier for the Zig compiler to check if you are trying to access an element that is \"out of bounds\", i.e., if\nyou are trying to access memory that does not belong to you.\n\nTo achieve this same kind of safety in C, you have to do a lot of extra work by hand.\nSo getting this kind of safety is not automatic and much harder to do in C. 
For example, if you want\nto track the length of your string throughout your program in C, then you first need to loop through\nthe array of bytes that represents this string, and find the position of the null element (`'\\0'`) to discover\nwhere exactly the array ends, or, in other words, to find how many elements the array of bytes contains.\n\nTo do that, you would need something like this in C. In this example, the C string stored in\nthe object `array` is 25 bytes long:\n\n```c\n#include <stdio.h>\nint main() {\n const char* array = \"An example of string in C\";\n int index = 0;\n while (1) {\n if (array[index] == '\\0') {\n break;\n }\n index++;\n }\n printf(\"Number of elements in the array: %d\\n\", index);\n return 0;\n}\n```\n\n```\nNumber of elements in the array: 25\n```\n\nBut in Zig, you do not have to do this, because the object already contains a `len`\nfield which stores the length information of the array. As an example, the `string_object` object below is 43 bytes long:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n try stdout.print(\"{d}\\n\", .{string_object.len});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n43\n```\n\n\n:::\n:::\n\n\n\n\n\n### A better look at the object type\n\nNow, let's take a better look at the types of the objects that Zig creates. To check the type of any object in Zig, you can use the\n`@TypeOf()` built-in function. If we look at the type of the `simple_array` object below, you will find that this object\nis an array of 4 elements. Each element is a signed 32-bit integer, which corresponds to the data type `i32` in Zig.\nThat is what an object of type `[4]i32` is.\n\nBut if we look closely at the type of the `string_object` object below, you will find that this object is a\npointer to a constant array of 43 elements (or 43 bytes), hence the `*const` annotation. 
Each element is a\nsingle byte (more precisely, an unsigned 8-bit integer, `u8`), which is why we have the `[43:0]u8` portion of the type below.\nIn other words, the string stored inside the `string_object` object is 43 bytes long.\nThe `:0` part of this type indicates that the array is terminated by a sentinel value of zero (the null byte),\njust like a C string. That is why you have the type `*const [43:0]u8` below.\n\nIn the case of `string_object`, it is a constant pointer (`*const`) because string literals are stored in\nread-only memory, so the bytes they point to cannot be modified. This remains true even if we declare `string_object`\nas a variable object (i.e. `var string_object = ...`): in that case, only the `string_object` variable itself\nbecomes mutable, and its type is still `*const [43:0]u8`.\n\nNow, if we create a pointer to the `simple_array` object, we get a constant pointer to an array of 4 elements (`*const [4]i32`),\nwhich is very similar to the type of the `string_object` object. This demonstrates that a string object (or a string literal)\nin Zig is already a pointer to an array.\n\nJust remember that a \"pointer to an array\" is different from an \"array\". 
So a string object in Zig is a pointer to an array\nof bytes, and not simply an array of bytes.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"This is an example of string literal in Zig\";\n const simple_array = [_]i32{1, 2, 3, 4};\n try stdout.print(\"Type of array object: {}\\n\", .{@TypeOf(simple_array)});\n try stdout.print(\n \"Type of string object: {}\\n\",\n .{@TypeOf(string_object)}\n );\n try stdout.print(\n \"Type of a pointer that points to the array object: {}\\n\",\n .{@TypeOf(&simple_array)}\n );\n}\n```\n:::\n\n\n\n\n```\nType of array object: [4]i32\nType of string object: *const [43:0]u8\nType of a pointer that points to\n the array object: *const [4]i32\n```\n\n\n### Byte vs unicode points\n\nIt is important to point out that each byte in the array is not necessarily a single character.\nThis fact arises from the difference between a single byte and a single unicode point.\n\nUnicode works by assigning a number (which is called a unicode point) to each character, and the UTF-8\nencoding defines how each of these numbers is stored as a sequence of bytes. For example, the character \"H\" is stored in UTF-8 as the decimal number 72. This means that\nthe number 72 is the unicode point for the character \"H\". Each possible character that can appear in a\nUTF-8 encoded string has its own unicode point.\n\nFor example, the Latin Capital Letter A With Stroke (Ⱥ) is represented by the number (or the unicode point)\n570. However, this decimal number (570) is higher than 255, which is the maximum decimal number that can be\nrepresented with a single byte. 
In fact, UTF-8 stores only the unicode points from 0 to 127 (the ASCII range) in a single byte, and uses\ntwo to four bytes for any higher unicode point. That is why\nthe unicode point 570 is actually stored inside the computer’s memory as the two bytes `C8 BA`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n const string_object = \"Ⱥ\";\n try stdout.print(\"Bytes that represents the string object: \", .{});\n for (string_object) |char| {\n try stdout.print(\"{X} \", .{char});\n }\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nBytes that represents the string object: C8 BA \n```\n\n\n:::\n:::\n\n\n\n\n\nThis means that to store the character Ⱥ in a UTF-8 encoded string, we need to use two bytes together\nto represent the number 570. That is why the relationship between bytes and unicode points is not always\n1 to 1. Each unicode point is a single character in the string, but a single byte does not always correspond\nto a single unicode point.\n\nAll of this means that if you loop through the elements of a string in Zig, you will be looping through the\nbytes that represent that string, and not through the characters of that string. In the Ⱥ example above,\nthe for loop needed two iterations (instead of a single iteration) to print the two bytes that represent this Ⱥ letter.\n\nNow, all English letters (or ASCII characters, if you prefer) can be represented by a single byte in UTF-8. As a\nconsequence, if your UTF-8 string contains only English letters (or ASCII characters), then you are lucky, because\nthe number of bytes will be equal to the number of characters in that string. 
In other words, in this specific\nsituation, the relationship between bytes and unicode points is 1 to 1.\n\nOn the other hand, if your string contains other types of letters (for example, if you are working with\ntext data that contains Chinese, Japanese, or accented Latin letters), then the number of bytes necessary to represent\nyour UTF-8 string will likely be higher than the number of characters in that string.\n\nIf you need to iterate through the characters of a string, instead of its bytes, then you can use the\n`std.unicode.Utf8View` struct to create an iterator that iterates through the unicode points of your string.\n\nIn the example below, we loop through the Japanese characters “アメリカ”. Each of the four characters in\nthis string is represented by three bytes. But the while loop iterates only four times, one iteration for each\ncharacter/unicode point in this string:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n var utf8 = (\n (try std.unicode.Utf8View.init(\"アメリカ\"))\n .iterator()\n );\n while (utf8.nextCodepointSlice()) |codepoint| {\n try stdout.print(\n \"got codepoint {}\\n\",\n .{std.fmt.fmtSliceHexUpper(codepoint)}\n );\n }\n}\n```\n:::\n\n\n\n\n```\ngot codepoint E382A2\ngot codepoint E383A1\ngot codepoint E383AA\ngot codepoint E382AB\n```\n\n\n### Some useful functions for strings {#sec-strings-useful-funs}\n\nIn this section, I just want to quickly describe some functions from the Zig Standard Library\nthat are very useful when working with strings. 
Most notably:\n\n- `std.mem.eql()`: to check whether two strings are equal.\n- `std.mem.splitScalar()`: to split a string into substrings (returned through an iterator) given a delimiter value.\n- `std.mem.splitSequence()`: to split a string into substrings (returned through an iterator) given a substring delimiter.\n- `std.mem.startsWith()`: to check if a string starts with a substring.\n- `std.mem.endsWith()`: to check if a string ends with a substring.\n- `std.mem.trim()`: to remove specific values from both the start and the end of a string.\n- `std.mem.concat()`: to concatenate strings together.\n- `std.mem.count()`: to count the occurrences of a substring in a string.\n- `std.mem.replace()`: to replace the occurrences of a substring in a string.\n\nNotice that all of these functions come from the `mem` module of\nthe Zig Standard Library. This module contains multiple functions and methods\nthat are useful for working with memory and sequences of bytes in general.\n\nThe `eql()` function is used to check if two arrays of data are equal or not.\nSince strings are just arbitrary arrays of bytes, we can use this function to compare two strings.\nThis function returns a boolean value indicating if the two strings are equal\nor not. The first argument of this function is the data type of the elements of the arrays\nthat are being compared.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name: []const u8 = \"Pedro\";\ntry stdout.print(\n \"{any}\\n\", .{std.mem.eql(u8, name, \"Pedro\")}\n);\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntrue\n```\n\n\n:::\n:::\n\n\n\n\nThe `splitScalar()` and `splitSequence()` functions are useful to split\na string into multiple fragments, like the `split()` method from Python strings. The difference between these two\nfunctions is that `splitScalar()` uses a single character as the separator to\nsplit the string, while `splitSequence()` uses a sequence of characters (a.k.a. a substring)\nas the separator. 
There is a practical example of these functions later in the book.\n\nThe `startsWith()` and `endsWith()` functions are pretty straightforward. They\nreturn a boolean value indicating if the string (or, more precisely, if the array of data)\nbegins (`startsWith`) or ends (`endsWith`) with the sequence provided.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name: []const u8 = \"Pedro\";\ntry stdout.print(\n \"{any}\\n\", .{std.mem.startsWith(u8, name, \"Pe\")}\n);\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\ntrue\n```\n\n\n:::\n:::\n\n\n\n\nThe `concat()` function, as the name suggests, concatenates two or more strings together.\nBecause the process of concatenating the strings involves allocating enough space to\naccommodate all the strings together, this `concat()` function receives an allocator\nobject as input, and you are responsible for freeing the memory that it returns.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst str1 = \"Hello\";\nconst str2 = \" you!\";\nconst str3 = try std.mem.concat(\n allocator, u8, &[_][]const u8{ str1, str2 }\n);\ndefer allocator.free(str3);\ntry stdout.print(\"{s}\\n\", .{str3});\n```\n:::\n\n\n\n\n```\nHello you!\n```\n\nAs you can imagine, the `replace()` function is used to replace substrings in a string with another substring.\nThis function works very similarly to the `replace()` method from Python strings. That is, you\nprovide a substring to search for, and every time that the `replace()` function finds\nthis substring within the input string, it replaces this substring with the \"replacement substring\"\nthat you provided as input.\n\nIn the example below, we are taking the input string \"Hello\", and replacing all occurrences\nof the substring \"el\" inside this input string with \"34\", and saving the results inside the\n`buffer` object. 
The `replace()` function returns a `usize` value that\nindicates how many replacements were performed.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst str1 = \"Hello\";\nvar buffer: [5]u8 = undefined;\nconst nrep = std.mem.replace(\n u8, str1, \"el\", \"34\", buffer[0..]\n);\ntry stdout.print(\"New string: {s}\\n\", .{buffer});\ntry stdout.print(\"N of replacements: {d}\\n\", .{nrep});\n```\n:::\n\n\n\n\n```\nNew string: H34lo\nN of replacements: 1\n```\n\n\n\n\n\n\n## Safety in Zig\n\nA general trend in modern low-level programming languages is safety. As our modern world\nbecomes more interconnected with technology and computers,\nthe data produced by all of this technology becomes one of the most important\n(and also, one of the most dangerous) assets that we have.\n\nThis is probably the main reason why modern low-level programming languages\nhave been giving great attention to safety, especially memory safety, because\nmemory corruption is still the main target for hackers to exploit.\nThe reality is that we don't have an easy solution for this problem.\nFor now, we only have techniques and strategies that mitigate these\nproblems.\n\nAs Richard Feldman explains in his [most recent GOTO conference talk](https://www.youtube.com/watch?v=jIZpKpLCOiU&ab_channel=GOTOConferences)[^gotop],\nwe have not yet figured out a way to achieve **true safety in technology**.\nIn other words, we haven't found a way to build software that won't be exploited\nwith 100% certainty. We can greatly reduce the risks of our software being\nexploited, for example by ensuring memory safety. But this is not enough\nto reach \"true safety\" territory.\n\nEven if you write your program in a \"safe language\", hackers can still\nexploit failures in the operating system where your program is running (e.g. 
maybe the\nsystem where your code is running has a \"backdoor exploit\" that can still\naffect your code in unexpected ways), or they can exploit features\nof your computer's architecture. A recently found exploit\nthat involves memory invalidation through a feature of \"memory tags\"\npresent in ARM chips is an example of that [@exploit1].\n\n[^gotop]: \n\nThe question is: what have Zig and other languages been doing to mitigate this problem?\nIf we take Rust as an example, Rust is, for the most part[^rust-safe], a memory-safe\nlanguage, because it enforces specific rules on the developer. In other words, the key feature\nof Rust, the *borrow checker*, forces you to follow a specific logic when you are writing\nyour Rust code, and the Rust compiler will complain every time you try to break out of this\npattern.\n\n[^rust-safe]: Actually, a lot of existing Rust code is still memory unsafe, because it communicates with external libraries through FFI (*foreign function interface*), and these calls must be made inside `unsafe` blocks, where the compiler cannot guarantee that the operations performed are memory safe.\n\n\nIn contrast, Zig is not a memory-safe language by default.\nThere are some memory safety features that you get for free in Zig,\nespecially in arrays and pointer objects. But there are other tools\noffered by the language that are not used by default.\nIn other words, the `zig` compiler does not force you to use such tools.\n\nThe tools listed below are related to memory safety. That is, they help you to achieve\nmemory safety in your Zig code:\n\n- `defer` allows you to keep free operations physically close to allocations. This helps you to avoid memory leaks, \"use after free\", and also \"double-free\" problems. 
Furthermore, it also keeps free operations logically tied to the end of the current scope, which greatly reduces the mental overhead about object lifetimes.\n- `errdefer` helps you to guarantee that your program frees the allocated memory, even if a runtime error occurs.\n- pointers and objects are non-nullable by default. This helps you to avoid memory problems that might arise from de-referencing null pointers.\n- Zig offers some native types of allocators (called \"testing allocators\") that can detect memory leaks and double-frees. These types of allocators are widely used in unit tests, so they transform your unit tests into a weapon that you can use to detect memory problems in your code.\n- arrays and slices in Zig have their lengths embedded in the object itself, which allows Zig to detect \"index out-of-range\" errors (at compile time, or through runtime safety checks) and to avoid buffer overflows.\n\n\nBesides these memory safety features, the language\nalso has some rules that help you to achieve another type of safety, which is more related to\nprogram logic safety. These rules are:\n\n- pointers and objects are non-nullable by default, which eliminates an edge case that might break the logic of your program.\n- switch statements must exhaust all possible options.\n- the `zig` compiler forces you to handle every possible error in your program.\n\n\n## Other parts of Zig\n\nWe have already learned a lot about Zig's syntax, and also some pretty technical\ndetails about it. 
Just as a quick recap, we have talked about:\n\n- How functions are written in Zig at @sec-root-file and @sec-main-file.\n- How to create new objects/identifiers at @sec-root-file and especially at @sec-assignments.\n- How strings work in Zig at @sec-zig-strings.\n- How to use arrays and slices at @sec-arrays.\n- How to import functionality from other Zig modules at @sec-root-file.\n\n\nFor now, this amount of knowledge is enough for us to continue with this book.\nOver the next chapters, we will talk more about other parts of\nZig's syntax that are equally important, such as:\n\n\n- How Object-Oriented programming can be done in Zig through *struct declarations* at @sec-structs-and-oop.\n- Basic control flow syntax at @sec-zig-control-flow.\n- Enums at @sec-enum.\n- Pointers and Optionals at @sec-pointer.\n- Error handling with `try` and `catch` at @sec-error-handling.\n- Unit tests at @sec-unittests.\n- Vectors at @sec-vectors-simd.\n- Build System at @sec-build-system.\n\n\n\n\n",
+ "supporting": [
+ "01-zig-weird_files"
+ ],
"filters": [
"rmarkdown/pagebreak.lua"
],
diff --git a/_freeze/Chapters/03-structs/execute-results/html.json b/_freeze/Chapters/03-structs/execute-results/html.json
index d08c23b..c29bc00 100644
--- a/_freeze/Chapters/03-structs/execute-results/html.json
+++ b/_freeze/Chapters/03-structs/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "70fe8a0eba13d161513df2fb8cba10da",
+ "hash": "280a16afd35089f77202dba2a381353c",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Control flow, structs, modules and types\n\nWe have discussed a lot of Zig's syntax in the last chapter,\nespecially at @sec-root-file and @sec-main-file.\nBut we still need to discuss some other very important\nelements of the language. Elements that you will use constantly on your day-to-day\nroutine.\n\nWe begin this chapter by discussing the different keywords and structures\nin Zig related to control flow (e.g. loops and if statements).\nThen, we talk about structs and how they can be used to do some\nbasic Object-Oriented (OOP) patterns in Zig. We also talk about\ntype inference and type casting.\nFinally, we end this chapter by discussing modules, and how they relate\nto structs.\n\n\n\n## Control flow {#sec-zig-control-flow}\n\nSometimes, you need to make decisions in your program. Maybe you need to decide\nwhether to execute or not a specific piece of code. Or maybe,\nyou need to apply the same operation over a sequence of values. These kinds of tasks,\ninvolve using structures that are capable of changing the \"control flow\" of our program.\n\nIn computer science, the term \"control flow\" usually refers to the order in which expressions (or commands)\nare evaluated in a given language or program. But this term is also used to refer\nto structures that are capable of changing this \"evaluation order\" of the commands\nexecuted by a given language/program.\n\nThese structures are better known\nby a set of terms, such as: loops, if/else statements, switch statements, among others. So,\nloops and if/else statements are examples of structures that can change the \"control\nflow\" of our program. 
The keywords `continue` and `break` are also examples of symbols\nthat can change the order of evaluation, since they can move our program to the next iteration\nof a loop, or make the loop stop completely.\n\n\n### If/else statements\n\nAn if/else statement performs a \"conditional flow operation\".\nA conditional flow control (or choice control) allows you to execute\nor ignore a certain block of commands based on a logical condition.\nMany programmers and computer science professionals also use\nthe term \"branching\" in this case.\nIn essence, an if/else statement allow us to use the result of a logical test\nto decide whether or not to execute a given block of commands.\n\nIn Zig, we write if/else statements by using the keywords `if` and `else`.\nWe start with the `if` keyword followed by a logical test inside a pair\nof parentheses, followed by a pair of curly braces which contains the lines\nof code to be executed in case the logical test returns the value `true`.\n\nAfter that, you can optionally add an `else` statement. To do that, just add the `else`\nkeyword followed by a pair of curly braces, with the lines of code\nto executed in case the logical test defined at `if` returns `false`.\n\nIn the example below, we are testing if the object `x` contains a number\nthat is greater than 10. Judging by the output printed to the console,\nwe know that this logical test returned `false`. 
Because the output\nin the console is compatible with the line of code present in the\n`else` branch of the if/else statement.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x = 5;\nif (x > 10) {\n try stdout.print(\n \"x > 10!\\n\", .{}\n );\n} else {\n try stdout.print(\n \"x <= 10!\\n\", .{}\n );\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nx <= 10!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Switch statements {#sec-switch}\n\nSwitch statements are also available in Zig, and they have a very similar syntax to a switch statement in Rust.\nAs you would expect, to write a switch statement in Zig we use the `switch` keyword.\nWe provide the value that we want to \"switch over\" inside a\npair of parentheses. Then, we list the possible combinations (or \"branchs\")\ninside a pair of curly braces.\n\nLet's take a look at the code example below. You can see that\nI'm creating an enum type called `Role`. We talk more about enums at @sec-enum.\nBut in summary, this `Role` type is listing different types of roles in a fictitious\ncompany, like `SE` for Software Engineer, `DE` for Data Engineer, `PM` for Product Manager,\netc.\n\nNotice that we are using the value from the `role` object in the\nswitch statement, to discover which exact area we need to store in the `area` variable object.\nAlso notice that we are using type inference inside the switch statement, with the dot character,\nas we described at @sec-type-inference.\nThis makes the `zig` compiler infer the correct data type of the values (`PM`, `SE`, etc.) for us.\n\nAlso notice that, we are grouping multiple values in the same branch of the switch statement.\nWe just separate each possible value with a comma. 
For example, if `role` contains either `DE` or `DA`,\nthe `area` variable would contain the value `\"Data & Analytics\"`, instead of `\"Platform\"` or `\"Sales\"`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Role = enum {\n SE, DPE, DE, DA, PM, PO, KS\n};\n\npub fn main() !void {\n var area: []const u8 = undefined;\n const role = Role.SE;\n switch (role) {\n .PM, .SE, .DPE, .PO => {\n area = \"Platform\";\n },\n .DE, .DA => {\n area = \"Data & Analytics\";\n },\n .KS => {\n area = \"Sales\";\n },\n }\n try stdout.print(\"{s}\\n\", .{area});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nPlatform\n```\n\n\n:::\n:::\n\n\n\n\n\n#### Switch statements must exhaust all possibilities\n\nOne very important aspect about switch statements in Zig\nis that they must exhaust all existing possibilities.\nIn other words, all possible values that could be found inside the `order`\nobject must be explicitly handled in this switch statement.\n\nSince the `role` object have type `Role`, the only possible values to\nbe found inside this object are `PM`, `SE`, `DPE`, `PO`, `DE`, `DA` and `KS`.\nThere are no other possible values to be stored in this `role` object.\nThus, the switch statements must have a combination (branch) for each one of these values.\nThis is what \"exhaust all existing possibilities\" means. The switch statement covers\nevery possible case.\n\nTherefore, you cannot write a switch statement in Zig, and leave an edge case\nwith no expliciting action to be taken.\nThis is a similar behaviour to switch statements in Rust, which also have to\nhandle all possible cases.\n\n\n\n#### The else branch\n\nTake a look at the `dump_hex_fallible()` function below as an example. This function\ncomes from the Zig Standard Library. 
More precisely, from the\n[`debug.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/debug.zig)[^debug-mod].\nThere are multiple lines in this function, but I omitted them to focus solely on the\nswitch statement found in this function. Notice that this switch statement have four\npossible cases, or four explicit branches. Also, notice that we used an `else` branch\nin this case.\n\nAn `else` branch in a switch statement work as the \"default branch\".\nWhenever you have multiple cases in your switch statement where\nyou want to apply the exact same action, you can use an `else` branch to do that.\n\n[^debug-mod]: \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn dump_hex_fallible(bytes: []const u8) !void {\n // Many lines ...\n switch (byte) {\n '\\n' => try writer.writeAll(\"␊\"),\n '\\r' => try writer.writeAll(\"␍\"),\n '\\t' => try writer.writeAll(\"␉\"),\n else => try writer.writeByte('.'),\n }\n}\n```\n:::\n\n\n\n\nMany programmers would also use an `else` branch to handle a \"not supported\" case.\nThat is, a case that cannot be properly handled by your code, or, just a case that\nshould not be \"fixed\". Therefore, you can use an `else` branch to panic (or raise an error)\nin your program to stop the current execution.\n\nTake the code example below. We can see that, we are handling the cases\nfor the `level` object being either 1, 2, or 3. All other possible cases are not supported by default,\nand, as consequence, we raise a runtime error in such cases through the `@panic()` built-in function.\n\nAlso notice that, we are assigning the result of the switch statement to a new object called `category`.\nThis is another thing that you can do with switch statements in Zig. 
If the branchs\noutput some value as result, you can store the result value of the switch statement into\na new object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 1, 2 => \"beginner\",\n 3 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n:::\n\n\n\n\n```\nthread 13103 panic: Not supported level!\nt.zig:9:13: 0x1033c58 in main (switch2)\n @panic(\"Not supported level!\");\n ^\n```\n\n\n\n#### Using ranges in switch\n\nFurthermore, you can also use ranges of values in switch statements.\nThat is, you can create a branch in your switch statement that is used\nwhenever the input value is within a range. These \"range expressions\"\nare created with the operator `...`. Is important\nto emphasize that the ranges created by this operator are\ninclusive on both ends.\n\nFor example, I could easily change the previous code example to support all\nlevels between 0 and 100. Like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 0...25 => \"beginner\",\n 26...75 => \"intermediary\",\n 76...100 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nbeginner\n```\n\n\n:::\n:::\n\n\n\n\nThis is neat, and it works with character ranges too. That is, I could\nsimply write `'a'...'z'`, to match any character value that is a\nlowercase letter, and it would work fine.\n\n\n#### Labeled switch statements\n\nAt @sec-blocks we have talked about labeling blocks, and also, about using these labels\nto return a value from the block. 
Well, from version 0.14.0 and onwards of the `zig` compiler,\nyou can also apply labels over switch statements, which makes it possible to almost implement a\n\"C `goto`\" like pattern.\n\nFor example, if you give the label `xsw` to a switch statement, you can use this\nlabel in conjunction with the `continue` keyword to go back to the beginning of the switch\nstatement. In the example below, the execution goes back to the beginning of the\nswitch statement two times, before ending at the `3` branch.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nxsw: switch (@as(u8, 1)) {\n 1 => continue :xsw 2,\n 2 => continue :xsw 3,\n 3 => return,\n 4 => {},\n}\n```\n:::\n\n\n\n\n\n### The `defer` keyword {#sec-defer}\n\nWith the `defer` keyword you can register an expression to be executed when you exit the current scope.\nTherefore, this keyword has a similar functionality as the `on.exit()` function from R.\nTake the `foo()` function below as an example. When we execute this `foo()` function, the expression\nthat prints the message \"Exiting function ...\" get's executed only when the function exits\nit's scope.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn foo() !void {\n defer std.debug.print(\n \"Exiting function ...\\n\", .{}\n );\n try stdout.print(\"Adding some numbers ...\\n\", .{});\n const x = 2 + 2; _ = x;\n try stdout.print(\"Multiplying ...\\n\", .{});\n const y = 2 * 8; _ = y;\n}\n\npub fn main() !void {\n try foo();\n}\n```\n:::\n\n\n\n\n```\nAdding some numbers ...\nMultiplying ...\nExiting function ...\n```\n\nTherefore, we can use `defer` to declare an expression that is going to be executed\nwhen your code exits the current scope. Some programmers like to interpret the phrase \"exit of the current scope\"\nas \"the end of the current scope\". 
But this interpretation might not be entirely correct, depending\non what you consider as \"the end of the current scope\".\n\nI mean, what do you consider as **the end** of the current scope? Is it the closing curly bracket (`}`) of the scope?\nIs it when the last expression in the function get's executed? Is it when the function returns to the previous scope?\nEtc. For example, it would not be correct to interpret the \"exit of the current scope\" as the closing\ncurly bracket of the scope. Because the function might exit from an earlier position than this\nclosing curly bracket (e.g. an error value was generated at a previous line inside the function;\nthe function reached an earlier return statement; etc.). Anyway, just be careful with this interpretation.\n\nNow, if you remember of what we have discussed at @sec-blocks, there are multiple structures in the language\nthat create their own separate scopes. For/while loops, if/else statements,\nfunctions, normal blocks, etc. This also affects the interpretation of `defer`.\nFor example, if you use `defer` inside a for loop, then, the given expression\nwill be executed everytime this specific for loop exits it's own scope.\n\nBefore we continue, is worth emphasizing that the `defer` keyword is an \"unconditional defer\".\nWhich means that the given expression will be executed no matter how the code exits\nthe current scope. For example, your code might exit the current scope because of an error value\nbeing generated, or, because of a return statement, or, a break statement, etc.\n\n\n\n### The `errdefer` keyword {#sec-errdefer1}\n\nOn the previous section, we have discussed the `defer` keyword, which you can use to\nregister an expression to be executed at the exit of the current scope.\nBut this keyword have a brother, which is the `errdefer` keyword. 
While `defer`\nis an \"unconditional defer\", the `errdefer` keyword is a \"conditional defer\".\nWhich means that the given expression is executed only when you exit the current\nscope on a very specific circumstance.\n\nIn more details, the expression given to `errdefer` is executed only when an error occurs in the current scope.\nTherefore, if the function (or for/while loop, if/else statement, etc.) exits the current scope\nin a normal situation, without errors, the expression given to `errdefer` is not executed.\n\nThis makes the `errdefer` keyword one of the many tools available in Zig for error handling.\nIn this section, we are more concerned with the control flow aspects around `errdefer`.\nBut we are going to discuss `errdefer` later as a error handling tool at @sec-errdefer2.\n\nThe code example below demonstrates three things:\n\n- that `defer` is an \"unconditional defer\", because the given expression get's executed regardless of how the function `foo()` exits it's own scope.\n- that `errdefer` is executed because the function `foo()` returned an error value.\n- that `defer` and `errdefer` expressions are executed in a LIFO (*last in, first out*) order.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn foo() !void { return error.FooError; }\npub fn main() !void {\n var i: usize = 1;\n errdefer std.debug.print(\"Value of i: {d}\\n\", .{i});\n defer i = 2;\n try foo();\n}\n```\n:::\n\n\n\n\n```\nValue of i: 2\nerror: FooError\n/t.zig:6:5: 0x1037e48 in foo (defer)\n return error.FooError;\n ^\n```\n\n\nWhen I say that \"defer expressions\" are executed in a LIFO order, what I want to mean is that\nthe last `defer` or `errdefer` expressions in the code are the first ones to be executed.\nYou could also interpret this as: \"defer expressions\" are executed from bottom to top, or,\nfrom last to first.\n\nTherefore, if I change the order of the `defer` and `errdefer` expressions, you will notice that\nthe value of `i` that get's 
printed to the console changes to 1. This doesn't mean that the\n`defer` expression was not executed in this case. It means that the `defer` expression\nwas executed only after the `errdefer` expression. The code example below demonstrates this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn foo() !void { return error.FooError; }\npub fn main() !void {\n    var i: usize = 1;\n    defer i = 2;\n    errdefer std.debug.print(\"Value of i: {d}\\n\", .{i});\n    try foo();\n}\n```\n:::\n\n\n\n\n```\nValue of i: 1\nerror: FooError\n/t.zig:6:5: 0x1037e48 in foo (defer)\n    return error.FooError;\n    ^\n```\n\n\n\n\n### For loops\n\nA loop allows you to execute the same lines of code multiple times,\nthus creating a \"repetition space\" in the execution flow of your program.\nLoops are particularly useful when we want to replicate the same function\n(or the same set of commands) over different inputs.\n\nThere are different types of loops available in Zig. But the most\nessential of them all is probably the *for loop*. A for loop is\nused to apply the same piece of code over the elements of a slice, or, an array.\n\nFor loops in Zig have a slightly different syntax than you are\nprobably used to seeing in other languages. You start with the `for` keyword, then, you\nlist the items that you want to iterate\nover inside a pair of parentheses. Then, inside a pair of pipes (`|`)\nyou declare an identifier that will serve as your iterator, i.e. the\nidentifier that holds the current item at each iteration of the loop.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (items) |value| {\n    // code to execute\n}\n```\n:::\n\n\n\n\nTherefore, instead of using a `(value in items)` syntax,\nin Zig, for loops use the syntax `(items) |value|`. 
In the example\nbelow, you can see that we are looping through the items\nof the array stored in the object `name`, and printing to the\nconsole the decimal representation of each character in this array.\n\nIf we wanted, we could also iterate through a slice (or a portion) of\nthe array, instead of iterating through the entire array stored in the `name` object.\nJust use a range selector to select the section you want. For example,\nI could provide the expression `name[0..3]` to the for loop, to iterate\njust through the first 3 elements in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = [_]u8{'P','e','d','r','o'};\nfor (name) |char| {\n    try stdout.print(\"{d} | \", .{char});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n80 | 101 | 100 | 114 | 111 | \n```\n\n\n:::\n:::\n\n\n\n\nIn the above example we are using the value itself of each\nelement in the array as our iterator. But there are many situations where\nwe need to use an index instead of the actual values of the items.\n\nYou can do that by providing a second set of items to iterate over.\nMore precisely, you provide the range selector `0..` to the for loop. So,\nyes, you can use two different iterators at the same time in a for\nloop in Zig.\n\nBut remember from @sec-assignments that every object\nyou create in Zig must be used in some way. So if you declare two iterators\nin your for loop, you must use both iterators inside the for loop body.\nBut if you want to use just the index iterator, and not the \"value iterator\",\nthen you can discard the value iterator by writing the\nunderscore character (`_`) in its place, like in the example below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (name, 0..) |_, i| {\n    try stdout.print(\"{d} | \", .{i});\n}\n```\n:::\n\n\n\n\n```\n0 | 1 | 2 | 3 | 4 |\n```\n\n\n### While loops\n\nA while loop is created from the `while` keyword. 
A `for` loop\niterates through the items of an array, but a `while` loop\nkeeps looping for as long as a logical test\n(specified by you) is true, and stops as soon as this test becomes false.\n\nYou start with the `while` keyword, then, you define a logical\nexpression inside a pair of parentheses, and the body of the\nloop is provided inside a pair of curly braces, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: u8 = 1;\nwhile (i < 5) {\n    try stdout.print(\"{d} | \", .{i});\n    i += 1;\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 2 | 3 | 4 | \n```\n\n\n:::\n:::\n\n\n\n\nYou can also specify the increment expression in the header of the while loop.\nTo do that, we write the increment expression inside a pair of parentheses after a colon character (`:`).\nZig calls this a *continue expression*, and it is executed at the end of each iteration of the loop.\nThe code example below demonstrates this other pattern.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: u8 = 1;\nwhile (i < 5) : (i += 1) {\n    try stdout.print(\"{d} | \", .{i});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 2 | 3 | 4 | \n```\n\n\n:::\n:::\n\n\n\n\n### Using `break` and `continue`\n\nIn Zig, you can explicitly stop the execution of a loop, or, jump to the next iteration of the loop, by using\nthe keywords `break` and `continue`, respectively. The `while` loop presented in the next code example is,\nat first sight, an infinite loop, because the logical value inside the parentheses will always be equal to `true`.\nSo what makes this `while` loop stop when the `i` object reaches the count\n10? It is the `break` keyword!\n\nInside the while loop, we have an if statement that is constantly checking if the `i` variable\nis equal to 10. 
Since we are incrementing the value of `i` at each iteration of the\nwhile loop, this `i` object will eventually be equal to 10, and when it is, the if statement\nwill execute the `break` expression, and, as a result, the execution of the while loop is stopped.\n\nNotice the use of the `expect()` function from the Zig Standard Library after the while loop.\nThis `expect()` function is an \"assert\" type of function.\nThis function checks if the logical test provided is equal to true. If so, the function does nothing.\nOtherwise (i.e. the logical test is equal to false), the function returns an error (`error.TestUnexpectedResult`), which the `try` keyword then propagates.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: usize = 0;\nwhile (true) {\n    if (i == 10) {\n        break;\n    }\n    i += 1;\n}\ntry std.testing.expect(i == 10);\ntry stdout.print(\"Everything worked!\", .{});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nEverything worked!\n```\n\n\n:::\n:::\n\n\n\n\nSince this code example was executed successfully by the `zig` compiler,\nwithout raising any errors, we know that, after the execution of the while loop,\nthe `i` object is equal to 10. Because if it wasn't equal to 10, then, an error would\nbe returned by `expect()`.\n\nNow, in the next example, we have a use case for\nthe `continue` keyword. The if statement is constantly\nchecking if the current element of the array is a multiple of 2. If\nit is, then, we jump to the next iteration of the loop.\nOtherwise, the loop just prints the current element to the console.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [_]u8{1,2,3,4,5,6};\nfor (ns) |i| {\n    if ((i % 2) == 0) {\n        continue;\n    }\n    try stdout.print(\"{d} | \", .{i});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 3 | 5 | \n```\n\n\n:::\n:::\n\n\n\n\n\n\n## Function parameters are immutable {#sec-fun-pars}\n\nWe have already discussed a lot of the syntax behind function declarations at @sec-root-file and @sec-main-file.\nBut I want to emphasize a curious fact about function parameters (a.k.a. 
function arguments) in Zig.\nIn summary, function parameters are immutable in Zig.\n\nTake the code example below, where we declare a simple function that just tries to add\nsome amount to the input integer, and returns the result back. But if you look closely\nat the body of this `add2()` function, you will notice that we try\nto save the result back into the `x` function argument.\n\nIn other words, this function not only uses the value that it received through the function argument\n`x`, but it also tries to change the value of this function argument, by assigning the addition result\nto `x`. However, function arguments in Zig are immutable. You cannot change their values, or, in other words, you\ncannot assign values to them inside the function's body.\n\nThis is the reason why the code example below does not compile successfully. If you try to compile\nthis code example, you get a compile error warning you that you are trying to change the value of an\nimmutable (i.e. constant) object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: u32) u32 {\n    x = x + 2;\n    return x;\n}\n\npub fn main() !void {\n    const y = add2(4);\n    std.debug.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\nt.zig:3:5: error: cannot assign to constant\n    x = x + 2;\n    ^\n```\n\n\nIf a function argument receives as input an object whose data type is\nany of the primitive types that we have listed at @sec-primitive-data-types,\nthis object is always passed by value to the function. In other words, this object\nis copied into the function stack frame.\n\nHowever, if the input object has a more complex data type, for example, it might\nbe a struct instance, or an array, or a union value, etc., in cases like that, the `zig` compiler\nwill take the liberty of deciding for you which strategy is best. Thus, the `zig` compiler will\npass your object to the function either by value, or by reference. 
The compiler chooses\nwhichever strategy it considers faster for your case.\nThis optimization that you get for free is possible only because function arguments are\nimmutable in Zig.\n\nThere are some situations where you might need to change the value of your function argument\ndirectly inside the function's body. This happens more often when we are passing\nC structs as inputs to Zig functions.\n\nIn a situation like this, you can overcome this barrier of immutable function arguments by simply taking the lead,\nand explicitly choosing to pass the object by reference to the function.\nThat is, instead of depending on the `zig` compiler to decide which strategy is best, you have\nto explicitly mark the function argument as a pointer. This way, we are telling the compiler\nthat this function argument will be passed by reference to the function.\n\nBy making it a pointer, we can finally alter, directly inside\nthe body of the `add2()` function, the value that this function argument points to. You can see that the code example below compiles successfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: *u32) void {\n    const d: u32 = 2;\n    x.* = x.* + d;\n}\n\npub fn main() !void {\n    var x: u32 = 4;\n    add2(&x);\n    std.debug.print(\"Result: {d}\\n\", .{x});\n}\n```\n:::\n\n\n\n\n```\nResult: 6\n```\n\n\n\n## Structs and OOP {#sec-structs-and-oop}\n\nZig is a language more closely related to C (which is a procedural language),\nthan it is to C++ or Java (which are object-oriented languages). Because of that, you do not\nhave advanced OOP (Object-Oriented Programming) patterns available in Zig, such as classes, interfaces or\nclass inheritance. Nonetheless, OOP in Zig is still possible by using struct definitions.\n\nWith struct definitions, you can create (or define) a new data type in Zig. 
These struct definitions work similarly to structs in C.\nYou give a name to this new struct (or, to this new data type you are creating), then, you list the data members of this new struct. Unlike in C, you can\nalso register functions inside this struct, and they become the methods of this particular struct (or data type), so that every object\nthat you create with this new type will always have these methods available and associated with it.\n\nIn C++, when we create a new class, we normally have a constructor method (or, a constructor function) which\nis used to construct (or, to instantiate) every object of this particular class, and we also have\na destructor method (or a destructor function), which is the function responsible for destroying\nevery object of this class.\n\nIn Zig, we normally declare the constructor and the destructor methods\nof our structs by declaring an `init()` and a `deinit()` method inside the struct.\nThis is just a naming convention that you will find across the entire Zig Standard Library.\nSo, in Zig, the `init()` method of a struct is normally the constructor method of the class represented by this struct,\nwhile the `deinit()` method is the method used for destroying an existing instance of that struct.\n\nThe `init()` and `deinit()` methods are both used extensively in Zig code, and you will see both of\nthem being used when we talk about allocators at @sec-allocators.\nBut, as another example, let's build a simple `User` struct to represent a user of some sort of system.\n\nIf you look at the `User` struct below, you can see the `struct` keyword.\nNotice the data members of this struct, `id`, `name` and `email`. 
Every data member has its\ntype explicitly annotated, with the colon character (`:`) syntax that we described earlier at @sec-root-file.\nBut also notice that every line in the struct body that describes a data member ends with a comma character (`,`).\nSo every time you declare a data member in your Zig code, always end the line with a comma character, instead\nof ending it with the traditional semicolon character (`;`).\n\nNext, also notice in this example that we have registered an `init()` function as a method\nof this `User` struct. This `init()` method is the constructor method that we will use to instantiate\nevery new `User` object. That is why this `init()` function returns a new `User` object as result.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst User = struct {\n    id: u64,\n    name: []const u8,\n    email: []const u8,\n\n    pub fn init(id: u64,\n                name: []const u8,\n                email: []const u8) User {\n\n        return User {\n            .id = id,\n            .name = name,\n            .email = email\n        };\n    }\n\n    pub fn print_name(self: User) !void {\n        try stdout.print(\"{s}\\n\", .{self.name});\n    }\n};\n\npub fn main() !void {\n    const u = User.init(1, \"pedro\", \"email@gmail.com\");\n    try u.print_name();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\npedro\n```\n\n\n:::\n:::\n\n\n\n\nThe `pub` keyword plays an important role in struct declarations, and OOP in Zig.\nEvery method that you declare in your struct that is marked with the keyword `pub`\nbecomes a public method of this particular struct.\n\nSo every method that you create inside your struct is, at first, a private method\nof that struct. Meaning that this method can only be accessed from code within the same\nmodule (i.e. the same `.zig` file) where the struct is declared. 
But, if you mark this method as public, with the keyword `pub`, then,\ncode in other modules that import this struct can also call the method directly from an instance of the `User` struct.\n\nIn other words, the functions marked by the keyword `pub`\nare members of the public API of that struct.\nFor example, if the `User` struct were declared in a separate module, and I did not mark the `print_name()` method as public,\nthen code in another module could not execute the line `u.print_name()`. Because that code would\nnot be authorized to call this method directly.\n\n\n\n### Anonymous struct literals {#sec-anonymous-struct-literals}\n\nYou can declare a struct object as a literal value. When we do that, we normally specify the\ndata type of this struct literal by writing its data type just before the opening curly brace.\nFor example, I could write a struct literal value of the type `User` that we have defined\nin the previous section like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst eu = User {\n    .id = 1,\n    .name = \"Pedro\",\n    .email = \"someemail@gmail.com\"\n};\n_ = eu;\n```\n:::\n\n\n\n\nHowever, in Zig, we can also write an anonymous struct literal. That is, you can write a\nstruct literal without explicitly specifying the type of this particular struct.\nAn anonymous struct is written by using the syntax `.{}`. 
So, we essentially\nreplaced the explicit type of the struct literal with a dot character (`.`).\n\nAs we described at @sec-type-inference, when you put a dot before a struct literal,\nthe type of this struct literal is automatically inferred by the `zig` compiler.\nIn essence, the `zig` compiler will look for some hint of what is the type of that struct.\nThis hint can be the type annotation of a function argument,\nor the return type annotation of the function that you are using, or the type annotation\nof an existing object.\nIf the compiler does find such a type annotation, then it will use this\ntype in your literal struct.\n\nAnonymous struct literals are very commonly used as function arguments in Zig.\nOne example that you have already seen many times is the `print()`\nfunction from the `stdout` object.\nThis function takes two arguments.\nThe first argument is a template string, which should\ncontain string format specifiers, which tell how the values provided\nin the second argument should be printed into the message.\n\nThe second argument is a struct literal that lists the values\nto be printed into the template message specified in the first argument.\nYou normally want to use an anonymous struct literal here, so that the\n`zig` compiler does the job of specifying the type of this particular\nanonymous struct for you.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n    const stdout = std.io.getStdOut().writer();\n    try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello, world!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Struct declarations must be constant\n\nTypes in Zig must be `const` or `comptime` (we are going to talk more about comptime at @sec-comptime).\nWhat this means is that you cannot create a new data type, and mark it as variable with the `var` keyword.\nSo struct declarations are always constant. 
You cannot declare a new struct type using the `var` keyword.\nIt must be `const`.\n\nIn the `Vec3` example below, this declaration is allowed because I'm using the `const` keyword\nto declare this new data type.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Vec3 = struct {\n    x: f64,\n    y: f64,\n    z: f64,\n};\n```\n:::\n\n\n\n\n\n### The `self` method argument {#sec-self-arg}\n\nIn many languages that support OOP, when we declare a method of some class or struct, we\nusually declare this method as a function that has a `self` argument.\nThis `self` argument is the reference to the object itself on which the method\nis being called.\n\nIt is not mandatory to use this `self` argument. But why would you not use this `self` argument?\nThere is no reason not to use it. Because the only way to get access to the data stored in the\ndata members of your struct is to access them through this `self` argument.\nIf you don't need to use the data in the data members of your struct inside your method, then you very likely don't need\na method. You can just declare this logic as a simple function, outside of your\nstruct declaration.\n\n\nTake the `Vec3` struct below. Inside this `Vec3` struct we declared a method named `distance()`.\nThis method calculates the distance between two `Vec3` objects, by following the distance\nformula in Euclidean space. Notice that this `distance()` method takes two `Vec3` objects\nas input, `self` and `other`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n    x: f64,\n    y: f64,\n    z: f64,\n\n    pub fn distance(self: Vec3, other: Vec3) f64 {\n        const xd = m.pow(f64, self.x - other.x, 2.0);\n        const yd = m.pow(f64, self.y - other.y, 2.0);\n        const zd = m.pow(f64, self.z - other.z, 2.0);\n        return m.sqrt(xd + yd + zd);\n    }\n};\n```\n:::\n\n\n\n\n\nThe `self` argument corresponds to the `Vec3` object from which this `distance()` method\nis being called. 
The `other` argument, on the other hand, is a separate `Vec3` object that is given as input\nto this method. In the example below, the `self` argument corresponds to the object\n`v1`, because the `distance()` method is being called from the `v1` object,\nwhile the `other` argument corresponds to the object `v2`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst v1 = Vec3 {\n    .x = 4.2, .y = 2.4, .z = 0.9\n};\nconst v2 = Vec3 {\n    .x = 5.1, .y = 5.6, .z = 1.6\n};\n\nstd.debug.print(\n    \"Distance: {d}\\n\",\n    .{v1.distance(v2)}\n);\n```\n:::\n\n\n\n\n```\nDistance: 3.3970575502926055\n```\n\n\n\n### About the struct state\n\nSometimes you don't need to care about the state of your struct object. Sometimes, you just need\nto instantiate and use the objects, without altering their state. You can notice that when you have methods\ninside your struct declaration that might use the values that are present in the data members, but they\ndo not alter the values in these data members of the struct in any way.\n\nThe `Vec3` struct that was presented at @sec-self-arg is an example of that.\nThis struct has a single method named `distance()`, and this method does use the values\npresent in all three data members of the struct (`x`, `y` and `z`). But at the same time,\nthis method does not change the values of these data members at any point.\n\nAs a result of that, when we create `Vec3` objects we usually create them as\nconstant objects, like the `v1` and `v2` objects presented at @sec-self-arg.\nWe can create them as variable objects with the `var` keyword,\nif we want to. But because the methods of this `Vec3` struct do not change\nthe state of the objects at any point, it is unnecessary to mark them\nas variable objects.\n\nBut why? Why am I talking about this here? 
It is because the `self` argument\nin the methods is affected depending on whether or not the\nmethods present in a struct change the state of the object itself.\nMore specifically, when you have a method in a struct that changes the state\nof the object (i.e. changes the value of a data member), the `self` argument\nin this method must be annotated in a different manner.\n\nAs I described at @sec-self-arg, the `self` argument in methods of\na struct is the argument that receives as input the object from which the method\nwas called. We usually annotate this argument in the methods by writing `self`,\nfollowed by the colon character (`:`), and the data type of the struct to which\nthe method belongs (e.g. `User`, `Vec3`, etc.).\n\nIf we take the `Vec3` struct that we defined in the previous section as an example,\nwe can see in the `distance()` method that this `self` argument is annotated as\n`self: Vec3`. Because the state of the `Vec3` object is never altered by this\nmethod.\n\nBut what if we do have a method that alters the state of the object, by altering the\nvalues of its data members, how should we annotate `self` in this instance? The answer is:\n\"we should annotate `self` as a pointer to `x`, instead of just `x`\", where `x` stands for the type of the struct.\nIn other words, you should annotate `self` as `self: *x`, instead of annotating it\nas `self: x`.\n\nIf we create a new method inside the `Vec3` struct that, for example, expands the\nvector by multiplying its coordinates by a factor of two, then, we need to follow\nthis rule specified in the previous paragraph. 
The code example below demonstrates\nthis idea:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n    x: f64,\n    y: f64,\n    z: f64,\n\n    pub fn distance(self: Vec3, other: Vec3) f64 {\n        const xd = m.pow(f64, self.x - other.x, 2.0);\n        const yd = m.pow(f64, self.y - other.y, 2.0);\n        const zd = m.pow(f64, self.z - other.z, 2.0);\n        return m.sqrt(xd + yd + zd);\n    }\n\n    pub fn double(self: *Vec3) void {\n        self.x = self.x * 2.0;\n        self.y = self.y * 2.0;\n        self.z = self.z * 2.0;\n    }\n};\n```\n:::\n\n\n\n\nNotice in the code example above that we have added a new method\nto our `Vec3` struct named `double()`. This method essentially doubles the\ncoordinate values of our vector object. Also notice that, in the\ncase of the `double()` method, we annotated the `self` argument as `*Vec3`,\nindicating that this argument receives a pointer (or a reference, if you prefer to call it this way)\nto a `Vec3` object as input.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar v3 = Vec3 {\n    .x = 4.2, .y = 2.4, .z = 0.9\n};\nv3.double();\nstd.debug.print(\"Doubled: {d}\\n\", .{v3.x});\n```\n:::\n\n\n\n\n```\nDoubled: 8.4\n```\n\n\n\nNow, if you change the `self` argument in this `double()` method to `self: Vec3`, like in the\n`distance()` method, you will get the compiler error shown below as a result. Notice that this\nerror message points to a line from the `double()` method body,\nindicating that you cannot alter the value of the `x` data member.\n\n```zig\n// If we change the function signature of double to:\n    pub fn double(self: Vec3) void {\n```\n\n```\nt.zig:16:13: error: cannot assign to constant\n        self.x = self.x * 2.0;\n        ~~~~^~\n```\n\nThis error message indicates that the `x` data member belongs to a constant object,\nand, because of that, it cannot be changed. 
Ultimately, this error message\nis telling us that the `self` argument is constant.\n\nIf you take some time, and think hard about this error message, you will understand it.\nYou already have the tools to understand why we are getting this error message.\nWe have talked about it already at @sec-fun-pars.\nSo remember, every function argument is immutable in Zig, and `self`\nis no exception to this rule.\n\nIt does not matter if the object that you pass as input to the function argument is\na variable object or not. In this example, we marked the `v3` object as a variable object.\nBut this does not matter. Because it is not about the input object, it is about\nthe function argument.\n\nThe problem begins when we try to alter the value of `self` directly, which is a function argument,\nand every function argument is immutable by default. You may ask yourself how we can overcome\nthis barrier, and once again, the solution was also discussed at @sec-fun-pars.\nWe overcome this barrier by explicitly marking the `self` argument as a pointer.\n\n\n::: {.callout-note}\nIf a method of your `x` struct alters the state of the object, by\nchanging the value of any data member, then, remember to use `self: *x`,\ninstead of `self: x` in the function signature of this method.\n:::\n\n\nYou could also interpret the content discussed in this section as:\n\"if you need to alter the state of your `x` struct object in one of its methods,\nyou must explicitly pass the `x` struct object by reference to the `self` argument of this method\".\n\n\n\n## Type inference {#sec-type-inference}\n\nZig is kind of a strongly typed language. 
I say \"kind of\" because there are situations\nwhere you don't have to explicitly write the type of every single object in your source code,\nas you would expect from a traditional strongly typed language, such as C and C++.\n\nIn some situations, the `zig` compiler can use type inference to solve the data types for you, easing some of\nthe burden that you carry as a developer.\nThe most common way this happens is through function arguments that receive struct objects\nas input.\n\nIn general, type inference in Zig is done by using the dot character (`.`).\nEvery time you see a dot character written before a struct literal, or before an enum value, or something like that,\nyou know that this dot character is playing a special part in this place. More specifically, it is\ntelling the `zig` compiler something along the lines of: \"Hey! Can you infer the type of this\nvalue for me? Please!\". In other words, this dot character is playing a role similar to the `auto` keyword in C++.\n\nI gave you some examples of this at @sec-anonymous-struct-literals, where we presented anonymous struct literals.\nAnonymous struct literals are, essentially, struct literals that use type inference to\ninfer the exact type of this particular struct literal.\nThis type inference is done by looking for some minimal hint of the correct data type to be used.\nYou could say that the `zig` compiler looks for any neighbouring type annotation that might tell it\nwhat the correct type would be.\n\nAnother common place where we use type inference in Zig is in switch statements (which we talk about at @sec-switch).\nI also gave some other examples of type inference at @sec-switch, where we were inferring the data types of enum values listed inside\nof switch statements (e.g. 
`.DE`).\nBut as another example, take a look at this `fence()` function reproduced below,\nwhich comes from the [`atomic.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig)[^fence-fn]\nof the Zig Standard Library.\n\n[^fence-fn]: .\n\nThere are a lot of things in this function that we haven't talked about yet, such as:\nwhat does `comptime` mean? `inline`? `extern`? What is this star symbol before `Self`?\nLet's just ignore all of these things, and focus solely on the switch statement\nthat is inside this function.\n\nWe can see that this switch statement uses the `order` object as input. This `order`\nobject is one of the inputs of this `fence()` function, and we can see in the type annotation\nthat this object is of type `AtomicOrder`. We can also see a bunch of values inside the\nswitch statement that begin with a dot character, such as `.release` and `.acquire`.\n\nBecause these weird values contain a dot character before them, we are asking the `zig`\ncompiler to infer the types of these values inside the switch statement. 
Then, the `zig`\ncompiler looks into the current context where these values are being used, and it\ntries to infer the types of these values.\n\nSince they are being used inside a switch statement, the `zig` compiler looks into the type\nof the input object given to the switch statement, which is the `order` object in this case.\nBecause this object has type `AtomicOrder`, which is an enum, the `zig` compiler infers that these values\nare tags (i.e. possible values) of this enum type `AtomicOrder`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub inline fn fence(self: *Self, comptime order: AtomicOrder) void {\n    // LLVM's ThreadSanitizer doesn't support the normal fences so we specialize for it.\n    if (builtin.sanitize_thread) {\n        const tsan = struct {\n            extern \"c\" fn __tsan_acquire(addr: *anyopaque) void;\n            extern \"c\" fn __tsan_release(addr: *anyopaque) void;\n        };\n\n        const addr: *anyopaque = self;\n        return switch (order) {\n            .unordered, .monotonic => @compileError(\n                @tagName(order) ++ \" only applies to atomic loads and stores\"\n            ),\n            .acquire => tsan.__tsan_acquire(addr),\n            .release => tsan.__tsan_release(addr),\n            .acq_rel, .seq_cst => {\n                tsan.__tsan_acquire(addr);\n                tsan.__tsan_release(addr);\n            },\n        };\n    }\n\n    return @fence(order);\n}\n```\n:::\n\n\n\n\nThis is how basic type inference is done in Zig. If we didn't use the dot character before\nthe values inside this switch statement, then, we would be forced to write explicitly\nthe data types of these values. For example, instead of writing `.release` we would have to\nwrite `AtomicOrder.release`. We would have to do this for every single value\nin this switch statement, and this is a lot of work. That is why type inference\nis commonly used on switch statements in Zig.\n\n\n\n## Type casting {#sec-type-cast}\n\nIn this section, I want to discuss type casting (or, type conversion) with you.\nWe use type casting when we have an object of type \"x\", and we want to convert\nit into an object of type \"y\", i.e. 
we want to change the data type of the object.\n\nMost languages have a formal way to perform type casting. In Rust, for example, we normally\nuse the keyword `as`, and in C, we normally use the type casting syntax, e.g. `(int) x`.\nIn Zig, we use the `@as()` built-in function to cast an object of type \"x\" into\nan object of type \"y\".\n\nThis `@as()` function is the preferred way to perform type conversion (or type casting)\nin Zig, because it is explicit, and it also performs the casting only if it\nis unambiguous and safe. To use this function, you just provide the target data type\nas the first argument, and the object that you want to cast as the second argument.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n    const x: usize = 500;\n    const y = @as(u32, x);\n    try expect(@TypeOf(y) == u32);\n}\n```\n:::\n\n\n\n\nThis cast compiles because `x` is a compile-time known constant whose value fits in a `u32`.\nIf `x` were a runtime `usize` value, the compiler would reject this cast on targets where `usize` is wider than 32 bits,\nbecause the conversion would no longer be guaranteed to be safe.\n\nThis is the general way to perform type casting in Zig. But remember, `@as()` works only when casting\nis unambiguous and safe, and there are situations where these assumptions do not hold. For example,\nwhen casting an integer value into a float value, or vice-versa, it is not clear to the compiler\nhow to perform this conversion safely.\n\nTherefore, we need to use specialized \"casting functions\" in such situations.\nFor example, if you want to cast an integer value into a float value, then, you\nshould use the `@floatFromInt()` function. In the inverse scenario, you should use\nthe `@intFromFloat()` function.\n\nIn these functions, you just provide the object that you want to\ncast as input. 
Then, the target data type of the \"type casting operation\" is determined by\nthe type annotation of the object where you are saving the results.\nIn the example below, we are casting the object `x` into a value of type `f32`,\nbecause the object `y`, which is where we are saving the results, is annotated\nas an object of type `f32`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n const x: usize = 565;\n const y: f32 = @floatFromInt(x);\n try expect(@TypeOf(y) == f32);\n}\n```\n:::\n\n\n\n\nAnother built-in function that is very useful when performing type casting operations is `@ptrCast()`.\nIn essence, we use the `@as()` built-in function when we want to explicit convert (or cast) a Zig value/object\nfrom a type \"x\" to a type \"y\", etc. However, pointers (we are going to discuss pointers\nin more depth at @sec-pointer) are a special type of object in Zig,\ni.e. they are treated differently from \"normal objects\".\n\nEverytime a pointer is involved in some \"type casting operation\" in Zig, the `@ptrCast()` function is used.\nThis function works similarly to `@floatFromInt()`.\nYou just provide the pointer object that you want to cast as input to this function, and the\ntarget data type is, once again, determined by the type annotation of the object where the results are being\nstored.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n const bytes align(@alignOf(u32)) = [_]u8{\n 0x12, 0x12, 0x12, 0x12\n };\n const u32_ptr: *const u32 = @ptrCast(&bytes);\n try expect(@TypeOf(u32_ptr) == *const u32);\n}\n```\n:::\n\n\n\n\n\n\n\n\n## Modules\n\nWe already talked about what modules are, and also, how to import other modules into\nyour current module via *import statements*. Every Zig module (i.e. a `.zig` file) that you write in your project\nis internally stored as a struct object. 
Take the line exposed below as an example. In this line we are importing the\nZig Standard Library into our current module.\n\n```zig\nconst std = @import(\"std\");\n```\n\nWhen we want to access the functions and objects from the standard library, we\nare basically accessing the data members of the struct stored in the `std`\nobject. That is why we use the same syntax that we use in normal structs, with the dot operator (`.`)\nto access the data members and methods of the struct.\n\nWhen this \"import statement\" get's executed, the result of this expression is a struct\nobject that contains the Zig Standard Library modules, global variables, functions, etc.\nAnd this struct object get's saved (or stored) inside the constant object named `std`.\n\n\nTake the [`thread_pool.zig` module from the project `zap`](https://github.com/kprotty/zap/blob/blog/src/thread_pool.zig)[^thread]\nas an example. This module is written as if it was\na big struct. That is why we have a top-level and public `init()` method\nwritten in this module. The idea is that all top-level functions written in this\nmodule are methods from the struct, and all top-level objects and struct declarations\nare data members of this struct. The module is the struct itself.\n\n[^thread]: \n\n\nSo you would import and use this module by doing something like this:\n\n```zig\nconst std = @import(\"std\");\nconst ThreadPool = @import(\"thread_pool.zig\");\nconst num_cpus = std.Thread.getCpuCount()\n catch @panic(\"failed to get cpu core count\");\nconst num_threads = std.math.cast(u16, num_cpus)\n catch std.math.maxInt(u16);\nconst pool = ThreadPool.init(\n .{ .max_threads = num_threads }\n);\n```\n\n\n\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Control flow, structs, modules and types\n\nWe have discussed a lot of Zig's syntax in the last chapter,\nespecially in @sec-root-file and @sec-main-file.\nBut we still need to discuss some other very important\nelements of the language, elements that you will use constantly in your day-to-day\nroutine.\n\nWe begin this chapter by discussing the different keywords and structures\nin Zig related to control flow (e.g. loops and if statements).\nThen, we talk about structs and how they can be used to implement some\nbasic Object-Oriented Programming (OOP) patterns in Zig. We also talk about\ntype inference and type casting.\nFinally, we end this chapter by discussing modules, and how they relate\nto structs.\n\n\n\n## Control flow {#sec-zig-control-flow}\n\nSometimes, you need to make decisions in your program. Maybe you need to decide\nwhether or not to execute a specific piece of code. Or maybe\nyou need to apply the same operation over a sequence of values. These kinds of tasks\ninvolve using structures that are capable of changing the \"control flow\" of our program.\n\nIn computer science, the term \"control flow\" usually refers to the order in which expressions (or commands)\nare evaluated in a given language or program. But this term is also used to refer\nto structures that are capable of changing this \"evaluation order\" of the commands\nexecuted by a given language/program.\n\nThese structures are better known\nby a set of terms, such as loops, if/else statements and switch statements, among others. So,\nloops and if/else statements are examples of structures that can change the \"control\nflow\" of our program. 
The keywords `continue` and `break` are also examples of symbols\nthat can change the order of evaluation, since they can move our program to the next iteration\nof a loop, or make the loop stop completely.\n\n\n### If/else statements\n\nAn if/else statement performs a \"conditional flow operation\".\nA conditional flow control (or choice control) allows you to execute\nor ignore a certain block of commands based on a logical condition.\nMany programmers and computer science professionals also use\nthe term \"branching\" in this case.\nIn essence, an if/else statement allows us to use the result of a logical test\nto decide whether or not to execute a given block of commands.\n\nIn Zig, we write if/else statements by using the keywords `if` and `else`.\nWe start with the `if` keyword followed by a logical test inside a pair\nof parentheses, followed by a pair of curly braces which contain the lines\nof code to be executed in case the logical test returns the value `true`.\n\nAfter that, you can optionally add an `else` statement. To do that, just add the `else`\nkeyword followed by a pair of curly braces, with the lines of code\nto be executed in case the logical test defined in `if` returns `false`.\n\nIn the example below, we are testing if the object `x` contains a number\nthat is greater than 10. Judging by the output printed to the console,\nwe know that this logical test returned `false`. 
Because the output\nin the console matches the line of code present in the\n`else` branch of the if/else statement.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x = 5;\nif (x > 10) {\n try stdout.print(\n \"x > 10!\\n\", .{}\n );\n} else {\n try stdout.print(\n \"x <= 10!\\n\", .{}\n );\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nx <= 10!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Switch statements {#sec-switch}\n\nSwitch statements are also available in Zig, and their syntax is very similar to that of a `match` expression in Rust.\nAs you would expect, to write a switch statement in Zig we use the `switch` keyword.\nWe provide the value that we want to \"switch over\" inside a\npair of parentheses. Then, we list the possible combinations (or \"branches\")\ninside a pair of curly braces.\n\nLet's take a look at the code example below. You can see that\nI'm creating an enum type called `Role`. We talk more about enums in @sec-enum.\nBut in summary, this `Role` type is listing different types of roles in a fictitious\ncompany, like `SE` for Software Engineer, `DE` for Data Engineer, `PM` for Product Manager,\netc.\n\nNotice that we are using the value from the `role` object in the\nswitch statement, to discover which area we need to store in the `area` variable.\nAlso notice that we are using type inference inside the switch statement, with the dot character,\nas we described in @sec-type-inference.\nThis makes the `zig` compiler infer the correct data type of the values (`PM`, `SE`, etc.) for us.\n\nAlso notice that we are grouping multiple values in the same branch of the switch statement.\nWe just separate each possible value with a comma. 
For example, if `role` contains either `DE` or `DA`,\nthe `area` variable would contain the value `\"Data & Analytics\"`, instead of `\"Platform\"` or `\"Sales\"`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Role = enum {\n SE, DPE, DE, DA, PM, PO, KS\n};\n\npub fn main() !void {\n var area: []const u8 = undefined;\n const role = Role.SE;\n switch (role) {\n .PM, .SE, .DPE, .PO => {\n area = \"Platform\";\n },\n .DE, .DA => {\n area = \"Data & Analytics\";\n },\n .KS => {\n area = \"Sales\";\n },\n }\n try stdout.print(\"{s}\\n\", .{area});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nPlatform\n```\n\n\n:::\n:::\n\n\n\n\n\n#### Switch statements must exhaust all possibilities\n\nOne very important aspect about switch statements in Zig\nis that they must exhaust all existing possibilities.\nIn other words, all possible values that could be found inside the `role`\nobject must be explicitly handled in this switch statement.\n\nSince the `role` object has type `Role`, the only possible values to\nbe found inside this object are `PM`, `SE`, `DPE`, `PO`, `DE`, `DA` and `KS`.\nThere are no other possible values to be stored in this `role` object.\nThus, the switch statement must have a combination (branch) for each one of these values.\nThis is what \"exhaust all existing possibilities\" means. The switch statement covers\nevery possible case.\n\nTherefore, you cannot write a switch statement in Zig that leaves an edge case\nwith no explicit action to be taken.\nThis is similar to the behaviour of `match` expressions in Rust, which also have to\nhandle all possible cases.\n\n\n\n#### The else branch\n\nTake a look at the `dump_hex_fallible()` function below as an example. This function\ncomes from the Zig Standard Library. 
More precisely, from the\n[`debug.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/debug.zig)[^debug-mod].\nThere are multiple lines in this function, but I omitted them to focus solely on the\nswitch statement found in this function. Notice that this switch statement has four\npossible cases, or four explicit branches. Also, notice that we used an `else` branch\nin this case.\n\nAn `else` branch in a switch statement works as the \"default branch\".\nWhenever you have multiple cases in your switch statement where\nyou want to apply the exact same action, you can use an `else` branch to do that.\n\n[^debug-mod]: \n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn dump_hex_fallible(bytes: []const u8) !void {\n // Many lines ...\n switch (byte) {\n '\\n' => try writer.writeAll(\"␊\"),\n '\\r' => try writer.writeAll(\"␍\"),\n '\\t' => try writer.writeAll(\"␉\"),\n else => try writer.writeByte('.'),\n }\n}\n```\n:::\n\n\n\n\nMany programmers would also use an `else` branch to handle a \"not supported\" case.\nThat is, a case that cannot be properly handled by your code, or simply a case that\nshould not be \"fixed\". In such situations, you can use an `else` branch to panic (or raise an error)\nin your program to stop the current execution.\n\nTake the code example below. We can see that we are handling the cases\nfor the `level` object being either 1, 2, or 3. All other possible cases are not supported by default,\nand, as a consequence, we raise a runtime error in such cases through the `@panic()` built-in function.\n\nAlso notice that we are assigning the result of the switch statement to a new object called `category`.\nThis is another thing that you can do with switch statements in Zig. 
If the branches\noutput some value as a result, you can store the result value of the switch statement into\na new object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 1, 2 => \"beginner\",\n 3 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n:::\n\n\n\n\n```\nthread 13103 panic: Not supported level!\nt.zig:9:13: 0x1033c58 in main (switch2)\n @panic(\"Not supported level!\");\n ^\n```\n\n\n\n#### Using ranges in switch\n\nFurthermore, you can also use ranges of values in switch statements.\nThat is, you can create a branch in your switch statement that is used\nwhenever the input value is within a range. These \"range expressions\"\nare created with the operator `...`. It is important\nto emphasize that the ranges created by this operator are\ninclusive on both ends.\n\nFor example, I could easily change the previous code example to support all\nlevels between 0 and 100, like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst level: u8 = 4;\nconst category = switch (level) {\n 0...25 => \"beginner\",\n 26...75 => \"intermediary\",\n 76...100 => \"professional\",\n else => {\n @panic(\"Not supported level!\");\n },\n};\ntry stdout.print(\"{s}\\n\", .{category});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nbeginner\n```\n\n\n:::\n:::\n\n\n\n\nThis is neat, and it works with character ranges too. That is, I could\nsimply write `'a'...'z'` to match any character value that is a\nlowercase letter, and it would work fine.\n\n\n#### Labeled switch statements\n\nIn @sec-blocks we talked about labeling blocks, and also about using these labels\nto return a value from the block. 
Well, from version 0.14.0 of the `zig` compiler onwards,\nyou can also apply labels to switch statements, which makes it possible to implement a pattern\nsimilar to a C `goto`.\n\nFor example, if you give the label `xsw` to a switch statement, you can use this\nlabel in conjunction with the `continue` keyword to jump back to the beginning of the switch\nstatement with a new value to match. In the example below, the execution goes back to the beginning of the\nswitch statement two times, before ending at the `3` branch.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nxsw: switch (@as(u8, 1)) {\n 1 => continue :xsw 2,\n 2 => continue :xsw 3,\n 3 => return,\n 4 => {},\n}\n```\n:::\n\n\n\n\n\n### The `defer` keyword {#sec-defer}\n\nWith the `defer` keyword you can register an expression to be executed when you exit the current scope.\nIn this sense, this keyword is similar to the `on.exit()` function from R.\nTake the `foo()` function below as an example. When we execute this `foo()` function, the expression\nthat prints the message \"Exiting function ...\" gets executed only when the function exits\nits scope.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn foo() !void {\n defer std.debug.print(\n \"Exiting function ...\\n\", .{}\n );\n try stdout.print(\"Adding some numbers ...\\n\", .{});\n const x = 2 + 2; _ = x;\n try stdout.print(\"Multiplying ...\\n\", .{});\n const y = 2 * 8; _ = y;\n}\n\npub fn main() !void {\n try foo();\n}\n```\n:::\n\n\n\n\n```\nAdding some numbers ...\nMultiplying ...\nExiting function ...\n```\n\nTherefore, we can use `defer` to declare an expression that is going to be executed\nwhen your code exits the current scope. Some programmers like to interpret the phrase \"exit of the current scope\"\nas \"the end of the current scope\". 
But this interpretation might not be entirely correct, depending\non what you consider to be \"the end of the current scope\".\n\nI mean, what do you consider to be **the end** of the current scope? Is it the closing curly bracket (`}`) of the scope?\nIs it when the last expression in the function gets executed? Is it when the function returns to the previous scope?\nFor example, it would not be correct to interpret the \"exit of the current scope\" as the closing\ncurly bracket of the scope, because the function might exit from an earlier position than this\nclosing curly bracket (e.g. an error value was generated at a previous line inside the function;\nthe function reached an earlier return statement; etc.). Anyway, just be careful with this interpretation.\n\nNow, if you remember what we discussed in @sec-blocks, there are multiple structures in the language\nthat create their own separate scopes: for/while loops, if/else statements,\nfunctions, normal blocks, etc. This also affects the interpretation of `defer`.\nFor example, if you use `defer` inside the body of a for loop, then the given expression\nwill be executed at the end of every iteration, because each iteration exits the scope of the loop body.\n\nBefore we continue, it is worth emphasizing that the `defer` keyword is an \"unconditional defer\".\nThis means that the given expression will be executed no matter how the code exits\nthe current scope. For example, your code might exit the current scope because of an error value\nbeing generated, or because of a return statement, or a break statement, etc.\n\n\n\n### The `errdefer` keyword {#sec-errdefer1}\n\nIn the previous section, we discussed the `defer` keyword, which you can use to\nregister an expression to be executed at the exit of the current scope.\nBut this keyword has a sibling, which is the `errdefer` keyword. 
While `defer`\nis an \"unconditional defer\", the `errdefer` keyword is a \"conditional defer\".\nThis means that the given expression is executed only when you exit the current\nscope under a very specific circumstance.\n\nIn more detail, the expression given to `errdefer` is executed only when the current scope is exited because an error is being returned.\nTherefore, if the function (or for/while loop, if/else statement, etc.) exits the current scope\nin a normal situation, without errors, the expression given to `errdefer` is not executed.\n\nThis makes the `errdefer` keyword one of the many tools available in Zig for error handling.\nIn this section, we are more concerned with the control flow aspects around `errdefer`.\nBut we are going to discuss `errdefer` later as an error handling tool in @sec-errdefer2.\n\nThe code example below demonstrates three things:\n\n- that `defer` is an \"unconditional defer\", because the given expression gets executed regardless of how the function `main()` exits its own scope.\n- that `errdefer` is executed because the function `main()` returns the error value produced by `foo()` (through `try`).\n- that `defer` and `errdefer` expressions are executed in a LIFO (*last in, first out*) order.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn foo() !void { return error.FooError; }\npub fn main() !void {\n var i: usize = 1;\n errdefer std.debug.print(\"Value of i: {d}\\n\", .{i});\n defer i = 2;\n try foo();\n}\n```\n:::\n\n\n\n\n```\nValue of i: 2\nerror: FooError\n/t.zig:6:5: 0x1037e48 in foo (defer)\n return error.FooError;\n ^\n```\n\n\nWhen I say that \"defer expressions\" are executed in a LIFO order, what I mean is that\nthe last `defer` or `errdefer` expression in the code is the first one to be executed.\nYou could also interpret this as: \"defer expressions\" are executed from bottom to top, or\nfrom last to first.\n\nTherefore, if I change the order of the `defer` and `errdefer` expressions, you will notice that\nthe value of `i` that gets 
printed to the console changes to 1. This doesn't mean that the\n`defer` expression was not executed in this case. It actually means that the `defer` expression\nwas executed only after the `errdefer` expression. The code example below demonstrates this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn foo() !void { return error.FooError; }\npub fn main() !void {\n var i: usize = 1;\n defer i = 2;\n errdefer std.debug.print(\"Value of i: {d}\\n\", .{i});\n try foo();\n}\n```\n:::\n\n\n\n\n```\nValue of i: 1\nerror: FooError\n/t.zig:6:5: 0x1037e48 in foo (defer)\n return error.FooError;\n ^\n```\n\n\n\n\n### For loops\n\nA loop allows you to execute the same lines of code multiple times,\nthus creating a \"repetition space\" in the execution flow of your program.\nLoops are particularly useful when we want to replicate the same function\n(or the same set of commands) over different inputs.\n\nThere are different types of loops available in Zig. But the most\nessential of them all is probably the *for loop*. A for loop is\nused to apply the same piece of code over the elements of a slice or an array.\n\nFor loops in Zig have a slightly different syntax than you are\nprobably used to seeing in other languages. You start with the `for` keyword, then you\nlist the items that you want to iterate\nover inside a pair of parentheses. Then, inside a pair of pipes (`|`)\nyou declare an identifier that will serve as your iterator, that is,\nthe name that captures the current element at each iteration of the loop.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (items) |value| {\n // code to execute\n}\n```\n:::\n\n\n\n\nTherefore, instead of using a `(value in items)` syntax,\nin Zig, for loops use the syntax `(items) |value|`. 
In the example\nbelow, you can see that we are looping through the items\nof the array stored in the object `name`, and printing to the\nconsole the decimal representation of each character in this array.\n\nIf we wanted, we could also iterate through a slice (or a portion) of\nthe array, instead of iterating through the entire array stored in the `name` object.\nJust use a range selector to select the section you want. For example,\nI could provide the expression `name[0..3]` to the for loop, to iterate\njust through the first 3 elements in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst name = [_]u8{'P','e','d','r','o'};\nfor (name) |char| {\n try stdout.print(\"{d} | \", .{char});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n80 | 101 | 100 | 114 | 111 | \n```\n\n\n:::\n:::\n\n\n\n\nIn the above example we are using the value itself of each\nelement in the array as our iterator. But there are many situations where\nwe need to use an index instead of the actual values of the items.\n\nYou can do that by providing a second set of items to iterate over.\nMore precisely, you provide the open-ended range `0..` to the for loop. So,\nyes, you can use two different iterators at the same time in a for\nloop in Zig.\n\nBut remember from @sec-assignments that every object\nyou create in Zig must be used in some way. So if you declare two iterators\nin your for loop, you must use both iterators inside the for loop body.\nBut if you want to use just the index iterator, and not the \"value iterator\",\nthen you can discard the value iterator by binding the\nvalue items to the underscore character, like in the example below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (name, 0..) |_, i| {\n try stdout.print(\"{d} | \", .{i});\n}\n```\n:::\n\n\n\n\n```\n0 | 1 | 2 | 3 | 4 |\n```\n\n\n### While loops\n\nA while loop is created from the `while` keyword. 
A `for` loop\niterates through the items of an array, but a `while` loop\nkeeps looping until a logical test\n(specified by you) becomes false.\n\nYou start with the `while` keyword, then you define a logical\nexpression inside a pair of parentheses, and the body of the\nloop is provided inside a pair of curly braces, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: u8 = 1;\nwhile (i < 5) {\n try stdout.print(\"{d} | \", .{i});\n i += 1;\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 2 | 3 | 4 | \n```\n\n\n:::\n:::\n\n\n\n\nYou can also specify the increment expression (also called the *continue expression*) at the beginning of the while loop.\nTo do that, we write the increment expression inside a pair of parentheses after a colon character (`:`).\nEven though it is written at the beginning of the loop, this expression is executed at the end of each iteration.\nThe code example below demonstrates this other pattern.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: u8 = 1;\nwhile (i < 5) : (i += 1) {\n try stdout.print(\"{d} | \", .{i});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 2 | 3 | 4 | \n```\n\n\n:::\n:::\n\n\n\n\n### Using `break` and `continue`\n\nIn Zig, you can explicitly stop the execution of a loop, or jump to the next iteration of the loop, by using\nthe keywords `break` and `continue`, respectively. The `while` loop presented in the next code example is,\nat first sight, an infinite loop, because the logical value inside the parentheses will always be equal to `true`.\nSo what makes this `while` loop stop when the `i` object reaches the count\n10? It is the `break` keyword!\n\nInside the while loop, we have an if statement that is constantly checking if the `i` variable\nis equal to 10. 
Since we are incrementing the value of `i` at each iteration of the\nwhile loop, this `i` object will eventually be equal to 10, and when it is, the if statement\nwill execute the `break` expression, and, as a result, the execution of the while loop is stopped.\n\nNotice the use of the `expect()` function from the Zig Standard Library after the while loop.\nThis `expect()` function is an \"assert\" type of function.\nThis function checks if the logical test provided is equal to true. If so, the function does nothing.\nOtherwise (i.e. the logical test is equal to false), the function returns an error (`error.TestUnexpectedResult`), which `try` then propagates.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar i: usize = 0;\nwhile (true) {\n if (i == 10) {\n break;\n }\n i += 1;\n}\ntry std.testing.expect(i == 10);\ntry stdout.print(\"Everything worked!\", .{});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nEverything worked!\n```\n\n\n:::\n:::\n\n\n\n\nSince this code example compiled and ran successfully,\nwithout raising any errors, we know that, after the execution of the while loop,\nthe `i` object is equal to 10, because if it were not equal to 10, an error would\nbe raised by `expect()`.\n\nNow, in the next example, we have a use case for\nthe `continue` keyword. The if statement is constantly\nchecking if the current element is a multiple of 2. 
If\nit is, then we jump to the next iteration of the loop.\nOtherwise, the loop just prints the current element to the console.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ns = [_]u8{1,2,3,4,5,6};\nfor (ns) |i| {\n if ((i % 2) == 0) {\n continue;\n }\n try stdout.print(\"{d} | \", .{i});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n1 | 3 | 5 | \n```\n\n\n:::\n:::\n\n\n\n\n\n\n## Function parameters are immutable {#sec-fun-pars}\n\nWe have already discussed a lot of the syntax behind function declarations in @sec-root-file and @sec-main-file.\nBut I want to emphasize a curious fact about function parameters (a.k.a. function arguments) in Zig.\nIn summary, function parameters are immutable in Zig.\n\nTake the code example below, where we declare a simple function that just tries to add\nsome amount to the input integer, and return the result. But if you look closely\nat the body of this `add2()` function, you will notice that we try\nto save the result back into the `x` function argument.\n\nIn other words, this function not only uses the value that it received through the function argument\n`x`, but it also tries to change the value of this function argument, by assigning the addition result\nto `x`. However, function arguments in Zig are immutable. You cannot change their values, that is, you\ncannot assign values to them inside the function's body.\n\nThis is why the code example below does not compile successfully. If you try to compile\nthis code example, you get a compile error telling you that you are trying to change the value of an\nimmutable (i.e. 
constant) object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: u32) u32 {\n x = x + 2;\n return x;\n}\n\npub fn main() !void {\n const y = add2(4);\n std.debug.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\nt.zig:3:5: error: cannot assign to constant\n x = x + 2;\n ^\n```\n\n\nIf a function argument receives as input an object whose data type is\nany of the primitive types that we have listed in @sec-primitive-data-types,\nthis object is always passed by value to the function. In other words, this object\nis copied into the function stack frame.\n\nHowever, if the input object has a more complex data type, for example, a\nstruct instance, an array, or a union value, the `zig` compiler\nwill take the liberty of deciding for you which strategy is best. Thus, the `zig` compiler will\npass your object to the function either by value or by reference, choosing\nwhichever strategy it considers faster.\nThis optimization that you get for free is possible only because function arguments are\nimmutable in Zig.\n\nThere are some situations where you might need to change the value of your function argument\ndirectly inside the function's body. This happens more often when we are passing\nC structs as inputs to Zig functions.\n\nIn a situation like this, you can overcome this barrier of immutable function arguments by\nexplicitly choosing to pass the object by reference to the function.\nThat is, instead of depending on the `zig` compiler to decide which strategy is best, you have\nto explicitly mark the function argument as a pointer. This way, we are telling the compiler\nthat this function argument will be passed by reference to the function.\n\nBy making it a pointer, we can finally alter the value that this function argument points to, directly inside\nthe body of the `add2()` function. 
You can see that the code example below compiles successfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nfn add2(x: *u32) void {\n const d: u32 = 2;\n x.* = x.* + d;\n}\n\npub fn main() !void {\n var x: u32 = 4;\n add2(&x);\n std.debug.print(\"Result: {d}\\n\", .{x});\n}\n```\n:::\n\n\n\n\n```\nResult: 6\n```\n\n\n\n## Structs and OOP {#sec-structs-and-oop}\n\nZig is a language more closely related to C (which is a procedural language)\nthan it is to C++ or Java (which are object-oriented languages). Because of that, you do not\nhave advanced OOP (Object-Oriented Programming) patterns available in Zig, such as classes, interfaces or\nclass inheritance. Nonetheless, OOP in Zig is still possible by using struct definitions.\n\nWith struct definitions, you can create (or define) a new data type in Zig. These struct definitions work much like they do in C.\nYou give a name to this new struct (or to this new data type you are creating), then you list the data members of this new struct. 
You can\nalso register functions inside this struct, and they become the methods of this particular struct (or data type), so that every object\nthat you create with this new type has these methods available and associated with it.\n\nIn C++, when we create a new class, we normally have a constructor method (or a constructor function) which\nis used to construct (or to instantiate) every object of this particular class, and we also have\na destructor method (or a destructor function), which is the function responsible for destroying\nevery object of this class.\n\nIn Zig, we normally declare the constructor and the destructor methods\nof our structs by declaring `init()` and `deinit()` methods inside the struct.\nThis is just a naming convention that you will find across the entire Zig Standard Library.\nSo, in Zig, the `init()` method of a struct is normally the constructor method of the class represented by this struct,\nwhile the `deinit()` method is the method used for destroying an existing instance of that struct.\n\nThe `init()` and `deinit()` methods are both used extensively in Zig code, and you will see both of\nthem being used when we talk about allocators in @sec-allocators.\nBut, as another example, let's build a simple `User` struct to represent a user of some sort of system.\n\nIf you look at the `User` struct below, you can see the `struct` keyword.\nNotice the data members of this struct: `id`, `name` and `email`. 
Every data member has its\ntype explicitly annotated, with the colon character (`:`) syntax that we described earlier at @sec-root-file.\nBut also notice that every line in the struct body that describes a data member ends with a comma character (`,`).\nSo every time you declare a data member in your Zig code, always end the line with a comma character, instead\nof ending it with the traditional semicolon character (`;`).\n\nNext, also notice in this example that we have registered an `init()` function as a method\nof this `User` struct. This `init()` method is the constructor method that we will use to instantiate\nevery new `User` object. That is why this `init()` function returns a new `User` object as result.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst User = struct {\n id: u64,\n name: []const u8,\n email: []const u8,\n\n pub fn init(id: u64,\n name: []const u8,\n email: []const u8) User {\n\n return User {\n .id = id,\n .name = name,\n .email = email\n };\n }\n\n pub fn print_name(self: User) !void {\n try stdout.print(\"{s}\\n\", .{self.name});\n }\n};\n\npub fn main() !void {\n const u = User.init(1, \"pedro\", \"email@gmail.com\");\n try u.print_name();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\npedro\n```\n\n\n:::\n:::\n\n\n\n\nThe `pub` keyword plays an important role in struct declarations, and OOP in Zig.\nEvery method that you declare in your struct that is marked with the keyword `pub`\nbecomes a public method of this particular struct.\n\nSo every method that you create inside your struct is, at first, a private method.\nMore precisely, a declaration without `pub` is private to the file (i.e. the module) where it is declared,\nso this method can only be called by code that lives in that same file.\n
But, if you mark this method as public, with the keyword `pub`, then,\nyou can also call the method from an instance of the `User` struct in other modules that import this file.\n\nIn other words, the functions marked by the keyword `pub`\nare members of the public API of that struct.\nFor example, if I did not mark the `print_name()` method as public,\nthen code in another module that imports the `User` struct could not execute the line `u.print_name()`, because it would\nnot be authorized to call this method. The `main()` function above, however, lives in the same file as `User`,\nso it could still call `u.print_name()`.\n\n\n\n### Anonymous struct literals {#sec-anonymous-struct-literals}\n\nYou can declare a struct object as a literal value. When we do that, we normally specify the\ndata type of this struct literal by writing its data type just before the opening curly brace.\nFor example, I could write a struct literal value of the type `User` that we have defined\nin the previous section like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst eu = User {\n .id = 1,\n .name = \"Pedro\",\n .email = \"someemail@gmail.com\"\n};\n_ = eu;\n```\n:::\n\n\n\n\nHowever, in Zig, we can also write an anonymous struct literal. That is, you can write a\nstruct literal, but not specify explicitly the type of this particular struct.\nAn anonymous struct is written by using the syntax `.{}`.\n
So, we essentially\nreplaced the explicit type of the struct literal with a dot character (`.`).\n\nAs we described at @sec-type-inference, when you put a dot before a struct literal,\nthe type of this struct literal is automatically inferred by the `zig` compiler.\nIn essence, the `zig` compiler will look for some hint of what the type of that struct is.\nThis hint can be the type annotation of a function argument,\nor the return type annotation of the function that you are using, or the type annotation\nof an existing object.\nIf the compiler does find such a type annotation, then it will use this\ntype in your struct literal.\n\nAnonymous structs are very commonly used as function arguments in Zig.\nOne example that you have already seen many times is the `print()`\nfunction from the `stdout` object.\nThis function takes two arguments.\nThe first argument is a template string, which should\ncontain format specifiers, which tell how the values provided\nin the second argument should be printed into the message.\n\nThe second argument is a struct literal that lists the values\nto be printed into the template message specified in the first argument.\nYou normally want to use an anonymous struct literal here, so that the\n`zig` compiler does the job of specifying the type of this particular\nanonymous struct for you.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n const stdout = std.io.getStdOut().writer();\n try stdout.print(\"Hello, {s}!\\n\", .{\"world\"});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nHello, world!\n```\n\n\n:::\n:::\n\n\n\n\n\n\n### Struct declarations must be constant\n\nTypes in Zig must be `const` or `comptime` (we are going to talk more about comptime at @sec-comptime).\nWhat this means is that you cannot create a new data type and mark it as variable with the `var` keyword.\nSo struct declarations are always constant.\n
You cannot declare a new struct type using the `var` keyword.\nIt must be `const`.\n\nIn the `Vec3` example below, this declaration is allowed because I'm using the `const` keyword\nto declare this new data type.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n};\n```\n:::\n\n\n\n\n\n### The `self` method argument {#sec-self-arg}\n\nIn most languages that have OOP, when we declare a method of some class or struct, we\nusually declare this method as a function that has a `self` argument (or an implicit equivalent, such as `this` in C++ and Java).\nThis `self` argument is the reference to the object itself from which the method\nis being called.\n\nIt is not mandatory to use this `self` argument. But why would you not use this `self` argument?\nThere is no reason not to use it, because the only way to get access to the data stored in the\ndata members of your struct is to access them through this `self` argument.\nIf you don't need to use the data in the data members of your struct inside your method, then you very likely don't need\na method. You can just declare this logic as a simple function, outside of your\nstruct declaration.\n\n\nTake the `Vec3` struct below. Inside this `Vec3` struct we declared a method named `distance()`.\nThis method calculates the distance between two `Vec3` objects, by following the distance\nformula in Euclidean space. 
Notice that this `distance()` method takes two `Vec3` objects\nas input, `self` and `other`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n\n pub fn distance(self: Vec3, other: Vec3) f64 {\n const xd = m.pow(f64, self.x - other.x, 2.0);\n const yd = m.pow(f64, self.y - other.y, 2.0);\n const zd = m.pow(f64, self.z - other.z, 2.0);\n return m.sqrt(xd + yd + zd);\n }\n};\n```\n:::\n\n\n\n\n\nThe `self` argument corresponds to the `Vec3` object from which this `distance()` method\nis being called.\n
The `other` argument, on the other hand, is a separate `Vec3` object that is given as input\nto this method. In the example below, the `self` argument corresponds to the object\n`v1`, because the `distance()` method is being called from the `v1` object,\nwhile the `other` argument corresponds to the object `v2`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst v1 = Vec3 {\n .x = 4.2, .y = 2.4, .z = 0.9\n};\nconst v2 = Vec3 {\n .x = 5.1, .y = 5.6, .z = 1.6\n};\n\nstd.debug.print(\n \"Distance: {d}\\n\",\n .{v1.distance(v2)}\n);\n```\n:::\n\n\n\n\n```\nDistance: 3.3970575502926055\n```\n\n\n\n### About the struct state\n\nSometimes you don't need to care about the state of your struct object. Sometimes, you just need\nto instantiate and use the objects, without altering their state. You can notice this when you have methods\ninside your struct declaration that use the values that are present in the data members, but\ndo not alter the values in these data members of the struct in any way.\n\nThe `Vec3` struct that was presented at @sec-self-arg is an example of that.\nThis struct has a single method named `distance()`, and this method does use the values\npresent in all three data members of the struct (`x`, `y` and `z`). But at the same time,\nthis method does not change the values of these data members at any point.\n\nAs a result of that, when we create `Vec3` objects we usually create them as\nconstant objects, like the `v1` and `v2` objects presented at @sec-self-arg.\nWe could declare them with the `var` keyword instead, but because the methods of this `Vec3` struct do not change\nthe state of the objects at any point, there is no need to do so (and recent versions of the `zig` compiler\nactually report an error for a local `var` object that is never mutated).\n\nBut why? Why am I talking about this here?\n
It is because the way the `self` argument\nof a method is annotated depends on whether that\nmethod changes the state of the object itself or not.\nMore specifically, when you have a method in a struct that changes the state\nof the object (i.e. changes the value of a data member), the `self` argument\nin this method must be annotated in a different manner.\n\nAs I described at @sec-self-arg, the `self` argument in methods of\na struct is the argument that receives as input the object from which the method\nwas called. We usually annotate this argument in the methods by writing `self`,\nfollowed by the colon character (`:`), and the data type of the struct to which\nthe method belongs (e.g. `User`, `Vec3`, etc.).\n\nIf we take the `Vec3` struct that we defined in the previous section as an example,\nwe can see in the `distance()` method that this `self` argument is annotated as\n`self: Vec3`, because the state of the `Vec3` object is never altered by this\nmethod.\n\nBut what if we do have a method that alters the state of the object, by altering the\nvalues of its data members? How should we annotate `self` in this instance? The answer is:\n\"we should annotate `self` as a pointer to `x`, instead of just `x`\", where `x` is the struct type.\nIn other words, you should annotate `self` as `self: *x`, instead of annotating it\nas `self: x`.\n\nIf we create a new method inside the `Vec3` struct that, for example, expands the\nvector by multiplying its coordinates by a factor of two, then we need to follow\nthis rule specified in the previous paragraph.\n
The code example below demonstrates\nthis idea:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst m = std.math;\nconst Vec3 = struct {\n x: f64,\n y: f64,\n z: f64,\n\n pub fn distance(self: Vec3, other: Vec3) f64 {\n const xd = m.pow(f64, self.x - other.x, 2.0);\n const yd = m.pow(f64, self.y - other.y, 2.0);\n const zd = m.pow(f64, self.z - other.z, 2.0);\n return m.sqrt(xd + yd + zd);\n }\n\n pub fn double(self: *Vec3) void {\n self.x = self.x * 2.0;\n self.y = self.y * 2.0;\n self.z = self.z * 2.0;\n }\n};\n```\n:::\n\n\n\n\nNotice in the code example above that we have added a new method\nto our `Vec3` struct named `double()`. This method essentially doubles the\ncoordinate values of our vector object. Also notice that, in the\ncase of the `double()` method, we annotated the `self` argument as `*Vec3`,\nindicating that this argument receives a pointer (or a reference, if you prefer to call it this way)\nto a `Vec3` object as input.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar v3 = Vec3 {\n .x = 4.2, .y = 2.4, .z = 0.9\n};\nv3.double();\nstd.debug.print(\"Doubled: {d}\\n\", .{v3.x});\n```\n:::\n\n\n\n\n```\nDoubled: 8.4\n```\n\n\n\nNow, if you change the `self` argument in this `double()` method to `self: Vec3`, like in the\n`distance()` method, you will get the compiler error shown below. Notice that this\nerror message points to a line from the `double()` method body,\nindicating that you cannot alter the value of the `x` data member.\n\n```zig\n// If we change the function signature of double to:\n pub fn double(self: Vec3) void {\n```\n\n```\nt.zig:16:13: error: cannot assign to constant\n self.x = self.x * 2.0;\n ~~~~^~\n```\n\nThis error message indicates that the `x` data member belongs to a constant object,\nand, because of that, it cannot be changed.\n
Ultimately, this error message\nis telling us that the `self` argument is constant.\n\nIf you take some time, and think hard about this error message, you will understand it.\nYou already have the tools to understand why we are getting this error message.\nWe have already talked about it at @sec-fun-pars.\nSo remember, every function argument is immutable in Zig, and `self`\nis included in this rule.\n\nIt does not matter whether the object that you pass as input to the function argument is\na variable object or not. In this example, we marked the `v3` object as a variable object.\nBut this does not matter, because it is not about the input object, it is about\nthe function argument.\n\nThe problem begins when we try to alter the value of `self` directly, which is a function argument,\nand every function argument is immutable. You may ask yourself how we can overcome\nthis barrier, and once again, the solution was also discussed at @sec-fun-pars.\nWe overcome this barrier by explicitly marking the `self` argument as a pointer. The pointer itself is still immutable,\nbut the `Vec3` object that it points to can be modified through it.\n\n\n::: {.callout-note}\nIf a method of your `x` struct alters the state of the object, by\nchanging the value of any data member, then, remember to use `self: *x`,\ninstead of `self: x` in the function signature of this method.\n:::\n\n\nYou could also interpret the content discussed in this section as:\n\"if you need to alter the state of your `x` struct object in one of its methods,\nyou must explicitly pass the `x` struct object by reference to the `self` argument of this method\".\n\n\n\n## Type inference {#sec-type-inference}\n\nZig is kind of a strongly typed language.\n
I say \"kind of\" because there are situations\nwhere you don't have to explicitly write the type of every single object in your source code,\nas you would expect from a traditional strongly typed language, such as C and C++.\n\nIn some situations, the `zig` compiler can use type inference to solve the data types for you, easing some of\nthe burden that you carry as a developer.\nThe most common way this happens is through function arguments that receive struct objects\nas input.\n\nIn general, type inference in Zig is done by using the dot character (`.`).\nEvery time you see a dot character written before a struct literal, or before an enum value, or something like that,\nyou know that this dot character is playing a special part in this place. More specifically, it is\ntelling the `zig` compiler something along the lines of: \"Hey! Can you infer the type of this\nvalue for me? Please!\". In other words, this dot character is playing a role similar to the `auto` keyword in C++.\n\nI gave you some examples of this at @sec-anonymous-struct-literals, where we presented anonymous struct literals.\nAnonymous struct literals are, essentially, struct literals that use type inference to\ninfer the exact type of this particular struct literal.\nThis type inference is done by looking for some minimal hint of the correct data type to be used.\nYou could say that the `zig` compiler looks for any neighbouring type annotation that might tell it\nwhat the correct type would be.\n\nAnother common place where we use type inference in Zig is in switch statements (which we talk about at @sec-switch).\nI also gave some other examples of type inference at @sec-switch, where we were inferring the data types of enum values listed inside\nof switch statements (e.g.\n
`.DE`).\nBut as another example, take a look at this `fence()` function reproduced below,\nwhich comes from the [`atomic.zig` module](https://github.com/ziglang/zig/blob/master/lib/std/atomic.zig)[^fence-fn]\nof the Zig Standard Library.\n\n[^fence-fn]: .\n\nThere are a lot of things in this function that we haven't talked about yet, such as:\nwhat does `comptime` mean? `inline`? `extern`? What is this star symbol before `Self`?\nLet's just ignore all of these things, and focus solely on the switch statement\nthat is inside this function.\n\nWe can see that this switch statement uses the `order` object as input. This `order`\nobject is one of the inputs of this `fence()` function, and we can see in the type annotation\nthat this object is of type `AtomicOrder`. We can also see a bunch of values inside the\nswitch statement that begin with a dot character, such as `.release` and `.acquire`.\n\nBecause these weird values contain a dot character before them, we are asking the `zig`\ncompiler to infer the types of these values inside the switch statement.\n
Then, the `zig`\ncompiler looks into the current context where these values are being used, and it\ntries to infer the types of these values.\n\nSince they are being used inside a switch statement, the `zig` compiler looks into the type\nof the input object given to the switch statement, which is the `order` object in this case.\nBecause this object has type `AtomicOrder`, the `zig` compiler infers that these values\nare values of the `AtomicOrder` enum type.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub inline fn fence(self: *Self, comptime order: AtomicOrder) void {\n // LLVM's ThreadSanitizer doesn't support the normal fences so we specialize for it.\n if (builtin.sanitize_thread) {\n const tsan = struct {\n extern \"c\" fn __tsan_acquire(addr: *anyopaque) void;\n extern \"c\" fn __tsan_release(addr: *anyopaque) void;\n };\n\n const addr: *anyopaque = self;\n return switch (order) {\n .unordered, .monotonic => @compileError(\n @tagName(order) ++ \" only applies to atomic loads and stores\"\n ),\n .acquire => tsan.__tsan_acquire(addr),\n .release => tsan.__tsan_release(addr),\n .acq_rel, .seq_cst => {\n tsan.__tsan_acquire(addr);\n tsan.__tsan_release(addr);\n },\n };\n }\n\n return @fence(order);\n}\n```\n:::\n\n\n\n\nThis is how basic type inference is done in Zig. If we didn't use the dot character before\nthe values inside this switch statement, then we would be forced to explicitly write\nthe data types of these values. For example, instead of writing `.release` we would have to\nwrite `AtomicOrder.release`. We would have to do this for every single value\nin this switch statement, and this is a lot of work. That is why type inference\nis commonly used in switch statements in Zig.\n\n\n\n## Type casting {#sec-type-cast}\n\nIn this section, I want to discuss type casting (or, type conversion) with you.\nWe use type casting when we have an object of type \"x\", and we want to convert\nit into an object of type \"y\", i.e.\n
we want to change the data type of the object.\n\nMost languages have a formal way to perform type casting. In Rust for example, we normally\nuse the keyword `as`, and in C, we normally use the type casting syntax, e.g. `(int) x`.\nIn Zig, we use the `@as()` built-in function to cast an object of type \"x\" into\nan object of type \"y\".\n\nThis `@as()` function is the preferred way to perform type conversion (or type casting)\nin Zig, because it is explicit, and it also performs the casting only if it\nis unambiguous and safe. To use this function, you just provide the target data type\nas the first argument, and the object that you want to cast as the second argument.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n const x: usize = 500;\n const y = @as(u32, x);\n try expect(@TypeOf(y) == u32);\n}\n```\n:::\n\n\n\n\nThis is the general way to perform type casting in Zig. In the example above, the cast is safe because `x` is a\ncompile-time known constant whose value (500) fits in a `u32`. But remember, `@as()` works only when casting\nis unambiguous and safe, and there are situations where these assumptions do not hold. For example,\nwhen casting an integer value into a float value, or vice-versa, it is not clear to the compiler\nhow to perform this conversion safely.\n\nTherefore, we need to use specialized \"casting functions\" in such situations.\nFor example, if you want to cast an integer value into a float value, then you\nshould use the `@floatFromInt()` function. In the inverse scenario, you should use\nthe `@intFromFloat()` function.\n\nIn these functions, you just provide the object that you want to\ncast as input.\n
Then, the target data type of the \"type casting operation\" is determined by\nthe type annotation of the object where you are saving the results.\nIn the example below, we are casting the object `x` into a value of type `f32`,\nbecause the object `y`, which is where we are saving the results, is annotated\nas an object of type `f32`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n const x: usize = 565;\n const y: f32 = @floatFromInt(x);\n try expect(@TypeOf(y) == f32);\n}\n```\n:::\n\n\n\n\nAnother built-in function that is very useful when performing type casting operations is `@ptrCast()`.\nIn essence, we use the `@as()` built-in function when we want to explicitly convert (or cast) a Zig value/object\nfrom a type \"x\" to a type \"y\", etc. However, pointers (we are going to discuss pointers\nin more depth at @sec-pointer) are a special type of object in Zig,\ni.e. they are treated differently from \"normal objects\".\n\nEvery time you need to convert a pointer into a pointer of a different type (for example, from `*const [4]u8` to `*const u32`),\nthe `@ptrCast()` function is used.\nThis function works similarly to `@floatFromInt()`.\nYou just provide the pointer object that you want to cast as input to this function, and the\ntarget data type is, once again, determined by the type annotation of the object where the results are being\nstored.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst expect = std.testing.expect;\ntest {\n const bytes align(@alignOf(u32)) = [_]u8{\n 0x12, 0x12, 0x12, 0x12\n };\n const u32_ptr: *const u32 = @ptrCast(&bytes);\n try expect(@TypeOf(u32_ptr) == *const u32);\n}\n```\n:::\n\n\n\n\n\n\n\n\n## Modules\n\nWe already talked about what modules are, and also, how to import other modules into\nyour current module via *import statements*. Every Zig module (i.e. a `.zig` file) that you write in your project\nis internally stored as a struct object.\n
Take the line shown below as an example. In this line we are importing the\nZig Standard Library into our current module.\n\n```zig\nconst std = @import(\"std\");\n```\n\nWhen we want to access the functions and objects from the standard library, we\nare basically accessing the data members of the struct stored in the `std`\nobject. That is why we use the same syntax that we use in normal structs, with the dot operator (`.`)\nto access the data members and methods of the struct.\n\nWhen this \"import statement\" gets evaluated, the result of this expression is a struct\nobject that contains the Zig Standard Library modules, global variables, functions, etc.\nAnd this struct object gets saved (or stored) inside the constant object named `std`.\n\n\nTake the [`thread_pool.zig` module from the project `zap`](https://github.com/kprotty/zap/blob/blog/src/thread_pool.zig)[^thread]\nas an example. This module is written as if it was\na big struct. That is why we have a top-level and public `init()` method\nwritten in this module. The idea is that all top-level functions written in this\nmodule are methods from the struct, and all top-level objects and struct declarations\nare data members of this struct. The module is the struct itself.\n\n[^thread]: \n\n\nSo you would import and use this module by doing something like this:\n\n```zig\nconst std = @import(\"std\");\nconst ThreadPool = @import(\"thread_pool.zig\");\nconst num_cpus = std.Thread.getCpuCount()\n catch @panic(\"failed to get cpu core count\");\nconst num_threads = std.math.cast(u16, num_cpus)\n orelse std.math.maxInt(u16);\nconst pool = ThreadPool.init(\n .{ .max_threads = num_threads }\n);\n```\n\n\n\n",
"supporting": [
"03-structs_files"
],
diff --git a/_freeze/Chapters/04-http-server/execute-results/html.json b/_freeze/Chapters/04-http-server/execute-results/html.json
index 55d61bd..d3605d7 100644
--- a/_freeze/Chapters/04-http-server/execute-results/html.json
+++ b/_freeze/Chapters/04-http-server/execute-results/html.json
@@ -1,9 +1,11 @@
{
- "hash": "48d7ba5bf8c1d62ff6673ea2badfd346",
+ "hash": "5b2814a67e5aacd90ffe6b3145480ec8",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Project 2 - Building a HTTP Server from scratch\n\nIn this chapter, I want to implement a new\nsmall project with you. This time, we are going\nto implement a basic HTTP Server from scratch.\n\nThe Zig Standard Library already have a HTTP Server\nimplemented, which is available at `std.http.Server`.\nBut again, our objective here in this chapter, is to implement\nit **from scratch**. So we can't use this server object available\nfrom the Zig Standard Library.\n\n## What is a HTTP Server?\n\nFirst of all, what is a HTTP Server?\nA HTTP server, as any other type of server, is essentially\na program that runs indefinitely, on an infinite loop, waiting for incoming connections\nfrom clients. Once the server receives an incoming connection, it will\naccept this connection, and it will send messages back-and-forth to the client\nthrough this connection.\n\nBut the messages that are transmitted inside this connection are in a\nspecific format. They are HTTP messages\n(i.e. 
messages that use the HTTP Protocol specification).\nThe HTTP Protocol is the backbone of the modern web.\nThe world wide web as we know it today, would not exist without the \nHTTP Protocol.\n\nSo, Web servers (which is just a fancy name to\nHTTP Servers) are servers that exchange HTTP messages with clients.\nAnd these HTTP servers and the HTTP Protocol specification\nare essential to the operation of the world wide web today.\n\nThat is the whole picture of the process.\nAgain, we have two subjects involved here, a server (which is\na program that is running indefinitely, waiting to receive incoming connections),\nand a client (which is someone that wants to connect to the server,\nand exchange HTTP messages with it).\n\nYou may find the material about the [HTTP Protocol available at the Mozilla MDN Docs](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)[^mdn-http]\n, a great resource for you to also look at. It gives you a great overview on how\nHTTP works, and what role the server plays in this matter.\n\n[^mdn-http]: .\n\n\n## How a HTTP Server works? {#sec-how-http-works}\n\nImagine a HTTP Server as if it were the receptionist of a large hotel. In a hotel,\nyou have a reception, and inside that reception there is a receptionist\nwaiting for customers to arrive. A HTTP Server is essentially a receptionist\nthat is indefinitely waiting for new customers (or, in the context of HTTP, new clients)\nto arrive in the hotel.\n\nWhen a customer arrives at the hotel, that customer starts a conversation with the\nreceptionist. He tells the receptionist how many days he wants to stay at the hotel.\nThen, the receptionist search for an available apartment. 
If there is an available apartment\nat the moment, the customer pays the hotel fees, then, he gets the keys to the apartment,\nand then, he goes to the apartment to rest.\n\nAfter this entire process of dealing with the customer (searching for available apartments,\nreceiving payment, handing over the keys), the receptionist goes back to what he was\ndoing earlier, which is to wait. Wait for new customers to arrive.\n\nThat is, in a nutshell, what a HTTP Server do. It waits for clients to connect to the\nserver. When a client attempts to connect to the server, the server accepts this connection,\nand it starts to exchange messages with the client through this connection.\nThe first message that happens inside this connection is always a message from the client\nto the server. This message is called the *HTTP Request*.\n\nThis HTTP Request is a HTTP message that contains what\nthe client wants from the server. It is literally a request. The client\nthat connected to the server is asking this server to do something for him.\n\nThere are different \"types of request\" that a client can send to a HTTP Server.\nBut the most basic type of request, is when a client ask to the\nHTTP Server to serve (i.e. to send) some specific web page (which is a HTML file) to him.\nWhen you type `google.com` in your web browser, you are essentially sending a HTTP Request to Google's\nHTTP servers. This request is asking these servers to send the Google webpage to you.\n\nNonetheless, when the server receives this first message, the *HTTP Request*, it\nanalyzes this request, to understand: who the client is? What he wants the server to do?\nThis client has provided all the necessary information to perform the action that he\nasked? 
Etc.\n\nOnce the server understands what the client wants, he simply perform the action\nthat was requested, and, to finish the whole process, the server sends back\na HTTP message to the client, informing if the action performed was successful or not,\nand, at last, the server ends (or closes) the connection with the client.\n\nThis last HTTP message sent from the server to the client, is called the *HTTP Response*.\nBecause the server is responding to the action that was requested by the client.\nThe main objective of this response message is let the client know if the\naction requested was successful or not, before the server closes the connection.\n\n\n## How a HTTP server is normally implemented? {#sec-http-how-impl}\n\nLet's use the C language as an example. There are many materials\nteaching how to write a simple HTTP server in C code, like @jeffrey_http,\nor @nipun_http, or @eric_http.\nHaving this in mind, I will not show C code examples here, because you\ncan find them on the internet.\nBut I will describe the theory behind the necessary steps to create\nsuch HTTP server in C.\n\n\nIn essence, we normally implement a HTTP server in C by using WebSocket technology,\nwhich involves the following steps:\n\n1. Create a socket object.\n1. Bind a name (or more specifically, an address) to this socket object.\n1. Make this socket object to start listening and waiting for incoming connections.\n1. When a connection arrive, we accept this connection, and we exchange the HTTP messages (HTTP Request and HTTP Response).\n1. Then, we simply close this connection.\n\n\nA socket object is essentially a channel of communication.\nYou are creating a channel where people can send messages through.\nWhen you create a socket object, this object is not binded to any particular\naddress. This means that with this object you have a representation of a channel of communication\nin your hands. 
But this channel is not currently available, or, it is not currently accessible,\nbecause it do not have a known address where you can find it.\n\nThat is what the \"bind\" operation do. It binds a name (or more specifically, an address) to\nthis socket object, or, this channel of communication, so that it becomes available,\nor, accessible through this address. While the \"listen\" operation makes the socket object to\nlisten for incoming connections in this address. In other words, the \"listen\" operation\nmakes the socket to wait for incoming connections.\ncurrently\nNow, when a client actually attempts to connect to the server through the socket address\nthat we have specified, in order to establish this connection with the client,\nthe socket object needs to accept this incoming connection. Thus, when we\naccept an incoming connection, the client and the server become\nconnected to each other, and they can start reading or writing messages into this\nestablished connection.\n\nAfter we receive the HTTP Request from the client, analyze it, and send the HTTP Response\nto the client, we can then close the connection, and end this communication.\n\n\n## Implementing the server - Part 1\n\n### Creating the socket object {#sec-create-socket}\n\nLet's begin with creating the socket object for our server.\nJust to make things shorter, I will create this socket object in\na separate Zig module. I will name it `config.zig`.\n\nIn Zig, we can create a web socket using\nthe `std.posix.socket()` function, from the Zig Standard Library.\nAs I meantioned earlier at @sec-http-how-impl, every socket object that we create\nrepresents a communication channel, and we need to bind this channel to a specific address.\nAn \"address\" is defined as an IP address, or, more specifically, an IPv4 address^[It can be also an IPv6 address. But normally, we use a IPv4 address for that.].\nEvery IPv4 address is composed by two components. 
The first component is the host,\nwhich is a sequence of 4 numbers separated by dot characters (`.`) that identifies the machine used.\nThe second component is a port number, which identifies the specific\ndoor, or, the specific port to use in the host machine.\n\nThe sequence of 4 numbers (i.e. the host) identifies the machine (i.e. the computer itself) where\nthis socket will live. Every computer normally has many \"doors\" available, because\nthis allows the computer to run multiple network services and to handle multiple connections at the same time,\nwith each service listening at its own door. So the port number is\nessentially a number that identifies the specific door in the computer that will be responsible\nfor receiving the connection. That is, it identifies the \"door\" in the computer that the socket will use\nto receive incoming connections.\n\nTo make things simpler, I will use an IP address that identifies our current machine in this example.\nThis means that our socket object will reside on the same computer that we are currently using\nto write this Zig source code (this machine is also known as the \"localhost\").\n\nBy convention, the IP address that identifies the \"localhost\", which is the current machine we\nare using, is the IP `127.0.0.1`. So, that is the IP\naddress we are going to use in our server. I can declare it in Zig\nby using an array of 4 integers, like this:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst localhost = [4]u8{ 127, 0, 0, 1 };\n_ = localhost;\n```\n:::\n\n\n\n\n\nNow, we need to decide which port number to use. 
By convention, some\nport numbers are reserved for well-known services, meaning that we should not use them for our own\npurposes, like the port 22 (which is normally used for SSH connections). Ports from 0 to 1023 are known as \"well-known ports\", and, on many operating systems, binding a socket to them requires administrator privileges.\nFor TCP connections, which is our case here,\na port number is a 16-bit unsigned integer (type `u16` in Zig),\nthus ranging from 0 to 65535 [@wikipedia_port].\nSo, in theory, we can choose\nany number from 0 to 65535 for our port number, but in practice we normally pick a number above 1023. In the\nexample of this book, I will use the port number 3490\n(just an arbitrary number).\n\n\nNow that we have these two pieces of information at hand, I can\nfinally create our socket object, using the `std.posix.socket()` function.\nFirst, we use the host and the port number to create an `Address` object,\nwith the `std.net.Address.initIp4()` function, like in the example below.\nAfter that, I use this address object (more specifically, its address family) inside the `socket()` function\nto create our socket object.\n\nThe `Socket` struct defined below summarizes all the logic behind\nthis process. In this struct, we have two data members, which are:\n1) the address object; and 2) a stream object, which is\nthe object we will use to read and write messages through any connection we establish.\n\nNotice that, inside the constructor method of this struct,\nwhen we create the socket object, we are using the `SOCK.STREAM` and `IPPROTO.TCP` constants as inputs to\ntell the function to create a socket for TCP connections.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst builtin = @import(\"builtin\");\nconst net = @import(\"std\").net;\n\npub const Socket = struct {\n _address: std.net.Address,\n _stream: std.net.Stream,\n\n pub fn init() !Socket {\n const host = [4]u8{ 127, 0, 0, 1 };\n const port = 3490;\n const addr = net.Address.initIp4(host, port);\n const socket = try std.posix.socket(\n addr.any.family,\n std.posix.SOCK.STREAM,\n std.posix.IPPROTO.TCP\n );\n const stream = net.Stream{ .handle = socket };\n return Socket{ ._address = addr, ._stream = stream };\n 
}\n};\n```\n:::\n\n\n\n\n\n\n### Listening and receiving connections\n\nRemember that we stored the `Socket` struct\ndeclaration that we built at @sec-create-socket inside a Zig module named `config.zig`.\nThis is why I imported this module into our main module (`main.zig`) in the example below, as the `SocketConf` object,\nto access the `Socket` struct.\n\nNow that we have created our socket object, we can focus on making it\nlisten for and accept new incoming connections. We do that by calling the `listen()`\nmethod from the `Address` object that is contained inside the socket object, and then,\nwe call the `accept()` method over the result.\n\nThe `listen()` method from the `Address` object produces a server object (`std.net.Server`),\nwhich is an object that stays open, listening for incoming connections\nat this address. If you try to run the code\nexample below, by calling the `run` command from the `zig` compiler,\nyou will notice that the program keeps running indefinitely,\nwithout a clear end.\n\nThis happens because the program is waiting for something to happen.\nIt is waiting for someone to try to connect to the address (`http://127.0.0.1:3490`) where\nthe server is running and listening for incoming connections. The `listen()` method makes\nthe socket active and ready to receive connections, while the `accept()` call blocks the program\nuntil someone actually connects.\n\nIn other words, the `accept()` method is the function that establishes the connection\nwhen someone tries to connect to the socket. This means that the `accept()` method\nreturns a new connection object as a result. 
You can use this connection object\nto read messages from, or write messages to, the client.\nFor now, we are not doing anything with this connection object.\nBut we are going to use it in the next section.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n _ = connection;\n}\n```\n:::\n\n\n\n\n\nThis code example accepts a single connection. In other words, the\nserver will wait for one incoming connection, and as soon as the\nserver is done with this first connection that it establishes, the\nprogram ends, and the server stops.\n\nThis is not the norm in the real world. Most people that write\na HTTP server like this usually put the `accept()` method\ninside a `while` (infinite) loop, where, every time a connection\nis created with `accept()`, a new thread of execution is created to deal with\nthis new connection and the client. That is, real-world examples of HTTP Servers\nnormally rely on concurrency (and often on parallel computing) to work.\n\nWith this design, the server simply accepts the connection,\nand the whole process of dealing with the client, receiving\nthe HTTP Request, and sending the HTTP Response\nis done in the background, on a separate execution thread.\n\nSo, as soon as the server accepts the connection and creates\nthe separate thread, the server goes back to what it was doing earlier,\nwhich is to wait indefinitely for a new connection to accept.\nWith this in mind, the code example shown above is a\nserver that serves only a single client. 
This is because the program\nterminates as soon as it is done with the single connection that it accepts.\n\n\n\n### Reading the message from the client {#sec-read-http-message}\n\nNow that we have a connection established, i.e. the connection\nobject that we created through the `accept()` function, we can\nuse this connection object to read any messages that the client\nsends to our server. But we can also use it to send messages back\nto the client.\n\nThe basic idea is: if we **write** any data into this connection object,\nthen we are sending data to the client, and if we **read** the data present in\nthis connection object, then we are reading any data that the\nclient sent to us through this connection object. So, just\nkeep this logic in mind. \"Read\" is for reading messages from the client,\nand \"write\" is for sending messages to the client.\n\nRemember from @sec-how-http-works that the first thing that we need to do is to read the HTTP Request\nsent by the client to our server. It is the first message exchanged\ninside the established connection, and, as a consequence, it is the first\nthing that we need to deal with.\n\nThat is why I'm going to create a new Zig module in this small project, named `request.zig`,\nto keep all functions related to the HTTP Request\ntogether. 
Then, I will create a new function named `read_request()` that will\nuse our connection object to read the message sent by the client,\nwhich is the HTTP Request.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn read_request(conn: Connection,\n buffer: []u8) !void {\n const reader = conn.stream.reader();\n _ = try reader.read(buffer);\n}\n```\n:::\n\n\n\n\n\n\nThis function accepts a slice object which behaves as a buffer.\nThe `read_request()` function reads the message sent through\nthe connection object, and saves this message into the buffer object that\nwe have provided as input.\n\nNotice that I'm using the connection object that we created to read\nthe message from the client. I first access the `reader` object from the stream that lives inside the\nconnection object. Then, I call the `read()` method of this `reader` object\nto effectively read and save the data sent by the client into the buffer object\nthat was provided as input. The `read()` method returns the number of bytes that were read,\nbut I'm discarding this return value by assigning it to the underscore character (`_`),\nbecause it is not useful for us right now.\nKeep in mind that a single call to `read()` returns only the data that is currently available\n(up to the size of the buffer), so a large HTTP Request might not be read entirely by it.\nFor our small example, this is good enough.\n\n\n\n## Looking at the current state of the program\n\n\nI think now is a good time to see how our program is currently working. 
Shall we?\nSo, the first thing I will do is to update the `main.zig` module in our small Zig project,\nso that the `main()` function calls this new `read_request()` function that we have just created.\nI will also add a print statement at the end of the `main()` function,\njust so that you can see what the HTTP Request that we have just loaded into the buffer object\nlooks like.\n\nAlso, I'm creating the buffer object in the `main()` function, which will be\nresponsible for storing the message sent by the client, and I'm also\nusing a `for` loop to initialize all elements of this buffer object to the number zero.\nThis is important to make sure that we don't have uninitialized memory in\nthis object, because reading uninitialized memory may cause undefined behaviour in our program.\n\nSince the `read_request()` function receives the buffer object as a slice object (`[]u8`),\nI am using the syntax `array[0..array.len]` to get a slice of this `buffer` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n _ = try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n try stdout.print(\"{s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\n\nNow, I'm going to execute this program with the `run` command from the\n`zig` compiler. But remember, as we said earlier, as soon as I execute this program, it will\nhang indefinitely, because the program is waiting for a client to\nconnect to the server.\n\nMore specifically, the program will pause at the line\nwith the `accept()` call. 
As soon as a client tries to connect to the\nserver, the execution will \"unpause\", the `accept()` function\nwill finally return the\nconnection object that we need, and the rest of the program\nwill run.\n\nYou can see that at @fig-print-zigrun1. The message `Server Addr: 127.0.0.1:3490`\nis printed to the console, and the program is now waiting for an incoming connection.\n\n![A screenshot of running the program](./../Figures/print-zigrun1.png){#fig-print-zigrun1}\n\n\nWe can finally try to connect to this server, and there are several ways we can do this.\nFor example, we could use the following Python script (which requires the third-party `requests` package):\n\n```python\nimport requests\nrequests.get(\"http://127.0.0.1:3490\")\n```\n\nOr, we could also open any web browser of our preference, and type\nthe URL `localhost:3490`. Note: `localhost` normally resolves to the\nIP `127.0.0.1`. When you press enter, and your web browser goes\nto this address, first, the browser will probably print a message\nsaying that \"this page isn't working\", and then it will\nprobably change to a new message saying that \"the site can't be\nreached\".\n\nYou get these \"error messages\" in the web browser because\nit got no response back from the server. In other words, when the web\nbrowser connected to our server, it did send the HTTP Request through the established connection.\nThen, the web browser was expecting to receive a HTTP Response back, but\nit got no response from the server (we haven't implemented the HTTP Response logic yet).\n\nBut that is okay. 
We have achieved the result that we wanted for now,\nwhich is to connect to the server, and see the HTTP Request\nthat was sent by the web browser (or by the Python script)\nto the server.\n\nIf you come back to the console that you left open\nwhen you executed the program, you will see that the\nprogram finished its execution, and a new message was\nprinted in the console, which is the actual HTTP Request\nmessage that was sent by the web browser to the server.\nYou can see this message at @fig-print-zigrun2.\n\n![A screenshot of the HTTP Request sent by the web browser](./../Figures/print-zigrun2.png){#fig-print-zigrun2}\n\n\n\n\n## Learning about Enums in Zig {#sec-enum}\n\nEnums are available in Zig through the `enum` keyword.\nAn enum (short for \"enumeration\") is a special structure that represents a group of constant values.\nSo, if you have a variable which can assume a short and known\nset of values, you might want to associate this variable with an enum structure,\nto make sure that this variable only assumes a value from this set.\n\nA classic example of an enum is the set of primary colors. If, for some reason, your program\nneeds to represent one of the primary colors, you can create an enum\nthat represents one of these colors.\nIn the example below, we are creating the enum `PrimaryColorRGB`, which\nrepresents a primary color from the RGB color system. 
By using this enum,\nI am guaranteed that the `acolor` object, for example, will contain\none of these three values: `RED`, `GREEN` or `BLUE`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst PrimaryColorRGB = enum {\n RED, GREEN, BLUE\n};\nconst acolor = PrimaryColorRGB.RED;\n_ = acolor;\n```\n:::\n\n\n\n\n\nIf, for some reason, my code tries to save in `acolor`\na value that is not in this set, I will get a compile error\nwarning me that a value such as \"MAGENTA\" does not exist\ninside the `PrimaryColorRGB` enum.\nThen I can easily fix my mistake.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst acolor = PrimaryColorRGB.MAGENTA;\n```\n:::\n\n\n\n\n\n```\ne1.zig:5:36: error: enum 'PrimaryColorRGB' has\n no member named 'MAGENTA':\n const acolor = PrimaryColorRGB.MAGENTA;\n ^~~~~~~\n```\n\nUnder the hood, enums in Zig work the same way that enums\nwork in C. Each enum value is essentially represented as an integer.\nBy default, the first value in the set is represented as zero,\nthe second value as one, and so on.\n\nOne thing that we are going to learn in the next section is that\nenums can have methods in them. Wait... What? This is amazing!\nYes, enums in Zig are similar to structs, and they can have\nprivate and public methods inside them.\n\n\n\n\n\n\n\n## Implementing the server - Part 2\n\nNow, in this section, I want to focus on parsing\nthe HTTP Request that we received from the client.\nHowever, to effectively parse a HTTP Request message, we first need to understand its\nstructure.\nIn summary, a HTTP Request is a text message that is divided into 3 different\nsections (or parts):\n\n- The top-level header (formally known as the *request line*) indicating the method of the HTTP Request, the URI, and the HTTP version used in the message.\n- A list of HTTP Headers.\n- The body of the HTTP Request.\n\n### The top-level header\n\nThe first line of text in a HTTP Request always comes with the three most essential\npieces of information about the request. 
These three key attributes of the HTTP Request\nare separated by a single space in this first line of the request.\nThe first one is the HTTP method that is being\nused in the request, second, we have the URI to which this HTTP Request is being sent,\nand third, we have the version of the HTTP protocol that is being used in this HTTP Request.\n\nIn the snippet below, you can find an example of this first line in a HTTP Request.\nFirst, we have the HTTP method of this request (`GET`). Many programmers\nrefer to the URI component (`/users/list`) as the \"API endpoint\" to which the HTTP Request\nis being sent. In the context of this specific request, since it is a GET request,\nyou could also say that the URI component is the path to the resource we want to access,\nor, the path to the document (or the file) that we want to retrieve from the server.\n\n```\nGET /users/list HTTP/1.1\n```\n\nAlso, notice that this HTTP Request is using the version 1.1 of the HTTP protocol,\nwhich is still one of the most widely used versions of the protocol on the web.\n\n\n\n### The list of HTTP headers\n\nMost HTTP Requests also include a section of HTTP Headers,\nwhich is just a list of attributes or key-value pairs associated with this\nparticular request. This section always comes right after the \"top-level header\" of the request.\nEach header is written on its own line, in the format `Name: value`, and the list of headers ends with an empty line.\n\nFor our purpose in this chapter, which is to build a simple HTTP Server,\nwe are going to ignore this section of the HTTP Request, for simplicity.\nBut most HTTP servers that exist in the wild parse and use these\nHTTP headers to change the way that the server responds to the request\nsent by the client.\n\nFor example, many requests we encounter in the real world come with\na HTTP header called `Accept`. 
In this header, we find a list of [MIME types](https://en.wikipedia.org/wiki/Media_type)[^mime].\nThis list indicates the file formats that the client can read, or parse, or interpret.\nIn other words, you can also interpret this header as the client saying the following phrase\nto the server: \"Hey! Look, I can read only HTML documents, so please, send me back\na document that is in a HTML format.\".\n\n[^mime]: <https://en.wikipedia.org/wiki/Media_type>.\n\nIf the HTTP server can read and use this `Accept` header, then the server can identify\nwhich is the best file format for the document to be sent to the client. Maybe the HTTP server has\nthe same document in multiple formats, for example, in JSON, in XML, in HTML and in PDF,\nbut the client can only understand documents in the HTML format. That is the purpose\nof this `Accept` header.\n\n\n### The body\n\nThe body comes after the list of HTTP headers, and it is an optional section of the HTTP Request, meaning that not\nall HTTP Requests come with a body in them. For example, HTTP Requests that use the\nGET method usually do not come with a body.\n\nThis is because a GET request is used to request data, instead of sending it to the server.\nSo, the body section is more related to the POST method, which is a method that involves\nsending data to the server, to be processed and stored.\n\nSince we are going to support only the GET method in this project, it means that\nwe also do not need to care about the body of the request.\n\n\n\n### Creating the HTTP Method enum\n\nEvery HTTP Request comes with an explicit method. The method used in a HTTP Request\nis identified by one of these words:\n\n- GET;\n- POST;\n- OPTIONS;\n- PATCH;\n- DELETE;\n- and some other methods.\n\nEach HTTP method is used for a specific type of task. The POST method, for example, is normally\nused to post some data to the destination. 
In other words, it is used\nto send some data to the HTTP server, so that it can be processed and stored by the server.\n\nAs another example, the GET method is normally used to get content from the server.\nIn other words, we use this method whenever we want the server to send some\ncontent back to us. It can be any type of content. It can be a web page,\na document file, or some data in a JSON format.\n\nWhen a client sends a POST HTTP Request, the HTTP Response sent by the server normally has the sole purpose of\nletting the client know whether the server processed and stored the data successfully.\nIn contrast, when the server receives a GET HTTP Request, the server sends the content\nthat the client asked for in the HTTP Response itself. This demonstrates that the method associated\nwith the HTTP Request greatly changes the dynamics and the roles that each party\nplays in the whole process.\n\nSince the HTTP method of the HTTP Request is identified by this very small and specific\nset of words, it makes sense to create an enum structure to represent a HTTP method.\nThis way, we can easily check if the HTTP Request we receive from the client uses a\nHTTP method that we currently support in our small HTTP server project.\n\nThe `Method` structure below represents this enumeration.\nNotice that, for now, only the GET HTTP method is included in this\nenumeration, because, for the purpose of this chapter, I want to\nimplement only the GET HTTP method. That is why I am not\nincluding the other HTTP methods in this enumeration.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET\n};\n```\n:::\n\n\n\n\n\n\nNow, I think we should add two methods to this enum structure. One method is `is_supported()`,\nwhich will be a function that returns a boolean value, indicating whether the input HTTP method is supported\nor not by our HTTP Server. 
The other is `init()`, which is a constructor function that takes a string as input,\nand tries to convert it into a `Method` value.\n\n\nBut in order to build these functions, I will use a functionality from the Zig Standard Library called\n`StaticStringMap()`. This function allows us to create a simple map from strings to values of a given type (in our case, enum values).\nIn other words, we can use this map structure to map a string to the respective enum value.\nTo some extent, this specific structure from the standard library works almost like a \"hashtable\" structure,\nand it is optimized for small sets of words, or, small sets of keys, which is our case here.\nWe are going to talk more about hashtables in Zig at @sec-maps-hashtables.\n\nTo use this \"static string map\" structure, you can import it from the `std.static_string_map` module\nof the Zig Standard Library. Just to make things shorter and easier to type, I am going to import this\nfunction under a different and shorter name (`Map`).\n\nWith `Map()` imported, we can just call this function with the enum type\nthat we want to store as the values of the resulting map. In our case here, it is the `Method` enum structure\nthat we declared in the last code example. Then, I call the `initComptime()` method with the\nmap, i.e. the list of key-value pairs that we are going to use.\n\nYou can see in the example below that I wrote this map using multiple anonymous struct literals.\nInside the first (or \"top-level\") struct literal, we have a list (or a sequence) of struct literals.\nEach struct literal in this list represents a separate key-value pair. The first element (or the key)\nin each key-value pair should always be a string value. 
The second element should\nbe a value of the enum type that you have passed to the `Map()` function.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Map = std.static_string_map.StaticStringMap;\nconst MethodMap = Map(Method).initComptime(.{\n .{ \"GET\", Method.GET },\n});\n```\n:::\n\n\n\n\n\nTherefore, the `MethodMap` object works similarly to a `std::map` object from C++, or\na `dict` object from Python. You can retrieve (or get) the enum value that\ncorresponds to a particular key by using the `get()` method from the map\nobject. This method returns an optional value, so the `get()` method might\nresult in a null value.\n\nWe can use this to our advantage to detect if a particular HTTP method is\nsupported or not in our HTTP server. If the `get()` method returns null,\nit means that it did not find the method that we provided inside the `MethodMap` object, and,\nas a consequence, this method is not supported by our HTTP server.\n\nThe `init()` method below takes a string value as input, and then it simply passes this string value\nto the `get()` method of our `MethodMap` object. As a consequence, we should get the enum value that corresponds\nto this input string.\n\nNotice in the example below that the `init()` method unwraps the optional value returned by `get()`\nwith the `.?` operator. If the input string is not in the map, this unwrap reaches `unreachable`,\nwhich is illegal behaviour: in safe build modes, the program panics, instead of the function returning an error (check out @sec-null-handling for more details).\nSince `GET` is currently the only value in our `Method` enum\nstructure, the `init()` method can only return the value `Method.GET` as result.\n\nAlso notice that, in the `is_supported()` method, we are using the optional value returned\nby the `get()` method from our `MethodMap` object. The if statement unwraps the optional value\n
The if statement unwrapes the optional value\nreturned by this method, and returns `true` in case this optional value is a not-null value.\nOtherwise, it simply returns `false`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET,\n pub fn init(text: []const u8) !Method {\n return MethodMap.get(text).?;\n }\n pub fn is_supported(m: []const u8) bool {\n const method = MethodMap.get(m);\n if (method) |_| {\n return true;\n }\n return false;\n }\n};\n```\n:::\n\n\n\n\n\n\n\n\n\n\n\n### Writing the parse request function\n\nNow that we created the enum that represents our HTTP method,\nwe should start to write the function responsible for\nactually parsing the HTTP Request.\n\nThe first thing we can do, is to write a struct to represent the HTTP Request.\nTake the `Request` struct below as an example. It contains the three\nessential information from the \"top-level\" header (i.e. the first line)\nin the HTTP Request.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Request = struct {\n method: Method,\n version: []const u8,\n uri: []const u8,\n pub fn init(method: Method,\n uri: []const u8,\n version: []const u8) Request {\n return Request{\n .method = method,\n .uri = uri,\n .version = version,\n };\n }\n};\n```\n:::\n\n\n\n\n\n\nThe `parse_request()` function should receive a string as input. This input string\ncontains the entire HTTP Request message, and the parsing function should\nread and understand the individual parts of this message.\n\nNow, remember that for the purpose of this chapter, we care only about the first\nline in this message, which contains the \"top-level header\", or, the three essential attributes about the HTTP Request,\nwhich are the HTTP method used, the URI and the HTTP version.\n\nNotice that I use the function `indexOfScalar()` in `parse_request()`. This function from the\nZig Standard Library returns the first index where the scalar value that we provide\nhappens in a string. 
In this case, I'm looking for the first occurrence of the new line character (`\\n`),\nbecause, once again, we care only about the first line in the HTTP Request message.\nThis is the line where we have the three pieces of information that we want to parse\n(the HTTP method, the URI and the HTTP version).\n\nTherefore, we are using this `indexOfScalar()` function\nto limit our parsing process to the first line in the message.\nIt is also worth mentioning that the `indexOfScalar()` function returns an optional value.\nThat is why I use the `orelse` keyword to provide an alternative value (the length of the whole string), in case\nthe value returned by the function is null, i.e. in case no new line character is found.\n\nSince these three attributes are separated by a single space, we\ncan use the function `splitScalar()` from the Zig Standard Library to split\nthe input string into sections at every position where a single\nspace appears. In other words, this `splitScalar()` function is similar\nto the `split()` method in Python, or the `std::getline()` function from C++ (when used with a custom delimiter),\nor the `strtok()` function in C.\n\nWhen you use this `splitScalar()` function, you get an iterator as the result.\nThis iterator has a `next()` method that you can use to advance the iterator\nto the next position, or, to the next section of the split string.\nNote that, when you use `next()`, the method not only advances the iterator,\nbut it also returns a slice to the current section of the split\nstring as result.\n\nNow, if you want to get a slice to the current section of the split\nstring, but not advance the iterator to the next position, you can use\nthe `peek()` method. 
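\n\nThe difference between `next()` and `peek()` is easier to see in a small, standalone sketch. This code is not part of our server's source code. It is written as a Zig `test` block, which you could run with `zig test`. It assumes a Zig version (0.11 or newer) where `std.mem.splitScalar()` and its iterator's `peek()` method are available, and it splits a request line similar to the one we saw earlier:\n\n```zig\nconst std = @import(\"std\");\n\ntest \"splitScalar(): peek() vs next()\" {\n    // Split the request line at every single space character\n    var it = std.mem.splitScalar(u8, \"GET /users/list HTTP/1.1\", ' ');\n    // peek() returns the current section without advancing the iterator\n    try std.testing.expectEqualStrings(\"GET\", it.peek().?);\n    // next() returns that same section, and then advances the iterator\n    try std.testing.expectEqualStrings(\"GET\", it.next().?);\n    try std.testing.expectEqualStrings(\"/users/list\", it.next().?);\n    try std.testing.expectEqualStrings(\"HTTP/1.1\", it.next().?);\n    // Once every section has been consumed, both methods return null\n    try std.testing.expect(it.peek() == null);\n    try std.testing.expect(it.next() == null);\n}\n```\n\nCalling `peek()` several times in a row returns the same section each time. Each call to `next()` consumes one section.\n\n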
Both `next()` and `peek()` methods return an optional value, which is\nwhy I use the `.?` operator to unwrap these optional values.\nAlso notice that `parse_request()` returns an error union (`!Request`), because it uses the `try` keyword when calling `Method.init()`.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn parse_request(text: []u8) !Request {\n const line_index = std.mem.indexOfScalar(\n u8, text, '\\n'\n ) orelse text.len;\n var iterator = std.mem.splitScalar(\n u8, text[0..line_index], ' '\n );\n const method = try Method.init(iterator.next().?);\n const uri = iterator.next().?;\n const version = iterator.next().?;\n const request = Request.init(method, uri, version);\n return request;\n}\n```\n:::\n\n\n\n\n\n\nAs I described at @sec-zig-strings, strings in Zig are simply arrays of bytes.\nSo, you will find lots of excellent utility functions to work directly with strings\ninside the `mem` module from the Zig Standard Library.\nWe have already described some of these useful utility functions\nat @sec-strings-useful-funs.\n\n\n\n### Using the parse request function\n\nNow that we have written the function responsible for parsing the HTTP Request,\nwe can add the function call to `parse_request()` in\nthe `main()` function of our program.\n\nAfter that, it is a good idea to test the state of our program once again.\nI execute this program again with the `run` command from the `zig` compiler,\nthen I use my web browser to connect to the server once again through the URL `localhost:3490`, and finally,\nthe end result of our `Request` object is printed to the console.\n\nA quick observation: since I have used the `any` format specifier in the\nprint statement, the data members `version` and `uri` of the `Request`\nstruct were printed as raw integer values. String data being printed\nas integer values is common in Zig when you use the `any` format specifier, and remember, these integer values are just the decimal representation of\nthe bytes that form the string in question.\n\nIn the result below, the sequence of decimal values 72, 84, 84, 80, 47, 49, 46 and 49\nare the bytes that form the text \"HTTP/1.1\". The last value, 13, is a carriage return character (`\\r`): each line in a HTTP Request ends with the sequence `\\r\\n`, and we split the message only at `\\n`. 
And the integer 47 is the decimal value of\nthe character `/`, which represents our URI in this request.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n const request = Request.parse_request(buffer[0..buffer.len]);\n try stdout.print(\"{any}\\n\", .{request});\n}\n```\n:::\n\n\n\n\n\n```\nrequest.Request{\n .method = request.Method.GET,\n .version = {72, 84, 84, 80, 47, 49, 46, 49, 13},\n .uri = {47}\n}\n```\n\n\n\n### Sending the HTTP Response to the client\n\nIn this last part, we are going to write the logic responsible for\nsending the HTTP Response from the server to the client. To make things\nsimple, the server in this project will send just a simple web page\ncontaining the text \"Hello world\".\n\nFirst, I create a new Zig module in the project, named `response.zig`.\nIn this module, I will declare just two functions. Each function\ncorresponds to a specific status code in the HTTP Response.\nThe `send_200()` function will send a HTTP Response with status code 200\n(which means \"OK\", i.e. the request succeeded) to the client. The `send_404()` function, in turn, sends a response\nwith status code 404 (which means \"Not Found\").\n\nThis is definitely not the most ergonomic and adequate way of handling the\nHTTP Response, but it works for our case here. 
We are just building toy projects\nin this book after all, therefore, the source code that we write do not need to be perfect.\nIt just needs to work!\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn send_200(conn: Connection) !void {\n const message = (\n \"HTTP/1.1 200 OK\\nContent-Length: 48\"\n ++ \"\\nContent-Type: text/html\\n\"\n ++ \"Connection: Closed\\n\\n\"\n ++ \"
\"\n );\n _ = try conn.stream.write(message);\n}\n```\n:::\n\n\n\n\n\nNotice that both functions receives the connection object as input, and\nuse the `write()` method to write the HTTP Response message directly\ninto this communication channel. As result, the party in the other\nside of the connection (i.e. the client), will receive such message.\n\nMost real-world HTTP Servers will have a single function (or a single struct) to effectively handle\nthe response. It gets the HTTP Request already parsed as input, and then, it tries to build\nthe HTTP Response bit by bit, before the function sends it over the connection.\n\nWe would also have a specialized struct to represent a HTTP Response, and\na lot of methods that would be used to build each part or component of the response object.\nTake the `Response` struct created by the Javascript runtime Bun as an example.\nYou can find this struct in the [`response.zig` module](https://github.com/oven-sh/bun/blob/main/src/bun.js/webcore/response.zig)[^bun-resp]\nin their GitHub project.\n\n[^bun-resp]: .\n\n\n## The end result\n\nWe can now, update once again our `main()` function to incorporate our new\nfunctions from the `response.zig` module. First, I need to import this module\ninto our `main.zig` module, then, I add the function calls to `send_200()`\nand `send_404()`.\n\nNotice that I'm using if statements to decide which \"response function\" to call,\nbased especially on the URI present in the HTTP Request. If the user asked for\na content (or a document) that is not present in our server, we should respond\nwith a 404 status code. But since we have just a simple HTTP server, with no\nreal documents to send, we can just check if the URI is the root path (`/`)\nor not to decide which function to call.\n\nAlso, notice that I'm using the function `std.mem.eql()` from the Zig Standard Library\nto check if the string from `uri` is equal or not the string `\"/\"`. 
We have\ndescribed this function already at @sec-strings-useful-funs, so, comeback to\nthat section if you are not familiar yet with this function.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst Response = @import(\"response.zig\");\nconst Method = Request.Method;\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(connection, buffer[0..buffer.len]);\n const request = Request.parse_request(buffer[0..buffer.len]);\n if (request.method == Method.GET) {\n if (std.mem.eql(u8, request.uri, \"/\")) {\n try Response.send_200(connection);\n } else {\n try Response.send_404(connection);\n }\n }\n}\n```\n:::\n\n\n\n\n\n\nNow that we adjusted our `main()` function, I can now execute our program, and\nsee the effects of these last changes. First, I execute the program once again, with the\n`run` command of the `zig` compiler. The program will hang, waiting for a client to connect.\n\nThen, I open my web browser, and try to connect to the server again, using the URL `localhost:3490`.\nThis time, instead of getting some sort of an error message from the browser, you will get the message\n\"Hello World\" printed into your web browser. Because this time, the server sended the HTTP Response\nsuccesfully to the web browser, as demonstrated by @fig-print-zigrun3.\n\n\n![The Hello World message sent in the HTTP Response](./../Figures/print-zigrun3.png){#fig-print-zigrun3}\n\n",
- "supporting": [],
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Project 2 - Building a HTTP Server from scratch\n\nIn this chapter, I want to implement a new\nsmall project with you. This time, we are going\nto implement a basic HTTP Server from scratch.\n\nThe Zig Standard Library already has a HTTP Server\nimplemented, which is available at `std.http.Server`.\nBut again, our objective in this chapter is to implement\nit **from scratch**. So we can't use this server object available\nfrom the Zig Standard Library.\n\n## What is a HTTP Server?\n\nFirst of all, what is a HTTP Server?\nA HTTP server, like any other type of server, is essentially\na program that runs indefinitely, in an infinite loop, waiting for incoming connections\nfrom clients. Once the server receives an incoming connection, it will\naccept this connection, and it will send messages back and forth to the client\nthrough this connection.\n\nBut the messages that are transmitted inside this connection are in a\nspecific format. They are HTTP messages\n(i.e. 
messages that use the HTTP Protocol specification).\nThe HTTP Protocol is the backbone of the modern web.\nThe world wide web as we know it today would not exist without the\nHTTP Protocol.\n\nSo, Web servers (which is just another name for\nHTTP Servers) are servers that exchange HTTP messages with clients.\nAnd these HTTP servers and the HTTP Protocol specification\nare essential to the operation of the world wide web today.\n\nThat is the whole picture of the process.\nAgain, we have two parties involved here: a server (which is\na program that is running indefinitely, waiting to receive incoming connections),\nand a client (which is someone who wants to connect to the server,\nand exchange HTTP messages with it).\n\nThe material about the [HTTP Protocol available at the Mozilla MDN Docs](https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview)[^mdn-http]\nis also a great resource to look at. It gives you a great overview of how\nHTTP works, and what role the server plays in this matter.\n\n[^mdn-http]: <https://developer.mozilla.org/en-US/docs/Web/HTTP/Overview>.\n\n\n## How does a HTTP Server work? {#sec-how-http-works}\n\nImagine a HTTP Server as if it were the receptionist of a large hotel. In a hotel,\nyou have a reception, and inside that reception there is a receptionist\nwaiting for customers to arrive. A HTTP Server is essentially a receptionist\nthat is indefinitely waiting for new customers (or, in the context of HTTP, new clients)\nto arrive in the hotel.\n\nWhen a customer arrives at the hotel, that customer starts a conversation with the\nreceptionist. He tells the receptionist how many days he wants to stay at the hotel.\nThen, the receptionist searches for an available apartment. 
If there is an available apartment\nat the moment, the customer pays the hotel fees, gets the keys to the apartment,\nand then goes to the apartment to rest.\n\nAfter this entire process of dealing with the customer (searching for available apartments,\nreceiving payment, handing over the keys), the receptionist goes back to what he was\ndoing earlier, which is to wait. Wait for new customers to arrive.\n\nThat is, in a nutshell, what a HTTP Server does. It waits for clients to connect to the\nserver. When a client attempts to connect to the server, the server accepts this connection,\nand it starts to exchange messages with the client through this connection.\nThe first message that happens inside this connection is always a message from the client\nto the server. This message is called the *HTTP Request*.\n\nThis HTTP Request is a HTTP message that contains what\nthe client wants from the server. It is literally a request. The client\nthat connected to the server is asking this server to do something for him.\n\nThere are different \"types of request\" that a client can send to a HTTP Server.\nBut the most basic type of request is when a client asks the\nHTTP Server to serve (i.e. to send) some specific web page (which is a HTML file) to him.\nWhen you type `google.com` in your web browser, you are essentially sending a HTTP Request to Google's\nHTTP servers. This request is asking these servers to send the Google webpage to you.\n\nIn any case, when the server receives this first message, the *HTTP Request*, it\nanalyzes this request to understand: Who is the client? What does he want the server to do?\nHas this client provided all the information necessary to perform the action that he\nasked for? 
And so on.\n\nOnce the server understands what the client wants, it simply performs the action\nthat was requested, and, to finish the whole process, the server sends back\na HTTP message to the client, informing whether the action performed was successful or not,\nand, at last, the server ends (or closes) the connection with the client.\n\nThis last HTTP message sent from the server to the client is called the *HTTP Response*,\nbecause the server is responding to the action that was requested by the client.\nThe main objective of this response message is to let the client know whether the\naction requested was successful or not, before the server closes the connection.\n\n\n## How is a HTTP server normally implemented? {#sec-http-how-impl}\n\nLet's use the C language as an example. There are many tutorials\nteaching how to write a simple HTTP server in C code, like @jeffrey_http,\nor @nipun_http, or @eric_http.\nWith this in mind, I will not show C code examples here, because you\ncan find them on the internet.\nBut I will describe the theory behind the necessary steps to create\nsuch a HTTP server in C.\n\n\nIn essence, we normally implement a HTTP server in C by using sockets (the BSD sockets API),\nwhich involves the following steps:\n\n1. Create a socket object.\n1. Bind a name (or more specifically, an address) to this socket object.\n1. Make this socket object start listening and waiting for incoming connections.\n1. When a connection arrives, we accept this connection, and we exchange the HTTP messages (HTTP Request and HTTP Response).\n1. Then, we simply close this connection.\n\n\nA socket object is essentially a channel of communication.\nYou are creating a channel through which people can send messages.\nWhen you create a socket object, this object is not bound to any particular\naddress. This means that with this object you have a representation of a channel of communication\nin your hands. 
But this channel is not currently available, or, it is not currently accessible,\nbecause it does not have a known address where you can find it.\n\nThat is what the \"bind\" operation does. It binds a name (or more specifically, an address) to\nthis socket object, or, this channel of communication, so that it becomes available,\nor, accessible through this address. The \"listen\" operation, in turn, makes the socket object\nlisten for incoming connections at this address. In other words, the \"listen\" operation\nmakes the socket wait for incoming connections.\n\nNow, when a client actually attempts to connect to the server through the socket address\nthat we have specified, in order to establish this connection with the client,\nthe socket object needs to accept this incoming connection. Thus, when we\naccept an incoming connection, the client and the server become\nconnected to each other, and they can start reading or writing messages into this\nestablished connection.\n\nAfter we receive the HTTP Request from the client, analyze it, and send the HTTP Response\nto the client, we can then close the connection, and end this communication.\n\n\n## Implementing the server - Part 1\n\n### Creating the socket object {#sec-create-socket}\n\nLet's begin with creating the socket object for our server.\nJust to make things shorter, I will create this socket object in\na separate Zig module. I will name it `config.zig`.\n\nIn Zig, we can create a socket using\nthe `std.posix.socket()` function, from the Zig Standard Library.\nAs I mentioned earlier at @sec-http-how-impl, every socket object that we create\nrepresents a communication channel, and we need to bind this channel to a specific address.\nAn \"address\" is defined as an IP address, or, more specifically, an IPv4 address^[It can also be an IPv6 address. But in this chapter, we use an IPv4 address.].\nEvery IPv4 socket address is composed of two components. 
The first component is the host,\nwhich is a sequence of 4 numbers separated by dot characters (`.`) that identifies the machine used.\nThe second component is a port number, which identifies the specific\ndoor, or, the specific port to use in the host machine.\n\nThe sequence of 4 numbers (i.e. the host) identifies the machine (i.e. the computer itself) where\nthis socket will live. Every computer normally has many \"doors\" available,\nwhich allows it to run multiple services and to receive multiple connections at the same time.\nSo the port number is\nessentially a number that identifies the specific door in the computer that will be responsible\nfor receiving the connection. That is, it identifies the \"door\" in the computer that the socket will use\nto receive incoming connections.\n\nTo make things simpler, I will use an IP address that identifies our current machine in this example.\nThis means that our socket object will reside on the same computer that we are currently using\n(this is also known as the \"localhost\") to write this Zig source code.\n\nBy convention, the IP address that identifies the \"localhost\", which is the current machine we\nare using, is the IP `127.0.0.1`. So, that is the IP\naddress we are going to use in our server. I can declare it in Zig\nby using an array of 4 integers, like this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst localhost = [4]u8{ 127, 0, 0, 1 };\n_ = localhost;\n```\n:::\n\n\n\n\nNow, we need to decide which port number to use. 
By convention, there are some\nport numbers that are reserved, meaning that we should not use them for our own\npurposes, like port 22 (which is normally used for SSH connections).\nFor TCP connections, which is our case here,\na port number is a 16-bit unsigned integer (type `u16` in Zig),\nthus ranging from 0 to 65535 [@wikipedia_port].\nSo, we can choose\nany unreserved number from 0 to 65535 for our port number. In the\nexample of this book, I will use the port number 3490\n(just a random number).\n\n\nNow that we have these two pieces of information at hand, I can\nfinally create our socket object, using the `std.posix.socket()` function.\nFirst, we use the host and the port number to create an `Address` object,\nwith the `std.net.Address.initIp4()` function, like in the example below.\nAfter that, I use this address object inside the `socket()` function\nto create our socket object.\n\nThe `Socket` struct defined below summarizes all the logic behind\nthis process. In this struct, we have two data members, which are:\n1) the address object; and 2) a stream object, which is\nthe object we will use to read and write the messages into any connection we establish.\n\nNotice that, inside the constructor method of this struct,\nwhen we create the socket object, we are using the `IPPROTO.TCP` constant as an input to\ntell the function to create a socket for TCP connections.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst builtin = @import(\"builtin\");\nconst net = @import(\"std\").net;\n\npub const Socket = struct {\n _address: std.net.Address,\n _stream: std.net.Stream,\n\n pub fn init() !Socket {\n const host = [4]u8{ 127, 0, 0, 1 };\n const port = 3490;\n const addr = net.Address.initIp4(host, port);\n const socket = try std.posix.socket(\n addr.any.family,\n std.posix.SOCK.STREAM,\n std.posix.IPPROTO.TCP\n );\n const stream = net.Stream{ .handle = socket };\n return Socket{ ._address = addr, ._stream = stream };\n }\n};\n```\n:::\n\n\n\n\n\n### 
Listening and receiving connections\n\nRemember that we stored the `Socket` struct\ndeclaration that we built at @sec-create-socket inside a Zig module named `config.zig`.\nThis is why I imported this module into our main module (`main.zig`) in the example below, as the `SocketConf` object,\nto access the `Socket` struct.\n\nOnce we have created our socket object, we can now focus on making this socket object\nlisten for and receive new incoming connections. We do that by calling the `listen()`\nmethod from the `Address` object that is contained inside the socket object, and then\ncalling the `accept()` method on the result.\n\nThe `listen()` method from the `Address` object produces a server object,\nwhich is an object that will stay open and running indefinitely, waiting\nto receive an incoming connection. Therefore, if you try to run the code\nexample below, by calling the `run` command from the `zig` compiler,\nyou will notice that the program keeps running indefinitely,\nwithout a clear end.\n\nThis happens because the program is waiting for something to happen.\nIt is waiting for someone to try to connect to the address (`http://127.0.0.1:3490`) where\nthe server is running and listening for incoming connections. This is what\nthe `listen()` method does: it puts the socket in a listening state, actively waiting for someone\nto connect.\n\nOn the other side, the `accept()` method is the function that establishes the connection\nwhen someone tries to connect to the socket. This means that the `accept()` method\nreturns a new connection object as a result. 
And you can use this connection object\nto read or write messages from or to the client.\nFor now, we are not doing anything with this connection object.\nBut we are going to use it in the next section.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n _ = connection;\n}\n```\n:::\n\n\n\n\nThis code example allows one single connection. In other words, the\nserver will wait for one incoming connection, and as soon as the\nserver is done with this first connection that it establishes, the\nprogram ends, and the server stops.\n\nThis is not the norm in the real world. Most people who write\na HTTP server like this usually put the `accept()` method\ninside a `while` (infinite) loop, where if a connection\nis created with `accept()`, a new thread of execution is created to deal with\nthis new connection and the client. That is, real-world HTTP Servers\nnormally rely on concurrency to handle multiple clients at the same time.\n\nWith this design, the server simply accepts the connection,\nand the whole process of dealing with the client, receiving\nthe HTTP Request, and sending the HTTP Response, all of this\nis done in the background, on a separate execution thread.\n\nSo, as soon as the server accepts the connection, and creates\nthe separate thread, the server goes back to what it was doing earlier,\nwhich is to wait indefinitely for a new connection to accept.\nWith this in mind, the code example above is a\nserver that serves only a single client, because the program
\nterminates as soon as the connection is accepted.\n\n\n\n### Reading the message from the client {#sec-read-http-message}\n\nNow that we have a connection established, i.e. the connection\nobject that we created through the `accept()` function, we can now\nuse this connection object to read any messages that the client\nsends to our server. But we can also use it to send messages back\nto the client.\n\nThe basic idea is: if we **write** any data into this connection object,\nthen we are sending data to the client, and if we **read** the data present in\nthis connection object, then we are reading any data that the\nclient sent to us through this connection object. So, just\nkeep this logic in mind. \"Read\" is for reading messages from the client,\nand \"write\" is for sending messages to the client.\n\nRemember from @sec-how-http-works that the first thing that we need to do is to read the HTTP Request\nsent by the client to our server, because it is the first message that happens\ninside the established connection, and, as a consequence, it is the first\nthing that we need to deal with.\n\nThat is why I'm going to create a new Zig module in this small project, named `request.zig`,\nto keep all functions related to the HTTP Request\ntogether. 
Then, I will create a new function named `read_request()` that will\nuse our connection object to read the message sent by the client,\nwhich is the HTTP Request.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn read_request(conn: Connection,\n buffer: []u8) !void {\n const reader = conn.stream.reader();\n _ = try reader.read(buffer);\n}\n```\n:::\n\n\n\n\n\nThis function accepts a slice object which behaves as a buffer.\nThe `read_request()` function reads the message sent into\nthe connection object, and saves this message into this buffer object that\nwe have provided as input.\n\nNotice that I'm using the connection object that we created to read\nthe message from the client. I first get a `reader` object from the stream that lives inside the\nconnection object. Then, I call the `read()` method of this `reader` object\nto effectively read and save the data sent by the client into the buffer object\nthat we provided as input. The `read()` method returns the number of bytes read, and I'm discarding this return value\nby assigning it to the underscore character (`_`),\nbecause it is not useful for us right now.\n\n\n\n## Looking at the current state of the program\n\n\nI think now is a good time to see how our program is currently working. 
Shall we?\nSo, the first thing I will do is to update the `main.zig` module in our small Zig project,\nso that the `main()` function calls this new `read_request()` function that we have just created.\nI will also add a print statement at the end of the `main()` function,\njust so that you can see what the HTTP Request that we have just loaded into the buffer object\nlooks like.\n\nAlso, I'm creating the buffer object in the `main()` function, which will be\nresponsible for storing the message sent by the client, and I'm also\nusing a `for` loop to initialize all elements of this buffer object to the number zero.\nThis is important to make sure that we don't have uninitialized memory in\nthis object, because reading uninitialized memory may cause undefined behaviour in our program.\n\nSince the `read_request()` function expects the buffer as a slice (`[]u8`),\nI am using the syntax `array[0..array.len]` to get a slice of this `buffer` object.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n try stdout.print(\"{s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\nNow, I'm going to execute this program, with the `run` command from the\n`zig` compiler. But remember, as we said earlier, as soon as I execute this program, it will\nhang indefinitely, because the program is waiting for a client to\nconnect to the server.\n\nMore specifically, the program will pause at the line\nwith the `accept()` call. 
As soon as a client tries to connect to the\nserver, the execution will \"unpause\", and the `accept()` function\nwill finally return the\nconnection object that we need, and the rest of the program\nwill run.\n\nYou can see that at @fig-print-zigrun1. The message `Server Addr: 127.0.0.1:3490`\nis printed to the console, and the program is now waiting for an incoming connection.\n\n![A screenshot of running the program](./../Figures/print-zigrun1.png){#fig-print-zigrun1}\n\n\nWe can finally try to connect to this server, and there are several ways we can do this.\nFor example, we could use the following Python script (which uses the third-party `requests` package):\n\n```python\nimport requests\nrequests.get(\"http://127.0.0.1:3490\")\n```\n\nOr, we could also open any web browser of our preference, and type\nthe URL `localhost:3490`. Note: `localhost` normally resolves to the\nIP `127.0.0.1`. When you press enter, and your web browser goes\nto this address, first, the browser will probably print a message\nsaying that \"this page isn't working\", and, then, it will\nprobably change to a new message saying that \"the site can't be\nreached\".\n\nYou get these \"error messages\" in the web browser because\nit got no response back from the server. In other words, when the web\nbrowser connected to our server, it did send the HTTP Request through the established connection.\nThen, the web browser was expecting to receive a HTTP Response back, but\nit got no response from the server (we haven't implemented the HTTP Response logic yet).\n\nBut that is okay. 
We have achieved the result that we wanted for now,\nwhich is to connect to the server, and see the HTTP Request\nthat was sent by the web browser (or by the Python script)\nto the server.\n\nIf you come back to the console that you left open\nwhen you executed the program, you will see that the\nprogram finished its execution, and a new message is\nprinted in the console, which is the actual HTTP Request\nmessage that was sent by the web browser to the server.\nYou can see this message at @fig-print-zigrun2.\n\n![A screenshot of the HTTP Request sent by the web browser](./../Figures/print-zigrun2.png){#fig-print-zigrun2}\n\n\n\n\n## Learning about Enums in Zig {#sec-enum}\n\nEnum structures are available in Zig through the `enum` keyword.\nAn enum (short for \"enumeration\") is a special structure that represents a group of constant values.\nSo, if you have a variable which can assume a short and known\nset of values, you might want to associate this variable with an enum structure,\nto make sure that this variable only assumes a value from this set.\n\nA classic example of an enum is the set of primary colors. If, for some reason, your program\nneeds to represent one of the primary colors, you can create an enum\nthat represents one of these colors.\nIn the example below, we are creating the enum `PrimaryColorRGB`, which\nrepresents a primary color from the RGB color system. By using this enum,
By using this enum,\nI am garanteed that the `acolor` object for example, will contain\none of these three values: `RED`, `GREEN` or `BLUE`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst PrimaryColorRGB = enum {\n RED, GREEN, BLUE\n};\nconst acolor = PrimaryColorRGB.RED;\n_ = acolor;\n```\n:::\n\n\n\n\nIf for some reason, my code tries to save in `acolor`,\na value that is not in this set, I will get an error message\nwarning me that a value such as \"MAGENTA\" do not exist\ninside the `PrimaryColorRGB` enum.\nThen I can easily fix my mistake.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst acolor = PrimaryColorRGB.MAGENTA;\n```\n:::\n\n\n\n\n```\ne1.zig:5:36: error: enum 'PrimaryColorRGB' has\n no member named 'MAGENTA':\n const acolor = PrimaryColorRGB.MAGENTA;\n ^~~~~~~\n```\n\nBehind the hood, enums in Zig work the same way that enums\nwork in C. Each enum value is essentially represented as an integer.\nThe first value in the set is represented as zero,\nthen, the second value is one, ... etc.\n\nOne thing that we are going to learn on the next section is that\nenums can have methods in them. Wait... What? This is amazing!\nYes, enums in Zig are similar to structs, and they can have\nprivate and public methods inside them.\n\n\n\n\n\n\n\n## Implementing the server - Part 2\n\nNow, on this section, I want to focus on parsing\nthe HTTP Request that we received from the client.\nHowever, to effectively parse a HTTP Request message, we first need to understand it's\nstructure.\nIn summary, a HTTP Request is a text message that is divided into 3 different\nsections (or parts):\n\n- The top-level header indicating the method of the HTTP Request, the URI, and the HTTP version used in the message.\n- A list of HTTP Headers.\n- The body of the HTTP Request.\n\n### The top-level header\n\nThe first line of text in a HTTP Request always come with the three most essential\ninformation about the request. 
These three key attributes of the HTTP Request\nare separated by a single space in this first line of the request.\nThe first piece of information is the HTTP method that is being\nused in the request; second, we have the URI to which this HTTP Request is being sent;\nand third, we have the version of the HTTP protocol that is being used in this HTTP Request.\n\nIn the snippet below, you can find an example of this first line in a HTTP Request.\nFirst, we have the HTTP method of this request (`GET`). Many programmers\nrefer to the URI component (`/users/list`) as the \"API endpoint\" to which the HTTP Request\nis being sent. In the context of this specific request, since it is a GET request,\nyou could also say that the URI component is the path to the resource we want to access,\nor, the path to the document (or the file) that we want to retrieve from the server.\n\n```\nGET /users/list HTTP/1.1\n```\n\nAlso, notice that this HTTP Request is using version 1.1 of the HTTP protocol,\nwhich is one of the most widely used versions of the protocol on the web.\n\n\n\n### The list of HTTP headers\n\nMost HTTP Requests also include a section of HTTP Headers,\nwhich is just a list of attributes or key-value pairs associated with this\nparticular request. This section always comes right after the \"top-level header\" of the request.\n\nFor our purpose in this chapter, which is to build a simple HTTP Server,\nwe are going to ignore this section of the HTTP Request, for simplicity.\nBut most HTTP servers that exist in the wild parse and use these\nHTTP headers to change the way that the server responds to the request\nsent by the client.\n\nFor example, many requests we encounter in the real world come with\na HTTP header called `Accept`. 
In this header, we find a list of [MIME types](https://en.wikipedia.org/wiki/Media_type)[^mime].\nThis list indicates the file formats that the client can read, or parse, or interpret.\nIn other words, you can also interpret this header as the client saying the following phrase\nto the server: \"Hey! Look, I can read only HTML documents, so please, send me back\na document in HTML format.\"\n\n[^mime]: <https://en.wikipedia.org/wiki/Media_type>.\n\nIf the HTTP server can read and use this `Accept` header, then the server can identify\nwhich is the best file format for the document to be sent to the client. Maybe the HTTP server has\nthe same document in multiple formats, for example, in JSON, in XML, in HTML and in PDF,\nbut the client can only understand documents in the HTML format. That is the purpose\nof this `Accept` header.\n\n\n### The body\n\nThe body comes after the list of HTTP headers, and it is an optional section of the HTTP Request, meaning that not\nall HTTP Requests will come with a body. For example, HTTP Requests that use the\nGET method usually do not come with a body.\n\nThis is because a GET request is used to request data, instead of sending it to the server.\nSo, the body section is more related to the POST method, which is a method that involves\nsending data to the server, to be processed and stored.\n\nSince we are going to support only the GET method in this project, it means that\nwe also do not need to care about the body of the request.\n\n\n\n### Creating the HTTP Method enum\n\nEvery HTTP Request comes with an explicit method. The method used in a HTTP Request\nis identified by one of these words:\n\n- GET;\n- POST;\n- OPTIONS;\n- PATCH;\n- DELETE;\n- and some other methods.\n\nEach HTTP method is used for a specific type of task. The POST method, for example, is normally\nused to post some data to the destination. 
In other words, it is used\nto send some data to the HTTP server, so that it can be processed and stored by the server.\n\nAs another example, the GET method is normally used to get content from the server.\nIn other words, we use this method whenever we want the server to send some\ncontent back to us. It can be any type of content: a web page,\na document file, or some data in JSON format.\n\nWhen a client sends a POST HTTP Request, the HTTP Response sent by the server normally has the sole purpose of\nletting the client know whether the server processed and stored the data successfully.\nIn contrast, when the server receives a GET HTTP Request, it sends the content\nthat the client asked for in the HTTP Response itself. This demonstrates that the method associated\nwith the HTTP Request significantly changes the dynamics and the roles that each party\nplays in the whole process.\n\nSince the HTTP method of the HTTP Request is identified by this very small and specific\nset of words, it makes sense to create an enum to represent an HTTP method.\nThis way, we can easily check if the HTTP Request we receive from the client uses an\nHTTP method that we currently support in our small HTTP server project.\n\nThe `Method` structure below represents this enumeration.\nNotice that, for now, only the GET HTTP method is included in this\nenumeration, because for the purpose of this chapter I want to\nimplement only the GET HTTP method.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET\n};\n```\n:::\n\n\n\n\n\nNow, I think we should add two methods to this enum structure. One is `is_supported()`,\na function that returns a boolean value indicating whether the input HTTP method is supported\nby our HTTP Server or not. 
The other is `init()`, a constructor function that takes a string as input\nand tries to convert it into a `Method` value.\n\n\nTo build these functions, I will use a feature from the Zig Standard Library called\n`StaticStringMap()`. This function allows us to create a simple map from strings to values of a given type,\nwhich, in our case, are the values of our `Method` enum.\nTo some extent, this specific structure from the standard library works almost like a \"hashtable\" structure,\nand it is optimized for small sets of keys, which is our case here.\nWe are going to talk more about hashtables in Zig at @sec-maps-hashtables.\n\nTo use this \"static string map\" structure, you have to import it from the `std.static_string_map` module\nof the Zig Standard Library. Just to make things shorter and easier to type, I am going to import this\nfunction under a shorter name (`Map`).\n\nWith `Map()` imported, we apply this function to the enum type\nthat we want to use as the value type of the resulting map. In our case here, it is the `Method` enum\nthat we declared in the last code example. Then, I call the `initComptime()` method with the\nmap contents, i.e. the list of key-value pairs that we are going to use.\n\nYou can see in the example below that I wrote this map using multiple anonymous struct literals.\nInside the first (or \"top-level\") struct literal, we have a list (or a sequence) of struct literals.\nEach struct literal in this list represents a separate key-value pair. The first element (or the key)\nin each key-value pair should always be a string value. 
The second element should\nbe a value from the enum type that you passed to the `Map()` function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Map = std.static_string_map.StaticStringMap;\nconst MethodMap = Map(Method).initComptime(.{\n .{ \"GET\", Method.GET },\n});\n```\n:::\n\n\n\n\nTherefore, the `MethodMap` object plays a role similar to a `std::map` object in C++, or\na `dict` object in Python, except that its contents are fixed at compile time. You can retrieve the enum value that\ncorresponds to a particular key by using the `get()` method of the map\nobject. This method returns an optional value, so the `get()` method\nreturns null whenever the key is not present in the map.\n\nWe can use this to our advantage to detect whether a particular HTTP method is\nsupported or not by our HTTP server. If the `get()` method returns null,\nit means that the method we provided was not found in the `MethodMap` object, and,\nas a consequence, this method is not supported by our HTTP server.\n\nThe `init()` method below takes a string value as input and simply passes this string value\nto the `get()` method of our `MethodMap` object. As a consequence, we get the enum value that corresponds\nto this input string.\n\nNotice in the example below that the `init()` method declares an error union (`!Method`) as its return type,\nalthough its body never actually returns an error. If the input string is not in the map, `get()` returns null,\nand the `?` method reaches `unreachable` (check out @sec-null-handling for more details), which causes a panic in safe build modes.\nSince `GET` is currently the only value in our `Method` enum\nstructure, whenever the `init()` method returns successfully, it returns the value `Method.GET`.\n\nAlso notice that in the `is_supported()` method, we use the optional value returned\nby the `get()` method of our `MethodMap` object. 
The if statement unwraps the optional value\nreturned by this method, and returns `true` if this optional value is not null.\nOtherwise, it simply returns `false`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Method = enum {\n GET,\n pub fn init(text: []const u8) !Method {\n return MethodMap.get(text).?;\n }\n pub fn is_supported(m: []const u8) bool {\n const method = MethodMap.get(m);\n if (method) |_| {\n return true;\n }\n return false;\n }\n};\n```\n:::\n\n\n\n\n\n\n\n\n\n\n### Writing the parse request function\n\nNow that we have created the enum that represents our HTTP method,\nwe can start writing the function responsible for\nactually parsing the HTTP Request.\n\nThe first thing we can do is write a struct to represent the HTTP Request.\nTake the `Request` struct below as an example. It contains the three\nessential pieces of information from the \"top-level\" header (i.e. the first line)\nof the HTTP Request.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst Request = struct {\n method: Method,\n version: []const u8,\n uri: []const u8,\n pub fn init(method: Method,\n uri: []const u8,\n version: []const u8) Request {\n return Request{\n .method = method,\n .uri = uri,\n .version = version,\n };\n }\n};\n```\n:::\n\n\n\n\n\nThe `parse_request()` function should receive a string as input. This input string\ncontains the entire HTTP Request message, and the parsing function should\nread and understand the individual parts of this message.\n\nNow, remember that for the purpose of this chapter, we care only about the first\nline of this message, which contains the \"top-level header\", that is, the three essential attributes of the HTTP Request:\nthe HTTP method used, the URI and the HTTP version.\n\nNotice that I use the function `indexOfScalar()` in `parse_request()`. This function from the\nZig Standard Library returns the index of the first occurrence of the scalar value that we provide\nin a string. 
In this case, I'm looking for the first occurrence of the newline character (`\\n`),\nbecause, once again, we care only about the first line of the HTTP Request message.\nThis is the line that holds the three pieces of information that we want to parse\n(the HTTP method, the URI and the HTTP version).\n\nTherefore, we are using this `indexOfScalar()` function\nto limit our parsing process to the first line of the message.\nIt is also worth mentioning that the `indexOfScalar()` function returns an optional value.\nThat is why I use the `orelse` keyword to provide an alternative value (`text.len`) in case\nthe function returns null, i.e. when no newline character is found. In that case, the entire input is treated as the first line.\n\nSince these three attributes are separated by a single space, we\ncan use the function `splitScalar()` from the Zig Standard Library to split\nthe input string into sections at every position where\na space appears. In other words, this `splitScalar()` function is similar\nto the `split()` method in Python, the `std::getline()` function (with a custom delimiter) in C++,\nor the `strtok()` function in C.\n\nWhen you use this `splitScalar()` function, you get an iterator as the result.\nThis iterator has a `next()` method that you can use to advance the iterator\nto the next section of the split string.\nNote that, when you use `next()`, the method not only advances the iterator,\nbut it also returns a slice to that section of the split\nstring as its result.\n\nNow, if you want to get a slice to the next section of the split\nstring without advancing the iterator, you can use\nthe `peek()` method. 
Both `next()` and `peek()` return an optional value, which is\nwhy I use the `?` method to unwrap these optional values.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn parse_request(text: []u8) Request {\n const line_index = std.mem.indexOfScalar(\n u8, text, '\\n'\n ) orelse text.len;\n var iterator = std.mem.splitScalar(\n u8, text[0..line_index], ' '\n );\n const method = try Method.init(iterator.next().?);\n const uri = iterator.next().?;\n const version = iterator.next().?;\n const request = Request.init(method, uri, version);\n return request;\n}\n```\n:::\n\n\n\n\n\nAs I described at @sec-zig-strings, strings in Zig are simply arrays of bytes.\nSo, you will find many useful functions to work directly with strings\ninside the `mem` module of the Zig Standard Library.\nWe have already described some of these functions\nat @sec-strings-useful-funs.\n\n\n\n### Using the parse request function\n\nNow that we have written the function responsible for parsing the HTTP Request,\nwe can add the function call to `parse_request()` in\nthe `main()` function of our program.\n\nAfter that, it is a good idea to test our program once again.\nI execute this program again with the `run` command of the `zig` compiler,\nthen I use my web browser to connect to the server through the URL `localhost:3490`. Finally,\nthe resulting `Request` object is printed to the console.\n\nA quick observation: since I have used the `any` format specifier in the\nprint statement, the data members `version` and `uri` of the `Request`\nstruct were printed as raw integer values. String data being printed\nas integer values is common in Zig; remember that these integer values are just the decimal representation of\nthe bytes that form the string in question.\n\nIn the result below, the decimal values 72, 84, 84, 80, 47, 49, 46 and 49\nare the bytes that form the text \"HTTP/1.1\", while the final value 13 is a carriage return character (`\\r`), which precedes the newline at the end of the request line. 
And the integer 47 is the decimal value of\nthe character `/`, which is the URI in this request.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(\n connection, buffer[0..buffer.len]\n );\n const request = Request.parse_request(buffer[0..buffer.len]);\n try stdout.print(\"{any}\\n\", .{request});\n}\n```\n:::\n\n\n\n\n```\nrequest.Request{\n .method = request.Method.GET,\n .version = {72, 84, 84, 80, 47, 49, 46, 49, 13},\n .uri = {47}\n}\n```\n\n\n\n### Sending the HTTP Response to the client\n\nIn this last part, we are going to write the logic responsible for\nsending the HTTP Response from the server to the client. To keep things\nsimple, the server in this project will send just a simple web page\ncontaining the text \"Hello world\".\n\nFirst, I create a new Zig module in the project, named `response.zig`.\nIn this module, I will declare just two functions. Each function\ncorresponds to a specific status code in the HTTP Response.\nThe `send_200()` function will send an HTTP Response with status code 200\n(which means \"OK\", i.e. success) to the client, while the `send_404()` function sends a response\nwith status code 404 (which means \"Not Found\").\n\nThis is definitely not the most ergonomic or robust way of handling the\nHTTP Response, but it works for our case here. 
We are just building toy projects\nin this book after all, so the source code that we write does not need to be perfect.\nIt just needs to work!\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Connection = std.net.Server.Connection;\npub fn send_200(conn: Connection) !void {\n const message = (\n \"HTTP/1.1 200 OK\\nContent-Length: 48\"\n ++ \"\\nContent-Type: text/html\\n\"\n ++ \"Connection: Closed\\n\\n\"\n ++ \"
\"\n );\n _ = try conn.stream.write(message);\n}\n```\n:::\n\n\n\n\nNotice that both functions receive the connection object as input, and\nuse the `write()` method to write the HTTP Response message directly\ninto this communication channel. As a result, the party on the other\nside of the connection (i.e. the client) receives this message.\n\nMost real-world HTTP Servers have a single function (or a single struct) to handle\nthe response. It takes the already parsed HTTP Request as input, and then builds\nthe HTTP Response piece by piece, before sending it over the connection.\n\nThey also usually have a specialized struct to represent an HTTP Response, and\nmany methods used to build each part or component of the response object.\nTake the `Response` struct created by the JavaScript runtime Bun as an example.\nYou can find this struct in the [`response.zig` module](https://github.com/oven-sh/bun/blob/main/src/bun.js/webcore/response.zig)[^bun-resp]\nin their GitHub project.\n\n[^bun-resp]: .\n\n\n## The end result\n\nWe can now update our `main()` function once again to incorporate our new\nfunctions from the `response.zig` module. First, I import this module\ninto our `main.zig` module, and then I add the function calls to `send_200()`\nand `send_404()`.\n\nNotice that I'm using if statements to decide which \"response function\" to call,\nbased mainly on the URI present in the HTTP Request. If the user asks for\ncontent (or a document) that is not present on our server, we should respond\nwith a 404 status code. But since we have just a simple HTTP server, with no\nreal documents to send, we can just check whether the URI is the root path (`/`)\nor not to decide which function to call.\n\nAlso, notice that I'm using the function `std.mem.eql()` from the Zig Standard Library\nto check whether the string in `uri` is equal to the string `\"/\"`. 
We have\ndescribed this function already at @sec-strings-useful-funs, so come back to\nthat section if you are not yet familiar with this function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SocketConf = @import(\"config.zig\");\nconst Request = @import(\"request.zig\");\nconst Response = @import(\"response.zig\");\nconst Method = Request.Method;\nconst stdout = std.io.getStdOut().writer();\n\npub fn main() !void {\n const socket = try SocketConf.Socket.init();\n try stdout.print(\"Server Addr: {any}\\n\", .{socket._address});\n var server = try socket._address.listen(.{});\n const connection = try server.accept();\n\n var buffer: [1000]u8 = undefined;\n for (0..buffer.len) |i| {\n buffer[i] = 0;\n }\n try Request.read_request(connection, buffer[0..buffer.len]);\n const request = Request.parse_request(buffer[0..buffer.len]);\n if (request.method == Method.GET) {\n if (std.mem.eql(u8, request.uri, \"/\")) {\n try Response.send_200(connection);\n } else {\n try Response.send_404(connection);\n }\n }\n}\n```\n:::\n\n\n\n\n\nNow that we have adjusted our `main()` function, I can execute our program and\nsee the effects of these last changes. First, I execute the program once again with the\n`run` command of the `zig` compiler. The program will hang, waiting for a client to connect.\n\nThen, I open my web browser and try to connect to the server again, using the URL `localhost:3490`.\nThis time, instead of getting an error message from the browser, you will see the message\n\"Hello World\" displayed in your web browser, because this time the server sent the HTTP Response\nsuccessfully to the web browser, as demonstrated by @fig-print-zigrun3.\n\n\n![The Hello World message sent in the HTTP Response](./../Figures/print-zigrun3.png){#fig-print-zigrun3}\n\n",
+ "supporting": [
+ "04-http-server_files"
+ ],
"filters": [
"rmarkdown/pagebreak.lua"
],
diff --git a/_freeze/Chapters/05-pointers/execute-results/html.json b/_freeze/Chapters/05-pointers/execute-results/html.json
index 92e19a1..0d1a409 100644
--- a/_freeze/Chapters/05-pointers/execute-results/html.json
+++ b/_freeze/Chapters/05-pointers/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "729f27787112d9c078c937a8dd01693c",
+ "hash": "7dd5223398b22bf134089b564d7827c0",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Pointers and Optionals {#sec-pointer}\n\nOn our next project we are going to build a HTTP server from scratch.\nBut in order to do that, we need to learn more about pointers and how they work in Zig.\nPointers in Zig are similar to pointers in C. But they come with some extra advantages in Zig.\n\nA pointer is an object that contains a memory address. This memory address is the address where\na particular value is stored in memory. It can be any value. Most of the times,\nit is a value that comes from another object (or variable) present in our code.\n\nIn the example below, I'm creating two objects (`number` and `pointer`).\nThe `pointer` object contains the memory address where the value of the `number` object\n(the number 5) is stored. So, that is a pointer in a nutshell. It is a memory\naddress that points to a particular existing value in the memory. You could\nalso say, that, the `pointer` object points to the memory address where the `number` object is\nstored.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number: u8 = 5;\nconst pointer = &number;\n_ = pointer;\n```\n:::\n\n\n\n\nWe create a pointer object in Zig by using the `&` operator. When you put this operator\nbefore the name of an existing object, you get the memory address of this object as result.\nWhen you store this memory address inside a new object, this new object becomes a pointer object.\nBecause it stores a memory address.\n\nPeople mostly use pointers as an alternative way to access a particular value.\nFor example, I can use the `pointer` object to access the value stored by\nthe `number` object. This operation of accessing the value that the\npointer \"points to\" is normally called of *dereferencing the pointer*.\nWe can dereference a pointer in Zig by using the `*` method of the pointer object. 
Like in the example\nbelow, where we take the number 5 pointed by the `pointer` object,\nand double it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number: u8 = 5;\nconst pointer = &number;\nconst doubled = 2 * pointer.*;\nstd.debug.print(\"{d}\\n\", .{doubled});\n```\n:::\n\n\n\n\n```\n10\n```\n\nThis syntax to dereference the pointer is nice. Because we can easily chain it with\nmethods of the value pointed by the pointer. We can use the `User` struct that we have\ncreated at @sec-structs-and-oop as an example. If you comeback to that section,\nyou will see that this struct have a method named `print_name()`.\n\nSo, for example, if we have an user object, and a pointer that points to this user object,\nwe can use the pointer to access this user object, and, at the same time, call the method `print_name()`\non it, by chaining the dereference method (`*`) with the `print_name()` method. Like in the\nexample below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst u = User.init(1, \"pedro\", \"email@gmail.com\");\nconst pointer = &u;\ntry pointer.*.print_name();\n```\n:::\n\n\n\n\n```\npedro\n```\n\nWe can also use pointers to effectively alter the value of an object.\nFor example, I could use the `pointer` object to set\nthe value of the object `number` to 6, like in the example below.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\n\nTherefore, as I mentioned earlier, people use pointers as an alternative way to access a particular value.\nAnd they use it especially when they do not want to \"move\" these values around. There are situations where,\nyou want to access a particular value in a different scope (i.e. 
a different location) of your code,\nbut you do not want to \"move\" this value to this new scope (or location) that you are in.\n\nThis matters especially if this value is big in size. Because if it is, then,\nmoving this value becomes an expensive operation to do.\nThe computer will have to spend a considerable amount of time\ncopying this value to this new location.\n\nTherefore, many programmers prefer to avoid this heavy operation of copying the value\nto the new location, by accessing this value through pointers.\nWe are going to talk more about this \"moving operation\" over the next sections.\nFor now, just keep in mind that avoiding this \"move operation\" is\none of main reasons why pointers are used in programming languages.\n\n\n\n\n\n## Constant objects vs variable objects {#sec-pointer-var}\n\nYou can have a pointer that points to a constant object, or, a pointer that points to a variable object.\nBut regardless of who this pointer is, a pointer **must always respect the characteristics of the object that it points to**.\nAs a consequence, if the pointer points to a constant object, then, you cannot use this pointer\nto change the value that it points to. Because it points to a value that is constant. As we discussed at @sec-assignments, you cannot\nchange a value that is constant.\n\nFor example, if I have a `number` object, which is constant, I cannot execute\nthe expression below where I'm trying to change the value of `number` to 6 through\nthe `pointer` object. 
As demonstrated below, when you try to do something\nlike that, you get a compile time error:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = 5;\nconst pointer = &number;\npointer.* = 6;\n```\n:::\n\n\n\n\n```\np.zig:6:12: error: cannot assign to constant\n pointer.* = 6;\n```\n\nIf I change the `number` object to be a variable object, by introducing the `var` keyword,\nthen, I can succesfully change the value of this object through a pointer, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\nYou can see this relationship between \"constant versus variable\" on the data type of\nyour pointer object. In other words, the data type of a pointer object already gives you\nsome clues about whether the value that it points to is constant or not.\n\nWhen a pointer object points to a constant value, then, this pointer have a data type `*const T`,\nwhich means \"a pointer to a constant value of type `T`\".\nIn contrast, if the pointer points to a variable value, then, the type of the pointer is usually `*T`, which is\nsimply \"a pointer to a value of type `T`\".\nHence, whenever you see a pointer object whose data type is in the format `*const T`, then,\nyou know that you cannot use this pointer to change the value that it points to.\nBecause this pointer points to a constant value of type `T`.\n\n\nWe have talked about the value pointed by the pointer being constant or not,\nand the consequences that arises from it. But, what about the pointer object itself? I mean, what happens\nif the pointer object itself is constant or not? Think about it.\nWe can have a constant pointer that points to a constant value.\nBut we can also have a variable pointer that points to a constant value. 
And vice-versa.\n\nUntil this point, the `pointer` object was always constant,\nbut what this means for us? What is the consequence of the\n`pointer` object being constant? The consequence is that\nwe cannot change the pointer object, because it is constant. We can use the\npointer object in multiple ways, but we cannot change the\nmemory address that is inside this pointer object.\n\nHowever, if we mark the `pointer` object as a variable object,\nthen, we can change the memory address pointed by this `pointer` object.\nThe example below demonstrates that. Notice that the object pointed\nby the `pointer` object changes from `c1` to `c2`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c1: u8 = 5;\nconst c2: u8 = 6;\nvar pointer = &c1;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\npointer = &c2;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\n```\n:::\n\n\n\n\n```\n5\n6\n```\n\nThus, by setting the `pointer` object to a `var` or `const` object,\nyou specify if the memory address contained in this pointer object can change or not\nin your program. On the other side, you can change the value pointed by the pointer,\nif, and only if this value is stored in a variable object. If this value\nis in a constant object, then, you cannot change this value through a pointer.\n\n\n## Types of pointer\n\nIn Zig, there are two types of pointers [@zigdocs], which are:\n\n- single-item pointer (`*`);\n- many-item pointer (`[*]`);\n\n\nSingle-item pointer objects are objects whose data types are in the format `*T`.\nSo, for example, if an object have a data type `*u32`, it means that, this\nobject contains a single-item pointer that points to an unsigned 32-bit integer value.\nAs another example, if an object have type `*User`, then, it contains\na single-item pointer to an `User` value.\n\nIn contrast, many-item pointers are objects whose data types are in the format `[*]T`.\nNotice that the star symbol (`*`) is now inside a pair of brackets (`[]`). 
If the star\nsymbol is inside a pair of brackets, you know that this object is a many-item pointer.\n\nWhen you apply the `&` operator over an object, you will always get a single-item pointer.\nMany-item pointers are more of a \"internal type\" of the language, more closely\nrelated to slices. So, when you deliberately create a pointer with the `&` operator,\nyou always get a single-item pointer as result.\n\n\n\n## Pointer arithmethic\n\nPointer arithmethic is available in Zig, and they work the same way they work in C.\nWhen you have a pointer that points to an array, the pointer usually points to\nthe first element in the array, and you can use pointer arithmethic to\nadvance this pointer and access the other elements in the array.\n\n\nNotice in the example below, that initially, the `ptr` object was pointing\nto the first element in the array `ar`. But then, I started to walk through the array, by advancing\nthe pointer with simple pointer arithmethic.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nvar ptr = &ar;\ntry stdout.print(\"{d}\\n\", .{ptr.*});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr.*});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr.*});\n```\n:::\n\n\n\n\n```\n1\n2\n3\n```\n\nAlthough you can create a pointer to an array like that, and\nstart to walk through this array by using pointer arithmethic,\nin Zig, we prefer to use slices, which were presented at @sec-arrays.\n\nBehind the hood, slices already are pointers,\nand they also come with the `len` property, which indicates\nhow many elements are in the slice. This is good because the `zig` compiler\ncan use it to check for potential buffer overflows, and other problems like that.\n\nAlso, you don't need to use pointer arithmethic to walk through the elements\nof a slice. 
You can simply use the `slice[index]` syntax to directly access\nany element you want in the slice.\nAs I mentioned at @sec-arrays, you can get a slice from an array by using\na range selector inside brackets. In the example below, I'm creating\na slice (`sl`) that covers the entire `ar` array. I can access any\nelement of `ar` from this slice, and, the slice itself already is a pointer\nbehind the hood.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\n\n## Optionals and Optional Pointers\n\nLet's talk about optionals and how they relate to pointers in Zig.\nBy default, objects in Zig are **non-nullable**. This means that, in Zig,\nyou can safely assume that any object in your source code is not null.\n\nThis is a powerful feature of Zig when you compare it to the developer experience in C.\nBecause in C, any object can be null at any point, and, as consequence, a pointer in C\nmight point to a null value. This is a common source of undefined behaviour in C.\nWhen programmers work with pointers in C, they have to constantly check if\ntheir pointers are pointing to null values or not.\n\nIf for some reason, your Zig code produces a null value somewhere, and, this null\nvalue ends up in an object that is non-nullable, a runtime error is always\nraised by your Zig program. Take the program below as an example.\nThe `zig` compiler can see the `null` value at compile time, and, as result,\nit raises a compile time error. 
But, if a `null` value is raised during\nruntime, a runtime error is also raised by the Zig program, with a\n\"attempt to use null value\" message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nnumber = null;\n```\n:::\n\n\n\n```\np5.zig:5:14: error: expected type 'u8',\n found '@TypeOf(null)'\n number = null;\n ^~~~\n```\n\n\nYou don't get this type of safety in C.\nIn C, you don't get warnings or errors about null values being produced in your program.\nIf for some reason, your code produces a null value in C, most of the times, you end up getting a segmentation fault error\nas result, which can mean many things.\nThat is why programmers have to constantly check for null values in C.\n\nPointers in Zig are also, by default, **non-nullable**. This is another amazing\nfeature in Zig. So, you can safely assume that any pointer that you create in\nyour Zig code is pointing to a non-null value.\nTherefore, you don't have this heavy work of checking if the pointers you create\nin Zig are pointing to a null value.\n\n\n### What are optionals?\n\nOk, we know now that all objects are non-nullable by default in Zig.\nBut what if we actually need to use an object that might receive a null value?\nHere is where optionals come in.\n\nAn optional object in Zig is an object that can be null.\nTo mark an object as optional, we use the `?` operator. When you put\nthis `?` operator right before the data type of an object, you transform\nthis data type into an optional data type, and the object becomes an optional object.\n\nTake the snippet below as an example. We are creating a new variable object\ncalled `num`. 
This object have the data type `?i32`, which means that,\nthis object contains either a signed 32-bit integer (`i32`), or, a null value.\nBoth alternatives are valid values to the `num` object.\nThat is why, I can actually change the value of this object to null, and,\nno errors are raised by the `zig` compiler, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\nnum = null;\n```\n:::\n\n\n\n\n### Optional pointers\n\nYou can also mark a pointer object as an optional pointer, meaning that,\nthis object contains either a null value, or, a pointer that points to a value.\nWhen you mark a pointer as optional, the data type of this pointer object\nbecomes `?*const T` or `?*T`, depending if the value pointed by the pointer\nis a constant value or not. The `?` identifies the object as optional, while\nthe `*` identifies it as a pointer object.\n\nIn the example below, we are creating a variable object named `num`, and an\noptional pointer object named `ptr`. Notice that the data type of the object\n`ptr` indicates that it is either a null value, or a pointer to an `i32` value.\nAlso, notice that the pointer object (`ptr`) can be marked as optional, even if\nthe object `num` is not optional.\n\nWhat this code tells us is that, the `num` variable will never contain a null value.\nThis variable will always contain a valid `i32` value. But in contrast, the `ptr` object might contain either a null\nvalue, or, a pointer to an `i32` value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: i32 = 5;\nvar ptr: ?*i32 = #\nptr = null;\nnum = 6;\n```\n:::\n\n\n\n\nBut what happens if we turn the table, and mark the `num` object as optional,\ninstead of the pointer object. If we do that, then, the pointer object is\nnot optional anymore. It would be a similar (although different) result. Because then, we would have\na pointer to an optional value. 
In other words, a pointer to a value that is either a\nnull value, or, a not-null value.\n\nIn the example below, we are recreating this idea. Now, the `ptr` object\nhave a data type of `*?i32`, instead of `?*i32`. Notice that the `*` symbol comes before of `?`\nthis time. So now, we have a pointer that points to a value that is either null\n, or, a signed 32-bit integer.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\n// ptr have type `*?i32`, instead of `?*i32`.\nconst ptr = &num;\n_ = ptr;\n```\n:::\n\n\n\n\n\n### Null handling in optionals {#sec-null-handling}\n\nWhen you have an optional object in your Zig code, you have to explicitly handle\nthe possibility of this object being null. It is like error-handling with `try` and `catch`.\nIn Zig you also have to handle null values like if they were a type of error.\n\nWe can do that, by using either:\n\n- an if statement, like you would do in C.\n- the `orelse` keyword.\n- unwrap the optional value with the `?` method.\n\nWhen you use an if statement, you use a pair of pipes\nto unwrap the optional value, and use this \"unwrapped object\"\ninside the if block.\nUsing the example below as a reference, if the object `num` is null,\nthen, the code inside the if statement is not executed. Otherwise,\nthe if statement will unwrap the object `num` into the `not_null_num`\nobject. This `not_null_num` object is garanteed to be not null inside\nthe scope of the if statement.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst num: ?i32 = 5;\nif (num) |not_null_num| {\n try stdout.print(\"{d}\\n\", .{not_null_num});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n5\n```\n\n\n:::\n:::\n\n\n\n\nNow, the `orelse` keyword behaves like a binary operator. 
You connect two expressions with this keyword.\nOn the left side of `orelse`, you provide the expression that might result\nin a null value, and on the right side of `orelse`, you provide another expression\nthat will not result in a null value.\n\nThe idea behind the `orelse` keyword is: if the expression on the left side\nresult in a not-null value, then, this not-null value is used. However,\nif this expression on the left side result in a null value, then, the value\nof the expression on the right side is used instead.\n\nLooking at the example below, since the `x` object is currently null, the\n`orelse` decided to use the alternative value, which is the number 15.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x: ?i32 = null;\nconst dbl = (x orelse 15) * 2;\ntry stdout.print(\"{d}\\n\", .{dbl});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n30\n```\n\n\n:::\n:::\n\n\n\n\nYou can use the if statement or the `orelse` keyword, when you want to\nsolve (or deal with) this null value. However, if there is no clear solution\nto this null value, and the most logic and sane path is to simply panic\nand raise a loud error in your program when this null value is encountered,\nyou can use the `?` method of your optional object.\n\nIn essence, when you use this `?` method, the optional object is unwraped.\nIf a not-null value is found in the optional object, then, this not-null value is used.\nOtherwise, the `unreachable` keyword is used. 
You can read more about this\n[`unreacheable` keyword at the official documentation](https://ziglang.org/documentation/master/#unreachable)[^un-docs].\nBut in essence, when you build your Zig source code using the build modes `ReleaseSafe` or `Debug`, this\n`unreacheable` keyword causes the program to panic and raise an error during runtime,\nlike in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn return_null(n: i32) ?i32 {\n if (n == 5) return null;\n return n;\n}\n\npub fn main() !void {\n const x: i32 = 5;\n const y: ?i32 = return_null(x);\n try stdout.print(\"{d}\\n\", .{y.?});\n}\n```\n:::\n\n\n\n\n```\nthread 12767 panic: attempt to use null value\np7.zig:12:34: 0x103419d in main (p7):\n try stdout.print(\"{d}\\n\", .{y.?});\n ^\n```\n\n\n[^un-docs]: .\n\n\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Pointers and Optionals {#sec-pointer}\n\nOn our next project we are going to build an HTTP server from scratch.\nBut in order to do that, we need to learn more about pointers and how they work in Zig.\nPointers in Zig are similar to pointers in C, but they come with some extra advantages in Zig.\n\nA pointer is an object that contains a memory address. This memory address is the address where\na particular value is stored in memory. It can be any value. Most of the time,\nit is a value that comes from another object (or variable) present in our code.\n\nIn the example below, I'm creating two objects (`number` and `pointer`).\nThe `pointer` object contains the memory address where the value of the `number` object\n(the number 5) is stored. So, that is a pointer in a nutshell. It is a memory\naddress that points to a particular existing value in memory. You could\nalso say that the `pointer` object points to the memory address where the `number` object is\nstored.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number: u8 = 5;\nconst pointer = &number;\n_ = pointer;\n```\n:::\n\n\n\n\nWe create a pointer object in Zig by using the `&` operator. When you put this operator\nbefore the name of an existing object, you get the memory address of this object as a result.\nWhen you store this memory address inside a new object, this new object becomes a pointer object,\nbecause it stores a memory address.\n\nPeople mostly use pointers as an alternative way to access a particular value.\nFor example, I can use the `pointer` object to access the value stored by\nthe `number` object. This operation of accessing the value that the\npointer \"points to\" is normally called *dereferencing the pointer*.\nWe can dereference a pointer in Zig by using the `*` method of the pointer object. 
For instance, in the example\nbelow, we take the number 5 pointed to by the `pointer` object,\nand double it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number: u8 = 5;\nconst pointer = &number;\nconst doubled = 2 * pointer.*;\nstd.debug.print(\"{d}\\n\", .{doubled});\n```\n:::\n\n\n\n\n```\n10\n```\n\nThis syntax to dereference the pointer is nice, because we can easily chain it with\nmethods of the value pointed to by the pointer. We can use the `User` struct that we\ncreated in @sec-structs-and-oop as an example. If you come back to that section,\nyou will see that this struct has a method named `print_name()`.\n\nSo, for example, if we have a user object, and a pointer that points to this user object,\nwe can use the pointer to access this user object and, at the same time, call the method `print_name()`\non it, by chaining the dereference method (`*`) with the `print_name()` method, as in the\nexample below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst u = User.init(1, \"pedro\", \"email@gmail.com\");\nconst pointer = &u;\ntry pointer.*.print_name();\n```\n:::\n\n\n\n\n```\npedro\n```\n\nWe can also use pointers to effectively alter the value of an object.\nFor example, I could use the `pointer` object to set\nthe value of the object `number` to 6, as shown in the example below.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\n\nTherefore, as I mentioned earlier, people use pointers as an alternative way to access a particular value,\nespecially when they do not want to \"move\" these values around. There are situations where\nyou want to access a particular value in a different scope (i.e. 
a different location) of your code,\nbut you do not want to \"move\" this value to this new scope (or location) that you are in.\n\nThis matters especially if this value is big in size, because then\nmoving this value becomes an expensive operation.\nThe computer will have to spend a considerable amount of time\ncopying this value to this new location.\n\nTherefore, many programmers prefer to avoid this heavy operation of copying the value\nto the new location by accessing this value through pointers.\nWe are going to talk more about this \"moving operation\" over the next sections.\nFor now, just keep in mind that avoiding this \"move operation\" is\none of the main reasons why pointers are used in programming languages.\n\n\n\n\n\n## Constant objects vs variable objects {#sec-pointer-var}\n\nYou can have a pointer that points to a constant object, or a pointer that points to a variable object.\nBut in either case, a pointer **must always respect the characteristics of the object that it points to**.\nAs a consequence, if the pointer points to a constant object, then you cannot use this pointer\nto change the value that it points to, because that value is constant. As we discussed in @sec-assignments, you cannot\nchange a value that is constant.\n\nFor example, if I have a `number` object, which is constant, I cannot execute\nthe expression below where I'm trying to change the value of `number` to 6 through\nthe `pointer` object. 
As demonstrated below, when you try to do something\nlike that, you get a compile-time error:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = 5;\nconst pointer = &number;\npointer.* = 6;\n```\n:::\n\n\n\n\n```\np.zig:6:12: error: cannot assign to constant\n pointer.* = 6;\n```\n\nIf I change the `number` object to be a variable object by introducing the `var` keyword,\nthen I can successfully change the value of this object through a pointer, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nconst pointer = &number;\npointer.* = 6;\ntry stdout.print(\"{d}\\n\", .{number});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n6\n```\n\n\n:::\n:::\n\n\n\n\nYou can see this relationship between \"constant versus variable\" in the data type of\nyour pointer object. In other words, the data type of a pointer object already gives you\nsome clues about whether the value that it points to is constant or not.\n\nWhen a pointer object points to a constant value, then this pointer has the data type `*const T`,\nwhich means \"a pointer to a constant value of type `T`\".\nIn contrast, if the pointer points to a variable value, then the type of the pointer is usually `*T`, which is\nsimply \"a pointer to a value of type `T`\".\nHence, whenever you see a pointer object whose data type is in the format `*const T`,\nyou know that you cannot use this pointer to change the value that it points to,\nbecause this pointer points to a constant value of type `T`.\n\n\nWe have talked about the value pointed to by the pointer being constant or not,\nand the consequences that arise from it. But what about the pointer object itself? In other words, what happens\nif the pointer object itself is constant or not? Think about it.\nWe can have a constant pointer that points to a constant value.\nBut we can also have a variable pointer that points to a constant value. 
And vice versa.\n\nUntil this point, the `pointer` object was always constant,\nbut what does this mean for us? What is the consequence of the\n`pointer` object being constant? The consequence is that\nwe cannot change the pointer object, because it is constant. We can use the\npointer object in multiple ways, but we cannot change the\nmemory address that is inside this pointer object.\n\nHowever, if we mark the `pointer` object as a variable object,\nthen we can change the memory address stored in this `pointer` object.\nThe example below demonstrates that. Notice that the object pointed to\nby the `pointer` object changes from `c1` to `c2`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c1: u8 = 5;\nconst c2: u8 = 6;\nvar pointer = &c1;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\npointer = &c2;\ntry stdout.print(\"{d}\\n\", .{pointer.*});\n```\n:::\n\n\n\n\n```\n5\n6\n```\n\nThus, by declaring the `pointer` object with `var` or `const`,\nyou specify whether the memory address contained in this pointer object can change or not\nin your program. On the other hand, you can change the value pointed to by the pointer\nif, and only if, this value is stored in a variable object. If this value\nis in a constant object, then you cannot change this value through a pointer.\n\n\n## Types of pointer\n\nIn Zig, there are two types of pointers [@zigdocs], which are:\n\n- single-item pointer (`*`);\n- many-item pointer (`[*]`);\n\n\nSingle-item pointer objects are objects whose data types are in the format `*T`.\nSo, for example, if an object has the data type `*u32`, it means that this\nobject contains a single-item pointer that points to an unsigned 32-bit integer value.\nAs another example, if an object has type `*User`, then it contains\na single-item pointer to a `User` value.\n\nIn contrast, many-item pointers are objects whose data types are in the format `[*]T`.\nNotice that the star symbol (`*`) is now inside a pair of brackets (`[]`). 
If the star\nsymbol is inside a pair of brackets, you know that this object is a many-item pointer.\n\nWhen you apply the `&` operator over an object, you will always get a single-item pointer.\nMany-item pointers are more of an \"internal type\" of the language, more closely\nrelated to slices. So, when you deliberately create a pointer with the `&` operator,\nyou always get a single-item pointer as a result.\n\n\n\n## Pointer arithmetic\n\nPointer arithmetic is available in Zig, and it works in a similar way to pointer arithmetic in C.\nWhen you have a pointer that points to an array, the pointer usually points to\nthe first element in the array, and you can use pointer arithmetic to\nadvance this pointer and access the other elements in the array.\nIn Zig, pointer arithmetic is done on many-item pointers (`[*]T`), so we first store the address\nof the array in a many-item pointer, which Zig allows through coercion.\n\n\nNotice in the example below that the `ptr` object initially points\nto the first element in the array `ar`. Then, I walk through the array by advancing\nthe pointer with simple pointer arithmetic. Since `ptr` is a many-item pointer, we read the\nelement it currently points to with `ptr[0]`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nvar ptr: [*]const i32 = &ar;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\nptr += 1;\ntry stdout.print(\"{d}\\n\", .{ptr[0]});\n```\n:::\n\n\n\n\n```\n1\n2\n3\n```\n\nAlthough you can create a pointer to an array like that, and\nstart to walk through this array by using pointer arithmetic,\nin Zig we prefer to use slices, which were presented in @sec-arrays.\n\nUnder the hood, slices are already pointers,\nand they also come with the `len` property, which indicates\nhow many elements are in the slice. 
This is good because the `zig` compiler\ncan use it to check for potential buffer overflows and other problems like that.\n\nAlso, you don't need to use pointer arithmetic to walk through the elements\nof a slice. 
You can simply use the `slice[index]` syntax to directly access\nany element you want in the slice.\nAs I mentioned in @sec-arrays, you can get a slice from an array by using\na range selector inside brackets. In the example below, I'm creating\na slice (`sl`) that covers the entire `ar` array. I can access any\nelement of `ar` from this slice, and the slice itself is already a pointer\nunder the hood.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst ar = [_]i32{1,2,3,4};\nconst sl = ar[0..ar.len];\n_ = sl;\n```\n:::\n\n\n\n\n\n## Optionals and Optional Pointers\n\nLet's talk about optionals and how they relate to pointers in Zig.\nBy default, objects in Zig are **non-nullable**. This means that, in Zig,\nyou can safely assume that any object in your source code is not null.\n\nThis is a powerful feature of Zig when you compare it to the developer experience in C,\nbecause in C any pointer can be null at any point. Dereferencing a null pointer\nis a common source of undefined behaviour in C.\nWhen programmers work with pointers in C, they have to constantly check if\ntheir pointers are null or not.\n\nIf, for some reason, your Zig code tries to store a null value in an object that is non-nullable,\nan error is raised. Take the program below as an example.\nThe `zig` compiler can see the `null` value at compile time, and, as a result,\nit raises a compile-time error. 
But if a `null` value shows up at\nruntime (for example, when you use `.?` to unwrap an optional that is null), a runtime error is raised by the Zig program, with an\n\"attempt to use null value\" message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar number: u8 = 5;\nnumber = null;\n```\n:::\n\n\n\n```\np5.zig:5:14: error: expected type 'u8',\n found '@TypeOf(null)'\n number = null;\n ^~~~\n```\n\n\nYou don't get this type of safety in C.\nIn C, you don't get warnings or errors about null values being produced in your program.\nIf, for some reason, your code dereferences a null pointer in C, most of the time you end up getting a segmentation fault error\nas a result, which can have many different causes.\nThat is why programmers have to constantly check for null values in C.\n\nPointers in Zig are also, by default, **non-nullable**. This is another amazing\nfeature in Zig. So, you can safely assume that any pointer that you create in\nyour Zig code is pointing to a non-null value.\nTherefore, you don't have to do the heavy work of checking if the pointers you create\nin Zig are pointing to a null value.\n\n\n### What are optionals?\n\nOK, we now know that all objects are non-nullable by default in Zig.\nBut what if we actually need to use an object that might receive a null value?\nThis is where optionals come in.\n\nAn optional object in Zig is an object that can be null.\nTo mark an object as optional, we use the `?` operator. When you put\nthis `?` operator right before the data type of an object, you transform\nthis data type into an optional data type, and the object becomes an optional object.\n\nTake the snippet below as an example. We are creating a new variable object\ncalled `num`. 
This object has the data type `?i32`, which means that\nthis object contains either a signed 32-bit integer (`i32`) or a null value.\nBoth alternatives are valid values for the `num` object.\nThat is why I can actually change the value of this object to null, and\nno errors are raised by the `zig` compiler, as demonstrated below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\nnum = null;\n```\n:::\n\n\n\n\n### Optional pointers\n\nYou can also mark a pointer object as an optional pointer, meaning that\nthis object contains either a null value or a pointer that points to a value.\nWhen you mark a pointer as optional, the data type of this pointer object\nbecomes `?*const T` or `?*T`, depending on whether the value pointed to by the pointer\nis a constant value or not. The `?` identifies the object as optional, while\nthe `*` identifies it as a pointer object.\n\nIn the example below, we are creating a variable object named `num`, and an\noptional pointer object named `ptr`. Notice that the data type of the object\n`ptr` indicates that it is either a null value, or a pointer to an `i32` value.\nAlso, notice that the pointer object (`ptr`) can be marked as optional, even if\nthe object `num` is not optional.\n\nWhat this code tells us is that the `num` variable will never contain a null value.\nThis variable will always contain a valid `i32` value. In contrast, the `ptr` object might contain either a null\nvalue or a pointer to an `i32` value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: i32 = 5;\nvar ptr: ?*i32 = &num;\nptr = null;\nnum = 6;\n```\n:::\n\n\n\n\nBut what happens if we turn the tables and mark the `num` object as optional,\ninstead of the pointer object? If we do that, the pointer object is\nnot optional anymore. The result is similar (although different), because we would then have\na pointer to an optional value. 
In other words, a pointer to a value that is either a\nnull value or a non-null value.\n\nIn the example below, we are recreating this idea. Now, the `ptr` object\nhas a data type of `*?i32`, instead of `?*i32`. Notice that the `*` symbol comes before `?`\nthis time. So now, we have a pointer that points to a value that is either null\nor a signed 32-bit integer.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar num: ?i32 = 5;\n// ptr has type `*?i32`, instead of `?*i32`.\nconst ptr = &num;\n_ = ptr;\n```\n:::\n\n\n\n\n\n### Null handling in optionals {#sec-null-handling}\n\nWhen you have an optional object in your Zig code, you have to explicitly handle\nthe possibility of this object being null. It is similar to error handling with `try` and `catch`:\nin Zig, you also have to handle null values as if they were a type of error.\n\nWe can do that by using either:\n\n- an if statement, like you would do in C.\n- the `orelse` keyword.\n- the `?` method, which unwraps the optional value.\n\nWhen you use an if statement, you use a pair of pipes\nto unwrap the optional value, and use this \"unwrapped object\"\ninside the if block.\nUsing the example below as a reference, if the object `num` is null,\nthen the code inside the if statement is not executed. Otherwise,\nthe if statement will unwrap the object `num` into the `not_null_num`\nobject. This `not_null_num` object is guaranteed to be non-null inside\nthe scope of the if statement.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst num: ?i32 = 5;\nif (num) |not_null_num| {\n try stdout.print(\"{d}\\n\", .{not_null_num});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n5\n```\n\n\n:::\n:::\n\n\n\n\nNow, the `orelse` keyword behaves like a binary operator. 
You connect two expressions with this keyword.\nOn the left side of `orelse`, you provide the expression that might result\nin a null value, and on the right side of `orelse`, you provide another expression\nthat will not result in a null value.\n\nThe idea behind the `orelse` keyword is: if the expression on the left side\nresults in a non-null value, then this non-null value is used. However,\nif this expression on the left side results in a null value, then the value\nof the expression on the right side is used instead.\n\nLooking at the example below, since the `x` object is currently null,\n`orelse` uses the alternative value, which is the number 15.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst x: ?i32 = null;\nconst dbl = (x orelse 15) * 2;\ntry stdout.print(\"{d}\\n\", .{dbl});\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\n30\n```\n\n\n:::\n:::\n\n\n\n\nYou can use the if statement or the `orelse` keyword when you want to\nsolve (or deal with) this null value. However, if there is no clear solution\nto this null value, and the most logical and sane path is to simply panic\nand raise a loud error in your program when this null value is encountered,\nyou can use the `?` method of your optional object.\n\nIn essence, when you use this `?` method, the optional object is unwrapped.\nIf a non-null value is found in the optional object, then this non-null value is used.\nOtherwise, the `unreachable` keyword is used (`x.?` is equivalent to `x orelse unreachable`). 
You can read more about this\n[`unreachable` keyword at the official documentation](https://ziglang.org/documentation/master/#unreachable)[^un-docs].\nBut in essence, when you build your Zig source code using the build modes `ReleaseSafe` or `Debug`, this\n`unreachable` keyword causes the program to panic and raise an error during runtime (in the `ReleaseFast` and `ReleaseSmall` modes, reaching it is undefined behaviour instead),\nas in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nfn return_null(n: i32) ?i32 {\n if (n == 5) return null;\n return n;\n}\n\npub fn main() !void {\n const x: i32 = 5;\n const y: ?i32 = return_null(x);\n try stdout.print(\"{d}\\n\", .{y.?});\n}\n```\n:::\n\n\n\n\n```\nthread 12767 panic: attempt to use null value\np7.zig:12:34: 0x103419d in main (p7):\n try stdout.print(\"{d}\\n\", .{y.?});\n ^\n```\n\n\n[^un-docs]: .\n\n\n",
"supporting": [
"05-pointers_files"
],
diff --git a/_freeze/Chapters/09-data-structures/execute-results/html.json b/_freeze/Chapters/09-data-structures/execute-results/html.json
index 5a75ff7..7ef0d1a 100644
--- a/_freeze/Chapters/09-data-structures/execute-results/html.json
+++ b/_freeze/Chapters/09-data-structures/execute-results/html.json
@@ -1,9 +1,11 @@
{
- "hash": "e12d5504183883b04436a702f6db9acb",
+ "hash": "47fdbf80e1d0f091bc3e5596a4ef5de3",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n\n# Data Structures\n\nIn this chapter, we are going to discuss some Data Structures that are available from\nthe Zig Standard Library, especially `ArrayList` and also `HashMap`. I'm also want\nto talk about one of the key features of Zig in this chapter, which is `comptime`, and\nhow we can use it to create generics in Zig.\n\n\n## Dynamic Arrays {#sec-dynamic-array}\n\nIn high level languages, arrays are usually dynamic. They easily grow\nin size when they have to, and you don't need to worry about it.\nIn contrast, arrays in low level languages are usually static by default.\nThis is the reality of C, C++, Rust and also Zig. Static arrays were presented at\n@sec-arrays, but in this section, we are going to talk about dynamic arrays.\n\nDynamic arrays are simply arrays that can grow in size during the runtime\nof your program. Most low level languages offer some implementation of\na dynamic array in their standard library. C++ have `std::vector`, Rust have `Vec`,\nand Zig have `std.ArrayList`.\n\nThe `std.ArrayList` struct provides a contiguous and growable array for you.\nIt works like any other dinamic array, it allocates a contiguous block of memory, and when this block have no space left,\n`ArrayList` allocates another contiguous and bigger block of memory, copies the\nelements to this new location, and erases (or frees) the previous block of memory.\n\n\n### Capacity vs Length\n\nWhen we talk about dynamic arrays, we have two similar concepts that\nare very essential to how a dynamic array works behind the hood.\nThese concepts are *capacity* and *length*. In some contexts, especially\nin C++, *length* is also called of *size*.\n\nAlthough they look similar, these concepts represent different things\nin the context of dynamic arrays. 
*Capacity* is the number of items (or elements)\nthat your dynamic array can currently hold without the need to allocate more memory.\n\nIn contrast, the *length* refers to how many elements in the array\nare currently being used, or, in other words, how many elements in this array\nthat you assigned a value to. Every dynamic array works around\na block of allocated memory that represents an array with total capacity of $n$ elements,\nbut only a portion of these $n$ elements are being used most of the time. This portion\nof $n$ is the *length* of the array. So every time you append a new value\nto the array, you are incrementing it's *length* by one.\n\nThis means that a dynamic array usually works with an extra margin, or, an extra space\nwhich is currently empty, but it is waiting and ready to be used. This \"extra space\"\nis essentially the difference between *capacity* and *length*. *Capacity* represents\nthe total number of elements that the array can hold without the need to re-allocate\nor re-expand the array, while the *length* represents how much of this capacity\nis currently being used to hold/store values.\n\n@fig-capacity-length presents this idea visually. Notice that, at first,\nthe capacity of the array is greater than the length of the array.\nSo, the dynamic array have extra space that is currently empty, but it\nis ready to receive a value to be stored.\n\n![Difference between capacity and length in a dynamic array](./../Figures/dynamic-array.png){#fig-capacity-length}\n\nWe can also see at @fig-capacity-length that, when *length* and *capacity* are equal, it means that the array have no space left.\nWe reached the roof of our capacity, and because of that, if we want to store more values\nin this array, we need to expand it. We need to get a bigger space that can hold more values\nthat we currently have.\n\nA dynamic array works by expanding the underlying array, whenever the *length* becomes equal\nto the *capacity* of the array. 
It basically allocates a new contiguos block of memory that is bigger\nthan the previous one, then, it copies all values that are currently being stored to this new\nlocation (i.e. this new block of memory), then, it frees the previous block of\nmemory. At the end of this process, the new underlying array have a bigger *capacity*, and, therefore,\nthe *length* becomes once again smaller than the *capacity* of the array.\n\nThis is the cycle of an dynamic array. Notice that, throughout this cycle, the *capacity* is always\neither equal to or higher than the *length* of the array. If youh have an `ArrayList` object, let's suppose\nyou named it of `buffer`, you can check the current capacity of your array by accessing the `capacity`\nattribute of your `ArrayList` object, while the current *length* of it is available through the `items.len`\nattribute of your `ArrayList` object.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// Check capacity\nbuffer.capacity;\n// Check length\nbuffer.items.len;\n```\n:::\n\n\n\n\n\n### Creating an `ArrayList` object\n\nIn order to use `ArrayList`, you must provide an allocator object to it.\nRemember, Zig does not have a default memory allocator. And as I described at @sec-allocators, all memory\nallocations must be done by allocator objects that you define, that\nyou have control over. In our example here, I'm going to use\na general purpose allocator, but you can use any other allocator\nof your preference.\n\nWhen you initialize an `ArrayList` object, you must provide the data type of the elements of\nthe array. In other words, this defines the type of data that this array (or container) will\nstore. Therefore, if I provide the `u8` type to it, then, I will create a dynamic\narray of `u8` values. However, if I provide a struct that I defined instead, like the struct `User`\nfrom @sec-structs-and-oop, then, a dynamic array of `User` values\nwill be created. 
In the example below, with the expression `ArrayList(u8)` we\nare creating a dynamic array of `u8` values.\n\nAfter you provide the data type of the elements of the array, you can initialize\nan `ArrayList` object by either using the `init()` or the `initCapacity()` method.\nThe former method receives only the allocator object\nas input, while the latter method receives both the allocator object and a capacity number as inputs.\nWith the latter method, you not only initialize the struct, but you\nalso set the starting capacity of the allocated array.\n\nUsing the `initCapacity()` method is the preferred way to initialize your dynamic array.\nBecause reallocations, or, in other words, the process of expanding the capacity of the array,\nis always a high cost operation. You should take any possible opportunity to avoid reallocations in\nyour array. If you know how much space your array needs to occupy at the beginning,\nyou should always use `initCapacity()` to create your dynamic array.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n```\n:::\n\n\n\n\n\n\nIn the example above, the `buffer` object starts as an array of 100 elements. If this\n`buffer` object needs to create more space to accomodate more elements during the runtime of your program, the `ArrayList`\ninternals will perform the necessary actions for you automatically.\nAlso notice the `deinit()` method being used to destroy the `buffer` object at the\nend of the current scope, by freeing all the memory that was allocated for the dynamic\narray stored in this `buffer` object.\n\n\n### Adding new elements to the array\n\nNow that we created our dynamic array, we can start to use it. You can append (a.k.a \"add\")\nnew values to this array by using the `append()` method. 
This method works the same way\nas the `append()` method of a Python list, or the `emplace_back()` method of `std::vector` in C++.\nYou provide a single value to this method, and the method appends this value to the array.\n\nYou can also use the `appendSlice()` method to append multiple values at once. You provide\na slice (slices were described in @sec-arrays) to this method, and the method adds all values present\nin this slice to your dynamic array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\ntry buffer.append('H');\ntry buffer.append('e');\ntry buffer.append('l');\ntry buffer.append('l');\ntry buffer.append('o');\ntry buffer.appendSlice(\" World!\");\n```\n:::\n\n\n\n\n\n### Removing elements from the array {#sec-dynamic-array-remove}\n\nYou can use the `pop()` method to \"pop\" or remove\nthe last element in the array. It is worth noting that this method\ndoes not change the capacity of the array. It just removes\nthe last value stored in the array, reducing its *length* by one.\n\nAlso, this method returns the value that was removed. That is, you can\nuse this method to both get the last value in the array and\nremove it from the array. It is a \"get and remove value\" type of method.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exclamation_mark = buffer.pop();\n```\n:::\n\n\n\n\n\nNow, if you want to remove specific elements from specific positions\nof your array, you can use the `orderedRemove()` method from your\n`ArrayList` object. With this method, you provide an index as input,\nand the method deletes the value that is at this index in the array.\nThis reduces the *length* of the array by one every time you execute\nan `orderedRemove()` operation.\n\nIn the example below, we first create an `ArrayList` object, and we fill it\nwith numbers. 
Then, we use `orderedRemove()` to remove the value at\nindex 3 in the array, two consecutive times.\n\nAlso, notice that we are assigning the result of `orderedRemove()` to the\nunderscore character, so we are discarding the result value of this method.\nLike `pop()`, the `orderedRemove()` method returns the value that\nwas deleted.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n\nfor (0..10) |i| {\n const index: u8 = @intCast(i);\n try buffer.append(index);\n}\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\n_ = buffer.orderedRemove(3);\n_ = buffer.orderedRemove(3);\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\nstd.debug.print(\n \"{any}\\n\", .{buffer.items.len}\n);\n```\n:::\n\n\n\n\n\n```\n{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }\n{ 0, 1, 2, 5, 6, 7, 8, 9 }\n8\n```\n\nOne key characteristic of `orderedRemove()` is that it preserves the order\nof the values in the array. It deletes the value that you asked it to\nremove, and then shifts every value that came after it one position to the left, so the order of the values that remain in the array\nstays the same as before.\n\nNow, if you don't care about the order of the values (for example, maybe you want to treat\nyour dynamic array as a set of values, like the `std::unordered_set`\nstructure from C++), you can use the `swapRemove()` method instead. This method\nworks similarly to the `orderedRemove()` method: you give an index to this\nmethod, and it deletes the value that is at this index in the array.\nBut this method does not preserve the original order of the values that remain\nin the array. 
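\n\nThe difference is easier to see on a concrete array. Below is a minimal sketch, assuming Zig 0.13 (where `ArrayList` stores its allocator, as in the previous examples). It fills an array with the numbers 0 to 9, like the `orderedRemove()` example above, and then removes index 3 with `swapRemove()`.\n\n```zig\nconst std = @import(\"std\");\n\npub fn main() !void {\n    var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n    const allocator = gpa.allocator();\n    var buffer = try std.ArrayList(u8)\n        .initCapacity(allocator, 10);\n    defer buffer.deinit();\n\n    for (0..10) |i| {\n        const value: u8 = @intCast(i);\n        try buffer.append(value);\n    }\n\n    // Returns the value at index 3 and moves the last\n    // element (9) into index 3 to fill the gap.\n    const removed = buffer.swapRemove(3);\n    // prints: removed: 3\n    std.debug.print(\"removed: {d}\\n\", .{removed});\n    // prints: { 0, 1, 2, 9, 4, 5, 6, 7, 8 }\n    std.debug.print(\"{any}\\n\", .{buffer.items});\n}\n```\n\nInstead of shifting every later element one position to the left, `swapRemove()` moves the last element into the vacated slot, so only one element changes place. 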
As a result, `swapRemove()` is, in general, faster than `orderedRemove()`.\n\n\n### Inserting elements at specific indexes\n\nWhen you need to insert values in the middle of your array,\ninstead of just appending them to the end of the array, you need to use\nthe `insert()` and `insertSlice()` methods instead of\nthe `append()` and `appendSlice()` methods.\n\nThese two methods work very similarly to the `insert()` and `insert_range()` methods\nof the C++ `std::vector` class. You provide an index to these methods,\nand they insert the values that you provide at that index in the array, shifting the existing values at and after that index to the right.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 10);\ndefer buffer.deinit();\n\ntry buffer.appendSlice(\"My Pedro\");\ntry buffer.insert(4, '3');\ntry buffer.insertSlice(2, \" name\");\nfor (buffer.items) |char| {\n try stdout.print(\"{c}\", .{char});\n}\n```\n:::\n\n\n\n\n\n```\nMy name P3edro\n```\n\n\n### Conclusion\n\nIf you need a method that is not covered here, I recommend\nthat you read the [official documentation for the `ArrayListAligned`](https://ziglang.org/documentation/master/std/#std.array_list.ArrayListAligned)[^zig-array2]\nstruct, which describes most of the methods available\nthrough the `ArrayList` object.\n\nYou will notice that there are many other methods on this page that\nI did not describe here, and I recommend that you explore these methods\nand understand how they work.\n\n[^zig-array2]: \n\n\n\n## Maps or HashTables {#sec-maps-hashtables}\n\nSome professionals know this type of data structure by different terms, like \"map\", \"hashmap\" or \"associative array\". But most professionals\nknow this structure by the name *hashtable*.\nMost programming languages provide some implementation of a hashtable in their\nstandard libraries. 
Python has `dict()`, C++ has `std::unordered_map` (along with the tree-based `std::map`), Rust\nhas `HashMap`, JavaScript has `Object()` and `Map()`,\nC# has `Hashtable()`, etc.\n\n\n\n### What is a hashtable?\n\nA hashtable is a data structure based on key-value pairs.\nYou provide a key and a value to this structure, and the hashtable stores\nthe input value at a location that can be identified by the input\nkey that you provided.\nIt does that by using an underlying array and a hash function.\nThese two components are essential to how a hashtable works.\n\nUnder the hood, the hashtable contains an array. This array is where the values\nare stored, and the elements of this array are usually called *buckets*.\nSo the values that you provide to the hashtable are stored inside buckets,\nand you access each bucket by using an index.\n\nWhen you provide a key to a hashtable, it passes this key to the\nhash function. This hash function uses some sort of hashing algorithm to transform\nthis key into a number, which is then reduced to a valid index (for example, by taking the remainder of its division by the length of the array). This index is an array index, i.e. a position\nin the underlying array of the hashtable.\nThis is how a key identifies a specific position (or location) inside the hashtable\nstructure.\n\nSo you provide a key to the hashtable, this key identifies a specific location\ninside the hashtable, and the hashtable stores the input value that you provided\nin the location identified by this key.\nYou could say that the key maps to the value stored in the hashtable: you find\nthe value by using the key that identifies the location where the value is stored.\n@fig-hashtable presents this process visually.\n\n\n![A diagram of a Hashtable. 
Source: Wikipedia, the free encyclopedia.](./../Figures/hashtable.svg){#fig-hashtable}\n\n\nThe operation described in the previous paragraph is normally called an *insertion* operation,\nbecause you are inserting new values into the hashtable.\nBut there are other types of operations in hashtables, such as *delete* and *lookup*.\nDelete is self-describing: it is when you delete (or remove) a value from the hashtable.\nLookup corresponds to when you retrieve (or look at) a value that is stored in\nthe hashtable, by using the key that identifies the location where this value is stored.\n\nSometimes, instead of storing the values directly, the underlying array of the hashtable might be an array of pointers,\ni.e. the buckets of the array store pointers that point to the values,\nor it might be an array of linked lists.\nThese cases are common in hashtables that handle the \"collisions\" that may arise from the hash function.\n\nA collision happens when two different keys point to the same location (i.e. to the same index)\nin the underlying array of the hashtable. Collisions cannot be avoided in general, because there are usually far more possible keys than positions in the array, but how often they happen depends on the characteristics of the hash function\nthat is being used in the hashtable. Some implementations of the hashtable will actively deal with collisions,\nmeaning that they will handle this case in some way. For example, the hashtable\nmight transform all buckets into linked lists, because with a linked list you can store\nmultiple values in a single bucket.\n\nThere are different techniques to handle collisions in hashtables, which I will not describe\nin this book, because they are outside our main scope here. 
But you can find a good description of\nsome of the most common techniques on the Wikipedia page about hashtables [@wikipedia_hashtables].\n\n\n### Hashtables in Zig {#sec-hashmap}\n\nThe Zig Standard Library provides different implementations of a hashtable,\nsuch as the struct `HashMap`. Each implementation has its own pros and cons, which we will\ndiscuss later on. `HashMap` and its helper functions live in the `std.hash_map` module, but all of these hashtables are also exported directly from `std` (e.g. `std.AutoHashMap`).\n\nThe `HashMap` struct is a general-purpose hashtable,\nwhich has very fast operations (lookup, insertion, delete), and also\nquite high load factors for low memory usage. You can create and provide a context object\nto the `HashMap` constructor. This context object allows you to tailor\nthe behaviour of the hashtable itself, because you can\nprovide a hash function implementation to be used by the hashtable\nthrough this context object.\n\nBut let's not worry about this context object now, because it is meant to be used\nby \"experts in the field of hashtables\". Since we are most likely not\nexperts in this field, we are going to take the easy way to create\na hashtable, which is to use the `AutoHashMap()` function.\n\n\nThis `AutoHashMap()` function is essentially a \"create a hashtable object that uses the default settings\"\ntype of function. It chooses a context object, and, therefore, a hash function implementation,\nautomatically for you. This function receives two data types as input: the first data type is the data type of the keys\nthat will be used in this hashtable, while the second data type is the data type of the data that will be\nstored inside the hashtable, that is, the data type of the values to be stored.\n\nIn the example below, we are providing the data type `u32` in the first argument, and `u16` in the second argument of this\nfunction. 
This means that we are going to use `u32` values as keys in this hashtable, while `u16` values are the actual values\nthat are going to be stored in this hashtable.\nAt the end of this process, the `hash_table` object contains a `HashMap` object\nthat uses the default context and the default load factor.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n std.debug.print(\n \"Value at key 50050: {d}\\n\",\n .{hash_table.get(50050).?}\n );\n\n if (hash_table.remove(57709)) {\n std.debug.print(\n \"Value at key 57709 successfully removed!\\n\",\n .{}\n );\n }\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n}\n```\n:::\n\n\n\n\n\n```\nN of values stored: 3\nValue at key 50050: 55\nValue at key 57709 successfully removed!\nN of values stored: 2\n```\n\nYou can add (or put) new values into the hashtable by using the `put()` method. The first argument\nis the key to be used, and the second argument is the actual value that you want to store inside\nthe hashtable. In the example above, we first add the value 89 using the key 54321, then we add\nthe value 55 using the key 50050, and so on.\n\nNotice that we used the method `count()` to see how many values are currently stored in the\nhashtable. After that, we also used the `get()` method to access (or look at) the value stored in\nthe position identified by the key 50050. 
The output of this `get()` method is an optional value (it is `null` when the key is not in the hashtable),\nand that is why we use the `.?` operator at the end to unwrap it and get access to the actual value.\n\nAlso notice that we can remove (or delete) values from a hashtable by using the `remove()` method.\nYou provide the key that identifies the value that you want to delete, and the method\ndeletes this value and returns `true` as output. This `true` value tells us\nthat the method successfully deleted the value.\n\nBut this delete operation might not always succeed. For example, you might provide the wrong\nkey to this method, i.e. you might provide\n(either intentionally or unintentionally) a key that is not stored in the hashtable.\nIn this case, the `remove()` method returns `false`.\n\n\n\n### Iterating through the hashtable\n\nIterating through the keys and values that are currently being stored in\nthe hashtable is a very common need.\nYou can do that in Zig by using an iterator object that can iterate\nthrough the elements of your hashtable object.\n\nThis iterator object works like any other iterator object that you would\nfind in languages such as C++ and Rust. It is basically an object\nthat points to some value in the container, and has a `next()` method\nthat you can use to navigate (or iterate) through the next values in the\ncontainer.\n\nYou can create such an iterator object by using the `iterator()` method of the hashtable object.\nThis method returns an iterator object, whose `next()` method you can use in conjunction\nwith a while loop to iterate through the elements of your hashtable. 
The `next()` method returns an optional\n`Entry` value, and therefore, you must unwrap this optional value (the `while` loop in the example below does this through its `|kv|` capture) to get the actual `Entry` value,\nfrom which you can access the key and also the value identified by this key.\n\nWith this `Entry` value at hand, you can access the key of the current entry by using the `key_ptr`\nattribute and dereferencing the pointer that lives inside of it, while the value identified by this\nkey is accessed through the `value_ptr` attribute instead, which is also a pointer to be dereferenced.\nThe code example below demonstrates the use of these elements:\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n\n var it = hash_table.iterator();\n while (it.next()) |kv| {\n // Access the current key\n std.debug.print(\"Key: {d} | \", .{kv.key_ptr.*});\n // Access the current value\n std.debug.print(\"Value: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n\n```\nKey: 54321 | Value: 89\nKey: 50050 | Value: 55\nKey: 57709 | Value: 41\n```\n\nKeep in mind that a `HashMap` does not guarantee that its iteration order matches the order in which the values were inserted.\n\n\nIf you want to iterate through only the values or the keys of your hashtable,\nyou can create a key iterator or a value iterator object. 
These are also iterator\nobjects, which have the same `next()` method that you can use to iterate through the\nsequence of values.\n\nKey iterators are created from the `keyIterator()` method of your\nhashtable object, while value iterators are created from the `valueIterator()` method.\nAll you have to do is unwrap the value from the `next()` method and dereference it\nto access the key or value that you are iterating over.\nThe code example below demonstrates what this looks like for a key iterator,\nbut you can apply the same logic to a value iterator.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar kit = hash_table.keyIterator();\nwhile (kit.next()) |key| {\n std.debug.print(\"Key: {d}\\n\", .{key.*});\n}\n```\n:::\n\n\n\n\n\n```\nKey: 54321\nKey: 50050\nKey: 57709\n```\n\n\n### The `ArrayHashMap` hashtable {#sec-array-map}\n\nIf you need to iterate through the elements of your hashtable frequently,\nyou might want to use the `ArrayHashMap` struct for your specific case,\ninstead of going with the usual and general-purpose `HashMap` struct.\n\nThe `ArrayHashMap` struct creates a hashtable that is faster to iterate over.\nThat is why this specific type of hashtable might be valuable to you.\nSome other properties of an `ArrayHashMap` hashtable are:\n\n- the order of insertion is preserved (as long as you do not remove values with `swapRemove()`). So the order of the values you find while iterating through this hashtable\nis the order in which these values were inserted in the hashtable.\n\n- the key-value pairs are stored sequentially, one after another.\n\n\nYou can create an `ArrayHashMap` object by using, once again, a helper function that\nautomatically chooses a hash function implementation for you. This is the\n`AutoArrayHashMap()` function, which works very similarly to the `AutoHashMap()`\nfunction that we presented in @sec-hashmap.\n\nYou provide two data types to this function. 
The first one is the data type of the keys that will be\nused in this hashtable, and the second one is the data type of the values that will be stored in\nthis hashtable.\n\nAn `ArrayHashMap` object has essentially the same methods as the `HashMap` struct.\nSo you can insert new values into the hashtable by using the `put()` method, and you can look up (or get)\na value from the hashtable by using the `get()` method. But the `remove()` method is not available\nin this specific type of hashtable.\n\nIn order to delete values from the hashtable, you use the same methods that you find in\nan `ArrayList` object, i.e. a dynamic array. I presented these methods in @sec-dynamic-array-remove:\nthe `swapRemove()` and `orderedRemove()` methods. These methods have the same effect here\nthat they have in an `ArrayList` object.\n\nThis means that with `swapRemove()` you remove the value from the hashtable, but you do not preserve\nthe order in which the values were inserted into the structure, while `orderedRemove()` retains\nthe insertion order of the remaining values.\n\nBut instead of providing an index as input to `swapRemove()` or `orderedRemove()`, as I described\nin @sec-dynamic-array-remove, here in an `ArrayHashMap` these methods take a key as input, like\nthe `remove()` method from a `HashMap` object. 
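\n\nA minimal sketch of both key-based removals follows. It assumes Zig 0.13 and the `std.AutoArrayHashMap` alias exported by the standard library. The fourth key (10000) is an arbitrary illustrative number.\n\n```zig\nconst std = @import(\"std\");\n\npub fn main() !void {\n    var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n    const allocator = gpa.allocator();\n    var hash_table = std.AutoArrayHashMap(u32, u16).init(allocator);\n    defer hash_table.deinit();\n\n    try hash_table.put(54321, 89);\n    try hash_table.put(50050, 55);\n    try hash_table.put(57709, 41);\n    try hash_table.put(10000, 12);\n\n    // Remove by key, keeping the insertion order of the other entries.\n    _ = hash_table.orderedRemove(50050);\n    // prints: { 54321, 57709, 10000 }\n    std.debug.print(\"{any}\\n\", .{hash_table.keys()});\n\n    // Remove by key, moving the last entry into the freed slot.\n    _ = hash_table.swapRemove(54321);\n    // prints: { 10000, 57709 }\n    std.debug.print(\"{any}\\n\", .{hash_table.keys()});\n}\n```\n\nThe `keys()` method used above returns the keys as a slice, in their current storage order. Both removal calls return `true` here, because the keys exist in the table. A key that is not stored in it makes them return `false`. 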
If you want to provide an index as input, instead\nof a key, you should use the `swapRemoveAt()` and `orderedRemoveAt()` methods. You create an `ArrayHashMap` object through the `AutoArrayHashMap()` function, in the same way as with `AutoHashMap()`:\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar hash_table = std.AutoArrayHashMap(u32, u16)\n .init(allocator);\ndefer hash_table.deinit();\n```\n:::\n\n\n\n\n\n\n\n### The `StringHashMap` hashtable {#sec-string-hash-map}\n\nOne thing that you will notice in the other two types of hashtables that I\npresented in the last sections is that neither of them accepts a slice data type\nas the type of their keys.\nWhat this means is that you cannot use a slice value to represent a key in\nthese types of hashtable. (More precisely, the hash function that `AutoHashMap()` and `AutoArrayHashMap()` choose for you rejects slices at compile time, because it is unclear whether it should hash the pointer or the contents of the slice.)\n\nThe most obvious consequence of this is that you cannot use strings as keys\nin these hashtables. But it is extremely common to use string values as keys\nin hashtables.\n\nTake this very simple JavaScript code snippet as an example. We are creating\na simple hashtable object named `people`. Then, we add a new entry to this\nhashtable, which is identified by the string `'Pedro'`. This string is the\nkey in this case, while the object containing different personal information, such as\nage, height and city, is the value to be stored in the hashtable.\n\n```js\nvar people = new Object();\npeople['Pedro'] = {\n 'age': 25,\n 'height': 1.67,\n 'city': 'Belo Horizonte'\n};\n```\n\nThis pattern of using strings as keys is very common in\nall sorts of situations. That is why the Zig Standard Library offers a\nspecific type of hashtable for this purpose, which is created through the `StringHashMap()` function.\nThis function creates a hashtable that uses strings as keys. 
The only input of this\nfunction is the data type of the values that will be stored in this hashtable.\n\nIn the example below, I'm creating a hashtable to store the ages of different people.\nThe keys to be used in this hashtable are the names of each person, while the value stored in the\nhashtable is the age of the person identified by the key.\n\nThat is why I provide the `u8` data type (which is the data type used by the age values) as input to this `StringHashMap()` function.\nAs a result, it creates a hashtable that uses string values as keys and stores\n`u8` values. Notice that an allocator object is provided to the `init()` method of the\nobject that results from the `StringHashMap()` function.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var ages = std.StringHashMap(u8).init(allocator);\n defer ages.deinit();\n\n try ages.put(\"Pedro\", 25);\n try ages.put(\"Matheus\", 21);\n try ages.put(\"Abgail\", 42);\n\n var it = ages.iterator();\n while (it.next()) |kv| {\n std.debug.print(\"Key: {s} | \", .{kv.key_ptr.*});\n std.debug.print(\"Age: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n\n```\nKey: Pedro | Age: 25\nKey: Abgail | Age: 42\nKey: Matheus | Age: 21\n```\n\n\n### The `StringArrayHashMap` hashtable\n\nThe Zig Standard Library also provides a type of hashtable that combines the features of the\ntypes of hashtables that were presented in the previous two sections. 
That is, a hashtable\nthat uses strings as keys, but also has the advantages of the `ArrayHashMap` struct.\nIn other words, you can have a hashtable that is fast to iterate over,\nthat preserves insertion order, and that uses strings as keys.\n\nYou can create this type of hashtable by using the `StringArrayHashMap()` function.\nThis function accepts a data type as input, which is the data type of the values that are\ngoing to be stored inside this hashtable, in the same style as the function presented\nin @sec-string-hash-map.\n\nYou can insert new values into this hashtable by using the same `put()` method that\nI presented in @sec-string-hash-map. You can also get values from the hashtable\nby using the same `get()` method that I presented in previous sections.\nLike its `ArrayHashMap` brother, to delete values from this specific type of hashtable,\nwe use the `orderedRemove()` and `swapRemove()` methods, with the same effects that\nI described in @sec-array-map.\n\nIf we take the code example that was presented in @sec-string-hash-map, we can\nachieve the same result with `StringArrayHashMap()`, except that the entries are now printed in insertion order. All we have to do\nis change `StringHashMap()` to `StringArrayHashMap()` in the\nfifth line of this code example. It would change to this:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar ages = std.StringArrayHashMap(u8).init(allocator);\n```\n:::\n\n\n\n\n\n\n\n## Linked lists\n\nThe Zig Standard Library provides implementations of both singly and doubly linked lists.\nA linked list is a linear data structure that looks like a chain, or a rope.\nThe main advantage of this data structure is that inserting or removing a node is normally fast,\nonce you have a pointer to the position where the change happens. But, as a disadvantage, iterating through\nthis data structure is usually not as fast as iterating through an array.\n\nThe idea behind a linked list is basically to build a structure that consists of a series of nodes\nconnected to each other by pointers. 
This means that linked lists are usually not contiguous\nin memory: one node might be south in memory, the next node might be north\nin memory, and the one after that might be somewhere to the left. Anyway, you get it, they can be anywhere.\n\nIn @fig-linked-list we can see a diagram of a singly linked list. Notice that we begin with\na first node. This first node is usually called \"the head of the linked list\". Then, from this\nfirst node we uncover the remaining nodes in the structure by following the locations pointed to\nby the pointers.\n\nEvery node has two things in it: the value that is stored in the current node,\nand a pointer. This pointer points to the next node in the list. If this pointer\nis null, it means that we reached the end of our linked list.\n\n![A diagram of a singly linked list.](./../Figures/linked-list.png){#fig-linked-list}\n\n\nIn @fig-linked-list2 we can see a diagram of a doubly linked list. The only thing that really\nchanges is that every node in the linked list has both a pointer to the previous node\nand a pointer to the next node. So every node now has two pointers in it. These are\nusually called the `prev` (for \"previous\") and `next` (for \"next\") pointers of the node.\n\nIn the singly linked list example, we had only a single pointer in each node, and this\npointer was always pointing to the next node in the sequence. In other words, singly linked lists\nnormally have only the `next` pointer in them.\n\n![A diagram of a doubly linked list.](./../Figures/doubly-linked-list.png){#fig-linked-list2}\n\n\n\nLinked lists are available in Zig through the functions `SinglyLinkedList()` and\n`DoublyLinkedList()`, for \"singly linked lists\" and \"doubly linked lists\", respectively. 
These functions are\nactually generic functions, which we are going to talk more about in @sec-generic-fun.\n\nFor now, just understand that, in order to create a linked list object,\nwe begin by providing a data type to these functions. This data type defines\nthe type of data that this linked list will store. In the example below,\nwe are creating a singly linked list capable of storing `u32` values.\nSo each node in this linked list will store a `u32` value.\n\nBoth the `SinglyLinkedList()` and `DoublyLinkedList()` functions return a type, i.e. a struct definition, as result. This means that\nthe object `Lu32` is actually a type definition, or a struct definition. It defines\nthe type \"singly linked list of `u32` values\".\n\nSo now that we have the definition of the struct, we have to instantiate a `Lu32` object.\nWe normally instantiate struct objects in Zig by using an `init()` method.\nBut in this case, we are instantiating the struct directly, by using an empty\n`struct` literal in the expression `Lu32{}`. This works because every field of this struct has a default value.\n\nIn this example, we first create multiple node objects, and after we create them,\nwe insert and connect these nodes to build the linked list, using the\n`prepend()` and `insertAfter()` methods. Notice that `prepend()`\nis a method of the linked list object, while `insertAfter()` is a method\nof the node objects.\n\nIn essence, the `prepend()` method inserts a node at the beginning of the linked\nlist. In other words, the node that you provide to this method becomes the new\n\"head node\" of the linked list. 
It becomes the first node in the list (see @fig-linked-list).\n\nOn the other hand, the `insertAfter()` method is used to connect two nodes together.\nWhen you provide a node to this method, it creates a pointer to this input node\nand stores this pointer in the current node, i.e. the node from which the method was called.\nIn other words, this method creates the pointer that connects these two nodes together\nand stores it in the `next` attribute of the current node. Before doing that, it copies the old `next` pointer of the current node into the input node, so the nodes that came after the current node stay connected to the list.\n\nSince a doubly linked list has both a `next` and a `prev` pointer in each node,\nreferring to the next and previous nodes in the sequence, respectively,\nas shown in @fig-linked-list2, a `DoublyLinkedList()` object offers both an\n`insertBefore()` (for `prev`) and an `insertAfter()` (for `next`) method.\nUnlike the singly linked list case, in Zig 0.13, for example, these are methods of the list object itself,\nand they receive the existing node and the new node as inputs.\n\nThis means that, if we used a doubly linked list, we could use the `insertBefore()` method\nto store the pointer to the new node in the `prev` attribute of an existing node. This would put the new\nnode as the \"previous node\", or the node before the existing node. 
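\n\nFor comparison, here is a minimal sketch of these two methods on a doubly linked list. It assumes Zig 0.13, where `insertBefore()` and `insertAfter()` are methods of the `DoublyLinkedList()` object that receive the existing node first and the new node second:\n\n```zig\nconst std = @import(\"std\");\nconst Lu32 = std.DoublyLinkedList(u32);\n\npub fn main() void {\n    var list = Lu32{};\n    var one = Lu32.Node{ .data = 1 };\n    var two = Lu32.Node{ .data = 2 };\n    var three = Lu32.Node{ .data = 3 };\n\n    list.append(&three); // {3}\n    list.insertBefore(&three, &one); // {1, 3}\n    list.insertAfter(&one, &two); // {1, 2, 3}\n\n    // Walk the list from the head, following the `next` pointers.\n    var it = list.first;\n    while (it) |node| : (it = node.next) {\n        std.debug.print(\"{d} \", .{node.data});\n    }\n    std.debug.print(\"\\nlen: {d}\\n\", .{list.len});\n}\n```\n\nHere `insertBefore(&three, &one)` fills the `prev` pointer of `three`, while `insertAfter(&one, &two)` fills the `next` pointer of `one`. In both cases the list also updates the pointers of the neighboring nodes. 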
The `insertAfter()` method\nhas \"after\" in its name to indicate that this method puts the pointer created to the input\nnode in the `next` attribute of the current node, and, as a result, the input node becomes\nthe \"next node\" of the current node.\n\nSince we are using a singly linked list in this example, we have only the `insertAfter()` method\navailable in the node objects that we create from our `Lu32` type.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SinglyLinkedList = std.SinglyLinkedList;\nconst Lu32 = SinglyLinkedList(u32);\n\npub fn main() !void {\n var list = Lu32{};\n var one = Lu32.Node{ .data = 1 };\n var two = Lu32.Node{ .data = 2 };\n var three = Lu32.Node{ .data = 3 };\n var four = Lu32.Node{ .data = 4 };\n var five = Lu32.Node{ .data = 5 };\n\n list.prepend(&two); // {2}\n two.insertAfter(&five); // {2, 5}\n list.prepend(&one); // {1, 2, 5}\n two.insertAfter(&three); // {1, 2, 3, 5}\n three.insertAfter(&four); // {1, 2, 3, 4, 5}\n}\n```\n:::\n\n\n\n\n\n\nThere are other methods available from the linked list object, depending on whether this object is\na singly linked list or a doubly linked list, that might be very useful for you, like:\n\n- `remove()` to remove a specific node from the linked list.\n- `popFirst()` to remove the first node from the linked list.\n- if singly linked list, `len()` to count how many nodes there are in the linked list.\n- if doubly linked list, check the `len` attribute to see how many nodes there are in the linked list.\n- if doubly linked list, `pop()` to remove the last node from the linked list.\n- if doubly linked list, `append()` to add a new node to the end of the linked list (i.e. the inverse of `prepend()`).\n\n\n\n## Multi array structure\n\nThe Zig Standard Library also provides a data structure called `MultiArrayList()`. 
It is a different version of the dynamic array\nthat we introduced in @sec-dynamic-array. The difference between this structure and the `ArrayList()`\nthat we know from @sec-dynamic-array is that `MultiArrayList()` creates a separate dynamic array\nfor each field of the struct that you provide as input.\n\nConsider the following code example. We create a new custom struct called `Person`. This\nstruct contains three different data members, or three different fields. As a consequence,\nwhen we provide this `Person` data type as input to `MultiArrayList()`, it\ncreates a \"struct of three different arrays\" called `PersonArray`. In other words,\nthis `PersonArray` is a struct that contains three internal dynamic arrays,\none array for each field found in the `Person` struct definition. Also notice that, unlike `ArrayList`, a `MultiArrayList` does not store an allocator, so we pass the allocator to methods such as `append()` and `deinit()`.\n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Person = struct {\n name: []const u8,\n age: u8,\n height: f32,\n};\nconst PersonArray = std.MultiArrayList(Person);\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var people = PersonArray{};\n defer people.deinit(allocator);\n\n try people.append(allocator, .{\n .name = \"Auguste\", .age = 15, .height = 1.54\n });\n try people.append(allocator, .{\n .name = \"Elena\", .age = 26, .height = 1.65\n });\n try people.append(allocator, .{\n .name = \"Michael\", .age = 64, .height = 1.87\n });\n}\n```\n:::\n\n\n\n\n\nIn other words, instead of creating an array of \"persons\", the `MultiArrayList()` function\ncreates a \"struct of arrays\". Each data member of this struct is a different array that stores\nthe values of a specific field from the `Person` struct values that were added (or appended) to this \"struct of arrays\".\nOne important detail is that each of these separate internal arrays stored inside `PersonArray`\nis a dynamic array. 
This means that these arrays can grow in capacity automatically as needed, to accomodate\nmore values.\n\nThe @fig-multi-array exposed below presents a diagram that describes the `PersonArray` struct\nthat we have created in the previous code example. Notice that the values of the data members\npresent in each of the three `Person` values that we have appended into the `PersonArray` object\nthat we have instantiated, are scattered across three different internal arrays of the `PersonArray` object.\n\n![A diagram of the `PersonArray` struct.](./../Figures/multi-array.png){#fig-multi-array}\n\nYou can easily access each of these arrays separately, and iterate over the values of each array.\nFor that, you will need to call the `items()` method from the `PersonArray` object, and provide as input\nto this method, the name of the field that you want to iterate over.\nIf you want to iterate through the `.age` array for example, then, you need to call `items(.age)` from\nthe `PersonArray` object, like in the example below:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfor (people.items(.age)) |*age| {\n try stdout.print(\"Age: {d}\\n\", .{age.*});\n}\n```\n:::\n\n\n\n\n\n```\nAge: 15\nAge: 26\nAge: 64\n```\n\n\nIn the above example, we are iterating over the values of the `.age` array, or,\nthe internal array of the `PersonArray` object that contains the values of the `age`\ndata member from the `Person` values that were added to the multi array struct.\n\nIn this example we are calling the `items()` method directly from the `PersonArray`\nobject. 
However, it is recommended on most situations to call this `items()` method\nfrom a \"slice object\", which you can create from the `slice()` method.\nThe reason for this is that calling `items()` multiple times have better performance\nif you use a slice object.\n\nIn other words, if you are planning to access only one of the\ninternal arrays from your \"multi array struct\", it is fine to call `items()` directly\nfrom the multi array object. But if you need to access many of the internal arrays\nfrom your \"multi array struct\", then, you will likely need to call `items()` more\nthan once, and, in such circustance, is better to call `items()` through a slice object.\nThe example below demonstrates the use of such object:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar slice = people.slice();\nfor (slice.items(.age)) |*age| {\n age.* += 10;\n}\nfor (slice.items(.name), slice.items(.age)) |*n,*a| {\n try stdout.print(\n \"Name: {s}, Age: {d}\\n\", .{n.*, a.*}\n );\n}\n```\n:::\n\n\n\n\n\n```\nName: Auguste, Age: 25\nName: Elena, Age: 36\nName: Michael, Age: 74\n```\n\n\n## Conclusion\n\nThere are many other data structures that I did not presented here.\nBut you can check them out at the offical Zig Standard Library documentation page.\nActually, when you get into the [homepage of the documentation](https://ziglang.org/documentation/master/std/#)[^home], the first thing\nthat appears to you in this page, is a list of types and data structures.\n\n\nIn this section you can see a list of the many different data structures available in\nthe Zig Standard Library. 
There are some very specific structures in this list, like a\n[`BoundedArray` struct](https://ziglang.org/documentation/master/std/#std.bounded_array.BoundedArray)[^bounded]\n, but there is also some more general structures, such as a\n[`PriorityQueue` struct](https://ziglang.org/documentation/master/std/#std.priority_queue.PriorityQueue)[^priority].\n\n\n[^home]: \n[^priority]: .\n[^bounded]: \n\n\n\n\n\n\n",
- "supporting": [],
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Data Structures\n\nIn this chapter, we are going to discuss some Data Structures that are available from\nthe Zig Standard Library, especially `ArrayList` and also `HashMap`. I also want\nto talk about one of the key features of Zig in this chapter, which is `comptime`, and\nhow we can use it to create generics in Zig.\n\n\n## Dynamic Arrays {#sec-dynamic-array}\n\nIn high level languages, arrays are usually dynamic. They easily grow\nin size when they have to, and you don't need to worry about it.\nIn contrast, arrays in low level languages are usually static by default.\nThis is the reality of C, C++, Rust and also Zig. Static arrays were presented at\n@sec-arrays, but in this section, we are going to talk about dynamic arrays.\n\nDynamic arrays are simply arrays that can grow in size during the runtime\nof your program. Most low level languages offer some implementation of\na dynamic array in their standard library. C++ has `std::vector`, Rust has `Vec`,\nand Zig has `std.ArrayList`.\n\nThe `std.ArrayList` struct provides a contiguous and growable array for you.\nIt works like any other dynamic array: it allocates a contiguous block of memory, and when this block has no space left,\n`ArrayList` allocates another contiguous and bigger block of memory, copies the\nelements to this new location, and erases (or frees) the previous block of memory.\n\n\n### Capacity vs Length\n\nWhen we talk about dynamic arrays, we have two similar concepts that\nare essential to how a dynamic array works under the hood.\nThese concepts are *capacity* and *length*. In some contexts, especially\nin C++, *length* is also called *size*.\n\nAlthough they look similar, these concepts represent different things\nin the context of dynamic arrays. 
*Capacity* is the number of items (or elements)\nthat your dynamic array can currently hold without the need to allocate more memory.\n\nIn contrast, the *length* refers to how many elements in the array\nare currently being used, or, in other words, how many elements in this array\nyou have assigned a value to. Every dynamic array works around\na block of allocated memory that represents an array with total capacity of $n$ elements,\nbut only a portion of these $n$ elements are being used most of the time. This portion\nof $n$ is the *length* of the array. So every time you append a new value\nto the array, you are incrementing its *length* by one.\n\nThis means that a dynamic array usually works with an extra margin, or, an extra space\nwhich is currently empty, but it is waiting and ready to be used. This \"extra space\"\nis essentially the difference between *capacity* and *length*. *Capacity* represents\nthe total number of elements that the array can hold without the need to re-allocate\nor re-expand the array, while the *length* represents how much of this capacity\nis currently being used to hold/store values.\n\n@fig-capacity-length presents this idea visually. Notice that, at first,\nthe capacity of the array is greater than the length of the array.\nSo, the dynamic array has extra space that is currently empty, but it\nis ready to receive a value to be stored.\n\n![Difference between capacity and length in a dynamic array](./../Figures/dynamic-array.png){#fig-capacity-length}\n\nWe can also see at @fig-capacity-length that, when *length* and *capacity* are equal, it means that the array has no space left.\nWe have reached the limit of our capacity, and because of that, if we want to store more values\nin this array, we need to expand it. We need to get a bigger space that can hold more values\nthan we currently have.\n\nA dynamic array works by expanding the underlying array whenever the *length* becomes equal\nto the *capacity* of the array. 
It basically allocates a new contiguous block of memory that is bigger\nthan the previous one, then, it copies all values that are currently being stored to this new\nlocation (i.e. this new block of memory), then, it frees the previous block of\nmemory. At the end of this process, the new underlying array has a bigger *capacity*, and, therefore,\nthe *length* becomes once again smaller than the *capacity* of the array.\n\nThis is the cycle of a dynamic array. Notice that, throughout this cycle, the *capacity* is always\neither equal to or higher than the *length* of the array. If you have an `ArrayList` object, let's suppose\nyou named it `buffer`, you can check the current capacity of your array by accessing the `capacity`\nattribute of your `ArrayList` object, while the current *length* of it is available through the `items.len`\nattribute of your `ArrayList` object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// Check capacity\nbuffer.capacity;\n// Check length\nbuffer.items.len;\n```\n:::\n\n\n\n\n### Creating an `ArrayList` object\n\nIn order to use `ArrayList`, you must provide an allocator object to it.\nRemember, Zig does not have a default memory allocator. And as I described at @sec-allocators, all memory\nallocations must be done by allocator objects that you define, that\nyou have control over. In our example here, I'm going to use\na general purpose allocator, but you can use any other allocator\nof your preference.\n\nWhen you initialize an `ArrayList` object, you must provide the data type of the elements of\nthe array. In other words, this defines the type of data that this array (or container) will\nstore. Therefore, if I provide the `u8` type to it, then, I will create a dynamic\narray of `u8` values. However, if I provide a struct that I defined instead, like the struct `User`\nfrom @sec-structs-and-oop, then, a dynamic array of `User` values\nwill be created. 
In the example below, with the expression `ArrayList(u8)` we\nare creating a dynamic array of `u8` values.\n\nAfter you provide the data type of the elements of the array, you can initialize\nan `ArrayList` object by either using the `init()` or the `initCapacity()` method.\nThe former method receives only the allocator object\nas input, while the latter method receives both the allocator object and a capacity number as inputs.\nWith the latter method, you not only initialize the struct, but you\nalso set the starting capacity of the allocated array.\n\nUsing the `initCapacity()` method is the preferred way to initialize your dynamic array,\nbecause reallocations, or, in other words, the process of expanding the capacity of the array,\nare costly operations. You should take any possible opportunity to avoid reallocations in\nyour array. If you know how much space your array needs to occupy at the beginning,\nyou should always use `initCapacity()` to create your dynamic array.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n```\n:::\n\n\n\n\n\nIn the example above, the `buffer` object starts with a capacity of 100 elements (and a length of zero). If this\n`buffer` object needs more space to accommodate more elements during the runtime of your program, the `ArrayList`\ninternals will perform the necessary actions for you automatically.\nAlso notice the `deinit()` method being used to destroy the `buffer` object at the\nend of the current scope, by freeing all the memory that was allocated for the dynamic\narray stored in this `buffer` object.\n\n\n### Adding new elements to the array\n\nNow that we have created our dynamic array, we can start to use it. You can append (a.k.a \"add\")\nnew values to this array by using the `append()` method. 
This method works the same way\nas the `append()` method from a Python list, or, the `emplace_back()` method from `std::vector` of C++.\nYou provide a single value to this method, and the method appends this value to the array.\n\nYou can also use the `appendSlice()` method to append multiple values at once. You provide\na slice (slices were described at @sec-arrays) to this method, and the method adds all values present\nin this slice to your dynamic array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\ntry buffer.append('H');\ntry buffer.append('e');\ntry buffer.append('l');\ntry buffer.append('l');\ntry buffer.append('o');\ntry buffer.appendSlice(\" World!\");\n```\n:::\n\n\n\n\n### Removing elements from the array {#sec-dynamic-array-remove}\n\nYou can use the `pop()` method to \"pop\" or remove\nthe last element in the array. It is worth noting that this method\ndoes not change the capacity of the array. It just deletes or erases\nthe last value stored in the array.\n\nAlso, this method returns the value that got deleted. That is, you can\nuse this method to both get the last value in the array, and also, remove\nit from the array. It is a \"get and remove value\" type of method.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst exclamation_mark = buffer.pop();\n```\n:::\n\n\n\n\nNow, if you want to remove specific elements from specific positions\nof your array, you can use the `orderedRemove()` method from your\n`ArrayList` object. With this method, you can provide an index as input,\nthen, the method will delete the value that is at this index in the array.\nThis effectively reduces the *length* of the array every time you execute\nan `orderedRemove()` operation.\n\nIn the example below, we first create an `ArrayList` object, and we fill it\nwith numbers. Then, we use `orderedRemove()` to remove the value at\nindex 3 in the array, two consecutive times.\n\nAlso, notice that we are assigning the result of `orderedRemove()` to the\nunderscore character. 
So we are discarding the result value of this method.\nAs the result value, the `orderedRemove()` method returns the value that\ngot deleted, in a similar style to the `pop()` method.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 100);\ndefer buffer.deinit();\n\nfor (0..10) |i| {\n const index: u8 = @intCast(i);\n try buffer.append(index);\n}\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\n_ = buffer.orderedRemove(3);\n_ = buffer.orderedRemove(3);\n\nstd.debug.print(\n \"{any}\\n\", .{buffer.items}\n);\nstd.debug.print(\n \"{any}\\n\", .{buffer.items.len}\n);\n```\n:::\n\n\n\n\n```\n{ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }\n{ 0, 1, 2, 5, 6, 7, 8, 9 }\n8\n```\n\nOne key characteristic of `orderedRemove()` is that it preserves the order\nof the values in the array. So, it deletes the value that you asked it to\nremove, but it also makes sure that the order of the values that remain in the array\nstays the same as before.\n\nNow, if you don't care about the order of the values, for example, maybe you want to treat\nyour dynamic array as a set of values, like the `std::unordered_set`\nstructure from C++, you can use the `swapRemove()` method instead. This method\nworks similarly to the `orderedRemove()` method. You give an index to this\nmethod, then, it deletes the value that is at this index in the array.\nBut this method does not preserve the original order of the values that remain\nin the array. 
As a result, `swapRemove()` is, in general, faster than `orderedRemove()`, because it fills the gap by moving the last element of the array into the removed position, instead of shifting every element that comes after it.\n\n\n### Inserting elements at specific indexes\n\nWhen you need to insert values in the middle of your array,\ninstead of just appending them to the end of the array, you need to use\nthe `insert()` and `insertSlice()` methods, instead of\nthe `append()` and `appendSlice()` methods.\n\nThese two methods work very similarly to `insert()` and `insert_range()`\nfrom the C++ vector class. You provide an index to these methods,\nand they insert the values that you provide at that index in the array.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar gpa = std.heap.GeneralPurposeAllocator(.{}){};\nconst allocator = gpa.allocator();\nvar buffer = try std.ArrayList(u8)\n .initCapacity(allocator, 10);\ndefer buffer.deinit();\n\ntry buffer.appendSlice(\"My Pedro\");\ntry buffer.insert(4, '3');\ntry buffer.insertSlice(2, \" name\");\nfor (buffer.items) |char| {\n try stdout.print(\"{c}\", .{char});\n}\n```\n:::\n\n\n\n\n```\nMy name P3edro\n```\n\n\n### Conclusion\n\nIf you miss some other method, I recommend\nthat you read the [official documentation for the `ArrayListAligned`](https://ziglang.org/documentation/master/std/#std.array_list.ArrayListAligned)[^zig-array2]\nstruct, which describes most of the methods available\nthrough the `ArrayList` object.\n\nYou will notice that there are many other methods on this page that\nI did not describe here, and I recommend that you explore these methods\nand understand how they work.\n\n[^zig-array2]: \n\n\n\n## Maps or HashTables {#sec-maps-hashtables}\n\nSome professionals know this type of data structure by different terms, like \"map\", \"hashmap\" or \"associative arrays\". But most professionals\nknow this structure by the name *hashtable*.\nMost programming languages have some implementation of a hashtable in their\nstandard libraries. 
Python has `dict()`, C++ has `std::unordered_map` (and the tree-based `std::map`), Rust\nhas `HashMap`, JavaScript has `Object()` and `Map()`,\nC# has `Hashtable()`, etc.\n\n\n\n### What is a hashtable?\n\nA hashtable is a data structure based on key-value pairs.\nYou provide a key and a value to this structure, then, the hashtable will store\nthe input value at a location that can be identified by the input\nkey that you provided.\nIt does that by using an underlying array and a hash function.\nThese two components are essential to how a hashtable works.\n\nUnder the hood, the hashtable contains an array. This array is where the values\nare stored, and the elements of this array are usually called *buckets*.\nSo the values that you provide to the hashtable are stored inside buckets,\nand you access each bucket by using an index.\n\nWhen you provide a key to a hashtable, it passes this key to the\nhash function. This hash function uses some sort of hashing algorithm to transform\nthis key into an index. This index is actually an array index. It is a position\nin the underlying array of the hashtable.\nThis is how a key identifies a specific position (or location) inside the hashtable\nstructure.\n\nSo you provide a key to the hashtable, and this key identifies a specific location\ninside the hashtable, then, the hashtable takes the input value that you provided,\nand stores this value in the location identified by the input key that you provided.\nYou could say that the key maps to the value stored in the hashtable. You find\nthe value by using the key that identifies the location where the value is stored.\n@fig-hashtable presents this process visually.\n\n\n![A diagram of a Hashtable. 
Source: Wikipedia, the free encyclopedia.](./../Figures/hashtable.svg){#fig-hashtable}\n\n\nThe operation described in the previous paragraph is normally called an *insertion* operation,\nbecause you are inserting new values into the hashtable.\nBut there are other types of operations in hashtables such as *delete* and *lookup*.\nDelete is self-describing: it is when you delete (or remove) a value from the hashtable.\nLookup corresponds to when you retrieve (or look at) a value that is stored in\nthe hashtable, by using the key that identifies the location where this value is stored.\n\nSometimes, instead of storing the values directly, the underlying array of the hashtable might be an array of pointers,\ni.e. the buckets of the array store pointers that point to the values,\nor it may be an array of linked lists.\nThese cases are common in hashtables that handle the \"collisions\" that may arise from the hash function.\n\nA \"collision\" happens when two different keys point to the same location (i.e. to the same index)\nin the underlying array of the hashtable. This might happen depending on the characteristics of the hash function\nthat is being used in the hashtable. Some implementations of the hashtable will actively deal with collisions,\nmeaning that they will handle this case in some way. For example, the hashtable\nmight turn every bucket into a linked list, because with a linked list you can store\nmultiple values in a single bucket.\n\nThere are different techniques to handle collisions in hashtables, which I will not describe\nin this book, because they are outside our main scope here. 
But you can find a good description of\nsome of the most common techniques at the Wikipedia page of hashtables [@wikipedia_hashtables].\n\n\n### Hashtables in Zig {#sec-hashmap}\n\nThe Zig Standard Library provides different implementations of a hashtable,\nlike the struct `HashMap`. Each implementation has its own pros and cons, which we will\ndiscuss later on. `HashMap` and its helper functions are available through the `std.hash_map` module, while the array-backed variants (such as `ArrayHashMap`) live in `std.array_hash_map`.\n\nThe `HashMap` struct is a general-purpose hashtable,\nwhich has very fast operations (lookup, insertion, delete), and also,\nquite high load factors for low memory usage. You can create and provide a context object\nto the `HashMap` constructor. This context object allows you to tailor\nthe behaviour of the hashtable itself, because you can\nprovide a hash function implementation to be used by the hashtable\nthrough this context object.\n\nBut let's not worry about this context object now, because it is meant to be used\nby \"experts in the field of hashtables\". Since we are most likely not\nexperts in this field, we are going to take the easy way to create\na hashtable, which is to use the `AutoHashMap()` function.\n\n\nThis `AutoHashMap()` function is essentially a \"create a hashtable object that uses the default settings\"\ntype of function. It chooses a context object, and, therefore, a hash function implementation,\nautomatically for you. This function receives two data types as input: the first is the data type of the keys\nthat will be used in this hashtable, while the second is the data type of the data that will be\nstored inside the hashtable, that is, the data type of the values to be stored.\n\nIn the example below, we are providing the data type `u32` in the first argument, and `u16` in the second argument of this\nfunction. 
It means that we are going to use `u32` values as keys in this hashtable, while `u16` values are the actual values\nthat are going to be stored in this hashtable.\nAs a result, the `hash_table` object is a `HashMap` object\nthat uses the default context and the default load factor.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n std.debug.print(\n \"Value at key 50050: {d}\\n\",\n .{hash_table.get(50050).?}\n );\n\n if (hash_table.remove(57709)) {\n std.debug.print(\n \"Value at key 57709 successfully removed!\\n\",\n .{}\n );\n }\n std.debug.print(\n \"N of values stored: {d}\\n\",\n .{hash_table.count()}\n );\n}\n```\n:::\n\n\n\n\n```\nN of values stored: 3\nValue at key 50050: 55\nValue at key 57709 successfully removed!\nN of values stored: 2\n```\n\nYou can add/put new values into the hashtable by using the `put()` method. The first argument\nis the key to be used, and the second argument is the actual value that you want to store inside\nthe hashtable. In the example above, we first add the value 89 using the key 54321, next, we add\nthe value 55 using the key 50050, etc.\n\nNotice that we used the method `count()` to see how many values are currently stored in the\nhashtable. After that, we also used the `get()` method to access (or look at) the value stored in\nthe position identified by the key 50050. 
The output of this `get()` method is an optional value,\nand that is why we use the `.?` operator at the end to unwrap it and get access to the actual value.\n\nAlso notice that we can remove (or delete) values from a hashtable by using the `remove()` method.\nYou provide the key that identifies the value that you want to delete, then, the method will\ndelete this value and return a `true` value as output. This `true` value essentially tells us\nthat the method successfully deleted the value.\n\nBut this delete operation might not always be successful. For example, you might provide the wrong\nkey to this method. I mean, maybe you provide\n(either intentionally or unintentionally) a key that is not currently stored in the hashtable,\ni.e. a key that does not identify any value.\nIn this case, the `remove()` method would return a `false` value.\n\n\n\n### Iterating through the hashtable\n\nIterating through the keys and values that are currently being stored in\nthe hashtable is a very common need.\nYou can do that in Zig by using an iterator object that can iterate\nthrough the elements of your hashtable object.\n\nThis iterator object works like any other iterator object that you would\nfind in languages such as C++ and Rust. It is basically a pointer object\nthat points to some value in the container, and has a `next()` method\nthat you can use to navigate (or iterate) through the next values in the\ncontainer.\n\nYou can create such an iterator object by using the `iterator()` method of the hashtable object.\nThis method returns an iterator object, from which you can use the `next()` method in conjunction\nwith a while loop to iterate through the elements of your hashtable. 
The `next()` method returns an optional\n`Entry` value, and therefore, you must unwrap this optional value to get the actual `Entry` value\nfrom which you can access the key and also the value identified by this key.\n\nWith this `Entry` value at hand, you can access the key of this current entry by using the `key_ptr`\nattribute and dereferencing the pointer that lives inside of it, while the value identified by this\nkey is accessed through the `value_ptr` attribute instead, which is also a pointer to be dereferenced.\nThe code example below demonstrates the use of these elements:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst AutoHashMap = std.hash_map.AutoHashMap;\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var hash_table = AutoHashMap(u32, u16).init(allocator);\n defer hash_table.deinit();\n\n try hash_table.put(54321, 89);\n try hash_table.put(50050, 55);\n try hash_table.put(57709, 41);\n\n var it = hash_table.iterator();\n while (it.next()) |kv| {\n // Access the current key\n std.debug.print(\"Key: {d} | \", .{kv.key_ptr.*});\n // Access the current value\n std.debug.print(\"Value: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n```\nKey: 54321 | Value: 89\nKey: 50050 | Value: 55\nKey: 57709 | Value: 41\n```\n\n\nIf you want to iterate through only the values or the keys of your hashtable,\nyou can create a key iterator or a value iterator object. 
These are also iterator\nobjects, which have the same `next()` method that you can use to iterate through the\nsequence of values.\n\nKey iterators are created from the `keyIterator()` method of your\nhashtable object, while value iterators are created from the `valueIterator()` method.\nAll you have to do is to unwrap the value from the `next()` method and dereference it\ndirectly to access the key or value that you are iterating over.\nThe code example below demonstrates what this looks like for a key iterator,\nbut you can apply the same logic to a value iterator.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar kit = hash_table.keyIterator();\nwhile (kit.next()) |key| {\n std.debug.print(\"Key: {d}\\n\", .{key.*});\n}\n```\n:::\n\n\n\n\n```\nKey: 54321\nKey: 50050\nKey: 57709\n```\n\n\n### The `ArrayHashMap` hashtable {#sec-array-map}\n\nIf you need to iterate through the elements of your hashtable constantly,\nyou might want to use the `ArrayHashMap` struct for your specific case,\ninstead of going with the usual and general-purpose `HashMap` struct.\n\nThe `ArrayHashMap` struct creates a hashtable that is faster to iterate over.\nThat is why this specific type of hashtable might be valuable to you.\nSome other properties of an `ArrayHashMap` hashtable are:\n\n- the order of insertion is preserved. So the order of the values you find while iterating through this hashtable\nis actually the order in which these values were inserted in the hashtable.\n\n- the key-value pairs are stored sequentially, one after another.\n\n\nYou can create an `ArrayHashMap` object by using, once again, a helper function that\nautomatically chooses a hash function implementation for you. This is the\n`AutoArrayHashMap()` function, which works very similarly to the `AutoHashMap()`\nfunction that we presented at @sec-hashmap.\n\nYou provide two data types to this function. 
The first is the data type of the keys that will be\nused in this hashtable, and the second is the data type of the values that will be stored in\nthis hashtable.\n\nAn `ArrayHashMap` object has essentially the same methods as the `HashMap` struct.\nSo you can insert new values into the hashtable by using the `put()` method, and you can look up (or get)\na value from the hashtable by using the `get()` method. But the `remove()` method is not available\nin this specific type of hashtable.\n\nIn order to delete values from the hashtable, you would use the same methods that you find in\nan `ArrayList` object, i.e. a dynamic array. I presented these methods at @sec-dynamic-array-remove,\nwhich are the `swapRemove()` and `orderedRemove()` methods. These methods have the same meaning here, or,\nthe same effect, that they have in an `ArrayList` object.\n\nThis means that, with `swapRemove()`, you remove the value from the hashtable, but you do not preserve\nthe order in which the values were inserted into the structure, while `orderedRemove()`\nretains the insertion order of these values.\n\nBut instead of providing an index as input to `swapRemove()` or `orderedRemove()`, like I described\nat @sec-dynamic-array-remove, these methods in an `ArrayHashMap` take a key as input, like\nthe `remove()` method from a `HashMap` object. 
If you want to provide an index as input, instead\nof a key, you should use the `swapRemoveAt()` and `orderedRemoveAt()` methods.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar hash_table = AutoArrayHashMap(u32, u16)\n .init(allocator);\ndefer hash_table.deinit();\n```\n:::\n\n\n\n\n\n\n### The `StringHashMap` hashtable {#sec-string-hash-map}\n\nOne thing that you will notice in the other two types of hashtables that I\npresented in the last sections is that neither of them, when created through the `AutoHashMap()` or `AutoArrayHashMap()` helpers, accepts a slice data type\nas the type of its keys.\nWhat this means is that you cannot use a slice value to represent a key in\nthese types of hashtable.\n\nThe most obvious consequence of this is that you cannot use strings as keys\nin these hashtables. But it is extremely common to use string values as keys\nin hashtables.\n\nTake this very simple JavaScript code snippet as an example. We are creating\na simple hashtable object named `people`. Then, we add a new entry to this\nhashtable, which is identified by the string `'Pedro'`. This string is the\nkey in this case, while the object containing different personal information such as\nage, height and city, is the value to be stored in the hashtable.\n\n```js\nvar people = new Object();\npeople['Pedro'] = {\n 'age': 25,\n 'height': 1.67,\n 'city': 'Belo Horizonte'\n};\n```\n\nThis pattern of using strings as keys is very common in\nall sorts of situations. That is why the Zig Standard Library offers a\nspecific type of hashtable for this purpose, which is created through the `StringHashMap()` function.\nThis function creates a hashtable that uses strings as keys. 
The only input of this\nfunction is the data type of the values that will be stored in this hashtable.\n\nIn the example below, I'm creating a hashtable to store the ages of different people.\nThe keys to be used in this hashtable are the names of each person, while the value stored in the\nhashtable is the age of the person identified by the key.\n\nThat is why I provide the `u8` data type (which is the data type used by the age values) as input to this `StringHashMap()` function.\nAs a result, it creates a hashtable that uses string values as keys and that stores\n`u8` values in it. Notice that an allocator object is provided to the `init()` method of the\nresulting object from the `StringHashMap()` function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var ages = std.StringHashMap(u8).init(allocator);\n defer ages.deinit();\n\n try ages.put(\"Pedro\", 25);\n try ages.put(\"Matheus\", 21);\n try ages.put(\"Abgail\", 42);\n\n var it = ages.iterator();\n while (it.next()) |kv| {\n std.debug.print(\"Key: {s} | \", .{kv.key_ptr.*});\n std.debug.print(\"Age: {d}\\n\", .{kv.value_ptr.*});\n }\n}\n```\n:::\n\n\n\n\n```\nKey: Pedro | Age: 25\nKey: Abgail | Age: 42\nKey: Matheus | Age: 21\n```\n\n\n### The `StringArrayHashMap` hashtable\n\nThe Zig Standard Library also provides a type of hashtable that mixes the pros and cons of the\ntypes of hashtables that were presented in the previous two sections. 
That is, a hashtable\nthat uses strings as keys, but also has the advantages of the `ArrayHashMap` struct.\nIn other words, you can have a hashtable that is fast to iterate over,\nthat preserves insertion order, and also, that uses strings as keys.\n\nYou can create this type of hashtable by using the `StringArrayHashMap()` function.\nThis function accepts a data type as input, which is the data type of the values that are\ngoing to be stored inside this hashtable, in the same style as the function presented\nin @sec-string-hash-map.\n\nYou can insert new values into this hashtable by using the same `put()` method that\nI presented in @sec-string-hash-map. And you can also get values from the hashtable\nby using the same `get()` method that I presented in previous sections.\nLike its `ArrayHashMap` sibling, to delete values from this specific type of hashtable,\nwe also use the `orderedRemove()` and `swapRemove()` methods, with the same effects that\nI described in @sec-array-map.\n\nIf we take the code example that was presented in @sec-string-hash-map, we can\nbuild the same hashtable with `StringArrayHashMap()`. The only visible difference is that\nthe entries are then printed in insertion order. All we have to do\nis to change the use of `StringHashMap()` to `StringArrayHashMap()` at the\nfifth line in this code example. It would change to this:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar ages = std.StringArrayHashMap(u8).init(allocator);\n```\n:::\n\n\n\n\n\n\n## Linked lists\n\nThe Zig Standard Library provides implementations of both singly and doubly linked lists.\nA linked list is a linear data structure that looks like a chain, or a rope.\nThe main advantage of this data structure is that you normally have fast\ninsertion and deletion operations. But, as a disadvantage, iterating through\nthis data structure is usually not as fast as iterating through an array.\n\nThe idea behind a linked list is to build a structure that consists of a series of nodes\nconnected to each other by pointers. 
This means that linked lists are usually not contiguous\nin memory: one node might be \"south\" in memory, the next one might be \"north\",\nand the one after that might be somewhere to the \"left\". In short, the nodes can be anywhere.\n\nIn @fig-linked-list we can see a diagram of a singly linked list. Notice that we begin with\na first node. This first node is usually called \"the head of the linked list\". Then, from this\nfirst node we uncover the remaining nodes in the structure by following the locations pointed\nto by the pointers.\n\nEvery node has two things in it: the value that is stored in the current node,\nand a pointer. This pointer points to the next node in the list. If this pointer\nis null, it means that we have reached the end of our linked list.\n\n![A diagram of a singly linked list.](./../Figures/linked-list.png){#fig-linked-list}\n\n\nIn @fig-linked-list2 we can see a diagram of a doubly linked list. The only thing that really\nchanges is that every node in the linked list has both a pointer to the previous node\nand a pointer to the next node. So every node now has two pointers in it. These are\nusually called the `prev` (for \"previous\") and `next` (for \"next\") pointers of the node.\n\nIn the singly linked list example, we had only one pointer in each node, and this\npointer was always pointing to the next node in the sequence. In other words, singly linked lists\nnormally have only the `next` pointer in them.\n\n![A diagram of a doubly linked list.](./../Figures/doubly-linked-list.png){#fig-linked-list2}\n\n\n\nLinked lists are available in Zig through the functions `SinglyLinkedList()` and\n`DoublyLinkedList()`, for \"singly linked lists\" and \"doubly linked lists\", respectively. 
These functions are\nactually generic functions, which we are going to talk more about in @sec-generic-fun.\n\nFor now, just understand that, in order to create a linked list object,\nwe begin by providing a data type to these functions. This data type defines\nthe type of data that this linked list will store. In the example below,\nwe are creating a singly linked list capable of storing `u32` values.\nSo each node in this linked list will store a `u32` value.\n\nBoth the `SinglyLinkedList()` and `DoublyLinkedList()` functions return a type, i.e. a struct definition, as a result. This means that\nthe object `Lu32` in the example below is actually a type definition, or a struct definition. It defines\nthe type \"singly linked list of `u32` values\".\n\nSo now that we have the definition of the struct, we have to instantiate a `Lu32` object.\nWe normally instantiate struct objects in Zig by using an `init()` method.\nBut in this case, we are instantiating the struct directly, by using an empty\n`struct` literal, in the expression `Lu32{}`. This works because every field of the list struct\nhas a default value (for example, the pointer to the first node starts as `null`).\n\nIn this example, we first create multiple node objects, and after we create them,\nwe start to insert and connect these nodes to build the linked list, using the\n`prepend()` and `insertAfter()` methods. Notice that `prepend()`\nis a method of the linked list object, while `insertAfter()` is a method\nof the node objects.\n\nIn essence, the `prepend()` method inserts a node at the beginning of the linked\nlist. In other words, the node that you provide to this method becomes the new\n\"head node\" of the linked list. 
It becomes the first node in the list (see @fig-linked-list).\n\nOn the other hand, the `insertAfter()` method is used to connect two nodes together.\nWhen you provide a node to this method, it creates a pointer to this input node\nand stores this pointer in the current node, i.e. the node on which the method was called.\nIn other words, this method creates the pointer that connects these two nodes together\nand stores it in the `next` attribute of the current node. The node that previously followed\nthe current node (if any) is not lost: it becomes the `next` node of the input node.\nThat is why, in the example below, calling `two.insertAfter(&three)` on the list `{1, 2, 5}`\nproduces the list `{1, 2, 3, 5}`.\n\nSince doubly linked lists have both a `next` and a `prev` pointer in each node,\nreferring to the next and previous nodes in the sequence, respectively,\nas I described in @fig-linked-list2, a doubly linked list offers both an\n`insertBefore()` (for `prev`) and an `insertAfter()` (for `next`) method.\nBe aware that, in the type returned by `DoublyLinkedList()`, these are methods of the list object\n(which receive the current node and the input node as arguments), rather than methods of the node objects.\n\nThis means that, if we used a doubly linked list, we could use the `insertBefore()` method\nto store the pointer to the input node in the `prev` attribute of the current node. This would make the input\nnode the \"previous node\", i.e. the node before the current node. 
The `insertAfter()` method\nhas \"after\" in its name to indicate that this method puts the pointer to the input\nnode in the `next` attribute of the current node, and, as a result, the input node becomes\nthe \"next node\" of the current node.\n\nSince we are using a singly linked list in this example, we have only the `insertAfter()` method\navailable in the node objects that we create from our `Lu32` type.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst SinglyLinkedList = std.SinglyLinkedList;\nconst Lu32 = SinglyLinkedList(u32);\n\npub fn main() !void {\n var list = Lu32{};\n var one = Lu32.Node{ .data = 1 };\n var two = Lu32.Node{ .data = 2 };\n var three = Lu32.Node{ .data = 3 };\n var four = Lu32.Node{ .data = 4 };\n var five = Lu32.Node{ .data = 5 };\n\n list.prepend(&two); // {2}\n two.insertAfter(&five); // {2, 5}\n list.prepend(&one); // {1, 2, 5}\n two.insertAfter(&three); // {1, 2, 3, 5}\n three.insertAfter(&four); // {1, 2, 3, 4, 5}\n}\n```\n:::\n\n\n\n\n\nThere are other methods available from the linked list object, depending on whether this object is\na singly linked list or a doubly linked list, that might be very useful for you, like:\n\n- `remove()` to remove a specific node from the linked list.\n- `popFirst()` to remove the first node from the linked list.\n- if singly linked list, `len()` to count how many nodes there are in the linked list.\n- if doubly linked list, check out the `len` attribute to see how many nodes there are in the linked list.\n- if doubly linked list, `pop()` and `popFirst()` to remove the last and first nodes from the linked list, respectively.\n- if doubly linked list, `append()` to add a new node to the end of the linked list (i.e. the inverse of `prepend()`).\n\n\n\n## Multi array structure\n\nThe Zig Standard Library also provides a data structure called `MultiArrayList()`. 
It is a different version of the dynamic array\nthat we introduced in @sec-dynamic-array. The difference between this structure and the `ArrayList()`\nthat we know from @sec-dynamic-array is that `MultiArrayList()` creates a separate dynamic array\nfor each field of the struct that you provide as input.\n\nConsider the following code example. We create a new custom struct called `Person`. This\nstruct contains three different data members, or three different fields. As a consequence,\nwhen we provide this `Person` data type as input to `MultiArrayList()`, this\ncreates a \"struct of three different arrays\" called `PersonArray`. In other words,\nthis `PersonArray` is a struct that contains three internal dynamic arrays,\none array for each field found in the `Person` struct definition.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Person = struct {\n name: []const u8,\n age: u8,\n height: f32,\n};\nconst PersonArray = std.MultiArrayList(Person);\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n var people = PersonArray{};\n defer people.deinit(allocator);\n\n try people.append(allocator, .{\n .name = \"Auguste\", .age = 15, .height = 1.54\n });\n try people.append(allocator, .{\n .name = \"Elena\", .age = 26, .height = 1.65\n });\n try people.append(allocator, .{\n .name = \"Michael\", .age = 64, .height = 1.87\n });\n}\n```\n:::\n\n\n\n\nIn other words, instead of creating an array of \"persons\", the `MultiArrayList()` function\ncreates a \"struct of arrays\". Each data member of this struct is a different array that stores\nthe values of a specific field from the `Person` struct values that were added (or appended) to this \"struct of arrays\".\nOne important detail is that each of these separate internal arrays stored inside `PersonArray`\nis a dynamic array. 
This means that these arrays can grow in capacity automatically, as needed, to accommodate\nmore values.\n\n@fig-multi-array presents a diagram that describes the `PersonArray` struct\nthat we created in the previous code example. Notice that the values of the data members\nof each of the three `Person` values that we appended to the `PersonArray` object\nare scattered across three different internal arrays of the `PersonArray` object.\n\n![A diagram of the `PersonArray` struct.](./../Figures/multi-array.png){#fig-multi-array}\n\nYou can easily access each of these arrays separately, and iterate over the values of each array.\nFor that, you need to call the `items()` method of the `PersonArray` object, and provide, as input\nto this method, the name of the field that you want to iterate over.\nIf you want to iterate through the `.age` array, for example, you need to call `items(.age)` on\nthe `PersonArray` object, like in the example below:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst stdout = std.io.getStdOut().writer();\nfor (people.items(.age)) |*age| {\n try stdout.print(\"Age: {d}\\n\", .{age.*});\n}\n```\n:::\n\n\n\n\n```\nAge: 15\nAge: 26\nAge: 64\n```\n\n\nIn the above example, we are iterating over the values of the `.age` array, that is,\nthe internal array of the `PersonArray` object that contains the values of the `age`\ndata member of the `Person` values that were added to the multi array struct.\n\nIn this example we are calling the `items()` method directly on the `PersonArray`\nobject. 
However, in most situations it is recommended to call this `items()` method\non a \"slice object\", which you can create with the `slice()` method.\nThe reason for this is that `items()`, when called on the multi array object, has to compute\nthis slice internally on every call. So, if you call `items()` multiple times, you get better performance\nif you use a slice object.\n\nIn other words, if you are planning to access only one of the\ninternal arrays from your \"multi array struct\", it is fine to call `items()` directly\non the multi array object. But if you need to access many of the internal arrays\nfrom your \"multi array struct\", then you will likely need to call `items()` more\nthan once, and, in such circumstances, it is better to call `items()` through a slice object.\nThe example below demonstrates the use of such an object:\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst slice = people.slice();\nfor (slice.items(.age)) |*age| {\n age.* += 10;\n}\nfor (slice.items(.name), slice.items(.age)) |*n, *a| {\n try stdout.print(\n \"Name: {s}, Age: {d}\\n\", .{n.*, a.*}\n );\n}\n```\n:::\n\n\n\n\n```\nName: Auguste, Age: 25\nName: Elena, Age: 36\nName: Michael, Age: 74\n```\n\n\n## Conclusion\n\nThere are many other data structures that I did not present here.\nBut you can check them out on the official Zig Standard Library documentation page.\nActually, when you visit the [homepage of the documentation](https://ziglang.org/documentation/master/std/#)[^home], the first thing\nthat appears on this page is a list of types and data structures.\n\n\nThis list shows the many different data structures available in\nthe Zig Standard Library. 
There are some very specific structures in this list, like a\n[`BoundedArray` struct](https://ziglang.org/documentation/master/std/#std.bounded_array.BoundedArray)[^bounded],\nbut there are also some more general structures, such as a\n[`PriorityQueue` struct](https://ziglang.org/documentation/master/std/#std.priority_queue.PriorityQueue)[^priority].\n\n\n[^home]: \n[^priority]: .\n[^bounded]: \n\n\n\n\n\n\n",
+ "supporting": [
+ "09-data-structures_files"
+ ],
"filters": [
"rmarkdown/pagebreak.lua"
],
diff --git a/_freeze/Chapters/09-error-handling/execute-results/html.json b/_freeze/Chapters/09-error-handling/execute-results/html.json
index 86d3ffd..13ec78a 100644
--- a/_freeze/Chapters/09-error-handling/execute-results/html.json
+++ b/_freeze/Chapters/09-error-handling/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "fc54ccd44d4c7901ac1b1d600f18c891",
+ "hash": "f7486aa021063a7aea617a7b10de34f9",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Error handling and unions {#sec-error-handling}\n\nIn this chapter, I want to discuss how error handling is done in Zig.\nWe already briefly learned about one of the available strategies to handle errors in Zig,\nwhich is the `try` keyword presented at @sec-main-file. But we still haven't learned about\nthe other methods, such as the `catch` keyword.\nI also want to discuss in this chapter how enum types are created in Zig.\n\n## Learning more about errors in Zig\n\nBefore we get into how error handling is done, we need to learn more about what errors are in Zig.\nAn error is actually a value in Zig [@zigoverview]. In other words, when an error occurs inside your Zig program,\nit means that somewhere in your Zig codebase, an error value is being generated.\nAn error value is similar to any integer value that you create in your Zig code.\nYou can take an error value and pass it as input to a function,\nand you can also cast (or coerce) it into a different type of error value.\n\nThis have some similarities with exceptions in C++ and Python.\nBecause in C++ and Python, when an exception happens inside a `try` block,\nyou can use a `catch` block (in C++) or an `except` block (in Python)\nto capture the exception produced in the `try` block,\nand pass it to functions as an input.\n\n\nAlthough they are normal values as any other, you cannot ignore error values in your Zig code. Meaning that, if an error\nvalue appears somewhere in your source code, this error value must be explicitly handled in some way.\nThis also means that you cannot discard error values by assigning them to a underscore,\nas you could do with normal values and objects.\n\nTake the source code below as an example. Here we are trying to open a file that does not exist\nin my computer, and as a result, an obvious error value of `FileNotFound` is returned from the `openFile()`\nfunction. 
But because I'm assigning the result of this function to an underscore, I end up\ntrying to discard an error value.\n\nThe `zig` compiler detects this mistake, and raises a compile\nerror telling me that I'm trying to discard an error value.\nIt also adds a note message that suggests the use of `try`,\n`catch` or an if statement to explicitly handle this error value\nThis note is reinforcing that every possible error value must be explicitly handled in Zig.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\n_ = dir.openFile(\"doesnt_exist.txt\", .{});\n```\n:::\n\n\n\n\n```\nt.zig:8:17: error: error set is discarded\nt.zig:8:17: note: consider using 'try', 'catch', or 'if'\n```\n\n### Returning errors from functions\n\nAs we described at @sec-main-file, when we have a function that might return an error\nvalue, this function normally includes an exclamation mark (`!`) in it's return type\nannotation. The presence of this exclamation mark indicates that this function might\nreturn an error value as result, and, the `zig` compiler forces you to always handle explicitly\nthe case of this function returning an error value.\n\nTake a look at the `print_name()` function below. This function might return an error in the `stdout.print()` function call,\nand, as a consequence, it's return type (`!void`) includes an exclamation mark in it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_name() !void {\n const stdout = std.getStdOut().writer();\n try stdout.print(\"My name is Pedro!\", .{});\n}\n```\n:::\n\n\n\n\nIn the example above, we are using the exclamation mark to tell the `zig` compiler\nthat this function might return some error. But which error exactly is returned from\nthis function? For now, we are not specifying a specific error value. 
We only\nknown for now that some error value (whatever it is) might be returned.\n\nBut in fact, you can (if you want to) specify clearly which exact error values\nmight be returned from this function. There are lot of examples of\nthis in the Zig Standard Library. Take this `fill()` function from\nthe `http.Client` module as an example. This function returns\neither a error value of type `ReadError`, or `void`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn fill(conn: *Connection) ReadError!void {\n // The body of this function ...\n}\n```\n:::\n\n\n\n\nThis idea of specifying the exact error values that you expect to be returned\nfrom the function is interesting. Because they automatically become some sort of documentation\nof your function, and also, it allows the `zig` compiler to perform some extra checks over\nyour code. Because it can check if there is any other type of error value\nthat is being generated inside your function, and, that it is not being accounted\nfor in this return type annotation.\n\nAnyway, you can list the types of errors that can be returned from the function\nby listing them on the left side of the exclamation mark. While the valid values\nstay on the right side of the exclamation mark. So the syntax format become:\n\n```\n!\n```\n\n### Error sets\n\nBut what about when we have a single function that might return different types of errors?\nWhen you have such a function, you can list\nall of these different types of errors that can be returned from this function,\nthrough a structure in Zig that we call of *error set*.\n\nAn error set is a special case of an union type.\nIt essentially is an union that contains error values in it.\nNot all programming languages have a notion of an \"union object\".\nBut in summary, an union is just a list of the options that\nan object can be. 
For example, a union of `x`, `y` and `z`, means that\nan object can be either of type `x`, or type `y` or type `z`.\n\nWe are going to talk in more depth about unions at @sec-unions.\nBut you can write an error set by writing the keyword `error` before\na pair of curly braces, then you list the error values that can be\nreturned from the function inside this pair of curly braces.\n\nTake the `resolvePath()` function below as an example, which comes from the\n`introspect.zig` module of the Zig Standard Library. We can see in it's return type annotation, that this\nfunction return either: 1) a valid slice of `u8` values (`[]u8`); or, 2) one of the three different\ntypes of error values listed inside the error set (`OutOfMemory`, `Unexpected`, etc.).\nThis is an example of use of an error set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn resolvePath(\n ally: mem.Allocator,\n p: []const u8,\n) error{\n OutOfMemory,\n CurrentWorkingDirectoryUnlinked,\n Unexpected,\n}![]u8 {\n // The body of the function ...\n}\n```\n:::\n\n\n\n\n\nThis is a valid way of annotating the return value of a Zig function. But, if you navigate through\nthe modules that composes the Zig Standard Library, you will notice that, for the majority of cases,\nthe programmers prefer to give a descriptive name to this error set, and then, use this name (or this \"label\")\nof the error set in the return type annotation, instead of using the error set directly.\n\nWe can see that in the `ReadError` error set that we showed earlier in the `fill()` function,\nwhich is defined in the `http.Client` module.\nSo yes, I presented the `ReadError` as if it was just a standard and single error value, but in fact,\nit is an error set defined in the `http.Client` module, and therefore, it actually represents\na set of different error values that might happen in the `fill()` and other functions.\n\n\nTake a look at the `ReadError` definition reproduced below. 
Notice that we are grouping all of these\ndifferent error values into a single object, and then, we use this object into the return type annotation of the functions.\nLike the `fill()` function that we showed earlier, or, the `readvDirect()` function from the same module,\nwhich is reproduced below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const ReadError = error{\n TlsFailure,\n TlsAlert,\n ConnectionTimedOut,\n ConnectionResetByPeer,\n UnexpectedReadFailure,\n EndOfStream,\n};\n// Some lines of code\npub fn readvDirect(\n conn: *Connection,\n buffers: []std.posix.iovec\n ) ReadError!usize {\n // The body of the function ...\n}\n```\n:::\n\n\n\n\nSo, an error set is just a convenient way of grouping a set of\npossible error values into a single object, or a single type of an error value.\n\n\n### Casting error values\n\nLet's suppose you have two different error sets, named `A` and `B`.\nIf error set `A` is a superset of error set `B`, then, you can cast (or coerce)\nerror values from `B` into error values of `A`.\n\nError sets are just a set of error values. So, if the error set `A`\ncontains all error values from the error set `B`, then `A`\nbecomes a superset of `B`. You could also say\nthat the error set `B` is a subset of error set `A`.\n\nThe example below demonstrates this idea. 
Because `A` contains all\nvalues from `B`, `A` is a superset of `B`.\nIn math notation, we would say that $A \\supset B$.\nAs a consequence, we can give an error value from `B` as input to the `cast()`\nfunction, and, implicitly cast this input into the same error value, but from the `A` set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst A = error{\n ConnectionTimeoutError,\n DatabaseNotFound,\n OutOfMemory,\n InvalidToken,\n};\nconst B = error {\n OutOfMemory,\n};\n\nfn cast(err: B) A {\n return err;\n}\n\ntest \"coerce error value\" {\n const error_value = cast(B.OutOfMemory);\n try std.testing.expect(\n error_value == A.OutOfMemory\n );\n}\n```\n:::\n\n\n\n\n\n## How to handle errors\n\nNow that we learned more about what errors are in Zig,\nlet's discuss the available strategies to handle these errors,\nwhich are:\n\n- `try` keyword;\n- `catch` keyword;\n- an if statement;\n- `errdefer` keyword;\n\n\n\n### What `try` means?\n\nAs I described over the previous sections, when we say that an expression might\nreturn an error, we are basically referring to an expression that have\na return type in the format `!T`.\nThe `!` indicates that this expression returns either an error value, or a value of type `T`.\n\nAt @sec-main-file, I presented the `try` keyword and where to use it.\nBut I did not talked about what exactly this keyword does to your code,\nor, in other words, I have not explained yet what `try` means in your code.\n\nIn essence, when you use the `try` keyword in an expression, you are telling\nthe `zig` compiler the following: \"Hey! Execute this expression for me,\nand, if this expression return an error, please, return this error for me\nand stop the execution of my program. 
But if this expression return a valid\nvalue, then, return this value, and move on\".\n\nIn other words, the `try` keyword is essentially, a strategy to enter in panic mode, and stop\nthe execution of your program in case an error occurs.\nWith the `try` keyword, you are telling the `zig` compiler, that stopping the execution\nof your program is the most reasonable strategy to take if an error occurs\nin that particular expression.\n\n### The `catch` keyword\n\nOk, now that we understand properly what `try` means, let's discuss `catch` now.\nOne important detail here, is that you can use `try` or `catch` to handle your errors,\nbut you **cannot use `try` and `catch` together**. In other words, `try` and `catch`\nare different and completely separate strategies in the Zig language.\n\nThis is uncommon, and different than what happens in other languages. Most\nprogramming languages that adopts the *try catch* pattern (such as C++, R, Python, Javascript, etc.), normally use\nthese two keywords in conjunction to form the complete logic to\nproperly handle the errors.\nAnyway, Zig tries a different approach in the *try catch* pattern.\n\nSo, we learned already about what `try` means, and we also known that both\n`try` and `catch` should be used alone, separate from each other. But\nwhat exactly `catch` do in Zig? With `catch`, we can construct a block of\nlogic to handle the error value, in case it happens in the current expression.\n\nLook at the code example below. Once again, we go back to the previous\nexample where we were trying to open a file that doesn't exist in my computer,\nbut this time, I use `catch` to actually implement a logic to handle the error, instead of\njust stopping the execution right away.\n\nMore specifically, in this example, I'm using a logger object to record some logs into\nthe system, before I return the error, and stops the execution of the program. 
For example,\nthis could be some part of the codebase of a complex system that I do not have full control over,\nand I want to record these logs before the program crashes, so that I can debug it later\n(e.g. maybe I cannot compile the full program, and properly debug it with a debugger. So, these logs might\nbe a valid strategy to surpass this barrier).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\nconst file = dir.openFile(\n \"doesnt_exist.txt\", .{}\n) catch |err| {\n logger.record_context();\n logger.log_error(err);\n return err;\n};\n```\n:::\n\n\n\n\n\nTherefore, we use `catch` to create a block of expressions that will handle the error.\nI can return the error value from this block of expressions, like I did in the above example,\nwhich, will make the program enter in panic mode, and, stop the execution.\nBut I could also, return a valid value from this block of code, which would\nbe stored in the `file` object.\n\nNotice that, instead of writing the keyword before the expression that might return the error,\nlike we do with `try`,\nwe write `catch` after the expression. We can open the pair of pipes (`|`),\nwhich captures the error value returned by the expression, and makes\nthis error value available in the scope of the `catch` block as the object named `err`.\nIn other words, because I wrote `|err|` in the code, I can access the error value\nreturned by the expression, by using the `err` object.\n\nAlthough this being the most common use of `catch`, you can also use this keyword\nto handle the error in a \"default value\" style. That is, if the expression returns\nan error, we use the default value instead. Otherwise, we use the valid value returned\nby the expression.\n\nThe Zig official language reference, provides a great example of this \"default value\"\nstrategy with `catch`. This example is reproduced below. Notice that we are trying to parse\nsome unsigned integer from a string object named `str`. 
In other words, this function\nis trying to transform an object of type `[]const u8` (i.e. an array of characters, a string, etc.)\ninto an object of type `u64`.\n\nBut this parsing process done by the function `parseU64()` may fail, resulting in a runtime error.\nThe `catch` keyword used in this example provides an alternative value (13) to be used in case\nthis `parseU64()` function raises an error. So, the expression below essentially means:\n\"Hey! Please, parse this string into a `u64` for me, and store the results into the\nobject `number`. But, if an error occurs, then, return the value `13` instead\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = parseU64(str, 10) catch 13;\n```\n:::\n\n\n\n\nSo, at the end of this process, the object `number` will contain either a `u64` integer\nthat was parsed succesfully from the input string `str`, or, if an error in the\nparsing process occurs, it will contain the `u64` value `13` that was provided by the `catch`\nkeyword as the \"default\", or, the \"alternative\" value.\n\n\n\n### Using if statements\n\nNow, you can also use if statements to handle errors in your Zig code.\nIn the example below, I'm reproducing the previous example, where\nwe try to parse an integer value from an input string with a function\nnamed `parseU64()`.\n\nWe execute the expression inside the \"if\". If this expression returns an\nerror value, the \"if branch\" (or, the \"true branch\") of the if statement is not executed.\nBut if this expression returns a valid value instead, then, this value is unwrapped\ninto the `number` object.\n\nThis means that, if the `parseU64()` expression returns a valid value, this value becomes available\ninside the scope of this \"if branch\" (i.e. the \"true branch\") through the object that we listed inside the pair\nof pipe charactes (`|`), which is the object `number`.\n\nIf an error occurs, we can use an \"else branch\" (or the \"false branch\") of the if statement\nto handle the error. 
In the example below, we are using the `else` in the if statement\nto unwrap the error value (that was returned by `parseU64()`) into the `err` object,\nand handle the error.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (parseU64(str, 10)) |number| {\n // do something with `number` here\n} else |err| {\n // handle the error value.\n}\n```\n:::\n\n\n\n\nNow, if the expression that you are executing returns different types of error values,\nand you want to take a different action in each of these types of error values, the\n`catch` keyword becomes limited.\n\nFor this type of situation, the official documentation\nof the language suggests the use of a switch statement with an if statement [@zigdocs].\nThe basic idea is, to use the if statement to execute the expression, and\nuse the \"else branch\" to pass the error value to a switch statement, where\nyou define a different action for each type of error value that might be\nreturned by the expression executed in the if statement.\n\nThe example below demonstrates this idea. We first try to add (or register) a set of\ntasks to a queue. If this \"registration process\" occurs well, we then try\nto distribute these tasks across the workers of our system. But\nif this \"registration process\" returns an error value, we then use a switch\nstatement in the \"else branch\" to handle each possible error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (add_tasks_to_queue(&queue, tasks)) |_| {\n distribute_tasks(&queue);\n} else |err| switch (err) {\n error.InvalidTaskName => {\n // do something\n },\n error.TimeoutTooBig => {\n // do something\n },\n error.QueueNotFound => {\n // do somethimg\n },\n // and all the other error options ...\n}\n```\n:::\n\n\n\n\n\n### The `errdefer` keyword {#sec-errdefer2}\n\nA common pattern in C programs in general, is to clean resources when an error occurs during\nthe execution of the program. 
In other words, one common way to handle errors, is to perform\n\"cleanup actions\" before we exit our program. This garantees that a runtime error does not make\nour program to leak resources of the system.\n\n\nThe `errdefer` keyword is a tool to perform such \"cleanup actions\" in hostile situations.\nThis keyword is commonly used to clean (or to free) allocated resources, before the execution of our program\nget's stopped because of an error value being generated.\n\nThe basic idea is to provide an expression to the `errdefer` keyword. Then,\n`errdefer` executes this expression if, and only if, an error occurs\nduring the execution of the current scope.\nIn the example below, we are using an allocator object (that we presented at @sec-allocators)\nto create a new `User` object. If we are succesfull in creating and registering this new user,\nthis `create_user()` function will return this new `User` object as it's return value.\n\nHowever, if for some reason, an error value is generated by some expression\nthat is after the `errdefer` line, for example, in the `db.add(user)` expression,\nthe expression registered by `errdefer` get's executed before the error value is returned\nfrom the function, and before the program enters in panic mode and stops the\ncurrent execution.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn create_user(db: Database, allocator: Allocator) !User {\n const user = try allocator.create(User);\n errdefer allocator.destroy(user);\n\n // Register new user in the Database.\n _ = try db.register_user(user);\n return user;\n}\n```\n:::\n\n\n\n\nBy using `errdefer` to destroy the `user` object that we have just created,\nwe garantee that the memory allocated for this `user` object\nget's freed, before the execution of the program stops.\nBecause if the expression `try db.add(user)` returns an error value,\nthe execution of our program stops, and we loose all references and control over the memory\nthat we have allocated for the `user` 
object.\nAs a result, if we do not free the memory associated with the `user` object before the program stops,\nwe cannot free this memory anymore. We simply loose our chance to do the right thing.\nThat is why `errdefer` is essential in this situation.\n\nJust to make very clear the differences between `defer` (which I described at @sec-defer)\nand `errdefer`, it might be worth to discuss the subject a bit further.\nYou might still have the question \"why use `errdefer` if we can use `defer` instead?\"\nin your mind.\n\nAlthough being similar, the key difference between `errdefer` and `defer` keyword\nis when the provided expression get's executed.\nThe `defer` keyword always execute the provided expression at the end of the\ncurrent scope, no matter how your code exits this scope.\nIn contrast, `errdefer` executes the provided expression only when an error occurs in the\ncurrent scope.\n\nThis becomes important if a resource that you allocate in the\ncurrent scope get's freed later in your code, in a different scope.\nThe `create_user()` functions is an example of this. If you think\nclosely about this function, you will notice that this function returns\nthe `user` object as the result.\n\nIn other words, the allocated memory for the `user` object does not get\nfreed inside the `create_user()`, if the function returns succesfully.\nSo, if an error does not occur inside this function, the `user` object\nis returned from the function, and probably, the code that runs after\nthis `create_user()` function will be responsible for freeying\nthe memory of the `user` object.\n\nBut what if an error do occur inside the `create_user()`? 
What happens then?\nThis would mean that the execution of your code would stop in this `create_user()`\nfunction, and, as a consequence, the code that runs after this `create_user()`\nfunction would simply not run, and, as a result, the memory of the `user` object\nwould not be freed before your program stops.\n\nThis is the perfect scenario for `errdefer`. We use this keyword to garantee\nthat our program will free the allocated memory for the `user` object,\neven if an error occurs inside the `create_user()` function.\n\nIf you allocate and free some memory for an object in the same scope, then,\njust use `defer` and be happy, `errdefer` have no use for you in such situation.\nBut if you allocate some memory in a scope A, but you only free this memory\nlater, in a scope B for example, then, `errdefer` becomes useful to avoid leaking memory\nin sketchy situations.\n\n\n\n## Union type in Zig {#sec-unions}\n\nAn union type defines a set of types that an object can be. It is like a list of\noptions. Each option is a type that an object can assume. Therefore, unions in Zig\nhave the same meaning, or, the same role as unions in C. They are used for the same purpose.\nYou could also say that unions in Zig produces a similar effect to\n[`typing.Union` in Python](https://docs.python.org/3/library/typing.html#typing.Union)[^pyunion].\n\n[^pyunion]: \n\nFor example, you might be creating an API that sends data to a data lake, hosted\nin some private cloud infrastructure. Suppose you created different structs in your codebase,\nto store the necessary information that you need, in order to connect to the services of\neach mainstream data lake service (Amazon S3, Azure Blob, etc.).\n\nNow, suppose you also have a function named `send_event()` that receives an event as input,\nand, a target data lake, and it sends the input event to the data lake specified in the\ntarget data lake argument. 
But this target data lake could be any of the three mainstream data lakes\nservices (Amazon S3, Azure Blob, etc.). Here is where an union can help you.\n\nThe union `LakeTarget` defined below allows the `lake_target` argument of `send_event()`\nto be either an object of type `AzureBlob`, or type `AmazonS3`, or type `GoogleGCP`.\nThis union allows the `send_event()` function to receive an object of any of these three types\nas input in the `lake_target` argument.\n\nRemember that each of these three types\n(`AmazonS3`, `GoogleGCP` and `AzureBlob`) are separate structs that we defined in\nour source code. So, at first glance, they are separate data types in our source code.\nBut is the `union` keyword that unifies them into a single data type called `LakeTarget`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst LakeTarget = union {\n azure: AzureBlob,\n amazon: AmazonS3,\n google: GoogleGCP,\n};\n\nfn send_event(\n event: Event,\n lake_target: LakeTarget\n) bool {\n // body of the function ...\n}\n```\n:::\n\n\n\n\nAn union definition is composed by a list of data members. Each data member is of a specific data type.\nIn the example above, the `LakeTarget` union have three data members (`azure`, `amazon`, `google`).\nWhen you instantiate an object that uses an union type, you can only use one of it's data members\nin this instantiation.\n\nYou could also interpret this as: only one data member of an union type can be activated at a time, the other data\nmembers remain deactivated and unaccessible. For example, if you create a `LakeTarget` object that uses\nthe `azure` data member, you can no longer use or access the data members `google` or `amazon`.\nIt is like if these other data members didn't exist at all in the `LakeTarget` type.\n\nYou can see this logic in the example below. Notice that, we first instantiate the union\nobject using the `azure` data member. As a result, this `target` object contains only\nthe `azure` data member inside of it. 
Only this data member is active in this object.\nThat is why the last line in this code example is invalid. Because we are trying to instantiate the data member\n`google`, which is currently inactive for this `target` object, and as a result, the program\nenters in panic mode warning us about this mistake through a loud error message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\n// Only the `azure` data member exist inside\n// the `target` object, and, as a result, this\n// line below is invalid:\ntarget.google = GoogleGCP.init();\n```\n:::\n\n\n\n\n```\nthread 2177312 panic: access of union field 'google' while\n field 'azure' is active:\n target.google = GoogleGCP.init();\n ^\n```\n\nSo, when you instantiate an union object, you must choose one of the data types (or, one of the data members)\nlisted in the union type. In the example above, I choose to use the `azure` data member, and, as a result,\nall other data members were automatically deactivated,\nand you can no longer use them after you instantiate the object.\n\nYou can activate another data member by completely redefining the entire enum object.\nIn the example below, I initially use the `azure` data member. But then, I redefine the\n`target` object to use a new `LakeTarget` object, which uses this time the `google` data member.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\ntarget = LakeTarget {\n .google = GoogleGCP.init()\n};\n```\n:::\n\n\n\n\nAn curious fact about union types, is that, at first, you cannot use them in switch statements (that we preseted at @sec-switch).\nIn other words, if you have an object of type `LakeTarget` for example, you cannot give this object\nto a switch statement as input.\n\nBut what if you really need to do so? What if you actually need to\nprovide an \"union object\" to a switch statement? 
The answer to this question relies on another special type in Zig,\nwhich are the *tagged unions*. To create a tagged union, all you have to do is to add\nan enum type into your union declaration.\n\nAs an example of a tagged union in Zig, take the `Registry` type exposed\nbelow. This type comes from the\n[`grammar.zig` module](https://github.com/ziglang/zig/blob/30b4a87db711c368853b3eff8e214ab681810ef9/tools/spirv/grammar.zig)[^grammar]\nfrom the Zig repository. This union type lists different types of registries.\nBut notice this time, the use of `(enum)` after the `union` keyword. This is what makes\nthis union type a tagged union. Also, by being a tagged union, an object of this `Registry` type\ncan be used as input in a switch statement. This is all you have to do. Just add `(enum)`\nto your `union` declaration, and you can use it in switch statements.\n\n[^grammar]: .\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Registry = union(enum) {\n core: CoreRegistry,\n extension: ExtensionRegistry,\n};\n```\n:::\n",
+    "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Error handling and unions {#sec-error-handling}\n\nIn this chapter, I want to discuss how error handling is done in Zig.\nWe already briefly learned about one of the available strategies to handle errors in Zig,\nwhich is the `try` keyword presented at @sec-main-file. But we still haven't learned about\nthe other methods, such as the `catch` keyword.\nI also want to discuss in this chapter how union types are created in Zig.\n\n## Learning more about errors in Zig\n\nBefore we get into how error handling is done, we need to learn more about what errors are in Zig.\nAn error is actually a value in Zig [@zigoverview]. In other words, when an error occurs inside your Zig program,\nit means that somewhere in your Zig codebase, an error value is being generated.\nAn error value is similar to any integer value that you create in your Zig code.\nYou can take an error value and pass it as input to a function,\nand you can also cast (or coerce) it into a different type of an error value.\n\nThis has some similarities with exceptions in C++ and Python.\nIn C++ and Python, when an exception happens inside a `try` block,\nyou can use a `catch` block (in C++) or an `except` block (in Python)\nto capture the exception produced in the `try` block,\nand pass it to functions as an input.\n\nHowever, error values in Zig are treated very differently than exceptions.\nFirst, you cannot ignore error values in your Zig code. That is, if an error\nvalue appears somewhere in your source code, this error value must be explicitly handled in some way.\nThis also means that you cannot discard error values by assigning them to an underscore,\nas you could do with normal values and objects.\n\nTake the source code below as an example. 
Here we are trying to open a file that does not exist\non my computer, and as a result, an obvious error value of `FileNotFound` is returned from the `openFile()`\nfunction. But because I'm assigning the result of this function to an underscore, I end up\ntrying to discard an error value.\n\nThe `zig` compiler detects this mistake, and raises a compile\nerror telling me that I'm trying to discard an error value.\nIt also adds a note message that suggests the use of `try`,\n`catch` or an if statement to explicitly handle this error value.\nThis note reinforces that every possible error value must be explicitly handled in Zig.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\n_ = dir.openFile(\"doesnt_exist.txt\", .{});\n```\n:::\n\n\n\n\n```\nt.zig:8:17: error: error set is discarded\nt.zig:8:17: note: consider using 'try', 'catch', or 'if'\n```\n\n\n### Returning errors from functions\n\nAs we described at @sec-main-file, when we have a function that might return an error\nvalue, this function normally includes an exclamation mark (`!`) in its return type\nannotation. The presence of this exclamation mark indicates that this function might\nreturn an error value as a result, and the `zig` compiler forces you to always explicitly handle\nthe case of this function returning an error value.\n\nTake a look at the `print_name()` function below. This function might return an error in the `stdout.print()` function call,\nand, as a consequence, its return type (`!void`) includes an exclamation mark.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_name() !void {\n    const stdout = std.io.getStdOut().writer();\n    try stdout.print(\"My name is Pedro!\", .{});\n}\n```\n:::\n\n\n\n\nIn the example above, we are using the exclamation mark to tell the `zig` compiler\nthat this function might return some error. But which error exactly is returned from\nthis function? For now, we are not specifying a specific error value. We only
We only\nknown for now that some error value (whatever it is) might be returned.\n\nBut in fact, you can (if you want to) specify clearly which exact error values\nmight be returned from this function. There are lot of examples of\nthis in the Zig Standard Library. Take this `fill()` function from\nthe `http.Client` module as an example. This function returns\neither a error value of type `ReadError`, or `void`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn fill(conn: *Connection) ReadError!void {\n // The body of this function ...\n}\n```\n:::\n\n\n\n\nThis idea of specifying the exact error values that you expect to be returned\nfrom the function is interesting. Because they automatically become some sort of documentation\nof your function, and also, it allows the `zig` compiler to perform some extra checks over\nyour code. Because it can check if there is any other type of error value\nthat is being generated inside your function, and, that it is not being accounted\nfor in this return type annotation.\n\nAnyway, you can list the types of errors that can be returned from the function\nby listing them on the left side of the exclamation mark. While the valid values\nstay on the right side of the exclamation mark. So the syntax format become:\n\n```\n!\n```\n\n\n### Error sets\n\nBut what about when we have a single function that might return different types of errors?\nWhen you have such a function, you can list\nall of these different types of errors that can be returned from this function,\nthrough a structure in Zig that we call of an *error set*.\n\nAn error set is a special case of an union type. 
It is a union that contains error values in it.\nNot all programming languages have a notion of a \"union object\".\nBut in summary, a union is just a set of data types.\nUnions are used to allow an object to have multiple data types.\nFor example, a union of `x`, `y` and `z` means that\nan object can be either of type `x`, or type `y` or type `z`.\n\nWe are going to talk in more depth about unions at @sec-unions.\nBut you can write an error set by writing the keyword `error` before\na pair of curly braces, then you list the error values that can be\nreturned from the function inside this pair of curly braces.\n\nTake the `resolvePath()` function below as an example, which comes from the\n`introspect.zig` module of the Zig Standard Library. We can see in its return type annotation that this\nfunction returns either: 1) a valid slice of `u8` values (`[]u8`); or, 2) one of the three different\ntypes of error values listed inside the error set (`OutOfMemory`, `Unexpected`, etc.).\nThis is a usage example of an error set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn resolvePath(\n    ally: mem.Allocator,\n    p: []const u8,\n) error{\n    OutOfMemory,\n    CurrentWorkingDirectoryUnlinked,\n    Unexpected,\n}![]u8 {\n    // The body of the function ...\n}\n```\n:::\n\n\n\n\n\nThis is a valid way of annotating the return value of a Zig function. 
But, if you navigate through\nthe modules that compose the Zig Standard Library, you will notice that, in the majority of cases,\nthe programmers prefer to give a descriptive name to this error set, and then, use this name (or this \"label\")\nof the error set in the return type annotation, instead of using the error set directly.\n\nWe can see that in the `ReadError` error set that we showed earlier in the `fill()` function,\nwhich is defined in the `http.Client` module.\nSo yes, I presented the `ReadError` as if it was just a standard and single error value, but in fact,\nit is an error set defined in the `http.Client` module, and therefore, it actually represents\na set of different error values that might happen inside the `fill()` function.\n\n\nTake a look at the `ReadError` definition reproduced below. Notice that we are grouping all of these\ndifferent error values into a single object, and then, we use this object in the return type annotation of functions,\nsuch as the `fill()` function that we showed earlier, or the `readvDirect()` function from the same module,\nwhich is also reproduced below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const ReadError = error{\n    TlsFailure,\n    TlsAlert,\n    ConnectionTimedOut,\n    ConnectionResetByPeer,\n    UnexpectedReadFailure,\n    EndOfStream,\n};\n// Some lines of code\npub fn readvDirect(\n    conn: *Connection,\n    buffers: []std.posix.iovec,\n) ReadError!usize {\n    // The body of the function ...\n}\n```\n:::\n\n\n\n\nSo, an error set is just a convenient way of grouping a set of\npossible error values into a single object, or a single type of an error value.\n\n\n### Casting error values\n\nLet's suppose you have two different error sets, named `A` and `B`.\nIf error set `A` is a superset of error set `B`, then, you can cast (or coerce)\nerror values from `B` into error values of `A`.\n\nError sets are just a set of error values. 
So, if the error set `A`\ncontains all error values from the error set `B`, then `A`\nbecomes a superset of `B`. You could also say\nthat the error set `B` is a subset of error set `A`.\n\nThe example below demonstrates this idea. Because `A` contains all\nvalues from `B`, `A` is a superset of `B`.\nIn math notation, we would say that $A \\supset B$.\nAs a consequence, we can give an error value from `B` as input to the `cast()`\nfunction, and implicitly cast this input into the same error value, but from the `A` set.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst A = error{\n    ConnectionTimeoutError,\n    DatabaseNotFound,\n    OutOfMemory,\n    InvalidToken,\n};\nconst B = error{\n    OutOfMemory,\n};\n\nfn cast(err: B) A {\n    return err;\n}\n\ntest \"coerce error value\" {\n    const error_value = cast(B.OutOfMemory);\n    try std.testing.expect(\n        error_value == A.OutOfMemory\n    );\n}\n```\n:::\n\n\n\n\n\n## How to handle errors\n\nNow that we learned more about what errors are in Zig,\nlet's discuss the available strategies to handle these errors,\nwhich are:\n\n- `try` keyword;\n- `catch` keyword;\n- an if statement;\n- `errdefer` keyword;\n\n\n\n### What does `try` mean?\n\nAs I described over the previous sections, when we say that an expression might\nreturn an error, we are basically referring to an expression that has\na return type in the format `!T`.\nThe `!` indicates that this expression returns either an error value, or a value of type `T`.\n\nAt @sec-main-file, I presented the `try` keyword and where to use it.\nBut I did not talk about what exactly this keyword does to your code,\nor, in other words, I have not explained yet what `try` means in your code.\n\nIn essence, when you use the `try` keyword in an expression, you are telling\nthe `zig` compiler the following: \"Hey! Execute this expression for me,\nand, if this expression returns an error, please, immediately return this error from the\ncurrent function, so that the caller has to deal with it. 
But if this expression returns a valid\nvalue, then, return this value, and move on\".\n\nIn other words, the `try` keyword is essentially a strategy to propagate the error upwards: the current function\nstops and returns the error to its caller, and, if no function up the call stack handles this error, the execution of your program stops.\nWith the `try` keyword, you are telling the `zig` compiler that passing the error up to the caller\nis the most reasonable strategy to take if an error occurs\nin that particular expression.\n\n### The `catch` keyword\n\nOk, now that we understand properly what `try` means, let's discuss `catch` now.\nOne important detail here is that you can use `try` or `catch` to handle your errors,\nbut you **cannot use `try` and `catch` together**. In other words, `try` and `catch`\nare different and completely separate strategies in the Zig language.\n\nThis is uncommon, and different from what happens in other languages. Most\nprogramming languages that adopt the *try catch* pattern (such as C++, R, Python, Javascript, etc.) normally use\nthese two keywords together to form the complete logic to\nproperly handle the errors.\nAnyway, Zig tries a different approach in the *try catch* pattern.\n\nSo, we already learned what `try` means, and we also know that both\n`try` and `catch` should be used alone, separate from each other. But\nwhat exactly does `catch` do in Zig? With `catch`, we can construct a block of\nlogic to handle the error value, in case it happens in the current expression.\n\nLook at the code example below. Once again, we go back to the previous\nexample where we were trying to open a file that doesn't exist on my computer,\nbut this time, I use `catch` to actually implement some logic to handle the error, instead of\njust propagating it right away.\n\nMore specifically, in this example, I'm using a logger object to record some logs into\nthe system, before I return the error from the current function. 
For example,\nthis could be some part of the codebase of a complex system that I do not have full control over,\nand I want to record these logs before the program crashes, so that I can debug it later\n(e.g. maybe I cannot compile the full program, and properly debug it with a debugger. So, these logs might\nbe a valid strategy to overcome this barrier).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst dir = std.fs.cwd();\nconst file = dir.openFile(\n    \"doesnt_exist.txt\", .{}\n) catch |err| {\n    logger.record_context();\n    logger.log_error(err);\n    return err;\n};\n```\n:::\n\n\n\n\n\nTherefore, we use `catch` to create a block of expressions that will handle the error.\nI can return the error value from this block of expressions, like I did in the above example,\nwhich makes the current function return this error to its caller.\nBut I could also make this block produce a valid value instead, which would\nbe stored in the `file` object.\n\nNotice that, instead of writing the keyword before the expression that might return the error,\nlike we do with `try`, we write `catch` after the expression. We can open the pair of pipes (`|`),\nwhich captures the error value returned by the expression, and makes\nthis error value available in the scope of the `catch` block as the object named `err`.\nIn other words, because I wrote `|err|` in the code, I can access the error value\nreturned by the expression by using the `err` object.\n\nAlthough this is the most common use of `catch`, you can also use this keyword\nto handle the error in a \"default value\" style. That is, if the expression returns\nan error, we use the default value instead. Otherwise, we use the valid value returned\nby the expression.\n\nThe official Zig language reference provides a great example of this \"default value\"\nstrategy with `catch`. This example is reproduced below. Notice that we are trying to parse\nsome unsigned integer from a string object named `str`. 
In other words, this function\nis trying to transform an object of type `[]const u8` (i.e. a string, which in Zig is a slice of bytes)\ninto an object of type `u64`.\n\nBut this parsing process done by the function `parseU64()` may fail, resulting in a runtime error.\nThe `catch` keyword used in this example provides an alternative value (13) to be used in case\nthis `parseU64()` function returns an error. So, the expression below essentially means:\n\"Hey! Please, parse this string into a `u64` for me, and store the result into the\nobject `number`. But, if an error occurs, then, use the value `13` instead\".\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst number = parseU64(str, 10) catch 13;\n```\n:::\n\n\n\n\nSo, at the end of this process, the object `number` will contain either a `u64` integer\nthat was parsed successfully from the input string `str`, or, if an error occurs in the\nparsing process, it will contain the `u64` value `13` that was provided by the `catch`\nkeyword as the \"default\", or, the \"alternative\" value.\n\n\n\n### Using if statements\n\nNow, you can also use if statements to handle errors in your Zig code.\nIn the example below, I'm reproducing the previous example, where\nwe try to parse an integer value from an input string with a function\nnamed `parseU64()`.\n\nWe execute the expression inside the \"if\". If this expression returns an\nerror value, the \"if branch\" (or, the \"true branch\") of the if statement is not executed.\nBut if this expression returns a valid value instead, then, this value is unwrapped\ninto the `number` object.\n\nThis means that, if the `parseU64()` expression returns a valid value, this value becomes available\ninside the scope of this \"if branch\" (i.e. the \"true branch\") through the object that we listed inside the pair\nof pipe characters (`|`), which is the object `number`.\n\nIf an error occurs, we can use an \"else branch\" (or the \"false branch\") of the if statement\nto handle the error. 
In the example below, we are using the `else` in the if statement\nto unwrap the error value (that was returned by `parseU64()`) into the `err` object,\nand handle the error.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (parseU64(str, 10)) |number| {\n    // do something with `number` here\n} else |err| {\n    // handle the error value.\n}\n```\n:::\n\n\n\n\nNow, if the expression that you are executing returns different types of error values,\nand you want to take a different action for each of these types of error values, the\n`try` and `catch` keywords, and the if statement strategy, become limited.\n\nFor this type of situation, the official documentation of the language suggests\nthe use of a switch statement together with an if statement [@zigdocs].\nThe basic idea is to use the if statement to execute the expression, and\nuse the \"else branch\" to pass the error value to a switch statement, where\nyou define a different action for each type of error value that might be\nreturned by the expression executed in the if statement.\n\nThe example below demonstrates this idea. We first try to add (or register) a set of\ntasks to a queue. If this \"registration process\" succeeds, we then try\nto distribute these tasks across the workers of our system. But\nif this \"registration process\" returns an error value, we then use a switch\nstatement in the \"else branch\" to handle each possible error value.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nif (add_tasks_to_queue(&queue, tasks)) |_| {\n    distribute_tasks(&queue);\n} else |err| switch (err) {\n    error.InvalidTaskName => {\n        // do something\n    },\n    error.TimeoutTooBig => {\n        // do something\n    },\n    error.QueueNotFound => {\n        // do something\n    },\n    // and all the other error options ...\n}\n```\n:::\n\n\n\n\n\n### The `errdefer` keyword {#sec-errdefer2}\n\nA common pattern in C programs in general is to clean up resources when an error occurs during\nthe execution of the program. 
In other words, one common way to handle errors is to perform\n\"cleanup actions\" before we exit our program. This guarantees that a runtime error does not make\nour program leak resources of the system.\n\n\nThe `errdefer` keyword is a tool to perform such \"cleanup actions\" in hostile situations.\nThis keyword is commonly used to clean (or to free) allocated resources, before the current scope\nis exited because of an error value being returned.\n\nThe basic idea is to provide an expression to the `errdefer` keyword. Then,\n`errdefer` executes this expression if, and only if, the current scope\nis exited by returning an error.\nIn the example below, we are using an allocator object (that we have presented at @sec-allocators)\nto create a new `User` object. If we are successful in creating and registering this new user,\nthis `create_user()` function will return a pointer to this new `User` object as its return value.\n\nHowever, if for some reason, an error value is generated by some expression\nthat is after the `errdefer` line, for example, in the `db.register_user(user)` expression,\nthe expression registered by `errdefer` gets executed before the error value is returned\nfrom the function to its caller.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn create_user(db: Database, allocator: Allocator) !*User {\n    const user = try allocator.create(User);\n    errdefer allocator.destroy(user);\n\n    // Register new user in the Database.\n    _ = try db.register_user(user);\n    return user;\n}\n```\n:::\n\n\n\n\nBy using `errdefer` to destroy the `user` object that we have just created,\nwe guarantee that the memory allocated for this `user` object\ngets freed before the error leaves the function.\nBecause if the expression `try db.register_user(user)` returns an error value,\nthe function returns early, and we lose all references and control over the memory\nthat we have allocated for the `user` 
object.\nAs a result, if we do not free the memory associated with the `user` object before the function returns the error,\nwe cannot free this memory anymore. We simply lose our chance to do the right thing.\nThat is why `errdefer` is essential in this situation.\n\nJust to state clearly the differences between `defer` and `errdefer`\n(which I described at @sec-defer and @sec-errdefer1), it might be worth\ndiscussing the subject a bit further. You might still have the question\n\"why use `errdefer` if we can use `defer` instead?\" in your mind.\n\nAlthough they are similar, the key difference between the `errdefer` and `defer` keywords\nis when the provided expression gets executed.\nThe `defer` keyword always executes the provided expression at the end of the\ncurrent scope, no matter how your code exits this scope.\nIn contrast, `errdefer` executes the provided expression only when the\ncurrent scope is exited by returning an error.\n\nThis becomes important if a resource that you allocate in the\ncurrent scope gets freed later in your code, in a different scope.\nThe `create_user()` function is an example of this. If you think\nclosely about this function, you will notice that this function returns\nthe `user` object as the result.\n\nIn other words, the allocated memory for the `user` object does not get\nfreed inside the `create_user()` function, if it returns successfully.\nSo, if an error does not occur inside this function, the `user` object\nis returned from the function, and probably, the code that runs after\nthis `create_user()` function will be responsible for freeing\nthe memory of the `user` object.\n\nBut what if an error occurs inside the `create_user()` function? 
What happens then?\nThis would mean that the execution of your code would stop in this `create_user()`\nfunction, and, as a consequence, the code that runs after this `create_user()`\nfunction would simply not run, and, as a result, the memory of the `user` object\nwould not be freed before your program stops.\n\nThis is the perfect scenario for `errdefer`. We use this keyword to guarantee\nthat our program will free the allocated memory for the `user` object,\neven if an error occurs inside the `create_user()` function.\n\nIf you allocate and free some memory for an object inside the same scope, then\njust use `defer` and be happy, i.e. `errdefer` has no use for you in such a situation.\nBut if you allocate some memory in a scope A, and you only free this memory\nlater, in a scope B for example, then `errdefer` becomes useful to avoid leaking memory\nin sketchy situations.\n\n\n\n## Union type in Zig {#sec-unions}\n\nA union type defines a set of types that an object can be. It is like a list of\noptions. Each option is a type that an object can assume. Unions in Zig\nhave the same meaning, or the same role, as unions in C. They are used for the same purpose.\nYou could also say that unions in Zig produce a similar effect to\n[`typing.Union` in Python](https://docs.python.org/3/library/typing.html#typing.Union)[^pyunion].\n\n[^pyunion]: \n\nFor example, you might be creating an API that sends data to a data lake, hosted\nin some private cloud infrastructure. Suppose you created different structs in your codebase\nto store the information that you need in order to connect to the services of\neach mainstream data lake service (Amazon S3, Azure Blob, etc.).\n\nNow, suppose you also have a function named `send_event()` that receives an event\nand a target data lake as input, and sends the event to the data lake specified in the\ntarget data lake argument. 
But this target data lake could be any of the three mainstream data lake\nservices (Amazon S3, Azure Blob, etc.). This is where a union can help you.\n\nThe union `LakeTarget` defined below allows the `lake_target` argument of `send_event()`\nto be either an object of type `AzureBlob`, or type `AmazonS3`, or type `GoogleGCP`.\nThis union allows the `send_event()` function to receive an object of any of these three types\nas input in the `lake_target` argument.\n\nRemember that each of these three types\n(`AmazonS3`, `GoogleGCP` and `AzureBlob`) is a separate struct that we defined in\nour source code. So, at first glance, they are separate data types in our source code.\nBut it is the `union` keyword that unifies them into a single data type called `LakeTarget`.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst LakeTarget = union {\n azure: AzureBlob,\n amazon: AmazonS3,\n google: GoogleGCP,\n};\n\nfn send_event(\n event: Event,\n lake_target: LakeTarget\n) bool {\n // body of the function ...\n}\n```\n:::\n\n\n\n\nA union definition is composed of a list of data members. Each data member is of a specific data type.\nIn the example above, the `LakeTarget` union has three data members (`azure`, `amazon`, `google`).\nWhen you instantiate an object that uses a union type, you can only use one of its data members\nin this instantiation.\n\nYou could also interpret this as: only one data member of a union type can be active at a time; the other data\nmembers remain deactivated and inaccessible. For example, if you create a `LakeTarget` object that uses\nthe `azure` data member, you can no longer use or access the data members `google` or `amazon`.\nIt is as if these other data members didn't exist at all in the `LakeTarget` type.\n\nYou can see this logic in the example below. Notice that we first instantiate the union\nobject using the `azure` data member. As a result, this `target` object contains only\nthe `azure` data member inside of it. 
Only this data member is active in this object.\nThat is why the last line in this code example is invalid: we are trying to assign to the data member\n`google`, which is currently inactive for this `target` object, and as a result, the program\nenters panic mode, warning us about this mistake through a loud error message.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\n// Only the `azure` data member exists inside\n// the `target` object, and, as a result, this\n// line below is invalid:\ntarget.google = GoogleGCP.init();\n```\n:::\n\n\n\n\n```\nthread 2177312 panic: access of union field 'google' while\n field 'azure' is active:\n target.google = GoogleGCP.init();\n ^\n```\n\nSo, when you instantiate a union object, you must choose one of the data types (or, one of the data members)\nlisted in the union type. In the example above, I chose to use the `azure` data member, and, as a result,\nall other data members were automatically deactivated,\nand you can no longer use them after you instantiate the object.\n\nYou can activate another data member by completely redefining the entire union object.\nIn the example below, I initially use the `azure` data member. But then, I redefine the\n`target` object to use a new `LakeTarget` object, which this time uses the `google` data member.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar target = LakeTarget {\n .azure = AzureBlob.init()\n};\ntarget = LakeTarget {\n .google = GoogleGCP.init()\n};\n```\n:::\n\n\n\n\nA curious fact about union types is that, at first, you cannot use them in switch statements (which we presented at @sec-switch).\nIn other words, if you have an object of type `LakeTarget` for example, you cannot give this object\nto a switch statement as input.\n\nBut what if you really need to do so? What if you actually need to\nprovide a \"union object\" to a switch statement? 
The answer to this question lies in another special type in Zig,\nthe *tagged union*. To create a tagged union, all you have to do is to add\nan enum type into your union declaration.\n\nAs an example of a tagged union in Zig, take the `Registry` type exposed\nbelow. This type comes from the\n[`grammar.zig` module](https://github.com/ziglang/zig/blob/30b4a87db711c368853b3eff8e214ab681810ef9/tools/spirv/grammar.zig)[^grammar]\nfrom the Zig repository. This union type lists different types of registries.\nBut notice, this time, the use of `(enum)` after the `union` keyword. This is what makes\nthis union type a tagged union. Also, by being a tagged union, an object of this `Registry` type\ncan be used as input in a switch statement. This is all you have to do. Just add `(enum)`\nto your `union` declaration, and you can use it in switch statements.\n\n[^grammar]: .\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub const Registry = union(enum) {\n core: CoreRegistry,\n extension: ExtensionRegistry,\n};\n```\n:::\n",
"supporting": [
"09-error-handling_files"
],
diff --git a/_freeze/Chapters/12-file-op/execute-results/html.json b/_freeze/Chapters/12-file-op/execute-results/html.json
index 09684e9..5414aa0 100644
--- a/_freeze/Chapters/12-file-op/execute-results/html.json
+++ b/_freeze/Chapters/12-file-op/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "d735051e6bfd96a7a908b999537df4d9",
+ "hash": "8c00763b3f545bad9c18e2382e56e147",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Filesystem and Input/Output (IO) {#sec-filesystem}\n\nIn this chapter we are going to discuss how to use the cross-platform structs and functions available\nin the Zig Standard Library that executes filesystem operations. Most of these functions and structs\ncomes from the `std.fs` module.\n\nWe are also going to talk about Input/Output (also known as IO) operations in Zig. Most of\nthese operations are made by using the structs and functions from `std.io` module, which defines\ndescriptors for the *standard channels* of your system (`stdout` and `stdin`), and also,\nfunctions to create and use I/O streams.\n\n\n## Input/Output basics {#sec-io-basics}\n\nIf you have programming experience in a high-level language, you certainly have used before\nthe input and output functionalities of this language. In other words, you certainly have\nbeen in a situation where you needed to sent some output to the user, or, to receive an input\nfrom the user.\n\nFor example, in Python we can receive some input from the user by using the `input()` built-in\nfunction. But we can also print (or \"show\") some output to the user by using the `print()`\nbuilt-in function. So yes, if you have programmed before in Python, you certainly have\nused these functions once before.\n\nBut do you know how these functions relate back to your operating system (OS)? How exactly\nthey are interacting with the resources of your OS to receive or sent input/output.\nIn essence, these input/output functions from high-level languages are just abstractions\nover the *standard output* and *standard input* channels of your operating system.\n\nThis means that we receive an input, or send some output, through the operating system.\nIt is the OS that makes the bridge between the user and your program. Your program\ndoes not have a direct access to the user. 
It is the OS that intermediates every\nmessage exchanged between your program and the user.\n\nThe *standard output* and *standard input* channels of your OS are commonly known as the\n`stdout` and `stdin` channels of your OS, respectively. In some contexts, they are also called of the *standard output device*\nand *standard input device*. As the name suggests, the *standard output*\nis the channel through which output flows, while the *standard input* is the channel in which\ninput flows.\n\nFurthermore, OS's also normally create a dedicated channel for exchanging error messages, known as the\n*standard error* channel, or, the `stderr` channel. This is the channel to which error and warning messages\nare usually sent to. These are the messages that are normally displayed in red-like or orange-like colors\ninto your terminal.\n\nNormally, every OS (e.g. Windows, MacOS, Linux, etc.) creates a dedicated and separate pair of\n*standard output*, *standard error* and *standard input* channels for every single program (or process) that runs in your computer.\nThis means that every program you write have a dedicated `stdin`, `stderr` and `stdout` that are separate\nfrom the `stdin`, `stderr` and `stdout` of other programs and processes that are currently running.\n\nThis is a behaviour from your OS.\nThis does not come from the programming language that you are using.\nBecause as I sad earlier, input and output in programming languages, especially\nin high-level ones, are just a simple abstraction over the `stdin`, `stderr` and `stdout` from your current OS.\nThat is, your OS is the intermediary between every input/output operation made in your program,\nregardless of the programming language that you are using.\n\n\n### The writer and reader pattern {#sec-writer-reader}\n\nIn Zig, there is a pattern around input/output (IO). 
I (the author of this book) don't know if there is an official name for this pattern.\nBut here, in this book, I will call it the \"writer and reader pattern\". In essence, every IO operation in Zig is\nmade through either a `GenericReader` or a `GenericWriter` object[^gen-zig].\n\nThese two data types come from the `std.io` module of the Zig Standard Library. As their names suggests, a\n`GenericReader` is an object that offers tools to read data from \"something\" (or \"somewhere\"), while a `GenericWriter`\noffers tools to write data into this \"something\".\nThis \"something\" might be different things: like a file that exists in your filesystem; or, it might be a network socket of your system[^sock]; or,\na continuous stream of data, like a standard input device from your system, that might be constantly\nreceiving new data from users, or, as another example, a live chat in a game that is constantly receiving and displaying new messages from the\nplayers of the game.\n\n[^gen-zig]: Previously, these objects were known as the `Reader` and `Writer` objects.\n[^sock]: The socket objects that we have created at @sec-create-socket, are examples of network sockets.\n\nSo, if you want to **read** data from something, or somewhere, it means that you need to use a `GenericReader` object.\nBut if you need instead, to **write** data into this \"something\", then, you need to use a `GenericWriter` object instead.\nBoth of these objects are normally created from a file descriptor object. More specifically, through the `writer()` and `reader()`\nmethods of this file descriptor object. If you are not familiar with this type of object, go to the\nnext section.\n\nEvery `GenericWriter` object have methods like `print()`, which allows you to write/send a formatted string\n(i.e. this formatted string is like a `f` string in Python, or, similar to the `printf()` C function)\ninto the \"something\" (file, socket, stream, etc.) that you are using. 
It also have a `writeAll()` method, which allows you to\nwrite a string, or, an array of bytes into the \"something\".\n\nLikewise, every `GenericReader` object have methods like `readAll()`, which allows you to read the\ndata from the \"something\" (file, socket, stream, etc.) until it fills a particular array (i.e. a \"buffer\") object.\nIn other words, if you provide an array object of 300 `u8` values to `readAll()`, then, this method attempts to read 300 bytes\nof data from the \"something\", and it stores them into the array object that you have provided.\n\nWe also have other methods, like the `readAtLeast()` method,\nwhich allows you to specify how many bytes exactly you want to read from the \"something\".\nIn more details, if you give the number $n$ as input to this method, then, it will attempt to read at least $n$ bytes of data from the \"something\".\nThe \"something\" might have less than $n$ bytes of data available for you to read, so, it is not garanteed\nthat you will get precisely $n$ bytes as result.\n\nAnother useful method is `readUntilDelimiterOrEof()`. In this method, you specify a \"delimiter character\".\nThe idea is that this function will attempt to read as many bytes of data as possible from the \"something\",\nuntil it encounters the end of the stream, or, it encounters the \"delimiter character\" that you have specified.\n\nIf you don't know exactly how many bytes will come from the \"something\", you may find the `readAllAlloc()` method\nuseful. 
In essence, you provide an allocator object to this method, so that it can allocate more space if needed.\nAs consequence, this method will try to read all bytes of the \"something\", and, if it runs out of space at some point\nduring the \"reading process\", it uses the allocator object to allocate more space to continue reading the bytes.\nAs result, this method returns a slice to the array object containing all the bytes read.\n\nThis is just a quick description of the methods present in these types of objects. But I recommend you\nto read the official docs, both for\n[`GenericWriter`](https://ziglang.org/documentation/master/std/#std.io.GenericWriter)[^gen-write] and\n[`GenericReader`](https://ziglang.org/documentation/master/std/#std.io.GenericReader)[^gen-read].\nI also think it is a good idea to read the source code of the modules in the Zig Standard Library\nthat defines the methods present in these objects, which are the\n[`Reader.zig`](https://github.com/ziglang/zig/blob/master/lib/std/io/Reader.zig)[^mod-read]\nand [`Writer.zig`]()[^mod-write].\n\n[^gen-read]: .\n[^gen-write]: .\n[^mod-read]: .\n[^mod-write]: .\n\n\n### Introducing file descriptors {#sec-file-descriptor}\n\nA \"file descriptor\" object is a core component behind every I/O operation that is made in any operating system (OS).\nSuch object is an identifier for a particular input/output (IO) resource from your OS [@wiki_file_descriptor].\nIt describes and identifies this particular resource. An IO resource might be:\n\n- an existing file in your filesystem.\n- an existing network socket.\n- other types of stream channels.\n- a pipeline (or just \"pipe\") in your terminal[^pipes].\n\n[^pipes]: A pipeline is a mechanism for inter-process communication, or, inter-process IO. You could also interpret a pipeline as a \"set of processes that are chained together, through the standard input/output devices of the system\". 
At Linux for example, a pipeline is created inside a terminal, by connecting two or more terminal commands with the \"pipe\" character (`|`).\n\nFrom the bulletpoints listed aboved, we know that although the term \"file\" is present,\na \"file descriptor\" might describe something more than just a file.\nThis concept of a \"file descriptor\" comes from the Portable Operating System Interface (POSIX) API,\nwhich is a set of standards that guide how operating systems across the world should be implemented,\nto maintain compatibility between them.\n\nA file descriptor not only identifies the input/output resource that you are using to receive or send some data,\nbut it also describes where this resource is, and also, which IO mode this resource is currently using.\nFor example, this IO resource might be using only the \"read\" IO mode, which means that this resource\nis open to \"read operations\", while \"write operations\" are closed and not authorized.\nThese IO modes are essentially, the modes that you provide to the argument `mode`\nfrom the `fopen()` C function, and also, from the `open()` Python built-in function.\n\nIn C, a \"file descriptor\" is a `FILE` pointer, but, in Zig, a file descriptor is a `File` object.\nThis data type (`File`) is described in the `std.fs` module of the Zig Standard Library.\nWe normally don't create a `File` object directly in our Zig code. Instead, we normally get such object as result when we\nopen an IO resource. In other words, we normally ask to our OS to open and use a particular IO\nresource, and, if the OS do open succesfully this IO resource, the OS normally handles back to us\na file descriptor to this particular IO resource.\n\nSo you usually get a `File` object by using functions and methods from the Zig Standard Library\nthat asks the OS to open some IO resources, like the `openFile()` method that opens a file in the\nfilesystem. 
The `net.Stream` object that we have created at @sec-create-socket is also a type of\nfile descriptor object.\n\n\n### The *standard output*\n\nYou already saw across this book, how can we access and use specifically the `stdout` in Zig\nto send some output to the user.\nFor that, we use the `getStdOut()` function from the `std.io` module. This function returns\na file descriptor that describes the `stdout` channel of your current OS. Through this file\ndescriptor object, we can read from or write stuff to the `stdout` of our program.\n\nAlthough we can read stuff recorded into the `stdout` channel, we normally only\nwrite to (or \"print\") stuff into this channel. The reason is very similar to what we discussed at\n@sec-read-http-message, when we were discussing what \"reading from\" versus \"writing to\" the connection\nobject from our small HTTP Server project would mean.\n\nWhen we write stuff into a channel, we are essentially sending data to the other end of this channel.\nIn contrast, when we read stuff from this channel, we are essentially reading the data that was sent\nthrough this channel. Since the `stdout` is a channel to send output to the user, the key verb here\nis **send**. We want to send something to someone, and, as consequence, we want to **write** something\ninto some channel.\n\nThat is why, when we use `getStdOut()`, most of the times, we also use the `writer()` method from the `stdout` file descriptor,\nto get access to a writer object that we can use to write stuff into this `stdout` channel.\nMore specifically, this `writer()` method returns a `GenericWriter` object. 
One of the\nmain methods of this `GenericWriter` object is the `print()` method that we have used\nbefore to write (or \"print\") a formatted string into the `stdout` channel.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n try stdout.writeAll(\n \"This message was written into stdout.\\n\"\n );\n}\n```\n:::\n\n\n\n\n```\nThis message was written into stdout.\n```\n\n\nThis `GenericWriter` object is like any other generic writer object that you would normally get from a file descriptor object.\nSo, the same methods from a generic writer object that you would use while writing files to the filesystem for example, you could also\nuse them here, from the file descriptor object of `stdout`, and vice-versa.\n\n\n### The *standard input*\n\nYou can access the *standard input* (i.e. `stdin`) in Zig by using the `getStdIn()` function from the `std.io` module.\nLike it's sister (`getStdOut()`), this function also returns a file descriptor object that describes the `stdin` channel\nof your OS.\n\nSince now, we want to receive some input from the user, the key verb here becomes **receive**, and, as consequence,\nwe usually want to **read** data from the `stdin` channel, instead of writing data into it. So, we normally use\nthe `reader()` method of the file descriptor object returned by `getStdIn()`, to get access to a `GenericReader`\nobject that we can use to read data from `stdin`.\n\nIn the example below, we are creating a small buffer capable of holding 20 characters. Then, we try to read\nthe data from the `stdin` with the `readUntilDelimiterOrEof()` method, and save this data into the `buffer` object.\nAlso notice that we are reading the data from the `stdin` until we hit a new line character (`'\\n'`).\n\nIf you execute this program, you will notice that this program stops the execution, and start to wait indefinitely\nfor some input from the user. 
In other words, you need to type your name into the terminal, and then, you press Enter to\nsend your name to `stdin`. After you send your name to `stdin`, the program reads this input, and continues with the execution,\nby printing the given name to `stdout`. In the example below, I typed my name (Pedro) into the terminal, and then, pressed Enter.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst stdin = std.io.getStdIn().reader();\npub fn main() !void {\n try stdout.writeAll(\"Type your name\\n\");\n var buffer: [20]u8 = undefined;\n @memset(buffer[0..], 0);\n _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\\n');\n try stdout.print(\"Your name is: {s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\n```\nType your name\nYour name is: Pedro\n\n```\n\n\n### The *standard error*\n\nThe *standard error* (a.k.a. the `stderr`) works exactly the same as the `stdout`.\nYou just call the `getStdErr()` function from the `std.io` module, and you get the file descriptor to `stderr`.\nIdeally, you should write only error or warning messages to `stderr`, because this is\nthe purpose of this channel.\n\n\n\n\n\n## Buffered IO\n\nAs we described at @sec-io-basics, input/output (IO) operations are made directly by the operating system.\nIt is the OS that manages the IO resource that you want to use for your IO operations.\nThe consequence of this fact is that IO operations are heavilly based on system calls (i.e. calling the operating system directly).\n\nJust to be clear, there is nothing particularly wrong with system calls. We use them all the time on\nany serious codebase written in any low-level programming language. However, system calls are\nalways orders of magnitude slower than many different types of operations.\n\nSo is perfectly fine to use a system call once in a while. 
But when these system calls start to be used often,\nyou can clearly notice most of the times the lost of performance in your application. So, the good rule of thumbs\nis to use a system call only when it is needed, and also, only in infrequent situations, to reduce\nthe number of system calls performed to a minimum.\n\n\n### Understanding how buffered IO works\n\nBuffered IO is a strategy to achieve better performance. It is used to reduce the number of system calls made by IO operations, and, as\nconsequence, achieve a much higher performance. At @fig-buff-diff you can find two different diagrams which presents the differences between\nread operations performed in an unbuferred IO environment versus a buffered IO environemnt.\n\nTo give a better context to these diagrams, let's suppose that we have a text file that contains the famous Lorem ipsum text[^lorem]\nin our filesystem. Let's also suppose that these diagrams at @fig-buff-diff\nare showing the read operations that we are performing to read the Lorem ipsum text from this text file.\nThe first thing you notice when looking at the diagrams, is that in an unbuffered environment the read operations leads to many system calls.\nMore precisely, in the diagram exposed at @fig-unbuffered-io we get one system call per each byte that we read from the text file.\nOn the other hand, at @fig-buffered-io we have only one system call at the very beginning.\n\nWhen we use a buffered IO system, at the first read operation we perform, instead of sending one single byte directly\nto our program, the OS first sends a chunk of bytes from the file to a buffer object (i.e. 
an array).\nThis chunk of bytes are cached/stored inside this buffer object, and when this operation is done, then\nyour program receives the byte that it actually asked for.\n\nFrom now on, for every new read operation that you perform, instead of making a new system call to ask\nfor the next byte in the file to the OS, this read operation is redirected to the buffer object, that have\nthis next byte already cached and ready to go.\n\n\n[^lorem]: .\n\n::: {#fig-buff-diff layout-nrow=2}\n\n![Unbuffered IO](./../Figures/unbuffered-io.png){#fig-unbuffered-io width=60%}\n\n![Buffered IO](./../Figures/buffered-io.png){#fig-buffered-io}\n\nDiagrams of read operations performed in buffered IO and unbuffered IO environments.\n\n:::\n\nThis is the basic logic behind buffered IO systems. The size of the buffer object depends, but most of the times,\nit is equal to a full page of memory (4096 bytes). If we follow this logic, then, the OS reads the first 4096 bytes\nof the file and caches it into the buffer object. As long as your program does not consume all of the 4096 bytes from the buffer,\nnot a single system call is created.\n\nHowever, as soon as you consume all of the 4096 bytes from the buffer, it means that there is no bytes left in the buffer.\nIn this situation, a new system call is made to ask the OS to send the next 4096 bytes in the file, and once again,\nthese bytes are cached into the buffer object, and the cycle starts once again.\n\n\n### Buffered IO across different languages\n\nIO operations made through a `FILE` pointer in C are buffered\nby default, so, at least in C, you don't need to worry about this subject. 
But in contrast, IO operations in both Rust and Zig are not\nbuffered depending on which functions from the standard libraries that you are using.\n\nFor example, in Rust, buffered IO is implemented through the `BufReader` and `BufWriter` structs, while in Zig, it is implemented\nthrough the `BufferedReader` and `BufferedWriter` structs.\nSo any IO operation that you perform through the `GenericWriter` and `GenericReader` objects\nthat I presented at @sec-writer-reader are not buffered, which means that these objects\nmight create a lot of system calls depending on the situation.\n\n\n### Using buffered IO in Zig\n\nUsing buffered IO in Zig is actually very easy. All you have to do is to just\ngive the `GenericWriter` object to the `bufferedWriter()` function, or, to give the `GenericReader`\nobject to the `bufferedReader()` function. These functions come from the `std.io` module,\nand they will construct the `BufferedWriter` or `BufferedReader` object for you.\n\nAfter you create this new `BufferedWriter` or `BufferedReader` object, you can call the `writer()`\nor `reader()` method of this new object, to get access to a new (and buffered) generic reader or\ngeneric writer.\n\nLet's describe the process once again. Every time that you have a file descriptor object, you first get the generic writer or generic reader\nobject from it, by calling the `writer()` or `reader()` methods of this file descriptor object.\nThen, you provide this generic writer or generic reader to the `bufferedWriter()` or `bufferedReader()`\nfunction, which creates a new `BufferedWriter` or `BufferedReader` object. Then, you call\nthe `writer()` or `reader()` methods of this buffered writer or buffered reader object,\nwhich gives you access to a generic writer or generic reader object that is buffered.\n\nTake this program as an example. 
This program is essentially demonstrating the process exposed at @fig-buffered-io.\nWe are simply opening a text file that contains the Lorem ipsum text, and then, we create a buffered IO reader object\nat `bufreader`, and we use this `bufreader` object to read the contents of this file into a buffer object, then,\nwe end the program by printing this buffer to `stdout`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar file = try std.fs.cwd().openFile(\n \"ZigExamples/file-io/lorem.txt\", .{}\n);\ndefer file.close();\nvar buffered = std.io.bufferedReader(file.reader());\nvar bufreader = buffered.reader();\n\nvar buffer: [1000]u8 = undefined;\n@memset(buffer[0..], 0);\n\n_ = try bufreader.readUntilDelimiterOrEof(\n buffer[0..], '\\n'\n);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n```\nLorem ipsum dolor sit amet, consectetur\nadipiscing elit. Sed tincidunt erat sed nulla ornare, nec\naliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus,\nultrices eleifend porttitor ut, finibus ut tortor. Maecenas\nsapien justo, finibus tincidunt dictum ac, semper et lectus.\nVivamus molestie egestas orci ac viverra. Pellentesque nec\narcu facilisis, euismod eros eu, sodales nisl. Ut egestas\nsagittis arcu, in accumsan sapien rhoncus sit amet. Aenean\nneque lectus, imperdiet ac lobortis a, ullamcorper sed massa.\nNullam porttitor porttitor erat nec dapibus. Ut vel dui nec\nnulla vulputate molestie eget non nunc. Ut commodo luctus ipsum,\nin finibus libero feugiat eget. Etiam vel ante at urna tincidunt\nposuere sit amet ut felis. Maecenas finibus suscipit tristique.\nDonec viverra non sapien id suscipit.\n```\n\nDespite being a buffered IO reader, this `bufreader` object is similar to any other `GenericReader` object,\nand have the exact same methods. 
So, although these two types of objects perform very different IO operations,\nthey have the same interface, so, you the programmer, can interchangeably use them\nwithout the need to change anything in your source code.\nSo a buffered IO reader or a buffered IO writer objects have the same methods than it's generic and unbuffered brothers,\ni.e. the generic reader and generic writer objects that I presented at @sec-writer-reader.\n\n::: {.callout-tip}\nIn general, you should always use a buffered IO reader or a buffered IO writer object to perform\nIO operations in Zig. Because they deliver better performance to your IO operations.\n:::\n\n\n## Filesystem basics\n\nNow that we have discussed the basics around Input/Output operations in Zig, we need to\ntalk about the basics around filesystems, which is another core part of any operating system.\nAlso, filesystems are related to input/output, because the files that we store and create in our\ncomputer are considered an IO resource, as we described at @sec-file-descriptor.\n\nLikewise when we were talking about input/output, if you have ever programmed in your life, you probably know\nsome basics about filesystems and file operations, etc.\nBut, since I don't know you, I don't know what is your background. As a result,\nthese concepts that I will describe might be clear in your mind, but they also maybe be not as clear as you think.\nJust bare with me, while I'm trying to put everyone on the same basis.\n\n\n### The concept of current working directory (CWD)\n\nThe working directory is the folder on your computer where you are currently rooted at,\nor in other words, it is the folder that your program is currently looking at.\nTherefore, whenever you are executing a program, this program is always working with\na specific folder on your computer. 
It is in this folder that the program will initially\nlook for the files you ask for, and it is also in this folder that the program\nwill, by default, save the files you ask it to save.\n\nWhen you run a program from a terminal, its working directory is the folder from which you invoke it.\nIn other words, if you execute a binary file (i.e. a program) from the terminal of your OS, the folder\nthat your terminal is currently pointing at becomes the current working directory of that program.\n\nAt @fig-cwd we have an example of me executing a program from the terminal. We are executing\nthe program produced by the `zig` compiler from the Zig module named `hello.zig`.\nThe CWD in this case is the `zig-book` folder. In other words, while the `hello.zig` program\nis executing, it will be looking at the `zig-book` folder, and any file operation that we perform\ninside this program will use this `zig-book` folder as its \"starting point\".\n\n![An example of executing a program from the terminal](./../Figures/cwd.png){#fig-cwd}\n\nBeing rooted inside a particular folder of our computer (in the case of @fig-cwd, the `zig-book` folder)\ndoes not mean that we cannot read or write resources in other locations of our computer.\nThe current working directory (CWD) only defines where your program looks first\nfor the files you ask for. It does not prevent you from accessing files that are located\nelsewhere on your computer. However, to access any file that is in a folder other than your\ncurrent working directory, you must provide a path to that file or folder.\n\n\n### The concept of paths\n\nA path is essentially a location in your filesystem. We use\npaths to describe the location of files and folders on our computer.\nOne important aspect is that paths are always written inside strings,\ni.e. 
they are always provided as text values.\n\nThere are two types of paths that you can provide to any program in any OS: relative paths and absolute paths.\nAbsolute paths start at the root of your filesystem and go all the way to the file or the specific folder\nthat you are referring to. This type of path is called absolute because it points to a unique, absolute location on your computer.\nThat is, no other location on your computer corresponds to this path. It is a unique identifier.\n\nOn Windows, an absolute path starts with a drive identifier (e.g. `C:/Users/pedro`).\nOn Linux and MacOS, absolute paths start with a forward slash character (e.g. `/usr/local/bin`).\nNotice that a path is composed of \"segments\", which are connected to each other by a slash character (`\\` or `/`).\nOn Windows, the backward slash (`\\`) is normally used to separate path segments, while on Linux and MacOS, the forward\nslash (`/`) is used.\n\nIn contrast, a relative path is a path that starts at the CWD. In other words, a relative path is\n\"relative to the CWD\". The path used to access the `hello.zig` file at @fig-cwd is an example of a relative path, and it\nis reproduced below. This path begins at the CWD, which in the context of @fig-cwd is the `zig-book` folder.\nThen it goes into the `ZigExamples` folder, then into `zig-basics`, and finally to the `hello.zig` file.\n\n```\nZigExamples/zig-basics/hello_world.zig\n```\n\n\n### Path wildcards\n\nWhen providing paths, especially relative paths, you have the option of using a *wildcard*.\nThere are two commonly used *wildcards* in paths: \"one period\" (`.`) and \"two periods\" (`..`).\nIn other words, these characters have a special meaning when used in paths,\nand they can be used on any operating system (Mac, Windows, Linux, etc.). 
That is, they\nare \"cross platform\".\n\nThe \"one period\" is an alias for the current directory. At the start of a relative path, this is your current working directory.\nThis means that the relative paths `\"./Course/Data/covid.csv\"` and `\"Course/Data/covid.csv\"` are equivalent.\nOn the other hand, the \"two periods\" refer to the parent directory, i.e. the folder one level above.\nFor example, the path `\"Course/..\"` goes into the `Course` folder and then back up to its parent,\nso it is equivalent to the path `\".\"`, that is, the current working directory.\n\nAs another example, the path `\"src/writexml/../xml.cpp\"` refers to the file `xml.cpp`\nthat is inside the parent of the `writexml` folder, which in this example is the `src` folder.\nTherefore, this path is equivalent to `\"src/xml.cpp\"`.\n\n\n\n\n## The CWD handler\n\nIn Zig, filesystem operations are usually performed through a directory handler object.\nA directory handler in Zig is an object of type `Dir`, which describes\na particular folder in the filesystem of our computer.\nYou normally create a `Dir` object by calling the `std.fs.cwd()` function.\nThis function returns a `Dir` object that points to (or describes) the\ncurrent working directory (CWD).\n\nThrough this `Dir` object, you can create new files, or read and modify existing ones that are\ninside your CWD. In other words, a `Dir` object is the main entry point in Zig to perform\nmany types of filesystem operations.\nIn the example below, we create this `Dir` object and store it\ninside the `cwd` object. 
Although we are not using this object in this code example,\nwe are going to use it a lot in the next examples.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\n_ = cwd;\n```\n:::\n\n\n\n\n\n\n\n\n\n\n\n## File operations\n\n### Creating files {#sec-creating-files}\n\nWe create new files by using the `createFile()` method from the `Dir` object.\nJust provide the name of the file that you want to create, and this method will\ntake the necessary steps to create it. You can also provide a relative path to this method.\nIt will then create the file at this path, which is interpreted relative to the directory described by the `Dir` object (the CWD, in our examples).\n\nThis method might return an error, so you should use `try`, `catch`, or any of the other methods presented\nat @sec-error-handling to handle the possible error. But if everything goes well,\n`createFile()` returns a file descriptor object (i.e. a `File` object) as result,\nthrough which you can add content to the file with the IO operations that I presented before.\n\nTake the code example below. In this example, we are creating a new text file\nnamed `foo.txt`. If `createFile()` succeeds, the object named `file` will contain a file descriptor\nobject, which we can use to write new content to the file. In this example, we use\nthe (unbuffered) writer object returned by `file.writer()` to write a new line of text to the file.\n\nA quick note: when we open a file in C with a function like `fopen()`, we must always close the file\nat the end of our program, or as soon as we complete all the operations that we wanted to perform\non it. In Zig, this is no different. Every time we create a new file, this file remains\n\"open\", waiting for some operation to be performed. 
As soon as we are done with it, we should always\nclose this file, to free the resources associated with it.\nIn Zig, we do this by calling the `close()` method of the file descriptor object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{});\n// Don't forget to close the file at the end.\ndefer file.close();\n// Do things with the file ...\nconst fw = file.writer();\ntry fw.writeAll(\n \"Writing this line to the file\\n\"\n);\n```\n:::\n\n\n\n\n\nSo, in this example we not only created a file in the filesystem,\nbut we also wrote some data into this file, using the file descriptor object\nreturned by `createFile()`. If the file that you are trying to create\nalready exists in your filesystem, this `createFile()` call will\noverwrite the file, or, in other words, it will\nin practice erase all the contents of the existing file.\n\nIf you don't want this to happen, meaning that you don't want to overwrite\nthe contents of the existing file, but you want to write data to this file anyway\n(i.e. you want to append data to the file), you should use the `openFile()`\nmethod from the `Dir` object.\n\nAnother important aspect of `createFile()` is that, by default, the file descriptor it returns\nis not open for read operations. This means that you cannot read the file through this file descriptor.\nFor example, you might want to write some data into this file at the beginning of the execution\nof your program, and then, at a later point in your program, read back what you have\nwritten into it. If you try to read data through this file descriptor, you will likely\nget a `NotOpenForReading` error as result.\n\n\nBut how can you overcome this barrier? How can you create a file that is open\nfor read operations? All you have to do is set the `read` flag to true\nin the second argument of `createFile()`. 
When you set this flag to true,\nthe file descriptor returned by `createFile()` is opened for both reading and writing, and, as a consequence,\na program like the one below becomes valid:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{ .read = true });\ndefer file.close();\n\nconst fw = file.writer();\ntry fw.writeAll(\"We are going to read this line\\n\");\n\nvar buffer: [300]u8 = undefined;\n@memset(buffer[0..], 0);\n// Move the position indicator back to the start before reading.\ntry file.seekTo(0);\nconst fr = file.reader();\n_ = try fr.readAll(buffer[0..]);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n\n```\nWe are going to read this line\n```\n\n\nIf you are not familiar with position indicators, you may not recognize what the method\n`seekTo()` is, or what it does. If that is your case, do not worry,\nwe are going to talk more about this method at @sec-indicators. In essence,\nthis method moves the position indicator back to the beginning of the file,\nso that we can read the contents of the file from the beginning.\n\n\n### Opening files and appending data to them\n\nOpening files is easy. Just use the `openFile()` method instead of `createFile()`.\nIn the first argument of `openFile()` you provide the path to the file that\nyou want to open. Then, in the second argument you provide the flags (or options)\nthat dictate how the file is opened.\n\nYou can see the full list of options for `openFile()` by visiting the documentation for\n[`OpenFlags`](https://ziglang.org/documentation/master/std/#std.fs.File.OpenFlags)[^oflags].\nBut the flag that you will most likely care about is the `mode` flag.\nThis flag specifies the IO mode that the file will use when it gets opened.\nThere are three IO modes, or three values that you can provide to this flag:\n\n- `read_only` (the default), allows only read operations on the file. All write operations are blocked.\n- `write_only`, allows only write operations on the file. 
All read operations are blocked.\n- `read_write`, allows both write and read operations on the file.\n\n[^oflags]: \n\nThese modes are similar to the modes that you provide to the `mode` argument of the\n`open()` Python built-in function[^py-open], or to the `mode` argument of the\n`fopen()` C function[^c-open].\nIn the code example below, we are opening the `foo.txt` text file in `write_only` mode,\nand appending a new line of text to the end of the file. This time, we use `seekFromEnd()`\nto guarantee that the text is appended to the end of the file. Once again, methods\nsuch as `seekFromEnd()` are described in more depth at @sec-indicators.\n\n[^py-open]: \n[^c-open]: \n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.openFile(\"foo.txt\", .{ .mode = .write_only });\ndefer file.close();\n// Move the position indicator to the end of the file before writing.\ntry file.seekFromEnd(0);\nconst fw = file.writer();\ntry fw.writeAll(\"Some random text to write\\n\");\n```\n:::\n\n\n\n\n\n### Deleting files\n\nSometimes, we just need to delete/remove files.\nTo do that, we use the `deleteFile()` method. You just provide the path of the\nfile that you want to delete, and this method will try to delete the file located\nat this path.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.deleteFile(\"foo.txt\");\n```\n:::\n\n\n\n\n### Copying files\n\nTo copy existing files, we use the `copyFile()` method. The first argument of this method\nis the path to the file that you want to copy. The second argument is a `Dir` object, i.e. a directory handler.\nMore specifically, it is a `Dir` object that points to the folder on your computer where you want to\ncopy the file to. The third argument is the new path of the file, relative to this `Dir` object, or, in other words, the new location\nof the file. 
The fourth argument contains the options (or flags) to be used in the copy operation.\n\nThe `Dir` object that you provide as input to this method is used to locate the new file.\nIf you are planning to copy the file to a completely different location on your computer,\nit might be worth creating a directory handler for that location before calling `copyFile()`. But if you are copying the\nfile to a subfolder of your CWD, you can simply pass the CWD handler to this argument.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.copyFile(\n \"foo.txt\",\n cwd,\n \"ZigExamples/file-io/foo.txt\",\n .{}\n);\n```\n:::\n\n\n\n\n\n### Read the docs!\n\nThere are other useful methods for file operations available in `Dir` objects,\nsuch as the `writeFile()` method. I recommend you to read the docs for the\n[`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir]\nto explore the other available methods, since I have already talked too much about them.\n\n\n[^zig-dir]: \n\n\n\n\n## Position indicators {#sec-indicators}\n\nA position indicator is like a cursor, or an index. This \"index\" identifies the\nlocation in the file (or in the data stream) that your file descriptor object\nis currently looking at.\nWhen you create a file descriptor, the position indicator starts at the beginning of the file,\nor at the beginning of the stream. When you read or write data into the file (or socket, or data stream, etc.)\ndescribed by this file descriptor object, you end up moving the position indicator.\n\nIn other words, reading and writing have a common side effect, which is moving the position indicator.\nFor example, suppose that we have a file that is 300 bytes in size. If you\nread 100 bytes from the file, the position indicator moves 100 bytes forward. 
If you then\nwrite 50 bytes into this same file, these 50 bytes will be written starting at the current\nposition of the position indicator. Since the indicator is 100 bytes forward from\nthe beginning of the file, these 50 bytes would be written in the middle of the file.\n\nThis is why we used the `seekTo()` method in the last code example presented at @sec-creating-files.\nIn that example, we first wrote a line of text into the file, and this write operation moved the position indicator\nto the end of the text that we had just written. If we had started reading from there, there would be nothing left to read.\nSo we used `seekTo()` to move the position indicator back to the beginning of the file, which\nensured that the read operation would start from the beginning of the file.\n\nThe position indicator of a file descriptor object can be changed by using the\n\"seek\" methods of this file descriptor, which are: `seekTo()`, `seekFromEnd()` and `seekBy()`.\nThese methods have the same effect, or the same responsibility, as the\n[`fseek()`](https://en.cppreference.com/w/c/io/fseek)[^c-fseek]\nC function.\n\n[^c-fseek]: \n\n\nConsidering that `offset` refers to the index that you provide as input to these \"seek\" methods,\nthe bullet points below summarise the effect of each of these methods.\nA quick note: in the case of `seekFromEnd()` and `seekBy()`, the `offset` provided can be either a\npositive or a negative number.\n\n- `seekTo()` moves the position indicator to the location that is `offset` bytes from the beginning of the file.\n- `seekFromEnd()` moves the position indicator to the location that is `offset` bytes from the end of the file.\n- `seekBy()` moves the position indicator to the location that is `offset` bytes from the current position in the file.\n\n\n\n\n\n\n\n## Directory operations\n\n### Iterating through the files in a directory\n\nOne of the most classic filesystem tasks is to be able\nto 
iterate through the existing files in a directory. Iteration\nover a directory is done in Zig through an iterator pattern. In other words, we need\nto create an iterator object, and use this object to iterate through the entries of the directory.\n\nYou can produce such an iterator object by using either the `iterate()` or `walk()` methods\nof a `Dir` object. Both methods return an iterator object as result, which you can advance by using\n`next()`. The difference between these methods is that `iterate()` returns a non-recursive iterator,\nwhile `walk()` returns a recursive one. This means that the iterator returned by `walk()` will not only iterate through\nthe files available in the current directory, but also through the files of any subdirectory found\ninside the current directory. Unlike `iterate()`, `walk()` also requires an allocator as input.\n\nIn the example below, we are displaying the names of the files stored inside the\ndirectory `ZigExamples/file-io`. Notice that we had to open this directory through\nthe `openDir()` method. Also notice that we set the `iterate` flag in the\nsecond argument of `openDir()`. 
This flag is important, because without it,\nwe would not be allowed to iterate through the files in this directory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nvar dir = try cwd.openDir(\n \"ZigExamples/file-io/\",\n .{ .iterate = true }\n);\ndefer dir.close();\nvar it = dir.iterate();\nwhile (try it.next()) |entry| {\n try stdout.print(\n \"File name: {s}\\n\",\n .{entry.name}\n );\n}\n```\n:::\n\n\n\n\n```\nFile name: create_file_and_write_toit.zig\nFile name: create_file.zig\nFile name: lorem.txt\nFile name: iterate.zig\nFile name: delete_file.zig\nFile name: append_to_file.zig\nFile name: user_input.zig\nFile name: foo.txt\nFile name: create_file_and_read.zig\nFile name: buff_io.zig\nFile name: copy_file.zig\n```\n\n\n### Creating new directories\n\nThere are two methods that are important when it comes to\ncreating directories: `makeDir()` and `makePath()`.\nThe difference between these two methods is that `makeDir()` can\nonly create a single directory in each call (and the parent of this directory must already exist).\nIn contrast, `makePath()` is capable of recursively creating subdirectories in the same call.\n\n\nThis is why the name of this method is \"make path\". It will create as many\nsubdirectories as necessary to create the path that you provided as input.\nSo, if you provide the path `\"sub1/sub2/sub3\"` as input to this method,\nit will create three different subdirectories, `sub1`, `sub2` and `sub3`,\nwithin the same function call. In contrast, if you provided such a path\nas input to `makeDir()`, you would likely get an error as result (unless `sub1/sub2` already exists), since\nthis method can only create a single directory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.makeDir(\"src\");\ntry cwd.makePath(\"src/decoders/jpg/\");\n```\n:::\n\n\n\n\n### Deleting directories\n\nTo delete a directory, just provide the path to the directory that you want to delete\nas input to the `deleteDir()` method from a `Dir` object. 
However, `deleteDir()` only works on empty directories, and the `src` directory that we created in the previous example\nstill contains the `decoders/jpg` subdirectories. So, in the example below,\nwe use the `deleteTree()` method instead, which deletes the `src` directory together with everything inside it.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\n// `deleteDir()` only removes empty directories,\n// `deleteTree()` also removes everything inside them.\ntry cwd.deleteTree(\"src\");\n```\n:::\n\n\n\n\n\n## Conclusion\n\nIn this chapter, I have described how to perform the most common filesystem and IO operations in Zig.\nBut you might miss some other, less common, operations in this chapter, such as how to rename files,\nhow to create symbolic links, or how to use `access()` to test if a particular\npath exists on your computer. For all of these less common tasks, I recommend you to read\nthe docs of the [`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir],\nsince you can find a good description of these cases there.\n\n\n\n\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Filesystem and Input/Output (IO) {#sec-filesystem}\n\nIn this chapter we are going to discuss how to use the cross-platform structs and functions available\nin the Zig Standard Library that execute filesystem operations. Most of these functions and structs\ncome from the `std.fs` module.\n\nWe are also going to talk about Input/Output (also known as IO) operations in Zig. Most of\nthese operations are made by using the structs and functions from the `std.io` module, which defines\ndescriptors for the *standard channels* of your system (`stdout`, `stdin` and `stderr`), and also\nfunctions to create and use I/O streams.\n\n\n## Input/Output basics {#sec-io-basics}\n\nIf you have programming experience in a high-level language, you have certainly used\nthe input and output functionalities of this language before. In other words, you have certainly\nbeen in a situation where you needed to send some output to the user, or to receive some input\nfrom the user.\n\nFor example, in Python we can receive some input from the user by using the `input()` built-in\nfunction, and we can print (or \"show\") some output to the user by using the `print()`\nbuilt-in function. So if you have programmed in Python before, you have certainly\nused these functions.\n\nBut do you know how these functions relate back to your operating system (OS)? How exactly\ndo they interact with the resources of your OS to receive or send input/output?\nIn essence, these input/output functions from high-level languages are just abstractions\nover the *standard output* and *standard input* channels of your operating system.\n\nThis means that we receive input, or send output, through the operating system.\nIt is the OS that makes the bridge between the user and your program. Your program\ndoes not have direct access to the user. 
It is the OS that intermediates every\nmessage exchanged between your program and the user.\n\nThe *standard output* and *standard input* channels of your OS are commonly known as the\n`stdout` and `stdin` channels, respectively. In some contexts, they are also called the *standard output device*\nand the *standard input device*. As the names suggest, the *standard output*\nis the channel through which output flows, while the *standard input* is the channel through which\ninput flows.\n\nFurthermore, operating systems also normally create a dedicated channel for exchanging error messages, known as the\n*standard error* channel, or the `stderr` channel. This is the channel to which error and warning messages\nare usually sent. Some terminals and development tools display these messages in red-like or orange-like colors.\n\nNormally, every OS (e.g. Windows, MacOS, Linux, etc.) creates a dedicated and separate set of\n*standard output*, *standard error* and *standard input* channels for every single program (or process) that runs on your computer.\nThis means that every program you write has its own `stdin`, `stderr` and `stdout`, which are separate\nfrom the `stdin`, `stderr` and `stdout` of other programs and processes that are currently running.\n\nThis behaviour comes from your OS,\nnot from the programming language that you are using.\nAs I said earlier, input and output in programming languages, especially\nin high-level ones, are just an abstraction over the `stdin`, `stderr` and `stdout` of your current OS.\nThat is, your OS is the intermediary in every input/output operation made by your program,\nregardless of the programming language that you are using.\n\n\n### The writer and reader pattern {#sec-writer-reader}\n\nIn Zig, there is a pattern around input/output (IO). 
I (the author of this book) don't know if there is an official name for this pattern.\nBut here, in this book, I will call it the \"writer and reader pattern\". In essence, IO operations in Zig are usually\nmade through either a `GenericReader` or a `GenericWriter` object[^gen-zig].\n\nThese two data types come from the `std.io` module of the Zig Standard Library. As their names suggest, a\n`GenericReader` is an object that offers tools to read data from \"something\" (or \"somewhere\"), while a `GenericWriter`\noffers tools to write data into this \"something\".\nThis \"something\" might be many different things. It might be a file that exists in your filesystem, a network socket of your system[^sock], or\na continuous stream of data. One such stream is the standard input device of your system, which might be constantly\nreceiving new data from users. Another is a live chat in a game that is constantly receiving and displaying new messages from the\nplayers of the game.\n\n[^gen-zig]: Previously, these objects were known as the `Reader` and `Writer` objects.\n[^sock]: The socket objects that we have created at @sec-create-socket are examples of network sockets.\n\nSo, if you want to **read** data from something, or somewhere, you need to use a `GenericReader` object.\nBut if you need to **write** data into this \"something\" instead, then you need to use a `GenericWriter` object.\nBoth of these objects are normally created from a file descriptor object, more specifically, through the `writer()` and `reader()`\nmethods of this file descriptor object. If you are not familiar with this type of object, go to the\nnext section.\n\nEvery `GenericWriter` object has methods like `print()`, which allows you to write/send a formatted string\n(i.e. a string built from a template, like an f-string in Python, or like the format string given to the `printf()` C function)\ninto the \"something\" (file, socket, stream, etc.) that you are using. 
It also has a `writeAll()` method, which allows you to\nwrite a string, or an array of bytes, into the \"something\".\n\nLikewise, every `GenericReader` object has methods like `readAll()`, which allows you to read\ndata from the \"something\" (file, socket, stream, etc.) until it fills a particular array (i.e. a \"buffer\") object.\nIn other words, if you provide an array object of 300 `u8` values to `readAll()`, then this method attempts to read 300 bytes\nof data from the \"something\", and stores them into the array object that you have provided. If the end of the stream is reached first,\nit reads fewer bytes, and its return value tells you how many bytes were actually read.\n\nWe also have other methods, like the `readAtLeast()` method,\nwhich allows you to specify a minimum number of bytes that you want to read from the \"something\".\nIn more detail, if you give the number $n$ as input to this method, it will attempt to read at least $n$ bytes of data from the \"something\".\nSo you might get more than $n$ bytes (up to the size of the buffer that you provide), or fewer than $n$ bytes\nif the \"something\" reaches its end before that.\n\nAnother useful method is `readUntilDelimiterOrEof()`. In this method, you specify a \"delimiter character\".\nThe idea is that this method will attempt to read as many bytes of data as possible from the \"something\",\nuntil it encounters either the end of the stream or the \"delimiter character\" that you have specified.\n\nIf you don't know exactly how many bytes will come from the \"something\", you may find the `readAllAlloc()` method\nuseful. 
In essence, you provide an allocator object to this method, together with a maximum size for the data, so that it can allocate more space if needed.\nAs a consequence, this method will try to read all the bytes of the \"something\". If it runs out of space at some point\nduring the \"reading process\", it uses the allocator object to allocate more space and continues reading\n(it returns an error if the data exceeds the maximum size that you have provided).\nAs a result, this method returns a slice containing all the bytes read, and you are responsible for freeing this slice with the same allocator.\n\nThis is just a quick description of the methods present in these types of objects. I recommend you\nto read the official docs for both\n[`GenericWriter`](https://ziglang.org/documentation/master/std/#std.io.GenericWriter)[^gen-write] and\n[`GenericReader`](https://ziglang.org/documentation/master/std/#std.io.GenericReader)[^gen-read].\nI also think it is a good idea to read the source code of the modules in the Zig Standard Library\nthat define the methods present in these objects, which are\n[`Reader.zig`](https://github.com/ziglang/zig/blob/master/lib/std/io/Reader.zig)[^mod-read]\nand [`Writer.zig`]()[^mod-write].\n\n[^gen-read]: .\n[^gen-write]: .\n[^mod-read]: .\n[^mod-write]: .\n\n\n### Introducing file descriptors {#sec-file-descriptor}\n\nA \"file descriptor\" object is a core component behind every I/O operation that is made in any operating system (OS).\nSuch an object is an identifier for a particular input/output (IO) resource of your OS [@wiki_file_descriptor].\nIt describes and identifies this particular resource. An IO resource might be:\n\n- an existing file in your filesystem.\n- an existing network socket.\n- other types of stream channels.\n- a pipeline (or just \"pipe\") in your terminal[^pipes].\n\n[^pipes]: A pipeline is a mechanism for inter-process communication, or inter-process IO. You could also interpret a pipeline as a \"set of processes that are chained together, through the standard input/output devices of the system\". 
On Linux, for example, a pipeline is created inside a terminal by connecting two or more terminal commands with the \"pipe\" character (`|`).\n\nFrom the bullet points listed above, we know that although the term \"file\" is present,\na \"file descriptor\" might describe something more than just a file.\nThis concept of a \"file descriptor\" comes from the Portable Operating System Interface (POSIX) API,\nwhich is a family of standards that specify a common interface that operating systems can implement,\nto maintain compatibility between them.\n\nA file descriptor not only identifies the input/output resource that you are using to receive or send some data,\nbut it also describes where this resource is, and which IO mode this resource is currently using.\nFor example, this IO resource might be using only the \"read\" IO mode, which means that this resource\nis open to \"read operations\", while \"write operations\" are closed and not authorized.\nThese IO modes are essentially the modes that you provide to the `mode` argument\nof the `fopen()` C function, and also of the `open()` Python built-in function.\n\nIn C, you usually work with files through a `FILE` pointer, which, on POSIX systems, wraps an integer file descriptor. In Zig, a file descriptor is represented by a `File` object.\nThis data type (`File`) is defined in the `std.fs` module of the Zig Standard Library.\nWe normally don't create a `File` object directly in our Zig code. Instead, we normally get such an object as a result when we\nopen an IO resource. In other words, we normally ask our OS to open and use a particular IO\nresource, and, if the OS successfully opens this IO resource, it normally hands back to us\na file descriptor for this particular IO resource.\n\nSo you usually get a `File` object by using functions and methods from the Zig Standard Library\nthat ask the OS to open some IO resource, like the `openFile()` method that opens a file in the\nfilesystem. 
The `net.Stream` object that we have created at @sec-create-socket is also a type of\nfile descriptor object.\n\n\n### The *standard output*\n\nYou have already seen throughout this book how to access and use the `stdout` in Zig\nto send some output to the user.\nFor that, we use the `getStdOut()` function from the `std.io` module. This function returns\na file descriptor that describes the `stdout` channel of your current OS. Through this file\ndescriptor object, we can write data to the `stdout` of our program.\n\nThis file descriptor, like any `File` object, also offers a `reader()` method. Even so, we normally only\nwrite (or \"print\") data into this channel. The reason is very similar to what we discussed at\n@sec-read-http-message, when we were discussing what \"reading from\" versus \"writing to\" the connection\nobject of our small HTTP Server project would mean.\n\nWhen we write data into a channel, we are essentially sending data to the other end of this channel.\nIn contrast, when we read data from this channel, we are essentially reading the data that was sent\nthrough this channel. Since the `stdout` is a channel to send output to the user, the key verb here\nis **send**. We want to send something to someone, and, as a consequence, we want to **write** something\ninto some channel.\n\nThat is why, when we use `getStdOut()`, most of the time we also use the `writer()` method of the `stdout` file descriptor,\nto get access to a writer object that we can use to write data into this `stdout` channel.\nMore specifically, this `writer()` method returns a `GenericWriter` object. 
One of the\nmain methods of this `GenericWriter` object is the `print()` method that we have used\nbefore to write (or \"print\") a formatted string into the `stdout` channel.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\npub fn main() !void {\n try stdout.writeAll(\n \"This message was written into stdout.\\n\"\n );\n}\n```\n:::\n\n\n\n\n```\nThis message was written into stdout.\n```\n\n\nThis `GenericWriter` object is like any other generic writer object that you would normally get from a file descriptor object.\nSo, the same methods that you would use on a generic writer object while writing files to the filesystem, for example, can also\nbe used here, on the writer obtained from the `stdout` file descriptor, and vice versa.\n\n\n### The *standard input*\n\nYou can access the *standard input* (i.e. `stdin`) in Zig by using the `getStdIn()` function from the `std.io` module.\nLike its sister (`getStdOut()`), this function also returns a file descriptor object that describes the `stdin` channel\nof your OS.\n\nSince we now want to receive some input from the user, the key verb here becomes **receive**, and, as a consequence,\nwe usually want to **read** data from the `stdin` channel, instead of writing data into it. So, we normally use\nthe `reader()` method of the file descriptor object returned by `getStdIn()`, to get access to a `GenericReader`\nobject that we can use to read data from `stdin`.\n\nIn the example below, we are creating a small buffer capable of holding 20 characters. Then, we try to read\nthe data from the `stdin` with the `readUntilDelimiterOrEof()` method, and save this data into the `buffer` object.\nAlso notice that we are reading the data from the `stdin` until we hit a new line character (`'\\n'`).\n\nIf you execute this program, you will notice that this program pauses its execution and starts to wait indefinitely\nfor some input from the user. 
In other words, you need to type your name into the terminal, and then press Enter to\nsend your name to `stdin`. After you send your name to `stdin`, the program reads this input and continues its execution\nby printing the given name to `stdout`. In the example below, I typed my name (Pedro) into the terminal, and then, pressed Enter.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst stdin = std.io.getStdIn().reader();\npub fn main() !void {\n try stdout.writeAll(\"Type your name\\n\");\n var buffer: [20]u8 = undefined;\n @memset(buffer[0..], 0);\n _ = try stdin.readUntilDelimiterOrEof(buffer[0..], '\\n');\n try stdout.print(\"Your name is: {s}\\n\", .{buffer});\n}\n```\n:::\n\n\n\n\n```\nType your name\nYour name is: Pedro\n\n```\n\n\n### The *standard error*\n\nThe *standard error* (a.k.a. the `stderr`) works exactly the same as the `stdout`.\nYou just call the `getStdErr()` function from the `std.io` module, and you get the file descriptor to `stderr`.\nIdeally, you should write only error or warning messages to `stderr`, because this is\nthe purpose of this channel.\n\n\n\n\n\n## Buffered IO\n\nAs we described at @sec-io-basics, input/output (IO) operations are made directly by the operating system.\nIt is the OS that manages the IO resource that you want to use for your IO operations.\nThe consequence of this fact is that IO operations are heavily based on system calls (i.e. calling the operating system directly).\n\nJust to be clear, there is nothing particularly wrong with system calls. We use them all the time in\nany serious codebase written in any low-level programming language. However, a system call is\nusually orders of magnitude slower than an ordinary operation performed inside your program, such as reading a byte from an array in memory.\n\nSo it is perfectly fine to use a system call once in a while. 
But when system calls start to be made often,\nyou will usually notice a loss of performance in your application. So, a good rule of thumb\nis to use a system call only when it is needed, and also, only in infrequent situations, to reduce\nthe number of system calls performed to a minimum.\n\n\n### Understanding how buffered IO works\n\nBuffered IO is a strategy to achieve better performance. It is used to reduce the number of system calls made by IO operations and, as a\nconsequence, achieve much higher performance. At @fig-buff-diff you can find two different diagrams which present the differences between\nread operations performed in an unbuffered IO environment versus a buffered IO environment.\n\nTo give better context to these diagrams, let's suppose that we have a text file that contains the famous Lorem ipsum text[^lorem]\nin our filesystem. Let's also suppose that these diagrams at @fig-buff-diff\nare showing the read operations that we are performing to read the Lorem ipsum text from this text file.\nThe first thing you notice when looking at the diagrams is that in an unbuffered environment the read operations lead to many system calls.\nMore precisely, in the diagram exposed at @fig-unbuffered-io we get one system call for each byte that we read from the text file.\nOn the other hand, at @fig-buffered-io we have only one system call at the very beginning.\n\nWhen we use a buffered IO system, at the first read operation we perform, instead of sending one single byte directly\nto our program, the OS first sends a chunk of bytes from the file to a buffer object (i.e. 
an array).\nThis chunk of bytes is cached (stored) inside this buffer object, and when this operation is done,\nyour program receives the byte that it actually asked for.\n\nFrom then on, for every new read operation that you perform, instead of making a new system call to ask\nthe OS for the next byte in the file, this read operation is redirected to the buffer object, which\nalready has this next byte cached and ready to go.\n\n\n[^lorem]: .\n\n::: {#fig-buff-diff layout-nrow=2}\n\n![Unbuffered IO](./../Figures/unbuffered-io.png){#fig-unbuffered-io width=60%}\n\n![Buffered IO](./../Figures/buffered-io.png){#fig-buffered-io}\n\nDiagrams of read operations performed in buffered IO and unbuffered IO environments.\n\n:::\n\nThis is the basic logic behind buffered IO systems. The size of the buffer object varies, but most of the time,\nit is equal to a full page of memory (commonly 4096 bytes). If we follow this logic, then, the OS reads the first 4096 bytes\nof the file and caches them in the buffer object. As long as your program does not consume all of the 4096 bytes from the buffer,\nno new system call is made.\n\nHowever, as soon as you consume all of the 4096 bytes from the buffer, there are no bytes left in the buffer.\nIn this situation, a new system call is made to ask the OS to send the next 4096 bytes in the file,\nthese bytes are cached in the buffer object, and the cycle starts again.\n\n\n### Buffered IO across different languages\n\nIO operations made through a `FILE` pointer in C are buffered\nby default, so, at least in C, you don't need to worry about this subject. 
In contrast, in both Rust and Zig, whether an IO operation is buffered or not\ndepends on which functions from the standard library you are using.\n\nFor example, in Rust, buffered IO is implemented through the `BufReader` and `BufWriter` structs, while in Zig, it is implemented\nthrough the `BufferedReader` and `BufferedWriter` structs.\nSo any IO operation that you perform directly through the `GenericWriter` and `GenericReader` objects of a file descriptor,\nlike the ones that I presented at @sec-writer-reader, is not buffered, which means that these objects\nmight create a lot of system calls depending on the situation.\n\n\n### Using buffered IO in Zig\n\nUsing buffered IO in Zig is actually very easy. All you have to do is\ngive the `GenericWriter` object to the `bufferedWriter()` function, or give the `GenericReader`\nobject to the `bufferedReader()` function. These functions come from the `std.io` module,\nand they will construct the `BufferedWriter` or `BufferedReader` object for you.\n\nAfter you create this new `BufferedWriter` or `BufferedReader` object, you can call the `writer()`\nor `reader()` method of this new object, to get access to a new (and buffered) generic reader or\ngeneric writer.\n\nLet's describe the process once again. Every time that you have a file descriptor object, you first get the generic writer or generic reader\nobject from it, by calling the `writer()` or `reader()` methods of this file descriptor object.\nThen, you provide this generic writer or generic reader to the `bufferedWriter()` or `bufferedReader()`\nfunction, which creates a new `BufferedWriter` or `BufferedReader` object. Then, you call\nthe `writer()` or `reader()` methods of this buffered writer or buffered reader object,\nwhich gives you access to a generic writer or generic reader object that is buffered.\nIf you are using a buffered writer, remember to call the `flush()` method of the `BufferedWriter` object once you are done writing,\notherwise any data still held in its buffer is never written.\n\nTake this program as an example. 
This program is essentially demonstrating the process shown in @fig-buffered-io.\nWe are simply opening a text file that contains the Lorem ipsum text, and then, we create a buffered IO reader object\nat `bufreader`, and we use this `bufreader` object to read the contents of this file into a buffer object, then,\nwe end the program by printing this buffer to `stdout`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar file = try std.fs.cwd().openFile(\n \"ZigExamples/file-io/lorem.txt\", .{}\n);\ndefer file.close();\nvar buffered = std.io.bufferedReader(file.reader());\nvar bufreader = buffered.reader();\n\nvar buffer: [1000]u8 = undefined;\n@memset(buffer[0..], 0);\n\n_ = try bufreader.readUntilDelimiterOrEof(\n buffer[0..], '\\n'\n);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n```\nLorem ipsum dolor sit amet, consectetur\nadipiscing elit. Sed tincidunt erat sed nulla ornare, nec\naliquet ex laoreet. Ut nec rhoncus nunc. Integer magna metus,\nultrices eleifend porttitor ut, finibus ut tortor. Maecenas\nsapien justo, finibus tincidunt dictum ac, semper et lectus.\nVivamus molestie egestas orci ac viverra. Pellentesque nec\narcu facilisis, euismod eros eu, sodales nisl. Ut egestas\nsagittis arcu, in accumsan sapien rhoncus sit amet. Aenean\nneque lectus, imperdiet ac lobortis a, ullamcorper sed massa.\nNullam porttitor porttitor erat nec dapibus. Ut vel dui nec\nnulla vulputate molestie eget non nunc. Ut commodo luctus ipsum,\nin finibus libero feugiat eget. Etiam vel ante at urna tincidunt\nposuere sit amet ut felis. Maecenas finibus suscipit tristique.\nDonec viverra non sapien id suscipit.\n```\n\nDespite being a buffered IO reader, this `bufreader` object is similar to any other `GenericReader` object,\nand has the exact same methods. 
So, although these two types of objects perform very different IO operations,\nthey have the same interface, so you, the programmer, can use them interchangeably\nwithout changing the code that uses them.\nIn other words, buffered IO reader and writer objects have the same methods as their generic and unbuffered siblings,\ni.e. the generic reader and generic writer objects that I presented at @sec-writer-reader.\n\n::: {.callout-tip}\nIn general, you should prefer a buffered IO reader or a buffered IO writer object to perform\nIO operations in Zig, because they usually deliver better performance for your IO operations.\n:::\n\n\n## Filesystem basics\n\nNow that we have discussed the basics around Input/Output operations in Zig, we need to\ntalk about the basics around filesystems, which is another core part of any operating system.\nAlso, filesystems are related to input/output, because the files that we store and create in our\ncomputer are considered an IO resource, as we described at @sec-file-descriptor.\n\nJust like when we were talking about input/output, if you have ever programmed in your life, you probably know\nsome basics about filesystems and file operations, etc.\nBut, since I don't know you, I don't know what your background is. As a result,\nthese concepts that I will describe might be clear in your mind, but they may also not be as clear as you think.\nJust bear with me while I try to put everyone on the same page.\n\n\n### The concept of current working directory (CWD)\n\nThe working directory is the folder on your computer where you are currently rooted,\nor in other words, it is the folder that your program is currently looking at.\nTherefore, whenever you are executing a program, this program is always working with\na specific folder on your computer. 
It is always in this folder that the program will initially\nlook for the files you require, and it is also in this folder that the program\nwill initially save all the files you ask it to save.\n\nThe working directory is determined by the folder from which you invoke your program\nin the terminal. In other words, if you are in the terminal of your OS, and you\nexecute a binary file (i.e. a program) from this terminal, the folder that your terminal\nis pointing to is the current working directory of the program being executed.\n\nAt @fig-cwd we have an example of me executing a program from the terminal. We are executing\nthe program produced by the `zig` compiler from the Zig module named `hello.zig`.\nThe CWD in this case is the `zig-book` folder. In other words, while the `hello.zig` program\nis executing, it will be looking at the `zig-book` folder, and any file operation that we perform\ninside this program will use this `zig-book` folder as the \"starting point\", or, as the \"central focus\".\n\n![An example of executing a program from the terminal](./../Figures/cwd.png){#fig-cwd}\n\nJust because we are rooted inside a particular folder (in the case of @fig-cwd, the `zig-book` folder) of our computer,\nit doesn't mean that we cannot access or write resources in other locations on our computer.\nThe current working directory (CWD) mechanism just defines where your program will look first\nfor the files you ask for. This does not prevent you from accessing files that are located\nelsewhere on your computer. However, to access any file that is in a folder other than your\ncurrent working directory, you must provide a path to that file or folder.\n\n\n### The concept of paths\n\nA path is essentially a location. It points to a location in your filesystem. We use\npaths to describe the location of files and folders in our computer.\nOne important aspect is that paths are always written inside strings,\ni.e. 
they are always provided as text values.\n\nThere are two types of paths that you can provide to any program in any OS: a relative path, or an absolute path.\nAbsolute paths are paths that start at the root of your filesystem, and go all the way to the file name or the specific folder\nthat you are referring to. This type of path is called absolute, because it points to a unique, absolute location on your computer.\nThat is, there is no other existing location on your computer that corresponds to this path. It is a unique identifier.\n\nIn Windows, an absolute path is a path that starts with a drive letter (e.g. `C:/Users/pedro`).\nOn the other hand, absolute paths in Linux and MacOS are paths that start with a forward slash character (e.g. `/usr/local/bin`).\nNotice that a path is composed of \"segments\". The segments are connected to each other by a slash character (`\\` or `/`).\nOn Windows, the backward slash (`\\`) is normally used to connect the path segments, while on Linux and MacOS, the forward\nslash (`/`) is the character used to connect path segments.\n\nIn contrast, a relative path is a path that starts at the CWD. In other words, a relative path is\n\"relative to the CWD\". The path used to access the `hello.zig` file at @fig-cwd is an example of a relative path. This path\nis reproduced below. This path begins at the CWD, which in the context of @fig-cwd, is the `zig-book` folder,\nthen, it goes to the `ZigExamples` folder, then, into `zig-basics`, then, to the `hello.zig` file.\n\n```\nZigExamples/zig-basics/hello_world.zig\n```\n\n\n### Path wildcards\n\nWhen providing paths, especially relative paths, you have the option of using a *wildcard*.\nThere are two commonly used *wildcards* in paths, which are \"one period\" (.) and \"two periods\" (..).\nIn other words, these two specific characters have special meanings when used in paths,\nand can be used on any operating system (Mac, Windows, Linux, etc.). 
That is, they\nare \"cross platform\".\n\nThe \"one period\" represents an alias for your current working directory.\nThis means that the relative paths `\"./Course/Data/covid.csv\"` and `\"Course/Data/covid.csv\"` are equivalent.\nOn the other hand, the \"two periods\" refer to the parent directory, i.e. the directory one level up.\nFor example, the path `\"Course/..\"` refers to the parent of the `Course` folder, which is the current working directory.\nIn other words, the path `\"Course/..\"` is equivalent to the path `\".\"`.\n\nAs another example, the path `\"src/writexml/../xml.cpp\"` refers to the file `xml.cpp`\nthat is inside the parent of the `writexml` folder, which in this example is the `src` folder.\nTherefore, this path is equivalent to `\"src/xml.cpp\"`.\n\n\n\n\n## The CWD handler\n\nIn Zig, filesystem operations are usually made through a directory handler object.\nA directory handler in Zig is an object of type `Dir`, which is an object that describes\na particular folder in the filesystem of our computer.\nYou normally create a `Dir` object by calling the `std.fs.cwd()` function.\nThis function returns a `Dir` object that points to (or, that describes) the\ncurrent working directory (CWD).\n\nThrough this `Dir` object, you can create new files, or modify or read existing ones that are\ninside your CWD. In other words, a `Dir` object is the main entrypoint in Zig to perform\nmultiple types of filesystem operations.\nIn the example below, we are creating this `Dir` object, and storing it\ninside the `cwd` object. 
Although we are not using this object in this code example,\nwe are going to use it a lot in the next examples.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\n_ = cwd;\n```\n:::\n\n\n\n\n\n\n\n\n\n\n\n## File operations\n\n### Creating files {#sec-creating-files}\n\nWe create new files by using the `createFile()` method from the `Dir` object.\nJust provide the name of the file that you want to create, and this function will\ntake the necessary steps to create such a file. You can also provide a relative path to this function,\nand it will create the file by following this path, which is relative to the CWD.\n\nThis function might return an error, so you should use `try`, `catch`, or any of the other methods presented\nat @sec-error-handling to handle the possible error. But if everything goes well,\nthis `createFile()` method returns a file descriptor object (i.e. a `File` object) as a result,\nthrough which you can add content to the file with the IO operations that I presented before.\n\nTake a look at the code example below. In this example, we are creating a new text file\nnamed `foo.txt`. If the function `createFile()` succeeds, the object named `file` will contain a file descriptor\nobject, which we can use to write (or add) new content to the file, like we do in this example, by using\na writer object to write a new line of text to the file.\n\nNow, a quick note: when we create a file descriptor object in C, by using a C function like `fopen()`, we must always close the file\nat the end of our program, or, as soon as we complete all operations that we wanted to perform\non the file. In Zig, this is no different. So every time we create a new file, this file remains\n\"open\", waiting for some operation to be performed. 
As soon as we are done with it, we always have\nto close this file to free the resources associated with it.\nIn Zig, we do this by calling the `close()` method of the file descriptor object.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{});\n// Don't forget to close the file at the end.\ndefer file.close();\n// Do things with the file ...\nvar fw = file.writer();\n_ = try fw.writeAll(\n \"Writing this line to the file\\n\"\n);\n```\n:::\n\n\n\n\n\nSo, in this example we not only created a file in the filesystem,\nbut we also wrote some data into this file, using the file descriptor object\nreturned by `createFile()`. If the file that you are trying to create\nalready exists in your filesystem, this `createFile()` call will\noverwrite the contents of the file, or, in other words, it will\nin practice erase all the contents of the existing file.\n\nIf you don't want this to happen, meaning that you don't want to overwrite\nthe contents of the existing file, but you want to write data to this file anyway\n(i.e. you want to append data to the file), you should use the `openFile()`\nmethod from the `Dir` object.\n\nAnother important aspect about `createFile()` is that, by default, the file descriptor that this method returns\nis not open for read operations. It means that you cannot read the file through this file descriptor.\nSo for example, you might want to write some stuff into this file at the beginning of the execution\nof your program. Then, at a future point in your program you might need to read what you have\nwritten into this file. If you try to read data from this file, you will likely\nget a `NotOpenForReading` error as a result.\n\n\nBut how can you overcome this barrier? How can you create a file that is open\nto read operations? All you have to do is set the `read` flag to true\nin the second argument of `createFile()`. 
When you set this flag to true,\nthe file gets created and also opened for read operations, and, as a consequence,\na program like this one below becomes valid:\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.createFile(\"foo.txt\", .{ .read = true });\ndefer file.close();\n\nvar fw = file.writer();\n_ = try fw.writeAll(\"We are going to read this line\\n\");\n\nvar buffer: [300]u8 = undefined;\n@memset(buffer[0..], 0);\ntry file.seekTo(0);\nvar fr = file.reader();\n_ = try fr.readAll(buffer[0..]);\ntry stdout.print(\"{s}\\n\", .{buffer});\n```\n:::\n\n\n\n\n\n```\nWe are going to read this line\n```\n\n\nIf you are not familiar with position indicators, you may not recognize the method\n`seekTo()`, or know what it does. If that is your case, do not worry,\nwe are going to talk more about this method at @sec-indicators. But essentially\nthis method moves the position indicator back to the beginning of the file,\nso that we can read the contents of the file from the beginning.\n\n\n### Opening files and appending data to it\n\nOpening files is easy. Just use the `openFile()` method instead of `createFile()`.\nIn the first argument of `openFile()` you provide the path to the file that\nyou want to open. Then, in the second argument you provide the flags (or, the options)\nthat dictate how the file is opened.\n\nYou can see the full list of options for `openFile()` by visiting the documentation for\n[`OpenFlags`](https://ziglang.org/documentation/master/std/#std.fs.File.OpenFlags)[^oflags].\nBut the main flag that you will most likely care about is the `mode` flag.\nThis flag specifies the IO mode that the file will be using when it is opened.\nThere are three IO modes, or, three values that you can provide to this flag, which are:\n\n- `read_only`, allows only read operations on the file. All write operations are blocked.\n- `write_only`, allows only write operations on the file. 
All read operations are blocked. \n- `read_write`, allows both write and read operations on the file.\n\n[^oflags]: \n\nThese modes are similar to the modes that you provide to the `mode` argument of the\n`open()` Python built-in function[^py-open], or, the `mode` argument of the\n`fopen()` C function[^c-open].\nIn the code example below, we are opening the `foo.txt` text file in `write_only` mode,\nand appending a new line of text to the end of the file. We use `seekFromEnd()` this time\nto guarantee that we are going to append the text to the end of the file. Once again, methods\nsuch as `seekFromEnd()` are described in more depth at @sec-indicators.\n\n[^py-open]: \n[^c-open]: \n\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst file = try cwd.openFile(\"foo.txt\", .{ .mode = .write_only });\ndefer file.close();\ntry file.seekFromEnd(0);\nvar fw = file.writer();\n_ = try fw.writeAll(\"Some random text to write\\n\");\n```\n:::\n\n\n\n\n\n### Deleting files\n\nSometimes, we just need to delete/remove the files that we have.\nTo do that, we use the `deleteFile()` method. You just provide the path of the\nfile that you want to delete, and this method will try to delete the file located\nat this path.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.deleteFile(\"foo.txt\");\n```\n:::\n\n\n\n\n### Copying files\n\nTo copy existing files, we use the `copyFile()` method. The first argument in this method\nis the path to the file that you want to copy. The second argument is a `Dir` object, i.e. a directory handler,\nmore specifically, a `Dir` object that points to the folder in your computer where you want to\ncopy the file to. The third argument is the new path of the file, relative to this `Dir` object, or, in other words, the new location\nof the file. 
The fourth argument contains the options (or flags) to be used in the copy operation.\n\nThe `Dir` object that you provide as input to this method will be used to copy the file to\nthe new location. You may need to create this `Dir` object before calling the `copyFile()` method.\nMaybe you are planning to copy the file to a completely different location on your computer,\nso it might be worth creating a directory handler to that location. But if you are copying the\nfile to a subfolder of your CWD, you can simply pass the CWD handler to this argument.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.copyFile(\n \"foo.txt\",\n cwd,\n \"ZigExamples/file-io/foo.txt\",\n .{}\n);\n```\n:::\n\n\n\n\n\n### Read the docs!\n\nThere are some other useful methods for file operations available on `Dir` objects,\nsuch as the `writeFile()` method, but I recommend that you read the docs for the\n[`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir]\nto explore the other available methods, since I have already talked enough about them here.\n\n\n[^zig-dir]: \n\n\n\n\n## Position indicators {#sec-indicators}\n\nA position indicator is like a type of cursor, or, an index. This \"index\" identifies the current\nlocation in the file (or, in the data stream) that the file descriptor object that you have\nis currently looking at.\nWhen you create a file descriptor, the position indicator starts at the beginning of the file,\nor, at the beginning of the stream. When you read or write data into the file (or socket, or data stream, etc.)\ndescribed by this file descriptor object, you end up moving the position indicator.\n\nIn other words, any IO operation has a common side effect, which is moving the position indicator.\nFor example, suppose that we have a file that is 300 bytes in size. If you\nread 100 bytes from the file, the position indicator moves 100 bytes forward. 
If you try\nto write 50 bytes into this same file, these 50 bytes will be written from the current\nposition indicated by the position indicator. Since the indicator is 100 bytes forward from\nthe beginning of the file, these 50 bytes would be written in the middle of the file.\n\nThis is why we have used the `seekTo()` method in the last code example presented at @sec-creating-files.\nWe have used this method to move the position indicator back to the beginning of the file, which\nmade sure that we would read the text that we had just written from the beginning of the file,\ninstead of reading from the end of the file. Because before the read operation, we had already\nperformed a write operation, which means that the position indicator was moved by this write operation.\n\nThe position indicator of a file descriptor object can be changed (or altered) by using the\n\"seek\" methods from this file descriptor, which are `seekTo()`, `seekFromEnd()` and `seekBy()`.\nThese methods have the same effect, or, the same responsibility as the\n[`fseek()`](https://en.cppreference.com/w/c/io/fseek)[^c-fseek]\nC function.\n\n[^c-fseek]: \n\n\nConsidering that `offset` refers to the index that you provide as input to these \"seek\" methods,\nthe bullet points below summarize the effect of each of these methods.\nA quick note: in the case of `seekFromEnd()` and `seekBy()`, the `offset` provided can be either a\npositive or negative index.\n\n- `seekTo()` will move the position indicator to the location that is `offset` bytes from the beginning of the file.\n- `seekFromEnd()` will move the position indicator to the location that is `offset` bytes from the end of the file.\n- `seekBy()` will move the position indicator to the location that is `offset` bytes from the current position in the file.\n\n\n\n\n\n\n\n## Directory operations\n\n### Iterating through the files in a directory\n\nOne of the most classic tasks related to filesystems is to be able\nto 
iterate through the existing files in a directory. Iteration\nover a directory is done in Zig through an iterator pattern. In other words, we need\nto create an iterator object, and use this object to iterate through the files.\n\nYou can produce such an iterator object by using either the `iterate()` or `walk()` methods\nof a `Dir` object. Both methods return an iterator object as a result, which you can advance by using\n`next()`. The difference between these methods is that `iterate()` returns a non-recursive iterator,\nwhile `walk()` returns a recursive one. It means that the iterator returned by `walk()` will not only iterate through\nthe files available in the current directory, but also through the files from any subdirectory found\ninside the current directory. Also note that `walk()` takes an allocator as input, since it needs to keep track\nof the subdirectories that it is visiting.\n\nIn the example below, we are displaying the names of the files stored inside the\ndirectory `ZigExamples/file-io`. Notice that we had to open this directory through\nthe `openDir()` method. Also notice that we provided the flag `iterate` in the\nsecond argument of `openDir()`. 
This flag is important, because without this flag,\nwe would not be allowed to iterate through the files in this directory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\nconst dir = try cwd.openDir(\n \"ZigExamples/file-io/\",\n .{ .iterate = true }\n);\nvar it = dir.iterate();\nwhile (try it.next()) |entry| {\n try stdout.print(\n \"File name: {s}\\n\",\n .{entry.name}\n );\n}\n```\n:::\n\n\n\n\n```\nFile name: create_file_and_write_toit.zig\nFile name: create_file.zig\nFile name: lorem.txt\nFile name: iterate.zig\nFile name: delete_file.zig\nFile name: append_to_file.zig\nFile name: user_input.zig\nFile name: foo.txt\nFile name: create_file_and_read.zig\nFile name: buff_io.zig\nFile name: copy_file.zig\n```\n\n\n### Creating new directories\n\nThere are two methods that are important when it comes to\ncreating directories, which are `makeDir()` and `makePath()`.\nThe difference between these two methods is that `makeDir()` can\nonly create one single directory in the current directory in each call,\nwhile `makePath()` is capable of recursively creating subdirectories in the same call.\n\n\nThis is why the name of this method is \"make path\". It will create as many\nsubdirectories as necessary to create the path that you provided as input.\nSo, if you provide the path `\"sub1/sub2/sub3\"` as input to this method,\nit will create three different subdirectories, `sub1`, `sub2` and `sub3`,\nwithin the same function call. In contrast, if you provided such a path\nas input to `makeDir()`, you would likely get an error as a result, since\nthis method can only create a single subdirectory.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.makeDir(\"src\");\ntry cwd.makePath(\"src/decoders/jpg/\");\n```\n:::\n\n\n\n\n### Deleting directories\n\nTo delete a directory, just provide the path to the directory that you want to delete\nas input to the `deleteDir()` method from a `Dir` object. 
In the example below,\nwe are deleting the `src` directory that we have just created in the previous example.\nHowever, `deleteDir()` only removes empty directories, and `src` is not empty, because it\ncontains the `decoders/jpg` subdirectories. That is why we use the `deleteTree()` method\nhere instead, which deletes a directory together with all of its contents.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst cwd = std.fs.cwd();\ntry cwd.deleteTree(\"src\");\n```\n:::\n\n\n\n\n\n## Conclusion\n\nIn this chapter, I have described how to perform the most common filesystem and IO operations in Zig.\nBut you might miss some other, less common, operations in this chapter, such as: how to rename files,\nor how to create symbolic links, or how to use `access()` to test if a particular\npath exists on your computer. For all of these less common tasks, I recommend reading\nthe docs of the [`Dir` type](https://ziglang.org/documentation/master/std/#std.fs.Dir)[^zig-dir]\n, since you can find a good description of these cases there.\n\n\n\n\n",
"supporting": [
"12-file-op_files"
],
diff --git a/_freeze/Chapters/14-threads/execute-results/html.json b/_freeze/Chapters/14-threads/execute-results/html.json
index f92e2a0..821af09 100644
--- a/_freeze/Chapters/14-threads/execute-results/html.json
+++ b/_freeze/Chapters/14-threads/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "97d9a308fa122ba882d065ea3df85c18",
+ "hash": "69a75274db6a230aa2778b3eccd354c8",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Introducing threads and parallelism in Zig {#sec-thread}\n\nThreads are available in Zig through the `Thread` struct\nfrom the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX Thread pattern,\nmeaning that, it works similarly to a thread from the `pthread` C library, which is usually available on any distribution\nof the GNU C Compiler (`gcc`). If you are not familiar with threads, I will give you some threory behind it first, shall we?\n\n\n## What are threads? {#sec-what-thread}\n\nA thread is basically a separate context of execution.\nWe use threads to introduce parallelism into our program,\nwhich in most cases, makes the program runs faster, because we have multiple tasks\nbeing performed at the same time, parallel to each other.\n\nPrograms are normally single-threaded by default. Which means that each program\nusually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no\nparallelism. And when we don't have parallelism, the commands are executed sequentially, that is,\nonly one command is executed at a time, one after another. By creating multiple threads inside our program,\nwe start to execute multiple commands at the same time.\n\nPrograms that create multiple threads are very common on the wild. Because many different types\nof applications are well suited for parallelism. Good examples are video and photo-editing applications\n(e.g. Adobe Photoshop or DaVinci Resolve)\n, games (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc).\nFor example, in web browsers, threads are normally used to implement tabs.\nIn other words, the tabs in a web browsers usually run as separate threads in the main process of\nthe web browser. 
That is, each new tab that you open in your web browser,\nusually runs on a separate thread of execution.\n\nBy running each tab in a separate thread, we allow all open tabs in the browser to run at the same time,\nand independently from each other. For example, you might have YouTube, or Spotify, currently opened in\na tab, and you are listening to some podcast in that tab, while, at the same time,\nyou are working in another tab, writing an essay on Google Docs. Even if you are not looking\ninto the YouTube tab, you can still hear the podcast only because this YouTube tab is running in parallel\nwith the other tab where Google Docs is running.\n\nWithout threads, the other alternative would be to run each tab as a completely separate running\nprocess in your computer. But that would be a bad choice, because just a few tabs would already consume\ntoo much power and resources from your computer. In other words, is very expensive to create a completely new process,\ncompared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead\nwhile using the browser would be significant. Threads are faster to create, and they also consume\nmuch, much less resources from the computer, especially because they share some resources\nwith the main process.\n\nTherefore, is the use of threads in modern web browsers that allows you to hear the podcast\nat the same time while you are writing something on Google Docs.\nWithout threads, a web browser would probably be limited to just one single tab.\n\nThreads are also well-suited for anything that involves serving requests or orders.\nBecause serving a request takes time, and usually involves a lot of \"waiting time\".\nIn other words, we spend a lot of time in idle, waiting for something to complete.\nFor example, consider a restaurant. Serving orders in a restaurant usually involves\nthe following steps:\n\n1. receive order from the client.\n1. 
pass the order to the kitchen, and wait for the food to be cooked.\n1. start cooking the food in the kitchen.\n1. when the food is fully cooked deliver this food to the client.\n\nIf you think about the bulletpoints above, you will notice that one big moment of waiting\nis present in this hole process, which is while the food is being prepared and cooked\ninside the kitchen. Because while the food is being prepped, both the waiter and the client\nitself are waiting for the food to be ready and delivered.\n\nIf we write a program to represent this restaurant, more specifically, a single-threaded program, then,\nthis program would be very inefficient. Because the program would stay in idle, waiting for a considerable amount\nof time on the \"check if food is ready\" step.\nConsider the code snippet exposed below that could potentially represent such\nprogram.\n\nThe problem with this program is the while loop. This program will spend a lot of time\nwaiting on the while loop, doing nothing more than just checking if the food is ready.\nThis is a waste of time. Instead of waiting for something to happen, the waiter\ncould just send the order to the kitchen, and just move on, and continue with receiving\nmore orders from other clients, and sending more orders to the kitchen, insteading\nof doing nothing and waiting for the food to be ready.\n\n```zig\nconst order = Order.init(\"Pizza Margherita\", n = 1);\nconst waiter = Waiter.init();\nwaiter.receive_order(order);\nwaiter.ask_kitchen_to_cook();\nvar food_not_ready = false;\nwhile (food_not_ready) {\n food_not_ready = waiter.is_food_ready();\n}\nconst food = waiter.get_food_from_kitchen();\nwaiter.send_food_to_client(food);\n```\n\nThis is why threads would be a great fit for this program. We could use threads\nto free the waiters from their \"waiting duties\", so they can go on with their\nother tasks, and receive more orders. 
Take a look at the next example, where I have re-written the above\nprogram into a different program that uses threads to cook and deliver the orders.\n\nYou can see in this program that when a waiter receives a new order\nfrom a client, this waiter executes the `send_order()` function.\nThe only thing that this function does is: it creates a new thread\nand detaches it. Since creating a thread is a very fast operation,\nthis `send_order()` function returns almost immediatly,\nso the waiter spends almost no time worring about the order, and just\nmove on and tries to get the next order from the clients.\n\nInside the new thread created, the order get's cooked by a chef, and when the\nfood is ready, it is delivered to the client's table.\n\n\n```zig\nfn cook_and_deliver_order(order: *Order) void {\n const chef = Chef.init();\n const food = chef.cook(order.*);\n chef.deliver_food(food);\n}\nfn send_order(order: Order) void {\n const cook_thread = Thread.spawn(\n .{}, cook_and_deliver_order, .{&order}\n );\n cook_thread.detach();\n}\n\nconst waiter = Waiter.init();\nwhile (true) {\n const order = waiter.get_new_order();\n if (order) {\n send_order(order);\n }\n}\n```\n\n\n\n## Threads versus processes\n\nWhen we run a program, this program is executed as a *process* in the operating system.\nThis is a one to one relationship, each program or application that you execute\nis a separate process in the operating system. But each program, or each process,\ncan create and contain multiple threads inside of it. 
Therefore,\nprocesses and threads have a one to many relationship.\n\nThis also means that every thread that we create is always associated with a particular process in our computer.\nIn other words, a thread is always a subset (or a children) of an existing process.\nAll threads share some of the resources associated with the process from which they were created.\nAnd because threads share resources with the process, they are very good for making communication\nbetween tasks easier.\n\nFor example, suppose that you were developing a big and complex application\nthat would be much simpler if you could split it in two, and make these two separate pieces talk\nwith each other. Some programmers opt to effectively write these two pieces of the codebase as two\ncompletely separate programs, and then, they use IPC (*inter-process communication*) to make these\ntwo separate programs/processes talk to each other, and make them work together.\n\nHowever, some programmers find IPC hard to deal with, and, as consequence,\nthey prefer to write one piece of the codebase as the \"main part of the program\",\nor, as the part of the code that runs as the process in the operating system,\nwhile the other piece of the codebase is written as a task to be executed in\na new thread. A process and a thread can easily comunicate with each other\nthrough both control flow, and also, through data, because they share and have\naccess to the same standard file descriptors (`stdout`, `stdin`, `stderr`) and also to the same memory space\non the heap and global data section.\n\n\nIn more details, each thread that you create have a separate stack frame reserved just for that thread,\nwhich essentially means that each local object that you create inside this thread, is local to that\nthread, i.e. the other threads cannot see this local object. Unless this object that you have created\nis an object that lives on the heap. 
In other words, if the memory associated with this object\nis on the heap, then, the other threads can potentially access this object.\n\nTherefore, objects that are stored in the stack are local to the thread where they were created.\nBut objects that are stored on the heap are potentially accessible to other threads. All of this means that,\neach thread have it's own separate stack frame, but, at the same time, all threads share\nthe same heap, the same standard file descriptors (which means that they share the same `stdout`, `stdin`, `stderr`),\nand the same global data section in the program.\n\n\n\n## Creating a thread\n\nWe create new threads in Zig, by first, importing the `Thread` struct into\nour current Zig module, and then, calling the `spawn()` method of this struct,\nwhich creates (or, \"spawns\") a new thread of execution from our current process.\nThis method have three arguments, which are, respectively:\n\n1. a `SpawnConfig` object, which contains configurations for the spawn process.\n1. the name of the function that is going to be executed (or, that is going to be \"called\") inside this new thread.\n1. a list of arguments (or inputs) to be passed to the function provided in the second argument.\n\nWith these three arguments, you can control how the thread get's created, and also, specify which\nwork (or \"tasks\") will be performed inside this new thread. A thread is just a separate context of execution,\nand we usually create new threads in our code, because we want to perform some work inside this\nnew context of execution. And we specify which exact work, or, which exact steps that are going to be\nperformed inside this context, by providing the name of a function on the second argument of the `spawn()` method.\n\nThus, when this new thread get's created, this function that you provided as input to the `spawn()`\nmethod get's called, or, get's executed inside this new thread. 
You can control the\narguments, or, the inputs that are passed to this function when it get's called, by providing\na list of arguments (or a list of inputs) on the third argument of the `spawn()` method.\nThese arguments are passed to the function in the same order that they are\nprovided to `spawn()`.\n\nFurthermore, the `SpawnConfig` is a struct object with only two possible fields, or, two possible members, that you\ncan set to tailor the spawn behaviour. These fields are:\n\n- `stack_size`: you can provide an `usize` value to specify the size (in bytes) of the thread's stack frame. By default, this value is: $16 \\times 1024 \\times 1024$.\n- `allocator`: you can provide an allocator object to be used when allocating memory for the thread.\n\nTo use one of these two fields (or, \"configs\") you just have to create a new object of type `SpawnConfig`,\nand provide this object as input to the `spawn()` method. But, if you are not interested in using\none of these configs, and you are ok with using just the defaults, you can just provide an anonymous\nstruct literal (`.{}`) in the place of this `SpawnConfig` argument.\n\nAs our first, and very simple example, consider the code exposed below.\nInside the same program, you can create multiple threads of execution if you want to.\nBut, in this first example, we are creating just a single thread of execution, because\nwe call `spawn()` only once.\n\nAlso, notice in this example that we are executing the function `do_some_work()`\ninside the new thread. 
Since this function receives no inputs, because it has\nno arguments, in this instance, we have passed an empty list, or, more precisely, an empty and anonymous struct (`.{}`)\nin the third argument of `spawn()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nfn do_some_work() !void {\n _ = try stdout.write(\"Starting the work.\\n\");\n std.time.sleep(100 * std.time.ns_per_ms);\n _ = try stdout.write(\"Finishing the work.\\n\");\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nStarting the work.Finishing the work.\n```\n\n\n:::\n:::\n\n\n\n\nNotice the use of `try` when calling the `spawn()` method. This means\nthat this method can return an error in some circunstances. One circunstance\nin particular is when you attempt to create a new thread, when you have already\ncreated too much (i.e. you have excedeed the quota of concurrent threads in your system).\n\nBut, if the new thread is succesfully created, the `spawn()` method returns a handler\nobject (which is just an object of type `Thread`) to this new thread. You can use\nthis handler object to effectively control all aspects of the thread.\n\nThe instant that you create the new thread, the function that you provided as input to `spawn()`\nget's invoked (i.e. 
get's called) to start the execution on this new thread.\nIn other words, everytime you call `spawn()`, not only a new thread get's created,\nbut also, the \"start work button\" of this thread get's automatically pressed.\nSo the work being performed in this thread starts at the moment that the thread is created.\nThis is similar to how `pthread_create()` from the `pthreads` library in C works,\nwhich also starts the execution at the moment that the thread get's created.\n\n\n## Returning from a thread\n\nWe have learned on the previous section that the execution of the thread starts at the moment\nthat the thread get's created. Now, we will learn how to \"join\" or \"detach\" a thread in Zig.\n\"Join\" and \"detach\" are operations that control how the thread returns to\nthe main thread, or, to the main process in our program.\n\nWe perform these operations by using the methods `join()` and `detach()` from the thread handler object.\nEvery thread that you create can be marked as either *joinable* or *detached* [@linux_pthread_create].\nYou can turn a thread into a *detached* thread by calling the `detach()` method\nfrom the thread handler object. But if you call the `join()` method instead, then, this thread\nbecomes a *joinable* thread.\n\nA thread cannot be both *joinable* and *detached*. Which in general means\nthat you cannot call both `join()` and `detach()` on the same thread.\nBut a thread must be one of the two, meaning that, you should always call\neither `join()` or `detach()` over a thread. If you don't call\none of these two methods over your thread, you introduce undefined behaviour into your program,\nwhich is described at @sec-not-call-join-detach.\n\nNow, let's describe what each of these two methods do to your thread.\n\n\n### Joining a thread\n\nWhen you join a thread, you are essentially saying: \"Hey! Could you please wait for the thread to finish,\nbefore you continue with your execution?\". 
For example, if we comeback to our first and simpliest example\nof a thread in Zig, in that example we have created a single thread inside the `main()` function of our program,\nand just called `join()` over this thread at the end. This section of the code example is reproduced below.\n\nBecause we are joining this new thread inside the `main()`'s scope, it means that the\nexecution of the `main()` function is temporarily stopped, to wait for the execution of the thread\nto finish. That is, the execution of `main()` stops temporarily at the line where `join()` get's called,\nand it will continue only after the thread has finished it's tasks.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n:::\n\n\n\n\nBecause we have joined this new thread inside `main()`, by calling `join()`, we have a\ngarantee that this new thread will finish before the end of the execution of `main()`.\nBecause it is garanteed that `main()` will wait for the thread to finish it's tasks.\nYou could also interpret this as: the execution of main will hang at\nthe line where `join()` is called, and the next lines of code that come after\nthis `join()` call, will be executed solely after the execution of main\nis \"unlocked\" after the thread finish it's tasks.\n\nIn the example above, there is no more expressions after the `join()` call. We just have the end\nof the `main()`'s scope, and, therefore after the thread finish it's tasks, the execution\nof our program just ends, since there is nothing more to do. But what if we had more stuff to do\nafter the join call?\n\nTo demonstrate this other possibility, consider the next example exposed\nbelow. Here, we create a `print_id()` function, that just receives an id\nas input, and prints it to `stdout`. In this example, we are creating two\nnew threads, one after another. 
Then, we join the first thread, then,\nwe wait for two hole seconds, then, at last, we join the second thread.\n\nThe idea behind this example is that the last `join()` call is executed\nonly after the first thread finish it's task (i.e. the first `join()` call),\nand also, after the two seconds of delay. If you compile and run this\nexample, you will notice that most messages are quickly printed to `stdout`,\ni.e. they appear almost instantly on your screen.\nHowever, the last message (\"Joining thread 2\") takes aroung 2 seconds to appear\nin the screen.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const id2: u8 = 2;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n const thread2 = try Thread.spawn(.{}, print_id, .{&id2});\n\n _ = try stdout.write(\"Joining thread 1\\n\");\n thread1.join();\n std.time.sleep(2 * std.time.ns_per_s);\n _ = try stdout.write(\"Joining thread 2\\n\");\n thread2.join();\n}\n```\n\n```\nThread ID: Joining thread 1\n1\nThread ID: 2\nJoining thread 2\n```\n\nThis demonstrates that both threads finish their work (i.e. printing the IDs)\nvery fast, before the two seconds of delay end. Because of that, the last `join()` call\nreturns pretty much instantly. Because when this last `join()` call happens, the second\nthread have already finished it's task.\n\nNow, if you compile and run this example, you will also notice that, in some cases,\nthe messages get intertwined with each other. In other words, you might see\nthe message \"Joining thread 1\" inserted in the middle of the message \"Thread 1\",\nor vice-versa. This happens because:\n\n- the threads are executing basically at the same time as the main process of the program (i.e. 
the `main()` function).\n- the threads share the same `stdout` from the main process of the program, which means that the messages that the threads produce are sent to exact same place as the messages produced by the main process.\n\nBoth of these points were described previously at @sec-what-thread.\nSo the messages might get intertwined because they are being produced and\nsent to the same `stdout` roughly at the same time.\nAnyway, when you call `join()` over a thread, the current process will wait\nfor the thread to finish before it continues, and, when the thread does finishs it's\ntask, the resources associated with this thread are automatically freed, and,\nthe current process continues with it's execution.\n\n\n### Detaching a thread\n\nWhen you detach a thread, by calling the `detach()` method, the thread is marked as *detached*.\nWhen a *detached* thread terminates, its resources are automatically released back to the system without\nthe need for another thread to join with this terminated thread.\n\nIn other words, when you call `detach()` over a thread is like when your children becomes adults,\ni.e. they become independent from you. A detached thread frees itself, and it does need to report the results back\nto you, when the thread finishs it's task. Thus, you normally mark a thread as *detached*\nwhen you don't need to use the return value of the thread, or, when you don't care about\nwhen exactly the thread finishs it's job, i.e. the thread solves everything by itself.\n\nTake the code example below. We create a new thread, detach it, and then, we just\nprint a final message before we end our program. 
We use the same `print_id()`\nfunction that we have used over the previous examples.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n thread1.detach();\n _ = try stdout.write(\"Finish main\\n\");\n}\n```\n\n```\nFinish main\n```\n\nNow, if you look closely at the output of this code example, you will notice\nthat only the final message in main was printed to the console. The message\nthat was supposed to be printed by `print_id()` did not appear in the console.\nWhy? Is because the main process of our program has finished first,\nbefore the thread was able to say anything.\n\nAnd that is perfectly ok behaviour, because the thread was detached, so, it was\nable to free itself, without the need of the main process.\nIf you ask main to sleep (or \"wait\") for some extra nanoseconds, before it ends, you will likely\nsee the message printed by `print_id()`, because you give enough time for the thread to\nfinish before the main process ends.\n\n\n## Thread pools\n\nThread pools is a very popular programming pattern, which is used especially on servers and daemons processes. A thread pool is just a\nset of threads, or, a \"pool\" of threads. 
Many programmers like to use this pattern, because it makes\neasier to manage and use multiple threads, instead of manually creating the threads when you need them.\n\nAlso, using thread pools might increase performance as well in your program,\nespecially if your program is constantly creating threads to perform short-lived tasks.\nIn such instance, a thread pool might cause an increase in performance because you do not have be constantly\ncreating and destroying threads all the time, so you don't face a lot of the overhead involved\nin this constant process of creating and destroying threads.\n\nThe main idea behind a thread pool is to have a set of threads already created and ready to perform\ntasks at all times. You create a set of threads at the moment that your program starts, and keep\nthese threads alive while your program runs. Each of these threads will be either performing a task, or,\nwaiting for a task to be assigned.\nEvery time a new task emerges in your program, this task is added to a \"queue of tasks\".\nThe moment that a thread becomes available and ready to perform a new task,\nthis thread takes the next task in the \"queue of tasks\", then,\nit simply performs the task.\n\nThe Zig Standard Library offers a thread pool implementation on the `std.Thread.Pool` struct.\nYou create a new instance of a `Pool` object by providing a `Pool.Options` object\nas input to the `init()` method of this struct. A `Pool.Options` object, is a struct object that contains\nconfigurations for the pool of threads. The most important settings in this struct object are\nthe members `n_jobs` and `allocator`. 
As the name suggests, the member `allocator` should receive an allocator object,\nwhile the member `n_jobs` specifies the number of threads to be created and maintained in this pool.\n\nConsider the example exposed below, that demonstrates how can we create a new thread pool object.\nHere, we create a `Pool.Options` object that contains\na general purpose allocator object, and also, the `n_jobs` member was set to 4, which\nmeans that the thread pool will create and use 4 threads.\n\nAlso notice that the `pool` object was initially set to `undefined`. This allow us\nto initially declare the thread pool object, but not properly instantiate the\nunderlying memory of the object. You have to initially declare your thread pool object\nby using `undefined` like this, because the `init()` method of `Pool` needs\nto have an initial pointer to properly instantiate the object.\n\nSo, just\nremember to create your thread pool object by using `undefined`, and then,\nafter that, you call the `init()` method over the object.\nYou should also not forget to call the `deinit()` method over the thread pool\nobject, once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will\nhave a memory leak in your program.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Pool = std.Thread.Pool;\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n const opt = Pool.Options{\n .n_jobs = 4,\n .allocator = allocator,\n };\n var pool: Pool = undefined;\n _ = try pool.init(opt);\n defer pool.deinit();\n}\n```\n:::\n\n\n\n\nNow that we know how to create `Pool` objects, we have\nto understand how to assign tasks to be executed by the threads in this pool object.\nTo assign a task to be performed by a thread, we need to call the `spawn()` method\nfrom the thread pool object.\n\nThis `spawn()` method works identical to the `spawn()` method from the\n`Thread` object. 
The method have almost the same arguments as the previous one,\nmore precisely, we don't have to provide a `SpawnConfig` object in this case.\nBut instead of creating a new thread, this `spawn()` method from\nthe thread pool object just register a new task in the internal \"queue of tasks\" to be performed,\nand any available thread in the pool will get this task, and it will simply perform the task.\n\nIn the example below, we are using our previous `print_id()` function once again.\nBut you may notice that the `print_id()` function is a little different this time,\nbecause now we are using `catch` instead of `try` in the `print()` call.\nCurrently, the `Pool` struct only supports functions that don't return errors\nas tasks. Thus, when assigining tasks to threads in a thread pool, is essential to use functions\nthat don't return errors. That is why we are using `catch` here, so that the\n`print_id()` function don't return an error.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_id(id: *const u8) void {\n _ = stdout.print(\"Thread ID: {d}\\n\", .{id.*})\n catch void;\n}\nconst id1: u8 = 1;\nconst id2: u8 = 2;\ntry pool.spawn(print_id, .{&id1});\ntry pool.spawn(print_id, .{&id2});\n```\n:::\n\n\n\n\nThis limitation should probably not exist, and, in fact, it is already on the radar of the\nZig team to fix this issue, and it is being tracked on an [open issue](https://github.com/ziglang/zig/issues/18810)[^issue].\nSo, if you do need to provide a function that might return an error as the task\nto be performed by the threads in the thread pool, then, you are either limited to:\n\n- implementing your own thread pool that does not have this limitation.\n- wait for the Zig team to actually fix this issue.\n\n[^issue]: \n\n\n\n\n## Mutexes\n\nMutexes are a classic component of every thread library. In essence, a mutex is a *Mutually Exclusive Flag*, and this flag\nacts like a type of \"lock\", or as a gate keeper to a particular section of your code. 
Mutexes are related to thread syncronization,\nmore specifically, they prevent you from having some classic race conditions in your program,\nand, therefore, major bugs and undefined behaviour that are usually difficult to track and understand.\n\nThe main idea behind a mutex is to help us to control the execution of a particular section of the code, and to\nprevent two or more threads from executing this particular section of the code at the same time.\nMany programmers like to compare a mutex to a bathroom door (which usually have a lock).\nWhen a thread locks it's own mutex object, it is like if the bathroom door was locked,\nand, therefore, the other people (in this case, the other threads) that wants to use the same bathroom at the same time\nhave to be patient, and simply wait for the other person (or the other thread) to unlock the door and get out of the bathroom.\n\nSome other programmers also like to explain mutexes by using the analogy of \"each person will have their turn to speak\".\nThis is the analogy used on the [*Multithreading Code* video from the Computherfile project](https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile)[^computerphile].\nImagine\nif you are in a conversation circle. There is a moderator in this circle, which is the person that decides who\nhave the right to speak at that particular moment. The moderator gives a green card (or some sort of an authorization card) to the person that\nis going to speak, and, as a result, everyone else must be silent and hear this person that has the green card.\nWhen the person finishs talking, it gives the green card back to the moderator, and the moderator decides\nwho is going to talk next, and delivers the green card to that person. And the cycle goes on like this.\n\n[^computerphile]: \n\n\nA mutex acts like the moderator in this conversation circle. 
The mutex authorizes one single thread to execute a specific section of the code,\nand it also blocks the other threads from executing this same section of the code. If these other threads wants to execute this same\npiece of the code, they are forced to wait for the the authorized thread to finish first.\nWhen the authorized thread finishs executing this code, the mutex authorizes the next thread to execute this code,\nand the other threads are still blocked. Therefore, a mutex is like a moderator that does a \"each thread will have their turn to execute this section of the code\"\ntype of control.\n\n\nMutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads\nare trying to read from or write to the same shared object at the same time.\nSo, when you have an object that is shared will all threads, and, you want to avoid two or more threads from\naccessing this same object at the same time, you can use a mutex to lock the part of the code that access this specific object.\nWhen a thread tries to run this code that is locked by a mutex, this thread stops it's execution, and patiently waits for this section of the codebase to be\nunlocked to continue.\n\nIn other words, the execution of the thread is paused while the code section\nis locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that\nwas executing this code section.\nNotice that mutexes are normally used to lock areas of the codebase that access/modify data that is **shared** with all threads,\ni.e. 
objects that are either stored in the global data section, or, in the heap space of your program.\nSo mutexes are not normally used on areas of the codebase that access/modify objects that are local to the thread.\n\n\n\n### Critical section {#sec-critical-section}\n\nCritical section is a concept commonly associated with mutexes and thread syncronization.\nIn essence, a critical section is the section of the program that a thread access/modify a shared resource\n(i.e. an object, a file descriptor, something that all threads have access to). In other words,\na critical section is the section of the program where race conditions might happen, and, therefore,\nwhere undefined behaviour can be introduced into the program.\n\nWhen we use mutexes in our program, the critical section defines the area of the codebase that we want to lock.\nSo we normally lock the mutex object at the beginning of the critical section,\nand then, we unlock it at the end of the critical section.\nThe two bulletpoints exposed below comes from the \"Critical Section\" article from GeekFromGeeks,\nand they summarise well the role that a critical section plays in the thread syncronization problem [@geeks_critical_section].\n\n\n1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time.\n2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. 
Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner.\n\n\n### Atomic operations {#sec-atomic-operation}\n\nYou will also see the term \"atomic operation\" a lot when reading about threads, race conditions and mutexes.\nIn summary, an operation is categorized as \"atomic\", when there is no way to happen a context switch in\nthe middle of this operation. In other words, this operation is always done from beginning to end, without interruptions\nof another process or operation in the middle of it's execution phase.\n\nNot many operations today are atomic. But why atomic operations matters here? Is because data races\n(which is a type of a race condition) cannot happen on operations that are atomic.\nSo if a particular line in your code performs an atomic operation, then, this line will never\nsuffer from a data race problem. Therefore, programmers sometimes use an atomic operation\nto protect themselves from data race problems in their code.\n\nWhen you have an operation that is compiled into just one single assembly instruction, this operation might be atomic,\nbecause is just one assembly instruction. But this is not guaranteed. This is usually true for old CPU architectures (such as `x86`). But nowadays, most\nassembly instructions in modern CPU architectures turn into multiple micro-tasks, which inherently makes the operation not atomic anymore,\neven though it has just one single assembly instruction.\n\nThe Zig Standard Library offers some atomic functionality at the `std.atomic` module.\nIn this module, you will find a public and generic function called `Value()`. 
With this function we create an \"atomic object\", which is\na value that contains some native atomic operations, most notably, a `load()` and a `fetchAdd()` operation.\nIf you have experience with multithreading in C++, you probably have recognized this pattern. So yes, this generic\n\"atomic object\" in Zig is essentially identical to the template struct `std::atomic` from the C++ Standard Library.\nIs important to emphasize that only primitive data types (i.e. the types presented at @sec-primitive-data-types)\nare supported by these atomic operations.\n\n\n\n\n\n### Data races and race conditions\n\nTo understand why mutexes are used, we need to understand better the problem that they seek\nto solve, which can be summarized into data races problems. A data race problem is a type of a race condition,\nwhich happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same\ntime that another thread is trying to write/save new data into this same memory location (i.e. the same shared object).\n\nWe can simply define a race condition as any type of bug in your program that is based\non a \"who get's there first\" problem. 
A data race problem is a type of a race condition, because it occurs when two or more parties\nare trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation\ndepends completely on who get's to this memory location first.\nAs consequence, a program that have a data race problem will likely produce a different result each time that we execute it.\n\nThus, race conditions produce unefined behaviour and unpredictability because the program produces\na different answer in each time that a different person get's to the target location first than the others.\nAnd we have no easy way to either predict or control who is going to get to this target location first.\nIn other words, in each execution of your program,\nyou get a different answer, because a different person, or, a different function, or, a different part of the code is finishing\nits tasks (or it is reaching a location) first than the others.\n\nAs an example, consider the code snippet exposed below. In this example, we create a global counter\nvariable, and we also create a `increment()` function, whose job is to just increment this global counter\nvariable in a for loop.\n\nSince the for loop iterates 1 hundred thousand times, and, we create two separate threads\nin this code example, what number do you expect to see in the final message printed to `stdout`?\nThe answer should be 2 hundred thousand. Right? Well, in threory, this program was supposed\nto print 2 hundred thousand at the end, but in practice, every time that I execute this program\nI get a different answer.\n\nIn the example exposed below, you can see that this time we have executed the program, the end\nresult was 117254, instead of the expected 200000. The second time I have executed this program,\nI got the number 108592 as result. 
So the end result of this program is varying, but it never gets\nto the expected 200000 that we want.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n// Global counter variable\nvar counter: usize = 0;\n// Function to increment the counter\nfn increment() void {\n for (0..100000) |_| {\n counter += 1;\n }\n}\n\npub fn main() !void {\n const thr1 = try Thread.spawn(.{}, increment, .{});\n const thr2 = try Thread.spawn(.{}, increment, .{});\n thr1.join();\n thr2.join();\n try stdout.print(\"Couter value: {d}\\n\", .{counter});\n}\n```\n:::\n\n\n\n\n```\nCouter value: 117254\n```\n\n\nWhy this is happening? The answer is: because this program contains a data race problem.\nThis program would print the correct number 200000, if, and only if the first thread finishs\nit's tasks before the second thread starts to execute. But that is very unlikely to happen.\nBecause the process of creating the thread is too fast, and therefore, both threads starts to execute roughly\nat the same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to `spawn()`,\nyou will increase the chances of the program producing the \"correct result\".\n\nSo the data race problem happens, because both threads are reading and writing to the same\nmemory location at roughly the same time. In this example, each thread is essentially performing\nthree basic operations at each iteration of the for loop, which are:\n\n1. reading the current value of `count`.\n1. incrementing this value by 1.\n1. writing the result back into `count`.\n\nIdeally, a thread B should read the value of `count`, only after the other thread A has finished\nwriting the incremented value back into the `count` object. Therefore, in the ideal scenario, which is demonstrated\nat @tbl-data-race-ideal, the threads should work in sync with each other. 
But the reality is that these\nthreads are out of sync, and because of that, they suffer from a data race problem, which is demonstrated\nat @tbl-data-race-not.\n\nNotice that, in the data race scenario (@tbl-data-race-not), the read performed by a thread B happens\nbefore the write operation of thread A, and that ultimately leads to wrong results at the end of the program.\nBecause when the thread B reads the value from the `count` variable, the thread A is still processing\nthe initial value from `count`, and it did not write the new and incremented value into `count` yet. So what\nhappens is that thread B ends up reading the same initial value (the \"old\" value) from `count`, instead of\nreading the new and incremented version of this value that would be calculated by thread A.\n\n\n::: {#tbl-data-race-ideal}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| increment | | 1 |\n| write value | | 1 |\n| | read value | 1 |\n| | increment | 2 |\n| | write value | 2 |\n\n: An ideal scenario for two threads incrementing the same integer value\n:::\n\n::: {#tbl-data-race-not}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| | read value | 0 |\n| increment | | 1 |\n| | increment | 1 |\n| write value | | 1 |\n| | write value | 1 |\n\n: A data race scenario when two threads are incrementing the same integer value\n:::\n\n\nIf you think about these diagrams exposed in form of tables, you will notice that they relate back to our discussion of atomic operations\nat @sec-atomic-operation. Remember, atomic operations are operations that the CPU executes\nfrom beginning to end, without interruptions from other threads or processes. 
So,\nthe scenario exposed at @tbl-data-race-ideal do not suffer from a data race, because\nthe operations performed by thread A are not interrupted in the middle by the operations\nfrom thread B.\n\nIf we also think about the discussion of critical section from @sec-critical-section, we can identify\nthe section that representes the critical section of the program, which is the section that is vulnerable\nto data race conditions. In this example, the critical section of the program is the line where we increment\nthe `counter` variable (`counter += 1`). So, ideally, we want to use a mutex, and lock right before this line, and then,\nunlock right after this line.\n\n\n\n\n### Using mutexes in Zig\n\nNow that we know the problem that mutexes seek to solve, we can learn how to use them in Zig.\nMutexes in Zig are available through the `std.Thread.Mutex` struct from the Zig Standard Library.\nIf we take the same code example from the previous example, and improve it with mutexes, to solve\nour data race problem, we get the code example exposed below.\n\nNotice that we had this time to alter the `increment()` function to receive a pointer to\nthe `Mutex` object as input. 
All that we need to do, to make this program safe against\ndata race problems, is to call the `lock()` method at the beginning of\nthe critical section, and then, call `unlock()` at the end of the critical section.\nNotice that the output of this program is now the correct number of 200000.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst Mutex = std.Thread.Mutex;\nvar counter: usize = 0;\nfn increment(mutex: *Mutex) void {\n for (0..100000) |_| {\n mutex.lock();\n counter += 1;\n mutex.unlock();\n }\n}\n\npub fn main() !void {\n var mutex: Mutex = .{};\n const thr1 = try Thread.spawn(.{}, increment, .{&mutex});\n const thr2 = try Thread.spawn(.{}, increment, .{&mutex});\n thr1.join();\n thr2.join();\n try stdout.print(\"Couter value: {d}\\n\", .{counter});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCouter value: 200000\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n## Read/Write locks\n\nMutexes are normally used when is always not safe to have two or more threads running the same\npiece of code at the same time. In contrast, read/write locks are normally used in situations\nwhere you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that\nare not safe.\n\nFor example, suppose that you have multiple threads that uses the same shared file in the filesystem to store some configurations, or,\nstatistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens.\nSo this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time.\n\nHowever, if two or more threads try to write data into this same file at the same time, then, we cause some race conditions\nproblems. 
So this other part of the codebase is not safe to be executed in parallel.\nMore specifically, a thread might end up writing data in the middle of the data written by the other thread.\nThis process of two or more threads writing to the same location, might lead to data corruption.\nThis specific situation is usually called of a *torn write*.\n\nThus, what we can extract from this is that there is certain types of operations that causes a race condition, but there\nare also, other types of operations that do not cause a race condition problem.\nYou could also say that, there are types of operations that are susceptible to race condition problems,\nand there are other types of operations that are not.\n\nA read/write lock is a type of lock that acknowledges the existance of this specific scenario, and you can\nuse this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not safe.\n\n\n\n### Exclusive lock vs shared lock\n\nTherefore, a read/write lock is a little different from a mutex. Because a mutex is always an *exclusive lock*, meaning that, only\none thread is allowed to execute at all times. With an exclusive lock, the other threads are always \"excluded\",\ni.e. they are always blocked from executing.\nBut in a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire.\n\nWe have two types of locks in a read/write lock, which are: an exclusive lock and a shared lock. An exclusive lock works exactly the same\nas a mutex, while a shared lock is a lock that does not block the other threads from running.\nIn the `pthreads` C library, read/write locks are available through the `pthread_rwlock_t` C struct. With\nthis C struct, you can create a \"write lock\", which corresponds to an exclusive lock, or, you can create a \"read lock\",\nwhich corresponds to a shared lock. 
The terminology might be a little different, but the meaning is the same,\nso just remember this relationship, write locks are exclusive locks, while read locks are shared locks.\n\nWhen a thread tries to acquire a read lock (i.e. a shared lock), this thread get's the shared lock\nif, and only if another thread does not currently holds a write lock (i.e. an exclusive lock), and also,\nif there are no other threads that are already in the queue,\nwaiting for their turn to acquire a write lock. In other words, the thread in the queue have attempted\nto get a write lock earlier, but this thread was blocked\nbecause there was another thread running that already had a write lock. As consequence, this thread is on the queue to get a write lock,\nand it's currently waiting for the other thread with a write lock to finish it's execution.\n\nWhen a thread tries to acquire a read lock, but it fails in acquiring this read lock, either because there is\na thread with a write lock already running, or, because there is a thread in the queue to get a write lock,\nthe execution of this thread is instantly blocked, i.e. paused. This thread will indefinitely attempt to get the\nread lock, and it's execution will be unblocked (or unpaused) only after this thread successfully acquires the read lock.\n\nIf you think deeply about this dynamic between read locks versus write locks, you might notice that a read lock is basically a safety mechanism.\nMore specifically, it is a way for us to\nallow a particular thread to run together with the other threads, only when it's safe to. In other words, if there is currently\na thread with a write lock running, then, it is very likely not safe for the thread that is trying to acquire the read lock to run now.\nAs consequence, the read lock protects this thread from running into dangerous waters, and patienly waits for the\n\"write lock\" thread to finishs it's tasks before it continues.\n\nOn the other hand, if there are only \"read lock\" (i.e. 
\"shared lock\") threads currently running\n(i.e. not a single \"write lock\" thread currently exists), then,\nis perfectly safe for this thread that is acquiring the read lock to run in parallel with the other\nthreads. As a result, the read lock just\nallows for this thread to run together with the other threads.\n\nThus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections\nof our multithreaded code is safe for us to have parallelism, and which sections are not safe to have parallelism.\n\n\n\n\n\n### Using read/write locks in Zig\n\nThe Zig Standard Library supports read/write locks through the `std.Thread.RwLock` module.\nIf you want to a particular thread to acquire a shared lock (i.e. a read lock), you should\ncall the `lockShared()` method from the `RwLock` object. But, if you want for this thread\nto acquire an exclusive lock (i.e. a write lock) instead, then, you should call the\n`lock()` method from the `RwLock` object.\n\nAs with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object,\nonce we are at the end of our \"critical section\". If you have acquired an exclusive lock, then, you unlock\nthis exclusive lock by calling the `unlock()` method from the read/write lock object. 
In contrast,\nif you have acquired a shared lock instead, then, call `unlockShared()` to unlock this shared lock.\n\nAs a simple example, the code below creates three separate threads responsible for reading the\ncurrent value in a `counter` object, and it also creates another thread, responsible for writing\nnew data into the `counter` object (incrementing it, more specifically).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar counter: u32 = 0;\nfn reader(lock: *RwLock) !void {\n while (true) {\n lock.lockShared();\n const v: u32 = counter;\n try stdout.print(\"{d}\", .{v});\n lock.unlockShared();\n std.time.sleep(2 * std.time.ns_per_s);\n }\n}\nfn writer(lock: *RwLock) void {\n while (true) {\n lock.lock();\n counter += 1;\n lock.unlock();\n std.time.sleep(2 * std.time.ns_per_s);\n }\n}\n\npub fn main() !void {\n var lock: RwLock = .{};\n const thr1 = try Thread.spawn(.{}, reader, .{&lock});\n const thr2 = try Thread.spawn(.{}, reader, .{&lock});\n const thr3 = try Thread.spawn(.{}, reader, .{&lock});\n const wthread = try Thread.spawn(.{}, writer, .{&lock});\n\n thr1.join();\n thr2.join();\n thr3.join();\n wthread.join();\n}\n```\n:::\n\n\n\n\n\n## Yielding a thread\n\nThe `Thread` struct supports yielding through the `yield()` method.\nYielding a thread means that the execution of the thread is temporarily stopped,\nand the thread comes back to the end of the queue of priority of the scheduler from\nyour operating system.\n\nThat is, when you yield a thread, you are essentially saying the following to your OS:\n\"Hey! 
Could you please stop executing this thread for now, and comeback to continue it later?\".\nYou could also interpret this yield operation as: \"Could you please deprioritize this thread,\nto focus on doing other things instead?\".\nSo this yield operation is also a way for you\nto stop a particular thread, so that you can work and prioritize other threads instead.\n\nIs important to say that, yielding a thread is a \"not so common\" thread operation these days.\nIn other words, not many programmers use yielding in production, simply because is hard to use\nthis operation and make it work properly, and also, there\nare better alternatives. Most programmers prefer to use `join()` instead.\nIn fact, most of the times, when you see somebody using yield in some code example, they are mostly using it to help them\ndebug race conditions in their applications. That is, yield is mostly used as a debug tool nowadays.\n\nAnyway, if you want to yield a thread, just call the `yield()` method from it, like this:\n\n```zig\nthread.yield();\n```\n\n\n\n\n\n\n## Common problems in threads\n\n\n\n### Deadlocks\n\nA deadlock occurs when two or more threads are blocked forever,\nwaiting for each other to release a resource. This usually happens when multiple locks are involved,\nand the order of acquiring them is not well managed.\n\nThe code example below demonstrates a deadlock situation. We have two different threads that execute\ntwo different functions (`work1()` and `work2()`) in this example. And we also have two separate\nmutexes. If you compile and run this code example, you will notice that the program just runs indefinitely,\nwithout ending.\n\nWhen we look into the first thread, which executes the `work1()` function, we can\nnotice that this function acquires the `mut1` lock first. 
Because this is the first operation\nthat is executed inside this thread, which is the first thread created in the program.\nAfter that, the function sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut2` lock.\n\nOn the other hand, when we look into the second thread, which executes the `work2()` function,\nwe can see that this function acquires the `mut2` lock first. Because when this thread get's created and it tries\nto acquire this `mut2` lock, the first thread is still sleeping on that \"sleep 1 second\" line.\nAfter acquiring `mut2`, the `work2()` function also sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut1` lock.\n\nThis creates a deadlock situation, because after the \"sleep for 1 second\" line in both threads,\nthe thread 1 is trying to acquire the `mut2` lock, but this lock is currently being used by thread 2.\nHowever, at this moment, the thread 2 is also trying to acquire the `mut1` lock, which is currently\nbeing used by thread 1. Therefore, both threads end up waiting for ever. 
Waiting for their peer to\nfree the lock that they want to acquire.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nvar mut1: Mutex = .{}; var mut2: Mutex = .{};\nfn work1() !void {\n mut1.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut2.lock();\n _ = try stdout.write(\"Doing some work 1\\n\");\n mut2.unlock(); mut1.unlock();\n}\n\nfn work2() !void {\n mut2.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut1.lock();\n _ = try stdout.write(\"Doing some work 1\\n\");\n mut1.unlock(); mut2.unlock();\n}\n\npub fn main() !void {\n const thr1 = try Thread.spawn(.{}, work1, .{});\n const thr2 = try Thread.spawn(.{}, work2, .{});\n thr1.join();\n thr2.join();\n}\n```\n:::\n\n\n\n\n\n### Not calling `join()` or `detach()` {#sec-not-call-join-detach}\n\nWhen you do not call either `join()` or `detach()` over a thread, then, this thread becomes a \"zombie thread\",\nbecause it does not have a clear \"return point\".\nYou could also interpret this as: \"nobody is properly resposible for managing the thread\".\nWhen we don't establish if a thread is either *joinable* or *detached*,\nnobody becomes responsible for dealing with the return value of this thread, and also,\nnobody becomes responsible for clearing (or freeing) the resources associated with this thread.\n\nYou don't want to be in this situation, so remember to always use `join()` or `detach()`\non the threads that you create. When you don't use these methods, the execution of the thread\nbecomes completely independent from the execution of the main process in your program.\nThis means that the main process of your program might end before the thread finish it's job,\nor vice-versa. The idea is that we have no idea of who is going to finish first. It\nbecomes a race condition problem.\nIn such case, we loose control over this thread, and it's resources are never freed\n(i.e. 
you have leaked resources in the system).\n\n\n### Cancelling or killing a particular thread\n\nWhen we think about the `pthreads` C library, there is a possible way to asynchronously kill or cancel\na thread, which is by sending a `SIGTERM` signal to the thread through the `pthread_kill()` function.\nBut canceling a thread like this is bad. Is dangerously bad. As consequence, the Zig implementation\nof threads does not have a similar function, or, a similar way to asynchronously cancel or kill\na thread.\n\nTherefore, if you want to cancel a thread in the middle of it's execution in Zig,\nthen, one good strategy that you can take is to use control flow in your favor in conjunction with `join()`.\nMore specifically, you can design your thread around a while loop, that is constantly\nchecking if the thread should continue running.\nIf is time to cancel the thread, we could make the while loop break, and join the thread with the main thread\nby calling `join()`.\n\nThe code example below demonstrates to some extent this strategy.\nHere, we are using control flow to break the while loop, and exit the thread earlier than\nwhat we have initially planned to. This example also demonstrates how can we use\natomic objects in Zig with the `Value()` generic function that we have mentioned at @sec-atomic-operation.\n\n\n```zig\nvar running = std.atomic.Value(bool).init(true);\nvar counter: u64 = 0;\nfn do_more_work() void {\n std.time.sleep(2 * std.time.ns_per_s);\n}\nfn work() !void {\n while (running.load(.monotonic)) {\n for (0..10000) |_| { counter += 1; }\n if (counter < 15000) {\n _ = try stdout.write(\"Time to cancel the thread.\\n\");\n running.store(false, .monotonic);\n } else {\n _ = try stdout.write(\"Time to do more work.\\n\");\n do_more_work();\n running.store(false, .monotonic);\n }\n }\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, work, .{});\n thread.join();\n}\n```\n\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n# Introducing threads and parallelism in Zig {#sec-thread}\n\nThreads are available in Zig through the `Thread` struct\nfrom the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX Thread pattern,\nmeaning that it works similarly to a thread from the `pthread` C library, which is usually available on Unix-like systems\nthrough their C standard library (e.g. `glibc` on Linux). If you are not familiar with threads, let me first give you some theory behind them.\n\n\n## What are threads? {#sec-what-thread}\n\nA thread is basically a separate context of execution.\nWe use threads to introduce parallelism into our program,\nwhich, in most cases, makes the program run faster, because we have multiple tasks\nbeing performed at the same time, in parallel to each other.\n\nPrograms are single-threaded by default, which means that each program\nusually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no\nparallelism. And when we don't have parallelism, the commands are executed sequentially, that is,\nonly one command is executed at a time, one after another. By creating multiple threads inside our program,\nwe start to execute multiple commands at the same time.\n\nPrograms that create multiple threads are very common in the wild, because many different types\nof applications are well suited for parallelism. Good examples are video and photo-editing applications\n(e.g. Adobe Photoshop or DaVinci Resolve),\ngames (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc.).\nFor example, in web browsers, threads are normally used to implement tabs.\nIn other words, the tabs in a web browser usually run as separate threads in the main process of\nthe web browser. 
That is, each new tab that you open in your web browser\nusually runs on a separate thread of execution.\n\nBy running each tab in a separate thread, we allow all open tabs in the browser to run at the same time,\nand independently from each other. For example, you might have YouTube or Spotify currently open in\na tab, and you are listening to a podcast in that tab while, at the same time,\nyou are working in another tab, writing an essay on Google Docs. Even if you are not looking\nat the YouTube tab, you can still hear the podcast, because this YouTube tab is running in parallel\nwith the other tab where Google Docs is running.\n\nWithout threads, the other alternative would be to run each tab as a completely separate running\nprocess in your computer. But that would be a bad choice, because just a few tabs would already consume\ntoo much power and resources from your computer. In other words, it is very expensive to create a completely new process,\ncompared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead\nwhile using the browser would be significant. Threads are faster to create, and they also consume\nfar fewer resources from the computer, especially because they share some resources\nwith the main process.\n\nTherefore, it is the use of threads in modern web browsers that allows you to hear the podcast\nwhile you are writing something on Google Docs.\nWithout threads, a web browser would probably be limited to just one single tab.\n\nThreads are also well-suited for anything that involves serving requests or orders,\nbecause serving a request takes time, and usually involves a lot of \"waiting time\".\nIn other words, we spend a lot of time idle, waiting for something to complete.\nFor example, consider a restaurant. Serving orders in a restaurant usually involves\nthe following steps:\n\n1. receive order from the client.\n1. 
pass the order to the kitchen, and wait for the food to be cooked.\n1. start cooking the food in the kitchen.\n1. when the food is fully cooked, deliver this food to the client.\n\nIf you think about the steps above, you will notice that one big moment of waiting\nis present in this whole process, which is while the food is being prepared and cooked\ninside the kitchen. While the food is being prepped, both the waiter and the client\nare waiting for the food to be ready and delivered.\n\nIf we write a single-threaded program to represent this restaurant,\nthis program would be very inefficient, because it would sit idle, waiting for a considerable amount\nof time on the \"check if food is ready\" step.\nConsider the code snippet below, which could potentially represent such a\nprogram (`Order` and `Waiter` are just hypothetical types here).\n\nThe problem with this program is the while loop. This program will spend a lot of time\nwaiting on the while loop, doing nothing more than just checking if the food is ready.\nThis is a waste of time. Instead of waiting for something to happen, the waiter\ncould just send the order to the kitchen and move on, receiving\nmore orders from other clients and sending them to the kitchen, instead\nof doing nothing while waiting for the food to be ready.\n\n```zig\nconst order = Order.init(\"Pizza Margherita\", 1);\nconst waiter = Waiter.init();\nwaiter.receive_order(order);\nwaiter.ask_kitchen_to_cook();\n// Busy-wait: keep polling the kitchen until the food is ready.\nvar food_not_ready = true;\nwhile (food_not_ready) {\n food_not_ready = !waiter.is_food_ready();\n}\nconst food = waiter.get_food_from_kitchen();\nwaiter.send_food_to_client(food);\n```\n\nThis is why threads would be a great fit for this program. We could use threads\nto free the waiters from their \"waiting duties\", so they can go on with their\nother tasks and receive more orders. 
Take a look at the next example, where I have rewritten the above\nprogram into a different program that uses threads to cook and deliver the orders.\n\nYou can see in this program that when a waiter receives a new order\nfrom a client, this waiter executes the `send_order()` function.\nThe only thing that this function does is create a new thread\nand detach it. Since creating a thread is a very fast operation,\nthis `send_order()` function returns almost immediately,\nso the waiter spends almost no time worrying about the order, and just\nmoves on and tries to get the next order from the clients.\n\nInside the newly created thread, the order gets cooked by a chef, and when the\nfood is ready, it is delivered to the client's table.\n\n\n```zig\n// Pseudocode: `Order`, `Chef` and `Waiter` are illustrative types.\nfn cook_and_deliver_order(order: Order) void {\n const chef = Chef.init();\n const food = chef.cook(order);\n chef.deliver_food(food);\n}\nfn send_order(order: Order) !void {\n // The order is passed by value, so the detached thread owns its own\n // copy, instead of a pointer to a parameter that dies when we return.\n const cook_thread = try Thread.spawn(\n .{}, cook_and_deliver_order, .{order}\n );\n cook_thread.detach();\n}\n\nconst waiter = Waiter.init();\nwhile (true) {\n // `get_new_order()` returns `?Order` (null when there is no new order).\n if (waiter.get_new_order()) |order| {\n try send_order(order);\n }\n}\n```\n\n\n\n## Threads versus processes\n\nWhen we run a program, this program is executed as a *process* in the operating system.\nThis is a one-to-one relationship: each program or application that you execute\nis a separate process in the operating system. But each program, or each process,\ncan create and contain multiple threads inside of it. 
Therefore,\nprocesses and threads have a one-to-many relationship.\n\nThis also means that every thread that we create is always associated with a particular process in our computer.\nIn other words, a thread is always a subset (or a child) of an existing process.\nAll threads share some of the resources associated with the process from which they were created.\nAnd because threads share resources with the process, they make communication\nbetween tasks much easier.\n\nFor example, suppose that you were developing a big and complex application\nthat would be much simpler if you could split it in two, and make these two separate pieces talk\nto each other. Some programmers choose to write these two pieces of the codebase as two\ncompletely separate programs, and then they use IPC (*inter-process communication*) to make these\ntwo separate programs/processes talk to each other and work together.\n\nHowever, some programmers find IPC hard to deal with, and, as a consequence,\nthey prefer to write one piece of the codebase as the \"main part of the program\",\nor, as the part of the code that runs as the process in the operating system,\nwhile the other piece of the codebase is written as a task to be executed in\na new thread. A process and a thread can easily communicate with each other\nthrough both control flow and data, because they share and have\naccess to the same standard file descriptors (`stdout`, `stdin`, `stderr`) and also to the same memory space\non the heap and global data section.\n\n\nIn more detail, each thread that you create has a separate stack reserved just for that thread,\nwhich essentially means that each local object that you create inside this thread is local to that\nthread, i.e. the other threads cannot directly access this local object. Unless this object that you have created\nis an object that lives on the heap. 
In other words, if the memory associated with this object\nis on the heap, then the other threads can potentially access this object.\n\nTherefore, objects that are stored on the stack are local to the thread where they were created,\nbut objects that are stored on the heap are potentially accessible to other threads. All of this means that\neach thread has its own separate stack, but, at the same time, all threads share\nthe same heap, the same standard file descriptors (which means that they share the same `stdout`, `stdin`, `stderr`),\nand the same global data section in the program.\n\n\n\n## Creating a thread\n\nWe create new threads in Zig by first importing the `Thread` struct into\nour current Zig module, and then calling the `spawn()` method of this struct,\nwhich creates (or \"spawns\") a new thread of execution from our current process.\nThis method has three arguments, which are, respectively:\n\n1. a `SpawnConfig` object, which contains configurations for the spawn process.\n1. the name of the function that is going to be executed (or, that is going to be \"called\") inside this new thread.\n1. a list of arguments (or inputs) to be passed to the function provided in the second argument.\n\nWith these three arguments, you can control how the thread gets created, and also specify which\nwork (or \"tasks\") will be performed inside this new thread. A thread is just a separate context of execution,\nand we usually create new threads in our code because we want to perform some work inside this\nnew context of execution. We specify which exact work, or which exact steps, are going to be\nperformed inside this context by providing the name of a function as the second argument of the `spawn()` method.\n\nThus, when this new thread gets created, the function that you provided as input to the `spawn()`\nmethod gets called, or executed, inside this new thread. 
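\n\nThe three arguments of `spawn()` can be seen together in the minimal sketch below. It assumes Zig 0.13, and the `greet()` function is a hypothetical task invented for illustration. A `SpawnConfig` with a smaller `stack_size` goes in the first argument, the function goes in the second, and a tuple with the function's arguments goes in the third.\n\n```zig\nconst std = @import(\"std\");\nconst Thread = std.Thread;\n\n// Hypothetical task: prints a greeting `times` times (to stderr).\nfn greet(name: []const u8, times: u8) void {\n    for (0..times) |_| {\n        std.debug.print(\"Hello, {s}!\\n\", .{name});\n    }\n}\n\npub fn main() !void {\n    // 1st argument: a `SpawnConfig`, here asking for a 1 MiB stack\n    // instead of the default 16 MiB.\n    const config = Thread.SpawnConfig{ .stack_size = 1024 * 1024 };\n    // 2nd argument: the function to run in the new thread.\n    // 3rd argument: a tuple with its arguments, in parameter order.\n    const thread = try Thread.spawn(config, greet, .{ \"Zig\", 3 });\n    thread.join();\n}\n```\n\nPassing `.{}` as the first argument instead of `config` would simply keep the default stack size.\n\n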
You can control the\narguments, or the inputs, that are passed to this function when it gets called, by providing\na list of arguments (or a list of inputs) as the third argument of the `spawn()` method.\nThese arguments are passed to the function in the same order that they are\nprovided to `spawn()`.\n\nFurthermore, `SpawnConfig` is a struct with only two possible fields, or two possible members, that you\ncan set to tailor the spawn behaviour. These fields are:\n\n- `stack_size`: you can provide a `usize` value to specify the size (in bytes) of the thread's stack. By default, this value is $16 \\times 1024 \\times 1024$ bytes (16 MiB).\n- `allocator`: you can provide an allocator object to be used when allocating memory for the thread.\n\nTo use one of these two fields (or \"configs\"), you just have to create a new object of type `SpawnConfig`,\nand provide this object as input to the `spawn()` method. But if you are not interested in using\neither of these configs, and you are OK with the defaults, you can just provide an empty anonymous\nstruct literal (`.{}`) in place of this `SpawnConfig` argument.\n\nAs our first, and very simple, example, consider the code exposed below.\nInside the same program, you can create multiple threads of execution if you want to.\nBut in this first example, we are creating just a single thread of execution, because\nwe call `spawn()` only once.\n\nAlso, notice in this example that we are executing the function `do_some_work()`\ninside the new thread. 
Since this function receives no inputs, because it has\nno arguments, we have passed an empty list, or, more precisely, an empty anonymous struct (`.{}`),\nas the third argument of `spawn()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nfn do_some_work() !void {\n _ = try stdout.write(\"Starting the work.\\n\");\n std.time.sleep(100 * std.time.ns_per_ms);\n _ = try stdout.write(\"Finishing the work.\\n\");\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nStarting the work.Finishing the work.\n```\n\n\n:::\n:::\n\n\n\n\nNotice the use of `try` when calling the `spawn()` method. This means\nthat this method can return an error in some circumstances. One circumstance\nin particular is when you attempt to create a new thread when you have already\ncreated too many (i.e. you have exceeded the quota of concurrent threads in your system).\n\nBut if the new thread is successfully created, the `spawn()` method returns a handle\nobject (which is just an object of type `Thread`) to this new thread. You can use\nthis handle object to control the thread, for example, to join or detach it.\n\nThe instant that you create the new thread, the function that you provided as input to `spawn()`\ngets invoked (i.e. 
gets called) to start the execution on this new thread.\nIn other words, every time you call `spawn()`, not only does a new thread get created,\nbut the \"start work button\" of this thread also gets automatically pressed.\nSo the work being performed in this thread starts at the moment that the thread is created.\nThis is similar to how `pthread_create()` from the `pthreads` library in C works,\nwhich also starts the execution at the moment that the thread gets created.\n\n\n## Returning from a thread\n\nWe have learned in the previous section that the execution of the thread starts at the moment\nthat the thread gets created. Now, we will learn how to \"join\" or \"detach\" a thread in Zig.\n\"Join\" and \"detach\" are operations that control how the thread returns to\nthe main thread, or to the main process in our program.\n\nWe perform these operations by using the methods `join()` and `detach()` from the thread handle object.\nEvery thread that you create can be marked as either *joinable* or *detached* [@linux_pthread_create].\nYou can turn a thread into a *detached* thread by calling the `detach()` method\nfrom the thread handle object. But if you call the `join()` method instead, then this thread\nbecomes a *joinable* thread.\n\nA thread cannot be both *joinable* and *detached*, which in general means\nthat you cannot call both `join()` and `detach()` on the same thread.\nBut a thread must be one of the two, meaning that you should always call\neither `join()` or `detach()` on a thread. If you don't call\none of these two methods on your thread, you introduce undefined behaviour into your program,\nwhich is described at @sec-not-call-join-detach.\n\nNow, let's describe what each of these two methods does to your thread.\n\n\n### Joining a thread\n\nWhen you join a thread, you are essentially saying: \"Hey! Could you please wait for the thread to finish\nbefore you continue with your execution?\". 
For example, if we come back to our first and simplest example\nof a thread in Zig, in that example we created a single thread inside the `main()` function of our program,\nand just called `join()` on this thread at the end. This section of the code example is reproduced below.\n\nBecause we are joining this new thread inside the `main()` scope, the\nexecution of the `main()` function is temporarily stopped to wait for the execution of the thread\nto finish. That is, the execution of `main()` stops temporarily at the line where `join()` gets called,\nand it will continue only after the thread has finished its tasks.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\npub fn main() !void {\n const thread = try Thread.spawn(.{}, do_some_work, .{});\n thread.join();\n}\n```\n:::\n\n\n\n\nBecause we have joined this new thread inside `main()`, by calling `join()`, we have a\nguarantee that this new thread will finish before the end of the execution of `main()`,\nsince `main()` is guaranteed to wait for the thread to finish its tasks.\nYou could also interpret this as: the execution of main will hang at\nthe line where `join()` is called, and the lines of code that come after\nthis `join()` call will be executed only after the execution of main\nis \"unlocked\", when the thread finishes its tasks.\n\nIn the example above, there are no more expressions after the `join()` call. We just have the end\nof the `main()` scope, and, therefore, after the thread finishes its tasks, the execution\nof our program just ends, since there is nothing more to do. But what if we had more stuff to do\nafter the join call?\n\nTo demonstrate this other possibility, consider the next example exposed\nbelow. Here, we create a `print_id()` function that just receives an id\nas input, and prints it to `stdout`. In this example, we are creating two\nnew threads, one after another. 
Then, we join the first thread,\nwe wait for two whole seconds, and, at last, we join the second thread.\n\nThe idea behind this example is that the last `join()` call is executed\nonly after the first thread finishes its task (i.e. the first `join()` call),\nand also after the two seconds of delay. If you compile and run this\nexample, you will notice that most messages are quickly printed to `stdout`,\ni.e. they appear almost instantly on your screen.\nHowever, the last message (\"Joining thread 2\") takes around 2 seconds to appear\non the screen.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const id2: u8 = 2;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n const thread2 = try Thread.spawn(.{}, print_id, .{&id2});\n\n _ = try stdout.write(\"Joining thread 1\\n\");\n thread1.join();\n std.time.sleep(2 * std.time.ns_per_s);\n _ = try stdout.write(\"Joining thread 2\\n\");\n thread2.join();\n}\n```\n\n```\nThread ID: Joining thread 1\n1\nThread ID: 2\nJoining thread 2\n```\n\nThis demonstrates that both threads finish their work (i.e. printing the IDs)\nvery fast, before the two seconds of delay end. Because of that, the last `join()` call\nreturns pretty much instantly, since, when this last `join()` call happens, the second\nthread has already finished its task.\n\nNow, if you compile and run this example, you will also notice that, in some cases,\nthe messages get intertwined with each other. In other words, you might see\nthe message \"Joining thread 1\" inserted in the middle of the message \"Thread ID: 1\",\nor vice versa. This happens because:\n\n- the threads are executing basically at the same time as the main process of the program (i.e. 
the `main()` function).\n- the threads share the same `stdout` from the main process of the program, which means that the messages that the threads produce are sent to the exact same place as the messages produced by the main process.\n\nBoth of these points were described previously at @sec-what-thread.\nSo the messages might get intertwined because they are being produced and\nsent to the same `stdout` at roughly the same time.\nAnyway, when you call `join()` on a thread, the current process will wait\nfor the thread to finish before it continues, and, when the thread does finish its\ntask, the resources associated with this thread are automatically freed, and\nthe current process continues with its execution.\n\n\n### Detaching a thread\n\nWhen you detach a thread, by calling the `detach()` method, the thread is marked as *detached*.\nWhen a *detached* thread terminates, its resources are automatically released back to the system without\nthe need for another thread to join with this terminated thread.\n\nIn other words, calling `detach()` on a thread is like when your children become adults,\ni.e. they become independent from you. A detached thread frees itself, and it does not need to report the results back\nto you when the thread finishes its task. Thus, you normally mark a thread as *detached*\nwhen you don't need to use the return value of the thread, or when you don't care about\nwhen exactly the thread finishes its job, i.e. the thread solves everything by itself.\n\nTake the code example below. We create a new thread, detach it, and then we just\nprint a final message before we end our program. 
We use the same `print_id()`\nfunction that we have used in the previous examples.\n\n\n```zig\nfn print_id(id: *const u8) !void {\n try stdout.print(\"Thread ID: {d}\\n\", .{id.*});\n}\n\npub fn main() !void {\n const id1: u8 = 1;\n const thread1 = try Thread.spawn(.{}, print_id, .{&id1});\n thread1.detach();\n _ = try stdout.write(\"Finish main\\n\");\n}\n```\n\n```\nFinish main\n```\n\nNow, if you look closely at the output of this code example, you will notice\nthat only the final message in main was printed to the console. The message\nthat was supposed to be printed by `print_id()` did not appear in the console.\nWhy? Because the main process of our program finished first,\nbefore the thread was able to say anything.\n\nAnd that is perfectly OK behaviour, because the thread was detached, so it was\nable to free itself without the need of the main process.\nIf you ask main to sleep (or \"wait\") a little longer (a few milliseconds, for example) before it ends, you will likely\nsee the message printed by `print_id()`, because you give the thread enough time to\nfinish before the main process ends.\n\n\n## Thread pools\n\nThread pools are a very popular programming pattern, which is used especially in servers and daemon processes. A thread pool is just a\nset of threads, or a \"pool\" of threads. 
Many programmers like to use this pattern because it makes it\neasier to manage and use multiple threads, instead of manually creating the threads when you need them.\n\nAlso, using thread pools might increase the performance of your program,\nespecially if your program is constantly creating threads to perform short-lived tasks.\nIn such an instance, a thread pool might increase performance because you do not have to be constantly\ncreating and destroying threads, so you avoid a lot of the overhead involved\nin this constant process of creating and destroying threads.\n\nThe main idea behind a thread pool is to have a set of threads already created and ready to perform\ntasks at all times. You create a set of threads at the moment that your program starts, and keep\nthese threads alive while your program runs. Each of these threads will be either performing a task or\nwaiting for a task to be assigned.\nEvery time a new task emerges in your program, this task is added to a \"queue of tasks\".\nThe moment that a thread becomes available and ready to perform a new task,\nthis thread takes the next task in the \"queue of tasks\", and then\nit simply performs the task.\n\nThe Zig Standard Library offers a thread pool implementation in the `std.Thread.Pool` struct.\nYou create a new instance of a `Pool` object by providing a `Pool.Options` object\nas input to the `init()` method of this struct. A `Pool.Options` object is a struct that contains\nconfigurations for the pool of threads. The most important settings in this struct are\nthe members `n_jobs` and `allocator`. 
As the name suggests, the member `allocator` should receive an allocator object,\nwhile the member `n_jobs` specifies the number of threads to be created and maintained in this pool.\n\nConsider the example exposed below, which demonstrates how we can create a new thread pool object.\nHere, we create a `Pool.Options` object that contains\na general purpose allocator object, and the `n_jobs` member is set to 4, which\nmeans that the thread pool will create and use 4 threads.\n\nAlso notice that the `pool` object is initially set to `undefined`. This allows us\nto declare the thread pool object without yet instantiating the\nunderlying memory of the object. You have to declare your thread pool object\nby using `undefined` like this because the `init()` method of `Pool` takes\na pointer to the object that it is going to initialize.\n\nSo, just\nremember to create your thread pool object by using `undefined`, and then\ncall the `init()` method on the object.\nYou should also not forget to call the `deinit()` method on the thread pool\nobject once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will\nhave a memory leak in your program.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst Pool = std.Thread.Pool;\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n // Runs last (defers run in reverse order), after the pool is gone.\n defer _ = gpa.deinit();\n const allocator = gpa.allocator();\n const opt = Pool.Options{\n .n_jobs = 4,\n .allocator = allocator,\n };\n // `init()` takes a pointer, so we declare the pool first.\n var pool: Pool = undefined;\n try pool.init(opt);\n defer pool.deinit();\n}\n```\n:::\n\n\n\n\nNow that we know how to create `Pool` objects, we have\nto understand how to assign tasks to be executed by the threads in this pool object.\nTo assign a task to be performed by a thread, we need to call the `spawn()` method\nfrom the thread pool object.\n\nThis `spawn()` method works almost identically to the `spawn()` method from the\n`Thread` object. 
The method has almost the same arguments as the previous one;\nthe only difference is that we don't have to provide a `SpawnConfig` object in this case.\nBut instead of creating a new thread, this `spawn()` method from\nthe thread pool object just registers a new task in the internal \"queue of tasks\" to be performed,\nand any available thread in the pool will get this task and simply perform it.\n\nIn the example below, we are using our previous `print_id()` function once again.\nBut you may notice that the `print_id()` function is a little different this time,\nbecause now we are using `catch` instead of `try` in the `print()` call.\nCurrently, the `Pool` struct only supports functions that don't return errors\nas tasks. Thus, when assigning tasks to threads in a thread pool, it is essential to use functions\nthat don't return errors. That is why we are using `catch` here, so that the\n`print_id()` function doesn't return an error.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nfn print_id(id: *const u8) void {\n // Discard any write error, so the task returns plain `void`.\n stdout.print(\"Thread ID: {d}\\n\", .{id.*})\n catch {};\n}\nconst id1: u8 = 1;\nconst id2: u8 = 2;\ntry pool.spawn(print_id, .{&id1});\ntry pool.spawn(print_id, .{&id2});\n```\n:::\n\n\n\n\nThis limitation should probably not exist, and, in fact, the\nZig team is already aware of it, and it is being tracked in an [open issue](https://github.com/ziglang/zig/issues/18810)[^issue].\nSo, if you do need to provide a function that might return an error as the task\nto be performed by the threads in the thread pool, then you are limited to either:\n\n- implementing your own thread pool that does not have this limitation.\n- waiting for the Zig team to actually fix this issue.\n\n[^issue]: \n\n\n\n\n## Mutexes\n\nMutexes are a classic component of every thread library. In essence, a mutex (short for *mutual exclusion*) is a flag, and this flag\nacts like a type of \"lock\", or as a gatekeeper to a particular section of your code. 
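\n\nTo give a first idea of what this lock looks like in code, below is a minimal sketch that uses `std.Thread.Mutex` from the Zig Standard Library. It assumes Zig 0.13, where a mutex can be created with `.{}` and offers `lock()` and `unlock()`. The \"bank account\" scenario and the `deposit()` function are hypothetical, made up for illustration. Four threads deposit into the same shared `balance`, and the mutex makes sure that only one of them at a time executes the line that updates it.\n\n```zig\nconst std = @import(\"std\");\nconst Thread = std.Thread;\n\n// Shared state: every thread deposits into this global balance.\nvar balance: u64 = 0;\n// The mutex that guards every access to `balance`.\nvar balance_mutex: Thread.Mutex = .{};\n\n// Hypothetical task: deposits `amount` into the balance `times` times.\nfn deposit(amount: u64, times: usize) void {\n    for (0..times) |_| {\n        balance_mutex.lock(); // blocks while another thread holds the lock\n        balance += amount; // only the lock holder runs this line\n        balance_mutex.unlock(); // lets the next waiting thread in\n    }\n}\n\npub fn main() !void {\n    var threads: [4]Thread = undefined;\n    for (&threads) |*t| {\n        t.* = try Thread.spawn(.{}, deposit, .{ 10, 1000 });\n    }\n    for (threads) |t| t.join();\n    // 4 threads x 1000 deposits x 10 = 40000, on every run.\n    std.debug.print(\"Final balance: {d}\\n\", .{balance});\n}\n```\n\nWithout the `lock()` and `unlock()` calls, this program would be exposed to the same kind of lost updates that are discussed later in this chapter for data races.\n\n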
Mutexes are related to thread synchronization;\nmore specifically, they prevent you from having some classic race conditions in your program,\nand, therefore, major bugs and undefined behaviour that are usually difficult to track and understand.\n\nThe main idea behind a mutex is to help us control the execution of a particular section of the code, and to\nprevent two or more threads from executing this particular section of the code at the same time.\nMany programmers like to compare a mutex to a bathroom door (which usually has a lock).\nWhen a thread locks the mutex object, it is as if the bathroom door was locked,\nand, therefore, the other people (in this case, the other threads) that want to use the same bathroom at the same time\nhave to be patient, and simply wait for the other person (or the other thread) to unlock the door and get out of the bathroom.\n\nSome other programmers also like to explain mutexes by using the analogy of \"each person will have their turn to speak\".\nThis is the analogy used in the [*Multithreading Code* video from the Computerphile project](https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile)[^computerphile].\nImagine\nthat you are in a conversation circle. There is a moderator in this circle, which is the person that decides who\nhas the right to speak at that particular moment. The moderator gives a green card (or some sort of an authorization card) to the person that\nis going to speak, and, as a result, everyone else must be silent and listen to the person who has the green card.\nWhen the person finishes talking, they give the green card back to the moderator, and the moderator decides\nwho is going to talk next, and delivers the green card to that person. And the cycle goes on like this.\n\n[^computerphile]: \n\n\nA mutex acts like the moderator in this conversation circle. 
The mutex authorizes one single thread to execute a specific section of the code,\nand it also blocks the other threads from executing this same section of the code. If these other threads want to execute this same\npiece of the code, they are forced to wait for the authorized thread to finish first.\nWhen the authorized thread finishes executing this code, the mutex authorizes the next thread to execute this code,\nand the other threads are still blocked. Therefore, a mutex is like a moderator that does an \"each thread will have their turn to execute this section of the code\"\ntype of control.\n\n\nMutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads\naccess the same shared object at the same time, and at least one of them is writing to it.\nSo, when you have an object that is shared with all threads, and you want to prevent two or more threads from\naccessing this same object at the same time, you can use a mutex to lock the part of the code that accesses this specific object.\nWhen a thread tries to run this code that is locked by a mutex, this thread stops its execution, and patiently waits for this section of the codebase to be\nunlocked to continue.\n\nIn other words, the execution of the thread is paused while the code section\nis locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that\nwas executing this code section.\nNotice that mutexes are normally used to lock areas of the codebase that access/modify data that is **shared** with all threads,\ni.e. 
objects that are stored either in the global data section or in the heap space of your program.\nSo mutexes are not normally used on areas of the codebase that access/modify objects that are local to the thread.\n\n\n\n### Critical section {#sec-critical-section}\n\nCritical section is a concept commonly associated with mutexes and thread synchronization.\nIn essence, a critical section is the section of the program where a thread accesses/modifies a shared resource\n(i.e. an object, a file descriptor, something that all threads have access to). In other words,\na critical section is the section of the program where race conditions might happen, and, therefore,\nwhere undefined behaviour can be introduced into the program.\n\nWhen we use mutexes in our program, the critical section defines the area of the codebase that we want to lock.\nSo we normally lock the mutex object at the beginning of the critical section,\nand then unlock it at the end of the critical section.\nThe two points exposed below come from the \"Critical Section\" article from GeeksforGeeks,\nand they summarise well the role that a critical section plays in the thread synchronization problem [@geeks_critical_section].\n\n\n1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time.\n2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. 
Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner.\n\n\n### Atomic operations {#sec-atomic-operation}\n\nYou will also see the term \"atomic operation\" a lot when reading about threads, race conditions and mutexes.\nIn summary, an operation is categorized as \"atomic\" when it is indivisible, i.e. no other thread can\nobserve it, or interfere with it, while it is only partially done. In other words, this operation is always done from beginning to end, without interruptions\nfrom another thread or operation in the middle of its execution.\n\nMost operations are not atomic. But why do atomic operations matter here? Because data races\n(which are a type of race condition) cannot happen on operations that are atomic.\nSo if a particular line in your code performs an atomic operation, then this line will never\nsuffer from a data race problem. Therefore, programmers sometimes use an atomic operation\nto protect themselves from data race problems in their code.\n\nWhen you have an operation that is compiled into just one single assembly instruction, this operation might be atomic,\nbut this is not guaranteed. For example, on `x86`, an instruction that adds a value to a memory location\nreads the old value and writes the new one as separate internal steps, so it is not atomic with respect to other CPU cores\nunless it carries the `lock` prefix. Modern CPUs also split many instructions into multiple micro-operations,\nso a single assembly instruction is not, by itself, a guarantee of atomicity.\n\nThe Zig Standard Library offers some atomic functionality in the `std.atomic` module.\nIn this module, you will find a public and generic function called `Value()`. 
With this function we create an \"atomic object\", which is\na value that offers some native atomic operations, most notably `load()` and `fetchAdd()`.\nIf you have experience with multithreading in C++, you have probably recognized this pattern. So yes, this generic\n\"atomic object\" in Zig is essentially the same idea as the class template `std::atomic` from the C++ Standard Library.\nIt is important to emphasize that only primitive data types (i.e. the types presented at @sec-primitive-data-types)\nare supported by these atomic operations.\n\n\n\n\n\n### Data races and race conditions\n\nTo understand why mutexes are used, we need to better understand the problem that they seek\nto solve, which can be summarized as data race problems. A data race problem is a type of race condition,\nwhich happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same\ntime that another thread is trying to write/save new data into this same memory location (i.e. the same shared object).\n\nWe can simply define a race condition as any type of bug in your program that is based\non a \"who gets there first\" problem. 
A data race problem is a type of race condition, because it occurs when two or more parties\nare trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation\ndepends completely on who gets to this memory location first.\nAs a consequence, a program that has a data race problem will likely produce a different result each time that we execute it.\n\nThus, race conditions produce undefined behaviour and unpredictability, because the program produces\na different answer each time that a different party gets to the target location before the others.\nAnd we have no easy way to either predict or control who is going to get to this target location first.\nIn other words, in each execution of your program,\nyou get a different answer, because a different person, function, or part of the code finishes\nits tasks (or reaches a location) before the others.\n\nAs an example, consider the code snippet exposed below. In this example, we create a global counter\nvariable, and we also create an `increment()` function, whose job is to just increment this global counter\nvariable in a for loop.\n\nSince the for loop iterates one hundred thousand times, and we create two separate threads\nin this code example, what number do you expect to see in the final message printed to `stdout`?\nThe answer should be two hundred thousand, right? Well, in theory, this program was supposed\nto print two hundred thousand at the end, but in practice, every time that I execute this program,\nI get a different answer.\n\nIn the example exposed below, you can see that, the time we executed the program, the end\nresult was 117254, instead of the expected 200000. The second time I executed this program,\nI got the number 108592 as a result. 
So the end result of this program varies, but it never gets\nto the expected 200000 that we want.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\n// Global counter variable\nvar counter: usize = 0;\n// Function to increment the counter\nfn increment() void {\n for (0..100000) |_| {\n counter += 1;\n }\n}\n\npub fn main() !void {\n const thr1 = try Thread.spawn(.{}, increment, .{});\n const thr2 = try Thread.spawn(.{}, increment, .{});\n thr1.join();\n thr2.join();\n try stdout.print(\"Counter value: {d}\\n\", .{counter});\n}\n```\n:::\n\n\n\n\n```\nCounter value: 117254\n```\n\n\nWhy is this happening? The answer is: because this program contains a data race.\nThis program would print the correct number 200000 if, and only if, the first thread finished\nits tasks before the second thread started to execute. But that is very unlikely to happen,\nbecause creating a thread is very fast, and therefore, both threads start to execute at roughly\nthe same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to `spawn()`,\nyou will increase the chances of the program producing the \"correct result\".\n\nSo the data race happens because both threads are reading and writing to the same\nmemory location at roughly the same time. In this example, each thread is essentially performing\nthree basic operations at each iteration of the for loop, which are:\n\n1. reading the current value of `counter`.\n1. incrementing this value by 1.\n1. writing the result back into `counter`.\n\nIdeally, a thread B should read the value of `counter` only after the other thread A has finished\nwriting the incremented value back into `counter`. 
Therefore, in the ideal scenario, which is demonstrated\nat @tbl-data-race-ideal, the threads should work in sync with each other. 
But the reality is that these\nthreads are out of sync, and because of that, they suffer from a data race, which is demonstrated\nat @tbl-data-race-not.\n\nNotice that, in the data race scenario (@tbl-data-race-not), the read performed by thread B (Thread 2) happens\nbefore the write operation of thread A (Thread 1), and that ultimately leads to wrong results at the end of the program.\nWhen thread B reads the value from the `counter` variable, thread A is still processing\nthe initial value from `counter`, and it has not written the new, incremented value into `counter` yet. So what\nhappens is that thread B ends up reading the same initial value (the \"old\" value) from `counter`, instead of\nreading the new, incremented version of this value that would be calculated by thread A.\n\n\n::: {#tbl-data-race-ideal}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| increment | | 1 |\n| write value | | 1 |\n| | read value | 1 |\n| | increment | 2 |\n| | write value | 2 |\n\n: An ideal scenario for two threads incrementing the same integer value\n:::\n\n::: {#tbl-data-race-not}\n\n| Thread 1 | Thread 2 | Integer value |\n|-------------|-------------|---------------|\n| read value | | 0 |\n| | read value | 0 |\n| increment | | 1 |\n| | increment | 1 |\n| write value | | 1 |\n| | write value | 1 |\n\n: A data race scenario when two threads are incrementing the same integer value\n:::\n\n\nIf you think about the scenarios shown in these tables, you will notice that they relate back to our discussion of atomic operations\nat @sec-atomic-operation. Remember, atomic operations are operations that the CPU executes\nfrom beginning to end, without interruptions from other threads or processes. 
So,\nthe scenario exposed at @tbl-data-race-ideal does not suffer from a data race, because\nthe operations performed by thread A are not interrupted in the middle by the operations\nfrom thread B.\n\nIf we also think about the discussion of critical sections from @sec-critical-section, we can identify\nthe section that represents the critical section of the program, which is the section that is vulnerable\nto data races. In this example, the critical section of the program is the line where we increment\nthe `counter` variable (`counter += 1`). So, ideally, we want to use a mutex, locking it right before this line, and then,\nunlocking it right after this line.\n\n\n\n\n### Using mutexes in Zig\n\nNow that we know the problem that mutexes seek to solve, we can learn how to use them in Zig.\nMutexes in Zig are available through the `std.Thread.Mutex` struct from the Zig Standard Library.\nIf we take the code example from the previous section, and improve it with a mutex to solve\nour data race problem, we get the code example exposed below.\n\nNotice that, this time, we had to alter the `increment()` function to receive a pointer to\nthe `Mutex` object as input. 
All that we need to do to make this program safe against\ndata races is to call the `lock()` method at the beginning of\nthe critical section, and then, call `unlock()` at the end of the critical section.\nNotice that the output of this program is now the correct number, 200000.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst Mutex = std.Thread.Mutex;\nvar counter: usize = 0;\nfn increment(mutex: *Mutex) void {\n for (0..100000) |_| {\n mutex.lock();\n counter += 1;\n mutex.unlock();\n }\n}\n\npub fn main() !void {\n var mutex: Mutex = .{};\n const thr1 = try Thread.spawn(.{}, increment, .{&mutex});\n const thr2 = try Thread.spawn(.{}, increment, .{&mutex});\n thr1.join();\n thr2.join();\n try stdout.print(\"Counter value: {d}\\n\", .{counter});\n}\n```\n\n\n::: {.cell-output .cell-output-stdout}\n\n```\nCounter value: 200000\n```\n\n\n:::\n:::\n\n\n\n\n\n\n\n## Read/Write locks\n\nMutexes are normally used when it is never safe to have two or more threads running the same\npiece of code at the same time. In contrast, read/write locks are normally used in situations\nwhere you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that\nare not.\n\nFor example, suppose that you have multiple threads that use the same shared file in the filesystem to store some configurations, or\nstatistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens.\nSo this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time.\n\nHowever, if two or more threads try to write data into this same file at the same time, then we cause race condition\nproblems. 
So this other part of the codebase is not safe to be executed in parallel.\nMore specifically, a thread might end up writing data in the middle of the data written by another thread.\nThis situation, where two or more threads write to the same location, might lead to data corruption.\nThis specific situation is usually called a *torn write*.\n\nThus, what we can extract from this is that there are certain types of operations that cause a race condition, but there\nare also other types of operations that do not.\nYou could also say that there are types of operations that are susceptible to race condition problems,\nand there are other types of operations that are not.\n\nA read/write lock is a type of lock that acknowledges the existence of this specific scenario, and you can\nuse this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not.\n\n\n\n### Exclusive lock vs shared lock\n\nTherefore, a read/write lock is a little different from a mutex, because a mutex is always an *exclusive lock*, meaning that only\none thread is allowed to execute the critical section at any given time. With an exclusive lock, the other threads are always \"excluded\",\ni.e. they are blocked from executing the critical section.\nBut with a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire.\n\nWe have two types of locks in a read/write lock, which are: an exclusive lock and a shared lock. An exclusive lock works exactly the same\nas a mutex, while a shared lock is a lock that does not block other threads holding shared locks from running.\nIn the `pthreads` C library, read/write locks are available through the `pthread_rwlock_t` C type. With\nthis type, you can acquire a \"write lock\", which corresponds to an exclusive lock, or you can acquire a \"read lock\",\nwhich corresponds to a shared lock. 
The terminology might be a little different, but the meaning is the same,\nso just remember this relationship: write locks are exclusive locks, while read locks are shared locks.\n\nWhen a thread tries to acquire a read lock (i.e. a shared lock), this thread gets the shared lock\nif, and only if, no other thread currently holds a write lock (i.e. an exclusive lock), and also,\nin many implementations, if there are no other threads already in the queue,\nwaiting for their turn to acquire a write lock. That is, a thread in the queue has attempted\nto get a write lock earlier, but this thread was blocked\nbecause there was another thread running that already had a write lock. As a consequence, this thread is in the queue to get a write lock,\nand it is currently waiting for the other thread with a write lock to finish its execution.\n\nWhen a thread tries to acquire a read lock, but it fails to acquire it, either because there is\na thread with a write lock already running, or because there is a thread in the queue to get a write lock,\nthe execution of this thread is immediately blocked, i.e. paused. This thread will wait indefinitely for the\nread lock, and its execution will be unblocked (or unpaused) only after this thread successfully acquires the read lock.\n\nIf you think deeply about this dynamic between read locks and write locks, you might notice that a read lock is basically a safety mechanism.\nMore specifically, it is a way for us to\nallow a particular thread to run together with the other threads, only when it is safe to do so. In other words, if there is currently\na thread with a write lock running, then it is very likely not safe for the thread that is trying to acquire the read lock to run now.\nAs a consequence, the read lock protects this thread from running into dangerous waters, by making it wait patiently for the\n\"write lock\" thread to finish its tasks before it continues.\n\nOn the other hand, if there are only \"read lock\" (i.e. 
\"shared lock\") threads currently running\n(i.e. not a single \"write lock\" thread currently exists), then\nit is perfectly safe for the thread that is acquiring the read lock to run in parallel with the other\nthreads. As a result, the read lock simply\nallows this thread to run together with the other threads.\n\nThus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections\nof our multithreaded code are safe to run in parallel, and which sections are not.\n\n\n\n\n\n### Using read/write locks in Zig\n\nThe Zig Standard Library supports read/write locks through the `std.Thread.RwLock` struct.\nIf you want a particular thread to acquire a shared lock (i.e. a read lock), you should\ncall the `lockShared()` method from the `RwLock` object. But, if you want this thread\nto acquire an exclusive lock (i.e. a write lock) instead, then you should call the\n`lock()` method from the `RwLock` object.\n\nAs with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object,\nonce we are at the end of our \"critical section\". If you have acquired an exclusive lock, then you unlock\nthis exclusive lock by calling the `unlock()` method from the read/write lock object. 
In contrast,\nif you have acquired a shared lock instead, then call `unlockShared()` to unlock this shared lock.\n\nAs a simple example, the code below creates three separate threads responsible for reading the\ncurrent value in a `counter` object, and it also creates another thread, responsible for writing\nnew data into the `counter` object (incrementing it, more specifically).\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst RwLock = std.Thread.RwLock;\nvar counter: u32 = 0;\n// Note: the loops below never end, so this program runs until you stop it.\nfn reader(lock: *RwLock) !void {\n while (true) {\n // Shared lock: several readers can hold it at the same time.\n lock.lockShared();\n const v: u32 = counter;\n lock.unlockShared();\n try stdout.print(\"{d}\\n\", .{v});\n std.time.sleep(2 * std.time.ns_per_s);\n }\n}\nfn writer(lock: *RwLock) void {\n while (true) {\n // Exclusive lock: blocks both readers and other writers.\n lock.lock();\n counter += 1;\n lock.unlock();\n std.time.sleep(2 * std.time.ns_per_s);\n }\n}\n\npub fn main() !void {\n var lock: RwLock = .{};\n const thr1 = try Thread.spawn(.{}, reader, .{&lock});\n const thr2 = try Thread.spawn(.{}, reader, .{&lock});\n const thr3 = try Thread.spawn(.{}, reader, .{&lock});\n const wthread = try Thread.spawn(.{}, writer, .{&lock});\n\n thr1.join();\n thr2.join();\n thr3.join();\n wthread.join();\n}\n```\n:::\n\n\n\n\n\n## Yielding a thread\n\nThe `Thread` struct supports yielding through the `yield()` function.\nYielding a thread means that the execution of the thread is temporarily stopped,\nand the thread is moved to the back of the queue of your operating system's scheduler.\n\nThat is, when you yield a thread, you are essentially saying the following to your OS:\n\"Hey! 
Could you please stop executing this thread for now, and come back to continue it later?\".\nYou could also interpret this yield operation as: \"Could you please deprioritize this thread,\nto focus on doing other things instead?\".\nSo this yield operation is also a way for a thread to step aside,\nso that the OS can work on and prioritize other threads instead.\n\nIt is important to say that yielding a thread is a \"not so common\" thread operation these days.\nIn other words, not many programmers use yielding in production, simply because it is hard to use\nthis operation and make it work properly, and also, there\nare better alternatives. Most programmers prefer to use `join()` instead.\nIn fact, most of the time, when you see somebody using yield in some code example, they are using it to help them\ndebug race conditions in their applications. That is, yield is mostly used as a debug tool nowadays.\n\nAnyway, if you want to yield a thread, call the `yield()` function from the `Thread` struct inside that thread.\nNotice that `yield()` always applies to the thread that calls it, and it returns an error union,\nso you need to use `try` (or handle the error in some other way), like this:\n\n```zig\ntry std.Thread.yield();\n```\n\n\n\n\n\n\n## Common problems in threads\n\n\n\n### Deadlocks\n\nA deadlock occurs when two or more threads are blocked forever,\nwaiting for each other to release a resource. This usually happens when multiple locks are involved,\nand the order of acquiring them is not well managed.\n\nThe code example below demonstrates a deadlock situation. We have two different threads that execute\ntwo different functions (`work1()` and `work2()`) in this example. And we also have two separate\nmutexes. If you compile and run this code example, you will notice that the program just runs indefinitely,\nwithout ending.\n\nWhen we look into the first thread, which executes the `work1()` function, we can\nsee that this function acquires the `mut1` lock first. 
This is the first operation\nthat is executed inside this thread, which is also the first thread created in the program.\nAfter that, the function sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut2` lock.\n\nOn the other hand, when we look into the second thread, which executes the `work2()` function,\nwe can see that this function acquires the `mut2` lock first. This works because, when this thread gets created and it tries\nto acquire the `mut2` lock, the first thread is still sleeping on that \"sleep 1 second\" line.\nAfter acquiring `mut2`, the `work2()` function also sleeps for 1 second, to\nsimulate some type of work, and then, the function tries to acquire the `mut1` lock.\n\nThis creates a deadlock situation, because after the \"sleep for 1 second\" line in both threads,\nthread 1 is trying to acquire the `mut2` lock, but this lock is currently held by thread 2.\nHowever, at this moment, thread 2 is also trying to acquire the `mut1` lock, which is currently\nheld by thread 1. Therefore, both threads end up waiting forever. 
Each thread is waiting for its peer to\nrelease the lock that it wants to acquire.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\nconst Mutex = std.Thread.Mutex;\nvar mut1: Mutex = .{}; var mut2: Mutex = .{};\nfn work1() !void {\n mut1.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut2.lock();\n _ = try stdout.write(\"Doing some work 1\\n\");\n mut2.unlock(); mut1.unlock();\n}\n\nfn work2() !void {\n mut2.lock();\n std.time.sleep(1 * std.time.ns_per_s);\n mut1.lock();\n _ = try stdout.write(\"Doing some work 2\\n\");\n mut1.unlock(); mut2.unlock();\n}\n\npub fn main() !void {\n const thr1 = try Thread.spawn(.{}, work1, .{});\n const thr2 = try Thread.spawn(.{}, work2, .{});\n thr1.join();\n thr2.join();\n}\n```\n:::\n\n\n\n\n\n### Not calling `join()` or `detach()` {#sec-not-call-join-detach}\n\nWhen you do not call either `join()` or `detach()` on a thread, this thread becomes a \"zombie thread\",\nbecause it does not have a clear \"return point\".\nYou could also interpret this as: \"nobody is properly responsible for managing the thread\".\nWhen we don't establish if a thread is either *joinable* or *detached*,\nnobody becomes responsible for dealing with the return value of this thread, and also,\nnobody becomes responsible for clearing (or freeing) the resources associated with this thread.\n\nYou don't want to be in this situation, so remember to always use `join()` or `detach()`\non the threads that you create. When you don't use these methods, the execution of the thread\nbecomes completely independent from the execution of the main process in your program.\nThis means that the main process of your program might end before the thread finishes its job,\nor vice-versa. 
The point is that we cannot know who is going to finish first, so it\nbecomes a race condition problem.\nIn such a case, we lose control over this thread, and its resources are never freed\n(i.e. 
you have leaked resources in the system).\n\n\n### Cancelling or killing a particular thread\n\nWhen we think about the `pthreads` C library, there are ways to cancel or kill\na thread from another thread, for example, by requesting its cancellation through the `pthread_cancel()` function, or by sending a signal to it through the `pthread_kill()` function.\nBut canceling a thread like this is bad. It is dangerously bad, because the thread might be stopped at any point of its execution (e.g. while it is holding a lock). As a consequence, the Zig implementation\nof threads does not have a similar function, or a similar way to cancel or kill\na thread from the outside.\n\nTherefore, if you want to cancel a thread in the middle of its execution in Zig,\nthen one good strategy that you can take is to use control flow in your favor in conjunction with `join()`.\nMore specifically, you can design your thread around a while loop that is constantly\nchecking if the thread should continue running.\nIf it is time to cancel the thread, we can make the while loop break, and join the thread with the main thread\nby calling `join()`.\n\nThe code example below demonstrates this strategy to some extent.\nHere, we are using control flow to break the while loop, and exit the thread earlier than\nwe initially planned to. 
This example also demonstrates how we can use\natomic objects in Zig with the `Value()` generic function that we have mentioned at @sec-atomic-operation.\n\n\n```zig\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst Thread = std.Thread;\n// Atomic flag that tells the thread whether it should keep running.\nvar running = std.atomic.Value(bool).init(true);\nvar counter: u64 = 0;\nfn do_more_work() void {\n std.time.sleep(2 * std.time.ns_per_s);\n}\nfn work() !void {\n // The flag is checked at every iteration; once it becomes\n // false, the loop ends and the thread finishes its execution.\n while (running.load(.monotonic)) {\n for (0..10000) |_| { counter += 1; }\n if (counter < 15000) {\n _ = try stdout.write(\"Time to cancel the thread.\\n\");\n running.store(false, .monotonic);\n } else {\n _ = try stdout.write(\"Time to do more work.\\n\");\n do_more_work();\n running.store(false, .monotonic);\n }\n }\n}\n\npub fn main() !void {\n const thread = try Thread.spawn(.{}, work, .{});\n thread.join();\n}\n```\n\n",
"supporting": [
"14-threads_files"
],
diff --git a/_freeze/Chapters/14-zig-c-interop/execute-results/html.json b/_freeze/Chapters/14-zig-c-interop/execute-results/html.json
index 2bce9b7..62ed00d 100644
--- a/_freeze/Chapters/14-zig-c-interop/execute-results/html.json
+++ b/_freeze/Chapters/14-zig-c-interop/execute-results/html.json
@@ -1,8 +1,8 @@
{
- "hash": "c1b0d4aacdc3fad33632fccfc4f71b07",
+ "hash": "d90e47da905e296465572217fed37b61",
"result": {
"engine": "knitr",
- "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Zig interoperability with C\n\nIn this chapter, we are going to discuss the interoperability of Zig with C.\nWe have discussed at @sec-building-c-code how to build C code using the `zig` compiler.\nBut we haven't discussed yet how to actually use C code in Zig. In other words,\nwe haven't discussed yet how to call and use C code from Zig.\n\nThese matters are discussed here, in this chapter.\nAlso, in our next small project in this book, we are going to use a C library in it.\nAs consequence, we will put in practice a lot of the knowledge discussed here on\nthis next project.\n\n\n## How to call C code from Zig\n\nInterop with C is not something new. Most high-level programming languages have FFI (foreign function interfaces),\nwhich can be used to call C code. For example, Python have Cython, R have `.Call()`, Javascript have `ccall()`, etc.\nBut Zig integrates with C in a deeper level, which affects not only the way that C code get's called, but also,\nhow this C code is compiled and incorporated into your Zig project.\n\nIn summary, Zig have great interoperability with C. If you want to call any C code from Zig,\nyou have to perform the following steps:\n\n- import a C header file into your Zig code.\n- link your Zig code with the C library.\n\n\n### Strategies to import C header files {#sec-strategy-c}\n\nSo using C code in Zig always involves performing the two steps cited above. 
However, when\nwe talk specifically about the first step listed above, there are currently two\ndifferent ways to perform this first step, which are:\n\n- translating the C header file into Zig code, through `zig translate-c`, and then, importing the translated Zig code.\n- importing the C header file directly into your Zig module through `@cImport()` built-in function.\n\nIf you are not familiar with `translate-c`, this is a subcommand inside the `zig` compiler that takes C files\nas input, and outputs the Zig representation of the C code present in these C files.\nIn other words, this subcommand works like a transpiler. It takes C code, and translates it into\nthe equivalent Zig code.\n\nI think it would be ok to interpret `translate-c` as a tool to generate Zig bindings\nto C code, similarly to the `rust-bindgen`[^bindgen] tool, which generates Rust FFI bindings to C code.\nBut that would not be a precise interpretation of `translate-c`. The idea behind this tool is\nto really translate the C code into Zig code.\n\n[^bindgen]: \n\nNow, on a surface level, `@cImport()` versus `translate-c` might seem like\ntwo completely different strategies. But in fact, they are effectively the exact same strategy.\nBecause, under the hood, the `@cImport()` built-in function is just a shortcut to `translate-c`.\nBoth tools use the same C to Zig translation functionality. So when you use `@cImport()`,\nyou are essentially asking the `zig` compiler to translate the C header file into Zig code, then,\nto import this Zig code into your current Zig module.\n\nAt the present moment, there is an accepted proposal at the Zig project, to move `@cImport()`\nto the Zig build system[^cimport-issue]. 
If this proposal is completed, then, the \"use `@cImport()`\"\nstrategy would be transformed into \"call a translate C function in your Zig build script\".\nSo, the step of translating the C code into Zig code would be moved to\nthe build script of your Zig project, and you would only have to import the translated Zig code into\nyour Zig module to start calling C code from Zig.\n\n[^cimport-issue]: \n\nIf you think about this proposal for a minute, you will understand that this is actually\na small change. I mean, the logic is the same, and the steps are still essentially the same.\nThe only difference is that one of the steps will be moved to the build script of your Zig project.\n\n\n\n### Linking Zig code with a C library {#sec-linking-c}\n\nRegardless of which of the two strategies mentioned in the previous section you choose,\nif you want to call C code from Zig, you will always have to link your Zig code\nwith the C library that contains the C code that you want to call.\n\nIn other words, everytime you use a C library in your Zig code, **you introduce a dependency in your build process**.\nThis should come as no surprise to anyone that have any experience with C and C++.\nBecause this is no different in C. Everytime you use a C library in your C code, you also\nhave to build and link your C code with this C library that you are using.\n\nWhen we use a C library in our Zig code, the `zig` compiler needs to access the definition of the C functions that\nare being called in your Zig code. The C header file of this library provides the\ndeclarations of these C functions, but not their definitions. So, in order to access these definitions,\nthe `zig` compiler needs to build your Zig code and link it with the C library in the build process.\n\nAs we discussed across the @sec-build-system, there are different strategies to link something with a library.\nThis might involve building the C library first, and then, linking it with the Zig code. 
Or,\nit could also involve just the linking step, if this C library is already built and\ninstalled in your system. Anyway, if you have doubts about this, comeback to @sec-build-system.\n\n\n\n## Importing C header files {#sec-import-c-header}\n\nAt @sec-strategy-c, we have described that, currently, there is two different paths that\nyou can take to import a C header file into your Zig modules, `translate-c` or `@cImport()`.\nThis section describes each strategy separately in more details.\n\n### Strategy 1: using `translate-c`\n\nWhen we choose this strategy, we first need to use the `translate-c` tool to translate\nthe C header files that we want to use into Zig code. For example, suppose we wanted to\nuse the `fopen()` C function from the `stdio.h` C header file. We can translate the\n`stdio.h` C header file through the bash command below:\n\n```bash\nzig translate-c /usr/include/stdio.h \\\n -lc -I/usr/include \\\n -D_NO_CRT_STDIO_INLINE=1 > c.zig \\\n```\n\nNotice that, in this bash command, we are passing the necessary compiler flags (`-D` to define macros,\n`-l` to link libraries, `-I` to include header file) to compile and use the `stdio.h` header file.\nAlso notice that we are saving the results of the translation inside a Zig module called `c.zig`.\n\nSo after running this command, all we have to do is to import this `c.zig` module, and start\ncalling the C functions that you want to call from it. The example below demonstrates that.\nImportant to remember what we've discussed at @sec-linking-c. 
In order to compile this\nexample you have to link this code with `libc`, by passing the flag `-lc` to the `zig` compiler.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @import(\"c.zig\");\npub fn main() !void {\n const x: f32 = 1772.94122;\n _ = c.printf(\"%.3f\\n\", x);\n}\n```\n:::\n\n\n\n\n```\n1772.941\n```\n\n\n### Strategy 2: using `@cImport()`\n\nTo import a C header file into our Zig code, we can use the built-in functions `@cInclude()` and `@cImport()`.\nInside the `@cImport()` function, we open a block (with a pair of curly braces). Inside this block\nwe can (if we need to) include multiple `@cDefine()` calls to define C macros when including this specific C header file.\nBut for the most part, you will probably need to use just a single call inside this block at `@cImport()`,\nwhich is a call to `@cInclude()`.\n\nThis `@cInclude()` function is equivalent to the `#include` statement in C.\nYou provide the name of the C header that you want to include as input to this `@cInclude()` function,\nthen, in conjunction with `@cImport()`, it will perform the necessary steps\nto include this C header file into your Zig code.\n\nYou should bind the result of `@cImport()` to a constant object, pretty much like you would do with\n`@import()`. You just assign the result to a constant object in your\nZig code, and, as consequence, all C functions, C structs, C macros, etc. that are defined inside the\nC header files will be available through this constant object.\n\nLook at the code example below, where we are importing the Standard I/O C Library (`stdio.h`),\nand calling the `printf()`[^printf] C function. 
Notice that we have also used in this example the C function `powf()`[^powf],\nwhich comes from the C Math Library (`math.h`).\nIn order to compile this example, you have to link this Zig code with both\nthe C Standard Library and the C Math Library, by passing the flags `-lc` and `-lm`\nto the `zig` compiler.\n\n[^printf]: \n[^powf]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n @cInclude(\"math.h\");\n});\n\npub fn main() !void {\n const x: f32 = 15.2;\n const y = c.powf(x, @as(f32, 2.6));\n _ = c.printf(\"%.3f\\n\", y);\n}\n```\n:::\n\n\n\n\n```\n1182.478\n```\n\n\n## About passing Zig values to C functions {#sec-zig-obj-to-c}\n\nZig objects have some intrinsic differences between their C equivalents.\nProbably the most noticeable one is the difference between C strings and Zig strings,\nwhich I described at @sec-zig-strings.\nZig strings are objects that contains both an array of arbitrary bytes and a length value.\nOn the other hand, a C string is usually just a pointer to a null-terminated array of arbitrary bytes.\n\nBecause of these intrinsic differences, in some specific cases, you cannot pass Zig objects directly\nas inputs to C functions before you convert them into C compatible values. However, in some other cases,\nyou are allowed to pass Zig objects and Zig literal values directly as inputs to C functions,\nand everything will work just fine, because the `zig` compiler will handle everything for you.\n\nSo we have two different scenarios being described here. Let's call them \"auto-conversion\" and \"need-conversion\".\nThe \"auto-conversion\" scenario is when the `zig` compiler handles everything for you, and automatically convert your\nZig objects/values into C compatible values. 
In contrast,\nthe \"need-conversion\" scenario is when you, the programmer, have the responsibility of converting\nthat Zig object into a C compatible value, before passing it to C code.\n\nThere is also a third scenario that is not being described here, which is when you create a C object, or, a C struct, or\na C compatible value in your Zig code, and you pass this C object/value as input to a C function in your Zig code.\nThis scenario will be described later at @sec-c-inputs. In this section, we are focused on the scenarios where\nwe are passing Zig objects/values to C code, instead of C objects/values being passed to C code.\n\n\n### The \"auto-conversion\" scenario\n\nAn \"auto-conversion\" scenario is when the `zig` compiler automatically converts our Zig objects into\nC compatible values for us. This specific scenario happens mostly in two instances:\n\n- with string literal values;\n- with any of the primitive data types that were introduced at @sec-primitive-data-types.\n\nWhen we think about the second instance described above, the `zig` compiler does automatically\nconvert any of the primitive data types into their C equivalents, because the compiler knows how\nto properly convert a `i16` into a `signed short`, or, a `u8` into a `unsigned char`, etc.\nNow, when we think about string literal values, they can be automatically\nconverted into C strings as well, especially because the `zig` compiler does not forces\na specific Zig data type into a string literal at first glance, unless you store this\nstring literal into a Zig object, and explicitly annotate the data type of this object.\n\nThus, with string literal values, the `zig` compiler have more freedom to infer which is the appropriate data type\nto be used in each situation. You could say that the string literal value \"inherits it's data type\" depending on the context that\nit is used. 
Most of the times, this data type is going to be the type that we commonly associate with Zig strings (`[]const u8`).\nBut it might be a different type depending on the situation. When the `zig` compiler detects that you are providing\na string literal value as input to some C function, the compiler automatically interprets this string\nliteral as a C string value.\n\nAs an example, look at the code exposed below. Here we are using\nthe `fopen()` C function to simply open and close a file. If you do not know how this `fopen()`\nfunction works in C, it takes two C strings as input. But in this code example below, we are passing some\nstring literals written in our Zig code directly as inputs to this `fopen()` C function.\n\nIn other words, we are not doing any type of conversion from a Zig string to a C string.\nWe are just passing the Zig string literals directly as inputs to the C function. And it works just fine!\nBecause the compiler inteprets the string `\"foo.txt\"` as a C string, as a result of the current context\nthat this string literal is being used.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n});\n\npub fn main() !void {\n const file = c.fopen(\"foo.txt\", \"rb\");\n if (file == null) {\n @panic(\"Could not open file!\");\n }\n if (c.fclose(file) != 0) {\n return error.CouldNotCloseFileDescriptor;\n }\n}\n```\n:::\n\n\n\n\nLet's make some experiments, by writing the same code in different manners, and we\nsee how this affects the program. As a starting point, let's store the `\"foo.txt\"` string inside\na Zig object, like the `path` object below, and then, we pass this Zig object as input to the `fopen()` C function.\n\nIf we do this, the program still compiles and runs successfully. 
Notice that I have ommitted most of the code in this example below.\nThis is just for brevitty reasons, because the remainder of the program is still the same.\nThe only difference between this example and the previous example is just these two lines exposed below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nNow, what happens if you give an explicit data type to the `path` object? Well, if I force\nthe `zig` compiler to interpret this `path` object as a Zig string object,\nby annotating the `path` object with the data type `[]const u8`, then, I actually get a compile error\nas demonstrated below. We get this compile error because now I'm forcing the `zig` compiler\nto interpret `path` as a Zig string object.\n\nAccording to the error message, the `fopen()` C function was expecting to receive an\ninput value of type `[*c]const u8` (C string) instead of a value of type `[]const u8` (Zig string).\nIn more details, the type `[*c]const u8` is actually the Zig type representation of a C string.\nThe `[*c]` portion of this type identifies a C pointer. So, this Zig type essentially means: a C pointer to an array (`[*c]`) of\nconstant bytes (`const u8`).\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\nTherefore, when we talk exclusively about string literal values, as long as you don't give an\nexplicit data type to these string literal values, the `zig` compiler should be capable of automatically\nconverting them into C strings as needed.\n\nBut what about using one of the primitive data types that were introduced at @sec-primitive-data-types?\nLet's take code exposed below as an example of that. 
Here, we are giving some float literal values as input\nto the C function `powf()`. Notice that this code example compiles and runs succesfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst cmath = @cImport({\n @cInclude(\"math.h\");\n});\n\npub fn main() !void {\n const y = cmath.powf(15.68, 2.32);\n try stdout.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\nOnce again, because the `zig` compiler does not associate a specific data type with the literal values\n`15.68` and `2.32` at first glance, the compiler can automatically convert these values\ninto their C `float` (or `double`) equivalents, before it passes to the `powf()` C function.\nNow, even if I give an explicit Zig data type to these literal values, by storing them into a Zig object,\nand explicit annotating the type of these objects, the code still compiles and runs succesfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const x: f32 = 15.68;\n const y = cmath.powf(x, 2.32);\n // The remainder of the program\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\n\n\n### The \"need-conversion\" scenario\n\nA \"need-conversion\" scenario is when we need to manually convert our Zig objects into C compatible values\nbefore passing them as input to C functions. You will fall in this scenario, when passing Zig string objects\nto C functions.\n\nWe already saw this specific circumstance on the last `fopen()` example,\nwhich is reproduced below. You can see in this example, that we have given an explicit Zig data type\n(`[]const u8`) to our `path` object, and, as a consequence of that, we have forced the `zig` compiler\nto see this `path` object, as a Zig string object. 
Because of that, we need now to manually convert\nthis `path` object into a C string before we pass it to `fopen()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\n\nThere are different ways to convert a Zig string object into a C string.\nOne way to solve this problem is to provide the pointer to the underlying array\nof bytes, instead of providing the Zig object directly as input.\nYou can access this pointer by using the `ptr` property of the Zig string object.\n\nThe code example below demonstrates this strategy. Notice that, by giving the\npointer to the underlying array in `path` through the `ptr` property, we get no compile errors as result\nwhile using the `fopen()` C function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path.ptr, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nThis strategy works because this pointer to the underlying array found in the `ptr` property,\nis semantically identical to a C pointer to a null-terminated array of bytes, i.e. a C object of type `*unsigned char`.\nThis is why this option also solves the problem of converting the Zig string into a C string.\n\nAnother option is to explicitly convert the Zig string object into a C pointer by using the\nbuilt-in function `@ptrCast()`. With this function we can convert\nan object of type `[]const u8` into an object of type `[*c]const u8`.\nAs I described at the previous section, the `[*c]` portion of the type\nmeans that it is a C pointer. This strategy is not-recommended. But it is\nuseful to demonstrate the use of `@ptrCast()`.\n\nYou may recall of `@as()` and `@ptrCast()` from @sec-type-cast. 
Just as a recap,\nthe `@as()` built-in function is used to explicit convert (or cast) a Zig value from a type \"x\"\nto a type \"y\".\nBut in our case here, we are converting a pointer object, or, a C pointer more specifically.\nEverytime a pointer is involved in some \"type casting operation\" in Zig,\nthe `@ptrCast()` function is involved.\n\nIn the example below, we are using this function to cast our `path` object\ninto a C pointer to an array of bytes. Then, we pass this C pointer as input\nto the `fopen()` function. Notice that this code example compiles succesfully\nwith no errors.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const c_path: [*c]const u8 = @ptrCast(path);\n const file = c.fopen(c_path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n\n\n## Creating C objects in Zig {#sec-c-inputs}\n\nCreating C objects, or, in other words, creating instances of C structs in your Zig code\nis actually something quite easy to do. You first need to import the C header file (like I described at @sec-import-c-header) that describes\nthe C struct that you are trying to instantiate in your Zig code. After that, you can just\ncreate a new object in your Zig code, and annotate it with the C type of the struct.\n\nFor example, suppose we have a C header file called `user.h`, and that this header file is declaring a new struct named `User`.\nThis C header file is exposed below:\n\n```c\n#include \n\ntypedef struct\n{\n uint64_t id;\n char* name;\n} User;\n```\n\nThis `User` C struct have two distinct fields, or two struct members, named `id` and `name`.\nThe field `id` is a unsigned 64-bit integer value, while the field `name` is just a standard C string.\nNow, suppose that I want to create an instance of this `User` struct in my Zig code.\nI can do that by importing this `user.h` header file into my Zig code, and creating\na new object with type `User`. 
These steps are reproduced in the code example below.\n\nNotice that I have used the keyword `undefined` in this example. This allows me to\ncreate the `new_user` object without the need to provide an initial value to the object.\nAs consequence, the underlying memory associated with this `new_user` is unintialized,\ni.e. the memory is currently populated with \"garbage\" values.\nThus, this expression have the exact same effect of the expression `User new_user;` in C,\nwhich means \"declare a new object named `new_user` of type `User`\".\n\nIs our responsibility to properly initialize this memory associated with this `new_user` object,\nby assigining valid values to the members (or the fields) of the C struct. In the example below, I am assigning the integer 1 to the\nmember `id`. I am also saving the string `\"pedropark99\"` into the member `name`.\nNotice in this example that I manually add the null character (zero byte) to the end of the allocated array\nfor this string. This null character marks the end of the array in C.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n var user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n}\n```\n:::\n\n\n\n\nSo, in this example above, we are manually initializing each field of the C struct.\nWe could say that, in this instance, we are \"manually instantiating\nthe C struct object\". However, when we use C libraries in our Zig code, we rarely need\nto manually instantiate the C structs like in the above example. 
Only because C libraries\nusually provide \"constructor functions\" in their public APIs. As consequence, we normally rely on\nthese constructor functions to properly initialize the C structs, and\nthe struct fields for us.\n\nFor example, consider the Harfbuzz C library. This a text shaping C library,\nand it works around a \"buffer object\", or, more specifically, an instance of\nthe C struct `hb_buffer_t`. Therefore, we need to create an instance of\nthis C struct if we want to use this C library. Luckily, this library offers\nthe function `hb_buffer_create()`, which we can use to create such object.\nSo the Zig code necessary to create such object would probably look something like this:\n\n```zig\nconst c = @cImport({\n @cInclude(\"hb.h\");\n});\nvar buf: c.hb_buffer_t = c.hb_buffer_create();\n// Do stuff with the \"buffer object\"\n```\n\nTherefore, we do not need to manually create an instance of the C struct\n`hb_buffer_t` here, and manually assign valid values to each field in this C struct.\nBecause the constructor function `hb_buffer_create()` is doing this heavy job for us.\n\nSince this `buf` object (and also the `new_user` object) is an instance of a C struct, this\nobject is, in itself, a C compatible value. It is a C object defined in our Zig code. As consequence,\nyou can freely pass this object as input to any C function that expects to receive this type\nof C struct as input. You do not need to use any special syntax, or, to convert this object in\nany special manner to use it in C code.\nThis is how we create and use C objects in our Zig code.\n\n\n\n## Passing C structs across Zig functions {#sec-pass-c-structs}\n\nNow that we have learned how to create/declare C objects in our Zig code, we\nneed to learn how to pass these C objects as inputs to Zig functions.\nAs I described at @sec-c-inputs, we can freely pass these C objects as inputs to C code\nthat we call from our Zig code. 
But what about passing these C objects as inputs to Zig functions?\n\nIn essence, this specific case requires one small adjustment in the Zig function declaration.\nAll you need to do, is to make sure that you pass your C object *by reference* to the function,\ninstead of passing it *by value*. To do that, you have to annotate the data type of the function argument\nthat is receiving this C object as \"a pointer to the C struct\", instead of annotating it as \"an instance of the C struct\".\n\nLet's consider the C struct `User` from the `user.h` C header file that we have used at @sec-c-inputs.\nNow, consider that we want to create a Zig function that sets the value of the `id` field\nin this C struct, like the `set_user_id()` function declared below.\nNotice that the `user` argument in this function is annotated as a pointer (`*`) to a `c.User` object.\n\nTherefore, essentially, all you have to do when passing C objects to Zig functions, is to add `*` to the\ndata type of the function argument that is receiving the C object. This will make sure that\nthe C object is passed *by reference* to the function.\n\nNow, because we have transformed the function argument into a pointer,\neverytime that you have to access the value pointed by the input pointer inside the function body, for whatever reason (e.g. you want\nto read, update, or delete this value), you have to dereference the pointer with the `.*` syntax that we\nlearned from @sec-pointer. 
Notice that the `set_user_id()` function is using this syntax to alter\nthe value in the `id` field of the `User` struct pointed by the input pointer.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\nfn set_user_id(id: u64, user: *c.User) void {\n user.*.id = id;\n}\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n var user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n\n set_user_id(25, &new_user);\n try stdout.print(\"New ID: {any}\\n\", .{new_user.id});\n}\n```\n:::\n\n\n\n\n```\nNew ID: 25\n```\n\n",
+ "markdown": "---\nengine: knitr\nknitr: true\nsyntax-definition: \"../Assets/zig.xml\"\n---\n\n\n\n\n\n\n\n\n\n# Zig interoperability with C\n\nIn this chapter, we are going to discuss the interoperability of Zig with C.\nWe have discussed at @sec-building-c-code how to build C code using the `zig` compiler.\nBut we haven't discussed yet how to actually use C code in Zig. In other words,\nwe haven't discussed yet how to call and use C code from Zig.\n\nThese matters are discussed here, in this chapter.\nAlso, our next small project in this book uses a C library.\nAs a consequence, we will put into practice a lot of the knowledge discussed here in\nthis next project.\n\n\n## How to call C code from Zig\n\nInterop with C is not something new. Most high-level programming languages have FFI (foreign function interfaces),\nwhich can be used to call C code. For example, Python has Cython, R has `.Call()`, JavaScript has `ccall()`, etc.\nBut Zig integrates with C at a deeper level, which affects not only the way that C code gets called, but also\nhow this C code is compiled and incorporated into your Zig project.\n\nIn summary, Zig has great interoperability with C. If you want to call any C code from Zig,\nyou have to perform the following steps:\n\n- import a C header file into your Zig code.\n- link your Zig code with the C library.\n\n\n### Strategies to import C header files {#sec-strategy-c}\n\nSo using C code in Zig always involves performing the two steps cited above. 
However, when\nwe talk specifically about the first step listed above, there are currently two\ndifferent ways to perform this first step, which are:\n\n- translating the C header file into Zig code through `zig translate-c`, and then importing the translated Zig code.\n- importing the C header file directly into your Zig module through the `@cImport()` built-in function.\n\nIf you are not familiar with `translate-c`, this is a subcommand inside the `zig` compiler that takes C files\nas input and outputs the Zig representation of the C code present in these C files.\nIn other words, this subcommand works like a transpiler. It takes C code and translates it into\nthe equivalent Zig code.\n\nI think it would be ok to interpret `translate-c` as a tool to generate Zig bindings\nto C code, similarly to the `rust-bindgen`[^bindgen] tool, which generates Rust FFI bindings to C code.\nBut that would not be a precise interpretation of `translate-c`. The idea behind this tool is\nto really translate the C code into Zig code.\n\n[^bindgen]: \n\nNow, on a surface level, `@cImport()` versus `translate-c` might seem like\ntwo completely different strategies. But in fact, they are effectively the exact same strategy,\nbecause, under the hood, the `@cImport()` built-in function is just a shortcut to `translate-c`.\nBoth tools use the same C to Zig translation functionality. So when you use `@cImport()`,\nyou are essentially asking the `zig` compiler to translate the C header file into Zig code, and then\nto import this Zig code into your current Zig module.\n\nAt the present moment, there is an accepted proposal in the Zig project to move `@cImport()`\nto the Zig build system[^cimport-issue]. 
If this proposal is implemented, the \"use `@cImport()`\"\nstrategy would be transformed into \"call a translate C function in your Zig build script\".\nSo, the step of translating the C code into Zig code would be moved to\nthe build script of your Zig project, and you would only have to import the translated Zig code into\nyour Zig module to start calling C code from Zig.\n\n[^cimport-issue]: \n\nIf you think about this proposal for a minute, you will understand that this is actually\na small change. The logic is the same, and the steps are still essentially the same.\nThe only difference is that one of the steps will be moved to the build script of your Zig project.\n\n\n\n### Linking Zig code with a C library {#sec-linking-c}\n\nRegardless of which of the two strategies mentioned in the previous section you choose,\nif you want to call C code from Zig, you will always have to link your Zig code\nwith the C library that contains the C code that you want to call.\n\nIn other words, every time you use a C library in your Zig code, **you introduce a dependency in your build process**.\nThis should come as no surprise to anyone who has experience with C and C++.\nIt is no different in C: every time you use a C library in your C code, you also\nhave to build and link your C code with this C library that you are using.\n\nWhen we use a C library in our Zig code, the `zig` compiler needs to access the definitions of the C functions that\nare being called in your Zig code. The C header file of this library provides the\ndeclarations of these C functions, but not their definitions. So, in order to access these definitions,\nthe `zig` compiler needs to build your Zig code and link it with the C library in the build process.\n\nAs we discussed in @sec-build-system, there are different strategies to link something with a library.\nThis might involve building the C library first, and then linking it with the Zig code. 
Or,\nit could also involve just the linking step, if this C library is already built and\ninstalled in your system. Anyway, if you have doubts about this, come back to @sec-build-system.\n\n\n\n## Importing C header files {#sec-import-c-header}\n\nAt @sec-strategy-c, we have described that, currently, there are two different paths that\nyou can take to import a C header file into your Zig modules, `translate-c` or `@cImport()`.\nThis section describes each strategy separately in more detail.\n\n### Strategy 1: using `translate-c`\n\nWhen we choose this strategy, we first need to use the `translate-c` tool to translate\nthe C header files that we want to use into Zig code. For example, suppose we wanted to\nuse the `fopen()` C function from the `stdio.h` C header file. We can translate the\n`stdio.h` C header file through the bash command below:\n\n```bash\nzig translate-c /usr/include/stdio.h \\\n -lc -I/usr/include \\\n -D_NO_CRT_STDIO_INLINE=1 > c.zig\n```\n\nNotice that, in this bash command, we are passing the necessary compiler flags (`-D` to define macros,\n`-l` to link libraries, `-I` to add a directory to the header search path) to compile and use the `stdio.h` header file.\nAlso notice that we are saving the results of the translation inside a Zig module called `c.zig`.\n\nSo after running this command, all we have to do is import this `c.zig` module and start\ncalling the C functions that we want to call from it. The example below demonstrates that.\nIt is important to remember what we discussed at @sec-linking-c. 
In order to compile this\nexample, you have to link this code with `libc`, by passing the flag `-lc` to the `zig` compiler.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @import(\"c.zig\");\npub fn main() !void {\n const x: f32 = 1772.94122;\n _ = c.printf(\"%.3f\\n\", x);\n}\n```\n:::\n\n\n\n\n```\n1772.941\n```\n\n\n### Strategy 2: using `@cImport()`\n\nTo import a C header file into our Zig code, we can use the built-in functions `@cInclude()` and `@cImport()`.\nInside the `@cImport()` function, we open a block (with a pair of curly braces). Inside this block\nwe can (if we need to) include multiple `@cDefine()` calls to define C macros when including this specific C header file.\nBut for the most part, you will probably need just a single call inside this `@cImport()` block,\nwhich is a call to `@cInclude()`.\n\nThis `@cInclude()` function is equivalent to the `#include` directive in C.\nYou provide the name of the C header that you want to include as input to this `@cInclude()` function,\nthen, in conjunction with `@cImport()`, it will perform the necessary steps\nto include this C header file into your Zig code.\n\nYou should bind the result of `@cImport()` to a constant object, pretty much like you would do with\n`@import()`. You just assign the result to a constant object in your\nZig code, and, as a consequence, all C functions, C structs, C macros, etc. that are defined inside the\nC header files will be available through this constant object.\n\nLook at the code example below, where we are importing the Standard I/O C Library (`stdio.h`),\nand calling the `printf()`[^printf] C function. 
Notice that in this example we have also used the C function `powf()`[^powf],\nwhich comes from the C Math Library (`math.h`).\nIn order to compile this example, you have to link this Zig code with both\nthe C Standard Library and the C Math Library, by passing the flags `-lc` and `-lm`\nto the `zig` compiler.\n\n[^printf]: \n[^powf]: \n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n @cInclude(\"math.h\");\n});\n\npub fn main() !void {\n const x: f32 = 15.2;\n const y = c.powf(x, @as(f32, 2.6));\n _ = c.printf(\"%.3f\\n\", y);\n}\n```\n:::\n\n\n\n\n```\n1182.478\n```\n\n\n## About passing Zig values to C functions {#sec-zig-obj-to-c}\n\nZig objects have some intrinsic differences from their C equivalents.\nProbably the most noticeable one is the difference between C strings and Zig strings,\nwhich I described at @sec-zig-strings.\nZig strings are objects that contain both a pointer to an array of arbitrary bytes and a length value.\nOn the other hand, a C string is usually just a pointer to a null-terminated array of arbitrary bytes.\n\nBecause of these intrinsic differences, in some specific cases, you cannot pass Zig objects directly\nas inputs to C functions without first converting them into C compatible values. However, in some other cases,\nyou are allowed to pass Zig objects and Zig literal values directly as inputs to C functions,\nand everything will work just fine, because the `zig` compiler will handle everything for you.\n\nSo we have two different scenarios being described here. Let's call them \"auto-conversion\" and \"need-conversion\".\nThe \"auto-conversion\" scenario is when the `zig` compiler handles everything for you, and automatically converts your\nZig objects/values into C compatible values. 
In contrast,\nthe \"need-conversion\" scenario is when you, the programmer, have the responsibility of converting\na Zig object into a C compatible value before passing it to C code.\n\nThere is also a third scenario that is not being described here, which is when you create a C object, or, a C struct, or\na C compatible value in your Zig code, and you pass this C object/value as input to a C function in your Zig code.\nThis scenario will be described later at @sec-c-inputs. In this section, we are focused on the scenarios where\nwe are passing Zig objects/values to C code, instead of C objects/values being passed to C code.\n\n\n### The \"auto-conversion\" scenario\n\nAn \"auto-conversion\" scenario is when the `zig` compiler automatically converts our Zig objects into\nC compatible values for us. This specific scenario happens mostly in two instances:\n\n- with string literal values;\n- with any of the primitive data types that were introduced at @sec-primitive-data-types.\n\nWhen we think about the second instance described above, the `zig` compiler does automatically\nconvert any of the primitive data types into their C equivalents, because the compiler knows how\nto properly convert an `i16` into a `signed short`, or, a `u8` into an `unsigned char`, etc.\nNow, when we think about string literal values, they can be automatically\nconverted into C strings as well, especially because the `zig` compiler does not force\na specific Zig data type into a string literal at first glance, unless you store this\nstring literal into a Zig object, and explicitly annotate the data type of this object.\n\nThus, with string literal values, the `zig` compiler has more freedom to infer which is the appropriate data type\nto be used in each situation. You could say that the string literal value \"inherits its data type\" from the context in which\nit is used. 
Most of the time, this data type ends up being the type that we commonly associate with Zig strings (`[]const u8`).\nBut it might be a different type depending on the situation. Under the hood, a string literal is a pointer to a\nnull-terminated array of bytes (`*const [N:0]u8`, where `N` is the length of the string), and this type can be coerced\nboth into a Zig string and into a C string. When the `zig` compiler detects that you are providing\na string literal value as input to some C function, the compiler automatically interprets this string\nliteral as a C string value.\n\nAs an example, look at the code below. Here we are using\nthe `fopen()` C function to simply open and close a file. If you do not know how this `fopen()`\nfunction works in C, it takes two C strings as input. In the code example below, we are passing some\nstring literals written in our Zig code directly as inputs to this `fopen()` C function.\n\nIn other words, we are not doing any type of conversion from a Zig string to a C string.\nWe are just passing the Zig string literals directly as inputs to the C function. And it works just fine,\nbecause the compiler interprets the string `\"foo.txt\"` as a C string, given the context\nin which this string literal is being used.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst c = @cImport({\n @cDefine(\"_NO_CRT_STDIO_INLINE\", \"1\");\n @cInclude(\"stdio.h\");\n});\n\npub fn main() !void {\n const file = c.fopen(\"foo.txt\", \"rb\");\n if (file == null) {\n @panic(\"Could not open file!\");\n }\n if (c.fclose(file) != 0) {\n return error.CouldNotCloseFileDescriptor;\n }\n}\n```\n:::\n\n\n\n\nLet's run some experiments by writing the same code in different ways, and\nsee how this affects the program. As a starting point, let's store the `\"foo.txt\"` string inside\na Zig object, like the `path` object below, and then pass this Zig object as input to the `fopen()` C function.\n\nIf we do this, the program still compiles and runs successfully. 
Notice that I have omitted most of the code in the example below.\nThis is just for brevity, because the remainder of the program is still the same.\nThe only difference between this example and the previous one is the two lines shown below.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nNow, what happens if you give an explicit data type to the `path` object? Well, if I force\nthe `zig` compiler to interpret this `path` object as a Zig string object,\nby annotating the `path` object with the data type `[]const u8`, then I get a compile error,\nas demonstrated below.\n\nAccording to the error message, the `fopen()` C function was expecting to receive an\ninput value of type `[*c]const u8` (C string) instead of a value of type `[]const u8` (Zig string).\nIn more detail, the type `[*c]const u8` is actually the Zig type representation of a C string.\nThe `[*c]` portion of this type identifies a C pointer. So, this Zig type essentially means: a C pointer to an array (`[*c]`) of\nconstant bytes (`const u8`).\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\nTherefore, when we talk exclusively about string literal values, as long as you don't give an\nexplicit data type to these string literal values, the `zig` compiler should be capable of automatically\nconverting them into C strings as needed.\n\nBut what about using one of the primitive data types that were introduced at @sec-primitive-data-types?\nLet's take the code below as an example of that. 
Here, we are giving some float literal values as input\nto the C function `powf()`. Notice that this code example compiles and runs successfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst cmath = @cImport({\n @cInclude(\"math.h\");\n});\n\npub fn main() !void {\n const y = cmath.powf(15.68, 2.32);\n try stdout.print(\"{d}\\n\", .{y});\n}\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\nOnce again, because the `zig` compiler does not associate a specific data type with the literal values\n`15.68` and `2.32` at first glance, the compiler can automatically convert these values\ninto the C `float` values that `powf()` expects before passing them to this C function.\nNow, even if I give an explicit Zig data type to one of these literal values, by storing it in a Zig object\nand explicitly annotating the type of this object, the code still compiles and runs successfully.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const x: f32 = 15.68;\n const y = cmath.powf(x, 2.32);\n // The remainder of the program\n```\n:::\n\n\n\n\n```\n593.2023\n```\n\n\n\n### The \"need-conversion\" scenario\n\nA \"need-conversion\" scenario is when we need to manually convert our Zig objects into C compatible values\nbefore passing them as input to C functions. You fall into this scenario when passing Zig string objects\nto C functions.\n\nWe already saw this specific circumstance in the last `fopen()` example,\nwhich is reproduced below. You can see in this example that we have given an explicit Zig data type\n(`[]const u8`) to our `path` object, and, as a consequence of that, we have forced the `zig` compiler\nto see this `path` object as a Zig string object. 
Because of that, we now need to manually convert\nthis `path` object into a C string before we pass it to `fopen()`.\n\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n```\nt.zig:10:26: error: expected type '[*c]const u8', found '[]const u8'\n const file = c.fopen(path, \"rb\");\n ^~~~\n```\n\n\nThere are different ways to convert a Zig string object into a C string.\nOne way to solve this problem is to provide the pointer to the underlying array\nof bytes, instead of providing the Zig object directly as input.\nYou can access this pointer by using the `ptr` property of the Zig string object.\n\nThe code example below demonstrates this strategy. Notice that, by passing the\npointer to the underlying array of `path` through the `ptr` property, we no longer get a compile error\nwhen calling the `fopen()` C function.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const file = c.fopen(path.ptr, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\nThis strategy works because the `ptr` property holds a many-item pointer (`[*]const u8`) to the underlying array,\nand Zig can coerce this pointer into the C pointer type `[*c]const u8`, i.e. what C code would declare as `const char *`.\nKeep in mind, however, that C functions also expect the string to be null-terminated. This holds here only because `path`\npoints to a string literal, which Zig always stores with a terminating zero byte. If your Zig string comes from somewhere else,\nmake sure that it ends with a null character before passing its `ptr` to a C function.\n\nAnother option is to explicitly convert the Zig string object into a C pointer by using the\nbuilt-in function `@ptrCast()`. With this function we can convert\nan object of type `[]const u8` into an object of type `[*c]const u8`.\nAs I described in the previous section, the `[*c]` portion of the type\nmeans that it is a C pointer. This strategy is not recommended, but it is\nuseful to demonstrate the use of `@ptrCast()`.\n\nYou may recall `@as()` and `@ptrCast()` from @sec-type-cast. 
Just as a recap,\nthe `@as()` built-in function is used to explicitly convert (or cast) a Zig value from a type \"x\"\nto a type \"y\".\nBut in our case here, we are converting a pointer object, or, more specifically, a C pointer.\nWhen a type cast involves pointers in Zig,\n`@ptrCast()` is usually the built-in function you need.\n\nIn the example below, we are using this function to cast our `path` object\ninto a C pointer to an array of bytes. Then, we pass this C pointer as input\nto the `fopen()` function. Notice that this code example compiles successfully\nwith no errors.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\n const path: []const u8 = \"foo.txt\";\n const c_path: [*c]const u8 = @ptrCast(path);\n const file = c.fopen(c_path, \"rb\");\n // Remainder of the program\n```\n:::\n\n\n\n\n\n\n## Creating C objects in Zig {#sec-c-inputs}\n\nCreating C objects, or, in other words, creating instances of C structs in your Zig code\nis quite easy. You first need to import the C header file (as I described in @sec-import-c-header) that describes\nthe C struct that you are trying to instantiate in your Zig code. After that, you can just\ncreate a new object in your Zig code, and annotate it with the C type of the struct.\n\nFor example, suppose we have a C header file called `user.h`, and that this header file declares a new struct named `User`.\nThis C header file is shown below:\n\n```c\n#include <stdint.h>\n\ntypedef struct\n{\n uint64_t id;\n char* name;\n} User;\n```\n\nThis `User` C struct has two distinct fields, or two struct members, named `id` and `name`.\nThe field `id` is an unsigned 64-bit integer value, while the field `name` is just a standard C string.\nNow, suppose that I want to create an instance of this `User` struct in my Zig code.\nI can do that by importing this `user.h` header file into my Zig code, and creating\na new object with type `User`. 
These steps are reproduced in the code example below.\n\nNotice that I have used the keyword `undefined` in this example. This allows me to\ncreate the `new_user` object without providing an initial value for it.\nAs a consequence, the underlying memory associated with `new_user` is uninitialized,\ni.e. the memory is currently populated with \"garbage\" values.\nThus, this expression has the exact same effect as the expression `User new_user;` in C,\nwhich means \"declare a new object named `new_user` of type `User`\".\n\nIt is our responsibility to properly initialize the memory associated with this `new_user` object,\nby assigning valid values to the members (or the fields) of the C struct. In the example below, I am assigning the integer 1 to the\nmember `id`. I am also saving the string `\"pedropark99\"` into the member `name`.\nNotice in this example that I manually add the null character (zero byte) to the end of the allocated array\nfor this string. This null character marks the end of the string in C.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n // 11 bytes for \"pedropark99\" plus 1 byte for the null terminator\n const user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n}\n```\n:::\n\n\n\n\nSo, in the example above, we are manually initializing each field of the C struct.\nWe could say that, in this instance, we are \"manually instantiating\nthe C struct object\". However, when we use C libraries in our Zig code, we rarely need\nto manually instantiate C structs like in the above example. 
This is because C libraries\nusually provide \"constructor functions\" in their public APIs. As a consequence, we normally rely on\nthese constructor functions to properly initialize the C structs, and\nthe struct fields, for us.\n\nFor example, consider the Harfbuzz C library. This is a text shaping C library,\nand it is built around a \"buffer object\", or, more specifically, an instance of\nthe C struct `hb_buffer_t`. Therefore, we need to create an instance of\nthis C struct if we want to use this C library. Luckily, this library offers\nthe function `hb_buffer_create()`, which we can use to create such an object.\nThis function returns a pointer (`hb_buffer_t *`) to the newly created buffer.\nSo the Zig code necessary to create such an object would probably look something like this:\n\n```zig\nconst c = @cImport({\n @cInclude(\"hb.h\");\n});\n// `buf` is a C pointer to the new buffer object\nconst buf = c.hb_buffer_create();\n// Do stuff with the \"buffer object\",\n// then release it with `c.hb_buffer_destroy(buf)` when you are done\n```\n\nTherefore, we do not need to manually create an instance of the C struct\n`hb_buffer_t` here and assign valid values to each field in this C struct,\nbecause the constructor function `hb_buffer_create()` does this heavy lifting for us.\n\nSince the `new_user` object is an instance of a C struct (and `buf` is a pointer to one), these\nobjects are, in themselves, C compatible values. They are C objects defined in our Zig code. As a consequence,\nyou can freely pass them as input to any C function that expects to receive these types\nas input. You do not need to use any special syntax, or to convert these objects in\nany special manner to use them in C code.\nThis is how we create and use C objects in our Zig code.\n\n\n\n## Passing C structs across Zig functions {#sec-pass-c-structs}\n\nNow that we have learned how to create/declare C objects in our Zig code, we\nneed to learn how to pass these C objects as inputs to Zig functions.\nAs I described in @sec-c-inputs, we can freely pass these C objects as inputs to C code\nthat we call from our Zig code. 
But what about passing these C objects as inputs to Zig functions?\n\nIn essence, when the Zig function needs to modify the C object, this case requires one small adjustment in the function declaration:\nyou need to make sure that you pass your C object *by reference* to the function,\ninstead of passing it *by value*. To do that, you annotate the data type of the function argument\nthat is receiving this C object as \"a pointer to the C struct\", instead of annotating it as \"an instance of the C struct\".\nThis adjustment is needed because function arguments are immutable in Zig, so a C struct received by value cannot be modified inside the function.\n\nLet's consider the C struct `User` from the `user.h` C header file that we used in @sec-c-inputs.\nNow, suppose that we want to create a Zig function that sets the value of the `id` field\nin this C struct, like the `set_user_id()` function declared below.\nNotice that the `user` argument in this function is annotated as a pointer (`*`) to a `c.User` object.\n\nTherefore, essentially, all you have to do when passing C objects to Zig functions that modify them is to add `*` to the\ndata type of the function argument that is receiving the C object. This makes sure that\nthe C object is passed *by reference* to the function.\n\nNow, because we have transformed the function argument into a pointer,\nwhenever you need to access the value pointed to by the input pointer inside the function body (e.g. to read\nor update this value), you can dereference the pointer with the `.*` syntax that we\nlearned in @sec-pointer. 
Notice that the `set_user_id()` function is using this syntax to alter\nthe value in the `id` field of the `User` struct pointed to by the input pointer.\nZig also dereferences single-item pointers automatically when you access a struct field,\nso writing `user.id = id` would work as well.\n\n\n\n\n::: {.cell}\n\n```{.zig .cell-code}\nconst std = @import(\"std\");\nconst stdout = std.io.getStdOut().writer();\nconst c = @cImport({\n @cInclude(\"user.h\");\n});\nfn set_user_id(id: u64, user: *c.User) void {\n // Dereference the pointer to update the struct it points to\n user.*.id = id;\n}\n\npub fn main() !void {\n var gpa = std.heap.GeneralPurposeAllocator(.{}){};\n const allocator = gpa.allocator();\n\n var new_user: c.User = undefined;\n new_user.id = 1;\n const user_name = try allocator.alloc(u8, 12);\n defer allocator.free(user_name);\n @memcpy(user_name[0..(user_name.len - 1)], \"pedropark99\");\n user_name[user_name.len - 1] = 0;\n new_user.name = user_name.ptr;\n\n set_user_id(25, &new_user);\n try stdout.print(\"New ID: {any}\\n\", .{new_user.id});\n}\n```\n:::\n\n\n\n\n```\nNew ID: 25\n```\n\n",
"supporting": [
"14-zig-c-interop_files"
],
diff --git a/docs/Chapters/01-base64.html b/docs/Chapters/01-base64.html
index 942de32..6f287e1 100644
--- a/docs/Chapters/01-base64.html
+++ b/docs/Chapters/01-base64.html
@@ -662,7 +662,7 @@
Figure 4.3 already showed you what effect this & operator produces in the bits of it’s operands. But let’s make a clear description of it.
In summary, the & operator performs a logical conjunction operation between the bits of it’s operands. In more details, the operator & compares each bit of the first operand to the corresponding bit of the second operand. If both bits are 1, the corresponding result bit is set to 1. Otherwise, the corresponding result bit is set to 0 (Microsoft 2021).
So, if we apply this operator to the binary sequences 1000100 and 00001101 the result of this operation is the binary sequence 00000100. Because only at the sixth position in both binary sequences we had a 1 value. So any position where we do not have both binary sequences setted to 1, we get a 0 bit in the resulting binary sequence.
-We loose information about the original bit values from both sequences in this case. Because we no longer know if this 0 bit in the resulting binary sequence was produced by combining 0 with 0, or 1 with 0, or 0 with 1.
+We lose information about the original bit values from both sequences in this case, because we no longer know whether this 0 bit in the resulting binary sequence was produced by combining 0 with 0, 1 with 0, or 0 with 1.
As an example, suppose you have the binary sequence 10010111, which is the number 151 in decimal. How can we get a new binary sequence which contains only the third and fourth bits of this sequence?
We just need to combine this sequence with 00110000 (is 0x30 in hexadecimal) using the & operator. Notice that only the third and fourth positions in this binary sequence is setted to 1. As a consequence, only the third and fourth values of both binary sequences are potentially preserved in the output. All the remaining positions are setted to zero in the output sequence, which is 00010000 (is the number 16 in decimal).
This same logic applies to any other special structure in Zig that have it’s own scope by surrounding it with curly braces ({}). For loops, while loops, if else statements, etc. For example, if you declare any local object in the scope of a for loop, this local object is accessible only within the scope of this particular for loop. Because once the scope of this for loop ends, the space in the stack reserved for this for loop is freed. The example below demonstrates this idea.
-// This does not compile succesfully!
+// This does not compile successfully!
const a = [_]u8{0, 1, 2, 3, 4};
for (0..a.len) |i| {
const index = i;
@@ -443,7 +443,7 @@
So, using again the add() function as an example, if you rewrite this function so that it returns a pointer to the local object result, the zig compiler will actually compile you program, with no warnings or erros. At first glance, it looks that this is good code that works as expected. But this is a lie!
If you try to take a look at the value inside of the r object, or, if you try to use this r object in another expression or function call, then, you would have undefined behaviour, and major bugs in your program (Zig Software Foundation 2024, see “Lifetime and Ownership”3 and “Undefined Behaviour”4 sections).
-// This code compiles succesfully. But it has
+// This code compiles successfully. But it has
// undefined behaviour. Never do this!!!
// The `r` object is undefined!
diff --git a/docs/Chapters/01-zig-weird.html b/docs/Chapters/01-zig-weird.html
index 8e6a345..a6cb16d 100644
--- a/docs/Chapters/01-zig-weird.html
+++ b/docs/Chapters/01-zig-weird.html
@@ -531,7 +531,7 @@
On the other side, if you use var, then, you are creating a variable (or mutable) object. You can change the value of this object as many times you want. Using the keyword var in Zig is similar to using the keywords let mut in Rust.
1.4.1 Constant objects vs variable objects
-In the code example below, we are creating a new constant object called age. This object stores a number representing the age of someone. However, this code example does not compiles succesfully. Because on the next line of code, we are trying to change the value of the object age to 25.
+In the code example below, we are creating a new constant object called age. This object stores a number representing the age of someone. However, this code example does not compile successfully, because on the next line of code, we are trying to change the value of the object age to 25.
The zig compiler detects that we are trying to change the value of an object/identifier that is constant, and because of that, the compiler will raise a compilation error, warning us about the mistake.
const age = 24;
@@ -542,7 +542,7 @@
var age: u8 = 24;
age = 25;
@@ -577,7 +577,7 @@
// It compiles!
const age = 15;
diff --git a/docs/Chapters/03-structs.html b/docs/Chapters/03-structs.html
index cc83b53..a6791f6 100644
--- a/docs/Chapters/03-structs.html
+++ b/docs/Chapters/03-structs.html
@@ -608,7 +608,7 @@
Now that we adjusted our main() function, I can now execute our program, and see the effects of these last changes. First, I execute the program once again, with the run command of the zig compiler. The program will hang, waiting for a client to connect.
-Then, I open my web browser, and try to connect to the server again, using the URL localhost:3490. This time, instead of getting some sort of an error message from the browser, you will get the message “Hello World” printed into your web browser. Because this time, the server sended the HTTP Response succesfully to the web browser, as demonstrated by Figure 7.3.
+Then, I open my web browser, and try to connect to the server again, using the URL localhost:3490. This time, instead of getting some sort of an error message from the browser, you will get the message “Hello World” printed in your web browser, because this time the server sent the HTTP Response successfully to the web browser, as demonstrated by Figure 7.3.