- Overview
- Project Challenges
- Team Development Steps for Minishell
- Project Structure
- Team Members
- Resources
- Acknowledgements
Our Minishell project represents a collaborative effort to build a custom shell program in C, inspired by the functionalities of bash, the Bourne Again SHell. As part of the 42 School curriculum, this project served as a unique opportunity for us to deepen our collective understanding of system processes, file descriptors, and the complexities of command-line interfaces. Through the development of our own shell, we engaged in a comprehensive exploration of Unix-based systems, gaining hands-on experience in process control, command interpretation, and environmental variable management.
- Parsing User Input: Accurately parsing user input, including handling spaces, quotes, and special characters, while distinguishing between command arguments and options.
- Executing Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path, and managing execution of built-in commands versus external commands.
- Signal Handling: Correctly handling the Unix signals SIGINT (`ctrl-C`) and SIGQUIT (`ctrl-\`), as well as the end-of-file condition (`ctrl-D`), so that the shell responds to them the way bash does.
- Input/Output Redirection and Pipes: Implementing input and output redirection (`<`, `>`, `>>`, `<<`) and pipes (`|`) to allow for command chaining and data redirection, which involves managing file descriptors and process communication.
- Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of `$?` to represent the exit status of the most recently executed command.
- Memory Management: Ensuring efficient memory management throughout the shell, including preventing memory leaks, especially in the context of the readline function and dynamically allocated resources.
- Built-in Commands Implementation: Creating internal implementations of several built-in commands (`echo`, `cd`, `pwd`, `export`, `unset`, `env`, `exit`) that behave consistently with their bash counterparts.
- Concurrency and Process Management: Handling concurrency through process creation and management, using system calls like `fork`, `execve`, `wait`, and `pipe`, and ensuring robust process control and signal handling.
- Error Handling: Developing comprehensive error handling strategies to deal with invalid commands, permissions issues, nonexistent files, and other runtime errors.
- Repository Setup: Collaboratively set up the GitHub repository, ensuring a clear directory structure and branch strategy.
- Makefile Creation: Wrote a Makefile that includes rules for `all`, `clean`, `fclean`, and `re`.
- Set up the libft library.
This phase was about understanding the shell's operations. We reviewed the external functions allowed and divided them among ourselves, so that each of us could research a group of functions and explain their usage to the team.
Readline Functions:
Function | Description |
---|---|
`readline` | Reads a line from the standard input and returns it. |
`rl_clear_history` | Clears the readline history list. |
`rl_on_new_line` | Prepares readline for reading input on a new line. |
`rl_replace_line` | Replaces the content of the readline current line buffer. |
`rl_redisplay` | Updates the display to reflect changes to the input line. |
`add_history` | Adds the most recent input to the readline history list. |
Standard I/O Functions:
Function | Description |
---|---|
`printf` | Outputs formatted data to stdout. |
Memory Allocation Functions:
Function | Description |
---|---|
`malloc` | Allocates the specified number of bytes of heap memory. |
`free` | Deallocates previously allocated memory. |
File I/O Functions:
Function | Description |
---|---|
`write` | Writes data to a file descriptor. |
`access` | Checks the calling process's permissions for a file or directory. |
`open` | Opens a file or device, returning a file descriptor. |
`read` | Reads data from a file descriptor into a buffer. |
`close` | Closes a previously opened file descriptor. |
Process Control Functions:
Function | Description |
---|---|
`fork` | Creates a new process by duplicating the calling process. |
`wait` | Suspends execution of the calling process until one of its children terminates. |
`waitpid` | Waits for a specific child process to change state. |
`wait3` | Waits for any child process to change state and also reports its resource usage. |
`wait4` | Waits for a specific child process to change state and also reports its resource usage. |
`signal` | Sets the disposition of a signal: a handler function, ignore, or the default action. |
`sigaction` | Examines or changes the action taken on a signal, with control over the signal mask and flags. |
`sigemptyset` | Initializes a signal set so that it contains no signals. |
`sigaddset` | Adds a signal to a signal set. |
`kill` | Sends a signal to a process or a group of processes. |
`exit` | Terminates the calling process. |
File System and Execution Functions:
Function | Description |
---|---|
`getcwd` | Gets the current working directory. |
`chdir` | Changes the current working directory. |
`stat` | Returns information about the file at a given path, following symbolic links. |
`lstat` | Like `stat`, but if the path is a symbolic link, returns information about the link itself. |
`fstat` | Like `stat`, but takes an open file descriptor instead of a path. |
`unlink` | Removes a link to a file. |
`execve` | Replaces the current process image with a new process image. |
File Descriptor Functions:
Function | Description |
---|---|
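`dup` | Duplicates a file descriptor, using the lowest-numbered unused descriptor. |
`dup2` | Duplicates a file descriptor onto a specified descriptor number, closing that descriptor first if it is open. |
`pipe` | Creates a pipe for inter-process communication. |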
Directory Functions:
Function | Description |
---|---|
`opendir` | Opens a directory stream for a given path. |
`readdir` | Returns the next entry from a directory stream. |
`closedir` | Closes a directory stream. |
Error Handling Functions:
Function | Description |
---|---|
`strerror` | Returns a pointer to the textual representation of an error code. |
`perror` | Prints a message to stderr describing the current value of `errno`. |
Terminal Functions:
Function | Description |
---|---|
`isatty` | Tests whether a file descriptor refers to a terminal. |
`ttyname` | Returns the name of the terminal associated with a file descriptor. |
`ttyslot` | Returns the index of the current user's entry in the terminal database file. |
`ioctl` | Controls device-specific input/output operations. |
`getenv` | Returns the value of an environment variable. |
`tcsetattr` | Sets terminal attributes. |
`tcgetattr` | Gets terminal attributes. |
`tgetent` | Loads the termcap entry for a given terminal name. |
`tgetflag` | Gets a boolean capability from the loaded termcap entry. |
`tgetnum` | Gets a numeric capability from the loaded termcap entry. |
`tgetstr` | Gets a string capability from the loaded termcap entry. |
`tgoto` | Builds a cursor-movement string from a capability and its parameters. |
`tputs` | Outputs a termcap string, applying any padding it requires. |
Readline Library: Implemented `readline` and integrated `add_history` (GNU Readline).
brew install readline
readline(3) - Linux manual page
Add the following to the Makefile:
READLINE_INCLUDE = $(shell brew --prefix readline)/include
READLINE_LIB = $(shell brew --prefix readline)/lib
INCLUDES = -I./includes -I./lib/libft -I$(READLINE_INCLUDE)
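#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
#include <readline/history.h>

int main(void)
{
    char *input;

    while (1)
    {
        /* readline returns NULL on end-of-file (ctrl-D on an empty line). */
        input = readline("minishell$ ");
        if (!input)
            break;
        /* Only non-empty lines are worth keeping in the history. */
        if (*input)
            add_history(input);
        printf("Input: %s\n", input);
        /* readline allocates the line with malloc; the caller frees it. */
        free(input);
    }
    return 0;
}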
What Is An Abstract Syntax Tree
- Syntax Error Checking: This involves verifying whether the input string adheres to the shell's syntax rules. It checks for unclosed quotes and misuse of redirection or pipe symbols. Syntax error checking ensures that the input can be correctly interpreted and executed.
- Tokenization: This step breaks the input string into meaningful pieces, known as tokens. Tokens can be commands, arguments, redirection operators (`<`, `>`, `>>`, `<<`), pipe symbols (`|`), and environment variable identifiers. Tokenization simplifies the parsing process by converting the input string into a format that's easier to analyze.
- Parsing: During parsing, tokens are analyzed to understand their syntactical relationship. This step involves constructing a representation of the input that reflects the user's intention. Depending on the complexity of the shell, this could mean building an abstract syntax tree (AST) or a simpler structure.
- AST Construction:
  - Commands: Nodes in the AST represent commands along with their arguments. These nodes are fundamental to understanding what actions the shell needs to perform.
  - Redirections: Redirection nodes are created to represent input and output redirections. These nodes are attached to command nodes to modify how the commands read their input or write their output.
  - Pipelines: When a pipe is encountered, a pipeline node links command nodes together, indicating that the output of one command serves as the input to another.
  - Environment Variable Expansion: This can be handled as part of command parsing or immediately before command execution. It involves replacing environment variable identifiers with their corresponding values.
- Execution AST: After the AST is built, it is traversed to execute the commands it represents. This involves:
  - Executing built-in commands directly within the shell.
  - Launching external commands by creating new processes.
  - Setting up redirections as specified by the redirection nodes.
  - Managing pipelines by connecting the stdout of one command to the stdin of the next.
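The last step, connecting one process's stdout to another's stdin, can be sketched with `pipe`, `fork`, and `dup2`. This is only an illustration under simplifying assumptions: `capture_child_output` is a hypothetical helper, not a function from this project, and the child writes a fixed message where a real shell would call `execve`.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: forks a child whose stdout is redirected into a pipe,
 * then reads what the child wrote into buf (NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t capture_child_output(const char *msg, char *buf, size_t size)
{
    int     fds[2];
    pid_t   pid;
    ssize_t n;
    ssize_t total;

    if (size == 0 || pipe(fds) == -1)
        return (-1);
    pid = fork();
    if (pid == -1)
    {
        close(fds[0]);
        close(fds[1]);
        return (-1);
    }
    if (pid == 0)
    {
        /* Child: make stdout refer to the write end of the pipe. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(1);
        close(fds[1]);
        if (write(STDOUT_FILENO, msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }
    /* Parent: close the unused write end so read() can reach end-of-file. */
    close(fds[1]);
    total = 0;
    while ((size_t)total < size - 1)
    {
        n = read(fds[0], buf + total, size - 1 - (size_t)total);
        if (n <= 0)
            break;
        total += n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (total);
}
```

In a pipeline, the reading side would be a second child that calls `dup2(fds[0], STDIN_FILENO)` before executing its command. Closing every unused pipe end matters: a reader only sees end-of-file once all write ends are closed.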
The syntax error checker is responsible for identifying and reporting syntax errors in the user input.
- Unclosed Quotes: Verify that all quotes are properly closed.
- Misplaced Operators: Detect if pipes (`|`) are used incorrectly, such as being at the start or end of the input, or if multiple pipes are used consecutively.
- Logical Operators: Detect logical operators such as `&&` and `||` and report them as not supported.
- Invalid Redirections: Detect invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.
syntax_checker.c:
- `syntax_error_checker` Function: Iterates through the input string, checking for syntax errors and reporting them if found.
- `has_unclosed_quotes` Function: Checks for unclosed quotes in the input string.
- `has_invalid_redirections` Function: Detects invalid redirections, such as multiple consecutive redirections or redirections at the start or end of the input.
- `has_misplaced_operators` Function: Detects misplaced pipes and redirections.
- `has_logical_operators` Function: Detects logical operators such as `&&` and `||` and reports them as not supported.
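As an illustration of the first check, here is a minimal sketch of an unclosed-quote detector. The function name comes from the list above, but the signature and body are assumptions, not the project's actual code. It tracks which quote character is currently open, so a `"` inside single quotes (or a `'` inside double quotes) is treated as ordinary text.

```c
#include <stdbool.h>

/*
 * Sketch only: returns true if the input ends while a single or double
 * quote is still open. Backslash escapes are not handled here.
 */
bool has_unclosed_quotes(const char *input)
{
    char quote;

    quote = 0;
    while (*input)
    {
        /* Outside quotes: a quote character opens a quoted section. */
        if (quote == 0 && (*input == '\'' || *input == '"'))
            quote = *input;
        /* Inside quotes: only the same quote character closes it. */
        else if (*input == quote)
            quote = 0;
        input++;
    }
    return (quote != 0);
}
```

For example, `echo 'a"b'` is accepted because the double quote sits inside single quotes, while `echo "hi` is reported as unclosed.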
The goal of the tokenization process is to break down the input string into a series of tokens that the parser can easily understand. These tokens represent commands, arguments, pipes, redirections, and other elements of the shell syntax.
- Quotations: Distinguishing between single (`'`) and double (`"`) quotes.
- Redirections: Recognizing input (`<`), output (`>`), append (`>>`), and here-document (`<<`) redirections.
- Pipes (`|`): Splitting commands to be executed in a pipeline.
- Environment variables: Expanding variables starting with `$`.
- Command separation: Identifying commands and their arguments.
// Token type enumeration
typedef enum e_token_type
{
    TOKEN_WORD,          // For commands and arguments
    TOKEN_PIPE,          // For '|'
    TOKEN_REDIR_IN,      // For '<'
    TOKEN_REDIR_OUT,     // For '>'
    TOKEN_REDIR_APPEND,  // For '>>'
    TOKEN_REDIR_HEREDOC, // For '<<'
    TOKEN_ENV_VAR,       // For environment variables
} t_token_type;

// Token structure
typedef struct s_token
{
    t_token_type    type;
    char            *value;
    struct s_token  *next;
} t_token;
- Whitespace Handling: Skip whitespace outside quotes to separate commands and arguments.
- Quoting: Correctly handle single (`'`) and double (`"`) quotes, preserving the text exactly as is within single quotes and allowing for variable expansion and escaped characters within double quotes.
- Redirections and Pipes: Detect `>`, `>>`, `<`, `<<`, and `|`, treating them as separate tokens while managing any adjacent whitespace.
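The rules above can be sketched as two small scanning helpers. Both `match_operator` and `word_length` are hypothetical names chosen for this illustration, and the enum repeats the token types shown earlier so the block stands alone. Two-character operators are tested before one-character ones so that `>>` is not read as two `>` tokens.

```c
#include <stddef.h>

typedef enum e_token_type
{
    TOKEN_WORD,
    TOKEN_PIPE,
    TOKEN_REDIR_IN,
    TOKEN_REDIR_OUT,
    TOKEN_REDIR_APPEND,
    TOKEN_REDIR_HEREDOC,
    TOKEN_ENV_VAR,
} t_token_type;

/*
 * Hypothetical helper: if s starts with an operator, stores its type and
 * returns its length (1 or 2); otherwise returns 0 and leaves *type as is.
 */
size_t match_operator(const char *s, t_token_type *type)
{
    if (s[0] == '>' && s[1] == '>')
        return (*type = TOKEN_REDIR_APPEND, 2);
    if (s[0] == '<' && s[1] == '<')
        return (*type = TOKEN_REDIR_HEREDOC, 2);
    if (s[0] == '>')
        return (*type = TOKEN_REDIR_OUT, 1);
    if (s[0] == '<')
        return (*type = TOKEN_REDIR_IN, 1);
    if (s[0] == '|')
        return (*type = TOKEN_PIPE, 1);
    return (0);
}

/*
 * Hypothetical helper: length of the word starting at s. Whitespace and
 * operators end the word only when they appear outside quotes.
 */
size_t word_length(const char *s)
{
    size_t          i;
    char            quote;
    t_token_type    ignored;

    i = 0;
    quote = 0;
    while (s[i])
    {
        if (quote == 0 && (s[i] == ' ' || s[i] == '\t'
                || match_operator(s + i, &ignored) > 0))
            break ;
        if (quote == 0 && (s[i] == '\'' || s[i] == '"'))
            quote = s[i];
        else if (s[i] == quote)
            quote = 0;
        i++;
    }
    return (i);
}
```

A tokenizer loop built on these would skip whitespace, try `match_operator`, and fall back to `word_length` to cut out a `TOKEN_WORD`. Quotes are kept in the word here; removing them is left to a later step.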
The tokenization function will iterate through the input string, identifying and categorizing segments according to the shell syntax.
tokenization.c:
- `tokenize_input` Function: Iterates through the input, creating tokens for words separated by spaces.
- `handle_special_chars` Function: Handles the special characters in the input string.
- `handle_word` Function: Handles the words in the input string.
- `print_tokens` Function: Prints the tokens to verify the tokenization process.
Example:
> ls -l | wc -l > output.txt | ls > output2.txt
Token: ls | Type: WORD
--------------------------------------------------
Token: -l | Type: WORD
--------------------------------------------------
Token: | | Type: PIPE
--------------------------------------------------
Token: wc | Type: WORD
--------------------------------------------------
Token: -l | Type: WORD
--------------------------------------------------
Token: > | Type: REDIRECT_OUT
--------------------------------------------------
Token: output.txt | Type: WORD
--------------------------------------------------
Token: | | Type: PIPE
--------------------------------------------------
Token: ls | Type: WORD
--------------------------------------------------
Token: > | Type: REDIRECT_OUT
--------------------------------------------------
Token: output2.txt | Type: WORD
--------------------------------------------------
The parsing process involves analyzing the tokens to understand their syntactical relationship. This step constructs a representation of the input that reflects the user's intention.
- Command Parsing: Parsing commands and their arguments, creating command nodes in the AST.
- Pipeline Parsing: Parsing pipeline tokens, creating pipeline nodes in the AST.
- Redirection Parsing: Parsing redirection tokens, creating redirection nodes in the AST.
- File Node Creation: Creating a file node for redirections in the AST.
The AST node structure will represent the input string in a way that reflects the user's intention. The AST will be composed of nodes that represent commands, arguments, redirections, and pipelines.
typedef struct s_ast_node
{
    t_node_type         type;
    char                *args;
    struct s_ast_node   *left;
    struct s_ast_node   *right;
} t_ast_node;
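To show how such nodes might be created and released, here is a small sketch. The `t_node_type` values and the helpers `new_ast_node` and `free_ast` are assumptions made for this example; the project's own enum and function names may differ. The struct is repeated so the block compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; the real enum is not shown in this README. */
typedef enum e_node_type
{
    NODE_COMMAND,
    NODE_PIPE,
    NODE_REDIR,
    NODE_FILE
} t_node_type;

typedef struct s_ast_node
{
    t_node_type         type;
    char                *args;
    struct s_ast_node   *left;
    struct s_ast_node   *right;
} t_ast_node;

/* Allocates a leaf node holding a private copy of args (args may be NULL). */
t_ast_node *new_ast_node(t_node_type type, const char *args)
{
    t_ast_node *node;

    node = malloc(sizeof(*node));
    if (!node)
        return (NULL);
    node->type = type;
    node->args = NULL;
    node->left = NULL;
    node->right = NULL;
    if (args)
    {
        node->args = malloc(strlen(args) + 1);
        if (!node->args)
        {
            free(node);
            return (NULL);
        }
        strcpy(node->args, args);
    }
    return (node);
}

/* Frees a whole tree, children first. */
void free_ast(t_ast_node *node)
{
    if (!node)
        return ;
    free_ast(node->left);
    free_ast(node->right);
    free(node->args);
    free(node);
}
```

With these helpers, `ls -l | wc -l` could be represented as a `NODE_PIPE` node whose `left` child is a command node for `ls -l` and whose `right` child is a command node for `wc -l`.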
The parsing function will iterate through the tokens, building an abstract syntax tree (AST) that represents the input string.
parse.c:
- `parse_tokens` Function: Iterates through the tokens, building an abstract syntax tree (AST) that represents the input string.
- `parse_command` Function: Parses a command and its arguments, creating a command node in the AST.
- `parse_pipeline` Function: Parses pipeline tokens, creating pipeline nodes in the AST.
- `parse_redirection` Function: Parses redirection tokens, creating redirection nodes in the AST.
- `create_file_node` Function: Creates a file node for redirections in the AST.

generate_ast_diagram/generate_ast_diagram.c:
- `generate_ast_diagram` Function: Generates a visual representation of the AST, showing the structure of the input string.
Example:
ls -l | wc -l > output.txt | ls > output2.txt
- Built-in Commands: Implementing internal versions of several built-in commands (`echo`, `cd`, `pwd`, `export`, `unset`, `env`, `exit`) that behave consistently with their bash counterparts.
- External Commands: Implementing logic to search for and execute the right executable based on the PATH environment variable or a specified path.
- Process Creation: Using system calls like `fork`, `execve`, `wait`, and `pipe` to manage process creation and execution.
- Redirection and Pipes: Implementing input and output redirection (`<`, `>`, `>>`, `<<`) and pipes (`|`) to allow for command chaining and data redirection.
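The PATH lookup for external commands can be sketched as follows. `find_in_path` is a hypothetical helper written for this example, not the project's function. It assumes the caller has already handled names that contain a `/`, which are used as given and not searched for.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch only: tries each directory of a colon-separated path list and
 * returns a malloc'd "dir/cmd" for the first candidate that access()
 * reports as executable, or NULL if none is found.
 * Empty entries (which POSIX treats as the current directory) are skipped.
 */
char *find_in_path(const char *cmd, const char *path)
{
    const char  *end;
    size_t      dir_len;
    char        *candidate;

    if (!cmd || !*cmd || !path)
        return (NULL);
    while (*path)
    {
        end = strchr(path, ':');
        if (end)
            dir_len = (size_t)(end - path);
        else
            dir_len = strlen(path);
        /* Room for the directory, a '/', the command name, and the NUL. */
        candidate = malloc(dir_len + strlen(cmd) + 2);
        if (!candidate)
            return (NULL);
        memcpy(candidate, path, dir_len);
        candidate[dir_len] = '/';
        strcpy(candidate + dir_len + 1, cmd);
        if (dir_len > 0 && access(candidate, X_OK) == 0)
            return (candidate);
        free(candidate);
        path += dir_len;
        if (*path == ':')
            path++;
    }
    return (NULL);
}
```

A caller would typically pass `getenv("PATH")` as the second argument and hand the result to `execve` in the child process. Note that `access` with `X_OK` also succeeds for directories, so a stricter version would check the file type with `stat` as well.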
- echo: Outputs the arguments passed to it.
- cd: Changes the current working directory.
- pwd: Prints the current working directory.
- export: Sets environment variables.
- unset: Unsets environment variables.
- env: Prints the environment variables.
- exit: Exits the shell.
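As one example of matching bash behavior, `echo` has to recognize its `-n` option, including repeated forms such as `-nnn` or `-n -n`. The sketch below shows one way to do that; `skip_echo_flags` and `is_n_flag` are hypothetical names, and other bash options such as `-e` are deliberately treated as plain text.

```c
#include <stdbool.h>
#include <stddef.h>

/* True for "-n", "-nn", "-nnn", ...; false for anything else. */
static bool is_n_flag(const char *arg)
{
    size_t i;

    if (!arg || arg[0] != '-' || arg[1] != 'n')
        return (false);
    i = 2;
    while (arg[i] == 'n')
        i++;
    return (arg[i] == '\0');
}

/*
 * Sketch only: args[0] is expected to be "echo" and the array must be
 * NULL-terminated. Returns the index of the first argument to print and
 * sets *newline to false if at least one leading -n flag was seen.
 */
size_t skip_echo_flags(char **args, bool *newline)
{
    size_t i;

    i = 1;
    *newline = true;
    while (args[i] && is_n_flag(args[i]))
    {
        *newline = false;
        i++;
    }
    return (i);
}
```

Only leading flags count: in `echo hi -n`, the `-n` is printed like any other argument.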
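- SIGINT: Handling `ctrl-C`, which interrupts the current command and displays a fresh prompt.
- SIGQUIT: Handling `ctrl-\`, which bash ignores at the interactive prompt.
- EOF: Handling `ctrl-D`, which is an end-of-file condition on the input rather than a signal, to exit the shell.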
- Environment Variable Expansion: Managing environment variables and supporting their expansion within commands, including the special case of `$?` to represent the exit status of the most recently executed command.
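A simplified sketch of this expansion is shown below. `expand_variables` and its helpers are hypothetical names, and the sketch deliberately ignores quoting: a real shell must not expand inside single quotes. Values are looked up with `getenv`, and an unset variable expands to nothing.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Appends len bytes of src to the growing string *dst; 0 on success. */
static int append(char **dst, size_t *dst_len, const char *src, size_t len)
{
    char *grown;

    grown = realloc(*dst, *dst_len + len + 1);
    if (!grown)
        return (-1);
    memcpy(grown + *dst_len, src, len);
    *dst_len += len;
    grown[*dst_len] = '\0';
    *dst = grown;
    return (0);
}

/* Length of a valid variable name at s (letters, digits, '_'), else 0. */
static size_t name_length(const char *s)
{
    size_t n;

    n = 0;
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return (0);
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return (n);
}

/* Appends the value of the n-character variable name at s, if it is set. */
static int append_value(char **out, size_t *len, const char *s, size_t n)
{
    char        *name;
    const char  *value;

    name = malloc(n + 1);
    if (!name)
        return (-1);
    memcpy(name, s, n);
    name[n] = '\0';
    value = getenv(name);
    free(name);
    if (!value)
        return (0);
    return (append(out, len, value, strlen(value)));
}

/* Sketch only: returns a malloc'd copy of in with $? and $NAME expanded. */
char *expand_variables(const char *in, int last_status)
{
    char    *out = NULL;
    size_t  len = 0;
    size_t  n;
    char    status[16];
    int     rc;

    if (append(&out, &len, "", 0) == -1)
        return (NULL);
    while (*in)
    {
        if (in[0] == '$' && in[1] == '?')
        {
            snprintf(status, sizeof(status), "%d", last_status);
            rc = append(&out, &len, status, strlen(status));
            in += 2;
        }
        else if (in[0] == '$' && (n = name_length(in + 1)) > 0)
        {
            rc = append_value(&out, &len, in + 1, n);
            in += n + 1;
        }
        else
            rc = append(&out, &len, in++, 1); /* ordinary character */
        if (rc == -1)
            return (free(out), NULL);
    }
    return (out);
}
```

A `$` that is not followed by `?` or a valid name is copied through unchanged, which matches how bash treats a lone `$`.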
- reda ghouzraf : The most skilled human tester in the world
- Minishell Tests
- https://github.com/Pyr-0/Minishell-42/wiki/edge-(or-non-edge)-cases-to-test-out
- https://github.com/nicolasgasco/42_minishell/blob/main/test_cases.txt
├── includes
│   └── minishell.h
├── lib
│   └── libft
├── Makefile
└── src
    ├── 01_input_validation
    ├── 02_tokenization
    ├── 03_parsing
    ├── 04_execution
    ├── 05_builtin
    ├── 06_signals
    ├── 07_env_var_expansion
    ├── main.c
    └── utils
- https://www.gnu.org/software/bash/manual/html_node/Definitions.html#index-name
- https://www.youtube.com/watch?v=S2W3SXGPVyU&ab_channel=theroadmap
- https://tomassetti.me/guide-parsing-algorithms-terminology/
- https://github.com/os-moussao/Recursive-Descent-Parser/tree/master
- https://minishell.simple.ink/
- https://github.com/iciamyplant/Minishell
Thanks to reda ghouzraf, MTRX, Nasreddine hanafi, Khalid zerri for their help and support during the project.