To ensure that you can fully benefit from our specialization program in Big Data Analytics, we recommend that you have basic knowledge of Python and experience with using the Linux command line. If you are already familiar with these, our program will offer in-depth knowledge and skills that can help you to bring your expertise to the next level. We strive to create a supportive and inspiring environment for our students.
The required knowledge of Python includes:
- Basic syntax and data types (`None`, `bool`, `int`, `float`, `str`, `list`, `tuple`, `dict`).
- Understanding the concept of variables and variable scope.
- Control structures (if-else, for and while loops).
- Ability to write simple functions and use them in programs.
- Basic Python libraries like `math` and `random`, as well as understanding and utilizing classes from the `datetime` module.
- Ability to handle basic string and list methods for manipulation. For example:
  - For strings:
    - `upper()`: Converts all the characters in a string to uppercase.
    - `lower()`: Converts all the characters in a string to lowercase.
    - `strip()`: Removes leading and trailing whitespace from a string.
    - `split()`: Splits a string into a list where each word is a list item.
    - `replace()`: Replaces a specified phrase with another specified phrase.
  - For lists:
    - `append()`: Adds an element at the end of the list.
    - `insert()`: Adds an element at the specified position.
    - `remove()`: Removes the first item with the specified value.
    - `pop()`: Removes the element at the specified position.
    - `sort()`: Sorts the list.
- Reading from and writing to files.
- Handling errors and exceptions using try/except blocks.
- An understanding of Python's logging system.
- Using regular expressions for pattern matching in strings.
- Understanding object-oriented programming: classes, objects, and methods.
- Understanding and using list comprehensions and lambda functions.
- Understanding Python's memory management and optimization techniques.
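As a rough self-check, the short script below touches several of the items in the list above: string and list methods, file reading and writing, try/except, logging, a list comprehension, a lambda function, and a regular expression. It is only an illustrative sketch, not course material. The function names, the logger name, and the sample data are made up for this example.

```python
import logging
import re
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("prereq_demo")  # hypothetical logger name


def clean_words(line):
    """Trim the line, lowercase it, drop commas, and split it into words."""
    return line.strip().lower().replace(",", " ").split()


def read_numbers(path):
    """Read one integer per line; log and skip lines that are not integers."""
    numbers = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            try:
                numbers.append(int(raw.strip()))
            except ValueError:
                logger.warning("Skipping non-integer line: %r", raw.strip())
    return numbers


# String methods: strip(), lower(), replace(), split()
words = clean_words("  Big Data, Analytics  ")

# List methods: append(), insert(), remove(), pop(), sort()
queue = ["b", "c"]
queue.append("d")      # ['b', 'c', 'd']
queue.insert(0, "a")   # ['a', 'b', 'c', 'd']
queue.remove("c")      # ['a', 'b', 'd']
last = queue.pop()     # removes and returns 'd'
queue.sort(reverse=True)

# File writing and reading, with one deliberately invalid line
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "numbers.txt"  # made-up sample file
    sample.write_text("1\n2\noops\n4\n", encoding="utf-8")
    numbers = read_numbers(sample)

# List comprehension and lambda function
even_squares = [n * n for n in numbers if n % 2 == 0]
descending = sorted(numbers, key=lambda n: -n)

# Regular expression: find ISO-style dates in a string
dates = re.findall(r"\d{4}-\d{2}-\d{2}", "runs on 2024-01-15 and 2024-02-01")

print(words, queue, last, numbers, even_squares, descending, dates)
```

If every line of this script reads naturally to you, you most likely meet the core Python requirements listed above.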
Familiarity with the following Python libraries would be beneficial:
- NumPy: For numerical computations.
- Pandas: For data manipulation and analysis.
- Matplotlib: For data visualization.
Your code should be clear, concise, and efficient. Avoid repeating code by following the DRY (Don't Repeat Yourself) principle: do not duplicate information, and use appropriate structures such as functions and classes to encapsulate repeated logic. This reduces redundancy and improves the readability, maintainability, and scalability of the code. Your code should be self-explanatory, with meaningful variable, function, and class names. Use comments to explain the 'why' rather than the 'what' or 'how'. Together, these habits promote consistency and make the code easier to understand and debug. Remember that code is read more often than it is written, so strive for clarity and simplicity.
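The DRY principle described above can be illustrated with a small before-and-after sketch. The record fields and the `normalize_name` helper are hypothetical and exist only for this example.

```python
# Repetitive version: the same cleaning rule is written out for every field,
# so a change to the rule has to be made in several places.
raw_city = "  new york "
raw_country = " united states"
city = raw_city.strip().title()
country = raw_country.strip().title()


# DRY version: the rule lives in one well-named function.
def normalize_name(raw_name):
    """Trim surrounding whitespace and title-case a free-text name."""
    return raw_name.strip().title()


raw_record = {
    "city": "  new york ",
    "country": " united states",
    "region": "north america ",
}

# Names are normalized so that later grouping does not treat
# "new york" and "New York" as different places (the 'why', not the 'what').
clean_record = {field: normalize_name(value) for field, value in raw_record.items()}

print(clean_record)
```

Adding a new field to the record now needs no extra cleaning code, and the comment explains the reason for the step rather than restating what the code does.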
Being comfortable with basic Linux terminal commands is essential to interact with Linux-based systems efficiently. These skills are indispensable for software and data engineers, whom our program is targeting. Knowledge of these terminal commands ensures a deeper understanding of computer systems and equips you with the tools needed for professional development in technology fields.
- `man`: Display the user manual of a command.
- `ssh`: Secure shell remote login.
- `ls`: List directory contents.
- `cd`: Change the current directory.
- `pwd`: Print the name of the current directory.
- `cp`: Copy files and directories.
- `mv`: Move or rename files and directories.
- `rm`: Remove files and directories.
- `cat`: Concatenate and display file content.
- `echo`: Display a line of text.
- `head`: Output the first part of files.
- `tail`: Output the last part of files.
- `grep`: Search for a specific pattern within files.
- `find`: Search for files in a directory hierarchy.
- `chmod`: Change the permissions of files or directories.
- `chown`: Change the owner and group of files or directories.
- `df`: Report file system disk space usage.
- `du`: Estimate file and directory space usage.
- `tar`: Archive files.
- `gzip`: Compress or expand files.
- `ps`: Report a snapshot of the current processes.
- `top`: Display Linux tasks.
- `kill`: Send a signal to a process.
- `curl`: Transfer data to or from a server.
- `scp`: Securely copy files between a local host and a remote host or between two remote hosts.
- `rsync`: Synchronize files, transferring only the changes made rather than transferring all the files again.
You should be able to use these Linux commands comfortably and understand their options and parameters.
You should also understand the concept of Linux file permissions and know how to use pipes (`|`) and redirection (`>`, `>>`, `<`).
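To connect the file permission concept to the Python prerequisites, the sketch below uses the standard library `stat` module to translate octal modes, as passed to `chmod`, into the permission strings printed by `ls -l`. It is only an illustration. The `describe_mode` and `group_can_write` helpers are hypothetical names introduced here, and the example assumes regular files.

```python
import stat


def describe_mode(octal_mode):
    """Return the `ls -l` style permission string for a regular file."""
    # S_IFREG marks the file type as "regular file", which gives the leading "-".
    return stat.filemode(stat.S_IFREG | octal_mode)


def group_can_write(octal_mode):
    """Check whether the group write bit is set in the given mode."""
    return bool(octal_mode & stat.S_IWGRP)


for mode in (0o644, 0o755, 0o600):
    # Each octal digit covers one class: owner, group, others (r=4, w=2, x=1).
    print(oct(mode), describe_mode(mode), group_can_write(mode))
```

Reading the output side by side with the octal values is a quick way to check that you can decode a mode such as `755` without looking it up.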
- Offers a free interactive Python tutorial that is perfect for both beginners and experienced programmers. It covers everything from the basics to more advanced topics, including data science tutorials. The website is designed to make learning Python accessible and straightforward, with the option to join a community group for discussions and updates.
https://www.codecademy.com/learn/learn-python
- This course is a great introduction to both fundamental programming concepts and the Python programming language. By the end, you’ll be comfortable programming in Python and taking your skills off the Codecademy platform and onto your own computer.
https://labex.io/courses/linux-basic-commands-practice-online
- In this course, you will practice the most commonly used Linux commands. You will learn how to use the commands to manage files and directories, search for files, and process text. You will also learn how to use the commands to check the disk usage, and measure the time of command execution.
- Linux Survival is a free tutorial designed to make it as easy as possible to learn Linux. Even though Linux has hundreds of commands, there are only about a dozen you need to know to perform most basic tasks.