The purpose of these study notes are to perpare software engineers for technical interviews, by providing bullet points on various subjects in software. These notes are useful for pre-interview study, and they're not written nor intended as a "cheat sheet", so definitely do not use this during an interview! It will not work and you will fail either way if you are considering it!
- Common Algorithms and Complexity Problems
- System Design and Frameworks
- Content Delivery Network (CDN) Design
- Load Balancer Design
- ACID-ity
- Databases
- CAP Theorem (Consistency, Availability, Partition Tolerance)
- Common Tools and Frameworks
- Common Steps for Designing a Framework
- URL Shortener Example
- Twitter design example
- Webcrawler Service
- Chat Server
- Design Scope
- Functional Requirements
- Non-functional Requirements
- OAuth2
- GeoIP Geolocation Backend Service
- C++ Programming (the language of the old gods and universe)
- C++ Versions
- Compilers
- Unit Testing
- Object Oriented Programming (OOP)
- Data Processing in C++
- Standard Template Library (STL)
- String Literals
- RAII (Resource Allocation Is Initialization)
- Other C++ notes
- Locality
- Example Questions
- Windows (win32) Interfaces and Internals
- Windows Driver Programming / NT Kernel
- IRP Callbacks and Initialization
- IOCTL (I/O Control)
- IRQLs
- Synchronization Primitives
- Executive Resources
- Dispatcher Objects
- Interlocked Operations
- Read-Write Locks
- Paged Memory
- Non-paged Memory
- Memory Routines / API
- DMA (Direct Memory Access)
- Kernel Objects in WDM TODO
- Threads TODO
- User-mode to kernel mode comms
- System Calls
- Functions TODO
- APC/DPC
- Usermode Communications
- Python
- Language Symantics
- Data Structures and Complexity
- Decorators
- OOP / Python Classes
- List Comprehension
- Dictionary comprehension
- Generators
with
statement- Lambda functions (anonymous functions)
- Slicing (like in golang)
- Unpacking
global
andnonlocal
keywordsyield
keyword- Ellipsis (
...
) - Walrus Operator (
:=
) f
strings, Format strings- Type hints (Type Annotations)
- Data Classes (library)
- Other language notes
- golang Programming Language
- Windows API
- Network Engineering and Security
- Malware Analysis and Research
- File Formats
Table of contents generated with markdown-toc
- Identify bottlenecks profiling, benchmarking
- Determine algorithm and data structures
- Identify algorithmic efficiency
- Identify memory allocation/deallocation issues
- Identify unnecessary instantiation
- Verify that structures are aligned
- Use multi-threading and synchronization, if possible
- Use a thread pool (creating/destroying threads is expensive)
- Cache optimization (spatial and temporal locality)
- Compiler optimization flags (
-O3
) - Refactoring
- Loop unrolling
- Minimize locking
- Buffering, Batching
- Scalability
mmap
on Linux lets you map the file into memory without reading into memory entirely- Use a database like NoSQL (Redis)
- Vectorization (SIMD)
- Sorted by key, using BST
- Automatically sorted using the BST each time an element is inserted
- Each node has two children, left and right. "If value is greater than node, store in left, otherwise right"
- Searching is O(log (n)) because for each node that is entered, half of the search tree is not searched
- No duplications
- Sorted dictionary
- Database indexing
- Symbol tables (compilers)
Insertion
Same as search since it requires time to find a node to insert into O(log(n))
Traversal
Can be done through recursive functions i.e. root, left subtree, right subtree, etc
- Key -> hash algo -> integer
- Performance is generally faster (O(1)) for search and insert
- Cache implementations (quick access of resources based on URL)
- Word Frequency count
- Indexes for unordered data
A singleton is a design pattern where only one object of its kind exists and can be accessed. For example a class with a static variable. Each instance of the class ensures that the resource is locked before access (mutex guard)
Lookup, full doc: System Design Interview An Insider’s Guide by Alex Xu (z-lib.org)
Typical System Design
- Prevent DDoS
- Separate malicious from legitimate traffic requests
- Pass request to load balancer
- Deliver content to users
- Cached in RAM
A
or AAAA
records or Authoratative Records are used to distribute requests to a specific server, based on subdomain
Round Robin DNS requests load balance by having a pool of IP addresses, and allowing each IP to handle requests such that they are equally distributed under load
Mechanism in which multiple servers have one IP address
- Reduces load balancer load DSR (Direct Server Return) allows for the server to directly return a response, without needing to route back through a load balancer
Requirements
Requirements | L4 | L7 | DNS |
---|---|---|---|
Data Requirements | Request Packet Header | Contents of the request | DNS lookup before request |
Session Security | SSL passthrough | TLS termination | Not impacted |
Implementation Location | Hardware or Software | Software Only | Fast |
Access to Content | Content Agnostic | Inspects and manipulates content (cookies, paths, sessions, etc) | Content agnostic |
- Atomicity is guaranteed
- Consistency in the database is guaranteed, prevents multiple states
- Isolation: transactions are executed concurrently
- Durability: in case of a power failure, is the data retained?
Database Type | Solution |
---|---|
Key–value cache | Apache Ignite, Couchbase, Coherence, eXtreme Scale, Hazelcast, Infinispan, Memcached, Redis, Velocity |
Key–value store | Azure Cosmos DB, ArangoDB, Amazon DynamoDB, Aerospike, Couchbase, ScyllaDB |
Key–value store (eventually consistent) | Azure Cosmos DB, Oracle NoSQL Database, Riak, Voldemort |
Key–value store (ordered) | FoundationDB, InfinityDB, LMDB, MemcacheDB |
Tuple store | Apache River, GigaSpaces, Tarantool, TIBCO ActiveSpaces, OpenLink Virtuoso |
Triplestore | AllegroGraph, MarkLogic, Ontotext-OWLIM, Oracle NoSQL database, Profium Sense, Virtuoso Universal Server |
Object database | Objectivity/DB, Perst, ZODB, db4o, GemStone/S, InterSystems Caché, JADE, ObjectDatabase++, ObjectDB, ObjectStore, ODABA, Realm, OpenLink Virtuoso, Versant Object Database |
Document store | Azure Cosmos DB, ArangoDB, BaseX, Clusterpoint, Couchbase, CouchDB, DocumentDB, eXist-db, Google Cloud Firestore, IBM Domino, MarkLogic, MongoDB, RavenDB, Qizx, RethinkDB, Elasticsearch, OrientDB |
Wide-column store | Azure Cosmos DB, Amazon DynamoDB, Bigtable, Cassandra, Google Cloud Datastore, HBase, Hypertable, ScyllaDB |
Native multi-model database | ArangoDB, Azure Cosmos DB, OrientDB, MarkLogic, Apache Ignite,[23][24] Couchbase, FoundationDB, Oracle Database |
Graph database | Azure Cosmos DB, AllegroGraph, ArangoDB, InfiniteGraph, Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso |
Multivalue database | D3 Pick database, Extensible Storage Engine (ESE/NT), InfinityDB, InterSystems Caché, jBASE Pick database, mvBase Rocket Software, mvEnterprise Rocket Software, Northgate Information Solutions Reality (the original Pick/MV Database), OpenQM, Revelation Software's OpenInsight (Windows) and Advanced Revelation (DOS), UniData Rocket U2, UniVerse Rocket U2 |
Spliting Databases into "shards" in order to distribute load along different systems. MSSQL, Oracle, MySQL, etc
SQL: Typical table-based relational database. Large physical storage model.
NoSQL: Key-Value (dictionarys). Key is an attribute name, which is linked to a value. Redis, Voldemort, Dynamo are examples. Better to scale. Schemas can be added on the fly and are dynamic.
Wide Column Databases: Instead of tables, columnar databases have column families which are containers for rows. Useful for large datasets. Examples are Cassandra and HBase
Graph Dtabases: Graph based storage (i.e. nodes, entities). Examples: Neo4J and InfiniteGraph
Which to Use?
- ACID compliance, which requires SQL. If data is structured and unchanging.
- NoSQL is useful in large volumes of data. Think twitter tweets. Useful in cloud solutions, efficient for large scale databases which are stored in RAM. NoSQL is useful in rapid development and doesn't need to be prepped ahead of time.
- Amazon S3 (Object Cloud Storage): useful for storing large objects, like pastebin text or images. Easy to scale in cloud
- What is the magnitutde of connections?
- How many users?
- Is it a subscription model, and does it require premium users (whereby limiting bandwidth and storage capacity)
- Design the API (REST)
- Authentication API
- Query for specific data
- Store specific data
- Do not add APIs that have no functional requirements
Method | Path | Description |
---|---|---|
GET | /v1/businesses/:id | Return detailed info about a business |
POST | /v1/businesses | Add a business |
PUT | /v1/businesses/:id | Update details of a business (id) |
DELETE | /v1/businesses/:id | Delete Business |
- If two way communication is required: websocket
- websocket is stateful, and stateful architecture is more complex to scale in the backend
- Load Balancer is an API gateway (mux library)
- Services require persistence: databases, or data storage layer
- Database scaling
- High concurrency
- Failure scenarios
Rather open-ended, look into details of the design and correct architecture
Functional Requirements: Given a URL, shorten it. Use a checksum/hash of the input URI? Optionally pick a short link. Expiration. Redirection Non-Functional: Highly-available. Redirection should be immediate. Shortened links should neot be predictable (solved with hash). Extended: Analytics, RESP API (Solved by designing functional requirement API). Requirement for authentication (OAuth2)
- Can anyone create URLs through API? Does it require a key?
- Some frequently used URLs should be cached
- Traffic will be 100:1 read:write ratio
- Load will be 500m new URLs per month
- 50B redirections per day
- Traffic estimate: Queries Per Second: If we have 500million new URLs per month, then
500m / (30days * 24hours * 3600seconds) = ~200URL/s
. Redirections will be100(read redirections) * 200 URLs/s = 20K/s
(since we have 100x more reads than writes, so 20K/s URLs will be accessed) - Storage estimate: Let's say the URL is stored for 1 month. At 500/m new URLs. Each one is 512bytes in the database. So 512b * 500m = 256,000,000,000 = 256gb of storage per month. For 5 years? 5 * 12 = 60. 60 * 256gb = 15.36TB for 5 years.
- Bandwidth: Read? 200URLs/s. Write? 2URLs/s. 128kb for one API redirect. So 128kb * 200URL/s = 25.6MB/s
- Accessing URLs doesn't require an API, it just reaches the shortener service
- Create URL: Expiry date, api key, user name, original url, custom URL name
- Delete URL: api key, shortened URI name
URL Table: URL Hash (PK), Original URL, Creation Date, UserID User Table: User ID (PK), Name, Email, CreationDate
- Since there are no relationships between tables, we can use NoSQL as its faster
- Many objects, lots of queries
- Client sends API request to Shorten
- Shortener sends requrest to Encoder (checksum)
- Checksum is stored in the database
- If failed the duplication, add append another unique sequence, like username
- Return shortened URL to user
- Can add a LB between clients and applications, application server and database, applications and cache
- Roundrobin load distribution
- Tweets (sending, receiving, and viewing tweets)
- Timeline
- User-timeline (list your own tweets)
- Home timeline (tweets from people you follow)
- Following
- Relational database design
- User-table
- Tweet table
Tweet Table
tweet_id | content | user_id |
---|---|---|
<sum> | <string 256 characters long> | <32bit int> |
- Issue with the tweet table is the size, it will increase substantially as it needs to store historical data
User Table
user_id | User Name | Creation Date |
---|---|---|
<int 32 bit> | String | datetime |
- Optimize the home page refresh (i.e. fill the user's home page with relevant tweets) Tweet Send -> REST/HTTP/PUT -> API Proxy (Load Balancer) -> Redis Cluster (In-memory database, can be n servers with the same db), 3 machines
- The tweet will trigger a refresh (in redis) of all home pages associated with that user
REDIS Cluster Details
- Each user has a list/vector of tweets, that represents his home timeline
- List contains tweet ID, sender ID
- Let's say that Alice sends a tweet, and Alice has 100 followers, then 300 user lists will be refreshed (this is only for active users)
- Issue with this is that if a user has millions of followers, then a million lists will need to be refreshed
- Another issue is that a user home list can be refreshed, while others are not (due to load). So some people see re-tweets while others do not see the original tweet. Still an on-going issue (apparently)
- Send tweet -> Load balancer -> Followers (?) -- this is unclear
- Bob Accessing Timeline Browser (GET) -> Load Balancer -> Redis Cluster (only one in cluster needs to answer; fastest) -> Populate bob's timeline
- There are lots of redis machines in the cluster, only one needs to answer (that contains Bob's list), this can be done through a hash map
- In between the Load Balancer and Redis, there is a hash lookup, that returns the redis IP containing Bob's timeline
- Searching for tweets? The load balancer send a request to another machine that will hash the tweet
- Use HBase (wide column database), database sharding
- Load balancer splits up users into their respective chat servers
- Chat servers can either do push (client pushes messages to server, requires open connection), or pull (clients periodically checks for new messages)
- Each chat server has a connection pool, with a particular client
- Clarify requirements
- Who will be the users? How many users do we anticipate?
- What features do we need?
- Focus on scale
- Focus on Performance
- User interactions
- Data management
- Business Processes
- Authentication/Authorization
- Reporting
- Error Handling
- Performance Requirements
- Integration Requirements (API)
- Security functions (TLS)
- Performance
- Reliability
- Availability
- Usability
- Security
- Maintainability
- Portability
- Scalability
- Compliance
- Disaster Recovery
- Obtain Authentication
- Exchange an access token
- Use access token to make requests to API (REST)
- Refresh token
Functional Requirements
- Users can search for businesses: Based on their location, get a map and list of locations of businesses near user
- Users can add or remove businesses
- Users can view information about a business
Non-functional Requirements
- 100million users (DAU, Daily Active Users)
- 200million businesses
- Latency should be low, users should be able to find nearby businesses quickly
- Highly available, traffic spikes, etc
Assumptions and Calculations
- Assume a user makes up to 5 queries per day
- 100m DAU (Users)
- 200m Businesses
So: (100m DAU) x (5 searches per user) = ~5000 queries
How much storage do we need for 200m businesses? We don't know yet until we design a schema
Uses REST API
- Search API Request:
GET /v1/search/nearby
- Note: this API contains the Geo location of the user
Field | Description | Type |
---|---|---|
latitude | Business location latitude | double |
longitude | Business location longitude | double |
radius | Optional. Default is 5KM | int |
- Search API Response:
{
"total": 10,
"business_list": [
{
"name": "business_name"
"location": ...
}
]
}
- Pagination, or having a list of responses based on location and relevance, should be included in real life
Request:
API Path | Description |
---|---|
GET /v1/businesses/:id | Return details on a business |
POST /v1/businesses | Add a business |
PUT /v1/businesses/:id | Update a business |
DELETE /v1/businesses/:id | Delete business |
Business Table
Businesses | |
---|---|
business_id | PK |
address | |
city | |
country | |
latitude | |
longitude |
geospatial_index
business_id | geospatial_index |
---|---|
business_name | location |
businesses table: 200m * 1kb = 200gb
- Just calculate based on database and size
- Load balancer distributes API, either you search
/search/nearby
or you check business data/businesses/{:id}
- Location Based Service is read heavy, no write. 5000 QPS (queries per second)
- Business Service Handles the requests for the business objects, it writes or modifies the business database
Database Cluster and Topology
- Search business dataset is small, but the total business dataset is high
TODO Major rewrite of this section
Multi-paradigm language (OOP) Procedural (C, callback) Imperative (uses statements to change the state of the program) Functional Generic, modular
TODO
auto [x, y] = std::make_pair(10, 20);
std::make_pair()
Create a std::pair
object
Allows the compiler to conditionally compile a type based on template parameters
template <typename T>
void foo(T value) {
if constexpr (std::is_integral_v<T>) {
// Compiled if code is an integral type
}
}
Variadic template code
template <typename... Args>
auto sum(Args... args) {
return (args + ...);
}
struct Point {
int x;
int y;
};
Point p{10, 20};
constexpr auto add = [](int x, int y) { return x + y; };
more on string literals below
u8'€'; // utf8_char type
std::variant
std::for_each
iterator
<filesystem>
header
std::make_unique<T>
initializes a unique_ptr
of type T
Personal study notes. test
LLVM VC++ gcc CLang Intel C++
TODO
gtest package from Google
Really over-simplified version of OOP
-
Encapsulation Classes Wrapping data and information in a single unit, I.E. encapsulating complexity away from the whole, by limiting it to a single object
-
Inheritance The concept of a class inheriting the properties, state, or variables of another class Derived Class: the class that inherits from another class (i.e. child) Base Class: the class being inherited from
-
Polymorphism Achieved through function overriding (separate from overloading, which allows one function to have multiple signatures and definitions) In greek: Taking many forms
class A { public: int func(void) { return 1; } }; class B : public A { public: int func(void) { // func is overridden by class B implementation return 2; } }
-
Abstraction Virtual functions and interface classes (i.e. pure virtual classes)
Useful for abstracting away the complexity of a certain class, for example:
class Logger
is an interface class, it can log to either OutputDebugString(), stdout, or to a file. It has purely virtual methods:virtual int func(std::string s) = 0;
class ConsoleLogger : Public Logger
implementsfunc
, such that the definition offunc
outputs to stdout
- Primitive Datatypes include
int
,floats
,char
,double
,bool
. - Composite Datatypes include
struct
,class
,std::array
, STL containers, database records, etc - Files and Streams include text, binaries, network streams (TCP/IP), CSV, JSON, XML, serialization
- Containers and Data Structures include STL,
std::vector
,std::list
,std::array
. Data processing is achieved by algorithms (see complexity): searching, sorting, filtering, access, etc
Applications
- Numerical and scientific computing
- Data Analysis and Machine Learning
- Game dev
- System monitoring utilities
- Network programming
std::vector<>
(dynamic expand array, similar to a C array but you can iterate and push to it) access: O(1) insert/remove at end O(1) insert remove at middle O(n)std::list
(essentially a linked list) access: O(n) at worst, search is not ideal insert/remove (beginning or end): O(1), relatively fast, otherwise O(n) at worststd::deque
(double ended fast insert queue) access: O(1) insert at beginning or end: O(1), middle is O(n)std::queue
(pretty much a queue) push: O(1) pop: O(1)std::map
(dictionary) access, insert, removal: O(log(n)) <- hella faststd::set
(unique keys in sorted order) TODO access, insert, remove: O(1) best, O(n) worststd::unordered_set
(same as above, but not ordered based on element insertion/deletion)std::pair
Pair, can be any type, simple containerstd::tuple
Tuple can hold a collection of elements, each can be a different type
*See signal.h
for a good example of this
Do not forget std::unique_lock
, at end of scope, release an std::mutex
std::unique_lock<std::mutex> mlock(syncMutex);
https://refactoring.guru/design-patterns/observer/cpp/example
An observer is a design pattern in C++ that allows objects (observers) to be signalled when an state change ocurrs. This is especially useful in GUI programming.
<thread>
<mutex>
<future>
<atomic>
<condition_variable>
bool isRunning = false;
void func1(int param)
{
while (isRunning) {
Sleep(1000);
}
}
int main(void)
{
std::thread new_thread(func1, 0);
new_thread->detach();
Sleep(1000);
isRunning = false;
if (std::thread->is_joinable()) {
std::thread->join();
}
return 0;
}
std::mutex
std::recursive_mutex
std::timed_mutex
std::recursive_timed_mutex
Locks are used in RAII, used for synchronizing access to resources
std::lock_guard
Locked on construction, unlocked on destruction, out of scopestd::unique_lock
Used in conjunction withcondition_variable
, can be locked and unlockedstd::scoped_lock
(C++17)
- Used om conjunction with
std::unique_lock
- Condition variables that allow for safe management of
std::thread
- Essentially wait for certain conditions to become true
- Used in conjunction with mutexes to synchronize threads, for example, the condition variable waits until a mutex is released
std::condition_variable
in <condition_variable>
Example in which you can send multiple signals to waiting threads, each will wait with a condition_variable
until mtx
is signalled
std::mutex mtx;
std::condition_variable cv;
bool ready = false;
void worker(void) {
// Acquire the mutex. i.e. reach the `CRITICAL_SECTION` for `lock`
std::unique_lock<std::mutex> lock(mtx);
// Wait until the lock is signalled, then execute lambda predicate
cv.wait(lock, []{ return ready; });
}
void signal(void) {
{
std::lock_guard<std::mutex> lock(mtx);
ready = true;
}
cv.notify_one();
}
This allows for each thread to be dispatched in the order that it was locked
std::mutex mtx;
std::condition_variable cv;
int order = 0;
void threadFunction(int id) {
std::unique_lock<std::mutex> lock(mtx);
cv.wait(lock, [id]{ return order == id; });
std::cout << "Thread " << id << std::endl;
order++;
cv.notify_all();
}
std::atomic
used for atomic (interlocked) operations in C++
- relaxed (
memory_order_relaxed
): There are no ordering or synchronization constraints, except the modification is interlocked - consume (
memory_order_consume
): Rarely used TODO - acquire (
memory_order_acquire
): Ensures that memory writes before the atomic operation are not moved after it. - release (
memory_ordr_release
): Ensures that memory writes after the atomic operation are not moved before it. - acquire-release (
memory_order_acq_rel
): Combines both acquire and release semantics. - sequentially consistent (
memory_order_seq_cst
): The strongest memory ordering, ensuring a total order of all sequentially consistent operations.
std::atomic<int> counter(0);
void increment() {
for (int i = 0; i < 10000; ++i) {
counter++;
}
}
int main() {
std::vector<std::thread> threads;
// Create 10 threads to increment the counter
for (int i = 0; i < 10; ++i) {
threads.emplace_back(increment);
}
// Wait for all threads to finish
for (auto &thread : threads) {
thread.join();
}
std::cout << "Counter value: " << counter << std::endl;
return 0;
}
std::memory_order_seq_cst
(Sequentially constant)
- Default memory order for atomics, strictest and most intuitive memory order.
- Prevents any reordering of read/write operations
- Is not ideal for performance
std::memory_order_acquire
andstd::memory_order_release
- Used for implementing mutexes, condition variables (), designing lock free data structures
std::memory_order_relaxed
- No ordering requirements, very fast
- Statistics collection, flag settings, counter increments
std::atomic<bool> flag(false);
flag.store(true, std::memory_order_seq_cst);
flag.load(std::memory_order_seq_cst);
std::thread tNewThread = std::thread(callback, 0);
std::thread
object containing an instance of a thread
thread->join()
, thread->detach()
, thread->is_joinable()
Smart way to make unsafe pointer allocation/deallocation in C safer by performing automatic garbage collection
std::unique_ptr<TypeObj> a = PtrTypeObjFactory();
Creates a std::unique_ptr
, using a factory.
std::unique_ptr
own their pointer uniquely. When the std::unique_ptr<>
instance goes out of scope, it will automatically destroy the object to which it points
std::shared_ptr<>
holds a reference count. Once the reference count reaches 0, and all instances of the object are out of scope, delete the object pointed to by the shared_ptr
std::auto_ptr<>
aye
template<typename Key, typename Value>
class Cache {
public:
struct CacheItem {
Value value;
std::chrono::steady_clock::time_point expiryTime;
};
private:
std::unordered_map<Key, CacheItem> cacheMap;
std::mutex cacheMutex; // For thread safety
std::chrono::seconds defaultTTL; // Time-to-live for each cache item
public:
Cache(std::chrono::seconds ttl) : defaultTTL(ttl) {}
void put(const Key& key, const Value& value, std::chrono::seconds ttl = std::chrono::seconds(0)) {
std::lock_guard<std::mutex> lock(cacheMutex);
auto expiryTime = std::chrono::steady_clock::now() + (ttl.count() > 0 ? ttl : defaultTTL);
cacheMap[key] = {value, expiryTime};
}
bool get(const Key& key, Value& value) {
std::lock_guard<std::mutex> lock(cacheMutex);
auto it = cacheMap.find(key);
if (it != cacheMap.end() && it->second.expiryTime > std::chrono::steady_clock::now()) {
value = it->second.value;
return true;
}
return false;
}
void removeExpiredItems() {
std::lock_guard<std::mutex> lock(cacheMutex);
auto now = std::chrono::steady_clock::now();
for (auto it = cacheMap.begin(); it != cacheMap.end(); ) {
if (it->second.expiryTime <= now) {
it = cacheMap.erase(it);
} else {
++it;
}
}
}
};
TODO
SFINAE (Substitution failure is not an error)
Invalid substitution may not necessarily indicate an error
Retrieves the type of the variable, useful in templates:
int a = 5;
decltype(a) b = a; // evaluates to int b = a;
decltype(auto) func(void) {
int x = 5;
return (x); //returns int& since x is an lvalue
}
TODO
<algorithm>
header
std::sort
std::stable_sort
std::partial_sort
std::find
, std::binary_search
, std::lower_bound
, std::upper_bound
std::copy
, std::move
, std::replace
, std::remove
, std::merge
, std::accumulate
TODO
- Input Iterators, for reading; moving forward
- Output Iterators: For writing, moving forward
- Bidirectional Iterators: for rw, bi-directional
- Random Access Iterators
std::begin()
,std::end()
,std::iterator
,std::const_iterator
std::transform()
,std::for_each()
TODO header (functional programming) lambda expressions
TODO Improve on this.
R"(\)" <- literal string, \ does not escape (const char *)
L"asdfasdf" <- UTF-8 string (wchar_t)
u"test" const char16_t* (UTF-16)
U"test" const char32_t* (UTF-32)
A terrible name for a beautiful concept: encapsulating the lifecycle of a resource within a class constructor and destructor. Once the scope of the class has ended, the resource is automatically free'd. This can be implemented through the class constructor and destructor.
- Encapsulate a resource into a class, for example, a
HANDLE
's lifecycle is managed by aclass
- Access the resource via class, for example, interfacing with
HANDLE
is done via callingclass
methods - The resource must be freed once at end of scope, ie call on the destructor
vftable
orvtable
is an array of function pointers that point to the definitions for that particular class- Livelock: when two threads are running, polling, and waiting on each other for some event, thus a cyclic depedency and the threads become locked
- Starvation: indicates what the name suggests: a thread is perpetually denied access to a resource that are being locked by other competing threads. Occurs when lock aquisition is not fair, thus starvation
not just C++, but a concept in architecture Refers to the access of the same value or related storage location
- Temporal Locality: Reuse of data in small time frames. For example in memory, if an address is accessed, there is a high chance it will be accessed again
- Spatial Locality: Refers to frequently used memory that are within close proximity, useful in caching
Implementations and examples of Locality
-
Caching: Both forms of locality are crucial for the effective design and utilization of cache memory in computing systems. Caches keep frequently accessed data and instructions close to the CPU to reduce the access time, and a good understanding of locality patterns can significantly enhance cache performance.
-
Algorithm Design: Algorithms can be optimized for better performance by arranging data structures and access patterns to maximize locality. For instance, iterating through an array sequentially exhibits high spatial locality, which is generally more cache-friendly than randomly accessing array elements.
-
Memory Hierarchy Design: Computer architectures are designed considering locality principles. This includes the implementation of various levels of caching (L1, L2, L3 caches), and the design of RAM and virtual memory.
-
Prefetching Strategies: Understanding data access patterns allows for effective prefetching, where the processor anticipates the needed data and loads it into the cache in advance.
Summary Deadlock: Threads are stuck waiting for each other, and there is no change in state without external intervention. Livelock: Threads are actively changing their state in response to each other, but no progress is made. Starvation: Some threads are unable to make progress because others are monopolizing the resources. The monopolizing threads are making progress.
bool isPalindromeUsingLetterFrequency(const std::string& input) {
std::unordered_map<char, int> freqMap;
// Count the frequency of each letter, ignoring case and non-alphabetical characters
for (char ch : input) {
if (std::isalpha(ch)) {
freqMap[std::tolower(ch)]++;
}
}
int oddCount = 0;
for (const auto& pair : freqMap) {
if (pair.second % 2 != 0) {
oddCount++;
}
}
// If the string's length is odd, allow for one odd-count character (middle character in a palindrome)
// If the string's length is even, no characters should have an odd count
int effectiveLength = 0; // This counts only alphabetical characters, considering the assumption
for (const auto& pair : freqMap) {
effectiveLength += pair.second;
}
if (effectiveLength % 2 == 0) {
return oddCount == 0;
} else {
return oddCount == 1;
}
}
The glorious Win32 API and kernel!
- IOCTL Communication between usermode and kernelmode. Remember that this is works by usermode calling on kernel mode using a particlar IOCTL code
- Kernel Object Manipulation Direct manipulation of kernel objects: i.e. threads, processes, etc
- Remote Procedure Call (RPC) Windows and UNIX protocol for defining callbacks in a master, called by slave applications
- Mailslots One-way IPC, applications register a mailslot and receive data from a single sender
- Named Pipes Named pipes are files/device objects that can be read and written to, and therefore allows for IPC and interfacing with drivers
- Sockets Useful for TCP/IP, can be used for IPC or "remote" IPC through the network, and using the network stack
- Synchronization Primitives are explained below
- File System Communication File I/O. Process can communicate through the fs
- System Calls (syscalls) Mechanism for calling the kernel via syscalls, mostly wrapped by
ntdll.dll
andkernel32.dll
- Shared Memory See below for APIs, but essentially
OpenProcess()
,WriteProcessMemory()
,MapViewOfSection()
, etc... - Message Queue (MSMQ) MQ implementation from MS
- Component Object Model (COM) and DCOM, an extension of COM that allows for IPC and shared global objects, which can communicate over the network
#include <iostream>
class Foo
{
public:
int x = 0;
// default ctor
Foo()
{
std::cout << "Default ctor\n";
}
// copy ctor
Foo(const Foo& rhs)
{
std::cout << "Copy ctor\n";
}
};
Foo CreateFooA()
{
return Foo();
}
Foo CreateFooB()
{
Foo temp;
temp.x = 42; // update member variable
return temp;
}
int main()
{
Foo t1(CreateFooA());
Foo t2(CreateFooB()); // Object created twice
return 0;
}
- CreateFooA() creates a temporary Foo object to return.
- The temporary object will then be copied into the object that will be returned by CreateFooA().
- The value returned by CreateFooA() will then be copied into t1.
CreateNamedPipe()
, ConnectNamedPipe()
, CreateFile()
, ReadFile()
, WriteFile()
CreateMailslot()
, see above create/read/write file
WSASocket()
, bind()
, send()
, recv()
, listen()
, accept()
TODO Lookup MSRPC implementation
RpcBindingFromStringBinding()
, RpcBindingSetOption()
, RpcStringBindingCompose()
CreateFileMapping()
, MapViewOfFile()
MQOpenQueue()
, MQSendMessage()
, MQRecvMessage()
LoadLibrary()
, CreateRemoteThread()
, stop thread, alter register state (rip
) and resume thread
- Process Hollowing: Create suspended process, replace text segment resume thread
WriteProcessMemory()
,VirtualAllocEx()
TODO
IUnknown
: provides methods for reference counting an object, all objects must contain an IUnknown interface- IDL (Interface Definition Language) used to define interfaces (in COM and DCOM/RPC)
CoCreateInstance()
,AddRef()
method,Release()
, to increment or decrement reference count to an object
CoCreateInstance()
WMI L"ROOT\\CIMV2"
FileSystem objects
ActiveX
IE COM objects (Navigate())
- Windows Driver Framework (WDF) is newer than Windows Driver Model (WDM)
- Windows Filtering Platform (WFP)
- WDF (Windows Driver Framework)
- KMDF (Kernel-Mode Driver Framework)
- Minifilter: (filesystem and file i/o)
DRIVER_OBJECT *drvObj = getDrvObj(); // Obtain driver object (i.e. instance of current driver)
// Set callbacks for DRIVER_DISPATCH
drvObj->MajorFunction[IRP_MJ_CREATE] = driverMyCreateHandler;
drvObj->MajorFunction[IRP_MJ_DEVICE_CONTROL] = driverDeviceControlHandler;
Callback for usermode IOCTL
drvObj->MajorFunction[IRP_MJ_DEVICE_CONTROL] = driverDeviceControlHandler;
// IOCTL definitions
#define IOCTL_CTD_CMD_REQUEST \
CTL_CODE(FILE_DEVICE_UNKNOWN, 0x101, METHOD_BUFFERED, FILE_READ_DATA | FILE_WRITE_DATA)
// Sample callback from usermode
NTSTATUS driverDeviceControlHandler(PDEVICE_OBJECT DeviceObject, PIRP irp)
{
PIO_STACK_LOCATION sp = IoGetCurrentIrpStackLocation(irp);
switch (sp->Parameters.DeviceIoControl.IoControlCode) {
case IOCTL_CTD_CMD_REQUEST: // IO_BUFFERED request
case IOCTL_CTD_CMD_RESPONSE:
default:
}
}
Initialally, open a file handle to the driver: CreateFile(L"\\\\.\\YourDeviceName");
, then DeviceIoControl()
is called from usermode, specify particular IOCTL:
#define IOCTL_CTD_CMD_REQUEST \
CTL_CODE(FILE_DEVICE_UNKNOWN, 0xa01, METHOD_BUFFERED, FILE_READ_DATA | FILE_WRITE_DATA)
#define IOCTL_CTD_CMD_RESPONSE \
CTL_CODE(FILE_DEVICE_UNKNOWN, 0xa02, METHOD_BUFFERED, FILE_READ_DATA | FILE_WRITE_DATA)
Driver intercepts code by setting the IRP callback:
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = MyIoctlHandler;
From there, switch IOCTL codes:
switch (irpSp->Parameters.DeviceIoControl.IoControlCode) {}
- Most code runs at passive
- Thread execution
- I/O
- Paged memory access
- Synchronization primitives
- Handles system calls
KMUTEX
, semaphores, spinlock, etc
- Handles Asynchronous Procedure Calls (APCs) from services
- Allows fast mutex
- Deferred Procedure Calls (DPCs) TODO how do DPCs differ from APCs?
- Non-paged memory access only
- Fast mutex
- Direct Memory Access (DMA)
- Interrupt Service Routines (ISRs) run at this level, which are callbacks from devices, these respond to hardware interruprs
PROFILE_LEVEL
(IRQL 27)CLOCK_LEVEL
(IRQL 28)IPI_LEVEL0
(IRLQ30)POWER_LEVEL
(IRQL 30)HIGH_LEVEL
(IRQL 31), for critical operations; cannot be context switched or interrupted
- "Once a CPU is executing at an IRQL above PASSIVE_LEVEL, an activity on that CPU can be preempted only by an activity that executes at a higher IRQL."
- The IRQLs between DISPATCH_LEVEL and PROFILE_LEVEL are reserved for hardware
- "Always access shared data at the same elevated IRQL."
- "Code executing at or above DISPATCH_LEVEL must not cause page faults."
Interrupt Request Level - Primer
- Poll until release of spinlock
- IRQL >= DISPATCH_LEVEL, raises IRQL, operates at DISPATCH_LEVEL
- KeAcquireSpinLock() KeReleaseSpinLock()
KMUTEX
- IRQL is
PASSIVE_LEVEL
KeInitializeMutex()
,KeWaitForSingleObject()
,KeReleaseMutex()
- Controlling access to a resource pool
PASSIVE_LEVEL
KeInitializeSemaphore()
KeWaitForSingleObject()
KeReleaseSemaphore()
TODO Get code sample
APC_LEVEL
ExAcquireFastMutex()
ExReleaseFastMutex()
- More flexible than mutexes, allowing shared and exclusive access
PASSIVE_LEVEL
ExInitializeResourceLite()
ExAcquireResourceExclusiveLite()
ExReleaseResourceLite()
- Event Signalling
PASSIVE_LEVEL
,APC_LEVEL
KeSetEvent()
KeWaitForSingleObject()
- Any IRQL
InterlockedIncrement()
InterlockedCompareExchange()
PASSIVE_LEVEL
ExInitializeResourceLite()
PASSIVE_LEVEL
can/should use,APC_LEVEL
ExAllocatePoolWithTag(PagedPool, 'Efl\0');
- Can result in page faults, therefore cannot be used for higher IRQLs
- Greater than or equal to
APC_LEVEL
requires nonpaged memory - ExAllocatePoolWithTag(NonPagedPool)
- Cannot be paged out
MmAllocateContiguousMemory()
ExAllocatePoolWithTag()
RtlZeroMemory()
ExFreePoolWithTag()
MmAllocateContiguousMemory()
- IRP I/O Request Packets
- Driver Object
- PDO (Physical Device Objects)
- FDO (Filter Device Object)
- Dispatch Routine
DriverEntry()
- Lookaside Lists
- Event Objects
- Registry objects
- Security Descriptors
RtlInitUnicodeString(&sddlString, L"D:P(A;;GA;;;SY)(A;;GA;;;BA)");
PsCreateSystemThread()
- IOCTL
- Read/Write Requests ReadFile(), WriteFile()
- Memory mapped I/O (between user and kernel mode)
- System calls
- Shared memory
- Named pipes, mailslots
- WMI
SYSENTER
,INT 0x2e
- SSDT (System Service Dispatch Table)
KeInsertQueueDpc()
KeRemoveQueueDpc()
ZwReadFile()
ZwWriteFile()
ZwCreateKey()
ZwQueryValueKey()
ZwQuerySystemInformation()
ZwSetSystemInformation()
PASSIVE_LEVEL
Asynchrnous Procedure Calls
IRQL: APCs typically run at PASSIVE_LEVEL or APC_LEVEL. The PASSIVE_LEVEL is the lowest level, allowing for most thread-based and user-mode operations. APC_LEVEL is slightly above PASSIVE_LEVEL but below DISPATCH_LEVEL, designed to allow certain kernel-mode operations to be performed asynchronously while blocking lower priority operations.
Purpose: APCs are used to execute operations asynchronously in the context of a specific thread. There are two types of APCs: User APCs, which can be delivered only when a thread is in an alertable state waiting for an event (thus executing in user mode), and Kernel APCs, which are executed when the thread is running in kernel mode but at a lower IRQL, allowing for preemptive multitasking within the kernel.
Deferred Procedure Calls
IRQL: DPCs are executed at DISPATCH_LEVEL, which is higher than APC_LEVEL and is used for high-priority kernel-mode operations that do not require immediate attention but must be executed before the system can attend to lower-priority tasks.
Purpose: DPCs are primarily used for deferred execution of kernel-mode functions, typically as a response to hardware interrupts. When an ISR concludes that a longer processing task is needed but not necessarily immediately, it can schedule a DPC to perform this task. Because DPCs run at DISPATCH_LEVEL, they preempt most other activities in the system, ensuring that the deferred task is executed relatively quickly but without disrupting the handling of current hardware interrupts.
METHOD_BUFFERED
METHOD_IN_DIRECT
METHOD_OUT_DIRECT
METHOD_NEITHER
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = MyIoctlHandler;
IoGetCurrentIrpStackLocation()
switch (irpSp->Parameters.DeviceIoControl.IoControlCode) {}
#define IOCTL_MY_OPERATION CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
CreateFile(L"\\\\.\\YourDeviceName");
DeviceIoControl()
- Dynamically typed
- Interpreted
- Scripted
- Garbage collector (internal reference counter)
- Mutability: i.e. does an object change (const)
- Ordering: does an object maintain order
- Duplicates: does an object maintain duplicate elements
Ordered, mutable, duplicates
my_list.append(4)
print(my_list[3]) # output is 4
- access: O(1)
- append: O(1)
- insert/delete: O(n)
- search: O(n) at worst
- len(): O(1)
Ordered, immutable, duplicates
when you need an ordered and immutable collection, which can also be used as a key in a dictionary. Tuples are suitable when you want to ensure the data remains unchanged and order matters.
my_list = (1, 2, 3)
- access: O(1)
- search: O(n)
- len(): O(1)
Unordered, mutable, no duplicates (advantage is that it removes duplicates automatically) Fast search O(1) - O(n)
when you need a collection of unique elements and efficient membership testing, and when the order of elements and immutability are not important.
my_list = {1, 2, 3}
- insert: O(1) - O(n) worst
- delete: O(1) - O(n) worst
- search: O(1) - O(n) worst
- len(): O(1)
- frozenset is an immutable version of a set
Note: normal dictionary is initialized using below [] format, however, dict comprehension is initialized using {} for some reason
my_dict["c"] = 3
- access: O(1)
- insert: O(1)
- delete: O(1)
Immutable array of bytes, i.e. string
my_bytes = b'hello'
mutable array of bytes
my_array = bytearray(b'hello')
my_array[0] = ord('H') # switch element 0 to uppercase H
double-ended queue
push / pop at O(1)
For extending the behaviour of functions. I.e. define a function used as a decorator, and "decorate" another function such that it is extended Used in web frameworks for routing URLs to view functions, logging, authorization, etc
Useful in debugging and logging, timing, authentication (for a specific function):
def log_function_call(func):
def wrapper(*args, **kwargs):
print(f"Calling function: {func.__name__} with args {args} kwargs {kwargs})
result = func(*args, *kwargs)
print(f"{func.__name__} return {result})
return result
return wrapper
@log_function_call
def add(a, b)
return a + b
add(3, 5)
Class: like C++ class, blueprint
Object: instantiation
Inheritance: Form child classes from parent classes
Encapsulation: Hiding/abstracting members of a class/object
Polymorphism: Overriding functions like in C++
class TestClass:
species = "something"
# initializer / constructor (C++)
def __init__(self, name, age):
self.name = name
self.age = age
def get_description(self):
print(f'Name: {self.name} Age: {self.age}')
def print_param(self, param)
print(f'Param: {param}')
def main()
test = TestClass("asdf", 434)
print(test.get_description())
if __name__ == "__main__":
main()
class baseClass:
def func(self):
raise NotImplementedError("Subclass not implemented)
class childClass(baseClass):
def func(self):
print(f"implement {instance.__class__.__name__}")
Same as C++ overloading, but just have different parameter names, nbd Decorators (@dec_function) can also be used to handle overloading
Overloading operators
class Point:
def __init__(self, x = 0, y = 0):
self.x = x
self.y = y
# overload the "+" operator
def __add__(self, other): #other is the second class, for example c1 + c2, where both are classes
return Point(self.x + other.x, self.y + other.y)
def main()
p1 = Point(4, 3)
p2 = Point(5, 6)
p3 = p1 + p2 # implement operator override
Method of merging lists (i.e.), specifically a sort of 'lambda' that can be embedded into a list initialization
test = [ "a", "b", "c" ]
test2 = []
for x in test:
if "a" in x:
test2.append(x)
# Rather do something like this
test2 = [ x for x in test2 if "a" in x ]
Same as above, but allows you initialze a dictionary in an efficient way using a lambda
square_dict = { x: x**2 for x in range(10)}
Generators are iterators, you can only iterate once Generators do not store all the values in memory, they generate values Used in parsing large files, CSV, data streams, etc Function that returns a lazy iterator
mygenerator = ( x * x for x in range(10))
for i in mygenerator:
print(i)
# you cannot iterate the generator twice!!!
# i.e. this will not work:
for i in mygenerator:
print(i0)
class first_n(object):
def __init__(self, n):
self.n = n
self.num = 0
def __iter__(self):
return self
def __next__(self):
return self.next()
def next(self):
if self.num < self.n:
cur, self.num = self.num, self.num + 1
return cur
raise StopIteration()
sum_of_first_n = sum(first_n(100000))
Automatic exception handling with certain functions and operations, produces cleaner code
with open('file_path', 'w') as file:
file.write("test")
multiply = lambda x, y: x * y
sublist = my_list[1:3] # copy from index 1 - 3, resultant is [2, 3]
Allows the assignment from a sequence, list, set, tuple, into different variables
a, b, *rest = range(10)
Useful for modifying scope of variables
def outer():
x = "local"
def inner():
nonlocal x
x = "nonlocal" # x is now outside of current scope
inner()
return x
see Generators
- Iterable is an object that counts a list, the list itself is an iterable or iterable
- A generator is an iterable, that is iterable only once
yield
is a keyword that is used like areturn
, except it returns a generator
mylist = range(10)
def create_generator():
for i in mylist:
yield i * i
mygenerator = create_generator()
print(mygenerator)
Indicates empty code section
def func():
...
Assign values to variables in an expression
if (n := len(a)):
print(f"{variable}")
Optional type hinting
def greet(name: str) -> str
Automatically generates __init__
and __repr__
from dataclasses import dataclass
@dataclass # decorator
class Point:
x: int
y: int
- Global Interface Lock (GIL): Automatic object locking (mutex) for objects
- Multi-paradigm
- concurrent
- imperative
- functional
- OOP
- static typing
var i = 10
i := 10
import (
"errors"
"fmt"
)
func div(x int, y int) (int, error) {
if y == 0 {
return 0, errors.New("Cannot divide by 0")
}
return x / y, nil
}
func main() {
ret, err := div(5, 0)
if err != nil {
fmt.Println("Error:", err)
}
}
type testStruct struct {
name string
a int
b int
}
func (c testStruct) do_operation(x int, str string) (int, error) {
//stuff
}
func main() {
newStruct := testStruct(name: "test", a: 123);
out, err := newStruct.do_operation(134134, "dfiaifj");
}
func main() {
for i := 0; i < 1000; i++ {
fmt.Println(i)
}
if num := 9; num == 9 {
//smth
}
}
Array have fixed sizes, always const Slices are dynamic
// Slice can be initialzed with []
var a []int{1, 2, 3, 4, 5}
a.append(a, 6)
// Array must be constant, with a fixed size
var b [3]int{1, 2, 3}
var b [...]int(1, 2, 3) // array inferred by the compiler
// Get slice and form new slice
c := b[1:2] // get element 1 to element 2
see C++ or Python section for complexities, as they are similar
m := make(map[string]int)
m["test"] = 1
Similar to C, but safer
i := 42
p := &42
fmt.Println(*p)
func f(in string) int
{
fmt.Println(in)
}
func main()
{
go f("test")
// Create a nested thread and send data via channel (blocking)
ch := make(chan string) // specify the type
go func(param string) {
sleep(1000)
ch <- param // Send into channel
}("test")
//Spawn thread and block channel
msg := <-ch
fmt.Println(msg)
}
Similar to pure virtual classes, or interface classes in C++ Golangs polymorphism implementation
type Speaker interface {
Speak(param string) string // Define a virual/interface function
}
// Implementation for Dog
type Dog struct {}
func (f Dog) Speak(param string) {
fmt.Println(param)
}
// Implementation for Car
type Cat struct {}
func (f Cat) Speak(param string) {
fmt.Println(param)
}
func main() {
dog := Dog{}
dog.Speak("test")
cat := Cat{}
cat.Speak("test")
}
fmt : formatted I/O
net/http : http, obvz
io/ioutil : file I/O and streaming
os : OS interoperability
encoding/json : JSON obvz
html/template : HTML
sync : synchronization / concurrency primitives
time : time API
gorilla/mux : URL router multiplexer
golang.org/x/net/websocket : websocks implementation
golang.org/x/oauth2: OAuth2 authorization via HTTP/REST (google)
github.com/gorilla/sessions: cookie / fs
- Key Generation
- RSA
- Choose two primes, ( p ), ( q ), and compute nonprime ( n ) such that ( n = p \times q ): ( n = p \times q ) ( \phi(n) = (p-1) \times (q-1) )
- Choose an integer ( e ) s.t. ( 1 < e < \phi(n) ) and ( e ) and ( \phi(n) ) are coprime (two numbers are coprime iff their GCD (greatest common divisor) is equal to 1)
- Compute ( d ) as the modular multiplicative inverse of ( d \times e \mod \phi(n) ): ( d \times e \mod \phi(n) = 1 )
- Therefore: Public Key: ( (e, n) ) Private Key: ( (d, n) )
- Encryption: For message ( M ) and cyphertext ( C ): ( C = M^e \mod n ) Decryption: ( M = C^d \mod n )
- Digital Signatures
- Signing a message involves generating a hash of the message, and encrypts it with its private key
- To verify: the receiver decrypts the message using its public key and compares the hash to the message
- Certificate Insurance
- Certificate Authority (CA) verifys the identify of the recipient or sender. Root CAs are third party (i.e. DigiCert, Let's Encrypt, Komodo, GlobalSign)
Symmetric
- AES256
- DES
- 3DES
- RC4, RC5, RC6 Asymmetric
- RCA
- ECC (Elliptic curve cryptography)
- MD5
- CRC16, CRC32
- DSA (Digital Signature Algorithm)
- ECDSA (Elliptic Curve DSA)
- Diffie-Hellman (DH) used to exchange keys over cleartext
- ECDH (Elliptic Curve DH)
- SSL/TLS 1.3
- IPSec
- HMAC (Hash based authentication code) combination of a key and hash function
- Stream Ciphers encrypt one bit at a time (RC4)
- Block Ciphers Encrypt a chain of blocks (AES)
WSAStartup()
socket()
connect()
send()
recv()
WSASend()
WSARecv()
bind()
listen()
accept()
InternetOpen()
InternetConnect()
HttpSendRequest()
,HttpOpenRequest()
CreateFile()
ReadFile()
,WriteFile()
DeleteFile()
CopyFile()
,MoveFile()
SetFileAttributes()
RegOpenKeyEx()
RegQueryValueEx()
RegSetValueEx()
RegCreateKeyEx()
RegDeleteKey()
,RegDeleteValue()
Include bcrypt.dll
CryptAcquireContext()
CryptEncrypt()
,CryptDecrypt()
CryptHashData()
CryptDeriveKey()
CryptExportKey()
,CryptImportKey()
- Enumeration of IAT
LoadLibraryA/W()
GetProcAddress()
IWbemServices::ExecQuery
: recon function, for direct querying of WQL
VirtualAllocEx()
,WriteProcessMemory()
,CreateRemoteThread()
SetWindowsHookEx()
: hooks processesSetThreadContext()
function for modify context of remote TEB
IsDebuggerPresent()
CheckRemoteDebuggerPresent()
- Open a
HANDLE
to the remote process - Allocate memory in the remote process, containing the target DLL name
- Call
CreateRemoteThread()
with the parameter as in step 2
- Create process as
PROCESS_SUSPENDED
- Zero out legitimate code from the process
- Set target shellcode or DLL to the remote process
- Call
ResumeProcess()
VirtualAllocEx()
withPAGE_READWRITE_EXECUTABLE
flagWriteProcessMemory()
to target page- Call
CreateRemoteThread()
against targetHANDLE
- An APC is queued to a thread of a process. When the thread enters an alertable state, the APC is executed
#include "pch.h"
#include <iostream>
#include <Windows.h>
FARPROC messageBoxAddress = NULL;
SIZE_T bytesWritten = 0;
char messageBoxOriginalBytes[6] = {};
int __stdcall HookedMessageBox(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType) {
// print intercepted values from the MessageBoxA function
std::cout << "Ohai from the hooked function\n";
std::cout << "Text: " << (LPCSTR)lpText << "\nCaption: " << (LPCSTR)lpCaption << std::endl;
// unpatch MessageBoxA
WriteProcessMemory(GetCurrentProcess(), (LPVOID)messageBoxAddress, messageBoxOriginalBytes, sizeof(messageBoxOriginalBytes), &bytesWritten);
// call the original MessageBoxA
return MessageBoxA(NULL, lpText, lpCaption, uType);
}
int main()
{
// show messagebox before hooking
MessageBoxA(NULL, "hi", "hi", MB_OK);
HINSTANCE library = LoadLibraryA("user32.dll");
SIZE_T bytesRead = 0;
// get address of the MessageBox function in memory
messageBoxAddress = GetProcAddress(library, "MessageBoxA");
// save the first 6 bytes of the original MessageBoxA function - will need for unhooking
ReadProcessMemory(GetCurrentProcess(), messageBoxAddress, messageBoxOriginalBytes, 6, &bytesRead);
// create a patch "push <address of new MessageBoxA); ret"
void *hookedMessageBoxAddress = &HookedMessageBox;
char patch[6] = { 0 };
memcpy_s(patch, 1, "\x68", 1);
memcpy_s(patch + 1, 4, &hookedMessageBoxAddress, 4);
memcpy_s(patch + 5, 1, "\xC3", 1);
// patch the MessageBoxA
WriteProcessMemory(GetCurrentProcess(), (LPVOID)messageBoxAddress, patch, sizeof(patch), &bytesWritten);
// show messagebox after hooking
MessageBoxA(NULL, "hi", "hi", MB_OK);
return 0;
}
Using the global atom table to write code into memory of a target process, then trigger this code of execution
NTSTATUS
NtQueueApcThread(
IN HANDLE ThreadHandle,
IN PPS_APC_ROUTINE ApcRoutine,
IN PVOID SystemArgument1 OPTIONAL,
IN PVOID SystemArgument2 OPTIONAL,
IN PVOID SystemArgument3 OPTIONAL
);
Manually loading a windows DLL, without using the standard API, this involves:
- Manually parsing PE and allocating memory for segments
- Write image segments to pages
- Resolve API, and write to IAT
- Resolve references
- Using the VEH chain to manually route an exception handler to run code
- Alter context of remote thread (i.e. modify
rip
oreip
registers to point to shellcode)
IsDebuggerPresent()
CheckRemoteDebuggerPresent()
NtQueryInformationProcess()
Using exception handlers to detect the presence of a debugger. Malware can generate exceptions to see how they are handled.
Check delta between a and b, and if the delta is greater than the expected value, it may indicate a debugger stepping
dr0-dr7
Registers contain values for debugging
NtQUeryObject()
detects debuggingCreateToolhelp32Snapshot()
enumerate processes and check for debugger running
BeingDebugged
flag in PEB indicates an attached debugger
NtGlobalFlag
may also be altered by debuggers
Check PPID for debugger
- Check MAC address for VM prefix
- Registry checks
- Filesystem and VM guest files
CPUID
instruction- Checking device drivers
Recall how some common queries work
Regex/pattern matching through traffic interception
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"Suspected HTTP C2C Beacon";
flow:to_server,established; content:"GET"; http_method; content:"/checkin.php"; http_uri;
pcre:"/\/checkin\.php\?id=[0-9a-f]{32}&status=ok/i";
threshold:type limit, track by_src, count 1, seconds 300; classtype:trojan-activity; sid:1000002; rev:1;)
- Ollydbg
- x64dbg
- dnSpy
- Ghidra
- IDA Pro
Very similar to Snort IDS
alert http any any -> any any (msg:"Suspicious C2 User-Agent Detected";
flow:established,to_server; content:"GET"; http_method;
content:"User-Agent|3A| BadBot v1.0"; http_header; classtype:trojan-activity; sid:1000001; rev:1;)
TODO
- Application-based Policy Enforcement
- Identify and control applications on any port, not just by protocol or port
- App-Id(TM) technology to identify applications
- User Identification Controls
- User-Id(TM) technology to tie network activity to users, not just IPs
- Integration with MS AD
- Content Inspection and Threat Prevention
- Deep packet inspection
- SSL Decryption/inspection
- URL filtering based on policy
- WildFire Malware Analysis
- Cloud-based service that integrates with the firewall to identify unknown malware, zero-day exploits, and advanced persistent threats (APTs).
- Automatic sharing of threat intel
- GlobalProtect(tm) (mobile)
- PAN-OS (core OS for fw)
- Advanced Threat Protection and Intelligence
- Integration with Autofocus(tm) contextual threat intelligence service
- Cloud integration
- IoT
- MFA
Example YARA sig matching a particular string
rule RuleName {
meta:
author = "Author Name"
description = "Description of the rule"
strings:
$string1 = "This is a string"
$string2 = { E2 34 A1 C8 }
condition:
$string1 or $string2
}
- Logical operators: and, or, not.
- Count of strings: For instance, #string1 > 2 (meaning string1 should appear more than twice).
- Positional operators: at or in to specify where in the file the strings should appear.
- File size checks: For instance, filesize < 200KB.
- Other YARA-specific functions and keywords, like pe.imphash() for matching specific PE file import hashes.
pydbg
Debugging malware
capstone
Disassembly framework
pefile
PE executable analysis framework
Automated malware analysis system, runs in a sandbox and monitors APIs, behaviour and determines IoCs
- Ollydbg
- x64dbg
- dnSpy
- PEiD
- CFF Explorer
- HxD (hex editor)
- UPX
- Ghidra
- IDA
- VirusTotal
- Phishing
- Spearphishing
- Drive-by download (i.e. malicious site, email, bla bla)
- Malvertising (http redirection)
- Social Engineering (is this really hacking? plz don't pwn me)
- MitM
- Sniffing
- ARP poisoning
- mDNS (multicast DNS, which is DNS via broadcast, similar to ARP)
- DNS Spoofing
- Packet Injection
- Session Hijacking - temporary session tokens that are stolen and reused
- SSL/TLS stripping, i.e. reducing encryption difficulty
- USB / reusable media (see stuxnet)
- Supply chain attacks (hardware or software mods before it reaches consumer)
- RDP Exploits
- Botnets (DDoS/DoS)
- Watering hole attacks (target a specific site or org an infect them all)
- Fileless malware (see #fileless-malware)
- Mobile
- IoT botnets
RDP
- Powershell and .NET stagers
- Fileless: meaning all execution ocurrs in memory-only
- Use Powershell, WMI, Office docs, etc to execute code
- LOLBins (Living off the land binaries), that use PS, for example, to install malware or backdoors
- DNS Tunneling
- Beaconing Intervals Indicate potential C2 communication
- Constants or string literals
- Hardcoded URIs
- Assembler that handles decryption
- Assembler that handles API resolution
- Specific or Anomalous APIS
- IP / Domain Destination
- URLs Unusual or anomalous URLSs
- Checksums/Hashes that match known payloads
- Email addresses
- Artifacts Malware mutexes, string constants
- Network Sigs
- System or file modification
- Registry
- Unusual account behaviour
- Unusual DNS requests
- Anomalous HTTP requests/responses
TODO
- Designed to be a resource authorization protocol, not specifically authentication
- Defines an API token that acts as a resource handler
.bss (rw data, uninitialized)
.comment (comment section)
.data & .data1 (rw data, initialized)
.debug (debug info)
.fini (finalization instructions)
.init (runtime initialization)
.rodata & .rodata1 (ro data)
.text (executable)
.line (contains gdb line numbers for debugging)
.note (notes, etc)