
Task #03


The Principles of Deep Learning Theory
Chapter 0 - Initialization

Main Concepts

Deep learning models are trained on real-world data to solve problems. They are built from deep neural networks, which act as recipes for transforming an input into an output through many layers of simple components. Each network is, in essence, a set of instructions for computing a complicated function out of computational units called neurons, and it is parameterized by firing thresholds and by the weighted connections between neurons. Two crucial aspects are width, the number of neurons in each layer, and depth, the number of layers in the network.

Neurons are the fundamental building blocks of neural networks: each one performs a simple operation, taking a weighted sum of its input signals and deciding whether or not to "fire" based on a threshold. Layers, in turn, are groups of neurons that process information at successive stages. Representation learning, a key concept, refers to the ability of neural networks to learn useful representations of the data over the course of training, during which the network's parameters are adjusted to minimize the error between the model's predictions and the actual results. Another important principle is sparsity, which encourages networks to use only a small number of connections and active neurons, making the model more efficient and more interpretable.
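
To make these definitions concrete, here is a minimal NumPy sketch: each neuron computes a weighted sum of its inputs and fires when a threshold is crossed, a layer is a group of such neurons (its size is the width), and stacking layers gives the depth. The input size, widths, weights, and thresholds below are arbitrary values chosen purely for illustration.

Python code
import numpy as np

def neuron(x, w, b):
    # Weighted sum of the inputs plus a bias, then a fire / don't-fire decision
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

def layer(x, W, b):
    # A layer is just a collection of neurons applied to the same input
    return np.array([neuron(x, W[i], b[i]) for i in range(W.shape[0])])

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # input with 4 features

# Depth 2: two stacked layers, with widths 3 and 2
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

h = layer(x, W1, b1)                         # hidden representation (width 3)
y = layer(h, W2, b2)                         # output (width 2)
print(h, y)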

Theoretical Implications - Theoretical Minimum

It is critical to understand the mathematical foundations behind deep learning models; a basic example is how words can be represented numerically, using techniques such as word embeddings or one-hot encoding.

Python code
from sklearn.feature_extraction.text import CountVectorizer

# Data corpus
corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
          "What to Know About Tech Companies Using AI to Teach Their Own AI"]

# One-hot (binary bag-of-words) encoding: each document becomes a 0/1 vector over the vocabulary
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(corpus)

print("Vocabulary:", vectorizer.get_feature_names_out())
print("\n one-hot representation:")
print(X.toarray())

# [[0 1 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0]
# [1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]]
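
One-hot vectors like the ones above are sparse and carry no notion of similarity between words; word embeddings instead map each word to a dense learned vector. The snippet below is a purely illustrative sketch of that idea, assuming the gensim library is available (the corpus is far too small to learn meaningful embeddings).

Python code
from gensim.models import Word2Vec

# Same toy corpus as above, lowercased and split into tokens
corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
          "What to Know About Tech Companies Using AI to Teach Their Own AI"]
tokenized = [doc.lower().split() for doc in corpus]

# Train a tiny Word2Vec model; every word becomes a dense 50-dimensional vector
model = Word2Vec(sentences=tokenized, vector_size=50, window=2, min_count=1, epochs=100, seed=42)

print("Embedding for 'ai':", model.wv['ai'][:5])            # first 5 coordinates
print("Most similar to 'ai':", model.wv.most_similar('ai', topn=3))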

Perturbation Theory

Perturbation theory can be applied to understand how small changes in a model's hyperparameters affect its accuracy. For example, by perturbing the learning rate (lr) used to train a classifier, we can observe how it influences the convergence speed and the accuracy of the model.

Python code
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
newsgroups = fetch_20newsgroups(subset='all')

# Vectorize the text
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic-regression classifier with different learning rates.
# SGDClassifier with loss='log_loss' is logistic regression fit by stochastic
# gradient descent, which exposes the learning rate (eta0) as a hyperparameter
# (on scikit-learn < 1.1 the loss is named 'log' instead of 'log_loss').
for lr in [0.01, 0.1, 1.0]:
    model = SGDClassifier(loss='log_loss', learning_rate='constant', eta0=lr,
                          max_iter=1000, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"Accuracy with lr={lr}: {acc:.4f}")

Interacting Theory

In a text classification model, interactions between words and their representations are crucial. For example, in a recurrent neural network (RNN), a hidden state is updated as each word of a sentence is read, so the representation of each word interacts with those of the words that came before it, allowing the model to capture long-range dependencies in the text.

Python code
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Example corpus
corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
          "What to Know About Tech Companies Using AI to Teach Their Own AI"]

# Map each word to an integer index and pad the sequences to the same length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
sequences = tokenizer.texts_to_sequences(corpus)

max_sequence_length = max(len(seq) for seq in sequences)
padded_sequences = pad_sequences(sequences, maxlen=max_sequence_length)

# Embedding turns word indices into dense vectors; the LSTM updates its hidden
# state word by word, so earlier words influence how later ones are represented
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=100, input_length=max_sequence_length))
model.add(LSTM(units=64))
model.add(Dense(units=1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Toy binary labels, one per sentence
labels = np.array([0, 1])
model.fit(padded_sequences, labels, epochs=10, batch_size=1, verbose=1)

# Epoch 1/10
# 2/2 [==============================] - 2s 16ms/step - loss: 0.6984 - accuracy: 0.5000
# Epoch 2/10
# 2/2 [==============================] - 0s 12ms/step - loss: 0.6859 - accuracy: 0.5000
# Epoch 3/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.6765 - accuracy: 1.0000
# Epoch 4/10
# 2/2 [==============================] - 0s 11ms/step - loss: 0.6670 - accuracy: 1.0000
# Epoch 5/10
# 2/2 [==============================] - 0s 15ms/step - loss: 0.6554 - accuracy: 1.0000
# Epoch 6/10
# 2/2 [==============================] - 0s 11ms/step - loss: 0.6413 - accuracy: 1.0000
# Epoch 7/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.6229 - accuracy: 1.0000
# Epoch 8/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5988 - accuracy: 1.0000
# Epoch 9/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5669 - accuracy: 1.0000
# Epoch 10/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5244 - accuracy: 1.0000

Effective Theory

To simplify models and make them more tractable, techniques such as dimensionality reduction can be applied. For example, in a text classification model based on convolutional neural networks (CNNs), pooling layers reduce the size of the text representations, keeping only the features that matter most for the classification task. The result is a more effective model, able to process large volumes of text efficiently and with better interpretability.

Python code
from sklearn.decomposition import TruncatedSVD
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

newsgroups = fetch_20newsgroups(subset='all')

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(newsgroups.data)

svd = TruncatedSVD(n_components=100)
X_reduced = svd.fit_transform(X)

print("Dimensions before reduction:", X.shape)
print("Dimensions after reduction:", X_reduced.shape)

# Dimensions before reduction: (18846, 173762)
# Dimensions after reduction: (18846, 100)
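
The SVD example above reduces dimensionality at the level of bag-of-words features. The pooling mentioned in the paragraph operates inside the network instead; the sketch below is a hypothetical toy model (arbitrary vocabulary size, sequence length, and layer widths) in which GlobalMaxPooling1D keeps only the strongest activation of each convolutional filter, collapsing the representation to a small fixed-size vector before classification.

Python code
from keras.models import Sequential
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense

# Toy CNN text classifier: pooling collapses each convolutional feature map to
# its single strongest activation, one fixed-size vector per document
vocab_size, seq_len = 5000, 100
model = Sequential([
    Input(shape=(seq_len,)),
    Embedding(input_dim=vocab_size, output_dim=64),
    Conv1D(filters=32, kernel_size=3, activation='relu'),  # local n-gram features
    GlobalMaxPooling1D(),                                   # keep the most salient feature per filter
    Dense(1, activation='sigmoid')
])
model.summary()  # the pooled representation has just 32 dimensions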

Personal Assessment

The theories discussed offer a solid mathematical and conceptual framework for understanding how deep learning models work. Drawing on fundamentals such as linear algebra, calculus, and probability theory, they make it possible to build complex models that learn from data and perform challenging tasks such as pattern recognition in images and natural language processing. Furthermore, tools such as perturbation theory and effective theory support continuous analysis and improvement of models, allowing refined adjustments and incremental progress.

However, deep learning models often become black boxes that are difficult to interpret fully. This opacity can limit our understanding of how and why a model makes certain decisions. Moreover, applying these theories in practice generally requires advanced knowledge of mathematics and programming, which can be a barrier for many professionals interested in using these techniques in their own fields.