Task #03
The Principles of Deep Learning Theory
Chapter 0 - Initialization
Deep learning models are trained with real-world data to solve problems. They use deep neural networks, which are like recipes for transforming an input through many layers of components into an output. Each neural network is essentially a list of instructions for computing a complex function, made up of computational units called neurons. These networks are parameterized by firing thresholds and by the weighted connections between neurons. Two crucial aspects are width, the number of neurons in each layer, and depth, the number of layers the network has.
Neurons are the fundamental elements of neural networks: each one computes a weighted sum of its input signals and decides whether or not to "fire" based on a threshold. Layers, in turn, are sets of neurons grouped at different stages of information processing.
Representation learning, a key concept, refers to the ability of neural networks to learn useful representations of the data over the course of training. During training, the network parameters are adjusted to minimize the error between the model's predictions and the actual results. Another important principle is sparsity, which encourages networks to use only a small number of connections and active neurons, making the model more efficient and interpretable.
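To make this description concrete, here is a minimal sketch of a single neuron as a weighted sum followed by a firing threshold, stacked into layers so that width (neurons per layer) and depth (number of layers) are visible. The weights, biases and layer sizes are arbitrary toy values chosen only for illustration.
Python code
import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus a bias, passed through a step "firing" threshold
    return 1.0 if np.dot(w, x) + b > 0 else 0.0

def layer(x, W, b):
    # A layer applies several neurons to the same input; its width is len(b)
    return np.array([neuron(x, W[i], b[i]) for i in range(len(b))])

rng = np.random.default_rng(0)
x = rng.normal(size=4)  # a 4-dimensional input

# A toy network of depth 2: a hidden layer of width 3 and an output layer of width 2
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

h = layer(x, W1, b1)   # hidden representation (width 3)
y = layer(h, W2, b2)   # output (width 2)
print("hidden:", h, "output:", y)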
It is critical to understand the mathematical foundations behind deep learning models. For example, it helps to understand how words can be represented numerically using techniques such as one-hot encoding or word embeddings, as illustrated below.
Python code
from sklearn.feature_extraction.text import CountVectorizer
# Data corpus
corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
"What to Know About Tech Companies Using AI to Teach Their Own AI"]
# one-hot encoding
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(corpus)
print("Vocabulary:", vectorizer.get_feature_names_out())
print("\n one-hot representation:")
print(X.toarray())
# [[0 1 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0]
# [1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1]]
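Word embeddings, the other technique mentioned above, assign each vocabulary word a dense low-dimensional vector instead of a sparse binary one. The sketch below builds a toy lookup table with random vectors over the same corpus; the embedding dimension of 8 is an arbitrary choice, and in practice the vectors would be learned during training or loaded from a pretrained model.
Python code
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
          "What to Know About Tech Companies Using AI to Teach Their Own AI"]

# Vocabulary extracted from the same corpus as above
vocab = CountVectorizer().fit(corpus).get_feature_names_out()

embedding_dim = 8  # arbitrary toy dimension
rng = np.random.default_rng(42)
# Toy embedding table: one dense vector per vocabulary word (random here;
# real embeddings are learned or loaded from a pretrained model)
embeddings = rng.normal(size=(len(vocab), embedding_dim))
word_to_vec = {word: embeddings[i] for i, word in enumerate(vocab)}
print("Embedding for 'data':", word_to_vec["data"])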
This mathematical grounding can also be applied to understand how small changes in hyperparameters affect model accuracy. For example, by adjusting the learning rate (lr) when training a classifier, we can observe how it influences the convergence speed and final accuracy of the model.
Python code
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the dataset
newsgroups = fetch_20newsgroups(subset='all')
# Vectorize the text
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a logistic-regression classifier with different learning rates.
# LogisticRegression does not expose a learning rate, so SGDClassifier with log loss
# (logistic regression fitted by stochastic gradient descent) is used here so that
# lr actually affects the optimization.
for lr in [0.01, 0.1, 1.0]:
    model = SGDClassifier(loss='log_loss', learning_rate='constant', eta0=lr,
                          max_iter=1000, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"Accuracy with lr={lr}: {acc:.4f}")
In a text classification model, the interactions between words and their representations are crucial. For example, in a recurrent neural network (RNN), a hidden state is updated as each word of a sentence is read, which lets the model capture longer-range dependencies in the text.
Python code
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense, Embedding
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
# Example corpus: two headlines with binary labels (toy dataset)
corpus = ["How Tech Giants Cut Corners to Harvest Data for AI",
          "What to Know About Tech Companies Using AI to Teach Their Own AI"]
# Tokenize the text and pad all sequences to the same length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
sequences = tokenizer.texts_to_sequences(corpus)
max_sequence_length = max(len(seq) for seq in sequences)
padded_sequences = pad_sequences(sequences, maxlen=max_sequence_length)
# The Embedding layer maps word indices to dense vectors; the LSTM reads them in order
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=100, input_length=max_sequence_length))
model.add(LSTM(units=64))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# One binary label per headline
labels = np.array([0, 1])
model.fit(padded_sequences, labels, epochs=10, batch_size=1, verbose=1)
# Epoch 1/10
# 2/2 [==============================] - 2s 16ms/step - loss: 0.6984 - accuracy: 0.5000
# Epoch 2/10
# 2/2 [==============================] - 0s 12ms/step - loss: 0.6859 - accuracy: 0.5000
# Epoch 3/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.6765 - accuracy: 1.0000
# Epoch 4/10
# 2/2 [==============================] - 0s 11ms/step - loss: 0.6670 - accuracy: 1.0000
# Epoch 5/10
# 2/2 [==============================] - 0s 15ms/step - loss: 0.6554 - accuracy: 1.0000
# Epoch 6/10
# 2/2 [==============================] - 0s 11ms/step - loss: 0.6413 - accuracy: 1.0000
# Epoch 7/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.6229 - accuracy: 1.0000
# Epoch 8/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5988 - accuracy: 1.0000
# Epoch 9/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5669 - accuracy: 1.0000
# Epoch 10/10
# 2/2 [==============================] - 0s 10ms/step - loss: 0.5244 - accuracy: 1.0000
To simplify models and make them more effective, dimensionality-reduction techniques can be applied. For example, in a text classification model based on convolutional neural networks (CNNs), pooling layers reduce the size of the intermediate text representations, keeping only the features most important for the classification task (a minimal sketch of such a pooling layer is given after the SVD example below). The same idea can be applied directly to sparse bag-of-words features, as in the following example, resulting in models that process large volumes of text more efficiently and with better interpretability.
Python code
from sklearn.decomposition import TruncatedSVD
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
# Load the dataset and build TF-IDF features
newsgroups = fetch_20newsgroups(subset='all')
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
# Truncated SVD (latent semantic analysis) projects the sparse TF-IDF matrix onto 100 dimensions
svd = TruncatedSVD(n_components=100)
X_reduced = svd.fit_transform(X)
print("Dimensions before reduction:", X.shape)
print("Dimensions after reduction:", X_reduced.shape)
# Dimensions before reduction: (18846, 173762)
# Dimensions after reduction: (18846, 100)
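The pooling idea mentioned for CNN-based text classifiers can be sketched as follows: a global max-pooling layer collapses the per-position feature maps produced by a convolution into a single fixed-size vector. The vocabulary size, sequence length and layer sizes below are arbitrary toy values, and the model is only built and summarized, not trained.
Python code
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

vocab_size = 5000          # toy vocabulary size (assumption)
max_sequence_length = 50   # toy sequence length (assumption)

model = Sequential()
model.add(Embedding(input_dim=vocab_size, output_dim=100, input_length=max_sequence_length))
# The convolution extracts local n-gram features: one 64-dimensional feature vector per position
model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
# Global max pooling keeps only the strongest activation of each filter,
# reducing the whole sequence of feature vectors to a single 64-dimensional vector
model.add(GlobalMaxPooling1D())
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()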
The theories discussed offer a solid mathematical and conceptual framework for understanding how deep learning models work. Drawing on fundamental principles such as linear algebra, calculus and probability theory, they enable the construction of complex models capable of learning from data and performing challenging tasks such as pattern recognition in images and natural language processing. Furthermore, tools such as perturbation theory and effective theory allow models to be analyzed and improved continuously, enabling refined adjustments and incremental progress.
However, deep learning models often become black boxes and are difficult to fully interpret. This opacity can limit understanding of how and why models make certain decisions. Furthermore, the application of these theories in practical scenarios generally requires advanced knowledge in mathematics and programming, which can be a barrier for many professionals interested in using these techniques in their areas of expertise.