AnjumeeJeba/6G-ML-Blockage

6G THz Blockage and micromobility detection with machine learning

1. Importing Libraries

import arff
import torch
import copy
import numpy as np
import pandas as pd
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc
from sklearn.model_selection import train_test_split

from torch import nn, optim
import torch.nn.functional as F
  1. arff: Reads datasets in the ARFF format (the WEKA format, common in machine learning). The THz micromobility dataset used here is stored as ARFF files. The dictionary-style API used below matches the liac-arff package.
  2. torch: A deep learning library for creating and training neural networks.
  3. copy: Allows deep copying of objects. This might be used later for duplicating models, data structures, etc.
  4. numpy: Fundamental library for numerical computations.
  5. pandas: For handling tabular data in DataFrames.
  6. seaborn and matplotlib: Visualization libraries to create graphs and charts.
  7. sklearn: Tools for machine learning. Only train_test_split, which splits data into training and testing sets, is imported here.
  8. torch.nn, optim, and F: Submodules of PyTorch for defining models (nn), optimization algorithms (optim), and stateless functional versions of layers, activations, and losses (F, i.e. torch.nn.functional).

2. Matplotlib Inline Configuration

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

  • %matplotlib inline: Ensures that matplotlib plots are displayed directly inside the Jupyter notebook.
  • %config InlineBackend.figure_format = 'retina': Renders plots at a higher resolution for better clarity.

3. Seaborn Plot Style and Palette

sns.set(style='whitegrid', palette='muted', font_scale=1.2)
HAPPY_COLORS_PALETTE = ["#01BEFE", "#FFDD00", "#FF7D00", "#FF006D", "#ADFF02", "#8F00FF"]
sns.set_palette(sns.color_palette(HAPPY_COLORS_PALETTE))

sns.set(): Customizes the appearance of Seaborn plots with:

  • style='whitegrid': Adds a grid background to plots.
  • palette='muted': Sets muted colors as the default palette.
  • font_scale=1.2: Scales up font sizes in plots for better readability.

HAPPY_COLORS_PALETTE: A custom color palette. sns.set_palette() applies these colors to subsequent plots, replacing the muted palette chosen on the first line.

4. rcParams for Plot Size

rcParams['figure.figsize'] = 12, 8

Configures the default size of all plots to be 12 inches wide and 8 inches tall.

5. Setting Random Seed

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

RANDOM_SEED = 42: A fixed value for reproducibility. Ensures that results remain consistent across runs.

np.random.seed(): Seeds the random number generator for NumPy operations.

torch.manual_seed(): Seeds PyTorch's random number generator.

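Together, these make the random operations in data preprocessing and training repeatable from run to run. Some GPU operations can still be non-deterministic even with a fixed seed.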
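As a quick check of what seeding provides, the sketch below reseeds NumPy and confirms that the same draws come back. It uses only NumPy, and the variable names first_draw and second_draw are illustrative rather than part of the notebook.

```python
import numpy as np

RANDOM_SEED = 42

# Seed, draw three numbers, then reseed and draw again.
np.random.seed(RANDOM_SEED)
first_draw = np.random.rand(3)

np.random.seed(RANDOM_SEED)
second_draw = np.random.rand(3)

# Identical seeds produce identical sequences.
print(np.array_equal(first_draw, second_draw))
```

The same idea applies to torch.manual_seed for PyTorch's generator.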

6. Output Explanation

output: <torch._C.Generator at 0x10a91bb30>

The output <torch._C.Generator at 0x10a91bb30> is the return value of torch.manual_seed(RANDOM_SEED), which Jupyter displays because it is the last expression in the cell. It is an object that represents the state of PyTorch's random number generator (the memory address will differ between runs). This is normal and indicates that the seed has been successfully set.

7. Importing the ARFF File

import arff

# Load the ARFF file
with open('Train70%alloriginal.arff', 'r') as f:
    train_dataset = arff.load(f)

with open('Test30%alloriginal.arff', 'r') as f:
    test_dataset = arff.load(f)
  • The arff library is being used to read ARFF files, which are typically structured data files used in machine learning (e.g., WEKA datasets).
  • arff.load(f) reads and parses the contents of the ARFF file into Python objects.

8. Extracting Data and Attributes

train_data = train_dataset['data']
train_attributes = train_dataset['attributes']

test_data = test_dataset['data']
test_attributes = test_dataset['attributes']

The object returned by arff.load is a dictionary with keys such as data and attributes:

  • data: Contains the actual dataset rows (features and labels).
  • attributes: Describes the column names and types in the dataset (e.g., features and labels).

Printing Data

Print the first 14 rows of the training data:

for row in train_data[:14]:
    print(row)

This loop iterates through the first 14 rows of the training data and prints them. The output shows that:

  • Each row contains numeric values corresponding to the features of a trace.
  • The last value in each row is a label: 1 indicates Oscillation and 2 indicates Nonblocked.
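To make the row layout concrete, here is a small sketch that splits each row into its feature values and trailing label. The parsed dictionary is a hand-made stand-in for the object returned by arff.load, with invented attribute names and values; only the data/attributes layout is taken from the description above.

```python
# Hypothetical stand-in for a parsed ARFF file (names and values invented).
parsed = {
    "attributes": [("t0", "NUMERIC"), ("t1", "NUMERIC"), ("Target", "NUMERIC")],
    "data": [
        [-52.1, -52.0, 1],
        [-48.3, -48.4, 2],
    ],
}

labels = []
for row in parsed["data"]:
    features, label = row[:-1], row[-1]  # the last value is the class label
    labels.append(label)
    print(len(features), label)
```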

9. Loading ARFF Files

The function load_arff_to_dataframe:

  • Reads an ARFF file using the arff module.
  • Extracts the attributes (column names) and data (rows) from the ARFF file.
  • Converts these into a Pandas DataFrame, with attribute names as column headers.
  • Optionally ensures the target column (e.g., "Target") is treated as a categorical variable, which is useful for classification tasks.

attribute_names = [attr[0] for attr in dataset['attributes']]

Extracts column names from the attributes field of the ARFF file.

data = [list(row) for row in dataset['data']]

Converts the rows into a list of lists, making them compatible with Pandas

df = pd.DataFrame(data, columns=attribute_names)

Creates a DataFrame with extracted column names and data

print(train.head())
print(test.head())

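Displays the first 5 rows of both datasets. Here train and test are presumably the DataFrames returned by load_arff_to_dataframe for the training and test ARFF files.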
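The steps above can be assembled into a minimal sketch like the following. This is one plausible reconstruction, not necessarily the notebook's exact function: the helper arff_dict_to_dataframe and the target_column parameter name are assumptions introduced here, and the arff import is assumed to be the liac-arff package.

```python
import pandas as pd


def arff_dict_to_dataframe(dataset, target_column=None):
    """Build a DataFrame from a dict with 'attributes' and 'data' keys."""
    attribute_names = [attr[0] for attr in dataset["attributes"]]
    data = [list(row) for row in dataset["data"]]
    df = pd.DataFrame(data, columns=attribute_names)
    if target_column is not None and target_column in df.columns:
        # Treat the label as categorical for classification tasks.
        df[target_column] = df[target_column].astype("category")
    return df


def load_arff_to_dataframe(file_path, target_column=None):
    """Read an ARFF file from disk and convert it to a DataFrame."""
    import arff  # assumed to be liac-arff; imported lazily

    with open(file_path, "r") as f:
        dataset = arff.load(f)
    return arff_dict_to_dataframe(dataset, target_column)


# Tiny hand-made example (names and values invented for illustration).
example = {
    "attributes": [("t0", "NUMERIC"), ("t1", "NUMERIC"), ("Target", "NUMERIC")],
    "data": [[-52.1, -52.0, 1], [-48.3, -48.4, 2]],
}
example_df = arff_dict_to_dataframe(example, target_column="Target")
print(example_df.dtypes)
```

With a function of this shape, a call such as load_arff_to_dataframe('Train70%alloriginal.arff') would produce the train DataFrame.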

10. Dataframe Visualization

# Concatenate train and test DataFrames
df = pd.concat([train, test])

# Shuffle the DataFrame (shuffle rows randomly)
df = df.sample(frac=1.0)

# Check the shape of the combined and shuffled DataFrame
print(df.shape)
  • pd.concat([train, test]): This combines train and test vertically (row-wise).
  • .sample(frac=1.0): Shuffles all rows randomly while keeping all data (frac=1.0 means 100% of rows are sampled).
  • df.shape: Displays the shape of the resulting DataFrame.

Output: (480, 1501)

This means the train and test DataFrames combined have 480 rows (336 training + 144 test) and 1501 columns, which matches expectations based on our dataset structure.

# Use only the test DataFrame (a single-element concat leaves the data unchanged)
df = pd.concat([test])

# Shuffle the rows randomly
df = df.sample(frac=1.0)

# Check the shape of the shuffled DataFrame
print(df.shape)
  • pd.concat([test]): Only includes test (no actual concatenation because test is the sole DataFrame).
  • .sample(frac=1.0): Shuffles the rows in the test DataFrame.
  • df.shape: Displays the shape of the resulting DataFrame.

Output: (144, 1501)

This reflects that the test DataFrame alone has 144 rows and 1501 columns, consistent with our dataset structure.

# Use only the train DataFrame (a single-element concat leaves the data unchanged)
df = pd.concat([train])

# Shuffle the rows randomly
df = df.sample(frac=1.0)

# Check the shape of the shuffled DataFrame
print(df.shape)

Output: (336, 1501)

This reflects that the train DataFrame alone has 336 rows and 1501 columns, again consistent with our dataset structure.

print(f"Train shape: {train.shape}")  
print(f"Test shape: {test.shape}")    

# Concatenate train and test DataFrames
df = pd.concat([train, test], axis=0)  

# Shuffle the combined DataFrame (randomizes the row order)
df = df.sample(frac=1.0, random_state=42)  

# Check the shape of the concatenated and shuffled DataFrame
print(f"Combined DataFrame shape: {df.shape}")  

# Display the first 5 rows of the shuffled DataFrame
print(df.head())

The first two print statements check that the shapes of train and test match the expected dimensions. Output:

Train shape: (336, 1501)
Test shape: (144, 1501)
Combined DataFrame shape: (480, 1501)
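One detail worth knowing when stacking DataFrames this way is that pd.concat keeps each frame's original index, so row labels can repeat. The sketch below illustrates this on two tiny invented frames (train_demo and test_demo are stand-ins, not the real data) and shows ignore_index=True as one way to renumber rows before shuffling.

```python
import pandas as pd

# Small stand-ins for the train and test DataFrames (values invented).
train_demo = pd.DataFrame({"x": [0.1, 0.2, 0.3], "target": [1, 2, 1]})
test_demo = pd.DataFrame({"x": [0.4, 0.5], "target": [2, 1]})

# Plain concat keeps each frame's own index, so labels 0 and 1 appear twice.
stacked = pd.concat([train_demo, test_demo], axis=0)
print(stacked.index.is_unique)  # False

# ignore_index=True renumbers rows 0..n-1 before shuffling.
combined = pd.concat([train_demo, test_demo], axis=0, ignore_index=True)
shuffled = combined.sample(frac=1.0, random_state=42)

# Shuffling reorders rows but keeps every row exactly once.
print(shuffled.shape)  # (5, 2)
print(sorted(shuffled.index) == list(range(len(combined))))  # True
```

Whether the duplicate labels matter depends on later steps; they only cause trouble when rows are selected by index label.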

11. Definition of classes

CLASS_OSCILLATION = 1

class_names = ['Oscillation','Nonblocked']

CLASS_OSCILLATION = 1:

A constant is defined to represent the "Oscillation" class with a value of 1. This is useful if you want to refer to the class in your code later without hardcoding the number.

class_names = ['Oscillation', 'Nonblocked']:

A list of class names is defined. These are human-readable labels for the numeric target values (1 and 2). They are used to make the plot more interpretable.

new_columns = list(df.columns)
new_columns[-1] = 'target'
df.columns = new_columns

This renames the last column of your DataFrame to target, which is presumably the column containing the class labels (1 or 2).

print(df.target.value_counts())

This method counts the occurrences of each unique value in the target column.

target
2    240
1    239
Name: count, dtype: int64

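There are 240 examples of class 2 (Nonblocked) and 239 examples of class 1 (Oscillation). These counts sum to 479, one fewer than the 480 rows reported earlier; value_counts() excludes missing values by default, so a row with a missing label would be one possible explanation for the gap.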

ax = sns.countplot(x=df.target)
ax.set_xticklabels(class_names);

This creates a bar plot using Seaborn. Each bar represents the count of rows for a particular class (1 or 2) in the target column.
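When reading class counts like these, it can help to check whether any labels are missing, since value_counts() skips missing values unless told otherwise. The following sketch uses invented labels (raw_labels) and a hypothetical label_to_name mapping to show the difference.

```python
import pandas as pd

class_names = ["Oscillation", "Nonblocked"]

# Map numeric labels 1 and 2 to their names (hypothetical helper mapping).
label_to_name = {1: class_names[0], 2: class_names[1]}

# Invented labels, including one missing value, for illustration only.
raw_labels = [1, 2, 2, None, 1, 2]
named = pd.Series([label_to_name.get(v) for v in raw_labels], name="target")

counts = named.value_counts()                  # missing label is left out
counts_with_na = named.value_counts(dropna=False)  # missing label is counted
print(counts)
print(counts_with_na)
```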

12. Time Series Class Plotting Function

The plot_time_series_class function plots a time series for a given class, with a smoothed version of the series (rolling mean) and a shaded region that represents the variability (confidence interval) around the smoothed curve.

def plot_time_series_class(data, class_name, ax, n_steps=10):

  • data: A single time-series sequence (e.g., one row or column from your dataset). This represents a specific example of the time-series values for one class (e.g., Oscillation).
  • class_name: The name of the class being plotted, such as "Oscillation" or "Nonblocked". This will be used to label the plot.
  • ax: The matplotlib Axes object where the plot will be drawn.
  • n_steps: The number of steps for the rolling window. This determines how smooth the rolling average and standard deviation will appear.

time_series_df = pd.DataFrame(data)

Converts the input time series into a DataFrame, making it easier to calculate rolling statistics. For example, if you pass a row of time-series data for "Oscillation," this step creates a structured DataFrame with one column containing the series.

Example from our dataset: If our row for "Oscillation" looks like this:

[-52.062580, -52.060561, -52.058553, ..., -52.322579]

time_series_df will look like:

       0
0 -52.062580
1 -52.060561
2 -52.058553
...
smooth_path = time_series_df.rolling(n_steps).mean()

Computes the rolling mean over n_steps. This smooths the raw time-series data, reducing noise and revealing trends in "Oscillation" or "Nonblocked." Example: With n_steps=10, each value in the smoothed series is the average of the current value and the 9 values before it, so the first 9 entries are NaN because the window is not yet full. This helps us see the trend more clearly.

path_deviation = 2 * time_series_df.rolling(n_steps).std()

Calculates the rolling standard deviation over n_steps and multiplies it by 2. This quantifies variability in the time series and sets the half-width of the shaded band: a ±2 standard deviation band, used here as an informal confidence interval. Example: If the rolling standard deviation is 0.5, then path_deviation will be 1.0, and the band will extend ±1.0 from the smoothed line.

under_line = (smooth_path - path_deviation)[0]
over_line = (smooth_path + path_deviation)[0]

Computes the lower (under_line) and upper (over_line) bounds of the band by subtracting path_deviation from, or adding it to, the smoothed series. The [0] selects the single column of the DataFrame as a Series. For Oscillation: under_line and over_line mark the lower and upper edges of the range within which the oscillation values typically vary over time.
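The arithmetic is easier to follow on a tiny series. The sketch below applies the same lines to five invented values with n_steps=3: the first full window (1, 2, 3) has mean 2 and sample standard deviation 1, so the band there spans roughly 0 to 4.

```python
import pandas as pd

# Short invented series, just to show the arithmetic.
time_series_df = pd.DataFrame([1.0, 2.0, 3.0, 4.0, 5.0])
n_steps = 3

smooth_path = time_series_df.rolling(n_steps).mean()
path_deviation = 2 * time_series_df.rolling(n_steps).std()

under_line = (smooth_path - path_deviation)[0]
over_line = (smooth_path + path_deviation)[0]

# The first n_steps - 1 entries are NaN because the window is not yet full.
print(smooth_path[0].tolist())
print(under_line.tolist())
print(over_line.tolist())
```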

ax.plot(smooth_path, linewidth=2)

Plots the smoothed time series as a line on the graph. This gives a clean view of the overall trend in the data.

ax.fill_between(
    path_deviation.index,
    under_line,
    over_line,
    alpha=.125
)
ax.set_title(class_name)
  • Fills the region between under_line and over_line to visualize the variability (or confidence interval) around the smoothed series. For Oscillation: This shaded area shows how much the oscillation values deviate from the trend.
  • Sets the title of the plot to the class name (e.g., "Oscillation" or "Nonblocked"). This helps identify which class the time series belongs to.
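Putting the pieces together, the function might look like the sketch below. The body simply assembles the lines explained above; the synthetic trace (a noisy sine wave around -52) and the Agg backend line are assumptions added so the example runs on its own outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_time_series_class(data, class_name, ax, n_steps=10):
    time_series_df = pd.DataFrame(data)

    # Rolling mean and a +/- 2 rolling standard deviation band.
    smooth_path = time_series_df.rolling(n_steps).mean()
    path_deviation = 2 * time_series_df.rolling(n_steps).std()

    under_line = (smooth_path - path_deviation)[0]
    over_line = (smooth_path + path_deviation)[0]

    ax.plot(smooth_path, linewidth=2)
    ax.fill_between(path_deviation.index, under_line, over_line, alpha=.125)
    ax.set_title(class_name)


# Synthetic trace, invented purely for illustration.
rng = np.random.default_rng(42)
trace = -52 + np.sin(np.linspace(0, 6 * np.pi, 200)) + rng.normal(0, 0.1, 200)

fig, ax = plt.subplots()
plot_time_series_class(trace, "Oscillation", ax)
```

In the notebook, data would instead be one row of the DataFrame with the target column removed.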