\documentclass[12pt, letterpaper]{article}
\title{Problem Set 10: Fractal Dimension}
\author{Ken Sheedlo}
\usepackage[pdftex]{graphicx}
\usepackage[margin=1.0in]{geometry}
\usepackage{verbatim}
\usepackage{amsfonts}
\usepackage{caption}
\usepackage{subcaption}
\begin{document}
\maketitle{}
\section*{1. Box Counting Algorithm}
To experiment with fractal dimension, I first implemented the box counting
algorithm for computing capacity dimension. My implementation follows the naive
approach with one critical optimization: instead of storing a matrix of boolean
values and counting the cells that the attractor fills, I store the set of
discretized points visited by the attractor and count the cardinality of that
set. The algorithm is as follows:
\begin{description}
\item[Algorithm] \textsc{Hashed Box Counting}
\item[Given] An attractor $A$ in $n$ dimensions; a ball size $\varepsilon$
\item[Return] $N(\varepsilon)$
\item[Procedure] \ \vspace{0em}
\begin{enumerate}
\item Let the set $X = \emptyset$.
\item For each point $p \in A$
\begin{itemize}
\item Discretize $p$ in space. Choose $x = (x_1, x_2, \ldots, x_n)$ where
$x_i = \lfloor p_i / \varepsilon\rfloor \in \mathbb{Z}$.
\item Hash $x$ into $X$. That is, let $X = X \cup \{x\}$. This is an
expected $O(1)$ operation for a \emph{hash set}.
\end{itemize}
\item Return $|X|$.
\end{enumerate}
\end{description}
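In Python, the core of this procedure looks roughly like the following. This is
a minimal sketch rather than my actual implementation; the names are
illustrative.
\begin{verbatim}
import math

def hashed_box_count(points, eps):
    """Count the occupied eps-boxes by hashing discretized coordinates."""
    occupied = set()
    for p in points:
        # Discretize each coordinate: x_i = floor(p_i / eps).
        cell = tuple(math.floor(c / eps) for c in p)
        occupied.add(cell)  # expected O(1) insertion into a hash set
    return len(occupied)    # this is N(eps)
\end{verbatim}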
It is easy to demonstrate that this algorithm runs in $O(|A|)$ time and consumes
$O(|A|)$ memory. This is a vast improvement over the naive algorithm, which is
$O(m^n)$ in running time and memory (where $m$ is the number of cells on a side
in the grid). By using a hash set instead of a large matrix, I can let $\varepsilon$
go as small as necessary, down to machine $\varepsilon$, without worrying about
running time or memory constraints. As an added bonus, hashed box counting is
trivially parallelizable. Note that one needs estimates of $N(\varepsilon)$ at
several values of $\varepsilon$ before one can estimate the capacity dimension
$d_{cap}$. Since the resource requirements of hashed box counting are relatively
light, one can use a parallel map such as the one provided by Python's
\texttt{multiprocessing.Pool} class to divide the work among any number of
simultaneous processes, as in the sketch below. I was able to map a vector of
$\varepsilon$ values to the corresponding $N(\varepsilon)$ across 8 simultaneous
processes on 8 virtual CPU cores, which sped up the calculation significantly.
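A minimal sketch of that parallel map, reusing the \texttt{hashed\_box\_count}
sketch above (again illustrative rather than the exact implementation):
\begin{verbatim}
from functools import partial
from multiprocessing import Pool

def count_all(points, epsilons, workers=8):
    """Map each eps to N(eps) across several worker processes."""
    with Pool(processes=workers) as pool:
        # partial binds the attractor; each worker receives one eps.
        return pool.map(partial(hashed_box_count, points), epsilons)
\end{verbatim}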
\section*{2. Capacity Dimension from Experimental Data}
\subsection*{a. Embedded Lorenz Attractor}
Running my program on the embedded Lorenz attractor from PS9 gave
$d_{cap} = 1.580324$. The results are shown in Figure 1.
\begin{center}
\includegraphics[scale=0.6]{ps10_2a.png}
\\
Figure 1: Capacity dimension for the embedded Lorenz attractor.
\end{center}
Note that this plot and capacity dimension were generated using the zeroth and
fifth columns of the embedding. These are the same columns that I used to render
Figure 6 in PS9. To compute the capacity dimension, I plotted $N(\varepsilon)$
for $\varepsilon$ in powers of 2 from $2^{10}$ down to $2^{-18}$. I then chose a
lower scaling bound of 0.08 and an upper scaling bound of 0.8, and took the
linear regression of $\log N(\varepsilon)$ versus $\log\varepsilon$ inside the
scaling region. The regression gives
$\log N(\varepsilon) = -d_{cap}\log\varepsilon + c$, where $c$ is a constant, so
$d_{cap}$ is the negative of the fitted slope.
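The fit itself is short. A sketch, assuming the $\varepsilon$ values and the
corresponding counts come from the routines above (the default bounds are the
scaling region chosen by eye):
\begin{verbatim}
import numpy as np

def capacity_dimension(epsilons, counts, lo=0.08, hi=0.8):
    """Fit log N(eps) vs. log eps inside the scaling region."""
    eps = np.asarray(epsilons, dtype=float)
    N = np.asarray(counts, dtype=float)
    mask = (eps >= lo) & (eps <= hi)
    slope, c = np.polyfit(np.log(eps[mask]), np.log(N[mask]), 1)
    return -slope  # d_cap is the negative of the fitted slope
\end{verbatim}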
\subsection*{b. Actual Lorenz Attractor}
I ran my program on the original trajectory that I used to produce the embedding
and computed $d_{cap}$ to be 1.796762. For this calculation, I parameterized the
Lorenz system with $a = 16$, $r = 45$, $b = 4$ and took 400,000 steps of RK4 with
$h = 0.0001$. I discarded the first 100,000 steps to make sure the transient
would not skew the result. Figure 2 shows the results.
\begin{center}
\includegraphics[scale=0.6]{ps10_2b.png}
\\
Figure 2: Capacity dimension for the original trajectory.
\end{center}
This result is significantly higher than the result from part (a), suggesting
that some dimensionality may have been lost in the embedding. It is not surprising
that the capacity dimension would appear lower on a two-dimensional projection:
points that are not close together on the actual three-dimensional attractor can
land close together in the projection. In terms of resource usage, each
calculation took about 10 seconds of wall-clock time, or 80 seconds of CPU time.
The calculation for the embedding took around 300 MB of memory, and the
calculation for the original trajectory took approximately 1 GB across all 9
processes. The apparently high memory usage is explained by Python's
multiprocessing model, which forks separate processes, each with its own copy of
the data, rather than sharing memory among threads.
Memory usage was higher for the original trajectory because I precomputed
1,000,000 timesteps of Runge-Kutta for the following calculation. If I had not
done this, memory usage would be about the same for the embedding and the
original.
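For reference, here is a minimal sketch of the integration, assuming standard
fourth-order Runge-Kutta and the Lorenz parameters above; the names are again
illustrative rather than taken from my implementation:
\begin{verbatim}
import numpy as np

def lorenz(s, a=16.0, r=45.0, b=4.0):
    """Lorenz vector field with the parameters used above."""
    x, y, z = s
    return np.array([a * (y - x), r * x - y - x * z, x * y - b * z])

def rk4_step(f, s, h=0.0001):
    """One fourth-order Runge-Kutta step of size h."""
    k1 = f(s)
    k2 = f(s + 0.5 * h * k1)
    k3 = f(s + 0.5 * h * k2)
    k4 = f(s + h * k3)
    return s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
\end{verbatim}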
\subsection*{c. Longer Lorenz Trajectory}
For this calculation, I used a longer trajectory. I took 1,000,000 steps of
Runge-Kutta and found $d_{cap} = 1.842249$. As in part (b), I discarded the
first 100,000 steps so the transient would not have a significant effect on the
calculation. The results are shown in Figure 3.
\begin{center}
\includegraphics[scale=0.6]{ps10_2c.png}
\\
Figure 3: Capacity dimension for a longer trajectory.
\end{center}
This result is somewhat higher than the result from the shorter run, and it is
more accurate: the longer run is a better approximation of the real attractor.
A cross section of the longer run will more closely resemble the Cantor-like
structure we expect to see in the real attractor, whereas a cross section of the
shorter run will be less filled out.
\section*{3. Thought Experiments}
As previously discussed, hashed box counting is already a significant
improvement over naive box counting. In fact, hashed box counting is
asymptotically optimal in running time. Notice that any algorithm that attempts
to characterize the shape of an attractor must consider each point on the
attractor at least once, so any algorithm for finding $N(\varepsilon)$ requires
$\Omega(|A|)$ time. Hashed box counting meets this lower bound.
\end{document}