\chapter*{Preface}
\addcontentsline{toc}{chapter}{Preface}
This book was initially developed by Paul Wessel in support of an introductory course in statistics
and data analysis taken by many of our undergraduate majors and some graduate students in the Department of
Earth Sciences in the School of Ocean and Earth Science and Technology at
the University of Hawaii at M\={a}noa. Over the years, he
expanded the material to support undergraduate students in the broad area of the natural sciences
anywhere. There were several goals in designing this book:
\begin{enumerate}
\item We wanted to introduce students of science and environmental engineering to some
of the most common methods used in the examination and analysis of simple data sets
encountered in the sciences. By learning these techniques your data analysis tool
chest will expand, preparing you for further learning when you need it.
\item We sought to fill in many of the intermediate steps in derivations that traditional
textbooks will skip. Thus, it should be much easier to trace these derivations since there
are no annoying messages of the type ``the derivation has been left as an exercise for the reader.''
The more elaborate derivations also help potential instructors using this book to present such
material at the level of detail they require.
\item We wished the book to be affordable. Publishing it as a digital book directly by the authors
means it costs a fraction of a traditional, professionally produced textbook. This method also enables timely updates
and easy corrections, at no extra cost to the purchaser. Of course, the flip side of this decision
is that the book has not benefited from the help of professional editors and graphics artists
(but it \emph{has} been extensively reviewed by other scientists familiar with the methodologies).
\item We hoped to make the book suitable for introductory college-level courses in data analysis and statistics, with
examples of analysis using real data sets. Thus, there is a
variety of assignments at the end of each chapter, for which data sets are available in the repository.
\end{enumerate}
As a science student or curious bystander, why should you consider this book? While many
general answers may be given, including the relatively low cost, some are particularly relevant:
\begin{enumerate}
\item The natural and environmental sciences are very data intensive, with huge quantitative data sets being collected
by improved instrumentation and made readily accessible over the Internet.
\item The job market requires quantitative skills, in particular for those pursuing a career in the natural
or environmental sciences. You will be dead in the water if you cannot juggle data to some extent.
\item All sciences need reproducibility of analyses for the testing of relevant hypotheses.
This concept goes to the heart of what science and the scientific method are all about.
\item The increased sophistication of newer instruments is transforming visual
characterizations into numbers that need to be analyzed quantitatively, hence new methods of analyses
continue to be developed.
\end{enumerate}
What is the target audience for this book? There are numerous candidates,
and we recommend the book specifically to:
\begin{enumerate}
\item Anyone ever planning to analyze data of any kind. This is of course a pretty broad statement but we believe it is true.
Once you get into the habit of thinking and working quantitatively, there is no going back.
\item Anyone who wants to be prepared for a changing job market, considering that you are competing for
opportunities with other students who likely have been exposed to similar material.
\item Budding scientists, engineers and technical personnel, especially those fearful of mathematics. Many students
tend to avoid courses and treatises that expose them to mathematics, thus limiting what they
can achieve as scientists. We hope, by showing all the steps in the derivations and presenting the coupling of
mathematics to data sets, that this book will help alleviate such fears.
\end{enumerate}
The main purpose of this introductory data analysis book is to prepare you for facing and dealing with data, their limitations and
the methods of analysis that may be most suitable for different types of data. We hope to achieve that goal by
a multipronged series of attacks:
\begin{enumerate}
\item Expose you to many different data analysis techniques and thus broaden your horizon. No book can cover
everything but being aware of other approaches allows you to pursue alternative
methodologies when the need arises.
\item Make you appreciate why you should fully understand a technique's nuts and bolts before running a ``black
box'' operator, such as most typical software packages, on your data.
\item Make you comfortable with applied mathematics at an intermediate level. The mathematics we employ
in this book is mostly algebra, trigonometry and calculus, with an introduction to matrix algebra.
There are numerous data analysis techniques that are simply exercises in applied matrix algebra.
\item Make you comfortable with the tools of the trade, such as MATLAB, R, Python, or Octave for your data analysis needs
rather than depending on ``business software'', such as spreadsheets.
However, this book is not a recipe collection of algorithms in these languages, but rather presents equations and the
assumptions used to pursue specific analysis goals or tests. The author's data analysis website
(\url{http://www.soest.hawaii.edu/pwessel/DA}) contains links to all the
data sets used in this book as well as any MATLAB example code discussed.
\end{enumerate}
Finally, data analysis is a very broad subject and the authors are certainly not experts in all the available
techniques. Our goal is to give you a flavor
of what is available, show you how to find out more, and enable you to make sensible decisions on how
to approach and analyze your data.
This book is based on Paul Wessel's collection of course notes, which was inspired by several sources. Here are the ones he
found most useful:
\begin{enumerate}
\item Course notes from a class in data analysis at Columbia University during his graduate school days, developed by Doug Martinson
at Lamont-Doherty Earth Observatory, provided a clear introduction to spectral analysis. His willingness to
let us use some of his early material in this book is gratefully acknowledged.
\item John C. Davis' textbook, \emph{Statistics and Data Analysis in Geology}, 3rd edition, available from John Wiley and Sons.
His textbook has numerous useful problems complete with data sets that are now in the public domain
(\url{http://www.kgs.ku.edu/Mathgeo/Books/Stat}). A few of these are used in this book.
\item \emph{Exploratory Data Analysis} by John W. Tukey, available from Addison-Wesley Publishing Company, outlines the basics
of exploratory data analysis.
\item \emph{An Introduction to Error Analysis} by John R. Taylor, available from University Science Books, reviews the
various rules for the propagation of errors in compound expressions.
\item \emph{Numerical Recipes} by Press et al., available from Cambridge University Press in multiple language flavors,
clarifies many statistical procedures with a dose of nerd humor.
\item \emph{Robust Regression and Outlier Detection} by Peter J. Rousseeuw and Annick M. Leroy, available from John Wiley and Sons, is a classic
for understanding why using robust methods is imperative when dealing with actual data.
\item \emph{Applied Regression Analysis} by Norman R. Draper and Harry Smith, also available from John Wiley and Sons, deals with
conventional least squares regressions.
\item \emph{Developments in Geomathematics} by Frederik P. Agterberg, available from Elsevier, deals with
a variety of topics, including multiple regressions.
\item Paul Wessel's cumulative experience with analyzing marine geophysical and plate kinematic data since 1985.
\end{enumerate}
Depending on the time available, this book may cover more material than can be presented in a single course. However,
many of the topics may be of interest to you at a later stage, hence we hope the book may serve a valuable
purpose as a reference.
Paul is grateful to his SOEST colleagues, both past and present. In the early years, the book benefited from
feedback from faculty who used the book in their courses (Fred Duennebier, Neil Frazier, Julia Morgan, Rob Dunn, Garrett Apuzen-Ito, and Cecily Wolfe). They provided valuable suggestions and error corrections. Likewise, numerous former students have
also contributed by pointing out typographical errors in the equations, odd or foul language,
stale jokes, or confusing illustrations. Starting with the 2023 edition these notes will be maintained by the
faculty of the Department of Earth Sciences in SOEST, University of Hawaii, and maintenance of the open
repository on GitHub will be a community effort. Because the repository is public, anyone may use it;
all we ask is that readers notify us of any errors, unclear sections, and similar problems they find.
We are also grateful to our former secretary, Evelyn Norris,
who taught herself enough \LaTeX\ to help typeset large portions of this manuscript from printouts of clumsy Microsoft Word documents
with obsolete equation editing plug-ins. As software comes and goes, \LaTeX\ remains solid and post-processing tools allow
for new output formats, such as the digital version you are reading. All illustrations are data-driven, original creations using the
Generic Mapping Tools (GMT; \url{https://www.generic-mapping-tools.org}) via individual scripts that produce
publication-quality PDF illustrations. The entire processing workflow from \LaTeX\ and GMT source code to PDF is automated via a UNIX makefile.
Because it is relatively easy to update digital books as frequently as required, we hope to hear from you should you find any remaining errors
in the text or graphics. By bringing them to our attention (\url{mailto:earth-da-book@soest.hawaii.edu})
they can be corrected and updated as soon as possible. Thank you for your interest in this book.
\vspace{2\baselineskip}
Dept. of Earth Sciences, SOEST, UHM, \DAmonth\ \DAyear.