<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Romain NIO - Blog dealing with Data]]></title>
<link href="http://www.rnio.me/atom.xml" rel="self"/>
<link href="http://www.rnio.me/"/>
<updated>2016-03-12T13:12:04+01:00</updated>
<id>http://www.rnio.me/</id>
<author>
<name><![CDATA[Romain NIO]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[Build hadoop native librairies]]></title>
<link href="http://www.rnio.me/blog/2015/06/16/build-hadoop-native-librairies/"/>
<updated>2015-06-16T19:21:41+02:00</updated>
<id>http://www.rnio.me/blog/2015/06/16/build-hadoop-native-librairies</id>
<content type="html"><![CDATA[<p>The Hadoop native librairies are compiled for 32 bits plateforms. If you are using Hadoop on x64, you have probably been faced to the following issue :</p>
<pre><code> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
</code></pre>
<p>For performance reasons, it is better to recompile those libraries for your own platform.</p>
<p>It’s a good idea to compile on the same architecture as your Hadoop production platform, but of course avoid compiling on the production server itself. Not sure whether your Hadoop native libraries are compiled for a 32-bit platform? You can check with the following command:</p>
<pre><code>file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0
</code></pre>
<p>Here is the result:</p>
<pre><code>libhadoop.so.1.0.0: ELF 32-bit
</code></pre>
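<p>Depending on your Hadoop version (the command is available on 2.x), you can also ask Hadoop itself which native libraries it is able to load; each library is reported as true or false together with the path it was loaded from:</p>
<pre><code>hadoop checknative -a
</code></pre>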
<h2 id="download-source">Download Source</h2>
<p>Visit http://mirrors.ircam.fr/pub/apache/hadoop/common/ and find the source tarball of your Hadoop version. Download it:</p>
<pre><code>wget http://mirrors.ircam.fr/pub/apache/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
</code></pre>
<p>Install the build dependencies:</p>
<pre><code>sudo apt-get install cmake autoconf automake libtool gcc g++ make maven pkg-config zlib1g-dev libssl-dev openssl libcurl4-openssl-dev
</code></pre>
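<p>Before going further, it can be worth checking that the toolchain is actually in place:</p>
<pre><code>gcc --version
cmake --version
mvn -version
</code></pre>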
<p>Install protobuf :</p>
<pre><code>wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
gunzip protobuf-2.5.0.tar.gz
tar -xvf protobuf-2.5.0.tar
cd protobuf-2.5.0
sudo ./configure --prefix=/usr
sudo make
sudo make install
</code></pre>
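<p>Hadoop 2.4 requires exactly this protobuf version, so confirm the installation before compiling:</p>
<pre><code>protoc --version
</code></pre>
<p>It should print <code>libprotoc 2.5.0</code>.</p>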
<h2 id="compile-hadoop">Compile Hadoop</h2>
<p>Extract your source tarball:</p>
<pre><code>tar -xzf hadoop-2.4.1-src.tar.gz
</code></pre>
<p>Enter the source folder:</p>
<pre><code>cd hadoop-2.4.1-src/
</code></pre>
<p>Set your environment:</p>
<pre><code>export Platform=x64
</code></pre>
<p>Compile:</p>
<pre><code>mvn package -Pdist,native -DskipTests -Dtar
</code></pre>
<p>If you face issues while compiling, Google is your friend ;). If everything is OK, you will get this kind of output:</p>
<pre><code>[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5:27.684s
[INFO] Finished at: Wed Jul 02 19:33:51 CEST 2014
[INFO] Final Memory: 165M/834M
[INFO] ------------------------------------------------------------------------
</code></pre>
<p>You can find the libraries in this folder:</p>
<pre><code>cd ./hadoop-dist/target/hadoop-2.4.1/lib/native
</code></pre>
<p>We can see all the built libraries (“ls -lh”):</p>
<pre><code>-rw-r--r-- 1 hadoop hadoop 1.1M Jul 2 19:07 libhadoop.a
lrwxrwxrwx 1 hadoop hadoop 18 Jul 2 19:07 libhadoop.so -> libhadoop.so.1.0.0
-rwxr-xr-x 1 hadoop hadoop 650K Jul 2 19:07 libhadoop.so.1.0.0
-rw-r--r-- 1 hadoop hadoop 1.4M Jul 2 19:07 libhadooppipes.a
-rw-r--r-- 1 hadoop hadoop 421K Jul 2 19:07 libhadooputils.a
-rw-r--r-- 1 hadoop hadoop 373K Jul 2 19:07 libhdfs.a
lrwxrwxrwx 1 hadoop hadoop 16 Jul 2 19:07 libhdfs.so -> libhdfs.so.0.0.0
-rwxr-xr-x 1 hadoop hadoop 245K Jul 2 19:07 libhdfs.so.0.0.0
</code></pre>
<p>At this point, you can check the platform of the libraries:</p>
<pre><code>file libhadoop.so.1.0.0
</code></pre>
<p>This time the result looks good:</p>
<pre><code>libhadoop.so.1.0.0: ELF 64-bit LSB shared object, x86-64
</code></pre>
<p>Save them and archive the package:</p>
<pre><code>tar -cvzf hadoop-native-libraries-2.4.1.tgz *
</code></pre>
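<p>Optionally, record a checksum so you can verify the archive after copying it to each node:</p>
<pre><code>sha256sum hadoop-native-libraries-2.4.1.tgz
</code></pre>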
<h2 id="copy-librairies-on-your-cluster">Copy librairies on your cluster</h2>
<p>This step needs to be repeated on every namenode and datanode.
You just need to copy all those files into $HADOOP_HOME/lib/native (e.g. /usr/local/hadoop/lib/native):</p>
<pre><code>$ rsync hadoop-native-libraries-2.4.1.tgz <your_hadoop_production_server>:/usr/local/hadoop/lib/native/
</code></pre>
<p>Enter your Hadoop native library folder (e.g. /usr/local/hadoop/lib/native):</p>
<pre><code>$ cd $HADOOP_HOME/lib/native
</code></pre>
<p>Extract the archive:</p>
<pre><code>$ tar -xzf hadoop-native-libraries-2.4.1.tgz
</code></pre>
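<p>Once extracted, you can remove the tarball from the native folder:</p>
<pre><code>$ rm hadoop-native-libraries-2.4.1.tgz
</code></pre>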
<h2 id="configure-your-environment">Configure your environment</h2>
<p>You probably have a dedicated Unix user for your Hadoop cluster. Add these lines to its ~/.bashrc (of course, edit the paths according to your configuration):</p>
<pre><code>export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR $HADOOP_OPTS"
</code></pre>
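<p>Open a new shell (or source the file) and check that the variable is set; it should contain a java.library.path pointing at your native folder:</p>
<pre><code>echo $HADOOP_OPTS
</code></pre>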
<h2 id="stop-and-restart-hadoop">Stop and restart Hadoop</h2>
<p>Stop the cluster:</p>
<pre><code>./stop-dfs.sh
./stop-yarn.sh
</code></pre>
<p>Source your bashrc again:</p>
<pre><code>source ~/.bashrc
</code></pre>
<p>Start Hadoop :</p>
<pre><code>./start-dfs.sh
./start-yarn.sh
</code></pre>
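<p>You can check that the daemons came back up with jps (shipped with the JDK); on a typical single-node setup you should see NameNode, DataNode, ResourceManager and NodeManager among the listed processes:</p>
<pre><code>jps
</code></pre>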
<p>Check that the warning has disappeared:</p>
<pre><code>$ hadoop fs -ls /user/hadoop
Found 1 item
-rwxr-xr-x 1 hadoop supergroup 8 2014-07-01 14:06 /user/hadoop/toto.txt
</code></pre>
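<p>If the warning is still there, you can raise the log level for a single command to see what the native code loader is actually doing (HADOOP_ROOT_LOGGER is a standard Hadoop environment variable):</p>
<pre><code>$ HADOOP_ROOT_LOGGER=DEBUG,console hadoop fs -ls /user/hadoop
</code></pre>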
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[Split large file in bash]]></title>
<link href="http://www.rnio.me/blog/2014/06/05/split-large-file-unix-in-bash-command-line/"/>
<updated>2014-06-05T23:44:18+02:00</updated>
<id>http://www.rnio.me/blog/2014/06/05/split-large-file-unix-in-bash-command-line</id>
<content type="html"><![CDATA[<p>When you are dealing with large file, it’s complicated to share or manipulate them.</p>
<p>On linux the split command can be useful for you.</p>
<p>Basic usage:</p>
<pre><code>$ split [-l lines] [-b bytes] filename prefix
</code></pre>
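<p>The -l flag splits by number of lines, while -b splits by size. For example, with GNU split you can cut a (hypothetical) backup.tar into 100 MB chunks:</p>
<pre><code>$ split -b 100M backup.tar chunk-
</code></pre>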
<p>For example, if you want to split a large file named “clients.csv” into files of 100,000 records each, run the following command:</p>
<pre><code>$ split -l 100000 clients.csv splitted_clients-
</code></pre>
<p>“splitted_clients-” is the prefix applied to each generated file.</p>
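<p>Since split names the pieces with alphabetically ordered suffixes (splitted_clients-aa, splitted_clients-ab, …), a simple shell glob concatenates them back in the right order:</p>
<pre><code>$ cat splitted_clients-* > clients_restored.csv
</code></pre>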
]]></content>
</entry>
</feed>