In the Linux operating system, the POSIX API write writes up to count bytes to a file. Before writing, we need a file descriptor, which is returned by another POSIX API, open. But how can we do the same in Python?
We can call these APIs through the os module, which exposes most of the POSIX functions. For example,
    import os

    # O_CREAT creates the file if it does not exist yet.
    fd = os.open('/tmp/test_os_module', os.O_WRONLY | os.O_CREAT)
    data = b'test os module.'
    count = os.write(fd, data)
    print(count, 'bytes data has been written.')
    os.close(fd)
This is quite similar to how we write with the open, write and close APIs in C, right? Given Python's execution speed, do these operations run slower than in C? Probably yes, but I need a benchmark.
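To make the idea of such a benchmark concrete, here is a minimal sketch of what the Python side might look like. It is not the author's script: the function name `write_benchmark`, the zero-filled buffer and the temporary file are assumptions made only for illustration.

```python
import os
import tempfile


def write_benchmark(path, total_bytes, block_size):
    """Write total_bytes to path in block_size chunks using os.write."""
    block = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    try:
        while written < total_bytes:
            # Trim the last chunk so that exactly total_bytes are written.
            chunk = block[: total_bytes - written]
            # os.write may accept fewer bytes than requested, so add
            # only what it actually reports.
            written += os.write(fd, chunk)
    finally:
        os.close(fd)
    return written


# Small demo: 1 MiB in 4096-byte blocks, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "write_benchmark.dat")
    count = write_benchmark(target, 1 << 20, 4096)
    print("successfully write %d bytes." % count)
```

The two numeric parameters play the same role as the total size and block size passed on the command line in the test script; a real run would use a much larger total.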
First, I ran the benchmark on my MacBook Pro. The system info is Darwin yusenbindeMacBook-Pro.local 17.2.0 Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64 x86_64.
The fastest writing speed is 451 MB/s, and most of the CPU time is spent in kernel context.
Python version
The Python used for the test is Python 3.6.0 [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)].
The test bash script,
for i in `seq 5`; do time python disk-io-benchmark.py 10737418240 4096; done
Test output,
successfully write 10737418240 bytes.
real 0m26.430s
user 0m1.879s
sys 0m20.932s
successfully write 10737418240 bytes.
real 0m26.860s
user 0m1.960s
sys 0m21.113s
successfully write 10737418240 bytes.
real 0m25.659s
user 0m1.775s
sys 0m20.498s
successfully write 10737418240 bytes.
real 0m25.042s
user 0m1.813s
sys 0m19.902s
successfully write 10737418240 bytes.
real 0m25.783s
user 0m1.923s
sys 0m20.506s
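The timings above can be turned into an average throughput and a kernel-time share with a little arithmetic. The sketch below assumes 1 MB = 10^6 bytes and uses wall-clock ("real") time, so it gives a whole-run average that need not match a peak figure reported by disk statistics. The helper name `throughput_mb_per_s` is made up for this example.

```python
def throughput_mb_per_s(total_bytes, real_seconds):
    """Average throughput in MB/s, with 1 MB taken as 10**6 bytes."""
    return total_bytes / real_seconds / 1e6


# (real, sys) seconds copied from the five runs above.
runs = [
    (26.430, 20.932),
    (26.860, 21.113),
    (25.659, 20.498),
    (25.042, 19.902),
    (25.783, 20.506),
]
total_bytes = 10737418240

rates = []
kernel_shares = []
for real, sys_time in runs:
    rates.append(throughput_mb_per_s(total_bytes, real))
    # Fraction of wall-clock time spent in kernel context.
    kernel_shares.append(sys_time / real)

for rate, share in zip(rates, kernel_shares):
    print("%.1f MB/s, %.0f%% of wall time in the kernel" % (rate, share * 100))
```

Every run lands at roughly 400 MB/s or a little above, with close to 80% of the wall-clock time spent in the kernel, which is consistent with the write system call dominating the run.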
Next, I ran the benchmark on my Ubuntu server. The system info is Linux justdoit-thinkpad-e420 4.13.0-16-lowlatency #19-Ubuntu SMP PREEMPT Wed Oct 11 19:51:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux.
The fastest writing speed is 82 MB/s. The iostat documentation explains the %util value as follows.
%util
Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.
successfully write 1073741824 bytes.
real 0m8.447s
user 0m0.582s
sys 0m2.543s
successfully write 1073741824 bytes.
real 0m13.885s
user 0m0.786s
sys 0m3.307s
successfully write 1073741824 bytes.
real 0m14.773s
user 0m0.829s
sys 0m3.466s
successfully write 1073741824 bytes.
real 0m19.497s
user 0m0.871s
sys 0m3.155s
successfully write 1073741824 bytes.
real 0m12.824s
user 0m0.728s
sys 0m3.240s
Comparing the results of these two versions, it's clear that the writing speed in Python is almost the same as in C. This may be surprising to some of you, since C programs are usually far faster than Python programs. However, here they are both limited by the disk's speed.
Synchronous write
Out of curiosity, I tested the synchronous write speed. The following is the test bash script.
Because time was limited, I just used 1 GB of data. With the O_SYNC flag turned on, the writing speed is almost ten times slower. For more details, you may refer to the fsync POSIX API.
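As a rough illustration of the difference, the sketch below times the same small workload with and without O_SYNC from Python. The helper `timed_write`, the 256 KiB workload and the temporary file are assumptions chosen to keep the example quick; they are not the author's test setup, and the measured ratio depends heavily on the device and file system.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size, extra_flags=0):
    """Write total_bytes in block_size chunks and return elapsed seconds."""
    block = b"\0" * block_size
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o644)
    start = time.perf_counter()
    try:
        remaining = total_bytes
        while remaining > 0:
            # Slicing trims the final chunk; os.write reports what it wrote.
            remaining -= os.write(fd, block[:remaining])
    finally:
        os.close(fd)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sync_test.dat")
    buffered = timed_write(path, 256 * 1024, 4096)
    # os.O_SYNC exists on Unix; fall back to 0 on platforms without it.
    synced = timed_write(path, 256 * 1024, 4096, getattr(os, "O_SYNC", 0))
    print("buffered: %.4fs, O_SYNC: %.4fs" % (buffered, synced))
```

With O_SYNC, each write returns only after the data has reached the device, so the per-call cost is set by the storage latency rather than by the page cache.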
How does this happen?
From the docstring, we know that os.write is a C function implemented in CPython. The following code shows how it is implemented on top of the POSIX API write.
    static Py_ssize_t
    _Py_write_impl(int fd, const void *buf, size_t count, int gil_held)
    {
        Py_ssize_t n;
        int err;
        int async_err = 0;

        _Py_BEGIN_SUPPRESS_IPH
    #ifdef MS_WINDOWS
        if (count > 32767 && isatty(fd)) {
            /* Issue #11395: the Windows console returns an error (12: not
               enough space error) on writing into stdout if stdout mode is
               binary and the length is greater than 66,000 bytes (or less,
               depending on heap usage). */
            count = 32767;
        }
        else if (count > INT_MAX)
            count = INT_MAX;
    #else
        if (count > PY_SSIZE_T_MAX) {
            /* write() should truncate count to PY_SSIZE_T_MAX, but it's safer
             * to do it ourself to have a portable behaviour. */
            count = PY_SSIZE_T_MAX;
        }
    #endif

        if (gil_held) {
            do {
                Py_BEGIN_ALLOW_THREADS
                errno = 0;
    #ifdef MS_WINDOWS
                n = write(fd, buf, (int)count);
    #else
                n = write(fd, buf, count);
    #endif
                /* save/restore errno because PyErr_CheckSignals()
                 * and PyErr_SetFromErrno() can modify it */
                err = errno;
                Py_END_ALLOW_THREADS
            } while (n < 0 && err == EINTR &&
                    !(async_err = PyErr_CheckSignals()));
        }
        else {
            do {
                errno = 0;
    #ifdef MS_WINDOWS
                n = write(fd, buf, (int)count);
    #else
                n = write(fd, buf, count);
    #endif
                err = errno;
            } while (n < 0 && err == EINTR);
        }
        _Py_END_SUPPRESS_IPH

        if (async_err) {
            /* write() was interrupted by a signal (failed with EINTR)
             * and the Python signal handler raised an exception (if gil_held is
             * nonzero). */
            errno = err;
            assert(errno == EINTR && (!gil_held || PyErr_Occurred()));
            return -1;
        }
        if (n < 0) {
            if (gil_held)
                PyErr_SetFromErrno(PyExc_OSError);
            errno = err;
            return -1;
        }

        return n;
    }
This function is quite simple. First, it clamps count to a safe maximum, then it calls the write API, and finally it checks whether an error has occurred and returns the number of bytes written.
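Because the function returns the number of bytes actually written, a caller that needs the whole buffer on the other side has to loop. The helper below is a small sketch of that pattern; the name `write_all` is made up for this example, and the pipe is used only so that the demo does not touch the disk.

```python
import os


def write_all(fd, data):
    """Call os.write until every byte of data has been written."""
    view = memoryview(data)
    total = 0
    while total < len(view):
        # os.write returns how many bytes the kernel accepted, which can
        # be fewer than requested (for example on pipes or sockets).
        total += os.write(fd, view[total:])
    return total


# Demo on a pipe so that nothing is written to the disk.
read_fd, write_fd = os.pipe()
payload = b"test os module."
sent = write_all(write_fd, payload)
os.close(write_fd)
received = os.read(read_fd, len(payload))
os.close(read_fd)
print(sent, "bytes sent,", len(received), "bytes received")
```

The EINTR retry seen in the C code does not need to be repeated here: the loop in `_Py_write_impl` already restarts the call when a signal interrupts it and the signal handler raises no exception.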
In the Python interpreter, a pure C function is executed without the overhead of a native Python stack frame, so it normally runs faster than a pure Python function. In this benchmark, the hottest operation is os.write, which runs almost as fast as the C version's write. Therefore the total time differs only a little.
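One way to get a feel for this overhead is to time os.write against the null device, where no disk is involved. The sketch below is only an approximation under that assumption: the measured time includes the system call itself plus the interpreter's call overhead, and the numbers vary by machine.

```python
import inspect
import os
import timeit

# os.write is implemented in C, so inspect.isbuiltin reports True.
is_c_function = inspect.isbuiltin(os.write)

block = b"\0" * 4096
calls = 100000

fd = os.open(os.devnull, os.O_WRONLY)
try:
    # Total seconds for `calls` writes of one 4096-byte block each.
    elapsed = timeit.timeit(lambda: os.write(fd, block), number=calls)
finally:
    os.close(fd)

per_call = elapsed / calls
# Number of 4096-byte writes needed for the 10737418240-byte test.
calls_needed = 10737418240 // 4096

print("os.write is built-in:", is_c_function)
print("about %.2f us per call" % (per_call * 1e6))
print("about %.1f s of call overhead for %d calls" % (per_call * calls_needed, calls_needed))
```

If the estimated overhead is small compared with the wall-clock time of the real benchmark, the run is dominated by the kernel and the device rather than by the interpreter.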
In conclusion, any program's writing speed is limited by the storage device. When the storage device is not saturated, the writing speed is directly proportional to the program's execution speed. Once the disk is saturated, the program cannot write any faster.
How to write a file?
In the Linux operating system, the POSIX API write writes up to count bytes to a file. Before writing, we need a file descriptor, which is returned by another POSIX API, open. But how can we do the same in Python?
We can call these APIs through the os module, which exposes most of the POSIX functions. For example,
This is quite similar to how we write with the open, write and close APIs in C, right? Given Python's execution speed, do these operations run slower than in C? Probably yes, but I need a benchmark.
Writing benchmark
In order to compare the writing speed of Python and C, I wrote two programs, one in Python and one in C. All programs can be found at https://github.com/justdoit0823/notes/tree/master/python/code-sample/how-the-write-function-in-python-runs-as-fast-as-that-in-c.
Asynchronous write
Test on macOS
First, I ran the benchmark on my MacBook Pro. The system info is Darwin yusenbindeMacBook-Pro.local 17.2.0 Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64 x86_64.
Test bash script,
The test output,
The disk io statistic,
The fastest writing speed is 451 MB/s, and most of the CPU time is spent in kernel context.
The Python used for the test is Python 3.6.0 [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)].
The test bash script,
Test output,
The disk io statistic,
I will discuss more details later.
Test on ubuntu
Next, I ran the benchmark on my Ubuntu server. The system info is Linux justdoit-thinkpad-e420 4.13.0-16-lowlatency #19-Ubuntu SMP PREEMPT Wed Oct 11 19:51:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux.
Test bash script,
The test output,
And the disk io statistic,
The fastest writing speed is 82 MB/s. The iostat documentation explains the %util value as follows.
So the disk's writing speed is saturated.
The Python used for the test is Python 3.6.3 [GCC 7.2.0].
Test bash script,
Test output,
The disk io statistic,
Comparing the results of these two versions, it's clear that the writing speed in Python is almost the same as in C. This may be surprising to some of you, since C programs are usually far faster than Python programs. However, here they are both limited by the disk's speed.
Synchronous write
Out of curiosity, I tested the synchronous write speed. The following is the test bash script.
The test output,
The disk io statistic,
Because time was limited, I just used 1 GB of data. With the O_SYNC flag turned on, the writing speed is almost ten times slower. For more details, you may refer to the fsync POSIX API.
How does this happen?
From the docstring, we know that os.write is a C function implemented in CPython. The following code shows how it is implemented on top of the POSIX API write.
This function is quite simple. First, it clamps count to a safe maximum, then it calls the write API, and finally it checks whether an error has occurred and returns the number of bytes written.
In the Python interpreter, a pure C function is executed without the overhead of a native Python stack frame, so it normally runs faster than a pure Python function. In this benchmark, the hottest operation is os.write, which runs almost as fast as the C version's write. Therefore the total time differs only a little.
In conclusion, any program's writing speed is limited by the storage device. When the storage device is not saturated, the writing speed is directly proportional to the program's execution speed. Once the disk is saturated, the program cannot write any faster.
Reference
open(2)
write(2)
fsync(2)
close(2)
iostat(1)
https://en.wikipedia.org/wiki/POSIX