Linux/GCC | Windows/MSVC | Binaries |
---|---|---|
Travis CI | Appveyor CI | MSVC 2013 32-Bit/64-Bit |
NoWide is a library originally implemented by Artyom Beilis that makes cross-platform, Unicode-aware programming easier.
The library provides implementations of standard C and C++ library functions whose inputs are UTF-8-aware on Windows, without requiring the use of the Wide API.
Consider a simple application that splits a big file into chunks, such that they can be sent by email. It requires doing a few very simple tasks:
- Access Command-Line Arguments
- Open Input File
- Open Output Files
- Possibly Remove Output Files During Rollback
- Print Progress Report In Console
Unfortunately, it is impossible to implement this task in simple, standard C++. Why? Well, what happens when the filename being used in those operations contains non-ASCII characters?
On modern POSIX systems (Linux, Mac OS X, Solaris, BSD), filenames are internally encoded in UTF-8. On such systems, the program reads the UTF-8 filenames from argv[] and simply passes them verbatim to the needed classes and functions (std::fstream, std::remove, std::cout, etc.).
Windows, though, is not so simple. Windows uses UTF-16 internally, and a UTF-16 code unit cannot fit into a simple char. This means a Unicode filename simply cannot be passed via the normal argv[], and such files cannot be opened or manipulated via the standard C and C++ APIs. Instead, Microsoft-specific APIs and extensions need to be used to handle such a program.
Normally, you'd need to write any code dealing with filenames twice: once for Windows and then again for all other platforms. This makes writing portable code a challenge even for such simple programs.
NoWide implements drop-in replacement functions for various C and C++ standard library functions in the nowide namespace rather than std. On Windows, these functions will translate between UTF-8 and UTF-16 where needed and present a solely UTF-8 interface for you to program against that will work anywhere. On other platforms, the functions are simply aliases to the corresponding standard library functions.
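To illustrate the idea, here is a minimal sketch of how such a wrapper could be written. It is not NoWide's actual implementation: the `sketch` namespace and the `widen_utf8` helper are hypothetical names, and the Windows branch assumes it is acceptable to include `<windows.h>` and call `MultiByteToWideChar` and `_wfopen` directly.

```cpp
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h> // MultiByteToWideChar; assumed acceptable for this sketch
#endif

namespace sketch { // hypothetical namespace, for illustration only

#ifdef _WIN32
// Convert UTF-8 to UTF-16; returns an empty string on invalid input.
inline std::wstring widen_utf8(const std::string& s)
{
    if(s.empty())
        return std::wstring();
    const int len = static_cast<int>(s.size());
    const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      s.data(), len, nullptr, 0);
    if(n <= 0)
        return std::wstring();
    std::wstring out(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        s.data(), len, &out[0], n);
    return out;
}

// Windows: translate the UTF-8 arguments and call the wide CRT function.
inline std::FILE* fopen(const char* path, const char* mode)
{
    return _wfopen(widen_utf8(path).c_str(), widen_utf8(mode).c_str());
}
#else
// Elsewhere: the narrow API already accepts UTF-8, so forward directly.
inline std::FILE* fopen(const char* path, const char* mode)
{
    return std::fopen(path, mode);
}
#endif

} // namespace sketch
```

Calling code uses `sketch::fopen(path, "r")` with a UTF-8 path on every platform; only the wrapper knows whether a conversion takes place.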
The library provides:
- Easy-to-use functions for converting between UTF-8 and UTF-16.
- A helper class to access UTF-8 argc, argv, and env.
- UTF-8-aware implementations of:
  - <cstdio> functions: fopen, freopen, remove, rename
  - <cstdlib> functions: system, getenv, setenv, unsetenv, putenv
  - <fstream> classes: filebuf, fstream, ofstream, ifstream
  - <iostream> objects: cout, cerr, clog, cin
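For readers curious what a UTF-8 to UTF-16 conversion involves, the following is a minimal sketch in standard C++. The function name `utf8_to_utf16` is hypothetical and is not part of NoWide (the library's own conversion functions are nowide::widen and nowide::narrow); it uses `std::u16string` so that the result has the same layout on every platform.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Returns false on malformed input (bad lead or continuation bytes,
// truncated sequences, overlong forms, surrogates, values above U+10FFFF).
// On failure, `out` may hold a partial result.
inline bool utf8_to_utf16(const std::string& in, std::u16string& out)
{
    out.clear();
    std::size_t i = 0;
    while(i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;       // code point being assembled
        std::size_t extra; // number of continuation bytes expected
        char32_t lowest;   // smallest code point valid for this length
        if(lead < 0x80)                { cp = lead;        extra = 0; lowest = 0; }
        else if((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; lowest = 0x80; }
        else if((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; lowest = 0x800; }
        else if((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; lowest = 0x10000; }
        else return false; // not a valid lead byte
        if(in.size() - i <= extra)
            return false; // sequence is truncated
        for(std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if((c & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if(cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false; // overlong, out of range, or a surrogate
        if(cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return true;
}
```

A caller would check the return value before passing the result to a wide API, since invalid UTF-8 has no UTF-16 equivalent.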
Why not simply use wide strings everywhere? The trouble is that wchar_t isn't portable. It could be 1, 2, or 4 bytes, and there is no specific encoding it should be in. Additionally, the standard library only provides narrow functions when dealing with the OS (e.g. there is no fopen(wchar_t) in the standard). We determined it would be better to stick closely to the C and C++ standards rather than implement wide function variants everywhere as Microsoft does.
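The portability problem is easy to observe. In the sketch below, the same wide string literal has a different length depending on the platform: with a 2-byte wchar_t (as on Windows) the character is stored as a UTF-16 surrogate pair, while with a 4-byte wchar_t (as is typical on Linux) it is a single unit. The function name `wide_units_for_emoji` is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t units needed to store U+1F600, a code point outside
// the Basic Multilingual Plane, when written as a wide string literal.
inline std::size_t wide_units_for_emoji()
{
    const std::wstring s = L"\U0001F600";
    return s.size();
}
```

Code that indexes or measures such a string therefore behaves differently across platforms, whereas a UTF-8 `std::string` holds the same four bytes everywhere.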
For further reading, see UTF-8 Everywhere.
IMPORTANT: If you are using MSVC and a dynamic/shared build of NoWide, you will need to define the NOWIDE_DLL symbol prior to including the NoWide headers so the functions are decorated with __declspec(dllimport) as needed. This is not required if using a static library or MinGW/GCC.
To use the library, include the <nowide/*> headers instead of the standard ones and call the functions using the nowide namespace instead of std.
For example, this is a naïve file line counter that cannot handle Unicode:

    #include <fstream>
    #include <iostream>

    int main(int argc,char **argv)
    {
        if(argc!=2) {
            std::cerr << "Usage: file_name" << std::endl;
            return 1;
        }
        std::ifstream f(argv[1]);
        if(!f) {
            std::cerr << "Can't open a file " << argv[1] << std::endl;
            return 1;
        }
        int total_lines = 0;
        while(f) {
            if(f.get() == '\n')
                total_lines++;
        }
        f.close();
        std::cout << "File " << argv[1] << " has " << total_lines << " lines"
                  << std::endl;
        return 0;
    }

To make this program handle Unicode properly we make the following changes:

    #include <nowide/args.hpp>
    #include <nowide/fstream.hpp>
    #include <nowide/iostream.hpp>

    int main(int argc,char **argv)
    {
        nowide::args a(argc,argv); // UTF-8
        if(argc!=2) {
            nowide::cerr << "Usage: file_name" << std::endl; // UTF-8
            return 1;
        }
        nowide::ifstream f(argv[1]); // UTF-8
        if(!f) {
            nowide::cerr << "Can't open a file " << argv[1] << std::endl; // UTF-8
            return 1;
        }
        int total_lines = 0;
        while(f) {
            if(f.get() == '\n')
                total_lines++;
        }
        f.close();
        nowide::cout << "File " << argv[1] << " has " << total_lines << " lines"
                     << std::endl; // UTF-8
        return 0;
    }

This simple and straightforward approach makes it easier to write Unicode-aware programs.
Of course, the above cannot cover every use case. There may be a Wide API that you need to work with at some point, either a Microsoft API or a custom external one. When dealing with such APIs, use the nowide::widen and nowide::narrow functions to convert to/from UTF-8 at the point of use.
For example:

    CopyFileW(nowide::widen(existing_file).c_str(),
              nowide::widen(new_file).c_str(),
              TRUE);

These functions allocate normal std::string and std::wstring objects, but you may want to allocate the string on the stack for particularly short strings. To do this, the nowide::basic_stackstring class can be used.

    nowide::basic_stackstring<wchar_t,char,64> wexisting_file, wnew_file;
    if(!wexisting_file.convert(existing_file) || !wnew_file.convert(new_file))
        return -1; // invalid UTF-8
    CopyFileW(wexisting_file.c_str(), wnew_file.c_str(), TRUE);

The following typedefs are also provided for convenience:
- stackstring: narrows wchar_t to char; holds 256 characters.
- wstackstring: widens char to wchar_t; holds 256 characters.
- short_stackstring: narrows wchar_t to char; holds 16 characters.
- wshort_stackstring: widens char to wchar_t; holds 16 characters.
These types will fall back to heap-based allocation if the string does not fit into the specified stack space.
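The general "stack first, heap on overflow" technique can be sketched as follows. This is an illustration only, not the implementation of nowide::basic_stackstring: the class name `small_cstring` is hypothetical, it performs no encoding conversion, and here the capacity N is assumed to include the terminating null character.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical illustration: keeps short strings in an in-object buffer
// and falls back to a heap allocation for longer ones. N must be > 0.
template<std::size_t N>
class small_cstring
{
public:
    explicit small_cstring(const char* s)
    {
        const std::size_t len = std::strlen(s);
        if(len + 1 <= N) {
            std::memcpy(stack_, s, len + 1); // fits, including the terminator
            data_ = stack_;
        } else {
            heap_.assign(s, s + len + 1); // too long: fall back to the heap
            data_ = heap_.data();
        }
    }
    // Copying is disabled because data_ may point into this object.
    small_cstring(const small_cstring&) = delete;
    small_cstring& operator=(const small_cstring&) = delete;

    const char* c_str() const { return data_; }
    bool on_stack() const { return data_ == stack_; }

private:
    char stack_[N];
    std::vector<char> heap_;
    const char* data_;
};
```

When such an object is a local variable, a short string costs no heap allocation at all, which is the motivation for the 16-character and 256-character variants.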
The library does not include <windows.h>, in order to prevent namespace pollution. Instead, it declares the prototypes of the Win32 API functions it needs.
You may request the actual <windows.h> anyway by defining the NOWIDE_USE_WINDOWS_H symbol before including any NoWide headers.
You will need a standard build environment for your platform (i.e. GCC, Xcode/Clang, MinGW, MSVC, etc.) as well as the following tools:
- CMake 2.8+
- Doxygen (Optional; For Documentation)
- GraphViz/Dot (Class Diagrams)
- HTML Help Workshop (CHM Documentation)
- PDFLaTeX (PDF Documentation)
Compilation steps are bog-standard for a CMake project:

    mkdir build
    cd build
    cmake ..
    make && make test

Optionally, to install:

    make install