A KISS pure Fortran library providing a strings (class) manipulator for modern (2003+) Fortran projects.
- StringiFor is a pure Fortran (KISS) library providing a strings manipulator for modern Fortran projects;
- StringiFor is Fortran 2003+ standard compliant;
- StringiFor is OOP designed;
- StringiFor is TDD designed;
- StringiFor is a Free, Open Source Project.
Modern Fortran standards (2003+) have introduced better support for character variables, but Fortraners still lack the string-handling power enjoyed by programmers of richer languages, e.g. Pythoners. Allocatable deferred-length character variables are a quantum leap with respect to the old inflexible Fortran characters, but they are still not enough for many Fortraners. Moreover, Fortran does not provide builtin methods for the widely used string manipulations offered by other languages, e.g. UPPER/lowercase transformation, tokenization, etc. StringiFor attempts to fill this gap.
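To make the gap concrete, here is a minimal sketch of an uppercase conversion written by hand in standard Fortran only. The function name `to_upper` is hypothetical and is not part of StringiFor or of the Fortran standard; it only illustrates the kind of boilerplate that a missing builtin forces on the programmer.

```fortran
program upper_by_hand
  ! Sketch only: standard Fortran, no StringiFor involved.
  implicit none
  print "(A)", to_upper('Hello World')
contains
  ! Hypothetical helper: map each lowercase ASCII letter to its uppercase twin.
  pure function to_upper(input) result(output)
    character(len=*), intent(in) :: input
    character(len=len(input))    :: output
    character(len=*), parameter  :: lower = 'abcdefghijklmnopqrstuvwxyz'
    character(len=*), parameter  :: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    integer :: i, k
    output = input
    do i = 1, len(input)
      k = index(lower, input(i:i))           ! position in the lowercase alphabet, 0 if absent
      if (k > 0) output(i:i) = upper(k:k)    ! replace only lowercase letters
    enddo
  end function to_upper
end program upper_by_hand
```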
StringiFor exposes only one class (OO-designed), the string type, which should be used as a more powerful string variable with respect to a standard Fortran character variable. The main features of this class are:
- seamless interchangeability with standard character variables, e.g. concatenation, IO, etc...;
- handy builtin methods, e.g. split, search, basename, join, etc...;
- low memory consumption: only one deferred length allocatable character member is stored, allowing for efficient memory allocation in array of strings, the elements of which can have different lengths;
- safe: almost all methods are elemental or pure;
- robust: the library is Test Driven Developed (TDD), and a comprehensive test suite is provided.
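The low-memory point above can be sketched in standard Fortran. The type `mini_string` below is hypothetical and is not StringiFor's actual implementation; it only assumes the design stated in the list, i.e. a derived type wrapping a single deferred-length allocatable character, and shows why the elements of an array of such objects can have different lengths.

```fortran
program ragged_array_sketch
  ! Sketch only: "mini_string" is a hypothetical stand-in, not the StringiFor string type.
  implicit none
  type :: mini_string
    character(len=:), allocatable :: raw   ! the only stored member
  end type mini_string
  type(mini_string) :: words(3)
  integer :: i

  ! each element is (re)allocated on assignment to exactly the length it needs
  words(1)%raw = 'one'
  words(2)%raw = 'three'
  words(3)%raw = 'seventeen'

  do i = 1, size(words)
    print "(A,I0)", words(i)%raw//' has length ', len(words(i)%raw)
  enddo
end program ragged_array_sketch
```

With a plain character array all three elements would be forced to share one length, padded with trailing blanks.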
Any feature request is welcome.
StringiFor is very handy...
IO of the string class is overloaded by defined write/read type-bound procedures (TBP). Moreover, dedicated methods and operators can be exploited for IO, e.g.
use stringifor
type(string) :: astring
astring = 'Hello World'
print "(A)", astring%chars() ! "chars" method returns a standard character variable
print "(DT)", astring ! defined IO (in gfortran is available for GNU GCC >= 7.1)
print "(A)", astring//'' ! on-the-fly conversion to standard character by means of concatenation
The string class has many methods for a plethora of string manipulations, e.g.
use stringifor
type(string) :: astring
type(string) :: strings(3)
astring = '0123456789'
print "(A)", astring%reverse()//'' ! print "9876543210"
astring = 'Hello World'
print "(A)", astring%replace(old='World', new='People')//'' ! print "Hello People"
astring = 'Hello World'
strings = astring%partition(sep='lo Wo')
print "(A)", 'Before sep: "'//strings(1)//'"' ! print "Hel"
print "(A)", 'Sep itself: "'//strings(2)//'"' ! print "lo Wo"
print "(A)", 'After sep: "'//strings(3)//'"' ! print "rld"
strings(1) = 'one'
strings(2) = 'two'
strings(3) = 'three'
print "(A)", astring%join(strings)//'' ! print "oneHello WorldtwoHello Worldthree"
print "(A)", astring%join(strings, sep='-')//'' ! print "one-two-three"
astring = ' a StraNgE caSe var'
print "(A)", astring%camelcase()//'' ! print " AStrangeCaseVar"
print "(A)", astring%snakecase()//'' ! print " a_strange_case_var"
print "(A)", astring%startcase()//'' ! print " A Strange Case Var"
StringiFor, by means of the portability environment library PENF, can handle numbers (reals and integers) effortlessly. The string/number casting (in both directions) is done by overloaded assignments (for all kinds of integers and reals). For convenience, StringiFor exposes the PENF portable kind parameters.
use stringifor
type(string) :: astring
astring = 127_I1P ! "I1P" is the PENF kind for 1-byte-like integer.
print "(A)", astring//'' ! print "+127"
astring = 3.021e6_R4P ! "R4P" is the PENF kind for 4-byte-like real.
print "(A)", astring//'' ! print "+0.302100E+07"
astring = "3.4e9" ! assign to a string without the necessity to define a real kind
if (astring%is_number()) then
if (astring%is_real()) then
print "(E13.6)", astring%to_number(kind=1._R4P) ! print " 0.340000E+10" using a 4-byte-like kind
endif
endif
StringiFor is developed to ease the daily string usage of poor Fortran people; however, more complex scenarios are also taken into account, e.g. file parsing, OS operations, etc.
use stringifor
type(string) :: astring
! OS like manipulation
astring = '/bar/foo.tar.bz2'
print "(A)", astring%basedir()//'' ! print "/bar"
print "(A)", astring%basename()//'' ! print "foo.tar.bz2"
print "(A)", astring%basename(extension='.tar')//'' ! print "foo"
print "(A)", astring%basename(strip_last_extension=.true.)//'' ! print "foo.tar"
! XML like tag parsing
astring = '<test> <first> hello </first> <first> not the first </first> </test>'
print "(A)", astring%search(tag_start='<first>', tag_end='</first>')//'' ! print "<first> hello </first>"
This is just a provocation, but with StringiFor it is easy to develop a naive CSV parser. Let us assume we want to parse a car-price database like the following one
Year, Make, Model, Description, Price
1997, Ford, E350 , ac abs moon, 3000.00
1999, Chevy, Venture "Extended Edition" , , 4900.00
1999, Chevy, Venture "Extended Edition Very Large", , 5000.00
Well, parsing it and handling its cell values is very easy by means of StringiFor
use stringifor
implicit none
type(string) :: csv !< The CSV file as a single stream.
type(string), allocatable :: rows(:) !< The CSV table rows.
type(string), allocatable :: columns(:) !< The CSV table columns.
type(string), allocatable :: cells(:,:) !< The CSV table cells.
type(string) :: most_expensive !< The most expensive car.
real(R8P) :: highest_cost !< The highest cost.
integer :: rows_number !< The CSV file rows number.
integer :: columns_number !< The CSV file columns number.
integer :: r !< Counter.
! parsing the just created CSV file: all done in 9 statements!
call csv%read_file(file='cars.csv') ! read the CSV file as a single stream
call csv%split(tokens=rows, sep=new_line('a')) ! get the CSV file rows
rows_number = size(rows, dim=1) ! get the CSV file rows number
columns_number = rows(1)%count(',') + 1 ! get the CSV file columns number
allocate(cells(1:columns_number, 1:rows_number)) ! allocate the CSV file cells
do r=1, rows_number ! parse all cells
call rows(r)%split(tokens=columns, sep=',') ! get current columns
cells(1:columns_number, r) = columns ! save current columns into cells
enddo
! now you can do whatever with your parsed data
! print the table in markdown syntax
print "(A)", 'A markdown-formatted table'
print "(A)", ''
print "(A)", '|'//csv%join(array=cells(:, 1), sep='|')//'|'
print "(A)", '|'//repeat('----|', size(columns)) ! printing separators
do r=2, rows_number
print "(A)", '|'//csv%join(array=cells(:, r), sep='|')//'|'
enddo
print "(A)", ''
! find the most expensive car
print "(A)", 'Searching for the most expensive car'
most_expensive = 'unknown'
highest_cost = -1._R8P
do r=2, rows_number
if (cells(5, r)%to_number(kind=1._R8P)>=highest_cost) then
highest_cost = cells(5, r)%to_number(kind=1._R8P)
most_expensive = csv%join(array=[cells(2, r), cells(3, r)], sep=' ')
endif
enddo
print "(A)", 'The most expensive car is : '//most_expensive
See the test program csv_naive_parser for a working example.
Obviously, this is a naive parser without any robustness, but it proves the usefulness of the StringiFor approach.
StringiFor is an open source project, it is distributed under a multi-licensing system:
- for FOSS projects:
- for closed source/commercial projects:
Anyone interested in using, developing or contributing to StringiFor is welcome; feel free to select the license that best matches your soul!
More details can be found on wiki.
StringiFor home is at https://github.com/szaghi/StringiFor. It uses git submodule to handle the third party dependencies. To download all the source files you can:
- clone this repository (all dependencies are satisfied):
git clone https://github.com/szaghi/StringiFor
cd StringiFor
git submodule update --init
- download only the StringiFor sources, all other dependencies must be downloaded manually:
- download the latest master-branch archive:
wget https://github.com/szaghi/StringiFor/archive/master.zip
unzip master.zip
cd StringiFor-master
git submodule update --init
- download a release archive at https://github.com/szaghi/StringiFor/releases
Currently StringiFor depends on:
The third party libraries are necessary for building StringiFor. StringiFor is kept up to date with the master branch or the latest release of each third party library.
If you download a release of StringiFor manually (without git) you must also download the above dependencies manually and place them into the src/third_party sub-directory of the project root-tree.
StringiFor is a modern Fortran project, thus a modern Fortran compiler is needed to compile it. The following table summarizes the support for some widely used Fortran compilers.
Compiler Vendor Support | Notes |
---|---|
full support | |
full support | |
not tested | |
not tested | |
not tested | |
not tested |
The library is modular, namely it exploits Fortran modules. As a consequence, there is a compilation-cascade hierarchy to respect when building the library. To build the library correctly, the following approaches are supported:
- Build by means of FoBiS: full support;
- Build by means of GNU Make: not officially supported; a Makefile is provided, but it is likely outdated and might not work as expected. Help maintaining GNU Make support is very welcome, feel free to join this project.
- Build by means of CMake: not officially supported; some CMake support is provided by great users, but it could be outdated. Help maintaining CMake support is very welcome, feel free to join this project.
The FoBiS build support is the most complete and the only one officially supported by the author, as it is the one used for developing StringiFor.
A fobos file is provided to build the library by means of the Fortran Building System FoBiS.
Type
FoBiS.py build
After a successful build, a directory ./exe is created containing all the compiled tests that constitute the StringiFor regression test suite, e.g.
→ FoBiS.py build
Builder options
Directories
Building directory: "exe"
Compiled-objects .o directory: "exe/obj"
Compiled-objects .mod directory: "exe/mod"
Compiler options
Vendor: "gnu"
Compiler command: "gfortran"
Module directory switch: "-J"
Compiling flags: "-c -frealloc-lhs -std=f2008 -fall-intrinsics -O2 -Dr16p"
Linking flags: "-O2"
Preprocessing flags: "-Dr16p"
Coverage: False
Profile: False
PreForM.py used: False
PreForM.py output directory: None
PreForM.py extensions processed: []
Building src/tests/is_real.f90
Compiling src/lib/penf.F90 serially
Compiling src/lib/string_t.F90 serially
Compiling src/lib/stringifor.F90 serially
Compiling src/tests/is_real.f90 serially
Linking exe/is_real
Target src/tests/is_real.f90 has been successfully built
Builder options
Directories
Building directory: "exe"
Compiled-objects .o directory: "exe/obj"
Compiled-objects .mod directory: "exe/mod"
Compiler options
Vendor: "gnu"
Compiler command: "gfortran"
Module directory switch: "-J"
Compiling flags: "-c -frealloc-lhs -std=f2008 -fall-intrinsics -O2 -Dr16p"
Linking flags: "-O2"
Preprocessing flags: "-Dr16p"
Coverage: False
Profile: False
PreForM.py used: False
PreForM.py output directory: None
PreForM.py extensions processed: []
Building src/tests/slen.f90
Compiling src/tests/slen.f90 serially
...
→ tree -L 1 exe/
exe/
├── assignments
├── basename_dir
├── camelcase
├── capitalize
├── concatenation
├── equal
├── escape
├── extension
├── fill
...
├── swapcase
├── to_number
├── unique
└── upper_lower
Type
# static-linked library by means of GNU gfortran
FoBiS.py build -mode stringifor-static-gnu
# shared-linked library by means of GNU gfortran
FoBiS.py build -mode stringifor-shared-gnu
# static-linked library by means of Intel Fortran
FoBiS.py build -mode stringifor-static-intel
# shared-linked library by means of Intel Fortran
FoBiS.py build -mode stringifor-shared-intel
The library will be built into the directory ./lib.
To list all fobos-provided modes type
→ FoBiS.py build -lmodes
The fobos file defines the following modes:
- "tests-gnu"
- "tests-gnu-debug"
- "tests-intel"
- "tests-intel-debug"
- "stringifor-static-gnu"
- "stringifor-shared-gnu"
- "stringifor-static-intel"
- "stringifor-shared-intel"
It is worth noting that the first mode is the one automatically called by FoBiS.py build.
The provided makefile supports only static-linked library building (not shared) with both the Intel Fortran Compiler and GNU gfortran, and it has two main building rules:
- build the (static linked) library;
- build the tests suite.
The GNU gfortran compiler is the default one, but the compiler used can be customized with the COMPILER=#vendor switch.
To build the library with the GNU gfortran compiler type
make
The library will be built as ./lib/libstringifor.a.
To build the tests suite type
make TESTS=yes
The tests will be built into the directory ./exe.
If you want to use the Intel Fortran Compiler add the switch COMPILER=intel to the above commands, i.e.
make COMPILER=intel # build only the library
make COMPILER=intel TESTS=yes # build the tests suite
To be done.
The StringiFor documentation is mainly contained in this file (it also has its own wiki with some less important documents). Detailed documentation of the API is contained in the GitHub Pages, which can also be created locally by means of the ford tool.
In the following, all the methods of string are listed with a brief description of their aim. The hyperlinks bring you to the full API documentation in the GH pages.
name | meaning |
---|---|
adjustl | adjustl replacement |
adjustr | adjustr replacement |
count | count replacement |
index | index replacement |
len | len replacement |
len_trim | len_trim replacement |
repeat | repeat replacement |
scan | scan replacement |
trim | trim replacement |
verify | verify replacement |
name | meaning |
---|---|
basedir | return the base directory name of a string containing a file name |
basename | return the base file name of a string containing a file name |
camelcase | return a string with all words capitalized without spaces |
capitalize | return a string with its first character capitalized and the rest lowercased |
chars | return the raw characters data |
decode | decode string |
encode | encode string |
escape | escape backslashes (or custom escape character) |
extension | return the extension of a string containing a file name |
fill | pad string on the left (or right) with zeros (or other char) to fill width |
free | free dynamic memory |
insert | insert substring into string at a specified position |
join | return a string that is a join of an array of strings or characters |
lower | return a string with all lowercase characters |
partition | split string at separator and return the 3 parts (before, the separator and after) |
read_file | read a file as a single string stream |
read_line | read line (record) from a connected unit |
read_lines | read (all) lines (records) from a connected unit as a single ascii stream |
replace | return a string with all occurrences of substring old replaced by new |
reverse | return a reversed string |
search | search for tagged record into string |
slice | return the raw characters data sliced |
snakecase | return a string with all words lowercase separated by _ |
split | return a list of substrings in the string using sep as the delimiter string |
startcase | return a string with all words capitalized, e.g. title case |
strip | return a string with the leading and trailing characters removed |
swapcase | return a string with uppercase chars converted to lowercase and vice versa |
tempname | return a safe temporary name suitable for temporary file or directories |
to_number | cast string to number |
unescape | unescape double backslashes (or custom escaped character) |
unique | reduce to one (unique) multiple occurrences of a substring into a string |
upper | return a string with all uppercase characters |
write_file | write a single string stream into file |
write_line | write line (record) to a connected unit |
write_lines | write lines (records) to a connected unit |
name | meaning |
---|---|
end_with | return true if a string ends with a specified suffix |
is_allocated | return true if the string is allocated |
is_digit | return true if all characters in the string are digits |
is_integer | return true if the string contains an integer |
is_lower | return true if all characters in the string are lowercase |
is_number | return true if the string contains a number (real or integer) |
is_real | return true if the string contains a real |
is_upper | return true if all characters in the string are uppercase |
start_with | return true if a string starts with a specified prefix |
name | meaning |
---|---|
assignment | assignment of string from different inputs |
// | concatenation resulting in characters for seamless integration |
.cat. | concatenation resulting in string |
== | equal operator |
/= | not equal operator |
< | lower than operator |
<= | lower equal than operator |
>= | greater equal than operator |
> | greater than operator |
name | meaning |
---|---|
read(formatted) | formatted input |
write(formatted) | formatted output |
read(unformatted) | unformatted input |
write(unformatted) | unformatted output |
The lack of Fortran support for strings manipulation has promoted different solutions in the past years. Following the classification of Clive Page [1] we can consider:
- standard character type;
- deferred-length allocatable character type (standard 2003+);
- VARYING_STRING type (standard 90/95+) as defined in ISO/IEC 1539-2:2000 (Varying length character strings).
Let us compare StringiFor to the previous three approaches. In particular, let us consider Ian Harvey's extension of VARYING_STRING, i.e. aniso_varying_string [2].
Clive Page has pointed out the following issues, among others:
- fixed (at compile time) string length
character(len=3) :: astring ! further lengths different from 3 are not allowed
- silent truncation on assignment
character(len=3) :: astring
astring = 'abcdefgh' ! silent truncation at 'abc'
- trim-cluttered code
character(len=99) :: astring
character(len=99) :: anotherstring
astring = 'abcdefgh'
anotherstring = trim(astring)//'ilmnopqrst' ! trim-cluttering is a necessity
- handle significant trailing spaces
character(len=99) :: astring
character(len=99) :: anotherstring
astring = 'Hello ' ! for some reason you want to keep these trailing white spaces
anotherstring = trim(astring)//'World' ! you need trim because
! len(astring)==len(anotherstring), but you lose the significant
! trailing spaces...
- different character definition
character :: astring*10 ! old way
character(len=10) :: anotherstring ! new way
- allocation of array of strings
character(len=10), allocatable :: astring(:)
allocate(astring(100)) ! all 100 elements of the array have 10 characters,
! different lengths cannot be declared
- initialization of array of strings
! the following is illegal
character(len=9), parameter :: day(7) = ['Monday', &
'Tuesday', &
'Wednesday', &
'Thursday', &
'Friday', &
'Saturday', &
'Sunday']
! the following is legal, but cluttered by non significant trailing spaces
character(len=9), parameter :: day(7) = ['Monday   ', &
'Tuesday  ', &
'Wednesday', &
'Thursday ', &
'Friday   ', &
'Saturday ', &
'Sunday   ']
- IO limitations for non standard character variables
character(len=99) :: astring
character(len=:), allocatable :: anotherstring
type(varying_string) :: yetanotherstring
! fully-simple support for standard character variables
astring = 'abcdefgh'
print*, astring
print "(A)", astring
read(10, *) astring
! partial-simple support for standard deferred-length allocatable character variables
! care must be placed in input operation...
print*, anotherstring
print "(A)", anotherstring
read(10, *) anotherstring
! support depends on the implementation of the varying string type
print*, yetanotherstring
print "(DT)", yetanotherstring
read(10, *) yetanotherstring
- substring notation (slice) for non standard character variables
character(len=99) :: astring
character(len=:), allocatable :: anotherstring
type(varying_string) :: yetanotherstring
astring = 'abcdefgh'
yetanotherstring = astring
anotherstring = astring(2:6) ! allowed
anotherstring = yetanotherstring(2:6) ! not allowed
- passing string to procedures expecting standard character argument is complicated
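The last point can be illustrated with a short sketch. It assumes the StringiFor string type as described in this document, using only the chars method and the // operator, both of which are documented above as returning standard characters; the subroutine `legacy_print` is a hypothetical stand-in for any procedure expecting a standard character argument.

```fortran
program pass_to_legacy
  ! Sketch only: "legacy_print" is a hypothetical procedure used for illustration.
  use stringifor, only : string
  implicit none
  type(string) :: astring

  astring = 'Hello World'
  ! a type(string) cannot be passed directly: convert it to a standard character first
  call legacy_print(astring%chars())   ! via the "chars" method
  call legacy_print(astring//'')       ! via on-the-fly concatenation
contains
  subroutine legacy_print(text)
    character(len=*), intent(in) :: text   ! accepts only standard characters
    print "(A)", '['//text//']'
  end subroutine legacy_print
end program pass_to_legacy
```

The explicit conversion at each call site is the extra step that makes this case more complicated than with a plain character variable.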
Analyzing the above issues we can agree that the deferred-length allocatable character and aniso_varying_string approaches address many of them, at the cost of introducing some oddities.
The deferred-length allocatable character approach addresses all the issues related to the fixed length limitation, e.g.
character(len=:), allocatable :: astring
character(len=:), allocatable :: anotherstring
astring = 'Hello '
anotherstring = astring//'World' ! trailing white spaces of astring correctly handled
! no need of trim
However, it has some limitations too. Aside from input operations, the most important (IMHO) are related to the handling of arrays of strings, e.g.
character(len=:), allocatable :: asetofstring(:)
allocate(character(len=99) :: asetofstring(10)) ! all 10 elements must have len=99
Aniso_varying_string is an implementation of ISO/IEC 1539-2:2000 (Varying length character strings) developed by Ian Harvey that is internally based on a deferred-length allocatable character variable: it is essentially a derived type wrapping a deferred-length allocatable character. As a consequence, it has all the advantages of the deferred-length allocatable character approach. The wrapping approach addresses the array-related issues, e.g.
type(varying_string), allocatable :: asetofstring(:)
allocate(asetofstring(10)) ! all 10 elements can have different lengths
Its major issues are related to IO operations; however, these are addressed by the new Fortran support for defined IO for derived types, which makes the IO of such an object more effortless. The other main issue is the impossibility of using the standard slice notation to access a substring: aniso_varying_string addresses this issue (partially) by publicly exposing the wrapped allocatable character, thus allowing it to be sliced, e.g.
type(varying_string) :: astring
astring = 'abcdefg'
print "(A)", astring%chars(2:3) ! print 'bc'
StringiFor shares the same philosophy as aniso_varying_string, thus it has the same pros and cons. However, StringiFor is an Object Oriented Designed class, thus it has some peculiarities distinguishing it from aniso_varying_string, see StringiFor Peculiarities.
The following table summarizes the comparison analysis.
issue | standard character | deferred-length allocatable character | aniso_varying_string | StringiFor |
---|---|---|---|---|
fixed length | ☁️ | ☀️ | ☀️ | ☀️ |
silent truncation | ☁️ | ☀️ | ☀️ | ☀️ |
trim-clutter | ☁️ | ☀️ | ☀️ | ☀️ |
significant trailing spaces | ☁️ | ☀️ | ☀️ | ☀️ |
different string definition | ☁️ | ☁️ | ☀️ | ☀️ |
array allocation | ☁️ | ☁️ | ☀️ | ☀️ |
array initialization | ☁️ | ☁️ | ☀️ | ☀️ |
IO | ☀️ | ☀️ | ⛅ | ⛅ |
substring (slice) notation | ☀️ | ☀️ | ⛅ | ⛅ |
Fortran builtins | ☀️ | ☀️ | ⛅ | ⛅ |
symbol | meaning |
---|---|
☁️ | bad or no support |
⛅ | partial support |
☀️ | good support |
StringiFor exposes an OOD class, the string object. This class aims to address all the issues of the standard character type, as the ISO Varying String approaches do, but it is also designed to provide a feature-rich string object like those found in other languages such as Python. As a matter of fact, the auxiliary methods added to the string object constitute a long list of new (for Fortraners) string facilities, allowing you to handle strings effortlessly (case conversion, file handling, encode/decode, number casting, etc.), see the complete API. It is worth noting that StringiFor is an attempt to adopt a fully OOD approach, thus all methods and operators are defined as TBP: to use StringiFor you can import only the string type, allowing sane and robust namespace handling. Only if you want the Fortran builtins to accept a string instead of a standard character type, e.g. to use index(astring, 'c') seamlessly with both a type(string) :: astring and a character(99) :: astring, must you use all the StringiFor public objects, including the overloaded interfaces of the Fortran builtins.
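The two usage styles described above can be sketched as follows. The example assumes that the index type-bound method listed in the API tables accepts a character substring as its first argument; check the API documentation for the exact signature. The variable name `achar` is illustrative only.

```fortran
program import_styles
  ! Full import: needed only when the Fortran builtins must accept a type(string).
  ! With "use stringifor, only : string" the TBP form below would still work,
  ! while the call index(astring, 'c') would not compile.
  use stringifor
  implicit none
  type(string)  :: astring
  character(99) :: achar

  astring = 'abcdef'
  achar   = 'abcdef'
  print "(I0)", index(astring, 'c')   ! overloaded builtin acting on a string
  print "(I0)", index(achar, 'c')     ! intrinsic acting on a standard character
  print "(I0)", astring%index('c')    ! type-bound method, assumed signature
end program import_styles
```

Importing only the string type keeps the namespace clean, at the price of using the method syntax everywhere.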
[1] Improved String-handling in Fortran, Clive Page, October 2015.
[2] aniso_varying_string, Ian Harvey, 2016.