Releases: chaoss/grimoirelab-sortinghat
0.7.21-rc.3
sortinghat 0.7.21-rc.3 - (2022-09-23)
- Update Poetry's package dependencies
0.7.21-rc.2
sortinghat 0.7.21-rc.2 - (2022-09-09)
- Update Poetry's package dependencies
0.7.21-rc.1
sortinghat 0.7.21-rc.1 - (2022-09-06)
- Update Poetry's package dependencies
0.7.20
Sorting Hat 0.7.20 - (2022-06-02)
Bug fixes:
- [gitdm] Skip invalid format lines
Gitdm parser won't fail reading files with an invalid format. Instead,
it will ignore invalid content.
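The skip-instead-of-fail behaviour can be pictured with a minimal sketch. It is not SortingHat's parser: the `<email> <organization>` line format, the regular expression, and the name `parse_email_to_org` are assumptions made only for illustration.

```python
import re

# Assumed line format: "<email> <organization>"; "#" starts a comment.
LINE_RE = re.compile(r"^(?P<email>\S+@\S+)\s+(?P<org>\S.*)$")


def parse_email_to_org(lines):
    """Return (email, organization) pairs, ignoring lines that do not match."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # invalid format: skip the line instead of raising
        entries.append((match.group("email"), match.group("org").strip()))
    return entries


sample = [
    "jsmith@example.com Example Inc.",
    "this line is not valid",
    "# a comment",
    "jdoe@example.org Example Org",
]
print(parse_email_to_org(sample))
```

Only the two well-formed lines of `sample` end up in the result; the invalid line is dropped silently.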
0.7.0
Sorting Hat 0.7 - (2018-10-02)
NOTICE: Database schemas generated by SortingHat < 0.7.0 are still
compatible, but older versions can have problems inserting 4-byte UTF-8
characters.
Python 2.7 is no longer supported.
Please check the "Compatibility between versions" section of the README.md file.
New features and improvements:
- Python 2.7 no longer supported
  As Python 2.x will not be maintained after 2020, SortingHat is only
  compatible with Python >= 3.4.
- Low level API
  This API is able to execute basic operations over the database, such
  as adding or removing identities or finding entities. All these operations
  work within a session. Nothing is stored in the database until the
  session is closed. Thus, these functions can be considered "bricks"
  that can be combined to create high-level functions.
- Storage of UTF-8 4-byte characters
  The default UTF-8 charset (`utf8`) in MySQL/MariaDB does not support
  4-byte characters, even though they are part of the standard.
  This means characters like emojis or certain Chinese characters cannot
  be inserted. Identity names and usernames often contain these types of
  characters.
  The charset that fully supports UTF-8 is `utf8mb4`, used with the
  collation `utf8mb4_unicode_520_ci`. This collation implements the
  suggested Unicode Collation Algorithm (v5.2).
  Using `utf8mb4` also implies that the maximum size of char columns
  (VARCHAR and so on) is 191. Indexes cannot be larger than that when using
  the InnoDB engine.
  Starting with the 0.7 series, SortingHat uses this charset.
- Handle disconnection using pessimistic mode
  SQLAlchemy offers a pessimistic mode to handle database disconnections.
  Setting the `pool_pre_ping` parameter on the database engine checks whether
  the database connection is still active when a connection from the
  pool is reused. This causes a small performance hit, but it's worth it.
- Use an optimistic approach when inserting data
  With this optimistic approach, no queries are run to check whether an
  entity already exists in the database prior to its insertion.
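The optimistic insertion described in the last item can be sketched as follows. This is an illustration under stated assumptions, not SortingHat's code: it uses the standard `sqlite3` module in place of MySQL and SQLAlchemy, and the `organizations` table and the `add_organization` function are hypothetical.

```python
import sqlite3


def add_organization(conn, name):
    """Insert without a prior SELECT; the uniqueness constraint reports duplicates."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO organizations (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False  # the entity already exists


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (name TEXT PRIMARY KEY)")
print(add_organization(conn, "Example"))  # first insertion
print(add_organization(conn, "Example"))  # duplicate, rejected by the constraint
```

The common case (the entity is new) costs a single statement; the existence check is paid for only when the database reports a conflict.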
0.6.0
Sorting Hat 0.6 - (2018-03-05)
NOTICE: Database schemas generated by SortingHat < 0.6.0 are no longer
compatible. Please check the "Compatibility between versions" section of the
README.md file.
New features and improvements:
- Gender.
  The gender of a unique identity can be set in the profile using the
  `profile` command, and the data will be stored in the table of the same
  name. This table adds two new fields: `gender`, a free text field to set
  the gender value, and `gender_acc`, to set the accuracy of the gender (in
  a range of 1 to 100) when it is set using automatic options.
  The new command `autogender` has also been added. It assigns a gender
  to each unique identity using the name of the profile and the information
  provided by http://genderize.io. Possible values are male or female.
- Option for reusing a database.
  An existing database can be reused when the `init` command is called.
  Until now, this command raised an exception when the database had already
  been created.
- Version option.
  Calling `sortinghat` with the option `-v | --version` prints the version
  of `sortinghat` running on the system.
- Tests improvements.
  Some minor changes were made in the testing area. The main ones were to
  support the MariaDB engine and to use a remote testing database.
0.5.0
Sorting Hat 0.5 - (2017-12-21)
NOTICE: Database schemas generated by SortingHat < 0.5.0 are no longer
compatible. Please check the "Compatibility between versions" section of the
README.md file.
New features and improvements:
- Last modification.
  Unique identities and identities log the last time they were modified
  by any of these operations: adding, deleting, moving, merging, updating
  the profile, and adding or removing enrollments.
  The new `search_last_modified_identities` API function allows searching
  for the UUIDs of those identities modified on or after a given date.
- No strict matching option.
  This option skips the rigorous validation of values while matching
  identities, for instance, requiring well-formed email addresses
  or names with a first name and a last name. This option is available on
  the `load` and `unify` commands.
- Reset option while loading.
  Before loading any data, if the `reset` option is set, all the
  relationships between identities and their enrollments will be removed
  from the database.
- GrimoireLab support.
  GrimoireLab identities and organizations YAML files can be converted
  to Sorting Hat JSON format using the script `grimoirelab2sh`.
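The difference between strict and non-strict matching can be pictured with a small sketch. The `same_email` function, its `strict` flag, and the regular expression used as the notion of "well formed" are all assumptions for illustration; SortingHat's matchers and validation rules may differ.

```python
import re

# Assumed notion of a well-formed address; the real pattern may differ.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def same_email(a, b, strict=True):
    """Compare two email values; in strict mode malformed values never match."""
    if not a or not b:
        return False
    if strict and not (EMAIL_RE.match(a) and EMAIL_RE.match(b)):
        return False
    return a == b


print(same_email("jsmith@example.com", "jsmith@example.com"))
print(same_email("jsmith", "jsmith"))
print(same_email("jsmith", "jsmith", strict=False))
```

With strict matching, the malformed value `jsmith` is rejected even though both sides are equal; without it, equality alone is enough.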
Bugs fixed:
- Fix tables created with invalid collation. In some random situations
  Sorting Hat tables appear with an invalid collation. This is related
  to a wrong generation of the DDL table statement by SQLAlchemy, which
  may randomly prepend the collation information (`MYSQL_COLLATE`) to
  the charset one (`MYSQL_CHARSET`), causing the former to be ignored.
  Changing `MYSQL_CHARSET` to `MYSQL_DEFAULT_CHARSET` fixed the problem.
- Remove trailing whitespaces in exported JSON files. This error is only
  found in Python 2.7 due to a bug in the standard library with
  `json.dump()` and the `indent` parameter. (#103)
- Update profile information when loading identities. Until now, profile
  information was set only the first time a unique identity was loaded.
  With this change, it will always be updated, except when the given
  profile is empty.
0.4.0
Sorting Hat 0.4 - (2017-07-17)
New features and improvements:
- Mailmap and StackAlytics support.
  Mailmap and StackAlytics files can be converted to Sorting Hat JSON
  format using the new scripts `mailmap2sh` and `stackalytics2sh`.
- Unify by sources.
  Given a list of sources, this option makes the `unify` command
  merge only those unique identities which belong to any of the given
  sources.
Bugs fixed:
- Encoding error generating UUIDs in Python 3. Some special characters
  cannot be encoded in Python 3. This caused the function `uuid()` to fail
  when converting those characters. The 'surrogateescape' handler was
  added to fix that problem.
- Force `utf8_unicode_ci` collation on MySQL tables to fix integrity errors.
  MySQL considers chars like `β` and `b` or `ı` and `i` the same when
  some collation values are set (e.g. `utf8_general_ci`). This can raise
  integrity errors when Sorting Hat tries to add similar identities with
  these pairs of characters.
  For instance, if the identity:
  ('scm', 'βart', 'bart@example.com', 'bart')
  is stored in the database, the insertion of:
  ('scm', 'bart', 'bart@example.com', 'bart')
  will raise an error, even when these identities have different UUIDs.
  Forcing MySQL to use `utf8_unicode_ci` fixes this error, allowing
  both identities to be inserted.
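For the encoding error in the first fix above, the effect of the `surrogateescape` error handler can be shown with the standard library alone. The byte string is made up for the example, and where exactly `uuid()` applies the handler is not stated here, so this only demonstrates the handler itself.

```python
raw = b"J\xf6hn"  # Latin-1 bytes, not valid UTF-8

# Undecodable bytes become lone surrogates instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")

try:
    name.encode("utf-8")  # the default strict handler rejects lone surrogates
except UnicodeEncodeError as exc:
    print("strict encoding fails:", exc.reason)

# With the handler, the surrogates are turned back into the original bytes.
restored = name.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```

Strings carrying such surrogates are typical of data read from files or command-line arguments with a mismatched encoding, which is the kind of input an identity importer has to tolerate.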
0.3.0
Sorting Hat 0.3 - (2017-03-21)
NOTICE: UUIDs generated by SortingHat < 0.3.0 are no longer compatible.
Please check the "Compatibility between versions" section of the README.md file.
New features and improvements:
- New algorithm to generate UUIDs.
  UUIDs were generated using case and accent sensitive values with the seed
  `(source:email:name:username)`. This means that identities with the
  same values in lower or upper case (e.g. `jsmith@example.com` and
  `JSMITH@example.com`) or with the same values accented or unaccented
  (e.g. `John Smith` or `Jöhn Smith`) would have different UUIDs for each
  of these combinations.
  The new algorithm changes upper to lower case characters and converts
  accented characters to their canonical form before the UUID is generated.
  This change is caused by the behaviour of MySQL with case configurations
  and accented and unaccented characters. MySQL considers those characters
  the same, raising `IntegrityError` exceptions when similar tuple values
  are inserted into the database. Generating the same UUID for these cases
  will prevent the error.
  Take into account that previous UUIDs are no longer compatible with this
  version of SortingHat. You should regenerate the UUIDs following the steps
  described in the "Compatibility between versions" section of the
  `README.md` file.
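The normalization step can be sketched as below. Only the seed layout `source:email:name:username` and the lower-casing and unaccenting come from the notes above; the use of NFKD decomposition, the SHA-1 digest, and the names `normalize` and `identity_uuid` are assumptions, so the values produced here should not be expected to match SortingHat's UUIDs.

```python
import hashlib
import unicodedata


def normalize(value):
    """Lower-case the value and strip accents via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", value or "")
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unaccented.lower()


def identity_uuid(source, email=None, name=None, username=None):
    """Hash the normalized seed; the SHA-1 digest is an assumption of this sketch."""
    seed = ":".join(normalize(v) for v in (source, email, name, username))
    return hashlib.sha1(seed.encode("utf-8")).hexdigest()


a = identity_uuid("scm", "jsmith@example.com", "John Smith", "jsmith")
b = identity_uuid("scm", "JSMITH@example.com", "Jöhn Smith", "jsmith")
print(a == b)
```

Because both tuples normalize to the same seed, they receive the same identifier, which is what avoids inserting two rows that MySQL would treat as duplicates.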
Bugs fixed:
- Any non-empty value in the email field was used during the affiliation.
  This caused errors for invalid email addresses such as 'email@', which
  raised an `IndexError` exception. This bug has been fixed by using only
  valid email addresses during the affiliation.
- Invalid database names were allowed in the `init` command.
0.2.0
Sorting Hat 0.2 - (2017-02-01)
New features and improvements:
- Auto complete profile information with the `autoprofile` command.
  This command autocompletes the profile information related to a set of
  unique identities. To update the profile, the command uses a list of
  sources ordered by priority. Only those unique identities which have one
  or more identities from any of these sources will be updated. The name of
  the profile will be filled using the best name possible, normally the
  longest one.
- GitHub identities matching method.
  This new method tries to find equal identities using those identities from
  GitHub sources. The identities must come from a source starting with a
  `github` label and the usernames must be equal.
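The GitHub matching rule above can be written down as a small predicate. The `Identity` record and the `github_match` function are hypothetical stand-ins for SortingHat's own classes, and the `github-commits` source label is only an example value.

```python
from collections import namedtuple

# Hypothetical minimal identity record; the real model has more fields.
Identity = namedtuple("Identity", ["source", "username"])


def github_match(a, b):
    """True when both identities come from a 'github*' source and share a username."""
    from_github = a.source.startswith("github") and b.source.startswith("github")
    return from_github and bool(a.username) and a.username == b.username


print(github_match(Identity("github", "jsmith"), Identity("github-commits", "jsmith")))
print(github_match(Identity("github", "jsmith"), Identity("scm", "jsmith")))
```

The second pair shares a username but not a GitHub source, so it is not considered a match.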
Bugs fixed:
- The parser for Gitdm files only accepted email addresses as valid aliases.
  This has been modified to accept any type of alias. Thus, the input file
  passed to the `gitdm2sh` script will be a list of valid aliases instead of
  email aliases.