Releases: chaoss/grimoirelab-sortinghat

0.7.21-rc.3

23 Sep 09:35

sortinghat 0.7.21-rc.3 - (2022-09-23)

  • Update Poetry's package dependencies

0.7.21-rc.2

09 Sep 16:21

sortinghat 0.7.21-rc.2 - (2022-09-09)

  • Update Poetry's package dependencies

0.7.21-rc.1

06 Sep 12:33

sortinghat 0.7.21-rc.1 - (2022-09-06)

  • Update Poetry's package dependencies

0.7.20

02 Jun 17:02

Sorting Hat 0.7.20 - (2022-06-02)

Bug fixes:

  • [gitdm] Skip invalid format lines
    Gitdm parser won't fail reading files with an invalid format. Instead,
    it will ignore invalid content.
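
The skip-on-invalid behaviour can be sketched as follows. This is a minimal illustration, not the actual Gitdm parser: the `"<email> <organization>"` line format, the `#` comment rule, and the names `LINE_RE` and `parse_email_to_org` are all assumptions made for the example.

```python
import re

# Assumed line format for this sketch: "<email> <organization>".
# Blank lines and lines starting with '#' are treated as comments.
LINE_RE = re.compile(r"^(?P<email>\S+@\S+)\s+(?P<org>\S.*)$")


def parse_email_to_org(text):
    """Return (email, organization) pairs, skipping lines that do not match."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            # Invalid format: ignore the line instead of raising an error.
            continue
        entries.append((match.group("email"), match.group("org")))
    return entries


sample = """# email to organization
jsmith@example.com Example
this line is not valid
jdoe@example.org Example Labs
"""
print(parse_email_to_org(sample))
```

The malformed third line is dropped silently, and the two valid entries are still returned.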

0.7.0

02 Oct 17:31

Sorting Hat 0.7 - (2018-10-02)

NOTICE: Database schema generated by SortingHat < 0.7.0 is still
compatible, but older versions can have problems inserting 4-byte
UTF-8 characters.

Python 2.7 is no longer supported.

Please check the "Compatibility between versions" section of the README.md file.

New features and improvements:

  • Python 2.7 no longer supported

    As Python 2.x will not be maintained after 2020, SortingHat is only
    compatible with Python >= 3.4.

  • Low level API

    This API is able to execute basic operations over the database, such
    as adding or removing identities or finding entities. All these operations
    work within a session. Nothing is stored in the database until the
    session is closed. Thus, these functions can be considered as "bricks",
    that combined can create high-level functions.

  • Storage of 4-byte UTF-8 characters

    The default UTF-8 charset (utf8) in MySQL/MariaDB does not support
    4-byte characters, even though they are part of the standard.
    This means characters such as emojis or certain Chinese characters cannot
    be inserted. Identity names and usernames often contain these types of
    characters.

    The charset that fully supports UTF-8 is utf8mb4 using the collation
    utf8mb4_unicode_520_ci. This collation implements the suggested Unicode
    Collation Algorithm (v5.2).

    Using utf8mb4 also implies that the maximum size of character columns
    (VARCHAR and so on) is 191, because indexes cannot be larger than that
    when using the InnoDB engine.

    Starting with the 0.7 series, SortingHat uses this charset.

  • Handle disconnection using pessimistic mode

    SQLAlchemy offers a pessimistic mode to handle database disconnection.
    Setting the pool_pre_ping parameter on the database engine checks whether
    the database connection is still alive each time a connection from the
    pool is reused. This has a small performance cost, but it is worth it.

  • Use an optimistic approach when inserting data

    With this optimistic approach, queries that check whether an entity
    exists in the database are no longer run before inserting it.
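
The 191-character limit mentioned for utf8mb4 follows from simple arithmetic. The sketch below assumes InnoDB's classic 767-byte index key prefix limit (the one that applies to the older COMPACT and REDUNDANT row formats); `max_indexed_chars` is a hypothetical helper written only for this illustration.

```python
# InnoDB's classic index key prefix limit (COMPACT/REDUNDANT row formats).
INNODB_INDEX_PREFIX_BYTES = 767


def max_indexed_chars(bytes_per_char):
    """Largest column length, in characters, that fits in one index prefix."""
    return INNODB_INDEX_PREFIX_BYTES // bytes_per_char


emoji = "\N{GRINNING FACE}"
print(len(emoji.encode("utf-8")))   # bytes needed by one emoji
print(max_indexed_chars(3))         # utf8 (3 bytes per character)
print(max_indexed_chars(4))         # utf8mb4 (4 bytes per character)
```

With 3 bytes per character an indexed column can hold 255 characters; reserving 4 bytes per character lowers that to 191.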
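
The idea behind the pessimistic mode can be illustrated without SQLAlchemy. In SQLAlchemy itself the feature is enabled by passing `pool_pre_ping=True` to `create_engine()`; the toy `PrePingPool` class below is a hypothetical stand-in built on the standard `sqlite3` module, and only mimics the "test before reuse" step.

```python
import sqlite3


class PrePingPool:
    """Toy single-connection pool that tests the connection before reuse."""

    def __init__(self, path):
        self._path = path
        self._conn = sqlite3.connect(path)
        self.reconnects = 0

    def checkout(self):
        try:
            # The "ping": a cheap statement that fails on a dead connection.
            self._conn.execute("SELECT 1")
        except sqlite3.Error:
            # Connection is unusable: replace it transparently.
            self._conn = sqlite3.connect(self._path)
            self.reconnects += 1
        return self._conn


pool = PrePingPool(":memory:")
pool.checkout().close()          # simulate a dropped connection
conn = pool.checkout()           # ping fails, pool reconnects
print(conn.execute("SELECT 1").fetchone(), pool.reconnects)
```

Every checkout pays for one extra statement, which is the small performance cost the release note refers to.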
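
The optimistic approach to insertion amounts to "insert first, handle the conflict afterwards". The following sketch shows the pattern with the standard `sqlite3` module; the `identities` table and the `add_identity` function are hypothetical and do not reflect SortingHat's schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identities (id TEXT PRIMARY KEY, name TEXT)")


def add_identity(conn, identity_id, name):
    """Insert directly; report a duplicate only if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO identities (id, name) VALUES (?, ?)",
                (identity_id, name),
            )
    except sqlite3.IntegrityError:
        return False  # already stored; no prior SELECT was needed
    return True


print(add_identity(conn, "a1b2", "John Smith"))
print(add_identity(conn, "a1b2", "John Smith"))
```

The first call succeeds and the second is rejected by the primary key constraint, so the existence check is left to the database instead of an extra query.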

0.6.0

05 Mar 19:34

Sorting Hat 0.6 - (2018-03-05)

NOTICE: Database schemas generated by SortingHat < 0.6.0 are no longer
compatible. Please check the "Compatibility between versions" section of
the README.md file.

New features and improvements:

  • Gender.

    The gender of a unique identity can be set in its profile using the
    profile command; the data is stored in the table of the same name. This
    table adds two new fields: gender, a free text field for the gender
    value, and gender_acc, the accuracy of the gender, in a range of 1 to
    100, when it is set using automatic options.

    The new command autogender has also been added. It assigns a gender
    to each unique identity using the name of the profile and the information
    provided by http://genderize.io. Possible values are male or female.

  • Option for reusing a database.

    An existing database can be reused when the init command is called. Until
    now, this command raised an exception when the database already existed.

  • Version option.

    Calling sortinghat with the option -v | --version prints the version
    of sortinghat running on the system.

  • Tests improvements.

    Some minor changes were made to the tests. The main ones add support for
    the MariaDB engine and for using a remote testing database.

0.5.0

21 Dec 12:21

Sorting Hat 0.5 - (2017-12-21)

NOTICE: Database schemas generated by SortingHat < 0.5.0 are no longer
compatible. Please check the "Compatibility between versions" section of
the README.md file.

New features and improvements:

  • Last modification.

    Unique identities and identities record the last time they were modified
    by any of these operations: adding, deleting, moving, merging, updating
    the profile, or adding or removing enrollments.

    The new search_last_modified_identities API function searches for the
    UUIDs of the identities modified on or after a given date.

  • No strict matching option.

    This option relaxes the validation of values while matching identities;
    for instance, it does not require well-formed email addresses or names
    made up of a first name and a last name. This option is available on
    the load and unify commands.

  • Reset option while loading.

    Before loading any data, if the reset option is set, all the
    relationships between identities and their enrollments will be removed
    from the database.

  • GrimoireLab support.

    GrimoireLab identities and organizations YAML files can be converted
    to Sorting Hat JSON format using the script grimoirelab2sh.

Bugs fixed:

  • Fix tables created with invalid collation. In some random situations
    Sorting Hat tables appear with an invalid collation. This is related
    to a wrong generation of the DDL table statement by SQLAlchemy, which
    may randomly prepend the collation information (MYSQL_COLLATE) to
    the charset one (MYSQL_CHARSET), causing the former to be ignored.
    Changing MYSQL_CHARSET to MYSQL_DEFAULT_CHARSET fixed the problem.

  • Remove trailing whitespaces in exported JSON files. This error is only
    found in Python 2.7 due to a bug in the standard library with
    json.dump() and indent parameter. (#103)

  • Update profile information when loading identities. Until now, profile
    information was set only the first time a unique identity was loaded.
    With this change, it will always be updated, except when the given
    profile is empty.
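
For the trailing-whitespace fix listed above, one common workaround is to pass explicit separators to `json.dump()`, so that no space follows the item separator when `indent` is used. Whether Sorting Hat applied exactly this change is an assumption; the `data` structure is made up for the example.

```python
import io
import json

# Hypothetical export payload, used only for this sketch.
data = {"uidentities": {"a1b2": {"name": "John Smith", "email": None}}}

buffer = io.StringIO()
# Explicit separators: no space after the item separator, so indented
# output has no trailing whitespace on any Python version.
json.dump(data, buffer, indent=4, separators=(",", ": "), sort_keys=True)

lines = buffer.getvalue().splitlines()
print(all(line == line.rstrip() for line in lines))
```

The check at the end confirms that no exported line ends in whitespace.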

0.4.0

17 Jul 17:09

Sorting Hat 0.4 - (2017-07-17)

New features and improvements:

  • Mailmap and StackAlytics support.

    Mailmap and StackAlytics files can be converted to Sorting Hat JSON
    format using the new scripts mailmap2sh and stackalytics2sh.

  • Unify by sources.

    Given a list of sources, this option makes the unify command merge
    only those unique identities that belong to any of the given
    sources.

Bugs fixed:

  • Encoding error generating UUIDs in Python 3. Some special characters
    cannot be encoded in Python 3. This caused the uuid() function to fail
    when converting those characters. The 'surrogateescape' error handler
    was added to fix the problem.

  • Force utf8_unicode_ci collation on MySQL tables to fix integrity errors.
    MySQL considers characters like β and b or ı and i the same when
    some collation values are set (e.g. utf8_general_ci). This can raise
    integrity errors when Sorting Hat tries to add similar identities with
    these pairs of characters.

    For instance, if the identity:

    ('scm', 'βart', 'bart@example.com', 'bart')

    is stored in the database, the insertion of:

    ('scm', 'bart', 'bart@example.com', 'bart')

    will raise an error, even when these identities have different UUIDs.
    Forcing MySQL to use utf8_unicode_ci fixes this error, allowing
    to insert both identities.
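
The 'surrogateescape' fix for the UUID encoding error can be illustrated with the standard library alone. The snippet is a sketch, not the code of `uuid()`: the sample name is invented, and hashing the encoded bytes with SHA-1 is an assumption made only to show the value being consumed.

```python
import hashlib

# A string holding a lone surrogate, as produced when undecodable bytes
# are read with errors="surrogateescape" (here, the byte 0xff).
name = b"J\xffhn".decode("utf-8", errors="surrogateescape")

try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With the handler, the surrogate is turned back into the original byte.
encoded = name.encode("utf-8", errors="surrogateescape")
digest = hashlib.sha1(encoded).hexdigest()
print(strict_ok, encoded, len(digest))
```

Strict encoding fails on the lone surrogate, while the handler round-trips it to the original byte so the value can still be hashed.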

0.3.0

21 Mar 13:12

Sorting Hat 0.3 - (2017-03-21)

NOTICE: UUIDs generated by SortingHat < 0.3.0 are no longer compatible.
Please check the "Compatibility between versions" section of the README.md file.

New features and improvements:

  • New algorithm to generate UUIDs.

    UUIDs were generated using case and accent sensitive values with the seed
    (source:email:name:username). This means that identities with the same
    values in lower or upper case (e.g. jsmith@example.com and JSMITH@example.com)
    or with the same values accented or unaccented (e.g. John Smith or Jöhn Smith)
    would have different UUIDs for each of these combinations.

    The new algorithm changes upper to lower case characters and converts accented
    characters to their canonical form before the UUID is generated.

    This change is caused by the behaviour of MySQL with case configurations
    and accent and unaccent characters. MySQL considers those characters the same,
    raising IntegrityError exceptions when similar tuple values are inserted
    into the database. Generating the same UUID for these cases will prevent the
    error.

    Take into account that previous UUIDs are no longer compatible with this
    version of SortingHat. You should regenerate the UUIDs following the steps
    described in the "Compatibility between versions" section of the README.md file.
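
A minimal sketch of the normalization step, under stated assumptions: accents are stripped with an NFKD decomposition, and the seed is hashed with SHA-1. Both choices, as well as the names `normalize` and `identity_uuid`, are illustrative and may differ from the actual implementation.

```python
import hashlib
import unicodedata


def normalize(value):
    """Lower-case a value and strip accents via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unaccented.lower()


def identity_uuid(source, email=None, name=None, username=None):
    """Hash the seed source:email:name:username after normalizing it."""
    seed = ":".join(normalize(v or "") for v in (source, email, name, username))
    return hashlib.sha1(seed.encode("utf-8")).hexdigest()


a = identity_uuid("scm", "jsmith@example.com", "John Smith", "jsmith")
b = identity_uuid("scm", "JSMITH@example.com", "Jöhn Smith", "jsmith")
print(a == b)
```

Both tuples now produce the same identifier, which is what prevents the database from seeing two "different" rows that it considers equal.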

Bugs fixed:

  • Any non-empty value in the email field was used during affiliation. This
    caused errors with invalid email addresses such as 'email@', which
    raised an IndexError exception. The bug has been fixed by using only
    valid email addresses during affiliation.

  • Invalid database names were allowed in the init command.

0.2.0

01 Feb 17:29

Sorting Hat 0.2 - (2017-02-01)

New features and improvements:

  • Auto complete profile information with autoprofile command.

    This command autocompletes the profile information related to a set of unique
    identities. To update the profile, the command uses a list of sources ordered
    by priority. Only those unique identities which have one or more identities
    from any of these sources will be updated. The name of the profile will be
    filled using the best name possible, normally the longest one.

  • GitHub identities matching method.

    This new method tries to find equal identities using those identities from
    GitHub sources. The identities must come from a source starting with a github
    label and the usernames must be equal.
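
The name selection described for the autoprofile command can be sketched as a small function. The data layout (a list of `(source, name)` pairs) and the function name `best_name` are hypothetical; "best" is taken here to mean the longest name, as the description suggests is normally the case.

```python
def best_name(identities, sources):
    """Pick a profile name from the highest-priority source that has one.

    `identities` is a list of (source, name) pairs; `sources` is ordered
    from highest to lowest priority.
    """
    for source in sources:
        names = [n for s, n in identities if s == source and n]
        if names:
            # "Best" is taken here to be the longest name available.
            return max(names, key=len)
    return None  # no identity from the given sources: leave profile as is


identities = [
    ("scm", "J. Smith"),
    ("scm", "John Smith"),
    ("its", "jsmith"),
]
print(best_name(identities, ["its", "scm"]))
print(best_name(identities, ["scm", "its"]))
print(best_name(identities, ["mls"]))
```

Changing the order of `sources` changes which name wins, and a unique identity with no identities in the listed sources is left untouched.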
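
The GitHub matching rule (source starting with a github label, equal usernames) can be expressed as a short predicate. The dictionary layout and the name `match_github` are assumptions for this sketch; the comparison is kept case-sensitive because the description does not say otherwise.

```python
def match_github(a, b):
    """Return True when both identities come from a GitHub source
    and share the same non-empty username."""
    for identity in (a, b):
        if not identity["source"].startswith("github"):
            return False
        if not identity["username"]:
            return False
    return a["username"] == b["username"]


x = {"source": "github", "username": "jsmith"}
y = {"source": "github-commits", "username": "jsmith"}
z = {"source": "scm", "username": "jsmith"}
print(match_github(x, y))
print(match_github(x, z))
```

The first pair matches because both sources carry the github prefix; the second does not, even though the usernames are equal.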

Bugs fixed:

  • The parser for Gitdm files only accepted email addresses as valid aliases.
    This has been modified to accept any type of alias. Thus, the input file
    passed to the gitdm2sh script will be a list of valid aliases instead of
    email aliases.