Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to boost trsort on some files #16

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

aeb1787
Copy link

@aeb1787 aeb1787 commented Sep 29, 2016

  1. It decreases the amount of updates of ranks.
  2. It decreases the number of comparisons while sorting of first
    and last partitions after tandem repeat partitioning.

It decreases total time for Manzini's corpus by 3% and total time for Gauntlet by 24%

@akamiru akamiru mentioned this pull request Sep 29, 2016
Copy link

@chadbrewbaker chadbrewbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comments. Suggested a few unit tests, and raised the issue of Boost license.

@@ -328,13 +328,13 @@ tr_introsort(saidx_t *ISA, const saidx_t *ISAd,
saidx_t *SA, saidx_t *first, saidx_t *last,
trbudget_t *budget) {
#define STACK_SIZE TR_STACKSIZE
struct { const saidx_t *a; saidx_t *b, *c; saint_t d, e; }stack[STACK_SIZE];
Copy link

@chadbrewbaker chadbrewbaker Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stack[x].e dead code? If so this change alone is a win for cache efficiency unless there is some alignment reason why a dummy value was in there.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack[x].e stored the trlink but the refactoring makes it unneccessary. It seems trlink was used to track when to switch to heap sort in case of tandem repeat.

@@ -552,7 +1045,7 @@ tr_introsort(saidx_t *ISA, const saidx_t *ISAd,

/* Tandem repeat sort */
void
trsort(saidx_t *ISA, saidx_t *SA, saidx_t n, saidx_t depth) {
trsort(saidx_t *ISA, saidx_t *SA, saidx_t n, saidx_t depth, saidx_t *buf, saidx_t bufsize) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add a small and large test case so this code gets coverage.

}
}
}
#undef STACK_SIZE
}

static
Copy link

@chadbrewbaker chadbrewbaker Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need at least a few tests that pass the same string to each of the sort subroutines and verify they return the same sort.

Also, is any of this code lifted from Boost? If so it needs to be refactored into a separate file, add a note to the license file, and add a build option so you compile with MIT only code by default.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell straigth from the code if it's equal in terms of computation but it seems by boost he refers to "improve" and not to the software library. The code looks like a modification of yuta mori's code and integrates quite closely. It would be nice to get an explaination for trsort and the changes as it's rather unoblivious and almost no documentation exists.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a flight coming up. I'll try to add a lightweight set of unit tests. Also make the CMake more llvm friendly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants