Does bwt-merge work with BWTs from non-DNA alphabets?
Given that it takes SGA's BWT format as input, this would presumably require either SGA to support non-DNA alphabets as well, or another BWT format?
I'd like to build a full-text index of all Wikipedia revisions. This is something like 100 TB uncompressed, and it's not clear to me whether there is a better way to do this than an approach based on BWT merging. If there is, I'd be curious to know.
BWT-merge assumes an alphabet of size 6. Supporting a byte alphabet would require changes to the BWT encoding, the rank structure, and the optimizations in trie traversal.
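To illustrate why the alphabet size reaches into the encoding, here is a rough sketch of how many bits a symbol code needs for a given alphabet size, and how many bits that would leave for a run length if a symbol and its run length were packed into one byte. The one-byte packing and the function names `symbol_bits` and `spare_bits_in_byte` are assumptions made for illustration only, not a description of BWT-merge's actual format.

```python
def symbol_bits(sigma: int) -> int:
    """Bits needed to give each of `sigma` symbols a distinct fixed-width code."""
    if sigma < 1:
        raise ValueError("alphabet size must be positive")
    # (sigma - 1).bit_length() equals ceil(log2(sigma)) for sigma >= 2,
    # computed with integers only to avoid floating-point rounding.
    return max(1, (sigma - 1).bit_length())


def spare_bits_in_byte(sigma: int) -> int:
    """Bits left for a run length in a hypothetical one-byte (symbol, run) code."""
    return max(0, 8 - symbol_bits(sigma))


if __name__ == "__main__":
    for sigma in (6, 256):
        print(f"sigma={sigma}: {symbol_bits(sigma)} bits per symbol, "
              f"{spare_bits_in_byte(sigma)} bits spare in a byte")
```

Under this assumed packing, a 6-symbol alphabet leaves room for a run length in the same byte, while a 256-symbol alphabet uses the whole byte for the symbol alone, so any such scheme would have to be redesigned.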
Building a BWT for 100 terabytes of data would probably take around 10 CPU years. If you want a single FM-index, you also need enough memory for it on a single system, and the construction will take months. If you can live with p indexes, you can distribute the construction to p systems.
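As a back-of-the-envelope check on those figures, the sketch below works out the per-core throughput implied by 100 TB in 10 CPU years, and the wall-clock time if the work were spread over some number of cores. The core counts, the decimal definition of a terabyte, and the assumption of perfect parallel scaling are all illustrative; real construction would not scale this cleanly.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds, ignoring leap years


def implied_throughput(total_bytes: float, cpu_years: float) -> float:
    """Bytes processed per CPU-second implied by a total CPU-time estimate."""
    return total_bytes / (cpu_years * SECONDS_PER_YEAR)


def wall_clock_days(cpu_years: float, cores: int) -> float:
    """Wall-clock days if the CPU time divided perfectly across `cores` (idealized)."""
    return cpu_years * 365 / cores


if __name__ == "__main__":
    total = 100e12  # 100 TB, taking 1 TB = 10**12 bytes
    rate = implied_throughput(total, cpu_years=10)
    print(f"implied rate: {rate / 1e6:.2f} MB per CPU-second")
    for cores in (10, 32, 64):  # hypothetical core counts
        print(f"{cores} cores: about {wall_clock_days(10, cores):.0f} days")
```

The implied rate is roughly a third of a megabyte per CPU-second, which is why a single system, even with tens of cores, ends up in the range of months.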