SeqRepo is capable of >1500 queries/second single-threaded with local data. At this rate, sequence fetching is likely to be a small component of overall execution of a typical analysis pipeline.
Optimizing significantly beyond current performance requires loading sequences in memory. However, it's not generally feasible or useful to prefetch all sequences. Current human databases are ~12GB compressed. Prefetching selected sequences on first access could be very beneficial for certain access patterns.
Prefetching might work as follows. The client would be instantiated with a prefetch cache size, which would control the number of sequences in the prefetch cache. The default is 0 (no prefetch).
When a client requests a slice of a sequence, the entire sequence would be read speculatively, anticipating that the next queries might be on the same sequence (e.g., on a single chromosome). Subsequent sequence lookups would be entirely in-memory.
The cache would operate as a typical LRU cache, automatically evicting the least recently accessed sequence once the cache reaches its target size.
Importantly, prefetching can degrade performance if accesses are not suitably ordered: for example, alternating queries across more sequences than the cache holds would trigger a full-sequence read on every access.
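The scheme above could be sketched roughly as follows. This is a minimal illustration, not SeqRepo's actual API: `backend_fetch` is a hypothetical callable that returns the full sequence for an accession, standing in for the real on-disk lookup.

```python
from collections import OrderedDict


class PrefetchingClient:
    """Sketch of an LRU prefetch cache for whole sequences.

    `backend_fetch` is a hypothetical callable: accession -> full sequence
    string. `cache_size` is the maximum number of whole sequences held in
    memory; 0 (the default) disables prefetching entirely.
    """

    def __init__(self, backend_fetch, cache_size=0):
        self._fetch = backend_fetch
        self._cache_size = cache_size
        self._cache = OrderedDict()  # accession -> full sequence, in LRU order

    def fetch(self, accession, start=None, end=None):
        if self._cache_size == 0:
            # No prefetch: read directly from the backend each time.
            return self._fetch(accession)[start:end]
        if accession in self._cache:
            # Cache hit: mark as most recently used.
            self._cache.move_to_end(accession)
        else:
            # Speculative whole-sequence read on first access.
            self._cache[accession] = self._fetch(accession)
            if len(self._cache) > self._cache_size:
                # Evict the least recently accessed sequence.
                self._cache.popitem(last=False)
        # Subsequent slices of this sequence are served from memory.
        return self._cache[accession][start:end]
```

With `cache_size=1`, repeated slices of one chromosome hit the backend only once, which is exactly the access pattern the proposal targets; the same setting also demonstrates the caveat, since alternating between two accessions would evict and refetch on every call.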