Commit
MDP-1754 Improve the memory and cpu performance of reading a subset of columns.

By using the index bitfield masks we can return a sparse dataframe.
This is a behaviour change, as we don't return rows for timestamps where
the field wasn't updated.

Old code:
=========
# All columns
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206))
1 loops, best of 3: 1.99 s per loop

# Multiple columns
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206), columns=['DISC_BID1', 'BID'])
10 loops, best of 3: 82.2 ms per loop

# Single very sparse column
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206), columns=['DISC_BID1'])
10 loops, best of 3: 76.4 ms per loop

New code:
=========
# All columns
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206))
1 loop, best of 3: 2.29 s per loop

# Multiple columns
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206), columns=['DISC_BID1', 'BID'])
10 loops, best of 3: 75.4 ms per loop

# Single very sparse column
%timeit l.read('3284.JP', date_range=adu.DateRange(20170101, 20170206), columns=['DISC_BID1'])
10 loops, best of 3: 47.4 ms per loop

Fixes pandas-dev#290
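
Illustration of the behaviour change described in this commit message. This is a minimal, standalone sketch and not the code touched by the commit: it assumes each tick carries an integer bitmask with one bit per column, set when that column was updated. The names COLUMNS, COLUMN_BIT and read_columns are hypothetical, as is the sample data.

```python
# Hypothetical layout: one bit per column, set when the column was updated in a tick.
COLUMNS = ["BID", "ASK", "DISC_BID1"]
COLUMN_BIT = {name: 1 << i for i, name in enumerate(COLUMNS)}


def read_columns(ticks, columns):
    """Return only the rows in which at least one requested column was updated.

    ticks is an iterable of (timestamp, mask, values), where values maps the
    columns updated in that tick to their new value.
    """
    # Combine the bits of all requested columns into a single mask.
    wanted = 0
    for name in columns:
        wanted |= COLUMN_BIT[name]

    rows = []
    for timestamp, mask, values in ticks:
        if (mask & wanted) == 0:
            continue  # none of the requested columns changed: skip the row
        # Requested columns that were not updated in this tick are left as None.
        row = {
            name: values[name] if mask & COLUMN_BIT[name] else None
            for name in columns
        }
        rows.append((timestamp, row))
    return rows


# Made-up sample ticks, for illustration only.
ticks = [
    (1, 0b011, {"BID": 100.0, "ASK": 101.0}),
    (2, 0b010, {"ASK": 101.5}),
    (3, 0b100, {"DISC_BID1": 99.5}),
    (4, 0b001, {"BID": 100.5}),
]

sparse = read_columns(ticks, ["DISC_BID1"])
both = read_columns(ticks, ["DISC_BID1", "BID"])
```

In this sketch, asking for a single sparse column returns only the ticks that touched it, which is why a timestamp where the field was not updated no longer produces a row.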