forked from thomasmoelhave/tpie
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathTPIE-wish-list.txt
265 lines (233 loc) · 10.5 KB
/
TPIE-wish-list.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
//////////////////////////////////////////////
///
/// Pending Issues
///
//////////////////////////////////////////////
NEXT ID: 18
------------------
ID: 2
General Area: Priority Queues
Priority: Medium
Description: TPIE is lacking a good implementation of a general purpose
external priority queue. There is an implementation from TerraFlow, but
this is not in the main TPIE source and there are some concerns about its
performance. It would be nice to examine the performance of the TFlow PQ,
possibly implement some other EM PQueues and put at least one decent
implementation into the main TPIE source
Status: Open as of 1 April 2005
----------------------
ID: 6
General Area: Priority Queue
Priority: ?
Description: From Herman Haverkort "TPIE philosophy is, apparently, that
the user does not have to worry about how memory is used. This is fine as
long as you have one thing running at a time. But if, for example, you
want to run an algorithm that uses two priority queues in parallel, you
really want to be able to give them each a share of the main memory,
rather then seeing the first one to be initialized take all."
Perhaps this can be solved--in the case of the priority queue--by having
an option in the constructor that specifies the amount of memory to use.
It could default to all available memory. We would need to check other
data structures for this too. Perhaps the B-tree and related tree
structures. I don't think this is a problem for sorting as you can't do
anything else while you are sorting and sort cleans up its memory when it
is done.
Status: Open as of 21 April 2005
------------------------
ID: 17
General Area: Priority Queue
Description:
Kevin writes "can you add the following to the wish list of pqueue: Currently
it uses both x < y and x.getPriorty() < y.getPriority(), which are assumed to
be the same. Ideally it should only use x < y. Then the user can change the
definition of x < y so that we can also have a max pqueue easily"
I agree with Kevin that is does not make sense to require both "<" and
"getPriority()", but I don't agree that we should overload "<" to mean ">"
In this case it makes sense to support a version of the PQ that supports a
comparison operator "<" and a version that supporst a comparison object or
functor (). This is what is done in STL and TPIE sort has a similar
mechanism, except the comparison object has a function called compare().
In the next version of TPIE, this should probably be changed to match
STL-style, but that is a separate wish-item.
Status: Open as of 10 Feb 2006
------------------------
ID: 10
General Area: Priority Queue
Priority:
Description: Regarding the TerraFlow Priority Queue, messages are
displayed with printf to standard out. This should probably be sent to a
log file and it should be possible to turn off messages all together.
See Also: wish-list ID 2,11,12
Status: Open as of 21 April 2005
------------------------
ID: 11
General Area: Priority Queue
Priority:
Description: Regarding the TerraFlow Priority Queue, it leaves a lot of
temporary files persistent. They should be deleted if not needed.
See Also: wish-list ID 2,10,12
Status: Open as of 21 April 2005
------------------------
ID: 12
General Area: Priority Queue
Priority:
Description: Regarding the TerraFlow Priority Queue, Herman Haverkort
suspects that it crashes on low memory limits.
See Also: wish-list ID 2,10,12
Status: Open as of 21 April 2005
---------------------
ID: 3
General Area: Generalized Streaming
Priority: Medium
Description: A common sorting paradigm is decorate-sort-undecorate or DSU.
We decorate a stream with some keys to sort by, sort the stream, and then
later remove the keys from the stream. The output of TPIE sort is stream
of the same type as the input, i.e., a stream with keys. Removing the keys
requires an additional scan over the stream. However, one could imagine
adding a option in TPIE sort to apply some function object to each item in
the final merge phase when the keys are no longer needed. Because an
external sort is typically equivalent to about three scans, bundling the
undecorate phase and saving a fourth scan is a considerable savings.
Status: Open as of 1 April 2005
----------------------
ID: 5
General Area: Streams with variable length items
Priority: Medium
Description: AMI_streams currently maintain a stream of fixed sized items,
but it would often be useful to sort/scan variable sized items. A sample
data layout could be that each item could have a fixed sized header
including at a minimum the size of the variable length item. While this
might be difficult to do in a general setting, in most applications I have
encountered, the variable length is bounded, so perhaps one could create a
stream that could support scanning and sorting of variable length items
where the length of the item is bounded.
Status: Open as of 1 April 2005
------------------------
ID: 8
General Area: Sorting
Priority:
Description: It should be possible that sort feeds its output to some
arbitrary object that has a write_item type of function. This object
should not be restricted to being a converter or an "undecorator": it may
do complicated stuff with the objects, keep an internal state, and maybe
it does not even output anything in a one-to-one relation with the
objects fed to it.
Status: Open as of 21 April 2005
------------------------------
ID: 16
General Area: Pipelining, BTE
Priority: ?
Description: Support some form of pipelining or FIFO buffers, so output of
one algorithm can be used directly by another. This is likely a large
change and may even require threading, so a frist step could be something
like overloading the first/last scan in merge-sort, a sort of pseudo
pipeling where we allow pre/post-processing of certain applications to be
integrated better
Status: Open as of Dec 2005
//////////////////////////////////////////////
///
/// Resolved Issues
///
/////////////////////////////////////////////
-----------------
ID: 1
General Area: Sorting
Priority: Very High
Description: TPIE sometimes returns memory errors on sorting.
If the memory manager is set to enforce_memory_limit() the program will
crash. I have an email from Jan 13 2005 describing the problem in detail.
We need to calculate the space used by all "new" calls during the sorting
phase. I have identified several objects whose space is not accounted for.
There are also some circular dependecies in calculating memory usage.
Until these memory calculation is resolved, I suggest setting the memory
manager to warn_memory_limit() or ignore_memory_limit() before sorting.
Files: apm_dh.h
Status: Problem fixed by A.Danner in Summer 2005
---------------------
ID: 4
General Area: Substreams
Priority: High
Description: Herman Haverkort writes: "when making a substream of an
AMI_STREAM, I get an AMI_stream_base, which is not an AMI_STREAM but
something different. So different, in fact, that AMI_sort does not take it.
So this concept of making substreams is not as transparent as I would like
it to be." This should be addressed somehow.
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
November 2005 (J. Vahrenhold)
-----------------------
ID: 7
General Area: Streams
Priority:
Description: From Herman Haverkort "Include a member function that returns
the current position of the read/write pointer. Without such a thing, you
cannot detect end of file except by attempting to read past it. This is
annoying. In the Pfafstetter code, I handled this issue by putting a
wrapper around the AMI stream that counts the number of times an item has
been read---thus probably keeping the same count three times (once in the
wrapper, once as the internal read/write pointer of the stream, and once
in the statistics of the stream)..." Comment from Andrew Danner: "There is
the tell() function, but as Herman notes, it is not defined in
ami_stream_base, but it IS defined in ami_stream_single"
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
November 2005 (J. Vahrenhold)
------------------------
ID: 9
General Area: Sorting
Priority:
Description: Include a progress indicator for sorting. Users don't like it
when the software seems to stall for ten minutes.
Status: Issue resolved by progress_indicator_base and subclasses
November 2005 (J. Vahrenhold)
-------------------------
ID: 13
General Area: Sorting, Code cleanup
Priority:
Description: There are three main sort implementations in the TPIE source, but
only one seems to be used. Find out which one it is, document it. Remove
or update older code and get rid of _dh tags if the _dh code is really the
default sort.
Status: Resolved by A. Danner Nov-Dec 2005
-------------------------
ID: 14
General Area: Streams, Code cleanup
Priority:
Description: Can ami_stream_base and ami_stream_single be merged? Are we
seriously ever going to have another stream implementation other than
ami_stream_single? If not, the code should be merged.
See Also: wish-list ID 7
Status: Issue resolved by merging AMI_stream_single and AMI_stream_base
November 2005 (J. Vahrenhold)
--------------------------
ID: 15
General Area: Memory management/STL
Priority: ?
Description: the TPIE memory manager cannot keep track of memory
allocated in STL correctly. It seems that it can capture allocation but
not deallocation. For example the following piece of code will report a
memory leak.
cout << "Memory available = " << MM_manager.memory_available() << endl;
vector<int> *s = new vector<int>;
s->push_back(1);
delete s;
cout << "Memory available = " << MM_manager.memory_available() << endl;
The real problem is related to this lengthy document:
http://www.codeguru.com/Cpp/Cpp/cpp_mfc/stl/article.php/c4079/
A very quick summary:
The STL uses its own memory allocation methods. They could be just
new/delete or something else. You can also build your own and pass it as
a template parameter like
vector<my_data_type, my_memory_allocator> foo
Looking at GCC 3.3 the vector library uses "new" to allocate space, but
does not necessarily use "delete" to free space. What a nightmare. I
don't have any quick solutions in mind. Perhaps we can overload the
default STL allocator much like we overloaded new/delete, but I doubt
this will improve performance of the STL.
If one wants to see some more evidence, take a look at c
++/bits/stl_vector.h and c++/bits/stl_construct.h. You'll see a "new"
but no "delete"
We need to decide what is the best way to handle this.
Status: Resolved by setting GLIBCPP_FORCE_NEW and extending
the memory manager to provide a pause/resume
mechanism for registering allocation sized
November 2005 (J. Vahrenhold)
------------------------------