You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi Josh,
I am glad to see you are back to maintain the package. I just tested your new addReadStreamLazy API, which should solve the issue (#74). Please correct me if I am wrong, but it looks like I found two cases with this API.
One case is when addReadStreamLazy is called multiple times, but streams are not created lazily, on demand, when they should actually start to be pumped. They are created immediately by the API call. The second issue is that the 'getReadStreamFunction' callback is called twice for some entries.
For the case of not lazy stream creation:
My idea to solve issue #74 was to create the next readable stream only when the currently streaming entry is finished, and then the pumpEntries function pops the next entity to pipe. The stream for specific entry should be created at this point, just before the entry is going to be pumped. Your current implementation creates the stream immediately by calling addReadStreamLazy function, which calls pumpEntries(self) at the end. So if you call N subsequent addReadStreamLazy, all N streams are created at this point. It is not much better from the point of view of closing idle streams than creating N streams and to call the old version of addReadStream N times. Some streams might be closed by idle socket timeout before YAZL starts to pump them.
For the case when 'getReadStreamFunction' callback is called twice:
Let’s assume you call addReadStreamLazy(...) and then end(). The first callback is called by addReadStreamLazy -> pumpEntries, and the second one by end() -> pumpEntries. This second call pops an entry which is currently being pumped and isn’t 'FILE_DATA_DONE' yet. As a result, you pump some data twice and also get a broken zip file. I guess you tested the new API with pretty small streams, so you didn’t face the race condition when an entry is still being pumped, and the second pumpEntries call takes this entry to pump again. With bigger streams, the issue is easy to reproduce.
From my point of view, pumpEntries should not be called at the end of addReadStreamLazy. Removing this pumpEntries call solves both issues I described. This is also how I implemented it in my PR (#75) two years ago, and this manually patched version has been running on our PROD servers for two years already. It’s just an option; maybe you have another way to solve it.
Anyway, thank you for your work!
The text was updated successfully, but these errors were encountered:
Hi Josh,
I am glad to see you are back to maintain the package. I just tested your new addReadStreamLazy API, which should solve the issue (#74). Please correct me if I am wrong, but it looks like I found two cases with this API.
One case is when addReadStreamLazy is called multiple times, but streams are not created lazily, on demand, when they should actually start to be pumped. They are created immediately by the API call. The second issue is that the 'getReadStreamFunction' callback is called twice for some entries.
For the case of not lazy stream creation:
My idea to solve issue #74 was to create the next readable stream only when the currently streaming entry is finished, and then the pumpEntries function pops the next entity to pipe. The stream for specific entry should be created at this point, just before the entry is going to be pumped. Your current implementation creates the stream immediately by calling addReadStreamLazy function, which calls pumpEntries(self) at the end. So if you call N subsequent addReadStreamLazy, all N streams are created at this point. It is not much better from the point of view of closing idle streams than creating N streams and to call the old version of addReadStream N times. Some streams might be closed by idle socket timeout before YAZL starts to pump them.
For the case when 'getReadStreamFunction' callback is called twice:
Let’s assume you call addReadStreamLazy(...) and then end(). The first callback is called by addReadStreamLazy -> pumpEntries, and the second one by end() -> pumpEntries. This second call pops an entry which is currently being pumped and isn’t 'FILE_DATA_DONE' yet. As a result, you pump some data twice and also get a broken zip file. I guess you tested the new API with pretty small streams, so you didn’t face the race condition when an entry is still being pumped, and the second pumpEntries call takes this entry to pump again. With bigger streams, the issue is easy to reproduce.
From my point of view, pumpEntries should not be called at the end of addReadStreamLazy. Removing this pumpEntries call solves both issues I described. This is also how I implemented it in my PR (#75) two years ago, and this manually patched version has been running on our PROD servers for two years already. It’s just an option; maybe you have another way to solve it.
Anyway, thank you for your work!
The text was updated successfully, but these errors were encountered: