Memory Usage #23
Not significantly better with the new changes :( (40%+ RAM locally; estimate ~2 GB). It's not ZIP-specific.

When making large numbers of instances of objects which only have a couple of per-instance variables, you can save a ton of memory by defining `__slots__` on the class.
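For reference, a minimal sketch of the `__slots__` pattern (the class names here are illustrative, not xypath's actual classes):

```python
import sys

class CellWithDict(object):
    def __init__(self, value, x, y):
        self.value = value
        self.x = x
        self.y = y

class CellWithSlots(object):
    # __slots__ suppresses the per-instance __dict__, so each instance
    # stores just these three references instead of a whole dictionary.
    __slots__ = ('value', 'x', 'y')

    def __init__(self, value, x, y):
        self.value = value
        self.x = x
        self.y = y

# Rough per-instance comparison (exact numbers vary by Python version):
a = CellWithDict('spam', 1, 2)
b = CellWithSlots('spam', 1, 2)
print(sys.getsizeof(a) + sys.getsizeof(a.__dict__))  # instance plus its dict
print(sys.getsizeof(b))                              # slotted instance only
```

With millions of cell objects, the saving from dropping the per-instance dict is substantial.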
Now 33% RAM. Better, but not a vast improvement.
More improvements, driven by a change in this file. Mostly ditching the double-index. |
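The xypath change itself isn't reproduced here; as a generic sketch of what dropping a double index can mean for memory (the structures below are illustrative, not xypath's real internals):

```python
# Illustrative only -- not xypath's actual data structures.
# A "double index" keeps every cell reachable via two dictionaries,
# so each cell pays bookkeeping costs twice:
by_row = {}   # row number    -> list of cells in that row
by_col = {}   # column number -> list of cells in that column

# Keeping a single flat store and deriving one view on demand trades a
# little lookup time for roughly half of the index overhead:
cells = []    # list of (row, col, value) tuples

def cells_in_row(row):
    """Derive the row view on demand instead of maintaining a second index."""
    return [c for c in cells if c[0] == row]
```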
This remains a problem. Checking with the same code above (just tidied for ease of copy-pasting):

```python
import StringIO

import requests
import xypath
import messytables

url = 'http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/FoodSupply_Crops_E_Africa_1.zip'
z = requests.get(url).content
fh = StringIO.StringIO(z)
mt, = list(messytables.zip.ZIPTableSet(fh).tables)
xy = xypath.Table.from_messy(mt)
```

and running it still shows the same excessive memory usage.
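For anyone wanting to put a figure on it themselves, one simple check of the process's peak resident memory (Unix-only; a sketch, not necessarily the measurement used above) is:

```python
import resource

# ru_maxrss is the peak resident set size: kilobytes on Linux,
# bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %.1f MB" % (peak / 1024.0))
```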
This is a combined messytables/xypath issue
We need to be cautious about the amount of memory we're using:
http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/FoodSupply_Crops_E_Africa_1.zip
a 1.5MB zip (15MB csv), which, loaded with the code shown above, uses around 3 gigabytes of RAM.
Given that, in the "upload a spreadsheet" tool, people could upload files this big trivially, we'll need to think about memory consumption.
Top tip: dictionaries are horrific.
Dave.
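To put a rough number on the dictionary point (illustrative figures; exact sizes vary by Python version):

```python
import sys

# Every ordinary instance carries a __dict__, and even a small dict
# costs hundreds of bytes, so millions of cells add up quickly.
print(sys.getsizeof({}))                            # empty dict
print(sys.getsizeof({'value': 1, 'x': 2, 'y': 3}))  # small populated dict

cells = [{'value': i, 'x': i % 100, 'y': i // 100} for i in range(1000)]
print("approx bytes for 1000 small dicts: %d" % sum(sys.getsizeof(c) for c in cells))
```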