
Memory Usage #23

Open
scraperdragon opened this issue Jul 22, 2013 · 6 comments

@scraperdragon
Contributor

This is a combined messytables/xypath issue.

We need to be cautious about the amount of memory we're using:

http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/FoodSupply_Crops_E_Africa_1.zip

a 1.5 MB zip (15 MB CSV)

with

fh = dl.grab(url)
mt, = list(messytables.zip.ZIPTableSet(fh).tables)
xy = xypath.Table.from_messy(mt)

uses around 3 GB of RAM.

Given that, in the "upload a spreadsheet" tool, people could upload files this big trivially, we'll need to think about memory consumption.

Top tip: dictionaries are horrific.

Dave.
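The dictionary overhead alluded to above can be measured directly. A minimal standalone sketch (not using xypath itself; the Cell class is hypothetical, just to illustrate per-instance dict cost):

```python
import sys

# An ordinary class: every instance carries its own attribute __dict__.
# For millions of table cells, that overhead dominates the actual payload.
class Cell:
    def __init__(self, x, y, value):
        self.x = x
        self.y = y
        self.value = value

c = Cell(0, 0, "spam")

dict_size = sys.getsizeof(c.__dict__)      # the per-instance dict alone
tuple_size = sys.getsizeof((0, 0, "spam")) # the same data as a bare tuple

print(dict_size, tuple_size)
```

On CPython the instance dict alone is larger than a tuple holding the same three values, before even counting the object header.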

@scraperdragon
Contributor Author

Not significantly better with the new changes :( (40%+ RAM locally; estimate ~2 GB)

import StringIO
import requests
import xypath
import messytables

url = 'http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/FoodSupply_Crops_E_Africa_1.zip'
z = requests.get(url).content
fh = StringIO.StringIO(z)
mt, = list(messytables.zip.ZIPTableSet(fh).tables)
xy = xypath.Table.from_messy(mt)

It's not ZIP specific.

@pwaller
Contributor

pwaller commented Sep 6, 2013

When making large numbers of instances of objects which only have a couple of per-instance variables, you can save a ton of memory by defining __slots__.
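A quick sketch of the suggestion (the class names here are illustrative, not from the xypath code):

```python
import sys

class PlainCell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedCell:
    # __slots__ suppresses the per-instance __dict__; the two attributes
    # are stored in fixed slots on the object itself.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

plain = PlainCell(1, 2)
slotted = SlottedCell(1, 2)

has_dict = hasattr(slotted, '__dict__')  # False: no attribute dict at all
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_total = sys.getsizeof(slotted)

print(has_dict, plain_total, slotted_total)
```

The saving is per instance, so it compounds across the millions of cell objects a large table produces. The trade-off is that slotted instances can't grow arbitrary new attributes.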

@scraperdragon
Contributor Author

__slots__ has been implemented; performance not yet tested.

@scraperdragon
Contributor Author

Now 33% RAM. Better, but not a vast improvement.

@scraperdragon
Contributor Author

More improvements, driven by a change in this file, mostly from ditching the double-index.

@StevenMaude
Contributor

StevenMaude commented Sep 26, 2016

This remains a problem.

Checking with the same code above (just tidied for ease of copy-pasting):

import StringIO
import requests
import xypath
import messytables

url = 'http://faostat.fao.org/Portals/_Faostat/Downloads/zip_files/FoodSupply_Crops_E_Africa_1.zip'
z = requests.get(url).content
fh = StringIO.StringIO(z)
mt, = list(messytables.zip.ZIPTableSet(fh).tables)
xy = xypath.Table.from_messy(mt)

and running it with /usr/bin/time -v python faostat.py

results in:

Maximum resident set size (kbytes): 3375120
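For checking peak usage from inside a script rather than via /usr/bin/time, the stdlib resource module exposes the same counter (Unix only; a sketch):

```python
import resource

# Peak resident set size of the current process so far.
# Note: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print("Maximum resident set size: %d" % peak)
```

Calling this at the end of the reproduction script gives the same figure /usr/bin/time -v reports, which makes it easy to assert on memory in a regression test.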
