WIP: Total size of Julia objects #7603

Closed
avitale opened this issue Jul 14, 2014 · 17 comments

@avitale

avitale commented Jul 14, 2014

Hi, I couldn't find any function to calculate the total size of an object (i.e. including the size of its elements), so I wrote one:

https://gist.github.com/avitale/0070b629b89350b39c21#file-totalsize-jl

Is there any interest in including it in Base?

@JeffBezanson
Member

see also #3393

@JeffBezanson
Member

The obvious pitfalls here are shared structure and cycles. If A is a large matrix, then x = {A} and y = {A} will both be "charged" for the full storage of A, even though there is only one copy. This leaves me wondering what the actual use case is for a function like this. It has many of the same problems as the notion of "deep copying".
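The double charging described above is easy to reproduce. The following is a minimal sketch, assuming a current Julia (1.x), where the old `{A}` literal is written `Any[A]`; `Base.summarysize`, which is mentioned later in this thread, stands in here for the gist's totalsizeof.

```julia
A = rand(1000, 1000)            # about 8 MB of Float64 data

x = Any[A]                      # modern spelling of {A}
y = Any[A]                      # a second container holding the same A

sx    = Base.summarysize(x)     # x alone is charged for all of A
sy    = Base.summarysize(y)     # and so is y
sboth = Base.summarysize((x, y))  # A is counted once when both are reachable

println((sx, sy, sboth))
println(sboth < sx + sy)        # the per-container sizes do not add up
```

Summing per-variable sizes therefore overestimates the real footprint whenever structure is shared.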

@stevengj
Member

Just as for deepcopy, these problems are obviously solvable by using an auxiliary function with an object cache. But I agree that it would be better to have a clear practical use-case (as in I needed this in situation X because Y, not simple curiosity) before including it in Base.

@StefanKarpinski
Member

The high-level use case is answering the question "where is all my memory going?" This is a completely common practical question when working with large amounts of data interactively. You look at it and see that some matrix you don't really need any more is taking a large chunk of memory that you need, so you clear it. In our case, you assign an empty matrix to it and call gc. If multiple variables reference the same chunk of memory, then you'd want to know that so that you can decide if you want to "clear" all of them to free up that memory or not.
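The clear-and-collect workflow described above might look like the sketch below. It assumes a current Julia, where `gc()` is spelled `GC.gc()`, and uses `Base.summarysize` (mentioned later in this thread) for the size query. The memory can only be reclaimed if no other variable still references the original matrix.

```julia
A = rand(2000, 2000)              # about 32 MB of Float64 data
println(Base.summarysize(A))      # bytes reachable from A

A = Matrix{Float64}(undef, 0, 0)  # rebind the name to an empty matrix
GC.gc()                           # request a full garbage collection
println(Base.summarysize(A))      # now only the small array header remains
```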

@avitale
Author

avitale commented Jul 14, 2014

Thanks for the comments. I have updated the gist to add a check on pointers. Now, if A is a large matrix, totalsizeof(A) still gives its total memory allocation, but in totalsizeof({1 => A, 2 => A}) the storage of A is counted only once. The pointer check also breaks circular references. You can also call totalsizeof(Main) to get the memory allocation of the workspace without double counting (note that, when calculating the allocation of modules below Main, it can throw some warnings and return just the sizeof of the module).
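The idea of counting each object once can be sketched as follows. This is not the gist's code: `totalsize_sketch` is a hypothetical name, it uses an `IdDict` (current Julia) instead of a set of pointers, it only recurses into `Array` and struct/tuple fields, and it ignores allocator headers and boxing overhead, so the numbers are approximate.

```julia
# Hypothetical sketch: approximate bytes reachable from x, each object counted once.
function totalsize_sketch(x, seen = IdDict{Any,Nothing}())
    # Plain-bits values (Int, Float64, ...) hold no references.
    isbits(x) && return sizeof(x)

    # Count every other object once; this also terminates on cycles.
    haskey(seen, x) && return 0
    seen[x] = nothing

    if x isa Array
        total = sizeof(x)                 # element storage (or the pointer slots)
        if !isbitstype(eltype(x))
            for i in eachindex(x)
                # Skip unassigned slots, recurse into referenced elements.
                isassigned(x, i) && (total += totalsize_sketch(x[i], seen))
            end
        end
        return total
    end

    x isa String && return sizeof(x)

    # Other objects: their own size plus whatever their fields reference.
    total = sizeof(x)
    for i in 1:nfields(x)
        # Plain-bits fields are stored inline and already included in sizeof(x).
        isbitstype(fieldtype(typeof(x), i)) && continue
        isdefined(x, i) && (total += totalsize_sketch(getfield(x, i), seen))
    end
    return total
end

A = zeros(100)
shared = Any[A, A]                        # two references to the same array
println(totalsize_sketch(shared) == sizeof(shared) + sizeof(A))

cyc = Any[]
push!(cyc, cyc)                           # an array that contains itself
println(totalsize_sketch(cyc) == sizeof(cyc))
```

Passing the same `seen` table through every recursive call is what makes shared arrays and circular references safe.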

@stevengj
Member

For that use-case, it would be good to report this in the whos() output.
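For readers on a current Julia: whos() no longer exists there, and `varinfo()` from the InteractiveUtils standard library plays that role, including a size column. A minimal sketch, where `bigmatrix` is just an illustrative variable name:

```julia
using InteractiveUtils          # loaded automatically in the REPL

bigmatrix = rand(1000, 1000)    # something large enough to stand out

# Table of global names in Main with approximate sizes and summaries.
table = varinfo()
display(table)
```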

@stevengj
Member

@avitale, rather than using a Set of pointers, it would probably be better (cleaner and possibly faster) to use an ObjectIdDict, similar to deepcopy.

@avitale
Author

avitale commented Jul 14, 2014

@stevengj, I don't fully understand how ObjectIdDict works. With stackdict::ObjectIdDict and an object x, would you store stackdict[x]=true and later check with haskey?

@stevengj
Member

I would just store stackdict[x]=x, but yes. Although now that I look at the ObjectIdDict implementation, it seems to be based on the jl_object_id function, which is certainly much slower than a Set of pointers. So, I withdraw my suggestion.
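To make the two options concrete, here is a small sketch of both, assuming a current Julia, where ObjectIdDict is called `IdDict`. The point in either case is that membership is decided by object identity, not by equal contents.

```julia
a = [1, 2, 3]
b = [1, 2, 3]                    # equal contents, but a different object

# Option 1: identity-keyed dictionary, as in stackdict[x] = x.
stackdict = IdDict{Any,Any}()
stackdict[a] = a
println(haskey(stackdict, a))    # true
println(haskey(stackdict, b))    # false: b is a distinct object

# An ordinary Dict compares keys with isequal, so it would confuse a and b.
byvalue = Dict{Any,Any}(a => a)
println(haskey(byvalue, b))      # true

# Option 2: a Set of raw pointers (only works for mutable, heap-allocated objects).
ptrs = Set{Ptr{Cvoid}}()
push!(ptrs, pointer_from_objref(a))
println(pointer_from_objref(a) in ptrs)   # true
println(pointer_from_objref(b) in ptrs)   # false
```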

@avitale
Author

avitale commented Jul 15, 2014

I have polished the code in the gist and added unit tests in this other gist https://gist.github.com/avitale/e8bd84f6293f595a3845 . If you don't have any more comments, I consider it completed.

@quinnj
Member

quinnj commented Aug 14, 2014

@avitale, it would be great if you could turn your two gists into a pull request against master. That's the usual way for this to get more traction and work out any other concerns. Just let us know if you need pointers on how to do a pull request. Thanks.

@avitale
Author

avitale commented Aug 25, 2014

@quinnj, I can try to do a pull request. Could you tell me where I should put the files with the source and test code, where I should write the docs, and whether there is anything else to do? Thanks.

@quinnj
Member

quinnj commented Aug 25, 2014

Sure thing. You could probably put this in base/util.jl, as there are a few other "informational" methods there (you don't need a TotalSizeOf module though, just put the methods in). The tests can go in their own file (totalsizeof.jl, then include "totalsizeof" in runtests.jl), or perhaps in the core.jl test file?

Docs can go in julia/doc/stdlib/base.rst. I would just put the definition right after sizeof.

To make a pull request, you typically fork the julia repo, clone your fork locally, make your changes, add tests, and run make to rebuild julia with your change. You can then run julia test/runtests.jl all to make sure all tests pass. After committing your changes, push to your GitHub fork, and GitHub will give you the option to make a PR against Base. I'm pretty sure there are GitHub tutorials on this process that you can google, but feel free to ask any questions specific to julia.

@avitale
Author

avitale commented Aug 25, 2014

Thanks @quinnj. Currently there are 2 "private" helper functions in the module that are not exported. When I put the code in base/util.jl, should I prefix their names with _, or are there other guidelines to avoid polluting the global workspace?

@quinnj
Member

quinnj commented Aug 25, 2014

It really depends. Sometimes the _ prefix is used, otherwise, they're probably ok to stay as-is, just unexported. Not too big a deal.

@pagnani

pagnani commented Jan 4, 2015

I would find a totalsizeof(T) function quite useful, and also an improved whos() showing the total size of the workspace. I came across this thread and tried the gist mentioned above. There must be some error in the implementation, since repeated calls on a rank-four tensor of the same dimensions do not return the same size:

julia> totalsizeof(rand(21,21,53,53))
18953024

julia> totalsizeof(rand(21,21,53,53))
18884112

Smaller tensors show the same behaviour, although at this size the inconsistency is systematic, at least on my laptop:

julia> versioninfo()
Julia Version 0.3.5-pre+25
Commit 4f74d19 (2015-01-03 10:35 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
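As a sanity check on the numbers above, the element storage of a 21×21×53×53 Float64 array can be computed by hand: 21 · 21 · 53 · 53 · 8 = 9,910,152 bytes, roughly half of either reported value. The sketch below repeats that arithmetic and, assuming a current Julia with `Base.summarysize` (mentioned later in this thread), checks that two arrays of the same shape report the same size.

```julia
dims = (21, 21, 53, 53)

# Expected payload: number of elements times 8 bytes per Float64.
expected = prod(dims) * sizeof(Float64)
println(expected)                      # 9910152

T1 = rand(dims...)
T2 = rand(dims...)
println(sizeof(T1) == expected)        # element storage matches the arithmetic

s1 = Base.summarysize(T1)
s2 = Base.summarysize(T2)
println(s1 == s2)                      # same shape and type, same reported size
println(s1 - expected)                 # small fixed overhead for the array header
```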

@LilithHafner
Member

For folks who are still looking for this feature, we have Base.summarysize:

help?> Base.summarysize
  Base.summarysize(obj; exclude=Union{...}, chargeall=Union{...}) -> Int

  Compute the amount of memory, in bytes, used by all unique objects reachable from the argument.

  Keyword Arguments
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

    •  exclude: specifies the types of objects to exclude from the traversal.

    •  chargeall: specifies the types of objects to always charge the size of all of their fields, even if those fields would normally be excluded.
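A short sketch of the exclude keyword in use, under the assumption that passing a type there replaces the default exclusion list for that call. The container and its contents are made up for illustration.

```julia
A = rand(1000, 1000)                  # about 8 MB of Float64 data
x = Any[A, "label"]                   # a container referencing A and a string

full = Base.summarysize(x)            # everything reachable from x

# Skip any Matrix{Float64} encountered during the traversal.
without = Base.summarysize(x; exclude = Matrix{Float64})

println(full - without >= sizeof(A))  # the difference is at least A's payload
```

This can help answer "how big is this structure apart from the large arrays it points to?".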
