ENH: add option to suppress scientific notation (for small values?) #12374

Open
rosnfeld opened this issue Feb 17, 2016 · 5 comments
Labels
Enhancement, Output-Formatting (__repr__ of pandas objects, to_string)

Comments

@rosnfeld
Contributor

I fairly frequently run into situations where I don't want small numbers displayed in scientific notation, for example:

In [3]: pd.set_option('display.precision', 2)

In [4]: pd.DataFrame(np.random.randn(5, 5)).corr()
Out[4]: 
      0     1         2         3     4
0  1.00 -0.57  2.15e-02 -3.48e-02 -0.64
1 -0.57  1.00  2.59e-01 -5.56e-01  0.51
2  0.02  0.26  1.00e+00  2.91e-03 -0.06
3 -0.03 -0.56  2.91e-03  1.00e+00  0.36
4 -0.64  0.51 -6.21e-02  3.63e-01  1.00

or

In [16]: pd.Series(np.random.poisson(size=1000)).value_counts(normalize=True)
Out[16]: 
0    3.80e-01
1    3.63e-01
2    1.75e-01
3    5.70e-02
4    1.80e-02
5    5.00e-03
7    1.00e-03
6    1.00e-03
dtype: float64

Scientific notation isn't helpful when you are trying to make quick comparisons across elements, and have a well-defined notion of a -1 to 1 or 0 to 1 range.

I propose adding some sort of display flag to suppress scientific notation on small numbers, and just report zeros in these cases instead. Alternatively we could also suppress it on large numbers, but I am not sure how helpful that is. I usually only find myself going up against it on small numbers, in exactly the use cases (correlations, proportions) above.

@rosnfeld
Contributor Author

(and I volunteer to work on this if others are okay with the idea)

@jreback
Contributor

jreback commented Feb 18, 2016

http://pandas.pydata.org/pandas-docs/stable/options.html#number-formatting

there are already 4 related options to do things like this:
display.precision, display.chop_threshold, display.float_format, and pd.set_eng_float_format(accuracy=3, use_eng_prefix=True).

So what I think we need is some consolidation and maybe some docs.
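As a rough illustration of one of the options listed above, here is a minimal sketch that uses display.float_format to force fixed-point output. The sample values and the two-decimal format string are assumptions chosen for the example, and the option is set through pd.option_context so the change does not leak outside the block:

```python
import pandas as pd

# Sample proportions, made up for illustration (similar in scale to the
# value_counts output in the original report).
s = pd.Series([0.38, 0.057, 0.001])

# Temporarily render every float with two fixed decimals, so values too
# small to show at that precision come out as zeros instead of switching
# the whole column to scientific notation.
with pd.option_context('display.float_format', '{:.2f}'.format):
    fixed = s.to_string()

print(fixed)
```

Note that a float_format callable applies to every float that is displayed, so large values would also lose their scientific notation under this setting.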

@jreback
Contributor

jreback commented Feb 18, 2016

some related issues:

#9448
#6839

@jreback
Contributor

jreback commented Feb 18, 2016

love for you to have a look to see how this can be done better

@jreback added the Output-Formatting, API Design, and Clean labels on Feb 18, 2016
@rosnfeld
Contributor Author

Hmm, embarrassing that I hadn't seen chop_threshold before: I've made changes to display.precision and edited its docs, yet never noticed it. That sounds like what I want, though I can still get it to behave poorly:

In [25]: pd.set_option('display.precision', 2)
In [26]: pd.set_option('chop_threshold', 0.01)  # maybe this should be 0.005, not sure of order of operations, but I get issues either way
...
In [30]: pd.DataFrame(np.random.randn(5, 5)).corr()
Out[30]: 
      0         1     2     3         4
0  1.00 -3.14e-01  0.07 -0.28  1.42e-01
1 -0.31  1.00e+00 -0.82 -0.35  0.00e+00
2  0.07 -8.19e-01  1.00  0.54 -4.71e-01
3 -0.28 -3.50e-01  0.54  1.00  1.21e-01
4  0.14  0.00e+00 -0.47  0.12  1.00e+00

Thanks for pointing me to it though. I'll play around with this for a while and see if there's some clean-up that can be done. I would love it if I could change display.precision while working on some data and have chop_threshold update to match, rather than having to keep the two in sync by hand.
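The "keep them in sync" idea could be sketched with a small wrapper like the one below. The name set_precision_and_chop is hypothetical (it is not a pandas function), and deriving the threshold as half a unit in the last displayed digit is an assumption based on the 0.005 guess above. This only keeps the two options consistent; it does not by itself guarantee that scientific notation disappears from the output shown earlier.

```python
import pandas as pd

def set_precision_and_chop(precision):
    """Hypothetical helper (not part of pandas): set display.precision and
    derive display.chop_threshold from it in one call."""
    pd.set_option('display.precision', precision)
    # Assumption: chop anything smaller than half a unit in the last
    # displayed digit, e.g. 0.005 when precision is 2.
    pd.set_option('display.chop_threshold', 0.5 * 10 ** -precision)

set_precision_and_chop(2)
threshold = pd.get_option('display.chop_threshold')
print(threshold)

# Restore the defaults so the example leaves no global state behind.
pd.reset_option('display.precision')
pd.reset_option('display.chop_threshold')
```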
