Transparent markers misbehaving in plotly.express.scatter #4664

Open
lgi1sgm opened this issue Jul 15, 2024 · 5 comments
Labels
bug something broken P3 not needed for current cycle

Comments


lgi1sgm commented Jul 15, 2024

Description

I create a scatter plot using the color and size arguments. One subset gets transparent markers and is visible in neither the plot nor the legend.

However, if I hover over the area where a marker should be, the tooltip appears (see the figure below):

[Screenshot: scatter plot in which the tooltip of an invisible marker appears on hover]

Expected Behavior

The markers should be visible in both the plot and the legend.

Reproduction

Running the code below produces the figure above.

# %%
# Imports

import sys
import plotly
import pandas as pd
import plotly.express as px

print(f'Python version: {sys.version}')  # Mine is: 3.11.9
print(f'Pandas version: {pd.__version__}')  # Mine is: 2.2.2
print(f'Plotly version: {plotly.__version__}')  # Mine is: 5.22.0

# %%
# Create input frame

df = pd.DataFrame(
  [
    [11739,21.329416,10.010795,2,1],
    [20500,21.860714,12.238669,2,2],
    [1504,21.927166,10.314574,2,1],
    [28194,21.257576,12.823945,2,3],
    [9008,21.886381,9.579169,2,1],
    [17073,21.57327,11.087076,2,1],
    [40734,21.069445,11.887547,3,0],
    [36405,22.397081,11.608735,3,0],
    [36919,21.95463,12.856195,3,0],
    [9867,20.893126,10.761697,2,1]
  ],
  columns=['id' ,'x', 'y', 'loop_number', 'repetition']
)

df.set_index('id', inplace=True)

df.x = df.x.astype('float32')
df.y = df.y.astype('float32')
df.loop_number = df.loop_number.astype('category')  # As category to use color labels, not a color bar.
df.repetition = df.repetition.astype('int32')

df.head()

# %%
# Create Scatter Plot

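fig = px.scatter(
  df,
  x='x',
  y='y',
  color='loop_number',
  size='repetition',
  # The keys of labels must be column names, not argument names like 'color'.
  labels={
    'loop_number': 'Type',
    'repetition': 'Size'
  },
  hover_name=df.index
)
fig.show()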
Rachmanichou commented

Hi,
The problem is with your data: the missing data points have repetition set to zero, so their marker size is zero and they are invisible.
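To make the mechanism concrete, here is a small pure-Python model of how px.scatter appears to turn size values into markers. It assumes, based on a reading of the plotly.express source rather than on documented API, that px sets `marker.sizemode='area'` and derives `marker.sizeref` from the column maximum. Under that assumption a marker's area is proportional to value / max, so its diameter scales with the square root of that ratio. The largest value gets the largest marker (its pixel size is governed by the `size_max` argument of `px.scatter`, default 20), and a value of 0 gets no visible marker. The helper `relative_diameters` is hypothetical and only illustrates the scaling.

```python
import math


def relative_diameters(values):
    """Diameter of each marker relative to the largest marker.

    Hypothetical model: marker *area* is proportional to value / max(values),
    so the diameter is proportional to the square root of that ratio.
    """
    largest = max(values)
    return [math.sqrt(v / largest) for v in values]


# 'repetition' column from the example: the rows with 0 get zero-size markers.
repetition = [1, 2, 1, 3, 1, 1, 0, 0, 0, 1]
print([round(d, 3) for d in relative_diameters(repetition)])

# Offsetting by 1'000'000 makes every ratio value / max almost exactly 1,
# so all markers come out at (nearly) the same, maximum size.
offset = [1_000_000, 1_000_001, 1_000_002, 1_000_003]
print([round(d, 4) for d in relative_diameters(offset)])
```

The same model would explain why large offset values produce markers of similar, moderate size rather than huge ones: only the ratio to the maximum matters.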


lgi1sgm commented Jul 16, 2024

Ok, thanks for that.

But is that the intended behavior? I would expect plotly to calculate some reasonable sizes.

What if I wanted to plot very large or very small values as marker sizes, such as city populations or bacteria diameters?


lgi1sgm commented Jul 16, 2024

Workaround for the example code above:

# %%
# Imports

import sys
import plotly
import pandas as pd
import plotly.express as px

print(f'Python version: {sys.version}')  # Mine is: 3.11.9
print(f'Pandas version: {pd.__version__}')  # Mine is: 2.2.2
print(f'Plotly version: {plotly.__version__}')  # Mine is: 5.22.0

# %%
# Create input frame

df = pd.DataFrame(
  [
    [11739,21.329416,10.010795,2,1],
    [20500,21.860714,12.238669,2,2],
    [1504,21.927166,10.314574,2,1],
    [28194,21.257576,12.823945,2,3],
    [9008,21.886381,9.579169,2,1],
    [17073,21.57327,11.087076,2,1],
    [40734,21.069445,11.887547,3,0],
    [36405,22.397081,11.608735,3,0],
    [36919,21.95463,12.856195,3,0],
    [9867,20.893126,10.761697,2,1]
  ],
  columns=['id' ,'x', 'y', 'loop_number', 'repetition']
)

df.set_index('id', inplace=True)

df.x = df.x.astype('float32')
df.y = df.y.astype('float32')
df.loop_number = df.loop_number.astype('category')  # As category to use color labels, not a color bar.
df.repetition = df.repetition.astype('int32')


# ============================================================================
# ----------------------------------------------------------------------------
# This is a workaround for the issue. It seems that the size is calculated
# directly from the column values, so a value of zero leads to a marker
# with zero diameter (or area).
#
df.repetition = df.repetition + 1
#
# ----------------------------------------------------------------------------
# ============================================================================


df.head()

# %%
# Create Scatter Plot

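fig = px.scatter(
  df,
  x='x',
  y='y',
  color='loop_number',
  size='repetition',
  # The keys of labels must be column names, not argument names like 'color'.
  labels={
    'loop_number': 'Type',
    'repetition': 'Size'
  },
  hover_name=df.index
)
fig.show()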

Rachmanichou commented

If you wanted to plot large values, or values with a large span, you would probably have to scale them beforehand, for example by subtracting the mean and dividing by the standard deviation: (x - mean)/std. This puts all your values on a common scale centered on 0 (though not strictly within -1 to 1). There are other methods, such as min-max scaling with the minimum and maximum values, which maps everything onto a 0 to 1 scale.
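One caveat when feeding scaled values to size: standardization produces negative values, and plain min-max scaling maps the smallest value to 0, and px would draw either as no visible marker. A minimal sketch of min-max scaling onto a strictly positive range instead; the function name `rescale_sizes` and the bounds 1 and 10 are illustrative assumptions, not plotly defaults.

```python
def rescale_sizes(values, lo=1.0, hi=10.0):
    """Min-max rescale values onto [lo, hi]; with lo > 0, no marker vanishes."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # All values are equal: give every point the same (largest) size.
        return [hi for _ in values]
    span = vmax - vmin
    return [lo + (v - vmin) / span * (hi - lo) for v in values]


# 'repetition' column from the example: the zeros now map to lo instead of 0.
repetition = [1, 2, 1, 3, 1, 1, 0, 0, 0, 1]
print(rescale_sizes(repetition))

# Values with a large offset are spread across the whole range as well.
print(rescale_sizes([1_000_000, 1_000_001, 1_000_002, 1_000_003]))
```

The rescaled list could be stored in a new, hypothetical column such as `df['marker_size']` and passed as `size='marker_size'`, while the original repetition column stays available for hover via `hover_data`.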


lgi1sgm commented Jul 18, 2024

Yes, I understand.

The only open question for me is whether this behavior is intended. I'm not convinced, because the documentation states:

size (str or int or Series or array-like) – Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign mark sizes.

The last sentence, "Values from this column (...) are used to assign mark sizes", suggests to me that huge values should produce huge markers, but that is not the case.

This is the figure I get when I use values from 1'000'000 to 1'000'003:

[Screenshot: plotly scatter plot with repetition values from 1'000'000 to 1'000'003]

Now the markers are all of similar size, which I expected, but they are not huge, which I also expected based on the documentation.

For comparison, this is the figure I get with seaborn instead. Seaborn calculates marker sizes internally, and that is the behavior I expected:

[Screenshot: the same data plotted with seaborn]

Long story short, the issue is resolved for me. The only remaining question is whether the maintainers want to update the documentation to better reflect what the size argument does.

Example code:

# %%
# Imports

import sys
import seaborn as sns
import plotly
import pandas as pd
import plotly.express as px

print(f'Python version: {sys.version}')  # Mine is: 3.11.9
print(f'Pandas version: {pd.__version__}')  # Mine is: 2.2.2
print(f'Plotly version: {plotly.__version__}')  # Mine is: 5.22.0

# %%
# Create input frame

df = pd.DataFrame(
  [
    [11739,21.329416,10.010795,2,1],
    [20500,21.860714,12.238669,2,2],
    [1504,21.927166,10.314574,2,1],
    [28194,21.257576,12.823945,2,3],
    [9008,21.886381,9.579169,2,1],
    [17073,21.57327,11.087076,2,1],
    [40734,21.069445,11.887547,3,0],
    [36405,22.397081,11.608735,3,0],
    [36919,21.95463,12.856195,3,0],
    [9867,20.893126,10.761697,2,1]
  ],
  columns=['id' ,'x', 'y', 'loop_number', 'repetition']
)

df.set_index('id', inplace=True)

df.x = df.x.astype('float32')
df.y = df.y.astype('float32')
df.loop_number = df.loop_number.astype('category')  # As category to use color labels, not a color bar.
df.repetition = df.repetition.astype('int32')


# ============================================================================
# ----------------------------------------------------------------------------
# Offset the sizes by 1'000'000 to check how large size values are rendered.
# Since a size of zero seems to lead to a marker with zero diameter (or area),
# the offset also keeps every marker visible.
#
df.repetition = df.repetition + 1000000
#
# ----------------------------------------------------------------------------
# ============================================================================


df.head()

# %%
# Create Scatter Plot

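fig = px.scatter(
  df,
  x='x',
  y='y',
  color='loop_number',
  size='repetition',
  # The keys of labels must be column names, not argument names like 'color'.
  labels={
    'loop_number': 'Type',
    'repetition': 'Size'
  },
  hover_name=df.index
)
fig.show()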

# %%
# Compare the result to Seaborn

sns.scatterplot(
  df,
  x='x',
  y='y',
  hue='loop_number',
  size='repetition'
)

gvwilson assigned and then unassigned gvwilson on Jul 26, 2024
gvwilson added the "P3" (not needed for current cycle) label on Aug 12, 2024
gvwilson changed the title from "Transparent markers in plotly.express.scatter" to "Transparent markers misbehaving in plotly.express.scatter" on Aug 13, 2024
gvwilson added the "bug" (something broken) label on Aug 13, 2024
Projects
None yet
Development

No branches or pull requests

3 participants