Python3.12: Two tests are failing with AssertionError #369

Closed
penguinpee opened this issue Jul 26, 2023 · 5 comments · Fixed by #370
Labels: bug 💥 Something isn't working

@penguinpee

Python 3.12 was unleashed recently in Fedora. Building pingouin against Python 3.12 results in two tests failing:

=================================== FAILURES ===================================
______________________ TestCorrelation.test_partial_corr _______________________
self = <pingouin.tests.test_correlation.TestCorrelation testMethod=test_partial_corr>
    def test_partial_corr(self):
        """Test function partial_corr.
    
        Compare with the R package ppcor (which is also used by JASP).
        """
        df = read_dataset("partial_corr")
        #######################################################################
        # PARTIAL CORRELATION
        #######################################################################
        # With one covariate
>       pc = partial_corr(data=df, x="x", y="y", covar="cv1")
df         =            x         y       cv1       cv2       cv3
0   5.230740  7.141488  1.825430 -1.085631 -0.255619
1   6.013931...681  1.754886
28  3.697089  6.129922  3.138867 -0.140069  1.495644
29  5.466289  7.747689  1.053789 -0.861755  1.069393
self       = <pingouin.tests.test_correlation.TestCorrelation testMethod=test_partial_corr>
pingouin/tests/test_correlation.py:137: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
data =            x         y       cv1       cv2       cv3
0   5.230740  7.141488  1.825430 -1.085631 -0.255619
1   6.013931...681  1.754886
28  3.697089  6.129922  3.138867 -0.140069  1.495644
29  5.466289  7.747689  1.053789 -0.861755  1.069393
x = 'x', y = 'y', covar = 'cv1', x_covar = None, y_covar = None
alternative = 'two-sided', method = 'pearson'
    @pf.register_dataframe_method
    def partial_corr(
        data=None,
        x=None,
        y=None,
        covar=None,
        x_covar=None,
        y_covar=None,
        alternative="two-sided",
        method="pearson",
    ):
        """Partial and semi-partial correlation.
    
        Parameters
        ----------
        data : :py:class:`pandas.DataFrame`
            Pandas Dataframe. Note that this function can also directly be used
            as a :py:class:`pandas.DataFrame` method, in which case this argument
            is no longer needed.
        x, y : string
            x and y. Must be names of columns in ``data``.
        covar : string or list
            Covariate(s). Must be a names of columns in ``data``. Use a list if
            there are two or more covariates.
        x_covar : string or list
            Covariate(s) for the ``x`` variable. This is used to compute
            semi-partial correlation (i.e. the effect of ``x_covar`` is removed
            from ``x`` but not from ``y``). Only one of ``covar``,  ``x_covar`` and
            ``y_covar`` can be specified.
        y_covar : string or list
            Covariate(s) for the ``y`` variable. This is used to compute
            semi-partial correlation (i.e. the effect of ``y_covar`` is removed
            from ``y`` but not from ``x``). Only one of ``covar``,  ``x_covar`` and
            ``y_covar`` can be specified.
        alternative : string
            Defines the alternative hypothesis, or tail of the partial correlation. Must be one of
            "two-sided" (default), "greater" or "less". Both "greater" and "less" return a one-sided
            p-value. "greater" tests against the alternative hypothesis that the partial correlation is
            positive (greater than zero), "less" tests against the hypothesis that the partial
            correlation is negative.
        method : string
            Correlation type:
    
            * ``'pearson'``: Pearson :math:`r` product-moment correlation
            * ``'spearman'``: Spearman :math:`\\rho` rank-order correlation
    
        Returns
        -------
        stats : :py:class:`pandas.DataFrame`
    
            * ``'n'``: Sample size (after removal of missing values)
            * ``'r'``: Partial correlation coefficient
            * ``'CI95'``: 95% parametric confidence intervals around :math:`r`
            * ``'p-val'``: p-value
    
        See also
        --------
        corr, pcorr, pairwise_corr, rm_corr
    
        Notes
        -----
        Partial correlation [1]_ measures the degree of association between ``x``
        and ``y``, after removing the effect of one or more controlling variables
        (``covar``, or :math:`Z`). Practically, this is achieved by calculating the
        correlation coefficient between the residuals of two linear regressions:
    
        .. math:: x \\sim Z, y \\sim Z
    
        Like the correlation coefficient, the partial correlation
        coefficient takes on a value in the range from –1 to 1, where 1 indicates a
        perfect positive association.
    
        The semipartial correlation is similar to the partial correlation,
        with the exception that the set of controlling variables is only
        removed for either ``x`` or ``y``, but not both.
    
        Pingouin uses the method described in [2]_ to calculate the (semi)partial
        correlation coefficients and associated p-values. This method is based on
        the inverse covariance matrix and is significantly faster than the
        traditional regression-based method. Results have been tested against the
        `ppcor <https://cran.r-project.org/web/packages/ppcor/index.html>`_
        R package.
    
        .. important:: Rows with missing values are automatically removed from
            data.
    
        References
        ----------
        .. [1] https://en.wikipedia.org/wiki/Partial_correlation
    
        .. [2] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681537/
    
        Examples
        --------
        1. Partial correlation with one covariate
    
        >>> import pingouin as pg
        >>> df = pg.read_dataset('partial_corr')
        >>> pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
                  n      r         CI95%  p-val
        pearson  30  0.568  [0.25, 0.77]  0.001
    
        2. Spearman partial correlation with several covariates
    
        >>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
        >>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
        ...                 method='spearman').round(3)
                   n      r         CI95%  p-val
        spearman  30  0.521  [0.18, 0.75]  0.005
    
        3. Same but one-sided test
    
        >>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
        ...                 alternative="greater", method='spearman').round(3)
                   n      r        CI95%  p-val
        spearman  30  0.521  [0.24, 1.0]  0.003
    
        >>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
        ...                 alternative="less", method='spearman').round(3)
                   n      r         CI95%  p-val
        spearman  30  0.521  [-1.0, 0.72]  0.997
    
        4. As a pandas method
    
        >>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman').round(3)
                   n      r         CI95%  p-val
        spearman  30  0.578  [0.27, 0.78]  0.001
    
        5. Partial correlation matrix (returns only the correlation coefficients)
    
        >>> df.pcorr().round(3)
                 x      y    cv1    cv2    cv3
        x    1.000  0.493 -0.095  0.130 -0.385
        y    0.493  1.000 -0.007  0.104 -0.002
        cv1 -0.095 -0.007  1.000 -0.241 -0.470
        cv2  0.130  0.104 -0.241  1.000 -0.118
        cv3 -0.385 -0.002 -0.470 -0.118  1.000
    
        6. Semi-partial correlation on x
    
        >>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3']).round(3)
                  n      r        CI95%  p-val
        pearson  30  0.463  [0.1, 0.72]  0.015
        """
        from pingouin.utils import _flatten_list
    
        # Safety check
        assert alternative in [
            "two-sided",
            "greater",
            "less",
        ], "Alternative must be one of 'two-sided' (default), 'greater' or 'less'."
        assert method in [
            "pearson",
            "spearman",
        ], 'only "pearson" and "spearman" are supported for partial correlation.'
        assert isinstance(data, pd.DataFrame), "data must be a pandas DataFrame."
        assert data.shape[0] > 2, "Data must have at least 3 samples."
        if covar is not None and (x_covar is not None or y_covar is not None):
            raise ValueError("Cannot specify both covar and {x,y}_covar.")
        if x_covar is not None and y_covar is not None:
            raise ValueError("Cannot specify both x_covar and y_covar.")
        assert x != covar, "x and covar must be independent"
        assert y != covar, "y and covar must be independent"
        assert x != y, "x and y must be independent"
        if isinstance(covar, list):
            assert x not in covar, "x and covar must be independent"
            assert y not in covar, "y and covar must be independent"
        # Check that columns exist
        col = _flatten_list([x, y, covar, x_covar, y_covar])
>       assert all([c in data for c in col]), "columns are not in dataframe."
E       AssertionError: columns are not in dataframe.
_flatten_list = <function _flatten_list at 0x7f69387c63e0>
alternative = 'two-sided'
col        = ['x', 'y', 'cv1', None, None]
covar      = 'cv1'
data       =            x         y       cv1       cv2       cv3
0   5.230740  7.141488  1.825430 -1.085631 -0.255619
1   6.013931...681  1.754886
28  3.697089  6.129922  3.138867 -0.140069  1.495644
29  5.466289  7.747689  1.053789 -0.861755  1.069393
method     = 'pearson'
x          = 'x'
x_covar    = None
y          = 'y'
y_covar    = None
pingouin/correlation.py:843: AssertionError
_______________________ TestPairwise.test_pairwise_corr ________________________
self = <pingouin.tests.test_pairwise.TestPairwise testMethod=test_pairwise_corr>
    def test_pairwise_corr(self):
        """Test function pairwise_corr"""
        # Load JASP Big 5 DataSets (remove subject column)
        data = read_dataset("pairwise_corr").iloc[:, 1:]
        stats = pairwise_corr(data=data, method="pearson", alternative="two-sided")
        jasp_rval = [-0.350, -0.01, -0.134, -0.368, 0.267, 0.055, 0.065, 0.159, -0.013, 0.159]
        assert np.allclose(stats["r"].round(3).to_numpy(), jasp_rval)
        assert stats["n"].to_numpy()[0] == 500
        # Correct for multiple comparisons
        pairwise_corr(data=data, method="spearman", alternative="greater", padjust="bonf")
        # Check with a subset of columns
        pairwise_corr(data=data, columns=["Neuroticism", "Extraversion"])
        with pytest.raises(ValueError):
            pairwise_corr(data=data, columns="wrong")
        # Check with non-numeric columns
        data["test"] = "test"
        pairwise_corr(data=data, method="pearson")
        # Check different variation of product / combination
        n = data.shape[0]
        data["Age"] = np.random.randint(18, 65, n)
        data["IQ"] = np.random.normal(105, 1, n)
        data["One"] = 1
        data["Gender"] = np.repeat(["M", "F"], int(n / 2))
        pairwise_corr(data, columns=["Neuroticism", "Gender"], method="shepherd")
        pairwise_corr(data, columns=["Neuroticism", "Extraversion", "Gender"])
        pairwise_corr(data, columns=["Neuroticism"])
        pairwise_corr(data, columns="Neuroticism", method="skipped")
        pairwise_corr(data, columns=[["Neuroticism"]], method="spearman")
        pairwise_corr(data, columns=[["Neuroticism"], None], method="percbend")
        pairwise_corr(data, columns=[["Neuroticism", "Gender"], ["Age"]])
        pairwise_corr(data, columns=[["Neuroticism"], ["Age", "IQ"]])
        pairwise_corr(data, columns=[["Age", "IQ"], []])
        pairwise_corr(data, columns=["Age", "Gender", "IQ", "Wrong"])
        pairwise_corr(data, columns=["Age", "Gender", "Wrong"])
        # Test with no good combinations
        with pytest.raises(ValueError):
            pairwise_corr(data, columns=["Gender", "Gender"])
        # Test when one column has only one unique value
        pairwise_corr(data=data, columns=["Age", "One", "Gender"])
        stats = pairwise_corr(data, columns=["Neuroticism", "IQ", "One"])
        assert stats.shape[0] == 1
        # Test with covariate
>       pairwise_corr(data, covar="Age")
data       =      Neuroticism  Extraversion  Openness  ...          IQ  One Gender
0        2.47917       4.20833   3.93750  ...  1...  105.904904    1      F
499      2.54167       3.56250   3.14583  ...  105.719773    1      F
[500 rows x 10 columns]
jasp_rval  = [-0.35, -0.01, -0.134, -0.368, 0.267, 0.055, ...]
n          = 500
self       = <pingouin.tests.test_pairwise.TestPairwise testMethod=test_pairwise_corr>
stats      =              X   Y   method  ...     p-unc   BF10     power
0  Neuroticism  IQ  pearson  ...  0.670885  0.061  0.070912
[1 rows x 10 columns]
pingouin/tests/test_pairwise.py:681: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pingouin/pairwise.py:1458: in pairwise_corr
    cor_st = partial_corr(
        X          = array(['Neuroticism', 'Neuroticism', 'Neuroticism', 'Neuroticism',
       'Neuroticism', 'Neuroticism', 'Extraversion'...eness',
       'Agreeableness', 'Agreeableness', 'Conscientiousness',
       'Conscientiousness', 'Age'], dtype='<U17')
        Y          = array(['Extraversion', 'Openness', 'Agreeableness', 'Conscientiousness',
       'Age', 'IQ', 'Openness', 'Agreeablenes...ableness', 'Conscientiousness', 'Age', 'IQ',
       'Conscientiousness', 'Age', 'IQ', 'Age', 'IQ', 'IQ'], dtype='<U17')
        alternative = 'two-sided'
        col1       = 'Neuroticism'
        col2       = 'Extraversion'
        columns    = None
        combs      = array([['Neuroticism', 'Extraversion'],
       ['Neuroticism', 'Openness'],
       ['Neuroticism', 'Agreeableness'],
 ...', 'IQ'],
       ['Conscientiousness', 'Age'],
       ['Conscientiousness', 'IQ'],
       ['Age', 'IQ']], dtype='<U17')
        corr       = <function corr at 0x7f69378bfc40>
        covar      = ['Age']
        data       =      Neuroticism  Extraversion  Openness  ...  Conscientiousness  Age          IQ
0        2.47917       4.20833   3.9...8  105.904904
499      2.54167       3.56250   3.14583  ...            2.89583   32  105.719773
[500 rows x 7 columns]
        i          = 0
        keys       = ['Neuroticism', 'Extraversion', 'Openness', 'Agreeableness', 'Conscientiousness', 'Age', ...]
        method     = 'pearson'
        multi_index = False
        nan_policy = 'pairwise'
        old_options = {'round': None, 'round.column.BF10': <function _format_bf at 0x7f693a3de660>, 'round.column.CI95%': 2}
        padjust    = 'none'
        partial_corr = <function partial_corr at 0x7f69378bfce0>
        stats      =                     X                  Y   method  ... p-val BF10 power
0         Neuroticism       Extraversion  pear...n  ...   NaN  NaN   NaN
14  Conscientiousness                 IQ  pearson  ...   NaN  NaN   NaN
[15 rows x 11 columns]
        traverse   = <function pairwise_corr.<locals>.traverse at 0x7f692d9ee3e0>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
data =      Neuroticism  Extraversion  Openness  ...  Conscientiousness  Age          IQ
0        2.47917       4.20833   3.9...8  105.904904
499      2.54167       3.56250   3.14583  ...            2.89583   32  105.719773
[500 rows x 7 columns]
x = 'Neuroticism', y = 'Extraversion', covar = ['Age'], x_covar = None
y_covar = None, alternative = 'two-sided', method = 'pearson'
    [... partial_corr source identical to the first traceback above, elided ...]
>       assert all([c in data for c in col]), "columns are not in dataframe."
E       AssertionError: columns are not in dataframe.
_flatten_list = <function _flatten_list at 0x7f69387c63e0>
alternative = 'two-sided'
col        = ['Neuroticism', 'Extraversion', 'Age', None, None]
covar      = ['Age']
data       =      Neuroticism  Extraversion  Openness  ...  Conscientiousness  Age          IQ
0        2.47917       4.20833   3.9...8  105.904904
499      2.54167       3.56250   3.14583  ...            2.89583   32  105.719773
[500 rows x 7 columns]
method     = 'pearson'
x          = 'Neuroticism'
x_covar    = None
y          = 'Extraversion'
y_covar    = None
pingouin/correlation.py:843: AssertionError

There is also a version pin on numpy:

numpy<=1.23

Fedora currently has numpy 1.24 across all branches. But since the package builds fine, with all tests passing, using numpy 1.24 and Python 3.11, I suspect the failures are related to Python 3.12 itself (removed deprecations?).

@raphaelvallat raphaelvallat self-assigned this Jul 29, 2023
@raphaelvallat raphaelvallat added the bug 💥 Something isn't working label Jul 29, 2023
@raphaelvallat
Owner

Thanks for reporting. It appears that the _flatten_list function no longer properly removes the None values. It relies on:

x = list(filter(None.__ne__, x))

Can you please try running the following?

from pingouin.utils import _flatten_list
x = ['X1', ['M1', 'M2'], 'Y1', None]
_flatten_list(x)

My output:

['X1', 'M1', 'M2', 'Y1']
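
For context: `None.__ne__(obj)` returns `NotImplemented` (not `True`) for any non-None operand, and truth-testing `NotImplemented` has been deprecated since Python 3.9, so the `filter(None.__ne__, ...)` idiom is fragile on newer interpreters. An explicit identity check avoids this entirely. A minimal sketch (`drop_none` is a hypothetical helper name, not part of pingouin):

```python
def drop_none(items):
    """Remove None entries from a list using an explicit identity check.

    Unlike filter(None.__ne__, items), this does not depend on
    truth-testing the NotImplemented object returned by None.__ne__,
    so it behaves identically on every Python version.
    """
    return [item for item in items if item is not None]


print(drop_none(['X1', 'M1', 'M2', 'Y1', None]))  # ['X1', 'M1', 'M2', 'Y1']
```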

@penguinpee
Author

You are correct. Running this in the build environment results in:

['X1', 'M1', 'M2', 'Y1', None]

@raphaelvallat
Owner

Thanks @penguinpee — can you please run the following code in your test env?

import collections.abc

def _flatten_list(x, include_tuple=False):
    """Flatten an arbitrarily nested list into a new list.
    """
    # If x is not iterable, return x
    if not isinstance(x, collections.abc.Iterable):
        return x
    # Initialize empty output variable
    result = []
    # Loop over items in x
    for el in x:
        # Check if element is iterable
        el_is_iter = isinstance(el, collections.abc.Iterable)
        if el_is_iter:
            if not isinstance(el, (str, tuple)):
                result.extend(_flatten_list(el))
            else:
                if isinstance(el, tuple) and include_tuple:
                    result.extend(_flatten_list(el))
                else:
                    result.append(el)
        else:
            result.append(el)
    # Remove None from output
    result = [r for r in result if r is not None]
    return result


x = ['X1', ['M1', 'M2', None], 'Y1', None]
_flatten_list(x)  # Expected output: ['X1', 'M1', 'M2', 'Y1']
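
For anyone wanting a quick sanity check outside the pingouin tree, the same logic can be exercised with a compact re-implementation (the name `flatten_drop_none` and its structure are illustrative only, not the exact pingouin source):

```python
import collections.abc


def flatten_drop_none(x, include_tuple=False):
    """Flatten an arbitrarily nested list and drop None entries.

    Compact re-implementation of the fix proposed above, for
    illustration only.
    """
    if not isinstance(x, collections.abc.Iterable):
        return x
    result = []
    for el in x:
        # Strings are iterable but should be kept whole
        flattenable = isinstance(el, collections.abc.Iterable) and not isinstance(el, str)
        if flattenable and (not isinstance(el, tuple) or include_tuple):
            result.extend(flatten_drop_none(el, include_tuple))
        else:
            result.append(el)
    # Explicit identity check instead of filter(None.__ne__, ...)
    return [r for r in result if r is not None]


# Inputs taken from this thread:
print(flatten_drop_none(['X1', ['M1', 'M2', None], 'Y1', None]))  # ['X1', 'M1', 'M2', 'Y1']
print(flatten_drop_none(['x', 'y', 'cv1', None, None]))           # ['x', 'y', 'cv1']
```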

@penguinpee
Author

Yes, that prints the expected output:

Out[1]: ['X1', 'M1', 'M2', 'Y1']

@raphaelvallat
Owner

Thanks! Created a PR: #370
