From 57f65b30d28322f59be4e25863c9cdf9ca7f6536 Mon Sep 17 00:00:00 2001 From: jinglinpeng Date: Wed, 19 May 2021 12:38:40 -0700 Subject: [PATCH] docs(eda):replace x, y, z with col1, col2, col3 --- docs/source/user_guide/eda/insights.ipynb | 12 ++++---- .../eda/parameter_configurations.ipynb | 6 ++-- docs/source/user_guide/eda/plot.ipynb | 30 +++++++++---------- .../user_guide/eda/plot_correlation.ipynb | 16 +++++----- docs/source/user_guide/eda/plot_missing.ipynb | 18 +++++------ 5 files changed, 41 insertions(+), 41 deletions(-) diff --git a/docs/source/user_guide/eda/insights.ipynb b/docs/source/user_guide/eda/insights.ipynb index b9a63ddc9..9a073c5f6 100644 --- a/docs/source/user_guide/eda/insights.ipynb +++ b/docs/source/user_guide/eda/insights.ipynb @@ -69,7 +69,7 @@ "id": "engaged-prince", "metadata": {}, "source": [ - "If we use plot(df, x) function, we have to click the following buttom:\n", + "If we use `plot(df, col)` function, we have to click the following button:\n", "\n", "![avatar]()" ] @@ -124,7 +124,7 @@ "id": "fatal-recovery", "metadata": {}, "source": [ - "## The insights provided by plot(df, x) when x is a continues column\n", + "## The insights provided by plot(df, col) when col is a continuous column\n", "\n", "Here we give an example to show the insights could be yielded by plot(df, x), when x is a continues column.\n", "\n", @@ -147,7 +147,7 @@ "metadata": {}, "outputs": [], "source": [ - "plot(df, x=\"Age\")" + "plot(df, \"Age\")" ] }, { @@ -155,9 +155,9 @@ "id": "streaming-neutral", "metadata": {}, "source": [ - "## The insights provided by plot(df, x) when x is a nominal column\n", + "## The insights provided by plot(df, col) when col is a nominal column\n", "\n", - "Here we give an example to show the insights could be presented by plot(df, x), when x is a nominal column.\n", + "Here we give an example to show the insights could be presented by plot(df, col), when col is a nominal column.\n", "\n", "| insights | applied plots | type | 
threshold | discription |\n", "| :---- | :---- | :----| :---- | :---- |\n", @@ -203,4 +203,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} +} \ No newline at end of file diff --git a/docs/source/user_guide/eda/parameter_configurations.ipynb b/docs/source/user_guide/eda/parameter_configurations.ipynb index db7be4f22..c9967a81a 100644 --- a/docs/source/user_guide/eda/parameter_configurations.ipynb +++ b/docs/source/user_guide/eda/parameter_configurations.ipynb @@ -22,8 +22,8 @@ "\n", "| Global Parameter | Description |\n", "| --- | --- | \n", - "| `width` | Change the plots' width in `plot(df, x)`, `plot(df, x, y)`, `plot(df, x, y, z)`, `plot_correlation()` and `plot_missing()`.\n", - "| `height` | Change the plots' height in `plot(df, x)`, `plot(df, x, y)` and `plot(df, x, y, z)`, `plot_correlation()` and `plot_missing()`.\n", + "| `width` | Change the plots' width in `plot(df, col1)`, `plot(df, col1, col2)`, `plot(df, col1, col2, col3)`, `plot_correlation()` and `plot_missing()`.\n", + "| `height` | Change the plots' height in `plot(df, col1)`, `plot(df, col1, col2)` and `plot(df, col1, col2, col3)`, `plot_correlation()` and `plot_missing()`.\n", "| `bins` | Apply to `bins` for `Histogram`, `KDE Plot`, `Box Plot`, `Word Length`, `Line Chart`, `Spectrum`.\n", "| `ngroups` | Apply to `bars` and `slices` for the `Bar Chart` and `Pie Chart`." ] @@ -207,4 +207,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} +} \ No newline at end of file diff --git a/docs/source/user_guide/eda/plot.ipynb b/docs/source/user_guide/eda/plot.ipynb index 67d5d00a8..050ca7382 100644 --- a/docs/source/user_guide/eda/plot.ipynb +++ b/docs/source/user_guide/eda/plot.ipynb @@ -21,12 +21,12 @@ "The function `plot()` explores the distributions and statistics of the dataset. It generates a variety of visualizations and statistics which enables the user to achieve a comprehensive understanding of the column distributions and their relationships. 
The following describes the functionality of `plot()` for a given dataframe `df`.\n", "\n", "1. `plot(df)`: plots the distribution of each column and computes dataset statistics\n", - "2. `plot(df, x)`: plots the distribution of column `x` in various ways, and computes its statistics\n", - "3. `plot(df, x, y)`: generates plots depicting the relationship between columns `x` and `y`\n", + "2. `plot(df, col1)`: plots the distribution of column `col1` in various ways, and computes its statistics\n", + "3. `plot(df, col1, col2)`: generates plots depicting the relationship between columns `col1` and `col2`\n", "\n", "The generated plots are different for numerical, categorical and geography columns. The following table summarizes the output for the different column types.\n", "\n", - "| `x` | `y` | Output |\n", + "| `col1` | `col2` | Output |\n", "| --- | --- | --- |\n", "| None | None | dataset statistics, [histogram](https://www.wikiwand.com/en/Histogram) or [bar chart](https://www.wikiwand.com/en/Bar_chart) for each column |\n", "| Numerical | None | column statistics, histogram, [kde plot](https://www.wikiwand.com/en/Kernel_density_estimation), [qq-normal plot](https://www.wikiwand.com/en/Q%E2%80%93Q_plot), [box plot](https://www.wikiwand.com/en/Box_plot) |\n", @@ -101,11 +101,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Understand a column with `plot(df, x)`\n", + "## Understand a column with `plot(df, col1)`\n", "\n", - "After getting an overview of the dataset, we can thoroughly investigate a column of interest `x` using `plot(df, x)`. The output is of `plot(df, x)` is different for numerical and categorical columns.\n", + "After getting an overview of the dataset, we can thoroughly investigate a column of interest `col1` using `plot(df, col1)`. 
The output of `plot(df, col1)` is different for numerical and categorical columns.\n", "\n", - "When `x` is a numerical column, it computes column statistics, and generates a histogram, kde plot, box plot and qq-normal plot:" + "When `col1` is a numerical column, it computes column statistics, and generates a histogram, kde plot, box plot and qq-normal plot:" ] }, { @@ -164,11 +164,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Understand the relationship between two columns with `plot(df, x, y)`\n", + "## Understand the relationship between two columns with `plot(df, col1, col2)`\n", "\n", - "Next, we can explore the relationship between columns `x` and `y` using `plot(df, x, y)`. The output depends on the types of the columns. \n", + "Next, we can explore the relationship between columns `col1` and `col2` using `plot(df, col1, col2)`. The output depends on the types of the columns. \n", "\n", - "When `x` and `y` are both numerical columns, it generates a scatter plot, hexbin plot and box plot:" + "When `col1` and `col2` are both numerical columns, it generates a scatter plot, hexbin plot and box plot:" ] }, { @@ -189,7 +189,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "When `x` and `y` are both categorical columns, it plots a nested bar chart, stacked bar chart and heat map:" + "When `col1` and `col2` are both categorical columns, it plots a nested bar chart, stacked bar chart and heat map:" ] }, { @@ -210,7 +210,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "When `x` and `y` are one each of type numerical and categorical, it generates a box plot per category and a multi-line chart:" + "When `col1` and `col2` are one each of type numerical and categorical, it generates a box plot per category and a multi-line chart:" ] }, { @@ -230,7 +230,7 @@ }, { "source": [ - "When `x` and `y` are one each of type geopoint and categorical, or, geography and categorical, it generates a box plot per category and a multi-line 
chart:" + "When `col1` and `col2` are one each of type geopoint and categorical, or, geography and categorical, it generates a box plot per category and a multi-line chart:" ], "cell_type": "markdown", "metadata": {} @@ -241,7 +241,7 @@ "metadata": {}, "outputs": [], "source": [ - "from dataprep.eda.dtypes import LatLong\n", + "from dataprep.eda.dtypes_v2 import LatLong\n", "covid = load_dataset('covid19')\n", "latlong = LatLong(\"Lat\", \"Long\") # create geopoint type using \"LatLong\" function by inputing two columns names\n", "plot(covid, latlong, \"Country/Region\")\n", @@ -253,7 +253,7 @@ }, { "source": [ - "When `x` and `y` are one each of type geography and numerical, it generates a box plot per category, a multi-line chart and a world map:" + "When `col1` and `col2` are one each of type geography and numerical, it generates a box plot per category, a multi-line chart and a world map:" ], "cell_type": "markdown", "metadata": {} @@ -270,7 +270,7 @@ }, { "source": [ - "When `x` and `y` are one each of type geopoint and numerical, it generates a geo map:" + "When `col1` and `col2` are one each of type geopoint and numerical, it generates a geo map:" ], "cell_type": "markdown", "metadata": {} diff --git a/docs/source/user_guide/eda/plot_correlation.ipynb b/docs/source/user_guide/eda/plot_correlation.ipynb index 311815f0b..509064181 100644 --- a/docs/source/user_guide/eda/plot_correlation.ipynb +++ b/docs/source/user_guide/eda/plot_correlation.ipynb @@ -16,12 +16,12 @@ "The function `plot_correlation()` explores the correlation between columns in various ways and using multiple correlation metrics. The following describes the functionality of `plot_correlation()` for a given dataframe `df`.\n", "\n", "1. `plot_correlation(df)`: plots correlation matrices (correlations between all pairs of columns)\n", - "2. `plot_correlation(df, x)`: plots the most correlated columns to column `x`\n", - "3. 
`plot_correlation(df, x, y)`: plots the joint distribution of column `x` and column `y` and computes a regression line\n", + "2. `plot_correlation(df, col1)`: plots the most correlated columns to column `col1`\n", + "3. `plot_correlation(df, col1, col2)`: plots the joint distribution of column `col1` and column `col2` and computes a regression line\n", "\n", - "The following table summarizes the output plots for different settings of `x` and `y`.\n", + "The following table summarizes the output plots for different settings of `col1` and `col2`.\n", "\n", - "| `x` | `y` | Output |\n", + "| `col1` | `col2` | Output |\n", "| --- | --- | --- |\n", "| None | None | *n*\\**n* correlation matrix, computed with [Person](https://www.wikiwand.com/en/Pearson_correlation_coefficien), [Spearman](https://www.wikiwand.com/en/Spearman%27s_rank_correlation_coefficient), and [KendallTau](https://www.wikiwand.com/en/Kendall_rank_correlation_coefficient) correlation coefficients | \n", "| Numerical | None | *n*\\*1 correlation matrix, computed with Pearson, Spearman, and KendallTau correlation coefficients |\n", @@ -86,7 +86,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Find the columns that are most correlated to column `x` with `plot_correlation(df, x)`\n", + "## Find the columns that are most correlated to column `col1` with `plot_correlation(df, col1)`\n", "\n", "After computing the correlation matrices, we can discover how other columns correlate to a specific column `x` using `plot_correlation(df, x)`. This function computes the correlation between column `x` and all other columns (using Pearson, Spearman, and KendallTau correlation coefficients), and sorts them in decreasing order. This enables easy determination of the columns that are most positively and negatively correlated with column `x`. 
The following shows an example:" ] @@ -109,9 +109,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Explore the correlation between two columns with `plot_correlation(df, x, y)`\n", + "## Explore the correlation between two columns with `plot_correlation(df, col1, col2)`\n", "\n", - "Furthermore, `plot_correlation(df, x, y)` provides detailed analysis of the correlation between two columns `x` and `y`. It plots the joint distribution of the columns `x` and `y` as a scatter plot, as well as a regression line. The following shows an example:" + "Furthermore, `plot_correlation(df, col1, col2)` provides detailed analysis of the correlation between two columns `col1` and `col2`. It plots the joint distribution of the columns `col1` and `col2` as a scatter plot, as well as a regression line. The following shows an example:" ] }, { @@ -193,4 +193,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/docs/source/user_guide/eda/plot_missing.ipynb b/docs/source/user_guide/eda/plot_missing.ipynb index 5c60d748b..0a1196556 100644 --- a/docs/source/user_guide/eda/plot_missing.ipynb +++ b/docs/source/user_guide/eda/plot_missing.ipynb @@ -21,8 +21,8 @@ "The function `plot_missing()` enables thorough analysis of the missing values and their impact on the dataset. The *impact* is the change in the dataset's characteristics (e.g., the histogram of a numerical column or bar chart of a categorical column) after removing the rows with missing values from the dataset. The following describes the functionality of `plot_missing()` for a given dataframe `df`.\n", "\n", "1. `plot_missing(df)`: plots the amount and position of missing values, and their relationship between columns\n", - "2. `plot_missing(df, x)`: plots the impact of the missing values in column `x` on all other columns\n", - "3. `plot_missing(df, x, y)`: plots the impact of the missing values from column `x` on column `y` in various ways.\n", + "2. 
`plot_missing(df, col1)`: plots the impact of the missing values in column `col1` on all other columns\n", + "3. `plot_missing(df, col1, col2)`: plots the impact of the missing values from column `col1` on column `col2` in various ways.\n", "\n", "Next, we demonstrate the functionality of `plot_missing()`. " ] @@ -95,9 +95,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Understand the *impact* of the missing values in column *x* with `plot_missing(df, x)`\n", + "## Understand the *impact* of the missing values in column *col1* with `plot_missing(df, col1)`\n", "\n", - "After getting an overview of the missing values with `plot_missing(df)`, we can analyze the impact of the missing values in a specific column `x` with `plot_missing(df, x)`. The *impact* of the missing values in column `x` is the change in the dataset's characteristics after removing the rows where column `x`'s values are missing. Here, we consider two types of characteristics: the histogram (for numerical columns) and the bar chart (for categorical columns). 
`plot_missing(df, col1)` plots the histogram or bar chart (for appropriate column types) for each column before and after removing the rows that contain missing values in column `col1`.\n", "\n", "The following shows an example:" ] @@ -120,12 +120,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Understand the impact of the missing values in column `x` on column `y` with `plot_missing(df, x, y)`\n", + "## Understand the impact of the missing values in column `col1` on column `col2` with `plot_missing(df, col1, col2)`\n", "\n", "\n", - "`plot_missing(df, x)` only displays the frequency distribution of each column before and after removing the rows containing missing values in column `x`. If the user is specifically concerned with the impact of the missing values in one column `x` on another column `y`, she/he can call `plot_missing(df, x, y)`. `plot_missing(df, x, y)` plots the impact of the missing values in column `x` on column `y` in different ways depending on the type of column `y`.\n", + "`plot_missing(df, col1)` only displays the frequency distribution of each column before and after removing the rows containing missing values in column `col1`. If the user is specifically concerned with the impact of the missing values in one column `col1` on another column `col2`, she/he can call `plot_missing(df, col1, col2)`. `plot_missing(df, col1, col2)` plots the impact of the missing values in column `col1` on column `col2` in different ways depending on the type of column `col2`.\n", "\n", - "If `y` is a numerical column, `plot_missing(df, x, y)` shows the impact as a histogram, pdf, cdf, and box plot. The following shows an example:" + "If `col2` is a numerical column, `plot_missing(df, col1, col2)` shows the impact as a histogram, pdf, cdf, and box plot. 
The following shows an example:" ] }, { @@ -146,7 +146,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If `y` is a categorical column, `plot_missing(df, x, y)` shows the impact as a bar chart. The following shows an example:" + "If `col2` is a categorical column, `plot_missing(df, col1, col2)` shows the impact as a bar chart. The following shows an example:" ] }, { @@ -228,4 +228,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file