# Scatterplot with marginal distributions and statistical results

Source: `R/ggscatterstats.R` (`ggscatterstats.Rd`)

Scatterplots from `{ggplot2}` combined with marginal distribution plots (histograms drawn with `{ggside}`) and statistical details.

## Usage

```
ggscatterstats(
  data,
  x,
  y,
  type = "parametric",
  conf.level = 0.95,
  bf.prior = 0.707,
  bf.message = TRUE,
  tr = 0.2,
  k = 2L,
  results.subtitle = TRUE,
  label.var = NULL,
  label.expression = NULL,
  marginal = TRUE,
  xfill = "#009E73",
  yfill = "#D55E00",
  point.args = list(size = 3, alpha = 0.4, stroke = 0, na.rm = TRUE),
  point.width.jitter = 0,
  point.height.jitter = 0,
  point.label.args = list(size = 3, max.overlaps = 1e+06),
  smooth.line.args = list(size = 1.5, color = "blue", method = "lm", formula = y ~ x,
    na.rm = TRUE),
  xsidehistogram.args = list(fill = xfill, color = "black", na.rm = TRUE),
  ysidehistogram.args = list(fill = yfill, color = "black", na.rm = TRUE),
  xlab = NULL,
  ylab = NULL,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  ggtheme = ggstatsplot::theme_ggstatsplot(),
  ggplot.component = NULL,
  output = "plot",
  ...
)
```

## Arguments

- `data`: A dataframe (or a tibble) from which the specified variables are taken. Other data types (e.g., matrix, table, array) will **not** be accepted.
- `x`: The column in `data` containing the explanatory variable to be plotted on the `x`-axis.
- `y`: The column in `data` containing the response (outcome) variable to be plotted on the `y`-axis.
- `type`: A character string specifying the type of statistical approach:
  - `"parametric"`
  - `"nonparametric"`
  - `"robust"`
  - `"bayes"`

  You can specify just the initial letter.

- `conf.level`: Scalar between `0` and `1`. If unspecified, the default returns 95% confidence/credible intervals (`0.95`).
- `bf.prior`: A number between `0.5` and `2` (default `0.707`), the prior width to use in calculating Bayes factors and posterior estimates. In addition to numeric arguments, several named values are also recognized: `"medium"`, `"wide"`, and `"ultrawide"`, corresponding to *r* scale values of 1/2, sqrt(2)/2, and 1, respectively. In case of an ANOVA, this value corresponds to the scale for fixed effects.
- `bf.message`: Logical that decides whether to display the Bayes factor in favor of the *null* hypothesis. This argument is relevant only **for the parametric test** (Default: `TRUE`).
- `tr`: Trim level for the mean when carrying out `robust` tests (Default: `0.2`). If you encounter an error, try lowering the value of `tr`.
- `k`: Number of digits after the decimal point (should be an integer) (Default: `k = 2L`).
- `results.subtitle`: Decides whether the results of statistical tests are displayed as a subtitle (Default: `TRUE`). If set to `FALSE`, only the plot will be returned.
- `label.var`: Variable to use for point labels, entered as a symbol (e.g., `var1`).
- `label.expression`: An expression evaluating to a logical vector that determines the subset of data points to label (e.g., `y < 4 & z < 20`). When using this argument with `purrr::pmap`, you will have to provide a quoted expression (e.g., `quote(y < 4 & z < 20)`).
- `marginal`: Decides whether marginal distributions will be plotted on the axes using `ggside` functions (Default: `TRUE`). The `ggside` package must already be installed by the user.
- `xfill`, `yfill`: Character strings giving the fill color for the `x`- and `y`-axis marginal distributions (Default: `"#009E73"` for `x` and `"#D55E00"` for `y`). Note that the defaults are colorblind-friendly.
- `point.args`: A list of additional aesthetic arguments passed to the `geom_point` geom used to display the raw data points.
- `point.width.jitter`, `point.height.jitter`: Degree of jitter in the `x` and `y` directions, respectively. Defaults to `0` (0%) of the resolution of the data. Note that jitter should not be specified in `point.args`, because this information is passed to two different `geom`s: one displaying the **points** and the other displaying the **labels** for these points.
- `point.label.args`: A list of additional aesthetic arguments passed to the `ggrepel::geom_label_repel` geom used to display the labels.
- `smooth.line.args`: A list of additional aesthetic arguments passed to the `geom_smooth` geom used to display the regression line.
- `xsidehistogram.args`, `ysidehistogram.args`: Lists of arguments passed to the respective `ggside` geoms (`geom_xsidehistogram` and `geom_ysidehistogram`) to change the marginal histograms.
- `xlab`, `ylab`: Labels for the `x`- and `y`-axis variables. If `NULL` (default), the variable names for `x` and `y` will be used.
- `title`: The text for the plot title.

- `subtitle`: The text for the plot subtitle. Takes effect only if `results.subtitle = FALSE`.
- `caption`: The text for the plot caption.

- `ggtheme`: A `{ggplot2}` theme. Default value is `ggstatsplot::theme_ggstatsplot()`. Any of the `{ggplot2}` themes (e.g., `theme_bw()`) or themes from extension packages (e.g., `ggthemes::theme_fivethirtyeight()`, `hrbrthemes::theme_ipsum_ps()`) are allowed.
- `ggplot.component`: A `ggplot` component to be added to the plot prepared by `{ggstatsplot}`. This argument is primarily helpful for the `grouped_` variants of all primary functions. Default is `NULL`. The argument should be entered as a `{ggplot2}` function or a list of `{ggplot2}` functions.
- `output`: Character that describes what is to be returned: can be `"plot"` (default), `"subtitle"`, or `"caption"`. Setting this to `"subtitle"` returns the expression containing the statistical results; if you have set `results.subtitle = FALSE`, it returns `NULL` instead. Setting this to `"caption"` returns the expression containing details about the Bayes factor analysis, but only when `type = "parametric"` and `bf.message = TRUE`; otherwise it returns `NULL`.
- `...`: Currently ignored.
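As a minimal sketch of how `type` and `output` interact (assuming `{ggstatsplot}` is installed; the built-in `mtcars` data are used only for illustration), the calls below request the statistical expressions instead of a plot. Following the `output` description, the Bayes factor caption should be `NULL` for the non-parametric test, and the subtitle should be `NULL` when `results.subtitle = FALSE`.

```r
library(ggstatsplot)

# `type = "n"` is shorthand for "nonparametric" (only the initial letter is needed);
# `output = "subtitle"` returns the expression with the test results, not a plot
sub_np <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "n",
  output = "subtitle"
)

# the Bayes factor caption is documented to exist only for the parametric test
cap_p <- ggscatterstats(data = mtcars, x = wt, y = mpg, type = "p", output = "caption")
cap_np <- ggscatterstats(data = mtcars, x = wt, y = mpg, type = "n", output = "caption")

# with results.subtitle = FALSE, output = "subtitle" is documented to return NULL
sub_none <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  results.subtitle = FALSE,
  output = "subtitle"
)

sub_np
is.null(cap_np)
is.null(sub_none)
```

The returned expressions can be reused, for example as the `subtitle` of another plot.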

## Details

For details, see: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/ggscatterstats.html

## Note

The plot uses `ggrepel::geom_label_repel` to keep labels from overlapping as much as possible. As a consequence, plotting can slow down considerably (and the plot file can grow in size) if you have many overlapping labels.
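One way to keep label placement manageable is to label only a small subset of points with `label.expression`, and to let `ggrepel` drop labels that overlap too many other elements through its `max.overlaps` parameter, passed in `point.label.args`. The usage above shows a default of `max.overlaps = 1e+06`, which effectively keeps every label. This sketch assumes `{ggstatsplot}`, `{dplyr}`, and `{ggrepel}` (0.9.0 or later, where `max.overlaps` exists) are installed. The threshold `mpg > 30` and the value `max.overlaps = 10` are arbitrary illustrative choices.

```r
library(ggstatsplot)
library(dplyr, warn.conflicts = FALSE)

# keep car names as a column so they can be used as labels
mtcars_new <- as_tibble(mtcars, rownames = "car")

p <- ggscatterstats(
  data = mtcars_new,
  x = wt,
  y = mpg,
  label.var = car,
  # label only the few cars with mpg above 30 instead of all 32 points
  label.expression = mpg > 30,
  # the user-supplied list replaces the default list, so `size` is restated;
  # ggrepel drops labels that would overlap more than 10 other elements
  point.label.args = list(size = 3, max.overlaps = 10),
  # skip the marginal histograms so {ggside} is not required
  marginal = FALSE
)

p
```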

## Examples

```
# to get reproducible results from bootstrapping
set.seed(123)
library(ggstatsplot)
library(dplyr, warn.conflicts = FALSE)
# creating a dataframe with row names converted to a new column
mtcars_new <- as_tibble(mtcars, rownames = "car")
# labelling a subset of points; the marginal histograms require {ggside}
if (require("ggside")) {
  ggscatterstats(
    data = mtcars_new,
    x = wt,
    y = mpg,
    label.var = car,
    label.expression = wt < 4 & mpg < 20
  ) + # further customization with `{ggplot2}` functions
    ggplot2::geom_rug(sides = "b")
}
#> Loading required package: ggside
#> Registered S3 method overwritten by 'ggside':
#> method from
#> +.gg ggplot2
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
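The customization arguments can also be combined in a single call. The sketch below assumes `{ggstatsplot}` and `{ggplot2}` are installed. It switches to the robust test, turns off the marginal histograms, recolors the regression line through `smooth.line.args`, swaps the theme via `ggtheme`, and adds extra components through `ggplot.component`. A list supplied to `smooth.line.args` replaces the default list rather than being merged with it, so `method` and `formula` are restated. The color and axis breaks are arbitrary.

```r
library(ggstatsplot)
library(ggplot2)

p <- ggscatterstats(
  data = mtcars,
  x = wt,
  y = mpg,
  type = "robust",
  # no marginal histograms, so {ggside} is not needed
  marginal = FALSE,
  # replaces the default list, so method and formula are restated here
  smooth.line.args = list(color = "darkred", method = "lm", formula = y ~ x, na.rm = TRUE),
  ggtheme = theme_bw(),
  # extra {ggplot2} components, supplied as a list
  ggplot.component = list(
    scale_y_continuous(breaks = seq(10, 35, by = 5)),
    geom_rug(sides = "b")
  )
)

p
```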