滴水穿石

种一棵树最好的时间是十年前,其次是现在

0%

如何绘制带有显著性标记的箱线图/柱状图

介绍如何用R绘制多组比较的箱线图/柱状图。

数据准备

Prerequisites
Make sure you have installed the following R packages:
tidyverse for data manipulation and visualization
ggpubr for creating easily publication ready plots
rstatix provides pipe-friendly R functions for easy statistical analyses.

1
library(ggpubr)
1
## Loading required package: ggplot2
1
## RStudio Community is a great place to get help: https://community.rstudio.com/c/tidyverse
1
2
3
## Registered S3 method overwritten by 'data.table':
## method from
## print.data.table
1
library(rstatix)
1
2
## 
## Attaching package: 'rstatix'
1
2
3
## The following object is masked from 'package:stats':
##
## filter
1
2
3
4
5
6
# Transform `dose` into factor variable 
df <- ToothGrowth
df$dose <- as.factor(df$dose)
# Add a random grouping variable
df$group <- factor(rep(c("grp1", "grp2"), 30))
head(df, 3)

每个panel中包含两组数据

使用一个变量进行分面

Statistical tests
使用dose变量进行分面,并在x轴上水平上比较supp变量。

1
2
3
4
5
6
stat.test <- df %>%
group_by(dose) %>%
t_test(len ~ supp) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance()
stat.test

Box plots

1
2
3
4
5
6
7
8
9
# Create a box plot
bxp <- ggboxplot(
df, x = "supp", y = "len", fill = "#00AFBB",
facet.by = "dose"
)

# Make facet and add p-values
stat.test <- stat.test %>% add_xy_position(x = "supp")
bxp + stat_pvalue_manual(stat.test)

plot of chunk unnamed-chunk-6

Box plots 显示P值
使分面比例自由,并添加散点Make the facet scale free and add jitter points
使用 bracket.nudge.y将括号位置向下调整
不显示ns (non-significant)
显示矫正之后并且Pvalue显著的标志
在p-value标签和plot边框之间添加10%的空格

1
2
3
4
5
6
7
8
9
10
bxp <- ggboxplot(
df, x = "supp", y = "len", fill = "#00AFBB",
facet.by = "dose", scales = "free", add = "jitter"
)
bxp +
stat_pvalue_manual(
stat.test, bracket.nudge.y = -2, hide.ns = TRUE,
label = "{p.adj}{p.adj.signif}"
) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

plot of chunk unnamed-chunk-7

Bar plots
使用 add = “mean_sd”用于创建error bar(mean +/- SD)。
需要使用汇总统计函数来自动计算p-value标签位置 add_xy_position()。

1
2
3
4
5
6
7
8
9
# Create a bar plot with error bars (mean +/- sd)
bp <- ggbarplot(
df, x = "supp", y = "len", add = "mean_sd",
fill = "#00AFBB", facet.by = "dose"
)

# Add p-values onto the bar plots
stat.test <- stat.test %>% add_xy_position(fun = "mean_sd", x = "supp")
bp + stat_pvalue_manual(stat.test)

plot of chunk unnamed-chunk-8

Bar plots with jitter points
在计算p-value标签的位置时,需要设定fun = “max”,
从而将括号将从数据点的最大值开始,避免数据点和括号之间的重叠。

1
2
3
4
5
6
7
8
9
# Create a bar plot with error bars (mean +/- sd)
bp <- ggbarplot(
df, x = "supp", y = "len", add = c("mean_sd", "jitter"),
fill = "#00AFBB", facet.by = "dose"
)

# Add p-values onto the bar plots
stat.test <- stat.test %>% add_xy_position(fun = "max", x = "supp")
bp + stat_pvalue_manual(stat.test)

plot of chunk unnamed-chunk-9

使用两个变量进行分面

使用dose和group变量进行分面,并在x轴上水平上比较supp变量。

Statistical tests

1
2
3
4
5
6
stat.test <- df %>%
group_by(group, dose) %>%
t_test(len ~ supp) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance()
stat.test

Box plots

1
2
3
4
5
6
7
8
# Create box plots with significance levels
# Hide ns (non-significant)
stat.test <- stat.test %>% add_xy_position(x = "supp")
ggboxplot(
df, x = "supp", y = "len", fill = "#E7B800",
facet = c("group", "dose")
) +
stat_pvalue_manual(stat.test, hide.ns = TRUE)

plot of chunk unnamed-chunk-11

Bar plots

1
2
3
4
5
6
7
8
# Create bar plots with significance levels
# Hide ns (non-significant)
stat.test <- stat.test %>% add_xy_position(x = "supp", fun = "mean_sd")
ggbarplot(
df, x = "supp", y = "len", fill = "#E7B800",
add = c("mean_sd", "jitter"), facet = c("group", "dose")
) +
stat_pvalue_manual(stat.test, hide.ns = TRUE)

plot of chunk unnamed-chunk-12

每个panel中包含三组或以上的数据

使用一个变量进行分面

执行所有两两比较

使用supp变量进行分组,然后对dose变量之间的水平进行两两比较。
Statistical test:

1
2
3
4
stat.test <- df %>%
group_by(supp) %>%
t_test(len ~ dose)
stat.test

在图中添加p值。
ggplot2中scale_y_continuous(expand = expand (mult = c(0,0.1))用于在标签和图形顶部边框之间添加更多空格

1
2
3
4
5
# Box plots with p-values
stat.test <- stat.test %>% add_y_position()
ggboxplot(df, x = "dose", y = "len", fill = "#FC4E07", facet.by = "supp") +
stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

plot of chunk unnamed-chunk-14

1
2
3
4
5
6
7
8
9
# Bar plot with p-values
# Add 10% space on the y-axis above the box plots
stat.test <- stat.test %>% add_y_position(fun = "mean_sd")
ggbarplot(
df, x = "dose", y = "len", fill = "#FC4E07",
add = "mean_sd", facet.by = "supp"
) +
stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

plot of chunk unnamed-chunk-15

与参照组进行两两比较

Statistical test:

1
2
3
4
stat.test <- df %>%
group_by(supp) %>%
t_test(len ~ dose, ref.group = "0.5")
stat.test

Box plots

1
2
3
4
stat.test <- stat.test %>% add_y_position()
ggboxplot(df, x = "dose", y = "len", fill = "#FC4E07", facet.by = "supp") +
stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

plot of chunk unnamed-chunk-17

1
2
3
4
5
# Show only significance levels at x = group2
# Move down significance symbols using vjust
stat.test <- stat.test %>% add_y_position()
ggboxplot(df, x = "dose", y = "len", fill = "#FC4E07", facet.by = "supp") +
stat_pvalue_manual(stat.test, label = "p.adj.signif", x = "group2", vjust = 2)

plot of chunk unnamed-chunk-18

Bar plot

1
2
3
4
5
6
7
8
# Add 10% space on the y-axis above the box plots
stat.test <- stat.test %>% add_y_position(fun = "mean_sd")
ggbarplot(
df, x = "dose", y = "len", fill = "#FC4E07",
add = c("mean_sd", "jitter"), facet.by = "supp"
) +
stat_pvalue_manual(stat.test, label = "p.adj.signif", tip.length = 0.01) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

plot of chunk unnamed-chunk-19

使用两个变量进行分面

Statistical test
使用dose和group变量进行分面,并在x轴上水平上比较supp变量。

1
2
3
4
5
6
stat.test <- df %>%
group_by(group, supp) %>%
t_test(len ~ dose) %>%
adjust_pvalue(method = "bonferroni") %>%
add_significance()
stat.test

Box plots

1
2
3
4
5
6
7
8
9
10
# Create box plots with significance levels
# Hide ns (non-significant)
# Add 15% space between labels and the plot top border
stat.test <- stat.test %>% add_xy_position(x = "dose")
ggboxplot(
df, x = "dose", y = "len", fill = "#FC4E07",
facet = c("group", "supp")
) +
stat_pvalue_manual(stat.test, hide.ns = TRUE) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))

plot of chunk unnamed-chunk-21

Bar plots

1
2
3
4
5
6
7
8
9
10
# Create bar plots with significance levels
# Hide ns (non-significant)
# Add 15% space between labels and the plot top border
stat.test <- stat.test %>% add_xy_position(x = "dose", fun = "mean_sd")
ggbarplot(
df, x = "dose", y = "len", fill = "#FC4E07",
add = c("mean_sd", "jitter"), facet = c("group", "supp")
) +
stat_pvalue_manual(stat.test, hide.ns = TRUE) +
scale_y_continuous(expand = expansion(mult = c(0.05, 0.15)))

plot of chunk unnamed-chunk-22

使用group变量进行分面

简单的plot

1
2
3
4
5
6
# Box plots
bxp <- ggboxplot(
df, x = "supp", y = "len", color = "dose",
palette = "jco", facet.by = "group"
)
bxp

plot of chunk unnamed-chunk-23

Bar plots

1
2
3
4
5
6
7
bp <- ggbarplot(
df, x = "supp", y = "len", color = "dose",
palette = "jco", add = "mean_sd",
position = position_dodge(0.8),
facet.by = "group"
)
bp

plot of chunk unnamed-chunk-24

执行所有两两比较

Statistical test:

1
2
3
4
stat.test <- df %>%
group_by(supp, group) %>%
t_test(len ~ dose)
stat.test

在图上添加p值:

1
2
3
4
5
6
7
8
9
10
# Box plots with p-values
# Hide ns (non-significant)
stat.test <- stat.test %>%
add_xy_position(x = "supp", dodge = 0.8)
bxp +
stat_pvalue_manual(
stat.test, label = "p.adj.signif", tip.length = 0.01,
hide.ns = TRUE
) +
scale_y_continuous(expand = expansion(mult = c(0.01, 0.1)))

plot of chunk unnamed-chunk-26

Bar plots

1
2
3
4
5
6
7
8
stat.test <- stat.test %>%
add_xy_position(x = "supp", fun = "mean_sd", dodge = 0.8)
bp +
stat_pvalue_manual(
stat.test, label = "p.adj.signif", tip.length = 0.01,
hide.ns = TRUE
) +
scale_y_continuous(expand = expansion(mult = c(0.01, 0.1)))

plot of chunk unnamed-chunk-27

与参照组进行两两比较

Statistical test:

1
2
3
4
stat.test <- df %>%
group_by(supp, group) %>%
t_test(len ~ dose, ref.group = "0.5")
stat.test

Box plots with p-values

1
2
3
4
5
6
7
stat.test <- stat.test %>%
add_xy_position(x = "supp", dodge = 0.8)
bxp +
stat_pvalue_manual(
stat.test, label = "p.adj.signif", tip.length = 0.01
) +
scale_y_continuous(expand = expansion(mult = c(0.01, 0.1)))

plot of chunk unnamed-chunk-29

Bar plots with p-values

1
2
3
4
5
6
7
stat.test <- stat.test %>%
add_xy_position(x = "supp", fun = "mean_sd", dodge = 0.8)
bp +
stat_pvalue_manual(
stat.test, label = "p.adj.signif", tip.length = 0.01
) +
scale_y_continuous(expand = expansion(mult = c(0.01, 0.1)))

plot of chunk unnamed-chunk-30

version

1
sessionInfo()
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rstatix_0.6.0 ggpubr_0.4.0 ggplot2_3.3.2
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 xfun_0.19 purrr_0.3.4 haven_2.3.1 carData_3.0-4
## [6] colorspace_1.4-1 vctrs_0.3.4 generics_0.1.0 htmltools_0.5.0 yaml_2.2.1
## [11] rlang_0.4.8 pillar_1.4.6 foreign_0.8-80 glue_1.4.2 withr_2.3.0
## [16] readxl_1.3.1 lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 ggsignif_0.6.0
## [21] gtable_0.3.0 cellranger_1.1.0 ggsci_2.9 zip_2.1.1 evaluate_0.14
## [26] labeling_0.4.2 knitr_1.30 rio_0.5.16 forcats_0.5.0 curl_4.3
## [31] highr_0.8 broom_0.7.2 Rcpp_1.0.5 backports_1.2.0 scales_1.1.1
## [36] jsonlite_1.7.1 abind_1.4-5 farver_2.0.3 hms_0.5.3 digest_0.6.27
## [41] stringi_1.5.3 openxlsx_4.2.3 dplyr_1.0.2 grid_4.0.2 tools_4.0.2
## [46] magrittr_1.5 tibble_3.0.4 crayon_1.3.4 tidyr_1.1.2 car_3.0-10
## [51] pkgconfig_2.0.3 ellipsis_0.3.1 data.table_1.13.2 rmarkdown_2.5 R6_2.5.0
## [56] compiler_4.0.2
1
getwd()
1
## [1] "E:/02GDL/Blog/ShuilinLiao.github.io/source/_posts"

更多相关问题,请参考 ggpubr FAQ

参考链接

HOW TO ADD P-VALUES TO GGPLOT FACETS

参考文章如引起任何侵权问题,可以与我联系,谢谢。

-------- 本文结束 感谢阅读 --------
# 添加内容