Sometimes you just need a little nudge before delving into something new, and sometimes that little nudge is a RStudio competition. Last year I finally gave Shiny ago after seeing RStudio announce the first ever Shiny contest - and things worked out pretty well! So here I am again to try and repeat the trick, but this time - it’s tables!
In this post I’ll be using the gt package to render a table in HTML. There are several packages in R for creating tables, but I was drawn to gt
(which is short for the grammar of tables) because of my love for ggplot2
(the grammar of graphics) and their shared philosophy of building output layer by layer, with the flexibility to customise the output.
For the full code to reproduce the below, go to this repo on Github: https://github.com/committedtotape/education-spending.
For a great introduction to the basics of gt
, look no further than Thomas Mock’s blogpost, and check out his other posts on building tables in R while you’re at it! I’ll skip over some of the core building blocks that Thomas has covered, as I want to mainly focus on ways to customise a table to give it that little bit extra!
The data comes from the wonderful TidyTuesday project, and it contains details on the public spending on children from 1997 to 2016 in the US. The data was compiled from Urban Institute by Joshua Rosenberg:
library(tidyverse)
library(gt)
kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')
I am just going to focus on Elementary and Secondary schools spending (see the codebook for descriptions of the types of spending included in data), and I’ll compare across states and years using the inflation adjusted per child figure:
school_spend <- kids %>%
filter(variable == "PK12ed") %>%
select(state, year, inf_adj_perchild)
school_spend
## # A tibble: 1,020 x 3
## state year inf_adj_perchild
## <chr> <dbl> <dbl>
## 1 Alabama 1997 3.93
## 2 Alaska 1997 7.55
## 3 Arizona 1997 3.71
## 4 Arkansas 1997 3.89
## 5 California 1997 4.28
## 6 Colorado 1997 4.38
## 7 Connecticut 1997 6.70
## 8 Delaware 1997 5.63
## 9 District of Columbia 1997 6.11
## 10 Florida 1997 4.45
## # … with 1,010 more rows
As we are dealing with data on education spending I thought it would be fun to make my final output look like a table written up on a school chalkboard. I love customising ggplot2
graphs in this sort of way, so why not try something similar with my first foray into gt
.
There are some things I need to do before building the table. I just can’t resist using ggplot2
, so I will be embedding some plots into the table. Firstly, I want some sparklines to illustrate how spending has changed over time. Here I create a function that will do just that:
# function for plotting spark lines
plot_spark <- function(name, df) {
plot_object <-
ggplot(data = df,
aes(x = year, y = inf_adj_perchild)) +
geom_line(colour = "#F3FCF0", size = 12) +
theme_void() +
theme(plot.background = element_rect(fill = "#47745A", colour = "#47745A"),
plot.margin = margin(0,0,0,0))
return(plot_object)
}
I want a nice, clean sparkline plot so I have removed all non-data ink with theme_void
. I am also setting the plot background colour to match the colour of my final table (as you’ll see later).
My table will also display the spending figure at the start and end of the period (1997 and 2016). As an added visual reference I am going to map these amounts to the height of bars, but give them a sketchy, hand-drawn look to align with my chalkboard aesthetic. This took much fiddling about, but a combination of runif
, jitter
and geom_curve
gave me the desired effect. I was really just trying to replicate the look of ggrough, which I tried out but couldn’t get to render within my gt
table:
# function for sketchy barplot
plot_bar <- function(name, df, yr) {
barheight <- df %>%
filter(year == yr) %>%
pull(inf_adj_perchild)
plot_object <-
tibble( x = runif(8*barheight, -0.01, 0.01), xend = runif(8*barheight, 0.99, 1.01),
y = runif(8*barheight, 0.1, barheight), yend = y + jitter(0.05),
y1 = yend, yend1 = y - jitter(1), x1 = xend, xend1 = x) %>%
ggplot() +
geom_curve(aes(x = x, xend = xend, y = y, yend = yend),
colour = "white", size = 2, curvature = 0.01, alpha = 0.7) +
geom_curve(aes(x = x1, xend = xend1, y = y1, yend = yend1),
colour = "white", size = 2, curvature = -0.01, alpha = 0.7) +
scale_y_continuous(limits = c(-0.2,20)) +
scale_x_continuous(limits = c(-0.2,1.2)) +
theme_void() +
theme(plot.background = element_rect(fill = "#47745A",
colour = "#47745A"),
plot.margin = margin(0,0,0,0))
return(plot_object)
}
I’ve again used theme_void
to keep the plots clutter-free.
I want the plots to appear for each state in the dataset. To do this I group my data by state and then nest
to get nested tables of data for each state within my dataframe. I can then map this nested data for each state through my plotting functions. For the bar charts I am creating a 1997 and a 2016 version:
# df of plots for each state
plots <-
school_spend %>%
dplyr::group_by(state) %>%
tidyr::nest() %>%
dplyr::mutate(
spark = map2(state, data, plot_spark),
bar97 = map2(state, data, plot_bar, yr = 1997),
bar16 = map2(state, data, plot_bar, yr = 2016)) %>%
select(-data)
plots
## # A tibble: 51 x 4
## # Groups: state [51]
## state spark bar97 bar16
## <chr> <list> <list> <list>
## 1 Alabama <gg> <gg> <gg>
## 2 Alaska <gg> <gg> <gg>
## 3 Arizona <gg> <gg> <gg>
## 4 Arkansas <gg> <gg> <gg>
## 5 California <gg> <gg> <gg>
## 6 Colorado <gg> <gg> <gg>
## 7 Connecticut <gg> <gg> <gg>
## 8 Delaware <gg> <gg> <gg>
## 9 District of Columbia <gg> <gg> <gg>
## 10 Florida <gg> <gg> <gg>
## # … with 41 more rows
Now, to get the data in the format I desire, I will filter to just the 2 years I want for each state, and then use pivot_wider
to turn these 2 rows for each state into 2 columns. I then also inner_join
the plotting information from previous code chunk. Lastly, I create 3 empty variables to serve as placeholders for the plots used in the gt
build:
table_prepped <- school_spend %>%
filter(year %in% c(1997, 2016)) %>%
pivot_wider(names_from = year, values_from = inf_adj_perchild) %>%
mutate(percent_change = (`2016` - `1997`) / `1997`) %>%
inner_join(plots, by = "state") %>%
mutate(ggplot1 = NA,
ggplot2 = NA,
ggplot3 = NA) %>%
select(state, `1997`, ggplot1, `2016`, ggplot2, percent_change, ggplot3, bar97, bar16, spark)
table_prepped
## # A tibble: 51 x 10
## state `1997` ggplot1 `2016` ggplot2 percent_change ggplot3 bar97 bar16 spark
## <chr> <dbl> <lgl> <dbl> <lgl> <dbl> <lgl> <lis> <lis> <lis>
## 1 Alaba… 3.93 NA 5.67 NA 0.443 NA <gg> <gg> <gg>
## 2 Alaska 7.55 NA 10.6 NA 0.408 NA <gg> <gg> <gg>
## 3 Arizo… 3.71 NA 4.15 NA 0.118 NA <gg> <gg> <gg>
## 4 Arkan… 3.89 NA 5.93 NA 0.523 NA <gg> <gg> <gg>
## 5 Calif… 4.28 NA 7.17 NA 0.673 NA <gg> <gg> <gg>
## 6 Color… 4.38 NA 6.06 NA 0.384 NA <gg> <gg> <gg>
## 7 Conne… 6.70 NA 10.8 NA 0.614 NA <gg> <gg> <gg>
## 8 Delaw… 5.63 NA 8.44 NA 0.500 NA <gg> <gg> <gg>
## 9 Distr… 6.11 NA 17.7 NA 1.89 NA <gg> <gg> <gg>
## 10 Flori… 4.45 NA 5.50 NA 0.236 NA <gg> <gg> <gg>
## # … with 41 more rows
A massive thanks to CerebralMastication and the commenters on this github issue for helping me get to a working solution on this - it had caused me quite the headache! Pretty much all the code for getting the inline plots was stolen from there.
Finally, I’ll truncate to just the top 10 spenders (based on 2016 figure). This is pretty arbitrary, but I wanted to focus more on the aesthetics of the table, so sacrificed some of the data:
top10 <- table_prepped %>%
arrange(-`2016`) %>%
head(10)
top10
## # A tibble: 10 x 10
## state `1997` ggplot1 `2016` ggplot2 percent_change ggplot3 bar97 bar16 spark
## <chr> <dbl> <lgl> <dbl> <lgl> <dbl> <lgl> <lis> <lis> <lis>
## 1 Distr… 6.11 NA 17.7 NA 1.89 NA <gg> <gg> <gg>
## 2 New Y… 7.05 NA 12.8 NA 0.811 NA <gg> <gg> <gg>
## 3 Vermo… 5.63 NA 12.0 NA 1.14 NA <gg> <gg> <gg>
## 4 New J… 7.22 NA 11.6 NA 0.605 NA <gg> <gg> <gg>
## 5 Conne… 6.70 NA 10.8 NA 0.614 NA <gg> <gg> <gg>
## 6 Alaska 7.55 NA 10.6 NA 0.408 NA <gg> <gg> <gg>
## 7 Wyomi… 5.81 NA 10.6 NA 0.823 NA <gg> <gg> <gg>
## 8 Rhode… 6.06 NA 10.2 NA 0.683 NA <gg> <gg> <gg>
## 9 New H… 5.06 NA 9.82 NA 0.941 NA <gg> <gg> <gg>
## 10 Massa… 5.52 NA 9.49 NA 0.720 NA <gg> <gg> <gg>
I’ll start with the basis of my table. Passing my data to the gt()
function will create a table, but I’m just taking it a little bit further here by adding some formatting, aligning columns and labelling columns. I won’t dwell on these steps as it is covered excellently by Thomas Mock in the aforementioned blog post:
gt_table_1 <- top10 %>%
gt() %>%
fmt_currency(
columns = vars(`1997`,`2016`),
decimals = 1,
pattern = "{x}k"
) %>%
fmt_percent(
columns = vars(percent_change),
decimals = 0
) %>%
cols_align(
align = "right",
columns = vars(`1997`,`2016`)
) %>%
cols_label(
state = "State",
percent_change = "Change",
ggplot1 = " ",
ggplot2 = " ",
ggplot3 = "Trend"
)
Let’s display what we have. I will need to perform some transformation on the plotting data (which I’ll do later), so for now I will hide these columns with the cols_hide
function:
gt_table_1 %>%
cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))
State | 1997 | 2016 | Change |
---|---|---|---|
District of Columbia | $6.1k | $17.7k | 189% |
New York | $7.0k | $12.8k | 81% |
Vermont | $5.6k | $12.0k | 114% |
New Jersey | $7.2k | $11.6k | 60% |
Connecticut | $6.7k | $10.8k | 61% |
Alaska | $7.5k | $10.6k | 41% |
Wyoming | $5.8k | $10.6k | 82% |
Rhode Island | $6.1k | $10.2k | 68% |
New Hampshire | $5.1k | $9.8k | 94% |
Massachusetts | $5.5k | $9.5k | 72% |
Before I add in the plots, let’s have some fun customising the table. I want to give the table a green background reminiscent of a chalkboard, whilst also adding a brown frame (too much? Probably). Getting the border turned out trickier than I thought. I’ve used a combination of the tab_style
function to change cell_borders
plus the tab_options
function. The state
variable is the left-most column so I am adding a brown border (#A36734) on the left of the body (data rows) and header (column labels) of the table, plus the title and subtitle cells (which I’ll add later). I’ll do something similar for the right-hand side of the table (which will be the 3rd ggplot). Within another tab_style
I also fill the body of the table with the green chalkboard colour (#47745A):
gt_table_2 <- gt_table_1 %>%
# left hand board frame
tab_style(
style = cell_borders(sides = "left",
color = "#A36734",
weight = px(20)
),
locations = list(cells_body(columns = vars(state)), cells_column_labels(columns = vars(state)),
cells_title("title"), cells_title("subtitle"))
) %>%
# right hand board frame
tab_style(
style = cell_borders(
sides = "right",
color = "#A36734",
weight = px(20)
),
locations = list(cells_body(columns = vars(ggplot3)),
cells_column_labels(columns = vars(ggplot3)),
cells_title("title"), cells_title("subtitle"))
) %>%
tab_style(
style = cell_fill(color = "#47745A"),
locations = cells_body(
columns = gt::everything())
) %>%
# table options
tab_options(table.background.color = "#A36734",
heading.background.color = "#47745A",
column_labels.background.color = "#47745A",
table.font.color = "#F3FCF0",
data_row.padding = px(10),
table.border.top.width = px(20),
table.border.top.color = "#A36734",
table.border.bottom.color = "#A36734",
source_notes.background.color = "#A36734"
)
gt_table_2 %>%
cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))
I’ve made further colour changes with the tab_options
function. In order to get the brown border around the whole table I ended up setting the entire background of the table (table.background.color
), along with the top and bottom borders, to the brown colour, but then applied the green colour to majority of the other elements.
State | 1997 | 2016 | Change |
---|---|---|---|
District of Columbia | $6.1k | $17.7k | 189% |
New York | $7.0k | $12.8k | 81% |
Vermont | $5.6k | $12.0k | 114% |
New Jersey | $7.2k | $11.6k | 60% |
Connecticut | $6.7k | $10.8k | 61% |
Alaska | $7.5k | $10.6k | 41% |
Wyoming | $5.8k | $10.6k | 82% |
Rhode Island | $6.1k | $10.2k | 68% |
New Hampshire | $5.1k | $9.8k | 94% |
Massachusetts | $5.5k | $9.5k | 72% |
The border doesn’t look right yet, as I am still hiding columns, including the right-most column.
Next let’s work on the text. I’m going to change the font styles throughout, and I am also going to add a title, subtitle and source note. I adjust the font, colour and alignment of the text in specific locations
using the cell_text
function within tab_style
.
I then add the detail of the source note (tab_source_note
), title and subtitle (tab_header
). The great thing about gt
is that as you’re rendering HTML you can style your text with HTML. Below I am making some further adjustments to the colour and size of some of the elements by writing HTML code within the html
function. This is great because it means you can also change style within elements (as I do to 2nd line of the subtitle):
gt_table_3 <- gt_table_2 %>%
# Adjust title font
tab_style(
style = cell_text(
font = "Chalkduster",
color = "#8DDBE0",
weight = "bold",
align = "center"),
locations = cells_title(groups = "title")
) %>%
# Adjust sub-title font
tab_style(
style = cell_text(
font = "Chalkduster",
align = "center"),
locations = cells_title(groups = "subtitle")
) %>%
# Style header font
tab_style(
style = cell_text(font = "Chalkduster",
weight = "bold",
color = "#FFD23F"),
locations = cells_column_labels(gt::everything())
) %>%
tab_style(
style = cell_text(font = "Chalkduster"),
locations = cells_body(columns = gt::everything())
) %>%
tab_source_note(html("<span style='color:#4E3423;font-size:12pt'><b>TABLE: committedtotape | DATA: Urban Institute</b></span>")) %>%
tab_header(
title = html("<br><span style='color:#F2AA99;font-size:20pt'>TOP OF THE CLASS</span><br>Public Spending on Elementary & Secondary Education"),
subtitle = html("Dollars spent per child (inflation adjusted) in the US By State, 1997-2016<br><br><span style='color:#F2AA99;'>Top 10 Spenders in 2016</span>")
) %>%
#borders
tab_style(
style = cell_borders(
sides = c("bottom", "top"),
color = "white",
weight = px(5)
),
locations = cells_column_labels(columns = gt::everything())
) %>%
tab_style(
style = cell_borders(
sides = "bottom",
color = "white",
weight = px(3)),
locations = cells_body()
)
gt_table_3 %>%
cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))
Finally in the above I adjust the internal borders of the table.
TOP OF THE CLASS Public Spending on Elementary & Secondary Education |
|||
---|---|---|---|
Dollars spent per child (inflation adjusted) in the US By State, 1997-2016 Top 10 Spenders in 2016 |
|||
State | 1997 | 2016 | Change |
District of Columbia | $6.1k | $17.7k | 189% |
New York | $7.0k | $12.8k | 81% |
Vermont | $5.6k | $12.0k | 114% |
New Jersey | $7.2k | $11.6k | 60% |
Connecticut | $6.7k | $10.8k | 61% |
Alaska | $7.5k | $10.6k | 41% |
Wyoming | $5.8k | $10.6k | 82% |
Rhode Island | $6.1k | $10.2k | 68% |
New Hampshire | $5.1k | $9.8k | 94% |
Massachusetts | $5.5k | $9.5k | 72% |
TABLE: committedtotape | DATA: Urban Institute |
Now let’s include the plots. The plotting data needs to be transformed using the text_transform
function. Each plotting column goes through its own text_transform
where it is mapped to the ggplot_image
function of gt
. You can adjust the height
and aspect_ratio
arguments of this function to get the desired dimensions.
The 3 sets of plots are put into the placeholder columns created earlier, and at the end we remove just the 3 orginal plotting columns from the table:
gt_table_4 <- gt_table_3 %>%
# rendering bar 1997
text_transform(
locations = cells_body(columns = vars(ggplot1)), # use empty cell as location
fn = function(x) {
# Insert each image into each empty cell in `ggplot`
map(top10$bar97, ggplot_image, height = px(50), aspect_ratio = 1.5)
}
) %>%
# rendering bar 2016
text_transform(
locations = cells_body(columns = vars(ggplot2)),
fn = function(x) {
map(top10$bar16, ggplot_image, height = px(50), aspect_ratio = 1.5)
}
) %>%
# rendering the spark lines
text_transform(
locations = cells_body(columns = vars(ggplot3)),
fn = function(x) {
map(top10$spark, ggplot_image, height = px(30), aspect_ratio = 4)
}
) %>%
cols_hide(vars(bar97, bar16, spark))
gt_table_4
This completes the table - let’s take a look:
TOP OF THE CLASS Public Spending on Elementary & Secondary Education |
||||||
---|---|---|---|---|---|---|
Dollars spent per child (inflation adjusted) in the US By State, 1997-2016 Top 10 Spenders in 2016 |
||||||
State | 1997 | 2016 | Change | Trend | ||
District of Columbia | $6.1k | $17.7k | 189% | |||
New York | $7.0k | $12.8k | 81% | |||
Vermont | $5.6k | $12.0k | 114% | |||
New Jersey | $7.2k | $11.6k | 60% | |||
Connecticut | $6.7k | $10.8k | 61% | |||
Alaska | $7.5k | $10.6k | 41% | |||
Wyoming | $5.8k | $10.6k | 82% | |||
Rhode Island | $6.1k | $10.2k | 68% | |||
New Hampshire | $5.1k | $9.8k | 94% | |||
Massachusetts | $5.5k | $9.5k | 72% | |||
TABLE: committedtotape | DATA: Urban Institute |
And there we have it! A potentially unnecessary exploration into how you can turn a default gt
table into something more at home in the school classroom. Probably something you’ll never need to do yourself, but hopefully this post has demonstrated some of the functionality within gt
that you can use to customise your tables and make them stand-out from the crowd.
As I said up top, I have entered this table into RStudio’s Table Competition for 2020. You can find my entry, along with all the others, on RStudio Community.
There’s still a lot about gt
that I want to learn and master, but this was a fun place to start - thanks for reading, I hope there was no falling asleep at the back!