Making a quick data visualization web-app with Shiny

Lately we’ve been getting concerned about our PHP error log. You know the story: errors start popping up, but they’re not causing problems in production, so you move on with your busy life. But you know in your heart of hearts that you should really be fixing the error.

The time has come for us to prune those errors, and I thought the first step should be, as always, to look at the data. Since it’s really the PHP developers who will know what to do with it, I thought it might be useful to make my analysis interactive. Enter Shiny: a web app framework that lets users interact directly with your data.

The first step was to massage my log data into a CSV that looks like this:

"date","error.id","error.count","access.count"
"2013-06-04","inc/foo/mario/journey.php:700",5,308733
"2013-06-04","inc/foo/mario/xenu.php:498",1,308733
"2013-06-04","inc/bar/mario/larp.php:363",14,308733
"2013-06-04","inc/nico.php:1859",3,308733
"2013-06-04","inc/spoot/heehaw.php:728",5,308733
"2013-06-04","inc/spoot/heehaw.php:735",5,308733
"2013-06-04","inc/spoot/heehaw.php:736",5,308733
"2013-06-04","inc/spoot/heehaw.php:737",5,308733
"2013-06-04","inc/spoot/heehaw.php:739",5,308733

For each date, error.id indicates the file and line on which the error occurred, error.count is how many times that error occurred on that date, and access.count is the total number of hits our app received on that date. With me so far?

Now I install Shiny (sure, this is artifice — I already had Shiny installed — but let’s pretend) at the R console like so:

install.packages('devtools')
library(devtools)
install_github('shiny', 'rstudio')
library(shiny)

And from the shell, I start a project:

mkdir portalserr
cd portalserr
cp /tmp/portalserr.csv .

Defining the UI

Now I write my app. I know what I want it to look like, so I’ll start with ui.R. Going through that bit by bit:

shinyUI(pageWithSidebar(
  headerPanel("PHP errors by time"),

I’m telling Shiny how to lay out my UI. I want a sidebar with form controls, and a header that describes the app.

  sidebarPanel(
    checkboxGroupInput("errors_shown", "Most common errors:", c(
      "davidbowie.php:50"="lib/exosite/robot/davidbowie.php:50",
      "heehaw.php:728"="inc/spoot/heehaw.php:728",
      …
      "llamas-10.php:84"="inc/widgets/llamas-10.php:84"
    )
  )),

Now we put a bunch of checkboxes on my sidebar. The first argument to checkboxGroupInput() gives the checkbox group a name. This is how server.R will refer to the checkbox contents. You’ll see.

The second argument is a label for the form control, and the last argument is a list (in non-R parlance an associative array or a hash) defining the checkboxes themselves. The keys (like davidbowie.php:50) will be the labels visible in the browser, and the values are the strings that server.R will receive when the corresponding box is checked.

  mainPanel(
    plotOutput("freqPlot")
  )

We’re finished describing the sidebar, so now we describe the main section of the page. It will contain only one thing: a plot called “freqPlot”.

And that’s it for the UI! But it needs something to talk to.

Defining the server

The server goes in — surprise — server.R. Let’s walk through that.

logfreq <- read.csv('portalserr.csv')
logfreq$date <- as.POSIXct(logfreq$date)
logfreq$perthou <- logfreq$error.count / logfreq$access.count * 10^3

We load the CSV into a data frame called logfreq and translate all the strings in the date column into POSIXct objects so that they’ll plot right.

Then we generate the perthou column, which contains the number of occurrences of a given error on a given day, per thousand requests that occurred that day.

shinyServer(function(input, output) {
  output$freqPlot

Okay now we start to see the magic that makes Shiny so easy to use: reactivity. We start declaring the server application with shinyServer(), which we pass a callback. That callback will be passed the input and output parameters.

input is a data frame containing the values of all the inputs we defined in ui.R. Whenever the user messes with those checkboxes, the reactive blocks (what does that mean? I’ll tell you in a bit) of our callback will be re-run, and the names of any checked boxes will be in input$errors_shown.

Similarly, output is where you put the stuff you want to send back to the UI, like freqPlot.

But the coolest part of this excerpt is the last bit: renderPlot({. That curly-bracket there means that what follows is an expression: a literal block of R code that can be evaluated later. Shiny uses expressions in a very clever way: it determines which expressions depend on which input elements, and when the user messes with inputs Shiny reevaluates only the expressions that depend on the inputs that were touched! That way, if you have a complicated analysis that can be broken down into independent subroutines, you don’t have to re-run the whole thing every time a single parameter changes.

     lf.filtered <- subset(logfreq, error.id %in% input$errors_shown)

      p <- ggplot(lf.filtered) +
        geom_point(aes(date, perthou, color=error.id), size=3) +
        geom_line(aes(date, perthou, color=error.id, group=error.id), size=2) +
        expand_limits(ymin=0) +
        theme(legend.position='left') +
        ggtitle('Errors per thousand requests') +
        ylab('Errors per thousand requests') +
        xlab('Date')
      print(p)

This logic will be reevaluated every time our checkboxes are touched. It filters the logfreq data frame down to just the errors whose boxes are checked, then makes a plot with ggplot2 and sends it to the UI.

And we’re done.

Running it

From the R console, we do this:

> runApp('/path/to/portalserr')

Listening on port 3087

This automatically opens up http://localhost:3087 in a browser and presents us with our shiny new… uh… Shiny app:

Why don’t we do it in production?

Running Shiny apps straight from the R console is fine for sharing them around the office, but if you need a more robust production environment for Shiny apps (e.g. if you want to share them with the whole company or with the public), you’ll probably want to use shiny-server. If you’re putting your app behind an SSL-enabled proxy server, use the latest HEAD from Github since it contains this change.

Go forth and visualize!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s