Introduction

This document showcases a completely reproducible particulate matter analysis. Starting from the used measurement devices, via software for data hosting, to the analysis and visualisation environment. It stands on the shoulders of communities who provide free and open resources in the spirit of Open Science:

The actual analysis is based on the opensensmapR vignette osem-intro. The code and it’s environment are published, documented, and packaged to support reproducible research.

The code repository is https://github.com/nuest/sensebox-binder and the git version hash is 5b174648761ec1cfcc9fca74b80413967a9c2d16. The repository can be opened interactively at http://mybinder.org/v2/gh/nuest/sensebox-binder/master. The code (this document as either R Markdown or Jupyter Notebook) and environment (a Docker image) are archived with a DOI: 10.5281/zenodo.1135140.

Analysis

In the remainder of this file, code “chunks” and text are interspersed to provide a transparent and understandable workflow.

The analysis of takes a look at fine particulate matter measured in Germany at New Year’s Eve 2018.

Note: The data is included in the archive as a backup in JSON format. This document by default can only be compiled as long as the openSenseMap API is available. To use local backup data, set the variable online to FALSE in the second code chunk.

Load required libraries

library("opensensmapr")
library("dplyr")
library("lubridate")
library("units")
library("sf")
library("leaflet")
library("readr")
library("jsonlite")
library("here")
library("maps")

[output hidden]

Load data on senseBoxes

online <- TRUE # access online API or use local data backup?

analysis_date <- lubridate::as_datetime("2018-01-01 00:00:00")

if(online) {
  # retrieve data from openSenseMap API
  all_boxes <- osem_boxes()
  pm25_boxes <- osem_boxes(
    exposure = 'outdoor',
    date = analysis_date, # ±4 hours
    phenomenon = 'PM2.5'
  )
  
  # update local data
  all_json <- toJSON(all_boxes, digits = NA, pretty = TRUE)
  write(all_json, file = here("data/all_boxes.json"))
  pm25_json <- toJSON(pm25_boxes, digits = NA, pretty = TRUE)
  write(pm25_json, file = here("data/pm25_boxes.json"))
} else {
  # load data from file and fix column types
  all_boxes_file <- fromJSON(here("data/all_boxes.json"))
  all_boxes <- type_convert(all_boxes_file,
                                  col_types = cols(
                                    exposure = col_factor(levels = NULL),
                                    model = col_factor(levels = NULL),
                                    grouptag = col_factor(levels = NULL)))
  class(all_boxes) <- c("sensebox", class(all_boxes))
  pm25_boxes_file <- fromJSON(here("data/pm25_boxes.json"))
  pm25_boxes <- type_convert(pm25_boxes_file,
                                  col_types = cols(
                                    exposure = col_factor(levels = NULL),
                                    model = col_factor(levels = NULL),
                                    grouptag = col_factor(levels = NULL)))
  class(pm25_boxes) <- c("sensebox", class(pm25_boxes))
}

knitr::kable(data.frame(nrow(all_boxes), nrow(pm25_boxes)),
             col.names = c(
               "# senseBoxes",
               paste("# senseBoxes with PM2.5 measurements around", format(analysis_date, "%Y-%m-%d %T %Z"))))
# senseBoxes # senseBoxes with PM2.5 measurements around 2018-01-01 00:00:00 UTC
1114 417

Exploring openSenseMap

The openSenseMap currently provides access to 1114 senseBoxes of which 417 provide measurements of PM2.5 around 2018-01-01 00:00:00 UTC.

The following map shows the PM2.5 sensor locations, which are mostly deployed in central Europe.

plot(pm25_boxes)

Particulates at New Year’s Eve in Münster

How many senseBoxes in Münster measure PM2.5?

ms <- st_sfc(st_point(c(7.62571, 51.96236)))
st_crs(ms) <- 4326

pm25_boxes_sf <- st_as_sf(pm25_boxes, remove = FALSE, agr = "identity")
names(pm25_boxes_sf) <- c(names(pm25_boxes), "geometry")

pm25_boxes_sf <- cbind(pm25_boxes_sf, dist_to_ms = st_distance(ms, pm25_boxes_sf))
max_dist <- set_units(7, km) # km from city center

ms_boxes <- pm25_boxes_sf[pm25_boxes_sf$dist_to_ms < max_dist,c("X_id", "name")]
ms_boxes
## Simple feature collection with 18 features and 2 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 7.586553 ymin: 51.903 xmax: 7.684194 ymax: 52.00165
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
##                         X_id                    name
## 1   570bad2b45fd40c8197f13a2     Balkon Gasselstiege
## 2   5750220bed08f9680c6b4154 BalkonBox Mindener Str.
## 3   57585de44bab38000e49fa97       senseBox_wifi_Yun
## 8   58932c89d038b000102f7d35       KiTa Glühwürmchen
## 78  591f578c51d34600116a8ea5     Wetterstation Erpho
## 108 593c20216ccf3b00117ca2d6             stuc_sensor
## 223 599180be7e280a001044b837           C1024_balcony
## 235 599c9b80d67eb500118c6b5e                 wb132ms
## 255 59ad958fd67eb50011b85f6d               Kortumweg
## 273 59c3b5e1d67eb50011335cee         Gruenes_Zentrum
## 274 59c67b5ed67eb50011666dbb          kartoffelsalat
## 334 5a0587159fd3c200118eaaf6 Münster nachhaltig e.V.
## 345 5a0c15289fd3c200110f3d33 Wetterstation Hansaring
## 346 5a0c2cc89fd3c200111118f0      Umwelthaus Münster
## 347 5a0c30489fd3c20011115fb7             Nordschänke
## 348 5a0c347b9fd3c2001111b701            VennheideBox
## 382 5a2e6b9037b73700105ea907          Gefängnisblick
## 384 5a30ea5375a96c000f012fe0             Münster Süd
##                           geometry
## 1       POINT (7.607807 51.974581)
## 2   POINT (7.6511693559587 51.9...
## 3       POINT (7.586553 51.976527)
## 8       POINT (7.614025 52.001647)
## 78       POINT (7.645218 51.96422)
## 108     POINT (7.620859 51.905181)
## 223     POINT (7.684194 51.929339)
## 235     POINT (7.650543 51.954793)
## 255     POINT (7.635283 51.903004)
## 273     POINT (7.633289 51.956907)
## 274      POINT (7.62677 51.946322)
## 334     POINT (7.624276 51.955196)
## 345     POINT (7.641463 51.953351)
## 346     POINT (7.641426 51.960435)
## 347     POINT (7.631443 51.970725)
## 348     POINT (7.620665 51.921125)
## 382     POINT (7.633041 51.968728)
## 384     POINT (7.628977 51.951978)

Where are the sensors in Münster? [Does not work in Jupyter Notebook]

sense_icon <- awesomeIcons(
  icon = 'cube',
  iconColor = '#ffffff',
  library = 'fa',
  markerColor = 'green'
)

leaflet() %>% 
  addTiles() %>%
  addAwesomeMarkers(data = ms_boxes,
             popup = ~paste0("<b>Name:</b> ", name, "<br><b>Id:</b> ", 
                             "<a href='https://opensensemap.org/explore/", X_id, "' ",
                             "target='_blank'>", X_id, "</a>"),
             label = ~name,
             icon = sense_icon)

Now we retrieve data for 18 senseBoxes with values in the area of interest.

if(online) {
  class(ms_boxes) <- c("sensebox", class(ms_boxes))
  ms_data <- osem_measurements(ms_boxes, phenomenon = "PM2.5",
                             from = lubridate::as_datetime("2017-12-31 20:00:00"),
                             to = lubridate::as_datetime("2018-01-01 04:00:00"),
                             columns = c("value", "createdAt", "lat", "lon", "boxId",
                                         "boxName", "exposure", "sensorId",
                                         "phenomenon", "unit", "sensorType"))
  # update local data
  data_json <- toJSON(ms_data, digits = NA, pretty = TRUE)
  write(data_json, file = here("data/ms_data.json"))
} else {
  # load data from file and fix column types
  ms_data_file <- fromJSON(here("data/ms_data.json"))
  ms_data <- type_convert(ms_data_file,
                                  col_types = cols(
                                    sensorId = col_factor(levels = NULL),
                                    unit = col_factor(levels = NULL)))
  class(ms_data) <- c("sensebox", class(ms_data))
}
## Warning in if (!is.na(params$columns)) query$columns = paste(params
## $columns, : the condition has length > 1 and only the first element will be
## used
summary(ms_data %>% 
                select(value,sensorId,unit))
##      value                            sensorId        unit     
##  Min.   :  0.60   59458624a4ad590011186665: 815   µg/m^3:  48  
##  1st Qu.:  3.80   5a0d58d69fd3c2001129024f: 481   µg/m³ :6907  
##  Median :  6.00   5a0c15289fd3c200110f3d34: 480                
##  Mean   : 18.94   5a0c30489fd3c20011115fb8: 480                
##  3rd Qu.: 11.30   5a0c347b9fd3c2001111b702: 480                
##  Max.   :999.90   5a30ea5375a96c000f012fe1: 480                
##                   (Other)                 :3739

We can now plot 6955 measurements.

plot(value~createdAt, ms_data, 
     type = "p", pch = '*', cex = 2, # new year's style
     col = factor(ms_data$sensorId), 
     xlab = NA, 
     ylab = unique(ms_data$unit),
     main = "Particulates measurements (PM2.5) on New Year 2017/2018",
     sub = paste(nrow(ms_boxes), "stations in Münster, Germany\n",
                 "Data by openSenseMap.org licensed under",
                 "Public Domain Dedication and License 1.0"))

You can see, it was a very “particular” celebration.

Who are the record holders?

top_measurements <- ms_data %>%
  arrange(desc(value))
top_boxes <- top_measurements %>%
               distinct(sensorId, .keep_all = TRUE)
knitr::kable(x = top_boxes %>%
               select(value, createdAt, boxName) %>%
               head(n = 3),
             caption = "Top 3 boxes")
Top 3 boxes
value createdAt boxName
999.9 2017-12-31 23:15:06 Wetterstation Hansaring
999.9 2017-12-31 23:07:48 Nordschänke
815.0 2017-12-31 23:18:45 Gefängnisblick

Note: The timestamp is UTC and the local time is CET (UTC+1:00). The value 999.9 is the maximum value measured by the used sensor SDS011.

knitr::kable(top_boxes %>% filter(value == max(top_boxes$value)) %>%
               select(sensorId, boxName),
             col.names = c("Top sensor identifier", "Top box name"))
Top sensor identifier Top box name
5a0c15289fd3c200110f3d34 Wetterstation Hansaring
5a0c30489fd3c20011115fb8 Nordschänke

Congratulations (?) to boxes for holding the record values just after the new year started.

Where are the record holding boxes?

Static plot

top_boxes_sf <- top_boxes %>% 
  filter(value == max(top_boxes$value)) %>%
  st_as_sf(coords = c('lon', 'lat'), crs = 4326)
bbox <- sf::st_bbox(top_boxes_sf)

world <- map("world", plot = FALSE, fill = TRUE) %>%
  sf::st_as_sf() %>%
  sf::st_geometry()

plot(world,
     xlim = round(bbox[c(1,3)], digits = 1),
     ylim = round(bbox[c(2,4)], digits = 1),
     axes = TRUE, las = 1)
plot(top_boxes_sf, add = TRUE, col = "red", cex = 2)
title("senseBox stations in Münster with highest PM2.5 measurements")

Interactive map

fireworks_icon <- makeIcon(
  # icon source: https://commons.wikimedia.org/wiki/File:Fireworks_2.png
  iconUrl = "320px-Fireworks_2.png", iconWidth = 160)

leaflet(data = top_boxes_sf) %>% 
  addTiles() %>%
  addMarkers(popup = ~as.character(boxName),
             label = ~as.character(boxName),
             icon = fireworks_icon)

Jupyter Notebook conversion

A converted version of this file can in Jupyter Notebook format is automatically created with each rendering using ipyrmd. ipyrmd is installed with other dependencies in the file install.R. The Jupyter Notebook is intended to increase accessability for users unfamiliar with R Markdown. The automatic conversion does not handle code statements within sentences.

ipyrmd --to ipynb --from Rmd -y -o sensebox-analysis.ipynb sensebox-analysis.Rmd
## Converting (Rmd->ipynb) "sensebox-analysis.Rmd" to "sensebox-analysis.ipynb"
## Inline R code detected - treated as text
## Inline R code detected - treated as text
## Inline R code detected - treated as text
## Inline R code detected - treated as text
## Inline R code detected - treated as text

Conclusion

This document creates a reproducible workflow of open data from a public API. It leverages software to create a transparent analysis, which can be easily opened, investigated, and even developed further with a web browser by opening the public code repository on a free cloud platform. To increase reproducibility, the data is cached manually as CSV files (i.e. text-based data format) and stored next to the analysis file. A use may adjust this workflow to her own needs, like different location or time period, by adjust the R code and deleting the data files. In case the exploration platform ceases to exist, users may still recreate the environment themselves based on the files in the code repository. A snapshot of the files from the code repository, i.e. data, code, and runtime environment (as a Docker image) are stored in a reliable data repository. While the manual workflow of building the image and running it is very likely to work in the future, the archived image captures the exact version of the software the original author used.

The presented solution might seem complex. But it caters to many different levels of expertise (one-click open in browser vs. self-building of images and local inspection) and has several fail-safes (binder may disappear, GitHub repository may be lost, Docker may stop working). The additional work is much outweighed by the advantages in transparency and openness.

License

Creative Commons License

This document is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0).

Metadata

devtools::session_info()
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.2 (2017-09-28)
##  system   x86_64, linux-gnu           
##  ui       unknown                     
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       UTC                         
##  date     2018-01-10
## Packages -----------------------------------------------------------------
##  package      * version date       source                             
##  assertthat     0.2.0   2017-04-11 CRAN (R 3.4.2)                     
##  backports      1.1.1   2017-09-25 CRAN (R 3.4.2)                     
##  base         * 3.4.2   2017-11-01 local                              
##  bindr          0.1     2016-11-13 CRAN (R 3.4.2)                     
##  bindrcpp     * 0.2     2017-06-17 CRAN (R 3.4.2)                     
##  compiler       3.4.2   2017-11-01 local                              
##  crosstalk      1.0.0   2016-12-21 CRAN (R 3.4.2)                     
##  curl           3.0     2017-10-06 CRAN (R 3.4.2)                     
##  DBI            0.7     2017-06-18 CRAN (R 3.4.2)                     
##  devtools       1.13.3  2017-08-02 CRAN (R 3.4.2)                     
##  digest         0.6.12  2017-01-27 CRAN (R 3.4.2)                     
##  dplyr        * 0.7.4   2017-09-28 CRAN (R 3.4.2)                     
##  evaluate       0.10.1  2017-06-24 CRAN (R 3.4.2)                     
##  foreign        0.8-69  2017-06-21 CRAN (R 3.4.2)                     
##  geosphere      1.5-5   2016-06-15 CRAN (R 3.4.2)                     
##  glue           1.2.0   2017-10-29 CRAN (R 3.4.2)                     
##  graphics     * 3.4.2   2017-11-01 local                              
##  grDevices      3.4.2   2017-11-01 local                              
##  grid           3.4.2   2017-11-01 local                              
##  here         * 0.1     2017-05-28 CRAN (R 3.4.2)                     
##  highr          0.6     2016-05-09 CRAN (R 3.4.2)                     
##  hms            0.3     2016-11-22 CRAN (R 3.4.2)                     
##  htmltools      0.3.6   2017-04-28 CRAN (R 3.4.2)                     
##  htmlwidgets    0.9     2017-07-10 CRAN (R 3.4.2)                     
##  httpuv         1.3.5   2017-07-04 CRAN (R 3.4.2)                     
##  httr           1.3.1   2017-08-20 CRAN (R 3.4.2)                     
##  jsonlite     * 1.5     2017-06-01 CRAN (R 3.4.2)                     
##  knitr          1.17    2017-08-10 CRAN (R 3.4.2)                     
##  lattice        0.20-35 2017-03-25 CRAN (R 3.4.2)                     
##  leaflet      * 1.1.0   2017-02-21 CRAN (R 3.4.2)                     
##  lubridate    * 1.7.0   2017-10-29 CRAN (R 3.4.2)                     
##  magrittr       1.5     2014-11-22 CRAN (R 3.4.2)                     
##  maps         * 3.2.0   2017-06-08 CRAN (R 3.4.2)                     
##  maptools       0.9-2   2017-03-25 CRAN (R 3.4.2)                     
##  markdown       0.8     2017-04-20 CRAN (R 3.4.2)                     
##  memoise        1.1.0   2017-04-21 CRAN (R 3.4.2)                     
##  methods      * 3.4.2   2017-11-01 local                              
##  mime           0.5     2016-07-07 CRAN (R 3.4.2)                     
##  opensensmapr * 0.3.1   2018-01-10 Github (noerw/opensensmapr@416d986)
##  pkgconfig      2.0.1   2017-03-21 CRAN (R 3.4.2)                     
##  R6             2.2.2   2017-06-17 CRAN (R 3.4.2)                     
##  Rcpp           0.12.13 2017-09-28 CRAN (R 3.4.2)                     
##  readr        * 1.1.1   2017-05-16 CRAN (R 3.4.2)                     
##  rgdal          1.2-15  2017-10-30 CRAN (R 3.4.2)                     
##  rgeos          0.3-25  2017-09-25 CRAN (R 3.4.2)                     
##  rlang          0.1.2   2017-08-09 CRAN (R 3.4.2)                     
##  rmarkdown      1.6     2017-06-15 CRAN (R 3.4.2)                     
##  rprojroot      1.2     2017-01-16 CRAN (R 3.4.2)                     
##  sf           * 0.5-4   2017-08-28 CRAN (R 3.4.2)                     
##  shiny          1.0.5   2017-08-23 CRAN (R 3.4.2)                     
##  sp             1.2-5   2017-06-29 CRAN (R 3.4.2)                     
##  stats          3.4.2   2017-11-01 local                              
##  stringi        1.1.5   2017-04-07 CRAN (R 3.4.2)                     
##  stringr        1.2.0   2017-02-18 CRAN (R 3.4.2)                     
##  tibble         1.3.4   2017-08-22 CRAN (R 3.4.2)                     
##  tools          3.4.2   2017-11-01 local                              
##  udunits2       0.13    2016-11-17 CRAN (R 3.4.2)                     
##  units        * 0.4-6   2017-08-27 CRAN (R 3.4.2)                     
##  utils        * 3.4.2   2017-11-01 local                              
##  withr          2.0.0   2017-07-28 CRAN (R 3.4.2)                     
##  xtable         1.8-2   2016-02-05 CRAN (R 3.4.2)                     
##  yaml           2.1.14  2016-11-12 CRAN (R 3.4.2)