28 Our World in Data
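28.1 Our World as Seen Through Data
This chapter walks through examples of downloading data and analyzing it.
An R package, owidR, also exists, but at the time of writing it has been removed from CRAN, and although the GitHub version can still be used for searching, it cannot download data. This chapter therefore introduces the simple approach of downloading directly from the site.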
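As a minimal sketch of the direct-download approach, the helper below builds a CSV address from a chart_id and reads it with base R. The `/grapher/<chart_id>.csv` address pattern and the helper names `owid_csv_url()` and `read_owid_csv()` are assumptions made for illustration; they are not part of owidR, and the pattern should be checked against the site before relying on it.

```r
# Build the CSV address for a chart.
# ASSUMPTION: the site serves chart data at /grapher/<chart_id>.csv
owid_csv_url <- function(chart_id) {
  paste0("https://ourworldindata.org/grapher/", chart_id, ".csv")
}

# Download one chart's data into a data frame (requires internet access).
# check.names = FALSE keeps column names that contain spaces unchanged.
read_owid_csv <- function(chart_id) {
  read.csv(owid_csv_url(chart_id), check.names = FALSE)
}

# Show the address that would be requested for one chart_id
owid_csv_url("share-of-individuals-using-the-internet")
```

With a network connection, `internet <- read_owid_csv("share-of-individuals-using-the-internet")` would then attempt the download; the column names in the result depend on what the site returns.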
28.2 About the R package
This section gives examples that use the R package owidR.
Install it if necessary.
install.packages("devtools")
devtools::install_github("piersyork/owidR")
The usage may have changed, so this section may need a complete revision.
28.3 Current status of the owidR package
2023.09.25
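It has been removed from CRAN.
It can still be installed from GitHub using devtools.
owid_search() appears to work.
Attempting to download data with owid() returns a Warning and an Error.
view_chart() appears to work.
owid_covid() appears to work.
The remaining functions cannot be used, because owid() cannot retrieve data.
The source code can be viewed at https://rdrr.io/cran/owidR/, so working out a fix from there seems to be the best way forward.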
28.5 References
The official package site contains further links. When you cite the package, use the link to the official site.
- Package Official Site: https://CRAN.R-project.org/package=owidR
- owidR: Import Data from Our World in Data
- Man page and source codes
In general, a README gives a short introduction to the package, a manual gives comprehensive descriptions of each function, and a vignette gives a practical introduction with examples and applications.
28.6 Introduction
This package acts as an interface to Our World in Data datasets, allowing for an easy way to search through data used in over 3,000 charts and load them into the R environment.
28.6.1 Setup
28.6.1.2 Load the package
library(owidR)
library(tidyverse)
#> ── Attaching core tidyverse packages ──── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.3 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The package automatically loads part of the tidyverse, e.g., dplyr, ggplot2, …. Since it works well with the tidyverse scheme, it is better to load tidyverse with it.
The creator of this package also suggests loading the packages plm for panel data and texreg for displaying models, but let us start without them until we actually use them. For panel data, see, for example, this site (https://www.aptech.com/blog/introduction-to-the-fundamentals-of-panel-data/).
28.6.2 Core functions
28.6.2.1 List of core functions
In this package, a chart is close to a dataset, and a chart_id is the identifier of that data.
- owid: Get a dataset used in an OWID chart
- owid_covid: Get the Our World in Data covid-19 dataset
- owid_search: Search the data sources used in OWID charts
- owid_source: A function to get source information from an OWID dataset and display it in the R console.
- pal_owid: Colour palettes based on the colours used by Our World in Data
- view_chart: A function that opens the original OWID chart in your browser
- world_map_data: Function that returns a simple feature collection of class sf. Map data is from naturalearthdata.com. Designed to be used internally.
28.6.2.2 Usage
28.6.2.2.1 owid_search
Search the data sources used in OWID charts
owid_search(term)
- Example
Since the output is long, I cut it off at the first six rows using head().
owid_search("emissions") %>% head()
#> titles
#> [1,] "Methane emissions from agriculture"
#> [2,] "Nitrous oxide emissions from agriculture"
#> [3,] "Per capita nitrous oxide emissions from agriculture"
#> [4,] "Air pollutant emissions"
#> [5,] "Emissions of air pollutants"
#> [6,] "Emissions of air pollutants"
#> chart_id
#> [1,] "methane-emissions-agriculture"
#> [2,] "nitrous-oxide-agriculture"
#> [3,] "per-capita-nitrous-oxide-agriculture"
#> [4,] "air-pollutant-emissions"
#> [5,] "emissions-of-air-pollutants"
#> [6,] "emissions-of-air-pollutants-oecd"
A matrix is returned. If the list is long, it is easier to see the pairs of titles and chart_ids by adding as_tibble().
owid_search("emissions") %>% as_tibble()
If the list is not long, you do not need to add as_tibble(). However, keep in mind that each title is paired with a chart_id, and that you need the chart_id, not the title, to download the data using owid().
owid_search("human rights") %>% as_tibble()
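The title and chart_id pairing can be sketched without calling the package. The matrix below is a hand-made stand-in with the same two columns as the owid_search() output shown above; it is not fetched from the site, and the object names `hits`, `hits_tbl`, and `methane_id` are hypothetical.

```r
library(tibble)
library(dplyr)

# Hand-made stand-in for the matrix returned by owid_search()
hits <- matrix(
  c("Methane emissions from agriculture", "methane-emissions-agriculture",
    "Air pollutant emissions", "air-pollutant-emissions"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("titles", "chart_id"))
)

# Convert the matrix to a tibble so each title sits next to its chart_id
hits_tbl <- as_tibble(hits)

# Look up the chart_id whose title mentions "Methane"
methane_id <- hits_tbl %>%
  filter(grepl("Methane", titles)) %>%
  pull(chart_id)
methane_id
```

The extracted chart_id is the string that would be passed on to owid().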
28.6.2.2.2 owid
Get a dataset used in an OWID chart
owid(chart_id = NULL, rename = NULL, tidy.date = TRUE, ...)
chart_id: The chart_id as returned by owid_search, i.e., words joined by ‘-’. Don’t mix it up with the chart title.
rename: Rename the value column. Currently only works if there is just one value column.
- Example
emissions <- owid("per-capita-ghg-emissions")
emissions
rights <- owid("human-rights-scores")
rights
Note.
- You can use the rename argument to change column names. For example,
owid("per-capita-ghg-emissions", rename = "ghgPcap")
- Until the data has been imported, you cannot know the original column names or how many indicator columns there are, so it is natural to rename columns afterwards using dplyr::rename. In the next example, I used `Total including LUCF` in backticks. However, ‘Total including LUCF’ and “Total including LUCF” work as well.
- If there is more than one variable to rename, use vector notation as follows. Here top_n(1) is used to show a single row. Note that it is not the same as slice(1): slice(1) gives the first row, whereas top_n(1) returns the row(s) with the largest value in the last column.
- You can use dplyr::rename, and keep a record of the renamed columns.
(democracy <- owid("electoral-democracy"))
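The renaming notes can be tried on a toy tibble, without downloading anything. This is only a sketch: the tibble `toy`, its values, and the column `Another indicator` are made up for illustration and do not come from an OWID dataset.

```r
library(tibble)
library(dplyr)

# Toy stand-in for a tibble returned by owid(); the values are made up
toy <- tibble(
  entity = c("A", "B"),
  year = c(2015, 2015),
  `Total including LUCF` = c(1.2, 3.4),
  `Another indicator` = c(10, 20)
)

# One column: backticks around a name that contains spaces
toy1 <- toy %>% rename(ghg = `Total including LUCF`)

# Several columns at once, as new_name = old_name pairs; quoted names also work
toy2 <- toy %>%
  rename(ghg = "Total including LUCF", other = "Another indicator")

names(toy2)
```

Keeping the rename() call in the script, rather than using the rename argument of owid(), leaves a visible record of which original column each short name came from.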
28.6.2.2.3 owid_source
A function to get source information from an OWID dataset and display it in the R console.
owid_source(data)
- Example
owid_source(emissions)
owid_source(rights)
28.6.2.2.4 view_chart
A function that opens the original OWID chart in your browser
view_chart(x)
x: Either a tibble returned by owid(), or a chart_id.
- Example
The first call uses the chart, i.e., the tibble returned by owid(), and the others use a chart_id. You can also embed the chart in your R Markdown file by copying the Embed iframe link from the Share button at the bottom right corner.
firearm_suicide <- owid("suicide-rate-by-firearm")
view_chart(firearm_suicide)
view_chart("electoral-democracy")
view_chart("share-of-individuals-using-the-internet")
28.6.2.2.6 owid_covid
owid_covid: Get the Our World in Data covid-19 dataset
owid_covid()
See the detail at the GitHub site.
- Example
covid <- owid_covid()
28.7 Examples
The following is based on the presentation and the first two R Notebook files created by Professor Kaizoji.
28.7.1 Human Rights - modified from README
Let’s use the core functions to get data on how human rights have changed over time. First, search for charts on human rights.
owid_search("human rights") %>% as_tibble()
Let’s use the human rights protection dataset.
rights <- owid("human-rights-protection")
rights
ggplot2 makes it easy to visualise our data.
28.7.2 Internet - modified from vignette
owid_search("internet") %>% as_tibble()
Get a dataset used in an OWID chart.
internet <- owid("share-of-individuals-using-the-internet", rename = "internet_use")
internet
Get source information on an OWID dataset
owid_source(internet)
A function that opens the original OWID chart in your browser.
view_chart(internet)
Plot an owid dataset. The first is the simplest, and the second uses the owid theme.
internet %>%
  filter(entity == "World") %>%
  ggplot(aes(year, internet_use)) +
  geom_line() +
  labs(title = "Share of the World Population \nusing the Internet",
       y = "Individuals using the Internet \n(% of population)") +
  scale_y_continuous(limits = c(0, 100))
internet %>%
  filter(entity %in% c("United Kingdom", "Spain", "Russia", "Egypt", "Nigeria")) %>%
  ggplot(aes(year, internet_use, color = entity)) +
  geom_line() +
  labs(title = "Share of Population Using the Internet",
       y = "Individuals using the Internet \n(% of population)",
       color = "country") +
  scale_y_continuous(limits = c(0, 100), labels = scales::label_number(suffix = "%"))
Creating a choropleth map.
28.7.3 Democracy - modified from vignette
owid_search("democrac") %>% as_tibble()
democracy <- owid("electoral-democracy", rename = c("electoral_democracy", "vdem_high", "vdem_low"))
democracy
owid_source(democracy)
democracy %>%
  filter(entity %in% c("United Kingdom", "Spain", "Russia", "Egypt", "Nigeria")) %>%
  ggplot(aes(year, electoral_democracy, color = entity)) +
  geom_line() +
  labs(title = "Electoral Democracy", y = "", color = "country") +
  # The axis is limited to 0-1, so no "%" suffix is added to the labels
  scale_y_continuous(limits = c(0, 1))
gdp <- owid("gdp-per-capita-worldbank", rename = "gdp")
gov_exp <- owid("total-gov-expenditure-gdp-wdi", rename = "gov_exp")
age_dep <- owid("age-dependency-ratio-of-working-age-population", rename = "age_dep")
unemployment <- owid("unemployment-rate", rename = "unemp")
Mutating joins
left_join(): includes all rows in x.
References
- See https://dplyr.tidyverse.org/reference/mutate-joins.html.
- Posit Primers - Tidy your data: https://posit.cloud/learn/primers/4
- Join Data Sets: https://posit.cloud/learn/primers/4.3
data <- internet %>%
left_join(democracy) %>%
left_join(gdp) %>%
left_join(gov_exp) %>%
left_join(age_dep) %>%
left_join(unemployment)
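What left_join() keeps and drops can be seen on two small tibbles. This is a sketch with made-up values; `internet_toy` and `gdp_toy` are hypothetical stand-ins for the downloaded datasets, and the join keys are given explicitly with `by`, whereas the chain above lets dplyr join on all columns the two tables share.

```r
library(tibble)
library(dplyr)

# Left table: every row here is kept in the result
internet_toy <- tibble(
  entity = c("A", "A", "B"),
  year = c(2014, 2015, 2015),
  internet_use = c(40, 45, 80)
)

# Right table: only rows matching a left-table key are used
gdp_toy <- tibble(
  entity = c("A", "C"),
  year = c(2015, 2015),
  gdp = c(12000, 30000)
)

joined <- internet_toy %>%
  left_join(gdp_toy, by = c("entity", "year"))
joined
```

The result has the three rows of the left table. Only entity A in 2015 has a match, so the other two rows get NA for gdp, and entity C, which appears only in the right table, is dropped.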
Drawing scatter plots
data %>%
filter(year == 2015) %>%
ggplot(aes(internet_use, electoral_democracy)) +
geom_point(colour = "#57677D", na.rm = TRUE) +
geom_smooth(method = "lm", colour = "#DC5E78", na.rm = TRUE) +
labs(title = "Relationship Between Internet Use \nand electoral_democracy", x = "Internet Use", y = "electoral_democracy")
data %>%
filter(year == 2015) %>%
ggplot(aes(gdp, internet_use)) +
geom_point(colour = "blue") +
geom_smooth(method = "gam", colour = "red", level = 0.0) +
labs(title = "Relationship Between Internet Use and GDP", x = "GDP", y = "Internet Use")
Creating a table of the results of the regression analysis using texreg. If you are using it for the first time, install the package texreg.
install.packages("texreg")
library(texreg)
#> Version: 1.38.6
#> Date: 2022-04-06
#> Author: Philip Leifeld (University of Essex)
#>
#> Consider submitting praise using the praise or praise_interactive functions.
#> Please cite the JSS article in your publications -- see citation("texreg").
#>
#> Attaching package: 'texreg'
#> The following object is masked from 'package:tidyr':
#>
#> extract
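A regression table of this kind can be sketched with texreg::screenreg(), which prints models side by side in the console. The data below are synthetic, generated with a fixed seed purely for illustration; the variable names mirror the joined data above, but the numbers and the two model specifications are assumptions, not results from OWID data.

```r
library(texreg)

# Synthetic data for illustration only; not taken from OWID
set.seed(1)
n <- 50
toy <- data.frame(gdp = runif(n, 1000, 50000))
toy$democracy <- runif(n)
toy$internet_use <- 10 + 0.001 * toy$gdp + 20 * toy$democracy + rnorm(n, sd = 5)

# Two nested linear models
m1 <- lm(internet_use ~ gdp, data = toy)
m2 <- lm(internet_use ~ gdp + democracy, data = toy)

# Print both models side by side as a text table
screenreg(list(m1, m2), custom.model.names = c("GDP only", "GDP + democracy"))
```

Replacing `toy` with the joined data, filtered to a single year, would give the corresponding table for the real indicators.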