E Appendix E Population
Population Analysis using the UN Data and the US Census Bureau
E.1 About: United States Census Bureau
The United States Census Bureau compiles a huge set of data. It provides data to The World Factbook of the Central Intelligence Agency and offers Census Academy resources for data science education.
In alignment with the Digital Government Strategy, the Census Bureau is offering the public wider access to key U.S. statistics. (About)
To study world population and its visualization, visit the following sites:
- Census Academy: https://www.census.gov/data/academy.html
- Infographics & Visualizations: https://www.census.gov/library/visualizations.html
- U.S. and World Population Clock: https://www.census.gov/popclock/world
- Data Tool: https://www.census.gov/data-tools/demo/idb/#/country?YR_ANIM=2020
We can access the data directly or through an API (Application Programming Interface). In the following, we study population data of the world in these two ways.
E.2 International Data Base (IDB) December 2020 Release (Now in 2021)
These data files correspond to the data available in the U.S. Census Bureau’s API. Each file is pipe “|” delimited, and the header row is demarcated with “#” at the start of the row. For additional technical specifications, including variable definitions, please visit https://www.census.gov/data/developers/data-sets/international-database.html
For more information about the International Data Base, including release notes and detailed methodology, please visit https://www.census.gov/programs-surveys/international-programs/about/idb.html
- Variables:
- AGE: Single year of age from 0 to 100+
- AREA_KM2: Area in square kilometers
- FIPS: FIPS (Federal Information Processing Standards) country/area code
- for: Census API FIPS ‘for’ clause
- GENC: Geopolitical Entities, Names, and Codes (GENC) two-character country code standard
- in: Census API FIPS ‘in’ clause
- NAME: Country or area name
- POP: Total mid-year population
- SEX: Sex (0 = Both Sexes, 1 = Male, 2 = Female)
- time: ISO-8601 Date/Time value
- ucgid: Uniform Census Geography Identifier clause
- YR: Year
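As a hedged illustration of the file format described above (pipe-delimited, header row beginning with "#"), the following sketch parses a tiny made-up sample with `read_delim()`. The rows are not real IDB values, the column order is a guess, and the names `sample_text` and `sample_idb` are assumptions for illustration only. Because the header starts with "#", the first column is read as `#YR` and has to be quoted with backticks when it is renamed.

```r
library(readr)
library(dplyr)

# Hypothetical three-row sample: pipe-delimited, header row starts with "#".
sample_text <- "#YR|GENC|NAME|SEX|AGE|POP
2020|XX|Exampleland|0|0|100
2020|XX|Exampleland|1|0|51
2020|XX|Exampleland|2|0|49
"

# I() marks the string as literal data rather than a file path (readr >= 2.0).
# comment = "" (the default) keeps the "#" header row from being treated as a comment.
sample_idb <- read_delim(I(sample_text), delim = "|", comment = "",
                         show_col_types = FALSE) %>%
  rename(YEAR = `#YR`)

sample_idb
```

In this toy sample the "Both Sexes" row (SEX = 0) equals the sum of the male and female rows, which is a convenient sanity check to apply to the real file as well.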
The file is huge: as a text file it has about 8 million rows. (For comparison, an Excel worksheet holds at most 1,048,576 rows, about 1 million.) We need to download it only once; after that, we should read the downloaded file instead of downloading it every time. Functions in the tidyverse package are fast enough to handle data of this size.
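The "download once, then read the local copy" idea can also be expressed in code instead of adding and removing comment marks by hand. The following is a minimal sketch under stated assumptions: the helper name `fetch_once()` and its `downloader` argument are hypothetical, and the demonstration uses a stand-in downloader and a placeholder URL so that it runs without network access.

```r
# Download `url` to `destfile` only if the file is not already there.
# `downloader` defaults to download.file(); it is a parameter so that the
# helper can be tried offline.
fetch_once <- function(url, destfile, downloader = utils::download.file) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    downloader(url, destfile)
  }
  invisible(destfile)
}

# Offline demonstration with a stand-in downloader that just writes a file.
calls <- 0
fake_downloader <- function(url, destfile) {
  calls <<- calls + 1
  writeLines("placeholder", destfile)
}
demo_file <- file.path(tempdir(), "idb_demo", "idbzip.zip")
unlink(demo_file)  # start from a clean state
fetch_once("https://example.org/idbzip.zip", demo_file, fake_downloader)
fetch_once("https://example.org/idbzip.zip", demo_file, fake_downloader)
calls  # the stand-in downloader ran only once
```

With the real data, a call such as `fetch_once(idbzip_url, "data/idbzip/idbzip.zip")` would download the zip file on the first run and skip the download afterwards.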
E.3 Analysis using idbzip
library(tidyverse)
### For the first time, delete # in the following four lines to download the files.
## From the second time, add # to the following four lines to avoid downloading the files.
# idbzip_url <- "https://www2.census.gov/programs-surveys/international-programs/about/idb/idbzip.zip" # URL of the zip file.
# dir.create("data/idbzip", recursive = TRUE) # store everything in idbzip directory in the working directory
# download.file(url = idbzip_url, destfile = "data/idbzip/idbzip.zip") # file size: 43.1 MB
# unzip("data/idbzip/idbzip.zip", exdir = "data/idbzip") # zip file contains three files idb5yr.all, idbsingleyear.all, Readme.txt
# idb <- read_delim("data/idbzip/idbsingleyear.all", delim = "|")
# glimpse(idb)
# idb
Since the data is too large, we chose 15 countries and stored their data as data/idb15.csv. The world population totals are stored as data/world.csv.
# The GENC code for China is CN (CH is Switzerland).
# idb %>%
# filter(GENC %in% c("BD", "CN", "DE", "FR", "GB", "ID", "IN", "JP", "KR", "LK", "MY", "PH", "TH", "US", "VN")) %>%
# select("YEAR" = `#YR`, "COUNTRY" = NAME, "ISO2" = GENC, SEX, POP, AGE) %>%
# write_csv("data/idb15.csv")
# World population by year and sex, summed over all countries and ages
world_all <- idb %>%
select("YEAR" = `#YR`, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX))
world <- world_all %>%
group_by(YEAR, SEX) %>%
summarize(POPULATION = sum(POP))
world
write_csv(world, "data/world.csv")
# Number of countries and areas with data in each year
countries <- idb %>%
select("YEAR" = `#YR`, SEX, GENC, POP) %>%
mutate(SEX = as_factor(SEX)) %>%
filter(SEX == 0) %>%
group_by(YEAR, GENC) %>%
summarize(POPULATION = sum(POP)) %>%
ungroup() %>%
group_by(YEAR) %>%
summarize(NUMBER = n())
write_csv(countries, "data/world2.csv")
E.3.1 Population of the World
world <- read_csv("data/world.csv")
## Rows: 453 Columns: 3
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): YEAR, SEX, POPULATION
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
world2 <- read_csv("data/world2.csv")
## Rows: 151 Columns: 2
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): YEAR, NUMBER
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
idb15 <- read_csv("data/idb15.csv")
## Rows: 509343 Columns: 6
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): COUNTRY, ISO2
## dbl (4): YEAR, SEX, POP, AGE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
idb15
## # A tibble: 509,343 × 6
## YEAR COUNTRY ISO2 SEX POP AGE
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1981 Bangladesh BD 0 3428071 0
## 2 1981 Bangladesh BD 0 3072594 1
## 3 1981 Bangladesh BD 0 2888362 2
## 4 1981 Bangladesh BD 0 2782738 3
## 5 1981 Bangladesh BD 0 2719081 4
## 6 1981 Bangladesh BD 0 2669809 5
## 7 1981 Bangladesh BD 0 2614671 6
## 8 1981 Bangladesh BD 0 2545849 7
## 9 1981 Bangladesh BD 0 2462867 8
## 10 1981 Bangladesh BD 0 2373595 9
## # … with 509,333 more rows
idb15 %>% distinct(COUNTRY, ISO2)
## # A tibble: 15 × 2
## COUNTRY ISO2
## <chr> <chr>
## 1 Bangladesh BD
## 2 China CN
## 3 Germany DE
## 4 France FR
## 5 United Kingdom GB
## 6 Indonesia ID
## 7 India IN
## 8 Japan JP
## 9 Korea, South KR
## 10 Sri Lanka LK
## 11 Malaysia MY
## 12 Philippines PH
## 13 Thailand TH
## 14 United States US
## 15 Vietnam VN
summary(idb15)
## YEAR COUNTRY ISO2 SEX
## Min. :1980 Length:509343 Length:509343 Min. :0
## 1st Qu.:2014 Class :character Class :character 1st Qu.:0
## Median :2042 Mode :character Mode :character Median :1
## Mean :2042 Mean :1
## 3rd Qu.:2070 3rd Qu.:2
## Max. :2100 Max. :2
## POP AGE
## Min. : 6 Min. : 0
## 1st Qu.: 253533 1st Qu.: 25
## Median : 516669 Median : 50
## Mean : 1813595 Mean : 50
## 3rd Qu.: 1329090 3rd Qu.: 75
## Max. :30630618 Max. :100
world %>% ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
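Something is wrong! The world total does not change smoothly over time. A likely cause is that the number of countries and areas with data differs from year to year, so we check that number next.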
summary(world2)
## YEAR NUMBER
## Min. :1950 Min. : 2.0
## 1st Qu.:1988 1st Qu.:113.5
## Median :2025 Median :226.0
## Mean :2025 Mean :173.3
## 3rd Qu.:2062 3rd Qu.:227.0
## Max. :2100 Max. :227.0
ggplot(world2, aes(x = YEAR, y = NUMBER)) +
geom_bar(stat = "identity")
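One plausible reading of the two plots above is that the world total jumps because the number of countries and areas with data differs from year to year. The sketch below reproduces that effect on a small made-up table (`toy`, with hypothetical codes and values) and shows one possible remedy: keep only the years in which every country is present. It is an illustration, not the method used to build data/world.csv.

```r
library(dplyr)

# Made-up data: country "AA" has data for 2018-2020, "BB" only for 2019-2020.
toy <- tibble(
  YEAR = c(2018, 2019, 2020, 2019, 2020),
  GENC = c("AA", "AA", "AA", "BB", "BB"),
  POP  = c(100, 101, 102, 50, 51)
)

# Total population and number of reporting countries per year.
coverage <- toy %>%
  group_by(YEAR) %>%
  summarize(POPULATION = sum(POP), NUMBER = n_distinct(GENC), .groups = "drop")
coverage  # the 2018 total is low only because "BB" is missing

# Keep only the years with full coverage before comparing totals.
full_years <- coverage %>% filter(NUMBER == max(NUMBER))
full_years
```

Whether dropping years is acceptable depends on the question being asked; for long historical series, restricting the set of countries instead of the set of years may be the better choice.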
E.3.2 Population of a Country, JAPAN
japan <- filter(idb15, ISO2 == "JP") %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX))
japan
## # A tibble: 33,633 × 4
## YEAR SEX POP AGE
## <dbl> <fct> <dbl> <dbl>
## 1 1990 0 1228598 0
## 2 1990 0 1275792 1
## 3 1990 0 1318661 2
## 4 1990 0 1355679 3
## 5 1990 0 1388504 4
## 6 1990 0 1452576 5
## 7 1990 0 1494424 6
## 8 1990 0 1507634 7
## 9 1990 0 1515567 8
## 10 1990 0 1547527 9
## # … with 33,623 more rows
pop <- japan %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP))
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.
pop
## # A tibble: 333 × 3
## # Groups: YEAR [111]
## YEAR SEX POPULATION
## <dbl> <fct> <dbl>
## 1 1990 0 123537399
## 2 1990 1 60628417
## 3 1990 2 62908982
## 4 1991 0 123962538
## 5 1991 1 60832741
## 6 1991 2 63129797
## 7 1992 0 124378689
## 8 1992 1 61030495
## 9 1992 2 63348194
## 10 1993 0 124738157
## # … with 323 more rows
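The message printed above is informational: after `summarize()` the result is still grouped by YEAR, as the "Groups: YEAR [111]" line shows. A minimal sketch on made-up numbers (`toy_japan` is hypothetical, not IDB data) shows how `.groups = "drop"` returns an ungrouped tibble and silences the message.

```r
library(dplyr)

# Made-up data in the same layout as `japan`.
toy_japan <- tibble(
  YEAR = c(1990, 1990, 1990, 1990),
  SEX  = factor(c(1, 1, 2, 2)),
  AGE  = c(0, 1, 0, 1),
  POP  = c(10, 11, 9, 13)
)

# Sum over ages within each year and sex, then drop all grouping.
pop_toy <- toy_japan %>%
  group_by(YEAR, SEX) %>%
  summarize(POPULATION = sum(POP), .groups = "drop")

pop_toy
is_grouped_df(pop_toy)  # FALSE: no grouping is left
```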
pop %>% ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
japan2020 <- japan %>% filter(YEAR == 2020, SEX == 0)
ggplot(japan2020) +
geom_bar(aes(x = AGE, y = POP), stat = "identity") 
japan_adult <- filter(japan2020, AGE >=18)
ggplot(japan_adult) +
geom_line(aes(x = AGE, y = cumsum(POP)/sum(POP)*100)) +
geom_vline(xintercept = 40, color = "red")
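The last plot draws the cumulative share of the adult population by age, with a red reference line at age 40. As a sketch of the same calculation in table form, the code below computes the cumulative percentage on a small made-up age distribution (`toy_adult`, hypothetical values) and finds the youngest age at which at least half of the adults are included.

```r
library(dplyr)

# Made-up adult age distribution.
toy_adult <- tibble(
  AGE = 18:23,
  POP = c(10, 20, 30, 20, 10, 10)
)

# Cumulative share of the adult population up to and including each age.
cum_share <- toy_adult %>%
  arrange(AGE) %>%
  mutate(CUM_PERCENT = cumsum(POP) / sum(POP) * 100)
cum_share

# Youngest age at which the cumulative share reaches 50%.
half_age <- cum_share %>%
  filter(CUM_PERCENT >= 50) %>%
  slice(1) %>%
  pull(AGE)
half_age
```

Applied to `japan_adult`, the same steps would give the age that splits the adult population into two halves, which can then be compared with the reference line in the plot.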
E.4 Population Pyramid
E.4.1 Population Pyramid of Japan or Other Countries
E.4.1.1 Japan
yr <- 2020
country <- "Japan"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.
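The chunks in this section differ only in the value of `country`. One way to avoid the repetition is to wrap the pyramid in a function. The following is a minimal sketch, not the code used to produce the figures here: the function name `plot_pyramid()` and the toy table are assumptions, and `labels = abs` replaces the fixed breaks at ±500 and ±1000 so that the axis adapts to countries of any size.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical helper: population pyramid for one country and year.
# `data` is expected to have the columns of idb15: YEAR, COUNTRY, SEX, POP, AGE.
plot_pyramid <- function(data, country, yr) {
  data %>%
    filter(COUNTRY == country, YEAR == yr, SEX != 0) %>%
    mutate(SEX = fct_recode(as_factor(SEX), "Male" = "1", "Female" = "2"),
           POP = POP / 1000) %>%
    ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
    geom_col() +
    coord_flip() +
    scale_y_continuous(labels = abs) +  # show magnitudes on both sides
    labs(title = paste("Population Pyramid of", country, yr),
         subtitle = "population unit in 1000",
         y = "Male vs Female") +
    scale_fill_brewer(palette = "Set2")
}

# Made-up data in the idb15 layout, for illustration only.
toy <- tibble(
  YEAR = 2020, COUNTRY = "Exampleland",
  SEX = rep(c(0, 1, 2), each = 3),
  AGE = rep(0:2, times = 3),
  POP = c(200, 190, 180, 102, 97, 92, 98, 93, 88) * 1000
)
p <- plot_pyramid(toy, "Exampleland", 2020)
p
```

With the real data, a call such as `plot_pyramid(idb15, "Japan", 2020)` would be expected to give a figure comparable to the one above, up to the axis labelling.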

E.4.1.2 Bangladesh
yr <- 2020
country <- "Bangladesh"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.3 China
yr <- 2020
country <- "China"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.4 Germany
yr <- 2020
country <- "Germany"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.5 France
yr <- 2020
country <- "France"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.6 United Kingdom
yr <- 2020
country <- "United Kingdom"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.7 Indonesia
yr <- 2020
country <- "Indonesia"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.8 India
yr <- 2020
country <- "India"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.9 Korea, South
yr <- 2020
country <- "Korea, South"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.10 Sri Lanka
yr <- 2020
country <- "Sri Lanka"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.11 Malaysia
yr <- 2020
country <- "Malaysia"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.12 Philippines
yr <- 2020
country <- "Philippines"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.13 Thailand
yr <- 2020
country <- "Thailand"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.14 United States
yr <- 2020
country <- "United States"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.

E.4.1.15 Vietnam
yr <- 2020
country <- "Vietnam"
filter(idb15, COUNTRY == country) %>%
select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = fct_recode(as_factor(SEX), "Both Sex" = "0", "Male" = "1", "Female" = "2"), POP = POP/1000) %>%
filter(YEAR == yr, SEX != "Both Sex") %>%
ggplot(aes(x = AGE, y = ifelse(SEX == "Male", -POP, POP), fill = SEX)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = paste("Population Pyramid of", country, yr),
subtitle = "population unit in 1000") +
scale_y_continuous(breaks = seq(-1000, 1000, 500),
labels = as.character(c(1000, 500, 0, 500, 1000))) +
ylab("Male vs Female") +
scale_fill_brewer(palette = "Set2")
idb15 %>% filter(COUNTRY == country) %>% select(YEAR, SEX, POP, AGE) %>%
mutate(SEX = as_factor(SEX)) %>% group_by(YEAR, SEX) %>% summarize(POPULATION = sum(POP)) %>%
ggplot(aes(x = YEAR, y = POPULATION)) +
geom_line(aes(color = SEX)) +
geom_vline(xintercept = 2020)
## `summarise()` has grouped output by 'YEAR'. You can override using the `.groups` argument.
