---
title: "Divorce scape"
author: "Denitza D. Voutchkova"
date: "29 Jan 2019 (#3)"
output:
  html_notebook:
    highlight: kate
---

_I came up with a weekly __R__ challenge as a way to "force" myself into learning new things. One figure, every week! The catch is that every time I should try something new... maybe I'll use a new type of visualization for the first time, or maybe I'll explore a new dataset. Initially I thought it would be neat to have a unifying topic, e.g. water, but then I decided to keep it open._

### Intro to entry #3 for 2019

I was randomly browsing through the Danish statistics site, trying to find something interesting related to water or the environment. While browsing, I got the idea to also check the demographic data, and there I somehow came across a peculiar graph on divorce rates. I did not know that the divorce rate in Denmark is so high (40-50%). The number was so striking that I did not know how to make sense of it. How can a country be leading in both happiness and divorce rates at the same time?  
_The Telegraph_ tried to answer a similar question ([read more: Denmark's divorce express](https://www.telegraph.co.uk/expat/expatlife/9956936/Denmarks-divorce-express.html)). _Business Insider_ also wrote about this [here](https://www.businessinsider.com/denmark-new-rules-parents-divorce-2018-4?r=US&IR=T&IR=T).
So, after two days and a couple of conversations about it, I decided to use this data for my next figure in R. I was thinking about the Danish landscape and the weather (currently freezing!), so initially I wanted to do something in green-grey, but then I found the _Moody Blue Hues Color Palette_ [here](https://www.color-hex.com/color-palette/41420). I love the blue color from this palette, so instead of a landscape, I switched to a seascape. I really wanted the graph to lean towards illustration. I also wanted to show the number of marriages on the same graph, because I thought there might be a trend in that too. 
While browsing through the data, I also learned that from _15 June 2012_ it became possible for two persons of the same sex to get married (and divorced afterwards). So, the marriage data after 2012 include same-sex marriages (total number of marriages). 

### Data

This week, like last week, I used the official Danish statistics (dk: Danmarks Statistik). I downloaded the following datasets:

1. Divorce Rate ([SKI55](https://www.statbank.dk/statbank5a/selectvarval/define.asp?PLanguage=1&subword=tabsel&MainTable=SKI55&PXSId=210376&tablestyle=&ST=SD&buttons=0)): it includes all divorces in which at least one of the partners has his/her residence in Denmark.
2. Marriages and entered registered partnerships ([VIE7](http://www.statbank.dk/statbank5a/selectvarval/define.asp?PLanguage=1&subword=tabsel&MainTable=VIE7&PXSId=169579&tablestyle=&ST=SD&buttons=0))

### Graph and code

The data needed reformatting, so I thought: "Yes, now is the time to get into `dplyr`". As with everything new, if you don't need it, you don't use it, and at some point you get stuck in old habits. Another old habit of mine is using base R for plotting. I will try to snap out of it in the coming months, but it works for today... Also, the pipes! I feel it is time to start using pipes... next time. 
Maybe the most important thing I learned from this exercise was the `clip(x1, x2, y1, y2)` function. I spent so much time obsessing about some lines that stuck out of my plotting area! Then I found that I had to use `clip`, because `xpd=FALSE` (in `par`) only clips `points()` and `lines()` to the plot region, which is slightly larger than the area I wanted to stay inside, and I felt so relieved. Sometimes you just need to change the approach.  
Another semi-useful thing I learned was the function `jitter()`, which adds a small amount of noise for graphical purposes. It can be useful when the goal is to show how many data points there are within a group or category. I used it for the waves.  
I wanted my graph to have an illustration value.
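
To make the `clip()` point concrete, here is a minimal sketch in base R. It is not the code used for the figure: the limits, colors and the test line are made up for illustration. It assumes the default axis style, where the plot region reported by `par("usr")` is a bit wider than `xlim`/`ylim`, so `xpd=FALSE` alone lets lines run slightly past a rectangle drawn at the axis limits.

```r
# Draw to a null PDF device so the sketch runs without opening a window;
# drop pdf(NULL) and dev.off() to see the figure.
pdf(NULL)
par(xpd = FALSE)
plot(1:10, type = "n", xlim = c(0, 10), ylim = c(0, 100))

# The plot region in user coordinates: with the default axis style it
# extends about 4% beyond xlim and ylim on each side.
usr <- par("usr")

# Background rectangle covering exactly the axis limits
rect(0, 0, 10, 100, col = "grey90", border = NA)

# Without clip(), this line would be drawn up to the edge of the plot
# region, i.e. slightly outside the rectangle.
clip(0, 10, 0, 100)
lines(c(-1, 11), c(-10, 110), col = "steelblue")

# Restore clipping to the full plot region for later drawing
do.call(clip, as.list(usr))
dev.off()
```

Restoring the clipping region at the end is optional here, but it avoids surprises if more elements are added to the plot afterwards.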

Here are the steps... I am sure it can be done with fewer lines of code, but I'm kind of ok with it.

1. Load libraries. I loaded `tibble` because I wanted to use `rownames_to_column()`, which I saw in this [ref. card](https://ugoproto.github.io/ugo_r_doc/dplyr.pdf). `scales` is for the transparency, which is not very visible below, but I was playing with transparencies, so I needed it. 

```{r warning=FALSE}
library(tibble)
library(dplyr)
library(scales)
```

2. Load the CSV files and transform them. I was lazy and just did the same thing three times, while I could have been a bit smarter about it... 
```{r collapse=TRUE}
# load the divorce data
setwd("C:/Users/voutc/Documents/Personal/R_Blogs/3_Danish_divorce_lanscape")
div <- read.csv('divorceRate.csv', header=TRUE, stringsAsFactors = FALSE, sep = ",", skip=2)

# reorganise the data
div <- as.data.frame(t(div[,-1]))
div <- rownames_to_column(div)
div[,1] <- 1986:2017
colnames(div) <- c("year", "divorce.rate")

# load the marriages & partnership data
mNp <- read.csv('marriagesNpartnerships.csv', header=TRUE, stringsAsFactors = FALSE, sep = ",", skip=2)
# same as before
mNp <- as.data.frame(t(mNp[,-1]))
mNp <- rownames_to_column(mNp)
mNp[,1] <- 1989:2017
colnames(mNp) <- c("year", "MnP")

# load the partnership data (only)
p <- read.csv('partenrships.csv', header=TRUE, stringsAsFactors = FALSE, sep = ",", skip=2)
# same as before
p <- as.data.frame(t(p[,-1]))
p <- rownames_to_column(p)
p[,1] <- 1989:2017
colnames(p) <- c("year", "MM", "WW")

# paste them together, so I can get the number of marriages (- the partnerships)
all <- bind_cols(mNp, p[,-1])
all <- mutate(all, m = MnP-MM-WW) 
```
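
One way to be "a bit smarter" about the repeated read-and-transpose steps is to wrap them in a small function. The sketch below is only an illustration: `read_wide()` is a hypothetical helper (it is not part of the original code), it uses base R instead of `rownames_to_column()`, and the toy file contains made-up numbers that merely mimic the layout assumed above (two lines to skip, a label column, then one column per year).

```r
# Hypothetical helper (not in the original post): read one wide CSV whose
# first column holds labels and whose remaining columns hold one year each,
# then return it with one row per year and a `year` column.
read_wide <- function(file, years, value_names, skip = 2) {
  raw <- read.csv(file, header = TRUE, stringsAsFactors = FALSE, skip = skip)
  values <- t(raw[, -1, drop = FALSE])   # years become rows
  stopifnot(nrow(values) == length(years),
            ncol(values) == length(value_names))
  out <- data.frame(year = years, values)
  colnames(out) <- c("year", value_names)
  rownames(out) <- NULL
  out
}

# Toy file with made-up numbers, only to show the call
tmp <- tempfile(fileext = ".csv")
writeLines(c("title line", "units line",
             "label,1989,1990,1991",
             "Men,10,20,30",
             "Women,1,2,3"), tmp)
toy <- read_wide(tmp, years = 1989:1991, value_names = c("MM", "WW"))
```

With such a helper, each of the three blocks above could shrink to a single call, for example `read_wide('divorceRate.csv', 1986:2017, "divorce.rate")`, assuming the files really share this layout.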

3. Plot... the dark blue (sea) is the divorce rate, and the whitish/grayish shape (iceberg or snowy cliffs) is the total number of marriages (right axis). I also added noise to make waves, and two Unicode [emoji](https://unicode.org/emoji/charts/full-emoji-list.html). Also, it has been snowing here the last few days, so I added some snowflakes :D

```{r figure, fig.width=10, fig.asp=1,  dev='svg', collapse=TRUE, fig.align='center'}

# setting the graph params for background, colors, fonts etc.
par(mar=c(4, 4, 2, 2), col.axis="#00688b", col.lab="#00688b", family="mono", fg="#00688b", bty="n", xpd=FALSE) # bg="#67828d" 

# plot divorce data
plot(div, ylim=c(0, 100), las=1, type="n", xlim=c(1986, 2018), axes=FALSE,
     ylab="divorce rate (%)")
polygon(x=c(1986, 1986, 2017, 2017), y=c(0, 100, 100, 0), col="#e3ecee")
polygon(x=c(1989, all$year, 2017), y=c(0, all$m/500, 0), col=alpha("white",0.6), border="#536972", lty=3)
polygon(x=c(1986, div$year, 2017), y=c(0, div$divorce.rate, 0), col="#00688b", border="#00688b")
clip(1986, 2017, 0, 100)

# adding noise to the divorce data and offsetting by 1% each time, so it looks like waves in a sea
# (ylim is not a graphical parameter for lines(), so it is not passed here)
for (i in 1:50) {
  lines(div$year, jitter(div[,2], amount=2)-i, col="#67828d", lty=3, lwd=0.5)
}

# customizing the axes
axis(side=2, las=2, lwd=0, lwd.ticks = 1)
axis(side=1, las=1, lwd=0, lwd.ticks = 1)
axis(side=4, at=c(0, 20, 40, 60, 80, 100), labels=c("0", "10", "20" , "30", "40", "50"), line=-2, las=1, lwd=0, lwd.ticks = 1, col.axis="#536972", fg="#536972")
mtext("number of marriages (x1000)", side=4, line=1, col="#536972")
mtext("@DenitzaV", side=1, line=3, adj = 1, col="#00688b", cex=0.7)

# add a boat
points(1995, 20, pch="\U1F6A3", col=alpha("white",0.9), cex=4)

# add a couple
points(2007, 73.5, pch="\U1F46B", col=alpha("#536972",0.9), cex=2)

# add a snowfall
set.seed(101)
x <- runif(50, 1986, 2017)
y <- runif(50, 65, 98)
points(x, y, pch="\U2744", col=alpha("white",0.9), cex=2)
```
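
The wave effect in the loop above can be looked at in isolation. The following is a small sketch with made-up values (not the Danish data) and an arbitrary seed: each wave is the same curve with fresh noise from `jitter(..., amount = 2)`, shifted down by one more unit than the previous one. With `amount` given, the noise is uniform between `-amount` and `amount`, so undoing the shift should leave values within that band.

```r
set.seed(42)                      # only to make the sketch reproducible
rate <- c(40, 42, 45, 43, 47)     # made-up values, not the Danish data

# One column per wave: the noisy curve shifted down by i
waves <- sapply(1:50, function(i) jitter(rate, amount = 2) - i)

# Noise that was added to each wave, recovered by undoing the shift
noise <- sweep(waves, 2, 1:50, "+") - rate
range(noise)                      # stays between -2 and 2
```

A larger `amount` gives a rougher sea, while a smaller one makes the waves look like parallel copies of the divorce-rate curve.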

That's all... I think from next week I'll be focusing more on old work I haven't finished, so getting back to boring scientific figures, I'm afraid. 

Peace!