Any Joke's left in Belgium?

What I was wondering: are there still Joke’s born in Belgium?

Every time I pass by a colleague named Joke, I wonder. Let me explain: Joke is quite regular Dutch first name for a girl. You pronounce it [yo-ke], like blending ‘yoghurt’ and ‘kebab’ together and put the accent on the ‘yo’. ‘Joke’ is just a diminutive form of Jo, kind of like Jenny, Abby and Debby. No-one who grew up speaking Dutch would bat an eye to hear about someone named Joke, but everyone else on this planet probably would.

Considering how this name sounds in English and how important English as a language has become in our daily life and business world, I wondered whether there were any parents that would still name their daughter Joke…

Data source

I found everything I needed on (this website) from the Belgian government.

Data

The data came in the form of two excel files, one for girls and one for boys. Every file contains a sheet for every year (1995-2016), and then numbers per region per name (see picture below). Not ideal, but nothing that some cleaning up can’t handle.
Original data
structure

Two quick notes before I move on:

The original data file contains the numbers for Belgium as a whole, and for every region seperate (Flanders, Wallonia and Brussels). Only the Belgian data is pulled out into the clean file right now.
No names below 5 occurences are present in the database, presumably for privacy reasons.

Cleaning data

Starting by loading the packages needed:

library(readxl)
library(tidyr)
library(dplyr)
library(ggplot2)

Given that both files contained many different sheets representing the years, I started by creating a variable that would hold all the sheetnames.

#Importing files
file_girls <- "First names girls 1995-2016.xls"
sheetnames_girls <- excel_sheets(file_girls) [1:22]
file_boys <- "First names boys 1995-2016.xls"
sheetnames_boys <- excel_sheets(file_girls) [1:22]

I wanted to build a function that would read all those sheets and bind the data together. I first wrote the code for one sheet before turning it into a function. At least then I know everything works the way I want it to.

#Trial set to build up the function later on
input1995 <- read_excel(file, "1995")
data1995 <- input1995[,2:3]
data1995$Year <- 1995L
data1995$Region <- "Total Belgium"

Time to build the actual function, which:

Reads a worksheet from the excel file
Retains only column two and three which represents the names and number of occurences for total Belgium.
Turns the sheetname into a variable called Year
And adds a variable region “Total Belgium”

#A function that read a sheet(representing a year), and returns Total Belgium data
read_excelsheet <- function (file, sheetname) {
  input <- read_excel(file, sheetname)
  data <- input[,2:3]
  data$Year <- as.integer(sheetname)
  data$Region <- "Total Belgium"
  return(data)
}

With the function read_excelsheet up and working, it’s time to start reading in the data. I used a for loop to go through all the sheetnames defined in sheetnames_girls and bind every new data to the previous ones. Afterwards I did the exact same thing for the boys.

#Reading all sheets 
data_g <- data.frame()
for (i in sheetnames_girls) {
  data_g <- bind_rows(data_g, read_excelsheet(file_girls, i))
}

data_b <- data.frame()
for (i in sheetnames_boys) {
  data_b <- bind_rows(data_b, read_excelsheet(file_boys, i))
}

Nearly there. I combined the girls and boys data, and did some touching up by changing column names and changing their order.

#Combine girls and boys in 1 database
data_g$Gender <- "Girls"
data_b$Gender <- "Boys"
data <- data.frame(bind_rows(data_g, data_b))

#Renaming columns and clean-up
colnames(data)[1:2] <- c("Name", "Count")
data <- select(data, Region, Year, Gender, Name, Count)
rm(data_b)
rm(data_g)

Cleaning is all done now! For anyone who does not want to go through the cleaning, I added a cleaned file via write.csv into (my github repository). The cleaned data now looks as following:

##          Region Year Gender      Name Count
## 1 Total Belgium 1995  Girls     Laura  1484
## 2 Total Belgium 1995  Girls     Sarah   914
## 3 Total Belgium 1995  Girls     Julie   865
## 4 Total Belgium 1995  Girls Charlotte   773
## 5 Total Belgium 1995  Girls     Marie   726
## 6 Total Belgium 1995  Girls     Manon   539

Searching for Joke

With the cleaning done, it’s time to find out whether there are any Joke’s left in Belgium.

#Searching for Joke
Joke <- filter(data, Name=="Joke")
Joke %>% 
  group_by (Year) %>% 
  summarise(Count)

## # A tibble: 18 x 2
##     Year Count
##    <int> <dbl>
##  1  1995   102
##  2  1996    91
##  3  1997    87
##  4  1998    92
##  5  1999    81
##  6  2000    65
##  7  2001    61
##  8  2002    58
##  9  2003    46
## 10  2004    34
## 11  2005    32
## 12  2006    26
## 13  2007    15
## 14  2008    16
## 15  2009    13
## 16  2010    19
## 17  2011    14
## 18  2012     9

It’s pretty obvious that the number of Joke’s has dropped significantly. The summary table above does not show any data for 2013-2016, so I will add zero’s manually to show in the graph below that we have data past 2012. Given that no data is present for names that are given less than 5 times, this might not be entirely correct though.

#Adding empty rows until 2016
Joke <- Joke %>%
  rbind(list("Total Belgium", 2013, "Girls", "Joke", 0)) %>% 
  rbind(list("Total Belgium", 2014, "Girls", "Joke", 0)) %>% 
  rbind(list("Total Belgium", 2015, "Girls", "Joke", 0)) %>% 
  rbind(list("Total Belgium", 2016, "Girls", "Joke", 0))

The answer

In 1995 the world was not yet very global and about 100 Joke’s were born that year, but since then it’s been decreasing year after year. The last few years, it’s been less than 5 a year.

The previous plot does not show anything in how it compares to other names though. So I made a second graph: Every grey line represents a girls’ name, and Joke is in purple. It never was a hugely popular name, but still far from being negligible either.

There are a lot of other interesting things in this database, but I just want to highlight one other thing: there was one curve that caught my eye because of it’s very abrupt demise.
Mathilde is the name of the current Queen of Belgium. She became engaged to our crown prince Filip back in 1999. Not everything royal starts a copycat movement - or at least not in Belgium.

Link to github repository with code and data: https://github.com/suzanbaert/2017_First-names-in-Belgium

Any Joke's left in Belgium?

What I was wondering: are there still Joke’s born in Belgium?

Data source

Data

Cleaning data

Searching for Joke

The answer

Suzan Baert

Data Wrangling Part 4: Summarizing and slicing your data

Reflections on 4 months of GitHub: my advice to beginners

Data Wrangling Part 3: Basic and more advanced ways to filter rows

Happy Galentine's Day!

Data Wrangling Part 2: Transforming your columns into the right shape

Data Wrangling Part 1: Basic to Advanced Ways to Select Columns

The Postmortem Airtime Effect

Who's on everyone's 2017 "hit list"?

Bilingual Town Names

Any Joke's left in Belgium?