Bilingual Town Names
It seems that we Belgians just love confusing foreigners…
Imagine wanting to take a train to Mons from Ghent but there only is one driving to Bergen. Or driving south with a GPS telling you to follow the direcion of Liège, but for a while you will only see signs to Luik.
Mons/Bergen, Liège/Luik, Ypres/Ieper… those names refer to exactly the same city - one of them is the official French name, the other one the official Dutch one.
Two week ago, I heard again a story from foreigners who got very confused, and I realized I have no idea how many towns/cities we have like this. Sounds like a perfect time to play around with tmap
and leaflet
!
Data source
I found everything I needed on this website from the Belgian government.
But before importing the data, there are some packages needed along the way:
#packages for the data exploration
library(dplyr)
library(stringr)
library(readr)
library(ggplot2)
#packages for the maps
library(sp)
library(tmap)
library(viridisLite)
library(leaflet)
library(BelgiumMaps.StatBel)
Importing the data. In the original repo everything was set up in Excel but ReadXL wasn’t playing nicely together with blogdown, so turned it into csv. Readr on the other hand didn’t play nice with the special characters (é, è, ü etc) but that was slightly easier to fix by adding an ´Encoding() call.
#Importing the data
raw_data <- read_csv("https://github.com/suzanbaert/BilingualTowns/blob/master/2017-12%20TF_SOC_POP_STRUCT_2017_tcm325-283761.csv?raw=true")
#Fixing the special characters after read_csv
Encoding (raw_data$TX_MUNTY_DESCR_NL) <- "latinl"
Encoding (raw_data$TX_MUNTY_DESCR_FR) <- "latinl"
Cleaning the data
The data contained a lot of unneeded administrative data, and I wanted to rename some columns to English.
#Keeping only the variables needed
data <- raw_data %>%
select(contains("MUNTY"), TX_RGN_DESCR_NL, CD_SEX, TX_NATLTY_NL, TX_CIV_STS_NL, CD_AGE, MS_POPULATION)
colnames(data) <- c("REFNIS", "TownNL", "TownFR", "Region", "Sex", "Nationality", "MaritalStatus", "Age", "Population")
#Translating Region names to English
data$Region <- data$Region %>%
str_replace("Vlaams Gewest", "Flanders") %>%
str_replace("Waals Gewest", "Wallonia") %>%
str_replace("Brussels Hoofdstedelijk Gewest", "Brussels agglomeration")
Additionally, the data does not have a total population count, but is divided in demographic subsets. If I ever wanted to know how many people there are with exactly the same demographics as me (town, age, gender, marital status) I can now find out (26 by the way).
But since that’s not really what I’m after, I used dplyr
to create a summary population table, and immediately added a new boolean column to compare Town Names in Flemish and French.
#Creating a dataframe with total population for each town,
#and adding a column to see whether they have the same name
popdata1 <- data %>%
group_by(TownNL, TownFR, Region, REFNIS) %>%
summarise(population=sum(Population)) %>%
arrange(desc(population)) %>%
mutate(SameName = TownNL==TownFR) %>%
ungroup()
Quite quickly an issue presented itself: while browsing through some breakouts, I noticed that some town names are annotated with their district. Beveren for instance is called the same in Flemish or French, but its district got translated, so it got flagged as a town with a different name in Flemish or French.
#Noticing an issue:
popdata1%>%
filter(Region=="Flanders") %>%
filter(!SameName) %>%
slice (11:13)
## # A tibble: 3 x 6
## TownNL TownFR Region REFNIS popu~ Same~
## <chr> <chr> <chr> <int> <int> <lgl>
## 1 Beveren (Sint-Niklaas) Beveren (Saint-Nicolas) Fland~ 46003 47946 F
## 2 Dendermonde Termonde Fland~ 42006 45583 F
## 3 Vilvoorde Vilvorde Fland~ 23088 43653 F
To get rid of the districts, I cleaned out any word pattern between brackets, and re-generated a boolean column DiffName
to see whether the town names are different.
#Removing the sectors between brackets
popdata <- popdata1
popdata$TownNL <- str_replace(popdata$TownNL, pattern="\\s\\(.+\\)", replacement="")
popdata$TownFR <- str_replace(popdata$TownFR, pattern="\\s\\(.+\\)", replacement="")
#Reassessing whether the names are the same
popdata <- popdata %>%
mutate(DiffName = TownNL != TownFR) %>%
select(TownNL, TownFR, DiffName, population, Region, REFNIS)
A tiny glimpse of the data
There are 95 towns/cities with two different official names, which is 16% of the total amount of towns. Contrary to what some people might assume, it’s more or less similar in both regions: 13% of Flemish towns have an official French name, 16% of Walloon towns have an official Flemish name on top. Only Brussels, an official bilingual region, has a much higher percentage of ‘double names’.
#How many have exactly the same name?
#by region
popdata %>%
group_by(Region) %>%
summarise(NTowns=n(), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 3 x 5
## Region NTowns N_DiffName Prop_SameName Prop_DiffName
## <chr> <int> <int> <dbl> <dbl>
## 1 Brussels agglomeration 19 13 0.320 0.680
## 2 Flanders 308 39 0.870 0.130
## 3 Wallonia 262 43 0.840 0.160
Mapping the towns with two official names
Using tmap
I created two first maps: the first one shows the general regions in Belgium, and a second one highlighting just the towns that have two official town names.
#Importing SPdataframe for Belgium
data("BE_ADMIN_MUNTY", package="BelgiumMaps.StatBel")
#creating a Region2 for making the second plot highlighting only DiffName towns
popdatamap <- popdata %>%
mutate(Region2 = ifelse(DiffName==TRUE, Region, NA))
#Merging my 2017 data with the SPdataframe
mapdata <- merge(BE_ADMIN_MUNTY, popdatamap, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
#Plot different regions
virpalette <- rev(viridis(3))
regionplot<- tm_shape(mapdata) +
tm_fill(col="Region", palette=virpalette,
title = "Regions in Belgium")+
tm_polygons()+
tm_layout(legend.position = c("left", "bottom"))
#Plot to show those with different name by region
nameplot <- tm_shape(mapdata) +
tm_fill(col="Region2", palette=virpalette,
colorNA = "gray90", textNA="Same name",
title = "Towns with two official names")+
tm_polygons()+
tm_layout(legend.position = c("left", "bottom"))
#Show both plots next to each other
tmap_arrange(regionplot, nameplot)
First of all, for people not familair with Belgium: you see our basic regions in the left plot
- The yellow dot in the middle is the Brussels agglomeration, officially bilingual
- The north in green is Flanders where the official language is Dutch (of Flemish as we call it)
- The south in purple is Wallonia where the official language is French
- The divide between green en purple is called the language border…
- To make things even more complicated, some towns in Flanders or Wallonia have a special status: they have “language facilities”. To make something complicated very simple: they are sort of bilingual without being bilingual.
The image on the right just shows all the towns with two official town names. Seeing a higher concentration of these towns around the language border is not a complete surprise, but it does not explain the majority of towns.
Distilling the reason for two official town names
Reason 1: Brussels, an official Bilingual region
In the above table it was obvious that the Brussels region has a much higher share of towns with two offical names: 68% versus the country average of 16%. Given Brussels status as bilingual that should not come as a surprise. I was actually more surprised to realize that there are still 6 that only have their original name: Anderlecht, Jette, Etterbeek, Evere, Ganshoren and Koekelberg.
#Checking the data on Brussels
popdata %>%
filter(Region=="Brussels agglomeration") %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 19 6 13 0.320 0.680
#Adding a column to note down the reason for different names
reason_BXL <- popdata %>%
filter(Region=="Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "Brussels")
popdata%>%
filter(Region=="Brussels agglomeration") %>%
filter(DiffName) %>%
arrange(desc(population))%>%
select(-Region, -REFNIS) %>%
knitr::kable()
TownNL | TownFR | DiffName | population |
---|---|---|---|
Brussel | Bruxelles | TRUE | 176545 |
Schaarbeek | Schaerbeek | TRUE | 133042 |
Sint-Jans-Molenbeek | Molenbeek-Saint-Jean | TRUE | 96629 |
Elsene | Ixelles | TRUE | 86244 |
Ukkel | Uccle | TRUE | 82307 |
Vorst | Forest | TRUE | 55746 |
Sint-Lambrechts-Woluwe | Woluwe-Saint-Lambert | TRUE | 55216 |
Sint-Gillis | Saint-Gilles | TRUE | 50471 |
Sint-Pieters-Woluwe | Woluwe-Saint-Pierre | TRUE | 41217 |
Oudergem | Auderghem | TRUE | 33313 |
Sint-Joost-ten-Node | Saint-Josse-ten-Noode | TRUE | 27115 |
Watermaal-Bosvoorde | Watermael-Boitsfort | TRUE | 24871 |
Sint-Agatha-Berchem | Berchem-Sainte-Agathe | TRUE | 24701 |
Reason 2: Larger cities
Cities are generally more important and I would have guessed that most of our cities have two official names. By just looking at the difference in average population between towns that have two names (DiffName==TRUE) and those who don’t, there clearly is a skew towards higher population town.
A quick plot in ggplot confirms this to be true: grey shows all the towns in Belgium according to their population size on a logarithmic scale. I coloured those who have two names in green.
popdata %>%
group_by(DiffName) %>%
summarise(mean=mean(population), median=median(population))
## # A tibble: 2 x 3
## DiffName mean median
## <lgl> <dbl> <dbl>
## 1 F 14744 11383
## 2 T 42511 24701
#Plotting average town size of small and larger towns
ggplot()+
geom_histogram(data=popdata, aes(x=population), fill="grey", alpha=0.6)+
geom_histogram(data=subset(popdata, DiffName==TRUE), aes(x=population), fill="cadetblue4", alpha=1)+
scale_x_log10()+
labs(x= "Population", y="Number of towns", title="Size of towns with two official names amongst all towns in Belgium")
I took a shortcut to define our cities: the 10% highest populated towns. I got the cutoff value via quantile(popdata$population, probs=0.9)
: 34190.
#Proportion of Cities with different names
popdata %>%
filter(population > 34000) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 60 27 33 0.450 0.550
#Adding a reason column
reason_city <- popdata %>%
filter(population > 34000) %>%
filter(Region != "Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "City")
#Ten largest cities outside Brusels
popdata%>%
filter(population > 34000) %>%
filter(DiffName) %>%
filter(Region != "Brussels agglomeration") %>%
arrange(desc(population)) %>%
slice(1:10) %>%
select(-REFNIS) %>%
knitr::kable()
TownNL | TownFR | DiffName | population | Region |
---|---|---|---|---|
Antwerpen | Anvers | TRUE | 520504 | Flanders |
Gent | Gand | TRUE | 259083 | Flanders |
Luik | Liège | TRUE | 197885 | Wallonia |
Brugge | Bruges | TRUE | 118187 | Flanders |
Namen | Namur | TRUE | 110628 | Wallonia |
Leuven | Louvain | TRUE | 100291 | Flanders |
Bergen | Mons | TRUE | 95220 | Wallonia |
Mechelen | Malines | TRUE | 85665 | Flanders |
Aalst | Alost | TRUE | 84859 | Flanders |
Sint-Niklaas | Saint-Nicolas | TRUE | 76028 | Flanders |
Reason 3: German speaking region (and towns with German language facilities)
After World War I, the peace treaty of Versailles listed the annexation of 9 German towns into Belgium as war compensation. They make up our third language region as German is still their main language today.
Given that German and Dutch are both German langauges and have a lot of similarities it would make sense that the Flemish would refer to the German town names, while the French have changed some of them.
#Listing the German communes and the two additional towns with german facilities
germanspeaking <- c("Eupen", "Kelmis", "Lontzen", "Raeren", "Amel", "Büllingen",
"Burg-Reuland", "Bütgenbach", "Sankt Vith", "Malmedy", "Weismes")
#Proportion of Cities with different names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 11 5 6 0.450 0.550
#Adding a reason column
reason_german <- popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName) %>%
mutate(Reason = "German region")
#German towns with two official names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName==TRUE) %>%
select(-REFNIS) %>%
knitr::kable(align="c")
TownNL | TownFR | DiffName | population | Region |
---|---|---|---|---|
Kelmis | La Calamine | TRUE | 10964 | Wallonia |
Sankt Vith | Saint-Vith | TRUE | 9661 | Wallonia |
Weismes | Waimes | TRUE | 7493 | Wallonia |
Bütgenbach | Butgenbach | TRUE | 5583 | Wallonia |
Amel | Amblève | TRUE | 5523 | Wallonia |
Büllingen | Bullange | TRUE | 5489 | Wallonia |
Reason 4: Towns in Flanders or Wallonia with official language facilities
Always a topic for debate in Belgium: the towns with official language facilities. These are towns that belong to one region but they have some degree of bilingual facilities (it’s complicated!).
#Listing all towns with language facilities
faciliteiten <- c("Bever", "Drogenbos", "Herstappe", "Kraainem", "Linkebeek",
"Mesen", "Ronse", "Sint-Genesius-Rode", "Spiere-Helkijn",
"Voeren", "Wemmel", "Wezembeek-Oppem", "Edingen",
"Komen-Waasten", "Moeskroen", "Vloesberg")
#Proportion of Cities with different names
popdata %>%
filter(TownNL %in% faciliteiten) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 16 6 10 0.380 0.620
#Adding a reason column
reason_facilities <- popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName) %>%
anti_join(reason_city) %>%
mutate(Reason = "Language facilities")
#Which towns have different names?
popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName==TRUE) %>%
select(-REFNIS) %>%
knitr::kable(align="c")
TownNL | TownFR | DiffName | population | Region |
---|---|---|---|---|
Moeskroen | Mouscron | TRUE | 57773 | Wallonia |
Ronse | Renaix | TRUE | 26092 | Flanders |
Sint-Genesius-Rode | Rhode-Saint-Genèse | TRUE | 18231 | Flanders |
Komen-Waasten | Comines-Warneton | TRUE | 18102 | Wallonia |
Edingen | Enghien | TRUE | 13563 | Wallonia |
Voeren | Fourons | TRUE | 4129 | Flanders |
Vloesberg | Flobecq | TRUE | 3426 | Wallonia |
Bever | Biévène | TRUE | 2160 | Flanders |
Spiere-Helkijn | Espierres-Helchin | TRUE | 2142 | Flanders |
Mesen | Messines | TRUE | 1049 | Flanders |
The language border is obviously another factor. Throughout history, many of these towns have changed region, so they inherited more than one name.
Finally, I wanted to make an “other reason” category, and bind all the reasons to my main data. Given the large size of this post already, I did all of this in the background.
A quick reason map:
The Brussels and German region are pretty obvious dots in the map, and equally obvious is our language border. Large cities are scattered across the whole of Belgium and many of the unidentied scattered dots also represent smaller cities (like Aarlen/Arlon or Temse/Tamise).
There is one other cluster of towns south west of Brussels starting from the language border but heading to the French border. That area used to be part of the medieval County of Flanders, where both Walloon and French towns often still carry a Flemish town name. Some of them got modernized but it seems quite a few kept their original name as well.
Making a final interactive map
I wanted to bring it all together in one final interactive map. Go ahead and click away…