Overweight in Kuwait – Food Supply with R

Disclaimer

I cannot say this enough … I have no idea about nutrition. In fact, if one were to ask me for a dietary plan it would consist of 4 portions of cookies and 20 portions of coffee. The reason I am doing this is to practice and demonstrate how to do some analysis with R on interesting issues.

The Problem: Food! (…delicious)

We already looked at the obesity in Kuwait using BMI data. What we found was that indeed, irrespective of gender, Kuwait ranks as one of the top countries with a high proportion of the population deemed obese (BMI >= 30).

The question we did not address is what might be the cause of this obesity. Following in the paths of the WHO’s report on “Diet, food supply, and obesity in the Pacific” (download here) we hope to emulate a very small subset of correlations between the dietary patterns and the prevalence of obesity in Kuwait.

Since this is only a blog post, we will only focus on the macro-nutrient level. Nevertheless, after completing this level we will be well equipped to dive into details with R (feel free to contact me if you want to do this!)

Conclusions First! … Plotting and Stuff

Lets start by plotting a graph to see if we notice any patterns (click it for a much larger image).

Food Supply - Kuwait vs World

What does this graph tell us?

  • The black line represents Kuwait figures.
  • The red dots represent the average around the world
  • The blue bars are top 80% to lower 20% of the observations with the medians marked by a blue diamond.

Lets make some observations about the plot. For brevity we will only highlight things relevant to obesity being mindful of:

  • the fact that I know nothing about nutrition
  • that there are other interesting things going on in the plot.
cereals

Cereals have been increasing since 1993. This means that the average Kuwaiti consumes more cereals every day than 80% of the persons around the world. The implication is, as explained in Cordain, 1999 that people who consume high amounts of cereals are “affected by disease and dysfunction directly attributable to the consumption of these foods”.

Fruits

Fruits & Starchy Roots both show trends that are below average. Both are important sources of fiber. Fibers help slim you down, but are also important for the prevention of other types of diseases; such as a prevalent one in Kuwaiti males: Colorectal cancer.

The consumption of vegetables perhaps serves as a balancing factor for this dietary nutrient of which the average Kuwaiti consumes more of compared to 80% of the people around the world.

Veg Oils

Vegetable Oils is my favourite … the only thing that stopped the rise of vegetable oil supply in Kuwait was the tragic war of 1990. In 2009 the food supply per capita in Kuwait was well over the 80th percentile in the rest of the world. Vegetable oils not only cause but rather promote obesity. We don’t love Mazola so much anymore now do we?

Sugars

Sugars & Sweetners have already made an appearance in another post and here we have more data. The findings are similar, the average Kuwaiti consumes much more sugar than the average person around the world. This not only contributes to obesity but also other diseases such as diabetes.

Data Gathering

So lets get going … where’s the data at?!

The Food and Agriculture Organization of the United Nations have a pretty awesome web portal that contains a lot of rich data. We are interested in data sets concerned with “Food Supply”. Particularly we want to look at:

… because there are only two types of food in the world: Vegetable and Not Vegetable … (cookies are vegetables)

So lets go ahead and download data sets that we want. You will find the selection for Food Supply of Crops. You will want to make sure all fields match:

  • Select all countries.
  • Use aggregated items
  • Leave out grand totals from the aggregated items
  • Choose the years you are interested in; I chose 1980 to 2009.
  • All checkboxes checked
  • Drop-down: output table, no thousands separator, no separator and 2 places for decimals.

Crops

Click download and you’re set for Crops!
Now repeat for the Food Supply of Livestock.

Making CSV Files

Now we have 2 XLS files that we can work with. The irony is that these are not excel files at all … they are HTML files
So the first thing we need to do is rename the 2 downloaded folders to “first.html” and “second.html” – it does not matter which one is first or second.

You will notice that the files are huge. So expect the next step to take time.

We will be using the XML library to read the files: first.html and second.html
We know that the files do not contain any headers and we do not want any white spaces.
We will make sure that the data is read into a data frame so we can work with data frame functions.

# Load libraries
library(XML)
 
# Read HTML Tables 
first<-readHTMLTable('first.html',header=F,trim=T,as.data.frame=T)
second<-readHTMLTable('second.html',header=F,trim=T,as.data.frame=T)
 
# Make sure that the data is of class data.frame
first<-as.data.frame(first)
second<-as.data.frame(second)

We now have our data frames and if we look into the data we can figure out what our headers should be.
So lets rename the headers now:

# Make header names
headers<-c("Domain","Country","CountryCode","Item","ItemCode","Element","ElementCode","Year","Unit","Value","Flag","FlagDescription")
names(first)<-headers
names(second)<-headers

Great! Now we can finish up by writing CSV files that are smaller in size and faster to read the next time we want to run the analysis:

# Write the CSV file for future use
write.csv(first,'first.csv')
write.csv(second,'second.csv')

Data Munging

We are now ready to play with the data. We will be using 2 libraries: dplyr to manage the data and ggplot2 for plotting. Load them up:

# Load libraries 
library(dplyr)
library(ggplot2)

Lets pretend we are starting from scratch and want to read in the data from the CSV files we created.
You will notice read.csv adds a column so we will also need to rename our headers.

# Read files
first<-read.csv('first.csv')
second<-read.csv('second.csv')
 
# Set header names
headers<-c("ID","Domain","Country","CountryCode","Item","ItemCode","Element","ElementCode","Year","Unit","Value","Flag","FlagDescription")
names(first)<-headers
names(second)<-headers

We now want to combine the Livestock and Crops data in the two files. This can be easily done with the rbind() function:

data<-rbind(first,second)

Great now we need to stop and think about what we want to do with this data.

We want to compare Kuwait’s nutritional information with some sort of summary data about the rest of the world. We can therefore break this up into 2 parts: Kuwait and Other.

Let us deal with Kuwait first and extract the calculated data for grams consumed per capita per day. If you look into the dataset you will know which filters to apply. We will apply those filters using the subset function:

# Extract Kuwait Data
data.kuwait<-subset(data,data$Country %in% "Kuwait" & 
                      data$"FlagDescription" %in% "Calculated data" &
                      data$"ElementCode" %in% "646")

Now we want to extract the same information for every other country except Kuwait. That is easy enough, just copy and paste the function from above and :

# Extract Other Countries data
data.other<-subset(data,!(data$Country %in% "Kuwait") & 
                      data$"FlagDescription" %in% "Calculated data" &
                      data$"ElementCode" %in% "646")

We need to do a bit more work with the other countries’ data. We said we want summary information and right now we only have raw data for each country. We will use the chaining mechanism in ‘dplyr’ to create summary data such as the mean, median, and upper/lower quantiles. We will group the data by Year and we will then group it by Item (Nutrient).

# Create summary data
data.other.summary<- data.other %.%  
  group_by(Year) %.%
  group_by(Item) %.%
  summarise(mean=mean(Value),
            median=median(Value),
            lowerbound=quantile(Value, probs=.20),
            upperbound=quantile(Value, probs=.80))

This is the same method we used earlier when looking at Sugar grams consumed per capita per day in an earlier post.

Great, our data is ready for some exploration.

GG Plotting and Stuff!

The ggplot code looks like a big ugly hairball … but I will explain it line by line.

# Plot
ggplot(data = data.other.summary, aes(x = Year, y=median))+
  geom_errorbar(aes(ymin = lowerbound, ymax = upperbound),colour = 'blue', width = 0.4) +
  stat_summary(fun.y=median, geom='point', shape=5, size=3, color='blue')+
  geom_line(data=data.kuwait, aes(Year, Value))+
  geom_point(data=data.other.summary, aes(Year, mean),color=I("red"))+
  ggtitle('Supply (g/capita/day) - Kuwait vs. World')+
  xlab('Year') + ylab('g/capita/day')+
  facet_wrap(~Item,scales = "free")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
  • Line 1: We start by creating a plot using the data set for the rest of the world and plot median values by year.
  • Line 2: We then overlay the blue bars by passing in our summary stats lowerbound and upperbound calculated in the previous step.
  • Line 3: We then pass the median values and set the shape to 5 – diamond.
  • Line 4: We plot the black line using the Kuwait data set’s values by year.
  • Line 5: We plot the averages in red from the rest of the world data set.
  • Line 6: We set the title of the graph
  • Line 7: We set the labels of the axes
  • Line 8: We use the face_wrap() function to tell GG Plot that we want one graph per Item in our data set. We set the scales to “free” so that we get visible graphs (Eggs weigh less than Chicken!)
  • Line 9: We set our theme elements – I just changed the angle of the X axis items

Entire Code

# Load libraries
library(XML)
 
# Read HTML Tables 
first<-readHTMLTable('first.html',header=F,trim=T,as.data.frame=T)
second<-readHTMLTable('second.html',header=F,trim=T,as.data.frame=T)
 
# Make sure that the data is of class data.frame
first<-as.data.frame(first)
second<-as.data.frame(second)
 
# Make header names
headers<-c("Domain","Country","CountryCode","Item","ItemCode","Element","ElementCode","Year","Unit","Value","Flag","FlagDescription")
names(first)<-headers
names(second)<-headers
 
# Write the CSV file for future use
write.csv(first,'first.csv')
write.csv(second,'second.csv')
 
 
# Load libraries 
library(dplyr)
library(ggplot2)
 
# Read files
first<-read.csv('first.csv')
second<-read.csv('second.csv')
 
# Set headers for each data frame
headers<-c("ID","Domain","Country","CountryCode","Item","ItemCode","Element","ElementCode","Year","Unit","Value","Flag","FlagDescription")
names(first)<-headers
names(second)<-headers
 
# Combine data frames
data<-rbind(first,second)
 
# Check the Element Codes 
print(unique(data[,c("Element","ElementCode")]))
 
# Extract Kuwait Data
data.kuwait<-subset(data,data$Country %in% "Kuwait" & 
                      data$"FlagDescription" %in% "Calculated data" &
                      data$"ElementCode" %in% "646") 
 
# Extract Other Countries Data
data.other<-subset(data,!(data$Country %in% "Kuwait") & 
                      data$"FlagDescription" %in% "Calculated data" &
                      data$"ElementCode" %in% "646") 
 
# Create summary data
data.other.summary<- data.other %.%  
  group_by(Year) %.%
  group_by(Item) %.%
  summarise(mean=mean(Value),
            median=median(Value),
            lowerbound=quantile(Value, probs=.20),
            upperbound=quantile(Value, probs=.80))
 
# Plotting
ggplot(data = data.other.summary, aes(x = Year, y=median))+
  geom_errorbar(aes(ymin = lowerbound, ymax = upperbound),colour = 'blue', width = 0.4) +
  stat_summary(fun.y=median, geom='point', shape=5, size=3, color='blue')+
  geom_line(data=data.kuwait, aes(Year, Value))+
  geom_point(data=data.other.summary, aes(Year, mean),color=I("red"))+
  ggtitle(paste(' Supply (g/capita/day) - Kuwait vs. World'))+
  xlab('Year') + ylab('g/capita/day')+facet_wrap(~Item,scales = "free")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Leave a Reply

Your email address will not be published. Required fields are marked *