Introduction to R - exercises

Work through the following exercises to get familiar with R. Parts of the exercises follow "A (very) short introduction to R" by Paul Torfs & Claudia Brauer, Hydrology and Quantitative Water Management Group, Wageningen University, The Netherlands.

Task 1

Compute the difference between 2012 and the year you started at your university and divide this by the difference between 2012 and the year you were born. Multiply this by 100 to get the percentage of your life you have spent at this university. Use brackets if you need them.

(2012 - 2004)/(2012 - 1984) * 100

## [1] 28.57

Task 2

Find help for the log function.

?log

Task 3

Repeat Task 1, but with several steps in between. You can give the variables any name you want, but the name has to start with a letter. Additionally, calculate the following:

a)

\[ \frac{5}{5+345} \]

b)

\[ 2 \cdot \sin (90^{\circ}) \]

Mind the conversion between degrees and radians!

c)

\[ \sqrt{16}+\sqrt{25} \]

d)

\[ \frac{\frac{5}{5+345}+2 \cdot \sin (90^{\circ})}{\sqrt{16}+\sqrt{25}} \]

atUni <- 2012 - 2004
alive <- 2012 - 1984

atUni/alive * 100

## [1] 28.57


a <- 5/(5 + 345)
a

## [1] 0.01429


b <- 2 * sin(90 * pi/180)
b

## [1] 2


c <- sqrt(16) + sqrt(25)
c

## [1] 9


(a + b)/c

## [1] 0.2238
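
The degree-to-radian conversion from part b) can be wrapped in a small helper. This is a minimal sketch; the function name deg2rad is an assumption made for illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): convert degrees to radians.
deg2rad <- function(deg) {
    deg * pi / 180
}

# sin() expects radians, so convert before calling it.
b <- 2 * sin(deg2rad(90))
b

# Without the conversion, 90 is read as 90 radians and the result differs.
wrong <- 2 * sin(90)
wrong
```

Forgetting the conversion does not raise an error, so a quick comparison like the one above is a cheap sanity check.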

Task 4

Compute the sum of 4, 5, 8 and 11 by first combining them into a vector and then using the function sum. What are the mean and median of this series?

vec <- c(4, 5, 8, 11)
sum(vec)

## [1] 28

mean(vec)

## [1] 7

median(vec)

## [1] 6.5

Task 5

Plot 100 random numbers following a Gaussian distribution with mean 11 and standard deviation 42 as a series of connected points and add their mean as a horizontal grey line. Additionally, plot these random numbers in a histogram (showing densities on the y-axis) and as a box plot.

vec <- rnorm(100, 11, 42)
plot(vec, type = "b")
abline(h = mean(vec), col = "gray")

hist(vec, breaks = 10, freq = FALSE)

boxplot(vec)
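
Because rnorm draws new random numbers on every call, the plots above differ from run to run. The following is a minimal sketch of how to make the sample reproducible with set.seed; the seed value 42 and the names vecA and vecB are arbitrary choices for this illustration.

```r
# Fix the state of the random number generator; the seed value is arbitrary.
set.seed(42)
vecA <- rnorm(100, mean = 11, sd = 42)

# Resetting the same seed reproduces exactly the same sample.
set.seed(42)
vecB <- rnorm(100, mean = 11, sd = 42)

# TRUE if both samples are element-wise identical.
identical(vecA, vecB)
```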

Task 6

Put the numbers 31 to 60 in a vector named p and in a matrix with 6 rows and 5 columns named q. Calculate the row-wise and column-wise sums of q.

p <- 31:60
q <- matrix(p, 6, 5)  # filled column by column (the default)
q

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   31   37   43   49   55
## [2,]   32   38   44   50   56
## [3,]   33   39   45   51   57
## [4,]   34   40   46   52   58
## [5,]   35   41   47   53   59
## [6,]   36   42   48   54   60

q <- matrix(p, 6, 5, byrow = TRUE)  # alternative: filled row by row
q

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   31   32   33   34   35
## [2,]   36   37   38   39   40
## [3,]   41   42   43   44   45
## [4,]   46   47   48   49   50
## [5,]   51   52   53   54   55
## [6,]   56   57   58   59   60


apply(q, 1, sum)

## [1] 165 190 215 240 265 290

apply(q, 2, sum)

## [1] 261 267 273 279 285
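
For plain sums, base R also offers rowSums and colSums as shortcuts for the two apply calls. The sketch below rebuilds the row-wise filled matrix so that it runs on its own and checks that both approaches agree.

```r
# Same matrix as in the solution above, filled row by row.
q <- matrix(31:60, nrow = 6, ncol = 5, byrow = TRUE)

# rowSums() and colSums() are base R shortcuts for the apply() calls.
rowSums(q)
colSums(q)

# Check that both approaches give the same sums.
all(rowSums(q) == apply(q, 1, sum))
all(colSums(q) == apply(q, 2, sum))
```

apply remains the more general tool, since it accepts any function, not only sum.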

Task 7

Construct three random standard normal vectors of length 100. Call these vectors x1, x2 and x3. Make a data frame called t with three columns (called Va, Vb and Vc) containing respectively x1, x1+x2 and x1+x2+x3. Call the following functions for this data frame: plot(t) and cov(t). For each column call the function sd. Can you understand the results?

x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)

t <- data.frame(Va = x1, Vb = x1 + x2, Vc = x1 + x2 + x3)
str(t)

## 'data.frame':    100 obs. of  3 variables:
##  $ Va: num  -1.683 -0.239 1.321 -0.417 0.33 ...
##  $ Vb: num  -1.611 -2.613 0.558 -0.809 0.821 ...
##  $ Vc: num  -1.418 -2.427 0.814 -0.718 1.52 ...

plot(t)

cov(t)

##        Va     Vb     Vc
## Va 0.8454 0.9047 0.8312
## Vb 0.9047 2.1204 1.9499
## Vc 0.8312 1.9499 2.8123


apply(t, 2, sd)

##     Va     Vb     Vc 
## 0.9194 1.4561 1.6770

lapply(t, sd)

## $Va
## [1] 0.9194
## 
## $Vb
## [1] 1.456
## 
## $Vc
## [1] 1.677

sapply(t, sd)

##     Va     Vb     Vc 
## 0.9194 1.4561 1.6770
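
One way to understand the results: x1, x2 and x3 are independent with variance 1, so the variances of Va, Vb and Vc are 1, 2 and 3, and the covariance of two columns equals the number of terms they share. The following sketch computes these theoretical values; the names A and theoCov are chosen for this illustration only.

```r
# Each row of A states which of x1, x2, x3 enter Va, Vb and Vc.
A <- matrix(c(1, 0, 0,
              1, 1, 0,
              1, 1, 1), nrow = 3, byrow = TRUE,
            dimnames = list(c("Va", "Vb", "Vc"), c("x1", "x2", "x3")))

# For independent inputs with unit variance the covariance matrix of the
# columns is A multiplied by its transpose.
theoCov <- tcrossprod(A)
theoCov

# Theoretical standard deviations: sqrt(1), sqrt(2), sqrt(3).
sqrt(diag(theoCov))
```

The empirical values from cov(t) (for example 0.85, 2.12 and 2.81 on the diagonal) scatter around these theoretical values because the sample contains only 100 observations.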

Task 8

Compute the mean of the square root of a vector of 100 random numbers. What happens?

smpl <- rnorm(100)
mean(sqrt(smpl))

## Warning: NaNs produced

## [1] NaN

mean(sqrt(abs(smpl)))

## [1] 0.8278


sqrtSmpl <- sqrt(smpl)

## Warning: NaNs produced

sqrtSmpl

##   [1]    NaN    NaN 1.0123 1.4211 1.0484    NaN 0.7968 0.6534 1.2941 1.1250
##  [11]    NaN 0.6874 0.6137 0.4341    NaN    NaN    NaN    NaN 1.4721 0.5469
##  [21] 0.7369    NaN    NaN 0.7464    NaN 0.4092    NaN    NaN 0.5216 1.0074
##  [31]    NaN    NaN    NaN 0.2756 0.6043    NaN 1.4010 0.5813 0.4497    NaN
##  [41] 0.8242    NaN 1.0776    NaN 0.8118 1.0131 1.2734    NaN    NaN    NaN
##  [51] 1.2955 0.1381 0.7398    NaN    NaN 0.3028    NaN 0.9976 1.2344    NaN
##  [61] 1.0481 0.6593    NaN    NaN    NaN 0.8833 1.3619    NaN 0.7132    NaN
##  [71]    NaN 0.6165    NaN 0.2820 0.7753 1.3181    NaN 1.1700 0.1144    NaN
##  [81] 0.4681    NaN 0.3277 0.1482 1.1542    NaN 1.4893    NaN 0.9412    NaN
##  [91]    NaN    NaN    NaN    NaN 0.7589    NaN 0.9444    NaN    NaN    NaN

sqrtSmpl[is.nan(sqrtSmpl)] <- NA

mean(sqrtSmpl)

## [1] NA

mean(sqrtSmpl, na.rm = TRUE)

## [1] 0.8216
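
The NaN values arise because sqrt is not defined for negative real numbers, and roughly half of the standard normal draws are negative. The following minimal sketch makes this visible; the seed value 1 and the names negative and roots are assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed so that the draw is repeatable
smpl <- rnorm(100)

# Logical vector marking the negative draws.
negative <- smpl < 0
sum(negative)

# The NaNs sit exactly at the positions of the negative numbers.
roots <- suppressWarnings(sqrt(smpl))
identical(is.nan(roots), negative)

# Mean of the square roots of the non-negative draws only.
mean(sqrt(smpl[!negative]))
```

Selecting the non-negative draws first avoids the warning altogether and gives the same mean as removing the missing values afterwards.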

Task 9

Make a vector from 1 to 100. Make a for-loop which runs through the whole vector. Multiply the elements which are smaller than 5 or larger than 90 by 10 and the other elements by 0.1.

x <- 1:100

for (i in 1:length(x)) {
    # scale small (< 5) and large (> 90) elements by 10, all others by 0.1
    if (x[i] < 5 || x[i] > 90) {
        x[i] <- x[i] * 10
    } else {
        x[i] <- x[i] * 0.1
    }
}
x

##   [1]   10.0   20.0   30.0   40.0    0.5    0.6    0.7    0.8    0.9    1.0
##  [11]    1.1    1.2    1.3    1.4    1.5    1.6    1.7    1.8    1.9    2.0
##  [21]    2.1    2.2    2.3    2.4    2.5    2.6    2.7    2.8    2.9    3.0
##  [31]    3.1    3.2    3.3    3.4    3.5    3.6    3.7    3.8    3.9    4.0
##  [41]    4.1    4.2    4.3    4.4    4.5    4.6    4.7    4.8    4.9    5.0
##  [51]    5.1    5.2    5.3    5.4    5.5    5.6    5.7    5.8    5.9    6.0
##  [61]    6.1    6.2    6.3    6.4    6.5    6.6    6.7    6.8    6.9    7.0
##  [71]    7.1    7.2    7.3    7.4    7.5    7.6    7.7    7.8    7.9    8.0
##  [81]    8.1    8.2    8.3    8.4    8.5    8.6    8.7    8.8    8.9    9.0
##  [91]  910.0  920.0  930.0  940.0  950.0  960.0  970.0  980.0  990.0 1000.0

Task 10

Write a function for the previous Task, so that you can feed it any vector you like (as argument). Use the standard R function length in the specification of the counter.

scaleFun <- function(x) {
    for (i in 1:length(x)) {
        if (x[i] < 5 || x[i] > 90) {
            x[i] <- x[i] * 10
        } else {
            x[i] <- x[i] * 0.1
        }
    }

    return(x)
}

scaleFun(1:100)

##   [1]   10.0   20.0   30.0   40.0    0.5    0.6    0.7    0.8    0.9    1.0
##  [11]    1.1    1.2    1.3    1.4    1.5    1.6    1.7    1.8    1.9    2.0
##  [21]    2.1    2.2    2.3    2.4    2.5    2.6    2.7    2.8    2.9    3.0
##  [31]    3.1    3.2    3.3    3.4    3.5    3.6    3.7    3.8    3.9    4.0
##  [41]    4.1    4.2    4.3    4.4    4.5    4.6    4.7    4.8    4.9    5.0
##  [51]    5.1    5.2    5.3    5.4    5.5    5.6    5.7    5.8    5.9    6.0
##  [61]    6.1    6.2    6.3    6.4    6.5    6.6    6.7    6.8    6.9    7.0
##  [71]    7.1    7.2    7.3    7.4    7.5    7.6    7.7    7.8    7.9    8.0
##  [81]    8.1    8.2    8.3    8.4    8.5    8.6    8.7    8.8    8.9    9.0
##  [91]  910.0  920.0  930.0  940.0  950.0  960.0  970.0  980.0  990.0 1000.0

scaleFun(1:10)

##  [1] 10.0 20.0 30.0 40.0  0.5  0.6  0.7  0.8  0.9  1.0
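
The rule stated in Task 9 (factor 10 for elements below 5 or above 90, factor 0.1 otherwise) can also be written without an explicit loop. The following is a minimal vectorised sketch using ifelse; the function name scaleVec is an assumption made for this illustration and is not part of the exercise.

```r
# Hypothetical vectorised variant (the name scaleVec is an assumption).
scaleVec <- function(x) {
    # ifelse() evaluates the condition for all elements at once and picks
    # the first value where it is TRUE and the second where it is FALSE.
    ifelse(x < 5 | x > 90, x * 10, x * 0.1)
}

scaleVec(c(1, 50, 100))
```

Here the element-wise operator | is required, because the condition is evaluated for the whole vector rather than for a single element.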

Task 11

Find a suitable marginal distribution for the Nile dataset:

data(Nile)
str(Nile)

##  Time-Series [1:100] from 1871 to 1970: 1120 1160 963 1210 1160 1160 813 1230 1370 1140 ...


plot(Nile)

hist(Nile, freq = FALSE)


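# negative log-likelihood of a log-normal distribution;
# param = c(meanlog, sdlog)
optFunLogNorm <- function(param) {
    -sum(log(dlnorm(as.vector(Nile), param[1], param[2])))
}

# minimise the negative log-likelihood (Nelder-Mead), starting at c(1, 1)
lnormFit <- optim(c(1, 1), optFunLogNorm)$par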

## Warning: NaNs produced

## Warning: NaNs produced


# density of the fitted log-normal distribution
dlnormFun <- function(x) dlnorm(x, lnormFit[1], lnormFit[2])

# log-likelihood of the fit
sum(log(dlnormFun(as.vector(Nile))))  # -654

## [1] -653.9


hist(Nile, freq = FALSE, breaks = 20)
curve(dlnormFun, add = TRUE, col = "blue")
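
For the log-normal distribution the maximum likelihood estimates also have a closed form: the mean and the standard deviation (dividing by n) of the logarithms. The following sketch can serve as a cross-check of the numerical optimisation above; the names x, meanlogHat, sdlogHat and logLik0 are assumptions chosen for this illustration.

```r
# Nile ships with base R (package datasets).
x <- as.vector(Nile)

# Closed-form maximum likelihood estimates of the log-normal parameters.
meanlogHat <- mean(log(x))
sdlogHat <- sqrt(mean((log(x) - meanlogHat)^2))  # divides by n, not n - 1
c(meanlog = meanlogHat, sdlog = sdlogHat)

# Log-likelihood at the closed-form estimates.
logLik0 <- sum(dlnorm(x, meanlogHat, sdlogHat, log = TRUE))
logLik0
```

If the optimiser has converged, its parameters and log-likelihood should be close to these values.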

Task 12

Install all required packages for the course and take a look at their demos and data.

install.packages("copula")
install.packages("evd")

demo(package="copula")
demo(package="evd")

data(package="copula")
data(package="evd")

library("evd")
data(uccle)
str(uccle)

## 'data.frame':    35 obs. of  4 variables:
##  $ day : num  33.8 27.7 60 24 72.3 50.7 18.7 41.2 26.6 27.2 ...
##  $ hour: num  14 12.8 12.9 11.9 20.6 29.1 6.2 21.1 11.2 18 ...
##  $ tmin: num  6.5 8.5 5 8.4 13.2 11.9 3.8 13 11.1 13 ...
##  $ min : num  2.5 1 0.5 0.9 1.5 4.4 1 3 3.3 2 ...

hist(uccle$hour)
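
Before installing, it can be useful to check which packages are already available. The following is a minimal sketch; the helper name missingPackages is an assumption made for illustration, and the install call is left commented out to avoid side effects.

```r
# Hypothetical helper (the name is an assumption): return those packages
# from pkgs whose namespace cannot be loaded from the current library.
missingPackages <- function(pkgs) {
    available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!available]
}

toInstall <- missingPackages(c("copula", "evd"))
toInstall

# Install only what is missing:
# if (length(toInstall) > 0) install.packages(toInstall)
```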

Load your data

Locate a data set that you would like to use throughout the course and load it into R. Save it as a clean .RData file and keep track of your current working directory. Set up a new RStudio project for the upcoming week.

triples <- read.csv("simulatedTriples.csv")
save(triples, file = "myData.RData")
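
The round trip from a CSV file to an .RData file and back can be tried out without touching your own data. The following sketch writes a small stand-in data set to a temporary folder; the file names and the object name myData are assumptions chosen for illustration.

```r
# Hypothetical stand-in for your own data set, written to a temporary folder.
csvFile <- file.path(tempdir(), "example.csv")
rdataFile <- file.path(tempdir(), "example.RData")
write.csv(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5)), csvFile,
          row.names = FALSE)

getwd()                      # relative paths are resolved against this folder
myData <- read.csv(csvFile)
save(myData, file = rdataFile)

rm(myData)                   # remove the object from the workspace ...
load(rdataFile)              # ... and restore it under its original name
str(myData)
```

Note that load restores objects under the names they had when they were saved, so no assignment is needed.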

Optional extensions:

Install optional packages:

install.packages("sp")
install.packages("spacetime")
install.packages("VineCopula")
install.packages("rgl")
install.packages("spcopula", repos="http://R-Forge.R-project.org")

Check the demo, which runs part of a multivariate return period analysis with the spcopula package:

library(spcopula)
demo("MRP")