EXE 1.1

1. Create vectors

  1. Create a vector of 20 numbers.
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1. Assign FALSE to the fifth element. What happens?
##  [1] 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  1. Transform the vector into a matrix of 4 rows and 5 columns in such a way that the last element of row 1 is 0.
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    1    1    1    0
## [2,]    1    1    1    1    1
## [3,]    1    1    1    1    1
## [4,]    1    1    1    1    1
  1. Generate a dataframe with 5 observations (i.e. rows) and two variables (i.e. columns). The first column is labeled gender and consists of 2 boys and 3 girls (boy=1, girl=2). The second column is labeled etnicity and has values: Dutch, Surinamese, Turkish, Dutch, and Moroccan.
##   gender   etnicity
## 1      1      Dutch
## 2      1 Surinamese
## 3      2    Turkish
## 4      2      Dutch
## 5      2   Moroccan
  1. Add a third column labeled age: 15, 34, 62, 12, 74 to the dataframe.
##   gender   etnicity age
## 1      1      Dutch  15
## 2      1 Surinamese  34
## 3      2    Turkish  62
## 4      2      Dutch  12
## 5      2   Moroccan  74
## 'data.frame':    5 obs. of  2 variables:
##  $ gender  : num  1 1 2 2 2
##  $ etnicity: Factor w/ 4 levels "Dutch","Moroccan",..: 1 3 4 1 2
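
One possible way to work through the steps above is sketched below. The object names (v, m, df) are illustrative and not part of the exercise. The sketch assumes the vector is filled with 1s, as in the printed output, and sets stringsAsFactors = TRUE so that etnicity becomes a factor, as the str() output suggests.

```r
# Vector of twenty 1s
v <- rep(1, 20)

# FALSE is coerced to the numeric value 0; the vector stays numeric
v[5] <- FALSE
class(v)

# Fill the matrix row by row so that element 5 ends up last in row 1
m <- matrix(v, nrow = 4, ncol = 5, byrow = TRUE)
m

# Data frame with the two requested columns
df <- data.frame(
  gender   = c(1, 1, 2, 2, 2),
  etnicity = c("Dutch", "Surinamese", "Turkish", "Dutch", "Moroccan"),
  stringsAsFactors = TRUE   # assumption: makes etnicity a factor
)

# Add a third column and inspect the structure
df$age <- c(15, 34, 62, 12, 74)
str(df)
```

Filling with byrow = TRUE is what places the 0 at the end of row 1; the default column-wise fill would put it in row 1 of column 2.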

2. Inspect data

Install the package ISwR and load it into your environment. The dataset we will work with is named juul and is provided by ISwR. Show the first six rows:

##    age menarche sex igf1 tanner testvol newvar
## 1   NA       NA  NA   90     NA      NA   <NA>
## 2   NA       NA  NA   88     NA      NA   <NA>
## 3   NA       NA  NA  164     NA      NA   <NA>
## 4   NA       NA  NA  166     NA      NA   <NA>
## 5   NA       NA  NA  131     NA      NA   <NA>
## 6 0.17       NA   1  101      1      NA      M

Show summary statistics of all variables in juul:

##       age            menarche          sex             igf1      
##  Min.   : 0.170   Min.   :1.000   Min.   :1.000   Min.   : 25.0  
##  1st Qu.: 9.053   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:202.2  
##  Median :12.560   Median :1.000   Median :2.000   Median :313.5  
##  Mean   :15.095   Mean   :1.476   Mean   :1.534   Mean   :340.2  
##  3rd Qu.:16.855   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:462.8  
##  Max.   :83.000   Max.   :2.000   Max.   :2.000   Max.   :915.0  
##  NA's   :5        NA's   :635     NA's   :5       NA's   :321    
##      tanner        testvol          newvar         
##  Min.   :1.00   Min.   : 1.000   Length:1339       
##  1st Qu.:1.00   1st Qu.: 1.000   Class :character  
##  Median :2.00   Median : 3.000   Mode  :character  
##  Mean   :2.64   Mean   : 7.896                     
##  3rd Qu.:5.00   3rd Qu.:15.000                     
##  Max.   :5.00   Max.   :30.000                     
##  NA's   :240    NA's   :859
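
The two inspection steps could look roughly like this. The sketch skips quietly when ISwR is not installed; the guard is an addition for robustness, not part of the exercise. Note that the output above contains a column newvar, which appears to have been added separately and will not show up when loading juul straight from the package.

```r
# Run once if the package is not installed yet:
# install.packages("ISwR")

if (requireNamespace("ISwR", quietly = TRUE)) {
  # Load the juul dataset from ISwR into the workspace
  data(juul, package = "ISwR")

  # head() shows the first six rows by default
  print(head(juul))

  # summary() gives per-variable summary statistics, including NA counts
  print(summary(juul))
}
```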

EXE 1.2

1. Read datafile

Set the working directory with:
setwd("….") on Windows, use / instead of \ in the path
setwd("….") on Macs, the path has no C: prefix
getwd() prints the current working directory
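
A minimal sketch of switching the working directory and switching back. tempdir() is used here only as a stand-in for your own folder path, and the variable name old is illustrative.

```r
# Remember the current working directory
old <- getwd()

# Change to another folder (tempdir() stands in for your own path)
setwd(tempdir())
getwd()

# Restore the original working directory
setwd(old)
getwd()
```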

Read the dataset using read.table(). Look up the function read.table in the help (lower right pane) or use ?read.table. Read the imdb.txt file and assign the result to a name. Inspect the dataset using head and summary:

##                   movie runtime budget     genre rating revenues screens
## 1                    42     128   40.0 Biography    7.5    15265      54
## 3                  2012     158  200.0    Action    5.8 11877060     996
## 5  (500) Days of Summer      95    7.5    Comedy    7.8   404221     140
## 7                2 Guns     109   61.0    Action    6.8  1490523     571
## 9           21 and Over      93   13.0    Comedy    5.9   125991      80
## 11       21 Jump Street     109   42.0    Action    7.2  1185398     438
##                   movie        runtime          budget      
##  (500) Days of Summer:  1   Min.   : 80.0   Min.   :  1.50  
##  2 Guns              :  1   1st Qu.: 97.0   1st Qu.: 21.00  
##  2012                :  1   Median :107.0   Median : 40.00  
##  21 and Over         :  1   Mean   :109.7   Mean   : 60.33  
##  21 Jump Street      :  1   3rd Qu.:120.0   3rd Qu.: 80.00  
##  22 Jump Street      :  1   Max.   :180.0   Max.   :250.00  
##  (Other)             :478                                   
##        genre         rating         revenues           screens      
##  Action   :179   Min.   :1.600   Min.   :   11705   Min.   :   7.0  
##  Comedy   :107   1st Qu.:5.800   1st Qu.:  481093   1st Qu.: 241.0  
##  Drama    : 58   Median :6.400   Median : 1215030   Median : 350.0  
##  Adventure: 37   Mean   :6.318   Mean   : 2191898   Mean   : 369.3  
##  Horror   : 33   3rd Qu.:7.000   3rd Qu.: 2769544   3rd Qu.: 501.0  
##  Crime    : 22   Max.   :8.800   Max.   :23845427   Max.   :1265.0  
##  (Other)  : 48                   NA's   :47         NA's   :45
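
Since imdb.txt itself is not reproduced here, the following sketch writes a tiny tab-separated stand-in file and reads it back. The separator, the header row, the file contents and the name imdb_toy are all assumptions; check the real file and ?read.table before choosing the arguments.

```r
# Write a small stand-in file (two rows taken from the output above)
tmp <- tempfile(fileext = ".txt")
writeLines(c("movie\truntime\tbudget\tgenre",
             "2 Guns\t109\t61\tAction",
             "21 and Over\t93\t13\tComedy"), tmp)

# Read it back; header = TRUE uses the first line as column names
imdb_toy <- read.table(tmp, header = TRUE, sep = "\t",
                       stringsAsFactors = TRUE)

# Inspect the result
head(imdb_toy)
summary(imdb_toy)
```

With stringsAsFactors = TRUE, text columns are read as factors, which is why summary() can report counts per genre.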

2. Explore data

What is the mean budget?

## [1] 60.33099

What percentage of the movies are Comedy movies?

## [1] 22.10744
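
Both questions can be answered with mean(). The sketch below uses a five-row toy data frame (values from the first rows shown earlier), so its results differ from the full-data answers above; the name toy is illustrative.

```r
# Toy stand-in for the imdb data
toy <- data.frame(
  budget = c(40, 200, 7.5, 61, 13),
  genre  = c("Biography", "Action", "Comedy", "Action", "Comedy")
)

# Mean budget
mean_budget <- mean(toy$budget)
mean_budget

# Share of Comedy movies: the mean of a logical vector is the proportion of TRUEs
pct_comedy <- mean(toy$genre == "Comedy") * 100
pct_comedy

# Alternative: percentages for every genre at once
prop.table(table(toy$genre)) * 100
```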

Sort budget (high to low). Look up the function ‘sort’.

##   [1] 250.0 250.0 250.0 250.0 230.0 225.0 225.0 215.0 215.0 210.0 209.0
##  [12] 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0
##  [23] 195.0 195.0 190.0 190.0 190.0 180.0 178.0 175.0 175.0 170.0 170.0
##  [34] 170.0 170.0 170.0 170.0 165.0 163.0 160.0 160.0 160.0 155.0 150.0
##  [45] 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 145.0
##  [56] 140.0 130.0 130.0 130.0 130.0 130.0 130.0 125.0 125.0 125.0 125.0
##  [67] 125.0 125.0 125.0 120.0 120.0 120.0 120.0 120.0 120.0 117.0 115.0
##  [78] 110.0 110.0 110.0 110.0 110.0 105.0 105.0 103.0 100.0 100.0 100.0
##  [89] 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
## [100] 100.0  95.0  93.0  92.0  90.0  90.0  90.0  90.0  90.0  85.0  85.0
## [111]  85.0  85.0  85.0  84.0  82.0  80.0  80.0  80.0  80.0  80.0  80.0
## [122]  80.0  80.0  80.0  80.0  79.0  79.0  78.0  75.0  75.0  75.0  75.0
## [133]  75.0  75.0  75.0  75.0  75.0  70.0  70.0  70.0  70.0  70.0  70.0
## [144]  70.0  70.0  70.0  69.0  69.0  68.0  68.0  66.0  66.0  65.0  65.0
## [155]  65.0  65.0  65.0  65.0  65.0  65.0  63.0  61.0  61.0  60.0  60.0
## [166]  60.0  60.0  60.0  60.0  60.0  60.0  60.0  60.0  60.0  60.0  60.0
## [177]  60.0  58.0  58.0  58.0  57.0  55.0  55.0  55.0  52.0  52.0  50.2
## [188]  50.0  50.0  50.0  50.0  50.0  50.0  50.0  50.0  50.0  50.0  50.0
## [199]  50.0  50.0  50.0  50.0  49.9  48.0  47.0  46.0  45.0  45.0  45.0
## [210]  45.0  44.5  44.0  43.0  42.0  42.0  42.0  42.0  40.0  40.0  40.0
## [221]  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0
## [232]  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0
## [243]  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  40.0  39.0  38.0
## [254]  38.0  38.0  38.0  38.0  38.0  38.0  37.0  37.0  37.0  36.0  36.0
## [265]  36.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0
## [276]  35.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0  35.0  34.0  33.0
## [287]  33.0  32.5  32.0  32.0  32.0  32.0  32.0  32.0  31.0  30.0  30.0
## [298]  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0
## [309]  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0  30.0
## [320]  29.0  28.0  28.0  28.0  28.0  28.0  28.0  28.0  27.0  26.0  26.0
## [331]  26.0  26.0  26.0  26.0  26.0  25.0  25.0  25.0  25.0  25.0  25.0
## [342]  25.0  25.0  25.0  25.0  25.0  25.0  25.0  25.0  25.0  25.0  25.0
## [353]  25.0  25.0  24.0  24.0  24.0  24.0  23.6  23.0  22.0  22.0  21.0
## [364]  21.0  21.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0
## [375]  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0  20.0
## [386]  20.0  20.0  20.0  20.0  20.0  20.0  19.0  19.0  19.0  18.5  18.0
## [397]  18.0  18.0  18.0  18.0  18.0  18.0  18.0  17.0  17.0  17.0  17.0
## [408]  17.0  16.0  16.0  16.0  16.0  16.0  15.0  15.0  15.0  15.0  15.0
## [419]  15.0  15.0  15.0  15.0  15.0  15.0  15.0  14.0  14.0  13.0  13.0
## [430]  13.0  13.0  13.0  12.6  12.5  12.5  12.0  12.0  12.0  12.0  12.0
## [441]  12.0  12.0  11.0  11.0  10.5  10.0  10.0  10.0   9.0   9.0   8.5
## [452]   8.0   8.0   8.0   7.5   7.5   7.0   7.0   6.6   6.5   6.0   5.0
## [463]   5.0   5.0   5.0   5.0   5.0   5.0   5.0   5.0   5.0   5.0   4.0
## [474]   4.0   3.5   3.0   3.0   3.0   3.0   2.5   2.0   2.0   1.8   1.5
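
A short sketch of sorting from high to low on a toy vector (the values are a small sample, not the full budget column):

```r
# Toy budget values
budget <- c(40, 200, 7.5, 61, 13)

# sort() returns the values themselves; decreasing = TRUE gives high to low
sorted_budget <- sort(budget, decreasing = TRUE)
sorted_budget
```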

Order the imdb data by budget (low to high). Look up the function ‘order’. Show the first 10 movies.

##                     movie runtime budget  genre rating revenues screens
## 324             Insidious      98    1.5  Drama    6.8   452340     132
## 811     The Last Exorcism     100    1.8  Drama    5.6   446086     196
## 135            Courageous     130    2.0  Drama    7.0       NA      NA
## 294           Hit and Run     100    2.0 Action    6.1       NA      NA
## 25        A Haunted House      80    2.5 Comedy    5.0   585663     346
## 180               Don Jon      90    3.0 Comedy    6.6   334566     182
## 519 Paranormal Activity 2      91    3.0 Horror    5.7  1675454     246
## 650              Sinister     110    3.0 Horror    6.8   287099      94
## 849             The Purge      85    3.0 Horror    5.6   874079     264
## 145            Dark Skies      97    3.5 Horror    6.3       NA      NA
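
Unlike sort(), order() returns row positions, which can be used to rearrange a whole data frame. A sketch on toy data (the names toy and toy_ordered are illustrative):

```r
# Toy stand-in for the imdb data
toy <- data.frame(
  movie  = c("42", "2012", "(500) Days of Summer", "2 Guns", "21 and Over"),
  budget = c(40, 200, 7.5, 61, 13)
)

# order() gives the row indices that sort budget from low to high
toy_ordered <- toy[order(toy$budget), ]

# Show the first rows (up to 10) of the reordered data
head(toy_ordered, 10)
```

The original row numbers are kept as row names, which is why the output above shows numbers such as 324 and 811 in front of each movie.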

3. Select cases

Create a subset of Mystery movies in two ways: using the function ‘subset’ and without it. Select the variables revenues, rating, screens and budget. Check that the two solutions match using the function ‘identical’:

## [1] TRUE
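
The two approaches could be sketched as follows. The data values are made up for illustration, and the names toy, sub1 and sub2 are hypothetical.

```r
# Toy stand-in for the imdb data (values are invented)
toy <- data.frame(
  genre    = c("Mystery", "Action", "Mystery", "Comedy"),
  revenues = c(100, 200, 300, 400),
  rating   = c(6.1, 7.2, 5.5, 6.8),
  screens  = c(10, 20, 30, 40),
  budget   = c(5, 50, 15, 25)
)

# Way 1: subset() with a row condition and a column selection
sub1 <- subset(toy, genre == "Mystery",
               select = c(revenues, rating, screens, budget))

# Way 2: bracket indexing with a logical row filter and column names
sub2 <- toy[toy$genre == "Mystery",
            c("revenues", "rating", "screens", "budget")]

# Check that both approaches give exactly the same object
identical(sub1, sub2)
```

If the filter column contains missing values, bracket indexing returns NA rows while subset() drops them, so the two results would then differ.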

4. Write table

To write a table to a file in a specified location, use the function ‘write.table’:

write.table(subimdb1, "…./subimdb1.csv", sep=";", dec=".", row.names=FALSE)

Try to open the file from the specified location in Excel. Then write the same table to a file with the extension .txt and open it in Excel again:

write.table(subimdb1, "…./subimdb1.txt", sep=";", dec=".", row.names=FALSE)
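
To check the result without leaving R, the written file can be read back in. The sketch below uses a toy data frame and a temporary folder in place of your own location; the names toy, out and back are illustrative.

```r
# Toy table to write (values taken from rows shown earlier)
toy <- data.frame(revenues = c(15265, 404221), rating = c(7.5, 7.8))

# Temporary location standing in for your own folder
out <- file.path(tempdir(), "subimdb1_toy.csv")

# Semicolon-separated, decimal point, no row names column
write.table(toy, out, sep = ";", dec = ".", row.names = FALSE)

# Read it back with matching separator and decimal settings
back <- read.table(out, header = TRUE, sep = ";", dec = ".")
back
```

The file extension does not change the contents; sep and dec decide how the values are laid out, which is what matters when Excel opens the file.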