## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [1] 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 1 1 1 0
## [2,] 1 1 1 1 1
## [3,] 1 1 1 1 1
## [4,] 1 1 1 1 1
## gender etnicity
## 1 1 Dutch
## 2 1 Surinamese
## 3 2 Turkish
## 4 2 Dutch
## 5 2 Moroccan
## gender etnicity age
## 1 1 Dutch 15
## 2 1 Surinamese 34
## 3 2 Turkish 62
## 4 2 Dutch 12
## 5 2 Moroccan 74
## 'data.frame': 5 obs. of 2 variables:
## $ gender : num 1 1 2 2 2
## $ etnicity: Factor w/ 4 levels "Dutch","Moroccan",..: 1 3 4 1 2
Install the package ISwR and load it into the environment. We will work with the dataset juul, which ISwR provides. Show the first six rows:
## age menarche sex igf1 tanner testvol newvar
## 1 NA NA NA 90 NA NA <NA>
## 2 NA NA NA 88 NA NA <NA>
## 3 NA NA NA 164 NA NA <NA>
## 4 NA NA NA 166 NA NA <NA>
## 5 NA NA NA 131 NA NA <NA>
## 6 0.17 NA 1 101 1 NA M
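One way to approach this step is sketched below. It assumes the package can be installed from CRAN; the guard around `library()` is only there so the snippet does not stop with an error when ISwR is missing. The `newvar` column in the output above is not part of this sketch and appears to have been added separately.

```r
# install.packages("ISwR")   # run once; needs internet access

if (requireNamespace("ISwR", quietly = TRUE)) {
  library(ISwR)        # attach the package so juul is available by name
  print(head(juul))    # head() shows the first six rows by default
} else {
  message("Package ISwR is not installed; run install.packages(\"ISwR\") first.")
}
```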
Show summary statistics of all variables in juul:
## age menarche sex igf1
## Min. : 0.170 Min. :1.000 Min. :1.000 Min. : 25.0
## 1st Qu.: 9.053 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:202.2
## Median :12.560 Median :1.000 Median :2.000 Median :313.5
## Mean :15.095 Mean :1.476 Mean :1.534 Mean :340.2
## 3rd Qu.:16.855 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:462.8
## Max. :83.000 Max. :2.000 Max. :2.000 Max. :915.0
## NA's :5 NA's :635 NA's :5 NA's :321
## tanner testvol newvar
## Min. :1.00 Min. : 1.000 Length:1339
## 1st Qu.:1.00 1st Qu.: 1.000 Class :character
## Median :2.00 Median : 3.000 Mode :character
## Mean :2.64 Mean : 7.896
## 3rd Qu.:5.00 3rd Qu.:15.000
## Max. :5.00 Max. :30.000
## NA's :240 NA's :859
Set the working directory with:
setwd("….") on Windows, use / instead of \ in the path
setwd("….") on Macs, the path has no C: prefix
getwd() prints the current working directory
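A minimal sketch of the round trip, using the session's temporary folder as a stand-in for your own project folder (`demo_dir` and `old_wd` are names chosen for this illustration only):

```r
old_wd <- getwd()        # remember where we started
demo_dir <- tempdir()    # hypothetical target; replace with your own folder

setwd(demo_dir)          # change the working directory
new_wd <- getwd()        # confirm the change
print(new_wd)

setwd(old_wd)            # go back to the original directory
```

Building paths with `file.path()` avoids the slash question entirely, because it inserts forward slashes, which R accepts on every platform.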
Read the dataset using read.table(). Look up the function read.table in the help pane (lower right) or use ?read.table. Read the imdb.txt file and assign the result to a name. Inspect the dataset using head and summary:
## movie runtime budget genre rating revenues screens
## 1 42 128 40.0 Biography 7.5 15265 54
## 3 2012 158 200.0 Action 5.8 11877060 996
## 5 (500) Days of Summer 95 7.5 Comedy 7.8 404221 140
## 7 2 Guns 109 61.0 Action 6.8 1490523 571
## 9 21 and Over 93 13.0 Comedy 5.9 125991 80
## 11 21 Jump Street 109 42.0 Action 7.2 1185398 438
## movie runtime budget
## (500) Days of Summer: 1 Min. : 80.0 Min. : 1.50
## 2 Guns : 1 1st Qu.: 97.0 1st Qu.: 21.00
## 2012 : 1 Median :107.0 Median : 40.00
## 21 and Over : 1 Mean :109.7 Mean : 60.33
## 21 Jump Street : 1 3rd Qu.:120.0 3rd Qu.: 80.00
## 22 Jump Street : 1 Max. :180.0 Max. :250.00
## (Other) :478
## genre rating revenues screens
## Action :179 Min. :1.600 Min. : 11705 Min. : 7.0
## Comedy :107 1st Qu.:5.800 1st Qu.: 481093 1st Qu.: 241.0
## Drama : 58 Median :6.400 Median : 1215030 Median : 350.0
## Adventure: 37 Mean :6.318 Mean : 2191898 Mean : 369.3
## Horror : 33 3rd Qu.:7.000 3rd Qu.: 2769544 3rd Qu.: 501.0
## Crime : 22 Max. :8.800 Max. :23845427 Max. :1265.0
## (Other) : 48 NA's :47 NA's :45
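The layout of imdb.txt (separator, header row) is not shown here, so the sketch below writes a tiny tab-separated stand-in file and reads it back. The file contents, the name `imdb_demo`, and the choice of `sep = "\t"` are assumptions for illustration; adjust `header`, `sep` and `dec` to match the real file.

```r
# Hypothetical stand-in for imdb.txt: a small tab-separated file with a header
demo_file <- tempfile(fileext = ".txt")
writeLines(c("movie\truntime\tbudget\tgenre",
             "Film A\t100\t10.5\tComedy",
             "Film B\t120\t40\tAction",
             "Film C\t95\t7.5\tComedy"),
           demo_file)

# header = TRUE uses the first line as column names;
# stringsAsFactors = TRUE makes text columns factors, so summary() counts levels
imdb_demo <- read.table(demo_file, header = TRUE, sep = "\t",
                        stringsAsFactors = TRUE)

print(head(imdb_demo))     # first rows
print(summary(imdb_demo))  # per-column summaries
```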
What is the mean budget?
## [1] 60.33099
What percentage of the movies are Comedy?
## [1] 22.10744
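Both questions can be answered in one line each. The sketch below uses a small made-up data frame (`imdb_demo` is hypothetical, not the real imdb data). In the real data, the summary above lists 107 Comedy movies out of 484, which matches the 22.10744 shown.

```r
# Toy data standing in for the imdb data frame
imdb_demo <- data.frame(
  budget = c(10.5, 40, 7.5, 22),
  genre  = c("Comedy", "Action", "Comedy", "Drama")
)

# Mean of a numeric column; na.rm = TRUE skips missing values if there are any
mean_budget <- mean(imdb_demo$budget, na.rm = TRUE)

# The mean of a logical vector is the proportion of TRUE values
pct_comedy <- 100 * mean(imdb_demo$genre == "Comedy")

print(mean_budget)
print(pct_comedy)
```

An alternative for the percentage is `100 * prop.table(table(imdb_demo$genre))`, which returns the share of every genre at once.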
Sort budget from high to low. Look up the function ‘sort’.
## [1] 250.0 250.0 250.0 250.0 230.0 225.0 225.0 215.0 215.0 210.0 209.0
## [12] 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0 200.0
## [23] 195.0 195.0 190.0 190.0 190.0 180.0 178.0 175.0 175.0 170.0 170.0
## [34] 170.0 170.0 170.0 170.0 165.0 163.0 160.0 160.0 160.0 155.0 150.0
## [45] 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 150.0 145.0
## [56] 140.0 130.0 130.0 130.0 130.0 130.0 130.0 125.0 125.0 125.0 125.0
## [67] 125.0 125.0 125.0 120.0 120.0 120.0 120.0 120.0 120.0 117.0 115.0
## [78] 110.0 110.0 110.0 110.0 110.0 105.0 105.0 103.0 100.0 100.0 100.0
## [89] 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0
## [100] 100.0 95.0 93.0 92.0 90.0 90.0 90.0 90.0 90.0 85.0 85.0
## [111] 85.0 85.0 85.0 84.0 82.0 80.0 80.0 80.0 80.0 80.0 80.0
## [122] 80.0 80.0 80.0 80.0 79.0 79.0 78.0 75.0 75.0 75.0 75.0
## [133] 75.0 75.0 75.0 75.0 75.0 70.0 70.0 70.0 70.0 70.0 70.0
## [144] 70.0 70.0 70.0 69.0 69.0 68.0 68.0 66.0 66.0 65.0 65.0
## [155] 65.0 65.0 65.0 65.0 65.0 65.0 63.0 61.0 61.0 60.0 60.0
## [166] 60.0 60.0 60.0 60.0 60.0 60.0 60.0 60.0 60.0 60.0 60.0
## [177] 60.0 58.0 58.0 58.0 57.0 55.0 55.0 55.0 52.0 52.0 50.2
## [188] 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0 50.0
## [199] 50.0 50.0 50.0 50.0 49.9 48.0 47.0 46.0 45.0 45.0 45.0
## [210] 45.0 44.5 44.0 43.0 42.0 42.0 42.0 42.0 40.0 40.0 40.0
## [221] 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0
## [232] 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0
## [243] 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 40.0 39.0 38.0
## [254] 38.0 38.0 38.0 38.0 38.0 38.0 37.0 37.0 37.0 36.0 36.0
## [265] 36.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0
## [276] 35.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0 35.0 34.0 33.0
## [287] 33.0 32.5 32.0 32.0 32.0 32.0 32.0 32.0 31.0 30.0 30.0
## [298] 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0
## [309] 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0 30.0
## [320] 29.0 28.0 28.0 28.0 28.0 28.0 28.0 28.0 27.0 26.0 26.0
## [331] 26.0 26.0 26.0 26.0 26.0 25.0 25.0 25.0 25.0 25.0 25.0
## [342] 25.0 25.0 25.0 25.0 25.0 25.0 25.0 25.0 25.0 25.0 25.0
## [353] 25.0 25.0 24.0 24.0 24.0 24.0 23.6 23.0 22.0 22.0 21.0
## [364] 21.0 21.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0
## [375] 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0 20.0
## [386] 20.0 20.0 20.0 20.0 20.0 20.0 19.0 19.0 19.0 18.5 18.0
## [397] 18.0 18.0 18.0 18.0 18.0 18.0 18.0 17.0 17.0 17.0 17.0
## [408] 17.0 16.0 16.0 16.0 16.0 16.0 15.0 15.0 15.0 15.0 15.0
## [419] 15.0 15.0 15.0 15.0 15.0 15.0 15.0 14.0 14.0 13.0 13.0
## [430] 13.0 13.0 13.0 12.6 12.5 12.5 12.0 12.0 12.0 12.0 12.0
## [441] 12.0 12.0 11.0 11.0 10.5 10.0 10.0 10.0 9.0 9.0 8.5
## [452] 8.0 8.0 8.0 7.5 7.5 7.0 7.0 6.6 6.5 6.0 5.0
## [463] 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 4.0
## [474] 4.0 3.5 3.0 3.0 3.0 3.0 2.5 2.0 2.0 1.8 1.5
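A short sketch of descending sort on a made-up vector (`budget_demo` is hypothetical):

```r
budget_demo <- c(40, 7.5, 250, 10.5)

# sort() returns the values in ascending order unless decreasing = TRUE
budget_sorted <- sort(budget_demo, decreasing = TRUE)
print(budget_sorted)
```

Note that `sort()` drops missing values by default; pass `na.last = TRUE` to keep them at the end.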
Order the imdb dataset by budget. Look up the function ‘order’. Show the first 10 movies.
## movie runtime budget genre rating revenues screens
## 324 Insidious 98 1.5 Drama 6.8 452340 132
## 811 The Last Exorcism 100 1.8 Drama 5.6 446086 196
## 135 Courageous 130 2.0 Drama 7.0 NA NA
## 294 Hit and Run 100 2.0 Action 6.1 NA NA
## 25 A Haunted House 80 2.5 Comedy 5.0 585663 346
## 180 Don Jon 90 3.0 Comedy 6.6 334566 182
## 519 Paranormal Activity 2 91 3.0 Horror 5.7 1675454 246
## 650 Sinister 110 3.0 Horror 6.8 287099 94
## 849 The Purge 85 3.0 Horror 5.6 874079 264
## 145 Dark Skies 97 3.5 Horror 6.3 NA NA
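Unlike `sort()`, `order()` returns row positions, which can then be used to rearrange a whole data frame. A minimal sketch on made-up data (`imdb_demo` and `imdb_ordered` are hypothetical names):

```r
imdb_demo <- data.frame(
  movie  = c("Film A", "Film B", "Film C", "Film D"),
  budget = c(40, 7.5, 250, 10.5)
)

# order() gives the row indices that put budget in ascending order
imdb_ordered <- imdb_demo[order(imdb_demo$budget), ]

# head(x, 10) shows up to the first 10 rows
print(head(imdb_ordered, 10))
```

The original row names are kept, which is why the output above shows numbers such as 324 and 811 in the first column.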
Create a subset of Mystery movies in two ways: with the function ‘subset’ and without it. Select the variables revenues, rating, screens and budget. Check that both solutions match using the function ‘identical’:
## [1] TRUE
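The two approaches could look like the sketch below, shown on a made-up data frame (`imdb_demo`, `sub1` and `sub2` are names chosen for this illustration):

```r
imdb_demo <- data.frame(
  movie    = c("Film A", "Film B", "Film C", "Film D"),
  genre    = c("Mystery", "Comedy", "Mystery", "Action"),
  revenues = c(100, 200, NA, 400),
  rating   = c(6.5, 7.0, 5.5, 8.0),
  screens  = c(10, 20, 30, 40),
  budget   = c(1, 2, 3, 4)
)

# Way 1: subset() with a row condition and a column selection
sub1 <- subset(imdb_demo, genre == "Mystery",
               select = c(revenues, rating, screens, budget))

# Way 2: plain bracket indexing with a logical row filter and column names
sub2 <- imdb_demo[imdb_demo$genre == "Mystery",
                  c("revenues", "rating", "screens", "budget")]

print(identical(sub1, sub2))
```

The two results can differ when the filter column contains missing values: `subset()` drops rows where the condition is NA, whereas bracket indexing returns them as rows filled with NA.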
To write a table to a file at a specified location, use the function ‘write.table’:
write.table(subimdb1, "…./subimdb1.csv", sep = ";", dec = ".", row.names = FALSE)
Try to open the file from the specified location in Excel. Then write the same table with the extension .txt and open it in Excel again:
write.table(subimdb1, "…./subimdb1.txt", sep = ";", dec = ".", row.names = FALSE)
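To check what ends up in the file without leaving R, the table can be written to a temporary location and read back. This is a sketch with made-up data; `sub_demo`, `out_csv` and `back` are hypothetical names, and the temporary folder stands in for your own path.

```r
sub_demo <- data.frame(revenues = c(100.5, 200.25), rating = c(6.5, 7.1))

# Write with ';' as field separator and '.' as decimal mark, without row names
out_csv <- file.path(tempdir(), "sub_demo.csv")
write.table(sub_demo, out_csv, sep = ";", dec = ".", row.names = FALSE)

# Read it back with the same settings to confirm the round trip
back <- read.table(out_csv, header = TRUE, sep = ";", dec = ".")
print(back)
```

The file extension does not change the contents; it only affects which import path a spreadsheet program chooses when opening the file.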