Generate objects

Everything in R is an object. You create an object by assigning a value (a number, a word, a vector, and so on) to a name with the assignment operator ‘<-’. Functions are objects too, so avoid names that already belong to functions (such as typeof or mean).

R<-4

Print an object by typing its name:

R
## [1] 4

R modes

The R object created above looks like an integer (a number without decimals), but R stores plain numbers as double: numeric values with double precision, which may have decimals. To get a true integer, add the suffix L (4L). Other modes are character and logical (TRUE, FALSE). typeof() is a function that shows how an object is stored. You can overwrite any object. Quotation marks are used to identify words (or characters, as R calls them).

typeof(R) 
## [1] "double"
h <- "hello" 
h
## [1] "hello"
typeof(h)
## [1] "character"
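The difference between double and integer can be checked directly. The following is a minimal sketch; the names x and y are only examples and do not appear elsewhere in this text.

```r
x <- 4        # a plain number is stored as a double
y <- 4L       # the suffix L asks for an integer
typeof(x)
## [1] "double"
typeof(y)
## [1] "integer"
is.numeric(x) && is.numeric(y)  # both still count as numeric
## [1] TRUE
typeof(TRUE)
## [1] "logical"
```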

An object can be anything from a word to a vector to a matrix. The vector is the basic object in R; even a single value is a vector with one element. A vector can be created in several ways and can consist of numeric, character, or logical values (TRUE, FALSE). In the example below, matrix() is a function and nrow= is one of its arguments. The symbol # starts a comment. To comment out several lines at once in RStudio, select them and press Ctrl+Shift+C.

word <-"R is fun" 
word
## [1] "R is fun"
vec <- 0:5 # the colon means 'from 0 up to 5'
vec
## [1] 0 1 2 3 4 5
c(0,1,2,3,4,5) # gives the same values as 0:5
## [1] 0 1 2 3 4 5
rep(1,6) # 6 times 1
## [1] 1 1 1 1 1 1
mat<-matrix(vec, nrow=3, ncol=2, byrow=TRUE) 
mat
##      [,1] [,2]
## [1,]    0    1
## [2,]    2    3
## [3,]    4    5
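A few more ways to build vectors and matrices, as a small sketch. The name v is only an example. Note that the colon creates integers, while c() with plain numbers creates doubles, and that matrix() fills column by column unless byrow=TRUE is given.

```r
seq(0, 10, by = 2)             # from 0 to 10 in steps of 2
## [1]  0  2  4  6  8 10
rep(c("a", "b"), times = 2)    # repeat a whole vector
## [1] "a" "b" "a" "b"
v <- 0:5
matrix(v, nrow = 3, ncol = 2)  # default: filled column by column
##      [,1] [,2]
## [1,]    0    3
## [2,]    1    4
## [3,]    2    5
typeof(0:5)
## [1] "integer"
typeof(c(0, 1, 2, 3, 4, 5))
## [1] "double"
```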

Dataframes

A matrix can hold only one mode. To combine vectors of different modes in one table, you need a dataframe object. In the following example, I add a character vector to the matrix created before.

sex<-c("F","M","F")
df<-data.frame(mat, sex, stringsAsFactors=TRUE) # since R 4.0, strings become factors only on request
df
##   X1 X2 sex
## 1  0  1   F
## 2  2  3   M
## 3  4  5   F

To learn more about the dataframe, the following functions are useful:

colnames(df) # get column names
## [1] "X1"  "X2"  "sex"
colnames(df)[3] # get third column name
## [1] "sex"
head(df) # see first 6 lines of dataframe
##   X1 X2 sex
## 1  0  1   F
## 2  2  3   M
## 3  4  5   F
summary(df) # get information on variables
##        X1          X2    sex  
##  Min.   :0   Min.   :1   F:2  
##  1st Qu.:1   1st Qu.:2   M:1  
##  Median :2   Median :3        
##  Mean   :2   Mean   :3        
##  3rd Qu.:3   3rd Qu.:4        
##  Max.   :4   Max.   :5
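Some other functions for inspecting a dataframe, sketched on a rebuilt copy of the example so that the block runs on its own. The names mat_ex and df_ex and the new column names age and score are hypothetical.

```r
mat_ex <- matrix(0:5, nrow = 3, ncol = 2, byrow = TRUE)
df_ex <- data.frame(mat_ex, sex = c("F", "M", "F"), stringsAsFactors = TRUE)
dim(df_ex)     # number of rows and columns
## [1] 3 3
nrow(df_ex)    # number of rows only
## [1] 3
str(df_ex)     # compact overview of the mode of each column
colnames(df_ex)[1:2] <- c("age", "score")  # rename the first two columns
colnames(df_ex)
## [1] "age"   "score" "sex"
```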

In dataframes, where the columns have names, you can also use $ to get a column. This can be used to recode variables or to make selections in the dataframe. Because the names are stored as characters, you need quotation marks when you select a column by name inside square brackets instead of with $.

df$sex
## [1] F M F
## Levels: F M
df[,"sex"]
## [1] F M F
## Levels: F M
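As a sketch of how $ can be used to add or recode variables: the dataframe df_ex and the new columns total and female are made up for this illustration.

```r
df_ex <- data.frame(X1 = c(0, 2, 4), X2 = c(1, 3, 5), sex = c("F", "M", "F"))
df_ex$total <- df_ex$X1 + df_ex$X2              # add a new column
df_ex$female <- ifelse(df_ex$sex == "F", 1, 0)  # recode sex into a 0/1 variable
df_ex
##   X1 X2 sex total female
## 1  0  1   F     1      1
## 2  2  3   M     5      0
## 3  4  5   F     9      1
```

Assigning to a column name that does not exist yet creates the column; assigning to an existing name overwrites it.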

Calculation & combination

You can calculate with objects. The usual operators (+, -, *, /) work element by element; for matrix multiplication you need %*%. A - in front of an element, row, or column number excludes that element, row, or column from the result; the object itself is not changed.

vec/5
## [1] 0.0 0.2 0.4 0.6 0.8 1.0
mat*10
##      [,1] [,2]
## [1,]    0   10
## [2,]   20   30
## [3,]   40   50
mat[,-3]%*%rep(10,2) # mat has only two columns, so -3 excludes nothing here
##      [,1]
## [1,]   10
## [2,]   50
## [3,]   90
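The difference between element-wise and matrix multiplication, and the effect of a negative index, can be sketched as follows. The matrix m is an example copy with the same values as mat.

```r
m <- matrix(0:5, nrow = 3, ncol = 2, byrow = TRUE)
m * m            # element by element
##      [,1] [,2]
## [1,]    0    1
## [2,]    4    9
## [3,]   16   25
m %*% c(1, 1)    # matrix multiplication: here the row sums
##      [,1]
## [1,]    1
## [2,]    5
## [3,]    9
m[, -1]          # everything except the first column
## [1] 1 3 5
m[-2, ]          # everything except the second row
##      [,1] [,2]
## [1,]    0    1
## [2,]    4    5
dim(m)           # m itself still has 3 rows and 2 columns
## [1] 3 2
```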

There are several functions available to add elements or vectors to an existing R object.

c(vec,6) # c to combine
## [1] 0 1 2 3 4 5 6
c(h,"world") # can also be used for other modes
## [1] "hello" "world"
cbind(mat,rep(10,3)) # cbind is combine for columns
##      [,1] [,2] [,3]
## [1,]    0    1   10
## [2,]    2    3   10
## [3,]    4    5   10
rbind(mat, rep(10,2)) # rbind is combine for rows
##      [,1] [,2]
## [1,]    0    1
## [2,]    2    3
## [3,]    4    5
## [4,]   10   10

Indexing

You can change objects or parts of them (called elements). A vector can have only one mode: if you mix modes, R converts all elements to the most general mode present (logical, then numeric, then character), so a single character element turns the whole vector into character. To get elements from objects you need square brackets. For a matrix you specify the row and column number, separated by a comma. If you leave one of them out, you get all rows or all columns. Please note that round brackets are used for functions.

vec[3] # get the third element
## [1] 2
vec[3] <- word
vec
## [1] "0"        "1"        "R is fun" "3"        "4"        "5"
typeof(vec)
## [1] "character"
mat[3,2] # get the second element of the third row
## [1] 5
mat[3,] # select third row
## [1] 4 5
mat[,1] # select first column
## [1] 0 2 4
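The conversion between modes can be sketched with a few small vectors. The name v is only an example.

```r
typeof(c(TRUE, 2L))      # logical and integer
## [1] "integer"
typeof(c(2L, 3.5))       # integer and double
## [1] "double"
typeof(c(3.5, "a"))      # double and character
## [1] "character"
v <- 0:5
v[3] <- 10               # replacing with a number keeps the vector numeric
typeof(v)
## [1] "double"
as.numeric(c("1", "2"))  # convert characters back to numbers
## [1] 1 2
```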

To select data from a dataframe, you use comparison operators. For an overview of these operators, please have a look at the overviewcommands document. Here I use the double equal sign to select rows where sex is exactly equal to the character F. Less than is <, more than is >, less than or equal to is <=, and more than or equal to is >=.

df[df$sex=="F",] 
##   X1 X2 sex
## 1  0  1   F
## 3  4  5   F
df[df$X1>=2,]
##   X1 X2 sex
## 2  2  3   M
## 3  4  5   F
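Comparisons return a logical vector, and several conditions can be combined with & (and) and | (or). A minimal sketch on a made-up copy of the dataframe, here called df_ex:

```r
df_ex <- data.frame(X1 = c(0, 2, 4), X2 = c(1, 3, 5), sex = c("F", "M", "F"))
df_ex$X1 <= 2                              # TRUE or FALSE for each row
## [1]  TRUE  TRUE FALSE
df_ex[df_ex$sex == "F" & df_ex$X1 > 0, ]   # both conditions must hold
##   X1 X2 sex
## 3  4  5   F
df_ex[df_ex$sex != "F" | df_ex$X2 < 2, ]   # at least one condition must hold
##   X1 X2 sex
## 1  0  1   F
## 2  2  3   M
subset(df_ex, X1 >= 2, select = c(X1, sex))  # subset() selects rows and columns
##   X1 sex
## 2  2   M
## 3  4   F
```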

Read files

Most data is stored in text files or comma-separated values (csv) files. In European files the values are usually separated by a semicolon instead of a comma, because the comma is used as the decimal mark in numbers. Be aware whether your data uses American or European number notation. There are many functions to read csv files directly, but they do not always do what you want (e.g. read.csv() stores numeric values that use a comma as decimal mark as characters). Functions that translate SPSS (sav) files to R should also be used carefully, as SPSS stores variables using both numbers and labels. In R this is only possible using factors (see ?factor). Therefore, we use the most elementary function to read a table, read.table(), to read the imdb text file.

dta<-read.table("X:/Data/Users/Meike/summerschool/R/imdb.txt", header=TRUE, sep=";")
head(dta)
##                  movie runtime budget     genre rating revenues screens
## 1                   42     128   40.0 Biography    7.5    15265      54
## 2                 2012     158  200.0    Action    5.8 11877060     996
## 3 (500) Days of Summer      95    7.5    Comedy    7.8   404221     140
## 4               2 Guns     109   61.0    Action    6.8  1490523     571
## 5          21 and Over      93   13.0    Comedy    5.9   125991      80
## 6       21 Jump Street     109   42.0    Action    7.2  1185398     438

The only arguments treated here are the column names (header=TRUE) and the separator (sep=";"). To see all arguments of the function, use ?read.table.
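For data in European notation, read.table() also has a dec argument for the decimal mark. The sketch below uses the text argument to read from a string instead of a file, so that it runs without the imdb file; the two films and their values are invented for the illustration.

```r
# Invented example data: semicolon as separator, comma as decimal mark
raw <- "movie;runtime;rating
Film A;128;7,5
Film B;158;5,8"
films <- read.table(text = raw, header = TRUE, sep = ";", dec = ",")
films$rating
## [1] 7.5 5.8
typeof(films$rating)   # read as numbers, not as characters
## [1] "double"
```

With a real file, the file path takes the place of the text argument.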