Functions for data-frames

A data-frame is a fundamental data structure in R. It is a collection of variables (stored as columns) and samples (stored as rows).

Data-frames can be built by the following functions:

Function	Arguments	Description
`c()`		Combine values into a vector or list (e.g. `c(1:5)` gives: 1, 2, 3, 4, 5)
`rbind()`		Combine vectors or data-frames by rows
`cbind()`		Combine vectors or data-frames by columns
`rep()`		Replicate a list (or vector, matrix, etc.)
	`times`	number of times to repeat a list
	`length.out`	exact number of elements in the output
	`each`	number of times to repeat each element of a list
`seq()`		Create regular sequences of numbers
	`from`	Starting value of a sequence
	`to`	End value of a sequence
	`by`	Increment value of a sequence
	`length.out`	Exact length of a sequence
`sample()`		Take a random sample from a vector or list
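
The functions in the table can be tried on a small vector. The following is a minimal sketch; the object names (`v`, `m_rows`, `m_cols`, `shuffled`) and the seed value are chosen here only for illustration.

```r
# Combine values into a vector
v <- c(1, 2, 3)

# Replicate the whole vector twice, then each element twice
rep(v, times = 2)                       # 1 2 3 1 2 3
rep(v, each = 2)                        # 1 1 2 2 3 3

# Regular sequences, defined by increment or by length
seq(from = 0, to = 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00
seq(from = 1, to = 10, length.out = 4)  # 1 4 7 10

# Bind two vectors by rows or by columns
m_rows <- rbind(v, v * 10)              # matrix with 2 rows and 3 columns
m_cols <- cbind(v, v * 10)              # matrix with 3 rows and 2 columns

# Random permutation of v (the seed makes the result reproducible)
set.seed(1)
shuffled <- sample(v)
```

Note that `rbind()` and `cbind()` applied to plain vectors return a matrix; a data-frame is obtained with `data.frame()`, as shown later in this Unit.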

In this Unit, you will learn how to build a $2^3$ experimental design with three categorical factors and three responses, each of which can be affected by some experimental noise. The following table shows such a design before any noise is added:

FactorA	FactorB	FactorC	Response1	Response2	Response3
1	1	1	3	6	9
1	1	2	5	10	15
1	2	1	2	5	8
1	2	2	4	9	14
2	1	1	6	10	14
2	1	2	8	14	20
2	2	1	6	10	14
2	2	2	8	14	20

Factors

An experimenter aims to understand the effect of a factor on a measurable variable. Factors may be fixed at certain levels (e.g. “all the experiments are performed at three different temperatures”), or taken randomly (the measurable variable is recorded at different, randomly chosen temperatures). In R, factors are handled with the following functions:

Function	Arguments	Description
`factor()`		Encode a vector as a factor.
	`x`	a vector of data
	`levels`	a vector with a unique set of data.
	`labels`	a character vector of labels.
	`ordered`	logical: should the levels be treated as ordered (in the order given)?
`as.factor()`		Coerces its argument to a factor.
`is.factor()`		Returns TRUE if its argument is a factor, FALSE otherwise.
`as.ordered()`		Coerces its argument to an ordered factor.
`is.ordered()`		Returns TRUE if its argument is an ordered factor, FALSE otherwise.
`levels()`		`levels(x)` returns the value of the levels; `levels(x)<-c("a",...)` sets the attribute.
`labels()`		Find a suitable set of labels from an object, for use in printing or plotting.
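
As a brief illustration of the arguments `levels` and `ordered`, the sketch below encodes a hypothetical vector of temperature settings (the data and the object names `temp`, `f` and `f_ord` are assumptions made for this example).

```r
# Hypothetical data: temperature settings recorded as text
temp <- c("low", "high", "medium", "low", "high")

# Without `levels`, the levels are sorted alphabetically
f <- factor(temp)
levels(f)             # "high" "low" "medium"

# Give the levels explicitly and mark the factor as ordered
f_ord <- factor(temp,
                levels = c("low", "medium", "high"),
                ordered = TRUE)
is.factor(f_ord)      # TRUE
is.ordered(f_ord)     # TRUE
is.ordered(f)         # FALSE

# Ordered factors support comparisons between levels
f_ord < "high"        # TRUE FALSE TRUE TRUE FALSE

# Rename the levels in place (same order as levels(f))
levels(f) <- c("H", "L", "M")
```

Setting the levels explicitly is usually safer than relying on the alphabetical default, because the order of the levels affects printing, plotting and comparisons.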

Function `factor()`

For instance, the sequence in the column named FactorA can be reproduced with:

FactorA <- rep(seq(from = 1, 
                   to = 2, 
                   by = 1), 
               each = 4)
FactorA

## [1] 1 1 1 1 2 2 2 2

However, the object FactorA is a plain numeric vector, not a factor:

is.factor(FactorA)

## [1] FALSE

is.vector(FactorA)

## [1] TRUE

To encode such an object as a factor, use the function `factor()` or `as.factor()`:

FactorA <- factor(FactorA)
str(FactorA)

##  Factor w/ 2 levels "1","2": 1 1 1 1 2 2 2 2

Labels

It is possible to give the levels more expressive names with the argument `labels`:

FactorA <- factor(FactorA, 
                  levels = c(1, 2), 
                  labels = c("red", "green"))
str(FactorA)

##  Factor w/ 2 levels "red","green": 1 1 1 1 2 2 2 2

Although the levels are now displayed as character labels, the underlying integer codes (1 and 2) are the same as before.
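
The relation between labels and integer codes can be checked directly. This is a minimal sketch on a short vector; the object `x` is introduced here only for illustration.

```r
# A short factor with the same levels and labels used above
x <- factor(c(1, 1, 2, 2),
            levels = c(1, 2),
            labels = c("red", "green"))

levels(x)        # "red" "green"
as.integer(x)    # 1 1 2 2  (internal integer codes)
as.character(x)  # "red" "red" "green" "green"
```

The integer codes index into `levels(x)`, which is why `str()` prints numbers next to the level names.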

Responses

Experimental responses can be simulated as a linear combination of the factor levels. Moreover, it is possible to add some random noise to simulate experimental uncertainty.

Random numbers

Responses can be simulated with random numbers. In R, there are several functions for the generation of random numbers. A couple of these functions are the following:

Function	Arguments	Description
`rnorm()`		Normally distributed random numbers.
	`n`	Number of values to be drawn.
	`mean`	Mean of the Gaussian population.
	`sd`	Standard deviation of the Gaussian population.
`runif()`		Uniformly distributed random numbers.
	`n`	Number of values to be drawn.
	`min`	Lower limit of the distribution.
	`max`	Upper limit of the distribution.

Response1 <- rnorm(n = length(FactorA), 
                   mean = 10, 
                   sd = 1)
Response1

## [1] 10.594103 10.087919  9.918756 10.982557  9.927781  9.877853 10.521454
## [8]  9.820152
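
Because the values are random, the output above changes at every run. A fixed seed makes the draws reproducible, as in the following sketch; the seed value 42, the object names and the distribution parameters are arbitrary choices for this example.

```r
# Fix the seed so that the same random numbers are drawn every time
set.seed(42)
a <- rnorm(n = 3, mean = 10, sd = 1)

# Resetting the same seed reproduces the same draws
set.seed(42)
b <- rnorm(n = 3, mean = 10, sd = 1)
identical(a, b)   # TRUE

# Uniformly distributed values between -1 and 1
noise_unif <- runif(n = 8, min = -1, max = 1)
```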

Assemble a data-frame

Any number of factors and responses can be defined as shown above. Here is the code for a $2^3$ factorial experiment:

FactorA <- rep(seq(from = 1, 
                   to = 2, 
                   by = 1), 
               each = 4)
FactorB <- rep(seq(from = 1, 
                   to = 2, 
                   by = 1), 
               each = 2, 
               times = 2)
FactorC <- rep(seq(from = 1, 
                   to = 2, 
                   by = 1), 
               times = 4)
Response1 <- 2*FactorA - 
             2*FactorB + 
             2*FactorC + 
             FactorA*FactorB 
Response2 <- 3*FactorA - 
             2*FactorB + 
             4*FactorC + 
             FactorA*FactorB 
Response3 <- 4*FactorA - 
             2*FactorB + 
             6*FactorC + 
             FactorA*FactorB
df <- data.frame(A = as.factor(FactorA), 
                 B = as.factor(FactorB), 
                 C = as.factor(FactorC), 
                 R1 = Response1, 
                 R2 = Response2, 
                 R3 = Response3) 
df

##   A B C R1 R2 R3
## 1 1 1 1  3  6  9
## 2 1 1 2  5 10 15
## 3 1 2 1  2  5  8
## 4 1 2 2  4  9 14
## 5 2 1 1  6 10 14
## 6 2 1 2  8 14 20
## 7 2 2 1  6 10 14
## 8 2 2 2  8 14 20
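
The responses above are noise-free. One possible way to simulate experimental uncertainty is to add Gaussian noise to a response before assembling the data-frame. The sketch below does this for the first response only; the standard deviation (0.2), the seed and the names `R1_noisy` and `df_noisy` are assumptions made for illustration.

```r
set.seed(123)  # arbitrary seed, used only for reproducibility

# Rebuild the design columns (same values as above, written compactly)
A <- rep(1:2, each = 4)
B <- rep(1:2, each = 2, times = 2)
C <- rep(1:2, times = 4)

# Noise-free response, same formula as Response1
R1 <- 2*A - 2*B + 2*C + A*B

# Add Gaussian noise with zero mean; sd = 0.2 is an assumed value
R1_noisy <- R1 + rnorm(n = length(R1), mean = 0, sd = 0.2)

# Assemble the data-frame with the noisy response
df_noisy <- data.frame(A = as.factor(A),
                       B = as.factor(B),
                       C = as.factor(C),
                       R1 = R1_noisy)
```

The noise is added while the design columns are still numeric, because arithmetic is not defined for factors; the conversion with `as.factor()` happens only inside `data.frame()`.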

Data overview

A preliminary overview of data is given by the following functions:

Function	Description
`head()`	Shows the first rows of the dataset (six by default)
`tail()`	Shows the last rows of the dataset (six by default)
`class()`	Shows the type of data
`dim()`	Shows the dimension of the dataset (rows x columns)
`ncol()`	Number of columns
`nrow()`	Number of rows
`length()`	Number of elements in a vector
`str()`	Shows the structure of the dataset
`names()`	Shows the column names of a dataset
`summary.default()`	Shows some basic information on the dataset, such as name of columns, length and class

For instance, to check the first few rows of the data-frame:

head(df)

##   A B C R1 R2 R3
## 1 1 1 1  3  6  9
## 2 1 1 2  5 10 15
## 3 1 2 1  2  5  8
## 4 1 2 2  4  9 14
## 5 2 1 1  6 10 14
## 6 2 1 2  8 14 20

To check data structure:

str(df)

## 'data.frame':    8 obs. of  6 variables:
##  $ A : Factor w/ 2 levels "1","2": 1 1 1 1 2 2 2 2
##  $ B : Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2
##  $ C : Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2
##  $ R1: num  3 5 2 4 6 8 6 8
##  $ R2: num  6 10 5 9 10 14 10 14
##  $ R3: num  9 15 8 14 14 20 14 20

To have a summary of the data-frame:

summary.default(df)

##    Length Class  Mode   
## A  8      factor numeric
## B  8      factor numeric
## C  8      factor numeric
## R1 8      -none- numeric
## R2 8      -none- numeric
## R3 8      -none- numeric

To change the column names:

names(df) <- c("FactorA", 
               "FactorB", 
               "FactorC", 
               "Resp1", 
               "Resp2", 
               "Resp3")
df

##   FactorA FactorB FactorC Resp1 Resp2 Resp3
## 1       1       1       1     3     6     9
## 2       1       1       2     5    10    15
## 3       1       2       1     2     5     8
## 4       1       2       2     4     9    14
## 5       2       1       1     6    10    14
## 6       2       1       2     8    14    20
## 7       2       2       1     6    10    14
## 8       2       2       2     8    14    20
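
The remaining functions of the overview table can be tried in the same way. The following sketch uses a small stand-in data-frame `d` with hypothetical values, so that it can be run on its own.

```r
# Small stand-in data-frame (hypothetical values, not the design above)
d <- data.frame(g = factor(c("a", "b", "a")),
                y = c(1.5, 2.5, 3.5))

class(d)        # "data.frame"
dim(d)          # 3 2  (rows, columns)
nrow(d)         # 3
ncol(d)         # 2
length(d)       # 2: for a data-frame, length() counts the columns
names(d)        # "g" "y"
tail(d, n = 2)  # last two rows

# Per-column statistics: summary() dispatches to the data-frame method
summary(d)

# Rename a single column by position
names(d)[2] <- "response"
```

Unlike `summary.default()`, the generic `summary()` reports counts for factor columns and quartiles for numeric columns.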

Data-frames

Aim of the Unit