Data Table
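1. A data.table is also a data.frame, so functions that work on data frames can generally be applied to a data.table as well.
2. data.table is written largely in C (I am learning that language too!).
3. It is much faster than a plain data.frame for common operations such as subsetting, grouping and joining, especially on large data.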
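A quick way to check point 1 is to look at the class of a data.table. It lists "data.frame" as a parent class, so functions written for data frames accept it. A minimal sketch; the object name `dt` and its values are only for illustration:

```r
library(data.table)

# A small data.table built directly; the name `dt` is only for illustration
dt <- data.table(x = c(2, 4, 6), y = c("a", "b", "c"))

class(dt)          # "data.table" "data.frame"
is.data.frame(dt)  # TRUE: a data.table is also a data.frame
nrow(dt)           # 3, exactly as for a data.frame
summary(dt$x)      # base R functions accept its columns as usual
as.data.frame(dt)  # converts back to a plain data.frame when needed
```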
Load data.table from the library. In this exercise, let's first create a data frame to use in our R program.
library(data.table)
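My_Dataframe=data.frame(x=rnorm(9), y=rep(c("a","b","c"), each=3), z=rnorm(9), stringsAsFactors=TRUE) # stringsAsFactors=TRUE keeps y a factor on R >= 4.0, matching the str() output below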
head(My_Dataframe)
## x y z
## 1 0.5049 a 1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4 0.3693 b 1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b 0.3293
str(My_Dataframe)
## 'data.frame': 9 obs. of 3 variables:
## $ x: num 0.505 -2.718 -0.147 0.369 -1.17 ...
## $ y: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3 3
## $ z: num 1.605 -0.964 -1.225 1.951 -0.255 ...
To see all of the data tables in memory, use the command:
tables()
## No objects of class data.table exist in .GlobalEnv
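`tables()` reports nothing here because `My_Dataframe` is still a plain data.frame; only objects of class data.table are listed. A minimal sketch of converting it with `as.data.table()`. The name `My_DataTable` is illustrative, and `set.seed(1)` is added only for reproducibility, so the random numbers will differ from the output above:

```r
library(data.table)

set.seed(1)  # added for reproducibility; not part of the original exercise
My_Dataframe <- data.frame(x = rnorm(9), y = rep(c("a", "b", "c"), each = 3),
                           z = rnorm(9), stringsAsFactors = TRUE)

# as.data.table() returns a converted copy; setDT() would convert in place
My_DataTable <- as.data.table(My_Dataframe)

class(My_DataTable)  # "data.table" "data.frame"
tables()             # now lists My_DataTable with its rows, columns and size
```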
Subsetting rows:
DT=My_Dataframe # DT is still a plain data.frame here, so the subsetting below is data.frame subsetting
head(DT)
## x y z
## 1 0.5049 a 1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4 0.3693 b 1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b 0.3293
DT[1,] #print row 1
## x y z
## 1 0.5049 a 1.605
DT[2,] #print row 2
## x y z
## 2 -2.718 a -0.9644
DT[DT$z>1,] #print rows where z is greater than 1
## x y z
## 1 0.5049 a 1.605
## 4 0.3693 b 1.951
DT[DT$y=="c",] #print rows where y="c"
## x y z
## 7 -0.07568 c 0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
# Note: the following code is not subsetting; it selects a single element from the data frame.
DT[1, 3] # print the element in row 1, column 3
## [1] 1.605
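On an actual data.table the same call behaves differently: a numeric `j` selects columns, so `DT[1, 3]` returns a one-row, one-column data.table rather than a bare number. To get the single value, name the column in `j`. A sketch assuming a data.table release of 1.9.8 or later; the name `DTable` and its values are illustrative:

```r
library(data.table)

# Small illustrative table (values typed by hand, not from the exercise)
DTable <- data.table(x = c(0.5, -2.7, -0.1),
                     y = c("a", "a", "a"),
                     z = c(1.6, -0.96, -1.22))

DTable[1, 3]    # a 1 x 1 data.table holding column z
DTable[1, z]    # the bare value 1.6 (j is evaluated as an expression)
DTable[[3]][1]  # also 1.6: extract column 3 as a vector, then element 1
```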
DT[c(1:5,3),] # print rows 1 to 5, then row 3 again (the repeated row is labelled 3.1)
## x y z
## 1 0.5049 a 1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4 0.3693 b 1.9505
## 5 -1.1704 b -0.2547
## 3.1 -0.1471 a -1.2249
DT[3:5, 1:2] # print rows 3 to 5, columns 1 and 2
## x y
## 3 -0.1471 a
## 4 0.3693 b
## 5 -1.1704 b
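The same row subsets can be written in data.table syntax once the data is a data.table. Inside `DT[...]`, column names can be used directly (no `DT$` prefix), and the trailing comma can be dropped. A minimal sketch; `DT2` is an illustrative name and its values are copied, rounded, from the printed output above:

```r
library(data.table)

DT2 <- data.table(x = c(0.50, -2.72, -0.15, 0.37, -1.17, -1.85, -0.08, -0.95, -0.48),
                  y = rep(c("a", "b", "c"), each = 3),
                  z = c(1.61, -0.96, -1.22, 1.95, -0.25, 0.33, 0.53, -0.50, -0.51))

DT2[1]             # row 1 (on a data.frame, DT[1] would select column 1 instead)
DT2[z > 1]         # rows where z is greater than 1, no DT2$ needed
DT2[y == "c"]      # rows where y equals "c"
DT2[3:5, .(x, y)]  # rows 3 to 5, columns x and y
```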
Subsetting columns:
DT=My_Dataframe
head(DT)
## x y z
## 1 0.5049 a 1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4 0.3693 b 1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b 0.3293
DT[,1] #print column 1
## [1] 0.50492 -2.71782 -0.14713 0.36926 -1.17045 -1.85085 -0.07568 -0.94607
## [9] -0.47799
DT[,2] #print column 2
## [1] a a a b b b c c c
## Levels: a b c
DT[DT$y=="c",] # note: this subsets rows, not columns: all rows where y is "c"
## x y z
## 7 -0.07568 c 0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
# Note: the following code is not subsetting; it selects a single element from the data frame.
DT[1, 3] # print the element in row 1, column 3
## [1] 1.605
DT[, 1:2] # print columns 1 and 2, all rows
## x y
## 1 0.50492 a
## 2 -2.71782 a
## 3 -0.14713 a
## 4 0.36926 b
## 5 -1.17045 b
## 6 -1.85085 b
## 7 -0.07568 c
## 8 -0.94607 c
## 9 -0.47799 c
DT[3:5, 1:3] # print rows 3 to 5, all 3 columns
## x y z
## 3 -0.1471 a -1.2249
## 4 0.3693 b 1.9505
## 5 -1.1704 b -0.2547
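For comparison, column selection on a data.table always goes through `j`. A minimal sketch assuming a reasonably recent data.table release (the `..` prefix needs 1.10.2 or later); `DT3` and `cols` are illustrative names:

```r
library(data.table)

DT3 <- data.table(x = c(0.50, -2.72, -0.15), y = c("a", "a", "a"), z = c(1.61, -0.96, -1.22))

DT3[, z]            # column z as a plain vector
DT3[, .(x, z)]      # columns x and z as a data.table; .() is an alias for list()
DT3[, c("x", "y")]  # select by name with a character vector
DT3[, 1:2]          # select by position; returns a data.table, not a vector
cols <- c("x", "z")
DT3[, ..cols]       # columns named in a variable (.. looks outside DT3)
```

Unlike a data.frame, `DT3[, 1]` here would return a one-column data.table rather than dropping to a vector.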
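Column subsetting in data.table:
The j argument of DT[i, j] can be any R expression. As a warm-up, note that a block in braces { } runs every statement in order and returns the value of the last one, so the code below prints 100 but assigns 55 to k: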
k={print(100); 55}
## [1] 100
k
## [1] 55
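The same idea carries over to `j` in `DT[i, j]`: a braced block can run several statements, and the value of its last expression is what `DT[...]` returns. A minimal sketch; `DT4`, `total` and the values are illustrative, and `by` is added to show the block being evaluated once per group:

```r
library(data.table)

DT4 <- data.table(y = rep(c("a", "b"), each = 2), z = c(1, 2, 3, 4))

# Every statement in the braces runs; only the last value, sum(z), is returned
total <- DT4[, {print(mean(z)); sum(z)}]  # prints 2.5, total is 10

# With `by`, the braced block runs once per group of y
DT4[, {n <- .N; list(total = sum(z), rows = n)}, by = y]
```

The grouped call returns one row per value of `y` (a: total 3, rows 2; b: total 7, rows 2).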