Data Tables In R and Example Subsetting

 

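1. A data.table inherits from data.frame, so functions that accept a data frame generally accept a data.table as well.

2. data.table is written in C (I am learning that language too!)

3. It is typically much faster than a data frame at subsetting, grouping, and updating, especially on larger data.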
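As a quick check of the first point, here is a minimal sketch. The table `dt` and its columns are made up for illustration and are not part of the exercise below.

```r
library(data.table)

# Hypothetical toy table, used only for this illustration
dt <- data.table(id = 1:3, score = c(2.5, 3.0, 4.5))

class(dt)          # "data.table" "data.frame"
is.data.frame(dt)  # TRUE

# Functions written for data frames can be called on it directly
nrow(dt)           # 3
summary(dt$score)
```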

Load the package from the library before using it.

In this exercise, let's first create a data frame to use in our R program.

library(data.table)
My_Dataframe=data.frame(x=rnorm(9), y=rep(c("a","b","c"), each=3), z=rnorm(9))
head(My_Dataframe)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
str(My_Dataframe)
## 'data.frame':    9 obs. of  3 variables:
##  $ x: num  0.505 -2.718 -0.147 0.369 -1.17 ...
##  $ y: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3 3
##  $ z: num  1.605 -0.964 -1.225 1.951 -0.255 ...

To see all of the data tables in memory, use the command:

tables()
## No objects of class data.table exist in .GlobalEnv
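The message above appears because My_Dataframe is a plain data frame, so there is no data.table to list yet. A minimal sketch of converting it follows; the name `My_DT` and the `set.seed()` call are my own additions, so the values will differ from the output shown earlier.

```r
library(data.table)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
My_Dataframe <- data.frame(x = rnorm(9),
                           y = rep(c("a", "b", "c"), each = 3),
                           z = rnorm(9))

# as.data.table() returns a data.table copy of the data frame
My_DT <- as.data.table(My_Dataframe)  # My_DT is a hypothetical name

# tables() should now list My_DT with its row count and column names
tables()
```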

Subsetting rows:

DT=My_Dataframe #note: DT is still a plain data frame here, not a data.table
head(DT)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
DT[1,] #print row 1
##        x y     z
## 1 0.5049 a 1.605
DT[2,] #print row 2
##        x y       z
## 2 -2.718 a -0.9644
DT[DT$z>1,] #print rows where z is greater than 1
##        x y     z
## 1 0.5049 a 1.605
## 4 0.3693 b 1.951
DT[DT$y=="c",] #print rows where y="c"
##          x y       z
## 7 -0.07568 c  0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
#notice that the following code is not subsetting rows; it selects a single element from the data frame.
DT[1, 3] #print the element in row 1, column 3
## [1] 1.605
DT[c(1:5,3),] #print rows 1 to 5, then row 3 again (all columns)
##           x y       z
## 1    0.5049 a  1.6050
## 2   -2.7178 a -0.9644
## 3   -0.1471 a -1.2249
## 4    0.3693 b  1.9505
## 5   -1.1704 b -0.2547
## 3.1 -0.1471 a -1.2249
DT[c(3:5),c(1:2)] #print rows 3 to 5 and the first 2 columns
##         x y
## 3 -0.1471 a
## 4  0.3693 b
## 5 -1.1704 b
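The row examples above run on a plain data frame, because DT was assigned directly from My_Dataframe. As a rough sketch of how the same selections might look on an actual data.table: the object `DT2` and the seed are assumptions of mine, so the values are not the ones printed above.

```r
library(data.table)

set.seed(1)  # assumption: seed added for reproducibility
DT2 <- data.table(x = rnorm(9),
                  y = rep(c("a", "b", "c"), each = 3),
                  z = rnorm(9))

DT2[2]          # a single index selects rows, so this is row 2
DT2[c(2, 3)]    # rows 2 and 3
DT2[y == "c"]   # columns can be named directly inside [ ]
DT2[z > 1]      # same idea as DT[DT$z > 1, ] on a data frame
```

With a data frame, a single index such as `DT[2]` would select the second column instead, which is one of the more visible differences between the two classes.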

Subsetting columns:

DT=My_Dataframe #note: DT is still a plain data frame here, not a data.table
head(DT)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
DT[,1] #print column 1
## [1]  0.50492 -2.71782 -0.14713  0.36926 -1.17045 -1.85085 -0.07568 -0.94607
## [9] -0.47799
DT[,2] #print column 2
## [1] a a a b b b c c c
## Levels: a b c
DT[DT$y=="c",] #this filters rows where y="c" and keeps all columns
##          x y       z
## 7 -0.07568 c  0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
#notice that the following code is not subsetting columns; it selects a single element from the data frame.
DT[1, 3] #print the element in row 1, column 3
## [1] 1.605
DT[,c(1:2)] #print the first 2 columns, all rows
##          x y
## 1  0.50492 a
## 2 -2.71782 a
## 3 -0.14713 a
## 4  0.36926 b
## 5 -1.17045 b
## 6 -1.85085 b
## 7 -0.07568 c
## 8 -0.94607 c
## 9 -0.47799 c
DT[c(3:5),c(1:3)] #print rows 3 to 5 and columns 1 to 3
##         x y       z
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
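The column examples above also use a plain data frame, where `DT[,1]` returns a vector. A minimal sketch of the data.table equivalents follows, assuming data.table 1.9.8 or later; `DT2` and the seed are hypothetical additions.

```r
library(data.table)

set.seed(1)  # assumption: seed added for reproducibility
DT2 <- data.table(x = rnorm(9),
                  y = rep(c("a", "b", "c"), each = 3),
                  z = rnorm(9))

DT2[, 1]            # one-column data.table, not a vector
DT2[, .(x, y)]      # .() is an alias for list(); picks columns by name
DT2[, c("x", "y")]  # a character vector of names also works
DT2[["x"]]          # extract a single column as a plain vector
```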

Column subsetting in data.table:

k={print(100); 55}
## [1] 100
k
## [1] 55
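The braces example shows that an R expression evaluates to its last value, and that matters because the argument after the comma in a data.table is evaluated as an expression with the columns available as variables. A small sketch with made-up values, where `DT2` is a hypothetical table:

```r
library(data.table)

# Hypothetical table with fixed values so the results are easy to check
DT2 <- data.table(x = c(1, 2, 3, 4),
                  y = c("a", "a", "b", "b"),
                  z = c(10, 20, 30, 40))

# Columns are referred to by name; the result has columns V1 and V2
DT2[, list(mean(x), sum(z))]   # V1 = 2.5, V2 = 100

# Any function of the columns can be called
DT2[, table(y)]                # counts for each value of y

# A braced expression works too; its last value is returned
DT2[, {total <- sum(z); total / .N}]   # 25
```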

 
