VARIOUS LOOPING TECHNIQUES in R with EXAMPLES

 

In this text, we are going to go over some of the most useful functions for looping in R. Mastering these tools (functions) will make you a more efficient R programmer. I use R for various geologic/geochemical and hydrogeologic investigations, and I use the functions discussed here a lot. Although the examples in this write-up do not produce any graphs, the same looping functions can easily be used to produce graphs.

lapply(X, FUN, ...)

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

replicate(n, expr, simplify = "array")

simplify2array(x, higher = TRUE)
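replicate and simplify2array appear in the usage list above but are not demonstrated later in this text. Below is a minimal sketch of both; the seed and the numbers of repetitions are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# replicate() re-evaluates an expression n times: here, 3 means of 5 random normals
sample_means <- replicate(3, mean(rnorm(5)))
sample_means          # a numeric vector of length 3

# When each evaluation returns a vector, the results are simplified to a matrix
draws <- replicate(4, runif(2))
dim(draws)            # 2 rows (values per draw) x 4 columns (repetitions)

# simplify2array() is the helper that performs this kind of simplification:
# a list of equal-length vectors becomes a matrix with one column per element
simplify2array(list(a = 1:3, b = 4:6))
```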

lapply

Apply a function over a list or vector. lapply loops over a list item by item; the actual looping is done in C.

str(lapply)
## function (X, FUN, ...)
object=list(a=1:10, b=rnorm(10))
object
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $b
##  [1] -0.45736 -1.00367  1.61543  0.12046 -0.03302 -0.49324 -0.74551
##  [8]  0.85625 -0.81216 -0.18206
lapply(object, sd)
## $a
## [1] 3.028
## 
## $b
## [1] 0.8123

lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X.
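To make "loop over a list, item by item" concrete, here is a rough for-loop equivalent of lapply(object, sd). It is a sketch for illustration only, not how lapply is actually implemented (the real looping happens in C), and the list contents are made up.

```r
# Made-up list, similar in shape to the one used above
object <- list(a = 1:10, b = c(2, 4, 4, 4, 5, 5, 7, 9))

result <- vector("list", length(object))  # preallocate an empty list of the same length
names(result) <- names(object)            # lapply keeps the names of X
for (i in seq_along(object)) {
  result[[i]] <- sd(object[[i]])          # apply FUN to each element in turn
}

identical(result, lapply(object, sd))     # TRUE
```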

object=1:10
object
##  [1]  1  2  3  4  5  6  7  8  9 10
lapply(object, runif, min=0, max=10) #runif(n) generates n uniform random numbers, so element i holds i numbers between 0 and 10
## [[1]]
## [1] 8.837
## 
## [[2]]
## [1] 8.957 3.166
## 
## [[3]]
## [1] 4.177 3.054 2.045
## 
## [[4]]
## [1] 8.055 7.978 7.803 2.051
## 
## [[5]]
## [1] 5.077 4.916 1.499 8.946 5.967
## 
## [[6]]
## [1] 3.674 1.465 3.482 2.330 7.211 4.193
## 
## [[7]]
## [1] 3.925 2.792 6.544 3.256 6.603 4.842 1.278
## 
## [[8]]
## [1] 5.086 1.168 2.051 3.887 2.210 5.409 8.314 7.609
## 
## [[9]]
## [1] 6.3217 9.3295 0.8053 8.2122 7.5664 6.2449 7.9882 4.4039 6.1933
## 
## [[10]]
##  [1] 1.2048 4.7770 0.1617 4.6686 6.4722 2.3602 5.1982 0.5486 4.4962 4.0775

Example of anonymous functions within lapply:

object=list(a=matrix(1:4,2,2), b=matrix(1:6, 3,2))
object
## $a
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## $b
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
lapply(object, function(m) m[,1]) # anonymous function: first column of each matrix
## $a
## [1] 1 2
## 
## $b
## [1] 1 2 3
lapply(object, function(m) m[1,]) # anonymous function: first row of each matrix
## $a
## [1] 1 3
## 
## $b
## [1] 1 4

sapply

sapply is a user-friendly version and wrapper of lapply that by default returns a vector, matrix or, if simplify = "array", an array.

Like lapply, it loops over a list item by item, but it simplifies the result where possible.

str(sapply)
## function (X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
object=list(a=1:10, b=rnorm(10))
object
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $b
##  [1]  0.5605  0.8882  0.3827  0.5794 -0.2564 -1.0797  1.1774  0.7096
##  [9] -0.3987 -0.9082
loutput=lapply(object, sd)
loutput
## $a
## [1] 3.028
## 
## $b
## [1] 0.7758
class(loutput)
## [1] "list"
soutput=sapply(object,sd) # notice the difference: sapply returns a named numeric vector, not a list
soutput
##      a      b 
## 3.0277 0.7758
class(soutput)
## [1] "numeric"

vapply

vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
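A short sketch of vapply with a made-up list. FUN.VALUE is a template: numeric(1) says every result must be a single number, and numeric(2) says every result must be two numbers. When a result does not match the template, vapply stops with an error instead of silently returning a list the way sapply would.

```r
object <- list(a = 1:10, b = c(2, 4, 4, 4, 5, 5, 7, 9))

# FUN.VALUE = numeric(1): each result must be one number; returns a named numeric vector
vapply(object, sd, FUN.VALUE = numeric(1))

# FUN.VALUE = numeric(2): each result must be two numbers; returns a 2 x 2 matrix
vapply(object, range, FUN.VALUE = numeric(2))

# range() returns two numbers, which does not match numeric(1), so this is an error
try(vapply(object, range, FUN.VALUE = numeric(1)))
```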

apply

Apply a function of your choice over the margins of an array (for a matrix, its rows or columns); not really looping.

str(apply)
## function (X, MARGIN, FUN, ...)
#MARGIN  is a vector giving the subscripts which the function will be applied over;  for a matrix 1 indicates rows, 2 indicates columns
my_matrix<-matrix(rnorm(30), 10, 3)
apply(my_matrix, 2, mean) # apply mean on columns
## [1] -0.38317 -0.13071 -0.02697
apply(my_matrix, 1, mean) # apply mean on rows
##  [1]  9.349e-06  1.120e-01 -1.958e+00  1.879e-01  4.237e-01  6.855e-01
##  [7] -2.568e-01 -2.324e-01  4.607e-01 -1.226e+00
row_sums = apply(my_matrix, 1, sum) # equivalent to the faster built-in rowSums(my_matrix)
row_sums
##  [1]  2.805e-05  3.360e-01 -5.873e+00  5.638e-01  1.271e+00  2.056e+00
##  [7] -7.704e-01 -6.973e-01  1.382e+00 -3.677e+00
row_means = apply(my_matrix, 1, mean) # equivalent to rowMeans(my_matrix)
row_means
##  [1]  9.349e-06  1.120e-01 -1.958e+00  1.879e-01  4.237e-01  6.855e-01
##  [7] -2.568e-01 -2.324e-01  4.607e-01 -1.226e+00
col_sums = apply(my_matrix, 2, sum) # equivalent to colSums(my_matrix)
col_sums
## [1] -3.8317 -1.3071 -0.2697
col_means = apply(my_matrix, 2, mean) # equivalent to colMeans(my_matrix)
col_means
## [1] -0.38317 -0.13071 -0.02697
apply(my_matrix, 1, quantile, probs = c(0.25, 0.75))
##        [,1]   [,2]   [,3]    [,4]   [,5]   [,6]    [,7]    [,8]     [,9]
## 25% -0.2612 -0.417 -2.323 -0.1307 0.3783 0.1301 -0.6924 -0.9489 -0.02951
## 75%  0.4376  0.423 -1.651  0.4127 0.4870 1.1369  0.2504  0.4258  0.87513
##       [,10]
## 25% -1.7341
## 75% -0.9402
a <- array(data=rnorm(2 * 2 * 10), dim=c(2, 2, 10)) #generate 40 random numbers and arrange them in a 2 x 2 x 10 array
a
## , , 1
## 
##         [,1]   [,2]
## [1,] -0.4396 -0.184
## [2,]  1.0707  1.776
## 
## , , 2
## 
##         [,1]    [,2]
## [1,] -0.2107  0.1165
## [2,]  0.4991 -0.8378
## 
## , , 3
## 
##        [,1]    [,2]
## [1,] 1.6221  0.4075
## [2,] 0.1481 -0.4790
## 
## , , 4
## 
##         [,1]  [,2]
## [1,] -0.4467 1.560
## [2,] -0.3858 2.175
## 
## , , 5
## 
##         [,1]  [,2]
## [1,] -0.8019 1.474
## [2,] -0.3677 1.580
## 
## , , 6
## 
##         [,1]   [,2]
## [1,]  0.1808 -1.560
## [2,] -0.4412  2.058
## 
## , , 7
## 
##        [,1]     [,2]
## [1,] 0.8715 -0.01511
## [2,] 0.2059 -0.61433
## 
## , , 8
## 
##         [,1]    [,2]
## [1,]  0.6668 -0.4581
## [2,] -0.9788 -2.3214
## 
## , , 9
## 
##         [,1]    [,2]
## [1,] -0.4312  0.3950
## [2,] -0.5110 -0.8342
## 
## , , 10
## 
##        [,1]    [,2]
## [1,] 0.5953 -0.2219
## [2,] 0.2690 -0.4078
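The array a above is created, but no function is applied to it. As a sketch (with a fresh, seeded array), MARGIN can also be a vector: c(1, 2) keeps the first two dimensions and averages over the third, giving one 2 x 2 matrix of means across the 10 slices. rowMeans() with dims = 2 should give the same result.

```r
set.seed(2)  # arbitrary seed for reproducibility
a <- array(rnorm(2 * 2 * 10), dim = c(2, 2, 10))

# Keep dimensions 1 and 2; collapse the 10 slices of dimension 3 with mean()
cell_means <- apply(a, c(1, 2), mean)
cell_means

# rowMeans() treats the first `dims` dimensions as rows and averages over the rest
all.equal(cell_means, rowMeans(a, dims = 2))  # TRUE
```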

tapply

Apply a function of your choice over subsets of a vector: tapply applies a function to each cell of a ragged array, that is, to each (non-empty) group of values given by a unique combination of the levels of certain factors.

str(tapply)
## function (X, INDEX, FUN = NULL, ..., simplify = TRUE)
require(stats)
groups <- as.factor(rbinom(100, n = 5, prob = 0.8)) #5 binomial draws with size = 100 trials and prob = 0.8; rbinom(n, size, prob)
groups
## [1] 81 83 83 81 84
## Levels: 81 83 84
tapply(groups, groups, length) #- is almost the same as
## 81 83 84 
##  2  2  1
table(groups)
## groups
## 81 83 84 
##  2  2  1
x <- c(rnorm(10), runif(10), rnorm(10, 1))
x
##  [1]  0.14386 -0.21274 -0.31354 -0.32914 -0.74218 -0.41923  0.32376
##  [8]  0.57782  1.29516  0.51849  0.09856  0.74341  0.64724  0.40818
## [15]  0.80790  0.46450  0.92045  0.65810  0.96937  0.53147 -0.54966
## [22]  1.03216  1.40758  1.18356  3.07369  1.99872 -0.30349  2.04159
## [29]  0.87591  0.73995
f <- gl(3, 10) # generate 3 levels with 10 repetitions
#gl(n, k, length = n*k, labels = 1:n, ordered = FALSE)
#Generate factors by specifying the pattern of their levels.
f
##  [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
## Levels: 1 2 3
tapply(x, f, mean) #get the mean of x for each level of f
##       1       2       3 
## 0.08423 0.62492 1.15000
tapply(x, f, mean, simplify = FALSE) #group means without simplification
## $`1`
## [1] 0.08423
## 
## $`2`
## [1] 0.6249
## 
## $`3`
## [1] 1.15
tapply(x, f, range) # check group ranges
## $`1`
## [1] -0.7422  1.2952
## 
## $`2`
## [1] 0.09856 0.96937
## 
## $`3`
## [1] -0.5497  3.0737
tapply(x, f, min) # check group min
##        1        2        3 
## -0.74218  0.09856 -0.54966

split

Divide into groups and reassemble. split divides the data in the vector x into the groups defined by f.

The replacement forms replace values corresponding to such a division.

unsplit reverses the effect of split.

Usage

split(x, f, drop = FALSE, ...)
split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)

x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)
x # no splitting in the data
##  [1] -0.64199  1.24759  0.94857 -1.91172  0.17663  0.10583  0.08753
##  [8]  0.44691  0.88841 -0.24475  0.61367  0.93486  0.74584  0.74547
## [15]  0.45167  0.36987  0.54612  0.66831  0.36644  0.44551  1.18085
## [22]  0.03665 -0.02822  2.46721  0.60715  1.50149  2.13073  1.17925
## [29] -0.27713 -0.75292
split(x,f) #data split into groups by the levels of f
## $`1`
##  [1] -0.64199  1.24759  0.94857 -1.91172  0.17663  0.10583  0.08753
##  [8]  0.44691  0.88841 -0.24475
## 
## $`2`
##  [1] 0.6137 0.9349 0.7458 0.7455 0.4517 0.3699 0.5461 0.6683 0.3664 0.4455
## 
## $`3`
##  [1]  1.18085  0.03665 -0.02822  2.46721  0.60715  1.50149  2.13073
##  [8]  1.17925 -0.27713 -0.75292
lapply(split(x, f), mean) # is the same as:
## $`1`
## [1] 0.1103
## 
## $`2`
## [1] 0.5888
## 
## $`3`
## [1] 0.8045
sapply(split(x, f), mean) # is the same as:
##      1      2      3 
## 0.1103 0.5888 0.8045
tapply(x, f, mean)
##      1      2      3 
## 0.1103 0.5888 0.8045
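The description above mentions unsplit and the replacement form of split, but the examples only use split itself. Here is a minimal sketch of both, using the same kind of x and f: unsplit should give back the original vector, and the replacement form writes new values back group by group. Subtracting each group's mean is just one possible use, chosen for illustration.

```r
x <- c(rnorm(10), runif(10), rnorm(10, 1))
f <- gl(3, 10)

# unsplit() reverses split(): the pieces go back in their original positions
identical(unsplit(split(x, f), f), x)     # TRUE

# Replacement form: overwrite each group with the group minus its own mean
centred <- x
split(centred, f) <- lapply(split(x, f), function(g) g - mean(g))
tapply(centred, f, mean)                  # all zero, up to rounding error
```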

You can also split a data frame:

library(datasets)
head(airquality, 2)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
#Split air Quality data by month
s <- split(airquality, airquality$Month)
s
## $`5`
##    Ozone Solar.R Wind Temp Month Day
## 1     41     190  7.4   67     5   1
## 2     36     118  8.0   72     5   2
## 3     12     149 12.6   74     5   3
## 4     18     313 11.5   62     5   4
## 5     NA      NA 14.3   56     5   5
## 6     28      NA 14.9   66     5   6
## 7     23     299  8.6   65     5   7
## 8     19      99 13.8   59     5   8
## 9      8      19 20.1   61     5   9
## 10    NA     194  8.6   69     5  10
## 11     7      NA  6.9   74     5  11
## 12    16     256  9.7   69     5  12
## 13    11     290  9.2   66     5  13
## 14    14     274 10.9   68     5  14
## 15    18      65 13.2   58     5  15
## 16    14     334 11.5   64     5  16
## 17    34     307 12.0   66     5  17
## 18     6      78 18.4   57     5  18
## 19    30     322 11.5   68     5  19
## 20    11      44  9.7   62     5  20
## 21     1       8  9.7   59     5  21
## 22    11     320 16.6   73     5  22
## 23     4      25  9.7   61     5  23
## 24    32      92 12.0   61     5  24
## 25    NA      66 16.6   57     5  25
## 26    NA     266 14.9   58     5  26
## 27    NA      NA  8.0   57     5  27
## 28    23      13 12.0   67     5  28
## 29    45     252 14.9   81     5  29
## 30   115     223  5.7   79     5  30
## 31    37     279  7.4   76     5  31
## 
## $`6`
##    Ozone Solar.R Wind Temp Month Day
## 32    NA     286  8.6   78     6   1
## 33    NA     287  9.7   74     6   2
## 34    NA     242 16.1   67     6   3
## 35    NA     186  9.2   84     6   4
## 36    NA     220  8.6   85     6   5
## 37    NA     264 14.3   79     6   6
## 38    29     127  9.7   82     6   7
## 39    NA     273  6.9   87     6   8
## 40    71     291 13.8   90     6   9
## 41    39     323 11.5   87     6  10
## 42    NA     259 10.9   93     6  11
## 43    NA     250  9.2   92     6  12
## 44    23     148  8.0   82     6  13
## 45    NA     332 13.8   80     6  14
## 46    NA     322 11.5   79     6  15
## 47    21     191 14.9   77     6  16
## 48    37     284 20.7   72     6  17
## 49    20      37  9.2   65     6  18
## 50    12     120 11.5   73     6  19
## 51    13     137 10.3   76     6  20
## 52    NA     150  6.3   77     6  21
## 53    NA      59  1.7   76     6  22
## 54    NA      91  4.6   76     6  23
## 55    NA     250  6.3   76     6  24
## 56    NA     135  8.0   75     6  25
## 57    NA     127  8.0   78     6  26
## 58    NA      47 10.3   73     6  27
## 59    NA      98 11.5   80     6  28
## 60    NA      31 14.9   77     6  29
## 61    NA     138  8.0   83     6  30
## 
## $`7`
##    Ozone Solar.R Wind Temp Month Day
## 62   135     269  4.1   84     7   1
## 63    49     248  9.2   85     7   2
## 64    32     236  9.2   81     7   3
## 65    NA     101 10.9   84     7   4
## 66    64     175  4.6   83     7   5
## 67    40     314 10.9   83     7   6
## 68    77     276  5.1   88     7   7
## 69    97     267  6.3   92     7   8
## 70    97     272  5.7   92     7   9
## 71    85     175  7.4   89     7  10
## 72    NA     139  8.6   82     7  11
## 73    10     264 14.3   73     7  12
## 74    27     175 14.9   81     7  13
## 75    NA     291 14.9   91     7  14
## 76     7      48 14.3   80     7  15
## 77    48     260  6.9   81     7  16
## 78    35     274 10.3   82     7  17
## 79    61     285  6.3   84     7  18
## 80    79     187  5.1   87     7  19
## 81    63     220 11.5   85     7  20
## 82    16       7  6.9   74     7  21
## 83    NA     258  9.7   81     7  22
## 84    NA     295 11.5   82     7  23
## 85    80     294  8.6   86     7  24
## 86   108     223  8.0   85     7  25
## 87    20      81  8.6   82     7  26
## 88    52      82 12.0   86     7  27
## 89    82     213  7.4   88     7  28
## 90    50     275  7.4   86     7  29
## 91    64     253  7.4   83     7  30
## 92    59     254  9.2   81     7  31
## 
## $`8`
##     Ozone Solar.R Wind Temp Month Day
## 93     39      83  6.9   81     8   1
## 94      9      24 13.8   81     8   2
## 95     16      77  7.4   82     8   3
## 96     78      NA  6.9   86     8   4
## 97     35      NA  7.4   85     8   5
## 98     66      NA  4.6   87     8   6
## 99    122     255  4.0   89     8   7
## 100    89     229 10.3   90     8   8
## 101   110     207  8.0   90     8   9
## 102    NA     222  8.6   92     8  10
## 103    NA     137 11.5   86     8  11
## 104    44     192 11.5   86     8  12
## 105    28     273 11.5   82     8  13
## 106    65     157  9.7   80     8  14
## 107    NA      64 11.5   79     8  15
## 108    22      71 10.3   77     8  16
## 109    59      51  6.3   79     8  17
## 110    23     115  7.4   76     8  18
## 111    31     244 10.9   78     8  19
## 112    44     190 10.3   78     8  20
## 113    21     259 15.5   77     8  21
## 114     9      36 14.3   72     8  22
## 115    NA     255 12.6   75     8  23
## 116    45     212  9.7   79     8  24
## 117   168     238  3.4   81     8  25
## 118    73     215  8.0   86     8  26
## 119    NA     153  5.7   88     8  27
## 120    76     203  9.7   97     8  28
## 121   118     225  2.3   94     8  29
## 122    84     237  6.3   96     8  30
## 123    85     188  6.3   94     8  31
## 
## $`9`
##     Ozone Solar.R Wind Temp Month Day
## 124    96     167  6.9   91     9   1
## 125    78     197  5.1   92     9   2
## 126    73     183  2.8   93     9   3
## 127    91     189  4.6   93     9   4
## 128    47      95  7.4   87     9   5
## 129    32      92 15.5   84     9   6
## 130    20     252 10.9   80     9   7
## 131    23     220 10.3   78     9   8
## 132    21     230 10.9   75     9   9
## 133    24     259  9.7   73     9  10
## 134    44     236 14.9   81     9  11
## 135    21     259 15.5   76     9  12
## 136    28     238  6.3   77     9  13
## 137     9      24 10.9   71     9  14
## 138    13     112 11.5   71     9  15
## 139    46     237  6.9   78     9  16
## 140    18     224 13.8   67     9  17
## 141    13      27 10.3   76     9  18
## 142    24     238 10.3   68     9  19
## 143    16     201  8.0   82     9  20
## 144    13     238 12.6   64     9  21
## 145    23      14  9.2   71     9  22
## 146    36     139 10.3   81     9  23
## 147     7      49 10.3   69     9  24
## 148    14      20 16.6   63     9  25
## 149    30     193  6.9   70     9  26
## 150    NA     145 13.2   77     9  27
## 151    14     191 14.3   75     9  28
## 152    18     131  8.0   76     9  29
## 153    20     223 11.5   68     9  30
lapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
## $`5`
##   Ozone Solar.R    Wind 
##      NA      NA   11.62 
## 
## $`6`
##   Ozone Solar.R    Wind 
##      NA  190.17   10.27 
## 
## $`7`
##   Ozone Solar.R    Wind 
##      NA 216.484   8.942 
## 
## $`8`
##   Ozone Solar.R    Wind 
##      NA      NA   8.794 
## 
## $`9`
##   Ozone Solar.R    Wind 
##      NA  167.43   10.18
colMeans(airquality) # for all columns
##   Ozone Solar.R    Wind    Temp   Month     Day 
##      NA      NA   9.958  77.882   6.993  15.804
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")]))
##             5      6       7     8      9
## Ozone      NA     NA      NA    NA     NA
## Solar.R    NA 190.17 216.484    NA 167.43
## Wind    11.62  10.27   8.942 8.794  10.18
#Now ignore NAs when computing the column means
sapply(s, function(x) colMeans(x[, c("Ozone", "Solar.R", "Wind")], na.rm = TRUE))
##              5      6       7       8      9
## Ozone    23.62  29.44  59.115  59.962  31.45
## Solar.R 181.30 190.17 216.484 171.857 167.43
## Wind     11.62  10.27   8.942   8.794  10.18

You can also split on more than one factor with the split command:

x=rnorm(20)
x
##  [1]  2.48597  1.34398  0.57253  1.69531  0.75884  0.17921  0.14734
##  [8]  1.09718  0.07409 -2.00592 -1.49990 -2.23533  0.31148  0.68989
## [15]  2.61388 -1.32126  1.75524  0.36550 -0.76513  0.24786
factor1=gl(4,5)
factor1
##  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
## Levels: 1 2 3 4
factor2=gl(5,4)
factor2
##  [1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5
## Levels: 1 2 3 4 5
interaction(factor1, factor2)
##  [1] 1.1 1.1 1.1 1.1 1.2 2.2 2.2 2.2 2.3 2.3 3.3 3.3 3.4 3.4 3.4 4.4 4.5
## [18] 4.5 4.5 4.5
## 20 Levels: 1.1 2.1 3.1 4.1 1.2 2.2 3.2 4.2 1.3 2.3 3.3 4.3 1.4 2.4 ... 4.5
str(split(x, list(factor1, factor2))) # this creates empty levels, which can be dropped with drop = TRUE:
## List of 20
##  $ 1.1: num [1:4] 2.486 1.344 0.573 1.695
##  $ 2.1: num(0) 
##  $ 3.1: num(0) 
##  $ 4.1: num(0) 
##  $ 1.2: num 0.759
##  $ 2.2: num [1:3] 0.179 0.147 1.097
##  $ 3.2: num(0) 
##  $ 4.2: num(0) 
##  $ 1.3: num(0) 
##  $ 2.3: num [1:2] 0.0741 -2.0059
##  $ 3.3: num [1:2] -1.5 -2.24
##  $ 4.3: num(0) 
##  $ 1.4: num(0) 
##  $ 2.4: num(0) 
##  $ 3.4: num [1:3] 0.311 0.69 2.614
##  $ 4.4: num -1.32
##  $ 1.5: num(0) 
##  $ 2.5: num(0) 
##  $ 3.5: num(0) 
##  $ 4.5: num [1:4] 1.755 0.365 -0.765 0.248
str(split(x, list(factor1, factor2), drop = TRUE))
## List of 8
##  $ 1.1: num [1:4] 2.486 1.344 0.573 1.695
##  $ 1.2: num 0.759
##  $ 2.2: num [1:3] 0.179 0.147 1.097
##  $ 2.3: num [1:2] 0.0741 -2.0059
##  $ 3.3: num [1:2] -1.5 -2.24
##  $ 3.4: num [1:3] 0.311 0.69 2.614
##  $ 4.4: num -1.32
##  $ 4.5: num [1:4] 1.755 0.365 -0.765 0.248

mapply

mapply is the multivariate version of sapply: it applies a function to multiple list or vector arguments.

mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
## 
## [[2]]
## [1] 2 2 2
## 
## [[3]]
## [1] 3 3
## 
## [[4]]
## [1] 4
mapply(rep, times = 1:4, x = 4:1)
## [[1]]
## [1] 4
## 
## [[2]]
## [1] 3 3
## 
## [[3]]
## [1] 2 2 2
## 
## [[4]]
## [1] 1 1 1 1
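The rep examples above return lists because the results have different lengths. When every call returns a single value, mapply simplifies the result to a vector, and shorter arguments are recycled. MoreArgs supplies arguments that stay the same in every call. A short sketch with made-up inputs:

```r
# Element-wise: FUN(1, 4), FUN(2, 5), FUN(3, 6); single values simplify to a vector
mapply(function(x, y) x + y, 1:3, 4:6)        # 5 7 9

# Recycling: the single value 10 is reused for each element of 1:3
mapply(function(x, y) x * y, 1:3, 10)         # 10 20 30

# MoreArgs: sep is passed unchanged to every call; the result is named
# after the first (character) argument
mapply(function(x, y, sep) paste(x, y, sep = sep),
       c("a", "b"), c("x", "y"), MoreArgs = list(sep = "-"))
```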

More Examples

require(stats); require(graphics)

x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
# compute the list mean for each list element
lapply(x, mean)
## $a
## [1] 5.5
## 
## $beta
## [1] 4.535
## 
## $logic
## [1] 0.5
# median and quartiles for each list element
x
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $beta
## [1]  0.04979  0.13534  0.36788  1.00000  2.71828  7.38906 20.08554
## 
## $logic
## [1]  TRUE FALSE FALSE  TRUE
lapply(x, quantile, probs = 1:3/4)
## $a
##  25%  50%  75% 
## 3.25 5.50 7.75 
## 
## $beta
##    25%    50%    75% 
## 0.2516 1.0000 5.0537 
## 
## $logic
## 25% 50% 75% 
## 0.0 0.5 1.0
x
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $beta
## [1]  0.04979  0.13534  0.36788  1.00000  2.71828  7.38906 20.08554
## 
## $logic
## [1]  TRUE FALSE FALSE  TRUE
sapply(x, quantile)
##          a     beta logic
## 0%    1.00  0.04979   0.0
## 25%   3.25  0.25161   0.0
## 50%   5.50  1.00000   0.5
## 75%   7.75  5.05367   1.0
## 100% 10.00 20.08554   1.0
i39 <- sapply(3:9, seq) # list of vectors
i39
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1] 1 2 3 4 5
## 
## [[4]]
## [1] 1 2 3 4 5 6
## 
## [[5]]
## [1] 1 2 3 4 5 6 7
## 
## [[6]]
## [1] 1 2 3 4 5 6 7 8
## 
## [[7]]
## [1] 1 2 3 4 5 6 7 8 9
sapply(i39, fivenum)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]  1.0  1.0    1  1.0  1.0  1.0    1
## [2,]  1.5  1.5    2  2.0  2.5  2.5    3
## [3,]  2.0  2.5    3  3.5  4.0  4.5    5
## [4,]  2.5  3.5    4  5.0  5.5  6.5    7
## [5,]  3.0  4.0    5  6.0  7.0  8.0    9

 

Dates and Times in R

 

Dealing with dates

Dates are represented by the Date class. Times are represented by the POSIXct or the POSIXlt class. Internally, dates are stored as the number of days since 1970-01-01, and times as the number of seconds since 1970-01-01.

date=as.Date("2013-12-11")
date
## [1] "2013-12-11"
class(date)
## [1] "Date"
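Because a Date is stored as a number of days since 1970-01-01, as.numeric() reveals that number, and ordinary arithmetic works on dates. A small sketch (the second date is an arbitrary choice):

```r
d <- as.Date("2013-12-11")
as.numeric(d)                      # 16050 days after 1970-01-01
as.numeric(as.Date("1970-01-01"))  # 0: the origin

# Because dates are numbers underneath, subtraction and addition work
d - as.Date("2013-12-01")          # Time difference of 10 days
d + 30                             # "2014-01-10"
```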

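POSIXct stores a time as one large number, which makes it convenient for storing times in something like a data frame. POSIXlt is a list that also stores other useful information, such as the day of the week, day of the year, month, and day of the month.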

#to figure out the day of the week of "date", try
weekdays(date)
## [1] "Wednesday"
#to figure out the month of "date", try
months(date)
## [1] "December"
#to figure out the quarter of the year, try
quarters(date)
## [1] "Q4"

Dealing with time

Class “POSIXct”
represents the (signed) number of seconds since the beginning of 1970
(in the UTC timezone) as a numeric vector. Class “POSIXlt” is a named
list of vectors representing sec, min etc as elaborated below.

my_time<-Sys.time()
my_time
## [1] "2014-08-12 16:43:03 EDT"
#convert system time to POSIXlt
t1_Xlt <- as.POSIXlt(my_time)
t1_Xlt
## [1] "2014-08-12 16:43:03 EDT"
#unclass it and use names() to see the components stored in a POSIXlt object
names(unclass(t1_Xlt))
## [1] "sec"   "min"   "hour"  "mday"  "mon"   "year"  "wday"  "yday"  "isdst"
#now you can extract each component of the time
t1_Xlt$sec  #0-61 secs
## [1] 3.024
t1_Xlt$min  #0-59 mins
## [1] 43
t1_Xlt$hour #0-23 hours
## [1] 16
t1_Xlt$mday #1-31st day of the month
## [1] 12
t1_Xlt$mon  #0-11: months after the first of the year (7 = August)
## [1] 7
t1_Xlt$year #years since 1900
## [1] 114
t1_Xlt$wday #0-6 day of the week, starting on Sunday.
## [1] 2
t1_Xlt$yday #0-365: day of the year.
## [1] 223
t1_Xlt$isdst #Daylight Savings Time flag. Positive if in force, zero if not, negative if unknown.
## [1] 1
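Because Sys.time() changes on every run, the values above will differ on your machine. A reproducible sketch fixes both the time and the time zone (UTC is an arbitrary choice here). It also shows that a POSIXct value is just a count of seconds since the start of 1970.

```r
# POSIXct: seconds since 1970-01-01 00:00:00 UTC
t_ct <- as.POSIXct("1970-01-01 00:01:30", tz = "UTC")
as.numeric(t_ct)          # 90

# POSIXlt: a list of components; note the offsets in mon and year
t_lt <- as.POSIXlt("2014-08-12 16:43:03", tz = "UTC")
t_lt$mon + 1              # 8: mon counts from 0, so add 1 for the calendar month
t_lt$year + 1900          # 2014: year counts from 1900
t_lt$wday                 # 2: Tuesday, counting from Sunday = 0
```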

strptime

Functions to convert
between character representations and objects of classes “POSIXlt” and
“POSIXct” representing calendar dates and times.

date <- c("December 13, 2013 7:40", "December 23, 2014 8:40")
fixed_time <- strptime(date, "%B %d, %Y %H:%M")
fixed_time
## [1] "2013-12-13 07:40:00" "2014-12-23 08:40:00"
class(fixed_time)
## [1] "POSIXlt" "POSIXt"
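strptime parses text into a time; format() goes the other way. The example above relies on %B, which expects month names in the current locale. Here is a locale-independent sketch with a numeric day/month/year string; the string and the UTC time zone are arbitrary choices.

```r
# Parse a day/month/year string into POSIXlt
t_parsed <- strptime("13/12/2013 07:40", format = "%d/%m/%Y %H:%M", tz = "UTC")
class(t_parsed)                        # "POSIXlt" "POSIXt"

# format() turns it back into text, in whatever layout we choose
format(t_parsed, "%Y-%m-%d %H:%M")     # "2013-12-13 07:40"

# as.POSIXct() converts it to a single number of seconds
as.POSIXct(t_parsed)
```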

 

Conditionals in R – For, While, Repeat Loops

 

Some example code from an R course, enhanced with comments and presented in Knit HTML format.

IF ELSE in R

x=3 #initialize x as 3
y=100000 #initialize y to an arbitrary number
if(x>3){
  y<-100
} else {
  y=10
}
x # Print x
## [1] 3
y # see the new value of y after the if-else command
## [1] 10

Try writing the same code
in a different way:

x=3 #initialize x as 3
y=100000 #initialize y to an arbitrary number
y <- if(x > 3) {
 100
} else { 
 10
}
x # Print x
## [1] 3
y # see the new value of y after the if-else command
## [1] 10
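if/else tests a single TRUE or FALSE value (in R 4.2.0 and later, giving if a condition longer than one is an error). To choose between two values element by element across a whole vector, base R provides the vectorized ifelse(). A short sketch with a made-up vector:

```r
x <- c(1, 3, 5, 7)
# Each element is tested separately: 100 where x > 3, otherwise 10
ifelse(x > 3, 100, 10)   # 10 10 100 100
```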

FOR LOOP in R

Example 1

for(i in 1:5){print(i)}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Example 2

x=c("R","C","JAVA","PYTHON") #create a vector with 4 character strings
for(loop in 1:4){print(x[loop])} #break the tradition of sticking to "i"
## [1] "R"
## [1] "C"
## [1] "JAVA"
## [1] "PYTHON"

Example 3 using seq_along

x=c("R","C","JAVA","PYTHON") #create a vector with 4 character strings
for(content in seq_along(x)){print(x[content])} #break the tradition of sticking to "i"
## [1] "R"
## [1] "C"
## [1] "JAVA"
## [1] "PYTHON"
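seq_along (Example 3) is safer than 1:length(x) when x might be empty. A short sketch with an empty vector:

```r
x <- character(0)              # an empty vector
1:length(x)                    # 1 0: a loop over this would run twice
seq_along(x)                   # integer(0): a loop over this runs zero times

count <- 0
for (i in seq_along(x)) count <- count + 1
count                          # 0
```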

Example 4

x=c("R","C","JAVA","PYTHON") #create a vector with 4 character strings
for(progLang in x) {print(progLang)} #loop over the elements themselves; x[progLang] would print NA because x has no names
## [1] "R"
## [1] "C"
## [1] "JAVA"
## [1] "PYTHON"

NESTED LOOPS

Print each element of the matrix:

m=matrix(1:20, 4,5) #create a matrix 4 rows and 5 columns
m # print matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    5    9   13   17
## [2,]    2    6   10   14   18
## [3,]    3    7   11   15   19
## [4,]    4    8   12   16   20
nrow(m)
## [1] 4
ncol(m)
## [1] 5
#print each element of the matrix, row by row
for(rows in seq_len(nrow(m))) {
  for(columns in seq_len(ncol(m))){
    print(m[rows,columns])
    }
  }
## [1] 1
## [1] 5
## [1] 9
## [1] 13
## [1] 17
## [1] 2
## [1] 6
## [1] 10
## [1] 14
## [1] 18
## [1] 3
## [1] 7
## [1] 11
## [1] 15
## [1] 19
## [1] 4
## [1] 8
## [1] 12
## [1] 16
## [1] 20

WHILE LOOPS in R

Be careful with while loops and don't create an infinite one!!!

counter=0 #initialize a counter variable
while(counter<15){print(counter)
                counter=counter+2}
## [1] 0
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
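One common safeguard against an accidental infinite loop is an explicit cap on the number of iterations. In this sketch the stopping condition counter != 15 is never met, because counter only takes even values, so only the cap (an arbitrary 100 here) ends the loop:

```r
counter <- 0
iter <- 0
max_iter <- 100                           # arbitrary safety cap
while (counter != 15 && iter < max_iter) {
  counter <- counter + 2                  # 0, 2, 4, ... never equals 15
  iter <- iter + 1
}
iter                                      # 100
counter                                   # 200
```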

Testing more than one condition with a WHILE loop

z <- 5
t1="test1"
t2="test2"
z
## [1] 5
t1
## [1] "test1"
t2
## [1] "test2"
while(z >= -5 && z <= 10) {
 #Flip the coin one time
 coin <- rbinom(n=1, size=1, prob=0.5) 
 #n=number of observations, size=number of trials 
 if(coin == 1) { # heads: rbinom returned 1 by chance
 z <- z + 1
 cat(paste(coin, t1,z,"\n"))
 } else {
 z <- z - 1
 cat(paste(coin, t2,z,"\n"))
 } 
}
## 0 test2 4 
## 1 test1 5 
## 0 test2 4 
## 1 test1 5 
## 1 test1 6 
## 0 test2 5 
## 1 test1 6 
## 1 test1 7 
## 1 test1 8 
## 0 test2 7 
## 1 test1 8 
## 0 test2 7 
## 1 test1 8 
## 0 test2 7 
## 0 test2 6 
## 1 test1 7 
## 0 test2 6 
## 0 test2 5 
## 0 test2 4 
## 1 test1 5 
## 0 test2 4 
## 0 test2 3 
## 1 test1 4 
## 0 test2 3 
## 1 test1 4 
## 1 test1 5 
## 0 test2 4 
## 0 test2 3 
## 1 test1 4 
## 1 test1 5 
## 0 test2 4 
## 1 test1 5 
## 1 test1 6 
## 1 test1 7 
## 0 test2 6 
## 1 test1 7 
## 1 test1 8 
## 0 test2 7 
## 1 test1 8 
## 1 test1 9 
## 0 test2 8 
## 0 test2 7 
## 0 test2 6 
## 0 test2 5 
## 0 test2 4 
## 0 test2 3 
## 1 test1 4 
## 0 test2 3 
## 0 test2 2 
## 0 test2 1 
## 0 test2 0 
## 0 test2 -1 
## 0 test2 -2 
## 1 test1 -1 
## 0 test2 -2 
## 1 test1 -1 
## 1 test1 0 
## 1 test1 1 
## 1 test1 2 
## 1 test1 3 
## 0 test2 2 
## 0 test2 1 
## 0 test2 0 
## 1 test1 1 
## 0 test2 0 
## 0 test2 -1 
## 1 test1 0 
## 1 test1 1 
## 0 test2 0 
## 1 test1 1 
## 0 test2 0 
## 0 test2 -1 
## 0 test2 -2 
## 1 test1 -1 
## 1 test1 0 
## 0 test2 -1 
## 1 test1 0 
## 0 test2 -1 
## 1 test1 0 
## 1 test1 1 
## 1 test1 2 
## 1 test1 3 
## 1 test1 4 
## 1 test1 5 
## 0 test2 4 
## 1 test1 5 
## 1 test1 6 
## 1 test1 7 
## 1 test1 8 
## 0 test2 7 
## 1 test1 8 
## 1 test1 9 
## 1 test1 10 
## 0 test2 9 
## 0 test2 8 
## 0 test2 7 
## 0 test2 6 
## 1 test1 7 
## 0 test2 6 
## 1 test1 7 
## 0 test2 6 
## 0 test2 5 
## 1 test1 6 
## 1 test1 7 
## 0 test2 6 
## 1 test1 7 
## 0 test2 6 
## 0 test2 5 
## 1 test1 6 
## 1 test1 7 
## 1 test1 8 
## 0 test2 7 
## 0 test2 6 
## 0 test2 5 
## 0 test2 4 
## 0 test2 3 
## 1 test1 4 
## 0 test2 3 
## 0 test2 2 
## 0 test2 1 
## 1 test1 2 
## 1 test1 3 
## 1 test1 4 
## 1 test1 5 
## 0 test2 4 
## 1 test1 5 
## 0 test2 4 
## 0 test2 3 
## 1 test1 4 
## 1 test1 5 
## 0 test2 4 
## 0 test2 3 
## 0 test2 2 
## 1 test1 3 
## 1 test1 4 
## 1 test1 5 
## 1 test1 6 
## 1 test1 7 
## 0 test2 6 
## 0 test2 5 
## 1 test1 6 
## 1 test1 7 
## 1 test1 8 
## 1 test1 9 
## 1 test1 10 
## 0 test2 9 
## 1 test1 10 
## 1 test1 11

REPEAT / BREAK in R

repeat starts a loop that runs until break is called, so it can be used for iterative solutions:

x0 <- 1
tol <- 1e-5
repeat {
 x1 <- sqrt(100)
 if(abs(x1 - x0) < tol) { # stop once two successive values agree within tol
 break
 } else {
 x0 <- x1
 } 
}
x0
## [1] 10
x1
## [1] 10
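In the example above, x1 is the same on every pass, so the loop ends on its second iteration. A genuinely iterative use of repeat/break is Newton's method for a square root, x_new = (x + a/x) / 2, swapped in here for illustration: each pass improves the estimate until two successive values agree within tol.

```r
# Newton's method for sqrt(a): repeat x_new = (x + a / x) / 2 until it settles
a <- 100
x0 <- 1            # starting guess
tol <- 1e-5
iter <- 0
repeat {
  x1 <- (x0 + a / x0) / 2
  iter <- iter + 1
  if (abs(x1 - x0) < tol) break   # converged: successive estimates agree within tol
  x0 <- x1
}
x1                 # close to 10
iter               # number of passes needed
```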

 

Data Tables In R and Example Subsetting

 

1. You can apply to a data table all the functions that you can apply to a data frame (a data.table is also a data.frame).
2. data.table is written largely in C (I am learning that language too!).
3. It is much faster.

Load from library

In this exercise, let's first create a data frame to use in our R program.

library(data.table)
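My_Dataframe=data.frame(x=rnorm(9), y=rep(c("a","b","c"), each=3), z=rnorm(9), stringsAsFactors=TRUE) #keep y a factor, as in the output below (the default is FALSE since R 4.0.0)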
head(My_Dataframe)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
str(My_Dataframe)
## 'data.frame':    9 obs. of  3 variables:
##  $ x: num  0.505 -2.718 -0.147 0.369 -1.17 ...
##  $ y: Factor w/ 3 levels "a","b","c": 1 1 1 2 2 2 3 3 3
##  $ z: num  1.605 -0.964 -1.225 1.951 -0.255 ...

To see all of the data tables in memory, use the command:

tables()
## No objects of class data.table exist in .GlobalEnv
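tables() reports nothing because My_Dataframe is an ordinary data.frame. A sketch of converting one, assuming the data.table package is installed (example_df and example_dt are hypothetical names): as.data.table() makes a copy with class data.table, which then shows up in tables(), and rows can be filtered by referring to columns by name.

```r
library(data.table)
set.seed(5)  # arbitrary seed for reproducibility
example_df <- data.frame(x = rnorm(9), y = rep(c("a", "b", "c"), each = 3), z = rnorm(9))

example_dt <- as.data.table(example_df)
class(example_dt)        # "data.table" "data.frame"
tables()                 # example_dt is now listed
example_dt[y == "c"]     # rows where y is "c": no $ prefix and no trailing comma needed
```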

Subsetting rows:

DT=My_Dataframe #note: DT is still a data.frame, so these examples follow data frame subsetting rules
head(DT)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
DT[1,] #print row 1
##        x y     z
## 1 0.5049 a 1.605
DT[2,] #print row 2
##        x y       z
## 2 -2.718 a -0.9644
DT[DT$z>1,] #print rows where z is greater than 1
##        x y     z
## 1 0.5049 a 1.605
## 4 0.3693 b 1.951
DT[DT$y=="c",] #print rows where y="c"
##          x y       z
## 7 -0.07568 c  0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
#notice that the following code is not subsetting; it is selecting a single element from the data frame.
DT[1, 3] #print the element in row 1, column 3
## [1] 1.605
DT[c(1:5,3),] #print rows 1 to 5, then row 3 again (shown as row 3.1)
##           x y       z
## 1    0.5049 a  1.6050
## 2   -2.7178 a -0.9644
## 3   -0.1471 a -1.2249
## 4    0.3693 b  1.9505
## 5   -1.1704 b -0.2547
## 3.1 -0.1471 a -1.2249
DT[c(3:5),c(1:2)] #print rows 3 to 5 of columns 1 and 2
##         x y
## 3 -0.1471 a
## 4  0.3693 b
## 5 -1.1704 b

Subsetting columns:

DT=My_Dataframe
head(DT)
##         x y       z
## 1  0.5049 a  1.6050
## 2 -2.7178 a -0.9644
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547
## 6 -1.8508 b  0.3293
DT[,1] #print column 1
## [1]  0.50492 -2.71782 -0.14713  0.36926 -1.17045 -1.85085 -0.07568 -0.94607
## [9] -0.47799
DT[,2] #print column 2
## [1] a a a b b b c c c
## Levels: a b c
DT[DT$y=="c",] #print the rows where y="c" (all columns)
##          x y       z
## 7 -0.07568 c  0.5264
## 8 -0.94607 c -0.4951
## 9 -0.47799 c -0.5098
#notice that the following code is not subsetting; it is selecting a single element from the data frame.
DT[1, 3] #print the element in row 1, column 3
## [1] 1.605
DT[,c(1:2)] #print the first 2 columns, all rows
##          x y
## 1  0.50492 a
## 2 -2.71782 a
## 3 -0.14713 a
## 4  0.36926 b
## 5 -1.17045 b
## 6 -1.85085 b
## 7 -0.07568 c
## 8 -0.94607 c
## 9 -0.47799 c
DT[c(3:5),c(1:3)] #print rows 3 to 5 of columns 1 to 3
##         x y       z
## 3 -0.1471 a -1.2249
## 4  0.3693 b  1.9505
## 5 -1.1704 b -0.2547

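Column subsetting in a data table:

In data.table, the second argument inside the brackets (j) is evaluated as an R expression. In R, statements enclosed in curly braces form an expression: the statements run in order, and the value of the last one is returned:

k={print(100); 55}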
## [1] 100
k
## [1] 55
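Putting that into practice, here is a sketch assuming the data.table package is installed, with a small made-up table: j can be any expression, evaluated with the columns in scope, and adding by = evaluates it once per group.

```r
library(data.table)
DT <- data.table(x = c(1, 2, 3, 4, 5, 6), y = rep(c("a", "b"), each = 3), z = 1:6)

DT[, list(mean(x), sum(z))]          # one row: mean of x and sum of z
DT[, table(y)]                       # any R expression can go in j
DT[, w := z^2]                       # := adds a column by reference (no copy)
DT[, .(mean_x = mean(x)), by = y]    # the expression evaluated once per group of y
```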

 

reading from JSON data in R

#reading from JSON data in R
library(jsonlite)
#read the list of public repositories of my GitHub user account as JSON
json_data=fromJSON("https://api.github.com/users/hydrogeologist/repos")
names(json_data)
names(json_data$owner)
json_data$owner$login
#Writing data frames into JSON
My_JSON=toJSON(iris, pretty=TRUE)
cat(My_JSON)
iris2=fromJSON(My_JSON)
head(iris2)

Example of input field with Shiny in R

Just follow the code. The program is based on the Developing Data Products course on Coursera. I have added more comments and examples to aid understanding.

ui.R
# A Shiny app requires two files: ui.R and server.R
# This code will be saved as ui.R
library(shiny) # load the shiny library into R
shinyUI(pageWithSidebar(
  headerPanel("Example inputs using Shiny"),
  sidebarPanel(
    # numericInput for numbers
    # numericInput(inputId, label, value, min = NA, max = NA, step = NA)
    numericInput('id1', '#Description of Numeric input, labeled id1', 0, min = 0, max = 10, step = 1),

    # checkboxGroupInput for checkboxes
    # if you know Python, review the basics of tkinter.
    # checkboxGroupInput(inputId, label, choices, selected = NULL, inline = FALSE)
    checkboxGroupInput("id2", "Checkbox",
                       c("Value 1" = "1",
                         "Value 2" = "2",
                         "Value 3" = "3")),
    # dateInput creates a text input which, when clicked on, brings up a calendar
    # that the user can click on to select dates.
    dateInput("date", "Date:")
  ),
  mainPanel(
  )
))

server.R
# A Shiny app requires two files: ui.R and server.R
# This code will be saved as server.R
library(shiny) # load the shiny library into R
shinyServer(
  function(input, output) {
    # no outputs yet: this app only displays the input controls
  }
)
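The server function above is empty, so the app only shows the controls. Below is a sketch of echoing the inputs back, written as a single-file app with shinyApp() instead of separate ui.R and server.R files; it assumes the shiny package is installed, and the output IDs oid1, oid2 and odate are hypothetical names chosen for this example.

```r
library(shiny)

ui <- pageWithSidebar(
  headerPanel("Example inputs using Shiny"),
  sidebarPanel(
    numericInput("id1", "Numeric input, labeled id1", 0, min = 0, max = 10, step = 1),
    checkboxGroupInput("id2", "Checkbox",
                       c("Value 1" = "1", "Value 2" = "2", "Value 3" = "3")),
    dateInput("date", "Date:")
  ),
  mainPanel(
    h4("You entered"),
    verbatimTextOutput("oid1"),   # hypothetical output IDs, matched in server below
    verbatimTextOutput("oid2"),
    verbatimTextOutput("odate")
  )
)

server <- function(input, output) {
  # each renderPrint re-runs whenever the matching input changes
  output$oid1 <- renderPrint({ input$id1 })
  output$oid2 <- renderPrint({ input$id2 })
  output$odate <- renderPrint({ input$date })
}

app <- shinyApp(ui = ui, server = server)
# In an interactive session, start it with: runApp(app)
```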

 

Create a numeric input control

Description

Create an input control for entry of numeric values

Usage

numericInput(inputId, label, value, min = NA, max = NA, step = NA)
Arguments

inputId
Input variable to assign the control’s value to

label
Display label for the control

value
Initial value

min
Minimum allowed value

max
Maximum allowed value

step
Interval to use when stepping between min and max

Value

A numeric input control that can be added to a UI definition.

 

Checkbox Group Input Control

Description

Create a group of checkboxes that can be used to toggle multiple choices independently. The server will receive the input as a character vector of the selected values.

Usage

checkboxGroupInput(inputId, label, choices, selected = NULL, inline = FALSE)
Arguments

inputId
Input variable to assign the control’s value to.

label
Display label for the control, or NULL.

choices
List of values to show checkboxes for. If elements of the list are named then that name rather than the value is displayed to the user.

selected
The values that should be initially selected, if any.

inline
If TRUE, render the choices inline (i.e. horizontally)

Value

A list of HTML elements that can be added to a UI definition.

Create date input

Description

Creates a text input which, when clicked on, brings up a calendar that the user can click on to select dates.

Usage

dateInput(inputId, label, value = NULL, min = NULL, max = NULL,
format = "yyyy-mm-dd", startview = "month", weekstart = 0,
language = "en")
Arguments

inputId
Input variable to assign the control’s value to.

label
Display label for the control, or NULL.

value
The starting date. Either a Date object, or a string in yyyy-mm-dd format. If NULL (the default), will use the current date in the client’s time zone.

min
The minimum allowed date. Either a Date object, or a string in yyyy-mm-dd format.

max
The maximum allowed date. Either a Date object, or a string in yyyy-mm-dd format.

format
The format of the date to display in the browser. Defaults to “yyyy-mm-dd”.

startview
The date range shown when the input object is first clicked. Can be “month” (the default), “year”, or “decade”.

weekstart
Which day is the start of the week. Should be an integer from 0 (Sunday) to 6 (Saturday).

language
The language used for month and day names. Default is “en”. Other valid values include “bg”, “ca”, “cs”, “da”, “de”, “el”, “es”, “fi”, “fr”, “he”, “hr”, “hu”, “id”, “is”, “it”, “ja”, “kr”, “lt”, “lv”, “ms”, “nb”, “nl”, “pl”, “pt”, “pt-BR”, “ro”, “rs”, “rs-latin”, “ru”, “sk”, “sl”, “sv”, “sw”, “th”, “tr”, “uk”, “zh-CN”, and “zh-TW”.

 

inputs with Shiny