Basic Usage of R


Basic Usage of R, Part 1

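Main types of data handled in R

atom: a logical (truth) value, a number, or a character string

atomic vector: a sequence of atoms of the same type

list: a sequence of elements that need not be atoms of the same type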
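The three kinds of data above can be inspected with the base functions typeof(), is.atomic(), and is.list(). The following is a small sketch; the variable names are invented for this example. Note that R has no separate scalar type: a single value such as TRUE or 3 is an atomic vector of length 1.

```r
# Single values ("atoms") are atomic vectors of length 1
flag <- TRUE
number <- 3
letter <- "a"

# An atomic vector: every element has the same type
v <- c(1.5, 2, 3)

# A list: elements may differ in type (and in length)
l <- list(1, "a", c(TRUE, FALSE))

typeof(flag)     # "logical"
typeof(number)   # "double"
typeof(letter)   # "character"
is.atomic(v)     # TRUE
is.list(l)       # TRUE
length(l[[3]])   # 2
```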

Entering vectors and data frames directly in R

> c(1, 3, 5, 7)
[1] 1 3 5 7
> seq(1, 7, by=2)
[1] 1 3 5 7
> rep(3, 4)
[1] 3 3 3 3
> rep(1:3, 2)
[1] 1 2 3 1 2 3
> rep(1:3, each=2)
[1] 1 1 2 2 3 3
> height <- c(162, 170, 169)
> name <- c("John", "Mary", "Sue")
> data <- data.frame(name, height)
> data
  name height
1 John    162
2 Mary    170
3  Sue    169

Atomic vector

> vector1 <- c(1,2)
> vector1
[1] 1 2
> vector2 <- c(3,4)
> vector2
[1] 3 4
> vector2[1]
[1] 3
> vector2[2]
[1] 4
> c(vector1, vector2)
[1] 1 2 3 4
> c(vector1, 5)
[1] 1 2 5
> c(vector1, "a")
[1] "1" "2" "a"
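The last result shows implicit coercion: an atomic vector can hold only one type, so combining numbers with a string turns the numbers into strings. A minimal sketch of this behaviour follows; the names mixed and back are made up for illustration.

```r
vector1 <- c(1, 2)

mixed <- c(vector1, "a")
class(mixed)              # "character": the numbers became strings

class(c(TRUE, vector1))   # "numeric": TRUE became 1

# Converting back is explicit; a non-numeric string becomes NA (with a warning)
back <- suppressWarnings(as.numeric(mixed))
back                      # 1 2 NA
```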

List

> list1 <- list(vector1, 3)
> list1
[[1]]
[1] 1 2

[[2]]
[1] 3

> list1[2]
[[1]]
[1] 3

> list1[[2]]
[1] 3
> c(list1, 4)
[[1]]
[1] 1 2

[[2]]
[1] 3

[[3]]
[1] 4

> unlist(list1)
[1] 1 2 3
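The difference between list1[2] and list1[[2]] in the transcript above is that single brackets return a shorter list, while double brackets return the element itself. A short sketch that checks this with class():

```r
vector1 <- c(1, 2)
list1 <- list(vector1, 3)

class(list1[2])     # "list": single brackets return a sub-list
class(list1[[2]])   # "numeric": double brackets return the element itself

list1[[1]][2]       # 2: second element of the first component

# unlist() flattens to an atomic vector, coercing to a common type
flat <- unlist(list(vector1, "a"))
flat                # "1" "2" "a"
```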

Extracting information from a data frame

> verbs.rs
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
638               PP  pay      animate      inanimate     0.6931472
799               PP sell      animate      inanimate     1.3862944
390               NP lend      animate        animate     0.6931472
569               PP sell      animate      inanimate     1.6094379
567               PP send    inanimate      inanimate     1.3862944
> verbs.rs[verbs.rs$AnimacyOfTheme == "inanimate" & verbs.rs$LengthOfTheme < 1.4, ]
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
638               PP  pay      animate      inanimate     0.6931472
799               PP sell      animate      inanimate     1.3862944
567               PP send    inanimate      inanimate     1.3862944
> verbs.rs$AnimacyOfTheme == "inanimate"
[1]  TRUE  TRUE FALSE  TRUE  TRUE
> verbs.rs$LengthOfTheme < 1.4
[1]  TRUE  TRUE  TRUE FALSE  TRUE
> verbs.rs$AnimacyOfTheme=="inanimate" & verbs.rs$LengthOfTheme<1.4
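Row selection works by combining the two logical vectors element by element with & and using the result as a row index. The sketch below rebuilds the five rows by hand as a stand-in for verbs.rs (the values are copied from the printout above; the real data come from elsewhere), and also shows subset() as an alternative way to write the same selection.

```r
# Stand-in for the five rows printed above (values copied from the slide)
verbs.rs <- data.frame(
  RealizationOfRec = factor(c("PP", "PP", "NP", "PP", "PP")),
  Verb = factor(c("pay", "sell", "lend", "sell", "send")),
  AnimacyOfRec = factor(c("animate", "animate", "animate", "animate", "inanimate")),
  AnimacyOfTheme = factor(c("inanimate", "inanimate", "animate", "inanimate", "inanimate")),
  LengthOfTheme = c(0.6931472, 1.3862944, 0.6931472, 1.6094379, 1.3862944),
  row.names = c("638", "799", "390", "569", "567")
)

# Element-wise AND of the two conditions
keep <- verbs.rs$AnimacyOfTheme == "inanimate" & verbs.rs$LengthOfTheme < 1.4
keep                      # TRUE TRUE FALSE FALSE TRUE

picked <- verbs.rs[keep, ]
rownames(picked)          # "638" "799" "567"

# subset() evaluates the condition inside the data frame
same <- subset(verbs.rs, AnimacyOfTheme == "inanimate" & LengthOfTheme < 1.4)
```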

factor and character

> verbs.rs
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
638               PP  pay      animate      inanimate     0.6931472
799               PP sell      animate      inanimate     1.3862944
390               NP lend      animate        animate     0.6931472
569               PP sell      animate      inanimate     1.6094379
567               PP send    inanimate      inanimate     1.3862944
> colnames(verbs.rs)
[1] "RealizationOfRec" "Verb"             "AnimacyOfRec"     "AnimacyOfTheme"   "LengthOfTheme"
> colnames(verbs.rs)[2]
[1] "Verb"
> nchar(colnames(verbs.rs)[2])
[1] 4
> verbs.rs[4, 3]
[1] animate
Levels: animate inanimate
> nchar(verbs.rs[4, 3])
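A factor looks like text when printed but is stored as integer codes plus a set of levels, which is why string functions such as nchar() should not be applied to it directly. In current versions of R the documentation for nchar() states that giving a factor is an error; the behaviour of older versions is not assumed here, so the sketch wraps that call in tryCatch(). The vector f is invented for illustration.

```r
f <- factor(c("animate", "animate", "inanimate"))

as.integer(f)            # 1 1 2: the underlying integer codes
levels(f)                # "animate" "inanimate"

# Convert to character before applying string functions
nchar(as.character(f))   # 7 7 9

# Applying nchar() to the factor itself is guarded, not relied upon
res <- tryCatch(nchar(f), error = function(e) "error")
```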

Editing a data frame within R

> verbs.rs$Length = nchar(as.character(verbs.rs$Verb))
> verbs.rs
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme Length
638               PP  pay      animate      inanimate     0.6931472      3
799               PP sell      animate      inanimate     1.3862944      4
390               NP lend      animate        animate     0.6931472      4
569               PP sell      animate      inanimate     1.6094379      4
567               PP send    inanimate      inanimate     1.3862944      4
> colnames(verbs.rs)[2]="Spelling"
> verbs.rs
    RealizationOfRec Spelling AnimacyOfRec AnimacyOfTheme LengthOfTheme Length
638               PP      pay      animate      inanimate     0.6931472      3
799               PP     sell      animate      inanimate     1.3862944      4
390               NP     lend      animate        animate     0.6931472      4
569               PP     sell      animate      inanimate     1.6094379      4
567               PP     send    inanimate      inanimate     1.3862944      4
> dim(verbs.rs)
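The transcript renames the column by position (colnames(verbs.rs)[2]). One possible variation, sketched below on a small made-up data frame df, is to select the column by its current name so that the code keeps working if the column order changes.

```r
# Hypothetical three-row data frame used only for this illustration
df <- data.frame(Verb = c("pay", "sell", "lend"),
                 LengthOfTheme = c(0.69, 1.39, 0.69),
                 stringsAsFactors = TRUE)

# Add a column computed from an existing one
df$Length <- nchar(as.character(df$Verb))

# Rename by name rather than by position
colnames(df)[colnames(df) == "Verb"] <- "Spelling"

colnames(df)   # "Spelling" "LengthOfTheme" "Length"
dim(df)        # 3 3
```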

Finding out the type and structure of an object

> str(verbs.rs[3, 3])
 Factor w/ 2 levels "animate","inanimate": 1
> summary(verbs.rs[3, 3])
  animate inanimate
        1         0
> str(colnames(verbs.rs)[3])
 chr "AnimacyOfRec"
> summary(colnames(verbs.rs)[3])
   Length     Class      Mode
        1 character character
> str(verbs.rs)
'data.frame': 5 obs. of 5 variables:
 $ RealizationOfRec: Factor w/ 2 levels "NP","PP": 2 2 1 2 2
 $ Verb            : Factor w/ 65 levels "accord","allocate",..: 38 50 31 50 53
 $ AnimacyOfRec    : Factor w/ 2 levels "animate","inanimate": 1 1 1 1 2
 $ AnimacyOfTheme  : Factor w/ 2 levels "animate","inanimate": 2 2 1 2 2
 $ LengthOfTheme   : num 0.693 1.386 0.693 1.609 1.386
> dim(verbs.rs)

Basic Usage of R, Part 2

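Copying and pasting from a spreadsheet

                height weight
Suzuki Hanako      160     50
Tanaka Taro        175     65
Yoshimura Naoto    171     70

To copy data like the above from a spreadsheet and paste it into R on Windows:
x <- read.delim("clipboard", row.names=1)
If the data itself starts in the first column (no row names):
x <- read.delim("clipboard")
If the data itself starts in the first row (no header line):
x <- read.delim("clipboard", header=FALSE)
Note that read.table, by default, also treats an ordinary (half-width) space as a field separator, so a name such as "Suzuki Hanako" would be split in two.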

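Copying and pasting from a spreadsheet (continued)

For a plain vector (numbers in a single row or a single column):
x <- scan("clipboard")
On a Mac, use stdin() in place of "clipboard"; when the prompt 0: appears, paste the data, and press the Return key once the input is complete.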
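Because the clipboard cannot be shown in a self-contained example, the sketch below passes the same tab-separated text through the text= argument instead, which read.delim() and scan() both accept. It is meant only to illustrate why the choice of separator matters; the variable names are invented here.

```r
# Tab-separated text standing in for what a spreadsheet puts on the clipboard
txt <- paste(
  "\theight\tweight",
  "Suzuki Hanako\t160\t50",
  "Tanaka Taro\t175\t65",
  "Yoshimura Naoto\t171\t70",
  sep = "\n"
)

# read.delim() splits on tabs only, so names containing a space stay intact
x <- read.delim(text = txt, row.names = 1)
x["Tanaka Taro", "height"]   # 175

# With whitespace as the separator (the read.table() default),
# each data line breaks into four fields instead of three
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "")
close(con)
n_fields[-1]                 # 4 4 4

# A single row or column of numbers can be read with scan()
v <- scan(text = "160 175 171", quiet = TRUE)
```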

Reading a text file

> getwd()
[1] "C:/Users/修一/SkyDrive/Documents"
> setwd("C:/Users/修一/SkyDrive/Documents/stats/DataAnalysis2017")
> getwd()
[1] "C:/Users/修一/SkyDrive/Documents/stats/DataAnalysis2017"
> scan("jefferson2.txt", what="character") -> words
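With what="character", scan() returns the whitespace-separated strings of the file as a character vector. The sketch below uses a temporary file with made-up contents rather than jefferson2.txt. One point that may matter for English text: by default scan() treats ' and " as quote marks, so the example passes quote = "" to keep apostrophes as ordinary characters.

```r
# Temporary file with invented contents, standing in for a real text file
tmp <- tempfile(fileext = ".txt")
writeLines(c("This is a pen.", "That's great."), tmp)

# what = "character" reads whitespace-separated strings;
# quote = "" keeps the apostrophe in "That's" from being read as a quote mark
words <- scan(tmp, what = "character", quote = "", quiet = TRUE)
words   # "This" "is" "a" "pen." "That's" "great."

unlink(tmp)
```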

Basic Usage of R, Part 3

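Goal for this part (Exercise 1)

Consider reformatting a text file that contains a long English text. The whole text should become a list of sentences, and each sentence should be a vector whose elements are words. That is, the result should have the same form as "data" below.

> sentence1 <- c("This", "is", "a", "pen")
> sentence2 <- c("That", "is", "a", "hat")
> data <- list(sentence1, sentence2)
> data
[[1]]
[1] "This" "is"   "a"    "pen"

[[2]]
[1] "That" "is"   "a"    "hat"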

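Using packages

stringr, a package that is useful for manipulating strings:

> install.packages("stringr")
(On Windows, problems can occur unless R was started with "Run as administrator".)
> library(stringr)
(This has to be done in every session.)

(Other examples include foreign, a package useful for exchanging data with other statistical software, and languageR, the package that accompanies Baayen's book.)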

Extracting part of a string

> x <- "peacock"
> x
[1] "peacock"
> library(stringr)
> str_sub(x, 2, 4)
[1] "eac"
> str_sub(x, 2, -2)
[1] "eacoc"
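For comparison, the same substrings can be obtained without stringr by using base R's substr(). This is a sketch of an alternative, not the approach used on the slide: substr() does not accept negative positions, so positions counted from the end are computed with nchar().

```r
x <- "peacock"
n <- nchar(x)        # 7

substr(x, 2, 4)      # "eac"    (same as str_sub(x, 2, 4))
substr(x, 2, n - 1)  # "eacoc"  (same as str_sub(x, 2, -2))
substr(x, n, n)      # "k"      (the last character)
```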

Basic Usage of R, Part 4

Defining functions

> f <- function (x) { return(x^2) }
> f(3)
[1] 9
> f(4)
[1] 16
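Two further features of function definitions are often useful: arguments can have default values, and the value of the last evaluated expression is returned even without return(). The function power below is a made-up extension of f for illustration.

```r
# p has a default value, so power(x) squares x just like f() above
power <- function(x, p = 2) {
  x^p   # the value of the last expression is returned
}

power(3)          # 9
power(2, p = 3)   # 8
power(1:4)        # 1 4 9 16: arithmetic is applied element by element
```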

Loops

> plus <- function (x)
+ {
+   result <- 0
+   for (i in 1:length(x))
+     result <- result + x[i]
+   return(result)
+ }
> plus(1:10)
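One caveat about the loop above: when x is empty, 1:length(x) is the sequence 1, 0 rather than an empty sequence, so the function does not return 0. A common way to avoid this, sketched below as a variant named plus2 (a name invented here), is to iterate over seq_along(x). In practice the built-in sum() does the same job.

```r
plus2 <- function(x) {
  result <- 0
  for (i in seq_along(x)) {   # seq_along(x) is empty when x is empty
    result <- result + x[i]
  }
  result
}

plus2(1:10)         # 55
plus2(numeric(0))   # 0
sum(1:10)           # 55, using the built-in function
```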

Conditional branching

# str_sub() comes from stringr, so library(stringr) must have been run
separate_punctuation <- function(x)
{
  output <- c()
  for (i in 1:length(x))
  {
    last_letter <- str_sub(x[i], -1, -1)
    if (last_letter == "." || last_letter == "?" || last_letter == "," || last_letter == ";")
      output <- c(output, str_sub(x[i], 1, -2), last_letter)
    else
      output <- c(output, x[i])
  }
  return(output)
}

> sentence1 <- c("This", "is", "a", "pen.")
> separate_punctuation(sentence1)
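The following sketch shows one way the pieces so far could be combined toward the goal of Exercise 1. It is an illustration under stated assumptions, not the intended solution: separate_punctuation2 is a base-R rewrite (substr() and %in% instead of stringr), and sentences are assumed to end at a "." or "?" token.

```r
# Base-R variant of separate_punctuation(); no stringr needed
separate_punctuation2 <- function(x) {
  output <- character(0)
  for (w in x) {
    n <- nchar(w)
    last_letter <- substr(w, n, n)
    if (last_letter %in% c(".", "?", ",", ";")) {
      output <- c(output, substr(w, 1, n - 1), last_letter)
    } else {
      output <- c(output, w)
    }
  }
  output
}

words <- c("This", "is", "a", "pen.", "Is", "that", "a", "hat?")
tokens <- separate_punctuation2(words)

# A new sentence starts right after each sentence-final token
ends <- tokens %in% c(".", "?")
sentence_id <- cumsum(c(0, head(ends, -1)))

# split() groups the tokens by sentence number, giving a list of vectors
sentences <- unname(split(tokens, sentence_id))
sentences[[2]]   # "Is" "that" "a" "hat" "?"
```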

Basic Usage of R, Part 5

Managing the session

objects()

ls()

rm(list = ls())

q()
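objects() and ls() both list the names of the objects in the workspace, rm(list = ls()) removes all of them, and q() ends the session. Running these at the top level would wipe the reader's own workspace, so the sketch below applies the same calls to a scratch environment instead (scratch is a name invented here).

```r
scratch <- new.env()
assign("height", c(162, 170, 169), envir = scratch)
assign("name", c("John", "Mary", "Sue"), envir = scratch)

ls(scratch)        # "height" "name"
objects(scratch)   # same result: objects() is another name for ls()

# Remove everything that ls() reports, but only inside the scratch environment
rm(list = ls(scratch), envir = scratch)
ls(scratch)        # character(0)
```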

sapply

> sentence1 <- c("This", "is", "a", "pen.")
> sentence2 <- c("That's", "great.")
> sentences <- list(sentence1, sentence2)
> sapply(sentences, length)
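sapply() applies a function to every element of a list and simplifies the result to a vector where it can. The sketch below contrasts it with lapply(), which always returns a list, and vapply(), which checks the type of each result; the anonymous function is only an illustration.

```r
sentences <- list(c("This", "is", "a", "pen."), c("That's", "great."))

sapply(sentences, length)               # 4 2
lapply(sentences, length)               # a list holding 4 and 2
vapply(sentences, length, integer(1))   # 4 2, each result checked to be one integer

# Any function can be applied, for example one that picks the last word
last_words <- sapply(sentences, function(s) s[length(s)])
last_words                              # "pen." "great."
```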

tapply and aggregate

> heid2 = aggregate(heid$RT, list(heid$Word), mean)
> head(heid2)
      Group.1        x
1 aftandsheid 6.705000
2  antiekheid 6.542353
3  banaalheid 6.587727
4  basaalheid 6.585714
5 bebrildheid 6.673333
6 beschutheid 6.551667
> heid3 = tapply(heid$RT, list(heid$Word), mean)
> head(heid3)
aftandsheid  antiekheid  banaalheid  basaalheid bebrildheid beschutheid
   6.705000    6.542353    6.587727    6.585714    6.673333    6.551667
> dim(heid2)
[1] 40  2
> dim(heid3)
[1] 40
> heid3 = tapply(heid$RT, heid$Word, mean)
> head(heid3)
aftandsheid  antiekheid  banaalheid  basaalheid bebrildheid beschutheid
   6.705000    6.542353    6.587727    6.585714    6.673333    6.551667
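Both functions compute a statistic per group; the difference lies in the shape of the result, as the dim() calls above show: aggregate() returns a data frame, tapply() a named array. The sketch below uses a tiny made-up data frame toy in place of heid. The formula form of aggregate() shown at the end is an alternative that keeps the original column names.

```r
# Hypothetical data: two measurements for "a" and "b", one for "c"
toy <- data.frame(Word = c("a", "b", "a", "b", "c"),
                  RT = c(1, 2, 3, 4, 5))

agg <- aggregate(toy$RT, list(toy$Word), mean)
colnames(agg)   # "Group.1" "x"
agg$x           # 2 3 5

tap <- tapply(toy$RT, toy$Word, mean)
names(tap)      # "a" "b" "c"
tap[["b"]]      # 3

# Formula interface: the result columns are named Word and RT
agg2 <- aggregate(RT ~ Word, data = toy, FUN = mean)
colnames(agg2)  # "Word" "RT"
```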

Sorting

> x <- c(1,1,3:1,1:4,3)
> y <- c(9,9:1)
> x
 [1] 1 1 3 2 1 1 2 3 4 3
> y
 [1] 9 9 8 7 6 5 4 3 2 1
> order(x, y)
 [1]  6  5  1  2  7  4 10  8  3  9
> order(x, y, decreasing=T)
 [1]  9  3  8 10  4  7  1  2  5  6
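order() returns positions rather than sorted values: the first number is the index of the smallest element, with ties in x broken by y. Using that result as an index puts the vectors in order. The sketch below also shows one common way to sort one key ascending and the other descending, by negating a numeric key; this goes beyond what the slide shows.

```r
x <- c(1, 1, 3:1, 1:4, 3)
y <- c(9, 9:1)

o <- order(x, y)
x[o]   # 1 1 1 1 2 2 3 3 3 4
y[o]   # 5 6 9 9 4 7 1 3 8 2

# Ascending in x, descending in y (negating works for numeric keys)
o2 <- order(x, -y)
o2     # 1 2 5 6 4 7 3 8 10 9
```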

Sorting (continued)

> verbs.rs[order(verbs.rs$Verb), ]
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
390               NP lend      animate        animate     0.6931472
1                 PP  pay      animate      inanimate     0.6931472
799               PP sell      animate      inanimate     1.3862944
569               PP sell      animate      inanimate     1.6094379
567               PP send    inanimate      inanimate     1.3862944
> verbs.rs[order(verbs.rs$Verb, rownames(verbs.rs)), ]
    RealizationOfRec Verb AnimacyOfRec AnimacyOfTheme LengthOfTheme
390               NP lend      animate        animate     0.6931472
1                 PP  pay      animate      inanimate     0.6931472
569               PP sell      animate      inanimate     1.6094379
799               PP sell      animate      inanimate     1.3862944
567               PP send    inanimate      inanimate     1.3862944

Creating a contingency table (= cross-tabulation)

> xtabs( ~ RealizationOfRec + AnimacyOfRec, data = verbs)
                AnimacyOfRec
RealizationOfRec animate inanimate
              NP     521        34
              PP     301        47
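The formula passed to xtabs() has nothing on its left-hand side, which means "count rows", and the variables on the right become the dimensions of the table. The sketch below applies the same call to a five-row made-up data frame d (not the full verbs data) and adds table() and prop.table() as related base functions.

```r
# Hypothetical five-row data frame standing in for verbs
d <- data.frame(
  RealizationOfRec = c("PP", "PP", "NP", "PP", "PP"),
  AnimacyOfRec = c("animate", "animate", "animate", "animate", "inanimate")
)

tab <- xtabs(~ RealizationOfRec + AnimacyOfRec, data = d)
tab["PP", "animate"]     # 3
tab["NP", "inanimate"]   # 0

# table() gives the same counts from two vectors
t2 <- table(d$RealizationOfRec, d$AnimacyOfRec)

# Row proportions: each row sums to 1
p <- prop.table(tab, 1)
p["PP", "animate"]       # 0.75
```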

merge

> head(heid2)
         Word   MeanRT
1 aftandsheid 6.705000
2  antiekheid 6.542353
3  banaalheid 6.587727
4  basaalheid 6.585714
5 bebrildheid 6.673333
6 beschutheid 6.551667
> head(items)
          Word BaseFrequency
1   basaalheid          3.56
2  markantheid          5.16
3 ontroerdheid          5.55
4  contentheid          4.50
5    riantheid          4.53
6  tembaarheid          0.00
> heid2 = merge(heid2, items, by = "Word")
> head(heid2)
         Word   MeanRT BaseFrequency
1 aftandsheid 6.705000          4.20
2  antiekheid 6.542353          6.75
3  banaalheid 6.587727          5.74
4  basaalheid 6.585714          3.56
5 bebrildheid 6.673333          3.61
6 beschutheid 6.551667          4.79
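merge() matches rows of the two data frames on the column named in by, regardless of row order, and by default keeps only the keys present in both. The sketch below uses two small made-up data frames (means and freqs, with invented values) to show this default and the all.x = TRUE option, which keeps unmatched rows of the first data frame and fills the gaps with NA.

```r
# Hypothetical inputs; "b" has no frequency and "d" has no mean
means <- data.frame(Word = c("a", "b", "c"), MeanRT = c(6.7, 6.5, 6.6))
freqs <- data.frame(Word = c("c", "a", "d"), BaseFrequency = c(3.56, 4.20, 5.16))

# Default: only words found in both data frames, sorted by the key
inner <- merge(means, freqs, by = "Word")
inner$Word            # "a" "c"
inner$BaseFrequency   # 4.20 3.56

# all.x = TRUE keeps every row of the first data frame
left <- merge(means, freqs, by = "Word", all.x = TRUE)
nrow(left)            # 3
left$BaseFrequency[left$Word == "b"]   # NA
```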
