A Word Never Comes Alone. A Glance at Co-occurrences

Enumerating words can be helpful. With a simple command in R (tokens_ngrams), we also got lists of word pairs, triples and so on. But usually we want to know which words appear within the same document. What about co-occurrences within the same song? With the fcm() command in R we obtain a neat little table, more or less like the following.

           ich    du   gut  liebe  kalt  bitte  sonne
ich       2775  1289   474    550   352     25    143
du           0   920   209    183   279     25     19
gut          0     0    91     19     0      2     20
lust       627     1     2      0   280      0      1
deutschl   209   532     0     76   152      0    171
liebe                         140    55     41     11
kalt                                 84     28      4
bitte                                       36      0
sonne                                              166

As a plot:

We see a bridge from "ich" to "du", "lust" and "liebe", and a lighter one to "kalt". This really looks like the lexical center of Rammstein songs.

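In the table, the "liebe" row seems to show no connection to "ich" and "du", but each pair is listed only once: the counts sit in the "ich" and "du" rows (550 and 183). "lust", for its part, co-occurs mainly with "ich" (627) and "kalt" (280). Reflect!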
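A table like this usually lists each word pair only once, so a zero in one row does not mean the two words never meet. As a minimal base R sketch, using only the "ich", "du" and "gut" counts from the table above (the object names `upper` and `full` are made up for illustration), we can mirror the filled half into the empty one and then look up a pair in either order:

```r
words <- c("ich", "du", "gut")

# Upper-triangular counts copied from the table above
upper <- matrix(c(2775, 1289, 474,
                     0,  920, 209,
                     0,    0,  91),
                nrow = 3, byrow = TRUE,
                dimnames = list(words, words))

# Mirror the upper triangle into the lower one; keep the diagonal only once
full <- upper + t(upper) - diag(diag(upper))

upper["gut", "ich"]  # 0: the pair is stored in the "ich" row instead
full["gut", "ich"]   # 474, the same as full["ich", "gut"]
```

After mirroring, the row for any word shows all of its partners, which makes statements about "no connection" easier to check.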

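Note: Some of the numbers above are very high because co-occurrences are counted within a V*V matrix over whole songs. Every occurrence of one word is paired with every occurrence of the other, so the counts grow roughly with the product of the term frequencies (for a word with itself, the square). For more reliable results, we should calculate relative frequencies.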
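To see why the raw numbers inflate, here is a small base R sketch with invented counts for two hypothetical songs (not real data). It assumes that document-level counting multiplies the per-song frequencies of the two words, which is my understanding of the default behaviour and not something checked against the package source. The boolean variant and the share of songs are just one possible reading of "relative frequency".

```r
# Invented per-song counts for two hypothetical songs
counts <- matrix(c(10, 5,
                    2, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("song1", "song2"), c("ich", "du")))

# Frequency-based: each "ich" is paired with each "du" in the same song
pair_freq <- crossprod(counts)["ich", "du"]    # 10*5 + 2*0 = 50

# Boolean: count only the songs in which both words occur at least once
present <- (counts > 0) * 1
pair_songs <- crossprod(present)["ich", "du"]  # 1 song

# Share of songs that contain both words
pair_share <- pair_songs / nrow(counts)        # 0.5
```

One song with many repetitions is enough to push the frequency-based count to 50, while the boolean count stays at a single song. In quanteda, fcm() offers a count = "boolean" option for tokens input that may serve a similar purpose.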

Technically

Packages: quanteda, readtext

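library(quanteda)
library(readtext)
library(magrittr)  # provides %>%
# From quanteda 3.0 on, textplot_network() needs library(quanteda.textplots).

# German stopwords, but keep "ich" and "du" in the analysis
my_stopwords <- stopwords("de") %>%
  char_remove(pattern = c("ich", "du"))

# basis: the song texts loaded earlier (not shown here)
toks_fund <- tokens(basis, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = my_stopwords)

# Document-feature matrix, then co-occurrences within each song
matrixa <- dfm(toks_fund)
toks_zusammen <- fcm(matrixa)

# Keep only the 20 most frequent features
feat <- names(topfeatures(toks_zusammen, 20))
fcmat_tokszus_subset <- toks_zusammen %>%
  fcm_select(feat)

# Set the seed before building the plot so the layout is reproducible
set.seed(134)
tplot_zusammen <- textplot_network(fcmat_tokszus_subset)
tplot_zusammen

# file = "" prints the table to the console
write.table(as.matrix(fcmat_tokszus_subset), file = "")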
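The snippet above depends on `basis`, which is not defined in this post. To try the same steps in isolation, here is a minimal sketch on three invented one-line "songs" (not real lyrics). The object names `songs`, `toks`, `bigrams` and `co` are made up for illustration, and the sketch assumes a reasonably recent quanteda.

```r
library(quanteda)

# Three invented one-line "songs", purely for illustration
songs <- c(s1 = "ich liebe dich", s2 = "du liebst mich", s3 = "ich und du")
toks <- tokens(songs)

# Adjacent word pairs (bigrams) within each song
bigrams <- tokens_ngrams(toks, n = 2)
print(as.list(bigrams)$s1)

# Document-level co-occurrence: which words share a song?
co <- fcm(toks, context = "document")
print(co)

# "ich" and "du" appear together only in s3
print(as.matrix(co)["ich", "du"])
```

The bigrams only capture direct neighbours, while the co-occurrence matrix also links "ich" and "du" in the third song, where "und" stands between them.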

Rammstein Read by the Machine
