
A Word Never Comes Alone. A Glance at Co-occurrences

Enumerating words can be helpful. With a simple command (tokens_ngrams) in R, we also got lists of word pairs, triples and so on. But usually we want to know which words appear within the same document. What about co-occurrences within the same song? With the fcm() command in R we obtain a neat little table, more or less like the following:

              ich     du    gut  liebe   kalt  bitte  sonne
ich          2775   1289    474    550    352     25    143
du              0    920    209    183    279     25     19
gut             0      0     91     19      0      2     20
lust          627      1      2      0    280      0      1
deutschl      209    532      0     76    152      0    171
liebe                             140     55     41     11
kalt                                      84     28      4
bitte                                            36      0
sonne                                                   166
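To make the difference between word pairs and co-occurrences concrete, here is a minimal sketch on two invented mini-texts (the texts and the object names songs, toks, bigrams and cooc are made up for illustration, they are not the Rammstein corpus). tokens_ngrams() only pairs neighbouring words, while fcm() on a document-feature matrix pairs every word with every other word of the same text.

```r
library(quanteda)

# Two invented "songs", only for illustration
songs <- c(song1 = "ich liebe dich",
           song2 = "du bist kalt ich bin kalt")
toks <- tokens(songs)

# Bigrams: only words standing next to each other are paired
bigrams <- tokens_ngrams(toks, n = 2)
as.list(bigrams)$song1

# Co-occurrence: every pair of words within the same text is counted
cooc <- fcm(dfm(toks))
m <- as.matrix(cooc)
m
```

In this toy case "ich" and "kalt" never form a bigram, but they do co-occur, because both appear in the second text.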


As a plot: 





We see a bridge from "ich" to "du", to "lust" and "liebe", and a lighter one to "kalt". This really looks like the word center of Rammstein songs.

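From the table, we see that the "liebe" row has no entries for "ich" and "du" (the matrix is stored as an upper triangle, so these pairs are recorded in the "ich" and "du" rows: 550 and 183), while "lust" co-occurs with "ich" and "kalt" but never with "liebe". Reflect!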
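The table lists each pair only once, in the row of the word that comes first, so the empty cells in the lower rows are not zeros. A small base-R sketch, using four of the words and the numbers from the table above, mirrors the upper triangle so that every row can be read on its own (the object names words, upper and full are made up for this example).

```r
# Upper-triangle counts copied from the table above (four words only)
words <- c("ich", "du", "gut", "liebe")
upper <- matrix(c(2775, 1289, 474, 550,
                     0,  920, 209, 183,
                     0,    0,  91,  19,
                     0,    0,   0, 140),
                nrow = 4, byrow = TRUE,
                dimnames = list(words, words))

# Mirror the upper triangle into the lower one; the diagonal is kept once
full <- upper + t(upper) - diag(diag(upper))

# The "liebe" row now shows its counts with "ich" and "du": 550 and 183
full["liebe", c("ich", "du")]
```

In quanteda itself, fcm() has a tri argument; setting tri = FALSE should return the full symmetric matrix directly.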

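Note: Some of the above numbers are very high because co-occurrences are counted in a V*V matrix (V being the number of distinct words), where each word is paired with every other word in the same song. Hence, the counts grow roughly with the product of the term frequencies rather than with the frequencies themselves. For definitive results, we should calculate the relative frequency.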
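Why the numbers inflate can be sketched in base R with an invented document-feature matrix (three songs, three words, made-up counts). This is a stand-in, not quanteda's exact computation, and counting songs instead of word pairs is only one possible reading of "relative frequency"; all object names are assumptions.

```r
# Invented counts: rows are songs, columns are words
dtm <- matrix(c(10, 5, 0,
                 2, 1, 1,
                 0, 3, 4),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("song1", "song2", "song3"),
                              c("ich", "du", "kalt")))

# Frequency-based co-occurrence: sum over songs of freq(a) * freq(b)
cooc_freq <- crossprod(dtm)

# Boolean co-occurrence: in how many songs do both words appear at all?
present   <- (dtm > 0) * 1
cooc_bool <- crossprod(present)

# Relative frequency: share of songs in which both words appear
cooc_rel <- cooc_bool / nrow(dtm)

cooc_freq["ich", "du"]  # 10*5 + 2*1 + 0*3 = 52
cooc_bool["ich", "du"]  # 2 songs contain both words
cooc_rel["ich", "du"]
```

A single song that repeats "ich" and "du" many times dominates the first count, while the second and third treat every song equally. In quanteda, fcm() offers a count = "boolean" option for the song-level variant.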

Technically
Packages: quanteda, readtext

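library(quanteda)            # tokens(), dfm(), fcm(), stopwords()
library(quanteda.textplots)  # textplot_network() lives here since quanteda 3.0
library(magrittr)            # %>% pipe

# `basis` is the corpus of song texts, loaded beforehand (e.g. with readtext)
# Keep "ich" and "du": drop them from the German stopword list
my_stopwords <- stopwords("de") %>%
  char_remove(pattern = c("ich", "du"))
toks_fund <- tokens(basis, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = my_stopwords)
matrixa <- dfm(toks_fund)

# Document-level co-occurrence matrix, reduced to the 20 top features
toks_zusammen <- fcm(matrixa)
feat <- names(topfeatures(toks_zusammen, 20))
fcmat_tokszus_subset <- toks_zusammen %>%
  fcm_select(feat)
set.seed(134)  # set before building the plot so the layout is reproducible
tplot_zusammen <- textplot_network(fcmat_tokszus_subset)
tplot_zusammen
# file = "" prints the table to the console
write.table(as.matrix(fcmat_tokszus_subset), file = "")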

