
The Rammstein Mystery: Analysing Song Texts with R (1)

When we look at the most frequent words in a word cloud (wordmax = 20), after removing the stopwords, the most prominent ones describe elementary states: "Lust", "Kälte", "Licht", "Mann". Within this list, "Liebe" may be surprising: is it an elementary state or not? "Herz" has a symbolic quality. "America" appears in only one song.
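
A word cloud scales each word by its frequency, so underneath it lies a plain count of tokens after stopword removal. The post does this in R; below is a minimal Python sketch of the same idea, not the author's pipeline. The tiny stopword list, the toy lines (invented, not actual lyrics) and the helper name `top_words` are assumptions for illustration; `n` plays the role of wordmax.

```python
import re
from collections import Counter

# Hypothetical, drastically shortened German stopword list (assumption for this sketch;
# a real analysis would use a complete list).
STOPWORDS = {"ich", "du", "und", "die", "der", "das", "in", "ist", "nicht", "mich"}

def top_words(texts: list[str], n: int = 20) -> list[tuple[str, int]]:
    """Lowercase and tokenize all texts, drop stopwords, return the n most frequent words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())  # \w also matches umlauts such as "ä"
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Toy input invented for illustration
songs = [
    "Ich will Licht und Liebe",
    "Die Kälte ist in mir, ich suche Licht",
    "Du bist der Mann im Licht",
]
print(top_words(songs, n=3))
```

The choice of stopword list decides which words can appear in the cloud at all.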

Numerically, it looks like this:

rank  feature  frequency  docfreq  relative frequency  relative ranking
   1  liebe           46       11          0.63428098                 1
   2  mann            45       11          0.46447375                 6
   3  gut             44       18          0.48527064                 4
   4  komm            43        9                   -                 -
   5  ja              43       12                   -                 -
   6  immer           42       19          0.50531695                 3
   7  sonne           42       15          0.45047007                 -
   8  nacht           41       18                   -                 -
   9  licht           39       25          0.46691763                 5
  10  lust            37        8          0.45877258                 8
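
The table separates two counts: frequency is the total number of occurrences, docfreq the number of songs in which a word occurs at least once. That is why "komm" (43 occurrences, but in only 9 songs) ranks above "licht" (39 occurrences spread over 25 songs). The following Python sketch computes both counts, assuming simple lowercase word tokenization; `frequency_table` is a hypothetical helper and the toy lines are invented. The "relative" columns are left out because the post does not say how they were computed.

```python
import re
from collections import Counter

def frequency_table(texts: list[str]) -> list[tuple[str, int, int]]:
    """Return (feature, frequency, docfreq) rows, highest frequency first.

    frequency = total occurrences across all texts
    docfreq   = number of texts containing the feature at least once
    """
    freq = Counter()
    docfreq = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        freq.update(tokens)
        docfreq.update(set(tokens))  # each feature counted at most once per text
    # Sort by frequency (descending), break ties alphabetically
    return sorted(((w, freq[w], docfreq[w]) for w in freq), key=lambda row: (-row[1], row[0]))

songs = [  # invented toy lines
    "komm komm komm zu mir",
    "licht im dunkeln",
    "das licht ist hell, komm",
]
for rank, (feature, f, df) in enumerate(frequency_table(songs), start=1):
    print(rank, feature, f, df)
```

Here "komm" comes first with 4 occurrences in 2 texts: a word repeated within a few songs can outrank one that is spread across many.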

The picture changes if we do not remove stopwords. In that case, the most frequent word is "ich".

"ich" indeed appears 523 times in 66 of the 92 documents, much more often than "liebe" with 46 appearances in only 11 documents. 

Taken together, the list without stopwords and the complete bag of words, stopwords included, give a first impression of what Rammstein's lyrics are about.

