
"Ich" and "du" as stopwords?

Usually, text analysis programs treat "ich" ("I", "me") as a stopword, a function word without special meaning. This is because grammars categorize "ich" as a pronoun. But, as Eugen Coseriu stated in his book "Introduction to the linguistics of texts" (German edition 1985), this does not correspond to the way the word is actually used.

While "he" in 

"Ralf is tired. He will go to bed soon" 

is a substitute, i.e. pro-noun, for "Ralf", "I" in 

"Ralf is tired. I am going to bed" 

is not. The "I" here is understood as a reference to a second person, someone other than Ralf. The same is true of "du" ("you"). "Ich" and "du" have a deictic function: they point to somebody. Hence, we should not treat them as function words or eliminate them as stopwords. Especially in literary texts, which have an expressive function according to Habermas (1981) and others, "I", "me" and "you" can be rather important.

Once "ich" and "du" are taken off the list of quanteda stopwords, and are therefore kept in the analysis, the word cloud looks different:


The impression differs from the earlier ones. It looks as if a dialogical space had opened, in which elementary things and feelings stroll around.


Further statistical representations:

As text statistics:

textstat_frequency(matrixa, n=20)

   feature frequency rank docfreq group
1  ich           523    1      66 all
2  du            207    2      45 all
3  liebe          46    3      11 all
4  mann           45    4      11 all
5  gut            44    5      18 all
6  komm           43    6       9 all
7  ja             43    6      12 all
8  immer          42    8      19 all
9  sonne          42    8      15 all
10 nacht          41   10      18 all
11 licht          39   11      25 all
12 lust           37   12       8 all
13 gesicht        36   13      22 all
14 mehr           36   13      16 all
15 wasser         35   15       9 all
16 nie            34   16      17 all
17 kommt          34   16      14 all
18 zeit           34   16      14 all
19 amerika        34   16       1 all
20 kalt           32   20       9 all


Relative frequency:

textstat_frequency(clean_rel, n=10)

   feature frequency rank docfreq group
1  ich     5.4455093    1      66 all
2  du      2.3364267    2      45 all
3  liebe   0.5116736    3      11 all
4  immer   0.4783033    4      19 all
5  komm    0.4408699    5       9 all
6  sonne   0.4369036    6      15 all
7  mann    0.4266868    7      11 all
8  licht   0.4250503    8      25 all
9  gut     0.4097395    9      18 all
10 mehr    0.4051275   10      16 all
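The post does not show how clean_rel was built. One plausible reading, and it is only an assumption, is that each song's counts were turned into within-document shares with dfm_weight(scheme = "prop") and then summed per feature. The sketch below illustrates that idea on two invented toy texts (not the real lyrics); the names txt, counts, rel and rel_sum are hypothetical.

```r
library(quanteda)

# Hypothetical toy texts standing in for the song lyrics (not the real corpus)
txt <- c(song1 = "ich liebe dich", song2 = "du du du ich")

# Document-feature matrix of raw counts
counts <- dfm(tokens(txt))

# Assumption: relative frequency = share of each feature within its own document
rel <- dfm_weight(counts, scheme = "prop")

# Summing the per-document shares gives one score per feature
rel_sum <- sort(colSums(rel), decreasing = TRUE)
print(rel_sum)
```

Under this reading, a word's score depends on the length of the songs it appears in, not only on its raw count. That would be consistent with "immer" (42 occurrences) ranking above "komm" (43 occurrences) in the relative list, though the post itself does not confirm the method.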




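Technically

This only took one small addition with quanteda:

library(quanteda)
# Take "ich" and "du" off the German stopword list so they stay in the analysis
my_stopwords <- char_remove(stopwords("de"), pattern = c("ich", "du"))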
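To show where such a stopword list fits in, here is a minimal sketch of a pipeline that could lead to a frequency table like the ones above. The two toy sentences are invented stand-ins for the lyrics, and the preprocessing steps (punctuation removal, lowercasing) are assumptions, since the post does not list them. Only the name matrixa is taken from the post; textstat_frequency comes from the quanteda.textstats package.

```r
library(quanteda)
library(quanteda.textstats)

# Hypothetical toy texts standing in for the song lyrics (not the real corpus)
txt <- c(song1 = "Ich sehe dich, und du siehst mich",
         song2 = "Du bist hier und ich bin da")

# German stopwords minus "ich" and "du", as in the post
my_stopwords <- char_remove(stopwords("de"), pattern = c("ich", "du"))

# Assumed preprocessing: tokenize, lowercase, drop the remaining stopwords
toks <- tokens(txt, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = my_stopwords)

# Build the document-feature matrix and list the most frequent features
matrixa <- dfm(toks)
print(textstat_frequency(matrixa, n = 5))
```

With this list, "ich" and "du" survive as features, while words such as "und" are still filtered out.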

 





