Text analysis programs usually treat "ich" ("I", "me") as a stopword, i.e. a function word without a meaning of its own, because grammars classify "ich" as a pronoun. But, as Eugen Coseriu argued in his book "Introduction to the linguistics of texts" (German edition 1985), this does not correspond to how the word is actually used.
While "he" in
"Ralf is tired. He will go to bed soon"
is a substitute, i.e. pro-noun, for "Ralf", "I" in
"Ralf is tired. I am going to bed"
is not. Here "I" is understood as referring to a different person, namely the speaker. The same is true of "du" ("you"). "Ich" and "du" have a deictic function: they point to somebody. Hence we should neither treat them as function words nor eliminate them as stopwords. Especially in literary texts, which have an expressive function (and not only according to Habermas 1981), "I", "me" and "you" can be rather important.
With "ich" and "du" removed from the quanteda stopword list, the word cloud looks different:
The impression differs from the earlier ones. It looks as if a dialogical space had opened up, in which elementary things and feelings stroll around.
Further statistical representations:
Absolute frequency:
textstat_frequency(matrixa, n = 20)
   feature frequency rank docfreq group
1      ich       523    1      66   all
2       du       207    2      45   all
3    liebe        46    3      11   all
4     mann        45    4      11   all
5      gut        44    5      18   all
6     komm        43    6       9   all
7       ja        43    6      12   all
8    immer        42    8      19   all
9    sonne        42    8      15   all
10   nacht        41   10      18   all
11   licht        39   11      25   all
12    lust        37   12       8   all
13 gesicht        36   13      22   all
14    mehr        36   13      16   all
15  wasser        35   15       9   all
16     nie        34   16      17   all
17   kommt        34   16      14   all
18    zeit        34   16      14   all
19 amerika        34   16       1   all
20    kalt        32   20       9   all
Relative frequency:
textstat_frequency(clean_rel, n = 10)
   feature frequency rank docfreq group
1      ich 5.4455093    1      66   all
2       du 2.3364267    2      45   all
3    liebe 0.5116736    3      11   all
4    immer 0.4783033    4      19   all
5     komm 0.4408699    5       9   all
6    sonne 0.4369036    6      15   all
7     mann 0.4266868    7      11   all
8    licht 0.4250503    8      25   all
9      gut 0.4097395    9      18   all
10    mehr 0.4051275   10      16   all
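
The post does not show how clean_rel was built. One plausible route, sketched here under that assumption, is to weight the count matrix by document length with dfm_weight(scheme = "prop"); textstat_frequency() then adds up the per-document proportions, which would explain why "ich" can exceed 1. The two toy documents and the object names toks and counts are hypothetical and stand in for the real corpus.

```r
library(quanteda)
library(quanteda.textstats)  # textstat_frequency() lives here since quanteda 3.0

# Hypothetical two-document example, not the corpus analysed in this post
toks <- tokens(c(d1 = "ich ich du", d2 = "ich sonne"))
counts <- dfm(toks)

# Assumption: relative frequency = share of each word within its document
clean_rel <- dfm_weight(counts, scheme = "prop")

# The frequency column is now the sum of these shares over all documents
rel_stats <- textstat_frequency(clean_rel, n = 3)
print(rel_stats)
```

Read this way, a word that fills a large share of many short texts ranks high even if its absolute count is modest, which may be why "immer" moves ahead of "mann" in the relative table.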
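Technically
Only the following was added, using quanteda:
library(quanteda)
# Take "ich" and "du" out of the German stopword list so that both words are kept
my_stopwords <- char_remove(stopwords("de"), pattern = c("ich", "du"))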
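
For readers who want to see where such a list enters the analysis, here is a minimal sketch of one possible pipeline. The example sentences, the object names texts and toks, and the preprocessing options are assumptions for illustration; the post does not show how matrixa was actually created.

```r
library(quanteda)

# Hypothetical stand-in texts; the real corpus is not shown in this post
texts <- c(
  doc1 = "Ich sehe dich und du siehst mich in der Nacht",
  doc2 = "Die Sonne kommt und ich bin nicht mehr allein"
)

# German stopword list without the two deictic words
my_stopwords <- char_remove(stopwords("de"), pattern = c("ich", "du"))

# Tokenize, lowercase, drop the remaining stopwords, build the matrix
toks <- tokens(texts, remove_punct = TRUE)
toks <- tokens_tolower(toks)
toks <- tokens_remove(toks, pattern = my_stopwords)
matrixa <- dfm(toks)

# "ich" and "du" survive, ordinary stopwords such as "und" do not
print(featnames(matrixa))
```

Removing the two words from the list, rather than adding them back after filtering, keeps the rest of the stopword handling unchanged.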