In Italy, in Germany, in Poland. Teaching German can all of a sudden seem exciting when we listen to Rammstein in class. Some students, as always, do not care, but there are only a few of them. Some will be repelled, but a large group will start nodding and shouting as they begin to understand the lyrics of songs they have always listened to, at least during a certain period of their lives.
When "du hasst" (you hate) turns into "du hast mich gefragt" (you have asked me), and back again, a light goes on in the classroom. You may not love the industrial hammering of the music, and you may be shocked by the aggressive German shouting, but the literary quality of these texts is evident. If you have studied German literature, you cannot ignore the allusions, not only to GDR songs and the Brothers Grimm, but also to Trakl and French Symbolism.
But how can we analyze Rammstein texts? Isolate motifs and topics? See how they work? I want to try having a machine read them. The programming languages R and Python offer many packages with interesting methods for getting into the texts. I am curious about what we can find out through machine learning.
I built a corpus of 92 songs that I copied one by one (not scraped, as we need to be sure about the purity of the material) from the Textemania website. The spelling had to be corrected, as the German characters ä, ü, ö and ß are not always rendered correctly. The result is 92 files, each named with the year of the album and the title.
Over the coming weeks, I will give it a try.
The technical side
The 92 files have names like
R1995 Laichzeit.txt
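R code:
library("quanteda")
library("readtext")
library("quanteda.textstats")
library("quanteda.textplots")
library("rtweet")  # not used in the steps below
# Read all song files; the file names carry the album year and the title
basis <- readtext("/Users/Achim/Desktop/Rammstein/*.txt",
                  docvarsfrom = "filenames")
# doc_id holds the file name, e.g. "R1995 Laichzeit.txt"
arbname <- basis$doc_id
arbname <- gsub("^R", "", arbname)        # drop the leading "R"
arbname <- gsub("\\.txt$", "", arbname)   # drop the file extension
jahr <- substring(arbname, 1, 4)          # album year: characters 1 to 4
titel <- substring(arbname, 6)            # title: everything after the space
# Build the quanteda corpus and attach year and title as document variables
basis <- corpus(basis)
docvars(basis, "year") <- jahr
docvars(basis, "title") <- titel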
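To see what the file-name parsing yields, the same string operations can be tried on a single name without loading the corpus. This is only a small sketch in base R; the helper name parse_song_filename is my own invention for illustration, and it assumes every file follows the pattern "R", four-digit year, a space, the title, ".txt".

```r
# Hypothetical helper (not part of the script above): split a file name
# such as "R1995 Laichzeit.txt" into album year and song title.
# Assumes the pattern: "R" + four-digit year + space + title + ".txt".
parse_song_filename <- function(filename) {
  stem  <- sub("\\.txt$", "", filename)    # drop the file extension
  stem  <- sub("^R", "", stem)             # drop the leading "R"
  year  <- as.integer(substr(stem, 1, 4))  # first four characters
  title <- substring(stem, 6)              # everything after the space
  list(year = year, title = title)
}

parsed <- parse_song_filename("R1995 Laichzeit.txt")
print(parsed$year)   # [1] 1995
print(parsed$title)  # [1] "Laichzeit"
```

If a file name does not follow the pattern (for example, a missing space after the year), the title will come out wrong, so it is worth glancing over all 92 parsed titles once.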