
the Rammstein Mystery (0)

In Italy, in Germany, in Poland: teaching German can suddenly become exciting when we listen to Rammstein in class. Some students, as always, do not care, but there are only a few of them. Some will be repelled, but a large group will start nodding and shouting as they begin to understand the lyrics of songs they have listened to for years, or at least during a certain period of their lives.

When "du hasst" turns into "du hast mich gefragt", and back again, a light goes on in the classroom. You may not love the industrial hammering of the music, and you may be shocked to hear aggressive German shouting, but the literary quality of these texts is evident. If you have studied German literature, you cannot ignore the allusions, not only to GDR songs and the Brothers Grimm, but also to Trakl and French Symbolism.

But how can we analyze Rammstein texts? Isolate motifs and topics? See how they work? I want to try having them read by the machine. The programming languages R and Python offer many packages with interesting methods for getting into the texts. I am curious about what we can find out with machine learning.

I built a corpus of 92 songs that I copied one by one from the Textemania website (not scraped, as we need to be sure about the purity of the material). The spelling had to be corrected, as the German characters ä, ü, ö and ß are not always rendered correctly. The result is 92 files, each named with the year of the album and the title.
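A minimal base-R sketch of how badly rendered umlauts could be spotted and repaired before building the corpus. It assumes the errors are the usual double-encoding artifacts (UTF-8 bytes read as Windows-1252, so that "ä" shows up as "Ã¤"); the post does not say what the errors actually looked like. The helper names `has_mojibake` and `fix_umlauts` are hypothetical, and the test words are not taken from the corpus files.

```r
# Assumption: wrong characters are UTF-8 bytes misread as Windows-1252.
# Written with \u escapes so the script itself is encoding-independent.
bad  <- c("\u00c3\u00a4", "\u00c3\u00b6", "\u00c3\u00bc", "\u00c3\u0178")
good <- c("\u00e4",       "\u00f6",       "\u00fc",       "\u00df")  # ä ö ü ß

# Flag every element of x that contains one of the garbled sequences
has_mojibake <- function(x) {
  found <- rep(FALSE, length(x))
  for (b in bad) {
    found <- found | grepl(b, x, fixed = TRUE)
  }
  found
}

# Replace each garbled sequence with the intended German character
fix_umlauts <- function(x) {
  for (i in seq_along(bad)) {
    x <- gsub(bad[i], good[i], x, fixed = TRUE)
  }
  x
}

# Illustrative strings: one garbled word, one clean word
lines <- c("K\u00c3\u00a4lte", "Lust")
has_mojibake(lines)
fix_umlauts(lines)
```

Applied to the lines of each song file, `has_mojibake()` would show which files still need a manual look, which fits the one-by-one approach described above.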

Over the next few weeks, I will give it a try.


The technical side

The 92 files have names like

R1995 Laichzeit.txt

R code:

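library("quanteda")
library("readtext")
library("quanteda.textstats")
library("quanteda.textplots")
library("rtweet")  # loaded in the original script; not used in this snippet

# Read all song files; doc_id holds the file name, e.g. "R1995 Laichzeit.txt"
basis <- readtext("/Users/Achim/Desktop/Rammstein/*.txt",
                  docvarsfrom = "filenames")

# Strip the leading "R" and the ".txt" extension from the file name
arbname <- basis$doc_id
arbname <- gsub("^R", "", arbname)
arbname <- sub("\\.txt$", "", arbname)

# Characters 1-4 are the album year; the title starts at character 6
jahr <- substring(arbname, 1, 4)
titel <- substring(arbname, 6, 40)

# Build the quanteda corpus and attach year and title as document variables
basis <- corpus(basis)
docvars(basis, "year") <- jahr
docvars(basis, "title") <- titel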
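The year and title extraction above can be tried on file names alone, without reading any files. The following is a minimal base-R sketch, assuming every file follows the pattern "R", four-digit year, space, title, ".txt". The helper name `parse_song_filename` and the `valid` column are hypothetical additions; only the first example file name comes from the post. The sketch drops the ".txt" extension so that it does not end up in the title.

```r
# Hypothetical helper: split file names like "R1995 Laichzeit.txt"
# into album year and song title, and flag names that break the pattern.
parse_song_filename <- function(doc_id) {
  ok   <- grepl("^R[0-9]{4} .+\\.txt$", doc_id)   # does the name fit the pattern?
  stem <- sub("\\.txt$", "", doc_id)              # drop the file extension
  stem <- sub("^R", "", stem)                     # drop the leading "R"

  year  <- rep(NA_integer_, length(doc_id))
  title <- rep(NA_character_, length(doc_id))
  year[ok]  <- as.integer(substr(stem[ok], 1, 4)) # characters 1-4: year
  title[ok] <- substring(stem[ok], 6)             # from character 6: title

  data.frame(doc_id = doc_id, year = year, title = title,
             valid = ok, stringsAsFactors = FALSE)
}

# "notes.txt" is an invented counter-example to show the validity flag
parsed <- parse_song_filename(c("R1995 Laichzeit.txt", "notes.txt"))
print(parsed)
```

With 92 hand-named files, a check like the `valid` column is a cheap way to catch a misnamed file before its year or title silently goes wrong in the document variables.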

