
What are they singing about? Topic analysis of Rammstein texts

What do we mean when asking what these song texts are about? 

It may mean that we are looking for topics, treating each song as a net of themes. Maybe one text is about the milkman, about a boy, and about love. We would take certain words in the text as representing these topics, and a word may represent more than one of them: "bottle", "white", "milk" and "door" could be seen as related to the milkman; "kiss", "forever" and "flowers" to love, and to the boy, too.

For us humans, relating words to topics, or topics to texts, is usually conceptual work. We do not need to look at the real world, or even at texts, to see a relation between the color white and the front door once the milkman is named.

The machine, by contrast, needs data, empirical evidence. Working its way through the texts, it could list the words and the texts in which they occur, perhaps calculating the frequency of each word within a text and within the entire corpus of songs. But this is not a very promising procedure: there are too many words. When we talk about topics, we imagine a relatively small number of fields.
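The word list described here can be sketched in a few lines. The example below is written in Python rather than R only so that it runs without any packages; the three toy "songs" and all variable names are hypothetical and merely reuse the milkman words from above.

```python
from collections import Counter

# Hypothetical toy "songs"; the words come from the milkman example.
songs = {
    "song_a": "milk bottle white door milk kiss",
    "song_b": "kiss forever flowers kiss boy",
    "song_c": "boy door white flowers",
}

# Frequency of each word within each single text.
per_song = {title: Counter(text.split()) for title, text in songs.items()}

# Frequency of each word within the entire corpus.
corpus = Counter()
for counts in per_song.values():
    corpus.update(counts)

# Number of texts in which each word occurs at least once.
doc_freq = Counter()
for counts in per_song.values():
    doc_freq.update(counts.keys())

for word, total in corpus.most_common():
    print(f"{word:8s} corpus={total} texts={doc_freq[word]}")
```

Even with three tiny texts the list already has eight entries, one per distinct word, which hints at why a bare word list does not scale to a small number of topics.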

We can instead make the machine look for only a few topics, that is, groups of words that belong together. The connection will not be conceptual; it can only be based on occurrences of words in documents. If words keep turning up together within the same documents, they are likely to be linked semantically.
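As a rough illustration of "occurring within the same documents", the following sketch counts how many documents each pair of words shares. It is Python rather than R, the four toy documents are invented, and `cooccurrence` is a hypothetical helper, not a function from any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents built from the milkman example words.
docs = [
    "milk bottle white door",
    "kiss forever flowers boy",
    "milk white door boy",
    "kiss flowers boy",
]

# Count every unordered word pair once per document in which both words occur.
pair_counts = Counter()
for text in docs:
    words = sorted(set(text.split()))
    pair_counts.update(combinations(words, 2))

def cooccurrence(a, b):
    """Number of documents that contain both words (hypothetical helper)."""
    return pair_counts[tuple(sorted((a, b)))]

print(cooccurrence("milk", "white"))    # shared by two documents
print(cooccurrence("kiss", "flowers"))  # shared by two documents
print(cooccurrence("milk", "kiss"))     # never in the same document
```

Pairs with high counts are candidates for belonging to the same topic; the machine knows nothing about milkmen, only that "milk" and "white" keep company.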

With R, we can try an LDA topic analysis. We choose a number of topics. The program first assigns a random topic to every single word, then calculates the relative frequencies of words and topics, un-assigns one word, assigns it a new topic according to those calculations, and so on. The Dirichlet distribution serves as a prior: it smooths the counts, so that a topic that is still rare in a document, or a word that is still rare in a topic, never gets a probability of zero. This way, trying again and again to get better correspondences, what was latent before becomes visible. That is the promise of Latent Dirichlet Allocation.
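The assign, un-assign and re-assign loop can be sketched as a toy collapsed Gibbs sampler. This is not the code behind the results in this post and not what any particular R package does internally; it is a minimal Python illustration, and the function name `lda_gibbs`, the values of `alpha` and `beta`, the number of iterations and the toy documents are all assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=1):
    """Toy collapsed Gibbs sampler for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Step 1: assign a random topic to every single word occurrence.
    assignments = [[rng.randrange(k) for _ in doc] for doc in docs]
    doc_topic = [[0] * k for _ in docs]          # topic counts per document
    topic_word = [Counter() for _ in range(k)]   # word counts per topic
    topic_total = [0] * k                        # total words per topic
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assignments[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Step 2: un-assign this one word.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Step 3: weigh each topic by how common it is in this
                # document and how common this word is in that topic.
                # alpha and beta are the Dirichlet priors (assumed values);
                # they keep every weight above zero.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(k)
                ]

                # Step 4: draw a new topic in proportion to the weights.
                z = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word

# Hypothetical toy corpus, again built from the milkman example.
docs = [
    "milk bottle white door milk".split(),
    "kiss forever flowers boy kiss".split(),
    "milk white door bottle".split(),
    "kiss flowers boy forever".split(),
]
topics = lda_gibbs(docs, k=2)
for t, counts in enumerate(topics, start=1):
    print(f"topic{t}:", [w for w, n in counts.most_common(4) if n > 0])
```

Because the sampler is random, the word lists depend on the seed and on the number of iterations; with so little text, different seeds may well give different groupings.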

We can imagine that this process of repeated corrections needs an awful lot of text material. Some say we need at least 1000 documents. But the truth is, it depends.

For example, a Twitter search on December 11 for 20 tweets with the hashtag "advent", followed by an LDA analysis for 2 topics, results in two neatly distinguishable lists of words. Topic 2 relates to Christmas ("love", "Advent calendar"), topic 1 to the right-wing party AfD.

> terms(fund_lda, 10)
     topic1         topic2
[1,] "🇩🇪"           "liebe"
[2,] "geht"         "#adventszeit"
[3,] "sehen"        "#adventskalender"
[4,] "alternative"  "heute"
[5,] "deutschland"  "#weihnachtsmarkt"
[6,] "#afd"         "danke"


Here, LDA had only 11752 tokens to work with, of which 1317 were unique. Should it not be possible to get clear topics from our Rammstein texts, with their 17150 tokens?


With k = 2 topics we get:

      topic1     topic2
 [1,] "gut"      "sehen"
 [2,] "gehen"    "herz"
 [3,] "nacht"    "mann"
 [4,] "kind"     "immer"
 [5,] "schön"    "sonne"
 [6,] "gesicht"  "licht"
 [7,] "zeit"     "komm"
 [8,] "ganz"     "hand"
 [9,] "lieben"   "mehr"
[10,] "weiß"     "lust"


Maybe we could write poems with these keywords, but it would be hard to identify topics here. "Child", "night", "beautiful", "face", "time", "to love" and "white" sound like a storyline.


Increasing the number of topics does not help. With k = 4 the result is:


     topic1     topic2    topic3    topic4
[1,] "gut"      "herz"    "kind"    "mann"
[2,] "gehen"    "nacht"   "sonne"   "sehen"
[3,] "sterben"  "licht"   "immer"   "mehr"
[4,] "schön"    "hand"    "lieben"  "lust"
[5,] "stein"    "liebe"   "ganz"    "kommen"
[6,] "blut"     "spring"  "komm"    "leben"
[7,] "amerika"  "alt"     "zeit"    "haut"
[8,] "haben"    "stehen"  "halten"  "feuer"
[9,] "tief"     "leid"    "land"    "-manns" (ok, a mistake somewhere)


k = 17 starts with:


      topic1     topic2         topic3
 [1,] "ja"       "träne"        "hand"
 [2,] "tier"     "amerika"      "dürfen"
 [3,] "fest"     "liebe"        "rammstein"
 [4,] "zunge"    "alt"          "wer"
 [5,] "schön"    "deutschland"  "frei"
 [6,] "gesicht"  "hand"         "stehen"
 [7,] "recht"    "tränen"       "himmel"
 [8,] "gott"     "wissen"       "laufen"
 [9,] "tief"     "jung"         "auge"
[10,] "ein"      "wind"         "warten"






poem 50 words "child" "night" "white" "totally" "love"

The night so still, a child's pure delight,
The stars twinkle in the white of night,
A peaceful moment as the world sleeps tight,
Totally in love with the beauty of white.

poem 100 words "to see" "heart" "man" "always" "sun" "light" "come" "hand" "more" "lust"

A man with a heart of gold,
Always looking for more to behold,
The sun and the light, he will come to see,
Lust for life, he will always be free.
In his hand he will carry the sun,
To show the world the light has come,
A new day, a new chance to start,
A fresh life, a fresh heart.
He will bring us joy and hope,
Open our eyes, help us cope,
With love and kindness in his hand,
His heart will always understand.
And when the sun sets in the night,
His heart will be full of light,
Always looking for more to see,
He will always love with glee.

story 200 words with poem 100 words "to see" "heart" "man" "always" "sun" "light" "come" "hand" "more" "lust"

Once upon a time, there was a man with a heart of gold. He was always looking for more to behold, for the sun and the light, he will come to see, and lust for life, he will always be free. In his hand he will carry the sun, to show the world the light has come. A new day, a new chance to start, a fresh life, a fresh heart. He will bring us joy and hope, open our eyes, help us cope. With love and kindness in his hand, his heart will always understand. He will go out of his way to help those in need, and never accept anything in return. He will show us that life is always worth living, and grant us the courage to take on any challenge. And when the sun sets in the night, his heart will be full of light. Always looking for more to see, he will always love with glee. He will be our guiding light, teaching us to never give up the fight. He will be our beacon of hope, showing us that we can cope.


write a poem 100 words with "to die" "beautiful" "stone" "blood" "amerika" "to have"

To Die in Amerika

To die in Amerika, so far from home,
As the blood drains out and the body turns to stone.
Beautiful and tragic, a sight to behold,
The person is gone, but the story is told.
A life that was lived, a path that was trod,
A love that was given, a heart that was gone.
The dream of a better future, the hope of to have,
Comes to an end in this tragic land.
A life gone too soon, a future no more,
A dream of a better life is closed at the door.
A beautiful stone marks the place where they lay,
A reminder of life and the price that was paid.

write an essay 100 words with "to die" "beautiful" "stone" "blood" "amerika" "to have"

The Native American tribes of Amerika believed that to die was a beautiful thing. When a warrior was close to death, they would wrap him in a stone blanket and gather around him in a circle. The elders would chant and sing as the warrior's blood returned to the Earth. To have a peaceful death was a great honor. To die surrounded by loved ones was a beautiful way to transition from this world to the next.

Tears to know, wind's embrace
Sorrowful yet sublime, love's grace.

translate into emoji "Tears to know, wind's embrace Sorrowful yet sublime, love's grace."

😭🤔🌬🤝😔😌💖


  
