What do we mean when we ask what these song texts are about?
It may mean that we are looking for topics, treating each song as a net of themes. Maybe one text is about the milkman, about a boy and about love. We would consider certain words in the text as representations of these topics (maybe two of them). "Bottle", "white", "milk" and "door" could be seen as related to the milkman; "kiss", "forever" and "flowers" to love, and to the boy, too.
For us humans, relating words to topics or topics to texts is usually conceptual work. We do not need to look into the real world, or even into texts, to see a relation between the color white and the front door once the milkman is named.
The machine, by contrast, needs data: empirical evidence. Working its way through the texts, it could list each word together with the texts in which it occurs, perhaps calculating the frequency of the word within each text and within the entire corpus of songs. But this is not a very promising procedure: there are too many words. When we talk about topics, we imagine a relatively small number of fields.
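To make the word list idea concrete, here is a minimal sketch in base R. The three "songs" are invented toy data, not the real lyrics, and the variable names are only illustrative.

```r
# Toy corpus: three invented "songs" (hypothetical data, not the real lyrics).
songs <- c(
  milkman = "milk bottle white door milk",
  love    = "kiss forever flowers kiss love",
  both    = "milk door kiss love"
)

# Split each text into lower-case word tokens.
tokens <- strsplit(tolower(songs), "[^[:alpha:]]+")
names(tokens) <- names(songs)

# Long table: one row per token, together with the song it came from.
long <- data.frame(
  doc  = rep(names(tokens), lengths(tokens)),
  word = unlist(tokens, use.names = FALSE)
)

# Document-term matrix: rows are songs, columns are words, cells are counts.
dtm <- unclass(table(long$doc, long$word))

# Relative frequency of each word within its song and within the whole corpus.
tf_in_doc    <- dtm / rowSums(dtm)
tf_in_corpus <- colSums(dtm) / sum(dtm)

print(dtm)
print(round(tf_in_corpus, 2))
```

Even with three tiny texts the matrix already has one column per distinct word, which is why this approach gets unwieldy on a real corpus.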
We can make the machine look for only a few topics, that is, groups of words that belong together. The connection will not be conceptual; it can only be based on occurrences of words in documents. If words repeatedly occur together within the same documents, it is likely that they are linked semantically.
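What "occurring within the same documents" means can be counted directly. The following base R sketch uses invented toy songs (an assumption for illustration) and counts, for every pair of words, how many songs contain both.

```r
# Toy corpus (hypothetical data): the words that appear in each song.
songs <- list(
  s1 = c("milk", "bottle", "white", "door"),
  s2 = c("kiss", "forever", "flowers", "love"),
  s3 = c("milk", "door", "kiss"),
  s4 = c("bottle", "milk", "white")
)

vocab <- sort(unique(unlist(songs)))

# Binary incidence matrix: 1 if the word occurs in the song, else 0.
incidence <- t(sapply(songs, function(words) as.integer(vocab %in% words)))
colnames(incidence) <- vocab

# Cell [a, b] counts the songs that contain both word a and word b.
cooc <- crossprod(incidence)
diag(cooc) <- 0  # a word trivially co-occurs with itself

# Words sharing the most songs with "milk".
print(sort(cooc["milk", ], decreasing = TRUE))
```

This is only raw co-occurrence counting. A topic model goes further by explaining such counts through a small number of hidden groups.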
With R, we can try an LDA topic analysis. We choose the number of topics in advance. The program first assigns a random topic to every single word, then calculates the relative frequencies of words and topics, un-assigns one word, assigns it a new topic according to these calculations, and so on. The Dirichlet distribution enters as the prior assumption about how topics are spread over documents and words over topics, which keeps the estimates sensible where the counts are sparse. This way, trying again and again to get better correspondences, what was latent before will become visible. That is the promise of Latent Dirichlet Allocation.
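The assign, un-assign and re-assign loop described above corresponds to what is usually called collapsed Gibbs sampling, one common way of fitting LDA. Below is a minimal base R sketch of that idea. It is not the code behind the results in this post: the function name `toy_lda`, the toy documents and the hyperparameter values `alpha` and `beta` are all assumptions made for illustration.

```r
# Minimal collapsed Gibbs sampler for LDA in base R (illustrative sketch only).
# docs: list of character vectors; k: number of topics;
# alpha, beta: Dirichlet hyperparameters (assumed values, not tuned).
toy_lda <- function(docs, k, iters = 200, alpha = 0.1, beta = 0.01, seed = 1) {
  set.seed(seed)
  vocab <- sort(unique(unlist(docs)))
  V <- length(vocab)
  w <- lapply(docs, match, table = vocab)  # word ids per document
  # Random starting topic for every token.
  z <- lapply(w, function(x) sample.int(k, length(x), replace = TRUE))

  # Count tables: topic-by-word, document-by-topic, tokens per topic.
  n_kw <- matrix(0, k, V, dimnames = list(paste0("topic", 1:k), vocab))
  n_dk <- matrix(0, length(docs), k)
  for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
    n_kw[z[[d]][i], w[[d]][i]] <- n_kw[z[[d]][i], w[[d]][i]] + 1
    n_dk[d, z[[d]][i]] <- n_dk[d, z[[d]][i]] + 1
  }
  n_k <- rowSums(n_kw)

  for (it in seq_len(iters)) {
    for (d in seq_along(w)) for (i in seq_along(w[[d]])) {
      word <- w[[d]][i]
      old <- z[[d]][i]
      # Un-assign the current token.
      n_kw[old, word] <- n_kw[old, word] - 1
      n_dk[d, old] <- n_dk[d, old] - 1
      n_k[old] <- n_k[old] - 1
      # Unnormalized probability of each topic given all other assignments.
      p <- (n_dk[d, ] + alpha) * (n_kw[, word] + beta) / (n_k + V * beta)
      new <- sample.int(k, 1, prob = p)
      # Re-assign the token to the sampled topic.
      z[[d]][i] <- new
      n_kw[new, word] <- n_kw[new, word] + 1
      n_dk[d, new] <- n_dk[d, new] + 1
      n_k[new] <- n_k[new] + 1
    }
  }
  list(topic_word = n_kw, doc_topic = n_dk)
}

# Usage on four invented mini-documents (hypothetical data).
docs <- list(
  c("milk", "bottle", "white", "door", "milk"),
  c("bottle", "milk", "door", "white"),
  c("kiss", "forever", "flowers", "love"),
  c("love", "kiss", "flowers", "kiss")
)
fit <- toy_lda(docs, k = 2)

# Top three words per topic, laid out like the output of terms().
top <- apply(fit$topic_word, 1, function(x) names(sort(x, decreasing = TRUE))[1:3])
print(top)
```

Because the sampler is random, which topic ends up as "topic1" depends on the seed, and on small or noisy corpora the word lists themselves can change from run to run.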
We can imagine that this process of repeated corrections needs an awful lot of text material. Some say we need at least 1000 documents. But the truth is, it depends.
For example, a Twitter search on December 11 for 20 tweets with the hashtag "advent", followed by an LDA analysis with 2 topics, results in two neatly distinguishable lists of words. Topic 2 refers to Christmas ("love", "Advent calendar"), topic 1 to the right-wing party AfD.
> terms(fund_lda, 10)
     topic1        topic2
[1,] "🇩🇪"          "liebe"
[2,] "geht"        "#adventszeit"
[3,] "sehen"       "#adventskalender"
[4,] "alternative" "heute"
[5,] "deutschland" "#weihnachtsmarkt"
[6,] "#afd"        "danke"
Here, LDA had to work with only 11752 tokens, made up of 1317 unique ones. Should it not be possible, then, to get clear topics from our Rammstein texts, with their 17150 tokens?
With k = 2 topics we get:
      topic1    topic2
 [1,] "gut"     "sehen"
 [2,] "gehen"   "herz"
 [3,] "nacht"   "mann"
 [4,] "kind"    "immer"
 [5,] "schön"   "sonne"
 [6,] "gesicht" "licht"
 [7,] "zeit"    "komm"
 [8,] "ganz"    "hand"
 [9,] "lieben"  "mehr"
[10,] "weiß"    "lust"
Maybe we could write poems with these keywords, but it would be hard to identify topics here. "Child", "night", "beautiful", "face", "time", "to love" and "white" sound like a storyline.
Increasing the number of topics does not help. With k = 4 the result is:
     topic1    topic2   topic3   topic4
[1,] "gut"     "herz"   "kind"   "mann"
[2,] "gehen"   "nacht"  "sonne"  "sehen"
[3,] "sterben" "licht"  "immer"  "mehr"
[4,] "schön"   "hand"   "lieben" "lust"
[5,] "stein"   "liebe"  "ganz"   "kommen"
[6,] "blut"    "spring" "komm"   "leben"
[7,] "amerika" "alt"    "zeit"   "haut"
[8,] "haben"   "stehen" "halten" "feuer"
[9,] "tief"    "leid"   "land"   "-manns"
(OK, "-manns" is a mistake somewhere.)
With k = 17, the output starts with:
      topic1    topic2        topic3
 [1,] "ja"      "träne"       "hand"
 [2,] "tier"    "amerika"     "dürfen"
 [3,] "fest"    "liebe"       "rammstein"
 [4,] "zunge"   "alt"         "wer"
 [5,] "schön"   "deutschland" "frei"
 [6,] "gesicht" "hand"        "stehen"
 [7,] "recht"   "tränen"      "himmel"
 [8,] "gott"    "wissen"      "laufen"
 [9,] "tief"    "jung"        "auge"
[10,] "ein"     "wind"        "warten"
poem 50 words "child" "night" "white" "totally" "love"

The night so still, a child's pure delight,
The stars twinkle in the white of night,
A peaceful moment as the world sleeps tight,
Totally in love with the beauty of white.

poem 100 words "to see" "heart" "man" "always" "sun" "light" "come" "hand" "more" "lust"

A man with a heart of gold,
Always looking for more to behold,
The sun and the light, he will come to see,
Lust for life, he will always be free.

In his hand he will carry the sun,
To show the world the light has come,
A new day, a new chance to start,
A fresh life, a fresh heart.

He will bring us joy and hope,
Open our eyes, help us cope,
With love and kindness in his hand,
His heart will always understand.

And when the sun sets in the night,
His heart will be full of light,
Always looking for more to see,
He will always love with glee.

story 200 words with poem 100 words "to see" "heart" "man" "always" "sun" "light" "come" "hand" "more" "lust"

Once upon a time, there was a man with a heart of gold. He was always looking for more to behold, for the sun and the light, he will come to see, and lust for life, he will always be free. In his hand he will carry the sun, to show the world the light has come. A new day, a new chance to start, a fresh life, a fresh heart. He will bring us joy and hope, open our eyes, help us cope. With love and kindness in his hand, his heart will always understand. He will go out of his way to help those in need, and never accept anything in return. He will show us that life is always worth living, and grant us the courage to take on any challenge. And when the sun sets in the night, his heart will be full of light. Always looking for more to see, he will always love with glee. He will be our guiding light, teaching us to never give up the fight. He will be our beacon of hope, showing us that we can cope.
Tears to know, wind's embrace
Sorrowful yet sublime, love's grace.

translate into emoji "Tears to know, wind's embrace Sorrowful yet sublime, love's grace."

😭🤔🌬🤝😔😌💖