Much of what I write about in this blog concerns language technology and machine translation, with a primary focus on the technology and AI initiatives related to human language translation. That focus will remain, but I recently came upon something that I felt was worth mentioning, especially in this holiday season, when many of us review, consider, and express gratitude for the plenitude in our lives.
Language is a quintessentially human experience: it is the medium through which we share, discover, learn, and express the many different facets of our lives. This is probably why computers are unlikely to ever unravel it fully. There is too much amorphous but critical context about life, living, learning, and the world around most words to easily capture in training data and give to a computer to learn.
While many of us surmise that language is only about words and how they can be strung together to share, express, and understand the world around us, in most cases there is much that is unspoken or not directly referenced that also needs to be considered to understand any set of words accurately and faithfully. Sometimes the feeling and emotion are enough and words are not needed.
In 2021 Large Language Models (LLMs) were a big deal and GPT-3, in particular, was all over the news as a symbol of breakthrough AI that, to some, suggests that a sentient machine is close at hand. That is, until you look more closely and see that much of what LLMs produce is crude pattern reflection that is completely devoid of understanding, comprehension, or cognition in any meaningful sense. The initial enthusiasm for GPT-3 has been followed by increasing concern as people have realized how these systems are prone to producing unpredictable obscenity, prejudiced remarks, misinformation, and so forth. The toxicity and bias inherent in these systems will not be easily overcome without strategies that involve more than more data and more compute.
It is very likely that we will see these ever-larger LLMs go through the same cycles of over-promising and under-delivering that machine translation has gone through for over 70 years now.
The problem is the same: the words used to train AI do not, on their own, contain everything needed to establish understanding, comprehension, and cognition. And IMO simply training a deep learning algorithm with many more trillions of words will not somehow create understanding and cognition or even common sense.
The inability of AI "to understand" was clearly shown by Amazon Alexa recently when it told a child to essentially electrocute herself. "No current AI is remotely close to understanding the everyday physical or psychological world, what we have now is an approximation to intelligence, not the real thing, and as such it will never really be trustworthy," said Gary Marcus in response to this incident. GPT-3 has also advised suicidal humans to kill themselves in experiments conducted elsewhere.
The machine is not malicious; it simply has no real understanding of the world and life, and lacks common sense.
The truth is that we are forced to learn to query and instruct Alexa, Siri, and Google Voice so that they can do simple but useful tasks for us. This is "AI" where the human in the loop keeps it basically functional and useful. Expecting any real understanding and comprehension from these systems without many explicit and repeated clarifications is simply not possible in 2021.
But I digress. What I wanted to talk about are the areas where humans move beyond language (as in word-based language) yet still communicate, share, and express quintessential humanness in the process.
It is my feeling that entering this space happens most often with music, especially improvised music where there is some uncertainty or unpredictability about the outcome: what happens, happens, often without a plan, yet still within a clear artistic framework and structural outline. I happen to play the sitar, focusing on the classical music of North India, where the "Raga" is the basic blueprint that provides the needed foundations for highly disciplined improvisatory exploration.
To a great extent, what these musicians do is "shape the air" and create something equivalent to sonic sculptures. These sculptures can be pleasing or relaxing in many ways that only humans can understand, and sometimes can be very moving, which means they can trigger emotional release (tears) or establish a deeply emotional presence (left speechless). Often it is not necessary to understand the actual language used in a musical performance, since there is still a common layer of feeling, emotion, and yearning that all humans can connect with and tap into.
The key difference between this improvisation-heavy approach and a performance of score-based music is that neither the musician nor the audience really knows at the outset how things will turn out. With a score, there is a known and well-defined musical product that both the audience and the musician are aware of and expect. There is more of an elemental structure. However, here too it is possible for an attendee to listen to a performance in an unfamiliar language, e.g. an operatic aria in Italian, and be deeply moved, even though that listener speaks no Italian and may have no knowledge of the operatic drama. The connection is made at a feeling and emotional level, not at the word, language, or idea cognition level.
I came upon this musical performance of a Sufi (a mystical Muslim tradition) song sung by two musical legends on a commercial platform called Coke Studio Pakistan. Musically, this might be considered "fusion," but it is heavily influenced by Indian classical music and it is sung in Urdu (Braj), which is so close to Hindi (Hindustani) that they are virtually the same language, except that Urdu uses much more Persian vocabulary. The original poem was written in Braj Bhasha, an antecedent of both Urdu and modern-day Hindi.
This particular performance was a rehearsal and was the first time all the musicians were in the same room, but the producers decided it was not possible to improve on this and published it, as is, since it was quite magical and probably impossible to reproduce.
There are almost 20,000 comments on the video shown below, and this comment by Matt Dinopoulos typifies much of the feedback: "It hit my soul on so many levels and just brought me to tears and I don’t even know what they’re saying." The figures of speech “pluck at one’s heartstrings” and “strikes a chord in me” have found a home in our language for just this reason.
This song essentially expresses Amir Khusrau's gratitude, devotion, love, and longing for communion with his Pir (Guru/Spiritual teacher) whose name is Nizam (Nizamuddin Auliya). Sung from the perspective of a young girl awaiting or yearning for her beloved, it is replete with modest yet enchanting symbols, as it celebrates the splendor of losing oneself in love. Both the motifs and the language itself were deliberate creative choices by Khusrau, made to communicate with common people using familiar ideas and aesthetics.
A closer examination of Khusrau's works will reveal that the Beloved in his songs/poems is always the Divine or the Pir. Many poets in India use the perspective of the romantic yearnings of a young maiden for the beloved as an analogy, as the relationship with the Divine is seen as the most intense kind of love. The longing and union they speak of are always about direct contact with the Sacred and so this song should be considered a spiritual lament whose essential intention is to express spiritual love and gratitude. The translations shown in the video are sporadic but still useful.
At the time of publishing, the video above had already had 40 million views. Many thanks to the eminent Raymond Doctor for providing this link, which offers a full translation and useful background to better understand the thematic influences and artistic inspiration for this song.
“As long as a spiritual artist respects his craft, peace will prevail. It is wonderful when a singer has a noble cause and spreads the message of love, peace, and brotherhood as presented by our saints, without greed of money or the world. This is the real purpose of qawwali.” Rahat Fateh Ali Khan
“Music doesn’t have a language, it’s about the feeling. You have to put a lot of soul into whatever you are making. Music doesn’t work if you’re only doing it for money or professionally. It works only if it’s from the soul. There’s no price to it.” Aima Baig
“Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth. Truth is not beauty. Beauty is not love. Love is not music. Music is THE BEST.”
This unexpected, often surprising, emotion-heavy reaction is entirely and uniquely human. This kind of listener impact cannot come from musical virtuosity alone, which is abundantly present here; the musicians are also tapping into a deeper substratum of feeling and emotion that exists only in, and is shared by, humans.
This is the human space beyond language where understanding happens in spite of initial unfamiliarity. There is something in the human psyche that understands and connects to this even if by accidental discovery, and this initial response often leads to a more substantial connection. We could call this learning perhaps, and this is probably how children also gather knowledge about the world. Intensity and connection probably have a more profound impact on students than pedagogy and quite possibly drive intense learning activity in any sphere.
It is interesting that there are many reaction videos on YouTube where music teachers and YT celebrities from around the world share their first reactions to this particular song and other culturally unfamiliar music. Based on the number of these reaction videos, I guess more and more people are exploring and want to share in the larger human musical experience. Some examples:
- Latina Ceci Dover left speechless (around 4' 50")
- British rapper reacts in shock and awe (around 6' 05")
- A deep and informed analysis of the singing technique and mechanics. "If her voice was an animal it would be an eagle."
- Seda Nur, who is Turkish German, was surprised by the emotional connection (around 3' 15"). It also led her to actually visit Pakistan last week, a trip which she is also sharing in her vlogs.
- John Cameron left speechless and in tears (around 11' 30")
- Waleska & Efra discover a new musical paradigm (around 10' 40")
- Asian dude is blown away (~2' 09"): "Oh My Godness," he says at 4' 0", and dances to the chorus like a bird (4' 25"). Hilarious responses throughout the song.
FUTURA VECCHIA, NEW YEAR’S EVE
by Rebecca Elson
Returning, like the Earth
To the same point in space,
We go softly to the comfort of destruction,
And consume in flames
A school of fish,
A pair of hens,
A mountain poplar with its moss.
A shiver of sparks sweeps round
The dark shoulder of the Earth,
Frisson of recognition,
Preparation for another voyage,
And our own gentle bubbles
Float curious and mute
Towards the black lake
Boiling with light,
Towards the sharp night
Whistling with sound.