“Honesty is the best policy - when there is money in it”, quipped Mark Twain.
Though the abilities of computers have increased exponentially, recognising sarcastic commentary such as “being awake at 4am with a headache is fun” remains a challenge. Unlike humans, who use visual and physical cues such as eye-rolling to detect sarcasm, computers have to rely on text alone. Over the past decade, linguistic studies have accelerated progress in the computational detection of irony.
Researchers at IIT Bombay and Monash University have compiled advancements in this field in their new paper.
One of the paper's main observations is how data can be curated from online sources, specifically Twitter and Amazon reviews. For instance, tweets are often annotated with sarcasm-indicative hashtags such as #sarcasm, #sarcastic and #not, allowing researchers to create labelled datasets. Several salient features, including semantic similarity, readability and sentiment flips, were derived from tweets for classification. Experiments have also been performed in Chinese, Hindi and Indonesian.
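The hashtag-based labelling described above can be illustrated with a minimal Python sketch. It is not the procedure from the paper: the function name `label_tweet` and the choice to strip the indicator hashtags from the text are assumptions made for illustration, and only the three hashtags come from the article.

```python
import re

# Hashtags named in the article as sarcasm indicators.
SARCASM_TAGS = {"#sarcasm", "#sarcastic", "#not"}

# Matches a hashtag: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return (cleaned_text, label) for one tweet.

    label is 1 if the tweet carries a sarcasm-indicative hashtag, else 0.
    This is a hypothetical helper, not an API from any library.
    """
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    label = 1 if tags & SARCASM_TAGS else 0

    # Assumption: drop the indicator hashtags so that a classifier trained
    # on the result cannot simply memorise them. Other hashtags are kept.
    def drop_indicator(match):
        return "" if match.group().lower() in SARCASM_TAGS else match.group()

    cleaned = " ".join(HASHTAG_RE.sub(drop_indicator, text).split())
    return cleaned, label


if __name__ == "__main__":
    for tweet in [
        "Being awake at 4am with a headache is fun #sarcasm",
        "Great game tonight #football",
    ]:
        print(label_tweet(tweet))
```

Labels obtained this way are only as reliable as the authors' own hashtag use, which is one reason annotation quality remains a concern in this field.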
The authors also stress the importance of context in sarcasm evaluation. “I love solving math problems all weekend” may not be sarcastic coming from a student who loves math, but may well be sarcastic coming from many others. Context can also be drawn from an author's historical sentiment, by looking at past tweets, or from the conversation that the sentence was part of.
Asking computers to identify human sentiments such as anger, sadness and joy is commonplace. Companies often spend millions harnessing sentiment analysis to understand customer likes and dislikes. Several political campaigns have also used it successfully to engage better with supporters.
Discovering sarcastic patterns was an early trend in this field. It is postulated that sarcasm arises from a contrast between a positive verb and the negative situation it describes. Sentences such as “Just got off a wonderful 12 hour flight sitting next to a crying baby” contain implicit sentiment phrases, allowing scientists to extract such patterns.
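The contrast idea above lends itself to a toy rule-based check, sketched here in Python. The word lists are tiny, hand-picked assumptions for illustration only, and `has_sentiment_contrast` is a hypothetical name; real systems learn such phrases from large corpora rather than hard-coding them.

```python
import re

# Assumed, hand-picked lexicons for illustration only.
POSITIVE_WORDS = {"love", "wonderful", "fun", "great", "enjoy"}
NEGATIVE_SITUATIONS = [
    "crying baby",
    "headache",
    "awake at 4am",
    "stuck in traffic",
]


def has_sentiment_contrast(sentence):
    """Return True if a positive word co-occurs with a negative situation."""
    text = sentence.lower()
    words = set(re.findall(r"[a-z']+", text))
    has_positive = bool(words & POSITIVE_WORDS)
    has_negative = any(phrase in text for phrase in NEGATIVE_SITUATIONS)
    return has_positive and has_negative


if __name__ == "__main__":
    examples = [
        "Just got off a wonderful 12 hour flight sitting next to a crying baby",
        "I love solving math problems all weekend",
    ]
    for sentence in examples:
        print(sentence, "->", has_sentiment_contrast(sentence))
```

The second example is not flagged because the rule sees no negative situation in it, which shows why such patterns alone cannot settle cases that depend on who is speaking.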
Using approaches ranging from rule-based methods to Support Vector Machines, researchers have reported accuracy values close to 94%.
Despite the annotation and skewed-data issues that plague sarcasm detection, the field has come a long way from its humble origins a decade ago.