Research

At IBM Research, I have been working on Natural Language Generation (NLG) problems for the last two and a half years. I have participated in projects on both the data-to-text and text-to-text generation paradigms. In data-to-text, our research aims to generate natural language descriptions from structured data such as knowledge graphs and tables. In text-to-text, our focus has been on problems such as text simplification, text style transfer and controllable paraphrasing.

My doctoral research falls broadly under Natural Language Processing and relates to Machine Learning, Cognitive Science, Classical Linguistics and Psycholinguistics. The objective was to uncover the cognitive underpinnings of human language processing and to translate those insights into better algorithms for Computational Linguistics. I used eye-tracking technology to record and analyze human eye-movement patterns, gaining insights into how humans perform translation, sentiment analysis and sarcasm analysis, and how they tackle linguistic subtleties during reading. To learn more, please visit the Cognitive NLP website.

Apart from my thesis work at IIT Bombay, I contributed to various projects on Machine Translation, Social Media Text Analysis, Crowdsourcing, and the development of resources and tools for Indian language processing.

Publications

Book

  1. Abhijit Mishra and Pushpak Bhattacharyya, Cognitively Inspired Natural Language Processing: An Investigation Based on Eye Tracking, Cognitive Intelligence and Robotics Series, Springer Nature Singapore, ISBN: 978-981-13-1515-2, 2018.

Significant

  1. Abhijit Mishra, Tarun Tater, Karthik Sankaranarayanan, A Modular Architecture for Unsupervised Sarcasm Generation, EMNLP 2019, Hong Kong, China, 3rd Nov - 7th Nov, 2019

  2. Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan, Scalable Micro-planned Generation of Discourse from Structured Data, Computational Linguistics, MIT Press, 2019 (equal contribution from the first three authors)

  3. Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan, Unsupervised Neural Text Simplification, ACL 2019, Florence, Italy, 28th July-2nd Aug, 2019

  4. Parag Jain, Abhijit Mishra, Amar P. Azad, Karthik Sankaranarayanan, Unsupervised Controllable Text Formalization, AAAI 2019, Hawaii, USA, 27th Jan - 1st Feb, 2019

  5. Sandeep Mathias, Diptesh Kanojia, Kevin Patel, Samarth Agrawal, Abhijit Mishra and Pushpak Bhattacharyya, Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour, ACL 2018, Melbourne, Australia, 15 July-20 July, 2018

  6. Vitobha Munigala, Abhijit Mishra, Srikanth Govindaraj Tamilselvam, Shreya Khare, Riddhiman Dasgupta and Anush Sankaran, PersuAIDE! An Adaptive Persuasive Text Generation System for Fashion Domain, WWW 2018, Lyon, France, 23rd April - 27th April, 2018

  7. Abhijit Mishra, Srikanth Tamilselvam, Riddhiman Dasgupta, Seema Nagar and Kuntal Dey, Cognition-Cognizant Sentiment Analysis with Multitask Subjectivity Summarization based on Annotators’ Gaze Behavior, AAAI 2018, New Orleans, USA, 2nd February - 7th February, 2018

  8. Srikanth Tamilselvam, Seema Nagar, Abhijit Mishra and Kuntal Dey, Graph Based Sentiment Aggregation using ConceptNet Ontology, IJCNLP 2017, Taipei, Taiwan, 27 November-1st December, 2017

  9. Joe Cheri Ross, Abhijit Mishra, Kaustuv Kanti Ganguli, Pushpak Bhattacharyya, Identifying Raga Similarity Through Embeddings Learned from Compositions’ Notation, ISMIR 2017, Suzhou, China, 23-28 October, 2017

  10. Abhijit Mishra, Kuntal Dey, Pushpak Bhattacharyya, Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network, ACL 2017, Vancouver, Canada, 30 July-4 August, 2017

  11. Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya, Scanpath Complexity: Modeling Reading Effort using Gaze Information, AAAI 2017, San Francisco, USA, 4-9 February, 2017

  12. Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya, Harnessing Cognitive Features for Sarcasm Detection, ACL 2016, Berlin, Germany, 7-12 August, 2016

  13. Abhijit Mishra, Diptesh Kanojia, Kuntal Dey, Seema Nagar and Pushpak Bhattacharyya, Leveraging Cognitive Features for Sentiment Analysis, CoNLL 2016, Berlin, Germany, August 11-12, 2016

  14. Abhijit Mishra, Diptesh Kanojia and Pushpak Bhattacharyya, Predicting Readers’ Sarcasm Understandability by Modelling Gaze Behaviour, AAAI 2016, Phoenix, USA, Feb 12-17, 2016

  15. Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya, Mark J Carman, A computational approach for automatic prediction of drunk-texting, ACL 2015, Beijing, China, July 2015

  16. Aditya Joshi, Abhijit Mishra, Nivvedan Senthamilselvan and Pushpak Bhattacharyya, Measuring Sentiment Annotation Complexity of Text, ACL 2014, Baltimore, USA, 23-25 June, 2014

  17. Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah and Pushpak Bhattacharyya, Shata-Anuvadak: Tackling Multiway Translation of Indian Languages, LREC 2014, Reykjavik, Iceland, 26-31 May, 2014

  18. Abhijit Mishra and Pushpak Bhattacharyya, Automatically Predicting Sentence Translation Difficulty, ACL 2013, Sofia, Bulgaria, 4-9 August, 2013

Workshop and Demo Papers

  1. Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, Story Generation from Sequence of Independent Short Descriptions, ML4Creativity, SIGKDD Workshop, Halifax, Nova Scotia, Canada.

  2. Joe Cheri, Abhijit Mishra and Pushpak Bhattacharyya, Leveraging Annotators’ Gaze Behaviour for Coreference Resolution, Workshop on Cognitive Aspects of Computational Language Learning (CogACLL 2016) at ACL 2016, Berlin, Germany, August 11, 2016.

  3. Diptesh Kanojia, Shehzaad Dhuliawala, Abhijit Mishra, Naman Gupta and Pushpak Bhattacharyya, TransChat: Cross-Lingual Instant Messaging for Indian Languages, ICON 2015, December 2015

  4. Abhijit Mishra, Aditya Joshi and Pushpak Bhattacharyya, A cognitive study of subjectivity extraction in sentiment annotation, 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2014), Baltimore, USA, 27 June, 2014

  5. Anoop Kunchukuttan, Ratish Pudupully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya, The IIT Bombay SMT System for ICON 2014 Tools Contest, ICON 2014, Goa, India, Dec 2014

  6. Piyush Dungarwal, Rajen Chatterjee, Abhijit Mishra, Anoop Kunchukuttan, Ritesh Shah and Pushpak Bhattacharyya, The IIT Bombay Hindi-English Translation System at WMT 2014, 9th Workshop on Statistical Machine Translation (WMT 2014), Baltimore, USA, 26-27 June, 2014

  7. Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra and Pushpak Bhattacharyya, TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain, ACL 2013, Sofia, Bulgaria, 4-9 August, 2013

  8. Abhijit Mishra, Michael Carl and Pushpak Bhattacharyya, A Heuristic Based Approach for Systematic Error Correction of Gaze Data for Reading, First Workshop on Eye Tracking and NLP, part of COLING 2012, Mumbai, India, 15 Dec, 2012

Projects

Significant

1. Natural Language Generation (NLG) from Structured Data

(Duration: 2017-present, Status: Ongoing)

This research aims to generate natural language descriptions from structured data such as knowledge graphs and tables. Existing systems are either rule/template-based or end-to-end neural, and neither scales or adapts easily to new domains. Motivated by this, we introduce scalable, modular approaches that do not require any labelled data for generation; instead, they need only large-scale unlabelled text and basic NLP tools such as part-of-speech taggers. Our initial experiments on a benchmark mixed-domain dataset show that our framework outperforms various existing data-to-text systems. We are currently focusing on generating interesting narratives from structured data.

2. Unsupervised Controllable Language Generation

(Duration: 2017-present, Status: Ongoing)

As with data-to-text NLG, scalable and interpretable solutions remain elusive for text-to-text NLG problems such as text simplification, formalization and controllable paraphrasing. We aim to devise new unsupervised learning schemes for these problems. So far, we have proposed practical solutions for unsupervised text simplification and controllable text transformation. We have also addressed sarcasm generation, a highly nuanced task that requires language understanding at deep semantic and pragmatic levels.

3. Cognitive NLP through Eye-tracking

(Duration: 2013-2017, Status: Completed, Remark: PhD thesis)

This research seeks insights into the cognitive underpinnings of human language processing and understanding. These insights are then translated into methods and models that contribute to NLP in two ways: (1) optimizing human annotation effort, for better annotation management in NLP, and (2) improving existing NLP systems by introducing cognitive features.

Today’s NLP is highly statistical in nature and needs massive amounts of human-annotated data. In our setting, apart from collecting the annotations, we aim to record annotators’ activities in the form of their eye-movement patterns, keystrokes and neuro-electric signals obtained using EEG. Through a series of studies using eye-tracking alone, we show that such data can be used to model the complexity of tasks like translation and sentiment annotation: eye-movement data is used to label training data for models of annotation effort on these tasks. This can be useful for better annotation management (for example, for proposing better annotation cost models). We also show that eye-movement data can be used to extract cognition-driven features for difficult NLP tasks such as sentiment analysis and sarcasm detection. Our proposed approaches consistently perform better than state-of-the-art sentiment and sarcasm classifiers, showing that cognitive features can be useful for tasks that hinge on linguistic subtleties. For more information, visit the Cognitive NLP website.

4. Indian Language Machine Translation

(Duration: 2013-2014, Status: Completed)

We developed a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to the Indo-Aryan and Dravidian families, and analyzed the relationship between translation accuracy and the language families involved. We expect the insights from this analysis to provide guidelines for creating machine translation systems for specific Indian language pairs. For our studies, we built phrase-based systems along with some extensions. Across multiple languages, these extensions improve over the baseline phrase-based systems: (1) source-side reordering for English-to-Indian-language translation, and (2) transliteration of untranslated words for Indian-language-to-Indian-language translation. Both enhancements exploit characteristics shared by Indian languages. To stimulate similar innovation across the NLP community, we have made the trained models for these language pairs publicly available. The system is available at: http://www.cfilt.iitb.ac.in/indic-translator
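The figure of 110 systems follows from the 11 languages if, as an assumption not stated above, there is one system per translation direction between every ordered pair of distinct languages:

```latex
N = n(n-1) = 11 \times 10 = 110
```

Counting unordered pairs instead would give 11 × 10 / 2 = 55, so under this assumption each pair of languages contributes two systems, one per direction.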

5. Crowdsourcing for NLP Resources

(Duration: 2011-2013, Status: Completed)

We developed a framework (funded by Xerox Research Center, India) that helps an NLP developer customize and float linguistic annotation tasks through popular crowdsourcing service providers such as Amazon’s Mechanical Turk. Although the framework is generic and flexible enough to handle different linguistic tasks, we took Machine Translation as our use case to demonstrate its efficacy. We show that, using this framework, multilingual parallel corpora can be collected and quality-controlled at lower cost than with the traditional approach of outsourcing translation tasks to professional translators.

Tutorials and Invited Talks

[To be presented at ACL 2019]: Tutorial on “Storytelling from Structured Data and Knowledge Graphs: An NLG Perspective”

[18 Jan, 2019]: Tutorial on “Natural Language Generation and its Applications”, Indian Institute of Science, Bangalore, India

[28 Jul, 2018]: Invited talk titled “Understanding how machines understand us: A perspective on Natural Language Processing”, FAER Faculty Development Workshop on AI and ML, M.S. Ramaiah University, Bangalore, India

[26 Apr, 2018]: Tutorial on “Cognitively Inspired Natural Language Understanding and Generation”, Dharmsinh Desai University, Gujarat, India

[2 Feb, 2018]: Talk on paper “Cognition-Cognizant Sentiment Analysis with Multitask Subjectivity Summarization based on Annotators’ Gaze Behavior”, AAAI 2018 Conference, New Orleans, USA

[20 Jan, 2018]: Tutorial on “Natural Language Generation”, Indian Institute of Science, Bangalore, India

[31 Jul, 2017]: Talk on paper “Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network”, ACL 2017 Conference, Vancouver, Canada

[08 Aug, 2016]: Talk on paper “Harnessing Cognitive Features for Sarcasm Detection”, ACL 2016 Conference, Berlin, Germany

[20 Jun, 2016]: Tutorial on “Natural Language Processing and Machine Learning”, VIVA Institute of Technology, Mumbai, India

[21 Jan, 2015]: Talk on “Cognitive NLP”, IIT Bombay (Target Audience: Visiting Team, NIST, USA)

[10 Jul, 2014 - 18 Jul, 2014]: Tutorial on “Natural Language Processing”, Samsung Research Lab, Bangalore

[28 Nov, 2013]: Talk on “Eye Tracking Applications in Translation Process Research”, JSS Academy of Science, Noida, India

[10 Aug, 2013]: Talk on “Estimation of Text Translation Complexity”, Copenhagen Business School, Copenhagen, Denmark

Dataset and Resources

1. Cognitive NLP:

Eye-tracking datasets for various NLP and psycholinguistic tasks, viz. Sentiment Analysis, Sarcasm Detection, Coreference Resolution, Text Quality Assessment and Text Readability Assessment, can be downloaded from the Cognitive NLP website (go to “Resources”).

2. Natural Language Generation:

Code and dataset for “Sarcasm Generation” can be found here.

Code, dataset and resources for “Unsupervised Neural Text Simplification” can be found here.

Code, dataset and resources for “Natural Language Description Generation from Tables and Graphs” can be found here.

Code, dataset and resources for “Unsupervised Controllable Text Formalization” can be found here.

Contact

  • abhijitmishra[dot]530[at]emailofgoogle[dot]com
  • +91-1000010001001110101110000101011101 (Spambots' bliss!!!)
  • 8th Floor, Block-G2, Manyata Embassy Business Park, Nagawara, Bangalore-560045, India.
  • Skype Me