I am an ML Scientist on the Siri team at Apple Inc., Seattle. Before this, I was a Research Scientist in the AI Tech department at IBM Research, Bangalore, India. I obtained my Ph.D. in Computer Science and Engineering (CSE) from IIT Bombay under the guidance of Prof. Pushpak Bhattacharyya.
Here is my CV.
PhD in Computer Science and Engineering, 2017
Indian Institute of Technology Bombay
B.Tech in Computer Science and Engineering, 2010
College of Engineering and Technology Bhubaneswar
At Apple Inc., I am part of the Siri team, where I contribute to Siri's multilingual and cross-lingual efforts.
Prior to Apple, I worked at IBM Research, India, on Natural Language Generation (NLG) problems for two and a half years. I participated in projects dealing with data-to-text and text-to-text generation paradigms. In data-to-text, our research aimed at generating natural language descriptions from structured data such as knowledge graphs and tables. In text-to-text, our focus was on problems such as text simplification, text style transfer, and controllable paraphrasing.
My doctoral research comes broadly under the area of Natural Language Processing and relates to Machine Learning, Cognitive Science, Classical Linguistics and Psycholinguistics. The objective was to uncover the cognitive underpinnings of human language processing and translate the insights into better algorithms in Computational Linguistics. I used eye-tracking technology to record and analyze human eye-movement patterns, in order to gain insights into how humans perform Translation, Sentiment Analysis, and Sarcasm Analysis, and how they tackle linguistic subtleties during reading. To know more, please visit the Cognitive NLP website.
Apart from thesis-related work, at IIT Bombay I contributed to various projects related to Machine Translation, Social Media Text Analysis, Crowdsourcing, and Development of Resources and Tools for Indian Language Processing.
Main Conferences and Journals
Sandeep Mathias, Rudra Murthy, Diptesh Kanojia, Abhijit Mishra, Pushpak Bhattacharyya, Happy Are Those Who Grade without Seeing: A Multi-Task Learning Approach to Grade Essays Using Gaze Behaviour. AACL-IJCNLP 2020, worldwide.
Sandeep Mathias, Diptesh Kanojia, Abhijit Mishra and Pushpak Bhattacharyya, A Survey on Using Gaze Behaviour for Natural Language Processing. IJCAI-PRICAI 2020, Yokohama, Japan.
Abhijit Mishra, Tarun Tater, Karthik Sankaranarayanan, A Modular Architecture for Unsupervised Sarcasm Generation, EMNLP 2019, Hong Kong, China, 3rd Nov - 7th Nov, 2019
Anirban Laha, Parag Jain, Abhijit Mishra, Karthik Sankaranarayanan, Scalable Micro-planned Generation of Discourse from Structured Data, Computational Linguistics, MIT Press, 2019 (equal contribution from the first three authors)
Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan, Unsupervised Neural Text Simplification, ACL 2019, Florence, Italy, 28th July-2nd Aug, 2019
Parag Jain, Abhijit Mishra, Amar P. Azad, Karthik Sankaranarayanan, Unsupervised Controllable Text Formalization, AAAI 2019, Hawaii, USA, 27th Jan - 1st Feb, 2019
Sandeep Mathias, Diptesh Kanojia, Kevin Patel, Samarth Agrawal, Abhijit Mishra and Pushpak Bhattacharyya, Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour, ACL 2018, Melbourne, Australia, 15 July-20 July, 2018
Vitobha Munigala, Abhijit Mishra, Srikanth Govindaraj Tamilselvam, Shreya Khare, Riddhiman Dasgupta and Anush Sankaran, PersuaAIDE! An Adaptive Persuasive Text Generation System for Fashion Domain, WWW 2018, Lyon, France, 23rd April - 27th April, 2018
Abhijit Mishra, Srikanth Tamilselvam, Riddhiman Dasgupta, Seema Nagar and Kuntal Dey, Cognition-Cognizant Sentiment Analysis with Multitask Subjectivity Summarization based on Annotators’ Gaze Behavior, AAAI, 2018, New Orleans, USA, 2nd February - 7th February, 2018
Srikanth Tamilselvam, Seema Nagar, Abhijit Mishra and Kuntal Dey, Graph Based Sentiment Aggregation using ConceptNet Ontology, IJCNLP, 2017, Taipei, Taiwan, 27 November-1st December, 2017
Joe Cheri Ross, Abhijit Mishra, Kaustuv Kanti Ganguli, Pushpak Bhattacharyya, Identifying Raga Similarity Through Embeddings Learned from Compositions’ Notation, ISMIR 2017, Suzhou, China, 23-28 October, 2017
Abhijit Mishra, Kuntal Dey, Pushpak Bhattacharyya, Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network, ACL 2017, Vancouver, Canada, 30 July-4 August, 2017
Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya, Scanpath Complexity: Modeling Reading Effort using Gaze Information, AAAI 2017, San Francisco, USA, 4-9 February, 2017
Abhijit Mishra, Diptesh Kanojia, Seema Nagar, Kuntal Dey, Pushpak Bhattacharyya, Harnessing Cognitive Features for Sarcasm Detection, ACL 2016, Berlin, Germany, 7-12 August, 2016
Abhijit Mishra, Diptesh Kanojia, Kuntal Dey, Seema Nagar and Pushpak Bhattacharyya, Leveraging Cognitive Features for Sentiment Analysis, CoNLL 2016, Berlin, Germany, August 11-12, 2016
Abhijit Mishra, Diptesh Kanojia and Pushpak Bhattacharyya, Predicting Readers’ Sarcasm Understandability by Modelling Gaze Behaviour, AAAI 2016, Phoenix, USA, Feb 12-17, 2016
Aditya Joshi, Abhijit Mishra, Balamurali AR, Pushpak Bhattacharyya, Mark J Carman, A computational approach for automatic prediction of drunk-texting, ACL 2015, Beijing, China, July 2015
Aditya Joshi, Abhijit Mishra, Nivvedan Senthamilselvan and Pushpak Bhattacharyya, Measuring Sentiment Annotation Complexity of Text, ACL 2014, Baltimore, USA, 23-25 June, 2014
Anoop Kunchukuttan, Abhijit Mishra, Rajen Chatterjee, Ritesh Shah and Pushpak Bhattacharyya, Shata-Anuvadak: Tackling Multiway Translation of Indian Languages, LREC 2014, Reykjavik, Iceland, 26-31 May, 2014
Abhijit Mishra and Pushpak Bhattacharyya, Automatically Predicting Sentence Translation Difficulty, ACL 2013, Sofia, Bulgaria, 4-9 August, 2013
Workshop and Demo Papers
Abhijit Mishra, Md Faisal Mahbub Chowdhury, Sagar Manohar, Dan Gutfreund, Karthik Sankaranarayanan, Template Controllable keywords-to-text Generation. arXiv preprint arXiv:2011.03722
Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, Story Generation from Sequence of Independent Short Descriptions, ML4Creativity, SIGKDD Workshop, Halifax, Nova Scotia - Canada.
Joe Cheri, Abhijit Mishra and Pushpak Bhattacharyya, Leveraging Annotators’ Gaze Behaviour for Coreference Resolution, Workshop on Cognitive Aspects of Computational Language Learning (CogACLL 2016) at ACL 2016, Berlin, Germany, August 11, 2016.
Diptesh Kanojia, Shehzaad Dhuliawala, Abhijit Mishra, Naman Gupta and Pushpak Bhattacharyya, TransChat: Cross-Lingual Instant Messaging for Indian Languages, ICON 2015, December 2015
Abhijit Mishra, Aditya Joshi and Pushpak Bhattacharyya, A cognitive study of subjectivity extraction in sentiment annotation, 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2014), Baltimore, USA, 27 June, 2014
Anoop Kunchukuttan, Ratish Pudupully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya, The IIT Bombay SMT System for ICON 2014 Tools Contest, ICON 2014, Goa, India, Dec 2014
Piyush Dungarwal, Rajen Chatterjee, Abhijit Mishra, Anoop Kunchukuttan, Ritesh Shah and Pushpak Bhattacharyya, The IIT Bombay Hindi-English Translation System at WMT 2014, 9th Workshop on Statistical Machine Translation (WMT 2014), Baltimore, USA, 26-27 June, 2014
Anoop Kunchukuttan, Rajen Chatterjee, Shourya Roy, Abhijit Mishra and Pushpak Bhattacharyya, TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain, ACL 2013, Sofia, Bulgaria, 4-9 August, 2013
Abhijit Mishra, Michael Carl and Pushpak Bhattacharyya, A Heuristic Based Approach for Systematic Error Correction of Gaze Data for Reading, First Workshop on Eye Tracking and NLP, part of COLING 2012, Mumbai, India, 15 Dec, 2012
1. Multilingual and Cross Lingual Dialog Systems
(Duration: 2017-2020, Status: Completed)
The aim of the research is to make end-to-end dialog systems capable of adapting to newer domains and languages. This builds upon my previous work on Natural Language Generation and on the creation of multilingual resources for Indian languages. Several transfer learning and K-shot learning schemes are also being explored as part of this project.
2. Natural Language Generation (NLG) from Structured Data
(Duration: 2017-2020, Status: Completed)
The research aims at generating natural language descriptions from structured data such as knowledge graphs and tables. Motivated by the need to approach this problem in a manner that is scalable and adaptable to newer domains (unlike existing related systems that are rule/template-based or end-to-end neural systems), we introduce scalable modular approaches that do not require any labelled data for generation. Rather, these systems require only large-scale unlabelled text and basic NLP tools such as Part of Speech taggers. Our initial experiments on a benchmark mixed-domain dataset show that our framework outperforms various existing data-to-text systems. We are currently focusing on the generation of interesting narratives from structured data.
3. Unsupervised Controllable Language Generation
(Duration: 2017-2018, Status: Completed)
As with data-to-text NLG, scalable and interpretable solutions are elusive for text-to-text NLG problems such as text simplification, formalization, and purpose paraphrasing. We aim to devise new unsupervised learning schemes for text-to-text NLG problems. So far we have proposed novel and practical solutions for unsupervised text simplification and controllable text transformation. We have also attempted to provide solutions for sarcasm generation, a highly nuanced task that requires language understanding at deep semantic and pragmatic levels.
4. Cognitive NLP through Eye-tracking
(Duration: 2013-2017, Status: Completed, Remark: Ph.D. Thesis)
The research attempts to gain insights into the cognitive underpinnings of human language processing and understanding. The insights are then translated to methods and models that contribute to the field of NLP by achieving the following objectives: (1) Optimizing Human Annotation Effort for better annotation management for NLP, and (2) Improving existing NLP systems by introducing cognitive features.
Today’s NLP is highly statistical in nature and needs massive amounts of human-annotated data. In our setting, apart from collecting the annotations, we aim to record annotators’ activities in the form of their eye-movement patterns, keystrokes, and neuro-electric signals obtained using EEG. Through a series of studies using eye-tracking alone, we show that data of this kind can be used to model the complexity of tasks like translation and sentiment annotation, where eye-movement data is used to label the training data for models of annotation effort. This can be useful for better annotation management (for example, proposing better annotation cost models). We also show that eye-movement data can be used to extract cognition-driven features for difficult NLP tasks like Sentiment Analysis and Sarcasm Detection. Our proposed approaches consistently perform better than state-of-the-art sentiment and sarcasm classifiers, showing that cognitive features can be useful for tasks that are nuanced by linguistic subtleties. For more information, visit the Cognitive NLP website.
5. Indian Language Machine Translation
(Duration: 2013-2014, Status: Completed)
We developed a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to the Indo-Aryan and Dravidian families. We analyzed the relationship between translation accuracy and the language families involved. We feel that insights obtained from this analysis will provide guidelines for creating machine translation systems for specific Indian language pairs. For our studies, we built phrase-based systems and some extensions. Across multiple languages, we showed improvements over the baseline phrase-based systems using these extensions: (1) source-side reordering for English-Indian language translation, and (2) transliteration of untranslated words for Indian language-Indian language translation. These enhancements harness shared characteristics of Indian languages. To stimulate similar innovation widely in the NLP community, we have made the trained models for these language pairs publicly available. The system is available at: http://www.cfilt.iitb.ac.in/indic-translator
6. Crowdsourcing for NLP Resources
(Duration: 2011-2013, Status: Completed)
We developed a framework (funded by Xerox Research Center, India) that helps an NLP developer customize and launch linguistic annotation tasks through popular crowdsourcing service providers (like Amazon’s Mechanical Turk). Though the framework is generic and flexible enough to tackle different linguistic tasks, we took Machine Translation as our use case to demonstrate its efficacy. We show that, using this framework, multilingual parallel translation corpora can be collected and quality-controlled at lower cost than the traditional way of outsourcing translation tasks to professional translators.
[Presented at ACL 2019]: Storytelling from Structured Data and Knowledge Graphs: An NLG Perspective. Tutorial materials can be found here.
[18 Jan, 2019]: Tutorial on “Natural Language Generation and its Applications”, Indian Institute of Science, Bangalore, India
[28 July, 2018]: Invited talk titled “Understanding how machines understand us: A perspective on Natural Language Processing” at the FAER Faculty Development Workshop on AI and ML, M.S. Ramaiah University, Bangalore, India
[26 Apr, 2018]: Tutorial on “Cognitively Inspired Natural Language Understanding and Generation”, Dharmsinh Desai University, Gujarat, India
[2 Feb, 2018]: Talk on paper “Cognition-Cognizant Sentiment Analysis with Multitask Subjectivity Summarization based on Annotators’ Gaze Behavior”, AAAI 2018 Conference, New Orleans, USA
[20 Jan, 2018]: Tutorial on “Natural Language Generation”, Indian Institute of Science, Bangalore, India
[31 Jul, 2017]: Talk on paper “Learning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network”, ACL 2017 Conference, Vancouver, Canada
[08 Aug, 2016]: Presentation on “Harnessing Cognitive Features for Sarcasm Detection”, ACL 2016 Conference, Berlin, Germany
[20 June 2016]: Tutorial on “Natural Language Processing and Machine Learning”, VIVA Institute of Technology, Mumbai, India
[21 Jan, 2015]: Talk on “Cognitive NLP”, IIT Bombay (Target Audience: Visiting Team, NIST, USA)
[10 Jul, 2014 - 18 Jul, 2014]: Tutorial on “Natural Language Processing”, Samsung Research Lab, Bangalore
[28 Nov, 2013]: Talk on “Eye Tracking Applications in Translation Process Research”, JSS Academy of Science, Noida, India
[10 Aug, 2013]: Talk on “Estimation of Text Translation Complexity”, Copenhagen Business School, Copenhagen, Denmark
1. Cognitive NLP:
Eye-tracking datasets for various NLP and psycholinguistic tasks, namely Sentiment Analysis, Sarcasm Detection, Coreference Resolution, Text Quality Assessment, and Text Readability Assessment, can be downloaded from this website (go to “Resources”).
2. Natural Language Generation:
Code and dataset for “Sarcasm Generation” can be found here.
Code, dataset and resources for “Unsupervised Neural Text Simplification” can be found here.
Code, dataset and resources for “Natural Language Description Generation from Tables and Graphs” can be found here.
Code, dataset and resources for “Unsupervised Controllable Text Formalization” can be found here.