Subhalingam is interested in the broad areas of Natural Language Processing (NLP), Information Retrieval (IR) and Deep Learning. He currently works as a Data Scientist at KnowDis Data Science and holds a B.Tech. degree in Mathematics and Computing from the Indian Institute of Technology, Delhi (IIT Delhi). He has worked on building neural Q&A, machine translation and recommender systems across a variety of domains. He is specifically interested in applying NLP techniques to Indian languages. Apart from coding, one can also find him listening to music, watching football, teaching other people, having some plates of biryani or sleeping.
CGPA: 8.196
Marks: 96.4%
CGPA: 10
Received a Pre-Placement Offer for impeccable performance during the internship
Received a Letter of Recommendation from the CEO for exemplary work accomplishments
Regularly assisted newcomers in improving their English communication skills
Over 120 hours of community work, primarily in Teaching projects
Part of the Web Frontend Development team
We propose a novel model that encodes words with pre-trained word embeddings and uses sentiment scores as weights to mark the importance of each word. It computes a weighted sum to obtain the tweet representation and aggregates these into the user representation, which is fed to an ML classifier. Our model achieves an accuracy of 76% on the test set and outperforms the best model in the competition.
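A minimal sketch of this weighted-aggregation idea, assuming pre-computed word vectors and per-word sentiment scores; all names, dimensions and the fallback weight below are illustrative, not the exact implementation:

```python
# Sketch: word vectors scaled by per-word sentiment scores, summed into a
# tweet vector; tweet vectors averaged into a user vector for a classifier.
import numpy as np

EMB_DIM = 300  # assumed embedding size

def tweet_vector(tokens, embeddings, sentiment_scores):
    """Weighted sum of word embeddings, weights = sentiment scores."""
    vec = np.zeros(EMB_DIM)
    for tok in tokens:
        if tok in embeddings:
            vec += sentiment_scores.get(tok, 1.0) * embeddings[tok]
    return vec

def user_vector(tweets, embeddings, sentiment_scores):
    """Aggregate tweet vectors (mean here) into one user representation."""
    vecs = [tweet_vector(t, embeddings, sentiment_scores) for t in tweets]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

# X = np.stack([user_vector(u, emb, senti) for u in users])
# then fit any ML classifier, e.g. scikit-learn's LogisticRegression, on (X, y)
```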
Fine-tuned XLM-RoBERTa for multilingual question answering using the chaii-1 dataset augmented with MLQA, XQuAD & SQuAD and attained a test Jaccard score of 68.72%.
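A hedged sketch of extractive QA inference with an XLM-RoBERTa model via the Hugging Face pipeline API; the checkpoint name below is an assumed public model, not the project's own fine-tuned one, and the question/context pair is a toy example:

```python
# Hedged sketch: extractive QA inference with an XLM-RoBERTa checkpoint.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

result = qa(
    question="Where is IIT Delhi located?",
    context="The Indian Institute of Technology Delhi is located in Hauz Khas, New Delhi.",
)
print(result["answer"], round(result["score"], 3))
```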
Compared non-contextual and contextual embeddings (GloVe+BiLSTM vs BERT) on the WiC dataset for the word sense disambiguation (WSD) task.
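A small sketch of why contextual embeddings matter for WiC: with BERT, the same surface word gets different vectors in different sentences, whereas a single static GloVe vector cannot distinguish senses. The model name and the subtoken matching below are simplified assumptions:

```python
# Sketch: contextual target-word vectors differ across contexts, unlike GloVe.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def target_vec(sentence, target):
    """Mean of the contextual vectors of the target word's subtokens."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    pieces = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = [i for i, p in enumerate(pieces) if p.strip("#") == target]
    return hidden[idx].mean(dim=0)

v1 = target_vec("he sat on the bank of the river", "bank")
v2 = target_vec("she deposited cash at the bank", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # < 1.0, unlike GloVe
```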
Processed tweets with tweet normalization, an internet-slang dictionary, stemming, etc.; vectorized them with TF-IDF and fed them into logistic regression (LR).
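A minimal sketch of the normalization + TF-IDF + LR flow using scikit-learn; the slang dictionary and data below are toy placeholders, not the project's resources:

```python
# Minimal sketch of the preprocessing -> TF-IDF -> logistic regression pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

SLANG = {"u": "you", "gr8": "great"}  # assumed toy slang dictionary

def normalize(tweet):
    return " ".join(SLANG.get(w, w) for w in tweet.lower().split())

texts = ["u r gr8", "this is awful", "gr8 stuff", "really bad day"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=normalize, ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)
print(model.predict(["u had a gr8 time"]))
```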
Built a regex-based system that accounts for chunks with abbreviations, dates, numerical quantities and inflections. Obtained a test F1-score of 97.94%.
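A hedged sketch of regex-based chunking that keeps dates, unit-bearing quantities and dotted abbreviations together as single tokens; the patterns are illustrative, not the project's exact rules:

```python
# Illustrative regex patterns for keeping special chunks intact.
import re

TOKEN_RE = re.compile(r"""
    \d{1,2}[/-]\d{1,2}[/-]\d{2,4}        # dates such as 15-10-1931
  | \d+(?:\.\d+)?\s?(?:%|kg|km|cm)       # numeric quantities with units
  | (?:[A-Za-z]\.){2,}                   # dotted abbreviations like A.P.J.
  | \w+(?:'\w+)?                         # words, with simple inflections ('s)
  | [^\w\s]                              # standalone punctuation
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("A.P.J. Abdul Kalam was born on 15-10-1931 and weighed 3.2 kg."))
```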
Reviewed state-of-the-art bankruptcy prediction models and observed poor recall; hypothesized class imbalance & missing values to be the causes. Trained an ensemble model with Mean Imputation & SMOTE on the Polish companies dataset and gained a 10% improvement in recall.
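A sketch of the imputation + oversampling + ensemble recipe with scikit-learn and imbalanced-learn; the particular ensemble (a random forest here) and the hyperparameters are assumptions:

```python
# Sketch: mean imputation -> SMOTE oversampling -> ensemble classifier.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),      # fill missing values
    ("smote", SMOTE(random_state=0)),                # oversample minority class
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
# model.fit(X_train, y_train)
# recall_score(y_test, model.predict(X_test))  # the metric that improved
```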
Trained a Takagi–Sugeno type neuro-fuzzy model in TensorFlow for diabetes prediction and obtained an accuracy of 81.3%.
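A hedged sketch of a first-order Takagi-Sugeno layer in TensorFlow: Gaussian memberships give rule firing strengths, which weight linear rule consequents. The rule count and layer design are assumptions, not the project's exact architecture:

```python
# Hedged sketch: Gaussian memberships -> firing strengths -> normalized
# weights -> linear rule consequents -> weighted sum (first-order TSK).
import tensorflow as tf

class TSKLayer(tf.keras.layers.Layer):
    def __init__(self, n_rules=8, **kwargs):
        super().__init__(**kwargs)
        self.n_rules = n_rules

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.centers = self.add_weight(name="centers", shape=(self.n_rules, d),
                                       initializer="random_normal")
        self.widths = self.add_weight(name="widths", shape=(self.n_rules, d),
                                      initializer="ones")
        self.a = self.add_weight(name="a", shape=(self.n_rules, d),
                                 initializer="zeros")
        self.b = self.add_weight(name="b", shape=(self.n_rules,),
                                 initializer="zeros")

    def call(self, x):
        diff = tf.expand_dims(x, 1) - self.centers           # (batch, rules, d)
        mu = tf.exp(-tf.square(diff) / (2.0 * tf.square(self.widths) + 1e-6))
        w = tf.reduce_prod(mu, axis=-1)                       # firing strengths
        w = w / (tf.reduce_sum(w, axis=-1, keepdims=True) + 1e-6)
        rule_out = tf.reduce_sum(self.a * tf.expand_dims(x, 1), -1) + self.b
        return tf.reduce_sum(w * rule_out, axis=-1, keepdims=True)

# model = tf.keras.Sequential([TSKLayer(8), tf.keras.layers.Activation("sigmoid")])
# model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```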
Used probabilistic query expansion and relevance-model-based language modeling with unigram/bigram settings & Dirichlet smoothing to rerank retrieved documents and improve the system's MRR and nDCG scores.
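A minimal sketch of Dirichlet-smoothed query-likelihood scoring, the unigram language-modeling piece used for reranking; the smoothing parameter mu and the counts passed in are illustrative:

```python
# Sketch: log P(query | doc) with a Dirichlet-smoothed unigram language model.
import math
from collections import Counter

def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000):
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_coll = collection_tf.get(t, 0) / collection_len    # collection LM
        p = (doc_tf[t] + mu * p_coll) / (doc_len + mu)        # smoothed doc LM
        if p > 0:
            score += math.log(p)
    return score

# Rerank: sorted(docs, key=lambda d: dirichlet_score(q, d, ctf, clen), reverse=True)
```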
Implemented an end-to-end retrieval system indexed with TF-IDF weights & cosine similarity-based ranking. Added prefix searching and named entity-based searching (using StanfordNER) to narrow down the retrieval results. Compressed the index file by encoding differences between document IDs & reduced its size by half (topped the class leaderboard for index size).
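A minimal sketch of the gap-encoding idea behind that compression: store the first document ID and then the differences between consecutive IDs, which are smaller numbers and compress better:

```python
# Sketch: gap (delta) encoding of a sorted postings list of document IDs.
def encode_gaps(doc_ids):
    """[3, 7, 12, 40] -> [3, 4, 5, 28]"""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def decode_gaps(gaps):
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out

assert decode_gaps(encode_gaps([3, 7, 12, 40])) == [3, 7, 12, 40]
```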
Revamped the website using CSS & JavaScript for a better user experience and easier access to and retrieval of information.
Built a generic Graph data structure to store the triangles, points & edges of a given triangulation topology of 3D shapes. Implemented traversal algorithms to get neighbours, boundary edges, the count of connected components & the closest components.
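A hedged sketch of the underlying idea: triangles that share an edge are neighbours, and a BFS over that adjacency counts connected components. The names and the tuple-based representation are assumptions, not the project's actual classes:

```python
# Sketch: triangulation adjacency (shared edges) + BFS component counting.
from collections import defaultdict, deque

def build_adjacency(triangles):
    """triangles: list of 3-tuples of point indices."""
    edge_to_tris = defaultdict(list)
    for ti, (a, b, c) in enumerate(triangles):
        for edge in ((a, b), (b, c), (a, c)):
            edge_to_tris[frozenset(edge)].append(ti)
    adj = defaultdict(set)
    for tris in edge_to_tris.values():
        for i in tris:
            for j in tris:
                if i != j:
                    adj[i].add(j)
    return adj

def count_components(n_triangles, adj):
    seen, components = set(), 0
    for start in range(n_triangles):
        if start in seen:
            continue
        components += 1
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components

tris = [(0, 1, 2), (1, 2, 3), (4, 5, 6)]
print(count_components(len(tris), build_adjacency(tris)))  # 2 components
```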
Implemented a Trie, Red-Black Tree & Max-Heap to execute users' jobs for projects based on priorities & resources. Added features for fetching job status & top budget-consuming users, flushing starving jobs & updating project priorities.
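A hedged sketch of the priority-based execution piece using a max-heap (Python's heapq is a min-heap, so priorities are negated); the Trie and Red-Black Tree lookups from the project are omitted, and the class and field names are assumptions:

```python
# Sketch: max-heap of jobs ordered by priority, constrained by available resources.
import heapq

class JobScheduler:
    def __init__(self):
        self._heap = []      # entries: (-priority, submission order, job dict)
        self._counter = 0

    def submit(self, job_id, priority, budget):
        heapq.heappush(self._heap,
                       (-priority, self._counter, {"id": job_id, "budget": budget}))
        self._counter += 1

    def run_next(self, available_resources):
        """Pop the highest-priority job whose budget fits the available resources."""
        skipped, job = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2]["budget"] <= available_resources:
                job = entry[2]
                break
            skipped.append(entry)
        for entry in skipped:            # requeue jobs that did not fit
            heapq.heappush(self._heap, entry)
        return job

s = JobScheduler()
s.submit("render", priority=5, budget=10)
s.submit("train", priority=9, budget=50)
print(s.run_next(available_resources=20))   # picks "render" ("train" does not fit)
```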
Built a Binary Tree by parsing a fully parenthesised infix expression and computed its derivative by traversal. The parser supports a variety of functions, including algebraic, trigonometric, exponential & composite functions.
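A minimal sketch of the parse-then-differentiate idea for a restricted grammar (only + and * over the variable x and constants); the actual project also handled trigonometric, exponential and composite functions:

```python
# Sketch: build an expression tree from a fully parenthesised infix string,
# then differentiate it with respect to x by recursive traversal.
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def parse(expr):
    """Parse a fully parenthesised infix expression, e.g. ((x*x)+(3*x))."""
    expr = expr.replace(" ", "")
    def helper(i):
        if expr[i] != "(":                       # leaf: variable or number
            j = i
            while j < len(expr) and expr[j] not in "()+*":
                j += 1
            return Node(expr[i:j]), j
        left, i = helper(i + 1)                  # skip '('
        op = expr[i]
        right, i = helper(i + 1)
        return Node(op, left, right), i + 1      # skip ')'
    root, _ = helper(0)
    return root

def derivative(n):
    if n.val == "+":                             # (f+g)' = f' + g'
        return Node("+", derivative(n.left), derivative(n.right))
    if n.val == "*":                             # product rule
        return Node("+", Node("*", derivative(n.left), n.right),
                         Node("*", n.left, derivative(n.right)))
    return Node("1" if n.val == "x" else "0")    # d/dx x = 1, constants -> 0

def to_infix(n):
    if n.left is None:
        return n.val
    return f"({to_infix(n.left)}{n.val}{to_infix(n.right)})"

print(to_infix(derivative(parse("((x*x)+(3*x))"))))
```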