Subhalingam is interested in the broad areas of Natural Language Processing (NLP), Information Retrieval (IR) and Deep Learning. He currently works as a Data Scientist at KnowDis Data Science and holds a B.Tech. degree in Mathematics and Computing from the Indian Institute of Technology, Delhi (IIT Delhi). He has worked on building neural Q&A, machine translation and recommender systems across a variety of domains. He is specifically interested in applying NLP techniques to Indian languages. Apart from coding, one can also find him listening to music, watching football, teaching other people, having some plates of biryani or sleeping.
CGPA: 8.196
Marks: 96.4%
CGPA: 10
Received a Pre-Placement Offer for impeccable performance during the internship
Received a Letter of Recommendation from the CEO for exemplary work accomplishments
Regularly assisted newcomers in improving their English communication skills
Over 120 hours of community work, primarily in Teaching projects
Part of the Web Frontend Development team
We propose a novel model that encodes words with pre-trained word embeddings and uses sentiment scores as weights to mark the importance of each word. It computes a weighted sum to obtain the tweet representation and aggregates these into the user representation, which is fed to an ML classifier. Our model achieves an accuracy of 76% on the test set and outperforms the best model in the competition.
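A minimal sketch of this weighted-aggregation idea, assuming pre-computed word vectors and per-word sentiment scores; all names, dimensions and the fallback weight below are illustrative, not the exact implementation:

```python
# Sketch: word vectors scaled by per-word sentiment scores, summed into a
# tweet vector; tweet vectors averaged into a user vector for a classifier.
import numpy as np

EMB_DIM = 300  # assumed embedding size

def tweet_vector(tokens, embeddings, sentiment_scores):
    """Weighted sum of word embeddings, weights = sentiment scores."""
    vec = np.zeros(EMB_DIM)
    for tok in tokens:
        if tok in embeddings:
            vec += sentiment_scores.get(tok, 1.0) * embeddings[tok]
    return vec

def user_vector(tweets, embeddings, sentiment_scores):
    """Aggregate tweet vectors (mean here) into one user representation."""
    vecs = [tweet_vector(t, embeddings, sentiment_scores) for t in tweets]
    return np.mean(vecs, axis=0) if vecs else np.zeros(EMB_DIM)

# X = np.stack([user_vector(u, emb, senti) for u in users])
# then fit any ML classifier, e.g. scikit-learn's LogisticRegression, on (X, y)
```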
Fine-tuned XLM-RoBERTa for multilingual question answering using the chaii-1 dataset augmented with MLQA, XQuAD & SQuAD and attained a test Jaccard score of 68.72%.
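A hedged sketch of extractive QA inference with an XLM-RoBERTa model via the Hugging Face pipeline API; the checkpoint name below is an assumed public model, not the project's own fine-tuned one, and the question/context pair is a toy example:

```python
# Hedged sketch: extractive QA inference with an XLM-RoBERTa checkpoint.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

result = qa(
    question="Where is IIT Delhi located?",
    context="The Indian Institute of Technology Delhi is located in Hauz Khas, New Delhi.",
)
print(result["answer"], round(result["score"], 3))
```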
Compared non-contextual and contextual embeddings (GloVe+BiLSTM vs BERT) on the WiC dataset for the word sense disambiguation (WSD) task.
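A small sketch of why contextual embeddings matter for WiC: with BERT, the same surface word gets different vectors in different sentences, whereas a single static GloVe vector cannot distinguish senses. The model name and the subtoken matching below are simplified assumptions:

```python
# Sketch: contextual target-word vectors differ across contexts, unlike GloVe.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def target_vec(sentence, target):
    """Mean of the contextual vectors of the target word's subtokens."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]
    pieces = tok.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = [i for i, p in enumerate(pieces) if p.strip("#") == target]
    return hidden[idx].mean(dim=0)

v1 = target_vec("he sat on the bank of the river", "bank")
v2 = target_vec("she deposited cash at the bank", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # < 1.0, unlike GloVe
```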
Processed tweets with tweet normalization, an internet-slang dictionary, stemming, etc.; vectorized them with TF-IDF and fed them into logistic regression (LR).
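A minimal sketch of the normalization + TF-IDF + LR flow using scikit-learn; the slang dictionary and data below are toy placeholders, not the project's resources:

```python
# Minimal sketch of the preprocessing -> TF-IDF -> logistic regression pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

SLANG = {"u": "you", "gr8": "great"}  # assumed toy slang dictionary

def normalize(tweet):
    return " ".join(SLANG.get(w, w) for w in tweet.lower().split())

texts = ["u r gr8", "this is awful", "gr8 stuff", "really bad day"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=normalize, ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)
print(model.predict(["u had a gr8 time"]))
```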
Built a regex-based system that accounts for chunks with abbreviations, dates, numerical quantities and inflections. Obtained a test F1-score of 97.94%.
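A hedged sketch of regex-based chunking that keeps dates, unit-bearing quantities and dotted abbreviations together as single tokens; the patterns are illustrative, not the project's exact rules:

```python
# Illustrative regex patterns for keeping special chunks intact.
import re

TOKEN_RE = re.compile(r"""
    \d{1,2}[/-]\d{1,2}[/-]\d{2,4}        # dates such as 15-10-1931
  | \d+(?:\.\d+)?\s?(?:%|kg|km|cm)       # numeric quantities with units
  | (?:[A-Za-z]\.){2,}                   # dotted abbreviations like A.P.J.
  | \w+(?:'\w+)?                         # words, with simple inflections ('s)
  | [^\w\s]                              # standalone punctuation
""", re.VERBOSE)

def tokenize(text):
    return TOKEN_RE.findall(text)

print(tokenize("A.P.J. Abdul Kalam was born on 15-10-1931 and weighed 3.2 kg."))
```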
Reviewed state-of-the-art bankruptcy prediction models and observed poor recall; hypothesized class imbalance & missing values to be the causes. Trained an ensemble model with Mean Imputation & SMOTE on the Polish companies dataset and gained a 10% improvement in recall.
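A sketch of the imputation + oversampling + ensemble recipe with scikit-learn and imbalanced-learn; the particular ensemble (a random forest here) and the hyperparameters are assumptions:

```python
# Sketch: mean imputation -> SMOTE oversampling -> ensemble classifier.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer

model = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),      # fill missing values
    ("smote", SMOTE(random_state=0)),                # oversample minority class
    ("clf", RandomForestClassifier(n_estimators=300, random_state=0)),
])
# model.fit(X_train, y_train)
# recall_score(y_test, model.predict(X_test))  # the metric that improved
```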
Trained a Takagi–Sugeno type neuro-fuzzy model in TensorFlow for diabetes prediction and obtained an accuracy of 81.3%.
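A hedged sketch of a first-order Takagi-Sugeno layer in TensorFlow: Gaussian memberships give rule firing strengths, which weight linear rule consequents. The rule count and layer design are assumptions, not the project's exact architecture:

```python
# Hedged sketch: Gaussian memberships -> firing strengths -> normalized
# weights -> linear rule consequents -> weighted sum (first-order TSK).
import tensorflow as tf

class TSKLayer(tf.keras.layers.Layer):
    def __init__(self, n_rules=8, **kwargs):
        super().__init__(**kwargs)
        self.n_rules = n_rules

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.centers = self.add_weight(name="centers", shape=(self.n_rules, d),
                                       initializer="random_normal")
        self.widths = self.add_weight(name="widths", shape=(self.n_rules, d),
                                      initializer="ones")
        self.a = self.add_weight(name="a", shape=(self.n_rules, d),
                                 initializer="zeros")
        self.b = self.add_weight(name="b", shape=(self.n_rules,),
                                 initializer="zeros")

    def call(self, x):
        diff = tf.expand_dims(x, 1) - self.centers           # (batch, rules, d)
        mu = tf.exp(-tf.square(diff) / (2.0 * tf.square(self.widths) + 1e-6))
        w = tf.reduce_prod(mu, axis=-1)                       # firing strengths
        w = w / (tf.reduce_sum(w, axis=-1, keepdims=True) + 1e-6)
        rule_out = tf.reduce_sum(self.a * tf.expand_dims(x, 1), -1) + self.b
        return tf.reduce_sum(w * rule_out, axis=-1, keepdims=True)

# model = tf.keras.Sequential([TSKLayer(8), tf.keras.layers.Activation("sigmoid")])
# model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```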
Used probabilistic query expansion and relevance-model-based language modeling with unigram/bigram settings & Dirichlet smoothing to rerank retrieved documents and improve the system's MRR and nDCG scores.
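A minimal sketch of Dirichlet-smoothed query-likelihood scoring, the unigram language-modeling piece used for reranking; the smoothing parameter mu and the counts passed in are illustrative:

```python
# Sketch: log P(query | doc) with a Dirichlet-smoothed unigram language model.
import math
from collections import Counter

def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2000):
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_coll = collection_tf.get(t, 0) / collection_len    # collection LM
        p = (doc_tf[t] + mu * p_coll) / (doc_len + mu)        # smoothed doc LM
        if p > 0:
            score += math.log(p)
    return score

# Rerank: sorted(docs, key=lambda d: dirichlet_score(q, d, ctf, clen), reverse=True)
```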
Implemented an end-to-end retrieval system indexed with TF-IDF weights & cosine similarity-based ranking. Added prefix searching and named entity-based searching (using StanfordNER) to narrow down the retrieval results. Compressed the index file by encoding differences between document IDs & reduced its size by half (topped the class leaderboard for index size).
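A minimal sketch of the gap-encoding idea behind that compression: store the first document ID and then the differences between consecutive IDs, which are smaller numbers and compress better:

```python
# Sketch: gap (delta) encoding of a sorted postings list of document IDs.
def encode_gaps(doc_ids):
    """[3, 7, 12, 40] -> [3, 4, 5, 28]"""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

def decode_gaps(gaps):
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out

assert decode_gaps(encode_gaps([3, 7, 12, 40])) == [3, 7, 12, 40]
```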
Revamped the website using CSS & JavaScript for a better user experience and easier access to and retrieval of information.
Built a generic Graph data structure to store the triangles, points & edges of a given triangulation topology of 3D shapes. Implemented traversal algorithms to get neighbours, boundary edges, the count of connected components & the closest components.
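A hedged sketch of the underlying idea: triangles that share an edge are neighbours, and a BFS over that adjacency counts connected components. The names and the tuple-based representation are assumptions, not the project's actual classes:

```python
# Sketch: triangulation adjacency (shared edges) + BFS component counting.
from collections import defaultdict, deque

def build_adjacency(triangles):
    """triangles: list of 3-tuples of point indices."""
    edge_to_tris = defaultdict(list)
    for ti, (a, b, c) in enumerate(triangles):
        for edge in ((a, b), (b, c), (a, c)):
            edge_to_tris[frozenset(edge)].append(ti)
    adj = defaultdict(set)
    for tris in edge_to_tris.values():
        for i in tris:
            for j in tris:
                if i != j:
                    adj[i].add(j)
    return adj

def count_components(n_triangles, adj):
    seen, components = set(), 0
    for start in range(n_triangles):
        if start in seen:
            continue
        components += 1
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components

tris = [(0, 1, 2), (1, 2, 3), (4, 5, 6)]
print(count_components(len(tris), build_adjacency(tris)))  # 2 components
```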
Implemented a Trie, Red-Black Tree & Max-Heap to execute users' jobs for projects based on priorities & resources. Added features for fetching job status & top budget-consuming users, flushing starving jobs & updating project priorities.
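A hedged sketch of the priority-based execution piece using a max-heap (Python's heapq is a min-heap, so priorities are negated); the Trie and Red-Black Tree lookups from the project are omitted, and the class and field names are assumptions:

```python
# Sketch: max-heap of jobs ordered by priority, constrained by available resources.
import heapq

class JobScheduler:
    def __init__(self):
        self._heap = []      # entries: (-priority, submission order, job dict)
        self._counter = 0

    def submit(self, job_id, priority, budget):
        heapq.heappush(self._heap,
                       (-priority, self._counter, {"id": job_id, "budget": budget}))
        self._counter += 1

    def run_next(self, available_resources):
        """Pop the highest-priority job whose budget fits the available resources."""
        skipped, job = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2]["budget"] <= available_resources:
                job = entry[2]
                break
            skipped.append(entry)
        for entry in skipped:            # requeue jobs that did not fit
            heapq.heappush(self._heap, entry)
        return job

s = JobScheduler()
s.submit("render", priority=5, budget=10)
s.submit("train", priority=9, budget=50)
print(s.run_next(available_resources=20))   # picks "render" ("train" does not fit)
```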
Built a Binary Tree by parsing a fully parenthesised infix expression and computed its derivative by traversal. The parser supports a variety of functions, including algebraic, trigonometric, exponential & composite functions.
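A minimal sketch of the parse-then-differentiate idea for a restricted grammar (only + and * over the variable x and constants); the actual project also handled trigonometric, exponential and composite functions:

```python
# Sketch: build an expression tree from a fully parenthesised infix string,
# then differentiate it with respect to x by recursive traversal.
class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def parse(expr):
    """Parse a fully parenthesised infix expression, e.g. ((x*x)+(3*x))."""
    expr = expr.replace(" ", "")
    def helper(i):
        if expr[i] != "(":                       # leaf: variable or number
            j = i
            while j < len(expr) and expr[j] not in "()+*":
                j += 1
            return Node(expr[i:j]), j
        left, i = helper(i + 1)                  # skip '('
        op = expr[i]
        right, i = helper(i + 1)
        return Node(op, left, right), i + 1      # skip ')'
    root, _ = helper(0)
    return root

def derivative(n):
    if n.val == "+":                             # (f+g)' = f' + g'
        return Node("+", derivative(n.left), derivative(n.right))
    if n.val == "*":                             # product rule
        return Node("+", Node("*", derivative(n.left), n.right),
                         Node("*", n.left, derivative(n.right)))
    return Node("1" if n.val == "x" else "0")    # d/dx x = 1, constants -> 0

def to_infix(n):
    if n.left is None:
        return n.val
    return f"({to_infix(n.left)}{n.val}{to_infix(n.right)})"

print(to_infix(derivative(parse("((x*x)+(3*x))"))))
```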