I completed the following projects during my MSc in Data Science and Analytics:
Major Research Project
The topic for my major research project, done over the summer, was “A Confidence Measure-Based Evaluation Metric for Breast Cancer Screening Using Bayesian Neural Networks”.
Here, I developed a new confidence-based evaluation criterion for classifying breast cancer mammography images. In addition to classification accuracy, it provides a few tunable numeric parameters for adjusting the confidence level of the classification. The project leverages Bayesian deep neural networks and transfer learning, and was built with PyTorch and Pyro.
Project for DS8001 - Design of Algorithms and Programming for Massive Data
In this course, I worked on a project to parallelize segments of an end-to-end ensemble training and evaluation pipeline for a text classification task.
As part of that, I developed a scalable end-to-end ensemble learning algorithm with 3 shallow models (Logistic Regression, Naive Bayes and Random Forest) and 3 deep neural models (LSTMs with 3 different dropout rates), and demonstrated the performance gain from parallelism. Data preprocessing, model training and model evaluation are all parallelized. The project was developed with Python’s multiprocessing framework along with the joblib wrapper, NumPy, pandas, scikit-learn and Keras.
Project for DS8003 - Management of Big Data and Big Data Tools
In this course, we had a take-home final project, for which I developed TFIDF Search With Spark: I used Apache Spark with HDFS to build a TF-IDF index over a cricket corpus and to run search queries against it.
The documents were loaded as separate records and each record was tokenized. The following were then calculated: term counts per document (TF), the number of distinct documents containing each term (DF), IDF, and the TF-IDF index. The system accepts any query, tokenizes it the same way as the documents in the corpus, and then runs the search against the index.
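The indexing and search steps can be sketched in a few lines of plain Python. This is a minimal single-machine illustration, not the Spark implementation from the project: the function names, the tokenizer regex, the raw-count TF, the log(N/DF) form of IDF, the sum-of-weights scoring and the three toy documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build {term: {doc_id: tfidf}} from {doc_id: text}."""
    # TF: raw term counts per document.
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # DF: number of distinct documents containing each term.
    df = Counter(term for counts in tf.values() for term in counts)
    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in tf.items():
        for term, count in counts.items():
            idf = math.log(n_docs / df[term])  # 0 for terms in every document
            index[term][doc_id] = count * idf
    return index


def search(index, query, top_k=3):
    """Tokenize the query like a document and rank documents by summed TF-IDF."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    return [doc_id for doc_id, _ in scores.most_common(top_k)]


# Hypothetical toy corpus standing in for the cricket documents.
docs = {
    "d1": "The batsman scored a century in the test match",
    "d2": "The bowler took five wickets in the match",
    "d3": "Rain delayed the start of play",
}
index = build_index(docs)
print(search(index, "century batsman"))
print(search(index, "match wickets"))
```

In a Spark version, each of these steps would typically become a transformation over a distributed collection of (document, token) records, with the same query tokenizer reused at search time.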
Project for DS8008 - NLP (Text Mining)
The two-person group project for this course involved reimplementing the paper “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”.
As part of this, I performed hard gender debiasing on pretrained GloVe embeddings: gender-neutral words were neutralized and gender word pairs were equalized, so that any neutral word is equidistant from both words of a pair such as she-he.
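The neutralize and equalize steps can be illustrated with a small pure-Python sketch on toy 3-dimensional vectors. Several things here are assumptions made for the example rather than details of the project: the gender direction is taken from a single she-he difference (the paper derives it from several definitional pairs), the vectors are made up instead of being real GloVe embeddings, and all helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scale(u, c):
    return [c * a for a in u]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def unit(u):
    return scale(u, 1.0 / math.sqrt(dot(u, u)))


def neutralize(w, g):
    """Remove the component of w along the unit direction g, then renormalize."""
    return unit(sub(w, scale(g, dot(w, g))))


def equalize(a, b, g):
    """Make unit vectors a and b differ only along g, symmetric about their midpoint."""
    mu = scale(add(a, b), 0.5)
    nu = sub(mu, scale(g, dot(mu, g)))          # midpoint with its g component removed
    r = math.sqrt(max(0.0, 1.0 - dot(nu, nu)))  # g component that keeps unit length
    result = []
    for w in (a, b):
        side = 1.0 if dot(w, g) >= dot(mu, g) else -1.0  # which side of the midpoint
        result.append(add(nu, scale(g, side * r)))
    return result[0], result[1]


# Made-up toy vectors, normalized to unit length.
she = unit([1.0, 0.2, 0.3])
he = unit([-0.8, 0.4, 0.1])
programmer = unit([-0.3, 0.9, 0.2])

g = unit(sub(she, he))  # assumed gender direction from a single pair

programmer_n = neutralize(programmer, g)
she_e, he_e = equalize(she, he, g)

# After both steps the neutral word has the same similarity to each pair member.
print(dot(programmer_n, she_e), dot(programmer_n, he_e))
```

Because the equalized pair differs only along the gender direction and the neutralized word has no component along it, the two printed similarities agree up to floating-point error.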