About Me

I am a Data Scientist with experience in both R and Python. I enjoy delivering projects on time and continuously improving my skill set. As a Data Analyst, I built strong skills in data cleaning and pattern recognition. My earlier work in Biotechnology and IT also developed my communication and teamwork skills, so I work efficiently with diverse teams and stakeholders.

Things I Can Do

Acquire, clean, analyze, and model data using machine learning methods and tools such as Lasso and Ridge regression, XGBoost, stacking, TensorFlow, natural language processing, and Greykite.

  • Python and R
  • Time series models
  • Convolutional neural networks (CNN)
  • Natural language processing (NLP)
  • SQL
  • Microsoft Power BI

Projects

MRI disease classification

In this project, I built four Convolutional Neural Network (CNN) models to classify brain MRI scans for two conditions: Alzheimer's disease and brain tumors. Scans are classified by disease severity and by tumor type. My best-performing models used ResNet50 and EfficientNetV2S, with training data augmented through the Albumentations library. I combined these models in a Streamlit application that classifies individual input images. After a user uploads an image, the application displays it and uses the pre-trained models to predict the class label and its probability.

Effect of COVID-19 on Mental Health-Related Searches

To forecast mental health-related searches, my team and I combined Google Trends data with data on state-mandated restrictions during different time periods. We modeled these searches with SARIMAX, both with and without the restrictions as exogenous features. We also used SARIMA and Greykite to investigate how the start of restrictions affected mental health-related searches.
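As a minimal sketch of what "restrictions as exogenous features" can look like (not the project code), the snippet below turns restriction periods into a 0/1 indicator aligned with a weekly series. The function name `restriction_indicator`, the dates, and the weekly frequency are all assumptions made for illustration.

```python
from datetime import date, timedelta


def restriction_indicator(week_starts, restriction_periods):
    """Return 1 for each week that starts inside any restriction period, else 0."""
    return [
        int(any(start <= week <= end for start, end in restriction_periods))
        for week in week_starts
    ]


# Hypothetical weekly index and a single hypothetical restriction window.
weeks = [date(2020, 3, 1) + timedelta(weeks=i) for i in range(6)]
periods = [(date(2020, 3, 15), date(2020, 4, 5))]

exog = restriction_indicator(weeks, periods)
print(exog)
```

A column like this is the kind of input a SARIMAX implementation accepts as an exogenous regressor, for example through the `exog` argument in statsmodels. Fitting the model once with the column and once without it gives the two variants compared in the project.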

Differentiating the Bioinformatics and Data Science Subreddits

I developed a model to distinguish between posts from the Bioinformatics and Data Science subreddits. The Reddit posts were collected using Pushshift. My stacking model, which combined Naive Bayes, XGBoost, and Gradient Boosting, achieved an F1 score of 0.85 and a test accuracy of 91%, compared with 67% accuracy for my baseline model.
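Accuracy and F1 measure different things, which is why the two numbers above are reported separately. The sketch below computes both for a binary classifier from scratch. The labels are made up for illustration and are not the project's data, and `accuracy_and_f1` is a hypothetical helper name.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1 score of the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Made-up labels: 3 positive posts and 7 negative posts.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Accuracy counts every correct prediction, while F1 ignores true negatives and focuses on how well the positive class is found, so the two scores can differ on the same predictions.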

Contact Me