Thanks for stopping by. I'm excited to share a bit about who I am, what I've done, and what I'm passionate about. Feel free to explore and reach out if you want to chat or collaborate!
I'm a Computer Science grad student at Binghamton University - SUNY, currently diving deep into data structures, machine learning, and social media data science. Before that, I completed my Bachelor's in Computer Engineering at the University of Mumbai, where I built a strong foundation in everything from cloud computing to data mining.
I love working with a variety of technologies. Here's a quick rundown:
- Programming Languages: Python, R, MuleSoft, Java, JavaScript, Node.js, React.js, REST APIs, Rust, MongoDB, SQL
- Frameworks & Libraries: LangChain, PySpark, Hadoop, FastAPI, Flask, PyTorch, TensorFlow, NumPy, Pandas, Scikit-learn, Spring
- Tools & Platforms: Azure, AWS, GCP, Spark, Hive, Sqoop, Power Automate, Git, Maven, Postman, Docker, Messaging Queues, CI/CD, Tableau
- Certifications: Microsoft AZ-900, AWS Technical Accreditation, Machine Learning A-Z™: Python & R in Data Science
Research Assistant – Data Science & Analytics
Sep 2024 – Present
I'm working on some pretty cool projects:
- Fine-tuning large language models (think BERT and GPT-3.5) on hundreds of multimodal patient records to boost cancer imaging diagnosis accuracy by 12%.
- Engineering features using PCA and optimizing SVM classifiers to hit high accuracy rates while reducing false positives.
- Using PySpark to process hundreds of thousands of genomic samples, making data harmonization a breeze for clinical research.
- Running statistical analyses to unearth key biomarkers that could shape future treatments.
Senior Software Engineer – Data Engineering
Nov 2021 – Jan 2024
I dove deep into data engineering and machine learning:
- Developed text summaries using Hugging Face models, speeding up issue prioritization by 50%.
- Scaled AWS EMR workflows to process over 10M records daily and significantly cut runtimes.
- Built a serverless ML monitoring system that slashed manual reporting from 8 hours a week to just half an hour.
- Enhanced data pipelines with Spark to boost throughput and feature engineering capabilities.
Graduate Engineer Trainee
Jul 2021 – Nov 2021
- Developed Python-based ML microservices on Azure Functions, reducing deployment costs by 40%.
- Integrated Azure tools to enhance scaling and security within CI/CD pipelines.
Web Development Intern
Jun 2019 – Aug 2019
I got my start by:
- Building a timesheet application with React, Flask, and SQL that saved managers time and streamlined billing.
- Training an NLP-powered chatbot to answer FAQs with impressive accuracy.
- Assisting with third-party API integrations to boost website functionality and user engagement.
I love taking on projects that challenge me and help me grow. Here are a few highlights:
- Employee Timesheet and Billing Cost Calculator: A handy web app built with React and Python Flask that lets employees log work, managers approve submissions, and clients review project costs.
- Group Chat and Large-File Transfer Application: Built with Python's socket and GUI libraries, this app supports real-time chat and file transfers of up to 2 GB.
- Facial Image Generation for Suspect Identification: Trained a TensorFlow-based DCGAN on over 200K images to generate high-resolution facial images, achieving an impressive Fréchet Inception Distance score. The work was published in Springer ICSES 2021.
- Comprehensive Study of Failed Machine Learning Applications: A research project employing a 3C (Consolidation, Classification, Case Studies) approach, culminating in a co-authored chapter in a Taylor & Francis ML journal.
- Forest Cover Classification and Clustering: Leveraged PCA and SVM (with GridSearchCV) to classify forest cover data with high accuracy, and applied K-Means clustering to validate the results.
- Gene Mutation and RNAseq Data Analysis: Processed and analyzed data from over 500 NSCLC patient records to study gene expression and survival outcomes.
- Social Media Sentiment Analysis - Reddit and 4chan: Developed a sentiment scoring pipeline with Spark NLP and Logistic Regression, deployed on AWS EMR to analyze over 100K stock market-related posts.
- Virtual Chemistry Lab: A web-based simulation of 25 chemistry experiments that led to a published paper in an international journal (IRJET).
I believe in sharing knowledge and building community:
- Editorial Head, ACM Student Chapter: Led the publication of our annual technical magazine in 2021.
- Co-Technical Head, ICACTA-2020: Helped organize and manage the technical aspects of an international conference.
If you're curious about my work, have a question, or just want to chat about tech and innovation, feel free to reach out:
- Phone: (607) 313-8194
- Email: [email protected]
- LinkedIn: Prem Bhajaj
- GitHub: prembhajaj
Looking forward to connecting and collaborating!
Cheers,
Prem