Balaj Khalid

About Me

Hello! I'm a passionate Data Scientist with 3 years of experience in transforming complex datasets into valuable business insights. My expertise lies in machine learning, statistical analysis, and data visualization.

I enjoy solving challenging problems and building models that help organizations make data-driven decisions. When I'm not coding or analyzing data, you can find me hiking, reading about the latest advancements in AI, or experimenting with new visualization techniques.

3+ Years Experience

15+ Projects Completed

1+ Publications

Projects

Exploring the use of LLMs in Social Media Bot Detection

Achieved 0.83 accuracy and 0.74 F1-score in bot detection on X (formerly Twitter) by developing an LLM-based classifier using Llama 3.1 and the TwiBot-22 dataset, leveraging expert model ensembling, majority voting, and reinforcement learning from human feedback (RLHF).

Python LangChain Llama

View Project

Artsy Web App

Developed a full-stack art discovery platform with a React frontend, a Node.js backend, and MongoDB for data storage; features include user authentication, artist search, and dynamic artwork listings.

JavaScript React Node.js MongoDB

View Project

Disaster Response Information System

Achieved a 92.3% F1-score in detecting COVID-19 fake news by leading a team of four to develop a machine learning model using Python and Google Cloud Platform. Integrated the model into a chatbot deployed on a disaster response website to reduce misinformation and provide real-time, accurate information to users during the pandemic.

Python Machine Learning Google Cloud Platform (GCP)

View Project

Game Swap

Streamlined trading and selling of physical game copies by developing an iOS marketplace app using Swift and Google Cloud Platform, featuring search, user profiles, detailed game listings, real-time chat, and a scalable backend with Firebase.

iOS Swift Google Cloud Platform (GCP)

View Project

Resume

Work Experience

May 2025 - Present

Data Science Intern

Alcon Inc.

Developed a demand forecasting solution for US-based intraocular lenses (IOLs) by building an end-to-end pipeline leveraging advanced time series models, driving data-backed inventory optimization and enhancing supply chain visibility; positioned to improve operational efficiency and generate significant cost savings.

March 2025 - Present

AI Researcher

USC Center for Advanced Research Computing (CARC)

Conducted research on multi-agent reinforcement learning by implementing and comparing algorithms such as DQN, PPO, and coordination strategies like CTE, DTE, and CTDE; supplemented ongoing work in self-organizing systems to evaluate and improve the robustness and generalizability of reward functions across different learning models, contributing to a deeper understanding of scalable agent coordination.

March 2023 - December 2023

Data Scientist II

Afiniti Ltd.

Reduced model deployment time by 30% and enhanced operational efficiency by designing and automating a full-cycle MLOps pipeline, including raw data ingestion, transformation, model training, deployment, and monitoring.
Enhanced productivity and efficiency by mentoring and leading a team of three in data analysis, predictive modeling, and production monitoring, fostering a high-performance culture.
Achieved a 9% reduction in customer churn and generated €2M in additional revenue for Telefonica Spain by optimizing churn prediction models with advanced gradient boosting techniques (XGBoost, LightGBM).
Increased data processing efficiency and contributed to €1M in incremental revenue by developing a real-time customer interaction tracking system using R, PostgreSQL, and automated ETL workflows with Talend. Developed predictive models for customer behavior analysis, created data pipelines, and collaborated with cross-functional teams.

June 2021 - February 2023

Data Scientist I

Afiniti Ltd.

Generated €9.7M in additional revenue by successfully onboarding Telefonica Spain as a strategic client and designing real-time predictive modeling to optimize agent-caller pairing.
Increased model interpretability and predictive accuracy, leading to $0.8M in additional annual revenue, by engineering domain-specific ML features tailored to business needs.
Saved $0.5M in revenue by developing a real-time production monitoring and data integrity dashboard using Apache Superset, enabling proactive anomaly detection and preventing costly errors.

Education

Expected Graduation: December 2025

Master of Science in Computer Science

University of Southern California

Relevant coursework: Artificial Intelligence, Machine Learning, NLP, Databases, Web Technologies, and Algorithms.

September 2017 - May 2021

Bachelor of Science in Computer Science

Lahore University of Management Sciences

Relevant coursework: Data Science, Artificial Intelligence, Machine Learning, Data Structures and Algorithms, Probability, Statistics, Linear Algebra, and Calculus.

Download Full Resume

Balaj Khalid

Data Scientist

About Me

Skills

Programming Languages

Data Science & ML

Tools & Frameworks

Contact Me

Email

Phone

Location