Hello! I'm Anuj, a passionate Data Scientist currently pursuing my Master's in Computer Science at the University of Texas at Arlington, graduating in May 2027. I enjoy creating things that live on the internet, whether that's building machine learning models, developing data-driven solutions, or solving complex analytical problems.
With 10+ completed projects under my belt, I've worked with a wide range of technologies and frameworks. My journey in data science has been driven by a passion for building efficient, scalable, and user-friendly applications.
Designed machine-learning-powered automation systems, integrating OCR + LLMs for structured data extraction.
Rebuilt a legacy application, reducing system errors by 20% and increasing data reliability.
Coordinated with multiple companies to conduct campus recruitment drives and managed the entire placement process.
Developed a churn prediction model using supervised ML algorithms (Logistic Regression, Random Forest, XGBoost). Performed feature engineering, exploratory analysis, and hyperparameter tuning. Provided actionable insights for retention strategies based on model outputs.
Built a role-based AI expense management system using Flask and MongoDB, enabling secure bill uploads, automated extraction, and HR approval workflows. Implemented JWT authentication, scalable backend architecture, and Railway-ready deployment for production use. Leveraged OpenAI Vision & LLMs to extract structured expense data from bill images, reducing manual processing by ~70%.
Built a sentiment classification model (~80% accuracy) using NLP preprocessing and ML algorithms. Developed a web interface for real-time analysis via Flask API endpoints. Conducted cleaning, tokenization, stop-word removal, and vectorization.
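As a rough illustration of the preprocessing steps named above (cleaning, tokenization, stop-word removal, vectorization), here is a minimal standard-library sketch. It is not the project's actual code: the stop-word list, the function names, and the plain bag-of-words counts are all assumptions made for the example, and a real pipeline would likely use an NLP library and a TF-IDF or similar vectorizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "or", "of", "to", "it", "this"}


def preprocess(text):
    """Clean, tokenize, and drop stop words from a raw string."""
    text = text.lower()                      # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)    # cleaning: keep letters and whitespace
    tokens = text.split()                    # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(docs):
    """Map every distinct token in the corpus to a column index."""
    vocab = sorted({tok for doc in docs for tok in preprocess(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab):
    """Turn one document into a bag-of-words count vector."""
    counts = Counter(preprocess(text))
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:                     # ignore out-of-vocabulary tokens
            vec[vocab[tok]] = n
    return vec


docs = ["This movie was GREAT, great fun!", "The plot is boring."]
vocab = build_vocabulary(docs)
features = [vectorize(d, vocab) for d in docs]
```

The resulting count vectors could then be fed to any standard classifier; which model reached the ~80% accuracy is not specified here.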
Built a hybrid search system combining semantic embeddings + contextual LLM responses. Implemented ChromaDB vector store for dense embedding search. Designed an evaluation workflow for retrieval quality and output coherence.
Built a multi-agent orchestration framework enabling autonomous agent-to-agent communication using a standardized A2A protocol. Designed a centralized orchestrator for task routing, context sharing, and structured message passing between specialized agents.
Built a production-grade incremental news intelligence system for continuous article ingestion, clustering, and trend detection using NLP and sentence embeddings. Implemented incremental clustering, time-decayed topic modeling, and LLM-based summaries, enabling real-time insight without retraining models. Designed a deterministic, state-persistent architecture with REST APIs and a professional analytics dashboard.
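To make the idea of incremental clustering with time decay concrete, here is a minimal sketch under stated assumptions. It is not the system's actual implementation: the class name, the cosine-similarity threshold, the half-life parameter, and the exponential decay formula are all hypothetical choices for illustration. Each new article embedding either joins the most similar existing cluster or starts a new one, so nothing is refit from scratch.

```python
import math


class IncrementalClusterer:
    """Toy online clusterer: nearest-centroid assignment plus decayed weights."""

    def __init__(self, threshold=0.8, half_life=3600.0):
        self.threshold = threshold               # min cosine similarity to join a cluster
        self.decay = math.log(2) / half_life     # exponential decay rate per time unit
        self.clusters = []                       # dicts: centroid, count, weight, updated

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, vec, t):
        """Assign one embedding arriving at time t; return its cluster index."""
        best, best_sim = None, self.threshold
        for i, c in enumerate(self.clusters):
            sim = self._cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:                         # nothing similar enough: new cluster
            self.clusters.append(
                {"centroid": list(vec), "count": 1, "weight": 1.0, "updated": t}
            )
            return len(self.clusters) - 1
        c = self.clusters[best]
        # Decay the old weight to time t, then count the new article.
        c["weight"] = c["weight"] * math.exp(-self.decay * (t - c["updated"])) + 1.0
        c["count"] += 1
        n = c["count"]
        # Running-mean update of the centroid; no retraining needed.
        c["centroid"] = [m + (x - m) / n for m, x in zip(c["centroid"], vec)]
        c["updated"] = t
        return best

    def trend_score(self, idx, now):
        """Decayed weight of a cluster as seen at time `now`."""
        c = self.clusters[idx]
        return c["weight"] * math.exp(-self.decay * (now - c["updated"]))


clusterer = IncrementalClusterer(threshold=0.8, half_life=10.0)
first = clusterer.add([1.0, 0.0], t=0.0)
second = clusterer.add([0.9, 0.1], t=0.0)   # similar direction, joins the first cluster
third = clusterer.add([0.0, 1.0], t=0.0)    # orthogonal, starts a new cluster
```

In this sketch a cluster that stops receiving articles sees its trend score halve every `half_life` time units, which is one simple way to let stale topics fade without deleting them.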
Certified in Agentic AI by NVIDIA, demonstrating expertise in building and deploying intelligent autonomous AI systems.
Published in Metszet Journal. An in-depth analysis of AI technologies impacting the healthcare industry.
An exploration of DeepSeek OCR's revolutionary approach to optical character recognition and image compression technology.
Exploring the revolutionary concept of AI-to-AI communication and how autonomous agents are reshaping the future of technology.
A comprehensive guide to building and deploying AI agents using Google's Agent Starter Pack, simplifying the process of creating intelligent autonomous systems.
I am currently looking for new opportunities to contribute to Data Science projects. Whether you have a question, want to collaborate, or just want to say hi, feel free to reach out or schedule a meeting!