About Me


I'm Rohit — a Data Scientist with a track record of delivering high-impact ML and NLP solutions in fast-paced SaaS and service-based startups. I specialize in LLMs, time series forecasting, and scalable end-to-end pipelines with a deep appreciation for product usability and business impact.

  • Saved $65K+/yr by deploying an in-house LLM tool (RAG + semantic search) to automate ESG compliance mapping at a B2B SaaS startup.
  • Accelerated client onboarding with NLP-powered autofill for 350+ ESG questions, validated via A/B testing on 50 clients.
  • Achieved 83% forecast accuracy by productionizing a time series model on 5 years of historical + external data (weather, housing), boosting planning decisions.
  • Reduced triage time by 30% on 375K+ dealer logs using a generative LLM pipeline with prompt engineering to extract key diagnostic features.
  • Enabled real-time email triage with a Gmail AI assistant powered by Gemini, LLMs, and PyMuPDF.

Tools: Python, SQL, Airflow, Snowflake, AWS (Amplify, Cognito, DynamoDB, S3), Docker, Ollama, Mistral, Gemini, GitHub, Jenkins, Tableau, Power BI, VS Code

Skills: Time series forecasting, supervised & unsupervised learning, prompt engineering, NLP & LLMs, A/B testing, semantic search, feature engineering, model interpretability, data pipeline design, CI/CD, MLOps fundamentals

I’m also the founder of the Modlee AI/ML Student Club at UConn, where I hosted 20+ speaker sessions and hackathons and demoed an LLM agent live at Hartford AI Day 2025. I was a national finalist in the Humana-Mays Analytics Challenge (Top 10%) and have won multiple hackathons.

If you're looking for someone who bridges the gap between data science, product, and business strategy — let's talk.

Experience

  • July 2025 – Present

    University of Connecticut

    Data Engineer

    • Tech Stack: Python, REST API, GitHub (CI/CD), GitHub Actions, Google Cloud Platform, Cloud Run Function, Cloud Scheduler, Anomaly Detection, Delta Lake, Data Lakehouse
    • Designed and operated distributed PySpark pipelines for climate and economic feeds, parsing XML to normalized schemas and writing to Delta Lake open tables, with downstream consumption in BigQuery for analytics and reporting.
    • Productionized ingestion with GCP Cloud Run services triggered by Cloud Scheduler, adding idempotency and schema validation to ensure reliable, reproducible writes to Delta- or Iceberg-compatible tables on cloud storage.
    • Built and deployed anomaly detection models using RNN and LSTM on tabular and text data, with batch scoring and evaluation pipelines to support large-scale time series risk analysis.
  • January 2026 – February 2026

    CIS Labs, LLC.

    Technology Solutions Architect

    • Tech Stack: Python, REST API, Bitbucket (CI/CD), AWS, Snowflake, TypeScript, OpenAI API, Ollama, Jira
    • Built and optimized cloud-native data platforms by designing Snowflake schemas, writing performant SQL transformations, and developing AWS Glue ETL pipelines to reliably ingest, transform, and serve data for application and analytics use cases.
    • Engineered an AI-driven financial document processing system using OpenAI API and Ollama to split, classify, and organize multi-document files, significantly reducing manual data handling and improving accessibility of client financial data.
    • Developed backend services and APIs using TypeScript, integrating database-backed data access layers with application components while ensuring scalability, performance, and secure data exchange across systems.
    • Maintained and enhanced CI/CD pipelines using Bitbucket and Jira to support automated builds, testing, and deployments, while collaborating cross-functionally to track work, enforce code quality standards, and deliver reliable production releases.
  • January 2025 – April 2025

    Alō Index

    Data Engineer Intern

    • Tech Stack: Python, SQL, Docker, REST API, GitHub (CI/CD), AWS (Amplify, S3, DynamoDB, Cognito), Ollama, Mistral, PyMuPDF, React.js, JSON
    • Saved $5.9K/month by deploying an in-house Extractive LLM pipeline (RAG + semantic search) to map unstructured ESG documents to internal compliance frameworks, cutting document review time by 81%.
    • Generated $120K+ in annual value by building an NLP-driven autofill engine to pre-score 350+ due diligence questions; reduced onboarding by 13 hours/client and validated performance via A/B testing on 50-client cohorts.
    • Improved reporting accuracy by 97% by cleaning and transforming 1.2M+ risk records using supervised learning and feature validation techniques.
    • Enabled scale by automating ingestion from 33 external sources across 6.7K+ entities, streamlining structured data pipelines for operations and client success.
    • Led deployment workflows using GitHub CI/CD and AWS Amplify; collaborated with VP Engineering and non-technical C-suite to convert ML outputs into roadmap-aligned product features.
  • December 2024 – May 2025

    Stanley Black & Decker (Capstone Project at UConn)

    Data Engineer

    • Tech Stack: Python, SAS Studio, SARIMAX, RNN, LSTM, API, Docker, Ollama, LLaMa3, Tableau
    • Achieved 83% forecast accuracy by productionizing a time series model with 5 years of historical + external data (weather, housing), helping compliance teams detect and resolve regional outliers.
    • Accelerated triage by 30% and eliminated manual errors by building a generative LLM pipeline (Ollama, LLaMa3) with prompt engineering to extract 9 structured features from 3M+ unstructured dealer logs.
    • Streamlined decision-making by integrating forecasts and LLM outputs into a Python pipeline that auto-refreshed Tableau dashboards for monthly executive reviews.
    • Ranked 1st out of 5 teams for delivering a production-ready solution and standout presentation, earning praise from non-technical SBD leadership for clarity and business value.
  • January 2025 – May 2025

    University of Connecticut

    Graduate Teaching Assistant – Deep Learning

    • Evaluated graduate-level assignments with a focus on code quality, model performance, and documentation.
    • Delivered individualized, constructive feedback to support student learning and model implementation best practices.
  • August 2023 – May 2025

    University of Connecticut

    Master of Science (M.S.) in Business Analytics and Project Management

    GPA: 4.00/4.00

    • Relevant Coursework: Deep Learning, Data Science using Python, Data Mining and Business Intelligence, Predictive Modeling, Business Decision Modeling, Project Leadership and Communications, Statistics in Business Analytics, Project Management, Data Management and Business Process Modeling, Agile Project Management and Methodologies
    • Awarded merit-based scholarship twice (Spring 2024 and Spring 2025) for outstanding academic performance.
    • Leadership: President and Founder of Modlee AI/ML Student Club. Organized hackathons, competitive events and projects, providing members with firsthand experience & industry insights through events featuring AI/ML professionals.
    • Keynote: AI Agent Demo Speaker, Hartford AI Day 2025 – Invited to deliver a live LLM-based agent demo at an event hosted by Launc[H], attended by 50+ industry professionals (technical and non-technical audience).
    • Recognition: Featured twice in Student Highlights, University Weekly Scoop – recognized for contributions to AI/ML community.
  • October 2019 – July 2023

    Greetworld Holidays

    Machine Learning Engineer

    • Reduced operational costs by 15% and boosted NPS by 12% through A/B testing and customer feedback analysis, optimizing tour logistics and enhancing service delivery.
    • Drove 35% increase in repeat bookings by building segmentation models on customer demographics and booking patterns, enabling high-impact marketing campaigns.
  • January 2017 – May 2019

    University of Bridgeport

    Master of Science (M.S.) in Technology Management (IT and Big Data)

    GPA: 3.89/4.00

    • Relevant Coursework: Intro to Big Data and Data Science, Intro to SQL and R for Data Science, Simulation and Modeling, Project Management, Statistical Quality Control Techniques, Strategic Sourcing and Vendor Management, Leadership in Technical Enterprise
  • May 2015 – November 2016

    Greetworld Holidays

    Data Analyst

    • Increased peak-season conversion by 18% by analyzing booking trends and traveler cohorts, informing property recommendations and pricing strategies.
    • Reduced downtime and improved onboarding by resolving live system issues and building workflows that enhanced ranking and filter functionality.
  • July 2012 – May 2015

    North Maharashtra University

    Bachelor of Business Management (E-Commerce)

    GPA: 6.88/10.00

    • Relevant Coursework: Business Mathematics and Statistics, System Analysis and Designing, MIS and ERP, Oracle and D2K, Java Programming, Intro to HTML, Intro to C++, Visual Basic 6.0, VB .Net, ASP .Net, Microsoft Access, Microsoft Visio, Finance, Cost Accounting, Management Accounting, Corporate Accounting, Business Laws

Skills

Programming Languages

Python
SQL
R
Django
PySpark
CSS3
HTML5

Machine Learning and Artificial Intelligence

Scikit-learn
TensorFlow
PyTorch
imblearn
Ollama
Gemini AI
Modlee

Data Analysis and Visualization

Tableau
Power BI
NumPy
Pandas
Matplotlib
Seaborn
Plotly

Business Intelligence and Deployment Tools

GitHub
Snowflake
AWS
Docker
SAS JMP
SAS Studio
MySQL
PostgreSQL
SQL Server

Project Management and Productivity Tools

Jira
PyCharm
Jupyter Notebook
Excel
MS Visio
MS Project
Figma
Lucidchart

Recent Projects

Insurance Fraud Detection Using Machine Learning

Applied machine learning techniques and predictive modeling to detect fraudulent insurance claims.

#MachineLearning #FraudDetection #Python #EDA #DataAnalysis #PredictiveModeling

Predictive Modeling of Adolescent Digital Overuse

Modeled internet overuse risk using wearable sensor data and time-series features.

#DeepLearning #DNN #BehavioralAnalytics #NeuralNetworks #Python #SHAP #TimeSeriesAnalysis

S&P500 Investment Strategy and Stock Analysis

Simulated S&P 500 stock performance to select stable stocks across multiple sectors using daily return metrics.

#Finance #yfinance #Python #Pandas #Matplotlib #API #WebScraping #Optimization #MonteCarlo

SQL Data Warehouse for Shoe Store Operations

Built a retail shoe store data warehouse using dimensional modeling; managed SQL tables and visualized insights via dashboards.

#SQL #RDBMS #DimensionalModeling #RetailAnalytics #Reporting #ETL #ERD #DataWarehouse #StarSchema

Email AI Agent Using Gemini and Gmail API

Built an AI-powered agent to summarize emails, analyze sentiment, and generate smart replies using LLMs and the Gmail API.

#LLM #EmailAutomation #Python #AIIntegration #GmailAPI #GeminiAPI #GoogleCloud

California Housing Market Analysis and Modeling

Analysis and modeling of California housing market data to understand trends and predict prices.

#MarketAnalysis #PredictiveModeling #Python #DataVisualization #RealEstate

Sentiment Analysis of 2020 US Presidential Election Tweets

Analysis of tweets related to the 2020 US Presidential Election to understand public sentiment and opinion trends.

#SentimentAnalysis #NaturalLanguageProcessing #Python #SocialMediaAnalysis #DataMining #SASEnterpriseMiner

Bank Client Segmentation Analysis with Tableau

A Tableau dashboard for visualizing and interpreting bank client data to support better business decisions and strategies.

#DataVisualization #Tableau #ClientSegmentation #BusinessIntelligence #DataAnalysis

SpaceX Launch Data Analysis and Visualization

Combined web scraping, data wrangling, SQL, exploratory data analysis (EDA), data visualization, machine learning, and interactive Dash dashboards to analyze SpaceX launch data.

#DataAnalysis #WebScraping #DataWrangling #SQL #ExploratoryDataAnalysis #EDA #DataVisualization #MachineLearning #Dash

Walmart Sales Forecasting Using Time Series Analysis

Applied time series analysis to forecast Walmart sales, supporting inventory management and sales strategy optimization.

#TimeSeriesAnalysis #SalesForecasting #Python #PredictiveModeling #InventoryManagement

Lending Case: Comprehensive Predictive Modeling

The analyses include various machine learning techniques such as Logistic Regression, Linear Regression, Ridge vs. Lasso, Decision Tree, Bagging, Boosting, Naive Bayes, Neural Networks, and Principal Component Analysis (PCA).

#PredictiveModeling #MachineLearning #DecisionTree #LogisticRegression #LinearRegression #NaiveBayes #NeuralNetworks #PrincipalComponentAnalysis

Toyota and Cereals Data Analysis Using PCA and K-means Clustering

This project applies Principal Component Analysis (PCA) to the Toyota Corolla dataset and K-means clustering to a cereals dataset to identify healthy and unhealthy cereals.

#PrincipalComponentAnalysis #KmeansClustering #DataAnalysis #Python #DataMining

E-Commerce SQL Management System

A robust SQL-based management system for handling all aspects of an e-commerce platform, including customers, products, orders, suppliers, and employees.

#SQL #DatabaseManagement #ECommerce #DataManagement #BusinessIntelligence

Inspiring Change: Collaborating with My Sisters' Place Non-Profit Organization

Project aimed at inspiring support for a non-profit organization, involving data collection, analysis, and strategy development to enhance community engagement.

#NonProfitCollaboration #Fundraising #CommunityOutreach #StrategicPlanning #StakeholderEngagement #ProjectManagement #ResourceMobilization

Planning Poker for Personal Project

A collaborative Planning Poker assessment for estimating project tasks' effort, complexity, and uncertainty, ensuring accurate and effective project planning.

#PlanningPoker #EffortEstimation #ProjectPlanning #Collaboration #AgileMethodologies #Scrum

User Stories and User Persona for Personal Project

Defined user stories and personas to guide team members' efforts toward targeted, effective career advancement.

#UserStories #UserPersona #AgileMethodologies #TeamGuidance #ProjectPlanning #UserCenteredDesign

Roadmap for Personal Project

An upskilling and job readiness project using Agile methodologies and a JIRA board to enhance skills, create professional portfolios, and prepare resumes for career advancement.

#Roadmap #Upskilling #JobReadiness #AgileMethodologies #JIRA #ProfessionalDevelopment #CareerAdvancement

Project Charter for Personal Project

A comprehensive upskilling and job readiness program using agile methodologies combining structured learning, real-world projects, mentorship, and portfolio building.

#ProjectCharter #Upskilling #AgileMethodologies #StructuredLearning #Mentorship #PortfolioBuilding

Project Management Tools: Project Charter, Gantt Chart

This project demonstrates the application of project management principles through the creation of a Project Charter and a Gantt chart for course OPIM5270.

#ProjectManagement #ProjectCharter #GanttChart #ProjectPlanning #CourseProject #MSProject

Stakeholder Analysis for Highway Bypass

Using Microsoft Excel, the analysis identifies all relevant stakeholders, assesses their interests, influence, and impact on the project, and develops strategies for effective communication and engagement.

#StakeholderAnalysis #ProjectManagement #MicrosoftExcel #StakeholderEngagement

Failure Modes and Effects Analysis (FMEA) for Painting Room

Application of FMEA methodology to identify potential failure modes in the painting process and recommend mitigation actions.

#FMEA #RiskManagement #ProcessImprovement #FailureModeIdentification #MitigationActions #QualityControl

Crystal Ball Simulation for Risk Analysis and Decision Making

Use of Crystal Ball for Monte Carlo simulations to perform risk analysis and support decision-making.

#CrystalBall #RiskAnalysis #MonteCarloSimulation #DecisionMaking #PredictiveModeling #DataAnalysis

SimpliDocs: UI/UX App Design Using Figma

Comprehensive UI/UX design project, built in Figma, for SimpliDocs, a user-friendly mobile app that streamlines access to Indian government processes.

#UIUXDesign #Figma #MobileAppDesign #UserExperience #UserInterface #GovernmentProcesses #Accessibility

Arena Modeling Simulation

Detailed modeling and simulation of manufacturing processes using Arena software to optimize lathing operations and enhance production efficiency.

#ArenaSimulation #ManufacturingProcess #Modeling #ProductionEfficiency #ProcessOptimization

Achievements

President, Modlee AI/ML Student Club, University of Connecticut

  • Founded & led the Modlee AI/ML Student Club (MAIC) at University of Connecticut, creating a collaborative platform for students of all backgrounds to explore AI & machine learning.
  • Organized hackathons, competitive events & projects, providing members with firsthand experience & industry insights through events featuring AI/ML professionals.
  • Featured twice in Student Highlights by the University for leadership & dedication to fostering a supportive, innovative learning environment in AI/ML.

Twice Awarded Merit Scholarship, Business Analytics and Project Management, University of Connecticut

  • Awarded a $3,600 merit scholarship for academic excellence in the MS in Business Analytics & Project Management program, received twice (Fall 2024 & Spring 2025).
  • Recognized by the UConn School of Business for sustained high performance across consecutive semesters
  • Selected among top-performing graduate students based on academic merit and program contribution

Hackathons

  • Won three hackathons; placed 1st in two and 2nd in one of the hackathons organized by Modlee.ai.

Humana-Mays Healthcare Analytics Competition 2024 | National Finalist (Top 10%)

  • Tech Stack: Python, XGBoost, CUDA, Scikit-Learn, NumPy, Pandas, Seaborn, Matplotlib.
  • Achieved 0.7686 AUC by training a CUDA-accelerated XGBoost model on 1.6M+ claims to identify patients at risk of missing preventive care, enabling targeted interventions at scale.
  • Projected 37% increase in visit rates by uncovering top predictors such as claims history and chronic conditions, and recommending outreach strategies like telehealth, mobile clinics, and direct communication.

Travelers Modeling Competition | Ranked Top 3

  • Developed and implemented ML models including XGBoost, Lasso Regression, and Gradient Boosting to forecast claim volumes for call center operations, enhancing resource planning.
  • Performed data cleaning, exploratory data analysis, and data visualization using Python.

AI Day LinkedIn Post | University Student Highlight 1 | University Student Highlight 2 | Capstone LinkedIn Post | IBM Data Analyst | IBM Data Science

Get In Touch

Thank you for visiting my portfolio! Whether you have a question, a project opportunity, or just want to say hello, I'd love to hear from you. I have expertise in data analysis, Python, SQL, and Tableau, and I'm always open to new connections and collaborations. Please reach out using the form below or connect with me on LinkedIn. Let's create something amazing together!