Teddy Liu
Computer Science
UC Davis

Computer Science student with research experience in LLM evaluation, data analysis, and ML-based systems. NeurIPS 2025 co-author. Authorized to work in the US on OPT/CPT.

Davis, CA · Available 2026 · ML · NLP · CV
01

About

I work on the layer where research meets product, designing benchmarks, evaluation pipelines, and ML-powered tools that turn raw model capability into something useful and measurable.

Recent work spans a multi-LLM benchmarking platform at UC Davis, a NeurIPS 2025 benchmark on adversarial bias detection, and production AI dashboards built on the Claude, Gmail, and Google Drive APIs. I move between Python, TypeScript, and Swift, and care equally about model behavior and the seams between systems.

02

Background

Originally from Madagascar, I studied International Business in England, where I immersed myself in diverse cultures and gained a deep appreciation for global markets.

Eager to expand my skill set, I am now advancing my expertise in Computer Science in the United States, merging my passion for technology with the strategic perspective I acquired through my business background.

03

Skills

Languages
Python · TypeScript · JavaScript · C++ · Java · SQL · HTML/CSS
Frameworks
React · Node.js · Next.js · LangChain · Flask · FastAPI
ML & Data
scikit-learn · NumPy · Pandas · OpenCV · YOLOv8
Infra
Git · Docker · Google Cloud · PostgreSQL · MongoDB · Prisma
Focus
LLM Evaluation · NLP · Computer Vision · Algorithm Design
APIs
Claude · Gmail · Google Drive · OAuth 2.0
04

Experience

Jun 2025 — Sep 2025

LLM AutoEval Benchmark — SWE Intern

Dept. of Computer Science, UC Davis
  • Built a multi-LLM benchmarking platform supporting 7 NLP metrics and custom datasets, enabling automated evaluation across multiple models.
  • Architected the automated evaluation pipeline that compares LLM outputs against ground-truth data to generate accuracy rankings and performance insights.
  • Implemented BLEU, ROUGE, METEOR, Perplexity, Semantic Similarity, CIDEr, and SWD for comprehensive LLM assessment, significantly reducing manual evaluation.
Apr 2025 — Aug 2025

E-search — Research Assistant (Computer Vision)

Dept. of Chemical Engineering, UC Davis · Python
  • Tuned vibration parameters to maximize object-separation efficiency, using Python-based computer vision analysis of high-speed camera footage.
  • Developed a computer vision system processing high-speed camera footage to extract and analyze real-time waveform data from mechanical vibrations.
Apr 2025

RobustBiasBench — NeurIPS 2025 Co-author

Dept. of Computer Science, UC Davis
  • Co-authored a NeurIPS 2025 benchmark submission introducing RobustBiasBench, which evaluates the robustness of bias detection in LLMs under adversarial textual perturbations.
  • Curated and preprocessed an 18k-sample dataset through systematic collection, cleaning, filtering, and manual labeling.
  • Trained and tested a Support Vector Machine on the final dataset using TF-IDF vectorization.
05

Selected Projects

Semicolon

May 2026
Swift · TypeScript · Next.js · Python · MongoDB

iOS app turning the iPhone into a smart dashcam. A rolling buffer saves the last 60 seconds of dual-camera footage to MongoDB. A real-time perception pipeline streams frames to a FastAPI sidecar running YOLOv8 and fuses the detections with ARKit LiDAR scene depth for sub-100 ms hazard scoring of close passes, doorings, and blocked bike lanes.

Company Pulse

Mar 2026
TypeScript · React · Node.js · PostgreSQL · Claude API · Gmail API

Deployed AI automation dashboard for a client company. Auto-generates daily business briefings from Google Drive and schedules AI-drafted client emails via Gmail. Human-in-the-loop outbox with one-click delivery. OAuth tokens secured with AES-256-GCM on a Prisma/Postgres backend.

FieldScout Copilot

Feb 2026
Python · Swift · On-device LLM

Offline iOS app converting field worker voice notes into structured agronomic observations via on-device LLM inference in under 90 seconds. Local rules engine generates time-bounded treatment recommendations from live weather features, with playbook patching and version-tracked audit trails.

PokeMe

Jan 2026
Python · Swift · Google App Engine

Sports app letting students find pickup buddies and meetups nearby. Full-stack on Google App Engine with a profile-based AI recommendation engine. 20+ active users.

The Moderator

Jun 2025
JavaScript · Python · LangChain

MultiAgent Diplomacy: an AI strategy game in which LLM agents negotiate and compete autonomously via LangChain. Integrated Google Cloud TTS to synthesize natural-sounding speech for the agents' dialogue.

Pyrosphere

Apr 2025
TypeScript · Python · SQL

Real-time wildfire detection alerting California residents to active fire threats via live camera monitoring. ML model trained on 21,000+ images detects smoke and fire across 1,150 traffic cameras. Predictive risk model trained on 14,000 historical incidents.

06

Education

Sep 2024 — Jun 2026

University of California, Davis

B.S. Computer Science · Davis, CA

Coursework: Software Engineering, Artificial Intelligence, Computer Architecture, Algorithm Design & Analysis, Programming Languages, Theory of Computation, Operating Systems. Active in the Google Developer Student Club, ML lab research, and hackathons.

Jan 2022 — Jun 2024

De Anza College

A.S. Computer Science · Cupertino, CA

Coursework: Data Structures & Algorithms, Object-Oriented Analysis & Design, Linear Algebra, Discrete Mathematics. Director's Choice Award at De Anza Hacks 2.5 for an audio-reactive LED & haptic visualization device.

Let's work together

Open to internships and full-time jobs.