Interpretability of Embodied VLAs and Cross-Modal Circuit Tracing
Developing theory and experimental methods for tracing causal circuits across the multiple model types in a vision-language-action (VLA) pipeline, and investigating how information flows between the vision, language, and action modalities of embodied agents.
Mechanistic Interpretability
VLAs
Circuit Tracing
Cross-Modal
Read more →
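Circuit tracing of the kind described above is often grounded in activation patching: run a model on a "clean" and a "corrupted" input, splice one intermediate activation from the clean run into the corrupted run, and measure how much of the clean output is restored. A minimal single-modality sketch on a toy two-layer network (all weights, inputs, and dimensions here are illustrative, not taken from the project):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: x -> h = relu(W1 x) -> y = W2 h
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))

def forward(x, patch_h=None):
    """Forward pass; optionally overwrite the hidden activation (patching)."""
    h = np.maximum(W1 @ x, 0.0)
    if patch_h is not None:
        h = patch_h  # causal intervention: splice in another run's activation
    return W2 @ h, h

x_clean = np.array([1.0, 0.5, -0.2])
x_corrupt = np.array([-1.0, 0.1, 0.9])

y_clean, h_clean = forward(x_clean)
y_corrupt, _ = forward(x_corrupt)

# Patch the clean hidden state into the corrupted run.
y_patched, _ = forward(x_corrupt, patch_h=h_clean)

# Because the output depends only on h in this toy model, full patching
# restores the clean output exactly; in a real model, the fraction of the
# output restored measures how much causal signal that site carries.
print(np.allclose(y_patched, y_clean))  # → True
```

In a cross-modal pipeline the same intervention is applied at the interfaces between components, e.g. patching vision-encoder outputs into the language backbone's run to localize which visual features drive a given action.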
Interpretability Auditing for RAG LLMs in Highly Regulated Contexts
Using sparse autoencoders and transcoders to perform circuit tracing and reduction, explaining the provenance, chain of thought, and internal reasoning of retrieval-augmented generation (RAG) LLM assistants deployed in financial settings.
RAG
Sparse Autoencoders
Transcoders
Finance AI
Read more →
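As a sketch of the sparse-autoencoder component mentioned above: an SAE learns an overcomplete dictionary of features by reconstructing model activations under an L1 sparsity penalty, so each activation decomposes into a small number of candidate interpretable directions. A minimal NumPy version (the dimensions, initialization, and loss coefficients are illustrative assumptions, not the project's actual setup):

```python
import numpy as np

rng = np.random.default_rng(1)

d_model, d_feat = 8, 32  # overcomplete: more features than activation dimensions
W_enc = rng.normal(scale=0.1, size=(d_feat, d_model))
b_enc = np.zeros(d_feat)
W_dec = rng.normal(scale=0.1, size=(d_model, d_feat))

def sae(x):
    """Encode an activation vector to sparse features, then reconstruct it."""
    f = np.maximum(W_enc @ x + b_enc, 0.0)  # ReLU keeps feature activations non-negative
    x_hat = W_dec @ f                       # reconstruction from the feature dictionary
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that drives most features to zero."""
    f, x_hat = sae(x)
    return np.sum((x - x_hat) ** 2) + l1_coeff * np.sum(np.abs(f))

x = rng.normal(size=d_model)     # stand-in for a captured model activation
f, x_hat = sae(x)
print(f.shape, int((f > 0).sum()))  # feature vector shape and active-feature count
```

After training, the decoder columns of `W_dec` serve as the feature dictionary; a transcoder follows the same scheme but reconstructs the *output* of a model component from its input, which makes the learned features composable into circuits.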