
Stochastic Methods for Data Science: An in-progress book that provides an introduction to the interplay between stochastic process theory and algorithms in data science, with a focus on (large-scale) stochastic optimization and Markov chain Monte Carlo. It is designed to be accessible to advanced undergraduates, graduate students, and researchers working in machine learning, statistics, and related fields.

VIABEL: A Python package that provides two core features:

  1. Easy-to-use, lightweight, flexible variational inference algorithms that are agnostic to how the model is constructed (just provide a log density and its gradient).

  2. Post hoc diagnostics for the accuracy of continuous approximations to (unnormalized) distributions. A canonical application is to diagnose the accuracy of variational approximations.
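
To illustrate what "just provide a log density and its gradient" can look like in practice, here is a minimal NumPy sketch of black-box variational inference with a mean-field Gaussian approximation. It is not VIABEL's API: the function name `fit_mean_field_gaussian`, its parameters, and the toy Gaussian target are all hypothetical, chosen only to show that the model enters solely through two user-supplied callables.

```python
import numpy as np


def fit_mean_field_gaussian(log_density, grad_log_density, dim,
                            n_iters=2000, n_samples=50, step_size=0.05, seed=0):
    """Fit q = N(mu, diag(sigma^2)) by stochastic gradient ascent on the ELBO.

    Hypothetical helper for illustration only (not part of VIABEL).
    The model is accessed only through log_density and grad_log_density.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    log_sigma = np.zeros(dim)
    for _ in range(n_iters):
        # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
        eps = rng.standard_normal((n_samples, dim))
        sigma = np.exp(log_sigma)
        z = mu + sigma * eps
        grads = np.array([grad_log_density(zi) for zi in z])
        # Gradient of E_q[log p(z)] with respect to mu.
        grad_mu = grads.mean(axis=0)
        # Gradient with respect to log_sigma; the entropy of q adds +1 per coordinate.
        grad_log_sigma = (grads * eps * sigma).mean(axis=0) + 1.0
        mu = mu + step_size * grad_mu
        log_sigma = log_sigma + step_size * grad_log_sigma
    # Monte Carlo estimate of the ELBO at the fitted parameters.
    sigma = np.exp(log_sigma)
    z = mu + sigma * rng.standard_normal((1000, dim))
    entropy = log_sigma.sum() + 0.5 * dim * (1.0 + np.log(2.0 * np.pi))
    elbo = np.mean([log_density(zi) for zi in z]) + entropy
    return mu, sigma, elbo


# Toy target (an assumption for this sketch): an unnormalized diagonal Gaussian.
target_mean = np.array([1.0, -2.0])
target_std = np.array([0.5, 2.0])


def log_density(z):
    return -0.5 * np.sum(((z - target_mean) / target_std) ** 2)


def grad_log_density(z):
    return -(z - target_mean) / target_std ** 2


mu, sigma, elbo = fit_mean_field_gaussian(log_density, grad_log_density, dim=2)
print("mean:", mu, "std:", sigma, "ELBO estimate:", elbo)
```

Because the target here is itself a diagonal Gaussian, the fitted mean and standard deviation should land close to `target_mean` and `target_std`. For a real model the approximation is generally inexact, which is where post hoc accuracy diagnostics come in.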

ShorTeX: A LaTeX package that aims to streamline LaTeX writing, particularly math. It automatically includes and configures common packages, and provides functionality to, among other things, (1) make LaTeX math code shorter and more readable, (2) avoid the verbose commands and boilerplate common in LaTeX, and (3) avoid multi-key presses (curly braces, capital letters, etc.) where reasonable. It is being developed jointly with Trevor Campbell and Jeffrey Negrea.

Preprints & Working Papers

Quantitative Error Bounds for Scaling Limits of Stochastic Iterative Algorithms

arXiv:2501.12212 [stat.ML], 2025.

Preprint PDF

Robust discovery of mutational signatures using power posteriors

bioRxiv 2024.10.23.619958, 2024.


Tuning-free coreset Markov chain Monte Carlo

arXiv:2410.18973 [stat.CO], 2024.

Preprint PDF

Structurally Aware Robust Model Selection for Mixtures

arXiv:2403.00687 [stat.ME], 2024.

Preprint PDF

Tuning Stochastic Gradient Algorithms for Statistical Inference via Large-Sample Asymptotics

arXiv:2207.12395 [stat.CO], 2022.

Preprint PDF

Calibrated Model Criticism Using Split Predictive Checks

arXiv:2203.15897 [stat.ME], 2022.

Preprint PDF


More Publications

A Framework for Improving the Reliability of Black-box Variational Inference

Journal of Machine Learning Research 25(219): 1–71, 2024.


Reproducible Parameter Inference Using Bagged Posteriors

Electronic Journal of Statistics 18(1): 1549–1585, 2024.


Independent finite approximations for Bayesian nonparametric inference

Bayesian Analysis, 2024.


A Targeted Accuracy Diagnostic for Variational Approximations

In Proc. of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS), Valencia, Spain. PMLR: Volume 206, 2023.

Preprint PDF

Reproducible Model Selection Using Bagged Posteriors

Bayesian Analysis 18(1): 79–104, 2023.


The Mutational Signature Comprehensive Analysis Toolkit (musicatk) for the Discovery, Prediction, and Exploration of Mutational Signatures

Cancer Research 81(23), 2021.


Challenges and Opportunities in High-dimensional Variational Inference

In Proc. of the 35th Annual Conference on Neural Information Processing Systems (NeurIPS), 2021.

Preprint PDF

Bidirectional contact tracing could dramatically improve COVID-19 control

Nature Communications 12(232), 2021.

PDF Code

Robust, Accurate Stochastic Optimization for Variational Inference

In Proc. of the 34th Annual Conference on Neural Information Processing Systems (NeurIPS), 2020.

Preprint PDF

Validated Variational Inference via Practical Posterior Error Bounds

In Proc. of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Palermo, Italy. PMLR: Volume 108, 2020.

Preprint PDF Code Video


Scaling Bayesian inference: theoretical foundations and practical methods

Ph.D. thesis, Massachusetts Institute of Technology, 2018.



The feasibility of targeted test-trace-isolate for the control of SARS-CoV-2 variants

F1000Research 10(291), 2021.


Reconstructing probabilistic trees of cellular differentiation from single-cell RNA-seq data

arXiv:1811.11790 [q-bio.QM], 2018.

Preprint PDF

Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach

arXiv:1809.09505 [stat.TH], 2018.

Preprint PDF

Detailed Derivations of Small-variance Asymptotics for some Hierarchical Bayesian Nonparametric Models

arXiv:1501.00052 [stat.ML], 2014.

Preprint PDF

Infinite Structured Hidden Semi-Markov Models

arXiv:1407.0044 [stat.ME], 2014.

Preprint PDF

Recent & Upcoming Talks

More Talks

Reproducible Statistical Inference
Dec 15, 2024
Gaussian Process Surrogates for Bayesian Inverse Problems
Oct 9, 2024
Reproducible Statistical Inference
Mar 13, 2024
Robust, structurally-aware inference for mixture models
May 18, 2023
Trustworthy variational inference
Oct 21, 2022
Algorithmically robust, general-purpose variational inference
Apr 13, 2022

Short Bio

Jonathan Huggins is an Assistant Professor in the Department of Mathematics & Statistics and the Faculty of Computing & Data Sciences at Boston University. He is also a Data Science Faculty Fellow and an affiliated faculty member of the Department of Computer Science, the BU URBAN Program, and the BU Program in Bioinformatics. He is a recipient of the Blackwell–Rosenbluth Award, which recognizes outstanding junior Bayesian researchers based on their overall contribution to the field and to the community. Prior to joining BU, he was a Postdoctoral Research Fellow in the Department of Biostatistics at Harvard. He completed his Ph.D. in Computer Science at the Massachusetts Institute of Technology in 2018. Previously, he received a B.A. in Mathematics from Columbia University and an S.M. in Computer Science from the Massachusetts Institute of Technology. His research centers on the development of fast, trustworthy learning and inference methods that balance the need for computational efficiency and the desire for statistical optimality with the inherent imperfections that come from real-world problems, large datasets, and complex models. His current applied work is focused on developing software tools and computational methods for (1) accelerating and improving large-scale forecasting of ecological systems and (2) enabling more effective scientific discovery from high-throughput and multi-modal genomic data. His research is supported by the National Institutes of Health, the National Science Foundation, and the Department of Defense.
