Bruno Magalhaes

Machine Learning and High Performance Computing

Welcome 👋🏽. I am Bruno, a research engineer for large-scale AI at Synthesia. Previously, I was an ML researcher at Microsoft Research Cambridge, working on Project Silica. Before that, I was an HPC engineer, PhD student and postdoc at EPFL, researching variable-step simulation of spiking neural networks on large supercomputers. In this space, I keep track of publications and resources of interest, and I post about ML and HPC 🚀.
2024 Distributed GPT model (part 4): context and sequence parallelism with Ulysses and Ring attention
2024 Distributed training of variable-length samples: curriculum learning, compilation, adaptive batch size and LR
2024 Mixture-of-Experts: a publications timeline, with serial and distributed implementations
2023 Distributed GPT model (part 3): model parallelism with Megatron-LM
2023 Distributed GPT model (part 2): pipeline parallelism with DeepSpeed 1F1B
2023 Distributed GPT model: data parallelism, sharding and CPU offloading
2023 Building a GPT model in C++, and benchmarking LibTorch, PyTorch, TorchScript and torch.compile
2023 Building a GPT model in PyTorch from scratch
2020 Learning from sequences: Encoder-Decoder, Transformers and BERT
2019 Variational Autoencoders (VAEs) and Generative Adversarial Neural Networks (GANs)
2019 Variational Inference: ELBO, Mean-Field Approximation, CAVI and Gaussian Mixture Models
2019 Exponential Family of Distributions
2018 Bayesian Linear Regression, Maximum Likelihood and Maximum-A-Posteriori
2018 Statistics for ML Engineers
2018 Algebra for ML Engineers
2018 Deep Neural Networks, backpropagation, autodiff, dropout, CNNs and embeddings
2017 Unsupervised Learning basics and Principal Component Analysis
2017 Variable Timestep Simulation of the Electrical Activity of Neurons
2017 Closed-form Linear Regression and Matrix Factorization, and loss functions
2016 Numerical Resolution of the Electrical Activity of Detailed Neuron Models
2016 The Leaky Integrate-and-Fire Neuron Model and The Brunel Network
2015 Distributed Orthogonal Slicing for Load Balancing of Large Spatial Datasets
2015 Distributed Matrix Transpose Algorithms
2014 Distributed Sorting Algorithms