Bruno Magalhaes

Machine Learning and High Performance Computing

Hi! I'm a research engineer in the fields of Machine Learning (ML) and High Performance Computing (HPC). I work at Microsoft Research Cambridge on Project Silica, where I build large parallel and distributed ML models and pipelines in the cloud.

Prior to this, I completed a PhD in Computational Neuroscience at EPFL, researching large-scale variable-step simulation of brain-inspired spiking neural networks. Before that, I was an HPC research engineer at the Blue Brain Project at EPFL, focused on distributed computing, storage and multicore/GPU algorithms on supercomputers.

On the side, I maintain a publications bookmark where I summarize several papers of interest, and a resources page where I keep track of related books and material available online. My Google Scholar page indexes most of my scientific publications. When time allows, I post about HPC and ML:

2023 Distributed training of a GPT model (part 2): pipeline parallelism, Megatron-LM model parallelism and communication quantization
2023 Distributed training of a GPT model with DeepSpeed's ZeRO, sharding, offloading, and activation checkpointing
2023 Building a GPT model in C++, and benchmarking LibTorch, PyTorch, TorchScript and torch.compile
2023 Building a GPT model in PyTorch from scratch
2020 AI Supercomputing (part 2): Encoder-Decoder, Transformers, BERT, Sharding, and model compression
2020 AI Supercomputing: Levels of Parallelism, Linear Regression, Deep Neural Nets and Convolutional Neural Nets
2020 Generative Adversarial Networks
2019 Variational Autoencoders
2019 Variational Inference: ELBO, Mean-Field Approximation, CAVI and Gaussian Mixture Models
2019 Exponential Family of Distributions
2018 Bayesian Linear Regression, Maximum Likelihood and Maximum-A-Posteriori
2018 Statistics for ML Engineers
2018 Algebra for ML Engineers
2018 Deep Neural Networks, backpropagation, autodiff, dropout, CNNs and embeddings
2017 Unsupervised Learning basics and Principal Component Analysis
2017 Variable Timestep Simulation of the Electrical Activity of Neurons
2017 Closed-form Linear Regression and Matrix Factorization, and loss functions
2016 Numerical Resolution of the Electrical Activity of Detailed Neuron Models
2016 The Leaky Integrate-and-Fire Neuron Model and The Brunel Network
2015 Distributed Orthogonal Slicing for Load Balancing of Large Spatial Datasets
2015 Distributed Matrix Transpose Algorithms
2014 Distributed Sorting Algorithms