Bruno Magalhaes

Machine Learning and High Performance Computing

Welcome 👋🏽. I am Bruno, a Systems ML researcher at Huawei Research. Previously, I was an ML researcher at Microsoft Research Cambridge on Project Silica. Before that, I was an HPC engineer, PhD student and postdoc at EPFL. In this space, I keep track of publications and resources of interest, and I post about ML and HPC 🚀.

WARNING: scammers are using my name and face (with AI) to impersonate me in job interviews. Please be careful and do your due diligence to make sure you are really talking with *me*.

2024	Distributed GPT model (part 4): context and sequence parallelism with Ulysses and Ring attention
2024	Distributed training of variable-length samples: curriculum learning, compilation, adaptive batch size and LR
2024	Mixture-of-Experts: a publications timeline, with serial and distributed implementations
2023	Distributed GPT model (part 3): model parallelism with Megatron-LM
2023	Distributed GPT model (part 2): pipeline parallelism with DeepSpeed 1F1B
2023	Distributed GPT model: data parallelism, sharding and CPU offloading
2023	Building a GPT model in C++, and benchmarking LibTorch, PyTorch, TorchScript and torch.compile
2023	Building a GPT model in PyTorch from scratch
2020	Learning from sequences: Encoder-Decoder, Transformers and BERT
2019	Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs)
2019	Variational Inference: ELBO, Mean-Field Approximation, CAVI and Gaussian Mixture Models
2019	Exponential Family of Distributions
2018	Bayesian Linear Regression, Maximum Likelihood and Maximum-A-Posteriori
2018	Statistics for ML Engineers
2018	Algebra for ML Engineers
2018	Deep Neural Networks, backpropagation, autodiff, dropout, CNNs and embeddings
2017	Unsupervised Learning basics and Principal Component Analysis
2017	Variable Timestep Simulation of the Electrical Activity of Neurons
2017	Closed-form Linear Regression and Matrix Factorization, and loss functions
2016	Numerical Resolution of the Electrical Activity of Detailed Neuron Models
2016	The Leaky Integrate-and-Fire Neuron Model and the Brunel Network
2015	Distributed Orthogonal Slicing for Load Balancing of Large Spatial Datasets
2015	Distributed Matrix Transpose Algorithms
2014	Distributed Sorting Algorithms

Support this blog! If you like this content and would like to show your appreciation, please donate instead to the children's cancer hospital in Porto via this GoFundMe campaign. Thank you for caring ❤️‍🩹