Smart Batching Tutorial - Speed Up BERT Training

In this blog post / Notebook, I'll demonstrate how to dramatically reduce BERT's training time by creating batches of samples with different sequence lengths, so that each batch only needs to be padded to the length of its own longest sample.

I learned this technique from Michaël Benesty in his excellent blog post here, and used key pieces of his implementation (here) in this Notebook. Michaël's code is designed to make use of the new Trainer class in the transformers library, and makes use of many helper classes from PyTorch and transformers. Unfortunately, the transformers library appears to have broken compatibility with his code, and it no longer runs. I've taken a more manual approach in this Notebook, and I think it turned out well, especially for illustrating the technique.

This blog post is also available as a Colab Notebook here. I've also published a short YouTube walkthrough of this material here.

Smart Batching with batch_encode_plus and DataLoader

GPUs are much more efficient when we can give them multiple training samples to work on in parallel, so we give them a "batch" of samples.
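To make the idea concrete before diving in, here is a minimal sketch of smart batching with batch_encode_plus and DataLoader. The text samples, batch size, and model name are illustrative placeholders, and wrapping pre-built batches in a DataLoader with batch_size=None is one possible way to wire it up, not necessarily the Notebook's exact approach:

```python
from torch.utils.data import DataLoader
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Placeholder dataset -- substitute your own training texts here.
texts = ["a short sample.",
         "a somewhat longer training sample to put in a batch.",
         "tiny"]

# Tokenize once, without padding, so we know each sample's true length.
lengths = [len(tokenizer.encode(t, add_special_tokens=True)) for t in texts]

# Sort the samples by length, so each batch holds similar-length samples.
order = sorted(range(len(texts)), key=lambda i: lengths[i])
sorted_texts = [texts[i] for i in order]

# Slice into batches; each batch will be padded only to ITS longest sample.
batch_size = 2
batches = [sorted_texts[i:i + batch_size]
           for i in range(0, len(sorted_texts), batch_size)]

def encode_batch(batch):
    # batch_encode_plus pads to the longest sequence *in this batch* only.
    return tokenizer.batch_encode_plus(batch, padding='longest',
                                       return_tensors='pt')

# batch_size=None because each element of `batches` is already a full batch;
# shuffle=True randomizes batch order while keeping length grouping intact.
loader = DataLoader(batches, batch_size=None, shuffle=True,
                    collate_fn=encode_batch)

for batch in loader:
    print(batch['input_ids'].shape)  # the width varies from batch to batch
```

Because short batches carry far fewer padding tokens than they would under a single global max length, each training step on them runs faster, which is where the overall speedup comes from.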