this post was submitted on 16 Aug 2023
Machine Learning
Without knowing more, I would expect it is a dataloader issue: your CPUs are bottlenecked trying to get enough data to your GPUs.
You can add more workers to your dataloader to parallelize the loading. This can occasionally cause odd parallelization bugs, so if things start acting weird after the change, the extra workers are a likely suspect.
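One way to check this before changing anything else is to time the dataloader on its own, with no model in the loop, at a few different worker counts. Below is a rough sketch, not a definitive benchmark. The helper names `seconds_per_batch` and `compare_worker_counts` are made up for illustration, and the commented usage assumes PyTorch's `DataLoader` with its `num_workers` argument, which the original post does not confirm; `train_dataset` is a placeholder.

```python
import time
from typing import Any, Callable, Dict, Iterable, Sequence


def seconds_per_batch(loader: Iterable[Any], max_batches: int = 50) -> float:
    """Average wall-clock seconds spent waiting for each batch from `loader`."""
    count = 0
    start = time.perf_counter()
    for _ in loader:
        count += 1
        if count >= max_batches:
            break
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("loader produced no batches")
    return elapsed / count


def compare_worker_counts(
    make_loader: Callable[[int], Iterable[Any]],
    worker_counts: Sequence[int] = (0, 2, 4, 8),
    max_batches: int = 50,
) -> Dict[int, float]:
    """Build the loader once per worker count and time each one."""
    return {n: seconds_per_batch(make_loader(n), max_batches) for n in worker_counts}


# Hypothetical usage, assuming PyTorch (not executed here):
#   from torch.utils.data import DataLoader
#   timings = compare_worker_counts(
#       lambda n: DataLoader(train_dataset, batch_size=64, num_workers=n))
#   print(timings)
```

If the time per batch drops as the worker count goes up, data loading is probably the bottleneck. Keep in mind that the measurement includes worker startup, so very small `max_batches` values will make higher worker counts look worse than they are.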
Yup, this. If you would like more help, we need to see the code, or at least a minimal reproducible example.