
Who've told you that distributed training is impossible? Democratizing AI: The Psyche Network Architecture
It seems that not only it is totally possible without incurring in unfeasible excessively narrow train data transfer bottlenecks but that several models have already been trained using this method. It mostly depends on how many GPUs join such kind of network.
See here: https://psyche.network/runs