SimulTrain Solution

[ T_{\text{seq}} = T_{\text{send}} + T_{\text{forward}} + T_{\text{backward}} + T_{\text{recv}} ]
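As a rough numeric illustration, the sketch below plugs the emulated 4G figures from the evaluation setup (50 Mbps, 30 ms RTT) and the per-step payload sizes from the communication table into this formula; the forward/backward times are hypothetical placeholders, not measured values.

```python
# Rough per-step cost of the sequential (non-overlapped) round trip.
# Network figures (emulated 4G: 50 Mbps, 30 ms RTT) and payload sizes
# (7,500 KB frame up, 75 KB weights down) come from the paper's setup;
# the forward/backward times are hypothetical placeholders.

def transfer_time_s(kilobytes, mbps, rtt_s):
    """One-way transfer: serialization delay plus half the RTT."""
    bits = kilobytes * 1000 * 8
    return bits / (mbps * 1e6) + rtt_s / 2

t_send = transfer_time_s(7500, mbps=50, rtt_s=0.030)  # raw frame upload
t_recv = transfer_time_s(75, mbps=50, rtt_s=0.030)    # weights download
t_forward, t_backward = 0.050, 0.050                  # assumed GPU times

t_seq = t_send + t_forward + t_backward + t_recv      # ~1.34 s, upload-bound
print(f"T_seq = {t_seq:.3f} s")
```

Even with generous compute assumptions, the upload term dominates, which motivates sending activations instead of raw frames.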

SimulTrain sends activations, which are lower-dimensional than raw data but higher-dimensional than gradients. However, it enables bidirectional overlap, reducing the total bandwidth-time product by 65% compared to SyncSGD.

| Dataset | Centralized | SyncSGD | FedAvg (5 local steps) | SimulTrain |
|---------|-------------|---------|------------------------|------------|
| UCF-101 | 84.2%       | 83.9%   | 81.1%                  | 83.7%      |
| WISDM   | 91.5%       | 91.3%   | 88.9%                  | 91.1%      |

where ( \alpha ) is a learned or fixed extrapolation coefficient (set to 0.5 in our experiments). This linear correction term approximates the gradient at the cloud's version without recomputing the forward pass. Edge and cloud maintain version counters ( v_e, v_c ). The cloud applies updates immediately; the edge applies received deltas in order but without locking. To prevent divergence, we use a soft reconciliation step every ( R ) iterations.
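The correction and reconciliation described above can be sketched as follows. The exact formulas are not shown in this excerpt, so the linear forecast (gradient plus ( \alpha ) times the parameter-version gap) and the averaging reconciliation are assumed forms; `forecast_gradient`, `reconcile`, and `R = 100` are our hypothetical names and values.

```python
# Sketch of the staleness-corrected update and the soft reconciliation.
# The paper's exact formulas are not shown in this excerpt: the linear
# forecast (gradient plus alpha times the parameter-version gap) and the
# averaging reconciliation below are assumed forms.

ALPHA = 0.5   # extrapolation coefficient, as set in the experiments
R = 100       # reconciliation period in iterations (assumed value)

def forecast_gradient(grad_edge, w_cloud, w_edge, alpha=ALPHA):
    """Approximate the gradient at the cloud's parameter version via a
    linear correction, avoiding a second forward pass."""
    return [g + alpha * (wc - we)
            for g, wc, we in zip(grad_edge, w_cloud, w_edge)]

def reconcile(w_edge, w_cloud):
    """Pull the two unlocked parameter versions back together by
    averaging, preventing divergence between edge and cloud."""
    return [(we + wc) / 2 for we, wc in zip(w_edge, w_cloud)]
```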

SimulTrain reduces latency by 78% on 4G and 71% on 5G compared to SyncSGD. FedAvg hides latency via local steps but suffers from model drift.

| Method      | Upload per step (KB) | Download per step (KB) |
|-------------|----------------------|------------------------|
| Centralized | 7,500 (video frame)  | 75 (weights)           |
| SyncSGD     | 75 (gradients)       | 75 (weights)           |
| SimulTrain  | 30 (activations)     | 75 (delta weights)     |
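Using the per-step sizes from the table, the serialization time on the emulated 4G link works out as below; `comm_time_s` is our helper, and propagation delay is ignored.

```python
# Serialization time per step on the emulated 4G link (50 Mbps), using
# the upload/download sizes from the table above; comm_time_s is our
# helper, and RTT/propagation delay is ignored here.

def comm_time_s(upload_kb, download_kb, mbps=50):
    """Time to push one step's traffic through the link."""
    return (upload_kb + download_kb) * 1000 * 8 / (mbps * 1e6)

traffic_kb = {
    "Centralized": (7500, 75),
    "SyncSGD":     (75, 75),
    "SimulTrain":  (30, 75),
}
for name, (up, down) in traffic_kb.items():
    print(f"{name:12s} {comm_time_s(up, down) * 1000:7.1f} ms")
```

SimulTrain's wall-clock advantage is larger than the raw volumes alone imply, since its transfers additionally overlap with compute in both directions.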

where ( \sigma^2 ) is the gradient noise variance. This matches the rate of synchronous SGD when ( \tau ) is bounded.
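The bound this sentence refers to is not included in this excerpt; for context, convergence results for SGD with staleness bounded by ( \tau ) typically take a form such as the following (our notation, stated as an assumption rather than the paper's exact theorem):

[ \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ \lVert \nabla f(w_t) \rVert^2 \right] = O\!\left( \frac{\sigma}{\sqrt{T}} \right) + O\!\left( \frac{\tau^2}{T} \right) ]

The staleness term decays faster than the ( O(\sigma/\sqrt{T}) ) noise term, which is why bounded ( \tau ) recovers the synchronous-SGD rate.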

The key insight of SimulTrain is that the forward pass of one batch and the backward pass of a previous batch can overlap in time if we carefully manage parameter versions and gradients. This is analogous to CPU pipelining but applied to distributed training across heterogeneous compute nodes.
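The pipelining analogy can be made concrete with a toy makespan comparison; `sequential_makespan`, `pipelined_makespan`, and the 50 ms stage times are illustrative assumptions, not measured values.

```python
# Toy makespan comparison showing why the overlap helps, in the spirit
# of CPU pipelining. Stage times (50 ms forward, 50 ms backward) are
# illustrative assumptions, not measured values.

def sequential_makespan(n_batches, t_fwd, t_bwd):
    """No overlap: every batch pays forward + backward in full."""
    return n_batches * (t_fwd + t_bwd)

def pipelined_makespan(n_batches, t_fwd, t_bwd):
    """Two-stage pipeline: after it fills, one batch completes every
    max(t_fwd, t_bwd) seconds."""
    return t_fwd + t_bwd + (n_batches - 1) * max(t_fwd, t_bwd)

print(sequential_makespan(10, 0.050, 0.050))  # no overlap
print(pipelined_makespan(10, 0.050, 0.050))   # nearly 2x faster here
```

With balanced stages the pipeline approaches a 2x speedup; with unbalanced stages the slower one becomes the bottleneck, as in hardware pipelines.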

Proof sketch: the forecast term cancels the first-order bias from staleness, weight reconciliation prevents error accumulation, and the pipeline yields the same effective gradient steps per unit time.

Hardware: Edge = Raspberry Pi 4 (4 GB RAM), Cloud = AWS g4dn.xlarge (NVIDIA T4). Network: emulated 4G (50 Mbps, 30 ms RTT) and 5G (300 Mbps, 10 ms RTT).
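The first-order cancellation can be unpacked with a one-line Taylor expansion (our notation: ( g ) is the gradient function, ( w_e ) and ( w_c ) the edge and cloud parameter versions; SimulTrain's exact correction is only sketched in this excerpt):

[ g(w_c) = g(w_e) + \nabla g(w_e)\,(w_c - w_e) + O(\lVert w_c - w_e \rVert^2) ]

To the extent that the scaled correction ( \alpha (w_c - w_e) ) approximates the Jacobian-vector term, the forecast removes the first-order dependence on the version gap, leaving a second-order error that the periodic reconciliation keeps from accumulating.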

In the edge-cloud setting, the data resides at the edge while the compute resides in the cloud, so each training step pays a sequential round-trip time ( T_{\text{seq}} ): send the inputs, run the forward and backward passes, and receive the updated weights.