The step speed rounding and the CPU delay calculations must run after
the endstop-specific preparation code. Otherwise, a delay in the
home_prepare() code could invalidate those calculations. Specifically,
this could lead to errors on a multi-mcu setup when Z is homed using a
virtual_z_offset and there is a delay in the activate_gcode section.

Signed-off-by: Kevin O'Connor <kevin@koconnor.net>