May 10, 2023
Backward ops run in the same dtype that autocast used for the corresponding forward ops, so the backward pass does not need to be wrapped in autocast. Running it under autocast is not recommended. Suppose autocast lowered the precision of some operation to FP16 while the parameter itself is stored in FP32. If the backward pass and the parameter update also run under autocast, the parameter may not be updated correctly, and the resulting errors propagate and degrade the model's learning.
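To make the precision concern concrete, here is a minimal sketch using only Python's standard library, not PyTorch itself. It illustrates just the rounding effect: a small update applied to a value held in FP16 can vanish entirely, while the same update survives in FP32. The names `weight` and `update`, their values, and the two helper functions are hypothetical and chosen purely for illustration; this is not how autocast is implemented.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


weight = 1.0   # hypothetical parameter value
update = 1e-4  # hypothetical step, e.g. learning_rate * gradient

# Apply the update with every intermediate value rounded to FP16.
fp16_result = round_to_fp16(round_to_fp16(weight) + round_to_fp16(update))

# Apply the same update with every intermediate value rounded to FP32.
fp32_result = round_to_fp32(round_to_fp32(weight) + round_to_fp32(update))

# Near 1.0, adjacent FP16 values are 2**-10 (about 0.00098) apart,
# so a step of 1e-4 is rounded away and the weight does not move.
print("FP16 weight changed:", fp16_result != weight)
print("FP32 weight changed:", fp32_result != weight)
```

This is why mixed-precision setups usually keep the parameters and their updates in FP32 and only run selected forward ops in lower precision.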