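Let $X_n$, $Y_n$ be sequences of scalar/vector/matrix random elements.
If $X_n$ converges in distribution to a random element $X$ and $Y_n$ converges in probability to a constant $c$, then
$X_n + Y_n \xrightarrow{d} X + c$;
$X_n Y_n \xrightarrow{d} X c$;
$X_n Y_n^{-1} \xrightarrow{d} X c^{-1}$, provided that $c$ is invertible,
where $\xrightarrow{d}$ denotes convergence in distribution.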
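The requirement that $Y_n$ converges to a constant is important: if it were to converge to a non-degenerate random variable, the theorem would no longer be valid. For example, let $X_n \sim \mathrm{Uniform}(0,1)$ and $Y_n = -X_n$. The sum $X_n + Y_n = 0$ for all values of $n$. Moreover, $Y_n \xrightarrow{d} \mathrm{Uniform}(-1,0)$, but $X_n + Y_n$ does not converge in distribution to $X + Y$, where $X \sim \mathrm{Uniform}(0,1)$, $Y \sim \mathrm{Uniform}(-1,0)$, and $X$ and $Y$ are independent.[4]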
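As a rough numerical check of this counterexample, the following sketch assumes $X_n \sim \mathrm{Uniform}(0,1)$ and $Y_n = -X_n$, and compares the variance of $X_n + Y_n$ with that of $X + Y$ for independent $X$ and $Y$. The function name, sample size, and seed are illustrative choices, not part of the theorem.

```python
import random
import statistics


def simulate_counterexample(num_samples=50_000, seed=0):
    """Draw X_n + Y_n with Y_n = -X_n, and X + Y with X, Y independent."""
    rng = random.Random(seed)
    dependent_sums = []
    independent_sums = []
    for _ in range(num_samples):
        x_n = rng.random()        # X_n ~ Uniform(0, 1) (assumed for illustration)
        y_n = -x_n                # Y_n = -X_n, so Y_n ~ Uniform(-1, 0)
        dependent_sums.append(x_n + y_n)

        x = rng.random()          # X ~ Uniform(0, 1)
        y = -rng.random()         # Y ~ Uniform(-1, 0), drawn independently of X
        independent_sums.append(x + y)
    return dependent_sums, independent_sums


dependent_sums, independent_sums = simulate_counterexample()
var_dependent = statistics.pvariance(dependent_sums)
var_independent = statistics.pvariance(independent_sums)
print(f"Var(X_n + Y_n) = {var_dependent:.4f}")
print(f"Var(X + Y)     = {var_independent:.4f}")
```

The first variance is exactly zero because the sum is degenerate, while the second should be close to 1/6, the variance of a sum of two independent uniform variables (1/12 + 1/12). The two limits therefore cannot coincide.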
The theorem remains valid if we replace all convergences in distribution with convergences in probability.
Proof
This theorem follows from the fact that if $X_n$ converges in distribution to $X$ and $Y_n$ converges in probability to a constant $c$, then the joint vector $(X_n, Y_n)$ converges in distribution to $(X, c)$.
Next we apply the continuous mapping theorem, recognizing that the functions $g(x,y) = x + y$, $g(x,y) = xy$, and $g(x,y) = x y^{-1}$ are continuous (for the last function to be continuous, $y$ has to be invertible).
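The division case can be illustrated with a studentized sample mean. In this sketch, which assumes i.i.d. Exponential(1) data purely for illustration, $X_n = \sqrt{n}(\bar{x} - \mu)/\sigma$ converges in distribution to a standard normal by the central limit theorem and $Y_n = s/\sigma$ converges in probability to 1, so the ratio $X_n / Y_n$ should be approximately standard normal for large $n$. The helper names, sample size, and replication count are hypothetical.

```python
import math
import random


def studentized_mean(rng, n, rate=1.0):
    """Return (X_n, Y_n, X_n / Y_n) for one sample of n Exponential(rate) draws."""
    mu = 1.0 / rate       # true mean of Exponential(rate)
    sigma = 1.0 / rate    # true standard deviation of Exponential(rate)
    sample = [rng.expovariate(rate) for _ in range(n)]
    mean = math.fsum(sample) / n
    # Unbiased sample variance, then the sample standard deviation s.
    s = math.sqrt(math.fsum((v - mean) ** 2 for v in sample) / (n - 1))
    x_n = math.sqrt(n) * (mean - mu) / sigma   # approximately N(0, 1) by the CLT
    y_n = s / sigma                            # close to the constant c = 1
    return x_n, y_n, x_n / y_n


def coverage(n=400, reps=2000, seed=0):
    """Fraction of replications with |X_n / Y_n| <= 1.96."""
    rng = random.Random(seed)
    ratios = [studentized_mean(rng, n)[2] for _ in range(reps)]
    return sum(abs(r) <= 1.96 for r in ratios) / reps


frac = coverage()
print(f"Fraction of |X_n / Y_n| <= 1.96: {frac:.3f}")
```

If the normal approximation holds, the printed fraction should be near 0.95, the standard normal probability of the interval [-1.96, 1.96]. The same construction with $g(x,y) = x + y$ or $g(x,y) = xy$ would exercise the other two cases.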
Slutsky, E. (1925). "Über stochastische Asymptoten und Grenzwerte". Metron (in German). 5 (3): 3–89. JFM 51.0380.03.
Slutsky's theorem is also called Cramér's theorem according to Remark 11.1 (page 249) of Gut, Allan (2005). Probability: A Graduate Course. Springer-Verlag. ISBN 0-387-22833-0.