We show how to use Stein variational gradient descent (SVGD) to carry out inference in Gaussian process (GP) models with non-Gaussian likelihoods and large data volumes. Markov chain Monte Carlo (MCMC) is extremely computationally intensive for these situations, but the parametric assumptions required for efficient variational inference (VI) result in incorrect inference when they encounter the multi-modal posterior distributions that are common for such models. SVGD provides a non-parametric alternative to variational inference which is substantially faster than MCMC. We prove that for GP models with Lipschitz gradients the SVGD algorithm monotonically decreases the Kullback-Leibler divergence from the sampling distribution to the true posterior. Our method is demonstrated on benchmark problems in both regression and classification, a multimodal posterior, and an air quality example with 550,134 spatiotemporal observations, showing substantial performance improvements over MCMC and VI.