When we include a continuous variable as a covariate in a regression model, it’s important that we include it using the correct (or something approximately correct) functional form. For example, with a continuous outcome Y and continuous covariate X, it may be the case that the expected value of Y is a linear function of X and X^2, rather than a linear function of X alone. For linear regression there are a number of ways of assessing what the appropriate functional form is for a covariate. A simple but often effective approach is to look at a scatter plot of Y against X, to visually assess the shape of the association.

With a binary outcome, which we typically model using logistic regression, things are not quite as easy (at least when trying to use graphical methods). For a start, the scatter plot of Y against X is now entirely uninformative about the shape of the association between Y and X, and hence about how X should be included in the logistic regression model. To illustrate, using R let’s simulate some (X,Y) data where Y follows a logistic regression with X entering linearly in the model. If we now plot Y against X, the resulting plot is entirely uninformative regarding how Y depends on X, due to the binary nature of Y.

One approach to overcome this problem is, rather than plotting individual (Y,X) values, to plot a smoothed line of how the average value of Y changes with X. The simplest type of smoother is a running mean, where at a given value X=x, the line is equal to the mean (possibly weighted somehow) of the Y values for observations whose X values lie close to x. The mean values at each value of X are then joined up to give a smoothed line. The amount of smoothing is controlled by the width of the window used in the averaging, or how quickly (in X) the weighting drops to zero.
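
As a rough illustration of the simulation and the running mean smoother described above, here is a minimal R sketch. The sample size, seed, intercept of -2, slope of 1 and window half-widths are all assumptions chosen for illustration rather than values given in the text, and `running_mean` is a hypothetical helper written for this example, not a function from any package. It uses an unweighted window, which is only one of the possible weighting schemes.

```r
# Illustrative values only: n, the seed and the coefficients are assumptions.
set.seed(1234)
n <- 1000
x <- rnorm(n)
lin_pred <- -2 + x                    # X enters linearly on the log-odds scale
pr <- plogis(lin_pred)                # inverse logit: P(Y = 1 | X)
y <- rbinom(n, size = 1, prob = pr)   # binary outcome

# Hypothetical helper: unweighted running mean.
# At each grid point x0, average the Y values whose X lies within
# half_width of x0; return NA where the window contains no observations.
running_mean <- function(x, y, grid, half_width) {
  sapply(grid, function(x0) {
    in_window <- abs(x - x0) <= half_width
    if (any(in_window)) mean(y[in_window]) else NA_real_
  })
}

grid <- seq(min(x), max(x), length.out = 100)
smooth_wide   <- running_mean(x, y, grid, half_width = 1)     # more smoothing
smooth_narrow <- running_mean(x, y, grid, half_width = 0.25)  # less smoothing

# Raw scatter plot: the points sit on two horizontal bands at Y = 0 and Y = 1.
plot(x, y, xlab = "X", ylab = "Y")
# Overlay the smoothed estimates of the mean of Y at each value of X.
lines(grid, smooth_wide, lwd = 2)
lines(grid, smooth_narrow, lty = 2)
```

Widening the window (a larger `half_width`) gives a smoother but less local line, while narrowing it follows the data more closely at the cost of more noise. In practice a built-in smoother such as `lowess` or `loess` would usually be preferred to a hand-rolled running mean.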