I wrote the following for the occasion of his recent departure party but I thought these thoughts might be of general interest:
When Carl Morris came to our department in 1989, my fellow students and I were so excited. We all took his class. The funny thing is, though, the late 1980s might well have been the worst time to be Carl Morris, from the standpoint of what was being done in statistics at that time, not just at Harvard but in the field in general. Carl has made great contributions to statistical theory and practice, developing ideas which have become particularly important in statistics in the last two decades. In 1989, though, Carl’s research was not in the mainstream of statistics, or even of Bayesian statistics.
When Carl arrived to teach us at Harvard, he was both a throwback and ahead of his time.
Let me explain. Two central aspects of Carl’s research are the choice of probability distribution for hierarchical models, and frequency evaluations in hierarchical settings where both Bayesian calibration (conditional on inferences) and classical bias and variance (conditional on unknown parameter values) are relevant. In Carl’s terms, these are “NEF-QVF” and “empirical Bayes.” My point is: both of these areas were hot at the beginning of Carl’s career and they are hot now, but somewhere in the 1980s they languished.
In the wake of Charles Stein’s work on admissibility in the late 1950s there was an interest, first theoretical but with clear practical motivations, to produce lower-risk estimates, to get the benefits of partial pooling while maintaining good statistical properties conditional on the true parameter values, to produce the Bayesian omelet without cracking the eggs, so to speak. In this work, the functional form of the hierarchical distribution plays an important role, and in a different way than had been considered in statistics up to that point. In classical distribution theory, distributions are typically motivated by convolution properties (for example, the sum of two independent gamma random variables with a common scale parameter is itself gamma), or by stable laws such as the central limit theorem, or by some combination or transformation of existing distributions. But in Carl’s work, the choice of distribution for a hierarchical model can be motivated based on the properties of the resulting partially pooled estimates. In this way, Carl’s ideas are truly non-Bayesian because he is considering the distribution of the parameters in a hierarchical model not as a representation of prior belief about the set of unknowns, and not as a model for a population of parameters, but as a device to obtain good estimates.
So, using a Bayesian structure to get good classical estimates. Or, Carl might say, using classical principles to get better Bayesian estimates. I don’t know that they used the term “robust” in the 1950s and 1960s, but that’s how we could think of it now.
The interesting thing is, if we take Carl’s work seriously (and we should), we now have two principles for choosing a hierarchical model. In the absence of prior information about the functional form of the distribution of group-level parameters, and in the absence of prior information about the values of the hyperparameters that would underlie such a model, we should use some form with good statistical properties. On the other hand, if we do have good prior information, we should of course use it; even R. A. Fisher accepted Bayesian methods in those settings where the prior distribution is known.
But, then, what do we do in those cases in between, the sorts of problems that arose in Carl’s applied work in health policy and other areas? I learned from Carl to use our prior information to structure the model, for example to pick regression coefficients, to decide which groups to pool together, to decide which parameters to model as varying, and then use robust hierarchical modeling to handle the remaining, unexplained variation. This general strategy wasn’t always so clear in the theoretical papers on empirical Bayes, but it came through in Carl’s applied work, as well as that of Art Dempster, Don Rubin, and others, much of which flowered in the late 1970s. Not coincidentally, this was a few years after Carl’s classic articles with Brad Efron that put hierarchical modeling on a firm foundation that connected with the edifice of theoretical statistics, gradually transforming these ideas from a parlor trick into a way of life.
In a famous paper, Efron and Morris wrote of “Stein’s paradox in statistics,” but as a wise man once said, once something is understood, it is no longer a paradox. In un-paradoxing shrinkage estimation, Efron and Morris finished the job that Gauss, Laplace, and Galton had begun.
So far, so good. We’ve hit the 1950s, the 1960s, and the 1970s. But what happened next? Why do I say that, as of 1989, Carl’s work was “out of time”? The simplest answer would be that these ideas were a victim of their own success: once understood, no longer mysterious. But it was more than that. Carl’s specific research contribution was not just hierarchical modeling but the particular intricacies involved in the combination of data distribution and group-level model. His advice was not simply “do Bayes” or even “do empirical Bayes” but rather had to do with a subtle examination of this interaction. And, in the late 1980s and early 1990s, there wasn’t so much interest in this in the field of statistics. On one side, the anti-Bayesians were still riding high in their rejection of all things prior, even in some quarters a rejection of probability modeling itself. On the other side, a growing number of Bayesians, inspired by applied successes in fields as diverse as psychometrics, pharmacology, and political science, were content to just fit models and not worry about their statistical properties.
Similarly with empirical Bayes, a term which in the hands of Efron and Morris represented a careful, even precarious, theoretical structure intended to capture classical statistical criteria in a setting where the classical ideas did not quite apply, a setting that mixed estimation and prediction, but which had devolved to typically just be shorthand for “Bayesian inference, plugging in point estimates for the hyperparameters.” In an era where the purveyors of classical theory didn’t care to wrestle with the complexities of empirical Bayes, and where Bayesians had built the modeling and technical infrastructure needed to fit full Bayesian inference, hyperpriors and all, there was not much of a market for Carl’s hybrid ideas.
This is why I say that, at the time Carl Morris came to Harvard, his work was honored and recognized as pathbreaking, but his actual research agenda was outside the mainstream.
As noted above, though, I think things have changed. The first clue (although it was not at all clear to me at the time) was Trevor Hastie and Rob Tibshirani’s lasso regression, which was developed in the early 1990s and which has of course become increasingly popular in statistics, machine learning, and all sorts of applications. Lasso is important to me partly as the place where Bayesian ideas of shrinkage or partial pooling entered what might be called the Stanford school of statistics. But for the present discussion what is most relevant is the centrality of the functional form. The point of lasso is not just partial pooling, it’s partial pooling with an exponential prior. As I said, I did not notice the connection with Carl’s work and other Stein-inspired work back when lasso was introduced. At that time, much was made of the shrinkage of certain coefficients all the way to zero, which indeed is important (especially in practical problems with large numbers of predictors), but my point here is that the ideas of the late 1950s and early 1960s again become relevant. It’s not enough just to say you’re partial pooling: it matters _how_ this is being done.
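To make the role of the functional form concrete, here is a minimal sketch in Python. It assumes the simplest possible setting, a single observation y ~ N(theta, 1), and compares the posterior mode of theta under a normal prior with the posterior mode under a double-exponential (Laplace) prior, which is the prior that corresponds to the lasso penalty. The function names and the values of `prior_var` and `lam` are hypothetical choices for illustration, not anything from the lasso papers.

```python
import numpy as np

def normal_prior_mode(y, prior_var=1.0):
    """Posterior mode of theta for y ~ N(theta, 1), theta ~ N(0, prior_var).

    Every estimate is shrunk toward zero by the same proportion.
    """
    return y * prior_var / (prior_var + 1.0)

def laplace_prior_mode(y, lam=1.0):
    """Posterior mode of theta for y ~ N(theta, 1), p(theta) proportional to exp(-lam * |theta|).

    This is soft thresholding: estimates move toward zero by lam,
    and anything within lam of zero is set exactly to zero.
    """
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Hypothetical observations, chosen only to show the two behaviors.
y = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])

ridge_like = normal_prior_mode(y)   # proportional shrinkage, never exactly zero unless y is
lasso_like = laplace_prior_mode(y)  # small values go all the way to zero

print(ridge_like)
print(lasso_like)
```

Both rules are partial pooling toward zero, but the normal prior shrinks large and small observations by the same factor, while the double-exponential prior subtracts a fixed amount and zeroes out the small ones. Same idea, different functional form, different estimates.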
In recent years there’s been a flood of research on prior distributions for hierarchical models, for example the work by Nick Polson and others on the horseshoe distribution, and the issues raised by Carl in his classic work are all returning. I can illustrate with a story from my own work. A few years ago some colleagues and I published a paper on penalized marginal maximum likelihood estimation for hierarchical models using, for the group-level variance, a gamma prior with shape parameter 2, which has the pleasant feature of keeping the point estimate off of zero while allowing it to be arbitrarily close to zero if demanded by the data (a pair of properties that is not satisfied by the uniform, lognormal, or inverse-gamma distributions, all of which had been proposed as classes of priors for this model). I was (and am) proud of this result, and I linked it to the increasingly popular idea of weakly informative priors. After talking with Carl, I learned that these ideas were not new to me; indeed, these were closely connected to the questions that Carl has been wrestling with for decades in his research, as they relate both to the technical issue of the combination of prior and data distributions, and the larger concerns about default Bayesian (or Bayesian-like) inferences.
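The boundary-avoiding behavior of the gamma prior with shape 2 can be seen in a small numerical sketch. This is an illustration under stated assumptions, not the procedure from the paper: it assumes a normal-normal model with known standard errors, made-up data, a prior placed on the group-level standard deviation `tau` (whether the scale parameter is expressed as a standard deviation or a variance is a choice made here for simplicity), a hypothetical rate of 0.01, and a plain grid search in place of a real optimizer.

```python
import numpy as np

# Hypothetical group estimates y_j with known standard errors s_j (illustrative only).
y = np.array([1.0, -0.5, 0.3, 0.8, -1.2, 0.1])
s = np.ones_like(y)

def log_marginal(tau):
    """Log marginal likelihood for y_j ~ N(mu, s_j^2 + tau^2), maximized over mu."""
    v = s**2 + tau**2
    mu_hat = np.sum(y / v) / np.sum(1.0 / v)  # precision-weighted mean
    return -0.5 * np.sum(np.log(v)) - 0.5 * np.sum((y - mu_hat) ** 2 / v)

def log_gamma2_prior(tau, rate=0.01):
    """Log density of a gamma(shape=2, rate) prior, up to a constant."""
    return np.log(tau) - rate * tau

taus = np.linspace(0.0, 5.0, 5001)
loglik = np.array([log_marginal(t) for t in taus])

# log(0) = -inf at tau = 0 is intended: the prior density is zero there.
with np.errstate(divide="ignore"):
    logpost = loglik + log_gamma2_prior(taus)

tau_ml = taus[np.argmax(loglik)]    # unpenalized marginal maximum likelihood
tau_pen = taus[np.argmax(logpost)]  # penalized version with the gamma(2) prior

print(tau_ml, tau_pen)
```

With these made-up data the groups vary less than their standard errors would predict, so the unpenalized estimate of `tau` sits on the boundary at zero, which would imply complete pooling. The gamma(2) density is zero at the origin but rises linearly from it, so the penalized estimate is pulled off zero by a modest amount, and with data that pointed more strongly toward zero it could still move arbitrarily close.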
In short: in the late 1980s, it was enough to be Bayesian. Or, perhaps I should say, Bayesian data analysis was in its artisanal period, and we tended to be blissfully ignorant about the dependence of our inferences on subtleties of the functional forms of our models. Or, to put a more positive spin on things: when our inferences didn’t make sense, we changed our models, hence the methods we used (in concert with the prior information implicitly encoded in that innocent-sounding phrase, “make sense”) had better statistical properties than one would think based on theoretical analysis alone. Real-world inferences can be superefficient, as Xiao-Li Meng might say, because they make use of tacit knowledge.
In recent years, however, Bayesian methods (or, more generally, regularization, thus including lasso and other methods that are only partly in the Bayesian fold) have become routine, to the extent that we need to think of them as defaults, which means we need to be concerned about . . . their frequency properties. Hence the re-emergence of truly empirical Bayesian ideas such as weakly informative priors, and the re-emergence of research on the systematic properties of inferences based on different classes of priors or regularization. Again, this all represents a big step beyond the traditional classification of distributions: in the robust or empirical Bayesian perspective, the relevant properties of a prior distribution depend crucially on the data model to which it is linked.
So, over 25 years after taking Carl’s class, I’m continuing to see the centrality of his work to modern statistics: ideas from the early 1960s that were in many ways ahead of their time.
Let me conclude with the observation that Carl seemed to us to be a “man out of time” on the personal level as well. In 1989 he seemed ageless to us both physically and in his personal qualities, and indeed I still view him that way. When he came to Harvard he was not young (I imagine he was about the same age as I am now!) but he had, as the saying goes, the enthusiasm of youth, which indeed continues to stay with him. At the same time, he has always been even-tempered, and I expect that, in his youth, people remarked on his maturity. It has been nearly fifty years since Carl completed his training, and his ideas remain fresh, and I continue to enjoy his warmth, humor, and insights.