Advertisers
Free Chat Rooms   UK Chat Rooms   Chat Community   Chat   
Free Chat Rooms   Punk Rock T-Shirts   Free Chat   Live Chat   Concert Bands T Shirts   Chat Rooms   Fitness News   Band T Shirts   
Free Web Directory | Directory Submission Service | Buy Text Links | Theaters and Showtimes | News Archive |
Suggest a Site | Check Status
Kiva - loans that change lives

Experimental Implementation of an Ensemble Adjustment Filter for an Intermediate ENSO Model

Current Headlines

Experimental Implementation of an Ensemble Adjustment Filter for an Intermediate ENSO Model

Oct 09, 04:01 AM

Current Headlines: By Karspeck, Alicia R Anderson, Jeffrey L

ABSTRACT The assimilation of sea surface temperature (SST) anomalies into a coupled ocean-atmosphere model of the tropical Pacific is investigated using an ensemble adjustment Kalman filter (EAKF). The intermediate coupled model used here is the operational version of the Zebiak-Cane model, called LDEO5. The assimilation is applied as a means of estimating the true state of the system in the presence of incomplete observations of the state.

In the first part of this study assimilation is performed under the "perfect model" assumption, where SST observations are synthetically derived from a trajectory of the model. The focus is on how and why changes in the filter parameters (ensemble size, covariance localization, and covariance inflation) affect the quality of the analysis. It is shown that isotropic covariance localization does not benefit the analysis even when a small number of ensemble members are used. These results suggest that destruction of the "balance" between variables caused by localization is more detrimental than spurious correlation due to small ensemble size.

In the second part of this study the EAKF is used to assimilate an independent dataset of SST observations. The EAKF/Zebiak-Cane assimilation system is able to correctly estimate the phase and intensity of ENSO, as measured by the average SST anomaly in the eastern equatorial Pacific. A comparison of the analysis herein to independent wind stress and thermocline depth datasets suggests that even with the assimilation of only SST observations it is possible to reproduce over 70% of the interannual variability of thermocline depth in the eastern equatorial Pacific and off the coast of the Philippine Islands. The interannual variability of zonal wind stress in the central and western equatorial Pacific is also well correlated with independent observations (R > 0.75).

(ProQuest: ... denotes formulae omitted.)

1. Introduction

In the last decade there has been a significant increase in the use and understanding of ensemble data assimilation methods for the ocean and atmosphere. Unlike other advanced data assimilation techniques, such as four-dimensional variational methods, ensemble methods do not require the use of an adjoint to the dynamical model. This alone can be an advantage, as the construction of a model adjoint represents a significant investment of time and resources. In contrast, ensemble methods are more easily implemented, requiring only the dynamical model. In the oceanic and atmospheric sciences, the ensemble Kalman filter (EnKF) was largely developed to increase the computational feasibility of the full Kalman filter state estimation problem. Not only does it reduce the numerical cost of estimating the error covariance matrix, in nonlinear systems it allows the mean and covariance to be estimated without the imposition of second-order closure schemes on the error propagation (Evensen 1994).

This study focuses on the experimental implementation of the ensemble adjustment Kalman filter (EAKF; Anderson 2001, 2003) in an intermediate dynamical model of El Nino-Southern Oscillation (ENSO). While coupled data assimilation for ENSO analysis and forecast initialization is still relatively new, it has been the focus of a handful of studies. Most of this work has been done using intermediate complexity models (Kleeman et al. 1995; Lee et al. 2000; Canizares et al. 2001; Sun et al. 2002; Morss and Battisti 2004; Karspeck et al. 2006). More recently, coupled assimilation has been done in the more complex general circulation models (GCMs; Keenlyside et al. 2005; Zhang et al. 2005). Of these studies, only Zhang et al. (2005) have used ensemble methods.

In theory, an ensemble of analyses obtained using the EAKF can be used to initialize probabilistic ENSO forecasts. Because of inadequacies in intermediate models, however, initializations with analyses that are consistent with observations often lead to poor forecasts. This creates an ambiguity regarding the role of data assimilation within biased ENSO models.

The goal of this work is to demonstrate the utility of the EAKF for multivariate ENSO state estimation and to understand how and why changes in the filter parameters (ensemble size, covariance localization, and covariance inflation) affect the quality of the analysis. These issues have heretofore not been explored in the ENSO literature. While the ensembles generated by the EAKF are probably not useful for initializing forecasts with the existing intermediate models (due to the issue mentioned above), the lagged covariances that can be calculated from forecast ensembles can be used to shed light on the model behavior. This is briefly discussed in the final section of this paper.

The ENSO model used here is a variant of the Zebiak and Cane model (Zebiak and Cane 1987, hereafter ZC). It is an intermediate complexity coupled oceanatmosphere model that was developed for the simulation of ENSO variability in the tropical Pacific. We use the most recent version of the model, LDEO5, which differs from the original ZC model by a state-dependent bias correction to the model sea surface temperature (SST) equations. LDEO5 has been shown to have significant skill in retrospective long-lead forecasts (Chen et al. 2004). Versions of the ZC model have been used for experimental and operational seasonal-to-interannual ENSO forecasting for over 15 yr.

In theory, the best analysis would make use of all available data. However, here we investigate only the assimilation of SST information. We do this because the bias correction scheme in LDEO5 only corrects the SST field. Outside the perfect-model paradigm the mismatch between model and observations will be more severe in the other fields.

Section 2 describes the EAKF as it is implemented in this study, and the ENSO model is described in section 3. In section 4, a series of "perfect model" experiments will be used to explore the filter behavior in the context of the ENSO model. In perfect-model experiments, the "truth" is a trajectory of the model. "Observations" are generated by sampling from the model trajectory in a way that is consistent with the observational network. In section 5 we leave behind the perfect-model paradigm and use the EAKF to assimilate the SST data product of Kaplan et al. (2003, hereafter Kaplan SST). A comparison of our analysis to independent wind stress and thermocline depth data products is presented to assess the utility of this method for reconstructions of model variables that are not assimilated. A discussion, conclusions, and thoughts on future directions are presented in the final section.

2. The ensemble adjustment Kalman filter

Although the use of Monte Carlo (ensemble) methods for error prediction has been discussed in early works such as Epstein (1969), Evensen (1994) appears to be the first to apply these ideas in the context of Kaiman filtering for geophysical data assimilation. In particular. Evensen (1994) proposed that the background ("prior") error covariance in the Kalman filter formulation could be estimated from an ensemble of model forecasts made from the analysis distribution at a previous time. This was presented as an alternative to the extended Kalman filter, in which the analysis ("posterior") error covariance was integrated forward with a linear approximation to the nonlinear forecast model, essentially neglecting all higher-order statistical moments that might be generated by the dynamics. The EnKF as described in Evensen (1994) is followed in the literature by an array of variant EnKFs (van Leeuwen and Evensen 1996; Houtekamer and Mitchell 2001; Bishop et al. 2001; Anderson 2001). For brevity, we will not discuss the advantages and drawbacks of each. The reader is referred to Anderson (2003) for a summary of different ensemble filter methods. The filtering software we employ in this study is part of the Data Assimilation Research Test Bed (DART) developed and maintained at the National Center for Atmospheric Research. Here we will simply describe the EAKF as it was introduced in Anderson (2001, hereafter A01) and as it is implemented in DART.

In the current context, the EnKF assimilation algorithm is used as a means to estimate the true state of the system in the presence of incomplete observations of the state, observational error, and uncertainty in the prior state estimate. While the full Kalman filter also admits model error into the formalism, the EAKF method we use here does not explicitly account for this error source. We will show later that covariance inflation can be used as a poor- man's substitute for a proper treatment of model error. One of the advantages of the Kalman filter over variational data assimilation methods is that the prior error covariances need not be prescribed. Instead, they are calculated as part of the algorithm and are dependent on the model dynamics. It is worth noting that in practice it is often necessary in the EnKF to make some assumptions about the form of the prior error covariances. As we will discuss later, this is due to the necessity of approximating the error covariances using finite-sized ensembles. Nevertheless, depending on the application, the flow dependence of the error covariance is usually preserved to a greater or lesser extent. We chose to test the EAKF method for our system because it is computationally tractable, admits nonlinear behavior in the forward model, has a flow-dependent prior, and can be naturally extended into an ensemble prediction system. For consistency with A01, we will adopt a Bayesian perspective for the presentation of the filtering technique (Jazwinski 1970). The method partitions the assimilation of an observation into two parts: 1) the calculation of an observation increment using the ensemble adjustment filter, and 2) the computation of a corresponding increment to each ensemble member in the system state space. After all the observations at a given time have been assimilated, each ensemble member is propagated forward using the coupled model. At the time of the next available observations, the assimilation procedure is repeated.

In contrast to the traditional EnKF, which relies on the formation of a random sample of the observational distribution (e.g., Houtekamer and Mitchell 1998; also called "perturbed observations"), the ensemble adjustment filter is deterministic. The increments to each of the ensemble members can be computed deterministically once the observations and their error variance are specified.

Consider first that there is a linear mapping between vectors of state variables (x) and variables in observation space (y). In the presence of an observation. Bayes' rule gives that the best estimate of the updated variable in observation space is given by

... (1)

The left-hand side is the updated probability that y is the truth given the new observation y^sup o^sub t^ at time t and all previous observations y^sup o^sub t -^. Superscript a denotes the posterior analysis. The numerator on the right-hand side is the product of two terms. The first term is the likelihood of observing y^sup o^. This distribution is modified by the second term in the numerator, called the prior distribution (denoted by a superscript b). The prior distribution is the probability of the background state y^sup b^ given all the information leading up to (but not including) the acquisition of the new observation. The normalization in the denominator ensures that the probabilities in the posterior distribution sum to unity.

Houtekamer and Mitchell (2001) show that if the observational errors are spatially uncorrelated, then the filter can be applied to each observed variable sequentially, treating the assimilation of each new observation as a scalar problem. Making this assumption, the filtering procedure as implemented in the DART system performs the assimilation as a series of scalar computations. Hence, for both simplicity and consistency with the DART procedure, we will use a scalar representation in the following discussion of how (1) is used to update the state.

In addition to the assumption of uncorrelated observational errors, if we also assume that both the prior and observation distributions can be represented by Gaussians, then (1) predicts that the mean and variance of the posterior are given by

... (2)

... (3)

where sigma^sup 2^sub y^sup b^, sigma^sup 2^sub y^sup a^, and sigma^sup 2^sub y^sup o^ are the variances of the prior, posterior, and observation and overbars denote ensemble averages.

It is worth noting that the prior need not actually be normally distributed. While the first two moments of the posterior distribution are given by the above Gaussian assumption, the higher moments of the prior are partially preserved. In models with a significant degree of nonlinearity, this is one potential benefit of the adjustment filter.

Because only the first two moments of the ensemble distribution are adjusted, the deviations of the N updated ensemble members from their mean are proportional to the deviations in the prior ensemble, as in

... (4)

The subscript i denotes a single ensemble member. The proportionality factor, alpha, is chosen such that (3) is satisfied in the posterior, leading to

... (5)

Note that a is always less than one, consistent with the Kalman filter attribute that the assimilation of an observation always reduces the uncertainty. Knowing a and y^sup a^ from (2) and (3), the observation-space increment (deltay^sub i^ = y^sup a^sub i^ - y^sup b^sub i^) can readily be calculated from (4).

The transformation from an increment in observation space to an increment in state space is then done with a linear regression using the ensemble members. This regression is performed sequentially for each state variable (x^sub j^) within the state vector (x) given deltay^sub i^. The increment for each state-space variable within each ensemble member is given by

... (6)

where the covariances are calculated over the prior ensemble. The increment to each state variable represents only the additional information due to the assimilation of a single observation. The impact of each observation is computed and added sequentially, using the state ensemble that has been updated with all previous observations.

A01 notes that if the observational errors are uncorrected, the serial application of the above two-step process as a scalar problem is equivalent to operating the Kalman gain matrix on the innovation (y^sup b^ - y^sup o^). While this may not be immediately obvious, it may help the reader to consider that when written in scalar form, the traditional Kalman filter update to the mean (due to a single observation) can be expressed as

... (7)

Rearranging terms, the above equation can be rewritten

... (8)

... (9)

Using (2), it is easy to show that deltay = y^sup a^ - y^sup b^ is equal to the second two terms on the right-hand side of (8). The increment to the mean in state space is related to the increment in observational space through the linear least squares regression, represented in the first term on the right-hand side of (8) and (9). Hence, the two-step procedure (solve for the observation-space increment, translate to state space through linear regression) in the DART system is equivalent to that in the traditional Kalman filter formalism for the analysis mean value.

3. The ZC intermediate complexity ENSO model

The intermediate coupled model of ENSO used in this study is a bias-corrected variant of the Zebiak-Cane model called LDEO5. The ZC is an anomaly model, describing the evolution of deviations from a prescribed mean climatological background. The dynamics of the ocean component of ZC are modeled as a single baroclinic mode of the shallow water equations acting beneath a shallow surface mixed layer (Cane and Patton 1984). The longwave approximation made in the ocean dynamics filters out gravity waves and short Rossby waves. The ocean is forced with the wind field generated by the atmospheric component of the model. The thermodynamic equations that describe the evolution of the SST anomaly are fully nonlinear, including three- dimensional temperature advection by both the specified mean currents and the modeled anomalous currents. The surface heat flux is proportional to the local SST anomaly, thus acting always to damp the SST back to the specified climatology. The oceanic variables of interest here are the SST anomaly in the mixed layer and the Rossby and Kelvin wave component of the upper-layer depth (thermocline depth). These fields are computed at 10-day intervals. The model solves for the Rossby and Kelvin wave components separately, and for simplicity we sometimes present only the equatorial Kelvin component.

The dynamics of the atmospheric component are similar to that of Gill (1980) and are described in Zebiak (1986). These are the steady- state, linear shallow water equations on an equatorial beta plane. The vertical structure is given by a single baroclinic mode. Atmospheric heating anomalies, which drive the circulation, are associated partly with heat flux due to local SST anomalies and partly with moisture convergence parameterized in terms of surface wind convergence. The wind fields generated in the atmospheric model are converted to stress anomalies that force the ocean model. For this study, the only atmospheric variables we examine are the wind anomalies and their associated stresses.

Further details of the coupled model can be found in Cane (1986) and Zebiak and Cane (1987).

4. Perfect-model experiments

To improve our understanding of the EAKF applied to our ENSO model, we begin by making the assumption that the model is without errors. The perfectmodel setup allows us access to the "true state," which can then be used as a benchmark against which to compare our analysis.

The LDEO5 model was spun up for 10 yr, allowing it to reach a climatologically stationary state. Beginning in the model month of January, synthetic observations of SST anomalies were generated from the model solution every month for 20 yr. The observation locations were chosen for consistency with the Kaplan SST dataset (27.5[degrees]-27.5[degrees]S, 122.5[degrees]E-92.5[degrees]W, in 5[degrees] increments in both longitude and latitude; Kaplan et al. 1998). For comparison, the model SST grid is 101[degrees]E- 70[degrees]W in 5.625[degrees] increments, and 30[degrees]S- 30[degrees]N in 2[degrees] increments. The model SST anomalies were linearly interpolated from the model grid to the observation location and then perturbed by a normally distributed observational error with zero mean and prescribed variance. In all experiments in this study, the error variance for the synthetic SST anomalies was 0.1[degrees]C (Kaplan et al. 1998). The model trajectory from which the synthetic observations were taken is our metric of the truth. Figure 1 shows the model domain with observation locations marked as stars.

Model states for initializing each of the ensemble members in the filter were generated by sampling a spunup model trajectory (unrelated to the truth trajectory) every other January to accumulate a total of 100 ensemble initial states. This is only one of many possible methods for initializing a filter ensemble. For example, an ensemble could be generated by randomly perturbing the reference state mentioned above. However, this can lead to ensemble members that do not differ in a meaningful way from one another. We choose this sampling of Januarys to ensure that the ensemble members were initially on the model attractor, independent of the truth trajectory used to generate the observations, and representative of the full model phase space (i.e., contain model states in neutral, warm, and cool phases of ENSO). The Nino-3 index, defined as the average SST anomaly in the eastern equatorial Pacific (5[degrees]N/ S, 150[degrees]90[degrees]W), is used as a metric to track the phase and intensity of the ENSO cycle.

a. Ensemble size and filter divergence

The choice of an ensemble size can be critical to the success of an ensemble filter. The size of the ensemble has two main functions in the EAKF algorithm. Ideally the ensemble size should be large enough to accurately represent the mean and spread of the prior distribution and also be large enough to allow for the accurate representation of the covariances between the observation and the prior state. When the filter is being used in complex models (such as an atmospheric or oceanic general circulation model), increases in ensemble size can come at considerable computational cost. In models with many degrees of freedom, the enormous number of ensembles that seem to be needed can be prohibitive. Practically speaking, a relatively small number of ensemble members must sometimes be made to suffice.

As we would expect, increasing the ensemble size decreases the root-mean-square (RMS) error between the Nino-3 time series of the prior mean and the truth. Ensemble sizes ranging from 5 to 100 members were tested (Fig. 2). Beyond 40 ensemble members there is little improvement in the analysis. When the ensemble size is less than approximately 20, the assimilation has a tendency to "diverge" from the truth. This occurs when the prior distribution has a variance that is very low relative to the observational error and the filter subsequently gives relatively little weight to the observation. Over a number of assimilation cycles, the ensemble may shrink, ignoring the observations and deviating further from the truth. This can be seen by inspection of (2) and (3). When filter divergence occurs, the truth is no longer reasonably consistent with the ensemble distribution and the spread of the ensemble is not an indication of the error between the ensemble mean and the truth.

Filter divergence is actually a symptom of a more general problem: A finite ensemble size will always lead to an underestimation of the true variance of the prior (Furrer and Bengtsson 2007). From a practical perspective, however, one can consider dealing with the underestimated variance through artificial inflation, localization (explored in the following section), or an increase in the number of ensemble members.

Anderson and Anderson (1999) and Whitaker and Hamill (2002) proposed state-space covariance inflation as a fix for the problem of filter divergence. This is done by scaling each prior ensemble member's deviation from the mean by a factor greater than one. Unlike localization, state-space inflation leaves the subspace spanned by the ensemble unchanged. While we do this for simplicity, it is worth noting that there is no a priori reason to believe that this is the appropriate form for the information lost by the use of small ensembles (Hamill and Whitaker 2005). Hamill and Whitaker (2005) also point out that if there are data-sparse regions in the model domain, there may not be observational information to damp the inflation of the prior. This can potentially lead to unchecked growth in the variance of the ensemble. Because of the density of our SST observation network and the significant impact that SST observations have on other model fields (see section 4b), this is not an issue in our assimilation system. Through trial and error, we found that scaling the prior ensemble standard deviation by an additional 3% is sufficient to eliminate most instances of filter divergence even when the ensemble size is relatively low. Figure 3 shows the RMS error in the analyzed Nino-3 index with and without inflation when 15 ensemble members are used. Of course, since the tendency toward filter divergence is a function of the ensemble size, a smaller amount of inflation is sufficient when the ensemble is larger.

As a side note, assimilations performed over longer simulation periods (50 yr) suggest that the ENSO system (as simulated by this model) may move in and out of decadal-scale time periods that are more or less prone to filter divergence. The addition of more ensemble members or a higher value of covariance inflation tends to alleviate this problem, suggesting that the effective dimension of the system may be changing on these long time scales. However, we do not delve further into this issue. Instead, to simplify the study, we focus on the 20-yr time period that does not tend to diverge.

b. Localization

Houtekamer and Mitchell (2001) suggest that another problem that can result from using a small number of ensembles is spurious covariances between assimilated observations and spatially distant state variables. One solution to this problem is the imposition of a covariance localization. A localization function essentially limits the influence of information from a new observation on geographically distant state variables. Here we use a fifth-order, piecewise continuous, compactly supported function introduced by Gaspari and Cohn (1999). A single parameter controls the geographical radius of influence of a given observation. The localization parameter is specified by a half-width. An observation will not have any influence on a state variable that is twice this distance. This would be expressed in (6) as a multiplicative factor on the righthand side.

To illustrate the effect of this isotropic localization, we assimilate a single SST observation in the central equatorial Pacific (0[degrees], 157.5[degrees]W) at the beginning of the 20-yr period. The panels of Fig. 4 show the resulting analysis increment in the SST field after a single assimilation cycle. The localization half-width is indicated to the right of each panel.

In the most restrictive case (top panel) the observation only influences state variables in the immediate vicinity. On the other extreme (where localization is removed), covariances between state variables and the single observation develop over the entire domain. Since the purpose of localization is to limit the impact of sampling error, if we believe that an observation should not have a correlation with a distant state variable, then localization will ensure that spurious correlations due to small ensemble size are set to zero. However the coherency of the correlation pattern in the bottom panel of Fig. 4 suggests that in this system, basin-scale correlations may not be artifacts of small ensemble size.

Here we explore the expectation that a decrease in the number of ensemble members will introduce spurious remote correlations. Since the flow-dependent, background error covariance will control how an observation increment will affect the entire state, this will shed light on the role that localization might play in alleviating sampling error due to small ensemble size. The ensemble filter is run in the perfect-model setup using 60 ensemble members. In a postprocessing step, we can use the prior ensemble at a point in time to diagnose the impact that a single observation increment will have on the state. This regression is analogous to how the filter computes state increments from observation increments [see (6)]. For simplicity, we have chosen an observation increment that lies directly on a model grid point.

Figure 5 shows the impact of an SST observation on the SST state. The black star in each panel indicates the location of the SST observation (11[degrees]S, 152[degrees]W). In the top panel the impact is calculated with 60 ensemble members, and in the lower panel only 15 of the ensemble members (randomly chosen) were used. The shaded contours in both panels indicate regions of the SST state that are correlated with the observation at greater than 0.6. (This is statistically significant at greater than 98% confidence.) This snapshot corresponds to a sample from the assimilation sequence that falls in the model month of August, prior to the onset of a weak El Nino event.

Assuming that the upper panel captures the correct state-space impact pattern over the entire model domain, we see that using only 15 ensemble members (bottom panel. Fig. 5) will indeed degrade the update. This is especially true in the eastern Pacific. The degradation is less pronounced near the observation, as we would expect. However, what is notable about both the patterns is that there is significant nonlocal impact. In fact, the locus of influence is on the equator, even though the observation is 11[degrees] off the equator. If localization was to be effective, we should be seeing evidence that the primary effect of the observation is to impact local state variables. However, in this instance the localization will cut off the remote relationships that are the primary value of the off-equatorial observations. This does not mean that localization will always lead to poor performance in this model, but it does suggest that a simple, isotropic localization should be used with caution.

That the impact of an off-equatorial SST observation is most pronounced in the central equatorial Pacific is expected, as this is a region of high climatological variance in the model SST that is also correlated with the observation. We would also expect that an SST observation would impact other state variables. In Fig. 6 we show the impact that the same SST observation would have on the oceanic Kelvin wave and on the zonal and meridional wind fields. As in Fig. 5, the shaded regions of each field correlate at greater than 0.6 with the SST observation. The spatial pattern of the correlations in the second panel is meridionally uniform due to the dynamics of the Kelvin wave. In this model, the magnitude of the Kelvin wave is calculated at the equator and projected onto a fixed meridional structure. As a result, all points at a given longitude will correlate equally with other state variables. The physical interpretation of these plots is that a positive, incremental change in the SST observation space would simultaneously enhance the downwelling Kelvin wave and increase equatorial wind convergence. An observational increment of SST with negative sign would have the opposite influence on the state. If a tight geographical localization of SST was imposed, these broad patterns in wind convergence and Kelvin wave amplitude would not be preserved. In fact, in the cross-variate cases shown in the bottom three panels, the regions with the most significant correlation are not collocated with the SST observation.

We present the aforementioned snapshots to illuminate the discussion of localization. Of course, in a full assimilation sequence the impact patterns would be flow dependent, changing with the background and the location of the observations.

While the use of multivariate impacts for the ENSO system is a natural strength of ensemble filters, it has not heretofore been illustrated in the literature. This is likely due to the legacy of univariate 3D variational systems (Zhang et al. 2005; Morss and Battisti 2004).

We now turn our attention to the overall performance of the assimilation in the presence of a varying localization parameter. As before, the simulated SST observations are assimilated on the Kaplan SST network. We look at the effect of localization when using 15 ensemble members. An ensemble size of 15 is used because we know that it is on the low end of an adequate number of ensembles. As such, it falls in the regime in which localization should be of the greatest utility. The RMS errors between the true state and the analysis with an increasing localization half-width (decreasing localization) is shown in Fig. 7. To put in perspective the range of localization widths tested here, ocean data assimilation systems (both 3D variational methods and EnKFs) typically define correlation structures near the equator with half-widths of [asymptotically =]10[degrees]-15[degrees] (Derber and Rosati 1989; Morss and Battisti 2004; Zhang et al. 2005).

As mentioned earlier, a great deal of precedent in the literature holds that localization will help alleviate problems of filter divergence and spurious long-distance correlations, both of which can be categorized as "small sample errors" due to an insufficient number of ensembles. Localization can also (intentionally or not) effectively increase the rank of the prior covariance from the number of ensembles to the number of observations. When there is a wide network of observations to compensate for the loss of remote influence, this increase in the rank is typically considered a benefit.

However, even with the wide and dense SST observation network used in this work, isotropic localization of any width does not appear to help the quality of the analysis. Tight localizations tend to have a negative impact on the analysis, leading to increased error in the ensemble mean and greater frequency of filter divergence.

At least in part, we can trace this behavior to the fact that localization generates "wiggly" increments to the dynamical ocean variables, adding structure to the posterior that is smaller in scale than the coupled dynamics would naturally produce. An illustration of this is shown in Fig. 8. (This snapshot is taken from the same model month as Figs. 5 and 6.) The posterior Kelvin wave amplitude at the equator calculated with localization (dashed line) and without localization (dotted line) is shown. Fifteen ensemble members are used in both cases. The posterior from the 60- member ensemble is shown as a solid line. We assume that it is closer to the correct analysis. Note that the bold and dotted lines are nearly coincident. The smaller-scale variability in the eastern Pacific is due to the localization procedure.

There are two simplifications in the ocean component of the ZC model that might make these spurious localization wiggles detrimental. The long-wave approximation made in the ocean model does not admit gravity waves. Hence, small-scale spatial variability that might otherwise propagate away from the source as a gravity wave is, instead, falsely projected onto the planetary Rossby and Kelvin modes. Additionally, since the linear damping coefficient in this model is not scale dependant these structures do not rapidly dissipate. The small-scales structures persist along with the planetary waves and propagate around the basin. Fourier analysis of the posterior Kelvin wave with and without localization confirms that tight localization shifts power from the interannual band into the higher frequencies (not shown).

c. Ensemble verification through the rank histogram

The Kalman filter assimilation not only provides an estimate of the true state of the system (the ensemble mean), but also an estimate of the uncertainty (the spread of the ensemble). If we are operating in the perfect-model paradigm and the model is nearly linear, we expect that the spread of the ensemble will provide some indication of the analysis error. While there is not a one-to-one relationship between the two (even in theory), the error and spread should be of similar magnitudes. If the ensemble is normally distributed, the spread should exceed the error in the majority of instances.1

This is generally true in our experiments. Figure 9 shows the analysis spread and RMS error at the equator over the 20-yr assimilation period for the zonal wind stress field. Forty ensemble members have been used with a 1% inflation. The broad spatial and temporal patterns and the magnitudes are in good agreement (similar results, not shown, are found for the SST and thermocline depth). In particular, the spread of the ensemble correctly predicts the zonal position of the greatest error (in a time-mean sense) and the relatively low errors in years 6 and 14.

A more rigorous verification of the ensemble can be made using the rank histogram (Anderson 1996). If the probabilities inferred from the ensemble distribution are to be considered reliable, the true state of the system should be indistinguishable from a random draw of the ensemble. Pooling the true value of a variable with the values from the N ensemble members and sorting them from lowest to highest, the truth should have equal probability of being ranked from 1 to N + 1. Over a sufficiently large number of instances (in state space and/or time), a histogram of the rankings should be approximately uniform. Hence the uniformity of the rank histogram provides a means of quantifying the extent to which the truth is consistent with the ensemble distribution.

A chi^sup 2^ statistical hypothesis test can be used to determine if the rank histogram differs significantly from a uniform distribution. As noted in Hamill (2001), this hypothesis test is only valid if the samples used to populate the rank histogram are uncorrelated. Because of the limited domain and basinwide spatial correlations in this model, it is difficult to find regions of interest in the model domain that are completely uncorrelated. For this hypothesis test we settle on three points along the equator [in the western (146[degrees]E), central (169[degrees]W), and eastern (124[degrees]W) Pacific], which are only weakly correlated. We construct rank histograms for the SST, zonal wind stress, and thermocline depth anomalies using these three points (Fig. 10).

To reduce the temporal autocorrelation, the variables are sampled in time every three months for 20 yr of analysis. Wilks (2004) presents additive corrections to the tabulated chi^sup 2^ critical values for autocorrelated variables. Given that the ensemble size is 40 and that the 3-month lagged autocorrelation is at least 0.2 for all variables, the test for uniformity states that the calculated chi^sup 2^ values (noted in the upper right corner in of each panel in Fig. 10) should not exceed a critical value of 63.69 + 1.4 = 65.09. (The existence of an autocorrelation essentially reduces the degrees of freedom in the chi^sup 2^ test, raising the critical value.) Since our calculated chi^sup 2^ values are all less than this critical value, we are 99% confident that the rank histograms cannot be distinguished from uniform.

It is worth noting that combining distant state-space variables and temporal regimes into one histogram can mask pertinent information about the reliability of the ensemble in specific regions. In a more detailed verification study, one might choose to construct and analyze rank histograms at targeted instances in space and time. However, in this assessment we are more interested in the overall performance of the filter across the equatorial basin. To be specific, we would like to know if the ensembles are over- or underdispersed. Underdispersed ensembles tend to have rank histograms that show a higher frequency of counts in the lower and upper ranks, creating a U shape. Conversely, too much variance in the ensemble will result in a mound-shaped histogram, with higher frequency of counts concentrated in the middle ranks. Neither of these tendencies is evident in the rank histograms for the SST, zonal wind stress, or thermocline depth anomalies. While this (and the statistical test for uniformity) does not guarantee reliability at all points in time and space, it does suggest that the filter is performing properly in an overall sense.

5. Assimilation of Kaplan SST analysis

Progressing from the assimilation of synthetic observations to the assimilation of an independently generated data product presents new challenges to the filter, Most notably, the issues of model error and model bias become important. From a practical point of view, the goal of the perfect-model experiments is to gain insights into the system that will aid in the assimilation of real observations. At this point, it is useful to summarize some of the main lessons learned from the perfect-model experiments in the previous section. We expect that at a minimum we would need to respect these findings, even though additional issues will likely present themselves. 1) The perfect-model experiments illustrate that imposing an isotropic localization function is not a useful method for cutting off spurious regressions. This is because the system has significant nonlocal correlations within the SST field and between the SST field and other state variables. These can often be captured even when the ensemble size is relatively low compared to the model state space. Additionally, localization tends to add small-scale structures to the ocean model that are not consistent with the modes of variability. 2) When the ensemble size is low, the assimilation system is susceptible to filter divergence. Covariance inflation was a useful fix for this problem. Dealing with filter divergence will be of particular importance for the assimilation of real data.

We assimilate Kaplan SST anomalies located on the same 5[degrees] network used for the synthetic observations. Observations are assimilated every month from January 1970 to September 2005. This time period includes seven El Nino events. Based on the results of the perfect-model experiments we use a 40-member ensemble with no localization. As mentioned before, we know in advance that filter divergence will likely be an issue, but for a reason that is additional to that discussed in the perfect-model context. One interpretation of covariance inflation is that it is the admittance of model error into the EAKF (Hamill and Whitaker 2005). In theory, then, if the nature of the model error were known, inflating the covariance could be done in a way that is consistent with the traditional Kalman filter equations. In practice, however, we know little about how the errors in the model will impact the covariance structure and so make the simple assumption that they are proportional to the background errors. Trial and error led us to settle on a state inflation of 18%.

We divide the tropical Pacific basin into nine regions, three in the western part of the basin (120[degrees]-160[degrees]E), three in the central Pacific (160[degrees]-150[degrees]W), and three in the east (150[degrees]-90[degrees]W). Within each of the zonal regions, the meridional boundaries are 15[degrees]-5[degrees]S, 5[degrees]S- 5[degrees]N, and 5[degrees]-15[degrees]N. For each of these regions. Table 1 shows the temporal correlation between the area-averaged value of the Kaplan SST and the area-averaged value of the analysis. The values in bold parentheses are calculated from 7-month low-pass- filtered time series to highlight the interannual variability.

The eastern and central equatorial Pacific regions tend to have the greatest correlation between our reconstructions and the benchmark analysis. For the western and off-equatorial Pacific, we find that the variability is in relatively good agreement (hence the high correlation) but we also note that this score is slightly misleading. Figure 11 shows the time series of the Kaplan SST in the eastern, central, and western Pacific (gray line with dots) with the analysis mean superimposed in a bold solid line. While there is very good agreement in the Nino-3 region and in the central equatorial Pacific, in the western region there is a visible offset of [asymptotically =]0.2[degrees]C in the long-term mean and a lower variance in our analysis. This general result holds for the off- equatorial regions of the east and west Pacific as well.

We would also like to assess the extent to which the assimilation of only SST, as we have done here, can create meaningful analysis of the other state variables. Here we present a comparison of two additional variables (zonal wind stress anomaly and thermocline depth anomaly) from the analysis with independent observational products. Our reconstructed wind stress is compared with the Florida State University (FSU) wind stress product (spanning from January 1970 to December 2002; Stricherz et al. 1997), and the thermocline depth is compared with the depth of the 20[degrees]C isotherm from the ocean analysis system at the National Centers for Environmental Prediction (NCEP; January 1980 to September 2005; Ji et al. 1995; Behringer et al. 1998).

As in Table 1, Tables 2 and 3 show the correlations for each of the nine regions in the deep tropical Pacific. Time series from the equatorial region for the wind stress and thermocline depth are in Figs. 12 and 13. In comparison to the FSU wind stress our analysis has significantly less high-frequency variability. The analysis is capturing some of the interannual variability from the FSU product in the central and western equatorial Pacific, but there does not appear to be any significant correlation in the eastern equatorial Pacific and outside the Tropics. Our reconstructed thermocline depth anomaly in the equatorial region tends to perform better in the sense that it is in greater agreement with the NCEP product. Particularly in the east and in the west (on and off the equator), the interannual time scales are in very good agreement, albeit the variance of our analyzed product is a bit low.

While one could imagine that further inflation might enhance agreement in the SST field in the western equatorial Pacific (and off-equatorial regions), we find that increasing the inflation has little impact in this region and tends to reduce the agreement in the other state variables.

6. Discussion and concluding remarks

The purpose of this study was to explore the utility of an EAKF data assimilation system used with an intermediate ENSO model. Although it is possible to simultaneously assimilate multiple data fields using the EAKF, we focused exclusively on the assimilation of SST. We began by looking at the performance of the filter under the perfect-model assumption, where synthetic observations were used. This was a useful paradigm to investigate how ensemble size, covariance inflation, and covariance localization impact the analysis in this model. We also investigated the assimilation of Kaplan SST through the years 1970 through 2005, no longer operating under the perfect-model paradigm.

In our perfect-model experiments, where the SST is densely observed and an adequate number of ensemble members are retained, it does not appear that the assimilation of any additional observational types is necessary for creating a high quality analysis. The correlations between the truth and the analysis mean in SST and wind stress are greater than 0.98 over the entire basin. The correlations for thermocline depth exceed 0.95.

In the perfect-model experiments we find that isotropic covariance localization does not benefit the analysis even when a small number of ensemble members are used. One of the reasons for this is that the ENSO model has relatively few effective degrees of freedom. Even though there are [asymptotically =]20 000 state-space variables, using more than 40 ensemble members does not yield improvement in the filter performance. We showed in Fig. 5 that even when only 15 ensemble members are used, there are still significant nonlocal correlations in the prior.

We also showed that in this system SST observations should be allowed to impact other state variable fields. Figure 6 demonstrates that localizing around an SST observation would cut off the areas of significant remote influence on other state variables. We also find that even when the observing system is wide and dense, isotropic localization introduces small-scale structures into the ocean model that are inconsistent with the model dynamics. The lack of gravity waves and scaledependant damping in the ocean model prevents these structures from dissipating, thereby degrading the analysis. Our results suggest that the destruction of the "balance" between the variables through localization is more detrimental than spurious correlations due to small ensemble size. The problem of balance destruction is exacerbated when real observations are assimilated. In our experience with this model, we find that a tight localization not only degrades the analyses, it can also cause the model solution to "blow up." This is because the unbalanced increments are much greater when the model and observations are biased relative to one another.

Since this model is relatively inexpensive to run, the issue of whether or not to localize is not critical. We can afford to use many ensemble members, which will decrease spurious remote correlations. But in more complex models of the tropical Pacific, small ensemble sizes may be necessary due to computing constraints. If the effective degrees of freedom are small, then localization may not be necessary. However, we would expect models with more sophisticated ocean and atmospheres to have many more degrees of freedom than this intermediate model. In those instances, localization may be beneficial. However, a simple isotropic localization would likely continue to be problematic, especially when observations are sparse. The hierarchical group filter (Anderson 2007), which has a localization function that is both model and flow dependent, may be one solution for this issue.

In advancing from the assimilation of simulated observations to the assimilation of an independent dataset of SST, it was necessary to account for bias between the model and the dataset. When using 40 ensemble members, an increase in the state-space inflation to 18% was necessary to keep the Nino-3 index of the analysis in agreement with the Nino-3 index of the Kaplan SST. While the observations contain uncertainties, we trust that the average SST anomaly in the Nino-3 region is a reasonable estimate of the phase and intensity of ENSO. In nine regions in the near-equatorial Pacific, we correlated time series of the average SST anomaly from our analysis with the average Kaplan SST anomaly. While the variability was in reasonable agreement throughout the basin from 15[degrees]S to 15[degrees]N, it was in the best agreement in the eastern equatorial region. We find a very clear signal of bias in the western equatorial and off- equatorial Pacific. While the variability agrees well (hence the high correlation scores), the analysis tends to be cooler than the Kaplan SST by =0.2[degrees]C over the entire time period. When we look at comparisons of thermocline depth from our analysis with the NCEP thermocline depth product from 1980 to 2005, we find that we can make reasonable reconstructions of interannual variability in the eastern and western equatorial Pacific and just north of the equator in the west. That the analysis does not capture the high- frequency variability in the FSU wind stress is not a great surprise. The Gill-type atmosphere used in this model was envisioned as a model of the large-scale wind field in tropical regions of strong SST gradients. Off the equator, the wind reconstructions are dismal, again reflecting the simplicity of the atmospheric model.

Since the Kaplan SST product extends back to 1856, our results indicate that 19th century reconstructions of the interannual variability of wind stress and thermocline depth (in certain geographic locations) may be possible.

Through this comparison it is readily apparent that our model has significant shortcomings in certain regions. However, it is clearly successful in other regions, especially near the equator. These insights could be used for model improvement (e.g., adjusting the mean model SST bias in the western warm pool) or they could be used as guidelines for determining which regional variables might be most useful for assimilation. The assimilation will likely be more successful if we only assimilate in regions where we know the model biases are not significant. Interestingly, this is exactly what is done in the nudging technique of Chen et al. (1995). Knowing the model atmosphere does not have good fidelity off the equator, they do not assimilate observations in those areas. From this perspective, our results suggest that the assimilation of thermocline depth in the west (from the equator poleward to 15[degrees]N/S) and in the east near the equator and southward to 15[degrees]S would be most beneficial.

It is also worth mentioning that the lagged covariance information that is a byproduct of ensemble data analysis methods can also be exploited for uncovering dynamical relationships between the model variables. Hakim and Torn (2008) discuss these techniques in an investigation of atmospheric synoptic relationships. Ensembles generated using the EnKF are appropriate for these applications because they are designed to sample the probability distribution of the analysis. The physical insights that can be gained using ensemble-generated covariances are, of course, only as valid as the model dynamics. However, outside the "perfect model" framework one can imagine using these methods to compare the dynamics in multiple models, or even to make model bias corrections to the extent that linear perturbation dynamics are not violated.

Acknowlegments. This research was supported by the NOAA Postdoctoral Program in Climate and Global Change, administered by the University Corporation for Atmospheric Research. Helpful discussions with Peter Gent, Joe Tribbia, Marcus Jochum, Alexey Kaplan, and Dake Chen are gratefully acknowledged, as are comments from two anonymous reviewers.

1 To understand why this should be so, consider that if the ensemble distribution were perfect, then the true state would be indistinguishable from a random draw of the ensemble. In the case of a normal distribution, approximately 68% of the time the random draw would be within one standard deviation of the ensemble mean. Hence the error (defined as the absolute difference between the truth and the ensemble mean) would be greater than the spread only 32% of the time. Houtekamer (1993) uses similar reasoning to investigate the expected correlation between ensemble spread and forecast skill.

REFERENCES

Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. J. Climate, 9, 1518-1530.

_____ . 2001: An ensemble adjustment Kaiman filter for data assimilation. Mon. Wea. Rev., 129, 2884-2903.

_____ , 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131, 634-642.

_____ . 2007: Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Physica D, 230,99-111.

_____ , and S. L. Anderson, 1999: A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev., 127, 2741-2758.

Behringer, D. W., M. Ji. and A. Leetmaa. 1998: An improved coupled model for ENSO prediction and implications for ocean initialization. Part I: The ocean data assimilation system. Mon. Wea. Rev., 126, 1013-1021.

Bishop, C. H., B. J. Etherton. and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kaiman Filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420-436.

Cane, M. A., and R. J. Patton. 1984: A numerical model for lowfrequency equatorial dynamics. J. Phvs. Oceanogr., 14, 1853- 1863.

_____ , S. E. Zebiak, and S. C. Dolan. 1986: Experimental forecast of El Nino. Nature, 321, 827-832.

Canizares. R., A. Kaplan, M. A. Cane, D. Chen, and S. E. Zebiak, 2001: Use of data assimilation via linear low-order models for the initialization of El Nino-Southern Oscillation predictions. J. Geophys. Res., 106, 30 947-30 959.

Chen, D.. S. E. Zebiak, A. J. Busalacchi, and M. A. Cane, 1995: An improved procedure for El Nino forecasting: Implications for predictability. Science, 269, 1699-1702.

_____ . M. A. Cane. A. Kaplan. S. E. Zebiak, and D. Huang, 2004: Predictability of El Nino over the past 148 years. Nature, 428, 733- 736.

Derber, J., and A. Rosati. 1989: A global oceanic data assimilation system. J. Phys. Oceanogr., 19, 1333-1347.

Epstein. E.. 1969: Stochastic dynamic prediction. Tellus. 21, 739- 759.

Evensen, G.. 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143-10 162.

Furrer, R., and T. Bengtsson. 2007: Estimation of high- dimensional prior and posterior convariance matrices in Kaiman filter variants. J. Multivar. Anal., 98, 227-255.

Gaspari. G., and S. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc, 125, 723-757.

Gill, A. E.. 1980: Some simple solutions for heat induced tropical circulation. Quart. J. Roy. Meteor. Soc, 106, 447-462.

Hakim, G., and R. Torn, 2008: Ensemble synoptic analysis. Sanders Symposium Monograph, Meteor. Monogr., No. 55. Amer. Meteor. Soc., in press.

Hamill, T. M.. 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550-560.

_____ . and J. S. Whitaker. 2005: Accounting for the error due to unresolved scales in ensemble data assimilation: A comparison of different approaches. Mon. Wea. Rev., 133, 3132-3147.

Houtekamer, P. L., 1993: Global and local skill forecasts. Mon. Wea. Rev., 121, 1834-1846.

_____ , and H. L. Mitchell. 1998: Data assimilation using an ensemble Kaiman filter technique. Mon. Wea. Rev., 126, 796-811.

_____ , and _____ . 2001: A sequential ensemble Kaiman filter for atmospheric data assimilation. Mon. Wea. Rev., 129, 123-137.

Jazwinski. A. H.. 1970: Stochastic Processes and Filtering Theory. Academic Press. 376 pp.

Ji, M., A. Leetmaa, and J. Derber. 1995: An ocean analysis system for seasonal to interannual climate studies. Mon. Wea. Rev., 123,460- 481.

Kaplan, A., M. A. Cane, Y. Kushnir. A. C. Clement. M. B. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea surface temperature 1856-1991. J. Geophys. Res., 103, 18 567-18 589.

_____ , Y. Kushnir, and M. Cane, 2003: Reduced space approach to the optimal analysis interpolation of historical marine observations: Accomplishments, difficulties, and prospects. Ad- vances in the Applications of Marine Climatology: The Dynamic Part of the WMO Guide to the Applications of Marine Climatology, World Meteorological Organization. WMO Tech. Doc. 1081, 99-216.

Karspeck, A. R., A. Kaplan, and M. A. Cane, 2006: Predictability loss in an intermediate ENSO model due to initial error and atmospheric noise. J. Climate, 19, 3572-3588.

Keenlyside, N., M. Latif. M. Botzet, J. Jungclaus, and U. Schulzweida, 2005: A coupled method for initializing El Nino Southern Oscillation forecasts using sea surface temperature. Telltis, 57A, 340-356.

Kleeman, R., A. M. Moore, and N. R. Smith. 1995: Assimilation of subsurface thermal data into a simple ocean model for the initialization of an intermediate tropical coupled oceanatmosphere forecast model. Mon. Weit. Rev., 123, 3103-31 14.

Lee, T, J.-P. Boulanger. A. Foo. L.-L. Fu, and R. Giering, 2000: Data assimilation by an intermediate coupled oceanatmosphere model: Application to the 1997-1998 El Nino. J. Geophys. Res., 105, 26 063- 26 087.

Morss, R. E., and D. S. Battisti, 2004: Evaluating observing requirements for ENSO prediction: Experiments with an intermediate coupled model. J. Climate, 17, 3057-3073.

Stricherz, J. N., D. M. Legler, and J. O'Brien. 1997: TOGA Pseudo- Stress Atlas, i985-1994. II: Tropical Pacific Ocean. Center for Ocean-Atmosphere Prediction Studies Rep. 97-2, 155 pp.

Sun, C, Z. Hao, M. Ghil, and J. D. Neelin, 2002: Data assimilation for a coupled ocean-atmosphere model. Part I: Sequential state estimation. Mon. Wea. Rev., 130, 1073-1099.

van Leeuwen. P. J., and G. Evensen, 1996: Data assimilation and inverse methods in terms of an probabilistic formulation. Mon. Wea. Rev., 124, 2898-2913. Whitaker. J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913-1924.

Wilks, D. S.. 2004: The minimum spanning tree histogram as a verification tool for multidimensional ensemble forecasts. Mon. Wea. Rev., 132, 1329-1340.

Zebiak, S. E., 1986: Atmospheric convergence feedback in a simple model for El Nino. Mon. Wea. Rev., 114, 1263-1271.

_____ , and M. A. Cane, 1987: A model El Nino-Southern Oscillation. Mon. Wea. Rev., 115, 2262-2278.

Zhang. S. M. J. Harrison, A. T. Whittenberg, A. Rosati, J. L. Anderson, and V. Balaji. 2005: Initialization of an ENSO forecast system using a parallelized ensemble filter. Mon. Wea. Rev., 133, 3176-3201.

ALICIA R. KARSPECK AND JEFFREY L. ANDERSON

National Center for Atmospheric Research, Boulder, Colorado

(Manuscript received 6 April 2006, in final form 19 December 2006)

Corresponding author address: Alicia R. Karspeck, National Center for Atmospheric Research, Box 3000, Boulder, CO 80307.

E-mail: aliciak@ucar.edu

Copyright American Meteorological Society Sep 15, 2007

(c) 2007 Journal of Climate. Provided by ProQuest Information and Learning. All rights Reserved.

Experimental Implementation of an Ensemble Adjustment Filter for an Intermediate ENSO Model
Back to Current Headlines
Repair Credit   Gate Operator   Harley Davidson Accessories   Wedding DJ Massachusetts