登录后查看更多精彩内容~
您需要 登录 才可以下载或查看,没有帐号?立即注册
x
本帖最后由 hillside 于 2014-1-22 23:44 编辑
近年,陆面数据同化近年发展迅速,具有良好的发展前景与应用价值,已有不少文献出现。现介绍由专著《Data assimilation, The Ensemble Kalman Filter, 2nd》作者主页提供的陆面数据同化的关键技术——集合卡尔曼滤波matlab计算程序及相关材料。此外,该主页亦提供fortran版本的集合卡尔曼滤波程序。
http://enkf.nersc.no/
[tr][/tr] |
EnKF-The Ensemble Kalman Filter
Geir Evensen Bergen, Norway
2nd edition of data assimilation book available:
Geir Evensen: Data assimilation, The Ensemble Kalman Filter, 2nd ed., Springer, 2009
Springer linkand Amazon link |
| 02.02.2007: Updated routine with mean preserving rotations in the EnKF SQRT schemes for analysis computation (Sakov 2006) is now available. See upgrades for details.
A new development has now lead to significant improvement in the EnKF algorithm and an enhanced understanding of the methodology, see Evensen (2004) with Correction
The EnKF is a sophisticated sequental data assimilation method. It applies an ensemble of model states to represent the error statistics of the model estimate, it applies ensemble integrations to predict the error statistics forward in time, and it uses an analysis scheme which operates directly on the ensemble of model states when observations are assimilated. The EnKF has proven to efficiently handle strongly nonlinear dynamics and large state spaces and is now used in realistic applications with primitive equation models for the ocean and atmosphere. A recent article in the Siam News Oct. 2003 by Dana McKenzie suggests that the killer heat wave that hit Central Europe in the summer 2003 could have been more efficiently forecast if the EnKF had been used by Meteorological Centers. See the article "Ensemble Kalman Filters Bring Weather Models Up to Date" onhttp://www.siam.org/siamnews/10-03/tococt03.htm
This page is established as a reference page for users of the EnKF, and it contains documentation, example codes, and standardized Fortran 90 subroutines which can be used in new implementations of the EnKF. The material on this page will provide new users of the EnKF with a quick start and spinup, and experienced users with optimized code which may increase the performence of their implementations.
The EnKF was originally proposed by Evensen (1994), and has later been further developed and examined in a large number of published papers. A recent review and overview of the EnKF is given by Evensen (2003), which provides detailed information on the formulation, interpretation and implementation of the EnKF, and now serves as a reference document for the basic methodology.
This page also provides some information and examples related to the Ensemble Kalman Smoother.
Users of the EnKF are encouraged to e-mail useful example codes and upgrades, which can be installed on this web site, to Geir Evensen.
|
Index of /Code
[tr][/tr]
Implementation Guide for the Generic EnKF Package.Written by Olwijn Leeuwenburgh (KNMI) and Laurent Bertino (NERSC) during their collaboration within the ENACT projectGeneral recommendationsWe advise to use the EnKF to perform the analysis step only. A separate program is needed to integrate the model ensemble forward (one can write a script or Fortran control program to alternatingly perform a model integration and analysis and take care of the time parameter). The EnKF will need as input files:- The forecast ensemble file produced by the model. This file can be written as a binary file with every member on a separate record, preceded by a character chain that identifies the model and its version. TIP: if one member propagation fails for some reason, then the record header chain will not be recognized and you can avoid processing the file.
- The observations file. We advise you to store together with each data value all informations that relate the observation to the model state :
- An identifier ('SST', 'SLA', 'TEM', 'TB' ...)
- The measurement error variance.
- The coordinates x,y,z of the observation.
- The nearest model grid cell (i,j)
- Some coefficients for interpolation of the model values to the observation location (the 4 bilinear interpolation coefficients for example).
- If the observations are given on larger averaging cells (or "support") than the model mesh the number of model cells covered by the observation should also be given. This seems unfortunate but consider for example Reynolds SST data or El Nino indexes.
- The topography/bathymetry in order to compute only wet points.
The EnKF will produce as output file the analysed ensembleA.uf file containing all restart members for the next model propagation. As for the forecast file, they should preferentially be written in a single binary file on different records.The makefile is written for the IBM and some compiling options may need to be changed. It is advised however to stick to the filenaming and other conventions followed. mksource.sh is used to create a list of source files. mkdepend.pl then diagnoses the dependencies between the routines. Use 'make' to compile and create the EnKF executable. Use of the generic routinesThe following points should be considered when adapting the code to make it suitable for one's own model: - 1) Specify the model dimensions nx,ny and nz in mod_dimensions.F90
- 2) Identify the prognostic variables of the model. If necessary, modify the type declarations, interfaces, subroutines and functions in mod_states.F90 (mod_states2D.F90).
- 3) m_pseudo2D.F90 contain calls to FFT routines for CRAY, IBM, DEC and SGI machines. An experimental version of this routine using a Numerical Recipes routine is also available.
- 4) Reset the random seed to a different integer each time you use the EnKF executable, this will avoid trouble due to "Extremely time-correlated measurement error". m_set_random_seed2.F90 is given in NERSC_implementation/Random/
- 5) An identifier (cident) is used to check for the proper model ensemble files and whether the files contain single or double precision data . This identifier may need to be changed (e.g. m_read_ensemble.F90).
- 6) The expected forecast ensemble file is 'ensembleF.uf'.
- 7) The restart file written by the model should be in the format expected by the EnKF.
- 8) analysis.F90 uses double or single precision BLAS routines dgemm or sgemm.
- 9) analysis.F90 is not m_analysis.F90. This keeps the compiler from complaining about the implicit conversion from type(states) A(:) to the real-valued array A(:,:) which should not be a problem as long as you have only floating points values in the state vector (no integer, no character values ...).
Application-oriented routinesOnly for those who want to do exactly the same as we do! These routines can be found following Code and NERSC_implementation/ People who start data assimilation from scratch can easily adapt them to their own needs.- 1) The routine m_consistency_A.F90 checks the model state ensemble for extreme or invalid values. Appropriate min/max values should be put here. (in HYCOM_IO/)
- 2) Both gridded fields and profile data can be assimilated. Some types of data are already accounted for (see e.g. list in m_Generate_element.F90 in Obs_operator/).
- 3) m_modstate_point.F90 (in Obs_operator/) identifies the model state variables corresponding to different types of data and may thus be model dependent (it is the core of the H matrix).
- 4) m_obs_pert.F90 (in Random/) determines the grid size on which random 2D fields are generated, select the range of 2D perturbations there.
- 5) mkensemble.F90 generates an initial ensemble for HYCOM coupled to an ice model.
- 6) Both local and global analysis use almost the same routines, in our implementation they are selected at run-time when reading the file assimilation.in
Setup of an EnKF assimilation experimentThe modelProgramming issueThe model should be able to restart several times from the same time step, usually it is just a matter of reading the forcing fields at the right place. Each of the member can then run with its own random perturbations independently from the other members. They will only meet at analysis time. Remember to reset the random numbers to a different seed every time you restart the executable. Choosing the model errorThat is where science really starts. Feel free to randomize wherever you think the model has its main error. The ensemble variance or spread will indicate whether you have touched a sensitive parameter or not. The assimilation results (in particular the reduction in ensemble spread for each variable) will then show you where the model builds the multivariate links between model variables an measurements. For example in the TOPAZ experiment, the ECMWF atmospheric forcing fields are perturbed by adding 2D random fields (m_pseudo2D.F90 was used for this purpose) and each ensemble member is running in HYCOM with its own Richardson mixing parameter (but constant in time). The model error can be made time-correlated, then remember to store its value and read it again on restart time. The measurementsThey do not need to be part of the state vector, if the measurement relate only to diagnostic variables (like sea surface heights in HYCOM), then you can store these diagnostic variables for every member in a separate auxiliary file (named model_ssh.uf for example) and read it before analysis to compute the matrix S=HA' and the innovations D=D-S. Measurement errors can be space correlated if these measurements have been interpolated/regridded beforehand. Consider using m_pseudo2D.F90 to simulate the random measurement error fields if this is the case. (a Cholesky decomposition of the covariance matrix can also be used if there are less than 1000 irregulaly spaced measurements).The initial errorDifficult to tell what is the initial state error at a given time! People having successful experience with OI may use their OI covariance matrix to draw random simulations the initial members. In HYCOM, a convenient (and efficient) way of simulating the initial error is to perturb the depths of the interfaces between isopycnic layers. (see program NERSC_implementation/HYCOM_IO/mkensemble.F90 ) Then we integrate the ensemble one/two weeks forward with perturbations in the forcing fields to let it build the adequate multivariate covariances. Another strategy is to record a climatological ensemble during a model spinup and store the restart files during the adequate season so that you will be able to restart from each of them. Then one should make sure to avoid having a bunch of members VERY different from all the others. Which leads us to the next remark ...Statistician's remarksGaussianityIn the EnKF the assimilation is performed by linear estimation based on an invisible Gaussian assumption. This means that all measurements and state variables should be satisfactorily approximated as Gaussian. This is often the case if the variability is rather small compared to the actual value of the variables. In the contrary case you can have a look at the paper Bertino et al (2003) and try a Gaussian transformed model.
To use a log-transformation for example (positive variables with exceptionally large values), set A = log (A), d = log (d) at the beginning, specify measurement errors as a ratio of their actual value, think hard about the observation matrix [can I say log(d) = H log(A)?], and use the same routines as provided here. Back transformed the ensemble at the end.Avoid the ugly little ducksThe enemy of the EnKF is the outlying member because it has a strong impact on the empirical covariances (that are ensemble-based). Imagine an ensemble among which one member (or a bunch of members) is very distant from the others, then it will draw all the correlations to himself, as if all other members were only one. So if you want to avoid 100 members to behave like 2, make sure the ensemble is a homogeneous population, avoid unusual members, small minorities or partitions within the ensemble. (Laurent is talking statistics here, humanly speaking he appreciates the company of unusual people and minorities).About the number of ensemble membersHow few ensemble members should I use?
Well ... Don't believe too much in miracles nor in "saturation effects". If your problem is really interesting, then 10 members won't be smart enough to tell you all about the most tricky multivariate covariances. We advise 100 members to get sound reproduceable results. 100 is the ensemble size that has been used during the DIADEM and TOPAZ real-time experiments. See the paper by Natvik (2002) to see how well 20, 40, 60, 80 and 100 members behave with a coupled physical-ecological model of the North Atlantic. If you can afford more than 100, then go for more! But remember that 1000 members will not help you if the model has serious biases: then all 1000 members will miss the solution and you will feel like beating a dead horse.
This article is translated to Serbo-Croatian language by Jovana Milutinovich from Web Geeks Resources.
气象家园相关帖子:
资料同化的书-Data Assimilation: The Ensemble Kalman Filter, Second Edition
介绍卡尔曼滤波应用不错的教科书(英文)
[源代码] ENKF的fortran程序 (帖子未说明来源,但根据目录看,亦来自英文作者主页)
上传几篇关于集合变换卡尔曼滤波和三维变分混合同化(Hybrid ETKF-3DVAR)的文献
[参考资料] 统计学、经济计量学FORTRAN程序或软件的一次群英会
资料同化的英文书(第一版的)
|