DOCUMENTATION FOR THE YALE RADIOSONDE HOMOGENIZATION DATASET

Steven Sherwood
05/2007
Steven.Sherwood@yale.edu

----------------

We have produced this dataset following the iterative universal Kriging
procedure described in Sherwood (1999) and Sherwood (2007). The key to
this approach is to estimate the climate signals, missing data, and
instrument change effects synergistically, i.e., iteratively, exploiting
the spatial and temporal coherence of natural variability.

The resulting dataset includes temperature and wind shear at mandatory
reporting levels from 527 radiosonde stations for 1959-2005. The
stations are divided into two groups: 460 A stations having substantial
data at two times of day, and 67 B stations not having these (B stations
are included only in the Tropics and southern hemisphere). Trends in A
stations are more reliable.

The procedure appears to have been successful in eliminating systematic
biases in most regions, although the deep tropics appear to retain
cooling biases over time that we still cannot identify; these may be due
to changes that are too numerous to detect, or not step-like. A penalty
paid for the elimination of systematic biases is that individual
stations now have more variable trends than in other homogenizations, so
we recommend these data mainly for quantities involving multiple
stations.

The details involved in creating this dataset, and preliminary trend
results, are currently in submission to J. Climate; further details may
be obtained from the PI.

The data are available in the files listed below.

----------------

* all_latlon.dat

Contains coordinates of all stations. The station list begins with the A
stations (northern hemisphere extratropics, tropics, southern hemisphere
extratropics), followed by the B stations (tropics, southern hemisphere
extratropics). There are no northern hemisphere extratropical B
stations. This information is also contained in the NetCDF files.
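As an illustration of the ordering described above, the following Python
sketch maps a position in the station list to its group. It assumes a
0-based index into the list as ordered in all_latlon.dat and uses only
the counts stated in this document (460 A stations followed by 67 B
stations). The function name station_group is hypothetical and not part
of the dataset. The boundaries between regions within each group are not
given here, so the sketch does not attempt to recover them.

```python
# Counts taken from this documentation: A stations come first, then B.
N_A_STATIONS = 460
N_B_STATIONS = 67


def station_group(index):
    """Return "A" or "B" for a 0-based position in the station list.

    Hypothetical helper: assumes the list is ordered as described for
    all_latlon.dat (all A stations, then all B stations).
    """
    total = N_A_STATIONS + N_B_STATIONS
    if not 0 <= index < total:
        raise IndexError(f"station index {index} outside 0..{total - 1}")
    return "A" if index < N_A_STATIONS else "B"


first_group = station_group(0)
last_a_group = station_group(459)
first_b_group = station_group(460)
last_group = station_group(526)
```

Since trends in A stations are described as more reliable, a lookup of
this kind may be useful when restricting an analysis to group A.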
----------------

* T_monthly.nc, X_monthly.nc

The monthly and diurnal mean homogenized temperature (T) and wind shear
(X) data. The temperature file (T_monthly.nc) contains:

  MONTH_DATES     date (YYYYMM)
  LEVELS          level pressures (hPa)
  STATIONS        station id
  LON             station longitudes
  LAT             station latitudes
  MONTHLY_MEANS   temperature data (C)
  SAMPLE_UNCERT   1-sigma sampling uncertainty of monthly means (C)
  STRUCT_UNCERT   1-sigma structural uncertainty of monthly means (C)

The shear file contains the same, except there are two copies of each
data variable with _X and _Y appended for zonal and meridional
components respectively. Units for shear are m/s per log(p).

The structural uncertainty should be assumed highly correlated from
month to month at a given station, while sampling uncertainty should be
uncorrelated in time. Both types should be uncorrelated between
stations. Thus, the former type will dominate the uncertainty of trends,
the latter will dominate sufficiently short time scale changes, and both
are important for differences between stations for a given month.

Structural uncertainty is estimated as half the full range among the set
of estimates comprising two change-point detection schemes and two (for
group B) or four (for group A) stages of the analysis. This should not
be regarded as an absolute limit but perhaps close to a 68% confidence
interval (in our subjective judgement), making it roughly equivalent to
the one-sigma sampling uncertainty.

Shears are calculated by differencing the mandatory levels (including
700 and 400 hPa) above and below the target level, except at 850 hPa
(where the 700-850 difference, rather than the 700-1000 difference, is
used) and from 100 hPa up (where e.g. the 100-70 difference is used at
100 hPa, not 150-70; the 70-50 difference is used at 70 hPa; and so on).

Sampling uncertainty is calculated only when at least 10 observations
are available that month; otherwise it is set to a missing (large)
value.
In most such cases a monthly mean value is still available, as these are
based on imputed twice-daily values. Depending on the purpose, however,
the user should consider carefully whether to use such monthly
estimates, which are essentially interpolations with little independent
information content. A few months are truly missing (indicated by a
large positive missing value).

----------------

* T_CP.nc, X_CP.nc

The change-point times and level shifts found by each of two schemes
(2PH for two-phase regression, L96 for nonparametric) for each of the
two station groups (A and B). These shifts were applied in generating
the monthly data above.

  LEVELS               level pressures (hPa)
  G_STATIONS           station id for group G (A or B)
  G_LON                station longitudes for group G
  G_LAT                station latitudes for group G
  ALLCP_DATES_G_SSS    non-solar change-point dates for group G, scheme SSS
  DAYCP_DATES_G_SSS    solar change-point dates for group G, scheme SSS
  ALLCP_SHIFTS_G_SSS   non-solar level shifts (C) for group G, scheme SSS
  DAYCP_SHIFTS_G_SSS   solar level shifts (C) for group G, scheme SSS

Two types of change points (CPs) are included: "non-solar" CPs, which
affect all times of day equally, and "solar" CPs, which affect only
"daytime" observations (those at whichever of the two nominal observing
times falls between 600 and 1800 hours local non-daylight time).
Non-solar CPs are found for both station groups, while solar CPs are
found only for group A.

Each non-solar CP is assumed to affect all levels, but with different
shift amplitudes for each level and season; at any level/season where
this is not possible due to inadequate data, the shift is set to zero.
Solar CPs are defined separately for each level, but their level shifts
are estimated once for all seasons.

Note that a positive value for the level shift indicates an upward
artifact in temperature. Thus, artifacts would be removed by adding the
specified shift value to all raw data prior to the specified date.
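The sign convention above can be illustrated with a short Python sketch.
It is only a simplified example, not the procedure used to build the
dataset: it assumes a single raw series at one level, dates and
change-point dates given as comparable YYYYMM integers, and one shift
value per change point. The function name apply_level_shifts and the
sample numbers are hypothetical.

```python
def apply_level_shifts(dates, raw, cp_dates, cp_shifts):
    """Add each shift to every raw value dated before its change point.

    dates     : list of YYYYMM integers, one per raw value (assumed format)
    raw       : list of raw temperatures (C) at a single level
    cp_dates  : list of change-point dates, same format as `dates`
    cp_shifts : list of level shifts (C), one per change point
    """
    adjusted = list(raw)  # copy so the caller's data are left untouched
    for cp_date, shift in zip(cp_dates, cp_shifts):
        for i, date in enumerate(dates):
            if date < cp_date:  # only data prior to the change point
                adjusted[i] += shift
    return adjusted


# Made-up series with an upward artifact of 0.5 C starting in 199003.
dates = [199001, 199002, 199003, 199004]
raw = [10.0, 10.0, 10.5, 10.5]
adjusted = apply_level_shifts(dates, raw, cp_dates=[199003], cp_shifts=[0.5])
```

In this made-up case the adjusted series is level across the change
point. In practice the non-solar shifts differ by level and season, and
solar shifts apply only to daytime observations, so the appropriate
shift would need to be selected for each value before a step like this.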