Observational Data Dumping at NCEP

Dennis Keyser - NOAA/NWS/NCEP/EMC
(Last Revised 10/29/2008)



Please take a moment to read the Disclaimer for this non-operational web page.
 

The dumping of observational data is the first step in each NCEP network production suite.  At the appropriate network data cutoff time, up to three separate jobs are executed simultaneously - two dump jobs and a tropical cyclone processing job.  Once the two dump jobs have completed, a separate dump post processing job is initiated.


===> Dump Job 1 performs the following steps in sequence:

 

A.  Copying Files For Later Use By Analyses

In the Global Forecast System (GFS) and Global Data Assimilation System (GDAS) network runs, GRIB files containing current analyses of snow depth, ice distribution, and sea-surface temperature from NESDIS are copied from the NCEP IBM Central Computer System (IBM-CCS) /dcom database into network-specific (“/com”) directories.  These fields will be read later by the Global Gridpoint Statistical Interpolation (GSI) analysis.  (Note: If the current files are not available, then one-day old files are copied.)

In the North American Model (NAM) and North American Data Assimilation System (NDAS) network runs, GRIB files containing current analyses of snow depth and snow cover from NESDIS are copied from the NCEP IBM-CCS /dcom database into network-specific /com directories.  These fields will be read later by the Regional Gridpoint Statistical Interpolation (GSI) analysis.  (Note: If the current files are not available, then one-day old files are copied.)

In the Rapid Update Cycle (RUC) network, GRIB files containing snow cover analyses from NESDIS are copied from the NCEP IBM-CCS /dcom database into network-specific /com directories.  These fields will be read later by the RUC 3DVAR-analysis.  (Note: If the current files are not available, then one-day old files are copied.) 
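
The "current file, else one-day old file" rule used in all three cases above can be sketched as follows.  This is an illustration only: the directory layout (one subdirectory per YYYYMMDD date under a root) and the function name copy_with_fallback are assumptions made for the example, not the actual structure of /dcom or /com.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Optional
import shutil


def copy_with_fallback(dcom_root: Path, com_dir: Path, day: date, name: str) -> Optional[Path]:
    """Copy the file for `day` if it exists, otherwise the one-day-old file.

    Returns the source path that was copied, or None if neither exists.
    The <root>/<YYYYMMDD>/<name> layout is hypothetical.
    """
    for candidate_day in (day, day - timedelta(days=1)):
        src = dcom_root / candidate_day.strftime("%Y%m%d") / name
        if src.is_file():
            com_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, com_dir / name)
            return src
    return None
```

Returning the source path lets the caller log whether the current or the one-day old field was used.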
 


B.  Dumping of BUFR Observational Data (excluding WSR-88D Level II radial wind and reflectivity - see Dump Job 2 for the dumping of these data)

The process of accessing the observational database and retrieving a select set of observational data is accomplished in several stages by a number of FORTRAN codes.  This retrieval process is run in all of the operational networks many times a day to assemble “dump” data for model assimilation.   The script that manages the retrieval of observations provides users with a wide range of options.  These include observational date/time windows, specification of geographic regions for filtering (via either a lat/lon box, a center point lat/lon and radius, or a lat/lon grid point mask), data specification and combination, duplicate checking and bulletin “part” merging, and parallel processing.
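
Two of the geographic filters mentioned above (the lat/lon box and the center point plus radius) can be illustrated with a short sketch.  The function names, the argument order, the spherical-Earth radius and the use of the haversine formula are all assumptions made for this example; they are not taken from the operational dump codes.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical Earth is assumed here


def in_box(lat, lon, south, north, west, east):
    """True if (lat, lon) lies in the box; the box may cross the dateline."""
    if not (south <= lat <= north):
        return False
    if east - west >= 360.0:          # box spans all longitudes
        return True
    lon, west, east = lon % 360.0, west % 360.0, east % 360.0
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east  # box wraps through 0 degrees


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_circle(lat, lon, center_lat, center_lon, radius_km):
    """True if (lat, lon) is within radius_km of the center point."""
    return great_circle_km(lat, lon, center_lat, center_lon) <= radius_km
```

For example, in_box(40.0, -100.0, 20.0, 60.0, -130.0, -60.0) is true for a point over the central United States, and in_circle(0.0, 1.0, 0.0, 0.0, 120.0) is true because one degree of longitude at the equator is roughly 111 km.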

The primary retrieval software performs the initial stage of all data dumping by retrieving subsets of the NCEP IBM-CCS BUFR /dcom database that contain all of the database messages valid for the data type, geographical filter and time window requested by a user. (Recall that the /dcom database is continuously updated with new data as the GTS data decoder and satellite ingest jobs run.)  The retrieval software looks only at the date in Section One of the BUFR message to determine which messages to copy for a particular data type.  This results in an observing set that may contain more data than was requested, but allows the software to function very efficiently.

The second stage of the process performs a final 'winnowing' of the data to an observing set with the exact time window requested [1].  This is done within the codes which remove exact- or near-duplicate reports (the nature of which is data type dependent) and merge bulletin parts for upper-air reports.

[1] Normally, the six-hour cycle GFS, GDAS, and Climate Data Assimilation System (CDAS) network runs dump BUFR data globally over a six-hour time window centered on the analysis time.  The six-hour cycle NAM and NDAS network runs normally dump data within the expanded WRF-NMM-model domain over a six-hour time window centered on the analysis time (the NDAS assimilates data and updates every three-hours).  The one-hour cycle upper-air RUC network runs normally dump BUFR data within the expanded WRF-NMM-model domain (a superset of the RUC domain) over a one-hour time window centered on the analysis time.  The one-hour cycle surface RUC network runs normally dump BUFR data within the expanded RUC domain over a one-hour time window centered on the analysis time.
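
The two retrieval stages described above (a coarse, message-level selection followed by an exact winnowing of individual reports) can be sketched as follows.  Everything here is a simplification for illustration: the Report and Message classes are hypothetical stand-ins for BUFR content, each message is assumed to hold reports for the hour beginning at its Section One date, and duplicates are identified only by an exact (station, time) match, whereas the operational duplicate checks are data type dependent and also merge bulletin parts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Report:
    station: str
    time: datetime
    value: float


@dataclass
class Message:
    section1_time: datetime   # date/hour stamped on the message as a whole
    reports: List[Report]


ONE_HOUR = timedelta(hours=1)


def coarse_select(messages, start, end):
    """Stage 1: keep whole messages whose hour overlaps [start, end]."""
    return [m for m in messages
            if m.section1_time <= end and m.section1_time + ONE_HOUR > start]


def winnow(messages, start, end):
    """Stage 2: keep reports inside the exact window, dropping exact duplicates."""
    seen = set()
    kept = []
    for m in messages:
        for r in m.reports:
            key = (r.station, r.time)
            if start <= r.time <= end and key not in seen:
                seen.add(key)
                kept.append(r)
    return kept


def dump(messages, center, half_width):
    """Run both stages for a time window centered on `center`."""
    start, end = center - half_width, center + half_width
    return winnow(coarse_select(messages, start, end), start, end)
```

The first stage never opens a message's reports, which is why it is cheap but can return more data than requested; the exact trimming is deferred to the second stage, which has to read every report anyway.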

The final stage of the process is the application of manual quality marks to the data extracted.  The quality marks are provided by personnel in two groups: the NCEP/NCO Systems Integration Branch (SIB) and the NCEP/Ocean Prediction Center (OPC).  The NCEP/NCO/SIB Senior Duty Meteorologists (SDMs) can apply quality markers to individual variables in many observational data types such as rawinsonde, dropwinsonde, PIBAL, aircraft, satellite wind, surface land, surface marine, wind profiler and Vertical Azimuth Display (VAD) wind reports.  These markers either ensure that the datum marked will be assimilated by the particular analysis regardless of any subsequent quality control on it (called a "keep" flag), or ensure that it will NOT be assimilated (called a "purge" flag).  The SDMs use an interactive program on the IBM-CCS which initiates the off line execution of automated quality control programs run in the subsequent PREPBUFR processing steps and then review the programs’ decisions before making assessment decisions.  The SDMs use satellite pictures, meteorological graphics, continuity of data, input from reporting stations, past station performance and horizontal data comparisons (buddy checks) to decide whether or not to override quality control flags from the automated programs.  All flags are stored in an ASCII file on the IBM-SP for use during this data retrieval process.  The NCEP/NCO/SIB also maintains a list of data that should be rejected based on, among other things, monthly statistics provided from the NCEP and other international centers, and feedback from data producers.  All rejected data receive either a "reject" or "purge" flag here.  The flags are appended to the same ASCII file used for storing the SDM quality marks.  

NCEP/OPC personnel perform real-time interactive quality control of global surface marine meteorological data and sea surface temperature using a graphical interactive program called CREWSS (Collect, Review, and Edit Weather data from the Sea Surface).  CREWSS provides an evaluation of the quality of the marine surface data provided by ships, buoys (drifting and moored), Coastal Marine Automated Network (CMAN) stations, and tide gauge stations by comparing the observations to GFS model first guess fields for all four synoptic periods.  Data that differ from the first guess fields by more than certain amounts are then examined via techniques that involve buddy checks versus neighboring platforms, the platform’s track, and a one week history for each platform.  The NCEP/OPC personnel can either mark these data according to their quality, here applying either a "keep" or "purge" flag, or they can correct obvious errors in the data, such as an incorrect hemisphere or a misplaced decimal (corrected data receive a "good" quality mark in the subsequent PREPBUFR processing steps).  Upon completion of interactive quality control, an ASCII text file containing all quality control decisions and corrections is then uploaded to the IBM-CCS for use during this data retrieval process.
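
The precedence of the manual flags over later automated quality control can be summarized in a few lines.  This sketch is an interpretation of the description above, not the operational logic: the function name is hypothetical, and treating a "reject" flag the same as a "purge" flag is an assumption.

```python
from typing import Optional


def will_be_assimilated(manual_flag: Optional[str], passed_automated_qc: bool) -> bool:
    """Decide whether a datum is assimilated, given an optional manual flag.

    "keep"   -> assimilated regardless of later automated quality control
    "purge"  -> never assimilated
    "reject" -> assumed here to behave like "purge"
    None     -> the automated quality control decision stands
    """
    if manual_flag == "keep":
        return True
    if manual_flag in ("purge", "reject"):
        return False
    return passed_automated_qc
```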

Each data type selected for dumping is associated with a unique mnemonic string which represents a particular BUFR type and subtype in the /dcom database.  The complete list of BUFR data types is shown in Table 1.a.  This includes obsolete data types, future data types, and current data types which are not dumped in any network job at present.  In order to limit the number of output dump files in the operational network jobs, like data types are grouped together and represented by sequence or group mnemonics.  Depending on the network, the dump files generated from these data group mnemonics (including obsolete types) are read by the subsequent PREPBUFR processing steps, by the subsequent analysis codes, or by neither.  See Table 1.b for a listing of data group mnemonic dumps read by the PREPBUFR processing steps and Table 1.c for a listing of data group mnemonic dumps read by the analysis codes.
 

C.  Re-processing of BUFR Observational Data Dump Files

Some of the BUFR data dump files are re-processed into new BUFR files such that they can be used properly by the subsequent PREPBUFR processing or analysis programs.

 1.  SSM/I data - all network runs:  The “reports” in the SSM/I products BUFR dump files (group mnemonics “ssmip” or “ssmipn”, see Table 1.b) consist of orbital scans, each of which contains 64 retrieval footprints of one or more products.  The program PREPOBS_PREPSSMI unpacks selected products out of the scans, superobs them onto a one-degree latitude/longitude grid (optional in some network runs), then encodes them as individual “reports” in the output, re-processed, BUFR file which contains only those data needed for subsequent PREPBUFR processing.  The output filename contains the qualifier “spssmi” (see Table 1.b, key for superscript 2 in “NET” column).  The GDAS, GFS and CDAS network runs superob the “operational” rainfall rate product generated at FNMOC, and the surface ocean wind speed and total column precipitable water products generated using a Neural-Net 3 algorithm (OMBNN3) developed by the Marine Modeling Branch of NCEP/EMC.  The NAM and NDAS network runs superob the “operational” surface ocean wind speed and total column precipitable water products generated at FNMOC.  The upper-air RUC network run processes the same products as the NAM and NDAS network runs but it does not superob the data.

 2. QuikSCAT data - NAM, NDAS, GFS, GDAS and CDAS network runs: Each “report” in the QuikSCAT BUFR dump file (group mnemonic “qkscat”, see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information.  The program WAVE_DCODQUIKSCAT unpacks each report checking the report date for realism, selecting the proper nudged wind vector, and excluding reports over land, reports with missing nudged wind vector, reports with missing model wind direction and speed, reports with probability of rain greater than 10%, and reports at the edges of the orbital swath.  Reports passing checks are then superobed onto a one-half degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file which contains only those data needed for subsequent PREPBUFR processing.  The output filename contains the qualifier “qkswnd” (see Table 1.b, key for superscript 1 in “NET” column).

 3. TRMM TMI data - GFS, GDAS and CDAS network runs: Each “report” in the TRMM TMI BUFR dump file (group mnemonic “trmm”, see Table 1.c) is at full footprint resolution.  The program BUFR_SUPERTMI unpacks each report checking the validity of the satellite id, observation date and total precipitation observation.  Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file.  The output filename contains the qualifier “sptrmm” (see Table 1.c, key for superscript 1 in “NET” column).  The Global GSI analysis (GFS and GDAS network runs only) reads the superobed data directly from the reprocessed "sptrmm" BUFR dump file (these data do not pass through the PREPBUFR processing steps). 

4. WindSat data - GFS, GDAS and CDAS network runs: Each “report” in the WindSat BUFR dump file (group mnemonic “wndsat”, see Table 1.b) consists of four sets of nudged wind vectors and other raw scatterometer information.  The program BUFR_DCODWINDSAT unpacks each report checking the report date for realism, selecting the proper nudged wind vector, and excluding reports not explicitly over ocean, reports with missing nudged wind vector, reports with missing model wind direction and speed, and reports with a "bad" or "no retrieval" EDR quality flag.  Reports passing checks are then superobed onto a one-degree lat/lon grid according to satellite id and encoded into the output, re-processed BUFR file which contains only those data needed for subsequent PREPBUFR processing.  The output filename contains the qualifier “wdsatr” (see Table 1.b, key for superscript 5 in “NET” column).

5. ASCAT data - GFS, GDAS and CDAS network runs: Each “report” in the ASCAT BUFR dump file (group mnemonic “ascatt”, see Table 1.b) consists of two sets of nudged wind vectors and other raw scatterometer information.  The program WAVE_DCODQUIKSCAT unpacks each report checking the report date for realism, selecting the proper nudged wind vector, and excluding reports over land, reports with missing nudged wind vector, reports with missing model wind direction and speed, and reports with one or more "critical" wind vector cell quality flags set. Reports passing checks are then encoded into the output, re-processed BUFR file which contains only those data needed for subsequent PREPBUFR processing.  The output filename contains the qualifier “ascatw” (see Table 1.b, key for superscript 6 in “NET” column).
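
Several of the re-processing steps above superob reports onto a regular lat/lon grid according to satellite id.  A minimal sketch of that idea is given here for a scalar quantity such as rainfall rate.  It assumes that a superob is the simple arithmetic mean of all reports from one satellite falling in one grid box; the source does not state the averaging method used by the operational programs, and wind vectors would in any case need to be averaged by component.  The function name and the tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def superob(reports, grid_deg=1.0):
    """Average reports per (satellite id, grid box).

    reports: iterable of (sat_id, lat, lon, value) tuples.
    Returns {(sat_id, row, col): (mean_lat, mean_lon, mean_value, count)},
    with longitudes expressed in the range 0-360.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for sat_id, lat, lon, value in reports:
        lon360 = lon % 360.0
        key = (sat_id,
               math.floor((lat + 90.0) / grid_deg),
               math.floor(lon360 / grid_deg))
        acc = sums[key]
        acc[0] += lat
        acc[1] += lon360
        acc[2] += value
        acc[3] += 1
    return {key: (s_lat / n, s_lon / n, s_val / n, n)
            for key, (s_lat, s_lon, s_val, n) in sums.items()}
```

Calling superob(reports, grid_deg=0.5) would give the one-half degree grid used for QuikSCAT.  Keeping the satellite id in the key means that reports from different satellites are never averaged together, even when they fall in the same box.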


===> Dump Job 2, running simultaneously with Dump Job 1, performs the following single step:


Dumping of WSR-88D Level II radial wind and reflectivity BUFR Data

This currently runs in only the NAM and NDAS networks.  The processing is identical to that described in Dump Job 1, Step B above.  The dumping of WSR-88D Level II radial wind and reflectivity data is performed in a separate job from the dumping of all other data in the NAM and NDAS networks in order to save computation time since it takes almost as long to dump Level II data here as it takes to dump all other observational data in Dump Job 1.





===> Tropical Cyclone Processing Job, running simultaneously with Dump Job 1 and, in the NAM and NDAS networks, Dump Job 2, performs the following steps in sequence:


A.  Quality Control of Tropical Cyclone Bulletin Data

In the GFS, GDAS, NAM and NDAS network runs, tropical cyclone bulletins valid for the current cycle from the Joint Typhoon Warning Center (JTWC) and Fleet Numerical Meteorology and Oceanography Center (FNMOC) are read from the NCEP IBM-CCS /dcom database and merged into the proper record structure by the program SYNDAT_GETJTBUL.  Next, tropical cyclone bulletins valid for the current cycle from the NCEP/Tropical Prediction Center (TPC) are read from the TPC directory on the NCEP IBM-CCS (these are already in the proper record format).  Finally, manually generated tropical cyclone bulletins are read from the NCEP IBM-CCS database.  The latter can be generated by the NCEP/NCO Senior Duty Meteorologist (SDM) in the event that data from other sources are not available.

Next, the program SYNDAT_QCTROPCY runs in order to merge the tropical cyclone records from the various sources and perform quality control on tropical cyclone position and intensity information.  Some of the checks performed include duplicate records, appropriate date/time, proper record structure, storm name/id number, records from multiple institutions, secondary variables (e.g. central pressure), storm position and direction/speed.  The emphasis is on internal consistency between the reported storm location and prior motion. The output tropical cyclone vital statistics (tcvitals) file is then copied to the network-specific /com directories in the NCEP IBM-CCS.  This file is read in the next tropical cyclone relocation step in all networks and also later in the PREPBUFR processing by the program SYNDAT_SYNDATA in the NAM and NDAS networks in order to generate tropical cyclone bogus wind reports.



B.  Relocation of Tropical Cyclone Vortices in the Global Sigma (First) Guess

In the GFS, GDAS, NAM and NDAS network runs, the quality-controlled tropical storm position and intensity field (tcvitals) file valid at the current time (output by the previous tropical cyclone record q.c. step), along with the tcvitals files valid 12- and 6-hours prior to the current time, and the "best" global sigma first guess and global pressure GRIB files valid 6-hours prior to the current time, 3-hours prior to the current time, at the current time, and 3-hours after the current time are input to a series of programs (SUPVIT, GETTRK, RELOCATE_MV_NVORTEX).  These programs relocate one or more tropical cyclone (or hurricane) vortices in the global sigma first guess files valid 3-hours prior to the current time, at the current time, and 3-hours after the current time.  The updated global sigma guess file for the current time is later read in the PREPBUFR processing by the program PREPOBS_PREPDATA and used by the various quality control programs in the PREPBUFR processing stream.  In the GFS and GDAS networks, the updated global sigma guess files for all three times (3-hours prior to the current time, the current time, and 3-hours after the current time) are read by the subsequent Global GSI analysis.  This processing may also (but usually does not) generate an updated tcvitals file valid at the current time.  This file, if generated, contains only records for "weak" vortices which could not be used to update the global sigma first guess here.  It would be read later in the PREPBUFR processing by the program SYNDAT_SYNDATA in the GFS and GDAS networks in order to generate tropical cyclone bogus wind reports.  If this file is empty, no bogus reports will be generated by SYNDAT_SYNDATA.  This updated tcvitals file is not considered in the NAM and NDAS network runs as the original tcvitals file, output by the previous tropical cyclone record q.c. step, is always input to SYNDAT_SYNDATA.

Note1: This job runs only in the GFS, GDAS, NAM and NDAS networks, and only if TPC and/or JTWC/FNMOC tropical storm records are originally present and valid at the current time.

Note2: In the NDAS network, Step A runs alone as a job only one time for each cycle (00, 06, 12, 18Z), four hours after cycle time.  This is much later than the corresponding NDAS cycle's series of four dump jobs and four Step B jobs (running relocation only) for cycle time minus 12-hours, cycle time minus 9-hours, cycle time minus 6-hours and cycle time minus 3-hours.  The dump and relocation jobs run simultaneously for each of the four NDAS cycle processing times.  The Step A job generates post-dated tcvitals files which will be read by future NDAS Step B jobs.


===> Dump Post-Processing Job, running after both Dump Job 1 and Dump Job 2 have completed, performs the following single step:

Post-processing of BUFR Observational Data Dump Files

The completion of the data dump job(s) triggers a job which performs post-processing on the data dump files just created.  This job does not produce any output necessary to the successful completion of the analysis/forecast network [indeed it runs simultaneously with the PREPBUFR Processing Job which is also triggered by the completion of the data dump job(s)].

The first job step prepares a table of data counts for the various reports just dumped via the execution of the program BUFR_DATACOUNT.  These counts are compared to the running average over the past 30 days for each report type for the particular network and cycle time.  If the current dump count for a particular type is considered abnormally low (for most report types this means more than 50% below the 30 day average), a dump alert is generated.  The action taken for low dump counts depends upon the report type.  For those types considered "critical" to the subsequent assimilation system, a low dump count generates diagnostics and triggers a code failure and a return code of 6 in the dump alert job.  For those types considered "moderately-critical" (all types that are assimilated which are not in the "critical" category), a low dump count generates diagnostics and a non-fatal return code of 5 in the dump alert job.  For those types considered "non-critical" (all types that are not assimilated in the particular network), a low dump count generates diagnostics and a non-fatal return code of 4 in the dump alert job.  In all cases, a complete listing of dump counts vs. the 30 day average, along with those types which are either low or high (for most report types high means more than 200% above the 30 day average), is sent to the SDM.  High dump counts do not generate non-zero return codes in the dump alert job but they do generate diagnostics.  Trends in the 30 day averages vs. those for 3-, 6-, 9- and 12-months ago are also recorded for the SDM (a report type trends low vs. one of these previous averaging periods if the current 30 day average is more than 20% below the 30 day average for that period, and trends high if the current 30 day average is more than 20% above the 30 day average for that period).  Currently this dump count and alert processing runs only in the NAM, GFS and GDAS networks.
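
The thresholds and return codes described in this paragraph can be collected into a small sketch.  The function and variable names are hypothetical, and the percentages are read literally (so "more than 200% above" means a count greater than three times the average); the defaults apply to "most report types" only, since the source notes that thresholds can differ by type.

```python
# Return code produced by a LOW count, by report-type category (from the text).
LOW_RETURN_CODES = {"critical": 6, "moderately-critical": 5, "non-critical": 4}


def percent_change(current, reference):
    """Signed percent difference of `current` relative to `reference`."""
    return 100.0 * (current - reference) / reference


def classify_count(current, average_30d, low_pct=50.0, high_pct=200.0):
    """Return 'low', 'high' or 'normal' for a dump count vs. its 30 day average."""
    if average_30d <= 0:
        return "normal"   # no usable history; a choice made for this sketch
    change = percent_change(current, average_30d)
    if change < -low_pct:
        return "low"
    if change > high_pct:
        return "high"
    return "normal"


def alert_return_code(status, category):
    """Only low counts yield a non-zero return code; high counts yield 0."""
    return LOW_RETURN_CODES[category] if status == "low" else 0


def classify_trend(current_avg, past_avg, trend_pct=20.0):
    """Compare the current 30 day average with one from 3, 6, 9 or 12 months ago."""
    change = percent_change(current_avg, past_avg)
    if change < -trend_pct:
        return "low"
    if change > trend_pct:
        return "high"
    return "steady"
```

For example, a count of 40 against a 30 day average of 100 is classified as low, and for a "critical" type that maps to return code 6; a count of 350 is classified as high but still maps to return code 0.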

The next job step executes the program BUFR_REMOREST which removes or masks, from the appropriate dump files, certain data types that are restricted (either by the data producers themselves or by the WMO) from redistribution outside of NCEP.  NCEP/NCO has created a very strict policy on who may or may not have access to restricted data.  The resulting dump files, stripped of all restricted data, are given a suffix qualifier of ".nr" in the network-specific /com directories on the NCEP-CCS.

The next dump post-processing job step executes the program BUFR_LISTDUMPS which generates files containing text listings of all reports in the various BUFR data dump files.  These text files are then copied to the network-specific /com directories on the IBM-CCS in order to provide diagnostic information for troubleshooting problems in the data, etc.  Files containing listings of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr".

The post-processing job also contains a step which generates unblocked versions of the BUFR data dump files and copies them to the /com directories (again, files containing unblocked forms of dump files that have been stripped of all restricted data are given the suffix qualifier ".nr").  The unblocked files are then copied to servers for use by organizations outside of NCEP.  (The native blocking on the IBM-SP machine is Fortran 77.)   Restricted data are not copied to these servers.

Finally, in all networks, the final post-processing job of the day performs a data average processing step via the execution of the program BUFR_AVGDATA.  This updates the 30 day running average for each report type dumped, for each cycle for which a dump is generated.  These "current" 30 day averages are saved in text files, according to the network, in the "/com/arch/prod/avgdata" directory on the NCEP CCS.  These files are used by the dump alert processing in the NAM, GFS and GDAS networks in order to generate alerts for high or low dump counts for the current dump vs. the current 30 day average (see paragraph two in this section).  For the final post-processing job of a particular month, the current 30 day average for the NAM, GFS and GDAS networks is saved off in a separate file for that month in the same "/com" directory as the current 30 day average files.  These past month 30 day average files are used to check for high and low trends in the current NAM, GFS or GDAS 30 day average for a particular report vs. the 30 day average for 3-, 6-, 9- and 12-months ago (again, see paragraph two in this section).  Only the most recent 12 months of 30 day averages are saved here for the NAM, GFS and GDAS networks.
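
A 30 day running average of this kind can be maintained with a fixed-length window, as in the sketch below.  This is only one plausible way to do it: how BUFR_AVGDATA actually stores and updates its averages is not described here, and the class name and interface are hypothetical.

```python
from collections import deque


class RunningAverage:
    """Running average of the most recent `window_days` daily dump counts."""

    def __init__(self, window_days=30):
        # A deque with maxlen drops the oldest count automatically.
        self._counts = deque(maxlen=window_days)

    def update(self, count):
        """Add today's dump count for one report type and cycle."""
        self._counts.append(count)

    @property
    def average(self):
        """Current average, or None if no counts have been added yet."""
        if not self._counts:
            return None
        return sum(self._counts) / len(self._counts)
```

One such object would be kept per network, cycle and report type, since the averages are maintained separately for each.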
 
 

The NCEP production suite schedule, for those networks which originate with a dump of observational data, is shown in Table 2.  "DUMP" indicates the name of the Dump Job 1, "DUMP2" indicates the name of the Dump Job 2, "TROPCY" indicates the name of the Tropical Cyclone Processing Job (with "TROPCY1" for the relocation part only and "TROPCY2" for the q.c. part only in the NDAS network), "DPOST" indicates the name of the Dump Post-processing Job, "PREP" (and "PREP1" and "PREP2" in the CDAS network) indicates the name of the PREPBUFR Processing Job, "ANAL" indicates the name of the Analysis Job, "FCST" (and "FCSTH" and "FCSTL" in the GFS network) indicates the name of the Forecast Job, "PPOST" (and "PPOST1" and "PPOST2" in the CDAS network) indicates the name of the PREPBUFR Post-processing Job, "GESS" in the RTMA network indicates the name of the job which retrieves the first-guess and "APOST" in the RTMA network indicates the name of the Analysis Post-processing Job.  The initiation of the dump jobs ("DUMP" and "DUMP2") and the tropical cyclone processing job ("TROPCY", or "TROPCY1" in the NDAS network) is triggered by the clock at the times indicated.  All subsequent jobs run in sequence.