Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/OCEANS.2008.5151904
- Scopus: eid_2-s2.0-70350130896
Citations:
- Scopus: 0

Appears in Collections:
Conference Paper: Automated data quality assurance for marine observations
Title | Automated data quality assurance for marine observations |
---|---|
Authors | Koziana, JV; Olson, J; Anselmo, T; Lu, W |
Issue Date | 2008 |
Citation | Oceans 2008, 2008 |
Abstract | The ocean monitoring community requires high-quality data that is Data Management and Communications (DMAC)-compliant for both near-real-time uses (e.g., weather forecasting and warnings) and climate data records (e.g., comprehensive delayed mode). SAIC has developed a flexible and cost-effective automated data quality assurance (ADQA) system that can assess the quality of marine observations and provide quality-controlled data to a wide variety of end users. For example, if a researcher needs data sets from different sources for a modeling project, ADQA provides a means of characterizing the relative quality of these input sets; such characterizations are necessary for the researcher to determine the cumulative accuracy of the modeling output. The system is scalable and can be used by a single data provider or a large data center. For the U.S. Integrated Ocean Observing System (IOOS), this approach is well suited to a diverse science community employing many different methods to characterize the quality of their data products. The ADQA system was implemented by integrating a set of data quality algorithms based upon National Data Buoy Center (NDBC) Technical Document 03-02, "Handbook of Automated Data Quality Control Checks and Procedures of the National Data Buoy Center", into a processing system built on the CALIPSO science data processing framework developed by SAIC. The CALIPSO framework is a library of reusable components that provide the core non-science functionality of any science data processing system. This design separates the science processing components from the basic, reusable system infrastructure and allows algorithms to be added or removed with relative ease.
The generic infrastructure of the framework includes substantial core functionality (e.g., reading inputs, organizing and formatting data, running algorithms, writing output, handling errors and generating metadata) that is common to many science data management applications and is easily configurable to work with a wide variety of marine instruments and science data sets. The basic architecture comprises four subsystems: Input, Control, Data Store and Output. The Input subsystem uses an American Standard Code for Information Interchange (ASCII) configuration file to identify input parameters and define the input file format. As a data record is read from the input file, it is processed into discrete parameters and stored in the Data Store subsystem, making the data readily available within the framework. The current data formats are Network Common Data Form (NetCDF), Hierarchical Data Format (HDF), Sensor Model Language (SensorML) and ASCII; the flexibility of the framework allows additional data formats to be added without redesigning the entire ADQA system. The Control subsystem provides the linkage between the framework and the Algorithm Library and orchestrates the processing activities among the various framework subsystems and the Algorithm Library. This subsystem balances the flexibility and performance of the ADQA system, as it readily accommodates algorithm and data product additions with minimal code changes. The Data Store subsystem is implemented using a common data structure for application-specific data structures assembled from standard building blocks, allowing other framework elements to be generic and configurable. This architectural approach facilitates changing the underlying data structures "on the fly" to accommodate additional parameters or new algorithms; localized changes to algorithms require minimal changes to the input, control, data store and output modules.
Furthermore, this subsystem allows parameters to be retrieved by name; consequently, other framework subsystems and the algorithms in the Algorithm Library are able to query the data structures and retrieve the necessary information. The Output subsystem also uses an ASCII configuration file to define the format and content of data products, so additional data formats can be added without redesigning the entire ADQA system. The current output formats are NetCDF, HDF, SensorML and ASCII. The Algorithm Library of the ADQA system was implemented using a modular architecture that encapsulates algorithms into decoupled, reusable modules while providing the mechanism for assembling them into a working system. We are building a library of reusable validation algorithms by implementing them as generic modules, based upon NDBC Technical Document 03-02, that can be configured to work with parameters from any source. For example, we have configured a rate and limit check algorithm to process several marine parameters simply by defining the appropriate thresholds. In this paper, we present a scalable ADQA processing system that is well suited to many different methods of characterizing the quality of data products; its modular design simplifies the integration of additional data quality assurance (DQA) processing components. © 2008 IEEE. |
Persistent Identifier | http://hdl.handle.net/10722/173414 |
References |
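The abstract mentions a rate and limit check algorithm that is configured per parameter simply by defining thresholds. A minimal sketch of that idea follows, in the spirit of the NDBC-style checks the paper cites; the flag values, function names and threshold numbers here are illustrative assumptions, not the actual ADQA implementation.

```python
# Hypothetical sketch of a threshold-driven range ("limit") check and
# rate-of-change check. All names and thresholds are made up for
# illustration; they are not taken from the ADQA system itself.

GOOD, SUSPECT, BAD = 0, 1, 2  # simple quality flags

def limit_check(value, lower, upper):
    """Flag a value that falls outside its configured physical limits."""
    return GOOD if lower <= value <= upper else BAD

def rate_check(prev, curr, dt_seconds, max_rate):
    """Flag a change that exceeds the configured rate-of-change threshold."""
    rate = abs(curr - prev) / dt_seconds
    return GOOD if rate <= max_rate else SUSPECT

# Per-parameter thresholds, analogous to the ASCII configuration file
# the abstract describes (illustrative values, units: degC and degC/s):
CONFIG = {
    "sea_surface_temperature": {"lower": -5.0, "upper": 40.0, "max_rate": 0.001},
}

def check_series(name, samples, dt_seconds=3600):
    """Apply the limit check, then the rate check, to an hourly series."""
    cfg = CONFIG[name]
    flags = []
    for i, v in enumerate(samples):
        flag = limit_check(v, cfg["lower"], cfg["upper"])
        if flag == GOOD and i > 0:
            flag = rate_check(samples[i - 1], v, dt_seconds, cfg["max_rate"])
        flags.append(flag)
    return flags

flags = check_series("sea_surface_temperature", [18.2, 18.3, 25.0, 99.9])
# flags -> [0, 0, 1, 2]: the 6.7-degree jump is SUSPECT, 99.9 is out of range
```

Adding another parameter is then purely a configuration change (a new `CONFIG` entry), which matches the paper's claim that the same algorithm is reused across marine parameters by defining thresholds.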
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Koziana, JV | en_US |
dc.contributor.author | Olson, J | en_US |
dc.contributor.author | Anselmo, T | en_US |
dc.contributor.author | Lu, W | en_US |
dc.date.accessioned | 2012-10-30T06:30:56Z | - |
dc.date.available | 2012-10-30T06:30:56Z | - |
dc.date.issued | 2008 | en_US |
dc.identifier.citation | Oceans 2008, 2008 | en_US |
dc.identifier.uri | http://hdl.handle.net/10722/173414 | - |
dc.language | eng | en_US |
dc.relation.ispartof | OCEANS 2008 | en_US |
dc.title | Automated data quality assurance for marine observations | en_US |
dc.type | Conference_Paper | en_US |
dc.identifier.email | Lu, W:wwlu@hku.hk | en_US |
dc.identifier.authority | Lu, W=rp00411 | en_US |
dc.description.nature | link_to_subscribed_fulltext | en_US |
dc.identifier.doi | 10.1109/OCEANS.2008.5151904 | en_US |
dc.identifier.scopus | eid_2-s2.0-70350130896 | en_US |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-70350130896&selection=ref&src=s&origin=recordpage | en_US |
dc.identifier.scopusauthorid | Koziana, JV=6507561233 | en_US |
dc.identifier.scopusauthorid | Olson, J=7402882833 | en_US |
dc.identifier.scopusauthorid | Anselmo, T=36097760300 | en_US |
dc.identifier.scopusauthorid | Lu, W=7404215221 | en_US |
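The abstract also describes a Data Store whose parameters are retrieved by name, plus a Control subsystem that dispatches configured checks from a decoupled Algorithm Library. A minimal sketch of that modular pattern is below; every class, function and configuration key is a hypothetical stand-in, not the CALIPSO framework's actual API.

```python
# Hypothetical sketch of the modular pattern the abstract describes:
# a Data Store queried by parameter name, an algorithm library built
# from decoupled modules, and a control loop assembled from
# configuration. All names are illustrative assumptions.

class DataStore:
    """Holds discrete parameters, retrievable by name."""
    def __init__(self):
        self._params = {}
    def put(self, name, values):
        self._params[name] = values
    def get(self, name):
        return self._params[name]

ALGORITHMS = {}  # name -> callable; stands in for the Algorithm Library

def register(name):
    """Decorator that adds a check to the library without touching the core."""
    def deco(fn):
        ALGORITHMS[name] = fn
        return fn
    return deco

@register("limit_check")
def limit_check(values, lower, upper):
    return [lower <= v <= upper for v in values]

def run_pipeline(store, config):
    """Control subsystem: dispatch configured checks over named parameters."""
    results = {}
    for step in config:
        algo = ALGORITHMS[step["algorithm"]]
        values = store.get(step["parameter"])
        results[step["parameter"]] = algo(values, **step["args"])
    return results

store = DataStore()
store.put("wind_speed", [3.1, 4.0, 120.0])
config = [{"algorithm": "limit_check", "parameter": "wind_speed",
           "args": {"lower": 0.0, "upper": 75.0}}]
results = run_pipeline(store, config)
# results["wind_speed"] -> [True, True, False]
```

Because the control loop only knows algorithm names and parameter names, adding a check or a parameter touches the registry and the configuration, not the input, control, data store or output code, which is the decoupling the abstract claims for the ADQA design.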