Conference Paper: Automated data quality assurance for marine observations

Title: Automated data quality assurance for marine observations
Authors: Koziana, JV; Olson, J; Anselmo, T; Lu, W
Issue Date: 2008
Citation: Oceans 2008, 2008
Abstract: The ocean monitoring community requires high-quality data that are Data Management and Communications (DMAC)-compliant, both for near-real-time uses (e.g., weather forecasting and warnings) and for climate data records (e.g., comprehensive delayed-mode processing). SAIC has developed a flexible and cost-effective automated data quality assurance (ADQA) system that can be used to assess the quality of marine observations and provide quality-controlled data to a wide variety of end users. For example, if a researcher needs data sets from different sources for a modeling project, ADQA provides a means of characterizing the relative quality of these input sets; these characterizations enable the researcher to determine the cumulative accuracy of the modeling output. The system is scalable and can be used by a single data provider or a large data center. For the U.S. Integrated Ocean Observing System (IOOS), this approach is well suited to a diverse science community employing many different methods to characterize the quality of its data products.

The ADQA system has been implemented by integrating a set of data quality algorithms based upon National Data Buoy Center (NDBC) Technical Document 03-02, "Handbook of Automated Data Quality Control Checks and Procedures of the National Data Buoy Center," into a processing system built with the CALIPSO science data processing framework developed by SAIC. The CALIPSO framework is a library of reusable components that provide the core non-science functionality of a science data processing system. This design separates the science processing components from the basic, reusable system infrastructure and allows algorithms to be added or removed with relative ease. The generic infrastructure of the framework includes substantial core functionality (e.g., reading inputs, organizing and formatting data, running algorithms, writing output, handling errors and generating metadata) that is common to many science data management applications and is easily configurable to work with a wide variety of marine instruments and science data sets.

The basic architecture comprises four subsystems: Input, Control, Data Store and Output. The Input subsystem uses an American Standard Code for Information Interchange (ASCII) configuration file to identify input parameters and define the input file format. As each data record is read from the input file, it is parsed into discrete parameters and stored in the Data Store subsystem, making the data readily available within the framework. The currently supported data formats are Network Common Data Form (NetCDF), Hierarchical Data Format (HDF), Sensor Model Language (SensorML) and ASCII; the flexibility of the framework allows additional formats to be included without redesigning the entire ADQA system. The Control subsystem provides the linkage between the framework and the Algorithm Library and orchestrates the processing activities among the various framework subsystems. This subsystem balances the flexibility and performance of the ADQA system, readily accommodating algorithm and data product additions with minimal code changes. The Data Store subsystem is implemented using a common data structure for application-specific data assembled from standard building blocks, which allows the other framework elements to remain generic and configurable.
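
As a rough illustration of the subsystem decomposition described above, the following minimal Python sketch shows how a Data Store with name-based parameter retrieval and a Control subsystem driving a pluggable algorithm list might fit together. The class and method names here are hypothetical; this is not the CALIPSO framework's actual API.

    # Hypothetical sketch of the Data Store / Control / Algorithm Library split;
    # names and structure are illustrative, not actual CALIPSO framework code.

    class DataStore:
        """Holds discrete parameters parsed from input records, retrievable by name."""
        def __init__(self):
            self._params = {}

        def put(self, name, values):
            self._params[name] = values

        def get(self, name):
            # Other subsystems and algorithms query parameters by name.
            return self._params[name]

    class Algorithm:
        """Base class for decoupled, reusable validation modules."""
        def run(self, store):
            raise NotImplementedError

    class Controller:
        """Links the framework to the Algorithm Library and orchestrates processing."""
        def __init__(self, algorithms):
            # Algorithms can be added or removed here without touching the
            # Input, Data Store or Output subsystems.
            self.algorithms = algorithms

        def process(self, store):
            for algorithm in self.algorithms:
                algorithm.run(store)

In this arrangement, adding a new check means registering one more Algorithm subclass with the Controller; the Input, Data Store and Output subsystems are untouched, which matches the decoupling the abstract describes.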
This architectural approach facilitates changing the underlying data structures "on the fly" to allow the addition of parameters or new algorithms, and localized changes to algorithms require minimal changes to the Input, Control, Data Store and Output modules. Furthermore, this subsystem allows parameters to be retrieved by name; consequently, the other framework subsystems and the algorithms in the Algorithm Library can query the data structures and retrieve the necessary information. The Output subsystem also uses an ASCII configuration file to define the format and content of data products. Additional output formats can be added without redesigning the entire ADQA system; the currently supported formats are NetCDF, HDF, SensorML and ASCII. The Algorithm Library of the ADQA system was implemented using a modular architecture that encapsulates algorithms into decoupled, reusable modules while providing the mechanism for assembling them into a working system. We are building a library of reusable validation algorithms, based upon NDBC Technical Document 03-02, implemented as generic modules that can be configured to work with parameters from any source. For example, we have configured a rate and limit check algorithm to process several marine parameters by simply defining the appropriate thresholds. In this paper, we present a scalable ADQA processing system that is well suited to many different methods of characterizing the quality of data products; its modular design simplifies the integration of additional data quality assurance (DQA) processing components. © 2008 IEEE.
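
The rate and limit check mentioned in the abstract can be pictured as a single generic module driven by per-parameter thresholds. The sketch below is illustrative Python; the threshold values are placeholders chosen for readability, not the values specified in NDBC Technical Document 03-02.

    # Illustrative rate and limit check; thresholds are placeholders,
    # not the values from NDBC Technical Document 03-02.

    LIMITS = {
        # parameter: (lower limit, upper limit, max change per time step)
        "sea_surface_temperature": (-5.0, 40.0, 2.0),   # degrees Celsius
        "wind_speed": (0.0, 75.0, 10.0),                # metres per second
    }

    def rate_and_limit_check(name, values):
        """Flag each value 'good' if it passes both checks, else 'suspect'."""
        lower, upper, max_step = LIMITS[name]
        flags = []
        previous = None
        for value in values:
            ok = lower <= value <= upper                # limit (gross range) check
            if ok and previous is not None:
                ok = abs(value - previous) <= max_step  # rate-of-change check
            flags.append("good" if ok else "suspect")
            previous = value
        return flags

    print(rate_and_limit_check("sea_surface_temperature", [12.1, 12.3, 25.0, 12.4]))
    # ['good', 'good', 'suspect', 'suspect']

Configuring the same module for another parameter amounts to adding one entry to the threshold table, which is the sense in which the abstract says new parameters are handled "by simply defining the appropriate thresholds".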
Persistent Identifier: http://hdl.handle.net/10722/173414

DC Field | Value | Language
dc.contributor.author | Koziana, JV | en_US
dc.contributor.author | Olson, J | en_US
dc.contributor.author | Anselmo, T | en_US
dc.contributor.author | Lu, W | en_US
dc.date.accessioned | 2012-10-30T06:30:56Z | -
dc.date.available | 2012-10-30T06:30:56Z | -
dc.date.issued | 2008 | en_US
dc.identifier.citation | Oceans 2008, 2008 | en_US
dc.identifier.uri | http://hdl.handle.net/10722/173414 | -
dc.language | eng | en_US
dc.relation.ispartof | OCEANS 2008 | en_US
dc.title | Automated data quality assurance for marine observations | en_US
dc.type | Conference_Paper | en_US
dc.identifier.email | Lu, W: wwlu@hku.hk | en_US
dc.identifier.authority | Lu, W=rp00411 | en_US
dc.description.nature | link_to_subscribed_fulltext | en_US
dc.identifier.doi | 10.1109/OCEANS.2008.5151904 | en_US
dc.identifier.scopus | eid_2-s2.0-70350130896 | en_US
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-70350130896&selection=ref&src=s&origin=recordpage | en_US
dc.identifier.scopusauthorid | Koziana, JV=6507561233 | en_US
dc.identifier.scopusauthorid | Olson, J=7402882833 | en_US
dc.identifier.scopusauthorid | Anselmo, T=36097760300 | en_US
dc.identifier.scopusauthorid | Lu, W=7404215221 | en_US
