Conference Paper: SCODED: Statistical Constraint Oriented Data Error Detection

Title: SCODED: Statistical Constraint Oriented Data Error Detection
Authors: Yan, J; Schulte, O; Zhang, ML; Wang, J; Cheng, CKR
Keywords: error detection; machine learning; statistical constraints
Issue Date: 2020
Publisher: Association for Computing Machinery.
Citation: SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, 14-19 June 2020, p. 845–860
Abstract: Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs provide a novel approach that has various application scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection: checks whether an SC is violated on a given dataset, and (2) Error Drill Down: identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that SCs are effective in detecting data errors that violate them, compared to state-of-the-art approaches.
Persistent Identifier: http://hdl.handle.net/10722/291190
ISBN: 9781450367356
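
The two components named in the abstract can be illustrated with a small sketch. This record does not describe the paper's actual algorithms, so everything below is an assumption made for illustration: the SC is taken to be an independence constraint between two categorical columns, violation is checked with a Pearson chi-square statistic against a caller-supplied critical value, and drill-down uses a naive leave-one-out ranking. The function names (`chi_square_stat`, `violates_independence`, `top_k_contributors`) are hypothetical and are not SCODED's API.

```python
from collections import Counter


def chi_square_stat(records):
    """Pearson chi-square statistic for independence of two categorical columns.

    `records` is a list of (x, y) pairs. Illustrative stand-in, not the paper's test.
    """
    n = len(records)
    if n == 0:
        return 0.0
    joint = Counter(records)
    x_counts = Counter(x for x, _ in records)
    y_counts = Counter(y for _, y in records)
    stat = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n          # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def violates_independence(records, critical_value):
    """Assumed violation check: the statistic exceeds a caller-chosen threshold."""
    return chi_square_stat(records) > critical_value


def top_k_contributors(records, k):
    """Rank records by how much removing each one lowers the statistic.

    Naive leave-one-out (quadratic cost); ties are broken by record index.
    """
    base = chi_square_stat(records)
    scored = []
    for i in range(len(records)):
        rest = records[:i] + records[i + 1:]
        scored.append((base - chi_square_stat(rest), i))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Hypothetical data: a balanced 2x2 table plus one out-of-pattern record.
    clean = [("a", "p")] * 10 + [("a", "q")] * 10 + [("b", "p")] * 10 + [("b", "q")] * 10
    dirty = clean + [("c", "r")]
    print(violates_independence(dirty, critical_value=9.488))
    print(top_k_contributors(dirty, k=1))
```

The critical value is left to the caller because it depends on the degrees of freedom and the chosen significance level; 9.488 is the usual 0.05 threshold for 4 degrees of freedom, which matches the 3x3 table in the demo.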

 

DC Field | Value | Language
dc.contributor.author | Yan, J | -
dc.contributor.author | Schulte, O | -
dc.contributor.author | Zhang, ML | -
dc.contributor.author | Wang, J | -
dc.contributor.author | Cheng, CKR | -
dc.date.accessioned | 2020-11-07T13:53:31Z | -
dc.date.available | 2020-11-07T13:53:31Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, 14-19 June 2020, p. 845–860 | -
dc.identifier.isbn | 9781450367356 | -
dc.identifier.uri | http://hdl.handle.net/10722/291190 | -
dc.description.abstract | Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs provide a novel approach that has various application scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection: checks whether an SC is violated on a given dataset, and (2) Error Drill Down: identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that SCs are effective in detecting data errors that violate them, compared to state-of-the-art approaches. | -
dc.language | eng | -
dc.publisher | Association for Computing Machinery. | -
dc.relation.ispartof | SIGMOD '20 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data | -
dc.rights | SIGMOD '20 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Copyright © Association for Computing Machinery. | -
dc.subject | error detection | -
dc.subject | machine learning | -
dc.subject | statistical constraints | -
dc.title | SCODED: Statistical Constraint Oriented Data Error Detection | -
dc.type | Conference_Paper | -
dc.identifier.email | Cheng, CKR: ckcheng@cs.hku.hk | -
dc.identifier.authority | Cheng, CKR=rp00074 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1145/3318464.3380568 | -
dc.identifier.scopus | eid_2-s2.0-85086275631 | -
dc.identifier.hkuros | 318671 | -
dc.identifier.spage | 845 | -
dc.identifier.epage | 860 | -
dc.publisher.place | New York, NY | -
