Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/3318464.3380568
- Scopus: eid_2-s2.0-85086275631
- WOS: WOS:000644433700059
Conference Paper: SCODED: Statistical Constraint Oriented Data Error Detection
Title | SCODED: Statistical Constraint Oriented Data Error Detection |
---|---|
Authors | Yan, J; Schulte, O; Zhang, ML; Wang, J; Cheng, CKR |
Keywords | error detection machine learning statistical constraints |
Issue Date | 2020 |
Publisher | Association for Computing Machinery. |
Citation | SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, 14-19 June 2020, p. 845–860 |
Abstract | Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs provide a novel approach that has various application scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection: checks whether an SC is violated on a given dataset, and (2) Error Drill Down: identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that SCs are effective in detecting data errors that violate them, compared to state-of-the-art approaches. |
Persistent Identifier | http://hdl.handle.net/10722/291190 |
ISBN | 9781450367356 |
ISI Accession Number ID | WOS:000644433700059 |
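
The abstract names two components: checking whether an SC is violated, and drilling down to the top-k records that contribute most to the violation. This record does not describe how SCODED implements either step, so the following is only a minimal illustrative sketch under stated assumptions: the SC is taken to be "column X is independent of column Y" over categorical data, violation is judged by comparing a Pearson chi-squared statistic against a caller-supplied threshold, and a record's contribution is approximated by how much the statistic drops when that record is left out. The function names `chi2_statistic`, `violates_independence`, and `top_k_contributors` are hypothetical and do not come from the paper.

```python
from collections import Counter


def chi2_statistic(rows):
    """Pearson chi-squared statistic for a list of (x, y) categorical pairs."""
    n = len(rows)
    if n == 0:
        return 0.0
    x_counts = Counter(x for x, _ in rows)
    y_counts = Counter(y for _, y in rows)
    xy_counts = Counter(rows)
    stat = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n  # cell count expected under independence
            observed = xy_counts.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def violates_independence(rows, threshold):
    """Assumed violation check: the statistic exceeds a chosen critical value."""
    return chi2_statistic(rows) > threshold


def top_k_contributors(rows, k):
    """Assumed drill-down: rank records by the drop in the statistic
    when each one is removed (leave-one-out), and return the top-k indices."""
    base = chi2_statistic(rows)
    drops = []
    for i in range(len(rows)):
        rest = rows[:i] + rows[i + 1:]
        drops.append((base - chi2_statistic(rest), i))
    # Largest drop first; ties broken by original position.
    drops.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in drops[:k]]


if __name__ == "__main__":
    # Balanced 2x2 data satisfies the independence SC exactly.
    clean = [("a", 0), ("a", 1), ("b", 0), ("b", 1)] * 5
    # Three injected records introduce a perfectly correlated category pair.
    dirty = clean + [("c", 2)] * 3
    print(violates_independence(clean, 3.841))
    print(violates_independence(dirty, 9.488))
    print(top_k_contributors(dirty, 3))
```

The thresholds 3.841 and 9.488 are the standard chi-squared critical values at the 0.05 level for 1 and 4 degrees of freedom. The leave-one-out ranking recomputes the statistic once per record, so it is quadratic in the number of records and is meant only to show the idea on small inputs.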
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yan, J | - |
dc.contributor.author | Schulte, O | - |
dc.contributor.author | Zhang, ML | - |
dc.contributor.author | Wang, J | - |
dc.contributor.author | Cheng, CKR | - |
dc.date.accessioned | 2020-11-07T13:53:31Z | - |
dc.date.available | 2020-11-07T13:53:31Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | SIGMOD/PODS '20: International Conference on Management of Data, Portland, OR, USA, 14-19 June 2020, p. 845–860 | - |
dc.identifier.isbn | 9781450367356 | - |
dc.identifier.uri | http://hdl.handle.net/10722/291190 | - |
dc.description.abstract | Statistical Constraints (SCs) play an important role in statistical modeling and analysis. This paper brings the concept to data cleaning and studies how to leverage SCs for error detection. SCs provide a novel approach that has various application scenarios and works harmoniously with downstream statistical modeling. Entailment relationships between SCs and integrity constraints provide analytical insight into SCs. We develop SCODED, an SC-Oriented Data Error Detection system, comprising two key components: (1) SC Violation Detection: checks whether an SC is violated on a given dataset, and (2) Error Drill Down: identifies the top-k records that contribute most to the violation of an SC. Experiments on synthetic and real-world data show that SCs are effective in detecting data errors that violate them, compared to state-of-the-art approaches. | - |
dc.language | eng | - |
dc.publisher | Association for Computing Machinery. | - |
dc.relation.ispartof | SIGMOD '20 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data | - |
dc.rights | SIGMOD '20 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. Copyright © Association for Computing Machinery. | - |
dc.subject | error detection | - |
dc.subject | machine learning | - |
dc.subject | statistical constraints | - |
dc.title | SCODED: Statistical Constraint Oriented Data Error Detection | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Cheng, CKR: ckcheng@cs.hku.hk | - |
dc.identifier.authority | Cheng, CKR=rp00074 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3318464.3380568 | - |
dc.identifier.scopus | eid_2-s2.0-85086275631 | - |
dc.identifier.hkuros | 318671 | - |
dc.identifier.spage | 845 | - |
dc.identifier.epage | 860 | - |
dc.identifier.isi | WOS:000644433700059 | - |
dc.publisher.place | New York, NY | - |