Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1016/S0306-4379(01)00043-6
- Scopus: eid_2-s2.0-0035545935
- WOS: WOS:000172158400006
Article: Discovering and reconciling value conflicts for numerical data integration
Title | Discovering and reconciling value conflicts for numerical data integration |
---|---|
Authors | Fan, W; Lu, H; Madnick, SE; Cheung, D |
Keywords | Conversion function Data integration Data mining Data quality Robust regression Semantic conflicts |
Issue Date | 2001 |
Publisher | Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is |
Citation | Information Systems, 2001, v. 26 n. 8, p. 635-656 |
Abstract | The build-up of information technology capital, fueled by the Internet and the cost-effectiveness of new telecommunications technologies, has led to a proliferation of information systems that are in dire need of exchanging information but are incapable of doing so because they lack semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longer adequate: the integration of data from autonomous and heterogeneous systems calls for the prior identification and resolution of semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift through the data from disparate systems in a painstaking manner. We suggest that this process can be partially automated, and we present a methodology and technique for discovering potential semantic conflicts as well as the underlying data transformations needed to resolve them. Our methodology begins by classifying data value conflicts into two categories: context-independent and context-dependent. While context-independent conflicts are usually caused by unexpected errors, context-dependent conflicts are primarily a result of the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involving context-dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection, and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study using both synthetic and real-world data indicated that the proposed approach is promising. |
Persistent Identifier | http://hdl.handle.net/10722/88983 |
ISSN | 0306-4379 (2023 Impact Factor: 3.0; 2023 SCImago Journal Rankings: 1.201) |
ISI Accession Number ID | WOS:000172158400006 |
References |
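
The abstract describes discovering a conversion function between numeric values held in two sources, with robust regression listed among the keywords. The sketch below illustrates that general idea only; it is not the DIRECT prototype. It assumes a Theil-Sen estimator as a simple stand-in for whichever robust method the authors use, and the data, the function names `theil_sen` and `flag_outliers`, and the `tolerance` threshold are all hypothetical. The fitted line plays the role of a conversion function for a context-dependent conflict (here, a constant scale factor), while rows that deviate strongly from it are flagged as possible context-independent conflicts, that is, data errors.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Fit y ~ slope * x + intercept using medians, which resist outliers."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


def flag_outliers(xs, ys, slope, intercept, tolerance):
    """Return indices whose residual exceeds `tolerance` (possible data errors)."""
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if abs(y - (slope * x + intercept)) > tolerance
    ]


# Hypothetical data: the same quantity recorded in two sources, where source B
# stores source A's value multiplied by 1.5. Index 3 is a deliberate entry error.
source_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
source_b = [15.0, 30.0, 45.0, 600.0, 75.0, 90.0]

slope, intercept = theil_sen(source_a, source_b)
suspects = flag_outliers(source_a, source_b, slope, intercept, tolerance=1.0)
print(f"candidate conversion: b = {slope} * a + {intercept}")
print(f"rows to review as possible errors: {suspects}")
```

An ordinary least-squares fit on the same data would be pulled toward the erroneous row, which is why a robust estimator is the natural choice when errors and systematic differences are mixed in the same data.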
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Fan, W | en_HK |
dc.contributor.author | Lu, H | en_HK |
dc.contributor.author | Madnick, SE | en_HK |
dc.contributor.author | Cheung, D | en_HK |
dc.date.accessioned | 2010-09-06T09:50:55Z | - |
dc.date.available | 2010-09-06T09:50:55Z | - |
dc.date.issued | 2001 | en_HK |
dc.identifier.citation | Information Systems, 2001, v. 26 n. 8, p. 635-656 | en_HK |
dc.identifier.issn | 0306-4379 | en_HK |
dc.identifier.uri | http://hdl.handle.net/10722/88983 | - |
dc.language | eng | en_HK |
dc.publisher | Pergamon. The Journal's web site is located at http://www.elsevier.com/locate/is | en_HK |
dc.relation.ispartof | Information Systems | en_HK |
dc.subject | Conversion function | en_HK |
dc.subject | Data integration | en_HK |
dc.subject | Data mining | en_HK |
dc.subject | Data quality | en_HK |
dc.subject | Robust regression | en_HK |
dc.subject | Semantic conflicts | en_HK |
dc.title | Discovering and reconciling value conflicts for numerical data integration | en_HK |
dc.type | Article | en_HK |
dc.identifier.openurl | http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=0306-4379&volume=9&spage=635&epage=656&date=2001&atitle=Discovering+and+Reconciling+Value+Conflicts+for+Numerical+Data+Integration | en_HK |
dc.identifier.email | Cheung, D:dcheung@cs.hku.hk | en_HK |
dc.identifier.authority | Cheung, D=rp00101 | en_HK |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1016/S0306-4379(01)00043-6 | en_HK |
dc.identifier.scopus | eid_2-s2.0-0035545935 | en_HK |
dc.identifier.hkuros | 70939 | en_HK |
dc.relation.references | http://www.scopus.com/mlt/select.url?eid=2-s2.0-0035545935&selection=ref&src=s&origin=recordpage | en_HK |
dc.identifier.volume | 26 | en_HK |
dc.identifier.issue | 8 | en_HK |
dc.identifier.spage | 635 | en_HK |
dc.identifier.epage | 656 | en_HK |
dc.identifier.isi | WOS:000172158400006 | - |
dc.publisher.place | United Kingdom | en_HK |
dc.identifier.scopusauthorid | Fan, W=7401635358 | en_HK |
dc.identifier.scopusauthorid | Lu, H=7404843983 | en_HK |
dc.identifier.scopusauthorid | Madnick, SE=7003477810 | en_HK |
dc.identifier.scopusauthorid | Cheung, D=34567902600 | en_HK |
dc.identifier.issnl | 0306-4379 | - |