Show simple item record

dc.contributor.advisor: Regina Barzilay
dc.contributor.author: Kushman, Nate (en_US)
dc.contributor.author: Adib, Fadel (en_US)
dc.contributor.author: Katabi, Dina (en_US)
dc.contributor.author: Barzilay, Regina (en_US)
dc.contributor.other: Natural Language Processing (en)
dc.date.accessioned: 2013-09-10T21:30:03Z
dc.date.available: 2013-09-10T21:30:03Z
dc.date.issued: 2013-09-10
dc.identifier.uri: http://hdl.handle.net/1721.1/80380
dc.description.abstract: Consider the problem of migrating a company's CRM or ERP database from one application to another, or integrating two such databases as a result of a merger. This problem requires matching two large relational schemas with hundreds and sometimes thousands of fields. Further, the correct match is likely complex: rather than a simple one-to-one alignment, some fields in the source database may map to multiple fields in the target database, and others may have no equivalent fields in the target database. Despite major advances in schema matching, fully automated solutions to large relational schema matching problems are still elusive. This paper focuses on improving the accuracy of automated large relational schema matching. Our key insight is the observation that modern database applications have a rich user interface that typically exhibits more consistency across applications than the underlying schemas. We associate UI widgets in the application with the underlying database fields on which they operate and demonstrate that this association delivers new information useful for matching large and complex relational schemas. Additionally, we show how to formalize the schema matching problem as a quadratic program, and solve it efficiently using standard optimization and machine learning techniques. We evaluate our approach on real-world CRM applications with hundreds of fields and show that it improves the accuracy by a factor of 2-4x. (en_US)
dc.format.extent: 12 p. (en_US)
dc.relation.ispartofseries: MIT-CSAIL-TR-2013-022
dc.title: Harvesting Application Information for Industry-Scale Relational Schema Matching (en_US)
dc.date.updated: 2013-09-10T21:30:03Z

