[LINK] Personal Name Matching, Canberra, 13 December 2006
Tom Worthington
Tom.Worthington at tomw.net.au
Mon Dec 11 09:02:45 AEDT 2006
Recommended:
---
DCS SEMINAR SERIES
A Comparison of Personal Name Matching: Techniques and Practical Issues
-and also-
Privacy-Preserving Data Linkage and Geocoding: Current Approaches and
Research Directions
Peter Christen (DCS, ANU)
DATE: 2006-12-13
TIME: 16:00:00 - 17:00:00
LOCATION: CSIT Seminar Room, N101, ANU, Canberra
ABSTRACT:
In this seminar I will present the two talks I will be giving at the IEEE
International Conference on Data Mining (ICDM) in Hong Kong, 18-22 December.
1) Finding and matching personal names is at the core of an
increasing number of applications, from text and Web mining and search
engines to information extraction, deduplication and data linkage
systems. Variations and errors in names make exact string matching
problematic, so approximate matching techniques have to be applied.
When compared to general text, however, personal names have different
characteristics that need to be considered. In this talk I will
discuss the characteristics of personal names and present potential
sources of variations and errors. I will then give an overview of a
comprehensive set of commonly used, as well as some recently developed,
name matching techniques. Experimental comparisons using four large name
data sets indicate that there is no clear best matching technique.
2) Data linkage is the task of matching and aggregating records that
relate to the same entity from one or more data sets. A related
technique is geocoding, the matching of addresses to their geographic
locations. As data linkage is often based on personal information
(like names and addresses), privacy and confidentiality are of
paramount importance. In this talk I will present an overview of
current approaches to privacy-preserving data linkage, and discuss
their limitations. Using real-world scenarios I will illustrate the
significance of developing improved techniques for automated, large-scale
and distributed privacy-preserving linking and geocoding. I will then
discuss four core research areas that need to be addressed in order to
make linking and geocoding of large confidential data collections
feasible.
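The first talk's abstract notes that variations and errors in names defeat exact string matching. As a minimal illustration only (not a technique taken from the talk or the ICDM paper), the Python sketch below compares names with a normalised Levenshtein edit distance, one commonly used approximate matching measure. The function names levenshtein and name_similarity, the lower-casing normalisation and the example names are assumptions made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions and substitutions that turn a into b (dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical helper: similarity in [0, 1], where 1.0 means identical
    after trimming whitespace and lower-casing (an assumed normalisation)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


if __name__ == "__main__":
    # Illustrative name variants: exact comparison rejects every pair,
    # while the similarity score stays high for small spelling differences.
    pairs = [("Peter", "Petr"), ("Christen", "Christensen"),
             ("Worthington", "Wortington")]
    for x, y in pairs:
        print(f"{x} / {y}: exact={x == y}, similarity={name_similarity(x, y):.2f}")
```

A linkage system would typically accept pairs whose similarity exceeds some threshold; the threshold, like the choice of measure, is application-specific, which is consistent with the abstract's finding that there is no clear best matching technique.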
BIO:
Dr Peter Christen is a lecturer at the Department of Computer Science
at the Australian National University. He received his Diploma in
computer science engineering from ETH Zurich (Switzerland) in
1995 and his PhD in computer science from the University of Basel
(Switzerland) in 1999. His research interests are data mining
(especially data linkage and data pre-processing), high-performance
computing, and most recently security and privacy preservation (in
the context of data linkage and health informatics).
In the last four years his research has concentrated on the project
"Investigation and Development of Parallel Large Scale Record Linkage
Techniques", an ARC Linkage project conducted in collaboration with
and partially funded by the NSW Department of Health.
<http://cecs.anu.edu.au/seminars/showone.pl?SID=333>
---
Tom Worthington FACS HLM tom.worthington at tomw.net.au Ph: 0419 496150
Director, Tomw Communications Pty Ltd ABN: 17 088 714 309
PO Box 13, Belconnen ACT 2617 http://www.tomw.net.au/
Visiting Fellow, ANU Blog: http://www.tomw.net.au/blog/atom.xml