[LINK] Personal Name Matching, Canberra, 13 December 2006

Tom Worthington Tom.Worthington at tomw.net.au
Mon Dec 11 09:02:45 AEDT 2006



A Comparison of Personal Name Matching: Techniques and Practical Issues
-and also-
Privacy-Preserving Data Linkage and Geocoding: Current Approaches and
Research Directions

Peter Christen (DCS, ANU)

DATE: 2006-12-13
TIME: 16:00:00 - 17:00:00
LOCATION: CSIT Seminar Room, N101, ANU, Canberra

In this seminar I will present two talks I will give at the IEEE 
International Conference on Data Mining (ICDM) in Hong Kong, 18-22 December.

1) Finding and matching personal names is at the core of an 
increasing number of applications: from text and Web mining, search 
engines, to information extraction, deduplication and data linkage 
systems. Variations and errors in names make exact string matching 
problematic, and approximate matching techniques have to be applied. 
When compared to general text, however, personal names have different 
characteristics that need to be considered. In this talk I will 
discuss the characteristics of personal names and present potential 
sources of variations and errors. I will then give an overview of a
large number of commonly used, as well as some recently developed, name
matching techniques. Experimental comparisons using four large name
data sets indicate that there is no clear best matching technique.
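
As a rough illustration of what approximate name matching can look like, 
the sketch below scores two names with a normalised edit (Levenshtein) 
distance. This is an assumption made for illustration only: the abstract 
does not say which techniques are compared, and the function names 
edit_distance and name_similarity are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after simple normalisation."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(name_similarity("Christen", "Kristen"))
```

A similarity threshold (for example 0.7) would then decide whether two 
names count as a match. Choosing that threshold is itself data dependent, 
which is consistent with the finding that no single technique is clearly 
best.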

2) Data linkage is the task of matching and aggregating records that 
relate to the same entity from one or more data sets. A related 
technique is geocoding, the matching of addresses to their geographic 
locations. As data linkage is often based on personal information 
(like names and addresses), privacy and confidentiality are of 
paramount importance. In this talk I will present an overview of 
current approaches to privacy-preserving data linkage, and discuss 
their limitations. Using real-world scenarios I will illustrate the 
significance of developing improved techniques for automated,
large-scale and distributed privacy-preserving linking and geocoding. I
will then discuss four core research areas that need to be addressed in
order to make linking and geocoding of large confidential data 
collections feasible.
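
One simple idea in this area, sketched below purely as an illustration, 
is for two data holders to exchange keyed hashes of normalised names 
instead of the names themselves. The abstract does not state which 
approaches the talk covers, so the scheme, the shared secret key and the 
function names encode_name and link_encoded are all assumptions.

```python
import hashlib
import hmac


def encode_name(name: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a lower-cased, whitespace-normalised name."""
    normalised = " ".join(name.lower().split())
    return hmac.new(secret_key, normalised.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def link_encoded(codes_a, codes_b):
    """Return index pairs (i, j) where the encoded values are identical."""
    positions_b = {}
    for j, code in enumerate(codes_b):
        positions_b.setdefault(code, []).append(j)
    return [(i, j)
            for i, code in enumerate(codes_a)
            for j in positions_b.get(code, [])]


if __name__ == "__main__":
    key = b"secret agreed by both data holders"  # hypothetical shared key
    holder_a = ["Peter Christen", "Tom Worthington"]
    holder_b = ["  peter   christen ", "Someone Else"]
    codes_a = [encode_name(n, key) for n in holder_a]
    codes_b = [encode_name(n, key) for n in holder_b]
    print(link_encoded(codes_a, codes_b))
```

The sketch also shows an obvious limitation: a single typing error 
changes the hash completely, so only exact matches (after normalisation) 
are found and approximate matching is lost.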

Dr Peter Christen is a lecturer at the Department of Computer Science 
at the Australian National University. He received his Diploma in 
computer science engineering from the ETH Zurich (Switzerland) in 
1995 and his PhD in computer science from the University of Basel 
(Switzerland) in 1999. His research interests are data mining 
(especially data linkage and data pre-processing), high-performance 
computing, and most recently security and privacy preservation (in 
the context of data linkage and health informatics).

In the last four years his research has concentrated on the project 
"Investigation and Development of Parallel Large Scale Record Linkage 
Techniques", an ARC Linkage project conducted in collaboration with 
and partially funded by the NSW Department of Health.


Tom Worthington FACS HLM tom.worthington at tomw.net.au Ph: 0419 496150
Director, Tomw Communications Pty Ltd            ABN: 17 088 714 309
PO Box 13, Belconnen ACT 2617                http://www.tomw.net.au/
Visiting Fellow, ANU      Blog: http://www.tomw.net.au/blog/atom.xml  
