
wanted: software to identify "close" matches in a dataset of names (either individuals or companies).


    Posted 10-08-2009 12:03

    Andrew, I have found ASAP Utilities (a plug-in for Excel) helpful. You can download it free at http://www.asap-utilities.com/ if your datasets are in Excel.

     

    Hope this is helpful.

    Richard Laramy

    Consultant

    The Expert Knowledge Network

    www.utakethecredit.com

    512-252-3070

     

    From: Business Policy and Strategy List [mailto:BPS-NET@AOMLISTS.PACE.EDU] On Behalf Of Andrew Von Nordenflycht
    Sent: Wednesday, October 07, 2009 10:33 PM
    To: BPS-NET@AOMLISTS.PACE.EDU
    Subject: wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

     

    I have several large datasets containing names of companies and individual people. The companies or people can and do appear multiple times (e.g., in different years) and I want to link all instances of the same name. This is easy when the match is exact.
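    The exact-match step can be sketched in a few lines of Python. This is a minimal sketch, not any particular tool: the helper names `normalize` and `group_exact` are hypothetical, and so are the normalization rules (lowercase, turn punctuation into spaces, collapse whitespace). They are chosen so that trivially different spellings such as "Merrill Lynch." and "Merrill Lynch" are linked before any fuzzy matching is attempted.

```python
import re
from collections import defaultdict

def normalize(name: str) -> str:
    """Hypothetical normalization: lowercase, punctuation -> space, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())

def group_exact(records):
    """Link records whose normalized names are identical (hypothetical helper).

    records: iterable of (record_id, name) pairs, e.g. one row per company-year.
    Returns a dict mapping each normalized name to the list of its record ids.
    """
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[normalize(name)].append(rec_id)
    return dict(groups)

records = [(1, "John Smith"), (2, "john  smith"), (3, "Merrill Lynch"), (4, "Merrill Lynch.")]
print(group_exact(records))  # {'john smith': [1, 2], 'merrill lynch': [3, 4]}
```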

    However, for a variety of reasons, such as typos or 'nicknames', there are also many "close" matches – where the text does not match exactly but is very likely to refer to the same entity (e.g., "Jhon Smith" vs. "John Smith" or "Merrill Lynch" vs. "Merrill Lynch Fenner Smith").
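    The two examples call for different treatment. "Jhon Smith" is a typo that character-level comparison catches, but "Merrill Lynch Fenner Smith" adds 13 characters (" Fenner Smith") to "Merrill Lynch", so counting differing characters would rank it as a poor match. One hedged alternative, shown as a minimal sketch with a hypothetical `is_prefix_variant` rule, is to compare whole words and flag a pair when the shorter name's words are the leading words of the longer one. Such a rule will also flag some pairs that are different entities, so its output is only a suggestion for review, and it does not cover nicknames, which would need a separate lookup table.

```python
import re

def tokens(name: str) -> list[str]:
    """Lowercased word tokens; punctuation such as ',' or '&' is dropped."""
    return re.findall(r"\w+", name.lower())

def is_prefix_variant(a: str, b: str) -> bool:
    """Hypothetical rule: True if the shorter name's tokens start the longer name."""
    ta, tb = tokens(a), tokens(b)
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    # An empty name should never count as a match.
    return len(short) > 0 and long_[: len(short)] == short

print(is_prefix_variant("Merrill Lynch", "Merrill Lynch Fenner Smith"))  # True
print(is_prefix_variant("John Smith", "John Smithers"))                   # False
```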

    My goal is to identify these close matches systematically, without going over the data manually. I presume the main function of such a program or algorithm would be to identify "all but 1 character" matches, then "all but 2 characters" matches, and so on. Preferably the program would suggest close matches and let me decide whether they are a match.
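    The "all but k characters" idea corresponds to an edit distance between two strings. The sketch below illustrates that idea under stated assumptions and does not describe any particular package. It uses the optimal string alignment variant of Levenshtein distance, which counts a swap of two adjacent letters as one edit (plain Levenshtein scores "Jhon" vs "John" as 2). The helper names `osa_distance` and `candidate_pairs` and the `max_distance` cutoff are hypothetical. Pairs come back closest first, so they can be reviewed by hand one tier at a time.

```python
from itertools import combinations

def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions,
    and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]

def candidate_pairs(names, max_distance=2):
    """Hypothetical helper: all distinct name pairs within max_distance edits,
    as (distance, name1, name2) tuples sorted closest first."""
    pairs = []
    for x, y in combinations(sorted(set(names)), 2):
        dist = osa_distance(x.lower(), y.lower())  # ignore case differences
        if dist <= max_distance:
            pairs.append((dist, x, y))
    return sorted(pairs)

names = ["John Smith", "Jhon Smith", "Jon Smith", "Jane Smith", "Merrill Lynch"]
for dist, x, y in candidate_pairs(names):
    print(dist, x, "<->", y)
```

    With these sample names, each pair among "Jhon Smith", "John Smith" and "Jon Smith" comes out at distance 1, and "Jane Smith" vs "Jon Smith" at distance 2, a likely false positive that shows why the final decision is left to a person. Comparing every pair grows quadratically with the number of names, so on a large dataset one would usually compare only names that share some cheap key, such as the first letter (a "blocking" step, not shown).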

    Any ideas on useful software for this task would be appreciated.

     

     

    Andrew von Nordenflycht

    Assistant Professor, Strategy

    Simon Fraser University

    vonetc@sfu.ca

     

    View my research on my SSRN Author page:
    http://ssrn.com/author=100363