wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

1. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Andrew von Nordenflycht
Posted 10-07-2009 23:33
I have several large datasets containing names of companies and individual people. The companies or people can and do appear multiple times (e.g., in different years) and I want to link all instances of the same name. This is easy when the match is exact.

However, for a variety of reasons, such as typos or 'nicknames', there are also many "close" matches – where the text does not match exactly but is very likely to refer to the same entity (e.g., "Jhon Smith" vs. "John Smith" or "Merrill Lynch" vs. "Merrill Lynch Fenner Smith").

My goal is to identify these close matches in a systematic way without manually going over the data. I presume the main function of such a program or algorithm would be to identify "all but 1 character" matches, and then "all but 2 character matches", etc. Preferably the program would suggest close matches and let me decide if they are matched.
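One way to read the "all but 1 character", "all but 2 characters" idea is as an edit-distance threshold. The following is a minimal sketch in plain Python, not a reference to any particular product; the function names edit_distance and suggest_close_matches are hypothetical and chosen only for illustration. It compares every pair of distinct names, so it is only practical for modest lists.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the working row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest_close_matches(names, max_distance=2):
    """Return (distance, name_a, name_b) for every pair of distinct names
    within max_distance edits, for a person to accept or reject."""
    suggestions = []
    for a, b in combinations(sorted(set(names)), 2):
        d = edit_distance(a.lower(), b.lower())
        if d <= max_distance:
            suggestions.append((d, a, b))
    return sorted(suggestions)


if __name__ == "__main__":
    sample = ["Jhon Smith", "John Smith",
              "Merrill Lynch", "Merrill Lynch Fenner Smith"]
    for distance, a, b in suggest_close_matches(sample):
        print(distance, a, "<->", b)
```

Note that two swapped letters ("Jhon" vs. "John") count as two edits under this measure, and that a pair like "Merrill Lynch" vs. "Merrill Lynch Fenner Smith" is many edits apart, so a character-count threshold alone would not suggest it.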

Any ideas on useful software for this task would be appreciated.

Andrew von Nordenflycht

Assistant Professor, Strategy

Simon Fraser University

vonetc@sfu.ca

View my research on my SSRN Author page:
http://ssrn.com/author=100363
2. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-08-2009 06:10
I have not yet used it, but there is a program called DDupe that looks promising. I just finished cleaning a dataset by hand, then found it. It can be found at:

http://www.cs.umd.edu/projects/linqs/ddupe/

Perhaps this will help.

--
***********************************************
Thomas E. Nelson
University of Louisville Entrepreneurship PhD Candidate
Office: 502.852.4874
Home: 812.944.8380
Cell: 765.212.1012
***********************************************
My greatest hope is to be a man of unborrowed vision

***********************************************
3. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-08-2009 08:59
unfortunately, the only software I have ever found to work viably in such cases is eyeballs.

will
-------------------------------------------------------------------------------------
Will Mitchell
J. Rex Fuqua Professor of International Management, Professor of Strategy
Duke University, The Fuqua School of Business
Phone: 1.919.660.7994 | Fax: 1.919.681.6244 | email: will.mitchell@duke.edu | URL: willmitchell.org

Watch our video at www.fuqua.duke.edu/wakeup
4. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Mine Ozer
Posted 10-08-2009 09:26
Andrew,

I think Stata might be helpful for merging the different datasets.

Mine Ozer, Ph.D.

Assistant Professor of Management

Division of Economics and Business

SUNY Oneonta

Oneonta, NY 13820

Phone: 607-436-3047

5. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-08-2009 13:06
Andrew, I'd largely agree with Will (having done name matching with him on one project) that eyeballs are essential and probably the best tool for any database on the small end of large. For really huge datasets, I would suggest Access plus some SQL code.

Access used to even have its own fuzzy matching routine, though I'm not sure that it does anymore. But if you google around for name matching & access, fuzzy matching & access, and other similar terms you'll find routines and suggestions for coding this sort of matching process. Companies do this often with their mailing lists - though not enough judging by the stream of identical catalogs to my doorstep.

Charlie

--
Charles Williams, Asst. Professor of Strategy
Fuqua School of Business, Duke University
P.O. Box 90120, Durham, NC 27708
tel: 919.660.7963 // fax: 919.681.6244
6. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Hong Cui
Posted 10-08-2009 13:33
Andrew,

I just did some fuzzy matching using Excel and eyeballs. I would have used SAS if my dataset were larger. SAS has several procedures that allow you to "customize" your matching routines. But eyeballing the data before and after any procedure you use is necessary.

Victor
UBC

7. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-08-2009 15:20
Andrew,

I had a similar problem. My dataset is large, so I do not use Excel. I
find the best method is to match directly.

Here is the pseudo-code:

1. Match one-to-one using the name fields. Remove the matches from the
sample and store them as 'matched.'

2. Find any common identifiers (dates, locations, etc.) and use these to
match the 'unmatched' records as closely as possible using many-to-many,
e.g., match all records with the same year. Your merged database will get
very big at this point.

3. Loop:

a. Remove a common word from one of the name fields,
e.g., remove "Merrill" from name field a:

(Name field a: "Merrill Lynch" & Name field b: "Merrill Lynch Fenner
Smith") becomes (Name field a: "Lynch" & Name field b: "Merrill Lynch
Fenner Smith").

b. Check whether name field b contains name field a,
e.g., "Lynch" is contained in "Merrill Lynch Fenner Smith."

c. Put the record in the 'matched' location.

d. Repeat.

You will have to go through your databases and find 'common names'
manually. Don't worry - I have a large database (> 100 k records) and
it did not take that long to create the 'common name list.' I found
that the best method was to check the 'unmatched' database after each
run and see if there were any 'common names' left over.
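The pseudo-code above could be sketched roughly as follows. This is only one possible reading, written in Python rather than whatever tool was actually used. It assumes each record is a (name, year) pair with year as the common identifier, and the helper names strip_common_words, contains_words, and match_names are hypothetical.

```python
def strip_common_words(name, common_words):
    """Lower-case a name and drop words on the hand-built common-word list."""
    return " ".join(w for w in name.lower().split() if w not in common_words)


def contains_words(longer, shorter):
    """True if `shorter` appears inside `longer` as a run of whole words."""
    return bool(shorter) and f" {shorter} " in f" {longer} "


def match_names(left, right, common_words):
    """left, right: lists of (name, year) pairs.
    Returns (matched, unmatched): matched holds (left_name, right_name)
    pairs, unmatched holds the left names still needing manual review."""
    exact_names = {name for name, _ in right}
    by_year = {}  # step 2: only compare records that share an identifier
    for name, year in right:
        by_year.setdefault(year, []).append(name)

    matched, unmatched = [], []
    for name, year in left:
        if name in exact_names:  # step 1: one-to-one exact match
            matched.append((name, name))
            continue
        key = strip_common_words(name, common_words)  # step 3a
        hit = None
        for candidate in by_year.get(year, []):
            if contains_words(candidate.lower(), key):  # step 3b
                hit = candidate
                break
        if hit is None:
            unmatched.append(name)
        else:
            matched.append((name, hit))  # step 3c
    return matched, unmatched


if __name__ == "__main__":
    left = [("John Smith", 2001), ("Merrill Lynch", 2001), ("Acme Corp", 2002)]
    right = [("John Smith", 2003), ("Merrill Lynch Fenner Smith", 2001),
             ("Apex Ltd", 2002)]
    print(match_names(left, right, common_words={"merrill"}))
```

Containment on a single remaining word is a loose test, so the 'matched' output of a sketch like this would still need a visual check.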

I hope that helps.

David Maslach
University of Western Ontario

8. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Mike Toffel
Posted 10-09-2009 06:48
Andrew,

I've been able to match company names and addresses that are spelled differently (as you describe) across various large datasets using MatchIT software. It also does fuzzy matching of individuals' names.

MatchIT is a bit complicated to learn to use, and does require some "tuning" based on the particulars of the datasets. But once you've got the hang of it, it works very well and quickly. Details are here: http://helpit.com/folders/software_solutions/batch_data_quality_us/

Because it's not cheap, though, our centralized IT group purchased it and makes it available to faculty in the IT lab.

-Mike

-----------------------------------------------------------
Michael Toffel
Assistant Professor | Harvard Business School
Morgan Hall 497 | Boston MA 02163
tel +1 (617) 384-8043
fax +1 (206) 339-7123
http://people.hbs.edu/mtoffel/

9. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-09-2009 06:57
Andrew,
I highly recommend the Data Quality Server in SAS. I am not the most comfortable programming in SAS, but found this essential for a match of names and addresses of companies across two very large (100,000+ observations) data sets.

You will want to standardize names as much as possible before running the match, but the program lets you do a lot of great things, like matching on the "sound" of the name, increasingly relaxing the spelling, etc. It is helpful to go in cycles, where you do a very exact string match first, then take those records out, match up a bit more loosely, etc. Then, on the remaining ones, I agree with the "eyeballs" folks. (Also spot-check the automated work visually).
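To illustrate the "cycles" idea outside SAS, here is a rough Python sketch that runs an exact pass first and then a looser sound-based pass on whatever is left. It is not the SAS Data Quality Server API. It uses a plain implementation of the standard American Soundex code as a stand-in for matching on the "sound" of a name, and the function names soundex, phonetic_key, and match_in_passes are hypothetical.

```python
_CODES = {}
for _digit, _letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(word):
    """American Soundex: first letter plus three digits, e.g. Robert -> R163."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (result + "000")[:4]


def phonetic_key(name):
    """Soundex code of each word in the name, joined by spaces."""
    return " ".join(soundex(w) for w in name.split())


def match_in_passes(left, right):
    """Match strictly first, then more loosely on what remains.
    Returns (matched, remaining); matched holds (left, right, pass_label)."""
    passes = [
        ("exact", lambda s: " ".join(s.lower().split())),
        ("sound", phonetic_key),
    ]
    matched, remaining = [], list(left)
    for label, key in passes:
        index = {}
        for name in right:
            index.setdefault(key(name), name)
        still_unmatched = []
        for name in remaining:
            hit = index.get(key(name))
            if hit is None:
                still_unmatched.append(name)
            else:
                matched.append((name, hit, label))
        remaining = still_unmatched  # only these go on to the looser pass
    return matched, remaining


if __name__ == "__main__":
    print(match_in_passes(["John Smith", "Jhon Smyth", "Acme Widgets"],
                          ["john smith", "Apex Tools"]))
```

Recording which pass produced each match makes it easier to spot-check the looser matches more heavily than the exact ones.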

One note of caution with the "eyeballs" approach -- if you ever have to go back and re-do your work, you will have to do the eyeballs part again. So, automate as much as possible!

Happy to discuss offline,
Kristina

10. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-09-2009 07:16
Andrew,

A forthcoming paper in Research Policy that compares different heuristics for patent retrieval might also be of interest to you (http://dx.doi.org/10.1016/j.respol.2009.08.001).

Best regards,
Marcel

Marcel Bogers, Ph.D.
University of Southern Denmark
Alsion 2, 6400 Sønderborg, Denmark
Phone: +45 6550 1284
E-mail: bogers@mci.sdu.dk
URL: www.marcelbogers.com

11. wanted: software to identify "close" matches in a dataset of names (either individuals or companies).

Archive User
Posted 10-09-2009 10:57
Andrew,

I used Stata to match names in large datasets (>100K) using the following steps.

First, "standardize" all the names. This may include, e.g., changing all characters to capitals, ensuring there is only one space between any two words, and dropping any special signs or marks, such as " : , etc.

Second, start with the "perfect" matches. That is, deal with those names that can be matched exactly by machine.

Third, have "key word" match for the rest. For example, you can start with matching the first 5 words, then matching the first 4 words, then matching the first 3 words, etc. It would be helpful if you check your dataset beforehand and learn about any "regularities" in your dataset. For example, you may decide to drop all non-essential words in the names, such as "The" "A", etc. With this step, you'll have a matched list for instances like "Jhon Smith" vs. "John Smith" or "Merrill Lynch" vs. "Merrill Lynch Fenner Smith". You do need eyeballing, unfortunately.
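The three steps could look roughly like this in Python (a sketch under assumptions, not the Stata code actually used). The names standardize, first_words, and keyword_match are hypothetical, and the non-essential word list holds only the two examples given above.

```python
import re

NON_ESSENTIAL = {"THE", "A"}  # extend after inspecting your own data


def standardize(name):
    """Upper-case, drop punctuation, collapse spaces, drop filler words."""
    cleaned = re.sub(r"[^\w\s]|_", " ", name.upper())
    return " ".join(w for w in cleaned.split() if w not in NON_ESSENTIAL)


def first_words(name, k):
    """The first k words of an already standardized name."""
    return " ".join(name.split()[:k])


def keyword_match(left, right, max_words=5, min_words=2):
    """Exact match on standardized names, then on the first 5, 4, ... words.
    Returns (matched, remaining); matched holds (left, right, rule)."""
    rules = [("exact", lambda s: s)]
    for k in range(max_words, min_words - 1, -1):
        rules.append((f"first {k} words", lambda s, k=k: first_words(s, k)))

    matched, remaining = [], list(left)
    for label, key in rules:
        index = {}
        for name in right:
            index.setdefault(key(standardize(name)), name)
        unmatched = []
        for name in remaining:
            k = key(standardize(name))
            hit = index.get(k) if k else None  # never match on an empty key
            if hit is None:
                unmatched.append(name)
            else:
                matched.append((name, hit, label))
        remaining = unmatched
    return matched, remaining


if __name__ == "__main__":
    left = ["The Merrill Lynch", "Acme, Inc.", "Apex Tools"]
    right = ["Merrill Lynch Fenner Smith", "ACME INC"]
    print(keyword_match(left, right))
```

In this sketch a pair such as "Merrill Lynch" and "Merrill Lynch Fenner Smith" is picked up at the two-word level, while a typo inside a word, as in "Jhon Smith", would still need a different check or eyeballing.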

Yong

www.buffalo.edu/~yl67

