Extracting matches when using blocking #63
Comments
I have a similar question. I figured out a workaround by using the number of indices in each block to figure out which block corresponds to which value of the blocking variable. From there, I found the matches within each block and then bound them together. This really only works when the blocking variable(s) have a small number of unique values. It would be great to have a more systematic option. Thank you for making and maintaining such a great package. (This is the same example as above with my approach pasted at the bottom; note that I dropped gender from varnames to make the merge work.)
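For readers without the original code, the per-block idea can be sketched roughly as follows. This is not the commenter's code, only a minimal illustration under assumptions: `dfA` and `dfB` are data frames already in the session, and the names in `link_vars` and `block_var` are hypothetical placeholders for your own columns. It uses the `dfA.inds` and `dfB.inds` elements that `blockData()` returns for each block to map within-block match positions back to row numbers in the full data, so there is no need to guess which block is which from its size.

```r
library(fastLink)

# Assumptions (not from the thread): dfA and dfB exist and share these columns.
link_vars <- c("firstname", "lastname", "birthyear")  # hypothetical linkage variables
block_var <- "gender"                                 # hypothetical blocking variable

# One list element per block, each holding row indices into dfA and dfB.
blocks <- blockData(dfA, dfB, varnames = block_var)

match_one_block <- function(blk) {
  a_sub <- dfA[blk$dfA.inds, ]
  b_sub <- dfB[blk$dfB.inds, ]
  fl_out <- fastLink(a_sub, b_sub, varnames = link_vars)
  # fl_out$matches holds positions within the subsets; translate them
  # back to row numbers in the original dfA and dfB.
  data.frame(
    inds.a = blk$dfA.inds[fl_out$matches$inds.a],
    inds.b = blk$dfB.inds[fl_out$matches$inds.b]
  )
}

# Stack the matched index pairs from all blocks into one data frame.
matched <- do.call(rbind, lapply(blocks, match_one_block))
```

The matched records can then be pulled with `dfA[matched$inds.a, ]` and `dfB[matched$inds.b, ]`. Blocks that are empty or very small on one side may cause `fastLink()` to fail, so in practice you may need to skip or merge such blocks first.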
Disclaimer: I am a regular fastLink user, not a fastLink developer. @jamesmartherus @bengoehring Are you both "merely" asking how to extract matches when using blocking? I know how to do that. But I am not sure how to relate to your very specific code, which seems complicated and convoluted to me.
Yes.
I wrote "merely" in quotation marks because this is a known issue: Ted helped me with a similar question a few years ago, and thanks to him I regularly use code similar to the example below. It should be much simpler to do this in
The confusion table of the example should look like this:
Hi there, I am trying to identify duplicates in a large dataset. I am blocking on several variables, aggregating with `aggregateEM()` and then trying to extract the matches with `getMatches()`. It looks like `getMatches()` won't work with the `fastLink.aggregate` class. Is there some other way to get the same functionality?
Reprex: