Arman, posted November 16, 2005

In a project I have a data set 1 with more than 15,000 people. The same project has a second data set 2, imported from another researcher, that contains about 95% of the same people, but several of the people in set 2 (about 750) are not in set 1 yet.

My first question is how to find out which people are in data set 2 and not in data set 1. (Use some filter?)

My second question is how I can merge these people from data set 2 into set 1 without copying them one by one. (I'm using TMG v6.05 and I am aware of the merge fundamentals and the various tips by Terry.)

Any help is appreciated.
RobinL, posted November 16, 2005

Arman,

If you want to retain the data about each person completely and correctly, the short answer is NO. There are several issues that you should be aware of before you assume that this merge can be done _completely automatically_. You say that the data was entered by a second person (and is not derived from a single master copy that has been independently updated), so there are likely to be different spellings of names, places, sources and citations, and different exhibits attached.

To "merge" these data sets you need to assert that person 1:A (data set 1, person A) and person 2:B are the same, and that you want to merge their events together. But 1:A is married to 1:M and has children 1:D and 1:E, while 2:B is married to 2:N and has children 2:F and 2:G. In addition, 1:A and 1:M have parents in data set 1, and 2:B and 2:N have parents in data set 2. Thus merging 1:A and 2:B still does not resolve the relationship links.

It would be possible to design some software aids that would make this process easier, but in the end a bulk merge operation *will require* human intervention to confirm or modify suggested actions on a case-by-case basis. Such a merge is a bit like a chain reaction or avalanche - a small change to a duplicated identity can imply many more possible changes.

There are a number of aids within TMG that you can use. The most important are flags and the accenting of the Expanded Picklist, which help you manage the process.

ASIDE: I am currently merging 4 data sets, each of about 3000 persons, in which there are about 2400 persons that appear in all data sets, but many who are only in 1, 2, or 3 data sets. I have found a tedious but accurate technique. It requires a sequenced selective merging process to reduce the combined data set down to one ID number per person, then a second pass through this reduced number (approx.
4000) to deal with and consolidate the multiple duplicate relationships, duplicate names, duplicated events, etc.

You need to create a custom flag for each data set and set it for the members of that data set. Then you need to create a new project with all the data sets combined into a single data set. Set up a custom flag in this project with 2 more states than the original number of data sets (if originally you have 2 data sets, then states 1, 2, M (merged) and C (cleaned)). Set this flag to 1 when a person is from data set 1, to 2 when from data set 2, etc.

Now set up an accent scheme with 4 distinct colours for states 1, 2, M and C of this flag. Look at the Expanded Picklist sorted by surname and this will show you bi-colour pairs that are potential merge candidates; the accenting should be in the parents and spouse columns as well. Confirm the potential to merge these persons by looking at the side-by-side merge persons screen. If you are happy, merge them into the lower ID person (but don't eliminate any tags, especially any Marriage or Child relationships). The Detail View will now have mixed accents on it. Change the custom flag of that person from 1 to M - this marks the person as having been merged.

Once you have dealt with the obvious merge candidates, look at the remainder. If you find a person who does not appear to be in both data sets, change their custom flag to C (as there is no cleaning action needed). You will also find persons that you think are duplicates even though it doesn't seem to make sense (because the surname was totally misspelt in one data set relative to the other - in my case one Germanic surname began with "F" in one and "P" in the other - Soundex doesn't help here!).

When you have completed this process over all persons (that is, the Expanded Picklist shows only the colours for state M or C of your flag), then you start on the next process.
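As a rough illustration of what the bi-colour pairs in the sorted picklist amount to, here is a minimal Python sketch. It assumes the combined data has been exported to simple records; the field names (`id`, `surname`, `given`, `flag`) and the sample people are invented for the example and are not TMG structures. It only proposes candidates, each of which still needs to be confirmed by eye as described above.

```python
from collections import defaultdict

# Invented sample records; "flag" holds the source data set ("1" or "2").
people = [
    {"id": 1, "surname": "Fischer", "given": "Johann", "flag": "1"},
    {"id": 2, "surname": "Fischer", "given": "Johann", "flag": "2"},
    {"id": 3, "surname": "Pischer", "given": "Anna", "flag": "2"},
    {"id": 4, "surname": "Meyer", "given": "Karl", "flag": "1"},
]


def merge_candidates(people):
    """Return sorted (lower_id, higher_id) pairs of people who share a
    normalised name but come from different data sets."""
    by_name = defaultdict(list)
    for p in people:
        key = (p["surname"].strip().lower(), p["given"].strip().lower())
        by_name[key].append(p)

    pairs = []
    for group in by_name.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a["flag"] != b["flag"]:  # a "bi-colour" pair
                    lo, hi = sorted((a["id"], b["id"]))
                    pairs.append((lo, hi))
    return sorted(pairs)


print(merge_candidates(people))  # [(1, 2)]
```

Exact name matching is deliberately naive here: a surname misspelt with a different first letter will not pair up, which is the same limitation noted above for the picklist.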
For persons at state M, examine their Detail View and consolidate the tag information (usually this only requires deleting duplicates, but beware of differences in sureties, citations, exhibits, witnesses, and sentences in tags that display identically in the Detail View). When you complete this operation, change their flag value to C.

As you can't be sure in which data set the persons involved in inter-cousin marriages, or acting as witnesses, will be found, I would advise _NOT_ trying to pick out a small set of persons in one data set and progressively running many smaller cycles of this process, one family line at a time. You are much less likely to break relationship links if you combine all your data into a single, but jumbled, data set and then work methodically through it.

It is possible to check the relationship differences between data sets for specific family lines by creating a Descendant Indented Chart for nominally the same person in each data set - use the end-of-line person as the focus.
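The flag bookkeeping in this method behaves like a small state machine: 1 or 2 moves to M after a merge, M moves to C after cleaning, and 1 or 2 moves straight to C when no duplicate exists. A minimal Python sketch of that bookkeeping follows, kept outside TMG purely for illustration. The function names and the dictionary of person IDs are hypothetical.

```python
# Allowed moves of the custom flag, as described in the post.
ALLOWED = {
    "1": {"M", "C"},
    "2": {"M", "C"},
    "M": {"C"},
    "C": set(),
}


def set_flag(flags, person_id, new_state):
    """Change one person's flag, refusing moves the method does not use."""
    current = flags[person_id]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"person {person_id}: cannot go from {current} to {new_state}")
    flags[person_id] = new_state


def first_pass_done(flags):
    """True when every person is at M or C (merging pass finished)."""
    return all(state in {"M", "C"} for state in flags.values())


def all_clean(flags):
    """True when every person is at C (second pass finished)."""
    return all(state == "C" for state in flags.values())


# Hypothetical survivors after person 2 was merged into person 1.
flags = {1: "1", 3: "2", 4: "1"}
set_flag(flags, 1, "M")  # merged, still needs tag clean-up
set_flag(flags, 3, "C")  # only in one data set, nothing to clean
set_flag(flags, 4, "C")
print(first_pass_done(flags), all_clean(flags))  # True False
```

Refusing any move out of C is a design choice of the sketch, not something the method requires; it simply makes accidental re-processing visible.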
Arman, posted November 17, 2005

Hello Robin,

Thanks for your help. I will study it carefully and try to follow your advice.
RobinL, posted November 17, 2005

You're welcome!

I am further refining my merging process to reduce the number of mouse clicks and keystrokes, and the number of revisits to the same combined person. The order in which persons are merged can significantly shorten the total processing time.

I have also added another state, "Q" (query), to my custom flag. It marks persons whose data contains potential errors that I can see but do not want to investigate yet, so as not to disturb the flow of the merging process. For instance, I found a person who was born in 1788, had a child in 1910, and died in 1972 - something fishy here!

I need to write up my method so that other persons can continue the merging process on the same data set. Once that has been done I will put it up on a web page for others to view.
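The kind of problem that earns a "Q" flag can be spotted mechanically. The Python sketch below is only an illustration: the thresholds (a 110-year maximum lifespan, parent age between 12 and 80) and the function name are assumptions chosen for the example, and the check works on plain years rather than on TMG date fields.

```python
# Assumed plausibility limits; adjust to taste.
MAX_LIFESPAN = 110
MIN_PARENT_AGE = 12
MAX_PARENT_AGE = 80


def date_queries(birth, death=None, child_births=()):
    """Return a list of reasons why these years look implausible."""
    problems = []
    if death is not None:
        if death < birth:
            problems.append("died before birth")
        elif death - birth > MAX_LIFESPAN:
            problems.append(f"lifespan of {death - birth} years")
    for child_birth in child_births:
        age = child_birth - birth
        if age < MIN_PARENT_AGE or age > MAX_PARENT_AGE:
            problems.append(f"child born when parent was {age}")
    return problems


# The example from the post: born 1788, child in 1910, died 1972.
print(date_queries(1788, 1972, [1910]))
# ['lifespan of 184 years', 'child born when parent was 122']
```

Any person for whom the list is non-empty would be a candidate for the Q state, to be investigated after the merging pass.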