TMGnewbie

Import GEDCOM problems


I am brand new to TMG and imported a GEDCOM to it. First error I noticed is that all the sources were labelled as source type "Book (authored)" even those clearly listed as Social Security applications, interviews, etc.

 

Why is that? It would be a major undertaking to go over everything, sources and all, again.


Welcome to TMG! :)

 

GEDCOM doesn't identify the type of source, mainly, I suppose, because there is no agreement between genealogy programs as to how many types there are and what those types should be called. TMG does not try to "decode" the source type from the text of the source description in the GEDCOM. I suppose some types might be obvious, but many would not be.
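
To see why, here is roughly what a source record looks like inside a GEDCOM file (a generic GEDCOM 5.5 example, not taken from your file). There is a title, an author, perhaps a publication line, but no field anywhere that says "this is a book" or "this is an interview":

  0 @S1@ SOUR
  1 TITL Social Security application of John Doe
  1 AUTH Social Security Administration
  1 PUBL n.p.
  1 NOTE Copy obtained by the family in 1998.

With nothing to go on, the import just assigns one default source type to everything, which is why yours all came in as "Book (authored)".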

 

First, look at the actual notes produced by TMG after the import. You can do that by printing a report with sources, or going to the individual sources and clicking Preview in the Output Form tab. I suspect you will find that many read OK despite being assigned a source type that is not optimal.

 

For those that need some editing, you don't have to "go over everything" but just fix the specific sources that need work. Open the source, click the Source Type button, and choose a better source type. You may find that some of the information entered needs to be in different Source Elements - use Windows cut and paste to transfer it to the appropriate field. Use the Preview button on the Output Form to check your work.

 

Most users of TMG find that they want to revisit imported sources over time to better format them to take advantage of TMG's advanced source recording features. But unless you need to produce a polished report right away, there is no need to do that in a rush. You might want to look at the article on Importing on my website (link below) for some suggestions about how to go about cleaning up after an import from another program.

 

Finally, I assume you used GEDCOM because there is no direct import from the program you used before? Generally, a direct import gives better results if one is available.


I have the current 6.12 gold and am importing a very large 300,000-name GEDCOM into a new project; this will be the only file in there. There may be better ways, but GEDCOMs are what ROOTSWEB.COM has. It brought in the file and did most of what it needed to do in 4 hours, but it seems to be busy at 99% of "Updating inferred names". That has gone on for 10 more hours now; I'll leave it to run overnight. Nothing else is running on that computer. I tried once before and shut the import down after 2 days. This is a fairly fast AMD 2.4 GHz computer. It had saved nothing at that point, and the project was still empty. Memory usage is 186,244K, down from a peak of 1,099,316K in early processing; it cycles 4K in and out now and then. The computer has lots of hard-drive space and 3 GB of memory on Windows XP.

 

Anyhow, it writes a random something to disk every few seconds and does not appear to be looping. It may just be a VERRRY SLOW process. I saw a genealogy newsletter article about 6.0.2 having problems with inferred names from GEDCOMs. Page faults are still going up, 35 million so far. Not much else is happening.


It is still running after going all night. CPU time is over 22 hours. It is still at 99%, updating inferred names. Page faults are a little over 500 per second, up to 53 million now.

 

Any ideas?


300,000 is a lot of people. Can you send me the GEDCOM for testing? If a ZIP archive of the GEDCOM is too large for email, I'll tell you how to upload it to a Wholly Genes server.

 

Please contact me directly. Click below for my email:

Jim Byram


 

Not yet. I'm heading out tomorrow morning for a few days. The e-mail is on the computer that is still running the import.

I'm on dialup. It took a long time to download the GEDCOM, and it would take an equally long time to send it anywhere. If you still need me to do that, it would have to be next week. Thanks for the prompt response.

 

It's actually larger than that, since he recently updated it: 468,716 people. I recall trying the older version with a similar problem. I haven't had any problems importing smaller GEDCOMs, except for the standard GEDCOM omissions like sources.

 

stobie.zip 27,744,214 bytes

stobie.ged 133,419,582 bytes (after unzipping)

 

The stobie database on RootsWeb is here. I think I have it set so that the "download the full gedcom.zip" link shows up on the page. (It does.)

It should be available when looking at any name.

 

http://wc.rootsweb.com/cgi-bin/igm.cgi?op=...e&id=I90310

 

My own master file has over 600,000 people, but 50% or more of them need to be merged and combined before merging it with anything else. It is very time-consuming.

Edited by Jim Byram


Downloaded the GEDCOM and will take a look.


 

Jim,

 

If it helps any, I also downloaded the file and began importing it at 1:30 AM Thursday morning. It is now 12:37 AM (23 hours later) and I'm at 85% (importing families), with 466,862 people, 18 headers, and 14,997 warnings so far.

 

I'm importing into TMG 4.0d on a .997 GHz Pentium III with 512 MB of RAM (and yes, I'm a glutton for punishment! <G>)


I'm sure that the import speed could be shortened considerably by optimizing the import code. The problem is that the effort would take a very long time to do (think on the order of a week or more) and since files of this nature are so rare, it's difficult to justify the developer time required to do this.

 

I still haven't gotten to the import. I'll set it up on another machine this morning and let it run until it's done.

 

%%%%%%%%%%

 

The import of stobie.ged on my notebook under TMG7 took about 5.25 hours.

 

The import under TMG v6.12 on a slower machine took between 12.75 and 13.75 hours.

 

The disk space used by the indexed v6.12 project is about 2.05 GB and the backup file size is about 95 MB. While you can work with the project even with the Project Explorer linked to the other windows, a project of this size is really too sluggish to be practical. I'm doing some cleanup and looking at the project further.

Edited by Jim Byram


Jim,

Even your "slow" machine is faster than mine.

 

On mine, the import into v4.0d took 59 hr, 15 min, 11 sec (although I'm very surprised at the rather quick response time when changing views; I've not yet had the courage to try to add something <g>).


(I'm back for a couple of weeks.) I am seeing that some of you got this to work.

 

After thinking about this a bit more, I wonder whether the specific GEDCOM options set in the advanced wizard might have something to do with whether it "gazinta" TMG or not.

 

I have been trying to import everything with this combination. I could always cut back if I had to:

 

x create DIVorced flag

x assume marriage of parents

 

x convert widowed SOURce tags

x combine identical sources

x cite GEDCOM file for all data

x read NPFX/GIVN/SURN/NSFX names

 

page 2

 

x reserve REFERENCE field for reference (REFN)

 

------------------

files of this nature are so rare

------------------

 

With more info being published and more connections being established, they should not be as rare as you might think. Many, many families descended from early immigrants to the US have one or more royal links to Edward III and the Plantagenets. That sufficed for me, and is about all it takes to get an ancestry with thousands of confusingly cross-linked names (included in this GEDCOM and my own database). I also see the Stuarts of Scotland (also in there and in my own database). I looked at it on the original RootsWeb site but don't yet know where the bulk of the stobie info is coming from. It looks like it still serves well as a giant test GEDCOM, eh?

Edited by retsof


You have to be real careful about selecting options that don't apply to the GEDCOM.

x convert widowed SOURce tags <== don't do this, not required

x combine identical sources <== I wouldn't do that

x read NPFX/GIVN/SURN/NSFX names <== probably not required?, need to check

x cite GEDCOM file for all data <== results in over a half million citations being created but good choice

 

On Step 7, you also need to create custom TMG tag types or assign to existing tag types all of the unassigned and event-misc tags. This is a very important step that saves considerable post-GEDCOM import cleanup. There are over two dozen such GEDCOM tags in this import.
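
For anyone who hasn't seen what those look like, the unassigned tags are typically generic events and attributes along these lines (the record and values below are invented for illustration, not copied from stobie.ged):

  0 @I1@ INDI
  1 EVEN
  2 TYPE Relationship
  2 NOTE First cousin of the principal
  1 PROP Land grant of 100 acres
  2 DATE 1784

On Step 7 you decide, one tag at a time, whether each of these becomes a custom TMG tag type or is folded into an existing one.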

 

Running VFI on the imported project took all day and overnight; I don't know exactly how long. It had been running 17 hours when I went to bed and was completed when I got up.


Thanks, I wanted an opinion on those choices. I will check them.

 

On Step 7, you also need to create custom TMG tag types or assign to existing tag types all of the unassigned and event-misc tags. This is a very important step that saves considerable post-GEDCOM import cleanup. There are over two dozen such GEDCOM tags in this import.
Yes, I do that, as the mix of tags in different imports can be completely different. I noticed a couple of things, especially tags like military and christening, that don't seem to be in the list. My list has Milit-Beg and Milit-End. It asks me later whether I want to use these "standard" tags one by one for equivalence, and I say yes. "Military" was one of the after-the-fact tags; some military-type things can be equated to Milit-Beg. PROP, for property, comes up a lot, and I assign it to Residence. I also see a lot of different event types (like event-title and event-relationship), and I unassign them to fix later, since I want to preserve the types. Event-misc is the only one in the list; I keep the word in a memo and have to move the rest into it, since it usually ends up as a place. The event-misc I leave alone as it equates 1:1. I could make some custom tags, I suppose, but I'm trying to stick with the standard ones as much as possible. Some seem to creep in from other early imports that I might have missed.
x cite GEDCOM file for all data <== results in over a half million citations being created but good choice
Unfortunately, in a way, but I want to know where my data is coming from when I merge it.

Edited by retsof


I've got the import from stobie in a new project!

 

A grid computing project was still running in the background, and the import quit at 85%. Usually that isn't a problem, but this import wanted all of the memory I could give it. I stopped the grid computing for the next import, so it was the only application running.

 

Getting rid of these options probably had something to do with it.

convert widowed SOURce tags <== don't do this, not required

combine identical sources <== I wouldn't do that

 

The import ran about 6 hours, including 1 hour to check the inferred names at 99%. That is the portion that had run for a couple of days previously until I killed it and posted to this problem thread.

 

All of the other options were still selected, including making a citation to stobie.ged for all data. Thanks for the help. That shows that the GEDCOM was not corrupted, or at least not obviously. It makes a good test for very large imports.

 

x read NPFX/GIVN/SURN/NSFX names <== probably not required?, need to check

 

I left that one in there. It may have something to do with reading name variations other than the primary one.

Edited by retsof


Stobie.ged uses both the NAME structure and the NPFX/GIVN/SURN/NSFX structure so that option might pick up some name parts that would otherwise be skipped. The option likely adds to the total import time but you were correct to select it.
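
For illustration (the name and values here are made up, not copied from stobie.ged), a record that uses both structures looks something like this:

  0 @I1@ INDI
  1 NAME John /Stobie/
  2 NPFX Dr.
  2 GIVN John
  2 SURN Stobie
  2 NSFX Jr.

Without the option, the subordinate NPFX/GIVN/SURN/NSFX lines would presumably be skipped and only the 1 NAME line used; with it on, parts such as the "Dr." and "Jr." that appear only in those lines get picked up as well.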

 

The event-misc I leave alone as it equates 1:1.

You really need to assign all unassigned and all Event-Misc tags to custom tag types. Not assigning the Event-Misc tags will result in many different data types clumped under one tag type after import. The tag type list has 'Christning'; it was probably spelled like that at some point to keep it short.

 

So I imported.

Then I optimized (important to do this first). (File / Maintenance / Optimize)

Then I ran VFI. (File / Maintenance / Validate File Integrity)

Then I optimized again.

 

What took longer than running the import was running VFI on the imported project, which found a grand total of 58 table issues to clean up. So the import of this massive GEDCOM file left little residue.

 

To complete the import, all of the custom tag types that were created need to be edited to fix the sentences.

For example...

Change:

[P] Cremation

to:

[P] cremated

etc.
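
For context, TMG sentences are templates built from variables such as [P] for the principal, [D] for the date, and [L] for the place, with < > marking conditional text. So the full edit might look something like this, where the <[D]> <[L]> parts are assumed here for illustration and the actual default for an imported tag may differ:

Change:

[P] Cremation <[D]> <[L]>

to:

[P] cremated <[D]> <[L]>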

 

And then the Master Place List will need some alignment of places. And the sources and repositories (if they exist) will need editing. When that's done, you're in fairly decent shape. :wacko:

Then I optimized (important to do this first). (File / Maintenance / Optimize)

Then I ran VFI. (File / Maintenance / Validate File Integrity)

Then I optimized again.

 

The first optimize never mentions how much got optimized.

 

In other databases, skipping the first optimize, running VFI, and THEN optimizing SEEMS to save a lot of time, and the data is clean.

 

I have also noticed that after running John Cardinal's utility to combine a couple of sources (and coming back to delete the second source), the data definitely needs optimizing, and perhaps VFI. There must be some stubs out there that can go away.

 

You really need to assign all unassigned and all Event-Misc tags to custom tag types. Not assigning the Event-Misc tags will result in many different data types clumped under one tag type after import.
I misspoke slightly. Event-misc was assigned to event-misc. The others that were not caught were still event-misc, but the type was in the memo. I needed to move the places out anyway, but I might be able to do some of that in the Master Place List. Creating a custom tag would be cleaner, I see.

 

How do TMG custom tags work when trying to EXPORT this to a standard GEDCOM, then?

 

I hadn't worried much about sentence narrative, since I usually display in the pure data mode.

 

When that's done, you're in fairly decent shape.
I am? Only THEN can I merge it with another project, and start combining more duplicates. I won't be merging that one soon. I also notice that stobie has a UID in each record, which I don't care about either. I assume that they can go away.

 

I have had good luck in another database by marking the reference field in a way that shows I had been there and had checked the data, and which way I was going, assuming that this was a connected line in SOME WAY. The sequence seems to work well and tells me more than just setting a flag. I go up a male line (any male line) all the way to the top and come down with a child. When I see a wife that I had not checked, I will go there and follow her father's line all the way to the top. Only when I find a husband and wife of the same family both checked do I continue with a child. It's like leaving string on the way up, and leaving larger string on the way down. If I see any male ancestor, I immediately head up that way first. The way this method works, I can start ANYWHERE.

Edited by retsof

The first optimize never mentions how much got optimized.

It did for me and I've done that twice. If you never see the messagebox, the optimize hasn't completed. A considerable chunk was removed by the first optimize... mostly reductions in the index files, of course.

 

In other databases, skipping the first optimize, running VFI, and THEN optimizing SEEMS to save a lot of time, and the data is clean.

From experience, that's a bad practice.

 

How do TMG custom tags work when trying to EXPORT this to a standard GEDCOM, then?

Custom tags export by default as 1 EVEN 2 TYPE tags and that is how they should be exported.
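
In other words, a custom tag such as a Military event comes out in the GEDCOM along these lines (the date and place below are invented for illustration):

  1 EVEN
  2 TYPE Military
  2 DATE 3 JUL 1863
  2 PLAC Gettysburg, Adams, Pennsylvania

Any program that reads standard GEDCOM should at least be able to take that in as a generic event labelled "Military", even if it has no built-in Military tag of its own.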

 

I won't be merging that one soon. I also notice that stobie has a UID in each record, which I don't care about either. I assume that they can go away.

Delete the _UID tag type.
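
For what it's worth, those are just machine-generated record identifiers that some programs write into every INDI and FAM record, along these lines (the value below is invented):

  1 _UID 4A9C2E71B0D34F6E8A15C3D7E9F20B46

Some programs use them to match records between files; since you don't care about that here, nothing useful is lost by deleting the tag type.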


At the end of an import, it always asks whether I want to do an optimize. If I say yes, it merely finishes without comment. If I say no and come back to optimize, it finishes with a box telling me how much was saved.

 

I will try to watch an optimize (of something else) immediately after the import. I think I get a beep when the optimize finishes.

 

I looked at it a bit (sluggishly) and DID come back to run an ordinary optimize on the data yesterday, which finished with the box. It ran less than an hour.

 

I am validating it now, about 8 hours in (35% overall, 67% of checking principals who are not witnesses, which is always a very large step).

 

The principal check ended at about 37% after almost 10 hours, but the remainder only took about 5 minutes. I have also seen that ratio with other files.

 

The final optimize was then run. After a day's computation, file access and update times are faster.

Edited by retsof

It did for me and I've done that twice. If you never see the messagebox, the optimize hasn't completed. A considerable chunk was removed by the first optimize... mostly reductions in the index files, of course.
I need to clarify. This thread has been about importing a large file into a blank project. The problem happens after a dataset merge of an import file with something that is already there, after I merge the data and delete the small import file. Upon exiting the dataset manager, a prompt comes up recommending an optimize.

 

YES: the optimize runs with a progress box and ends quietly. No beep. No message box. I have gone away and come back with no change. I have also looked at the Windows Task Manager, and TMG isn't doing anything but sitting there.

 

NO: it goes back to the details entry screen. If I select a normal maintenance optimize there, it runs with the progress box, beeps at the end, and pops up another message box telling me how much it saved.

Edited by retsof

On Step 7, you also need to create custom TMG tag types or assign to existing tag types all of the unassigned and event-misc tags. This is a very important step that saves considerable post-GEDCOM import cleanup. There are over two dozen such GEDCOM tags in this import.
I later created some custom event-type tags when importing another file, and they worked well. It is a bit disappointing when doing the ACTUAL import: after unassigning the tag and hitting the property button, I saw only a little box to enter the custom tag. I suppose it uses one that I have previously assigned in the master tag list if it can, but otherwise it would make a new one. I haven't forced an error to make it do that, but I wonder about the GEDCOM equivalents, etc. that I see there if I make a custom tag in the master tag list.

 

The tag was imported correctly. It would be very easy to make a typo on a tag like "relationship", and one would STILL have to clean up the tag later to put the data in the PROPER custom tag that I envisioned. It's too bad that there isn't a picklist for the custom tags in the import, or a merged picklist to begin with, with standard and custom tags in the assignment box.

Edited by retsof
