Kevin Schultz

GEDCOM Import Problem


After importing a large GEDCOM (~143,000 lines), I reviewed the import log file and found a list of source lines that were not processed. I am unable to determine what causes the problem. The affected sources seem to be a random selection; both processed and unprocessed sources are cited at least once. Yet the problem is reproducible: importing the GEDCOM a second time as another data set causes the same sources to be "skipped". I've attached a file that extracts the beginning of the unprocessed source lines from the log file, and another that extracts the section of the GEDCOM where the sources start. Is this a TMG problem or a GEDCOM file problem?

 

Kevin

 

TMG_607_GEDCOM_Import_Issue___List_file_extract.doc

TMG_607_GEDCOM_Import_Issue___GEDCOM_Extract.doc


There is nothing intrinsically wrong with the first source that you say doesn't import, so that points to the citations as the issue.

 

I need to see the GEDCOM. How big is it when compressed into a ZIP file?

 

You can contact me by email by clicking on the link below:

Jim Byram


The problem is that you selected:

Step 5: GEDCOM Options 1

Convert widowed SOURce tags

 

This option will never apply to any GEDCOM 5.5 file which is what you are trying to import. In fact, I've never seen a GEDCOM file that required this option other than the GEDCOM that I created to originally test this.

 

Stick to the defaults except where you understand exactly what the change does and that you need to make it. For example, it made sense to select 'Assume marriage of parents'.

 

You need to look at the GEDCOM and the differences with a given option to see if a change makes sense.

 

You will need to use Step 7 to assign the custom GEDCOM tags as custom tag types for import.


Thanks Jim,

 

After unselecting the "Convert widowed SOURce tags" option, all the sources were created.

 

However, that leads to the next question/issue. Many of the sources in the GEDCOM have a NOTE tag. For some, that information shows up in the comments box of the Supplemental tab where I expected to see it. For others, the actual note shows up as an unprocessed line in the Import .LST file and for others the note seems to be just "missing". It is not in the source supplemental and not in the unprocessed list. How should these notes be processed during the import?

 

Kevin


If the NOTE tag is subordinate to the source record tag (in other words, 1 NOTE), the text should always be imported to the note field on the Supplemental tab. You'll need to point me to some examples to check.

 

Hummm... There are 40 NOTE tags in the sources and all are using xrefs so don't yet see a reason why some would work and others not.
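For anyone who wants to repeat that count on their own file, here is a minimal sketch, not anything TMG does internally. It assumes a plain-text GEDCOM with one `level [@xref@] tag [value]` entry per line; the function names `source_notes` and `is_pointer` are made up for this illustration. The sample record is the one Kevin quotes later in this thread.

```python
import re

# A GEDCOM line is: level, optional @xref@ record id, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def source_notes(lines):
    """Map each SOUR record id to the values of its level-1 NOTE lines."""
    notes = {}
    current = None  # id of the SOUR record being read, if any
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, xref, tag = int(m.group(1)), m.group(2), m.group(3)
        value = m.group(4) or ""
        if level == 0:
            # A new top-level record starts; track it only if it is a source.
            current = xref if tag == "SOUR" else None
            if current:
                notes[current] = []
        elif level == 1 and tag == "NOTE" and current:
            notes[current].append(value)
    return notes

def is_pointer(value):
    """True when the value is an @xref@ pointer rather than inline text."""
    return bool(re.fullmatch(r"@[^@]+@", value.strip()))

sample = """0 @S058338@ SOUR
1 TITL St-Gelais Families of North America
1 AUTH Robert St-Gelais
1 NOTE @NS583381@
1 REPO
2 NOTE @NS583383@
""".splitlines()

result = source_notes(sample)
for source_id, values in result.items():
    # Prints the id, the number of level-1 notes, and whether all are pointers.
    print(source_id, len(values), all(is_pointer(v) for v in values))
```

Only the `1 NOTE` line is counted here; the `2 NOTE` under `REPO` belongs to the repository citation and is deliberately left out.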


 

I've looked through the sources section of the GEDCOM and found the following patterns.

 

- If there is a single NOTE at the +1 level to the SOURce and there is no NOTE at the +1 level to the REPOsitory, then the NOTE shows up in the Supplemental tab.

 

- If there are 2 NOTEs at the +1 level to the SOURce and there is no NOTE at the +1 level to the REPOsitory, then sometimes the first and sometimes the second note shows up in the Supplemental tab and the other in the "Not Processed List" (from what I can see, the GEDCOM 5.5 standard allows multiple NOTE structures at the +1 level)

 

- NOTEs at the +1 level to the REPOsitory do not show up in the Source Supplemental or Repository Memo fields but sometimes show in the "Not Processed List".

 

- The Call Number (CALN) tag is put into the reference field of the Repository Link Entry Screen. The entry is truncated to the 25 characters of this field even though the GEDCOM 5.5 standard allows 120.
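The patterns above can be tallied per source with a short script. This is only a sketch under stated assumptions: the file is plain text with one GEDCOM line per row, the 25-character width is the one reported above for the reference field, and the names `survey_sources` and `REFERENCE_FIELD_WIDTH` as well as the sample record are hypothetical.

```python
import re
from collections import Counter

# A GEDCOM line is: level, optional @xref@ record id, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

# Width reported in this thread for the repository reference field.
REFERENCE_FIELD_WIDTH = 25

def survey_sources(lines):
    """Return {source id: Counter} tallying notes and long call numbers."""
    report = {}
    path = []       # tags from the record down to the current line
    current = None  # id of the SOUR record being read, if any
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, xref, tag = int(m.group(1)), m.group(2), m.group(3)
        value = m.group(4) or ""
        del path[level:]  # drop tags that are deeper than this line
        path.append(tag)
        if level == 0:
            current = xref if tag == "SOUR" else None
            if current:
                report[current] = Counter()
            continue
        if current is None:
            continue
        if path == ["SOUR", "NOTE"]:
            report[current]["source_notes"] += 1
        elif path == ["SOUR", "REPO", "NOTE"]:
            report[current]["repo_notes"] += 1
        elif path == ["SOUR", "REPO", "CALN"] and len(value) > REFERENCE_FIELD_WIDTH:
            report[current]["long_call_numbers"] += 1
    return report

# Hypothetical record that combines all of the patterns in one source.
sample = """0 @S1@ SOUR
1 TITL Example source
1 NOTE @N1@
1 NOTE @N2@
1 REPO
2 NOTE @N3@
2 CALN A call number that is longer than twenty-five characters
""".splitlines()

report = survey_sources(sample)
for source_id, counts in report.items():
    print(source_id, dict(counts))
```

Sources with `source_notes` above 1, any `repo_notes`, or any `long_call_numbers` are the ones worth checking by hand after the import.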

 

Is there a way to make the call number go to a Call Number source element? Does that require the default source type be modified to add the Call Number element to the footnotes, etc?

 

What is the expected behavior for REPOsitory notes? Does the MEDIa tag in the source map to a particular location?

 

I've sent you a file with examples to check.

 

Kevin


Kevin,

 

My inclination is to do nothing. I built a test GEDCOM and tried it with GenViewer and Family Historian (which uses GEDCOM 5.5 as its native database).

 

GenViewer uses the last source memo found which means it doesn't accept duplicate source NOTE tags just like TMG. Family Historian accepts both source notes but has no structural constraint since it is using the GEDCOM file.

 

Both programs reject the REPO structure since it is completely illegal. Repository records are top level records in a GEDCOM. It is illegal for the repository to be buried in the source record. Note that TMG actually imports this but is having some difficulty with the structure.

 

This represents just one of many instances over the years where FTM ignores the GEDCOM specs and does it its own way. The trick is to import this to something like PAF to see if the data gets in and create another GEDCOM from there.

 

This particular file is not a big undertaking to clean up after import since we're not talking about much data to hand correct.

 

(from what I can see, the GEDCOM 5.5 standard allows multiple NOTE structures at the +1 level)

The specs are not clear on this point. The practice from what I can see is to have one NOTE tag per source. It's hard to see how any program could have more than one NOTE tag since the program note field for sources is typically one field.

 

- NOTEs at the +1 level to the REPOsitory do not show up in the Source Supplemental or Repository Memo fields but sometimes show in the "Not Processed List".

As noted above, the REPO structure in this GEDCOM is illegal. And I would again think the practice is to have one repository note, for the same structural reason as with sources.

 

- The Call Number (CALN) tag is put into the reference field of the Repository Link Entry Screen. The entry is truncated to the 25 characters of this field even though the GEDCOM 5.5 standard allows 120.

I don't see this tag as being different from any other GEDCOM tag: its value can have unlimited length by way of CONC and CONT statements on the following lines. In any event, the length of the TMG field is the length. The design of TMG is not driven by what GEDCOM does or doesn't do, except for following the rules on export.
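As a rough illustration of how CONC and CONT extend a value, and of what a fixed-width field then keeps, here is a minimal sketch. It relies on the GEDCOM 5.5 convention that CONT starts a new line and CONC appends directly to the preceding text. The helper name `join_value` and the extended call number are made up for the example; the address lines are the ones from the standard's example quoted later in this thread.

```python
def join_value(first_value, continuation_lines):
    """Rebuild a full value from its first line plus CONC/CONT sub-lines.

    continuation_lines is a list of (tag, text) pairs in file order.
    """
    parts = [first_value]
    for tag, text in continuation_lines:
        if tag == "CONT":
            parts.append("\n" + text)  # CONT: continue on a new line
        elif tag == "CONC":
            parts.append(text)         # CONC: append with no separator
    return "".join(parts)

# Address from the standard's example: one ADDR line plus two CONT lines.
address = join_value(
    "35 N West Temple Street",
    [("CONT", "Salt Lake City, Utah"), ("CONT", "UT 84150")],
)

# Hypothetical call number split over a CALN line and one CONC line.
call_number = join_value("13B-1234.01", [("CONC", "-EXTENDED-SHELF-MARK-0001")])

# A 25-character field keeps only the start of the joined value.
stored = call_number[:25]

print(address)
print(len(call_number), repr(stored))
```

The joined call number is 36 characters long, so only the first 25 would fit in a field of that width, whatever length the GEDCOM side permits.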

 

Does the MEDIa tag in the source map to a particular location?

Don't think so. The TMG option on the Supplemental tab is a list box of set choices, and it would be hard to map the GEDCOM tag to that design.

 

Jim

Both programs reject the REPO structure since it is completely illegal. Repository records are top level records in a GEDCOM. It is illegal for the repository to be buried in the source record. Note that TMG actually imports this but is having some difficulty with the structure.

 

This represents just one of many instances over the years where FTM ignores the GEDCOM specs and does it its own way. The trick is to import this to something like PAF to see if the data gets in and create another GEDCOM from there.

 

I don't want to seem argumentative, I'm just trying to become more familiar with the GEDCOM 5.5 standard. When I read the standard, I see the following.

 

SOURCE_RECORD: =

 

n @<XREF:SOUR>@ SOUR {1:1}

+1 DATA {0:1}

+2 EVEN <EVENTS_RECORDED> {0:M}

+3 DATE <DATE_PERIOD> {0:1}

+3 PLAC <SOURCE_JURISDICTION_PLACE> {0:1}

+2 AGNC <RESPONSIBLE_AGENCY> {0:1}

+2 <<NOTE_STRUCTURE>> {0:M}

+1 AUTH <SOURCE_ORIGINATOR> {0:1}

+2 [CONT|CONC] <SOURCE_ORIGINATOR> {0:M}

+1 TITL <SOURCE_DESCRIPTIVE_TITLE> {0:1}

+2 [CONT|CONC] <SOURCE_DESCRIPTIVE_TITLE> {0:M}

+1 ABBR <SOURCE_FILED_BY_ENTRY> {0:1}

+1 PUBL <SOURCE_PUBLICATION_FACTS> {0:1}

+2 [CONT|CONC] <SOURCE_PUBLICATION_FACTS> {0:M}

+1 TEXT <TEXT_FROM_SOURCE> {0:1}

+2 [CONT|CONC] <TEXT_FROM_SOURCE> {0:M}

+1 <<SOURCE_REPOSITORY_CITATION>> {0:1}

+1 <<MULTIMEDIA_LINK>> {0:M}

+1 <<NOTE_STRUCTURE>> {0:M}

+1 REFN <USER_REFERENCE_NUMBER> {0:M}

+2 TYPE <USER_REFERENCE_TYPE> {0:1}

+1 RIN <AUTOMATED_RECORD_ID> {0:1}

+1 <<CHANGE_DATE>> {0:1}

 

Here the source repository citation is at the +1 level, and the standard explains the double angle brackets this way: "Indicates a subordinate GEDCOM structure pattern of a record, structure, or substructure is to be substituted in place of the enclosing double angle brackets."

 

And the source repository record structure is

SOURCE_REPOSITORY_CITATION: =

 

[

n REPO @XREF:REPO@ {1:1}

+1 <<NOTE_STRUCTURE>> {0:M}

+1 CALN <SOURCE_CALL_NUMBER> {0:M}

+2 MEDI <SOURCE_MEDIA_TYPE> {0:1}

 

And later in the example I see

...

0 @6@ SOUR

1 DATA

2 EVEN BIRT, DEAT, MARR

3 DATE FROM Jan 1820 TO DEC 1825

2 PLAC Madison, Connecticut

2 AGNC Madison County Court, State of Connecticut

1 TITL Madison County Birth, Death, and Marriage Records

1 ABBR VITAL RECORDS

1 REPO @7@

2 CALN 13B-1234.01

2 MEDI Microfilm

0 @7@ REPO

1 NAME Family History Library

1 ADDR 35 N West Temple Street

2 CONT Salt Lake City, Utah

2 CONT UT 84150

...

 

Now the example seems to have an error whereby the MEDIa tag should be at level 3 rather than 2, but I can't see where the structure in the GEDCOM I am importing doesn't follow this standard.

 

From my GEDCOM:

...

0 @S058338@ SOUR

1 TITL St-Gelais Families of North America

1 AUTH Robert St-Gelais

1 NOTE @NS583381@

1 REPO

2 NOTE @NS583383@

2 CALN

3 MEDI Other

...

It does not have an external reference to the name and address of the repository, but the GEDCOM standard says:

Systems which do not structure a repository name and address interface should store the information about where the source record is stored in the <<NOTE_STRUCTURE>> of this structure.
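To make the comparison concrete, the two extracts above can be checked side by side. This is a minimal sketch that does not settle which structure is legal: it only reports whether the REPO line under a source carries a pointer and which tag each MEDI line sits under. The function name `repo_findings` is made up for this illustration, and the standard's example is abridged to the relevant lines.

```python
import re

# A GEDCOM line is: level, optional @xref@ record id, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def repo_findings(lines):
    """List (tag, finding) pairs for REPO lines under SOUR and for MEDI lines."""
    findings = []
    path = []  # tags from the record down to the current line
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue
        level, tag = int(m.group(1)), m.group(3)
        value = (m.group(4) or "").strip()
        del path[level:]  # drop tags that are deeper than this line
        path.append(tag)
        if path == ["SOUR", "REPO"]:
            has_pointer = bool(re.fullmatch(r"@[^@]+@", value))
            findings.append(("REPO", "pointer" if has_pointer else "no pointer"))
        elif tag == "MEDI":
            parent = path[-2] if len(path) >= 2 else None
            findings.append(("MEDI", parent))
    return findings

# Abridged from the example in the standard quoted above.
standard_example = """0 @6@ SOUR
1 TITL Madison County Birth, Death, and Marriage Records
1 ABBR VITAL RECORDS
1 REPO @7@
2 CALN 13B-1234.01
2 MEDI Microfilm
""".splitlines()

# The extract from my GEDCOM quoted above.
my_extract = """0 @S058338@ SOUR
1 TITL St-Gelais Families of North America
1 AUTH Robert St-Gelais
1 NOTE @NS583381@
1 REPO
2 NOTE @NS583383@
2 CALN
3 MEDI Other
""".splitlines()

standard_result = repo_findings(standard_example)
extract_result = repo_findings(my_extract)
print(standard_result)
print(extract_result)
```

On these two extracts, the standard's example has a REPO pointer with MEDI directly under REPO, while my extract has a REPO line without a pointer and MEDI under CALN.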

 

What have I missed, misread, or misunderstood?

 

Kevin
