Jim M.

Citing Websites That Change Over Time

I haven't completely researched this, so correct me if I'm wrong... but it appears to me (see signature line) that some of the methods of publishing websites that use genealogical data (perhaps TMG+Second Site?) generate page names that change over time - not just the basic content. Even if that isn't the case... the content changes over time as researchers make progress - whether a "Second Site" or one of its competitors is used or not...

 

And ditto with respect to the content on familysearch.com, etc. And websites can literally disappear overnight.

 

So how do you document a moving website target?

 

Have any of you considered making a point-in-time copy of "important" websites? There is cheap/free website mirroring software out there...

 

-----

 

At this point, I *think* my modus operandi will be to make as many digital copies of as many sources as possible (I absolutely *LOATHE* the idea of storing piles of papers - I HATE PAPER! :furious:) - and that I may mirror as much of some websites as is necessary.

 

What do you think? Is this crazy?

When citing a web site I send people to the "front door" of the web site, rather than link to a specific page. This normally takes care of the moving target problem. Some searchable web sites force the user to generate a new search even though it takes you to the same material specified in the URL. Other web sites are forcing the user to enter through the "front door," regardless of the specified URL. These issues are especially true of some of the more well-known web sites.

 

I tend to stay away from web sites that I'm not sure will be there the next day, so a point-in-time copy isn't really necessary for me. With this in mind, I only make a printed copy of essential pages -- a will transcription, a birth certificate, a deed, etc.


You could save the web page at that point in time by choosing "File, Save As", then upload the saved file to your own site if you plan to link to that page. You might want to keep that file on your hard drive anyway, rather than a paper copy.

 

Of course, if the info changes over time you will be referring to old data (which could be an incorrect assumption by someone).

 

How often have people redesigned their websites, which means changing file names, etc!

 

Hope it helps.

 

Shaun


Changing websites is an increasingly difficult problem and not just in family history. I teach psychology to undergraduate students in my real life and they use web-based materials more and more as the basis for their research, papers, etc. It's now common practice in academia to insist that students at least include a "date consulted" with their citations to websites. At least then a reader can make some sort of guess about the currency or otherwise of the material. There are an increasing number of programs that will download the content of websites and enable you to store that material as text or whatever, on or offline. That may help, but the outdated information issue is still a difficult one.
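
For anyone who keeps citations in a script or spreadsheet export, the "date consulted" habit is easy to build in. Here is a minimal Python sketch, not any official citation style: the function name `cite_website`, the field order, and the example site are all made up for illustration.

```python
from datetime import date


def cite_website(site_title, home_url, page_title, consulted):
    """Build a one-line citation that records when the page was consulted.

    The layout (site, page title, home URL, date) is an assumption for
    illustration, not a published citation standard.
    """
    return (f'{site_title}, "{page_title}," {home_url} '
            f"(consulted {consulted.isoformat()}).")


# Hypothetical example values.
print(cite_website("Example Site", "http://www.example.com/",
                   "Will of John Doe", date(2007, 1, 15)))
```

Using the ISO form of the date (year-month-day) avoids the usual confusion between day-first and month-first formats.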


I agree. (IMO this is another example of why Mills's approach of citing everything separately is both misguided and outdated. Rant omitted.)

 

My source is always the root/home web page of the site. I might include a specific URL in the CD, always understanding that it might go stale. (Martin, I appreciate your note about citing the "date consulted" and I will start doing that.) However, I'm much more likely to record not the URL but the page title or descriptor in the CD. For example: the source is Wikipedia, the CD is "Article on Lady Mary Cambridge." When citing to Genealogics I rarely include a page title, but I frequently include his sources in the CD (e.g., "citing TCP II, p. 156"). Citing family trees found via WorldConnect is a little dicier: I usually make an effort to download a GEDCOM so I'll have a "real" source.

 

Oh! But what I actually wanted to say when I replied to this thread is that if you've encountered a "disappearing site" and are beating your head against the wall for not having saved the pages locally, try The Internet Archive's "Wayback Machine" at:

 

http://www.archive.org/

 

It can be really handy!
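
If you want to record an archived address alongside the live one, the Wayback Machine's capture addresses follow a simple pattern: a prefix, a date stamp, then the original URL. A minimal Python sketch follows, assuming that pattern; the function name `wayback_url` and the example page are hypothetical, and there is no guarantee a capture exists for any given page or date (the archive shows the nearest capture it has, if any).

```python
from datetime import date

# Address prefix used by the Wayback Machine for archived captures.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url, when):
    """Return a Wayback Machine address asking for the capture nearest `when`.

    This only builds the address; it does not check that a capture exists.
    """
    stamp = when.strftime("%Y%m%d")  # date stamp in YYYYMMDD form
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


# Hypothetical example page and date.
print(wayback_url("http://www.example.com/tree/p1.htm", date(2007, 3, 1)))
```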

Edited by laura1814

I agree. (IMO this is another example of why Mills's approach of citing everything separately is both misguided and outdated. Rant omitted.)

Where do you find this approach in Mills? :unsure:


Okay, you got me. I have never read Mills. Substitute the "TMG implementation of Mills," please.

 

As for where I get this idea, here's one example from memory: there are separate "source types" for letters and email.

 

Also, IIRC, the website "source type" is missing several key attributes, and when I tried to add things I thought were important, the things I wanted weren't available. For example, I can't enter more than one URL.

 

I've also been frustrated by the complications involved in customizations, especially when importing data or switching back and forth among data sets or projects.

 

I usually just stick with the Book source type, and add any details I think relevant. I go for generic over specificity. Maybe I should add that to my sig? Or how about just:

 

I'm a Lumper!

Edited by laura1814


Hello Jim,

 

When I come across information on a website that I require (although I must admit that I don't do this very often; perhaps I am missing out on something?), I take a screen shot and then manipulate it in an image program to remove my browser window, etc. Is this not an option?

 

I suppose there could be times when there is more than a screen shot's worth of info, and I know there could be copyright issues, but there is no commercial gain, the website is acknowledged in the source information, and if you want to be extra careful you could email the site to request permission.

 

Ben

I guess I would start my reply with 'I think, ultimately, I'm not a lumper'. I have been a database administrator in the IT world in a previous life, and, with data quality/integrity being the primary objective of someone in that capacity, I'm leaning toward creating my own mirrors of websites that are "important" - if "File, Save" doesn't suffice. Heck - I saw a 500GB USB-enabled hard drive at Best Buy the other day.

 

And I definitely wouldn't ask for permission - so I wouldn't share that data with anyone else - only the reference to the site and date. When it comes down to the validity of the data, though, I think the sites I visit most often are going to get a surety of 0, or maybe 1, as they rarely include sources. As far as validity goes - they're often effectively useless - but they provide a starting point for other searches. (Having a "suspected" name of a sibling/parent is often enough to uncover a half dozen generations that are "verified" with more credible sources.)

Edited by Jim M.

I guess I would start my reply with 'I think, ultimately, I'm not a lumper'.

I don't think that it is a case of being a 'lumper' or not; you could take a screenshot of each individual piece of information and give each one its own source, etc. I understand that for a large amount of information it could be a lot of work, but if you want to immortalize information it is a possibility.

 

I also see that if the information on the website in question changed over time, you would still have the old information and would have to update all the image files (your original point of the thread), but the same can be said for linking to a website. You may be linking to a family tree that someone has researched and published, and they could then delete the information you are pointing your users to. Really you would need to check the website in question frequently to ensure that the information is still available.

 

The only advantage of grabbing or linking to a website, that I can see, is the initial amount of work required. Saving screen shots of information as images and downloading a complete website are just different approaches to the same outcome.

 

However ;),

 

For grabbing complete websites I use Pagesucker (http://www.pagesucker.com/). I paid the $10 fee for the complete version and I have had absolutely no trouble with it. You can read the information pages, but basically it downloads the complete site you want, and it has user-definable rules that give you control over how much or how little you download. I use it for when I have no access to the internet and I might need a lot of information, Terry's tips etc ;)

 

Ben

I haven't completely researched this, so correct me if I'm wrong... but it appears to me (see signature line) that some of the methods of publishing websites that use genealogical data (perhaps TMG+Second Site?) generate page names that change over time - not just the basic content. Even if that isn't the case... the content changes over time as researchers make progress - whether a "Second Site" or one of its competitors is used or not...[snip]

 

Having read all the previous postings, I deduce that many are neglecting a significant aspect of our work.

 

Like many of us, I use prior-generation document citations as sources; my most commonly cited is an 1899 publication. That's well over 100 years old. Where will the web sites be 100 years from now? Which is a more useful citation for follow-on generations to use; a paper document, a computer file on some media (such as a web site copy) or a web site itself? The same pretty much applies to email citations.

 

I must admit, in fairness, that I sometimes cite web sites, but in those cases where possible I use the site data as a clue as to where to find the "real" source (I deplore greatly those many sites which don't include source citations). In the case of emails I most usually copy the contents into an exhibit associated with the source. One of these days I'll find it possible to publish - both on paper and CD - "the book;" in the interim a CD copy of my Second Site presentation will have to do. But will IT be able to be read 100 years from now? Probably not (sorry, John).

 

Please think about the distant future when it comes to citations.

 

Dick

Edited by RGC

Okay, you got me. I have never read Mills. Substitute the "TMG implementation of Mills," please.

That's a very different thing than what you originally said. :)

 

I'm not surprised that you have difficulty with TMG's "Mills" source types if you've never read her book. Comparing the two makes using them a whole lot easier, in my experience. And it provides a good basis for creating your own practices.

Well said, Dick! A vast majority of web sites can ONLY be used as clues to what the actual information might be. I quote from several web sites that fall into a "trusted" category, simply because of their extensive use of sources and citations. Most web sites fall way short of providing "trusted" information because we have no idea where that information came from. Those web sites can provide clues, point us in a direction to look, or give us food for thought, but they fall short of providing truly useable information.

 

Then we get to those web sites where the owner has done nothing more than copy someone else's work and done no research of their own. They tend to copy mistakes and all and really have no clue as to what their own genealogy really is.

 

Wouldn't it be wonderful if all people posting personal web sites would provide the sources of their information and cite it properly! Then we could all find the record and see for ourselves what information it had.

At this point, I *think* my modus operandi will be to make as many digital copies of as many sources as possible (I absolutely *LOATHE* the idea of storing piles of papers - I HATE PAPER! :furious:) - and that I may mirror as much of some websites as is necessary.

 

You might want to read this from the APG-L Re: [APG] Preserving Records - Microforms vs. CD etc.


Hello All,

 

I save a text-only copy of the web site data in a compressed file. I do not save images, only the text (in most cases). I cite the web site as well as my stored copy. If the web site disappears or changes, I can always go to my saved copy. Several years back I wrote a program to automatically save genealogical reports posted to the web. It can save hundreds (or thousands) of pages without the user having to navigate to each page and click save. It saves only the text; the resulting saved pages can be compressed and take very little space on a hard drive. I offer it free to anyone who wants it. Send an email to WPSaver@hotmail_NOSPAM_.com (remove the _NOSPAM_ from the email address).

 

I am now working on another program to take the saved web pages and convert them back into electronic format. Perhaps it will be completed in another year.
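
For readers curious what a text-only, compressed copy of a page might look like in practice, here is a minimal Python sketch using only the standard library. It is not the program described above, just an illustration of the same idea; the names `TextOnly`, `page_to_text`, and `compress_text` are made up, and fetching the pages from the web is left out.

```python
import gzip
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the visible text of a page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []   # text fragments, in page order
        self._skip = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_to_text(html):
    """Return the text of an HTML page, one fragment per line."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def compress_text(text):
    """Compress the extracted text with gzip for compact storage."""
    return gzip.compress(text.encode("utf-8"))
```

Images and layout are thrown away on purpose, which is why the saved copy stays small; anything that depends on a picture (a scanned certificate, say) would need a different approach.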

 

Best Regards,

Ken

You might want to read this from the APG-L Re: [APG] Preserving Records - Microforms vs. CD etc.

You make an excellent point.

 

Having given this quite a lot of thought for a bit over a week, at this point I'm still not going to save piles of paper. Perhaps I have a dominant "loses organizational integrity" gene, but I just can't see spending a lot of time filling filing cabinets full of things I may never be able to find again (even with the best of schemes) - not to mention the fact that when I get too old or too feeble, my relatives will probably just cart it all to the landfill...

 

For now - I think I will replicate selected sites and pkzip them... and pray that Windows Explorer and/or its successors will support today's implementation of HTML/etc.
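
The "replicate and zip" step can be scripted so that every archive carries the date the copy was made. A minimal Python sketch, assuming the site has already been mirrored into a local folder by some other tool; the function name `archive_mirror` and the folder layout are hypothetical, and it uses the standard library's zip support rather than pkzip itself.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_mirror(mirror_dir, dest_dir, consulted):
    """Zip a mirrored site folder into dest_dir, naming it with the copy date.

    Returns the path of the new .zip file. The mirror itself must already
    exist on disk; this function does no downloading.
    """
    mirror_dir = Path(mirror_dir)
    # e.g. <dest_dir>/example_site_2007-01-15 (".zip" is appended for us)
    base = Path(dest_dir) / f"{mirror_dir.name}_{consulted.isoformat()}"
    archive = shutil.make_archive(str(base), "zip", root_dir=str(mirror_dir))
    return Path(archive)
```

Putting the date in the file name means two copies of the same site taken months apart sit side by side instead of overwriting each other, and the name doubles as the "date consulted" for the citation.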

Jim,

 

I'm with you, but with a twist. I have Adobe Acrobat v7 Pro. When I find a website with content I want to preserve, I open it in IE and convert it to a PDF document or print it to a PDF printer. My citation then cites the website and makes the PDF file an exhibit. With the increasing capacities of USB/Firewire devices approaching/exceeding those of DVDs, that has become my primary mode of backup storage. However, the probable continuation of CD-ROM discs remains a viable source for sharing content with others. Paper is no more permanent than any other medium. Besides, not all sources are pertinent to just people.

 

I have been researching Swiss place names heavily this week, trying to match up current names/spellings with what has been historically documented in previously published genealogies, with only partial luck. Wikipedia now has a lot of information it didn't have before and I will document the place name changes accordingly. The main compiler for one piece of work had no understanding of the subdivisions of Swiss communities, nor how several cantons moved from German spellings to French in recent years.

 

I believe we need to get away from the prevalence of documentary evidence and into electronic evidence, which is the main reason I dropped off the RootsWeb mailing list. There just wasn't enough interest in citing/storing electronic information.

 

 

Roy Sprunger/Assyria, MI


Thanks, Glenn. Your point is well taken.

 

While nothing lasts forever, the best we can hope for is to extend the life of whatever medium we choose, as long as possible. The only hope of doing that right now, is printing our data on acid-free paper, preferably in a bound book. Until then, everything else is transitory. My goal is to keep my information dynamic for as long as possible so as to ensure its usefulness. 'Cause one thing is for sure, if anybody waits too long to use my information, they will most likely have to rekey everything from scratch. By then, I'll be long gone and beyond caring.

 

 

 

 

Roy Sprunger/Assyria, MI
