Scripting/indexing help


I'm working on a project to index a collection of sermons for my church, and I'm hoping to get some advice from readers with experience in databases and/or scripting.

Specifically, the problem is this: we have full-text transcripts of several hundred sermons, and we would like to compile an index of Scripture references as a study aid. That is, we'd like to be able to look up which sermons mention a particular passage. Ideally I'd have a script or program I can run on a directory of sermon transcripts to build that index.

I'm sure this, or something similar, is a solved problem: indexes of Scripture references often appear at the end of books (e.g., theology books), and people probably don't compile those by hand. I have a rough idea how I might go about this with a Perl script, but before I spend a day or so muddling through, I thought I should ask: has anyone done this (or something like it) before, and if so, can you give me any pointers?
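As one illustration of the script-based approach, sketched in Python rather than Perl: scan every `.txt` file in a directory with a regular expression and build an inverted index from each reference to the files that cite it. This is a minimal sketch under stated assumptions. Transcripts are assumed to be plain-text files, references are assumed to be written as "Book chapter:verse", and `BOOKS` is a hypothetical, abbreviated list of book names. A real version would need all 66 books, common abbreviations such as "Gen.", and numbered books like "1 John", which this sketch would misread as "John".

```python
import re
import sys
from collections import defaultdict
from pathlib import Path

# Hypothetical, abbreviated book list; a real index needs all 66 books plus
# abbreviations. "Psalm" here will not match the plural "Psalms".
BOOKS = ["Genesis", "Exodus", "Leviticus", "Psalm", "Isaiah",
         "Matthew", "John", "Romans", "1 Corinthians", "Hebrews"]

# Longest names are tried first so "1 Corinthians" is matched whole.
# Matches "John 3:16", "Romans 8:28-30", or a bare chapter like "Psalm 23".
REF_RE = re.compile(
    r"\b(" + "|".join(re.escape(b) for b in sorted(BOOKS, key=len, reverse=True)) + r")"
    r"\s+(\d+)(?::(\d+)(?:-(\d+))?)?"
)


def find_refs(text):
    """Return normalized reference strings in the order they appear in text."""
    refs = []
    for book, chapter, verse, verse_end in REF_RE.findall(text):
        ref = f"{book} {chapter}"
        if verse:
            ref += f":{verse}"
            if verse_end:
                ref += f"-{verse_end}"
        refs.append(ref)
    return refs


def build_index(directory):
    """Map each reference to a sorted list of the transcript files citing it."""
    index = defaultdict(set)
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for ref in find_refs(text):
            index[ref].add(path.name)
    return {ref: sorted(files) for ref, files in index.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Prints one line per reference; ordering is alphabetical, not Bible order.
    for ref, files in sorted(build_index(sys.argv[1]).items()):
        print(f"{ref}: {', '.join(files)}")
```

Saved under a hypothetical name such as `index_sermons.py`, it would be run as `python index_sermons.py sermons/`. Putting the output into canonical book order is a separate sorting step.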

No need to respond extensively in the comments. You can reach me by e-mail at abednego.azariah at gmail dot com.

And, for those still with me: I put this in the "Fun" category because, strangely enough, I think getting it to work will be fun.


You'll need to put these into a database to get them to show up indexed on the front end. It is a fun project, though.
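If the references do go into a database, a single table with one row per (reference, sermon) pair is enough to answer "which sermons cite this passage?" Here is a minimal sketch using Python's built-in `sqlite3` module. The table name `refs`, its columns, and the helper functions are hypothetical, and it assumes references have already been extracted and split into book, chapter, and verse.

```python
import sqlite3


def make_db(path=":memory:"):
    """Open (or create) a database with one row per reference occurrence."""
    conn = sqlite3.connect(path)
    # verse is NULL for chapter-only references such as "Psalm 23".
    conn.execute("""
        CREATE TABLE IF NOT EXISTS refs (
            book    TEXT    NOT NULL,
            chapter INTEGER NOT NULL,
            verse   INTEGER,
            sermon  TEXT    NOT NULL,
            url     TEXT    NOT NULL
        )""")
    # Index the lookup columns so queries stay fast as the table grows.
    conn.execute("CREATE INDEX IF NOT EXISTS refs_lookup ON refs (book, chapter, verse)")
    return conn


def add_ref(conn, book, chapter, verse, sermon, url):
    """Record that a sermon (title and URL) cites the given reference."""
    conn.execute("INSERT INTO refs VALUES (?, ?, ?, ?, ?)",
                 (book, chapter, verse, sermon, url))


def sermons_citing(conn, book, chapter, verse=None):
    """Return (title, url) pairs citing a whole chapter or one verse of it."""
    if verse is None:
        rows = conn.execute(
            "SELECT DISTINCT sermon, url FROM refs "
            "WHERE book = ? AND chapter = ? ORDER BY sermon",
            (book, chapter))
    else:
        rows = conn.execute(
            "SELECT DISTINCT sermon, url FROM refs "
            "WHERE book = ? AND chapter = ? AND verse = ? ORDER BY sermon",
            (book, chapter, verse))
    return rows.fetchall()
```

With an on-disk file instead of `":memory:"`, call `conn.commit()` after inserting so the rows are saved.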

For anyone who stumbles across this in the future: I've gotten this working and can provide code samples to anyone interested. Here's basically how I did it:
- Wrote a Perl script that downloads a given web page and parses it for Scripture references and the page title, saving them to a list. I then extended it to take a list of web pages, download each one, and save every reference, along with the page's title and a link back to it, to a single list.
- Wrote another Perl script to sort that list by book of the Bible (Genesis, Exodus, Leviticus, and so on).
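The two steps in the list above might look roughly like this in Python. The original scripts were written in Perl and aren't shown, so this is only an illustrative sketch. It uses the standard library's `urllib.request` to fetch a page and `html.parser` to pull out its `<title>`, then sorts entries into canonical book order, and by chapter and verse within each book. `BOOK_ORDER` is a hypothetical, abbreviated list, the entry layout `(book, chapter, verse, title, url)` is an assumption, and reference extraction itself is left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Canonical order, abbreviated for the sketch; a full version lists all 66 books.
BOOK_ORDER = ["Genesis", "Exodus", "Leviticus", "Numbers", "Deuteronomy",
              "Psalm", "Isaiah", "Matthew", "Mark", "Luke", "John",
              "Acts", "Romans", "Hebrews", "Revelation"]
BOOK_RANK = {book: i for i, book in enumerate(BOOK_ORDER)}


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def page_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_page(url):
    """Download a sermon page as text (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def sort_entries(entries):
    """Sort (book, chapter, verse, title, url) tuples in canonical Bible order.

    A verse of None (a chapter-only reference) sorts before verse 1;
    books missing from BOOK_ORDER sort after all listed books.
    """
    def key(entry):
        book, chapter, verse, title, url = entry
        return (BOOK_RANK.get(book, len(BOOK_ORDER)), book, chapter,
                verse if verse is not None else 0, title)
    return sorted(entries, key=key)
```

Sorting on a numeric rank rather than the book name is what puts Genesis before Exodus instead of alphabetizing them.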

In practice, I give it a list of sermons my pastor has preached (online), along with their URLs. It goes through them, extracts all of the Scripture references, and builds a sorted list, so you can look up, for example, when he mentioned Genesis 1:3.

I ran this on a couple of years' worth of sermons, and the resulting list has more than 3,000 entries. It takes about two minutes to process 150 or so sermons and produce the sorted list.

So it works pretty well. Now I'm working on the formatting: with so many entries, I'm going to make a separate index of the passages that served as a sermon's primary text, as opposed to passages that were merely mentioned.
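Splitting primary texts from passing mentions could be as simple as partitioning the entries, provided each sermon's primary text is known. The sketch below assumes a hypothetical `primary_texts` mapping from sermon URL to the set of references that served as its primary text, whether entered by hand or taken from page metadata. It doesn't cover how that mapping is obtained, and every other reference counts as a mention.

```python
def split_index(entries, primary_texts):
    """Partition (reference, title, url) entries into primary and mentioned.

    primary_texts is a hypothetical dict mapping a sermon URL to the set of
    references that were that sermon's primary text. Input order is kept.
    """
    primary, mentioned = [], []
    for ref, title, url in entries:
        if ref in primary_texts.get(url, ()):
            primary.append((ref, title, url))
        else:
            mentioned.append((ref, title, url))
    return primary, mentioned
```

Using a set per sermon allows for sermons with more than one primary text.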

I'm pretty excited about this, as I know I'll use the resulting index as a study aid sometimes, and I can see other people doing the same.
