Selecting a Catalog Management System: A Make-Versus-Buy Case Study
We recently completed a comparison of about two dozen software packages in an attempt to meet our museum’s needs for an asset, collection, and catalog management system. Catalog management systems provide ways to add and organize items, describe them with metadata, and make it easier to sort, search, and manage them as a series of collections. Our comparison included a make-versus-buy decision because we have also been developing a custom software solution internally called the Catalog Maintenance System. Our decision-making process is presented as a case study describing how we assessed various commercially available solutions, our selection criteria, and our tentative decision. Hopefully, our experience will be useful to other museums and similar organizations.
A good catalog management software solution is especially important to a virtual museum, like HCLE. Almost every museum, library, archive, and collection needs some way to keep track of its artifacts, objects, individual people, organizations, and other information. Staff members need to manage the items. Researchers need it to make discoveries and reveal connections. Contributors, enthusiasts, and the generally curious use it to follow up on items they encounter while browsing. Managing any significant number of items while making them publicly available would be incredibly inefficient without a catalog. Since a virtual museum contains only information, without actual artifacts to manipulate, the catalog is the only way to access its content other than browsing through items one by one.
Unfortunately, there is no generally accepted standard for cataloging digital artifacts. At first that was a disappointment, but we quickly realized we shouldn’t be surprised. Each repository is distinguished by its purpose; each collection answers a different question and requires unique ways to discover its contents. We are creating a virtual museum for the study of the history of computing in learning and education; basically, a repository of an era when we learned a new way to support learning. In addition to physical artifacts such as books and hardware (like the first Apple 1, and a Radio Shack TRS-80 that includes an actual mouse that decided to live inside the chassis while the computer was in storage), we have executable software stored on a wide variety of disks, tapes, and drives, an expanding set of ‘born-digital’ oral histories, and internet links to other repositories. Most providers of cataloging software distinguish themselves on how they handle tracking, describing, and exhibiting physical objects (in buildings and online). Their products were not designed to meet HCLE’s needs.
Our Internal Effort
We started with a simple spreadsheet, but it proved insufficient for both data entry and item searches by staff. We then moved to a Microsoft Access database. This worked for a small number of items (under 500), but we were concerned that Access wouldn’t scale well as we increased the collection to many thousands. It also didn’t offer an adequate web interface; for the collection to be made viewable online by the public, another layer of software would be necessary. Finally, we want visitors to be able to interact with all the information we have about an item (the metadata), not just a brief description of a displayed image. This requires a very sophisticated user interface.
Fortunately, our founder, Liza Loop, has a long history of thinking in terms of databases, knows the collection intimately, and has a vision of how the eventual virtual museum will operate – a good set of perspectives from which to properly define an efficient system. With that background, we set upon the task of writing custom software to meet our custom needs. We didn’t think it would be easy, but it could be worth the effort. This task has proved challenging.
We are also fortunate because we have a team of part-time and volunteer programmers who started with nothing and built an impressive system that had several of the critical functions we required. We were able to track a few hundred items, and had the opportunity to describe them with an extensive and custom metadata set. (See one of our recent posts about our modified Dublin Core.)
We were not, however, able to open our Catalog Maintenance System to the public – at least, not yet. Using our incomplete software requires lots of instruction, training, and support. We have not yet created a feature to limit permissions to ‘read-only’ access. And, it’s not pretty. While it looks fine to us, the interface needs better graphic design. Several other features like image gallery support and cross-platform functionality are also achievable additions; but we are in danger of being trapped by feature creep.
While part-time and volunteer efforts can achieve remarkable things, we are aware of the full scale of the task and our growing list of targets. Development of this software is also distracting us from the primary goals of digitizing the artifacts, cataloging the collection, and making a preliminary version of the museum available to researchers. It’s a question of resource allocation: what is the best way to spend our time and money to reach the museum’s goals? We have proven we can eventually build a viable system, but our main goal is to create a museum, not to write software. Would it be quicker, cheaper, and sufficiently efficient to buy or license existing software? We embarked on a make-versus-buy process to find out.
Creating the List
Thanks to attending a few conferences and seminars we were able to create a list of vendors and user anecdotes. Brochures are always encouraging; that’s their goal. Anecdotes ranged from encouraging to discouraging. Available solutions dramatically improve many museums’ efforts, but the general consensus is that there is, as yet, no generally accepted solution. Each software package has advantages and disadvantages, and almost always requires compromises. The majority of features are useful, but most users experience some need for customization. Sometimes the customization is formal and managed with contracts and expenditures. Sometimes the customization is internal and managed by creative uses of the existing features and fields. Our selection will require more than just checking the price tags.
The initial list of candidates
Catalog Maintenance System – HCLE internal effort
Collection Space – Lyrasis
Collective Access (open source)
Design For Context
Omega (open source)
Rediscovery Software Proficio
Our initial list included about two dozen vendors, plus our internal Catalog Maintenance System. Each vendor was presented with a request for quote and was provided with an overview of our situation. Here’s our template for the initial communication.
Please provide a quote for the purchase and use of your software and services. We are a new, small virtual museum currently developing a digital repository, catalog of artifacts, documents, media, and online exhibits. We have an in-house catalog maintenance system, but are interested in understanding the costs and benefits of your offering.
The scope of our task is within the limits of approximately:
- 100,000 items,
- 100 metadata fields (including PURLs),
- 6 seats or licenses,
- file formats including a range of image files, video files, audio files, word processing files, spreadsheets, publication formats, and executable programs,
- an indefinite number of digital exhibits that will be linked to our and other collections.
Currently, a few dozen items have been entered into our MySQL database and in-house catalog. Access is available upon request if it would help produce a better quote.
Include prices for support services such as hosting, customization, porting, training, and real-time help.
Our goal is to create a virtual museum for the history of computing in learning and education. This catalog will be essential to the subsequent research and the development of exhibits.
Feel free to contact us for any clarifications.
Before we sent out the query, we established a common set of criteria for the initial assessment: whether each candidate could accommodate our scope, the pricing requested in the query, the ability to import and export all metadata, the database structure (if any), and whether the software was open-source.
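A first-pass screen like this can be written down as a simple pass/fail check. The sketch below is only an illustration: the `Candidate` fields, the function name, and the two sample vendors are hypothetical, and only the scope figures (100,000 items, 100 metadata fields, 6 seats) come from our request for quote. Open-source status and price are recorded for comparison rather than used to pass or fail a candidate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All field names are hypothetical; they mirror the criteria in the text.
    name: str
    max_items: int
    max_metadata_fields: int
    seats: int
    imports_metadata: bool
    exports_metadata: bool
    open_source: bool


# Scope figures taken from the request for quote.
REQUIRED_ITEMS = 100_000
REQUIRED_FIELDS = 100
REQUIRED_SEATS = 6


def meets_initial_criteria(candidate):
    """Return True when a candidate covers the scope and can move metadata in and out."""
    return (
        candidate.max_items >= REQUIRED_ITEMS
        and candidate.max_metadata_fields >= REQUIRED_FIELDS
        and candidate.seats >= REQUIRED_SEATS
        and candidate.imports_metadata
        and candidate.exports_metadata
    )


# Invented sample data, not real vendors or real capabilities.
candidates = [
    Candidate("Vendor A", 500_000, 250, 10, True, True, False),
    Candidate("Vendor B", 50_000, 100, 6, True, False, True),
]
shortlist = [c.name for c in candidates if meets_initial_criteria(c)]
```

Here only "Vendor A" would survive the screen; "Vendor B" falls short on item capacity and metadata export.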
This exercise was quickly instructive. While we asked for quotes broken into specific categories, vendors’ internal business processes dictated a variety of responses. Our spreadsheet became very messy very quickly. Direct comparisons were difficult.
Some lessons and distinctions were immediately apparent. Major expense drivers included self-hosted versus cloud-hosted, commercial versus open-source, and data migration fees. A self-hosted, open-source system was closest to our custom system and therefore initially appeared to be the most cost-effective for our processes. However, paying for hardware, software adaptation and ongoing, local administrative staff dramatically increases the recurring costs of open-source solutions. Commercial software is usually more expensive at the outset, but it also tends to include more functionality, though less customization. Data migration expenses were a surprise. Each company advertises that migration will be easy, but that claim is based on an assumption that your existing data is already internally consistent, well-organized and can be mapped directly into the categories available in the new software. The cost of labor to ‘normalize’ or clean up an existing catalog must be incurred regardless of whether the new system is made or bought.
Dozens of emails, phone calls, and online meetings were necessary to understand the various responses from vendors. Ideally, each communication would work from a standard script, but that was impractical. The vendors are competitors, and each made points that could be countered by another. We could have ended up in an infinite loop just trying to acquire a consistent set of questions and answers from all involved. We had to make judgments which also meant some mistakes and oversights.
Taking all of those insights and then strictly assessing our internal efforts was challenging. It’s difficult to be dispassionate when you know the people who put their passion into the work. Vendors do just as much work, but it is human nature to pay more attention to established relationships.
After much spreadsheet manipulation, we collected the cost estimates into four primary cost categories: non-recurring, first year costs, costs through three years, and costs through five years. As with most non-profits, we are sensitive to the initial costs such as initiation, training, and migration fees. First year costs included recurring licensing and hosting fees. A three year cost point was important because, according to our project-wide plan, we will reach almost full operation in about three years. (Catalog software is only one of the tasks necessary to enable our virtual museum.) Five year costs are a measure of sustainability, and an acceptance that three year plans frequently become five year plans because of schedule slides.
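The roll-up into four categories can be sketched in a few lines. This is a simplified model, not our actual spreadsheet: it assumes a flat annual recurring fee, and it assumes each cumulative figure includes the non-recurring costs. The function names and the dollar amounts are placeholders invented for the example.

```python
def cumulative_cost(non_recurring, annual_recurring, years):
    """Total spent through the given number of years (flat annual fee assumed)."""
    return non_recurring + annual_recurring * years


def cost_summary(non_recurring, annual_recurring):
    """Roll one quote up into the four cost categories used in the comparison."""
    return {
        "non_recurring": non_recurring,
        "first_year": cumulative_cost(non_recurring, annual_recurring, 1),
        "through_three_years": cumulative_cost(non_recurring, annual_recurring, 3),
        "through_five_years": cumulative_cost(non_recurring, annual_recurring, 5),
    }


# Placeholder figures only; they are not taken from any vendor's quote.
example = cost_summary(non_recurring=2_000, annual_recurring=1_200)
```

Putting every quote through the same function is what makes direct comparison possible, even when vendors break their prices down differently.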
The initial sort resulted in nine candidates for the next phase: our internally-developed software, plus eight other cloud-hosted products. These had reasonable costs and met most or all of our initial technical criteria. We assumed that we would do as much of the data migration as possible with our own volunteer staff rather than paying for vendor consultants. This reduces non-recurring cost while accepting that it will increase the time required for the migration. We also decided that vendor-hosting was safer and less expensive than managing our own servers or buying external hosting.
Next, we delved into more detailed procedural and technical discussions. While it would have been very informative to query all of the initial candidates with a full set of questions, it would have cost too much in time and money. We chose to reserve the deeper discussions for the smaller set of candidates. We sent them a list of questions to prepare them for the next, and possibly final, sort.
The second list of candidates
Catalog Maintenance System – HCLE internal effort
Human communications are never perfect, and we realized that we needed some clarification. While we thought our initial communication described where we are and where we want to go, the format of the text emphasized the future — where we want to go and the capacity we expect a system to accommodate after we’ve grown. Evidently, some vendors missed our description of our current situation. The more immediate requirements were far simpler, and influenced estimates of non-recurring items like storage requirements and the number of users. We hoped to make that clarification as well as present the expanded list of criteria.
Here is a portion of the second communication.
Our current situation is simpler because we have fewer than 1,000 items in our existing database. There is a need for metadata normalization, and we are assessing the size of that task.
For the detailed technical discussion and any updates to the estimates (particularly for data migration) we have included a set of topics.
Topics for technical questions:
- database engine – MySQL vs other
- database structure (flat vs. relational)
- typical time to implement
- data migration – cost
- data migration – time
- data normalization à la OpenRefine
- item and data flexibility for input and output
- file types – including executables
- search and select performance
- speed of access for individual queries
- performance metrics within metadata-only searches
- performance metrics including multiple displays
- automatic generation of thumbnails from pdf
- administrative vs collection data
- exhibit creation/item display
- including an exhibit as a database record
- storing all images versus storing one and modifying per display
- preservation repository
- approximate size of current installed base
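To give a sense of what the data normalization topic in the list above involves, here is a minimal sketch of grouping inconsistent metadata values so they can be reviewed and merged. It is a simplified version of the fingerprint-style key collision approach that OpenRefine popularized, not OpenRefine's own code; the function names and the sample values are assumptions made for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a value to a key that ignores case, accents, punctuation, and word order."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster_variants(values):
    """Group values that share a fingerprint; only groups with variants are returned."""
    clusters = defaultdict(set)
    for value in values:
        clusters[fingerprint(value)].add(value)
    return {key: sorted(group) for key, group in clusters.items() if len(group) > 1}


# Invented sample values for a hypothetical "manufacturer" field.
variants = cluster_variants(["Radio Shack", "radio shack ", "Shack, Radio", "Apple"])
```

Each cluster still needs a person to decide which spelling is correct, which is why the labor cost of normalization does not disappear with any particular software choice.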
Several more emails, phone calls, and online meetings occurred. Online meetings, often attended by both of us, included interactive demonstrations of the standard software.
Every system we saw demonstrated had an impressive set of features. If they didn’t, they wouldn’t be in business. Each had slightly different sets of features, pricing strategies, customer support approaches, and communication styles.
As we learned more, we also realized we might use one solution for two problems. Several of the systems included CRM functionality. (Pick your favorite variation on the C: Constituent, Contact, Client, or Customer; RM stands for Relationship Management.) Currently, we are using CiviCRM but have yet to master it. Ideally, a CRM system would track communications and grant proposals, and would reveal connections among individuals and their institutional affiliations. Realistically, we continue to rely on Gmail and spreadsheets while CiviCRM primarily acts as a massive address book until we can learn how best to use it. Because of the nature of our virtual museum, a person could be represented in both our catalog and our CRM. This is true of many museums dealing with artifacts created by living artists or inventors. For our museum, the same person might be a volunteer, artifact donor, fundraiser, financial donor, researcher, author, exhibit creator, and subject of an exhibit. We need to track current names, addresses, and contact information for such individuals as well as their relationship to items in the museum. Putting those two functions into one solution has an obvious appeal and value. Every museum, library, archive, and collection will probably find similar synergies, as well as similar deficiencies.
One of our primary concerns is the ability to use our custom and extensive metadata set to associate previously unconnected items in the catalog. At the same time, we want to track their physical and digital locations from initial intake through possible deaccession. We are a virtual museum and we plan to deaccession most of our physical artifacts to permanent repositories; but there will always be items in the active intake process. We also have artifacts that are only digital: e.g. permanent universal resource locators (PURLs) of related items, copies of born-digital items, and whole books available online through public libraries. We intend to make full text searches available across and inside of each artifact so a researcher investigating a text string will find items with the text string in the title, or in the body of the text, or in the metadata, or within our CRM database.
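The search behavior described above can be illustrated with a deliberately naive sketch: a linear scan that reports where a text string was found. A production system would use a real full-text index; the record layout, the function name, and the sample records below are all hypothetical.

```python
def find_text(query, catalog_items, crm_contacts):
    """Return (record id, location) pairs for every place the query string appears."""
    needle = query.casefold()
    hits = []
    for item in catalog_items:
        if needle in item["title"].casefold():
            hits.append((item["id"], "title"))
        if needle in item["body"].casefold():
            hits.append((item["id"], "body"))
        for field, value in item["metadata"].items():
            if needle in str(value).casefold():
                hits.append((item["id"], "metadata:" + field))
    for contact in crm_contacts:
        if needle in contact["name"].casefold() or needle in contact["notes"].casefold():
            hits.append((contact["id"], "crm"))
    return hits


# Invented sample records, for illustration only.
catalog_items = [
    {"id": "item-1", "title": "TRS-80 user manual",
     "body": "Getting started with BASIC.", "metadata": {"creator": "Radio Shack"}},
    {"id": "item-2", "title": "Oral history interview",
     "body": "We taught BASIC on a TRS-80.", "metadata": {"format": "audio"}},
]
crm_contacts = [{"id": "contact-1", "name": "Example Donor", "notes": "Donated a TRS-80."}]

hits = find_text("trs-80", catalog_items, crm_contacts)
```

The point of the sketch is the shape of the result: one query returns matches from titles, full text, metadata, and people, which is what makes a combined catalog and CRM attractive.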
Make Versus Buy
After reviewing the answers to our queries and updating the spreadsheet, we were better able to compare the commercially available solutions to our internal solution. Two key issues became apparent about creating a custom software package: time and money. No surprise there. Also, no commercial product will fulfill all of the desired functions on our wish list. Compromises will be necessary.
Initial choice among the candidates
Our next step is to try out the most promising commercial candidate, Collector Systems, and compare its performance with our internal ‘make’ effort, the Catalog Maintenance System.
The money issue was described above. Custom software can be very affordable with volunteer efforts. Unfortunately, custom software benefits most from local hosting, which benefits most from hiring dedicated system administrators. Volunteers can build impressive software, but to make a system sustainable it is necessary to have a persistent staff. Volunteers donate their time, effort, and energy, but they are also free to volunteer elsewhere as their interests or life situations shift. Besides, programmers are in great demand and we would be surprised if each of them wasn’t eventually hired away. If we hire our volunteers, even for part-time work, they will still be more expensive than the initial purchase price of many of the commercial packages. That expense became much more apparent when comparing three-year accumulated expenses.
The time issue has several components: design time, programming time, management overhead, documentation time and training time. Even with a dedicated volunteer programming crew, someone has to manage their efforts, and spend time training the rest of the staff. It is a lot to expect of programmers to ask them also to be designers, documenters and teachers. As a small museum, we must manage our time carefully. Our goal is to properly steward the collection and enable research. Managing software could easily overwhelm that central task.
Another time issue is the urgency imposed by the age of the people involved in the history. Pioneers who brought computers and computing into teaching and learning environments in the sixties are necessarily past middle age. Every day, more stories and insights are lost. Artifacts are discarded by those who think no one cares, or by those who don’t know the value of what a previous generation collected. The more time we spend developing software, the more likely we are to lose valuable bits of history.
It has been a difficult decision to set aside, at least temporarily, the system we developed. By choosing a commercial package, however, we may save significant money and time. “May” reflects that any such choice is always made with educated guesses at best. Perhaps our internal software is just on the cusp of being fully functional, something we can share with the community. Maybe we are years from completion. Maybe the transition to another system will take longer and cost more than we expect. Maybe using an existing solution will provide opportunities we are otherwise missing. There is no way to test a new catalog system except to load in some sample data and see how it performs.
Tentative Selection – Collector Systems
We have decided to conduct an in-depth test and trial of Collector Systems, a cloud-based collection management system.
As described above, a key distinguishing feature was being cloud-based. Cloud-based systems dramatically decreased our system administration costs. We recognize that nothing is free, and that there will be a need and a cost associated with managing the management software; but, that’s a baseline all options were burdened with.
As a small museum, we are very sensitive to costs, so being able to scale up from a single user also reduced our costs. Scaling appears to be manageable one new user at a time and on a monthly basis.
The feature set covers most of our key criteria. We want to be able to create galleries of the images and files, view extensive metadata, conduct searches within the database, and do so in a simple, yet powerful interface. At this point, almost every vendor could say they could provide the same function; and they’re probably right. As we described above, if they couldn’t, they wouldn’t be in business.
Costs are objective criteria. Feature sets are also measured against objective criteria, frequently as a pass/fail; but feature sets are also measured against subjective criteria. A menu, navigation strategy, visual interface, or data import and export scheme may be intuitive for some, but not for others. We’re familiar with the history of how people use computers, so we appreciate the importance of how hard or easy it is to learn and use a new system. The progression from Altair, to Apple 1, to IBM PC, to Macintosh, up through smartphones isn’t just about processing power. Apple’s operating system is cheered as intuitive, but that opinion is not universal.
While it would be ideal to test every system to the same depth, we can’t afford that luxury. But we must test, so it is important for us to test one system in a way that is affordable and that can also lead to migration and implementation. We won’t know whether this is our solution until we’ve tried learning their software, incorporating our metadata, and migrating several items.
Our trial of Collector Systems is just beginning. We will report news as appropriate – and as we have time. We have a lot of work to do. Picking a good catalog management system enables our current critical path task. Hopefully, we will reach our goal more quickly and with lower costs.
Cost Comparisons – cost of conducting the survey
Conducting such a survey is not free. The representatives freely offered impressive amounts of time to guide us, but we did experience internal labor charges to manage the survey. The survey cost about $800 and took about four months. The survey wasn’t the only charge or the only task, but it was a significant effort considering the size of our organization.
The benefit of conducting the survey is a potential cost savings of $300,000 over three years, with additional recurring savings that increase our sustainability. We are also potentially shortening our program plan by several months, and possibly by as much as a year.
The benefit/cost ratio is therefore about $300,000/$800, or $375 of benefit for every dollar spent. If you prefer the reciprocal, the survey cost about 0.267% of the potential savings.
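For anyone who wants to repeat the arithmetic with their own numbers, the calculation is short. The two inputs are the figures quoted above; the variable names are ours, and the savings figure is a potential, not a measured, benefit.

```python
survey_cost = 800            # approximate internal labor cost of the survey, in dollars
potential_savings = 300_000  # potential savings over three years, in dollars

# Dollars of potential benefit per dollar spent on the survey.
benefit_per_dollar = potential_savings / survey_cost

# Survey cost expressed as a percentage of the potential savings.
cost_as_percent = survey_cost / potential_savings * 100

print(f"${benefit_per_dollar:.0f} of benefit per dollar spent")
print(f"survey cost is {cost_as_percent:.3f}% of the potential savings")
```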
The survey would not have been possible without an internal willingness to face difficult internal reviews. The survey would not have been possible without an external willingness to educate us without compensation. Properly wording the communications to the vendors who weren’t chosen was difficult because their efforts are so deeply appreciated. Such surveys, however, only result in one answer; so, only one yes was delivered. Thanks to everyone else for their time.
Now begins the real work of learning, training, and exploring the software, while ultimately returning to the task of building our virtual museum for the history of computing in learning and education. Our urgency persists. The passion and confusion within the Educational Technology industry demonstrate a demand for understanding how to improve students’ learning. We are all students: kids in classrooms, employees at work, anyone who found that their computer had an upgrade overnight. Computers and computing have dramatically changed the need for, and the style of, learning and education throughout our society. We intend to supply some of the insights needed.
But first, we have to figure out how to log in to Collector Systems.
(edited by Liza Loop)