Design, query, and evaluate information retrieval systems
Introduction
Information professionals design information retrieval systems to store information about all sorts of things, depending on the field one works in. We generalize these many different items, such as books, DVDs, corporate reports, scholarly journals, games, and tools, under the single term documents for brevity. How usable an information retrieval system is depends on good design, on understanding how to leverage its search functionality, and on how user friendly the interface is.
Design
When designing an information retrieval system, one must first investigate who the system is for rather than what it will store. The ultimate end user is critical because, in order to determine what information to store, one needs to understand how that information will be used and what the user's other information needs are (Weedman, 2018). Once the end user is identified, the designer can decide what attributes to store about the documents. For books, example attributes are the book's title, publisher, number of pages, or reading level. A design with too few attributes will not be useful, but too many attributes are also a problem: each attribute must be created when the document is ingested into the system, and each one adds cost.
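Concretely, a document record built from a handful of such attributes might be sketched as follows (the fields and example values here are my own illustration, not the chapter's):

```python
# A minimal sketch of a book record limited to the attributes its users are
# expected to search on. Fields and example values are hypothetical.

from dataclasses import dataclass

@dataclass
class BookRecord:
    title: str
    publisher: str
    page_count: int
    reading_level: str   # every extra attribute must be filled in at ingestion, adding cost

record = BookRecord("Charlotte's Web", "Harper & Brothers", 184, "Grades 3-5")
print(record)
```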
When designing an information retrieval system, one tool a designer uses to limit the content of an attribute is a controlled vocabulary. A controlled vocabulary is a list of predefined searchable terms, such as subject headings, that forces documents into a set structure. Controlled vocabularies make retrieval more precise, but they are not flexible: they rely on the vocabulary being defined completely and accurately from the beginning. Here, knowing one's users becomes doubly important. Vocabulary design relies on language, which can be highly interpretive, regional, and specific to a subject field. For instance, if the primary user is a resident of the UK, then the term 'boot' might be more appropriate than 'trunk'. Similarly, if the user is a medical researcher, then age descriptions might need to be more fine-grained than just "child" and "adult". Understanding the user of the information informs many controlled vocabulary decisions.
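As a minimal sketch of the idea, the following hypothetical Python snippet shows how a controlled vocabulary, paired with an entry list of variant terms, could normalize a cataloger's free-text subject term at ingestion (the terms and mappings are invented for illustration):

```python
# A minimal sketch (not from the source chapter) of how a controlled vocabulary
# can normalize free-text input at ingestion time. Terms and mappings are hypothetical.

CONTROLLED_VOCABULARY = {"Automobiles", "Children", "Adults", "Infants"}

# Entry vocabulary: variant or regional terms that point to the authorized heading.
SYNONYM_TO_AUTHORIZED = {
    "cars": "Automobiles",
    "autos": "Automobiles",
    "kids": "Children",
    "babies": "Infants",
}

def normalize_subject(term: str) -> str:
    """Map a cataloger's free-text subject term to an authorized heading,
    or raise an error so the vocabulary can be reviewed and extended."""
    cleaned = term.strip()
    if cleaned.title() in CONTROLLED_VOCABULARY:
        return cleaned.title()
    if cleaned.lower() in SYNONYM_TO_AUTHORIZED:
        return SYNONYM_TO_AUTHORIZED[cleaned.lower()]
    raise ValueError(f"'{term}' is not in the controlled vocabulary")

print(normalize_subject("kids"))   # -> Children
```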
Querying
It helps to understand the design of an information retrieval system before attempting a search because, as Weedman (2018) states, "How you store it determines how you can retrieve it." For instance, when you hit search it helps to know which field is being searched by default. Is the search function querying only specified fields, or is it searching the full text of the document? If only the title field is searched by default, then one may miss documents with creative and interesting titles that lack the keyword one has searched. Names cause particular problems when searching, and it is useful to know exactly how the system's designer enforced name input. Weedman (2018) uses the example of the author known as "Nancy Van House" and details more than 17 different conventions used when this name is input. Knowing which convention was used improves one's query results (although no one can account for data entry errors, which may cause the user to miss results). Generally, when a controlled vocabulary is used, a searcher can expect more accurate results. If the searcher knows that natural-language tagging was used, then trying multiple synonyms, or even slang, helps retrieve all possible documents despite the variation of natural language. Anecdotally, I always search for common misspellings of product names on eBay because I know there is no controlled vocabulary and good deals are often found.
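A small, hypothetical example illustrates why the default field and synonym expansion matter (the records and search function here are my own sketch, not any real catalog's behavior):

```python
# A minimal sketch (my own illustration, not the chapter's example) of why the
# field searched by default matters and why synonyms help. Records are hypothetical.

records = [
    {"title": "Hidden Cargo Space",
     "full_text": "What UK drivers call the boot, US drivers call the trunk."},
    {"title": "Packing the Boot",
     "full_text": "How to fit a family's luggage into the boot of a small car."},
]

def search(keyword, field="title"):
    """Return records whose chosen field contains the keyword (case-insensitive)."""
    return [r for r in records if keyword.lower() in r[field].lower()]

# A default title-only search for "trunk" misses the creatively titled first record.
print(len(search("trunk")))                      # 0
print(len(search("trunk", field="full_text")))   # 1

# Expanding the query with a regional synonym recovers both documents.
hits = {r["title"] for term in ("trunk", "boot") for r in search(term, field="full_text")}
print(sorted(hits))                              # both titles
```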
Evaluating
When evaluating an information retrieval system, one needs to be concerned with its precision and recall. Precision measures how much of what was retrieved is actually relevant: the number of relevant documents retrieved divided by the total number of documents retrieved. Recall measures how much of the relevant material was found: the number of relevant documents retrieved divided by the total number of relevant documents in the system. But in order to use these two measures, one must first define relevance, and defining relevance is quite tricky. Bill Maron, who worked on early search engine design, defined it most simply as "all and only" (Weedman, 2018, p. 182). This elegant definition means that one expects to see all the documents one wants and nothing extra. But again we are stuck, because understanding which documents a user wants is hard to tease out using language. For example, say a user wants help with the word processing application Pages and searches "help adding page numbers to Pages". Search engines often cannot distinguish the two different senses of the word "pages" in this query and may return many irrelevant results.
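A quick worked example with invented numbers makes the two calculations concrete:

```python
# A worked example (my own numbers, not from the source) of the precision and
# recall calculations described above.

retrieved = {"doc1", "doc2", "doc3", "doc4", "doc5"}   # what the search returned
relevant  = {"doc2", "doc4", "doc6", "doc7"}           # what the user actually wanted

relevant_retrieved = retrieved & relevant              # doc2, doc4

precision = len(relevant_retrieved) / len(retrieved)   # 2 / 5 = 0.40
recall    = len(relevant_retrieved) / len(relevant)    # 2 / 4 = 0.50

print(f"Precision: {precision:.2f}  Recall: {recall:.2f}")
```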
Search engine designers tackle this problem with varied, closely guarded algorithms that attempt to weight the 'aboutness' of a document and rank its relevance within the search results. Ultimately, as Weedman (2018, p. 184) states, "It is much more difficult than it seems like it should be to turn a complex desire for something that a user may not be able to completely define into a query that will come close to retrieving all and only the relevant documents." In the end, one relies on the skill of the searcher to discover exactly the right amount of specificity and combination of synonyms to produce relevant results.
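The commercial ranking algorithms themselves are proprietary, but one classic, publicly documented building block of 'aboutness' weighting is simple term-frequency scoring, sketched here with hypothetical documents:

```python
# A minimal sketch of term-frequency ranking: documents that mention the query
# terms more often are treated as more "about" them. Documents are hypothetical,
# and this is far simpler than any commercial search engine's algorithm.

from collections import Counter

docs = {
    "manual":  "pages pages numbering help document pages",
    "history": "a history of printed pages and early books",
}

def score(query: str, text: str) -> int:
    """Sum how many times each query word appears in the document."""
    words = Counter(text.lower().split())
    return sum(words[w] for w in query.lower().split())

query = "help adding page numbers to pages"
ranking = sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)
print(ranking)  # the manual scores higher, but the word "pages" is still ambiguous
```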
One final aspect of good information retrieval system design is usability. Information retrieval systems are usually accessed through a website user interface, and the degree to which this interface is intuitive and clear to the user affects the search results. Website usability is a field unto itself, but it essentially concerns the clarity of a website's design. Are labels clear and concise? Can a user quickly identify where on the page the information they are seeking is located? Are fonts readable? Even the choice of button color affects usability.
Good information retrieval system design starts with understanding the user and how the documents being stored will be used. One can retrieve more relevant results by paying attention to which fields are searched by default, noticing whether a controlled vocabulary is used, and using multiple synonyms during search. Finally, retrieving the relevant information for a user's request relies on the searcher's ability to translate an information request into the words that will retrieve relevant results.
Competency development
Although I have used databases throughout my life, my theoretical understanding comes mainly through the SJSU INFO 202 Database Retrieval System Design course. The group project in which we designed a database for a small collection of games was very helpful in demonstrating the pitfalls a designer can encounter. Our collection included only a handful of items, but we still encountered errors in attribute selection and missed an opportunity to increase findability by using a controlled vocabulary. (As a small side note, this project also showed the value of a diverse work group. One of the game types I contributed was "collaborative game", which my non-parent group members had never heard of. A diverse team creates a more complete product.)
I have expanded my practical knowledge of searching during INFO 200 Information Communities by using the SJSU databases. My information community was the Women’s March of 2017 and I was challenged to find search terms to bring back relevant research related to their information seeking behaviors.
In my professional experience, I am now more aware of the limits of online library catalogs, especially for young, inexperienced searchers like the students I serve. For my elementary school catalog, I have worked hard to clean up the data, which was often input by volunteers, to improve findability for books in our collection.
Evidence Description
INFO 202 – Group 5 Testing Observations
To demonstrate my ability to evaluate an information retrieval system, I chose this paper explaining my observations from testing another group's database. In this group project, although I contributed insights to all of the evaluation questions, I was solely responsible for the discussion of the rules evaluation.
INFO 202 – Database Design Project Reflections
To demonstrate my ability to understand the intricacies of information retrieval system design and controlled vocabulary design, I chose my database design reflection paper. In this paper I detail the complexities I encountered while designing a system for a small collection of board games.
INFO 202 – Orange County Visitors Association Website Usability Project
This extensive website usability project discusses the usability of the Orange County Visitors Association website. It includes usability testing as well as recommendations for improving the menu design.
INFO 202 – Discussion Access Points
To demonstrate my understanding of how access points affect searching, I submit this discussion reviewing the access points of four different databases. In this discussion I look at a library catalog, a public library database, an academic library database, and finally a retail website.
Concluding Remarks
Information retrieval system design can seem simple on the surface, but it requires a deep knowledge of the user's intentions for the documents being stored, a well-thought-out vocabulary design, and attention to the usability of the web interface. A skilled searcher knows that a good query relies on knowledge of the database design and on repeated refinement of the keyword search to retrieve relevant results.
References
Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp. 171-198). Libraries Unlimited.