Verbosity and Interface Design

Jussi Karlgren & Kristofer Franzén
March 1997

Users Pose Short Queries

We know from previous studies that the queries untrained users pose to information retrieval systems are short: almost every query is three words or fewer (Rose and Cutting, 1996; Rose and Stevens, 1996; Croft et al., 1995). There is little room to elicit finer-grained information from the user in an unstructured list of three or four disjoint content words, and it would be desirable to encourage users to enter longer queries.

System    1 word    2 words    3 words    4 words    > 4 words
Apple       53        28         13          4           2
Excite      32        38         17          7           6
THOMAS      22        38         28          9           3


Table 1. Percentage of user queries of various lengths in three systems (from Rose and Cutting, 1996)
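As a small worked check of the claim that most queries are three words or fewer, the percentages in Table 1 can be summed per system. The sketch below only re-adds the published figures; the names `TABLE_1` and `share_up_to` are ours and purely illustrative.

```python
# Percentages copied from Table 1 (Rose and Cutting, 1996).
# Columns: 1 word, 2 words, 3 words, 4 words, more than 4 words.
TABLE_1 = {
    "Apple": [53, 28, 13, 4, 2],
    "Excite": [32, 38, 17, 7, 6],
    "THOMAS": [22, 38, 28, 9, 3],
}


def share_up_to(row, max_words):
    """Sum the percentages for queries of at most max_words words (1 to 4)."""
    return sum(row[:max_words])


# Share of queries with three words or fewer, per system.
shares_three_or_fewer = {
    name: share_up_to(row, 3) for name, row in TABLE_1.items()
}

for name, share in shares_three_or_fewer.items():
    print(f"{name}: {share}% of queries have three words or fewer")
```

By this count, roughly nine out of ten queries in each of the three systems have at most three words.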

Entry Field Length

It has been assumed that a short entry field encourages users to use short queries. For most popular web search engines the entry field is typically on the order of 20-55 characters.

System                  Field length (characters)
Altavista               55
Altavista advanced      68x3
Excite                  40
Galaxy                  25
Infoseek                50
Lycos                   20
RBSE Spider             20
Web Crawler             40
World Wide Web Worm     40
Yahoo                   30


Table 2. Length of input field in some popular systems

Experiment

We have tested this hypothesis in a small study. We had nineteen linguistics students with varying, but mostly little, experience of information retrieval systems (ranging from proficient web retrieval system user to hardly any computer experience at all) perform three tasks using two different interfaces. One group of subjects was given an interface with a large text field of six full-length lines of text, which allowed arbitrarily long queries to be entered; the other group was given an interface with a short entry field of only eighteen visible characters, which allowed queries of up to two hundred characters to be entered. The search interface was connected to the Altavista search engine, which the subjects were told, and each user query was sent to Altavista. The top twenty ranked documents that Altavista retrieved for the search were presented to the user.



Figure 1. The two experimental conditions

The experimental interfaces can be found at http://www.sics.se/~jussi/soek.html and soekkort.html, respectively.

The tasks were to 1) find material on carpal tunnel syndrome, in some language other than Swedish; 2) find national holidays and festivals around the world that occur in February 1997; 3) find tips for evening entertainment in Palo Alto at the end of March 1997.

The instructions to the subjects were to search until they felt they had a reasonable result set in the list of top twenty ranked documents displayed. We discarded the results after the experiment - the success rate was not measured - and retained the queries. Queries of zero length were discarded, since we assumed they were test clicks by users rather than searches.
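The query-cleaning step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it assumes that words are separated by whitespace (the tokenisation is not stated in the text) and that a query consisting only of blanks counts as zero length. The sample queries are invented for the example.

```python
def query_length(query):
    """Count whitespace-separated words (assumed tokenisation)."""
    return len(query.split())


def average_query_length(queries):
    """Mean length in words, ignoring zero-length queries (test clicks)."""
    lengths = [query_length(q) for q in queries]
    lengths = [n for n in lengths if n > 0]
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Hypothetical log for one subject; not data from the experiment.
sample_queries = [
    "carpal tunnel syndrome",
    "",
    "palo alto evening entertainment",
    "   ",
    "holidays february 1997",
]

average = average_query_length(sample_queries)
print(f"Average query length: {average:.2f} words")
```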

Results

Condition            # of subjects    # of queries    Average query length in words
Long entry field           9               118                    3.43
Short entry field         10               123                    2.81


Table 3. Average length of query for the two experimental conditions

The difference in average query length is significant at the 90% level, and close to significant at the 95% level, in a Mann-Whitney U test, as can be seen in Table 4.

Criterion 90%: 14190
Rank sum: 14055.5
Criterion 95%: 14149


Table 4. Mann-Whitney U test for significance
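For readers unfamiliar with the test, the rank sum underlying a Mann-Whitney U test can be computed roughly as sketched below. This is the textbook computation with tied values given their average rank; the text does not say how ties or critical values were handled in the study, so the sketch should not be read as a reproduction of Table 4. The function names and the sample lengths are ours and purely illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def rank_sum_and_u(sample_a, sample_b):
    """Rank sum of sample_a in the pooled data, and the corresponding U."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a = len(sample_a)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return rank_sum_a, u_a


# Hypothetical query lengths in words; not data from the experiment.
long_field = [3, 4, 5, 2]
short_field = [1, 2, 3, 2]

rank_sum, u_statistic = rank_sum_and_u(long_field, short_field)
print(f"Rank sum: {rank_sum}, U: {u_statistic}")
```

The resulting statistic would then be compared with a critical value for the chosen significance level, which this sketch does not compute.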

Conclusions

If longer queries are desired, they should be solicited by longer entry fields.

References

  1. W. B. Croft, R. Cook, and D. Wilder. 1995. "Providing Government Information on the Internet: Experiences with THOMAS". Proceedings of Digital Libraries '95. 19-24.
  2. Daniel E. Rose and Douglass R. Cutting. 1996. Ranking for Usability: Enhanced Retrieval for Short Queries. Apple Technical Report #163. Cupertino: Apple Computer Inc.
  3. Daniel E. Rose and Curt Stevens. 1996. V-Twin: A Lightweight Engine for Interactive Use. Proceedings of the fifth Text Retrieval Conference, TREC-5. Donna Harman (ed), NIST Special Publication, Gaithersburg: NIST.