We know from previous studies that the queries untrained users
pose to information retrieval systems are short: almost every query
is three words or fewer (Rose and Cutting, 1996; Rose and Stevens,
1996; Croft et al., 1995). There is little room to elicit finer-grained
information from the user in an unstructured list of three or
four disjoint content words, and it would be desirable to encourage
users to enter longer queries.
| | 1 word | 2 words | 3 words | 4 words | > 4 words |
| Apple | | | | | |
| Excite | | | | | |
| THOMAS | | | | | |
It has been assumed that a short entry field encourages users
to use short queries. For most popular web search engines the
entry field is typically on the order of 20-55 characters.
| Altavista | |
| Altavista advanced | |
| Excite | |
| Galaxy | |
| Infoseek | |
| Lycos | |
| RBSE Spider | |
| Web Crawler | |
| World Wide Web Worm | |
| Yahoo |
We have tested this hypothesis in a small study. We had nineteen
linguistics students with varying, but mostly little, experience
of using information retrieval systems (ranging from proficient
web retrieval system user to hardly any computer experience at
all) perform three tasks using two different interfaces. One group
of subjects was given an interface with a large text field of
six full-length lines of text, which allowed arbitrarily long
queries to be entered; the other group was given an interface with
a short entry field of only eighteen visible characters, which allowed
queries of up to two hundred characters to be entered. The search
interface was connected to the Altavista search engine (of which
the subjects were advised), and the user query was sent to
Altavista. The top twenty ranked documents Altavista retrieved
for the search were presented to the user.
The experimental interfaces can be found at http://www.sics.se/~jussi/soek.html and soekkort.html, respectively.
The tasks were to 1) find material on carpal tunnel syndrome, in some language other than Swedish; 2) find national holidays and festivals around the world that occur in February 1997; 3) find tips for evening entertainment in Palo Alto at the end of March 1997.
The instructions to the subjects were to search until they felt
they had a reasonable result set in the list of top twenty ranked
documents displayed. We discarded the results after the experiment
- the success rate was not measured - and retained the queries.
Queries of zero length were discarded, since we assumed they were
test clicks by users rather than searches.
| | # of subjects | # of queries | Average query length in words |
| Long entry field | | | |
| Short entry field | | | |
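The average query length per interface could be computed along the following lines. This is only a minimal sketch: it assumes that a word is any whitespace-separated token, which the text does not specify, and the function names and example queries are hypothetical, not taken from the study.

```python
from statistics import mean


def query_length(query: str) -> int:
    """Number of whitespace-separated words in a query (assumed definition)."""
    return len(query.split())


def average_query_length(queries):
    """Mean length in words, ignoring zero-length queries.

    Zero-length queries are dropped, mirroring the decision to treat
    them as test clicks rather than searches.
    """
    lengths = [query_length(q) for q in queries]
    lengths = [n for n in lengths if n > 0]
    return mean(lengths) if lengths else 0.0


# Hypothetical queries for illustration only; not data from the study.
example_queries = [
    "carpal tunnel syndrome",
    "",
    "holidays february 1997",
    "palo alto",
]
print(average_query_length(example_queries))
```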
The difference in average query length is significant at the
90% level, and close to significant at the 95% level, in a
Mann-Whitney U test, as can be seen in Table 4.
| Criterion 90%: | 14190 |
| Rank sum: | 14055.5 |
| Criterion 95%: | 14149 |
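For readers unfamiliar with the test, the rank sum used in a Mann-Whitney U test can be sketched as follows. The sketch assumes that tied query lengths share the mean of the ranks they occupy (midranks), which is the usual convention but is not stated in the text; the function names and sample values are hypothetical and are not the study's data.

```python
def rank_sums(sample_a, sample_b):
    """Return (rank sum of a, rank sum of b), using midranks for ties."""
    pooled = sorted(
        [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b],
        key=lambda pair: pair[0],
    )
    sums = [0.0, 0.0]
    i = 0
    while i < len(pooled):
        # Find the end of the run of values tied with pooled[i].
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 (0-based) hold ranks i+1..j; ties share the mean.
        midrank = (i + 1 + j) / 2
        for _, group in pooled[i:j]:
            sums[group] += midrank
        i = j
    return sums[0], sums[1]


def mann_whitney_u(sample_a, sample_b):
    """Return (U for a, U for b) derived from the rank sum of sample a."""
    r_a, _ = rank_sums(sample_a, sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical query lengths in words; not data from the study.
short_field = [1, 2, 2, 3]
long_field = [2, 4, 5]
print(rank_sums(short_field, long_field))
print(mann_whitney_u(short_field, long_field))
```

The rank sum of one group is then compared against a critical value for the chosen significance level, as in the table above.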
If longer queries are desired, they should be solicited by longer
entry fields.