2010
04.01

ECIR Industry Day 2010

The event consisted of 12 different speakers each presenting for exactly 20 minutes, with about 10 minutes of Q&A after each. I particularly enjoyed the presentations from the major search engines: Yahoo, Google, Bing, and Wolfram Alpha. A topic that seemed to arise in each of those talks was how query reformulation data can provide a feedback loop to make search better. But without further ado, here are my summaries of each talk.

Mining the Web 2.0 to Improve Search

Ricardo Baeza-Yates, Yahoo Research

Ricardo Baeza-Yates talked about how web usage data can be used to improve relevance and to provide accurate related queries.

He first talked about how user tagging can be combined with user-generated image annotations to increase the relevance of image search. What I found most interesting, however, was how Yahoo keeps track of all the iterative queries people enter in order to get to a specific result (for example, if the user typed ‘furry animal,’ reformulated the query to ‘white black furry animal,’ and then chose an article on pandas, Yahoo would associate both queries with ‘Panda’). Yahoo would then use this data to suggest related queries (e.g. if the user searched ‘furry animal’, Yahoo would suggest ‘white black furry animal’ as a related concept).
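To make the idea concrete, here is a minimal sketch of how reformulation sessions could feed related-query suggestions. It is not Yahoo's implementation: the session format, the function names (`build_associations`, `suggest`) and the sample data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

def build_associations(sessions):
    """Build two lookup tables from search sessions.

    Each session is (queries_in_order, clicked_result); this shape is
    hypothetical, not a real log format.
    """
    related = defaultdict(Counter)   # earlier query -> later reformulations
    concepts = defaultdict(Counter)  # any query in the session -> clicked result
    for queries, clicked in sessions:
        for i, earlier in enumerate(queries):
            concepts[earlier][clicked] += 1
            for later in queries[i + 1:]:
                if later != earlier:
                    related[earlier][later] += 1
    return related, concepts

def suggest(related, query, limit=3):
    """Return the most frequent reformulations seen after `query`."""
    return [q for q, _ in related.get(query, Counter()).most_common(limit)]

# Made-up sessions mirroring the example from the talk.
sessions = [
    (["furry animal", "white black furry animal"], "Panda"),
    (["furry animal", "white black furry animal"], "Panda"),
    (["furry animal", "furry pet"], "Hamster"),
]
related, concepts = build_associations(sessions)
print(suggest(related, "furry animal"))
# ['white black furry animal', 'furry pet']
print(concepts["furry animal"].most_common(1))
# [('Panda', 2)]
```

A real system would presumably also need to filter noise and weigh recency, which this sketch ignores.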

Google Squared: Web Scale, Open Domain Information Extraction and Presentation

Dan Crow, Google

Dan Crow opened by asserting that complex tasks, such as planning a trip or writing a research paper, are still very difficult on the web. Google’s user studies revealed that people make spreadsheets, email themselves things to remember, and add post-it notes to their computer monitors when undertaking these complex search activities. They also found that people seemed to love tables (and so Google chose to adopt a table style for data presentation itself).

Google Squared is an attempt to help people cope with comparison-heavy, list-driven search activities and it operates at three different levels. First, it seeks to discover the topic behind the user’s query (does “Ford” refer to U.S. Presidents or the car manufacturer, for example?). Once the topic has been isolated, Google Squared then tries to find attributes of that topic (price, horsepower, colour). And finally, it tries to fill in the values for each of the attributes.

Google Squared combines offline analysis (such as mining Wikipedia categories and combing the web for attribute/value pairs) with run-time queries for finding specific values.

Relevance Challenges at Bing

Milad Shokouhi, Microsoft Research

I really enjoyed Milad Shokouhi’s talk about clever ways of boosting relevance in search, though I must say it stretched my vocabulary a bit. His initial slide outlining the challenges to relevancy included temporal queries, heterogeneous verticals, and pre-retrieval query alteration. He did go on, however, to clearly articulate each and offer concrete examples from Bing.

Like Ricardo Baeza-Yates, Milad Shokouhi touched on how query reformulation data can be used to enhance relevance of ambiguous terms. An example from Bing was the ranking of results for the query “wow.” The unadjusted top hit for this query was a cable company. But a look at the query reformulation logs indicated that a majority of people who queried for “wow” went on to search for “world of warcraft.” Bing then adjusted the ranking of the results so that world of warcraft appeared first.
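The “wow” example can be sketched roughly as follows. This is only an illustration under assumptions: the log contents, the result titles, the 50% threshold and the function names are hypothetical, and Bing's actual ranking logic was not described in this level of detail.

```python
from collections import Counter

def dominant_reformulation(reformulations, threshold=0.5):
    """Return the follow-up query used by more than `threshold` of users, if any."""
    counts = Counter(reformulations)
    if not counts:
        return None
    query, n = counts.most_common(1)[0]
    return query if n / len(reformulations) > threshold else None

def rerank(results, intent):
    """Move results whose title mentions `intent` to the top.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(results, key=lambda title: intent.lower() not in title.lower())

# Hypothetical reformulation log for the ambiguous query "wow".
log = ["world of warcraft"] * 7 + ["wow cable"] * 3
# Hypothetical result titles in their unadjusted order.
results = [
    "WOW! Cable, Internet and Phone",
    "World of Warcraft official site",
    "Wow - dictionary definition",
]

intent = dominant_reformulation(log)
if intent is not None:
    results = rerank(results, intent)
print(results[0])
# World of Warcraft official site
```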

He also talked about query trends, specifically the “spiking” of certain queries. Some queries spike quickly, but then disappear just as quickly. Other queries (like “iPad”) spike and remain high. Still others spike seasonally (such as “Halloween costume”).

These spikes of queries could be used to trigger the appearance of a news story, seasonally suggest related queries, or even forecast future events (elections, for instance).
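The three spike patterns can be told apart with a very simple heuristic. The sketch below is my own illustration, not anything shown in the talk: the warm-up baseline, the 3x threshold, the labels and the sample counts are all assumptions.

```python
def classify_spike(counts, warmup=3, factor=3.0):
    """Label a series of per-period query counts.

    The baseline is the mean of the first `warmup` periods (assumed to be
    quiet); a period is "hot" when its count exceeds factor * baseline.
    """
    baseline = max(sum(counts[:warmup]) / warmup, 1.0)
    hot = [c > factor * baseline for c in counts]
    # Count runs of consecutive hot periods.
    runs = sum(1 for i, h in enumerate(hot) if h and (i == 0 or not hot[i - 1]))
    if runs == 0:
        return "no spike"
    if runs > 1:
        return "recurring"  # e.g. a seasonal query
    return "sustained" if hot[-1] else "transient"

# Made-up counts for illustration.
print(classify_spike([10, 10, 10, 200, 150, 12, 10, 10]))       # transient
print(classify_spike([5, 5, 5, 300, 280, 260, 250]))            # sustained
print(classify_spike([10, 10, 10, 90, 10, 10, 10, 95, 10, 10])) # recurring
```

A label like "sustained" could then be the trigger for showing a news story, and "recurring" for seasonal query suggestions.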

Search User Experience, the Essentials of Great Search Design

Vegard Sandvold, Comperio

Vegard Sandvold argued for cross-disciplinary collaboration. Everyone should be involved in the design process, from stakeholders, to techies, to users. “Innovation happens where disciplines intersect.”

Vegard advocated a “Sprint Zero” phase of one to four weeks in which to set forth a plan of action for the project. During this time, he typically talks to stakeholders about their goals, interviews users, creates personas, and attempts to identify all of the problems that must be solved in the project. He also strives to prototype and test both the basics of the interaction design and the capacity of the underlying technology.

His talk can be summarised by his final slide: “We discover the best solutions together.”

Getting Value from the Search Master’s Toolbox

David Hawking, Funnelback

David’s talk was primarily about fine-tuning relevance in enterprise search deployments within organisations. He advocated picking a selection of random queries (that realistically reflect the distribution of queries within an organisation), trying those queries yourself, and tuning the engine until those queries produce the most relevant results.
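One way to pick test queries that reflect the real distribution is to sample them in proportion to how often they occur in the query log. The following is a minimal sketch under that assumption; the log contents and the function name `sample_test_queries` are hypothetical and not taken from the talk or from Funnelback.

```python
import random
from collections import Counter

def sample_test_queries(query_log, k, seed=0):
    """Draw k test queries, weighted by their frequency in the log.

    Frequent queries are more likely to be drawn (and may repeat), so the
    test set mirrors what users actually search for.
    """
    counts = Counter(query_log)
    queries = list(counts)
    weights = [counts[q] for q in queries]
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    return rng.choices(queries, weights=weights, k=k)

# Made-up intranet query log.
query_log = (
    ["holiday policy"] * 50
    + ["expenses form"] * 30
    + ["it helpdesk"] * 15
    + ["board minutes 2009"] * 5
)
test_set = sample_test_queries(query_log, k=10)
print(test_set)
```

Each sampled query would then be run by hand and the results judged, before and after every tuning change.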

Enterprise Search: State of the Market 2010 & Beyond

Nick Patience, The 451 Group

Nick Patience is a market analyst and provided some revealing insights into the industry. It’s a bit smaller than I would have guessed: in 2009 it totalled $1.3bn and is estimated to reach $2.8bn by 2013 (at a 22% annual growth rate).
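As a quick sanity check on those figures, here is the compound-growth arithmetic. It assumes four annual compounding periods between 2009 and 2013, which is my reading of the numbers rather than something stated in the talk.

```python
def project(value, rate, years):
    """Value after compounding `rate` annually for `years` years."""
    return value * (1 + rate) ** years

def cagr(start, end, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1 / years) - 1

# $1.3bn growing at 22% a year for four years (2009 to 2013, assumed).
print(round(project(1.3, 0.22, 4), 2))  # 2.88
# Growth rate implied by going from $1.3bn to $2.8bn in four years.
print(round(cagr(1.3, 2.8, 4), 3))      # 0.211
```

So the quoted 22% and the $2.8bn estimate agree to within rounding.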

Project Plaza – A New Approach to Information Management in the Construction Sector

Rob Blackwell, Active Web Solutions

Rob and his colleague demonstrated a search-driven web application for managing construction projects.

Rethinking the Library Catalogue

Sally Chambers, The European Library

Sally Chambers from the European Library talked about the challenges they faced in integrating a vast number of library corpora using a federated search approach. She described the issues they faced in providing relevant results due to the asynchronous nature of the system: certain searches might time out, and providing a blended set of relevant results was not feasible.
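The time-out problem is easy to reproduce in a toy federated search. The sketch below is only an illustration: the source names, delays and hits are invented, and the simulated `search_source` stands in for a real network call to a remote catalogue.

```python
import asyncio

async def search_source(name, delay, hits):
    """Simulate a remote catalogue that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return [(name, hit) for hit in hits]

async def federated_search(sources, timeout):
    """Query all sources concurrently and drop any that exceed `timeout`."""
    async def guarded(name, delay, hits):
        try:
            return await asyncio.wait_for(search_source(name, delay, hits), timeout)
        except asyncio.TimeoutError:
            return []  # a slow source contributes nothing to the result list
    batches = await asyncio.gather(*(guarded(*s) for s in sources))
    # Results are simply concatenated per source; there is no shared
    # relevance score to blend them by.
    return [hit for batch in batches for hit in batch]

# Hypothetical sources: (name, response delay in seconds, hits).
sources = [
    ("library_a", 0.01, ["doc1", "doc2"]),
    ("library_b", 0.50, ["doc3"]),  # too slow, will time out
]
results = asyncio.run(federated_search(sources, timeout=0.1))
print(results)
# [('library_a', 'doc1'), ('library_a', 'doc2')]
```

Here the slow library silently disappears from the results, and whatever does come back is grouped by source rather than ranked as one list. A single index avoids both problems.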

With users expecting the same user experience they get on the web, they initiated a move towards a single index of bibliographic and full text (including optically recognised) content. Some of the more persistent challenges that still remained had to do with the multi-lingual nature of the content, and the disparity in the metadata formats used by the individual institutions (such as MARC vs. Dublin Core).

Collaborative Research, Technology Transfer and Networking

John Tait and Francisco Webber

John Tait presented Francisco Webber’s ideas on the information retrieval innovation cycle. He advocated enhancing a mutually beneficial ecosystem between academia, government, and industry by aligning the incentives: academics need to publish papers, politicians need to get re-elected, and businesses need to generate revenue.

He insisted that innovation in technology hinges on open data. There was a vigorous discussion in the Q&A over the merits and drawbacks of open business models. A member of the audience asserted that companies aren’t realistically going to give information away for fear of competitors exploiting it. John responded to this criticism well by arguing that businesses that have embraced open business models have benefited from higher volume and the formation of communities around their products (his example was GE).

Wolfram Alpha – the New Computational Knowledge Engine

Jon McLoone, Wolfram Alpha

Jon McLoone began by saying that knowledge consists of four components: opinions, facts, methods, and understanding. While traditional search is geared towards retrieving opinions and facts, Wolfram Alpha focuses on revealing method and understanding, while leaving out opinion altogether.

The Wolfram Alpha approach asserts both that the desire for information does not indicate the ability to use it, and that computed data is more valuable than data alone (for example, turn-by-turn directions are more valuable than just a longitude and latitude).

This computational approach is evident in the examples that Jon demonstrated. Every type of data has a corresponding visualisation. Planets get plotted on a sky chart, financial markets are graphed, chemical elements are shown on the periodic table of elements, and recipe ingredients are even turned into a nutritional value chart.

One comment brought up in the Q&A session was the discoverability of all these visualisations, which is a concern that I share.

Using AI to get Answers from the Internet

Simon Overell, True Knowledge

True Knowledge is a platform for answering questions. It takes natural language questions and returns numerous well-cited answers by gathering information from both structured and unstructured sources.

Panel Discussion: Leveraging Semantics to Enable Better Search Experiences

Dan Crow (Google)
Gjergji Kasneci (Microsoft Research)
Jon McLoone (Wolfram Alpha)
Simon Overell (True Knowledge)

The primary thrust of the panel discussion was a debate over curated knowledge versus the wisdom of crowds.

Google Squared doesn’t bake in any knowledge or curate any information; it relies completely on scraping the open web in search of the answers. Wolfram Alpha, on the other hand, relies completely on authoritative facts. True Knowledge takes a middle ground of presenting differing facts and opinions from around the web, and clearly citing the sources.

Tyler Tate

Tyler is a user experience designer at TwigKit. He has been designing websites and web applications for 9 years and is the creator of The 1KB CSS Grid. You can keep up with Tyler on Twitter.

  1. Thanks for the great write-up, Tyler. Glad you enjoyed it! The presentations and associated mp3 files should be going online shortly (at http://dces.essex.ac.uk/staff/udo/ecir2010/), along with the video of the panel session.

    Hope to also see you at Search Solutions 2010 (in October)!

    Cheers,
    Tony
