TEDxBoston: The Future Of Search vs. Seeing The Future With Search

by Pete Bell

TEDx conferences, the local offshoots of TED, are more experimental in format than the classic TED talk. An innovation of TEDxBoston is the “Adventure” — an immersive trip that puts the big ideas of TED into the context of a physical location. This year, there were nearly two dozen, including a tour of Dean Kamen’s Willy Wonka-like DEKA factory, a helicopter flight over the city, a re-creation of Paul Revere’s ride on bicycles with Olympian Nicole Freedman, and a visit to the Suffolk County lock-up with its forward-thinking sheriff.

Endeca hosted an audience of 75 for one of the Adventures, The Future of Search panel, at our headquarters. And if you wanted any more evidence that search is still red hot, it turned out to be one of the most requested TEDxBoston Adventures.

Boston is a perfect city to assemble a search panel, and the mix for this one included:

Paul Sonderegger moderated a wide-ranging tour not just of search, but of searching: the human activity of seeking out information and applying it to decisions. His questions covered three broad areas:

The night ended with each panelist predicting which aspects of search will matter most in the next three to five years. So what is the future of search? Social graphs, location-based search, the semantic web, the convergence of search and analytic technologies, and richer content aggregation.

But earlier on, Paul had another interesting question about the future of search: With better search, could we have foreseen the financial crisis? Or to generalize Paul’s question, can we use search to see the future?

The search box is imbued with nearly magical powers these days. But stop to think about it, and you’ll notice that even with all that power, search is rarely thought of as a crystal ball. Contrast that with, say, business intelligence tools, which are widely used for forecasting and predictive modeling. Oracle even named itself after a seer. So why the difference?

Let’s re-ask Paul’s question a different way: with better BI tools, could we have foreseen the financial crisis? Asked this way, I think the answer is obvious. We didn’t need better tools; we needed better people using the tools.

That complementary pairing of person and computer is the heart of HCIR, or Cyborg BI. Readers of Search Facets likely already think of search as a conversation with the data, akin to BI. Yet judging by the audience at the panel, most people outside the biz still have a mental model of the search box as a tool for fact finding, or query/response. To paraphrase William Gibson, the future of search is already here; it’s just unevenly distributed.

Posted on July 28, 2010 at 11:50 pm
In: HCIR, Search/BI convergence

Introducing the Endeca User Interface Design Pattern Library

by Mark Burrell

Our UX team at Endeca gets a steady stream of questions about how to design effective search and discovery experiences. Just some of our FAQs:

In the course of solving problems like these in many contexts, we’ve been building up a library of UI design patterns specializing in search and discovery, and especially in faceted search and exploration. We previewed it for Jared Spool’s UIE virtual seminar “Leveraging Search & Discovery Patterns For Great Online Experiences,” and today marks the official launch of the library.

[Screenshot: Endeca UI Design Pattern Library home page]

A pattern library serves as a knowledge base for designs that are known to work well, and provides kernels you can adapt to your specific case. This follows in the footsteps of some well-known collections like Yahoo’s Design Pattern Library and Peter Morville’s Search Patterns.

But when should you look for and apply patterns? How do you identify the most relevant and applicable ones? When you find one or more relevant patterns, what should you do with them?

We believe that UI design patterns should be used as an integral part of a human-centered design process. This means that patterns should be part and parcel of the iterative process of understanding users, creating potential solutions, and evaluating and optimizing those solutions.

First, it’s essential to understand the user context.

Certain patterns may be for very specific users and scenarios, while others may work across a wide range. For example, Horizontal Faceted Navigation Multiselect is specifically designed to help knowledgeable users engage in tradeoff analysis when trying to find entities that match complex sets of criteria, such as design engineers trying to find the best components to use together to build a product. In contrast, Vertical Stack Faceted Navigation can be viewed as a multipurpose “Swiss Army Knife” that works fairly well for both the “knowledgeable seeker” (e.g., the pro photographer looking for a replacement lens) and the “uncertain explorer” (the photo novice looking for an affordable camera to take on a family vacation).

Next, patterns provide an evaluative lens: guiding principles that help us spot potential gaps or sources of confusion. For example, the Breadbox pattern can help us check whether an implementation of faceted “breadcrumbs” includes clear affordances for users to modify their search criteria. It also cautions that mixing these with “you are here” style breadcrumbs may confuse users.

When using patterns as an evaluative lens, it is essential to keep the user context in mind rather than treat it as an abstract exercise. For example, patterns are extremely useful as reference points during scenario-based walkthroughs of designs. This helps ensure that all the moving parts in your application work together to help users discover what they need.

Patterns are not simply evaluative; they are generative, helping us create new UIs and experiences. As with a spoken language, we can creatively “make infinite use of finite means”: a finite set of words and grammatical rules lets us create infinitely many meaningful sentences. Vertical Faceted Nav may present facet values as text links, icons, range sliders, or color pickers. Additional (“more”) facet values can be revealed via fly-out menus or expanded lists, with or without a scroll bar. Breadcrumbs in a breadbox may be presented as visual “building blocks,” laid out vertically or horizontally, or as simple text that reads as a “sentence.”

Patterns should inspire us to create novel solutions. We should extend patterns, and break outside the box as needed, to improve discovery experiences, take advantage of new opportunities (e.g., those afforded by technology innovation), and solve new problems. In this spirit, one limitation of the Breadbox pattern points to a possible enhancement: not only should we be able to remove filters, we should also be able to modify them directly from the breadcrumb itself (e.g., see Greg Nudelman’s article on “super powered breadcrumbs”). Likewise, one limitation of standard vertical stack faceted navigation is that it can be difficult for users to “see” the tradeoffs and relationships between various attributes as they make choices. This led to new ideas such as open horizontal faceted navigation with multiselect.
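To make the breadbox enhancement concrete, here is a minimal Python sketch of a breadbox model in which each crumb can be either removed or modified in place. The `Breadbox` class and its methods are hypothetical illustrations, not part of the Endeca pattern library or any Endeca API.

```python
from dataclasses import dataclass, field


@dataclass
class Breadbox:
    """Hypothetical breadbox: one crumb per active facet, holding its selected values."""

    crumbs: dict[str, set[str]] = field(default_factory=dict)

    def add(self, facet: str, value: str) -> None:
        # Selecting a facet value creates (or extends) that facet's crumb.
        self.crumbs.setdefault(facet, set()).add(value)

    def remove(self, facet: str) -> None:
        # Classic breadbox affordance: drop the whole filter.
        self.crumbs.pop(facet, None)

    def modify(self, facet: str, old: str, new: str) -> None:
        # "Super powered" affordance: swap one value in place, so the user
        # doesn't have to remove the crumb and navigate back to the facet.
        values = self.crumbs.get(facet)
        if values is None or old not in values:
            raise KeyError(f"{facet}={old} is not an active filter")
        values.discard(old)
        values.add(new)

    def as_sentence(self) -> str:
        # One presentation option from the text: crumbs as a readable "sentence".
        return "; ".join(
            f"{facet}: {' or '.join(sorted(values))}"
            for facet, values in self.crumbs.items()
            if values
        )
```

For example, after adding `Brand: Canon` and `Price: $0-$99`, calling `modify("Price", "$0-$99", "$100-$199")` changes the price crumb directly, and `as_sentence()` renders the breadbox as `Brand: Canon; Price: $100-$199`.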

So UI design patterns can and should be used as much more than simple point-in-time evaluation tools. They’re generative tools that we creatively apply throughout the process of designing, evaluating, and continuously improving human-centered solutions.

Posted on July 22, 2010 at 8:33 am
In: IA, UX

QlikTech’s IPO & Vigilante BI

by Pete Bell

We’re often asked how Endeca’s BI offering compares to QlikView, and more than usual today, with this morning’s “heavily oversubscribed” IPO of its parent company, QlikTech (QLIK).

The comparisons aren’t surprising. If you read their S-1 IPO filing, you’ll find passages where you could swap in “Endeca” for “QlikView.” For example:

“We have pioneered a powerful, easy-to-use business intelligence solution that enables our customers to make better and faster business decisions. Our software platform, QlikView, combines enterprise-class analytics and search functionality with the simplicity and ease-of-use found in office productivity software tools for a broad set of business users.”

Despite this, we rarely compete. One reason is that our products have material differences. But we don’t usually even get as far as comparing products, because upstream of that, we’re each approaching this expanding BI market from a different direction.

Appeasing the Angry Mob

Like QLIK, Endeca is addressing the need for BI on many more desks than classic BI can reach today. As Gartner puts it in their SWOT analysis of Qlik (which also mentions Endeca as a competitor), “According to Gartner BI user surveys, fewer than one-quarter of potential users use BI today.”

All those left-out users make a stink about it. Each time they get a BI report, it answers one question but invariably raises new ones that it can’t answer in the moment (e.g., “Demand went up. Why?”). So the consumer requests new views of the data from IT. It’s a hydra: each time a report answers a question, it engenders new ones. And that’s one root cause of the BI backlog, as Paul Sonderegger blogged in How Big Is The BI Backlog?

Angry mobs take matters into their own hands. That’s where QLIK sniffed out a market for their vigilante BI. The beauty of their product is that end users can install and use it on their own, bypassing IT entirely. In fact, their business model is predicated on vigilante BI. From their S-1:

We have a differentiated business model designed to accelerate the adoption of our product by reducing the time and cost to purchase and implement our software. Our low risk approach to product sales provides a needed alternative to costly, all-or-nothing, traditional business intelligence sales models by offering free product downloads to individuals and a 30-day money back guarantee upon purchase. We initially focus on specific business users or departments within a prospective customer’s organization and seek to solve a targeted business need. After demonstrating QlikView’s benefits to initial adopters within an organization, we work to expand sales of our product to other business units, geographies and use cases with a long-term goal of broad organizational deployment.

But that final part of their “land and expand” strategy is in question. They want to give self-service BI to individual members of the angry mob, with the aspiration that IT will eventually sanction the approach. But a product designed for end users isn’t necessarily aligned with IT objectives like, say, data governance. Gartner cites limits on centralization as a weakness in their SWOT:

“QlikView is rarely enterprisewide BI because of limitations with metadata.” Expanding on this, “QlikTech has very few examples in which a single QlikView instance is used for BI metadata for all BI applications, rather than for a number of disconnected QlikView implementations. Because of the nature of siloed deployments in business units, and because there is no enterprise metadata layer or a central metadata repository across QlikView applications, it is difficult to create an enterprisewide BI view with QlikView, in which cause and effect relationships are established.”

QlikView serves individual users well, but those silos don’t aggregate up into a centralized IT deployment.

Endeca comes at this large untapped market from a different direction: we’re IT’s choice for Agile BI. Our customers tend to be IT directors in the Fortune 500 who are on the receiving end of the angry mob. They pick Endeca because it lets them clear out a healthy amount of their BI backlog. They can publish many new views of enterprise data while also meeting classic IT goals for security, reliability, and scalability.

So that’s how Endeca compares to QlikTech. We both see a big market in the angry mob. But we want to serve them in different ways, and, reflecting that, our products, architectures, and go-to-market strategies are quite different. If the QLIK IPO is a good indicator of which way the BI market is going, justice is coming for the angry mob, and it’s coming from both the outlaws and the sheriffs.

Posted on July 16, 2010 at 12:16 pm
In: BI, Search/BI convergence

Listening to the Customers’ Story

by Pete Bell

My favorite part of the Endeca year just started with our sixth annual call for Navigator Award nominations, recognizing the most visionary Endeca deployments. What’s most fascinating to me about the awards is hearing our customers tell their stories in their own words.

We have our own narratives about each facet of the Endeca story. For example, the dev and product management organizations build toward user personas, like Melanie Merchandiser, our retail super user who is an expert in product promotions but not in IT. Dozens of personas like Melanie are composites of the hundreds of people like them we hear from out on the front lines. Then, from that product artifact, we tell more stories for sales, services, education, and user experience. The point is, the personas are grounded in reality, and so in turn our stories match our customers really well. But despite all that, the Navigator entries surprise me every time.

The kernel of our Agile BI story comes from a Navigator entry a couple of years back. RS Components started as an Endeca B2B ecommerce customer, using faceted search to help buyers find components. But then RS started using the platform internally to build out an Agile BI app. Their use case: having acquired several smaller distributors in Asia, they found they couldn’t make sense of their business. That’s because each acquired company had its own ERP system, whether Oracle or SAP, each with idiosyncratic schemas, and so to get visibility across them all, they’d need to reconcile them into one data warehouse. But by the time they completed that project, maybe a year later, the business would have changed. Instead, they put the data into Endeca, got an immediate rough cut, and used it to revise their application: lather, rinse, repeat. According to our narrative, our value to RS was primarily in letting them report on both structured and unstructured content for the first time. But to them, the value was in rapid iterations, made possible by pouring in multiple schemas, which is another angle on working across semi-structured data. Here’s how then-CIO Richard Boynett told me the story.

eBags is one of my favorites. It’s a David and Goliath story, where they beat up giants on a modest budget. Their secret: drawing on founder Peter Cobb’s background in catalog marketing, they get big results by testing endless tiny improvements to the site, which is the retail equivalent of the Agile BI story.

One ecommerce customer reported in their Navigator entry that they saw a $270 million increase in revenue in the first year after deploying Endeca, which is a lick more than even our bold marketing story would pitch. The Auto Trader guys did a Rashomon, telling their story from multiple perspectives: a business story from their search product manager and an IT story from their chief technical architect. And there are many more like this.

The race is on for 2010, and I can’t wait to hear the stories.

Posted on July 8, 2010 at 11:42 am
In: Search/BI convergence, miscellaneous :)

Hadoop + Hive + Endeca, Spotted in the Wild

by Pete Bell

In his post “MapReduce just semi-good for semi-structured data,” Adam Ferrari answered one of his FAQs about the relationship between Endeca and MapReduce, the popular big data cruncher. Here’s one example of the two complementing each other.

The question Adam answered was, if MapReduce is so powerful for processing big data, then what role does Endeca play?

By way of background, MapReduce is “a software framework for distributed processing of large data sets on compute clusters,” which is itself a sub-project of Hadoop, “open-source software for reliable, scalable, distributed computing.” They take parallel processing that was once rarefied because it required esoteric dev skills and expensive hardware and make it accessible to people with mortal IT skills and cheap hardware.

Adam answered that the details matter. What kind of data are you crunching, and how do you want to query it? For example, if you understand the structure of the data and know how you want to query it, MapReduce is perfect. On the other hand, if you have heterogeneous, semi-structured data, then we know empirically that you likely won’t know in advance how you want to query it, so instead you’ll need to explore and refine it. Endeca fits that use case.
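For readers who haven’t seen the programming model, here is a single-process Python sketch of the MapReduce pattern applied to word counting, the kind of job where you know the query in advance. It is not Hadoop’s API; the function names are hypothetical, and a real cluster would run the map, shuffle, and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Mapper: emit (key, value) pairs, here (word, 1) for each word.
    # doc_id is the input key; word counting doesn't need it.
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group every value by its key. On a cluster this step moves
    # data between machines; here it is just a dictionary.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values for one key.
    return key, sum(values)


def map_reduce(docs):
    pairs = chain.from_iterable(map_phase(i, t) for i, t in docs.items())
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


print(map_reduce({"a": "big data", "b": "Big clusters"}))
# → {'big': 2, 'data': 1, 'clusters': 1}
```

The mapper and reducer must be written with the final question already in mind, which is why the approach shines when the query is known and is less suited to open-ended exploration.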

Another complement to Hadoop is Hive, “a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, adhoc querying and analysis of large datasets data stored in Hadoop files.”

Taken together, Hadoop, Hive, and Endeca can give you an Agile BI solution for big data.

And in fact, Vinay Mohta, a product manager at Kayak, the vertical travel site, has been blogging about this very use case. Vinay is a perfect early adopter because he’s an Endeca veteran, having served as both a core software architect and a product manager. From his blog:

I’ve been using Hadoop and Hive for the last six months and have been pretty impressed with how well it works.  To state the obvious, if you can correctly formulate your query, nothing beats this approach.  It’s been very useful for doing cohort analysis and large scale lifetime value computations on a relatively high traffic site.  There are of course limits to what you want to keep in Hadoop / Hive; however, the convenience and the growing feature set are reducing that limit more and more.

Hive is not a good store as a backend for a BI product, since it offers no caching at all.  However, a workflow where you crunch data in Hadoop/Hive and then export to a MySQL table (or an Endeca instance) for use in a BI tool works very well.
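Vinay’s workflow (crunch the raw data in bulk, then export a small summary table for the BI layer) can be sketched as follows. This is an assumption-laden illustration: plain Python stands in for the Hive job, an in-memory SQLite database stands in for MySQL or an Endeca instance, and the event data is made up.

```python
import sqlite3
from collections import defaultdict

# Made-up raw events: (user_id, signup_month_cohort, revenue).
events = [
    ("u1", "2010-01", 10.0),
    ("u1", "2010-01", 5.0),
    ("u2", "2010-01", 7.0),
    ("u3", "2010-02", 3.0),
]


def crunch_cohorts(rows):
    # Stand-in for the heavy Hadoop/Hive job: total revenue and distinct
    # users per signup cohort.
    revenue = defaultdict(float)
    users = defaultdict(set)
    for user, cohort, amount in rows:
        revenue[cohort] += amount
        users[cohort].add(user)
    return [(c, len(users[c]), revenue[c]) for c in sorted(revenue)]


def export(summary, conn):
    # Export the small summary to a SQL store that a BI tool can query quickly.
    conn.execute(
        "CREATE TABLE cohorts (cohort TEXT PRIMARY KEY, users INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO cohorts VALUES (?, ?, ?)", summary)
    conn.commit()


conn = sqlite3.connect(":memory:")
export(crunch_cohorts(events), conn)

# The BI layer queries the summary, e.g. revenue per user for each cohort.
for row in conn.execute("SELECT cohort, revenue / users FROM cohorts ORDER BY cohort"):
    print(row)
```

The design point is the split itself: the expensive pass runs once in batch, and the interactive layer only ever touches the small, pre-aggregated table.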

Vinay’s not the only one. We’ve heard from quite a few customers that have Hadoop and Endeca together in their workflow. These are fun to track because once people are in an Agile workflow, they inevitably invent new use cases.

I’d love to hear how you’re using it. I’d also like to know your motivation. Is it because it’s a quick path to Agile BI, or is there a qualitative difference between these new tools and a traditional enterprise data warehouse?

Posted on June 25, 2010 at 3:09 pm
In: BI, databases

Bring Back the Dead Ends

by Pete Bell

There’s still so much room for innovation on faceted search user experiences. Here’s a great improvement that’s still rarely seen in the wild: graying out dead ends instead of removing them. “Gray ends” are just for certain cases, but in those conditions, they make a big difference. Moreover, they exemplify one of the great Edward Tufte lessons.

An excellent implementation is up right now at B&H, a professional photo, video, and audio store based in New York. (Don’t miss it if you get a chance to visit their brick and mortar store. They gave me a tour, and it was clear they had designed the store experience around the well-understood needs of their pro customers — plus, it’s fun to walk down an entire row of broadcast television cameras.)

In this example, I searched for camera lenses made by Canon and narrowed the results by picking the least expensive range in facet:Price. Now if you look at facet:Focal Length Type, voilà: gray ends. They show me that when I selected the lowest price band, some focal length types, like “Super Telephoto,” are no longer available to me.
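Under the hood, gray ends amount to computing facet counts over the current result set while keeping every value the facet has anywhere in the catalog, flagging zero-count values instead of dropping them. A minimal Python sketch, assuming simple single-valued attributes; the catalog records are invented and only borrow B&H’s facet names.

```python
from collections import Counter

# Invented catalog records; only the facet names echo the B&H example.
lenses = [
    {"Brand": "Canon", "Price": "$0-$99", "Focal Length Type": "Standard"},
    {"Brand": "Canon", "Price": "$0-$99", "Focal Length Type": "Zoom Super Wide"},
    {"Brand": "Canon", "Price": "$1000+", "Focal Length Type": "Super Telephoto"},
]


def matches(record, selections):
    # Every explicit selection must hold (single-select facets, ANDed together).
    return all(record.get(facet) == value for facet, value in selections.items())


def facet_with_gray_ends(records, selections, facet):
    """Return (value, count, is_gray) for every value the facet has in the catalog."""
    all_values = sorted({r[facet] for r in records if facet in r})
    counts = Counter(r[facet] for r in records if facet in r and matches(r, selections))
    # A dead end keeps its place in the list; it is flagged gray rather than removed.
    return [(value, counts[value], counts[value] == 0) for value in all_values]


for row in facet_with_gray_ends(lenses, {"Brand": "Canon", "Price": "$0-$99"}, "Focal Length Type"):
    print(row)
```

With the lowest price band selected, “Super Telephoto” comes back with a count of 0 and `is_gray` set, so the UI can render it grayed out instead of making it vanish.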

There are a few obvious benefits here, but also some nuances worth discussing. I made one more selection in facet:Focal Length Type, “Zoom Super Wide,” to highlight some of the nuances.

-In faceted spaces, when you make an explicit selection in one facet, you are also making implicit selections in the other facets. You already know this, but it’s not obvious to most users. Gray ends make it obvious, which helps them understand how facets work, and avoids confusing them when things disappear for reasons that aren’t obvious.

-To the searcher, linking explicit and implicit selections helps them understand tradeoffs. By showing the gray ends, people can understand the tradeoffs they’re making across facets without being forced to ping-pong between screens. In the B&H example, I can perform sophisticated price/performance optimizations with a simple interface.

-I made that additional selection, Zoom Super Wide, to show that Focal Length Type is a multi-select facet – using the check boxes, I can now expand my selection to include, say, Wide and Super Wide lenses too. (Note it’s multi-select OR, as opposed to AND, which would have narrowed my selection.) Multi-select facets are the primary use case for gray ends. After making my first selection in the facet, I have clear visual cues from the check boxes, grays, and category counts that I can make additional choices in this facet. (B&H had an earlier implementation of multi-select without check boxes and grays, and they told me that with that earlier interface, no one noticed you could multi-select.)

-In ecommerce, gray ends can be good for merchandisers because they show abundance. For example, if there’s a facet for brands and I make a selection in a different facet that implicitly turns most of the brands into dead ends, I can still show the shopper that I carry those brands. (Of course, as Barry Schwartz teaches us in The Paradox of Choice: Why More Is Less, additional choices can have the unexpected effect of making it so difficult for shoppers to decide that they’re more likely to leave empty-handed.)

-You’ll notice that in facet:Price, the dead ends have been removed instead of grayed out. I can select $100-$199 or $350-$499, but all the other choices in $0-$500 are gone. That’s because B&H treats price as a hierarchical facet, and from a UX standpoint, gray ends don’t work well with hierarchy. The same holds for multi-select combined with hierarchy, where instead of gray ends you get split ends. I’ll leave the reasoning as an exercise for the reader.

-Gray ends and multi-select can bring unwanted attention to dirty data, which everyone has in spades. In particular, things get mildly confusing for users when your records aren’t tagged with at least one value from each gray-end facet. For instance, B&H has a facet for Camera Compatibility with just two choices, Full Frame or APS-C. You’ll notice in this screen shot that both are grayed out. That suggests that the remaining lenses aren’t compatible with either, but in this case, I think they’re just not tagged. The simple fix is to programmatically assign an “Unspecified” value in each gray-end facet to records that have no tag there. The expensive fix is to clean all your data, but I’m not a utopian.
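The multi-select points above can also be sketched in Python. The sketch assumes OR within a facet and AND across facets, and computes counts for a facet while ignoring that facet’s own selections, so each checkbox shows what it would add; a value is gray when its count is zero. The hypothetical `fill_unspecified` helper applies the cheap “Unspecified” fix for untagged records. All names and data are invented, and this is one common way to compute multi-select counts, not necessarily how B&H does it.

```python
from collections import Counter

UNSPECIFIED = "Unspecified"


def fill_unspecified(records, facets):
    # Cheap fix for dirty data: give untagged records an explicit value so a
    # facet doesn't gray out every choice just because tags are missing.
    return [{**r, **{f: UNSPECIFIED for f in facets if f not in r}} for r in records]


def matches(record, selections, skip=None):
    # OR within a facet (any selected value), AND across facets.
    return all(
        record.get(f) in values
        for f, values in selections.items()
        if f != skip and values
    )


def multiselect_counts(records, selections, facet):
    """Return (value, count, is_checked, is_gray) for each value of `facet`.

    Assumes every record has a value for `facet` (e.g. after fill_unspecified).
    The facet's own selections are skipped, so counts show what each extra
    checkbox would add under OR semantics.
    """
    all_values = sorted({r[facet] for r in records})
    counts = Counter(r[facet] for r in records if matches(r, selections, skip=facet))
    chosen = selections.get(facet, set())
    return [(v, counts[v], v in chosen, counts[v] == 0) for v in all_values]


lenses = fill_unspecified(
    [
        {"Price": "$0-$99", "Focal Length Type": "Zoom Super Wide", "Camera Compatibility": "APS-C"},
        {"Price": "$0-$99", "Focal Length Type": "Wide"},
        {"Price": "$1000+", "Focal Length Type": "Super Telephoto", "Camera Compatibility": "Full Frame"},
    ],
    ["Camera Compatibility"],
)
selections = {"Price": {"$0-$99"}, "Focal Length Type": {"Zoom Super Wide", "Wide"}}
for row in multiselect_counts(lenses, selections, "Camera Compatibility"):
    print(row)
```

Without the fill step, the untagged “Wide” lens would match no Camera Compatibility value at all; with it, “Unspecified” shows a count of 1 and only “Full Frame” is grayed out.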

Gray ends are an exemplar of Edward Tufte’s advice to “always show comparisons adjacent in space rather than over time.” That is, if you want people to understand the difference between a “before” and “after” screen, when you redraw the screen, you’re asking them to rely on their memory to make the comparison. It’s always a risk to rely on memory, but it’s an even bigger risk here because we know people on the “before” screen were just focusing on the facet in which they made their explicit selection, rendering the others cognitively invisible. With gray ends, nothing has disappeared, so they get to compare the two states adjacent in space, no memory required. Tufte’s advice here is a classic for faceted UX work in general.

If you’re a faceted UX historian, the first website I know of with an implementation of gray ends was a mutual fund evaluator that Fidelity built in the UK around 2001. Gray ends are still pretty rare, but they shouldn’t be, so I expect we’ll start seeing more of this goodness for multi-select facets.

Posted on June 18, 2010 at 1:30 pm
In: BI, IA

Faceted Search, Without Electricity

by Pete Bell

Yesterday, Paul Sonderegger blogged great examples of reporting and filing systems from before the days of computers, including DuPont’s “chart room” and the 19th century invention of the vertical file. Beyond their appeal to fans of oak cabinetry, those early systems remind us that the design of a system can be independent of its implementation, and that disentangling the two can be a great way to improve each.

Since good ideas are rarely new, it turns out that faceted browse systems also predate computers. In fact, when I was evangelizing the wonders of facets in the early days of Endeca, I frequently used this image of a faceted browse system based on edge-notched cards to show that the information design of the system predates its implementation. So how does that ice pick get you faceted browse?
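The ice pick answer is worth spelling out, since it’s the whole trick. On an edge-notched card, each hole position along the edge stands for an attribute, and cutting a hole open into a notch marks the card as having that attribute. Push a needle through one position across the whole deck and lift: the unnotched cards hang on the needle, while the notched ones drop out, and those are your matches. Repeat on the dropped cards with another position and you have faceted narrowing. Here is a toy Python simulation; the hole assignments and cards are hypothetical, and a bitmask stands in for the notches.

```python
# Hypothetical coding scheme: which hole position along the card edge
# stands for which attribute.
HOLES = {"Wide": 0, "Zoom": 1, "Canon": 2}


def card(name, *attributes):
    # Notch (cut open to the edge) the hole for each attribute the card has.
    notches = 0
    for attribute in attributes:
        notches |= 1 << HOLES[attribute]
    return name, notches


def needle(deck, attribute):
    """Push the needle through one hole position and lift.

    Cards whose hole is intact hang on the needle and come up with it;
    notched cards slip off and fall. The fallen cards are the matches.
    """
    bit = 1 << HOLES[attribute]
    return [c for c in deck if c[1] & bit]


deck = [card("A", "Wide", "Canon"), card("B", "Zoom", "Canon"), card("C", "Wide")]

# Each pass over the fallen cards ANDs in another attribute.
canon = needle(deck, "Canon")
canon_wide = needle(canon, "Wide")
print([name for name, _ in canon_wide])  # → ['A']
```

Counting the cards that fall at each position even gives you rough category counts, all without electricity.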

Two years back, the New York Times ran a story on Paul Otlet’s Mundaneum, a hypertext system first envisioned in 1934.

At the time, the great Kevin Kelly, a founder of Wired Magazine, the WELL, and the Whole Earth Catalog, blogged Otlet as a “steampunk hypertext.” I wrote to him that there was in fact a better “steampunk search engine,” and sent him examples of edge-notched faceted browse systems. It happened to jog his own memory about using edge-notched cards at the Whole Earth Catalog, which led to Kelly blogging a neat history of the cards and faceted navigation:

One of my suppositions is that technologies rarely go extinct — on the global level. Usually someone, somewhere will continue to employ the most ancient technology. There are probably more people making swords by hand now than in the past.…It is hard to find an old technology that is not available in any form any where on earth. But today I may have found one….

Follow this link for Kelly’s full history, “One Dead Media,” at his blog, The Technium.

Posted on June 11, 2010 at 3:51 pm
In: IA, miscellaneous :)

MBAs As Data Designers

by Paul Sonderegger

The Haas business school at the University of California, Berkeley is reinventing itself for an information-rich world. Mixed in with the usual themes of leadership, culture, and innovation is a new one: experimenting with information. And this is changing the way some of the classics are taught. According to The Economist, “the focus of the statistics course will now be to get students to think about what data they would like to have to make a decision, and how they would get that data.” Dean Rich Lyons says the purpose is to “turn them from consumers of data into experiment designers, producers of data.”

This should come as no surprise. Business and information technology are joined at the hip. But there’s a new spirit here. To highlight the contrast, let’s go back to one of the most formative ages in American business – the late 1800s.

By the turn of the 20th century, American industry was operating at a larger scale than ever before. Previously, most manufacturing firms were run on the shop model: an owner, a small administrative staff, a foreman, a few skilled artisans, and unskilled labor. Toward the end of the 19th century, a new kind of business came on the scene, the corporation, in industries like manufacturing, insurance, and foodstuffs. And its large-scale operations demanded a transformation of professional management.

The big idea of this transformation was systems. To produce a good or service of a given quality at scale meant rigorously systematizing the activities of the firm to root out waste and error. Frederick Taylor, with his now infamous time-motion studies, may be the best-known example. But what he called “scientific management” of the shop floor was part of a larger movement of systematic management that affected every aspect of a large-scale company, including how it managed its information.

Consider filing cabinets. Systematic management placed heavy emphasis on written communication, rather than spoken agreements or instructions. The main kinds of written documents in use at the time, other than double-entry ledgers, were correspondence, both outbound and inbound, and internal papers, like legal documents or management memos. These different kinds of documents had different purposes, uses, and physical forms. Copies of outbound correspondence were kept in large bound pressbooks ordered chronologically for later reference. Inbound correspondence arrived as loose papers. Of course, the two were related to each other by time, correspondent, subject, or some combination. Legal documents, often written on long sheets, verified contractual relationships. Internal memos circulated to managers and served as a record of directives and policies. These varied documents were kept in cabinets that held pressbooks lying on their sides, sheaves of papers standing up, legal documents folded and laid in drawers, and pigeonholes for whatever might fit.

In 1893, the Library Bureau unveiled the vertical file at the Chicago World’s Fair. It promised a new world of information management. Based on library card files for providing easy access to books, the vertical file stored documents on edge standing up in folders with tabs, arranged in drawers. It won a gold medal. The vertical file steadily gained share against traditional filing cabinets. By 1912, a presidential commission on economy and efficiency noted that “vertical filing has practically supplanted all other systems” in large commercial organizations.

The vertical file was the epitome of a systematic information technology. In order for it to be useful as a storage and retrieval mechanism, the folders had to be organized by an overarching scheme — by recipient or subject, for example — usually arranged alphabetically. Then, everyone who put documents into the system had to file them according to the same rules, including exception handling. Deviation from this system would frustrate later retrieval.

The vertical file, in turn, was part of a larger system. The typewriter, standardized forms, and mimeography for making inexpensive copies also debuted in the last quarter of the 19th century. Together, they helped to create systems for recording, storing and retrieving information on the scale corporations required, at an acceptable cost.

The ability to record, store, and retrieve information meant there was more to analyze and communicate. This meant reports. At some companies, like certain divisions of DuPont, managers produced monthly, weekly, even daily reports. These were standardized to make their production more efficient. More efficient production of reports led to an overwhelming amount of information for executives to interpret and understand. By the 1920s, this was such a problem at DuPont that the firm created “the chart room”, a specially designed room for the executive committee to review data charts. According to JoAnne Yates, author of Control Through Communication, in the chart room “[a] system of tracks and switches allowed any one of the 350 charts to be moved to the center position. There, the committee, seated in a semicircle, could view and discuss trends for a given division or for two or more divisions.”

As incredible as this sounds, it also sounds very familiar. Standardized recording, storage, retrieval and reporting are still with us today. In fact, they’re more critical than ever before. And large companies have been running effectively with this approach to information management for over a hundred years now. So, what is Haas talking about?

The tacit assumption of industrial information technology was that standardized production would support standardized consumption. This no longer holds. This is not to say that companies don’t need standard metrics and reports. They do. But these are not enough. A rising generation of managers, raised on consumerized information technology on the internet, accept standardized production of information, but demand individualized consumption. Standardized reports are just a jumping off point for asking follow-on questions in the moment, leading to new perspectives on the data that no one knew would be needed. This individualized consumption then leads to individualized production of new ideas and discoveries.

And this is the theme Haas is grabbing onto as it plans to turn its graduates “from consumers of data into experiment designers, producers of data.” In a world where the chart room can deliver only 350 charts, and each one is made by hand, there have to be rigorous systems to make sure they’re the right 350. And once the system is set up, that’s what it produces. But in a world where boundless information is rapidly available, there’s no reason to be bound by someone else’s idea of the information you need to solve the problem at hand. Instead, imagine the data you need to create answers no one else can.

Paul Sonderegger

Posted on June 10, 2010 at 2:06 pm
In: miscellaneous :)

It Listens More Than It Speaks

by Pete Bell

“If men do not pour new wine into old bottles, they do something almost as bad — they invest old words with new meanings.” That’s Herb Simon’s warning at the beginning of his landmark speech, “Designing Organizations for an Information-Rich World.”

Last time out, I willfully ignored him, making the claim that Simon, father of the attention economy, would have been a Human-Computer Information Retrieval (HCIR) man. That’s because in that speech, he makes the case for designing organizations around intelligence systems, not vice versa.

When you travel in time machines, you risk stepping on the butterfly that will evolve into your grandmother, or something like that. Paul Sonderegger pointed out to me that although Simon may have shared a religion with the 2010 HCIR community as far back as 1969, he was also a believer in strong artificial intelligence (AI).

If you’re new to our discussion on the role of AI in HCIR, Vladimir Zelevinsky put it in a nutshell: strong AI is the utopian version where computers are as smart as people. That’s in contrast to weak AI — the HCIR-friendly version — which “only needs to be better than humans in its own narrow range of applicability.” For example, AI is good at extracting subject-verb-object relationships from documents, but not good at telling you what they mean. But complement human intelligence with computer intelligence and you get a good HCIR solution, along the lines of Paul’s “Cyborg BI.”

With that in mind, let’s go back to old words with new meanings. I’m particularly interested in the word “summarize” in the speech, because summarization is something we at Endeca claim to be good at.

Simon says (no pun intended) that a good intelligence system should “summarize” the documents in it. That’s because

“An information processing subsystem (a computer, a new organization unit) will reduce the net demand on attention of the rest of the organization only if it absorbs more information, previously received by others, than it produces — it listens and thinks more than it speaks.”
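Simon’s test boils down to an inequality: a subsystem saves the organization attention only when what it absorbs exceeds what its output demands. Here is a minimal Python sketch of that accounting, assuming attention can be measured in hours. That unit is our simplification, not Simon’s, and the `Subsystem` class is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Subsystem:
    """Hypothetical attention accounting for one information-processing subsystem."""
    absorbed_hours: float  # attention others no longer spend on input this subsystem now receives
    produced_hours: float  # attention others must spend on this subsystem's output

    def net_attention_change(self) -> float:
        # Negative means the rest of the organization spends less attention overall.
        return self.produced_hours - self.absorbed_hours

    def conserves_attention(self) -> bool:
        # Simon's condition: it must listen (absorb) more than it speaks (produce).
        return self.absorbed_hours > self.produced_hours
```

A unit that reads 10 hours of cables and writes a 2-hour brief conserves attention; one that turns 5 hours of input into 20 hours of printout does not.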

How can it listen more than speak?

“An information processing subsystem can perform an attention-conserving function for other systems in two ways: (1) it can receive and store information that would otherwise have to be received by those other systems, and (2) it can transform (‘filter’) information into an output that demands fewer hours of attention than the input information.”

He breaks that into four steps that are still familiar to the search world today:
1) Analyze
2) Draw inferences
3) Summarize
4) Index

He says these steps might be conducted by a human or a computer. At the time, computers were only good at indexing. Today, they’re still only good at indexing. I know that’s a contentious claim to make to a search audience, but I want to be true to the 1969 library science meanings of those terms. In fact, at Endeca, the summarization we are good at is quite different from the one Simon meant.
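Indexing, the step computers handle well, can be illustrated with a minimal inverted index in Python. This is a textbook sketch, not how Endeca or any particular engine works; the lowercase regex tokenizer and boolean-AND search are simplifying assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

Note what the index does not do: it records where words occur, but it does not analyze, infer, or summarize anything.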

Simon believed that AI might one day take on more of those roles. Indeed, today Microsoft Word has an “Auto Summarize” button. If it worked, we could call it the Simonize button (pun intended), because it would get us most of the way to a system that conserves attention. But here it is in action, summarizing my last post on Simon to 5% of its original length:

What information consumes is rather obvious: it consumes the attention of its recipients. Now, given the high costs of amassing information — costs paid not in amassing the information, but in consuming it later — you might expect Simon to advocate for not amassing information in the first place. (In 1969, there were also the massive hardware costs associated with an information system. Simon doesn’t advocate pre-filtering information though. Why?

Good from far but far from good. And summarizing to 10% length doesn’t help.
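We don’t know exactly how Word’s Auto Summarize scores sentences, so here is a common textbook heuristic instead, as a minimal Python sketch: score each sentence by the average frequency of its non-stopword words across the text, keep the top fraction, and restore the original order. The stopword list and sentence splitter are crude assumptions. Because each sentence is picked in isolation, output like this can look plausible at a glance while losing the thread, much like the dangling parenthetical above.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that", "for", "on", "but", "not"}

def split_sentences(text: str) -> list[str]:
    # Crude splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text: str, ratio: float = 0.05) -> str:
    """Extractive summary: keep the highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(w for s in sentences for w in words(s))

    def score(s: str) -> float:
        ws = words(s)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```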

Now let’s switch to 2010 definitions, and we find that computers can do some analysis, inferencing, and summarization. But to the degree that they’re successful, there’s some HCIR in play. Computers aren’t taking the place of a person — they’re making a person more efficient. And yet again, Simon seems to anticipate this. As I said, he notes that either humans or computers might fulfill these roles. But he also says that humans might fulfill these roles in a way that assists computers. And indeed, human metadata generation is the rocket fuel of HCIR.

When we at Endeca claim to do summarization, we’re referring to something quite different than the Microsoft version. There are two main differences:
- We’re summarizing a collection of documents rather than a single document
- The summary is meant to help a person interact with the collection, i.e. it feeds an HCIR interaction

Here’s an example from Newssift, R.I.P.:

At first glance, that looks like faceted search. And we’d say that faceted search is a form of summarization because it gives an overview of a set of results. But in our definition, summarization has a bigger umbrella. We’d also include other kinds of summaries, like the pie chart of the sentiment analysis on the lower left, or the expansions of the Ford Motor Company. Other summaries include text analytics widgets like tag clouds or BI visualizations like maps.
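The difference between summarizing a document and summarizing a collection is easiest to see in code. Here is a minimal Python sketch, assuming a hypothetical result set whose records carry `company`, `source`, and `sentiment` fields; the data and helper names are invented for illustration. Facet counts describe the whole result set at once, and the same counts converted to shares could drive a sentiment pie chart.

```python
from collections import Counter

# Hypothetical search results; companies, fields, and values are invented.
RESULTS = [
    {"company": "Company X", "source": "Blog", "sentiment": "positive"},
    {"company": "Company X", "source": "News", "sentiment": "negative"},
    {"company": "Company Y", "source": "News", "sentiment": "negative"},
    {"company": "Company X", "source": "News", "sentiment": "neutral"},
]

def facet_summary(results, facets):
    """Summarize a whole result set as value counts for each facet."""
    summary = {facet: Counter() for facet in facets}
    for record in results:
        for facet in facets:
            if facet in record:
                summary[facet][record[facet]] += 1
    return summary

def shares(counts):
    """Convert counts to fractions of the total, e.g. for a pie chart."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}

summary = facet_summary(RESULTS, ["company", "source", "sentiment"])
```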

Here’s one more example. This is the “Idea Navigator” that Vladimir built:

Here, we’re using weak AI to extract subject-verb-object relationships. Then, we’re summarizing those across a set of results.
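The extraction step is the weak AI part and is not shown here. The summarizing step can be sketched in a few lines of Python, assuming each result has already been reduced to a list of (subject, verb, object) triples. The data and the `summarize_triples` helper are hypothetical, and counting each pair once per document is our own design choice so that one repetitive document can’t dominate.

```python
from collections import Counter, defaultdict

Triple = tuple[str, str, str]  # (subject, verb, object), assumed already extracted

def summarize_triples(triples_per_doc: list[list[Triple]]) -> dict[str, Counter]:
    """For each subject, count how many documents assert each (verb, object) pair."""
    by_subject: dict[str, Counter] = defaultdict(Counter)
    for doc_triples in triples_per_doc:
        for subject, verb, obj in set(doc_triples):  # count a pair once per document
            by_subject[subject][(verb, obj)] += 1
    return dict(by_subject)
```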

In each case, we’re analyzing an index to summarize a collection, sometimes with some inferencing. I think Herb Simon would call it something different, but would agree that it listens more than it speaks.

To give Herb Simon the last word, here’s one more bit of wisdom, this time from Simon the economist, taken from his 1964 handwritten notes for The Impact of Management Sciences and the Computer:

However powerful, computers will not replace man: the doctrine of comparative advantage — each will do what he is relatively efficient at.

Pete Bell

Posted on June 3, 2010 at 5:43 pm
In: miscellaneous :)

The Nobel Prize For Attention Spans

by Pete Bell

During a foreign crisis in the late 1960s, a government agency found itself starved for information. They reacted by upgrading their intelligence system, replacing their slow teletype machines with the latest technology, high-throughput line printers. The result? When the next crisis hit, they were even more starved for information.

That was one jumping-off point for Herb Simon, winner of both the Nobel Prize and the Turing Award, as he formulated his now-famous rule that an abundance of information causes a scarcity of attention. For anyone with an interest in HCIR, it’s worth going back to his original 1969 speech to the Brookings Institution, “Designing Organizations for an Information-Rich World,” to restore context to the soundbites. (CMU has a great facsimile of the manuscript, with marginalia.)

A refresher on Simon’s core argument:

“When we speak of an information-rich world, we may expect, analogically, that the wealth of information means a dearth of something else — a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”
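“Allocating attention efficiently” can be read as a budgeting problem. Here is a minimal Python sketch, assuming each source has a known attention cost and a hypothetical value estimate, with neither of which Simon supplies. It uses a greedy value-per-minute rule. This is a heuristic and not guaranteed optimal, since choosing whole items under a budget is a 0/1 knapsack problem. It also echoes the line-printer story: adding more sources never raises the amount of attention available.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    name: str
    minutes: float  # attention the source demands; assumed > 0
    value: float    # hypothetical estimate of how useful it is

def allocate_attention(sources: list[Source], budget: float) -> list[str]:
    """Greedily attend to the highest value-per-minute sources that still fit in the budget."""
    chosen = []
    remaining = budget
    for src in sorted(sources, key=lambda s: s.value / s.minutes, reverse=True):
        if src.minutes <= remaining:
            chosen.append(src.name)
            remaining -= src.minutes
    return chosen
```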

What strikes me most in the original speech is Simon’s pragmatism. He wasn’t designing intelligence systems. He was designing organizations that use intelligence systems. To take an anachronistic liberty, he saw it as an HCIR problem about the relationship between people and computers. As he put it:

“How can we design organizations, business firms and government agencies, to operate effectively in such a world; how we can arrange to conserve and allocate effectively their scarce attention.”

And his answer:

“The proper aim of a management information system is not ‘to bring the manager all the information he needs,’ but to reorganize the manager’s environment of information so as to reduce the amount of time he must devote to receiving it. Stating the problem in these two different ways leads to very different system designs.”

If that’s all Simon had to teach us, it would have been enough. But Simon has more insights from this vein of thinking about the proper relationship between people and information. From 1969, he anticipates the next 40 years of bad Business Intelligence marketing:

“The dream of thinking everything out before we act; of making certain that we have all the facts, know all the consequences, is a sick Hamlet’s dream. It is the dream of someone who has no appreciation of the seamless web of causation in the world, the limits of human thinking, or — our topic tonight — the scarcity of human attention.”

Of course, Simon wasn’t anticipating bad BI marketing. In this case, he was using the then-current crisis around the pesticide DDT to illustrate his point. Through their retrospectoscopes, Senate subcommittees wanted to know why chemical companies hadn’t more thoroughly tested DDT. But Simon counters that given the constraint of scarce human attention, early detection would have been impossible.

“There is no special virtue, in an information-rich world, in prematurely early warnings. We can best afford to let the world store the information for us until the time has come for us to focus our attention and thought on it.”

Scientists had limited, noisy lab data, and were working against a social background where DDT was considered a public health miracle for its efficacy against malarial mosquitoes and crop pests. Pragmatically speaking, you couldn’t design an organization and intelligence system to anticipate the coming problem.

That isn’t to say we should sit back and wait for the next DDT crisis. Instead, Simon advocates for vigilance and rapid response:

“It is costly to learn from experience. But it is also costly to carry out research and analysis to anticipate experience. Knowledge from the laboratory is not always cheaper — and frequently is much less reliable — than knowledge from life.”

This brings to mind all the utopian KM and BI systems whose data is obsolete by the time it is cleansed and organized.

Now, given the high costs of amassing information (costs paid not in amassing the information, but in consuming it later), you might expect Simon to advocate for not amassing information in the first place. (In 1969, there were also the massive hardware costs associated with an information system. But in another prescient turn, Simon does the thought experiment of first designing his systems without accounting for hardware costs. He anticipates that the human costs of the system would be the first-order design constraint.) Simon doesn’t advocate pre-filtering information though. Why? Because in practice, it’s impossible.

“The bulk of information that flows into the system from its environment is irrelevant to action at the time it flows in. Much of it will never be relevant, but we can’t know in advance with certainty what part will and what won’t.”

So Simon leaves us with an organization in need of an intelligence system meeting these constraints:

Sounds like an HCIR man!

Over the years, we’ve been fortunate to watch visionary clients design their organizations around an intelligence system, rather than vice versa. Whirlpool and Harris are two of my favorite examples, and both are pure Herb Simon.

[An aside, if I still have your attention: I've written a few posts about authority facets. The idea is that faceted search systems give us new ways to filter information by the many facets related to the creator of the data (tenure, title, reputation, etc.) and its provenance (source systems, age, governance). Along these lines, Simon writes about the link between authority and attention. As he puts it, given a flood of information and scarce attention, "attention and legitimacy are interdependent." That is, certain people, degrees, journals, etc. help us decide which information to process. I like that lens of authority facets helping us focus scarce attention.]
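Authority facets can be sketched as filters over creator and provenance attributes. Here is a minimal Python sketch, assuming hypothetical document fields for the author’s title and tenure, the source system, and the publication date. None of these names come from an Endeca API. Filters left as `None` (or a zero minimum tenure) are simply not applied.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Doc:
    # Hypothetical fields: facets about the creator and the provenance of the data.
    title: str
    author_title: str
    author_tenure_years: int
    source_system: str
    published: date

def filter_by_authority(docs, *, min_tenure=0, author_titles=None,
                        source_systems=None, published_after=None):
    """Keep documents whose creator and provenance facets pass every given filter."""
    kept = []
    for d in docs:
        if d.author_tenure_years < min_tenure:
            continue
        if author_titles is not None and d.author_title not in author_titles:
            continue
        if source_systems is not None and d.source_system not in source_systems:
            continue
        if published_after is not None and d.published <= published_after:
            continue
        kept.append(d)
    return kept
```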

Pete Bell

Posted on May 24, 2010 at 9:54 am
In: HCIR