Thoughts Scribbled on Paper

Scribbles on Technology, Innovation, and Leadership

Posts Tagged ‘prag



Mobile devices like smartphones, iPhones, tablets, and iPads, just to name a few, have made it possible to access the web from just about anywhere at any time. As the web becomes more ubiquitous, so do these devices; people carry them around and interact with them often. These devices have become more mobile, interactive, and connected than ever before. iPhones and iPads exemplify the further shift towards simplicity, mobility, and connectivity of computing platforms.

One of the salient attributes of these devices that makes them so simple to interact with is touch capability. This capability is, in fact, essential for the ease of use and, therefore, the utility of these devices while mobile. Touch is not only a compelling input modality for these mobile devices; it also drives new user experiences and engagement through interactivity, responsiveness, and simplicity.

Mobile and connected devices with multi-touch capabilities have become ideal platforms for gaming scenarios and increased interactivity. It has become common for users to play gaming apps on these devices in their spare time, even when they are on the go and have only a couple of minutes to spare. I often see people playing Fruit Ninja and Tetris in elevators and lobbies or during lunch breaks; Traffic Rush or Angry Birds while waiting in checkout lines; and Solitaire, Peggle, and Sudoku on bus and plane rides. Popular iPhone and Android gaming apps have high, if at times short-lived, engagement and heightened interactivity; they are fun and entertaining. Tetris has over 100 million paid downloads, while Angry Birds has over 7 million and Fruit Ninja has over 2 million paid iPhone downloads. It is an easy choice to pay a few bucks for a game that can be taken and played anywhere.

Touch interactivity, and the added bonus of portability on a device that is already used for making and receiving calls, will make these games more popular on mobile phones than on handheld gaming devices. Can you slip your PlayStation Portable or Nintendo DS into your pants or coat pocket? If you really wish to carry your Sony or Nintendo handheld on you for ‘gaming-on-the-go’, you should perhaps consider getting yourself a Utilikilt. But wait: are there any limitations of touch on these mobile devices?

One of the problems, and therefore an opportunity, with touch interaction on mobile devices is the imprecise or approximate nature of the input modality. The so-called ‘fat finger problem’ extends beyond fingers being too wide to hit the desired contact point and users not being able to see what they are working on while touching the screen; consumer touch sensors also lack the fidelity necessary to pick up fingertip details at the contact point, which would increase the precision of input. Combining the keyboard and the screen has brought attention to this problem; however, the smallness of the display and the imprecision of touch apply to all forms of input: keyboards, icons, strokes, and movements.
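One common way to soften the fat finger problem is to snap an imprecise touch to the nearest on-screen target instead of requiring an exact hit. The following is only a minimal sketch of that idea; the `Target` class, the `resolve_touch` function, the pixel coordinates, and the 40-pixel tolerance are all assumptions made for illustration, not a description of how any particular device does it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A hypothetical on-screen target, identified by its center point."""
    label: str
    x: float
    y: float


def resolve_touch(touch_x, touch_y, targets, max_distance=40.0):
    """Return the label of the target closest to the touch point.

    Returns None when no target center lies within max_distance pixels,
    so a stray touch far from every target is ignored.
    """
    best = None
    best_distance = max_distance
    for target in targets:
        distance = math.hypot(target.x - touch_x, target.y - touch_y)
        if distance <= best_distance:
            best, best_distance = target, distance
    return best.label if best is not None else None


# Three keys in a row; the touch lands between "q" and "w", closer to "w".
keys = [Target("q", 20, 20), Target("w", 60, 20), Target("e", 100, 20)]
print(resolve_touch(52, 31, keys))
print(resolve_touch(300, 300, keys))
```

The first call resolves to the nearest key even though the finger missed its center; the second returns nothing because the touch is too far from every key. A real keyboard would likely combine distance with other signals, such as which letters are probable given what has been typed so far.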

Google recently acquired BlindType, which lets users simply start typing anywhere on the screen; their fingertip movements are recognized and converted into text. This enables users to type without having to look at the screens of their handhelds. Swype (founded by Cliff Kushler of T9 text-completion fame) uses a similar method: to enter text, the user drags a fingertip from letter to letter on the touch-screen keyboard instead of pecking out each letter. This process of registering the strokes and movements of fingertips on multi-touch devices opens up a completely new form of interaction and feedback for the user experience. Touch brings a marked change to interaction design because it actively solicits feedback.

All natural interactions require learning. If movements, in the form of either touch or gestures, can be understood using machine learning and used for manipulating inputs and results, users can express themselves interactively and progressively refine what they are looking for. This two-way learning process, whereby the interface understands the user better and the user learns to interact with the interface more efficiently, becomes a natural and yet very powerful mode of communication; any data or information can be interactively molded to fit a perceptual model that best suits the user. This interactive process will increase ‘stickiness’ because user interactions are better modeled and understood over time and, in effect, users will feel much ‘more connected’ to the device. The process, moreover, can be made fun and engaging, as in most gaming apps. Touch happens to be one of the most powerful modalities of interaction because it transcends the spatially and temporally discrete nature of the keyboard and the spatial locality of the mouse.

How can we design products and services that will engage and encourage people to maximize their productivity through experiences that are tactile, kinesthetic, and interactive?

What do you say — “touché” or “pas de touché”?


Written by Pragyan Mishra

October 2, 2010 at 11:46 pm

Posted in Design, Technology


The Social Nature of Image Interpretation


Current online image search technologies use keyword queries to fetch results from a large corpus of crawled images that are primarily indexed by textual data. The text surrounding images on webpages, tags or annotations, and captions are used to index the content of the image. This severely limits the scope of what images can be searched; images that do not have sufficient text associated with them, or that are not accurately described by the associated text, are not indexed appropriately. This limitation, in turn, affects the quality and relevance of image search results. Try searching for a ‘striped brooks brothers shirt’ on Google or Bing image search, or even worse, a ‘green squirtle’, which happens to be a specific Pokémon character. In fact, one can argue that the more specific the text description used to query for images, the worse the results.
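The limitation is easy to see in a toy version of text-based image indexing. The sketch below is an assumption-laden illustration rather than how any real search engine works: the file names, the surrounding text, and the `build_index` and `search` helpers are all made up for the example. It builds an inverted index from words to images and answers a query by intersecting the matches for each word.

```python
from collections import defaultdict


def build_index(images):
    """Map each word in an image's associated text to the images it appears with."""
    index = defaultdict(set)
    for image_id, text in images.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the images whose associated text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawl: img3.jpg shows a striped shirt but has no text around it.
images = {
    "img1.jpg": "striped shirt from brooks brothers",
    "img2.jpg": "blue shirt",
    "img3.jpg": "",
}
index = build_index(images)
print(search(index, "striped shirt"))
print(search(index, "green striped shirt"))
```

The image with no associated text can never be returned, whatever it actually depicts, and adding one more specific word to the query ("green") empties the result set entirely.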

The problem that I am describing here is more fundamental in nature: Google, Bing, and Yahoo! search for visual data such as images using a text description or a set of keywords. The underlying assumption is that the mapping from text or keywords to images is both known and accurate. This assumption, albeit a statistically significant one (text surrounding an image usually has some relationship to the content of the image), does not always hold, as observed above. Tags and annotations, primarily from user-generated or user-annotated data, can alleviate this problem to a great extent; however, this approach is not always practical and is often subjective because of the disparity in user-generated tags and annotations.

Speaking of the subjectivity of content in images, I was recently referred to a few social bookmarking services for images such as Image Spark, Visualize Us, FFFFound!, and Picture For Me. The theme for all these websites (as eloquently stated by ImageSpark) is “Discover, share, tag, and converge images that inspire you and your work.” If you browse through the collections you will find two distinct categories of popular tags: the first category is fairly objective and content-specific, while the second is subjective and abstract and depends more on the eye of the beholder. Tags such as ‘female’, ‘painting’, ‘nature’, ‘black and white’, ‘portrait’, and ‘illustration’ belong to the first category. The second category, which is as significant as the first and comparable in size, has tags like ‘beautiful’, ‘beauty’, ‘love’, ‘funny’, ‘cute’, ‘romantic’, and ‘inspiration’, all of which are very subjective. If you are anything like me, most of these tags will depend on your mood and, to a lesser degree, on the content of an image. While the first category of tags can be ‘learned’ by machine-learning algorithms, the second category has to account for the user’s ‘mood’ or the ‘beholder’s eye’ in addition to the content of the image. This brings a social or community facet to the interpretation and querying of images. We can build learning algorithms that drive how we search for images based on how and what we see in images, and this, of course, can be learned from the likes and dislikes of the ‘birds of the same feather’ as us: the networks we belong to.
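To make the ‘birds of the same feather’ idea concrete, here is a minimal sketch of one possible approach, offered as an illustration only. It weights the subjective tags other people gave an image by how much their liked images overlap with yours (Jaccard similarity). The user names, image ids, tags, and the `jaccard` and `suggest_tags` helpers are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_tags(user, image, likes, tags_by_user, top_k=1):
    """Rank the tags others gave `image`, weighted by their similarity to `user`.

    likes maps each user to the set of images they liked; tags_by_user maps
    each user to a dict of image id -> set of tags they applied to it.
    Tags that only come from users with no overlap are dropped.
    """
    scores = {}
    for other, tagged in tags_by_user.items():
        if other == user or image not in tagged:
            continue
        weight = jaccard(likes[user], likes[other])
        for tag in tagged[image]:
            scores[tag] = scores.get(tag, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, score in ranked[:top_k] if score > 0]


# Hypothetical data: ana and ben like mostly the same images; cy does not.
likes = {
    "ana": {"a", "b", "c"},
    "ben": {"a", "b", "d"},
    "cy": {"x", "y"},
}
tags_by_user = {
    "ben": {"p1": {"cute"}},
    "cy": {"p1": {"funny"}},
}
print(suggest_tags("ana", "p1", likes, tags_by_user))
```

For ana, the tag from ben carries weight because their tastes overlap, while the tag from cy carries none. The same image could therefore surface under ‘cute’ for one network of users and under ‘funny’ for another.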

How do we make accessing image content simpler and yet more precise? How can we annotate the content of an image precisely for what it will be searched for (or for what it is worth)?

To answer these questions, we must first answer this one:

What are our peeps looking at?

Written by Pragyan Mishra

September 4, 2010 at 11:33 pm