Monday, March 9, 2009

Head-mounted projector





Today, the pace of surgical innovations has increased
dramatically, as have the societal demands for safe and
effective practices. The mechanisms for training and retraining
suffer from inflexible timing, extended time
commitments, and limited content. Video instruction has
long been available to help surgeons learn new
procedures, but it is generally viewed as marginally
effective at best for a number of reasons, such as the fixed
point of view that is integral to the narration, lack of depth
perception and interactivity, and missing information [1].
In short, the experience of watching a video is not
sufficiently close to being there and seeing the procedure.
A paradigm that uses immersive Virtual Reality could
be a more effective approach to allow surgeons to witness
and explore a past surgical procedure as if they were
there. We are indeed pursuing such an immersive
paradigm together with our medical collaborators at the
UNC-Chapel Hill School of Medicine (Dr. Bruce Cairns
and Dr. Anthony Meyer), and our computer graphics
collaborators at Brown University (Andy van Dam et al.).
This paradigm demands methods to record the procedure
and to reconstruct the original time-varying events to
create an immersive 3D virtual environment of the real
scene. A more complete solution should also allow
relevant instructions and information, such as vocal
narration, 3D annotations and illustrations, to be added by
the original surgeon or other instructors.
Besides the recording and the reconstruction, providing
an effective way to display a 3D virtual environment to
the user is also a major challenge. In this paper, we
introduce a hybrid approach to address this challenge.
During a typical use of the training system, the trainee
would usually stand beside the patient paying close
attention to the surgery. She might even stand in the
position of a surgeon and observe the procedure from that
surgeon's point of view. At the same time, the trainee is also
required to be aware of the surrounding events that could
affect the surgeons’ actions. Such surrounding events
include the actions of other surgeons and technicians,
changes in monitoring and life-support devices, and
overall awareness of the patient’s dynamic condition.
Figure 1(a) shows a close-up view of a real surgical
operation in progress, and Figure 1(b) shows a snapshot
of the many events happening in the operating room.

Figure 1. Different views of a surgical operation: (a) a
close-up view of the procedure; (b) the surrounding
operating room.
Figure 2. A user of our prototype system, which is based
on our hybrid display approach combining an HMD and
a projector-based display.
The visual needs of the trainee can be divided into two
main parts. The first part requires a high-quality stereo view
of the objects and events that the trainee is paying direct
attention to, such as the main surgical procedure. High-quality,
high-resolution views are needed to discern
the great intricacy of the surgery, and stereo vision is
needed for better spatial understanding. The second part
of a trainee’s visual needs is the peripheral view of her
surroundings. This is needed by the trainee to maintain
visual awareness of the surrounding events. Our medical
collaborators, and others in the field, feel that visual
awareness of the entire patient and the surroundings is a
critical component of surgical training. In particular, with
trauma surgery there is typically a lot of relevant activity
in the operating room. It has been found that in the human
visual system, resolution in the periphery is less dense
than in the fovea [2]; therefore, the peripheral view need not
be high-resolution or high-quality.
Traditionally, head-mounted displays (also called
head-worn displays) have been used to provide high-quality
stereo visualization of 3D virtual environments.
However, most HMDs offer limited fields of view, often
only 40° to 60° horizontally and 30° to 45° vertically.
Wide-FOV HMDs have been manufactured, but they are
rare, expensive and heavy to wear. We are aware of no
HMD that can fully cover the human field of view of
approximately 200° horizontally and 135° vertically [3].
Although HMDs are good at providing high-quality stereo
views, the generally narrow FOV has rendered them less
than ideal for providing peripheral views.
The common alternatives to HMDs for immersive
visualization of 3D virtual environments are immersive
projector-based displays, such as the CAVE™ [4]. Most
immersive projector-based displays are capable of
providing very wide-field-of-view visualization, and like
the CAVE™, some of them are even capable of fully
covering the human field of view. Because of the
relatively large display surfaces and the fact that the user
may move close to them, the image quality and resolution
of such projector-based systems may be insufficient for
applications that require the display of fine details.
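To make this division of labor concrete, here is a minimal sketch, in Python, of how a hybrid renderer along these lines might route content between the two displays. The numbers and function names are illustrative assumptions on our part, not details of the actual prototype: anything within the HMD's narrow field of view takes the high-quality stereo path, and everything else falls to the wide-FOV projector periphery.

import math

# Illustrative parameter drawn from the FOV range quoted above; the real
# system's value may differ.
HMD_HALF_FOV_DEG = 25.0   # half of a typical 50-degree horizontal HMD FOV

def eccentricity_deg(gaze_dir, object_dir):
    """Angle in degrees between the gaze direction and the direction to an
    object, both given as unit-length 3D tuples."""
    dot = sum(g * o for g, o in zip(gaze_dir, object_dir))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def route_content(gaze_dir, object_dir):
    """Hypothetical routing rule: objects inside the HMD's field of view get
    the high-resolution stereo rendering path; objects outside it are drawn
    on the low-resolution, wide-FOV projector display."""
    if eccentricity_deg(gaze_dir, object_dir) <= HMD_HALF_FOV_DEG:
        return "hmd"        # high-quality stereo, narrow FOV
    return "projector"      # peripheral awareness, wide FOV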

Image recognition



Humans currently have substantial performance advantages over machines in several areas, including
object recognition, knowledge representation, reasoning, learning and natural language processing
[RN03]. Intriguingly, most of the hard problems arising in these areas can naturally be cast as
NP-hard optimization problems, with the majority reducible to pattern matching problems such as
maximum common subgraph [Smi99, EV07, Bun00, BDK+08, Sin02]. The formal intractability
of most problems associated with human intelligence is at the heart of the continued difficulties AI
researchers face in mimicking or surpassing human capabilities in these areas.
It may seem surprising that capabilities that we take for granted and perform quite easily could
be computationally intractable. However, it is important to remember that this intractability does not
preclude efficient generation of approximate solutions. In practice, exact solutions to optimization
problems arising in AI are not required. Generally there is a graceful degradation of performance
as a solution moves away from global optimality. Because of this behavior, the ideal computational
approach is to use specialized heuristic algorithms to attack these problems [Sim95]. It is interesting
to note that human brains are thought to contain structures specialized for pattern matching
(‘wetware heuristics’) that are used to support a variety of capabilities for which humans still hold a
performance advantage over machines, and that these structures have been used as inspirations for
development of successful heuristic algorithms [Sin02, Mou97, Mac91].

Figure 1: Object recognition by image matching proceeds by pairing points in two images that
correspond to the same structure in the outside world. In the algorithms considered here, both
feature similarity and geometric consistency are considered in determining to what extent two
images are similar.
In this article we focus on the quintessential pattern recognition problem of deciding whether
two images contain the same object. This is a typical example of a capability in which humans
outperform modern computing systems, and it can be formulated as an NP-hard optimization problem.
We begin to explore whether quantum adiabatic algorithms [EFS00, CFGG00, BBTA99, SMTC02]
can be employed to obtain better solutions to this problem than can be achieved with classical optimization
algorithms. The first step in this exploration is to map image recognition into the particular
input format required for running quantum adiabatic algorithms on D-Wave superconducting AQC
processors.
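As a rough illustration of what that input format involves: adiabatic optimizers of this kind take problems posed as quadratic unconstrained binary optimization (QUBO) instances, i.e. a symmetric matrix Q defining an energy E(x) = x^T Q x over binary variables x. The toy sketch below is our own illustration of that form, not D-Wave's actual interface; real hardware searches the energy landscape by adiabatic evolution rather than by the brute force used here.

import itertools
import numpy as np

def qubo_energy(Q, x):
    """Energy E(x) = x^T Q x of a binary configuration x."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

# Toy 3-variable instance: diagonal entries act as linear biases,
# off-diagonal entries as pairwise couplings.
Q = np.array([[-1.0,  0.5,  0.0],
              [ 0.5, -1.0,  0.5],
              [ 0.0,  0.5, -1.0]])

# Enumerate all 2^3 configurations to find the minimum-energy one.
best = min(itertools.product((0, 1), repeat=3), key=lambda x: qubo_energy(Q, x))
print(best, qubo_energy(Q, best))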
2 Image matching
A popular method to determine whether two images contain the same object is image matching.
Image matching in its simplest form attempts to find pairs of image features from two images
that correspond to the same physical structure. An image feature is a vector that describes the
neighborhood of a given image location. In order to find corresponding features, two factors are
typically considered: feature similarity, as for instance determined by the scalar product between
feature vectors, and geometric consistency. The latter is best defined when looking at rigid objects.
In this case the feature displacements are not random but exhibit correlations brought about by
a change in viewpoint. For instance, if the camera moves to the left we observe translations of
the feature locations in the image to the right. If the object is deformable or articulate then the
feature displacements are no longer solely determined by the camera viewpoint, but one can
still expect that neighboring features tend to move in a similar way. Thus image matching can
be cast as an optimization problem in which one attempts to minimize an objective function that
consists of two terms. The first term penalizes mismatches between features drawn from image one
and placed at corresponding locations in image two. The second term enforces spatial consistency
between neighboring matches by measuring the divergence between them. It has been shown that
this constitutes an NP-hard optimization problem [FH05].

Figure 2: Representation of images as labeled graphs. Shown are three exemplary interest points
for each image. The number of interest points detected is content dependent but is on the order of
several hundred for 640×480 images with content as shown. Each interest point is assigned a
position, scale, and orientation [Low99]. In the figure the scale is indicated by a circle and the
orientation by a pointer. This information can be used to characterize the relative pose and position
of two interest points, denoted by the vectors g⃗ next to the dotted lines.
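The two-term objective just described can be written down compactly. The sketch below is a minimal illustration under conventions we have assumed for concreteness (binary match indicators, scalar-product feature similarity as mentioned above, and a precomputed table of pairwise geometric divergences); it is not the authors' code. Note that the result is quadratic in binary variables, which is precisely the QUBO form sketched earlier.

import itertools
import numpy as np

def matching_energy(x, feat1, feat2, geo_div, lam=1.0):
    """Two-term image-matching objective. All conventions are illustrative.

    x       : dict mapping candidate matches (i, j) -> 0/1, where i indexes
              features of image one and j indexes features of image two
    feat1   : array of feature vectors for image one, one row per feature
    feat2   : array of feature vectors for image two, one row per feature
    geo_div : geo_div[(i, j), (k, l)] gives the geometric divergence
              between the matches (i, j) and (k, l)
    lam     : weight trading feature similarity against geometric consistency
    """
    # Term 1: penalize active matches whose features are dissimilar
    # (similarity measured by the scalar product, as described above).
    mismatch = sum((1.0 - float(np.dot(feat1[i], feat2[j]))) * x[(i, j)]
                   for (i, j) in x)
    # Term 2: penalize pairs of active matches whose relative geometry
    # disagrees, enforcing spatial consistency between neighboring matches.
    inconsistency = sum(geo_div[a, b] * x[a] * x[b]
                        for a, b in itertools.combinations(sorted(x), 2))
    return mismatch + lam * inconsistency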

Quantum Computers




By combining quantum computation and quantum interrogation, scientists at the University of Illinois at Urbana-Champaign have found an exotic way of determining the answer to an algorithm without ever running the algorithm.


The world's first commercial quantum computer strutted its stuff in Reno, Nevada, at the SC07 supercomputing conference. D-Wave Systems Inc. collaborated with Google to demonstrate how quantum computers can perform image-recognition tasks at speeds rivalling human capabilities. Google acquired the underlying image-recognition and search-by-image technology when it bought Neven Vision in 2006.
"Our image-matching demonstration, the core of which is too difficult for traditional computers, can automatically extract information from photos?recognising whether photos contain people, places or things?and then categorise the image elements by visual similarity," said Geordie Rose, D-Wave founder and CEO.
Google acquired Neven Vision for its expertise in recognising similarities among photos. Among the image-recognition tasks, the simplest would include determining whether a photo contains a person; the most complex would be accurate classification of images by person, place and thing. Even after tuning the algorithms so that they sidestepped the most difficult image-recognition problems, however, they remained too slow for practical deployment in the Google application.
"We have been collaborating with Hartmut Neven, founder of Neven Vision, since Google acquired it," said Rose. "Neven's original algorithms had to make many compromises on how they did things, since ordinary computers can't do things the way the brain does. But we believe that our quantum computer algorithms are not all that different from the way the brain solves image-matching problems, so we were able to simplify Neven's algorithms and get superior results."