In George Orwell’s
1984,(39) it was the totalitarian Big Brother government who put the
surveillance cameras on every television—but in the reality of 2016, it
is consumer electronics companies who build cameras into the common
set-top box and every mobile handheld. Indeed, cameras are becoming
a commodity, and as video feature extraction moves to lower power levels
via dedicated hardware, and other micropower sensors decide when an
image frame needs to be grabbed, cameras will become even more
common as generically embedded sensors. The first commercial, fully
integrated CMOS camera chips came from VVL in Edinburgh (now part of
STMicroelectronics) back in the early 1990s.(40) At the time, pixel
density was low (e.g., the VVL “Peach” with 312 x 287 pixels), and the
main commercial application of their devices was the “BarbieCam,” a toy
video camera sold by Mattel. I was an early adopter of these digital
cameras myself, using them in 1994 for a multi-camera precision
alignment system at the Superconducting Supercollider(41) that evolved
into the hardware used to continually align the forty-meter muon system
at micron-level precision for the ATLAS detector at CERN’s Large Hadron
Collider. This technology was poised for rapid growth: now, integrated
cameras peek at us everywhere, from laptops to cellphones, with typical
resolutions in the tens of megapixels, bringing computational
photography increasingly to the masses. ASICs for basic image processing
are commonly embedded with or integrated into cameras, giving
increasing video processing capability for ever-decreasing power. The
mobile phone market has been driving this effort, but increasingly
static situated installations (e.g., video-driven motion/context/gesture
sensors in smart homes) and augmented reality will be important
consumer applications, and the requisite on-device image processing will
drop in power and become more agile. We already see this happening at
extreme levels, such as with the recently released Microsoft HoloLens,
which features six cameras, most of which are used for rapid environment
mapping, position tracking, and image registration in a lightweight,
battery-powered, head-mounted, self-contained AR unit. 3D cameras are
also becoming ubiquitous, breaking into the mass market via the original
structured-light-based Microsoft Kinect a half-decade ago.
Time-of-flight 3D cameras (pioneered in CMOS in the early 2000s by
researchers at Canesta(42)) have recently evolved to displace
structured-light approaches, and developers worldwide race to bring the power and
footprint of these devices down sufficiently to integrate into common
mobile devices (a very small version of such a device is already
embedded in the HoloLens). As pixel timing measurements become more
precise, photon-counting approaches to computational photography, as
pursued by my Media Lab colleague Ramesh Raskar, promise to usher in
revolutionary new applications that can do things like reduce the
effects of diffusion and see around corners.(43)
My research group began exploring
this penetration of ubiquitous cameras over a decade ago, especially
applications that ground the video information with simultaneous data
from wearable sensors. Our early studies were based around a platform
called the “Portals”:(44) using an embedded camera feeding a TI DaVinci
DSP/ARM hybrid processor, surrounded by a core of basic sensors (motion,
audio, temperature/humidity, IR proximity) and coupled with a Zigbee RF
transceiver, we scattered forty-five of these devices all over the
Media Lab complex, interconnected through the wired building network.
One application that we built atop them was “SPINNER,”(45) which
labelled video from each camera with data from any wearable sensors in
the vicinity. The SPINNER framework was based on the idea of being able
to query the video database with higher-level parameters, lifting sensor
data up into a social/affective space,(46) then trying to effectively
script a sequential query as a simple narrative involving human subjects
adorned with the wearables. Video clips from large databases sporting
hundreds of hours of video would then be automatically selected to best
fit given timeslots in the query, producing edited videos that observers
deemed coherent.(47) Naively pointing to the future of reality
television, this work aims further, looking to enable people to engage
sensor systems via human-relevant query and interaction.
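To make this concrete, here is a minimal sketch of that kind of narrative-driven clip selection, assuming the clips have already been annotated with affect estimates (valence and arousal) lifted from the wearable data; the data structures, scoring function, and greedy matching are illustrative assumptions rather than the published SPINNER implementation.

```python
# Illustrative sketch of SPINNER-style narrative matching (assumed names and
# scoring; not the actual SPINNER code).
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    duration: float      # seconds
    valence: float       # affect estimate lifted from wearable data, -1..1
    arousal: float       # -1..1

@dataclass
class Slot:
    duration: float      # desired length of this timeslot in the edited video
    valence: float       # affect the scripted narrative calls for here
    arousal: float

def score(clip: Clip, slot: Slot) -> float:
    """Lower is better: affect distance plus a penalty for duration mismatch."""
    affect = (clip.valence - slot.valence) ** 2 + (clip.arousal - slot.arousal) ** 2
    length = abs(clip.duration - slot.duration) / max(slot.duration, 1e-6)
    return affect + 0.5 * length

def assemble(clips: list[Clip], narrative: list[Slot]) -> list[Clip]:
    """Greedily pick the best unused clip for each slot of the scripted query."""
    remaining = list(clips)
    timeline = []
    for slot in narrative:
        best = min(remaining, key=lambda c: score(c, slot))
        timeline.append(best)
        remaining.remove(best)
    return timeline
```

A real system would search far larger databases and richer feature spaces, but the shape of the problem is the same: mapping a scripted sequence of human-relevant slots onto sensor-labelled footage.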
Rather than try to extract stories
from passive ambient activity, a related project from our team devised
an interactive camera with a goal of extracting structured stories from
people.(48) Taking the form factor of a small mobile robot, “Boxie”
featured an HD camera in one of its eyes: it would rove our building and
get stuck, then plead for help when people came nearby. It would then
ask people successive questions and request that they fulfill various
tasks (e.g., bring it to another part of the building, or show it what
they do in the area where it was found), making an indexed video that
could be easily edited to produce something of a documentary about the
people in the robot’s abode.
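Purely as an illustration, the following sketch shows how such an interview loop could log each prompt against the recording clock to build the index used for later editing; the prompts and function names here are invented and do not reflect Boxie’s actual software.

```python
# Hypothetical interview loop that builds a prompt-by-prompt index of the
# robot's recording; names and prompts are invented for illustration.
import time

PROMPTS = [
    "Hi! I'm stuck. Could you carry me somewhere more interesting?",
    "What do you work on in this part of the building?",
    "Can you show me something you made recently?",
]

def run_interview(recording_started_at: float) -> list[dict]:
    index = []
    for prompt in PROMPTS:
        t_start = time.time() - recording_started_at
        print(prompt)                                   # stand-in for the robot speaking
        input("(press Enter once the person has responded) ")
        t_end = time.time() - recording_started_at
        index.append({"prompt": prompt, "start_s": t_start, "end_s": t_end})
    return index
```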
In the next years,
as large video surfaces cost less (potentially being roll-to-roll printed)
and are better integrated with responsive networks, we will see the
common deployment of pervasive interactive displays. Information coming
to us will manifest in the most appropriate fashion (e.g., in your smart
eyeglasses or on a nearby display)—the days of pulling your phone out
of your pocket and running an app are numbered. To explore this,
we ran a project in my team called “Gestures Everywhere”(49) that
exploited the large monitors placed all over the public areas of our
building complex.(50) These displays were already equipped with RFID
readers to identify people wearing tagged badges, and we added a sensor
suite and a Kinect 3D camera to each display site. As an occupant
approached a display and was identified via RFID or video recognition,
information most relevant to
them would appear on the display. We developed a recognition framework
for the Kinect that parsed a small set of generic hand gestures (e.g.,
signifying “next,” “more detail,” “go-away,” etc.), allowing users to
interact with their own data at a basic level without touching the
screen or pulling out a mobile device. Indeed, proxemic interactions(51)
around ubiquitous smart displays will be common within the next decade.
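A minimal sketch of this kind of gesture-to-display dispatch follows, assuming the upstream RFID/vision identification and Kinect recognizer have already produced a person ID and a gesture label; the gesture names echo the generic set above, while the view structure and handlers are purely illustrative.

```python
# Sketch of dispatching recognized gestures to per-person display state.
# The ACTIONS table and view fields are assumptions for illustration.
from typing import Callable

ACTIONS: dict[str, Callable[[dict], dict]] = {
    "next":        lambda view: {**view, "page": view["page"] + 1},
    "more_detail": lambda view: {**view, "detail": min(view["detail"] + 1, 3)},
    "go_away":     lambda view: {**view, "visible": False},
}

def on_gesture(person_id: str, gesture: str, views: dict[str, dict]) -> dict:
    """Update the identified occupant's personal view on the nearby display."""
    view = views.setdefault(person_id, {"page": 0, "detail": 0, "visible": True})
    handler = ACTIONS.get(gesture)
    if handler is not None:
        views[person_id] = handler(view)
    return views[person_id]
```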
The plethora of cameras
that we sprinkled throughout our building during our SPINNER project
raised concerns about privacy (interestingly enough, the Kinects for
Gestures Everywhere did not evoke the same response—occupants either did
not see them as “cameras” or were becoming used to the idea of
ubiquitous vision). Accordingly, we put an obvious power switch on each
Portal so it could easily be switched off. This is a very
artificial solution, however—in the near future, there will just be too
many cameras and other invasive sensors in the environment to switch
off. These devices must obey verifiable and secure protocols that
dynamically and appropriately throttle streaming sensor data to meet
user privacy demands. To study solutions to such concerns, we designed a
small wireless token that controlled our Portals.(52)
It broadcast a beacon to the vicinity that dynamically deactivated the
transmission of proximate audio, video, and other derived features
according to the user’s stated privacy preferences—this device also
featured a large “panic” button that could be pushed at any time when
immediate privacy was desired, blocking audio and video from emanating
from nearby Portals.
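The sketch below illustrates the sort of beacon payload and throttling decision involved, under stated assumptions: the field names and policy structure are hypothetical, and a real deployment would also need authenticated, verifiable beacons as noted above.

```python
# Hypothetical privacy-beacon payload and the Portal-side throttling decision.
from dataclasses import dataclass, field

@dataclass
class PrivacyBeacon:
    user_id: str
    panic: bool = False                        # the large "panic" button was pressed
    blocked: set = field(default_factory=set)  # e.g. {"audio", "video"}

def allowed_streams(beacons: list[PrivacyBeacon],
                    all_streams=("audio", "video", "features")) -> set:
    """Return the streams a Portal may still transmit given nearby users' beacons."""
    allowed = set(all_streams)
    for beacon in beacons:
        if beacon.panic:
            return set()                       # panic blocks everything immediately
        allowed -= beacon.blocked
    return allowed
```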
Rather than block
the video stream entirely, we have explored simply removing the person
desiring privacy from the video image. By using information from
wearable sensors, we can more easily identify the appropriate person in
the image(53) and blend them into the background. We are also looking
at the opposite issue—using wearable sensors to detect environmental
parameters that hint at potentially hazardous conditions for
construction workers and rendering that data in different ways atop
real-time video, highlighting workers in situations of particular
concern.(54)
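As a rough illustration of the blending step, the sketch below replaces a masked person with pixels from a background model, assuming an upstream segmentation (aided by the wearable data) has already produced the mask; it is not the pipeline from the cited work.

```python
# Blend a masked person into a background model (illustrative only).
import numpy as np

def redact_person(frame: np.ndarray, background: np.ndarray,
                  person_mask: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Replace the masked pixels with the background; strength < 1 leaves a faint trace."""
    alpha = np.clip(person_mask.astype(np.float32) * strength, 0.0, 1.0)[..., None]
    blended = alpha * background.astype(np.float32) + (1.0 - alpha) * frame.astype(np.float32)
    return blended.astype(frame.dtype)
```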