I am currently directing 4 PhD students in the Network Information Systems Lab (NISLab). My research interests are in the areas of networking and
distributed
systems, particularly in support of information systems that can scale
with both network size and data size. Select findings are
summarized below.
Publish/Subscribe Middleware for Sensor Networks
Earth is beginning to don an electronic skin. Every day, it is being stitched together across the planet by the deployment of more and more sensor networks to monitor the environment. Most of today's sensor networks, however, are in a ``passive" role, primarily to register the events that happen in the physical world. Sensor networks of tomorrow should be empowered with better built-in intelligence, making them not just a passive data collector, but also an autonomous system able to compute, control, and take action based on the information gathered.
Toward the above ultimate goal, the project is aimed to investigate, design, and implement enabling technologies for the realization of a system functionality that will be key to the development of next-generation sensor networks: the publish/subscribe functionality. The expected result will be a publish/subscribe infrastructure that allows for efficient coordination and communication among the nodes in the network in case of important events. For the nodes that need to take action when a certain event occurs somewhere, this infrastructure provides an efficient mechanism for them to subscribe their event interests. For the node that detects the event, the infrastructure enables this node to publish the event in the network properly so that the interested nodes can be quickly notified and thus act accordingly. Although the publish/subscribe paradigm has been studied extensively for the Internet, sensor networks present a much more challenging environment. Under sensor networks' resource constraints, the main challenge is how to design the publish/subscribe functionality so that it can be integrated in sensor networks of any size, support a wide range of applications, and allow not just one but possibly multiple services to run on the same network at any time without the need for network re-programming. The proposed research is a forward-looking and holistic effort to address this challenge. Based on a novel content-based networking approach, the research will result in a solution that has a high level of applicability and generality. The preliminary work is encouraging. The proposed research, if successful, will make substantial advances to the state of the art. It will enable not only a large class of applications for today's sensor networks, but also innovative ways to develop smarter sensor networks for the future. Besides theoretical contributions, the expected outcome includes software development and a real-world implementation in Boston Harbor.
Learning-based Techniques for Target Localization and Tracking in Sensor Networks
Recent advances in sensor technologies have made the task of
target localization and
tracking more efficient as a network of sensors nowadays can be spread
over a large geographic area to detect and monitor events of interest.
Benefited are many important applications such as those aimed at
following the footsteps of intruders in a highly secure area,
monitoring habitat mobility in a wild forrest, or predicting the
trajectory of a progressing disaster.
Usually, the location, velocity, and orientation of a target can be
inferred based on the locations of the detecting sensors. This requires
that the locations of most sensors be known, either using GPS or a
localization technique. While in many situations we cannot afford for
every sensor a built-in GPS-like capability, going through a
localization procedure for all the sensors is expensive when the
network is large. Especially, if targets appear sparse, which is
typical in practice, the locations of many sensors may never be used.
Thus, a necessary problem is, in a large sensor network where only a
small set of anchor nodes at known locations may be deployed, how
target tracking can be conducted effectively without computing the
location for every sensor in advance. In the proposed research, we
address this problem with an additional requirement that only hop-count
or signal-strength information may be used for the tracking task. Since
this information can be obtained easily at no cost, we can work with a
wide range of sensor networks, including those with resource-limited
sensors. We are interested in targets that can be of different types:
stationary, moving, single, or multiple. The case of multiple moving
targets is the most challenging. As their paths may overlap, not only
need we individually localize each target over time, we should be able
to distinguish its path from others'.
We approach the problem from the perspective of machine learning [ICCCN 2009, TPDS 2008]. The
main insight is that the topology implicit in sets of sensor readings
and anchor locations can be exploited to construct possibly
non-Euclidean signal-based and/or hop-count function spaces that are
useful for the prediction of unknown target locations, as well as other
extrinsic quantities of interest. While most existing techniques
utilize the Euclidean geometrical properties of the sensor network to
imply about the target locations, we work directly on the natural
non-Euclidean coordinate systems provided by the sensor devices.
Specifically, we transform the tracking problem into regression and
classification problems, the two popular subjects of machine learning,
and design kernel-based learning methods to solve them. Our approach,
therefore, is fundamentally novel.
Complex Networks: Generative Models and Search
Many
real-world networks exhibit a power-law degree distribution, the
property that defines scale-free networks. In understanding
this
phenomenon, it is important to have a generative
model that is able
to construct scale-free networks that imitate real-world
networks.
Some of the existing models are merely formulaic and do not offer an
explanation that can be interpreted naturally or intuitively, while the
others require certain information that is not available in many
real-world circumstances.
We have proposed the LogNormal Fitness
Attachment (LNFA) model [GRIDPEER 2009, IJPEDS 2009], a novel model for generating scale-free
networks with a wide range of power-law exponents as seen in real-world
networks. LNFA is based on the reasonable assumption that the effect of
various attributes, which determine the ability of each node to attract
other nodes, is multiplicative. This composite attribute or fitness is
lognormally distributed and is used in forming the complex network. The
fitness distribution is
characterized by a single statistical parameter and we show that, by
varying this parameter, LNFA can recover power-law networks as well as
those networks that exhibit exponential and monopolistic
(winner-take-all) degree distributions. Not only LNFA offers a
naturally comprehensibleexplanation for the growth of real-world
networks, LNFA can construct networks with a tunable degree
of dissortativity, a
property that is common in real-world technological networks and
biological networks. We have shown that increasing
dissortativity also results in better efficiency of searching
in networks. For the same degree sequence, the network generated with
LNFA enables much more efficient search than the network constructed by
the configuration model in \cite{newman2001} which is widely-considered
``the" algorithm to construct a network satisfying a given arbitrary
degree distribution. Inspired by the concepts in LNFA, we are working
on a similar model for generating scale-free networks that have
positive
assortativity, instead of dissortativity. We are also investigating
further the helpfulness of LNFA in supporting different
random search
algorithms that assume different kinds of information known to each
node.
Publish/Subscribe Architecture for P2P Networks
Search applications can be categorized into two models:
request/response and publish/subscribe. The former is the traditional,
where a query is submitted on demand expecting immediate
results; if none exists, a response indicating so is returned.
Contrarily, a query in the publish/subscribe model is submitted and
stored in advance, for which the results may not yet exist but the
query subscriber expects to be notified when they later become
available. This model is thus suitable for search applications where
queries await future information, as opposed to the traditional
applications where the information to be searched must pre-exist. This
project is focused on the publish/subscribe model and our goal is
to devise a mechanism that can be integrated into a given P2P network
to enable applications of this model. With this mechanism, a human
monitor in a P2P-based geographical observation network will be able to
subscribe a query to receive alerts of fire occurrences so that
necessary rescue efforts can be dispatched quickly; or in a P2P-based
scientific information sharing network, a subscriber will be notified
when new scientific information is published.
For DHT structured P2P networks, we have developed a random projection based method [ICCCN 08] that can deploy any publish/subscribe service on top of a CAN-based network. Most recently, we have developed PUB-2-SUB [IFIP Networking 09] which can be integrated into any unstructured network to allow any number of content-based publish/subscribe applications to be deployed simultaneously. Unlike the gossip-based approach previously recommended for unstructured networks, the proposed technique is based on directed routing and incurs less storage and communication costs. This is evident in an evaluation study in which PUB-2-SUB is compared to a representative technique of the other approach. It is also found that our technique results in low computation cost and low notification delay and remains highly effective in cases when many nodes in the network stop to function. PUB-2-SUB works best for P2P-based cooperative networks in which the nodes are supposed to be functional most of the time and failures should not happen too often. Thus, data grid networks and institutional collaborative networks can take full advantage of the proposed technique.
Storage and Search in P2P/Overlay Networks
Many good data indexing and search techniques were already
proposed for
centralized systems (e.g., database management systems, web search
engines). Yet, it remains a challenge how to achieve the same success
for systems whose data and computing nodes are decentralized and
frequently changing. The reason is that while existing
solutions focus mainly on data complexity, for decentralized data
networks we should consider both data complexity and network complexity
and their effect on each other. In other words, in addition to the
curse of dimensionality and insert/delete operations of the data, we
should also address the dimensionality (network size, connectivity) and
churn effect (insert/delete operations) of the nodes. I have proposed
EZSearch [USENIX 2005, COMCOM 2008] – a self-organizing and decentralized
infrastructure
that decouples the data component and network component for efficient
management and provides an indexing map between these two components
for effective data retrieval. EZSearch is highly scalable with the
growth of both the data and network. I have been awarded a
$225K NSF grant to explore other potentials of EZSearch and implement a
real system with additional functionalities such as security and
economics. The period of this grant is 9/2006 – 9/2009. Go
to the website of this project.
Multimedia Dissemination in P2P/Overlay Networks
The
traditional method for multimedia dissemination is client/server based,
where the content is always provided directly from the source or its
mirror sites. This method is not scalable due to the limit of server
bandwidth. The P2P method, on the other hand, allows clients to forward
content previously obtained to other clients and, therefore,
potentially has an excellent scalability. The quality of service
(delay, loss, smoothness) in this network, however, is difficult to
maintain due to the high frequency of node membership changes. I
proposed a novel solution called Zigzag [INFOCOM 2003, JSAC 2004] that builds a
highly robust and efficient P2P framework for effective multimedia
dissemination. This idea, since its publication, has received more than
500 citations (source: Google Scholar). Its original INFOCOM
paper
[INFOCOM 2003] was
one of the most-cited documents published in 2003 (source:
http://citeseer.ist.psu.edu/articles2003.html).
I was also the chief architect of a series of highly scalable overlay techniques for large-scale multimedia multicast [18, 16, 17, 14, 13, 8]. Among these techniques is Range Multicast [18, 16, 8], an innovative overlay multicast protocol for video on demand. This work, implemented in both Windows and Linux OS’s, led to a $300K NSF grant (2001-2004) awarded to my advisor, which funded my doctoral study.
Multimedia Broadcast in Mobile Ad hoc Networks
In this project, I focused on enabling
multimedia services in
infrastructure-less networks. Specifically, I proposed MobiVoD [MDM 2004] -
the first system design for video-on-demand applications in ad hoc
networks. I also proposed a congestion-adaptive ad hoc routing protocol
(CRP) [WCNC 2005, TPDS 2006], which is suitable for multimedia communications.
