I am currently directing 4 PhD students in the Network Information Systems Lab (NISLab). My research interests are in the areas of networking and distributed systems, particularly in support of information systems that can scale with both network size and data size. Select findings are summarized below.



Publish/Subscribe Middleware for Sensor Networks

Earth is beginning to don an electronic skin. Every day, it is being stitched together across the planet by the deployment of more and more sensor networks  to monitor the environment. Most of today's sensor networks, however, are in a ``passive" role,  primarily to register the events that happen in the physical world. Sensor networks of tomorrow should be empowered with better built-in intelligence, making them not just a passive data collector, but also an autonomous system able to compute, control, and take action based on the information gathered. 

Toward the above ultimate goal, the project is aimed to investigate, design, and implement enabling technologies for the realization of a system functionality that will be key to the development of next-generation sensor networks: the publish/subscribe functionality.  The expected result will be a publish/subscribe infrastructure that allows for efficient coordination and communication among the nodes in the network in case of important events. For the nodes that need to take action when a certain event occurs somewhere, this infrastructure provides an efficient mechanism for them to subscribe their event interests. For the node that detects the event, the infrastructure enables this node to publish the event in the network properly so that the interested nodes can be quickly notified and thus act accordingly. Although the publish/subscribe paradigm has been studied extensively for the Internet, sensor networks present a much more challenging environment. Under sensor networks' resource constraints, the main challenge is how to design the publish/subscribe functionality so that it can be integrated in sensor networks of any size, support a wide range of applications, and allow not just one but possibly multiple services to run on the same network at any time without the need for network re-programming. The proposed research is a forward-looking and holistic effort to address this challenge. Based on a novel content-based networking approach, the research will result in a solution that has a high level of applicability and generality.  The preliminary work is encouraging. The proposed research, if successful, will make substantial advances to the  state of the art. It will enable not only a large class of applications for today's sensor networks, but also innovative ways to develop smarter sensor networks for the future. Besides theoretical contributions, the expected outcome  includes software development and a real-world implementation in Boston Harbor.  


Learning-based Techniques for Target Localization and Tracking in Sensor Networks

Recent advances in sensor technologies have made the task of target localization and tracking more efficient as a network of sensors nowadays can be spread over a large geographic area to detect and monitor events of interest. Benefited are many important applications such as those aimed at following the footsteps of intruders in a highly secure area, monitoring habitat mobility in a wild forrest, or predicting the trajectory of a progressing disaster.

Usually, the location, velocity, and orientation of a target can be inferred based on the locations of the detecting sensors. This requires that the locations of most sensors be known, either using GPS or a localization technique. While in many situations we cannot afford for every sensor a built-in GPS-like capability, going through a localization procedure for all the sensors is expensive when the network is large. Especially, if targets appear sparse, which is typical in practice, the locations of many sensors may never be used.

Thus, a necessary problem is, in a large sensor network where only a small set of anchor nodes at known locations may be deployed, how target tracking can be conducted effectively without computing the location for every sensor in advance. In the proposed research, we address this problem with an additional requirement that only hop-count or signal-strength information may be used for the tracking task. Since this information can be obtained easily at no cost, we can work with a wide range of sensor networks, including those with resource-limited sensors. We are interested in targets that can be of different types: stationary, moving, single, or multiple. The case of multiple moving targets is the most challenging. As their paths may overlap, not only need we individually localize each target over time, we should be able to distinguish its path from others'.

We approach the problem from the perspective of machine learning [ICCCN 2009, TPDS 2008]. The main insight is that the topology implicit in sets of sensor readings and anchor locations can be exploited to construct possibly non-Euclidean signal-based and/or hop-count function spaces that are useful for the prediction of unknown target locations, as well as other extrinsic quantities of interest. While most existing techniques utilize the Euclidean geometrical properties of the sensor network to imply about the target locations, we work directly on the natural non-Euclidean coordinate systems provided by the sensor devices. Specifically, we transform the tracking problem into regression and classification problems, the two popular subjects of machine learning, and design kernel-based learning methods to solve them. Our approach, therefore, is fundamentally novel.


Complex Networks: Generative Models and Search

Many real-world networks exhibit a power-law degree distribution, the property that defines scale-free networks.  In understanding this phenomenon, it is important  to have a  generative model that is able to construct scale-free networks that imitate  real-world networks. Some of the existing models are merely formulaic and do not offer an explanation that can be interpreted naturally or intuitively, while the others require certain information that is not available in many real-world circumstances. 

We have proposed the LogNormal Fitness Attachment (LNFA) model [GRIDPEER 2009, IJPEDS 2009],  a novel model for generating scale-free networks with a wide range of power-law exponents as seen in real-world networks. LNFA is based on the reasonable assumption that the effect of various attributes, which determine the ability of each node to attract other nodes, is multiplicative. This composite attribute or fitness is lognormally distributed and is used in forming the complex network. The fitness distribution is characterized by a single statistical parameter and we show that, by varying this parameter, LNFA can recover power-law networks as well as those networks that exhibit exponential and monopolistic (winner-take-all) degree distributions. Not only LNFA offers a naturally comprehensibleexplanation for the growth of real-world networks, LNFA  can construct networks with a tunable degree of dissortativity, a property that is common in real-world technological networks and biological networks. We have shown that  increasing dissortativity also results in better efficiency of searching in networks. For the same degree sequence, the network generated with LNFA enables much more efficient search than the network constructed by the configuration model in \cite{newman2001} which is widely-considered ``the" algorithm to construct a network satisfying a given arbitrary degree distribution. Inspired by the concepts in LNFA, we are working on a similar model for generating scale-free networks that have positive assortativity, instead of dissortativity. We are also investigating further the helpfulness of LNFA in supporting  different random search algorithms that assume different kinds of information known to each node.


Publish/Subscribe Architecture for P2P Networks

Search applications can be categorized into two models: request/response and publish/subscribe. The former is the traditional, where a  query is submitted on demand expecting immediate  results; if none exists, a response indicating so is returned. Contrarily, a query in the publish/subscribe model is submitted and stored in advance, for which the results may not yet exist but the query subscriber expects to be notified when they later become available. This model is thus suitable for search applications where queries await future information, as opposed to the traditional applications where the information to be searched must pre-exist. This project is focused on the publish/subscribe model and our goal is to devise a mechanism that can be integrated into a given P2P network to enable applications of this model. With this mechanism, a human monitor in a P2P-based geographical observation network will be able to subscribe a query to receive alerts of fire occurrences so that necessary rescue efforts can be dispatched quickly; or in a P2P-based scientific information sharing network, a subscriber will be notified when new scientific information is published.

For DHT structured P2P networks, we have developed a random projection based method [ICCCN 08] that can deploy any publish/subscribe service on top of a CAN-based network. Most recently, we have developed PUB-2-SUB [IFIP Networking 09]  which can be integrated into any unstructured network to allow any number of content-based publish/subscribe applications to be deployed simultaneously. Unlike the gossip-based approach previously recommended for unstructured networks, the proposed technique is based on directed routing and incurs less storage and communication costs. This is evident in an evaluation study in which PUB-2-SUB is compared to a representative technique of the other approach. It is also found that our technique results in low computation cost and low notification delay and remains highly effective in cases when many nodes in the network stop to function. PUB-2-SUB works best for P2P-based cooperative networks in which the nodes are supposed to be functional most of the time and failures should not happen too often. Thus, data grid networks and institutional collaborative networks can take full advantage of the proposed technique. 


Storage and Search in P2P/Overlay Networks

Many good data indexing and search techniques were already proposed for centralized systems (e.g., database management systems, web search engines). Yet, it remains a challenge how to achieve the same success for systems whose data and computing nodes are decentralized and frequently changing. The reason is that while existing solutions focus mainly on data complexity, for decentralized data networks we should consider both data complexity and network complexity and their effect on each other. In other words, in addition to the curse of dimensionality and insert/delete operations of the data, we should also address the dimensionality (network size, connectivity) and churn effect (insert/delete operations) of the nodes. I have proposed EZSearch [USENIX 2005, COMCOM 2008] – a self-organizing and decentralized infrastructure that decouples the data component and network component for efficient management and provides an indexing map between these two components for effective data retrieval. EZSearch is highly scalable with the growth of both the data and network. I have been awarded a $225K NSF grant to explore other potentials of EZSearch and implement a real system with additional functionalities such as security and economics. The period of this grant is 9/2006 – 9/2009. Go to the website of this project.



Multimedia Dissemination in P2P/Overlay Networks

The traditional method for multimedia dissemination is client/server based, where the content is always provided directly from the source or its mirror sites. This method is not scalable due to the limit of server bandwidth. The P2P method, on the other hand, allows clients to forward content previously obtained to other clients and, therefore, potentially has an excellent scalability. The quality of service (delay, loss, smoothness) in this network, however, is difficult to maintain due to the high frequency of node membership changes. I proposed a novel solution called Zigzag [INFOCOM 2003, JSAC 2004] that builds a highly robust and efficient P2P framework for effective multimedia dissemination. This idea, since its publication, has received more than 500 citations (source:  Google Scholar). Its original INFOCOM paper [INFOCOM 2003] was one of the most-cited documents published in 2003 (source: http://citeseer.ist.psu.edu/articles2003.html).

I was also the chief architect of a series of highly scalable overlay techniques for large-scale multimedia multicast [18, 16, 17, 14, 13, 8]. Among these techniques is Range Multicast [18, 16, 8], an innovative overlay multicast protocol for video on demand.  This work, implemented in both Windows and Linux OS’s, led to a $300K NSF grant (2001-2004) awarded to my advisor, which funded my doctoral study.




Multimedia Broadcast in Mobile Ad hoc Networks

In this project, I focused on enabling multimedia services in infrastructure-less networks. Specifically, I proposed MobiVoD [MDM 2004] - the first system design for video-on-demand applications in ad hoc networks. I also proposed a congestion-adaptive ad hoc routing protocol (CRP) [WCNC 2005, TPDS 2006], which is suitable for multimedia communications.