Improving Metropolitan-Scale Transportation Systems
with Data-Driven Cyber-Control


NSF CNS-1446640
                    



Project Overview

In the context of the White House Smart Cities Initiative, this project aims to address urban mobility challenges through data-driven cyber-control. Our research project is uniquely built upon large-scale urban infrastructure systems in cities around the world, including NYC, Washington D.C., San Francisco, Rome, Beijing, Shanghai and Shenzhen. We propose to investigate real-time interactions among heterogeneous urban systems, including taxis, buses, trucks, subways, bikes, personal vehicles and electric vehicles, along with cellphone networks and smartcard systems. Improving these systems requires studying their data, and we have been working on several types of big urban data, e.g., 10 TB of vehicle GPS data from 10,000 vehicles, 1 TB of smartcard transaction data from 16 million smartcard users, and 1 TB of cellphone CDR data from 16 million cellphone users.



Research Topics

Our research topics span three key components of Cyber-Physical Systems, i.e., communication, computation, and control. For the communication part, we design protocols for faster and more efficient data collection. For the computation part, we analyze the collected data with advanced data analytics to formulate generic models that capture urban phenomena such as travel patterns, traffic speed, human mobility, passenger demand, and transit supply. For the control part, we use these models to design advanced applications such as last-mile transit, ridesharing, dispatching and navigation, which provide beneficial feedback to urban systems through robust control. As a result, our work forms a closed-loop, data-driven cyber-physical system to improve urban transportation.



Travel Pattern Modeling

Real-time human travel pattern modeling is essential to various urban applications. To model such patterns, numerous data-driven techniques have been proposed. However, existing techniques are mostly driven by data from a single view, e.g., a transportation view or a cellphone view, which leads to over-fitting of these single-view models. To address this issue, we propose a human mobility modeling technique based on a generic multi-view learning framework called coMobile. In coMobile, we first improve the performance of single-view models using tensor decomposition with correlated contexts, and then integrate these improved single-view models for multi-view learning, iteratively obtaining mutually reinforced knowledge of real-time human mobility at urban scale. We implement coMobile on an extremely large dataset from the Chinese city of Shenzhen, including data about taxi, bus and subway passengers along with cellphone users, capturing more than 27 thousand vehicles and 10 million urban residents. The evaluation results show that our approach outperforms a single-view model by 51% on average. This work was published in ACM SIGSPATIAL 2015. [PDF][Data Source]
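To make the tensor decomposition step more concrete, the following is a minimal sketch of a rank-1 CP-style approximation of a three-way tensor, computed with alternating updates. It is not coMobile's actual model: the function names rank1_cp and reconstruct, the interpretation of the three axes (for example region, time slot and context), and the restriction to rank 1 are all assumptions made for illustration.

```python
import math


def rank1_cp(tensor, iters=50):
    """Rank-1 approximation of a 3-way tensor given as nested lists.

    Hypothetical helper for illustration only. Assumes the tensor has
    strictly positive entries so that no factor collapses to zero.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    for _ in range(iters):
        # Update each factor in turn while holding the other two fixed.
        a = normalize([sum(tensor[i][j][k] * b[j] * c[k]
                           for j in range(J) for k in range(K))
                       for i in range(I)])
        b = normalize([sum(tensor[i][j][k] * a[i] * c[k]
                           for i in range(I) for k in range(K))
                       for j in range(J)])
        c = normalize([sum(tensor[i][j][k] * a[i] * b[j]
                           for i in range(I) for j in range(J))
                       for k in range(K)])

    # Scale of the rank-1 component given the unit-norm factors.
    weight = sum(tensor[i][j][k] * a[i] * b[j] * c[k]
                 for i in range(I) for j in range(J) for k in range(K))
    return weight, a, b, c


def reconstruct(weight, a, b, c):
    """Rebuild the dense tensor from the rank-1 factors."""
    return [[[weight * x * y * z for z in c] for y in b] for x in a]
```

In a sparse setting, the reconstructed tensor would supply estimates for cells that were never observed directly, which is the general motivation for using low-rank structure; a practical model would use a higher rank and handle missing entries explicitly.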



Traffic Speed Modeling

Data-driven modeling usually suffers from data sparsity, especially when modeling large-scale urban phenomena from single-source urban infrastructure data under fine-grained spatial-temporal contexts. To address this challenge, we motivate, design and implement UrbanCPS, a cyber-physical system with heterogeneous model integration, based on extremely large multi-source infrastructures in the Chinese city of Shenzhen, involving 42 thousand vehicles, 10 million residents, and 16 million smartcards. Based on temporal, spatial and contextual information, we formulate an optimization problem of how to optimally integrate models built on highly diverse datasets, under three practical issues, i.e., heterogeneity of models, input data sparsity, and unknown ground truth. Based on an integration of five models, we propose a real-world application called Speedometer, which infers real-time traffic speeds in urban areas. The evaluation results show that, compared to a state-of-the-art real-world system, Speedometer increases the inference accuracy by 21% on average. This work has been reported in ACM ICCPS 2015. [PDF]
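As a rough illustration of what integrating several models can mean, the sketch below combines per-model speed estimates for one road segment using inverse-variance weighting, and skips models that have no output because their input data is missing. This is a stand-in technique, not the optimization formulated in UrbanCPS; the function name integrate_estimates and the per-model error values are assumptions, and in practice such errors may be hard to obtain when ground truth is unknown.

```python
def integrate_estimates(estimates, errors):
    """Combine per-model speed estimates into one value.

    estimates: list of speeds (e.g., km/h); None marks a model with
               no output for this segment (sparse input data).
    errors:    assumed standard error of each model (hypothetical).
    Each available model is weighted by 1 / error**2.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for estimate, error in zip(estimates, errors):
        if estimate is None:
            continue  # this model has no data for the segment
        weight = 1.0 / (error * error)
        weighted_sum += weight * estimate
        weight_total += weight
    if weight_total == 0.0:
        return None  # no model could produce an estimate
    return weighted_sum / weight_total
```

A model with a smaller assumed error pulls the combined estimate toward its own value, and a segment covered by only one data source still receives an estimate from that source alone.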


Human Mobility Modeling

Expanding our knowledge about human mobility is essential for building efficient wireless protocols and mobile applications. Previous human mobility studies have typically been built upon empirical single-source data (e.g., cellphone or transit data), which inevitably introduces a bias against residents who do not contribute this type of data, e.g., call detail records cannot be obtained from residents without cellphone activity, and transit data cannot cover residents who walk or use private vehicles. To address this issue, we propose and implement a novel architecture, mPat, to explore human mobility using multi-source data. A reference implementation of mPat was developed at an unprecedented scale upon the urban infrastructures of Shenzhen, China. The novelty and uniqueness of mPat lie in its three layers: (i) a data feed layer consisting of real-time data feeds from 24 thousand vehicles, 16 million smart cards and 10 million cellphones; (ii) a mobility abstraction layer exploring the correlation and divergence among the multi-source data to analyze and infer human mobility; and (iii) an application layer to improve urban efficiency based on the human mobility findings of the study. The evaluation shows that mPat achieves 75% inference accuracy, and that its real-world application reduces passenger travel time by 36%. This work has been reported in ACM MobiCom 2014. [PDF]



Last-Mile Transit Service

In this work, we propose a transit service, Feeder, to tackle the last-mile problem, i.e., passengers’ destinations lie beyond walking distance from a public transit station. Feeder uses ridesharing-based vehicles to deliver passengers from existing transit stations to selected stops closer to their destinations. We infer real-time passenger demand (e.g., exiting stations and times) for Feeder design by utilizing extreme-scale urban infrastructures, which consist of 10 million cellphones, 27 thousand vehicles, and 17 thousand smartcard readers for 16 million smartcards in the Chinese city of Shenzhen. Regarding these numerous devices as pervasive sensors, we mine both online and offline data for a two-end Feeder service: a back-end Feeder server that calculates service schedules, and front-end customized Feeder devices in vehicles that download schedules in real time. The evaluation results show that, compared to the ground truth, Feeder reduces last-mile distances by 68% and time by 52% on average. This work was published in ACM IPSN 2015. [PDF]
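The last-mile distance reduction reported above can be pictured with a small calculation. The sketch below assumes planar coordinates in kilometers, straight-line walking distances, and a fixed set of candidate stops; the function name last_mile_reduction and all inputs are hypothetical, and the sketch does not reproduce how Feeder selects stops or schedules vehicles.

```python
import math


def last_mile_reduction(station, stops, destinations):
    """Fraction by which total walking distance shrinks with feeder stops.

    station:      (x, y) of the transit station passengers exit from.
    stops:        list of (x, y) candidate feeder stops (hypothetical).
    destinations: list of (x, y) passenger destinations.
    Each passenger walks from whichever drop-off point is closest to the
    destination; the station itself remains an option.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    baseline = sum(dist(station, d) for d in destinations)
    if baseline == 0.0:
        return 0.0  # every destination is at the station already

    drop_offs = list(stops) + [station]
    with_feeder = sum(min(dist(s, d) for s in drop_offs)
                      for d in destinations)
    return 1.0 - with_feeder / baseline
```

For example, with a station at (0, 0), one stop at (3, 0), and destinations at (4, 0) and (1, 0), the total walking distance drops from 5 km to 2 km.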



Real-time Carpooling Service

Carpooling has long held the promise of reducing gas consumption by decreasing the mileage needed to deliver co-riders. Although ad hoc carpools already exist in the real world through private arrangements, little research on the topic has been done. In this paper, we present the first systematic work to design, implement, and evaluate a carpool service, called coRide, in a large-scale taxicab network, intended to reduce total mileage and thus gas consumption. Our coRide system consists of three components: a dispatching cloud server, passenger clients, and an onboard customized device called TaxiBox. To improve coRide’s efficiency in mileage reduction, we formulate an NP-hard route calculation problem under different practical constraints. We then provide (i) an optimal algorithm using Linear Programming, (ii) a 2-approximation algorithm with polynomial complexity, and (iii) its corresponding online version. To encourage coRide’s adoption, we present a win-win fare model as the incentive mechanism for passengers and drivers to participate. We evaluate coRide with a real-world dataset of more than 14,000 taxicabs, and the results show that, compared with the ground truth, our service reduces total mileage by 33%; with our win-win fare model, we lower passenger fares by 49% and simultaneously increase driver profit by 76%. This work was published in ACM SenSys 2013. [PDF]
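To show why route calculation drives the mileage saving, the sketch below compares separate trips with one shared trip for a handful of passengers picked up at the same place. It uses exhaustive search over drop-off orders, which is only practical for very small groups and is not one of the three coRide algorithms; the function names, the planar coordinates, and the straight-line distances are assumptions for illustration.

```python
import math
from itertools import permutations


def route_length(origin, dropoffs):
    """Total straight-line length of visiting dropoffs in order."""
    total, here = 0.0, origin
    for stop in dropoffs:
        total += math.hypot(stop[0] - here[0], stop[1] - here[1])
        here = stop
    return total


def best_shared_route(origin, destinations):
    """Shortest drop-off order, found by trying every permutation."""
    return min(permutations(destinations),
               key=lambda order: route_length(origin, order))


def mileage_comparison(origin, destinations):
    """Return (separate, shared) one-way mileage for the same riders."""
    separate = sum(route_length(origin, [d]) for d in destinations)
    shared = route_length(origin, best_shared_route(origin, destinations))
    return separate, shared
```

The number of permutations grows factorially with the number of riders, which is one way to see why approximation and online algorithms matter at the scale of a city-wide taxicab network.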