Trajectory: A novel geospatial data model for Pivotal GPDB
With the drastic growth of trajectory data generated by location-based services and applications and collected from inexpensive GPS-enabled devices, such massive trajectory data has received significant attention in recent years and has spawned various novel applications, such as social gaming, route planning, carpooling, tour recommendation and commuting pattern analysis. We are developing a novel geospatial data model in Pivotal Greenplum Database that introduces a sequential data type recording the spatial locations of moving objects over time. In this talk, we will survey existing trajectory prototypes and introduce our design and progress inside Pivotal GPDB, the world's first open source MPP data warehouse.
The continuous proliferation of GPS-enabled mobile devices (e.g., car navigation systems, smartphones and PDAs) and online map services (e.g., Google Maps, Bing Maps and MapQuest) enables people to log their current geographic locations and share their movements on websites such as Bikely, GPS-Way-points, Share-My-Routes and Microsoft GeoLife. Meanwhile, more and more social network sites, including Twitter, Foursquare and Facebook, have begun to support sharing GPS traces. According to GSA's reports, almost half of Apple and Android apps collect user location information. As a result, more and more smartphone users appreciate or rely on location-based services (LBS) in their daily lives.
Nowadays, geospatial experts and scientists mostly organize this kind of geographic data in terms of trajectories, each composed of a sequence of time-stamped geographic points. They then study the movements of individual moving objects and the behaviors of groups of moving objects, for example, the touring habits of visitors in a museum. The availability of such massive trajectory data enables various novel applications, such as social gaming, route planning, carpooling, tour recommendation and commuting pattern analysis. Take trajectory search and recommendation as an example: it retrieves from a database the raw trajectories that best connect (or are closest to) a few selected locations (e.g., a set of user-specified geographical locations on a map). Cyclopath is a vivid example: it finds bike routes that match personalized cycling demands and lets users share their cycling knowledge with the community.
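To make the definition and the search task concrete, here is a minimal Python sketch of a trajectory as a time-ordered list of samples, plus a naive "best-connected trajectory" query. It is not the GPDB design. The names (`Sample`, `Trajectory`, `connection_score`, `best_connected`) are hypothetical, coordinates are assumed to be already projected to a plane, and the scoring function (summing exp(-d / scale) over the query locations, where d is the distance to the trajectory's nearest sample) is an assumed similarity chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Sample:
    t: float  # timestamp, e.g. seconds since epoch
    x: float  # projected x coordinate (assumption: planar, e.g. metres)
    y: float  # projected y coordinate

@dataclass
class Trajectory:
    obj_id: str
    samples: List[Sample]

    def __post_init__(self) -> None:
        # A trajectory is time-sequenced: keep a copy of the samples sorted by timestamp.
        self.samples = sorted(self.samples, key=lambda s: s.t)

def min_distance(traj: Trajectory, q: Tuple[float, float]) -> float:
    """Euclidean distance from query location q to the nearest sample of traj."""
    return min(math.hypot(s.x - q[0], s.y - q[1]) for s in traj.samples)

def connection_score(traj: Trajectory, queries: Sequence[Tuple[float, float]],
                     scale: float = 1.0) -> float:
    """Hypothetical similarity: sum of exp(-d / scale) over all query locations."""
    return sum(math.exp(-min_distance(traj, q) / scale) for q in queries)

def best_connected(trajs: Sequence[Trajectory], queries: Sequence[Tuple[float, float]],
                   k: int = 1, scale: float = 1.0) -> List[Trajectory]:
    """Brute force: score every trajectory and return the k highest-scoring ones."""
    return sorted(trajs, key=lambda tr: connection_score(tr, queries, scale), reverse=True)[:k]

a = Trajectory("a", [Sample(2, 2, 0), Sample(0, 0, 0), Sample(1, 1, 0)])
b = Trajectory("b", [Sample(0, 10, 10), Sample(1, 11, 10)])
print(best_connected([a, b], [(0, 0), (2, 0)])[0].obj_id)  # prints "a"
```

The brute-force scan only shows the semantics of the query; a database implementation would need a spatial index to prune candidate trajectories.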
The major feature that distinguishes trajectory from other geospatial data types, such as PostGIS geometry and raster, is that a trajectory is time-sequenced geospatial data. An essential idea behind trajectory is that discrete GPS samples are meaningless in isolation and difficult to retrieve; with this data model we can bridge the gap between the storage of raw geospatial samples and the requirements of higher-level geospatial analytics.
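One way a time-sequenced type turns discrete samples into something queryable is by answering "where was the object at time t?" even between samples. The sketch below assumes linear movement between consecutive samples, a common modeling choice but an assumption here, not a statement of how GPDB's trajectory type works. The function name `position_at` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# One GPS sample: (timestamp, x, y). The list is assumed sorted by timestamp.
Sample = Tuple[float, float, float]

def position_at(samples: List[Sample], t: float) -> Optional[Tuple[float, float]]:
    """Linearly interpolate the object's position at time t (hypothetical helper)."""
    if not samples or t < samples[0][0] or t > samples[-1][0]:
        return None  # t lies outside the trajectory's lifespan
    times = [s[0] for s in samples]  # rebuilt per call for simplicity; cache in real use
    i = bisect_right(times, t)       # index of the first sample strictly after t
    if i == len(samples):            # t equals the last timestamp
        return samples[-1][1], samples[-1][2]
    t0, x0, y0 = samples[i - 1]      # last sample at or before t
    t1, x1, y1 = samples[i]          # first sample after t, so t1 > t0
    f = (t - t0) / (t1 - t0)         # fraction of the segment already travelled
    return x0 + f * (x1 - x0), y0 + f * (y1 - y0)

samples = [(0, 0, 0), (10, 10, 0), (20, 10, 10)]
print(position_at(samples, 15))  # prints (10.0, 5.0)
```

Because `bisect_right` returns the first sample strictly later than t, the denominator `t1 - t0` is always positive, even when duplicate timestamps occur.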
Several trajectory prototypes have been proposed in recent years, such as Domino from UIC (USA), Secondo from FernUni (Germany), Hermes from UP (Greece), ST-Toolkit from EPFL (Switzerland) and SharkDB from UQ (Australia). Almost all of them were developed for academic research and education and are hard to integrate into general-purpose databases. To the best of my knowledge, we are the first to develop this geospatial data model in a commercial database, namely inside Pivotal Greenplum Database (GPDB), the world's first open source MPP data warehouse.
This talk will briefly introduce (1) the state of the art in trajectory research and academic prototypes worldwide, (2) the design concept of the trajectory data model, which provides high-performance parallel data loading and data processing, and (3) the latest status and progress of trajectory development inside Pivotal GPDB. In addition, I will provide examples of how data science teams can transform billions of customer records to tackle real-world problems efficiently. I will also discuss our plan to make trajectory open source in the GPDB community.
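Greenplum spreads a table's rows across segments by hashing its distribution key (the `DISTRIBUTED BY` clause). The abstract does not say which key the trajectory model uses, so the following Python sketch simply assumes distribution by moving-object id and simulates the idea: every raw GPS record for one object lands on the same segment, so each segment can assemble its trajectories independently, with no cross-segment coordination. All names (`segment_for`, `distribute`, `build_trajectories`, `load`) are hypothetical, and threads stand in for MPP segments.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float, float, float]  # one raw GPS record: (object_id, timestamp, x, y)
Point = Tuple[float, float, float]     # one trajectory point: (timestamp, x, y)

def segment_for(obj_id: str, num_segments: int) -> int:
    """Stable hash of the (assumed) distribution key, so an object always maps to one segment."""
    return zlib.crc32(obj_id.encode("utf-8")) % num_segments

def distribute(rows: Iterable[Row], num_segments: int) -> Dict[int, List[Row]]:
    """Route each raw record to a segment by hashing its object id."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        parts[segment_for(row[0], num_segments)].append(row)
    return dict(parts)

def build_trajectories(rows: List[Row]) -> Dict[str, List[Point]]:
    """Segment-local work: group this segment's records by object and order them by time."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for obj_id, t, x, y in rows:
        groups[obj_id].append((t, x, y))
    return {oid: sorted(pts, key=lambda p: p[0]) for oid, pts in groups.items()}

def load(rows: Iterable[Row], num_segments: int = 4) -> Dict[str, List[Point]]:
    """Distribute the records, then build trajectories on every segment concurrently."""
    parts = distribute(rows, num_segments)
    result: Dict[str, List[Point]] = {}
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        for local in pool.map(build_trajectories, parts.values()):
            result.update(local)  # no key collisions: each object lives on exactly one segment
    return result

rows = [("car1", 2, 1, 1), ("car2", 0, 5, 5), ("car1", 0, 0, 0), ("car1", 1, 0.5, 0.5)]
print(load(rows)["car1"])  # prints [(0, 0, 0), (1, 0.5, 0.5), (2, 1, 1)]
```

Hashing on the object id keeps each trajectory whole on one segment; the trade-off is that a few very active objects can skew the load across segments.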
Kuien Liu is a Principal Engineer at Pivotal with a background in databases and data mining, with an emphasis on spatio-temporal data. His work has mainly focused on extending analytics database frameworks with extensive geospatial analytical capabilities. Before joining Pivotal, Kuien Liu was an associate professor at the Institute of Software, Chinese Academy of Sciences. He received his Ph.D. in computer software and theory in 2010.