Postgres MPP Data Warehousing joins Hadoop ecosystem
Making two elephants dance
- Track: HPC, Big Data and Data Science devroom
- Room: H.2213
- Day: Saturday
- Start: 16:00
- End: 16:25
Hadoop has been touted as a replacement for data warehouses. In practice, Hadoop has been successful at offloading ETL/ELT workloads, but it still has gaps in serving the requirements of operational analytics.
Apache Bigtop now includes Greenplum Database in its deployments of big data solutions. Greenplum Database, an open source massively parallel data warehouse based on PostgreSQL, is an excellent addition to the Hadoop ecosystem.
In this session we'll cover:
- Introduction to Greenplum
- Bigtop support for Greenplum
- External tables in Hadoop by Greenplum
- Parallel reads from and writes to Hadoop by Greenplum
- Running advanced analytics on structured and unstructured data in both Hadoop and Greenplum via Apache MADlib (incubating)
- Geospatial and machine learning in Greenplum based on HDFS data
- Storing data from a data lake in Greenplum for high-throughput analytical queries
Speakers
Roman Shaposhnik