Streaming Pipelines for Neural Machine Translation
- Track: HPC, Big Data and Data Science devroom
- Room: UA2.118 (Henriot)
- Day: Sunday
- Start: 15:30
- End: 15:55
Machine Translation matters when news or eCommerce website content has to be delivered across different geographies and locales. Machine Translation systems often need to handle a large volume of concurrent translation requests from multiple sources, across multiple languages, in near real time.
Many Machine Translation preprocessing tasks, such as Text Normalization, Language Detection, and Sentence Segmentation, can be performed at scale in a real-time streaming pipeline built on Apache Flink.
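As a rough illustration of what such a preprocessing step can look like, the sketch below loads pre-trained OpenNLP models inside a Flink RichMapFunction and tags each incoming document with its detected language and sentence boundaries. The class name and model file paths are illustrative, not taken from the talk.

import java.io.File;

import opennlp.tools.langdetect.Language;
import opennlp.tools.langdetect.LanguageDetectorME;
import opennlp.tools.langdetect.LanguageDetectorModel;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

// Tags each incoming document with its detected language and splits it into
// sentences. The OpenNLP models are loaded once per parallel task in open(),
// not once per record. Model file paths are illustrative.
public class PreprocessFunction extends RichMapFunction<String, Tuple2<String, String[]>> {

    private transient LanguageDetectorME langDetector;
    private transient SentenceDetectorME sentenceDetector;

    @Override
    public void open(Configuration parameters) throws Exception {
        langDetector = new LanguageDetectorME(
                new LanguageDetectorModel(new File("langdetect-183.bin")));
        // A single segmenter model is used here for brevity; in practice the
        // segmenter would be chosen per detected language.
        sentenceDetector = new SentenceDetectorME(
                new SentenceModel(new File("en-sent.bin")));
    }

    @Override
    public Tuple2<String, String[]> map(String document) {
        Language best = langDetector.predictLanguage(document);
        String[] sentences = sentenceDetector.sentDetect(document);
        return Tuple2.of(best.getLang(), sentences);
    }
}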
We will look at a few such streaming pipelines that combine different NLP components with Flink's dynamic processing capabilities for real-time training and inference. We'll demonstrate and examine the end-to-end throughput and latency of a pipeline that detects the language of news articles shared via Twitter and translates them in real time. Developers will come away with a better understanding of how Neural Machine Translation works, and of how to build streaming pipelines for machine translation preprocessing tasks and Neural Machine Translation models.
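To make the shape of that end-to-end pipeline concrete, here is a minimal wiring sketch building on the preprocessing operator above. The in-memory source stands in for the Twitter feed, and TranslateFunction is a placeholder for the actual NMT inference step, not the speakers' implementation.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingNmtPipeline {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in source; the pipeline in the talk consumes news articles shared
        // on Twitter (e.g. via flink-connector-twitter), omitted here for brevity.
        DataStream<String> articles = env.fromElements(
                "Ein kurzer Beispieltext aus einem Nachrichtenartikel.",
                "A short example sentence from an English news article.");

        // Language detection and sentence segmentation (see PreprocessFunction above).
        DataStream<Tuple2<String, String[]>> preprocessed =
                articles.map(new PreprocessFunction());

        // Only non-English articles are sent on to the translation step.
        DataStream<String> translated = preprocessed
                .filter(doc -> !"eng".equals(doc.f0))
                .map(new TranslateFunction());

        translated.print();
        env.execute("Streaming NMT pipeline");
    }

    // Placeholder for the actual NMT inference call made per sentence.
    public static class TranslateFunction
            implements MapFunction<Tuple2<String, String[]>, String> {
        @Override
        public String map(Tuple2<String, String[]> doc) {
            // A real pipeline would invoke the neural translation model here.
            return String.join(" ", doc.f1);
        }
    }
}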
Speaker Bio Suneel Marthi: Suneel is a member of the Apache Software Foundation and a PMC member of Apache OpenNLP, Apache Mahout, and Apache Streams. He has spoken at Hadoop Summit, Apache Big Data, Flink Forward, Berlin Buzzwords, and Big Data Tech Warsaw. He is a Principal Engineer at Amazon Web Services.
Speaker Bio Jörn Kottmann: Jörn is a member of the Apache Software Foundation. He has contributed to Apache OpenNLP for 13 years and is the project's PMC Chair and a committer. In his day job he has used OpenNLP to process large document collections and streams, often in combination with Apache UIMA, where he is also a PMC member and committer.
Speakers
- Suneel Marthi
- Jörn Kottmann