Online / 5 & 6 February 2022


Multi-language Data Wrangling and Acquisition Conversational Agents

Using Raku in data acquisition and wrangling


In this presentation we discuss the Conversational Agent (CA) designs for two closely related problem areas:

  • Data Acquisition Workflows (DAWs)

  • Data Transformation Workflows (DTWs)

The CA perspective is taken mostly for exposition and didactic purposes. Nevertheless, we emphasise the practical applicability of the underlying designs and implementations.

Although data acquisition is operationally a prerequisite for data wrangling, we discuss data wrangling first: the corresponding DTW designs and implementations are more mature, and the related materials are more universal, being applicable to multiple programming languages.


Anton Antonov
FOSDEM 2022


Outline

Data Wrangling

In the first part of the presentation we show and compare data wrangling examples in different programming languages using different packages.

Here is a list of the programming languages and packages we consider:

  • Julia-DataFrames

  • Python-pandas

  • R

  • R-tidyverse

  • WL (Wolfram Language)

We look into common data wrangling workflows and how to design a conversational agent that translates natural language commands into data wrangling code for Julia, Python, R, SQL, and WL.

WL's external evaluator features are heavily utilized.
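The command-to-code translation idea can be sketched in Python as a small rule table that maps each natural language command to one line of pandas code. This is a minimal sketch under stated assumptions: the command phrasings, the `obj` variable name, and the functions `translate_command` and `translate` are hypothetical, and the regular-expression rule table stands in for whatever parser the actual agents use.

```python
import re

# Hypothetical rule table: each regex recognizes one natural language command
# and each template produces one line of pandas code. Keywords are matched
# case-insensitively; captured names (data frames, columns, values) are kept as typed.
RULES = [
    (re.compile(r"^use (?:data frame |the data )?(\w+)$", re.IGNORECASE), "obj = {0}"),
    (re.compile(r"^filter (?:by|with) (\w+) (?:is|==) (\w+)$", re.IGNORECASE),
     "obj = obj[obj['{0}'] == '{1}']"),
    (re.compile(r"^group by (\w+)$", re.IGNORECASE), "obj = obj.groupby('{0}')"),
    (re.compile(r"^count$", re.IGNORECASE), "obj = obj.size()"),
    (re.compile(r"^show$", re.IGNORECASE), "print(obj)"),
]


def translate_command(command: str) -> str:
    """Translate one natural language command into one line of pandas code."""
    text = command.strip()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*match.groups())
    raise ValueError(f"Unrecognized command: {command!r}")


def translate(spec: str) -> str:
    """Translate a ';'-separated sequence of commands into a pandas pipeline."""
    commands = [c for c in spec.split(";") if c.strip()]
    return "\n".join(translate_command(c) for c in commands)


if __name__ == "__main__":
    print(translate("use dfTitanic; filter by passengerSex is male; "
                    "group by passengerClass; count; show"))
```

Running it prints one line of pandas code per command. Supporting another target language (Julia, R, SQL, WL) would, in this sketch, mean adding a per-language template to each rule while keeping the same command recognizers.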

Data Acquisition Workflows

In the second part of the presentation we discuss the following facets of a data acquisition system:

  • Conversational Agent based on a Finite State Machine

  • Gathering and utilizing metadata taxonomies

  • Making dataset recommender systems and search engines

    • In/for both R and WL
  • Making (ingredient) variable queries

  • Introspection queries

  • Random data generation specifications

  • Data obfuscation specifications

  • Extensions to ML model acquisition workflows
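A finite state machine (FSM) based acquisition agent combined with a metadata search can be sketched in Python as follows. This is a minimal sketch, not the talk's design: the state names (`WaitForRequest`, `ListCandidates`, `Acquired`, `Exit`), the `CATALOG` metadata, and the keyword-overlap ranking in `search_datasets` are all hypothetical stand-ins for the metadata taxonomies and recommender discussed in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical dataset metadata catalog: dataset name -> set of metadata tags.
CATALOG = {
    "Titanic": {"passengers", "survival", "ships", "classification"},
    "Mushroom": {"fungi", "classification", "edibility"},
    "AirQuality": {"pollution", "timeseries", "sensors"},
}


def search_datasets(query: str, catalog: dict[str, set[str]]) -> list[str]:
    """Rank datasets by how many query words appear in their tags (ties by name)."""
    words = set(query.lower().split())
    scores = {name: len(words & tags) for name, tags in catalog.items()}
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: (-scores[n], n))


@dataclass
class AcquisitionAgent:
    """Hypothetical FSM: WaitForRequest -> ListCandidates -> Acquired; 'quit' -> Exit."""
    catalog: dict[str, set[str]]
    state: str = "WaitForRequest"
    candidates: list[str] = field(default_factory=list)
    acquired: str | None = None

    def step(self, user_input: str) -> str:
        text = user_input.strip()
        if text.lower() == "quit":  # Exit is reachable from every state.
            self.state = "Exit"
            return "Bye."
        if self.state == "WaitForRequest":
            self.candidates = search_datasets(text, self.catalog)
            if not self.candidates:
                return "No matching datasets; please rephrase."
            if len(self.candidates) == 1:  # Unambiguous request: acquire directly.
                self.acquired = self.candidates[0]
                self.state = "Acquired"
                return f"Acquired {self.acquired}."
            self.state = "ListCandidates"
            return "Candidates: " + ", ".join(self.candidates) + ". Which one?"
        if self.state == "ListCandidates":
            if text in self.candidates:
                self.acquired = text
                self.state = "Acquired"
                return f"Acquired {text}."
            return "Please pick one of: " + ", ".join(self.candidates) + "."
        if self.state == "Acquired":
            # A new request restarts the dialog from the search state.
            self.state = "WaitForRequest"
            return self.step(text)
        return "The session has ended."


if __name__ == "__main__":
    agent = AcquisitionAgent(CATALOG)
    for utterance in ["classification datasets", "Titanic", "air pollution sensors", "quit"]:
        print(f"> {utterance}\n{agent.step(utterance)}")
```

Keeping the dialog logic in explicit states makes each clarification step (here, disambiguating between candidate datasets) a separate transition, which is easy to extend with further states, for example for variable queries or random data generation specifications.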

Speakers

Anton Antonov
