Autonomous Data Analysis

We aim to lay the groundwork for employing deep reinforcement learning (DRL) techniques in Exploratory Data Analysis (EDA), an important yet challenging task that is critical in many application domains. We aim to develop an end-to-end framework architecture for autonomous EDA ...

Project status: Under Development

Artificial Intelligence

Intel Technologies
Intel Opt ML/DL Framework

Overview / Usage

Employing DRL techniques in the context of EDA is both a theoretical and a practical challenge. First, one has to devise a machine-compatible representation of the EDA environment, which often comprises a large dataset with values of different types and semantics, a vast domain of possible analysis actions, and multifaceted result "screens" that may contain features such as grouping and aggregations. Second, to the best of our knowledge, there is no clear, explicit reward definition for EDA actions and sessions. Engineering a reward signal that ensures the agent's actions are interesting, diverse, and coherent to humans is a challenge in itself. Third, determining an adequate network architecture, its input and output values, and its hyper-parameters is known to be difficult and is often tailored to the application domain. Determining these for the EDA setting is another challenge.

Methodology / Approach

  1. We design a generic yet extensible DRL environment for EDA that allows the agent to perform analysis operations (actions) and examine their results (states). At each state the agent observes a concise numeric summary of the current results display. To interact with the dataset, we devise a parameterized EDA action space, where embedding techniques are employed to allow the agent to choose dataset values as action parameters.
  2. We formulate a compound reward function, which encourages the agent to perform actions that are: (i) interesting: we employ a data-driven notion of interestingness; (ii) diverse from one another: we use a distance metric between the actions' result sets; and (iii) human-understandable: we use real EDA sessions made by human experts as exemplars.
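
The two steps above can be illustrated with a minimal sketch in Python, using pandas and NumPy. It is not the project's implementation. The function names, the per-column features in the summary, the use of KL divergence as the interestingness score, the Euclidean distance as the diversity metric, and the weighted sum that combines the terms are all assumptions chosen for illustration. The coherence term, which the project derives from expert sessions, is passed in here as a plain number.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def summarize_display(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical state encoding: for each column, emit normalized entropy,
    distinct-value ratio, and null ratio of the current result display."""
    if len(df) == 0:
        return np.zeros(3 * len(df.columns))
    features = []
    for col in df.columns:
        probs = df[col].value_counts(normalize=True).to_numpy()
        entropy = float((probs * np.log(1.0 / probs)).sum())
        max_entropy = np.log(probs.size) if probs.size > 1 else 1.0
        features += [
            entropy / max_entropy,
            df[col].nunique() / len(df),
            float(df[col].isna().mean()),
        ]
    return np.array(features, dtype=float)


def interestingness(display: pd.DataFrame, reference: pd.DataFrame, column: str) -> float:
    """Assumed stand-in: KL divergence between a column's value distribution in
    the display and in the full dataset. Expects display rows to be a subset of
    reference rows, so every displayed value also occurs in the reference."""
    p = display[column].value_counts(normalize=True)
    q = reference[column].value_counts(normalize=True).reindex(p.index)
    return float((p * np.log(p / q)).sum())


def diversity(current: np.ndarray, previous: Sequence[np.ndarray]) -> float:
    """Assumed stand-in: smallest Euclidean distance from the current summary
    to the summaries of earlier displays (all summaries must share a length)."""
    if len(previous) == 0:
        return 0.0
    return float(min(np.linalg.norm(current - p) for p in previous))


def compound_reward(interest: float, divers: float, coherence: float,
                    weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed combination rule: a weighted sum of the three components."""
    return float(np.dot(weights, [interest, divers, coherence]))


# Toy usage: a filter action on a tiny, made-up dataset.
data = pd.DataFrame({"proto": ["tcp", "tcp", "udp", "udp"], "length": [10, 20, 30, 40]})
filtered = data[data["proto"] == "tcp"]

state_before = summarize_display(data)
state_after = summarize_display(filtered)
reward = compound_reward(
    interestingness(filtered, data, "proto"),
    diversity(state_after, [state_before]),
    coherence=0.0,  # placeholder; the project scores this against expert sessions
)
```

In this sketch the summary vector has a fixed length only as long as the displayed columns stay the same, so grouping or aggregation actions that change the columns would need a different encoding.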

Technologies Used

OpenAI
TensorFlow
Pandas
Intel AI cloud servers

Repository

https://github.com/TAU-DB/REACT-IDA-Recommendation-benchmark
