Freshwater prediction using Machine Learning

Sidharth Panigrahi

Bengaluru, Karnataka

The project is based on one of the themes of the Intel® oneAPI Hackathon for Open Innovation. It aims to devise a machine learning tool to predict the quality of freshwater ...

Project status: Under Development

oneAPI, Artificial Intelligence, Cloud

Intel Technologies
oneAPI

Overview / Usage

Problem:

Freshwater is one of our most vital and scarce natural resources, making up just 3% of the earth’s total water volume. It touches nearly every aspect of our daily lives, from drinking, swimming, and bathing to generating food, electricity, and the products we use every day. Access to a safe and sanitary water supply is essential not only to human life, but also to the survival of surrounding ecosystems that are experiencing the effects of droughts, pollution, and rising temperatures.

Expected Solution:

In this track of the hackathon, we apply Machine learning concepts and leverage oneAPI capabilities to help global water security and environmental sustainability efforts by predicting whether freshwater is safe to drink and use for the ecosystems that rely on it.

Methodology / Approach

Approach: Constructing the prediction model

Data Audit & Analysis

Using the Modin library from the oneAPI toolkit helped accelerate data loading and optimisation.
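
As a minimal sketch of the loading step, the snippet below swaps in Modin as a drop-in replacement for pandas. The sample rows, the Source values and the load_water_data helper are hypothetical stand-ins, not taken from the project repository, and the code falls back to plain pandas when Modin is not installed.

```python
import io

try:
    # Modin mirrors the pandas API and parallelises operations such as read_csv.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs without Modin (no acceleration).
    import pandas as pd

# Hypothetical sample rows; the real dataset is far larger.
SAMPLE_CSV = """pH,Iron,Color,Source,Target
7.1,0.02,Colorless,Lake,0
6.4,1.30,Yellow,River,1
8.0,0.10,Faint Yellow,Well,0
"""


def load_water_data(source):
    """Read the water-quality table from a path or file-like object."""
    return pd.read_csv(source)


df = load_water_data(io.StringIO(SAMPLE_CSV))
print(df.shape)
```

Because only the import line changes, the rest of a pandas-based pipeline can usually stay as it is.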

The input data had 'Target' as the dependent variable and the independent variables listed below.

a. Continuous Attributes: pH, Iron, Nitrate, Chloride, Lead, Zinc, Turbidity, Fluoride, Copper, Odor, Sulfate, Conductivity, Chlorine, Manganese, Total Dissolved Solids, Water Temperature, Air Temperature

b. Categorical Attributes: Day, Time of Day, Month, Source, Color

The attributes 'Day', 'Month' and 'Time of Day' were excluded to avoid over-fitting and to keep the model generic enough to predict on out-of-time data.

The attribute 'Source' was transformed into binary indicator variables using one-hot encoding. The attribute 'Color', on the other hand, was converted from nominal to ordinal with the following encoding: Colorless: 1, Near Colorless: 2, Faint Yellow: 3, Light Yellow: 4, Yellow: 5
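
The preprocessing described above can be sketched roughly as follows. The column names and the Color mapping come from the text; the sample rows, the Source values and the encode_features helper are hypothetical.

```python
import pandas as pd

# Ordinal encoding for 'Color', as listed in the text.
COLOR_ORDER = {
    "Colorless": 1,
    "Near Colorless": 2,
    "Faint Yellow": 3,
    "Light Yellow": 4,
    "Yellow": 5,
}

# Time-related attributes that are left out of the model.
TIME_COLUMNS = ["Day", "Month", "Time of Day"]


def encode_features(df):
    """Drop time columns, ordinal-encode Color, one-hot encode Source."""
    out = df.drop(columns=TIME_COLUMNS, errors="ignore")
    out["Color"] = out["Color"].map(COLOR_ORDER)
    return pd.get_dummies(out, columns=["Source"], prefix="Source", dtype=int)


# Hypothetical rows for illustration only.
raw = pd.DataFrame({
    "pH": [7.1, 6.4, 8.0],
    "Color": ["Colorless", "Yellow", "Faint Yellow"],
    "Source": ["Lake", "River", "Lake"],
    "Day": [1, 15, 28],
    "Month": ["January", "March", "July"],
    "Time of Day": [9, 14, 22],
    "Target": [0, 1, 0],
})

encoded = encode_features(raw)
print(encoded.columns.tolist())
```

Mapping Color to integers keeps its natural ordering in a single column, while Source has no ordering and therefore gets one indicator column per value.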

Feature Engineering

The dataset was balanced using imblearn. Boosting algorithms fared better than the other models, and among them LightGBM performed best. Run time was reduced by using the Intel Extension for Scikit-learn (sklearnex) from the oneAPI toolkit.

From the set of 22 input features, we were able to narrow down to the 14 primary attributes needed for the prediction: Chloride, Chlorine, Color, Copper, Fluoride, Iron, Manganese, Nitrate, Odor, Sulfate, Total Dissolved Solids, Turbidity, Zinc, pH
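
Once the primary attributes are known, restricting a dataset to them is a simple column selection. The sketch below assumes the data is held in a pandas DataFrame; the select_primary_features helper and the sample values are hypothetical, and the method used to arrive at the 14 attributes is not shown because the text does not describe it.

```python
import pandas as pd

# The 14 primary attributes listed in the text.
PRIMARY_FEATURES = [
    "Chloride", "Chlorine", "Color", "Copper", "Fluoride", "Iron",
    "Manganese", "Nitrate", "Odor", "Sulfate", "Total Dissolved Solids",
    "Turbidity", "Zinc", "pH",
]


def select_primary_features(df):
    """Return only the primary attributes, failing loudly if any is missing."""
    missing = [name for name in PRIMARY_FEATURES if name not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[PRIMARY_FEATURES]


# One hypothetical row with two extra attributes that get dropped.
row = {name: [1.0] for name in PRIMARY_FEATURES}
row.update({"Lead": [0.001], "Conductivity": [350.0]})
sample = pd.DataFrame(row)

reduced = select_primary_features(sample)
print(reduced.shape)
```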

Optimisation and final model

Using daal4py, we achieved a faster execution time for LightGBM and derived the final model to predict the target (whether the freshwater is safe or not).

Technologies Used

  1. Intel oneAPI libraries (modin, daal4py, sklearnex)
  2. Python
  3. Azure Cloud

Repository

https://github.com/span-11/freshwater_predn_oneapi
