Freshwater prediction using Machine Learning
Sidharth Panigrahi
Bengaluru, Karnataka
- 0 Collaborators
The project is based on one of the themes of the Intel® oneAPI Hackathon for Open Innovation. It aims to devise a machine learning tool to predict the quality of freshwater ...
Project status: Under Development
oneAPI, Artificial Intelligence, Cloud
Intel Technologies
oneAPI
Overview / Usage
Problem: Freshwater is one of our most vital and scarce natural resources, making up just 3% of the earth’s total water volume. It touches nearly every aspect of our daily lives, from drinking, swimming, and bathing to generating food, electricity, and the products we use every day. Access to a safe and sanitary water supply is essential not only to human life, but also to the survival of surrounding ecosystems that are experiencing the effects of droughts, pollution, and rising temperatures.
Expected Solution: In this track of the hackathon, we apply machine learning concepts and leverage oneAPI capabilities to support global water security and environmental sustainability efforts by predicting whether freshwater is safe to drink and safe for the ecosystems that rely on it.
Methodology / Approach
Approach: Constructing the prediction model
Data Audit & Analysis
Using the oneAPI library modin helped reduce the time needed for data loading and optimisation.
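A minimal sketch of how a Modin-based load and audit step could look. The in-memory CSV, its values, and the helper name `load_water_data` are illustrative assumptions, not the project's actual data or code; the fallback to plain pandas is only there so the sketch runs where Modin is not installed.

```python
import io

try:
    # Modin mirrors the pandas API and parallelises it across CPU cores.
    import modin.pandas as pd
except ImportError:
    # Fallback so the sketch still runs without Modin (assumption for portability).
    import pandas as pd

# Tiny stand-in for the hackathon CSV; all values are made up.
SAMPLE_CSV = """pH,Iron,Color,Source,Target
7.1,0.02,Colorless,Lake,0
6.4,1.30,Yellow,Well,1
,0.10,Faint Yellow,River,0
"""


def load_water_data(path_or_buffer):
    """Read the dataset and collect a few basic audit numbers."""
    df = pd.read_csv(path_or_buffer)
    audit = {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "target_counts": df["Target"].value_counts().to_dict(),
    }
    return df, audit


df, audit = load_water_data(io.StringIO(SAMPLE_CSV))
print(audit)
```

Because Modin keeps the pandas interface, switching between the two is limited to the import line; the audit logic itself does not change.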
The input data had 'Target' as the dependent variable and the independent variables listed below.
a. Continuous Attributes: pH, Iron, Nitrate, Chloride, Lead, Zinc, Turbidity, Fluoride, Copper, Odor, Sulfate, Conductivity, Chlorine, Manganese, Total Dissolved Solids, Water Temperature, Air Temperature
b. Categorical Attributes: Day, Time of Day, Month, Source, Color
The attributes 'Day', 'Month' and 'Time of Day' were excluded to avoid over-fitting and to keep the model generic enough to predict on out-of-time data.
The attribute 'Source' was transformed into numeric indicator variables using one-hot encoding. The attribute 'Color', on the other hand, was converted from nominal to ordinal with the following encoding: Colorless: 1, Near Colorless: 2, Faint Yellow: 3, Light Yellow: 4, Yellow: 5
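The encoding steps described above can be sketched with pandas as follows. The helper name `encode_features`, the sample rows, and the 'Source' values ("Lake", "Well") are hypothetical; only the Color mapping and the dropped columns come from the description.

```python
import pandas as pd

# Ordinal mapping for 'Color', as listed in the write-up.
COLOR_ORDER = {
    "Colorless": 1,
    "Near Colorless": 2,
    "Faint Yellow": 3,
    "Light Yellow": 4,
    "Yellow": 5,
}

# Time-related attributes left out of the model.
TIME_COLUMNS = ["Day", "Month", "Time of Day"]


def encode_features(df):
    """Drop time columns, ordinal-encode Color, one-hot encode Source."""
    out = df.drop(columns=TIME_COLUMNS, errors="ignore").copy()
    out["Color"] = out["Color"].map(COLOR_ORDER)
    # One 0/1 indicator column per distinct Source value.
    out = pd.get_dummies(out, columns=["Source"], prefix="Source", dtype=int)
    return out


# Made-up rows for illustration only.
raw = pd.DataFrame(
    {
        "pH": [7.1, 6.4, 8.0],
        "Color": ["Colorless", "Yellow", "Faint Yellow"],
        "Source": ["Lake", "Well", "Lake"],
        "Day": [1, 15, 30],
        "Month": ["January", "March", "July"],
        "Time of Day": [8, 14, 21],
        "Target": [0, 1, 0],
    }
)

encoded = encode_features(raw)
print(encoded)
```

Color values outside the mapping would become NaN with this approach, so a real pipeline would need to decide how to handle unseen or missing categories.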
Feature Engineering
The dataset was balanced using imblearn. Boosting algorithms fared better than the other models, and among them LightGBM fared the best. Run time was reduced by using the oneAPI-accelerated scikit-learn library (sklearnex).
From the set of 22 input features, we were able to narrow down to the 14 primary attributes needed for the prediction: Chloride, Chlorine, Color, Copper, Fluoride, Iron, Manganese, Nitrate, Odor, Sulfate, Total Dissolved Solids, Turbidity, Zinc, pH
Optimisation and final model
Using daal4py, we achieved a faster execution time for LightGBM and derived the model that predicts the target for a water source (whether it is freshwater or not).
Technologies Used
- Intel oneAPI libraries (modin, daal4py, sklearnex)
- Python
- Azure Cloud