Tea Rating

Using Chemical Data to Predict Tea Ratings

Project status: Concept

Artificial Intelligence

Intel Technologies
Intel Opt ML/DL Framework

Overview / Usage

Our understanding of taste is still evolving, and the sense remains poorly understood by
modern science. One of the original models held that our sense of taste came from taste
buds passing on information about which of the five common flavors (sweet, bitter, sour,
salty, and the recently christened umami) were present in our food. More recent work
suggests that complicated interactions between protein receptors in taste cells may be
responsible for our perception of taste. Scientists at UC San Diego found that the same pH
protein receptors found in the spinal cord are also present in our taste cells, helping us
gauge the acidity of food and giving us the perception of sourness. It therefore seems a
sound idea that any attempt to model human taste should include such chemical properties
in its feature space.

Since creating a model for taste seems a gargantuan and vaguely defined task, we
can start with a rating system instead. A rating system captures what we are after in a
model of taste: which things taste best. Since machine learning achieves its best results
with good data, tea’s rating system is a fantastic place to start.
Hence, we can use tea to explore how to predict whether something tastes good.
Noting that teas vary by region and variety, it would be wisest to choose a single
type of tea and region. Hence I shall choose Darjeeling tea from West Bengal’s Darjeeling
district, since it will be readily available in this area.
I will begin the processing by choosing the features: the amounts of caffeine, polyphenols,
theobromine, dietary minerals, theophylline, fluoride, aluminium, flavonoids, gallate, etc.
Because of modern environmental pollution, fluoride and aluminium also sometimes occur in
tea. They remain contentious because recent studies indicate that they can lead to a
negative experience.

Methodology / Approach

SVM and LDA:-
As a first attempt to classify our multi-class data, I shall use the “One-Against-All” (OVA)
variant of the Support Vector Machine. SVMs have a good track record as binary
classifiers because they concentrate on maximizing the geometric margin rather than the
error over all training examples. Using the liblinear library [3], I will use the ‘-s 7’ flag
when training the model. According to the liblinear documentation, ‘-s 7’ selects
L2-regularized logistic regression (dual form), so strictly speaking this is a linear logistic
model rather than a kernel SVM; liblinear trains it one-against-all for multi-class data. For
each data point, it makes a separate prediction for every tea rating, with an accompanying
probability that represents how confident that prediction is. Once a prediction for each
class has been made, the rating with the highest recorded probability is chosen as the
solution to the classification problem. The other multiclass SVM in the liblinear library,
the formulation of Crammer and Singer, is selected with the ‘-s 4’ flag. This
implementation [4] uses a generalized notion of margins to create a compact quadratic
optimization problem. Its dual is decomposed into multiple small optimization problems,
which are solved to build the multiclass SVM.
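
The one-against-all decision rule described above can be sketched in a few lines of plain Python. This is only an illustration of the rule, not liblinear's code: the `MODELS` weights, the rating labels, and the two-feature input are hypothetical values chosen by hand, and in practice the weights would come from training.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ova_predict(models, x):
    """Return (best_label, probabilities) for one feature vector x.

    models maps each rating label to a (weights, bias) pair taken from a
    binary "this rating vs. all other ratings" logistic model.
    """
    probs = {}
    for label, (w, b) in models.items():
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        probs[label] = sigmoid(score)
    # The rating whose binary model is most confident wins.
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical hand-picked weights for two features
# (for example, scaled caffeine and polyphenol content).
MODELS = {
    "low":    ([-1.0, -1.0], 1.0),
    "medium": ([0.0, 0.0], -0.5),
    "high":   ([1.0, 1.0], -1.0),
}

label, probs = ova_predict(MODELS, [1.5, 1.0])
print(label, {k: round(v, 3) for k, v in probs.items()})
```

Note that the per-class probabilities come from independent binary models, so they need not sum to one; only their ordering matters for the final choice.
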
To model the data using Linear Discriminant Analysis, I will use a library written by
Will Drinnelll. LDA works very similarly to our multiclass SVM: when classifying an
example, LDA calculates a value for each tea rating representing how well the example
matches that rating, and then selects the tea rating with the highest value as its guess.
These values come from how far into the region designated for a certain tea rating a data
point is determined to be. To build its model of the data, LDA divides the feature space so
as to maximize the separation between classes relative to the spread within each class.
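
As a rough sketch of the "one value per rating, pick the highest" idea, the snippet below implements the textbook LDA discriminant with a pooled covariance matrix in plain Python. It is not the library mentioned above; the function names, the two rating labels, and the toy data are assumptions made up for illustration.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_lda(X, y):
    """Estimate class means, class priors, and the pooled covariance."""
    labels = sorted(set(y))
    d, n = len(X[0]), len(X)
    means, priors = {}, {}
    cov = [[0.0] * d for _ in range(d)]
    for k in labels:
        rows = [x for x, t in zip(X, y) if t == k]
        mu = [sum(col) / len(rows) for col in zip(*rows)]
        means[k], priors[k] = mu, len(rows) / n
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    cov = [[c / (n - len(labels)) for c in row] for row in cov]
    return means, priors, cov

def lda_scores(model, x):
    """Linear discriminant value of x for every class (higher is better)."""
    means, priors, cov = model
    scores = {}
    for k, mu in means.items():
        a = solve(cov, mu)  # Sigma^-1 * mu_k
        scores[k] = (sum(ai * xi for ai, xi in zip(a, x))
                     - 0.5 * sum(ai * mi for ai, mi in zip(a, mu))
                     + math.log(priors[k]))
    return scores

# Made-up toy data: two features per tea, two rating classes.
X = [[4, 5], [5, 4], [5, 6], [6, 5], [1, 2], [2, 1], [2, 3], [3, 2]]
y = ["good"] * 4 + ["poor"] * 4
model = fit_lda(X, y)
scores = lda_scores(model, [4.5, 4.0])
prediction = max(scores, key=scores.get)
print(prediction)
```

The sketch assumes the pooled covariance matrix is non-singular; with many correlated chemical features and few samples, some form of regularization would likely be needed.
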

Feature selection:-
Running a feature selection algorithm on LDA to find which features of a tea contribute the
most to its rating will give the following order of importance: caffeine, polyphenols,
theobromine, dietary minerals, theophylline, fluoride, aluminium, flavonoids, gallate, etc.
One of the recurring questions in the world of tea revolves around the presence of caffeine
in teas, so it makes sense that feature selection ranks the amount of caffeine among the
more important features when a machine tries to learn a model for tea rating. Other
influential features include polyphenol content, fluoride levels, and dietary mineral
levels.
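
The text does not say which feature selection algorithm is meant, so the following is only one possible sketch: ranking each feature by its Fisher score (between-class scatter divided by within-class scatter), the same criterion LDA builds on. The function name, the feature names, and the numbers are invented for illustration and are not measurements.

```python
def fisher_scores(X, y, names):
    """Return {feature name: between-class / within-class scatter}."""
    labels = sorted(set(y))
    scores = {}
    for j, name in enumerate(names):
        col = [x[j] for x in X]
        overall = sum(col) / len(col)
        between = within = 0.0
        for k in labels:
            vals = [x[j] for x, t in zip(X, y) if t == k]
            mu = sum(vals) / len(vals)
            between += len(vals) * (mu - overall) ** 2
            within += sum((v - mu) ** 2 for v in vals)
        scores[name] = between / within if within > 0 else float("inf")
    return scores

# Made-up toy values: caffeine separates the two ratings, fluoride does not.
names = ["caffeine", "fluoride"]
X = [[3.0, 0.2], [3.2, 0.4], [3.4, 0.3],
     [1.0, 0.3], [1.2, 0.2], [1.4, 0.4]]
y = ["good", "good", "good", "poor", "poor", "poor"]

scores = fisher_scores(X, y, names)
ranking = sorted(names, key=scores.get, reverse=True)
print(ranking)
```

A per-feature score like this ignores interactions between features; a wrapper method that retrains the classifier on candidate subsets would capture those at a higher computational cost.
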

Principal Component Analysis:-
Looking a little further into how the feature space affects the final rating of the tea, a
feature such as caffeine may carry more information about taste than just its measured
amount, which means that simply grouping features together is a difficult task. Hence, when
running PCA while trying to couple all the caffeine-related variables into one vector and
all the polyphenol-related variables into another, and then adding the fluoride and
aluminium content as two separate features, the accuracy will fall quite dramatically.
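
To make the grouping step concrete, here is a minimal sketch of collapsing one group of related measurements into a single principal component, using power iteration on the covariance matrix in plain Python. The two columns stand in for a hypothetical pair of caffeine-related variables, and the values are made up so that the expected direction is easy to check by hand.

```python
import math

def center(X):
    """Subtract the column means from every row."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[xi - m for xi, m in zip(x, means)] for x in X]

def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=200):
    """Leading eigenvector of cov by power iteration (assumes nonzero variance)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

def project(Xc, v):
    """Score of each centered row along direction v."""
    return [sum(xi * vi for xi, vi in zip(x, v)) for x in Xc]

# Made-up group of two perfectly correlated columns (second = 2 * first).
group = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(group)
pc1 = first_component(covariance(centered))
combined_feature = project(centered, pc1)
print([round(c, 3) for c in pc1])
```

Replacing a whole group with one component keeps only the direction of greatest variance, which need not be the direction most relevant to the rating; that is one plausible reason for accuracy to drop after such grouping.
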
