Think of a grocery app like BigBasket. You are planning for grocery
essentials and you have listed out a few things that you are planning to buy.
But after listing out the things, you end up buying more than you think! This
is called impulsive buying and many brands like BigBasket or Grofers use
apriori algorithm to leverage this phenomenon.
Apriori algorithm is a concept of Data Mining. Data mining allows users
to analyze, classify, and discover correlations among data. The task is related
to the crucial process called Association Rule Learning.
What is Associate Rule Learning?
Associate Rule Learning or ARL is a part of data mining anomaly
detection. ARL is a procedure to look for the items or events that do not
correspond to a familiar pattern. Such kinds of familiar patterns are termed
anomalies and transcribe critical and actionable data in different application
fields. ARL looks for the association among variables.
What is the Apriori Algorithm?
Apriori algorithm also known as level-wise search is an iterative
approach that is attempted on a database record (particularly transactional
records or the records including a particular number of fields or items). The
apriori is a classic algorithm that follows a bottom-up approach to
incrementally contrast complex records. It is the most used algorithm in
today’s world of machine learning and artificial intelligence.
Apriori algorithm consists of two primary steps:
Repeat these two steps k times, where k is the number of items in the
last iteration you get frequent items sets containing k items. Another point to
remember here is Apriori algorithm follows the sequential approach.
Example – Take an example of app BigBasket. Using this app, one can
figure out which items are purchased together and frequently through
BigBasket.
Scientists describe apriori algorithm from its pseudo code that is
widely available online. The apriori algorithm is used with the collaboration of
other algorithms to sort and contrast data effectively to show a better picture
of how a complex system reflects patterns and trends.
Another example is – the information gathered from an e-commerce website
that a customer who purchases a keyboard is also likely to buy a mouse
simultaneously. For this, the following association rule will be made:
Support: The percentage of
task-relevant data transactions for which the pattern is true
Support (Keyboard -> Mouse) =
Confidence: The measure of certainty
or trustworthiness associated with each discovered pattern.
Confidence (Keyboard -> Mouse) =
Thus, the algorithm focuses on the finding of rules that satisfy both
minimum support threshold and a minimum confidence threshold (strong rules).
The working of Apriori algorithm
Flowchart of the above problem statement
Key Concepts
Example of Apriori algorithm
Consider the following database(D) with 4 transactions (T1,T2,T3 and
T4).
Let minimum support = 2%
Let minimum confidence = 50%
Now, let’s generate Association rule of the above:
And, to find confidence:
To convert confidence into confidence%, multiply confidence with 100
[For example, confidence%(A -> C) = 0.66 x 100 = 66%]
Now compare confidence% of association rule with a minimum confidence
threshold (50%). Since confidence% of both Association rules is greater than
minimum confidence threshold, so both Association rules are final rules.
Was the example technical one?
Don’t worry! I would like it if
even an undergraduate can understand this without any problem. Let’s make it
simple without any use of terminologies or jargons.
Let’s start with a non-simple
example,
Transaction ID |
Items Bought |
T1 |
{Mango, Onion, Nintendo, Key-chain,
Eggs, Yo-yo} |
T2 |
{Doll, Onion, Nintendo, Key-chain,
Eggs, Yo-yo} |
T3 |
{Mango, Apple, Key-chain, Eggs} |
T4 |
{Mango, Umbrella, Corn, Key-chain,
Yo-yo} |
T5 |
{Corn, Onion, Onion, Key-chain,
Ice-cream, Eggs} |
Now, we follow a simple golden
rule: we say an item/itemset is frequently bought if it is bought at least 60%
of times. So for here it should be bought at least 3 times.
For simplicity
M = Mango
O = Onion
And so on……
So the table will be
Original table:
Transaction ID |
Items Bought |
T1 |
{M, O, N, K, E, Y } |
T2 |
{D, O, N, K, E, Y } |
T3 |
{M, A, K, E} |
T4 |
{M, U, C, K, Y } |
T5 |
{C, O, O, K, I, E} |
Step 1: Count
the number of transactions in which each item occurs, Note
‘O=Onion’ is bought 4 times in total, but it occurs in just 3 transactions.
Item |
No of transactions |
M |
3 |
O |
3 |
N |
2 |
K |
5 |
E |
4 |
Y |
3 |
D |
1 |
A |
1 |
U |
1 |
C |
2 |
I |
1 |
Step 2: Now
remember as said the item is said frequently bought if it is bought at least 3
times. So in this step we remove all the items that are bought less than 3
times from the above table and we are left with
Item |
Number of transactions |
M |
3 |
O |
3 |
K |
5 |
E |
4 |
Y |
3 |
These are the single items that
are bought frequently. Now let’s say we want to find a pair of items that are
bought frequently. We continue from the above table (Table in step 2)
Step 3: We start
making pairs from the first item, like MO,MK,ME,MY and then we start with the
second item like OK,OE,OY. We did not do OM because we already did MO when we
were making pairs with M and buying a Mango and Onion together is the same as
buying Onion and Mango together. After making all the pairs we get,
Item pairs |
MO |
MK |
ME |
MY |
OK |
OE |
OY |
KE |
KY |
EY |
Step 4: Now we
count how many times each pair is bought together. For example M and O is just
bought together in {M,O,N,K,E,Y}
While M and K is bought
together 3 times in {M,O,N,K,E,Y}, {M,A,K,E} AND {M,U,C, K, Y}
After doing that for all the
pairs we get
Item Pairs |
Number of transactions |
MO |
1 |
MK |
3 |
ME |
2 |
MY |
2 |
OK |
3 |
OE |
3 |
OY |
2 |
KE |
4 |
KY |
3 |
EY |
2 |
Step 5: Golden
rule to the rescue. Remove all the item pairs with number of transactions less
than three and we are left with
Item Pairs |
Number of transactions |
MK |
3 |
OK |
3 |
OE |
3 |
KE |
4 |
KY |
3 |
These are the pairs of items
frequently bought together.
Now let’s say we want to find a
set of three items that are brought together.
We use the above table (table
in step 5) and make a set of 3 items.
Step 6: To make
the set of three items we need one more rule (it’s termed as self-join),
It simply means, from the Item
pairs in the above table, we find two pairs with the same first Alphabet, so we
get
· OK
and OE, this gives OKE
· KE
and KY, this gives KEY
Then we find how many times
O,K,E are bought together in the original table and same for K,E,Y and we get
the following table
Item Set |
Number of transactions |
OKE |
3 |
KEY |
2 |
While we are on this, suppose
you have sets of 3 items say ABC, ABD, ACD, ACE, BCD and you want to generate
item sets of 4 items you look for two sets having the same first two alphabets.
· ABC
and ABD -> ABCD
· ACD
and ACE -> ACDE
And so on … In general, you
have to look for sets having just the last alphabet/item different.
Step 7: So we
again apply the golden rule, that is, the item set must be bought together at
least 3 times which leaves us with just OKE, Since KEY are bought together just
two times.
Thus the set of three items that
are bought together most frequently are O,K,E.
Applications of Apriori Algorithm
Limitations of using Apriori algorithm
Methods to improve Apriori’s efficiency
Advantages of Apriori
Disadvantages of Apriori
Final Remarks
In this technical tutorial of Data Science, we have learned about
association rule mining that is implemented with the help of Apriori algorithm.
We have also learned an example of Apriori algorithm and how Apriori algorithm
works. The Apriori algorithm is also useful in Python programming language to
perform market basket analysis.
Get Certified From
Codegnan helps you to become one, Enroll now and start One on One Live sessions.
Get Microsoft Technology Associate and HPE Certificate after successful completion of Exams
Webinars are always part of our community. Attend regular webinars and gain knowledege from us
Learn moreOnline / Classroom12
189 Learners
Online / Classroom
189 Learners
Online / Classroom
189 Learners
Online / Classroom
189 Learners
"Got to know about Codegnan through google and joined in it after going through their reviews.It been a very encouraging and postivie experience right from getting details,joining and completing my microsoft certification exam.The staff here is very helpful in every possible way especially sai ram sir has been really good in helping through out the course.i highly recommend Codegnan."
"I have learned PYTHON course here.The teaching environment is very friendly here.they clear the doubts immediatly. I am very happy as i trained here and gained MTA badge.The best part is they focus on practical training.I suggest who are willing get train in python this is best place."