When customers land on a product page at Amazon.com, they find a
section called "Customers Who Bought This Item Also Bought" that
recommends books, CDs, and other products commonly co-purchased
with the current product. The feature is designed to boost
cross-selling.
Task 1:
Here we're going to write Python code to process historical
transaction data to implement the feature.
Suppose that we are using a Python list to maintain all products
purchased in different transactions:
transactions = [[2, 8, 3, 6, 1, 9], [0, 5, 9], [0, 9], [4, 7, 0, 5, 9],
[8, 3], [1, 6, 3, 8], [9, 0, 5], [3, 8], [5, 7, 0, 4], [3, 1, 2, 6, 0]]
Elements in each sublist represent product IDs co-purchased in a
transaction.
Our goal is, for each product in this tiny transaction dataset,
recommend products that are most frequently purchased together with
the current product.
Please complete the following tasks based on the problem
description.
For each transaction, enumerate all pairs of products that
co-occur in the corresponding sublist. For example, processing the
sublist [9, 0, 5] should produce three pairs (0, 5), (0, 9), and
(5, 9).
Pay attention to the order of product IDs in each pair. The same
product pair (e.g., product 5 and product 9) can have two different
representations (i.e., (5, 9) and (9, 5)). To facilitate the
subsequent processing, make sure every pair is written in
ascending order (e.g., (0, 5) rather than (5, 0)).
Moreover, make sure product pairs created from the same sublist
are sorted first by the 1st element and then by the 2nd element.
E.g., the list of pairs created from the sublist [1, 6, 3, 8]
should be presented as [(1, 3), (1, 6), (1, 8), (3, 6), (3, 8), (6,
8)].
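The two ordering requirements above can be met at once by sorting
the sublist before pairing. The following is a minimal sketch, not a
required approach; the helper name sorted_pairs is hypothetical, and
it assumes product IDs do not repeat within a transaction.

```python
from itertools import combinations

def sorted_pairs(items):
    """Return all ascending pairs from one transaction, in sorted order."""
    # combinations() preserves input order, so sorting the sublist first
    # makes every pair ascending and the list of pairs lexicographic.
    return list(combinations(sorted(items), 2))

print(sorted_pairs([9, 0, 5]))     # [(0, 5), (0, 9), (5, 9)]
print(sorted_pairs([1, 6, 3, 8]))  # [(1, 3), (1, 6), (1, 8), (3, 6), (3, 8), (6, 8)]
```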
Produce a Python list, product_pairs, in which each element is the
list of such pairs for one transaction.
The expected output of print(product_pairs) is as follows:
[[(1, 2), (1, 3), (1, 6), (1, 8), (1, 9), (2, 3), (2, 6), (2, 8), (2, 9), (3, 6), (3, 8),
(3, 9), (6, 8), (6, 9), (8, 9)],
[(0, 5), (0, 9), (5, 9)],
[(0, 9)],
[(0, 4), (0, 5), (0, 7), (0, 9), (4, 5), (4, 7), (4, 9), (5, 7), (5, 9), (7, 9)],
[(3, 8)],
[(1, 3), (1, 6), (1, 8), (3, 6), (3, 8), (6, 8)],
[(0, 5), (0, 9), (5, 9)],
[(3, 8)],
[(0, 4), (0, 5), (0, 7), (4, 5), (4, 7), (5, 7)],
[(0, 1), (0, 2), (0, 3), (0, 6), (1, 2), (1, 3), (1, 6), (2, 3), (2, 6), (3, 6)]]
Full credit will be awarded only if your answer is
a comprehension implementation. Loop implementations can earn at
most 80% of full marks.
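One possible comprehension-only answer to Task 1 is sketched below.
It sorts each transaction and pairs every element with the elements
that follow it. The loop variable names (s, t, i, a, b) are
arbitrary, and the sketch assumes no product ID repeats within a
transaction.

```python
transactions = [[2, 8, 3, 6, 1, 9], [0, 5, 9], [0, 9], [4, 7, 0, 5, 9],
                [8, 3], [1, 6, 3, 8], [9, 0, 5], [3, 8], [5, 7, 0, 4],
                [3, 1, 2, 6, 0]]

# Outer comprehension: one list of pairs per transaction.
# Inner comprehension: pair each element of the sorted sublist with
# every later element, so pairs are ascending and already in order.
product_pairs = [
    [(a, b) for i, a in enumerate(s) for b in s[i + 1:]]
    for s in (sorted(t) for t in transactions)
]

print(product_pairs)
```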
Task 2:
Next, we will process product_pairs to create a dictionary,
cooccurences, whose keys are product pairs and whose values are
the number of transactions in which each pair co-occurs.
The expected output of print(cooccurences) should look like the
following (the output formatting can be different):
{(0, 1): 1, (0, 2): 1, (0, 3): 1, (0, 4): 2, (0, 5): 4, (0, 6): 1, (0, 7): 2, (0, 9): 4,
(1, 2): 2, (1, 3): 3, (1, 6): 3, (1, 8): 2, (1, 9): 1,
(2, 3): 2, (2, 6): 2, (2, 8): 1, (2, 9): 1,
(3, 6): 3, (3, 8): 4, (3, 9): 1,
(4, 5): 2, (4, 7): 2, (4, 9): 1,
(5, 7): 2, (5, 9): 3,
(6, 8): 2, (6, 9): 1,
(7, 9): 1,
(8, 9): 1}
Full credit will be awarded only if the order of
keys in your result exactly matches that in the expected
output.
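A minimal sketch of Task 2 follows. It rebuilds product_pairs so the
snippet runs on its own, counts pairs with collections.Counter, and
then inserts keys in sorted order. It relies on the fact that Python
dictionaries preserve insertion order (guaranteed since Python 3.7).
The intermediate name counts is hypothetical.

```python
from collections import Counter
from itertools import combinations

transactions = [[2, 8, 3, 6, 1, 9], [0, 5, 9], [0, 9], [4, 7, 0, 5, 9],
                [8, 3], [1, 6, 3, 8], [9, 0, 5], [3, 8], [5, 7, 0, 4],
                [3, 1, 2, 6, 0]]

# Rebuilt here only so that this example is self-contained.
product_pairs = [list(combinations(sorted(t), 2)) for t in transactions]

# Flatten all per-transaction pair lists and count each pair.
counts = Counter(pair for pairs in product_pairs for pair in pairs)

# Insert keys in ascending order so printing follows the expected order.
cooccurences = {pair: counts[pair] for pair in sorted(counts)}

print(cooccurences)
```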
Task 3:
Process cooccurences to output a new dictionary, recommendation,
whose keys are product IDs and whose values are lists of
recommended products, each paired with the number of transactions
in which it was co-purchased with the key product.
The expected output is as follows:
{0: [(5, 4), (9, 4), (4, 2), (7, 2), (1, 1), (2, 1), (3, 1), (6, 1)],
1: [(3, 3), (6, 3), (2, 2), (8, 2), (0, 1), (9, 1)],
2: [(1, 2), (3, 2), (6, 2), (0, 1), (8, 1), (9, 1)],
3: [(8, 4), (1, 3), (6, 3), (2, 2), (0, 1), (9, 1)],
4: [(0, 2), (5, 2), (7, 2), (9, 1)],
5: [(0, 4), (9, 3), (4, 2), (7, 2)],
6: [(1, 3), (3, 3), (2, 2), (8, 2), (0, 1), (9, 1)],
7: [(0, 2), (4, 2), (5, 2), (9, 1)],
9: [(0, 4), (5, 3), (1, 1), (2, 1), (3, 1), (4, 1), (6, 1), (7, 1), (8, 1)],
8: [(3, 4), (1, 2), (6, 2), (2, 1), (9, 1)]}
For example, the entry 5: [(0, 4), (9, 3), (4, 2), (7, 2)] means
that products 0, 9, 4, and 7 are recommended for product 5. Each
recommended product is represented by a tuple whose 1st element is
its ID and whose 2nd element is the number of transactions in which
it appeared together with product 5. The recommended products are
sorted in descending order by the number of co-occurrences with
product 5 (i.e., the 2nd element of the inner tuples).
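A minimal sketch of Task 3 follows; it recomputes cooccurences so
that it runs on its own. Each pair is recorded under both of its
products, and each list is then sorted by count with Python's stable
sort, so ties stay in insertion order. With sorted pair keys this
appears to reproduce the expected output, including key 9 appearing
before key 8, but that match is an observation about this data
rather than a stated requirement.

```python
from collections import Counter
from itertools import combinations

transactions = [[2, 8, 3, 6, 1, 9], [0, 5, 9], [0, 9], [4, 7, 0, 5, 9],
                [8, 3], [1, 6, 3, 8], [9, 0, 5], [3, 8], [5, 7, 0, 4],
                [3, 1, 2, 6, 0]]

# Rebuilt here only so that this example is self-contained.
counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
cooccurences = {pair: counts[pair] for pair in sorted(counts)}

# Record each pair under both of its products.
recommendation = {}
for (a, b), n in cooccurences.items():
    recommendation.setdefault(a, []).append((b, n))
    recommendation.setdefault(b, []).append((a, n))

# Sort each list by descending count; sorted() is stable, so products
# with equal counts keep the order in which they were added.
recommendation = {product: sorted(recs, key=lambda r: -r[1])
                  for product, recs in recommendation.items()}

print(recommendation)
```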
# starter code
transactions = [[2, 8, 3, 6, 1, 9], [0, 5, 9], [0, 9], [4, 7, 0, 5, 9],
[8, 3], [1, 6, 3, 8], [9, 0, 5], [3, 8], [5, 7, 0, 4], [3, 1, 2, 6, 0]]
Please do not answer the question unless you are sure of the
solution.