DSM010 Big data analysis Course Work, UOL, Singapore: Implement the K-Means clustering algorithm with Euclidean and Manhattan Distance Measures
University: University of London (UOL)
Subject: DSM010 Big data analysis Course Work
Posted on: 1st Dec 2023


Q2) Cluster Analysis using Apache Mahout.

For this question, you can optionally use the data provided with Topic 4 for the K-Means algorithm (a set of text files placed in a folder). You are also welcome to use your own dataset. If you do, please provide a link to the data in your report.

As discussed under text clustering (Topic 4), the terms of the documents are treated as the features. The vector space model is an algebraic model that maps the terms of a document into an n-dimensional linear space. To evaluate the similarity between data points, we therefore need to represent the textual information (terms) numerically and build feature vectors from those numerical values.
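As a small illustration of turning terms into numerical feature vectors, the sketch below builds sparse TF-IDF vectors (one `{term: weight}` dict per document) in plain Python. It assumes a deliberately simple lower-case, letters-only tokenizer and the textbook weighting tf × log(N/df); Mahout's `seq2sparse` performs the same kind of conversion at scale, but its tokenization and exact TF-IDF formula may differ in detail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-case the text and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Only terms that occur in this document are stored, so the vector is sparse.
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


docs = ["big data needs big clusters",
        "k-means groups similar documents",
        "hadoop stores big data"]
for vec in tfidf_vectors(docs):
    print(vec)
```

Note that a term occurring in every document gets weight 0 under this formula, which is one reason common words contribute little to the similarity between documents.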

Use Apache Mahout to perform the standard steps of cluster analysis:

1) create sequence files from the raw text;

2) create a sparse (efficient) representation of the vectors and initialize approximate centroids for K-Means;

3) run the K-Means algorithm;

4) get the final iteration's clustering solution;

5) evaluate the final solution.
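One way to keep the five steps reproducible is to generate the Mahout command lines from a script. The sketch below only builds and prints the commands, nothing is executed. The HDFS paths (`texts`, `texts-seq`, `texts-vectors`, `kmeans-out`) and the helper name `mahout_pipeline` are hypothetical, and the option names are assumed to match the `seqdirectory`, `seq2sparse`, `kmeans` and `clusterdump` drivers of Mahout's MapReduce-based releases; check them against `mahout <command> --help` for the installed version before relying on them.

```python
import shlex

# Hypothetical HDFS paths; adjust to your own cluster layout.
RAW = "texts"
SEQ = "texts-seq"
VEC = "texts-vectors"
OUT = "kmeans-out"


def mahout_pipeline(k, measure, max_iter=20):
    """Return the Mahout CLI steps for one K-Means run, as argument lists."""
    dm = f"org.apache.mahout.common.distance.{measure}"
    run = f"{OUT}-{measure}-k{k}"
    return [
        # 1) raw text files -> SequenceFiles (key = file name, value = contents)
        ["mahout", "seqdirectory", "-i", RAW, "-o", SEQ, "-c", "UTF-8", "-ow"],
        # 2) SequenceFiles -> sparse TF-IDF vectors (plus a term dictionary)
        ["mahout", "seq2sparse", "-i", SEQ, "-o", VEC, "-wt", "tfidf", "-nv", "-ow"],
        # 2) + 3) with -k, k random input vectors are written to -c as the
        # initial centroids, then K-Means runs; -cl also assigns points to clusters
        ["mahout", "kmeans", "-i", f"{VEC}/tfidf-vectors", "-c", f"{run}/initial",
         "-o", run, "-dm", dm, "-k", str(k), "-x", str(max_iter),
         "-cd", "0.1", "-cl", "-ow"],
        # 4) + 5) dump the final iteration's clusters and request evaluation output
        ["mahout", "clusterdump", "-i", f"{run}/clusters-*-final",
         "-o", f"{run}-dump.txt", "-d", f"{VEC}/dictionary.file-0",
         "-dt", "sequencefile", "-p", f"{run}/clusteredPoints",
         "-n", "10", "--evaluate", "-dm", dm],
    ]


for step in mahout_pipeline(10, "EuclideanDistanceMeasure"):
    print(shlex.join(step))
```

Swapping the `measure` argument for `ManhattanDistanceMeasure` or `CosineDistanceMeasure`, and looping over several values of `k`, gives one run directory per setting, which keeps the later comparison tidy. The `clusters-*-final` pattern follows the Reuters example script distributed with Mahout.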

You need to consider the following points in the analysis:

Implement the K-Means clustering algorithm with the Euclidean and Manhattan distance measures.
Find the optimum number of clusters (K) for K-Means clustering with each of these distance measures.
Implement the K-Means clustering algorithm with the Cosine distance measure and verify the relation between the average distance to the centroid and the value of K.
Plot the elbow graph for K-Means clustering with the Cosine measure. Try to smooth the graph so that you can justify your choice of the best value of K.
Compare the clusters obtained with the different distance measures and discuss which setting of K-Means clustering works best for this dataset.
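To see how the three distance measures plug into the same algorithm, here is a minimal in-memory K-Means sketch on small dense vectors. It is not Mahout code; the function names are illustrative, cosine distance is taken as 1 minus cosine similarity (assuming no all-zero vectors), and, like standard K-Means, the update step uses the component-wise mean for every measure even though the mean is only the cost-minimizing centroid for squared Euclidean distance.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def kmeans(points, k, distance, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct input points as initial centroids
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: component-wise mean; an empty cluster keeps its old centroid.
        new = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


def avg_distance_to_centroid(centroids, clusters, distance):
    total = sum(distance(p, centroids[i]) for i, c in enumerate(clusters) for p in c)
    return total / sum(len(c) for c in clusters)


points = [(0.0, 1.0), (0.2, 0.9), (5.0, 5.0), (5.2, 4.8), (9.0, 0.1), (8.8, 0.0)]
for name, d in [("euclidean", euclidean), ("manhattan", manhattan), ("cosine", cosine)]:
    cents, cls = kmeans(points, 3, d)
    print(name, round(avg_distance_to_centroid(cents, cls, d), 3))
```

With K equal to the number of (distinct) points, every point becomes its own centroid and the average distance drops to zero, which is why the average distance alone keeps falling as K grows and an elbow criterion is needed to choose K.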


You need to include the following in your coursework submission:
(a) For Q1, submit the pseudocode and the Python code for the mapper and reducer implementations of all the descriptive statistics, with enough comments that a layperson could implement them. Anyone should be able to run your code and reproduce your results by following the instructions you provide.
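Q1 itself is not reproduced in this section, so the sketch below assumes the simplest possible input: one numeric value per line. It shows the Hadoop Streaming pattern of a mapper and a reducer in a single file, computing count, mean, minimum, maximum and population standard deviation from running sums. The file name `stats.py` and the single key `stats` are hypothetical choices; using one key sends every value to one reducer, which is simple but does not parallelize the reduce phase.

```python
import math
import sys


def mapper(lines):
    """Emit one 'stats<TAB>value' record per numeric line; skip anything else."""
    for line in lines:
        try:
            value = float(line.strip())
        except ValueError:
            continue  # headers, blank lines and malformed values are ignored
        yield f"stats\t{value}"


def reducer(lines):
    """Aggregate count, mean, min, max and population standard deviation."""
    n, total, total_sq = 0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for line in lines:
        _, raw = line.rstrip("\n").split("\t", 1)
        x = float(raw)
        n += 1
        total += x
        total_sq += x * x
        lo, hi = min(lo, x), max(hi, x)
    if n == 0:
        return {}
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # guard against tiny negative rounding
    return {"count": n, "mean": mean, "min": lo, "max": hi, "std": math.sqrt(var)}


if __name__ == "__main__":
    mode = sys.argv[1] if len(sys.argv) > 1 else None
    if mode == "map":
        for record in mapper(sys.stdin):
            print(record)
    elif mode == "reduce":
        for key, value in reducer(sys.stdin).items():
            print(f"{key}\t{value}")
```

Locally, the job can be imitated with `cat data.txt | python3 stats.py map | sort | python3 stats.py reduce`; on the cluster the same two modes would be passed to Hadoop Streaming as the `-mapper` and `-reducer` commands. Order statistics such as the median cannot be built from running sums and need a different design.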

(b) For Q2, write a brief summary of the impact of parameter changes on the performance of the K-Means algorithm. For example, you may 1) compare the different distance measures in the K-Means algorithm and discuss their merits and demerits, and 2) present a table showing the performance of the K-Means algorithm for different values of K.
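For the table of K values and the smoothed elbow graph, a small helper can tabulate the average distance to the centroid per K, smooth it with a centered moving average, and flag a candidate elbow. The elbow rule used here (the point farthest from the straight line joining the first and last points, after scaling both axes to [0, 1]) is one common heuristic, not a method prescribed by the brief. The numbers in the usage lines are hypothetical placeholders, not measured results; replace them with the averages from your own runs.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def elbow_k(ks, costs):
    """Return the K farthest from the chord joining the first and last points."""
    k_min, k_max = min(ks), max(ks)
    c_min, c_max = min(costs), max(costs)
    if k_max == k_min or c_max == c_min:
        return ks[0]  # flat or single-point curve: no elbow to find
    xs = [(k - k_min) / (k_max - k_min) for k in ks]
    ys = [(c - c_min) / (c_max - c_min) for c in costs]
    (x0, y0), (x1, y1) = (xs[0], ys[0]), (xs[-1], ys[-1])

    def gap(x, y):
        # Perpendicular distance to the chord, up to a constant factor.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)

    best = max(range(len(ks)), key=lambda i: gap(xs[i], ys[i]))
    return ks[best]


def print_table(ks, costs, smoothed):
    print(f"{'K':>3} {'avg dist':>9} {'smoothed':>9}")
    for k, c, s in zip(ks, costs, smoothed):
        print(f"{k:>3} {c:>9.4f} {s:>9.4f}")


# Hypothetical placeholder values, not measurements.
ks = [2, 3, 4, 5, 6, 7, 8]
avg_dist = [0.90, 0.72, 0.58, 0.55, 0.51, 0.50, 0.48]
smoothed = moving_average(avg_dist)
print_table(ks, avg_dist, smoothed)
print("candidate elbow:", elbow_k(ks, smoothed))
```

Smoothing before locating the elbow reduces the influence of a single noisy run (K-Means results depend on the random initial centroids), at the cost of slightly blurring a sharp bend.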

(c) Submit a report on the experiments. The report should be a detailed explanation (max. 1500 words, excluding code and references) of what you explored and the results you obtained, together with some discussion of the limitations of the MapReduce methodology and of Hadoop's MapReduce computing engine.
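A frequently discussed limitation is that iterative algorithms such as K-Means do not map neatly onto MapReduce: each iteration becomes a separate job that re-reads the whole input and writes the new centroids back to storage. The in-memory simulation below (not Hadoop code; all names are illustrative) expresses one iteration as a map phase and a reduce phase and counts how many such jobs a driver loop needs.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def map_phase(points, centroids):
    # Mapper: emit (cluster id, (point, 1)) for every input point.
    for p in points:
        yield nearest(p, centroids), (p, 1)


def reduce_phase(pairs):
    # Grouping by key stands in for Hadoop's shuffle and sort.
    groups = defaultdict(list)
    for cid, value in pairs:
        groups[cid].append(value)
    # Reducer: new centroid = sum of the points / number of points.
    new = {}
    for cid, items in groups.items():
        n = sum(one for _, one in items)
        dims = len(items[0][0])
        new[cid] = tuple(sum(p[d] for p, _ in items) / n for d in range(dims))
    return new


def run_driver(points, centroids, max_jobs=20):
    jobs = 0
    while jobs < max_jobs:
        jobs += 1  # on Hadoop, each pass is a full job over all the input
        updated = reduce_phase(map_phase(points, centroids))
        # Clusters that received no points keep their previous centroid.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, jobs


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(run_driver(points, [(0, 0), (10, 10)]))
```

Even in this tiny example the driver needs an extra pass just to confirm convergence; on a real cluster every pass adds job start-up, disk I/O and shuffle costs, which in-memory engines are designed to avoid.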

Credit will be given to:
• The depth and breadth of your investigation.
• The technical skills you demonstrate in your write-up.
• Good use of the Hadoop cluster.
• Critical evaluation of your work.
