Fundamentals of Data Mining: Analyzing Diabetes Dataset
| University | National University of Singapore (NUS) |
| Subject | Data Mining |
Fundamentals of Data Mining
Question 1
In this question, you will use the dataset “diabetes.csv” to learn more about diabetes. Table 1 describes the fields in the dataset, which contains 768 records. Each record is a medical record of a patient who has had several tests to determine whether they have diabetes.
The dataset was extracted from an internal database of a healthcare organisation. According to the technical staff who assisted with data collection, a value of 0 was recorded whenever a measurement could not be captured. In short, for some fields, a value of 0 indicates a missing value rather than a genuine measurement.
Table 1. Description of the dataset
| Field | Description |
|---|---|
| Pregnancies | Number of times pregnant |
| Glucose | Plasma glucose concentration after 2 hours in an oral glucose tolerance test |
| BloodPressure | Diastolic blood pressure (mm Hg) |
| SkinThickness | Triceps skin fold thickness (mm) |
| Insulin | 2-hour serum insulin (μU/ml) |
| BMI | Body mass index (weight in kg/(height in m)^2) |
| DiabetesPedigreeFunction | Diabetes pedigree function |
| Age | Age (years) |
| Outcome | 0 (non-diabetic) or 1 (diabetic) |
(a) Assess the quality of the dataset and, if needed, perform data cleaning. In fewer than 200 words, discuss how you identified the data quality issues and cleaned the data, and justify your choice of data cleaning method.
You are expected to use the cleaned dataset obtained from part (a) when attempting the subsequent parts of the question.
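One possible way to approach (a) can be sketched in Python using only the standard library. It assumes that a 0 in Glucose, BloodPressure, SkinThickness, Insulin or BMI means "not captured", since none of these can physiologically be 0, while Pregnancies and Outcome can genuinely be 0. The sketch first counts zeros per field as a quality check, then replaces them with the median of the captured values. Median imputation is only one option (dropping affected records or mean imputation are alternatives whose trade-offs part (a) asks you to justify), and the sample rows are made up for illustration.

```python
import csv
import io
import statistics

# Assumption: a 0 in these fields means "not captured", because none of them
# can physiologically be 0. Pregnancies and Outcome can legitimately be 0.
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Made-up rows in the Table 1 layout, for illustration only (not real records).
SAMPLE = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,30,0,32.0,0.5,40,1
0,95,0,25,80,24.0,0.3,25,0
5,140,80,0,150,0,0.8,50,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(io.StringIO(text))]


def count_zeros(rows, fields):
    """Quality check: number of records holding 0 in each field."""
    return {f: sum(1 for r in rows if r[f] == 0) for f in fields}


def impute_zeros_with_median(rows, fields):
    """Return a copy of rows with each 0 replaced by the median of the field's non-zero values."""
    cleaned = [dict(r) for r in rows]  # leave the raw rows untouched
    for f in fields:
        observed = [r[f] for r in rows if r[f] != 0]
        if not observed:  # nothing captured: leave the field as-is
            continue
        median = statistics.median(observed)
        for r in cleaned:
            if r[f] == 0:
                r[f] = median
    return cleaned


if __name__ == "__main__":
    raw = load_rows(SAMPLE)
    print(count_zeros(raw, ZERO_MEANS_MISSING))
    print(impute_zeros_with_median(raw, ZERO_MEANS_MISSING))
```

The median is used here rather than the mean because fields such as Insulin tend to be skewed, and the median is less affected by extreme values.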
(b) Determine the obesity level for each patient according to the following categories:
• “Underweight” if the BMI is below 18.5
• “Normal” if the BMI is 18.5 and above but below 25
• “Overweight” if the BMI is 25 and above but below 30
• “Obese” if the BMI is 30 and above
Then, present one (1) graphical display that can answer the following:
Which obesity level has the highest number of diabetic patients?
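The binning in (b) can be sketched in Python as follows, with the boundaries taken from the categories above (each lower bound inclusive, each upper bound exclusive). It returns the number of diabetic patients (Outcome = 1) per obesity level, which are the figures a single bar chart would display; the plotting step is left out. The (BMI, Outcome) input format and the sample values are assumptions for illustration. Note that it assumes BMI zeros were already handled in (a), since a BMI of 0 would otherwise be counted as "Underweight".

```python
from collections import Counter

LEVELS = ["Underweight", "Normal", "Overweight", "Obese"]


def obesity_level(bmi):
    """Map a BMI value to the part (b) categories (lower bounds inclusive)."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25:
        return "Normal"
    if bmi < 30:
        return "Overweight"
    return "Obese"


def diabetic_counts_by_level(records):
    """Count diabetic patients (Outcome == 1) per obesity level.

    records: iterable of (bmi, outcome) pairs taken from the cleaned data.
    """
    counts = Counter(obesity_level(bmi) for bmi, outcome in records if outcome == 1)
    return {level: counts.get(level, 0) for level in LEVELS}


if __name__ == "__main__":
    # Made-up (BMI, Outcome) pairs, for illustration only.
    sample = [(17.0, 1), (22.0, 0), (24.9, 1), (27.0, 1), (31.0, 1), (35.0, 1)]
    counts = diabetic_counts_by_level(sample)
    print(counts)
    print("Level with most diabetic patients:", max(counts, key=counts.get))
```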
(c) Construct a K-Means model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:
• How do you decide which input(s) and parameter(s) to use in the model?
• How do you determine that your model is the final (best) model?
• What are the profiles of the clusters?
• How do you identify which cluster is the target cluster?
• Data preparation steps and post-model analysis, if any
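The K-Means steps listed above can be sketched from scratch in Python, assuming numeric inputs that were already cleaned in (a). The sketch applies min-max scaling so that fields on large scales such as Insulin do not dominate the Euclidean distance, runs Lloyd's algorithm with a fixed random seed, and reports the within-cluster sum of squared errors (SSE). Comparing SSE across values of k is an elbow check, one common way to choose k. The share of diabetic patients per cluster is shown as one possible way to pick the target cluster. Outcome is kept out of the inputs and used only after clustering; whether to include it is one of the decisions part (c) asks you to justify. The toy records are made up, and in practice a library implementation such as scikit-learn's `KMeans` would normally be used instead.

```python
import random


def min_max_scale(rows):
    """Scale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-Means. Returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen records
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def diabetic_share(labels, outcomes, k):
    """Post-model analysis: fraction of Outcome == 1 records in each cluster."""
    shares = []
    for j in range(k):
        members = [o for lab, o in zip(labels, outcomes) if lab == j]
        shares.append(sum(members) / len(members) if members else 0.0)
    return shares


if __name__ == "__main__":
    # Made-up two-field records (e.g. Glucose, BMI) and outcomes, for illustration.
    data = [[90, 22], [95, 24], [100, 23], [170, 38], [180, 40], [175, 36]]
    outcomes = [0, 0, 0, 1, 1, 1]
    scaled = min_max_scale(data)
    for k in range(1, 4):
        print(k, round(kmeans(scaled, k)[2], 4))  # SSE per k for an elbow check
    _, labels, _ = kmeans(scaled, 2)
    print(diabetic_share(labels, outcomes, 2))
```

Cluster profiles are easier to read in original units, so a natural follow-up is to average the unscaled fields of each cluster's members.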
(d) Construct an Apriori model that can help you identify the profile of patients diagnosed with diabetes. In your answer, discuss the following:
• How do you decide which input(s) and parameter(s) to use in your model?
• How do you determine that your model is the final (best) model?
• Report the total number of association rules obtained
• Pick one interesting association rule and explain it in terms of support, rule support and confidence
• Data preparation steps and post-model analysis, if any.
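A compact, from-scratch sketch of Apriori for part (d) in Python. Apriori works on categorical items, so the sketch assumes the cleaned fields were first discretised, for example using the obesity levels from (b) and hypothetical glucose bins; the item names and the five transactions are made up. The optional `consequent` filter keeps only rules that predict `Outcome=1`, which is one way to focus on the profile of diabetic patients. The three measures follow the wording of (d) and are assumed to match IBM SPSS Modeler's definitions, where "support" refers to the antecedent alone. Some Python libraries (for example mlxtend) instead use "support" for the whole rule.

```python
from itertools import combinations

# Made-up patients already discretised into items (hypothetical bins).
SAMPLE = [
    {"Glucose=High", "ObesityLevel=Obese", "Outcome=1"},
    {"Glucose=High", "ObesityLevel=Obese", "Outcome=1"},
    {"Glucose=High", "ObesityLevel=Normal", "Outcome=1"},
    {"Glucose=Normal", "ObesityLevel=Obese", "Outcome=0"},
    {"Glucose=Normal", "ObesityLevel=Normal", "Outcome=0"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def keep_frequent(candidates):
        counted = {c: sum(1 for b in baskets if c <= b) / n for c in candidates}
        return {c: v for c, v in counted.items() if v >= min_support}

    current = keep_frequent({frozenset([i]) for b in baskets for i in b})
    frequent = dict(current)
    size = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (size-1)-itemsets into size-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        size += 1
    return frequent


def generate_rules(frequent, min_confidence, consequent=None):
    """Return (antecedent, consequent, support, rule_support, confidence) tuples.

    Assumed SPSS Modeler-style terms: support = support of the antecedent,
    rule support = support of antecedent and consequent together,
    confidence = rule support / support.
    """
    found = []
    for itemset, rule_support in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                cons = itemset - ante
                if consequent is not None and cons != frozenset([consequent]):
                    continue  # keep only rules that predict the chosen item
                ante_support = frequent[ante]  # subsets of frequent sets are frequent
                confidence = rule_support / ante_support
                if confidence >= min_confidence:
                    found.append((ante, cons, ante_support, rule_support, confidence))
    return found


if __name__ == "__main__":
    freq = apriori(SAMPLE, min_support=0.4)
    for ante, cons, sup, rule_sup, conf in generate_rules(freq, 0.6, "Outcome=1"):
        print(sorted(ante), "->", sorted(cons), sup, rule_sup, round(conf, 3))
```

Lowering `min_support` or `min_confidence` yields more rules, so the total number of rules reported in (d) depends directly on these two thresholds.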
Your writing should be succinct without omitting relevant details. Highlight only the points that are relevant to your discussion. Use plain and simple language. Some questions may not have absolutely right or wrong answers; for such questions, you are free to express your views about the problem. However, your points must be supported by evidence and sound reasoning. It is the quality, not the length, that counts. Make sure you follow the report guidelines and style specified in this assignment.
The topics in the main report should follow the order of the tasks/questions listed in the assignment, that is, (a), (b), and so on. You may have several sub-sections within a section if you deem it appropriate.
The report must be self-contained. It is important to include all relevant tables and figures in the report as evidence to support the answers given.
The following are some details of the report format:
• Length: no more than 10 pages, including the relevant graphs, tables, references, screenshots and appendices (if any) but excluding the cover page
• Font Style: Times New Roman
• Font size: 12
• Line spacing: 1.5
• Margins: 1” for the top, bottom, right and left
• Include the page number on each page
Some further suggestions:
• Ensure minimal grammatical and typographical errors
• Write clearly in plain English
• Write appropriately to the context
• Cite appropriate sources
• Provide a reference or bibliography at the end of the main report
• Include less relevant details in the Appendix
• Good overall presentation of the report