University: University of Wollongong (UOW)
CSCI312 Big Data Management Assignment, UOW, Singapore: Implementation of HDFS application
Scope
The objectives of Assignment 1 include implementation of HDFS applications, implementation of simple MapReduce applications, and describing an implementation of complex MapReduce applications.
Task 1
Implementation of HDFS application
Implement an HDFS application that merges two files located in HDFS into one file also located in HDFS.
The application must have the following parameters.
(1) A path to, and a name of the first input file in HDFS.
(2) A path to, and a name of the second input file in HDFS.
(3) A path to, and a new name of an output file to be created in HDFS. The file is supposed to contain the contents of the first input file followed by the contents of the second input file.
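The core of the merge is a byte-level concatenation of two input streams into one output stream. The following is a minimal sketch of that step in plain Java, not a complete solution: the class name `MergeSketch` and its helper methods are hypothetical, and in-memory streams stand in for HDFS so the logic can be run without a cluster. In a real HDFS application the three streams would typically come from `org.apache.hadoop.fs.FileSystem` (`open` for the inputs, `create` for the output), with the three paths taken from the command-line parameters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MergeSketch {

    // Copies every byte of 'in' to 'out' using a fixed-size buffer.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Writes the contents of 'first' followed by the contents of 'second'.
    // Assumption: with Hadoop on the classpath, 'first' and 'second' would be
    // obtained with fs.open(new Path(args[0])) and fs.open(new Path(args[1])),
    // and 'out' with fs.create(new Path(args[2])).
    static void merge(InputStream first, InputStream second, OutputStream out)
            throws IOException {
        copy(first, out);
        copy(second, out);
    }

    // Demo helper (hypothetical): runs merge() on two in-memory "files".
    static String mergeStrings(String a, String b) {
        try (InputStream first = new ByteArrayInputStream(a.getBytes(StandardCharsets.UTF_8));
             InputStream second = new ByteArrayInputStream(b.getBytes(StandardCharsets.UTF_8));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            merge(first, second, out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(mergeStrings("first file\n", "second file\n"));
    }
}
```

Keeping the copy logic independent of where the streams come from makes it possible to check the ordering of the output (first file, then second file) before running anything against HDFS.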
Perform the following steps.
Implement the application and save its source code in a file solution1.java.
Upload two files to HDFS. The contents, the names, and the locations of the files in HDFS are up to you.
When ready, compile, create a jar file, and process your application. Display the results created by the application.
Use Hadoop to provide evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS.
Deliverables
A file solution1.java with the source code of the application that merges two HDFS files. A file solution1.pdf that contains the contents of the Terminal window with a report from compilation, creation of the jar file, uploading two small test files to HDFS, processing of the application, and evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS.
Task 2
Implementation of MapReduce application
Assume that a speed camera records the speed of passing cars and saves the measurements in a text file. The speed of each car is measured in kilometers per hour. A single row in the file contains a car registration number, the date when the speed was measured, and the speed of the car with the recorded registration number. The values are always separated by a single blank.
For example, a sample file with the speed measurements contains the following lines:
PKR856 12-DEC-2018 120
UPS234 17-JAN-2019 190
PKR856 12-FEB-2018 80
PKR856 01-JAN-2019 60
UPS234 21-OCT-2020 200
UPS234 22-OCT-2020 160
Assume that the speed limit at the location of the speed camera is 60 kilometers per hour. Your task is to implement a MapReduce application that finds the average speed of all cars that exceeded the speed limit at the location of the speed camera.
An input file with the speed measurements must include the lines listed above and it must contain at least 20 measurements. All additional measurements are up to you. Save your solution in a file solution2.java.
When ready, compile, create a jar file, and process your application. Display the results created by the application. Next, list your input file with the speed measurements. When finished, copy and paste the messages from the Terminal screen into a file solution2.pdf.
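The map and reduce logic for this task can be tried out in plain Java before it is moved into Hadoop `Mapper` and `Reducer` classes. The sketch below rests on one reading of the task, namely a single average over all measurements strictly above the limit; a per-car average is another plausible reading and would key the map output by registration number instead. The class name `SpeedAverageSketch` and its methods are hypothetical and do not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

class SpeedAverageSketch {

    static final int SPEED_LIMIT = 60; // km/h, taken from the task description

    // Map step: parse "REGISTRATION DATE SPEED" and keep the speed only if it
    // is strictly above the limit (a car at exactly 60 did not exceed it).
    static OptionalInt mapLine(String line) {
        String[] fields = line.trim().split(" ");
        int speed = Integer.parseInt(fields[2]);
        return speed > SPEED_LIMIT ? OptionalInt.of(speed) : OptionalInt.empty();
    }

    // Reduce step: average all speeds that arrived under the single key.
    static double reduce(List<Integer> speeds) {
        if (speeds.isEmpty()) {
            return 0.0; // assumption: report 0.0 when no car exceeded the limit
        }
        long sum = 0;
        for (int s : speeds) {
            sum += s;
        }
        return (double) sum / speeds.size();
    }

    // Stand-in for the job driver: map every line, then reduce once.
    static double averageAboveLimit(List<String> lines) {
        List<Integer> kept = new ArrayList<>();
        for (String line : lines) {
            OptionalInt speed = mapLine(line);
            if (speed.isPresent()) {
                kept.add(speed.getAsInt());
            }
        }
        return reduce(kept);
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "PKR856 12-DEC-2018 120",
                "UPS234 17-JAN-2019 190",
                "PKR856 12-FEB-2018 80",
                "PKR856 01-JAN-2019 60",
                "UPS234 21-OCT-2020 200",
                "UPS234 22-OCT-2020 160");
        // Five of the six sample speeds exceed 60: (120+190+80+200+160)/5
        System.out.println(averageAboveLimit(sample)); // prints 150.0
    }
}
```

Because a single average needs both the sum and the count of all qualifying speeds, all map output has to reach one reduce call; a combiner that averaged partial groups would give a wrong result unless it forwarded sum and count separately.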
Deliverables
A file solution2.java with the source code of the application that finds the average speed of all cars that exceeded the speed limit. A file solution2.pdf with a report from compilation, creating the jar file, processing, displaying the results of processing solution2.java, and a listing of your input file with the speed measurements.
Task 3
Implementation of MapReduce application
An application MinMax described in Exercise 2 has the functionality the same as the following SQL statement.
SELECT key, MIN(value), MAX(value)
FROM Sequence-of-key-value-pairs
GROUP BY key;
Extend the Java code of the application such that it implements the functionality the same as the following SQL statement.
SELECT key, MIN(value), MAX(value), SUM(value), AVG(value)
FROM Sequence-of-key-value-pairs
GROUP BY key;
Save your solution in a file solution3.java. When ready, compile, create a jar file, and process your application. To test your application, you can use the file sales.txt included in the folder with the specification of Exercise 2. Display the results created by the application. When finished, copy and paste the messages from the Terminal screen into a file solution3.pdf.
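The extension mostly affects the reduce step: for each key, one pass over the values can track the minimum, maximum, sum, and count, and the average follows from the last two. The sketch below shows that pass in plain Java. It is not the MinMax code from Exercise 2; the class name `MinMaxSumAvgSketch`, the tab-separated output layout, the "key value" input layout, and the sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MinMaxSumAvgSketch {

    // Reduce step for one key: a single pass over its values.
    // A reducer is only called for keys that have at least one value.
    static String reduce(String key, List<Integer> values) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        long sum = 0;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
            sum += v;
        }
        double avg = (double) sum / values.size();
        return key + "\t" + min + "\t" + max + "\t" + sum + "\t" + avg;
    }

    // Stand-in for map + shuffle: parse "key value" lines (assumed layout)
    // and group the values by key, sorted by key as the shuffle would do.
    static Map<String, List<Integer>> group(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>())
                   .add(Integer.parseInt(fields[1]));
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Hypothetical input, not the contents of sales.txt.
        List<String> lines = Arrays.asList("bolt 10", "nut 5", "bolt 30");
        for (Map.Entry<String, List<Integer>> e : group(lines).entrySet()) {
            System.out.println(reduce(e.getKey(), e.getValue()));
        }
    }
}
```

If a combiner is used, it cannot simply emit partial averages; it would need to forward partial minimum, maximum, sum, and count so the reducer can combine them correctly.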
Deliverables
A file solution3.java with the source code of the application that implements the functionality of the SELECT statement given above. A file solution3.pdf with a report from compilation, creating the jar file, processing, and displaying the results of processing solution3.java.
Task 4
Describing MapReduce implementation of join operation
Assume that a file measurement.txt contains the speed measurements of the passing cars (see Task 2). A single row in the file contains a car registration number, the date when the speed was measured, and the speed of the car with the recorded registration number. A sample of the file's contents is listed below.
PKR856 12-DEC-2018 120
UPS234 17-JAN-2019 190
PKR856 12-FEB-2018 80
PKR856 01-JAN-2019 60
UPS234 21-OCT-2020 200
UPS234 22-OCT-2020 160
Assume that a file car.txt contains the technical descriptions of the cars. A single row in the file contains a car registration number, maximum speed, and fuel consumption. For example:
PKR856 180 10
UPS234 200 15
Assume that both files have been loaded to HDFS.
Write a comprehensive explanation of how you would implement in Java a MapReduce application that finds the cars that reached their maximum speed at the speed checkpoint.
Save your explanation in a file solution4.pdf. This task does not require you to write any code in Java. However, a comprehensive explanation covering all stages of data processing is expected. You are allowed to support your explanation with fragments of pseudocode. Try to be as specific as possible.
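One common way to structure such an explanation is a reduce-side join: each mapper emits the registration number as the key and tags the value with its source file, and the reducer for a key then sees the car's maximum speed together with all of its measured speeds. The plain-Java sketch below illustrates only the data flow. The class name `JoinSketch`, the `M:` and `C:` tags, and the reading of "reached" as "measured speed greater than or equal to maximum speed" are assumptions. In Hadoop, the two input files could be bound to their own mapper classes with `MultipleInputs`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {

    // Map step for measurement.txt: "REG DATE SPEED" -> (REG, "M:SPEED").
    static String[] mapMeasurement(String line) {
        String[] f = line.trim().split(" ");
        return new String[] { f[0], "M:" + f[2] };
    }

    // Map step for car.txt: "REG MAXSPEED FUEL" -> (REG, "C:MAXSPEED").
    static String[] mapCar(String line) {
        String[] f = line.trim().split(" ");
        return new String[] { f[0], "C:" + f[1] };
    }

    // Reduce step for one registration number: true if any measured speed
    // is at least the maximum speed from the car record (assumed reading).
    static boolean reachedMax(List<String> taggedValues) {
        int maxSpeed = -1; // stays -1 if no car record arrived for this key
        int fastest = -1;  // stays -1 if no measurement arrived for this key
        for (String v : taggedValues) {
            int n = Integer.parseInt(v.substring(2));
            if (v.startsWith("C:")) {
                maxSpeed = n;
            } else {
                fastest = Math.max(fastest, n);
            }
        }
        return maxSpeed >= 0 && fastest >= maxSpeed;
    }

    // Stand-in for the whole job: map both files, shuffle by key, reduce.
    static List<String> join(List<String> measurements, List<String> cars) {
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (String line : measurements) {
            String[] kv = mapMeasurement(line);
            shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (String line : cars) {
            String[] kv = mapCar(line);
            shuffled.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            if (reachedMax(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> measurements = Arrays.asList(
                "PKR856 12-DEC-2018 120", "UPS234 17-JAN-2019 190",
                "PKR856 12-FEB-2018 80", "PKR856 01-JAN-2019 60",
                "UPS234 21-OCT-2020 200", "UPS234 22-OCT-2020 160");
        List<String> cars = Arrays.asList("PKR856 180 10", "UPS234 200 15");
        System.out.println(join(measurements, cars)); // prints [UPS234]
    }
}
```

On the sample data, UPS234 was measured at 200 against a maximum of 200, while PKR856 peaked at 120 against a maximum of 180, so only UPS234 qualifies under this reading.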
Deliverables
A file solution4.pdf with a comprehensive explanation of how you would implement in Java a MapReduce application that finds the cars that reached their maximum speed at the speed checkpoint.