Assignment Task
This assignment consists of two deliverables, being:
• One code implementation (50%). The code file, in Jupyter Notebook format, and the relevant
data set files should be contained within a folder named: Task 3-Your NameStudent_Number. The folder is then to be zipped and uploaded to Blackboard.
• A report (50%). The report must be uploaded as a separate file.
Part I – PySpark source code (50%)
Important Note: For code reproduction, your code must be self-contained. That is, it should
not require any libraries other than the PySpark environment we have used in the workshops. The
data files must be packaged with your code file.
In this component, you need to use Python 3 and PySpark to complete the following data
analysis tasks:
1. Exploratory data analysis
2. Recommendation engine
3. Classification
4. Clustering
You need to choose a dataset from Kaggle (https://www.kaggle.com/datasets) to complete
these tasks. Remember to include the data set file in your source code submission.
Note: In your notebook, please use a Heading 1 Markdown cell to separate each subtask.
Task I.1: Exploratory data analysis
This subtask requires you to explore your dataset by
• reporting its number of rows and columns,
• cleaning the data (missing values or duplicated records) if necessary,
• selecting 3 columns and drawing 1 plot (e.g. bar chart, histogram, boxplot, etc.) for each to
summarise it.
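The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a prescribed solution: the function names describe_and_clean, histogram_counts and load_csv are hypothetical, the application name is arbitrary, and the chosen Kaggle file is assumed to be a CSV with a header row. The histogram helper only computes bucket counts on the cluster; drawing the chart is left to whatever plotting tool the workshop environment provides.

```python
def describe_and_clean(df):
    """Report the shape of a PySpark DataFrame and drop incomplete or duplicated rows.

    Returns a (summary dict, cleaned DataFrame) pair.
    """
    n_rows = df.count()        # action: runs a Spark job
    n_cols = len(df.columns)   # schema lookup only, no job
    # Drop rows containing any null, then drop exact duplicate rows.
    cleaned = df.dropna().dropDuplicates()
    summary = {
        "rows": n_rows,
        "columns": n_cols,
        "rows_after_cleaning": cleaned.count(),
    }
    return summary, cleaned


def histogram_counts(df, column, buckets=10):
    """Bucket one numeric column into evenly spaced bins.

    Returns (bucket_edges, counts), which can be handed to a bar chart.
    Pass the cleaned DataFrame so the column contains no nulls.
    """
    values = df.select(column).rdd.map(lambda row: row[0])
    return values.histogram(buckets)


def load_csv(path):
    """Load a CSV file with a header row into a DataFrame (assumed input format)."""
    # Imported here so the helpers above can be read and checked
    # without a running Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("Task3-EDA").getOrCreate()
    return spark.read.csv(path, header=True, inferSchema=True)
```

In a notebook, one might call load_csv on the packaged data file, pass the result to describe_and_clean, and then call histogram_counts on the cleaned DataFrame once for each of the 3 selected columns.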
Task I.2: Recommendation engine
This subtask requires you to implement a recommender system using collaborative filtering
with the Alternating Least Squares (ALS) algorithm. You need to include
• Model training and predictions
• Model evaluation using MSE
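As an illustration of the two items above, here is a minimal sketch of training and MSE evaluation. It assumes the RDD-based pyspark.mllib API (the same package as the LogisticRegressionWithLBFGS class named in Task I.3); the brief does not say which ALS API is expected. The function names, the 80/20 split, the seed, and the rank and iteration values are illustrative assumptions, not requirements of the task.

```python
def mean_squared_error(pairs):
    """MSE over an iterable of (actual, predicted) pairs, in plain Python.

    Handy for checking the formula on a handful of values by hand.
    """
    pairs = list(pairs)
    return sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs)


def train_and_evaluate(ratings, rank=10, iterations=10):
    """Train ALS on an RDD of Rating(user, product, rating) and return (model, mse).

    rank and iterations are illustrative defaults, not tuned values.
    """
    # Imported here so mean_squared_error can be used without Spark installed.
    from pyspark.mllib.recommendation import ALS

    # Hold out 20% of the ratings for evaluation (assumed split).
    train, test = ratings.randomSplit([0.8, 0.2], seed=42)
    model = ALS.train(train, rank, iterations)

    # Predict a rating for every (user, product) pair in the held-out set.
    user_products = test.map(lambda r: (r[0], r[1]))
    predictions = model.predictAll(user_products).map(
        lambda r: ((r[0], r[1]), r[2])
    )

    # Join actual and predicted ratings on (user, product). Pairs whose user
    # or product never appeared in training get no prediction and drop out.
    actual_vs_predicted = test.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)

    # Same formula as mean_squared_error, computed on the cluster.
    mse = actual_vs_predicted.map(lambda kv: (kv[1][0] - kv[1][1]) ** 2).mean()
    return model, mse
```

The ratings RDD would be built from the chosen dataset by mapping each record to a Rating with an integer user id, an integer product id and a numeric rating; which columns play those roles depends on the dataset.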
Task I.3: Classification
This subtask requires you to implement a classification system using logistic regression with
the LogisticRegressionWithLBFGS class. You need to include
