Step 1. Find a messy data set on the Internet. The data must violate at least one of the tidy data principles.
Any data format (csv, txt, excel, …) is okay, as long as you can read it into R.
However, you need to convert and save the data as a csv file and include it in your submission.
Each student must use a unique data set; no two students may use the same data. Once you find a data set, double-check that no other student is already using it, then send me the link to the data source.
Wait for my confirmation before you proceed.
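The conversion to csv described in Step 1 could look roughly like the sketch below. It assumes the source is a tab-delimited text file; the file names and the values are invented for illustration, and other formats (for example Excel) would need a different import function.

```r
# Stand-in for a downloaded file: a small tab-delimited text file
# with made-up values (hypothetical file name).
raw_path <- file.path(tempdir(), "raw-data.txt")
writeLines(
  c("region\t2019\t2020", "North\t10\t12", "South\t20\t18"),
  raw_path
)

# check.names = FALSE keeps the column names "2019" and "2020"
# instead of converting them to "X2019" and "X2020".
raw <- read.delim(raw_path, check.names = FALSE)

# Save the data as a csv file without the row-number column.
csv_path <- file.path(tempdir(), "raw-data.csv")
write.csv(raw, csv_path, row.names = FALSE)

# Read the csv back to confirm it can be loaded into R.
check <- read.csv(csv_path, check.names = FALSE)
```

Reading the csv back is a quick way to confirm that the saved file has the same columns and rows as the original.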
Step 2. Use R Markdown to achieve the following:
1. Specify author, date, and title in the YAML metadata of your document
2. Describe the data source, background, characteristics, variables, etc.
3. Load the data into R. Depending on the data format, you need an appropriate way to import the data.
4. Show and explain why the data is not tidy. Don’t use data that is already tidy.
5. Tidy up the data using dplyr and/or tidyr
6. Explain why the data is tidy now
7. Create two different and meaningful data visualizations from the tidy data using ggplot2. (It is not enough to just change one variable on an axis.)
8. Identify the patterns in each plot and explain why they are meaningful
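As a rough illustration of items 4 to 7, the sketch below tidies a small made-up table and builds two different plots from it. The data, the column names (city, year, sales), and the choice of plots are assumptions for illustration only; your own data will need its own tidying steps.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy data (invented): one column per year, so the variable "year"
# is stored in the column names instead of in its own column.
messy <- tibble(
  city = c("A", "B", "C"),
  `2019` = c(10, 20, 30),
  `2020` = c(12, 18, 35),
  `2021` = c(15, 17, 40)
)

# Tidy form: one row per city and year, one column per variable.
tidy <- messy %>%
  pivot_longer(
    cols = -city,
    names_to = "year",
    values_to = "sales"
  ) %>%
  mutate(year = as.integer(year))

# Plot 1: how sales change over time in each city.
p1 <- ggplot(tidy, aes(x = year, y = sales, color = city)) +
  geom_line() +
  geom_point()

# Plot 2: total sales per city, a different question and geometry.
totals <- tidy %>%
  group_by(city) %>%
  summarise(total = sum(sales))

p2 <- ggplot(totals, aes(x = city, y = total)) +
  geom_col()
```

The two plots answer different questions (trend over time versus overall totals), which is the kind of difference item 7 asks for.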
Resources to learn R Markdown:
https://r4ds.had.co.nz/r-markdown.html
https://rmarkdown.rstudio.com/lesson-1.html
http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf
R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire, and Garrett Grolemund
Here are some additional notes about writing an R Markdown report. Violating these rules may lead to a lower grade.
1. Put the data in the same folder as your Rmd file. Whenever we run/knit an RMarkdown file, it uses the folder with the Rmd file as the working directory.
2. Read the data in your Rmd code chunk using a relative path. If you use an absolute path, I will not be able to knit the Rmd file to an HTML file on my end.
3. You will lose 5 points if for any reason (input path, error in code, etc.) the Rmd file cannot be knitted to an html file.
4. Distinguish headings (## heading) and normal text. We should not put all the text in headings.
5. Do not print excessive data in your RMarkdown report. Use kable to format tables, if you prefer.
6. Do not put your discussions/explanations in code chunks. Write them as normal text.
7. Do not use include=FALSE or echo=FALSE in your code chunks. I need to read your code. You may use message=FALSE, warning=FALSE to suppress messages/warnings.
8. Do not write an excessively long line of code. Break it into multiple lines to improve readability.
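Several of the notes above (relative paths, limited printing with kable, suppressing messages and warnings, short lines of code) can be sketched in one chunk. The file name and the data below are hypothetical, and the write step only exists to make the sketch self-contained; in a real report the csv file already sits next to the Rmd file.

```r
library(knitr)

# Suppress messages and warnings in all chunks while keeping
# the code itself visible in the knitted report.
opts_chunk$set(message = FALSE, warning = FALSE)

# Stand-in for the submitted csv file (hypothetical name and values).
csv_name <- "Assignment1-Lin-Example Data.csv"
example_data <- data.frame(
  city = c("A", "B", "C", "D", "E"),
  sales = c(10, 20, 30, 40, 50)
)
write.csv(example_data, csv_name, row.names = FALSE)

# Relative path: only the file name, because the working directory
# is the folder that contains the Rmd file when knitting.
dat <- read.csv(csv_name)

# Print only a few rows, formatted as a table.
first_rows <- kable(head(dat, 3), caption = "First three rows")
first_rows

# Clean up the stand-in file created for this sketch.
file.remove(csv_name)
```

Setting the options once with opts_chunk$set is one way to avoid repeating message=FALSE, warning=FALSE in every chunk header.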
Step 3. Knit the R Markdown file (.Rmd) to an HTML file
Step 4. The Rmd, HTML, and csv files must follow this naming rule:
Assignment1-YourLastName-Title With Six Words Or Less.FileExtension
For example:
Assignment1-Lin-Twitter Data Wrangling and Visualization.Rmd
Assignment1-Lin-Twitter Data Wrangling and Visualization.html
Assignment1-Lin-Twitter Data Wrangling and Visualization.csv
Step 5. If the csv file is larger than 5MB, remove some rows such that the file size is 5MB or less
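One possible way to carry out Step 5 is sketched below. It assumes that 5MB means 5 * 1024^2 bytes and that keeping the first rows is acceptable; the data and the file name are made up, and a random sample of rows might suit your data better.

```r
# Assumption: 5MB is interpreted as 5 * 1024^2 bytes.
max_bytes <- 5 * 1024^2

# Toy data (invented) that is large enough to exceed the limit.
set.seed(1)
dat <- data.frame(
  id = seq_len(400000),
  value = rnorm(400000)
)

out_path <- file.path(tempdir(), "trimmed.csv")
write.csv(dat, out_path, row.names = FALSE)

# Drop rows until the file fits. Each pass estimates how many rows
# fit from the current size and keeps a 5% safety margin.
while (file.size(out_path) > max_bytes) {
  ratio <- max_bytes / file.size(out_path)
  keep <- floor(nrow(dat) * ratio * 0.95)
  dat <- dat[seq_len(keep), , drop = FALSE]
  write.csv(dat, out_path, row.names = FALSE)
}

final_size <- file.size(out_path)
```

Checking the size of the written file, rather than the size of the object in memory, matters because the limit applies to the csv file you submit.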
Step 6. Submit the three files (individually)