Tidy data principles.

Step 1. Find a messy dataset on the Internet. The data must violate at least one of the tidy data principles.

Any data format (csv, txt, excel, …) is okay, as long as you can read it into R.

However, you need to convert the data to a csv file, save it, and include it in your submission.
Each student must use a unique dataset; no two students can use the same data. Once you find the data, double-check that no other student is already using it, then send me the data source link. I need the link before you start.

Wait for my confirmation before you proceed.
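
The conversion to csv required in Step 1 can be sketched as follows. This is a minimal sketch using base R only; the file names and the tiny tab-delimited file are hypothetical stand-ins for a real download, and the temporary folder is used only so the example runs on its own. For Excel files, `read_excel()` from the readxl package is one option in place of `read.delim()`.

```r
# Hypothetical example: create a small tab-delimited file to stand in for
# the downloaded data, then convert it to csv with base R.
raw_path <- file.path(tempdir(), "raw_data.txt")  # assumed file name
writeLines(c("country\t1999\t2000", "A\t10\t20", "B\t30\t40"), raw_path)

# check.names = FALSE keeps column names such as "1999" unchanged
raw <- read.delim(raw_path, check.names = FALSE)

# Save as csv without the row-number column
csv_path <- file.path(tempdir(), "raw_data.csv")  # assumed file name
write.csv(raw, csv_path, row.names = FALSE)

# Read the csv back to confirm the round trip
converted <- read.csv(csv_path, check.names = FALSE)
```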

Step 2. Use R Markdown to achieve the following:

1. Specify author, date, and title in the YAML metadata of your document

2. Describe the data source, background, characteristics, variables, etc.

3. Load the data into R. Depending on the data format, you need an appropriate way to import the data.

4. Show and explain why the data is not tidy. Don’t use data that is already tidy.

5. Tidy up the data using dplyr and/or tidyr

6. Explain why the data is tidy now

7. Create two different, meaningful data visualizations from the tidy data using ggplot2. (Changing just one variable on an axis is not enough.)

8. Identify the patterns in each plot and explain why they are meaningful
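
Items 5 and 7 above can be illustrated with a small sketch. The data frame below is invented for illustration (the `country`, `year`, and `cases` names are assumptions, not part of the assignment); it violates the tidy principle that each variable forms a column, because values of the year variable appear as column names. Your own data may need different verbs, such as `separate()` or `pivot_wider()`.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical messy data: year values are spread across column names
messy <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 30),
  `2000` = c(20, 40),
  check.names = FALSE
)

# Gather the year columns into one "year" column and one "cases" column
tidy <- messy %>%
  pivot_longer(
    cols = -country,
    names_to = "year",
    values_to = "cases"
  ) %>%
  mutate(year = as.integer(year))

# One possible plot: cases over time, one line per country
p <- ggplot(tidy, aes(x = year, y = cases, colour = country)) +
  geom_line() +
  labs(title = "Cases by year (hypothetical data)")
```

After the reshape, each row is one country-year observation and each column is one variable, which is the kind of argument item 6 asks you to make in words.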

Resources to learn R Markdown:

https://r4ds.had.co.nz/r-markdown.html

https://rmarkdown.rstudio.com/lesson-1.html

http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf

R Markdown: The Definitive Guide by Yihui Xie, J.J. Allaire, and Garrett Grolemund

Here are some additional notes about writing an R Markdown report. Violating these rules may lead to a lower grade.

1. Put the data in the same folder as your Rmd file. When an R Markdown file is run or knitted, the folder containing the Rmd file is used as the working directory.
2. Read the data in your Rmd code chunk using a relative path. If you use an absolute path, I will not be able to knit the Rmd file to an HTML file on my end.
3. You will lose 5 points if for any reason (input path, error in code, etc.) the Rmd file cannot be knitted to an HTML file.
4. Distinguish headings (## heading) from normal text. Do not put all the text in headings.
5. Do not print excessive data in your R Markdown report. Use kable to format tables, if you prefer.
6. Do not put your discussions/explanations in code chunks. Write them as normal text.
7. Do not use include=FALSE or echo=FALSE in your code chunks. I need to read your code. You may use message=FALSE, warning=FALSE to suppress messages/warnings.
8. Do not write an excessively long line of code. Break it into multiple lines to improve readability.
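
A minimal Rmd skeleton that follows the notes above might look like the following. It is a sketch, not a required template: the title and csv name reuse the naming example from Step 4, while the author placeholder, the chunk label `load-data`, and the object name `tweets` are assumptions made for illustration.

````markdown
---
title: "Twitter Data Wrangling and Visualization"
author: "Your Name"
date: "`r Sys.Date()`"
output: html_document
---

## Data source

Normal text describing the data goes here, outside any code chunk.

```{r load-data, message=FALSE, warning=FALSE}
# Relative path: the csv sits in the same folder as this Rmd file
tweets <- read.csv(
  "Assignment1-Lin-Twitter Data Wrangling and Visualization.csv"
)

# Show only the first rows, formatted with kable
knitr::kable(head(tweets, 5))
```
````

The chunk keeps its code visible (no `echo=FALSE`), suppresses only messages and warnings, and breaks the long `read.csv()` call across lines.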

Step 3. Knit the R Markdown file (.Rmd) to an HTML file

Step 4. The Rmd, HTML, and csv files must follow this naming rule:

Assignment1-YourLastName-Title With Six Words Or Less.FileExtension

For example:
Assignment1-Lin-Twitter Data Wrangling and Visualization.Rmd

Assignment1-Lin-Twitter Data Wrangling and Visualization.html

Assignment1-Lin-Twitter Data Wrangling and Visualization.csv

Step 5. If the csv file is larger than 5MB, remove rows until the file size is 5MB or less.
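
One way to trim a csv to the size limit in Step 5 is sketched below. The helper `trim_csv_to_size()` is hypothetical, not an existing function; it keeps the first rows, which is only one choice (a random sample of rows may be more representative), and it assumes 5MB means 5 * 1024^2 bytes. The demo uses a tiny limit and a temporary file so that it runs quickly.

```r
# Hypothetical helper: keep the first rows of a data frame so that the
# written csv stays at or under a size limit given in bytes.
trim_csv_to_size <- function(df, path, max_bytes = 5 * 1024^2) {
  write.csv(df, path, row.names = FALSE)
  while (file.size(path) > max_bytes && nrow(df) > 1) {
    # Shrink the row count in proportion to the overshoot, with a 5% margin
    keep <- floor(nrow(df) * max_bytes / file.size(path) * 0.95)
    df <- df[seq_len(max(keep, 1)), , drop = FALSE]
    write.csv(df, path, row.names = FALSE)
  }
  invisible(df)
}

# Demo with made-up data and a small limit of 2000 bytes
set.seed(1)
demo <- data.frame(id = 1:1000, value = rnorm(1000))
out_path <- file.path(tempdir(), "trimmed.csv")
trimmed <- trim_csv_to_size(demo, out_path, max_bytes = 2000)
```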

Step 6. Submit the three files (individually)
