23 Communicating

In this chapter, having reviewed the basics of R Markdown in Chapters 4 and 14, we collect tips you should know when you write R Markdown documents.

23.1 What is R Markdown and R Notebook

R Markdown provides an authoring framework for data science. You can use a single R Markdown file to both

  • save and execute code
  • generate high quality reports that can be shared with an audience

R Notebooks are an implementation of Literate Programming that allows for direct interaction with R while producing a reproducible document with publication-quality output.

An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input.

(Reference: R Markdown: The Definitive Guide, 3.2 Notebook)

23.1.1 Two Goodies

  • Important: Implementation of Reproducible Research and Literate Programming

  • Useful to Render into Various Formats: R Notebook (HTML), R Markdown (HTML), PDF, MS Word, MS PowerPoint, ioslides Presentation (HTML), Slidy Presentation (HTML), Beamer Presentation (PDF), etc.

23.2 Reproducible Research and Literate Programming

23.2.1 Literate Programming by D. Knuth

Literate programming is an approach to programming introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated.

23.2.2 D. Knuth

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

23.2.3 Reproducible Research - Quote from a Coursera Course

Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. The need for reproducibility is increasing dramatically as data analyses become more complex, involving larger datasets and more sophisticated computations. Reproducibility allows for people to focus on the actual content of a data analysis, rather than on superficial details reported in a written summary. In addition, reproducibility makes an analysis more useful to others because the data and code that actually conducted the analysis are available.

23.2.4 R Markdown workflow, R for Data Science

R Markdown is also important because it so tightly integrates prose and code. This makes it a great analysis notebook because it lets you develop code and record your thoughts. It:

  • Records what you did and why you did it. Regardless of how great your memory is, if you don’t record what you do, there will come a time when you have forgotten important details. Write them down so you don’t forget!

  • Supports rigorous thinking. You are more likely to come up with a strong analysis if you record your thoughts as you go, and continue to reflect on them. This also saves you time when you eventually write up your analysis to share with others.

  • Helps others understand your work. It is rare to do data analysis by yourself, and you’ll often be working as part of a team. A lab notebook helps you share not only what you’ve done, but why you did it with your colleagues or lab mates.

23.2.5 Records of EDA and Communication

  1. Memo on scratch paper: R Scripts
  2. Record in a notebook: R Notebook (an R Markdown format)
  3. Short paper or digital communication: R Notebook
  4. Paper or report: R Markdown (html, pdf, MS Word, etc.)
  5. Presentation (html, pdf, MS PowerPoint, etc.)
  6. Publication of a book

23.3 Structure of the R Markdown

What is R Markdown: https://vimeo.com/178485416 created by RStudio

R Markdown documents consist of three components.

  • Code Chunks
  • Text
  • YAML Metadata
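To see the three components side by side, here is a small sketch that writes a minimal R Markdown file from R with writeLines(). The file name minimal-example.Rmd and its contents are made up for illustration; rendering needs rmarkdown and Pandoc, so that line is left as a comment.

```r
# Build a minimal .Rmd file: YAML metadata, text, and one code chunk.
fence <- strrep("`", 3)  # three backticks, built with strrep() to keep this listing readable
rmd_lines <- c(
  "---",                                 # YAML metadata starts
  'title: "Minimal example"',
  "output: html_document",
  "---",                                 # YAML metadata ends
  "",
  "Some **text** written in Markdown.",  # text
  "",
  paste0(fence, "{r}"),                  # code chunk starts
  "summary(cars)",
  fence                                  # code chunk ends
)
rmd_file <- file.path(tempdir(), "minimal-example.Rmd")  # illustrative file name
writeLines(rmd_lines, rmd_file)
cat(readLines(rmd_file), sep = "\n")
# rmarkdown::render(rmd_file)  # needs rmarkdown and Pandoc
```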

23.4 Let’s Get Started

  1. Start RStudio (update RStudio if your version is old)
  2. Create a Project
  3. Tools > Install Packages… > rmarkdown
    • Or on the Console: install.packages("rmarkdown")
  4. Tools > Install Packages… > tinytex (for PDF generation)
    • Alternatively, install.packages("tinytex")
    • If TeX is not installed: tinytex::install_tinytex() # install TinyTeX
      • If you are not sure, check in the Terminal tab of the lower-left pane:
        • which latex, which mktexlsr - Mac or Linux
        • where mktexlsr - Windows
  5. Let’s try!
    1. File > New File > R Notebook
    2. Save it with a file name, say, test-notebook
    3. Preview it with the [Preview] button
    4. Run the code chunk plot(cars), then Preview again.
    5. Knit to PDF, Word (and HTML)
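Before trying the steps above, you can also check the setup from the Console. This sketch only checks and prints; the install lines are commented out. Sys.which() plays the role of which/where in the Terminal and returns "" for a program it cannot find.

```r
# Check, without installing anything, whether the tools for this section are present.
pkgs <- c("rmarkdown", "tinytex")
have_pkg <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(have_pkg)

# Install whatever is missing (uncomment to run):
# install.packages(pkgs[!have_pkg])

# R-side equivalent of `which latex` / `where mktexlsr`; "" means not found on the PATH.
tex_tools <- Sys.which(c("latex", "mktexlsr"))
print(tex_tools)

# If no LaTeX is found, TinyTeX can be installed from R (a large download):
# if (have_pkg[["tinytex"]] && !all(nzchar(tex_tools))) tinytex::install_tinytex()
```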

23.5 Templates

23.5.1 RNotebook_Template

Template for submitting your assignments in this course: RNotebook_Template.nb.html

---
title: "Title of R Notebook"
author: "ID and Your Name"
date: "2023-05-13" 
output:
  html_notebook: null
---

23.5.1.1 YAML

  • Change the title
  • Write your ID and name
  • The date is auto-generated and inserted. If you wish, you can replace “2023-05-13” with your favorite date style.
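The date style is controlled by format() codes such as %Y (year), %m (month), and %d (day). A small sketch with the date above; the YAML line in the last comment uses inline R, as described in the R Markdown Cookbook, so that the date is filled in each time you knit.

```r
d <- as.Date("2023-05-13")
format(d, "%Y/%m/%d")   # "2023/05/13"
format(d, "%d.%m.%Y")   # "13.05.2023"
format(d, "%B %d, %Y")  # month name depends on your locale, e.g. "May 13, 2023"

# In the YAML header:
# date: "`r format(Sys.Date(), '%Y/%m/%d')`"
```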

23.5.1.2 Code Chunk

  • When you run code within the notebook, the results appear beneath the code.
  • Try executing this chunk by clicking the Run button, a triangle pointing right, within the chunk, or by placing your cursor inside it and pressing Ctrl+Shift+Enter (Win) or Cmd+Shift+Enter (Mac).
    • Ctrl + Shift + Enter (Windows) or Cmd + Shift + Enter (Mac): Runs the current code chunk.
    • Ctrl + Alt + R (Windows) or Cmd + Option + R (Mac): Runs all the code chunks in the document.
  • Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I (Win) or Cmd+Option+I (Mac).
  • When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K (Win) or Cmd+Shift+K (Mac) to preview the HTML file).
  • The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
  • We will use the pipe operator %>% very often later in this course.
    • The RStudio shortcut for the pipe operator (%>%) is:
      • On Windows: Ctrl + Shift + M
      • On Mac OS: Cmd + Shift + M
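The %>% pipe comes from the magrittr package (it is available after library(dplyr) or library(tidyverse)); base R 4.1 and later also has a native pipe |>. A small sketch comparing a nested call with both pipes. If you prefer |>, RStudio can make the same shortcut insert it instead (Tools > Global Options > Code > Use native pipe operator).

```r
x <- c(4, 9, 16)

# Nested call: read from the inside out.
sum(sqrt(x))               # 9

# Native pipe (R >= 4.1): read from left to right.
x |> sqrt() |> sum()       # 9

# magrittr pipe, used in this course.
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
  x %>% sqrt() %>% sum()   # 9
}
```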

23.5.2 Testing R Markdown Formats

Various Output Formats: test-rmarkdown.nb.html

---
title: "Testing R Markdown Formats"
author: "DS-SL"
date: "2023-05-13"
output:
  html_notebook:
    number_sections: yes
  pdf_document: 
    number_sections: yes
  html_document:
    df_print: paged
    number_sections: yes
  word_document: 
    number_sections: yes
  powerpoint_presentation: default
  ioslides_presentation:
    widescreen: yes
    smaller: yes
  slidy_presentation: default
  beamer_presentation: default
---
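With several formats in the YAML, the Knit menu offers each one, but you can also render from the Console with rmarkdown::render(). A sketch, assuming the source file is named test-rmarkdown.Rmd: all_output_formats() lists the formats in its YAML, output_format = "all" renders every one of them, and a vector of format names renders only those.

```r
input <- "test-rmarkdown.Rmd"  # assumed name of the source file

if (file.exists(input) && requireNamespace("rmarkdown", quietly = TRUE)) {
  # Names of the formats listed under `output:` in the YAML header
  print(rmarkdown::all_output_formats(input))

  # Render every format in one call (PDF and Beamer also need LaTeX) ...
  # rmarkdown::render(input, output_format = "all")

  # ... or only selected ones
  # rmarkdown::render(input, output_format = c("html_document", "word_document"))
}
```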

23.5.3 Comments on Presentation Formats and Options

  • For slides, a new slide starts at ##, the second-level heading.
  • --- (a horizontal rule) also inserts a slide break in presentation formats.
  • For Word and PowerPoint, you can add your own template. See the documents in References.
    • Use R Markdown to create a Word document [similar for PowerPoint]
    • Save the rendered Word file as: ref-doc-style.docx
    • Edit the styles of the file ref-doc-style.docx
    • Add ref-doc-style.docx as reference_docx in YAML with indentation as below (for PowerPoint, the option is reference_doc)
  word_document: 
    number_sections: yes
    reference_docx: ref-doc-style.docx
  powerpoint_presentation: 
    reference_doc: ref-ppt-style.pptx
  • You can also use Output Options… at the bottom of the menu under the gear icon next to the Preview/Knit button.
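The same settings can also be passed from R when calling rmarkdown::render(). A sketch; render_with_templates() and report.Rmd are hypothetical names, and note that the Word argument is reference_docx while the PowerPoint one is reference_doc.

```r
# Equivalent of the YAML above, written as R calls.
render_with_templates <- function(input) {
  # Word output with numbered sections and a custom style reference document
  rmarkdown::render(
    input,
    output_format = rmarkdown::word_document(
      number_sections = TRUE,
      reference_docx = "ref-doc-style.docx"
    )
  )
  # PowerPoint output with a custom template
  rmarkdown::render(
    input,
    output_format = rmarkdown::powerpoint_presentation(
      reference_doc = "ref-ppt-style.pptx"
    )
  )
}

# render_with_templates("report.Rmd")  # "report.Rmd" is a hypothetical file name
```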

23.6 Markdown Language – or use WYSIWYG editor

  • Headers: #, ##, ###, ####
  • Lists: 1., 2., … for numbered lists; * for bulleted lists
  • Links: [linked phrase](URL)
  • Images: ![alt text](figures/filename.jpg)
  • Block quotes: > (block)
  • Equations: e.g. $\frac{a}{b}$ for \(\frac{a}{b}\)
  • Horizontal rules: three or more asterisks or dashes (*** or ---)
  • Tables
  • Footnotes
  • Bibliographies and Citations
  • Slide breaks
  • Italicized text by _italic_, Bold text by **bold**
  • Superscripts, Subscripts, Strikethrough text
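Tables can be typed in Markdown by hand, but in R Markdown you usually generate them from a data frame with knitr::kable(). A sketch using the built-in cars data; format = "pipe" only makes the Markdown text visible in the Console, and inside a chunk knitr::kable(head(cars)) alone is enough.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  # Markdown (pipe) table of the first three rows of cars
  tab <- knitr::kable(head(cars, 3), format = "pipe")
  print(tab)
}
```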

23.6.1 Visual R Markdown

RStudio introduced the Visual Editor in RStudio 1.4 (early 2021). It seems to be stable, but switching back and forth between it and the original (source) editor does not always preserve the markup exactly. I always use the original editor and am confident with all of its functions, but I do not have much experience with the Visual Editor. [My Note in QALL401 2021]

Please refer to the document at the following link. You can switch between the Source editor and the Visual editor using the button at the top-left corner of the top-left pane. The document below is a bit old, and it shows the switch button at the top-right corner of that pane.

23.7 R Markdown Revisited

Presentation: Submit an R Notebook (with the code used in the presentation), and the PowerPoint file or other files used for your presentation, if any. If you use an R Notebook for your presentation, you do not need to submit extra files.

Final Paper: Submit an R Notebook (with code, as a work file), and a PDF (rendered directly from the R Notebook, or created from Word). The PDF may be at most eight pages.

Format of Presentation: an R Notebook is fine, and a slide presentation in any of the formats above is also fine.


23.7.1 Literate Programming and Reproducible Research

Importing Data:

  1. Read a csv file: read_csv("./data/file_name.csv")
  2. Download and import using a url of a csv file: read_csv(url)
  3. Read an Excel file: readxl::read_excel("./data/excel_file_name.xlsx")
  4. Read from the clipboard: read_delim(clipboard())
  • zip file:
    • copy the URL of the zip file

    • wir1to10 <- "https://wir2022.wid.world/www-site/uploads/2022/03/WIR2022TablesFigures-Chapter.zip"

    • download.file(wir1to10, destfile = "./data/wir1to10.zip", mode = "wb")

    • unzip("./data/wir1to10.zip", exdir = "./data")

    • list.files("./data/WIR2022TablesFigures-Chapter")

    • readxl::excel_sheets("./data/WIR2022TablesFigures-Chapter/WIR2022TablesFigures-Chapter1.xlsx")

  • clipboard:
    • df <- read_delim(clipboard()); df

    • Not reproducible unless clearly explained.
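To keep the zip-file steps reproducible without downloading the file again on every knit, one option is to download only when no local copy exists. fetch_once() below is a hypothetical helper, not part of any package; it uses only base R, and the usage lines are commented out because they need internet access.

```r
# Download a file only if no local copy exists yet, so re-knitting the
# document does not hit the website again.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps binary files such as .zip intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage with the WIR 2022 zip file from the list above:
# wir1to10 <- "https://wir2022.wid.world/www-site/uploads/2022/03/WIR2022TablesFigures-Chapter.zip"
# zip_file <- fetch_once(wir1to10, "./data/wir1to10.zip")
# unzip(zip_file, exdir = "./data")
# list.files("./data/WIR2022TablesFigures-Chapter")
```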


23.7.2 Code Chunk Options

https://yihui.org/knitr/options/

  • Chunk Name

  • Output: use document default

    • Show code and output: echo=TRUE, eval=TRUE - Default
    • Show output only: echo=FALSE
    • Show nothing (run code): include=FALSE
    • Show nothing (don’t run code): include=FALSE, eval=FALSE
  • Show message: message=TRUE, FALSE

  • Show warning: warning=TRUE, FALSE

  • Use Paged Tables: paged.print=TRUE, FALSE

  • Use custom figure size: fig.width and fig.height, in inches.

  • In the rendered Notebook file, you can use the Hide Code and Show Code options.
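Instead of setting options chunk by chunk, you can change the document defaults once in a setup chunk near the top (by convention {r setup, include=FALSE}); a chunk's own options still override them. The values below are examples only, and the requireNamespace() check matters only when running this outside a knitting session.

```r
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = TRUE,      # show code (set FALSE to show output only)
    message = FALSE,  # hide messages, e.g. from loading packages
    warning = FALSE,  # hide warnings
    fig.width = 6,    # figure width in inches
    fig.height = 4    # figure height in inches
  )
}
```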


23.7.3 Presentation and Paper

  1. Data Source: WDI, WIR, etc.

  2. Variables

  3. Problems

  4. Visualization

  5. Model

  6. Conclusions and Further Research


23.7.4 Word

Custom Word templates: https://bookdown.org/yihui/rmarkdown-cookbook/word-template.html

You can apply the styles defined in a Word template document to new Word documents generated from R Markdown. Such a template document is also called a “style reference document.” The key is that you have to create this template document from Pandoc first, and change the style definitions in it later. Then pass the path of this template to the reference_docx option of word_document:

---
output:
  word_document:
    reference_docx: "template.docx"
---
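One way to create the template from Pandoc is to ask it for its built-in reference document (the --print-default-data-file option, described under --reference-doc in the Pandoc manual), then open the file in Word and modify the styles. A sketch from R; make_reference_docx() is a hypothetical helper, and it assumes Pandoc can be found on the PATH or through rmarkdown (RStudio bundles a copy).

```r
# Save Pandoc's default reference document as `path`.
# Same as running on the Terminal:
#   pandoc -o template.docx --print-default-data-file reference.docx
make_reference_docx <- function(path = "template.docx") {
  pandoc <- Sys.which("pandoc")
  if (!nzchar(pandoc) && requireNamespace("rmarkdown", quietly = TRUE)) {
    # Fall back to the Pandoc that rmarkdown finds (e.g. the one bundled with RStudio)
    pandoc_dir <- rmarkdown::find_pandoc()$dir
    if (length(pandoc_dir) == 1 && nzchar(pandoc_dir)) {
      pandoc <- file.path(pandoc_dir, "pandoc")
    }
  }
  if (!nzchar(pandoc)) stop("Pandoc not found")
  status <- system2(pandoc, c("-o", shQuote(path),
                              "--print-default-data-file", "reference.docx"))
  if (status != 0) stop("Pandoc did not create ", path)
  invisible(path)
}

# make_reference_docx("template.docx")  # then edit the styles in Word and save
```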

23.7.6 Create a PDF or Word File

A Notebook file is created by pressing the Preview button, and the outputs appear as they were when you last ran each chunk. However, when you knit to another format, R runs all code chunks from the top in a fresh session. So if an object is used before it is defined (for example, because it was created only in the Console), knitting stops with an error message. I recommend the following steps.

  1. Create a PDF right after you create a new R Notebook file (using the Template). This step lets you check that ‘Knit to PDF’ with tinytex works on your computer. Please let me know if you fail to create a PDF and cannot solve the problem; I will look at the settings of your PC in class.

  2. Run all code before you preview the Notebook. You can use ‘Run All’, and ‘Run All Chunks Below’ under the ‘Run’ button if a code chunk stops partway.

  3. Before you create a PDF or Word file, correct all errors. If you cannot, add eval = FALSE as a chunk option.

```{r eval=FALSE}
# code chunk with errors
```

You can add a similar option from the gear mark at the top right of the code chunk. Select ‘Show nothing (don’t run code)’; it adds {r eval = FALSE, include = FALSE}, so the chunk is neither run nor shown.

  4. Rerun all. If you can reach the end of the file without an error, use ‘Knit to PDF’ or ‘Knit to Word’.

Creating a Word file is similar and usually easier, because it does not need LaTeX.

If ‘Knit to PDF’ or ‘Knit to Word’ fails, an alternative is to open the notebook file ending in .nb.html in your web browser, such as Google Chrome, Edge, or Safari, and use the browser’s print-to-PDF function.
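The Knit button renders the document in a separate R session, so objects that exist only in your Console session are not available to it. To reproduce that from the Console, one option is the callr package (not used elsewhere in this chapter, so treat it as an assumption). knit_fresh() is a hypothetical helper.

```r
# Render in a brand-new R process, as the Knit button does, so that objects
# defined only in your interactive session cannot hide missing definitions.
knit_fresh <- function(input, format = "pdf_document") {
  callr::r(
    function(input, format) rmarkdown::render(input, output_format = format),
    args = list(input = input, format = format),
    show = TRUE  # print rendering progress to the console
  )
}

# Usage (file name from the "Let's try!" steps):
# knit_fresh("test-notebook.Rmd")                   # PDF, needs TinyTeX
# knit_fresh("test-notebook.Rmd", "word_document")  # Word, no LaTeX needed
```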

23.7.6.1 Other Code Chunk Options

Please review EDA5, and try the options under the gear mark at the top right of each code chunk. Here are two more options I use often.

  1. cache = TRUE option. Downloading data from the internet takes time and may put a load on the hosting site. With this option, knitr saves the results of the chunk and reuses them on later renders as long as the chunk’s code does not change, so the data is not downloaded again and rendering is faster. I always add this option to chunks that call WDI(). As for WDIsearch(), if you use cache = wdi_cache, you do not need this option; that is another benefit of using cache = wdi_cache.
```{r cache = TRUE}
# download from the internet
```
  2. echo = FALSE option. When a PDF has a page limit, you may not want to show some code chunks. Then use this option: the output is included, but the code is not. You can also select it by choosing the ‘Show output only’ option.
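knitr’s cache = TRUE keeps its cache per chunk in a folder next to the document. A different, explicit approach, shown here only as an alternative to the chunk option, is to save downloaded data to an .rds file and read it back on later runs. cached() is a hypothetical helper; the WDI call in the comment is an example that needs the WDI package and internet access on the first run.

```r
# Evaluate `expr` only if `path` does not exist yet; otherwise read the saved result.
cached <- function(path, expr) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- expr  # `expr` is evaluated lazily, i.e. only here
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, path)
  value
}

# Usage:
# gdp <- cached("./data/wdi_gdp.rds",
#               WDI::WDI(country = "JP", indicator = "NY.GDP.PCAP.KD"))
# Delete the .rds file to force a fresh download.
```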

23.7.6.2 Reference

23.8 References