This is a set of tips and tricks for coding in R, aimed at readers who are no longer beginners but are not yet at the level of Hadley Wickham's Advanced R.
The idea is to compile and share the best practices in R that have increased my productivity over the last few years.
When working with RStudio, one project = one folder
Projects can be created from File -> New Project…
This is especially good advice if you work on several projects or tend to have multiple R sessions open at the same time.
It lets you close and reopen a project quickly, as the session is saved, including the open tabs and the data in memory.
The only thing to watch is the amount of data held in memory: keep it small, otherwise the project can take a while to close. If that becomes a problem, you can change this behaviour in Tools > Global Options by unticking “always save history”.
Personally, I like to organise my projects with the following directory structure:
Project
– – R
– – doc
– – data
– – plot
– – output
It is similar to the folder structure used when creating a package, with doc and output in addition.
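The folder layout above can be created from R itself. The following is only a minimal sketch: the helper name `CreateProjectFolders` is hypothetical, and a temporary directory is used as the project root so that the example can be run anywhere.

```r
# Hypothetical helper (not from any package): creates the project folders
# listed above under a given root directory.
CreateProjectFolders <- function(root,
                                 folders = c("R", "doc", "data", "plot", "output")) {
  paths <- file.path(root, folders)
  for (path in paths) {
    # recursive = TRUE also creates the root folder if it does not exist yet.
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Assumption: a temporary location stands in for the real project folder.
project.root <- file.path(tempdir(), "Project")
created.paths <- CreateProjectFolders(project.root)
```

In practice, `project.root` would be the folder of the RStudio project.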
Style your code
As R is a case-sensitive language, it is very important to define a style convention and keep it across all your programs.
I am a big fan of the Google style guide (a 10 min read) when coding in R.
The most important rules:
- Place spaces around all binary operators (=, +, -, <-, etc.).
- Exception: Spaces around ='s are optional when passing parameters in a function call.
- Do not place a space before a comma, but always place one after a comma.
- The preferred form for variable names is all lower case letters and words separated with dots (variable.name).
- Function names have initial capital letters and no dots (FunctionName).
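As an illustration of the rules above, here is a short snippet written in that style. The variable and function names are invented for the example.

```r
# Variable names: lower case, words separated with dots.
unit.price <- 2.5
quantity.sold <- c(3, 10, 4)

# Function names: initial capital letters and no dots.
ComputeRevenue <- function(price, quantity) {
  # Spaces around binary operators such as * and <-.
  price * quantity
}

# No space before a comma, one space after it.
# Spaces around = are optional in a function call; they are kept here.
total.revenue <- sum(ComputeRevenue(price = unit.price, quantity = quantity.sold))
```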
The data.table package is what made me consider R a tool that can be used on an equal footing with SAS (at least in marketing studies).
The concepts can be tough at the beginning, but after a little practice your productivity is multiplied.
Among other things, data.table allows:
- Fast import of CSV files.
- Fast merging through indexing.
- Fast variable modification.
- Fast summarisation of data.
- Fast ranking process.
The only function still missing to make this package the core of data management in R is a version of sqlQuery from the RODBC package that queries a database directly into a data.table.
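Until such a function exists, a thin wrapper gets close. This is only a sketch under assumptions: `SqlQueryDT` is a hypothetical name, not part of RODBC or data.table, and it still goes through the data.frame returned by `sqlQuery` before converting it by reference with `setDT`.

```r
# Hypothetical helper, not part of RODBC or data.table.
# channel: an open RODBC connection; query: an SQL string.
SqlQueryDT <- function(channel, query, ...) {
  result <- RODBC::sqlQuery(channel, query, ...)
  # On failure, sqlQuery returns error messages instead of a data.frame.
  if (!is.data.frame(result)) {
    stop("Query failed: ", paste(result, collapse = "\n"))
  }
  # Convert to a data.table by reference, without copying the data.
  data.table::setDT(result)
  result
}
```

It would be called like `sqlQuery`, with a channel opened beforehand through RODBC.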
For examples of use and more, read the introduction (a 10 min read: data.table intro).
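To give a flavour of the features listed above, here is a small sketch on made-up data. The tables `sales` and `customers` and all column names are invented for the illustration; only the data.table functions themselves (`fread`, `fwrite`, `setkey`, `:=`, `.N`, `frank`) come from the package.

```r
library(data.table)

# Made-up data, for illustration only.
sales <- data.table(
  customer.id = c(1L, 1L, 2L, 3L, 3L, 3L),
  amount      = c(10, 20, 5, 7, 8, 9)
)
customers <- data.table(
  customer.id = 1:3,
  segment     = c("gold", "silver", "gold")
)

# Fast import of CSV files: write the table to disk, then read it back.
csv.path <- tempfile(fileext = ".csv")
fwrite(sales, csv.path)
sales <- fread(csv.path)

# Fast merging through keys: look up the segment of each sale.
setkey(sales, customer.id)
setkey(customers, customer.id)
sales.seg <- customers[sales]

# Fast variable modification: := adds the column by reference.
sales.seg[, amount.vat := amount * 1.2]

# Fast summarisation: total and number of sales by segment.
by.segment <- sales.seg[, .(total = sum(amount), n.sales = .N), by = segment]

# Fast ranking: rank sales within each customer, largest first.
sales.seg[, rank.in.customer := frank(-amount, ties.method = "first"),
          by = customer.id]
```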
This is only a glimpse of what can be done with the data.table package. Everything I used to do in SQL, I now do with data.table.
A good way to get to grips with a data set is data visualisation. Once mastered, the ggplot2 package lets you build highly customised graphs, even XKCD-like ones (in this case, replace perl by ggplot2).
A few concepts are worth knowing:
- A ggplot2 object can be stored in a variable.
- Customising a plot is the same as "summing" pieces of code.
- When you make a plot, it is good to build it element by element.
- Whatever your issue, someone on [stackoverflow](https://stackoverflow.com/) already had the same one.
- It is better to plot already reshaped data, as plotting more than 10,000 points can be slow and is probably useless.
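The first three points can be sketched as follows. The example uses the `mtcars` data set that ships with R; the object names and labels are only illustrative.

```r
library(ggplot2)

# A ggplot2 object can be stored: data and aesthetics only, no layer yet.
base.plot <- ggplot(mtcars, aes(x = wt, y = mpg))

# Customising is "summing": each + adds one element to the stored object.
scatter.plot <- base.plot + geom_point()

# Built element by element: labels first, then the theme.
final.plot <- scatter.plot +
  labs(title = "Fuel consumption by weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  theme_minimal()
```

Printing `final.plot` (or any intermediate object) draws it, which makes it easy to check each step before adding the next one.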
When doing an analysis, it is important to keep track of every step you went through. Saving your graphs is good practice, and the ggsave function allows you to do so.
Generally, I plot and save the distribution of all the main variables, plus the multivariate distributions that might be interesting.
It gives an idea of what the data set looks like.
For example, I create a boxplot of each quantitative variable by each qualitative variable.
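A possible version of that boxplot loop is sketched below. It is not the original code: the `mtcars` data set, the choice of variables, the PDF format and the temporary `plot` folder are all assumptions made so that the example is self-contained.

```r
library(ggplot2)

# Assumption: mtcars stands in for the real data, with two columns
# turned into factors to act as qualitative variables.
plot.data <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
quali.vars <- c("cyl", "gear")
quanti.vars <- c("mpg", "wt")

# Assumption: a temporary folder stands in for the project's plot folder.
plot.dir <- file.path(tempdir(), "plot")
dir.create(plot.dir, showWarnings = FALSE)

saved.files <- character(0)
for (quali in quali.vars) {
  for (quanti in quanti.vars) {
    # .data[[name]] selects a column from its name stored in a string.
    box.plot <- ggplot(plot.data,
                       aes(x = .data[[quali]], y = .data[[quanti]])) +
      geom_boxplot() +
      labs(x = quali, y = quanti)
    file.name <- file.path(
      plot.dir, paste0("boxplot_", quanti, "_by_", quali, ".pdf")
    )
    # ggsave picks the graphics device from the file extension.
    ggsave(file.name, plot = box.plot, width = 6, height = 4)
    saved.files <- c(saved.files, file.name)
  }
}
```

Each combination of variables ends up in its own file, named after the two variables it shows.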
Generating code with brew
I used SAS for a while and really liked the concept of macros, the SAS counterpart of functions.
R has functions, but not everything that is possible with a macro is possible with a function.
One example of a difficulty I ran into was modifying the UI part of a shiny app. The renderUI function is good but makes your app unnecessarily messy. Using the brew package, I was able to automate a UI modification.
The brew package is made to create reports and text from a template.
Another way to use the package is to generate code, like a ui.R file or a SQL script.
The syntax is very simple: brew takes a template file or a connection as input, and the file to generate or a connection as output.
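As a rough sketch of code generation with brew, the example below fills a small SQL template. The template content, the `columns` and `table.name` variables and the temporary file paths are all invented for the illustration; in a brew template, `<%= ... %>` inserts the value of an R expression into the output.

```r
library(brew)

# Write a small template to a temporary file (made-up content).
template.path <- tempfile(fileext = ".brew")
writeLines(c(
  "SELECT <%= paste(columns, collapse = ', ') %>",
  "FROM <%= table.name %>;"
), template.path)

# Values used by the template, kept in their own environment.
template.env <- list2env(list(
  columns    = c("customer_id", "amount"),
  table.name = "sales"
))

# Generate the SQL script from the template.
output.path <- tempfile(fileext = ".sql")
brew(file = template.path, output = output.path, envir = template.env)

generated.sql <- readLines(output.path)
```

The same approach applies to a ui.R file: the template holds the fixed part of the code, and the R expressions fill in whatever changes from one version to the next.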