10  Getting Started with R

10.1 What is R?

R is a programming language designed for statistical computing and graphics. It’s widely used in data science, economics, and academic research to analyze and visualize data. Think of R as a toolbox that automates manual, repetitive tasks—especially useful for working with large datasets.

  • You’ll use R to perform calculations, analyze data, and create visualizations.
  • R is also open source, meaning it’s free to use and has a large community creating useful packages to make your work easier.

10.2 Using R as a Calculator

Before diving into programming concepts, let’s start by using R as a simple calculator:

## Simple arithmetic
1 + 1
[1] 2
3 * 4
[1] 12
1 / 200 * 30
[1] 0.15
(59 + 73 + 2) / 3
[1] 44.66667
sin(pi / 2)
[1] 1

You can use R just like a calculator for basic math.
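R follows the usual order of operations, which explains the results above. The short sketch below uses only base R; the comments show what each line evaluates to.

```r
# Exponents are evaluated first, then * and /, then + and -
4 + 3 / 10 ^ 2      # 4.03, same as 4 + (3 / (10 ^ 2))

# * and / have equal priority and are evaluated left to right
1 / 200 * 30        # 0.15, same as (1 / 200) * 30
1 / (200 * 30)      # about 0.000167; parentheses change the result

# Parentheses are always evaluated first
(4 + 3) / 10 ^ 2    # 0.07
```

When in doubt, adding parentheses makes the intended order explicit.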

Practice

Try calculating the following in R:

(5 + 3) * 2

10.3 Objects in R

Now, let’s introduce objects (see note 1 at the end of this chapter). Everything in R is an object: numbers, text, datasets, even functions. Objects are like boxes that hold information. You can create objects, give them names, and use them later.

Creating Objects

Let’s create an object in R. We’ll store a number in a box called x:

x <- 5
x
[1] 5

The characters <- form the assignment operator, which assigns a name to an object. You can read x <- 5 as “x gets 5”.

Notice how R stores the newly created object in the RStudio environment and allows you to use it. Now, you can use x whenever you need the value 5. This idea of naming and storing values is essential in R.

x + 1
[1] 6
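Objects can also be combined to build new objects. The sketch below uses hypothetical names (price, quantity, total) and made-up values purely for illustration.

```r
# Hypothetical values chosen only for illustration
price <- 20
quantity <- 3

# Objects can be combined to create new objects
total <- price * quantity
total                      # 60

# Assigning to an existing name replaces the old value
price <- 25
price * quantity           # 75
total                      # still 60; it is not recalculated automatically
```

Note that R is case-sensitive, so price and Price would be two different names.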

Practice

Try creating an object called my_number that stores the value 10:


10.4 Functions in R

Functions are reusable commands that take inputs (called arguments) and return an output. For example, mean() calculates the average of a group of numbers:

mean(c(1, 2, 3, 4, 5))
[1] 3

The mean() function takes a vector of numbers as an argument (c(1, 2, 3, 4, 5)) and returns their average.
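Many functions accept more than one argument, and arguments can be given by position or by name. The following base R sketch shows both styles with round() and mean(); the object name values is just an example.

```r
# Arguments can be matched by position...
round(3.14159, 2)                # 3.14
# ...or by name, which is often easier to read
round(3.14159, digits = 2)       # 3.14

# Optional arguments change how a function behaves.
# NA marks a missing value; na.rm = TRUE tells mean() to ignore it.
values <- c(1, NA, 3)
mean(values)                     # NA
mean(values, na.rm = TRUE)       # 2
```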

Practice

Try using the sum() function to add the numbers 5, 10, and 15. Your result should be:

[1] 30

Functions in R are like tools in your toolbox. There are many built-in functions, but you can also create your own later in the course!


10.5 Libraries in R

R has many libraries, also called packages, which are collections of functions. To use one, you first install it (once per computer) and then load it into your session (every time you start R):

## Installing and loading a library
install.packages("tidyverse")
library(tidyverse)

You can also use functions from libraries without loading them:

stringr::str_replace("This is the old text", "old", "new")
[1] "This is the new text"

Here we call the function str_replace() from the library stringr in a single command, without loading the library first. The :: operator separates the library from the function, i.e., library::function().
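The same library::function() pattern works for libraries that ship with R. As a small sketch, stats and utils are installed with every copy of R, so these lines should run without installing anything.

```r
# stats and utils ship with R, so no installation is needed
stats::median(c(1, 3, 10))       # 3
utils::head(letters, 3)          # "a" "b" "c"
```

Writing the library name explicitly also helps when two libraries define a function with the same name, such as filter() in both stats and dplyr.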

Practice

Create a tibble with inconsistent column names and clean them

  1. Install the janitor library.
  2. Load it using library(janitor).
  3. Create a small tibble with inconsistent column names (e.g., spaces, uppercase letters).
  4. Use the clean_names() function to clean the column names.

The Tidyverse

The tidyverse is a collection of libraries designed to make data science easier. We’ll use it throughout the course for data manipulation and visualization.


10.6 Data in R

Since R is a data science platform, data is central to everything you do. You’ll often work with different types of data, including numbers, text, and more complex datasets.

R comes with some built-in datasets for you to practice with. Let’s look at a built-in dataset called mtcars:

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
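Besides head(), a few other base R functions give a quick overview of a dataset. This sketch applies them to mtcars.

```r
nrow(mtcars)          # 32 rows (one per car)
ncol(mtcars)          # 11 columns
names(mtcars)         # the column names
str(mtcars)           # the type and first few values of each column
summary(mtcars)       # minimum, quartiles, mean, and maximum of each column
```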

Practice

Try displaying the first few rows of the built-in dataset iris:


10.7 Data Frames and Tibbles

In R, tabular data is stored in data frames or tibbles (the tidyverse’s version of a data frame). These structures organize your data into rows and columns.

Here’s an example of a small tibble (the tibble() function is available once the tidyverse is loaded):

sales_data <- tibble(
  category = c("A", "B", "A", "C", "B", "A"),
  price = c(100, 200, 150, 300, 120, 90)
)
sales_data
# A tibble: 6 × 2
  category price
  <chr>    <dbl>
1 A          100
2 B          200
3 A          150
4 C          300
5 B          120
6 A           90

You can access specific columns using the $ operator:

sales_data$category
[1] "A" "B" "A" "C" "B" "A"
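A column extracted with $ is an ordinary vector, so it can be passed to any function. The sketch below rebuilds the same data with base R's data.frame() instead of tibble(), an assumption made only so that it runs without extra packages; half_price is a hypothetical column name.

```r
# Base R stand-in for the tibble above (same values), so no packages are needed
sales_data <- data.frame(
  category = c("A", "B", "A", "C", "B", "A"),
  price = c(100, 200, 150, 300, 120, 90)
)

# A column extracted with $ is an ordinary vector
mean(sales_data$price)        # 160
max(sales_data$price)         # 300

# $ can also create a new column (half_price is a hypothetical name)
sales_data$half_price <- sales_data$price / 2
head(sales_data, 3)
```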

Practice

Create a tibble with categories and prices of your choosing.


10.8 Basic Data Manipulation

You can filter, select, and manipulate data frames using functions from the dplyr package (part of the tidyverse). For example, to filter for a specific category:

filter(sales_data, category == "A")
# A tibble: 3 × 2
  category price
  <chr>    <dbl>
1 A          100
2 A          150
3 A           90
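filter() accepts several conditions at once, and other dplyr functions follow the same pattern of taking a data frame as their first argument. The following sketch assumes dplyr is installed; the object names a_or_b and sorted are chosen only for this example.

```r
library(dplyr)   # assumes dplyr (part of the tidyverse) is installed

# Same example data as above
sales_data <- tibble(
  category = c("A", "B", "A", "C", "B", "A"),
  price = c(100, 200, 150, 300, 120, 90)
)

# Conditions separated by commas must all be true for a row to be kept
a_or_b <- filter(sales_data, category %in% c("A", "B"), price >= 100)

# arrange() sorts rows; desc() puts the highest price first
sorted <- arrange(a_or_b, desc(price))
sorted

# select() keeps only the named columns
select(sorted, price)
```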

Practice

Try filtering the sales_data tibble to show only rows where the price is greater than 120:


10.9 Pipe Operator (%>% or |>)

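R lets you chain together multiple commands using pipes. A pipe takes the output of one function and uses it as the input for the next. Think of it as saying “and then…”. There are two pipe operators: |>, which is built into R (version 4.1 and later), and %>%, which comes from the magrittr package and is available once the tidyverse is loaded. For the examples in this chapter, they work the same way.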

Here’s an example using the mtcars dataset to calculate the average miles per gallon (mpg) for cars with more than 100 horsepower:

mtcars |> 
  filter(hp > 100) |> 
  summarize(avg_mpg = mean(mpg))
   avg_mpg
1 17.45217

In this example:

  1. We start with the mtcars dataset, and then
  2. We filter it to only include cars with more than 100 horsepower, and then
  3. We summarize the average mpg for these cars.
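To see what a pipe does, it can help to compare it with the equivalent nested call. This is a minimal sketch using only base R functions and the native pipe, so it assumes R 4.1 or later.

```r
# Nested calls are read from the inside out
sum(sqrt(c(1, 4, 9)))            # 6

# The same steps with the native pipe read left to right
c(1, 4, 9) |> sqrt() |> sum()    # 6

# The left-hand side becomes the first argument of the next function,
# so this is the same as round(c(1.234, 5.678), 1)
c(1.234, 5.678) |> round(1)      # 1.2 5.7
```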

Practice

Use pipes to filter the mtcars dataset for cars with more than 20 mpg and then calculate the average horsepower (hp):


10.10 R Scripts

While you can run commands directly in the console, it’s better to save your work in R scripts (files with the .R extension). This way, you can write and save your code and run it again later.

To create a script, go to the “File” menu in RStudio, choose “New File” and then “R Script.” You can write and save all your commands there.
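As a rough sketch, a short script might look like the following. The file name sales_summary.R and the values are hypothetical and used only for illustration.

```r
## sales_summary.R (hypothetical file name)
## Purpose: calculate the average of a few example prices

prices <- c(100, 200, 150)       # example values, not real data
average_price <- mean(prices)

# print() makes sure the result is shown when the whole script is run
print(average_price)             # 150
```

In RStudio, Ctrl+Enter (Cmd+Return on a Mac) runs the current line, and source("sales_summary.R") runs the whole file, provided the file is in the current working directory.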


10.11 Error Handling

When R reports an error, read the message carefully to understand what went wrong. For example, if you try to use a variable that doesn’t exist, R will return an error:

my_variable
Error: object 'my_variable' not found

If you’re stuck for more than 15 minutes, ask for help from an AI, a TA or the instructor. They’re available to guide you through debugging.
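Many “object not found” errors come down to spelling or capitalization. The sketch below triggers one on purpose; tryCatch() is used here only so that the error message is captured as text instead of stopping the code, and msg is a hypothetical name.

```r
my_number <- 10

# R is case-sensitive, so My_Number is not the same name as my_number.
# tryCatch() captures the error message instead of stopping the code.
msg <- tryCatch(
  My_Number,
  error = function(e) conditionMessage(e)
)
msg          # the text of the "object not found" error

# Using the exact name works
my_number    # 10
```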

Practice

A common error occurs when you try to use a function from a package without loading the library first. For example, let’s try using the to_snake_case() function without loading the snakecase package. This package and function convert a string of characters into “snake case”, which we will see after fixing the error.

## Attempting to use to_snake_case() without loading snakecase
to_snake_case("This is a test string")
Error in to_snake_case("This is a test string"): could not find function "to_snake_case"

You’ll see an error that says something like: Error in to_snake_case("This is a test string"): could not find function "to_snake_case"

The solution to this error is to install and load the library before calling the function.

  1. Install the snakecase library using the install.packages() function.

    • If you get an error that says something like Error in install.packages : object 'snakecase' not found, it usually means that you did not put the package name in quotes (or that you misspelled it).
    • Check your spelling and put the package name in quotes, i.e., install.packages("snakecase").
  2. Load the snakecase library.

  3. Call the to_snake_case() function again to see what snake case is.


If this error is fixed, you should have the output “this_is_a_test_string”. As you can see, snake case means that all upper-case letters are made lower-case and all spaces are replaced by _.
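If you are not sure whether a library is already installed, you can check before loading it. This is a minimal sketch using base R's requireNamespace(); the names pkg and installed are chosen only for this example.

```r
pkg <- "snakecase"

# requireNamespace() returns TRUE if the package is installed and can be
# loaded, and FALSE otherwise
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " is installed; load it with library(", pkg, ")")
} else {
  message(pkg, " is missing; run install.packages(\"", pkg, "\") first")
}
```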


10.12 Getting Help

When you’re stuck or unsure how to use a function, R provides built-in help:

?function_name: Shows help documentation for a function. Example: ?mean

help("function_name"): Similar to ?function_name. Example: help("mean")
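A few other base R functions are handy when you only half-remember a function. The sketch below shows three of them.

```r
args(round)              # shows the arguments of round(), including digits
apropos("^mean")         # names of objects that start with "mean"
example(mean)            # runs the examples from the help page for mean()
```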

You can also search online for help. Be sure to add “in R” to your search queries.

  • Google: Try searching “calculate the mean in R.”
  • Stack Overflow: For programming-specific questions.
  • Cross Validated: For statistics-focused questions.
  • and many others

10.13 Using AI for Code Assistance


AI tools like ChatGPT or R-specific tools like the R Wizard GPT are excellent for writing code. You can use them to get help with specific tasks. However, remember that you need to understand the context to ask the right questions and to verify if the solution fits your problem.


10.14 Directories in R

A directory is like a folder where R looks for files and saves things. You can find out what directory R is using with:

getwd()
[1] "/Users/nilehatch/Dropbox/Teaching/Bus Analytics Book/bus_analytics"

You can change the directory to where your files are located with:

setwd("path/to/your/folder")

The Files pane in RStudio also lets you navigate and set directories.
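A few base R functions help you check what R can see in the current directory. In this sketch, the folder name data and the file name sales.csv are hypothetical.

```r
getwd()                     # the current working directory
list.files()                # files and folders in that directory

# file.path() builds a path from its parts.
# "data" and "sales.csv" are hypothetical names used for illustration.
data_path <- file.path("data", "sales.csv")
data_path                   # "data/sales.csv"
file.exists(data_path)      # TRUE only if that file really exists
```

Checking file.exists() before reading a file is a quick way to tell whether a problem is caused by the wrong working directory.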


10.15 Practice Section

Let’s practice a few of the concepts you’ve learned:

R as a Calculator

## Simple arithmetic
1 / 200 * 30
[1] 0.15
(59 + 73 + 2) / 3
[1] 44.66667
sin(pi / 2)
[1] 1

Creating Objects

## Create new objects
a <- 3 * 4
x <- 4 + 3 / 10 ^ 2

Modifying Objects

## Modify an object
x <- x + 1
x
[1] 5.03

10.16 Conclusion

This chapter introduced the fundamentals of R: using R as a calculator, creating objects, working with functions and libraries, and exploring and manipulating data. As you work through the course, you’ll become more familiar with these concepts, and your confidence in R will grow!

Feel free to experiment with the code examples and ask questions whenever you need help.


  1. Everything in R is treated as an object, whether it’s a simple number or a complex dataset. R also supports object-oriented programming (OOP), although that is not the only style of programming it allows. Understanding how objects work is important for using R effectively.