R is a programming language designed for statistical computing and graphics. It’s widely used in data science, economics, and academic research to analyze and visualize data. Think of R as a toolbox that automates manual, repetitive tasks, which is especially useful when working with large datasets.
Before diving into programming concepts, let’s start by using R as a simple calculator:
## Simple arithmetic
1 + 1
[1] 2
3 * 4
[1] 12
1 / 200 * 30
[1] 0.15
(59 + 73 + 2) / 3
[1] 44.66667
sin(pi / 2)
[1] 1
You can use R just like a calculator for basic math.
Try calculating the following in R:
Now, let’s introduce objects.¹ Everything in R is an object: numbers, text, datasets, even functions. Objects are like boxes that hold information. You can create objects, give them names, and use them later.
Let’s create an object in R. We’ll store the number 5 in a box called x by running x <- 5.
The characters <- are the assignment operator, which is used to assign a name to an object.
Notice how R stores the newly created object in the RStudio environment and allows you to use it. Now, you can use x whenever you need the value 5. This idea of naming and storing values is essential in R.
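As a minimal sketch of the assignment described above: the object x holds the value 5, and a second object, y, is a hypothetical name introduced here only to show that a stored value can be reused.

```r
# Store the value 5 under the name x
x <- 5

# Typing the name on its own prints the stored value
x        # [1] 5

# The stored value can be reused in later calculations
# (y is an illustrative name, not part of the original text)
y <- x * 2
y        # [1] 10
```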
Try creating an object called my_number that stores the value 10.
Functions are tasks that take input (called arguments) and give output. For example, mean() calculates the average of a group of numbers.
The mean() function takes a vector of numbers as an argument, such as c(1, 2, 3, 4, 5), and returns their average.
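The call described above might look like the following sketch. The names numbers and average are illustrative and do not come from the original text.

```r
# Build a vector with c() and pass it to mean() as the argument
numbers <- c(1, 2, 3, 4, 5)
average <- mean(numbers)
average   # [1] 3
```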
Try using the sum() function to add the numbers 5, 10, and 15:
sum(5, 10, 15)
[1] 30
Functions in R are like tools in your toolbox. There are many built-in functions, but you can also create your own later in the course!
R has many libraries (collections of functions, also called packages). To use one, you first install the library with install.packages() and then load it into your session with library().
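A minimal sketch of the install-then-load pattern, using stringr as the example library. The requireNamespace() guard and the CRAN mirror URL are assumptions added here so the code can be rerun safely. Installing needs an internet connection and only has to happen once per machine, while loading happens in every new session.

```r
# Install only if the package is missing (assumed guard, not from the text)
if (!requireNamespace("stringr", quietly = TRUE)) {
  install.packages("stringr", repos = "https://cloud.r-project.org")
}

# Load the package so its functions can be called by name
library(stringr)

# str_replace() swaps the first match of a pattern for a replacement
greeting <- str_replace("Hello world", "world", "R")
greeting   # [1] "Hello R"
```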
You can also use functions from libraries without loading them. For example, stringr::str_replace() calls the function str_replace from the library stringr in a single command, without loading the library, by separating the library from the function with ::, i.e., library::function().
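The library::function() form can be sketched as follows. The first call uses stats, which ships with R, so it runs without installing anything. The stringr call is wrapped in a check (an assumption added here) so the example still runs when stringr is not installed.

```r
# stats is bundled with R; median() is called without library(stats)
middle <- stats::median(c(1, 3, 5))
middle   # [1] 3

# The same form with stringr, run only when the package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_replace("data science", "science", "analysis"))
}
```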
Create a tibble with inconsistent column names and clean them:
1. Install the janitor library.
2. Load it with library(janitor).
3. Use the clean_names() function to clean the column names.
The tidyverse is a collection of libraries designed to make data science easier. We’ll use it throughout the course for data manipulation and visualization.
Since R is a data science platform, data is central to everything you do. You’ll often work with different types of data, including numbers, text, and more complex datasets.
R comes with some built-in datasets for you to practice with. Let’s look at the first rows of a built-in dataset called mtcars:
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Try displaying the first few rows of the built-in dataset iris.
In R, tabular data is stored in data frames or tibbles (a tidyverse-friendly data frame). These structures organize your data into rows and columns.
Here’s an example of a small data frame:
library(tidyverse)  # provides tibble()
sales_data <- tibble(
  category = c("A", "B", "A", "C", "B", "A"),
  price = c(100, 200, 150, 300, 120, 90)
)
sales_data
# A tibble: 6 × 2
  category price
  <chr>    <dbl>
1 A          100
2 B          200
3 A          150
4 C          300
5 B          120
6 A           90
You can access specific columns using the $ operator, for example sales_data$price.
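A short sketch of column access with $. To keep it runnable without extra libraries, it rebuilds the sales data as a base R data.frame rather than a tibble (an assumption made here); $ works the same way on both. The name prices is illustrative.

```r
# Same values as sales_data above, stored in a base R data frame
sales_data <- data.frame(
  category = c("A", "B", "A", "C", "B", "A"),
  price = c(100, 200, 150, 300, 120, 90)
)

# $ pulls one column out as a plain vector
prices <- sales_data$price
prices

# The extracted column can be passed straight to a function
mean(sales_data$price)   # [1] 160
```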
Create a tibble with categories and prices of your choosing.
You can filter, select, and manipulate data frames using functions from the dplyr package (part of the tidyverse). For example, to filter for a specific category:
filter(sales_data, category == "A")
# A tibble: 3 × 2
  category price
  <chr>    <dbl>
1 A          100
2 A          150
3 A           90
Try filtering the sales_data tibble to show only rows where the price is greater than 120.
R lets you chain together multiple commands using pipes, written %>% or |>. Pipes allow you to take the output of one function and use it as the input for the next. Think of it as saying “and then…”.
Here’s an example using the mtcars dataset to calculate the average miles per gallon (mpg) for cars with more than 100 horsepower:
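One way to write this pipeline, assuming the dplyr library is installed, is sketched below. The names result and avg_mpg are illustrative and do not come from the original text.

```r
library(dplyr)

# Start with mtcars, and then keep cars with more than 100 horsepower,
# and then average the mpg column of the rows that remain
result <- mtcars %>%
  filter(hp > 100) %>%
  summarise(avg_mpg = mean(mpg))

result
```

In R 4.1 or later, the native pipe |> can be used in place of %>% in this pipeline.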
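In this example:
1. We start with the mtcars dataset, and then
2. filter for cars with more than 100 horsepower, and then
3. calculate the average mpg.
Use pipes to filter the mtcars dataset for cars with more than 20 mpg and then calculate the average horsepower (hp).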
While you can run commands directly in the console, it’s better to save your work in R scripts (files with the .R extension). This way, you can write and save your code and run it again later.
To create a script, go to the “File” menu in RStudio, choose “New File” and then “R Script.” You can write and save all your commands there.
When R reports an error, read the message to understand what went wrong. For example, if you try to use a variable that doesn’t exist, R will return an error saying that the object was not found.
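The sketch below triggers that kind of error on purpose. The name undefined_value is hypothetical and is never created. The call is wrapped in tryCatch() (an addition made here) only so that the message can be captured and the script keeps running. Typed directly in the console, the same expression would simply stop with the error.

```r
# undefined_value was never created, so evaluating it raises an error;
# tryCatch() hands the error to our function, which returns its message
msg <- tryCatch(
  undefined_value + 1,
  error = function(e) conditionMessage(e)
)

msg   # [1] "object 'undefined_value' not found"
```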
If you’re stuck for more than 15 minutes, ask for help from an AI, a TA or the instructor. They’re available to guide you through debugging.
A common error occurs when you try to use a function from a package without loading the library first. For example, let’s try using the to_snake_case() function without loading the snakecase package. This package and function allow us to convert a string of characters into “snake case”, which we will be able to see after fixing the error.
## Attempting to use to_snake_case() without loading snakecase
to_snake_case("This is a test string")
Error in to_snake_case("This is a test string"): could not find function "to_snake_case"
You’ll see an error that says something like: Error in to_snake_case("This is a test string"): could not find function "to_snake_case"
The solution to this error is to install and load the library before calling the function.
1. Install the snakecase library (use the install.packages() function). If you see Error in install.packages : object 'snakecase' not found, it usually means that you spelled the package name wrong or you did not put the library name in quotes. Use the install.packages("library") function with the library argument in quotes.
2. Load the snakecase library.
3. Call the to_snake_case() function again to see what snake case is.
If this error is fixed, you should have the output “this_is_a_test_string”. As you can see, snake case means that all upper-case letters are made lower-case and all spaces are replaced by _.
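Put together, the fix might look like the sketch below. The requireNamespace() guard and the CRAN mirror URL are assumptions added so the code can be rerun without reinstalling, and snake is an illustrative name.

```r
# Step 1: install snakecase if it is missing (package name in quotes)
if (!requireNamespace("snakecase", quietly = TRUE)) {
  install.packages("snakecase", repos = "https://cloud.r-project.org")
}

# Step 2: load the library
library(snakecase)

# Step 3: call the function again; it now resolves
snake <- to_snake_case("This is a test string")
snake   # [1] "this_is_a_test_string"
```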
When you’re stuck or unsure how to use a function, R provides built-in help:
?function_name: Shows help documentation for a function. Example: ?mean
help("function_name"): Similar to ?function_name. Example: help("mean")
You can also search online for help. Be sure to add “in R” to your search queries.
AI tools like ChatGPT or R-specific tools like the R Wizard GPT are excellent for writing code. You can use them to get help with specific tasks. However, remember that you need to understand the context to ask the right questions and to verify if the solution fits your problem.
A directory is like a folder where R looks for files and saves things. You can find out what directory R is using with getwd().
You can change the directory to where your files are located with setwd(), passing the folder path in quotes.
The Files pane in RStudio also lets you navigate and set directories.
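A minimal sketch of checking and changing the working directory with base R’s getwd() and setwd(). The temporary folder from tempdir() stands in for a real project folder, and original_dir is an illustrative name. In practice you would pass the path to your own files.

```r
# Remember where R is currently looking for files
original_dir <- getwd()
original_dir

# Switch to another folder (a temporary one here, as a stand-in)
setwd(tempdir())
getwd()

# Switch back so the rest of the session is unaffected
setwd(original_dir)
```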
Let’s practice a few of the concepts you’ve learned:
This chapter introduced the fundamentals of R, from its object-oriented structure to how to create objects, work with functions, and import data. As you work through the course, you’ll become more familiar with these concepts, and your confidence in R will grow!
Feel free to experiment with the code examples and ask questions whenever you need help.
¹ R is often described as an object-oriented programming (OOP) language. In practice, this means that everything in R is treated as an object, whether it’s a simple number or a complex dataset. Understanding how objects work is important for using R effectively.