Importing data: the basics.

Getting some actual data into RStudio is probably the first thing you're looking to do. Luckily, it's a super easy process. Let's start by making a new Project in RStudio. Open RStudio and navigate to File > New Project > New Directory:

Image


Under Project Type, choose 'New Project', then give the directory a name (and choose where you want the folder to be). For ease, I'm just going to locate it on my desktop. Finally, choose 'Create Project':

Image


You should now have a new folder that contains an .Rproj file. A 'project' is sort of like a home base that ties a folder of files together: when the project is open, RStudio uses that folder as its working directory, so it looks there for files by default...but let's not worry too much about that now. Check that your project is actually open by looking at the name at the top of the window (it should match the name you just chose for the project - if it doesn't, close the window and double-click the .Rproj file):

Image
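
If you'd rather double-check from inside R, here's a quick sketch you can type into the Console. It only uses functions built into R, and it assumes the project opened correctly, in which case the folder name it prints should match your project name:

```r
# Ask R which folder it is currently working in.
project_dir <- getwd()

# Print just the folder name; with the project open, this should match the project name.
print(basename(project_dir))

# List any .Rproj files in that folder (you'd expect to see one if the project is open).
print(list.files(project_dir, pattern = "\\.Rproj$"))
```

If the folder name is something unexpected, close RStudio and re-open it by double-clicking the .Rproj file.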


Now we're *in* our new project, let's make a script. Go to File > New File > R Script, then File > Save As... Make sure to save the empty script in the same folder as the .Rproj file. You can even give it the same name if you like, since the file type is different. Your project folder should now contain two files and look something like this:

Image


If you're following along with this guide, you can either use some of your own data, or download my example dataset here: coffee.csv. If you open the file up in Excel, it should look like this:

Image


Drag and drop your data into the project folder. In order to get this data into RStudio, we're going to make use of some packages. A package is basically an "add-on" that someone has made that gives us access to a bunch of useful features. Here, we want to use the 'readr' package, which allows us to read .csv files into RStudio really easily. The readr package is actually a part of the tidyverse - see What is the 'tidyverse'? if you're not sure what this means - so if we just install the 'tidyverse' package, it will also install 'readr' along with it. In the Source window (where we write our code), type or copy and paste the following:

  
    install.packages('tidyverse')   # This installs the 'tidyverse' package.
    library(tidyverse)              # This tells RStudio to load up the package.
  

Highlight both lines and press "Run" at the top of the Source window ("Run" only evaluates the current line, or whatever is highlighted). Notice the 'Stop' logo that shows up in the Console window - this means that RStudio is currently running our code, so it's a good idea to wait until it disappears before entering anything else (that way we can check our code has been evaluated in the way we expect):

Image


The tidyverse package (and with it, readr) is now loaded and ready to go! It's worth noting that we don't need to install packages like this every single time we run our code (but we do need to load them using library(...) in each new session). However, it's still good practice to include the install line in your script if you ever plan on sharing your code - that way, others know which packages are required for your script to run successfully and can install them too. For now though, since we don't want to re-download the package every. single. time. we press Run, we can add a # in front of the install line. This means it'll get treated as a comment (a note to ourselves), and won't be evaluated in the Console:

  
    #install.packages('tidyverse')   # This installs the 'tidyverse' package.
    library(tidyverse)               # This tells RStudio to load up the package.
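
If you'd rather not comment the install line in and out by hand, one common alternative is to only install a package when it's missing. This is just a sketch of that idea, not something the rest of this guide depends on: `find_missing_packages` is a made-up helper name, and the `interactive()` check is my own assumption, so that nothing gets downloaded unless you're sitting at the Console.

```r
# Hypothetical helper: return the names of packages that are not installed yet.
find_missing_packages <- function(packages) {
  is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!is_installed]
}

to_install <- find_missing_packages(c("tidyverse"))

# Only download anything if something is missing and we're in an interactive session.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

You'd still follow this with library(tidyverse) as usual, since installing a package doesn't load it.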
  

We can read our data into RStudio by adding this to our script:

  
    coffee_data <- read_csv('coffee.csv')
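
If that line complains that 'coffee.csv' doesn't exist, the file probably isn't in the project folder. Here's a rough way to check, using only built-in functions (the file name is the one used in this guide, so swap in your own if it's different):

```r
data_file <- "coffee.csv"   # assumed file name; change this to match your own data

if (file.exists(data_file)) {
  message("Found ", data_file, " in ", getwd())
} else {
  # Show what R can actually see in the current folder, to help spot typos.
  message("Could not find ", data_file, ". Files in this folder: ",
          paste(list.files(), collapse = ", "))
}
```

If the file isn't listed, drag it into the project folder (or check the spelling of the file name) and try again.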
  

If your data is an Excel file, rather than a .csv file, the steps are only slightly different. I would suggest trying to re-save your Excel data as a .csv file first, but if your data simply must remain an Excel file, you'll need to install a package specially designed for that purpose. 'readxl' can be installed and loaded in the same way as the tidyverse package, and the data can be imported using a similar line of code:

  
    #install.packages('readxl')   # This installs the 'readxl' package.
    library(readxl)               # This tells RStudio to load up the package.
    coffee_data <- read_excel('coffee.xlsx')
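
One thing to watch out for: Excel files can hold more than one sheet, and read_excel reads the first sheet unless you tell it otherwise. As a small sketch (assuming your file is called 'coffee.xlsx' and sits in the project folder), you can list the sheet names and then pick one. The `if` check is my own addition, so the code simply does nothing if 'readxl' or the file is missing:

```r
excel_file <- "coffee.xlsx"   # assumed file name, as in the example above

if (requireNamespace("readxl", quietly = TRUE) && file.exists(excel_file)) {
  # Print the names of all the sheets in the workbook.
  print(readxl::excel_sheets(excel_file))

  # Read a specific sheet, chosen by position (a sheet name in quotes works too).
  coffee_data <- readxl::read_excel(excel_file, sheet = 1)
}
```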
  

Whether your data is .csv or Excel, you can now highlight all of your script and press 'Run':

Image


Woo hoo! Our data is imported! We can see it whenever we want by typing its name into the Console.
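
Typing the name prints the data itself, but a few built-in functions give a quicker overview. This is only a sketch: the first few lines create a tiny made-up placeholder so the code runs even if you haven't imported anything yet, and you can skip them once coffee_data exists.

```r
# Placeholder only: a made-up data frame used if coffee_data hasn't been imported yet.
if (!exists("coffee_data")) {
  coffee_data <- data.frame(example_column = c(1, 2, 3))
}

print(head(coffee_data))    # the first few rows
print(dim(coffee_data))     # number of rows, then number of columns
print(names(coffee_data))   # the column names
str(coffee_data)            # the type of data stored in each column
```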

Check out the next guide: Sorting and tidying data, for some practical tips on what to do next with your data.


Bonus tip: if you're accessing data that is stored online - like my coffee.csv for example - you don't actually need to download it as a .csv or Excel file and then import it into RStudio. Instead, you can tell R where the data is located by first assigning the URL to a variable:

  
    my_online_data <- "https://raw.githubusercontent.com/adajam89/adajam89.github.io/master/coffee.csv"
  

then, you can use the 'readr' package to import it straight from the source into a new variable:

  
    library(readr)
    my_downloaded_data <- read_csv(my_online_data)   # read_csv can take a URL directly.
  


