Introduction to Biological Sciences lab, second semester

R is a freely available, open source programming language that is widely used for statistics and data visualization. It is available for Windows, Mac, and Linux. Although R itself is a programming language, users can also call on collections of pre-programmed functions, code, and data sets called packages. R can be run from the command line, but it is often used through an integrated development environment (IDE) called RStudio. RStudio makes it easy to run commands, use R packages, create scripts, and check the values of variables. (The computers in 2122 have R fully installed.)

There are many R packages available for common statistical tests and graphing tasks. Several packages are shipped with R itself, and others can be downloaded and installed from online repositories. In this class, we will not focus on teaching you how to write programs in R, but rather on how to use R through RStudio to do several types of powerful statistical analyses. If the instructions below do not work exactly for your system, look up online how to install R for your specific setup (system type, OS, 32-bit or 64-bit, etc.).
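As a small sketch of the package idea described above (assuming R is already installed), the lines below report the R version and call `sd()`, a function from the `stats` package that ships with R. The numbers are made up for illustration. The `install.packages()` lines are commented out because they need an internet connection, and `ggplot2` is only an example of a downloadable package, not something this handout requires.

```r
# Show which version of R is running.
print(R.version.string)

# 'stats' ships with R and is loaded by default; loading it again is harmless.
library(stats)

# Hypothetical numbers, for illustration only.
example_values <- c(2, 4, 6, 8)

# sd() comes from the stats package and returns the sample standard deviation.
example_sd <- sd(example_values)
print(example_sd)

# Packages that do not ship with R are downloaded once, then loaded.
# install.packages("ggplot2")   # example package name; needs internet access
# library(ggplot2)
```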

R can be downloaded from one of many Comprehensive R Archive Network (CRAN) sites. The closest one to Vanderbilt is at UT Knoxville (http://mirrors.nics.utk.edu/cran/). From the UTK CRAN site homepage, click on the “Download R for [OS]” link that is appropriate for your operating system.

__BEFORE YOU DO THIS, know which operating system (OS) you have (Windows, Mac, Linux, etc.) and whether your system is 32-bit or 64-bit. In short, find out what your system has before trying to download R.__ A quick Google search of your computer model, or a look through your system settings, should tell you which OS you are using. Note that if you download the file for the wrong OS, it will not work. If that happens, remove the download and try a different file.

NOTICE: There are usually **TWO things to download, whether you are using Mac or Windows: R and RStudio.**

**Downloading and installing R (Windows)**

On the CRAN Windows download page, click on the "base" link, which will take you to the download page for the most recent base R distribution. Then click on the "Download R X.X.X for Windows" link (where X.X.X is the version number). This will initiate the download of an executable installation file to the default download directory for your browser. After the download completes, click (or double-click) on the installer file to initiate the install. Click the Next button repeatedly to accept all of the defaults. After completing the install, you should see an R shortcut on your desktop. Double-click on the icon to launch R. You should see the R Console with a ">" prompt at the bottom. Enter:

2+2

and you should see

[1] 4

as the answer. Click the X in the upper right of the window to quit the console, and do not save the workspace if prompted.

**Downloading and installing RStudio (Windows)**

Go to https://www.rstudio.com/products/rstudio/download/ and click on the installer link for Windows. This will initiate the download of an executable installation file to the default download directory for your browser. After the download completes, click (or double-click) on the installer file to initiate the install. Click the Next button repeatedly to accept all of the defaults. By default, there is no shortcut on the desktop. If you want one there, click on the Start menu, find the RStudio icon in the list of programs, and drag it to the desktop. Run RStudio. On the left side of the window, you should see a Console pane similar to what you saw before. Try entering 2+2 as you did above and you should get the same result.

**Downloading and installing R (Mac)**

The main CRAN download page for Mac contains the installers for OS X 10.6 and above. For older operating systems, read the page and rummage around until you find what you need. On the main download page, click on the link for the correct binary for your OS version. When the download is complete, click on the installer file to launch the install. Click the Next button repeatedly to accept all of the defaults, and Agree to the terms. Click on Install as prompted and enter your password as necessary. When complete, close the installation window.

In Finder, click on Applications. You should see R listed. Double click on it to launch R. You should see the R Console with a ">" prompt at the bottom. Enter:

2+2

and you should see

[1] 4

as the answer. Click the red dot in the upper left of the window to quit the console, and do not save the workspace if prompted.

**Downloading and installing RStudio (Mac)**

Go to https://www.rstudio.com/products/rstudio/download/ and click on the appropriate installer link for your computer's operating system. This will initiate the download of a disk image (.dmg) file to the default download directory for your browser. After the download completes, click on the .dmg file to open it. Drag the RStudio icon into the Applications folder and close the window. You should now be able to find RStudio in your Applications folder. Run RStudio, and allow the application to run if your computer asks. On the left side of the window, you should see a Console pane similar to what you saw before. Try entering 2+2 as you did above and you should get the same result.

**If you have any trouble completing this task, please reach out to your lab contact. Part of your Week 0 Assignment will be to show that you have the Data Analysis ToolPak for Excel enabled on your system AND ALSO that you have R installed and can enter 2+2 to get 4.**

**And if the above steps did not fully get R installed, your specific system may need something different. We can't know all the different systems that you might have.**

**You are ALWAYS welcome to use the computers in the BSCI Labs (when open) to work on R.**

A question you may have is ‘why?’ Why use both Excel and R? There are two reasons.

Reason 1 – 1511L continues the introductory biology lab series that began with 1510L, and both courses are designed to expose students to the equipment and techniques commonly used in research labs. One of the most essential parts of the scientific process is OBJECTIVELY determining (through statistics) whether the independent variables tested had an effect on the dependent variable. A wide range of statistical software exists, including R/RStudio, Excel, JMP, Python, SPSS, and several others. In this course we will focus on Excel, because many of you are already familiar with it, and on R/RStudio. We focus on R/RStudio for a few essential reasons: it is free, it is powerful, and it is flexible enough to perform many types of tests, as shown by its widespread adoption across biological fields (Micro, Molecular, Ecology, etc.). In many of the lab settings you might find yourself in after 1511L, learning to use R is no different from learning to use a pipettor or a microfuge.

Reason 2 – Another big benefit of adding RStudio to your toolbox is having a check against yourself. If you run a statistical test, such as a t-test of means, in both Excel and RStudio and receive the same result from both, then you have some assurance that the value is correct. If you get differing values, then either the program commands were in error or the data were not entered correctly.
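To make the cross-check concrete, here is a minimal sketch of a two-sample t-test of means in R. The two groups and their values are made up for illustration and are not class data. The sketch assumes the Excel side was run as a two-sample test assuming equal variances; `var.equal = TRUE` asks R for the same kind of test, since R's `t.test()` otherwise defaults to the unequal-variance (Welch) version.

```r
# Hypothetical measurements for two groups (made-up numbers, not class data).
control   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
treatment <- c(6.2, 6.8, 7.1, 6.5, 7.4, 6.9)

# Two-sample t-test of means, assuming equal variances in both groups.
result <- t.test(control, treatment, var.equal = TRUE)

# The pieces you would compare against the Excel output.
print(result$statistic)  # t-stat
print(result$parameter)  # degrees of freedom (df)
print(result$p.value)    # two-tailed p-value
```

If the t-stat, df, and p-value agree with what Excel reports for the same numbers and the same test type, that agreement is the check described above.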

And that is the ‘why.’ You are adding to the toolbox of science tools you have encountered AND ALSO gaining a way to check that you did the statistical test correctly and got the p-value (and t-stat, etc.) that you were supposed to get for the class data set!

What is 2+2? 4. What is 12x12? 144. You can all (hopefully) do those basic operations without a calculator. But how would you know whether the P-value from Example 1 of our Review of how to perform a t-test of means (0.00171) is correct? All that any statistical software package does is give you numbers based on the data and commands that you give it. The program does not know if you left off a pair of data points. It does not know that you made a typo when entering one value, or that you did not ‘turn on’ an added setting, like “add R2.” THE PROGRAM just uses what YOU tell it to use. So how do you know that the t-tests of means you performed for Example 1 and Example 2 were correct, rather than just accepting what the computer says (aside from us giving you the answer)? By running the *same data* in two different programs and **obtaining the same result(s)**, you have some assurance that you did it correctly.
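The point about typos can be seen in a short sketch. The data below are made up for illustration: the second version of the control group has one value entered as 58 instead of 5.8. R runs both tests without complaint, and only a comparison against another source would reveal the problem.

```r
# Hypothetical data (made-up numbers). The typo version has 58 instead of 5.8.
control_correct <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
control_typo    <- c(5.1, 4.9, 5.6, 58,  6.0, 5.3)
treatment       <- c(6.2, 6.8, 7.1, 6.5, 7.4, 6.9)

# The same command is run on both versions; R cannot tell which one is "right".
p_correct <- t.test(control_correct, treatment, var.equal = TRUE)$p.value
p_typo    <- t.test(control_typo,    treatment, var.equal = TRUE)$p.value

print(p_correct)  # p-value from the correctly entered data
print(p_typo)     # p-value after a single mistyped value
```

With these particular numbers the single typo moves the p-value from below 0.05 to above it, which would change the conclusion drawn from the test.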

CAVEAT: R and Excel both have limits on how small a P-value they will display. In Excel, if the P-value is extremely small (beyond about 10^-200), it will just show ‘0’. In R, the printed test output will only go as low as 2.2 x 10^-16: even if the p-value is actually 4.5 x 10^-25 or similar, the output will just show "p-value < 2.2e-16". All the other associated values (such as t-stat and df) will still match Excel, if you compare.
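A short sketch of this display limit, using made-up data chosen so that the two groups are very far apart. The printed summary reports the p-value only as being below 2.2e-16, which is R's machine precision constant `.Machine$double.eps`. The number R actually computed is still stored in the result and can be read from `$p.value` if you want to see it.

```r
# Hypothetical data: two groups that barely overlap at all (made-up numbers).
group_a <- 1:30
group_b <- 101:130

result <- t.test(group_a, group_b, var.equal = TRUE)

# The printed summary shows the p-value as "< 2.2e-16".
print(result)

# The display cutoff is R's machine precision constant.
print(.Machine$double.eps)

# The computed p-value itself is stored in the result object.
print(result$p.value)
```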

BIG IMPORTANT COMMENT: Something you will run into this semester is that there are at least three different ways to give R a set of data for a statistical test, as opposed to only one way in Excel. Try not to get them mixed up. The first way is direct data input. The second is having R read a saved data file. The third is having R read data from a URL. We will present two of these ways. The same basic idea applies in each: you give R a command telling it what to do, you provide the data to R (either directly or in a file for it to find), and you get a result.
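The three ways can be sketched side by side. Everything here is illustrative: the numbers are made up, the file name `example_data.csv` is hypothetical and is written to a temporary folder only so the sketch runs on its own, and the URL is a placeholder that is left commented out. This is not necessarily the exact form the class exercises will take.

```r
# Way 1: direct data input (hypothetical, made-up numbers).
control   <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3)
treatment <- c(6.2, 6.8, 7.1, 6.5, 7.4, 6.9)
direct_result <- t.test(control, treatment, var.equal = TRUE)

# Way 2: a saved file. The file is created here only to keep the sketch
# self-contained; normally the file would already exist on your computer.
data_file <- file.path(tempdir(), "example_data.csv")  # hypothetical file name
write.csv(data.frame(control = control, treatment = treatment),
          data_file, row.names = FALSE)
lab_data <- read.csv(data_file)
file_result <- t.test(lab_data$control, lab_data$treatment, var.equal = TRUE)

# Way 3: a URL. read.csv() also accepts a web address; this one is a placeholder.
# url_data   <- read.csv("https://example.com/example_data.csv")
# url_result <- t.test(url_data$control, url_data$treatment, var.equal = TRUE)

# Same data and same command, so the results should agree.
print(direct_result$p.value)
print(file_result$p.value)
```

Only the step that hands the data to R changes between the three ways; the `t.test()` command and the result stay the same.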