
April Heyward, MRA

Social Science Doctoral Researcher, R Programmer, Data Science and Machine Learning Inspired, and STEM Program Manager


R Programming Language

Posted by April Heyward | May 10, 2020 | June 17, 2020 | Posted in: Computational Social Science, Data Science, General, Machine Learning, Public Administration, Public Policy, R Programming, Research

I receive questions all the time about the R programming language. R is one of my happy places, and I can spend hours, days, and weeks programming. R is vast and its capabilities are extensive: data science, machine learning, neural networks, deep learning, app development, interactive widgets for websites, and more. This is contrary to the opinion that these capabilities can only be achieved in the Python programming language. Yes, there is the R ecosystem vs. the Python ecosystem. There are numerous R environments grounded in Base R, and some of the most popular are RStudio, Bioconductor (genomics data), and Microsoft R. I work in RStudio, and it has become part of the foundation of my research design and methods. I also receive questions about R vs. SPSS Statistics. My response is that R requires programming and SPSS Statistics does not. Do you want to program or not? I used SPSS Statistics before I learned R, but once I learned how to program in R there was no reason for me to return to SPSS Statistics. I have more freedom in R and can program what I want. I get a sense of satisfaction from creating and building by programming. It is really about your preference.

There is a steep learning curve with R. I receive questions about the best way to learn R. You can take courses with LinkedIn Learning, HarvardX, SuperDataScience, DataCamp, Udemy, etc. R programming will come alive when you practice with datasets. I must issue a cautionary note about attempting to replicate another person's code. Keep in mind that the code you are viewing may have been written for a different version of R from the one you are using. R has been updated three times in the last six months, and R packages and functions update frequently. Packages and functions sometimes behave differently across R versions, and sometimes they stop working or are replaced when R is updated. Another issue arises when someone loads an R package in their demonstration code but leaves out the other packages it needs in order to work. For instance, the Hmisc package relies on the lattice, survival, and Formula packages, so those must be available for Hmisc to work. Another example of missing information in code presentations is the rJava package, which requires Java to be installed and configured on the system before rJava is loaded. Therefore, it is imperative that you practice with datasets to learn how to solve problems and break through brick walls in R. There have been times when I have used different code to achieve the same output because the code presented was incorrect, was missing a function, or did not work. You will get to this point as you gain more experience with R.
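One way to guard against the version and dependency problems described above is to record the R version and check which packages are installed before running someone else's code. The following is a minimal sketch in base R. The helper name `check_packages` and the example package list are hypothetical, chosen only for illustration.

```r
# Record the R version in use so others can compare it with theirs.
r_version <- R.version.string
print(r_version)

# Hypothetical helper: report which of the named packages are installed.
# system.file(package = p) returns "" when package p is not installed.
check_packages <- function(pkgs) {
  vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
}

# Example package list (an assumption for illustration).
status <- check_packages(c("stats", "Hmisc", "rJava"))
print(status)

# Packages that would need install.packages() before library() can load them.
missing_pkgs <- names(status)[!status]
print(missing_pkgs)

# sessionInfo() lists the R version and attached packages;
# including its output when sharing code helps others reproduce it.
print(sessionInfo())
```

Sharing the `sessionInfo()` output alongside a script gives readers a chance to spot version differences before they hit an error.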

I will never say that I am an R expert because there are over 10,000 R packages and functions. There are many different reasons to use R, as I outlined earlier, and you must figure out what works best for your purposes. I have staple R packages and functions that I employ. I am not going to wrangle data without Dr. Hadley Wickham's tidyverse R package. Any data scientist can attest to the hours, days, and weeks it takes to wrangle structured and unstructured data into a tidy format before the data analysis, data modeling, and data visualization stages can begin. I employ a combination of the tidytext and tm R packages for text mining (machine learning). R scripts and outputs should be reproducible, just as research should be reproducible. When I work on a project in R, I leave # notes documenting my decisions in case I need to share my code with someone who wants to reproduce it. Note: # tells R not to execute the rest of that line as code. I also leave # notes to remind myself, months down the road, why I made the decisions I made in a previous project.
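As an illustration of # notes, here is a short sketch of a commented script. It uses the built-in `mtcars` dataset and base R rather than the tidyverse so that it runs without installing anything. The dataset and the decisions noted in the comments are assumptions made up for the example.

```r
# Decision: use the built-in mtcars data so the script runs anywhere.
cars <- mtcars

# Decision: treat the number of cylinders as a category, not a number,
# because it is used only for grouping.
cars$cyl <- factor(cars$cyl)

# Mean miles per gallon within each cylinder group.
mpg_by_cyl <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
print(mpg_by_cyl)

# The next line starts with #, so R does not execute it.
# print("This is never printed.")
```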

I want to share snippets of R scripts and outputs that I have employed in R for different projects. Please note that these were preceded by many steps that are not depicted here.

summary() in R
describe() in R (psych R package)
cor.test() in R
plot() in R
plot() output
boxplot() in R
boxplot() output
lm(), summary(), par(), and plot() in R
Regression diagnostic plots output: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage
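
For readers who want to try the functions listed above, the following is a minimal sketch using the built-in `mtcars` dataset. The dataset and variable choices are assumptions for illustration only, not the data or scripts from the projects mentioned above. `describe()` from the psych package is left out so the example runs in base R.

```r
# Built-in example data (an assumption; not the original project data).
data(mtcars)

# Descriptive statistics for two variables.
print(summary(mtcars[, c("mpg", "wt")]))

# Pearson correlation test between weight and fuel economy.
ct <- cor.test(mtcars$wt, mtcars$mpg)
print(ct)

# Scatter plot and box plot.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Simple linear regression and its summary.
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Arrange the four regression diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```

Calling `par(mfrow = c(2, 2))` before `plot(fit)` places the Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage plots on one page.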