Installation, Configuration and Environment
.libPaths() displays the location of package libraries
Vectors and Matrices
base::rep() replicates elements of vectors and lists.
- rep(5, 3) #returns 5, 5, 5
- rep(c(1,2),2) #returns 1 2 1 2
- rep(c(1, 2), each=2) #returns 1 1 2 2
- rep(1:3, 3:1) # 1, 1, 1, 2, 2, 3
base::seq() generates regular sequences. Very flexible with many options. Typical usage includes seq(from, to), seq(from, to, by= ), seq(from, to, length.out= ), seq(along.with= ), seq(from), seq(length.out= ).
- seq(0, 1, length.out=11)
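- seq(stats::rnorm(20)) #returns 1:20 (one index per element of a length-20 vector)
- seq(1, 9, by = 2) #returns 1 3 5 7 9 (ends exactly at 9)
- seq(1, 9, by = pi) #returns 1.000000 4.141593 7.283185 (stays below 9)
- seq(1, 6, by = 3) #returns 1 4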
- seq(1.575, 5.125, by=0.05)
- seq(17) # same as 1:17
base::vector() produces a vector of the given length and mode. The atomic modes are ‘logical’, ‘integer’, ‘numeric’, ‘complex’, ‘character’ and ‘raw’. Mode can also be ‘list’.
- X = vector(mode='list', length=10000) #creates a list of 10000 NULL elements
- X = vector(mode='numeric', length = 5) #creates a numeric vector of 5 zeros
base::matrix() creates a 2d matrix (see also base::array)
- X = matrix(data = NA, nrow=2, ncol=2, dimnames = list(c('row1', 'row2'), c('col1', 'col2'))) #creates a 2x2 matrix of NA with named rows and columns
Generic Variable Manipulation
Calculate new variable within subjects (e.g., standardize startle within subject)
Use the plyr package. In this example, we use the baseball dataset (included in plyr). In this dataset, each baseball player has n rows for each of the n years they played ball. There is a year variable which indicates the calendar year (e.g., 1991) for each row. To transform calendar year to career year (cyear; i.e., 1 for the player's first year, 2 for the second, and so on), for each player, do the following:
baseball = ddply(.data= baseball, .variables= c('id'), .fun= transform, cyear = year - min(year) + 1)
NOTES: transform() is a function in base R. id is a unique identifier (e.g., SubID) for each baseball player. More detail on this example can be found in the published article on plyr from the plyr website.
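If plyr is not available, base R's ave() computes a statistic within each subject and returns it aligned with the original rows (similar to transform in ddply). A minimal sketch, assuming a hypothetical long dataframe d with a subject identifier SubID and a startle score Startle; the names and values are illustrative only:

```r
# Hypothetical long-format data: three trials per subject
d <- data.frame(
  SubID   = c(1, 1, 1, 2, 2, 2),
  Startle = c(10, 20, 30, 100, 110, 120)
)

# Standardize Startle within each SubID; ave() returns one value per row,
# in the original row order, so it can be stored directly as a new column
d$zStartle <- ave(d$Startle, d$SubID, FUN = function(x) as.numeric(scale(x)))

# Trial number within subject (1, 2, 3, ... restarting for each SubID)
d$Trial <- ave(seq_along(d$SubID), d$SubID, FUN = seq_along)
```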
Recoding variable values (NEED)
Factor Manipulation
Create a factor
base::factor() creates a factor variable from text or numeric variable
- d$AFactor = factor(d$BevGroup, levels = c('no-alcohol', 'placebo', 'alcohol')) #create factor AFactor from variable with text data labels
- d$AFactor = factor(d$BevGroup, levels = c(1,2,3), labels = c('no-alcohol', 'placebo', 'alcohol')) #create factor AFactor from variable with numeric entries 1=no-alcohol, 2=placebo, 3=alcohol
Display levels of a factor
base::levels() sets or displays the levels of a factor.
- levels(BevGroup) #displays the levels of the BevGroup factor
- levels(BevGroup) = c('no-alcohol', 'placebo', 'alcohol') #set levels of BevGroup as indicated. NOTE: this is not recommended because it is error prone (new labels are assigned by position); use revalue() instead
Change labels of factor levels
plyr::revalue() renames specific levels of a factor by their current labels, regardless of the order of the levels
- d$BevGroup = revalue(d$BevGroup, c("Alcohol"="Alc", "No-Alcohol"="NoAlc"))
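For comparison, the same renaming can be done in base R by matching on the current labels rather than on position. A minimal sketch with a hypothetical factor using the labels from the revalue() example:

```r
# Hypothetical factor with the level labels used in the revalue() example
BevGroup <- factor(c("Alcohol", "No-Alcohol", "Placebo", "Alcohol"))

# Rename levels by matching their current labels, so level order does not matter
lev <- levels(BevGroup)
lev[lev == "Alcohol"]    <- "Alc"
lev[lev == "No-Alcohol"] <- "NoAlc"
levels(BevGroup) <- lev
```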
Reorder factor levels (NEED)
Set contrasts for a factor (NEED)
Date/Time Manipulations
Use base::as.POSIXct() to convert a text date to a POSIXct calendar date. This allows standard date arithmetic, comparison, and formatting.
Date = '10/12/2016 14:32:10' #the tz below, 'America/Chicago', selects CST or CDT from the date; times that are ambiguous or nonexistent on daylight-saving switch days may not convert as intended
(t = as.POSIXct(x=Date, format='%m/%d/%Y %H:%M:%S', tz = 'America/Chicago'))
See help(strptime) for the format string specification.
To convert a POSIXct date object to a number (i.e., a Unix timestamp: seconds since 1970-01-01 00:00:00 UTC)
as.numeric(t)
To see the attributes of a POSIXct date object
attributes(t)
To change the time zone of a POSIXct date object for display only (it does not change the underlying date/time; it is still the same moment in time)
attributes(t)$tzone = 'UTC'
The Unix timestamp is unchanged by the time zone change: as.numeric(t)
To keep the clock time but change the time zone, use force_tz() from the lubridate package. Note that this changes the moment in time, but it can be useful when functions default to UTC and your input was in another time zone (e.g., when converting from Excel).
NewTime = force_tz(OldTime, tz='America/Chicago')
This Wikipedia page is useful for finding valid time zone names
The Epoch Converter is also a useful web resource
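The difference between a display-only time zone change and forcing a new time zone can be checked directly. A sketch, assuming the system time zone database includes America/Chicago; the re-parsing step is a base-R analogue of lubridate::force_tz(), not lubridate itself:

```r
# Parse the example text date in the America/Chicago time zone
Date <- "10/12/2016 14:32:10"
t <- as.POSIXct(Date, format = "%m/%d/%Y %H:%M:%S", tz = "America/Chicago")
secs <- as.numeric(t)  # seconds since 1970-01-01 00:00:00 UTC

# Display-only change: same instant, shown in UTC (CDT is UTC-5 on this date)
tUTC <- t
attr(tUTC, "tzone") <- "UTC"
shown <- format(tUTC, "%Y-%m-%d %H:%M:%S")  # "2016-10-12 19:32:10"

# Base-R analogue of force_tz(): keep the clock time, change the zone.
# Re-parsing the printed clock time yields a different instant.
tForced <- as.POSIXct(format(t, "%Y-%m-%d %H:%M:%S"), tz = "UTC")
shift <- as.numeric(tForced) - secs  # -18000 seconds (5 hours earlier)
```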
Using Dataframes
Opening dataframes from various sources
lmSupport::lm.readDat() loads tab-delimited text in Curtin lab format
- d = lm.readDat('Data.dat')
- d = lm.readDat('Data.dat', SubID = 'ID')
utils::read.table() reads tabular text data into a dataframe
- d= read.table('Prison.dat', header=TRUE)
- d= read.table('clipboard', header=TRUE) #read data via the clipboard (e.g., from Excel)
- d= read.table('SampleData.dat', header=TRUE) #read .dat data file
foreign::read.spss() loads SPSS data files
- d= read.spss('Prison.sav', to.data.frame=TRUE)
base::scan() reads values typed at the console into a vector (not a dataframe). Separate entries by spaces; enter a blank line to end input.
- dData = scan()
R.matlab::readMat() and writeMat() are used to read and write Matlab MAT files.
- d= readMat('X.mat')
- d= data.frame(d)
base::file.choose() opens a dialog box to select a file name and path
- d= read.table(file.choose(), header=TRUE)
'clipboard' can be given as the file name in various reading functions to read from the clipboard rather than a file.
- d = read.delim('clipboard') #after copying Excel data to the clipboard
Creating a new dataframe
base::data.frame() creates a dataframe from vectors
- d= data.frame(X=seq(2,10,2), Y=(1:5), Z=c(1,3,6,10,12))#define vectors named X, Y, and Z
- d= data.frame(BevGroup, Sex, Age) #use previously defined vectors.
Saving dataframe
lmSupport::lm.writeDat() writes a data frame to tab-delimited text file using Curtin lab defaults
- lm.writeDat(d, 'Data.dat')
utils::write.table() writes a dataframe to a text file.
- write.table(d, file='c:\\Data.dat', sep='\t') #use sep = '\t' to write as tab-delimited (the default is space-delimited)
- write.table(d, file='c:\\Data.dat', sep='\t', row.names=FALSE) #use row.names=FALSE to not write row names (e.g., if you want to use the data later in SPSS)
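A quick round trip confirms the tab-delimited settings above read back cleanly. A sketch that writes to a temporary file instead of c:\\Data.dat (the tempfile is an assumption for portability):

```r
d <- data.frame(X = seq(2, 10, 2), Y = 1:5, Z = c(1, 3, 6, 10, 12))

# Write tab-delimited text without row names to a temporary file
f <- tempfile(fileext = ".dat")
write.table(d, file = f, sep = "\t", row.names = FALSE)

# Read it back; header = TRUE because write.table() wrote the column names
d2 <- read.table(f, header = TRUE, sep = "\t")
unlink(f)  # remove the temporary file
```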
Display data and properties of dataframes
car::some() displays 10 (by default) randomly selected participants from the data frame
- some(d)
- some(d,20) #passing second argument allows display of more (or less) cases
utils::head() displays the first n rows of the dataframe
- head(d)
- head(d,20) #passing second argument allows display of more (or less) cases
utils::tail() displays the last n rows of the dataframe
- tail(d)
- tail(d,20) #passing second argument allows display of more (or less) cases
utils::View() displays dataframe in a crude spreadsheet. [see also relimp::showData()]
- View(d)
base::rownames() gets or sets the row names for a dataframe
- rownames(d) = as.character(d$SubID) #set row names to the SubIDs
- row.names(d) #print row names for d
base::dim() provides the dimensions (# of rows and columns) of the dataframe
- dim(d)
base::nrow() provides the # of rows (observations) of the dataframe
- nrow(d)
base::ncol() provides the # of columns (variables) of the dataframe
- ncol(d)
utils::str() compactly displays the structure of an object
- str(d)
base::names() gets or sets the names of an object.
- names(d) #displays the variable names of the dataframe
- names(d) = c('VarName1', 'VarName2', 'VarName3') #set names of the three variables in d
- names(d)[3] = 'VarName3' #set name of third variable to 'VarName3'
Indexing
base::which() returns indices for specific cases based on variables in dataframe
- which(d$Age> 21) #return indices for cases based on Age
- which(d$BevGroup == 'alcohol') #return indices for cases where factor BevGroup = alcohol (level label)
stats::na.omit() selects subset of non-missing cases in dataframe
- dNew = na.omit(d)
stats::complete.cases() returns a logical vector indicating which cases are complete, i.e., have no missing values.
- complete.cases(d$X1, d$X2)
- d= d[complete.cases(d$X1, d$X2),] #keep only cases with no missing values on X1 and X2
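The missing-data helpers above behave differently when NAs are present. A sketch on a small hypothetical dataframe:

```r
d <- data.frame(
  SubID = c(1, 2, 3, 4),
  Age   = c(19, 25, NA, 30),
  X1    = c(1.2, NA, 3.1, 4.0),
  X2    = c(2.0, 2.5, 3.5, NA)
)

# which() drops the NA comparison; d[d$Age > 21, ] would instead add an all-NA row
idx <- which(d$Age > 21)              # 2 4

dNew <- na.omit(d)                    # only row 1 has no NA in any column
keep <- complete.cases(d$X1, d$X2)    # TRUE FALSE TRUE FALSE
dX   <- d[keep, ]                     # rows with both X1 and X2 present
```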
car::whichNames() returns indices of specific row names in dataframe
- whichNames(c('1001', '2022'), d) #returns indices of SubIDs 1001 and 2022 (assuming SubIDs are row names)
Get row name of specific indices
- rownames(d)[10] #returns row name of case 10
- rownames(d)[1:10] #returns row names of first 10 cases
Selecting a single variable in a dataframe.
- d$MyVariable
- d[1] #select the first column as a one-column dataframe (use d[[1]] or d[,1] to get it as a vector)
Selecting cases in a dataframe.
- d[10, ] #select case 10 in d
- d[7:10, ] #select cases 7-10 in d
- d[c(7,10,11), ] #select cases 7, 10, 11 in d
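The difference between single and double brackets is easy to check. A sketch on a hypothetical five-case dataframe:

```r
d <- data.frame(SubID = 101:105, Age = c(19, 25, 22, 30, 27))

d[1]                     # one-column dataframe
d[[1]]                   # first column as a vector (same as d$SubID)
d[, 1]                   # also a vector, because drop = TRUE by default
d[2:3, ]                 # cases 2-3, all variables
d[d$Age > 21, "SubID"]   # SubIDs of cases older than 21
```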
Dataframe Manipulation
Aggregating data
Use the plyr package. In this example, we use the baseball dataset. In this dataset, each baseball player has n rows for each of the n years they played ball. To make an aggregate data file that includes the mean and max number of runs across years for each player, use the following code:
NewData = ddply(.data= baseball, .variables = c('id'), .fun= summarise, MeanRuns = mean(r), MaxRuns = max(r))
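Without plyr, stats::aggregate() gives one row per player for a single summary function, so two summaries can be merged. A sketch on hypothetical player data (not the real baseball set); note that aggregate() with a formula drops rows with NA by default:

```r
# Hypothetical long-format data: one row per player-year
runs <- data.frame(
  id = c("a", "a", "a", "b", "b"),
  r  = c(10, 30, 20, 5, 15)
)

# Mean and max runs per player, one row per id
MeanRuns <- aggregate(r ~ id, data = runs, FUN = mean)
MaxRuns  <- aggregate(r ~ id, data = runs, FUN = max)
NewData  <- merge(MeanRuns, MaxRuns, by = "id", suffixes = c(".mean", ".max"))
names(NewData) <- c("id", "MeanRuns", "MaxRuns")
```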
Convert dataframe from LONG to WIDE format
Use dcast() from reshape2 package
The dcast formula has the following format: x_variable + x_2 ~ y_variable + y_2 ~ z_variable ~ … The order of the variables makes a difference. The first varies slowest, and the last fastest.
If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate=. You would typically use fun.aggregate=mean.
First, let's create a sample dataframe in LONG format for this example.
- dLong = read.table(header=TRUE, text='
  SubID sex condition measurement
  1 M control 7.9
  1 M cond1 12.3
  1 M cond2 10.7
  2 F control 6.3
  2 F cond1 10.6
  2 F cond2 11.1
  3 F control 9.5
  3 F cond1 13.1
  3 F cond2 13.8
  4 M control 11.5
  4 M cond1 13.4
  4 M cond2 12.9
  ')
Then use dcast as follows:
- dWide = dcast(data= dLong, formula= SubID + sex ~ condition, value.var="measurement")
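If reshape2 is not installed, base R's stats::reshape() can produce a similar wide table from the same example data (this is a different function from dcast(), and it names the new columns measurement.control, measurement.cond1, and so on). A sketch:

```r
dLong <- read.table(header = TRUE, text = "
SubID sex condition measurement
1 M control 7.9
1 M cond1 12.3
1 M cond2 10.7
2 F control 6.3
2 F cond1 10.6
2 F cond2 11.1
3 F control 9.5
3 F cond1 13.1
3 F cond2 13.8
4 M control 11.5
4 M cond1 13.4
4 M cond2 12.9
")

# One row per SubID; sex is constant within SubID, so it is carried along
dWide <- reshape(dLong, direction = "wide",
                 idvar = "SubID", timevar = "condition", v.names = "measurement")
```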
Convert dataframe from WIDE to LONG format
Use melt() from the reshape2 package. You need to specify:
- id.vars: the variables that identify each subject and are kept as-is (not melted)
- measure.vars: the columns that hold the levels of the within-subject variable
- variable.name: the name of the new within-subject variable
- value.name: the name of the new dependent variable
First, let's make a WIDE example dataframe
- dWide <- read.table(header=TRUE, text='
  SubID sex control cond1 cond2
  1 M 7.9 12.3 10.7
  2 F 6.3 10.6 11.1
  3 F 9.5 13.1 13.8
  4 M 11.5 13.4 12.9
  ')
Then use melt():
- dLong <- melt(dWide, id.vars=c('SubID', 'sex'), measure.vars=c('control', 'cond1', 'cond2'), variable.name='condition', value.name='DV')
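The base-R counterpart for going back to LONG is reshape() with direction = "long" (again a different function from melt()). A sketch on the same WIDE example; here SubID serves as the single id variable and sex is carried along:

```r
dWide <- read.table(header = TRUE, text = "
SubID sex control cond1 cond2
1 M 7.9 12.3 10.7
2 F 6.3 10.6 11.1
3 F 9.5 13.1 13.8
4 M 11.5 13.4 12.9
")

dLong <- reshape(dWide, direction = "long",
                 varying = c("control", "cond1", "cond2"),
                 v.names = "DV", timevar = "condition",
                 times = c("control", "cond1", "cond2"),
                 idvar = "SubID")

# reshape() groups rows by condition; reorder to one block of rows per subject
dLong <- dLong[order(dLong$SubID), ]
```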
Summary/Descriptive Statistics
Calculate summary statistics separately on every subject in dataframe
Use the plyr package. In this example, we use the baseball dataset. In this dataset, each baseball player has n rows for each of the n years they played ball. There is a year variable which indicates the calendar year (e.g., 1991) for each year played for each player. To calculate the last year each player played ball, choose one of these options:
Option 1 uses summarise to return one row per subject in a new dataframe, LastYears:
- LastYears = ddply(.data= baseball, .variables= c('id'), .fun= summarise, MaxYear = max(year))
Option 2 uses transform to return the max year in every existing row of the original dataframe (same value for every row for the same player):
- baseball = ddply(.data= baseball, .variables= c('id'), .fun= transform, MaxYear = max(year))
Option 3 uses our own function (NewMax) to demonstrate situations where you need a new or more complex function that doesn't exist (of course, max() already does this):
- NewMax = function(x) {max(x, na.rm=TRUE)}
- LastYears = ddply(.data= baseball, .variables= c('id'), .fun= summarise, MaxYear = NewMax(year))
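Options 1 and 2 above have base-R analogues: aggregate() returns one row per player, and ave() repeats the per-player value on every row. A sketch on hypothetical data (not the plyr baseball set):

```r
bbDemo <- data.frame(
  id   = c("p1", "p1", "p1", "p2", "p2"),
  year = c(1990, 1991, 1993, 1985, 1986)
)

# Option 1 analogue: one row per player
LastYears <- aggregate(year ~ id, data = bbDemo, FUN = max)
names(LastYears)[2] <- "MaxYear"

# Option 2 analogue: same per-player value on every existing row;
# the anonymous function mirrors Option 3 by ignoring missing years
bbDemo$MaxYear <- ave(bbDemo$year, bbDemo$id, FUN = function(x) max(x, na.rm = TRUE))
```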
Programming and Debugging (DRAFT)
Conditional branching
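if() is not vectorized: it needs a single TRUE/FALSE condition, e.g., if (x < 0) -x else x
ifelse() is vectorized: ifelse(x < 0, -x, x)
Logical operators (& and | are vectorized; && and || use single values and short-circuit):
!x   x & y   x && y   x | y   x || y   xor(x, y)
Comparison operators: ==  !=  <  >  <=  >=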
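A short sketch of the vectorized versus single-value forms listed above; in recent R versions, a condition of length greater than 1 in if() or && is an error, so vector tests are wrapped in any() or all():

```r
x <- c(-2, 0, 3)

absx <- ifelse(x < 0, -x, x)            # 2 0 3, element by element

# if() needs one TRUE/FALSE value
msg <- if (any(x < 0)) "has negatives" else "all non-negative"

both <- (x > -1) & (x < 2)              # FALSE TRUE FALSE (vectorized)
ok   <- length(x) > 0 && x[1] < 0       # TRUE; x[1] is only checked if length(x) > 0
xor(TRUE, FALSE)                        # TRUE
```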
Functions
return()
Control structures
switch()
convert2meters <- function(x, units = c("inches", "feet", "yards", "miles")) {
  units <- match.arg(units)
  switch(units,
    inches = x * 0.0254,
    feet = x * 0.3048,
    yards = x * 0.9144,
    miles = x * 1609.344)
}
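switch() also supports fall-through and a default value, which convert2meters() does not need because match.arg() already validates units. A sketch with a hypothetical helper, size():

```r
# Hypothetical helper: character switch with fall-through and a default
size <- function(x) {
  switch(x,
    small = ,              # empty value: falls through to the next non-empty one
    medium = "fits",
    large = "too big",
    "unknown")             # unnamed last argument is the default
}

size("small")   # "fits"
size("large")   # "too big"
size("huge")    # "unknown"

# Numeric switch: picks an argument by position
switch(2, "first", "second", "third")   # "second"
```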
for()
- f = 1 #start at 1, not 0; a product that starts at 0 stays 0
- for (i in 1:10) {f = f * i} #f is now factorial(10)
while()
fact3 <- function(x) {
  # check length first so the other tests see a single value
  if (length(x) != 1 || !is.numeric(x) || x != floor(x) || x < 0)
    stop("x must be a non-negative integer")
  i <- f <- 1 # initialize
  while (i <= x) {
    f <- f * i # accumulate product
    i <- i + 1 # increment counter
  }
  f # return result
}
repeat
fact4 <- function(x) {
  # check length first so the other tests see a single value
  if (length(x) != 1 || !is.numeric(x) || x != floor(x) || x < 0)
    stop("x must be a non-negative integer")
  i <- f <- 1 # initialize
  repeat {
    f <- f * i # accumulate product
    i <- i + 1 # increment counter
    if (i > x) break # termination test
  }
  f # return result
}
recursion
fact5 <- function(x) {
  if (x <= 1) 1 # termination condition
  else x * fact5(x - 1) # recursive call
}
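A quick way to check loop and recursive versions like those above is to compare them against base::factorial(). A sketch using compact, hypothetical variants (fact_for, fact_rec) rather than the exact functions above:

```r
# Compact loop and recursive factorials (hypothetical names)
fact_for <- function(x) {
  f <- 1
  for (i in seq_len(x)) f <- f * i   # seq_len(0) is empty, so fact_for(0) is 1
  f
}
fact_rec <- function(x) if (x <= 1) 1 else x * fact_rec(x - 1)

# Compare with base::factorial() for 0..10
n <- 0:10
loopVals <- vapply(n, fact_for, numeric(1))
recVals  <- vapply(n, fact_rec, numeric(1))
all.equal(loopVals, factorial(n))
all.equal(recVals, factorial(n))
```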
Function definitions (NEED)
Debugging
browser()
debug() & undebug()
system.time()
debugger() with options(error=dump.frames) & options(error=NULL)
Rprof w/tempfile() & unlink() & summaryRprof()
traceback()
str()
class()
Iterative Procedures
Create and save (to pdf) individual subject plots
In this example, we use the baseball dataset. In this dataset, each baseball player has n rows for each of the n years they played ball. To make subject by subject plots of runs (r) by at bats (ab), do the following:
- xlim = range(baseball$ab, na.rm=TRUE) #common axis limits for all players
- ylim = range(baseball$r, na.rm=TRUE)
- MakePlot = function(df)
- {
- plot(df$ab, df$r, xlim = xlim, ylim = ylim, xlab = 'at bat', ylab = 'runs')
- title(df$id[1])
- }
- pdf('c:\\paths.pdf', width = 8, height = 4) #print to pdf
- d_ply(.data= baseball, .variables = c('id'), .fun = failwith(NA, MakePlot), .print = TRUE)
- dev.off() #turn off output to pdf
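Without plyr, split() plus lapply() gives the same one-plot-per-subject loop. A sketch on hypothetical player data, writing to a temporary PDF instead of c:\\paths.pdf:

```r
# Hypothetical per-player data (ab = at bats, r = runs), not the plyr baseball set
bb <- data.frame(
  id = rep(c("p1", "p2"), each = 3),
  ab = c(100, 200, 300, 150, 250, 350),
  r  = c(10, 25, 40, 12, 30, 45)
)
xlim <- range(bb$ab, na.rm = TRUE)  # shared limits so panels are comparable
ylim <- range(bb$r, na.rm = TRUE)

MakePlot <- function(df) {
  plot(r ~ ab, data = df, xlim = xlim, ylim = ylim, xlab = "at bat", ylab = "runs")
  title(df$id[1])
}

out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)
invisible(lapply(split(bb, bb$id), MakePlot))  # one page per player
dev.off()
```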
Estimate linear models on individual subjects
In this example, we use the baseball dataset. In this dataset, each baseball player has n rows for each of the n years they played ball. To estimate a linear model for each player regressing runs (r) on at bats (ab) and save models in a list, do the following:
- DoLM = function(df) {lm(r ~ ab, data=df)}
- Models = dlply(.data= baseball, .variables= c(‘id’), .fun= DoLM)
NOTES: Defined DoLM outside of plyr call to demonstrate this functionality. Could include multiple lines of code in DoLM if needed.
Extract parameters from list of linear models with plyr
In this example, we use the baseball dataset. We first make a list of simple linear models within subject as in the previous example:
- DoLM = function(df) {lm(r ~ ab, data=df)}
- Models = dlply(.data= baseball, .variables= c(‘id’), .fun= DoLM)
To extract the parameters (and model r-squared) and save as variables in a data frame, do the following:
- rsq = function(x) summary(x)$r.squared
- Parameters <- ldply(Models, function(x) c(coef(x), rsquare = rsq(x)))
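The dlply()/ldply() steps above have a base-R equivalent using split(), lapply(), and do.call(rbind, ...). A sketch on simulated data (the runs are generated with rnorm() purely for illustration):

```r
set.seed(1)
bb <- data.frame(
  id = rep(c("p1", "p2", "p3"), each = 6),
  ab = rep(seq(100, 350, by = 50), times = 3)
)
bb$r <- 0.1 * bb$ab + rnorm(nrow(bb), sd = 2)  # simulated runs

# One model per player, stored in a named list
Models <- lapply(split(bb, bb$id), function(df) lm(r ~ ab, data = df))

# Coefficients and R-squared, one row per player
rsq <- function(x) summary(x)$r.squared
Parameters <- do.call(rbind, lapply(Models, function(x) c(coef(x), rsquare = rsq(x))))
Parameters <- data.frame(id = rownames(Parameters), Parameters, row.names = NULL)
```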
John Curtin’s R Reference Card
R Installation and Workspace
utils::install.packages() installs the package or packages listed. Must load package (using library()) after installation.
- install.packages('car', dependencies = TRUE)
base::library() loads a package into the workspace for use or lists available packages
- library(car)
- library() #lists all installed packages
base::detach() removes the package from the workspace (search path).
- detach('package:car')
base::search() returns list of attached packages and dataframes.
- search()
base::ls() returns names of objects in workspace.
- ls()
base::rm() removes an object from the workspace.
- rm(dData) #removes dataframe dData
- rm(dData, mLM) #removes dData and mLM
- rm(list = ls()) #removes all objects in workspace (with no warnings).
base::options() get or set options for R
- options(digits=4) #sets digits options to 4
- options() #returns all options
- options('digits') #returns option setting for digits
- names(options()) #returns the names of all options
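options() returns the previous values when it sets an option, which makes it easy to change a setting temporarily and restore it. A sketch:

```r
old <- getOption("digits")
op  <- options(digits = 4)   # set digits; op holds the previous value
now <- getOption("digits")   # 4
shown <- format(pi)          # "3.142" with 4 significant digits
options(op)                  # restore the previous setting
```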
base::source() reads and evaluates R code from the named file. Typically used to load function libraries that are not in packages.
- source('P:\\Methods\\Statistics\\R\\functions\\CurtinGLM.R')
utils::str() returns the structure of an object.
- str(dData) #return structure of the dataframe dData
base::class() returns the class of an object. Useful for debugging.
- class(dData)
- class(mLM)
utils::methods() lists all available methods for an S3 generic function, or all methods for a class.
- methods(summary) #methods for the generic summary()
- methods(class='lm') #methods available for lm objects
utils::data() loads specified data sets, or list the available data sets.
- data() #lists available data sets
- data(USArrests) #loads USArrests data.frame
grDevices::graphics.off() closes all graphic devices.
grDevices::dev.off() closes current graphic device.
ctrl-L clears the console.
rm(list = ls()) clears the workspace
Help
The CRAN Task Views webpage provides overviews on various topics in R.
utils::help() provides help on functions. ? is a shortcut.
- help(lm)
- ?lm
- help('for')
- help(package= Hmisc) #help on a package
utils::apropos() finds functions or other objects by partial name.
- apropos('lm')
- apropos('log')
utils::help.search() provides a broader search of a topic in all installed packages. ?? is a shortcut.
- help.search('log')
- ??'linear model'
base::args() displays the argument names and corresponding default values of a function.
- args(lm)
utils::RSiteSearch() provides a search of websites, mailing lists, etc.
- RSiteSearch('loglinear', 'functions')
General Useful Functions
base::sign() returns a vector with the signs of the corresponding elements in its single argument (the sign of a real number is 1, 0, or -1 if the number is positive, zero, or negative, respectively).
- sign(c(-2, -1, 0, 1, 2))
base::abs() returns a vector with the absolute values of elements in its single argument.
- abs(c(-2, 0, 2))
base::identical(x, y) tests whether R objects x and y are exactly equal. Helpful when comparing vectors, where == would return a vector but identical() returns a single TRUE or FALSE.
- identical(1, 1)
- identical(c(1,2), c(1,3))
base::is.element(x, y) is equivalent to x %in% y and returns a logical vector the length of x.
- is.element(c(1,2), c(1,3,5,7))
base::is.na() indicates which elements are missing.
- is.na(c(1,NA,3))
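base::unique() returns its argument with duplicate elements (or duplicate rows of a dataframe) removed.
- unique(c(1, 2, 2, 3, 1)) #returns 1 2 3
base::sort(x, decreasing = FALSE, index.return = FALSE, …) sorts/orders a vector or factor (partially) into ascending (or descending) order. For ordering along more than one variable (e.g., for sorting dataframes), see order(). index.return=TRUE returns a list with the sorted values ($x) and their original indices ($ix).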
- sort(c(2,1,5))
- sort(c(2,1,5), index.return=TRUE)
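For sorting a dataframe, order() returns the row indices that put one or more variables in order. A sketch on a hypothetical dataframe:

```r
d <- data.frame(
  SubID = c(3, 1, 2, 4),
  Sex   = c("M", "F", "M", "F"),
  Age   = c(22, 30, 19, 25)
)

sort(d$Age, decreasing = TRUE)   # 30 25 22 19 (values only)
ageOrder <- order(d$Age)         # 3 1 4 2 (row indices that sort Age)

# Sort rows by Sex, then by Age descending within Sex
dSorted <- d[order(d$Sex, -d$Age), ]
```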
MASS::fractions() finds rational approximations to the components of a real numeric object.
- fractions(c(.5, .33333333))
Dataframes
Indexing and manipulating
utils::fix() allows simple editing of an existing dataframe via a crude text editor.
- fix(dData)
Creating a new dataframe with a subset of variables.
- dNew = dData[,c('SubID', 'BevGroup', 'Sex', 'FPS1')]
Creating a new dataframe w/o specific row #s.
- dNew = dData[-c(1:5,10),] #remove rows 1-5, 10
Creating a new dataframe based on values of a variable.
- dNew = dData[dData$Age > 21,] #Select participants with Age > 21
- dNew = dData[dData$Age > mean(dData$Age),] #Select participants with Age > mean Age
Remove a variable from dataframe.
- dData$SubID <- NULL
Create a data frame from all combinations of the supplied vectors or factors
- expand.grid(c('control', 'placebo', 'alcohol'), c('word first', 'color first')) #returns:
       Var1        Var2
  1 control  word first
  2 placebo  word first
  3 alcohol  word first
  4 control color first
  5 placebo color first
  6 alcohol color first
Reshaping dataframe from Wide to Long format.
- dLong <- melt(dWide, id.vars = c("SubID", "Sex", "Alcohol", "Baseline"), variable.name = "Condition", value.name = "Startle")
1st argument = wide-format data frame
2nd argument = id.vars = variables to keep as identifier columns (repeated on every row of the new data frame); all other columns are melted
3rd argument = variable.name = name of the new column that holds the former column names (here, Condition)
4th argument = value.name = name of the new column that holds the values (here, Startle)
Reshaping dataframe from Long to Wide format.
- dWide <- dcast(dLong, SubID + Sex + Alcohol + Baseline ~ Condition, value.var = "Startle")
1st argument = long-format data frame
2nd argument (formula) = variables on the left of the ~ stay as identifier columns (one row per unique combination).
2nd argument (formula) = variables on the right of the ~ are spread into separate columns, one per level.
3rd argument = value.var = the column whose values fill the new wide columns.
NOTE: Need to add information on merging dataframes
Working with Variables (DRAFT)
General manipulations
base::cbind() combines vectors together as columns
- test = cbind(dData$X1, dData$X2, dData$X3) #creates a three-column matrix from X1, X2, and X3 in dData
String Pattern Matching
- search for a string pattern
- foo <- c('a', 'b', 'c') #create a character vector
- grep('a', foo, value=FALSE) #returns the indices of the elements of foo that contain 'a'
- grep('a', foo, value=TRUE) #returns the elements of foo that contain 'a'
- grepl('a', foo) #returns a logical vector (TRUE/FALSE for every element of foo) indicating which elements contain 'a'
- search for a string pattern with wildcards
- foo <- c('abc', 'def', 'ghi') #create a character vector
- grep(glob2rx('*e*'), foo) #returns the indices of the elements of foo containing 'e' (glob2rx() converts a wildcard pattern to a regular expression)
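The pattern-matching calls above differ mainly in what they return. A sketch on a hypothetical character vector:

```r
foo <- c("abc", "def", "ghi", "bad")

idx  <- grep("a", foo)                 # 1 4 (indices of elements containing "a")
vals <- grep("a", foo, value = TRUE)   # "abc" "bad" (the matching elements)
hits <- grepl("a", foo)                # TRUE FALSE FALSE TRUE
starts <- grep("^a", foo, value = TRUE)  # "abc" (regular expression: starts with "a")

wild <- grep(glob2rx("*e*"), foo)      # 2 (wildcard pattern converted to a regex)
```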
Quantitative Variable manipulations
base::scale() centers and/or scales (standardizes) the columns of a matrix. It returns a matrix, so index the column with [,1] (not [1], which keeps only the first value) when creating a variable in a data.frame.
- dData$cX1 = scale(dData$X1, center=TRUE, scale=FALSE)[,1] #mean center X1
- dData$zX1 = scale(dData$X1, center=TRUE, scale=TRUE)[,1] #standardize X1
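The indexing pitfall noted above is easy to see on a small hypothetical variable: [,1] keeps the whole column, while [1] keeps one value that R then recycles down the column.

```r
dData <- data.frame(X1 = c(2, 4, 6, 8))

dData$cX1 <- scale(dData$X1, center = TRUE, scale = FALSE)[, 1]  # -3 -1 1 3
dData$zX1 <- scale(dData$X1)[, 1]                                # (X1 - mean) / sd

# Pitfall: [1] keeps only the first element, recycled to every row
dData$bad <- scale(dData$X1, center = TRUE, scale = FALSE)[1]    # -3 -3 -3 -3
```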
car::recode() recodes a numeric vector, character vector, or factor according to simple recode specifications.
- dData$NewX1 = recode(dData$X1, 'lo:50=1; 51:hi=2')
- dData$NewX2 = recode(dData$X2, 'c(1,2)="A"; else="B"')
base::rowMeans() is used to create a mean across a row (i.e., across variables in a data.frame). See also colMeans(), rowSums(), & colSums().
- dData$MeanX123 = rowMeans(dData[, c('X1', 'X2', 'X3')]) #pass the variables as one dataframe or matrix
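rowMeans() and rowSums() take a single matrix or dataframe, and na.rm controls how missing values are handled. A sketch on hypothetical data:

```r
dData <- data.frame(X1 = c(1, 2, NA), X2 = c(3, 4, 5), X3 = c(5, 6, 7))
vars <- c("X1", "X2", "X3")

dData$MeanX123   <- rowMeans(dData[, vars])                # 3 4 NA
dData$MeanX123nm <- rowMeans(dData[, vars], na.rm = TRUE)  # 3 4 6 (mean of available values)
dData$SumX123    <- rowSums(dData[, vars])                 # 9 12 NA
```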
Summary Statistics (DRAFT)
base::print() is used to print the object to the screen. It is a generic function whose method depends on the object
- print(c(1.234, 2.3456, 3),digits=2)
base::summary() is used to summarize an object. It is a generic function whose method depends on the object's class.
- summary(dData) #provide summary statistics for variables in a dataframe
- summary(mLM) #provide summary statistics for a linear model object
psych::describe() provides many common summary statistics for variables in a dataframe.
- describe(dData)
descriptives by group
- describeBy(dData, group,…) #psych::describeBy(); describe.by() is an older, deprecated name
base::table() provides cross tabs of counts for factors
- table(Sex)
- table (BevGroup, Sex)
stats::xtabs() creates cross tabs of counts from a formula
- xtabs(~ BevGroup + Sex, data=dData)
base::apply() returns a vector or array or list of values obtained by applying a function to margins of an array or data.frame. See also lapply(), sapply(), & tapply()
- apply(dData, 1, mean) #mean across each row in dData
- apply(dData, 2, sum) #sum down each column in dData
- apply(dData, 1, function(x) 7*mean(x, na.rm=TRUE)) #using anon function across rows in dData
- apply(dData, 1, make.scale) #applying user-defined function named make.scale()
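A small sketch of apply() and the related functions mentioned above, on a hypothetical two-column dataframe:

```r
dData <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))

rowM <- apply(dData, 1, mean)   # row means: 2.5 3.5 4.5
colS <- apply(dData, 2, sum)    # column sums: A = 6, B = 15
colX <- sapply(dData, max)      # one value per column: A = 3, B = 6
rng  <- lapply(dData, range)    # list with a min/max pair per column

# tapply(): summarize one variable within groups of another
score <- c(10, 20, 30, 40)
group <- c("g1", "g1", "g2", "g2")
grpM  <- tapply(score, group, mean)   # g1 = 15, g2 = 35
```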
Bivariate Statistics (DRAFT)
Correlation
stats::cor()
- cor(D1)
- cor(D1[,c("mp1_con", "mp1_nem")])
stats::cor.test()
psych::corr.test()
RcmdrMisc::rcorr.adjust()
psych::fisherz()
psych::fisherz2r()
psych::r.test()
psychometric::CIr()
- CIr(.5, n=100)
- CIr(.3, n=100, level= 0.99)
stats::p.adjust()
corpcor::cor2pcor()
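Several of the functions listed above can be sketched with base R alone. This example uses made-up data; the Fisher z interval is the standard approximation and is not claimed to match psychometric::CIr() internally:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
r  <- cor(x, y)
ct <- cor.test(x, y)            # Pearson test; ct$estimate equals r

# Fisher z transformation and an approximate 95% CI for r = .5, n = 100
z  <- atanh(0.5)                # same transform as psych::fisherz(); about 0.5493
se <- 1 / sqrt(100 - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

# Adjust a set of p-values for multiple comparisons
p <- c(0.01, 0.02, 0.03)
pBonf <- p.adjust(p, method = "bonferroni")   # 0.03 0.06 0.09
pHolm <- p.adjust(p, method = "holm")         # 0.03 0.04 0.04
```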
Means comparison
stats::t.test()
Linear Models (DRAFT)
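car::bcPower() (Box-Cox power transformation; formerly car::box.cox())
MASS::boxcox()
car::boxTidwell() (formerly car::box.tidwell())
car::ncvTest()
car::qqPlot() (formerly car::qq.plot())
car::crPlots() (formerly car::cr.plots())
car::spreadLevelPlot() (formerly car::spread.level.plot())
stats::model.matrix() returns the design matrix for a formula (e.g., the dummy codes for factor type): model.matrix(~ type, data=dData)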
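To see the dummy coding model.matrix() produces, and that lm() uses the same columns, here is a sketch with the built-in iris data (Species is a three-level factor):

```r
mm <- model.matrix(~ Species, data = iris)
colnames(mm)   # "(Intercept)" "Speciesversicolor" "Speciesvirginica"

# lm() builds the same design matrix for a factor predictor
m <- lm(Sepal.Length ~ Species, data = iris)
names(coef(m))
```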
Graphing (DRAFT)
There are many sample figures and additional resources for Graphing in R in our Wiki. In addition, the CRAN Graphics Task View provides a nice overview of graphing in R.
Options
graphics::par() sets and returns display (and many other) options for R. Type help(par) for detailed information on all parameters.
- par() #returns names and values of all current options
- par(cex.lab=1.5, cex.axis=1.2, lwd=2)
mfrow is used to produce multipanel figures filled by row (see also mfcol).
- par(mfrow = c(2,2)) #set options to produce a four-panel figure filled top left, top right, bottom left, bottom right
- par(mfrow = c(1,2)) #set options to produce two panels in horizontal orientation
- par(mfrow = c(2,1)) #set options to produce two panels in vertical orientation
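par() returns the old settings when you change them, so a multipanel layout can be undone afterwards. A sketch drawing to a temporary PDF (an assumption so it runs without a screen device) with the built-in faithful data:

```r
out <- tempfile(fileext = ".pdf")
pdf(out, width = 8, height = 4)

op <- par(mfrow = c(1, 2))      # two panels side by side; op holds the old settings
hist(faithful$eruptions, main = "Eruptions")
hist(faithful$waiting, main = "Waiting")
layoutUsed <- par("mfrow")      # 1 2

par(op)                         # restore the previous settings
dev.off()
```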
Colors
palette()
rainbow()
gray()
colors()
High-level plotting functions
graphics::plot()
- plot(aex_tot ~ sss_tot, xlab="MPQ Negative Emotionality", ylab="Anger Expression")
hist()
- hist(Data$aex_tot, main="Anger Expression")
Low-level plotting functions
graphics::abline()
- abline(lm(aex_tot ~ sss_tot), col="red", lwd=4)
- abline(h=mean(D1$aex_tot), col="blue", lwd=4)
graphics::lines()
graphics::points()
graphics::axis()
graphics::legend()
text()
polygon()
curve()
arrows()
See also p.arrows() in sfsmisc package
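The low-level functions above add elements to an existing plot. A sketch combining several of them on the built-in mtcars data, drawn to a temporary PDF (an assumption so it runs anywhere):

```r
out <- tempfile(fileext = ".pdf")
pdf(out)

plot(mpg ~ wt, data = mtcars, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)                    # fitted regression line
abline(h = mean(mtcars$mpg), col = "blue", lwd = 2)  # horizontal line at the mean
text(5, 30, "example label")                         # text at (x, y) in data units
legend("topright", legend = c("lm fit", "mean mpg"),
       col = c("red", "blue"), lwd = 2)

dev.off()
```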
Other useful graphing functions
identify()
- identify(mp1_con,sss_tot)
- identify(D1$mp1_con, D1$sss_tot, labels=row.names(D1))
density()
- plot(density(Data$aex_tot), main="Anger Expression")
jitter()
Create multi-panel plot
locator()
plot(allEffects(mLM), ask=FALSE)
Mapping
https://rstudio.github.io/leaflet/
Building Packages
Read “Writing R Extensions” for more information. http://cran.r-project.org/doc/manuals/R-exts.html#Top
Read this for putting packages on CRAN http://cran.r-project.org/doc/manuals/R-exts.html#Submitting-a-package-to-CRAN
Using devtools to check and build package
- Choose version number for release. See: [1]
- Make sure package is up to date on Sourceforge
- export package to C:\RBuild\lmSupport\
- library(devtools)
- set working directory in RStudio to package folder (e.g., C:\RBuild\lmSupport\)
- check(document=FALSE)
- build()
- Upload to CRAN here: http://cran.r-project.org/submit.html
Other Notes
- To install the source tarball in R, use install.packages('P:/Methods/R/lmSupport/lmSupport_2.9.8.tar.gz', repos=NULL, type='source')
Other R Reference Cards
R Reference Card by Tom Short
Regression Reference Card by Vito Ricci
Short R Reference Card by Jonathan Baron