Laboratory of Microbial Genomics and Big Data (Kangwon National University)

R: Basics - Labeling, Missing Data, Filtering - by Eun Bae Kim (08/22/2018)
 

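# Labeling values
# "Use the factor() function for nominal data and the ordered() function for ordinal data."
numData = c(1, 2, 3, 3, 2, 3, 1, 2)
numData
factor(numData)                      # levels default to the sorted unique values: 1 2 3
# levels and labels are paired by position: 3 = "Excellent", 2 = "Fair", 1 = "Bad"
labelData1 = factor(numData, levels = c(3, 2, 1), labels = c("Excellent", "Fair", "Bad"))
labelData1
# ordered() also ranks the levels in the order given, so here Excellent < Fair < Bad
labelData2 = ordered(numData, levels = c(3, 2, 1), labels = c("Excellent", "Fair", "Bad"))
numData                              # the original numeric codes are unchanged
labelData2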
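
# Added sketch (not part of the original script): what an ordered factor allows
# that a plain factor does not. The name `ratings` is hypothetical, and the levels
# are listed from lowest to highest on the assumption that Bad < Fair < Excellent
# is the intended ranking.
ratings = ordered(c(1, 2, 3, 3, 2, 3, 1, 2), levels = c(1, 2, 3), labels = c("Bad", "Fair", "Excellent"))
ratings >= "Fair"    # element-wise comparison against a level; only valid for ordered factors
table(ratings)       # counts per level, reported in level order
max(ratings)         # highest level present in the data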



# Handling Missing Data ###################################################################
numData1 = c(4, 3, 5, 7, 6, 8, NA, 3, 4, 3)
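is.na(numData1)                # TRUE where a value is missing
mean(numData1)                 # returns NA because one value is missing
mean(numData1, na.rm = TRUE)   # drops the NA before averaging

complete.cases(numData1)       # TRUE for non-missing elements
!complete.cases(numData1)      # TRUE for missing elements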
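
# Added sketch (not part of the original script): counting and locating missing
# values before deciding how to handle them. The name `scores` is hypothetical.
scores = c(4, 3, 5, 7, 6, 8, NA, 3, 4, 3)
anyNA(scores)           # TRUE if at least one value is missing
sum(is.na(scores))      # number of missing values (TRUE counts as 1)
which(is.na(scores))    # position(s) of the missing values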

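# The header line has one field fewer than the data rows, so read.table() uses the first column as row names
dfData = read.table(header = TRUE, text =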
"Korean English Mathematics
EBKim 89 85 100
GDJin 87 70 80
JBPark 96 80 90
IHYou 88 100 85
JHWon NA 97 78
SJChoi 76 92 83")
dfData

dfData$Korean
dfData[complete.cases(dfData$Korean), ]   # only rows without NA in the "Korean" column

dfDataOmit = na.omit(dfData)              # new data frame with every row that contains an NA removed
dfDataOmit
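
# Added sketch (not part of the original script): checking how many values are
# missing per column before dropping rows with na.omit(). The data frame `dfToyNA`
# and its values are hypothetical.
dfToyNA = data.frame(Korean = c(89, NA, 76), English = c(85, 97, NA), row.names = c("S1", "S2", "S3"))
colSums(is.na(dfToyNA))     # number of NA values in each column
rowSums(is.na(dfToyNA))     # number of NA values in each row
nrow(na.omit(dfToyNA))      # rows that survive na.omit()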

# Filtering Data
dfData = read.table(header = TRUE, text =
"Korean English Mathematics
EBKim 89 85 100
GDJin 50 70 80
JBPark 96 80 90
IHYou 88 100 85
JHWon 50 97 78
SJChoi 76 92 83")
dfData

dfData[dfData$Korean > 50, ]   # keep rows whose Korean score is above 50
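
# Added sketch (not part of the original script): filtering on more than one
# column, with logical indexing and with subset(). The data frame `dfToy` and its
# values are hypothetical.
dfToy = data.frame(Korean = c(89, 50, 96), English = c(85, 70, 80), row.names = c("S1", "S2", "S3"))
dfToy[dfToy$Korean > 50 & dfToy$English >= 85, ]               # both conditions must hold
subset(dfToy, Korean > 50 | English >= 85, select = Korean)    # either condition; keep one column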

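numData2 = c(4, 3, 5, 7, 6, 8, NA, 3, 4)
numData2 > 4                                # comparing NA with a number yields NA
numData2[numData2 > 4]                      # the NA is carried into the result
!is.na(numData2)                            # TRUE for non-missing values
numData2[!is.na(numData2)]                  # drop the missing values
numData2[numData2 > 4 & !is.na(numData2)]   # values above 4, with the NA excluded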
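
# Added sketch (not part of the original script): which() is another way to keep
# NA out of a filtered vector, because it returns only the positions where the
# condition is TRUE. The name `vals` is hypothetical.
vals = c(4, 3, 5, 7, 6, 8, NA, 3, 4)
which(vals > 4)          # positions where the condition holds; the NA position is skipped
vals[which(vals > 4)]    # values above 4, without NA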



Kangwon National University