Sum across multiple columns with dplyr

Nice programing

nicepro 2020. 10. 26. 21:02

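My question involves summing values across multiple columns of a data frame and creating a new column that holds this sum, using dplyr. The entries in the columns are binary (0, 1). I am thinking of a row-wise analog of dplyr's summarise_each or mutate_each functions. Here is a minimal example of the data frame: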

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

> df
   x1 x2 x3 x4 x5
1   1  1  0  1  1
2   0  1  1  0  1
3   0 NA  0 NA NA
4  NA  1  1  1  1
5   0  1  1  0  1
6   1  0  0  0  1
7   1 NA NA NA NA
8  NA NA NA  0  1
9   0  0  0  0  0
10  1  1  1  1  1

I could use something like:

df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)

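But this means writing out the name of every column, and I have about 50 of them. In addition, the column names change between iterations of the loop in which I want to run this operation, so I would like to avoid supplying column names at all.

How can I do this most efficiently? Any help would be greatly appreciated.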

How about this?

Sum down each column:

# funs() is deprecated as of dplyr 0.8.0; across() requires dplyr >= 1.0.0
df %>%
   replace(is.na(.), 0) %>%
   summarise(across(everything(), sum))

Sum across each row:

df %>%
   replace(is.na(.), 0) %>%
   mutate(sum = rowSums(.[1:5]))
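
Note that the row-wise pipeline above also overwrites the NA entries with zeros in the data frame it returns. If the original values should stay intact, a minimal sketch along the following lines may help. It assumes dplyr 1.0.0 or later (for across()), and the three-row data frame is a shortened stand-in for the question's data, not the original.

```r
library(dplyr)

# Shortened stand-in for the question's data frame (assumption: 3 rows, 3 columns)
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
)

# across(everything()) hands all columns to rowSums() as a data frame;
# na.rm = TRUE skips missing values instead of replacing them in df
result <- df %>%
  mutate(sumrow = rowSums(across(everything()), na.rm = TRUE))

print(result)
```

Because no column is named, the same call keeps working when the column names change between loop iterations.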

To sum variables whose names follow a particular pattern, I use regular-expression matching. For example:

df <- df %>% mutate(sum1 = rowSums(.[grep("x[3-5]", names(.))], na.rm = TRUE),
                    sum_all = rowSums(.[grep("x", names(.))], na.rm = TRUE))

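This way you can create more than one variable, each holding the sum of a particular group of variables in the data frame.

To sum only specific columns, I would use something like this: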
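
One thing to watch with the regex approach: an unanchored pattern such as "x" matches any column whose name contains that letter. The sketch below uses base R grep() on a hypothetical set of column names (max_score is invented for illustration) to show how anchoring narrows the match.

```r
# Hypothetical column names; "max_score" is made up to show an accidental match
cols <- c("x1", "x2", "x10", "max_score")

# Unanchored: matches every name that contains an "x" anywhere
loose <- grep("x", cols, value = TRUE)

# Anchored: only names that are "x" followed by one or more digits
anchored <- grep("^x[0-9]+$", cols, value = TRUE)

print(loose)
print(anchored)
```

The anchored pattern can be dropped into the rowSums(.[grep(...)]) calls above unchanged.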

library(dplyr)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))
df %>% select(x3:x5) %>% rowSums(na.rm=TRUE) -> df$x3x5.total
head(df)

This way you can use the syntax of dplyr::select.

I run into this problem often, and the easiest way to handle it is to use the apply() function inside a mutate command.

library(tidyverse)
df=data.frame(
  x1=c(1,0,0,NA,0,1,1,NA,0,1),
  x2=c(1,1,NA,1,1,0,NA,NA,0,1),
  x3=c(0,1,0,1,1,0,NA,NA,0,1),
  x4=c(1,0,NA,1,0,0,NA,0,0,1),
  x5=c(1,1,NA,1,1,1,NA,1,0,1))

df %>%
  mutate(sum = select(., x1:x5) %>% apply(1, sum, na.rm=TRUE))

Here you could use whatever you want to select the columns using the standard dplyr tricks (e.g. starts_with() or contains()). By doing all the work within a single mutate command, this action can occur anywhere within a dplyr stream of processing steps. Finally, by using the apply() function, you have the flexibility to use whatever summary you need, including your own purpose built summarization function.
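
As an illustration of the "purpose built summarization function" point, here is a small sketch that counts the non-missing entries per row instead of summing them. The helper name n_observed and the shortened data frame are assumptions made for this example.

```r
library(dplyr)

# Shortened stand-in for the question's data frame (assumption)
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
)

# Hypothetical custom summary: number of non-missing values in a vector
n_observed <- function(v) sum(!is.na(v))

# apply(1, f) calls f once per row of the selected columns
result <- df %>%
  mutate(n_obs = select(., x1:x3) %>% apply(1, n_observed))

print(result)
```

Any function that takes a vector and returns a single value can be swapped in for n_observed.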

Alternatively, if the idea of using a non-tidyverse function is unappealing, then you could gather up the columns, summarize them and finally join the result back to the original data frame.

df <- df %>% mutate( id = 1:n() )   # Need some ID column for this to work

df <- df %>%
  group_by(id) %>%
  gather('Key', 'value', starts_with('x')) %>%
  summarise( Key.Sum = sum(value) ) %>%
  left_join( df, . )

Here I used the starts_with() function to select the columns and calculated the sum and you can do whatever you want with NA values. The downside to this approach is that while it is pretty flexible, it doesn't really fit into a dplyr stream of data cleaning steps.
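
As written, sum(value) returns NA for any row that contains a missing value. The sketch below is one possible variant, not the answer's own code: it assumes tidyr 1.0.0 or later, where pivot_longer() supersedes gather(), and it drops NAs with na.rm = TRUE. The data frame and the names key and key_sum are stand-ins chosen for this example.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for the question's data, plus the ID column the join needs
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
) %>%
  mutate(id = row_number())

# Reshape to one row per (id, column), then sum within each id
sums <- df %>%
  pivot_longer(starts_with("x"), names_to = "key", values_to = "value") %>%
  group_by(id) %>%
  summarise(key_sum = sum(value, na.rm = TRUE))

# Attach the per-row sums back onto the wide data frame
result <- left_join(df, sums, by = "id")

print(result)
```

Naming the join key with by = "id" avoids the message that left_join() prints when it has to guess the key.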

Using reduce() from purrr is slightly faster than rowSums and definitely faster than apply, since you avoid iterating over all the rows and just take advantage of the vectorized operations:

library(purrr)
library(dplyr)
iris %>% mutate(Petal = reduce(select(., starts_with("Petal")), `+`))

See this for timings
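
The iris columns contain no missing values, but the question's data does, and `+` propagates NA. A minimal sketch of the difference, assuming it is acceptable to treat NA as 0 (as the earlier answers do) and using a shortened stand-in data frame:

```r
library(dplyr)
library(purrr)

# Shortened stand-in for the question's data frame (assumption)
df <- data.frame(
  x1 = c(1, 0, NA),
  x2 = c(1, NA, NA),
  x3 = c(0, 1, 1)
)

x_cols <- df %>% select(starts_with("x"))

# Plain reduce: a row's sum is NA as soon as one of its values is NA
plain <- reduce(x_cols, `+`)

# Replace NA with 0 column by column, then add the columns together
na_as_zero <- x_cols %>%
  map(~ coalesce(.x, 0)) %>%
  reduce(`+`)

print(plain)
print(na_as_zero)
```

Whether the extra map() step keeps the speed advantage mentioned above is not something this sketch measures.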

Reference URL: https://stackoverflow.com/questions/28873057/sum-across-multiple-columns-with-dplyr
