R: Merge Two Data Frames (2024)

merge {base}    R Documentation

Description

Merge two data frames by common columns or row names, or do other versions of database join operations.

Usage

merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
      incomparables = NULL, ...)

Arguments

x, y

data frames, or objects to be coerced to one.

by, by.x, by.y

specifications of the columns used for merging. See ‘Details’.

all

logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

all.x

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y

logical; analogous to all.x.

sort

logical. Should the result be sorted on the by columns?

suffixes

a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which are not used for merging (appearing in by etc).

no.dups

logical indicating that suffixes are appended in more cases to avoid duplicated column names in the result. This was implicitly false before R version 3.5.0.

incomparables

values which cannot be matched. See match. This is intended to be used for merging on one column, so these are incomparable values of that column.

...

arguments to be passed to or from methods.

Details

merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the "data.frame" method.

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
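As a minimal sketch of the matching rules just described, the following uses two small made-up data frames (orders, customers, a and b are hypothetical names, not part of the original examples). It shows keys held in differently named columns, and the one-row-per-combination behaviour when a key matches more than once on both sides.

```r
# Hypothetical data: 'orders' stores the key in 'cust', 'customers' in 'id'.
orders <- data.frame(cust = c(1, 1, 2, 4),
                     item = c("pen", "ink", "pad", "cap"))
customers <- data.frame(id = c(1, 2, 3),
                        city = c("Oslo", "Lima", "Pune"))

# Keys with different names are given through by.x and by.y.
# Customer 4 has no match in 'customers' and customer 3 has no order,
# so neither appears in the result.
res <- merge(orders, customers, by.x = "cust", by.y = "id")
print(res)

# Multiple matches: key 1 occurs twice in 'a' and three times in 'b',
# so every pairing contributes one row.
a <- data.frame(k = c(1, 1), va = c("a1", "a2"))
b <- data.frame(k = c(1, 1, 1), vb = c("b1", "b2", "b3"))
print(nrow(merge(a, b)))
```

The key column in the result takes its name from x (here cust).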

Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
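A short sketch of merging on row names, assuming two toy data frames whose row labels partly overlap (all names here are illustrative). Both spellings of the row-name specification are shown.

```r
# Hypothetical data frames identified only by their row names.
x <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
y <- data.frame(b = c(10, 20), row.names = c("r2", "r3"))

# The name "row.names" and the number 0 both select the row names as key.
by_rn <- merge(x, y, by = "row.names")
by_0  <- merge(x, y, by = 0)

# The matched row names come back as a leading column called Row.names.
print(by_rn)
```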

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
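The dimension formula can be checked on a small case. This sketch assumes two unrelated toy data frames and passes by = NULL to request the Cartesian product.

```r
# Hypothetical inputs with no key relationship at all.
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"), c = c(TRUE, FALSE))

# by = NULL pairs every row of x with every row of y.
r <- merge(x, y, by = NULL)

# dim(r) equals c(nrow(x) * nrow(y), ncol(x) + ncol(y)).
print(dim(r))
```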

If all.x is true, all the non matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.

If the columns in the data frames not used in merging have any common names, these have suffixes (".x" and ".y" by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.

If a by.x column name matches one of y, and if no.dups is true (as by default), the y version gets suffixed as well, avoiding duplicate column names in the result.
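The suffix rules can be illustrated with a sketch on made-up data. The custom suffixes "_left" and "_right" and all data frame names are assumptions chosen for the example.

```r
# Both inputs carry a non-key column called 'value'.
x <- data.frame(id = 1:3, value = c(10, 20, 30))
y <- data.frame(id = 2:4, value = c(200, 300, 400))

# Default suffixes disambiguate the clashing non-key columns.
default_names <- names(merge(x, y, by = "id"))
print(default_names)

# Custom suffixes are supplied as a length-2 character vector.
custom_names <- names(merge(x, y, by = "id",
                            suffixes = c("_left", "_right")))
print(custom_names)

# no.dups case: the by.x column 'key' shares its name with a
# non-key column of q, so the q column is suffixed as well.
p <- data.frame(key = 1:2, v = c("a", "b"))
q <- data.frame(k = 1:2, key = c("x", "y"))
m <- merge(p, q, by.x = "key", by.y = "k")
print(names(m))
```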

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
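The four join types map onto the all, all.x and all.y arguments as follows. This is a sketch on two hypothetical data frames sharing a key column k, with partially overlapping key values.

```r
# Keys 2 and 3 are shared; 1 is only in x, 4 is only in y.
x <- data.frame(k = c(1, 2, 3), vx = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), vy = c("B", "C", "D"))

inner <- merge(x, y)                # natural/inner join: shared keys only
left  <- merge(x, y, all.x = TRUE)  # left outer join: every row of x
right <- merge(x, y, all.y = TRUE)  # right outer join: every row of y
full  <- merge(x, y, all = TRUE)    # full outer join: every row of both

# Unmatched rows are kept with NA in the other table's columns.
print(full)
```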

Value

A data frame. The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y. If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.
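A small sketch of the ordering rules, using invented data with deliberately shuffled keys. Because the row order under sort = FALSE is unspecified, the example only compares the keys as a set in that case.

```r
# Keys appear in a different order in each input.
x <- data.frame(k = c("b", "c", "a"), vx = 1:3)
y <- data.frame(k = c("c", "a", "b"), vy = 4:6)

# Default: rows sorted on the key, key column first, then x's, then y's.
s <- merge(x, y)
print(s)

# sort = FALSE: same rows, but do not rely on their order.
u <- merge(x, y, sort = FALSE)
print(setequal(as.character(u$k), as.character(s$k)))

# Row names of the result are the automatic 1..n.
print(rownames(s))
```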

Note

This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all.

Currently long vectors are not accepted for inputs, which are thus restricted to less than 2^31 rows. That restriction also applies to the result for 32-bit platforms.

See Also

data.frame, by, cbind.

dendrogram for a class which has a merge method.

Examples

authors <- data.frame(
    ## I(*) : use character columns of names to get sensible sort order
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
authorN <- within(authors, { name <- surname; rm(surname) })
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m0 <- merge(authorN, books))
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
 m2 <- merge(books, authors, by.x = "name", by.y = "surname")
stopifnot(exprs = {
    identical(m0, m2[, names(m0)])
    as.character(m1[, 1]) == as.character(m2[, 1])
    all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ])
    identical(dim(merge(m1, m2, by = NULL)),
              c(nrow(m1)*nrow(m2), ncol(m1)+ncol(m2)))
})

## "R core" is missing from authors and appears only here :
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows

[Package base version 4.4.1 Index]


FAQs

How to combine two data frames into one in R?

Merges with base R

One base R way to do this is with the merge() function, using the basic syntax merge(df1, df2). The first data frame is treated as x and the second as y. With the default arguments the same rows are matched in either order, but the order determines the column order of the result and which data frame all.x and all.y refer to.

How do I combine multiple data frames into one?

In pandas (Python), the concat() function combines DataFrames either vertically (along rows) or horizontally (along columns). It takes a list of DataFrames as input and concatenates them along the specified axis (0 for vertical, 1 for horizontal). In base R, rbind() and cbind() play the corresponding roles.

How do I join more than two data frames in R?

With the dplyr package, multiple data frames can be joined at once by chaining calls to inner_join() with the pipe %>%. For example, if the sales, target, and small_medium_large data frames each have a month column, inner_join() uses month as the column to match on.
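The same chaining can be done without dplyr. As a base R alternative (not the dplyr approach itself), this sketch folds merge() over a list with Reduce(); the data frame names follow the answer, while their contents are invented for illustration.

```r
# Hypothetical monthly tables that all share a 'month' column.
sales  <- data.frame(month = c("Jan", "Feb", "Mar"),
                     sales = c(100, 120, 90))
target <- data.frame(month = c("Jan", "Feb", "Mar"),
                     target = c(110, 110, 110))
small_medium_large <- data.frame(month = c("Jan", "Feb"),
                                 size = c("small", "large"))

# Reduce() applies merge() pairwise, left to right:
# merge(merge(sales, target), small_medium_large).
combined <- Reduce(function(a, b) merge(a, b, by = "month"),
                   list(sales, target, small_medium_large))
print(combined)
```

Because each step is an inner join, only months present in every table survive; passing all = TRUE inside the function would keep the rest.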

How do you match two data frames?

The merge() operation combines two data frames based on one or more common columns, also called keys. By default, the resulting data frame contains only the rows whose keys match in both data frames. The merge() function is similar to the SQL JOIN operation.

How to merge two datasets?

In SAS, you use a MERGE statement and a BY statement within a DATA step, like this:

DATA New-Dataset-Name (OPTIONS);
  MERGE Dataset-Name-1 (OPTIONS) Dataset-Name-2 (OPTIONS);
  BY Variable(s);
RUN;

You must sort both datasets on the matching variable(s) before merging them.

How do I combine two rows in R data frame by addition?

  1. First, create a data frame.
  2. Then use the plus sign (+) to add the two rows, and store the sum in one of them.
  3. After that, remove the row that is no longer required by subsetting with single square brackets.
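The three steps above can be sketched as follows, assuming a small made-up data frame in which every column is numeric (addition would fail on character or factor columns).

```r
# Step 1: a hypothetical all-numeric data frame.
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Step 2: add row 2 to row 1 and store the sum in row 1.
df[1, ] <- df[1, ] + df[2, ]

# Step 3: drop row 2, which is no longer needed.
df <- df[-2, ]
print(df)
```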

How to append 3 DataFrames in R?

To append (add) rows from one or more data frames to another, use the bind_rows() function from dplyr. This function is especially useful for combining survey responses from different individuals. bind_rows() matches columns by name, so the data frames can have different numbers and names of columns; columns missing from one input are filled with NA.

How can you merge two data frames without losing any rows?

Outer Join

If a row doesn't have a match in the other data frame based on the key column(s), you won't lose the row as you would with an inner join. Instead, the row appears in the merged data frame, with missing values filled in where appropriate (NA in R, NaN in pandas). With merge() in R, this is all = TRUE.

Can you left join 3 tables in R?

Yes. The left_join() function from the dplyr package makes this simple: conduct two left joins, one after the other, to combine all three data frames.

What is the difference between join and merge Dataframes?

In pandas, both join and merge can be used to combine two DataFrames, but the join method combines them on the basis of their indexes by default, whereas the merge method is more versatile and also lets you specify columns, besides the index, to join on for both DataFrames.

Which function is used to merge two data frames?

Pandas DataFrame merge() Method

In R, use the merge() function documented above. In pandas, the DataFrame merge() method does the corresponding job: it combines two DataFrames using the specified join method and returns a new DataFrame.

How do you compare two data frames in R?

One option is the comparedf() function from the arsenal package. It takes two data frames and checks them against each other, and calling summary() on the result reports where and to what extent they differ. The compare package offers a compare() function for a similar purpose.

How do I combine two lists into a Dataframe in R?

The expand.grid() function creates a data frame from all combinations of the supplied vectors or factors. For example, given two vectors named List1 and List2, expand.grid(List1, List2) returns a data frame with one row per combination.
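A brief sketch of expand.grid() on two plain character vectors (the names sizes and colours are invented for the example). Named arguments become column names, and stringsAsFactors = FALSE keeps the columns as character.

```r
# Two hypothetical vectors to be crossed.
sizes   <- c("S", "M", "L")
colours <- c("red", "blue")

# One row per (size, colour) combination; the first argument varies fastest.
grid <- expand.grid(size = sizes, colour = colours,
                    stringsAsFactors = FALSE)
print(grid)
```

This produces the same set of combinations as a Cartesian-product merge of two one-column data frames with by = NULL.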

How do I combine two data series into a Dataframe?

Combine Two Series Using pandas

In pandas, merge() can be used for all database join operations between DataFrame or named Series objects, so each Series must be given a name. For instance, pd.merge(S1, S2, right_index=True, left_index=True).

What is merging in data frames?

Merging combines the rows of two data frames by matching values in one or more key columns. The arguments of merge() control which columns are matched and whether unmatched rows are kept.

