Comments (5)
Within the join command, 1:1 and m:1 are essentially identical except for one extra check at the end (equivalent to isid make). So after everything is done and the merge is finalized, the program just runs something like isid to verify that the IDs are in fact unique.
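As a rough illustration of that description (a sketch only, not the actual code in join.ado), a 1:1 call behaves roughly like an m:1 call followed by a uniqueness check on the key. The tempfile name using_data is made up for the example, and ftools is assumed to be installed:

```stata
* Sketch: an m:1 merge followed by the isid check that a 1:1 merge adds
sysuse auto, clear
tempfile using_data
save `using_data'

* Build a master dataset in which the key is deliberately not unique
keep make
expand 2

* The m:1 merge succeeds even though make is duplicated in the master
fmerge m:1 make using `using_data', keepusing(price) nogenerate

* The extra 1:1 check: isid fails here because each make appears twice
capture noisily isid make
display "isid return code: " _rc
```

Because the check only happens at the end, the merged variables are already in memory by the time the error is raised.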
Now, because join does not wrap its work in preserve and restore, you end up with a "dirty" dataset after the error. I chose that because running preserve on very large datasets is potentially very slow, and because I almost always run my analysis through do-files, so I would re-run everything anyway.
This leads to what you found: the results are almost like those of an m:1 join. However, there are a few lines that are actually run later: https://github.com/sergiocorreia/ftools/blob/master/src/join.ado#L101 (lines 110-130). Those lines enforce the checks of the assert() option, as well as keep the sample required by the keep() option, so they are not completely trivial.
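To make the roles of assert() and keep() more concrete, here is a small sketch. It uses Stata's official merge rather than join, on the assumption that join's options follow the same semantics. The tempfile name domestic and the variable price_using are made up for the illustration:

```stata
* Sketch: how keep() and assert() matter when some observations do not match
sysuse auto, clear

* Build a using dataset that contains only the domestic cars
tempfile domestic
preserve
keep if foreign == 0
keep make price
rename price price_using
save `domestic'
restore

* keep(match) drops master observations that have no match in using,
* so only the domestic cars remain afterwards
preserve
merge 1:1 make using `domestic', keep(match) nogenerate
count
restore

* assert(match) raises an error, because the foreign cars are unmatched
capture noisily merge 1:1 make using `domestic', assert(match)
display "merge return code: " _rc
```

When every master observation finds a match, as in the auto.dta example in this thread, neither option changes the result, which may be why they can look like no-ops.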
All in all, I think it would just be better to run isid at the beginning of the command, in order to fail earlier and minimize waiting time. So I would suggest that you depend on this method.
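For anyone who wants to avoid being left with a dirty dataset, one user-side option is to check the key first and wrap the call in preserve. This is only a sketch under the assumption that ftools is installed; the tempfile name using_data is hypothetical:

```stata
* Sketch: fail early on a non-unique key and roll back if anything goes wrong
sysuse auto, clear
tempfile using_data
save `using_data'

* Master dataset with a duplicated key, as in the example in this thread
keep make
expand 2

preserve
capture noisily {
    * Check uniqueness before the merge touches the data
    isid make
    fmerge 1:1 make using `using_data', keepusing(price) nogenerate
}
if _rc {
    * Something failed: go back to the dataset as it was before the merge
    restore
}
else {
    * Everything worked: keep the merged data and cancel the pending restore
    restore, not
}
```

The trade-off is the one mentioned above: preserve makes a copy of the data, which can be slow on very large datasets.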
Finally, on a related note, I have a still in-progress update to ftools that should make merges considerably faster, as well as allow string+numeric keys. It probably won't be out for a couple of weeks, though.
from ftools.
Hi Adam,
Would you be able to give me an example so I can replicate it exactly on my side?
Perhaps something generated by the auto dataset, together with tempfile or dataex.
Thanks,
S
from ftools.
Hi Sergio,
Below is a toy example using auto.dta where the 1:1 fmerge switches to an m:1 fmerge and the 1:1 merge fails:
use auto.dta, clear
* Duplicate each observation
expand 2
sort make
foreach var in price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign {
    rename `var' `var'dup
}
* fmerge
fmerge 1:1 make using auto.dta, keep(master match) keepusing(make price) nogen
rename price pricemerge
* merge
merge 1:1 make using auto.dta, keep(master match) keepusing(make price) nogen
After reading into the fact that fmerge is just a wrapper for join, I think this makes sense: join will work for m:1 and 1:1 given the keys you input.
from ftools.
For what it's worth: to my mind this is very useful - it saves me having to run my code again when I accidentally specify 1:1 - but perhaps it should be documented?
from ftools.
Thanks, Sergio!
I can't quite get my head around the implications of the assert and keep. In the cases I have worked on, they don't seem to change anything. I'll try to read more on this to see what is going on.
Great to hear there will be an even quicker merge! ftools (and gtools) have dramatically increased my pace.
from ftools.