Comments (2)

alexandersimoes avatar alexandersimoes commented on August 20, 2024

Thanks for your interest... as I'm sure you know there are many ways to skin a cat (so to speak). Feel free to take a look at the import scripts which are found in the scripts branch of this repo and get back with any feedback you may have for us. We're constantly improving our data collection/calculation methods.

pstoyanov avatar pstoyanov commented on August 20, 2024

Hello again,
I did look at the scripts, and they seem to deal with steps after those in my question. I do not know Python, but it seems that you are importing already-processed data and calculating the respective aggregates, RCAs, growth rates, etc. Are you drawing data from some internal database rather than the original sources you mention, e.g. Feenstra's WTF Stata/SAS files from http://cid.econ.ucdavis.edu?
I am interested in the clean-up steps (for the SITC data; I assume this is not needed for BACI). My goal was to compare your approach to my current work in matching data from Feenstra’s World Trade Flows database to Comtrade, the EU’s Comext, PWT and World Bank’s WDI.
Below are a few detailed examples & questions. Apologies for the length; I can split this into separate issues if you prefer. Since the points are related, I ended up with one giant post.

[Cosmetic issue] You may want to fix the files in the database dump – oec_2014-04-30.sql.bz2 is in the ./static/db/ folder and not in ./static/db/sql; hs_yodp.csv.bz2 is in the sql folder and not in the csv folder; and there is no csv dump for oec_2014-04-30. It would also be nice to have MD5/SHA/whatever checksums to verify whether the data has changed without re-downloading the large files.
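A minimal sketch of the checksum idea, assuming SHA-256 and a published digest stored next to each dump. The helper names `file_sha256` and `has_changed` are made up for illustration and are not part of the repo.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large dumps do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, published_digest):
    """Compare a local file against a published digest (hypothetical workflow)."""
    return file_sha256(path) != published_digest.strip().lower()
```

With a small digest file published alongside each dump, only a few bytes need to be fetched to decide whether the large file has to be downloaded again.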

The remarks below are based on the oec_2014-04-30.sql file. I did, however, cut it up with sed to import just the sitc yodp table into R, as I do not have a MySQL server. Even though I think it is a fairly straightforward process, I may have made a mistake.
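For reference, the same extraction can be sketched in Python instead of sed. This assumes the dump has the usual mysqldump layout, where each table starts with a `-- Table structure for table` comment line; the function name `extract_table` and the table name passed to it are assumptions, not something taken from the repo.

```python
def extract_table(lines, table):
    """Yield the lines of a mysqldump-style dump that belong to one table.

    Assumes every table section starts with a comment line of the form
    "-- Table structure for table `name`".
    """
    prefix = "-- Table structure for table `"
    marker = prefix + table + "`"
    inside = False
    for line in lines:
        if line.startswith(prefix):
            # A new table section begins; keep it only if it is the one we want.
            inside = line.strip() == marker
        if inside:
            yield line
```

The compressed dump can be streamed through it with `bz2.open(path, "rt")`, so the full file never has to be unpacked on disk.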

[Country codes] The country code for Canada is “211240” in Feenstra’s World Trade Flows (WTF), which you give as a source for data prior to 2001. Your scripts seem to use the ISO 3-character alpha codes directly, with the continent prefix. The step of matching WTF country codes to ISO codes is missing (at least I could not locate it in the scripts).
As you point out in the updates section for Germany and Benelux, in some cases a single ISO code may correspond to several country names, and at least in theory, vice versa (the same ISO code may be reassigned to another country after a few years). You may want to include the year as part of your key for ISO codes matching.
And I assume you are aware that merging country data is a very slippery slope: (1) in most cases with former socialist countries there is little info on the trade between the countries you merge, so you end up overstating their trade flows if you view them as a whole, and (2) there are too many similar cases; over a dozen come to mind in Eastern/Central Europe and the former USSR alone…
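To illustrate what I mean by making the year part of the key, here is a rough sketch. Only the Canada code “211240” comes from the WTF data; the second source code (“999999”) is a made-up placeholder, and the names `CodeSpan`, `SPANS` and `to_iso3` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class CodeSpan(NamedTuple):
    source_code: str   # country code as it appears in the source dataset
    first_year: int    # first year (inclusive) for which the mapping holds
    last_year: int     # last year (inclusive) for which the mapping holds
    iso3: str          # ISO 3166-1 alpha-3 code, lowercase


SPANS = [
    CodeSpan("211240", 1962, 2000, "can"),
    # Placeholder source code: the same code maps to different ISO codes over time.
    CodeSpan("999999", 1962, 1992, "csk"),
    CodeSpan("999999", 1993, 2000, "cze"),
]


def to_iso3(source_code: str, year: int, spans: Sequence[CodeSpan]) -> Optional[str]:
    """Look up the ISO code valid for a given source code in a given year."""
    for span in spans:
        if span.source_code == source_code and span.first_year <= year <= span.last_year:
            return span.iso3
    return None  # unmatched codes are reported rather than silently dropped
```

Returning `None` for unmatched codes makes it easy to list every source code that has no mapping in a particular year.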

[World totals] It is not clear what “World” means in your case. Is this a ‘true’ world total, including all flows (i.e. even ones which are only reported at a higher SITC level, 3- or 2-digit), or is this calculated as the sum of the individual flows in the database? My guess is that it is actually a leftover from the original dataset, since it is only present for some years. I think you should be more explicit and consistent (either define & include World for all years, or remove it altogether).
Currently the sum of the individual flows matches the World flows for 1962:1973, there are slight (less than 1%) differences over 1974:1983, and no World totals from 1984 onwards. In the WTF there are World totals for all years, calculated as the sum of the individual flows in the dataset.
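The comparison I ran is roughly the following, rewritten here as a Python sketch. The column names `origin_id`, `dest_id` and `export_val` and the id `xxwld` are taken from the dump; the `year` column name, the row layout (one dict per record) and the function name are assumptions.

```python
from collections import defaultdict

WORLD = "xxwld"


def compare_world_totals(rows):
    """Per year, pair the reported World total with the sum of bilateral flows.

    Returns {year: (reported_or_None, summed)}; reported is None for years
    that have no rows with World as the destination.
    """
    reported = {}
    summed = defaultdict(float)
    for row in rows:
        year, origin, dest = row["year"], row["origin_id"], row["dest_id"]
        value = row["export_val"]
        if origin == WORLD:
            continue  # skip World-as-exporter rows to avoid double counting
        if dest == WORLD:
            reported[year] = reported.get(year, 0.0) + value
        else:
            summed[year] += value
    return {year: (reported.get(year), summed[year]) for year in sorted(summed)}
```

Years where the first element is `None` are the ones with no World rows at all, and years where the two elements differ are the ones with the small gaps.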

[SITC codes] The WTF uses a lot of artificial codes (containing As and Xs), which are not present in your database. Looking at the data, it seems that you have simply dropped them, which is OK I guess, since they are not actual products. But since they do not appear uniformly throughout the years, this may be related to the world totals issue above (you are dropping codes disproportionately), and it makes the next point even stranger.
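One way to check whether dropping these codes affects some years more than others is to compute, per year, the share of trade value they carry in the source data. This is a sketch only: the `sitc_id` and `year` column names are assumptions, as are the function names.

```python
from collections import defaultdict


def is_artificial(sitc_code):
    """True for codes containing an A or X, i.e. not an actual SITC product."""
    return any(ch in "AX" for ch in sitc_code.upper())


def dropped_share_by_year(rows):
    """Share of each year's total value carried by artificial codes."""
    total = defaultdict(float)
    dropped = defaultdict(float)
    for row in rows:
        total[row["year"]] += row["export_val"]
        if is_artificial(row["sitc_id"]):
            dropped[row["year"]] += row["export_val"]
    return {year: dropped[year] / total[year] for year in sorted(total) if total[year]}
```

If the shares jump around from year to year, then dropping the codes distorts the totals unevenly.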

[Number of flows] Probably the most puzzling of all: there are way too many individual flows. Take year 2000 for example. In the WTF the grand total of world trade is 6,568,385,296,000 dollars. In OEC, the sum of the individual flows is 5,599,123,372,802, or about 14.8% lower. Yet the WTF has 670,973 individual flows after excluding flows where the exporter or importer is “World”, while OEC has 2,026,932 individual flows with origin_id != "xxwld" & dest_id != "xxwld" & export_val > 0. The differences are very small for 1962:1983, and then balloon after 1984. Also, starting in 1984, the number of trading country pairs is much higher in OEC than in WTF, after being identical over 1962:1983.
Considering that WTF carries quantity info, i.e. there are sometimes several records for one exporter/importer/year/product combination (with different quantity units), and OEC does not, I cannot explain this. When I have the time I may go through the data country by country, but perhaps I am missing something very basic.
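The figures for the year 2000 can be rechecked with a few lines of Python. The totals and flow counts are the ones quoted in this post; the row layout (one dict per record) and the function names are assumptions for illustration.

```python
WORLD = "xxwld"


def percent_lower(reference, other):
    """How much lower `other` is than `reference`, in percent."""
    return (reference - other) / reference * 100.0


def flow_counts(rows):
    """Count non-World flows with positive value and the distinct country pairs."""
    flows = 0
    pairs = set()
    for row in rows:
        if row["origin_id"] != WORLD and row["dest_id"] != WORLD and row["export_val"] > 0:
            flows += 1
            pairs.add((row["origin_id"], row["dest_id"]))
    return flows, len(pairs)


# Year 2000 figures quoted in this post.
wtf_total_2000 = 6_568_385_296_000
oec_total_2000 = 5_599_123_372_802
gap_2000 = percent_lower(wtf_total_2000, oec_total_2000)

# Ratio of OEC flow count to WTF flow count for the same year.
flow_ratio_2000 = 2_026_932 / 670_973
```

Running `flow_counts` per year on both datasets shows when the number of flows and the number of trading pairs start to diverge.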

Once again, sorry for the long write-up, don't take it as a rant ;)
