
pyblp's Introduction

PyBLP


An overview of the model, examples, references, and other documentation can be found on Read the Docs.

PyBLP is a Python 3 implementation of routines for estimating the demand for differentiated products with BLP-type random coefficients logit models. This package was created by Jeff Gortmaker in collaboration with Chris Conlon.

Development of the package has been guided by the work of many researchers and practitioners. For a full list of references, including the original work of Berry, Levinsohn, and Pakes (1995), refer to the references section of the documentation.

Citation

If you use PyBLP in your research, we ask that you also cite Conlon and Gortmaker (2020), which describes the advances implemented in the package.

@article{PyBLP,
    author = {Conlon, Christopher and Gortmaker, Jeff},
    title = {Best practices for differentiated products demand estimation with {PyBLP}},
    journal = {The RAND Journal of Economics},
    volume = {51},
    number = {4},
    pages = {1108-1161},
    doi = {10.1111/1756-2171.12352},
    url = {https://onlinelibrary.wiley.com/doi/abs/10.1111/1756-2171.12352},
    eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1111/1756-2171.12352},
    year = {2020}
}

If you use PyBLP's micro moments functionality, we ask that you also cite Conlon and Gortmaker (2023), which describes the standardized framework implemented by PyBLP for incorporating micro data into BLP-style estimation.

@misc{MicroPyBLP,
    author = {Conlon, Christopher and Gortmaker, Jeff},
    title = {Incorporating micro data into differentiated products demand estimation with {PyBLP}},
    note = {Working paper},
    year = {2023}
}

Installation

The PyBLP package has been tested on Python versions 3.6 through 3.9. The SciPy instructions for installing related packages are a good guide for how to set up a scientific Python environment. A good choice is the Anaconda Distribution, since it comes packaged with the following PyBLP dependencies: NumPy, SciPy, SymPy, and Patsy. For absorption of high-dimensional fixed effects, PyBLP also depends on its companion package PyHDFE, which is installed alongside PyBLP.

However, PyBLP may not work with old versions of its dependencies. You can update PyBLP's Anaconda dependencies with:

conda update numpy scipy sympy patsy

You can update PyHDFE with:

pip install --upgrade pyhdfe

You can install the current release of PyBLP with pip:

pip install pyblp

You can upgrade to a newer release with the --upgrade flag:

pip install --upgrade pyblp

If you lack permissions, you can install PyBLP in your user directory with the --user flag:

pip install --user pyblp

Alternatively, you can download a wheel or source archive from PyPI. You can find the latest development code on GitHub and the latest development documentation here.
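As a quick sanity check after installation, a minimal logit session looks like the following (this mirrors the tutorial code quoted in the issues below and uses the Nevo cereal data bundled with the package):

import pyblp
import pandas as pd

# load the Nevo (2000) cereal data that ships with pyblp
product_data = pd.read_csv(pyblp.data.NEVO_PRODUCTS_LOCATION)

# a plain logit with a price coefficient and absorbed product fixed effects
logit_formulation = pyblp.Formulation('prices', absorb='C(product_ids)')
problem = pyblp.Problem(logit_formulation, product_data)
logit_results = problem.solve()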

Other Languages

Once installed, PyBLP can be incorporated into projects written in many other languages with the help of various tools that enable interoperability with Python.

For example, the reticulate package makes interacting with PyBLP in R straightforward. Where supported, Python objects can be converted to their R counterparts with the py_to_r function. This has to be done manually because convert=FALSE is set below, which avoids errors from attempts to automatically convert unsupported objects:

library(reticulate)
pyblp <- import("pyblp", convert=FALSE)
pyblp$options$flush_output <- TRUE
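Since automatic conversion is disabled, supported Python objects have to be converted explicitly when needed, for example (a small illustration; any supported attribute works the same way):

digits <- py_to_r(pyblp$options$digits)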

Similarly, PyCall can be used to incorporate PyBLP into a Julia workflow:

using PyCall
pyblp = pyimport("pyblp")

The py command serves a similar purpose in MATLAB:

py.pyblp

Features

  • R-style formula interface
  • Bertrand-Nash supply-side moments
  • Multiple equation GMM
  • Demographic interactions
  • Product-specific demographics
  • Consumer-specific product availability
  • Flexible micro moments that can match statistics based on survey data
  • Support for micro moments based on second choice data
  • Support for optimal micro moments that match micro data scores
  • Fixed effect absorption
  • Nonlinear functions of product characteristics
  • Concentrating out linear parameters
  • Flexible random coefficient distributions
  • Parameter bounds and constraints
  • Random coefficients nested logit (RCNL)
  • Approximation to the pure characteristics model
  • Varying nesting parameters across groups
  • Logit and nested logit benchmarks
  • Classic BLP instruments
  • Differentiation instruments
  • Optimal instruments
  • Covariance restrictions
  • Adjustments for simulation error
  • Tests of overidentifying and model restrictions
  • Parametric bootstrapping of post-estimation outputs
  • Elasticities and diversion ratios
  • Marginal costs and markups
  • Passthrough calculations
  • Profits and consumer surplus
  • Newton and fixed point methods for computing pricing equilibria
  • Merger simulation
  • Custom counterfactual simulation
  • Synthetic data construction
  • SciPy or Artleys Knitro optimization
  • Fixed point acceleration
  • Monte Carlo, quasi-random sequences, quadrature, and sparse grids
  • Importance sampling
  • Custom optimization and iteration routines
  • Robust and clustered errors
  • Linear or log-linear marginal costs
  • Partial ownership matrices
  • Analytic gradients
  • Finite difference Hessians
  • Market-by-market parallelization
  • Extended floating point precision
  • Robust error handling

Bugs and Requests

Please use the GitHub issue tracker to submit bugs or to request features.

pyblp's People

Contributors

chrisconlon, fpinter, james-atkins, jeffgortmaker, kenneth-rios, mcaceresb, ryanpiao


pyblp's Issues

optimization fails with np.longdouble and 'tnc' or 'l-bfgs' with nevo example

Using the Nevo dataset with np.longdouble precision, the 'l-bfgs-b' and 'tnc' solvers immediately error out.

Works fine with the 'knitro' and 'slsqp' solvers.

pyblp.options.dtype = np.longdouble
pyblp.options.digits = 3
pyblp.options.verbose = True

nevo_results_lbfgs = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('l-bfgs-b'))
nevo_results_tnc = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('tnc'))

LBFGS trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-28ef3977a353> in <module>()
      1 ### L-BFGS+TNC fails with high precision numerics
----> 2 nevo_results_lbfgs = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('l-bfgs-b'))
      3 #nevo_results_tnc = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('tnc'))

~/anaconda3/lib/python3.6/site-packages/pyblp/problem.py in solve(self, sigma, pi, sigma_bounds, pi_bounds, delta, WD, WS, steps, optimization, custom_display, error_behavior, error_punishment, iteration, linear_costs, linear_fp, center_moments, se_type, processes)
    423             bounds = [p.bounds for p in parameter_info.unfixed]
    424             verbose = options.verbose and not custom_display
--> 425             theta = optimization._optimize(objective_function, theta, bounds, verbose=verbose)
    426             end_time = time.time()
    427             output("")

~/anaconda3/lib/python3.6/site-packages/pyblp/utilities/optimization.py in _optimize(self, objective_function, start_values, bounds, verbose)
    202             if self._supports_bounds:
    203                 minimize_kwargs['bounds'] = bounds
--> 204             return scipy.optimize.minimize(objective_function, start_values, **minimize_kwargs).x
    205 
    206         # set default Knitro options

~/anaconda3/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    485     elif meth == 'l-bfgs-b':
    486         return _minimize_lbfgsb(fun, x0, args, jac, bounds,
--> 487                                 callback=callback, **options)
    488     elif meth == 'tnc':
    489         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,

~/anaconda3/lib/python3.6/site-packages/scipy/optimize/lbfgsb.py in _minimize_lbfgsb(fun, x0, args, jac, bounds, disp, maxcor, ftol, gtol, eps, maxfun, maxiter, iprint, callback, maxls, **unknown_options)
    326         _lbfgsb.setulb(m, x, low_bnd, upper_bnd, nbd, f, g, factr,
    327                        pgtol, wa, iwa, task, iprint, csave, lsave,
--> 328                        isave, dsave, maxls)
    329         task_str = task.tostring()
    330         if task_str.startswith(b'FG'):

ValueError: failed to initialize intent(inout) array -- expected elsize=8 but got 16

TNC trace:

ValueError                                Traceback (most recent call last)
<ipython-input-31-df7d85ccaaa8> in <module>()
      1 ### L-BFGS+TNC fails with high precision numerics
      2 #nevo_results_lbfgs = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('l-bfgs-b'))
----> 3 nevo_results_tnc = nevo_problem.solve(nevo_sigma,nevo_pi,steps=1,optimization=pyblp.Optimization('tnc'))

~/anaconda3/lib/python3.6/site-packages/pyblp/problem.py in solve(self, sigma, pi, sigma_bounds, pi_bounds, delta, WD, WS, steps, optimization, custom_display, error_behavior, error_punishment, iteration, linear_costs, linear_fp, center_moments, se_type, processes)
    423             bounds = [p.bounds for p in parameter_info.unfixed]
    424             verbose = options.verbose and not custom_display
--> 425             theta = optimization._optimize(objective_function, theta, bounds, verbose=verbose)
    426             end_time = time.time()
    427             output("")

~/anaconda3/lib/python3.6/site-packages/pyblp/utilities/optimization.py in _optimize(self, objective_function, start_values, bounds, verbose)
    202             if self._supports_bounds:
    203                 minimize_kwargs['bounds'] = bounds
--> 204             return scipy.optimize.minimize(objective_function, start_values, **minimize_kwargs).x
    205 
    206         # set default Knitro options

~/anaconda3/lib/python3.6/site-packages/scipy/optimize/_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    488     elif meth == 'tnc':
    489         return _minimize_tnc(fun, x0, args, jac, bounds, callback=callback,
--> 490                              **options)
    491     elif meth == 'cobyla':
    492         return _minimize_cobyla(fun, x0, args, constraints, **options)

~/anaconda3/lib/python3.6/site-packages/scipy/optimize/tnc.py in _minimize_tnc(fun, x0, args, jac, bounds, eps, scale, offset, mesg_num, maxCGit, maxiter, eta, stepmx, accuracy, minfev, ftol, xtol, gtol, rescale, disp, callback, **unknown_options)
    407                                         offset, messages, maxCGit, maxfun,
    408                                         eta, stepmx, accuracy, fmin, ftol,
--> 409                                         xtol, pgtol, rescale, callback)
    410 
    411     funv, jacv = func_and_grad(x)

ValueError: tnc: invalid gradient vector.
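For what it's worth, the elsize message in the first trace hints at the likely cause: the compiled routines behind 'l-bfgs-b' and 'tnc' operate on 8-byte float64 arrays, while np.longdouble is 16 bytes on this platform, as a quick check shows:

import numpy as np
print(np.dtype(np.float64).itemsize)     # 8: what the Fortran routine expects
print(np.dtype(np.longdouble).itemsize)  # 16 on platforms with extended precision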

Trouble simulating fake data

Hi Jeff,

Love the package, but I'm having some trouble simulating fake data when the dimensions of the random effects are large. I'm not sure precisely what's going on, but I do know from my own function for finding equilibrium prices/shares that the solution can be sensitive to the parameters. Any tips?


import pyblp
import numpy as np
import pandas as pd

pyblp.options.digits = 2
pyblp.options.verbose = False
pyblp.__version__

id_data = pyblp.build_id_data(T=50, J=20, F=10)
integration = pyblp.Integration('product', 9)
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('0 + prices + x1 + x2 + x3 + x4 + x5 + x6'),
       pyblp.Formulation('0 + prices + x1 + x2 + x3 + x4 + x5 + x6'),
       pyblp.Formulation('1 + x1 + x2 + x3 + x4 + x5 + x6 + z1 + z2 + z3')
   ),
   beta=[-2, 2, 1, -1, 1, -1, 1],
   sigma=np.diag([0.4, 0.3, 0.5, 0.3, 0.2, 0.7, 0.8]),
   gamma=[5, 1, -1, 1, -1, 1, -1, 1, -1, 1],
   product_data=id_data,
   integration=integration,
   seed=0
)

costs_type on version 0.8.0

I am trying to reproduce the "Random Coefficients Logit Tutorial with the Automobile Data" from the package documentation using version 0.8.0, but the 'solve' function does not recognize the 'costs_type' argument. The code runs if I drop 'costs_type'. Is this a bug, or does the package no longer support 'costs_type'?

Thank you

Adding Supply Explodes Memory Usage

I am running a fairly big problem with about 9,000 product-firm observations, absorbing a single fixed effect (with ~250 levels), with five random coefficients (price being endogenous). I can give more details if needed. Using only the demand side, I can estimate the model in Matlab and via pyBLP and get the same results (though pyBLP is much faster). There is no memory strain on my 16GB RAM laptop; in fact, I can run three or four notebooks with no problem.

Issue: I wanted to try out pyBLP's supply side, which I have not coded in Matlab. When I did this, the memory usage exploded and the kernel died. Do you know why this could be the case? Is there anything that could be done given the problem size?

Note: I chose the log option, used the cost bounds from the BLP example, and absorbed the FEs as in the X1 matrix for demand.

Aside: thank you for all your work in developing pyBLP; I've implemented some items into my Matlab code and am looking forward to using your package in other projects going forward.

Best,
Luke

could not change the firm_ids and ownership when calculating marginal costs!

API : ProblemResults.compute_costs(firm_ids=None, ownership=None, market_id=None)

  1. I did not include firm_ids in product_data, yet results.compute_costs could still calculate marginal costs. The API documentation says you must include a firm_ids field before calculating marginal costs, so why does it work without firm_ids?
  2. After estimation, I tried to use ProblemResults.compute_costs(firm_ids=None, ownership=None, market_id=None) to calculate the costs, but an error showed up saying that firm_ids could not be found in compute_costs(). Is market_id the only argument in this API?

Add FRAC estimation

From Salanie and Wolak (2018). This is easy to implement with a new construction function or Problem.solve argument (or both). It can also give great starting values for Problem.solve.

knitroNumPy is now in examples/Python/build/lib/knitro/numpy instead of examples/Python. Version 12 also calls a wrapper and ktr to verify license and for backwards compatibility. Simplest solution may be an if/else on knitro.__version__ to load knitro.numpy or knitroNumPy.

By any chance do you have some code solving this issue that you could post?

I'll report back on other alternatives if I can find one.

Thanks!

Originally posted by @kw468 in #33 (comment)

BLP (1995) tutorial not working

I'm trying to run the BLP tutorial without any modifications to the code from the documentation, and I run into an error creating the Problem (it looks like something with the costs_type argument).
It runs if I remove costs_type = "log".


I'm using numpy 1.17.0 and pandas 0.22.0

Finding it difficult to understand micro moments syntax

Hi Jeff,

I was wondering if you wouldn't mind helping me understand the syntax for adding micro moments. I tried this (as well as a number of other attempts at figuring out the syntax):

Micro_formulations = pyblp.FirstChoiceCovarianceMoment(1, 1, 0.45744)
results1 = pr_problem.solve(...,micro_moments=(Micro_formulations))

and got the error:

File "", line 2, in
results1 = pr_problem.solve(initial_sigma1,initial_pi1,sigma_bounds=sigma_bounds, pi_bounds = pi_bounds, method='1s',iteration=iteration,W_type='clustered',se_type='clustered',optimization=optimization,micro_moments=(Micro_formulations))

File "C:\Users\shiller.USERS\AppData\Local\Continuum\anaconda3\lib\site-packages\pyblp-0.8.1-py3.6.egg\pyblp\economies\problem.py", line 441, in solve
moments = EconomyMoments(self, micro_moments)

File "C:\Users\shiller.USERS\AppData\Local\Continuum\anaconda3\lib\site-packages\pyblp-0.8.1-py3.6.egg\pyblp\moments.py", line 183, in init
raise TypeError("micro_moments must be a sequence of micro moment instances.")

TypeError: micro_moments must be a sequence of micro moment instances.

Also, one other quick question. Is the demographics_index in the micro moments formulation the column of agent data loaded from a CSV (where the first column might identify market) or the index of the agent formulation?

Thanks!
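For what it's worth, the TypeError above points at a likely cause: in Python, (Micro_formulations) is just a parenthesized expression, not a one-element tuple, so a bare moment instance rather than a sequence was passed to solve. A hedged sketch of the fix, reusing the names from the snippet above:

# (x) is not a tuple, but (x,) and [x] are, so wrap the moment in a sequence
results1 = pr_problem.solve(..., micro_moments=[Micro_formulations])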

Support for log income effects

As in the original BLP (1995) paper. This can be used in the automobile example problem to give more familiar estimates.
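Schematically, the relevant piece of the BLP (1995) specification is the indirect utility in which income y_i enters through a log difference with price (other terms omitted; stated from memory, so worth double-checking against the paper):

$$
u_{ijt} = \alpha \log(y_i - p_{jt}) + x_{jt} \beta_i + \xi_{jt} + \epsilon_{ijt}
$$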

Computing shares after a change in exogenous variable

Hello,

I have been trying to simulate shares under a counterfactual where only one of the exogenous variables changes. I can't seem to get it to work with the ProblemResults.compute_shares function (it seems that it doesn't register a change in product_data; it must have the product data stored somewhere I can't see). When I tried using the Simulation class, it simulated not only shares but also prices (which I don't want). I was wondering if there is any way to re-simulate counterfactual shares under a scenario where only one of the exogenous variables changes (keeping prices the same).

Estimate supply side with only fixed effects.

Hello,
I am estimating a BLP model with time-invariant characteristics (besides price) and product fixed effects. Thus, I am estimating only heterogeneous taste coefficients for each time-invariant characteristic and a linear parameter on prices. As a result, my supply side includes only product fixed effects. When I try to feed a formulation for the supply side with only product fixed effects (e.g. supply_formulation = pyblp.Formulation('0', absorb='C(product_id)')), I receive a Patsy error from this line of code in formulation.py:

raise patsy.PatsyError("formula has no terms.", patsy.origin.Origin(formula, 0, len(formula)))

Is there a way to estimate such a supply-side specification in pyblp? I realize I could include indicators for each product_id (e.g. supply_formulation = pyblp.Formulation('C(product_id)')), but this will run much slower and take up more memory since there are many products. I am wondering if there is a workable fix I am missing within the existing framework of pyblp. Thanks for your time.

Incorporate Instrument Construction in pyblp.Problem or to_problem() method

Hi,

Maybe I have not understood the docs completely, but why can't the instrument construction phase be done in the pyblp.Problem definition phase?

All examples (except the one on optimal instruments) seem to bring the instruments in from the raw data, and there's no to_problem() method for build_blp_instruments and build_differentiation_instruments.

Sorry if the question seems silly.

Best,
Claudio

Issue on Formulation package, the new bug edit didn't help

ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in parse_expression(string, mark_categorical)
588 try:
--> 589 expression = sympy.parsing.sympy_parser.parse_expr(string, mapping, [transform_tokens], evaluate=False)
590 str(expression)

~\Anaconda3\lib\site-packages\sympy\parsing\sympy_parser.py in parse_expr(s, local_dict, transformations, global_dict, evaluate)
959
--> 960 code = stringify_expr(s, local_dict, global_dict, transformations)
961

~\Anaconda3\lib\site-packages\sympy\parsing\sympy_parser.py in stringify_expr(s, local_dict, global_dict, transformations)
865 for transform in transformations:
--> 866 tokens = transform(tokens, local_dict, global_dict)
867

~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in transform_tokens(tokens, _)
559 if code not in {token.NAME, token.OP, token.NUMBER, token.ENDMARKER}:
--> 560 raise ValueError(f"The token '{value}' is invalid.")
561 if code == token.OP and value not in {'+', '-', '*', '/', '**', '(', ')'}:

ValueError: The token '' is invalid.

The above exception was the direct cause of the following exception:

ValueError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in parse_term_expression(term)
531 try:
--> 532 expression *= parse_expression(factor.name())
533 except Exception as exception:

~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in parse_expression(string, mark_categorical)
592 except (TypeError, ValueError) as exception:
--> 593 raise ValueError(f"The expression '{string}' is malformed.") from exception
594

ValueError: The expression 'prices' is malformed.

The above exception was the direct cause of the following exception:

PatsyError Traceback (most recent call last)
<ipython-input-...> in <module>()
2 #linear_formula = pyblp.Formulation('0 + prices', absorb='C(product_ids)')
3 # These are X_2 formulas
----> 4 nonlinear_formula = pyblp.Formulation('1 + prices + sugar + mushy')
5
6 product_formulations = (linear_formula,nonlinear_formula)

~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in __init__(self, formula, absorb, absorb_method)
136
137 # parse the terms into SymPy expressions and extract variable names
--> 138 self._expressions = [parse_term_expression(t) for t in self._terms]
139 self._absorbed_expressions = [parse_term_expression(t) for t in self._absorbed_terms]
140 self._names = {str(s) for e in self._expressions for s in e.free_symbols}

~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in <listcomp>(.0)
136
137 # parse the terms into SymPy expressions and extract variable names
--> 138 self._expressions = [parse_term_expression(t) for t in self._terms]
139 self._absorbed_expressions = [parse_term_expression(t) for t in self._absorbed_terms]
140 self._names = {str(s) for e in self._expressions for s in e.free_symbols}

~\Anaconda3\lib\site-packages\pyblp\configurations\formulation.py in parse_term_expression(term)
532 expression *= parse_expression(factor.name())
533 except Exception as exception:
--> 534 raise patsy.PatsyError("Failed to parse a term.", factor.origin) from exception
535 return expression
536

PatsyError: Failed to parse a term.
1 + prices + sugar + mushy
^^^^^^

Collinearity issue in problem simulation when #(firms) = #(products)

Hi Jeff,

I'm running into a problem trying to simulate fake data when the number of firms is equal to the number of products (J=F). The following code snippet follows your problem simulation tutorial exactly, except J=10, instead of J=20.

# Build market and firm IDs
id_data = pyblp.build_id_data(T=50, J=10, F=10)

# Specify integration rule
integration = pyblp.Integration('product', 9)
integration

# Set parameters for simulated data
simulation = pyblp.Simulation(
   product_formulations=(
       pyblp.Formulation('1 + prices + x'),
       pyblp.Formulation('0 + x'),
       pyblp.Formulation('0 + x + z')
   ),
   beta=[1, -2, 2],
   sigma=1,
   gamma=[1, 4],
   product_data=id_data,
   integration=integration,
   seed=0
)
simulation

# Simulate data from the model
simulation_results = simulation.solve()
product_data = simulation_results.product_data
simulation_results
# Set up and solve the BLP problem with pyBLP
problem = simulation_results.to_problem()

Running this code returns an error:

ValueError: Detected collinearity issues with [demand_instruments0] and at least one other column in ZD. To disable collinearity checks, set options.collinear_atol = options.collinear_rtol = 0.

Disabling collinearity checks seems to work (problem.solve runs and recovers the true parameters), but I guess there should be something under the hood that prevents this error from being thrown in the first place, since the package should be able to handle single-product firms.

Thanks in advance, this package is a great service to everyone trying to do applied work in IO!
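For reference, the workaround named in the error message is just the following, set before building the problem:

import pyblp
pyblp.options.collinear_atol = pyblp.options.collinear_rtol = 0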

Multi Level nested logit.

Hi, thanks for your awesome project.
I have a question about how to set up multiple levels of nests; it seems like only one layer of nesting is supported.

Thanks in advance.

Parametric bootstrap computation of post-estimation standard errors

This could be implemented with a Results.simulate method that creates an instance of a SimulatedResults class, which inherits from Results.

Parameter matrices would be drawn according to the estimated parameter distributions and would be stacked along a third dimension. Returned results would also be stacked along a third dimension.
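A minimal sketch of the proposed draws, assuming a normal approximation around the estimates (all names and values below are illustrative, not part of the package):

import numpy as np

theta_hat = np.array([1.0, -2.0])                 # estimated parameters (illustrative)
cov_hat = np.array([[0.10, 0.01], [0.01, 0.20]])  # estimated covariance (illustrative)

# draw 1,000 parameter vectors; results computed from each draw would then be
# stacked along the same new dimension
draws = np.random.default_rng(0).multivariate_normal(theta_hat, cov_hat, size=1000)
print(draws.shape)  # (1000, 2)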

Support for more standard optimization to compute Bertrand-Nash prices

This could be implemented with iteration, optimization, and zeta_formulation arguments to both Simulation.solve and Results.solve_merger, with iteration over the zeta mapping by default, but with the option to iterate over the BLP mapping (not recommended) or to optimize over either the BLP or zeta markup equation.

Beyond updating the API and docs, the real work here is implementing and testing gradient computation for both formulations.
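For reference, the zeta markup equation referred to above is, in the notation of Morrow and Skerlos (2011), with ownership matrix H, marginal costs c, shares s(p), and Λ and Γ from a decomposition of the share Jacobian (stated from memory, so worth checking against the paper):

$$
p \leftarrow c + \zeta(p), \qquad \zeta(p) = \Lambda(p)^{-1} (H \odot \Gamma(p))^{\top} (p - c) - \Lambda(p)^{-1} s(p)
$$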

Support the pure characteristics model

From Berry and Pakes (2007). This wouldn't require many changes to the API, but it would require new methods for solving the contraction mapping. Analytic Jacobians will likely be problematic as well.

Questions about the initial value

Hi,
In the Nevo tutorial, I find that I get the same result when I pass different initial values. However, I get different results using my own data, and I am sure that nothing but the initial values changed. Another weird thing is that when I pass the estimates back in as initial values, I get yet another result. If different initial values return different results, how can I get a robust one?
Looking forward to your help. Thanks!

Question about the pyblp.build_integration

Hi~
It's me again.

  1. Does pyblp.build_integration build the same nodes every time? I use the code below and find that it yields the same nodes; even when I change the dimension from 4 to 6, the first 4 columns of nodes are the same as when the number of characteristics is 4.
    Code: mc_integration = pyblp.Integration('monte_carlo', size=50, specification_options={'seed': 0})
    agent_data = pyblp.build_integration(mc_integration, 4)
  2. If it yields the same nodes every time I use Monte Carlo, then why are the regression results different even when I pass the same initial values to problem.solve?

Questions about the replication of nevo 2000

Hi, I read your paper and went through the tutorial. I have some questions, mainly about the replication of Nevo (2000). As I have read only a limited part of the literature, I really hope some of these questions are not too naive.

  1. Nevo (2000) suggests a minimum-distance procedure to identify the taste coefficients β when adding product fixed effects to the linear part. Does pyblp provide this? I didn't find it in the tutorial, and the taste coefficients are not reported. Also, is it possible to output the taste coefficients for all i so that one can draw their distribution like the figure in Nevo (2000)?

  2. In your paper, you say that loose tolerances explain the discrepancy between your results and those originally reported by Nevo. However, I used another BLP code with a tolerance of 1e-7 and got almost the same result as yours. I think the results do seem better than the original, but the reason might be somewhat different.

  3. Under Nevo's IV strategy, is it meaningful to add market_ids as a fixed effect in addition to product_ids, especially with panel data with large T? I tried absorb='C(product_ids) + C(market_ids)' in the Nevo case, and the price coefficient α moved from -63 to -38. If this is not a good choice, why? (Sorry, this question is entirely due to the fact that I haven't fully understood the IV strategy for exploiting panel data.)

  4. I notice that in the paper you specify the price coefficient α as part of the parameters θ2 that enter the nonlinear part. This seems somewhat confusing, because the α estimated in the nonlinear step should be some deviation, and we still need to estimate α in the linear part. In other words, price is no different from other product characteristics except that it always varies across markets. Is my understanding correct, or am I missing something?

Thanks.

Knitro 12 support

Knitro 12 doesn't include knitroNumPy but here are two options:
(1) Instead of importing knitroNumPy, import knitro.numpy as knp in pyblp/configurations/optimization.py (lines 231-238 are no longer needed),
(2) import knitroNumPy for Knitro 11, which is compatible with Knitro 12.
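A sketch of the if/else suggested in the comment quoted earlier (import paths follow that text; the exact fallback would need testing against both Knitro versions):

import knitro

# load the NumPy support module from its Knitro 12 location, falling back to
# the old knitroNumPy module for Knitro <= 11
if int(knitro.__version__.split('.')[0]) >= 12:
    import knitro.numpy as knitroNumPy
else:
    import knitroNumPy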

Support for price curvature

For example:

  • Incorporate powers of prices by letting linear_prices and nonlinear_prices in Problem initialization take on integer values.
  • Incorporate the log of prices with a log_prices argument.

Formulas with large number of columns in X

Is there any support for using a dynamic variable trick for X1 and X2?

E.g. X1 = [x1_1,x1_2,x1_3,...x1_100] and I put in a formula of 0 + prices + x1_% or something like that instead of writing out the full formula? Or putting in X1 as a whole?

I apologize for the novice issues!
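One workaround that should already work with the existing string interface is to build the formula string programmatically (a sketch; x1_1 through x1_100 are the columns from the question):

import pyblp

# join the generated column names into a single formula string
names = [f'x1_{i}' for i in range(1, 101)]
formulation = pyblp.Formulation('0 + prices + ' + ' + '.join(names))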

Accessing Monte Carlo NS Draws

Unless I've misunderstood, there's currently no way to access the Monte Carlo draws from the integration step in the results class. Is that right? Would it be possible to change that?

Support discrete types

Like in the original Berry, Carnall, and Spiller (1996). This would require optimization over the integration weights (i.e., type shares).

Nested Logit Results differ strongly from direct IV estimation

Hi,

this looks like a great package, and thanks to reticulate I was also able to test it with R. While in the end I would like to use the RCNL option, I had problems replicating the normal nested logit estimation in my examples. I get quite different coefficients than with the direct IV approach.

You state in the tutorial: "Rather than include log s_{j|h(j)t} along with the linear components of utility, X1, whenever nesting_ids are included in product_data, ρ is treated as a nonlinear X2 parameter." But does this have such a strong impact? In all examples I tried, estimates are completely different.

Below is a reproducible example (in R) using the nested logit example from the pyblp tutorial. I hope I did not make an error in my IV specification.

  1. Replicating the tutorial example in R using pyblp
library(reticulate)
library(dplyr)
library(readr)
library(AER)

# Import pyblp via reticulate and set options
pyblp <- import("pyblp")

pyblp$options$digits = 2L
pyblp$options$verbose = TRUE

# Load Nevo Product Data
file = pyblp$data$NEVO_PRODUCTS_LOCATION
dat = read_csv(file)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   market_ids = col_character(),
##   city_ids = col_integer(),
##   quarter = col_integer(),
##   product_ids = col_character(),
##   firm_ids = col_integer(),
##   brand_ids = col_integer(),
##   sugar = col_integer(),
##   mushy = col_integer()
## )
## See spec(...) for full column specifications.
# Set nesting_ids to mushy
dat = dat %>%
  mutate(nesting_ids = mushy)

# Add demand instrument
# number of products in each nest
dat = dat %>%
  group_by(nesting_ids, market_ids) %>%
  mutate(demand_instruments20 = n())


# Same problem formulation as in pyblp example
formulation = pyblp$Formulation('0 + prices')
problem = pyblp$Problem(formulation, dat)
result = problem$solve(rho=0.7)
result
## Problem Results Summary:
## ==================================================================================
## Computation  GMM   Optimization   Objective   Objective    Gradient      Hessian  
##    Time      Step   Iterations   Evaluations    Value    Infinity Norm  Eigenvalue
## -----------  ----  ------------  -----------  ---------  -------------  ----------
##  00:00:05     2         4             9       +1.6E+06     +5.5E-07      +1.3E+07 
## ==================================================================================
## 
## Rho Estimates (Robust SEs in Parentheses):
## ==========
## All Groups
## ----------
##  +8.9E-01 
## (+1.9E-02)
## ==========
## 
## Beta Estimates (Robust SEs in Parentheses):
## ==========
##   prices  
## ----------
##  -7.8E+00 
## (+4.8E-01)
## ==========

Same result as in pyblp tutorial!

  2. Now I want to estimate the linear model

$$
\log s_{jt} - \log s_{0t} = \alpha p_{jt} + x_{jt} \beta^x + \rho \log s_{j|h(j)t} + \xi_{jt}
$$

    directly.
# Compute all relevant variables
dat = dat %>%
  group_by(market_ids) %>%
  mutate(s0 = 1-sum(shares)) %>% # outside product
  mutate(y = log(shares)-log(s0)) %>% # lhs variable
  group_by(market_ids, nesting_ids) %>%
  mutate(
    s_h = sum(shares), # shares of nest
    s_j_h = shares / s_h, # shares in nest
    log_s_j_h = log(s_j_h) # log s_{j|h(j)t}
  ) %>%
  ungroup()

# Build formula for iv regression
inst = colnames(dat)[startsWith(colnames(dat),"demand_instruments")]
form = paste0("y~0+prices+log_s_j_h+product_ids | product_ids + ", paste0(inst, collapse="+"))
form
## [1] "y~0+prices+log_s_j_h+product_ids | product_ids + demand_instruments0+demand_instruments1+demand_instruments2+demand_instruments3+demand_instruments4+demand_instruments5+demand_instruments6+demand_instruments7+demand_instruments8+demand_instruments9+demand_instruments10+demand_instruments11+demand_instruments12+demand_instruments13+demand_instruments14+demand_instruments15+demand_instruments16+demand_instruments17+demand_instruments18+demand_instruments19+demand_instruments20"
# Run IV regression
iv = ivreg(form, data=dat)
iv
## 
## Call:
## ivreg(formula = form, data = dat)
## 
## Coefficients:
##           prices         log_s_j_h  product_idsF1B04  product_idsF1B06  
##           2.5808            1.1784           -0.8328           -1.1490  
## product_idsF1B07  product_idsF1B09  product_idsF1B11  product_idsF1B13  
##          -1.0834           -0.2071           -0.5915           -0.3250  
## product_idsF1B17  product_idsF1B30  product_idsF1B45  product_idsF2B05  
##          -1.0471           -0.1585           -0.2370           -0.3549  
## product_idsF2B08  product_idsF2B15  product_idsF2B16  product_idsF2B19  
##          -0.3551           -0.7316           -1.0906           -0.3331  
## product_idsF2B26  product_idsF2B28  product_idsF2B40  product_idsF2B48  
##          -0.3478           -1.2219           -0.1393           -0.1523  
## product_idsF3B06  product_idsF3B14  product_idsF4B02  product_idsF4B10  
##          -1.0349           -0.4524           -0.3984           -0.1476  
## product_idsF4B12  product_idsF6B18  
##          -0.3058           -0.3499

We estimate a completely different coefficient for price: it is now positive and not significant.

  3. Replicating the simple logit model

In contrast, we can perfectly replicate the results for the simple (non-nested) logit model from the pyblp tutorial.

file = pyblp$data$NEVO_PRODUCTS_LOCATION
product_data = read_csv(file)
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   market_ids = col_character(),
##   city_ids = col_integer(),
##   quarter = col_integer(),
##   product_ids = col_character(),
##   firm_ids = col_integer(),
##   brand_ids = col_integer(),
##   sugar = col_integer(),
##   mushy = col_integer()
## )
## See spec(...) for full column specifications.
logit_formulation = pyblp$Formulation('prices', absorb='C(product_ids)')
problem = pyblp$Problem(logit_formulation, product_data)
logit_results = problem$solve()
logit_results
## Problem Results Summary:
## =========================================
## Computation  GMM    Objective   Objective
##    Time      Step  Evaluations    Value  
## -----------  ----  -----------  ---------
##  00:00:00     2         2       +4.2E+05 
## =========================================
## 
## Beta Estimates (Robust SEs in Parentheses):
## ==========
##   prices  
## ----------
##  -3.0E+01 
## (+1.0E+00)
## ==========
# Replicate results via ivreg
dat = product_data %>%
  group_by(market_ids) %>%
  mutate(s0 = 1-sum(shares)) %>%
  mutate(y = log(shares)-log(s0)) %>%
  ungroup()

# Create instruments
inst = colnames(dat)[startsWith(colnames(dat),"demand_instruments")]
form = paste0("y~prices+product_ids | product_ids + ", paste0(inst, collapse="+"))
# Run IV regression and show results
iv = ivreg(form, data=dat)
iv
## 
## Call:
## ivreg(formula = form, data = dat)
## 
## Coefficients:
##      (Intercept)            prices  product_idsF1B06  product_idsF1B07  
##          -1.7747          -30.0978            2.3660            1.8011  
... output omitted ...

Add test for overidentifying restrictions

Report overidentifying restrictions tests of supply moments when the supply side is included.

An LR test is easiest to implement but slowest to run.
A Wald test requires Jacobians but only a single estimate.
An LM (score) test is the third option.
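For reference, the overidentification statistic these variants build on is Hansen's J: with N observations, stacked sample moments \bar{g}, efficient weighting matrix \hat{W}, q moment conditions, and p parameters,

$$
J = N \, \bar{g}(\hat{\theta})^{\top} \hat{W} \, \bar{g}(\hat{\theta}) \overset{d}{\longrightarrow} \chi^2_{q-p}
$$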

More flexible and intuitive matrix design

X1, X2, X3, and D should be designed with formulas that support interactions and potentially common functions such as logs. This would cover #14 and provide a foundation for #6.

At first glance, the patsy package seems ideal. However, derivatives with respect to individual product characteristics (in particular, prices) would need to be computed. Instead, sympy.diff seems promising (a sketch of steps 1-4 follows the list):

  1. Use sympy.parsing.sympy_parser.parse_expr to parse new Problem arguments: linear_formula, nonlinear_formula, cost_formula, and demographic_formula. Default arguments would be overridden so that only sympy.Symbols created from non-special keys in product_data and agent_data, along with common functions such as sympy.log, are accepted.
  2. Split each expression into its arguments, which are formulas for columns of X1, X2, X3, and D. Expressions that aren't subclasses of sympy.Add constitute the entirety of a matrix.
  3. Use sympy.diff to differentiate each column expression with respect to each product characteristic.
  4. Use sympy.lambdify to create numpy-friendly functions, which can compute all columns and derivatives.
  5. Compute all columns.
  6. Pre-compute all derivatives, except for the majority of columns, which are all zeros or ones.
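A minimal sketch of steps 1 through 4 (the formula and variable names here are illustrative):

import sympy
from sympy.parsing.sympy_parser import parse_expr

prices, x = sympy.symbols('prices x')

# step 1: parse a formula string into a SymPy expression
expression = parse_expr('prices + x + log(x)')

# step 2: split the expression into column formulas
columns = expression.args if isinstance(expression, sympy.Add) else (expression,)

# step 3: differentiate each column with respect to prices
derivatives = [sympy.diff(c, prices) for c in columns]

# step 4: create numpy-friendly functions that compute each column
functions = [sympy.lambdify((prices, x), c, 'numpy') for c in columns]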

Subtracting contributions of linear gamma parameters in theta from iv_delta instead of iv_tilde_costs

iv_delta -= self._compute_true_X3(index=~parameters.eliminated_gamma_index.flatten()) @ theta_gamma

Unless I'm mistaken, line 672 should read


iv_tilde_costs -= self._compute_true_X3(index=~parameters.eliminated_gamma_index.flatten()) @ theta_gamma

This is so the contributions of the linear gamma parameters are subtracted from iv_tilde_costs. The contributions of the beta parameters were subtracted from iv_delta on line 669. Apologies if I'm misunderstanding something.

Trouble Getting Started

I'm using python 3.6 through Anaconda. I installed pyblp through pip with no problems and tried running through the nested logit tutorial.

For some reason, the pyblp.Formulation command raised an assertion error. I'm sure the issue is something stupid, but I'm wondering if it might have to do with a dependency on patsy or something else. Any pointers on how to resolve this?


Display results at each iteration

Hi,
I am estimating a large model and want to monitor the optimization process. I am wondering whether there is any option to display results at each iteration, during (not after) the optimization. The analogous option in MATLAB, for example, is
options = optimoptions(@fminunc, 'Display', 'iter');

Thank you!
