Associate Editor: Julien Chiquet
Reviewer 1: Nicolas Bousquet (chose to lift his anonymity)
Reviewer 1: Reviewing history
- Paper submitted March 30, 2022
- Reviewer invited April 25, 2022
- Review 1 received June 29, 2022
- Paper revised December 15, 2022
- Reviewer invited December 16, 2022
- Review 2 received January 01, 2023
- Paper conditionally accepted January 02, 2023
- Paper published January 11, 2023
First Round (received June 29, 2022)
Recommendation
Revise and resubmit
General comment
This paper belongs to the class of software papers that present implementations of statistical/ML algorithms encapsulated within a new module. It aims to offer new Python implementations of copula families that extend the range of families provided in existing Python packages, filling a gap with respect to R, which today has many powerful packages on this theme (including, in particular, vine copulas).
In more detail, the COPPY module proposed by the author gives access to the extreme value family of copulas, both for inference and simulation, which was not included in the established Python tools “Copulas” and “Copulae”. A key point is that COPPY provides sampling techniques, which are of great interest in problems related to predictive analysis, bootstrapping, machine learning, etc.
The explanations of how the module is built are interesting and useful. The paper is easy to read, well illustrated, and the entanglement with the code allows for easy reproduction. It would be appreciated if the use of the selected copula families in several applications were referenced, along with some isodensity plots illustrating their main properties (e.g., isodensity plots for asymmetric copulas), in addition to the illustrations provided in the final sections of the paper. I think the paper broadly meets the requirements of Computo, and could be accepted after a revision that takes these general and specific comments into account.
A main concern, in my opinion, is that a Python platform used by many practitioners (e.g. engineers) is OpenTURNS, which already incorporates a wide variety of multivariate parametric models, especially in the form of copulas. See:
links
Author's answer: Thank you for sharing information about the OpenTURNS package in Python. I wasn’t aware of it before. In the main paper, I have added the following sentence: Other packages provide sampling methods for copulae, but they are typically restricted to the bivariate case and the conditional simulation method (see, for example, [Baudin et al., 2017]). Additionally, if the multivariate case is considered only Archimedean and elliptical copulae are under interest and those packages (see [Nicolas, 2022]) do not include the extreme value class in arbitrary dimensions $d ≥ 2$.
It seems clear to me that the paper, in addition to taking into account the specific comments listed below, should look at this platform and position its content with respect to it. I am confident that the author will provide some details about the differences and complementarity between COPPY and OpenTURNS.
Specific comments
Introduction
“it is characterizing only for a few models, the multivariate normal distribution”. This sentence is strangely formulated. The Student copula is defined by a correlation matrix too, and Lévy-based copulas probably offer similar properties. I suggest saying that the use of the usual (linear) correlation coefficients is most often misleading, since rank correlations are known to be genuine dependence indicators, but that in practice (please search for appropriate references) the multivariate normal is thought to be “easy” to use because of its canonical covariance / correlation parametrization. Many papers deal with this problem of reducing dependence structures to covariance matrices.
Author's answer: I understand the referee’s concerns about the sentence “it is characterizing only for a few models, the multivariate normal distribution.” Here is a new sentence that conveys the same idea: It is well known that only linear dependence can be captured by the covariance and it is only characteristic for a few models, e.g., the multivariate normal distribution or binary random variables. To elaborate on this point, consider two random variables $X$ and $Y$ defined in the same probability space. Then, it is not necessarily true that if $Cov(X, Y ) = 0$, the random variables $X$ and $Y$ are independent. It also holds for Student copulae that are parametrized by the correlation matrix. It is true that rank correlations are better indicators of dependence than linear correlations, but they are not able to detect more complicated nonlinear, non-monotone dependencies (see, e.g., [Drton et al., 2020]). These concerns are already addressed by the sentence quoted earlier about the limitations of the covariance matrix.
“of prime interest for…” I suggest adding appropriate references here for the fields listed by the author.
Author's answer: Thank you for pointing out the need for references in this sentence. I have now added the following sentence: The theory of copulae has been of prime interest for many applied fields of science, such as quantitative finance ([Patton, 2012]) or environmental sciences ([Mishra and Singh, 2011]).
Figure 1: The symbols used in this figure are explained later in the text. I suggest, however, providing a short explanation in the legend to make the figure easier to read.
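The point that zero covariance does not imply independence can be checked exactly on a small discrete example. The sketch below is only an illustration and is not taken from the paper: it assumes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$, and all names are hypothetical.

```python
from fractions import Fraction

# Illustrative assumption (not from the paper): X uniform on {-1, 0, 1}, Y = X**2.
support = [-1, 0, 1]
p = Fraction(1, 3)

# Joint distribution of (X, Y) as a dictionary {(x, y): probability}.
joint = {(x, x * x): p for x in support}

def expect(f):
    """Exact expectation of f(X, Y) under the joint distribution."""
    return sum(prob * f(x, y) for (x, y), prob in joint.items())

# Cov(X, Y) = E[XY] - E[X] E[Y]
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Compare P(X = 0, Y = 0) with P(X = 0) * P(Y = 0).
p_joint = joint.get((0, 0), Fraction(0))
p_x0 = sum(prob for (x, _), prob in joint.items() if x == 0)
p_y0 = sum(prob for (_, y), prob in joint.items() if y == 0)

print("Cov(X, Y) =", cov_xy)
print("P(X=0, Y=0) =", p_joint, " P(X=0) P(Y=0) =", p_x0 * p_y0)
```

Here the covariance vanishes because $E[X^3] = E[X] = 0$, while the joint probability of $\{X = 0, Y = 0\}$ differs from the product of the marginals, so $X$ and $Y$ are dependent.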
Section 2.1
“It is peculiarly… than d”. A reference would be suitable here.
Author's answer: Thank you for pointing out the need for an additional reference. The corresponding sentence has been modified as follows: Note that d-monotonic Archimedean inverse generators do not necessarily generate Archimedean copulae in dimensions higher than $d$ (see [McNeil and Neslehova, 2009]).
Section 2.2
Asymmetric dependence: could you provide a graphical illustration to help the reader understand? A more general question is: can COPPY help to visualize copulas with isodensity plots, a usual technique in R?
Author's answer: The package COPPY (now known as clayton) does not provide tools for visualizing isodensity. As for asymmetric dependence, here is the definition for the bivariate case added in the main text: Asymmetric dependence refers to the property where, for a bivariate copula $C$, there exists $(u_0, u_1) \in [0,1]^2$ such that
$$ C(u_0, u_1) \neq C(u_1, u_0) $$
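As a numerical illustration of this definition, the sketch below evaluates a bivariate Marshall-Olkin copula, $C(u, v) = \min(u^{1-\alpha} v,\, u v^{1-\beta})$, at a point and at its mirror image. This family and the parameter values are chosen here only as an example and are not taken from the paper or from the clayton package; the function name is hypothetical.

```python
def marshall_olkin(u, v, alpha, beta):
    """Bivariate Marshall-Olkin copula with parameters alpha, beta in [0, 1]."""
    return min(u ** (1 - alpha) * v, u * v ** (1 - beta))

# Illustrative parameters: alpha != beta makes the copula non-exchangeable.
alpha, beta = 0.3, 0.9
u0, u1 = 0.2, 0.8

c01 = marshall_olkin(u0, u1, alpha, beta)  # C(u0, u1)
c10 = marshall_olkin(u1, u0, alpha, beta)  # C(u1, u0)

print(f"C(u0, u1) = {c01:.4f}, C(u1, u0) = {c10:.4f}")

# With alpha == beta the two values coincide, i.e. the copula is symmetric.
sym01 = marshall_olkin(u0, u1, 0.5, 0.5)
sym10 = marshall_olkin(u1, u0, 0.5, 0.5)
print(f"symmetric case: {sym01:.4f} vs {sym10:.4f}")
```

With unequal parameters the two evaluations differ, which is exactly the asymmetry property stated above; with equal parameters they agree.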
Section 3.1
The Pickands dependence function has not been defined beforehand. It would help the reader to give short details about it and its importance in multivariate analysis.
Author's answer: Thank you for pointing out the need for more information about the Pickands dependence function. I have now added the following text in Section 3.1: The Pickands dependence function characterizes the extremal dependence structure of an extreme value random vector and verifies $\max\{w_0, \dots, w_{d-1}\} \leq A(w_0, \dots, w_{d-1}) \leq 1$ where the lower bound corresponds to comonotonicity and the upper bound corresponds to independence. Estimating this function is an active area of research, with many compelling studies having been conducted on the topic (see, for example, [Bücher et al., 2011], [Gudendorf and Segers, 2012]).
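The two bounds can be checked numerically on a concrete model. The sketch below assumes the logistic (Gumbel) Pickands function $A(w) = (\sum_i w_i^{\theta})^{1/\theta}$ with $\theta \geq 1$, evaluated at points $w$ of the unit simplex; the choice of model, the parameter values, and the helper names are illustrative assumptions and do not come from the paper or from the clayton package.

```python
import random

def logistic_pickands(w, theta):
    """Pickands function of the logistic (Gumbel) model, theta >= 1 (assumed form)."""
    return sum(x ** theta for x in w) ** (1.0 / theta)

def random_simplex_point(d, rng):
    """Draw a point of the unit simplex by normalising exponential variates."""
    e = [rng.expovariate(1.0) for _ in range(d)]
    total = sum(e)
    return [x / total for x in e]

rng = random.Random(0)   # fixed seed so the check is reproducible
theta, d = 2.5, 4        # illustrative values
tol = 1e-12              # tolerance for floating-point rounding

points = [random_simplex_point(d, rng) for _ in range(1000)]
bounds_hold = all(
    max(w) - tol <= logistic_pickands(w, theta) <= 1.0 + tol for w in points
)
print("max(w) <= A(w) <= 1 on all sampled points:", bounds_hold)
```

For this model the bounds follow from the ordering of the $\ell_\infty$, $\ell_\theta$ and $\ell_1$ norms on the simplex; $\theta = 1$ gives $A \equiv 1$ (independence), and $\theta \to \infty$ approaches $\max_i w_i$ (comonotonicity).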
Section 3.2
“Sample from the Gaussian …”: should this be corrected to “samples”?
Section 4
Even though I recognize that this could be a very long task, I would have appreciated a comparison table against other existing packages, comparing CPU time, the effect of dimension, etc., perhaps in another dedicated section before the discussion. I leave this decision to the AE, but it would be a noticeable gain in information for readers who want to incorporate copula handling into machine-learning-type routines.
Author's answer: Thank you for suggesting a comparison of different packages for copula. I have added a new subsection in the discussion section (Section 5) where we compare our package clayton with two other packages in R: the copula and mev packages. We provide a comparison table that includes metrics such as CPU time and the effect of dimension. This should provide valuable information for readers who want to incorporate copulas into machine-learning routines.
Section 5
I completely agree with the proposed line of improvement concerning vines. Automatic copula selection using statistical criteria would be a very useful new contribution for the community.
Author's additional References
- [Baudin et al., 2017] Baudin, M., Dutfoy, A., Iooss, B., and Popelin, A.-L. (2017). Openturns: An industrial software for uncertainty quantification in simulation. In Handbook of uncertainty quantification, pages 2001–2038. Springer.
- [Bücher et al., 2011] Bücher, A., Dette, H., and Volgushev, S. (2011). New estimators of the Pickands dependence function and a test for extreme-value dependence. The Annals of Statistics, 39(4):1963–2006.
- [Drton et al., 2020] Drton, M., Han, F., and Shi, H. (2020). High-dimensional consistent independence testing with maxima of rank correlations. The Annals of Statistics, 48(6):3206–3227.
- [Gudendorf and Segers, 2012] Gudendorf, G. and Segers, J. (2012). Nonparametric estimation of multivariate extreme-value copulas. Journal of Statistical Planning and Inference, 142(12):3073–3085.
- [McNeil and Neslehova, 2009] McNeil, A. J. and Neslehova, J. (2009). Multivariate Archimedean copulas, d-monotone functions and l1-norm symmetric distributions. The Annals of Statistics, 37(5B):3059–3097.
- [Mishra and Singh, 2011] Mishra, A. K. and Singh, V. P. (2011). Drought modeling – a review. Journal of Hydrology, 403(1):157–175.
- [Nicolas, 2022] Nicolas, M. L. (2022). pycop: a python package for dependence modeling with copulas. Zenodo Software Package, 70:7030034.
- [Patton, 2012] Patton, A. J. (2012). A review of copula models for economic time series. Journal of Multivariate Analysis, 110:4–18.
Second Round (received January 1, 2023)
The revision is good, and the author has satisfactorily answered my remarks. I appreciate the care given to finalizing the package (a wish expressed by the second reviewer). A small shortcoming is that it is not possible to plot visual diagnostics, but this could probably be handled in another package. I therefore think the manuscript is suitable for publication.
Recommendation
Accept