Git Product home page Git Product logo

Comments (10)

diviyank avatar diviyank commented on August 20, 2024 1

Sorry for the delay.
To solve @ArnoVel 's issue, it originates from :

pc = PC(CItest="hsic",method_indep="rcit")

These two options are incompatible. to use RCIT, you should use CItest='randomized'.
I should change the API to make it simpler to avoid these issues.

from causaldiscoverytoolbox.

ArnoVel avatar ArnoVel commented on August 20, 2024

If my understanding is correct, either pc.R does not write results (because it fails to do so, or is simply not called) ; or PC.py does not know how/where to get the results back.

More detailed:

  • Check the fact R package can load
dfpkg = DefaultRPackages()
dfpkg.check_R_package('some_package')

Returns True for all the packages needed.

  • The scripts seem to be writing to funny paths such as /tmp/cdt_pc_ea47a7a6-85c3-4624-a8a8-144b68ae72b0//data.csv. Wouldn't the double-slash ( <something>//<something>) mess things up?
  • The problem seems to come from the launch_R_script function from R.py

from causaldiscoverytoolbox.

ArnoVel avatar ArnoVel commented on August 20, 2024

Update: problem comes from the R script pc.R.
The following pops on the terminal when the process is called:

Loading required package: momentchi2
Loading required package: MASS
Casting arguments...
Error in skeleton(suffStat, indepTest, alpha, labels = labels, method = skel.method,  : 
  Evaluation error: object 'pval' not found.
Calls: runPC -> <Anonymous> -> skeleton
Execution halted

Investigating further...

from causaldiscoverytoolbox.

ArnoVel avatar ArnoVel commented on August 20, 2024

Strangely enough, the CRAN pcalg doc doesn't list any pval variable for the skeleton function..
Edit: this is what I found through R's getAnywhere() : the relevant bit of the skeleton function is:

                  if (length_nbrs >= ord) {
                    if (length_nbrs > ord) 
                      done <- FALSE
                    S <- seq_len(ord)
                    repeat {
                      n.edgetests[ord1] <- n.edgetests[ord1] + 
                        1
                      pval <- indepTest(x, y, nbrs[S], suffStat)
                      if (verbose) 
                        cat("x=", x, " y=", y, " S=", nbrs[S], 
                          ": pval =", pval, "\n")
                      if (is.na(pval)) 
                        pval <- as.numeric(NAdelete)
                      if (pMax[x, y] < pval) 
                        pMax[x, y] <- pval
                      if (pval >= alpha) {
                        G[x, y] <- G[y, x] <- FALSE
                        sepset[[x]][[y]] <- nbrs[S]
                        break
                      }
                      else {
                        nextSet <- getNextSet(length_nbrs, ord, 
                          S)
                        if (nextSet$wasLast) 
                          break
                        S <- nextSet$nextSet
                      }

What do you think? Some problem related to the IndepTest itself, or the way it is passed to R

from causaldiscoverytoolbox.

ArnoVel avatar ArnoVel commented on August 20, 2024

Ok, went to check the pc.R file, and something bugs me.
First, you define the PC options this way

# additional options for PC
  optionsList <- list("indepTest"={CITEST}, "fixedEdges"=fixedEdges,
                      "NAdelete"=TRUE, "m.max"=Inf, "u2pd" = "relaxed",
                      "skel.method"= "stable.fast", "conservative"=FALSE,
                      "maj.rule"=TRUE, "solve.confl"=FALSE, numCores={NJOBS})

which becomes (after your template editing) in my case

  # additional options for PC
  optionsList <- list("indepTest"=kpcalg::kernelCItest, "fixedEdges"=fixedEdges,
                      "NAdelete"=TRUE, "m.max"=Inf, "u2pd" = "relaxed",
                      "skel.method"= "stable.fast", "conservative"=FALSE,
                      "maj.rule"=TRUE, "solve.confl"=FALSE, numCores=1)

But then, both in the template pc.R and after-editing, I find that the last function call

result <- runPC(dataset, suffStat = NULL, parentsOf, alpha,
               variableSelMat, setOptions,
               directed, verbose, fixedEdges, fixedGaps, CI_test)

is made using some variable CI_test which I cannot find in any of the two files?

Would that be the problem ? (as you know that pval , the output of indepTest is the root of my issue)

Any clue would help me greatly!

Edit: So far, the variable CI_test takes the place of result in the function call, which is immediately set to an empty list through

result <- vector("list", length = length(parentsOf))

I do not see the point of this? And why give result as function argument; and why give CI_test such a suggestive name ? Much confusion

from causaldiscoverytoolbox.

diviyank avatar diviyank commented on August 20, 2024

Hi,
Thanks for the detailed bug report, I'll look into this
The strange point is that the Continuous integration tool did not report any bug....

Best,
Diviyan

from causaldiscoverytoolbox.

braden-scherting avatar braden-scherting commented on August 20, 2024

Hi All,

I'll just note that I encounter this error for all R-dependent cdt.causality.graph methods. The problem persists with custom data.

Running the basic tutorial code verbatim:

PC is ran on the skeleton of the given graph.
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
 in 
----> 1 model.predict(data, new_skeleton)

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/model.py in predict(self, df_data, graph, **kwargs)
     63             return self.create_graph_from_data(df_data, **kwargs)
     64         elif isinstance(graph, nx.DiGraph):
---> 65             return self.orient_directed_graph(df_data, graph, **kwargs)
     66         elif isinstance(graph, nx.Graph):
     67             return self.orient_undirected_graph(df_data, graph, **kwargs)

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/PC.py in orient_directed_graph(self, data, graph, *args, **kwargs)
    240         """
    241         warnings.warn("PC is ran on the skeleton of the given graph.")
--> 242         return self.orient_undirected_graph(data, nx.Graph(graph), *args, **kwargs)
    243 
    244     def create_graph_from_data(self, data, **kwargs):

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/PC.py in orient_undirected_graph(self, data, graph, **kwargs)
    219         fg = DataFrame(1 - fe.values)
    220 
--> 221         results = self._run_pc(data, fixedEdges=fe, fixedGaps=fg, verbose=self.verbose)
    222 
    223         return nx.relabel_nodes(nx.DiGraph(results),

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/PC.py in _run_pc(self, data, fixedEdges, fixedGaps, verbose)
    302         except Exception as e:
    303             rmtree(run_dir)
--> 304             raise e
    305         except KeyboardInterrupt:
    306             rmtree(run_dir)

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/PC.py in _run_pc(self, data, fixedEdges, fixedGaps, verbose)
    298 
    299             pc_result = launch_R_script("{}/R_templates/pc.R".format(os.path.dirname(os.path.realpath(__file__))),
--> 300                                         self.arguments, output_function=retrieve_result, verbose=verbose)
    301         # Cleanup
    302         except Exception as e:

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/utils/R.py in launch_R_script(template, arguments, output_function, verbose, debug)
    198         if not debug:
    199             rmtree(base_dir)
--> 200         raise e
    201     except KeyboardInterrupt:
    202         if not debug:

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/utils/R.py in launch_R_script(template, arguments, output_function, verbose, debug)
    192                                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    193             process.wait()
--> 194             output = output_function()
    195 
    196     # Cleaning up

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/cdt/causality/graph/PC.py in retrieve_result()
    286 
    287         def retrieve_result():
--> 288             return read_csv('{}/result.csv'.format(run_dir), delimiter=',').values
    289 
    290         try:

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    683         )
    684 
--> 685         return _read(filepath_or_buffer, kwds)
    686 
    687     parser_f.__name__ = name

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    455 
    456     # Create the parser.
--> 457     parser = TextFileReader(fp_or_buf, **kwds)
    458 
    459     if chunksize or iterator:

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    893             self.options["has_index_names"] = kwds["has_index_names"]
    894 
--> 895         self._make_engine(self.engine)
    896 
    897     def close(self):

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1133     def _make_engine(self, engine="c"):
   1134         if engine == "c":
-> 1135             self._engine = CParserWrapper(self.f, **self.options)
   1136         else:
   1137             if engine == "python":

~/.pyenv/versions/3.7.3/envs/cause_env/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1915         kwds["usecols"] = self.usecols
   1916 
-> 1917         self._reader = parsers.TextReader(src, **kwds)
   1918         self.unnamed_cols = self._reader.unnamed_cols
   1919 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] File b'/var/folders/r0/l_20fjfn6vs6l848f3jfysj00000gn/T/cdt_pc_b91480ab-9011-49ca-8f51-0f3aa9c5d861//result.csv' does not exist: b'/var/folders/r0/l_20fjfn6vs6l848f3jfysj00000gn/T/cdt_pc_b91480ab-9011-49ca-8f51-0f3aa9c5d861//result.csv'

from causaldiscoverytoolbox.

diviyank avatar diviyank commented on August 20, 2024

@nedarb708 , I think your issue is different; could you explain your setup ? Please provide the lines that error. This FilenotFoundError originates from the R process that errors. Are you running it in a docker ?

from causaldiscoverytoolbox.

ArnoVel avatar ArnoVel commented on August 20, 2024

@Diviyan-Kalainathan Thanks for this, it makes sense !
I do think however my intuition of first picking the type of test, and then specifying that I wanted to use the randomized version to approximate the HSIC test makes sense in terms of design?
Anyways, thank you for the explanation :)

from causaldiscoverytoolbox.

diviyank avatar diviyank commented on August 20, 2024

It should be much easier to understand, and errors might be avoided in the future :)

from causaldiscoverytoolbox.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.