
gslab_make's People

Contributors

etang21, gentzkow, houdanait, jc-cisneros, jcconway, szahedian, z-y-huang, zkashner

gslab_make's Issues

Debug code for gslab_make

Hi @jcconway,

As we work concurrently on other issues, my guess is that we'll find bugs that will need to be fixed for gslab_make to run correctly. I was thinking we could make a bug branch to centralize everything. Let me know if you have any ideas on how to facilitate the bugfixing process.

Fix run_latex hanging errors

Following the conversion of Gentzkow/VirusAffect paper.lyx and SI.lyx to latex format, it turns out that the function run_latex defined here does not return an error when there actually is a LaTeX error.

As a result, instead of displaying an error message in the terminal and quitting, it hangs indefinitely without exiting.

Task list

  • Fix the run_latex function so that it stops when it encounters an error and displays the error message
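
One possible direction, sketched here under assumptions: invoke pdflatex with `-interaction=nonstopmode` and `-halt-on-error`, so that a LaTeX error ends the process with a nonzero exit code instead of leaving it waiting at an interactive prompt. The names `build_latex_command` and `run_latex_sketch` are hypothetical and are not the existing run_latex implementation.

```python
import subprocess


def build_latex_command(tex_file, executable="pdflatex"):
    """Build a command that makes LaTeX stop instead of prompting on errors."""
    return [
        executable,
        "-interaction=nonstopmode",  # never wait for keyboard input
        "-halt-on-error",            # exit at the first error
        tex_file,
    ]


def run_latex_sketch(tex_file, executable="pdflatex"):
    """Run LaTeX and raise with the captured log if compilation fails."""
    result = subprocess.run(
        build_latex_command(tex_file, executable),
        stdin=subprocess.DEVNULL,   # a stray prompt cannot block on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError("LaTeX failed for %s:\n%s" % (tex_file, result.stdout))
    return result.stdout
```

Raising with the captured output would surface the LaTeX error message to the caller rather than discarding it.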

@zkashner fyi

Default config file for update_paths

Follows comment of issue.

The purpose of this issue is to change how config files are loaded. We want to update paths using two config files: the default one and a second one which users can use to override default settings.
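
A minimal sketch of the override logic, assuming both config files have already been parsed into dictionaries. The function name `merge_configs` and the setting names are hypothetical, chosen only for illustration; how the files are actually read is left out.

```python
def merge_configs(default_config, user_config=None):
    """Return default settings with any user-supplied values layered on top."""
    merged = dict(default_config)      # copy so the defaults stay untouched
    merged.update(user_config or {})   # user values win on key conflicts
    return merged


# Hypothetical settings, for illustration only.
default_config = {"stata_executable": "stata-mp", "max_file_size": 5}
user_config = {"stata_executable": "stata-se"}

config = merge_configs(default_config, user_config)
```

Settings the user does not mention keep their default values, so the user file only needs to list what differs.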

Task list

Miscellaneous repository tidying tasks

Follows #22, #26.

Tidy up repository after documentation and testing overhaul.

Task list

  • Clean up codebase (e.g., standardize formatting, remove outdated imports, etc.)
  • Add LaTeX program function
  • Add continuous documentation compiling to Github Actions
  • Add continuous testing to Github Actions

Brainstorm workplan for gslab_make

Thoughts on what needs to be done:


Workplan

  • Update unit tests for gslab_make

  • Debug code for gslab_make
    ↓↓

    • Remove outdated features for gslab_make
    • Refactor code for gslab_make
    • Implement new features for gslab_make

  • Restructure gslab_python to only contain gslab_scons

Notes/questions from going through the code and talking with Adam:

  • Unit tests currently exist on SVN, but are likely broken.
  • We need to confirm that the current version runs correctly (if at all).
  • Some features (e.g., downloading from SVN) are no longer relevant.
  • Do we want to invest the time to refactor the code for gslab_make? (It's the only task that's more of an ideal than a necessity.) My estimate is that it would take two weeks.
  • We can break down new features once we have finalized a list of what we want to add.

Implement new features for gslab_make

Initial round of changes listed here:

  1. I would vote that we get rid of the default paths altogether. They add complexity to the code and reduce transparency in the resulting make.py files. We might instead define the make.py template so the user defines a paths dictionary (or similar structure) at the top which includes values for the relevant paths. Then this can be explicitly passed forward into the run_XX commands, something like:

run_program.run_stata(program = 'script.do', paths = paths)

(Maybe we could then get rid of the set_option functionality as well?)

  2. It seems to me like the Linking and Link Logging functions are pretty separable. What about carving these out into a separate module gslab_links so the core gslab_make module is even lighter?

  3. Are we sure that all of the directory functions are still needed (i.e., cannot be replaced by equivalent built in Python functions)? Some of the original motivations for these -- e.g., needing to clear directories without deleting the .svn directories that SVN creates -- are no longer relevant. But it's fine to keep them if they allow us to replace 3 lines of code with 1 or something in every make.py.

  4. Do we want to make run_XX an alias for run_program.run_XX? So I can just write

run_stata(program = 'script.do', paths = paths)

  5. write_output_logs seems like a vague name. I would not have known what this did based on the name alone. Maybe log_files_in_directory?

  6. I might re-order the readme as Logging Functions, Program Functions, Directory Functions. (I'm imagining we're carving the Link stuff into a separate repo.) I might re-order within program functions to put execute_command last and then order the run_XX functions alphabetically.

  7. Can we get rid of the /gslab_make_dev/ subdirectory and just have the library be gslab_make? (I think we could delete gslab_make from gslab_python at this point to avoid conflicts.) Also, do we think it's necessary to keep the /private/ subdirectory? Not sure what best practice is on that but it seems like it might be overkill.
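
On the question above of whether the directory functions can be replaced by built-in Python functions: with no .svn directories to preserve, clearing a directory takes two standard-library calls. The sketch below is an illustration only, and the name `clear_dir_sketch` is hypothetical rather than the library's own function.

```python
import os
import shutil


def clear_dir_sketch(path):
    """Delete a directory and everything in it, then recreate it empty."""
    shutil.rmtree(path, ignore_errors=True)  # no error if the directory is absent
    os.makedirs(path)
```

Whether a wrapper like this is worth keeping depends on how often it saves lines in each make.py, which is the trade-off the item raises.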

Revise unit test syntax

Follows #41, in particular this comment.

The issue is that many unit tests are written like so

def test_something(self):
    try:
        some_code_that_possibly_throws_error
    except Exception as e:
        self.assertRaises(Exception, e)

This is meant to check that faulty inputs cause errors. But tests written this way will always pass.
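
One reason the pattern above cannot fail: when the code raises nothing, the except branch never runs, so nothing is asserted. A sketch of the usual alternative is to use `assertRaises` as a context manager, which fails the test unless the expected exception occurs. The function `parse_positive` is a hypothetical stand-in for the code under test.

```python
import unittest


def parse_positive(value):
    """Hypothetical function under test: rejects non-positive input."""
    if value <= 0:
        raise ValueError("value must be positive")
    return value


class TestParsePositive(unittest.TestCase):
    def test_rejects_negative_input(self):
        # The test fails unless the block raises ValueError.
        with self.assertRaises(ValueError):
            parse_positive(-1)

    def test_accepts_positive_input(self):
        self.assertEqual(parse_positive(3), 3)
```

Naming the specific exception type, rather than the base `Exception`, also keeps unrelated errors from satisfying the test.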

Pull in check_conda functionality from gentzkow/template

Follows Issue 38 in gentzkow/template.

In Issue 38 of gentzkow/template, we found that there were changes made to the copy of gslab_make in gentzkow/template which have not been merged into this repository. These changes are largely about checking that the user's conda environment setup is correct. In this issue, we will pull these changes into this repository.

Task list

  • Pull in changes from gentzkow/template

Review progress and discuss next steps

Hi @gentzkow:

I've been updating gslab_make and template to some best practices and wanted to get your feedback.

  1. I added some community standards documents (#23).

  2. I migrated the documentation to Sphinx (#22). As a demo, I'm currently hosting it on my forked version of gslab_make (I don't have access for the main repository). I'm still playing around with some of the auto-documentation functionalities, but let me know if there's anything about the layout I should change.

In terms of next steps:

  1. I want to revamp the unit tests (#9) and switch to continuous testing (I was thinking about using Travis CI).
  2. I need to wrap up some of the changes I was making to gslab_make, template, and ra-manual as a response to the data editor reviews from SocialMediaEffects.
  3. I did a first pass at creating real examples for the template and need to continue building on them.

How do you think I should best prioritize the next steps?

Rename `gslab_make/` folder to `source/`

Follows Issue 38 in gentzkow/template.

In Issue 38 of gentzkow/template, we are moving to use Git submodules to track the entire gslab_make codebase. Scripts that use gslab_make then should include only the module in the current gslab_make folder.

To make the name of the included module more intuitive, we will change the folder name from gslab_make to source. When importing the relevant module from this codebase, this lets us import from path gslab_make/source instead of from path gslab_make/gslab_make.

  • update folder name
  • update other code paths
  • update documentation

Revise run_module function in run_program.py

Follows Revise run_all.py #58 in the gentzkow/template repository.

After completing the setup process in gentzkow/template, snairdesai and I both got the same error after running run_all.py in the terminal. The text of the error can be seen in template_build_error terminal.txt. The purpose of the run_module function is to 'Run script build_script in module directory module relative to root of repository root'. Therefore, when run from the root directory, the function should return to the root directory after running a module.

Task list

  • Edit run_module in run_program.py. Conduct a test run of the build script run_all.py template in gentzkow/template repository.
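
One way to guarantee the return to the root directory, sketched under assumptions: wrap the directory change in a context manager that restores the previous working directory even if the module's build script fails. The name `working_directory` is hypothetical and is not part of run_program.py.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the body raises
```

A function like run_module could then run the build script inside `with working_directory(module_dir):`, where `module_dir` is an illustrative variable name.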

Remove get_externals, SVN, and other outdated gslab_make features

As part of the larger project outline on the wiki, this task is to remove outdated features from gslab_make. As listed on the wiki, this includes the following tasks:

  • Remove get_externals and get_externals_github
  • Remove SVN related features
  • Flag other features for potential removal

Refactor tablefill and textfill

Goals:

  1. Port tablefill and textfill for Python 3
  2. Refactor tablefill and textfill to make codebase more Pythonic and transparent

I would focus on completing (1) then moving on to (2). I had started on (1) with this commit but never actually tested to see if everything was working correctly on Python 3.

I would branch off issue13_template_to_dos.

Add log file of software + packages versions

Following gentzkow/template#65 (comment), we want to use the latest versions of software + packages in the lab projects, unless there is a valid argument to freeze one or more of the packages to specific versions. @snairdesai and I proposed recording the versions used in successful runs as comments, but that might lead to errors that are difficult to track (e.g., someone forgets to manually update the comments). As discussed in a meeting, @gentzkow agreed that a good idea would be to automatically produce a log file that keeps track of all the versions installed by a user in any commit that is pushed to the repo. In that scenario, if a script stops working in the future, we can check the difference in the software + package versions and identify the source of the error.

Task list

  • Review the current log scripts (write_logs.py, write_source_logs.py) and decide whether to add a separate script for this purpose or add a function to write_logs.py
  • Write the function and test it on template

Import content from gslab_python

We will use this repo to build a simplified version of the gslab_make and gslab_fill modules which are currently part of gslab_python.

This issue takes the first step which is to import the existing content.

Steps:

  1. Copy all the content from gslab_python into this repo excluding the gslab_scons and gslab_misc subdirectories
  2. Edit the setup files, readme, etc. to eliminate references to gslab_scons and gslab_misc, and to boil things down to the minimum needed to run gslab_make and gslab_fill
  3. Revise the structure of the repo so it functions as a single module rather than a package with two modules; this means the gslab_make and gslab_fill files can be combined in a single directory. Make sure everything follows best practices for organizing modules -- see e.g. here.

At this point you should not make any significant changes to the code inside gslab_make and gslab_fill. That will be the next step.

Refactor tablefill

Following #16.

Refactor tablefill. Goal is to simplify codebase and add error messaging.

Fix hanging on execute_command

Follows issue 261 in PhoneAddiction.

When running many bootstrap iterations in R for the PhoneAddiction repo, we found that the R files ran correctly when run using RScript, but hung and did not complete when run using gslab.run_r. The diagnosis for this appears to be the process.wait() statement, which may block if pipes overflow. From the documentation,

Note This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.

This StackOverflow discussion offers another explanation of the difference between wait and communicate.

The best fix appears to be simply removing the process.wait() line, as the subsequent process.communicate() line already waits for the process to terminate. The one concern is that communicate() cannot handle output which is too large to be buffered in memory. ("Too large" is on the order of gigabytes or many megabytes here.) Since our outputs are mostly R logs, though, it seems that communicate() should be sufficient for our purposes.

Task list

  • Remove process.wait() from Directive.execute_command()
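
To illustrate the proposed fix, here is a small sketch that relies on communicate() alone. The helper `run_and_capture` is hypothetical and is not the actual Directive.execute_command(); the child process is just a Python one-liner that prints more output than a typical OS pipe buffer holds.

```python
import subprocess
import sys


def run_and_capture(command):
    """Run a command and return (exit code, stdout, stderr) without deadlocking."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # communicate() drains both pipes while waiting, so no separate wait() call.
    stdout, stderr = process.communicate()
    return process.returncode, stdout, stderr


# A child that prints far more than a typical OS pipe buffer holds.
child = [sys.executable, "-c", "print('x' * 1000000)"]
code, out, err = run_and_capture(child)
```

Calling process.wait() before communicate() in this setup is the situation the quoted documentation warns about: the child can block on a full pipe while the parent waits for it to exit.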

@Houdanait tagging you here if you have any other thoughts on wait vs communicate!
