gslab-econ / gslab_make
Python tools for GSLab
License: MIT License
Hi @jcconway,
As we work concurrently on other issues, my guess is that we'll find bugs that will need to be fixed for gslab_make to run correctly. I was thinking we could make a bug branch to centralize everything. Let me know if you have any ideas on how to facilitate the bug-fixing process.
Follows gentzkow/template#54. We would like to remove the git python dependency from gentzkow/template. This involves editing gslab_make to allow logging even when it is not being run inside a git repo. The python git dependency is used in check_setup.py.
This issue stems from gentzkow/template#53 (comment).
We want to streamline lab tools to use fewer dependencies for the sake of performance. We noticed that the future module is used sparingly throughout this repo, and would like to eliminate it.
Follows #9.
Following the conversion of Gentzkow/VirusAffect paper.lyx and SI.lyx to LaTeX format, it turns out that the function run_latex defined here does not return an error when there actually is a LaTeX error.
Therefore, instead of displaying an error message in the terminal and quitting, it runs forever without exiting.
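One common cause of this kind of hang is that pdflatex stops at an error and waits for keyboard input by default. A minimal sketch of a possible fix, assuming a pdflatex-based build; build_latex_command and run_latex_checked are hypothetical names for illustration, not the existing run_latex implementation:

```python
import subprocess


def build_latex_command(program, executable="pdflatex"):
    """Build a LaTeX command that stops at the first error instead of prompting."""
    return [
        executable,
        "-interaction=nonstopmode",  # never wait for keyboard input
        "-halt-on-error",            # exit at the first error
        program,
    ]


def run_latex_checked(program, executable="pdflatex"):
    """Run LaTeX and raise if it exits with a nonzero return code."""
    command = build_latex_command(program, executable)
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "LaTeX failed with return code %d:\n%s"
            % (result.returncode, result.stdout)
        )
    return result.stdout
```

With these flags the process exits with a nonzero code on a LaTeX error, so the caller can log the message and stop.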
Task list
@zkashner fyi
The purpose of this issue is to change how config files are loaded. We want to update paths using two config files: the default one and a second one which users can use to override default settings.
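A minimal sketch of the override logic, assuming both config files have already been parsed into dictionaries (for example from YAML). The function name merge_config and the example keys are hypothetical, not part of gslab_make:

```python
import copy


def merge_config(default, override):
    """Return a copy of `default` with values from `override` applied recursively."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical contents of the default and user config files
default_config = {"paths": {"input": "input", "output": "output"}, "max_attempts": 3}
user_config = {"paths": {"output": "/scratch/output"}}

config = merge_config(default_config, user_config)
```

Merging recursively means a user only has to list the settings they want to change; everything else falls back to the default file.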
Task list
Thoughts on what needs to be done:
Workplan
Update unit tests for gslab_make
Debug code for gslab_make
Notes/questions from going through the code and talking with Adam:
gslab_make
? (It's the only task that's more of an ideal as opposed to necessity.) My estimate is that it would take two weeks.
Initial round of changes listed here:
- I would vote that we get rid of the default paths altogether. They add complexity to the code and reduce transparency in the resulting make.py files. We might instead define the make.py template so the user defines a paths dictionary (or similar structure) at the top which includes values for the relevant paths. Then this can be explicitly passed forward into the run_XX commands, something like: run_program.run_stata(program = 'script.do', paths = paths). (Maybe we could then get rid of the set_option functionality as well?)
- It seems to me like the Linking and Link Logging functions are pretty separable. What about carving these out into a separate module gslab_links so the core gslab_make module is even lighter?
- Are we sure that all of the directory functions are still needed (i.e., cannot be replaced by equivalent built-in Python functions)? Some of the original motivations for these -- e.g., needing to clear directories without deleting the .svn directories that SVN creates -- are no longer relevant. But it's fine to keep them if they allow us to replace 3 lines of code with 1 or something in every make.py.
- Do we want to make run_XX an alias for run_program.run_XX? So I can just write run_stata(program = 'script.do', paths = paths).
- write_output_logs seems like a vague name. I would not have known what this did based on the name alone. Maybe log_files_in_directory?
- I might re-order the readme as Logging Functions, Program Functions, Directory Functions. (I'm imagining we're carving the Link stuff into a separate repo.) I might re-order within program functions to put execute_command last and then order the run_XX functions alphabetically.
- Can we get rid of the /gslab_make_dev/ subdirectory and just have the library be gslab_make? (I think we could delete gslab_make from gslab_python at this point to avoid conflicts.) Also, do we think it's necessary to keep the /private/ subdirectory? Not sure what best practice is on that but it seems like it might be overkill.
Follows #41, in particular this comment.
The issue is that many unit tests are written like so:

    def test_something(self):
        try:
            some_code_that_possibly_throws_error
        except Exception as e:
            self.assertRaises(Exception, e)

This is meant to check that faulty inputs cause errors. But tests written this way will always pass.
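The reason is that if no error is thrown, the except branch never runs, and if one is thrown, assertRaises(Exception, e) tries to call the exception object, which raises a TypeError that itself satisfies the assertion. A sketch of the usual alternative, using assertRaises as a context manager; parse_positive is a hypothetical function under test, not part of gslab_make:

```python
import unittest


def parse_positive(value):
    """Hypothetical function under test: rejects non-positive input."""
    number = int(value)
    if number <= 0:
        raise ValueError("value must be positive")
    return number


class TestParsePositive(unittest.TestCase):
    def test_rejects_negative_input(self):
        # The test fails if no ValueError is raised inside the block
        with self.assertRaises(ValueError):
            parse_positive("-1")

    def test_accepts_positive_input(self):
        self.assertEqual(parse_positive("3"), 3)
```

Naming the specific exception (ValueError rather than Exception) also keeps the test from passing on an unrelated failure.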
Follows Issue 38 in gentzkow/template.
In Issue 38 of gentzkow/template, we found that there were changes made to the copy of gslab_make in gentzkow/template which have not been merged into this repository. These changes are largely about checking that the user's conda environment setup is correct. In this issue, we will pull these changes into this repository.
Task list
See here.
- Make error handling as robust and user-friendly as possible
- Add functionality to inputs.txt that checks that inputs have an unmodified git status and gives a friendly warning if the check fails
- Add output_local functionality
Hi @gentzkow:
I've been updating gslab_make and template to follow some best practices and wanted to get your feedback.
I added some community standards documents (#23).
I migrated the documentation to Sphinx (#22). As a demo, I'm currently hosting it on my forked version of gslab_make (I don't have access to the main repository). I'm still playing around with some of the auto-documentation functionalities, but let me know if there's anything about the layout I should change.
In terms of next steps:
gslab_make, template, and ra-manual as a response to the data editor reviews from SocialMediaEffects.
How do you think I should best prioritize the next steps?
Follows Issue 38 in gentzkow/template.
In Issue 38 of gentzkow/template, we are moving to use Git submodules to track the entire gslab_make codebase. Scripts that use gslab_make should then include only the module in the current gslab_make folder.
To make the name of the included module more intuitive, we will change the folder name from gslab_make to source. When importing the relevant module from this codebase, this lets us import from path gslab_make/source instead of from path gslab_make/gslab_make.
Follows Revise run_all.py #58 in the gentzkow/template repository.
After completing the setup process in gentzkow/template, snairdesai and I both got the same error after running run_all.py in the terminal. The text of the error can be seen in template_build_error terminal.txt. The purpose of the run_module function is to 'Run script build_script in module directory module relative to root of repository root'. Therefore, the function should return to the root directory after running a module if run from the root directory.
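A minimal sketch of how the working directory could be restored, even if the build script fails. It assumes the build script is a Python file; working_directory and run_module_sketch are hypothetical names for illustration, not the existing run_module code:

```python
import contextlib
import os
import subprocess
import sys


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it even on error."""
    original = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(original)


def run_module_sketch(root, module, build_script="make.py"):
    """Hypothetical stand-in for run_module: run `build_script` inside `module`."""
    module_dir = os.path.join(root, module)
    with working_directory(module_dir):
        # check=True raises CalledProcessError if the script exits with an error
        subprocess.run([sys.executable, build_script], check=True)
```

Because the directory change is undone in a finally clause, a failure in one module does not leave later modules running from the wrong directory.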
Task list
This is an issue in which @danielagonzalez can review the current refactored gslab_make code, raise any questions she may have for @z-y-huang and @jcconway, and suggest additional improvements to the code. Particular attention should be paid to the code in the branch here.
Per https://github.com/gentzkow/CommitFlex/issues/112#issuecomment-1262897169, we could add support for Julia in the gslab_make module. Tests will be run in template (see gentzkow/template#67).
Task list
As part of the larger project outline on the wiki, this task is to remove outdated features from gslab_make. As listed on the wiki, this includes the following tasks:
get_externals and get_externals_github
Currently, make.py does not break when Stata runs into an error (because Stata does not return an error code).
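One possible workaround is to scan the Stata log after each run, since Stata records errors in the log as a return code line such as r(198);. A minimal sketch under that assumption; find_stata_error and check_stata_log are hypothetical names, not existing gslab_make functions:

```python
import re

# Stata writes errors to its log as a return code on its own line, e.g. "r(198);"
STATA_ERROR_PATTERN = re.compile(r"^r\((\d+)\);[ \t]*$", re.MULTILINE)


def find_stata_error(log_text):
    """Return the first Stata return code found in the log text, or None."""
    match = STATA_ERROR_PATTERN.search(log_text)
    return int(match.group(1)) if match else None


def check_stata_log(log_text):
    """Raise if the log text contains a Stata error code."""
    code = find_stata_error(log_text)
    if code is not None:
        raise RuntimeError("Stata exited with error r(%d)" % code)
```

A make.py could call check_stata_log on the contents of the log file after each run and stop at the first failure.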
Goals:
1. tablefill and textfill for Python 3
2. tablefill and textfill to make codebase more Pythonic and transparent
I would focus on completing (1) then moving on to (2). I had started on (1) with this commit but never actually tested to see if everything was working correctly on Python 3.
I would branch off issue13_template_to_dos.
Following gentzkow/template#65 (comment), we want to use the latest versions of software + packages in the lab projects, unless there is a valid argument to freeze one or more of the packages to specific versions. @snairdesai and I proposed to comment the versions used in successful runs, but that might lead to errors that are difficult to track (i.e., someone forgets to manually update the comments). As discussed in a meeting, @gentzkow agreed that a good idea would be to automatically produce a log file that keeps track of all the versions installed by a user in any commit that is pushed to the repo. In that scenario, if a script stops working in the future, we can check the difference in the software + package versions and identify the source of the error.
Task list
write_logs.py, write_source_logs.py) and if we should add a separate script for this purpose or add a function to write_logs.py
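A minimal sketch of what such a version log could look like for the Python side only, using the standard library's importlib.metadata. The function names and the log format are assumptions for illustration; versions of other software (R, Stata, and so on) would need separate handling:

```python
import sys
from importlib import metadata


def collect_versions():
    """Return the Python version and a sorted list of 'name==version' strings."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            packages.add("%s==%s" % (name, version))
    return sys.version.split()[0], sorted(packages, key=str.lower)


def write_version_log(path):
    """Write the collected versions to a plain-text log file."""
    python_version, packages = collect_versions()
    with open(path, "w") as log:
        log.write("python==%s\n" % python_version)
        log.write("\n".join(packages) + "\n")
```

Sorting the entries keeps the log stable between runs, so a diff between two commits shows only the packages whose versions changed.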
In accordance with larger project outline on the wiki.
We will use this repo to build a simplified version of the gslab_make and gslab_fill modules which are currently part of gslab_python.
This issue takes the first step which is to import the existing content.
Steps:
- gslab_scons and gslab_misc subdirectories
- gslab_scons and gslab_misc, and to boil things down to the minimum needed to run gslab_make and gslab_fill
- gslab_make and gslab_fill files can be combined in a single directory. Make sure everything follows best practices for organizing modules -- see e.g. here.
At this point you should not make any significant changes to the code inside gslab_make and gslab_fill. That will be the next step.
Following #16.
Refactor tablefill. Goal is to simplify codebase and add error messaging.
A few minor changes need to be made in order for the changes in #45 to be useable in gentzkow/template.
Continuation of #13.
gslab_make and template for SocialMediaEffects
Follows gentzkow/template#45. More specific guidance here.
Among the goals of gentzkow/template#45 is to move directory paths to config.yaml. This requires a function akin to update_paths() that can load PATHS.
Follows issue 261 in PhoneAddiction.
When running many bootstrap iterations in R for the PhoneAddiction repo, we found that the R files ran correctly when run using RScript, but hung and did not complete when run using gslab.run_r. The diagnosis for this appears to be the process.wait() statement, which may block if pipes overflow. From the documentation,
Note: This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.
This StackOverflow discussion offers another explanation of the difference between wait and communicate.
The best fix appears to be simply removing the process.wait() line, as the subsequent process.communicate() line already calls wait internally. The one concern is that communicate() cannot handle output which is too large to be buffered in memory. ("Too large" here is on the order of many megabytes to gigabytes.) Since our outputs are mostly R logs, though, it seems that communicate() should be sufficient for our purposes.
Task list
process.wait() from Directive.execute_command()
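A small self-contained sketch of the difference, using a Python child process as a stand-in for the R script (the child command and the output size are assumptions chosen for illustration):

```python
import subprocess
import sys

# Child process that writes far more than a typical OS pipe buffer can hold
child_code = "import sys; sys.stdout.write('x' * 1000000)"

process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Calling process.wait() here could deadlock: the child blocks once the pipe
# fills up, and the parent never reads from it. communicate() drains both
# pipes while waiting for the child to exit.
stdout, stderr = process.communicate()
return_code = process.returncode
```

After communicate() returns, process.returncode is already set, so no separate wait() call is needed to check whether the program succeeded.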
@Houdanait tagging you here if you have any other thoughts on wait vs communicate!
Add community standards suggested here to promote open source contribution.
In accordance with larger project outline on the wiki.
Have gslab_make under gslab_python_lite.