gslab-econ / gslab_python
Python tools for GSLab
License: MIT License
Gang: I just made a fresh copy of the [template|https://github.com/gslab-econ/template] on my local machine and got all targets built successfully! It took a little time to get the configuration right but all the dependencies were clearly documented. This is great.
The only issue I ran into was with Stata flavor. Even though I have a (windows) environment variable set, the executable-finder logic did not catch it. Has that been tested on windows?
In the meantime I am running with scons sf=StataMP-64.exe. Is there a way to set "sf" within the SConstruct so that I don't have to specify it at the command line?
See #58 (comment)
This task will prepare a changelog for the gslab_python and template 4.0 release. In particular:
Document new & removed features, bugs fixed, and changes in existing functionality. Document where they happened.
Document required changes to existing projects' templates due to backward incompatibility.
Ping PIs for review.
See gslab-econ/template#19 (comment). Should either live in gslab_python or at the root of the template and its helpers live in gslab_python.
See the next two comments also.
The following code snippet in log.py is quite slow when there's a very large build.
for root, dirs, files in os.walk(parent_dir): # take a walk in parent and child dirs
    for f in files:
        print (root, dirs, f)
        if f.endswith("sconscript.log"):
            f_full = os.path.join(root, f)
My instinct is that there are two ways you can speed this up: 1) search the targets in sconscript for directories, look only in those directories, and then use grep; 2) use grep at the top directory.
I think this should be quick, if there indeed is a faster way. If not, I think we may want to delay this beyond 4.0 and build in a mechanism allowing you to not search for logs in certain directories, or to skip logging in some runs.
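One way to skip certain directories without leaving Python is to prune the walk itself. The sketch below is only an illustration: the function name find_sconscript_logs and the skip_dirs argument are hypothetical and are not part of log.py. It relies on the documented behavior that editing the dirs list in place stops os.walk from descending into the removed directories, and it drops the per-file print.

```python
import os


def find_sconscript_logs(parent_dir, skip_dirs=()):
    """Return paths of files ending in 'sconscript.log' under parent_dir.

    Directories whose names appear in skip_dirs are never entered.
    (Hypothetical helper, not the existing log.py code.)
    """
    skip = set(skip_dirs)
    found = []
    for root, dirs, files in os.walk(parent_dir):
        # Editing `dirs` in place tells os.walk (top-down mode) not to
        # descend into the directories that were removed.
        dirs[:] = [d for d in dirs if d not in skip]
        for f in files:
            if f.endswith("sconscript.log"):
                found.append(os.path.join(root, f))
    return sorted(found)
```

For example, find_sconscript_logs('.', skip_dirs=['raw', '.git']) would ignore everything under any directory named raw or .git. Whether this is fast enough on a very large build would still need to be timed.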
Follow package organization established in #24.
check_code_extension(source_file, '.do')
and in misc.py
def check_code_extension(source_file, ext):
    source_file = str.lower(source_file)
    if not source_file.endswith(ext):
        raise BadExtensionError('First argument, ' + source_file + ', must be a ' + ext + ' file')
    return None
target_file in the code makes it look like we require there to be a single target file. Really, we're only using this to pick a directory for the log file. So I might drop the lines starting with target_file and target_dir and instead say log_dir = os.path.dirname( str(target[0]) )
Should we have all builders return an informative error message for the case where the executable is not found / installed (so, e.g., if I call the Lyx builder and don't have lyx correctly on the path I get a nice error explaining what happened and suggesting that I make sure it's installed and on the path)? This is a very common case when people try to run our code, and it would be nice to handle it explicitly.
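A minimal sketch of such a check, assuming Python 3 (shutil.which is not available in Python 2.7). The names ExecutableNotFoundError and check_executable are hypothetical and do not exist in gslab_scons.

```python
import shutil


class ExecutableNotFoundError(Exception):
    """Hypothetical error raised when a required program is not on the PATH."""


def check_executable(name, program_label=None):
    """Return the full path to `name`, or raise an informative error."""
    path = shutil.which(name)  # None when the executable cannot be found
    if path is None:
        label = program_label or name
        raise ExecutableNotFoundError(
            "Could not find '%s' on the PATH. Make sure %s is installed "
            "and that its executable is on your PATH." % (name, label))
    return path
```

A builder could call something like check_executable('lyx', 'LyX') before running its command, so that a missing program fails with this message rather than an opaque shell error.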
In build tables we can replace source[1:len(source)] with source[1:]
I think we should restructure build_stata. I would suggest we define two custom functions get_stata_executable() and get_stata_options() so that the main builder ends with
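executable = get_stata_executable(user_flavor)
options = get_stata_options()
os.system('%s %s' % (executable, options))
and is otherwise the same as the other builders. Then we could structure get_stata_executable() as roughly:
def get_stata_executable(user_flavor):
    # Needs `import os` and `import sys`. The name `exec` is avoided
    # because it is a reserved word in Python 2.
    exec_list = ['stata-mp', 'stata-mp.exe', 'stata-mp-64.exe', 'stata-se', ...etc...]
    if user_flavor:
        return user_flavor
    # Use .get() so that an unset STATAEXE does not raise a KeyError
    if sys.platform == 'win32' and os.environ.get('STATAEXE'):
        return os.environ['STATAEXE']
    for e in exec_list:
        if is_in_path(e):
            return e  # first match in the list wins
    return None  # nothing found; the caller should raise an informative error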
if not (mode in ['develop', 'cache', 'release']):
    print("Error: %s is not a defined mode" % mode)
    sys.exit()
if mode == 'release' and vers == '':
    print("Error: Version must be defined in release mode")
    sys.exit()
This was code that was living in large_template and missed getting integrated into the original gslab_scons release. This step should involve removing this code from the template.
INPUT
from gslab-econ/template#19 (comment).
INPUT
suggestion, we should require that source have a single element, since these builders are designed to only handle a single source file?
(1) We log only the builds scons runs, but we append rather than overwrite; so the log accumulates a record of every run. One can easily search to find the most recent run of a particular build.
It should be structured so that it is easy to search and filter to find a specific run. Run time for each build step along with warnings and errors should be printed to this file.
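As a rough illustration of an append-only log that stays easy to search, the sketch below writes one header line per build step with its start time and run time, followed by any warnings or errors. The record format and the function names append_build_record and find_latest_record are assumptions made for this example, not the format gslab_scons uses.

```python
import time


def append_build_record(log_path, step_name, start_time, end_time, messages=()):
    """Append one build step's record to the log without overwriting earlier runs."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start_time))
    with open(log_path, "a") as log:  # "a" appends, so old records are kept
        log.write("=== build: %s | started: %s | run time: %.2f s ===\n"
                  % (step_name, stamp, end_time - start_time))
        for message in messages:  # warnings and errors for this step
            log.write(message + "\n")


def find_latest_record(log_path, step_name):
    """Return the header line of the most recent record for step_name, or None."""
    marker = "=== build: %s |" % step_name
    latest = None
    with open(log_path) as log:
        for line in log:
            if line.startswith(marker):
                latest = line.rstrip("\n")  # later records overwrite earlier ones
    return latest
```

Because every record starts with the same fixed prefix, the same search can also be done from the command line with grep.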
As discussed here, PyPI should be updated after gslab-econ/admin#67 has closed.
move everything to the gslab_scons library with unit tests, the options Sconscript file, and also flag everyone to review before PyPI push.
Will start after sprint and gslab-econ/admin#67 has closed
EDIT: I (LB) think this task should include updating the large_template to use the gslab_scons library from PyPI (or a follow up task should be created to implement this).
Do the issue_size_warnings in scons_debrief only get printed to the state_of_repo.log? If so, I think we might want to remove them. I think there are two issues with the current setup:
(1) I think this line causes issues on a repo that has been cloned, but doesn't have any version history yet.
(2) I get errors if I don't have /release/ as a folder. Though I don't understand why not having /raw/ doesn't cause similar issues (see here).
Perhaps I'm missing something.
I'm getting:
scons: building terminated because of errors.
fatal: your current branch 'master' does not have any commits yet
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/leviboxell/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/misc.py", line 26, in scons_debrief
    issue_size_warnings(look_in, file_MB_limit, total_MB_limit)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 17, in issue_size_warnings
    versioned = create_size_dictionary(look_in)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 121, in create_size_dictionary
    raise ReleaseError("The path argument does not specify an "
ReleaseError: The path argument does not specify an existing directory.
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/Users/leviboxell/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/misc.py", line 26, in scons_debrief
    issue_size_warnings(look_in, file_MB_limit, total_MB_limit)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 17, in issue_size_warnings
    versioned = create_size_dictionary(look_in)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 121, in create_size_dictionary
    raise ReleaseError("The path argument does not specify an "
gslab_scons._exception_classes.ReleaseError: The path argument does not specify an existing directory.
Following the python template suggestion from #24, update the tests in line with the requirements below:
If our use of unittests is out of line with standard practices, there are good reasons to switch, and switching is low cost, propose a new solution and implement after confirming with rest of team. [Switching for some modules only may be an option. I.e., we probably don't want to re-write gslab_make tests.]
Let's get rid of make.py here. Perhaps we can just execute tests directly as python run_all_tests.py > ... or similar? [We probably want to be able to just specify python run_all_tests.py and the logging gets taken care of within run_all_tests.py]
Would be nice to print test results to stdout as well as the log file so the user can see what's going on and know whether tests passed without opening the log.
/tests/test/ seems like bad naming? Or is this standard? We should do whatever people do in real Python libraries.
When I run tests in gslab_scons everything seems to pass OK, but the last part of the log file is a bunch of errors. Maybe these are errors from tests that were supposed to produce errors? In any case, we should find a way to suppress this output because it makes it unclear whether the tests ran as expected or not.
When I run the tests in gslab_make they hang (or appear to hang) on test_zip. All tests should be written so they take no more than a few seconds to run.
@M-R-Sullivan,
We never updated our PyPI package after https://github.com/gslab-econ/admin/issues/61. Could you do this?
@paultfriedrich1, @michaelstepner, and Ande uncovered a bug in the current PyPI version of gslab_tools.
When more than one make.py runs at the same time, they may open, write to, read from, and remove the same temporary text file at the same time. An issue arises when one make.py removes the temporary text file while another is still trying to write to or read from it.
This bug was introduced in the switch from SVN to git and PyPI. On SVN, each directory pulls down its own copy of the packages in gslab_tools and stores them locally. The referencing in runprogramdirective.py takes advantage of this modularity and stores the temporary text file among its own packages. With PyPI, all the packages are stored in a single central location. So when runprogramdirective.py references its own packages, it's really referencing the central packages shared by all the other repositories. When more than one repository runs at a time, each one can end up depending on a single file that is edited and removed by a different repository.
To solve the bug, each repository should create and store its own local version of the temporary text file. This will store the temporary text file in the same way as the SVN version of gslab_tools.
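One possible way to give each run its own temporary file is to create it with a unique name inside a directory local to the repository. This is only a sketch under that assumption; the helper name make_local_temp_file and its default prefix are hypothetical, it is not the fix that was adopted in runprogramdirective.py, and it assumes Python 3 (for the exist_ok argument).

```python
import os
import tempfile


def make_local_temp_file(local_dir, prefix="make_tmp_", suffix=".txt"):
    """Create a uniquely named, empty temporary text file inside local_dir.

    Each call returns a different path, so two concurrent runs never
    write to or remove each other's file.
    """
    os.makedirs(local_dir, exist_ok=True)
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix,
                                dir=local_dir, text=True)
    os.close(fd)  # the caller reopens the file by path when needed
    return path
```

The caller remains responsible for removing the file when it is done, for example with os.remove(path).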
See gslab-econ/template#19 (comment) and http://www.scons.org/doc/2.0.1/HTML/scons-user/c3833.html. Should be implemented in gslab_scons.
At the last meeting we agreed that we should allow users to specify a release directory that is not Google Drive. My sense is that we can do that by changing release.py line 35
USER = os.environ['USER']
if branch == 'master':
    name = repo
    branch = ''
else:
    name = "%s-%s" % (repo, branch)
local_release = '/Users/%s/Google Drive/release/%s/' % (USER, name)
local_release = local_release + version + '/'
where we move USER and local_release to user_config.yaml.
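A sketch of how the path could be built once the release root comes from configuration rather than being hard-coded. It assumes the root directory has already been read from user_config.yaml (the parsing step is omitted); the function name get_local_release and the release_root parameter are hypothetical.

```python
import os


def get_local_release(release_root, repo, branch, version):
    """Build the local release directory under a user-specified root.

    Mirrors the naming above: 'repo' on master, 'repo-branch' otherwise.
    """
    if branch == "master":
        name = repo
    else:
        name = "%s-%s" % (repo, branch)
    # The trailing "" makes os.path.join end the path with a separator.
    return os.path.join(release_root, name, version, "")
```

With release_root set to '/Users/me/Dropbox/release', repo 'template', branch 'master', and version 'v1.0', this gives '/Users/me/Dropbox/release/template/v1.0/' on macOS or Linux.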
@lboxell, I think you're the most familiar with this code out of current lab members and are least likely to unintentionally break something while working with it. Do you want to make this change? I can do it as well. Thanks!
Remove extra layer so pypi recognizes it.
@lboxell @arosenbe @M-R-Sullivan @stanfordquan it looks like gslab-python depends on a package called "requests" that is not automatically installed by setup.py.
Is this intended behavior?
@lboxell @arosenbe @M-R-Sullivan @stanfordquan my reading is that the template depends on gslab-python but that, since we deprecated support for PyPI, gslab-python's releases are no longer being made public.
Is that correct?
If so let's use this issue to discuss how best to publicize gslab-python. One option would be to make the whole repo public. Another would be to publicize only the releases. From this post it looks like there is no direct way to publicize a release from a private repo but there are some workarounds.
Follow-up from #37: a number of tests were found to be incompatible with Windows machines. This task fixes that.
Particularly python setup.py install clean. May have to look into the directory where pip install goes. Limited within Python distribution.
Create gslab_scons builders for SVN export and gdrive downloads.
This task should involve updating the template appropriately (or creating a new task to do this) with our new build/externals protocol.
It might be best to wait on this task until https://github.com/gslab-econ/admin/issues/78 is finished.
For after 4.0 is released.
It seems that the cache reverts to single alphanumeric indexing of the cache instead of double alphanumeric indexing quite frequently, which causes scons to tell you to upgrade your cache. I've had trouble following the scons instructions directly to get the upgrade to work, but the code below should work (follow the instructions within it).
We should think about incorporating this into our gslab_python codebase as a function. Then have a call in Template so that if mode == cache and the user's cache is out-of-date, it prompts the user whether they would like to run this script.
Running the following code prints nothing to the terminal on Win.
import gslab_scons as gs
gs.start_log()
while True:
    print('printing until time ends')
prints nothing to terminal, only log file.
We probably don't want to redirect Scons output entirely but duplicate it, since otherwise this makes repo bugs difficult to track. Especially important is the fact that this method prevents user input even when cmd asks for it, since the prompt is redirected to a different file.
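One common way to duplicate output rather than redirect it is a small "tee" object that forwards every write to several streams. The class below is a sketch, not the current start_log implementation, and the name Tee is hypothetical. It only covers text written through Python's sys.stdout; output that child processes write directly to the terminal would need separate handling.

```python
import sys


class Tee(object):
    """File-like object that forwards writes to every stream it wraps."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


# Possible use (assumed log file name), left commented out so that
# importing this sketch has no side effects:
# log_file = open("sconstruct.log", "w")
# sys.stdout = Tee(sys.__stdout__, log_file)
```

Because the terminal stream still receives everything, prompts printed through sys.stdout remain visible to the user.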
The notes on release.py have fallen out of sync with the code. (They still mention Google Drive!). The goal of this task is to bring them up to date.
Currently, tablefill isn't compatible with .tex. The goal of this task is to make it so.
See #58 (comment)
See #58 (comment)
See https://github.com/gslab-econ/admin/issues/87#issuecomment-275005375. We'll want a dropbox module that can upload and download both files and folders (recursively) from a specific revision.
We'll then want to use these functions in gslab_scons for release and getting raw/externals, along with the ability to easily upload files to dropbox from command line.
See gslab-econ/template#19 (comment) and gslab-econ/template#19 (comment). Should live in gslab_python and, by default, run after every scons run. Should be printed to the sconstruct (or its own) log file.
There should be some safety checks to ensure it doesn't take forever to run through millions of files and shouldn't do any computational steps (i.e., compute hashes).
In #36 (specifically, in this comment), we noted that the unit tests for gslab_python's gslab_scons package should be reviewed and improved. The goal of this issue is to do so.
The tests currently take over half a minute to run. We should probably find a way to make them run faster.
Suppose we have
source/analysis/dofile.do
source/analysis/mfile.m
writing to build/analysis/
Then they will overwrite sconscript.log under the current implementation.
We want them to write to
build/analysis/sconscript_dofile_do.log
build/analysis/sconscript_mfile_m.log
respectively.
This should also change log.py, particularly the os.walk step, to add a wildcard for whatever follows sconscript in sconscript.log
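The naming rule and the wildcard match can be sketched as follows. Both function names are hypothetical, and the pattern 'sconscript*.log' is an assumption about what the os.walk step would look for.

```python
import fnmatch
import os


def log_name_for_source(source_file):
    """Map 'source/analysis/dofile.do' to 'sconscript_dofile_do.log'."""
    base = os.path.basename(str(source_file))
    stem, ext = os.path.splitext(base)
    return "sconscript_%s_%s.log" % (stem, ext.lstrip("."))


def is_sconscript_log(filename):
    """True for 'sconscript.log' and for any 'sconscript_<...>.log'."""
    return fnmatch.fnmatch(filename, "sconscript*.log")
```

In the os.walk loop, the check f.endswith("sconscript.log") would then become is_sconscript_log(f), so both the old and the new log names are collected.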
This task should write unit tests for the methods in the gencat abstract class. The goal is to bring the module up to version 1.0.0.
start_make_log is causing issues when checking for SVN info.
See gslab-econ/template#35 (comment). Fix both the gslab_python and template side of things.
In congress_text, we pass congress and years to python via the command line. This prevents us from using the current implementation of build_python. We should modify the builder tools to allow command line arguments to be passed.
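A sketch of the underlying idea: build the command as a list so that each argument is passed through unchanged. The function name run_python_with_args and the cl_args parameter are hypothetical; how the arguments would reach build_python from the SConscript (for example through the builder's environment) is left open here.

```python
import subprocess
import sys


def run_python_with_args(script, cl_args=()):
    """Run `script` with the current interpreter, passing cl_args along.

    Returns the script's standard output as text and raises
    subprocess.CalledProcessError if the script exits with an error.
    """
    command = [sys.executable, script] + [str(arg) for arg in cl_args]
    return subprocess.check_output(command, universal_newlines=True)
```

Inside the script, the values are then available as sys.argv[1], sys.argv[2], and so on.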
This task will:
Create variant of our file size checker used for release, to prevent commits of large files when git-lfs is turned off.
Ensure that .gitattributes stays stable even when git-lfs is turned off.
The builder for tablefill doesn't use any of the nice machinery we built for the other builders. We thought it might be a bit complicated to update it, but then we realized there is a hackish way to get around this without having to touch legacy code (i.e. tablefill.py).
Basically we'll write another wrapper script for tablefill.py with a __main__ function, and then call it from subprocess as you would Matlab/R/Stata and the like. Then the logging machinery works normally and you can produce a Sconscript.log
from xx import *; switch to import xx or from xx import specific-commands
__init__.py scripts to import the library's constituent functions and hence avoid having to import each script or function manually if this conforms with best practices.
__init__.py. But I could be wrong.
build.py into some other files, e.g. build_lyx.py, etc., stored in a subdirectory. If the way we're doing it is python SOP then it's fine with me, but it seems like as we add builders it is going to be a nuisance to scan build.py and find what we want.
The goal of this issue is to discuss ways to integrate SQL queries executed against an external database into the SCons build framework.
See comment below.
As far as I know we don't yet have a matlab builder for scons. I wrote a rudimentary one for my course.
It is attached here: build_matlab.txt
The purpose of this issue is to improve it and fold it into our library.
I've assigned to everyone but feel free to shuffle around as you like.
Code uses default path to a StataMP executable for mac or windows.
User is allowed to specify the path to the executable in config_user.yaml.
If neither path works and Stata dependencies are enabled, raise informative error.
Our release.py script doesn't ignore non-versioned files when issuing size warnings if they are not explicitly listed in the .gitignore. The goal of this issue is to edit the script so that it ignores files in directories listed in the .gitignore and whose paths are captured by glob patterns in the .gitignore.
Also, from the outline:
It also would be nice if it recognized that the files were already versioned, and not ask about those, but only ask about new versioned files (need to think about this part some more though).
The current Python Builder will raise a custom error and message when a python process exits on error, but it doesn't print the traceback. The goal of this task is to get the Python Builder to print both our custom error and the traceback when a python process exits on error.
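One way to get both pieces is to capture the child process's standard error, where Python writes the traceback, and include it in the custom error. This is a sketch only; PythonBuildError and run_python_script are hypothetical names, not the classes or functions used by the existing builder.

```python
import subprocess
import sys


class PythonBuildError(Exception):
    """Hypothetical custom error for a Python script that exits on error."""


def run_python_script(script):
    """Run `script`; on failure raise an error that includes its traceback."""
    process = subprocess.Popen([sys.executable, script],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    out, err = process.communicate()
    if process.returncode != 0:
        # `err` holds whatever the script wrote to stderr, which
        # includes the traceback of an uncaught exception.
        raise PythonBuildError(
            "Python script %s exited with an error.\n%s" % (script, err))
    return out
```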
Follow up to gslab-econ/template#16. Per https://github.com/gslab-econ/admin/issues/67#issuecomment-243509241 and gslab-econ/template#7 (comment), we would like to have everyone review the gslab_scons module as time permits.
After gslab-econ/template#16, everyone should be able to download the gslab_scons module as a part of the gslab_tools package on pypi and use gslab_scons module within the template repo.
Probably low priority task relative to finalizing TRANS and congress_text, and @gentzkow can decide how he would like to aggregate comments (either discussion in person or via GitHub).
Also, flagging @jmshapir because he should be able to begin playing around with the module at this point and add any other comments he wants to regarding the code.
Evaluation can be done in parallel with gslab-econ/template#19.
Follow up to #8.
@huntallcott and I found that the build_stata() builder wouldn't work on his Windows computer. It seems like the error arises from platform not being defined in its script. The goal of this issue is to fix the builder so that it runs on Windows computers.
Add 'import os' to release.py in gslab_scons.