gslab_python's People

Contributors

arosenbe, davidritzwoller, lboxell, veli-m-andirin, yuchuan2016

gslab_python's Issues

Stata flavor

Gang: I just made a fresh copy of the [template|https://github.com/gslab-econ/template] on my local machine and got all targets built successfully! It took a little time to get the configuration right but all the dependencies were clearly documented. This is great.

The only issue I ran into was with Stata flavor. Even though I have a (windows) environment variable set, the executable-finder logic did not catch it. Has that been tested on windows?

In the meantime I am running with scons sf=StataMP-64.exe. Is there a way to set "sf" within the SConstruct so that I don't have to specify it at the command line?
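One possible way to avoid typing `sf=...` on every call is sketched below. It assumes the template's SConstruct reads the flavor from SCons's `ARGUMENTS` dictionary (the `key=value` pairs given on the command line); the helper name `resolve_stata_flavor` and the default value are hypothetical and not part of gslab_scons.

```python
def resolve_stata_flavor(arguments, default_flavor=None):
    """Return the Stata flavor given on the command line, else a default.

    `arguments` stands in for SCons's ARGUMENTS dictionary (an assumption
    about how the template receives `sf`).
    """
    flavor = arguments.get('sf', default_flavor)
    # Treat an empty value (e.g. `scons sf=`) the same as "not given".
    return flavor or default_flavor


# Inside an SConstruct this might be called as
#     sf = resolve_stata_flavor(ARGUMENTS, default_flavor='StataMP-64.exe')
print(resolve_stata_flavor({}, default_flavor='StataMP-64.exe'))
print(resolve_stata_flavor({'sf': 'StataSE-64.exe'},
                           default_flavor='StataMP-64.exe'))
```

A value passed on the command line still wins, so the default only applies when `sf` is omitted.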

Prepare for 4.0 release

This task will prepare a changelog for gslab_python and template 4.0 release. In particular:

  1. Document new & removed features, bugs fixed, and changes in existing functionality. Document where they happened.

  2. Document required changes to existing projects' templates due to backward incompatibility.

  3. Ping PIs for review.

Find way to speed up log.py

The following code snippet in log.py is quite slow when there's a very large build.

    for root, dirs, files in os.walk(parent_dir): # take a walk in parent and child dirs
        for f in files:
            print (root, dirs, f)
            if f.endswith("sconscript.log"):
                f_full = os.path.join(root, f)

My instinct is that there are two ways to speed this up: 1) search the targets in each sconscript for their directories, look only in those directories, and then use grep; or 2) use grep at the top directory.

I think this should be quick if there is indeed a faster way. If not, I think we may want to delay this beyond 4.0 and build in a mechanism that lets you skip searching for logs in certain directories, or skip logging in some runs.
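A minimal sketch of the "do not search certain directories" mechanism, assuming it is acceptable to prune directories by name. The function name `find_sconscript_logs` and the `skip_dirs` argument are hypothetical; the pruning itself relies on the documented behaviour of `os.walk` in top-down mode, where editing `dirs` in place stops the walk from descending into the removed entries. The sketch also leaves out the `print` call, which may itself be costly on a large tree.

```python
import os


def find_sconscript_logs(parent_dir, skip_dirs=()):
    """Collect paths of files ending in 'sconscript.log' under parent_dir.

    Directory names listed in skip_dirs are pruned, so os.walk never
    descends into them.
    """
    skip = set(skip_dirs)
    log_paths = []
    for root, dirs, files in os.walk(parent_dir):
        # In-place edit: os.walk (top-down) will not visit removed entries.
        dirs[:] = [d for d in dirs if d not in skip]
        for f in files:
            if f.endswith('sconscript.log'):
                log_paths.append(os.path.join(root, f))
    return sorted(log_paths)
```

For example, `find_sconscript_logs('.', skip_dirs=['raw', '.git'])` would skip any directory named `raw` or `.git` at every level of the tree.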

gslab_scons restructure

Follow package organization established in #24.

  • For check_code_extension(), why not just pass it the extension directly rather than the software name; so we could say in build.py:

    check_code_extension(source_file, '.do')

and in misc.py

    def check_code_extension(source_file, ext):
        source_file = str.lower(source_file)
        if not source_file.endswith(ext):
            raise BadExtensionError('First argument, ' + source_file + ', must be a ' + ext + ' file')
        return None

  • Having target_file in the code makes it look like we require there to be a single target file. Really, we're only using this to pick a directory for the log file. So I might drop the lines starting with target_file and target_dir and instead say

    log_dir = os.path.dirname(str(target[0]))

  • Should we have all builders return an informative error message for the case where the executable is not found / installed (so, e.g., if I call the Lyx builder and don't have lyx correctly on the path I get a nice error explaining what happened and suggesting that I make sure it's installed and on the path)? This is a very common case when people try to run our code, and it would be nice to handle it explicitly.

  • In build tables we can replace source[1:len(source)] with source[1:]

  • I think we should restructure build_stata. I would suggest we define two custom functions get_stata_executable() and get_stata_options() so that the main builder ends with

    executable = get_stata_executable(user_flavor)
    options = get_stata_options()

    os.system('%s %s' % (executable, options))

and is otherwise the same as the other builders. Then we could structure get_stata_executable() as roughly:

    def get_stata_executable(user_flavor):
        # Candidate executables to look for on the path
        exec_list = ['stata-mp', 'stata-mp.exe', 'stata-mp-64.exe', 'stata-se']  # ...etc...

        if user_flavor:
            executable = user_flavor
        elif sys.platform == 'win32' and os.environ.get('STATAEXE'):
            # .get() avoids a KeyError when STATAEXE is not set
            executable = os.environ['STATAEXE']
        else:
            executable = None
            for e in exec_list:
                if is_in_path(e):
                    executable = e
                    break  # stop at the first candidate found
        return executable

  • start_log should include:
    if mode not in ['develop', 'cache', 'release']:
        print("Error: %s is not a defined mode" % mode)
        sys.exit()

    if mode == 'release' and vers == '':
        print("Error: Version must be defined in release mode")
        sys.exit()

This was code that was living in large_template and missed getting integrated into the original gslab_scons release. This step should involve removing this code from the template.

  • Rethink how we are doing logging in a way that tracks build time and warnings across runs for each build step. See #16 for a discussion.
  • See if there is a way to write unit tests for release.py functions.
  • Include cache reconfiguration command/flag? (#16 (comment))
  • Think through using INPUT from gslab-econ/template#19 (comment).
    • If we use the INPUT suggestion, we should require that source have a single element, since these builders are designed to only handle a single source file?
  • Using a single log file (that replaces sconstruct.log and sconscript):

(1) We log only the builds scons runs, but we append rather than overwrite;
so the log accumulates a record of every run. One can easily search to find
the most recent run of a particular build.

It should be structured so that it is easy to search and filter to find a specific run. Run time for each build step along with warnings and errors should be printed to this file.
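The single append-only log described in the last bullet could be sketched roughly as follows. The function name `append_build_record` and the header layout are assumptions made for illustration; nothing here is existing gslab_scons code.

```python
import time
from datetime import datetime


def append_build_record(log_path, step_name, start, end, messages=()):
    """Append one build step's record to a shared log file.

    start and end are timestamps from time.time(); messages holds any
    warnings or errors collected for the step.
    """
    started = datetime.fromtimestamp(start).isoformat()
    with open(log_path, 'a') as log:  # 'a' appends instead of overwriting
        log.write('=== %s | started %s | %.2f seconds ===\n'
                  % (step_name, started, end - start))
        for message in messages:
            log.write('    %s\n' % message)
```

Because every record starts with a fixed-format header line, the most recent run of a given step can be found by searching for `=== step_name` and taking the last match.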

Follow-up to gslab-econ/admin#67: Update PyPI with gslab_scons library

As discussed here, PyPI should be updated after gslab-econ/admin#67 has closed.

Move everything to the gslab_scons library with unit tests and the options Sconscript file, and flag everyone to review before the PyPI push.

Will start after the sprint and gslab-econ/admin#67 have closed.

EDIT: I (LB) think this task should include updating the large_template to use the gslab_scons library from PyPI (or a follow up task should be created to implement this).

debrief error on new repo

@stanfordquan,

Do the issue_size_warnings in scons_debrief only get printed to the state_of_repo.log? If so, I think we might want to remove them. I think there are two issues with the current setup:
(1) I think this line causes issues on a repo that has been cloned, but doesn't have any version history yet.
(2) I get errors if I don't have /release/ as a folder. Though I don't understand why not having /raw/ doesn't cause similar issues (see here).

Perhaps I'm missing something.

I'm getting:

scons: building terminated because of errors.
fatal: your current branch 'master' does not have any commits yet
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Users/leviboxell/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/misc.py", line 26, in scons_debrief
    issue_size_warnings(look_in, file_MB_limit, total_MB_limit)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 17, in issue_size_warnings
    versioned = create_size_dictionary(look_in)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 121, in create_size_dictionary
    raise ReleaseError("The path argument does not specify an "
ReleaseError: The path argument does not specify an existing directory.
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/Users/leviboxell/anaconda/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/misc.py", line 26, in scons_debrief
    issue_size_warnings(look_in, file_MB_limit, total_MB_limit)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 17, in issue_size_warnings
    versioned = create_size_dictionary(look_in)
  File "/Users/leviboxell/anaconda/lib/python2.7/site-packages/gslab_scons/size_warning.py", line 121, in create_size_dictionary
    raise ReleaseError("The path argument does not specify an "
gslab_scons._exception_classes.ReleaseError: The path argument does not specify an existing directory.

Improve unit testing

Following the Python template suggestion from #24, update the tests in line with the requirements below:

  • If our use of unittests is out of line with standard practices, there are good reasons to switch, and switching is low cost, propose a new solution and implement after confirming with rest of team. [Switching for some modules only may be an option. I.e., we probably don't want to re-write gslab_make tests.]

  • Let's get rid of make.py here. Perhaps we can just execute tests directly as python run_all_tests.py > ... or similar? [We probably want to be able to just specify python run_all_tests.py and the logging gets taken care of within run_all_tests.py]

  • Would be nice to print test results to stdout as well as the log file so the user can see what's going on and know whether tests passed without opening the log.

  • /tests/test/ seems like bad naming? Or is this standard? We should do whatever people do in real Python libraries.

  • When I run tests in gslab_scons everything seems to pass OK, but the last part of the log file is a bunch of errors. Maybe these are errors from tests that were supposed to produce errors? In any case, we should find a way to suppress this output because it makes it unclear whether the tests ran as expected or not.

  • When I run the tests in gslab_make they hang (or appear to hang) on test_zip. All tests should be written so they take no more than a few seconds to run.
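As a rough sketch of the second and third bullets above (running tests directly and reporting to both stdout and a log file), a `run_all_tests.py` could capture the `unittest` report once and write it to both places. The function name `run_and_log` is hypothetical; `unittest.TextTestRunner` and its `stream` argument are standard library features.

```python
import io
import sys
import unittest


def run_and_log(suite, log_path):
    """Run a test suite, then send the same report to stdout and a log file."""
    buffer = io.StringIO()
    result = unittest.TextTestRunner(stream=buffer, verbosity=2).run(suite)
    report = buffer.getvalue()
    sys.stdout.write(report)          # let the user see results immediately
    with open(log_path, 'w') as log:  # keep a copy for later inspection
        log.write(report)
    return result.wasSuccessful()
```

A script could then build the suite with `unittest.defaultTestLoader.discover('tests')` (the directory name is an assumption) and exit with a nonzero status when `run_and_log` returns `False`.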

Change tempfile location in runprogramdirective.py

@paultfriedrich1, @michaelstepner, and Ande uncovered a bug in the current PyPI version of gslab_tools.

When more than one make.py runs at the same time, they may open, write to, read from, and remove the same temporary text file at the same time. An issue arises when one make.py removes the temporary text file while another is still trying to write to or read from it.

This bug was introduced in the switch from SVN to git and PyPI. On SVN, each directory pulls down its own copy of the packages in gslab_tools and stores them locally. The referencing in runprogramdirective.py takes advantage of this modularity and stores the temporary text file among its own packages. With PyPI, all the packages are stored in a single central location. So when runprogramdirective.py references its own packages, it's really referencing the central packages shared by all the other repositories. When more than one repository runs at a time, each one can end up depending on a single file that is edited and removed by a different repository.

To solve the bug, each repository should create and store its own local version of the temporary text file. This will store the temporary text file in the same way as the SVN version of gslab_tools.
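A minimal sketch of the proposed fix, assuming the temporary file can live in the directory that make.py runs from. The helper name `make_local_temp_file` and the file-name prefix are hypothetical. Using `tempfile.mkstemp` goes one step beyond the issue's proposal: it also gives each call a unique file name, so two runs in the same directory would not collide either.

```python
import os
import tempfile


def make_local_temp_file(directory='.', prefix='make_temp_', suffix='.txt'):
    """Create a uniquely named, empty temporary text file in `directory`.

    Returns the file's path; the caller is responsible for removing it.
    """
    fd, path = tempfile.mkstemp(suffix=suffix, prefix=prefix,
                                dir=directory, text=True)
    os.close(fd)  # only the path is needed; reopen it by name later
    return path
```

The caller would write to and read from the returned path, then call `os.remove(path)` when finished.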

Remove Google Drive as default drive for `release.py`

At the last meeting we agreed that we should allow users to specify a release directory that is not on Google Drive. My sense is that we can do that by changing

release.py line 35

    USER = os.environ['USER']
    if branch == 'master':
        name   = repo
        branch = ''
    else:
        name = "%s-%s" % (repo, branch)
    local_release = '/Users/%s/Google Drive/release/%s/' % (USER, name)
    local_release = local_release + version + '/'

where we move USER and local_release to user_config.yaml.
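One way this change might look, sketched under the assumption that the release root is read from the parsed user_config.yaml and passed in as a plain string. The function name `get_local_release` and the config key `release_directory` are hypothetical.

```python
import os


def get_local_release(repo, branch, version, release_root):
    """Build the local release path under a user-chosen root directory."""
    if branch == 'master':
        name = repo
    else:
        name = '%s-%s' % (repo, branch)
    # The trailing '' makes os.path.join end the path with a separator,
    # matching the trailing '/' in the current code.
    return os.path.join(release_root, name, version, '')


# Hypothetical usage, with `config` standing in for the parsed YAML file:
#     local_release = get_local_release(repo, branch, version,
#                                       config['release_directory'])
```

With this shape, Google Drive becomes just one possible value of the configured root rather than a hard-coded default.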

@lboxell, I think you're the most familiar with this code out of current lab members and are least likely to unintentionally break something while working with it. Do you want to make this change? I can do it as well. Thanks!

Publicizing releases

@lboxell @arosenbe @M-R-Sullivan @stanfordquan my reading is that the template depends on gslab-python but that, since we deprecated support for PyPI, gslab-python's releases are no longer being made public.

Is that correct?

If so let's use this issue to discuss how best to publicize gslab-python. One option would be to make the whole repo public. Another would be to publicize only the releases. From this post it looks like there is no direct way to publicize a release from a private repo but there are some workarounds.

Add reconfigure cache code

For after 4.0 is released.

It seems that the cache reverts to single alphanumeric indexing of the cache instead of double alphanumeric indexing quite frequently, which causes scons to tell you to upgrade your cache. I've had trouble following the scons instructions directly to get the upgrade to work, but the code below should work (follow the instructions within it).

We should think about incorporating this into our gslab_python codebase as a function. The template could then call it: if mode == cache and the user's cache is out of date, it prompts the user on whether they would like to run this script.

scons-configure-cache.txt

gslab_scons.start_log() prevent terminal output

Running the following code on Windows prints nothing to the terminal; the output goes only to the log file.

    import gslab_scons as gs

    gs.start_log()

    while True:
        print('printing until time ends')

We probably want to duplicate SCons output rather than redirect it entirely, since redirecting makes repo bugs difficult to track. More importantly, this method prevents user input even when cmd asks for it, because the prompt is redirected to a different file.
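Duplicating rather than redirecting could be sketched with a small "tee" object that forwards every write to several streams. The names `Tee` and `start_duplicated_log` are hypothetical, and this is not how gslab_scons.start_log() is currently written. One caveat: replacing `sys.stdout` only affects output written through Python, so programs launched as subprocesses would still need their output captured separately.

```python
import sys


class Tee(object):
    """File-like object that forwards writes to every stream it wraps."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


def start_duplicated_log(log_path):
    """Send stdout to both the terminal and a log file; return the log file."""
    log_file = open(log_path, 'w')
    sys.stdout = Tee(sys.stdout, log_file)
    return log_file
```

Because the terminal stays one of the wrapped streams, prompts printed before a request for input remain visible to the user.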

Review and improve gslab_scons unit tests

In #36 (specifically, in this comment), we noted that the unit tests for gslab_python's gslab_scons package should be reviewed and improved. The goal of this issue is to do so.

The tests currently take over half a minute to run. We should probably find a way to make them run faster.

Replace <*> with <source file name> in <sconscript_*.log>

Suppose we have

source/analysis/dofile.do
source/analysis/mfile.m

writing to build/analysis/

Then they will overwrite sconscript.log under the current implementation.

We want them to write to

build/analysis/sconscript_dofile_do.log
build/analysis/sconscript_mfile_m.log

respectively.

This also requires a change to log.py, particularly in the os.walk step, which should accept a wildcard for whatever follows sconscript in the log file name.
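The naming rule and the wildcard match can be sketched as two small helpers. The function names are hypothetical; the naming pattern follows the two example file names given above, and `fnmatch` is the standard library's shell-style matcher.

```python
import fnmatch
import os


def sconscript_log_name(source_file):
    """Map a source file to its log name, e.g. dofile.do -> sconscript_dofile_do.log."""
    stem, ext = os.path.splitext(os.path.basename(str(source_file)))
    parts = [stem] + ([ext[1:]] if ext else [])  # drop the leading '.'
    return 'sconscript_%s.log' % '_'.join(parts)


def is_sconscript_log(file_name):
    """True for sconscript.log and for any sconscript_<name>.log variant."""
    return fnmatch.fnmatch(file_name, 'sconscript*.log')


print(sconscript_log_name('source/analysis/dofile.do'))
print(sconscript_log_name('source/analysis/mfile.m'))
```

In log.py, the `f.endswith("sconscript.log")` test would then be replaced by a check along the lines of `is_sconscript_log(f)`.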

Write unit tests for gencat

This task should write unit tests for the methods in the gencat abstract class. The goal is to bring the module up to version 1.0.0.

Update tablefill builder with 4.0 logging machinery

The builder for tablefill doesn't use any of the nice machinery we built for the other builders. We thought it might be a bit complicated to update, but then we realized there is a hackish way around it that doesn't touch legacy code (i.e. tablefill.py).

Basically, we'll write a wrapper script for tablefill.py with a __main__ block and call it through subprocess, as we do for Matlab, R, Stata, and the like. Then the logging machinery works normally and can produce a Sconscript.log.

Reorganise repository following review

  • Restructure repo to allow for installation from pip via GitHub. Remove references to PyPI. This will most likely include making the repo public after checking with @gentzkow and @jmshapir to confirm there isn't anything here that we want to hide.
  • We should not use from xx import *; switch to import xx or from xx import specific-commands
  • We should write all readmes, documentation files, docstrings, etc. as if we were publishing code externally. So we should not use "us" or "we", should not write anything that assumes lab specific knowledge (e.g., of the way we used to write make.py scripts), and should not refer to lab-specific assets (like SVN repo) except when absolutely necessary. If we do refer to lab-specific assets, this should be set apart from the rest of the content and flagged as something like "lab-specific notes"
  • Top level readme should only contain general content; details that apply only to a specific package (e.g., gslab_make) should go in that package's documentation
  • Determine standard python package layout to use that mimics an official python library following best practices and confirm with team that everyone agrees with it (see #16 (comment) for a starting point). Then:
    • Restructure all packages to conform to this layout.
    • Use the __init__.py scripts to import the library's constituent functions and hence avoid having to import each script or function manually if this conforms with best practices.
    • We should follow whatever is the preferred scheme for documenting Python packages. Right now we have very detailed readmes for gslab_make and gslab_fill, but a very short readme for gslab_scons.py. My impression had been that we should instead be including documentation in docstrings, with documentation for the overall package going in the docstring for __init__.py. But I could be wrong.
    • Whatever we do, we should pare back the gslab_make documentation to the most important content and delete anything that is likely to become inconsistent or obsolete. E.g., we should not be listing what command line calls are run for run_sas, run_stata, etc. since this is redundant with the metadata. Better to include a pointer to the appropriate metadata file for details.
    • I wonder about separating the definitions in build.py into some other files, e.g. build_lyx.py, etc., stored in a subdirectory. If the way we're doing it is python SOP then it's fine with me, but it seems like as we add builders it is going to be a nuisance to scan build.py and find what we want.

Incorporate matlab builder

As far as I know we don't yet have a matlab builder for scons. I wrote a rudimentary one for my course.
It is attached here: build_matlab.txt

The purpose of this issue is to improve it and fold it into our library.

I've assigned to everyone but feel free to shuffle around as you like.

Change executable-finding behavior in Stata builder

Code uses default path to a StataMP executable for mac or windows.

User is allowed to specify the path to the executable in config_user.yaml.

If neither path works and Stata dependencies are enabled, raise informative error.
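The behaviour described above might be sketched as follows. The function and exception names are hypothetical, the default executable names are placeholders taken from elsewhere in these issues, and trying the user-specified executable first is an assumption. Note that `shutil.which` requires Python 3.3 or later.

```python
import shutil
import sys


class ExecutableNotFoundError(Exception):
    """Raised when no usable Stata executable can be located."""


def find_stata_executable(user_executable=None, defaults=None):
    """Return the first Stata executable that can be found, else raise."""
    if defaults is None:
        # Placeholder defaults per platform (assumed, not confirmed).
        defaults = {'darwin': 'stata-mp', 'win32': 'StataMP-64.exe'}
    candidates = [user_executable, defaults.get(sys.platform)]
    tried = [c for c in candidates if c]
    for candidate in tried:
        if shutil.which(candidate):  # accepts a name on the path or a full path
            return candidate
    raise ExecutableNotFoundError(
        'Could not find a Stata executable (tried: %s). Make sure Stata is '
        'installed and on the path, or specify the executable in '
        'config_user.yaml.' % (', '.join(tried) or 'nothing'))
```

The builder would call this only when Stata dependencies are enabled, so projects without Stata steps never hit the error.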

Fix gslab_scons.release.py's size warnings

Our release.py script doesn't ignore non-versioned files when issuing size warnings if they are not explicitly listed in the .gitignore. The goal of this issue is to edit the script so that it ignores files in directories listed in the .gitignore and whose paths are captured by glob patterns in the .gitignore.

Also, from the outline:

It also would be nice if it recognized that the files were already versioned, and not ask about those, but only ask about new versioned files (need to think about this part some more though).

Print traceback for python builder

The current Python Builder will raise a custom error and message when a python process exits on error, but it doesn't print the traceback. The goal of this task is to get the Python Builder to print both our custom error and the traceback when a python process exits on error.
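A sketch of one way to surface the traceback, assuming the builder launches the script as a subprocess. The names `PythonBuildError` and `run_python_script` are hypothetical stand-ins for the builder's own error class and entry point. The child's stderr is captured and appended to the custom error message.

```python
import subprocess
import sys


class PythonBuildError(Exception):
    """Stand-in for the builder's custom error."""


def run_python_script(script_path):
    """Run a Python script; on failure, raise with the child's traceback."""
    process = subprocess.Popen([sys.executable, script_path],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    out, err = process.communicate()
    if process.returncode != 0:
        raise PythonBuildError(
            'Python script `%s` exited with code %d. Traceback from the '
            'script:\n%s' % (script_path, process.returncode, err))
    return out
```

An alternative would be to leave stderr unredirected so the traceback prints as it happens, at the cost of not having it inside the error message or the log.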

Team review of gslab_scons module

Follow up to gslab-econ/template#16. Per https://github.com/gslab-econ/admin/issues/67#issuecomment-243509241 and gslab-econ/template#7 (comment), we would like to have everyone review the gslab_scons module as time permits.

After gslab-econ/template#16, everyone should be able to download the gslab_scons module as a part of the gslab_tools package on pypi and use gslab_scons module within the template repo.

Probably low priority task relative to finalizing TRANS and congress_text, and @gentzkow can decide how he would like to aggregate comments (either discussion in person or via GitHub).

Also, flagging @jmshapir because he should be able to begin playing around with the module at this point and add any other comments he wants to regarding the code.

Evaluation can be done in parallel with gslab-econ/template#19.

Fix bug in build_stata.py

@huntallcott and I found that the build_stata() builder wouldn't work on his Windows computer. It seems like the error arises from platform not being defined in its script. The goal of this issue is to fix the builder so that it runs on Windows computers.
