
Comments (3)

dpanech commented on May 18, 2024

Here's a patch that fixes some of these problems:

  • sets a "success" flag on each loader object when its thread terminates; the flag is false whenever an exception was caught in that thread
  • the main function checks the success flag of every thread and returns 0 (all threads finished successfully) or 1 (at least one thread had an exception, or some other exception occurred in the main thread).
  • The script no longer returns the result of "print_summary" to the OS. I'm not sure what that's supposed to do, but it seems wrong. Note that only values 0..127 are portable for sys.exit return codes, with 0 meaning "success". Without this it's impossible to tell whether pgloader succeeded or not after running it from, say, another script.
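
As a rough sketch of the pattern described in the list above: each thread records whether it finished cleanly, and the main function turns those flags into 0 or 1. This is written in Python 3 rather than the Python 2 used by the patch, and the names `Loader`, `run_all`, `ok` and `boom` are hypothetical, not pgloader's.

```python
import threading


class Loader(threading.Thread):
    """Hypothetical stand-in for a loader thread (not pgloader's class)."""

    def __init__(self, name, work):
        super().__init__(name=name)
        self.work = work
        self.success = False  # becomes True only if work() completes
        self.error = None

    def run(self):
        try:
            self.work()
        except Exception as e:
            # An exception raised in a thread never reaches the main
            # thread, so remember the failure on the object instead.
            self.error = e
            return
        self.success = True


def run_all(loaders):
    """Start every loader, wait for all of them, return 0 or 1."""
    for loader in loaders:
        loader.start()
    for loader in loaders:
        loader.join()
    return 0 if all(loader.success for loader in loaders) else 1


def ok():
    """Hypothetical workload that succeeds."""


def boom():
    """Hypothetical workload that fails."""
    raise RuntimeError("simulated failure")
```

A real entry point would pass the value returned by `run_all` to `sys.exit`, so that a calling script can test the exit status.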

Also, it would be nice if pgloader returned some well-defined (and documented) error codes to the OS, so calling scripts can check them, maybe something like:

  • 0 -- success
  • 1 -- fatal error
  • 2 -- some records were rejected
  • ...
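
One way the codes listed above could be expressed, as a sketch only: the names `ExitCode` and `choose_exit_code` and the `rejected_rows` parameter are assumptions made for illustration, and only the three codes actually listed are covered.

```python
import enum


class ExitCode(enum.IntEnum):
    """Hypothetical exit codes, mirroring the list in this comment."""
    SUCCESS = 0
    FATAL_ERROR = 1
    RECORDS_REJECTED = 2


def choose_exit_code(fatal, rejected_rows):
    """Pick an exit code: a fatal error wins over rejected records."""
    if fatal:
        return ExitCode.FATAL_ERROR
    if rejected_rows > 0:
        return ExitCode.RECORDS_REJECTED
    return ExitCode.SUCCESS
```

The entry point would then end with something like `sys.exit(int(choose_exit_code(fatal, rejected_rows)))`, keeping every value well inside the 0..127 range.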

Note that I don't really "know" python, please double check these changes... they seem to work for me though.

    diff --git a/pgloader.py b/pgloader.py
    index 494e57d..d9e0f52 100755
    --- a/pgloader.py
    +++ b/pgloader.py
    @@ -752,23 +752,29 @@ def load_data():
         log.info("All threads are started, wait for them to terminate")
         check_events(finished, log, "processing is over")

    +    # check whether any thread failed
    +    for section, loader in threads.iteritems():
    +        if not loader.success:
    +            return 1
    +
         # total duration
         td = time.time() - begin
    -    retcode = 0

         if SUMMARY and not interrupted:
             try:
    -            retcode = print_summary(None, sections, summary, td)
    +            print_summary(None, sections, summary, td)
                 print
             except PGLoader_Error, e:
                 log.error("Can't print summary: %s" % e)
    +            return 1

             except KeyboardInterrupt:
    -            pass
    +            return 1

    -    return retcode
    +    return 0

     if __name__ == "__main__":
    +    ret = 1
         try:
             ret = load_data()
         except Exception, e:
    diff --git a/pgloader/pgloader.py b/pgloader/pgloader.py
    index 5b1becd..e585419 100644
    --- a/pgloader/pgloader.py
    +++ b/pgloader/pgloader.py
    @@ -826,36 +826,42 @@ class PGLoader(threading.Thread):
             self.sem.acquire()
             self.log.debug("%s acquired starting semaphore" % self.logname)

    -        # postinit (will call db.reset() which will get us connected)
    -        self.log.debug("%s postinit" % self.logname)
    -        self._postinit()
    -
             # tell parent thread we are running now
             self.started.set()
             self.init_time = time.time()        

    +        try:
    +            # postinit (will call db.reset() which will get us connected)
    +            self.log.debug("%s postinit" % self.logname)
    +            self._postinit()
    +
    +            # do the actual processing in do_run
    +            self.do_run()
    +            
    +        except Exception, e:
    +            self.log.error(e)
    +            self.terminate(False)
    +            return
    +
    +        self.terminate()
    +        return
    +
    +    def do_run(self):
    +
             # Announce the beginning of the work
             self.log.info("%s processing" % self.logname)

             if self.section_threads == 1:
    -            try:
    -                # when "No space left on device" where logs are sent,
    -                # we want to catch the exception
    -                if 'reader' in self.__dict__ and self.reader.start is not None:
    -                    self.log.debug("Loading from offset %d to %d" \
    -                                   %  (self.reader.start, self.reader.end))
    -
    -                self.prepare_processing()
    -                self.process()
    -                self.finish_processing()
    +            # when "No space left on device" where logs are sent,
    +            # we want to catch the exception
    +            if 'reader' in self.__dict__ and self.reader.start is not None:
    +                self.log.debug("Loading from offset %d to %d" \
    +                               %  (self.reader.start, self.reader.end))

    -            except Exception, e:
    -                # resources get freed in self.terminate()
    -                self.terminate()
    -                self.log.error(e)
    -                raise
    +            self.prepare_processing()
    +            self.process()
    +            self.finish_processing()

    -            self.terminate()
                 return

             # Mutli-Threaded processing of current section
    @@ -873,10 +879,9 @@ class PGLoader(threading.Thread):
                 # here we need a special thread reading the file
                 self.round_robin_read()

    -        self.terminate()
             return

    -    def terminate(self):
    +    def terminate(self, success = True):
             """ Announce it's over and free the concurrency control semaphore """

             # force PostgreSQL connection closing, do not wait for garbage
    @@ -898,6 +903,7 @@ class PGLoader(threading.Thread):
             except IOError, e:
                 pass

    +        self.success = success
             self.finished.set()
             return

from pgloader.

dpanech commented on May 18, 2024

BTW is there a mailing list? This issue tracker doesn't support attachments. I had to add a tab to every line in the above patch to prevent it from being re-formatted. Am I doing this wrong? I have never used github (or git or python for that matter) before today, sorry.

dimitri commented on May 18, 2024

The github way seems to be: fork the project, use git, push your patches on your fork, then create a pull request ticket, or just a ticket like this one; I can see the patches in the Fork Queue tab too.

I'd appreciate it if you could send me "real" patches, either this way or the plain git way (git send-email, then I git am -s).
