
Comments (9)

knmnyn avatar knmnyn commented on July 26, 2024

I've just tried to correct errors in W17-35 (INLG 2017), which requires re-ingesting all of the works from W17. This brought up problems with certain consistency checks that seem to have been added recently.

aclanthology@aclanthology:~/acl-anthology/public/pdf/W$ rake import:xml[true,"W17"]
(in /home/aclanthology/acl-anthology)
Seeding individual volume: W17.
PG::ForeignKeyViolation: ERROR: update or delete on table "papers" violates foreign key constraint "papers_people_paper_id_fkey" on table "papers_people"
DETAIL: Key (id)=(52374) is still referenced from table "papers_people".
: DELETE FROM papers WHERE volume_id IN (SELECT id FROM volumes WHERE anthology_id LIKE 'W17%');
rake aborted!
ActiveRecord::InvalidForeignKey: PG::ForeignKeyViolation: ERROR: update or delete on table "papers" violates foreign key constraint "papers_people_paper_id_fkey" on table "papers_people"
DETAIL: Key (id)=(52374) is still referenced from table "papers_people".
: DELETE FROM papers WHERE volume_id IN (SELECT id FROM volumes WHERE anthology_id LIKE 'W17%');
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:128:in `exec'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:128:in `block in execute'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:435:in `block in log'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activesupport-4.0.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:430:in `log'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:127:in `execute'
/home/aclanthology/acl-anthology/lib/tasks/xml_import.rake:330:in `block (2 levels) in <top (required)>'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/bin/ruby_executable_hooks:15:in `eval'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => import:xml
(See full trace by running task with --trace)

from acl-anthology.

knmnyn avatar knmnyn commented on July 26, 2024

We need to fix this soon: re-ingesting a volume (say, the workshops from W17) worked as recently as a month ago, and without it we can't ingest new proceedings or update existing ones.

@villalbamartin can you try ingesting the proceedings in the Saarlands VM at import/W17.xml?


villalbamartin avatar villalbamartin commented on July 26, 2024

I'm on it. While it's not nice that something broke, we were expecting something like this to happen with the new checks.


knmnyn avatar knmnyn commented on July 26, 2024


villalbamartin avatar villalbamartin commented on July 26, 2024

Documenting this bug before I attempt to squash it.

The error is triggered by this line, which deletes all papers (if any) that belong to the volume that is about to be imported. Those papers either do not exist, in which case nothing is deleted, or they are re-inserted shortly afterwards.

The problem is the table papers_people, which relates papers and authors. This table remains untouched, leading to the following sequence of events:

  • All papers from the given volume are deleted. Even if they are re-inserted, their new ID means that they are effectively new papers.
  • The papers_people table now contains references to deleted papers belonging to some authors.
  • At some point, someone performs a search that involves one of those same authors.
  • The search will then try to calculate how many papers that author has written. This process will fail for deleted papers (because the entry in the papers table is missing), the whole search will then fail, and you'll get the error we've seen recently.
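The sequence above can be reproduced on a toy schema. This is only a sketch: it uses an in-memory SQLite database through Python's sqlite3 module instead of the real PostgreSQL setup, the three-table schema is a guess reduced to the columns named in the error message, and the person ID is a made-up value. With foreign keys enforced, the DELETE from the report is rejected; without enforcement, it succeeds and leaves an orphaned row in papers_people.

```python
import sqlite3

# The statement from the error report, unchanged.
DELETE_PAPERS = (
    "DELETE FROM papers WHERE volume_id IN "
    "(SELECT id FROM volumes WHERE anthology_id LIKE 'W17%')"
)


def build_db(enforce_fk):
    """Create a toy database; the schema and the person ID are assumptions."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce_fk else "OFF"))
    conn.executescript(
        """
        CREATE TABLE volumes (id INTEGER PRIMARY KEY, anthology_id TEXT);
        CREATE TABLE papers (id INTEGER PRIMARY KEY,
                             volume_id INTEGER REFERENCES volumes(id));
        CREATE TABLE papers_people (paper_id INTEGER REFERENCES papers(id),
                                    person_id INTEGER);
        INSERT INTO volumes VALUES (1, 'W17-35');
        INSERT INTO papers VALUES (52374, 1);
        INSERT INTO papers_people VALUES (52374, 7);
        """
    )
    return conn


def count_orphans(conn):
    """Count papers_people rows whose paper no longer exists."""
    row = conn.execute(
        "SELECT COUNT(*) FROM papers_people pp "
        "LEFT JOIN papers p ON p.id = pp.paper_id "
        "WHERE p.id IS NULL"
    ).fetchone()
    return row[0]


def delete_is_rejected(conn):
    """Return True if the database refuses the DELETE from the report."""
    try:
        conn.execute(DELETE_PAPERS)
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling delete_is_rejected(build_db(enforce_fk=True)) mirrors the PG::ForeignKeyViolation in the log, while running the same DELETE on build_db(enforce_fk=False) goes through silently and count_orphans then reports the dangling author link.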

These are the two steps I'll take to correct this:

  • Delete the corresponding entries from papers_people before deleting the papers they reference.
  • Make the entire ingestion process a transaction (if it isn't already), to avoid future inconsistencies.
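A minimal sketch of those two steps, under the same caveats: the actual change to xml_import.rake is not shown in this thread, so this is not the real fix. It again uses SQLite via Python's sqlite3 module in place of PostgreSQL, and the schema, the sample rows, and the helper names make_demo_db and clear_volume are all hypothetical.

```python
import sqlite3

# Hypothetical schema and rows; only the table names, the anthology_id and
# volume_id columns, and paper ID 52374 come from the error report.
SCHEMA = """
CREATE TABLE volumes (id INTEGER PRIMARY KEY, anthology_id TEXT);
CREATE TABLE papers (id INTEGER PRIMARY KEY,
                     volume_id INTEGER REFERENCES volumes(id));
CREATE TABLE papers_people (paper_id INTEGER REFERENCES papers(id),
                            person_id INTEGER);
INSERT INTO volumes VALUES (1, 'W17-35'), (2, 'P16-1');
INSERT INTO papers VALUES (52374, 1), (40000, 2);
INSERT INTO papers_people VALUES (52374, 7), (40000, 7);
"""


def make_demo_db():
    """Build an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def clear_volume(conn, prefix):
    """Remove a volume's papers, deleting the author links first."""
    pattern = prefix + "%"
    volume_ids = "SELECT id FROM volumes WHERE anthology_id LIKE ?"
    # 'with conn' commits if both statements succeed and rolls back if
    # either raises, so the two deletes take effect together or not at all.
    with conn:
        # Step 1: drop the join rows that point at the doomed papers.
        conn.execute(
            "DELETE FROM papers_people WHERE paper_id IN "
            "(SELECT id FROM papers WHERE volume_id IN (" + volume_ids + "))",
            (pattern,),
        )
        # Step 2: the papers are now unreferenced and can be deleted.
        conn.execute(
            "DELETE FROM papers WHERE volume_id IN (" + volume_ids + ")",
            (pattern,),
        )
```

Calling clear_volume(make_demo_db(), "W17") removes paper 52374 and its author link while leaving the rows of the other volume untouched. In the real task, the re-insertion of the volume's papers would presumably sit inside the same transaction.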


villalbamartin avatar villalbamartin commented on July 26, 2024

I've now added a line to delete the proper records, and ran rake import:xml[true,"W17"] without error. @knmnyn, could you confirm that things are working as expected? Note that I didn't run the entire ingestion pipeline, only the line that you mentioned in your report.


knmnyn avatar knmnyn commented on July 26, 2024

@villalbamartin Thanks, that looks like it worked!


villalbamartin avatar villalbamartin commented on July 26, 2024

Update: the consistency check disappeared when we recreated the database at some point, and yet the modification we made to xml_import.rake seems to have solved the problem, so I don't see the need to add an extra database constraint that we apparently don't need. So even though the fix is real and still in place, the database consistency check is no longer there.


knmnyn avatar knmnyn commented on July 26, 2024

Ok. Thanks, @villalbamartin !

