Comments (9)
I've just tried to correct errors with W17-35 INLG 2017, which requires re-ingesting all of the works from W17. This brought up problems with certain consistency checks that seem to have been added recently.
aclanthology@aclanthology:~/acl-anthology/public/pdf/W$ rake import:xml[true,"W17"]
(in /home/aclanthology/acl-anthology)
Seeding individual volume: W17.
PG::ForeignKeyViolation: ERROR: update or delete on table "papers" violates foreign key constraint "papers_people_paper_id_fkey" on table "papers_people"
DETAIL: Key (id)=(52374) is still referenced from table "papers_people".
: DELETE FROM papers WHERE volume_id IN (SELECT id FROM volumes WHERE anthology_id LIKE 'W17%');
rake aborted!
ActiveRecord::InvalidForeignKey: PG::ForeignKeyViolation: ERROR: update or delete on table "papers" violates foreign key constraint "papers_people_paper_id_fkey" on table "papers_people"
DETAIL: Key (id)=(52374) is still referenced from table "papers_people".
: DELETE FROM papers WHERE volume_id IN (SELECT id FROM volumes WHERE anthology_id LIKE 'W17%');
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:128:in `exec'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:128:in `block in execute'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:435:in `block in log'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activesupport-4.0.1/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/abstract_adapter.rb:430:in `log'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/gems/activerecord-4.0.1/lib/active_record/connection_adapters/postgresql/database_statements.rb:127:in `execute'
/home/aclanthology/acl-anthology/lib/tasks/xml_import.rake:330:in `block (2 levels) in <top (required)>'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/bin/ruby_executable_hooks:15:in `eval'
/home/aclanthology/.rvm/gems/ruby-2.0.0-p353@acl/bin/ruby_executable_hooks:15:in `
Tasks: TOP => import:xml
(See full trace by running task with --trace)
from acl-anthology.
We need to fix this problem soon: without the ability to re-ingest a volume (say, the workshops from W17), which worked as recently as a month ago, we can't ingest new proceedings or update existing ones.
@villalbamartin can you try ingesting the proceedings in the Saarlands VM at import/W17.xml?
I'm on it. While it's not nice that something broke, we were expecting something like this to happen with the new checks.
Documenting this bug before I attempt to squash it.
The error is triggered by this line (xml_import.rake:330 in the trace above), which deletes all papers (if any) that belong to the volume that is about to be imported. Those papers either do not exist, in which case nothing is deleted, or they will be re-inserted shortly afterwards.
The problem is the table papers_people, which relates papers and authors. This table remains untouched, leading to the following sequence of events:
- All papers from the given volume are deleted. Even if they are re-inserted, their new ID means that they are effectively new papers.
- The papers_people table now contains references from some authors to papers that have been deleted.
- At some point, someone performs a search that involves one of those same authors.
- The search will then try to calculate how many papers that author has written. This process will fail for deleted papers (because the entry in the papers table is missing), the whole search will then fail, and you'll get the error we've seen recently.
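The sequence above can be reproduced in miniature. This is only a sketch: it uses Python's built-in sqlite3 module instead of PostgreSQL, a simplified two-table schema that is an assumption (the real tables have more columns), and a made-up new paper id (60001). It shows how rows left behind in papers_people can be found with a LEFT JOIN once the paper they point to is gone.

```python
import sqlite3

# Assumed, simplified schema; no foreign key is declared, which mirrors
# the situation before the consistency check existed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE papers_people (paper_id INTEGER, person_id INTEGER);
    INSERT INTO papers VALUES (52374, 'Some W17 paper');
    INSERT INTO papers_people VALUES (52374, 1);
""")

# Re-ingestion that leaves papers_people untouched: the paper is deleted
# and comes back under a new id (60001 is a hypothetical value).
conn.execute("DELETE FROM papers WHERE id = 52374")
conn.execute("INSERT INTO papers VALUES (60001, 'Some W17 paper')")

# Author links whose paper no longer exists.
orphans = conn.execute("""
    SELECT pp.paper_id, pp.person_id
    FROM papers_people AS pp
    LEFT JOIN papers AS p ON p.id = pp.paper_id
    WHERE p.id IS NULL
""").fetchall()
print(orphans)  # [(52374, 1)]
```

Any code that counts an author's papers by following papers_people would hit the dangling row for paper 52374.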
These are the two steps I'll take to correct this:
- Delete the proper entries from papers_people before deleting the referenced entries.
- Make the entire ingestion process a transaction (if it isn't already), to avoid future inconsistencies.
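A minimal sketch of those two steps, again using Python's sqlite3 as a stand-in for PostgreSQL. The schema is a simplified assumption, and delete_volume_papers is a hypothetical helper, not the code in xml_import.rake. With foreign keys enforced, deleting the papers first fails much like the reported error; deleting the papers_people rows first, inside one transaction, succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE volumes (id INTEGER PRIMARY KEY, anthology_id TEXT);
    CREATE TABLE papers (id INTEGER PRIMARY KEY,
                         volume_id INTEGER REFERENCES volumes(id));
    CREATE TABLE papers_people (paper_id INTEGER REFERENCES papers(id),
                                person_id INTEGER);
    INSERT INTO volumes VALUES (1, 'W17-35');
    INSERT INTO papers VALUES (52374, 1);
    INSERT INTO papers_people VALUES (52374, 1);
""")

VOLUMES = "SELECT id FROM volumes WHERE anthology_id LIKE ?"

# Deleting the parent rows first violates the foreign key.
try:
    conn.execute(f"DELETE FROM papers WHERE volume_id IN ({VOLUMES})", ("W17%",))
    parent_first_failed = False
except sqlite3.IntegrityError:
    conn.rollback()
    parent_first_failed = True

def delete_volume_papers(conn, prefix):
    """Hypothetical helper: remove author links, then papers, atomically."""
    pattern = prefix + "%"
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "DELETE FROM papers_people WHERE paper_id IN "
            f"(SELECT id FROM papers WHERE volume_id IN ({VOLUMES}))",
            (pattern,),
        )
        conn.execute(f"DELETE FROM papers WHERE volume_id IN ({VOLUMES})", (pattern,))

delete_volume_papers(conn, "W17")
remaining = (
    conn.execute("SELECT COUNT(*) FROM papers").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM papers_people").fetchone()[0],
)
print(parent_first_failed, remaining)  # True (0, 0)
```

Running both deletes in one transaction means a failure partway through leaves neither table half-cleaned.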
I've now added a line to delete the proper records, and ran rake import:xml[true,"W17"] without error. @knmnyn, could you confirm that things are working as expected? Note that I didn't run the entire ingestion pipeline, only the line that you mentioned in your report.
@villalbamartin Thanks, that looks like it worked!
Update: the consistency check disappeared when we recreated the database at some point, yet the modification we made to xml_import.rake seems to have solved the problem, so I don't see the need to add an extra constraint to the database that we apparently don't need. So even though the fix is real and still in place, the database consistency check is no longer there.
Ok. Thanks, @villalbamartin !