Comments (23)

chrisjbaik avatar chrisjbaik commented on August 27, 2024 4

Hi, thanks again for providing this valuable resource.

I wanted to report some errors in the dev set:

  1. NLQ: What is the name and capacity of the stadium with the most concerts after 2013? The SQL query uses >= 2014; while technically correct, I don't think it makes as much semantic sense as > 2013.
  2. NLQ: What is the smallest weight of the car produced with 8 cylinders on 1974? The SQL query incorrectly references Cylinders = 4.
  3. NLQ: What is the minimu weight of the car with 8 cylinders produced in 1974? Same as the previous one, and minimum is spelled incorrectly.
  4. NLQ: Among the cars that do not have the minimum horsepower, what are the make ids and names of al those with less than 4 cylinders? Similar to 1, it should be < 4 in the SQL, not <= 3.
  5. NLQ: List the name of teachers whose hometown is not "Little Lever Urban District". The SQL query condition is missing a t: little lever urban distric.
  6. NLQ: What are the names of the teachers whose hometown is not "Little Lever Urban District"? Same as above.
  7. NLQ: What is the mobile phone number of the student named Timothy Ward? The actual database has the value Timmothy, as does the SQL query. (This is debatably incorrect.)
  8. NLQ: What is the total population and average area of countries in the continent of North America whose area is bigger than 3000? The question mark symbol uses a strange character for this one.
  9. NLQ: Give the total population and average surface area corresponding to countries in Noth America that have a surface area greater than 3000. North America is spelled incorrectly.
  10. NLQ: Return the names of cities that have a population between 160000 and 900000. The query incorrectly uses 90000 (missing a 0).
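To illustrate why a one-character typo in a string literal (items 5 and 6) changes the result, here is a minimal sketch using an in-memory SQLite database. The table layout and the two rows are made up for illustration and are not taken from the Spider databases.

```python
import sqlite3

def names_not_from(conn, hometown):
    """Return teacher names whose hometown differs from the given literal."""
    rows = conn.execute(
        "SELECT name FROM teacher WHERE hometown != ? ORDER BY name",
        (hometown,),
    ).fetchall()
    return [r[0] for r in rows]

# Hypothetical table and rows, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (name TEXT, hometown TEXT)")
conn.executemany(
    "INSERT INTO teacher VALUES (?, ?)",
    [
        ("Teacher A", "Little Lever Urban District"),
        ("Teacher B", "Bolton County Borough"),
    ],
)

# With the missing "t", no row matches the literal, so nobody is excluded.
typo_result = names_not_from(conn, "Little Lever Urban Distric")
# With the correct literal, Teacher A is filtered out as intended.
fixed_result = names_not_from(conn, "Little Lever Urban District")

print(typo_result)
print(fixed_result)
```

The same reasoning applies to item 10: a literal of 90000 instead of 900000 is valid SQL, so the query runs without error and silently returns a different set of rows.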

from spider.

taoyds avatar taoyds commented on August 27, 2024 3
  1. SQL annotation errors: issue 23,
  2. Schema errors: issue 17, issue 20


taoyds avatar taoyds commented on August 27, 2024

@chrisjbaik Thanks a lot!


ygan avatar ygan commented on August 27, 2024

There is a bug in the players table of the wta_1 database.
The first line of the wta_1.sql file is corrupted. It reads:
"CRloser_rank_pointsEATE TABLE players("
If you run "select * from players", or any other query involving the players table, you will get an error.
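A minimal sketch of how this kind of corruption surfaces when the dump is loaded with Python's sqlite3 module. Only the garbled prefix comes from the report above; the single column in the statement is a placeholder, not the real players schema.

```python
import sqlite3

# The garbled prefix is quoted from the report; the column list is a placeholder.
CORRUPTED = 'CRloser_rank_pointsEATE TABLE players("player_id" INT PRIMARY KEY);'
REPAIRED = 'CREATE TABLE players("player_id" INT PRIMARY KEY);'

def loads_cleanly(ddl):
    """Run the DDL in a fresh in-memory database and try to query players."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        conn.execute("SELECT * FROM players")
        return True
    except sqlite3.Error:
        # A syntax error in the DDL means the table is never created.
        return False
    finally:
        conn.close()

corrupted_ok = loads_cleanly(CORRUPTED)
repaired_ok = loads_cleanly(REPAIRED)
print(corrupted_ok, repaired_ok)
```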


todpole3 avatar todpole3 commented on August 27, 2024

Do you plan to incorporate the annotation fixes in a future data release? If so, that would be great.

The dataset .zip file released on the website still contains the original errors.


taoyds avatar taoyds commented on August 27, 2024

Yes! We are planning to fix annotation errors and release Spider 2.0 by January 2020. Also, we will update the evaluation script. Thanks!


andrewbury avatar andrewbury commented on August 27, 2024

NLQ: "How many acting statuses are there?"
SQL: "SELECT count(DISTINCT temporary_acting) FROM management"

I feel like there is some ambiguity here: it could be the number of distinct statuses, or it could just be the total number.
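The two readings give different answers as soon as a status value repeats. A small sketch with made-up rows (the head_id column and the values are assumptions for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only temporary_acting appears in the gold SQL.
conn.execute("CREATE TABLE management (head_id INTEGER, temporary_acting TEXT)")
conn.executemany(
    "INSERT INTO management VALUES (?, ?)",
    [(1, "Yes"), (2, "No"), (3, "Yes")],
)

# Reading 1: how many different status values exist.
distinct_statuses = conn.execute(
    "SELECT count(DISTINCT temporary_acting) FROM management"
).fetchone()[0]

# Reading 2: how many status entries exist in total.
total_statuses = conn.execute(
    "SELECT count(temporary_acting) FROM management"
).fetchone()[0]

print(distinct_statuses, total_statuses)
```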


whwang299 avatar whwang299 commented on August 27, 2024

Hi!

I would like to report a small inconsistency in the db_id = formula_1 entry in tables.json.

The order of tables in the "table_names" and "table_names_original" fields differs, making it difficult to match them automatically.
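One rough way to spot such entries is to compare the two lists position by position after normalizing the names. The sketch below is a heuristic under stated assumptions: the sample entry and its table order are invented, and the normalization will also flag legitimate renames where the readable name differs from the original.

```python
def normalize(name):
    """Lowercase and drop underscores/spaces so 'Pit_Stops' matches 'pit stops'."""
    return name.lower().replace("_", "").replace(" ", "")

def misaligned_positions(entry):
    """Return indices where the readable and original table names disagree."""
    pairs = zip(entry["table_names"], entry["table_names_original"])
    return [i for i, (a, b) in enumerate(pairs) if normalize(a) != normalize(b)]

# Hypothetical entries; the real formula_1 order may differ.
shuffled_entry = {
    "db_id": "formula_1",
    "table_names": ["races", "drivers", "circuits"],
    "table_names_original": ["circuits", "races", "drivers"],
}
aligned_entry = {
    "db_id": "example_db",
    "table_names": ["pit stops", "drivers"],
    "table_names_original": ["Pit_Stops", "drivers"],
}

shuffled_hits = misaligned_positions(shuffled_entry)
aligned_hits = misaligned_positions(aligned_entry)
print(shuffled_hits, aligned_hits)
```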

Thanks!


longxudou avatar longxudou commented on August 27, 2024

Hi

In the dev dataset, I found the following:
QuestionA: What is the average and the maximum capacity of all stadiums ?
QuestionB: What is the average and maximum capacities for all stations ?
PS: The stadiums table has a column named "AVERAGE".

Both are annotated with the same SQL: SELECT avg(capacity) , max(capacity) FROM stadium.
I suppose that QuestionB is right, but QuestionA should be annotated as select Average , max ( Capacity ) from stadium
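A small sketch of how the two annotations differ when the table has its own "average" column. The rows are invented. Note that the second query mixes a bare column with an aggregate; SQLite accepts this and takes the bare column from the row holding the maximum, which other engines may reject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; only the column names follow the discussion above.
conn.execute("CREATE TABLE stadium (name TEXT, capacity INTEGER, average INTEGER)")
conn.executemany(
    "INSERT INTO stadium VALUES (?, ?, ?)",
    [("Stadium A", 1000, 500), ("Stadium B", 2000, 900)],
)

# Current annotation: average computed over the capacity column.
computed = conn.execute(
    "SELECT avg(capacity), max(capacity) FROM stadium"
).fetchone()

# Suggested annotation: the stored "average" column next to max(capacity).
# In SQLite the bare column comes from the row with the maximum capacity.
stored = conn.execute(
    "SELECT average, max(capacity) FROM stadium"
).fetchone()

print(computed)
print(stored)
```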

Thanks!


mnoukhov avatar mnoukhov commented on August 27, 2024

Adding issue #10 for the train_gold.sql file, which is still unfixed.

For the question "What is the description of the type of the company who concluded its contracts most recently?",
the correct SQL should be SELECT T1.company_type FROM Third_Party_Companies AS T1 JOIN Maintenance_Contracts AS T2 ON T1.company_id = T2.maintenance_contract_company_id ORDER BY T2.contract_end_date DESC LIMIT 1


taoyds avatar taoyds commented on August 27, 2024

Sorry for the delay! We finally corrected some annotation errors and label mismatches (not errors) in the Spider dev and test sets (~4% of dev examples updated, click here for more details). Thanks for your input!


CrafterKolyan avatar CrafterKolyan commented on August 27, 2024

Schema error: issue #53


CrafterKolyan avatar CrafterKolyan commented on August 27, 2024

@taoyds ygan already reported this problem on Apr 19, 2019. Please fix it.


CrafterKolyan avatar CrafterKolyan commented on August 27, 2024

Since I didn't want to wait another year for fixes, I've done everything myself.
Spider Dataset from 06/07/2020 with fixed wta_1: https://drive.google.com/file/d/1m68AHHPC4pqyjT-Zmt-u8TRqdw5vp-U5/view
Fixed evaluation.py, which is robust to decoding issues in the wta_1 database and gives 1.0 accuracy, recall, and F1 when dev_gold.sql is used as both the gold and the predicted labels: https://github.com/CrafterKolyan/spider-fixed/blob/master/evaluation.py
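For readers unfamiliar with this class of problem, the sketch below shows one common way to make reads tolerant of text that is not valid UTF-8, using the `text_factory` attribute of a sqlite3 connection. This is an illustration only, not necessarily what the linked script does; the table, column, and stored value are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (first_name TEXT)")
# Store bytes that are valid Latin-1 but invalid UTF-8 as a TEXT value.
conn.execute("INSERT INTO players VALUES (CAST(? AS TEXT))", (b"Fran\xe7ois",))

def read_names(connection):
    """Fetch all first names using the connection's current text_factory."""
    return [row[0] for row in connection.execute("SELECT first_name FROM players")]

# With the default text_factory (str), the invalid byte cannot be decoded.
try:
    read_names(conn)
    default_failed = False
except sqlite3.OperationalError:
    default_failed = True

# A lenient factory drops undecodable bytes instead of raising.
conn.text_factory = lambda raw: raw.decode("utf-8", errors="ignore")
lenient_names = read_names(conn)

print(default_failed, lenient_names)
```

Dropping bytes changes the stored values, so this trade-off is only acceptable when the affected strings do not take part in the comparison being evaluated.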


aosokin avatar aosokin commented on August 27, 2024

See #56


aosokin avatar aosokin commented on August 27, 2024

See #57


aosokin avatar aosokin commented on August 27, 2024

See #58


tomerwolgithub avatar tomerwolgithub commented on August 27, 2024

In the train set, the NL question:
What are the first names of the faculty members playing both Canoeing and Kayaking?

The gold SQL is incorrect:

SELECT T1.lname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T2.actid WHERE T3.activity_name = 'Canoeing' INTERSECT SELECT T1.lname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T2.actid WHERE T3.activity_name = 'Kayaking'

Should be:

SELECT T1.fname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T3.actid WHERE T3.activity_name = 'Canoeing' INTERSECT SELECT T1.fname FROM Faculty AS T1 JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID JOIN activity AS T3 ON T2.actid = T3.actid WHERE T3.activity_name = 'Kayaking'
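The join condition in the gold SQL compares T2.actid with itself, which is always true for non-null values and so pairs every participation row with every activity. A minimal sketch with invented rows shows the effect; it selects fname in both halves of the INTERSECT, since the question asks for first names.

```python
import sqlite3

QUERY = """
SELECT T1.fname FROM Faculty AS T1
JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID
JOIN activity AS T3 ON {join_cond}
WHERE T3.activity_name = 'Canoeing'
INTERSECT
SELECT T1.fname FROM Faculty AS T1
JOIN Faculty_participates_in AS T2 ON T1.facID = T2.facID
JOIN activity AS T3 ON {join_cond}
WHERE T3.activity_name = 'Kayaking'
"""

def first_names(conn, join_cond):
    """Run the query with the given activity join condition."""
    rows = conn.execute(QUERY.format(join_cond=join_cond)).fetchall()
    return sorted(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical data: Ann only canoes, Bob does both activities.
conn.executescript("""
CREATE TABLE Faculty (facID INTEGER, fname TEXT, lname TEXT);
CREATE TABLE Faculty_participates_in (facID INTEGER, actid INTEGER);
CREATE TABLE activity (actid INTEGER, activity_name TEXT);
INSERT INTO Faculty VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Kim');
INSERT INTO activity VALUES (1, 'Canoeing'), (2, 'Kayaking');
INSERT INTO Faculty_participates_in VALUES (1, 1), (2, 1), (2, 2);
""")

# Tautological condition: Ann is wrongly reported as doing both.
buggy = first_names(conn, "T2.actid = T2.actid")
# Proper join on the activity id: only Bob remains.
fixed = first_names(conn, "T2.actid = T3.actid")

print(buggy)
print(fixed)
```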


taoyds avatar taoyds commented on August 27, 2024

A few unexpected table names were found in tables.json, e.g. the sqlite_sequence table in the world_1 database.
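SQLite reserves the "sqlite_" prefix for its internal tables, and sqlite_sequence is created automatically once a table uses AUTOINCREMENT. A sketch of how a schema export could skip such tables; the city table here is invented and this is not the script used to build tables.json.

```python
import sqlite3

def all_tables(conn):
    """List every table recorded in sqlite_master, internal ones included."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(name for (name,) in rows)

def user_tables(conn):
    """List tables while skipping SQLite's internal 'sqlite_' tables."""
    return [name for name in all_tables(conn) if not name.startswith("sqlite_")]

conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite create the internal sqlite_sequence table.
conn.execute("CREATE TABLE city (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")

raw_listing = all_tables(conn)
clean_listing = user_tables(conn)
print(raw_listing)
print(clean_listing)
```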

Credit to Pedro.


ReinierKoops avatar ReinierKoops commented on August 27, 2024

See #70


alan-ai-learner avatar alan-ai-learner commented on August 27, 2024

#75 Can you please look into it?


alan-ai-learner avatar alan-ai-learner commented on August 27, 2024

#75 Can you please look into it?


ReinierKoops avatar ReinierKoops commented on August 27, 2024

In tables.json, "useracct" should be "user account" in table_names (not in table_names_original).

