Git Product home page Git Product logo

Comments (8)

anthonysena avatar anthonysena commented on June 30, 2024

@TomWhite-MedStar - I'm going to close this for now so we can track this work on the SqlRender issue. If needed, we can transfer the SqlRender issue to this repo and re-visit the information you have provided in this issue (which is excellent by the way!).

from webapi.

TomWhite-MedStar avatar TomWhite-MedStar commented on June 30, 2024

@anthonysena, we upgraded to Atlas 2.13, and I can confirm that all of the above-listed issues are now fixed.

However, when I ran the Heracles Full Analysis it failed partway through. Here is the generated Spark SQL that failed:
Cohort-Heracles-full.sql.error.txt

The error message is:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)

Should I continue to track such issue here, or only in the SqlRender thread?

from webapi.

anthonysena avatar anthonysena commented on June 30, 2024

Excellent - let's track this work here. I'll close out the SqlRender issue that is now linked. Let me see if I can track down the query causing the issue and go from there.

from webapi.

anthonysena avatar anthonysena commented on June 30, 2024

@TomWhite-MedStar you mentioned the error message is:

[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)

However, I see no mention of an 'into' statement in the SQL you provided:

select
  c1.cohort_definition_id,
  200 as analysis_id,
  --
  --
  vo1.visit_CONCEPT_ID as stratum_1,
  --
  cast('' as STRING) as stratum_2,
  cast('' as STRING) as stratum_3,
  cast('' as STRING) as stratum_4,
  COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200
from
  omop_160101_to_221231_v0125.visit_occurrence vo1
  inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on vo1.person_id = c1.subject_id --
group by
  c1.cohort_definition_id,
  --
  --
  vo1.visit_CONCEPT_ID --;
  CREATE TABLE tmp_v0125_v2.sl527h4fresults_500 USING DELTA AS
SELECT
  c1.cohort_definition_id,
  500 as analysis_id,
  d1.cause_CONCEPT_ID as stratum_1,
  cast('' as STRING) as stratum_2,
  cast('' as STRING) as stratum_3,
  cast('' as STRING) as stratum_4,
  COUNT(distinct d1.PERSON_ID) as count_value
FROM
  omop_160101_to_221231_v0125.death d1
  inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on d1.person_id = c1.subject_id --
group by
  c1.cohort_definition_id,
  d1.cause_CONCEPT_ID

Is this SQL from the WebAPI log itself? I think I may be missing some of the SQL that may have caused the issue here.

For reference, I believe this is the SQL script that was used to generate the 1st part of the SQL above: https://github.com/OHDSI/WebAPI/blob/a1a130a9606c0318ca3b69255b1aee50b39ab56d/src/main/resources/resources/cohortanalysis/heraclesanalyses/sql/200_201.sql

from webapi.

TomWhite-MedStar avatar TomWhite-MedStar commented on June 30, 2024

@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:

COUNT(distinct vo1.person_id) as count_value **into** tmp_v0125_v2.sl527h4fresults_200

When I run the code you linked above through SqlRender, I get a different output from the above.

Might it be a SQL splitting issue? I see from this line above:
vo1.visit_CONCEPT_ID --;
that there is a semi-colon after a comment. The full SQL I posted above is what was sent to Databricks, so it appears that SQLRender didn't recognize that there are two SQL statements above.

Which code does the substitutions in sections like this?:
--{@CDM_version == '4'}?{ vo1.place_of_service_CONCEPT_ID --} --{@CDM_version == '5'}?{ vo1.visit_CONCEPT_ID --} ;
Could that be leading to cases like shown on the line with the commented semi-colon?

from webapi.

anthonysena avatar anthonysena commented on June 30, 2024

@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:

COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200

In the words of Homer Simpson: d'oh! Sorry I missed that.

There does appear to be some type of rendering issue based on what I am seeing in SqlDeveloper. Here is what the SQL mentioned above looks like when translated to Spark:

image

For reference, here is the rendered Spark SQL:

select c1.cohort_definition_id, 200 as analysis_id,
 --
 --
 vo1.visit_CONCEPT_ID as stratum_1,
 --
 cast( '' as STRING ) as stratum_2, cast( '' as STRING ) as stratum_3, cast( '' as STRING ) as stratum_4,
 COUNT(distinct vo1.person_id) as count_value
into temp_em.k6raaf7fresults_200
from
cdm.visit_occurrence vo1
inner join temp_em.k6raaf7fHERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--;

As you noted above, the trailing semi-colon is commented out so Spark is likely treating that entire SQL block as 1 statement thus causing the error. This appears to be unique to the Spark translation when comparing the same query translated to PostgreSQL for comparison:

image

For reference, here is the rendered PostgreSQL:

-- 200                'My analysis'
--insert into results_schema.heracles_results (cohort_definition_id, analysis_id, stratum_1, count_value)
CREATE TEMP TABLE results_200
AS
SELECT
c1.cohort_definition_id, 200 as analysis_id,
  --
  --
                                vo1.visit_CONCEPT_ID as stratum_1,
  --
                                cast( '' as varchar(1) ) as stratum_2, cast( '' as varchar(1) ) as stratum_3, cast( '' as varchar(1) ) as stratum_4,
                                COUNT(distinct vo1.person_id) as count_value
FROM
cdm.visit_occurrence vo1
inner join HERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--
;
ANALYZE results_200
;

Tagging @schuemie for input - this now seems like it may be an issue w/ SqlRender whereby a new line is required in one of the replacement patterns? Alternatively, we could review the WebAPI query and perhaps some re-formatting in that file would fix things?

from webapi.

TomWhite-MedStar avatar TomWhite-MedStar commented on June 30, 2024

@chrisknoll , I submitted a pull request before noticing that you may have already fixed query 200_201.sql.

Silly question - where do I find the control file that determines the order in which those heraclesanalyses are run? I monitored the generated queries, and they are not in an obvious order. If you can let me know what sequence of queries are called for Full Analysis after 200, I can test the remaining ones for potential issues on Databricks / Spark.

from webapi.

chrisknoll avatar chrisknoll commented on June 30, 2024

Sorry, I saw this after I saw the PR, so the PR is approved and merged (the changes looked good to me).

I need to look at the code again and remember how the heracles reports are generated....I'll reply here when I work it out.

from webapi.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.