Expected behavior Three Atlas functions that work fine on SQL Serv

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

<a class="user-mention notranslate" data-hovercard-type="user" data-hover

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Atlas Characterizations, Cohort Pathways & Cohort Reports not working on Spark (Databricks) about webapi HOT 8 CLOSED

TomWhite-MedStar commented on June 30, 2024

Atlas Characterizations, Cohort Pathways & Cohort Reports not working on Spark (Databricks)

from webapi.

Comments (8)

anthonysena commented on June 30, 2024

@TomWhite-MedStar - I'm going to close this for now so we can track this work on the SqlRender issue. If needed, we can transfer the SqlRender issue to this repo and re-visit the information you have provided in this issue (which is excellent by the way!).

from webapi.

TomWhite-MedStar commented on June 30, 2024

@anthonysena, we upgraded to Atlas 2.13, and I can confirm that all of the above-listed issues are now fixed.

However, when I ran the Heracles Full Analysis it failed partway through. Here is the generated Spark SQL that failed:
Cohort-Heracles-full.sql.error.txt

The error message is:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)

Should I continue to track such issue here, or only in the SqlRender thread?

from webapi.

anthonysena commented on June 30, 2024

Excellent - let's track this work here. I'll close out the SqlRender issue that is now linked. Let me see if I can track down the query causing the issue and go from there.

from webapi.

anthonysena commented on June 30, 2024

@TomWhite-MedStar you mentioned the error message is:

[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)

However, I see no mention of an 'into' statement in the SQL you provided:

select
  c1.cohort_definition_id,
  200 as analysis_id,
  --
  --
  vo1.visit_CONCEPT_ID as stratum_1,
  --
  cast('' as STRING) as stratum_2,
  cast('' as STRING) as stratum_3,
  cast('' as STRING) as stratum_4,
  COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200
from
  omop_160101_to_221231_v0125.visit_occurrence vo1
  inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on vo1.person_id = c1.subject_id --
group by
  c1.cohort_definition_id,
  --
  --
  vo1.visit_CONCEPT_ID --;
  CREATE TABLE tmp_v0125_v2.sl527h4fresults_500 USING DELTA AS
SELECT
  c1.cohort_definition_id,
  500 as analysis_id,
  d1.cause_CONCEPT_ID as stratum_1,
  cast('' as STRING) as stratum_2,
  cast('' as STRING) as stratum_3,
  cast('' as STRING) as stratum_4,
  COUNT(distinct d1.PERSON_ID) as count_value
FROM
  omop_160101_to_221231_v0125.death d1
  inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on d1.person_id = c1.subject_id --
group by
  c1.cohort_definition_id,
  d1.cause_CONCEPT_ID

Is this SQL from the WebAPI log itself? I think I may be missing some of the SQL that may have caused the issue here.

For reference, I believe this is the SQL script that was used to generate the 1st part of the SQL above: https://github.com/OHDSI/WebAPI/blob/a1a130a9606c0318ca3b69255b1aee50b39ab56d/src/main/resources/resources/cohortanalysis/heraclesanalyses/sql/200_201.sql

from webapi.

TomWhite-MedStar commented on June 30, 2024

@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:

COUNT(distinct vo1.person_id) as count_value **into** tmp_v0125_v2.sl527h4fresults_200

When I run the code you linked above through SqlRender, I get a different output from the above.

Might it be a SQL splitting issue? I see from this line above:
vo1.visit_CONCEPT_ID --;
that there is a semi-colon after a comment. The full SQL I posted above is what was sent to Databricks, so it appears that SQLRender didn't recognize that there are two SQL statements above.

Which code does the substitutions in sections like this?:
--{@CDM_version == '4'}?{ vo1.place_of_service_CONCEPT_ID --} --{@CDM_version == '5'}?{ vo1.visit_CONCEPT_ID --} ;
Could that be leading to cases like shown on the line with the commented semi-colon?

from webapi.

anthonysena commented on June 30, 2024

@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:

COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200

In the words of Homer Simpson: d'oh! Sorry I missed that.

There does appear to be some type of rendering issue based on what I am seeing in SqlDeveloper. Here is what the SQL mentioned above looks like when translated to Spark:

For reference, here is the rendered Spark SQL:

select c1.cohort_definition_id, 200 as analysis_id,
 --
 --
 vo1.visit_CONCEPT_ID as stratum_1,
 --
 cast( '' as STRING ) as stratum_2, cast( '' as STRING ) as stratum_3, cast( '' as STRING ) as stratum_4,
 COUNT(distinct vo1.person_id) as count_value
into temp_em.k6raaf7fresults_200
from
cdm.visit_occurrence vo1
inner join temp_em.k6raaf7fHERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--;

As you noted above, the trailing semi-colon is commented out so Spark is likely treating that entire SQL block as 1 statement thus causing the error. This appears to be unique to the Spark translation when comparing the same query translated to PostgreSQL for comparison:

For reference, here is the rendered PostgreSQL:

-- 200                'My analysis'
--insert into results_schema.heracles_results (cohort_definition_id, analysis_id, stratum_1, count_value)
CREATE TEMP TABLE results_200
AS
SELECT
c1.cohort_definition_id, 200 as analysis_id,
  --
  --
                                vo1.visit_CONCEPT_ID as stratum_1,
  --
                                cast( '' as varchar(1) ) as stratum_2, cast( '' as varchar(1) ) as stratum_3, cast( '' as varchar(1) ) as stratum_4,
                                COUNT(distinct vo1.person_id) as count_value
FROM
cdm.visit_occurrence vo1
inner join HERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--
;
ANALYZE results_200
;

Tagging @schuemie for input - this now seems like it may be an issue w/ SqlRender whereby a new line is required in one of the replacement patterns? Alternatively, we could review the WebAPI query and perhaps some re-formatting in that file would fix things?

from webapi.

TomWhite-MedStar commented on June 30, 2024

@chrisknoll , I submitted a pull request before noticing that you may have already fixed query 200_201.sql.

Silly question - where do I find the control file that determines the order in which those heraclesanalyses are run? I monitored the generated queries, and they are not in an obvious order. If you can let me know what sequence of queries are called for Full Analysis after 200, I can test the remaining ones for potential issues on Databricks / Spark.

from webapi.

chrisknoll commented on June 30, 2024

Sorry, I saw this after I saw the PR, so the PR is approved and merged (the changes looked good to me).

I need to look at the code again and remember how the heracles reports are generated....I'll reply here when I work it out.

from webapi.

Atlas Characterizations, Cohort Pathways & Cohort Reports not working on Spark (Databricks) about webapi HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent