Comments (8)
@TomWhite-MedStar - I'm going to close this for now so we can track this work on the SqlRender issue. If needed, we can transfer the SqlRender issue to this repo and re-visit the information you have provided in this issue (which is excellent by the way!).
from webapi.
@anthonysena, we upgraded to Atlas 2.13, and I can confirm that all of the above-listed issues are now fixed.
However, when I ran the Heracles Full Analysis it failed partway through. Here is the generated Spark SQL that failed:
Cohort-Heracles-full.sql.error.txt
The error message is:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)
Should I continue to track such issue here, or only in the SqlRender thread?
from webapi.
Excellent - let's track this work here. I'll close out the SqlRender issue that is now linked. Let me see if I can track down the query causing the issue and go from there.
from webapi.
@TomWhite-MedStar you mentioned the error message is:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'into'.(line 8, pos 0)
However, I see no mention of an 'into' statement in the SQL you provided:
select
c1.cohort_definition_id,
200 as analysis_id,
--
--
vo1.visit_CONCEPT_ID as stratum_1,
--
cast('' as STRING) as stratum_2,
cast('' as STRING) as stratum_3,
cast('' as STRING) as stratum_4,
COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200
from
omop_160101_to_221231_v0125.visit_occurrence vo1
inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on vo1.person_id = c1.subject_id --
group by
c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID --;
CREATE TABLE tmp_v0125_v2.sl527h4fresults_500 USING DELTA AS
SELECT
c1.cohort_definition_id,
500 as analysis_id,
d1.cause_CONCEPT_ID as stratum_1,
cast('' as STRING) as stratum_2,
cast('' as STRING) as stratum_3,
cast('' as STRING) as stratum_4,
COUNT(distinct d1.PERSON_ID) as count_value
FROM
omop_160101_to_221231_v0125.death d1
inner join tmp_v0125_v2.sl527h4fHERACLES_cohort c1 on d1.person_id = c1.subject_id --
group by
c1.cohort_definition_id,
d1.cause_CONCEPT_ID
Is this SQL from the WebAPI log itself? I think I may be missing some of the SQL that may have caused the issue here.
For reference, I believe this is the SQL script that was used to generate the 1st part of the SQL above: https://github.com/OHDSI/WebAPI/blob/a1a130a9606c0318ca3b69255b1aee50b39ab56d/src/main/resources/resources/cohortanalysis/heraclesanalyses/sql/200_201.sql
from webapi.
@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:
COUNT(distinct vo1.person_id) as count_value **into** tmp_v0125_v2.sl527h4fresults_200
When I run the code you linked above through SqlRender, I get a different output from the above.
Might it be a SQL splitting issue? I see from this line above:
vo1.visit_CONCEPT_ID --;
that there is a semi-colon after a comment. The full SQL I posted above is what was sent to Databricks, so it appears that SQLRender didn't recognize that there are two SQL statements above.
Which code does the substitutions in sections like this?:
--{@CDM_version == '4'}?{ vo1.place_of_service_CONCEPT_ID --} --{@CDM_version == '5'}?{ vo1.visit_CONCEPT_ID --} ;
Could that be leading to cases like shown on the line with the commented semi-colon?
from webapi.
@anthonysena , the INTO statement is hidden in the formatting on the line before the first FROM statement:
COUNT(distinct vo1.person_id) as count_value into tmp_v0125_v2.sl527h4fresults_200
In the words of Homer Simpson: d'oh! Sorry I missed that.
There does appear to be some type of rendering issue based on what I am seeing in SqlDeveloper. Here is what the SQL mentioned above looks like when translated to Spark:
For reference, here is the rendered Spark SQL:
select c1.cohort_definition_id, 200 as analysis_id,
--
--
vo1.visit_CONCEPT_ID as stratum_1,
--
cast( '' as STRING ) as stratum_2, cast( '' as STRING ) as stratum_3, cast( '' as STRING ) as stratum_4,
COUNT(distinct vo1.person_id) as count_value
into temp_em.k6raaf7fresults_200
from
cdm.visit_occurrence vo1
inner join temp_em.k6raaf7fHERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--;
As you noted above, the trailing semi-colon is commented out so Spark is likely treating that entire SQL block as 1 statement thus causing the error. This appears to be unique to the Spark translation when comparing the same query translated to PostgreSQL for comparison:
For reference, here is the rendered PostgreSQL:
-- 200 'My analysis'
--insert into results_schema.heracles_results (cohort_definition_id, analysis_id, stratum_1, count_value)
CREATE TEMP TABLE results_200
AS
SELECT
c1.cohort_definition_id, 200 as analysis_id,
--
--
vo1.visit_CONCEPT_ID as stratum_1,
--
cast( '' as varchar(1) ) as stratum_2, cast( '' as varchar(1) ) as stratum_3, cast( '' as varchar(1) ) as stratum_4,
COUNT(distinct vo1.person_id) as count_value
FROM
cdm.visit_occurrence vo1
inner join HERACLES_cohort c1
on vo1.person_id = c1.subject_id
--
WHERE vo1.visit_start_date>=c1.cohort_start_date and vo1.visit_end_date<=c1.cohort_end_date
--
group by c1.cohort_definition_id,
--
--
vo1.visit_CONCEPT_ID
--
;
ANALYZE results_200
;
Tagging @schuemie for input - this now seems like it may be an issue w/ SqlRender whereby a new line is required in one of the replacement patterns? Alternatively, we could review the WebAPI query and perhaps some re-formatting in that file would fix things?
from webapi.
@chrisknoll , I submitted a pull request before noticing that you may have already fixed query 200_201.sql.
Silly question - where do I find the control file that determines the order in which those heraclesanalyses are run? I monitored the generated queries, and they are not in an obvious order. If you can let me know what sequence of queries are called for Full Analysis after 200, I can test the remaining ones for potential issues on Databricks / Spark.
from webapi.
Sorry, I saw this after I saw the PR, so the PR is approved and merged (the changes looked good to me).
I need to look at the code again and remember how the heracles reports are generated....I'll reply here when I work it out.
from webapi.
Related Issues (20)
- Password encryption feature crashing application to start HOT 5
- Create a separate library for SOLR classes and interfaces
- Request to replace hyphen(-) with underscore(_) - Property naming conventions HOT 3
- Execution engine port 8888 HOT 5
- Rename master to main HOT 1
- Achilles DDL not working on Databricks Spark HOT 1
- Improve cohort definition deletion behavior HOT 1
- WebAPI Security Login Failed HOT 5
- Databricks: vocabulary search triggers SQLNonTransientConnectionException about missing PWD HOT 2
- Unclear spark/databricks support HOT 17
- Hypersistence Optimizer Issues: Critical: BidirectionalSynchronizationEvent HOT 2
- Consider adding JavaMelody : monitoring of JavaEE applications
- Characterization not using Vocab daimon
- OpenID Session not established at Atlas UI with Azure AD HOT 7
- Search - RC/DRC counts give 'timeout' in BigQuery HOT 2
- Update BigQuery driver to the latest 1.x version HOT 1
- Clear Server Cache button in ATLAS does not work HOT 2
- Non-fatal error in SQL construction for Pathway analysis on Spark (e.g. Databricks) HOT 9
- How to set a source in the database? HOT 8
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from webapi.