Executive summary:
Consider the spans that group predications and tokens for each sentence. In total there are 1842193 such groups, and in only 49793 of them did I find an apparent POS inconsistency between the ERG analysis and the sense annotation:
49793/1842193 = 0.027 (about 2.7%)
Note that I only consider the tokens that were sense tagged. Counting per sentence, 38883 of the 159614 sentences contain at least one mismatch. If we ignore the a/r mismatches (adverbs analysed as adjectives) and the q/n mismatches (someone), 28358 sentences have at least one mismatch. If we also ignore the verb/adjective mismatches, 17401 sentences remain:
38883/159614 = 0.24
28358/159614 = 0.18
17401/159614 = 0.11
The dataset contains 165994 sentences, but not all of them got a parse from ERG, hence the total of 159614 used above.
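As a quick sanity check, the ratios in this summary can be recomputed from the raw counts, rounding to three decimals. This is only a sketch: the names are illustrative, and it assumes that 159614 is the number of sentences that received an ERG parse.

```python
# Counts copied from the summary; the variable names are illustrative.
GROUPS_TOTAL = 1842193
GROUPS_MISMATCH = 49793
SENTENCES_TOTAL = 165994
SENTENCES_PARSED = 159614  # assumption: sentences that received an ERG parse


def ratio(part, whole):
    """Return part / whole rounded to three decimals."""
    return round(part / whole, 3)


print("groups:", ratio(GROUPS_MISMATCH, GROUPS_TOTAL))
for label, count in [("all mismatches", 38883),
                     ("ignoring a/r and q/n", 28358),
                     ("also ignoring v/a", 17401)]:
    print(label + ":", ratio(count, SENTENCES_PARSED))
print("without a parse:", SENTENCES_TOTAL - SENTENCES_PARSED)
```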
Details:
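For all sentences, I join the tokens with the MRS predicates using their character spans. In the dumps below, each line shows a span (start,end), its nesting depth, and the tokens and predicates on that span. Token tuples are (form, start, end, sense keys, tags, POS tag); predicate tuples start with (predicate, start, end, variable, label).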
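The join can be sketched roughly as follows. This is not the script used for this report: join_by_span is a hypothetical helper, the two tuple layouts are copied from the dumps below, and the sketch ignores the depth number printed after each span.

```python
from collections import defaultdict


def join_by_span(tokens, predicates):
    """Group token and predicate tuples that share the same (start, end) span."""
    groups = defaultdict(list)
    for item in list(tokens) + list(predicates):
        start, end = item[1], item[2]  # both tuple kinds keep the span here
        groups[(start, end)].append(item)
    return dict(sorted(groups.items()))


tokens = [
    ('affecting', 14, 23, ['affect%2:29:00::'], ['wf'], 'VBG'),
    ('the', 24, 27, None, ['wf'], 'DT'),
]
predicates = [
    ('_affect_v_1', 14, 23, 'e9', 'h8', None),
    ('_the_q', 24, 27, 'q10', 'h11', None),
]
for span, items in join_by_span(tokens, predicates).items():
    print(span, '=>', items)
```

Keying on the exact span keeps the sketch simple; predicates that cover several tokens simply end up in a group of their own.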
In the first gloss below I found no conflict between ERG and the annotation. For instance, affect%2 means the token was annotated as a verb, and ERG produced the predicate _affect_v_1. hydrarthrosis was annotated as a noun, and the ERG preprocessing instantiated a generic unknown-word predicate from the POS tagger's NNS tag.
> START def hydrarthrosis affecting the knee
(0,32) 0 => [('unknown', 0, 32, 'e2', 'h1', None), ('udef_q', 0, 32, 'q4', 'h5', None)]
(0,13) 1 => [('hydrarthrosis', 0, 13, ['hydrarthrosis%1:26:00::'], ['wf'], 'NN'), ('_hydrarthrosis/nns_u_unknown', 0, 13, 'x4', 'h8', None)]
(14,23) 1 => [('affecting', 14, 23, ['affect%2:29:00::'], ['wf'], 'VBG'), ('_affect_v_1', 14, 23, 'e9', 'h8', None)]
(24,27) 1 => [('the', 24, 27, None, ['wf'], 'DT'), ('_the_q', 24, 27, 'q10', 'h11', None)]
(28,32) 1 => [('knee', 28, 32, ['knee%1:08:00::'], ['wf'], 'NN'), ('_knee_n_1', 28, 32, 'x10', 'h14', None)]
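Reading the POS off both sides can be sketched like this. The helper names and the regular expression are my own illustration, not the code behind this report. The digit after % in a sense key is the WordNet ss_type (1 noun, 2 verb, 3 adjective, 4 adverb, 5 adjective satellite), and surface predicates are assumed to follow the _lemma_pos or _lemma_pos_sense pattern.

```python
import re

# WordNet ss_type digit -> POS letter. 5 (adjective satellite) is folded
# into 'a' here, an assumption that matches the labels used in this report.
SS_TYPE_TO_POS = {'1': 'n', '2': 'v', '3': 'a', '4': 'r', '5': 'a'}

# Surface predicates are assumed to look like _lemma_pos or _lemma_pos_sense.
SURFACE_PRED = re.compile(r'^_(.+?)_([a-z])(?:_(.+))?$')


def sense_pos(sense_key):
    """POS letter encoded after the '%' of a WordNet sense key."""
    ss_type = sense_key.split('%', 1)[1].split(':', 1)[0]
    return SS_TYPE_TO_POS[ss_type]


def pred_pos(predicate):
    """POS letter of a surface predicate, or None for abstract predicates."""
    match = SURFACE_PRED.match(predicate)
    return match.group(2) if match else None


print(sense_pos('affect%2:29:00::'), pred_pos('_affect_v_1'))
print(sense_pos('hydrarthrosis%1:26:00::'),
      pred_pos('_hydrarthrosis/nns_u_unknown'))
print(pred_pos('udef_q'))  # abstract predicate, no leading underscore
```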
Next, excess was annotated as an adjective (%5) but analysed as a noun by ERG. See the line starting with "D>".
> START def an abnormality of pregnancy; accumulation of excess amniotic fluid
D> {'n', 'a'} [('excess', 45, 51, ['excess%5:00:00:unnecessary:00'], ['wf'], 'JJ'), ('udef_q', 45, 51, 'q29', 'h30', None), ('_excess_n_1', 45, 51, 'x29', 'h33', None)]
(0,66) 0 => [('implicit_conj', 0, 66, 'e2', 'h1', None)]
(0,28) 1 => [('unknown', 0, 28, 'e4', 'h1', None)]
(0,2) 2 => [('an', 0, 2, None, ['wf'], 'DT'), ('_a_q', 0, 2, 'q6', 'h7', None)]
(3,14) 2 => [('abnormality', 3, 14, None, ['wf'], 'NN'), ('_abnormality_n_1', 3, 14, 'x6', 'h10', None)]
(15,17) 2 => [('of', 15, 17, None, ['wf'], 'IN'), ('_of_p', 15, 17, 'e11', 'h10', None)]
(18,28) 2 => [('udef_q', 18, 28, 'q12', 'h13', None)]
(18,27) 3 => [('pregnancy', 18, 27, ['pregnancy%1:26:00::'], ['wf'], 'NN'), ('_pregnancy_n_1', 18, 27, 'x12', 'h16', None)]
(27,28) 3 => [(';', 27, 28, None, ['wf'], 'punc')]
(29,66) 1 => [('unknown', 29, 66, 'e5', 'h1', None), ('udef_q', 29, 66, 'q17', 'h18', None)]
(29,41) 2 => [('accumulation', 29, 41, ['accumulation%1:22:00::'], ['wf'], 'NN'), ('_accumulation_n_of', 29, 41, 'x17', 'h21', None)]
(42,44) 2 => [('of', 42, 44, None, ['wf'], 'IN')]
(45,66) 2 => [('udef_q', 45, 66, 'q22', 'h23', None)]
(45,60) 3 => [('compound', 45, 60, 'e27', 'h26', None)]
(45,51) 4 => [('excess', 45, 51, ['excess%5:00:00:unnecessary:00'], ['wf'], 'JJ'), ('udef_q', 45, 51, 'q29', 'h30', None), ('_excess_n_1', 45, 51, 'x29', 'h33', None)]
(52,60) 4 => [('amniotic', 52, 60, None, ['cf', 'a'], 'JJ'), ('_amniotic/jj_u_unknown', 52, 60, 'e28', 'h26', None)]
(61,66) 3 => [('fluid', 61, 66, None, ['cf', 'a'], 'NN'), ('_fluid_n_1', 61, 66, 'x22', 'h26', None)]
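One way to produce the D> label is to collect the POS letters on a span and flag the span when more than one letter remains. The sketch below is a reconstruction from the examples in this report, not the actual check: pos_conflict is a hypothetical name, and skipping the u predicates is an assumption based on the hydrarthrosis example not being flagged.

```python
def pos_conflict(sense_tags, pred_tags, ignore=frozenset({'u'})):
    """Return the POS letters on a span if they disagree, else an empty set.

    sense_tags: POS letters from the sense annotation of the token.
    pred_tags:  POS letters of the surface predicates on the same span.
    ignore:     predicate POS letters left out of the comparison
                (assumption: 'u', the unknown-word predicates).
    """
    if not sense_tags:  # only sense-tagged tokens are checked
        return set()
    tags = set(sense_tags) | (set(pred_tags) - set(ignore))
    return tags if len(tags) > 1 else set()


print(pos_conflict(['a'], ['n']))       # excess: annotated a, ERG n
print(pos_conflict(['n'], ['u']))       # hydrarthrosis: not flagged
print(pos_conflict(['v'], ['v', 'a']))  # reinstated: _instate_v_1 + _re-_a_again
```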
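ERG uses the same adjective-style predicates (_a_) for adverbs and adjectives, so another common mismatch is a vs r: below, essentially is tagged as an adverb but analysed as _essential_a_1. The same gloss also has an n vs a mismatch for equivalent. Should the fragment after the first semicolon, "equally balanced", be an example?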
> START def a state of being essentially equal or equivalent; equally balanced;
D> {'a', 'r'} [('essentially', 17, 28, ['essentially%4:02:01::'], ['wf'], 'RB'), ('_essential_a_1', 17, 28, 'e17', 'h16', None)]
D> {'n', 'a'} [('equivalent', 38, 48, ['equivalent%1:09:00::'], ['wf'], 'JJ'), ('_equivalent_a_to', 38, 48, 'e22', 'h16', None)]
(0,67) 0 => [('implicit_conj', 0, 67, 'e2', 'h1', None)]
(0,49) 1 => [('unknown', 0, 49, 'e4', 'h1', None)]
(0,1) 2 => [('a', 0, 1, None, ['wf'], 'DT'), ('_a_q', 0, 1, 'q6', 'h7', None)]
(2,7) 2 => [('state', 2, 7, ['state%1:03:00::'], ['wf'], 'NN'), ('_state_n_of', 2, 7, 'x6', 'h10', None)]
(8,10) 2 => [('of', 8, 10, None, ['wf'], 'IN')]
(11,49) 2 => [('udef_q', 11, 49, 'q11', 'h12', None), ('nominalization', 11, 49, 'x11', 'h15', None)]
(11,16) 3 => [('being', 11, 16, None, ['wf'], 'VBG')]
(17,28) 3 => [('essentially', 17, 28, ['essentially%4:02:01::'], ['wf'], 'RB'), ('_essential_a_1', 17, 28, 'e17', 'h16', None)]
(29,34) 3 => [('equal', 29, 34, None, ['wf'], 'JJ'), ('_equal_a_to', 29, 34, 'e18', 'h16', None)]
(35,37) 3 => [('or', 35, 37, None, ['wf'], 'CC'), ('_or_c', 35, 37, 'e21', 'h16', None)]
(38,48) 3 => [('equivalent', 38, 48, ['equivalent%1:09:00::'], ['wf'], 'JJ'), ('_equivalent_a_to', 38, 48, 'e22', 'h16', None)]
(48,49) 3 => [(';', 48, 49, None, ['wf'], 'punc')]
(50,67) 1 => [('unknown', 50, 67, 'e5', 'h1', None)]
(50,57) 2 => [('equally', 50, 57, None, ['wf'], 'RB'), ('_equal_a_to', 50, 57, 'e25', 'h1', None)]
(58,66) 2 => [('balanced', 58, 66, ['balance%2:42:00::'], ['wf'], 'VBN'), ('_balance_v_1', 58, 66, 'e26', 'h1', None)]
(66,67) 2 => [(';', 66, 67, None, ['wf'], 'punc')]
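Adjective vs verb: reinstated was annotated as a verb, and ERG analyses it as _instate_v_1 plus the prefix predicate _re-_a_again, whose a triggers the mismatch: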
> START def the condition of being reinstated;
D> {'v', 'a'} [('reinstated', 23, 33, ['reinstate%2:41:00::'], ['wf'], 'VBN'), ('_instate_v_1', 23, 33, 'e15', 'h14', None), ('_re-_a_again', 23, 33, 'e18', 'h14', None)]
(0,34) 0 => [('unknown', 0, 34, 'e2', 'h1', None)]
(0,3) 1 => [('the', 0, 3, None, ['wf'], 'DT'), ('_the_q', 0, 3, 'q4', 'h5', None)]
(4,13) 1 => [('condition', 4, 13, ['condition%1:26:00::'], ['wf'], 'NN'), ('_condition_n_of', 4, 13, 'x4', 'h8', None)]
(14,16) 1 => [('of', 14, 16, None, ['wf'], 'IN')]
(17,34) 1 => [('udef_q', 17, 34, 'q9', 'h10', None), ('nominalization', 17, 34, 'x9', 'h13', None)]
(17,22) 2 => [('being', 17, 22, None, ['wf'], 'VBG')]
(23,33) 2 => [('reinstated', 23, 33, ['reinstate%2:41:00::'], ['wf'], 'VBN'), ('_instate_v_1', 23, 33, 'e15', 'h14', None), ('_re-_a_again', 23, 33, 'e18', 'h14', None)]
(33,34) 2 => [(';', 33, 34, None, ['wf'], 'punc')]
Someone vs person + _some_q (1829 cases). I need to improve my check to remove these from the suspicious cases.
> START def a situation of being uncomfortably close to someone or something
D> {'a', 'r'} [('uncomfortably', 21, 34, ['uncomfortably%4:02:00::'], ['wf'], 'RB'), ('_uncomfortable_a_1', 21, 34, 'e16', 'h15', None)]
D> {'q', 'n'} [('someone', 44, 51, ['someone%1:03:00::'], ['wf'], 'NN'), ('person', 44, 51, 'x24', 'h23', None), ('_some_q', 44, 51, 'q24', 'h25', None)]
(0,64) 0 => [('unknown', 0, 64, 'e2', 'h1', None)]
(0,1) 1 => [('a', 0, 1, None, ['wf'], 'DT'), ('_a_q', 0, 1, 'q4', 'h5', None)]
(2,11) 1 => [('situation', 2, 11, ['situation%1:15:00::'], ['wf'], 'NN'), ('_situation_n_1', 2, 11, 'x4', 'h8', None)]
(12,14) 1 => [('of', 12, 14, None, ['wf'], 'IN'), ('_of_p', 12, 14, 'e9', 'h8', None)]
(15,64) 1 => [('udef_q', 15, 64, 'q10', 'h11', None), ('nominalization', 15, 64, 'x10', 'h14', None)]
(15,20) 2 => [('being', 15, 20, None, ['wf'], 'VBG')]
(21,34) 2 => [('uncomfortably', 21, 34, ['uncomfortably%4:02:00::'], ['wf'], 'RB'), ('_uncomfortable_a_1', 21, 34, 'e16', 'h15', None)]
(35,40) 2 => [('close', 35, 40, None, ['wf'], 'JJ'), ('_close_a_to', 35, 40, 'e17', 'h15', None)]
(41,43) 2 => [('to', 41, 43, None, ['wf'], 'TO')]
(44,64) 2 => [('udef_q', 44, 64, 'q19', 'h20', None)]
(44,51) 3 => [('someone', 44, 51, ['someone%1:03:00::'], ['wf'], 'NN'), ('person', 44, 51, 'x24', 'h23', None), ('_some_q', 44, 51, 'q24', 'h25', None)]
(52,54) 3 => [('or', 52, 54, None, ['wf'], 'CC'), ('_or_c', 52, 54, 'x19', 'h28', None)]
(55,64) 3 => [('something', 55, 64, None, ['wf'], 'PRP'), ('thing', 55, 64, 'x29', 'h30', None), ('_some_q', 55, 64, 'q29', 'h31', None)]
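The per-sentence counts in the summary, with and without the ignorable mismatch types, could be computed along these lines. The function name and the dictionary keys are hypothetical; the mismatch sets are copied from the D> lines of four glosses in this report.

```python
def sentences_with_errors(mismatches_by_sentence, ignorable=()):
    """Count sentences that keep at least one mismatch after filtering."""
    ignorable = {frozenset(pair) for pair in ignorable}
    return sum(
        any(frozenset(m) not in ignorable for m in mismatches)
        for mismatches in mismatches_by_sentence.values()
    )


# Mismatch sets copied from the D> lines; the keys are just short labels.
mismatches_by_sentence = {
    'excess': [{'n', 'a'}],
    'equivalent': [{'a', 'r'}, {'n', 'a'}],
    'reinstated': [{'v', 'a'}],
    'someone': [{'a', 'r'}, {'q', 'n'}],
}
print(sentences_with_errors(mismatches_by_sentence))
print(sentences_with_errors(mismatches_by_sentence, [('a', 'r'), ('q', 'n')]))
print(sentences_with_errors(mismatches_by_sentence,
                            [('a', 'r'), ('q', 'n'), ('v', 'a')]))
```

Comparing frozensets makes the filter independent of the order in which the two POS letters are written.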
What is especially below? It is tagged as an adverb, but in the ERG analysis it is x (_especially_x_deg).
> START def the relative position or standing of things or especially persons in a society;
D> {'x', 'r'} [('especially', 47, 57, ['especially%4:02:01::'], ['wf'], 'RB'), ('_especially_x_deg', 47, 57, 'e35', 'h34', None)]
(0,79) 0 => [('unknown', 0, 79, 'e2', 'h1', None), ('udef_q', 0, 79, 'q4', 'h5', None)]
(0,3) 1 => [('the', 0, 3, None, ['wf'], 'DT'), ('_the_q', 0, 3, 'q9', 'h8', None)]
(4,12) 1 => [('relative', 4, 12, ['relative%3:00:00::'], ['wf'], 'JJ'), ('_relative_a_to', 4, 12, 'e13', 'h12', None)]
(13,21) 1 => [('position', 13, 21, None, ['wf'], 'NN'), ('udef_q', 13, 21, 'q16', 'h15', None), ('_position_n_of', 13, 21, 'x16', 'h19', None)]
(22,33) 1 => [('udef_q', 22, 33, 'q21', 'h20', None)]
(22,24) 2 => [('or', 22, 24, None, ['wf'], 'CC'), ('_or_c', 22, 24, 'x9', 'h12', None)]
(25,33) 2 => [('standing', 25, 33, None, ['wf'], 'NN'), ('_standing_n_1', 25, 33, 'x21', 'h24', None)]
(34,36) 1 => [('of', 34, 36, None, ['wf'], 'IN'), ('_of_p', 34, 36, 'e25', 'h12', None)]
(37,43) 1 => [('things', 37, 43, ['thing%1:06:01::'], ['wf'], 'NNS'), ('udef_q', 37, 43, 'q26', 'h27', None), ('_thing_n_of-about', 37, 43, 'x26', 'h30', None)]
(44,46) 1 => [('or', 44, 46, None, ['wf'], 'CC'), ('_or_c', 44, 46, 'x4', 'h32', None)]
(47,57) 1 => [('especially', 47, 57, ['especially%4:02:01::'], ['wf'], 'RB'), ('_especially_x_deg', 47, 57, 'e35', 'h34', None)]
(58,79) 1 => [('udef_q', 58, 79, 'q33', 'h34', None)]
(58,65) 2 => [('persons', 58, 65, ['person%1:03:00::'], ['wf'], 'NNS'), ('_person_n_1', 58, 65, 'x33', 'h39', None)]
(66,68) 2 => [('in', 66, 68, None, ['wf'], 'IN'), ('_in_p_loc', 66, 68, 'e40', 'h39', None)]
(69,70) 2 => [('a', 69, 70, None, ['wf'], 'DT'), ('_a_q', 69, 70, 'q41', 'h42', None)]
(71,78) 2 => [('society', 71, 78, None, ['wf'], 'NN'), ('_society_n_of', 71, 78, 'x41', 'h45', None)]
(78,79) 2 => [(';', 78, 79, None, ['wf'], 'punc')]