portuguese-pos-dict's Introduction

languagetool-org

repo for GitHub pages

portuguese-pos-dict's Issues

UTF-8, and which dictionaries should be changed?

Hello, @susanaboatto and @p-goulart

I want to improve the PT dictionaries (.AFF files) by using some of the pt-BR rules for verbs, for example: “referimo-nos”.

There are loads of dictionaries in there; which one should I change?

Could you add me as a member of this repository?

Another question: why are the dictionaries in ISO and not in UTF-8?

Thank you!

dicionarios_varios_20231216

diminutives

@ricardojosehlima @marcoagpinto

In this repo, I am rebuilding the Portuguese tagger dictionary. It will have twice as many words as the previous dictionary.

This is a list of diminutive and augmentative adjectives: diminutives.txt
Do they make sense in the tagger dictionary? Are they correct and usual words? Or are they automatically generated?

Most of them are not accepted by the spellers (PT, BR...).

abalado AQCFP0 abaladazinhas
abalado AQCFP0 abaladazitas
abalado AQCFP0 abaladinhas
abalado AQCFP0 abaladitas
abalado AQCFS0 abaladazinha
abalado AQCFS0 abaladazita
abalado AQCFS0 abaladinha
abalado AQCFS0 abaladita
abalado AQCMP0 abaladinhos
abalado AQCMP0 abaladitos
abalado AQCMP0 abaladozinhos
abalado AQCMP0 abaladozitos
abalado AQCMS0 abaladinho
abalado AQCMS0 abaladito
abalado AQCMS0 abaladozinho
abalado AQCMS0 abaladozito
esperto AQAFP0 espertaças
esperto AQAFP0 espertonas
esperto AQAFS0 espertaça
esperto AQAFS0 espertona
esperto AQAMP0 espertaços
esperto AQAMP0 espertões
esperto AQAMS0 espertaço
esperto AQAMS0 espertão
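
To make the review of these forms easier, the list can be grouped by lemma and degree. The following is only a minimal sketch: it assumes the whitespace-separated `lemma tag form` layout shown above, and it infers from the sample lines (not from the tagset documentation) that the third tag character is `C` for diminutives and `A` for augmentatives. The `accepted` set stands in for whatever speller word list is available and is purely hypothetical.

```python
from collections import defaultdict

# A few lines copied from the list above, in "lemma tag form" order.
SAMPLE = """\
abalado AQCFP0 abaladazinhas
abalado AQCMS0 abaladinho
esperto AQAFS0 espertona
esperto AQAMS0 espertão
"""

# Assumption inferred from the sample: third tag character marks the degree.
DEGREE = {"C": "diminutive", "A": "augmentative"}


def parse_entries(text):
    """Yield (lemma, tag, form) tuples from 'lemma tag form' lines."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # silently skip blank or malformed lines
            yield tuple(parts)


def group_by_degree(entries):
    """Map (lemma, degree) to the list of adjective forms with that degree."""
    groups = defaultdict(list)
    for lemma, tag, form in entries:
        if tag.startswith("AQ") and len(tag) > 2 and tag[2] in DEGREE:
            groups[(lemma, DEGREE[tag[2]])].append(form)
    return dict(groups)


def not_in_speller(forms, accepted):
    """Return the forms that are missing from a (hypothetical) speller word set."""
    return [form for form in forms if form not in accepted]


groups = group_by_degree(parse_entries(SAMPLE))
```

Running `not_in_speller` over each group against a real speller word list would give the per-lemma list of forms that the spellers reject.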

Testing the new tagger dictionary

We have a new tagger dictionary ready to use in this branch: https://github.com/languagetool-org/languagetool/tree/pt-new-pos-dict
The differences with the current master branch can be seen in this file.
diff-results-2022-04-28T12 50.zip

There are still many undesired differences. I will track the review of each rule here.

  • ABREVIATIONS_DR[3] [g/m]
  • ABREVIATIONS_PUNCTUATION[1] [g/m]
  • ABREVIATIONS_PUNCTUATION[2] [g/m]
  • ACENTUAÇÃO_VOGAL_ÊNCLISE[1] [g/m]
  • ACHO_QUE[1] [g/m]
  • ADJ_PARA_PRA_CARAÇAS_CARAMBA[1] [g/m]
  • ADVERBIOS_MODO_EM_SEQUENCIA[1] [g/m]
  • ADVERBIOS_MODO_EM_SEQUENCIA[2] [g/m]
  • AI_AÍ[1] [g/m]
  • ALTERNATIVE_CONJUNCTIONS_COMMA[2] [g/m]
  • ALUGAR_CASA[1] [g/m]
  • ALÉM_AQUÉM_RECÉM[1] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[13] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[15] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[16] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[1] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[20] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[5] [g/m]
  • AO90_COMPOUNDS_GENERIC_PREFIX[7] [g/m]
  • AO90_MONTHS_CASING[1] [g/m]
  • AO90_WEEKDAYS_CASING[1] [g/m]
  • AO_MEU_VER[1] [g/m]
  • ARCHAISMS[4] [g/m]
  • ARTICLES_PRECEDING_LOCATIONS[1] [g/m]
  • ATRAVES_DE_POR[1] [g/m]
  • AUXILIARY_VERB_INFINITIVE[1] [g/m]
  • AUXILIARY_VERB_INFINITIVE[2] [g/m]
  • AU[1] [g/m]
  • AVOID_GERUND[1] [g/m]
  • AVOID_GERUND[2] [g/m]
  • AVOID_GERUND[3] [g/m]
  • A_MAIORIA_SINGULAR[1] [g/m]
  • A_NOUN_VERB[2] [g/m]
  • A_NOUN_VERB[3] [g/m]
  • A_WORD[1] [g/m]
  • A_WORD[2] [g/m]
  • BARBARISMS[1] [g/m]
  • BARBARISMS[2] [g/m]
  • BEM_FEITO[1] [g/m]
  • BRASILEIRISMO_16_ATERRISAR[19] [g/m]
  • BRASILEIRISMO_17_DECOLAR[20] [g/m]
  • BRASILEIRISMO_20_REVISAR[24] [g/m]
  • CACOPHONY[10] [g/m]
  • CACOPHONY[1] [g/m]
  • CACOPHONY[2] [g/m]
  • CHAMAR_DENOMINAR_DE[1] [g/m]
  • CHILDISH_LANGUAGE[1] [g/m]
  • CLICHE_127_POR[59] [g/m]
  • CLICHE_17_IR[29] [g/m]
  • CLICHE_25_ABRIR[66] [g/m]
  • CLICHE_3123_PERDER[225] [g/m]
  • CLICHE_313_BATER[115] [g/m]
  • CLICHE_319_BOTAR_2[121] [g/m]
  • CLICHE_320_BRINCAR[122] [g/m]
  • CLICHE_323_CAIR[125] [g/m]
  • CLICHE_327_CANTAR[129] [g/m]
  • CLICHE_338_COLOCAR[140] [g/m]
  • CLICHE_374_DAR[176] [g/m]
  • CLICHE_388_ESTAR[190] [g/m]
  • CLICHE_389_ESTAR_2[192] [g/m]
  • CLICHE_391_ESTAR[194] [g/m]
  • CLICHE_444_TIRAR[269] [g/m]
  • CLICHE_450_TRAZER[275] [g/m]
  • CLICHE_46_NÃO[232] [g/m]
  • CLICHE_7_FAZER[19] [g/m]
  • COLOCACAO_ADVERBIOS_LUGAR[1] [g/m]
  • COLOCACAO_ADVERBIOS_LUGAR[2] [g/m]
  • COLOCACAO_ADVERBIO[1] [g/m]
  • COLOCAÇÃO_ADVÉRBIO[1] [g/m]
  • CONCORDANCIA_COM_NUCLEO_DO_SUJEITO_V2[1] [g/m]
  • CONFUSÃO_ATE_ATÉ[1] [g/m]
  • CONFUSÃO_ESTA_ESTÁ[1] [g/m]
  • CONFUSÃO_ESTA_ESTÁ[2] [g/m]
  • CONFUSÃO_E_É[1] [g/m]
  • CONFUSÃO_E_É[3] [g/m]
  • CONFUSÃO_MAS_MAIS[1] [g/m]
  • CONFUSÃO_NOS_NÓS[1] [g/m]
  • CONFUSÃO_TER_ESTAR[1] [g/m]
  • CONFUSÃO_À_HÁ[2] [g/m]
  • CONJUNTIVO_OBRIGATORIO[3] [g/m]
  • CONTRACOES2[3] [g/m]
  • CONTRACOES_OBRIGATORIAS[3] [g/m]
  • CONTRACOES_OBRIGATORIAS[5] [g/m]
  • CORRER_RISCO_ARRISCAR[1] [g/m]
  • CRASE_CONFUSION[10] [g/m]
  • CRASE_CONFUSION[14] [g/m]
  • CRASE_CONFUSION[1] [g/m]
  • CRASE_CONFUSION[3] [g/m]
  • CRASE_CONFUSION[4] [g/m]
  • CRASE_CONFUSION[6] [g/m]
  • CRASE_CONFUSION[7] [g/m]
  • CRASE_CONFUSION_2[1] [g/m]
  • CRASE_CONFUSION_2[2] [g/m]
  • CUJO_LIGACAO_NOME_ADJETIVO_NUMERAL[1] [g/m]
  • DASH_ENUMERATION_SPACE_RULE[1] [g/m]
  • DASH_RULE[3] [g/m]
  • DENTRO_DE_DA_DAS_DO_DOS_EM_NA_NAS_NO_NOS[1] [g/m]
  • DENTRO_DE_DA_DAS_DO_DOS_EM_NA_NAS_NO_NOS[2] [g/m]
  • DENTRO_DE_DA_DAS_DO_DOS_EM_NA_NAS_NO_NOS[3] [g/m]
  • DENTRO_DE_DA_DAS_DO_DOS_EM_NA_NAS_NO_NOS[4] [g/m]
  • DENTRO_DE_DA_DAS_DO_DOS_EM_NA_NAS_NO_NOS[5] [g/m]
  • DEPOIS_DE_APÓS[1] [g/m]
  • DET_FEM_MAS_A_O[1] [g/m]
  • DE_FORMA-MODO_ADJ[1] [g/m]
  • DE_MANEIRA_ADVERBIO_MODO[1] [g/m]
  • DE_VOCES_VOSSAS[1] [g/m]
  • DIACRITICS[1] [g/m]
  • DIFERENTES[1] [g/m]
  • DOSE_DOZE[2] [g/m]
  • DOS_HUMANOS_HUMANO[1] [g/m]
  • DOS_HUMANOS_HUMANO[2] [g/m]
  • DOUBLE_PUNCTUATION[4] [g/m]
  • DO_MUNDO_REAL_real_reais[1] [g/m]
  • EASILY_CONFUSED_RARE_WORDS[9] [g/m]
  • ELES_ELAS[1] [g/m]
  • ELES_ELAS[2] [g/m]
  • ELES_ELAS[3] [g/m]
  • ELES_ELAS[4] [g/m]
  • ELLIPSIS[1] [g/m]
  • EMPREGAR_TERMO[1] [g/m]
  • EM_QUE_O_OS_A_AS_ONDE[1] [g/m]
  • ENUMERATIONS_AND_AND[1] [g/m]
  • ENUMERATION_COMMAS[1] [g/m]
  • ENUMERATION_COMMAS[2] [g/m]
  • ERROS_DE_CRASE_MARCOAGPINTO[1] [g/m]
  • ERRO_DE_CONCORDNCIA_DO_GÉNERO_MASCULINO_O[1] [g/m]
  • ESPERA_QUE_INDICATIVO[1] [g/m]
  • ESQUECER_VERB[1] [g/m]
  • ESTAR_ADVÉRBIO_A_VERBO_VERBO_ADVÉRBIO[1] [g/m]
  • ESTAR_ADVÉRBIO_A_VERBO_VERBO_ADVÉRBIO[6] [g/m]
  • ESTAR_CLARO_DE_QUE[5] [g/m]
  • ESTAR_DE_ACORDO[1] [g/m]
  • ESTA_ESTÁ[1] [g/m]
  • ESTA_ESTÁ[3] [g/m]
  • ESTA_ESTÁ[4] [g/m]
  • ETC_USAGE[1] [g/m]
  • ETC_USAGE[2] [g/m]
  • ETC_USAGE[3] [g/m]
  • EU_NÓS_REMOVAL[1] [g/m]
  • EU_NÓS_REMOVAL[2] [g/m]
  • E_NO_COMECO[1] [g/m]
  • E_QUE_QUE[1] [g/m]
  • E_QUE_VERBO_E_VERBO[1] [g/m]
  • E_É_SÃO_FOI_FORAM_SENDO_SIDO[1] [g/m]
  • FAZER_EFETUAR_REALIZAR_CONDUZIR_CONCRETIZAR_ELABORAR[1] [g/m]
  • FAZER_EFETUAR_REALIZAR_CONDUZIR_CONCRETIZAR_ELABORAR[2] [g/m]
  • FAZER_USO_DE-USAR-RECORRER[1] [g/m]
  • FAZER_USO_DE-USAR-RECORRER[2] [g/m]
  • FEEDBACK[1] [g/m]
  • FINAL_STOPS[3] [g/m]
  • FORMAL_PRA_PARA[1] [g/m]
  • FRAGMENT_TWO_PREPOSITIONS[1] [g/m]
  • FUTURE_CONJUGATION_ERROR[1] [g/m]
  • GENDER_ESTE_ESTA[2] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[1] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[2] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[3] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[4] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[5] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[6] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[7] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[8] [g/m]
  • GENERAL_GENDER_AGREEMENT_ERRORS[9] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[1] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[2] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[3] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[4] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[5] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[6] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[7] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[8] [g/m]
  • GENERAL_NUMBER_AGREEMENT_ERRORS[9] [g/m]
  • GENERAL_PRONOMIAL_COLOCATIONS[1] [g/m]
  • GENERAL_PRONOMIAL_COLOCATIONS[2] [g/m]
  • GENERAL_PRONOMIAL_COLOCATIONS[3] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[10] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[11] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[2] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[3] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[4] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[5] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[6] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[7] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[8] [g/m]
  • GENERAL_VERB_AGREEMENT_ERRORS[9] [g/m]
  • GENTILICOS_LINGUAS[1] [g/m]
  • GOOGLE[1] [g/m]
  • HAVIA_PRESENTE_CONJUNTIVO_IMPESSOAL[1] [g/m]
  • HIFENIZADOR_VERBOS_2[1] [g/m]
  • HIFENIZADOR_VERBOS_2[2] [g/m]
  • HIFENIZADOR_VERBOS_2[3] [g/m]
  • HIFENIZADOR_VERBOS_2[4] [g/m]
  • HIFENIZADOR_VERBOS_2[5] [g/m]
  • HIPHEN_SPACE_RULES[1] [g/m]
  • HIPHEN_SPACE_RULES[2] [g/m]
  • HOMONYM_VIAGEM_2[2] [g/m]
  • HOURS_ABREVIATION[3] [g/m]
  • HÁ-ATRÁS[1] [g/m]
  • HÁ_TEMPO_QUE_VERBO_V2[2] [g/m]
  • IMPERSONAL_FAZER_TIME_EXPRESSION[1] [g/m]
  • INFORMALITIES[112] [g/m]
  • INFORMALITIES[1] [g/m]
  • INFORMALITIES[27] [g/m]
  • INFORMALITIES[34] [g/m]
  • INFORMALITIES[3] [g/m]
  • INFORMALITIES[50] [g/m]
  • INFORMALITIES[51] [g/m]
  • INIMIGO_ADVERSÁRIO_ALIADO_OPONENTE[1] [g/m]
  • INTERJECTIONS_PUNTUATION[2] [g/m]
  • INTERJECTIONS_PUNTUATION[3] [g/m]
  • INTERNET_ABBREVIATIONS[1] [g/m]
  • INVARIABLE_NOUNS[10] [g/m]
  • INVARIABLE_NOUNS[12] [g/m]
  • INVARIABLE_NOUNS[1] [g/m]
  • IRREGULAR_PAST_PARTICIPLES[1] [g/m]
  • IRREGULAR_PAST_PARTICIPLES[2] [g/m]
  • IR_AOS_POUCOS_BOCADOS_BOCADINHOS[1] [g/m]
  • IR_CONTRACTION_NOUN[1] [g/m]
  • IR_CONTRACTION_NOUN[2] [g/m]
  • KLEENEX[1] [g/m]
  • LHE_S_ME_TE_VOS_VERB[1] [g/m]
  • LINKING_VERB_PREDICATE_AGREEMENT[1] [g/m]
  • LINKING_VERB_PREDICATE_AGREEMENT[2] [g/m]
  • LP_PARONYMS[52] [g/m]
  • MADRUGADA_MATINAL[1] [g/m]
  • MAU_MAL_CONFUSION[2] [g/m]
  • MUITOS_MUITO[1] [g/m]
  • MUITOS_MUITO[2] [g/m]
  • MULTIPLICATION_SIGN[1] [g/m]
  • MULTIPLICATION_SIGN[3] [g/m]
  • NA_NÃO[1] [g/m]
  • NON_IMPERSONAL_VERBS[1] [g/m]
  • NO_VERB[1] [g/m]
  • NUM_NÃO[1] [g/m]
  • OBLIQUOUS_PRONOUN_VERB[1] [g/m]
  • OBLIQUOUS_PRONOUN_VERB[2] [g/m]
  • OBLIQUOUS_PRONOUN_VERB[3] [g/m]
  • OPENING_EXCLAMATION[1] [g/m]
  • ORDINAL_ABREVIATION[1] [g/m]
  • ORDINAL_ABREVIATION[3] [g/m]
  • O_FACTO_DA_ACÇÂO[1] [g/m]
  • PARA-POR_TER_PARTICIPIO-PASSADO[1] [g/m]
  • PARA-POR_TER_PARTICIPIO-PASSADO[2] [g/m]
  • PARA-POR_TER_PARTICIPIO-PASSADO[4] [g/m]
  • PARENTESESE_AND_QUOTES_SPACING[3] [g/m]
  • PARONYM_ABRENUNCIO_1[4] [g/m]
  • PARONYM_EXPERIENCIA_126[103] [g/m]
  • PARONYM_MAFIA_196[164] [g/m]
  • PARONYM_MEDIA_201[169] [g/m]
  • PARONYM_MEDICA_484[408] [g/m]
  • PARONYM_NOTICIA_224[188] [g/m]
  • PARONYM_POLITICA_523[448] [g/m]
  • PARONYM_PREVIA_526[451] [g/m]
  • PARONYM_PUBLICA_535[460] [g/m]
  • PARONYM_SABIA_554[472] [g/m]
  • PARONYM_VARIA_340[284] [g/m]
  • PARONYM_VIVENCIA_350[291] [g/m]
  • PASSAR_A_VERBO[1] [g/m]
  • PAST_FUTURE_CONFUSION[3] [g/m]
  • PHRASAL_VERB_RESIDIR_EM[9] [g/m]
  • PHRASAL_VERB_TER_DE_QUE[8] [g/m]
  • PHRASE_REPETITION[1] [g/m]
  • PORQUE_É_POR_SER[1] [g/m]
  • POR_QUEH_PORQUE[1] [g/m]
  • POR_QUE_PORQUE[4] [g/m]
  • POR_QUE_PORQUE[9] [g/m]
  • POSSESSIVE_WITHOUT_ARTICLE[1] [g/m]
  • POSSESSIVE_WITHOUT_ARTICLE[2] [g/m]
  • POSSESSIVE_WITHOUT_ARTICLE[3] [g/m]
  • POSSIVELMENTE_CONJUNTIVO[1] [g/m]
  • POST-IT[1] [g/m]
  • PP_OBJ_IND[1] [g/m]
  • PP_OBJ_IND[2] [g/m]
  • PROCLISE_COMECO_FRASE[1] [g/m]
  • PROCLISE_COMECO_FRASE[6] [g/m]
  • PROFANITY[1] [g/m]
  • QUE_ESTAR_CONTRACAO_PREPOSICAO[1] [g/m]
  • QUE_FORAM_FOI_SÃO_É_SENDO[1] [g/m]
  • QUE_FORAM_FOI_SÃO_É_SENDO[2] [g/m]
  • QUE_FORAM_FOI_SÃO_É_SENDO[3] [g/m]
  • QUE_FORAM_FOI_SÃO_É_SENDO[4] [g/m]
  • QUE_HÁ_QUE_NÃO_HÁ[1] [g/m]
  • QUE_SER_ESTAR_PARTPASSADO[1] [g/m]
  • QUE_SUBJ_VS_INF_PESS[1] [g/m]
  • QUE_SUBJ_VS_INF_PESS[2] [g/m]
  • QUE_VERBO[1] [g/m]
  • QUE_VERBO[2] [g/m]
  • QUE_VERBO[3] [g/m]
  • QUE_VERBO[4] [g/m]
  • QUE_VERBO_A_VERBOINFINITIVO[1] [g/m]
  • QUE_VERB_GERUND[1] [g/m]
  • QUE_É-SÃO_NC-ADJ_COMO-POR[1] [g/m]
  • REDUNDANCY_JUNTO_COM[5] [g/m]
  • REDUNDANCY_SEGUIR_EM_FRENTE[10] [g/m]
  • REDUNDANT_CONJUNCTIONS[1] [g/m]
  • REFLEXIVE_VERB_SE_AGREEMENT[1] [g/m]
  • REFLEXIVE_VERB_SE_AGREEMENT[2] [g/m]
  • REPEATED_WORDS[1] [g/m]
  • REPEATED_WORDS_3X[1] [g/m]
  • ROMAN_NUMBERS_CHECKER[1] [g/m]
  • SAO-E_PARA_SER-SEREM_PARTPASSADO[1] [g/m]
  • SENTENCE_FRAGMENT[2] [g/m]
  • SER_ADJECTIVE_AGREEMENT[1] [g/m]
  • SER_ADJECTIVE_AGREEMENT[3] [g/m]
  • SER_CAPAZ_DE_CONSEGUIR[1] [g/m]
  • SIMPLIFICAR_COMO_SENDO[1] [g/m]
  • SIMPLIFICAR_CONVERTER_PARA_VERBO_INFINITIVO[1] [g/m]
  • SIMPLIFICAR_CONVERTER_PARA_VERBO_INFINITIVO[2] [g/m]
  • SIMPLIFICAR_O_QUE_VERBO_VERBOGERUNDIO[1] [g/m]
  • SIMPLIFICAR_POIS_REMOVER[1] [g/m]
  • SIMPLIFICAR_QUE_E_TEM_TÊM[1] [g/m]
  • SIMPLIFICAR_REMOVER_É_QUE_PARTICÍPIO[1] [g/m]
  • SIMPLIFICAR_VERBO_TER_MAIS_PARTICIPIO_PASSADO[1] [g/m]
  • SIMPLIFICAR_VERBO_TER_MAIS_PARTICIPIO_PASSADO[2] [g/m]
  • SIMPLIFICAR_VERBO_TER_MAIS_PARTICIPIO_PASSADO[3] [g/m]
  • SIMPLIFICAR_VERBO_TER_MAIS_PARTICIPIO_PASSADO[4] [g/m]
  • SMART_QUOTES[1] [g/m]
  • SMART_QUOTES[2] [g/m]
  • SPACE_AFTER_PUNCTUATION[2] [g/m]
  • SPACE_BEFORE_PUNCTUATION[1] [g/m]
  • SUBSTANTIVOS_USADOS_COMO_ADJETIVOS_INVARIAVEIS_V2[1] [g/m]
  • T-V_DISTINCTION[10] [g/m]
  • T-V_DISTINCTION[11] [g/m]
  • T-V_DISTINCTION[1] [g/m]
  • T-V_DISTINCTION[2] [g/m]
  • T-V_DISTINCTION[3] [g/m]
  • T-V_DISTINCTION[4] [g/m]
  • T-V_DISTINCTION[6] [g/m]
  • T-V_DISTINCTION[7] [g/m]
  • T-V_DISTINCTION[8] [g/m]
  • T-V_DISTINCTION[9] [g/m]
  • TER_PARTICIPIO-PASSADO[1] [g/m]
  • TER_PARTICIPIO-PASSADO[2] [g/m]
  • TER_PARTICIPIO-PASSADO[3] [g/m]
  • TER_PARTICIPIO-PASSADO[4] [g/m]
  • TESE_PHD_FICOU_DEMONSTRADO_QUE_DEMONSTRÁMOS_QUE[1] [g/m]
  • TESE_PHD_PROCURAR_PROVAR_PROVARA[1] [g/m]
  • TESE_PHD_PROCURAR_PROVAR_PROVARA[2] [g/m]
  • TE_DE[1] [g/m]
  • TIME_FORMAT[2] [g/m]
  • TIME_FORMAT[3] [g/m]
  • TIRAR_FOTOGRAFIA[2] [g/m]
  • TODOS_FOLLOWED_BY_NOUN_PLURAL[1] [g/m]
  • TODOS_FOLLOWED_BY_NOUN_SINGULAR[1] [g/m]
  • UNA_UNAS_UMA_UMAS_CONFUSÃO[1] [g/m]
  • UPPERCASE_AFTER_COMMA[1] [g/m]
  • VERBO1_QUE_VERBO2_VERBO3_O_QUE[1] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[1] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[2] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[3] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[4] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[5] [g/m]
  • VERBO_ESTAR_A_VERBO_INF[6] [g/m]
  • VERBO_INFINITIVO[1] [g/m]
  • VERBO_PARA_PRONOME_PESSOAL[1] [g/m]
  • VERBO_PARA_PRONOME_PESSOAL[2] [g/m]
  • VERBO_PARA_PRONOME_PESSOAL[3] [g/m]
  • VERBO_PARA_PRONOME_PESSOAL[6] [g/m]
  • VERB_COMMA_CONJUNCTION[18] [g/m]
  • VERB_COMMA_CONJUNCTION[25] [g/m]
  • VERB_COMMA_CONJUNCTION[2] [g/m]
  • VERB_COMMA_CONJUNCTION[39] [g/m]
  • VERB_COMMA_CONJUNCTION[3] [g/m]
  • VERB_COMMA_CONJUNCTION[4] [g/m]
  • VERB_COMMA_CONJUNCTION[5] [g/m]
  • VERB_COMMA_CONJUNCTION[7] [g/m]
  • VERB_QUE_É_VERB_SER[1] [g/m]
  • VERB_QUE_É_VERB_SER[2] [g/m]
  • VERB_QUE_É_VERB_SER[3] [g/m]
  • VER_OBSERVAR_CONSTATAR[1] [g/m]
  • VIR_A_VERBO_VERBO[1] [g/m]
  • VISTO_DADO_QUE[2] [g/m]
  • VISTO_POR[1] [g/m]
  • WORDINESS[10] [g/m]
  • WORDINESS[11] [g/m]
  • WORDINESS[16] [g/m]
  • WORDINESS[25] [g/m]
  • WORDINESS[26] [g/m]
  • WORDINESS[28] [g/m]
  • WORDINESS[7] [g/m]
  • WORDINESS[9] [g/m]
  • ZERO_MEDIDA_SINGULAR[1] [g/m]
  • À_VERBO_DE[1] [g/m]
  • É_PLURAL[1] [g/m]
  • É_QUE_AS-OS_NC_É-SÃO[1] [g/m]
  • COMMA_PARENTHESIS_WHITESPACE [g/m]
  • HUNSPELL_RULE [g/m]
  • PORTUGUESE_WORD_REPEAT_BEGINNING_RULE [g/m]
  • PORTUGUESE_WORD_REPEAT_RULE [g/m]
  • PT_AGREEMENT_REPLACE [g/m]
  • PT_ARCHAISMS_REPLACE [g/m]
  • PT_BARBARISMS_REPLACE [g/m]
  • PT_COMPOUNDS_POST_REFORM [g/m]
  • PT_PT_SIMPLE_REPLACE [g/m]
  • PT_REDUNDANCY_REPLACE [g/m]
  • PT_SIMPLE_REPLACE [g/m]
  • PT_WIKIPEDIA_COMMON_ERRORS [g/m]
  • PT_WORDINESS_REPLACE [g/m]
  • PT_WORD_COHERENCY [g/m]
  • SENTENCE_WHITESPACE [g/m]
  • TOO_LONG_SENTENCE [g/m]
  • UPPERCASE_SENTENCE_START [g/m]

some gender issues

bisavô (m), bisavó (f): what are the proper plurals? Online dictionaries are a bit confusing.
cookies (anglicism): can it be both m and f?
fan (anglicism): both m and f?
gameta (BR m), gâmeta (m) is a scientific term; gameta (f) is a regionalism. Can we ignore it?

@marcoagpinto @ricardojosehlima

info: some numbers about PT dictionaries and other languages

Number of lines in the tagger dicts of different languages:
FR   634004
CA  1265005
PT  1489625
ES  3603260

The FR dictionary seems small in comparison, but it is in fact enormous: we do all the FR spelling with it.
The main difference between FR and CA/PT is the number of verb forms: Catalan and Portuguese have more than twice as many. Spanish has many more verb forms still, with attached enclitic pronouns.

Portuguese spelling dicts:

Lines in PT spelling dicts before tokenization:
  9960408 pt_AO1.txt
 10485607 pt_BR1.txt
  9163224 pt_MZ1.txt
 11535687 pt_PT1.txt

Lines in PT spelling dicts after tokenization, removing enclitics:
  5047492 pt_AO2.txt
  2787300 pt_BR2.txt
  4283146 pt_MZ2.txt
  6360406 pt_PT2.txt
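
To compare the variants, the share of lines that survives tokenization can be computed directly from the two listings above. This is just an arithmetic sketch; the variable names are made up for the example, and the numbers are copied from the counts in this post.

```python
# Line counts copied from the two listings above.
BEFORE = {"pt_AO": 9960408, "pt_BR": 10485607, "pt_MZ": 9163224, "pt_PT": 11535687}
AFTER = {"pt_AO": 5047492, "pt_BR": 2787300, "pt_MZ": 4283146, "pt_PT": 6360406}


def remaining_share(before, after):
    """Fraction of lines left after tokenization, per variant."""
    return {variant: after[variant] / before[variant] for variant in before}


shares = remaining_share(BEFORE, AFTER)
for variant, share in sorted(shares.items()):
    print(f"{variant}: {share:.1%} of the lines remain")
```

By this measure pt_BR shrinks the most, keeping roughly a quarter of its lines, while pt_PT keeps a little over half.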

Words invalid for PT?

@jaumeortola

I don't know how to tag Ricardo Joseh Lima in this post.

The words:
dize
faze

appear in the tagger dictionary, but I believe they don't exist.

Differences in spelling: Brazil/Portugal

I would like to have a clear idea of the main differences in spelling and the scope of these differences.

Portugal: receção, facto/fato (depends on the meaning), contacto, óptimo, acção, ténis, tónica, académico, demónio, António
Brazil: recepção, fato, contato, ótimo, ação, tênis, tônica, acadêmico, demônio, Antônio

What other examples can we find?

Documentation about the Acordo Ortográfico at Priberam:
http://www.priberam.pt/docs/CriteriosFLiPAO.pdf
http://www.priberam.pt/docs/AcOrtog90.pdf
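
As a rough way to gauge the scope of these differences, the pairs listed above can be split into those that differ only in diacritics and those that differ in letters. This is a minimal sketch; the pairing of the two lists by position and the two category names are assumptions made for the example.

```python
import unicodedata

# (Portugal, Brazil) pairs taken from the two lists above, matched by position.
PAIRS = [
    ("receção", "recepção"), ("facto", "fato"), ("contacto", "contato"),
    ("óptimo", "ótimo"), ("acção", "ação"), ("ténis", "tênis"),
    ("tónica", "tônica"), ("académico", "acadêmico"),
    ("demónio", "demônio"), ("António", "Antônio"),
]


def strip_accents(word):
    """Remove combining marks (accents, cedilla) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify(pt, br):
    """Return 'same', 'accent' (diacritics only) or 'letters' (letters differ)."""
    if pt == br:
        return "same"
    if strip_accents(pt) == strip_accents(br):
        return "accent"
    return "letters"


by_kind = {}
for pt, br in PAIRS:
    by_kind.setdefault(classify(pt, br), []).append((pt, br))
```

Applied to two full word lists, the same classification would show how much of the divergence is accent-only (ténis/tênis) and how much involves consonants (contacto/contato).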

Is 'ada' a word?

These forms come from the BR Hunspell dict.

pt_BR1.txt:ada
pt_BR1.txt:ada-a
pt_BR1.txt:ada-as
pt_BR1.txt:ada-lhe
pt_BR1.txt:ada-lhes
pt_BR1.txt:ada-me
pt_BR1.txt:ada-nos
pt_BR1.txt:ada-o
pt_BR1.txt:ada-os
pt_BR1.txt:ada-se
pt_BR1.txt:ada-se-lhe
pt_BR1.txt:ada-se-lhes
pt_BR1.txt:ada-te
pt_BR1.txt:ada-vos

adir/akjL in pt_BR.dic is probably wrong. SFX a = regular verb in pt_BR.aff
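
For readers unfamiliar with how a flag on a .dic entry produces such forms: a Hunspell suffix rule has the shape `SFX flag strip add condition`, where `0` stands for an empty strip or add field and `.` for "no condition". The sketch below applies one such rule in Python. The concrete rule values used here (strip `ir`, add `a`) are hypothetical and are not copied from pt_BR.aff; they only show how a regular-verb rule could turn `adir` into `ada`.

```python
import re


def apply_sfx(word, strip, add, condition):
    """Apply one Hunspell-style suffix rule; return None if it does not apply."""
    strip = "" if strip == "0" else strip  # "0" means "strip nothing"
    add = "" if add == "0" else add        # "0" means "add nothing"
    # The condition is a regex-like pattern that must match the end of the word.
    if condition != "." and not re.search(f"(?:{condition})$", word):
        return None
    if strip and not word.endswith(strip):
        return None
    stem = word[: len(word) - len(strip)]
    return stem + add


# Hypothetical rule in the spirit of "SFX a ir a ir" (not the real pt_BR.aff line).
form = apply_sfx("adir", "ir", "a", "ir")
# Hypothetical follow-up rule that attaches an enclitic pronoun.
enclitic = apply_sfx(form, "0", "-se", ".")
```

If the `a` flag on `adir` is indeed the culprit, removing or replacing that flag in pt_BR.dic should make `ada` and its enclitic variants disappear from the generated list.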
