Comments (10)
hey Jared, parentheses in the match syntax are for OR matches, like (a|b|c) - I'm not sure what (#Person+) is intended to do.
Maybe you can describe the match you're looking for, and I can help you create it.
cheers
from compromise.
@spencermountain - hopefully this makes sense.
import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('#Person+ (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');
  return motorplexes;
}

// Test the function with a single sentence
const testList = [
  'Lucas Oil Raceway is a famous motorsports complex.',
];
testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});
Outputs:
"Test 1: Lucas Oil Raceway"
Now when I use a match like this, I'm trying to handle all cases and match ALL the names in this list, using common patterns found with dragways / raceways etc. (to hopefully help improve the compromise rule sets for finding orgs):
import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('(#Place+|#Organization|#Noun|#Person+) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');
  return motorplexes;
}

// Test the function with an expanded test list
const testList = [
  'I live at the Motorplex and I am hosting an event this weekend.',
  'I visited Brisbane Dragway last summer.',
  'Lucas Oil Raceway is a famous motorsports complex.',
  'Sydney Dragway hosts the Nitro Thunder event.',
  'Bandimere Speedway is known for the NHRA Mile-High Nationals.',
  'Santa Pod Raceway in Wellingborough is a popular drag racing venue.',
  'Perth Motorplex features drag racing, speedway, and dirt track events.',
  'Maple Grove Raceway hosts the NHRA Nationals.',
  'Gulfport Dragway is a drag racing facility in Mississippi.',
  'South Georgia Motorsports Park is a versatile motorsports facility.',
];
testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});
The match for Lucas Oil only returns "Oil Raceway". Again - hoping this is just a brain fart on my regex skills right now and not an issue with the parser lol. But hoping you can play with that code and try changing the order of the matches in the first group, as it seems the results were off (or I am just losing my mind).
And oddly this (just playing with the parser - I know groups are meant for different matches lol)
const motorplexes = doc.match('(#Person+) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');
will only return Sydney Dragway. (Which confused me even more) lol.
from compromise.
@spencermountain - I think I somewhat found the issue (it has to do with people in the match, I think) - hoping to get your thoughts.
import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('(#Place+|#Person #Person|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');
  return motorplexes;
}

// Test the function with an expanded test list
const testList = [
  'I live at the Motorplex and I am hosting an event this weekend.',
  'I visited Brisbane Dragway last summer.',
  'Lucas Oil Raceway is a famous motorsports complex.',
  'Sydney Dragway hosts the Nitro Thunder event.',
  'Bandimere Speedway is known for the NHRA Mile-High Nationals.',
  'Santa Pod Raceway in Wellingborough is a popular drag racing venue.',
  'Perth Motorplex features drag racing, speedway, and dirt track events.',
  'Maple Grove Raceway hosts the NHRA Nationals.',
  'Gulfport Dragway is a drag racing facility in Mississippi.',
  'South Georgia Motorsports Park is a versatile motorsports facility.',
];
testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});
This matches all results properly (besides Maple Grove & Santa Pod, which only get partial matches - not sure of the best solution for that yet). But I should not need to use #Person #Person and should only need #Person+.
from compromise.
yep, looks good to me
from compromise.
@spencermountain - are you sure? Shouldn't "#Person+" grab multiple words? And if not - why does it do so when used by itself?
I'm still confused why this:
#Person+ (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)
works for Lucas Oil Speedway...? But inside the group I'm required to use #Person #Person to match it properly.
As far as I know, this:
(#Place+|#Person #Person|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)
SHOULD have given the same results as this:
(#Place+|#Person+|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)
(Again - hoping you can clarify this for me, as I don't wanna touch a rule set till I am cleared up on this lol)
Hoping we can go through a list one weekend and make some more rules for common patterns of organizations etc. found in human language.
from compromise.
hey, ya if you do this:
nlp('Lucas Oil Speedway').debug()
you'll see it's mistakenly tagged as firstname-lastname.
if you wanted to drill down into why, add nlp.verbose('tagger') before you run it, and it will show Lucas is tagged as a first name, then the firstname-titlecase matcher mistakenly tags it as a lastname.
cheers
from compromise.
for 14.10.1 i added a bunch of org/person tagging changes, provoked by some of the issues you've found.
It's a bit of a mess - you can see it in ./src/2-two/preTagger/compute/tagger/3rd-pass/
The main concern was that doing #TitleCase (library|theatre|airport|.....) works for a handful of OR matches, but not two hundred. It starts to slow down the library considerably. I added the placeWords/orgWords and a bunch of loops, to reproduce this in a faster way, but it's not very nice.
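A rough sketch of the trade-off described here, not the code in compromise itself: a long OR-match has to be tested against every word, whereas a Set of suffix words needs one lookup per word. The names `orgWords`, `isTitleCase`, and `findOrgCandidates` are made up for illustration, the word list is a tiny stand-in, and the tokenizer is a naive whitespace split.

```javascript
// Hypothetical word list; a real one would hold a couple hundred entries.
const orgWords = new Set(['library', 'theatre', 'airport', 'raceway', 'dragway', 'speedway']);

// True when the word starts with an uppercase letter followed by a lowercase one.
const isTitleCase = (word) => /^[A-Z][a-z]/.test(word);

// Collect runs of title-cased words that end in a known org word.
function findOrgCandidates(text) {
  // Naive tokenizer: split on whitespace, drop one trailing punctuation mark.
  const words = text.split(/\s+/).map((w) => w.replace(/[.,!?]$/, ''));
  const found = [];
  for (let i = 1; i < words.length; i += 1) {
    // One Set lookup per word, instead of testing a long alternation.
    if (!orgWords.has(words[i].toLowerCase())) continue;
    // Walk backwards over the preceding title-cased words.
    let start = i;
    while (start > 0 && isTitleCase(words[start - 1])) start -= 1;
    // Only keep the org word when at least one title-cased word precedes it.
    if (start < i) found.push(words.slice(start, i + 1).join(' '));
  }
  return found;
}

console.log(findOrgCandidates('Lucas Oil Raceway is a famous motorsports complex.'));
console.log(findOrgCandidates('I will park my car at the airport.'));
```

The second call is there as a false-positive check: a lowercase word in front of "airport" should not produce a candidate.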
It would be great if you could find more issues with both - particularly false-positives, which IMO are a much more important problem than missing an organization here or there.
For example, if you found that it tags 'park my car' as an Org, or something - that would be lovely.
thanks
from compromise.
@spencermountain - so I was not going crazy then! I was purposely trying to match Lucas Oil as a person, as I had seen in the debugger that the tags were "#FirstName #LastName" (I was using it previously to see wtf was going on).
nlp('Lucas Oil Speedway').debug()
by itself worked fine. (strangely in list too)....?
But when I used the WHOLE list it wouldn't catch that. (And as I recall, it appeared to have the same tags.)
I will try to report some false positives. Tho this is one of those things that is kinda just a mind fuck (if that makes sense lol).
Tho I am concerned about this and hoping it is not a MAJOR issue - as it will obviously be affecting the rule set right now (that we think is correct / passing tests....)
That said - if we are on the same page, should this issue be opened back up again? (If so please do so - so that I / others are not confused.) Mostly for me tho right now, as I'm still confused lol 😿
from compromise.
@spencermountain - should this be re-opened or....?
from compromise.
hey jared, can you reproduce it in a dumbed-down format, because i'm a big dumb guy
nlp('foo bar').match('(foo+) bar') //failing
that helps, thanks.
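One way to boil this down, sketched as a small helper that runs several patterns against the same text so the results can be compared side by side. `compareMatches` is a made-up name, and `nlp` is passed in as an argument so the snippet does not assume how compromise is loaded; it relies only on the `nlp(text).match(pattern).out('array')` calls already used in this thread. No expected output is shown, since that depends on the compromise version.

```javascript
// Hypothetical helper: run each pattern against a fresh parse of the text
// and return an object mapping pattern -> array of matched strings.
function compareMatches(nlp, text, patterns) {
  const results = {};
  for (const pattern of patterns) {
    results[pattern] = nlp(text).match(pattern).out('array');
  }
  return results;
}

// The variants discussed above, reduced to a single venue word.
const patterns = [
  '#Person+ Raceway',
  '#Person #Person Raceway',
  '(#Place+|#Person+|#Organization|#Noun) Raceway',
];

// Usage, assuming `nlp` was imported as in the earlier snippets:
// console.log(compareMatches(nlp, 'Lucas Oil Raceway is a famous motorsports complex.', patterns));
```

Keeping the text fixed and changing only the pattern should make it easier to see whether the difference comes from the OR group or from the tagging.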
from compromise.