Comments (10)

spencermountain commented on June 2, 2024

hey Jared, parentheses in the match syntax are for OR matches, like (a|b|c) - I'm not sure what (#Person+) is intended to do.
Maybe you can describe the match you're looking for, and I can help you create it.
cheers

from compromise.

MarketingPip commented on June 2, 2024

@spencermountain - hopefully this makes sense.

import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('#Person+ (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');

  return motorplexes;
}

// Test the function with an expanded test list
const testList = [
  'Lucas Oil Raceway is a famous motorsports complex.',
];

testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});

Outputs:
"Test 1: Lucas Oil Raceway"

Now I use a match like the one below, trying to handle all cases and match ALL names in this list using common patterns found with dragways, raceways, etc. (to hopefully help improve compromise's rule sets for finding orgs):

import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('(#Place+|#Organization|#Noun|#Person+) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');

  return motorplexes;
}

// Test the function with an expanded test list
const testList = [
  'I live at the Motorplex and I am hosting an event this weekend.',
  'I visited Brisbane Dragway last summer.',
  'Lucas Oil Raceway is a famous motorsports complex.',
  'Sydney Dragway hosts the Nitro Thunder event.',
  'Bandimere Speedway is known for the NHRA Mile-High Nationals.',
  'Santa Pod Raceway in Wellingborough is a popular drag racing venue.',
  'Perth Motorplex features drag racing, speedway, and dirt track events.',
  'Maple Grove Raceway hosts the NHRA Nationals.',
  'Gulfport Dragway is a drag racing facility in Mississippi.',
  'South Georgia Motorsports Park is a versatile motorsports facility.',
];

testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});

The match for Lucas Oil only returns "Oil Raceway". Again, I'm hoping this is just a brain fart in my regex skills right now and not an issue with the parser, lol. But I'm hoping you can play with that code and try changing the order of the alternatives in the first group, as it seems the results were off (or I am just losing my mind).

And oddly, this (just playing with the parser - I know groups are meant for OR matches, lol):

const motorplexes = doc.match('(#Person+) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');

will only return Sydney Dragway (which confused me even more, lol).

MarketingPip commented on June 2, 2024

@spencermountain - I think I somewhat found the issue (it has to do with #Person in the match, I think) - hoping to get your thoughts.

import nlp from "https://esm.sh/compromise"

function findMotorplex(text) {
  const doc = nlp(text);
  const motorplexes = doc.match('(#Place+|#Person #Person|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)').out('array');

  return motorplexes;
}

// Test the function with an expanded test list
const testList = [
  'I live at the Motorplex and I am hosting an event this weekend.',
  'I visited Brisbane Dragway last summer.',
  'Lucas Oil Raceway is a famous motorsports complex.',
  'Sydney Dragway hosts the Nitro Thunder event.',
  'Bandimere Speedway is known for the NHRA Mile-High Nationals.',
  'Santa Pod Raceway in Wellingborough is a popular drag racing venue.',
  'Perth Motorplex features drag racing, speedway, and dirt track events.',
  'Maple Grove Raceway hosts the NHRA Nationals.',
  'Gulfport Dragway is a drag racing facility in Mississippi.',
  'South Georgia Motorsports Park is a versatile motorsports facility.',
];

testList.forEach((test, index) => {
  const result = findMotorplex(test);
  console.log(`Test ${index + 1}: ${result.length > 0 ? result : 'No motorplex found.'}`);
});

This matches all results properly (besides Maple Grove and Santa Pod, which get partial matches - not sure of the best solution for that yet). But I should not need to use #Person #Person; I should only need #Person+.

spencermountain commented on June 2, 2024

yep, looks good to me

MarketingPip commented on June 2, 2024

@spencermountain - are you sure? Shouldn't "#Person+" grab multiple words? And if not, why does it do so when used by itself?

I'm still confused why #Person+ (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway) works for Lucas Oil Speedway on its own...?

But inside the OR group I'm required to use #Person #Person to match it properly.

As far as I know, this:

(#Place+|#Person #Person|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)

SHOULD give the same results as this:

(#Place+|#Person+|#Organization|#Noun) (Motorplex|Dragway|Raceway|Motorsports|Racetrack|Speedway)

(Again, hoping you can clarify this for me, as I don't wanna touch a rule set till I'm clear on this, lol.)

Hoping we can go through a list one weekend and make some more rules for common patterns of organizations etc. found in human language.

spencermountain commented on June 2, 2024

hey, ya if you do this:

nlp('Lucas Oil Speedway').debug()

you'll see it's mistakenly tagged as firstname-lastname.

if you want to drill down into why, add nlp.verbose('tagger') before you run it, and it will show that Lucas is tagged as a first name, then the firstname-titlecase matcher mistakenly tags Oil as a last name.
cheers

spencermountain commented on June 2, 2024

for 14.10.1 i added a bunch of org/person tagging changes, prompted by some of the issues you've found.

It's a bit of a mess - you can see it in ./src/2-two/preTagger/compute/tagger/3rd-pass/
The main concern was that doing #TitleCase (library|theatre|airport|.....) works for a handful of OR matches, but not two hundred. It starts to slow down the library considerably. I added the placeWords/orgWords lists and a bunch of loops to reproduce this in a faster way, but it's not very nice.
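As a rough illustration of the trade-off described above, a word-list lookup can stand in for a very long OR match. This is only a sketch in plain JavaScript, not compromise's actual code: the `orgWords` contents, `isTitleCase`, and `findOrgCandidates` are all hypothetical names chosen for the example.

```javascript
// Hypothetical word list; compromise's real placeWords/orgWords are not reproduced here.
const orgWords = new Set(['raceway', 'dragway', 'speedway', 'motorplex', 'library', 'airport']);

// Assumption: "title case" means one capital letter followed by lowercase letters.
const isTitleCase = (word) => /^[A-Z][a-z]+$/.test(word);

// Collect runs of title-case words that are followed by one of the org words.
function findOrgCandidates(text) {
  // Naive tokenizer: split on whitespace and drop trailing punctuation.
  const words = text.split(/\s+/).map((w) => w.replace(/[.,!?]+$/, ''));
  const found = [];
  for (let i = 1; i < words.length; i += 1) {
    // One Set lookup per word, regardless of how long the word list grows.
    if (!orgWords.has(words[i].toLowerCase())) continue;
    let start = i;
    while (start > 0 && isTitleCase(words[start - 1])) start -= 1;
    if (start < i) found.push(words.slice(start, i + 1).join(' '));
  }
  return found;
}

console.log(findOrgCandidates('Lucas Oil Raceway is a famous motorsports complex.'));
console.log(findOrgCandidates('I live at the Motorplex and I am hosting an event this weekend.'));
```

The Set keeps the per-word cost constant as the list grows, whereas a two-hundred-way OR has to be tried alternative by alternative. The loop is cruder than a real match pattern, which fits the "not very nice" caveat.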

It would be great if you could find more issues with both - particularly false-positives, which IMO are a much more important problem than missing an organization here or there.

For example, if you found that it tags 'park my car' as an Org, or something - that would be lovely.
thanks
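One possible way to hunt for the false positives requested above is to run a tagger over sentences known to contain no organization and report any hits. The sketch below is an assumption-heavy illustration: `findFalsePositives` and `naiveTagger` are hypothetical, and the stub tagger is deliberately bad so the example runs without compromise installed.

```javascript
// Run a tagger over sentences that contain no organization and report any hits.
// `tagOrgs` is a hypothetical stand-in: any function taking text and returning an array of strings.
function findFalsePositives(tagOrgs, negatives) {
  return negatives
    .map((text) => ({ text, hits: tagOrgs(text) }))
    .filter((row) => row.hits.length > 0);
}

// Deliberately naive stub: flags any "<word> park" pair as an org.
const naiveTagger = (text) => text.match(/\b\w+ park\b/gi) || [];

// Sentences that should produce no organizations at all.
const negatives = ['Please park my car.', 'I left my car at home.'];

console.log(findFalsePositives(naiveTagger, negatives));
```

With compromise loaded, the stub could presumably be swapped for something like `(text) => nlp(text).organizations().out('array')`, and every row the harness returns would be a candidate false positive to report.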

MarketingPip commented on June 2, 2024

@spencermountain - so I was not going crazy then! I was purposely trying to match Lucas Oil as a person, since I'd seen the tags were "#FirstName #LastName" when I was looking at the debugger (I was using it previously to see wtf was going on).

nlp('Lucas Oil Speedway').debug()

by itself worked fine (strangely, in the single-item list too)....?

But when I used the WHOLE list it wouldn't catch that. (And it appeared to have the same tags, as far as I recall.)

I will try to report some false positives. Though this is one of those things that is kind of a mind-bender (if that makes sense, lol).

Though I am concerned about this and hoping it is not a MAJOR issue, as it will obviously be affecting the rule set right now (the one we think is correct and passing tests....)

That said, if we are on the same page, should this issue be opened back up again? (If so, please do, so that I and others are not confused.) Mostly for me right now though, as I'm still confused, lol 😿

MarketingPip commented on June 2, 2024

@spencermountain - should this be re-opened or....?

spencermountain commented on June 2, 2024

hey jared, can you reproduce it in a dumbed-down format, because i'm a big dumb guy

nlp('foo bar').match('(foo+) bar') //failing

that helps, thanks.
