typelevel / cats-parse
A parsing library for the cats ecosystem
License: MIT License
make the CI fail the build if the code isn't formatted.
import cats.parse.{Parser0, Parser => P, Numbers}
private def name:P[String] = P.charIn(('a' to 'z')).rep.string
private def alias: P[String] = name <* P.char(':')
(alias.? ~ name).parse("abc")
The code returns Left(Error(3,NonEmptyList(InRange(3,:,:)))), but it should be Right((_, (None, abc))).
I tried (alias.?.backtrack ~ name) and (alias.?.soft ~ name); neither works.
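One possible reading of the failure above: name consumes "abc" before the missing ':' is noticed, so alias fails after consuming input, and .? only falls back to None when its parser fails without consuming anything. A minimal sketch, assuming the goal is simply "optional alias, then name", applies backtrack to alias itself rather than to alias.? (the object name OptionalAliasSketch is made up for illustration):

```scala
import cats.parse.{Parser => P, Parser0}

object OptionalAliasSketch {
  val name: P[String] = P.charIn('a' to 'z').rep.string

  // backtrack wraps the alias parser itself, before `.?` is applied,
  // so a failure on the missing ':' rewinds to where alias started.
  val alias: P[String] = (name <* P.char(':')).backtrack

  // With the rewind in place, `.?` can yield None and name still sees "abc".
  val aliased: Parser0[(Option[String], String)] = alias.? ~ name
}
```

The trade-off is that the name is parsed twice when no alias is present, which is usually acceptable for short identifiers.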
/**
Demonstrates a possible bug with cats-parse.
Parses a parenthesized word list (foo, bar, x, y),
but fails to parse if a space precedes the final ')'.
cats-parse version 0.3.2
Scala version 3.0.0-RC2
*/
import cats.parse.{Parser => P}
@main def main(): Unit =
  // Specific types of characters.
  val whitespace = P.charIn(" \r\t\n")
  val letter = P.charIn('a' to 'z')
  val comma = P.char(',')
  val lParen = P.char('(')
  val rParen = P.char(')')
  // For testing, a lowercase word.
  val word = letter.rep.string
  // Allow optional spaces around the list characters - ( , )
  val whitespaces0 = whitespace.rep0.void
  val listStart = lParen.surroundedBy(whitespaces0).void
  val listEnd = rParen.surroundedBy(whitespaces0).void
  val listSeparator = comma.surroundedBy(whitespaces0).void
  // Define a parenthesized list of words ... eg. (foo, bar, x, y)
  val wordList = listStart ~ word.repSep0(listSeparator) ~ listEnd
  // This wordlist parses fine.
  val result1 = wordList.parseAll("(foo, bar, x, y)")
  assert(result1.isRight)
  // PROBLEM: If a space precedes the final ')', then it fails.
  val result2 = wordList.parseAll("(foo, bar, x, y )")
  assert(result2.isRight)
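A likely explanation for the report above: listSeparator starts by consuming whitespace, so on "y )" it eats the space, then fails on ')' after having consumed input, and repSep0 treats that as a hard failure rather than the end of the list. A sketch of one common workaround, where each token consumes only the whitespace that follows it (the helper token and the object name WordListSketch are hypothetical, not part of cats-parse):

```scala
import cats.parse.{Parser => P, Parser0}

object WordListSketch {
  val whitespaces0: Parser0[Unit] = P.charIn(' ', '\r', '\t', '\n').rep0.void

  // Hypothetical helper: a token followed by optional trailing whitespace.
  def token[A](p: P[A]): P[A] = p <* whitespaces0

  val word: P[String] = token(P.charIn('a' to 'z').rep.string)
  val comma: P[Unit] = token(P.char(','))
  val lParen: P[Unit] = token(P.char('('))
  val rParen: P[Unit] = token(P.char(')'))

  // Leading whitespace is skipped once, up front. The separator now fails
  // without consuming input when it meets ')', which ends the list cleanly.
  val wordList: P[List[String]] =
    whitespaces0.with1 *> (lParen *> word.repSep0(comma) <* rParen)
}
```

Marking the original separator with backtrack would probably also work; the token style just avoids re-scanning whitespace.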
needed for downstream http4s
Hi @johnynek!
Can you enable gh-pages in the repo settings? The site seems to have been published successfully and I believe that's the only thing missing for this to work: https://typelevel.github.io/cats-parse.
Thanks!
I got the following message trying to open Rfc5234 in IntelliJ
Error reading TASTy file: /Users/hjs/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/typelevel/cats-parse_3.0.0-RC2/0.3-3-d801d0a/cats-parse_3.0.0-RC2-0.3-3-d801d0a.jar!/cats/parse/Rfc5234.tasty
I tried other versions of cats-parse for RC2 from maven central with the same result.
As this is a bit bleeding edge it could be IntelliJ or the deployed library. Just thought I'd ping you here.
Is it possible to work with YAML files using this library? So far all solutions I found for Scala are using Java libs for that.
Users should be able to give us a function (String, Int) => Either[Error, (Int, A)] for cases where they can't express their parsing in terms of the core combinators.
Then, at runtime, we would check whether the returned Int is >= the input offset. If it is, we return the result; otherwise we report an InvariantViolationError or something similar for the parser (which I guess would be an epsilon error), or potentially throw an exception, since users should not be able to recover from errors like that...
An alternative is just let all bets be off, and not check that the returned Index makes sense, and just let users live with the consequences.
This is a can of worms, and maybe we should avoid adding such a function.
I'm trying to implement a parser that repeats parser p1 until the rest of the string matches parser p2.
My current solution is this, but it's not really elegant and needs the input string to work.
def repeatUntil2ndParserMatches(input: String, p1: P[String], p2: P[String], maxRepetitions: Int = 100): P[String] = {
  import cats.syntax.applicative._
  LazyList
    .range(1, maxRepetitions)
    .flatMap { i =>
      println(s"trying p1 $i times")
      val newP1 = p1.backtrack.replicateA(i)
      val p1Result = newP1.parse(input)
      p1Result match {
        case Left(_) =>
          List(P.fail)
        case Right((rest, _)) =>
          val p2Result = p2.backtrack.parse(rest)
          p2Result match {
            case Left(_) => List.empty
            case Right(_) =>
              List((newP1 ~ p2).map { case (list, res2) => list.appended(res2) }.map(_.mkString))
          }
      }
    }
    .headOption match {
      case Some(value) => value
      case None => P.fail
    }
}
Can this be done more elegantly?
I don't know if that is a common use case in the parser world, but I do know that regex groups can be made non-greedy.
I saw #128 where repetition is being discussed - maybe that is something others might find useful.
Maybe it'd be helpful if there was a combinator similar to flatMap that provides the tuple of (remainder, matched).
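For the "repeat p1 until p2 matches" question above, a negative lookahead may be enough, without needing the input string up front. This is only a sketch under the assumption that "until" means "p2 succeeds at the current position"; repeatUntil and RepeatUntilSketch are names made up here:

```scala
import cats.parse.{Parser => P, Parser0}

object RepeatUntilSketch {
  // P.not(p2) succeeds, consuming nothing, only where p2 would fail.
  // Guarding every repetition of p1 with it stops the loop as soon as
  // p2 can match; p2 is then run for real after the loop.
  def repeatUntil[A, B](p1: P[A], p2: P[B]): Parser0[(List[A], B)] =
    (P.not(p2).with1 *> p1).rep0 ~ p2
}
```

For instance, RepeatUntilSketch.repeatUntil(P.anyChar, P.string("end")) collects characters up to the first "end". The cost is that p2 is attempted before every repetition.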
Currently, when a parser fails you get a nonempty list of offsets and failed expectations.
We could also add a "scope" wrapper, like: number.scope("number").orElse(str.scope("string")). The way this would work is that the mutable State would have a stack of these scopes, and we would have:
case class ScopeParser[A](parser: Parser[A], scope: String) extends Parser[A]
In parseMut we would push the current scope onto the stack, then parse with parser, then pop it off the stack.
When we error, we take a snapshot of the current scope stack.
So, if we do this, users can have an easier time labeling parts of their parsers and seeing where things went wrong.
Fastparse does something similar.
What do you think of this design, @mpilquist @non @rossabaker, and really anyone who cares to comment?
see:
The main issue is that if you use this pattern:
Defer[P1].fix[Ast] { self =>
...
P.oneOf1(a :: b :: c ...
}
then you have to make sure that all of a, b, c, etc... can make some progress without first using self.
For instance, you need to put all your constants first in that list. Parsing operators is its own item in the FAQ. Generally, what I like to do is parse a list of (Operator, Item), so you do: (parseAtom ~ postOp.rep).map { case (a, fs) => fs.foldLeft(a) { case (a0, (op, a1)) => addOp(a0, op, a1) } }
keeping in mind operator precedence, etc... that's a whole other FAQ...
I think we are currently testing all the laws these typeclasses require, but we aren't using the cats laws package.
It might be nice to use those just to be 100% sure everything is fully lawful.
Hi all!
I'm just trying to port scala-uri from parboiled2 to cats-parse.
One thing I find rather confusing is the with1 method to make a Parser0 behave like a Parser.
I'm wondering if a different way to encode this could work, e.g.
trait LowPriorityImplicits {
  implicit class RichParser0(parser: Parser0) {
    def ~(other: Parser0): Parser0 = ???
  }
}
object Parser0 extends LowPriorityImplicits {
  implicit class EvenRicherParser(parser: Parser0) {
    def ~(other: Parser): Parser = ???
  }
}
val foo0: Parser0 = null
val foo: Parser = null
val concatted: Parser = (foo0 ~ foo)
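For readers comparing this proposal with the current API, here is a small sketch of how with1 is used today. A Parser0 that may consume nothing is combined with a Parser that always consumes, and with1 lets the product keep the stronger Parser type (the names sign, signed and With1Sketch are illustrative only):

```scala
import cats.parse.{Numbers, Parser => P, Parser0}

object With1Sketch {
  // An optional sign may match the empty string, hence Parser0.
  val sign: Parser0[Option[Unit]] = P.char('-').?

  val digits: P[String] = Numbers.digits

  // sign followed by digits always consumes at least one character,
  // so `with1 ~` can return a Parser rather than a Parser0.
  val signed: P[(Option[Unit], String)] = sign.with1 ~ digits
}
```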
Some of our GitHub Actions are a bit old, and we are getting deprecation warnings. My hypothesis is that at least one of our actions is making a deprecated call.
things like BigInt, Int, Long, Float, etc...
Also things like standard whitespace, bracketed lists, etc...
It should be possible to write a JSON parser with a pretty minimal combination of these. This helps people learn patterns but also lets us really optimize some basic blocks that people will almost always need.
There are various ways to repeat stuff. With or without a separator, with a minimum number of repetitions, allowing 0 repetitions or not, gathering into some accumulator. There is also a ticket open to add a maximum: #97
That gives rise to a lot of different combinations for repeating parser constructors, with opportunity for inconsistency and not having some specific combinations of concerns.
What do you think about adding a fluent API starting from rep or rep0, and then adding combinators for min, max, separator and accumulator?
Repsep is 0 or more and rep1sep is 1 or more, but you still have to pass in a minimum, which feels unnatural, especially for repsep where the minimum is implied to be 0 anyway. Default values of 0 or 1, or overloads to the same effect would be useful.
Hi!
I encountered a problem when using oneOf.
To make it easier to communicate I created a test that fails. You can find it here:
https://github.com/FloWi/cats-parse/blob/oneOfError/core/shared/src/test/scala/cats/parse/OneOfTest.scala
When you run the failing test, you see that the last parser in the list succeeds, but the oneOf-parser, that uses those parsers, still fails.
sbt "testOnly *OneOfTest*"
I tried to write a parser that simplifies an expression of a grammar.
// (a|b)a --> aa|ba
// a(a|b) --> aa|ab
// (a|b) --> a|b
I'm quite new to parsers and have no idea if this is a bug or if I did something wrong - AdventOfCode brought me to this rabbit hole :)
I'm seeing two tests failing:
cats.parse.ParserTest.voided only changes the result
cats.parse.ParserTest.with1 *> and with1 <* work as expected
The errors have the form:
values are not the same
=> Diff (- obtained, + expected)
upper = 'ﷷ'
+ ),
+ Fail(
+ offset = 0
)
To reproduce, add the following to ParserTest:
override val scalaCheckInitialSeed = "SDzb3fKPxR67aeO2sgq4BlvTm5NphF9OM4j-dSIS9RD="
I haven't dug into this -- just figured I should report it ASAP.
How would I best use cats-parse with Scala's string interpolation feature, where the input I am parsing is not just a plain string but also arbitrary values that are interpolated into it?
So how would I best write a parser for something like:
json"{ name: $name, id: $id }"
from https://docs.scala-lang.org/overviews/core/string-interpolation.html#advanced-usage
Hi,
We are currently trying to replace fastparse with cats-parse and ran into the issue that parsing became 3 to 30x slower than it was with fastparse. Using a profiler, I saw that most of the CPU time is actually spent on calculating hash codes, triggered by the use of .distinct in oneOf. See the following screenshots:
Do you think this is caused by incorrect use of cats-parse, or is it something that can be improved in cats-parse itself? Parser code can be found here if it helps.
I'm trying to parse a format where different parts are separated by a <start_of_line>#### fragment, so I would like to be able to detect the <start_of_line>.
IMHO the logic should be similar to P.start | <prev_char = '\n'>.
I'm not sure if that matters, but I'm trying to parse the IntelliJ HTTP client file format with an explicit requirement of supporting ### in the first line, so for example:
###
// A basic request
http://example.com/a/
###
// A second request using the GET method
http://example.com:8080/api/html/get?id=123&value=content
Currently, we can only parse from entire strings. It would be nice to be able to parse a string at a given offset and only up to a given length.
This would allow you to parse the inside of a string that might be provided by another process without having to copy.
Should be as simple as updating State.
It appears that order matters for Parser.oneOf in cases where one parser accepts a subset of another parser.
This makes sense; however, it might be good to explicitly mention in the docs the implication this has for generating parsers for a set of String values - specifically that they should be reverse sorted according to length, because it's very easy to create inconsistent parsers if the input isn't correctly prepared.
For example, parsing a truthy value for true will consistently work (or fail) depending on the order of the parsers:
import cats.parse.{Parser => P}
val buggy: P[Boolean] =
  P.oneOf(List("1", "t", "tru", "yes", "true").map(P.string(_)))
    .void
    .as[Boolean](true)
val works: P[Boolean] =
  P.oneOf(List("true", "yes", "tru", "t", "1").map(P.string(_)))
    .void
    .as[Boolean](true)
def sort(strings: List[String]): List[String] =
  strings
    .map(str => (str.length, str))
    .sorted
    .reverse
    .map(_._2)
val sorted: P[Boolean] =
  P.oneOf(sort(List("1", "t", "tru", "yes", "true")).map(P.string(_)))
    .void
    .as[Boolean](true)
List("1", "t", "tru", "true", "y")
  .foreach { input =>
    println {
      """|%-6s => %6s => %s
         |%-6s => %6s => %s
         |%-6s => %6s => %s
         |""".stripMargin.format(
        s"<$input>", "buggy", buggy.parseAll(input),
        "", "works", works.parseAll(input),
        "", "sorted", sorted.parseAll(input)
      )
    }
  }
This is particularly troublesome because the error looks like an unexpected end of string, rather than an expected end of string that didn't happen:
Left(Error(1,NonEmptyList(EndOfString(1,3))))
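As an alternative to sorting by hand, Parser.stringIn may fit this case, since it is documented to pick the longest matching alternative regardless of list order. A minimal sketch, assuming the set of strings is fixed up front (TruthySketch and truthy are names invented for this example):

```scala
import cats.parse.{Parser => P}

object TruthySketch {
  // stringIn selects the longest alternative that matches the input,
  // so the order of the strings in the list should not matter here.
  val truthy: P[Boolean] =
    P.stringIn(List("1", "t", "tru", "yes", "true")).map(_ => true)
}
```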
How do I write recursive parsers? The following fails with a StackOverflowError. I assume somewhere I don't have something tail-recursive, but I'm not sure how to write this any differently. (Both op and condexp fail similarly below.)
package foo
import cats.parse.{Parser0, Parser, Numbers}
import cats.syntax.all._
import scala.language.postfixOps
sealed class Expr
case class Lit(x: Int) extends Expr
case class Op(left: Expr, op: String, right: Expr) extends Expr
case class Cond(cond: Expr, tr: Expr, fl: Expr) extends Expr
object testrecurse {
  import Parser._
  def expr: Parser[Expr] = recursive[Expr] { recurse =>
    def subexpr = recurse.between(char('('), char(')'))
    def lit = Numbers.digits.map(_.toInt).map(Lit(_))
    // def condexp = ((recurse <* char('?')) ~ recurse ~ (char(':') *> recurse))
    //   .map { case ((cond, tr), fl) => Cond(cond, tr, fl) }
    def op = (recurse, stringIn(List("+", "-", "*", "/")), recurse)
      .mapN(Op(_, _, _))
    oneOf(subexpr :: op :: lit :: Nil)
  }
  def main(args: Array[String]): Unit = {
    //val expr = "1?(5):2"
    val expr = "(5+3)/2"
    println(testrecurse.expr.parse(expr))
  }
}
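The overflow above looks like left recursion rather than a tail-call problem: op begins by calling recurse, which immediately tries op again without consuming any input. One way around it, sketched here without operator precedence and with a simplified Expr type, is to parse an atom first and then fold a list of (operator, atom) pairs (ExprSketch and its member names are made up for illustration):

```scala
import cats.parse.{Numbers, Parser => P}

object ExprSketch {
  sealed trait Expr
  case class Lit(x: Int) extends Expr
  case class Op(left: Expr, op: String, right: Expr) extends Expr

  val expr: P[Expr] = P.recursive[Expr] { recurse =>
    val lit: P[Expr] = Numbers.digits.map(s => Lit(s.toInt))
    // recurse is only reached after '(' has been consumed.
    val subexpr: P[Expr] = recurse.between(P.char('('), P.char(')'))
    val atom: P[Expr] = P.oneOf(subexpr :: lit :: Nil)
    val operator: P[String] = P.charIn('+', '-', '*', '/').string

    // An atom followed by zero or more (operator, atom) pairs,
    // folded left to right. All operators share one precedence level.
    (atom ~ (operator ~ atom).rep0).map { case (first, rest) =>
      rest.foldLeft(first) { case (left, (op, right)) => Op(left, op, right) }
    }
  }
}
```

Every alternative now consumes at least one character before recursing, which is what keeps the parser from looping.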
We should be able to match fastparse v1, matching v2 would be tricky due to their use of macros there.
make sure any examples are typechecked.
There are many RFCs that reference the core rules of RFC5234. Is there any interest in an object that provides those? http4s has already implemented those imported by RFC7230.
https://github.com/typelevel/cats-parse/runs/1392179131?check_suite_focus=true
looks like the dotty version of 0.1.0 didn't publish... maybe I have to rerun some ci job....
ugh
Some of the error messages produce results that are either hard to render, or not particularly clear. This could be improved by providing the ability to replace or map over the error.
For example, if we have this parser (which is equivalent to -\s-):
import cats.parse.{Parser => P}
val parser = P.charWhere(_.isWhitespace).surroundedBy(P.char('-'))
List(
"- -",
"-t-",
"--"
).foreach { input =>
println("%-10s \t => %s".format(s""""$input"""", parser.parseAll(input)))
}
We get errors that aren't terribly readable:
"- -" => Right( )
"-t-" => Left(Error(1,NonEmptyList(InRange(1, ,
), InRange(1,, ), InRange(1, , ), InRange(1,,), InRange(1, , ), InRange(1, , ), InRange(1,
,
), InRange(1, , ), InRange(1, , ))))
"--" => Left(Error(1,NonEmptyList(InRange(1, ,
), InRange(1,, ), InRange(1, , ), InRange(1,,), InRange(1, , ), InRange(1, , ), InRange(1,
,
), InRange(1, , ), InRange(1, , ))))
It would be handy to do something like fastparse's opaque:
parser.opaque("whitespace")
Or a lower level map over the errors:
parser.leftMap {
  case InRange(index, _) => FailWith(index, "whitespace")
  case unexpected => unexpected
}
Which could produce errors like this:
"- -" => Right( )
"-t-" => Left(Error(1,NonEmptyList(FailWith(1,whitespace))))
"--" => Left(Error(1,NonEmptyList(FailWith(1,whitespace))))
We are going to waste a lot of CI time on failures that users may not know about.
Cats uses a prePR alias to run everything required to pass CI. We can add this note to the readme as well as in the template to make a PR.
Each Scala repo has a somewhat bespoke way to do a release.
Since many of us contribute to many repos, it is easy to lose track of how each works. Let's add a short md doc that explains the steps in a list.
There is the #16 plugin that drafts releases, and there is the auto publishing, and then there is the question of setting version numbers. Lastly, some repos need the mima versions to compare against to be updated. I'm not 100% sure what we need in this repo.
cc @mpilquist
Hi all,
I'm trying to port scala-uri to cats-parse.
One difficulty I'm facing is with parsing path parts including empty ones:
a/b/c
/a/b/c
a//c
/a//c
What I would like to do is something like this (simplified)
def _path_segment: Parser0[String] = Parser.until0(charIn("/?#"))
def _path: Parser[String] = (Parser.char('/').? ~ _path_segment.repSep0(char('/'))).string
but this doesn't work, as Parser0 doesn't have repSep or repSep0 defined.
I understand that Parser0#rep is problematic, as one could easily run the empty parser infinitely, but in my naive thinking this problem shouldn't exist with repSep, right?
Or is there a nicer way to do this in cats-parse?
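One way to get empty segments without a repSep on Parser0 is to make the separator part of the repeated unit: a first, possibly empty, segment followed by zero or more "slash then segment" pairs. Each repetition consumes the '/', so the loop cannot spin on empty input. A sketch with made-up names (PathSketch, segment0, path), returning the segments rather than the raw string:

```scala
import cats.parse.{Parser => P, Parser0}

object PathSketch {
  // A possibly empty run of characters that are not '/', '?' or '#'.
  val segment0: Parser0[String] =
    P.charsWhile0(c => c != '/' && c != '?' && c != '#')

  // '/' followed by a possibly empty segment always consumes the '/',
  // so it is a Parser and can be repeated safely with rep0.
  val path: Parser0[List[String]] =
    (segment0 ~ (P.char('/') *> segment0).rep0).map { case (head, tail) => head :: tail }
}
```

With this shape a leading '/' shows up as an empty first segment, and "a//c" yields an empty middle segment.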
Because ParserTest contains so many tests, making isolated changes and quickly testing with testOnly or testQuick is slower than you'd ideally want it to be.
Splitting up ParserTest would enable a tighter test loop. Are you OK with that?
Related to #52, a question was raised about parsers that repeat a min and a max number of times. Here's a real-world case from RFC 7231, which has a precision to the thousandths:
qvalue = ( "0" [ "." 0*3DIGIT ] )
/ ( "1" [ "." 0*3("0") ] )
It's not real common, but they're out there.
.as from Functor and maybe .replicateA from Applicative are useful enough that adding those methods on Parser and Parser1 might help discovery.
Since users with IDEs often rely on autocomplete, this can be useful for them.
I am implementing RFC 8941 and noticed while testing my code that length restrictions don't quite seem to work as I expected.
val signedDecIntegral: P[String] =
(P.char('-').?.with1 ~ digits.rep(1,12)).map {
case (min, i) =>
min.map(_ => "-").getOrElse("")+i.toList.mkString
}
val decFraction: P[String] = digits.rep(1,3).string
val sfDecimal: P[(String,String)] =
(signedDecIntegral ~ (P.char('.') *> decFraction)).map {
case (dec: String,frac: String) =>
(dec,frac.toList.mkString) //todo: keep the non-empty list?
}
It seems like the length restrictions don't cause the expected errors:
scala> decFraction.parse("12312323")
val res17: Either[cats.parse.Parser.Error,(String, String)] = Right((,12312323))
scala> sfDecimal.parseAll("12345678901234567890.22222")
val res18: Either[cats.parse.Parser.Error,(String, String)] = Right((12345678901234567890,22222))
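A possible cause, assuming digits in the snippet above is Numbers.digits: that parser already matches one or more digits, so digits.rep(1,3) means "one to three runs of digits", and the first run swallows everything. A sketch that bounds the single-digit parser Numbers.digit instead (SfDecimalSketch is a made-up name, and the result shape is simplified to plain strings):

```scala
import cats.parse.{Numbers, Parser => P}

object SfDecimalSketch {
  // rep(1, 3) is applied to the single-digit parser, so at most
  // three characters are consumed.
  val decFraction: P[String] = Numbers.digit.rep(1, 3).string

  // Optional minus sign followed by 1 to 12 digits, kept as the matched text.
  val signedDecIntegral: P[String] =
    (P.char('-').?.with1 ~ Numbers.digit.rep(1, 12)).string

  val sfDecimal: P[(String, String)] =
    signedDecIntegral ~ (P.char('.') *> decFraction)
}
```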
https://github.com/typelevel/cats/blob/master/core/src/main/scala/cats/FlatMap.scala
for some reason, FlatMap overrides some functions that apply implements in terms of product and map, by using flatMap. Since flatMap is more expensive for a parser, we don't want those overrides.
We should go through all the overrides in FlatMap and if we can implement in terms of product and map do so.
have a doc site that publishes and looks similar to the cats documentation (logos etc...)
This issue may be a bit premature/pretentious to file, but I figured it would be good to get it out early. The safety that Parser1 and its combinators provide is great. So great that, while trying out the library, I find I use them pretty much all the time. Using a Parser or a method that creates a Parser is the exception.
Has it been considered doing it the other way around, renaming Parser to Parser0, and Parser1 to Parser, along with all the 1 methods on object Parser? That would make the potentially non-consuming Parsers the exception not only in practice but also in naming.
Maybe remove the | method and just use orElse.
Spun off from an http4s issue.
char returns a Parser1[Unit]. We know what it captured, so we don't return it.
charIn returns a Parser1[Char]. It captured one of a set of characters, and we want to know which.
ignoreChar returns a Parser1[Unit]. Arguably like charIn in that we don't know what it captured, but the docs tell us to call .string if we need the result.
string1 returns a Parser[Unit]. Seems consistent with .char.
Did we get this right, or should everything return what it captured?