Git Product home page Git Product logo

Comments (22)

adrianbn avatar adrianbn commented on July 17, 2024 4

Just curious since this issue seems old and still open: what is the recommended way to disable insecure features with scala-xml? Maybe override XMLLoader.parser with an implementation that includes the following?

spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false)
spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
spf.setXIncludeAware(false)
spf.setExpandEntityReferences(false)

Thank you!

from scala-xml.

jroper avatar jroper commented on July 17, 2024 2

Something not mentioned in the RFC is the memory issues that doctypes introduce, including the billion laughs vulnerability (recursive entity references leading to exponential memory usage) and the quadratic blowup vulnerability (many references to a single large entity leading to quadratic memory usage). Both of these vulnerabilities allow an attacker, with a small payload (as small as 100B for billion laughs, as small as 200KB for quadratic blowup) to cause a JVM to OOME. While the JDKs XML parser does have protection against billion laughs (recursion limits, but off by default), it doesn't have any protection against quadratic blowup. So the only safe way to handle untrusted XML on the JVM is to disallow doctypes altogether. It would be nice if the JDK XML parsers allowed you to simple ignore doctypes, but unfortunately, they don't, either it accepts the doctype, or fails if the doctype is present.

Disallowing doctypes is likely to cause some problems for users - many systems sending XML will automatically add a doctype, and most users will be frustrated that Scala doesn't accept this. So, we need to consider that if we decide to introduce this as a default. On Play, we do have users every so often report this issue, but we find that after explaining to them that if doctypes were allowed, the attacker could take down their webapp with a single request, they're happy to change the sending system to not include a doctype.

from scala-xml.

retronym avatar retronym commented on July 17, 2024 1

Thanks, @milessabin!

Quoting the juicy parts:

XML mechanisms that follow external references (Section 4.14) may
also expose an implementation to various threats by causing the
implementation to access external resources automatically. It is
important to disallow arbitrary access to such external references
within XML data from untrusted sources. Many XML grammars define
constructs using URIs for external references; in such cases, the
same precautions must be taken.

.

Acknowledgements
The authors would like to thank the following people who have
provided significant contributions to the development of this
document:

Mark Baker, Tim Berners-Lee, Tim Bray, James Clark, ... , Miles Sabin ...

A blast from the past :)

from scala-xml.

milessabin avatar milessabin commented on July 17, 2024

This IETF BCP is worth a read: https://www.ietf.org/rfc/rfc3470.txt

from scala-xml.

milessabin avatar milessabin commented on July 17, 2024

Indeed ... I think Billion Laughs came a little later.

from scala-xml.

dnvriend avatar dnvriend commented on July 17, 2024

Last year I got a warning from a friendly hacker that warned me about this issue in JBoss Fuse 6.0.0 that we used then. We also use spray and accept XML. Other components also use the SecureXML. The xxeFilter is also used by other routes. The code I created then:

object SecureXml {
  def loadString(xml: String): NodeSeq = {
      val spf = SAXParserFactory.newInstance()
      spf.setFeature("http://xml.org/sax/features/external-general-entities", false)
      spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true)
      val saxParser = spf.newSAXParser()
      XML.withSAXParser(saxParser).loadString(xml)
  }
}

The 'xxeFilter' is:

trait XxeFilter {
  def log: LoggingAdapter

  implicit def executionContext: ExecutionContext

  def xxeFilter(xxe: String): Future[String] = {
    log.info("Validating xml: [{}]", xxe)
    Future(xmlMustStartWith(xmlMustNotContain(xxe)))
  }

  def throwError(xxe: String, message: String): String = {
    log.error("{}:\n [{}]", message, xxe)
    throw new Error(message)
  }

  private def xmlMustNotContain(xxe: String): String = xxe match {
    case _ if xxe.toUpperCase.contains("<!DOCTYPE") => throwError(xxe, "XML contains DOCTYPE declaration")
    case _ if xxe.toUpperCase.contains("<!ENTITY") => throwError(xxe, "XML contains ENTITY declaration")
    case _ if xxe.toUpperCase.contains("<!ELEMENT") => throwError(xxe, "XML contains ELEMENT declaration")
    case _ if xxe.toUpperCase.contains("SYSTEM") => throwError(xxe, "XML contains SYSTEM declaration")
    case _ => xxe
  }

  private def checkForXXE(xxe: String): String = {
    SecureXml.loadString(xxe).toString()
    xxe
  }

  private def xmlDataMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("//removed - non open source//") && xxe.contains("""xmlns="//removed non open-source"""") => checkForXXE(xxe)
    // snip
    case _ => throwError(xxe, "received XML does not start with xml declaration or //snip // element or namespace is not correct.")
  }

  private def xmlMustStartWith(xxe: String): String = xxe match {
    case _ if xxe.startsWith("<?xml") => xmlDataMustStartWith(xxe.substring(xxe.indexOf("?>") + 2, xxe.size).trim)
    case _ => xmlDataMustStartWith(xxe)
  }
}

BasicRoute:

trait BasicRoute extends Directives {
  implicit def executionContext: ExecutionContext
  implicit def timeout: Timeout
  implicit def system: ActorSystem

  def badRequestXml = "//snip//"

  def internalServiceErrorXml(description: String, t: Option[Throwable]) = "//snip//"

  def validationError(code: String, description: String) = "//snip//"

  def completeWithBadRequest = complete(BadRequest, badRequestXml.toString())

  def completeWithError(throwable: Throwable) = complete(InternalServerError, internalServiceErrorXml(s"An error occurred: ${throwable.getMessage}", Option(throwable)))

  def completeWithError = complete(InternalServerError, internalServiceErrorXml(s"An error occurred", None))

  def completeWithValidationError(code: String, description: String) = complete(BadRequest, validationError(code, description))
}

use in route:

trait ARoute extends BasicRoute with XxeFilter {

  lazy val ARoute: Route =
    path("//snip//") {
      post {
        entity(as[String]) { xxe =>
          onComplete(xxeFilter(xxe)) {
            case Success(xml) =>
                camel(xml).map { _ =>
                  complete(StatusCodes.NoContent)
                }.recover { case t: Throwable =>
                  completeWithError(t)
                }.getOrElse {
                  complete(StatusCodes.InternalServerError)
                }
            case Failure(ex) => completeWithBadRequest
          }
        }
      }
    }

  def camel(xml: String): Try[Unit]
}

from scala-xml.

adriaanm avatar adriaanm commented on July 17, 2024

Thanks, @dnvriend! Maybe we could include this so that others can just call 'xml.beSafe'?

from scala-xml.

dnvriend avatar dnvriend commented on July 17, 2024

Or just be safe... implicitly :)

from scala-xml.

jawshooah avatar jawshooah commented on July 17, 2024

Any plans to implement this? I'd love it if scala-xml had XXE prevention bundled in.

from scala-xml.

biswanaths avatar biswanaths commented on July 17, 2024

Let me have a look at this.

from scala-xml.

EdgeCaseBerg avatar EdgeCaseBerg commented on July 17, 2024

curious, has any work been done on this?

from scala-xml.

biswanaths avatar biswanaths commented on July 17, 2024

https://github.com/scala/scala-xml/blob/master/src/main/scala/scala/xml/factory/XMLLoader.scala#L28

Changing the parser features as suggest while creating the parser should be simple enough.

What should be the route to upgrade ? We can change the code but not the data. Anybody who upgrades to latest version there is a possibility that some data path might fail.

  • Option A:
    Provide a non-default secure parser. With warning generated, for using the default non-safe parser.
  • Option B:
    Make default parser to safe and allow a non-default, unsafe parser if anybody wants to fall back on.

Please let me know, how we should proceed with this.

from scala-xml.

EdgeCaseBerg avatar EdgeCaseBerg commented on July 17, 2024

If Option A: Then people who upgrade their code will see the warnings right away and know they should put some time into implementing the necessary changes.

If Option B: Then people who upgrade their code will be happily unaware of issues with any of their other software that might be at risk if using non-updated versions.

Option A seems to increase awareness of these type of issues in general which I'd count as a +1 to that approach, but I can see why many would like to have safety by default and would advocate for that change instead.

from scala-xml.

v6ak avatar v6ak commented on July 17, 2024

I am against option B for some other major reason: It might increase fear of security updates (as it might break things), which is bad for security. What is worse, everything will compile after the change, unit tests will likely also pass, but the app still might break. (I know, integration tests should theoretically catch it, but still… Even in the case that integration tests catch the issue, it does not encourage that updates are painless.)

Maybe the only advantage of Option B is forcing old code to secure mode. Maybe this goal might be achieved with some tradeoff. For example, we would have three objects:

  • scala.xml.SecureXml – newly added object with secure behavior
  • scala.xml.InsecureXml – newly added object with compatible (potentially insecure) behavior
  • scala.xml.Xml – old deprecated object; By default, it behaves as scala.xml.InsecureXml for compatibility reasons. However, one might globally change the behavior to scala.xml.SecureXml by switching some system property. While I am generally not an advocate of global config, I consider this to be the least evil there.

We might call this tradeoff as Option C.

Finally, there is an Option D. It is like Option C, but with scala.xml.Xml object completely removed. As a result, all programmers would be forced to rewrite the code using the legacy scala.xml.Xml. Or maybe all the methods of scala.xml.Xml object would be marked as synthetic and start throwing an exception.

Scenarios of upgrade:

  1. Programmer upgrades to a new version of the library. A: Programmer is warned and will hopefully take the action. B: Code will compile without warning, but might silently break. C: Programmer is warned and will hopefully take the action. D: Programmer is forced to take the action.
  2. Programmer upgrades another library that evicts the old version. The code is, however, compiled against the old version, maybe because it is part of a 3rd party library. A: Programmer is not warned. B: Code will compile without warning, but might silently break. C: Programmer is not warned, the code will not break (but might be insecure). D: The code will suddenly start throwing weird errors at runtime. (Grrrr!)
  3. Someone wants to force the secure behavior for all the application (with taking the risks of potential incompatibility). I suppose that there is the new XML library version in the project. A: No way. B: Automatically done. C: Can be done by setting a global system property. D: Forced manual action.

The Option A covers only one of these three scenarios.

The Option B covers all these three scenarios. The cost is, however, potential silent BC breaks and discouraging users from security updates.

Option C covers scenarios 1 and 3 and performs definitely better than Option A. While Option C seems to fail at the scenario 2, it is not as bad as it might sound. If the programmer also uses the XML library directly, the scenario 3 might be triggered.

The Option D might look like the safest option, because it forces taking some action, but it is not so simple. At the scenario 3, the programmer can't simply evict the old version of scala-xml. All the relevant libraries are required to be upgraded before, which can actually delay the fix. In such case, the Option D performs significantly worse than others.

In short term: I am in favour of Option C. The Option A looks the second best for me. Options B and D look wrong for me in the short term.

In long term: After the old object or methods being deprecated for some period of time, I'd use the Option D and remove the deprecated scala.xml.Xml object.

from scala-xml.

biswanaths avatar biswanaths commented on July 17, 2024

Anybody else has any opinion please let us know. I think we should try to get a satisfactory solution to this.

from scala-xml.

tyohDeveloper avatar tyohDeveloper commented on July 17, 2024

The name of the "secure" option should be something like "no XSS, mostly secure*." We don't want to imply security that we can't implement. Hackers will holes in every implementation. Even if there isn't a hole in the library, some naïve developers will use unsafe practices. We shouldn't imply more than what can be delivered.

  • I have no ide what it should be called. LimitedXssSchema seems too bulky. KindaSecure and KnownSecured can be interpreted to be the opposite of what is intended. MoreSecure? I think the more secure version should be the default (option 2).

Good programming practices always put parsing in its own thread. Futures make that somewhat easy. Unsafe usages or unintended Xss should throw run time exceptions that will kill the process. Only that thread will die, the main thread won't be affected. It can help control DSS attacks as well.

A number of strategies need to be employed to control resource usage, naturally. If possible, well known public key, elliptical (or other) methods that are somewhat difficult to crack.

AES 256 or a version that doesn't have well known bugs. The usual, but not always available methods. All of which take more runtime resources. So some resource control options should be available.

Most of this would require a lot of work to implement. Perhaps even a complete rewrite. Hooks to allow the consumer to add those features might be possible with considerably less work. A library of monads could be made available incrementally. That would help with implementation time. The community could add those over time. It would require extensive testing to ensure they don't actually weaken security. The NSA has injected weaknesses in the Linux (and others) distributions a number of times. The known ones have been caught and addressed. Monads would make them far less obvious and harder to spot.

Most of the committers in the scala know to each other. The use base is small enough to be ignored. As a paranoid system architect, I have high standards. They are probably unreasonable.

LessSecure, XML, MoreSecure?

from scala-xml.

v6ak avatar v6ak commented on July 17, 2024

from scala-xml.

tyohDeveloper avatar tyohDeveloper commented on July 17, 2024

from scala-xml.

v6ak avatar v6ak commented on July 17, 2024

from scala-xml.

SethTisue avatar SethTisue commented on July 17, 2024

"safer"?

from scala-xml.

tyohDeveloper avatar tyohDeveloper commented on July 17, 2024

from scala-xml.

SethTisue avatar SethTisue commented on July 17, 2024

fixed by #177

from scala-xml.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.