Comments (6)
Hmm... there is the type registration capability:
>>> from glom import *
>>> import xml.etree.ElementTree as ET
>>> annoying_xml = """<?xml version="1.0" encoding="UTF-8"?>
... <kml xmlns="http://www.opengis.net/kml/2.2">
... <Placemark>
... <name>Simple placemark</name>
... <description>Attached to the ground. Intelligently places itself
... at the height of the underlying terrain.</description>
... <Point>
... <coordinates>-122.0822035425683,37.42228990140251,0</coordinates>
... </Point>
... </Placemark>
... </kml>
... """
>>> e = ET.fromstring(annoying_xml)
>>> g = Glommer()
>>> g.register(type(e), iterate=type(e).iter)
>>> g.glom(e, T[0][2][0].text)
'-122.0822035425683,37.42228990140251,0'
That's kind of along the right lines.
from glom.
@kurtbrose Actually, the registration can be as simple as: g.register(type(e))
glom now automatically detects whether a type supports iteration.
from glom.
Let me give an actual use case: I have many Placemarks and want each to be a row in a table. It'd be great to get the elegance of the PySpark DataFrame without the Spark dependency, but once I have a DataFrame, Spark can work its magic.
df = (spark.read
      .format('com.databricks.spark.xml')
      .options(rowTag='Placemark')
      .load('file:/more_annoying_xml.kml')
      )
df.printSchema()
# root
# |-- Point: struct (nullable = true)
# |    |-- altitudeMode: string (nullable = true)
# |    |-- coordinates: string (nullable = true)
# |-- TimeStamp: struct (nullable = true)
# |    |-- when: string (nullable = true)
# |-- description: string (nullable = true)
# |-- name: string (nullable = true)
df.select('Point.coordinates', 'TimeStamp.when', 'name').printSchema()
# root
# |-- coordinates: string (nullable = true)
# |-- when: string (nullable = true)
# |-- name: string (nullable = true)
from glom.
Hey Rory, thanks for your patience, I've been running a bit behind lately.
I think glom itself can provide only about half of what you want. The other half is going to have to come from a library that gives us a better XML API. Here's how I would do that, using a neat library called untangle.
$ pip install untangle glom
Then:
import untangle
from glom import glom, T
annoying_xml = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Placemark>
<name>Simple placemark</name>
<description>Attached to the ground. Intelligently places itself
at the height of the underlying terrain.</description>
<Point>
<coordinates>-122.0822035425683,37.42228990140251,0</coordinates>
</Point>
</Placemark>
</kml>
"""
ut = untangle.parse(annoying_xml)
glom(ut, {'coords': (T.kml.Placemark.Point.coordinates.cdata, T.split(','), [float]),
          'name': T.kml.Placemark.name.cdata})
# result:
# {'coords': [-122.0822035425683, 37.42228990140251, 0.0],
#  'name': u'Simple placemark'}
I threw in a little bit of coordinate transformation at the end. Is that closer to what you expected?
from glom.
Yeah! This is great. Thanks
from glom.
Glad you like it! Closing this case for now :)
from glom.