Comments (10)
That sounds more like an enhancement request than a defect report.
To generalize things a bit, support for a standard data access API would allow people to plug in multiple DB backends.
Original comment by tfmorris
on 10 May 2010 at 7:26
from google-refine.
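tfmorris's idea of a standard data access API amounts to a thin backend interface that each database driver implements. A minimal sketch — the class names are hypothetical, and sqlite3 stands in for any pluggable driver:

```python
import sqlite3
from abc import ABC, abstractmethod

class TableBackend(ABC):
    """Hypothetical minimal interface a pluggable DB backend would implement."""
    @abstractmethod
    def columns(self, table): ...
    @abstractmethod
    def rows(self, table): ...

class SqliteBackend(TableBackend):
    def __init__(self, path):
        self.conn = sqlite3.connect(path)
    def columns(self, table):
        # table_info rows are (cid, name, type, notnull, default, pk)
        return [c[1] for c in self.conn.execute(f"PRAGMA table_info({table})")]
    def rows(self, table):
        return list(self.conn.execute(f"SELECT * FROM {table}"))

backend = SqliteBackend(":memory:")
backend.conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
backend.conn.execute("INSERT INTO people VALUES (1, 'Ada')")
print(backend.columns("people"))  # ['id', 'name']
print(backend.rows("people"))     # [(1, 'Ada')]
```

A MySQL or PostgreSQL backend would be another subclass wrapping its own driver; the facet and project code would only ever see `TableBackend`.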
One of the major challenges here is how to support undo/redo when changes can get into the back-end database without going through Gridworks. Another major challenge is where to store metadata (such as reconciliation records) that is specific to Gridworks and not native to any existing back-end database.
Original comment by [email protected]
on 10 May 2010 at 7:32
- Added labels: Priority-Low, Type-Enhancement
- Removed labels: Priority-Medium, Type-Defect
Hmm... I presume a new project would have to be created when doing this, and that could hold the meta information.

As for undo/redo, maybe adding a commit button would make that easier. So changes can be made to a snapshot of the data, but no changes are made to the DB itself until commit is pressed?
Original comment by mjlissner
on 10 May 2010 at 7:36
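mjlissner's commit-button idea boils down to editing a local snapshot and flushing the changes on demand. A minimal sketch, with sqlite3 standing in for the remote database and all names hypothetical:

```python
import sqlite3

remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT)")
remote.executemany("INSERT INTO cities VALUES (?, ?)",
                   [(1, "Lodnon"), (2, "Paris")])

# Take a snapshot: the project edits this copy, not the live database.
snapshot = {row[0]: row[1] for row in remote.execute("SELECT id, name FROM cities")}

# Edits (and any undo history) live entirely in the snapshot...
snapshot[1] = "London"

# ...until the user presses "commit", which writes the changes back by key.
def commit(snapshot, conn):
    for key, name in snapshot.items():
        conn.execute("UPDATE cities SET name = ? WHERE id = ?", (name, key))

commit(snapshot, remote)
print(remote.execute("SELECT name FROM cities WHERE id = 1").fetchone())
```

Undo/redo then works exactly as it does today, because until commit time the database never sees intermediate states.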
One option is to keep it synchronous with the database (effectively using the database as the backend), but then you lose undo/redo support and reconciliation with Freebase (unless you add suitable tables to the database). You're then really using Gridworks for just the facets.

The other way, as you suggest, is to make it similar to a disconnected session with a commit transaction. The issue then is ensuring consistency between the remote database and the snapshot held in Gridworks; merging the two back together would be an issue. You'd also need to hold keys from the remote database in Gridworks for updating records.
Frameworks such as Hibernate or Spring would be worth considering for their database abstraction layers.
Original comment by iainsproat
on 10 May 2010 at 7:46
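The consistency problem iainsproat raises — the remote database changing underneath the snapshot — is the classic optimistic-concurrency situation. One hedged sketch: keep the originally read value alongside each key, and refuse to overwrite any row that has drifted since the snapshot was taken (sqlite3 again stands in for the remote database):

```python
import sqlite3

remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
remote.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

# Snapshot keeps the key AND the value as originally read.
snapshot = {k: {"read": v, "edited": v}
            for k, v in remote.execute("SELECT id, val FROM t")}
snapshot[1]["edited"] = "A"                              # local edit
remote.execute("UPDATE t SET val = 'drifted' WHERE id = 1")  # concurrent remote change

def commit(snapshot, conn):
    """Apply edits only where the remote row still matches what was read."""
    conflicts = []
    for key, entry in snapshot.items():
        cur = conn.execute("UPDATE t SET val = ? WHERE id = ? AND val = ?",
                           (entry["edited"], key, entry["read"]))
        if cur.rowcount == 0 and entry["edited"] != entry["read"]:
            conflicts.append(key)  # row changed remotely; flag, don't clobber
    return conflicts

conflicts = commit(snapshot, remote)
print(conflicts)  # [1]
```

Row 1 was edited both locally and remotely, so it is reported as a conflict for the user to resolve rather than silently overwritten.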
I have read that good ORMs such as Hibernate now support ordered lists and other features incorporated into JPA 2.0 as of Dec 2009.
Original comment by thadguidry
on 10 May 2010 at 8:05
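For context, JPA 2.0's `@OrderColumn` maps a Java `List`'s index to an integer column. The underlying technique — persisting list order explicitly — is ORM-independent; a sketch of the same idea in plain SQL via sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (row_id INTEGER, position INTEGER, tag TEXT)")

# Persist a list, recording each element's index — the idea behind @OrderColumn.
tags = ["city", "capital", "port"]
conn.executemany("INSERT INTO tags VALUES (1, ?, ?)", list(enumerate(tags)))

# Reading back ORDER BY position reconstructs the original list order.
restored = [t for (t,) in conn.execute(
    "SELECT tag FROM tags WHERE row_id = 1 ORDER BY position")]
print(restored)  # ['city', 'capital', 'port']
```

This matters here because Refine's cell values and reconciliation candidates are ordered, and a backing store has to preserve that order.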
Original comment by iainsproat
on 25 May 2010 at 7:59
- Changed state: Accepted
Original comment by iainsproat
on 14 Oct 2010 at 9:26
- Added labels: Component-Logic, Component-Persistence, Usability
Initially, what I believe is most important/useful is simply having the ability to direct-connect to MySQL/PostgreSQL/etc. data sources from the get-go (when creating a new project). Also initially, being able to set/save multiple data sources, and multiple DBs within those sources. When the user creates a new project, this would in effect create a disconnected session (as mentioned above), wherein the data is treated as it is now. Adding 'commit' features can come next, followed by more advanced connectivity options, until such a point that synchronous functionality is in place to one degree or another. But for now, it would certainly be nice to add data sources and pull data from those sources!
Eric Jarvies
Original comment by [email protected]
on 11 Nov 2010 at 7:08
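Eric's first step — saving multiple named data sources and creating a disconnected project from one — could be as simple as a registry of connection settings. Everything below (names, fields, the demo table) is hypothetical, and only sqlite is wired up:

```python
import sqlite3

# A saved data source is just named connection settings; entries for
# MySQL/PostgreSQL would carry host/port/user/password instead of a path.
data_sources = {
    "local-demo": {"driver": "sqlite", "database": ":memory:"},
}

def new_project(source_name, query):
    """Create a disconnected project: run the query once, keep only the rows."""
    cfg = data_sources[source_name]
    assert cfg["driver"] == "sqlite"  # other drivers omitted in this sketch
    conn = sqlite3.connect(cfg["database"])
    conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")  # demo data only
    conn.execute("INSERT INTO demo VALUES (1, 'Ada')")
    return list(conn.execute(query))

rows = new_project("local-demo", "SELECT * FROM demo")
print(rows)  # [(1, 'Ada')]
```

Once the rows are pulled, the connection can be closed; the project behaves like today's file-imported projects until commit support arrives.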
Eric, a cleanup tool using industry best practices is best used offline within a process. There are existing ETL tools that easily consume from MySQL/PostgreSQL etc. and offer excellent flow control, exporting, and connectivity to produce delimited files with relative ease. Talend is one such product that I use along with Google Refine. Talend (Open Source Community edition) does my scheduled daily gathering from 3 databases (MySQL and Oracle) and then dumps a customized TSV file that I open with Google Refine for further analysis and sometimes cleanup. There are other tools besides Talend that provide ETL (Extract, Transform, Load). I'm not 100% sure the team really feels the need to copy that and flesh out a full ETL platform, since Talend and other tools fill that need very nicely.

Incidentally, using Google Refine and a bit of clustering, I was able to find a few loopholes in our data storage processing that we fixed with a few stored procedures within Oracle. Google Refine was instrumental as a discovery tool for that. Talend does have an MDM component but does not have the interactivity of a discovery tool like Google Refine.

If you do NOT need a daily process, but only a one-time cleanup, just dumping with MySQL or PostgreSQL would offer about the same, and depending on the size of the database it takes only seconds to minutes. Dumping can also avoid potential live database locks that Refine, if it supported direct connections, might have to tip-toe around, depending on the team's chosen implementation of database connectivity.

If you have large database size needs, give Talend or another ETL tool a try with Google Refine, and you'll soon see the powerful left-right combination. I'm not sure how far the team will ultimately decide to absorb direct connectivity support within Google Refine. I'd like to hear other opinions on this Issue-12 as well.
Original comment by thadguidry
on 11 Nov 2010 at 2:36
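Thad's dump-then-Refine workflow — query the database, write a TSV, open it in Refine — needs only a few lines of glue; a sketch with sqlite3 standing in for MySQL/Oracle:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "acme"), (2, "ACME Corp")])

# Dump a query result as TSV — the file Refine would then import.
cur = conn.execute("SELECT id, customer FROM orders")
buf = io.StringIO()  # in a real run this would be open("dump.tsv", "w")
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow([d[0] for d in cur.description])  # header row from cursor metadata
writer.writerows(cur)
print(buf.getvalue())
```

The inconsistent customer names in the demo data ("acme" vs. "ACME Corp") are exactly what Refine's clustering would then catch.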
I agree with thadguidry. Let the Refine team focus on bringing data quality issues to light, and let Talend focus on data quality (they do have a data profiling tool that can identify some of this stuff); Talend is what we use for basic ETL. You could write some SQL to get the data out of MySQL anyway. If you analyze an entire large table for data quality while directly connected to the DB server, your DBA might become angry too.
C
Original comment by [email protected]
on 14 Apr 2011 at 2:23