Comments (7)
We actually already use two different separator guessers, but if they disagreed, we just gave up. That caused the importer to default to TSV, which also disabled quote handling, since quote handling defaults to ON for CSV and OFF for TSV. The Univocity parser guessed " " (the space character), which isn't completely unreasonable, and we guessed "," (comma), which is correct. I've created a patch that uses our internal guess whenever the two disagree, which preserves backward compatibility.
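The old and patched tie-breaking rules can be contrasted in a short sketch. This is a minimal Python illustration, not OpenRefine's actual (Java) code. The function names `pick_separator_old`, `pick_separator_new` and `quote_handling_default` are hypothetical. Two behaviours are assumptions the comment does not specify: what happens when only one guesser returns a result, and how separators other than comma and tab are treated.

```python
from typing import Optional

TAB = "\t"


def pick_separator_old(parser_guess: Optional[str], internal_guess: Optional[str]) -> str:
    """Previous behaviour: if the two guessers disagree, give up and fall back to TSV."""
    if parser_guess is not None and parser_guess == internal_guess:
        return parser_guess
    # Assumption: a missing guess is treated like a disagreement.
    return TAB


def pick_separator_new(parser_guess: Optional[str], internal_guess: Optional[str]) -> str:
    """Patched behaviour: on disagreement, trust the internal guess."""
    if internal_guess is not None:
        return internal_guess
    if parser_guess is not None:
        # Assumption: use the parser's guess when the internal guesser found nothing.
        return parser_guess
    return TAB


def quote_handling_default(separator: str) -> bool:
    """Quote handling defaults to ON for CSV and OFF for TSV."""
    # Assumption: any other separator is treated like TSV in this sketch.
    return separator == ","


# The case from this issue: Univocity guessed a space, the internal guesser a comma.
old = pick_separator_old(" ", ",")
new = pick_separator_new(" ", ",")
print(repr(old), quote_handling_default(old))  # '\t' False
print(repr(new), quote_handling_default(new))  # ',' True
```

Because quote handling is tied to the chosen separator, fixing the separator choice also restores quote handling for this file.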
Ok, let's do that
Yes, you're not the first one to be surprised by that! We are aware that the records mode is pretty unintuitive for newcomers, especially because it gets activated by default in a case like yours.
You can read more about it here:
https://openrefine.org/docs/manual/exploring#rows-vs-records
We have some ideas of how to make that more transparent #5174, #5175. Do you think they would have helped?
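For readers who have never used records mode, the grouping rule from the linked manual page can be sketched roughly as follows. A new record starts at each row whose first column is non-blank. Each following row with a blank first column is attached to that record. This is a simplified Python illustration with a hypothetical `group_into_records` helper, not OpenRefine's implementation. Treating leading blank-key rows as opening their own record is an assumption, and the sample rows are made up.

```python
from typing import List, Sequence

Row = Sequence[object]


def is_blank(value: object) -> bool:
    return value is None or value == ""


def group_into_records(rows: List[Row]) -> List[List[Row]]:
    """Group rows into records keyed on the first column (simplified)."""
    records: List[List[Row]] = []
    for row in rows:
        key = row[0] if row else None
        if not is_blank(key) or not records:
            # Non-blank key: this row starts a new record.
            # Assumption: leading rows with a blank key also open a record.
            records.append([row])
        else:
            # Blank key: this row continues the previous record.
            records[-1].append(row)
    return records


rows = [
    ["Alice", "phone"],
    ["", "email"],
    ["Bob", "phone"],
]
records = group_into_records(rows)
print(len(rows), "rows,", len(records), "records")  # 3 rows, 2 records
```

When every row has a non-blank first column, each row is its own record, so the two modes show the same counts and the difference goes unnoticed.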
Wow, I didn't consider records mode because I've never used it, despite years of using OpenRefine! Is it meant for dealing with JSON and other hierarchical data?
My bad then! I really like the suggestion in #5175, as it would have made it clear to me that OR is warning me about something, even if I don't know yet what it is. It would then be easy to have a Q&A page with "Why is my first column red?" for people looking for answers.
A tutorial would have been useless, because it would probably be forgotten by the time you end up activating records mode by chance. As I mentioned, I've been using OpenRefine for years, and I mentioned the issue to two colleagues who also use it regularly, and everybody thought it was a bug.
What about the double quotes though?
> What about the double quotes though?

Ah yes, I had missed that part. That's something we should definitely investigate; I don't see any reason why this would be by design.
I think perhaps just changing the defaults to never start in record mode might be the best solution.
There's no "hallucination" here because a) we don't use LLMs and b) the quotes are in the original data. Selecting the option "Use character `"` to enclose cells containing column separators" will make them go away.
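The effect of that option can be reproduced with Python's standard `csv` module as a stand-in. OpenRefine uses its own Java parser, so this only mirrors the behaviour. With quote handling on, the enclosing `"` characters are stripped and the embedded comma stays inside the cell. With quote handling off (`csv.QUOTE_NONE`), the quotes remain in the data and the cell is split at the comma. The sample data is made up.

```python
import csv
import io

sample = 'name,city\n"Smith, John","Paris"\n'

# Quote handling ON: '"' encloses cells that contain the separator.
with_quotes = list(csv.reader(io.StringIO(sample)))

# Quote handling OFF: quote characters are kept as ordinary data.
without_quotes = list(csv.reader(io.StringIO(sample), quoting=csv.QUOTE_NONE))

print(with_quotes[1])     # ['Smith, John', 'Paris']
print(without_quotes[1])  # ['"Smith', ' John"', '"Paris"']
```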
I see three things that could be improved with this dataset:
1. Guess CSV for the initial format, instead of TSV as it currently does
2. Enable quote stripping by default when the data calls for it
3. Start in rows mode instead of records mode
The CSV package that we use has a format "sniffer" which may be able to help with 1 & 2.
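Python's standard library ships a comparable tool, `csv.Sniffer`, so the idea behind points 1 & 2 can be sketched with it. This is only an analogy and does not show the API of the Java CSV parser OpenRefine actually uses. Restricting the candidate delimiters to comma and tab is an assumption for this example; it keeps the sniffer from proposing a space. The detected dialect also carries the quote character, so reading with that dialect strips the quotes as well. The sample data is made up.

```python
import csv
import io

sample = 'name,city\n"Smith, John",Paris\n"Doe, Jane",Rome\n'

# Only consider comma and tab as candidate separators (assumption for this sketch).
dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
print(repr(dialect.delimiter), repr(dialect.quotechar))  # ',' '"'

# Reading with the sniffed dialect applies both the separator and the quote character.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['Smith, John', 'Paris']
```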
@wetneb I recommend including the fix for this in 3.8, since the regression was first introduced in that release with #6098, and this is, in my opinion, a low-risk fix that restores the prior behavior.
Related Issues (20)
- Quote table name in SQL exporter
- Encoding issue regression for files imported into version 3.8.0
- Allow manual selection of UTF-8 BOM encoding
- forEachIndex with array containing null values throw NullPointerException
- 'Search for match' link not shown when no reconciliation candidates are present
- The dialog system uses an incorrect WAI ARIA Role attribute
- TSV import always trims white space, ignoring parse setting
- Add new GREL function to normalize characters
- Search option has disappeared from reconciliation results (3.8.0)
- Add new GREL function to calculate the edit distance
- Checking running status of OpenRefine with wget will not work correctly
- Move the Wikitable importer to an extension
- Don't catch exceptions in Java unit tests
- Allow user to automatically report their OpenRefine installation configuration
- Incorrect localization for row/record count in main summary bar
- Restore deleted constructor to StandardReconConfig
- Import progress bar exceeds the intended box
- Fail to open the browser after startup on linux without Desktop.browse support
- Update the UI for the starred tab in expression dialogue
- Column menus: select submenu item by moving mouse diagonally