fixing-character-encoding's Introduction

Character Encoding Examples

A character encoding is a mapping between raw bytes (sequences of 0s and 1s) and text characters.
When text written with one encoding is decoded with a different one, the output text changes. Sometimes the result is completely unreadable.
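
To make the mismatch concrete, here is a minimal sketch in Python. The sample word is made up for illustration and is not taken from the dataset. It encodes a string as UTF-8, decodes the same bytes as Latin-1, and then reverses the mistake.

```python
# Illustrative sample text (an assumption, not from the dataset).
text = "café"

# Encode with UTF-8: the "é" becomes the two bytes 0xC3 0xA9.
raw = text.encode("utf-8")

# Decode those bytes with the wrong encoding (Latin-1).
# Each byte is read as its own character, so "é" turns into "Ã©".
garbled = raw.decode("latin-1")

# Because Latin-1 maps every byte to a character, the damage is
# reversible here: re-encode as Latin-1, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

# A stricter codec such as ASCII refuses the bytes instead of garbling them.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(garbled)
print(repaired)
```

This round trip only works when the wrong decoder loses no information. If bytes were replaced or dropped along the way, the original text may not be recoverable.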

About the dataset:

This dataset is made up of six text files that represent five different character encodings and six different languages. The character encodings represented in this dataset are ISO-8859-1 (also known as Latin-1), ASCII, Windows-1251, UTF-16 that has been successfully converted into UTF-8, and BIG-5. More information on the files is available in the file_guide.csv file.
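
As a rough sketch of how files like these could be normalized, the snippet below decodes bytes with a known source encoding and re-encodes them as UTF-8. The helper name `to_utf8` and the sample strings are hypothetical and do not come from the dataset. In practice the correct encoding for each file would be looked up in file_guide.csv rather than hard-coded.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes using the given encoding and re-encode them as UTF-8.

    Hypothetical helper for illustration; raises UnicodeDecodeError
    if the bytes are not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Made-up sample strings, keyed by Python codec name
# (these stand in for file contents; they are not from the dataset).
samples = {
    "iso-8859-1": "déjà vu",
    "ascii": "hello",
    "cp1251": "привет",   # Windows-1251
    "big5": "中文",
}

converted = {}
for encoding, text in samples.items():
    raw = text.encode(encoding)            # simulate bytes as stored on disk
    utf8_bytes = to_utf8(raw, encoding)    # normalize to UTF-8
    converted[encoding] = utf8_bytes.decode("utf-8")

for encoding, text in converted.items():
    print(encoding, text)
```

Passing the wrong `source_encoding` either raises an error or silently produces garbled text, which is why knowing each file's encoding ahead of time matters.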

Source:

Kaggle Datasets URL: https://www.kaggle.com/datasets/rtatman/character-encoding-examples

Repository: k14anb / fixing-character-encoding