Character encodings are sets of mappings from raw bits (0's and 1's) to text characters.
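As a small illustration of that mapping, the Python sketch below encodes one character under three encodings and prints the resulting bits. The sample character and variable names are chosen here for illustration and are not taken from the dataset.

```python
# Illustrative sketch: the same character maps to different bytes
# depending on the encoding. The sample character is an assumption,
# not something taken from the dataset files.
sample_char = "é"  # U+00E9

latin1_bytes = sample_char.encode("iso-8859-1")
utf8_bytes = sample_char.encode("utf-8")
utf16_bytes = sample_char.encode("utf-16-le")


def to_bits(data: bytes) -> str:
    """Render each byte as eight 0/1 digits, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in data)


print("ISO-8859-1:", to_bits(latin1_bytes))
print("UTF-8:     ", to_bits(utf8_bytes))
print("UTF-16-LE: ", to_bits(utf16_bytes))
```

Only the bit patterns differ; the character they stand for is the same in all three cases.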
When text encoded with one character encoding is decoded with a different encoding, the output text changes. Sometimes this results in completely unreadable text.
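A minimal Python sketch of such a mismatch follows. The sample strings are made up for illustration and do not come from the dataset files; the codec names are the standard Python ones.

```python
# Illustrative sketch of decoding with the wrong encoding.
# The sample strings are assumptions, not dataset content.

# UTF-8 bytes read as ISO-8859-1: the accented letter turns into two characters.
french = "café"
garbled_french = french.encode("utf-8").decode("iso-8859-1")
print(garbled_french)

# Windows-1251 bytes read as ISO-8859-1: every letter is replaced.
russian_bytes = "привет".encode("cp1251")
garbled_russian = russian_bytes.decode("iso-8859-1")
print(garbled_russian)

# Some mismatches fail outright instead of producing garbled text.
strict_decode_failed = False
try:
    russian_bytes.decode("utf-8")
except UnicodeDecodeError:
    strict_decode_failed = True
print("UTF-8 decode failed:", strict_decode_failed)

# Decoding with the encoding that was actually used recovers the text.
recovered = russian_bytes.decode("cp1251")
print(recovered)
```

Whether a mismatch yields garbled text or an error depends on the pair of encodings: ISO-8859-1 assigns a character to every byte value, so decoding with it never fails, while UTF-8 rejects byte sequences that are not valid UTF-8.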
This dataset is made up of six text files that represent five different character encodings and six different languages. The character encodings represented in this dataset are ISO-8859-1 (also known as Latin-1), ASCII, Windows-1251, Big5, and UTF-16 that has been successfully converted into UTF-8. More information on the files is available in the file_guide.csv file.
Kaggle Datasets URL: https://www.kaggle.com/datasets/rtatman/character-encoding-examples