UTF8 - Simple Library for Internationalization
utf8.cpp File Reference

Basic UTF-8 conversion functions.

#include <utf8/utf8.h>
#include <vector>
#include <cassert>
#include <cstring>

Functions

action utf8::error_mode (action mode)
 Set error handling mode for this thread.
 
char32_t utf8::throw_or_replace (exception::cause err)
 
std::string utf8::narrow (const wchar_t *s, size_t nch)
 Conversion from wide character to UTF-8.
 
std::string utf8::narrow (const std::wstring &ws)
 Conversion from wide character to UTF-8.
 
std::string utf8::narrow (const char32_t *s, size_t nch)
 Conversion from UTF-32 to UTF-8.
 
std::string utf8::narrow (const std::u32string &s)
 Conversion from UTF-32 to UTF-8.
 
std::string utf8::narrow (char32_t r)
 Conversion from UTF-32 to UTF-8.
 
std::wstring utf8::widen (const char *s, size_t nch)
 Conversion from UTF-8 to wide character.
 
std::wstring utf8::widen (const std::string &s)
 Conversion from UTF-8 to wide character.
 
std::u32string utf8::runes (const char *s, size_t nch)
 Conversion from UTF-8 to UTF-32.
 
std::u32string utf8::runes (const std::string &s)
 Converts a string of characters from UTF-8 to UTF-32.
 
bool utf8::valid_str (const char *s, size_t nch)
 Verifies if string is a valid UTF-8 string.
 
char32_t utf8::next (std::string::const_iterator &ptr, const std::string::const_iterator last)
 Decodes a UTF-8 encoded character and advances iterator to next code point.
 
char32_t utf8::next (const char *&ptr)
 Decodes a UTF-8 encoded character and advances pointer to next character.
 
char32_t utf8::prev (const char *&ptr)
 Decrements a character pointer to previous UTF-8 character.
 
char32_t utf8::prev (std::string::const_iterator &ptr, const std::string::const_iterator first)
 Decrements an iterator to previous UTF-8 character.
 
size_t utf8::length (const std::string &s)
 Counts the number of characters in a UTF-8 encoded string.
 
size_t utf8::length (const char *s)
 Counts the number of characters in a UTF-8 encoded string.
 
bool utf8::isblank (char32_t r)
 Check if character is space or tab.
 
bool utf8::isspace (char32_t r)
 Check if character is white space.
 

Detailed Description

Basic UTF-8 Conversion functions.

Function Documentation

◆ error_mode()

action utf8::error_mode (action mode)

Set error handling mode for this thread.

Parameters
mode  new error handling mode
Returns
previous error handling mode for this thread
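
Because the call returns the previous mode, a caller can switch modes temporarily and restore the old one afterwards. The following is a minimal, self-contained sketch of that pattern and not the library's code: the `action` enumerators, `current_mode`, `set_error_mode` and `mode_guard` are hypothetical stand-ins, and a `thread_local` variable is assumed as the per-thread storage.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's `action` type; the real
// enumerators are not listed on this page.
enum class action { replace, except };

// One mode per thread (assumed to be stored in a thread_local variable).
inline action& current_mode() {
    thread_local action mode = action::replace;
    return mode;
}

// Installs a new mode and returns the one it replaced.
inline action set_error_mode(action mode) {
    action previous = current_mode();
    current_mode() = mode;
    return previous;
}

// Hypothetical RAII guard: restores the previous mode when the scope ends.
class mode_guard {
public:
    explicit mode_guard(action mode) : saved_(set_error_mode(mode)) {}
    ~mode_guard() { set_error_mode(saved_); }
    mode_guard(const mode_guard&) = delete;
    mode_guard& operator=(const mode_guard&) = delete;
private:
    action saved_;
};

// Usage: the mode is changed for a single scope only.
inline void usage_example() {
    action before = current_mode();
    {
        mode_guard guard(action::except);
        assert(current_mode() == action::except);
    }
    assert(current_mode() == before);
}
```

With the real library, the same save-and-restore idea would presumably be written with utf8::error_mode itself, keeping the returned value and passing it back later.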

◆ length() [1/2]

size_t utf8::length (const char *s)

Counts the number of characters in a UTF-8 encoded string.

Parameters
s  UTF-8 encoded string
Returns
number of characters in string
Note
Algorithm from http://canonical.org/~kragen/strlen-utf8.html
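
One common way to count characters in UTF-8 is to count every byte that is not a continuation byte (bit pattern 10xxxxxx), since each code point contributes exactly one such byte. The sketch below illustrates that idea only; `count_code_points` is a hypothetical name, the code is not taken from the library or from the page cited in the note, and it assumes the input is well-formed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the library's implementation: counts code points
// in a null-terminated UTF-8 string by skipping continuation bytes.
inline std::size_t count_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Continuation bytes look like 10xxxxxx; every other byte starts
        // a new code point.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For example, the three-byte sequence "\xE2\x82\xAC" (the euro sign) contributes one to the count, while `std::strlen` would report three bytes.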

◆ length() [2/2]

size_t utf8::length (const std::string &s)

Counts the number of characters in a UTF-8 encoded string.

Parameters
s  UTF-8 encoded string
Returns
number of characters in string
Note
Algorithm from http://canonical.org/~kragen/strlen-utf8.html

◆ next() [1/2]

char32_t utf8::next (const char *&ptr)

Decodes a UTF-8 encoded character and advances pointer to next character.

Parameters
ptr  Reference to character pointer to be advanced
Returns
decoded character

If the string contains an invalid UTF-8 encoding, the function throws an exception or returns utf8::REPLACEMENT_CHARACTER (0xfffd), depending on the error handling mode. In any case, the pointer is advanced to the beginning of the next character or to the end of the string.

◆ next() [2/2]

char32_t utf8::next (std::string::const_iterator &ptr, const std::string::const_iterator last)

Decodes a UTF-8 encoded character and advances iterator to next code point.

Parameters
ptr   Reference to iterator to be advanced
last  Iterator pointing to the end of range
Returns
decoded character

If the iterator points to an invalid UTF-8 encoding or is at the end of the range, the function throws an exception or returns utf8::REPLACEMENT_CHARACTER (0xfffd), depending on the error handling mode. In any case, the iterator is advanced to the beginning of the next character or to the end of the string.
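
To make the decode-and-advance contract concrete, here is a self-contained sketch of a bounded decoder. It is an illustration under stated assumptions, not the library's implementation: `decode_next` and `kReplacement` are hypothetical names, and the sketch always behaves as if the replacement mode were active (it never throws).

```cpp
#include <cassert>
#include <string>

constexpr char32_t kReplacement = 0xFFFD;  // value documented for REPLACEMENT_CHARACTER

// Hypothetical sketch: decodes one code point and advances `it`.
inline char32_t decode_next(std::string::const_iterator& it,
                            std::string::const_iterator last) {
    assert(it <= last);
    if (it == last) return kReplacement;   // nothing left to decode
    auto is_cont = [](char c) {
        return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
    };

    unsigned char lead = static_cast<unsigned char>(*it++);
    if (lead < 0x80) return lead;          // plain ASCII

    int extra = 0;        // continuation bytes announced by the lead byte
    char32_t cp = 0;      // code point being assembled
    char32_t lowest = 0;  // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
    else {
        // Stray continuation byte or invalid lead: skip to the next boundary.
        while (it != last && is_cont(*it)) ++it;
        return kReplacement;
    }

    for (int i = 0; i < extra; ++i) {
        if (it == last || !is_cont(*it)) return kReplacement;  // truncated
        cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return kReplacement;
    return cp;
}
```

A caller can loop with `while (it != last)` and rely on the iterator making progress on every call, which mirrors the guarantee described above.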

◆ prev() [1/2]

char32_t utf8::prev (const char *&ptr)

Decrements a character pointer to previous UTF-8 character.

Parameters
ptr  Reference to character pointer to be decremented
Returns
previous UTF-8 encoded character

If the string contains an invalid UTF-8 encoding, the function throws an exception or returns utf8::REPLACEMENT_CHARACTER (0xfffd), depending on the error handling mode. In this case, the pointer remains unchanged.

◆ prev() [2/2]

char32_t utf8::prev (std::string::const_iterator &ptr, const std::string::const_iterator first)

Decrements an iterator to previous UTF-8 character.

Parameters
ptr    iterator to be decremented
first  iterator pointing to beginning of string
Returns
previous UTF-8 encoded character

If the string contains an invalid UTF-8 encoding, the function returns utf8::REPLACEMENT_CHARACTER (0xfffd) and the iterator remains unchanged.
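
Moving backwards in UTF-8 amounts to skipping continuation bytes until a lead byte is found, then checking that the lead byte announces exactly the number of bytes skipped. The sketch below shows only that iterator movement, without the decoding step; `step_back` is a hypothetical helper, not the library's code, and it leaves the iterator untouched on failure, in line with the behaviour described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: moves `it` back to the lead byte of the previous
// code point. Returns false and leaves `it` unchanged if the bytes before
// `it` do not form a complete sequence.
inline bool step_back(std::string::const_iterator& it,
                      std::string::const_iterator first) {
    assert(first <= it);
    auto p = it;
    int cont = 0;  // continuation bytes skipped so far
    while (p != first) {
        unsigned char c = static_cast<unsigned char>(*--p);
        if ((c & 0xC0) != 0x80) {
            // Found ASCII or a lead byte; it must announce exactly
            // `cont` continuation bytes.
            int expected = c < 0x80            ? 0
                         : (c & 0xE0) == 0xC0  ? 1
                         : (c & 0xF0) == 0xE0  ? 2
                         : (c & 0xF8) == 0xF0  ? 3
                                               : -1;
            if (expected != cont) return false;
            it = p;
            return true;
        }
        if (++cont > 3) return false;  // no sequence is longer than 4 bytes
    }
    return false;  // reached the beginning without finding a lead byte
}
```

The `first` bound is what keeps the backward scan from running off the start of the buffer.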

◆ valid_str()

bool utf8::valid_str (const char *s, size_t nch)

Verifies if string is a valid UTF-8 string.

Parameters
s    pointer to character string to verify
nch  number of characters to verify, or 0 if the string is null-terminated
Returns
true if string is a valid UTF-8 encoded string, false otherwise
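
As an illustration of what such a check typically involves, the sketch below validates lead bytes, continuation bytes, overlong forms, surrogates and the U+10FFFF upper bound. It is a hypothetical, self-contained validator (`is_valid_utf8` is not a library function) and may be stricter or looser than the library's own rules; it only borrows the convention that `nch == 0` means the string is null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical validator, not the library's code.
// nch == 0 means "use the terminating null to find the length".
inline bool is_valid_utf8(const char* s, std::size_t nch = 0) {
    assert(s != nullptr);
    if (nch == 0) nch = std::strlen(s);
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    const unsigned char* end = p + nch;

    while (p < end) {
        unsigned char lead = *p++;
        if (lead < 0x80) continue;  // ASCII

        int extra = 0;        // continuation bytes announced by the lead byte
        char32_t cp = 0;      // code point being assembled
        char32_t lowest = 0;  // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; lowest = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; lowest = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; lowest = 0x10000; }
        else return false;              // stray continuation or invalid lead

        if (end - p < extra) return false;  // truncated sequence
        for (int i = 0; i < extra; ++i) {
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p++ & 0x3F);
        }
        // Reject overlong forms, surrogates and out-of-range values.
        if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}
```

Passing an explicit length restricts the check to a prefix, so `is_valid_utf8("abc\xFF", 3)` inspects only the three ASCII bytes.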