How do PHP string functions handle UTF-8 characters like ‘鉑’?

Solution 1:

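PHP's mb_* functions interpret a string using either an encoding you pass explicitly or the current internal encoding. When working with multibyte strings, make sure that encoding matches your data (normally UTF-8):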


<?php
// Check the current internal encoding
echo mb_internal_encoding() . '<br />';

// Count the length of a multibyte character
echo mb_strlen('鉑', 'UTF-8') . '<br />'; // Specify encoding explicitly
echo mb_strlen('鉑') . '<br />';          // Uses current internal encoding

// Set the internal encoding to UTF-8
mb_internal_encoding('UTF-8');
echo mb_internal_encoding() . '<br />';  
echo mb_strlen('鉑') . '<br />';          // Now counts correctly
?>
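For comparison, PHP's standard (non-mb) string functions such as strlen() are byte-oriented and ignore the internal encoding entirely. The following is a minimal sketch. It assumes the script file is saved as UTF-8, so the literal '鉑' is stored as three bytes:

```php
<?php
// Assumes this file is saved as UTF-8, so the literal below is 3 bytes long.
$char = '鉑';

// strlen() counts bytes; mb_internal_encoding() has no effect on it.
$bytes = strlen($char);

// mb_strlen() with an explicit encoding counts characters.
$chars = mb_strlen($char, 'UTF-8');

// bin2hex() makes the raw bytes visible (two hex digits per byte).
$hex = bin2hex($char);

echo $bytes . '<br />'; // 3
echo $chars . '<br />'; // 1
echo $hex . '<br />';   // six hex digits, one pair per byte
```

This is why byte functions like strlen() and substr() are unsafe for counting or slicing non-ASCII text, while their mb_* counterparts work on characters.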

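Output Example (the initial ISO-8859-1 value is the default on PHP versions before 5.6; since PHP 5.6 the internal encoding defaults to UTF-8 via default_charset, so the third line also prints 1):

ISO-8859-1   // initial internal encoding
1            // mb_strlen with 'UTF-8' passed explicitly
3            // mb_strlen under ISO-8859-1: each byte counted as a character
UTF-8        // after setting the internal encoding
1            // mb_strlen now correct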

Explanation:

Called with no arguments, mb_internal_encoding() returns the encoding that mb_* functions use when no encoding argument is given.

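mb_strlen() counts characters correctly only when the encoding it uses matches the string's actual encoding. '鉑' is three bytes in UTF-8, so under a single-byte encoding like ISO-8859-1 each byte is counted as one character, giving 3.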

Setting mb_internal_encoding('UTF-8') makes every subsequent mb_* call that omits the encoding argument treat strings as UTF-8.
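The same applies to other mb_* functions that take an optional encoding, such as mb_substr(). The following is a minimal sketch that contrasts it with the byte-based substr(). It again assumes a UTF-8 source file, and the sample string '鉑a' is only illustrative:

```php
<?php
// Assumes this file is saved as UTF-8.
mb_internal_encoding('UTF-8');

$text = '鉑a'; // 4 bytes: 3 for '鉑', 1 for 'a'

// substr() slices bytes: taking 1 byte cuts '鉑' apart,
// leaving an invalid UTF-8 fragment.
$byteSlice = substr($text, 0, 1);

// mb_substr() without an encoding argument uses the internal
// encoding (UTF-8 here) and returns the whole first character.
$charSlice = mb_substr($text, 0, 1);

// mb_check_encoding() reports whether a string is valid in the given encoding.
var_dump(mb_check_encoding($byteSlice, 'UTF-8')); // bool(false)
var_dump($charSlice === '鉑');                    // bool(true)
```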

✅ Always set the correct internal encoding (or pass the encoding explicitly) when working with non-ASCII characters.