PHP mb_substr() Function – Multibyte-Safe Alternative to substr()

Concept of the `mb_substr()` Function

The mb_substr() function extracts part of a string, starting at a specified character position (offset) and continuing for a specified number of characters. It returns the extracted substring and counts characters rather than bytes, so multibyte characters are never split.

While mb_substr() performs almost the same role as substr(), its key advantage is that it safely handles multibyte strings — such as those encoded in UTF-8 — without corrupting characters.

Encoding Issues with the `substr()` Function

When working with strings containing multibyte characters—such as Japanese, Chinese, Korean, or emoji—using the substr() function can lead to unexpected results. This is because substr() operates on byte offsets rather than character counts.

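For example, consider the Japanese string 'こんにちは世界' ("Hello, World" in Japanese). In UTF-8 each of its seven characters occupies three bytes, so any substr() call whose byte offset or length is not a multiple of three cuts a character in half.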
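The byte-level behavior can be sketched as follows. This is a minimal illustration that assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are chosen only for this example.

```php
<?php
// Assumption: this file is saved as UTF-8.
$text = 'こんにちは世界';

// strlen() counts bytes: 7 characters x 3 bytes each.
echo strlen($text), "\n"; // 21

// Four bytes: all of 'こ' plus only the first byte of 'ん'.
$broken = substr($text, 0, 4);

// mb_check_encoding() reports whether the bytes form valid UTF-8.
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Six bytes happen to end on a character boundary, so this cut is intact.
$lucky = substr($text, 0, 6);
echo $lucky, "\n"; // こん
```

The second call only works because six is a multiple of three; with mixed ASCII and Japanese text there is no such fixed multiple to rely on.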

Fixing Encoding Issues with the `mb_substr()` Function

In UTF-8, characters such as Japanese, Korean, or Chinese occupy multiple bytes each, whereas English letters and digits occupy only one byte. Other multibyte encodings behave similarly: some characters need two or more bytes.

Because substr() cuts strings at the byte level, a cut that falls inside a multibyte character leaves a partial byte sequence, which shows up as corrupted or unreadable output.

The mb_substr() function solves this problem by operating on character counts rather than byte offsets, ensuring safe and accurate substring extraction.
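As a small sketch of the difference, the same string can be cut by character count instead. The example assumes UTF-8 input and passes the encoding explicitly so that it does not depend on the server configuration.

```php
<?php
// Assumption: this file is saved as UTF-8.
$text = 'こんにちは世界';

// Take the first two characters, regardless of how many bytes they use.
$firstTwo = mb_substr($text, 0, 2, 'UTF-8');

echo $firstTwo, "\n";                            // こん
echo mb_strlen($firstTwo, 'UTF-8'), "\n";        // 2 (characters)
echo strlen($firstTwo), "\n";                    // 6 (bytes)
var_dump(mb_check_encoding($firstTwo, 'UTF-8')); // bool(true)
```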

Note:
The mb_substr() function is a safe, multibyte-aware alternative to substr().

Syntax

PHP

mb_substr(string $string, int $start, ?int $length = null, ?string $encoding = null): string

Parameters

Parameters of the `mb_substr()` function
Parameter	Description
`$string`	Required. The original string from which the substring will be extracted.
`$start`	Required. The starting position for extraction. The index starts at `0`, meaning the first character of the string is at index `0`. If a negative value is used, it counts backward from the end of the string. For example, `-1` refers to the last character, and `-2` refers to the second-to-last character.
`$length`	Optional. The number of characters to extract. The default value is `null`, which means extracting all characters from the start position to the end of the string.
`$encoding`	Optional. The character encoding of `$string`. If it is omitted or `null`, the internal character encoding is used.
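The following sketch shows how the optional parameters might be combined. It assumes UTF-8 input and calls mb_internal_encoding() first so that the calls without an explicit `$encoding` behave predictably; whether that setup call is needed depends on the server configuration.

```php
<?php
// Assumption: this file is saved as UTF-8.
$text = 'こんにちは世界';

// Set the internal encoding that is used when $encoding is omitted.
mb_internal_encoding('UTF-8');

// $length omitted: everything from index 5 to the end of the string.
$tail = mb_substr($text, 5);
echo $tail, "\n"; // 世界

// $length given as null, $encoding passed explicitly: same result.
$tailExplicit = mb_substr($text, 5, null, 'UTF-8');
echo $tailExplicit, "\n"; // 世界

// All four parameters: three characters starting at index 2.
$middle = mb_substr($text, 2, 3, 'UTF-8');
echo $middle, "\n"; // にちは
```

Passing the encoding explicitly is the more defensive choice when code may run under different configurations.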

Return Values

The mb_substr() function returns the extracted substring. If `$start` lies beyond the end of the string, the result is an empty string.
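Two boundary cases illustrate the return value. This is a brief sketch assuming UTF-8 input, not an exhaustive list of edge cases.

```php
<?php
// Assumption: this file is saved as UTF-8. The string has 7 characters.
$text = 'こんにちは世界';

// Start position past the end of the string: an empty string comes back.
$empty = mb_substr($text, 10, 3, 'UTF-8');
var_dump($empty); // string(0) ""

// Requested length runs past the end: only the available characters come back.
$short = mb_substr($text, 5, 10, 'UTF-8');
echo $short, "\n"; // 世界
```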

Changelog

Version history for `mb_substr()` Function
Version	Description
8.0.0	The `$encoding` parameter is now nullable. When it is `null` or omitted, the function uses the internal character encoding.

Practical Examples

The following examples demonstrate how the mb_substr() function behaves in various scenarios—including basic usage with multibyte characters and working with negative offset values.

Basic Usage

PHP

$originalString = 'こんにちは、はじめまして！';
$start = 3; // Starting from the 4th character (0-based index)
$length = 5; // Extract 5 characters

$extractedString = mb_substr($originalString, $start, $length);

echo 'Extracted substring: ' . $extractedString;
// Output: Extracted substring: ちは、はじ

In this example, mb_substr() extracts 5 characters starting from the 4th character of a Japanese string. Since it counts characters (not bytes), it preserves the integrity of multibyte characters like Japanese kana and punctuation.

Using Negative Values for the `$start` Parameter: Counting from the End of the String

When the $start parameter is a negative number, mb_substr() starts counting from the end of the string. For example, -1 refers to the last character, -2 to the second-to-last, and so on.

PHP

$originalString = 'こんにちは、はじめまして！';
$start = -6; // Start 6 characters from the end
$length = 4; // Extract 4 characters

$extractedString = mb_substr($originalString, $start, $length);

echo 'Extracted substring: ' . $extractedString;
// Output: Extracted substring: じめまし

Here, $start is set to -6, which tells mb_substr() to start from the sixth character from the end of the string. It then extracts four characters, resulting in a clean, valid substring with no broken multibyte characters.
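A negative `$start` can also be combined with an omitted `$length` to take the last few characters of a string. The sketch below assumes UTF-8 input and reuses the string from the example above.

```php
<?php
// Assumption: this file is saved as UTF-8. The string has 13 characters.
$originalString = 'こんにちは、はじめまして！';

// Start 3 characters from the end and read to the end of the string.
$lastThree = mb_substr($originalString, -3, null, 'UTF-8');

echo 'Last three characters: ' . $lastThree;
// Output: Last three characters: して！
```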
