Serializing and unserializing data in PHP

Apr 19, 2024 by Jeroen Deviaene

Every programmer has encountered the issue of having to store objects in a database or cache. You always need some kind of encoding and decoding to turn your complex data into a primitive type that can be interpreted by other systems. Usually, the go-to solution is JSON. However, JSON loses typing information such as the class an object belonged to.

Luckily, PHP has a built-in solution that stores this information and easily unserializes it to any form of PHP object. The format is similar to JSON but with a few special alterations. For a personal project, I needed to dig deeper into how this serialization works and its format. Let’s dive in!

Serialization

To serialize data, PHP provides the aptly named serialize function. You can pass almost any variable or value to this function to have it serialized in the PHP format. The result looks like the example below.

O:15:"App\Models\User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}

Serializing scalar values

Scalar values (int, float, string, or bool) are the easiest to serialize as they always consist of a single value. Each of these is serialized as a prefix indicating the type of the value, followed by a colon and the actual value. There is a slight deviation from this format for strings.

serialize(33);      // i:33;
serialize(3.14);    // d:3.14;
serialize(true);    // b:1;

For strings, a number is added between the type and the value that indicates the length of the string that follows.

serialize('lorem ipsum');   // s:11:"lorem ipsum";

null is not a scalar value, but I include it here as it is serialized to an equally simple string:

serialize(null);    // N;
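A few edge cases are worth seeing next to the examples above. The snippet below is a small sketch, not an exhaustive list: it shows that false becomes 0, that the string length counts bytes rather than characters (assuming UTF-8, where "é" takes two bytes), and that the type survives a round trip. The variable names are only illustrative.

```php
<?php
// Booleans are stored as 1 or 0.
$false = serialize(false);                  // b:0;

// The length prefix counts bytes, not characters.
// "\u{e9}" is "é", which is two bytes in UTF-8, so "café" has length 5.
$accented = serialize("caf\u{e9}");         // s:5:"café";

// The type prefix means an int comes back as an int, not as a string.
$roundTrip = unserialize(serialize(33));    // int 33
```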

Serializing arrays

Arrays in PHP are more like maps or dictionaries with key-value pairs and are also serialized as such. Serializing an array starts with an a: prefix, followed by the number of elements in the array and a list of all key-value elements. This list is just a string of serialized values, alternating between key and value. So an array with a single string element will first have the array key 0, followed by the string value.

serialize(['foo']);     // a:1:{i:0;s:3:"foo";}

If this array were to contain any more elements, they would simply be added after the foo string. First, the key (probably 1), followed by the value for this key.
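As a short sketch of the rule described above, the following shows a list with two elements, an array with string keys, and a nested array. The array contents are made up for illustration.

```php
<?php
// Two list elements: keys 0 and 1 are written out explicitly.
$list = serialize(['foo', 'bar']);
// a:2:{i:0;s:3:"foo";i:1;s:3:"bar";}

// String keys are serialized like any other string.
$map = serialize(['name' => 'Jerodev', 'age' => 33]);
// a:2:{s:4:"name";s:7:"Jerodev";s:3:"age";i:33;}

// A nested array is simply a serialized array in the value position.
$nested = serialize(['tags' => ['php']]);
// a:1:{s:4:"tags";a:1:{i:0;s:3:"php";}}
```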

Serializing objects

Objects are serialized much like arrays, with each key being a string that holds the property name. The prefix also contains the fully qualified class name of the serialized object, followed by a count of the properties the object contains. Keep in mind that a nested object counts as one property of its parent and carries its own class name and count. Putting this all together, the serialized string for a simple object looks as follows.

class User 
{
    public string $username = 'Jerodev';
    public int $age = 33;
}


serialize(new User());  // O:4:"User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}
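To see how an object inside another object ends up in the string, here is a minimal sketch. The Customer and Address classes are hypothetical and exist only for this example. The outer object reports two properties, and the nested object appears as a complete serialized object in the value position.

```php
<?php
// Hypothetical classes, used only to illustrate nesting.
class Address
{
    public string $city = 'Ghent';
}

class Customer
{
    public string $name = 'Jeroen';
    public Address $address;

    public function __construct()
    {
        $this->address = new Address();
    }
}

$serialized = serialize(new Customer());
// O:8:"Customer":2:{s:4:"name";s:6:"Jeroen";s:7:"address";O:7:"Address":1:{s:4:"city";s:5:"Ghent";}}
```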

Unserializing

The unserialize function does the exact opposite of the serialize function, so it should return a value equal to the one you serialized. Because the class name is part of the serialized string, this also works with objects and namespaced classes.

> unserialize('O:4:"User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')

= User {#5896}

While unserializing, the function also performs a number of checks, such as verifying that the number of fields in an object is correct and that array and string lengths match. If any of these fail, the unserialize function issues an E_WARNING (an E_NOTICE before PHP 8.3) and returns false.

> unserialize('O:4:"User":3:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')

    WARNING  unserialize(): Unexpected end of serialized data.
    WARNING  unserialize(): Error at offset 58 of 59 bytes.

= false
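Because false is both the failure value and a perfectly valid unserialized value, calling code has to tell the two apart. The helper below is one possible sketch, assuming PHP 8.0 or later for the mixed return type. The name decodeOrFail and the choice of exception are my own and not part of PHP.

```php
<?php
// Hypothetical helper: turns a failed unserialize() into an exception.
function decodeOrFail(string $payload): mixed
{
    // A serialized boolean false is valid input, so handle it before the failure check.
    if ($payload === 'b:0;') {
        return false;
    }

    // "@" silences the warning; allowed_classes => false means no objects are instantiated.
    $value = @unserialize($payload, ['allowed_classes' => false]);

    if ($value === false) {
        throw new UnexpectedValueException('Payload is not valid serialized data.');
    }

    return $value;
}

$ok = decodeOrFail('a:1:{i:0;s:3:"foo";}');    // ['foo']

try {
    decodeOrFail('s:5:"foo";');                // declared length does not match the string
    $failed = false;
} catch (UnexpectedValueException $e) {
    $failed = true;
}
```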

If you pass a serialized object whose class does not exist in the current project, unserialize() will return an object of the type __PHP_Incomplete_Class with the properties defined in the serialized string.

> unserialize('O:7:"Unknown":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}')

= __PHP_Incomplete_Class(Unknown) {#5893
    +username: "Jerodev",
    +age: 33,
  }

Security when unserializing

A concern when unserializing objects is that a bad actor could craft a serialized string that instantiates a class you did not expect, with property values of their choosing, and trigger malicious behavior through that class. To make sure the unserialized object is of an expected type, you can pass the function a list of allowed classes that may be unserialized. This is always recommended when unserializing objects.

\unserialize('O:15:"App\Models\User":2:{s:8:"username";s:7:"Jerodev";s:3:"age";i:33;}', [
    'allowed_classes' => [
        App\Models\User::class,
    ],
]);

If a class is detected in the serialized string that is not part of the allowed_classes, a __PHP_Incomplete_Class will be returned.
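The effect of the option can be checked with instanceof. This is a small sketch using a hypothetical Invoice class: the same payload is unserialized once with no classes allowed and once with the class on the list.

```php
<?php
// Hypothetical class, used only for this example.
final class Invoice
{
    public int $total = 10;
}

$payload = serialize(new Invoice());   // O:7:"Invoice":1:{s:5:"total";i:10;}

// allowed_classes => false: no class may be instantiated, even one that exists.
$blocked = unserialize($payload, ['allowed_classes' => false]);
$isIncomplete = $blocked instanceof __PHP_Incomplete_Class;   // true

// With the class on the list, a real Invoice comes back.
$allowed = unserialize($payload, ['allowed_classes' => [Invoice::class]]);
$isInvoice = $allowed instanceof Invoice;                      // true
```

Checking the result with instanceof before using it is a cheap way to reject payloads that contained something unexpected.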

Magic serialization methods

PHP contains a few methods that can be added to classes to expand serialization functionality for objects of this class.

If the __sleep method is defined, it will be called before serializing the object. This method should be used to clean up the object before serialization and must return an array of strings. These strings are all properties that should be serialized for this object. Any property name that is not in the array will be omitted from the serialized string.

You can also define a __wakeup method. This method is called on the unserialized object directly after it is created. It can be used to run any initialization the object needs before it is used further in the code.
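The two methods can be sketched together as follows. The Session class and its properties are invented for this example: __sleep keeps the token out of the serialized string, and __wakeup records when the object was restored.

```php
<?php
// Hypothetical class, used only to illustrate __sleep and __wakeup.
final class Session
{
    public string $user;
    public ?string $token = null;   // secret, should not be stored
    public int $restoredAt = 0;     // filled in by __wakeup

    public function __construct(string $user, string $token)
    {
        $this->user = $user;
        $this->token = $token;
    }

    // Only the properties named here end up in the serialized string.
    public function __sleep(): array
    {
        return ['user'];
    }

    // Runs right after unserialize() has created the object.
    public function __wakeup(): void
    {
        $this->restoredAt = time();
    }
}

$stored = serialize(new Session('Jerodev', 'secret'));
// O:7:"Session":1:{s:4:"user";s:7:"Jerodev";}

$copy = unserialize($stored, ['allowed_classes' => [Session::class]]);
// $copy->token is null again because it was never serialized.
```

Note that the constructor is not called during unserialize, so omitted properties fall back to their declared defaults.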

In conclusion

While the serialization of PHP objects is a great way of storing and restoring data, its greatest problem is that the format is specific to PHP. A string generated in this format cannot be read by any other programming language without writing a parser for it.

On the other hand, this is currently the best way to store PHP objects without the need for a third-party package.

The rule of thumb seems to be to use this only if you are 100% certain that no other programming language will ever need the data generated by the serialize function. And this is where I was wrong, so now I am creating a parser for serialized PHP objects in Go. 😉