In my hunt for the perfect clean url (smart url, slug, permalink, whatever) generator I’ve always slipped in some exception or bug that made the function a piece of junk. But I recently found an easy solution I hope I could call “definitive”.

Clean url generators are crucial for search engine optimization or just to tidy up the site navigation. They are even more important if you work with international characters, accented vowels /à, è, ì, .../, cedilla /ç/, dieresis /ë/, tilde /ñ/ and so on.

First of all we need to strip all special characters and punctuation away. This is easily accomplished with something like:

function toAscii($str) {
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $str);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", '-', $clean);

	return $clean;

With our toAscii function we can convert a string like “Hi! I’m the title of your page!” to hi-im-the-title-of-your-page. This is nice, but what happens with a title like “A piñata is a paper container filled with candy”?

The result will be a-piata-is-a-paper-container-filled-with-candy, which is not cool. We need to convert all special characters to the closest ascii character equivalent.

There are many ways to do this, maybe the easiest is by using iconv.

setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str) {
	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_| -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_| -]+/", '-', $clean);

	return $clean;

I always work with UTF-8 but you can obviously use any character encoding recognized by your system. Thepiñata text is now transliterated into a-pinata-is-a-paper-container-filled-with-candy. Lovable.

If they are not Spanish, users will hardly search your site for the word piñata, they will most likely search forpinata. So you may want to store both versions in your database. You may have a title field with the actual displayed text and a slug field containing its ascii version counterpart.

We can add a delimiter parameter to our function so we can use it to generate both clean urls and slugs (in newspaper editing, a slug is a short name given to an article that is in production, source).

setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str, $delimiter='-') {
	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);

	return $clean;

// echo toAscii(“A piñata is a paper container filled with candy.”, ‘ ‘);
// returns: a pinata is a paper container filled with candy

There’s one more thing. The string “I’ll be back!” is converted to ill-be-back. This may or may not be an issue depending on your application. If you use the function to generate a searchable slug for example, looking for “ill” would return the famous Terminator quote that probably isn’t what you wanted.

setlocale(LC_ALL, 'en_US.UTF8');
function toAscii($str, $replace=array(), $delimiter='-') {
	if( !empty($replace) ) {
		$str = str_replace((array)$replace, ' ', $str);

	$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
	$clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
	$clean = strtolower(trim($clean, '-'));
	$clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);

	return $clean;

You can now pass custom delimiters to the function. Calling toAscii("I'll be back!", "'") you’ll get i-ll-be-back. Also note that the apostrophe is replaced before the string is converted to ascii as character encoding conversion may lead to weird results, for example é is converted to 'e, so the apostrophe needs to be parsed before the string is mangled by iconv.

The function seems now complete. Lets stress test it.

echo toAscii("Mess'd up --text-- just (to) stress /test/ ?our! `little` \\clean\\ url fun.ction!?-->");

returns: messd-up-text-just-to-stress-test-our-little-clean-url-function

echo toAscii("Perché l'erba è verde?", "'"); // Italian

returns: perche-l-erba-e-verde

echo toAscii("Peux-tu m'aider s'il te plaît?", "'"); // French

returns: peux-tu-m-aider-s-il-te-plait

echo toAscii("Tänk efter nu – förr'n vi föser dig bort"); // Swedish

returns: tank-efter-nu-forrn-vi-foser-dig-bort

echo toAscii("ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöùúûüýÿ");

returns: aaaaaaaeceeeeiiiidnooooouuuuyssaaaaaaaeceeeeiiiidnooooouuuuyy

echo toAscii("Custom`delimiter*example", array('*', '`'));

returns: custom-delimiter-example

echo toAscii("My+Last_Crazy|delimiter/example", '', ' ');

returns: my last crazy delimiter example

I’m sure we are far from perfection and probably some php/regex guru will soon bury me under my ignorance suggesting an über-simple alternative to my function. What do you thing?

Post source author: Andre, from



  1. Uberstrike Hackre · October 19, 2013

    I’m curious to find out what blog system you’re working with?

    I’m having some small security problems with my latest blog and I would like to find something more risk-free.
    Do you have any suggestions?

  2. Hearthstone · October 19, 2013

    I will right away grab your rss feed as I can’t find your e-mail subscription
    hyperlink or e-newsletter service. Do you’ve any?
    Please let me recognise so that I may subscribe. Thanks.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s