URL Validation in WordPress

If your WordPress website allows users to submit URLs and you’re not sanitizing them properly, you could have a whole host of security problems. On the flip side, if you strip out too much, you may end up rejecting valid URLs too.

This issue is pretty complex, and there’s quite a bit of confusion surrounding it, but it actually has a really simple solution in WordPress.

tl;dr take-away

If your WordPress website/theme/plugin accepts user input of a URL, you need to either:

  • sanitize it using esc_url_raw($url) before saving it, or
  • validate it using esc_url_raw($url) === $url, and reject it if it fails validation

Don’t use filter_var($url, FILTER_VALIDATE_URL)!

A Problematic Issue

Apparently, determining what’s a valid URL in PHP is a struggle. And it has been for a while.

“Just use filter_var” It’s So Easy…

The “PHP approved” way of doing it is to use PHP’s built-in function filter_var($url, FILTER_VALIDATE_URL). That’s the most commonly accepted answer on Stack Overflow. So, problem solved, right?

Actually, using filter_var has a number of problems:

  • it rejects URLs with underscores, e.g. http://my_site.com
  • it accepts URLs that could lead to cross-site scripting (XSS) attacks, e.g. http://example.com/">
  • it rejects URLs with multi-byte characters, e.g. http://스타벅스코리아.com
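
To see this for yourself, here’s a minimal sketch that feeds the three example URLs above to filter_var(). The $urls and $results names are just for illustration, and results may differ between PHP versions:

```php
<?php
// Example URLs copied from the list above.
$urls = array(
    'underscore' => 'http://my_site.com',
    'xss'        => 'http://example.com/">',
    'multibyte'  => 'http://스타벅스코리아.com',
);

// filter_var() returns the URL string on success and false on failure,
// so "!== false" turns the result into a simple accepted/rejected flag.
$results = array();
foreach ( $urls as $label => $url ) {
    $results[ $label ] = filter_var( $url, FILTER_VALIDATE_URL ) !== false;
}

// Print one line per URL so the behavior is easy to eyeball.
foreach ( $results as $label => $accepted ) {
    printf( "%-10s %s\n", $label, $accepted ? 'accepted' : 'rejected' );
}
```

On the PHP versions I’m aware of, this reports the underscore and multi-byte URLs as rejected and the XSS-style URL as accepted, which matches the list above.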

The problems with filter_var are explained better in this article, and discussed extensively on php.net’s documentation page.

Some have argued that filter_var is technically correct about what’s a valid or invalid URL. But really what we want is a **safe** URL, not just a technically valid one. And filter_var doesn’t do much to verify the URL is safe.

So that’s no good.

Just Make a Regex…

How hard is it to make a Regular Expression to validate URLs? Some poor soul asked that question once on Stack Overflow, and received a barrage of 19 answers. There was really no universally accepted answer (the most popular one said to use filter_var, and the accepted answer had other issues). Besides, I personally find regexes impossible to understand.

Just Find a Library to Do That…

Mika Epstein blogged about her struggles to validate a URL. She found a PHP library that mostly did it, but it still required some tweaking.

If you’re not using WordPress, you’re right to look for a pre-made library to do this, because it’s not straightforward…

I personally was quite unsatisfied that such a common task had no well-documented, good solution.

WordPress’ built-in Solution to Validating URLs

It turns out WordPress has a good option that’s super simple: esc_url_raw(). As documented here on wordpress.org, the function is meant to sanitize a URL before saving it to the database (not to prepare it for output on the screen; that’s what esc_url() is for).
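
As a rough sketch of that split, assuming a plugin that stores a single URL in an option: the function names and the option name my_plugin_homepage_url below are made up for illustration, while esc_url_raw(), esc_url(), update_option() and get_option() are the real WordPress functions.

```php
<?php
/**
 * Hypothetical save handler: sanitize with esc_url_raw() before storing.
 * The option name 'my_plugin_homepage_url' is invented for this sketch.
 */
function my_plugin_save_homepage_url( $submitted_url ) {
    $clean_url = esc_url_raw( $submitted_url );
    update_option( 'my_plugin_homepage_url', $clean_url );
    return $clean_url;
}

/**
 * Hypothetical output helper: escape with esc_url() at print time,
 * even though the stored value was already sanitized on the way in.
 */
function my_plugin_homepage_link() {
    $url = get_option( 'my_plugin_homepage_url', '' );
    return '<a href="' . esc_url( $url ) . '">Homepage</a>';
}
```

This needs to run inside WordPress, since the four core functions it calls are only defined there.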

Technically, the function is for sanitizing URLs (i.e., removing bad stuff from them), not validating them (asserting whether or not they’re valid). But you can use it for validating by checking whether sanitizing changes anything: esc_url_raw($url) === $url.
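
As a minimal sketch, that check could be wrapped in a small helper. The name my_plugin_is_valid_url() is hypothetical, and rejecting empty strings is my own assumption (esc_url_raw('') returns '', so the bare comparison would treat an empty string as valid):

```php
<?php
/**
 * Hypothetical helper: a URL counts as valid only if esc_url_raw()
 * leaves it completely unchanged.
 *
 * The is_string() and empty-string checks are assumptions added for
 * this sketch; drop them if an empty value should pass.
 */
function my_plugin_is_valid_url( $url ) {
    return is_string( $url )
        && '' !== $url
        && esc_url_raw( $url ) === $url;
}
```

A form handler would call this on the submitted value and reject the submission when it returns false, rather than silently saving a modified URL.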

If the URL had nothing invalid in it, then it’s valid. Pretty simple, eh?

And it works well too. None of the criticisms of filter_var apply to it. We ran it through some unit tests and I have yet to see any problems with it.

Invalid URLs According to esc_url_raw():

Valid URLs According to esc_url_raw():

A Better-Sounding, but Inferior, Alternative

There’s also a better-sounding function, wp_http_validate_url(). But from my testing, it found http://localhost invalid, when it should be valid. And it found URLs like http://example.com/"<script>alert("xss")<script> to be valid.

The light documentation says this function is primarily meant for validating a URL for use in the WP HTTP API, not for storing a user-submitted URL. So although its name sounds better, it’s probably not what you’re looking for, unless you’re using the WP HTTP API.

Conclusion

The esc_url_raw() function is used to ensure that the website URLs of commenters on WordPress websites are safe. In other words, it’s used to sanitize input from public users on WordPress sites, which make up over 30% of the web, so it’s pretty battle-tested. If there were a security problem with it, or if it were rejecting valid URLs, I’m pretty sure it would have already been discovered.

So is a URL valid? Just check esc_url_raw($url) === $url.

Thoughts on this?