URL Validation in WordPress

If your WordPress website allows users to submit URLs and you’re not sanitizing them properly, you could have a whole host of security problems. On the flip side, if you’re removing too much, you might end up rejecting valid URLs.

This issue is pretty complex, and there’s quite a bit of confusion surrounding it, but it actually has a really simple solution in WordPress.

tl;dr: Take-away

If your WordPress website/theme/plugin accepts user input of a URL, you need to either:

  • sanitize it using esc_url_raw($url) before saving it
  • validate it by checking esc_url_raw($url) === $url, and reject it if that check fails

Don’t use filter_var($url, FILTER_VALIDATE_URL)!

A Problematic Issue

Apparently, determining what’s a valid URL in PHP is a struggle. And it has been for a while.

“Just use filter_var” It’s So Easy…

The “PHP-approved” way of doing it is to use PHP’s built-in function filter_var($url, FILTER_VALIDATE_URL). That’s the most commonly accepted answer on Stack Overflow. So, problem solved, right?

Actually, using filter_var has a number of problems:

  • it rejects host names containing underscores, eg http://my_site.com
  • it accepts URLs that could lead to cross-site scripting (XSS) attacks, eg http://example.com/">
  • it rejects URLs with multi-byte characters, eg http://스타벅스코리아.com
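To check these claims against your own PHP version, a short script like the sketch below can help. It only uses PHP’s built-in filter_var(), so it runs without WordPress; the variable names are just for illustration.

```php
<?php
// Plain PHP (no WordPress needed): run the example URLs from the list
// above through filter_var() and record whether each one is accepted.
$examples = [
    'http://my_site.com',        // underscore in the host name
    'http://example.com/">',     // quote and angle bracket in the path
    'http://스타벅스코리아.com',   // multi-byte host name
];

$results = [];
foreach ( $examples as $url ) {
    // filter_var() returns the filtered URL on success and false on failure.
    $results[ $url ] = false !== filter_var( $url, FILTER_VALIDATE_URL );
    echo $url, ' => ', $results[ $url ] ? 'accepted' : 'rejected', PHP_EOL;
}
```

If the list above holds for your PHP version, the first and third URLs are reported as rejected and the second as accepted.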

The problems with filter_var are explained better in this article, and discussed extensively on php.net’s documentation page.

Some have argued that filter_var is technically correct about what’s a valid or invalid URL. But really what we want is a **safe** URL, not just a technically valid one. And filter_var doesn’t do much to verify the URL is safe.

So that’s no good.

Just Make a Regex…

How hard is it to write a regular expression to validate URLs? Some poor soul once asked that question on Stack Overflow and received a barrage of 19 answers. There was no universally accepted answer (the most popular one said to use filter_var, and the accepted answer had other issues). Besides, I personally find regexes impossible to understand.

Just Find a Library to Do That…

Mika Epstein blogged about her struggles to validate a URL. She found a PHP library that mostly did it, but it still required some tweaking.

If you’re not using WordPress, you’re right to look for a pre-made library to do this, because it’s not straightforward…

I personally was quite unsatisfied that such a common task had no well-documented, good solution.

WordPress’s Built-in Solution for Validating URLs

It turns out WordPress has a good option that’s super simple: esc_url_raw(). As documented on wordpress.org, the function is meant to sanitize a URL before saving it to the database (not to prepare it for output on the screen; that’s what esc_url() is for).
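A minimal sketch of that save/output split, assuming the code runs inside WordPress (a plugin or theme) where esc_url_raw(), esc_url() and esc_html() are loaded. The myplugin_* helper names are hypothetical; in a real plugin, the value returned by the first helper is what you would pass to something like update_option() or update_post_meta().

```php
<?php
/**
 * Sketch of sanitizing on save and escaping on output.
 * Assumes WordPress is loaded; the myplugin_* names are hypothetical.
 */

// Before saving: esc_url_raw() sanitizes the URL for storage.
function myplugin_prepare_url_for_db( string $submitted ): string {
    return esc_url_raw( trim( $submitted ) );
}

// Before printing: esc_url() escapes the stored URL for the href attribute,
// and esc_html() escapes the same value for use as the visible link text.
function myplugin_url_link_html( string $stored ): string {
    return sprintf(
        '<a href="%s">%s</a>',
        esc_url( $stored ),
        esc_html( $stored )
    );
}
```

If the submitted value comes straight from $_POST, keep in mind that WordPress adds slashes to superglobals, so it would typically go through wp_unslash() before reaching the first helper.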

Technically, the function is for sanitizing URLs (ie, removing bad stuff from them), not validating them (asserting whether or not they’re valid). But you can use it for validation by checking whether sanitizing the URL changed anything: esc_url_raw($url) === $url.
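Wrapped up as a reusable helper, the check might look like the sketch below. It assumes it runs inside WordPress, where esc_url_raw() is available; the name myplugin_is_valid_url() is hypothetical, and the explicit empty-string check is an extra safeguard beyond the bare comparison, since esc_url_raw('') returns '' and would otherwise pass.

```php
<?php
/**
 * Treat a URL as valid only if esc_url_raw() leaves it unchanged.
 * Assumes WordPress is loaded; myplugin_is_valid_url() is a hypothetical name.
 */
function myplugin_is_valid_url( string $url ): bool {
    // esc_url_raw( '' ) returns '', so reject empty input explicitly.
    if ( '' === $url ) {
        return false;
    }
    // If sanitizing changed anything, the input contained something
    // esc_url_raw() considers unsafe, so reject it instead of storing it.
    return esc_url_raw( $url ) === $url;
}
```

One consequence worth knowing: esc_url_raw() prepends http:// to input that has no scheme and doesn’t start with /, # or ?, so something like example.com fails this check, while a relative path like /about passes unchanged.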

If sanitizing didn’t change anything, the URL had nothing invalid in it, so it’s valid. Pretty simple, eh?

And it works well too. None of the criticisms of filter_var apply to it. We ran it through some unit tests and I have yet to see any problems with it.

Invalid URLs according to esc_url_raw():

Valid URLs according to esc_url_raw():

A Better-Sounding, but Inferior, Alternative

There’s also a better-sounding function, wp_http_validate_url(). But in my testing, it found http://localhost invalid, when it should be valid. And it found URLs like http://example.com/"<script>alert("xss")<script> to be valid.

Its light documentation says this function is primarily meant for validating a URL for use in the WP HTTP API, not for storing a user-submitted URL. So although its name sounds better, it’s probably not what you’re looking for unless you’re using the WP HTTP API.
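To run a comparison like this on your own inputs, a helper along these lines could apply both checks side by side. It assumes WordPress is loaded (for example, by running the file with WP-CLI’s wp eval-file); myplugin_compare_url_checks() is a hypothetical name.

```php
<?php
/**
 * Sketch: compare the esc_url_raw() round-trip check with
 * wp_http_validate_url() on the same inputs.
 * Assumes WordPress is loaded; the function name is hypothetical.
 */
function myplugin_compare_url_checks( array $urls ): array {
    $report = array();
    foreach ( $urls as $url ) {
        $report[ $url ] = array(
            // Valid if sanitizing leaves the URL unchanged.
            'esc_url_raw'          => esc_url_raw( $url ) === $url,
            // wp_http_validate_url() returns the URL on success, false on failure.
            'wp_http_validate_url' => false !== wp_http_validate_url( $url ),
        );
    }
    return $report;
}
```

For example, print_r( myplugin_compare_url_checks( array( 'http://localhost' ) ) ) would show both verdicts for that URL; results may differ between WordPress versions.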

Conclusion

The esc_url_raw() function is what WordPress uses to make sure the website URLs that commenters submit are safe. That is, it sanitizes input from the public on sites that make up over 30% of the web, so it’s pretty battle-tested. If it had a security problem, or were rejecting valid URLs, I’m pretty sure that would have been discovered already.

So is a URL valid? Just check esc_url_raw($url) === $url.

Thoughts on this?

URLs with Unicode Have Arrived in Event Espresso!

Want to put Unicode characters in your event URLs? Eg Chinese characters like 中文, or even emojis? As of Event Espresso 4.9.66, you can!
Before pull request 635, we were technically handling these characters correctly (we were percent-encoding them), but WordPress seems to have a bug that prevented these URLs from working properly. So for now we’re instead, technically, creating IRIs, that is, URLs with Unicode characters in them. We checked the code, and the WooCommerce and Custom Post Type UI plugins both do the same thing: whatever Unicode characters you enter into custom post type slugs are left intact (ie, not percent-encoded).

When you enter the desired slug for the custom post type (eg a replacement for “events”), we still run sanitize_title() on it, which makes sure it’s a string that can be used in URLs. So we at least keep event managers from using totally invalid URLs.

So feel free to use any crazy characters you like in your URLs. Event Espresso will strip away the bad ones and leave only valid ones. Just be aware that while putting emojis in your URLs might be fun, it’s not a great idea to make potential attendees type emojis into their address bars by hand.