If your WordPress website allows users to submit URLs, and you’re not sanitizing them properly, you could have a whole host of security problems. On the flip side, if you’re removing too much, you might be rejecting valid URLs too.
This issue is pretty complex, and there’s quite a bit of confusion surrounding it, but it actually has a really simple solution in WordPress.
tl;dr; take-away
If your WordPress website/theme/plugin accepts user input of a URL, you need to either:
- sanitize it using esc_url_raw($url) before saving it
- validate it using esc_url_raw($url) === $url, and if it fails validation, reject it
Don’t use filter_var($url, FILTER_VALIDATE_URL)!
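As a rough sketch of those two options, here is what each could look like in a plugin. The function names, the meta key, and the error code are made up for illustration; only esc_url_raw(), update_user_meta(), and WP_Error come from WordPress itself.

```php
<?php
// Hypothetical helpers: the "myplugin_" names and the 'myplugin_website'
// meta key are assumptions for this example, not part of WordPress.

// Option 1: sanitize. Strip anything unsafe, then save whatever is left.
function myplugin_save_url_sanitized( $user_id, $submitted_url ) {
    $clean = esc_url_raw( trim( $submitted_url ) );
    update_user_meta( $user_id, 'myplugin_website', $clean );
    return $clean;
}

// Option 2: validate. Reject the input unless sanitizing leaves it unchanged.
function myplugin_save_url_validated( $user_id, $submitted_url ) {
    // An empty string comes back from esc_url_raw() unchanged, so check for it explicitly.
    if ( '' === $submitted_url || esc_url_raw( $submitted_url ) !== $submitted_url ) {
        return new WP_Error( 'myplugin_invalid_url', 'Please enter a valid URL.' );
    }
    update_user_meta( $user_id, 'myplugin_website', $submitted_url );
    return $submitted_url;
}
```

The first option quietly fixes the input, which is friendlier but may save something different from what the user typed. The second option gives the user a chance to correct the URL themselves.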
A Problematic Issue
Apparently, determining what’s a valid URL in PHP is a struggle. And it has been for a while.
“Just use filter_var” It’s So Easy…
The “PHP approved” way of doing it is to use PHP’s built-in function filter_var($url, FILTER_VALIDATE_URL). That’s the most commonly accepted answer on Stack Overflow. So, problem solved, right?
Actually, using filter_var has a number of problems:
- it rejects URLs with underscores, eg http://my_site.com
- it accepts URLs that could lead to Cross-Site-Scripting Attacks, eg http://example.com/">
- it rejects URLs with multi-byte characters, eg http://스타벅스코리아.com
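If you want to see this for yourself, a small plain-PHP script along these lines (no WordPress needed) runs the three example URLs above through filter_var. The helper name check_with_filter_var is just something I made up for the example, and results may differ between PHP versions, so treat it as a quick experiment rather than a definitive test.

```php
<?php
// Hypothetical helper: reports, for each URL, whether filter_var() accepts it.
function check_with_filter_var( array $urls ): array {
    $results = array();
    foreach ( $urls as $url ) {
        // filter_var() returns the URL string on success and false on failure.
        $results[ $url ] = false !== filter_var( $url, FILTER_VALIDATE_URL );
    }
    return $results;
}

// The example URLs from the list above, copied as-is.
$examples = array(
    'http://my_site.com',
    'http://example.com/">',
    'http://스타벅스코리아.com',
);

foreach ( check_with_filter_var( $examples ) as $url => $accepted ) {
    echo ( $accepted ? 'accepted' : 'rejected' ) . '  ' . $url . PHP_EOL;
}
```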
The problems with filter_var are explained better in this article, and discussed extensively on php.net’s documentation page.
Some have argued that filter_var is technically correct about what’s a valid or invalid URL. But really what we want is a **safe** URL, not just a technically valid one. And filter_var doesn’t do much to verify the URL is safe.
So that’s no good.
Just Make a Regex…
How hard is it to make a Regular Expression to validate URLs? Some poor soul asked that question once on Stack Overflow, and received a barrage of 19 answers. There was really no universally accepted answer (the most popular saying to use filter_var, and the accepted answer had other issues). Besides, I personally find regexes impossible to understand.
Just Find a Library to Do That…
Mika Epstein blogged about her struggles to validate a URL. She found a PHP library that mostly did it, but it still required some tweaking.
If you’re not using WordPress, you’re right to look for a pre-made library to do this, because it’s not straightforward…
I personally was quite unsatisfied that such a common task had no well-documented, good solution.
WordPress’ built-in Solution to Validating URLs
It turns out WordPress has a good option that’s super simple: esc_url_raw(). As documented here on wordpress.org, the function is meant to sanitize a URL before saving it to the database (not to prepare it for output on the screen; that’s what esc_url() is for).
Technically, the function is for sanitizing URLs (ie, removing bad stuff from them), not validating them (asserting whether or not they’re valid). But you can use it for validating like so:
    // A URL counts as valid only if sanitizing it changes nothing.
    function isValid($url)
    {
        return esc_url_raw($url) === $url;
    }
If sanitizing the URL didn’t change it, then there was nothing invalid in it, so it’s valid. Pretty simple eh?
And it works well too. None of the criticisms of filter_var apply to it. We ran it through some unit tests and I have yet to see any problems with it.
Invalid URLs according to esc_url_raw():
- http://example.com/"<script>alert("xss")<script>
- php://filter/read=convert.base64-encode/resource=/etc/passw
- foo://bar
- javascript://test%0Aalert(321)
Valid URLs according to esc_url_raw():
- http://foo.bar?foo=bar&other=thing
- http://스타벅스코리아.com
- http://localhost
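To repeat this kind of check on your own site, something like the following sketch should do. It assumes WordPress is already loaded, for example by running it through WP-CLI with wp eval-file check-urls.php (the file name is just an example). It prints a verdict for each URL rather than assuming one, so you can compare against the lists above.

```php
<?php
// Assumes WordPress is loaded, so esc_url_raw() is available.
// The URLs are the examples from the two lists above, copied as-is.
$urls = array(
    'http://example.com/"<script>alert("xss")<script>',
    'php://filter/read=convert.base64-encode/resource=/etc/passw',
    'foo://bar',
    'javascript://test%0Aalert(321)',
    'http://foo.bar?foo=bar&other=thing',
    'http://스타벅스코리아.com',
    'http://localhost',
);

foreach ( $urls as $url ) {
    // Same test as isValid(): valid means sanitizing changed nothing.
    $valid = esc_url_raw( $url ) === $url;
    printf( "%-8s %s\n", $valid ? 'valid' : 'invalid', $url );
}
```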
A Better-Sounding, but Inferior, Alternative
There’s also a better-sounding function, wp_http_validate_url(). But from my testing, it found http://localhost invalid, when it should be valid. And it found URLs like http://example.com/"<script>alert("xss")<script> to be valid.
The light documentation says this function is primarily meant for validating a URL for use in the WP HTTP API, not for storing a user-submitted URL. So although its name sounds better, it’s probably not what you’re looking for, unless you’re using the WP HTTP API.
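For completeness, here is a minimal sketch of the case where wp_http_validate_url() does seem to fit: checking a URL right before your server requests it through the WP HTTP API. The function name myplugin_fetch_remote_body and the error code are hypothetical; the other calls are WordPress functions.

```php
<?php
// Hypothetical helper: fetches the body of a remote URL, or returns a WP_Error.
function myplugin_fetch_remote_body( $url ) {
    // wp_http_validate_url() returns the URL on success and false on failure.
    if ( false === wp_http_validate_url( $url ) ) {
        return new WP_Error( 'myplugin_unsafe_url', 'That URL cannot be requested.' );
    }

    $response = wp_remote_get( $url, array( 'timeout' => 10 ) );
    if ( is_wp_error( $response ) ) {
        return $response;
    }

    return wp_remote_retrieve_body( $response );
}
```

In that context, rejecting http://localhost makes more sense: you usually don’t want a user-supplied URL to make your server call itself. It still says nothing about whether the URL is safe to store or display.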
Conclusion
The esc_url_raw() function is used to make sure the website URLs of commenters on WordPress websites are safe. Ie, it’s used to sanitize input from public users on a platform that runs over 30% of the web, so it’s pretty battle-tested. If there were a security problem with it, or it were rejecting valid URLs, I’m pretty sure it would have already been discovered.
So is a URL valid? Just check esc_url_raw($url) === $url.
Thoughts on this?