URL Encoding Explained: A Developer's Guide to Percent-Encoding

Learn how URL encoding works, why percent-encoding exists, which characters must be encoded, and the difference between encodeURI and encodeURIComponent with practical examples.

What Is URL Encoding?

URL encoding, also known as percent-encoding, is the mechanism for representing characters in a URL that would otherwise be unsafe or ambiguous. It works by replacing each problematic character with a percent sign (%) followed by two hexadecimal digits for its byte value. ASCII characters take one such triplet; characters that need several bytes in UTF-8 become several triplets in a row.

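For example, the URL https://example.com/search?q=hello world&lang=en contains a space, which is not a valid URL character and can cause the URL to be truncated or rejected. After encoding, it becomes https://example.com/search?q=hello%20world&lang=en. The space becomes %20 and the URL works correctly, while the ampersand stays as-is because here it really does separate two parameters.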

URL encoding is not optional: it is required by the URI specification (RFC 3986), which HTTP URLs follow. Every time you build a URL dynamically, submit a form, call an API, or construct a redirect, you need to encode user-supplied values. Skipping this step leads to broken links, security vulnerabilities, and data corruption.

Why URL Encoding Exists

URLs were originally designed for the limited ASCII character set. The URL specification reserves certain characters for structural purposes:

  • : separates the scheme from the authority (https://)
  • / separates path segments (/blog/post-title)
  • ? starts the query string (?page=1)
  • & separates query parameters (&sort=date)
  • # marks the fragment identifier (#section-2)
  • = separates parameter names from values (q=search+term)

If your data contains any of these characters (a user searching for "Q&A", for example), the raw characters would break the URL structure. The same applies to characters that are not allowed in a URL at all, such as the space in a filename or the accented letters in a comment. Percent-encoding is the escape mechanism that solves both problems.
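The "Q&A" case can be reproduced with the standard URL class. This is a small sketch; the example.com address and the parameter name q are placeholders chosen for illustration.

```javascript
// Unencoded: the & inside "Q&A" is read as a parameter separator,
// so the q parameter is cut short.
const raw = new URL("https://example.com/search?q=Q&A");
console.log(raw.searchParams.get("q")); // "Q"

// Encoded: %26 is decoded back to a literal & inside the value.
const encoded = new URL(
  "https://example.com/search?q=" + encodeURIComponent("Q&A")
);
console.log(encoded.href);                  // https://example.com/search?q=Q%26A
console.log(encoded.searchParams.get("q")); // "Q&A"
```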

How Percent-Encoding Works

The encoding process follows three steps:

  1. Convert the character to its byte representation — For ASCII characters, this is a single byte. For Unicode characters, use UTF-8 encoding (which may produce 1–4 bytes).
  2. Express each byte as two hexadecimal digits — The byte value 32 (space) becomes 20; the byte value 38 (ampersand) becomes 26.
  3. Prefix with a percent sign — Space becomes %20, ampersand becomes %26.
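The three steps above can be sketched by hand in JavaScript. The function name percentEncodeAll is made up for this illustration, and unlike the built-in encoders it deliberately encodes every character (including unreserved ones) so that each step stays visible.

```javascript
// Percent-encode every byte of a string, following the three steps above.
// Illustration only: real encoders leave unreserved characters untouched.
function percentEncodeAll(text) {
  // Step 1: convert the string to its UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  return Array.from(bytes)
    // Step 2: two uppercase hex digits per byte. Step 3: prefix with %.
    .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

console.log(percentEncodeAll(" ")); // %20
console.log(percentEncodeAll("&")); // %26
console.log(percentEncodeAll("é")); // %C3%A9 (two UTF-8 bytes)
```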

Common Encoded Characters

Character | Encoded | Description
(space) | %20 | Space character
! | %21 | Exclamation mark
# | %23 | Hash / fragment identifier
$ | %24 | Dollar sign
& | %26 | Ampersand / parameter separator
+ | %2B | Plus sign
/ | %2F | Forward slash / path separator
= | %3D | Equals sign
? | %3F | Question mark / query delimiter
@ | %40 | At sign

Unreserved vs. Reserved Characters

RFC 3986 defines two categories of characters in URLs:

Unreserved characters — never need encoding:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
- _ . ~

Reserved characters — have special meaning in URL structure and must be encoded when used as data:

: / ? # [ ] @ ! $ & ' ( ) * + , ; =

The key principle: reserved characters are only encoded when they appear in a context where their special meaning would cause ambiguity. A / in a path is a path separator (not encoded), but a / in a query parameter value must be encoded as %2F.
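To make the context rule concrete, here is a sketch that places the same value once in the path and once in the query. The file name, the /download route, and the file parameter are hypothetical.

```javascript
// Hypothetical value that contains a slash and a space.
const fileName = "reports/2024 summary.pdf";
const base = "https://example.com";

// As structure: the slash separates path segments, so only the
// content of each segment is encoded.
const asPath =
  base + "/" + fileName.split("/").map(encodeURIComponent).join("/");

// As data: the whole value is one parameter, so the slash is encoded too.
const asQuery = base + "/download?file=" + encodeURIComponent(fileName);

console.log(asPath);  // https://example.com/reports/2024%20summary.pdf
console.log(asQuery); // https://example.com/download?file=reports%2F2024%20summary.pdf
```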

JavaScript Encoding Functions: encodeURI vs. encodeURIComponent

JavaScript provides two built-in functions for URL encoding, and using the wrong one is a common source of bugs:

Function | Encodes | Preserves | Use For
encodeURI() | Spaces, non-ASCII characters, and some special characters (including [ and ]) | Unreserved characters plus : / ? # @ ! $ & ' ( ) * + , ; = | Encoding a complete URL
encodeURIComponent() | Everything except A-Z a-z 0-9 - _ . ~ ! ' ( ) * | Only unreserved characters plus ! ' ( ) * | Encoding a single parameter value

The rule of thumb: use encodeURIComponent() for parameter values and encodeURI() for complete URLs. Here is why:

// CORRECT: encoding a parameter value
const query = "hello world & goodbye";
const url = "https://api.example.com/search?q=" + encodeURIComponent(query);
// Result: https://api.example.com/search?q=hello%20world%20%26%20goodbye

// WRONG: encodeURI does not encode &
const badUrl = "https://api.example.com/search?q=" + encodeURI(query);
// Result: https://api.example.com/search?q=hello%20world%20&%20goodbye
// Bug: & is interpreted as a parameter separator!

Our URL Encoder/Decoder lets you test all three encoding standards side by side — encodeURIComponent, encodeURI, and strict RFC 3986 — so you can see exactly which characters each function encodes.
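For reference, a stricter RFC 3986 encoder can be built on top of encodeURIComponent by also escaping the five reserved characters it leaves alone (! ' ( ) *). This is a common pattern rather than a built-in, and the name encodeRFC3986 is an assumption made for this sketch; it is not necessarily how any particular tool implements its strict mode.

```javascript
// Sketch of a strict encoder: encodeURIComponent plus the five
// characters it preserves that RFC 3986 lists as reserved.
function encodeRFC3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const sample = "it's (a) test!*";
console.log(encodeURIComponent(sample)); // it's%20(a)%20test!*
console.log(encodeRFC3986(sample));      // it%27s%20%28a%29%20test%21%2A
```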

Encoding Unicode and International Characters

Modern URLs frequently contain non-ASCII characters: accented letters (café), CJK characters (東京), emoji, and more. The encoding process for these characters involves an extra step:

  1. Convert to UTF-8 bytes: The character é (U+00E9) becomes two bytes: C3 A9.
  2. Percent-encode each byte: C3 A9 becomes %C3%A9.

Some examples:

Character | Unicode | UTF-8 Bytes | URL Encoded
é | U+00E9 | C3 A9 | %C3%A9
ñ | U+00F1 | C3 B1 | %C3%B1
€ | U+20AC | E2 82 AC | %E2%82%AC
東 | U+6771 | E6 9D B1 | %E6%9D%B1
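These values can be checked with the built-in encoder, which performs the UTF-8 conversion itself. The sketch below also adds an emoji (U+1F600, not in the table) to show a four-byte case.

```javascript
// Print code point, UTF-8 byte count, and percent-encoded form.
const samples = ["é", "ñ", "€", "東", "😀"];
for (const ch of samples) {
  const codePoint =
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  const byteCount = new TextEncoder().encode(ch).length;
  console.log(ch, codePoint, byteCount + " bytes", encodeURIComponent(ch));
}
// é U+00E9 2 bytes %C3%A9
// ñ U+00F1 2 bytes %C3%B1
// € U+20AC 3 bytes %E2%82%AC
// 東 U+6771 3 bytes %E6%9D%B1
// 😀 U+1F600 4 bytes %F0%9F%98%80

// Decoding reverses both steps.
console.log(decodeURIComponent("%E6%9D%B1%E4%BA%AC")); // 東京
```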

Base64 encoding of text relies on the same first step: the text is typically converted to UTF-8 bytes, and the bytes are then encoded. When you need to represent binary or non-ASCII data in text contexts like URLs, email headers, or JSON payloads, both percent-encoding and Base64 are essential tools.

URL Encoding in Other Languages

Most mainstream programming languages ship URL encoding functions in their standard libraries:

# Python
from urllib.parse import quote, quote_plus
quote("hello world")        # "hello%20world"
quote_plus("hello world")   # "hello+world"
quote("a/b", safe="")       # "a%2Fb" (by default quote() leaves "/" unencoded)

// PHP
urlencode("hello world");       // "hello+world"
rawurlencode("hello world");    // "hello%20world"

// Java
import java.net.URLEncoder;
URLEncoder.encode("hello world", "UTF-8");  // "hello+world"

// C# / .NET
using System;
Uri.EscapeDataString("hello world");  // "hello%20world"

# Ruby
require "cgi"
require "erb"
CGI.escape("hello world")    # "hello+world"
ERB::Util.url_encode("hello world")  # "hello%20world"

Note the inconsistency: some functions encode spaces as + (form encoding) and others as %20 (RFC 3986). For APIs and modern web development, prefer %20. The + convention dates back to HTML form submissions and is still valid in that context, but %20 is universally understood.
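JavaScript has both conventions as well. In this small sketch (the parameter name q is just a placeholder), URLSearchParams follows form encoding, while encodeURIComponent produces %20.

```javascript
const value = "hello world";

// Form encoding (application/x-www-form-urlencoded): space becomes +
const formStyle = new URLSearchParams({ q: value }).toString();
console.log(formStyle); // q=hello+world

// Percent-encoding: space becomes %20
const rfcStyle = "q=" + encodeURIComponent(value);
console.log(rfcStyle); // q=hello%20world

// A form-style parser accepts both spellings...
console.log(new URLSearchParams(formStyle).get("q")); // "hello world"
console.log(new URLSearchParams(rfcStyle).get("q"));  // "hello world"

// ...but decodeURIComponent leaves + alone, so it is the wrong
// decoder for form-encoded data.
console.log(decodeURIComponent("hello+world")); // "hello+world"
```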

Common URL Encoding Mistakes

  1. Double encoding: Encoding an already-encoded URL turns %20 into %2520 (the % itself gets encoded). Encode each value exactly once, at the point where you build the URL. If you are unsure whether the input is already encoded, decode it first, but be aware that this corrupts raw input that legitimately contains a sequence such as %20.
  2. Using encodeURI for parameter values: encodeURI does not encode &, =, or +, so these characters will break query string parsing. Always use encodeURIComponent for values.
  3. Encoding entire URLs with encodeURIComponent: This encodes the slashes, colons, and question marks that give the URL its structure, producing an unusable result.
  4. Forgetting to encode path segments: File names with spaces, slashes, or special characters must be encoded in URL paths, not just in query strings.
  5. Not encoding for SEO: Search engines handle encoded URLs correctly, whereas a URL broken by raw special characters may not resolve at all, and a page that cannot be fetched cannot be indexed.
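The first mistake is easy to reproduce. The sketch below shows the double-encoding effect, followed by one possible way to avoid it: letting the URL API encode each raw value exactly once. The search endpoint is a placeholder.

```javascript
// Double encoding: the % from the first pass is encoded again as %25.
const once = encodeURIComponent("hello world");
const twice = encodeURIComponent(once);
console.log(once);  // hello%20world
console.log(twice); // hello%2520world

// Decoding once no longer returns the original text.
console.log(decodeURIComponent(twice)); // hello%20world

// One way to encode exactly once: hand raw values to searchParams.
// Note that it uses form encoding, so spaces appear as +.
const url = new URL("https://example.com/search");
url.searchParams.set("q", "hello world & goodbye");
console.log(url.href); // https://example.com/search?q=hello+world+%26+goodbye
```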

URL Encoding and Security

Proper URL encoding is a security requirement, not just a convenience. Improper encoding can lead to:

  • Open redirect vulnerabilities — Unencoded user input in redirect URLs can send users to malicious sites.
  • Cross-site scripting (XSS) — Injecting <script> tags through unencoded URL parameters that are reflected in page output.
  • SQL injection via URL — Passing unencoded SQL fragments through query parameters to vulnerable backends.
  • Path traversal — Using unencoded ../ sequences to access files outside the intended directory.

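Always encode user-supplied data before inserting it into URLs. Encoding only keeps data from altering the URL's structure, though: the server decodes the value again, so on the server side always validate decoded URL parameters and apply the right defense for each destination (parameterized queries for databases, path normalization for file paths, output escaping for HTML). Regular expressions can help validate the expected format of parameter values before processing them.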
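As one illustration of validating after decoding, the sketch below guards a redirect target against the open redirect case. The helper name safeRedirectTarget, the fallback to the site root, and the example.com origin are all assumptions made for this example; a real application may need a stricter policy, such as an allow-list of paths.

```javascript
// Hypothetical helper: return the redirect target only if it stays on
// the expected origin, otherwise fall back to the site root.
function safeRedirectTarget(userValue, allowedOrigin) {
  let target;
  try {
    // Relative values such as "/account" resolve against our origin.
    target = new URL(userValue, allowedOrigin);
  } catch {
    return allowedOrigin + "/";
  }
  return target.origin === allowedOrigin ? target.href : allowedOrigin + "/";
}

const origin = "https://example.com";
console.log(safeRedirectTarget("/account?tab=1", origin));
// https://example.com/account?tab=1
console.log(safeRedirectTarget("https://evil.example/phish", origin));
// https://example.com/
console.log(safeRedirectTarget("//evil.example/phish", origin));
// https://example.com/
```

When the target is passed along in another URL, it is still encoded as data, for example "/login?next=" + encodeURIComponent("/account?tab=1").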

For API tokens and session identifiers transmitted via URLs, consider using UUIDs — they contain only hexadecimal characters and hyphens, which are all URL-safe without encoding.

Frequently Asked Questions

What is URL encoding?

URL encoding (also called percent-encoding) is the process of converting characters that are not allowed in a URL into a safe representation using a percent sign followed by two hexadecimal digits. For example, a space becomes %20 and an ampersand becomes %26. This ensures URLs are transmitted correctly across the internet.

What is the difference between encodeURI and encodeURIComponent?

encodeURI() encodes a full URL but preserves characters that have meaning in URLs (: / ? # @ ! $ & ' ( ) * + , ; =). encodeURIComponent() encodes everything except unreserved characters (A-Z, a-z, 0-9, - _ . ~) and the five characters ! ' ( ) *, making it the right choice for encoding query parameter values. Use encodeURI for complete URLs and encodeURIComponent for individual parameter values.

Why are spaces encoded as %20 and sometimes as +?

In the URL path, spaces are encoded as %20 per RFC 3986. In HTML form submissions using application/x-www-form-urlencoded format, spaces are encoded as + (plus sign) per the HTML specification. Both represent a space, but %20 is the universal standard. Most modern APIs expect %20.

Which characters need to be URL encoded?

Any character that is not an unreserved character (A-Z, a-z, 0-9, hyphen, underscore, period, tilde) must be percent-encoded when used outside its reserved purpose. Common characters that need encoding include spaces (%20), ampersands (%26), equals signs (%3D), question marks (%3F), and non-ASCII characters like accented letters.

How does URL encoding handle Unicode characters?

Unicode characters are first encoded as UTF-8 bytes, then each byte is percent-encoded individually. For example, the euro sign (€) is encoded as UTF-8 bytes E2 82 AC, which becomes %E2%82%AC in the URL. This ensures international characters work correctly in URLs across all systems.

Conclusion

URL encoding is a foundational web skill. The rules are simple: unreserved characters pass through unchanged, reserved characters are encoded when used as data, and everything else is percent-encoded as UTF-8 bytes. In JavaScript, use encodeURIComponent() for parameter values and encodeURI() for complete URLs. In any language, always encode user input before building URLs.

When you need to quickly encode or decode a URL, test encoding standards, or debug a broken query string, use our free URL Encoder/Decoder. It supports three encoding standards, batch processing, and a character breakdown table — all running in your browser with no server round-trip.
