RFC 3986 Quiz

Understand URI components and encoding


References (URLs)

Goal: avoid subtle bugs when comparing, joining, or encoding URIs.

Q1: In a URI, which delimiter indicates the start of the authority component, when one is present?

Multiple Choice
**Explanation:** **Terms:** authority, delimiter, scheme. In the generic URI syntax, the authority component is introduced by "//" and ends at the next /, ?, or #, or at the end of the URI.

**Correct (C):** "//" signals that what follows is an authority, which can include userinfo, host, and port.

**Options:**
- A (incorrect): # introduces the fragment component, not the authority.
- B (incorrect): ? introduces the query component.
- C (correct): // is the authority introducer in the generic syntax.

**Related:** Not all schemes use an authority. For example, mailto: and urn: URIs have no //, and the path follows the scheme directly.
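To make the role of "//" concrete, here is a minimal sketch using Python's standard `urllib.parse.urlsplit`, which exposes the authority as `netloc`. The example URIs are illustrative assumptions, not part of the quiz.

```python
from urllib.parse import urlsplit

# "//" after the scheme introduces an authority, exposed here as netloc.
with_authority = urlsplit("https://example.com/a")
print(with_authority.netloc)  # example.com
print(with_authority.path)    # /a

# No "//": there is no authority, and everything after the scheme is path.
without_authority = urlsplit("mailto:user@example.com")
print(repr(without_authority.netloc))  # ''
print(without_authority.path)          # user@example.com
```

Note that the @ in the mailto example does not create an authority, because no "//" precedes it.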

Q2: In the URI https://user:pass@example.com:8443/a?x=1#frag, what is the authority component?

Multiple Choice
**Explanation:** **Terms:** authority, userinfo, host, port. The authority component runs from just after // up to, but not including, the next /, ?, or # (or the end of the URI). It can include userinfo and a port.

**Correct (A):** In this example the authority is user:pass@example.com:8443, made up of userinfo (user:pass), host (example.com), and port (8443).

**Options:**
- A (correct): This matches the RFC 3986 structure [userinfo@]host[:port].
- B (incorrect): This is only the host subcomponent, not the entire authority.
- C (incorrect): /a is the path, not the authority.

**Related:** Putting credentials in userinfo is discouraged for security reasons (RFC 3986 deprecates the user:password form), but userinfo is still part of the grammar.
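The breakdown of the Q2 URI can be checked with a short sketch. It assumes Python's `urllib.parse.urlsplit`, whose `netloc` attribute corresponds to the RFC 3986 authority.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://user:pass@example.com:8443/a?x=1#frag")

# The whole authority: [userinfo@]host[:port]
print(parts.netloc)                    # user:pass@example.com:8443

# Its subcomponents
print(parts.username, parts.password)  # user pass
print(parts.hostname, parts.port)      # example.com 8443

# Everything after the authority
print(parts.path, parts.query, parts.fragment)  # /a x=1 frag
```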

Q3: Which characters are reserved in RFC 3986? (Select all that apply.)

Multi-Select
**Explanation:** **Terms:** reserved, unreserved, gen-delims, sub-delims. Reserved characters can act as delimiters in some URI components, so they may need percent-encoding when used as data.

**Correct (A, B, C):** ? and # are gen-delims, and & is a sub-delim. ~ is unreserved, so it can appear without encoding.

**Options:**
- A (correct): ? separates the path from the query in the generic syntax.
- B (correct): # introduces the fragment component.
- C (correct): & is reserved as a sub-delim and often has application-specific meaning inside the query.
- D (incorrect): ~ is in the unreserved set in RFC 3986, so it is not reserved.

**Related:** Do not confuse reserved with unsafe. Reserved means the character may act as a delimiter, so whether to encode it depends on the component and on whether it is meant as a delimiter or as data.
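The character classes above can be written out as a small lookup. This is a sketch: the sets follow the RFC 3986 definitions of gen-delims, sub-delims, and unreserved, while `classify` is a hypothetical helper named here for illustration only.

```python
import string

# Character classes as defined in RFC 3986, sections 2.2 and 2.3.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def classify(ch: str) -> str:
    """Return the RFC 3986 character class of a single character."""
    if ch in GEN_DELIMS:
        return "gen-delims"
    if ch in SUB_DELIMS:
        return "sub-delims"
    if ch in UNRESERVED:
        return "unreserved"
    return "other"  # e.g. space or non-ASCII: not allowed unencoded

for ch in "?#&~":
    print(ch, classify(ch), ch in RESERVED)
# ? gen-delims True
# # gen-delims True
# & sub-delims True
# ~ unreserved False
```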

Q4: When should you percent-encode a character in a URI?

Multiple Choice
**Explanation:** **Terms:** percent-encoding, component, delimiter vs data. Percent-encoding represents an octet as %HH (a percent sign followed by two hexadecimal digits) so that the URI remains syntactically valid and unambiguous.

**Correct (C):** Encoding is context-dependent. If a reserved character would be interpreted as a delimiter but you need it as data, encode it. If a character is not permitted in a component, encode it or use an appropriate encoding scheme for that context.

**Options:**
- A (incorrect): RFC 3986 defines a generic syntax and the percent-encoding of octets, but the handling of non-ASCII characters depends on the scheme and on IRI processing. Blanket rules can break round-tripping.
- B (incorrect): Reserved characters can appear as data, but then they should be percent-encoded so they are not interpreted as delimiters.
- C (correct): This captures the core rule and the reason behind it.

**Related:** Many bugs happen when code decodes too early and can no longer tell delimiters from data.
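The delimiter-versus-data distinction, and the decode-too-early bug, can be reproduced in a few lines. This sketch assumes Python's `urllib.parse`; the URL and the parameter name `q` are made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# The value contains & and =, which would act as delimiters in a query.
value = "a&b=c"
url = "https://example.com/search?q=" + quote(value, safe="")
print(url)  # https://example.com/search?q=a%26b%3Dc

# Correct order: split into components first, then decode each value.
correct = parse_qs(urlsplit(url).query)
print(correct)  # {'q': ['a&b=c']}

# Bug: decoding the whole URI first turns the data back into delimiters.
too_early = parse_qs(urlsplit(unquote(url)).query)
print(too_early)  # {'q': ['a'], 'b': ['c']}
```

Passing `safe=""` matters here because `quote` leaves / unencoded by default, which suits paths but not arbitrary data values.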

Q5: Which normalization is generally safe for comparison without changing meaning?

Multiple Choice
**Explanation:** **Terms:** normalization, scheme, host, path, fragment. Normalization rewrites the textual form of a URI while aiming to keep it a reference to the same resource.

**Correct (A):** Scheme and host are case-insensitive in generic URI processing, so lowercasing them is a common, safe normalization step.

**Options:**
- A (correct): Lowercasing the scheme and host does not change the reference, because both are case-insensitive.
- B (incorrect): Path case-sensitivity depends on the scheme and the server. Lowercasing the path can change which resource is referenced.
- C (incorrect): Fragments are not sent to the server, but they can be meaningful to clients. Removing them may change client behavior and is not universally safe.

**Related:** Be careful: even seemingly harmless normalization can break signed URLs, caches, and routing.
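A minimal sketch of this normalization, assuming Python's `urllib.parse`. The function name `normalize_scheme_and_host` is hypothetical, and the sketch assumes the host contains no percent-encoded octets. It lowercases only the scheme and the host, leaving userinfo, path, query, and fragment untouched.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_scheme_and_host(uri: str) -> str:
    """Lowercase the scheme and host; leave every other component as-is."""
    parts = urlsplit(uri)
    # Userinfo is case-sensitive, so split it off before lowercasing.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )

a = normalize_scheme_and_host("HTTPS://User@EXAMPLE.com:8443/Path?Q=1#Frag")
b = normalize_scheme_and_host("https://User@example.COM:8443/Path?Q=1#Frag")
print(a)       # https://User@example.com:8443/Path?Q=1#Frag
print(a == b)  # True
```

The path, query, and fragment keep their original case, so /Path and /path still compare as different.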

Q6: In a URI, the part after # is called the ___ component.

Short Text
**Explanation:** **Terms:** fragment component. In RFC 3986 the fragment is the part after #, and it identifies a secondary resource or client-side reference.

**Correct:** fragment. That is the RFC name for that component.

**Why others are wrong:** Query is introduced by ?, path is the hierarchical component after the authority, and scheme is the part before the first :. Confusing these leads to incorrect parsing and incorrect URL handling.

**Related:** Fragments are typically processed by the user agent and are not transmitted in an HTTP request.
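As an illustration of separating the fragment from the rest of the URI, here is a short sketch that assumes Python's `urllib.parse.urldefrag`. The example URI is made up.

```python
from urllib.parse import urldefrag

# urldefrag splits a URI into the part before # and the fragment itself.
base, fragment = urldefrag("https://example.com/doc?x=1#section-2")
print(base)      # https://example.com/doc?x=1
print(fragment)  # section-2
```

An HTTP client would build its request from `base` alone; the fragment stays with the user agent.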