The cheat sheet below condenses the key points of the OWASP article on user input validation into a table that can be scanned quickly. For further information, see OWASP – Input Validation.

Topic
Goals of Input Validation
Input Validation Strategies
Allow List vs. Block List
Validating Free-Form Unicode Text
Public Serving of Uploaded Content
Email Address Validation
Disposable Email Addresses
Sub-Addressing
Key Points
Goals of Input Validation: Input validation is a critical defensive measure. Its primary objective is to ensure that only properly formed data enters the system, so that malformed or malicious content is never stored in the database. Rejecting malformed data at the earliest possible stage prevents malfunctions and security vulnerabilities in downstream components that would otherwise process tainted data.
Validation should happen as soon as data is received from any external source. That includes not only web clients but also backend feeds, such as those from suppliers or partners, which may themselves be compromised and start sending malformed data. Input validation helps reduce the impact of attacks such as Cross-Site Scripting (XSS) and SQL Injection, but it is not the primary defense against them and must be combined with dedicated countermeasures for those threats.
Input Validation Strategies: Input validation should be applied at two levels. Syntactic validation enforces the correct syntax of a structured field (e.g., SSN format, date format). Semantic validation goes beyond syntax and checks that the value is correct in its business context (e.g., the start date precedes the end date, the price falls within the expected range).
Several techniques are available: native data type validators in web frameworks, validation against JSON or XML schemas, type conversion functions with strict error handling, and regular expressions for structured data. Applying validation to all incoming data stops invalid input before it travels any further through the application.
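To make the two levels concrete, here is a minimal Python sketch. The function names and the date-range scenario are assumptions chosen for illustration, not something prescribed by the OWASP article. The syntactic step combines a strict format check with a type conversion, and the semantic step checks the business rule that the start date precedes the end date.

```python
import re
from datetime import date

# Syntactic rule (assumed for this sketch): exactly YYYY-MM-DD with ASCII digits.
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_iso_date(raw: str) -> date:
    """Syntactic validation: enforce the format, then convert the type."""
    if not isinstance(raw, str) or ISO_DATE.fullmatch(raw) is None:
        raise ValueError(f"expected YYYY-MM-DD, got {raw!r}")
    try:
        # Type conversion with strict error handling: rejects e.g. 2024-02-30.
        return date.fromisoformat(raw)
    except ValueError as exc:
        raise ValueError(f"not a real calendar date: {raw!r}") from exc


def validate_date_range(raw_start: str, raw_end: str) -> tuple[date, date]:
    """Semantic validation: the start date must precede the end date."""
    start = parse_iso_date(raw_start)
    end = parse_iso_date(raw_end)
    if start >= end:
        raise ValueError("start date must precede end date")
    return start, end
```

Calling `validate_date_range("2024-01-10", "2024-01-12")` returns a pair of `date` objects, while a malformed string or a reversed range raises `ValueError` before the data reaches any other part of the application.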
Allow List vs. Block List: Allow list validation is recommended over block list validation. Allow listing defines exactly what is authorized, and everything else is rejected by definition. Block listing instead tries to detect known-bad patterns (such as the apostrophe character or a <script> tag), but attackers can easily bypass such filters, and the filters can also reject legitimate input, for example a surname like O'Brian. Allow list validation is appropriate for all user-provided input fields.
For structured data, such as dates or email addresses, developers can define a strong validation pattern, usually a regular expression that must match the entire input. If the input comes from a fixed set of options, such as a dropdown list, it must exactly match one of the predefined values.
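The two allow-list styles described above can be sketched in Python as follows. The ZIP code pattern and the country set are hypothetical examples, not values taken from the article.

```python
import re

# Assumed example of structured data: a US ZIP code, five digits
# with an optional four-digit suffix.
ZIP_CODE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

# Assumed example of a fixed option set, e.g. the values of a dropdown.
ALLOWED_COUNTRIES = frozenset({"US", "CA", "MX"})


def is_valid_zip(value: str) -> bool:
    # fullmatch() requires the whole string to match, so trailing
    # newlines or appended payloads are rejected.
    return ZIP_CODE.fullmatch(value) is not None


def is_valid_country(value: str) -> bool:
    # Exact membership test: no trimming, no case folding.
    return value in ALLOWED_COUNTRIES
```

Using `fullmatch` rather than `match` or `search` is a deliberate choice here: a pattern that only matches part of the input would accept a valid prefix followed by arbitrary data.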
Validating Free-Form Unicode Text: Free-form text, especially text containing Unicode characters, requires special care. Normalization is the first step: it ensures that a canonical encoding is used throughout and that no invalid characters are present. Character category allow-listing, based on Unicode categories such as "decimal digits" or "letters", then accommodates names and text in many different scripts. For special cases, such as the apostrophe in certain names, individual characters can be added to the allow list. Even so, context-aware output encoding remains essential, because input validation is not the primary defense against Cross-Site Scripting (XSS).
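As a rough illustration of normalization followed by category and individual-character allow-listing, the sketch below uses Python's standard `unicodedata` module. The choice of NFC, the allowed categories, the extra characters, and the length limit are all assumptions for a hypothetical "person name" field.

```python
import unicodedata

# Assumed allow list: letters (L*), combining marks (M*, needed by some
# scripts), and decimal digits (Nd).
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "Nd")

# Assumed individually allowed characters for names like O'Brien or Anne-Marie.
ALLOWED_EXTRA_CHARS = frozenset({" ", "'", "-"})


def validate_name(raw: str, max_length: int = 100) -> str:
    # Step 1: normalize so equivalent sequences share one canonical form.
    normalized = unicodedata.normalize("NFC", raw)
    if not normalized or len(normalized) > max_length:
        raise ValueError("name is empty or too long")
    # Step 2: every character must be individually allowed or belong
    # to an allowed Unicode category.
    for ch in normalized:
        if ch in ALLOWED_EXTRA_CHARS:
            continue
        if not unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIXES):
            raise ValueError(f"character not allowed: {ch!r}")
    return normalized
```

Because the apostrophe is allowed, a value accepted by this function is still not safe to place into HTML or SQL as-is; output encoding and parameterized queries are needed regardless.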
Public Serving of Uploaded Content: When serving uploaded files, make sure the correct Content-Type header is set for each file. Take particular care with "special" files such as crossdomain.xml and .htaccess, which can enable cross-domain data loading and manipulation of the server configuration. Uploaded files should be validated by checking the filename extension against the expected types, enforcing a maximum size, and inspecting the content for malicious code.
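The extension and size checks can be sketched as below. The allowed extensions, the 5 MiB limit, and the function name are assumptions for illustration. Content inspection (for example, malware scanning) is out of scope for this sketch, and only the final extension is examined.

```python
from pathlib import PurePosixPath

# Assumed allow list, mapping each accepted extension to the
# Content-Type that should be sent when the file is served.
CONTENT_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".pdf": "application/pdf",
}

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumed limit: 5 MiB


def check_upload(filename: str, size_bytes: int) -> str:
    """Return the Content-Type to serve the file with, or raise ValueError."""
    # Reject anything that looks like a path rather than a bare filename.
    if any(sep in filename for sep in ("/", "\\", "\x00")):
        raise ValueError("filename must not contain path separators")
    # Names such as ".htaccess" have no suffix in pathlib terms, and
    # "crossdomain.xml" has a suffix that is not in the allow list.
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in CONTENT_TYPES:
        raise ValueError(f"file type not allowed: {filename!r}")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the maximum size")
    return CONTENT_TYPES[extension]
```

Deriving the served Content-Type from the allow list, rather than from whatever the client claimed at upload time, keeps the header under the application's control.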
Email Address Validation: Syntactic validation checks the format of the address, which is defined by RFC 5321. In practice, however, many applications and mail providers do not accept every address the RFC permits. More importantly, semantic validation means sending an email to the address to verify that the mailbox exists and that the user has access to it. Together, the two steps confirm that the address is not only well formed but also functional.
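A basic syntactic check might look like the sketch below. It is deliberately far looser than the full RFC 5321 grammar; the forbidden characters and the length limits are assumptions for illustration. It does not replace the semantic step of sending a confirmation email.

```python
import re

# Assumed rule: the domain part may contain only letters, digits,
# hyphens, and periods.
DOMAIN_PATTERN = re.compile(r"[A-Za-z0-9.-]+")

# Assumed set of characters rejected anywhere in the address.
FORBIDDEN_CHARS = frozenset({"`", '"', "\x00"})

MAX_LOCAL_LENGTH = 64   # assumed limit for the part before the @
MAX_TOTAL_LENGTH = 254  # assumed limit for the whole address


def looks_like_email(address: str) -> bool:
    """Cheap plausibility check; ownership still needs a confirmation email."""
    if len(address) > MAX_TOTAL_LENGTH:
        return False
    if any(ch in FORBIDDEN_CHARS for ch in address):
        return False
    # Exactly one @, with a non-empty part on each side.
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    if not local or not domain or len(local) > MAX_LOCAL_LENGTH:
        return False
    return DOMAIN_PATTERN.fullmatch(domain) is not None
```

An address that passes this check is only plausible. Whether it is deliverable and owned by the user can be established only by the confirmation email.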
Disposable Email Addresses: Disposable email addresses are temporary addresses, often used to keep spam away from the user's primary mailbox. Blocking them is difficult because disposable email providers constantly change their domains. Lists of known disposable domains exist, but they are never exhaustive. Rather than blocking outright, it is advisable to explain to users the drawbacks of registering with a disposable address, such as losing access to password-reset emails once the address expires.
Sub-Addressing: Sub-addressing lets users add a tag to the local part of their email address (for example, user+site1@example.org), which the mail server ignores on delivery. Not all email providers support it, but it is commonly used with services like Gmail. Some users choose a different tag for each website so they can tell which site leaked or sold their address when spam arrives. Because sub-addressing also allows several accounts to be registered with a single mailbox, some sites try to block it. This is not recommended: it may discourage users from providing accurate contact information, and it is easily bypassed by using disposable email addresses or by creating multiple accounts with trusted providers.