SQL and Relational Databases Explained: How Data Is Structured, Stored, and Queried

Posts

SQL stands for Structured Query Language. It is the standard programming language used to communicate with and manage data in a Relational Database Management System (RDBMS). At its core, SQL is a declarative language, which means that the user specifies what data they want to retrieve or manipulate, rather than specifying how to perform the operation. The database engine itself figures out the most efficient way to execute the user’s “query,” or request. SQL is used for a wide range of tasks, including retrieving data, inserting new records, updating existing data, and deleting data. It is also used for database administration, which includes defining the structure of the database (its schema) and controlling access to the data.

What are SQL dialects? Give some examples.

While SQL is an industry standard, overseen by organizations like ANSI and ISO, it is not a single, monolithic language. Different database vendors have implemented the standard with their own proprietary extensions, optimizations, and variations. These different versions of the language are known as SQL dialects. While the core commands like SELECT, INSERT, UPDATE, and DELETE are nearly identical across all dialects, more advanced functions, data types, and procedural extensions can vary significantly. Some popular dialects are tied to specific commercial database systems, while others are associated with popular open-source platforms. There are also dialects designed for lightweight, embedded databases. Being familiar with one dialect provides a strong foundation for learning any other.

What are the main applications of SQL?

SQL is the backbone of data operations for countless applications and industries. Its primary application is interacting with relational databases, which store structured data for everything from websites and mobile apps to large-scale enterprise resource planning systems. Data analysts and business intelligence professionals use SQL to query databases to retrieve, summarize, and analyze data to generate reports and insights. Data scientists use it to extract and aggregate large datasets for use in machine learning models. Software developers embed SQL commands within their applications to store and retrieve user data, manage sessions, and log transactions. Database administrators use SQL to create, maintain, and secure the database itself, managing user permissions and tuning performance.

What is an SQL statement?

An SQL statement, also known as an SQL command, is a complete instruction sent to the database engine to perform a specific task. A statement is the “sentence” of the SQL language; it must be syntactically complete and executable on its own. Statements are composed of various keywords, clauses, identifiers, and operators. They are the fundamental units of action in SQL. Some of the most common examples of SQL statements include SELECT, which retrieves data; CREATE TABLE, which defines a new table; INSERT INTO, which adds new records; UPDATE, which modifies existing records; and GRANT, which manages user permissions.

What is an SQL query?

An SQL query is a specific type of SQL statement whose primary purpose is to retrieve data from a database. While the terms “query” and “statement” are often used interchangeably, “query” most precisely refers to a request for information. The SELECT statement is the primary and most common tool for writing queries. A query can be as simple as SELECT * FROM Users;, which retrieves all data from a table, or it can be incredibly complex, involving multiple tables, filtering, grouping, and subqueries to produce a specific, calculated result. There are also “action queries,” a term some use to describe statements that modify data, such as INSERT, UPDATE, and DELETE.

What is an SQL subquery?

An SQL subquery, also known as an inner query or a nested query, is a complete SELECT statement that is written inside another SQL statement (the “outer query”). The database engine executes the innermost subquery first, and its result set is then used by the outer query to complete its operation. Subqueries are a powerful tool for performing complex, multi-step logical operations in a single statement. They can be used in various parts of an outer query, such as in the SELECT clause to return a calculated value, in the FROM clause to create a temporary table to query against, or, most commonly, in the WHERE clause to filter data based on a condition derived from another table.

What is an SQL join?

An SQL JOIN clause is a fundamental operation used to combine rows from two or more tables based on a related column between them. Relational databases are designed to reduce redundancy by splitting data into multiple, related tables. A JOIN is the mechanism used to “re-assemble” this data at query time. For example, a Customers table might store customer information, and an Orders table might store order information. The JOIN clause would use a common CustomerID column to link the two tables, allowing you to retrieve a result set that shows which customer placed which order.

What is an SQL comment?

An SQL comment is human-readable text included in a piece of SQL code that is completely ignored by the database engine. Comments have no effect on the execution of the statement. Their sole purpose is to improve the readability and maintainability of the code for humans. They are used to explain what a complex part of the code does, why a certain decision was made, or to temporarily “comment out” a line of code for debugging purposes. There are typically two types of comments: single-line comments, which are preceded by a double hyphen (–), and multi-line comments, which are enclosed between /* and */.

What is an SQL alias?

An SQL alias is a temporary, user-defined name given to a table or a column during the execution of a specific query. Aliases are introduced with the AS keyword, though in many dialects, the AS is optional for table aliases. Aliases are used for two primary reasons: readability and necessity. In queries involving JOIN operations, table names can be long and repetitive. An alias (e.g., Customers AS c) makes the query much more compact and easier to read. For columns, aliases are used to give a more descriptive name to a calculated column (e.g., SUM(Salary) AS Total_Payroll) or to resolve ambiguity when two joined tables have a column with the same name.

What types of SQL commands do you know?

SQL commands are traditionally divided into five main categories based on their function. Data Definition Language (DDL) commands are used to define, create, and modify the structure of database objects like tables, indexes, and schemas. Data Manipulation Language (DML) commands are used for interacting with the data itself: inserting, updating, and deleting it. Data Control Language (DCL) commands are used to manage user access and permissions to the database. Transaction Control Language (TCL) commands are used to manage transactions, which are groups of operations that must succeed or fail as a single unit. Finally, Data Query Language (DQL) is the category for commands that retrieve data, which primarily consists of the SELECT statement.

Give some examples of common SQL commands.

For each of the five categories, there are several key commands. For Data Definition Language (DDL), the most common are CREATE (to build new objects like tables), ALTER (to modify the structure of existing objects), and DROP (to delete objects entirely). For Data Manipulation Language (DML), the core commands are INSERT (to add new rows), UPDATE (to modify data in existing rows), and DELETE (to remove rows). For Data Control Language (DCL), the main commands are GRANT (to give permissions) and REVOKE (to take permissions away). For Transaction Control Language (TCL), the key commands are COMMIT (to save a transaction) and ROLLBACK (to undo a transaction). For Data Query Language (DQL), the one and only command is SELECT.

What is a DBMS and what types of DBMS do you know?

DBMS stands for Database Management System. It is the software package that acts as an intermediary between the user, the applications, and the physical database. The DBMS is responsible for everything: it manages data storage and retrieval, enforces data integrity and security, handles backup and recovery, and controls concurrent access by multiple users. It provides a high-level, abstract interface (like SQL) so that users and developers do not need to worry about the complex, low-level details of how data is stored on disk. Besides the common relational model, other types include hierarchical, network, object-oriented, and graph-based database management systems, each organized around a different data model.

What is an RDBMS? Give some examples of RDBMS.

RDBMS stands for Relational Database Management System. It is the most common and widely used type of DBMS today. An RDBMS is based on the relational model, which means it stores data in a structured, tabular format, i.e., in rows and columns. These tables are related to each other through the use of shared keys. This model is incredibly effective for ensuring data integrity, consistency, and for performing complex queries. SQL is the standard language used to interact with an RDBMS. Examples of RDBMS platforms are abundant and include many popular open-source systems as well as large-scale commercial database products.

What are tables and fields in SQL?

A table is the fundamental data-storage object in a relational database. It is an organized collection of related data stored in a grid-like format of rows and columns. A table represents an “entity,” such as Customers, Products, or Orders. Each row in the table represents a single, complete record or “tuple” for that entity (e.g., one specific customer). A field is another term for a column in a table. Each column, or field, represents a single attribute or piece of information about that entity (e.g., FirstName, EmailAddress, or Price). Every column is defined with a specific data type, such as INTEGER or VARCHAR, which dictates what kind of data can be stored in that field.

What types of SQL subqueries do you know?

Subqueries can be categorized in several ways. One common distinction is based on the number of rows and columns they return. A “single-row” subquery returns, at most, one row and is often used with comparison operators in a WHERE clause. A “multi-row” subquery returns two or more rows and must be used with multi-row operators like IN, ANY, or ALL. A “multi-column” subquery returns two or more columns and is often used in the FROM clause. A more critical distinction is between “nested” and “correlated” subqueries. A nested, or simple, subquery is independent of the outer query and runs once. A “correlated” subquery depends on the outer query (it references a column from the outer query’s row) and must be executed once for every row processed by the outer query, which can be much slower.

What is a constraint and why are they used?

An SQL constraint is a rule applied to a column or a set of columns in a table to restrict the type of data that can be entered. Constraints are the primary mechanism for enforcing data integrity, accuracy, and reliability at the database level. By defining constraints, you prevent invalid or “bad” data from ever being stored, which is far more effective than trying to find and fix it later with an application. They enforce the business rules of the database. For example, a constraint can ensure that an Email column is never left blank, that a StockQuantity column can never be a negative number, or that every EmployeeID is unique.

What SQL constraints do you know?

There are several key types of constraints. NOT NULL ensures that a column cannot have a NULL (missing) value. UNIQUE guarantees that all values in a column (or a set of columns) are different from one another. PRIMARY KEY is a combination of NOT NULL and UNIQUE, uniquely identifying each row in a table. FOREIGN KEY enforces a link between two tables, ensuring that a value in one table’s column matches a value in another table’s primary key. CHECK is a more general constraint that ensures all values in a column satisfy a specific condition (e.g., Salary > 0). DEFAULT provides a default value for a column when an INSERT statement does not specify one.

What types of joins do you know?

There are four main types of JOIN operations. The INNER JOIN, which is the default, returns only the rows that have matching values in both tables. LEFT JOIN (or LEFT OUTER JOIN) returns all rows from the left table, and the matched rows from the right table; if no match is found, the columns from the right table will contain NULL. RIGHT JOIN (or RIGHT OUTER JOIN) is the mirror opposite: it returns all rows from the right table and the matched rows from the left. FULL JOIN (or FULL OUTER JOIN) returns all rows when there is a match in either the left or the right table; it is effectively a combination of a LEFT and a RIGHT join, filling in NULL values on both sides where matches do not exist. A CROSS JOIN is a special type that returns the “Cartesian product,” matching every row from the first table with every row from the second.

What is a primary key in SQL?

A primary key is a specific type of constraint that serves as the main, unique identifier for every row in a table. It is a column, or a set of columns, that must have a value, and that value must be unique within the table. This constraint is a combination of the UNIQUE and NOT NULL constraints. Because the primary key guarantees that every row can be uniquely identified, it is the central mechanism for relating one table to another. For example, the CustomerID in a Customers table would be the primary key, as it is a unique, non-null value that identifies one specific customer. A table can have only one primary key.

What is a unique key in SQL?

A unique key is a constraint that ensures all values in a column, or a set of columns, are unique. It is similar to a primary key in that it prevents duplicate data. However, it has two key differences. First, a table can have multiple UNIQUE constraints, whereas it can have only one PRIMARY KEY. Second, a UNIQUE constraint will, in most database systems, allow for one (and only one) NULL value to be stored in the column, because NULL is not considered equal to another NULL. In some systems, multiple NULL values are permitted. A unique key is typically used to enforce a business rule (e.g., “no two users can have the same email address”) on a column that is not the primary identifier.

What is a foreign key in SQL?

A foreign key is a constraint that creates a link between two tables, enforcing “referential integrity.” It is a column, or a set of columns, in one table that references the primary key (or a unique key) of another table. This constraint ensures that a value can only be inserted into the foreign key column if that same value already exists in the referenced primary key column. For example, if you add an Orders table, you would add a CustomerID column as a foreign key that references the CustomerID primary key in the Customers table. This makes it impossible to insert an order for a customer who does not exist, thereby maintaining the logical relationship and integrity of the data.

What is an SQL index?

An SQL index is a special data structure that is associated with a table and is used to significantly speed up data retrieval operations. An index works much like the index in the back of a book. Instead of having to read the entire book (a “full table scan”) to find a topic, you can go to the sorted index, find the topic, and get a page number that points directly to the information. An index in SQL does the same thing: it stores a sorted copy of one or more columns with a “pointer” to the physical location of the full row. This allows the database engine to find specific rows using a WHERE clause much more quickly.

What types of indices do you know?

The two most common and important types of indices are “clustered” and “non-clustered.” A “clustered” index defines the physical sort order of the data in the table. The table rows themselves are stored on disk in the order of the clustered index key. Because of this, a table can have only one clustered index. A “non-clustered” index is a separate data structure that exists apart from the table data. It contains the indexed column values in sorted order, and each value has a pointer back to the physical row in the table. It is like a traditional book index. A table can have many non-clustered indices. There are also “unique” indices, which enforce the UNIQUE constraint in addition to providing a fast lookup.

What is a schema?

In a database, a schema is a logical “container” or “blueprint” that defines the entire structure of the database. It is a collection of all the database objects, including tables, views, indexes, stored procedures, functions, and triggers. The schema defines the structural elements, such as the column names and data types in a table, the relationships between tables (primary and foreign keys), and the access permissions for different users. You can think of the database as the building, and the schema as the architect’s detailed blueprints that describe every room, hallway, and structural support. In some database systems, a schema is also used as a “namespace” to group related objects together.

What is an SQL operator?

An SQL operator is a reserved character, combination of characters, or keyword that is used in an SQL query to perform a specific operation. These operations can be mathematical calculations, comparisons between values, logical combinations of conditions, or string pattern matching. Operators are the “verbs” and “conjunctions” of an SQL statement, allowing you to build complex logic. They are most commonly used in the WHERE clause to establish the conditions for filtering data, but they are also used in SELECT clauses for calculations or in JOIN clauses to define the link.

What types of SQL operators do you know?

SQL operators can be grouped into several categories. “Arithmetic” operators perform mathematical calculations, such as + (addition), – (subtraction), * (multiplication), and / (division). “Comparison” operators are used to compare two values, such as = (equal), > (greater than), < (less than), and <> or != (not equal). “Logical” operators combine multiple boolean conditions, with the most common being AND, OR, and NOT. There are also special logical operators like BETWEEN, IN, and ANY. “String” operators, such as LIKE (for pattern matching) and its associated wildcards (%, _), are used for text manipulation. Finally, “Set” operators (UNION, INTERSECT, EXCEPT) are used to combine the results of two SELECT statements.

What is a clause?

In SQL, a clause is a component or a “keyword” that introduces a specific part of an SQL statement. Clauses are the main building blocks of a command. They define the different sections of the statement, each with a specific purpose. For example, in a SELECT statement, the FROM clause specifies which table to retrieve data from, the WHERE clause specifies the conditions for filtering rows, and the ORDER BY clause specifies how to sort the final result. Each clause begins with a specific keyword and provides instructions to the database engine on how to execute the statement.

What are some of the most common instructions used with the SELECT query?

The SELECT statement is the most complex command in SQL and is composed of many different clauses that modify its behavior. The most common ones, often listed in the order they are written, are: FROM, which specifies the source table or tables; JOIN, which combines data from other tables; WHERE, which filters rows based on a condition; GROUP BY, which groups rows that have the same values into summary rows; HAVING, which filters the groups after they have been aggregated; ORDER BY, which sorts the final result set; and LIMIT (or FETCH in some dialects), which restricts the number of rows returned by the query.

How do you create a table in SQL?

You create a table in SQL using the CREATE TABLE statement, which is a Data Definition Language (DDL) command. This statement not as only creates the table but also defines its entire structure: the names of its columns and the specific data type for each column (e.g., INTEGER, VARCHAR, DATE). You can also define constraints, such as NOT NULL, UNIQUE, and PRIMARY KEY, as part of the CREATE TABLE definition. The basic syntax involves specifying the table name, followed by a comma-separated list of column definitions enclosed in parentheses. For example: CREATE TABLE Users (UserID INT PRIMARY KEY, Username VARCHAR(50) NOT NULL, Email VARCHAR(100) UNIQUE);.

How to update a table?

You update existing data in a table using the UPDATE statement, which is a Data Manipulation Language (DML) command. The syntax consists of three main parts: the UPDATE clause, which specifies the table to be modified; the SET clause, which lists the column (or columns) to be changed and their new values; and, most importantly, the WHERE clause, which specifies which rows to update. If you omit the WHERE clause, the UPDATE statement will modify every single row in the table, which can be a catastrophic mistake. A correct statement would be: UPDATE Users SET Email = ‘new.email@example.com’ WHERE UserID = 101;.

How do I delete a table from a database?

You delete an entire table from a database using the DROP TABLE statement. This is a Data Definition Language (DDL) command and is one of the most destructive commands in SQL. When you execute DROP TABLE table_name;, you are not just deleting the data in the table; you are permanently removing the table’s entire structure from the database. All of the table’s data, indexes, constraints, and triggers are deleted and cannot be easily undone (unless you restore from a backup). This is different from DELETE, which only removes data from the table, or TRUNCATE, which removes all data but leaves the table structure intact.

How do I get the record count in a table?

You get the total number of records, or rows, in a table by using the COUNT() aggregate function. The most common way to do this is with the SELECT COUNT(*) FROM table_name; statement. The COUNT(*) syntax tells the database to count all rows in the table. You can also use COUNT(column_name), which will count all rows where the specified column is not NULL. For very large tables, COUNT(*) can be slightly faster or more accurate depending on the database engine, as it does not need to check the contents of a specific column. This query returns a single row with a single column containing the total row count.

How to sort the records in a table?

You sort the records in your query’s result set using the ORDER BY clause. This clause is almost always the last clause in a SELECT statement. By default, ORDER BY sorts the data in ascending order (ASC), from smallest to largest or alphabetically from A to Z. You can explicitly change this to descending order (DESC) by adding the DESC keyword after the column name. You can also sort by multiple columns. The database will first sort by the first column listed, and then, for any rows that have the same value in the first column, it will sort them based on the second column, and so on. For example: SELECT * FROM Products ORDER BY Category ASC, Price DESC;.

How to select all columns from a table?

You select all columns from a table by using the asterisk (*) wildcard character in the SELECT clause. The syntax is: SELECT * FROM table_name;. This is a shorthand that tells the database, “Instead of making me list every column name, just return all of them.” This is extremely useful for ad-hoc querying and data exploration, as you can quickly see the full structure of a table. However, it is considered bad practice to use SELECT * in production code or applications. This is because it can be inefficient (retrieving more data than needed) and, more importantly, it is not robust. If someone adds, removes, or renames a column in the table, the application code relying on SELECT * may break.

How to select common records from two tables?

You select only the common records that appear in the result sets of two different queries by using the INTERSECT set operator. The INTERSECT operator takes two SELECT statements and returns only the rows that are present in both results. For this to work, the two SELECT statements must have the same number of columns, and the columns must be of compatible data types. For example, if you wanted to find the customers who are also employees, you could use: SELECT Name FROM Customers INTERSECT SELECT Name FROM Employees;. This is different from a JOIN, which combines columns from two tables based on a key; INTERSECT compares the entire rows of two result sets.

What is the DISTINCT instruction and how is it used?

The DISTINCT instruction, or keyword, is used in a SELECT statement to filter out duplicate rows from the final result set. When you query a column, you may get many duplicate values. For example, SELECT City FROM Customers; might return “London” 50 times. By adding DISTINCT, as in SELECT DISTINCT City FROM Customers;, you will get a list of all unique cities, with “London” appearing only once. DISTINCT operates on the entire row of the SELECT list, not just a single column. If you select multiple columns (e.g., SELECT DISTINCT City, Country …), it will return only the unique combinations of those columns.

What are relationships?

In the context of a relational database, relationships are the logical connections and correlations between two or more tables. These relationships are the core of the “relational” model and are what allow you to store data efficiently without redundancy. They are established by linking tables using primary and foreign keys. For example, a Customers table and an Orders table have a relationship: a single customer can place many orders. This “one-to-many” relationship is enforced by placing the CustomerID (the primary key of the Customers table) into the Orders table as a foreign key, thus linking every order to a specific customer.

What is a NULL value? How is it different from zero or a blank space?

A NULL value is a special marker in an SQL database used to indicate that a data value is “missing” or “unknown” for a specific cell in a table. It is crucial to understand that NULL is not a value itself; it is the absence of a value. This makes it fundamentally different from zero (0), which is a valid numeric value, and different from an empty string (”), which is a valid string value of zero length. NULL behaves uniquely in comparisons. For example, NULL = NULL is not true, and NULL != 10 is also not true. Because NULL means “unknown,” you cannot definitively say if an “unknown” value is equal to another “unknown” value. To check for NULL, you must use the special operators IS NULL or IS NOT NULL.

What is the difference between SQL and NoSQL?

SQL and NoSQL represent two different philosophies of database management. SQL databases are “relational” (RDBMS). They are built on a foundation of structured data, where data is stored in tables with predefined schemas. A schema is a rigid blueprint that dictates the column names, data types, and relationships before any data is inserted. This model is excellent for ensuring high data integrity and for performing complex, transaction-heavy queries. NoSQL, which stands for “Not Only SQL,” refers to “non-relational” databases. These databases are designed for unstructured or semi-structured data and are typically “schema-less” or have a dynamic schema. They come in various types, like document stores, key-value stores, and graph databases. NoSQL databases are often favored for their horizontal scalability, high performance, and flexibility, making them popular for big data and real-time web applications.

What are some of the most common challenges when working with SQL databases?

One of the most common challenges is query performance tuning. As tables grow to millions or billions of rows, a poorly written query or a missing index can cause operations to slow down from milliseconds to minutes or even hours. Managing indexing strategies is a constant balancing act; while indexes speed up reads, they slow down writes (INSERT, UPDATE), so you must choose them carefully. Ensuring data integrity with constraints and managing concurrent transactions (multiple users reading and writing at the same time) are also complex challenges that can lead to “deadlocks” or data corruption if not handled properly. Database design itself, specifically normalization, is a challenge of balancing data redundancy against query performance.

What is a function in SQL?

An SQL function is a reusable database object that encapsulates a set of SQL statements to perform a specific task. A function is designed to be called within an SQL statement. It takes zero or more input parameters, performs calculations or other manipulations on them, and must return a result, which is typically a single value (a scalar function) or, in some dialects, a table (a table-valued function). Functions are used to promote code reusability, improve code readability, and encapsulate complex business logic. For example, you could create a function called CalculateSalesTax(price) that takes a price and returns the price with tax, allowing you to use it in any query without repeating the logic.

What types of SQL functions do you know?

There are two primary ways to categorize SQL functions. The first is by their behavior: “aggregate” versus “scalar.” Aggregate functions operate on a group of rows (usually defined by a GROUP BY clause) and return a single summary value for that entire group (e.g., SUM()). Scalar functions, on the other hand, operate on a single, individual value (or row) and return a single value for each row processed (e.g., UPPER()). The second way to categorize functions is by their origin: “built-in” versus “user-defined.” Built-in functions are provided by the database system itself (like all the examples so far). User-defined functions (UDFs) are functions created by the user to perform custom, specific business logic.

SQL Functions, Indexing, and Procedural Logic

Welcome to the fourth part of our 6-part series on SQL interview questions. Having covered the fundamentals of SQL, database structures, and practical DML/DDL commands, we now dive deeper into the powerful toolkit that SQL provides for data manipulation and optimization. This part focuses on the rich library of built-in functions that allow you to transform, calculate, and aggregate data directly within your queries. We will explore aggregate functions for summarizing data, scalar functions for row-by-row manipulation, and specific functions for handling text.

Beyond functions, we will move into more advanced intermediate-level topics. This includes a deeper look at indexing for performance, the critical difference between a stored procedure and a function, and the various set operators that allow you to combine query results. These questions test a candidate’s ability to write queries that are not just functional but also efficient and clean. Understanding this “second level” of SQL is what separates a beginner from an intermediate practitioner who can confidently handle complex data processing tasks.

What SQL aggregate functions do you know?

Aggregate functions are a core part of SQL, used to perform a calculation on a set of rows and return a single summary value. The most common aggregate functions are COUNT(), which returns the number of rows in a group; SUM(), which returns the total sum of a numeric column; AVG(), which returns the average value of a numeric column; MIN(), which returns the minimum value in a group; and MAX(), which returns the maximum value. These functions are almost always used in conjunction with the GROUP BY clause, which divides the table into groups, allowing the aggregate function to be calculated for each one (e.g., SELECT Category, AVG(Price) FROM Products GROUP BY Category;).

What SQL scalar functions do you know?

Scalar functions operate on a single value (or row) at a time and return a single value as a result. They are used in SELECT clauses to format data or in WHERE clauses to filter it. There are many categories. Numeric functions like ROUND() return a number rounded to a specified number of decimal places. Date and time functions like NOW() or GETDATE() return the current date and time. Conversion functions change data from one type to another. One of the most common categories is string (or character) functions. These include UPPER() (or UCASE()) to convert a string to uppercase, LOWER() (or LCASE()) to convert to lowercase, and LENGTH() (or LEN()) to return the length of a string.

What are case manipulation functions in SQL?

Case manipulation functions are a subset of character or scalar functions specifically used to change the letter case of text data. These are extremely useful for standardizing data for comparisons or for formatting data for display. For example, if users can enter their city as “London”, “london”, or “LONDON”, these will be treated as different values. Using a case manipulation function allows you to normalize them. The main functions are UPPER() (or UCASE() in some dialects), which returns the string converted to all uppercase letters, and LOWER() (or LCASE()), which returns the string converted to all lowercase. Some dialects also provide an INITCAP() function, which capitalizes the first letter of each word in the string.

What are character manipulation functions in SQL?

Character manipulation functions are a broad category of scalar functions used to modify, parse, or analyze text data. CONCAT() is used to join two or more string values together. SUBSTRING() (or SUBSTR()) is used to extract a portion of a string by specifying a starting position and a length. LENGTH() (or LEN()) returns the number of characters in a string. REPLACE() is used to find all occurrences of a specific substring within a string and replace them with another substring. TRIM() (or LTRIM() and RTRIM()) is used to remove leading, trailing, or all occurrences of specified characters or whitespace from a string. INSTR() (or CHARINDEX()) returns the numeric starting position of a substring within a string.

What is the difference between local and global variables?

In procedural SQL extensions (like T-SQL or PL/SQL), variables are used to store data during the execution of a script or procedure. The primary difference between local and global variables is their “scope.” A “local” variable is one that is declared within a specific function, stored procedure, or batch of code. It can only be accessed and is only visible within that specific block of code where it was declared. When the function or procedure finishes, the local variable is destroyed. A “global” variable, which is much less common, is declared in a way that makes it visible and accessible to the entire database session, or in some cases, across all sessions on the server. They are stored in fixed memory structures and persist for the duration of the session.

What is the difference between SQL and PL/SQL?

SQL (Structured Query Language) is a “declarative” language. It is primarily used to define and manipulate data; you tell the database what data you want, and the engine figures out how to get it. PL/SQL (Procedural Language/SQL) is a procedural extension for a specific, major commercial database system (other dialects have similar extensions, like T-SQL). PL/SQL is a full-fledged programming language that “wraps” SQL. It allows you to use SQL commands inside of a procedural block, adding programming constructs like IF…THEN…ELSE conditional statements, FOR or WHILE loops, exception handling, and variables. It is used to write complex stored procedures, functions, and triggers that execute complex business logic directly within the database, rather than in an external application.

What is the difference between LEFT JOIN and LEFT OUTER JOIN?

There is absolutely no functional difference between LEFT JOIN and LEFT OUTER JOIN. The keyword OUTER is optional and purely for readability and explicitness. In the SQL standard, the OUTER keyword is implied for LEFT, RIGHT, and FULL joins. Therefore, LEFT JOIN is simply a shorthand for LEFT OUTER JOIN. Both commands will execute identically, returning all records from the “left” table and only the matching records from the “right” table. Any rows from the left table that do not have a match in the right table will have NULL values in the columns from the right table.

What is indexing in SQL and how does it improve performance?

Indexing is the process of creating a special data structure that is associated with a table or view to dramatically speed up data retrieval. This structure is essentially an optimized, sorted lookup table for one or more columns. When you run a query with a WHERE clause on an indexed column, the database engine does not need to perform a “full table scan” (reading every single row in the table). Instead, it uses the index to quickly find the exact “address” or location of the rows that match the condition, much like using a book’s index to find a page number. This can reduce query time from minutes to milliseconds. However, there is a trade-off: indexes speed up SELECT operations but slow down INSERT, UPDATE, and DELETE operations, because the index itself must also be updated with every data change.

What is a stored procedure and how does it differ from a function?

A stored procedure is a precompiled set of one or more SQL statements that are saved by name in the database and can be executed as a single unit. Procedures are used to encapsulate complex business logic, manage transactions, and perform administrative tasks. The key difference between a stored procedure and a function is their primary purpose and return value. A function must return a value (either a single scalar value or, in some systems, a table) and is typically used within an SQL expression (like in a SELECT or WHERE clause). A procedure does not have to return a value. Its main purpose is to perform an action, such as inserting data, modifying tables, or managing transactions. It can, however, return zero or more result sets to the client.

What is the default order of data with the ORDER BY statement and how can it be changed?

The default order of data when using the ORDER BY statement is ascending, or ASC. This means that numeric values are sorted from smallest to largest, dates are sorted from earliest to latest, and text strings are sorted in alphabetical order from A to Z. The handling of NULL values in this default sort can vary by database dialect; some place them first, others place them last. You can explicitly change this default behavior by adding the DESC (descending) keyword after the column name in the ORDER BY clause. This will sort the data in the opposite direction (largest to smallest, Z to A). You can also specify ASC for clarity, but it is not required.

What are SQL set operators?

SQL set operators are used to combine the results of two or more independent SELECT statements into a single result set. For these operators to work, the SELECT statements being combined must have the same number of columns, and those columns must have compatible data types. The main set operators are UNION, which combines the results and automatically removes any duplicate rows; UNION ALL, which combines the results but includes all duplicate rows, making it much faster; INTERSECT, which returns only the rows that are present in both query results; and EXCEPT (or MINUS in some dialects), which returns only the rows that are present in the first query’s result set but not in the second.

Which operator is used in the pattern matching query?

The operator used for pattern matching in SQL is the LIKE operator, which is used within a WHERE clause. The LIKE operator allows you to search for a specific pattern in a text column, rather than an exact match. It is used in combination with two wildcard characters. The percent sign (%) is a wildcard that represents any number of characters, including zero characters. For example, WHERE Name LIKE ‘S%’ would find all names that start with the letter ‘S’. The underscore (_) is a wildcard that represents exactly one character. For example, WHERE Name LIKE ‘T_m’ would find “Tim” and “Tom” but not “Thom”.

What is the difference between a primary key and a unique key in SQL?

This is a very common question that tests a candidate’s understanding of data integrity. Both constraints enforce uniqueness in a column. The main difference is that a table can have only one PRIMARY KEY, but it can have multiple UNIQUE keys (or constraints). The PRIMARY KEY is the main identifier for the table and, by definition, cannot contain any NULL values. The UNIQUE constraint is used to enforce a business rule on other columns (e.g., ensuring no duplicate email addresses). In most database systems, a UNIQUE constraint will allow a NULL value to be stored (often, only one NULL is permitted, though this varies).

What is a composite SQL primary key?

A composite primary key is a primary key that is formed by combining two or more columns in a table. This is used in situations where no single column by itself can uniquely identify a row. The PRIMARY KEY constraint is applied to the combination of the columns, meaning that the combination of values in those columns must be unique and non-null for every row. A common example is a “junction table” or “associative table,” used to model a many-to-many relationship. For instance, in a Student_Courses table, the primary key would likely be a composite of (StudentID, CourseID) to represent a unique enrollment.

SQL Logic, Query Execution, and Database Design

Welcome to the fifth part of our 6-part series on SQL interview questions. In the previous installments, we progressed from foundational definitions to practical commands and the rich library of SQL functions. Now, we enter the realm of advanced query logic and database design. This part focuses on the “why” and “how” of SQL operations. We will explore the logical execution order of a query, which is a critical concept for debugging complex GROUP BY statements. We will also demystify views, subqueries, and the core database design principles of normalization and denormalization.

These intermediate-to-advanced questions are designed to separate candidates who merely use SQL from those who truly understand it. Answering these questions correctly demonstrates a deeper level of expertise and shows an interviewer that a candidate can think about performance, security, and data integrity at a high level. Understanding the difference between a correlated and a nested subquery, or when to use denormalization, is a hallmark of a mature data professional.

What is the usual order of SQL clauses in a SELECT statement?

The usual syntactical order, which is the order you must write the clauses in your query, is very specific. The statement must begin with the SELECT clause, followed by the FROM clause. After that, the common clauses follow in this sequence: JOIN (with its ON condition), WHERE, GROUP BY, HAVING, and finally ORDER BY. A LIMIT or FETCH clause, if used, almost always comes last. This order is a rigid rule of the SQL language. For example, you cannot place the WHERE clause before the FROM clause, or the ORDER BY clause before the GROUP BY clause.

In what order does the interpreter execute the common statements in the SELECT query?

This is a critical “trick” question that tests a deep understanding of SQL. The logical execution order (how the database engine thinks about the query) is very different from the syntactical order (how you write the query). The first clause to be processed is FROM, which, along with JOIN and ON, determines the source data. Next, the WHERE clause is applied to filter out individual rows that do not meet the criteria. After that, the GROUP BY clause is used to aggregate the remaining rows into groups. Then, the HAVING clause is applied to filter out entire groups that do not meet its criteria. Only then is the SELECT clause processed to choose the final columns and calculate expressions. Finally, the ORDER BY clause sorts the result set, and LIMIT discards any extra rows.

What is a view in SQL?

A view is a “virtual table” whose contents are defined by a SELECT query. It is a database object that stores a saved query. When you query a view, it looks and acts just like a regular table, but it does not physically store any data itself. Instead, it runs its underlying SELECT query every time it is accessed, presenting the results to the user as if it were a real table. Views are extremely useful for two main reasons. First, they provide security: you can grant a user access to a view that shows only some of the columns or rows from a sensitive table. Second, they provide simplicity: a complex, multi-table JOIN query can be hidden inside a view, allowing users to query it with a simple SELECT * statement.

Can we create a view based on another view in SQL?

Yes, you can. Views can be “nested,” meaning you can create a new view whose SELECT statement queries an existing view, which in turn queries an actual table (or even another view). For example, you could have a Sales_View that joins Orders and Products. You could then create a second view, High_Value_Sales_View, that queries the Sales_View with a WHERE clause like WHERE TotalPrice > 1000. This can be a useful way to build up layers of abstraction and security. However, this practice should be used with caution. Deeply nested views can become very difficult to debug and can often lead to significant performance problems, as the database engine has to “un-pack” multiple layers of queries.

Can a view still be used if the original table is deleted?

No. A view is just a “definition” or a “stored query”; it is not a copy of the data. The view is completely dependent on the underlying tables and schema objects that it references in its SELECT statement. If you execute a DROP TABLE command on one of the original tables that the view is based on, the view object itself will still exist in the database, but it will become invalid or “broken.” The next time any user tries to query that view, the database engine will return an error because it cannot find the underlying table it needs to run its query. To fix this, you would have to either restore the dropped table or drop and recreate the view with a new definition.

What types of SQL relationships do you know?

There are three main types of relationships that can exist between tables in a relational database. The “one-to-one” relationship is the least common, where one row in Table A is linked to exactly one row in Table B (e.g., an Employee table and an Employee_Payroll_Details table). The “one-to-many” relationship is the most common, where one row in Table A can be linked to many rows in Table B (e.g., one Customer can have many Orders). This is implemented by placing the primary key of Table A into Table B as a foreign key. The “many-to-many” relationship is where many rows in Table A can be linked to many rows in Table B (e.g., Students and Courses). This type of relationship cannot be implemented directly and requires a third “junction” or “associative” table to link the two.

What are the possible values of a BOOLEAN data field?

The SQL standard defines the BOOLEAN data type as having three possible values: TRUE, FALSE, and UNKNOWN. The UNKNOWN value is how the database represents a NULL in a boolean context. This “three-valued logic” is a key concept. For example, the result of TRUE AND UNKNOWN is UNKNOWN, while the result of FALSE AND UNKNOWN is FALSE. While this is the standard, not all SQL dialects implement a formal BOOLEAN data type. A common alternative is to use a BIT or TINYINT data type, where the value 1 is used to represent TRUE and the value 0 is used to represent FALSE, with NULL still representing the “unknown” or “missing” state.

What is normalization in SQL?

Normalization is the process of designing a relational database schema in a systematic way to reduce data redundancy and improve data integrity. The goal is to ensure that data is stored logically and that dependencies are enforced correctly. This process involves “decomposing” or splitting large tables into smaller, less redundant, and more specialized tables. These new, smaller tables are then linked together using foreign keys. The process is guided by a set of rules known as “normal forms” (e.g., First Normal Form – 1NF, Second Normal Form – 2NF, Third Normal Form – 3NF). A normalized database is very efficient for write operations (INSERT, UPDATE) because a piece of data (like a customer’s address) only needs to be updated in one single, correct place.

What is denormalization in SQL?

Denormalization is the opposite of normalization. It is the process of intentionally adding redundant data back into a database schema to improve query performance. While a highly normalized database is efficient for writing data, it can be slow for reading data, as a simple report might require the database to perform many complex JOIN operations to re-assemble the data. Denormalization optimizes for “read” speed by pre-joining some of this data. For example, instead of forcing a JOIN to the Products table every time, you might add the ProductName column directly to the Order_Details table. This is a classic trade-off: you sacrifice data integrity and write-efficiency in exchange for much faster read queries.

What is the difference between renaming a column and assigning it an alias?

The difference is one of permanence and scope. Assigning an “alias” is a temporary, logical change that exists only for the duration of a single query. You use the AS keyword in a SELECT statement to change the name of a column in the final result set. The physical table on the disk is not changed at all. “Renaming” a column, by contrast, is a permanent, physical change to the database schema. This is done using a Data Definition Language (DDL) command, such as ALTER TABLE table_name RENAME COLUMN old_name TO new_name;. This command permanently changes the structure of the table, and all future queries must use the new column name.

What is the difference between nested and correlated subqueries?

A nested subquery (or simple subquery) is an independent query. It can be run on its own, and its result is then passed to the outer query. The database engine executes the nested subquery once, and the outer query then uses that result (which is a list of values or a single value) to complete its own operation. A correlated subquery, on the other hand, is dependent on the outer query. It contains a reference to a column from the outer query’s row. Because of this dependency, the correlated subquery must be executed repeatedly, once for every single row that is being processed by the outer query. This makes correlated subqueries generally much, much slower than simple nested subqueries.

What is the difference between clustered and non-clustered indices?

This is a fundamental concept in database performance. A “clustered” index defines the physical sort order of the data rows in the table. The data itself is stored on disk in the order of the clustered index key. Because the data can only be physically stored in one order, a table can have at most one clustered index (in many systems, the primary key automatically becomes the clustered index). A “non-clustered” index is a completely separate data structure, like an index in a book. It is a sorted list of the indexed values, and each value has a “pointer” (like a page number) that points to the physical location of the row in the table. A table can have many non-clustered indices.

What is the CASE() function?

The CASE statement is the primary way to implement conditional IF…THEN…ELSE logic within an SQL query. It is an expression that can be used in a SELECT clause to transform data or in a WHERE or ORDER BY clause to define conditional logic. It evaluates a list of conditions and returns one of multiple possible result expressions. The syntax has two forms: a “simple” form that compares one value against multiple possibilities, and a “searched” form that evaluates multiple independent conditions. The searched form is more common: CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ELSE default_result END.

What is the difference between the DELETE and TRUNCATE statements?

Both DELETE and TRUNCATE are used to remove data from a table, but they work very differently. DELETE is a Data Manipulation Language (DML) command. It removes rows one by one and logs each deletion in the transaction log. Because it is row-by-row, it is slow, but it allows you to use a WHERE clause to specify which rows to delete. It is also a logged operation, meaning it can be “rolled back.” TRUNCATE is a Data Definition Language (DDL) command. It removes all data by “de-allocating” the data pages of the table, which is an extremely fast, non-logged operation. You cannot use a WHERE clause with TRUNCATE (it is all-or-nothing), and it generally cannot be rolled back.

What is the difference between DROP and TRUNCATE statements?

This is another common distinction. Both are Data Definition Language (DDL) commands and are very fast. The difference is what they remove. TRUNCATE removes all data from a table, but the table’s structure remains. After a TRUNCATE, you are left with an empty table, complete with all its columns, constraints, and indexes. DROP, on the other hand, is far more destructive. The DROP TABLE command removes everything. It deletes all the data, the table’s structure, and all associated indexes, constraints, and triggers. After a DROP, the table no longer exists in the database.

Advanced Scenarios and Practical Query Solutions

Welcome to the sixth and final part of our comprehensive series on SQL interview questions. In the previous five parts, we have built a complete pyramid of knowledge, starting from the foundational definitions of SQL and RDBMS, moving through the practical mechanics of data manipulation and joins, exploring the nuances of functions and procedural logic, and finally analyzing the “behind-the-scenes” world of query execution and database design. Now, we put all that knowledge to the test.

This concluding part focuses on practical, scenario-based questions. These are the “problem-solving” questions that an interviewer uses to see if a candidate can apply their theoretical knowledge to solve real-world problems. This section covers critical comparisons like HAVING versus WHERE, and provides the specific syntax for common tasks like adding columns, finding the “nth” highest value, and selecting random records. These questions test a candidate’s fluency and problem-solving ability, which are the final and most important indicators of a skilled SQL practitioner.

What is the difference between a HAVING and a WHERE clause?

This is one of the most important concepts in intermediate SQL. Both clauses are used for filtering, but they operate at different stages of the query execution. The WHERE clause is processed before any grouping or aggregation happens. It filters individual rows from the source table. If a row does not meet the WHERE criteria, it is discarded and is not included in the GROUP BY calculation. The HAVING clause is processed after the GROUP BY clause and after the aggregate functions have been calculated. It is used to filter entire groups of rows. You must use HAVING if you want to filter on the result of an aggregate function (e.g., HAVING SUM(Sales) > 1000), as those values do not yet exist when the WHERE clause is being processed.

How do you add a record to a table?

You add a new record, or row, to a table using the INSERT INTO statement, which is a Data Manipulation Language (DML) command. The syntax requires you to specify the table name and the values to be inserted. There are two main ways to write it. The first, and most robust, way is to specify the column names you are inserting into: INSERT INTO Customers (Username, Email, SignupDate) VALUES (‘jsmith’, ‘jsmith@example.com’, ‘2025-10-17’);. This is the best practice because it will not break if the table’s column order changes. The second way is to omit the column names, but this requires you to provide a value for every column in the table, in the exact order they appear in the schema.

How do you delete a record from a table?

You delete one or more records from a table using the DELETE FROM statement, which is a DML command. The most critical part of this statement is the WHERE clause, which specifies which records to delete. The syntax is: DELETE FROM Customers WHERE CustomerID = 101;. If you execute a DELETE statement without a WHERE clause (e.g., DELETE FROM Customers;), it will delete every single row from the table. This is a very dangerous command to run without a WHERE condition. Because DELETE is a logged, row-by-row operation, it can be slow on large tables, but it can also be undone with a ROLLBACK command if it is part of a transaction.

How do you add a column to a table?

You add a new column to an existing table using the ALTER TABLE statement, which is a Data Definition Language (DDL) command. This command is used to modify the structure of a table. To add a column, you must specify the table name, the name of the new column, and the data type for that new column. The syntax is: ALTER TABLE Customers ADD COLUMN LoyaltyPoints INT;. You can also specify constraints for the new column at the same time, such as a DEFAULT value (e.g., ADD COLUMN LoyaltyPoints INT DEFAULT 0), which is often a good practice, especially on tables that already contain data.

Conclusion

We have covered the “what” (definitions), the “how” (practical syntax), and the “why” (database design and execution logic). For candidates, mastering these 84 questions and, more importantly, the concepts behind them will provide a strong foundation for any technical interview. For hiring managers and recruiters, this list serves as a comprehensive guide for evaluating a candidate’s true depth of SQL knowledge. SQL remains one of the most durable and in-demand skills in the technology industry, and a solid grasp of these topics is a key indicator of a successful data professional.